From dfenacci at openjdk.org  Mon Sep  1 06:50:28 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Mon, 1 Sep 2025 06:50:28 GMT
Subject: RFR: 8355354: C2 crashed: assert(_callee == nullptr || _callee ==
 m) failed: repeated inline attempt with different callee [v3]
In-Reply-To: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com>
References: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com>
Message-ID: <mOvOsKEuhHRI02qXAr7krpQbxky-CXFBf7MDYNhHvQM=.b01e2923-6126-4b0a-8301-932770391ffe@github.com>

> # Issue
> The CTW test `applications/ctw/modules/java_xml.java` crashes when trying to repeat late inlining of a virtual method (after IGVN passes through the method's call node again). The failure originates [here](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callGenerator.cpp#L473) because `_callee != m`. Apparently, when IGVN runs a second time after a first failed late inlining attempt and [sets the callee in the call generator](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callnode.cpp#L1240), we notice that the previous callee is not the same as the current one.
> In this specific instance, the issue appears to happen when CTW is compiling Apache Xalan.
> 
> # Cause
> The root of the issue has to do with repeated late inlining, class hierarchy analysis and dynamic class loading.
> 
> For this particular issue the two differing methods are `org.apache.xalan.xsltc.compiler.LocationPathPattern::translate` first and `org.apache.xalan.xsltc.compiler.AncestorPattern::translate` the second time. `LocationPathPattern` is an abstract class but has a concrete `translate` method. `AncestorPattern` is a concrete class that extends another abstract class `RelativePathPattern` that extends `LocationPathPattern`. `AncestorPattern` overrides the translate method.
> What seems to be happening is the following: we compile a virtual call to `RelativePathPattern::translate`, and at compile time only the abstract classes `RelativePathPattern` <: `LocationPathPattern` are loaded. CHA then finds that the call must always invoke `LocationPathPattern::translate`, because the method is not overridden anywhere else. However, there is still no non-abstract class in the entire class hierarchy, i.e. as soon as `AncestorPattern` is loaded, it becomes the only non-abstract class in the class hierarchy, and the receiver type must therefore be `AncestorPattern`.
> 
> More generally, when late inlining is repeated and classes are loaded dynamically, the resolved method can differ between one late inlining attempt and the next.
> 
> # Fix
> 
> This looks like a rare edge case. If class loading affects CHA, the originally recorded dependency becomes invalid. So, we change the assert to **check for invalid dependencies if the current callee and the previous one don't match**.
> 
> # Testing
> 
> This issue is very, very intermittent and depends on a number of factors. This ...

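The CHA effect described under "Cause" can be reproduced with a deliberately tiny toy model. In the sketch below, `ToyClass`, `ToyHierarchy` and `unique_target` are hypothetical names, and the rule it applies (collect the dispatch targets of all loaded concrete subclasses, falling back to the inherited method when none are loaded) is a simplification assumed for illustration; HotSpot's real CHA is considerably more involved and is not reproduced here. It only shows how the answer for the same call site can change once another class is loaded.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>

// Toy class model (hypothetical): superclass name, abstract flag, and the
// methods declared directly in this class.
struct ToyClass {
  std::string name;
  std::string super;  // empty for the root of the toy hierarchy
  bool is_abstract;
  std::set<std::string> declared;
};

class ToyHierarchy {
 public:
  void load(const ToyClass& k) { classes_[k.name] = k; }

  // Walk up the superclass chain to the class whose implementation of
  // `method` a receiver of class `k` would dispatch to.
  std::string dispatch(const std::string& k, const std::string& method) const {
    for (std::string cur = k; !cur.empty(); cur = classes_.at(cur).super) {
      if (classes_.at(cur).declared.count(method) != 0) return cur + "::" + method;
    }
    return "";  // not found
  }

  bool is_subclass(std::string k, const std::string& of) const {
    for (; !k.empty(); k = classes_.at(k).super) {
      if (k == of) return true;
    }
    return false;
  }

  // Toy CHA: collect the dispatch targets of all loaded concrete subclasses
  // of `static_type`; with none loaded, fall back to the inherited method.
  // Answer only if the target is unique.
  std::optional<std::string> unique_target(const std::string& static_type,
                                           const std::string& method) const {
    assert(classes_.count(static_type) != 0);
    std::set<std::string> targets;
    for (const auto& [name, k] : classes_) {
      if (!k.is_abstract && is_subclass(name, static_type)) {
        targets.insert(dispatch(name, method));
      }
    }
    if (targets.empty()) targets.insert(dispatch(static_type, method));
    if (targets.size() == 1) return *targets.begin();
    return std::nullopt;
  }

 private:
  std::map<std::string, ToyClass> classes_;
};
```

With only the abstract `LocationPathPattern` and `RelativePathPattern` loaded, `unique_target("RelativePathPattern", "translate")` yields `LocationPathPattern::translate`; after loading `AncestorPattern`, the same query yields `AncestorPattern::translate`.
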
Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision:

  JDK-8355354: avoid resetting callee in call node ideal

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26441/files
  - new: https://git.openjdk.org/jdk/pull/26441/files/15bcb65e..ce807553

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26441&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26441&range=01-02

  Stats: 38 lines in 1 file changed: 4 ins; 12 del; 22 mod
  Patch: https://git.openjdk.org/jdk/pull/26441.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26441/head:pull/26441

PR: https://git.openjdk.org/jdk/pull/26441

From epeter at openjdk.org  Mon Sep  1 06:59:53 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 06:59:53 GMT
Subject: RFR: 8366357: C2 SuperWord: refactor VTransformNode::apply with
 VTransformApplyState [v3]
In-Reply-To: <kgs2oZQCU1ou7phUUBapuRzQu3rGjTUdtbQYXUGf5lY=.27339928-173b-429b-8bac-5b85a3cc58f7@github.com>
References: <h7XRXvzmCTCwr3PmURYdpwbi88G-2ykSLXiyjVl7B6w=.5edd2e56-6060-4e3f-95ea-b62b17d89c08@github.com>
 <-URf_iP7rH-Ev5PzEhDseBTqTTCuHiMEYkTdeksxP_0=.14d9721e-b5f9-4d0e-932f-78ca4a6ad12b@github.com>
 <kgs2oZQCU1ou7phUUBapuRzQu3rGjTUdtbQYXUGf5lY=.27339928-173b-429b-8bac-5b85a3cc58f7@github.com>
Message-ID: <dcc04EBfBFAjY65L048uRBhmVgBV6Nbi_UGEy4FFxMw=.9fc48ee9-5783-4819-b443-345e11dcf7cb@github.com>

On Thu, 28 Aug 2025 14:47:43 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   For Christian: use phase->intcon instead
>
> Marked as reviewed by mhaessig (Committer).

@mhaessig @vnkozlov @chhagedorn Thanks for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26987#issuecomment-3241080195

From epeter at openjdk.org  Mon Sep  1 06:59:53 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 06:59:53 GMT
Subject: Integrated: 8366357: C2 SuperWord: refactor VTransformNode::apply with
 VTransformApplyState
In-Reply-To: <h7XRXvzmCTCwr3PmURYdpwbi88G-2ykSLXiyjVl7B6w=.5edd2e56-6060-4e3f-95ea-b62b17d89c08@github.com>
References: <h7XRXvzmCTCwr3PmURYdpwbi88G-2ykSLXiyjVl7B6w=.5edd2e56-6060-4e3f-95ea-b62b17d89c08@github.com>
Message-ID: <mXNpPfOeQilhzPD836dcFRMBCXydmJNn_nS7-rUXwk8=.b0cbe107-6340-4642-bd25-0e0893013cc4@github.com>

On Thu, 28 Aug 2025 12:57:44 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> I'm working on **cost-modeling**, and am integrating some smaller changes from this proof-of-concept PR: https://github.com/openjdk/jdk/pull/20964
> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
> 
> This is a **pure refactoring** - no change in behaviour. I'm presenting it like this because it will make reviews easier.
> 
> The goal here is that `VTransformNode::apply` only needs a single argument. This is important, as we will soon add more components that need to be updated during apply. That way, we can simply add more parts to `VTransformApplyState`, and do not need to add more arguments to VTransformNode::apply.
> 
> And yes: I have considered passing the `apply_state` as `const`. While this may be possible with the current code state, the upcoming changes from https://github.com/openjdk/jdk/pull/20964 will require non-const access to the `apply_state` (e.g. for `set_memory_state`).
> 
> Also: Christian asked me to squeeze in another change: `igvn.intcon` -> `phase->intcon`, so that we also set the control to root. It was not strictly necessary, but it is probably better to do it.
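
A minimal sketch of the single-argument `apply` pattern described above, assuming a simplified model: `ToyNode`, `ToyApplyState`, `ToyVTransformNode` and `ToyOuterNode` are hypothetical stand-ins, not the actual `VTransformApplyState` API. The idea is that everything `apply` needs to read or update lives in one object passed by non-const reference, so adding a component means adding a member rather than a parameter to every override.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a C2 IR node.
struct ToyNode { int id; };

// Hypothetical bundle of everything apply() may read or update. New
// components (e.g. a current memory state) become members here instead of
// extra parameters on every apply() override.
class ToyApplyState {
 public:
  explicit ToyApplyState(int num_vtnodes) : _vtnode_to_node(num_vtnodes, nullptr) {}

  void set_transformed_node(int vtnode_idx, ToyNode* n) {
    assert(vtnode_idx >= 0 && vtnode_idx < static_cast<int>(_vtnode_to_node.size()));
    _vtnode_to_node[vtnode_idx] = n;
  }
  ToyNode* transformed_node(int vtnode_idx) const { return _vtnode_to_node[vtnode_idx]; }

 private:
  std::vector<ToyNode*> _vtnode_to_node;  // result of apply() per vtnode index
};

class ToyVTransformNode {
 public:
  explicit ToyVTransformNode(int idx) : _idx(idx) {}
  virtual ~ToyVTransformNode() = default;
  // Single argument; non-const because apply() records its result in the state.
  virtual ToyNode* apply(ToyApplyState& state) const = 0;

 protected:
  int _idx;
};

// A node outside the loop: applying it just maps it to the existing IR node.
class ToyOuterNode : public ToyVTransformNode {
 public:
  ToyOuterNode(int idx, ToyNode* n) : ToyVTransformNode(idx), _n(n) {}
  ToyNode* apply(ToyApplyState& state) const override {
    state.set_transformed_node(_idx, _n);
    return _n;
  }

 private:
  ToyNode* _n;
};
```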

This pull request has now been integrated.

Changeset: dbac620b
Author:    Emanuel Peter <epeter at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/dbac620b996713087f0d1b1189e543e51a0bb09f
Stats:     131 lines in 3 files changed: 31 ins; 26 del; 74 mod

8366357: C2 SuperWord: refactor VTransformNode::apply with VTransformApplyState

Reviewed-by: chagedorn, kvn, mhaessig

-------------

PR: https://git.openjdk.org/jdk/pull/26987

From epeter at openjdk.org  Mon Sep  1 07:00:43 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 07:00:43 GMT
Subject: RFR: 8366427: C2 SuperWord: refactor VTransform scalar nodes [v2]
In-Reply-To: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com>
References: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com>
Message-ID: <p9TuG_EN1BZ1PnTwq9frzbLEXrFbKrwvEWt1hc6DkJE=.557c86c9-f461-4f70-a93a-125694c47752@github.com>

> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
> https://github.com/openjdk/jdk/pull/20964
> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
> 
> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
> 
> The goal is to split up some cases that are currently treated the same, but will later have different behavior. There may be a little bit of code duplication, but the code will soon be made different ;)
> 
> We split the `VTransformScalarNode`:
> - `VTransformMemopScalarNode`
>   - Code that only wants scalar memory nodes can now directly check for `isa_MemopScalar`.
>   - We can directly store the `_vpointer` in a field; that way we don't need to do a lookup via `vloop_analyzer`. This could also be helpful later on if we ever do widening (unrolling during auto vectorization): we could then do the necessary modifications to the `vpointer`.
> - `VTransformLoopPhiNode`
>   - Later on, they will play a more special role, they will give us easy access to the beginning state of the loop body and the backedges.
> - `VTransformCFGNode`
>   - Calling them scalar nodes is not 100% accurate. We'll probably have to further refine them later on. But splitting them off now seems like a reasonable choice. Once we do if-conversion we'll have to do more work on CFG.
> - `VTransformDataScalarNode`
>   - These represent all the normal "calculation" nodes in the loop.
> - `VTransformInputScalarNode` -> `VTransformOuterNode`:
>   - For now, we are still just tracking input nodes, but soon we will need to track input and output nodes: basically just the 1-hop neighbourhood of nodes outside the loop. I'm already renaming them now, so it will be less noise later.
> 
> I decided to split things up more and avoid `VTransformScalarNode` altogether, so that I don't have to override overrides - that can be really confusing (e.g. what I had with `is_load_in_loop`).
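
The resulting split can be pictured roughly as follows. This is a hypothetical sketch: all `Toy*` names, `ToyVPointer` and the `offset_or` helper are invented for illustration and do not mirror `vtransform.hpp`. Each query has a single default in the base class and a single override in the subclass it describes, and the memory node keeps its address descriptor in a field instead of looking it up elsewhere.

```cpp
#include <cassert>

class ToyMemopScalarNode;

// Hypothetical base class: each query has one default here and exactly one
// override in the subclass it describes, so there is no override-of-an-override.
class ToyVTNode {
 public:
  virtual ~ToyVTNode() = default;
  virtual const ToyMemopScalarNode* isa_MemopScalar() const { return nullptr; }
  virtual bool is_loop_phi() const { return false; }
  virtual bool is_cfg() const { return false; }
  virtual bool is_outer() const { return false; }
};

// Hypothetical address descriptor standing in for the real VPointer.
struct ToyVPointer { int base_id; int offset; };

// Scalar load/store: keeps its address info in a field, so no side lookup is needed.
class ToyMemopScalarNode : public ToyVTNode {
 public:
  explicit ToyMemopScalarNode(ToyVPointer vp) : _vpointer(vp) {}
  const ToyMemopScalarNode* isa_MemopScalar() const override { return this; }
  const ToyVPointer& vpointer() const { return _vpointer; }

 private:
  ToyVPointer _vpointer;
};

class ToyLoopPhiNode : public ToyVTNode {
 public:
  bool is_loop_phi() const override { return true; }
};

class ToyCFGNode : public ToyVTNode {
 public:
  bool is_cfg() const override { return true; }
};

// Ordinary in-loop calculation nodes: no special queries.
class ToyDataScalarNode : public ToyVTNode {};

// Nodes outside the loop (formerly "input scalar" nodes).
class ToyOuterNode : public ToyVTNode {
 public:
  bool is_outer() const override { return true; }
};

// A caller that only cares about scalar memory nodes checks isa_MemopScalar() directly.
int offset_or(const ToyVTNode& n, int fallback) {
  const ToyMemopScalarNode* m = n.isa_MemopScalar();
  return m != nullptr ? m->vpointer().offset : fallback;
}
```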

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  Update src/hotspot/share/opto/vtransform.hpp
  
  Co-authored-by: Manuel Hässig <manuel at haessig.org>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27002/files
  - new: https://git.openjdk.org/jdk/pull/27002/files/197d0896..86dac36b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27002&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27002&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27002.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27002/head:pull/27002

PR: https://git.openjdk.org/jdk/pull/27002

From mhaessig at openjdk.org  Mon Sep  1 07:00:44 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Mon, 1 Sep 2025 07:00:44 GMT
Subject: RFR: 8366427: C2 SuperWord: refactor VTransform scalar nodes [v2]
In-Reply-To: <p9TuG_EN1BZ1PnTwq9frzbLEXrFbKrwvEWt1hc6DkJE=.557c86c9-f461-4f70-a93a-125694c47752@github.com>
References: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com>
 <p9TuG_EN1BZ1PnTwq9frzbLEXrFbKrwvEWt1hc6DkJE=.557c86c9-f461-4f70-a93a-125694c47752@github.com>
Message-ID: <rg1W-ZDZns2nL9FGlB5Wx02gwhXIFCnARxs9CZn5F_8=.624d6ee6-50fb-4bc5-bd8e-6421b4ca2636@github.com>

On Mon, 1 Sep 2025 06:56:51 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
>> https://github.com/openjdk/jdk/pull/20964
>> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
>> 
>> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
>> 
>> The goal is to split up some cases that are currently treated the same, but will later have different behavior. There may be a little bit of code duplication, but the code will soon be made different ;)
>> 
>> We split the `VTransformScalarNode`:
>> - `VTransformMemopScalarNode`
>>   - Code that only wants scalar memory nodes can now directly check for `isa_MemopScalar`.
>>   - We can directly store the `_vpointer` in a field; that way we don't need to do a lookup via `vloop_analyzer`. This could also be helpful later on if we ever do widening (unrolling during auto vectorization): we could then do the necessary modifications to the `vpointer`.
>> - `VTransformLoopPhiNode`
>>   - Later on, they will play a more special role, they will give us easy access to the beginning state of the loop body and the backedges.
>> - `VTransformCFGNode`
>>   - Calling them scalar nodes is not 100% accurate. We'll probably have to further refine them later on. But splitting them off now seems like a reasonable choice. Once we do if-conversion we'll have to do more work on CFG.
>> - `VTransformDataScalarNode`
>>   - These represent all the normal "calculation" nodes in the loop.
>> - `VTransformInputScalarNode` -> `VTransformOuterNode`:
>>   - For now, we are still just tracking input nodes, but soon we will need to track input and output nodes: basically just the 1-hop neighbourhood of nodes outside the loop. I'm already renaming them now, so it will be less noise later.
>> 
>> I decided to split things up more and avoid `VTransformScalarNode` altogether, so that I don't have to override overrides - that can be really confusing (e.g. what I had with `is_load_in_loop`).
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update src/hotspot/share/opto/vtransform.hpp
>   
>   Co-authored-by: Manuel Hässig <manuel at haessig.org>

Thank you for your continued efforts, @eme64. The suspense is building for your big change...

This looks good to me, bar one typo.

Marked as reviewed by mhaessig (Committer).

src/hotspot/share/opto/vtransform.hpp line 454:

> 452: };
> 453: 
> 454: // Identity ransform for scalar loads and stores.

Suggestion:

// Identity transform for scalar loads and stores.

-------------

Marked as reviewed by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27002#pullrequestreview-3172282140
PR Review: https://git.openjdk.org/jdk/pull/27002#pullrequestreview-3172310479
PR Review Comment: https://git.openjdk.org/jdk/pull/27002#discussion_r2313027649

From epeter at openjdk.org  Mon Sep  1 07:08:56 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 07:08:56 GMT
Subject: RFR: 8366361: C2 SuperWord: rename VTransformNode::set_req ->
 init_req, analogue to Node::init_req [v2]
In-Reply-To: <zMcHnCVSmgkj53F43PilChJtIIKwqYbVgw-rEmUaB90=.b7ac6423-a979-4c61-957e-ff719208d68f@github.com>
References: <zMcHnCVSmgkj53F43PilChJtIIKwqYbVgw-rEmUaB90=.b7ac6423-a979-4c61-957e-ff719208d68f@github.com>
Message-ID: <H5muAU2r__jeioYyo3Ro_yrKuTx9MCi4dnPFag2W3tc=.d4f314fe-6e55-4dd8-a8ae-b02897ea72a7@github.com>

> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
> https://github.com/openjdk/jdk/pull/20964
> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
> 
> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
> 
> The current implementation of `VTransformNode::set_req` has `init_req` semantics: it verifies that the corresponding input is still nullptr. We should thus rename it. This will also free up the "set_req" name for later use in VTransform optimizations, where we want to modify the graph.
> 
> See `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop` in the proof-of-concept PR.
> 
> FYI: this PR depends on https://github.com/openjdk/jdk/pull/26987. I'll rebase once that one is integrated. We can already start reviewing, so that the process is a little faster later on. (I have more small changes coming, but separating them makes them more reviewable.)
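
The distinction between the two names can be sketched as follows, assuming a simplified node with a fixed-size input array (`ToyVTNodeReqs` is a hypothetical name, not the real `VTransformNode`). `init_req` is meant for building the graph and asserts that the slot is still empty, which is the behavior the description attributes to the current `set_req`; a later `set_req` may overwrite an existing input.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node with a fixed number of input slots.
class ToyVTNodeReqs {
 public:
  explicit ToyVTNodeReqs(std::size_t req) : _in(req, nullptr) {}

  // Graph construction only: the slot must still be empty.
  void init_req(std::size_t i, ToyVTNodeReqs* n) {
    assert(i < _in.size());
    assert(_in[i] == nullptr && "init_req: input already set");
    _in[i] = n;
  }

  // Later graph modification: overwriting an existing input is allowed.
  void set_req(std::size_t i, ToyVTNodeReqs* n) {
    assert(i < _in.size());
    _in[i] = n;
  }

  ToyVTNodeReqs* in(std::size_t i) const { return _in[i]; }

 private:
  std::vector<ToyVTNodeReqs*> _in;
};
```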

Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:

 - Merge branch 'master' into JDK-8366361-vtn-init_req
 - JDK-8366361
 - For Christian: use phase->intcon instead
 - Update src/hotspot/share/opto/vtransform.hpp
   
   Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
 - JDK-8366357

-------------

Changes: https://git.openjdk.org/jdk/pull/26991/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26991&range=01
  Stats: 26 lines in 3 files changed: 0 ins; 0 del; 26 mod
  Patch: https://git.openjdk.org/jdk/pull/26991.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26991/head:pull/26991

PR: https://git.openjdk.org/jdk/pull/26991

From dskantz at openjdk.org  Mon Sep  1 07:11:25 2025
From: dskantz at openjdk.org (Daniel Skantz)
Date: Mon, 1 Sep 2025 07:11:25 GMT
Subject: RFR: 8362117: C2:
 compiler/stringopts/TestStackedConcatsAppendUncommonTrap.java fails with a
 wrong result due to invalidated liveness assumptions for data phis
Message-ID: <qvfOeUiGNqYakTHBhSYMZt1unv88E27bXXswOlWixOQ=.67376c9c-fe60-4263-82ca-4dd02dedad0c@github.com>

This PR addresses a wrong compilation during string optimizations.

During stacked string concatenation of two StringBuilder links SB1 and SB2, the pattern "append -> Phi -> Region -> (True, False) -> If -> Bool -> CmpP -> Proj (Result) -> toString" may be observed, where toString is the end of SB1, and the simple diamond is part of SB2.

After JDK-8291775, the Bool test of the diamond If is set to a constant zero to allow folding the simple diamond away during IGVN, while not letting the top() value from the result projection of SB1 propagate through the graph too quickly. The assumption was that any data Phi of the Region would go away during PhaseRemoveUseless because it is no longer live. I think that in the case of JDK-8291775, the user of the Phi was the constructor of SB2. However, in the attached test case, the Phi stays live because it is a parameter (input to an append) of SB2 and is used during the transformation in `copy_string`. When the diamond region is later folded, the Phi's user picks up the wrong input (the one corresponding to the false branch).

The proposed solution is to disable the stacked concatenation optimization for this specific pattern. This seems pragmatic since it's an edge case and there is already a bug tail: JDK-8271341 -> JDK-8291775 -> JDK-8362117.

Testing: T1-3 (aed5952).

Extra testing: ran T1-3 on Linux with an instrumented build and verified that the pattern I am excluding in this PR is not seen during any other compilation than that of the proposed regression test.

-------------

Commit messages:
 - ws
 - add an assert
 - revert to unfolded version of is_diamond
 - fix

Changes: https://git.openjdk.org/jdk/pull/27028/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27028&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8362117
  Stats: 85 lines in 2 files changed: 85 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/27028.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27028/head:pull/27028

PR: https://git.openjdk.org/jdk/pull/27028

From chagedorn at openjdk.org  Mon Sep  1 07:14:44 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 1 Sep 2025 07:14:44 GMT
Subject: RFR: 8366361: C2 SuperWord: rename VTransformNode::set_req ->
 init_req, analogue to Node::init_req [v2]
In-Reply-To: <H5muAU2r__jeioYyo3Ro_yrKuTx9MCi4dnPFag2W3tc=.d4f314fe-6e55-4dd8-a8ae-b02897ea72a7@github.com>
References: <zMcHnCVSmgkj53F43PilChJtIIKwqYbVgw-rEmUaB90=.b7ac6423-a979-4c61-957e-ff719208d68f@github.com>
 <H5muAU2r__jeioYyo3Ro_yrKuTx9MCi4dnPFag2W3tc=.d4f314fe-6e55-4dd8-a8ae-b02897ea72a7@github.com>
Message-ID: <-xtJXkBZ8TsKvj1zsyDeaAlXECBhIju5TZzfxc3iuYg=.dd473b6c-ff01-4fd5-90d7-701e0407f9bc@github.com>

On Mon, 1 Sep 2025 07:08:56 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
>> https://github.com/openjdk/jdk/pull/20964
>> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
>> 
>> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
>> 
>> The current implementation of `VTransformNode::set_req` has `init_req` semantics: it verifies that the corresponding input is still nullptr. We should thus rename it. This will also free up the "set_req" name for later use in VTransform optimizations, where we want to modify the graph.
>> 
>> See `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop` in the proof-of-concept PR.
>> 
>> FYI: this PR depends on https://github.com/openjdk/jdk/pull/26987. I'll rebase once that one is integrated. We can already start reviewing, so that the process is a little faster later on. (I have more small changes coming, but separating them makes them more reviewable.)
>
> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
> 
>  - Merge branch 'master' into JDK-8366361-vtn-init_req
>  - JDK-8366361
>  - For Christian: use phase->intcon instead
>  - Update src/hotspot/share/opto/vtransform.hpp
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - JDK-8366357

Marked as reviewed by chagedorn (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/26991#pullrequestreview-3172354981

From dfenacci at openjdk.org  Mon Sep  1 07:17:43 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Mon, 1 Sep 2025 07:17:43 GMT
Subject: RFR: 8355354: C2 crashed: assert(_callee == nullptr || _callee ==
 m) failed: repeated inline attempt with different callee [v3]
In-Reply-To: <X1YtSmVJpwSpxXHOjUCsAXE9RItwK5jltT0_-tEee9U=.52451930-1636-4a78-b9bb-0f6ba61482d1@github.com>
References: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com>
 <kkvx19EsqyWM3aNy0H5v5rT-vVQ1TfmFkEgUTT5rqnY=.4cabbe52-f307-4cd1-8af9-99da064ad040@github.com>
 <kaWzDzWIuKnBN-va_h7wRvVhMMJFlNWReF3iIQGBR4w=.6af07502-0fd5-4d4d-87b4-c68408e9d1b9@github.com>
 <X1YtSmVJpwSpxXHOjUCsAXE9RItwK5jltT0_-tEee9U=.52451930-1636-4a78-b9bb-0f6ba61482d1@github.com>
Message-ID: <fLy9SqBwRT-zlaYfRiKhq9dzYfnZuzDEbXWV8IOSfQE=.4ff74c67-4966-46ce-b51f-7cb721302ae6@github.com>

On Fri, 29 Aug 2025 16:50:26 GMT, Damon Fenacci <dfenacci at openjdk.org> wrote:

>> I second that. And it aligns with our effort to make CI queries report stable results.  
>> 
>> (FTR here's what I proposed to Damon privately: "Another alternative is to cache and reuse cg->callee_method() when it becomes non-null. And turn repeated CHA requests (Compile::optimize_inlining) into verification logic.")
>
>> I'm wondering if there might be other reasons that the callee might change, like JVMTI class redefinition
> 
> I guess there could be. For JVMTI we could possibly check for `Method::is_old` or `Method::is_obsolete`? But still, it might not be the only reason...
> 
>> so the easiest fix for class redefinition and CHA would be to ignore the new callee and keep the old one here.
> 
> I'm tempted to set the callee only if it is null and just remove the original assert, but @iwanowww suggested moving the assert to the `Ideal` function. I've just pushed a change that should do that.

> so the easiest fix for class redefinition and CHA would be to ignore the new callee and keep the old one here.

> Another alternative is to cache and reuse cg->callee_method() when it becomes non-null.

Actually, I changed my mind after looking at Vladimir's advice: this alternative (ignoring the new callee if it is already set) is cleaner and simpler. Thanks @iwanowww and @dean-long for the suggestion.
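
A minimal sketch of the "ignore the new callee if it is already set" approach, assuming the generator stores a single callee pointer. `ToyMethod`, `ToyLateInlineGenerator` and `record_callee` are hypothetical names and not HotSpot's `CallGenerator` interface. A later, different resolution (for example after class loading changed the CHA result) is reported through the return value but never replaces the recorded callee, which is one way to turn the repeated query into verification logic as suggested above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a resolved method.
struct ToyMethod { std::string name; };

// Hypothetical late-inline call generator that keeps the first callee it is
// given and treats later resolutions only as a consistency check.
class ToyLateInlineGenerator {
 public:
  // Returns true if `m` agrees with the recorded callee (or none was recorded yet).
  bool record_callee(const ToyMethod* m) {
    assert(m != nullptr);
    if (_callee == nullptr) {
      _callee = m;  // first resolution wins
      return true;
    }
    return _callee == m;  // a different answer is reported but ignored
  }

  const ToyMethod* callee() const { return _callee; }

 private:
  const ToyMethod* _callee = nullptr;
};
```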

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26441#discussion_r2313085032

From epeter at openjdk.org  Mon Sep  1 07:37:00 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 07:37:00 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
Message-ID: <nPTzMYzEUYJny3vO2sSKelMlFnsdxzKrKisedajsGlI=.d58e9b0e-dbfc-4cd1-8010-046621d48351@github.com>

On Fri, 29 Aug 2025 09:38:58 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> If a method has a large number of parameters, we currently bail out from C2 compilation.
>> 
>> ### Changeset
>> 
>> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
>> 
>> Changes:
>> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
>> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
>> - Remove all `can_represent` checks and bailouts.
>> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
>> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
>> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
>> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no...
>
> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 35 commits:
> 
>  - Restore modified java/lang/invoke tests
>  - Sort includes (new requirement)
>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>  - Add clarifying comments at definitions of register mask sizes
>  - Fix implicit zero and nullptr checks
>  - Add deep copy comment
>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>  - Fix typo
>  - Updates after Emanuel's comments
>  - Refactor and improve TestNestedSynchronize.java
>  - ... and 25 more: https://git.openjdk.org/jdk/compare/b39c7369...80c6cf47
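
The "static base plus dynamic extension" layout described in the first bullet of the changeset can be sketched as follows. `ToyRegMask`, its four-word base and its 64-bit words are assumptions for illustration, not the layout of C2's `RegMask`. The sketch shows the common case staying in the inline array, the heap extension being allocated only when a higher bit is inserted, and an overlap test that exits at the first shared word.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy register mask: a fixed inline "base" plus an optional heap extension
// that is allocated only when a bit beyond the base is needed.
class ToyRegMask {
 public:
  static constexpr std::size_t kBaseWords = 4;  // hypothetical base size
  static constexpr std::size_t kBitsPerWord = 64;

  void insert(std::size_t bit) {
    std::size_t w = bit / kBitsPerWord;
    if (w >= size_in_words()) grow(w + 1);
    word_ref(w) |= std::uint64_t{1} << (bit % kBitsPerWord);
  }

  bool member(std::size_t bit) const {
    std::size_t w = bit / kBitsPerWord;
    return w < size_in_words() && ((word(w) >> (bit % kBitsPerWord)) & 1u) != 0;
  }

  // Words past the shorter mask are implicitly zero, so only the common
  // prefix is compared; return as soon as one shared bit is found.
  bool overlap(const ToyRegMask& other) const {
    std::size_t n = std::min(size_in_words(), other.size_in_words());
    for (std::size_t w = 0; w < n; ++w) {
      if ((word(w) & other.word(w)) != 0) return true;  // early exit
    }
    return false;
  }

  std::size_t size_in_words() const { return kBaseWords + _ext.size(); }

 private:
  void grow(std::size_t words) {
    assert(words > kBaseWords);
    _ext.resize(words - kBaseWords, 0);  // rare path: heap allocation
  }
  std::uint64_t word(std::size_t w) const {
    return w < kBaseWords ? _base[w] : _ext[w - kBaseWords];
  }
  std::uint64_t& word_ref(std::size_t w) {
    return w < kBaseWords ? _base[w] : _ext[w - kBaseWords];
  }

  std::array<std::uint64_t, kBaseWords> _base{};
  std::vector<std::uint64_t> _ext;
};
```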

Nice, looks like the old test issues are now gone. Great to see that.

I was looking for tests that verify what your PR title promises: that we successfully compile methods with many arguments.

The test you have looks like a good start: `TestMaxMethodArguments.java`

Do you think it would make sense to have more tests? I'm imagining something like this:
- Generate tests with 0-255 arguments. You could use the template framework.
- Take different types (e.g. various primitive types, also those that take 2 stack slots like long and double). You could use the template library `PrimitiveType` if you want.
- Test that we actually get the method compiled. Maybe an IR rule could be used here?
- And do some rudimentary result verification
- Make sure it does not just work with `Xcomp` but also under "normal" circumstances (tiered, profiling, etc).
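
As a rough illustration of the slot arithmetic such generated tests would need to respect: per the JVM specification, a method's parameters may occupy at most 255 slots, with `long` and `double` taking two each (and `this` taking one for instance methods). The C++ sketch below emits the source of a hypothetical static Java test method whose parameter types cycle through the given list, capped at `max_params` and at the slot limit. `make_java_test_method` is an invented helper and does not use the template framework mentioned above; it assumes only numeric primitive types, since each parameter is cast to `double` for a crude result check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// long and double take two parameter slots; other primitive types take one.
int slots_for(const std::string& type) {
  return (type == "long" || type == "double") ? 2 : 1;
}

// Emits a hypothetical static Java method returning the sum of its parameters.
// Adds parameters until `max_params` is reached or the next one would exceed 255 slots.
std::string make_java_test_method(const std::vector<std::string>& types, int max_params,
                                  int* param_count) {
  assert(!types.empty());
  const int kMaxSlots = 255;  // static method, so no slot is used for `this`
  std::string params, body;
  int slots = 0;
  int count = 0;
  for (std::size_t i = 0;; ++i) {
    if (count == max_params) break;
    const std::string& t = types[i % types.size()];
    if (slots + slots_for(t) > kMaxSlots) break;
    slots += slots_for(t);
    if (count > 0) {
      params += ", ";
      body += " + ";
    }
    params += t + " p" + std::to_string(count);
    body += "(double) p" + std::to_string(count);
    ++count;
  }
  if (count == 0) body = "0.0";  // zero-argument case
  *param_count = count;
  return "static double test(" + params + ") { return " + body + "; }";
}
```

For example, all-`int` parameters stop at 255, all-`long` at 127, and alternating `int`/`long` at 170 (85 pairs of three slots each).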

I'll look a bit at your VM changes now ;)

test/hotspot/jtreg/compiler/arguments/TestMaxMethodArguments.java line 57:

> 55:         try {
> 56:             test(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255);
> 57:         } catch (TestException e) {

This seems to be the only test that actually tests what your PR title promises: it has a method with many arguments.

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-3172429642
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313120394

From fyang at openjdk.org  Mon Sep  1 07:40:43 2025
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 1 Sep 2025 07:40:43 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square)
In-Reply-To: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
Message-ID: <EsJhPSbVxJEkP9W7127uRJcJQMoZEc4JcGF0GdHEhBA=.c4568f6d-a70b-4cdc-a8f9-6e21ed157fd3@github.com>

On Tue, 26 Aug 2025 14:43:05 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

> Hey, please consider!
> 
> A bunch of info in JBS entry, please read that also.
> 
> I narrowed this issue down to the old jal optimization, making direct calls when in reach.
> This patch restores them and removes this regression.
> 
> In essence we turn "jalr ra,0(t1)" into a "jal ra,<dest>" if reachable, and restore the jalr if a new destination is not reachable.
> 
> Please test on your hardware!
> 
> 
> Chi Square (100 runs each, 10 fastest iterations of each run, P550)
> JDK-23 (last version with trampoline calls)
> Mean: 3189.5827
> Standard Deviation: 284.6478
> 
> JDK-25
> Mean: 3424.8905
> Standard Deviation: 222.2208
> 
> Patch:
> Mean: 3144.8535
> Standard Deviation: 229.2577
> 
> 
> No issues found in t1, running t2 also. Stress tested on vf2, bpi-f3, p550.
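
The jal/jalr decision described above hinges on reachability. A minimal sketch, assuming only the standard RISC-V encoding (JAL takes a signed, 2-byte-aligned PC-relative offset, reaching roughly 1 MiB in either direction); `is_jal_reachable`, `CallKind` and `choose_call_kind` are hypothetical helpers, not the functions in `nativeInst_riscv.cpp`.

```cpp
#include <cassert>
#include <cstdint>

// RISC-V JAL encodes a signed 21-bit, 2-byte-aligned PC-relative offset,
// so it reaches targets in [-1 MiB, +1 MiB - 2] around the jal itself.
bool is_jal_reachable(std::int64_t offset) {
  const std::int64_t kLimit = std::int64_t{1} << 20;  // 1 MiB
  return offset >= -kLimit && offset <= kLimit - 2 && (offset & 1) == 0;
}

enum class CallKind { Jal, Jalr };

// Hypothetical patching decision following the PR description: use a direct
// "jal ra,<dest>" when the destination is in reach, otherwise keep the
// indirect "jalr ra,0(t1)" through a register holding the address.
CallKind choose_call_kind(std::uint64_t call_pc, std::uint64_t dest) {
  assert((call_pc & 1) == 0);  // instructions are at least 2-byte aligned
  std::int64_t offset = static_cast<std::int64_t>(dest - call_pc);
  return is_jal_reachable(offset) ? CallKind::Jal : CallKind::Jalr;
}
```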

That looks fine to me. I don't have other concerns modulo two minor typos.
FYI: My local hs:tier1-hs:tier3 test with fastdebug build is good.

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 125:

> 123:   set_stub_address_destination_at(stub_addr, dest);
> 124: 
> 125:   // patches jalr -> jal/jal -> jalr depeding on dest

Suggestion: s/depeding/depending/

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 146:

> 144: 
> 145:     address dest = stub_address_destination_at(stub_addr);
> 146:     optimize_call(dest, false); // patches jalr -> jal/jal -> jalr depeding on dest

Suggestion: s/depeding/depending/

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26944#pullrequestreview-3172332378
PR Review Comment: https://git.openjdk.org/jdk/pull/26944#discussion_r2313065100
PR Review Comment: https://git.openjdk.org/jdk/pull/26944#discussion_r2313095316

From jbhateja at openjdk.org  Mon Sep  1 07:54:50 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 1 Sep 2025 07:54:50 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v3]
In-Reply-To: <F_GuF6fALkPyF5Iz-sSY4NCu6msyYBT1ZIJ64HAqHHc=.d1c0388c-3f79-4fb6-9800-43a7e74e5643@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <_Wv0Roo5xUHjswP_JUy6yzoU5KCwNpIoX3S2QBceUbE=.05b5bbbd-840b-4162-a454-94a9ddc2a69f@github.com>
 <F_GuF6fALkPyF5Iz-sSY4NCu6msyYBT1ZIJ64HAqHHc=.d1c0388c-3f79-4fb6-9800-43a7e74e5643@github.com>
Message-ID: <vctouxyF_DZPik3yL78XZ80uIy4XEWV208upb8X6abw=.7c21b163-bda9-49e8-9b1f-7fbc8f0fb4e0@github.com>

On Mon, 1 Sep 2025 07:50:57 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix input size enum values for AVX 10.2 conversion instructions that take memory as the source
>
> src/hotspot/cpu/x86/x86.ad line 7804:
> 
>> 7802:   predicate(VM_Version::supports_avx10_2() &&
>> 7803:             is_integral_type(Matcher::vector_element_basic_type(n)));
>> 7804:   match(Set dst (VectorCastD2X src));
> 
> I assume your intent here is to feed the memory operand to the vector cast IR. However, a memory operand is first loaded into a register using a LoadVector IR node, so a CISC/memory variant of the pattern should consume the Load IR so that the operand is directly exposed to the instruction. Check out https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L8986

Make a similar change in all the newly added memory patterns.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2313167705

From jbhateja at openjdk.org  Mon Sep  1 07:54:50 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 1 Sep 2025 07:54:50 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v3]
In-Reply-To: <_Wv0Roo5xUHjswP_JUy6yzoU5KCwNpIoX3S2QBceUbE=.05b5bbbd-840b-4162-a454-94a9ddc2a69f@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <_Wv0Roo5xUHjswP_JUy6yzoU5KCwNpIoX3S2QBceUbE=.05b5bbbd-840b-4162-a454-94a9ddc2a69f@github.com>
Message-ID: <F_GuF6fALkPyF5Iz-sSY4NCu6msyYBT1ZIJ64HAqHHc=.d1c0388c-3f79-4fb6-9800-43a7e74e5643@github.com>

On Fri, 29 Aug 2025 23:46:18 GMT, Mohamed Issa <missa at openjdk.org> wrote:

>> Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
>> 
>> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific instruction sequence depends on the source (single vs double precision), the destination (long, integer, short, or byte), the level of parallelism (scalar vs vector), and the supported AVX extension. In essence, though, the special values are mapped into the integer range (NaN -> 0; -Infinity and <= Integer.MIN_VALUE -> Integer.MIN_VALUE; +Infinity and >= Integer.MAX_VALUE -> Integer.MAX_VALUE) with the help of a few temporary registers that hold intermediate results.
>> 
>> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
>> 
>> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
>> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
>> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
>> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
>> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
>> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
>> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
>> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
>> 
>> [1] https://www.intel.com/content/www/us/en/content-details/856721/intel-adv...
>
> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix input size enum values for AVX 10.2 conversion instructions that take memory as the source
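To make the special-value mapping described in the PR concrete, here is a scalar C++ model of a saturating float/double to 32-bit integer conversion. It sketches the semantics only (the function name is hypothetical), not the x86 instruction sequence. For comparison, the legacy truncating conversions such as CVTTSS2SI return the "integer indefinite" value 0x80000000 for NaN and out-of-range inputs, and that is what the extra fix-up code has to correct. The mapping is the same one Java specifies for an `(int)` cast of a float or double.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical scalar model of the saturating conversion:
//   NaN                 -> 0
//   x <= INT_MIN (-Inf) -> INT_MIN
//   x >= INT_MAX (+Inf) -> INT_MAX
//   otherwise           -> truncate toward zero
template <typename FP>
int32_t saturating_to_int32(FP x) {
  constexpr int32_t kMin = std::numeric_limits<int32_t>::min();
  constexpr int32_t kMax = std::numeric_limits<int32_t>::max();
  if (std::isnan(x)) return 0;
  // 2^31 is exactly representable in both float and double.
  if (x >= static_cast<FP>(2147483648.0)) return kMax;
  if (x <= static_cast<FP>(-2147483648.0)) return kMin;
  // In range: the C++ cast truncates toward zero and is well defined here.
  return static_cast<int32_t>(x);
}
```

Vector forms would apply the same mapping lane by lane; the point of the AVX10.2 instructions is that the hardware produces these results directly, so no temporaries or fix-up sequence are needed.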

src/hotspot/cpu/x86/x86.ad line 7804:

> 7802:   predicate(VM_Version::supports_avx10_2() &&
> 7803:             is_integral_type(Matcher::vector_element_basic_type(n)));
> 7804:   match(Set dst (VectorCastD2X src));

I assume your intent here is to feed the memory operand to the vector cast IR. However, a memory operand is first loaded into a register using a LoadVector IR node, so a CISC/memory variant of the pattern should consume the Load IR so that the operand is directly exposed to the instruction. Check out https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L8986

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2313165676

From galder at openjdk.org  Mon Sep  1 08:19:44 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Mon, 1 Sep 2025 08:19:44 GMT
Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I,
 MoveI2F [v4]
In-Reply-To: <0VA9QnuPSb55PbioO1XWtSmrAC-sQet0hb_ldRgKdFQ=.95f56a0b-3b08-4654-8f1e-7217cd9bcabe@github.com>
References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com>
 <Fc_WY3YbTlom7mg-PcoOAqvQN2N9r96MtavcEzikKkM=.792fdc5f-f86b-4e05-b409-4917e65b7dd1@github.com>
 <aJP823LfE3KlVIQ9ehYx2IY1J6ldDCxaiTh3hRxdyss=.4d439f2a-ab28-4b95-beb7-8e1c1b48e990@github.com>
 <0VA9QnuPSb55PbioO1XWtSmrAC-sQet0hb_ldRgKdFQ=.95f56a0b-3b08-4654-8f1e-7217cd9bcabe@github.com>
Message-ID: <5xrZ-TcQ9OaMFIAMGIMTDCwGdexIMs0eJd6Li-T1aQc=.fc863cb9-0ce2-488f-a7d6-3aa211248798@github.com>

On Wed, 27 Aug 2025 09:56:29 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> Can you find out why we don't vectorize with AVX1 here?

This was a fun little rabbit hole. The explanation below is for `test6` but I think the same logic applies to `test9`:

The problem comes from the IR node definition, what JTreg does with it, and what the HotSpot code actually does.

The annotation definition is:

    @IR(counts = {IRNode.LOAD_VECTOR_F, "> 0",


So JTreg assumes that the regex should match a vector size of 8. With `UseAVX=1` and floats, `IRNode.getMaxElementsForTypeOnX86` returns 8 and so that's how the constraint is set:


         * Constraint 1: "(\d+(\s){2}(LoadVector.*)+(\s){2}===.*vector[A-Za-z]<F,8>)"


But the issue is that at runtime the vector size is 4:

  844  LoadVector  === ... #vectorx<F,4>


HotSpot logic is more nuanced, with the key being what happens in `SuperWord::unrolling_analysis`. The thing that JTreg doesn't know is that there are 2 types involved in the loop, float **and** int:


        for (int i = 0; i < a.length; i++) {
            a[i] = Float.floatToRawIntBits(b[i]);
        }


With `UseAVX=1`, the max vector size for floats is 8, but for ints it is 4. So the JVM picks the minimum value and uses that. That is why unrolling uses 4, and this carries all the way through to the load vector size, which is also 4.
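A simplified model of the point above: when a loop mixes element types, the usable vector length is the minimum over all of them. The per-type limits below are an assumption for `UseAVX=1` on x86 (32-byte vectors for float/double, 16-byte vectors for integral types, since 256-bit integer arithmetic only arrived with AVX2), and the function names are hypothetical rather than HotSpot's `SuperWord::unrolling_analysis` or `Matcher` API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <initializer_list>

// Hypothetical element types that can appear in a loop body.
enum class ElemType { Int, Float, Long, Double };

size_t elem_bytes(ElemType t) {
  return (t == ElemType::Int || t == ElemType::Float) ? 4 : 8;
}

// Assumed UseAVX=1 limits: 32-byte vectors for FP types,
// 16-byte vectors for integral types.
size_t max_vector_bytes_avx1(ElemType t) {
  return (t == ElemType::Float || t == ElemType::Double) ? 32 : 16;
}

// The loop's vector length is bounded by its most restrictive element type.
size_t loop_vector_length(std::initializer_list<ElemType> types) {
  size_t vlen = SIZE_MAX;
  for (ElemType t : types) {
    vlen = std::min(vlen, max_vector_bytes_avx1(t) / elem_bytes(t));
  }
  return vlen;
}
```

For the `floatToRawIntBits` loop, `loop_vector_length({ElemType::Float, ElemType::Int})` yields 4, whereas an IR rule that only looks at the float type expects 8.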

IMO the right thing to do would be to fix the annotation to be:


    @IR(counts = {IRNode.LOAD_VECTOR_F, IRNode.VECTOR_SIZE_4, "> 0",


And explain in the Javadoc why the expected size is 4.

The same applies to `test9`.

WDYT @eme64?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3241348514

From epeter at openjdk.org  Mon Sep  1 08:37:07 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 08:37:07 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
Message-ID: <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>

On Fri, 29 Aug 2025 09:38:58 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> If a method has a large number of parameters, we currently bail out from C2 compilation.
>> 
>> ### Changeset
>> 
>> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
>> 
>> Changes:
>> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
>> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
>> - Remove all `can_represent` checks and bailouts.
>> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
>> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
>> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
>> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no...
>
> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 35 commits:
> 
>  - Restore modified java/lang/invoke tests
>  - Sort includes (new requirement)
>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>  - Add clarifying comments at definitions of register mask sizes
>  - Fix implicit zero and nullptr checks
>  - Add deep copy comment
>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>  - Fix typo
>  - Updates after Emanuel's comments
>  - Refactor and improve TestNestedSynchronize.java
>  - ... and 25 more: https://git.openjdk.org/jdk/compare/b39c7369...80c6cf47
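The design described in the PR (a statically allocated base that covers the common case, an arena-allocated extension used only when a method needs more stack slots, and an early-exit `overlap`) can be sketched as a small C++ class. This is a hypothetical illustration, not HotSpot's `RegMask`: the class name, the 4-word base, 64-bit words, and the use of `std::vector` instead of an arena are all assumptions.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration (not HotSpot's RegMask): a fixed inline base
// for the common case plus an extension allocated only when needed.
class ExtensibleMask {
 public:
  static constexpr size_t kBaseWords = 4;    // assumed base size
  static constexpr size_t kBitsPerWord = 64;

  void set(size_t bit) {
    size_t w = bit / kBitsPerWord;
    if (w < kBaseWords) {
      _base[w] |= word_bit(bit);
      return;
    }
    size_t ext = w - kBaseWords;
    if (ext >= _ext.size()) _ext.resize(ext + 1, 0);  // grow on demand
    _ext[ext] |= word_bit(bit);
  }

  bool member(size_t bit) const {
    return (word(bit / kBitsPerWord) & word_bit(bit)) != 0;
  }

  // Early-exit overlap test: stop at the first word the two masks share.
  bool overlap(const ExtensibleMask& other) const {
    size_t n = std::max(num_words(), other.num_words());
    for (size_t w = 0; w < n; w++) {
      if ((word(w) & other.word(w)) != 0) return true;
    }
    return false;
  }

  bool is_extended() const { return !_ext.empty(); }

 private:
  static uint64_t word_bit(size_t bit) {
    return uint64_t{1} << (bit % kBitsPerWord);
  }
  size_t num_words() const { return kBaseWords + _ext.size(); }
  uint64_t word(size_t w) const {
    if (w < kBaseWords) return _base[w];
    size_t ext = w - kBaseWords;
    return ext < _ext.size() ? _ext[ext] : 0;  // missing words read as empty
  }

  std::array<uint64_t, kBaseWords> _base{};  // statically allocated part
  std::vector<uint64_t> _ext;  // stand-in for the arena-allocated extension
};
```

The common case never touches `_ext`, which mirrors the stated goal of keeping masks for methods with a normal number of parameters as cheap as before.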

I thought I'd dive straight back into `regmask.hpp`. I'm remembering some of what we discussed, but I'll need your help to fill in the picture ;)

I wonder if we could do some renamings in a prior PR, just to make this a little easier to review.

src/hotspot/share/opto/regmask.hpp line 44:

> 42: // statements in Java.
> 43: const int BoxLockNode_SLOT_LIMIT = 200;
> 44: 

Even before this constant, it would be nice to have an introductory comment, that lays out what the regmask is for, and what its basic design is.

src/hotspot/share/opto/regmask.hpp line 63:

> 61: // RM_SIZE is the base size of a register mask in 32-bit words.
> 62: // RM_SIZE_MIN is the theoretical minimum size of a register mask in 32-bit
> 63: // words.

It seems this is a bad pattern that was already here before you. But it really makes me a little scared here.

Having two variable names that differ only by an underscore `_` but have different semantics is a bit confusing to me. It is hard for the reader to keep track of which is which. It would be really easy for someone to confuse the two in the future and let bugs creep in that way (just because of an underscore). It may be more useful to put the units in at least one of the two names.

I would love to see names like `RM_SIZE` and `RM_SIZE_IN_LONGS`, rather than `RM_SIZE` and `_RM_SIZE`.
Even better would be `RM_SIZE_IN_INTS` and `RM_SIZE_IN_LONGS`. That way, you could save a lot of comments. Maybe you could come up with even better names. "slots" and "words"?
You could consider doing a renaming PR first before the patch here. Maybe you can even automate the renaming with a command/script, and then apply the same renaming to the changes here?

src/hotspot/share/opto/regmask.hpp line 96:

> 94:       (((RM_SIZE_MIN << 5) +                // Slots for machine registers
> 95:         (max_method_parameter_length * 2) + // Slots for incoming arguments
> 96:         (max_method_parameter_length * 2) + // Slots for outgoing arguments

What's the meaning of incoming vs outgoing arguments? Like this?

Incoming = from caller (outer nesting)
Outgoing = to nested call (inner nesting)

src/hotspot/share/opto/regmask.hpp line 122:

> 120: 
> 121:     // Viewed as an array of machine words
> 122:     uintptr_t _RM_UP[_RM_SIZE];

Do you know what `UP` stands for? Could we rename it maybe?
It would be nice if we could use the same "units" for these arrays as for the sizes above.

src/hotspot/share/opto/regmask.hpp line 128:

> 126:   // extend the register mask with dynamically allocated memory. We keep the
> 127:   // base statically allocated _RM_UP, and arena allocate the extended mask
> 128:   // (RM_UP_EXT) separately. Another, perhaps more elegant, option would be to

Suggestion:

  // (_RM_UP_EXT) separately. Another, perhaps more elegant, option would be to

Underscore for consistency? Or does it reference something else?

src/hotspot/share/opto/regmask.hpp line 161:

> 159:   // cases, we can allow read-only sharing.
> 160:   bool _read_only = false;
> 161: #endif

Can you explain why this happens? Is this something we could clean up? It smells a bit like tech debt. But maybe it is a genuinely necessary performance optimization. It would be nice to have an explanation of which one it is ;)

src/hotspot/share/opto/regmask.hpp line 170:

> 168:   // variable indicates how many words we offset with. We consider all
> 169:   // registers before the offset to not be included in the register mask.
> 170:   unsigned int _offset;

Does that mean we make different slices of the mask?

src/hotspot/share/opto/regmask.hpp line 175:

> 173:   // mask can currently represent to be included. If _all_stack = false, we
> 174:   // consider the registers not included.
> 175:   bool _all_stack = false;

I'd prefer some kind of `_is_...` name here. When I read `all_stack` and see that it is a bool, I wonder what it means; the name does not tell me quickly. Does it mean that all registers are on the stack?

Is everything that is beyond the register mask purely on the stack? Is everything from the stack always beyond the register mask? I'm confused :face_with_peeking_eye:

src/hotspot/share/opto/regmask.hpp line 179:

> 177:   // The low and high watermarks represent the lowest and highest word that
> 178:   // might contain set register mask bits, respectively. We guarantee that
> 179:   // there are no bits in words outside this range, but any word at and between

In the example below, you have 1 bits above the `_hwm`. Is that intentional? Are those bits to be ignored? Can you please add some extra info to the example about that?
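One way to read the watermark guarantee quoted above, as a hypothetical C++ model (the struct and its methods are illustrative, not `RegMask` members): words outside `[lwm, hwm]` must be zero, while words inside the range may or may not contain set bits, so queries only need to scan that range.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the low/high watermark invariant.
struct WatermarkedMask {
  std::vector<uint64_t> words;
  size_t lwm;  // lowest word that might contain set bits
  size_t hwm;  // highest word that might contain set bits

  // The invariant a debug build could assert: no bits outside [lwm, hwm].
  bool invariant_holds() const {
    for (size_t w = 0; w < words.size(); w++) {
      if ((w < lwm || w > hwm) && words[w] != 0) return false;
    }
    return true;
  }

  // Queries only scan the watermark range; words inside it may still be 0.
  bool is_empty() const {
    for (size_t w = lwm; w <= hwm && w < words.size(); w++) {
      if (words[w] != 0) return false;
    }
    return true;
  }
};
```

Under this reading, a set bit above `_hwm` would violate the invariant, which is why the example in the header comment may need extra explanation.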

src/hotspot/share/opto/regmask.hpp line 217:

> 215:   // necessarily representing stack locations) to 1. Here is how the above
> 216:   // register mask looks like after clearing, setting _all_stack to true, and
> 217:   // successfully rolling over:

I'm still struggling to follow here. Maybe `_offset` is not clear to me yet. What is the value here for it? How is it changed with the `rollover`?

src/hotspot/share/opto/regmask.hpp line 230:

> 228:   //          \_______________________________________________________________________________/
> 229:   //                                                  |
> 230:   //                                              _rm_size

Ah, I remember this now. Really helpful. Maybe we could link to this layout explanation from the comment at the very top of the file?

-------------

PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-3172500942
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313199061
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313162130
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313223912
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313184547
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313195111
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313207232
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313263478
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313219662
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313253475
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313264670
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313256455

From epeter at openjdk.org  Mon Sep  1 08:37:07 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 08:37:07 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
Message-ID: <NDDSmCvsbgpWgTU_bCIhNdo8foNn447LmTJ4HsCTv-s=.e0549027-7ac1-4794-bfce-322d3870f9d1@github.com>

On Mon, 1 Sep 2025 07:49:26 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 35 commits:
>> 
>>  - Restore modified java/lang/invoke tests
>>  - Sort includes (new requirement)
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Add clarifying comments at definitions of register mask sizes
>>  - Fix implicit zero and nullptr checks
>>  - Add deep copy comment
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Fix typo
>>  - Updates after Emanuel's comments
>>  - Refactor and improve TestNestedSynchronize.java
>>  - ... and 25 more: https://git.openjdk.org/jdk/compare/b39c7369...80c6cf47
>
> src/hotspot/share/opto/regmask.hpp line 63:
> 
>> 61: // RM_SIZE is the base size of a register mask in 32-bit words.
>> 62: // RM_SIZE_MIN is the theoretical minimum size of a register mask in 32-bit
>> 63: // words.
> 
> It seems this is a bad pattern that was already here before you. But it really makes me a little scared here.
> 
> Having two variable names that differ only by an underscore `_` but have different semantics is a bit confusing to me. It is hard for the reader to keep track of which is which. It would be really easy for someone to confuse the two in the future and let bugs creep in that way (just because of an underscore). It may be more useful to put the units in at least one of the two names.
> 
> I would love to see names like `RM_SIZE` and `RM_SIZE_IN_LONGS`, rather than `RM_SIZE` and `_RM_SIZE`.
> Even better would be `RM_SIZE_IN_INTS` and `RM_SIZE_IN_LONGS`. That way, you could save a lot of comments. Maybe you could come up with even better names. "slots" and "words"?
> You could consider doing a renaming PR first before the patch here. Maybe you can even automate the renaming with a command/script, and then apply the same renaming to the changes here?

Oh gosh, I just realized: the machine word size of course depends on a 32-bit vs 64-bit architecture. Yikes.
So maybe the names need to be stack slots vs words? And there should probably be a quick reminder somewhere that words can have different sizes.
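A small sketch of why the unit matters in these names (the helper is hypothetical): the number of words needed for a given number of one-bit slots depends on the word width. The `<< 5` in the size computation quoted in the earlier comment, for instance, multiplies a count of 32-bit words by 32 to obtain slots.

```cpp
#include <cstddef>

// Hypothetical helper: machine words needed to hold `slots` one-bit
// register/stack slots, for a word width of `bits_per_word` bits.
constexpr size_t words_for_slots(size_t slots, size_t bits_per_word) {
  return (slots + bits_per_word - 1) / bits_per_word;  // round up
}

// The same slot count maps to different word counts on 32- and 64-bit VMs.
static_assert(words_for_slots(64, 32) == 2, "two 32-bit words");
static_assert(words_for_slots(64, 64) == 1, "one 64-bit word");
static_assert(words_for_slots(65, 64) == 2, "rounds up");
```

A name that carries the unit ("slots" vs "words") makes it harder to mix up these two quantities.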

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313237509

From epeter at openjdk.org  Mon Sep  1 08:43:44 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 08:43:44 GMT
Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I,
 MoveI2F [v4]
In-Reply-To: <5xrZ-TcQ9OaMFIAMGIMTDCwGdexIMs0eJd6Li-T1aQc=.fc863cb9-0ce2-488f-a7d6-3aa211248798@github.com>
References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com>
 <Fc_WY3YbTlom7mg-PcoOAqvQN2N9r96MtavcEzikKkM=.792fdc5f-f86b-4e05-b409-4917e65b7dd1@github.com>
 <aJP823LfE3KlVIQ9ehYx2IY1J6ldDCxaiTh3hRxdyss=.4d439f2a-ab28-4b95-beb7-8e1c1b48e990@github.com>
 <0VA9QnuPSb55PbioO1XWtSmrAC-sQet0hb_ldRgKdFQ=.95f56a0b-3b08-4654-8f1e-7217cd9bcabe@github.com>
 <5xrZ-TcQ9OaMFIAMGIMTDCwGdexIMs0eJd6Li-T1aQc=.fc863cb9-0ce2-488f-a7d6-3aa211248798@github.com>
Message-ID: <jnNXQxAU4g9DOjrutncOGDvAJ8vwhESxMNmsm3a4lxI=.2c2121d5-88fb-4f14-b670-2135e4f91ddb@github.com>

On Mon, 1 Sep 2025 08:17:08 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> @galderz I got a failure in our testing:
>> 
>> With VM flag: `-XX:UseAVX=1`.
>> 
>> 
>> Failed IR Rules (2) of Methods (2)
>> ----------------------------------
>> 1) Method "static java.lang.Object[] compiler.loopopts.superword.TestCompatibleUseDefTypeSize.test6(int[],float[])" - [Failed IR rules: 1]:
>>    * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sse4.1", "true", "asimd", "true", "rvv", "true"}, counts={"_#V#LOAD_VECTOR_F#_", "> 0", "_#STORE_VECTOR#_", "> 0", "_#VECTOR_REINTERPRET#_", "> 0"}, applyIfPlatformOr={}, applyIfPlatform={"64-bit", "true"}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
>>      > Phase "PrintIdeal":
>>        - counts: Graph contains wrong number of nodes:
>>          * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]<F,8>)"
>>            - Failed comparison: [found] 0 > 0 [given]
>>            - No nodes matched!
>> 
>> 2) Method "static java.lang.Object[] compiler.loopopts.superword.TestCompatibleUseDefTypeSize.test9(long[],double[])" - [Failed IR rules: 1]:
>>    * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sse4.1", "true", "asimd", "true", "rvv", "true"}, counts={"_#V#LOAD_VECTOR_D#_", "> 0", "_#STORE_VECTOR#_", "> 0", "_#VECTOR_REINTERPRET#_", "> 0"}, applyIfPlatformOr={}, applyIfPlatform={"64-bit", "true"}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
>>      > Phase "PrintIdeal":
>>        - counts: Graph contains wrong number of nodes:
>>          * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]<D,4>)"
>>            - Failed comparison: [found] 0 > 0 [given]
>>            - No nodes matched!
>> 
>> 
>> I suspect that `test6` with `floatToRawIntBits` and `test9` with `doubleToRawLongBits` are only supported with `AVX2`. The question is whether that is really intended, or whether we should even file an RFE to extend support to `AVX1` and lower.
>> 
>> Can you find out why we don't vectorize with `AVX1` here?
>
>> Can you find out why we don't vectorize with AVX1 here?
> 
> This was a fun little rabbit hole. The explanation below is for `test6` but I think the same logic applies to `test9`:
> 
> The problem comes from the IR node definition, what JTreg does with it, and what the HotSpot code actually does.
> 
> The annotation definition is:
> 
>     @IR(counts = {IRNode.LOAD_VECTOR_F, "> 0",
> 
> 
> So JTreg assumes that the regex should match a vector size of 8. With `UseAVX=1` and floats, `IRNode.getMaxElementsForTypeOnX86` returns 8 and so that's how the constraint is set:
> 
> 
>          * Constraint 1: "(\d+(\s){2}(LoadVector.*)+(\s){2}===.*vector[A-Za-z]<F,8>)"
> 
> 
> But the issue is that at runtime the vector size is 4:
> 
>   844  LoadVector  === ... #vectorx<F,4>
> 
> 
> HotSpot logic is more nuanced, with the key being what happens in `SuperWord::unrolling_analysis`. The thing that JTreg doesn't know is that there are 2 types involved in the loop, float **and** int:
> 
> 
>         for (int i = 0; i < a.length; i++) {
>             a[i] = Float.floatToRawIntBits(b[i]);
>         }
> 
> 
> With `UseAVX=1`, the max vector size for floats is 8, but for ints it is 4. So the JVM picks the minimum value and uses that. That is why unrolling uses 4, and this carries all the way through to the load vector size, which is also 4.
> 
> IMO the right thing to do would be to fix the annotation to be:
> 
> 
>     @IR(counts = {IRNode.LOAD_VECTOR_F, IRNode.VECTOR_SIZE_4, "> 0",
> 
> 
> And explain in the Javadoc why the expected size is 4.
> 
> The same applies to `test9`.
> 
> WDYT @eme64?

@galderz Ah, maybe we just need to do it like here then:
`test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java:192:50:        counts = {IRNode.VECTOR_CAST_I2F, IRNode.VECTOR_SIZE + "min(max_int, max_float)", ">0"})`

When doing cast/reinterpret/move between types this always happens ;)

I think this should generalize over all platforms.

Does that work?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3241438142

From epeter at openjdk.org  Mon Sep  1 08:47:51 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 08:47:51 GMT
Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I,
 MoveI2F [v5]
In-Reply-To: <mY73YUx3M0wpCEuRskC7KYzVGDbnKRKHOE3SHtsAahs=.179d5d83-9485-4dc6-8dfc-8155cec042b7@github.com>
References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com>
 <mY73YUx3M0wpCEuRskC7KYzVGDbnKRKHOE3SHtsAahs=.179d5d83-9485-4dc6-8dfc-8155cec042b7@github.com>
Message-ID: <8_JXUPiLQNWEmDTbAnwB1jdYu6mTE3_NbETZkQabPwU=.78227d3e-8312-47da-bb2b-0a84017fc724@github.com>

On Mon, 25 Aug 2025 07:13:43 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations.
>> 
>> Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes. On darwin/aarch64 it shows:
>> 
>> 
>> Benchmark                                (seed)  (size)   Mode  Cnt      Base      Patch   Units   Diff
>> VectorBitConversion.doubleToLongBits          0    2048  thrpt    8  1168.782   1157.717  ops/ms    -1%
>> VectorBitConversion.doubleToRawLongBits       0    2048  thrpt    8  3999.387   7353.936  ops/ms   +83%
>> VectorBitConversion.floatToIntBits            0    2048  thrpt    8  1200.338   1188.206  ops/ms    -1%
>> VectorBitConversion.floatToRawIntBits         0    2048  thrpt    8  4058.248  14792.474  ops/ms  +264%
>> VectorBitConversion.intBitsToFloat            0    2048  thrpt    8  3050.313  14984.246  ops/ms  +391%
>> VectorBitConversion.longBitsToDouble          0    2048  thrpt    8  3022.691   7379.360  ops/ms  +144%
>> 
>> 
>> The improvements observed are a result of vectorization. The lack of vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that these changes do not affect their performance. These methods do not vectorize because of flow control.
>> 
>> I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions.
>
> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision:
> 
>  - Merge branch 'master' into topic.fp-bits-vector
>  - Add more IR node positive assertions
>  - Fix source of data for benchmarks
>  - Refactor benchmarks to TypeVectorOperations
>  - Check at the very least that auto vectorization is supported
>  - Avoid VectorReinterpret::implemented
>  - Refactor and add copyright header
>  - Rephrase comment
>  - Removed unnecessary assert methods
>  - Adjust IR test after adding Move* vector support
>  - ... and 12 more: https://git.openjdk.org/jdk/compare/fc6e0b6f...e7e4d801

test/hotspot/jtreg/compiler/loopopts/superword/TestCompatibleUseDefTypeSize.java line 460:

> 458:     @IR(counts = {IRNode.LOAD_VECTOR_L, "> 0",
> 459:                   IRNode.STORE_VECTOR, "> 0",
> 460:                   IRNode.VECTOR_REINTERPRET, "> 0"},

Ah, I just saw that `VECTOR_REINTERPRET` is not a `vectorNode`, so we don't check its size. Would it have a type and size, though?

If so, we could consider making it more precise, like all the vector casts.
It would be a little bit of work, but it would make the rules more precise.
It could also be a separate RFE.


    public static final String VECTOR_REINTERPRET = PREFIX + "VECTOR_REINTERPRET" + POSTFIX;
    static {
        beforeMatchingNameRegex(VECTOR_REINTERPRET, "VectorReinterpret");
    }

    public static final String VECTOR_UCAST_B2S = VECTOR_PREFIX + "VECTOR_UCAST_B2S" + POSTFIX;
    static {
        vectorNode(VECTOR_UCAST_B2S, "VectorUCastB2X", TYPE_SHORT);
    }


Depending on the dump, it may not be so easy though. Not sure.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2313298675

From epeter at openjdk.org  Mon Sep  1 08:50:58 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 08:50:58 GMT
Subject: RFR: 8366361: C2 SuperWord: rename VTransformNode::set_req ->
 init_req, analogue to Node::init_req [v2]
In-Reply-To: <-xtJXkBZ8TsKvj1zsyDeaAlXECBhIju5TZzfxc3iuYg=.dd473b6c-ff01-4fd5-90d7-701e0407f9bc@github.com>
References: <zMcHnCVSmgkj53F43PilChJtIIKwqYbVgw-rEmUaB90=.b7ac6423-a979-4c61-957e-ff719208d68f@github.com>
 <H5muAU2r__jeioYyo3Ro_yrKuTx9MCi4dnPFag2W3tc=.d4f314fe-6e55-4dd8-a8ae-b02897ea72a7@github.com>
 <-xtJXkBZ8TsKvj1zsyDeaAlXECBhIju5TZzfxc3iuYg=.dd473b6c-ff01-4fd5-90d7-701e0407f9bc@github.com>
Message-ID: <8zauGgBGFELEkml3ODhhsoVJDGJUyKNhg3cQbxF60RU=.820949e0-a455-4a5d-8c35-63af12a24e97@github.com>

On Mon, 1 Sep 2025 07:12:08 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
>> 
>>  - Merge branch 'master' into JDK-8366361-vtn-init_req
>>  - JDK-8366361
>>  - For Christian: use phase->intcon instead
>>  - Update src/hotspot/share/opto/vtransform.hpp
>>    
>>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>>  - JDK-8366357
>
> Marked as reviewed by chagedorn (Reviewer).

@chhagedorn @vnkozlov Thanks for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26991#issuecomment-3241456255

From epeter at openjdk.org  Mon Sep  1 08:51:00 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 08:51:00 GMT
Subject: Integrated: 8366361: C2 SuperWord: rename VTransformNode::set_req ->
 init_req, analogue to Node::init_req
In-Reply-To: <zMcHnCVSmgkj53F43PilChJtIIKwqYbVgw-rEmUaB90=.b7ac6423-a979-4c61-957e-ff719208d68f@github.com>
References: <zMcHnCVSmgkj53F43PilChJtIIKwqYbVgw-rEmUaB90=.b7ac6423-a979-4c61-957e-ff719208d68f@github.com>
Message-ID: <AiLyNkP3BA4e6hg5-0sEK-FIId41df18xdjgE245gXU=.0302d066-5504-4a89-8725-fd6f3026f5fb@github.com>

On Thu, 28 Aug 2025 15:30:31 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
> https://github.com/openjdk/jdk/pull/20964
> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
> 
> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
> 
> The current implementation of `VTransformNode::set_req` has `init_req` semantics: it verifies that the corresponding input is still nullptr. We should thus rename it. This also frees up the "set_req" name for later use in VTransform optimizations, where we want to modify the graph.
> 
> See `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop` in the proof-of-concept PR.
> 
> FYI: this PR is dependent on https://github.com/openjdk/jdk/pull/26987. I'll rebase once that one is integrated. We can still already review, so that the process is a little faster later on. (I have more small changes coming, but separating makes them more reviewable.)
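To illustrate the two semantics being separated here, the following is a minimal Java sketch. `SketchNode`, `initReq` and `setReq` are hypothetical names, not HotSpot's C++ `VTransformNode` API; the only behavior taken from the description is that `init_req` requires the input slot to still be empty, while a `set_req` used by graph-modifying optimizations would overwrite an existing input.

```java
// Hypothetical stand-in for a graph node with a fixed number of input slots.
// Not HotSpot code: it only models init_req vs. set_req semantics.
final class SketchNode {
    private final SketchNode[] in;

    SketchNode(int inputCount) {
        this.in = new SketchNode[inputCount];
    }

    // init_req semantics: the slot must still be empty (null);
    // this sketch throws if it is not.
    void initReq(int i, SketchNode n) {
        if (in[i] != null) {
            throw new IllegalStateException("input " + i + " is already set");
        }
        in[i] = n;
    }

    // set_req semantics: replace whatever input is currently there,
    // as a graph-modifying optimization would need to.
    void setReq(int i, SketchNode n) {
        in[i] = n;
    }

    SketchNode in(int i) {
        return in[i];
    }

    public static void main(String[] args) {
        SketchNode a = new SketchNode(0);
        SketchNode b = new SketchNode(0);
        SketchNode user = new SketchNode(1);

        user.initReq(0, a);            // ok: slot 0 was still empty
        try {
            user.initReq(0, b);        // rejected: slot 0 is already set
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        user.setReq(0, b);             // ok: set_req may overwrite an input
        System.out.println(user.in(0) == b);
    }
}
```

Keeping the two names distinct makes the intent visible at each call site: graph construction can rely on the "slot is empty" check, while later rewiring goes through the unchecked setter.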

This pull request has now been integrated.

Changeset: 56713817
Author:    Emanuel Peter <epeter at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/56713817c0fd060f7106a538b0e795081f4f9d4b
Stats:     26 lines in 3 files changed: 0 ins; 0 del; 26 mod

8366361: C2 SuperWord: rename VTransformNode::set_req -> init_req, analogue to Node::init_req

Reviewed-by: kvn, chagedorn

-------------

PR: https://git.openjdk.org/jdk/pull/26991

From galder at openjdk.org  Mon Sep  1 08:51:51 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Mon, 1 Sep 2025 08:51:51 GMT
Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I,
 MoveI2F [v5]
In-Reply-To: <mY73YUx3M0wpCEuRskC7KYzVGDbnKRKHOE3SHtsAahs=.179d5d83-9485-4dc6-8dfc-8155cec042b7@github.com>
References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com>
 <mY73YUx3M0wpCEuRskC7KYzVGDbnKRKHOE3SHtsAahs=.179d5d83-9485-4dc6-8dfc-8155cec042b7@github.com>
Message-ID: <d_2I4eVtuv6xKSbIPzsQfWQ48moorLPed1byZxABrT8=.60772db9-cc21-4639-9772-84bb983e4e90@github.com>

On Mon, 25 Aug 2025 07:13:43 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations.
>> 
>> Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes. On darwin/aarch64 it shows:
>> 
>> 
>> Benchmark                                (seed)  (size)   Mode  Cnt      Base      Patch   Units   Diff
>> VectorBitConversion.doubleToLongBits          0    2048  thrpt    8  1168.782   1157.717  ops/ms    -1%
>> VectorBitConversion.doubleToRawLongBits       0    2048  thrpt    8  3999.387   7353.936  ops/ms   +83%
>> VectorBitConversion.floatToIntBits            0    2048  thrpt    8  1200.338   1188.206  ops/ms    -1%
>> VectorBitConversion.floatToRawIntBits         0    2048  thrpt    8  4058.248  14792.474  ops/ms  +264%
>> VectorBitConversion.intBitsToFloat            0    2048  thrpt    8  3050.313  14984.246  ops/ms  +391%
>> VectorBitConversion.longBitsToDouble          0    2048  thrpt    8  3022.691   7379.360  ops/ms  +144%
>> 
>> 
>> The improvements observed are a result of vectorization. `doubleToLongBits` and `floatToIntBits` do not vectorize because of control flow, and their unchanged results show that these changes do not affect their performance.
>> 
>> I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions.
>
> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision:
> 
>  - Merge branch 'master' into topic.fp-bits-vector
>  - Add more IR node positive assertions
>  - Fix source of data for benchmarks
>  - Refactor benchmarks to TypeVectorOperations
>  - Check at the very least that auto vectorization is supported
>  - Avoid VectorReinterpret::implemented
>  - Refactor and add copyright header
>  - Rephrase comment
>  - Removed unnecessary assert methods
>  - Adjust IR test after adding Move* vector support
>  - ... and 12 more: https://git.openjdk.org/jdk/compare/54d7c4b3...e7e4d801

One correction about my suggested fix above:

This one would work for `UseAVX=1` but would fail with other `UseAVX` values.

    @IR(counts = {IRNode.LOAD_VECTOR_F, IRNode.VECTOR_SIZE_4, "> 0",


It would need to be something like this to work in all cases:


    @IR(counts = {IRNode.LOAD_VECTOR_F, IRNode.VECTOR_SIZE_ANY, "> 0",

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3241465858

From rehn at openjdk.org  Mon Sep  1 08:56:25 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Mon, 1 Sep 2025 08:56:25 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v2]
In-Reply-To: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
Message-ID: <BymtYSgSlXggycd57z3AGcv-BFeaLsawQuSXWo_93lc=.6dc39f16-8cc8-4956-8214-ba7bc9b648a3@github.com>

> Hey, please consider!
> 
> A bunch of info in JBS entry, please read that also.
> 
> I narrowed this issue down to the loss of the old jal optimization, which made direct calls when the destination was in reach.
> This patch restores those direct calls, which removes the regression.
> 
> In essence, we turn "jalr ra,0(t1)" into "jal ra,<dest>" if the destination is reachable, and restore the jalr if a new destination is not reachable.
> 
> Please test on your hardware!
> 
> 
> Chi Square (100 runs each, 10 fastest iterations of each run, P550)
> JDK-23 (last version with trampoline calls)
> Mean: 3189.5827
> Standard Deviation: 284.6478
> 
> JDK-25
> Mean: 3424.8905
> Standard Deviation: 222.2208
> 
> Patch:
> Mean: 3144.8535
> Standard Deviation: 229.2577
> 
> 
> No issues found in tier1; running tier2 as well. Stress tested on vf2, bpi-f3 and p550.
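Putting the means above side by side gives the following relative changes. This assumes, as "10 fastest iterations" suggests, that the numbers are times where lower is better; the units are not stated.

```latex
\begin{aligned}
\text{JDK-25 vs. JDK-23:} &\quad \tfrac{3424.8905}{3189.5827} - 1 \approx +7.4\% \\
\text{patch vs. JDK-25:}  &\quad \tfrac{3144.8535}{3424.8905} - 1 \approx -8.2\% \\
\text{patch vs. JDK-23:}  &\quad \tfrac{3144.8535}{3189.5827} - 1 \approx -1.4\%
\end{aligned}
```

The last difference (about 45) is small compared with the reported standard deviations (roughly 220 to 285), so the patch is best read as bringing performance back to the JDK-23 level rather than beyond it.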

Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:

 - Merge branch 'master' into 8365926
 - Spelling
 - Merge branch 'master' into 8365926
 - draft jal<->jalr

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26944/files
  - new: https://git.openjdk.org/jdk/pull/26944/files/03505f8d..b81779cb

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26944&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26944&range=00-01

  Stats: 10832 lines in 300 files changed: 8871 ins; 705 del; 1256 mod
  Patch: https://git.openjdk.org/jdk/pull/26944.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26944/head:pull/26944

PR: https://git.openjdk.org/jdk/pull/26944

From rehn at openjdk.org  Mon Sep  1 08:56:25 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Mon, 1 Sep 2025 08:56:25 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v2]
In-Reply-To: <EsJhPSbVxJEkP9W7127uRJcJQMoZEc4JcGF0GdHEhBA=.c4568f6d-a70b-4cdc-a8f9-6e21ed157fd3@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <EsJhPSbVxJEkP9W7127uRJcJQMoZEc4JcGF0GdHEhBA=.c4568f6d-a70b-4cdc-a8f9-6e21ed157fd3@github.com>
Message-ID: <gCOmn1OCoSN7r89xPSG1rpEvWD9M0fmwg9Y6hHhehGc=.8883f3d0-94f2-46d6-b7da-4dd7444eb5b7@github.com>

On Mon, 1 Sep 2025 07:04:20 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>> 
>>  - Merge branch 'master' into 8365926
>>  - Spelling
>>  - Merge branch 'master' into 8365926
>>  - draft jal<->jalr
>
> src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 125:
> 
>> 123:   set_stub_address_destination_at(stub_addr, dest);
>> 124: 
>> 125:   // patches jalr -> jal/jal -> jalr depeding on dest
> 
> Suggestion: s/depeding/depending/

Fixed

> src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 146:
> 
>> 144: 
>> 145:     address dest = stub_address_destination_at(stub_addr);
>> 146:     optimize_call(dest, false); // patches jalr -> jal/jal -> jalr depeding on dest
> 
> Suggestion: s/depeding/depending/

Fixed
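As background for the jalr <-> jal patching discussed here: a RISC-V `jal` carries a signed, 2-byte-aligned 21-bit PC-relative offset, so it only reaches destinations within about +/-1 MiB of the call site; farther targets need an indirect `jalr` through a register. The Java sketch below shows that reachability test and the `jal` bit layout as given in the RISC-V unprivileged ISA specification. It is illustrative only: `JalSketch` and its methods are made-up names, not HotSpot's `NativeCall` code.

```java
// Hypothetical helper, not HotSpot code: models when a direct `jal` could
// replace an indirect `jalr`, and how the `jal` instruction is encoded.
final class JalSketch {
    static final int JAL_OPCODE = 0b1101111; // 0x6f
    static final int RA = 1;                  // x1, the return-address register

    // jal's offset is a signed 21-bit value with bit 0 implicitly zero,
    // i.e. an even number in [-2^20, 2^20 - 2].
    static boolean reachableByJal(long pc, long dest) {
        long offset = dest - pc;
        return (offset & 1) == 0 && offset >= -(1L << 20) && offset <= (1L << 20) - 2;
    }

    // Encodes `jal rd, offset` as imm[20|10:1|11|19:12] | rd | opcode.
    static int encodeJal(int rd, long offset) {
        if (offset % 2 != 0 || offset < -(1L << 20) || offset > (1L << 20) - 2) {
            throw new IllegalArgumentException("offset out of jal range: " + offset);
        }
        int imm = (int) offset;
        return (((imm >> 20) & 0x1) << 31)
             | (((imm >> 1) & 0x3ff) << 21)
             | (((imm >> 11) & 0x1) << 20)
             | (((imm >> 12) & 0xff) << 12)
             | (rd << 7)
             | JAL_OPCODE;
    }

    public static void main(String[] args) {
        long pc = 0x10000L;
        System.out.println(reachableByJal(pc, pc + 0x800));      // true: within +/-1 MiB
        System.out.println(reachableByJal(pc, pc + (1L << 24))); // false: needs jalr
        System.out.printf("0x%08x%n", encodeJal(RA, 8));         // jal ra, 8
    }
}
```

When a call site is re-resolved to a new destination, the same test decides which form to patch in, which matches the "jalr -> jal/jal -> jalr depending on dest" comment above.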

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26944#discussion_r2313319741
PR Review Comment: https://git.openjdk.org/jdk/pull/26944#discussion_r2313319894

From rcastanedalo at openjdk.org  Mon Sep  1 08:58:50 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 1 Sep 2025 08:58:50 GMT
Subject: RFR: 8365791: IGV: Update build dependencies
In-Reply-To: <prYmhXzchcflND1xoWqIlYQlhtsR86WbngDgoSe6meA=.64928f3d-6656-40a9-ae4c-c95e95924ac8@github.com>
References: <prYmhXzchcflND1xoWqIlYQlhtsR86WbngDgoSe6meA=.64928f3d-6656-40a9-ae4c-c95e95924ac8@github.com>
Message-ID: <Af8fvSBHhCHEIyYQsHenPE7aBLCcy3LBSzVNtY8fZOI=.dfff10c1-aa10-4757-81f8-935a9645358f@github.com>

On Fri, 29 Aug 2025 06:37:30 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> This changeset updates IGV's Apache Batik dependency, which is used for exporting graphs into SVG files (`File -> Export current graph...`), to its latest version.
> 
> **Testing:** checked manually that a few graphs are correctly exported as SVG files.

Thanks for reviewing, Christian and Albert!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27000#issuecomment-3241484828

From rcastanedalo at openjdk.org  Mon Sep  1 08:58:51 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 1 Sep 2025 08:58:51 GMT
Subject: Integrated: 8365791: IGV: Update build dependencies
In-Reply-To: <prYmhXzchcflND1xoWqIlYQlhtsR86WbngDgoSe6meA=.64928f3d-6656-40a9-ae4c-c95e95924ac8@github.com>
References: <prYmhXzchcflND1xoWqIlYQlhtsR86WbngDgoSe6meA=.64928f3d-6656-40a9-ae4c-c95e95924ac8@github.com>
Message-ID: <h22K4XJhhhFewFSp7hbxBP1asIZbUTolzoylW9iXwU8=.83ac6bdf-6b8a-4d16-b29a-df4298c2cd0c@github.com>

On Fri, 29 Aug 2025 06:37:30 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> This changeset updates IGV's Apache Batik dependency, which is used for exporting graphs into SVG files (`File -> Export current graph...`), to its latest version.
> 
> **Testing:** checked manually that a few graphs are correctly exported as SVG files.

This pull request has now been integrated.

Changeset: fc77e760
Author:    Roberto Castañeda Lozano <rcastanedalo at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/fc77e7600f217cc91c24d4e512c685e176a66e4a
Stats:     1 line in 1 file changed: 0 ins; 0 del; 1 mod

8365791: IGV: Update build dependencies

Reviewed-by: chagedorn, ayang

-------------

PR: https://git.openjdk.org/jdk/pull/27000

From epeter at openjdk.org  Mon Sep  1 08:58:57 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 08:58:57 GMT
Subject: RFR: 8366427: C2 SuperWord: refactor VTransform scalar nodes [v3]
In-Reply-To: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com>
References: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com>
Message-ID: <siGeSxq7xrjkgXyW7YR27NvHcQV-cT0Xxf9MlVGqYyI=.4fc0f8d3-244c-4fad-a79b-4edafb448a18@github.com>

> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
> https://github.com/openjdk/jdk/pull/20964
> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
> 
> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
> 
> The goal is to split up some cases that are currently treated the same but will later have different behavior. There may be a little bit of code duplication, but the code will soon be made different ;)
> 
> We split the `VTransformScalarNode`:
> - `VTransformMemopScalarNode`
>   - Uses that only want scalar memory nodes can now check directly for `isa_MemopScalar`.
>   - We can store the `_vpointer` directly in a field; that way we don't need a lookup via `vloop_analyzer`. This could also be helpful later on if we ever do widening (unrolling during auto vectorization): we could then make the necessary modifications to the `vpointer`.
> - `VTransformLoopPhiNode`
>   - Later on, they will play a more special role, they will give us easy access to the beginning state of the loop body and the backedges.
> - `VTransformCFGNode`
>   - Calling them scalar nodes is not 100% accurate. We'll probably have to further refine them later on. But splitting them off now seems like a reasonable choice. Once we do if-conversion we'll have to do more work on CFG.
> - `VTransformDataScalarNode`
>   - These represent all the normal "calculation" nodes in the loop.
> - `VTransformInputScalarNode` -> `VTransformOuterNode`:
>   - For now, we are still just tracking input nodes, but soon we will need to track input and output nodes: basically just the 1-hop neighbourhood of nodes outside the loop. I'm already renaming them now, so it will be less noise later.
> 
> I decided to split things up further instead and avoid `VTransformScalarNode` altogether; this avoids having to override overrides, which can be really confusing (e.g. what I had with `is_load_in_loop`).

Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits:

 - Merge branch 'JDK-8366427-VTransform-scalar-node-refactor' of https://github.com/eme64/jdk into JDK-8366427-VTransform-scalar-node-refactor
 - Update src/hotspot/share/opto/vtransform.hpp
   
   Co-authored-by: Manuel Hässig <manuel at haessig.org>
 - manual merge
 - improve print_spec
 - rm comment
 - InputScalar -> Outer renaming
 - rm useless methods
 - rm vloop_analyzer from vpointer method
 - JDK-8366427
 - JDK-8366361
 - ... and 3 more: https://git.openjdk.org/jdk/compare/56713817...86e88f43

-------------

Changes: https://git.openjdk.org/jdk/pull/27002/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27002&range=02
  Stats: 157 lines in 4 files changed: 114 ins; 0 del; 43 mod
  Patch: https://git.openjdk.org/jdk/pull/27002.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27002/head:pull/27002

PR: https://git.openjdk.org/jdk/pull/27002

From dfenacci at openjdk.org  Mon Sep  1 09:02:46 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Mon, 1 Sep 2025 09:02:46 GMT
Subject: RFR: 8360031: C2 compilation asserts in MemBarNode::remove [v2]
In-Reply-To: <rhW8AwHI0mzaBA8yTDbG58fKD0OVfjctCJndeyxUYK8=.1f79e07f-3586-459c-993d-5ca8d134fbf3@github.com>
References: <h6lgQmZZxVXPeJl0fRWlz8J713AAhfTAw_5UJ3ZL1S4=.b4b40295-6ea8-4498-abb6-98201776b3c9@github.com>
 <5CGrcWjFZ7Zqj_Tm0LO6Tqg9cUA-xxvcaa2J-yWW8BE=.af4dea7c-e39d-491d-b924-c89fa82e757a@github.com>
 <rhW8AwHI0mzaBA8yTDbG58fKD0OVfjctCJndeyxUYK8=.1f79e07f-3586-459c-993d-5ca8d134fbf3@github.com>
Message-ID: <jFiy1UPRbJFiv_yCZNd1c_pR81xbsTHK238ujxq4BMc=.9a0a35e2-fe79-4410-ac56-a457d91e948d@github.com>

On Thu, 21 Aug 2025 00:27:02 GMT, Dean Long <dlong at openjdk.org> wrote:

> This looks OK on the surface, but isn't handling MemBarStoreStore and MemBarRelease differently asking for trouble? Is there a reason why they need to be handled in different passes?

I'm not sure of the reason why EA handles `MemBarStoreStore` separately. Maybe @vnkozlov can shed some light...

BTW, the original assert with the condition `Opcode() == Op_Initialize` seems to have been added because that was the case hit by the [JDK-8269771](https://bugs.openjdk.org/browse/JDK-8269771) bug ([PR](https://github.com/openjdk/jdk17/pull/193)). I'm not sure there couldn't be other cases (apart from the current two) where the membar node has only one out edge.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26556#issuecomment-3241505556

From galder at openjdk.org  Mon Sep  1 09:03:24 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Mon, 1 Sep 2025 09:03:24 GMT
Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I,
 MoveI2F [v6]
In-Reply-To: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com>
References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com>
Message-ID: <nepdy9F_UuqGkx4xGZmEcM_bM_Sz1v0FXM6L-RvqotA=.f93adb07-85b8-4bfa-83b5-5e4f769cfd86@github.com>

> I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations.
> 
> Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes. On darwin/aarch64 it shows:
> 
> 
> Benchmark                                (seed)  (size)   Mode  Cnt      Base      Patch   Units   Diff
> VectorBitConversion.doubleToLongBits          0    2048  thrpt    8  1168.782   1157.717  ops/ms    -1%
> VectorBitConversion.doubleToRawLongBits       0    2048  thrpt    8  3999.387   7353.936  ops/ms   +83%
> VectorBitConversion.floatToIntBits            0    2048  thrpt    8  1200.338   1188.206  ops/ms    -1%
> VectorBitConversion.floatToRawIntBits         0    2048  thrpt    8  4058.248  14792.474  ops/ms  +264%
> VectorBitConversion.intBitsToFloat            0    2048  thrpt    8  3050.313  14984.246  ops/ms  +391%
> VectorBitConversion.longBitsToDouble          0    2048  thrpt    8  3022.691   7379.360  ops/ms  +144%
> 
> 
> The improvements observed are a result of vectorization. `doubleToLongBits` and `floatToIntBits` do not vectorize because of control flow, and their unchanged results show that these changes do not affect their performance.
> 
> I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions.
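The control flow in the non-raw variants comes from NaN handling. Per the `Double.doubleToLongBits` and `Float.floatToIntBits` Javadoc, every NaN is collapsed to one canonical bit pattern (`0x7ff8000000000000L` and `0x7fc00000` respectively), while the `Raw` variants return the bits unchanged. The sketch below spells out that per-element check; `CanonicalBits` and its methods are hypothetical names, but their results match the JDK methods.

```java
// Spells out the NaN canonicalization performed by Double.doubleToLongBits and
// Float.floatToIntBits. The helper names are hypothetical; results match the JDK.
final class CanonicalBits {
    static long canonicalDoubleBits(double d) {
        // Data-dependent branch: every NaN maps to one canonical bit pattern.
        return Double.isNaN(d) ? 0x7ff8000000000000L : Double.doubleToRawLongBits(d);
    }

    static int canonicalFloatBits(float f) {
        return Float.isNaN(f) ? 0x7fc00000 : Float.floatToRawIntBits(f);
    }

    public static void main(String[] args) {
        double[] samples = {
            0.0, -0.0, 1.5, Double.POSITIVE_INFINITY, Double.NaN,
            Double.longBitsToDouble(0x7ff8000000000001L) // a NaN with a non-zero payload
        };
        for (double d : samples) {
            // Prints true for every sample: the helper agrees with the JDK method.
            System.out.println(canonicalDoubleBits(d) == Double.doubleToLongBits(d));
        }
    }
}
```

The raw variants have no such check, which is why they map directly onto the `Move*` nodes that this PR vectorizes.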

Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision:

  Adjust vector size expectations

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26457/files
  - new: https://git.openjdk.org/jdk/pull/26457/files/e7e4d801..632408ba

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26457&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26457&range=04-05

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/26457.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26457/head:pull/26457

PR: https://git.openjdk.org/jdk/pull/26457

From galder at openjdk.org  Mon Sep  1 09:03:25 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Mon, 1 Sep 2025 09:03:25 GMT
Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I,
 MoveI2F [v4]
In-Reply-To: <jnNXQxAU4g9DOjrutncOGDvAJ8vwhESxMNmsm3a4lxI=.2c2121d5-88fb-4f14-b670-2135e4f91ddb@github.com>
References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com>
 <Fc_WY3YbTlom7mg-PcoOAqvQN2N9r96MtavcEzikKkM=.792fdc5f-f86b-4e05-b409-4917e65b7dd1@github.com>
 <aJP823LfE3KlVIQ9ehYx2IY1J6ldDCxaiTh3hRxdyss=.4d439f2a-ab28-4b95-beb7-8e1c1b48e990@github.com>
 <0VA9QnuPSb55PbioO1XWtSmrAC-sQet0hb_ldRgKdFQ=.95f56a0b-3b08-4654-8f1e-7217cd9bcabe@github.com>
 <5xrZ-TcQ9OaMFIAMGIMTDCwGdexIMs0eJd6Li-T1aQc=.fc863cb9-0ce2-488f-a7d6-3aa211248798@github.com>
 <jnNXQxAU4g9DOjrutncOGDvAJ8vwhESxMNmsm3a4lxI=.2c2121d5-88fb-4f14-b670-2135e4f91ddb@github.com>
Message-ID: <gubrInJVtmBLPucdEOqNY19jdcRyd1LkWpfw76lMfV4=.e63f8b53-e850-4eca-8fb5-455a91bd59f3@github.com>

On Mon, 1 Sep 2025 08:40:52 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> Does that work?

Yeah, that works. I'll push an update shortly.

@eme64 I've just pushed an update that fixes the vector size expectations. I didn't end up writing a javadoc since the proposed solution makes it clearer what the expected size should be.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3241495486
PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3241505284

From galder at openjdk.org  Mon Sep  1 09:03:28 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Mon, 1 Sep 2025 09:03:28 GMT
Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I,
 MoveI2F [v5]
In-Reply-To: <8_JXUPiLQNWEmDTbAnwB1jdYu6mTE3_NbETZkQabPwU=.78227d3e-8312-47da-bb2b-0a84017fc724@github.com>
References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com>
 <mY73YUx3M0wpCEuRskC7KYzVGDbnKRKHOE3SHtsAahs=.179d5d83-9485-4dc6-8dfc-8155cec042b7@github.com>
 <8_JXUPiLQNWEmDTbAnwB1jdYu6mTE3_NbETZkQabPwU=.78227d3e-8312-47da-bb2b-0a84017fc724@github.com>
Message-ID: <t2zaSQ-LR7_tih-KiyBlZuGF9layTDIP0t86I3tseuU=.3a53f81e-474e-4b5c-ae47-ced61ae34cf9@github.com>

On Mon, 1 Sep 2025 08:44:10 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision:
>> 
>>  - Merge branch 'master' into topic.fp-bits-vector
>>  - Add more IR node positive assertions
>>  - Fix source of data for benchmarks
>>  - Refactor benchmarks to TypeVectorOperations
>>  - Check at the very least that auto vectorization is supported
>>  - Avoid VectorReinterpret::implemented
>>  - Refactor and add copyright header
>>  - Rephrase comment
>>  - Removed unnecessary assert methods
>>  - Adjust IR test after adding Move* vector support
>>  - ... and 12 more: https://git.openjdk.org/jdk/compare/57cf332d...e7e4d801
>
> test/hotspot/jtreg/compiler/loopopts/superword/TestCompatibleUseDefTypeSize.java line 460:
> 
>> 458:     @IR(counts = {IRNode.LOAD_VECTOR_L, "> 0",
>> 459:                   IRNode.STORE_VECTOR, "> 0",
>> 460:                   IRNode.VECTOR_REINTERPRET, "> 0"},
> 
> Ah, I just saw that `VECTOR_REINTERPRET` is not a `vectorNode`, so we don't check the size for it. Would it have a type and size, though?
> 
> If so, we could consider making it more precise, like all the vector casts.
> It would be a little bit of work, but it would make the rules more precise.
> This could also be a separate RFE.
> 
> 
>     public static final String VECTOR_REINTERPRET = PREFIX + "VECTOR_REINTERPRET" + POSTFIX;
>     static {
>         beforeMatchingNameRegex(VECTOR_REINTERPRET, "VectorReinterpret");
>     }
> 
>     public static final String VECTOR_UCAST_B2S = VECTOR_PREFIX + "VECTOR_UCAST_B2S" + POSTFIX;
>     static {
>         vectorNode(VECTOR_UCAST_B2S, "VectorUCastB2X", TYPE_SHORT);
>     }
> 
> 
> Depending on the dump, it may not be so easy though. Not sure.

That makes sense, I'll create a separate RFE for that

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2313333399

From mhaessig at openjdk.org  Mon Sep  1 09:08:44 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Mon, 1 Sep 2025 09:08:44 GMT
Subject: RFR: 8366427: C2 SuperWord: refactor VTransform scalar nodes [v3]
In-Reply-To: <siGeSxq7xrjkgXyW7YR27NvHcQV-cT0Xxf9MlVGqYyI=.4fc0f8d3-244c-4fad-a79b-4edafb448a18@github.com>
References: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com>
 <siGeSxq7xrjkgXyW7YR27NvHcQV-cT0Xxf9MlVGqYyI=.4fc0f8d3-244c-4fad-a79b-4edafb448a18@github.com>
Message-ID: <5n2PYjLiIaiBx20TGMaRL-nWU-eRDKs-mYGVAnHQMQc=.df9923d2-963f-4589-a21a-5cd56c0467c3@github.com>

On Mon, 1 Sep 2025 08:58:57 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
>> https://github.com/openjdk/jdk/pull/20964
>> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
>> 
>> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
>> 
>> The goal is to split up some cases that are currently treated the same but will later have different behavior. There may be a little bit of code duplication, but the code will soon be made different ;)
>> 
>> We split the `VTransformScalarNode`:
>> - `VTransformMemopScalarNode`
>>   - Uses that only want scalar memory nodes can now check directly for `isa_MemopScalar`.
>>   - We can store the `_vpointer` directly in a field; that way we don't need a lookup via `vloop_analyzer`. This could also be helpful later on if we ever do widening (unrolling during auto vectorization): we could then make the necessary modifications to the `vpointer`.
>> - `VTransformLoopPhiNode`
>>   - Later on, they will play a more special role, they will give us easy access to the beginning state of the loop body and the backedges.
>> - `VTransformCFGNode`
>>   - Calling them scalar nodes is not 100% accurate. We'll probably have to further refine them later on. But splitting them off now seems like a reasonable choice. Once we do if-conversion we'll have to do more work on CFG.
>> - `VTransformDataScalarNode`
>>   - These represent all the normal "calculation" nodes in the loop.
>> - `VTransformInputScalarNode` -> `VTransformOuterNode`:
>>   - For now, we are still just tracking input nodes, but soon we will need to track input and output nodes: basically just the 1-hop neighbourhood of nodes outside the loop. I'm already renaming them now, so it will be less noise later.
>> 
>> I decided to split things up further instead and avoid `VTransformScalarNode` altogether; this avoids having to override overrides, which can be really confusing (e.g. what I had with `is_load_in_loop`).
>
> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits:
> 
>  - Merge branch 'JDK-8366427-VTransform-scalar-node-refactor' of https://github.com/eme64/jdk into JDK-8366427-VTransform-scalar-node-refactor
>  - Update src/hotspot/share/opto/vtransform.hpp
>    
>    Co-authored-by: Manuel Hässig <manuel at haessig.org>
>  - manual merge
>  - improve print_spec
>  - rm comment
>  - InputScalar -> Outer renaming
>  - rm useless methods
>  - rm vloop_analyzer from vpointer method
>  - JDK-8366427
>  - JDK-8366361
>  - ... and 3 more: https://git.openjdk.org/jdk/compare/56713817...86e88f43

Still good.

-------------

Marked as reviewed by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27002#pullrequestreview-3172793634

From epeter at openjdk.org  Mon Sep  1 09:12:44 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 09:12:44 GMT
Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I,
 MoveI2F [v6]
In-Reply-To: <nepdy9F_UuqGkx4xGZmEcM_bM_Sz1v0FXM6L-RvqotA=.f93adb07-85b8-4bfa-83b5-5e4f769cfd86@github.com>
References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com>
 <nepdy9F_UuqGkx4xGZmEcM_bM_Sz1v0FXM6L-RvqotA=.f93adb07-85b8-4bfa-83b5-5e4f769cfd86@github.com>
Message-ID: <CvRoKTqLKs2t1jFrMm2Au1g85grsxItFDdk1v1VT-ag=.fa323248-154a-4273-8967-50e815ca11e4@github.com>

On Mon, 1 Sep 2025 09:03:24 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations.
>> 
>> Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes. On darwin/aarch64 it shows:
>> 
>> 
>> Benchmark                                (seed)  (size)   Mode  Cnt      Base      Patch   Units   Diff
>> VectorBitConversion.doubleToLongBits          0    2048  thrpt    8  1168.782   1157.717  ops/ms    -1%
>> VectorBitConversion.doubleToRawLongBits       0    2048  thrpt    8  3999.387   7353.936  ops/ms   +83%
>> VectorBitConversion.floatToIntBits            0    2048  thrpt    8  1200.338   1188.206  ops/ms    -1%
>> VectorBitConversion.floatToRawIntBits         0    2048  thrpt    8  4058.248  14792.474  ops/ms  +264%
>> VectorBitConversion.intBitsToFloat            0    2048  thrpt    8  3050.313  14984.246  ops/ms  +391%
>> VectorBitConversion.longBitsToDouble          0    2048  thrpt    8  3022.691   7379.360  ops/ms  +144%
>> 
>> 
>> The improvements observed are a result of vectorization. `doubleToLongBits` and `floatToIntBits` do not vectorize because of control flow, and their unchanged results show that these changes do not affect their performance.
>> 
>> I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions.
>
> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Adjust vector size expectations

Perfect, thanks for the update! I'll submit testing again :)
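The split between the raw and non-raw methods in the benchmark table comes down to what each loop body contains. `doubleToRawLongBits` is a pure bit reinterpretation (the `MoveD2L` node). `doubleToLongBits` also collapses every NaN to one canonical bit pattern, and that adds control flow to each iteration. The kernels below are a minimal sketch of the two shapes. They are hypothetical and not the JMH benchmark's source. Whether C2 actually vectorizes a given loop still depends on the platform and flags.

```java
// Hypothetical kernels illustrating the two loop shapes; not the benchmark source.
final class BitConversionKernels {
    // Pure bit reinterpretation per element: no branches in the loop body.
    static void rawBits(double[] in, long[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Double.doubleToRawLongBits(in[i]);
        }
    }

    // doubleToLongBits maps every NaN to 0x7ff8000000000000L,
    // which adds a data-dependent NaN check to each iteration.
    static void canonicalBits(double[] in, long[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Double.doubleToLongBits(in[i]);
        }
    }
}
```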

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3241529779

From epeter at openjdk.org  Mon Sep  1 09:12:46 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 09:12:46 GMT
Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I,
 MoveI2F [v5]
In-Reply-To: <T5PdEt6YZlZF_eNaSoaOq-0AqUL2gTQ7OqOHStbjnTc=.4a173c7d-8cc1-469d-81e9-268cd58cb449@github.com>
References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com>
 <mY73YUx3M0wpCEuRskC7KYzVGDbnKRKHOE3SHtsAahs=.179d5d83-9485-4dc6-8dfc-8155cec042b7@github.com>
 <8_JXUPiLQNWEmDTbAnwB1jdYu6mTE3_NbETZkQabPwU=.78227d3e-8312-47da-bb2b-0a84017fc724@github.com>
 <t2zaSQ-LR7_tih-KiyBlZuGF9layTDIP0t86I3tseuU=.3a53f81e-474e-4b5c-ae47-ced61ae34cf9@github.com>
 <T5PdEt6YZlZF_eNaSoaOq-0AqUL2gTQ7OqOHStbjnTc=.4a173c7d-8cc1-469d-81e9-268cd58cb449@github.com>
Message-ID: <DNoxXcBb80cb00HQP7STLELHjceLSW-r9rjZipu-DYA=.9094913f-b49e-47d5-9b95-64490fbceb7e@github.com>

On Mon, 1 Sep 2025 09:07:46 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> That makes sense, I'll create a separate RFE for that
>
> Ideal output for `VectorReinterpret` seems to follow a similar pattern to `LoadVector` etc. with regard to the vector size, so a similar solution could be implemented:
> 
> 
>  1306  VectorReinterpret  === _ 1307  [[ 1286 ]]  #vectory<I,8> !orig=1179,979,[846],[738],[646],[145] !jvms: TestCompatibleUseDefTypeSize::test7 @ bci:13 (line 427)

Very nice. That would be a good follow-up RFE. Do you want to work on that one? Otherwise you could tag it as a `starter` task, and we'll eventually find someone to do it ;)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2313364841

From galder at openjdk.org  Mon Sep  1 09:12:45 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Mon, 1 Sep 2025 09:12:45 GMT
Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I,
 MoveI2F [v5]
In-Reply-To: <t2zaSQ-LR7_tih-KiyBlZuGF9layTDIP0t86I3tseuU=.3a53f81e-474e-4b5c-ae47-ced61ae34cf9@github.com>
References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com>
 <mY73YUx3M0wpCEuRskC7KYzVGDbnKRKHOE3SHtsAahs=.179d5d83-9485-4dc6-8dfc-8155cec042b7@github.com>
 <8_JXUPiLQNWEmDTbAnwB1jdYu6mTE3_NbETZkQabPwU=.78227d3e-8312-47da-bb2b-0a84017fc724@github.com>
 <t2zaSQ-LR7_tih-KiyBlZuGF9layTDIP0t86I3tseuU=.3a53f81e-474e-4b5c-ae47-ced61ae34cf9@github.com>
Message-ID: <T5PdEt6YZlZF_eNaSoaOq-0AqUL2gTQ7OqOHStbjnTc=.4a173c7d-8cc1-469d-81e9-268cd58cb449@github.com>

On Mon, 1 Sep 2025 08:57:28 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> test/hotspot/jtreg/compiler/loopopts/superword/TestCompatibleUseDefTypeSize.java line 460:
>> 
>>> 458:     @IR(counts = {IRNode.LOAD_VECTOR_L, "> 0",
>>> 459:                   IRNode.STORE_VECTOR, "> 0",
>>> 460:                   IRNode.VECTOR_REINTERPRET, "> 0"},
>> 
>> Ah, I just saw that `VECTOR_REINTERPRET` is not a `vectorNode`, so we don't check its size. Would it have a type and size though?
>> 
>> If so, we could consider making it more precise, like all the vector casts.
>> Would be a little bit of work, but it would make the rules more precise.
>> Could also be a separate RFE.
>> 
>> 
>>     public static final String VECTOR_REINTERPRET = PREFIX + "VECTOR_REINTERPRET" + POSTFIX;
>>     static {
>>         beforeMatchingNameRegex(VECTOR_REINTERPRET, "VectorReinterpret");
>>     }
>> 
>>     public static final String VECTOR_UCAST_B2S = VECTOR_PREFIX + "VECTOR_UCAST_B2S" + POSTFIX;
>>     static {
>>         vectorNode(VECTOR_UCAST_B2S, "VectorUCastB2X", TYPE_SHORT);
>>     }
>> 
>> 
>> Depending on the dump, it may not be so easy though. Not sure.
>
> That makes sense, I'll create a separate RFE for that

Ideal output for `VectorReinterpret` seems to follow a similar pattern to `LoadVector` etc. with regard to the vector size, so a similar solution could be implemented:


 1306  VectorReinterpret  === _ 1307  [[ 1286 ]]  #vectory<I,8> !orig=1179,979,[846],[738],[646],[145] !jvms: TestCompatibleUseDefTypeSize::test7 @ bci:13 (line 427)
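The size information in a dump line like the one above sits in the `#vectory<I,8>` annotation: element type `I`, 8 lanes. The hypothetical helper below is a minimal sketch of pulling that out with a regex. It assumes the `#vector<kind><<elem>,<count>>` format seen in this dump. It is not part of the IR framework, which handles real vector nodes through its own `vectorNode(...)` registration in `IRNode.java`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not IR framework API.
final class VectorTypeParser {
    // Matches annotations such as "#vectory<I,8>": kind letter, element type, lane count.
    private static final Pattern VECTOR_TYPE = Pattern.compile("#vector([a-z])<([A-Z]),(\\d+)>");

    /** Returns "elem:count" (e.g. "I:8") for the first vector type on the line, or null if none. */
    static String elementTypeAndCount(String dumpLine) {
        Matcher m = VECTOR_TYPE.matcher(dumpLine);
        return m.find() ? m.group(2) + ":" + m.group(3) : null;
    }
}
```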

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2313358195

From galder at openjdk.org  Mon Sep  1 09:19:51 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Mon, 1 Sep 2025 09:19:51 GMT
Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I,
 MoveI2F [v5]
In-Reply-To: <DNoxXcBb80cb00HQP7STLELHjceLSW-r9rjZipu-DYA=.9094913f-b49e-47d5-9b95-64490fbceb7e@github.com>
References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com>
 <mY73YUx3M0wpCEuRskC7KYzVGDbnKRKHOE3SHtsAahs=.179d5d83-9485-4dc6-8dfc-8155cec042b7@github.com>
 <8_JXUPiLQNWEmDTbAnwB1jdYu6mTE3_NbETZkQabPwU=.78227d3e-8312-47da-bb2b-0a84017fc724@github.com>
 <t2zaSQ-LR7_tih-KiyBlZuGF9layTDIP0t86I3tseuU=.3a53f81e-474e-4b5c-ae47-ced61ae34cf9@github.com>
 <T5PdEt6YZlZF_eNaSoaOq-0AqUL2gTQ7OqOHStbjnTc=.4a173c7d-8cc1-469d-81e9-268cd58cb449@github.com>
 <DNoxXcBb80cb00HQP7STLELHjceLSW-r9rjZipu-DYA=.9094913f-b49e-47d5-9b95-64490fbceb7e@github.com>
Message-ID: <WS1L9Zx2lGkEGBT-AIRIrxKB7AV4a4JyE_wWkasBsG8=.a8a4166b-fb1b-47cc-aca9-202b067f2ac6@github.com>

On Mon, 1 Sep 2025 09:10:29 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Ideal output for `VectorReinterpret` seems to follow a similar pattern to `LoadVector` etc. with regard to the vector size, so a similar solution could be implemented:
>> 
>> 
>>  1306  VectorReinterpret  === _ 1307  [[ 1286 ]]  #vectory<I,8> !orig=1179,979,[846],[738],[646],[145] !jvms: TestCompatibleUseDefTypeSize::test7 @ bci:13 (line 427)
>
> Very nice. That would be a good follow-up RFE. Do you want to work on that one? Otherwise you could tag it as a `starter` task, and we'll eventually find someone to do it ;)

Yeah, I'd like to work on it. Seems like a good one to work on in between bigger tasks.

I had a question about it though. I noticed that `STORE_VECTOR` is also in a similar situation. Any specific reason to leave that one as is? Or was it just an oversight? If an oversight, a second RFE could be added for that one?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2313379507

From bkilambi at openjdk.org  Mon Sep  1 09:21:54 2025
From: bkilambi at openjdk.org (Bhavana Kilambi)
Date: Mon, 1 Sep 2025 09:21:54 GMT
Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with
 SVE [v8]
In-Reply-To: <Tfh57ykiKef1rh4CMRnqYRZLs9B94k5JPLp1jaZrKCg=.19d7f37e-1673-44a0-bec6-2be6e037e228@github.com>
References: <kUM5Yem8Uqw5Zxm-7Q4yZid7bWbsVUDCqvqrDYIKurI=.c0a5a2d5-4ad5-4a7d-af9a-3b52d170d849@github.com>
 <UNYd7Mv5E3at9yNJoIBAKbb-_FVqSfIrIChXyWvDbAI=.a26f6801-5440-4d5b-aaa4-fb895da639d6@github.com>
 <Tfh57ykiKef1rh4CMRnqYRZLs9B94k5JPLp1jaZrKCg=.19d7f37e-1673-44a0-bec6-2be6e037e228@github.com>
Message-ID: <ypLlQmGVXHCFIOvNYJbhogkPGIZ5vcQrz0hqIF1LD5I=.8fea8e95-b511-4279-a1d7-49e0e78db3ee@github.com>

On Thu, 28 Aug 2025 15:01:29 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Modified JTREG testcase to address review comments
>
> Let's go, we need this patch in JDK 25, which requires some soak time in mainline :)

@shipilev Could I please ask you to sponsor this patch? Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3236177433

From shade at openjdk.org  Mon Sep  1 09:21:55 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 1 Sep 2025 09:21:55 GMT
Subject: RFR: 8361582: AArch64: Some ConH values cannot be replicated with
 SVE [v8]
In-Reply-To: <UNYd7Mv5E3at9yNJoIBAKbb-_FVqSfIrIChXyWvDbAI=.a26f6801-5440-4d5b-aaa4-fb895da639d6@github.com>
References: <kUM5Yem8Uqw5Zxm-7Q4yZid7bWbsVUDCqvqrDYIKurI=.c0a5a2d5-4ad5-4a7d-af9a-3b52d170d849@github.com>
 <UNYd7Mv5E3at9yNJoIBAKbb-_FVqSfIrIChXyWvDbAI=.a26f6801-5440-4d5b-aaa4-fb895da639d6@github.com>
Message-ID: <C0e9xrJyjOywRJHzB6fqKhC-UeVhsExgDhErp37fb_8=.d9d5763e-98cb-4ee3-9763-5728bc121469@github.com>

On Tue, 26 Aug 2025 13:07:21 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:

>> After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test -
>> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as -
>> 
>> 
>> public void vectorAddConstInputFloat16() {
>>     for (int i = 0; i < LEN; ++i) {
>>         output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST));
>>     }
>> }
>> 
>> 
>> 
>> <The full failure log is present in the JBS ticket, thus not reproducing it here>
>> 
>> The current code in the JDK generates an sve_dup instruction for every 16-bit immediate, while the acceptable range is [-128, 127] for 8-bit immediates and [-128 << 8, 127 << 8], in multiples of 256, for 16-bit signed immediates.
>> 
>> This patch generates the sve_dup instruction only for 16-bit values within the limits specified above. For out-of-range values, the immediate half-precision float value is loaded from the constant pool into a register ("loadConH" mach node), which is then replicated (broadcast) to an SVE register ("replicateHF" mach node).
>> 
>> Both tests, `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java`, pass on a 256-bit SVE machine. The JTREG tests hotspot (hotspot_all), langtools (tier1) and jdk (tier1-3) also pass on the same machine.
>
> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Modified JTREG testcase to address review comments

Huh, I don't see the integration message from the bot. Let's see if this message puts the PR on the bot's notification queue.

There it is.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3241560416
PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3241563354

From bkilambi at openjdk.org  Mon Sep  1 09:21:56 2025
From: bkilambi at openjdk.org (Bhavana Kilambi)
Date: Mon, 1 Sep 2025 09:21:56 GMT
Subject: Integrated: 8361582: AArch64: Some ConH values cannot be replicated
 with SVE
In-Reply-To: <kUM5Yem8Uqw5Zxm-7Q4yZid7bWbsVUDCqvqrDYIKurI=.c0a5a2d5-4ad5-4a7d-af9a-3b52d170d849@github.com>
References: <kUM5Yem8Uqw5Zxm-7Q4yZid7bWbsVUDCqvqrDYIKurI=.c0a5a2d5-4ad5-4a7d-af9a-3b52d170d849@github.com>
Message-ID: <TDp80tcTFdz_V3GypjpyRD5QTq1oPwFRMNpHtOza2A8=.7601cb37-d9fc-42a6-9816-d1256e0e61af@github.com>

On Fri, 1 Aug 2025 09:31:40 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:

> After this commit - https://github.com/openjdk/jdk/commit/a49ecb26c5ff2f949851937f3bb036d7946a103e, the JTREG test -
> `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` fails for some of the tests which contain constant values such as -
> 
> 
> public void vectorAddConstInputFloat16() {
>     for (int i = 0; i < LEN; ++i) {
>         output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST));
>     }
> }
> 
> 
> 
> <The full failure log is present in the JBS ticket, thus not reproducing it here>
> 
> The current code in the JDK generates an sve_dup instruction for every 16-bit immediate, while the acceptable range is [-128, 127] for 8-bit immediates and [-128 << 8, 127 << 8], in multiples of 256, for 16-bit signed immediates.
> 
> This patch generates the sve_dup instruction only for 16-bit values within the limits specified above. For out-of-range values, the immediate half-precision float value is loaded from the constant pool into a register ("loadConH" mach node), which is then replicated (broadcast) to an SVE register ("replicateHF" mach node).
> 
> Both tests, `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` and `test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java`, pass on a 256-bit SVE machine. The JTREG tests hotspot (hotspot_all), langtools (tier1) and jdk (tier1-3) also pass on the same machine.

This pull request has now been integrated.

Changeset: 7f0cd648
Author:    Bhavana Kilambi <bkilambi at openjdk.org>
Committer: Aleksey Shipilev <shade at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/7f0cd6488ba969d5cffe8ebe9b95e4ad70982188
Stats:     220 lines in 7 files changed: 182 ins; 7 del; 31 mod

8361582: AArch64: Some ConH values cannot be replicated with SVE

Reviewed-by: shade, epeter, aph
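The immediate-range rule in the description can be stated as a small predicate on the 16-bit lane value. SVE `DUP (immediate)` takes a signed 8-bit immediate, optionally shifted left by 8. A value fits if it lies in [-128, 127], or if it is a multiple of 256 (every 16-bit multiple of 256 lies in [-128 << 8, 127 << 8]). The helper below is a hypothetical sketch of that check, not HotSpot's operand predicate. Values that fail it would take the `loadConH` + `replicateHF` path described above.

```java
// Hypothetical sketch of the SVE DUP (immediate) range check for 16-bit lanes.
final class SveDupImm {
    static boolean fitsDupImm16(short value) {
        int v = value; // sign-extended lane value
        if (v >= -128 && v <= 127) {
            return true; // imm8, LSL #0
        }
        // imm8, LSL #8: low byte must be zero; for a 16-bit lane,
        // (v >> 8) is then automatically within [-128, 127].
        return (v & 0xff) == 0;
    }
}
```

For example, the FP16 bit pattern of 1.0 (`0x3C00`) fits via the shifted form, while a value like `0x2E66` (roughly 0.1 in FP16) does not.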

-------------

PR: https://git.openjdk.org/jdk/pull/26589

From epeter at openjdk.org  Mon Sep  1 09:27:45 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 09:27:45 GMT
Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I,
 MoveI2F [v5]
In-Reply-To: <WS1L9Zx2lGkEGBT-AIRIrxKB7AV4a4JyE_wWkasBsG8=.a8a4166b-fb1b-47cc-aca9-202b067f2ac6@github.com>
References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com>
 <mY73YUx3M0wpCEuRskC7KYzVGDbnKRKHOE3SHtsAahs=.179d5d83-9485-4dc6-8dfc-8155cec042b7@github.com>
 <8_JXUPiLQNWEmDTbAnwB1jdYu6mTE3_NbETZkQabPwU=.78227d3e-8312-47da-bb2b-0a84017fc724@github.com>
 <t2zaSQ-LR7_tih-KiyBlZuGF9layTDIP0t86I3tseuU=.3a53f81e-474e-4b5c-ae47-ced61ae34cf9@github.com>
 <T5PdEt6YZlZF_eNaSoaOq-0AqUL2gTQ7OqOHStbjnTc=.4a173c7d-8cc1-469d-81e9-268cd58cb449@github.com>
 <DNoxXcBb80cb00HQP7STLELHjceLSW-r9rjZipu-DYA=.9094913f-b49e-47d5-9b95-64490fbceb7e@github.com>
 <WS1L9Zx2lGkEGBT-AIRIrxKB7AV4a4JyE_wWkasBsG8=.a8a4166b-fb1b-47cc-aca9-202b067f2ac6@github.com>
Message-ID: <LDhTRxMmfmRQIKOlhn5aQze6j5fQNhEPpa9078DIBjo=.3a78425c-af35-460b-9a40-8df856b7a271@github.com>

On Mon, 1 Sep 2025 09:16:39 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> Very nice. That would be a good follow-up RFE. Do you want to work on that one? Otherwise you could tag it as a `starter` task, and we'll eventually find someone to do it ;)
>
> Yeah I'd like to work on it. Seems like a good one to work in between bigger tasks.
> 
> I had a question about it though. I noticed that `STORE_VECTOR` is also in a similar situation. Any specific reason to leave that one as is? Or was it just an oversight? If an oversight, a second RFE could be added for that one?

It would probably also be good if we did stores as well, yes. But you'll touch many, many tests, having to specify the type of the store. Still, I would say it is worth it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2313398780

From galder at openjdk.org  Mon Sep  1 09:46:46 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Mon, 1 Sep 2025 09:46:46 GMT
Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I,
 MoveI2F [v5]
In-Reply-To: <LDhTRxMmfmRQIKOlhn5aQze6j5fQNhEPpa9078DIBjo=.3a78425c-af35-460b-9a40-8df856b7a271@github.com>
References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com>
 <mY73YUx3M0wpCEuRskC7KYzVGDbnKRKHOE3SHtsAahs=.179d5d83-9485-4dc6-8dfc-8155cec042b7@github.com>
 <8_JXUPiLQNWEmDTbAnwB1jdYu6mTE3_NbETZkQabPwU=.78227d3e-8312-47da-bb2b-0a84017fc724@github.com>
 <t2zaSQ-LR7_tih-KiyBlZuGF9layTDIP0t86I3tseuU=.3a53f81e-474e-4b5c-ae47-ced61ae34cf9@github.com>
 <T5PdEt6YZlZF_eNaSoaOq-0AqUL2gTQ7OqOHStbjnTc=.4a173c7d-8cc1-469d-81e9-268cd58cb449@github.com>
 <DNoxXcBb80cb00HQP7STLELHjceLSW-r9rjZipu-DYA=.9094913f-b49e-47d5-9b95-64490fbceb7e@github.com>
 <WS1L9Zx2lGkEGBT-AIRIrxKB7AV4a4JyE_wWkasBsG8=.a8a4166b-fb1b-47cc-aca9-202b067f2ac6@github.com>
 <LDhTRxMmfmRQIKOlhn5aQze6j5fQNhEPpa9078DIBjo=.3a78425c-af35-460b-9a40-8df856b7a271@github.com>
Message-ID: <F4zoleovVHR3I3ES0aS8-zDDDbJhy_qcpJ_dZrqTOSc=.995e99fe-e6d2-492a-a127-e950859b93b1@github.com>

On Mon, 1 Sep 2025 09:24:45 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Yeah I'd like to work on it. Seems like a good one to work in between bigger tasks.
>> 
>> I had a question about it though. I noticed that `STORE_VECTOR` is also in a similar situation. Any specific reason to leave that one as is? Or was it just an oversight? If an oversight, a second RFE could be added for that one?
>
> It would probably also be good if we did stores as well, yes. But you'll touch many, many tests, having to specify the type of the store. Still, I would say it is worth it.

I've created [JDK-8366531](https://bugs.openjdk.org/browse/JDK-8366531) for `VectorReinterpret` and [JDK-8366532](https://bugs.openjdk.org/browse/JDK-8366532) for `StoreVector`. I've assigned the `VectorReinterpret` one to myself and left the other one unassigned, so someone else can pick it up in the future.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2313446621

From mli at openjdk.org  Mon Sep  1 09:53:44 2025
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 1 Sep 2025 09:53:44 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v2]
In-Reply-To: <BymtYSgSlXggycd57z3AGcv-BFeaLsawQuSXWo_93lc=.6dc39f16-8cc8-4956-8214-ba7bc9b648a3@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <BymtYSgSlXggycd57z3AGcv-BFeaLsawQuSXWo_93lc=.6dc39f16-8cc8-4956-8214-ba7bc9b648a3@github.com>
Message-ID: <QYmZrBcnRfgTE_ArNIDro7ZG4TROSjxJ95rKCbmYfVM=.ec60a334-b64b-4578-8ae6-8ddf7d2859fa@github.com>

On Mon, 1 Sep 2025 08:56:25 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hey, please consider!
>> 
>> A bunch of info in JBS entry, please read that also.
>> 
>> I narrowed this issue down to the old jal optimization, making direct calls when in reach.
>> This patch restores them and removes this regression.
>> 
>> In essence we turn "jalr ra,0(t1)" into a "jal ra,<dest>" if reachable, and restore the jalr if a new destination is not reachable.
>> 
>> Please test on your hardware!
>> 
>> 
>> Chi Square (100 runs each, 10 fastest iterations of each run, P550)
>> JDK-23 (last version with trampoline calls)
>> Mean: 3189.5827
>> Standard Deviation: 284.6478
>> 
>> JDK-25
>> Mean: 3424.8905
>> Standard Deviation: 222.2208
>> 
>> Patch:
>> Mean: 3144.8535
>> Standard Deviation: 229.2577
>> 
>> 
>> No issues found in t1, running t2 also. Stress tested on vf2, bpi-f3, p550.
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
> 
>  - Merge branch 'master' into 8365926
>  - Spelling
>  - Merge branch 'master' into 8365926
>  - draft jal<->jalr

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 60:

> 58:   assert(cb != nullptr && cb->is_nmethod(), "nmethod expected");
> 59:   nmethod *nm = (nmethod *)cb;
> 60:   assert(nm != nullptr, "Sanity");

This line can be removed.

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 62:

> 60:   assert(nm != nullptr, "Sanity");
> 61:   assert(nm->stub_contains(stub_addr), "Sanity");
> 62:   assert(stub_addr!= nullptr, "Sanity");

Suggestion:

  assert(stub_addr != nullptr, "Sanity");

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 95:

> 93:   // Skip over auipc + ld
> 94:   address jal_pc = instruction_address() + 2 * NativeInstruction::instruction_size;
> 95:   uint32_t *jal_pos = (uint32_t *)jal_pc;

Is it possible to lose some data in this conversion?
If not, maybe an assert here?

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 103:

> 101:   } else if (!MacroAssembler::is_jalr_at(jal_pc)) { // The jalr is always identical: jalr ra, 0(t1)
> 102:     uint32_t new_jal = Assembler::encode_jalr(ra, t1, 0);
> 103:     Atomic::store(jal_pos, new_jal);

Suggestion:

    uint32_t new_jalr = Assembler::encode_jalr(ra, t1, 0);
    Atomic::store(jal_pos, new_jalr);
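The jal/jalr swap from the PR description ("turn `jalr ra,0(t1)` into `jal ra,<dest>` if reachable") depends on JAL's reach. JAL has a signed 21-bit, 2-byte-aligned offset, so roughly ±1 MiB from the patched instruction. The sketch below models the encode-and-choose step with hypothetical helpers. The patch itself uses `Assembler::encode_jalr` and `MacroAssembler::is_jalr_at`, and computes reach from `jal_pc`, the slot after `auipc + ld`. None of the code below is HotSpot API.

```java
// Hypothetical model of the jal <-> jalr choice; encodings follow the RISC-V base ISA.
final class RiscvCallPatch {
    static final int OPC_JAL = 0x6f;
    static final int OPC_JALR = 0x67;
    static final int RA = 1; // x1
    static final int T1 = 6; // x6

    /** JAL reaches even offsets in [-2^20, 2^20 - 2] relative to its own pc. */
    static boolean jalReachable(long pc, long dest) {
        long off = dest - pc;
        return (off & 1) == 0 && off >= -(1L << 20) && off < (1L << 20);
    }

    /** J-type layout: imm[20] | imm[10:1] | imm[11] | imm[19:12] | rd | opcode. */
    static int encodeJal(int rd, long offset) {
        if ((offset & 1) != 0 || offset < -(1L << 20) || offset >= (1L << 20)) {
            throw new IllegalArgumentException("offset out of JAL range: " + offset);
        }
        int imm = (int) offset;
        return (((imm >> 20) & 0x1) << 31)
             | (((imm >> 1) & 0x3ff) << 21)
             | (((imm >> 11) & 0x1) << 20)
             | (((imm >> 12) & 0xff) << 12)
             | (rd << 7) | OPC_JAL;
    }

    /** I-type layout: imm[11:0] | rs1 | funct3 (0) | rd | opcode. */
    static int encodeJalr(int rd, int rs1, int imm12) {
        return ((imm12 & 0xfff) << 20) | (rs1 << 15) | (rd << 7) | OPC_JALR;
    }

    /** Direct jal when the destination is in reach, else the indirect "jalr ra, 0(t1)". */
    static int chooseCallInstruction(long pc, long dest) {
        return jalReachable(pc, dest) ? encodeJal(RA, dest - pc) : encodeJalr(RA, T1, 0);
    }
}
```

With this layout, `jalr ra, 0(t1)` encodes as `0x000300e7`, which is the fixed fallback word that the `!is_jalr_at(jal_pc)` branch restores.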

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26944#discussion_r2313464396
PR Review Comment: https://git.openjdk.org/jdk/pull/26944#discussion_r2313464121
PR Review Comment: https://git.openjdk.org/jdk/pull/26944#discussion_r2313463904
PR Review Comment: https://git.openjdk.org/jdk/pull/26944#discussion_r2313463847

From mli at openjdk.org  Mon Sep  1 10:06:44 2025
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 1 Sep 2025 10:06:44 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v2]
In-Reply-To: <BymtYSgSlXggycd57z3AGcv-BFeaLsawQuSXWo_93lc=.6dc39f16-8cc8-4956-8214-ba7bc9b648a3@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <BymtYSgSlXggycd57z3AGcv-BFeaLsawQuSXWo_93lc=.6dc39f16-8cc8-4956-8214-ba7bc9b648a3@github.com>
Message-ID: <LUj3nxfsDhJ2kTdroX_W8MZCHAlgjEtQ2byk-ke_Cos=.f8503d8f-02dc-442c-b871-1a7fa735dc93@github.com>

On Mon, 1 Sep 2025 08:56:25 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hey, please consider!
>> 
>> A bunch of info in JBS entry, please read that also.
>> 
>> I narrowed this issue down to the old jal optimization, making direct calls when in reach.
>> This patch restores them and removes this regression.
>> 
>> In essence we turn "jalr ra,0(t1)" into a "jal ra,<dest>" if reachable, and restore the jalr if a new destination is not reachable.
>> 
>> Please test on your hardware!
>> 
>> 
>> Chi Square (100 runs each, 10 fastest iterations of each run, P550)
>> JDK-23 (last version with trampoline calls)
>> Mean: 3189.5827
>> Standard Deviation: 284.6478
>> 
>> JDK-25
>> Mean: 3424.8905
>> Standard Deviation: 222.2208
>> 
>> Patch:
>> Mean: 3144.8535
>> Standard Deviation: 229.2577
>> 
>> 
>> No issues found in t1, running t2 also. Stress tested on vf2, bpi-f3, p550.
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
> 
>  - Merge branch 'master' into 8365926
>  - Spelling
>  - Merge branch 'master' into 8365926
>  - draft jal<->jalr

Nice fix! Thanks!
Got some questions.

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 110:

> 108:   // We changed instruction stream
> 109:   if (mt_safe) {
> 110:     OrderAccess::release();

If we have a release here, do we still need the release in `set_stub_address_destination_at`?

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 111:

> 109:   if (mt_safe) {
> 110:     OrderAccess::release();
> 111:     ICache::invalidate_range(jal_pc, NativeInstruction::instruction_size);

Should `jal_pc` be `instruction_address()`?

-------------

PR Review: https://git.openjdk.org/jdk/pull/26944#pullrequestreview-3173008459
PR Review Comment: https://git.openjdk.org/jdk/pull/26944#discussion_r2313495692
PR Review Comment: https://git.openjdk.org/jdk/pull/26944#discussion_r2313495802

From mli at openjdk.org  Mon Sep  1 10:12:43 2025
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 1 Sep 2025 10:12:43 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v2]
In-Reply-To: <BymtYSgSlXggycd57z3AGcv-BFeaLsawQuSXWo_93lc=.6dc39f16-8cc8-4956-8214-ba7bc9b648a3@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <BymtYSgSlXggycd57z3AGcv-BFeaLsawQuSXWo_93lc=.6dc39f16-8cc8-4956-8214-ba7bc9b648a3@github.com>
Message-ID: <qsCdM9M_iBoyg5STxTJgwB8NGx3P5kqqDTPgncWzvW8=.05b756a4-a84a-4d8c-8641-de629692dbae@github.com>

On Mon, 1 Sep 2025 08:56:25 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hey, please consider!
>> 
>> A bunch of info in JBS entry, please read that also.
>> 
>> I narrowed this issue down to the old jal optimization, making direct calls when in reach.
>> This patch restores them and removes this regression.
>> 
>> In essence we turn "jalr ra,0(t1)" into a "jal ra,<dest>" if reachable, and restore the jalr if a new destination is not reachable.
>> 
>> Please test on your hardware!
>> 
>> 
>> Chi Square (100 runs each, 10 fastest iterations of each run, P550)
>> JDK-23 (last version with trampoline calls)
>> Mean: 3189.5827
>> Standard Deviation: 284.6478
>> 
>> JDK-25
>> Mean: 3424.8905
>> Standard Deviation: 222.2208
>> 
>> Patch:
>> Mean: 3144.8535
>> Standard Deviation: 229.2577
>> 
>> 
>> No issues found in t1, running t2 also. Stress tested on vf2, bpi-f3, p550.
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
> 
>  - Merge branch 'master' into 8365926
>  - Spelling
>  - Merge branch 'master' into 8365926
>  - draft jal<->jalr

JDK-23 (last version with trampoline calls)
Mean: 3189.5827
Standard Deviation: 284.6478

JDK-25
Mean: 3424.8905
Standard Deviation: 222.2208

Patch:
Mean: 3144.8535
Standard Deviation: 229.2577


For the performance data, do you have numbers for this fix applied on top of the commit right after `JDK-23 (last version with trampoline calls)`? I think that would make the performance comparison easier to understand.
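To put the quoted numbers in proportion, the sketch below computes relative change and Welch's t statistic from the reported means and standard deviations. It makes two assumptions the data does not state: each configuration contributes n = 100 samples (one per run), and the scores are durations, so lower is better. Under those assumptions:

- The patch is about 8.2% faster than JDK-25 (t ≈ 8.8).
- The patch is only about 1.4% faster than JDK-23 (t ≈ 1.2), which is hard to distinguish from run-to-run noise.

```java
// Simple summary statistics over the reported mean/stddev; n per configuration is an assumption.
final class BenchStats {
    /** Relative change of `after` versus `before`, in percent. */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    /** Welch's t statistic for two independent samples given mean, standard deviation and size. */
    static double welchT(double m1, double s1, int n1, double m2, double s2, int n2) {
        return (m1 - m2) / Math.sqrt(s1 * s1 / n1 + s2 * s2 / n2);
    }
}
```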

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26944#issuecomment-3241745418

From jbhateja at openjdk.org  Mon Sep  1 13:05:42 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 1 Sep 2025 13:05:42 GMT
Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2
In-Reply-To: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
Message-ID: <pL8aEKAMjQnyUaSodISAz7Mcuu9SDY6zsR-E8E850Ps=.6bee42bd-057e-4199-8fdf-7cf69ff23184@github.com>

On Thu, 28 Aug 2025 21:09:03 GMT, Srinivas Vamsi Parasa <sparasa at openjdk.org> wrote:

> This change extends Extended EVEX (EEVEX) to REX2/REX demotion for Intel APX NDD instructions to handle commutative operations when the destination register and the second source register (src2) are the same.
> 
> Currently, EEVEX to REX2/REX demotion is only enabled when the first source (src1) and the destination are the same. This enhancement allows additional cases of valid demotion for commutative instructions (add, imul, and, or, xor).
> 
> For example:
> `eaddl r18, r25, r18` can be encoded as `addl r18, r25` using APX REX2 encoding
> `eaddl r2, r7, r2` can be encoded as `addl r2, r7` using non-APX legacy encoding
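The demotion rule described above can be sketched as follows. Assume the NDD form `eaddl dst, src1, src2` computes `dst = src1 op src2`. It can then drop to a two-operand form when `dst == src1`, or, for the commutative ops listed, when `dst == src2` (the operands are swapped). The prefix is REX2 if any register is an APX extended GPR (r16-r31), REX for r8-r15, and legacy otherwise. This is a hypothetical model, not the HotSpot assembler code, and it ignores details such as the NF (no-flags) variants.

```java
import java.util.Set;

// Hypothetical model of EEVEX -> REX2/REX demotion for APX NDD instructions.
final class NddDemotion {
    // Commutative ops named in the PR description.
    static final Set<String> COMMUTATIVE = Set.of("add", "imul", "and", "or", "xor");

    static String encodingFor(int... regs) {
        int max = 0;
        for (int r : regs) max = Math.max(max, r);
        if (max >= 16) return "REX2";   // APX extended GPRs r16-r31
        if (max >= 8) return "REX";     // r8-r15
        return "legacy";
    }

    /** Returns the demoted two-operand form, or null if the NDD (EEVEX) form must be kept. */
    static String demote(String op, int dst, int src1, int src2) {
        int other;
        if (dst == src1) {
            other = src2;                                   // already supported
        } else if (dst == src2 && COMMUTATIVE.contains(op)) {
            other = src1;                                   // new case: swap operands
        } else {
            return null;
        }
        return op + "l r" + dst + ", r" + other + " [" + encodingFor(dst, other) + "]";
    }
}
```

With this model, `demote("add", 18, 25, 18)` yields `addl r18, r25 [REX2]` and `demote("add", 2, 7, 2)` yields `addl r2, r7 [legacy]`, matching the two examples above. A non-commutative op such as `sub` with `dst == src2` stays in NDD form.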

Hi @vamsi-parasa, thanks for working on this. I am in the process of validating https://github.com/openjdk/jdk/pull/26283 and found that additional RA biasing enables demotion in more cases. With a minimal test case, I see the following results:

Test point:-
[image] https://github.com/user-attachments/assets/e294762f-3a7a-4b2c-9498-01c870997797
Baseline:-
[image] https://github.com/user-attachments/assets/635b6f40-7404-4438-b4b5-78d86354f112
With this patch:-
[image] https://github.com/user-attachments/assets/e56f910b-8e96-4332-b8b4-78428ce1a9ee
With additional RA biasing:-
[image] https://github.com/user-attachments/assets/14271ea0-63d5-4146-935d-c0b2c6de6c52

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26997#issuecomment-3242279829

From thartmann at openjdk.org  Mon Sep  1 13:14:50 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 1 Sep 2025 13:14:50 GMT
Subject: RFR: 8366118: DontCompileHugeMethods is not respected with
 -XX:-TieredCompilation [v5]
In-Reply-To: <kaYxfqjgve0ZNzFAQ3P4s3tJjsaiN3StSdDVj7C71gs=.acbbf467-df7c-4ec8-a6a4-b997ce25c163@github.com>
References: <KeZJJ4fwtgBQCPof9uvquApw5fUZ75EjT2KhM3_ZpEU=.e28f986e-61b8-40d4-b812-58cfae0e2270@github.com>
 <kaYxfqjgve0ZNzFAQ3P4s3tJjsaiN3StSdDVj7C71gs=.acbbf467-df7c-4ec8-a6a4-b997ce25c163@github.com>
Message-ID: <w-bNwLEri71gh9yUJfGGfnEsja0RMMPyc9WpvJeqldQ=.e23cd22f-f2e7-4a08-9220-e94deb44d68e@github.com>

On Fri, 29 Aug 2025 23:12:18 GMT, Man Cao <manc at openjdk.org> wrote:

>> Hi,
>> 
>> Could anyone review this change that fixes https://bugs.openjdk.org/browse/JDK-8366118? When this bug happens, it is difficult or almost impossible to debug due to the lack of stack trace, hs-err log or core dump. Fortunately we are also experimenting with sigaltstack for https://bugs.openjdk.org/browse/JDK-8364654, and it helped immensely to identify the root cause.
>> 
>> I will also try adding a test case for DontCompileHugeMethods under -XX:-TieredCompilation.
>> 
>> -Man
>
> Man Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8366118-DontCompileHugeMethods
>  - Add -Xbatch to test
>  - Use List.of in test
>  - Add a jtreg test
>  - 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation

Thanks! Testing all passed now on our side.
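The gate at issue is simple. With `DontCompileHugeMethods` on (the HotSpot default), methods whose bytecode is larger than `HugeMethodLimit` (8000 bytes by default) should stay interpreted, whether or not tiered compilation is enabled. Per the title, the bug was that this was not respected under `-XX:-TieredCompilation`. The sketch below is a minimal model of the intended check only. It is not `CompilationPolicy` code and says nothing about where the fix places the check.

```java
// Minimal model of the size gate; not HotSpot's CompilationPolicy implementation.
final class HugeMethodPolicy {
    static final int DEFAULT_HUGE_METHOD_LIMIT = 8000; // HotSpot default, in bytes of bytecode

    /** Intended behavior, independent of tiered vs. non-tiered compilation. */
    static boolean mayCompile(int bytecodeSize, boolean dontCompileHugeMethods, int hugeMethodLimit) {
        return !(dontCompileHugeMethods && bytecodeSize > hugeMethodLimit);
    }
}
```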

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26932#issuecomment-3242321745

From epeter at openjdk.org  Mon Sep  1 13:50:52 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 13:50:52 GMT
Subject: Integrated: 8366427: C2 SuperWord: refactor VTransform scalar nodes
In-Reply-To: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com>
References: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com>
Message-ID: <Up8umKXusDhPjsobO6szvG1MxCYm6q8jRN-RsTkDgBk=.256ba000-065b-4a5c-b743-c9eb06ed0f38@github.com>

On Fri, 29 Aug 2025 09:49:46 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
> https://github.com/openjdk/jdk/pull/20964
> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
> 
> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
> 
> The goal is to split up some cases that are currently treated the same but will later have different behavior. There may be a little bit of code duplication, but the code will soon be made different ;)
> 
> We split the `VTransformScalarNode`:
> - `VTransformMemopScalarNode`
>   - Uses that only want scalar memory nodes can now check directly for `isa_MemopScalar`.
>   - We can directly store the `_vpointer` in a field, that way we don't need to do a lookup via `vloop_analyzer`. This could also be helpful later on if we ever do widening (unrolling during auto vectorization): we could then do the necessary modifications to the `vpointer`.
> - `VTransformLoopPhiNode`
>   - Later on, they will play a more special role, they will give us easy access to the beginning state of the loop body and the backedges.
> - `VTransformCFGNode`
>   - Calling them scalar nodes is not 100% accurate. We'll probably have to further refine them later on. But splitting them off now seems like a reasonable choice. Once we do if-conversion we'll have to do more work on CFG.
> - `VTransformDataScalarNode`
>   - These represent all the normal "calculation" nodes in the loop.
> - `VTransformInputScalarNode` -> `VTransformOuterNode`:
>   - For now, we are still just tracking input nodes, but soon we will need to track input and output nodes: basically just the 1-hop neighbourhood of nodes outside the loop. I'm already renaming them now, so it will be less noise later.
> 
> I decided to split things up further and avoid the `VTransformScalarNode` altogether, so that we don't have to override overrides - that can be really confusing (e.g. what I had with `is_load_in_loop`).
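To make the split concrete, here is a heavily simplified, hypothetical C++ sketch. The class names follow the PR text, but the `VPointerInfo` struct, `name()` and `count_memops()` are invented for illustration and do not reflect the actual declarations in `vtransform.hpp`. It shows the two payoffs described for `VTransformMemopScalarNode`: callers ask `isa_MemopScalar()` directly, and the node carries its pointer information in a field instead of looking it up through an analyzer.

```cpp
#include <memory>
#include <vector>

// Hypothetical, simplified model of the split described above; not the
// actual declarations from vtransform.hpp.
class VTransformMemopScalarNode;

class VTransformNode {
 public:
  virtual ~VTransformNode() = default;
  virtual const char* name() const = 0;
  // Default answer for every node kind; only the memop subclass overrides it.
  virtual const VTransformMemopScalarNode* isa_MemopScalar() const { return nullptr; }
};

// Illustrative stand-in for the pointer information of a memory access.
struct VPointerInfo {
  int base_id;
  long offset;
};

class VTransformMemopScalarNode : public VTransformNode {
 public:
  explicit VTransformMemopScalarNode(VPointerInfo vp) : _vpointer(vp) {}
  const char* name() const override { return "MemopScalar"; }
  const VTransformMemopScalarNode* isa_MemopScalar() const override { return this; }
  // Kept in a field, so no lookup through an analyzer object is needed.
  const VPointerInfo& vpointer() const { return _vpointer; }

 private:
  VPointerInfo _vpointer;
};

class VTransformDataScalarNode : public VTransformNode {  // "calculation" nodes
 public:
  const char* name() const override { return "DataScalar"; }
};

class VTransformLoopPhiNode : public VTransformNode {
 public:
  const char* name() const override { return "LoopPhi"; }
};

class VTransformCFGNode : public VTransformNode {
 public:
  const char* name() const override { return "CFG"; }
};

class VTransformOuterNode : public VTransformNode {  // formerly InputScalar
 public:
  const char* name() const override { return "Outer"; }
};

// A use that only cares about scalar memory nodes asks directly.
int count_memops(const std::vector<std::unique_ptr<VTransformNode>>& nodes) {
  int count = 0;
  for (const auto& node : nodes) {
    if (node->isa_MemopScalar() != nullptr) {
      count++;
    }
  }
  return count;
}
```

HotSpot allocates such nodes in arenas; `std::unique_ptr` is used here only to keep the sketch self-contained.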

This pull request has now been integrated.

Changeset: 99223eea
Author:    Emanuel Peter <epeter at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/99223eea03e2ed714f7a5408c356fdf06efc9200
Stats:     157 lines in 4 files changed: 114 ins; 0 del; 43 mod

8366427: C2 SuperWord: refactor VTransform scalar nodes

Reviewed-by: mhaessig, chagedorn, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/27002

From chagedorn at openjdk.org  Mon Sep  1 13:50:51 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 1 Sep 2025 13:50:51 GMT
Subject: RFR: 8366427: C2 SuperWord: refactor VTransform scalar nodes [v3]
In-Reply-To: <siGeSxq7xrjkgXyW7YR27NvHcQV-cT0Xxf9MlVGqYyI=.4fc0f8d3-244c-4fad-a79b-4edafb448a18@github.com>
References: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com>
 <siGeSxq7xrjkgXyW7YR27NvHcQV-cT0Xxf9MlVGqYyI=.4fc0f8d3-244c-4fad-a79b-4edafb448a18@github.com>
Message-ID: <1sMQoalYAvK2WnOtHEzQP6PFAvYM9BvF6cLfDCai7Xc=.b298d47f-fcc8-413f-bd3b-2dcebd3f099b@github.com>

On Mon, 1 Sep 2025 08:58:57 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
>> https://github.com/openjdk/jdk/pull/20964
>> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
>> 
>> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
>> 
>> The goal is to split up some cases that are currently treated the same, but will later have different behavior. There may be a little bit of code duplication, but the code will soon be made different ;)
>> 
>> We split the `VTransformScalarNode`:
>> - `VTransformMemopScalarNode`
>>   - Uses that only wanted scalar mem nodes can now directly check for `isa_MemopScalar`.
>>   - We can directly store the `_vpointer` in a field, that way we don't need to do a lookup via `vloop_analyzer`. This could also be helpful later on if we ever do widening (unrolling during auto vectorization): we could then do the necessary modifications to the `vpointer`.
>> - `VTransformLoopPhiNode`
>>   - Later on, they will play a more special role, they will give us easy access to the beginning state of the loop body and the backedges.
>> - `VTransformCFGNode`
>>   - Calling them scalar nodes is not 100% accurate. We'll probably have to further refine them later on. But splitting them off now seems like a reasonable choice. Once we do if-conversion we'll have to do more work on CFG.
>> - `VTransformDataScalarNode`
>>   - These represent all the normal "calculation" nodes in the loop.
>> - `VTransformInputScalarNode` -> `VTransformOuterNode`:
>>   - For now, we are still just tracking input nodes, but soon we will need to track input and output nodes: basically just the 1-hop neighbourhood of nodes outside the loop. I'm already renaming them now, so it will be less noise later.
>> 
>> I decided to split things up further and avoid the `VTransformScalarNode` altogether, so that we don't have to override overrides - that can be really confusing (e.g. what I had with `is_load_in_loop`).
>
> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits:
> 
>  - Merge branch 'JDK-8366427-VTransform-scalar-node-refactor' of https://github.com/eme64/jdk into JDK-8366427-VTransform-scalar-node-refactor
>  - Update src/hotspot/share/opto/vtransform.hpp
>    
>    Co-authored-by: Manuel Hässig <manuel at haessig.org>
>  - manual merge
>  - improve print_spec
>  - rm comment
>  - InputScalar -> Outer renaming
>  - rm useless methods
>  - rm vloop_analyzer from vpointer method
>  - JDK-8366427
>  - JDK-8366361
>  - ... and 3 more: https://git.openjdk.org/jdk/compare/56713817...86e88f43

Looks good to me, too!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27002#pullrequestreview-3173752945

From epeter at openjdk.org  Mon Sep  1 13:50:51 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 13:50:51 GMT
Subject: RFR: 8366427: C2 SuperWord: refactor VTransform scalar nodes [v3]
In-Reply-To: <1sMQoalYAvK2WnOtHEzQP6PFAvYM9BvF6cLfDCai7Xc=.b298d47f-fcc8-413f-bd3b-2dcebd3f099b@github.com>
References: <0BaZ4QsDU5cQnZpcb3WzmX8UDIaomZOKkg0_BjuzLJY=.1d891297-dc22-4c79-a951-5d7456bac0cd@github.com>
 <siGeSxq7xrjkgXyW7YR27NvHcQV-cT0Xxf9MlVGqYyI=.4fc0f8d3-244c-4fad-a79b-4edafb448a18@github.com>
 <1sMQoalYAvK2WnOtHEzQP6PFAvYM9BvF6cLfDCai7Xc=.b298d47f-fcc8-413f-bd3b-2dcebd3f099b@github.com>
Message-ID: <CAEcmhh8E7eOp5hO6pe1ekyfEVB7gsiqs7ZIZZ1FD4g=.9f1fafdf-c83f-49cd-94db-32141cf0eac1@github.com>

On Mon, 1 Sep 2025 13:46:23 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits:
>> 
>>  - Merge branch 'JDK-8366427-VTransform-scalar-node-refactor' of https://github.com/eme64/jdk into JDK-8366427-VTransform-scalar-node-refactor
>>  - Update src/hotspot/share/opto/vtransform.hpp
>>    
>>    Co-authored-by: Manuel Hässig <manuel at haessig.org>
>>  - manual merge
>>  - improve print_spec
>>  - rm comment
>>  - InputScalar -> Outer renaming
>>  - rm useless methods
>>  - rm vloop_analyzer from vpointer method
>>  - JDK-8366427
>>  - JDK-8366361
>>  - ... and 3 more: https://git.openjdk.org/jdk/compare/56713817...86e88f43
>
> Looks good to me, too!

@chhagedorn @mhaessig @vnkozlov Thanks a lot for all the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27002#issuecomment-3242452006

From epeter at openjdk.org  Mon Sep  1 14:08:43 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 14:08:43 GMT
Subject: RFR: 8362117: C2:
 compiler/stringopts/TestStackedConcatsAppendUncommonTrap.java fails with a
 wrong result due to invalidated liveness assumptions for data phis
In-Reply-To: <qvfOeUiGNqYakTHBhSYMZt1unv88E27bXXswOlWixOQ=.67376c9c-fe60-4263-82ca-4dd02dedad0c@github.com>
References: <qvfOeUiGNqYakTHBhSYMZt1unv88E27bXXswOlWixOQ=.67376c9c-fe60-4263-82ca-4dd02dedad0c@github.com>
Message-ID: <v-Ko3HnIS-z0nv67E4Xz52-GQDAqyU-jA69rq8VetAw=.43e7beee-d48d-4769-8e6b-78c04cb7770a@github.com>

On Mon, 1 Sep 2025 07:04:25 GMT, Daniel Skantz <dskantz at openjdk.org> wrote:

> This PR addresses a wrong compilation during string optimizations.
> 
> During stacked string concatenation of two StringBuilder links SB1 and SB2, the pattern "append -> Phi -> Region -> (True, False) -> If -> Bool -> CmpP -> Proj (Result) -> toString" may be observed, where toString is the end of SB1, and the simple diamond is part of SB2.
> 
> After JDK-8291775, the Bool test to the diamond If is set to a constant zero to allow for folding the simple diamond away during IGVN, while not letting the top() value from the result projection of SB1 propagate through the graph too quickly. The assumption was that any data Phi of the Region would go away during PhaseRemoveUseless as they are no longer live -- I think that in the case of JDK-8291775, the user of phi was the constructor of SB2. However, in the attached test case, the Phi stays live as it's a parameter (input to an append) of SB2 and will be used during the transformation in `copy_string`. When the diamond region is later folded, the Phi's user picks up the wrong input corresponding to the false branch.
> 
> The proposed solution is to disable the stacked concatenation optimization for this specific pattern. This might be pragmatic as it's an edge case and there's already a bug tail: JDK-8271341-> JDK-8291775 -> JDK-8362117.
> 
> Testing: T1-3 (aed5952).
> 
> Extra testing: ran T1-3 on Linux with an instrumented build and verified that the pattern I am excluding in this PR is not seen during any other compilation than that of the proposed regression test.
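A toy model (not C2 code) of the failure mode described in the PR text: a Phi at a two-way diamond merges one value per branch. Once the Bool feeding the If is replaced by constant 0, folding keeps only the IfFalse projection, so a Phi that is still live collapses to its false-branch input. `DiamondPhi`, `runtime_value()` and `folded_value()` are hypothetical names used only for this illustration.

```cpp
#include <string>

// Toy model, not C2 code: a Phi at a two-way diamond Region.
struct DiamondPhi {
  std::string true_input;   // value flowing in from the IfTrue path
  std::string false_input;  // value flowing in from the IfFalse path
};

// What the program should observe: the input chosen by the actual test.
std::string runtime_value(const DiamondPhi& phi, bool test) {
  return test ? phi.true_input : phi.false_input;
}

// After the Bool is replaced by constant 0, folding the If leaves only the
// IfFalse projection live, so a surviving Phi collapses to its false input.
std::string folded_value(const DiamondPhi& phi) {
  return phi.false_input;
}
```

Whenever the original test would have been true, a user of the folded Phi sees a different value from the one the program should observe.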

src/hotspot/share/opto/stringopts.cpp line 1072:

> 1070: 
> 1071:         // First exclude the following pattern:
> 1072:         // append -> Phi -> Region -> (True, False) -> If -> Bool -> CmpP -> Proj (Result) -> toString;

2 things:
- I was a bit confused about the `->` directionality. Just to confirm: `toString` happens first, then the if-diamond, then the append, right? If yes: I would have reversed the order here. Then again, I'm not super familiar with string opts, so maybe the convention is different here than elsewhere.
- Are you sure this can only happen with diamonds? What about nested diamonds?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27028#discussion_r2314059400

From epeter at openjdk.org  Mon Sep  1 14:13:42 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 14:13:42 GMT
Subject: RFR: 8362117: C2:
 compiler/stringopts/TestStackedConcatsAppendUncommonTrap.java fails with a
 wrong result due to invalidated liveness assumptions for data phis
In-Reply-To: <qvfOeUiGNqYakTHBhSYMZt1unv88E27bXXswOlWixOQ=.67376c9c-fe60-4263-82ca-4dd02dedad0c@github.com>
References: <qvfOeUiGNqYakTHBhSYMZt1unv88E27bXXswOlWixOQ=.67376c9c-fe60-4263-82ca-4dd02dedad0c@github.com>
Message-ID: <wi5AksobK_nr04pZtV6gaNUdjvVL3V0ZIIsaVg4XIvI=.213906bf-52a8-485e-baa0-1f72138b2b55@github.com>

On Mon, 1 Sep 2025 07:04:25 GMT, Daniel Skantz <dskantz at openjdk.org> wrote:

> This PR addresses a wrong compilation during string optimizations.
> 
> During stacked string concatenation of two StringBuilder links SB1 and SB2, the pattern "append -> Phi -> Region -> (True, False) -> If -> Bool -> CmpP -> Proj (Result) -> toString" may be observed, where toString is the end of SB1, and the simple diamond is part of SB2.
> 
> After JDK-8291775, the Bool test to the diamond If is set to a constant zero to allow for folding the simple diamond away during IGVN, while not letting the top() value from the result projection of SB1 propagate through the graph too quickly. The assumption was that any data Phi of the Region would go away during PhaseRemoveUseless as they are no longer live -- I think that in the case of JDK-8291775, the user of phi was the constructor of SB2. However, in the attached test case, the Phi stays live as it's a parameter (input to an append) of SB2 and will be used during the transformation in `copy_string`. When the diamond region is later folded, the Phi's user picks up the wrong input corresponding to the false branch.
> 
> The proposed solution is to disable the stacked concatenation optimization for this specific pattern. This might be pragmatic as it's an edge case and there's already a bug tail: JDK-8271341-> JDK-8291775 -> JDK-8362117.
> 
> Testing: T1-3 (aed5952).
> 
> Extra testing: ran T1-3 on Linux with an instrumented build and verified that the pattern I am excluding in this PR is not seen during any other compilation than that of the proposed regression test.

test/hotspot/jtreg/compiler/stringopts/TestStackedConcatsPhiUseOfDiamondRegion.java line 57:

> 55:         return s;
> 56:     }
> 57: }

I wonder if we could write some kind of `StringBuilder` fuzzer. Not saying it has to happen as part of this fix. But it seems we have issues with very similar patterns. And they seem quite basic: chains, diamonds, etc.

Would probably not be too hard to use the template framework to generate some random shapes, and verify the result the compiled code gives vs the interpreter.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27028#discussion_r2314076685

From epeter at openjdk.org  Mon Sep  1 14:17:47 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 14:17:47 GMT
Subject: RFR: 8362117: C2:
 compiler/stringopts/TestStackedConcatsAppendUncommonTrap.java fails with a
 wrong result due to invalidated liveness assumptions for data phis
In-Reply-To: <qvfOeUiGNqYakTHBhSYMZt1unv88E27bXXswOlWixOQ=.67376c9c-fe60-4263-82ca-4dd02dedad0c@github.com>
References: <qvfOeUiGNqYakTHBhSYMZt1unv88E27bXXswOlWixOQ=.67376c9c-fe60-4263-82ca-4dd02dedad0c@github.com>
Message-ID: <oUdbg0W1L3xYxpOAWzr6-As8Podjed7rUNmjB7d2FH0=.a2fb4b1e-4838-4cff-9c19-1af9aac58a6f@github.com>

On Mon, 1 Sep 2025 07:04:25 GMT, Daniel Skantz <dskantz at openjdk.org> wrote:

> This PR addresses a wrong compilation during string optimizations.
> 
> During stacked string concatenation of two StringBuilder links SB1 and SB2, the pattern "append -> Phi -> Region -> (True, False) -> If -> Bool -> CmpP -> Proj (Result) -> toString" may be observed, where toString is the end of SB1, and the simple diamond is part of SB2.
> 
> After JDK-8291775, the Bool test to the diamond If is set to a constant zero to allow for folding the simple diamond away during IGVN, while not letting the top() value from the result projection of SB1 propagate through the graph too quickly. The assumption was that any data Phi of the Region would go away during PhaseRemoveUseless as they are no longer live -- I think that in the case of JDK-8291775, the user of phi was the constructor of SB2. However, in the attached test case, the Phi stays live as it's a parameter (input to an append) of SB2 and will be used during the transformation in `copy_string`. When the diamond region is later folded, the Phi's user picks up the wrong input corresponding to the false branch.
> 
> The proposed solution is to disable the stacked concatenation optimization for this specific pattern. This might be pragmatic as it's an edge case and there's already a bug tail: JDK-8271341-> JDK-8291775 -> JDK-8362117.
> 
> Testing: T1-3 (aed5952).
> 
> Extra testing: ran T1-3 on Linux with an instrumented build and verified that the pattern I am excluding in this PR is not seen during any other compilation than that of the proposed regression test.

@danielogh Thanks for working on this!
I'd love to review, but I'm not very familiar with string opts. Would you mind explaining in a bit more detail what would have gone wrong here?
> Phi stays live as it's a parameter (input to an append) of SB2 and will be used during the transformation in copy_string. When the diamond region is later folded, the Phi's user picks up the wrong input corresponding to the false branch.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27028#issuecomment-3242540283

From epeter at openjdk.org  Mon Sep  1 14:29:42 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 14:29:42 GMT
Subject: RFR: 8362117: C2:
 compiler/stringopts/TestStackedConcatsAppendUncommonTrap.java fails with a
 wrong result due to invalidated liveness assumptions for data phis
In-Reply-To: <v-Ko3HnIS-z0nv67E4Xz52-GQDAqyU-jA69rq8VetAw=.43e7beee-d48d-4769-8e6b-78c04cb7770a@github.com>
References: <qvfOeUiGNqYakTHBhSYMZt1unv88E27bXXswOlWixOQ=.67376c9c-fe60-4263-82ca-4dd02dedad0c@github.com>
 <v-Ko3HnIS-z0nv67E4Xz52-GQDAqyU-jA69rq8VetAw=.43e7beee-d48d-4769-8e6b-78c04cb7770a@github.com>
Message-ID: <nvCZB8H5UTln8uVLdxv-FMn2itWbVaIpu40Rc8JVGwk=.934f78ed-37ef-427c-a602-c611f9843786@github.com>

On Mon, 1 Sep 2025 14:05:58 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> This PR addresses a wrong compilation during string optimizations.
>> 
>> During stacked string concatenation of two StringBuilder links SB1 and SB2, the pattern "append -> Phi -> Region -> (True, False) -> If -> Bool -> CmpP -> Proj (Result) -> toString" may be observed, where toString is the end of SB1, and the simple diamond is part of SB2.
>> 
>> After JDK-8291775, the Bool test to the diamond If is set to a constant zero to allow for folding the simple diamond away during IGVN, while not letting the top() value from the result projection of SB1 propagate through the graph too quickly. The assumption was that any data Phi of the Region would go away during PhaseRemoveUseless as they are no longer live -- I think that in the case of JDK-8291775, the user of phi was the constructor of SB2. However, in the attached test case, the Phi stays live as it's a parameter (input to an append) of SB2 and will be used during the transformation in `copy_string`. When the diamond region is later folded, the Phi's user picks up the wrong input corresponding to the false branch.
>> 
>> The proposed solution is to disable the stacked concatenation optimization for this specific pattern. This might be pragmatic as it's an edge case and there's already a bug tail: JDK-8271341-> JDK-8291775 -> JDK-8362117.
>> 
>> Testing: T1-3 (aed5952).
>> 
>> Extra testing: ran T1-3 on Linux with an instrumented build and verified that the pattern I am excluding in this PR is not seen during any other compilation than that of the proposed regression test.
>
> src/hotspot/share/opto/stringopts.cpp line 1072:
> 
>> 1070: 
>> 1071:         // First exclude the following pattern:
>> 1072:         // append -> Phi -> Region -> (True, False) -> If -> Bool -> CmpP -> Proj (Result) -> toString;
> 
> 2 things:
> - I was a bit confused about the `->` directionality. Just to confirm: `toString` happens first, then the if-diamond, then the append, right? If yes: I would have reversed the order here. Then again, I'm not super familiar with string opts, so maybe the convention is different here than elsewhere.
> - Are you sure this can only happen with diamonds? What about nested diamonds?

Ah I see the condition above already checks that it can only be a diamond.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27028#discussion_r2314106464

From epeter at openjdk.org  Mon Sep  1 14:29:43 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 1 Sep 2025 14:29:43 GMT
Subject: RFR: 8362117: C2:
 compiler/stringopts/TestStackedConcatsAppendUncommonTrap.java fails with a
 wrong result due to invalidated liveness assumptions for data phis
In-Reply-To: <qvfOeUiGNqYakTHBhSYMZt1unv88E27bXXswOlWixOQ=.67376c9c-fe60-4263-82ca-4dd02dedad0c@github.com>
References: <qvfOeUiGNqYakTHBhSYMZt1unv88E27bXXswOlWixOQ=.67376c9c-fe60-4263-82ca-4dd02dedad0c@github.com>
Message-ID: <oRA8tNRQrMCREdXLV0JL2Bxul7GFozc_YAb7z-5QaB0=.ab7a9ded-212a-4d63-896a-a5f72bca7ebb@github.com>

On Mon, 1 Sep 2025 07:04:25 GMT, Daniel Skantz <dskantz at openjdk.org> wrote:

> This PR addresses a wrong compilation during string optimizations.
> 
> During stacked string concatenation of two StringBuilder links SB1 and SB2, the pattern "append -> Phi -> Region -> (True, False) -> If -> Bool -> CmpP -> Proj (Result) -> toString" may be observed, where toString is the end of SB1, and the simple diamond is part of SB2.
> 
> After JDK-8291775, the Bool test to the diamond If is set to a constant zero to allow for folding the simple diamond away during IGVN, while not letting the top() value from the result projection of SB1 propagate through the graph too quickly. The assumption was that any data Phi of the Region would go away during PhaseRemoveUseless as they are no longer live -- I think that in the case of JDK-8291775, the user of phi was the constructor of SB2. However, in the attached test case, the Phi stays live as it's a parameter (input to an append) of SB2 and will be used during the transformation in `copy_string`. When the diamond region is later folded, the Phi's user picks up the wrong input corresponding to the false branch.
> 
> The proposed solution is to disable the stacked concatenation optimization for this specific pattern. This might be pragmatic as it's an edge case and there's already a bug tail: JDK-8271341-> JDK-8291775 -> JDK-8362117.
> 
> Testing: T1-3 (aed5952).
> 
> Extra testing: ran T1-3 on Linux with an instrumented build and verified that the pattern I am excluding in this PR is not seen during any other compilation than that of the proposed regression test.

src/hotspot/share/opto/stringopts.cpp line 1078:

> 1076:         assert(ptr->in(1)->in(0)->in(1)->is_Bool(), "unexpected if shape");
> 1077:         Node* v1 = ptr->in(1)->in(0)->in(1)->in(1)->in(1);
> 1078:         Node* v2 = ptr->in(1)->in(0)->in(1)->in(1)->in(2);

You may want to use some intermediate results and give them names.
For example:
`Node* iff = ptr->in(1)->in(0)`
You seem to make an assumption that the input of the bool is a cmp, right? Did you check that? Or is it somehow guaranteed? What if in some edge-case of an edge-case it is something else that has only one input? Could that happen?
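A sketch of the naming suggestion, using a minimal stand-in `Node` type (hypothetical; it only mimics `in(i)` indexing and is not HotSpot's `Node`). It assumes `ptr` is the diamond's Region, which is what the `in()` chain in the patch suggests (Region -> projection -> If -> Bool -> Cmp), and it checks the node kind at every hop instead of assuming that the Bool's input is a Cmp. Real C2 code would use the existing class queries such as `is_Bool()` instead of the `Op` enum.

```cpp
#include <vector>

// Minimal stand-in for an IR node (not HotSpot's Node): a kind tag plus
// ordered inputs, with in(i) returning nullptr when i is out of range.
enum class Op { Region, IfTrue, IfFalse, If, Bool, CmpP, Other };

struct Node {
  Op op;
  std::vector<Node*> inputs;
  Node* in(unsigned i) const { return i < inputs.size() ? inputs[i] : nullptr; }
  bool is(Op o) const { return op == o; }
};

struct CmpOperands {
  Node* v1 = nullptr;
  Node* v2 = nullptr;
};

// Region -> projection -> If -> Bool -> Cmp, with a name and a kind check
// at every hop. Returns false instead of reading past an unexpected node.
bool match_diamond_cmp(const Node* region, CmpOperands* out) {
  if (region == nullptr || !region->is(Op::Region)) return false;
  const Node* proj = region->in(1);
  if (proj == nullptr || !(proj->is(Op::IfTrue) || proj->is(Op::IfFalse))) return false;
  const Node* iff = proj->in(0);
  if (iff == nullptr || !iff->is(Op::If)) return false;
  const Node* bol = iff->in(1);
  if (bol == nullptr || !bol->is(Op::Bool)) return false;
  const Node* cmp = bol->in(1);
  if (cmp == nullptr || !cmp->is(Op::CmpP)) return false;  // checked, not assumed
  out->v1 = cmp->in(1);
  out->v2 = cmp->in(2);
  return out->v1 != nullptr && out->v2 != nullptr;
}
```

With a non-Cmp input to the Bool, the matcher simply reports no match rather than reading `in(2)` from a node that may have only one input.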

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27028#discussion_r2314106612

From aph at openjdk.org  Mon Sep  1 14:38:41 2025
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 1 Sep 2025 14:38:41 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats
In-Reply-To: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
Message-ID: <KE8_jx6V7-oTcCoQKwZ0Gyzdr2qPbWETxrneZvPAxHs=.bd39c34f-972e-4ecc-bc16-eca3fcfc175d@github.com>

On Wed, 27 Aug 2025 01:34:25 GMT, erifan <duke at openjdk.org> wrote:

> The `sve_cpy` instruction is not correctly implemented for negative floating-point values. The issues include:
> 
> 1. When a negative floating-point number (e.g. `-1.0`) is passed, the `checked_cast<int8_t>(pack(d))` check fails. For example, assume `d = -1.0`:
> - `pack(-1.0)` returns an unsigned int with the 7th bit set, i.e., `0xf0`.
> - `checked_cast<int8_t>(0xf0)` casts `0xf0` to an int8_t value, which is `-16`.
> - Casting this int8_t `-16` back to unsigned int results in `0xfffffff0`.
> - The check compares `0xf0` to `0xfffffff0`, which obviously fails.
> 
> 2. Additionally, the encoding of the negative floating-point number is incorrect:
> - The imm8 field can fall outside the valid range of **[-128, 127]**.
> - Bit **13** should be encoded as **0** for floating-point numbers.
> 
> This PR fixes these issues and renames floating-point `sve_cpy` as `sve_fcpy`.
> 
> Some test cases are added to aarch64-asmtest.py, and all tests passed.
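The arithmetic in item 1 can be reproduced with a small standalone model. `fp_imm8()` is an illustrative re-implementation of the AArch64 8-bit floating-point immediate format (sign bit, 3-bit exponent, 4-bit fraction), not HotSpot's `pack()`, and `roundtrips_through_int8()` models the narrow-and-widen comparison described above; both names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Illustrative re-implementation (not HotSpot's pack()) of the AArch64 8-bit
// FP immediate: bit 7 = sign, bits 6:4 = exponent, bits 3:0 = fraction, for
// values +/-(16 + f) / 16 * 2^n with n in [-3, 4]. Returns -1 if unencodable.
int fp_imm8(double d) {
  if (d == 0.0 || !std::isfinite(d)) return -1;
  unsigned sign = std::signbit(d) ? 1u : 0u;
  int e = 0;
  double m = std::frexp(std::fabs(d), &e);  // |d| = m * 2^e, m in [0.5, 1)
  int n = e - 1;                            // |d| = (2m) * 2^n, 2m in [1, 2)
  double frac = (2.0 * m - 1.0) * 16.0;     // must be a whole number in [0, 15]
  if (n < -3 || n > 4 || frac != std::floor(frac)) return -1;
  unsigned exp3 = static_cast<unsigned>(n + 7) & 7u;
  return static_cast<int>((sign << 7) | (exp3 << 4) | static_cast<unsigned>(frac));
}

// Models the failing check: narrow to int8_t, widen back, compare.
// (The narrowing is modular since C++20; earlier it is implementation-
// defined but yields -16 for 0xf0 on mainstream compilers.)
bool roundtrips_through_int8(unsigned packed) {
  int8_t narrowed = static_cast<int8_t>(packed);
  return static_cast<unsigned>(narrowed) == packed;
}
```

`fp_imm8(-1.0)` yields `0xf0`, matching the value quoted above, and `roundtrips_through_int8(0xf0)` is false because the widened value is `0xfffffff0`. The positive `0x70` (the encoding of `1.0`) survives the round trip, which is why only negative values trip the check.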

Thanks.

I'm not convinced that the refactoring is necessary. Why not write a replacement for `checked_cast<int8_t>(pack(d))` that does the right thing and fix the first `sve_cpy()` so that it does the right thing for float args?
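One possible shape for such a replacement, sketched under assumptions: `checked_imm8_bits()` and `set_imm8()` are hypothetical helpers, and the imm8 field is assumed to sit at bits 12:5 of the instruction word, as in the SVE CPY/FCPY immediate forms. The comparison function shows one way a stray bit 13 could arise (OR-ing in a sign-extended value); whether that is exactly what happened in the original `sve_cpy` is not established here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep the 8-bit FP-immediate pattern as-is and only
// reject values that do not fit in 8 bits. Bit 7 (the FP sign) may be set.
uint32_t checked_imm8_bits(uint32_t packed) {
  assert(packed <= 0xffu && "FP immediate must fit in 8 bits");
  return packed;
}

// Assumption: the imm8 field occupies bits 12:5 of the instruction word.
constexpr uint32_t kImm8Shift = 5;
constexpr uint32_t kImm8Mask = 0xffu << kImm8Shift;

// Only bits 12:5 can change.
uint32_t set_imm8(uint32_t insn, uint32_t packed) {
  return (insn & ~kImm8Mask) | (checked_imm8_bits(packed) << kImm8Shift);
}

// Buggy variant for comparison: a sign-extended int8_t (-16 for 0xf0)
// spills ones into bit 13 and above.
uint32_t set_imm8_sign_extended(uint32_t insn, int8_t imm) {
  return insn | (static_cast<uint32_t>(static_cast<int32_t>(imm)) << kImm8Shift);
}
```

With `0xf0` (the encoding of `-1.0`), `set_imm8()` touches only bits 12:5, while the sign-extended variant also sets bit 13 and every bit above it.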

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26951#issuecomment-3242601638

From rehn at openjdk.org  Mon Sep  1 14:38:49 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Mon, 1 Sep 2025 14:38:49 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v2]
In-Reply-To: <LUj3nxfsDhJ2kTdroX_W8MZCHAlgjEtQ2byk-ke_Cos=.f8503d8f-02dc-442c-b871-1a7fa735dc93@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <BymtYSgSlXggycd57z3AGcv-BFeaLsawQuSXWo_93lc=.6dc39f16-8cc8-4956-8214-ba7bc9b648a3@github.com>
 <LUj3nxfsDhJ2kTdroX_W8MZCHAlgjEtQ2byk-ke_Cos=.f8503d8f-02dc-442c-b871-1a7fa735dc93@github.com>
Message-ID: <GKX55pfbOY1fAST3KbnIX3a-ZSHTg704ry9wrXbMbmQ=.4ebe7c56-6339-4982-a59b-1297f4a35732@github.com>

On Mon, 1 Sep 2025 10:01:58 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>> 
>>  - Merge branch 'master' into 8365926
>>  - Spelling
>>  - Merge branch 'master' into 8365926
>>  - draft jal<->jalr
>
> src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 110:
> 
>> 108:   // We changed instruction stream
>> 109:   if (mt_safe) {
>> 110:     OrderAccess::release();
> 
> If we have release here, do we still need the release in `set_stub_address_destination_at`?

From the JBS entry, the point is to do it in a sane order. The release in make_jal_opt is there to make sure the store to the instruction stream happens before the I-cache flush:

1: store destination to stub
2: release
3: store destination to instruction stream
4: release
5: i-cache flush
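The ordering above can be mirrored with C++11 fences. This is only an analogy: the C++ memory model says nothing about instruction fetch or I-cache coherence, `icache_flush()` is a placeholder for `ICache::invalidate_range`, and `PatchSite`, `patch_call()` and `read_destination_if_patched()` are hypothetical. The reader side shows what the first release buys: a thread that observes the new instruction and then issues an acquire fence is guaranteed to see the stub destination stored in step 1.

```cpp
#include <atomic>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative model only: two atomics stand in for the stub slot and the
// patched instruction; `trace` is single-threaded bookkeeping for the example.
struct PatchSite {
  std::atomic<uint64_t> stub_destination{0};
  std::atomic<uint32_t> instruction{0};
  std::vector<std::string> trace;
};

// Hypothetical placeholder for ICache::invalidate_range.
void icache_flush(PatchSite& site) { site.trace.push_back("flush"); }

void patch_call(PatchSite& site, uint64_t dest, uint32_t new_insn) {
  site.stub_destination.store(dest, std::memory_order_relaxed);  // 1
  site.trace.push_back("store stub");
  std::atomic_thread_fence(std::memory_order_release);           // 2
  site.trace.push_back("release");
  site.instruction.store(new_insn, std::memory_order_relaxed);   // 3
  site.trace.push_back("store insn");
  std::atomic_thread_fence(std::memory_order_release);           // 4
  site.trace.push_back("release");
  icache_flush(site);                                            // 5
}

// Reader side: if the new instruction is observed, the acquire fence pairs
// with the release fence in step 2, so the step-1 store is visible.
uint64_t read_destination_if_patched(PatchSite& site, uint32_t expected_insn) {
  if (site.instruction.load(std::memory_order_relaxed) != expected_insn) return 0;
  std::atomic_thread_fence(std::memory_order_acquire);
  return site.stub_destination.load(std::memory_order_relaxed);
}
```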

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26944#discussion_r2314129918

From rehn at openjdk.org  Mon Sep  1 14:47:42 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Mon, 1 Sep 2025 14:47:42 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v2]
In-Reply-To: <qsCdM9M_iBoyg5STxTJgwB8NGx3P5kqqDTPgncWzvW8=.05b756a4-a84a-4d8c-8641-de629692dbae@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <BymtYSgSlXggycd57z3AGcv-BFeaLsawQuSXWo_93lc=.6dc39f16-8cc8-4956-8214-ba7bc9b648a3@github.com>
 <qsCdM9M_iBoyg5STxTJgwB8NGx3P5kqqDTPgncWzvW8=.05b756a4-a84a-4d8c-8641-de629692dbae@github.com>
Message-ID: <JwURJGEePbyAN6WQ26BLgr6jhvCS-AXsPyMUl8jPdK8=.dd8b659a-f596-47d5-8be4-3539c107b88d@github.com>

On Mon, 1 Sep 2025 10:10:31 GMT, Hamlin Li <mli at openjdk.org> wrote:

> ```
> JDK-23 (last version with trampoline calls)
> Mean: 3189.5827
> Standard Deviation: 284.6478
> 
> JDK-25
> Mean: 3424.8905
> Standard Deviation: 222.2208
> 
> Patch:
> Mean: 3144.8535
> Standard Deviation: 229.2577
> ```
> 
> For the performance data, do you have some data for applying this fix on top of the next commit after `JDK-23 (last version with trampoline calls)`? I think this data might be more helpful to understand the performance comparison between old trampoline, stub and this PR.

JDK-23 is the last released version with trampoline calls.
JDK-24 is the first released version with load calls.

What I can do is take an early JDK-24 pre-release build that has both and backport the fix to it... running that now...

> src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 111:
> 
>> 109:   if (mt_safe) {
>> 110:     OrderAccess::release();
>> 111:     ICache::invalidate_range(jal_pc, NativeInstruction::instruction_size);
> 
> should `jal_pc` be `instruction_address()`?

We have:

auipc   // instruction_address()                                                    # Never changed
ld      // instruction_address() + NativeInstruction::instruction_size              # Never changed
jal(r)  // instruction_address() + 2 * NativeInstruction::instruction_size (jal_pc) # jal<->jalr

We only change the instruction at "instruction_address() + 2 * NativeInstruction::instruction_size".

Note that jal_pos and jal_pc refer to a "jump and link" instruction in general, not specifically jal or jalr.

Does that make sense?
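The layout above, written out as a small sketch. It assumes 4-byte (uncompressed) instructions, with `kInstructionSize` standing in for `NativeInstruction::instruction_size`; the helper names are hypothetical. It shows why the invalidated range starts at `jal_pc` rather than at `instruction_address()`: only the third slot is rewritten.

```cpp
#include <cstdint>

// Assumption: 4-byte (uncompressed) RISC-V instructions.
constexpr uint64_t kInstructionSize = 4;  // stand-in for NativeInstruction::instruction_size

// Slot addresses relative to instruction_address() (`base`).
constexpr uint64_t auipc_pc(uint64_t base) { return base; }                       // never changed
constexpr uint64_t ld_pc(uint64_t base) { return base + kInstructionSize; }       // never changed
constexpr uint64_t jal_pc(uint64_t base) { return base + 2 * kInstructionSize; }  // jal <-> jalr

// Only the jump-and-link slot is rewritten, so only one instruction's
// worth of bytes, starting at jal_pc, needs invalidating.
struct Range {
  uint64_t start;
  uint64_t size;
};

constexpr Range invalidate_range_for_patch(uint64_t base) {
  return Range{jal_pc(base), kInstructionSize};
}
```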

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26944#issuecomment-3242627359
PR Review Comment: https://git.openjdk.org/jdk/pull/26944#discussion_r2314145607

From dskantz at openjdk.org  Mon Sep  1 15:31:47 2025
From: dskantz at openjdk.org (Daniel Skantz)
Date: Mon, 1 Sep 2025 15:31:47 GMT
Subject: RFR: 8362117: C2:
 compiler/stringopts/TestStackedConcatsAppendUncommonTrap.java fails with a
 wrong result due to invalidated liveness assumptions for data phis
In-Reply-To: <oRA8tNRQrMCREdXLV0JL2Bxul7GFozc_YAb7z-5QaB0=.ab7a9ded-212a-4d63-896a-a5f72bca7ebb@github.com>
References: <qvfOeUiGNqYakTHBhSYMZt1unv88E27bXXswOlWixOQ=.67376c9c-fe60-4263-82ca-4dd02dedad0c@github.com>
 <oRA8tNRQrMCREdXLV0JL2Bxul7GFozc_YAb7z-5QaB0=.ab7a9ded-212a-4d63-896a-a5f72bca7ebb@github.com>
Message-ID: <tp1qnQO66qvJFfavtY5Kv832ugNN-p1RdruGwV4zIqU=.0fb319a3-29de-4431-bda3-719fe6b17e1e@github.com>

On Mon, 1 Sep 2025 14:24:51 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> This PR addresses a wrong compilation during string optimizations.
>> 
>> During stacked string concatenation of two StringBuilder links SB1 and SB2, the pattern "append -> Phi -> Region -> (True, False) -> If -> Bool -> CmpP -> Proj (Result) -> toString" may be observed, where toString is the end of SB1, and the simple diamond is part of SB2.
>> 
>> After JDK-8291775, the Bool test to the diamond If is set to a constant zero to allow for folding the simple diamond away during IGVN, while not letting the top() value from the result projection of SB1 propagate through the graph too quickly. The assumption was that any data Phi of the Region would go away during PhaseRemoveUseless as they are no longer live -- I think that in the case of JDK-8291775, the user of phi was the constructor of SB2. However, in the attached test case, the Phi stays live as it's a parameter (input to an append) of SB2 and will be used during the transformation in `copy_string`. When the diamond region is later folded, the Phi's user picks up the wrong input corresponding to the false branch.
>> 
>> The proposed solution is to disable the stacked concatenation optimization for this specific pattern. This might be pragmatic as it's an edge case and there's already a bug tail: JDK-8271341-> JDK-8291775 -> JDK-8362117.
>> 
>> Testing: T1-3 (aed5952).
>> 
>> Extra testing: ran T1-3 on Linux with an instrumented build and verified that the pattern I am excluding in this PR is not seen during any other compilation than that of the proposed regression test.
>
> src/hotspot/share/opto/stringopts.cpp line 1078:
> 
>> 1076:         assert(ptr->in(1)->in(0)->in(1)->is_Bool(), "unexpected if shape");
>> 1077:         Node* v1 = ptr->in(1)->in(0)->in(1)->in(1)->in(1);
>> 1078:         Node* v2 = ptr->in(1)->in(0)->in(1)->in(1)->in(2);
> 
> You may want to use some intermediate results and give them names.
> For example:
> `Node* iff = ptr->in(1)->in(0)`
> You seem to make an assumption that the input of the bool is a cmp, right? Did you check that? Or is it somehow guaranteed? What if in some edge-case of an edge-case it is something else that has only one input? Could that happen?

I'm not sure if there is a guarantee, but it appears to be a pre-existing assumption that is asserted later in `eliminate_unneeded_control`: https://github.com/openjdk/jdk/blob/b06459d3a83c13c0fbc7a0a7698435f17265982e/src/hotspot/share/opto/stringopts.cpp#L268

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27028#discussion_r2314233957

From dskantz at openjdk.org  Mon Sep  1 15:25:43 2025
From: dskantz at openjdk.org (Daniel Skantz)
Date: Mon, 1 Sep 2025 15:25:43 GMT
Subject: RFR: 8362117: C2:
 compiler/stringopts/TestStackedConcatsAppendUncommonTrap.java fails with a
 wrong result due to invalidated liveness assumptions for data phis
In-Reply-To: <wi5AksobK_nr04pZtV6gaNUdjvVL3V0ZIIsaVg4XIvI=.213906bf-52a8-485e-baa0-1f72138b2b55@github.com>
References: <qvfOeUiGNqYakTHBhSYMZt1unv88E27bXXswOlWixOQ=.67376c9c-fe60-4263-82ca-4dd02dedad0c@github.com>
 <wi5AksobK_nr04pZtV6gaNUdjvVL3V0ZIIsaVg4XIvI=.213906bf-52a8-485e-baa0-1f72138b2b55@github.com>
Message-ID: <76Nc115yY4tjDPTNDrfY6LrtPvFevss4IVo6D-0abOg=.4bffc940-f1ac-40b0-a892-7a1d5bbd39ca@github.com>

On Mon, 1 Sep 2025 14:10:56 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> This PR addresses a wrong compilation during string optimizations.
>> 
>> During stacked string concatenation of two StringBuilder links SB1 and SB2, the pattern "append -> Phi -> Region -> (True, False) -> If -> Bool -> CmpP -> Proj (Result) -> toString" may be observed, where toString is the end of SB1, and the simple diamond is part of SB2.
>> 
>> After JDK-8291775, the Bool test to the diamond If is set to a constant zero to allow for folding the simple diamond away during IGVN, while not letting the top() value from the result projection of SB1 propagate through the graph too quickly. The assumption was that any data Phi of the Region would go away during PhaseRemoveUseless as they are no longer live -- I think that in the case of JDK-8291775, the user of phi was the constructor of SB2. However, in the attached test case, the Phi stays live as it's a parameter (input to an append) of SB2 and will be used during the transformation in `copy_string`. When the diamond region is later folded, the Phi's user picks up the wrong input corresponding to the false branch.
>> 
>> The proposed solution is to disable the stacked concatenation optimization for this specific pattern. This might be pragmatic as it's an edge case and there's already a bug tail: JDK-8271341 -> JDK-8291775 -> JDK-8362117.
>> 
>> Testing: T1-3 (aed5952).
>> 
>> Extra testing: ran T1-3 on Linux with an instrumented build and verified that the pattern I am excluding in this PR is not seen during any other compilation than that of the proposed regression test.
>
> test/hotspot/jtreg/compiler/stringopts/TestStackedConcatsPhiUseOfDiamondRegion.java line 57:
> 
>> 55:         return s;
>> 56:     }
>> 57: }
> 
> I wonder if we could write some kind of `StringBuilder` fuzzer. Not saying it has to happen as part of this fix. But it seems we have issues with very similar patterns. And they seem quite basic: chains, diamonds, etc.
> 
> Would probably not be too hard to use the template framework to generate some random shapes, and verify the result the compiled code gives vs the interpreter.

I think this is a good idea for sure.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27028#discussion_r2314224094

From dlunden at openjdk.org  Mon Sep  1 15:58:01 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Mon, 1 Sep 2025 15:58:01 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v23]
In-Reply-To: <qfJuLa2rYGYnrmbp32LpJgVaZfShvNjVkGOuJrSw00A=.5f7b712d-5700-45b5-8beb-fde3611e31de@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <KuhZYofHDkGkzw1Kq6vDvRs4_aDxOJDbTpIL8gnkQL8=.0d25e4bc-1f73-490f-a65b-29bef7ac8903@github.com>
 <qfJuLa2rYGYnrmbp32LpJgVaZfShvNjVkGOuJrSw00A=.5f7b712d-5700-45b5-8beb-fde3611e31de@github.com>
Message-ID: <nTSapDgyDayBDpIY6CF9hZByKVT0PPLIrnaRFzUbEzY=.7b6937f4-1aec-4a3c-9c99-1614cbe47603@github.com>

On Wed, 27 Aug 2025 09:08:09 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Add clarifying comments at definitions of register mask sizes
>
>> For reference, here is now the changeset adding an IFG bailout: #26118
> 
> Since that is now integrated: do we need to make any changes to the patch here? I thought the goal was to use the bailouts instead of increasing `MaxNodeLimit`.
> 
> Because looking at the discussions above: we were worried that there could be compile-time regressions - even if quite rare. But they were in the range of 40s which is quite scary. Are these now gone?

Thanks @eme64!

> Do you think it would make sense to have more tests? I'm imagining something like this:
> 
> * Generate tests with 0-255 arguments. You could use the template framework.
> * Take different types (e.g. various primitive types, also those that take 2 stack slots like long and double). You could use the template library `PrimitiveType` if you want.
> * Test that we actually get the method compiled. Maybe an IR rule could be used here?
> * And do some rudimentary result verification
> * Make sure it does not just work with `Xcomp` but also under "normal" circumstances (tiered, profiling, etc).

Sure, I can expand upon the testing. It's also a good opportunity to have a look at the template framework. Note that for `TestMaxMethodArguments.java`, I do already check that it compiles via `-XX:+AbortVMOnCompilationFailure`.
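On the argument-count test idea: the JVM caps a method at 255 parameter slots (JVMS §4.3.3), counting `this` for instance methods, and `long`/`double` take two slots each, so a generator that mixes types has to budget slots rather than arguments. A minimal C++ sketch of that bookkeeping over a method descriptor follows; `parameter_slots` and `fits_slot_limit` are hypothetical helpers, not part of the template framework or HotSpot.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// JVMS 4.3.3: at most 255 parameter slots, counting `this` for instance
// methods; long and double contribute two slots each.
constexpr int kMaxParameterSlots = 255;

// Counts parameter slots in a descriptor such as "(IJD)V".
// Returns std::nullopt for descriptors this sketch cannot parse.
std::optional<int> parameter_slots(std::string_view desc, bool is_static) {
  if (desc.empty() || desc[0] != '(') return std::nullopt;
  int slots = is_static ? 0 : 1;  // receiver slot for instance methods
  std::size_t i = 1;
  while (i < desc.size() && desc[i] != ')') {
    bool is_array = false;
    while (i < desc.size() && desc[i] == '[') { is_array = true; i++; }
    if (i >= desc.size()) return std::nullopt;
    const char c = desc[i];
    if (c == 'L') {                     // object type: L<class name>;
      const std::size_t end = desc.find(';', i);
      if (end == std::string_view::npos) return std::nullopt;
      i = end + 1;
      slots += 1;
    } else if (c == 'J' || c == 'D') {  // two slots, unless it is an array element type
      slots += is_array ? 1 : 2;
      i++;
    } else if (c == 'B' || c == 'C' || c == 'F' || c == 'I' || c == 'S' || c == 'Z') {
      slots += 1;
      i++;
    } else {
      return std::nullopt;
    }
  }
  if (i >= desc.size()) return std::nullopt;  // no closing ')'
  return slots;
}

bool fits_slot_limit(std::string_view desc, bool is_static) {
  const std::optional<int> s = parameter_slots(desc, is_static);
  return s.has_value() && *s <= kMaxParameterSlots;
}
```

For example, 127 `long` parameters plus one `int` fill exactly 255 slots for a static method, while the same list is one slot too many for an instance method.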

> I'll look a bit at your VM changes now ;)

Thanks, I'll have a look and respond in the individual threads.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-3242808695

From dlunden at openjdk.org  Mon Sep  1 16:03:57 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Mon, 1 Sep 2025 16:03:57 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <NDDSmCvsbgpWgTU_bCIhNdo8foNn447LmTJ4HsCTv-s=.e0549027-7ac1-4794-bfce-322d3870f9d1@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
 <NDDSmCvsbgpWgTU_bCIhNdo8foNn447LmTJ4HsCTv-s=.e0549027-7ac1-4794-bfce-322d3870f9d1@github.com>
Message-ID: <8L3IGg5YYgi2EjlC-v5U3FkkWvK1swESQFAMwX02I84=.d597910f-0aca-4eb2-b68c-fbe565e73291@github.com>

On Mon, 1 Sep 2025 08:20:42 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/regmask.hpp line 63:
>> 
>>> 61: // RM_SIZE is the base size of a register mask in 32-bit words.
>>> 62: // RM_SIZE_MIN is the theoretical minimum size of a register mask in 32-bit
>>> 63: // words.
>> 
>> It seems this is a bad pattern that was already here before you. But it really makes me a little scared here.
>> 
>> Having two variable names differ in just an underscore `_` but with different semantics is a bit confusing to me. It is hard for the reader to keep track of what is what going forward. It would be really easy for someone to confuse the two in the future and have bugs creep in that way (just because of an underscore). It may be more useful to use the units in at least one of the two names.
>> 
>> I would love to see names like `RM_SIZE` and `RM_SIZE_IN_LONGS`, rather than `RM_SIZE` and `_RM_SIZE`.
>> Even better would be `RM_SIZE_IN_INTS` and `RM_SIZE_IN_LONGS`. That way, you could save a lot of comments. Maybe you could come up with even better names. "slots" and "words"?
>> You could consider doing a renaming PR first before the patch here. Maybe you can even automate the renaming with a command/script, and then apply the same renaming to the changes here?
>
> Oh gosh, I just realized: machine word of course depends on 32bit vs 64bit architecture. Yikes.
> So maybe the names need to be stack-slots vs words? And there should probably be a quick reminder somewhere that words can be different sizes.

Sure, we can rename them. I think `RM_SIZE_IN_INTS` and `RM_SIZE_IN_WORDS` would be most suitable. I avoided such a change in this changeset to not make it bigger than it already is. Wouldn't it be easier to do the renaming in a follow-up RFE instead of before this PR? I'm fine with either, though; it's not that much extra work to do it first.
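The unit-in-the-name idea could look roughly like this minimal sketch, which assumes the word view is derived from the int view by rounding up to whole machine words (32 or 64 bits). `ints_to_words` and the value 5 for `RM_SIZE_IN_INTS` are hypothetical; the `static_assert` spells out the relationship that the current `RM_SIZE`/`_RM_SIZE` pair leaves implicit.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned BitsPerInt = 32;
constexpr unsigned BitsPerMachineWord = sizeof(std::uintptr_t) * 8;  // 32 or 64

// Rounds a size given in 32-bit ints up to whole machine words.
constexpr unsigned ints_to_words(unsigned size_in_ints, unsigned bits_per_word) {
  return (size_in_ints * BitsPerInt + bits_per_word - 1) / bits_per_word;
}

// Hypothetical value for illustration; the real base size is platform-specific.
constexpr unsigned RM_SIZE_IN_INTS = 5;
constexpr unsigned RM_SIZE_IN_WORDS = ints_to_words(RM_SIZE_IN_INTS, BitsPerMachineWord);

static_assert(RM_SIZE_IN_WORDS * BitsPerMachineWord >= RM_SIZE_IN_INTS * BitsPerInt,
              "the word view must cover every bit of the int view");
```

With a unit in both names, a mix-up such as indexing a word array with an int count is visible at the call site rather than hidden behind a leading underscore.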

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2314286169

From rehn at openjdk.org  Mon Sep  1 16:11:34 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Mon, 1 Sep 2025 16:11:34 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v3]
In-Reply-To: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
Message-ID: <HI3yqmQd3DLFwwi8LMBRh-u6uMTZ-kTfYanxTiFlPx0=.e99fedad-3866-4b34-ac95-0dcdb7e07b8d@github.com>

> Hey, please consider!
> 
> A bunch of info in JBS entry, please read that also.
> 
> I narrowed this issue down to the old jal optimization, making direct calls when in reach.
> This patch restores them and removes this regression.
> 
> In essence we turn "jalr ra,0(t1)" into a "jal ra,<dest>" if reachable, and restore the jalr if a new destination is not reachable.
> 
> Please test on your hardware!
> 
> 
> Chi Square (100 runs each, 10 fastest iterations of each run, P550)
> JDK-23 (last version with trampoline calls)
> Mean: 3189.5827
> Standard Deviation: 284.6478
> 
> JDK-25
> Mean: 3424.8905
> Standard Deviation: 222.2208
> 
> Patch:
> Mean: 3144.8535
> Standard Deviation: 229.2577
> 
> 
> No issues found in t1, running t2 also. Stress tested on vf2, bpi-f3, p550.

Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:

  Review comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26944/files
  - new: https://git.openjdk.org/jdk/pull/26944/files/b81779cb..f0f7f20e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26944&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26944&range=01-02

  Stats: 9 lines in 1 file changed: 3 ins; 1 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/26944.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26944/head:pull/26944

PR: https://git.openjdk.org/jdk/pull/26944

From dlunden at openjdk.org  Mon Sep  1 16:11:44 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Mon, 1 Sep 2025 16:11:44 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v25]
In-Reply-To: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
Message-ID: <xIq-w0Zkn4KnLRENZtys8VnFtf2XTe-Y-pbku89u04U=.3ac52fd7-3a79-4c00-8385-dd417e172220@github.com>

> If a method has a large number of parameters, we currently bail out from C2 compilation.
> 
> ### Changeset
> 
> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
> 
> Changes:
> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
> - Remove all `can_represent` checks and bailouts.
> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
> 
> ### Testing
> 
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it).
> 
> ![c2-regression](https:/...
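The "static base plus on-demand extension" design and the early exit in `RegMask::overlap` can be illustrated with the following minimal C++ sketch. It is not the patch's code: `GrowableMask` is hypothetical, it uses `std::vector` where HotSpot would use arena allocation, and it ignores watermarks, offsets and `_all_stack`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not HotSpot's RegMask: a fixed inline base covers the
// common case, and extra words are allocated only when a high bit is set.
class GrowableMask {
  static constexpr unsigned kBaseWords = 2;
  std::uint64_t _base[kBaseWords] = {};
  std::vector<std::uint64_t> _ext;  // stays empty in the common case

  std::uint64_t word(unsigned i) const {
    if (i < kBaseWords) return _base[i];
    const unsigned j = i - kBaseWords;
    return j < _ext.size() ? _ext[j] : 0;
  }

 public:
  unsigned size_in_words() const {
    return kBaseWords + static_cast<unsigned>(_ext.size());
  }

  void insert(unsigned bit) {
    const unsigned w = bit / 64;
    const std::uint64_t m = std::uint64_t(1) << (bit % 64);
    if (w < kBaseWords) {
      _base[w] |= m;
      return;
    }
    const unsigned j = w - kBaseWords;
    if (j >= _ext.size()) _ext.resize(j + 1, 0);  // grow on demand
    _ext[j] |= m;
  }

  bool member(unsigned bit) const {
    return ((word(bit / 64) >> (bit % 64)) & 1) != 0;
  }

  // Words past the shorter mask are zero in it, so they cannot overlap;
  // the scan stops at the first word the two masks share.
  bool overlap(const GrowableMask& other) const {
    const unsigned n = std::min(size_in_words(), other.size_in_words());
    for (unsigned i = 0; i < n; i++) {
      if ((word(i) & other.word(i)) != 0) return true;  // early exit
    }
    return false;
  }
};
```

Masks that never exceed the base (the common case) never touch the extension, which is what keeps the usual path cheap.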

Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:

  Update src/hotspot/share/opto/regmask.hpp
  
  Co-authored-by: Emanuel Peter <emanuel.peter at oracle.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20404/files
  - new: https://git.openjdk.org/jdk/pull/20404/files/80c6cf47..c4a706b5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=24
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=23-24

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20404.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404

PR: https://git.openjdk.org/jdk/pull/20404

From dlunden at openjdk.org  Mon Sep  1 16:11:49 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Mon, 1 Sep 2025 16:11:49 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
Message-ID: <FnEcppBk4MRrTJSfHJd8Ceu0_TtzsvdTNp_LOfVOlfU=.71e7d64b-93f8-4af7-9dbd-c9b19e48071a@github.com>

On Mon, 1 Sep 2025 08:05:04 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 35 commits:
>> 
>>  - Restore modified java/lang/invoke tests
>>  - Sort includes (new requirement)
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Add clarifying comments at definitions of register mask sizes
>>  - Fix implicit zero and nullptr checks
>>  - Add deep copy comment
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Fix typo
>>  - Updates after Emanuel's comments
>>  - Refactor and improve TestNestedSynchronize.java
>>  - ... and 25 more: https://git.openjdk.org/jdk/compare/b39c7369...80c6cf47
>
> src/hotspot/share/opto/regmask.hpp line 44:
> 
>> 42: // statements in Java.
>> 43: const int BoxLockNode_SLOT_LIMIT = 200;
>> 44: 
> 
> Even before this constant, it would be nice to have an introductory comment, that lays out what the regmask is for, and what its basic design is.

Yes, you are right. I'll add it!

> src/hotspot/share/opto/regmask.hpp line 122:
> 
>> 120: 
>> 121:     // Viewed as an array of machine words
>> 122:     uintptr_t _RM_UP[_RM_SIZE];
> 
> Do you know what `UP` stands for? Could we rename it maybe?
> Would be nice if we could have the same "units" for these arrays than for the sizes above.

I would guess it stands for **u**int**p**tr, and the `I` in `_RM_I` is for **i**nteger. Maybe `_RM_INT` and `_RM_WORD`?

> src/hotspot/share/opto/regmask.hpp line 128:
> 
>> 126:   // extend the register mask with dynamically allocated memory. We keep the
>> 127:   // base statically allocated _RM_UP, and arena allocate the extended mask
>> 128:   // (RM_UP_EXT) separately. Another, perhaps more elegant, option would be to
> 
> Suggestion:
> 
>   // (_RM_UP_EXT) separately. Another, perhaps more elegant, option would be to
> 
> Underscore for consistency? Or does it reference something else?

Yes, thanks (typo).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2314295280
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2314290338
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2314292191

From dlunden at openjdk.org  Mon Sep  1 16:18:01 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Mon, 1 Sep 2025 16:18:01 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
Message-ID: <AlpIiTJMMMbzzHhby7ZRsvE5HRO4KaOSesck96YewtY=.bc61ee1f-2b40-4c43-81e1-9feb66151de9@github.com>

On Mon, 1 Sep 2025 08:08:27 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 35 commits:
>> 
>>  - Restore modified java/lang/invoke tests
>>  - Sort includes (new requirement)
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Add clarifying comments at definitions of register mask sizes
>>  - Fix implicit zero and nullptr checks
>>  - Add deep copy comment
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Fix typo
>>  - Updates after Emanuel's comments
>>  - Refactor and improve TestNestedSynchronize.java
>>  - ... and 25 more: https://git.openjdk.org/jdk/compare/b39c7369...80c6cf47
>
> src/hotspot/share/opto/regmask.hpp line 161:
> 
>> 159:   // cases, we can allow read-only sharing.
>> 160:   bool _read_only = false;
>> 161: #endif
> 
> Can you explain why this happens? Is this something we could clean up? It smells a bit like tech debt. But maybe it is a really necessary performance optimization. Would be nice if there was an explanation which one it is ;)

The main issue is that register masks are stored as part of certain nodes, and nodes get copied by `Node::clone`. If someone in the future decides to add a register mask to some type of node and forgets to add a special case (like the one I've now added for `MachProj`) in `Node::clone` for that node type, this safeguard will catch it and complain.

Register masks are used in peculiar ways throughout C2, and there may be other unexpected cases as well that this safeguard catches. I doubt the `_read_only` part has a measurable performance effect; I only added it because it was easy and couldn't hurt.
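To make the aliasing hazard concrete: if part of a mask lives outside the object, a member-wise copy (what a `Node::clone` without a special case would produce) shares that storage. The hypothetical `GuardedMask` below marks such a copy read-only so that a debug `assert` fires on writes through it. It illustrates the idea only and is not the patch's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch, not HotSpot code: the mask's words live outside the
// object, so a plain member-wise copy aliases them.
class GuardedMask {
  std::uint64_t* _ext;      // external words; shared after a shallow copy
  unsigned _ext_words;
  bool _read_only = false;  // in the patch, this flag sits inside an #ifdef block

 public:
  GuardedMask(std::uint64_t* ext, unsigned ext_words) : _ext(ext), _ext_words(ext_words) {}

  void insert(unsigned bit) {
    assert(!_read_only && "a write through a shallow copy would also change the original");
    assert(bit / 64 < _ext_words);
    _ext[bit / 64] |= std::uint64_t(1) << (bit % 64);
  }

  bool member(unsigned bit) const {
    return bit / 64 < _ext_words && ((_ext[bit / 64] >> (bit % 64)) & 1) != 0;
  }

  bool is_read_only() const { return _read_only; }

  // What a memcpy-style clone produces: same storage, so mark it read-only.
  GuardedMask shallow_copy() const {
    GuardedMask copy(*this);
    copy._read_only = true;
    return copy;
  }
};
```

Writes through the original remain visible through the copy, which is exactly why writing through the copy must be caught.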

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2314305615

From dlunden at openjdk.org  Mon Sep  1 16:30:00 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Mon, 1 Sep 2025 16:30:00 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
Message-ID: <dqxTytPsW2lYZV-H1GTUmXITgosk7E7__pGsbUPeXCU=.154f7378-0e1f-4b0d-a5b1-9dc6003fd411@github.com>

On Mon, 1 Sep 2025 08:15:53 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 35 commits:
>> 
>>  - Restore modified java/lang/invoke tests
>>  - Sort includes (new requirement)
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Add clarifying comments at definitions of register mask sizes
>>  - Fix implicit zero and nullptr checks
>>  - Add deep copy comment
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Fix typo
>>  - Updates after Emanuel's comments
>>  - Refactor and improve TestNestedSynchronize.java
>>  - ... and 25 more: https://git.openjdk.org/jdk/compare/b39c7369...80c6cf47
>
> src/hotspot/share/opto/regmask.hpp line 96:
> 
>> 94:       (((RM_SIZE_MIN << 5) +                // Slots for machine registers
>> 95:         (max_method_parameter_length * 2) + // Slots for incoming arguments
>> 96:         (max_method_parameter_length * 2) + // Slots for outgoing arguments
> 
> What's the meaning of incoming vs outgoing arguments? Like this?
> 
> Incoming = from caller (outer nesting)
> Outgoing = to nested call (inner nesting)

Yes, you are correct. There is a detailed explanation in `x86_64.ad` ("Definition of frame structure and management information").

> src/hotspot/share/opto/regmask.hpp line 175:
> 
>> 173:   // mask can currently represent to be included. If _all_stack = false, we
>> 174:   // consider the registers not included.
>> 175:   bool _all_stack = false;
> 
> I'd prefer to have some kind of `_is_...` name here. Because when I read `all_stack` and see it is a bool, I wonder what it means - it does not tell me quickly. Does it mean that all registers are on the stack?
> 
> Is everything that is beyond the register mask purely on the stack? Is everything from the stack always beyond the register mask? I'm confused :face_with_peeking_eye:

Right, we should probably update this terminology as well. It comes from the fact that register masks can always represent all registers (+ a few stack slots), and anything beyond the mask is necessarily additional stack slots. So, if `_all_stack` is set, it means the register mask includes all of the stack slots. Any suggestion for a better name?
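A minimal sketch of those semantics, assuming a fixed two-word mask with no growth: the explicit bits cover the machine registers plus a few stack slots, and `all_stack` answers membership, all or nothing, for every OptoReg past them. `StackMask` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: every OptoReg past the explicit bits is a stack slot,
// and all_stack decides whether all of those slots are in the mask.
struct StackMask {
  static constexpr unsigned kWords = 2;
  static constexpr unsigned kBits = kWords * 64;
  std::uint64_t words[kWords] = {};
  bool all_stack = false;

  // This sketch does not grow: inserting past kBits is a caller error.
  void insert(unsigned reg) {
    assert(reg < kBits);
    words[reg / 64] |= std::uint64_t(1) << (reg % 64);
  }

  bool member(unsigned reg) const {
    if (reg < kBits) return ((words[reg / 64] >> (reg % 64)) & 1) != 0;
    return all_stack;  // beyond the explicit bits
  }
};
```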

> src/hotspot/share/opto/regmask.hpp line 179:
> 
>> 177:   // The low and high watermarks represent the lowest and highest word that
>> 178:   // might contain set register mask bits, respectively. We guarantee that
>> 179:   // there are no bits in words outside this range, but any word at and between
> 
> In the example below, you have 1 bits above the `_hwm`. Is that intentional? Are those bits to be ignored? Can you please add some extra info to the example about that?

Right, `_lwm` and `_hwm` do not apply to `_all_stack` bits. I'll clarify!
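The watermark invariant (no set bits outside words `[_lwm, _hwm]`, while words inside the range may be zero) can be sketched as follows. This is a hypothetical, fixed-size illustration: clearing bits does not tighten the watermarks, and the watermarks say nothing about `_all_stack`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical sketch: no set bits outside words [lwm, hwm], but words inside
// the range may be zero, so the watermarks are conservative bounds.
struct WatermarkedMask {
  static constexpr unsigned kWords = 8;
  std::uint64_t words[kWords] = {};
  unsigned lwm = kWords;  // lwm > hwm: no word can hold bits
  unsigned hwm = 0;

  void insert(unsigned reg) {
    assert(reg < kWords * 64);
    const unsigned w = reg / 64;
    words[w] |= std::uint64_t(1) << (reg % 64);
    lwm = std::min(lwm, w);
    hwm = std::max(hwm, w);
  }

  // Clearing a bit leaves the watermarks untouched; they remain valid bounds.
  void remove(unsigned reg) {
    assert(reg < kWords * 64);
    words[reg / 64] &= ~(std::uint64_t(1) << (reg % 64));
  }

  // Scans only the words between the watermarks.
  int find_first() const {
    for (unsigned w = lwm; w <= hwm && w < kWords; w++) {
      const std::uint64_t x = words[w];
      if (x == 0) continue;
      int b = 0;
      while (((x >> b) & 1) == 0) b++;
      return static_cast<int>(w * 64) + b;
    }
    return -1;
  }

  bool is_empty() const { return find_first() < 0; }
};
```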

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2314315615
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2314312930
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2314317882

From dlunden at openjdk.org  Mon Sep  1 16:35:00 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Mon, 1 Sep 2025 16:35:00 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
Message-ID: <_a6JVBA326t8l1U3ZI8C-J3Ju5jm-RklBFGtnR7fbyY=.70638135-7577-44dc-a212-fe5e39b1f5fa@github.com>

On Mon, 1 Sep 2025 08:30:19 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 35 commits:
>> 
>>  - Restore modified java/lang/invoke tests
>>  - Sort includes (new requirement)
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Add clarifying comments at definitions of register mask sizes
>>  - Fix implicit zero and nullptr checks
>>  - Add deep copy comment
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Fix typo
>>  - Updates after Emanuel's comments
>>  - Refactor and improve TestNestedSynchronize.java
>>  - ... and 25 more: https://git.openjdk.org/jdk/compare/b39c7369...80c6cf47
>
> src/hotspot/share/opto/regmask.hpp line 170:
> 
>> 168:   // variable indicates how many words we offset with. We consider all
>> 169:   // registers before the offset to not be included in the register mask.
>> 170:   unsigned int _offset;
> 
> Does that mean we make different slices of the mask?

I don't quite understand the question, can you please elaborate? The `_offset` means we shift the register mask to the right, so that the first bit of the first `_RM_UP` element no longer represents `OptoReg` 0 (but rather `OptoReg` `_offset * BitsPerWord`).
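The offset arithmetic described in the reply, as a small sketch assuming 64-bit machine words. `reg_for_bit` and `bit_for_reg` are hypothetical helper names.

```cpp
#include <cassert>

constexpr unsigned BitsPerWord = 64;  // assumption: 64-bit machine word

// OptoReg represented by `bit` (counted from the first stored word) when the
// mask is shifted by `offset_words` words.
constexpr unsigned reg_for_bit(unsigned offset_words, unsigned bit) {
  return offset_words * BitsPerWord + bit;
}

// Inverse mapping: the bit index for `reg`, or -1 if `reg` lies before the
// offset (treated as not included) or past the stored words.
constexpr long bit_for_reg(unsigned offset_words, unsigned size_words, unsigned reg) {
  const unsigned first = offset_words * BitsPerWord;
  if (reg < first || reg >= first + size_words * BitsPerWord) return -1;
  return static_cast<long>(reg - first);
}
```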

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2314325238

From dlunden at openjdk.org  Mon Sep  1 16:40:01 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Mon, 1 Sep 2025 16:40:01 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
Message-ID: <4qDpmwc0x5HCjDTzH_JUV3YtxNAFUremZZhu6G1usgM=.bc9b101a-d1f9-4ba5-bcc1-0b1afdb9d2a0@github.com>

On Mon, 1 Sep 2025 08:30:47 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 35 commits:
>> 
>>  - Restore modified java/lang/invoke tests
>>  - Sort includes (new requirement)
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Add clarifying comments at definitions of register mask sizes
>>  - Fix implicit zero and nullptr checks
>>  - Add deep copy comment
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Fix typo
>>  - Updates after Emanuel's comments
>>  - Refactor and improve TestNestedSynchronize.java
>>  - ... and 25 more: https://git.openjdk.org/jdk/compare/b39c7369...80c6cf47
>
> src/hotspot/share/opto/regmask.hpp line 217:
> 
>> 215:   // necessarily representing stack locations) to 1. Here is how the above
>> 216:   // register mask looks like after clearing, setting _all_stack to true, and
>> 217:   // successfully rolling over:
> 
> I'm still struggling to follow here. Maybe `_offset` is not clear to me yet. What is the value here for it? How is it changed with the `rollover`?

This `_offset` stuff is really only for a very specific use case in `PhaseChaitin::Select`, so I understand it can be hard to follow. The value for `_offset` in the example after rollover is 5 = `_rm_size`, since we have rolled over once. When we roll over the next time, the `_offset` is 10, and so on.
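A minimal sketch of that rollover behavior. It assumes, based on the description above, that rolling over is only possible for an `_all_stack` mask and that it leaves the whole new window available. `RollingMask` is hypothetical and uses a window of 5 words to mirror the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of a fixed-size window sliding over the stack slots.
struct RollingMask {
  static constexpr unsigned kWords = 5;  // stands in for _rm_size
  std::uint64_t words[kWords] = {};
  unsigned offset = 0;                   // window start, in words
  bool all_stack = false;

  bool member(unsigned reg) const {
    const unsigned first = offset * 64;
    if (reg < first) return false;             // before the window: excluded
    const unsigned bit = reg - first;
    if (bit >= kWords * 64) return all_stack;  // past the window
    return ((words[bit / 64] >> (bit % 64)) & 1) != 0;
  }

  // Assumption: only an all_stack mask can roll over, and every slot in the
  // next window is then available.
  bool rollover() {
    if (!all_stack) return false;
    offset += kWords;
    for (std::uint64_t& w : words) w = ~std::uint64_t(0);
    return true;
  }
};
```

After the first rollover `offset` is 5, after the second it is 10, matching the numbers in the reply.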

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2314330646

From djelinski1 at gmail.com  Mon Sep  1 18:50:50 2025
From: djelinski1 at gmail.com (Daniel Jeliński)
Date: Mon, 1 Sep 2025 20:50:50 +0200
Subject: Delay slot handling
Message-ID: <CAMrH03+PFr+uUkHMm_k_8kJUBqMSXr1HgMA__bYNO-Etg4Bfdg@mail.gmail.com>

Hi all,
Does anyone still use the delay slot handling code? Can we remove it?

The code was used by the SPARC port, which was removed in JDK 15.
Looking at the list of architectures that use delay slots [1], the
removal of delay slot support could possibly affect the MIPS port.

The arm (32-bit) AD file mentions delay slots in a few places, but as
far as I can tell, that's a copy-paste error that can be easily
corrected.

The cleanup would involve at least:
- removing the LIR_OpDelay class (C1)
- removing support for ADL "branch_has_delay_slot",
"one_instruction_with_delay_slot",
"single_instruction_with_delay_slot", and "has_delay_slot"

Thoughts?


[1] https://en.wikipedia.org/wiki/Delay_slot#Implementations

From fyang at openjdk.org  Tue Sep  2 02:08:49 2025
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 2 Sep 2025 02:08:49 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v2]
In-Reply-To: <JwURJGEePbyAN6WQ26BLgr6jhvCS-AXsPyMUl8jPdK8=.dd8b659a-f596-47d5-8be4-3539c107b88d@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <BymtYSgSlXggycd57z3AGcv-BFeaLsawQuSXWo_93lc=.6dc39f16-8cc8-4956-8214-ba7bc9b648a3@github.com>
 <qsCdM9M_iBoyg5STxTJgwB8NGx3P5kqqDTPgncWzvW8=.05b756a4-a84a-4d8c-8641-de629692dbae@github.com>
 <JwURJGEePbyAN6WQ26BLgr6jhvCS-AXsPyMUl8jPdK8=.dd8b659a-f596-47d5-8be4-3539c107b88d@github.com>
Message-ID: <r59HzSMnxiMQcRFZ5aBbq1CDJSCMJQ0iHnuwTMg4eEA=.0db27f56-cef3-410a-ba97-f56b4870ce7a@github.com>

On Mon, 1 Sep 2025 14:42:47 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 111:
>> 
>>> 109:   if (mt_safe) {
>>> 110:     OrderAccess::release();
>>> 111:     ICache::invalidate_range(jal_pc, NativeInstruction::instruction_size);
>> 
>> should `jal_pc` be `instruction_address()`?
>
> We have:
> 
> auipc   // instruction_address()                                                     # Never changed
> ld      // instruction_address() + NativeInstruction::instruction_size               # Never changed
> jal(r)  // instruction_address() + 2 * NativeInstruction::instruction_size (jal_pc)  # jal <-> jalr
> 
> We only change the instruction at "instruction_address() + 2 * NativeInstruction::instruction_size".
> 
> Note that jal_pos and jal_pc refer to a "jump and link instruction" in general, not specifically to jal or jalr.
> 
> Make sense?

Maybe we can give it a new name to avoid possible confusion? `jmp_pc` or simply `pc`?
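For reference, the jal/jalr choice at `jal_pc` can be sketched from the RISC-V base ISA encodings: JAL takes a signed, even 21-bit offset relative to its own address (about +/- 1 MiB); otherwise the indirect `jalr ra, 0(t1)` is kept. The helper names below are hypothetical, and the sketch assumes, as in the layout above, that `t1` already holds the destination when the `jalr` executes.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kOpcodeJal = 0x6f;
constexpr std::uint32_t kOpcodeJalr = 0x67;
constexpr std::uint32_t kRegRa = 1;  // x1
constexpr std::uint32_t kRegT1 = 6;  // x6

// JAL: signed 21-bit, even offset relative to the JAL's own pc.
constexpr bool jal_reachable(std::int64_t offset) {
  return (offset & 1) == 0 && offset >= -(std::int64_t(1) << 20) &&
         offset < (std::int64_t(1) << 20);
}

// J-type layout: imm[20] | imm[10:1] | imm[11] | imm[19:12] | rd | opcode.
constexpr std::uint32_t encode_jal(std::uint32_t rd, std::int64_t offset) {
  const std::uint32_t imm = static_cast<std::uint32_t>(offset) & 0x1FFFFF;
  return (((imm >> 20) & 0x1) << 31) | (((imm >> 1) & 0x3FF) << 21) |
         (((imm >> 11) & 0x1) << 20) | (((imm >> 12) & 0xFF) << 12) |
         ((rd & 0x1F) << 7) | kOpcodeJal;
}

// I-type layout: imm[11:0] | rs1 | funct3 (000) | rd | opcode.
constexpr std::uint32_t encode_jalr(std::uint32_t rd, std::uint32_t rs1, std::int32_t imm12) {
  return ((static_cast<std::uint32_t>(imm12) & 0xFFF) << 20) | ((rs1 & 0x1F) << 15) |
         ((rd & 0x1F) << 7) | kOpcodeJalr;
}

// Instruction for jal_pc: a direct "jal ra, dest" when in range,
// otherwise the indirect "jalr ra, 0(t1)".
constexpr std::uint32_t call_instruction(std::uint64_t jal_pc, std::uint64_t dest) {
  const std::int64_t offset = static_cast<std::int64_t>(dest - jal_pc);
  return jal_reachable(offset) ? encode_jal(kRegRa, offset) : encode_jalr(kRegRa, kRegT1, 0);
}
```

Only the third word changes between the two forms, which fits the point that auipc and ld are never patched.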

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26944#discussion_r2314762084

From jbhateja at openjdk.org  Tue Sep  2 02:55:45 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 2 Sep 2025 02:55:45 GMT
Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2
In-Reply-To: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
Message-ID: <Z05ERz5_lcuvZcuF7YQ1qBv6eeHMPiH1RdpdvE-aTds=.7f197932-f7a6-4845-9d04-a5c29ee7ca0b@github.com>

On Thu, 28 Aug 2025 21:09:03 GMT, Srinivas Vamsi Parasa <sparasa at openjdk.org> wrote:

> This change extends Extended EVEX (EEVEX) to REX2/REX demotion for Intel APX NDD instructions to handle commutative operations when the destination register and the second source register (src2) are the same.
> 
> Currently, EEVEX to REX2/REX demotion is only enabled when the first source (src1) and the destination are the same. This enhancement allows additional cases of valid demotion for commutative instructions (add, imul, and, or, xor).
> 
> For example:
> `eaddl r18, r25, r18` can be encoded as `addl r18, r25` using APX REX2 encoding
> `eaddl r2, r7, r2` can be encoded as `addl r2, r7` using non-APX legacy encoding

src/hotspot/cpu/x86/assembler_x86.cpp line 12932:

> 12930:   if (is_commutative && is_demotable(no_flags, dst->encoding(), src2->encoding())) {
> 12931:     if (size == EVEX_64bit) {
> 12932:       emit_prefix_and_int8(get_prefixq(src1, dst, is_map1), opcode_byte + 2);

It would be good to add a comment above the `opcode_byte` adjustment explaining that it accounts for the opcode mismatch between the NDD instruction and its equivalent demotable variant.


`EVEX.LLZ.NP.MAP4.SCALABLE 21 /r    AND {NF} {ND=1} rv, rv/mv, rv`

`REX.W + 23 /r    AND r64, r/m64 | RM | Valid | N.E. | r64 AND r/m64`

src/hotspot/cpu/x86/assembler_x86.cpp line 13055:

> 13053:   bool is_prefixq = (size == EVEX_64bit) ? true : false;
> 13054:   bool normal_demotion = is_demotable(no_flags, dst_enc, nds_enc);
> 13055:   bool commutative_demotion = is_commutative && is_demotable(no_flags, dst_enc, src_enc);

Nomenclature: instead of `normal_demotion` and `commutative_demotion`, it would be more appropriate to use `first_operand_demotable`/`second_operand_demotable`.
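A minimal sketch of the demotion decision using the suggested names. It assumes that `is_demotable` means "flags are written (no NF) and the two encodings match", and that any operand in r16..r31 forces REX2, consistent with the `r18`/`r25` vs. `r2`/`r7` examples in the description. `demote` and `Demotion` are hypothetical, not the assembler's API.

```cpp
#include <cassert>
#include <optional>

// Assumed meaning of is_demotable(): flags are written (no NF) and the
// destination encoding equals the given source encoding.
constexpr bool is_demotable(bool no_flags, int dst_enc, int src_enc) {
  return !no_flags && dst_enc == src_enc;
}

struct Demotion {
  int dst;          // also the first operand of the two-operand form
  int src;          // the remaining source operand
  bool needs_rex2;  // r16..r31 need REX2 (or EVEX)
};

// NDD form "op dst, src1, src2" -> two-operand form "op dst, src", if possible.
std::optional<Demotion> demote(bool no_flags, bool is_commutative,
                               int dst, int src1, int src2) {
  const bool first_operand_demotable = is_demotable(no_flags, dst, src1);
  const bool second_operand_demotable =
      is_commutative && is_demotable(no_flags, dst, src2);
  int other;
  if (first_operand_demotable) {
    other = src2;  // dst == src1: operand order unchanged
  } else if (second_operand_demotable) {
    other = src1;  // dst == src2: valid only because the operation commutes
  } else {
    return std::nullopt;  // keep the EEVEX encoding
  }
  return Demotion{dst, other, dst >= 16 || other >= 16};
}
```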

src/hotspot/cpu/x86/x86_64.ad line 7121:

> 7119: %{
> 7120:   predicate(UseAPX);
> 7121:   match(Set dst (AddI (LoadI src1) src2));

Will this not be covered by the pattern at line 7103, since ADLC automatically generates a DFA to handle both cases?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26997#discussion_r2314775483
PR Review Comment: https://git.openjdk.org/jdk/pull/26997#discussion_r2313941101
PR Review Comment: https://git.openjdk.org/jdk/pull/26997#discussion_r2314792264

From duke at openjdk.org  Tue Sep  2 03:04:46 2025
From: duke at openjdk.org (erifan)
Date: Tue, 2 Sep 2025 03:04:46 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats
In-Reply-To: <KE8_jx6V7-oTcCoQKwZ0Gyzdr2qPbWETxrneZvPAxHs=.bd39c34f-972e-4ecc-bc16-eca3fcfc175d@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <KE8_jx6V7-oTcCoQKwZ0Gyzdr2qPbWETxrneZvPAxHs=.bd39c34f-972e-4ecc-bc16-eca3fcfc175d@github.com>
Message-ID: <2R6O7Jhv3catwxc6rXJdh7Uiq-NFBp7beCmP49CLTqU=.7ba72e39-6efd-47fe-8ad9-6df54a45c99b@github.com>

On Mon, 1 Sep 2025 14:35:40 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> The `sve_cpy` instruction is not correctly implemented for negative floating-point values. The issues include:
>> 
>> 1. When a negative floating-point number (e.g. `-1.0`) is passed, the `checked_cast<int8_t>(pack(d))` check fails. For example, assume `d = -1.0`:
>> - `pack(-1.0)` returns an unsigned int with the 7th bit set, i.e., `0xf0`.
>> - `checked_cast<int8_t>(0xf0)` casts `0xf0` to an `int8_t` value, which is `-16`.
>> - Casting this int8_t `-16` back to unsigned int results in `0xfffffff0`.
>> - The check compares `0xf0` to `0xfffffff0`, which obviously fails.
>> 
>> 2. Additionally, the encoding of the negative floating-point number is incorrect:
>> - The imm8 field can fall outside the valid range of **[-128, 127]**.
>> - Bit **13** should be encoded as **0** for floating-point numbers.
>> 
>> This PR fixes these issues and renames floating-point `sve_cpy` as `sve_fcpy`.
>> 
>> Some test cases are added to aarch64-asmtest.py, and all tests passed.
>
> Thanks.
> 
> I'm not convinced that the refactoring is necessary. Why not write a replacement for `checked_cast<int8_t>(pack(d))` that does the right thing and fix the first `sve_cpy()` so that it does the right thing for float args?
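The failing check in point 1 can be reproduced in isolation. This sketch assumes, as the description states, that `pack(-1.0)` yields `0xf0`; `round_trips()` is a hypothetical stand-in for the cast-and-compare-back check described there, not HotSpot's `checked_cast` itself. (Converting `0xf0` to `int8_t` gives `-16` on common compilers and is guaranteed from C++20 on.)

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the round-trip check described above:
// a narrowing cast is accepted only if converting back yields the original value.
template <typename To, typename From>
bool round_trips(From v) {
  return static_cast<From>(static_cast<To>(v)) == v;
}

// Value the PR description gives for pack(-1.0).
constexpr unsigned kPackedMinusOne = 0xf0;
```

Narrowing to `int8_t` turns `0xf0` into `-16`, which widens back to `0xfffffff0`, so the check fails; narrowing to `uint8_t` keeps the round trip intact, which is what the diff in the reply below switches to.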

Thanks @theRealAph .

I've indeed considered and implemented your idea. The code diff:

diff --git a/src/hotspot/cpu/aarch64/assembler_aarch64.hpp b/src/hotspot/cpu/aarch64/assembler_aarch64.hpp
index 11d302e9026..841d24f516b 100644
--- a/src/hotspot/cpu/aarch64/assembler_aarch64.hpp
+++ b/src/hotspot/cpu/aarch64/assembler_aarch64.hpp
@@ -3813,8 +3813,9 @@ template<typename R, typename... Rx>
                bool isMerge, bool isFloat) {
     starti;
     assert(T != Q, "invalid size");
+    assert((!isFloat) || (isFloat && T != B), "invalid size");
     int sh = 0;
-    if (imm8 <= 127 && imm8 >= -128) {
+    if ((imm8 <= 127 && imm8 >= -128) || (isFloat && (imm8 >> 8) == 0)) {
       sh = 0;
     } else if (T != B && imm8 <= 32512 && imm8 >= -32768 && (imm8 & 0xff) == 0) {
       sh = 1;
@@ -3824,7 +3825,7 @@ template<typename R, typename... Rx>
     }
     int m = isMerge ? 1 : 0;
     f(0b00000101, 31, 24), f(T, 23, 22), f(0b01, 21, 20);
-    prf(Pg, 16), f(isFloat ? 1 : 0, 15), f(m, 14), f(sh, 13), sf(imm8, 12, 5), rf(Zd, 0);
+    prf(Pg, 16), f(isFloat ? 1 : 0, 15), f(m, 14), f(sh, 13), f(imm8&0xff, 12, 5), rf(Zd, 0);
   }

 public:
@@ -3834,7 +3835,7 @@ template<typename R, typename... Rx>
   }
   // SVE copy floating-point immediate to vector elements (predicated)
   void sve_cpy(FloatRegister Zd, SIMD_RegVariant T, PRegister Pg, double d) {
-    sve_cpy(Zd, T, Pg, checked_cast<int8_t>(pack(d)), /*isMerge*/true, /*isFloat*/true);
+    sve_cpy(Zd, T, Pg, checked_cast<uint8_t>(pack(d)), /*isMerge*/true, /*isFloat*/true);
   }

   // SVE conditionally select elements from two vectors
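To see how the revised range check and the `imm8 & 0xff` masking combine, here is a C++ sketch that packs the fields into the same bit positions as the diff above. It is hypothetical: plain integers replace `FloatRegister`/`PRegister`, the size codes assume `B=0, H=1, S=2, D=3`, and only the unshifted (`sh == 0`) path is modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical re-implementation of the field packing in the revised sve_cpy().
// Only the sh == 0 path is modelled; the shifted-immediate path is omitted.
uint32_t sve_cpy_imm(int zd, int t, int pg, int imm8, bool is_merge, bool is_float) {
  // Revised range check: signed 8-bit for integers, any 8-bit pattern for floats.
  bool fits = (imm8 <= 127 && imm8 >= -128) || (is_float && (imm8 >> 8) == 0);
  assert(fits && "sh == 1 path is not modelled in this sketch");
  (void)fits;
  uint32_t insn = 0;
  insn |= 0b00000101u << 24;                         // bits 31..24
  insn |= static_cast<uint32_t>(t & 0x3) << 22;      // bits 23..22: element size
  insn |= 0b01u << 20;                               // bits 21..20
  insn |= static_cast<uint32_t>(pg & 0xf) << 16;     // bits 19..16: governing predicate
  insn |= (is_float ? 1u : 0u) << 15;                // bit 15
  insn |= (is_merge ? 1u : 0u) << 14;                // bit 14: merging
  insn |= 0u << 13;                                  // bit 13: sh, always 0 here
  insn |= static_cast<uint32_t>(imm8 & 0xff) << 5;   // bits 12..5: raw 8-bit pattern
  insn |= static_cast<uint32_t>(zd & 0x1f);          // bits 4..0: destination
  return insn;
}
```

With `isFloat` and `isMerge` both set (as in the float overload), bits 15..13 come out as `110`, and the float immediate is stored as its raw 8-bit pattern rather than as a sign-extended value.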


However, some of my colleagues have differing opinions:
1. SVE `cpy` and `fcpy` are actually two different instructions, and distinguishing them might be clearer.
2. SVE `cpy`'s imm8 is an **int**, while `fcpy`'s imm8 is an **fp8**. While some encoding code can be reused, separating the encodings makes the code clearer.

I think both implementations are fine. If you think it's better to not refactor, I'll revert.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26951#issuecomment-3243633607

From hgreule at openjdk.org  Tue Sep  2 06:04:43 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Tue, 2 Sep 2025 06:04:43 GMT
Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v2]
In-Reply-To: <kbnqPgzu1B_licQfGS08Ft27PBqsEuM4u41fhWKPqVA=.83b62444-41d6-4af2-878e-1060d216760d@github.com>
References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
 <egju31L-1yCzNrJI2pNxwvV7uffT5BPgWtFx6vURMBo=.14407344-322f-41cd-a726-9fed278f8d73@github.com>
 <WmUEFadtBk_lX01hWiW_vP-5OJGB8DJZLcuG-t_OZYs=.b683a318-8c94-460e-a7e0-f5cade71553c@github.com>
 <3BJWLK3FukQCp2FHGcyBDTZtbc5aS8VreNKYKAaQrdU=.43a7e821-8d56-4161-850a-9137d17d44de@github.com>
 <kbnqPgzu1B_licQfGS08Ft27PBqsEuM4u41fhWKPqVA=.83b62444-41d6-4af2-878e-1060d216760d@github.com>
Message-ID: <f_QYY0KSruPfxiI2PhAP8qMWenQ5_2lKs_c4bzJoj3o=.9afa1434-fe92-462c-b0a1-7e9a9762b06a@github.com>

On Mon, 25 Aug 2025 13:20:32 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>>> @eme64 I merged master and hopefully addressed your latest comments. Now that we have #17508 integrated, I could also directly update the unsigned variant, but I'm also fine with doing that separately. WDYT?
>>> 
>>> I also checked the constant folding part again (or generally whenever the RHS is a constant), these code paths are indeed not used by PhaseGVN directly (but by PhaseCCP and PhaseIdealLoop). That makes it a bit difficult to test that part properly.
>> 
>> Let's keep the patch as it is. With #17508 we will probably also have to refactor and add more tests if we want to do any unsigned and known-bit optimizations.
>> 
>> ----------------
>> 
>> @SirYwell Thanks for the updates, I had a few more comments, but we are getting there :)
>
>> @eme64 I addressed your latest comments now, please re-review :)
>> 
>> Regarding my previous observation
>> 
>> > * If the divisor is a constant, we will directly replace the `Mod(I|L)Node` with more but less expensive nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, which means we miss potential cases where this would help, e.g., removing range checks. Would it make sense to delay the replacement?
>> 
>> should I open a new RFE for that? Or generally, what's your opinion on this?
> 
> Can you show some examples? Filing an RFE would surely not be wrong.

@eme64 gentle ping in case you missed my latest changes :)
Please let me know if there is more to do.
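As an illustration of the precision a direct type computation can keep, here is a hypothetical C++ sketch (not HotSpot's `ModINode::Value`/`ModLNode::Value`) that computes a conservative range for `x % c` with a constant divisor, using Java's truncated-remainder semantics. It assumes `c != 0` and `c != INT64_MIN`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Inclusive integer interval [lo, hi].
struct Range { int64_t lo, hi; };

// Conservative range of x % c for x in [x.lo, x.hi] and constant c.
// Truncated remainder: the result has the sign of x (or is 0), and |x % c| < |c|.
Range mod_range(Range x, int64_t c) {
  int64_t m = (c < 0 ? -c : c) - 1;            // largest possible |x % c|
  if (x.lo >= -m && x.hi <= m) return x;        // |x| < |c| implies x % c == x
  int64_t lo = x.lo >= 0 ? 0 : std::max(x.lo, -m);
  int64_t hi = x.hi <= 0 ? 0 : std::min(x.hi, m);
  return {lo, hi};
}
```

For example, `x` in `[0, 100]` with `c = 8` gives `[0, 7]`; once the node has been expanded into several cheaper nodes, the combined types of those nodes may not recover a bound this tight.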

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25254#issuecomment-3243887966

From aph at openjdk.org  Tue Sep  2 08:12:42 2025
From: aph at openjdk.org (Andrew Haley)
Date: Tue, 2 Sep 2025 08:12:42 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats
In-Reply-To: <2R6O7Jhv3catwxc6rXJdh7Uiq-NFBp7beCmP49CLTqU=.7ba72e39-6efd-47fe-8ad9-6df54a45c99b@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <KE8_jx6V7-oTcCoQKwZ0Gyzdr2qPbWETxrneZvPAxHs=.bd39c34f-972e-4ecc-bc16-eca3fcfc175d@github.com>
 <2R6O7Jhv3catwxc6rXJdh7Uiq-NFBp7beCmP49CLTqU=.7ba72e39-6efd-47fe-8ad9-6df54a45c99b@github.com>
Message-ID: <-G8GwIflOhFjOL-PAG6_oylu0Fa9c8iNUB57EC6oo4s=.a0126087-2a97-4542-a555-27c12578fccf@github.com>

On Tue, 2 Sep 2025 03:01:36 GMT, erifan <duke at openjdk.org> wrote:

> 1. SVE `cpy` and `fcpy` are actually two different instructions, and distinguishing them might be clearer.

That's a fair point, but the AArch64 name for all four instructions is CPY, and they are distinguished by their operands. Deviation from the names in the Reference Manual is occasionally necessary, but it makes life painful for maintainers when they have to search for what we've called an instruction they want to use.
 
> 2. SVE `cpy`'s imm8 is an **int**, while `fcpy`'s imm8 is an **fp8**.

Yes, that's right.

> While some encoding code can be reused, separating the encodings makes the code clearer.

I don't agree that it makes the code clearer. In fact, tight factoring emphasizes the fact that these instructions are similar, and explicitly shows where they are different.

It is true that I have a strong bias against copy-and-paste programming.

> I think both implementations are fine. If you think it's better to not refactor, I'll revert.

I do. Thank you.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26951#issuecomment-3244259237

From shade at openjdk.org  Tue Sep  2 08:50:09 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 2 Sep 2025 08:50:09 GMT
Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles
 type checks [v24]
In-Reply-To: <EHaEtNfaTsFGbyNRL3O-wYfnD1Y6y55o_UpkVz1n3l4=.15aa7c47-f9ea-41a4-b74f-71c571aacdd2@github.com>
References: <EHaEtNfaTsFGbyNRL3O-wYfnD1Y6y55o_UpkVz1n3l4=.15aa7c47-f9ea-41a4-b74f-71c571aacdd2@github.com>
Message-ID: <s8QFCh3pN4xFAs9BRosu5WMc_kAlEs0rAlhxV16kKlk=.8b9a4815-e1d2-47cf-b99a-1d6d5586bd7d@github.com>

> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this.
> 
> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow.
> 
> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often.
> 
> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer.
> 
> Additional testing:
>  - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, `all`
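The mitigation mentioned above (separate weak and strong fields instead of one field whose kind must be checked at runtime) can be sketched by analogy with `std::weak_ptr`/`std::shared_ptr`. This is only an analogy: the PR itself goes directly to `OopStorage`, and `HolderStub`/`TaskSketch` are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a method holder that may be unloaded.
struct HolderStub {};

class TaskSketch {
  std::weak_ptr<HolderStub> _weak;      // never keeps the holder alive
  std::shared_ptr<HolderStub> _strong;  // set only while unloading must be blocked
 public:
  explicit TaskSketch(const std::shared_ptr<HolderStub>& h) : _weak(h) {}

  // With two separate fields, no "which kind of handle is this?" check is needed.
  bool is_unloaded() const { return !_strong && _weak.expired(); }

  // Try to pin the holder; fails if it has already gone away.
  bool make_strong() {
    if (!_strong) _strong = _weak.lock();
    return static_cast<bool>(_strong);
  }

  // Drop the pin so the holder can be unloaded again.
  void release_strong() { _strong.reset(); }
};
```

While the strong field is set, dropping every other reference does not make the task look unloaded; once it is released, `is_unloaded()` reports the truth without any type check on the handle.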

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 45 commits:

 - Fix build failure
 - Merge branch 'master' into JDK-8231269-compile-task-weaks
 - Docs touchup
 - Use enum class
 - Further simplify the API
 - Tune up for release builds
 - Move release() to destructor
 - Deal with things without spinlocks
 - Merge branch 'master' into JDK-8231269-compile-task-weaks
 - Merge branch 'master' into JDK-8231269-compile-task-weaks
 - ... and 35 more: https://git.openjdk.org/jdk/compare/af532cc1...ed7aef7e

-------------

Changes: https://git.openjdk.org/jdk/pull/24018/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=23
  Stats: 376 lines in 14 files changed: 332 ins; 23 del; 21 mod
  Patch: https://git.openjdk.org/jdk/pull/24018.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018

PR: https://git.openjdk.org/jdk/pull/24018

From epeter at openjdk.org  Tue Sep  2 12:33:33 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 2 Sep 2025 12:33:33 GMT
Subject: RFR: 8366490: C2 SuperWord: wrong result because CastP2X is missing
 ctrl and floats over SafePoint creating stale oops
Message-ID: <7guNwHJ6tuJXGG-X9aACAWAHjsneD4uryM-ZazES_Uc=.fe831ae6-c8a1-446d-b63e-5b7a1a1f8704@github.com>

**Analysis**

A `CastP2X` without ctrl can float. If it floats over a `SafePoint` (or call), we may GC and move the oop. But the `CastP2X` value does not end up on the oop-map, and so the pointer is stale (old).

With `StressGCM`, the aliasing runtime check has one `CastP2X` that floats over the SafePoint, and another that stays after the SafePoint. Both read the oop of the same array, so instead of getting the same address, we now get the old and the new oop. And so the aliasing runtime check passes (thinks there is no aliasing), even though there is aliasing. We end up vectorizing, which reorders the loads/stores and would only be safe if there is no aliasing.

**Fix:** add control to the `CastP2X` so that it cannot float too far.

**Details**


rbp = Allocate array
spill <- rbp + 0x20

call to allocateArrays
-> allocates a lot, and triggers GC. That moves the allocated array behind rbp
-> rbp is oop-mapped, so it is updated automatically to the new oop
-> spill value remains based on the old oop

We now compute the aliasing runtime check:
-> one side of the comparison is computed from rbp (new oop)
-> the other side is computed from the spill value (old oop)
-> the cmp returns a nonsensical value, and we take the wrong branch
-> vectorize even though we have aliasing!
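The failure sequence above can be mimicked with a toy model in which a moving GC updates only the oop-mapped value. The addresses, the `0x20` offset and the overlap test in `may_alias()` are illustrative assumptions, not the actual check C2 emits.

```cpp
#include <cassert>
#include <cstdint>

// Toy frame: one oop-mapped slot and one raw address derived from it earlier.
struct Frame {
  uintptr_t rbp;    // oop-mapped: the GC fixes it up
  uintptr_t spill;  // raw address computed before the call: not fixed up
};

// A moving GC relocates the array; only the oop-mapped slot is updated.
void gc_move(Frame& f, uintptr_t new_base) {
  f.rbp = new_base;
}

// Simplified runtime aliasing check: do [a, a+len) and [b, b+len) overlap?
bool may_alias(uintptr_t a, uintptr_t b, uintptr_t len) {
  return a < b + len && b < a + len;
}
```

After `gc_move`, comparing two addresses both derived from `rbp` correctly reports aliasing, while mixing the updated `rbp` with the stale `spill` reports no aliasing for what is really the same array.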

-------------

Commit messages:
 - fix test flags
 - the fix
 - JDK-8366490

Changes: https://git.openjdk.org/jdk/pull/27045/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27045&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8366490
  Stats: 152 lines in 5 files changed: 139 ins; 1 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/27045.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27045/head:pull/27045

PR: https://git.openjdk.org/jdk/pull/27045

From mli at openjdk.org  Tue Sep  2 12:50:46 2025
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 2 Sep 2025 12:50:46 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v2]
In-Reply-To: <GKX55pfbOY1fAST3KbnIX3a-ZSHTg704ry9wrXbMbmQ=.4ebe7c56-6339-4982-a59b-1297f4a35732@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <BymtYSgSlXggycd57z3AGcv-BFeaLsawQuSXWo_93lc=.6dc39f16-8cc8-4956-8214-ba7bc9b648a3@github.com>
 <LUj3nxfsDhJ2kTdroX_W8MZCHAlgjEtQ2byk-ke_Cos=.f8503d8f-02dc-442c-b871-1a7fa735dc93@github.com>
 <GKX55pfbOY1fAST3KbnIX3a-ZSHTg704ry9wrXbMbmQ=.4ebe7c56-6339-4982-a59b-1297f4a35732@github.com>
Message-ID: <mp1Jr2Vt3uN19r4fINgunk_5JleGfm7HVpYsKSdeF5c=.4097c1ec-0dbd-4ede-8850-d4dac6a33705@github.com>

On Mon, 1 Sep 2025 14:35:38 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 110:
>> 
>>> 108:   // We changed instruction stream
>>> 109:   if (mt_safe) {
>>> 110:     OrderAccess::release();
>> 
>> If we have release here, do we still need the release in `set_stub_address_destination_at`?
>
> From JBS entry, the point is to do it in a sane order:
> 
> The release in make_jal_opt is there to make sure the store to the instruction stream happens before the I-cache flush.
> 
> 1: store destination to stub
> 2: release
> 3: store destination to instruction stream
> 4: release
> 5: i-cache flush

I don't see a detailed discussion of why two `release`s are needed.
It seems the `2: release` is redundant? Does a single release (step 4) after step 3 work as well?
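For discussion, the quoted five-step order can be written down with C++ fences. This is a hypothetical model: `PatchSite`, `patch()` and `icache_flush()` are made-up names, the "instruction stream" is just an atomic word, and `std::atomic_thread_fence` stands in for `OrderAccess::release()`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical patch site: the stub's destination word and the jump instruction.
struct PatchSite {
  std::atomic<uint64_t> stub_destination{0};
  std::atomic<uint32_t> jump_insn{0};
};

// Placeholder: the real code invalidates the instruction cache here.
void icache_flush() {}

void patch(PatchSite& s, uint64_t dest, uint32_t new_jump, bool mt_safe) {
  s.stub_destination.store(dest, std::memory_order_relaxed);  // 1: store destination to stub
  std::atomic_thread_fence(std::memory_order_release);         // 2: release
  s.jump_insn.store(new_jump, std::memory_order_relaxed);      // 3: store to instruction stream
  if (mt_safe) {
    std::atomic_thread_fence(std::memory_order_release);       // 4: release
  }
  icache_flush();                                              // 5: i-cache flush
}
```

In C++ memory-model terms, fence 2 is what orders store 1 before store 3 for an observer that sees store 3; fence 4 only orders both stores before whatever follows. Whether the same reasoning carries over to RISC-V instruction fetch, which has its own ordering rules, is the open question here.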

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26944#discussion_r2315984966

From thartmann at openjdk.org  Tue Sep  2 12:50:45 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Tue, 2 Sep 2025 12:50:45 GMT
Subject: RFR: 8366490: C2 SuperWord: wrong result because CastP2X is
 missing ctrl and floats over SafePoint creating stale oops
In-Reply-To: <7guNwHJ6tuJXGG-X9aACAWAHjsneD4uryM-ZazES_Uc=.fe831ae6-c8a1-446d-b63e-5b7a1a1f8704@github.com>
References: <7guNwHJ6tuJXGG-X9aACAWAHjsneD4uryM-ZazES_Uc=.fe831ae6-c8a1-446d-b63e-5b7a1a1f8704@github.com>
Message-ID: <eu2Ai-c_VrI41-EyUVmHwT92WCH8cI8rgjfYwLMQk70=.0103b73c-2289-4e78-9340-9e44b3a4e713@github.com>

On Tue, 2 Sep 2025 10:45:33 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> **Analysis**
> 
> A `CastP2X` without ctrl can float. If it floats over a `SafePoint` (or call), we may GC and move the oop. But the `CastP2X` value does not end up on the oop-map, and so the pointer is stale (old).
> 
> With `StressGCM`, the aliasing runtime check has one `CastP2X` that floats over the SafePoint, and another that stays after the SafePoint. Both read the oop of the same array, so instead of getting the same address, we now get the old and the new oop. And so the aliasing runtime check passes (thinks there is no aliasing), even though there is aliasing. We end up vectorizing, which reorders the loads/stores and would only be safe if there is no aliasing.
> 
> **Fix:** add control to the `CastP2X` so that it cannot float too far.
> 
> **Details**
> 
> 
> rbp = Allocate array
> spill <- rbp + 0x20
> 
> call to allocateArrays
> -> allocates a lot, and triggers GC. That moves the allocated array behind rbp
> -> rbp is oop-mapped, so it is updated automatically to the new oop
> -> spill value remains based on the old oop
> 
> We now compute the aliasing runtime check:
> -> one side of the comparison is computed from rbp (new oop)
> -> the other side is computed from the spill value (old oop)
> -> the cmp returns a nonsensical value, and we take the wrong branch
> -> vectorize even though we have aliasing!

Nice analysis! Looks good to me.

src/hotspot/share/opto/vectorization.cpp line 1128:

> 1126:       Node* variable = (s.variable() == iv) ? iv_value : s.variable();
> 1127:       if (variable->bottom_type()->isa_ptr() != nullptr) {
> 1128:         // Make sure that ctlr is late enough, so that we do not

Suggestion:

        // Make sure that ctrl is late enough, so that we do not

test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingCastP2XCtrl.java line 2:

> 1: /*
> 2:  * Copyright (c) 2024, 2025, Oracle and/or its affiliates. All rights reserved.

Suggestion:

 * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved.

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27045#pullrequestreview-3176374409
PR Review Comment: https://git.openjdk.org/jdk/pull/27045#discussion_r2315976205
PR Review Comment: https://git.openjdk.org/jdk/pull/27045#discussion_r2315967541

From mli at openjdk.org  Tue Sep  2 12:50:46 2025
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 2 Sep 2025 12:50:46 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v2]
In-Reply-To: <r59HzSMnxiMQcRFZ5aBbq1CDJSCMJQ0iHnuwTMg4eEA=.0db27f56-cef3-410a-ba97-f56b4870ce7a@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <BymtYSgSlXggycd57z3AGcv-BFeaLsawQuSXWo_93lc=.6dc39f16-8cc8-4956-8214-ba7bc9b648a3@github.com>
 <qsCdM9M_iBoyg5STxTJgwB8NGx3P5kqqDTPgncWzvW8=.05b756a4-a84a-4d8c-8641-de629692dbae@github.com>
 <JwURJGEePbyAN6WQ26BLgr6jhvCS-AXsPyMUl8jPdK8=.dd8b659a-f596-47d5-8be4-3539c107b88d@github.com>
 <r59HzSMnxiMQcRFZ5aBbq1CDJSCMJQ0iHnuwTMg4eEA=.0db27f56-cef3-410a-ba97-f56b4870ce7a@github.com>
Message-ID: <VaFi5GTeRk5rXKSoSN-5bGyW60tqEv9Ub-jw8xAYpN4=.f908e30b-dadd-4794-aa24-ca5507a5ff11@github.com>

On Tue, 2 Sep 2025 02:06:16 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> We have:
>> 
>> auipc  // instruction_address()                                                       # Never changed
>> ld     // instruction_address() + NativeInstruction::instruction_size                 # Never changed
>> jal(r) // instruction_address() + 2 * NativeInstruction::instruction_size (jal_pc)    # jal<->jalr
>> 
>> We only change the instruction at "instruction_address() + 2 * NativeInstruction::instruction_size".
>> 
>> Note that jal_pos and jal_pc refer to a "jump and link instruction", not specifically jal or jalr.
>> 
>> Make sense?
>
> Maybe we can give it a new name to avoid possible confusion? `jmp_pc` or simply `pc`?

> We only change the instruction at "instruction_address() + 2 * NativeInstruction::instruction_size".

Right!

> Note that jal_pos and jal_pc refer to a "jump and link instruction", not specifically jal or jalr.

Since we're patching either a `jal` or a `jalr` instruction, `jal` is misleading; I agree `jmp_xxx` is a better name.
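The offsets in the quoted layout are simple arithmetic. A sketch, assuming 4-byte instructions (`NativeInstruction::instruction_size` on RISC-V without compressed instructions) and using the hypothetical accessor name `jmp_pc()` suggested above:

```cpp
#include <cassert>
#include <cstdint>

// Assumed size of one RISC-V base instruction in bytes.
constexpr uintptr_t kInstructionSize = 4;

// Hypothetical view of the auipc / ld / jal(r) sequence.
struct CallSiteLayout {
  uintptr_t instruction_address;  // auipc: never changed
  uintptr_t ld_pc() const { return instruction_address + kInstructionSize; }       // ld: never changed
  uintptr_t jmp_pc() const { return instruction_address + 2 * kInstructionSize; }  // patched: jal <-> jalr
};
```

Only the word at `jmp_pc()` is rewritten; the name says "jump" rather than `jal`, so it stays accurate whichever of the two instructions is currently installed.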

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26944#discussion_r2315982823

From epeter at openjdk.org  Tue Sep  2 12:58:37 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 2 Sep 2025 12:58:37 GMT
Subject: RFR: 8366490: C2 SuperWord: wrong result because CastP2X is
 missing ctrl and floats over SafePoint creating stale oops [v2]
In-Reply-To: <7guNwHJ6tuJXGG-X9aACAWAHjsneD4uryM-ZazES_Uc=.fe831ae6-c8a1-446d-b63e-5b7a1a1f8704@github.com>
References: <7guNwHJ6tuJXGG-X9aACAWAHjsneD4uryM-ZazES_Uc=.fe831ae6-c8a1-446d-b63e-5b7a1a1f8704@github.com>
Message-ID: <NSMK0nJ3hKA-wBH_fjc0a4bTT9SixSE9Xm8gk69Ub0Q=.666c98dd-894b-4fe0-9e4d-6c735bbb48fa@github.com>

> **Analysis**
> 
> A `CastP2X` without ctrl can float. If it floats over a `SafePoint` (or call), we may GC and move the oop. But the `CastP2X` value does not end up on the oop-map, and so the pointer is stale (old).
> 
> With `StressGCM`, the aliasing runtime check has one `CastP2X` that floats over the SafePoint, and another that stays after the SafePoint. Both read the oop of the same array, so instead of getting the same address, we now get the old and the new oop. And so the aliasing runtime check passes (thinks there is no aliasing), even though there is aliasing. We end up vectorizing, which reorders the loads/stores and would only be safe if there is no aliasing.
> 
> **Fix:** add control to the `CastP2X` so that it cannot float too far.
> 
> **Details**
> 
> 
> rbp = Allocate array
> spill <- rbp + 0x20
> 
> call to allocateArrays
> -> allocates a lot, and triggers GC. That moves the allocated array behind rbp
> -> rbp is oop-mapped, so it is updated automatically to the new oop
> -> spill value remains based on the old oop
> 
> We now compute the aliasing runtime check:
> -> one side of the comparison is computed from rbp (new oop)
> -> the other side is computed from the spill value (old oop)
> -> the cmp returns a nonsensical value, and we take the wrong branch
> -> vectorize even though we have aliasing!

Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision:

 - fix test requires
 - Apply suggestions from code review
   
   Co-authored-by: Tobias Hartmann <tobias.hartmann at oracle.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27045/files
  - new: https://git.openjdk.org/jdk/pull/27045/files/91652115..13f70d31

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27045&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27045&range=00-01

  Stats: 3 lines in 2 files changed: 1 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/27045.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27045/head:pull/27045

PR: https://git.openjdk.org/jdk/pull/27045

From chagedorn at openjdk.org  Tue Sep  2 13:02:46 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Tue, 2 Sep 2025 13:02:46 GMT
Subject: RFR: 8366490: C2 SuperWord: wrong result because CastP2X is
 missing ctrl and floats over SafePoint creating stale oops [v2]
In-Reply-To: <NSMK0nJ3hKA-wBH_fjc0a4bTT9SixSE9Xm8gk69Ub0Q=.666c98dd-894b-4fe0-9e4d-6c735bbb48fa@github.com>
References: <7guNwHJ6tuJXGG-X9aACAWAHjsneD4uryM-ZazES_Uc=.fe831ae6-c8a1-446d-b63e-5b7a1a1f8704@github.com>
 <NSMK0nJ3hKA-wBH_fjc0a4bTT9SixSE9Xm8gk69Ub0Q=.666c98dd-894b-4fe0-9e4d-6c735bbb48fa@github.com>
Message-ID: <_WggL2SoHOitvvIsgLxItXT9Tr8vk-gcfray1GOTvsw=.bf03d0f6-bafb-4010-b4d9-f927e0cbe944@github.com>

On Tue, 2 Sep 2025 12:58:37 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> **Analysis**
>> 
>> A `CastP2X` without ctrl can float. If it floats over a `SafePoint` (or call), we may GC and move the oop. But the `CastP2X` value does not end up on the oop-map, and so the pointer is stale (old).
>> 
>> With `StressGCM`, the aliasing runtime check has one `CastP2X` that floats over the SafePoint, and another that stays after the SafePoint. Both read the oop of the same array, so instead of getting the same address, we now get the old and the new oop. And so the aliasing runtime check passes (thinks there is no aliasing), even though there is aliasing. We end up vectorizing, which reorders the loads/stores and would only be safe if there is no aliasing.
>> 
>> **Fix:** add control to the `CastP2X` so that it cannot float too far.
>> 
>> **Details**
>> 
>> 
>> rbp = Allocate array
>> spill <- rbp + 0x20
>> 
>> call to allocateArrays
>> -> allocates a lot, and triggers GC. That moves the allocated array behind rbp
>> -> rbp is oop-mapped, so it is updated automatically to the new oop
>> -> spill value remains based on the old oop
>> 
>> We now compute the aliasing runtime check:
>> -> one side of the comparison is computed from rbp (new oop)
>> -> the other side is computed from the spill value (old oop)
>> -> the cmp returns a nonsensical value, and we take the wrong branch
>> -> vectorize even though we have aliasing!
>
> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - fix test requires
>  - Apply suggestions from code review
>    
>    Co-authored-by: Tobias Hartmann <tobias.hartmann at oracle.com>

Otherwise, looks good to me, too!

test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingCastP2XCtrl.java line 59:

> 57:  * @test id=vanilla
> 58:  * @bug 8366490
> 59:  * @run driver compiler.loopopts.superword.TestAliasingCastP2XCtrl

This should be `main` to allow running with passed-in flags.
Suggestion:

 * @run main compiler.loopopts.superword.TestAliasingCastP2XCtrl

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27045#pullrequestreview-3176446699
PR Review Comment: https://git.openjdk.org/jdk/pull/27045#discussion_r2316014972

From mhaessig at openjdk.org  Tue Sep  2 13:02:45 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Tue, 2 Sep 2025 13:02:45 GMT
Subject: RFR: 8366490: C2 SuperWord: wrong result because CastP2X is
 missing ctrl and floats over SafePoint creating stale oops [v2]
In-Reply-To: <NSMK0nJ3hKA-wBH_fjc0a4bTT9SixSE9Xm8gk69Ub0Q=.666c98dd-894b-4fe0-9e4d-6c735bbb48fa@github.com>
References: <7guNwHJ6tuJXGG-X9aACAWAHjsneD4uryM-ZazES_Uc=.fe831ae6-c8a1-446d-b63e-5b7a1a1f8704@github.com>
 <NSMK0nJ3hKA-wBH_fjc0a4bTT9SixSE9Xm8gk69Ub0Q=.666c98dd-894b-4fe0-9e4d-6c735bbb48fa@github.com>
Message-ID: <CUI0y8N5FgEYjNO85hEiByfZ-N0m60XRHUQkeOMu-Zc=.1de81907-bee2-49ba-8cea-5fed5205b5cb@github.com>

On Tue, 2 Sep 2025 12:58:37 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> **Analysis**
>> 
>> A `CastP2X` without ctrl can float. If it floats over a `SafePoint` (or call), we may GC and move the oop. But the `CastP2X` value does not end up on the oop-map, and so the pointer is stale (old).
>> 
>> With `StressGCM`, the aliasing runtime check has one `CastP2X` that floats over the SafePoint, and another that stays after the SafePoint. Both read the oop of the same array, so instead of getting the same address, we now get the old and the new oop. And so the aliasing runtime check passes (thinks there is no aliasing), even though there is aliasing. We end up vectorizing, which reorders the loads/stores and would only be safe if there is no aliasing.
>> 
>> **Fix:** add control to the `CastP2X` so that it cannot float too far.
>> 
>> **Details**
>> 
>> 
>> rbp = Allocate array
>> spill <- rbp + 0x20
>> 
>> call to allocateArrays
>> -> allocates a lot, and triggers GC. That moves the allocated array behind rbp
>> -> rbp is oop-mapped, so it is updated automatically to the new oop
>> -> spill value remains based on the old oop
>> 
>> We now compute the aliasing runtime check:
>> -> one side of the comparison is computed from rbp (new oop)
>> -> the other side is computed from the spill value (old oop)
>> -> the cmp returns a nonsensical value, and we take the wrong branch
>> -> vectorize even though we have aliasing!
>
> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - fix test requires
>  - Apply suggestions from code review
>    
>    Co-authored-by: Tobias Hartmann <tobias.hartmann at oracle.com>

Thank you for the fix and the easy-to-follow analysis, @eme64. I just have a few minor comments. Otherwise, this looks good.

src/hotspot/share/opto/vectorization.cpp line 1128:

> 1126:       Node* variable = (s.variable() == iv) ? iv_value : s.variable();
> 1127:       if (variable->bottom_type()->isa_ptr() != nullptr) {
> 1128:         // Make sure that ctrl is late enough, so that we do not

Suggestion:

        // Use a ctrl that is late enough, so that we do not

At first, I read this as "we need to make sure here that the `ctrl` is late enough", when in fact we use a `ctrl` that is passed in and can no longer affect its placement. But feel free to ignore.

test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingCastP2XCtrl.java line 31:

> 29:  *          from floating over a SafePoint that could move the oop,
> 30:  *          and render the cast value stale.
> 31:  *

Suggestion:


Nit: superfluous empty line

test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingCastP2XCtrl.java line 71:

> 69:             int[] a = new int[N];
> 70:         }
> 71:         // Makes GC more likely.

No clue if this is the right use case, but maybe this would be a good use of `-XX:+GCALot`?

-------------

Marked as reviewed by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27045#pullrequestreview-3176427125
PR Review Comment: https://git.openjdk.org/jdk/pull/27045#discussion_r2316012817
PR Review Comment: https://git.openjdk.org/jdk/pull/27045#discussion_r2316005451
PR Review Comment: https://git.openjdk.org/jdk/pull/27045#discussion_r2316002575

From epeter at openjdk.org  Tue Sep  2 13:09:32 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 2 Sep 2025 13:09:32 GMT
Subject: RFR: 8366490: C2 SuperWord: wrong result because CastP2X is
 missing ctrl and floats over SafePoint creating stale oops [v3]
In-Reply-To: <7guNwHJ6tuJXGG-X9aACAWAHjsneD4uryM-ZazES_Uc=.fe831ae6-c8a1-446d-b63e-5b7a1a1f8704@github.com>
References: <7guNwHJ6tuJXGG-X9aACAWAHjsneD4uryM-ZazES_Uc=.fe831ae6-c8a1-446d-b63e-5b7a1a1f8704@github.com>
Message-ID: <zaZmh64g0pWJgG_qND5A-EeRQc5wO8yAiiiYQtgsnMQ=.f0076ad4-8511-4084-9ef1-220d04eb2c15@github.com>

> **Analysis**
> 
> A `CastP2X` without ctrl can float. If it floats over a `SafePoint` (or call), we may GC and move the oop. But the `CastP2X` value does not end up on the oop-map, and so the pointer is stale (old).
> 
> With `StressGCM`, the aliasing runtime check has one `CastP2X` that floats over the SafePoint, and another that stays after the SafePoint. Both read the oop of the same array, so instead of getting the same address, we now get the old and the new oop. And so the aliasing runtime check passes (thinks there is no aliasing), even though there is aliasing. We end up vectorizing, which reorders the loads/stores and would only be safe if there is no aliasing.
> 
> **Fix:** add control to the `CastP2X` so that it cannot float too far.
> 
> **Details**
> 
> 
> rbp = Allocate array
> spill <- rbp + 0x20
> 
> call to allocateArrays
> -> allocates a lot, and triggers GC. That moves the allocated array behind rbp
> -> rbp is oop-mapped, so it is updated automatically to the new oop
> -> spill value remains based on the old oop
> 
> We now compute the aliasing runtime check:
> -> one side of the comparison is computed from rbp (new oop)
> -> the other side is computed from the spill value (old oop)
> -> the cmp returns a nonsensical value, and we take the wrong branch
> -> vectorize even though we have aliasing!
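To make the failure mode concrete, here is a toy C++ model of the scenario above. The names are hypothetical and this is not HotSpot code. A moving GC at a safepoint relocates an array, and the oop-mapped base is updated. A raw integer address computed before the safepoint (the floating `CastP2X`) is not updated. The overlap test then reports no aliasing for two accesses into the same array. Computing both raw addresses after the safepoint, which is what giving the `CastP2X` a control input is meant to ensure, restores the expected answer.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (hypothetical, not HotSpot code). The array's address lives in an
// oop-mapped slot, so the GC updates it; raw integers derived from it earlier
// (like a CastP2X result that floated above the safepoint) are not updated.
struct ToyHeap {
  uintptr_t array_addr;  // oop-mapped base address of the array

  // A moving GC at a safepoint relocates the array.
  void safepoint_gc(uintptr_t new_addr) {
    assert(new_addr != array_addr && "toy GC always moves the array");
    array_addr = new_addr;
  }
};

// Runtime aliasing check: do [a, a + bytes) and [b, b + bytes) overlap?
bool ranges_may_alias(uintptr_t a, uintptr_t b, uintptr_t bytes) {
  return a < b + bytes && b < a + bytes;
}

// Floating CastP2X: one raw address is taken before the safepoint.
bool check_with_floating_castp2x(ToyHeap& heap, uintptr_t new_addr, uintptr_t bytes) {
  uintptr_t stale = heap.array_addr + 0x20;  // spilled raw address, not in the oop map
  heap.safepoint_gc(new_addr);               // the array moves
  uintptr_t fresh = heap.array_addr + 0x20;  // derived from the updated base
  return ranges_may_alias(stale, fresh, bytes);
}

// CastP2X pinned after the safepoint: both raw addresses see the new base.
bool check_with_pinned_castp2x(ToyHeap& heap, uintptr_t new_addr, uintptr_t bytes) {
  heap.safepoint_gc(new_addr);
  uintptr_t a = heap.array_addr + 0x20;
  uintptr_t b = heap.array_addr + 0x20;
  return ranges_may_alias(a, b, bytes);
}
```

Take a base of `0x10000`, a new address of `0x80000` and a 4 KiB range. The floating variant then compares a stale range at `0x10020` with a fresh one at `0x80020`, and reports no overlap. That is exactly the wrong "no aliasing" answer that lets vectorization proceed.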

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  Apply suggestions from code review
  
  Co-authored-by: Manuel Hässig <manuel at haessig.org>
  Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27045/files
  - new: https://git.openjdk.org/jdk/pull/27045/files/13f70d31..d1a35d12

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27045&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27045&range=01-02

  Stats: 3 lines in 2 files changed: 0 ins; 1 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/27045.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27045/head:pull/27045

PR: https://git.openjdk.org/jdk/pull/27045

From epeter at openjdk.org  Tue Sep  2 13:09:32 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 2 Sep 2025 13:09:32 GMT
Subject: RFR: 8366490: C2 SuperWord: wrong result because CastP2X is
 missing ctrl and floats over SafePoint creating stale oops [v3]
In-Reply-To: <CUI0y8N5FgEYjNO85hEiByfZ-N0m60XRHUQkeOMu-Zc=.1de81907-bee2-49ba-8cea-5fed5205b5cb@github.com>
References: <7guNwHJ6tuJXGG-X9aACAWAHjsneD4uryM-ZazES_Uc=.fe831ae6-c8a1-446d-b63e-5b7a1a1f8704@github.com>
 <NSMK0nJ3hKA-wBH_fjc0a4bTT9SixSE9Xm8gk69Ub0Q=.666c98dd-894b-4fe0-9e4d-6c735bbb48fa@github.com>
 <CUI0y8N5FgEYjNO85hEiByfZ-N0m60XRHUQkeOMu-Zc=.1de81907-bee2-49ba-8cea-5fed5205b5cb@github.com>
Message-ID: <_e00ByDtLuvLzyrtZJik2E5wVLYm70Oj0d_7f3zi2oU=.af116453-1cf1-4c4c-bd2e-8fc33d6be943@github.com>

On Tue, 2 Sep 2025 12:54:23 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Apply suggestions from code review
>>   
>>   Co-authored-by: Manuel Hässig <manuel at haessig.org>
>>   Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>
> test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingCastP2XCtrl.java line 71:
> 
>> 69:             int[] a = new int[N];
>> 70:         }
>> 71:         // Makes GC more likely.
> 
> No clue if this is the right use case, but maybe this would be a good use of `-XX:+GCALot`?

Maybe, you could be right!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27045#discussion_r2316033051

From chagedorn at openjdk.org  Tue Sep  2 13:22:41 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Tue, 2 Sep 2025 13:22:41 GMT
Subject: RFR: 8366490: C2 SuperWord: wrong result because CastP2X is
 missing ctrl and floats over SafePoint creating stale oops [v3]
In-Reply-To: <zaZmh64g0pWJgG_qND5A-EeRQc5wO8yAiiiYQtgsnMQ=.f0076ad4-8511-4084-9ef1-220d04eb2c15@github.com>
References: <7guNwHJ6tuJXGG-X9aACAWAHjsneD4uryM-ZazES_Uc=.fe831ae6-c8a1-446d-b63e-5b7a1a1f8704@github.com>
 <zaZmh64g0pWJgG_qND5A-EeRQc5wO8yAiiiYQtgsnMQ=.f0076ad4-8511-4084-9ef1-220d04eb2c15@github.com>
Message-ID: <V9vvzPujZaVKTBdg3hd1KUH5YbI-Sfp1UsES7jWCDJM=.65f5033e-619a-4d49-a021-df55094f4dca@github.com>

On Tue, 2 Sep 2025 13:09:32 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> **Analysis**
>> 
>> A `CastP2X` without ctrl can float. If it floats over a `SafePoint` (or call), we may GC and move the oop. But the `CastP2X` value does not end up on the oop-map, and so the pointer is stale (old).
>> 
>> With `StressGCM`, the aliasing runtime check has one `CastP2X` that floats over the SafePoint, and another that stays after the SafePoint. Both read the oop of the same array, so instead of getting the same address, we now get the old and the new oop. And so the aliasing runtime check passes (thinks there is no aliasing), even though there is aliasing. We end up vectorizing, which reorders the loads/stores and would only be safe if there is no aliasing.
>> 
>> **Fix:** add control to the `CastP2X` so that it cannot float too far.
>> 
>> **Details**
>> 
>> 
>> rbp = Allocate array
>> spill <- rbp + 0x20
>> 
>> call to allocateArrays
>> -> allocates a lot, and triggers GC. That moves the allocated array behind rbp
>> -> rbp is oop-mapped, so it is updated automatically to the new oop
>> -> spill value remains based on the old oop
>> 
>> We now compute the aliasing runtime check:
>> -> one side of the comparison is computed from rbp (new oop)
>> -> the other side is computed from the spill value (old oop)
>> -> the cmp returns a nonsensical value, and we take the wrong branch
>> -> vectorize even though we have aliasing!
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Apply suggestions from code review
>   
>   Co-authored-by: Manuel Hässig <manuel at haessig.org>
>   Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>

Marked as reviewed by chagedorn (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/27045#pullrequestreview-3176542659

From epeter at openjdk.org  Tue Sep  2 13:23:49 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 2 Sep 2025 13:23:49 GMT
Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I,
 MoveI2F [v6]
In-Reply-To: <nepdy9F_UuqGkx4xGZmEcM_bM_Sz1v0FXM6L-RvqotA=.f93adb07-85b8-4bfa-83b5-5e4f769cfd86@github.com>
References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com>
 <nepdy9F_UuqGkx4xGZmEcM_bM_Sz1v0FXM6L-RvqotA=.f93adb07-85b8-4bfa-83b5-5e4f769cfd86@github.com>
Message-ID: <ecCUXuaiM38DU57X4nxOBa4O2tS_Fxq_NugFrtGweCA=.b8827411-3c04-4f2d-a7a9-e461276fb53d@github.com>

On Mon, 1 Sep 2025 09:03:24 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations.
>> 
>> Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes. On darwin/aarch64 it shows:
>> 
>> 
>> Benchmark                                (seed)  (size)   Mode  Cnt      Base      Patch   Units   Diff
>> VectorBitConversion.doubleToLongBits          0    2048  thrpt    8  1168.782   1157.717  ops/ms    -1%
>> VectorBitConversion.doubleToRawLongBits       0    2048  thrpt    8  3999.387   7353.936  ops/ms   +83%
>> VectorBitConversion.floatToIntBits            0    2048  thrpt    8  1200.338   1188.206  ops/ms    -1%
>> VectorBitConversion.floatToRawIntBits         0    2048  thrpt    8  4058.248  14792.474  ops/ms  +264%
>> VectorBitConversion.intBitsToFloat            0    2048  thrpt    8  3050.313  14984.246  ops/ms  +391%
>> VectorBitConversion.longBitsToDouble          0    2048  thrpt    8  3022.691   7379.360  ops/ms  +144%
>> 
>> 
>> The improvements observed are a result of vectorization. The lack of vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that these changes do not affect their performance. These methods do not vectorize because of flow control.
>> 
>> I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions.
>
> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Adjust vector size expectations

Testing passed, Approved!
@galderz Thanks for working on this :)
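The PR description above attributes the flat results for `doubleToLongBits` and `floatToIntBits` to control flow. A rough C++ analogue shows where that control flow comes from. The helper names are hypothetical and nothing here is JDK code. The raw variants are pure bit copies, which is the shape `MoveD2L`/`MoveL2D` represent. The non-raw Java method collapses every NaN to one canonical pattern (`0x7ff8000000000000L` for `double`), and that adds a per-element branch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Analogue of Double.doubleToRawLongBits: a pure bit copy (what MoveD2L models).
int64_t raw_bits(double d) {
  int64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}

// Analogue of Double.longBitsToDouble: the reverse bit copy (MoveL2D).
double from_bits(int64_t bits) {
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

// Analogue of Double.doubleToLongBits: every NaN maps to the canonical
// pattern, so each element needs a data-dependent check.
int64_t canonical_bits(double d) {
  if (std::isnan(d)) {
    return INT64_C(0x7ff8000000000000);
  }
  return raw_bits(d);
}

// Straight-line loop body: the raw shape the PR reports as vectorized.
void raw_bits_loop(const double* src, int64_t* dst, size_t n) {
  for (size_t i = 0; i < n; i++) {
    dst[i] = raw_bits(src[i]);
  }
}

// Loop body containing the NaN branch: the shape the PR reports C2 does not vectorize.
void canonical_bits_loop(const double* src, int64_t* dst, size_t n) {
  for (size_t i = 0; i < n; i++) {
    dst[i] = canonical_bits(src[i]);
  }
}
```

A C++ compiler may still vectorize `canonical_bits_loop` using a select. The point of the sketch is only the difference in loop-body shape that the benchmark numbers reflect.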

-------------

PR Review: https://git.openjdk.org/jdk/pull/26457#pullrequestreview-3176546977

From thartmann at openjdk.org  Tue Sep  2 14:00:53 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Tue, 2 Sep 2025 14:00:53 GMT
Subject: RFR: 8366490: C2 SuperWord: wrong result because CastP2X is
 missing ctrl and floats over SafePoint creating stale oops [v3]
In-Reply-To: <zaZmh64g0pWJgG_qND5A-EeRQc5wO8yAiiiYQtgsnMQ=.f0076ad4-8511-4084-9ef1-220d04eb2c15@github.com>
References: <7guNwHJ6tuJXGG-X9aACAWAHjsneD4uryM-ZazES_Uc=.fe831ae6-c8a1-446d-b63e-5b7a1a1f8704@github.com>
 <zaZmh64g0pWJgG_qND5A-EeRQc5wO8yAiiiYQtgsnMQ=.f0076ad4-8511-4084-9ef1-220d04eb2c15@github.com>
Message-ID: <uNuQAJ1qBB2NOyG5FxRsz2XziOyKuNCI4ReOBb7nAWc=.6550f453-b067-428d-b34a-baab0a00d124@github.com>

On Tue, 2 Sep 2025 13:09:32 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> **Analysis**
>> 
>> A `CastP2X` without ctrl can float. If it floats over a `SafePoint` (or call), we may GC and move the oop. But the `CastP2X` value does not end up on the oop-map, and so the pointer is stale (old).
>> 
>> With `StressGCM`, the aliasing runtime check has one `CastP2X` that floats over the SafePoint, and another that stays after the SafePoint. Both read the oop of the same array, so instead of getting the same address, we now get the old and the new oop. And so the aliasing runtime check passes (thinks there is no aliasing), even though there is aliasing. We end up vectorizing, which reorders the loads/stores and would only be safe if there is no aliasing.
>> 
>> **Fix:** add control to the `CastP2X` so that it cannot float too far.
>> 
>> **Details**
>> 
>> 
>> rbp = Allocate array
>> spill <- rbp + 0x20
>> 
>> call to allocateArrays
>> -> allocates a lot, and triggers GC. That moves the allocated array behind rbp
>> -> rbp is oop-mapped, so it is updated automatically to the new oop
>> -> spill value remains based on the old oop
>> 
>> We now compute the aliasing runtime check:
>> -> one side of the comparison is computed from rbp (new oop)
>> -> the other side is computed from the spill value (old oop)
>> -> the cmp returns a nonsensical value, and we take the wrong branch
>> -> vectorize even though we have aliasing!
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Apply suggestions from code review
>   
>   Co-authored-by: Manuel Hässig <manuel at haessig.org>
>   Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>

Marked as reviewed by thartmann (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/27045#pullrequestreview-3176700213

From epeter at openjdk.org  Tue Sep  2 14:08:05 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 2 Sep 2025 14:08:05 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <8L3IGg5YYgi2EjlC-v5U3FkkWvK1swESQFAMwX02I84=.d597910f-0aca-4eb2-b68c-fbe565e73291@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
 <NDDSmCvsbgpWgTU_bCIhNdo8foNn447LmTJ4HsCTv-s=.e0549027-7ac1-4794-bfce-322d3870f9d1@github.com>
 <8L3IGg5YYgi2EjlC-v5U3FkkWvK1swESQFAMwX02I84=.d597910f-0aca-4eb2-b68c-fbe565e73291@github.com>
Message-ID: <ahDUkle8YPUMjgTpAS3CWajrsK6AYoo13oVeoPGV16s=.f754aa0f-649a-4cd4-bbf1-85296035b413@github.com>

On Mon, 1 Sep 2025 16:01:23 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> Oh gosh, I just realized: machine word of course depends on 32bit vs 64bit architecture. Yikes.
>> So maybe the names need to be stack-slots vs words? And there should probably be a quick reminder somewhere that words can be different sizes.
>
> Sure, we can rename them. I think `RM_SIZE_IN_INTS` and `RM_SIZE_IN_WORDS` would be most suitable. I avoided such a change in this changeset to not make it bigger than it already is. Isn't it easier to do the renaming in a follow-up RFE though, instead of before this PR? I'm fine with both though, not that much extra work to do it before.

I think it would be easier to review if you do it first.
That PR won't be super controversial, and just makes the code nicer.
And then when we come back here, we may even be able to drop some comments, or be able to catch bugs just because the reviewers understand better what's going on ;)
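This exchange turns on the difference between ints and words. A small hypothetical helper makes the conversion explicit. It is not an existing HotSpot function, and it assumes 32-bit stack slots. A mask that is N 32-bit units long needs about N/2 machine words on a 64-bit VM, but N words on a 32-bit VM.

```cpp
#include <cstdint>

// Assumptions: a "stack slot" is 32 bits wide, and a machine word is
// approximated by the width of uintptr_t on the build platform.
constexpr unsigned kBitsPerInt = 32;
constexpr unsigned kBitsPerWord = sizeof(uintptr_t) * 8;

static_assert(kBitsPerWord == 32 || kBitsPerWord == 64, "unexpected word size");

// Hypothetical helper: machine words needed to hold `size_in_ints` 32-bit
// units of mask bits, rounded up.
constexpr unsigned rm_size_in_words(unsigned size_in_ints) {
  return (size_in_ints * kBitsPerInt + kBitsPerWord - 1) / kBitsPerWord;
}
```

On a 64-bit build, `rm_size_in_words(5)` is 3 because of the rounding. On a 32-bit build it is 5. The same int count therefore means a different word count depending on the platform.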

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2316205052

From epeter at openjdk.org  Tue Sep  2 14:11:07 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 2 Sep 2025 14:11:07 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <AlpIiTJMMMbzzHhby7ZRsvE5HRO4KaOSesck96YewtY=.bc61ee1f-2b40-4c43-81e1-9feb66151de9@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
 <AlpIiTJMMMbzzHhby7ZRsvE5HRO4KaOSesck96YewtY=.bc61ee1f-2b40-4c43-81e1-9feb66151de9@github.com>
Message-ID: <ce_3y-SiKQOv1BaliDzNA3rWZMuJDwHeCSUAU5hTxyY=.e14a6019-9e7b-416c-bf16-da62ce46d210@github.com>

On Mon, 1 Sep 2025 16:15:28 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

> The main issue is that register masks are stored as part of certain nodes, and nodes get copied by Node::clone

Ok, that answers it for me. Maybe you can expand the comment a little where you mention that masks are `shallowly copied`
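The pitfall Daniel points out can be shown with a toy C++ model. The types are hypothetical and are neither `RegMask` nor `Node`. Suppose a mask keeps part of its bits out of line behind a pointer. A raw byte-wise copy of the enclosing node then copies only the pointer, so the original and the clone share and mutate the same storage. This assumes the copy behaves like the shallow copy the quoted comment attributes to `Node::clone`.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Toy mask with some bits stored out of line (hypothetical layout).
struct ToyMask {
  unsigned inline_bits;
  unsigned* extended_bits;  // points at storage owned elsewhere
};

// Toy node that embeds a mask by value.
struct ToyNode {
  ToyMask mask;
};

static_assert(std::is_trivially_copyable<ToyNode>::value, "raw copy is well-defined");

// Byte-wise clone: copies the pointer, not what it points to.
ToyNode shallow_clone(const ToyNode& n) {
  ToyNode copy;
  std::memcpy(&copy, &n, sizeof(ToyNode));
  return copy;
}
```

After `shallow_clone`, writing through the clone's `extended_bits` also changes what the original sees. This is why out-of-line mask storage needs either an explicit deep copy or documented sharing semantics.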

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2316213034

From epeter at openjdk.org  Tue Sep  2 14:19:02 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 2 Sep 2025 14:19:02 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <dqxTytPsW2lYZV-H1GTUmXITgosk7E7__pGsbUPeXCU=.154f7378-0e1f-4b0d-a5b1-9dc6003fd411@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
 <dqxTytPsW2lYZV-H1GTUmXITgosk7E7__pGsbUPeXCU=.154f7378-0e1f-4b0d-a5b1-9dc6003fd411@github.com>
Message-ID: <INWolnROoCsmEkID5uTRUD-dIEd_-V5AWST3c2BEtlA=.4a5ce5ec-92f5-470c-aa0b-9c8985c882be@github.com>

On Mon, 1 Sep 2025 16:23:57 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> src/hotspot/share/opto/regmask.hpp line 96:
>> 
>>> 94:       (((RM_SIZE_MIN << 5) +                // Slots for machine registers
>>> 95:         (max_method_parameter_length * 2) + // Slots for incoming arguments
>>> 96:         (max_method_parameter_length * 2) + // Slots for outgoing arguments
>> 
>> What's the meaning of incoming vs outgoing arguments? Like this?
>> 
>> Incoming = from caller (outer nesting)
>> Outgoing = to nested call (inner nesting)
>
> Yes, you are correct. There is a detailed explanation in `x86_64.ad` ("Definition of frame structure and management information").

Ok. But that's not immediately apparent here. If you already have a comment, why not mention caller/callee or inner/outer scope?

>> src/hotspot/share/opto/regmask.hpp line 175:
>> 
>>> 173:   // mask can currently represent to be included. If _all_stack = false, we
>>> 174:   // consider the registers not included.
>>> 175:   bool _all_stack = false;
>> 
>> I'd prefer to have some kind of `_is_...` name here. Because when I read `all_stack` and see it is a bool, I wonder what it means - it does not tell me quickly. Does it mean that all registers are on the stack?
>> 
>> Is everything that is beyond the register mask purely on the stack? Is everything from the stack always beyond the register mask? I'm confused :face_with_peeking_eye:
>
> Right, we should probably update this terminology as well. It comes from the fact that register masks can always represent all registers (+ a few stack slots), and anything beyond the mask is necessarily additional stack slots. So, if `_all_stack` is set, it means the register mask includes all of the stack slots. Any suggestion for a better name?

So that could mean that we have stack slots that are in the mask, and that are off, but we still have `_all_stack = true`, right? That sounds a little contradictory to me.

Some ideas:
- `_value_of_bits_above_mask` - though strictly speaking the mask also represents those bits, and so they are not really "above" the mask.
- `_value_of_bits_above_...` ah it is above the register mask `size`, right? Of course it is a bit suboptimal that the `size` is only for those that we explicitly represent, and does not capture that we implicitly represent. Maybe you can think about naming here too. Optional.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2316237483
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2316234008

From epeter at openjdk.org  Tue Sep  2 14:25:14 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 2 Sep 2025 14:25:14 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <_a6JVBA326t8l1U3ZI8C-J3Ju5jm-RklBFGtnR7fbyY=.70638135-7577-44dc-a212-fe5e39b1f5fa@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
 <_a6JVBA326t8l1U3ZI8C-J3Ju5jm-RklBFGtnR7fbyY=.70638135-7577-44dc-a212-fe5e39b1f5fa@github.com>
Message-ID: <RF4t87nYZFpJ461_rsT41aOIRigwmG6leniy4j9-QaA=.3e079800-bb8c-4e54-8a60-a060c30b5796@github.com>

On Mon, 1 Sep 2025 16:31:58 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> src/hotspot/share/opto/regmask.hpp line 170:
>> 
>>> 168:   // variable indicates how many words we offset with. We consider all
>>> 169:   // registers before the offset to not be included in the register mask.
>>> 170:   unsigned int _offset;
>> 
>> Does that mean we make different slices of the mask?
>
> I don't quite understand the question, can you please elaborate? The `_offset` means we shift the register mask to the right, so that the first bit of the first `_RM_UP` element no longer represents `OptoReg` 0 (but rather `OptoReg` `_offset * BitsPerWord`).

Hmm ok. Now I went to `rm_up` and thought that you would do `i - _offset`. But that's not what happens.

Hmm but then here there is a subtraction:

  bool Member(OptoReg::Name reg) const {
    reg = reg - offset_bits();


Is that consistent? I hope you understand why I'm confused.

>> src/hotspot/share/opto/regmask.hpp line 217:
>> 
>>> 215:   // necessarily representing stack locations) to 1. Here is how the above
>>> 216:   // register mask looks like after clearing, setting _all_stack to true, and
>>> 217:   // successfully rolling over:
>> 
>> I'm still struggling to follow here. Maybe `_offset` is not clear to me yet. What is the value here for it? How is it changed with the `rollover`?
>
> This `_offset` stuff is really only for a very specific use case in `PhaseChaitin::Select`, so I understand it can be hard to follow. The value for `_offset` in the example after rollover is 5 = `_rm_size`, since we have rolled over once. When we roll over the next time, the `_offset` is 10, and so on.

Ok, just make sure you document it in the example :)
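The offset bookkeeping in Daniel's reply can be sketched with a toy C++ model. The names are hypothetical, and neither the explicit bits nor the exact rollover conditions in `PhaseChaitin::Select` are modeled. Both `_offset` and `_rm_size` are counted in words. Each rollover clears the mask, sets `_all_stack`, and advances the offset by `_rm_size`. The first explicit bit then stands for `OptoReg` `_offset * BitsPerWord`.

```cpp
#include <cassert>

// Assumption: 64-bit machine words.
constexpr unsigned kToyBitsPerWord = 64;

// Toy model of the rollover bookkeeping (hypothetical, not RegMask).
struct ToyRolloverMask {
  unsigned rm_size_words;     // explicitly represented words (_rm_size)
  unsigned offset_words = 0;  // window start in words (_offset)
  bool all_stack = false;     // value of all bits above the window

  // Register number represented by the first explicit bit.
  unsigned first_reg() const { return offset_words * kToyBitsPerWord; }

  // Clear (explicit bits not modeled), set all_stack, shift the window up.
  void rollover() {
    all_stack = true;
    offset_words += rm_size_words;
  }
};
```

With `rm_size_words = 5`, the offset goes 0, 5, 10 across two rollovers, matching the values in the reply. On a 64-bit VM the first explicit bit then stands for register 320 and then 640.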

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2316251262
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2316252858

From dlunden at openjdk.org  Tue Sep  2 14:41:58 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Tue, 2 Sep 2025 14:41:58 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <RF4t87nYZFpJ461_rsT41aOIRigwmG6leniy4j9-QaA=.3e079800-bb8c-4e54-8a60-a060c30b5796@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
 <_a6JVBA326t8l1U3ZI8C-J3Ju5jm-RklBFGtnR7fbyY=.70638135-7577-44dc-a212-fe5e39b1f5fa@github.com>
 <RF4t87nYZFpJ461_rsT41aOIRigwmG6leniy4j9-QaA=.3e079800-bb8c-4e54-8a60-a060c30b5796@github.com>
Message-ID: <GdM72hQe1NvODLC6vcGtXrL5GnMA2c6IsRcdVW3z6r8=.740386db-28e4-46b7-a321-2218dfe2d846@github.com>

On Tue, 2 Sep 2025 14:20:58 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I don't quite understand the question, can you please elaborate? The `_offset` means we shift the register mask to the right, so that the first bit of the first `_RM_UP` element no longer represents `OptoReg` 0 (but rather `OptoReg` `_offset * BitsPerWord`).
>
> Hmm ok. Now I went to `rm_up` and thought that you would do `i - _offset`. But that's not what happens.
> 
> Hmm but then here there is a subtraction:
> 
>   bool Member(OptoReg::Name reg) const {
>     reg = reg - offset_bits();
> 
> 
> Is that consistent? I hope you understand why I'm confused.

Yes, the subtraction is consistent, because if the register mask is offset, we can no longer use the OptoReg to directly index the mask. Small simplified example: register mask with 5 bits, offset by 10. First bit (index 0) represents OptoReg 10, second bit (index 1) represents OptoReg 11, etc. If we call `Member(15)`, we need to subtract the offset so we look at the correct index in the register mask (index 5).
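Daniel's explanation can be sketched as a toy class. It also draws on the `_offset` and `_all_stack` comments quoted earlier in this thread. The class is hypothetical, not HotSpot's `RegMask`, and it counts the offset in bits rather than words for brevity. Bit `i` of the explicit storage represents register `offset + i`. Registers below the offset are treated as not included, and registers above the explicit bits take the `_all_stack` value.

```cpp
#include <cassert>
#include <vector>

// Toy offset register mask (hypothetical; offset counted in bits here).
class ToyOffsetMask {
 public:
  ToyOffsetMask(int size_bits, int offset_bits, bool all_stack)
      : bits_(size_bits, false), offset_bits_(offset_bits), all_stack_(all_stack) {}

  void insert(int reg) {
    int idx = reg - offset_bits_;  // translate register number to local index
    assert(idx >= 0 && idx < static_cast<int>(bits_.size()));
    bits_[idx] = true;
  }

  // Same idea as Member(): subtract the offset before indexing.
  bool member(int reg) const {
    int idx = reg - offset_bits_;
    if (idx < 0) return false;                                      // below the offset: not included
    if (idx >= static_cast<int>(bits_.size())) return all_stack_;  // above the explicit bits
    return bits_[idx];
  }

 private:
  std::vector<bool> bits_;
  int offset_bits_;
  bool all_stack_;
};
```

Take a 5-bit window at offset 10. `member(12)` reads local index 2. `member(15)` maps to local index 5, one past the explicit bits, so in this toy it returns the `_all_stack` value.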

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2316301804

From epeter at openjdk.org  Tue Sep  2 14:56:46 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 2 Sep 2025 14:56:46 GMT
Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I,
 MoveI2F [v6]
In-Reply-To: <nepdy9F_UuqGkx4xGZmEcM_bM_Sz1v0FXM6L-RvqotA=.f93adb07-85b8-4bfa-83b5-5e4f769cfd86@github.com>
References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com>
 <nepdy9F_UuqGkx4xGZmEcM_bM_Sz1v0FXM6L-RvqotA=.f93adb07-85b8-4bfa-83b5-5e4f769cfd86@github.com>
Message-ID: <idfXzOAwRtFdxYxtZ5KEM_zMP5YkK7Ezf403EIh4OKM=.2de7fd1b-fade-4574-9e8d-26b5155ca4e0@github.com>

On Mon, 1 Sep 2025 09:03:24 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations.
>> 
>> Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes. On darwin/aarch64 it shows:
>> 
>> 
>> Benchmark                                (seed)  (size)   Mode  Cnt      Base      Patch   Units   Diff
>> VectorBitConversion.doubleToLongBits          0    2048  thrpt    8  1168.782   1157.717  ops/ms    -1%
>> VectorBitConversion.doubleToRawLongBits       0    2048  thrpt    8  3999.387   7353.936  ops/ms   +83%
>> VectorBitConversion.floatToIntBits            0    2048  thrpt    8  1200.338   1188.206  ops/ms    -1%
>> VectorBitConversion.floatToRawIntBits         0    2048  thrpt    8  4058.248  14792.474  ops/ms  +264%
>> VectorBitConversion.intBitsToFloat            0    2048  thrpt    8  3050.313  14984.246  ops/ms  +391%
>> VectorBitConversion.longBitsToDouble          0    2048  thrpt    8  3022.691   7379.360  ops/ms  +144%
>> 
>> 
>> The improvements observed are a result of vectorization. The lack of vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that these changes do not affect their performance. These methods do not vectorize because of flow control.
>> 
>> I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions.
>
> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Adjust vector size expectations

Marked as reviewed by epeter (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/26457#pullrequestreview-3176939920

From galder at openjdk.org  Tue Sep  2 14:56:47 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Tue, 2 Sep 2025 14:56:47 GMT
Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I,
 MoveI2F [v6]
In-Reply-To: <CvRoKTqLKs2t1jFrMm2Au1g85grsxItFDdk1v1VT-ag=.fa323248-154a-4273-8967-50e815ca11e4@github.com>
References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com>
 <nepdy9F_UuqGkx4xGZmEcM_bM_Sz1v0FXM6L-RvqotA=.f93adb07-85b8-4bfa-83b5-5e4f769cfd86@github.com>
 <CvRoKTqLKs2t1jFrMm2Au1g85grsxItFDdk1v1VT-ag=.fa323248-154a-4273-8967-50e815ca11e4@github.com>
Message-ID: <IEwknlbMUvJ8m2KMIgSTT76b6tis4SaKEtLgPKTzYHI=.6c8cccfa-0293-4897-b967-8b3678541d7b@github.com>

On Mon, 1 Sep 2025 09:07:07 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Adjust vector size expectations
>
> Perfect, thanks for the update! I'll submit testing again :)

@eme64 thanks for running the tests! Did you actually mark the review as approved?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3245692537

From duke at openjdk.org  Tue Sep  2 15:03:45 2025
From: duke at openjdk.org (duke)
Date: Tue, 2 Sep 2025 15:03:45 GMT
Subject: RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I,
 MoveI2F [v6]
In-Reply-To: <nepdy9F_UuqGkx4xGZmEcM_bM_Sz1v0FXM6L-RvqotA=.f93adb07-85b8-4bfa-83b5-5e4f769cfd86@github.com>
References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com>
 <nepdy9F_UuqGkx4xGZmEcM_bM_Sz1v0FXM6L-RvqotA=.f93adb07-85b8-4bfa-83b5-5e4f769cfd86@github.com>
Message-ID: <yPQyFXqKrARnxITd7CDLtlaDAa8fmhIc_cUxSk-J6_c=.84690886-a684-41bb-a892-9dce42836944@github.com>

On Mon, 1 Sep 2025 09:03:24 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations.
>> 
>> Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes. On darwin/aarch64 it shows:
>> 
>> 
>> Benchmark                                (seed)  (size)   Mode  Cnt      Base      Patch   Units   Diff
>> VectorBitConversion.doubleToLongBits          0    2048  thrpt    8  1168.782   1157.717  ops/ms    -1%
>> VectorBitConversion.doubleToRawLongBits       0    2048  thrpt    8  3999.387   7353.936  ops/ms   +83%
>> VectorBitConversion.floatToIntBits            0    2048  thrpt    8  1200.338   1188.206  ops/ms    -1%
>> VectorBitConversion.floatToRawIntBits         0    2048  thrpt    8  4058.248  14792.474  ops/ms  +264%
>> VectorBitConversion.intBitsToFloat            0    2048  thrpt    8  3050.313  14984.246  ops/ms  +391%
>> VectorBitConversion.longBitsToDouble          0    2048  thrpt    8  3022.691   7379.360  ops/ms  +144%
>> 
>> 
>> The improvements observed are a result of vectorization. The lack of vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that these changes do not affect their performance. These methods do not vectorize because of flow control.
>> 
>> I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions.
>
> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Adjust vector size expectations

@galderz 
Your change (at version 632408ba2adf8f3bffe226a9c2bb0db022d4e8d1) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3245723721

From vlivanov at openjdk.org  Tue Sep  2 16:00:48 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 2 Sep 2025 16:00:48 GMT
Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v8]
In-Reply-To: <e6BxIQEd-ibINc9J7QTxPN2ELJSMWrofPk9z_j11ESk=.4e8ab4f4-696c-4476-afcb-a36fcf610a4d@github.com>
References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com>
 <e6BxIQEd-ibINc9J7QTxPN2ELJSMWrofPk9z_j11ESk=.4e8ab4f4-696c-4476-afcb-a36fcf610a4d@github.com>
Message-ID: <bhuh8Pj6MVAhJi3zuFa9sIdBKChzxz94Ue_QuSSuEdE=.2d7ee4eb-6462-44c1-b645-27d5b58ad333@github.com>

On Tue, 26 Aug 2025 22:59:54 GMT, Igor Veresov <iveresov at openjdk.org> wrote:

>> This change fixes multiple issues with training data verification. While the current state of things in the mainline will not cause any issues (because of the absence of the call to `TD::verify()` during the shutdown), it does cause problems in the leyden repo. This change strengthens verification in the mainline (by adding the shutdown verify call) and fixes the problems that prevent it from working reliably.
>
> Igor Veresov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Relax verification invariant

src/hotspot/share/oops/trainingData.cpp line 635:

> 633:     int init_deps_left2 = compute_init_deps_left();
> 634: 
> 635:     bool invariant = (init_deps_left1 >= init_deps_left2);

I assume this check takes concurrent class initialization into account and init notification events are processed on a dedicated thread. Can we strengthen the check by repeatedly performing it and ensuring the value converges? Also, maybe take event queue into account?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2316527758

From iveresov at openjdk.org  Tue Sep  2 16:19:44 2025
From: iveresov at openjdk.org (Igor Veresov)
Date: Tue, 2 Sep 2025 16:19:44 GMT
Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v8]
In-Reply-To: <bhuh8Pj6MVAhJi3zuFa9sIdBKChzxz94Ue_QuSSuEdE=.2d7ee4eb-6462-44c1-b645-27d5b58ad333@github.com>
References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com>
 <e6BxIQEd-ibINc9J7QTxPN2ELJSMWrofPk9z_j11ESk=.4e8ab4f4-696c-4476-afcb-a36fcf610a4d@github.com>
 <bhuh8Pj6MVAhJi3zuFa9sIdBKChzxz94Ue_QuSSuEdE=.2d7ee4eb-6462-44c1-b645-27d5b58ad333@github.com>
Message-ID: <JwnNrCDWVBvRYXASJFxAXV0PyDvvkn4Fz0F_B1B9ty4=.7dbebfeb-fa0c-43f9-a2c9-ef5c81089d80@github.com>

On Tue, 2 Sep 2025 15:57:42 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> Igor Veresov has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Relax verification invariant
>
> src/hotspot/share/oops/trainingData.cpp line 635:
> 
>> 633:     int init_deps_left2 = compute_init_deps_left();
>> 634: 
>> 635:     bool invariant = (init_deps_left1 >= init_deps_left2);
> 
> I assume this check takes concurrent class initialization into account and init notification events are processed on a dedicated thread. Can we strengthen the check by repeatedly performing it and ensuring the value converges? Also, maybe take event queue into account?

It's very hard to do reliably given the way VM shutdown currently works. There is no way to ensure that all the Java threads are stopped, so checking for convergence is problematic. So, the best I can do right now is prove the `>=` property.
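A toy C++ model shows why `>=` is the property that can be checked. The names are hypothetical, and the real `compute_init_deps_left()` walks dependencies rather than reading a counter. Other threads can only decrease the number of uninitialized dependencies between the two reads. Equality can therefore fail, while `>=` always holds. A callback stands in for concurrent class initialization to keep the example deterministic.

```cpp
#include <atomic>
#include <cassert>

// Toy stand-in for the outstanding-init-dependency count (hypothetical).
struct ToyInitDeps {
  std::atomic<int> left;
  explicit ToyInitDeps(int n) : left(n) {}
  void notify_initialized() { left.fetch_sub(1); }  // another class finished init
  int compute_left() const { return left.load(); }
};

// Two reads with possible concurrent progress in between; `progress`
// simulates other threads initializing classes.
template <typename Progress>
bool check_relaxed_invariant(const ToyInitDeps& deps, Progress progress) {
  int left1 = deps.compute_left();
  progress();
  int left2 = deps.compute_left();
  return left1 >= left2;  // the count only ever goes down
}

template <typename Progress>
bool check_strict_invariant(const ToyInitDeps& deps, Progress progress) {
  int left1 = deps.compute_left();
  progress();
  int left2 = deps.compute_left();
  return left1 == left2;  // too strong once initialization runs concurrently
}
```

Checking convergence would mean repeating the reads until two successive values agree. That only proves something once no other thread can still make progress, which is the shutdown guarantee described above as missing.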

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2316575038

From iveresov at openjdk.org  Tue Sep  2 16:19:45 2025
From: iveresov at openjdk.org (Igor Veresov)
Date: Tue, 2 Sep 2025 16:19:45 GMT
Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v8]
In-Reply-To: <JwnNrCDWVBvRYXASJFxAXV0PyDvvkn4Fz0F_B1B9ty4=.7dbebfeb-fa0c-43f9-a2c9-ef5c81089d80@github.com>
References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com>
 <e6BxIQEd-ibINc9J7QTxPN2ELJSMWrofPk9z_j11ESk=.4e8ab4f4-696c-4476-afcb-a36fcf610a4d@github.com>
 <bhuh8Pj6MVAhJi3zuFa9sIdBKChzxz94Ue_QuSSuEdE=.2d7ee4eb-6462-44c1-b645-27d5b58ad333@github.com>
 <JwnNrCDWVBvRYXASJFxAXV0PyDvvkn4Fz0F_B1B9ty4=.7dbebfeb-fa0c-43f9-a2c9-ef5c81089d80@github.com>
Message-ID: <iH-Isn80LyqlCHEE4-165oAmC1wkJGjx9ngxpjuANms=.8ffc05b5-d960-4ef8-882a-770457261a9f@github.com>

On Tue, 2 Sep 2025 16:15:42 GMT, Igor Veresov <iveresov at openjdk.org> wrote:

>> src/hotspot/share/oops/trainingData.cpp line 635:
>> 
>>> 633:     int init_deps_left2 = compute_init_deps_left();
>>> 634: 
>>> 635:     bool invariant = (init_deps_left1 >= init_deps_left2);
>> 
>> I assume this check takes concurrent class initialization into account and init notification events are processed on a dedicated thread. Can we strengthen the check by repeatedly performing it and ensuring the value converges? Also, maybe take the event queue into account?
>
> It's very hard to do reliably given the way the vm shutdown currently works. There is no way to ensure that all the java threads are stopped, so checking the convergence is problematic. So, the best I can do right now is prove the `>=` property.

I mean, I tried, but gave up on the convergence check for now. Perhaps we'll take a stab at it another time.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2316577781

From vlivanov at openjdk.org  Tue Sep  2 16:59:43 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 2 Sep 2025 16:59:43 GMT
Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v8]
In-Reply-To: <e6BxIQEd-ibINc9J7QTxPN2ELJSMWrofPk9z_j11ESk=.4e8ab4f4-696c-4476-afcb-a36fcf610a4d@github.com>
References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com>
 <e6BxIQEd-ibINc9J7QTxPN2ELJSMWrofPk9z_j11ESk=.4e8ab4f4-696c-4476-afcb-a36fcf610a4d@github.com>
Message-ID: <BICzZtAjyJRSWJVQH9-vYotUR3TikzxT5diBiO9xPSk=.72ae61ba-1d99-4e0d-b70a-dfb87e25be9f@github.com>

On Tue, 26 Aug 2025 22:59:54 GMT, Igor Veresov <iveresov at openjdk.org> wrote:

>> This change fixes multiple issues with training data verification. While the current state of things in the mainline will not cause any issues (because `TD::verify()` is not called during shutdown), it does cause problems in the leyden repo. This change strengthens verification in the mainline (by adding the shutdown verify call) and fixes the problems that prevent it from working reliably.
>
> Igor Veresov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Relax verification invariant

Looks good.

-------------

Marked as reviewed by vlivanov (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26866#pullrequestreview-3177412360

From vlivanov at openjdk.org  Tue Sep  2 16:59:44 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 2 Sep 2025 16:59:44 GMT
Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v8]
In-Reply-To: <iH-Isn80LyqlCHEE4-165oAmC1wkJGjx9ngxpjuANms=.8ffc05b5-d960-4ef8-882a-770457261a9f@github.com>
References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com>
 <e6BxIQEd-ibINc9J7QTxPN2ELJSMWrofPk9z_j11ESk=.4e8ab4f4-696c-4476-afcb-a36fcf610a4d@github.com>
 <bhuh8Pj6MVAhJi3zuFa9sIdBKChzxz94Ue_QuSSuEdE=.2d7ee4eb-6462-44c1-b645-27d5b58ad333@github.com>
 <JwnNrCDWVBvRYXASJFxAXV0PyDvvkn4Fz0F_B1B9ty4=.7dbebfeb-fa0c-43f9-a2c9-ef5c81089d80@github.com>
 <iH-Isn80LyqlCHEE4-165oAmC1wkJGjx9ngxpjuANms=.8ffc05b5-d960-4ef8-882a-770457261a9f@github.com>
Message-ID: <qeEG0W6LodkUmy_-jGTRKWnWMtBPkjnS5nrn6thpxAw=.ce8a771f-82b8-453c-85a8-7dbb5ccef5f8@github.com>

On Tue, 2 Sep 2025 16:16:41 GMT, Igor Veresov <iveresov at openjdk.org> wrote:

>> It's very hard to do reliably given the way the vm shutdown currently works. There is no way to ensure that all the java threads are stopped, so checking the convergence is problematic. So, the best I can do right now is prove the `>=` property.
>
> I mean, I tried, but gave up on the convergence check for now. Perhaps we'll take a stab at it another time.

Ok, sounds good. Thanks for the clarifications.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26866#discussion_r2316665607

From vlivanov at openjdk.org  Tue Sep  2 17:58:41 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 2 Sep 2025 17:58:41 GMT
Subject: RFR: 8358751: C2: Recursive inlining check for compiled lambda
 forms is broken
In-Reply-To: <osdBJeRtiCSx5LDg56oHUFfA5vJVsD3ipLTO9fY7Awg=.4b2d1298-9b16-49d6-ab05-dec04644433e@github.com>
References: <osdBJeRtiCSx5LDg56oHUFfA5vJVsD3ipLTO9fY7Awg=.4b2d1298-9b16-49d6-ab05-dec04644433e@github.com>
Message-ID: <KlYOpisR8BFEY-kfZlfi3yyoW-rZxtUvNB_wAbCuo8U=.a2405d68-2124-441b-b3b0-f2a5639276bf@github.com>

On Fri, 22 Aug 2025 01:24:52 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

> Recursive inlining checks are relaxed for compiled LambdaForms. Since LambdaForms are heavily reused, the check is performed on `MethodHandle` receivers instead.
> 
> Unfortunately, the current implementation is broken: JVMState doesn't guarantee the presence of receivers for caller frames.
> An attempt to fetch a pruned receiver returns unrelated information or, in the worst case, performs an out-of-bounds access into the node's input array and crashes the JVM.
>
> The proposed fix captures receiver information as part of inlining and preserves it on `JVMState` for every compiled LambdaForm frame, so it can be reliably recovered during subsequent inlining attempts.
> 
> Testing: hs-tier1 - hs-tier8
> 
> (Special thanks to @mroth23 who prepared a reproducer of the bug.)

Thanks for the reviews, Dean and Roland.

> What about a regression test?

I wasn't able to extract a regression test from the failing program. I added additional asserts to catch problematic accesses, so a similar bug should be easier to catch in the future.
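The idea behind the fix described above can be sketched in a few lines. This is a hypothetical, heavily simplified C++ model, not C2's `JVMState` or `InlineTree` code: each frame stores the `MethodHandle` receiver captured when it was inlined, and the recursion check compares those stored receivers instead of re-reading them from call node inputs that may have been pruned. `Frame`, `recursive_depth`, `should_inline`, and the default limit of 1 are assumptions for illustration.

```cpp
// Hypothetical model of an inlining stack. Each frame records the receiver
// captured at inline time, so the recursion check never has to fetch it
// from a call node's inputs.
struct Frame {
  const void* method;    // stands in for the inlined method
  const void* receiver;  // MethodHandle receiver captured at inline time (may be nullptr)
  const Frame* caller;   // nullptr for the root frame
};

// Count frames on the stack that match the candidate callee. Compiled
// LambdaForms are heavily shared, so for them a frame only counts as
// recursive if it also has the same captured receiver.
int recursive_depth(const Frame* top, const void* callee, const void* receiver,
                    bool callee_is_lambda_form) {
  int depth = 0;
  for (const Frame* f = top; f != nullptr; f = f->caller) {
    if (f->method != callee) continue;
    if (callee_is_lambda_form && f->receiver != receiver) continue;
    ++depth;
  }
  return depth;
}

// Reject inlining once the recursion depth exceeds the (assumed) limit.
bool should_inline(const Frame* top, const void* callee, const void* receiver,
                   bool callee_is_lambda_form, int max_recursive_level = 1) {
  return recursive_depth(top, callee, receiver, callee_is_lambda_form) <= max_recursive_level;
}
```

With the receiver stored on each frame, the check only walks frames it created itself, so there is no way to index past the end of a pruned input array.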

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26891#issuecomment-3246286042

From iklam at openjdk.org  Tue Sep  2 18:10:43 2025
From: iklam at openjdk.org (Ioi Lam)
Date: Tue, 2 Sep 2025 18:10:43 GMT
Subject: RFR: 8365407: Race condition in MethodTrainingData::verify() [v8]
In-Reply-To: <e6BxIQEd-ibINc9J7QTxPN2ELJSMWrofPk9z_j11ESk=.4e8ab4f4-696c-4476-afcb-a36fcf610a4d@github.com>
References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com>
 <e6BxIQEd-ibINc9J7QTxPN2ELJSMWrofPk9z_j11ESk=.4e8ab4f4-696c-4476-afcb-a36fcf610a4d@github.com>
Message-ID: <KfmAVmGdSiFiziYgfPYLhofH6WIboNWnDbBjHnEV0tw=.11dff486-a38e-4bbb-aa1a-7c4cf4776e88@github.com>

On Tue, 26 Aug 2025 22:59:54 GMT, Igor Veresov <iveresov at openjdk.org> wrote:

>> This change fixes multiple issues with training data verification. While the current state of things in the mainline will not cause any issues (because `TD::verify()` is not called during shutdown), it does cause problems in the leyden repo. This change strengthens verification in the mainline (by adding the shutdown verify call) and fixes the problems that prevent it from working reliably.
>
> Igor Veresov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Relax verification invariant

LGTM

-------------

Marked as reviewed by iklam (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26866#pullrequestreview-3177620970

From vlivanov at openjdk.org  Tue Sep  2 18:26:28 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 2 Sep 2025 18:26:28 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v6]
In-Reply-To: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
Message-ID: <nWicBeUUypvCJDy88-jvVzeS5sB-V71Tge1TCw1i2GU=.944068c6-5f47-4e72-870f-b340966ae9db@github.com>

> This PR introduces C2 support for `Reference.reachabilityFence()`.
> 
> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet that it is affected.
> 
> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality.
> 
> Also, I don't consider this fix complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is a CFG-only node, nothing explicitly forbids memory operations from floating past `Reference.reachabilityFence()` and potentially reaching other safepoints that the current analysis treats as non-interfering. Representing `ReachabilityFence` as a memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but the performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend the referent's live range beyond the `ReachabilityFence` nodes associated with it. That would meet the performance criteria, but I prefer to implement it as a follow-up fix.
> 
> Another known issue relates to reachability fences on constant oops. If such a constant is GCed (most likely due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately.
> 
> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667
> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints."
> 
> Testing:
> - [x] hs-tier1 - hs-tier8
> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations
> - [x] java/lang/foreign microbenchmarks

Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:

  Unconditionally schedule RF nodes for IGVN

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25315/files
  - new: https://git.openjdk.org/jdk/pull/25315/files/8b1c6dff..0762dda9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=04-05

  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/25315.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315

PR: https://git.openjdk.org/jdk/pull/25315

From manc at openjdk.org  Tue Sep  2 18:51:48 2025
From: manc at openjdk.org (Man Cao)
Date: Tue, 2 Sep 2025 18:51:48 GMT
Subject: RFR: 8366118: DontCompileHugeMethods is not respected with
 -XX:-TieredCompilation [v5]
In-Reply-To: <kaYxfqjgve0ZNzFAQ3P4s3tJjsaiN3StSdDVj7C71gs=.acbbf467-df7c-4ec8-a6a4-b997ce25c163@github.com>
References: <KeZJJ4fwtgBQCPof9uvquApw5fUZ75EjT2KhM3_ZpEU=.e28f986e-61b8-40d4-b812-58cfae0e2270@github.com>
 <kaYxfqjgve0ZNzFAQ3P4s3tJjsaiN3StSdDVj7C71gs=.acbbf467-df7c-4ec8-a6a4-b997ce25c163@github.com>
Message-ID: <1F0kx1dn34B_lxtY-AMLLkLR3PFXeB3kVUudkbQNuS4=.58673aae-839a-40b2-8814-2972e27d85cb@github.com>

On Fri, 29 Aug 2025 23:12:18 GMT, Man Cao <manc at openjdk.org> wrote:

>> Hi,
>> 
>> Could anyone review this change that fixes https://bugs.openjdk.org/browse/JDK-8366118? When this bug happens, it is difficult or almost impossible to debug due to the lack of stack trace, hs-err log or core dump. Fortunately we are also experimenting with sigaltstack for https://bugs.openjdk.org/browse/JDK-8364654, and it helped immensely to identify the root cause.
>> 
>> I will also try adding a test case for DontCompileHugeMethods under -XX:-TieredCompilation.
>> 
>> -Man
>
> Man Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8366118-DontCompileHugeMethods
>  - Add -Xbatch to test
>  - Use List.of in test
>  - Add a jtreg test
>  - 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation

Could anyone give another approval on the latest change?
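For context, here is a minimal C++ sketch of the gate that `DontCompileHugeMethods` is meant to impose. The struct and function names are hypothetical, and the 8000-byte default for `HugeMethodLimit` is stated as an assumption; the bug title only implies that this gate must also be honored on the `-XX:-TieredCompilation` path, not how the real policy code is organized.

```cpp
// Hypothetical flag holder; defaults are assumptions for illustration.
struct Flags {
  bool DontCompileHugeMethods = true;
  int  HugeMethodLimit = 8000;  // bytecode size threshold (assumed default)
};

// A method is rejected for compilation only when the flag is on AND its
// bytecode size exceeds the limit. The sketch assumes a single gate like
// this is consulted regardless of whether tiered compilation is enabled.
bool can_be_compiled(int code_size, const Flags& flags) {
  return !(flags.DontCompileHugeMethods && code_size > flags.HugeMethodLimit);
}
```

Keeping the size check in one shared predicate (rather than duplicating it per compilation mode) is one way to make it harder for a single mode to skip it.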

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26932#issuecomment-3246433933

From fferrari at openjdk.org  Tue Sep  2 19:05:45 2025
From: fferrari at openjdk.org (Francisco Ferrari Bihurriet)
Date: Tue, 2 Sep 2025 19:05:45 GMT
Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead
 of the Bool type [v3]
In-Reply-To: <_EN6o6Jwu73CNwvSXYt2cHSHu6Yglkp86f1t7lywwi4=.a84b6fac-327a-48a5-8f1e-772b31d8da10@github.com>
References: <LmCIZuKf5HvIO11yvPWX2H7f_2cYqD0EVUNZffsuLh4=.06f2b4b8-3c6f-4cfa-91bb-03df54688033@github.com>
 <oV2eNI_Xgm8CUnHEodKU1dxGRxOFOHEyH-zrf8BmniM=.745c473b-1319-46a2-9ef9-ecaf6bec8668@github.com>
 <aWrX_YhB8STx3Donb9aTYSYgWQ8TSOqMMqOyUuGX_j4=.2b4180a0-4112-4a89-824d-4ccac0f9718d@github.com>
 <_EN6o6Jwu73CNwvSXYt2cHSHu6Yglkp86f1t7lywwi4=.a84b6fac-327a-48a5-8f1e-772b31d8da10@github.com>
Message-ID: <fVF69bQkTbApxdv4G7V9Net5FpWmP4n471WhmEIQoiM=.c8b671f2-be7d-4bff-8f6c-ef80d0a8ce35@github.com>

On Fri, 29 Aug 2025 13:17:48 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> # Absence note
>> 
>> Today is the last day before a ~2 weeks vacation, so my next working day is Monday, September 1st.
>> 
>> Please feel free to keep giving feedback and/or reviews, and I will continue when I'm back.
>> 
>> Cheers,
>> Francisco
>
> Hi @franferrax, hope you had a good vacation!
> 
>> Hi @chhagedorn,
>> 
>> I added the new tests in [e6b1cb8](https://github.com/openjdk/jdk/commit/e6b1cb897d9c75b34744c7d24f72abcec9986b0b). One problem I'm facing is that I'm unable to generate `Bool` nodes with arbitrary `BoolTest` values. Even if I try the assert inversions I removed in [10e1e3f](https://github.com/openjdk/jdk/commit/10e1e3f4f796d05dcd5c56bc2365d5d564d93952), C2 has preference for `BoolTest::ne`, `BoolTest::le` and `BoolTest::lt`. Instead of using `BoolTest::eq`, `BoolTest::gt` or `BoolTest::ge`, it swaps what is put in `IfTrue` and `IfFalse`.
>> 
>> Even if `javac` generates an `ifeq` and an `ifne` with the same inputs, instead of a single `CmpU` with two `Bool`s (`BoolTest::eq` and `BoolTest::ne`), I get a single `Bool` (`BoolTest::ne`) with two `If` nodes (one of them swapping `IfTrue` with `IfFalse`). I guess this is some sort of canonicalization to enable further optimizations.
>> 
>> Do you know a way to influence the `Bool`'s `BoolTest` value? Or @rwestrel do you?
>> 
>> This means the following 8 cases are not really testing what they claim, but repeating other cases with `IfTrue` and `IfFalse` swapped:
>> 
>> * `testCase1aOptimizeAsFalseForGT(xm|mx)` (they should use `BoolTest::gt`, but use `BoolTest::le`)
>> * `testCase1bOptimizeAsFalseForEQ(xm|mx)` (they should use `BoolTest::eq`, but use `BoolTest::ne`)
>> * `testCase1bOptimizeAsFalseForGE(xm|mx)` (they should use `BoolTest::ge`, but use `BoolTest::lt`)
>> * `testCase1bOptimizeAsFalseForGT(xm|mx)` (they should use `BoolTest::gt`, but use `BoolTest::le`)
>> 
>> Even if we don't find a way to influence the `BoolTest`, the cases are still valid and can be kept (just in case the described behaviour changes).
> 
> Hm, that's a good point. `Parse::do_if()` indeed always canonicalizes the `Bool` nodes... But I was sure we can still somehow end up with non-canonicalized versions again with some tricks. I was curious and played around with some examples and could indeed find test cases for `gt`, `ge`, and `eq`.
> 
> I was then also thinking about notification code in IGVN. We already concluded further up that it's not needed for CCP because `CmpU` nodes below `AddI` nodes are put to the worklist again. However, with IGVN, we could modify the graph above the `AndI` as well. We miss notification code for `CmpU` below `AndI`. I changed my test cases further to also run into such a missing optimization case. When run with `-XX:VerifyIterativeGVN=1110`, we indeed get su...

Hi @chhagedorn, thank you for the additional work and your insights. This is much appreciated from a learner perspective.

I didn't fully analyze the `Test.java` you provided yet, but wanted to check if you are aiming to include the missing IGVN notification code as part of this issue (and its corresponding test). Or are you working on an independent issue?

My availability will be limited as the October CPU approaches, but I will try to set aside some time to make `TestBoolNodeGVN.java` emit the right test cases for `gt`, `ge`, and `eq`.
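The canonicalization described in the quoted discussion (C2 keeping `BoolTest::ne`, `BoolTest::le` and `BoolTest::lt`, and expressing `eq`, `gt` and `ge` by negating the test and swapping `IfTrue` and `IfFalse`) can be modeled in a few lines of C++. This is a hypothetical sketch; the enum and function names do not correspond to HotSpot's `BoolTest` API.

```cpp
#include <utility>

// Simplified comparison kinds (hypothetical, not HotSpot's BoolTest).
enum class Test { eq, ne, lt, le, gt, ge };

// Logical negation of a comparison.
Test negate(Test t) {
  switch (t) {
    case Test::eq: return Test::ne;
    case Test::ne: return Test::eq;
    case Test::lt: return Test::ge;
    case Test::ge: return Test::lt;
    case Test::le: return Test::gt;
    case Test::gt: return Test::le;
  }
  return t;  // unreachable
}

// The forms the parser keeps, per the discussion above.
bool is_canonical(Test t) {
  return t == Test::ne || t == Test::lt || t == Test::le;
}

// Returns the test that ends up in the Bool node and whether the
// IfTrue/IfFalse successors were swapped to preserve semantics.
std::pair<Test, bool> canonicalize(Test t) {
  if (is_canonical(t)) return {t, false};
  return {negate(t), true};
}
```

This is why tests written with `gt`, `ge` or `eq` conditions end up exercising the `le`, `lt` or `ne` cases with swapped projections, unless a later transformation reintroduces a non-canonical `Bool`.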

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26666#issuecomment-3246471074

From rehn at openjdk.org  Tue Sep  2 19:31:45 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Tue, 2 Sep 2025 19:31:45 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v2]
In-Reply-To: <qsCdM9M_iBoyg5STxTJgwB8NGx3P5kqqDTPgncWzvW8=.05b756a4-a84a-4d8c-8641-de629692dbae@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <BymtYSgSlXggycd57z3AGcv-BFeaLsawQuSXWo_93lc=.6dc39f16-8cc8-4956-8214-ba7bc9b648a3@github.com>
 <qsCdM9M_iBoyg5STxTJgwB8NGx3P5kqqDTPgncWzvW8=.05b756a4-a84a-4d8c-8641-de629692dbae@github.com>
Message-ID: <djhXJ16-orCF1zf0UrYoRUfNvPObWQNWvoZGV-AOR5s=.99883aa8-3374-4ccc-95bd-075df7a9aabf@github.com>

On Mon, 1 Sep 2025 10:10:31 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>> 
>>  - Merge branch 'master' into 8365926
>>  - Spelling
>>  - Merge branch 'master' into 8365926
>>  - draft jal<->jalr
>
> JDK-23 (last version with trampoline calls)
> Mean: 3189.5827
> Standard Deviation: 284.6478
> 
> JDK-25
> Mean: 3424.8905
> Standard Deviation: 222.2208
> 
> Patch:
> Mean: 3144.8535
> Standard Deviation: 229.2577
> 
> 
> For the performance data, do you have some data for applying this fix on top of the next commit after `JDK-23 (last version with trampoline calls)`? I think this data might be more helpful for understanding the performance comparison between the old trampoline, the stub, and this PR.

@Hamlin-Li

This is the first version that had the new auipc+ld+jalr sequence, i.e. where we could toggle it with UseTrampolines.
I backported optimize_call to it. This version still uses t0/x5 for calls, so return predictions are all messed up.
An even better comparison would be to also backport the use of t1/x6 for calls.
Anyway, here are the numbers from 400 benchmark runs each, using the last iteration:

Using trampolines:
##############
--- Statistical Analysis ---
Average (Mean):     3610.00
Median:             3645.09
Standard Deviation: 297.11
--------------------------

Using load calls:
##############
--- Statistical Analysis ---
Average (Mean):     3691.09
Median:             3793.11
Standard Deviation: 403.80
--------------------------
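The summary statistics above (mean, median and standard deviation over the last-iteration times) can be reproduced with a small helper like the one below. It is only a sketch: the tool that produced the numbers is not named, and whether it used the sample (n-1) or population (n) standard deviation is not stated, so the sample form is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct Summary { double mean, median, stddev; };

// Summarize a non-empty list of benchmark scores.
// Assumption: sample standard deviation (divide by n - 1).
Summary summarize(std::vector<double> xs) {
  const std::size_t n = xs.size();
  const double mean = std::accumulate(xs.begin(), xs.end(), 0.0) / n;
  std::sort(xs.begin(), xs.end());
  const double median = (n % 2 == 1) ? xs[n / 2] : (xs[n / 2 - 1] + xs[n / 2]) / 2.0;
  double ss = 0.0;
  for (double x : xs) ss += (x - mean) * (x - mean);
  const double stddev = n > 1 ? std::sqrt(ss / (n - 1)) : 0.0;
  return {mean, median, stddev};
}

// Relative difference between two means, in percent.
double percent_change(double base, double other) {
  return (other - base) / base * 100.0;
}
```

Applied to the two means above, `percent_change(3610.00, 3691.09)` gives roughly 2.25%, i.e. the load-call mean is about 2.2% higher than the trampoline mean, which is small relative to the reported standard deviations of about 300 to 400.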

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26944#issuecomment-3246536967

From dlong at openjdk.org  Tue Sep  2 20:52:32 2025
From: dlong at openjdk.org (Dean Long)
Date: Tue, 2 Sep 2025 20:52:32 GMT
Subject: RFR: 8366461: Remove obsolete method handle invoke logic [v3]
In-Reply-To: <LQQer6eHAvGEV6clizLClEdOtBBIO7GCQCzibGcEzL8=.7ec9480c-c660-460d-ab5c-69d4d4a4d03d@github.com>
References: <LQQer6eHAvGEV6clizLClEdOtBBIO7GCQCzibGcEzL8=.7ec9480c-c660-460d-ab5c-69d4d4a4d03d@github.com>
Message-ID: <_pqvEs0LIlAc7RjFUwg-bpxS3D2v5U7c6In2sG8XLhQ=.57e3aead-6ac4-4a42-89d2-385d7e6ecedf@github.com>

> At one time, JSR292 support needed special logic to save and restore SP across method handle intrinsic calls, but that is no longer the case. The only platform that still does the save/restore is arm32, where it is no longer necessary. The save/restore can be removed along with related APIs and logic. Note that the arm32 port is largely based on the x86 port, which stopped doing the save/restore in jdk9 ([JDK-8068945](https://bugs.openjdk.org/browse/JDK-8068945)).

Dean Long has updated the pull request incrementally with three additional commits since the last revision:

 - revert whitespace change
 - undo debug changes
 - cleanup

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27059/files
  - new: https://git.openjdk.org/jdk/pull/27059/files/303305ae..eac482a5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27059&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27059&range=01-02

  Stats: 7 lines in 4 files changed: 1 ins; 6 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/27059.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27059/head:pull/27059

PR: https://git.openjdk.org/jdk/pull/27059

From vlivanov at openjdk.org  Tue Sep  2 21:11:46 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 2 Sep 2025 21:11:46 GMT
Subject: RFR: 8366461: Remove obsolete method handle invoke logic [v3]
In-Reply-To: <_pqvEs0LIlAc7RjFUwg-bpxS3D2v5U7c6In2sG8XLhQ=.57e3aead-6ac4-4a42-89d2-385d7e6ecedf@github.com>
References: <LQQer6eHAvGEV6clizLClEdOtBBIO7GCQCzibGcEzL8=.7ec9480c-c660-460d-ab5c-69d4d4a4d03d@github.com>
 <_pqvEs0LIlAc7RjFUwg-bpxS3D2v5U7c6In2sG8XLhQ=.57e3aead-6ac4-4a42-89d2-385d7e6ecedf@github.com>
Message-ID: <jtd7gfuV5C8sQBsx4bQIhWLFlDnIEyCza6u3foAV4RU=.2b6d7b42-3c1f-41ba-8a8e-056a982ad588@github.com>

On Tue, 2 Sep 2025 20:52:32 GMT, Dean Long <dlong at openjdk.org> wrote:

>> At one time, JSR292 support needed special logic to save and restore SP across method handle intrinsic calls, but that is no longer the case. The only platform that still does the save/restore is arm32, where it is no longer necessary. The save/restore can be removed along with related APIs and logic. Note that the arm32 port is largely based on the x86 port, which stopped doing the save/restore in jdk9 ([JDK-8068945](https://bugs.openjdk.org/browse/JDK-8068945)).
>
> Dean Long has updated the pull request incrementally with three additional commits since the last revision:
> 
>  - revert whitespace change
>  - undo debug changes
>  - cleanup

Nice cleanup! Looks good.

-------------

Marked as reviewed by vlivanov (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27059#pullrequestreview-3178139499

From iveresov at openjdk.org  Tue Sep  2 21:30:52 2025
From: iveresov at openjdk.org (Igor Veresov)
Date: Tue, 2 Sep 2025 21:30:52 GMT
Subject: Integrated: 8365407: Race condition in MethodTrainingData::verify()
In-Reply-To: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com>
References: <0795hrryFZveQb4GgjNhdGJSYwIz98RHoJx3JX8LSDY=.dae4d10e-5a1f-49da-bec7-e77360f8026e@github.com>
Message-ID: <rtGfwRrEfd5hOQw6Kz4OV8u5VStOhiwTZsQ5Irqwlx8=.e2413b28-3fdc-4bcf-8dea-6798075487a7@github.com>

On Wed, 20 Aug 2025 18:19:25 GMT, Igor Veresov <iveresov at openjdk.org> wrote:

> This change fixes multiple issues with training data verification. While the current state of things in the mainline will not cause any issues (because `TD::verify()` is not called during shutdown), it does cause problems in the leyden repo. This change strengthens verification in the mainline (by adding the shutdown verify call) and fixes the problems that prevent it from working reliably.

This pull request has now been integrated.

Changeset: 991ac9e6
Author:    Igor Veresov <iveresov at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/991ac9e6168b2573f78772e2d7936792a43fe336
Stats:     90 lines in 6 files changed: 32 ins; 17 del; 41 mod

8365407: Race condition in MethodTrainingData::verify()

Reviewed-by: kvn, vlivanov, iklam

-------------

PR: https://git.openjdk.org/jdk/pull/26866

From dlong at openjdk.org  Tue Sep  2 21:54:45 2025
From: dlong at openjdk.org (Dean Long)
Date: Tue, 2 Sep 2025 21:54:45 GMT
Subject: RFR: 8355354: C2 crashed: assert(_callee == nullptr || _callee ==
 m) failed: repeated inline attempt with different callee [v3]
In-Reply-To: <mOvOsKEuhHRI02qXAr7krpQbxky-CXFBf7MDYNhHvQM=.b01e2923-6126-4b0a-8301-932770391ffe@github.com>
References: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com>
 <mOvOsKEuhHRI02qXAr7krpQbxky-CXFBf7MDYNhHvQM=.b01e2923-6126-4b0a-8301-932770391ffe@github.com>
Message-ID: <Vzn4JZ8KFUqOQYjVuTEX7mwf6wonAB4yGngh9FZarEY=.91038c73-a0f8-40ec-8c15-ccb537f73b81@github.com>

On Mon, 1 Sep 2025 06:50:28 GMT, Damon Fenacci <dfenacci at openjdk.org> wrote:

>> # Issue
>> The CTW test `applications/ctw/modules/java_xml.java` crashes when trying to repeat late inlining of a virtual method (after IGVN passes through the method's call node again). The failure originates [here](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callGenerator.cpp#L473) because `_callee != m`. Apparently when running IGVN a second time after a first late inline failure and [setting the callee in the call generator](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callnode.cpp#L1240) we notice that the previous callee is not the same as the current one.
>> In this specific instance it seems that the issue happens when CTW is compiling Apache Xalan.
>> 
>> # Cause
>> The root of the issue has to do with repeated late inlining, class hierarchy analysis and dynamic class loading.
>> 
>> For this particular issue the two differing methods are `org.apache.xalan.xsltc.compiler.LocationPathPattern::translate` first and `org.apache.xalan.xsltc.compiler.AncestorPattern::translate` the second time. `LocationPathPattern` is an abstract class but has a concrete `translate` method. `AncestorPattern` is a concrete class that extends another abstract class `RelativePathPattern` that extends `LocationPathPattern`. `AncestorPattern` overrides the translate method.
>> What seems to be happening is the following: we compile a virtual call `RelativePathPattern::translate` and at compile time. Only the abstract classes `RelativePathPattern` <: `LocationPathPattern` are loaded. CHA then finds out that the call must always call `LocationPathPattern::translate` because the method is not overwritten anywhere else. However, there is still no non-abstract class in the entire class hierarchy, i.e. as soon as `AncestorPattern` is loaded, this class is then the only non-abstract class in the class hierarchy and therefore the receiver type must be `AncestorPattern`.
>> 
>> More in general, when late inlining is repeated and classes are loaded dynamically, it is possible that the resolved method between a late inlining attempt and the next one is not the same.
>> 
>> # Fix
>> 
>> This looks like a very edge-case. If CHA is affected by class loading the original recorded dependency becomes invalid. This can possibly happen in other situations (e.g JVMTI class redefinition). So, instead of modifying the assert (to check for invalid dependencies) we avoid re-setting the callee method ...
>
> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision:
> 
>   JDK-8355354: avoid resetting callee in call node ideal

src/hotspot/share/opto/compile.cpp line 2117:

> 2115:         cg->call_node()->set_generator(cg);
> 2116:         C->igvn_worklist()->push(cg->call_node());
> 2117:         should_stress = true;

I have a guess what this stress code is doing, but a good comment would help.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26441#discussion_r2317250134

From dlong at openjdk.org  Tue Sep  2 22:16:44 2025
From: dlong at openjdk.org (Dean Long)
Date: Tue, 2 Sep 2025 22:16:44 GMT
Subject: RFR: 8366461: Remove obsolete method handle invoke logic [v3]
In-Reply-To: <jtd7gfuV5C8sQBsx4bQIhWLFlDnIEyCza6u3foAV4RU=.2b6d7b42-3c1f-41ba-8a8e-056a982ad588@github.com>
References: <LQQer6eHAvGEV6clizLClEdOtBBIO7GCQCzibGcEzL8=.7ec9480c-c660-460d-ab5c-69d4d4a4d03d@github.com>
 <_pqvEs0LIlAc7RjFUwg-bpxS3D2v5U7c6In2sG8XLhQ=.57e3aead-6ac4-4a42-89d2-385d7e6ecedf@github.com>
 <jtd7gfuV5C8sQBsx4bQIhWLFlDnIEyCza6u3foAV4RU=.2b6d7b42-3c1f-41ba-8a8e-056a982ad588@github.com>
Message-ID: <Qs9S_gOm2lAwk5iQCgkL_VglUnqegw7tuI8rjVdJ2hg=.6a5633c7-910f-4bb4-9866-ffb6e5c2b0d1@github.com>

On Tue, 2 Sep 2025 21:09:27 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> Dean Long has updated the pull request incrementally with three additional commits since the last revision:
>> 
>>  - revert whitespace change
>>  - undo debug changes
>>  - cleanup
>
> Nice cleanup! Looks good.

Thanks @iwanowww !

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27059#issuecomment-3246957430

From wenanjian at openjdk.org  Wed Sep  3 01:30:56 2025
From: wenanjian at openjdk.org (Anjian Wen)
Date: Wed, 3 Sep 2025 01:30:56 GMT
Subject: RFR: 8366747: RISC-V: Improve VerifyMethodHandles for method handle
 linkers
Message-ID: <jbHif2K7OR4Am8IbYwqbhYkV9iF_MG3sY08ZV_ll4qc=.95f93d39-e6f8-4ef4-8968-beefe909bc5d@github.com>

Following JDK-8353216, add extra verification logic to MethodHandle::invokeBasic/linkTo* to ensure that holder classes are properly initialized on the RISC-V platform.
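As a rough, platform-independent picture of what such a check verifies, here is a hypothetical C++ model. The real code is emitted as RISC-V assembly in the method handle adapters; the enum names and the per-intrinsic rules below (fully initialized holder for `invokeBasic`/`linkToStatic`, initialization at least started or an abstract method for the other `linkTo*` intrinsics) are simplified assumptions for illustration, not the actual implementation.

```cpp
// Hypothetical, ordered class initialization states (simplified).
enum class InitState { allocated, loaded, linked, being_initialized, fully_initialized };

// The intrinsics named in the change description.
enum class Intrinsic { invokeBasic, linkToStatic, linkToVirtual, linkToSpecial, linkToInterface };

struct MethodInfo {
  InitState holder_state;  // init state of the method's holder class
  bool is_abstract;        // e.g. an abstract interface method
};

// Returns true if the holder is sufficiently initialized for this intrinsic.
// A real VerifyMethodHandles stub would stop the VM instead of returning false.
bool holder_init_ok(Intrinsic iid, const MethodInfo& m) {
  switch (iid) {
    case Intrinsic::invokeBasic:
    case Intrinsic::linkToStatic:
      // Assumed (simplified) rule: holder must be fully initialized.
      return m.holder_state == InitState::fully_initialized;
    case Intrinsic::linkToVirtual:
    case Intrinsic::linkToSpecial:
    case Intrinsic::linkToInterface:
      // Assumed rule: initialization must at least have started,
      // unless the target method is abstract.
      return m.holder_state >= InitState::being_initialized || m.is_abstract;
  }
  return false;
}
```

The stricter rule for `invokeBasic`/`linkToStatic` in this sketch reflects that those targets run code of the holder directly, while the virtual and interface forms dispatch through an already-created receiver.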

-------------

Commit messages:
 - RISC-V: Improve VerifyMethodHandles for method handle linkers

Changes: https://git.openjdk.org/jdk/pull/26938/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26938&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8366747
  Stats: 52 lines in 2 files changed: 46 ins; 1 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/26938.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26938/head:pull/26938

PR: https://git.openjdk.org/jdk/pull/26938

From fyang at openjdk.org  Wed Sep  3 02:04:43 2025
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 3 Sep 2025 02:04:43 GMT
Subject: RFR: 8366747: RISC-V: Improve VerifyMethodHandles for method
 handle linkers
In-Reply-To: <jbHif2K7OR4Am8IbYwqbhYkV9iF_MG3sY08ZV_ll4qc=.95f93d39-e6f8-4ef4-8968-beefe909bc5d@github.com>
References: <jbHif2K7OR4Am8IbYwqbhYkV9iF_MG3sY08ZV_ll4qc=.95f93d39-e6f8-4ef4-8968-beefe909bc5d@github.com>
Message-ID: <Wc1t9J11DTVqlqX_8o94iKdd2ErFar8zDMONjN10CA8=.f9998458-5eb9-4e77-af2c-79d2c5e10dc6@github.com>

On Tue, 26 Aug 2025 09:18:14 GMT, Anjian Wen <wenanjian at openjdk.org> wrote:

> Following JDK-8353216, add extra verification logic to MethodHandle::invokeBasic/linkTo* to ensure that holder classes are properly initialized on the RISC-V platform.

Seems fine to me. Two minor comments.

src/hotspot/cpu/riscv/methodHandles_riscv.cpp line 100:

> 98:   __ verify_method_ptr(method);
> 99:   if (VerifyMethodHandles) {
> 100:     Label L_ok;

Can you add an assertion here about the registers? Like: `assert_different_registers(method, t0, t1);`
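To make the requested check concrete: in the snippet, `t1` holds `method_holder` and `t0` is presumably used as a scratch register, so if `method` were allocated to either one, loading the holder would clobber it. The sketch below models what a "different registers" assertion checks; `Reg` and `all_different` are hypothetical names, not HotSpot's `assert_different_registers` implementation.

```cpp
#include <cstddef>
#include <initializer_list>

// Hypothetical stand-in for a register, identified by its encoding
// (on RISC-V, t0 is x5 and t1 is x6).
struct Reg { int encoding; };

// True if no two of the given registers share an encoding, i.e. none alias.
bool all_different(std::initializer_list<Reg> regs) {
  const Reg* r = regs.begin();
  const std::size_t n = regs.size();
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = i + 1; j < n; ++j)
      if (r[i].encoding == r[j].encoding) return false;
  return true;
}
```

For example, `all_different({Reg{10}, Reg{5}, Reg{6}})` holds when `method` lives in x10, while passing x6 for `method` would fail because it aliases `t1`.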

src/hotspot/cpu/riscv/methodHandles_riscv.cpp line 102:

> 100:     Label L_ok;
> 101:     const Register method_holder = t1;
> 102:     __ load_method_holder(method_holder, method);

Please leave a blank line before the switch-case structure.

-------------

PR Review: https://git.openjdk.org/jdk/pull/26938#pullrequestreview-3178689768
PR Review Comment: https://git.openjdk.org/jdk/pull/26938#discussion_r2317554909
PR Review Comment: https://git.openjdk.org/jdk/pull/26938#discussion_r2317575345

From dzhang at openjdk.org  Wed Sep  3 02:13:50 2025
From: dzhang at openjdk.org (Dingli Zhang)
Date: Wed, 3 Sep 2025 02:13:50 GMT
Subject: RFR: 8366747: RISC-V: Improve VerifyMethodHandles for method
 handle linkers
In-Reply-To: <jbHif2K7OR4Am8IbYwqbhYkV9iF_MG3sY08ZV_ll4qc=.95f93d39-e6f8-4ef4-8968-beefe909bc5d@github.com>
References: <jbHif2K7OR4Am8IbYwqbhYkV9iF_MG3sY08ZV_ll4qc=.95f93d39-e6f8-4ef4-8968-beefe909bc5d@github.com>
Message-ID: <jSXWApP5VSenP1-o4B2b0C-OT1WaW5DoLBqjJALmZjg=.537607df-9011-43ff-8b99-84c826bfc274@github.com>

On Tue, 26 Aug 2025 09:18:14 GMT, Anjian Wen <wenanjian at openjdk.org> wrote:

> Following JDK-8353216, add extra verification logic to MethodHandle::invokeBasic/linkTo* to ensure that holder classes are properly initialized on the RISC-V platform.

LGTM, thanks!

-------------

Marked as reviewed by dzhang (Author).

PR Review: https://git.openjdk.org/jdk/pull/26938#pullrequestreview-3178726464

From wenanjian at openjdk.org  Wed Sep  3 02:40:27 2025
From: wenanjian at openjdk.org (Anjian Wen)
Date: Wed, 3 Sep 2025 02:40:27 GMT
Subject: RFR: 8366747: RISC-V: Improve VerifyMethodHandles for method
 handle linkers [v2]
In-Reply-To: <jbHif2K7OR4Am8IbYwqbhYkV9iF_MG3sY08ZV_ll4qc=.95f93d39-e6f8-4ef4-8968-beefe909bc5d@github.com>
References: <jbHif2K7OR4Am8IbYwqbhYkV9iF_MG3sY08ZV_ll4qc=.95f93d39-e6f8-4ef4-8968-beefe909bc5d@github.com>
Message-ID: <ryamqp6ust0SfR5AZITl2xskZFY-5y8qLjitIHyZNx0=.752c2e1e-951a-46ed-ab72-5bfd217ab4cb@github.com>

> Following JDK-8353216, add extra verification logic to MethodHandle::invokeBasic/linkTo* to ensure that holder classes are properly initialized on the RISC-V platform.

Anjian Wen has updated the pull request incrementally with one additional commit since the last revision:

  Add assertion and modify format

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26938/files
  - new: https://git.openjdk.org/jdk/pull/26938/files/52f76be1..b5eb3bd1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26938&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26938&range=00-01

  Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/26938.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26938/head:pull/26938

PR: https://git.openjdk.org/jdk/pull/26938

From wenanjian at openjdk.org  Wed Sep  3 02:40:27 2025
From: wenanjian at openjdk.org (Anjian Wen)
Date: Wed, 3 Sep 2025 02:40:27 GMT
Subject: RFR: 8366747: RISC-V: Improve VerifyMethodHandles for method
 handle linkers [v2]
In-Reply-To: <jSXWApP5VSenP1-o4B2b0C-OT1WaW5DoLBqjJALmZjg=.537607df-9011-43ff-8b99-84c826bfc274@github.com>
References: <jbHif2K7OR4Am8IbYwqbhYkV9iF_MG3sY08ZV_ll4qc=.95f93d39-e6f8-4ef4-8968-beefe909bc5d@github.com>
 <jSXWApP5VSenP1-o4B2b0C-OT1WaW5DoLBqjJALmZjg=.537607df-9011-43ff-8b99-84c826bfc274@github.com>
Message-ID: <rUmqbTGpH0iq2GMKLanzry-TrBP9g9Ve6vF0sUCaHUo=.753ff376-bfbc-4b51-ac6a-fefea8f34da5@github.com>

On Wed, 3 Sep 2025 02:11:07 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

>> Anjian Wen has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Add assertion and modify format
>
> LGTM, thanks!

@DingliZhang Thanks for your review and approval!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26938#issuecomment-3247480195

From wenanjian at openjdk.org  Wed Sep  3 02:40:28 2025
From: wenanjian at openjdk.org (Anjian Wen)
Date: Wed, 3 Sep 2025 02:40:28 GMT
Subject: RFR: 8366747: RISC-V: Improve VerifyMethodHandles for method
 handle linkers [v2]
In-Reply-To: <Wc1t9J11DTVqlqX_8o94iKdd2ErFar8zDMONjN10CA8=.f9998458-5eb9-4e77-af2c-79d2c5e10dc6@github.com>
References: <jbHif2K7OR4Am8IbYwqbhYkV9iF_MG3sY08ZV_ll4qc=.95f93d39-e6f8-4ef4-8968-beefe909bc5d@github.com>
 <Wc1t9J11DTVqlqX_8o94iKdd2ErFar8zDMONjN10CA8=.f9998458-5eb9-4e77-af2c-79d2c5e10dc6@github.com>
Message-ID: <iP3l-x0t4OGBN8qJCfxiJcKGxXczhcK5HIJxs3oPjrM=.492daee1-3d6e-45c5-ac7d-9937e3e3fc0f@github.com>

On Wed, 3 Sep 2025 01:40:43 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> Anjian Wen has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Add assertion and modify format
>
> src/hotspot/cpu/riscv/methodHandles_riscv.cpp line 100:
> 
>> 98:   __ verify_method_ptr(method);
>> 99:   if (VerifyMethodHandles) {
>> 100:     Label L_ok;
> 
> Can you add an assertion here about the registers? Like: `assert_different_registers(method, t0, t1);`

Thanks for the review, I have added the assertion.

> src/hotspot/cpu/riscv/methodHandles_riscv.cpp line 102:
> 
>> 100:     Label L_ok;
>> 101:     const Register method_holder = t1;
>> 102:     __ load_method_holder(method_holder, method);
> 
> Please leave a new line before the switch-case structure.

done

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26938#discussion_r2317610255
PR Review Comment: https://git.openjdk.org/jdk/pull/26938#discussion_r2317610296

From fyang at openjdk.org  Wed Sep  3 02:46:43 2025
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 3 Sep 2025 02:46:43 GMT
Subject: RFR: 8366747: RISC-V: Improve VerifyMethodHandles for method
 handle linkers [v2]
In-Reply-To: <ryamqp6ust0SfR5AZITl2xskZFY-5y8qLjitIHyZNx0=.752c2e1e-951a-46ed-ab72-5bfd217ab4cb@github.com>
References: <jbHif2K7OR4Am8IbYwqbhYkV9iF_MG3sY08ZV_ll4qc=.95f93d39-e6f8-4ef4-8968-beefe909bc5d@github.com>
 <ryamqp6ust0SfR5AZITl2xskZFY-5y8qLjitIHyZNx0=.752c2e1e-951a-46ed-ab72-5bfd217ab4cb@github.com>
Message-ID: <-VWFI0Aldis7LKGbm9HRzfokwCOnrjROtPgdE4Hqogw=.d3631ecd-02a6-413c-83ef-d7f2e877c216@github.com>

On Wed, 3 Sep 2025 02:40:27 GMT, Anjian Wen <wenanjian at openjdk.org> wrote:

>> Following JDK-8353216, add extra verification logic to MethodHandle::invokeBasic/linkTo* to ensure that holder classes are properly initialized on the RISC-V platform.
>
> Anjian Wen has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add assertion and modify format

Thanks for the update.

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26938#pullrequestreview-3178764845

From galder at openjdk.org  Wed Sep  3 06:40:49 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Wed, 3 Sep 2025 06:40:49 GMT
Subject: Integrated: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I,
 MoveI2F
In-Reply-To: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com>
References: <0bYYOS5AYvN4ZD1xAGBRqV_xasw-np3JWKXC7WcGhyc=.74d97456-f406-4dbe-be09-77ed3b9a66fd@github.com>
Message-ID: <28rS8Vqoyrc09J9cdn1tWXByIZUV9GL-_Hjcn3bMLBk=.ff9e5545-9f30-42cd-b2e1-56954bbdfbf2@github.com>

On Thu, 24 Jul 2025 10:29:15 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

> I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations.
> 
> Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes. On darwin/aarch64 it shows:
> 
> 
> Benchmark                                (seed)  (size)   Mode  Cnt      Base      Patch   Units   Diff
> VectorBitConversion.doubleToLongBits          0    2048  thrpt    8  1168.782   1157.717  ops/ms    -1%
> VectorBitConversion.doubleToRawLongBits       0    2048  thrpt    8  3999.387   7353.936  ops/ms   +83%
> VectorBitConversion.floatToIntBits            0    2048  thrpt    8  1200.338   1188.206  ops/ms    -1%
> VectorBitConversion.floatToRawIntBits         0    2048  thrpt    8  4058.248  14792.474  ops/ms  +264%
> VectorBitConversion.intBitsToFloat            0    2048  thrpt    8  3050.313  14984.246  ops/ms  +391%
> VectorBitConversion.longBitsToDouble          0    2048  thrpt    8  3022.691   7379.360  ops/ms  +144%
> 
> 
> The improvements observed are a result of vectorization. `doubleToLongBits` and `floatToIntBits` do not vectorize because of control flow, and their unchanged numbers show that these changes do not affect their performance.
> 
> I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions.
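The following C++ sketch illustrates why the raw variants vectorize and the canonicalizing ones do not. `raw_bits` is a plain bit copy, like `Double.doubleToRawLongBits`. `canonical_bits` follows the documented `Double.doubleToLongBits` behavior of mapping every NaN to `0x7ff8000000000000`, which requires a per-element NaN test. The function names are illustrative only. The point is that the second loop body contains the branch that, per the description above, keeps these methods from vectorizing in C2.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Bit-for-bit reinterpretation, like Double.doubleToRawLongBits: no branches.
inline int64_t raw_bits(double d) {
  int64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}

// Like Double.doubleToLongBits: every NaN collapses to one canonical pattern.
// The NaN test is the per-element control flow.
inline int64_t canonical_bits(double d) {
  if (std::isnan(d)) {
    return INT64_C(0x7ff8000000000000);
  }
  return raw_bits(d);
}

// Loop shapes analogous to the benchmark kernels.
inline void convert_raw(const double* in, int64_t* out, std::size_t n) {
  for (std::size_t i = 0; i < n; i++) out[i] = raw_bits(in[i]);
}

inline void convert_canonical(const double* in, int64_t* out, std::size_t n) {
  for (std::size_t i = 0; i < n; i++) out[i] = canonical_bits(in[i]);
}
```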

This pull request has now been integrated.

Changeset: 8c4090c2
Author:    Galder Zamarreño <galder at openjdk.org>
Committer: Roland Westrelin <roland at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/8c4090c2cfa00f9c3550669a0726a785b30ac1d5
Stats:     67 lines in 4 files changed: 57 ins; 4 del; 6 mod

8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F

Reviewed-by: epeter, qamai

-------------

PR: https://git.openjdk.org/jdk/pull/26457

From dfenacci at openjdk.org  Wed Sep  3 06:50:26 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Wed, 3 Sep 2025 06:50:26 GMT
Subject: RFR: 8355354: C2 crashed: assert(_callee == nullptr || _callee ==
 m) failed: repeated inline attempt with different callee [v4]
In-Reply-To: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com>
References: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com>
Message-ID: <ihIxDFcOzNC4d90jJJihutTjrYZZKraIWJLKhbCB6hE=.e7b1c093-432c-4680-b26c-d88c2b34f41b@github.com>

> # Issue
> The CTW test `applications/ctw/modules/java_xml.java` crashes when trying to repeat late inlining of a virtual method (after IGVN passes through the method's call node again). The failure originates [here](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callGenerator.cpp#L473) because `_callee != m`. Apparently when running IGVN a second time after a first late inline failure and [setting the callee in the call generator](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callnode.cpp#L1240) we notice that the previous callee is not the same as the current one.
> In this specific instance it seems that the issue happens when CTW is compiling Apache Xalan.
> 
> # Cause
> The root of the issue has to do with repeated late inlining, class hierarchy analysis and dynamic class loading.
> 
> For this particular issue the two differing methods are `org.apache.xalan.xsltc.compiler.LocationPathPattern::translate` first and `org.apache.xalan.xsltc.compiler.AncestorPattern::translate` the second time. `LocationPathPattern` is an abstract class but has a concrete `translate` method. `AncestorPattern` is a concrete class that extends another abstract class `RelativePathPattern` that extends `LocationPathPattern`. `AncestorPattern` overrides the translate method.
> What seems to be happening is the following: we compile a virtual call to `RelativePathPattern::translate`, and at compile time only the abstract classes `RelativePathPattern` <: `LocationPathPattern` are loaded. CHA then concludes that the call must always dispatch to `LocationPathPattern::translate`, because the method is not overridden anywhere else. However, there is not yet any non-abstract class in the entire class hierarchy, so as soon as `AncestorPattern` is loaded, it becomes the only non-abstract class in the hierarchy, and the receiver type must therefore be `AncestorPattern`.
> 
> More generally, when late inlining is repeated and classes are loaded dynamically, the method resolved in one late inlining attempt may differ from the one resolved in the next.
> 
> # Fix
> 
> This looks like a rare edge case. If CHA is affected by class loading, the originally recorded dependency becomes invalid. This could also happen in other situations (e.g. JVMTI class redefinition). So, instead of modifying the assert (to check for invalid dependencies), we avoid re-setting the callee method if it is already set.
> 
> # T...
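The fix can be pictured as a set-once field. The first late-inline attempt records the callee, and later attempts keep it even if CHA now resolves a different method. Below is a minimal C++ model under that reading. `MethodModel`, `LateInlineGeneratorModel` and `set_callee_if_unset` are hypothetical names. The real change lives in HotSpot's call node and call generator code, not in a class like this.

```cpp
#include <cassert>

// Hypothetical stand-in for a resolved method; not HotSpot's ciMethod.
struct MethodModel {
  const char* name;
};

// Simplified model of a late-inline call generator that remembers the callee
// chosen on the first attempt and never overwrites it afterwards.
class LateInlineGeneratorModel {
 public:
  const MethodModel* callee() const { return _callee; }

  void set_callee_if_unset(const MethodModel* m) {
    if (_callee == nullptr) {
      _callee = m;  // first attempt: record the resolved method
    }
    // Later attempts leave _callee alone, even if m differs after class loading.
  }

 private:
  const MethodModel* _callee = nullptr;
};
```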

Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision:

  JDK-8355354: add stress comment

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26441/files
  - new: https://git.openjdk.org/jdk/pull/26441/files/ce807553..bf92e244

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26441&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26441&range=02-03

  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/26441.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26441/head:pull/26441

PR: https://git.openjdk.org/jdk/pull/26441

From dfenacci at openjdk.org  Wed Sep  3 06:50:27 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Wed, 3 Sep 2025 06:50:27 GMT
Subject: RFR: 8355354: C2 crashed: assert(_callee == nullptr || _callee ==
 m) failed: repeated inline attempt with different callee [v3]
In-Reply-To: <Vzn4JZ8KFUqOQYjVuTEX7mwf6wonAB4yGngh9FZarEY=.91038c73-a0f8-40ec-8c15-ccb537f73b81@github.com>
References: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com>
 <mOvOsKEuhHRI02qXAr7krpQbxky-CXFBf7MDYNhHvQM=.b01e2923-6126-4b0a-8301-932770391ffe@github.com>
 <Vzn4JZ8KFUqOQYjVuTEX7mwf6wonAB4yGngh9FZarEY=.91038c73-a0f8-40ec-8c15-ccb537f73b81@github.com>
Message-ID: <jKCCBgLaHAGK-RBtgxEifXLtbcJBbERqqotFeGrE2K8=.3ccbadf1-e974-47ae-a874-6facab9760f9@github.com>

On Tue, 2 Sep 2025 21:52:27 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   JDK-8355354: avoid resetting callee in call node ideal
>
> src/hotspot/share/opto/compile.cpp line 2117:
> 
>> 2115:         cg->call_node()->set_generator(cg);
>> 2116:         C->igvn_worklist()->push(cg->call_node());
>> 2117:         should_stress = true;
> 
> I have a guess what this stress code is doing, but a good comment would help.

Sure! Comment added.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26441#discussion_r2317935692

From rehn at openjdk.org  Wed Sep  3 06:54:29 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 3 Sep 2025 06:54:29 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v4]
In-Reply-To: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
Message-ID: <jR8vo1gVzyw2jtIlqaU2Ly1jUZzJnDHlu_RM5asgd3g=.8b58556a-a750-4131-aef7-97168578401d@github.com>

> Hey, please consider!
> 
> A bunch of info in JBS entry, please read that also.
> 
> I narrowed this issue down to the old jal optimization, making direct calls when in reach.
> This patch restores them and removes this regression.
> 
> In essence we turn "jalr ra,0(t1)" into a "jal ra,<dest>" if reachable, and restore the jalr if a new destination is not reachable.
> 
> Please test on your hardware!
> 
> 
> Chi Square (100 runs each, 10 fastest iterations of each run, P550)
> JDK-23 (last version with trampoline calls)
> Mean: 3189.5827
> Standard Deviation: 284.6478
> 
> JDK-25
> Mean: 3424.8905
> Standard Deviation: 222.2208
> 
> Patch:
> Mean: 3144.8535
> Standard Deviation: 229.2577
> 
> 
> No issues found in t1, running t2 also. Stress tested on vf2, bpi-f3, p550.
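Whether the direct `jal` form is usable comes down to range. RISC-V `jal` carries a signed 21-bit, 2-byte-aligned PC-relative offset, so it reaches about ±1 MiB. Below is a small C++ sketch of the reachability test and the standard J-type encoding. `jal_reachable`, `encode_jal` and `can_patch_to_jal` are hypothetical helpers, not the HotSpot functions touched by this patch. The offset is assumed to be measured from the patched instruction itself.

```cpp
#include <cassert>
#include <cstdint>

// jal reaches offsets in [-2^20, 2^20 - 2] that are multiples of 2.
inline bool jal_reachable(int64_t offset) {
  return offset >= -(INT64_C(1) << 20) &&
         offset <= (INT64_C(1) << 20) - 2 &&
         (offset & 1) == 0;
}

// Encode "jal rd, offset" (opcode 0b1101111) with the J-type immediate
// layout imm[20|10:1|11|19:12] in bits 31..12.
inline uint32_t encode_jal(unsigned rd, int64_t offset) {
  assert(jal_reachable(offset) && rd < 32);
  uint32_t imm = static_cast<uint32_t>(offset);
  uint32_t insn = 0b1101111u;
  insn |= (rd & 0x1fu) << 7;
  insn |= ((imm >> 12) & 0xffu) << 12;   // imm[19:12]
  insn |= ((imm >> 11) & 0x1u) << 20;    // imm[11]
  insn |= ((imm >> 1) & 0x3ffu) << 21;   // imm[10:1]
  insn |= ((imm >> 20) & 0x1u) << 31;    // imm[20]
  return insn;
}

// Decide between a direct jal and keeping the indirect jalr through the stub.
inline bool can_patch_to_jal(uint64_t insn_addr, uint64_t dest) {
  int64_t offset = static_cast<int64_t>(dest - insn_addr);
  return jal_reachable(offset);
}
```

For instance, `encode_jal(1, 0)` yields `0x000000ef` (`jal ra, 0`). A target 256 MiB away is rejected, so that call would keep using `jalr`.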

Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:

  Review comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26944/files
  - new: https://git.openjdk.org/jdk/pull/26944/files/f0f7f20e..72e3ba6a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26944&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26944&range=02-03

  Stats: 10 lines in 1 file changed: 1 ins; 0 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/26944.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26944/head:pull/26944

PR: https://git.openjdk.org/jdk/pull/26944

From mhaessig at openjdk.org  Wed Sep  3 07:17:44 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Wed, 3 Sep 2025 07:17:44 GMT
Subject: RFR: 8366461: Remove obsolete method handle invoke logic [v3]
In-Reply-To: <_pqvEs0LIlAc7RjFUwg-bpxS3D2v5U7c6In2sG8XLhQ=.57e3aead-6ac4-4a42-89d2-385d7e6ecedf@github.com>
References: <LQQer6eHAvGEV6clizLClEdOtBBIO7GCQCzibGcEzL8=.7ec9480c-c660-460d-ab5c-69d4d4a4d03d@github.com>
 <_pqvEs0LIlAc7RjFUwg-bpxS3D2v5U7c6In2sG8XLhQ=.57e3aead-6ac4-4a42-89d2-385d7e6ecedf@github.com>
Message-ID: <j17HsQMH4wjiC9yAOssV1Ivx6dOMyUw_dgT-Q0KlV-c=.b7523e4d-515f-4384-adb3-cc3c9763db5c@github.com>

On Tue, 2 Sep 2025 20:52:32 GMT, Dean Long <dlong at openjdk.org> wrote:

>> At one time, JSR292 support needed special logic to save and restore SP across method handle intrinsic calls, but that is no longer the case. The only platform that still does the save/restore is arm32, where it is no longer necessary. The save/restore can be removed along with related APIs and logic. Note that the arm32 port is largely based on the x86 port, which stopped doing the save/restore in jdk9 ([JDK-8068945](https://bugs.openjdk.org/browse/JDK-8068945)).
>
> Dean Long has updated the pull request incrementally with three additional commits since the last revision:
> 
>  - revert whitespace change
>  - undo debug changes
>  - cleanup

Thank you for cleaning this up, @dean-long. I just have a drive-by comment.

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/aarch64/AARCH64Frame.java line 372:

> 370:         // DEBUG_ONLY(verifyDeoptriginalPc(senderNm, raw_unextendedSp));
> 371:       }
> 372:     }

The `adjustUnextendedSP()` methods in `<arch>Frame.java` do not seem to do anything. Perhaps these could be cleaned up as well?

-------------

PR Review: https://git.openjdk.org/jdk/pull/27059#pullrequestreview-3179245014
PR Review Comment: https://git.openjdk.org/jdk/pull/27059#discussion_r2317990499

From duke at openjdk.org  Wed Sep  3 07:22:27 2025
From: duke at openjdk.org (erifan)
Date: Wed, 3 Sep 2025 07:22:27 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats [v2]
In-Reply-To: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
Message-ID: <jgoCbFvAF8o6IzWisNXIdm8k7rzxaMWWDGEWLnhRAKQ=.b47495d2-d806-44e2-989a-25ad589de88a@github.com>

> The sve_cpy instruction is not correctly implemented for negative floating-point values. The issues include:
> 
> 1. When a negative floating-point number (e.g. `-1.0`) is passed, the `checked_cast<int8_t>(pack(d))` check fails. For example, assume `d = -1.0`:
> - `pack(-1.0)` returns an unsigned int with bit 7 (the sign bit) set, i.e., `0xf0`.
> - `checked_cast<int8_t>(0xf0)` casts `0xf0` to an int8_t value, which is `-16`.
> - Casting this int8_t `-16` back to unsigned int results in `0xfffffff0`.
> - The check compares `0xf0` to `0xfffffff0`, which obviously fails.
> 
> 2. Additionally, the encoding of the negative floating-point number is incorrect:
> - The imm8 field can fall outside the valid range of **[-128, 127]**.
> - Bit **13** should be encoded as **0** for floating-point numbers.
> 
> This PR fixes these issues and renames floating-point `sve_cpy` as `sve_fcpy`.
> 
> Some test cases are added to aarch64-asmtest.py, and all tests passed.
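The failing check and the fix can be reproduced in a few lines of C++.

- `expand_fp_imm8` decodes the 8-bit floating-point immediate (sign `a`, exponent `NOT(b):bbbbb:cd`, fraction `efgh`) to confirm that `0xf0` is `-1.0f`.
- `narrow_roundtrips` mimics the narrow-then-widen comparison described in item 1.
- `imm8_field` shows the masking that keeps the field within 8 bits.

All three names are hypothetical and only model HotSpot's `pack` and `checked_cast`. Note that `static_cast<int8_t>` of an out-of-range value is modular in C++20, and implementation-defined before that (two's-complement on mainstream compilers).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Expand an AArch64 8-bit FP immediate abcdefgh to a single-precision float:
// sign = a, exponent = NOT(b):bbbbb:cd, fraction = efgh followed by 19 zeros.
inline float expand_fp_imm8(uint8_t imm8) {
  uint32_t a = (imm8 >> 7) & 1u;
  uint32_t b = (imm8 >> 6) & 1u;
  uint32_t cd = (imm8 >> 4) & 3u;
  uint32_t efgh = imm8 & 0xfu;
  uint32_t bits = (a << 31) | ((b ^ 1u) << 30) | ((b ? 0x1fu : 0u) << 25) |
                  (cd << 23) | (efgh << 19);
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Models the failing check: narrow to int8_t, widen back, compare.
// 0xf0 narrows to -16, which widens to 0xfffffff0, so the comparison fails.
inline bool narrow_roundtrips(unsigned packed) {
  int8_t narrowed = static_cast<int8_t>(packed);
  return static_cast<unsigned>(narrowed) == packed;
}

// Masking to the low 8 bits gives the same field whether the immediate
// arrives as 0xf0 or as the sign-extended -16.
inline uint32_t imm8_field(int imm8) {
  return static_cast<uint32_t>(imm8 & 0xff);
}
```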

erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - Don't rename sve_cpy as sve_fcpy
 - Merge branch 'master' into JDK-8365911
 - 8365911: AArch64: Fix encoding error in sve_cpy for negative floats
   
   The sve_cpy instruction is not correctly implemented for negative
   floating-point values. The issues include:
   
   1. When a negative floating-point number (e.g. `-1.0`) is passed, the
   `checked_cast<int8_t>(pack(d))` check fails. For example, assume `d = -1.0`:
   - `pack(-1.0)` returns an unsigned int with bit 7 (the sign bit) set, i.e., `0xf0`.
   - `checked_cast<int8_t>(0xf0)` casts `0xf0` to an int8_t value, which is `-16`.
   - Casting this int8_t `-16` back to unsigned int results in `0xfffffff0`.
   - The check compares `0xf0` to `0xfffffff0`, which obviously fails.
   
   2. Additionally, the encoding of the negative floating-point number is incorrect:
   - The imm8 field can fall outside the valid range of **[-128, 127]**.
   - Bit **13** should be encoded as **0** for floating-point numbers.
   
   This PR fixes these issues and renames floating-point `sve_cpy` as `sve_fcpy`.
   
   Some test cases are added to aarch64-asmtest.py, and all tests passed.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26951/files
  - new: https://git.openjdk.org/jdk/pull/26951/files/dad0e011..16a06948

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26951&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26951&range=00-01

  Stats: 16389 lines in 782 files changed: 11717 ins; 2213 del; 2459 mod
  Patch: https://git.openjdk.org/jdk/pull/26951.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26951/head:pull/26951

PR: https://git.openjdk.org/jdk/pull/26951

From duke at openjdk.org  Wed Sep  3 07:22:27 2025
From: duke at openjdk.org (erifan)
Date: Wed, 3 Sep 2025 07:22:27 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats
In-Reply-To: <-G8GwIflOhFjOL-PAG6_oylu0Fa9c8iNUB57EC6oo4s=.a0126087-2a97-4542-a555-27c12578fccf@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <KE8_jx6V7-oTcCoQKwZ0Gyzdr2qPbWETxrneZvPAxHs=.bd39c34f-972e-4ecc-bc16-eca3fcfc175d@github.com>
 <2R6O7Jhv3catwxc6rXJdh7Uiq-NFBp7beCmP49CLTqU=.7ba72e39-6efd-47fe-8ad9-6df54a45c99b@github.com>
 <-G8GwIflOhFjOL-PAG6_oylu0Fa9c8iNUB57EC6oo4s=.a0126087-2a97-4542-a555-27c12578fccf@github.com>
Message-ID: <bWiaf5dT3JLWsXpijrpnjMCTToX-daYvE5rXh0VafsI=.695b1b57-2540-4e0d-ab23-3b0ec418bb2d@github.com>

On Tue, 2 Sep 2025 08:10:02 GMT, Andrew Haley <aph at openjdk.org> wrote:

> I do. Thank you.

OK, I have reverted the refactoring. Please take another look, thanks~

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26951#issuecomment-3247994550

From rehn at openjdk.org  Wed Sep  3 07:56:43 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 3 Sep 2025 07:56:43 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v2]
In-Reply-To: <mp1Jr2Vt3uN19r4fINgunk_5JleGfm7HVpYsKSdeF5c=.4097c1ec-0dbd-4ede-8850-d4dac6a33705@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <BymtYSgSlXggycd57z3AGcv-BFeaLsawQuSXWo_93lc=.6dc39f16-8cc8-4956-8214-ba7bc9b648a3@github.com>
 <LUj3nxfsDhJ2kTdroX_W8MZCHAlgjEtQ2byk-ke_Cos=.f8503d8f-02dc-442c-b871-1a7fa735dc93@github.com>
 <GKX55pfbOY1fAST3KbnIX3a-ZSHTg704ry9wrXbMbmQ=.4ebe7c56-6339-4982-a59b-1297f4a35732@github.com>
 <mp1Jr2Vt3uN19r4fINgunk_5JleGfm7HVpYsKSdeF5c=.4097c1ec-0dbd-4ede-8850-d4dac6a33705@github.com>
Message-ID: <r6BB4foBp1qFm3dabVKnYTWHDXM7ZgiWlBnR4tN0Tdg=.d862d996-2e84-49fd-983a-29be846e23b1@github.com>

On Tue, 2 Sep 2025 12:48:11 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> From JBS entry, the point is to do it in a sane order:
>> 
>> The release in make_jal_opt is there to make sure the store to the instruction stream happens before the I-cache flush.
>> 
>> 1: store destination to stub
>> 2: release
>> 3: store destination to instruction stream
>> 4: release
>> 5: i-cache flush
>
> I don't see a detailed discussion of why there need to be two `release`s.
> It seems the `2: release` is redundant? Does a single release (step 4) after step 3 work as well?

Regarding 4:
From this code's perspective, the i-cache invalidation is a bit opaque.
We do know that we don't want the store to happen after the flush.
The RISC-V implementation does emit a full fence before the flush, as stores may be reordered over fence.i.
But AbstractICache::invalidate_range is not documented to guarantee this effect.

Regarding 2:
If someone executes the new instruction once it has been changed to jalr (3), we want them to call the new location we stored to the stub (1). By saying that 1 happens before 3, we convey our intent.
AArch64 also has this.

So neither of the releases (2, 4) is truly needed AFAICT, as this code must support calling both the old dest and the new dest.
E.g. if you are context-switched out after loading the old dest, then switched back in and execute the jalr, you will call the old dest. That is fine, as that method is marked not-entrant, which causes you to re-resolve this call, and then you will see the new dest.
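The five-step ordering discussed here can be approximated in the C++ memory model, with the caveat that instruction fetch and `fence.i` fall outside that model entirely. In this sketch, `CallSiteModel`, `patch_call`, `load_destination` and `icache_flush_placeholder` are hypothetical. Step 2 is a release fence that pairs with the reader's acquire load, so a reader that sees the new instruction word also sees the new stub destination. Step 4 has no formal effect here because the placeholder flush is not an atomic operation. This loosely mirrors the point that its guarantee depends on what the flush routine itself does.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins: the stub slot holding the call target and the
// patched instruction word at the call site.
struct CallSiteModel {
  std::atomic<uint64_t> stub_destination{0};
  std::atomic<uint32_t> insn_word{0};
};

// Placeholder for a platform I-cache flush; does nothing in this model.
inline void icache_flush_placeholder(const void*, std::size_t) {}

inline void patch_call(CallSiteModel& site, uint64_t new_dest, uint32_t new_insn) {
  site.stub_destination.store(new_dest, std::memory_order_relaxed);  // 1: store destination to stub
  std::atomic_thread_fence(std::memory_order_release);               // 2: release
  site.insn_word.store(new_insn, std::memory_order_relaxed);         // 3: store to instruction stream
  std::atomic_thread_fence(std::memory_order_release);               // 4: release
  icache_flush_placeholder(&site.insn_word, sizeof(uint32_t));       // 5: i-cache flush
}

// Reader side: the acquire load pairs with fence 2, so observing new_insn
// implies observing new_dest. Observing the old word may yield either target.
inline uint64_t load_destination(const CallSiteModel& site, uint32_t& insn_seen) {
  insn_seen = site.insn_word.load(std::memory_order_acquire);
  return site.stub_destination.load(std::memory_order_relaxed);
}
```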

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26944#discussion_r2318105086

From rehn at openjdk.org  Wed Sep  3 07:56:44 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 3 Sep 2025 07:56:44 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v2]
In-Reply-To: <VaFi5GTeRk5rXKSoSN-5bGyW60tqEv9Ub-jw8xAYpN4=.f908e30b-dadd-4794-aa24-ca5507a5ff11@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <BymtYSgSlXggycd57z3AGcv-BFeaLsawQuSXWo_93lc=.6dc39f16-8cc8-4956-8214-ba7bc9b648a3@github.com>
 <qsCdM9M_iBoyg5STxTJgwB8NGx3P5kqqDTPgncWzvW8=.05b756a4-a84a-4d8c-8641-de629692dbae@github.com>
 <JwURJGEePbyAN6WQ26BLgr6jhvCS-AXsPyMUl8jPdK8=.dd8b659a-f596-47d5-8be4-3539c107b88d@github.com>
 <r59HzSMnxiMQcRFZ5aBbq1CDJSCMJQ0iHnuwTMg4eEA=.0db27f56-cef3-410a-ba97-f56b4870ce7a@github.com>
 <VaFi5GTeRk5rXKSoSN-5bGyW60tqEv9Ub-jw8xAYpN4=.f908e30b-dadd-4794-aa24-ca5507a5ff11@github.com>
Message-ID: <ogTPA6VWJuWUgNlvfLJqXNp2WjtxrOLN-b-hPDyZQ-o=.76c5e1e2-83d2-4d63-9abb-4d118ec86fd5@github.com>

On Tue, 2 Sep 2025 12:47:23 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Maybe we can give it a new name to avoid possible confusion? `jmp_pc` or simply `pc`?
>
>> We only change the instruction at "instruction_address() + 2 * NativeInstruction::instruction_size".
> 
> Right!
> 
>> Note that jal_pos and jal_pc mean a "jump and link instruction", not specifically jal or jalr.
> 
> Since we're patching either a `jal` or a `jalr` instruction, `jal` is misleading; I agree `jmp_xxx` is a better name.

fixed

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26944#discussion_r2318105646

From dskantz at openjdk.org  Wed Sep  3 08:02:04 2025
From: dskantz at openjdk.org (Daniel Skantz)
Date: Wed, 3 Sep 2025 08:02:04 GMT
Subject: RFR: 8362117: C2:
 compiler/stringopts/TestStackedConcatsAppendUncommonTrap.java fails with a
 wrong result due to invalidated liveness assumptions for data phis [v2]
In-Reply-To: <qvfOeUiGNqYakTHBhSYMZt1unv88E27bXXswOlWixOQ=.67376c9c-fe60-4263-82ca-4dd02dedad0c@github.com>
References: <qvfOeUiGNqYakTHBhSYMZt1unv88E27bXXswOlWixOQ=.67376c9c-fe60-4263-82ca-4dd02dedad0c@github.com>
Message-ID: <11lcsXkMGpKMQr60NCKofzldqpnJka1XZtrGRrUai3o=.c2201234-bbf2-465a-b237-cd9fe8505491@github.com>

> This PR addresses a wrong compilation during string optimizations.
> 
> During stacked string concatenation of two StringBuilder links SB1 and SB2, the pattern "append -> Phi -> Region -> (True, False) -> If -> Bool -> CmpP -> Proj (Result) -> toString" may be observed, where toString is the end of SB1, and the simple diamond is part of SB2.
> 
> After JDK-8291775, the Bool test to the diamond If is set to a constant zero to allow for folding the simple diamond away during IGVN, while not letting the top() value from the result projection of SB1 propagate through the graph too quickly. The assumption was that any data Phi of the Region would go away during PhaseRemoveUseless as they are no longer live -- I think that in the case of JDK-8291775, the user of phi was the constructor of SB2. However, in the attached test case, the Phi stays live as it's a parameter (input to an append) of SB2 and will be used during the transformation in `copy_string`. When the diamond region is later folded, the Phi's user picks up the wrong input corresponding to the false branch.
> 
> The proposed solution is to disable the stacked concatenation optimization for this specific pattern. This might be pragmatic as it's an edge case and there's already a bug tail: JDK-8271341 -> JDK-8291775 -> JDK-8362117.
> 
> Testing: T1-3 (aed5952).
> 
> Extra testing: ran T1-3 on Linux with an instrumented build and verified that the pattern I am excluding in this PR is not seen during any other compilation than that of the proposed regression test.

Daniel Skantz has updated the pull request incrementally with two additional commits since the last revision:

 - store intermediate calculations
 - direction convention

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27028/files
  - new: https://git.openjdk.org/jdk/pull/27028/files/8e93056d..5638f221

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27028&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27028&range=00-01

  Stats: 8 lines in 1 file changed: 3 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/27028.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27028/head:pull/27028

PR: https://git.openjdk.org/jdk/pull/27028

From aph at openjdk.org  Wed Sep  3 08:14:45 2025
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 3 Sep 2025 08:14:45 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats [v2]
In-Reply-To: <jgoCbFvAF8o6IzWisNXIdm8k7rzxaMWWDGEWLnhRAKQ=.b47495d2-d806-44e2-989a-25ad589de88a@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <jgoCbFvAF8o6IzWisNXIdm8k7rzxaMWWDGEWLnhRAKQ=.b47495d2-d806-44e2-989a-25ad589de88a@github.com>
Message-ID: <mKHzVV0tUR5OpQA43MbZQyw-M9nRZjoqBim_VSOYQjU=.7839c2a9-ab12-4587-bad9-78bf6ea94fcc@github.com>

On Wed, 3 Sep 2025 07:22:27 GMT, erifan <duke at openjdk.org> wrote:

>> The sve_cpy instruction is not correctly implemented for negative floating-point values. The issues include:
>> 
>> 1. When a negative floating-point number (e.g. `-1.0`) is passed, the `checked_cast<int8_t>(pack(d))` check fails. For example, assume `d = -1.0`:
>> - `pack(-1.0)` returns an unsigned int with bit 7 (the sign bit) set, i.e., `0xf0`.
>> - `checked_cast<int8_t>(0xf0)` casts `0xf0` to an int8_t value, which is `-16`.
>> - Casting this int8_t `-16` back to unsigned int results in `0xfffffff0`.
>> - The check compares `0xf0` to `0xfffffff0`, which obviously fails.
>> 
>> 2. Additionally, the encoding of the negative floating-point number is incorrect:
>> - The imm8 field can fall outside the valid range of **[-128, 127]**.
>> - Bit **13** should be encoded as **0** for floating-point numbers.
>> 
>> This PR fixes these issues and renames floating-point `sve_cpy` as `sve_fcpy`.
>> 
>> Some test cases are added to aarch64-asmtest.py, and all tests passed.
>
> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Don't rename sve_cpy as sve_fcpy
>  - Merge branch 'master' into JDK-8365911
>  - 8365911: AArch64: Fix encoding error in sve_cpy for negative floats
>    
>    The sve_cpy instruction is not correctly implemented for negative
>    floating-point values. The issues include:
>    
>    1. When a negative floating-point number (e.g. `-1.0`) is passed, the
>    `checked_cast<int8_t>(pack(d))` check fails. For example, assume `d = -1.0`:
>    - `pack(-1.0)` returns an unsigned int with bit 7 (the sign bit) set, i.e., `0xf0`.
>    - `checked_cast<int8_t>(0xf0)` casts `0xf0` to an int8_t value, which is `-16`.
>    - Casting this int8_t `-16` back to unsigned int results in `0xfffffff0`.
>    - The check compares `0xf0` to `0xfffffff0`, which obviously fails.
>    
>    2. Additionally, the encoding of the negative floating-point number is incorrect:
>    - The imm8 field can fall outside the valid range of **[-128, 127]**.
>    - Bit **13** should be encoded as **0** for floating-point numbers.
>    
>    This PR fixes these issues and renames floating-point `sve_cpy` as `sve_fcpy`.
>    
>    Some test cases are added to aarch64-asmtest.py, and all tests passed.

This looks good, modulo the minor style fixes.
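The first failure mode in the description can be reproduced in isolation. The sketch below uses a hypothetical `checked_cast_ok` helper that mimics the idea of a checked narrowing cast (narrow, widen back, compare) but returns a flag instead of asserting; it is not HotSpot's `checked_cast`. It takes the packed immediate for `-1.0` to be `0xf0`, as stated in the PR description, and shows one possible alternative check on the raw byte.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a checked narrowing cast: the narrowing is
// accepted only if converting back to the source type reproduces the value.
template <typename To, typename From>
bool checked_cast_ok(From v) {
  To narrowed = static_cast<To>(v);
  return static_cast<From>(narrowed) == v;
}

// Packed 8-bit FP immediates, as given in the PR description (-1.0 -> 0xf0)
// and for +1.0 (0x70), which has the sign bit clear.
constexpr unsigned packed_minus_one = 0xf0;
constexpr unsigned packed_plus_one = 0x70;

// The failing check: narrow the *unsigned* packed value to int8_t.
// 0xf0 -> int8_t(-16) (modular since C++20, implementation-defined before,
// but -16 on mainstream compilers) -> unsigned 0xfffffff0, which != 0xf0.
bool narrowing_to_int8_ok(unsigned packed) {
  return checked_cast_ok<int8_t>(packed);
}

// One possible alternative: treat the immediate as a raw byte and only
// require that no bits above bit 7 are set.
bool fits_in_byte(unsigned packed) {
  return (packed & ~0xffu) == 0;
}
```

With these definitions, `narrowing_to_int8_ok(0xf0)` is false while `fits_in_byte(0xf0)` is true, matching the failure described in step 1; positive immediates such as `0x70` pass both checks.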

src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 3819:

> 3817:     if (isFloat) {
> 3818:       assert(T != B, "invalid size");
> 3819:       assert((imm8 >> 8) == 0, "invalid immediate");

Suggestion:

      assert((imm8 & 0xff) == 0, "invalid immediate");

To match line 3819.

src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 3831:

> 3829:     int m = isMerge ? 1 : 0;
> 3830:     f(0b00000101, 31, 24), f(T, 23, 22), f(0b01, 21, 20);
> 3831:     prf(Pg, 16), f(isFloat ? 1 : 0, 15), f(m, 14), f(sh, 13), f(imm8&0xff, 12, 5), rf(Zd, 0);

Suggestion:

    prf(Pg, 16), f(isFloat ? 1 : 0, 15), f(m, 14), f(sh, 13), f(imm8 & 0xff, 12, 5), rf(Zd, 0);

General HotSpot style.
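For readers unfamiliar with the encoder, the following sketch mirrors the field layout from the snippet above: bits 31-24, 23-22 (size `T`), 21-20, `Pg` at 19-16, then bits 15, 14 and 13, `imm8` at 12-5 and `Zd` at 4-0. The helpers `insert_field` and `encode_cpy_imm` are hypothetical stand-ins for HotSpot's `f()`/`prf()`/`rf()` emitters, and the resulting words are computed from this layout only, not checked against an assembler. The point is that masking `imm8` with `0xff` keeps a sign-extended negative value inside its 8-bit field.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: place `val` into bits [msb:lsb] of `insn`.
// Like HotSpot's f(), it requires that `val` fits in the field.
uint32_t insert_field(uint32_t insn, uint32_t val, int msb, int lsb) {
  int width = msb - lsb + 1;
  uint32_t mask = (width == 32) ? 0xffffffffu : ((1u << width) - 1);
  assert((val & ~mask) == 0 && "value does not fit in field");
  return insn | (val << lsb);
}

// Sketch of the predicated CPY/FCPY immediate layout from the review snippet.
// T is the 2-bit size field, pg the governing predicate number and zd the
// destination vector register number.
uint32_t encode_cpy_imm(uint32_t T, uint32_t pg, bool isFloat, bool isMerge,
                        int sh, int imm8, uint32_t zd) {
  uint32_t insn = 0;
  insn = insert_field(insn, 0b00000101, 31, 24);
  insn = insert_field(insn, T, 23, 22);
  insn = insert_field(insn, 0b01, 21, 20);
  insn = insert_field(insn, pg, 19, 16);
  insn = insert_field(insn, isFloat ? 1 : 0, 15, 15);
  insn = insert_field(insn, isMerge ? 1 : 0, 14, 14);
  insn = insert_field(insn, sh, 13, 13);  // must be 0 for the FP form
  // A negative imm8 such as -16 is sign-extended in an int; without the
  // mask its upper bits would not fit in the 8-bit field.
  insn = insert_field(insn, imm8 & 0xff, 12, 5);
  insn = insert_field(insn, zd, 4, 0);
  return insn;
}
```

For example, `encode_cpy_imm(0b10, 1, true, true, 0, -16, 0)` yields `0x0591DE00` under this layout, with bits 12-5 holding `0xf0` and bit 13 clear.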

-------------

PR Review: https://git.openjdk.org/jdk/pull/26951#pullrequestreview-3179466006
PR Review Comment: https://git.openjdk.org/jdk/pull/26951#discussion_r2318148242
PR Review Comment: https://git.openjdk.org/jdk/pull/26951#discussion_r2318149316

From mli at openjdk.org  Wed Sep  3 08:15:46 2025
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 3 Sep 2025 08:15:46 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v2]
In-Reply-To: <djhXJ16-orCF1zf0UrYoRUfNvPObWQNWvoZGV-AOR5s=.99883aa8-3374-4ccc-95bd-075df7a9aabf@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <BymtYSgSlXggycd57z3AGcv-BFeaLsawQuSXWo_93lc=.6dc39f16-8cc8-4956-8214-ba7bc9b648a3@github.com>
 <qsCdM9M_iBoyg5STxTJgwB8NGx3P5kqqDTPgncWzvW8=.05b756a4-a84a-4d8c-8641-de629692dbae@github.com>
 <djhXJ16-orCF1zf0UrYoRUfNvPObWQNWvoZGV-AOR5s=.99883aa8-3374-4ccc-95bd-075df7a9aabf@github.com>
Message-ID: <f2GiS68zTqQbTPaB7Ld7pscgB0mI28Ws2l4KSSpXTx0=.e56c1bc3-6fb7-43d3-a44e-d95aeb04992c@github.com>

On Tue, 2 Sep 2025 19:28:51 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

> ```
> Using trampolines:
> ##############
> --- Statistical Analysis ---
> Average (Mean):     3610.00
> Median:             3645.09
> Standard Deviation: 297.11
> --------------------------
> 
> Using load calls:
> ##############
> --- Statistical Analysis ---
> Average (Mean):     3691.09
> Median:             3793.11
> Standard Deviation: 403.80
> --------------------------
> ```

Not sure if I understood the data correctly. Does this mean the old trampolines perform better than the new implementation on the chi-square benchmark?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26944#issuecomment-3248164717

From duke at openjdk.org  Wed Sep  3 08:25:58 2025
From: duke at openjdk.org (erifan)
Date: Wed, 3 Sep 2025 08:25:58 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats [v2]
In-Reply-To: <mKHzVV0tUR5OpQA43MbZQyw-M9nRZjoqBim_VSOYQjU=.7839c2a9-ab12-4587-bad9-78bf6ea94fcc@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <jgoCbFvAF8o6IzWisNXIdm8k7rzxaMWWDGEWLnhRAKQ=.b47495d2-d806-44e2-989a-25ad589de88a@github.com>
 <mKHzVV0tUR5OpQA43MbZQyw-M9nRZjoqBim_VSOYQjU=.7839c2a9-ab12-4587-bad9-78bf6ea94fcc@github.com>
Message-ID: <R6euG0d-37EpZPYDKDgKPXzmRCNHr4xb5Ao1vgQhhpI=.00f3e651-3208-498a-96ad-9c1e7c84c0e1@github.com>

On Wed, 3 Sep 2025 08:11:27 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>> 
>>  - Don't rename sve_cpy as sve_fcpy
>>  - Merge branch 'master' into JDK-8365911
>>  - 8365911: AArch64: Fix encoding error in sve_cpy for negative floats
>>    
>>    The sve_cpy instruction is not correctly implemented for negative
>>    floating-point values. The issues include:
>>    
>>    1. When a negative floating-point number (e.g. `-1.0`) is passed, the
>>    `checked_cast<int8_t>(pack(d))` check fails. For example, assume `d = -1.0`:
>>    - `pack(-1.0)` returns an unsigned int with bit 7 (the sign bit) set, i.e., `0xf0`.
>>    - `checked_cast<int8_t>(0xf0)` casts `0xf0` to an int8_t value, which is `-16`.
>>    - Casting this int8_t `-16` back to unsigned int results in `0xfffffff0`.
>>    - The check compares `0xf0` to `0xfffffff0`, which obviously fails.
>>    
>>    2. Additionally, the encoding of the negative floating-point number is incorrect:
>>    - The imm8 field can fall outside the valid range of **[-128, 127]**.
>>    - Bit **13** should be encoded as **0** for floating-point numbers.
>>    
>>    This PR fixes these issues and renames floating-point `sve_cpy` as `sve_fcpy`.
>>    
>>    Some test cases are added to aarch64-asmtest.py, and all tests passed.
>
> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 3819:
> 
>> 3817:     if (isFloat) {
>> 3818:       assert(T != B, "invalid size");
>> 3819:       assert((imm8 >> 8) == 0, "invalid immediate");
> 
> Suggestion:
> 
>       assert((imm8 & 0xff) == 0, "invalid immediate");
> 
> To match line 3819.

This may not be the case: `imm8 >> 8` is not equivalent to `imm8 & 0xff`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26951#discussion_r2318181658

From rehn at openjdk.org  Wed Sep  3 08:28:42 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 3 Sep 2025 08:28:42 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v2]
In-Reply-To: <f2GiS68zTqQbTPaB7Ld7pscgB0mI28Ws2l4KSSpXTx0=.e56c1bc3-6fb7-43d3-a44e-d95aeb04992c@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <BymtYSgSlXggycd57z3AGcv-BFeaLsawQuSXWo_93lc=.6dc39f16-8cc8-4956-8214-ba7bc9b648a3@github.com>
 <qsCdM9M_iBoyg5STxTJgwB8NGx3P5kqqDTPgncWzvW8=.05b756a4-a84a-4d8c-8641-de629692dbae@github.com>
 <djhXJ16-orCF1zf0UrYoRUfNvPObWQNWvoZGV-AOR5s=.99883aa8-3374-4ccc-95bd-075df7a9aabf@github.com>
 <f2GiS68zTqQbTPaB7Ld7pscgB0mI28Ws2l4KSSpXTx0=.e56c1bc3-6fb7-43d3-a44e-d95aeb04992c@github.com>
Message-ID: <iybmsfgtQGIQRld2yVkBEH-_Kag-K7eUacJAIKuglPY=.583ad5e9-7964-4ba9-91da-e0543c3e162d@github.com>

On Wed, 3 Sep 2025 08:13:24 GMT, Hamlin Li <mli at openjdk.org> wrote:

> > ```
> > Using trampolines:
> > ##############
> > --- Statistical Analysis ---
> > Average (Mean):     3610.00
> > Median:             3645.09
> > Standard Deviation: 297.11
> > --------------------------
> > 
> > Using load calls:
> > ##############
> > --- Statistical Analysis ---
> > Average (Mean):     3691.09
> > Median:             3793.11
> > Standard Deviation: 403.80
> > --------------------------
> > ```
> 
> Not sure if I understood the data correctly. Does this mean the old trampolines perform better than the new implementation on the chi-square benchmark?

As the averages are within one standard deviation of each other, the difference is not statistically certain.
It does point in that direction, but as I said, without also backporting "8340241: RISC-V: Returns mispredicted", it's not so clear, to me at least.
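The "within one standard deviation" argument can be checked with a few lines of arithmetic on the figures posted above. This is only the rough heuristic used in the thread, not a significance test: the number of runs behind each summary is not stated, so no t-test or confidence interval is attempted. `Summary` and `within_one_stddev` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Summary {
  double mean;
  double stddev;
};

// Rough heuristic from the thread: treat two runs as not clearly different
// when their means differ by less than either run's standard deviation.
bool within_one_stddev(const Summary& a, const Summary& b) {
  double diff = std::fabs(a.mean - b.mean);
  return diff < a.stddev && diff < b.stddev;
}

// Figures as posted in the thread.
const Summary trampolines{3610.00, 297.11};
const Summary load_calls{3691.09, 403.80};
```

The means differ by about 81, well under either standard deviation (about 297 and 404), which is why the comparison is inconclusive on its own.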

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26944#issuecomment-3248205920

From aph at openjdk.org  Wed Sep  3 08:30:43 2025
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 3 Sep 2025 08:30:43 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats [v2]
In-Reply-To: <R6euG0d-37EpZPYDKDgKPXzmRCNHr4xb5Ao1vgQhhpI=.00f3e651-3208-498a-96ad-9c1e7c84c0e1@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <jgoCbFvAF8o6IzWisNXIdm8k7rzxaMWWDGEWLnhRAKQ=.b47495d2-d806-44e2-989a-25ad589de88a@github.com>
 <mKHzVV0tUR5OpQA43MbZQyw-M9nRZjoqBim_VSOYQjU=.7839c2a9-ab12-4587-bad9-78bf6ea94fcc@github.com>
 <R6euG0d-37EpZPYDKDgKPXzmRCNHr4xb5Ao1vgQhhpI=.00f3e651-3208-498a-96ad-9c1e7c84c0e1@github.com>
Message-ID: <t7-Y2sA9pRihFSYuMo-jkNGUhICdNLI2LLhkKqgwjQU=.310906ac-79e0-45c3-8c35-5906948d172d@github.com>

On Wed, 3 Sep 2025 08:23:32 GMT, erifan <duke at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 3819:
>> 
>>> 3817:     if (isFloat) {
>>> 3818:       assert(T != B, "invalid size");
>>> 3819:       assert((imm8 >> 8) == 0, "invalid immediate");
>> 
>> Suggestion:
>> 
>>       assert((imm8 & 0xff) == 0, "invalid immediate");
>> 
>> To match line 3819.
>
> This may not be the case: `imm8 >> 8` is not equivalent to `imm8 & 0xff`.

What is the range of values you're trying to test?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26951#discussion_r2318194266

From epeter at openjdk.org  Wed Sep  3 08:33:47 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 3 Sep 2025 08:33:47 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v2]
In-Reply-To: <Ctg8ZdOt0UVSzfJ18f-0Aj1DCLSGsy3pLQnm8SzcpUg=.9eb433cd-2e53-4a12-a24f-1c493a1a7869@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <0WKwHjzEn5dxYLkonrk4h9yfMI3r3bKDdqgG06J69N4=.e19e9441-6197-4d53-a4f4-b196a81f69d8@github.com>
 <Ctg8ZdOt0UVSzfJ18f-0Aj1DCLSGsy3pLQnm8SzcpUg=.9eb433cd-2e53-4a12-a24f-1c493a1a7869@github.com>
Message-ID: <sR_6Jh_j5oxAFBzXbSqwD2L3d74pP_-XnXM8RDWjUXA=.35937cef-adbe-418e-a0cd-3d13c4b145e5@github.com>

On Fri, 23 May 2025 04:42:08 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   review feedback
>
>>> Representing ReachabilityFence as memory barrier (e.g., MemBarCPUOrder) would solve the issue, but performance costs are prohibitively high.
> 
>> How bad is it? MemBarCPUOrder pinches all memory, so I assume this breaks a lot of optimizations when RF is sitting in the hot loop? I remember we went through a similar exercise with Blackholes: [JDK-8296545](https://bugs.openjdk.org/browse/JDK-8296545) -- and decided to pinch only the control. I'm guessing this is not enough to fix RF, or is it?
> 
> Yes, if a barrier stays inside the loop body, it breaks a lot of important optimizations. It may end up almost as bad as a full-blown call (except that a barrier can be moved around while a call can't). And moving a node that depends on both control and memory is more complicated than moving just a CFG node. Moreover, as you can see in the proposed solution, even the CFG-only representation is problematic for loop opts, so additional care is needed to ensure RFs are moved out of loops.
> 
> As an alternative approach, I thought about reifying RF as a data node (think of `CastPP`) and then linking its referent to all safepoints it dominates after loop opts are over. But that would only affect `optimize_reachability_fences()`. Everything else would stay the same. So, I decided to stay with the CFG-only representation for now.

@iwanowww Let me know whenever this is ready for review again.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25315#issuecomment-3248221299

From duke at openjdk.org  Wed Sep  3 08:36:42 2025
From: duke at openjdk.org (erifan)
Date: Wed, 3 Sep 2025 08:36:42 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats [v2]
In-Reply-To: <t7-Y2sA9pRihFSYuMo-jkNGUhICdNLI2LLhkKqgwjQU=.310906ac-79e0-45c3-8c35-5906948d172d@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <jgoCbFvAF8o6IzWisNXIdm8k7rzxaMWWDGEWLnhRAKQ=.b47495d2-d806-44e2-989a-25ad589de88a@github.com>
 <mKHzVV0tUR5OpQA43MbZQyw-M9nRZjoqBim_VSOYQjU=.7839c2a9-ab12-4587-bad9-78bf6ea94fcc@github.com>
 <R6euG0d-37EpZPYDKDgKPXzmRCNHr4xb5Ao1vgQhhpI=.00f3e651-3208-498a-96ad-9c1e7c84c0e1@github.com>
 <t7-Y2sA9pRihFSYuMo-jkNGUhICdNLI2LLhkKqgwjQU=.310906ac-79e0-45c3-8c35-5906948d172d@github.com>
Message-ID: <Wg7vR5sqiT047mST7dfsDS1SX2l5GaDvAotz3NVpzfk=.1380cd99-db62-44c8-83b5-647d1fe7aa7f@github.com>

On Wed, 3 Sep 2025 08:28:30 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> This may not be the case: `imm8 >> 8` is not equivalent to `imm8 & 0xff`.
>
> What is the range of values you're trying to test?

It's hard to say, because it is actually the value bits of an fp8.

Simply put, the lower 8 bits are valid values. The remaining bits must be 0.
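To illustrate why only the low 8 bits carry information, here is a decoder for the AArch64 8-bit floating-point immediate (bits `abcdefgh`), assuming the standard Arm expansion: sign `a`, an exponent derived from `b`, `c` and `d`, and a 4-bit fraction `efgh`. `decode_fp8_imm` is a hypothetical name, not a HotSpot function.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical decoder for the AArch64 8-bit floating-point immediate
// abcdefgh: value = (-1)^a * (1 + efgh/16) * 2^e, where
// e = cd - 3 when b == 1 (range -3..0) and e = cd + 1 when b == 0 (range 1..4).
double decode_fp8_imm(uint8_t imm8) {
  int a = (imm8 >> 7) & 1;
  int b = (imm8 >> 6) & 1;
  int cd = (imm8 >> 4) & 3;
  int efgh = imm8 & 0xf;
  int exponent = b ? cd - 3 : cd + 1;
  double magnitude = std::ldexp(1.0 + efgh / 16.0, exponent);
  return a ? -magnitude : magnitude;
}
```

Under this layout `0x70` decodes to 1.0 and `0xf0` to -1.0, consistent with the values quoted in the PR description, and representable magnitudes range from 0.125 to 31.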

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26951#discussion_r2318210500

From aph at openjdk.org  Wed Sep  3 08:53:46 2025
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 3 Sep 2025 08:53:46 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats [v2]
In-Reply-To: <Wg7vR5sqiT047mST7dfsDS1SX2l5GaDvAotz3NVpzfk=.1380cd99-db62-44c8-83b5-647d1fe7aa7f@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <jgoCbFvAF8o6IzWisNXIdm8k7rzxaMWWDGEWLnhRAKQ=.b47495d2-d806-44e2-989a-25ad589de88a@github.com>
 <mKHzVV0tUR5OpQA43MbZQyw-M9nRZjoqBim_VSOYQjU=.7839c2a9-ab12-4587-bad9-78bf6ea94fcc@github.com>
 <R6euG0d-37EpZPYDKDgKPXzmRCNHr4xb5Ao1vgQhhpI=.00f3e651-3208-498a-96ad-9c1e7c84c0e1@github.com>
 <t7-Y2sA9pRihFSYuMo-jkNGUhICdNLI2LLhkKqgwjQU=.310906ac-79e0-45c3-8c35-5906948d172d@github.com>
 <Wg7vR5sqiT047mST7dfsDS1SX2l5GaDvAotz3NVpzfk=.1380cd99-db62-44c8-83b5-647d1fe7aa7f@github.com>
Message-ID: <AKig5C5Bzl9skvDYx1W3Y_Csgx3lmcYlemp9iIq37TI=.876f127e-5077-4261-a41e-9073a7928d52@github.com>

On Wed, 3 Sep 2025 08:33:40 GMT, erifan <duke at openjdk.org> wrote:

>> What is the range of values you're trying to test?
>
> It's hard to say, because it is actually the value bits of an fp8.
> 
> Simply put, the lower 8 bits are valid values. The remaining bits must be 0.

Sorry, thinko. I meant to say

`imm8 & ~0xff`

but never mind, let it stand.
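The three candidate assertions discussed in this sub-thread behave differently, and a few `int` inputs make that visible. The sketch assumes `>>` on a negative `int` is an arithmetic shift, which is guaranteed from C++20 and is what mainstream compilers do in earlier modes. The function names are hypothetical.

```cpp
#include <cassert>

// Check as written in the patch: nothing above the low 8 bits.
bool shift_check(int imm8) { return (imm8 >> 8) == 0; }

// The intended form: clear the low 8 bits and require the rest to be 0.
bool mask_check(int imm8) { return (imm8 & ~0xff) == 0; }

// The "thinko" form: this requires the low 8 bits to be zero, so it would
// reject every non-zero immediate.
bool thinko_check(int imm8) { return (imm8 & 0xff) == 0; }
```

For every `int`, the shift and mask forms agree (both accept exactly 0..255 and reject negative values), while the `& 0xff` form rejects valid immediates such as `0xf0`.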

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26951#discussion_r2318269552

From mli at openjdk.org  Wed Sep  3 09:19:42 2025
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 3 Sep 2025 09:19:42 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v2]
In-Reply-To: <r6BB4foBp1qFm3dabVKnYTWHDXM7ZgiWlBnR4tN0Tdg=.d862d996-2e84-49fd-983a-29be846e23b1@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <BymtYSgSlXggycd57z3AGcv-BFeaLsawQuSXWo_93lc=.6dc39f16-8cc8-4956-8214-ba7bc9b648a3@github.com>
 <LUj3nxfsDhJ2kTdroX_W8MZCHAlgjEtQ2byk-ke_Cos=.f8503d8f-02dc-442c-b871-1a7fa735dc93@github.com>
 <GKX55pfbOY1fAST3KbnIX3a-ZSHTg704ry9wrXbMbmQ=.4ebe7c56-6339-4982-a59b-1297f4a35732@github.com>
 <mp1Jr2Vt3uN19r4fINgunk_5JleGfm7HVpYsKSdeF5c=.4097c1ec-0dbd-4ede-8850-d4dac6a33705@github.com>
 <r6BB4foBp1qFm3dabVKnYTWHDXM7ZgiWlBnR4tN0Tdg=.d862d996-2e84-49fd-983a-29be846e23b1@github.com>
Message-ID: <dK7zHpwKfAakWruo-sKFP5pPlMJV-ZWL0h5oX5Jag5Q=.3d028e3f-2d6b-40d0-90fd-9a0d4cd4c7f8@github.com>

On Wed, 3 Sep 2025 07:54:10 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

> But the AbstractICache::invalidate_range is not documented to guarantee to have this effect.

What does "not documented" mean here? From reading the code, it seems `AbstractICache::invalidate_range` delegates to `icache_flush` on riscv, which does the fence and the flush.

BTW, here are some comments from hotspot/share/runtime/icache.hpp,

// Default implementation is in icache.cpp, and can be hidden per-platform.
// Most platforms must provide only ICacheStubGenerator::generate_icache_flush().


> If someone executes the new instruction when changed to jalr(3), we did want them to call the new location we stored to the stub(1). By saying 1 happens before 3, we convey our intent.
> Aarch64 also have this.

Makes sense!
In the worst case, what will happen if we remove the two releases here and just rely on the `fence rw, rw` in `AbstractICache::invalidate_range`? It seems we're fine, based on your later comment.
I suppose these two extra releases bring some performance penalty? If so, I'm not sure it's worth handling such a rare condition this carefully.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26944#discussion_r2318341469

From rehn at openjdk.org  Wed Sep  3 09:49:49 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 3 Sep 2025 09:49:49 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v2]
In-Reply-To: <dK7zHpwKfAakWruo-sKFP5pPlMJV-ZWL0h5oX5Jag5Q=.3d028e3f-2d6b-40d0-90fd-9a0d4cd4c7f8@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <BymtYSgSlXggycd57z3AGcv-BFeaLsawQuSXWo_93lc=.6dc39f16-8cc8-4956-8214-ba7bc9b648a3@github.com>
 <LUj3nxfsDhJ2kTdroX_W8MZCHAlgjEtQ2byk-ke_Cos=.f8503d8f-02dc-442c-b871-1a7fa735dc93@github.com>
 <GKX55pfbOY1fAST3KbnIX3a-ZSHTg704ry9wrXbMbmQ=.4ebe7c56-6339-4982-a59b-1297f4a35732@github.com>
 <mp1Jr2Vt3uN19r4fINgunk_5JleGfm7HVpYsKSdeF5c=.4097c1ec-0dbd-4ede-8850-d4dac6a33705@github.com>
 <r6BB4foBp1qFm3dabVKnYTWHDXM7ZgiWlBnR4tN0Tdg=.d862d996-2e84-49fd-983a-29be846e23b1@github.com>
 <dK7zHpwKfAakWruo-sKFP5pPlMJV-ZWL0h5oX5Jag5Q=.3d028e3f-2d6b-40d0-90fd-9a0d4cd4c7f8@github.com>
Message-ID: <R_lZuWiCR0VbCKRNb6cIlcdJKcmUca2HdD4m-Z_lK-w=.f2351d4e-1c54-4eac-9ff6-1e43a06ecfad@github.com>

On Wed, 3 Sep 2025 09:17:14 GMT, Hamlin Li <mli at openjdk.org> wrote:

> > But the AbstractICache::invalidate_range is not documented to guarantee to have this effect.
> 
> What does "not documented" mean here? From reading the code, it seems `AbstractICache::invalidate_range` delegates to `icache_flush` on riscv, which does the fence and the flush.
> 
> BTW, here are some comments from hotspot/share/runtime/icache.hpp,
> 
> ```
> // Default implementation is in icache.cpp, and can be hidden per-platform.
> // Most platforms must provide only ICacheStubGenerator::generate_icache_flush().
> ```

Yes, and it doesn't say that this method also provides a release fence or anything like that.
In other general code we seem to need it; I can replace release(4) with a comment if you like.

> 
> > If someone executes the new instruction when changed to jalr(3), we did want them to call the new location we stored to the stub(1). By saying 1 happens before 3, we convey our intent.
> > Aarch64 also have this.
> 
> Makes sense! In the worst case, what will happen if we remove the two releases here and just rely on the `fence rw, rw` in `AbstractICache::invalidate_range`? It seems we're fine, based on your later comment. I suppose these two extra releases bring some performance penalty? If so, I'm not sure it's worth handling such a rare condition this carefully.

Yes, we should be fine, but there is no reason not to store them in the intended order.
No, there is no performance difference: this code is not executed often, and the call to invalidate_range is so slow that nothing else matters. You are talking about removing a few cycles from something that takes tens of thousands of cycles.
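The "store the stub target (1) before patching the call (3)" intent can be illustrated with portable C++ atomics. This is only an analogy: `stub_target` and `call_patched` are hypothetical stand-ins for the stub's address slot and the patched instruction, and real code patching on RISC-V also depends on the instruction-cache maintenance (`icache_flush`) discussed in this thread, which C++ atomics do not model.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

std::atomic<uintptr_t> stub_target{0};  // (1) address stored into the stub
std::atomic<bool> call_patched{false};  // (3) the "new instruction" becoming visible

void patcher(uintptr_t new_target) {
  stub_target.store(new_target, std::memory_order_relaxed);  // (1)
  // Release: a thread that observes (3) with acquire also observes (1).
  call_patched.store(true, std::memory_order_release);       // (3)
}

// Returns the target a "caller" uses once it sees the patched call.
uintptr_t caller() {
  while (!call_patched.load(std::memory_order_acquire)) {
    std::this_thread::yield();
  }
  return stub_target.load(std::memory_order_relaxed);
}

uintptr_t run_publish_demo(uintptr_t new_target) {
  stub_target.store(0);
  call_patched.store(false);
  uintptr_t seen = 0;
  std::thread reader([&] { seen = caller(); });
  std::thread writer(patcher, new_target);
  writer.join();
  reader.join();
  return seen;
}
```

Because of the release/acquire pairing, a caller that sees the patched call always reads the new target, never the stale 0.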

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26944#discussion_r2318417867

From duke at openjdk.org  Wed Sep  3 10:02:24 2025
From: duke at openjdk.org (erifan)
Date: Wed, 3 Sep 2025 10:02:24 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats [v3]
In-Reply-To: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
Message-ID: <s-iQ75fUMDx85bnb148Js3YzCsbhCVcFfy6rBUN7u54=.39d32f15-9fc9-4059-8a88-6974a4e97170@github.com>

> The sve_cpy instruction is not correctly implemented for negative floating-point values. The issues include:
> 
> 1. When a negative floating-point number (e.g. `-1.0`) is passed, the `checked_cast<int8_t>(pack(d))` check fails. For example, assume `d = -1.0`:
> - `pack(-1.0)` returns an unsigned int with bit 7 (the sign bit) set, i.e., `0xf0`.
> - `checked_cast<int8_t>(0xf0)` casts `0xf0` to an int8_t value, which is `-16`.
> - Casting this int8_t `-16` back to unsigned int results in `0xfffffff0`.
> - The check compares `0xf0` to `0xfffffff0`, which obviously fails.
> 
> 2. Additionally, the encoding of the negative floating-point number is incorrect:
> - The imm8 field can fall outside the valid range of **[-128, 127]**.
> - Bit **13** should be encoded as **0** for floating-point numbers.
> 
> This PR fixes these issues and renames floating-point `sve_cpy` as `sve_fcpy`.
> 
> Some test cases are added to aarch64-asmtest.py, and all tests passed.

erifan has updated the pull request incrementally with one additional commit since the last revision:

  Code style fixes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26951/files
  - new: https://git.openjdk.org/jdk/pull/26951/files/16a06948..66ba6570

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26951&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26951&range=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/26951.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26951/head:pull/26951

PR: https://git.openjdk.org/jdk/pull/26951

From duke at openjdk.org  Wed Sep  3 10:06:46 2025
From: duke at openjdk.org (erifan)
Date: Wed, 3 Sep 2025 10:06:46 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats [v3]
In-Reply-To: <s-iQ75fUMDx85bnb148Js3YzCsbhCVcFfy6rBUN7u54=.39d32f15-9fc9-4059-8a88-6974a4e97170@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <s-iQ75fUMDx85bnb148Js3YzCsbhCVcFfy6rBUN7u54=.39d32f15-9fc9-4059-8a88-6974a4e97170@github.com>
Message-ID: <lgK8jhIsWF620AfN-t9j3mSEmHIRgXjcipf1MBjCvAg=.9bb84430-8460-4a25-94e1-ccf81c3fb11c@github.com>

On Wed, 3 Sep 2025 10:02:24 GMT, erifan <duke at openjdk.org> wrote:

>> The sve_cpy instruction is not correctly implemented for negative floating-point values. The issues include:
>> 
>> 1. When a negative floating-point number (e.g. `-1.0`) is passed, the `checked_cast<int8_t>(pack(d))` check fails. For example, assume `d = -1.0`:
>> - `pack(-1.0)` returns an unsigned int with bit 7 (the sign bit) set, i.e., `0xf0`.
>> - `checked_cast<int8_t>(0xf0)` casts `0xf0` to an int8_t value, which is `-16`.
>> - Casting this int8_t `-16` back to unsigned int results in `0xfffffff0`.
>> - The check compares `0xf0` to `0xfffffff0`, which obviously fails.
>> 
>> 2. Additionally, the encoding of the negative floating-point number is incorrect:
>> - The imm8 field can fall outside the valid range of **[-128, 127]**.
>> - Bit **13** should be encoded as **0** for floating-point numbers.
>> 
>> This PR fixes these issues and renames floating-point `sve_cpy` as `sve_fcpy`.
>> 
>> Some test cases are added to aarch64-asmtest.py, and all tests passed.
>
> erifan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Code style fixes

Thanks @theRealAph, I have addressed your suggested changes.

-------------

PR Review: https://git.openjdk.org/jdk/pull/26951#pullrequestreview-3179900424

From duke at openjdk.org  Wed Sep  3 10:06:47 2025
From: duke at openjdk.org (erifan)
Date: Wed, 3 Sep 2025 10:06:47 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats [v2]
In-Reply-To: <AKig5C5Bzl9skvDYx1W3Y_Csgx3lmcYlemp9iIq37TI=.876f127e-5077-4261-a41e-9073a7928d52@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <jgoCbFvAF8o6IzWisNXIdm8k7rzxaMWWDGEWLnhRAKQ=.b47495d2-d806-44e2-989a-25ad589de88a@github.com>
 <mKHzVV0tUR5OpQA43MbZQyw-M9nRZjoqBim_VSOYQjU=.7839c2a9-ab12-4587-bad9-78bf6ea94fcc@github.com>
 <R6euG0d-37EpZPYDKDgKPXzmRCNHr4xb5Ao1vgQhhpI=.00f3e651-3208-498a-96ad-9c1e7c84c0e1@github.com>
 <t7-Y2sA9pRihFSYuMo-jkNGUhICdNLI2LLhkKqgwjQU=.310906ac-79e0-45c3-8c35-5906948d172d@github.com>
 <Wg7vR5sqiT047mST7dfsDS1SX2l5GaDvAotz3NVpzfk=.1380cd99-db62-44c8-83b5-647d1fe7aa7f@github.com>
 <AKig5C5Bzl9skvDYx1W3Y_Csgx3lmcYlemp9iIq37TI=.876f127e-5077-4261-a41e-9073a7928d52@github.com>
Message-ID: <CdJXnRENBqOYTi0i4FWSvNJ6w9oJthv-7WvPFgv5U74=.2427f9bc-ebf2-44ed-9499-08930366697f@github.com>

On Wed, 3 Sep 2025 08:50:57 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> It's hard to say, because it is actually the value bits of an fp8.
>> 
>> Simply put, the lower 8 bits are valid values. The remaining bits must be 0.
>
> Sorry, thinko. I meant to say
> 
> `imm8 & ~0xff`
> 
> but never mind, let it stand.

Ok, thanks!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26951#discussion_r2318451958

From duke at openjdk.org  Wed Sep  3 10:06:50 2025
From: duke at openjdk.org (erifan)
Date: Wed, 3 Sep 2025 10:06:50 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats [v2]
In-Reply-To: <mKHzVV0tUR5OpQA43MbZQyw-M9nRZjoqBim_VSOYQjU=.7839c2a9-ab12-4587-bad9-78bf6ea94fcc@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <jgoCbFvAF8o6IzWisNXIdm8k7rzxaMWWDGEWLnhRAKQ=.b47495d2-d806-44e2-989a-25ad589de88a@github.com>
 <mKHzVV0tUR5OpQA43MbZQyw-M9nRZjoqBim_VSOYQjU=.7839c2a9-ab12-4587-bad9-78bf6ea94fcc@github.com>
Message-ID: <O_0-vClL1TY2IzZrfIpF9fcPZpVi51YqjwRljveXJ2o=.e7d07128-4bf9-4031-b6a7-29814d6c209b@github.com>

On Wed, 3 Sep 2025 08:11:55 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>> 
>>  - Don't rename sve_cpy as sve_fcpy
>>  - Merge branch 'master' into JDK-8365911
>>  - 8365911: AArch64: Fix encoding error in sve_cpy for negative floats
>>    
>>    The sve_cpy instruction is not correctly implemented for negative
>>    floating-point values. The issues include:
>>    
>>    1. When a negative floating-point number (e.g. `-1.0`) is passed, the
>>    `checked_cast<int8_t>(pack(d))` check fails. For example, assume `d = -1.0`:
>>    - `pack(-1.0)` returns an unsigned int with bit 7 (the sign bit) set, i.e., `0xf0`.
>>    - `checked_cast<int8_t>(0xf0)` casts `0xf0` to an int8_t value, which is `-16`.
>>    - Casting this int8_t `-16` back to unsigned int results in `0xfffffff0`.
>>    - The check compares `0xf0` to `0xfffffff0`, which obviously fails.
>>    
>>    2. Additionally, the encoding of the negative floating-point number is incorrect:
>>    - The imm8 field can fall outside the valid range of **[-128, 127]**.
>>    - Bit **13** should be encoded as **0** for floating-point numbers.
>>    
>>    This PR fixes these issues and renames floating-point `sve_cpy` as `sve_fcpy`.
>>    
>>    Some test cases are added to aarch64-asmtest.py, and all tests passed.
>
> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 3831:
> 
>> 3829:     int m = isMerge ? 1 : 0;
>> 3830:     f(0b00000101, 31, 24), f(T, 23, 22), f(0b01, 21, 20);
>> 3831:     prf(Pg, 16), f(isFloat ? 1 : 0, 15), f(m, 14), f(sh, 13), f(imm8&0xff, 12, 5), rf(Zd, 0);
> 
> Suggestion:
> 
>     prf(Pg, 16), f(isFloat ? 1 : 0, 15), f(m, 14), f(sh, 13), f(imm8 & 0xff, 12, 5), rf(Zd, 0);
> 
> General HotSpot style.

Done, thanks.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26951#discussion_r2318454592

From duke at openjdk.org  Wed Sep  3 10:12:54 2025
From: duke at openjdk.org (erifan)
Date: Wed, 3 Sep 2025 10:12:54 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare
 [v7]
In-Reply-To: <15TW6hiffz65NhHevPefL_6swSC07UD-GwiJ4tPDtFs=.b83081df-8abd-4756-b4e0-1d969678a0d2@github.com>
References: <mWHKJozX3YTvBOjLDUa1_2QQiNm6xVc2rb1FtXcZzPc=.e2405ec4-09ed-46c1-acd5-3f39083fce3f@github.com>
 <dCstHcUFS9A79fKEf3RWnPrxvnzKjyVfbBzyT_iyzYo=.19255391-54fb-445e-b7e8-faf016e8a79f@github.com>
 <jc11aMooMRS54e6I3rd0HyobUW38VG_SbP60BoHUu48=.6ad63307-03bb-4171-bfa6-4f40741a1fc6@github.com>
 <NOSjg9nd8YCpTLPchcVXO2KxOzfTmYuxaQHqZhmHGUo=.e98cf933-0c08-4761-8210-75d56ece7542@github.com>
 <tLkj61MwZSaQEeLO3reAqAWfAMbs_hcR4wVXuUNpu5E=.197c558b-665f-4d7d-8f0c-97031a0ccf16@github.com>
 <15TW6hiffz65NhHevPefL_6swSC07UD-GwiJ4tPDtFs=.b83081df-8abd-4756-b4e0-1d969678a0d2@github.com>
Message-ID: <FFyeak7o5Plkg2ljHZD05VetZ9uI81UnZN1sc65ZqAg=.201bccb4-361c-4869-baac-d73c49f5f8d7@github.com>

On Thu, 5 Jun 2025 11:05:48 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>>> > FYI: `BoolTest::negate` already does what you want: `mask negate( ) const { return mask(_test^4); }` I think you should use that instead :)
>>> 
>>> Indeed, I hadn't noticed that, thank you.
>> 
>> Oh I think we still cannot use `BoolTest::negate`, because we cannot instantiate a `BoolTest` object with **unsigned** comparison. `BoolTest::negate` is a non-static function.
>
>> Oh I think we still cannot use `BoolTest::negate`, because we cannot instantiate a `BoolTest` object with **unsigned** comparison. `BoolTest::negate` is a non-static function.
> 
> I see. Ok. Hmm. I still think that the logic should be in `BoolTest`, because that is where the exact implementation of the enum values is. In that context it is easier to see why `^4` does the negation. And imagine we were ever to change the enum values, then it would be harder to find your code and fix it.
> 
> Maybe it could be called `BoolTest::negate_mask(mask btm)`, with a comment explaining that both signed and unsigned comparisons are supported.

Hi @eme64 @theRealAph @XiaohongGong @fg1417 @shqking, could you help take a look at this PR? Thanks
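A minimal sketch of the suggested static helper follows. The enum values are assumed to mirror `BoolTest::mask` in HotSpot's `subnode.hpp` (with `unsigned_compare` as a separate flag bit), so please check them against the source; `BoolTestSketch` and `negate_mask` are hypothetical names. Because negation only flips bit 2 (value 4), `^ 4` leaves the unsigned flag untouched and works for both signed and unsigned conditions.

```cpp
#include <cassert>

// Sketch only: values assumed to match HotSpot's BoolTest::mask.
struct BoolTestSketch {
  enum mask {
    eq = 0, gt = 1, overflow = 2, lt = 3,
    ne = 4, le = 5, no_overflow = 6, ge = 7,
    unsigned_compare = 16,
    ule = unsigned_compare | le, uge = unsigned_compare | ge,
    ult = unsigned_compare | lt, ugt = unsigned_compare | gt
  };

  // Hypothetical static helper: negating a condition flips bit 2 (value 4).
  // The unsigned_compare bit (16) is not touched, so unsigned masks stay unsigned.
  static mask negate_mask(mask btm) { return mask(btm ^ 4); }
};
```

Keeping the helper next to the enum, as suggested, makes the dependency on the `^ 4` encoding explicit if the enum values ever change.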

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-3248596662

From duke at openjdk.org  Wed Sep  3 10:14:56 2025
From: duke at openjdk.org (erifan)
Date: Wed, 3 Sep 2025 10:14:56 GMT
Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI
 expand operation
In-Reply-To: <tmEj88Ez4DURxmS7pPm8t1lhRct4fHlQcBySljEu-tg=.a18e80cf-7711-4ac6-990f-4c630b90f98b@github.com>
References: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
 <tmEj88Ez4DURxmS7pPm8t1lhRct4fHlQcBySljEu-tg=.a18e80cf-7711-4ac6-990f-4c630b90f98b@github.com>
Message-ID: <tHD8aWJ_d1GaBqE6Sw7Ip_Yt_Y2y6m6OhaKj0e1mq7U=.8b515a36-eb3f-47bc-9d1d-861b68d32c6d@github.com>

On Wed, 20 Aug 2025 11:27:59 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Currently, on AArch64, the VectorAPI `expand` operation is intrinsified for 32-bit and 64-bit types only when SVE2 is available. In the following cases, `expand` has not yet been intrinsified:
>> 1. **Subword types** on SVE2-capable hardware.
>> 2. **All types** on NEON and SVE1 environments.
>> 
>> As a result, `expand` API performance is very poor in these scenarios. This patch intrinsifies the `expand` operation in the above environments.
>> 
>> Since there are no native instructions directly corresponding to `expand` in these cases, this patch mainly leverages the `TBL` instruction to implement `expand`. To compute the index input for `TBL`, the prefix sum algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used. Take a 128-bit byte vector on SVE2 as an example:
>> 
>> To compute: dst = src.expand(mask)
>> Data direction: high <== low
>> Input:
>>   src                         = p o n m l k j i h g f e d c b a
>>   mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>> Expected result:
>>   dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>> 
>> Step 1: calculate the index input of the TBL instruction.
>> 
>> // Set tmp1 as all 0 vector.
>> tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 
>> // Move the mask bits from the predicate register to a vector register.
>> // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
>> tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>> 
>> // Shift the entire register. Prefix sum algorithm.
>> dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
>> tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
>> 
>> dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
>> tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
>> 
>> dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
>> tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
>> 
>> dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
>> tmp2 += dst                   = 8 8 8 7 6 6 6 5 4 4 4 3 2 2 2 1
>> 
>> // Clear inactive elements.
>> dst = sel(mask, tmp2, tmp1)   = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1
>> 
>> // Set the inactive lane value to -1 and set the active lane to the target index.
>> dst -= 1                      = -1 -1 7 6 -1 -1 5 4 -1 -1 3 2 -1 -1 1 0
>> 
>> Step 2: shuffle the source vector elements to the target vector
>> 
>> tbl(dst, src, dst)            = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>> 
>> 
>> The same algorithm is used for NEON and...
>
> The algorithm description here is great. Please paste all of it from "Since there are" to "but with different instructions where appropriate." into this PR, before the vector expand implementation.
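The prefix-sum steps above can be modelled in scalar Java as a rough sketch. This is an illustration under stated assumptions, not the HotSpot implementation. The class `ExpandSketch` and its `expand` method are hypothetical. Lane 0 is taken to be the lowest lane (the rightmost column in the diagram), only byte lanes are modelled, and TBL is treated as returning 0 for any out-of-range index (the `-1` lanes read as 0xff).

```java
public class ExpandSketch {
    // Hypothetical scalar model of the byte-lane expand sequence described above.
    // Lane 0 is the lowest lane; src.length is assumed to be a power of two (e.g. 16).
    static byte[] expand(byte[] src, boolean[] mask) {
        int n = src.length;
        int[] tmp2 = new int[n];
        for (int i = 0; i < n; i++) {
            tmp2[i] = mask[i] ? 1 : 0;          // 1-bit predicate lane -> 8-bit vector lane
        }
        // Inclusive prefix sum: shift the whole register by s lanes toward the high end, then add.
        // For 16 lanes, s = 1, 2, 4, 8 matches the 8-, 16-, 32- and 64-bit shifts in the example.
        for (int s = 1; s < n; s <<= 1) {
            int[] shifted = new int[n];         // low s lanes are filled with 0
            for (int i = s; i < n; i++) {
                shifted[i] = tmp2[i - s];
            }
            for (int i = 0; i < n; i++) {
                tmp2[i] += shifted[i];
            }
        }
        // sel(mask, tmp2, 0) followed by "-= 1": active lanes get their source index, inactive lanes get -1.
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = (mask[i] ? tmp2[i] : 0) - 1;
        }
        // TBL model: an out-of-range index (-1 is 0xff as an unsigned byte) produces 0.
        byte[] dst = new byte[n];
        for (int i = 0; i < n; i++) {
            int j = idx[i] & 0xff;
            dst[i] = j < n ? src[j] : 0;
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] src = new byte[16];
        boolean[] mask = new boolean[16];
        for (int i = 0; i < 16; i++) {
            src[i] = (byte) ('a' + i);          // lane 0 = 'a', lane 15 = 'p'
            mask[i] = (i % 4) < 2;              // the "... 0 0 1 1" pattern from the example
        }
        byte[] dst = expand(src, mask);
        StringBuilder sb = new StringBuilder();
        for (int i = 15; i >= 0; i--) {         // print high <== low, like the diagram
            sb.append(dst[i] == 0 ? "0" : String.valueOf((char) dst[i]));
            if (i > 0) {
                sb.append(' ');
            }
        }
        System.out.println(sb);                 // 0 0 h g 0 0 f e 0 0 d c 0 0 b a
    }
}
```

The printed line matches the expected `dst` above. A real implementation does each round with one register shift and one vector add instead of scalar loops.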

@theRealAph @e1iu @XiaohongGong @fg1417 @shqking, could you help take a look at this PR, thanks~

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26740#issuecomment-3248603705

From aph at openjdk.org  Wed Sep  3 10:16:44 2025
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 3 Sep 2025 10:16:44 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats [v3]
In-Reply-To: <s-iQ75fUMDx85bnb148Js3YzCsbhCVcFfy6rBUN7u54=.39d32f15-9fc9-4059-8a88-6974a4e97170@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <s-iQ75fUMDx85bnb148Js3YzCsbhCVcFfy6rBUN7u54=.39d32f15-9fc9-4059-8a88-6974a4e97170@github.com>
Message-ID: <Pyn6_YsVLxnpGgcNXyVDWVprS1bFAHlF3_uyHrZDjsU=.7dba62c6-c4ef-4b1f-93d7-8836e965ee06@github.com>

On Wed, 3 Sep 2025 10:02:24 GMT, erifan <duke at openjdk.org> wrote:

>> The `sve_cpy` instruction is not correctly implemented for negative floating-point values. The issues include:
>> 
>> 1. When a negative floating-point number (e.g. `-1.0`) is passed, the `checked_cast<int8_t>(pack(d))` check fails. For example, assume `d = -1.0`:
>> - `pack(-1.0)` returns an unsigned int with the 7th bit set, i.e., `0xf0`.
>> - `checked_cast<int8_t>(0xf0)` casts `0xf0` to an int8_t value, which is `-16`.
>> - Casting this int8_t `-16` back to unsigned int results in `0xfffffff0`.
>> - The check compares `0xf0` to `0xfffffff0`, which obviously fails.
>> 
>> 2. Additionally, the encoding of the negative floating-point number is incorrect:
>> - The imm8 field can fall outside the valid range of **[-128, 127]**.
>> - Bit **13** should be encoded as **0** for floating-point numbers.
>> 
>> This PR fixes these issues and renames floating-point `sve_cpy` as `sve_fcpy`.
>> 
>> Some test cases are added to aarch64-asmtest.py, and all tests passed.
>
> erifan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Code style fixes
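The round-trip failure in item 1 can be reproduced with plain Java integer casts, which narrow and sign-extend the same way. This is a hedged sketch, not HotSpot code: the class and method names are hypothetical, `0xf0` is taken from the description as the packed value of `-1.0`, and `0x70` is the AArch64 8-bit floating-point immediate for `+1.0`. The `fitsInUnsignedByte` check is only one possible alternative validation, not necessarily what the PR does.

```java
public class CpyImmSketch {
    // Models checked_cast<int8_t>(packed) followed by the comparison against the original value.
    static boolean roundTripsAsInt8(int packed) {
        byte narrowed = (byte) packed;   // 0xf0 narrows to -16
        int widened = narrowed;          // sign-extends back to 0xfffffff0
        return widened == packed;        // 0xfffffff0 != 0x000000f0 for negative floats
    }

    // Illustrative alternative: accept any value whose bits fit in an unsigned 8-bit field.
    static boolean fitsInUnsignedByte(int packed) {
        return (packed & ~0xff) == 0;
    }

    public static void main(String[] args) {
        System.out.println(roundTripsAsInt8(0xf0));    // false: packed -1.0 fails the check
        System.out.println(roundTripsAsInt8(0x70));    // true:  packed +1.0 has bit 7 clear
        System.out.println(fitsInUnsignedByte(0xf0));  // true
    }
}
```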

Good.

-------------

Marked as reviewed by aph (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26951#pullrequestreview-3179955144

From epeter at openjdk.org  Wed Sep  3 12:39:44 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 3 Sep 2025 12:39:44 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats
In-Reply-To: <bWiaf5dT3JLWsXpijrpnjMCTToX-daYvE5rXh0VafsI=.695b1b57-2540-4e0d-ab23-3b0ec418bb2d@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <KE8_jx6V7-oTcCoQKwZ0Gyzdr2qPbWETxrneZvPAxHs=.bd39c34f-972e-4ecc-bc16-eca3fcfc175d@github.com>
 <2R6O7Jhv3catwxc6rXJdh7Uiq-NFBp7beCmP49CLTqU=.7ba72e39-6efd-47fe-8ad9-6df54a45c99b@github.com>
 <-G8GwIflOhFjOL-PAG6_oylu0Fa9c8iNUB57EC6oo4s=.a0126087-2a97-4542-a555-27c12578fccf@github.com>
 <bWiaf5dT3JLWsXpijrpnjMCTToX-daYvE5rXh0VafsI=.695b1b57-2540-4e0d-ab23-3b0ec418bb2d@github.com>
Message-ID: <HrtNCvbVFwcmnXfnhiky9VSHP7VDDRE4jkF49VKPFaQ=.2ff5166f-ff99-4c2a-b828-99dc4d5edc30@github.com>

On Wed, 3 Sep 2025 07:19:06 GMT, erifan <duke at openjdk.org> wrote:

>>> 1. sve `cpy` and `fcpy` are actually two different instructions, and distinguishing them might be clearer.
>> 
>> That's a fair point, but the AArch64 name for all four instructions is CPY, and they are distinguished by their operands. Deviation from the names in the Reference Manual is occasionally necessary, but it makes life painful for maintainers when they have to search for what we've called an instruction they want to use.
>>  
>>>     2. sve `cpy` 's imm8 is an **int** , while `fcpy` 's imm8 is an **fp8** .
>> 
>> Yes, that's right.
>> 
>>> While some encoding code can be reused, separating the encodings makes the code clearer.
>> 
>> I don't agree that it makes the code clearer. In fact, tight factoring emphasizes the fact that these instructions are similar, and explicitly shows where they are different.
>> 
>> It is true that I have a strong bias against copy-and-paste programming.
>> 
>>> I think both implementations are fine. If you think it's better to not refactor, I'll revert.
>> 
>> I do. Thank you.
>
>> I do. Thank you.
> 
> Ok, I have reverted the refactoring. Please help take another look, thanks~

@erifan I'm running some internal testing - though we don't have SVE machines so you are responsible to make sure it is adequately tested for that ;)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26951#issuecomment-3249077329

From epeter at openjdk.org  Wed Sep  3 12:47:46 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 3 Sep 2025 12:47:46 GMT
Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI
 expand operation [v2]
In-Reply-To: <_YDJIkwt0sdsOAMfNNn1fHTVwH0SHDpJv5NpQoxnfiA=.a0ddb5f3-00f1-47e2-93da-f47cb3f62288@github.com>
References: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
 <_YDJIkwt0sdsOAMfNNn1fHTVwH0SHDpJv5NpQoxnfiA=.a0ddb5f3-00f1-47e2-93da-f47cb3f62288@github.com>
Message-ID: <ipmog7dqfmJW0_2DUXcfFgIeMd_X9QF-3aQN4fqNOj0=.ad46150d-9b51-477e-8c30-2c9db56cff6f@github.com>

On Thu, 21 Aug 2025 07:00:35 GMT, erifan <duke at openjdk.org> wrote:

>> Currently, on AArch64, the VectorAPI `expand` operation is intrinsified for 32-bit and 64-bit types only when SVE2 is available. In the following cases, `expand` has not yet been intrinsified:
>> 1. **Subword types** on SVE2-capable hardware.
>> 2. **All types** on NEON and SVE1 environments.
>> 
>> As a result, `expand` API performance is very poor in these scenarios. This patch intrinsifies the `expand` operation in the above environments.
>> 
>> Since there are no native instructions directly corresponding to `expand` in these cases, this patch mainly leverages the `TBL` instruction to implement `expand`. To compute the index input for `TBL`, the prefix sum algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used. Take a 128-bit byte vector on SVE2 as an example:
>> 
>> To compute: dst = src.expand(mask)
>> Data direction: high <== low
>> Input:
>>   src                         = p o n m l k j i h g f e d c b a
>>   mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>> Expected result:
>>   dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>> 
>> Step 1: calculate the index input of the TBL instruction.
>> 
>> // Set tmp1 as all 0 vector.
>> tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 
>> // Move the mask bits from the predicate register to a vector register.
>> // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
>> tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>> 
>> // Shift the entire register. Prefix sum algorithm.
>> dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
>> tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
>> 
>> dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
>> tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
>> 
>> dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
>> tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
>> 
>> dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
>> tmp2 += dst                   = 8 8 8 7 6 6 6 5 4 4 4 3 2 2 2 1
>> 
>> // Clear inactive elements.
>> dst = sel(mask, tmp2, tmp1)   = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1
>> 
>> // Set the inactive lane value to -1 and set the active lane to the target index.
>> dst -= 1                      = -1 -1 7 6 -1 -1 5 4 -1 -1 3 2 -1 -1 1 0
>> 
>> Step 2: shuffle the source vector elements to the target vector
>> 
>> tbl(dst, src, dst)            = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>> 
>> 
>> The same algorithm is used for NEON and...
>
> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Improve the comment of the vector expand implementation
>  - Merge branch 'master' into JDK-8363989
>  - 8363989: AArch64: Add missing backend support of VectorAPI expand operation
>    
>    Currently, on AArch64, the VectorAPI `expand` operation is intrinsified
>    for 32-bit and 64-bit types only when SVE2 is available. In the following
>    cases, `expand` has not yet been intrinsified:
>    1. **Subword types** on SVE2-capable hardware.
>    2. **All types** on NEON and SVE1 environments.
>    
>    As a result, `expand` API performance is very poor in these scenarios.
>    This patch intrinsifies the `expand` operation in the above environments.
>    
>    Since there are no native instructions directly corresponding to `expand`
>    in these cases, this patch mainly leverages the `TBL` instruction to
>    implement `expand`. To compute the index input for `TBL`, the prefix sum
>    algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used.
>    Take a 128-bit byte vector on SVE2 as an example:
>    ```
>    To compute: dst = src.expand(mask)
>    Data direction: high <== low
>    Input:
>      src                         = p o n m l k j i h g f e d c b a
>      mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>    Expected result:
>      dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>    ```
>    Step 1: calculate the index input of the TBL instruction.
>    ```
>    // Set tmp1 as all 0 vector.
>    tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>    
>    // Move the mask bits from the predicate register to a vector register.
>    // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
>    tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>    
>    // Shift the entire register. Prefix sum algorithm.
>    dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
>    tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
>    
>    dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
>    tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
>    
>    dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
>    tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
>    
>    dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
>    tmp2 += ...

test/hotspot/jtreg/compiler/vectorapi/VectorExpandTest.java line 48:

> 46:     static final VectorSpecies<Float> F_SPECIES = FloatVector.SPECIES_MAX;
> 47:     static final VectorSpecies<Long> L_SPECIES = LongVector.SPECIES_MAX;
> 48:     static final VectorSpecies<Double> D_SPECIES = DoubleVector.SPECIES_MAX;

Would it make sense to run these tests with various vector sizes?
It seems your algorithm depends on `vector_length_in_bytes` in the prefix sum step.
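To make the dependence on vector length concrete, here is a small sketch that lists the whole-register shift amounts the prefix sum would need for byte lanes. It assumes, without confirmation from the patch, that the 16-byte scheme in the description (shifts of 8, 16, 32 and 64 bits) generalizes by doubling until the shift reaches half the register. `PrefixSumSteps` and `shiftAmountsInBits` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixSumSteps {
    // Byte lanes only: each step shifts the register by s lanes (s * 8 bits), s = 1, 2, 4, ...
    static List<Integer> shiftAmountsInBits(int vectorLengthInBytes) {
        List<Integer> shifts = new ArrayList<>();
        for (int s = 1; s < vectorLengthInBytes; s <<= 1) {
            shifts.add(s * 8);
        }
        return shifts;
    }

    public static void main(String[] args) {
        for (int vlen : new int[] {8, 16, 32, 64, 256}) {
            System.out.println(vlen + " bytes -> " + shiftAmountsInBits(vlen));
        }
    }
}
```

For 16 bytes this yields `[8, 16, 32, 64]`, matching the example. Each vector length produces a different instruction sequence, so a test that only uses `SPECIES_MAX` exercises just one of them on a given machine.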

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26740#discussion_r2318862195

From epeter at openjdk.org  Wed Sep  3 12:52:49 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 3 Sep 2025 12:52:49 GMT
Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI
 expand operation [v2]
In-Reply-To: <_YDJIkwt0sdsOAMfNNn1fHTVwH0SHDpJv5NpQoxnfiA=.a0ddb5f3-00f1-47e2-93da-f47cb3f62288@github.com>
References: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
 <_YDJIkwt0sdsOAMfNNn1fHTVwH0SHDpJv5NpQoxnfiA=.a0ddb5f3-00f1-47e2-93da-f47cb3f62288@github.com>
Message-ID: <kOz6uh-zcSXfhMf5UioaIXmQ3vbU18UMgncgOslfyv0=.c6d6ef2c-8b9e-48c9-b78d-df96e07d7832@github.com>

On Thu, 21 Aug 2025 07:00:35 GMT, erifan <duke at openjdk.org> wrote:

>> Currently, on AArch64, the VectorAPI `expand` operation is intrinsified for 32-bit and 64-bit types only when SVE2 is available. In the following cases, `expand` has not yet been intrinsified:
>> 1. **Subword types** on SVE2-capable hardware.
>> 2. **All types** on NEON and SVE1 environments.
>> 
>> As a result, `expand` API performance is very poor in these scenarios. This patch intrinsifies the `expand` operation in the above environments.
>> 
>> Since there are no native instructions directly corresponding to `expand` in these cases, this patch mainly leverages the `TBL` instruction to implement `expand`. To compute the index input for `TBL`, the prefix sum algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used. Take a 128-bit byte vector on SVE2 as an example:
>> 
>> To compute: dst = src.expand(mask)
>> Data direction: high <== low
>> Input:
>>   src                         = p o n m l k j i h g f e d c b a
>>   mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>> Expected result:
>>   dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>> 
>> Step 1: calculate the index input of the TBL instruction.
>> 
>> // Set tmp1 as all 0 vector.
>> tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 
>> // Move the mask bits from the predicate register to a vector register.
>> // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
>> tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>> 
>> // Shift the entire register. Prefix sum algorithm.
>> dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
>> tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
>> 
>> dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
>> tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
>> 
>> dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
>> tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
>> 
>> dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
>> tmp2 += dst                   = 8 8 8 7 6 6 6 5 4 4 4 3 2 2 2 1
>> 
>> // Clear inactive elements.
>> dst = sel(mask, tmp2, tmp1)   = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1
>> 
>> // Set the inactive lane value to -1 and set the active lane to the target index.
>> dst -= 1                      = -1 -1 7 6 -1 -1 5 4 -1 -1 3 2 -1 -1 1 0
>> 
>> Step 2: shuffle the source vector elements to the target vector
>> 
>> tbl(dst, src, dst)            = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>> 
>> 
>> The same algorithm is used for NEON and...
>
> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Improve the comment of the vector expand implementation
>  - Merge branch 'master' into JDK-8363989
>  - 8363989: AArch64: Add missing backend support of VectorAPI expand operation
>    
>    Currently, on AArch64, the VectorAPI `expand` operation is intrinsified
>    for 32-bit and 64-bit types only when SVE2 is available. In the following
>    cases, `expand` has not yet been intrinsified:
>    1. **Subword types** on SVE2-capable hardware.
>    2. **All types** on NEON and SVE1 environments.
>    
>    As a result, `expand` API performance is very poor in these scenarios.
>    This patch intrinsifies the `expand` operation in the above environments.
>    
>    Since there are no native instructions directly corresponding to `expand`
>    in these cases, this patch mainly leverages the `TBL` instruction to
>    implement `expand`. To compute the index input for `TBL`, the prefix sum
>    algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used.
>    Take a 128-bit byte vector on SVE2 as an example:
>    ```
>    To compute: dst = src.expand(mask)
>    Data direction: high <== low
>    Input:
>      src                         = p o n m l k j i h g f e d c b a
>      mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>    Expected result:
>      dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>    ```
>    Step 1: calculate the index input of the TBL instruction.
>    ```
>    // Set tmp1 as all 0 vector.
>    tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>    
>    // Move the mask bits from the predicate register to a vector register.
>    // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
>    tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>    
>    // Shift the entire register. Prefix sum algorithm.
>    dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
>    tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
>    
>    dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
>    tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
>    
>    dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
>    tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
>    
>    dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
>    tmp2 += ...

Looks like a nice improvement!

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2819:

> 2817:   subv(dst, size, tmp2, tmp1);
> 2818:   // dst = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1
> 2819:   tbl(dst, size, src, 1, dst);

It would make it a little easier to read the example if the numbers were aligned.
Now the minus sign disrupts that a little. Maybe leave 2 spaces if the number is positive?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26740#issuecomment-3249121442
PR Review Comment: https://git.openjdk.org/jdk/pull/26740#discussion_r2318874112

From hgreule at openjdk.org  Wed Sep  3 15:22:51 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Wed, 3 Sep 2025 15:22:51 GMT
Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v7]
In-Reply-To: <UVFPHEq4IAzIGZV0N9oMApm-KTc8fWWJxFlL5nVZfxc=.a31180a8-c8a5-4036-a99e-fa4fa38e6a08@github.com>
References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
 <UVFPHEq4IAzIGZV0N9oMApm-KTc8fWWJxFlL5nVZfxc=.a31180a8-c8a5-4036-a99e-fa4fa38e6a08@github.com>
Message-ID: <qvXeqU-AMI1hIL6NtQ92h-Z24x41RIkGG67wcZP6m-8=.df359e17-9726-4cd2-ae95-874099f65b76@github.com>

On Tue, 26 Aug 2025 12:46:31 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> This change improves the precision of the `Mod(I|L)Node::Value()` functions.
>> 
>> I reordered the structure a bit: first we handle constants, then ranges. The bottom checks seem to be redundant (`Type::BOTTOM` is covered by using `isa_(int|long)()`, and the local bottom is just the full range). Since we can give reasonable bounds even if only one input is bounded, we don't want to return early.
>> The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions.
>> 
>> ### Monotonicity
>> 
>> Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. I therefore use `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range).
>> 
>> ### Testing
>> 
>> I added tests for cases around the relevant bounds. I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something).
>> 
>> Please review and let me know what you think.
>> 
>> ### Other
>> 
>> The `UMod(I|L)Node`s were adjusted to be more in line with their signed variants. This change diverges them again, but similar improvements could be made after #17508.
>> 
>> While experimenting with these changes, I stumbled upon a few things that aren't directly related to this change, but might be worth looking into further:
>> - If the divisor is a constant, we will directly replace the `Mod(I|L)Node` with more, but cheaper, nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, which means we miss potential cases where this would help, e.g., removing range checks. Would it make sense to delay the replacement?
>> - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd.
>
> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
> 
>   review
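The monotonicity argument can be checked with a simplified interval model. This sketch is not the C2 code: `ModRangeSketch`, `Range` and `modRange` are hypothetical, the bounds are a sound but simplified version of the idea (the magnitude of `x % d` is at most `|d| - 1` and at most `|x|`, and the sign follows the dividend, as in Java), and "meet" is modelled as the smallest interval containing both inputs. During CCP a node's type may only widen, so the new type must contain the old one.

```java
public class ModRangeSketch {
    record Range(long lo, long hi) {
        // Meet in the integer lattice, modelled as the smallest interval containing both.
        Range meet(Range o) { return new Range(Math.min(lo, o.lo), Math.max(hi, o.hi)); }
        boolean contains(Range o) { return lo <= o.lo && o.hi <= hi; }
    }

    // Simplified range of x % d for x in [xlo, xhi] and a constant int divisor d.
    static Range modRange(long xlo, long xhi, long d) {
        if (d == 0) {
            return new Range(0, 0);                      // ZERO, as in the updated patch
        }
        long m = Math.abs(d) - 1;                        // |x % d| <= |d| - 1
        long lo = xlo >= 0 ? 0 : Math.max(-m, xlo);      // result is never below the dividend
        long hi = xhi <= 0 ? 0 : Math.min(m, xhi);       // result is never above the dividend
        return new Range(lo, hi);
    }

    public static void main(String[] args) {
        long lo = Integer.MIN_VALUE, hi = Integer.MAX_VALUE;
        Range afterThree = modRange(lo, hi, 3);          // [-2, 2]
        Range oldZero = new Range(0, Integer.MAX_VALUE); // old POS result for a 0 divisor
        Range newZero = modRange(lo, hi, 0);             // [0, 0]
        System.out.println(afterThree.contains(oldZero)); // false: >=0 -> -2..2 is not monotone
        System.out.println(afterThree.contains(newZero)); // true:  0 -> -2..2 is fine
        System.out.println(oldZero.meet(afterThree));     // Range[lo=-2, hi=2147483647]
    }
}
```

The last line reproduces the `>=-2` meet mentioned above, which is why returning ZERO for a 0 divisor keeps PhaseCCP monotonic in this model.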

I also filed https://bugs.openjdk.org/browse/JDK-8366815 now regarding the early transformation of div/mod by constants.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25254#issuecomment-3249705608

From fgao at openjdk.org  Wed Sep  3 16:55:45 2025
From: fgao at openjdk.org (Fei Gao)
Date: Wed, 3 Sep 2025 16:55:45 GMT
Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some
 small trip counts [v2]
In-Reply-To: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com>
References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com>
Message-ID: <ezYpgocDEtRwma9bmy95F6Ia6Tl7T05HjQwb1RVuzhg=.b465eae1-403b-4924-8ab7-cb39dd4e4b7c@github.com>

> In C2's loop optimization, for a counted loop, if we have any of these conditions (RCE, unrolling) met, we switch to the
> `pre-main-post-loop` model. Then a counted loop could be split into `pre-main-post` loops. Meanwhile, C2 inserts minimum trip guards (a.k.a. zero-trip guards) before the main loop and the post loop. These guards test if the remaining trip count is less than the loop stride (after unrolling). If yes, the execution jumps over the loop code to avoid loop over-running. For example, if a main loop is unrolled to `8x`, the main loop guard tests if the loop has less than `8` iterations and then decides which way to go.
> 
> Usually, the vectorized main loop will be super-unrolled after vectorization. In such cases, the main loop's stride is going to be further multiplied. After the main loop is super-unrolled, the minimum trip guard test will be updated. Assuming one vector covers `8` iterations and the super-unrolling count is `4`, the trip guard of the main loop will test if the remaining trip count is less than `8 * 4 = 32`.
> 
> To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The newly inserted post loop also has a minimum trip guard. And, both trip guards of the main loop and the vectorized drain loop jump to the scalar post loop.
> 
> The problem here is that, if the remaining trip count when exiting from the pre-loop is relatively small but larger than the vector length, the vectorized drain loop will never be executed. Because the minimum trip guard test of the main loop fails, execution jumps over both the main loop and the vectorized drain loop. For example, in the case above, if a loop still has `25` iterations after the pre-loop, we could run `3` rounds of the vectorized drain loop, but this never happens. It would be better if the minimum trip guard test of the main loop did not jump over the vectorized drain loop.
> 
> This patch is to improve it by modifying the control flow when the minimum trip guard test of the main loop fails. Obviously, we need to sync all data uses and control uses to adjust to the change of control flow.
> 
> The whole process is done by the function `insert_post_loop()`.
> 
> We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`. When we're cloning the vector main loop to vectorized drain loop with mode `InsertVectorizedDrain`:
> 
> 1. The fall-in control flow to the vectorized drain loop comes from a `RegionNode` merging exits ...
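The effect of redirecting the failing main-loop guard can be shown with a small iteration-count model. This is a hedged sketch of the control flow described above, not C2 code: `DrainLoopSketch` and `distribute` are hypothetical names. The model treats the main loop as running while at least `vectorLanes * superUnroll` iterations remain, and the drain loop as running while at least `vectorLanes` remain.

```java
import java.util.Arrays;

public class DrainLoopSketch {
    // Returns {mainIters, drainIters, scalarPostIters} for the iterations left after the pre-loop.
    // skipDrainOnMainGuardFail = true models the old control flow, false the patched one.
    static int[] distribute(int remaining, int vectorLanes, int superUnroll,
                            boolean skipDrainOnMainGuardFail) {
        int mainStride = vectorLanes * superUnroll;       // stride of the super-unrolled main loop
        int mainIters = 0;
        int drainIters = 0;
        boolean mainGuardPasses = remaining >= mainStride;
        if (mainGuardPasses) {
            mainIters = remaining / mainStride;
            remaining -= mainIters * mainStride;
        }
        // The drain loop is reached after the main loop or, with the patch, when the main guard fails.
        boolean drainReached = mainGuardPasses || !skipDrainOnMainGuardFail;
        if (drainReached && remaining >= vectorLanes) {   // the drain loop's own minimum trip guard
            drainIters = remaining / vectorLanes;
            remaining -= drainIters * vectorLanes;
        }
        return new int[] {mainIters, drainIters, remaining};
    }

    public static void main(String[] args) {
        // Example from the description: 8 lanes per vector, super-unrolled 4x, 25 iterations left.
        System.out.println(Arrays.toString(distribute(25, 8, 4, true)));   // [0, 0, 25]
        System.out.println(Arrays.toString(distribute(25, 8, 4, false)));  // [0, 3, 1]
    }
}
```

With the old control flow all 25 iterations fall to the scalar post loop; with the patched flow, 3 drain iterations cover 24 of them and only 1 remains for the scalar post loop.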

Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits:

 - Merge branch 'master' into optimize-atomic-post
 - Clean up comments for consistency and add spacing for readability
 - Fix some corner case failures and refined part of code
 - Merge branch 'master' into optimize-atomic-post
 - Refine ascii art, rename some variables and resolve conflicts
 - Merge branch 'master' into optimize-atomic-post
 - Add necessary ASCII art, refactor insert_post_loop() and rename
   "atomic post loop" to "vectorized drain loop".
 - Merge branch 'master' into optimize-atomic-post
 - 8307084: C2: Vector atomic post loop is not executed for some small trip counts
   
   In C2's loop optimization, for a counted loop, if we have any of
   these conditions (RCE, unrolling) met, we switch to the
   pre-main-post-loop model. Then a counted loop could be split into
   pre-main-post loops. Meanwhile, C2 inserts minimum trip guards
   (a.k.a. zero-trip guards) before the main loop and the post loop.
   These guards test if the remaining trip count is less than the
   loop stride (after unrolling). If yes, the execution jumps over
   the loop code to avoid loop over-running. For example, if a main
   loop is unrolled to 8x, the main loop guard tests if the loop has
   less than 8 iterations and then decides which way to go.
   
   Usually, the vectorized main loop will be super-unrolled after
   vectorization. In such cases, the main loop's stride is going to
   be further multiplied. After the main loop is super-unrolled, the
   minimum trip guard test will be updated. Assuming one vector can
   operate 8 iterations and the super-unrolling count is 4, the trip
   guard of the main loop will test if remaining trip is less than
   8 * 4 = 32.
   
   To avoid the scalar post loop running too many iterations after
   super-unrolling, C2 clones the main loop before super-unrolling to
   create a vector drain loop, i.e. atomic post loop. The newly
   inserted post loop also has a minimum trip guard. And, both trip
   guards of the main loop and vector post loop jump to the scalar
   post loop.
   
   The problem here is, if the remaining trip count when exiting from
   the pre-loop is relatively small but larger than the vector length,
   the vector atomic post loop will never be executed. Because the
   minimum trip guard test of main loop fails, the execution will
   jump over both the main loop and the atomic post loop. For
   example, in the case above, if a loop still has 25 iterations after
   the pre-loop, we could run 3 rounds of the atomic post loop, but
   this never happens. It would be better if the minimum trip guard
   test of the main loop does not jump over the atomic post loop.
   
   This patch is to improve it by modifying the control flow when
   the minimum trip guard test of the main loop fails. Obviously,
   we need to sync all data uses and control uses to adjust to the
   change of control flow.
   
   The whole process is done by the function
   insert_atomic_post_loop_impl().
   
   We introduce a new CloneLoopMode, InsertAtomicPost. When we're cloning
   vector main loop to atomic post loop with mode InsertAtomicPost:
   
   1. The fall-in control flow to the atomic post-loop comes from a
   RegionNode merging exits from pre-loop and main-loop, implemented in
   insert_atomic_post_loop_impl().
   2. All fall-in values to the atomic post-loop come from (one or more)
   phis merging exit values from pre-loop and main-loop, implemented by
   clone_up_atomic_post_backedge_goo().
   3. All control uses of exits from old-loop now should use new
   RegionNodes that merge RegionNodes which merge exits from pre-loop
   and main-loop and exits from the new-loop (atomic post loop)
   equivalents, implemented by fix_ctrl_uses_for_atomic_post().
   4. All data uses of values from old-loop now should use new Phis
   that merge Phis which merge values from pre-loop and main-loop and
   values from the new-loop (atomic post loop) equivalents, implemented
   by handle_data_uses_for_atomic_post_loop().
   
   We also add a new micro-benchmark to test the performance gain. Here are
   the performance results from different vector-length machines.
   
   Tiers 1-3 passed on aarch64 and x86. There are still a few fuzzer
   test failures.

-------------

Changes: https://git.openjdk.org/jdk/pull/22629/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22629&range=01
  Stats: 1542 lines in 8 files changed: 1358 ins; 59 del; 125 mod
  Patch: https://git.openjdk.org/jdk/pull/22629.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/22629/head:pull/22629

PR: https://git.openjdk.org/jdk/pull/22629

From fgao at openjdk.org  Wed Sep  3 17:10:47 2025
From: fgao at openjdk.org (Fei Gao)
Date: Wed, 3 Sep 2025 17:10:47 GMT
Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some
 small trip counts
In-Reply-To: <ZHCa5aRFYGgD-SqSlhZPcLXjQC0rDDrvM9ZOHneXcAY=.3f0435bc-3fbd-4c39-954b-22749d946ed5@github.com>
References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com>
 <ZHCa5aRFYGgD-SqSlhZPcLXjQC0rDDrvM9ZOHneXcAY=.3f0435bc-3fbd-4c39-954b-22749d946ed5@github.com>
Message-ID: <aYo2GTTIJarX_bmJGNRQ2-jbNQ2qGYMucpxTpee0DUc=.1bcd70d4-06bc-4830-b48e-d9ff38cb14a8@github.com>

On Thu, 28 Aug 2025 14:58:25 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> BTW: I just integrated https://github.com/openjdk/jdk/pull/24278 which may have silent merge conflicts, so it would be good if you merged and tested again.

Hi @eme64, I've rebased the patch onto the latest JDK, and all tier1 to tier3 tests have passed on my local AArch64 and x86 machines.

> It would be good if you re-ran the benchmarks. It seems the last ones you did in December of 2024.
> We should see that we have various benchmarks, both for array and MemorySegment.
> You could look at the array benchmarks from here: https://github.com/openjdk/jdk/pull/22070

I also re-verified the benchmark from [PR #22070](https://github.com/openjdk/jdk/pull/22070) on 128-bit, 256-bit, and 512-bit vector machines. The results show no significant regressions, and the performance changes are consistent with the previous round described in [perf results](https://bugs.openjdk.org/browse/JDK-8307084?focusedId=14729524&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14729524).

> Once you do that I could also run some internal testing, if you like :)

I'd really appreciate it if you could run some internal testing at a time you think is suitable.
Thanks :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3250077476

From cslucas at openjdk.org  Wed Sep  3 17:13:22 2025
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Wed, 3 Sep 2025 17:13:22 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed: Sanity:
 previous reducible Phi is no longer reducible before SUT.
Message-ID: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>

Please review this patch, which fixes an issue that may occur when reducing allocation merges.

As the assert message describes, the problem is that a `Phi` considered reducible during one invocation of `adjust_scalar_replaceable_state` later turns out to be non-reducible. This can happen if a subsequent invocation of the same method causes all inputs to the Phi to become NSR; in that case there is no point in reducing the Phi. It can also happen during the propagation of NSR state done by `find_scalar_replaceable_allocs`.

The change in `revisit_reducible_phi_status` is just a clean-up.
The real fix is in `find_scalar_replaceable_allocs`.

Tested on Linux x64/Aarch64 release/fastdebug with JTREG tier1-3.
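For readers less familiar with allocation merges, the hypothetical Java sketch below shows the kind of code that produces a Phi merging two allocations (the `Point` record and the method names are illustrative only, not from the patch or its test). Whether C2 actually reduces the Phi and scalar-replaces the allocations depends on its escape analysis; the second method shows a case where an escaping store makes the allocations NSR, so reducing the Phi would buy nothing.

```java
// Hypothetical illustration of an "allocation merge": two allocations of the
// same class flow into one Phi at the join point.
final class AllocationMergeSketch {
    record Point(int x, int y) {}

    // Static field used to force an escape in pickEscaping().
    static Point sink;

    // 'p' is a Phi of two non-escaping allocations; C2 may reduce the Phi and
    // scalar-replace both allocations, so no object is allocated at run time.
    static int pick(boolean cond, int a, int b) {
        Point p;
        if (cond) {
            p = new Point(a, b);
        } else {
            p = new Point(b, a);
        }
        return p.x() + 1;
    }

    // Storing the merged object into a static field makes both allocations
    // escape (NSR), so reducing the Phi no longer buys anything.
    static int pickEscaping(boolean cond, int a, int b) {
        Point p = cond ? new Point(a, b) : new Point(b, a);
        sink = p;
        return p.x();
    }
}
```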

-------------

Commit messages:
 - Fix for RAM not reducible before SUT & Test.

Changes: https://git.openjdk.org/jdk/pull/27063/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27063&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8361699
  Stats: 87 lines in 2 files changed: 73 ins; 13 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27063.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27063/head:pull/27063

PR: https://git.openjdk.org/jdk/pull/27063

From vlivanov at openjdk.org  Wed Sep  3 17:34:42 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 3 Sep 2025 17:34:42 GMT
Subject: RFR: 8355354: C2 crashed: assert(_callee == nullptr || _callee ==
 m) failed: repeated inline attempt with different callee [v4]
In-Reply-To: <ihIxDFcOzNC4d90jJJihutTjrYZZKraIWJLKhbCB6hE=.e7b1c093-432c-4680-b26c-d88c2b34f41b@github.com>
References: <_eAERVexsTQc_Acje4IUJ9yqqE98dB4-hz_fJ0jrUhs=.b2194a63-2599-42f7-a65f-41c29bb37bc3@github.com>
 <ihIxDFcOzNC4d90jJJihutTjrYZZKraIWJLKhbCB6hE=.e7b1c093-432c-4680-b26c-d88c2b34f41b@github.com>
Message-ID: <_XB8zA3RZEgWvAwKe8DsB3Udb7gaIqBHiEPHw_28t6Y=.4a54aa09-b496-4818-a4cd-7e7013970c72@github.com>

On Wed, 3 Sep 2025 06:50:26 GMT, Damon Fenacci <dfenacci at openjdk.org> wrote:

>> # Issue
>> The CTW test `applications/ctw/modules/java_xml.java` crashes when trying to repeat late inlining of a virtual method (after IGVN passes through the method's call node again). The failure originates [here](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callGenerator.cpp#L473) because `_callee != m`. Apparently when running IGVN a second time after a first late inline failure and [setting the callee in the call generator](https://github.com/openjdk/jdk/blob/e2ae50d877b13b121912e2496af4b5209b315a05/src/hotspot/share/opto/callnode.cpp#L1240) we notice that the previous callee is not the same as the current one.
>> In this specific instance it seems that the issue happens when CTW is compiling Apache Xalan.
>> 
>> # Cause
>> The root of the issue has to do with repeated late inlining, class hierarchy analysis and dynamic class loading.
>> 
>> For this particular issue the two differing methods are `org.apache.xalan.xsltc.compiler.LocationPathPattern::translate` first and `org.apache.xalan.xsltc.compiler.AncestorPattern::translate` the second time. `LocationPathPattern` is an abstract class but has a concrete `translate` method. `AncestorPattern` is a concrete class that extends another abstract class `RelativePathPattern` that extends `LocationPathPattern`. `AncestorPattern` overrides the translate method.
>> What seems to be happening is the following: we compile a virtual call `RelativePathPattern::translate`, and at compile time only the abstract classes `RelativePathPattern` <: `LocationPathPattern` are loaded. CHA then concludes that the call always targets `LocationPathPattern::translate` because the method is not overridden anywhere else. However, there is not yet any non-abstract class in the class hierarchy, so as soon as `AncestorPattern` is loaded, it becomes the only non-abstract class in the hierarchy and the receiver type must therefore be `AncestorPattern`.
>> 
>> More generally, when late inlining is repeated and classes are loaded dynamically, the method resolved in one late inlining attempt may differ from the one resolved in the next.
>> 
>> # Fix
>> 
>> This looks like a real edge case. If CHA is affected by class loading, the originally recorded dependency becomes invalid. This can possibly happen in other situations as well (e.g. JVMTI class redefinition). So, instead of modifying the assert (to check for invalid dependencies), we avoid re-setting the callee method ...
>
> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision:
> 
>   JDK-8355354: add stress comment

Looks fine.
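To make the class-loading scenario from the quoted issue description concrete, here is a hypothetical, heavily reduced Java sketch of the Xalan hierarchy. Only the class and method names come from the description; the method bodies are made up, and the comments restate the description's explanation rather than anything taken from the fix.

```java
// Hypothetical, heavily reduced version of the Xalan hierarchy from the issue.
abstract class LocationPathPattern {
    // Concrete implementation in an abstract class.
    String translate() { return "LocationPathPattern"; }
}

// Abstract class that does not override translate().
abstract class RelativePathPattern extends LocationPathPattern {}

// The only concrete class; it overrides translate().
final class AncestorPattern extends RelativePathPattern {
    @Override
    String translate() { return "AncestorPattern"; }
}

final class ChaSketch {
    // Virtual call with static receiver type RelativePathPattern. While only the
    // two abstract classes are loaded, CHA sees a single target
    // (LocationPathPattern::translate). Once AncestorPattern is loaded, that
    // recorded dependency is invalid and the call resolves to the override.
    static String callSite(RelativePathPattern p) {
        return p.translate();
    }
}
```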

-------------

Marked as reviewed by vlivanov (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26441#pullrequestreview-3181755233

From vlivanov at openjdk.org  Wed Sep  3 20:43:52 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 3 Sep 2025 20:43:52 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3]
In-Reply-To: <UKkT1Wqi4ftj3eGF2KzT8saeWoWSBTXx5kw0FOiJyLE=.c10dbf15-3348-495b-b9aa-556b78bc1e0b@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <hcmJkTFFU9oEe5LE1Q5974w0KM5Pe6SLi0uKLAxU7rM=.70a3f0d7-eb32-44b0-b819-1c4db6273976@github.com>
 <UKkT1Wqi4ftj3eGF2KzT8saeWoWSBTXx5kw0FOiJyLE=.c10dbf15-3348-495b-b9aa-556b78bc1e0b@github.com>
Message-ID: <ejAu9M0FYELqOdzDW8uankmdRt0w8bloAwcxWcyx5k0=.9a47c6c4-e9df-40f5-aba9-23073a12bd17@github.com>

On Tue, 3 Jun 2025 17:20:38 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   renaming
>
> src/hotspot/share/opto/c2_globals.hpp line 83:
> 
>> 81:                                                                             \
>> 82:   product(bool, StressReachabilityFences, false, DIAGNOSTIC,                \
>> 83:           "Randomly insert ReachabilityFence nodes")                        \
> 
> Drive-by sniping: what about a hello-world test where you test out these flags?

Good idea. Added one.
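A smoke test of that kind could look roughly like the hypothetical jtreg sketch below. The class name and workload are made up and this is not the test that was actually added; only the `StressReachabilityFences` flag (declared `DIAGNOSTIC` in the `c2_globals.hpp` excerpt above) comes from this PR, so the `@run` line only works on a build that includes the patch.

```java
/*
 * @test
 * @summary Hypothetical smoke test running C2 with StressReachabilityFences.
 * @requires vm.compiler2.enabled
 * @run main/othervm -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+StressReachabilityFences
 *                   TestStressReachabilityFencesSketch
 */
class TestStressReachabilityFencesSketch {
    // Takes and creates oops, giving the stress mode references to keep alive.
    static int sumLengths(String a, String b) {
        StringBuilder sb = new StringBuilder(a);
        sb.append(b);
        return sb.length();
    }

    public static void main(String[] args) {
        int total = 0;
        // Many iterations so that sumLengths() typically gets C2-compiled under -Xbatch.
        for (int i = 0; i < 20_000; i++) {
            total += sumLengths("hello", "world");
        }
        if (total != 20_000 * 10) {
            throw new RuntimeException("unexpected total: " + total);
        }
    }
}
```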

> src/hotspot/share/opto/callGenerator.cpp line 617:
> 
>> 615:   uint endoff = call->jvms()->endoff();
>> 616:   if (C->inlining_incrementally()) {
>> 617:     assert(endoff == call->req(), ""); // assert in SafePointNode::grow_stack
> 
> What exactly are you asserting here? And what is the comment for?

The assert ensures there are no reachability edges present when incremental inlining takes place. Inlining logic doesn't expect any extra edges past debug info, and the comment refers to the assert that fires first (in `SafePointNode::grow_stack`).

> src/hotspot/share/opto/callnode.hpp line 497:
> 
>> 495:   // Are we guaranteed that this node is a safepoint?  Not true for leaf calls and
>> 496:   // for some macro nodes whose expansion does not have a safepoint on the fast path.
>> 497:   virtual bool guaranteed_safepoint()  { return true; }
> 
> I see you only copied it. It makes me a little nervous when we call the "default" case safe. Because when you add more cases, you just assume it is safe... and if it is not we first have to discover that through a bug. What do you think?

Well, it's a SafePointNode class after all. I lifted it from the `CallNode` subclass to avoid an elaborate check on SafePoint nodes (`!is_Call() || as_Call()->guaranteed_safepoint()`).

If some node extends SafePointNode, but doesn't keep JVM state, it has to communicate it to users one way or another. And changing the default doesn't improve the situation IMO: reporting a safepoint node as a non-safepoint is still a bug.
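The trade-off can be modeled in a few lines of Java. This is a hypothetical model, not HotSpot code: the class names only mirror `SafePointNode`, `CallNode`, and the leaf-call case mentioned in the quoted comment.

```java
// Hypothetical Java model (not HotSpot code) of lifting guaranteed_safepoint()
// from the CallNode subclass up to SafePointNode.
class SafePointNodeModel {
    // Default: a node of this class is a safepoint.
    boolean guaranteedSafepoint() { return true; }
}

class CallNodeModel extends SafePointNodeModel {}

// Stand-in for leaf calls, which are not guaranteed to be safepoints.
class LeafCallNodeModel extends CallNodeModel {
    @Override
    boolean guaranteedSafepoint() { return false; }
}

final class SafepointQuery {
    // Shape of the check when only calls answered the question.
    static boolean beforeLift(SafePointNodeModel n) {
        return !(n instanceof CallNodeModel) || ((CallNodeModel) n).guaranteedSafepoint();
    }

    // After lifting: one virtual query, no type test needed.
    static boolean afterLift(SafePointNodeModel n) {
        return n.guaranteedSafepoint();
    }
}
```

Both predicates agree for all three node kinds. Either way, a subclass that is not really a safepoint has to override the query explicitly, which is the point made in the reply.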

> src/hotspot/share/opto/compile.cpp line 3958:
> 
>> 3956:     Node* rf = C->reachability_fence(i);
>> 3957:     Node* in = rf->in(1);
>> 3958:     if (in->is_DecodeN()) {
> 
> Why not:
> Suggestion:
> 
>     ReachabilityFenceNode* rf = C->reachability_fence(i);
>     DecodeNNode* dn = rf->in(1)->isa_DecodeN();
>     if (dn != nullptr) {

Ok, reshaped as you suggested.

> src/hotspot/share/opto/compile.hpp line 381:
> 
>> 379:   GrowableArray<OpaqueTemplateAssertionPredicateNode*>  _template_assertion_predicate_opaques;
>> 380:   GrowableArray<Node*>  _expensive_nodes;       // List of nodes that are expensive to compute and that we'd better not let the GVN freely common
>> 381:   GrowableArray<Node*>  _reachability_fences;   // List of reachability fences
> 
> Why not:
> Suggestion:
> 
>   GrowableArray<ReachabilityFenceNode*>  _reachability_fences;   // List of all reachability fences

Ok, done.

> src/hotspot/share/opto/compile.hpp line 741:
> 
>> 739:   void remove_reachability_fence(Node* n) {
>> 740:     _reachability_fences.remove_if_existing(n);
>> 741:   }
> 
> You could also add the type `ReachabilityFenceNode*` here.

Done.

> src/hotspot/share/opto/loopTransform.cpp line 78:
> 
>> 76:   }
>> 77:   return unique_loop_exit;
>> 78: }
> 
> `proj_out_or_null` returns a `ProjNode` (it is probably an `IfTrue` or `IfFalse`, right?) and `outer_loop_exit` returns an `IfFalseNode`. So we should be able to return an `IfProjNode` from this method. What do you think?
> 
> What is the benefit of the `unique_loop_exit` variable here? Why not return immediately?

It was easier to inspect it in the debugger. Reshaped as you suggested.

> src/hotspot/share/opto/macro.cpp line 983:
> 
>> 981:         _igvn._worklist.push(ac);
>> 982:       } else if (use->is_ReachabilityFence() && OptimizeReachabilityFences) {
>> 983:         _igvn.replace_input_of(use, 1, _igvn.makecon(TypePtr::NULL_PTR)); // reset; redundant fence
> 
> Can you quickly explain in a code comment how this does a "reset"? What happens with it next?

Turned it into `ReachabilityFenceNode::clear_referent()`. I hope that makes it clearer.

> src/hotspot/share/opto/node.hpp line 701:
> 
>> 699:       DEFINE_CLASS_ID(MemBar,      Multi, 3)
>> 700:         DEFINE_CLASS_ID(Initialize,        MemBar, 0)
>> 701:         DEFINE_CLASS_ID(MemBarStoreStore,  MemBar, 1)
> 
> Suggestion:
> 
>         DEFINE_CLASS_ID(Initialize,       MemBar, 0)
>         DEFINE_CLASS_ID(MemBarStoreStore, MemBar, 1)
> 
> I don't think you needed to touch the lines above, right?

Fixed.

> src/hotspot/share/opto/parse.hpp line 361:
> 
>> 359:   bool          _wrote_fields;       // Did we write any field?
>> 360:   Node*         _alloc_with_final_or_stable; // An allocation node with final or @Stable field
>> 361:   Node*         _stress_rf_hook; // StressReachabilityFences support
> 
> You could write out the `rf`

I'd like to avoid that. `_stress_reachability_fence_hook` is way too verbose IMO. The declaration and all the accesses are accompanied by `StressReachabilityFences` which should make it clear what `rf` refers to.

> src/hotspot/share/opto/parse1.cpp line 379:
> 
>> 377:         _stress_rf_hook->add_req(loc);
>> 378:       }
>> 379:     }
> 
> Can you add a short code comment describing what you are doing here, please?

Done.

> src/hotspot/share/opto/parse1.cpp line 394:
> 
>> 392:         _stress_rf_hook->add_req(stk);
>> 393:       }
>> 394:     }
> 
> A short code comment would be helpful

Done.

> src/hotspot/share/opto/parse1.cpp line 2222:
> 
>> 2220: 
>> 2221:   if (StressReachabilityFences) {
>> 2222:     // Keep all oop arguments alive until method return.
> 
> Why? Can you extend the comment a little?

Done. Does it look better now?

> src/hotspot/share/opto/reachability.cpp line 44:
> 
>> 42:  *   (0) initial set of RFs is materialized during parsing;
>> 43:  *   (1) optimization pass during loop opts which eliminates redundant nodes and
>> 44:  *     moves loop-invariant ones outside loops;
> 
> Suggestion:
> 
>  *   (1) optimization pass during loop opts which eliminates redundant nodes and
>  *       moves loop-invariant ones outside loops;
> 
> I'd prefer consistent indentation, but it's optional and a question of taste

Fixed.

> src/hotspot/share/opto/reachability.cpp line 51:
> 
>> 49:  *
>> 50:  * It looks attractive to get rid of RF nodes early and transfer to safepoint-attached representation,
>> 51:  * but it is not correct until loop opts are done.
> 
> Why is it not correct? What could go wrong? Why is it safe to do it after loop opts?

Loop opts routinely extend the live ranges of values, which can break the invariant that all interfering safepoints contain the referent in their oop maps. (If an interfering safepoint doesn't keep the referent alive, the referent can be GCed prematurely.)

After loop opts are over, it becomes possible to reliably enumerate all interfering safepoints and ensure the referent is present in their oop maps.
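For context, this is the kind of source-level pattern whose guarantee the invariant protects. The sketch loosely follows the example in the `Reference.reachabilityFence` javadoc but uses a `Cleaner` instead of a finalizer; the class, the static table, and the handle are hypothetical.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;

// Hypothetical resource wrapper: its "native" state lives in a static table
// slot, and a Cleaner clears the slot once the wrapper becomes unreachable.
final class FencedResource {
    private static final Cleaner CLEANER = Cleaner.create();
    private static final long[] TABLE = new long[64];

    private final int handle;

    // Cleanup action; it must not capture the FencedResource itself.
    private record Release(int handle) implements Runnable {
        @Override
        public void run() { TABLE[handle] = 0; }
    }

    FencedResource(int handle, long value) {
        this.handle = handle;
        TABLE[handle] = value;
        CLEANER.register(this, new Release(handle));
    }

    long read() {
        try {
            // After 'handle' is loaded, 'this' has no further uses. Without the
            // fence, compiled code may let it die here, and the Cleaner could
            // clear the slot before the table read happens.
            return TABLE[handle];
        } finally {
            // Keeps 'this' strongly reachable until this point.
            Reference.reachabilityFence(this);
        }
    }

    public static void main(String[] args) {
        FencedResource r = new FencedResource(3, 42L);
        System.out.println(r.read()); // prints 42
    }
}
```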

> test/hotspot/jtreg/compiler/c2/TestReachabilityFence.java line 38:
> 
>> 36:  * @summary Tests to ensure that reachabilityFence() correctly keeps objects from being collected prematurely.
>> 37:  * @modules java.base/jdk.internal.misc
>> 38:  * @run main/othervm -Xbatch compiler.c2.TestReachabilityFence
> 
> What about some extra runs where you use your new flags?

This particular test is carefully crafted to provoke a failure when reachability fence effects aren't properly modeled. Stressing RF implementation doesn't help here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2320090697
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2320120466
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2320062127
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2320121852
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2320122602
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2320123063
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2320123818
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2320135080
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2320135683
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2320066556
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2320136667
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2320137310
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2320138235
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2320138496
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2320080872
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2320087314

From vlivanov at openjdk.org  Wed Sep  3 20:43:54 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 3 Sep 2025 20:43:54 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3]
In-Reply-To: <vtlALBbMlS-Zj5Qqzp6PpEFQq6fq7xUskZaCXfADorM=.cfcb1f7c-489b-47e0-b8ea-8b0a87dc9d5d@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <hcmJkTFFU9oEe5LE1Q5974w0KM5Pe6SLi0uKLAxU7rM=.70a3f0d7-eb32-44b0-b819-1c4db6273976@github.com>
 <N4plHRx1Hm8W3kbZb0JeoddmFdkM3tjsA0agyrQ40fE=.2da64dae-d06e-45df-be2e-5c7ceb4005f1@github.com>
 <vtlALBbMlS-Zj5Qqzp6PpEFQq6fq7xUskZaCXfADorM=.cfcb1f7c-489b-47e0-b8ea-8b0a87dc9d5d@github.com>
Message-ID: <YwP3BI5-UT6-DwM53nsC1R_zikvBs6dGI-ITm0fABPo=.5de44414-8e0a-4351-bdbc-05d90c21cd79@github.com>

On Mon, 16 Jun 2025 09:28:59 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/callnode.cpp line 950:
>> 
>>> 948:               case CatchProjNode::catch_all_index:    projs->catchall_catchproj    = cpn; break;
>>> 949:               default: {
>>> 950:                 assert(cpn->_con > 1, ""); // exception table; rethrow case
>> 
>> Can we please turn this into a helpful assert message?
>
> Can you quickly comment why you changed this?

Some call nodes inspected during `expand_reachability_fences` demonstrate this IR shape where some exception table projections are directly attached to the call node.

Looks like a missed case in `CallNode::extract_projections` that we simply never hit before.

>> src/hotspot/share/opto/loopnode.hpp line 1485:
>> 
>>> 1483:   void remove_rf(Node* rf);
>>> 1484: #ifdef ASSERT
>>> 1485:   bool has_redundant_rfs(Unique_Node_List& ignored_rfs, bool rf_only);
>> 
>> I would prefer if all the method names spelled out `reachability_fences` instead of `rf / rfs`.
>
> The arguments are less important for me.

There are 2 types of methods here: internal ones (used solely in `reachability.cpp`) and those which are called from loop optimization code (`optimize_reachability_fences` and `eliminate_reachability_fences`). 

IMO it's counter-productive to repeatedly spell out what "RF" means inside `reachability.cpp`, so I kept the names intact. I split the declarations into public and private ones to stress the distinction.

>> src/hotspot/share/opto/reachability.cpp line 46:
>> 
>>> 44:  *     moves loop-invariant ones outside loops;
>>> 45:  *   (2) reachability information is transferred to safepoint nodes (appended as edges after debug info);
>>> 46:  *   (3) reachability information from safepoints materialized as RF nodes attached to the safepoint node.
>> 
>> Can you expand the explanation a little, please? I don't really understand. Why do you do this? What does it achieve?
>
> It could be helpful if you wrote a paragraph (maybe at the top), about the interaction of SafePoint and ReachabilityFence. And you should also define "reachability information", I don't yet understand what that entails.

I elaborated the description a bit and added more details. Let me know how it reads now.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2320108310
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2320132768
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2320139929

From vlivanov at openjdk.org  Wed Sep  3 20:43:54 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 3 Sep 2025 20:43:54 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3]
In-Reply-To: <_yDpYorDH_2ox5RaGm_JdCk4uYbiUYanemuUGR2LCp4=.33c1414a-7c61-45bb-9632-dbff88711fde@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <hcmJkTFFU9oEe5LE1Q5974w0KM5Pe6SLi0uKLAxU7rM=.70a3f0d7-eb32-44b0-b819-1c4db6273976@github.com>
 <N4plHRx1Hm8W3kbZb0JeoddmFdkM3tjsA0agyrQ40fE=.2da64dae-d06e-45df-be2e-5c7ceb4005f1@github.com>
 <vtlALBbMlS-Zj5Qqzp6PpEFQq6fq7xUskZaCXfADorM=.cfcb1f7c-489b-47e0-b8ea-8b0a87dc9d5d@github.com>
 <_yDpYorDH_2ox5RaGm_JdCk4uYbiUYanemuUGR2LCp4=.33c1414a-7c61-45bb-9632-dbff88711fde@github.com>
Message-ID: <PmFXKsKDKsOiNfm8Ebj6fPQIKdICrdIldVnoLwE0fw4=.642ba291-0f46-46cb-91ab-f38f57cc3f29@github.com>

On Mon, 16 Jun 2025 09:40:30 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Might be helpful if you write in a comment if this eliminates all or just some of the reachability fences.
>
> Can we limit it to cases where we actually have reachability fences?

Good point. Done.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2320121142

From vlivanov at openjdk.org  Wed Sep  3 20:43:55 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 3 Sep 2025 20:43:55 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v6]
In-Reply-To: <t29GLzzULJoHTvaeIdgZEhSGzLOmh_NnR_7UIiP-aZA=.4c85f592-5b48-49d0-a240-13af0a153f90@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <jtZA9Nw-jg2ifyMRp3_l3c0gUhAY2o0o0sePJAXw-zY=.9b153898-6207-4cc3-8a2b-6c4cd1f48095@github.com>
 <Tl4_ZO7lL6YQE0KmfAWIuJkG1BdNTpxdJ1aVKYWSj0I=.d3ea1a7d-03a9-4b22-895f-74b7750cab8a@github.com>
 <s4m4aI0SWBX74c4-PuBEo5ZTQsQqXidwMmiN-F1WKiI=.bec2eef0-0ec5-45c2-8a64-89deaa33b257@github.com>
 <yNlpWIC5EVBmB6herUmKYvaNbyUZgualzPFwWVgr4YY=.076f3402-9a0f-4ea5-aa24-1d0048b1333b@github.com>
 <If38WQj5Xdi3f8johmbRXFsKgs1oo1mErFmnpKVhs3Q=.ca73b674-7fab-4a22-be13-e146beee19d8@github.com>
 <t29GLzzULJoHTvaeIdgZEhSGzLOmh_NnR_7UIiP-aZA=.4c85f592-5b48-49d0-a240-13af0a153f90@github.com>
Message-ID: <zhIa7QpX8Y_DZ_h-tQHE428Nol9xhllqpb9op2YcwZ4=.dee5770d-a3c2-4255-9ae0-cbafb12121b9@github.com>

On Mon, 16 Jun 2025 09:44:48 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Are you asking specifically about `ReachabilityFence -> DecodeN -> LoadN` shape? Yes, it's common, especially after inlining.
>
> @iwanowww Can you add a code comment why this is safe to look through the ReachabilityFence?

Done.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2320062861

From vlivanov at openjdk.org  Wed Sep  3 21:18:06 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 3 Sep 2025 21:18:06 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v7]
In-Reply-To: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
Message-ID: <9zI6zFF3tzgRMp6RidkEIIIYg_qMVU3tfdhQMVG84d4=.1c4e2c34-d8be-40af-b160-0f0542934bae@github.com>

> This PR introduces C2 support for `Reference.reachabilityFence()`.
> 
> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now because there are no signs yet that it is affected.
> 
> `Reference.reachabilityFence()` can be used in performance-critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through the compilation pipeline without negatively affecting generated code quality.
> 
> Also, I don't consider this fix complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is a CFG-only node, nothing explicitly forbids memory operations from floating past `Reference.reachabilityFence()` and potentially reaching other safepoints that the current analysis treats as non-interfering. Representing `ReachabilityFence` as a memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but the performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend the referent's live range beyond the `ReachabilityFence` nodes associated with it. That would meet the performance criteria, but I prefer to implement it as a follow-up fix.
> 
> Another known issue relates to reachability fences on constant oops. If such a constant is GCed (most likely due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately.
> 
> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667
> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints."
> 
> Testing:
> - [x] hs-tier1 - hs-tier8
> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations
> - [x] java/lang/foreign microbenchmarks

Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:

  Update

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25315/files
  - new: https://git.openjdk.org/jdk/pull/25315/files/0762dda9..bdf1b396

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=05-06

  Stats: 55 lines in 3 files changed: 52 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/25315.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315

PR: https://git.openjdk.org/jdk/pull/25315

From vlivanov at openjdk.org  Wed Sep  3 21:24:46 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 3 Sep 2025 21:24:46 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v2]
In-Reply-To: <sR_6Jh_j5oxAFBzXbSqwD2L3d74pP_-XnXM8RDWjUXA=.35937cef-adbe-418e-a0cd-3d13c4b145e5@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <0WKwHjzEn5dxYLkonrk4h9yfMI3r3bKDdqgG06J69N4=.e19e9441-6197-4d53-a4f4-b196a81f69d8@github.com>
 <Ctg8ZdOt0UVSzfJ18f-0Aj1DCLSGsy3pLQnm8SzcpUg=.9eb433cd-2e53-4a12-a24f-1c493a1a7869@github.com>
 <sR_6Jh_j5oxAFBzXbSqwD2L3d74pP_-XnXM8RDWjUXA=.35937cef-adbe-418e-a0cd-3d13c4b145e5@github.com>
Message-ID: <iSf14zRwWhUIi1MOa2LgjP0T_S9HGBxYlaEaQgp8QdA=.5b1ab0e9-f07a-4432-96a4-54dac66741ba@github.com>

On Wed, 3 Sep 2025 08:30:47 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>>>> Representing ReachabilityFence as memory barrier (e.g., MemBarCPUOrder) would solve the issue, but performance costs are prohibitively high.
>> 
>>> How bad is it? MemBarCPUOrder pinches all memory, so I assume this breaks a lot of optimizations when RF is sitting in the hot loop? I remember we went through a similar exercise with Blackholes: [JDK-8296545](https://bugs.openjdk.org/browse/JDK-8296545) -- and decided to pinch only the control. I guessing this is not enough to fix RF, or is it?
>> 
>> Yes, if a barrier stays inside loop body, it breaks a lot of important optimizations. It may end up almost as bad as a full-blown call (except a barrier can be moved around while a call can't). And moving a node when it depends both on control and memory is more complicated than just a CFG node. Moreover, as you can see in the proposed solution, even CFG-only representation is problematic for loop opts, so additional care is needed to ensure RFs are moved out of loops. 
>> 
>> As an alternative approach, I thought about reifying RF as a data node (think of `CastPP`) and then linking its referent to all safepoints it dominates after loop opts are over. But that would only affect `optimize_reachability_fences()`. Everything else would stay the same. So, I decided to stay with the CFG-only representation for now.
>
> @iwanowww Let me know whenever this is ready to review again ?

@eme64 please, take another look. Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25315#issuecomment-3250854323

From vlivanov at openjdk.org  Wed Sep  3 21:29:43 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 3 Sep 2025 21:29:43 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v8]
In-Reply-To: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
Message-ID: <LMu3QXiw3K_5-conH2FlBXyRWNzNDunp0L86Ubxeju8=.86d4010f-ab7b-42cf-a9c3-5c24c23d2a81@github.com>

Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:

  whitespaces

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25315/files
  - new: https://git.openjdk.org/jdk/pull/25315/files/bdf1b396..e95d4eb9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=06-07

  Stats: 9 lines in 1 file changed: 0 ins; 0 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/25315.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315

PR: https://git.openjdk.org/jdk/pull/25315

From missa at openjdk.org  Thu Sep  4 00:22:58 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Thu, 4 Sep 2025 00:22:58 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v4]
In-Reply-To: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
Message-ID: <UpnysT8GCHxp3eMm2kr6EWRu6yP8nQS2bg_K8EARAus=.e551974b-7273-4fc8-96fe-ff73529ed362@github.com>

> Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions that comply with the definitions in section 5.8 of the 2019 IEEE-754 standard. They compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
> 
> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) instruction is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific instruction sequence depends on the source (single vs. double precision), the destination (long, integer, short, or byte), the level of parallelization (scalar vs. vector), and the supported AVX extension. Essentially, though, the special values are mapped into the integer range (NaN -> 0; -Infinity and values <= Integer.MIN_VALUE -> Integer.MIN_VALUE; +Infinity and values >= Integer.MAX_VALUE -> Integer.MAX_VALUE) with the help of a few temporary registers that hold intermediate results.
> 
> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or  VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
> 
> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
> 9. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java`
> 10. `jtreg:test/hotspot/jtreg/com...

Mohamed Issa has updated the pull request incrementally with two additional commits since the last revision:

 - Update floating point conversion tests to check for AVX 10.2 CPU feature ID
 - Correct matching rules for AVX 10.2 floating point conversion instructions that involve memory

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26919/files
  - new: https://git.openjdk.org/jdk/pull/26919/files/be5c0b4e..07ac817a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=02-03

  Stats: 22 lines in 4 files changed: 0 ins; 0 del; 22 mod
  Patch: https://git.openjdk.org/jdk/pull/26919.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26919/head:pull/26919

PR: https://git.openjdk.org/jdk/pull/26919

From dlong at openjdk.org  Thu Sep  4 00:31:42 2025
From: dlong at openjdk.org (Dean Long)
Date: Thu, 4 Sep 2025 00:31:42 GMT
Subject: RFR: 8366461: Remove obsolete method handle invoke logic [v3]
In-Reply-To: <j17HsQMH4wjiC9yAOssV1Ivx6dOMyUw_dgT-Q0KlV-c=.b7523e4d-515f-4384-adb3-cc3c9763db5c@github.com>
References: <LQQer6eHAvGEV6clizLClEdOtBBIO7GCQCzibGcEzL8=.7ec9480c-c660-460d-ab5c-69d4d4a4d03d@github.com>
 <_pqvEs0LIlAc7RjFUwg-bpxS3D2v5U7c6In2sG8XLhQ=.57e3aead-6ac4-4a42-89d2-385d7e6ecedf@github.com>
 <j17HsQMH4wjiC9yAOssV1Ivx6dOMyUw_dgT-Q0KlV-c=.b7523e4d-515f-4384-adb3-cc3c9763db5c@github.com>
Message-ID: <pUL45XVf0NV3n93568ueZXcoW_l4UquTL5czT5ZXxjA=.8de29494-8151-4ac5-9c9b-4a18a163d5ed@github.com>

On Wed, 3 Sep 2025 07:12:20 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> Dean Long has updated the pull request incrementally with three additional commits since the last revision:
>> 
>>  - revert whitespace change
>>  - undo debug changes
>>  - cleanup
>
> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/aarch64/AARCH64Frame.java line 372:
> 
>> 370:         // DEBUG_ONLY(verifyDeoptriginalPc(senderNm, raw_unextendedSp));
>> 371:       }
>> 372:     }
> 
> The `adjustUnextendedSP()` methods in `<arch>Frame.java` do not seem to do anything. Perhaps these could be cleaned up as well?

Yes, it's tempting to want to clean these up, but I noticed that SA code really tries to mirror the C++ code, so I'm inclined to leave it.  Is there a Serviceability expert that would like to see this code cleaned up further?  @plummercj , what do you think?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27059#discussion_r2320526360

From xgong at openjdk.org  Thu Sep  4 02:15:45 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Thu, 4 Sep 2025 02:15:45 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v5]
In-Reply-To: <sikeN2K9EHmLEEnGsNttJSsPefvEtrKk9q7OOIgqQG0=.238662f7-394c-475b-bd07-effb28302b2c@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <sikeN2K9EHmLEEnGsNttJSsPefvEtrKk9q7OOIgqQG0=.238662f7-394c-475b-bd07-effb28302b2c@github.com>
Message-ID: <XF8uxpRWCmFFTI33uimu0x4tHTgt8CAtIMeO2pQ_oEc=.be9ac493-0984-49ab-a24c-f7a63ee06085@github.com>

On Mon, 4 Aug 2025 02:31:08 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> This is a follow-up patch of [1], which aims at implementing the subword gather load APIs for AArch64 SVE platform.
>> 
>> ### Background
>> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices. SVE provides native gather load instructions for `byte`/`short` types using `int` vectors for indices. The vector size for a gather-load instruction is determined by the index vector (i.e. `int` elements). Hence, the total size is `32 * elem_num` bits, where `elem_num` is the number of loaded elements in the vector register.
>> 
>> ### Implementation
>> 
>> #### Challenges
>> Due to size differences between `int` indices (32-bit) and `byte`/`short` data (8/16-bit), operations must be split across multiple vector registers based on the target SVE vector register size constraints.
>> 
>> For a 512-bit SVE machine, loading a `byte` vector with different vector species requires different approaches:
>> - SPECIES_64: Single operation with mask (8 elements, 256-bit)
>> - SPECIES_128: Single operation, full register (16 elements, 512-bit)
>> - SPECIES_256: Two operations + merge (32 elements, 1024-bit)
>> - SPECIES_512/MAX: Four operations + merge (64 elements, 2048-bit)
>> 
>> Use `ByteVector.SPECIES_512` as an example:
>> - It contains 64 elements, so the index vector size is `64 * 32` bits, which is 4 times the SVE vector register size.
>> - It requires 4 vector gather-loads to finish the whole operation.
>> 
>> 
>> byte[] arr = [a, a, a, a, ..., a, b, b, b, b, ..., b, c, c, c, c, ..., c, d, d, d, d, ..., d, ...]
>> int[] idx = [0, 1, 2, 3, ..., 63, ...]
>> 
>> 4 gather-load:
>> idx_v1 = [15 14 13 ... 1 0]    gather_v1 = [... 0000 0000 0000 0000 aaaa aaaa aaaa aaaa]
>> idx_v2 = [31 30 29 ... 17 16]  gather_v2 = [... 0000 0000 0000 0000 bbbb bbbb bbbb bbbb]
>> idx_v3 = [47 46 45 ... 33 32]  gather_v3 = [... 0000 0000 0000 0000 cccc cccc cccc cccc]
>> idx_v4 = [63 62 61 ... 49 48]  gather_v4 = [... 0000 0000 0000 0000 dddd dddd dddd dddd]
>> merge: v = [dddd dddd dddd dddd cccc cccc cccc cccc bbbb bbbb bbbb bbbb aaaa aaaa aaaa aaaa]
>> 
>> 
>> #### Solution
>> The implementation simplifies backend complexity by defining each gather load IR to handle one vector gather-load operation, with multiple IRs generated in the compiler mid-end.
>> 
>> Here are the main changes:
>> - Enhanced IR generation with architecture-specific patterns based on `gather_scatter_needs_vector_index()` matcher.
>> - Added `VectorSliceNode` for result mer...
>
> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
> 
>  - Merge 'jdk:master' into JDK-8351623-sve
>  - Address review comments
>  - Refine IR pattern and clean backend rules
>  - Fix indentation issue and move the helper matcher method to header files
>  - Merge branch jdk:master into JDK-8351623-sve
>  - 8351623: VectorAPI: Add SVE implementation of subword gather load operation

Hi @eme64 , could you please help take a look at this PR? Thanks a lot in advance!
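A minimal Java model of the splitting described in the quoted PR, under the PR's stated assumption that each gather-load is sized by its `int` index vector, so a `regBits`-bit register covers `regBits / 32` elements. The class and method names are hypothetical, and the "merge" is modelled as a plain array copy rather than any actual C2 IR node.

```java
import java.util.Arrays;

/**
 * Hypothetical model (not HotSpot code) of splitting a byte gather into several
 * SVE gather-loads. Assumption: one gather uses an int index vector, so a register
 * of regBits bits covers regBits / 32 elements.
 */
public final class SubwordGatherSketch {
    // Number of gather-load operations needed for numElems byte elements.
    static int gatherCount(int numElems, int regBits) {
        int lanesPerGather = regBits / 32;                  // e.g. 512 / 32 = 16
        return (numElems + lanesPerGather - 1) / lanesPerGather;
    }

    // Performs the gather chunk by chunk, then "merges" each partial result
    // into its slot of the final vector.
    static byte[] gather(byte[] arr, int base, int[] idx, int numElems, int regBits) {
        int lanesPerGather = regBits / 32;
        byte[] result = new byte[numElems];
        for (int start = 0; start < numElems; start += lanesPerGather) {
            int count = Math.min(lanesPerGather, numElems - start); // fewer lanes => masked gather
            byte[] part = new byte[lanesPerGather];                  // inactive lanes stay zero
            for (int lane = 0; lane < count; lane++) {
                part[lane] = arr[base + idx[start + lane]];
            }
            System.arraycopy(part, 0, result, start, count);         // merge step
        }
        return result;
    }

    public static void main(String[] args) {
        int regBits = 512;
        for (int elems : new int[] {8, 16, 32, 64}) {  // byte SPECIES_64 .. SPECIES_512
            System.out.println(elems + " byte elements -> " + gatherCount(elems, regBits) + " gather-load(s)");
        }
        byte[] arr = new byte[128];
        for (int i = 0; i < arr.length; i++) arr[i] = (byte) i;
        int[] idx = new int[64];
        for (int i = 0; i < idx.length; i++) idx[i] = 63 - i;      // reverse order
        System.out.println(Arrays.toString(gather(arr, 0, idx, 64, regBits)));
    }
}
```

On a 512-bit machine this gives 1, 1, 2 and 4 gather-loads for 8, 16, 32 and 64 byte elements, matching the per-species breakdown in the description.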

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3251469639

From xgong at openjdk.org  Thu Sep  4 02:18:47 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Thu, 4 Sep 2025 02:18:47 GMT
Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI
 expand operation [v2]
In-Reply-To: <_YDJIkwt0sdsOAMfNNn1fHTVwH0SHDpJv5NpQoxnfiA=.a0ddb5f3-00f1-47e2-93da-f47cb3f62288@github.com>
References: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
 <_YDJIkwt0sdsOAMfNNn1fHTVwH0SHDpJv5NpQoxnfiA=.a0ddb5f3-00f1-47e2-93da-f47cb3f62288@github.com>
Message-ID: <sUzyT9_nDT91PJsD4UnGIwd8Adnqvc3uXD7ggaK526U=.aa5807a1-97c3-4fbd-83f1-f7b2e58ccc9a@github.com>

On Thu, 21 Aug 2025 07:00:35 GMT, erifan <duke at openjdk.org> wrote:

>> Currently, on AArch64, the VectorAPI `expand` operation is intrinsified for 32-bit and 64-bit types only when SVE2 is available. In the following cases, `expand` has not yet been intrinsified:
>> 1. **Subword types** on SVE2-capable hardware.
>> 2. **All types** on NEON and SVE1 environments.
>> 
>> As a result, `expand` API performance is very poor in these scenarios. This patch intrinsifies the `expand` operation in the above environments.
>> 
>> Since there are no native instructions directly corresponding to `expand` in these cases, this patch mainly leverages the `TBL` instruction to implement `expand`. To compute the index input for `TBL`, the prefix sum algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used. Take a 128-bit byte vector on SVE2 as an example:
>> 
>> To compute: dst = src.expand(mask)
>> Data direction: high <== low
>> Input:
>>   src                         = p o n m l k j i h g f e d c b a
>>   mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>> Expected result:
>>   dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>> 
>> Step 1: calculate the index input of the TBL instruction.
>> 
>> // Set tmp1 as all 0 vector.
>> tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 
>> // Move the mask bits from the predicate register to a vector register.
>> // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
>> tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>> 
>> // Shift the entire register. Prefix sum algorithm.
>> dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
>> tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
>> 
>> dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
>> tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
>> 
>> dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
>> tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
>> 
>> dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
>> tmp2 += dst                   = 8 8 8 7 6 6 6 5 4 4 4 3 2 2 2 1
>> 
>> // Clear inactive elements.
>> dst = sel(mask, tmp2, tmp1)   = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1
>> 
>> // Set the inactive lane value to -1 and set the active lane to the target index.
>> dst -= 1                      = -1 -1 7 6 -1 -1 5 4 -1 -1 3 2 -1 -1 1 0
>> 
>> Step 2: shuffle the source vector elements to the target vector
>> 
>> tbl(dst, src, dst)            = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>> 
>> 
>> The same algorithm is used for NEON and...
>
> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Improve the comment of the vector expand implementation
>  - Merge branch 'master' into JDK-8363989
>  - 8363989: AArch64: Add missing backend support of VectorAPI expand operation
>    
>    Currently, on AArch64, the VectorAPI `expand` operation is intrinsified
>    for 32-bit and 64-bit types only when SVE2 is available. In the following
>    cases, `expand` has not yet been intrinsified:
>    1. **Subword types** on SVE2-capable hardware.
>    2. **All types** on NEON and SVE1 environments.
>    
>    As a result, `expand` API performance is very poor in these scenarios.
>    This patch intrinsifies the `expand` operation in the above environments.
>    
>    Since there are no native instructions directly corresponding to `expand`
>    in these cases, this patch mainly leverages the `TBL` instruction to
>    implement `expand`. To compute the index input for `TBL`, the prefix sum
>    algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used.
>    Take a 128-bit byte vector on SVE2 as an example:
>    ```
>    To compute: dst = src.expand(mask)
>    Data direction: high <== low
>    Input:
>      src                         = p o n m l k j i h g f e d c b a
>      mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>    Expected result:
>      dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>    ```
>    Step 1: calculate the index input of the TBL instruction.
>    ```
>    // Set tmp1 as all 0 vector.
>    tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>    
>    // Move the mask bits from the predicate register to a vector register.
>    // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
>    tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>    
>    // Shift the entire register. Prefix sum algorithm.
>    dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
>    tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
>    
>    dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
>    tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
>    
>    dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
>    tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
>    
>    dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
>    tmp2 += ...

Reviewed internally. So LGTM!
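The prefix-sum index computation from the quoted description can be checked with a lane-by-lane Java simulation. This is a sketch under stated assumptions, not the backend code: lanes are indexed low-to-high (lane 0 holds `a`), the whole-register shifts by 8/16/32/64 bits become shifts by 1/2/4/8 byte lanes, and `tbl` models the TBL rule that out-of-range indices (here -1) yield 0. All names are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical lane-level simulation (not HotSpot code) of the TBL-based expand:
 * lane 0 is the lowest lane; shifting the whole register by 8/16/32/64 bits is
 * modelled as shifting by 1/2/4/8 byte lanes.
 */
public final class ExpandViaTbl {
    // Step 1: build the TBL index vector from the mask.
    static int[] expandIndices(boolean[] mask) {
        int n = mask.length;
        int[] tmp2 = new int[n];
        for (int i = 0; i < n; i++) {
            tmp2[i] = mask[i] ? 1 : 0;       // 1-bit predicate -> one byte lane
        }
        // Inclusive prefix sum: shift the register towards the high lanes
        // (zeros come in at lane 0) and add, doubling the shift each round.
        for (int shift = 1; shift < n; shift <<= 1) {
            int[] shifted = new int[n];
            for (int i = shift; i < n; i++) {
                shifted[i] = tmp2[i - shift];
            }
            for (int i = 0; i < n; i++) {
                tmp2[i] += shifted[i];
            }
        }
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            // sel(mask, tmp2, 0) followed by "-= 1": active lanes get their
            // source index, inactive lanes get -1.
            idx[i] = (mask[i] ? tmp2[i] : 0) - 1;
        }
        return idx;
    }

    // Step 2: TBL lookup; out-of-range indices (such as -1) produce 0.
    static int[] tbl(int[] src, int[] idx) {
        int[] dst = new int[idx.length];
        for (int i = 0; i < idx.length; i++) {
            int j = idx[i];
            dst[i] = (j >= 0 && j < src.length) ? src[j] : 0;
        }
        return dst;
    }

    static int[] expand(int[] src, boolean[] mask) {
        return tbl(src, expandIndices(mask));
    }

    public static void main(String[] args) {
        int n = 16;                           // 128-bit vector of bytes
        int[] src = new int[n];
        boolean[] mask = new boolean[n];
        for (int i = 0; i < n; i++) {
            src[i] = 'a' + i;                 // lane 0 = 'a', ..., lane 15 = 'p'
            mask[i] = (i % 4) < 2;            // high..low: 0 0 1 1 ... 0 0 1 1
        }
        System.out.println(Arrays.toString(expandIndices(mask)));
        int[] dst = expand(src, mask);
        StringBuilder sb = new StringBuilder();
        for (int i = n - 1; i >= 0; i--) {    // print high..low, as in the description
            sb.append(dst[i] == 0 ? '0' : (char) dst[i]).append(' ');
        }
        System.out.println(sb.toString().trim()); // prints 0 0 h g 0 0 f e 0 0 d c 0 0 b a
    }
}
```

Four shift-and-add rounds suffice for 16 lanes because each round doubles the distance over which counts have been accumulated (1, 2, 4, 8 lanes).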

-------------

Marked as reviewed by xgong (Committer).

PR Review: https://git.openjdk.org/jdk/pull/26740#pullrequestreview-3183180082

From haosun at openjdk.org  Thu Sep  4 02:47:45 2025
From: haosun at openjdk.org (Hao Sun)
Date: Thu, 4 Sep 2025 02:47:45 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v5]
In-Reply-To: <sikeN2K9EHmLEEnGsNttJSsPefvEtrKk9q7OOIgqQG0=.238662f7-394c-475b-bd07-effb28302b2c@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <sikeN2K9EHmLEEnGsNttJSsPefvEtrKk9q7OOIgqQG0=.238662f7-394c-475b-bd07-effb28302b2c@github.com>
Message-ID: <bbkmGH8VSH_8Lm6dgt-P8XnlK3ghhsjtl6nBVQpJadU=.5a22c170-1da9-41d2-9091-f55d28489270@github.com>

On Mon, 4 Aug 2025 02:31:08 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> This is a follow-up patch of [1], which aims at implementing the subword gather load APIs for AArch64 SVE platform.
>> 
>> ### Background
>> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices. SVE provides native gather load instructions for `byte`/`short` types using `int` vectors for indices. The vector size for a gather-load instruction is determined by the index vector (i.e. `int` elements). Hence, the total size is `32 * elem_num` bits, where `elem_num` is the number of loaded elements in the vector register.
>> 
>> ### Implementation
>> 
>> #### Challenges
>> Due to size differences between `int` indices (32-bit) and `byte`/`short` data (8/16-bit), operations must be split across multiple vector registers based on the target SVE vector register size constraints.
>> 
>> For a 512-bit SVE machine, loading a `byte` vector with different vector species requires different approaches:
>> - SPECIES_64: Single operation with mask (8 elements, 256-bit)
>> - SPECIES_128: Single operation, full register (16 elements, 512-bit)
>> - SPECIES_256: Two operations + merge (32 elements, 1024-bit)
>> - SPECIES_512/MAX: Four operations + merge (64 elements, 2048-bit)
>> 
>> Use `ByteVector.SPECIES_512` as an example:
>> - It contains 64 elements, so the index vector size is `64 * 32` bits, which is 4 times the SVE vector register size.
>> - It requires 4 vector gather-loads to finish the whole operation.
>> 
>> 
>> byte[] arr = [a, a, a, a, ..., a, b, b, b, b, ..., b, c, c, c, c, ..., c, d, d, d, d, ..., d, ...]
>> int[] idx = [0, 1, 2, 3, ..., 63, ...]
>> 
>> 4 gather-load:
>> idx_v1 = [15 14 13 ... 1 0]    gather_v1 = [... 0000 0000 0000 0000 aaaa aaaa aaaa aaaa]
>> idx_v2 = [31 30 29 ... 17 16]  gather_v2 = [... 0000 0000 0000 0000 bbbb bbbb bbbb bbbb]
>> idx_v3 = [47 46 45 ... 33 32]  gather_v3 = [... 0000 0000 0000 0000 cccc cccc cccc cccc]
>> idx_v4 = [63 62 61 ... 49 48]  gather_v4 = [... 0000 0000 0000 0000 dddd dddd dddd dddd]
>> merge: v = [dddd dddd dddd dddd cccc cccc cccc cccc bbbb bbbb bbbb bbbb aaaa aaaa aaaa aaaa]
>> 
>> 
>> #### Solution
>> The implementation simplifies backend complexity by defining each gather load IR to handle one vector gather-load operation, with multiple IRs generated in the compiler mid-end.
>> 
>> Here are the main changes:
>> - Enhanced IR generation with architecture-specific patterns based on `gather_scatter_needs_vector_index()` matcher.
>> - Added `VectorSliceNode` for result mer...
>
> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
> 
>  - Merge 'jdk:master' into JDK-8351623-sve
>  - Address review comments
>  - Refine IR pattern and clean backend rules
>  - Fix indentation issue and move the helper matcher method to header files
>  - Merge branch jdk:master into JDK-8351623-sve
>  - 8351623: VectorAPI: Add SVE implementation of subword gather load operation

LGTM

-------------

Marked as reviewed by haosun (Committer).

PR Review: https://git.openjdk.org/jdk/pull/26236#pullrequestreview-3183215589

From missa at openjdk.org  Thu Sep  4 05:20:30 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Thu, 4 Sep 2025 05:20:30 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v5]
In-Reply-To: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
Message-ID: <eQC5NKVWyupKfHixG9i4qZBbLmoMdR5By9_SddHV1WM=.e21d5302-f02b-4733-8bc5-5e797fef9ab3@github.com>

> Intel&reg; AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
> 
> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped to values (NaN -> 0, -Infinity, <= Integer.MIN_VALUE -> Integer.MIN_VALUE, +Infinity, >= Integer.MAX_VALUE -> Integer.MAX_VALUE) in the integer range with the help of a few temporary registers to store intermediate results.
> 
> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or  VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
> 
> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
> 9. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java`
> 10. `jtreg:test/hotspot/jtreg/com...

Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:

  Add AVX 10.2 CPU feature flag to list of verified ones

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26919/files
  - new: https://git.openjdk.org/jdk/pull/26919/files/07ac817a..e0c84f69

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=03-04

  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/26919.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26919/head:pull/26919

PR: https://git.openjdk.org/jdk/pull/26919

From jbhateja at openjdk.org  Thu Sep  4 05:44:45 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 4 Sep 2025 05:44:45 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v5]
In-Reply-To: <eQC5NKVWyupKfHixG9i4qZBbLmoMdR5By9_SddHV1WM=.e21d5302-f02b-4733-8bc5-5e797fef9ab3@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <eQC5NKVWyupKfHixG9i4qZBbLmoMdR5By9_SddHV1WM=.e21d5302-f02b-4733-8bc5-5e797fef9ab3@github.com>
Message-ID: <V03-wM4Ds15lpJj6OToGNIWdZVyV-rWCX8WrGo1_mjs=.e277016e-48be-4a5e-9d00-3f8e06a88175@github.com>

On Thu, 4 Sep 2025 05:20:30 GMT, Mohamed Issa <missa at openjdk.org> wrote:

>> Intel&reg; AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
>> 
>> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped to values (NaN -> 0, -Infinity, <= Integer.MIN_VALUE -> Integer.MIN_VALUE, +Infinity, >= Integer.MAX_VALUE -> Integer.MAX_VALUE) in the integer range with the help of a few temporary registers to store intermediate results.
>> 
>> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or  VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
>> 
>> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
>> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
>> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
>> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
>> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
>> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
>> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
>> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
>> 9. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java`
>> 1...
>
> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add AVX 10.2 CPU feature flag to list of verified ones

test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java line 90:

> 88:     @Test
> 89:     @IR(counts = {IRNode.VECTOR_CAST_F2I, IRNode.VECTOR_SIZE_16, "> 0"},
> 90:         applyIfCPUFeatureOr = {"avx512f", "true", "avx10_2", "true"})

You should check for target specific Machine IR which is selected on AVX10_2 targets.

test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java line 108:

> 106:     @Test
> 107:     @IR(counts = {IRNode.VECTOR_CAST_F2L, IRNode.VECTOR_SIZE_8, "> 0"},
> 108:         applyIfCPUFeatureOr = {"avx512dq", "true", "avx10_2", "true"})

avx10_2 is a superset of AVX512DQ; we enable all AVX512 features during VM initialization, and the IR framework relies on the same.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2320889420
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2320891875

From jbhateja at openjdk.org  Thu Sep  4 05:47:25 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 4 Sep 2025 05:47:25 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using knownbits
Message-ID: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>

This patch optimizes PopCount value transforms using KnownBits information.
Following are the results of the micro-benchmark included with the patch

System: INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.30GHz (Emerald Rapids)


Baseline:-
Benchmark                                      Mode  Cnt       Score   Error  Units
PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  151997.051          ops/s
PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  161261.825          ops/s
PopCountValueTransform.StockKernelInt         thrpt    2  194680.419          ops/s
PopCountValueTransform.StockKernelLong        thrpt    2  216580.319          ops/s

Withopt:-
Benchmark                                      Mode  Cnt       Score   Error  Units
PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  216502.647          ops/s
PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  193400.575          ops/s
PopCountValueTransform.StockKernelInt         thrpt    2  195595.989          ops/s
PopCountValueTransform.StockKernelLong        thrpt    2  217776.426          ops/s 


Kindly review and share your feedback.

Best Regards,
Jatin

-------------

Commit messages:
 - 8365205: C2: Optimize popcount value computation using knownbits

Changes: https://git.openjdk.org/jdk/pull/27075/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8365205
  Stats: 137 lines in 3 files changed: 137 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/27075.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27075/head:pull/27075

PR: https://git.openjdk.org/jdk/pull/27075

From hgreule at openjdk.org  Thu Sep  4 06:29:40 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Thu, 4 Sep 2025 06:29:40 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits
In-Reply-To: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
Message-ID: <o6g1ThW4JWbqZyNKK_r51cAcF5yaYx9bBEeST44uT8k=.e6596448-3dbb-4fc8-a061-50fb37f3d843@github.com>

On Wed, 3 Sep 2025 16:10:43 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

> This patch optimizes PopCount value transforms using KnownBits information.
> Following are the results of the micro-benchmark included with the patch
> 
> System: INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.30GHz (Emerald Rapids)
> 
> 
> Baseline:-
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  151997.051          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  161261.825          ops/s
> PopCountValueTransform.StockKernelInt         thrpt    2  194680.419          ops/s
> PopCountValueTransform.StockKernelLong        thrpt    2  216580.319          ops/s
> 
> Withopt:-
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  216502.647          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  193400.575          ops/s
> PopCountValueTransform.StockKernelInt         thrpt    2  195595.989          ops/s
> PopCountValueTransform.StockKernelLong        thrpt    2  217776.426          ops/s 
> 
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin

The change looks good, but I wonder:

- if it makes sense to have some kind of IR tests (e.g., that it's folded away when unneeded, when the input is a constant, ...)?
- whether the explanation could be simplified. Assuming a correct implementation of the KnownBits canonicalization, we can argue:
	- `_zeroes` has the bits set that are known to be always 0. So `BitsPer<Type> - popCount(_zeroes)` gives you an upper limit of how many bits *might* be 1, and it is equivalent to `popCount(~_zeroes)`.
	- `_ones` has the bits set that are known to be always 1. Trivially, `popCount(_ones)` is a valid lower bound.
	- The rest repeats how `adjust_bits_from_unsigned_bounds` works, but that's not specific to the popcount nodes.
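The two bounds argued above can be sketched in Java. This is a hypothetical sketch: `zeros`/`ones` stand in for the KnownBits `_zeroes`/`_ones` fields, and the `adjust_bits_from_unsigned_bounds` step is not modelled.

```java
/**
 * Hypothetical sketch of popcount bounds derived from a KnownBits-style pair:
 * zeros = bits known to be 0, ones = bits known to be 1 (names are illustrative).
 */
public final class PopCountBounds {
    // Every bit known to be 1 contributes to the count: a valid lower bound.
    static int lower(int zeros, int ones) {
        return Integer.bitCount(ones);
    }

    // Only bits not known to be 0 can be 1:
    // Integer.SIZE - Integer.bitCount(zeros) == Integer.bitCount(~zeros).
    static int upper(int zeros, int ones) {
        return Integer.bitCount(~zeros);
    }

    // Long variants, as would apply to a 64-bit popcount.
    static int lowerL(long zeros, long ones) { return Long.bitCount(ones); }
    static int upperL(long zeros, long ones) { return Long.bitCount(~zeros); }

    public static void main(String[] args) {
        // Example: x = (y & 0xFF) | 0x3 -> upper 24 bits known 0, low 2 bits known 1.
        int zeros = 0xFFFFFF00;
        int ones = 0x00000003;
        System.out.println("popcount in [" + lower(zeros, ones) + ", " + upper(zeros, ones) + "]");
        // prints popcount in [2, 8]
    }
}
```

When the input is a constant `c`, `zeros == ~c` and `ones == c`, so both bounds collapse to `bitCount(c)` and the node's type becomes a constant.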

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27075#issuecomment-3252114288

From dskantz at openjdk.org  Thu Sep  4 06:34:44 2025
From: dskantz at openjdk.org (Daniel Skantz)
Date: Thu, 4 Sep 2025 06:34:44 GMT
Subject: RFR: 8362394: C2: Repeated stacked string concatenation fails with
 "Hit MemLimit" and other resourcing errors [v4]
In-Reply-To: <ImE4GMvRS0mguhEym1s84tDliD6VdBzqsLi_7LVkiiE=.2c7a9e8a-16a8-4b68-a67b-12e3be3317cc@github.com>
References: <oE4pDFEgcIH13lUcCbdn20KwW63_9RRpaZCsmNPZzWQ=.832b9063-9bdc-413a-9741-b7d6bb629e8a@github.com>
 <ImE4GMvRS0mguhEym1s84tDliD6VdBzqsLi_7LVkiiE=.2c7a9e8a-16a8-4b68-a67b-12e3be3317cc@github.com>
Message-ID: <NuaFgBVeX_zFeneArsAnzRp4Hkqn4lP7nT8tzgCICBc=.3a612336-f669-4552-b6a3-5083e03e0135@github.com>

On Thu, 21 Aug 2025 07:41:32 GMT, Daniel Skantz <dskantz at openjdk.org> wrote:

>> This PR addresses a bug in the stringopts phase. During string concatenation, repeated stacking of concatenations can lead to excessive compilation resource use and generation of questionable code as the merging of two StringBuilder-append-toString links sc1 and sc2 can result in a new StringBuilder with the size sc1->num_arguments() * sc2->num_arguments().
>> 
>> In the attached test, the size of the successively merged StringBuilder doubles on each merge -- there are 24 of them -- as the toString result of the first component is used twice in the second component [1], etc. Not only does the compiler hang on this test case, but the string concat optimization seems to give an arbitrary number of back-to-back stores in the generated code depending on the number of stacked concatenations.
>> 
>> The proposed solution is to put an upper bound on the size of a merged concatenation, which guards against this case of repeated concatenations on the same string variable, and potentially other edge cases. 100 seems like a generous limit, and higher limits could be insufficient as each argument corresponds to about 20 new nodes later in replace_string_concat [2].
>> 
>> [1] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L303
>> 
>> [2] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L1806
>> 
>> Testing: T1-4.
>> 
>> Extra testing: verified that no method in T1-4 is being compiled with a merged concat candidate exceeding the suggested limit of 100 arguments, regardless of whether or not the later checks verify_control_flow() and verify_mem_flow() pass.
>
> Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision:
> 
>   compare order

A comment to keep PR active.
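The doubling described in the PR description above can be illustrated with a small model. This is a hypothetical sketch, not the stringopts.cpp implementation: it assumes every stacked concatenation has the form `s = s + s`, so each merge substitutes the inner chain for two arguments of the outer one, and it applies the proposed limit of 100 as a strict `>` comparison (the exact comparison used by the patch is an assumption here).

```cpp
#include <cassert>
#include <cstdint>

// Limit proposed in the PR for the argument count of a merged concatenation.
constexpr std::int64_t kMergedArgLimit = 100;

// Hypothetical model of one merge: the inner chain's `inner_args` arguments
// replace each of the `uses` arguments of the outer chain that refer to the
// inner chain's toString() result.
std::int64_t merged_arg_count(std::int64_t inner_args, std::int64_t outer_args,
                              std::int64_t uses) {
  return (outer_args - uses) + uses * inner_args;
}

struct MergeResult {
  std::int64_t args;  // argument count of the final merged chain
  int merges_done;    // number of merges actually performed
};

// Simulates `stacked` concatenations of the form `s = s + s`, starting from a
// two-argument chain. With `apply_limit`, merging stops before any merge whose
// result would exceed kMergedArgLimit.
MergeResult stack_merges(int stacked, bool apply_limit) {
  std::int64_t args = 2;
  int done = 0;
  for (int i = 0; i < stacked; ++i) {
    const std::int64_t next = merged_arg_count(args, 2, 2);
    if (apply_limit && next > kMergedArgLimit) {
      break;  // keep the chains separate instead of merging further
    }
    args = next;
    ++done;
  }
  return {args, done};
}
```

With 24 stacked concatenations the unbounded model reaches 2^25 arguments, while the bounded one stops after 5 merges at 64 arguments.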

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26685#issuecomment-3252128277

From epeter at openjdk.org  Thu Sep  4 06:56:52 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 4 Sep 2025 06:56:52 GMT
Subject: RFR: 8366490: C2 SuperWord: wrong result because CastP2X is
 missing ctrl and floats over SafePoint creating stale oops [v3]
In-Reply-To: <V9vvzPujZaVKTBdg3hd1KUH5YbI-Sfp1UsES7jWCDJM=.65f5033e-619a-4d49-a021-df55094f4dca@github.com>
References: <7guNwHJ6tuJXGG-X9aACAWAHjsneD4uryM-ZazES_Uc=.fe831ae6-c8a1-446d-b63e-5b7a1a1f8704@github.com>
 <zaZmh64g0pWJgG_qND5A-EeRQc5wO8yAiiiYQtgsnMQ=.f0076ad4-8511-4084-9ef1-220d04eb2c15@github.com>
 <V9vvzPujZaVKTBdg3hd1KUH5YbI-Sfp1UsES7jWCDJM=.65f5033e-619a-4d49-a021-df55094f4dca@github.com>
Message-ID: <agw479-Q2rEXI4cywY3Ln3zOZ0VCRcaGfx0uHHYKkos=.7d644f93-f23a-4a04-8d98-0197037eaf30@github.com>

On Tue, 2 Sep 2025 13:19:40 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Apply suggestions from code review
>>   
>>   Co-authored-by: Manuel Hässig <manuel at haessig.org>
>>   Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>
> Marked as reviewed by chagedorn (Reviewer).

@chhagedorn @TobiHartmann @mhaessig Thanks for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27045#issuecomment-3252182731

From epeter at openjdk.org  Thu Sep  4 06:56:53 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 4 Sep 2025 06:56:53 GMT
Subject: Integrated: 8366490: C2 SuperWord: wrong result because CastP2X is
 missing ctrl and floats over SafePoint creating stale oops
In-Reply-To: <7guNwHJ6tuJXGG-X9aACAWAHjsneD4uryM-ZazES_Uc=.fe831ae6-c8a1-446d-b63e-5b7a1a1f8704@github.com>
References: <7guNwHJ6tuJXGG-X9aACAWAHjsneD4uryM-ZazES_Uc=.fe831ae6-c8a1-446d-b63e-5b7a1a1f8704@github.com>
Message-ID: <6K1D9UzhzSh8gyGh3FefsMHXkABL_nKWlJkHkopRahE=.2ca357ec-0e05-4f22-bb3a-08e8e8b630ba@github.com>

On Tue, 2 Sep 2025 10:45:33 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> **Analysis**
> 
> A `CastP2X` without ctrl can float. If it floats over a `SafePoint` (or call), we may GC and move the oop. But the `CastP2X` value does not end up on the oop-map, and so the pointer is stale (old).
> 
> With `StressGCM`, the aliasing runtime check has one `CastP2X` that floats over the SafePoint, and another that stays after the SafePoint. Both read the oop of the same array, so instead of getting the same address, we now get the old and the new oop. And so the aliasing runtime check passes (thinks there is no aliasing), even though there is aliasing. We end up vectorizing, which reorders the loads/stores and would only be safe if there is no aliasing.
> 
> **Fix:** add control to the `CastP2X` so that it cannot float too far.
> 
> **Details**
> 
> 
> rbp = Allocate array
> spill <- rbp + 0x20
> 
> call to allocateArrays
> -> allocates a lot, and triggers GC. That moves the allocated array behind rbp
> -> rbp is oop-mapped, so it is updated automatically to the new oop
> -> spill value remains based on the old oop
> 
> We now compute the aliasing runtime check:
> -> one side of the comparison is computed from rbp (new oop)
> -> the other side is computed from the spill value (old oop)
> -> the cmp returns a nonsensical value, and we take the wrong branch
> -> vectorize even though we have aliasing!

This pull request has now been integrated.

Changeset: 2527e9e5
Author:    Emanuel Peter <epeter at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/2527e9e58d770c50e6d807bf1483c6bb07dd3de7
Stats:     152 lines in 5 files changed: 139 ins; 1 del; 12 mod

8366490: C2 SuperWord: wrong result because CastP2X is missing ctrl and floats over SafePoint creating stale oops

Reviewed-by: thartmann, chagedorn, mhaessig
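The stale-oop scenario from the analysis above can be reduced to a few lines. The sketch below is a simplified, hypothetical model of an interval-overlap aliasing check, not the code C2 generates, and the addresses used with it are made up. It shows why mixing a pre-GC base and a post-GC base for the same array can make overlapping accesses look disjoint.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of a runtime aliasing check: byte ranges [a_lo, a_hi) and
// [b_lo, b_hi) cannot alias if one ends before the other begins.
bool ranges_disjoint(std::uintptr_t a_lo, std::uintptr_t a_hi,
                     std::uintptr_t b_lo, std::uintptr_t b_hi) {
  return a_hi <= b_lo || b_hi <= a_lo;
}

// A load and a store into the *same* array. Each side derives its address from
// its own copy of the array base; in the failing case one copy was computed
// (and spilled) before the GC moved the array, the other one after.
bool check_says_no_alias(std::uintptr_t load_base, std::uintptr_t store_base,
                         std::uintptr_t load_off, std::uintptr_t store_off,
                         std::uintptr_t len_bytes) {
  const std::uintptr_t load_lo = load_base + load_off;
  const std::uintptr_t store_lo = store_base + store_off;
  return ranges_disjoint(load_lo, load_lo + len_bytes,
                         store_lo, store_lo + len_bytes);
}
```

With both sides based on the current address, a load at offset 0 and a store at offset 4 over 64 bytes overlap, so the check correctly refuses to vectorize. If the load side still uses the old address, the ranges appear disjoint and the check passes. Giving the `CastP2X` a control input, as the fix does, is meant to prevent exactly this mismatch by keeping it from floating above the SafePoint.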

-------------

PR: https://git.openjdk.org/jdk/pull/27045

From rcastanedalo at openjdk.org  Thu Sep  4 07:47:41 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 4 Sep 2025 07:47:41 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT.
In-Reply-To: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
Message-ID: <1uDOe3Oe-hihmDHea2h8vcvRZsKKBeNp0J9lKYUujxk=.abd111bc-3625-4c71-bfa2-0a4c1f4d3875@github.com>

On Wed, 3 Sep 2025 00:53:59 GMT, Cesar Soares Lucas <cslucas at openjdk.org> wrote:

> Please, review this patch to fix issue that may occur when reducing allocation merge.
> 
> As the assert message describes, the problem is that a `Phi` considered reducible during one invocation of `adjust_scalar_replaceable_state` later turns out to be non-reducible. This situation can happen if a subsequent invocation of the same method causes all inputs to the phi to be NSR; therefore there is no point in reducing the Phi. It can also happen during the propagation of NSR state done by `find_scalar_replaceable_allocs`.
> 
> The change in `revisit_reducible_phi_status` is just a clean-up.
> The real fix is in `find_scalar_replaceable_allocs`.
> 
> Tested on Linux x64/Aarch64 release/fastdebug with JTREG tier1-3.

Hi Cesar, thanks for addressing this issue. I will run some more comprehensive testing and have a look at it in the next days.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27063#issuecomment-3252350744

From duke at openjdk.org  Thu Sep  4 08:04:46 2025
From: duke at openjdk.org (erifan)
Date: Thu, 4 Sep 2025 08:04:46 GMT
Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI
 expand operation [v2]
In-Reply-To: <kOz6uh-zcSXfhMf5UioaIXmQ3vbU18UMgncgOslfyv0=.c6d6ef2c-8b9e-48c9-b78d-df96e07d7832@github.com>
References: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
 <_YDJIkwt0sdsOAMfNNn1fHTVwH0SHDpJv5NpQoxnfiA=.a0ddb5f3-00f1-47e2-93da-f47cb3f62288@github.com>
 <kOz6uh-zcSXfhMf5UioaIXmQ3vbU18UMgncgOslfyv0=.c6d6ef2c-8b9e-48c9-b78d-df96e07d7832@github.com>
Message-ID: <_VZ4L0DTdTxRz1XzG4QIyYY7TyCHzroEOeOV21N17_Y=.e92ad3fd-3e94-4bf5-a570-dc8cc8c9e9ed@github.com>

On Wed, 3 Sep 2025 12:49:32 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>> 
>>  - Improve the comment of the vector expand implementation
>>  - Merge branch 'master' into JDK-8363989
>>  - 8363989: AArch64: Add missing backend support of VectorAPI expand operation
>>    
>>    Currently, on AArch64, the VectorAPI `expand` operation is intrinsified
>>    for 32-bit and 64-bit types only when SVE2 is available. In the following
>>    cases, `expand` has not yet been intrinsified:
>>    1. **Subword types** on SVE2-capable hardware.
>>    2. **All types** on NEON and SVE1 environments.
>>    
>>    As a result, `expand` API performance is very poor in these scenarios.
>>    This patch intrinsifies the `expand` operation in the above environments.
>>    
>>    Since there are no native instructions directly corresponding to `expand`
>>    in these cases, this patch mainly leverages the `TBL` instruction to
>>    implement `expand`. To compute the index input for `TBL`, the prefix sum
>>    algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used.
>>    Take a 128-bit byte vector on SVE2 as an example:
>>    ```
>>    To compute: dst = src.expand(mask)
>>    Data direction: high <== low
>>    Input:
>>      src                         = p o n m l k j i h g f e d c b a
>>      mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>>    Expected result:
>>      dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>>    ```
>>    Step 1: calculate the index input of the TBL instruction.
>>    ```
>>    // Set tmp1 as all 0 vector.
>>    tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>    
>>    // Move the mask bits from the predicate register to a vector register.
>>    // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
>>    tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>>    
>>    // Shift the entire register. Prefix sum algorithm.
>>    dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
>>    tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
>>    
>>    dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
>>    tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
>>    
>>    dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
>>    tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 ...
>
> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2819:
> 
>> 2817:   subv(dst, size, tmp2, tmp1);
>> 2818:   // dst = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1
>> 2819:   tbl(dst, size, src, 1, dst);
> 
> It would make it a little easier to read the example if the numbers were aligned.
> Now the minus sign disrupts that a little. Maybe leave 2 spaces if the number is positive?

Makes sense, I'll update it in the following commit.

> test/hotspot/jtreg/compiler/vectorapi/VectorExpandTest.java line 48:
> 
>> 46:     static final VectorSpecies<Float> F_SPECIES = FloatVector.SPECIES_MAX;
>> 47:     static final VectorSpecies<Long> L_SPECIES = LongVector.SPECIES_MAX;
>> 48:     static final VectorSpecies<Double> D_SPECIES = DoubleVector.SPECIES_MAX;
> 
> Would it make sense to run these tests with various vector sizes?
> Because it seems your algorithm depends on `vector_length_in_bytes` in the prefix sum algo.

Since we already have correctness tests for `expand` on **all vector types** under `test/jdk/jdk/incubator/vector/`, such as https://github.com/openjdk/jdk/blob/986ecff5f9b16f1b41ff15ad94774d65f3a4631d/test/jdk/jdk/incubator/vector/Byte128VectorTests.java#L5375, this test primarily verifies that the expected IR is generated. So, I think this is sufficient?

I've tested this PR locally on a 128-bit SVE2 machine, a 256-bit SVE machine, and a 512-bit QEMU environment, and all tests passed.
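The prefix-sum approach from the commit message above can be modeled in scalar code. This is a hypothetical sketch of the idea, not the register-level sequence in the patch: lanes are plain array elements (lane 0 lowest), the scan uses log2(N) "shift by k lanes and add" steps like the `<< 8`, `<< 16`, ... steps in the example, and the lookup returns 0 for out-of-range indices as TBL does. It also makes visible that the number of scan steps depends on the vector length.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of expand via prefix sum + table lookup.
template <std::size_t N>
std::array<std::uint8_t, N> expand_model(const std::array<std::uint8_t, N>& src,
                                         const std::array<std::uint8_t, N>& mask) {
  static_assert(N > 0 && (N & (N - 1)) == 0, "lane count must be a power of two");
  // (1) Inclusive prefix sum of the 0/1 mask lanes (Hillis-Steele scan).
  std::array<std::uint8_t, N> sum = mask;
  for (std::size_t k = 1; k < N; k <<= 1) {
    std::array<std::uint8_t, N> shifted{};  // whole-register shift by k lanes
    for (std::size_t i = k; i < N; ++i) shifted[i] = sum[i - k];
    for (std::size_t i = 0; i < N; ++i) sum[i] += shifted[i];
  }
  // (2) Active lane i takes source element sum[i] - 1; inactive lanes get an
  //     out-of-range index. (3) TBL-style lookup: out of range yields 0.
  std::array<std::uint8_t, N> dst{};
  for (std::size_t i = 0; i < N; ++i) {
    const std::size_t idx = mask[i] ? sum[i] - 1 : N;
    dst[i] = idx < N ? src[idx] : std::uint8_t{0};
  }
  return dst;
}
```

With the 16-lane byte example from the commit message (mask pattern `0 0 1 1` repeated), lanes 0, 1, 4 and 5 receive `a`, `b`, `c` and `d`, matching the expected `dst`.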

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26740#discussion_r2321198368
PR Review Comment: https://git.openjdk.org/jdk/pull/26740#discussion_r2321194040

From duke at openjdk.org  Thu Sep  4 08:13:42 2025
From: duke at openjdk.org (erifan)
Date: Thu, 4 Sep 2025 08:13:42 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats
In-Reply-To: <bWiaf5dT3JLWsXpijrpnjMCTToX-daYvE5rXh0VafsI=.695b1b57-2540-4e0d-ab23-3b0ec418bb2d@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <KE8_jx6V7-oTcCoQKwZ0Gyzdr2qPbWETxrneZvPAxHs=.bd39c34f-972e-4ecc-bc16-eca3fcfc175d@github.com>
 <2R6O7Jhv3catwxc6rXJdh7Uiq-NFBp7beCmP49CLTqU=.7ba72e39-6efd-47fe-8ad9-6df54a45c99b@github.com>
 <-G8GwIflOhFjOL-PAG6_oylu0Fa9c8iNUB57EC6oo4s=.a0126087-2a97-4542-a555-27c12578fccf@github.com>
 <bWiaf5dT3JLWsXpijrpnjMCTToX-daYvE5rXh0VafsI=.695b1b57-2540-4e0d-ab23-3b0ec418bb2d@github.com>
Message-ID: <xwxGY16S_Ofw2mF6-QQe4DA9w4TTKwJMVX68z3vm3xU=.b6f04325-8a90-4c31-9628-9bf2ae4ef858@github.com>

On Wed, 3 Sep 2025 07:19:06 GMT, erifan <duke at openjdk.org> wrote:

>>> 1. sve `cpy` and `fcpy` are actually two different instructions, and distinguishing them might be clearer.
>> 
>> That's a fair point, but the AArch64 name for all four instructions is CPY, and they are distinguished by their operands. Deviation from the names in the Reference Manual is occasionally necessary, but it makes life painful for maintainers when they have to search for what we've called an instruction they want to use.
>>  
>>>     2. sve `cpy` 's imm8 is an **int** , while `fcpy` 's imm8 is an **fp8** .
>> 
>> Yes, that's right.
>> 
>>> While some encoding code can be reused, separating the encodings makes the code clearer.
>> 
>> I don't agree that it makes the code clearer. In fact, tight factoring emphasizes the fact that these instructions are similar, and explicitly shows where they are different.
>> 
>> It is true that I have a strong bias against copy-and-paste programming.
>> 
>>> I think both implementations are fine. If you think it's better to not refactor, I'll revert.
>> 
>> I do. Thank you.
>
>> I do. Thank you.
> 
> Ok, I have reverted the refactoring. Please help take another look, thanks~

> @erifan I'm running some internal testing - though we don't have SVE machines so you are responsible to make sure it is adequately tested for that ;)

Yeah, I have tested the PR on a 128-bit sve2 machine, 512-bit and 256-bit qemu environments. All tests passed.

A test timed out on macOS, which I believe is unrelated to the PR. I retriggered the test to see what was happening.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26951#issuecomment-3252432122

From duke at openjdk.org  Thu Sep  4 08:16:46 2025
From: duke at openjdk.org (erifan)
Date: Thu, 4 Sep 2025 08:16:46 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v5]
In-Reply-To: <sikeN2K9EHmLEEnGsNttJSsPefvEtrKk9q7OOIgqQG0=.238662f7-394c-475b-bd07-effb28302b2c@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <sikeN2K9EHmLEEnGsNttJSsPefvEtrKk9q7OOIgqQG0=.238662f7-394c-475b-bd07-effb28302b2c@github.com>
Message-ID: <ntGcrez0iCOa8r-Df8hX8Eku8lKmDw34osFdl715xks=.cf770227-394b-4f7b-86c3-340b89101a30@github.com>

On Mon, 4 Aug 2025 02:31:08 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> This is a follow-up patch of [1], which aims at implementing the subword gather load APIs for AArch64 SVE platform.
>> 
>> ### Background
>> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices. SVE provides native gather load instructions for `byte`/`short` types using `int` vectors for indices. The vector size for a gather-load instruction is determined by the index vector (i.e. `int` elements). Hence, the total size is `32 * elem_num` bits, where `elem_num` is the number of loaded elements in the vector register.
>> 
>> ### Implementation
>> 
>> #### Challenges
>> Due to size differences between `int` indices (32-bit) and `byte`/`short` data (8/16-bit), operations must be split across multiple vector registers based on the target SVE vector register size constraints.
>> 
>> For a 512-bit SVE machine, loading a `byte` vector with different vector species requires different approaches:
>> - SPECIES_64: Single operation with mask (8 elements, 256-bit)
>> - SPECIES_128: Single operation, full register (16 elements, 512-bit)
>> - SPECIES_256: Two operations + merge (32 elements, 1024-bit)
>> - SPECIES_512/MAX: Four operations + merge (64 elements, 2048-bit)
>> 
>> Use `ByteVector.SPECIES_512` as an example:
>> - It contains 64 elements. So the index vector size should be `64 * 32` bits, which is 4 times the SVE vector register size.
>> - It requires 4 vector gather-loads to finish the whole operation.
>> 
>> 
>> byte[] arr = [a, a, a, a, ..., a, b, b, b, b, ..., b, c, c, c, c, ..., c, d, d, d, d, ..., d, ...]
>> int[] idx = [0, 1, 2, 3, ..., 63, ...]
>> 
>> 4 gather-load:
>> idx_v1 = [15 14 13 ... 1 0]    gather_v1 = [... 0000 0000 0000 0000 aaaa aaaa aaaa aaaa]
>> idx_v2 = [31 30 29 ... 17 16]  gather_v2 = [... 0000 0000 0000 0000 bbbb bbbb bbbb bbbb]
>> idx_v3 = [47 46 45 ... 33 32]  gather_v3 = [... 0000 0000 0000 0000 cccc cccc cccc cccc]
>> idx_v4 = [63 62 61 ... 49 48]  gather_v4 = [... 0000 0000 0000 0000 dddd dddd dddd dddd]
>> merge: v = [dddd dddd dddd dddd cccc cccc cccc cccc bbbb bbbb bbbb bbbb aaaa aaaa aaaa aaaa]
>> 
>> 
>> #### Solution
>> The implementation simplifies backend complexity by defining each gather load IR to handle one vector gather-load operation, with multiple IRs generated in the compiler mid-end.
>> 
>> Here are the main changes:
>> - Enhanced IR generation with architecture-specific patterns based on `gather_scatter_needs_vector_index()` matcher.
>> - Added `VectorSliceNode` for result mer...
>
> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
> 
>  - Merge 'jdk:master' into JDK-8351623-sve
>  - Address review comments
>  - Refine IR pattern and clean backend rules
>  - Fix indentation issue and move the helper matcher method to header files
>  - Merge branch jdk:master into JDK-8351623-sve
>  - 8351623: VectorAPI: Add SVE implementation of subword gather load operation

LGTM
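The instruction counts listed in the quoted description follow from each loaded element needing a 32-bit index lane. The sketch below is a hypothetical scalar model, not the mid-end IR transformation: it computes the number of gather-loads for a given SVE register size and simulates splitting the index vector into register-sized chunks whose results are concatenated in lane order, which is the effect the merge step has to achieve.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Gather-loads needed when each element uses a 32-bit index lane:
// total index bits divided by the register size, rounded up.
std::size_t gather_ops_needed(std::size_t elem_num, std::size_t sve_bits) {
  const std::size_t index_bits = 32 * elem_num;
  return (index_bits + sve_bits - 1) / sve_bits;
}

// Scalar "split, gather, merge": each simulated gather-load consumes at most
// sve_bits / 32 indices; partial results are appended in lane order.
std::vector<std::int8_t> gather_bytes(const std::vector<std::int8_t>& arr,
                                      const std::vector<int>& idx,
                                      std::size_t sve_bits) {
  const std::size_t per_op = sve_bits / 32;
  std::vector<std::int8_t> merged;
  merged.reserve(idx.size());
  for (std::size_t start = 0; start < idx.size(); start += per_op) {
    std::vector<std::int8_t> part;  // result of one gather-load
    for (std::size_t i = start; i < idx.size() && i < start + per_op; ++i) {
      part.push_back(arr.at(static_cast<std::size_t>(idx[i])));
    }
    merged.insert(merged.end(), part.begin(), part.end());  // merge step
  }
  return merged;
}
```

On a 512-bit machine this gives 1, 1, 2 and 4 gather-loads for 8, 16, 32 and 64 byte elements, matching the SPECIES_64 to SPECIES_512 list.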

-------------

Marked as reviewed by erifan at github.com (no known OpenJDK username).

PR Review: https://git.openjdk.org/jdk/pull/26236#pullrequestreview-3183982402

From mchevalier at openjdk.org  Thu Sep  4 08:49:44 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 4 Sep 2025 08:49:44 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v3]
In-Reply-To: <eRSPXdu_TchSS8aABPgp0JEs_fhUyvvbhBnjGZ6_fEU=.0af530a9-66ab-4b4b-8557-ce73687edf24@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <fXrcwiU_LDyWFyfzZf0SOel1mb_kne4OfyZs_bLde70=.0830080a-7f52-44e5-ae6b-a74d50185b36@github.com>
 <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
 <eRSPXdu_TchSS8aABPgp0JEs_fhUyvvbhBnjGZ6_fEU=.0af530a9-66ab-4b4b-8557-ce73687edf24@github.com>
Message-ID: <Yt3sM3X1mG9lZWiuiXjBcSiHmezLbfVtaRZmOT4LMpk=.fcc107e7-a005-4132-a0ab-c24fd030a69d@github.com>

On Mon, 25 Aug 2025 13:44:27 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/phaseX.hpp line 615:
>> 
>>> 613:   Node* _verify_window[_verify_window_size];
>>> 614:   void verify_step(Node* n);
>>> 615:   GraphInvariantChecker* _invariant_checker;
>> 
>> Why do you allocate it separately, and not have it in-place?
>
> Is there only a single PhaseIterGVN per compilation? I forgot. An alternative would be to allocate it at the level of the compilation.

> Why do you allocate it separately, and not have it in-place?

So that I can forward-declare `GraphInvariantChecker` and avoid leaking a non-trivial header everywhere through a widely included header.

> Is there only a single PhaseIterGVN per compilation? I forgot. An alternative would be to allocate it at the level of the compilation.

Not quite, indeed.
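The forward-declaration argument can be illustrated in a single file. In the sketch below, `IterGVNLike` is a hypothetical stand-in for `PhaseIterGVN` and the checker body is a placeholder; plain `new`/`delete` is used for brevity, whereas HotSpot code would normally use its own allocation mechanisms. The point is that a pointer member only needs an incomplete type, so the widely included header can get by with `class GraphInvariantChecker;`.

```cpp
#include <cassert>

// --- What the widely included header would contain ---
class GraphInvariantChecker;  // forward declaration only, no extra #include

class IterGVNLike {  // hypothetical stand-in for PhaseIterGVN
 public:
  IterGVNLike();
  ~IterGVNLike();
  IterGVNLike(const IterGVNLike&) = delete;
  IterGVNLike& operator=(const IterGVNLike&) = delete;
  bool verify() const;

 private:
  // A pointer member compiles with an incomplete type; an in-place member
  // would need the full class definition here.
  GraphInvariantChecker* _invariant_checker;
};

// --- What a single .cpp file would contain ---
class GraphInvariantChecker {
 public:
  bool run() const { return true; }  // placeholder for the real checks
};

IterGVNLike::IterGVNLike() : _invariant_checker(new GraphInvariantChecker()) {}
IterGVNLike::~IterGVNLike() { delete _invariant_checker; }
bool IterGVNLike::verify() const { return _invariant_checker->run(); }
```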

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2321310336

From mchevalier at openjdk.org  Thu Sep  4 08:53:43 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 4 Sep 2025 08:53:43 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v3]
In-Reply-To: <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <fXrcwiU_LDyWFyfzZf0SOel1mb_kne4OfyZs_bLde70=.0830080a-7f52-44e5-ae6b-a74d50185b36@github.com>
 <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
Message-ID: <RemdjEfPCVrFh3MEdDolKYCgSdmF2GrEvFc_XQRiXlM=.099b1b59-8499-46d5-9220-54ae5e59657d@github.com>

On Mon, 25 Aug 2025 13:46:55 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Benoît's comments
>
> src/hotspot/share/opto/graphInvariants.cpp line 32:
> 
>> 30: 
>> 31: void LocalGraphInvariant::LazyReachableCFGNodes::fill() {
>> 32:   precond(live_nodes.size() == 0);
> 
> Maybe I missed something here: where do the `precond` and `postcond` come from?

They are in `debug.hpp`, just next to `assert`. They are "standard", but not very widely used. I think they are good as they clearly state what is a precondition or a postcondition. There is no message (or rather a default one), but that is better (or at least not worse) than an uninspired one like "fail", which one often finds.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2321320549

From mchevalier at openjdk.org  Thu Sep  4 09:04:43 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 4 Sep 2025 09:04:43 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v3]
In-Reply-To: <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <fXrcwiU_LDyWFyfzZf0SOel1mb_kne4OfyZs_bLde70=.0830080a-7f52-44e5-ae6b-a74d50185b36@github.com>
 <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
Message-ID: <k1jTUITJhTVZs4Yjn9rK2aSP2vyEsvnPX6Ag4_sVwiE=.43e142cf-3066-48ba-ba3e-363c9f6a4420@github.com>

On Mon, 25 Aug 2025 13:50:24 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Benoît's comments
>
> src/hotspot/share/opto/graphInvariants.hpp line 37:
> 
>> 35:   static constexpr int OutputStep = -1;
>> 36: 
>> 37:   struct LazyReachableCFGNodes {
> 
> You could add a comment here. What I was surprised by: that you do a whole graph traversal the first time we call `is_node_dead`. I thought you would just visit a subgraph every time, and fill out the `live_nodes` gradually.
> 
> You could also give an explanation why it needs to be lazy. Is it possible that we never call `is_node_dead`?

I don't think it makes much sense to visit a subgraph: I want a proof that a node is dead. I could climb from the node, trying to reach the root, and traverse a lot of things that I can't yet say are dead or alive. Once I have saturated or reached the root, I can decide for those, but the logic seems trickier: I would have 3 states per node: dead, alive, not decided yet. I don't think it's worth the complexity.

Also, let's not exaggerate: it's a traversal only of the control sub-graph, not the whole graph. It is much smaller.

I think I give an explanation in the comment of `LocalGraphInvariant::check`:
> The parameter [live_nodes] is used to share the lazily computed set of CFG nodes reachable from root. This is because some checks don't apply to dead code, suppress their error if a violation is detected in dead code.

So we would call `is_node_dead` only if there is a violation of a check that we should suppress in dead code. If there is no violation of such a check (no violation at all, or only of checks that we still want to see from dead code), we won't call `is_node_dead`. I'll improve the comment and point to there.
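The lazy scheme described above can be sketched as follows. The class is hypothetical and independent of C2's node and set types: the control graph is given as adjacency lists, the direction of the edges followed from the root is an assumption, and the whole traversal runs at most once, and only when some check actually asks whether a node is dead.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

class LazyReachable {
 public:
  LazyReachable(const std::vector<std::vector<int>>& edges, int root)
      : _edges(edges), _root(root) {}

  // Triggers the single traversal on first use.
  bool is_node_dead(int n) {
    if (!_filled) fill();
    return _live.count(n) == 0;
  }
  int fill_count() const { return _fill_count; }

 private:
  void fill() {
    // Worklist traversal from the root over control edges only.
    std::vector<int> worklist{_root};
    _live.insert(_root);
    while (!worklist.empty()) {
      const int n = worklist.back();
      worklist.pop_back();
      for (int s : _edges[static_cast<std::size_t>(n)]) {
        if (_live.insert(s).second) worklist.push_back(s);
      }
    }
    _filled = true;
    ++_fill_count;
  }

  const std::vector<std::vector<int>>& _edges;
  int _root;
  bool _filled = false;
  int _fill_count = 0;
  std::unordered_set<int> _live;
};
```

If no suppressible violation is ever found, `fill()` never runs, which matches the reasoning for making the set lazy.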

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2321348168

From mchevalier at openjdk.org  Thu Sep  4 09:19:43 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 4 Sep 2025 09:19:43 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v3]
In-Reply-To: <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <fXrcwiU_LDyWFyfzZf0SOel1mb_kne4OfyZs_bLde70=.0830080a-7f52-44e5-ae6b-a74d50185b36@github.com>
 <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
Message-ID: <OW3WWkCleTJthk05JQXizAgHSMb8d8gWYHLS0iCNgY0=.1d936714-83a6-45c2-8957-38f3b346f75d@github.com>

On Mon, 25 Aug 2025 14:03:58 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Benoît's comments
>
> src/hotspot/share/opto/graphInvariants.cpp line 207:
> 
>> 205:   }
>> 206:   bool (Node::*_type_check)() const;
>> 207: };
> 
> You could probably generalize this with a callback approach. And then one concrete implementation is the one that does the type check. Just an idea.

Seems overengineered to me. The callback version would be about as long as this, and so would the code of the user that must provide the callback. It makes the logic unnecessarily complicated to me. Of course, everything boils down to a function that takes a node and performs a specific check, but then this generalized version does nothing significant except call the callback. The concrete implementation would just have all the same logic, but in a callback passed to another method instead of in a first-class method...

If I don't have an adapter class that only checks the type, but instead leave that to instantiation time, the code would look like

NodeCallback([](const Node* n) { return n->is_Region(); })

instead of

NodeClass(&Node::is_Region)

which is unreadable. That's the point of patterns: they make the shape easy to understand; otherwise, one can just write a normal, manual traversal, which is all-powerful.

It was also discussed above that something like the `NodeCallback` could exist for when we need something that can't be expressed simply, but:
- will it ever happen?
- NodeCallback doesn't even provide a useful error message; we would also need a callback to craft it (or make the one callback more complicated, which would be pretty much the content of `NodeClass::check`)
- I'm not willing to make the common kind of pattern ugly for a rare use case.

And as for implementing `NodeClass` from a hypothetical `NodeCallback`, what would be the concrete benefits? (kinda the first paragraph again: all the logic in the callback, and NodeCallback doing nothing).
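The trade-off can be made concrete with a small, self-contained comparison. `Node` here is a hypothetical two-query stand-in for C2's `Node`; `NodeClass` stores a pointer to member function, in the style of the `bool (Node::*_type_check)() const` field quoted above, while `NodeCallback` is the hypothetical callback-based alternative.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for C2's Node with two type queries.
struct Node {
  enum Kind { Region, If, Other } kind;
  bool is_Region() const { return kind == Region; }
  bool is_If() const { return kind == If; }
};

// Type-check pattern built from a pointer to member function.
class NodeClass {
 public:
  explicit NodeClass(bool (Node::*type_check)() const) : _type_check(type_check) {}
  bool check(const Node* n) const { return (n->*_type_check)(); }

 private:
  bool (Node::*_type_check)() const;
};

// Callback alternative: more general, but the use site must spell out a
// lambda, and the pattern has nothing it could use to build an error message.
class NodeCallback {
 public:
  explicit NodeCallback(std::function<bool(const Node*)> pred) : _pred(std::move(pred)) {}
  bool check(const Node* n) const { return _pred(n); }

 private:
  std::function<bool(const Node*)> _pred;
};
```

Both accept the same nodes; the difference is at the use site, where `NodeClass(&Node::is_Region)` states the intent directly.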

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2321388003

From chagedorn at openjdk.org  Thu Sep  4 09:33:47 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 4 Sep 2025 09:33:47 GMT
Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead
 of the Bool type [v3]
In-Reply-To: <_EN6o6Jwu73CNwvSXYt2cHSHu6Yglkp86f1t7lywwi4=.a84b6fac-327a-48a5-8f1e-772b31d8da10@github.com>
References: <LmCIZuKf5HvIO11yvPWX2H7f_2cYqD0EVUNZffsuLh4=.06f2b4b8-3c6f-4cfa-91bb-03df54688033@github.com>
 <oV2eNI_Xgm8CUnHEodKU1dxGRxOFOHEyH-zrf8BmniM=.745c473b-1319-46a2-9ef9-ecaf6bec8668@github.com>
 <aWrX_YhB8STx3Donb9aTYSYgWQ8TSOqMMqOyUuGX_j4=.2b4180a0-4112-4a89-824d-4ccac0f9718d@github.com>
 <_EN6o6Jwu73CNwvSXYt2cHSHu6Yglkp86f1t7lywwi4=.a84b6fac-327a-48a5-8f1e-772b31d8da10@github.com>
Message-ID: <NLFyMNJKy92NbY57fsb7Cljlhy-P8RyMnajL_KVwIU8=.00e773f5-8215-4322-a509-8c192cbc985e@github.com>

On Fri, 29 Aug 2025 13:17:48 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> # Absence note
>> 
>> Today is the last day before a ~2 weeks vacation, so my next working day is Monday, September 1st.
>> 
>> Please feel free to keep giving feedback and/or reviews, and I will continue when I'm back.
>> 
>> Cheers,
>> Francisco
>
> Hi @franferrax, hope you had a good vacation!
> 
>> Hi @chhagedorn,
>> 
>> I added the new tests in [e6b1cb8](https://github.com/openjdk/jdk/commit/e6b1cb897d9c75b34744c7d24f72abcec9986b0b). One problem I'm facing is that I'm unable to generate `Bool` nodes with arbitrary `BoolTest` values. Even if I try the assert inversions I removed in [10e1e3f](https://github.com/openjdk/jdk/commit/10e1e3f4f796d05dcd5c56bc2365d5d564d93952), C2 has preference for `BoolTest::ne`, `BoolTest::le` and `BoolTest::lt`. Instead of using `BoolTest::eq`, `BoolTest::gt` or `BoolTest::ge`, it swaps what is put in `IfTrue` and `IfFalse`.
>> 
>> Even if `javac` generates an `ifeq` and an `ifne` with the same inputs, instead of a single `CmpU` with two `Bool`s (`BoolTest::eq` and `BoolTest::ne`), I get a single `Bool` (`BoolTest::ne`) with two `If` (one of them swapping `IfTrue` with `IfFalse`). I guess this is some sort of canonicalization to enable further optimizations.
>> 
>> Do you know a way to influence the `Bool`'s `BoolTest` value? Or @rwestrel do you?
>> 
>> This means the following 8 cases are not really testing what they claim, but repeating other cases with `IfTrue` and `IfFalse` swapped:
>> 
>> * `testCase1aOptimizeAsFalseForGT(xm|mx)` (they should use `BoolTest::gt`, but use `BoolTest::le`)
>> * `testCase1bOptimizeAsFalseForEQ(xm|mx)` (they should use `BoolTest::eq`, but use `BoolTest::ne`)
>> * `testCase1bOptimizeAsFalseForGE(xm|mx)` (they should use `BoolTest::ge`, but use `BoolTest::lt`)
>> * `testCase1bOptimizeAsFalseForGT(xm|mx)` (they should use `BoolTest::gt`, but use `BoolTest::le`)
>> 
>> Even if we don't find a way to influence the `BoolTest`, the cases are still valid and can be kept (just in case the described behaviour changes).
> 
> Hm, that's a good point. `Parse::do_if()` indeed always canonicalizes the `Bool` nodes... But I was sure we can still somehow end up with non-canonicalized versions again with some tricks. I was curious and played around with some examples and could indeed find test cases for `gt`, `ge` , and `eq`.
> 
> I was then also thinking about notification code in IGVN. We already concluded further up that it's not needed for CCP because `CmpU` nodes below `AddI` nodes are put to the worklist again. However, with IGVN, we could modify the graph above the `AndI` as well. We miss notification code for `CmpU` below `AndI`. I changed my test cases further to also run into such a missing optimization case. When run with `-XX:VerifyIterativeGVN=1110`, we indeed get su...

> Hi @chhagedorn, thank you for the additional work and your insights. This is much appreciated from a learner perspective.

Sure, you're welcome :-)

> I didn't fully analyze the Test.java you provided yet, but wanted to check if you are aiming to include the missing IGVN notification code as part of this issue (and its corresponding test). Or are you working on an independent issue?

I think you could squeeze that in here as well. With mainline, you probably need different notification code because we need to add the `Bool` node instead of the `CmpU` node. But with this patch, we only require the `CmpU`. So, I guess it's not worth fixing it separately only to update it again with this patch.

> My availability will be limited as the October CPU approaches, but I will try to find some timeboxes to make TestBoolNodeGVN.java emit the right test cases for gt, ge, and eq

Sounds good, no hurry. Thanks for taking another look to improve the test!
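The canonicalization discussed above (only `ne`, `le` and `lt` are kept; otherwise the test is negated and the `IfTrue`/`IfFalse` projections are swapped) can be modeled in a few lines. This is a hypothetical model, not `Parse::do_if()` or HotSpot's `BoolTest`, and it uses signed `int` comparisons for simplicity even though this PR is about `CmpU`. It checks that negating the test and swapping the branches preserves which successor is taken, which is why tests written with `gt`, `ge` or `eq` can silently become `le`, `lt` or `ne` tests.

```cpp
#include <cassert>
#include <initializer_list>

enum class Test { eq, ne, lt, le, gt, ge };

constexpr Test negate(Test t) {
  switch (t) {
    case Test::eq: return Test::ne;
    case Test::ne: return Test::eq;
    case Test::lt: return Test::ge;
    case Test::ge: return Test::lt;
    case Test::le: return Test::gt;
    case Test::gt: return Test::le;
  }
  return t;  // unreachable
}

constexpr bool is_canonical(Test t) {
  return t == Test::ne || t == Test::le || t == Test::lt;
}

struct Branch {
  Test test;
  bool swapped;  // true if the IfTrue/IfFalse successors were exchanged
};

constexpr Branch canonicalize(Test t) {
  return is_canonical(t) ? Branch{t, false} : Branch{negate(t), true};
}

constexpr bool eval(Test t, int a, int b) {
  switch (t) {
    case Test::eq: return a == b;
    case Test::ne: return a != b;
    case Test::lt: return a < b;
    case Test::le: return a <= b;
    case Test::gt: return a > b;
    case Test::ge: return a >= b;
  }
  return false;  // unreachable
}
```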

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26666#issuecomment-3252790567

From duke at openjdk.org  Thu Sep  4 09:42:49 2025
From: duke at openjdk.org (erifan)
Date: Thu, 4 Sep 2025 09:42:49 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats [v3]
In-Reply-To: <s-iQ75fUMDx85bnb148Js3YzCsbhCVcFfy6rBUN7u54=.39d32f15-9fc9-4059-8a88-6974a4e97170@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <s-iQ75fUMDx85bnb148Js3YzCsbhCVcFfy6rBUN7u54=.39d32f15-9fc9-4059-8a88-6974a4e97170@github.com>
Message-ID: <q2XDj9jri77RC-AAE0lOHWpiD-5hrmSq3niaDmrm81Y=.13f17498-c1f1-499b-a2dd-d626d7b919fd@github.com>

On Wed, 3 Sep 2025 10:02:24 GMT, erifan <duke at openjdk.org> wrote:

>> The `sve_cpy` instruction is not correctly implemented for negative floating-point values. The issues include:
>> 
>> 1. When a negative floating-point number (e.g. `-1.0`) is passed, the `checked_cast<int8_t>(pack(d))` check fails. For example, assume `d = -1.0`:
>> - `pack(-1.0)` returns an unsigned int with bit 7 (the sign bit) set, i.e., `0xf0`.
>> - `checked_cast<int8_t>(0xf0)` casts `0xf0` to an `int8_t` value, which is `-16`.
>> - Casting this `int8_t` `-16` back to unsigned int results in `0xfffffff0`.
>> - The check compares `0xf0` to `0xfffffff0`, which obviously fails.
>> 
>> 2. Additionally, the encoding of the negative floating-point number is incorrect:
>> - The imm8 field can fall outside the valid range of **[-128, 127]**.
>> - Bit **13** should be encoded as **0** for floating-point numbers.
>> 
>> This PR fixes these issues and renames floating-point `sve_cpy` as `sve_fcpy`.
>> 
>> Some test cases are added to aarch64-asmtest.py, and all tests passed.
>
> erifan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Code style fixes

The test failure should be unrelated to this PR; I can see it in other PRs' test results too, e.g. https://github.com/egahlin/jdk/actions/runs/17436633376/job/49510579213
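
The failing check in point 1 of the quoted description can be reproduced in isolation. The sketch below is a stand-in for the idea behind `checked_cast` (narrow, widen back, compare), not HotSpot's actual implementation; `survives_round_trip`, `narrow_to_imm8`, and `widen_back` are hypothetical helpers, and `0xf0` is the packed encoding of `-1.0` given in the description. Narrowing an out-of-range value to `int8_t` is implementation-defined before C++20 (modular since C++20) and yields `-16` on common two's-complement targets.

```cpp
#include <cstdint>

// Hypothetical stand-in for the checked_cast idea: narrow, widen back, and
// report whether the value survived the round trip unchanged.
template <typename To, typename From>
bool survives_round_trip(From value) {
  To narrowed = static_cast<To>(value);
  return static_cast<From>(narrowed) == value;
}

// Packed 8-bit FP immediate of -1.0, as given in the description (sign bit set).
constexpr unsigned kPackedMinusOne = 0xf0u;

// Narrowing step: 0xf0 becomes -16 on two's-complement targets.
inline std::int8_t narrow_to_imm8(unsigned packed) {
  return static_cast<std::int8_t>(packed);
}

// Widening step: converting -16 back to unsigned sign-extends to 0xfffffff0
// (assuming a 32-bit unsigned int).
inline unsigned widen_back(std::int8_t imm8) {
  return static_cast<unsigned>(imm8);
}
```

Only packed values with bit 7 set fail the round trip; values in `0x00`..`0x7f` fit in `int8_t` unchanged.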

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26951#issuecomment-3252830637

From mchevalier at openjdk.org  Thu Sep  4 10:53:43 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 4 Sep 2025 10:53:43 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v3]
In-Reply-To: <eRSPXdu_TchSS8aABPgp0JEs_fhUyvvbhBnjGZ6_fEU=.0af530a9-66ab-4b4b-8557-ce73687edf24@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <fXrcwiU_LDyWFyfzZf0SOel1mb_kne4OfyZs_bLde70=.0830080a-7f52-44e5-ae6b-a74d50185b36@github.com>
 <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
 <eRSPXdu_TchSS8aABPgp0JEs_fhUyvvbhBnjGZ6_fEU=.0af530a9-66ab-4b4b-8557-ce73687edf24@github.com>
Message-ID: <_-lh_iD6ya5G9_ODqDXbfa1aTrC6J1DP5hUM4RHQUKo=.1b66fd8a-ee5c-4844-bb81-01b7758ba5dc@github.com>

On Mon, 25 Aug 2025 14:09:09 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/graphInvariants.cpp line 270:
>> 
>>> 268:                 new HasNOutputs(2),
>>> 269:                 new AtSingleOutputOfType(&Node::is_IfTrue, new True()),
>>> 270:                 new AtSingleOutputOfType(&Node::is_IfFalse, new True()))) {
>> 
>> I would suggest that you append the word `Pattern` to all `Patterns` - at least in most cases this will make it a bit easier to see what you have at the use-site. I'm looking at `new True()` and wonder what might be passed here... if it was called `TruePattern`, it would be immediately clear.
>
> You could leave a comment at `True(Pattern)` that it is (often) used as the terminal pattern, at the end of a branch / search.

> I would suggest that you append the word Pattern to all Patterns - at least in most cases this will make it a bit easier to see what you have at the use-site. I'm looking at new True() and wonder what might be passed here... if it was called TruePattern, it would be immediately clear.

For `True` alright, no strong opinion. Could also be `TrivialPattern` or so. For everything else, that looks very verbose and hurts readability a lot, and I think readability is very important. I think the other patterns are pretty understandable: for instance, I don't see how `HasAtLeastNInputsPattern` would really help compared to `HasAtLeastNInputs`, it seems just like bloat my brain will have to strip.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2321622684

From mchevalier at openjdk.org  Thu Sep  4 11:10:46 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 4 Sep 2025 11:10:46 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v3]
In-Reply-To: <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <fXrcwiU_LDyWFyfzZf0SOel1mb_kne4OfyZs_bLde70=.0830080a-7f52-44e5-ae6b-a74d50185b36@github.com>
 <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
Message-ID: <DCA_CeYoAocClu9HKpT4-ShN-6NEHufaas9ZGYaoELQ=.c61468b6-7c94-4118-a17d-905a1dc86a12@github.com>

On Mon, 25 Aug 2025 14:14:41 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Benoît's comments
>
> src/hotspot/share/opto/graphInvariants.cpp line 279:
> 
>> 277:       return CheckResult::NOT_APPLICABLE;
>> 278:     }
>> 279:     CheckResult r = PatternBasedCheck::check(center, reachable_cfg_nodes, steps, path, ss);
> 
> Could this not be solved with a `OrPattern`?
> 
> Or::make( <not is_If ...,
>   <the And::make from above>
> )
> 
> Not sure that's worth it...

I understand that `Or` patterns are tempting! I also thought about it; it's naturally the dual of `And`. At this point, though, they are not actually a good idea.

First, they cannot provide good reporting. When an `And` is failing, we can at least blame the first thing that fails: "I followed this path, I expected to find 5 inputs (for instance), there are only 2!". With `Or` we would get that and... maybe it's fine? Maybe not? Depends on the next branches, and if it ends up failing, how to provide a good message?

Also, they cause a mess with binding. If a branch contains a `Bind`, one cannot know which branch matched and whether the content of the `Node` pointer given to `Bind` is trustworthy. We can't even rely on testing whether the pointer was set, because a branch might reach a `Bind` first, run it, assign the pointer, and only later fail; the bound pointer must then not be used. This is a common problem with pattern matching in functional programming: the same bindings must appear (with the same types) on each branch of or-patterns. But we have no mechanism to enforce that yet, and it seems like setting a trap for future us.
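
The binding problem can be shown with a toy model. Everything here (`Node`, `Pattern`, `Bind`, `AtInput`, `And2`, and especially `Or2`) is a hypothetical simplification, not the classes in `graphInvariants.cpp`. In the usage below, the left branch binds input 0 and then fails, the right branch succeeds, and the overall match reports success while the bound pointer was set by the branch that failed.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy node: only its inputs matter here.
struct Node {
  std::vector<Node*> in;
};

// Hypothetical toy pattern interface.
struct Pattern {
  virtual ~Pattern() = default;
  virtual bool match(Node* n) = 0;
};
using PatternPtr = std::unique_ptr<Pattern>;

struct HasNInputs : Pattern {
  std::size_t count;
  explicit HasNInputs(std::size_t c) : count(c) {}
  bool match(Node* n) override { return n->in.size() == count; }
};

// Always matches; records the node it was applied to as a side effect.
struct Bind : Pattern {
  Node*& slot;
  explicit Bind(Node*& s) : slot(s) {}
  bool match(Node* n) override { slot = n; return true; }
};

// Applies the sub-pattern to input i of the current node.
struct AtInput : Pattern {
  std::size_t i;
  PatternPtr sub;
  AtInput(std::size_t i, PatternPtr sub) : i(i), sub(std::move(sub)) {}
  bool match(Node* n) override {
    return i < n->in.size() && n->in[i] != nullptr && sub->match(n->in[i]);
  }
};

// Short-circuiting conjunction: stops at the first failing sub-pattern.
struct And2 : Pattern {
  PatternPtr a, b;
  And2(PatternPtr a, PatternPtr b) : a(std::move(a)), b(std::move(b)) {}
  bool match(Node* n) override { return a->match(n) && b->match(n); }
};

// Hypothetical Or: tries the left branch, then the right one. Side effects
// of a failed left branch (such as a Bind) are NOT undone.
struct Or2 : Pattern {
  PatternPtr a, b;
  Or2(PatternPtr a, PatternPtr b) : a(std::move(a)), b(std::move(b)) {}
  bool match(Node* n) override { return a->match(n) || b->match(n); }
};
```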

There are also relatively few use cases, and they would not profit much from an `Or` pattern. Maybe in the future we will have more interesting use cases, and we will see how to address these issues. But for now, I think we should leave it out rather than make a bad choice.

By the way, I think something with more of a future than `Or` is a case analysis: `IfThenElse(ConditionPattern, TrueBranchPattern, FalseBranchPattern)`. If `ConditionPattern` matches, we try to match `TrueBranchPattern`, otherwise `FalseBranchPattern`. This is better for reporting, since we know which branch we expect to hold, and so which one to blame (assuming we don't blame `ConditionPattern`, though we could possibly include it in the message). This still has the binding consistency issue, but more boilerplate could help (e.g. helper methods to query the set of pointers that would be set in each branch). Yet, let's wait and see.
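
A minimal sketch of the `IfThenElse` idea, assuming a toy pattern interface that appends a failure reason to a string; `Node`, `Pattern`, `NameIs`, `HasNInputs`, and `IfThenElse` are all hypothetical names. Because the condition only selects a branch, a failure can always be attributed to exactly one branch.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy node with a name and inputs.
struct Node {
  std::string name;
  std::vector<Node*> in;
};

// Hypothetical pattern interface: on failure, append a reason to `why`.
struct Pattern {
  virtual ~Pattern() = default;
  virtual bool match(const Node* n, std::string& why) const = 0;
};
using PatternPtr = std::unique_ptr<Pattern>;

struct NameIs : Pattern {
  std::string expected;
  explicit NameIs(std::string e) : expected(std::move(e)) {}
  bool match(const Node* n, std::string& why) const override {
    if (n->name == expected) return true;
    why += "expected " + expected + ", found " + n->name;
    return false;
  }
};

struct HasNInputs : Pattern {
  std::size_t count;
  explicit HasNInputs(std::size_t c) : count(c) {}
  bool match(const Node* n, std::string& why) const override {
    if (n->in.size() == count) return true;
    why += "expected " + std::to_string(count) + " inputs, found " +
           std::to_string(n->in.size());
    return false;
  }
};

// The condition only selects a branch; a false condition is not an error.
struct IfThenElse : Pattern {
  PatternPtr cond, then_p, else_p;
  IfThenElse(PatternPtr c, PatternPtr t, PatternPtr e)
      : cond(std::move(c)), then_p(std::move(t)), else_p(std::move(e)) {}
  bool match(const Node* n, std::string& why) const override {
    std::string ignored;
    if (cond->match(n, ignored)) {
      if (then_p->match(n, why)) return true;
      why = "condition held, then-branch failed: " + why;
      return false;
    }
    if (else_p->match(n, why)) return true;
    why = "condition failed, else-branch failed: " + why;
    return false;
  }
};
```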

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2321677987

From mchevalier at openjdk.org  Thu Sep  4 11:13:43 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 4 Sep 2025 11:13:43 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v3]
In-Reply-To: <eRSPXdu_TchSS8aABPgp0JEs_fhUyvvbhBnjGZ6_fEU=.0af530a9-66ab-4b4b-8557-ce73687edf24@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <fXrcwiU_LDyWFyfzZf0SOel1mb_kne4OfyZs_bLde70=.0830080a-7f52-44e5-ae6b-a74d50185b36@github.com>
 <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
 <eRSPXdu_TchSS8aABPgp0JEs_fhUyvvbhBnjGZ6_fEU=.0af530a9-66ab-4b4b-8557-ce73687edf24@github.com>
Message-ID: <SRco-rpGpWxOW4UJdjjXJ5nWvJ7ytlG3_L6xL9SQZUw=.c42cdf01-39c3-47f4-8cd2-84feb8fa0e90@github.com>

On Mon, 25 Aug 2025 14:16:42 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/graphInvariants.cpp line 287:
>> 
>>> 285:       }
>>> 286:     }
>>> 287:     return r;
>> 
>> Also this could probably be handled with a pattern wrapping mechanism, right?
>> `FailOnlyForLiveNodes( <the pattern from above> )`
>
> I'm just suggesting it in case you need to do this sort of special-casing elsewhere too ;)

That would be possible. It's still rare, and I'm not convinced we should make such specialized patterns for a single use case. If it gets more usage, sure, that would be something to do. The only other usage is not so easy to phrase as a template.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2321687572

From mchevalier at openjdk.org  Thu Sep  4 11:16:42 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 4 Sep 2025 11:16:42 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v3]
In-Reply-To: <eRSPXdu_TchSS8aABPgp0JEs_fhUyvvbhBnjGZ6_fEU=.0af530a9-66ab-4b4b-8557-ce73687edf24@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <fXrcwiU_LDyWFyfzZf0SOel1mb_kne4OfyZs_bLde70=.0830080a-7f52-44e5-ae6b-a74d50185b36@github.com>
 <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
 <eRSPXdu_TchSS8aABPgp0JEs_fhUyvvbhBnjGZ6_fEU=.0af530a9-66ab-4b4b-8557-ce73687edf24@github.com>
Message-ID: <m9YpeewjFcL8PG--Vlc6zbfffPpNFVu27WyFwcSLxYk=.9e0e9e3c-29ff-4ca4-afea-d648752646db@github.com>

On Mon, 25 Aug 2025 14:20:40 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/graphInvariants.cpp line 301:
>> 
>>> 299:                     And::make(
>>> 300:                         new NodeClass(&Node::is_Region),
>>> 301:                         new Bind(region_node))))) {
>> 
>> This sort of binding is kinda cool! Never thought of it before. Could be really cool for general pattern matching.
>> We would have to find a solution if there would be multiple bindings though ... I think that's not possible with your patterns, right? Is that a fundamental constraint?
>
> What would be extra cool / funky:
> If we could somehow already cast the `Bind` variable to `Region`. Could be tricky.
> Doing this `is_Region and bind` could be a very common idiom, so very useful.

>  We would have to find a solution if there would be multiple bindings though ... I think that's not possible with your patterns, right? Is that a fundamental constraint?

Not sure what you mean? `And::make(new Bind(bla), AtInput(1, new Bind(bli)))`? You probably mean something else.


> If we could somehow already cast the Bind variable to Region. Could be tricky.
> Doing this is_Region and bind could be a very common idiom, so very useful.

Interesting... Not sure how without some template magic we don't have (like `Node::is<RegionNode>`), but it's probably doable with macros. I'll give it a try.
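
One way the macro route could look, as a sketch under assumptions: the `Node`/`RegionNode` classes below are toy stand-ins that mimic C2's `is_Region()`/`as_Region()` naming convention, and `DEFINE_BIND_AS`/`BindAsRegion` are hypothetical names. The generated pattern checks the node class and binds an already-cast pointer in one step, fusing the `is_Region and Bind` idiom.

```cpp
// Toy stand-ins following C2's is_X()/as_X() naming convention (hypothetical).
struct RegionNode;

struct Node {
  virtual ~Node() = default;
  virtual bool is_Region() const { return false; }
  RegionNode* as_Region();
};

struct RegionNode : Node {
  bool is_Region() const override { return true; }
};

inline RegionNode* Node::as_Region() { return static_cast<RegionNode*>(this); }

struct Pattern {
  virtual ~Pattern() = default;
  virtual bool match(Node* n) = 0;
};

// Hypothetical macro generating a pattern class named BindAs<Type>
// (e.g. BindAsRegion): it matches only if n->is_<Type>() and then stores
// n->as_<Type>() into the caller's typed slot.
#define DEFINE_BIND_AS(Type)                              \
  struct BindAs##Type : Pattern {                         \
    Type##Node*& slot;                                    \
    explicit BindAs##Type(Type##Node*& s) : slot(s) {}    \
    bool match(Node* n) override {                        \
      if (!n->is_##Type()) return false;                  \
      slot = n->as_##Type();                              \
      return true;                                        \
    }                                                     \
  };

DEFINE_BIND_AS(Region)
```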

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2321698421

From mchevalier at openjdk.org  Thu Sep  4 11:21:50 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 4 Sep 2025 11:21:50 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v3]
In-Reply-To: <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <fXrcwiU_LDyWFyfzZf0SOel1mb_kne4OfyZs_bLde70=.0830080a-7f52-44e5-ae6b-a74d50185b36@github.com>
 <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
Message-ID: <mOEGABxIy2JYZYwjmxnznr-46OsPd7sOTm6mrJpvv6g=.646d0008-c703-4c00-9d71-098bb7c3b9fd@github.com>

On Mon, 25 Aug 2025 14:23:12 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Benoît's comments
>
> src/hotspot/share/opto/graphInvariants.cpp line 319:
> 
>> 317:       return CheckResult::FAILED;
>> 318:     }
>> 319:     return CheckResult::VALID;
> 
> Another funky idea: could probably be handled with some callback, some "terminal" check you do on the bound variable. Not sure if worth it.

It's difficult if we want to speak about more than one node. It cannot be part of the pattern since it'd be very non-local. Also with only one node, it must be executed at the end, and not when still traversing. I think it'd get even messier when we have a few bindings and we want to do things with them in a couple of different ways... Not sure how to express that much nicer.

> src/hotspot/share/opto/graphInvariants.cpp line 332:
> 
>> 330:     }
>> 331: 
>> 332:     Node_List ctrl_succ;
> 
> Do we need a `ResourceMark` for this?

Everything will run under `GraphInvariantChecker::run()`, which has a `ResourceMark`. I'm not sure, but my guess is that it's not worth repeatedly entering and leaving resource marks for relatively short lists. At the very least, everything will be released at the end of the whole check. I can still add one here if you think it's better.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2321705602
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2321711388

From mchevalier at openjdk.org  Thu Sep  4 11:32:49 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 4 Sep 2025 11:32:49 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v3]
In-Reply-To: <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <fXrcwiU_LDyWFyfzZf0SOel1mb_kne4OfyZs_bLde70=.0830080a-7f52-44e5-ae6b-a74d50185b36@github.com>
 <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
Message-ID: <nDGqLQkb8SlZedYauHiMqLvU26Qt7EphqzpVgPePwhs=.284eef62-53e7-4a2c-8414-9808a7852b7f@github.com>

On Mon, 25 Aug 2025 14:27:27 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Benoît's comments
>
> src/hotspot/share/opto/graphInvariants.cpp line 338:
> 
>> 336:       if (out->is_CFG()) {
>> 337:         cfg_out++;
>> 338:         ctrl_succ.push(out);
> 
> Seems you do these in a pair. So why do you need `cfg_out` at all? Can you not take the length/size of `ctrl_succ`? After all, it counts duplicates too (hope that is intended).

True. And yes, duplicated input must still be counted!
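
A sketch of the simplification being agreed on, with a toy `Node` and `std::vector` standing in for C2's `Node_List` (both assumptions; `control_successors` is a hypothetical helper): since every CFG output is pushed, the list length already is the count, duplicates included.

```cpp
#include <vector>

// Hypothetical toy node: a CFG flag plus its outputs.
struct Node {
  bool cfg = false;
  std::vector<Node*> outs;
  bool is_CFG() const { return cfg; }
};

// Collect the control successors of `center`. A node reached through two
// output edges is pushed twice, so ctrl_succ.size() doubles as the count
// (duplicates included) and no separate cfg_out counter is needed.
std::vector<Node*> control_successors(const Node* center) {
  std::vector<Node*> ctrl_succ;
  for (Node* out : center->outs) {
    if (out->is_CFG()) ctrl_succ.push_back(out);
  }
  return ctrl_succ;
}
```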

> src/hotspot/share/opto/graphInvariants.cpp line 413:
> 
>> 411:       ss.print_cr("%s nodes' 0-th input must be itself or nullptr (for a copy Region).", center->Name());
>> 412:       return CheckResult::FAILED;
>> 413:     }
> 
> Absolutely subjective: checking `self != center` is more about `self`, checking `center != self` is more about `center`. So I would use `self != center` :rofl: 
> Suggestion:
> 
>     if (self != center || (center->is_Region() && self == nullptr)) {
>       ss.print_cr("%s nodes' 0-th input must be itself or nullptr (for a copy Region).", center->Name());
>       return CheckResult::FAILED;
>     }

yes

> src/hotspot/share/opto/graphInvariants.cpp line 447:
> 
>> 445:                     And::make(
>> 446:                         new NodeClass(&Node::is_IfTrue),
>> 447:                         new HasAtLeastNInputs(1),
> 
> Can an `IfTrue` have more than 1 input?

I surely hope not!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2321724188
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2321730065
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2321743991

From mchevalier at openjdk.org  Thu Sep  4 11:32:52 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 4 Sep 2025 11:32:52 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v3]
In-Reply-To: <eRSPXdu_TchSS8aABPgp0JEs_fhUyvvbhBnjGZ6_fEU=.0af530a9-66ab-4b4b-8557-ce73687edf24@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <fXrcwiU_LDyWFyfzZf0SOel1mb_kne4OfyZs_bLde70=.0830080a-7f52-44e5-ae6b-a74d50185b36@github.com>
 <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
 <eRSPXdu_TchSS8aABPgp0JEs_fhUyvvbhBnjGZ6_fEU=.0af530a9-66ab-4b4b-8557-ce73687edf24@github.com>
Message-ID: <6AiBxTxm_R4n0IV0yrqX0qT6nHhmg_-QcYcrJ8c3XNA=.ff226726-3312-4a42-8bef-fb577da92782@github.com>

On Mon, 25 Aug 2025 14:36:41 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/graphInvariants.cpp line 417:
>> 
>>> 415:     if (self == nullptr) {
>>> 416:       // Must be a copy Region
>>> 417:       Node_List non_null_inputs;
>> 
>> ResourceMark?
>
> Is it worth it to do the allocation, if in most cases we just expect 1 non-null?
> Why not count non-nulls, and if we find more than one, traverse again over the Region, and filter and dump them?

True.
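
A sketch of the count-first approach suggested above, assuming (for illustration only) that a copy Region's data inputs start at index 1 and that more than one non-null input is the failure case; `Node` and `check_copy_region` are hypothetical. No list is allocated on the common, passing path; the inputs are only walked a second time to build the report.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy node: an index plus its inputs.
struct Node {
  int idx;
  std::vector<Node*> in;
};

bool check_copy_region(const Node* region, std::string& report) {
  // First pass: only count, no allocation.
  std::size_t non_null = 0;
  for (std::size_t i = 1; i < region->in.size(); ++i) {
    if (region->in[i] != nullptr) ++non_null;
  }
  if (non_null <= 1) {
    return true;
  }
  // Failure path only: walk the inputs again and list the offenders.
  report = "copy Region " + std::to_string(region->idx) + " has " +
           std::to_string(non_null) + " non-null inputs:";
  for (std::size_t i = 1; i < region->in.size(); ++i) {
    if (region->in[i] != nullptr) {
      report += " " + std::to_string(region->in[i]->idx);
    }
  }
  return false;
}
```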

> And I would call it `counted_loop_end`.

Right

> Ah, another check and Bind! Why not allow Bind<BaseCountedLoopEndNode*>, so we can bind it with the cast?

I'll try something, but it would have a rather disappointing drawback: it wouldn't check the type at the same time. Let's see what I can do.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2321735660
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2321740949

From mchevalier at openjdk.org  Thu Sep  4 11:36:46 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 4 Sep 2025 11:36:46 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v3]
In-Reply-To: <eRSPXdu_TchSS8aABPgp0JEs_fhUyvvbhBnjGZ6_fEU=.0af530a9-66ab-4b4b-8557-ce73687edf24@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <fXrcwiU_LDyWFyfzZf0SOel1mb_kne4OfyZs_bLde70=.0830080a-7f52-44e5-ae6b-a74d50185b36@github.com>
 <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
 <eRSPXdu_TchSS8aABPgp0JEs_fhUyvvbhBnjGZ6_fEU=.0af530a9-66ab-4b4b-8557-ce73687edf24@github.com>
Message-ID: <MWD6W6EKFwacrD_30-M92_lCWWk8JRM9dq09bFbq4Dw=.98aa75d3-b49c-4ec4-a880-9af5791b83b1@github.com>

On Mon, 25 Aug 2025 14:43:26 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/graphInvariants.cpp line 469:
>> 
>>> 467:     assert(counted_loop != nullptr, "sanity");
>>> 468:     if (is_long) {
>>> 469:       if (counted_loop->is_CountedLoopEnd()) {
>> 
>> Sounds like head/tail confusion here. Call it `counted_loop_end`.
>
> Also: I would invert the check to `!counted_loop_end->is_LongCountedLoopEnd()`. Because you expect it to be a long end here. Subjective.

If you want. I don't think it's perfect because then the message might be less accurate: I don't know that
> A CountedLoopEnd is the backedge of a LongCountedLoop.

What I do know is that
> The backedge of a LongCountedLoop is not a LongCountedLoopEnd

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2321755027

From chagedorn at openjdk.org  Thu Sep  4 12:49:56 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 4 Sep 2025 12:49:56 GMT
Subject: RFR: 8366890: C2: Split through phi printing with TraceLoopOpts misses
 line break
Message-ID: <w38THu82glx944Ctpeyeg9xwQttEGlEuq5iqacW6vTE=.3c2890ac-f09f-4af7-965c-5a9c754619f0@github.com>

[JDK-8356176](https://bugs.openjdk.org/browse/JDK-8356176) added new printing code for `TraceLoopOpts` when splitting nodes through a phi but missed a line break. This will result in:

Split 974 CmpI through 1465 Phi in 953 RegionSplit 474 Bool through 1468 Phi in 953 RegionSplit-If

instead of

Split 974 CmpI through 1465 Phi in 953 Region
Split 474 Bool through 1468 Phi in 953 Region
Split-If

This patch fixes this.

Thanks,
Christian

-------------

Commit messages:
 - C2: Split through phi printing with TraceLoopOpts misses line break

Changes: https://git.openjdk.org/jdk/pull/27092/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27092&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8366890
  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/27092.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27092/head:pull/27092

PR: https://git.openjdk.org/jdk/pull/27092

From rcastanedalo at openjdk.org  Thu Sep  4 13:25:43 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 4 Sep 2025 13:25:43 GMT
Subject: RFR: 8366890: C2: Split through phi printing with TraceLoopOpts
 misses line break
In-Reply-To: <w38THu82glx944Ctpeyeg9xwQttEGlEuq5iqacW6vTE=.3c2890ac-f09f-4af7-965c-5a9c754619f0@github.com>
References: <w38THu82glx944Ctpeyeg9xwQttEGlEuq5iqacW6vTE=.3c2890ac-f09f-4af7-965c-5a9c754619f0@github.com>
Message-ID: <wJQ26w4TGzd8upKilJcDTMOGM9L3FF2cK-QwV2EyWqI=.5a058edd-bb7a-4f4b-a081-05d5984f07df@github.com>

On Thu, 4 Sep 2025 12:44:43 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

> [JDK-8356176](https://bugs.openjdk.org/browse/JDK-8356176) added new printing code for `TraceLoopOpts` when splitting nodes through a phi but missed a line break. This will result in:
> 
> Split 974 CmpI through 1465 Phi in 953 RegionSplit 474 Bool through 1468 Phi in 953 RegionSplit-If
> 
> instead of
> 
> Split 974 CmpI through 1465 Phi in 953 Region
> Split 474 Bool through 1468 Phi in 953 Region
> Split-If
> 
> This patch fixes this.
> 
> Thanks,
> Christian

Looks good, and trivial.

-------------

Marked as reviewed by rcastanedalo (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27092#pullrequestreview-3185272793

From mhaessig at openjdk.org  Thu Sep  4 13:30:42 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Thu, 4 Sep 2025 13:30:42 GMT
Subject: RFR: 8366890: C2: Split through phi printing with TraceLoopOpts
 misses line break
In-Reply-To: <w38THu82glx944Ctpeyeg9xwQttEGlEuq5iqacW6vTE=.3c2890ac-f09f-4af7-965c-5a9c754619f0@github.com>
References: <w38THu82glx944Ctpeyeg9xwQttEGlEuq5iqacW6vTE=.3c2890ac-f09f-4af7-965c-5a9c754619f0@github.com>
Message-ID: <WGtHeyGchQmEJIIts-Q4BEk8d-6zqLnFUpjkeD_200Q=.789de2b8-9bc2-49fe-92ad-cb3a6d993cdf@github.com>

On Thu, 4 Sep 2025 12:44:43 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

> [JDK-8356176](https://bugs.openjdk.org/browse/JDK-8356176) added new printing code for `TraceLoopOpts` when splitting nodes through a phi but missed a line break. This will result in:
> 
> Split 974 CmpI through 1465 Phi in 953 RegionSplit 474 Bool through 1468 Phi in 953 RegionSplit-If
> 
> instead of
> 
> Split 974 CmpI through 1465 Phi in 953 Region
> Split 474 Bool through 1468 Phi in 953 Region
> Split-If
> 
> This patch fixes this.
> 
> Thanks,
> Christian

Thank you for fixing my silly mistake, @chhagedorn! Looks good to me as well.

-------------

Marked as reviewed by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27092#pullrequestreview-3185304718

From mhaessig at openjdk.org  Thu Sep  4 13:31:15 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Thu, 4 Sep 2025 13:31:15 GMT
Subject: RFR: 8366775: TestCompileTaskTimeout should use timeoutFactor
Message-ID: <J7_KNMZeW20dQGNYUpDSZ7oDlXsr5dC_lyZzt1vrl-U=.882667fd-1064-4980-87ea-ee6bbb747d29@github.com>

`TestCompileTaskTimeout.java` employs a timeout to test that methods are compiled faster than a specified `CompileTaskTimeout`. However, it does not make use of the jtreg timeout factor, which led to #26963 increasing the timeout to 2 s. This PR remedies this by using the timeout factor and reducing the default timeout to 500 ms.

Testing:
 - [ ] Github Actions
 - [ ] tier1, tier2 linux-x64-debug, linux-x64, linux-aarch64-debug, linux-aarch64

-------------

Commit messages:
 - Use timeout factor

Changes: https://git.openjdk.org/jdk/pull/27094/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27094&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8366775
  Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27094.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27094/head:pull/27094

PR: https://git.openjdk.org/jdk/pull/27094

From rehn at openjdk.org  Thu Sep  4 13:32:34 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Thu, 4 Sep 2025 13:32:34 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v5]
In-Reply-To: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
Message-ID: <64z-PlrnxAISLzKBq-RZz7CXkQirGTvOgTGMJQl833o=.73ea3239-dfb6-4e32-b20f-8398334f2759@github.com>

> Hey, please consider!
> 
> A bunch of info in JBS entry, please read that also.
> 
> I narrowed this issue down to the old jal optimization, making direct calls when in reach.
> This patch restores them and removes this regression.
> 
> In essence we turn "jalr ra,0(t1)" into a "jal ra,<dest>" if reachable, and restore the jalr if a new destination is not reachable.
> 
> Please test on your hardware!
> 
> 
> Chi Square (100 runs each, 10 fastest iterations of each run, P550)
> JDK-23 (last version with trampoline calls)
> Mean: 3189.5827
> Standard Deviation: 284.6478
> 
> JDK-25
> Mean: 3424.8905
> Standard Deviation: 222.2208
> 
> Patch:
> Mean: 3144.8535
> Standard Deviation: 229.2577
> 
> 
> No issues found in t1, running t2 also. Stress tested on vf2, bpi-f3, p550.
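
The reachability test behind the `jal`/`jalr` switch can be sketched as follows. The range is the architectural one for RISC-V `jal` (a signed, 2-byte-aligned, 21-bit PC-relative offset, i.e. targets within [-1 MiB, +1 MiB - 2]); `jal_reachable`, `choose_call_form`, and the use of plain `int64_t` addresses are hypothetical simplifications, not the functions used in the patch.

```cpp
#include <cstdint>

// Architectural JAL range: signed 21-bit offset in multiples of 2 bytes.
constexpr std::int64_t kJalMinOffset = -(std::int64_t{1} << 20);
constexpr std::int64_t kJalMaxOffset = (std::int64_t{1} << 20) - 2;

// Hypothetical helper: can a "jal ra, <dest>" at `pc` reach `target`?
bool jal_reachable(std::int64_t pc, std::int64_t target) {
  std::int64_t offset = target - pc;
  return offset >= kJalMinOffset && offset <= kJalMaxOffset && offset % 2 == 0;
}

enum class CallForm { Jal, Jalr };

// Hypothetical patching decision mirroring the description: use the direct
// "jal ra, <dest>" when in range, otherwise keep/restore "jalr ra, 0(t1)".
CallForm choose_call_form(std::int64_t pc, std::int64_t target) {
  return jal_reachable(pc, target) ? CallForm::Jal : CallForm::Jalr;
}
```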

Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:

 - Merge branch 'master' into 8365926
 - Review comments
 - Review comments
 - Merge branch 'master' into 8365926
 - Spelling
 - Merge branch 'master' into 8365926
 - draft jal<->jalr

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26944/files
  - new: https://git.openjdk.org/jdk/pull/26944/files/72e3ba6a..da18e6b6

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26944&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26944&range=03-04

  Stats: 6217 lines in 654 files changed: 3237 ins; 1282 del; 1698 mod
  Patch: https://git.openjdk.org/jdk/pull/26944.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26944/head:pull/26944

PR: https://git.openjdk.org/jdk/pull/26944

From chagedorn at openjdk.org  Thu Sep  4 14:06:45 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 4 Sep 2025 14:06:45 GMT
Subject: RFR: 8366775: TestCompileTaskTimeout should use timeoutFactor
In-Reply-To: <J7_KNMZeW20dQGNYUpDSZ7oDlXsr5dC_lyZzt1vrl-U=.882667fd-1064-4980-87ea-ee6bbb747d29@github.com>
References: <J7_KNMZeW20dQGNYUpDSZ7oDlXsr5dC_lyZzt1vrl-U=.882667fd-1064-4980-87ea-ee6bbb747d29@github.com>
Message-ID: <FvPHZtksYvAMn3EhmOUmJTP5zzEqOf2GDH6kA2zQbSA=.07764143-7699-4ec8-bd38-0eb0cd23ed1f@github.com>

On Thu, 4 Sep 2025 13:26:22 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

> `TestCompileTaskTimeout.java` employs a timeout to test that methods are compiled faster than a specified `CompileTaskTimeout`. However, it does not make use of the jtreg timeout factor, which led to #26963 increasing the timeout to 2 s. This PR remedies this by using the timeout factor and reducing the default timeout to 500 ms.
> 
> Testing:
>  - [ ] Github Actions
>  - [ ] tier1, tier2 linux-x64-debug, linux-x64, linux-aarch64-debug, linux-aarch64

Looks reasonable, thanks for adjusting it again!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27094#pullrequestreview-3185494488

From rcastanedalo at openjdk.org  Thu Sep  4 14:06:45 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 4 Sep 2025 14:06:45 GMT
Subject: RFR: 8366775: TestCompileTaskTimeout should use timeoutFactor
In-Reply-To: <J7_KNMZeW20dQGNYUpDSZ7oDlXsr5dC_lyZzt1vrl-U=.882667fd-1064-4980-87ea-ee6bbb747d29@github.com>
References: <J7_KNMZeW20dQGNYUpDSZ7oDlXsr5dC_lyZzt1vrl-U=.882667fd-1064-4980-87ea-ee6bbb747d29@github.com>
Message-ID: <oPZrsjSQKBHfNo9Eyo5zJqfwErIyo7pKtrwWi0is1eM=.879b0c80-6c07-49b4-bb87-eda3a242e47f@github.com>

On Thu, 4 Sep 2025 13:26:22 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

> `TestCompileTaskTimeout.java` employs a timeout to test that methods are compiled faster than a specified `CompileTaskTimeout`. However, it does not make use of the jtreg timeout factor, which led to #26963 increasing the timeout to 2 s. This PR remedies this by using the timeout factor and reducing the default timeout to 500 ms.
> 
> Testing:
>  - [ ] Github Actions
>  - [ ] tier1, tier2 linux-x64-debug, linux-x64, linux-aarch64-debug, linux-aarch64

Looks good to me! Please check with the PPC port maintainers (and perhaps [the maintainers of RISC-V, s390, and ARM32](https://wiki.openjdk.org/display/HotSpot/Ports)?) that this works in their environment.

-------------

Marked as reviewed by rcastanedalo (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27094#pullrequestreview-3185506163

From mhaessig at openjdk.org  Thu Sep  4 15:13:42 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Thu, 4 Sep 2025 15:13:42 GMT
Subject: RFR: 8366775: TestCompileTaskTimeout should use timeoutFactor
In-Reply-To: <J7_KNMZeW20dQGNYUpDSZ7oDlXsr5dC_lyZzt1vrl-U=.882667fd-1064-4980-87ea-ee6bbb747d29@github.com>
References: <J7_KNMZeW20dQGNYUpDSZ7oDlXsr5dC_lyZzt1vrl-U=.882667fd-1064-4980-87ea-ee6bbb747d29@github.com>
Message-ID: <g0rIi1y3ncfP3t1Bju_TJW3Gge7bpFH8I5rh2Unj4Cw=.3738907d-7f4c-4fef-b70a-d68c0bf80c16@github.com>

On Thu, 4 Sep 2025 13:26:22 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

> `TestCompileTaskTimeout.java` employs a timeout to test that methods are compiled faster than a specified `CompileTaskTimeout`. However, it does not make use of the jtreg timeout factor, which led to #26963 increasing the timeout to 2 s. This PR remedies this by using the timeout factor and reducing the default timeout to 500 ms.
> 
> Testing:
>  - [ ] Github Actions
>  - [ ] tier1, tier2 linux-x64-debug, linux-x64, linux-aarch64-debug, linux-aarch64

@MBaesken, could you please have a look, since you filed the issue? Is the reduced default a problem on your side?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27094#issuecomment-3254177977

From cslucas at openjdk.org  Thu Sep  4 17:17:44 2025
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Thu, 4 Sep 2025 17:17:44 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT.
In-Reply-To: <1uDOe3Oe-hihmDHea2h8vcvRZsKKBeNp0J9lKYUujxk=.abd111bc-3625-4c71-bfa2-0a4c1f4d3875@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
 <1uDOe3Oe-hihmDHea2h8vcvRZsKKBeNp0J9lKYUujxk=.abd111bc-3625-4c71-bfa2-0a4c1f4d3875@github.com>
Message-ID: <Fk5LMdAohAl9LtsN5SWHIz4MJMO8U8HHMb8We0pVLKo=.2d8352e3-ba6c-4c25-94a7-b674e6e251f5@github.com>

On Thu, 4 Sep 2025 07:44:52 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> Please, review this patch to fix issue that may occur when reducing allocation merge.
>> 
>> As the assert message describes, the problem is that a `Phi` considered reducible during one invocation of `adjust_scalar_replaceable_state` can later turn out to be non-reducible. This can happen if a subsequent invocation of the same method causes all inputs to the Phi to be NSR; in that case there is no point in reducing the Phi. It can also happen during the propagation of NSR state done by `find_scalar_replaceable_allocs`. 
>> 
>> The change in `revisit_reducible_phi_status` is just a clean-up.
>> The real fix is in `find_scalar_replaceable_allocs`.
>> 
>> Tested on Linux x64/Aarch64 release/fastdebug with JTREG tier1-3.
>
> Hi Cesar, thanks for addressing this issue. I will run some more comprehensive testing and have a look at it in the next days.

Thank you @robcasloz
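To make the failure mode in the description concrete, here is a toy model (not C2's data structures) of re-checking Phi reducibility while NSR state propagates. The two rules are assumptions chosen to mirror the description: a Phi is only worth reducing while at least one input is still scalar-replaceable, and allocations merged by a non-reducible Phi become NSR. Iterating to a fixpoint ensures no Phi keeps a stale "reducible" mark.

```java
import java.util.List;

// Hypothetical, much-simplified model; names do not correspond to HotSpot code.
final class PhiReducibility {
    static final class Alloc {
        final String name;
        boolean scalarReplaceable = true;
        Alloc(String name) { this.name = name; }
    }

    static final class Phi {
        final String name;
        final List<Alloc> inputs;
        boolean reducible = true;
        Phi(String name, Alloc... inputs) { this.name = name; this.inputs = List.of(inputs); }

        // Assumed rule: reducing only pays off if some input can still be scalar replaced.
        boolean canReduce() {
            return inputs.stream().anyMatch(a -> a.scalarReplaceable);
        }
    }

    // States only move from reducible/SR to non-reducible/NSR, so this loop terminates.
    static void propagate(List<Phi> phis) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Phi phi : phis) {
                if (phi.reducible && !phi.canReduce()) {
                    phi.reducible = false; // re-check instead of trusting an earlier verdict
                    changed = true;
                }
                if (!phi.reducible) {
                    // Assumed rule: allocations merged by a non-reducible Phi become NSR.
                    for (Alloc a : phi.inputs) {
                        if (a.scalarReplaceable) {
                            a.scalarReplaceable = false;
                            changed = true;
                        }
                    }
                }
            }
        }
    }
}
```

In a scenario where `p1 = Phi(a, b)` starts out reducible (`a` is NSR, `b` is SR) and another non-reducible Phi also merges `b`, propagation makes `b` NSR, and only the re-check then flips `p1` to non-reducible.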

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27063#issuecomment-3254677197

From snatarajan at openjdk.org  Thu Sep  4 19:54:19 2025
From: snatarajan at openjdk.org (Saranya Natarajan)
Date: Thu, 4 Sep 2025 19:54:19 GMT
Subject: RFR: 8356779: IGV: dump the index of the SafePointNode containing the
 current JVMS during parsing
Message-ID: <dZIsHRR3HjgSVELUlQd4vQX4SeluQeiefbUxn2x_VoQ=.86c48c18-58fe-44ab-9cf8-b32c2c8d8827@github.com>

This PR prints the index of the SafePointNode containing the current JVMS during parsing in IGV. As stated in JBS, the motivation is that there are a lot of nodes during parsing, so it would be useful to know which nodes are currently in the local slots or on the stack when looking at a graph.

-------------

Commit messages:
 - initial fix

Changes: https://git.openjdk.org/jdk/pull/27083/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27083&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8356779
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27083.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27083/head:pull/27083

PR: https://git.openjdk.org/jdk/pull/27083

From sparasa at openjdk.org  Thu Sep  4 20:11:28 2025
From: sparasa at openjdk.org (Srinivas Vamsi Parasa)
Date: Thu, 4 Sep 2025 20:11:28 GMT
Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2 [v2]
In-Reply-To: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
Message-ID: <sbf00znMv9WzwFzEeUmfpmCPJ02Zdp4RK6vIacVtYH8=.27735940-f027-487b-9ca6-9cfe9944da23@github.com>

> This change extends Extended EVEX (EEVEX) to REX2/REX demotion for Intel APX NDD instructions to handle commutative operations when the destination register and the second source register (src2) are the same.
> 
> Currently, EEVEX to REX2/REX demotion is only enabled when the first source (src1) and the destination are the same. This enhancement allows additional cases of valid demotion for commutative instructions (add, imul, and, or, xor).
> 
> For example:
> `eaddl r18, r25, r18` can be encoded as `addl r18, r25` using APX REX2 encoding
> `eaddl r2, r7, r2` can be encoded as `addl r2, r7` using non-APX legacy encoding
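The demotion rule described above can be modeled as a small decision function. This is a hypothetical sketch, not the assembler code: it assumes registers r16..r31 (APX extended GPRs) need REX2 while r0..r15 fit legacy/REX encodings, and that a no-flags (NF) request blocks demotion, mirroring the `no_flags` argument of `is_demotable()` seen in the review snippet. The variable names follow the reviewer's suggested `first/second_operand_demotable` nomenclature.

```java
// Hypothetical model of EEVEX NDD -> two-operand demotion; not HotSpot's assembler API.
final class NddDemotion {
    // Assumption: any extended GPR (r16..r31) forces REX2; otherwise legacy/REX suffices.
    static String encodingFor(int... regs) {
        for (int r : regs) {
            if (r >= 16) return "REX2";
        }
        return "legacy/REX";
    }

    // Returns the demoted form, or null if the instruction must stay in 3-operand EEVEX form.
    static String demote(String op, int dst, int src1, int src2, boolean commutative, boolean noFlags) {
        if (noFlags) return null; // legacy forms always write flags
        boolean firstOperandDemotable = dst == src1;
        boolean secondOperandDemotable = commutative && dst == src2;
        if (firstOperandDemotable) {
            return op + " r" + dst + ", r" + src2 + " (" + encodingFor(dst, src2) + ")";
        }
        if (secondOperandDemotable) {
            // dst = src1 op dst equals dst op src1 for commutative ops, so src1 becomes the source.
            return op + " r" + dst + ", r" + src1 + " (" + encodingFor(dst, src1) + ")";
        }
        return null;
    }
}
```

With this model, `eaddl r18, r25, r18` demotes to `addl r18, r25 (REX2)` and `eaddl r2, r7, r2` to `addl r2, r7 (legacy/REX)`, while a non-commutative op with `dst == src2` stays in NDD form.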

Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:

 - nomenclature change
 - Merge branch 'master' of https://git.openjdk.java.net/jdk into cdemotion
 - remove trailing whitespaces
 - remove unused instructions
 - 8354348: Enable Extended EVEX to REX2/REX demotion for commutative operations with same dst and src2

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26997/files
  - new: https://git.openjdk.org/jdk/pull/26997/files/bd14470a..91962f4f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26997&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26997&range=00-01

  Stats: 26115 lines in 1121 files changed: 16613 ins; 5592 del; 3910 mod
  Patch: https://git.openjdk.org/jdk/pull/26997.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26997/head:pull/26997

PR: https://git.openjdk.org/jdk/pull/26997

From sparasa at openjdk.org  Thu Sep  4 20:15:52 2025
From: sparasa at openjdk.org (Srinivas Vamsi Parasa)
Date: Thu, 4 Sep 2025 20:15:52 GMT
Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2 [v2]
In-Reply-To: <sbf00znMv9WzwFzEeUmfpmCPJ02Zdp4RK6vIacVtYH8=.27735940-f027-487b-9ca6-9cfe9944da23@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
 <sbf00znMv9WzwFzEeUmfpmCPJ02Zdp4RK6vIacVtYH8=.27735940-f027-487b-9ca6-9cfe9944da23@github.com>
Message-ID: <WUQbYp0tHFYrfJu5eXpdI59xxqhOEtsuQCkez1n1zv8=.8835f357-1090-4aa8-90b2-83da856faa0a@github.com>

On Thu, 4 Sep 2025 20:11:28 GMT, Srinivas Vamsi Parasa <sparasa at openjdk.org> wrote:

>> This change extends Extended EVEX (EEVEX) to REX2/REX demotion for Intel APX NDD instructions to handle commutative operations when the destination register and the second source register (src2) are the same.
>> 
>> Currently, EEVEX to REX2/REX demotion is only enabled when the first source (src1) and the destination are the same. This enhancement allows additional cases of valid demotion for commutative instructions (add, imul, and, or, xor).
>> 
>> For example:
>> `eaddl r18, r25, r18` can be encoded as `addl r18, r25` using APX REX2 encoding
>> `eaddl r2, r7, r2` can be encoded as `addl r2, r7` using non-APX legacy encoding
>
> Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
> 
>  - nomenclature change
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into cdemotion
>  - remove trailing whitespaces
>  - remove unused instructions
>  - 8354348: Enable Extended EVEX to REX2/REX demotion for commutative operations with same dst and src2

> Hi @vamsi-parasa, thanks for working on this. I am in the process of validating #26283 and find that additional RA biasing will enable demotion for more cases; with a minimal test case I see the following results
> 
Hi Jatin (@jatin-bhateja), thank you for sharing the information about the register allocation biasing PR you're working on that will improve demotion.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26997#issuecomment-3255441849

From sparasa at openjdk.org  Thu Sep  4 20:15:53 2025
From: sparasa at openjdk.org (Srinivas Vamsi Parasa)
Date: Thu, 4 Sep 2025 20:15:53 GMT
Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2 [v2]
In-Reply-To: <Z05ERz5_lcuvZcuF7YQ1qBv6eeHMPiH1RdpdvE-aTds=.7f197932-f7a6-4845-9d04-a5c29ee7ca0b@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
 <Z05ERz5_lcuvZcuF7YQ1qBv6eeHMPiH1RdpdvE-aTds=.7f197932-f7a6-4845-9d04-a5c29ee7ca0b@github.com>
Message-ID: <Z4MWP_4umYbLXo4pUCHzA1bU_sGj4nBrO_Icj1xLFZw=.18e8991c-000c-48ac-8c9e-693eb17773e0@github.com>

On Mon, 1 Sep 2025 13:17:23 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>> 
>>  - nomenclature change
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into cdemotion
>>  - remove trailing whitespaces
>>  - remove unused instructions
>>  - 8354348: Enable Extended EVEX to REX2/REX demotion for commutative operations with same dst and src2
>
> src/hotspot/cpu/x86/assembler_x86.cpp line 13055:
> 
>> 13053:   bool is_prefixq = (size == EVEX_64bit) ? true : false;
>> 13054:   bool normal_demotion = is_demotable(no_flags, dst_enc, nds_enc);
>> 13055:   bool commutative_demotion = is_commutative && is_demotable(no_flags, dst_enc, src_enc);
> 
> Nomenclature change: instead of normal_demotion and commutative demotion, it will be more appropriate to use first/second_operand_demotable.

Please see the nomenclature change in the updated code, as suggested.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26997#discussion_r2323411449

From sparasa at openjdk.org  Thu Sep  4 20:18:43 2025
From: sparasa at openjdk.org (Srinivas Vamsi Parasa)
Date: Thu, 4 Sep 2025 20:18:43 GMT
Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2 [v2]
In-Reply-To: <Z05ERz5_lcuvZcuF7YQ1qBv6eeHMPiH1RdpdvE-aTds=.7f197932-f7a6-4845-9d04-a5c29ee7ca0b@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
 <Z05ERz5_lcuvZcuF7YQ1qBv6eeHMPiH1RdpdvE-aTds=.7f197932-f7a6-4845-9d04-a5c29ee7ca0b@github.com>
Message-ID: <0X5cvpQZxb1l5Q_8f-iU0K4WtdyFW8ehdPXR2zsnSzo=.7f4f3d03-94db-4482-b5ee-c5f1362d84b5@github.com>

On Tue, 2 Sep 2025 02:40:59 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>> 
>>  - nomenclature change
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into cdemotion
>>  - remove trailing whitespaces
>>  - remove unused instructions
>>  - 8354348: Enable Extended EVEX to REX2/REX demotion for commutative operations with same dst and src2
>
> src/hotspot/cpu/x86/x86_64.ad line 7121:
> 
>> 7119: %{
>> 7120:   predicate(UseAPX);
>> 7121:   match(Set dst (AddI (LoadI src1) src2));
> 
> Will this not be covered by the pattern at line 7103, since ADLC automatically generates a DFA to handle both cases?

Will run experiments to make sure that the RegRegMem pattern also applies to RegMemReg case and remove the newly added match rules if they're redundant. Will update you soon.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26997#discussion_r2323424398

From dlong at openjdk.org  Fri Sep  5 01:42:19 2025
From: dlong at openjdk.org (Dean Long)
Date: Fri, 5 Sep 2025 01:42:19 GMT
Subject: RFR: 8362117: C2:
 compiler/stringopts/TestStackedConcatsAppendUncommonTrap.java fails with a
 wrong result due to invalidated liveness assumptions for data phis [v2]
In-Reply-To: <11lcsXkMGpKMQr60NCKofzldqpnJka1XZtrGRrUai3o=.c2201234-bbf2-465a-b237-cd9fe8505491@github.com>
References: <qvfOeUiGNqYakTHBhSYMZt1unv88E27bXXswOlWixOQ=.67376c9c-fe60-4263-82ca-4dd02dedad0c@github.com>
 <11lcsXkMGpKMQr60NCKofzldqpnJka1XZtrGRrUai3o=.c2201234-bbf2-465a-b237-cd9fe8505491@github.com>
Message-ID: <en-8Qc3n83tGrHV_6HoBLRtp6YMxuU6rm_KoynALoPo=.fcbbf084-4ccc-466d-baff-383d8a60a4bf@github.com>

On Wed, 3 Sep 2025 08:02:04 GMT, Daniel Skantz <dskantz at openjdk.org> wrote:

>> This PR addresses a wrong compilation during string optimizations.
>> 
>> During stacked string concatenation of two StringBuilder links SB1 and SB2, the pattern "append -> Phi -> Region -> (True, False) -> If -> Bool -> CmpP -> Proj (Result) -> toString" may be observed, where toString is the end of SB1, and the simple diamond is part of SB2.
>> 
>> After JDK-8291775, the Bool test to the diamond If is set to a constant zero to allow for folding the simple diamond away during IGVN, while not letting the top() value from the result projection of SB1 propagate through the graph too quickly. The assumption was that any data Phi of the Region would go away during PhaseRemoveUseless as they are no longer live -- I think that in the case of JDK-8291775, the user of phi was the constructor of SB2. However, in the attached test case, the Phi stays live as it's a parameter (input to an append) of SB2 and will be used during the transformation in `copy_string`. When the diamond region is later folded, the Phi's user picks up the wrong input corresponding to the false branch.
>> 
>> The proposed solution is to disable the stacked concatenation optimization for this specific pattern. This might be pragmatic as it's an edge case and there's already a bug tail: JDK-8271341-> JDK-8291775 -> JDK-8362117.
>> 
>> Testing: T1-3 (aed5952).
>> 
>> Extra testing: ran T1-3 on Linux with an instrumented build and verified that the pattern I am excluding in this PR is not seen during any other compilation than that of the proposed regression test.
>
> Daniel Skantz has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - store intermediate calculations
>  - direction convention

This seems to be missing the root cause of the problem.  From what I can tell, we have two string concats here, with the 2nd dependent on the first.  But we incorrectly decide to coalesce them into a single concat, which then causes havoc when eliminate_unneeded_control() starts nuking edges without regard for the dependency.
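For readers unfamiliar with "stacked" concatenations, the following hypothetical Java shape has a second StringBuilder chain (SB2) that depends on the result of the first (SB1) through a diamond: a reference compare on SB1's `toString()` result selects a value that SB2 then appends. It only illustrates the kind of dependency being discussed; whether C2 builds exactly the graph from the PR description depends on inlining and profiling, and this is not the regression test from the PR.

```java
// Hypothetical illustration only; not TestStackedConcatsAppendUncommonTrap.
final class StackedConcat {
    static String concat(String a, String b) {
        String s1 = new StringBuilder().append(a).toString();          // SB1 ends in toString()
        String part = (s1 == b) ? "same" : "different";                 // diamond on SB1's result (a pointer compare)
        return new StringBuilder().append(s1).append(part).toString(); // SB2 appends the merged (Phi) value
    }
}
```

Coalescing SB1 and SB2 into a single concat is only safe if nothing between them, such as the diamond here, still depends on SB1's result.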

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27028#issuecomment-3256794151

From dlong at openjdk.org  Fri Sep  5 01:59:09 2025
From: dlong at openjdk.org (Dean Long)
Date: Fri, 5 Sep 2025 01:59:09 GMT
Subject: RFR: 8362117: C2:
 compiler/stringopts/TestStackedConcatsAppendUncommonTrap.java fails with a
 wrong result due to invalidated liveness assumptions for data phis [v2]
In-Reply-To: <11lcsXkMGpKMQr60NCKofzldqpnJka1XZtrGRrUai3o=.c2201234-bbf2-465a-b237-cd9fe8505491@github.com>
References: <qvfOeUiGNqYakTHBhSYMZt1unv88E27bXXswOlWixOQ=.67376c9c-fe60-4263-82ca-4dd02dedad0c@github.com>
 <11lcsXkMGpKMQr60NCKofzldqpnJka1XZtrGRrUai3o=.c2201234-bbf2-465a-b237-cd9fe8505491@github.com>
Message-ID: <lO71oKNibA1MF-KQQe6gBb5c07wLNEfzycQ7oGCIpIc=.24bcdcb1-c2d7-4151-a3d3-9d9f6276ce94@github.com>

On Wed, 3 Sep 2025 08:02:04 GMT, Daniel Skantz <dskantz at openjdk.org> wrote:

>> This PR addresses a wrong compilation during string optimizations.
>> 
>> During stacked string concatenation of two StringBuilder links SB1 and SB2, the pattern "append -> Phi -> Region -> (True, False) -> If -> Bool -> CmpP -> Proj (Result) -> toString" may be observed, where toString is the end of SB1, and the simple diamond is part of SB2.
>> 
>> After JDK-8291775, the Bool test to the diamond If is set to a constant zero to allow for folding the simple diamond away during IGVN, while not letting the top() value from the result projection of SB1 propagate through the graph too quickly. The assumption was that any data Phi of the Region would go away during PhaseRemoveUseless as they are no longer live -- I think that in the case of JDK-8291775, the user of phi was the constructor of SB2. However, in the attached test case, the Phi stays live as it's a parameter (input to an append) of SB2 and will be used during the transformation in `copy_string`. When the diamond region is later folded, the Phi's user picks up the wrong input corresponding to the false branch.
>> 
>> The proposed solution is to disable the stacked concatenation optimization for this specific pattern. This might be pragmatic as it's an edge case and there's already a bug tail: JDK-8271341-> JDK-8291775 -> JDK-8362117.
>> 
>> Testing: T1-3 (aed5952).
>> 
>> Extra testing: ran T1-3 on Linux with an instrumented build and verified that the pattern I am excluding in this PR is not seen during any other compilation than that of the proposed regression test.
>
> Daniel Skantz has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - store intermediate calculations
>  - direction convention

Hmm, I see now that validate_control_flow() does limit coalescing, but I'm worried that the pattern matching may not catch all problematic cases.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27028#issuecomment-3256815844

From epeter at openjdk.org  Fri Sep  5 06:06:41 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 06:06:41 GMT
Subject: RFR: 8366845: C2 SuperWord: wrong VectorCast after VectorReinterpret
 with swapped src/dst type
Message-ID: <VznBrJ_glHHA4wRo0joE4KO7c5J1F5z7W2u-px1uQzQ=.6cc5a46c-8731-4612-acda-9aab0e177f5a@github.com>

I have seen 3 manifestations of this bug:

1. assert

# Internal Error (.../src/hotspot/cpu/x86/x86.ad:7640), pid=84140, tid=28419
# assert(UseAVX > 2 && VM_Version::supports_avx512dq()) failed: require


2. assert

# Internal Error (.../src/hotspot/share/opto/vectornode.cpp:1601), pid=4022154, tid=4022168
# Error: assert(bt == T_FLOAT) failed


3. Wrong result
When the feature was available but we used the wrong CastVector

It seems that [JDK-8346236](https://bugs.openjdk.org/browse/JDK-8346236) introduced reinterpret nodes to SuperWord:


  } else if (VectorNode::is_reinterpret_opcode(opc)) {
    assert(first->req() == 2 && req() == 2, "only one input expected");
    const TypeVect* vt = TypeVect::make(bt, vlen);
    vn = new VectorReinterpretNode(in1, vt, in1->bottom_type()->is_vect());


Sadly, the `src` and `dst` type are swapped. For JDK25 [JDK-8346236](https://bugs.openjdk.org/browse/JDK-8346236) this had no bad effect yet, since we only cast between HF and short, which are both based on short.

But with [JDK-8329077](https://bugs.openjdk.org/browse/JDK-8329077) we can now do reinterpret between I/F and between D/L. Here swapping has an effect, especially if it is followed by a cast:
The cast determines its input type from the output type of the input node. If that was a reinterpret node with the wrong output type, **we would get a cast with the wrong src type**. We might do a double -> int cast instead of a long -> int cast. That leads to all sorts of issues.

The fuzzer test was only just recently added with [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). It uses MemorySegment, where unaligned float/double access gets handled with long/int memory access and then reinterpret (e.g. `MoveI2F`). But I was able to find examples that just work with `Float.intBitsToFloat` etc.
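A small Java sketch of the loop shape involved (reinterpret followed by a cast) and of why a cast with the wrong source type gives wrong results. The class and method names are hypothetical; the loop is simply a candidate for SuperWord, not one of the PR's tests. Here the reinterpret is long -> double, so the correct cast is double -> int; a cast that wrongly treated its input as long would truncate the raw bits instead.

```java
// Hypothetical example of a "reinterpret then cast" loop.
final class ReinterpretThenCast {
    // Reinterpret long bits as double (MoveL2D-style), then cast double -> int.
    static void reinterpretThenCast(long[] a, int[] r) {
        for (int i = 0; i < a.length; i++) {
            r[i] = (int) Double.longBitsToDouble(a[i]);
        }
    }

    public static void main(String[] args) {
        long bits = Double.doubleToRawLongBits(3.75); // 0x400E000000000000L
        int[] r = new int[1];
        reinterpretThenCast(new long[] { bits }, r);
        System.out.println(r[0]);       // 3: the intended double -> int cast
        System.out.println((int) bits); // 0: what a cast using the wrong (long) source type computes
    }
}
```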

-------------

Commit messages:
 - fix whitespace
 - fix test vector api visibility
 - fix copyright
 - IR rules
 - JDK-8366845

Changes: https://git.openjdk.org/jdk/pull/27100/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27100&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8366845
  Stats: 226 lines in 2 files changed: 225 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27100.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27100/head:pull/27100

PR: https://git.openjdk.org/jdk/pull/27100

From galder at openjdk.org  Fri Sep  5 06:06:42 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Fri, 5 Sep 2025 06:06:42 GMT
Subject: RFR: 8366845: C2 SuperWord: wrong VectorCast after
 VectorReinterpret with swapped src/dst type
In-Reply-To: <VznBrJ_glHHA4wRo0joE4KO7c5J1F5z7W2u-px1uQzQ=.6cc5a46c-8731-4612-acda-9aab0e177f5a@github.com>
References: <VznBrJ_glHHA4wRo0joE4KO7c5J1F5z7W2u-px1uQzQ=.6cc5a46c-8731-4612-acda-9aab0e177f5a@github.com>
Message-ID: <wM2-jj9R6aEVh5NPIoI3ycOdwjpABPLXAkzoA9Xa_8I=.d5042609-8733-4a46-9cba-220fd11e44b8@github.com>

On Thu, 4 Sep 2025 14:42:46 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> I have seen 3 manifestations of this bug:
> 
> 1. assert
> 
> # Internal Error (.../src/hotspot/cpu/x86/x86.ad:7640), pid=84140, tid=28419
> # assert(UseAVX > 2 && VM_Version::supports_avx512dq()) failed: require
> 
> 
> 2. assert
> 
> # Internal Error (.../src/hotspot/share/opto/vectornode.cpp:1601), pid=4022154, tid=4022168
> # Error: assert(bt == T_FLOAT) failed
> 
> 
> 3. Wrong result
> When the feature was available but we used the wrong CastVector
> 
> It seems that [JDK-8346236](https://bugs.openjdk.org/browse/JDK-8346236) introduced reinterpret nodes to SuperWord:
> 
> 
>   } else if (VectorNode::is_reinterpret_opcode(opc)) {
>     assert(first->req() == 2 && req() == 2, "only one input expected");
>     const TypeVect* vt = TypeVect::make(bt, vlen);
>     vn = new VectorReinterpretNode(in1, vt, in1->bottom_type()->is_vect());
> 
> 
> Sadly, the `src` and `dst` type are swapped. For JDK25 [JDK-8346236](https://bugs.openjdk.org/browse/JDK-8346236) this had no bad effect yet, since we only cast between HF and short, which are both based on short.
> 
> But with [JDK-8329077](https://bugs.openjdk.org/browse/JDK-8329077) we can now do reinterpret between I/F and between D/L. Here swapping has an effect, especially if it is followed by a cast:
> The cast determines its input type from the output type of the input node. If that was a reinterpret node with the wrong output type, **we would get a cast with the wrong src type**. We might do a double -> int cast instead of a long -> int cast. That leads to all sorts of issues.
> 
> The fuzzer test was only just recently added with [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). It uses MemorySegment, where unaligned float/double access gets handled with long/int memory access and then reinterpret (e.g. `MoveI2F`). But I was able to find examples that just work with `Float.intBitsToFloat` etc.

Great catch @eme64! Sorry for introducing this issue :$

I was wondering if we'd need more cases being tested? Reversed ones? E.g. `test1` goes from long -> double -> float -> int, do we need something that does int -> float -> double -> long? Does that make sense?

Makes sense @eme64. Happy with the fix and tests :)
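A sketch of the two chains Galder describes, written as loops that SuperWord could vectorize. These are hypothetical methods with the same type sequence, not the actual `test1` from `TestReinterpretAndCast.java`: the forward chain is long -> double (reinterpret) -> float (cast) -> int (reinterpret), and the reversed chain is int -> float (reinterpret) -> double (cast) -> long (reinterpret).

```java
// Hypothetical loops illustrating the forward and reversed type chains.
final class ReinterpretChains {
    // long -> double (reinterpret) -> float (cast) -> int (reinterpret)
    static void forward(long[] a, int[] r) {
        for (int i = 0; i < a.length; i++) {
            r[i] = Float.floatToRawIntBits((float) Double.longBitsToDouble(a[i]));
        }
    }

    // int -> float (reinterpret) -> double (cast) -> long (reinterpret)
    static void reversed(int[] a, long[] r) {
        for (int i = 0; i < a.length; i++) {
            r[i] = Double.doubleToRawLongBits((double) Float.intBitsToFloat(a[i]));
        }
    }
}
```

For non-NaN inputs such as 1.0 the two chains are inverses (0x3FF0000000000000L maps to 0x3F800000 and back), which makes the results easy to check.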

-------------

PR Review: https://git.openjdk.org/jdk/pull/27100#pullrequestreview-3185798345
Marked as reviewed by galder (Author).

PR Review: https://git.openjdk.org/jdk/pull/27100#pullrequestreview-3185920801

From vlivanov at openjdk.org  Fri Sep  5 06:06:42 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Fri, 5 Sep 2025 06:06:42 GMT
Subject: RFR: 8366845: C2 SuperWord: wrong VectorCast after
 VectorReinterpret with swapped src/dst type
In-Reply-To: <VznBrJ_glHHA4wRo0joE4KO7c5J1F5z7W2u-px1uQzQ=.6cc5a46c-8731-4612-acda-9aab0e177f5a@github.com>
References: <VznBrJ_glHHA4wRo0joE4KO7c5J1F5z7W2u-px1uQzQ=.6cc5a46c-8731-4612-acda-9aab0e177f5a@github.com>
Message-ID: <_favZs4uLmC9KBKZiZekexIi8GRq66w1s0tgqZ5gOiw=.abb71cf4-6d27-4343-a9d8-6bcab85125cb@github.com>

On Thu, 4 Sep 2025 14:42:46 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> I have seen 3 manifestations of this bug:
> 
> 1. assert
> 
> # Internal Error (.../src/hotspot/cpu/x86/x86.ad:7640), pid=84140, tid=28419
> # assert(UseAVX > 2 && VM_Version::supports_avx512dq()) failed: require
> 
> 
> 2. assert
> 
> # Internal Error (.../src/hotspot/share/opto/vectornode.cpp:1601), pid=4022154, tid=4022168
> # Error: assert(bt == T_FLOAT) failed
> 
> 
> 3. Wrong result
> When the feature was available but we used the wrong CastVector
> 
> It seems that [JDK-8346236](https://bugs.openjdk.org/browse/JDK-8346236) introduced reinterpret nodes to SuperWord:
> 
> 
>   } else if (VectorNode::is_reinterpret_opcode(opc)) {
>     assert(first->req() == 2 && req() == 2, "only one input expected");
>     const TypeVect* vt = TypeVect::make(bt, vlen);
>     vn = new VectorReinterpretNode(in1, vt, in1->bottom_type()->is_vect());
> 
> 
> Sadly, the `src` and `dst` type are swapped. For JDK25 [JDK-8346236](https://bugs.openjdk.org/browse/JDK-8346236) this had no bad effect yet, since we only cast between HF and short, which are both based on short.
> 
> But with [JDK-8329077](https://bugs.openjdk.org/browse/JDK-8329077) we can now do reinterpret between I/F and between D/L. Here swapping has an effect, especially if it is followed by a cast:
> The cast determines its input type from the output type of the input node. If that was a reinterpret node with the wrong output type, **we would get a cast with the wrong src type**. We might do a double -> int cast instead of a long -> int cast. That leads to all sorts of issues.
> 
> The fuzzer test was only just recently added with [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). It uses MemorySegment, where unaligned float/double access gets handled with long/int memory access and then reinterpret (e.g. `MoveI2F`). But I was able to find examples that just work with `Float.intBitsToFloat` etc.

Looks good.

-------------

Marked as reviewed by vlivanov (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27100#pullrequestreview-3186086231

From thartmann at openjdk.org  Fri Sep  5 06:06:42 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 5 Sep 2025 06:06:42 GMT
Subject: RFR: 8366845: C2 SuperWord: wrong VectorCast after
 VectorReinterpret with swapped src/dst type
In-Reply-To: <VznBrJ_glHHA4wRo0joE4KO7c5J1F5z7W2u-px1uQzQ=.6cc5a46c-8731-4612-acda-9aab0e177f5a@github.com>
References: <VznBrJ_glHHA4wRo0joE4KO7c5J1F5z7W2u-px1uQzQ=.6cc5a46c-8731-4612-acda-9aab0e177f5a@github.com>
Message-ID: <20PTiZWdvMZQNCgMBJSOzD7f7uWC-J8t0bWoXT6NV7Q=.ed065007-326d-4228-b23a-e0964fc8940f@github.com>

On Thu, 4 Sep 2025 14:42:46 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> I have seen 3 manifestations of this bug:
> 
> 1. assert
> 
> # Internal Error (.../src/hotspot/cpu/x86/x86.ad:7640), pid=84140, tid=28419
> # assert(UseAVX > 2 && VM_Version::supports_avx512dq()) failed: require
> 
> 
> 2. assert
> 
> # Internal Error (.../src/hotspot/share/opto/vectornode.cpp:1601), pid=4022154, tid=4022168
> # Error: assert(bt == T_FLOAT) failed
> 
> 
> 3. Wrong result
> When the feature was available but we used the wrong CastVector
> 
> It seems that [JDK-8346236](https://bugs.openjdk.org/browse/JDK-8346236) introduced reinterpret nodes to SuperWord:
> 
> 
>   } else if (VectorNode::is_reinterpret_opcode(opc)) {
>     assert(first->req() == 2 && req() == 2, "only one input expected");
>     const TypeVect* vt = TypeVect::make(bt, vlen);
>     vn = new VectorReinterpretNode(in1, vt, in1->bottom_type()->is_vect());
> 
> 
> Sadly, the `src` and `dst` type are swapped. For JDK25 [JDK-8346236](https://bugs.openjdk.org/browse/JDK-8346236) this had no bad effect yet, since we only cast between HF and short, which are both based on short.
> 
> But with [JDK-8329077](https://bugs.openjdk.org/browse/JDK-8329077) we can now do reinterpret between I/F and between D/L. Here swapping has an effect, especially if it is followed by a cast:
> The cast determines its input type from the output type of the input node. If that was a reinterpret node with the wrong output type, **we would get a cast with the wrong src type**. We might do a double -> int cast instead of a long -> int cast. That leads to all sorts of issues.
> 
> The fuzzer test was only just recently added with [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). It uses MemorySegment, where unaligned float/double access gets handled with long/int memory access and then reinterpret (e.g. `MoveI2F`). But I was able to find examples that just work with `Float.intBitsToFloat` etc.

Looks good to me otherwise. Nice test!

test/hotspot/jtreg/compiler/loopopts/superword/TestReinterpretAndCast.java line 170:

> 168:             int v0 = a[i];
> 169:             float v1 = Float.intBitsToFloat(v0);
> 170:             // Reinterpret: int -> float

Same here.

test/hotspot/jtreg/compiler/loopopts/superword/TestReinterpretAndCast.java line 212:

> 210:             float v2 = v1.floatValue();
> 211:             int v3 = Float.floatToRawIntBits(v2);
> 212:             // Reinterpret: float -> int

The indentation is off here. Please also fix the whitespace errors.

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27100#pullrequestreview-3188100502
PR Review Comment: https://git.openjdk.org/jdk/pull/27100#discussion_r2324177886
PR Review Comment: https://git.openjdk.org/jdk/pull/27100#discussion_r2324177346

From epeter at openjdk.org  Fri Sep  5 06:06:42 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 06:06:42 GMT
Subject: RFR: 8366845: C2 SuperWord: wrong VectorCast after
 VectorReinterpret with swapped src/dst type
In-Reply-To: <wM2-jj9R6aEVh5NPIoI3ycOdwjpABPLXAkzoA9Xa_8I=.d5042609-8733-4a46-9cba-220fd11e44b8@github.com>
References: <VznBrJ_glHHA4wRo0joE4KO7c5J1F5z7W2u-px1uQzQ=.6cc5a46c-8731-4612-acda-9aab0e177f5a@github.com>
 <wM2-jj9R6aEVh5NPIoI3ycOdwjpABPLXAkzoA9Xa_8I=.d5042609-8733-4a46-9cba-220fd11e44b8@github.com>
Message-ID: <MPVzGXAYafzRZ9StYydOgQ8BHuxP8UDuMORE3QmLKjc=.7e0f973e-2956-4090-8303-fcc8999b7eb7@github.com>

On Thu, 4 Sep 2025 15:08:04 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> I have seen 3 manifestations of this bug:
>> 
>> 1. assert
>> 
>> # Internal Error (.../src/hotspot/cpu/x86/x86.ad:7640), pid=84140, tid=28419
>> # assert(UseAVX > 2 && VM_Version::supports_avx512dq()) failed: require
>> 
>> 
>> 2. assert
>> 
>> # Internal Error (.../src/hotspot/share/opto/vectornode.cpp:1601), pid=4022154, tid=4022168
>> # Error: assert(bt == T_FLOAT) failed
>> 
>> 
>> 3. Wrong result
>> When the feature was available but we used the wrong CastVector
>> 
>> It seems that [JDK-8346236](https://bugs.openjdk.org/browse/JDK-8346236) introduced reinterpret nodes to SuperWord:
>> 
>> 
>>   } else if (VectorNode::is_reinterpret_opcode(opc)) {
>>     assert(first->req() == 2 && req() == 2, "only one input expected");
>>     const TypeVect* vt = TypeVect::make(bt, vlen);
>>     vn = new VectorReinterpretNode(in1, vt, in1->bottom_type()->is_vect());
>> 
>> 
>> Sadly, the `src` and `dst` type are swapped. For JDK25 [JDK-8346236](https://bugs.openjdk.org/browse/JDK-8346236) this had no bad effect yet, since we only cast between HF and short, which are both based on short.
>> 
>> But with [JDK-8329077](https://bugs.openjdk.org/browse/JDK-8329077) we can now do reinterpret between I/F and between D/L. Here swapping has an effect, especially if it is followed by a cast:
>> The cast determines its input type from the output type of the input node. If that was a reinterpret node with the wrong output type, **we would get a cast with the wrong src type**. We might do a double -> int cast instead of a long -> int cast. That leads to all sorts of issues.
>> 
>> The fuzzer test was only just recently added with [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). It uses MemorySegment, where unaligned float/double access gets handled with long/int memory access and then reinterpret (e.g. `MoveI2F`). But I was able to find examples that just work with `Float.intBitsToFloat` etc.
>
> Great catch @eme64! Sorry for introducing this issue :$
> 
> I was wondering if we'd need more cases being tested? Reversed ones? E.g. `test1` goes from long -> double -> float -> int, do we need something that does int -> float -> double -> long? Does that make sense?

@galderz Thanks for having a look.

We could add more cases, but I'd also like to integrate rather quickly since this is failing 10x or more on our CI daily.
If it takes too long we would have to back out [JDK-8329077](https://bugs.openjdk.org/browse/JDK-8329077) instead.

So I'd suggest this:
We can file a follow-up RFE that covers more cases, because we basically need to cover all Reinterprets (I2F, F2I, L2D, D2L, HF2S, S2HF) with all compatible casts after them. That is a lot of cases. We can consider using a templated test for it, or just generate them ahead of time.
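A rough sketch of how such a matrix could be enumerated when generating tests ahead of time. It is a hypothetical generator, not an existing test library. It assumes "compatible casts" means a primitive cast from the reinterpret's output type to any other numeric type, and it models HF2S/S2HF as short -> short because Float16 values are carried in a short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator for "reinterpret + cast" test cases.
final class ReinterpretCastMatrix {
    record Reinterpret(String name, String from, String to) {}

    static final List<Reinterpret> REINTERPRETS = List.of(
        new Reinterpret("I2F", "int", "float"),
        new Reinterpret("F2I", "float", "int"),
        new Reinterpret("L2D", "long", "double"),
        new Reinterpret("D2L", "double", "long"),
        new Reinterpret("HF2S", "short", "short"),  // Float16 bits live in a short
        new Reinterpret("S2HF", "short", "short"));

    // Assumption: any primitive numeric type other than the reinterpret's output is a cast target.
    static final List<String> NUMERIC = List.of("byte", "short", "int", "long", "float", "double");

    static List<String> cases() {
        List<String> out = new ArrayList<>();
        for (Reinterpret r : REINTERPRETS) {
            for (String target : NUMERIC) {
                if (!target.equals(r.to())) {
                    out.add(r.name() + " then (" + target + ")");
                }
            }
        }
        return out;
    }
}
```

Even this restricted version yields 30 combinations, which supports the point that the space is better covered by a templated or generated test than by hand-written methods.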

Generally, it is quite difficult to test the "moves" well because of the way that different NaN bits are handled. I'd like to develop generally more templated tests. But it is difficult to do arbitrary expressions, because if you have some float expression that can generate a NaN, and then you "move" it to int with `Float.floatToRawIntBits`, you can get different results if you are in the interpreter or in compiled code.
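A short Java illustration of the NaN problem: `Float.floatToIntBits` collapses every NaN to the canonical pattern `0x7fc00000`, whereas `Float.floatToRawIntBits` exposes whatever payload the computation produced, and that payload may differ between interpreted and compiled code. Comparing canonicalized bits is one possible way to keep such tests stable; the helper name is hypothetical.

```java
// Hypothetical helper illustrating NaN-payload sensitivity of raw bit "moves".
final class NaNBits {
    // Compare two float results after collapsing all NaNs to the canonical NaN pattern.
    static boolean sameCanonicalBits(float a, float b) {
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }

    public static void main(String[] args) {
        float nan = Float.intBitsToFloat(0x7fc00001); // quiet NaN with a non-canonical payload
        System.out.println(Float.isNaN(nan));                               // true
        System.out.println(Integer.toHexString(Float.floatToIntBits(nan))); // 7fc00000
        // The raw bits keep the payload, so they are not a stable oracle across execution modes.
        System.out.println(Integer.toHexString(Float.floatToRawIntBits(nan)));
    }
}
```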

The I2F, F2I, L2D and D2L "moves" are currently also tested with unaligned memory accesses via MemorySegment; that is how we found this bug in the first place.

For now, I think the fix is quite simple and clear, so I'd think it is ok to defer the tests a little.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27100#issuecomment-3254226006

From duke at openjdk.org  Fri Sep  5 06:08:20 2025
From: duke at openjdk.org (duke)
Date: Fri, 5 Sep 2025 06:08:20 GMT
Subject: RFR: 8366747: RISC-V: Improve VerifyMethodHandles for method
 handle linkers [v2]
In-Reply-To: <ryamqp6ust0SfR5AZITl2xskZFY-5y8qLjitIHyZNx0=.752c2e1e-951a-46ed-ab72-5bfd217ab4cb@github.com>
References: <jbHif2K7OR4Am8IbYwqbhYkV9iF_MG3sY08ZV_ll4qc=.95f93d39-e6f8-4ef4-8968-beefe909bc5d@github.com>
 <ryamqp6ust0SfR5AZITl2xskZFY-5y8qLjitIHyZNx0=.752c2e1e-951a-46ed-ab72-5bfd217ab4cb@github.com>
Message-ID: <_EIRPDGLVZ9QPEc95OcNBQgvga1GmohBC7QyniOVM-w=.c56ef657-cc77-4f28-9202-8d99a61e7e37@github.com>

On Wed, 3 Sep 2025 02:40:27 GMT, Anjian Wen <wenanjian at openjdk.org> wrote:

>> According to JDK-8353216, add extra verification logic to MethodHandle::invokeBasic/linkTo* to ensure that holder classes are properly initialized on the RISC-V platform.
>
> Anjian Wen has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add assertion and modify format

@Anjian-Wen 
Your change (at version b5eb3bd13bef6bb886e4bd8e0b91a8fe67f64354) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26938#issuecomment-3257172995

From thartmann at openjdk.org  Fri Sep  5 06:08:11 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 5 Sep 2025 06:08:11 GMT
Subject: RFR: 8366845: C2 SuperWord: wrong VectorCast after
 VectorReinterpret with swapped src/dst type
In-Reply-To: <VznBrJ_glHHA4wRo0joE4KO7c5J1F5z7W2u-px1uQzQ=.6cc5a46c-8731-4612-acda-9aab0e177f5a@github.com>
References: <VznBrJ_glHHA4wRo0joE4KO7c5J1F5z7W2u-px1uQzQ=.6cc5a46c-8731-4612-acda-9aab0e177f5a@github.com>
Message-ID: <s1V2bl-StGYiS5C5uuFvt5v7OKyiDIufgIMkUpLsuVQ=.7894706c-bc09-4d71-94db-83960fe06c34@github.com>

On Thu, 4 Sep 2025 14:42:46 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> I have seen 3 manifestations of this bug:
> 
> 1. assert
> 
> # Internal Error (.../src/hotspot/cpu/x86/x86.ad:7640), pid=84140, tid=28419
> # assert(UseAVX > 2 && VM_Version::supports_avx512dq()) failed: require
> 
> 
> 2. assert
> 
> # Internal Error (.../src/hotspot/share/opto/vectornode.cpp:1601), pid=4022154, tid=4022168
> # Error: assert(bt == T_FLOAT) failed
> 
> 
> 3. Wrong result
> When the feature was available but we used the wrong CastVector
> 
> It seems that [JDK-8346236](https://bugs.openjdk.org/browse/JDK-8346236) introduced reinterpret nodes to SuperWord:
> 
> 
>   } else if (VectorNode::is_reinterpret_opcode(opc)) {
>     assert(first->req() == 2 && req() == 2, "only one input expected");
>     const TypeVect* vt = TypeVect::make(bt, vlen);
>     vn = new VectorReinterpretNode(in1, vt, in1->bottom_type()->is_vect());
> 
> 
> Sadly, the `src` and `dst` types are swapped. For JDK25 [JDK-8346236](https://bugs.openjdk.org/browse/JDK-8346236) this had no bad effect yet, since we only cast between HF and short, which are both based on short.
> 
> But with [JDK-8329077](https://bugs.openjdk.org/browse/JDK-8329077) we can now do reinterpret between I/F and between D/L. Here swapping has an effect, especially if it is followed by a cast:
> The cast determines its input type from the output type of the input node. If that was a reinterpret node with the wrong output type, **we would get a cast with the wrong src type**. We might do a double -> int cast instead of a long -> int cast. That leads to all sorts of issues.
> 
> The fuzzer test was only just recently added with [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). It uses MemorySegment, where unaligned float/double access gets handled with long/int memory access and then reinterpret (e.g. `MoveI2F`). But I was able to find examples that just work with `Float.intBitsToFloat` etc.
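A scalar sketch of why the swapped type matters (the class, the method names and the kernel are hypothetical, not the test from the PR). A loop like `kernel` below contains a D2L reinterpret followed by a long -> int cast, which SuperWord may turn into a VectorReinterpret followed by a VectorCast. Since a reinterpret does not change any bits, a cast that wrongly believes its input is a double performs a numeric double -> int conversion on the original value instead of truncating the long bit pattern.

```java
import java.util.Arrays;

public class ReinterpretCastSketch {
    // Hypothetical kernel: D2L reinterpret, then a long -> int cast per element.
    static void kernel(double[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (int) Double.doubleToRawLongBits(src[i]);
        }
    }

    // Intended semantics: keep the low 32 bits of the raw bit pattern.
    static int longToIntCast(double d) {
        return (int) Double.doubleToRawLongBits(d);
    }

    // What the cast computes if it is told its input is a double:
    // a numeric conversion of the original value.
    static int doubleToIntCast(double d) {
        return (int) d;
    }

    public static void main(String[] args) {
        double[] src = {1.5, -2.25, 1e10, Double.MIN_VALUE};
        int[] dst = new int[src.length];
        kernel(src, dst);
        System.out.println(Arrays.toString(dst));
        // 1.5 has raw bits 0x3FF8000000000000L, whose low 32 bits are 0.
        System.out.println(longToIntCast(1.5) + " vs " + doubleToIntCast(1.5));
    }
}
```

For `1.5` the two interpretations give `0` and `1`, and for `Double.MIN_VALUE` (raw bits `0x1`) they give `1` and `0`, so a wrong cast is visible as a wrong result whenever the hardware supports the cast.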

Looks good!

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27100#pullrequestreview-3188108067

From wenanjian at openjdk.org  Fri Sep  5 06:16:16 2025
From: wenanjian at openjdk.org (Anjian Wen)
Date: Fri, 5 Sep 2025 06:16:16 GMT
Subject: Integrated: 8366747: RISC-V: Improve VerifyMethodHandles for method
 handle linkers
In-Reply-To: <jbHif2K7OR4Am8IbYwqbhYkV9iF_MG3sY08ZV_ll4qc=.95f93d39-e6f8-4ef4-8968-beefe909bc5d@github.com>
References: <jbHif2K7OR4Am8IbYwqbhYkV9iF_MG3sY08ZV_ll4qc=.95f93d39-e6f8-4ef4-8968-beefe909bc5d@github.com>
Message-ID: <rHKWOqwsIyMGAL0YjHZcz1jerAmn7MN2Gpvzo4l3W5E=.7f19631a-fd1c-434d-8c72-f743174503c4@github.com>

On Tue, 26 Aug 2025 09:18:14 GMT, Anjian Wen <wenanjian at openjdk.org> wrote:

> According to JDK-8353216, add extra verification logic to MethodHandle::invokeBasic/linkTo* to ensure that holder classes are properly initialized on the RISC-V platform.

This pull request has now been integrated.

Changeset: 0d7f8f83
Author:    Anjian Wen <wenanjian at openjdk.org>
Committer: Fei Yang <fyang at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/0d7f8f83c7a674f5da4b93d66a24f9ce5ba46011
Stats:     54 lines in 2 files changed: 48 ins; 1 del; 5 mod

8366747: RISC-V: Improve VerifyMethodHandles for method handle linkers

Reviewed-by: fyang, dzhang

-------------

PR: https://git.openjdk.org/jdk/pull/26938

From duke at openjdk.org  Fri Sep  5 06:30:34 2025
From: duke at openjdk.org (erifan)
Date: Fri, 5 Sep 2025 06:30:34 GMT
Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI
 expand operation [v3]
In-Reply-To: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
References: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
Message-ID: <TuBiSPqTkozHX6ZgMqeWkjxvoZR7qZgnZQH9q85B_cs=.0a26c93f-537c-4277-ae1c-7ea2ce0dbc1e@github.com>

> Currently, on AArch64, the VectorAPI `expand` operation is intrinsified for 32-bit and 64-bit types only when SVE2 is available. In the following cases, `expand` has not yet been intrinsified:
> 1. **Subword types** on SVE2-capable hardware.
> 2. **All types** on NEON and SVE1 environments.
> 
> As a result, `expand` API performance is very poor in these scenarios. This patch intrinsifies the `expand` operation in the above environments.
> 
> Since there are no native instructions directly corresponding to `expand` in these cases, this patch mainly leverages the `TBL` instruction to implement `expand`. To compute the index input for `TBL`, the prefix sum algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used. Take a 128-bit byte vector on SVE2 as an example:
> 
> To compute: dst = src.expand(mask)
> Data direction: high <== low
> Input:
>   src                         = p o n m l k j i h g f e d c b a
>   mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
> Expected result:
>   dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
> 
> Step 1: calculate the index input of the TBL instruction.
> 
> // Set tmp1 as all 0 vector.
> tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 
> // Move the mask bits from the predicate register to a vector register.
> // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
> tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
> 
> // Shift the entire register. Prefix sum algorithm.
> dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
> tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
> 
> dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
> tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
> 
> dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
> tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
> 
> dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
> tmp2 += dst                   = 8 8 8 7 6 6 6 5 4 4 4 3 2 2 2 1
> 
> // Clear inactive elements.
> dst = sel(mask, tmp2, tmp1)   = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1
> 
> // Set the inactive lane value to -1 and set the active lane to the target index.
> dst -= 1                      = -1 -1 7 6 -1 -1 5 4 -1 -1 3 2 -1 -1 1 0
> 
> Step 2: shuffle the source vector elements to the target vector
> 
> tbl(dst, src, dst)            = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
> 
> 
> The same algorithm is used for NEON and SVE1, but with different instructions where appropriate.
> 
> The following benchmarks are from panama-...
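The index computation walked through above can be modelled with plain arrays. This is a scalar sketch, not the HotSpot implementation: `ExpandSketch`, `expand` and `expandReference` are hypothetical names, lane 0 is the lowest lane, and "shift the whole register by 8/16/32/64 bits" becomes "shift by 1/2/4/8 lanes". The number of shift-and-add steps is log2 of the lane count, which is why the sequence depends on the vector length.

```java
import java.util.Arrays;

public class ExpandSketch {
    // Scalar model of the TBL-based expand; lane 0 is the lowest lane.
    static int[] expand(int[] src, boolean[] mask) {
        int n = src.length;
        // Step 1a: mask lanes become 1 (active) or 0 (inactive).
        int[] sum = new int[n];
        for (int i = 0; i < n; i++) sum[i] = mask[i] ? 1 : 0;
        // Step 1b: inclusive prefix sum via "shift up by k lanes, then add"
        // for k = 1, 2, 4, ... (the << 8, << 16, ... steps for byte lanes).
        for (int k = 1; k < n; k <<= 1) {
            int[] shifted = new int[n];
            for (int i = k; i < n; i++) shifted[i] = sum[i - k];
            for (int i = 0; i < n; i++) sum[i] += shifted[i];
        }
        // Step 1c: clear inactive lanes, then subtract 1, so active lanes hold
        // their source index and inactive lanes hold -1.
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = (mask[i] ? sum[i] : 0) - 1;
        // Step 2: table lookup; out-of-range indices (here -1) produce 0,
        // which matches how TBL treats out-of-range indices.
        int[] dst = new int[n];
        for (int i = 0; i < n; i++) dst[i] = (idx[i] >= 0 && idx[i] < n) ? src[idx[i]] : 0;
        return dst;
    }

    // Straightforward reference: the j-th active lane receives src[j].
    static int[] expandReference(int[] src, boolean[] mask) {
        int[] dst = new int[src.length];
        int j = 0;
        for (int i = 0; i < src.length; i++) {
            if (mask[i]) dst[i] = src[j++];
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] src = new int[16];
        boolean[] mask = new boolean[16];
        for (int i = 0; i < 16; i++) {
            src[i] = i + 1;          // 1, 2, 3, ... stand for lanes a, b, c, ...
            mask[i] = (i % 4) < 2;   // mask = ... 0 0 1 1 (high <== low)
        }
        System.out.println(Arrays.toString(expand(src, mask)));
    }
}
```

With this input the sketch reproduces the worked example: lanes 0 and 1 receive `a` and `b`, lanes 4 and 5 receive `c` and `d`, and so on, with zeros in the inactive lanes.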

erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:

 - Align code example data for better reading
 - Merge branch 'master' into JDK-8363989
 - Improve the comment of the vector expand implementation
 - Merge branch 'master' into JDK-8363989
 - 8363989: AArch64: Add missing backend support of VectorAPI expand operation
   
   Currently, on AArch64, the VectorAPI `expand` operation is intrinsified
   for 32-bit and 64-bit types only when SVE2 is available. In the following
   cases, `expand` has not yet been intrinsified:
   1. **Subword types** on SVE2-capable hardware.
   2. **All types** on NEON and SVE1 environments.
   
   As a result, `expand` API performance is very poor in these scenarios.
   This patch intrinsifies the `expand` operation in the above environments.
   
   Since there are no native instructions directly corresponding to `expand`
   in these cases, this patch mainly leverages the `TBL` instruction to
   implement `expand`. To compute the index input for `TBL`, the prefix sum
   algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used.
   Take a 128-bit byte vector on SVE2 as an example:
   ```
   To compute: dst = src.expand(mask)
   Data direction: high <== low
   Input:
     src                         = p o n m l k j i h g f e d c b a
     mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
   Expected result:
     dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
   ```
   Step 1: calculate the index input of the TBL instruction.
   ```
   // Set tmp1 as all 0 vector.
   tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   
   // Move the mask bits from the predicate register to a vector register.
   // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
   tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
   
   // Shift the entire register. Prefix sum algorithm.
   dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
   tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
   
   dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
   tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
   
   dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
   tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
   
   dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
   tmp2 += dst                   = 8 8 8 7 6 6 6 5 4 4 4 3 2 2 2 1
   
   // Clear inactive elements.
   dst = sel(mask, tmp2, tmp1)   = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1
   
   // Set the inactive lane value to -1 and set the active lane to the target index.
   dst -= 1                      = -1 -1 7 6 -1 -1 5 4 -1 -1 3 2 -1 -1 1 0
   ```
   Step 2: shuffle the source vector elements to the target vector
   ```
   tbl(dst, src, dst)            = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
   ```
   
   The same algorithm is used for NEON and SVE1, but with different
   instructions where appropriate.
   
   The following benchmarks are from panama-vector/vectorIntrinsics.
   
   On Nvidia Grace machine with option `-XX:UseSVE=2`:
   ```
   Benchmark		Unit	Before		Score Error	After		Score Error	Uplift
   Byte128Vector.expand	ops/ms	1791.022366	5.619883	9633.388683	1.968788	5.37
   Double128Vector.expand	ops/ms	4489.255846	0.48485		4488.772949	0.491596	0.99
   Float128Vector.expand	ops/ms	8863.02424	6.888087	8908.352235	51.487453	1
   Int128Vector.expand	ops/ms	8873.485683	3.275682	8879.635643	1.243863	1
   Long128Vector.expand	ops/ms	4485.1149	4.458073	4489.365269	0.851093	1
   Short128Vector.expand	ops/ms	792.068834	2.640398	5880.811288	6.40683		7.42
   Byte64Vector.expand	ops/ms	854.455002	8.548982	5999.046295	37.209987	7.02
   Double64Vector.expand	ops/ms	46.49763	0.104773	46.526043	0.102451	1
   Float64Vector.expand	ops/ms	4510.596811	0.504477	4509.984244	1.519178	0.99
   Int64Vector.expand	ops/ms	4508.778322	1.664461	4535.216611	26.742484	1
   Long64Vector.expand	ops/ms	45.665462	0.705485	46.496232	0.075648	1.01
   Short64Vector.expand	ops/ms	394.527324	1.284691	3860.199621	0.720015	9.78
   ```
   
   On Nvidia Grace machine with option `-XX:UseSVE=1`:
   ```
   Benchmark		Unit	Before		Score Error	After		Score Error	Uplift
   Byte128Vector.expand	ops/ms	1767.314171	12.431526	9630.892248	1.478813	5.44
   Double128Vector.expand	ops/ms	197.614381	0.945541	2416.075281	2.664325	12.22
   Float128Vector.expand	ops/ms	390.878183	2.089234	3844.011978	3.792751	9.83
   Int128Vector.expand	ops/ms	394.550044	2.025371	3843.280133	3.528017	9.74
   Long128Vector.expand	ops/ms	198.366863	0.651726	2423.234639	4.911434	12.21
   Short128Vector.expand	ops/ms	790.044704	3.339363	5885.595035	1.440598	7.44
   Byte64Vector.expand	ops/ms	853.479119	7.158898	5942.750116	1.054905	6.96
   Double64Vector.expand	ops/ms	46.550458	0.079191	46.423053	0.057554	0.99
   Float64Vector.expand	ops/ms	197.977215	1.156535	2445.010767	1.992358	12.34
   Int64Vector.expand	ops/ms	198.326857	1.02785		2444.211583	2.5432		12.32
   Long64Vector.expand	ops/ms	46.526513	0.25779		45.984253	0.566691	0.98
   Short64Vector.expand	ops/ms	398.649412	1.87764		3837.495773	3.528926	9.62
   ```
   
   On Nvidia Grace machine with option `-XX:UseSVE=0`:
   ```
   Benchmark		Unit	Before		Score Error	After		Score Error	Uplift
   Byte128Vector.expand	ops/ms	1802.98702	6.906394	9427.491602	2.067934	5.22
   Double128Vector.expand	ops/ms	198.498191	0.429071	1190.476326	0.247358	5.99
   Float128Vector.expand	ops/ms	392.849005	2.034676	2373.195574	2.006566	6.04
   Int128Vector.expand	ops/ms	395.69179	2.194773	2372.084745	2.058303	5.99
   Long128Vector.expand	ops/ms	198.191673	1.476362	1189.712301	1.006821	6
   Short128Vector.expand	ops/ms	795.785831	5.62611		4731.514053	2.365213	5.94
   Byte64Vector.expand	ops/ms	843.549268	7.174254	5865.556155	37.639415	6.95
   Double64Vector.expand	ops/ms	45.943599	0.484743	46.529755	0.111551	1.01
   Float64Vector.expand	ops/ms	193.945993	0.943338	1463.836772	0.618393	7.54
   Int64Vector.expand	ops/ms	194.168021	0.492286	1473.004575	8.802656	7.58
   Long64Vector.expand	ops/ms	46.570488	0.076372	46.696353	0.078649	1
   Short64Vector.expand	ops/ms	387.973334	2.367312	2920.428114	0.863635	7.52
   ```
   
   Some JTReg test cases are added for the above changes. The patch was
   tested on both aarch64 and x64; all tier1, tier2 and tier3 tests passed.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26740/files
  - new: https://git.openjdk.org/jdk/pull/26740/files/a1777974..8f1f8aaf

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26740&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26740&range=01-02

  Stats: 22892 lines in 964 files changed: 15292 ins; 4162 del; 3438 mod
  Patch: https://git.openjdk.org/jdk/pull/26740.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26740/head:pull/26740

PR: https://git.openjdk.org/jdk/pull/26740

From duke at openjdk.org  Fri Sep  5 06:30:34 2025
From: duke at openjdk.org (erifan)
Date: Fri, 5 Sep 2025 06:30:34 GMT
Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI
 expand operation [v2]
In-Reply-To: <_VZ4L0DTdTxRz1XzG4QIyYY7TyCHzroEOeOV21N17_Y=.e92ad3fd-3e94-4bf5-a570-dc8cc8c9e9ed@github.com>
References: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
 <_YDJIkwt0sdsOAMfNNn1fHTVwH0SHDpJv5NpQoxnfiA=.a0ddb5f3-00f1-47e2-93da-f47cb3f62288@github.com>
 <kOz6uh-zcSXfhMf5UioaIXmQ3vbU18UMgncgOslfyv0=.c6d6ef2c-8b9e-48c9-b78d-df96e07d7832@github.com>
 <_VZ4L0DTdTxRz1XzG4QIyYY7TyCHzroEOeOV21N17_Y=.e92ad3fd-3e94-4bf5-a570-dc8cc8c9e9ed@github.com>
Message-ID: <tmQTkWs5K_wMB2_7SPwtu2-7YWGG2z8h-m6wgIhRayc=.03d70fa4-2c03-4458-9fc7-fc558cdc224f@github.com>

On Thu, 4 Sep 2025 08:01:40 GMT, erifan <duke at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2819:
>> 
>>> 2817:   subv(dst, size, tmp2, tmp1);
>>> 2818:   // dst = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1
>>> 2819:   tbl(dst, size, src, 1, dst);
>> 
>> It would make it a little easier to read the example if the numbers were aligned.
>> Now the minus sign disrupts that a little. Maybe leave 2 spaces if the number is positive?
>
> Makes sense, I'll update it in the following commit.

Done, thanks!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26740#discussion_r2324218817

From epeter at openjdk.org  Fri Sep  5 06:33:16 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 06:33:16 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats [v3]
In-Reply-To: <q2XDj9jri77RC-AAE0lOHWpiD-5hrmSq3niaDmrm81Y=.13f17498-c1f1-499b-a2dd-d626d7b919fd@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <s-iQ75fUMDx85bnb148Js3YzCsbhCVcFfy6rBUN7u54=.39d32f15-9fc9-4059-8a88-6974a4e97170@github.com>
 <q2XDj9jri77RC-AAE0lOHWpiD-5hrmSq3niaDmrm81Y=.13f17498-c1f1-499b-a2dd-d626d7b919fd@github.com>
Message-ID: <bq_pX0RUF1Se5VZU2iKxjnfwh2G3tG2p2TjzGhOakWs=.940ac6ae-e641-4d84-b78e-1a58684a770e@github.com>

On Thu, 4 Sep 2025 09:39:54 GMT, erifan <duke at openjdk.org> wrote:

>> erifan has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Code style fixes
>
> The test failure should be unrelated to this PR; I can see it in other PRs' test results, like https://github.com/egahlin/jdk/actions/runs/17436633376/job/49510579213

@erifan There are only unrelated test failures, so good on testing front.

The patch looks reasonable, though I'm not an aarch64 expert.

Is the issue at all observable from Java? With the wrong encoding, could there be a wrong result that we could test in a jtreg test?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26951#issuecomment-3257224069

From dskantz at openjdk.org  Fri Sep  5 06:33:11 2025
From: dskantz at openjdk.org (Daniel Skantz)
Date: Fri, 5 Sep 2025 06:33:11 GMT
Subject: RFR: 8362117: C2:
 compiler/stringopts/TestStackedConcatsAppendUncommonTrap.java fails with a
 wrong result due to invalidated liveness assumptions for data phis [v2]
In-Reply-To: <11lcsXkMGpKMQr60NCKofzldqpnJka1XZtrGRrUai3o=.c2201234-bbf2-465a-b237-cd9fe8505491@github.com>
References: <qvfOeUiGNqYakTHBhSYMZt1unv88E27bXXswOlWixOQ=.67376c9c-fe60-4263-82ca-4dd02dedad0c@github.com>
 <11lcsXkMGpKMQr60NCKofzldqpnJka1XZtrGRrUai3o=.c2201234-bbf2-465a-b237-cd9fe8505491@github.com>
Message-ID: <t5HDv9HYOlPGHwJcacoaXp0RNNmT-hR2FfpVYrfOg6E=.978348e2-3356-4a71-a084-66f4159960b0@github.com>

On Wed, 3 Sep 2025 08:02:04 GMT, Daniel Skantz <dskantz at openjdk.org> wrote:

>> This PR addresses a wrong compilation during string optimizations.
>> 
>> During stacked string concatenation of two StringBuilder links SB1 and SB2, the pattern "append -> Phi -> Region -> (True, False) -> If -> Bool -> CmpP -> Proj (Result) -> toString" may be observed, where toString is the end of SB1, and the simple diamond is part of SB2.
>> 
>> After JDK-8291775, the Bool test to the diamond If is set to a constant zero to allow for folding the simple diamond away during IGVN, while not letting the top() value from the result projection of SB1 propagate through the graph too quickly. The assumption was that any data Phi of the Region would go away during PhaseRemoveUseless as it would no longer be live -- I think that in the case of JDK-8291775, the user of the Phi was the constructor of SB2. However, in the attached test case, the Phi stays live as it's a parameter (input to an append) of SB2 and will be used during the transformation in `copy_string`. When the diamond region is later folded, the Phi's user picks up the wrong input corresponding to the false branch.
>> 
>> The proposed solution is to disable the stacked concatenation optimization for this specific pattern. This might be pragmatic as it's an edge case and there's already a bug tail: JDK-8271341 -> JDK-8291775 -> JDK-8362117.
>> 
>> Testing: T1-3 (aed5952).
>> 
>> Extra testing: ran T1-3 on Linux with an instrumented build and verified that the pattern I am excluding in this PR is not seen during any other compilation than that of the proposed regression test.
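A rough Java-level picture of the pattern described above might look like the hypothetical method below. The names are made up, this is not the regression test from the PR, and whether C2 produces exactly this IR shape depends on its parsing and inlining decisions. SB1 ends in `toString()`, and SB2 appends a value selected by a diamond whose condition compares SB1's result against null (a CmpP on the result projection), so the diamond's Phi is an input to an append of SB2.

```java
public class StackedConcatShape {
    // Hypothetical shape: SB1's result feeds both a null check (If/Region/Phi)
    // and the first append of SB2; the Phi is another append argument of SB2.
    static String stacked(String a, String b) {
        String s1 = new StringBuilder().append(a).toString();             // SB1
        String picked = (s1 == null) ? "unreachable" : b;                 // diamond -> Phi
        return new StringBuilder().append(s1).append(picked).toString();  // SB2 uses the Phi
    }

    public static void main(String[] args) {
        // StringBuilder.toString() never returns null, so only the
        // s1 != null branch is taken; compiled code must agree.
        for (int i = 0; i < 20_000; i++) {
            if (!stacked("foo", "bar").equals("foobar")) {
                throw new AssertionError("wrong result at iteration " + i);
            }
        }
        System.out.println(stacked("foo", "bar"));
    }
}
```

Running the method in a loop, as `main` does, gives the JIT a chance to compile it, so a miscompilation of such a shape would surface as a wrong concatenation result rather than a crash.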
>
> Daniel Skantz has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - store intermediate calculations
>  - direction convention

Yes, `validate_control_flow()` is used for individual and coalesced concatenations, but recent bugs have shown that just re-using those same checks for coalesced concatenations is not sufficient, in particular when the result of SB1 is used in unexpected ways in SB2. I am not convinced that we have covered all the cases yet. Would it make sense to fix this issue now and then pursue the fuzzing approach in a follow-up RFE to cover more patterns, or is there a more general pattern we could already prevent here?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27028#issuecomment-3257224173

From duke at openjdk.org  Fri Sep  5 06:33:12 2025
From: duke at openjdk.org (erifan)
Date: Fri, 5 Sep 2025 06:33:12 GMT
Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI
 expand operation [v2]
In-Reply-To: <_VZ4L0DTdTxRz1XzG4QIyYY7TyCHzroEOeOV21N17_Y=.e92ad3fd-3e94-4bf5-a570-dc8cc8c9e9ed@github.com>
References: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
 <_YDJIkwt0sdsOAMfNNn1fHTVwH0SHDpJv5NpQoxnfiA=.a0ddb5f3-00f1-47e2-93da-f47cb3f62288@github.com>
 <kOz6uh-zcSXfhMf5UioaIXmQ3vbU18UMgncgOslfyv0=.c6d6ef2c-8b9e-48c9-b78d-df96e07d7832@github.com>
 <_VZ4L0DTdTxRz1XzG4QIyYY7TyCHzroEOeOV21N17_Y=.e92ad3fd-3e94-4bf5-a570-dc8cc8c9e9ed@github.com>
Message-ID: <5_oA0GhSFquOBfMBsQ7atQZBOR8R14Qc1GiDMS7Xbsc=.491d977f-8272-4415-9db1-7e8a12d41a6b@github.com>

On Thu, 4 Sep 2025 08:00:14 GMT, erifan <duke at openjdk.org> wrote:

>> test/hotspot/jtreg/compiler/vectorapi/VectorExpandTest.java line 48:
>> 
>>> 46:     static final VectorSpecies<Float> F_SPECIES = FloatVector.SPECIES_MAX;
>>> 47:     static final VectorSpecies<Long> L_SPECIES = LongVector.SPECIES_MAX;
>>> 48:     static final VectorSpecies<Double> D_SPECIES = DoubleVector.SPECIES_MAX;
>> 
>> Would it make sense to run these tests with various vector sizes?
>> Because it seems your algorithm depends on `vector_length_in_bytes` in the prefix sum algo.
>
> Since we already have correctness tests for `expand` on **all vector types** under `test/jdk/jdk/incubator/vector/`, such as https://github.com/openjdk/jdk/blob/986ecff5f9b16f1b41ff15ad94774d65f3a4631d/test/jdk/jdk/incubator/vector/Byte128VectorTests.java#L5375, this test primarily verifies that the expected IR is generated. So, I think this is sufficient?
> 
> I've tested this PR locally on a 128-bit SVE2 machine, a 256-bit SVE machine, and a 512-bit QEMU environment, and all tests passed.

By the way, `vector_length_in_bytes` doesn't affect the IR generation.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26740#discussion_r2324223475

From duke at openjdk.org  Fri Sep  5 06:41:14 2025
From: duke at openjdk.org (erifan)
Date: Fri, 5 Sep 2025 06:41:14 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats [v3]
In-Reply-To: <bq_pX0RUF1Se5VZU2iKxjnfwh2G3tG2p2TjzGhOakWs=.940ac6ae-e641-4d84-b78e-1a58684a770e@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <s-iQ75fUMDx85bnb148Js3YzCsbhCVcFfy6rBUN7u54=.39d32f15-9fc9-4059-8a88-6974a4e97170@github.com>
 <q2XDj9jri77RC-AAE0lOHWpiD-5hrmSq3niaDmrm81Y=.13f17498-c1f1-499b-a2dd-d626d7b919fd@github.com>
 <bq_pX0RUF1Se5VZU2iKxjnfwh2G3tG2p2TjzGhOakWs=.940ac6ae-e641-4d84-b78e-1a58684a770e@github.com>
Message-ID: <_C1JCBLGEQNTp2YPZOvu403adPXceeh1Dg6MYqWiqdw=.e94c5fdf-f2e6-4f95-b390-2cb3106673c7@github.com>

On Fri, 5 Sep 2025 06:30:16 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> Is the issue at all observable from Java? With the wrong encoding, could there be a wrong result that we could test in a jtreg test?

No, this is not observable from Java, because the JVM currently doesn't use `sve_cpy` to copy negative floating-point numbers, only positive ones.

I discovered this issue while trying to use this instruction to optimize `VectorMask.toVector()`, which needs to do `sve_cpy(-1.0)`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26951#issuecomment-3257238951

From epeter at openjdk.org  Fri Sep  5 06:46:13 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 06:46:13 GMT
Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI
 expand operation [v2]
In-Reply-To: <5_oA0GhSFquOBfMBsQ7atQZBOR8R14Qc1GiDMS7Xbsc=.491d977f-8272-4415-9db1-7e8a12d41a6b@github.com>
References: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
 <_YDJIkwt0sdsOAMfNNn1fHTVwH0SHDpJv5NpQoxnfiA=.a0ddb5f3-00f1-47e2-93da-f47cb3f62288@github.com>
 <kOz6uh-zcSXfhMf5UioaIXmQ3vbU18UMgncgOslfyv0=.c6d6ef2c-8b9e-48c9-b78d-df96e07d7832@github.com>
 <_VZ4L0DTdTxRz1XzG4QIyYY7TyCHzroEOeOV21N17_Y=.e92ad3fd-3e94-4bf5-a570-dc8cc8c9e9ed@github.com>
 <5_oA0GhSFquOBfMBsQ7atQZBOR8R14Qc1GiDMS7Xbsc=.491d977f-8272-4415-9db1-7e8a12d41a6b@github.com>
Message-ID: <VvDUft-FUw5EDY72-3EY5JG-pmA1d-0uakUDd4ikhN4=.2d620739-c161-410b-aca1-167e171f2b59@github.com>

On Fri, 5 Sep 2025 06:30:30 GMT, erifan <duke at openjdk.org> wrote:

>> Since we already have correctness tests for `expand` on **all vector types** under `test/jdk/jdk/incubator/vector/`, such as https://github.com/openjdk/jdk/blob/986ecff5f9b16f1b41ff15ad94774d65f3a4631d/test/jdk/jdk/incubator/vector/Byte128VectorTests.java#L5375, this test primarily verifies that the expected IR is generated. So, I think this is sufficient?
>> 
>> I've tested this PR locally on a 128-bit SVE2 machine, a 256-bit SVE machine, and a 512-bit QEMU environment, and all tests passed.
>
> By the way, `vector_length_in_bytes` doesn't affect the IR generation.

Ok, that sounds good, as long as we test all vector types elsewhere already :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26740#discussion_r2324243461

From epeter at openjdk.org  Fri Sep  5 07:17:10 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 07:17:10 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats [v3]
In-Reply-To: <s-iQ75fUMDx85bnb148Js3YzCsbhCVcFfy6rBUN7u54=.39d32f15-9fc9-4059-8a88-6974a4e97170@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <s-iQ75fUMDx85bnb148Js3YzCsbhCVcFfy6rBUN7u54=.39d32f15-9fc9-4059-8a88-6974a4e97170@github.com>
Message-ID: <7Qekyc5L6ZJS4G9DqSp6Ur68K-Jqv-EgPYcUMK0CrOc=.4a4331ea-ed91-4cd9-92a6-fd84b175dc0c@github.com>

On Wed, 3 Sep 2025 10:02:24 GMT, erifan <duke at openjdk.org> wrote:

>> The `sve_cpy` instruction is not correctly implemented for negative floating-point values. The issues include:
>> 
>> 1. When a negative floating-point number (e.g. `-1.0`) is passed, the `checked_cast<int8_t>(pack(d))` check fails. For example, assume `d = -1.0`:
>> - `pack(-1.0)` returns an unsigned int with bit 7 (the sign bit of the 8-bit encoding) set, i.e., `0xf0`.
>> - `checked_cast<int8_t>(0xf0)` casts `0xf0` to an int8_t value, which is `-16`.
>> - Casting this int8_t `-16` back to unsigned int results in `0xfffffff0`.
>> - The check compares `0xf0` to `0xfffffff0`, which obviously fails.
>> 
>> 2. Additionally, the encoding of the negative floating-point number is incorrect:
>> - The imm8 field can fall outside the valid range of **[-128, 127]**.
>> - Bit **13** should be encoded as **0** for floating-point numbers.
>> 
>> This PR fixes these issues and renames floating-point `sve_cpy` as `sve_fcpy`.
>> 
>> Some test cases are added to aarch64-asmtest.py, and all tests passed.
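The round-trip failure in point 1 can be reproduced with a small Java model. This is a hypothetical sketch, not HotSpot's `pack`: it assumes the standard AArch64 8-bit floating-point immediate format, which can represent ±(16 + f)/16 × 2^n with f in [0, 15] and n in [-3, 4], stored as sign:exp3:frac4. Narrowing to `byte` and widening back to `int` sign-extends, which roughly mirrors what `checked_cast<int8_t>` checks when it casts the value back to its source type.

```java
public class Fp8ImmSketch {
    // Encodes d as an 8-bit FP immediate: bit 7 = sign, bits 6..4 = exponent,
    // bits 3..0 = top 4 fraction bits. Throws if d is not representable.
    static int pack(double d) {
        long bits = Double.doubleToRawLongBits(d);
        int sign = (int) (bits >>> 63);
        int n = (int) ((bits >>> 52) & 0x7ff) - 1023;   // unbiased exponent
        long fraction = bits & ((1L << 52) - 1);          // 52-bit fraction field
        if (n < -3 || n > 4 || (fraction & ((1L << 48) - 1)) != 0) {
            throw new IllegalArgumentException("not encodable: " + d);
        }
        int frac4 = (int) (fraction >>> 48);              // top 4 fraction bits
        int exp3 = (n + 7) & 7;                           // n = -3..0 -> 4..7, n = 1..4 -> 0..3
        return (sign << 7) | (exp3 << 4) | frac4;
    }

    // Java analogue of the failing check: narrow to a signed byte (like int8_t)
    // and widen back; widening sign-extends, so 0xf0 becomes 0xfffffff0.
    static boolean roundTripsThroughSignedByte(int packed) {
        byte narrowed = (byte) packed;
        int widened = narrowed;
        return widened == packed;
    }

    public static void main(String[] args) {
        for (double d : new double[] {1.0, -1.0, 2.0, -0.5}) {
            int packed = pack(d);
            System.out.printf("%5.2f -> 0x%02x, round trip via byte: %b%n",
                              d, packed, roundTripsThroughSignedByte(packed));
        }
    }
}
```

Every negative value has bit 7 set, so it fails a round trip that compares the unsigned encoding with its sign-extended byte, while positive values pass. This is consistent with the observation that only negative floats were affected.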
>
> erifan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Code style fixes

Alright, let me rubber stamp it then. Looks reasonable and tests are passing on our side.
Thanks for fixing this :)

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26951#pullrequestreview-3188266800

From epeter at openjdk.org  Fri Sep  5 07:27:10 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 07:27:10 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v3]
In-Reply-To: <RemdjEfPCVrFh3MEdDolKYCgSdmF2GrEvFc_XQRiXlM=.099b1b59-8499-46d5-9220-54ae5e59657d@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <fXrcwiU_LDyWFyfzZf0SOel1mb_kne4OfyZs_bLde70=.0830080a-7f52-44e5-ae6b-a74d50185b36@github.com>
 <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
 <RemdjEfPCVrFh3MEdDolKYCgSdmF2GrEvFc_XQRiXlM=.099b1b59-8499-46d5-9220-54ae5e59657d@github.com>
Message-ID: <qVpc3H7XuCrzBbqMBnBd7HI7C7fFuMNd1qmFKLyqiV8=.79507104-5d95-43f1-8c5f-e5b22020e41a@github.com>

On Thu, 4 Sep 2025 08:50:59 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> src/hotspot/share/opto/graphInvariants.cpp line 32:
>> 
>>> 30: 
>>> 31: void LocalGraphInvariant::LazyReachableCFGNodes::fill() {
>>> 32:   precond(live_nodes.size() == 0);
>> 
>> Maybe I missed something here: where do the `precond` and `postcond` come from?
>
> In `debug.hpp`, right next to `assert`. They are "standard", but not very widely used. I think they are good as they clearly state what is a precondition or a postcondition. There is no message (or rather, a default one), but that is better (or at least not worse) than an uninspired one like "fail", which is often found.

Nice, did not know that :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2324319149

From epeter at openjdk.org  Fri Sep  5 07:34:11 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 07:34:11 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v3]
In-Reply-To: <OW3WWkCleTJthk05JQXizAgHSMb8d8gWYHLS0iCNgY0=.1d936714-83a6-45c2-8957-38f3b346f75d@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <fXrcwiU_LDyWFyfzZf0SOel1mb_kne4OfyZs_bLde70=.0830080a-7f52-44e5-ae6b-a74d50185b36@github.com>
 <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
 <OW3WWkCleTJthk05JQXizAgHSMb8d8gWYHLS0iCNgY0=.1d936714-83a6-45c2-8957-38f3b346f75d@github.com>
Message-ID: <Lg0NHUYtZ3BvDBIylqFeLx9JfxHsJ7pcpjldTv-Fg4U=.bd9a5796-df6a-4fc7-973d-e1e3c89237b2@github.com>

On Thu, 4 Sep 2025 09:16:52 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> src/hotspot/share/opto/graphInvariants.cpp line 207:
>> 
>>> 205:   }
>>> 206:   bool (Node::*_type_check)() const;
>>> 207: };
>> 
>> You could probably generalize this with a callback approach, and then one concrete implementation would be the one that does the type check. Just an idea.
>
> Seems overengineered to me. The callback version would be about as long as this one, and so would the code that must provide the callback. It makes the logic unnecessarily complicated, in my opinion. Of course, everything boils down to a function that takes a node and performs a specific check, but then this generalized version does nothing significant besides calling the callback. The concrete implementation would just have all the same logic, but in a callback passed to another method instead of in a first-class method...
>
> If I don't have an adapter class that only checks the type, but leave that to instantiation time, the code would look like
> 
> NodeCallback([](const Node* n) { return n->is_Region(); })
> 
> instead of
> 
> NodeClass(&Node::is_Region)
> 
> which is unreadable. That's the point of patterns: they make the shape easy to understand; otherwise, one can just write a normal, manual traversal, which is all-powerful.
> 
> It was also discussed above that something like the `NodeCallback` could exist for when we need something that can't be expressed simply, but:
> - will it ever happen?
> - NodeCallback doesn't even provide a useful error message; we would also need a callback to craft it (or make the one callback more complicated, which would be pretty much the content of `NodeClass::check`)
> - I'm not willing to make the common kind of patterns ugly for a rare use case.
> 
> And as for implementing `NodeClass` from a hypothetical `NodeCallback`, what would be the concrete benefits? (kinda the first paragraph again: all the logic in the callback, and NodeCallback doing nothing).

Sounds good :)
It was just an idea, and it is also a bit of a question of taste. But you are right: callbacks can also look ugly and be hard to read.
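To make the trade-off concrete, here is a standalone sketch contrasting the two styles on a mock node hierarchy. `NodeClass` mirrors the member-function-pointer field shown in the review (`bool (Node::*_type_check)() const`); `NodeCallback` is the hypothetical alternative. The mock `Node` uses virtual predicates for simplicity, and neither class models the error reporting the real patterns provide.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Mock node hierarchy (hypothetical): virtual predicates stand in for C2's
// class queries.
struct Node {
  virtual ~Node() = default;
  virtual bool is_Region() const { return false; }
  virtual bool is_If() const { return false; }
};
struct RegionNode : Node {
  bool is_Region() const override { return true; }
};
struct IfNode : Node {
  bool is_If() const override { return true; }
};

// Style 1: store a pointer to a Node member predicate (as in the review).
class NodeClass {
 public:
  explicit NodeClass(bool (Node::*type_check)() const) : _type_check(type_check) {}
  bool check(const Node* n) const { return (n->*_type_check)(); }

 private:
  bool (Node::*_type_check)() const;
};

// Style 2: a generic pattern that just forwards to an arbitrary callback.
class NodeCallback {
 public:
  explicit NodeCallback(std::function<bool(const Node*)> cb) : _cb(std::move(cb)) {}
  bool check(const Node* n) const { return _cb(n); }

 private:
  std::function<bool(const Node*)> _cb;
};
```

Both accept exactly the same nodes; the only difference is at the construction site (`NodeClass(&Node::is_Region)` versus a lambda), which is the readability argument made above.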

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2324333231

From mchevalier at openjdk.org  Fri Sep  5 07:42:07 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Fri, 5 Sep 2025 07:42:07 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v4]
In-Reply-To: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
Message-ID: <U5Q37LxsRx_SIfTxQRRLDcAOebDWLN1BDa0e1sxmRg0=.eb6f7757-7b35-4a84-9703-2370a58c4c42@github.com>

> Some crashes are consequences of earlier misshaped ideal graphs, which could be detected earlier, closer to the source, before the possibly many transformations that lead to the crash.
> 
> Let's verify that the ideal graph is well-shaped earlier, then! I propose such a feature here. It runs after IGVN because, at this point, the graph should have been cleaned of any weirdness introduced earlier or during IGVN.
>
> This feature is enabled with the develop flag `VerifyIdealStructuralInvariants` (I'm open to renaming it). It is only available in debug builds, and most of the code is not even compiled in product builds, since it uses some debug-only functions, such as `Node::dump` or `Node::Name`.
>
> For now, only local checks are implemented: checks that only look at a node and its neighborhood, wherever it occurs in the graph. Typically: under an `If` node, we have an `IfTrue` and an `IfFalse`. To ease development, each check is implemented in its own class, independently of the others. Nevertheless, one always needs to do the same kind of things: checking that there is an output of a given type, that there are N inputs, that the k-th input has a given type... To write such checks in a readable way, and in a less error-prone way than piles of copy-pasted code that manually traverse the graph, I propose a set of compositional helpers to write patterns that can be matched against the ideal graph. Since these patterns are... patterns, and so not tied to a specific graph, they can be allocated once and for all. When using one, one provides the node (called the center) around which to check whether the pattern holds.
>
> On top of making patterns easier to describe, these helpers allow nice printing in case of error, by showing the path from the center to the violating node. For instance (made up for the purpose of showing the formatting), a violation with a path climbing only inputs:
> 
> 1 failure for node
>  211  OuterStripMinedLoopEnd  === 215 39  [[ 212 198 ]] P=0,948966, C=23799,000000
> At node
>     209  CountedLoopEnd  === 182 208  [[ 210 197 ]] [lt] P=0,948966, C=23799,000000 !orig=[196] !jvms: StringLatin1::equals @ bci:12 (line 100)
>   From path:
>     [center] 211  OuterStripMinedLoopEnd  === 215 39  [[ 212 198 ]] P=0,948966, C=23799,000000
>       <-(0)- 215  SafePoint  === 210 1 7 1 1 216 37 54 185  [[ 211 ]]  SafePoint  !orig=186 !jvms: StringLatin1::equals @ bci:29 (line 100)
>       <-(0)- 210  IfFalse  === 209  [[ 215 216 ]] #0 !orig=198 !jvms: StringL...

Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision:

 - With typed binding
 - Review

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26362/files
  - new: https://git.openjdk.org/jdk/pull/26362/files/700310e1..3c33fac9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26362&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26362&range=02-03

  Stats: 334 lines in 7 files changed: 211 ins; 64 del; 59 mod
  Patch: https://git.openjdk.org/jdk/pull/26362.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26362/head:pull/26362

PR: https://git.openjdk.org/jdk/pull/26362
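The compositional idea in the PR description can be sketched in a few lines. This is a hypothetical, simplified model: `Node`, `NameIs`, `AtInput` and the string-based path are assumptions, not the PR's actual classes. Each combinator that moves to a neighbor records a `<-(k)-` step, and the path is left in place on failure so it can be printed, in the spirit of the example output above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified node: a name and its inputs.
struct Node {
  std::string name;
  std::vector<const Node*> in;
};

// Hypothetical pattern interface. `path` records the steps from the center to
// the node currently being checked, so a failure can be reported with it.
struct Pattern {
  virtual ~Pattern() = default;
  virtual bool check(const Node* n, std::vector<std::string>& path,
                     std::string& error) const = 0;
};

// Leaf pattern: the node must have the given name.
struct NameIs : Pattern {
  std::string _name;
  explicit NameIs(std::string name) : _name(std::move(name)) {}
  bool check(const Node* n, std::vector<std::string>&,
             std::string& error) const override {
    if (n->name == _name) {
      return true;
    }
    error = "expected " + _name + ", found " + n->name;
    return false;
  }
};

// Combinator: follow input k, then check the sub-pattern there.
struct AtInput : Pattern {
  std::size_t _k;
  const Pattern& _sub;
  AtInput(std::size_t k, const Pattern& sub) : _k(k), _sub(sub) {}
  bool check(const Node* n, std::vector<std::string>& path,
             std::string& error) const override {
    if (_k >= n->in.size()) {
      error = n->name + " has no input " + std::to_string(_k);
      return false;
    }
    const Node* next = n->in[_k];
    path.push_back("<-(" + std::to_string(_k) + ")- " + next->name);
    if (!_sub.check(next, path, error)) {
      return false;  // keep the path so the report shows where it failed
    }
    path.pop_back();
    return true;
  }
};
```

Because the pattern objects hold no graph state, they can be built once and checked against any center, as the description says.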

From epeter at openjdk.org  Fri Sep  5 07:42:08 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 07:42:08 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v3]
In-Reply-To: <DCA_CeYoAocClu9HKpT4-ShN-6NEHufaas9ZGYaoELQ=.c61468b6-7c94-4118-a17d-905a1dc86a12@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <fXrcwiU_LDyWFyfzZf0SOel1mb_kne4OfyZs_bLde70=.0830080a-7f52-44e5-ae6b-a74d50185b36@github.com>
 <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
 <DCA_CeYoAocClu9HKpT4-ShN-6NEHufaas9ZGYaoELQ=.c61468b6-7c94-4118-a17d-905a1dc86a12@github.com>
Message-ID: <bNCJxjS9k6LhYDV_FZ9L_DBMVKI0igdIyy8menUfRP8=.810aabc4-8154-4714-8f0b-54a4dce49810@github.com>

On Thu, 4 Sep 2025 11:08:20 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> src/hotspot/share/opto/graphInvariants.cpp line 279:
>> 
>>> 277:       return CheckResult::NOT_APPLICABLE;
>>> 278:     }
>>> 279:     CheckResult r = PatternBasedCheck::check(center, reachable_cfg_nodes, steps, path, ss);
>> 
>> Could this not be solved with a `OrPattern`?
>> 
>> Or::make( <not is_If ...,
>>   <the And::make from above>
>> )
>> 
>> Not sure that's worth it...
>
> I understand that OrPatterns are tempting! I also thought about it; it's naturally the dual of `And`. At this point, though, they are not actually a good idea.
>
> First, they cannot provide good reporting. When an `And` fails, we can at least blame the first thing that fails: "I followed this path, I expected to find 5 inputs (for instance), there are only 2!". With `Or` we would get that and... maybe it's fine? Maybe not? It depends on the next branches, and if it ends up failing, how do we provide a good message?
>
> Also, they make a mess of binding. If a branch contains a `Bind`, one cannot know which branch matched and whether the content of the `Node` pointer given to `Bind` is trustworthy. We can't even rely on testing whether the pointer was set, because a branch might reach a `Bind` first, run it, assign the pointer and later fail, and then the binding must not be used. This is a common problem with pattern matching in functional programming: the same bindings must appear (with the same types) in each branch of an or-pattern. But we have no mechanism to enforce that yet, and it seems like setting a trap for our future selves.
>
> There are also relatively few use cases, and they would not benefit much from an `Or` pattern. Maybe in the future we will have more interesting use cases and we will see how to address these issues. But for now, I think it is better not to include it than to make a bad choice.
>
> By the way, I think something with more future than `Or` is a case analysis: `IfThenElse(ConditionPattern, TrueBranchPattern, FalseBranchPattern)`: if ConditionPattern holds, we try to match TrueBranchPattern, otherwise FalseBranchPattern. This is better for reporting, since we know which branch we expect to hold, and so which one to blame (assuming we don't blame ConditionPattern, though we could possibly include it in the message). This still has the binding consistency issue, but more boilerplate could help (helper methods querying the set of pointers that would be set in each branch...). Anyway, let's wait and see.

Hmm, yes. I did later on think about binding.
If we ever use pattern matching for IGVN optimizations, we need to be able to do or-like patterns, maybe even over 2, 3, ..., n branches, and then bind to something.

And you are right, reporting could also be an issue.
Maybe there could be some kind of reporting still though: we could evaluate both branches and report where each fails.

I saw multiple uses, so maybe at some point an Or could be justified. But maybe not yet.

I think this is an interesting thread, so a shame you closed it as "resolved" ;)
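The binding hazard discussed in this thread can be reproduced with a tiny, hypothetical pattern model (not the PR's classes): a naive `Or` whose left branch binds before failing leaves the bound pointer set, even though the overall match succeeded through the right branch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal node: only its inputs matter here.
struct Node {
  std::vector<Node*> in;
};

// Hypothetical pattern interface; bindings happen as side effects of matching.
struct Pattern {
  virtual ~Pattern() = default;
  virtual bool match(Node* center) const = 0;
};

// Always succeeds and records the center in an external pointer.
struct Bind : Pattern {
  Node*& _slot;
  explicit Bind(Node*& slot) : _slot(slot) {}
  bool match(Node* center) const override {
    _slot = center;
    return true;
  }
};

struct HasNInputs : Pattern {
  int _n;
  explicit HasNInputs(int n) : _n(n) {}
  bool match(Node* center) const override {
    return static_cast<int>(center->in.size()) == _n;
  }
};

// Conjunction: left to right, stopping at the first failure.
struct And : Pattern {
  const Pattern& _a;
  const Pattern& _b;
  And(const Pattern& a, const Pattern& b) : _a(a), _b(b) {}
  bool match(Node* c) const override { return _a.match(c) && _b.match(c); }
};

// Naive disjunction: try the left branch, then the right one.
struct Or : Pattern {
  const Pattern& _a;
  const Pattern& _b;
  Or(const Pattern& a, const Pattern& b) : _a(a), _b(b) {}
  bool match(Node* c) const override { return _a.match(c) || _b.match(c); }
};
```

After matching `Or(And(Bind, HasNInputs(2)), HasNInputs(0))` on a node without inputs, the bound pointer is set although it came from the failed branch, and nothing tells the caller so. This is why or-patterns in functional languages require every branch to bind the same variables.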

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2324343167

From mchevalier at openjdk.org  Fri Sep  5 07:53:15 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Fri, 5 Sep 2025 07:53:15 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v3]
In-Reply-To: <bNCJxjS9k6LhYDV_FZ9L_DBMVKI0igdIyy8menUfRP8=.810aabc4-8154-4714-8f0b-54a4dce49810@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <fXrcwiU_LDyWFyfzZf0SOel1mb_kne4OfyZs_bLde70=.0830080a-7f52-44e5-ae6b-a74d50185b36@github.com>
 <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
 <DCA_CeYoAocClu9HKpT4-ShN-6NEHufaas9ZGYaoELQ=.c61468b6-7c94-4118-a17d-905a1dc86a12@github.com>
 <bNCJxjS9k6LhYDV_FZ9L_DBMVKI0igdIyy8menUfRP8=.810aabc4-8154-4714-8f0b-54a4dce49810@github.com>
Message-ID: <Mux4IaBJhswhUUwY4tNyXPcmSTOe2-F4w445fJ5Cpds=.07fb2b31-7b9a-41d1-bd6e-b1461d2cc1b0@github.com>

On Fri, 5 Sep 2025 07:36:34 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> Hmm, yes. I did later on think about binding.
> If we ever use pattern matching for IGVN optimizations, we need to be able to do or-like patterns, maybe even over 2, 3, ..., n branches, and then bind to something.
> 
> And you are right, reporting could also be an issue.
> Maybe there could be some kind of reporting still though: we could evaluate both branches and report where each fails.
> 
> I saw multiple uses, so maybe at some point an Or could be justified. But maybe not yet.
> 
> I think this is an interesting thread, so a shame you closed it as "resolved" ;)

I think using Or patterns for recognizing patterns, rather than for enforcing them, is nicer, since then there is no reporting problem. We can add them when we get there.
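A sketch of the `IfThenElse` case analysis proposed earlier in the thread, under similar simplifying assumptions (a hypothetical `Pattern` interface with a string error out-parameter, and a mock `Node` carrying only an input count and an "is If" flag). The condition only selects a branch, so a failure is always blamed on the branch that was expected to hold.

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily simplified node: only what the patterns below inspect.
struct Node {
  int num_inputs;
  bool is_if;
};

// Hypothetical pattern interface: on failure, a reason is written to `why`.
struct Pattern {
  virtual ~Pattern() = default;
  virtual bool match(const Node& n, std::string& why) const = 0;
};

struct IsIf : Pattern {
  bool match(const Node& n, std::string& why) const override {
    if (!n.is_if) {
      why = "not an If";
      return false;
    }
    return true;
  }
};

struct HasNInputs : Pattern {
  int _n;
  explicit HasNInputs(int n) : _n(n) {}
  bool match(const Node& n, std::string& why) const override {
    if (n.num_inputs != _n) {
      why = "expected " + std::to_string(_n) + " inputs, found " +
            std::to_string(n.num_inputs);
      return false;
    }
    return true;
  }
};

// Case analysis: the condition only selects a branch, so a failure is always
// blamed on the branch that was expected to hold.
struct IfThenElse : Pattern {
  const Pattern& _cond;
  const Pattern& _then;
  const Pattern& _else;
  IfThenElse(const Pattern& c, const Pattern& t, const Pattern& e)
      : _cond(c), _then(t), _else(e) {}
  bool match(const Node& n, std::string& why) const override {
    std::string ignored;  // a false condition is a branch choice, not an error
    if (_cond.match(n, ignored)) {
      return _then.match(n, why);
    }
    return _else.match(n, why);
  }
};
```

Unlike a naive `Or`, there is exactly one candidate branch to report on; the binding-consistency issue mentioned above is not addressed by this sketch.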

>> Everything runs under `GraphInvariantChecker::run()`, which has a `ResourceMark`. I'm not sure, but my guess is that it's not worth repeatedly entering and leaving resource marks for relatively short lists? At the very least, everything will be released at the end of the whole check. I can still add one here if you think it's better.
>
> I would do it defensively. It might save us from running out of memory later on with higher tiers, and it could also make things faster, i.e. we might avoid timeouts simply because we need less memory. I don't have an overview of how large these lists are and how many you'd create, so maybe it is unnecessary. Up to you.

I will, then! I've run with this flag up to tier3 + stress without any OOM, so it's not too terrible, but yeah, we never know what kind of memory usage will come!

>> I surely hope not!
>
> Then I would assert that it has exactly 1 input instead ;)

Yes, I've changed it already (is GH laggy?...)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2324352658
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2324363672
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2324367309

From epeter at openjdk.org  Fri Sep  5 07:53:16 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 07:53:16 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v3]
In-Reply-To: <m9YpeewjFcL8PG--Vlc6zbfffPpNFVu27WyFwcSLxYk=.9e0e9e3c-29ff-4ca4-afea-d648752646db@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <fXrcwiU_LDyWFyfzZf0SOel1mb_kne4OfyZs_bLde70=.0830080a-7f52-44e5-ae6b-a74d50185b36@github.com>
 <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
 <eRSPXdu_TchSS8aABPgp0JEs_fhUyvvbhBnjGZ6_fEU=.0af530a9-66ab-4b4b-8557-ce73687edf24@github.com>
 <m9YpeewjFcL8PG--Vlc6zbfffPpNFVu27WyFwcSLxYk=.9e0e9e3c-29ff-4ca4-afea-d648752646db@github.com>
Message-ID: <q0ZOVsKy5HgILw8bukWe3mN8p4c1xgdKwUngwiz35Pg=.2559b91e-8ef8-4e45-bc4e-d69d0e2fb9b7@github.com>

On Thu, 4 Sep 2025 11:14:27 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

> We would have to find a solution if there were multiple bindings, though... I think that's not possible with your patterns, right? Is that a fundamental constraint?

Sorry, that was not very clear. Yes, you can already bind multiple variables. But you cannot do a disjunction (or) with binding. That would be helpful if you wanted to match patterns like:

((x + a) + a)
or
(a + (x + a))

We do that sort of thing a lot in IGVN optimizations: we need to be prepared to iterate over all associative reorderings.
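One way to handle such alternatives without the stale-binding problem is to match each ordering into a local temporary and publish the binding only once a whole alternative has succeeded. A hypothetical sketch on a mock `Node` with `is_add`, `in1` and `in2` fields (not C2's `AddNode`), covering exactly the two orderings listed above:

```cpp
#include <cassert>

// Hypothetical expression node: a leaf, or an add with inputs in1 and in2.
struct Node {
  bool is_add;
  const Node* in1;
  const Node* in2;
};

// Matches (x + a) for a given `a`, writing x only on success.
static bool match_x_plus(const Node* n, const Node* a, const Node*& x) {
  if (!n->is_add || n->in2 != a) {
    return false;
  }
  x = n->in1;
  return true;
}

// Matches ((x + a) + a) or (a + (x + a)). Each alternative binds into a local
// temporary, and the caller's x_out is written only after an alternative has
// fully matched, so a failed alternative cannot leave a stale binding.
bool match_x_plus_a_plus_a(const Node* n, const Node*& x_out) {
  if (!n->is_add) {
    return false;
  }
  const Node* x = nullptr;
  // Alternative 1: ((x + a) + a), with a = in2 and the inner add at in1.
  if (match_x_plus(n->in1, n->in2, x)) {
    x_out = x;
    return true;
  }
  // Alternative 2: (a + (x + a)), with a = in1 and the inner add at in2.
  if (match_x_plus(n->in2, n->in1, x)) {
    x_out = x;
    return true;
  }
  return false;
}
```

Enumerating every associative/commutative reordering this way grows quickly, which is one argument for a proper `Or` combinator once IGVN-style matching is actually needed.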

>> Also: I would invert the check to `!counted_loop_end->is_LongCountedLoopEnd()`, because you expect it to be a long end here. Subjective.
>
> If you want. I don't think it's perfect because then the message might be less accurate: I don't know that
>> A CountedLoopEnd is the backedge of a LongCountedLoop.
> 
> I rather know that
>> The backedge of a LongCountedLoop is not a LongCountedLoopEnd

As far as I know, CountedLoopEnd is always the backedge of LongCountedLoop. Same for int. If not, I'd like to see a counter example ;)

At least this should be true after IGVN.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2324352933
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2324364567

From mchevalier at openjdk.org  Fri Sep  5 07:53:16 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Fri, 5 Sep 2025 07:53:16 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v3]
In-Reply-To: <q0ZOVsKy5HgILw8bukWe3mN8p4c1xgdKwUngwiz35Pg=.2559b91e-8ef8-4e45-bc4e-d69d0e2fb9b7@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <fXrcwiU_LDyWFyfzZf0SOel1mb_kne4OfyZs_bLde70=.0830080a-7f52-44e5-ae6b-a74d50185b36@github.com>
 <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
 <eRSPXdu_TchSS8aABPgp0JEs_fhUyvvbhBnjGZ6_fEU=.0af530a9-66ab-4b4b-8557-ce73687edf24@github.com>
 <m9YpeewjFcL8PG--Vlc6zbfffPpNFVu27WyFwcSLxYk=.9e0e9e3c-29ff-4ca4-afea-d648752646db@github.com>
 <q0ZOVsKy5HgILw8bukWe3mN8p4c1xgdKwUngwiz35Pg=.2559b91e-8ef8-4e45-bc4e-d69d0e2fb9b7@github.com>
Message-ID: <eFnsrHU83QoAcMhd9Tz1BvGdPekDf0ifzS6G_4E7_p0=.1d6c6fc7-0c29-451e-88a9-f64e131b5b17@github.com>

On Fri, 5 Sep 2025 07:41:24 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>>> We would have to find a solution if there were multiple bindings, though... I think that's not possible with your patterns, right? Is that a fundamental constraint?
>> 
>> Not sure what you mean? `And::make(new Bind(bla), AtInput(1, new Bind(bli)))`? You probably mean something else.
>> 
>> 
>>> If we could somehow already cast the Bind variable to Region. Could be tricky.
>>> Doing this is_Region and bind could be a very common idiom, so very useful.
>> 
>> Interesting... Not sure how without some template magic we don't have (like `Node::is<RegionNode>`), but it's probably doable with macros. I'll give it a try.
>
>> We would have to find a solution if there were multiple bindings, though... I think that's not possible with your patterns, right? Is that a fundamental constraint?
> 
> Sorry, that was not very clear. Yes, you can already bind multiple variables. But you cannot do a disjunction (or) with binding. That would be helpful if you wanted to match patterns like:
> 
> ((x + a) + a)
> or
> (a + (x + a))
> 
> We do that sort of thing a lot in IGVN optimizations: we need to be prepared to iterate over all associative reorderings.

True. There is also no notion of "every way this pattern can be matched around this center" (even though I tried to make patterns deterministic by numbering inputs and picking the output of a given type).

I think that fits with Or patterns, and more for IGVN use than for checking. Let's see when we actually need it in that context. Then we will not have to make stupid blind guesses about how to do it.

>> If you want. I don't think it's perfect because then the message might be less accurate: I don't know that
>>> A CountedLoopEnd is the backedge of a LongCountedLoop.
>> 
>> I rather know that
>>> The backedge of a LongCountedLoop is not a LongCountedLoopEnd
>
> As far as I know, CountedLoopEnd is always the backedge of LongCountedLoop. Same for int. If not, I'd like to see a counter example ;)
> 
> At least this should be true after IGVN.

Don't you mean "LongCountedLoopEnd is always the backedge of LongCountedLoop"? But I rather meant: "what if we add another derived class of `CountedLoopEnd`?" Anyway, I think the new assert should do the trick.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2324360017
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2324371124

From epeter at openjdk.org  Fri Sep  5 07:53:18 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 07:53:18 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v3]
In-Reply-To: <mOEGABxIy2JYZYwjmxnznr-46OsPd7sOTm6mrJpvv6g=.646d0008-c703-4c00-9d71-098bb7c3b9fd@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <fXrcwiU_LDyWFyfzZf0SOel1mb_kne4OfyZs_bLde70=.0830080a-7f52-44e5-ae6b-a74d50185b36@github.com>
 <3-ZWJMEYL6eWaILQXqX4RskVroCjpFlNdGkmTQMt8Jc=.b09b689a-981f-4f95-83fa-015f0bd698cf@github.com>
 <mOEGABxIy2JYZYwjmxnznr-46OsPd7sOTm6mrJpvv6g=.646d0008-c703-4c00-9d71-098bb7c3b9fd@github.com>
Message-ID: <Lr91OErzsii-oET2hhZijo_-Fo7o3KmcG6_zXK1_gwI=.f2522c8e-4900-4f84-b646-b578a444df57@github.com>

On Thu, 4 Sep 2025 11:19:13 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> src/hotspot/share/opto/graphInvariants.cpp line 332:
>> 
>>> 330:     }
>>> 331: 
>>> 332:     Node_List ctrl_succ;
>> 
>> Do we need a `ResourceMark` for this?
>
> Everything runs under `GraphInvariantChecker::run()`, which has a `ResourceMark`. I'm not sure, but my guess is that it's not worth repeatedly entering and leaving resource marks for relatively short lists? At the very least, everything will be released at the end of the whole check. I can still add one here if you think it's better.

I would do it defensively. It might save us from running out of memory later on with higher tiers, and it could also make things faster, i.e. we might avoid timeouts simply because we need less memory. I don't have an overview of how large these lists are and how many you'd create, so maybe it is unnecessary. Up to you.
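The benefit of a nested resource mark can be illustrated with a hypothetical bump-pointer arena. `Arena` and `ResourceMarkSketch` below are simplified stand-ins, not HotSpot's classes; the point is only the RAII pattern: the mark records the arena top on entry and resets it on scope exit, so memory from each per-node check is reclaimed immediately instead of at the end of the whole run.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bump-pointer arena standing in for a thread's resource area.
class Arena {
 public:
  explicit Arena(std::size_t capacity) : _buf(capacity), _top(0) {}
  void* alloc(std::size_t bytes) {
    assert(_top + bytes <= _buf.size() && "arena exhausted");
    void* p = _buf.data() + _top;
    _top += bytes;
    return p;
  }
  std::size_t used() const { return _top; }
  void reset_to(std::size_t mark) { _top = mark; }

 private:
  std::vector<unsigned char> _buf;
  std::size_t _top;
};

// RAII mark: remembers the arena top on entry and releases everything
// allocated since then when the scope exits.
class ResourceMarkSketch {
 public:
  explicit ResourceMarkSketch(Arena& arena) : _arena(arena), _mark(arena.used()) {}
  ~ResourceMarkSketch() { _arena.reset_to(_mark); }

 private:
  Arena& _arena;
  std::size_t _mark;
};

// Hypothetical per-node check that needs a short-lived scratch list.
void check_one_node(Arena& arena) {
  ResourceMarkSketch rm(arena);  // scratch memory is released on return
  arena.alloc(64);               // e.g. a small list of control successors
}
```

Without the mark inside `check_one_node`, 100 calls would need 6,400 bytes and exhaust a 1,024-byte arena after 16 calls; with it, usage never exceeds 64 bytes.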

>> src/hotspot/share/opto/graphInvariants.cpp line 447:
>> 
>>> 445:                     And::make(
>>> 446:                         new NodeClass(&Node::is_IfTrue),
>>> 447:                         new HasAtLeastNInputs(1),
>> 
>> Can an `IfTrue` have more than 1 input?
>
> I surely hope not!

Then I would assert that it has exactly 1 input instead ;)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2324357889
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2324360409

From mchevalier at openjdk.org  Fri Sep  5 08:10:17 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Fri, 5 Sep 2025 08:10:17 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v4]
In-Reply-To: <U5Q37LxsRx_SIfTxQRRLDcAOebDWLN1BDa0e1sxmRg0=.eb6f7757-7b35-4a84-9703-2370a58c4c42@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <U5Q37LxsRx_SIfTxQRRLDcAOebDWLN1BDa0e1sxmRg0=.eb6f7757-7b35-4a84-9703-2370a58c4c42@github.com>
Message-ID: <SRLXy1MnrLg7Oqlcjow27zpLUzeyCHQdzARm1HTXacM=.2bb842eb-e46b-40d8-8386-fef040a10b50@github.com>

On Fri, 5 Sep 2025 07:42:07 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

> Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - With typed binding
>  - Review

I've fixed a lot; notably, I added a basic test and gave typed binding a try. I would have liked it without a macro, but I think it's OK to use. I sometimes dream of having `node->is<RegionNode>()`; that would ease a few of these things (subjective).
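A macro-free variant of typed binding can be sketched with a class template, at the cost of spelling the class and its query separately. `TypedBind` and the mock hierarchy are hypothetical and do not reflect the macro-based version in the PR.

```cpp
#include <cassert>

// Mock node hierarchy (hypothetical).
struct Node {
  virtual ~Node() = default;
  virtual bool is_Region() const { return false; }
};
struct RegionNode : Node {
  bool is_Region() const override { return true; }
  int region_only_field = 3;  // stands in for a RegionNode-specific member
};

// Hypothetical typed binding: checks the node class and, on success, stores
// the node already downcast to T*, so callers can use T's API directly.
template <typename T>
class TypedBind {
 public:
  TypedBind(T*& slot, bool (Node::*type_check)() const)
      : _slot(slot), _type_check(type_check) {}

  bool match(Node* n) const {
    if (!(n->*_type_check)()) {
      return false;  // wrong class: leave the slot untouched
    }
    _slot = static_cast<T*>(n);  // safe: the class query just succeeded
    return true;
  }

 private:
  T*& _slot;
  bool (Node::*_type_check)() const;
};
```

The redundancy between `RegionNode` and `&Node::is_Region` at the construction site is exactly what a `node->is<RegionNode>()` query would remove.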

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26362#issuecomment-3257455746

From mchevalier at openjdk.org  Fri Sep  5 08:13:35 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Fri, 5 Sep 2025 08:13:35 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
Message-ID: <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>

Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:

  One more ResourceMark

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26362/files
  - new: https://git.openjdk.org/jdk/pull/26362/files/3c33fac9..ea78a5a3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26362&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26362&range=03-04

  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/26362.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26362/head:pull/26362

PR: https://git.openjdk.org/jdk/pull/26362

From epeter at openjdk.org  Fri Sep  5 08:13:35 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 08:13:35 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v4]
In-Reply-To: <SRLXy1MnrLg7Oqlcjow27zpLUzeyCHQdzARm1HTXacM=.2bb842eb-e46b-40d8-8386-fef040a10b50@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <U5Q37LxsRx_SIfTxQRRLDcAOebDWLN1BDa0e1sxmRg0=.eb6f7757-7b35-4a84-9703-2370a58c4c42@github.com>
 <SRLXy1MnrLg7Oqlcjow27zpLUzeyCHQdzARm1HTXacM=.2bb842eb-e46b-40d8-8386-fef040a10b50@github.com>
Message-ID: <qg2qigpPmxgPBa88zcPy1CCIplIIHgid08Mxu2TRkX0=.2de21455-bfdb-4091-861a-69be000bf8b6@github.com>

On Fri, 5 Sep 2025 08:07:21 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

> I sometimes dream we had node->is<RegionNode>(), that would ease a few of these things (subjective).

Is that something we could do in a separate RFE? I wonder if we could generate it with the same kind of macros we use to define `is_Region`... we would just forward from the instantiation `is<RegionNode>` to `is_Region`.
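A hypothetical sketch of that forwarding: a member template `is<T>()` with one explicit specialization per class, generated by a macro. In HotSpot, `is_Region()` and friends are themselves macro-generated; here plain virtual functions stand in for them, and `DEFINE_IS_QUERY` is an invented name.

```cpp
#include <cassert>

// Mock node hierarchy (hypothetical); virtual predicates stand in for the
// macro-generated class queries of the real Node.
struct Node {
  virtual ~Node() = default;
  virtual bool is_Region() const { return false; }
  virtual bool is_If() const { return false; }

  // Generic class query; only the explicit specializations below are defined.
  template <typename T>
  bool is() const;
};
struct RegionNode : Node {
  bool is_Region() const override { return true; }
};
struct IfNode : Node {
  bool is_If() const override { return true; }
};

// Hypothetical macro: forwards is<XNode>() to the existing is_X() query.
#define DEFINE_IS_QUERY(X) \
  template <>              \
  inline bool Node::is<X##Node>() const { return is_##X(); }

DEFINE_IS_QUERY(Region)
DEFINE_IS_QUERY(If)

#undef DEFINE_IS_QUERY
```

Calling `is<T>()` for a `T` without a specialization compiles but fails at link time, since the primary template is declared but never defined; that is one crude way to restrict it to known node classes.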

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26362#issuecomment-3257466360

From mchevalier at openjdk.org  Fri Sep  5 08:17:18 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Fri, 5 Sep 2025 08:17:18 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
Message-ID: <f5pPcfq-OkVnLhq25SvCSvRES9PLFpGJnCjUULQ_Qwc=.6d1d2ae8-2c20-4fbd-93c5-86796d8de3eb@github.com>

On Fri, 5 Sep 2025 08:13:35 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> Some crashes are consequences of earlier misshaped ideal graphs, which could be detected earlier, closer to the source, before the possibly many transformations that lead to the crash.
>> 
>> Let's verify that the ideal graph is well-shaped earlier, then! I propose such a feature here. It runs after IGVN because, at this point, the graph should have been cleaned of any weirdness introduced earlier or during IGVN.
>> 
>> This feature is enabled with the develop flag `VerifyIdealStructuralInvariants`. Open to renaming, no problem with me! This feature is only available in debug builds, and most of the code is not even compiled in product builds, since it uses debug-only functions such as `Node::dump` or `Node::Name`.
>> 
>> For now, only local checks are implemented: checks that only look at a node and its neighborhood, wherever it is in the graph. Typically: under an `If` node, we have an `IfTrue` and an `IfFalse`. To ease development, each check is implemented in its own class, independently of the others. Nevertheless, one always needs to do the same kinds of things: checking that there is an output of a given type, that there are N inputs, that the k-th input has a given type... To make such checks easy to write, readable, and less error-prone than a pile of copy-pasted code that manually traverses the graph, I propose a set of compositional helpers to write patterns that can be matched against the ideal graph. Since these patterns are... patterns, and thus not tied to a specific graph, they can be allocated once and for all. When a pattern is used, one provides the node (called the center) around which one wants to check whether the pattern holds.
>> 
>> On top of making patterns easier to describe, these helpers allow nice printing in case of error, by showing the path from the center to the violating node. For instance (made up for the purpose of showing the formatting), a violation with a path climbing only inputs:
>> 
>> 1 failure for node
>>  211  OuterStripMinedLoopEnd  === 215 39  [[ 212 198 ]] P=0,948966, C=23799,000000
>> At node
>>     209  CountedLoopEnd  === 182 208  [[ 210 197 ]] [lt] P=0,948966, C=23799,000000 !orig=[196] !jvms: StringLatin1::equals @ bci:12 (line 100)
>>   From path:
>>     [center] 211  OuterStripMinedLoopEnd  === 215 39  [[ 212 198 ]] P=0,948966, C=23799,000000
>>       <-(0)- 215  SafePoint  === 210 1 7 1 1 216 37 54 185  [[ 211 ]]  SafePoint  !orig=186 !jvms: StringLatin1::equals @ bci:29 (line 100)
>>       <-(0)- 210  IfFalse  === 209  [[ 21...
>
> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
> 
>   One more ResourceMark

Totally not in this change, yes. And indeed, we could just use the macro to define a bit more. But I fear it will be a controversial topic.
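The compositional pattern helpers described in the quoted PR text can be illustrated with a small self-contained C++ sketch. Everything in it is hypothetical: the toy `Node` struct and the `Is`, `Input`, `UniqueOutput` and `AllOf` patterns are not the classes from the PR. It only shows the two properties the description emphasizes: a pattern is built once and then checked against any center node, and a failed check leaves a trail describing where the mismatch was found.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy node for this sketch only (not HotSpot's Node).
struct Node {
  int idx;
  std::string name;
  std::vector<Node*> in;
  std::vector<Node*> out;
};

// Trail of a failed match, ordered from the violating node back to the center.
using Path = std::vector<std::string>;

static std::string label(const Node* n) {
  return std::to_string(n->idx) + " " + n->name;
}

// Patterns hold no graph state: build once, check against any center node.
struct Pattern {
  virtual ~Pattern() = default;
  virtual bool check(const Node* n, Path& path) const = 0;
};
using P = std::shared_ptr<const Pattern>;

// The node itself has the expected name.
struct Is : Pattern {
  std::string name;
  explicit Is(std::string s) : name(std::move(s)) {}
  bool check(const Node* n, Path& path) const override {
    if (n->name == name) return true;
    path.push_back(label(n) + ": expected " + name);
    return false;
  }
};

// Input k exists and matches `sub`.
struct Input : Pattern {
  std::size_t k;
  P sub;
  Input(std::size_t k_, P s) : k(k_), sub(std::move(s)) {}
  bool check(const Node* n, Path& path) const override {
    if (k >= n->in.size() || n->in[k] == nullptr) {
      path.push_back(label(n) + ": missing input " + std::to_string(k));
      return false;
    }
    if (sub->check(n->in[k], path)) return true;
    path.push_back("reached via input " + std::to_string(k) + " of " + label(n));
    return false;
  }
};

// Exactly one output matches `sub`.
struct UniqueOutput : Pattern {
  P sub;
  explicit UniqueOutput(P s) : sub(std::move(s)) {}
  bool check(const Node* n, Path& path) const override {
    int matches = 0;
    for (const Node* o : n->out) {
      Path scratch;  // non-matching outputs are expected, not errors
      if (sub->check(o, scratch)) matches++;
    }
    if (matches == 1) return true;
    path.push_back(label(n) + ": " + std::to_string(matches) +
                   " matching outputs, expected 1");
    return false;
  }
};

// Every part holds at the same node; stops at the first failure.
struct AllOf : Pattern {
  std::vector<P> parts;
  explicit AllOf(std::vector<P> ps) : parts(std::move(ps)) {}
  bool check(const Node* n, Path& path) const override {
    for (const P& p : parts) {
      if (!p->check(n, path)) return false;
    }
    return true;
  }
};

// Allocated once: an If with a Bool at input 1, one IfTrue and one IfFalse output.
const P kIfShape = std::make_shared<AllOf>(std::vector<P>{
    std::make_shared<Input>(1, std::make_shared<Is>("Bool")),
    std::make_shared<UniqueOutput>(std::make_shared<Is>("IfTrue")),
    std::make_shared<UniqueOutput>(std::make_shared<Is>("IfFalse"))});
```

In this sketch a mismatch at input 1 produces two trail entries (the wrong node, then the edge taken from the center), which is a simplified version of the path printing shown in the quoted example.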

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26362#issuecomment-3257476856

From duke at openjdk.org  Fri Sep  5 08:19:35 2025
From: duke at openjdk.org (erifan)
Date: Fri, 5 Sep 2025 08:19:35 GMT
Subject: RFR: 8366588: VectorAPI: Re-intrinsify VectorMask.laneIsSet where the
 input index is a variable
Message-ID: <XaokWZpc-AL2hWWGvsmDczO1u4o5uGEaCSgPjSyxkz4=.3251a974-6374-48c8-a8e4-88914b730505@github.com>

Intrinsic support for `VectorMask.laneIsSet` with a **variable** input index was introduced in PR #14200, but was inadvertently broken by PR #25673. This PR restores the intrinsic functionality and adds some JTReg tests.

Benchmarks on Nvidia Grace machine with 128-bit SVE:

Benchmark                        Unit    Before       Error       After        Error       Uplift
microMaskLaneIsSetByte128_var    ops/ms  21702.14415  91.902159   103472.9391  36.057447   4.767867
microMaskLaneIsSetByte64_var     ops/ms  21468.51868  107.94177   103365.6561  69.47736    4.814754
microMaskLaneIsSetDouble128_var  ops/ms  77489.32791  153.242699  413499.4127  311.854079  5.336211
microMaskLaneIsSetFloat128_var   ops/ms  41034.95204  399.421823  206840.0988  74.702234   5.040583
microMaskLaneIsSetFloat64_var    ops/ms  77607.40268  175.938921  413745.3001  149.716794  5.33126
microMaskLaneIsSetInt128_var     ops/ms  41452.48893  76.143208   206845.9754  59.371129   4.989953
microMaskLaneIsSetInt64_var      ops/ms  77726.2542   173.180518  413427.8838  363.575023  5.319024
microMaskLaneIsSetLong128_var    ops/ms  77646.11218  177.496587  413403.4404  236.609314  5.3242
microMaskLaneIsSetShort128_var   ops/ms  21374.93265  48.13101    103417.4618  34.827021   4.838259
microMaskLaneIsSetShort64_var    ops/ms  41066.19395  353.320621  206801.109   106.408938  5.035799


Benchmarks on Intel 6444y machine with 512-bit avx3:

Benchmark                        Unit    Before       Error       After        Error       Uplift
microMaskLaneIsSetByte128_var    ops/ms  57658.45497  240.209309  211643.8406  29.214532   3.670647
microMaskLaneIsSetByte256_var    ops/ms  57451.68169  116.994128  211609.4652  160.48513   3.683259
microMaskLaneIsSetByte512_var    ops/ms  57530.22411  311.63868   199802.8084  408.144015  3.473005
microMaskLaneIsSetByte64_var     ops/ms  57642.2672   161.406221  205252.4464  196.86852   3.560797
microMaskLaneIsSetDouble256_var  ops/ms  114401.3789  231.797375  361400.344   565.593984  3.159055
microMaskLaneIsSetDouble512_var  ops/ms  57379.27882  159.699503  211476.1138  136.980026  3.685583
microMaskLaneIsSetFloat128_var   ops/ms  113943.9512  141.062663  360855.3915  494.471996  3.166955
microMaskLaneIsSetFloat256_var   ops/ms  57682.78182  138.142053  211659.5098  30.167972   3.66937
microMaskLaneIsSetFloat512_var   ops/ms  57617.66405  301.748599  211246.8588  597.18949   3.666355
microMaskLaneIsSetInt128_var     ops/ms  113914.5062  118.681382  360856.4465  555.097397  3.167783
microMaskLaneIsSetInt256_var     ops/ms  57681.79883  112.391639  211555.6742  217.556981  3.667633
microMaskLaneIsSetInt512_var     ops/ms  57350.20346  206.146723  211657.7207  68.461571   3.690618
microMaskLaneIsSetLong256_var    ops/ms  113838.3822  415.784529  360782.0645  710.076899  3.169247
microMaskLaneIsSetLong512_var    ops/ms  57314.02695  190.1762    211690.8492  26.47233    3.693526
microMaskLaneIsSetShort128_var   ops/ms  57675.58965  65.940976   211549.9551  276.57545   3.667928
microMaskLaneIsSetShort256_var   ops/ms  57628.8642   91.957833   211694.0864  16.559412   3.673403
microMaskLaneIsSetShort512_var   ops/ms  57845.35211  160.537421  211358.872   660.777147  3.65386
microMaskLaneIsSetShort64_var    ops/ms  113848.8846  222.787418  360294.6295  491.425656  3.164674
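The Uplift column in both benchmark tables appears to be the ratio of the After score to the Before score; for example, for the first Grace row:

```latex
\text{Uplift} = \frac{\text{After}}{\text{Before}},
\qquad \text{e.g.}\quad \frac{103472.9391}{21702.14415} \approx 4.7679
```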

-------------

Commit messages:
 - 8366588: VectorAPI: Re-intrinsify VectorMask.laneIsSet where the input index is a variable

Changes: https://git.openjdk.org/jdk/pull/27113/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27113&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8366588
  Stats: 170 lines in 4 files changed: 168 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/27113.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27113/head:pull/27113

PR: https://git.openjdk.org/jdk/pull/27113

From duke at openjdk.org  Fri Sep  5 08:21:11 2025
From: duke at openjdk.org (erifan)
Date: Fri, 5 Sep 2025 08:21:11 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats
In-Reply-To: <-G8GwIflOhFjOL-PAG6_oylu0Fa9c8iNUB57EC6oo4s=.a0126087-2a97-4542-a555-27c12578fccf@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <KE8_jx6V7-oTcCoQKwZ0Gyzdr2qPbWETxrneZvPAxHs=.bd39c34f-972e-4ecc-bc16-eca3fcfc175d@github.com>
 <2R6O7Jhv3catwxc6rXJdh7Uiq-NFBp7beCmP49CLTqU=.7ba72e39-6efd-47fe-8ad9-6df54a45c99b@github.com>
 <-G8GwIflOhFjOL-PAG6_oylu0Fa9c8iNUB57EC6oo4s=.a0126087-2a97-4542-a555-27c12578fccf@github.com>
Message-ID: <WkhK27_th579OEdhuFnWjb64eBfKkY8giwZFR9VUr6U=.9bd898fe-2c5c-43fd-b8d2-35fd5a1194bf@github.com>

On Tue, 2 Sep 2025 08:10:02 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Thanks @theRealAph .
>> 
>> I've indeed considered and implemented your idea. The code diff:
>> 
>> diff --git a/src/hotspot/cpu/aarch64/assembler_aarch64.hpp b/src/hotspot/cpu/aarch64/assembler_aarch64.hpp
>> index 11d302e9026..841d24f516b 100644
>> --- a/src/hotspot/cpu/aarch64/assembler_aarch64.hpp
>> +++ b/src/hotspot/cpu/aarch64/assembler_aarch64.hpp
>> @@ -3813,8 +3813,9 @@ template<typename R, typename... Rx>
>>                 bool isMerge, bool isFloat) {
>>      starti;
>>      assert(T != Q, "invalid size");
>> +    assert((!isFloat) || (isFloat && T != B), "invalid size");
>>      int sh = 0;
>> -    if (imm8 <= 127 && imm8 >= -128) {
>> +    if ((imm8 <= 127 && imm8 >= -128) || (isFloat && (imm8 >> 8) == 0)) {
>>        sh = 0;
>>      } else if (T != B && imm8 <= 32512 && imm8 >= -32768 && (imm8 & 0xff) == 0) {
>>        sh = 1;
>> @@ -3824,7 +3825,7 @@ template<typename R, typename... Rx>
>>      }
>>      int m = isMerge ? 1 : 0;
>>      f(0b00000101, 31, 24), f(T, 23, 22), f(0b01, 21, 20);
>> -    prf(Pg, 16), f(isFloat ? 1 : 0, 15), f(m, 14), f(sh, 13), sf(imm8, 12, 5), rf(Zd, 0);
>> +    prf(Pg, 16), f(isFloat ? 1 : 0, 15), f(m, 14), f(sh, 13), f(imm8&0xff, 12, 5), rf(Zd, 0);
>>    }
>> 
>>  public:
>> @@ -3834,7 +3835,7 @@ template<typename R, typename... Rx>
>>    }
>>    // SVE copy floating-point immediate to vector elements (predicated)
>>    void sve_cpy(FloatRegister Zd, SIMD_RegVariant T, PRegister Pg, double d) {
>> -    sve_cpy(Zd, T, Pg, checked_cast<int8_t>(pack(d)), /*isMerge*/true, /*isFloat*/true);
>> +    sve_cpy(Zd, T, Pg, checked_cast<uint8_t>(pack(d)), /*isMerge*/true, /*isFloat*/true);
>>    }
>> 
>>    // SVE conditionally select elements from two vectors
>> 
>> 
>> However, some of my colleagues have differing opinions:
>> 1. sve `cpy` and `fcpy` are actually two different instructions, and distinguishing them might be clearer.
>> 2. sve `cpy` 's imm8 is an **int** , while `fcpy` 's imm8 is an **fp8** . While some encoding code can be reused, separating the encodings makes the code clearer.
>> 
>> I think both implementations are fine. If you think it's better to not refactor, I'll revert.
>
>> 1. sve `cpy` and `fcpy` are actually two different instructions, and distinguishing them might be clearer.
> 
> That's a fair point, but the AArch64 name for all four instructions is CPY, and they are distinguished by their operands. Deviation from the names in the Reference Manual is occasionally necessary, but it makes life painful for maintainers when they have to search for what we've called an instruction they want to use.
>  
>>     2. sve `cpy` 's imm8 is an **int** , while `fcpy` 's imm8 is an **fp8** .
> 
> Yes, that's right.
> 
>> While some encoding code can be reused, separating the encodings makes the code clearer.
> 
> I don't agree that it makes the code clearer. In fact, tight factoring emphasizes the fact that these instructions are similar, and explicitly shows where they are different.
> 
> It is true that I have a strong bias against copy-and-paste programming.
> 
>> I think both implementations are fine. If you think it's better to not refactor, I'll revert.
> 
> I do. Thank you.

Thanks for your review @theRealAph @eme64

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26951#issuecomment-3257481227

From duke at openjdk.org  Fri Sep  5 08:21:13 2025
From: duke at openjdk.org (duke)
Date: Fri, 5 Sep 2025 08:21:13 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats [v3]
In-Reply-To: <s-iQ75fUMDx85bnb148Js3YzCsbhCVcFfy6rBUN7u54=.39d32f15-9fc9-4059-8a88-6974a4e97170@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <s-iQ75fUMDx85bnb148Js3YzCsbhCVcFfy6rBUN7u54=.39d32f15-9fc9-4059-8a88-6974a4e97170@github.com>
Message-ID: <CQjKZMbteeU8fWQCZJybZ3b8-QCusQCcUc6_RLFh1dE=.eeed10dd-be84-4189-810a-09b9f65cae0d@github.com>

On Wed, 3 Sep 2025 10:02:24 GMT, erifan <duke at openjdk.org> wrote:

>> The sve_cpy instruction is not correctly implemented for negative floating-point values. The issues include:
>> 
>> 1. When a negative floating-point number (e.g. `-1.0`) is passed, the `checked_cast<int8_t>(pack(d))` check fails. For example, assume `d = -1.0`:
>> - `pack(-1.0)` returns an unsigned int with the 7th bit set, i.e., `0xf0`.
>> - `checked_cast<int8_t>(0xf0)` casts `0xf0` to an int8_t value, which is `-16`.
>> - Casting this int8_t `-16` back to unsigned int results in `0xfffffff0`.
>> - The check compares `0xf0` to `0xfffffff0`, which obviously fails.
>> 
>> 2. Additionally, the encoding of the negative floating-point number is incorrect:
>> - The imm8 field can fall outside the valid range of **[-128, 127]**.
>> - Bit **13** should be encoded as **0** for floating-point numbers.
>> 
>> This PR fixes these issues and renames floating-point `sve_cpy` as `sve_fcpy`.
>> 
>> Some test cases are added to aarch64-asmtest.py, and all tests passed.
>
> erifan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Code style fixes

@erifan 
Your change (at version 66ba6570fd3a6f1a8faa794ed019e7aa768ac38e) is now ready to be sponsored by a Committer.
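The round-trip failure described in point 1 of the quoted PR description can be reproduced in isolation. The sketch below is a simplified model of a checked narrowing cast; the helper name `narrowing_round_trips` is hypothetical and is not HotSpot's `checked_cast`. It assumes a 32-bit `unsigned`, takes `0xf0` for `pack(-1.0)` from the PR text, and uses `0x70` as the FP8 immediate encoding of `+1.0`.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of a checked narrowing cast (illustrative only): the
// narrowing is accepted only if converting the narrowed value back to the
// source type reproduces the original value.
template <typename To, typename From>
bool narrowing_round_trips(From value) {
  To narrowed = static_cast<To>(value);
  return static_cast<From>(narrowed) == value;
}

// Per the PR description, pack(-1.0) yields 0xf0 (the imm8 sign bit is set).
constexpr unsigned kPackedMinusOne = 0xf0;
// +1.0 packs to 0x70, which fits in int8_t.
constexpr unsigned kPackedPlusOne = 0x70;
```

Narrowing to `uint8_t` keeps `0xf0` intact, which matches the `checked_cast<uint8_t>` variant in the diff discussed earlier in the thread, while positive values such as `0x70` pass either way.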

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26951#issuecomment-3257486195

From epeter at openjdk.org  Fri Sep  5 08:50:20 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 08:50:20 GMT
Subject: RFR: 8366845: C2 SuperWord: wrong VectorCast after
 VectorReinterpret with swapped src/dst type
In-Reply-To: <wM2-jj9R6aEVh5NPIoI3ycOdwjpABPLXAkzoA9Xa_8I=.d5042609-8733-4a46-9cba-220fd11e44b8@github.com>
References: <VznBrJ_glHHA4wRo0joE4KO7c5J1F5z7W2u-px1uQzQ=.6cc5a46c-8731-4612-acda-9aab0e177f5a@github.com>
 <wM2-jj9R6aEVh5NPIoI3ycOdwjpABPLXAkzoA9Xa_8I=.d5042609-8733-4a46-9cba-220fd11e44b8@github.com>
Message-ID: <yFLvXkNvbhJczFW4Bj1W7tgg4p6FLzSU0UI7ZGn3Yw0=.87cb2152-4bcb-47d2-8610-131a10fb8cef@github.com>

On Thu, 4 Sep 2025 15:39:20 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> I have seen 3 manifestations of this bug:
>> 
>> 1. assert
>> 
>> # Internal Error (.../src/hotspot/cpu/x86/x86.ad:7640), pid=84140, tid=28419
>> # assert(UseAVX > 2 && VM_Version::supports_avx512dq()) failed: require
>> 
>> 
>> 2. assert
>> 
>> # Internal Error (.../src/hotspot/share/opto/vectornode.cpp:1601), pid=4022154, tid=4022168
>> # Error: assert(bt == T_FLOAT) failed
>> 
>> 
>> 3. Wrong result
>> When the feature was available but we used the wrong CastVector
>> 
>> It seems that [JDK-8346236](https://bugs.openjdk.org/browse/JDK-8346236) introduced reinterpret nodes to SuperWord:
>> 
>> 
>>   } else if (VectorNode::is_reinterpret_opcode(opc)) {
>>     assert(first->req() == 2 && req() == 2, "only one input expected");
>>     const TypeVect* vt = TypeVect::make(bt, vlen);
>>     vn = new VectorReinterpretNode(in1, vt, in1->bottom_type()->is_vect());
>> 
>> 
>> Sadly, the `src` and `dst` types are swapped. For JDK25 [JDK-8346236](https://bugs.openjdk.org/browse/JDK-8346236) this had no bad effect yet, since we only cast between HF and short, which are both based on short.
>> 
>> But with [JDK-8329077](https://bugs.openjdk.org/browse/JDK-8329077) we can now do reinterpret between I/F and between D/L. Here swapping has an effect, especially if it is followed by a cast:
>> The cast determines its input type from the output type of the input node. If that was a reinterpret node with the wrong output type, **we would get a cast with the wrong src type**. We might do a double -> int cast instead of a long -> int cast. That leads to all sorts of issues.
>> 
>> The fuzzer test was only just recently added with [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). It uses MemorySegment, where unaligned float/double access gets handled with long/int memory access and then reinterpret (e.g. `MoveI2F`). But I was able to find examples that just work with `Float.intBitsToFloat` etc.
>
> Makes sense @eme64. Happy with the fix and tests :)

@galderz @iwanowww @TobiHartmann Thanks for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27100#issuecomment-3257571484

From epeter at openjdk.org  Fri Sep  5 08:50:22 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 08:50:22 GMT
Subject: Integrated: 8366845: C2 SuperWord: wrong VectorCast after
 VectorReinterpret with swapped src/dst type
In-Reply-To: <VznBrJ_glHHA4wRo0joE4KO7c5J1F5z7W2u-px1uQzQ=.6cc5a46c-8731-4612-acda-9aab0e177f5a@github.com>
References: <VznBrJ_glHHA4wRo0joE4KO7c5J1F5z7W2u-px1uQzQ=.6cc5a46c-8731-4612-acda-9aab0e177f5a@github.com>
Message-ID: <tQjtgmOMbGrhPvzuto43rIJUGtOCLsOH6ie9rJgwwfo=.1b39f2aa-fca4-4e35-86b1-5d1a3e1d3074@github.com>

On Thu, 4 Sep 2025 14:42:46 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> I have seen 3 manifestations of this bug:
> 
> 1. assert
> 
> # Internal Error (.../src/hotspot/cpu/x86/x86.ad:7640), pid=84140, tid=28419
> # assert(UseAVX > 2 && VM_Version::supports_avx512dq()) failed: require
> 
> 
> 2. assert
> 
> # Internal Error (.../src/hotspot/share/opto/vectornode.cpp:1601), pid=4022154, tid=4022168
> # Error: assert(bt == T_FLOAT) failed
> 
> 
> 3. Wrong result
> When the feature was available but we used the wrong CastVector
> 
> It seems that [JDK-8346236](https://bugs.openjdk.org/browse/JDK-8346236) introduced reinterpret nodes to SuperWord:
> 
> 
>   } else if (VectorNode::is_reinterpret_opcode(opc)) {
>     assert(first->req() == 2 && req() == 2, "only one input expected");
>     const TypeVect* vt = TypeVect::make(bt, vlen);
>     vn = new VectorReinterpretNode(in1, vt, in1->bottom_type()->is_vect());
> 
> 
> Sadly, the `src` and `dst` types are swapped. For JDK25 [JDK-8346236](https://bugs.openjdk.org/browse/JDK-8346236) this had no bad effect yet, since we only cast between HF and short, which are both based on short.
> 
> But with [JDK-8329077](https://bugs.openjdk.org/browse/JDK-8329077) we can now do reinterpret between I/F and between D/L. Here swapping has an effect, especially if it is followed by a cast:
> The cast determines its input type from the output type of the input node. If that was a reinterpret node with the wrong output type, **we would get a cast with the wrong src type**. We might do a double -> int cast instead of a long -> int cast. That leads to all sorts of issues.
> 
> The fuzzer test was only just recently added with [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). It uses MemorySegment, where unaligned float/double access gets handled with long/int memory access and then reinterpret (e.g. `MoveI2F`). But I was able to find examples that just work with `Float.intBitsToFloat` etc.

This pull request has now been integrated.

Changeset: e6fa8aae
Author:    Emanuel Peter <epeter at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/e6fa8aae6168ea5a8579cd0a38209ca71c32e704
Stats:     226 lines in 2 files changed: 225 ins; 0 del; 1 mod

8366845: C2 SuperWord: wrong VectorCast after VectorReinterpret with swapped src/dst type

Reviewed-by: thartmann, galder, vlivanov

-------------

PR: https://git.openjdk.org/jdk/pull/27100

From epeter at openjdk.org  Fri Sep  5 09:03:21 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 09:03:21 GMT
Subject: RFR: 8366845: C2 SuperWord: wrong VectorCast after
 VectorReinterpret with swapped src/dst type
In-Reply-To: <wM2-jj9R6aEVh5NPIoI3ycOdwjpABPLXAkzoA9Xa_8I=.d5042609-8733-4a46-9cba-220fd11e44b8@github.com>
References: <VznBrJ_glHHA4wRo0joE4KO7c5J1F5z7W2u-px1uQzQ=.6cc5a46c-8731-4612-acda-9aab0e177f5a@github.com>
 <wM2-jj9R6aEVh5NPIoI3ycOdwjpABPLXAkzoA9Xa_8I=.d5042609-8733-4a46-9cba-220fd11e44b8@github.com>
Message-ID: <h8Wj-7MocJpnnBqCE_UUJvMMEYKUz5Xv6imVz-Q7ziA=.87f4f694-815b-47b9-ab1f-7916da64ea8a@github.com>

On Thu, 4 Sep 2025 15:39:20 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> I have seen 3 manifestations of this bug:
>> 
>> 1. assert
>> 
>> # Internal Error (.../src/hotspot/cpu/x86/x86.ad:7640), pid=84140, tid=28419
>> # assert(UseAVX > 2 && VM_Version::supports_avx512dq()) failed: require
>> 
>> 
>> 2. assert
>> 
>> # Internal Error (.../src/hotspot/share/opto/vectornode.cpp:1601), pid=4022154, tid=4022168
>> # Error: assert(bt == T_FLOAT) failed
>> 
>> 
>> 3. Wrong result
>> When the feature was available but we used the wrong CastVector
>> 
>> It seems that [JDK-8346236](https://bugs.openjdk.org/browse/JDK-8346236) introduced reinterpret nodes to SuperWord:
>> 
>> 
>>   } else if (VectorNode::is_reinterpret_opcode(opc)) {
>>     assert(first->req() == 2 && req() == 2, "only one input expected");
>>     const TypeVect* vt = TypeVect::make(bt, vlen);
>>     vn = new VectorReinterpretNode(in1, vt, in1->bottom_type()->is_vect());
>> 
>> 
>> Sadly, the `src` and `dst` types are swapped. For JDK25 [JDK-8346236](https://bugs.openjdk.org/browse/JDK-8346236) this had no bad effect yet, since we only cast between HF and short, which are both based on short.
>> 
>> But with [JDK-8329077](https://bugs.openjdk.org/browse/JDK-8329077) we can now do reinterpret between I/F and between D/L. Here swapping has an effect, especially if it is followed by a cast:
>> The cast determines its input type from the output type of the input node. If that was a reinterpret node with the wrong output type, **we would get a cast with the wrong src type**. We might do a double -> int cast instead of a long -> int cast. That leads to all sorts of issues.
>> 
>> The fuzzer test was only just recently added with [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). It uses MemorySegment, where unaligned float/double access gets handled with long/int memory access and then reinterpret (e.g. `MoveI2F`). But I was able to find examples that just work with `Float.intBitsToFloat` etc.
>
> Makes sense @eme64. Happy with the fix and tests :)

@galderz @iwanowww @TobiHartmann FYI, I filed:
[JDK-8366965](https://bugs.openjdk.org/browse/JDK-8366965) C2 SuperWord: add more tests for MoveF2I / Float.floatToRawIntBits and friends
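To illustrate why the swapped reinterpret type matters, here is a scalar C++ model of a single lane, following the PR's "double -> int instead of long -> int" example. The function names are hypothetical and this is not C2 code. A reinterpret keeps the 64 bits unchanged, so the cast that follows must know whether those bits are a long or a double.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Scalar model of one vector lane; illustrative only, not C2 code.

// Reinterpret D->L (like Double.doubleToRawLongBits / MoveD2L):
// the 64 bits stay the same, only the type changes.
int64_t reinterpret_d2l(double d) {
  int64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}

// Correct follow-up cast when the input type is long: L->I keeps the low
// 32 bits, like Java's (int) on a long.
int32_t cast_l2i(int64_t v) {
  return static_cast<int32_t>(static_cast<uint32_t>(v));
}

// What a cast does if the reinterpret's output type was wrongly recorded as
// double: it reads the very same bits as a double and converts D->I.
// (Assumes the value is in int range; Java's d2i would saturate instead.)
int32_t cast_bits_as_d2i(int64_t v) {
  double as_double;
  std::memcpy(&as_double, &v, sizeof as_double);
  return static_cast<int32_t>(as_double);
}
```

With `d = 2.0` the bit pattern is `0x4000000000000000`, so the correct L->I cast gives 0 while the mis-typed D->I cast gives 2.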

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27100#issuecomment-3257611988

From duke at openjdk.org  Fri Sep  5 09:21:18 2025
From: duke at openjdk.org (Yuri Gaevsky)
Date: Fri, 5 Sep 2025 09:21:18 GMT
Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v8]
In-Reply-To: <isq4_HXM5380mciK0K2BDfDp_hZwd3alU04VJN2Ac-M=.72a798ae-a268-426a-b47f-0d952df58097@github.com>
References: <zLCHjD8YiUNz4lIXlaVpeUlxRFNkExCwc-3lwUl2lVw=.72822718-a2b3-49e0-b3cb-ca1c803cbb4f@github.com>
 <5e1o1xtN0ZdQZGJi2aVmgCEApW625koeE9F53VhDi5E=.2390045d-844e-4800-8d4b-075a2a3a8793@github.com>
 <RwWythxhNVWvJiCOA57QNH0F_uvxGi2k7SbjsGbToes=.8e100e02-e8a7-4cb1-89a9-3d546882de84@github.com>
 <isq4_HXM5380mciK0K2BDfDp_hZwd3alU04VJN2Ac-M=.72a798ae-a268-426a-b47f-0d952df58097@github.com>
Message-ID: <0xBTvjLjmNJpFoSlXulP1kaiNo97ld-fYPqsfLBzZXQ=.0b0baf4d-2373-4ad2-8bc2-47b68cc8d24f@github.com>

On Wed, 4 Jun 2025 06:04:46 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> As you can expect I am trying to implement the following code with RVV:
>> 
>> for (; i + (N-1) < cnt; i += N) {
>>    h =   31^^N     * h 
>>        + 31^^(N-1) * val[i + 0] 
>>        + 31^^(N-2) * val[i + 1] 
>>        ...
>>        + 31^^1 * val[i + (N-2)] 
>>        + 31^^0 * val[i + (N-1)];
>> }
>> for (; i < cnt; i++) {
>>    h = 31 * h + val[i];
>> }
>> 
>> where `N` is the number of array elements processed per "chunk".
>> IIUC, the main issue with your approach is the "reverse" order of array elements versus the preloaded `31^^X` coeffs WHEN the remaining number of elems is less than `N`, say `M=N-1`:
>> 
>>    h =   31^^M     * h 
>>        + 31^^(M-1) * val[i + 0] 
>>        + 31^^(M-2) * val[i + 1] 
>>        ...
>>        + 31^^1 * val[i + (M-2)]
>>        + 31^^0 * val[i + (M-1)];
>> 
>> or, going back to `N` for clarity:
>> 
>>    h =   31^^(N-1)     * h 
>>        + 31^^(N-2) * val[i + 0] 
>>        + 31^^(N-3) * val[i + 1] 
>>        ...
>>        + 31^^1 * val[i + (N-3)] 
>>        + 31^^0 * val[i + (N-2)];
>> 
>> Now we need to "slide down" the preloaded multiplier coeffs in the designated vector register by one (as `M=N-1`) to be in "sync" with `val[i + X]` (maybe moving them into a temporary VR in the process), and moreover, DO this operation IFF the remaining `cnt` is less than `N` (==> an additional check on every iteration). That's probably acceptable only in the tail phase as a one-time operation, but NOT inside the main loop...
>
> @ygaevsky @RealFYang how can we proceed?

Hi @robehn, could you please take a look at the latest updates? Thanks...
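The chunked recurrence quoted above can be checked with a plain scalar C++ model (not RVV code, and not necessarily the approach taken in the PR). It assumes the simplest way around the coefficient-alignment problem described in the quote: the main loop only processes full chunks of `N` elements, so the preloaded powers `31^(N-1) .. 31^0` always line up, and the fewer than `N` leftover elements go through the scalar tail loop. `hash_scalar` and `hash_chunked` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference loop: h = 31 * h + val[i]. uint32_t arithmetic wraps like Java's
// int arithmetic but avoids signed-overflow UB in C++.
uint32_t hash_scalar(const std::vector<int32_t>& val, uint32_t h = 0) {
  for (int32_t v : val) h = 31u * h + static_cast<uint32_t>(v);
  return h;
}

// Same recurrence, N elements per main-loop step:
//   h = 31^N * h + 31^(N-1) * val[i] + ... + 31^0 * val[i + N - 1]
// The powers are computed once up front (like a preloaded coefficient
// register). Only full chunks enter the main loop, so the powers always line
// up with the elements; the < N leftover elements use the scalar tail loop.
template <std::size_t N>
uint32_t hash_chunked(const std::vector<int32_t>& val, uint32_t h = 0) {
  uint32_t powers[N + 1];
  powers[0] = 1;
  for (std::size_t k = 1; k <= N; k++) powers[k] = powers[k - 1] * 31u;

  std::size_t i = 0;
  const std::size_t cnt = val.size();
  for (; i + (N - 1) < cnt; i += N) {
    uint32_t acc = powers[N] * h;
    for (std::size_t j = 0; j < N; j++) {
      acc += powers[N - 1 - j] * static_cast<uint32_t>(val[i + j]);
    }
    h = acc;
  }
  // Scalar tail: at most N - 1 iterations.
  for (; i < cnt; i++) h = 31u * h + static_cast<uint32_t>(val[i]);
  return h;
}
```

Both functions agree for every input length, which is what would let a vector implementation keep its coefficient register fixed in the main loop; the cost moves to the scalar tail, which runs at most `N - 1` times.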

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3257668418

From epeter at openjdk.org  Fri Sep  5 09:38:09 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 09:38:09 GMT
Subject: RFR: 8366890: C2: Split through phi printing with TraceLoopOpts
 misses line break
In-Reply-To: <w38THu82glx944Ctpeyeg9xwQttEGlEuq5iqacW6vTE=.3c2890ac-f09f-4af7-965c-5a9c754619f0@github.com>
References: <w38THu82glx944Ctpeyeg9xwQttEGlEuq5iqacW6vTE=.3c2890ac-f09f-4af7-965c-5a9c754619f0@github.com>
Message-ID: <ApSAihcZFykCbzrVsixrC4-oZR8nROMzY1kiqIjozfg=.de5e9cb3-ddb1-4290-bbd6-fd63b6bd2e3e@github.com>

On Thu, 4 Sep 2025 12:44:43 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

> [JDK-8356176](https://bugs.openjdk.org/browse/JDK-8356176) added new printing code for `TraceLoopOpts` when splitting nodes through a phi but missed a line break. This will result in:
> 
> Split 974 CmpI through 1465 Phi in 953 RegionSplit 474 Bool through 1468 Phi in 953 RegionSplit-If
> 
> instead of
> 
> Split 974 CmpI through 1465 Phi in 953 RegionSplit 474 Bool through 1468 Phi in 953 Region
> Split-If
> 
> This patch fixes this.
> 
> Thanks,
> Christian

Marked as reviewed by epeter (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/27092#pullrequestreview-3188663001

From epeter at openjdk.org  Fri Sep  5 09:42:10 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 09:42:10 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT.
In-Reply-To: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
Message-ID: <ATM_n_8iXzz0fMLRguR9NY3zzlaUJJa3LlcI6eNr_Jw=.238c7672-90a8-470e-962d-40af844d8962@github.com>

On Wed, 3 Sep 2025 00:53:59 GMT, Cesar Soares Lucas <cslucas at openjdk.org> wrote:

> Please review this patch, which fixes an issue that may occur when reducing allocation merges.
> 
> As the assert message describes, the problem is that a `Phi` considered reducible during one invocation of `adjust_scalar_replaceable_state` later turned out to be non-reducible. This can happen if a subsequent invocation of the same method makes all inputs to the Phi NSR, in which case there is no point in reducing the Phi. It can also happen during the propagation of NSR state done by `find_scalar_replaceable_allocs`.
> 
> The change in `revisit_reducible_phi_status` is just a clean-up.
> The real fix is in `find_scalar_replaceable_allocs`.
> 
> Tested on Linux x64/Aarch64 release/fastdebug with JTREG tier1-3.

test/hotspot/jtreg/compiler/escapeAnalysis/TestReduceAllocationNotReducibleAnymore.java line 58:

> 56:             }
> 57:         }
> 58:     }

Could we make the catch exception matching more precise? I'd just like to avoid a case where we miscompile and throw the wrong exception and that gets caught silently.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27063#discussion_r2324608898

From dlong at openjdk.org  Fri Sep  5 09:48:12 2025
From: dlong at openjdk.org (Dean Long)
Date: Fri, 5 Sep 2025 09:48:12 GMT
Subject: RFR: 8360031: C2 compilation asserts in MemBarNode::remove [v2]
In-Reply-To: <5CGrcWjFZ7Zqj_Tm0LO6Tqg9cUA-xxvcaa2J-yWW8BE=.af4dea7c-e39d-491d-b924-c89fa82e757a@github.com>
References: <h6lgQmZZxVXPeJl0fRWlz8J713AAhfTAw_5UJ3ZL1S4=.b4b40295-6ea8-4498-abb6-98201776b3c9@github.com>
 <5CGrcWjFZ7Zqj_Tm0LO6Tqg9cUA-xxvcaa2J-yWW8BE=.af4dea7c-e39d-491d-b924-c89fa82e757a@github.com>
Message-ID: <Oq3YTqRpayxl8xuTMum0szW-ILHPr_Yq_yYGiY0Yfww=.a0bcfb72-6863-497e-8bfd-dc9bff4ec140@github.com>

On Thu, 14 Aug 2025 10:54:08 GMT, Damon Fenacci <dfenacci at openjdk.org> wrote:

>> # Issue
>> While compiling `java.util.zip.ZipFile` in C2 this assert is triggered
>> https://github.com/openjdk/jdk/blob/a2e86ff3c56209a14c6e9730781eecd12c81d170/src/hotspot/share/opto/memnode.cpp#L4235
>> 
>> # Cause
>> While compiling the constructor of java.util.zip.ZipFile$CleanableResource the following happens:
>> * we insert a trailing `MemBarStoreStore` in the constructor
>> [before_folding image](https://github.com/user-attachments/assets/c1aab634-808d-4198-94ac-8093c6b85c5d)
>> 
>> * during IGVN we completely fold the memory subtree of the `MemBarStoreStore` node. The node still has a control output attached. 
>> [after_folding image](https://github.com/user-attachments/assets/568e9fc3-5f19-4e10-a72e-f0a5e772daed)
>> 
>> * later during the same IGVN run the `MemBarStoreStore` node is handled and we try to remove it (because the `Allocate` node of the `MemBar` does not escape the thread) https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4301-L4302
>> * the assert https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4235
>> triggers because the barrier has only 1 (control) output and is a `MemBarStoreStore` (not `Initialize`) barrier
>> 
>> The issue happens only when `UseStoreStoreForCtor` is set (the default), which makes C2 use `MemBarStoreStore` instead of `MemBarRelease` at the end of constructors. `MemBarStoreStore` nodes are processed separately by EA, and this happens after the IGVN pass that folds the memory subtree. `MemBarRelease` nodes, on the other hand, are handled during the same IGVN pass, before the memory subtree gets removed, while they still have 2 outputs (so the assert is skipped).
>> 
>> # Fix
>> Adapting the assert to accept that `MemBarStoreStore` can also have `!= 2` outputs (when `+UseStoreStoreForCtor` is used) seems to be an OK solution as this seems like a perfectly plausible situation.
>> 
>> # Testing
>> Unfortunately reproducing the issue with a simple regression test has proven very hard. The test seems to rely on very peculiar profiling and IGVN worklist sequence. JBS replay compilation passes. Running JCK's `api/java_util` 100 times triggers the assert a couple of times on average before the fix, none after.
>> Tier 1-3+ tests passed.
>
> Damon Fenacci has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8360031
>  - JDK-8360031: update assert message
>  - Merge branch 'master' into JDK-8360031
>  - JDK-8360031: remove unnecessary include
>  - JDK-8360031: remove UseNewCode
>  - JDK-8360031: compilation asserts in MemBarNode::remove

I stepped through the crash with the replay file, and I'm not convinced that the problem is only with MemBarStoreStore and not MemBarRelease.  What happens in the replay crash is the MemBarStoreStore gets onto the worklist through an indirect route in ConnectionGraph::split_unique_types() because of its memory edge.  I think this explains why it is intermittent and hard to reproduce.  A MemBarRelease on the other hand would get added to the worklist directly in compute_escape() if it has a Precedent edge.
The different handling of MemBarStoreStore vs MemBarRelease in this code is confusing.  The MemBarRelease code came from JDK-6934604.  It adds the node to the worklist, and lets MemBarNode::Ideal remove it based on does_not_escape_thread() on the alloc node.  Contrast that with the MemBarStoreStore handling, which came from JDK-7121140, and instead of removing the node, it replaces it with a MemBarCPUOrder based on not_global_escape() on the alloc node.  This MemBarStoreStore handling is for "MemBarStoreStore nodes added in library_call.cpp" and seems to fail to work for  MemBarStoreStore nodes added in the ctor, which means MemBarStoreStore nodes added in the ctor only get on the worklist by accident, as mentioned above.
I think the conservative fix is to have compute_escape() always add the MemBarStoreStore to the worklist if it has a Precedent edge.  Because of StressIGVN randomizing the worklist, I think the outcnt() can be 1 for either MemBarStoreStore or MemBarRelease, so we should relax the assert accordingly.  I'm not sure how useful the assert will be after that.  It might be better to remove it.
Longer-term, it might be nice to get rid of the separate handling of "MemBarStoreStore nodes added in library_call.cpp" if the MemBarCPUOrder is not really needed.
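A toy model of the conservative fix proposed above, for illustration only. `ToyMemBar`, `worklist_current` and `worklist_proposed` are hypothetical names, not HotSpot code. Under the current handling, only a MemBarRelease with a Precedent edge is pushed directly onto the worklist. The proposal also pushes a MemBarStoreStore with a Precedent edge, so that node no longer depends on reaching the worklist indirectly through its memory edge.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for the barrier kinds discussed above.
enum class MemBarKind { Release, StoreStore, CPUOrder };

struct ToyMemBar {
  MemBarKind kind;
  bool has_precedent;  // true if the barrier has a Precedent edge to an allocation
};

// Current behaviour as described in the review: only MemBarRelease nodes with a
// Precedent edge are pushed directly; MemBarStoreStore only arrives indirectly.
std::vector<int> worklist_current(const std::vector<ToyMemBar>& membars) {
  std::vector<int> worklist;
  for (int i = 0; i < static_cast<int>(membars.size()); i++) {
    if (membars[i].has_precedent && membars[i].kind == MemBarKind::Release) {
      worklist.push_back(i);
    }
  }
  return worklist;
}

// Proposed conservative fix: push both kinds whenever a Precedent edge exists,
// so a later IGVN pass visits them deterministically rather than by accident.
std::vector<int> worklist_proposed(const std::vector<ToyMemBar>& membars) {
  std::vector<int> worklist;
  for (int i = 0; i < static_cast<int>(membars.size()); i++) {
    const ToyMemBar& mb = membars[i];
    bool candidate = mb.kind == MemBarKind::Release || mb.kind == MemBarKind::StoreStore;
    if (mb.has_precedent && candidate) {
      worklist.push_back(i);
    }
  }
  return worklist;
}
```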

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26556#issuecomment-3257743488

From epeter at openjdk.org  Fri Sep  5 09:49:10 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 09:49:10 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT.
In-Reply-To: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
Message-ID: <HZn3JeENl-T5K7vUd3hATBtHU7OuYga4gICJ-DC3z5c=.c3d954d1-eef8-4054-bc08-47d140f3f681@github.com>

On Wed, 3 Sep 2025 00:53:59 GMT, Cesar Soares Lucas <cslucas at openjdk.org> wrote:

> Please review this patch, which fixes an issue that may occur when reducing allocation merges.
> 
> As the assert message describes, the problem is that a `Phi` considered reducible during one invocation of `adjust_scalar_replaceable_state` later turns out to be non-reducible. This can happen if a subsequent invocation of the same method causes all inputs to the Phi to become NSR, in which case there is no point in reducing the Phi. It can also happen during the propagation of NSR state done by `find_scalar_replaceable_allocs`.
> 
> The change in `revisit_reducible_phi_status` is just a clean-up.
> The real fix is in `find_scalar_replaceable_allocs`.
> 
> Tested on Linux x64/Aarch64 release/fastdebug with JTREG tier1-3.
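A simplified sketch of the invariant involved, assuming (per the description) that a merge Phi is only worth reducing while at least one of its input allocations is still scalar replaceable. `ToyPhi`, `can_reduce` and `revisit_reducible` are hypothetical names. The real `can_reduce_phi` checks considerably more, as the review comment on `PhaseMacroExpand::can_eliminate_allocation` points out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: an allocation is either scalar replaceable (SR) or not (NSR);
// a merge Phi lists the allocations flowing into it.
struct ToyPhi {
  std::vector<int> inputs;  // indices into the allocation table
};

// Simplified criterion: a Phi is only worth reducing while at least one input is SR.
bool can_reduce(const ToyPhi& phi, const std::vector<bool>& scalar_replaceable) {
  for (int alloc : phi.inputs) {
    if (scalar_replaceable[alloc]) return true;
  }
  return false;
}

// Re-validate the reducible Phis after NSR state has been propagated,
// dropping any Phi whose inputs have all become NSR in the meantime.
std::vector<ToyPhi> revisit_reducible(const std::vector<ToyPhi>& phis,
                                      const std::vector<bool>& scalar_replaceable) {
  std::vector<ToyPhi> still_reducible;
  for (const ToyPhi& phi : phis) {
    if (can_reduce(phi, scalar_replaceable)) still_reducible.push_back(phi);
  }
  return still_reducible;
}
```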

Just a drive-by comment.

You also have a broken title ;)

src/hotspot/share/opto/escape.cpp line 3078:

> 3076:     Node* phi = reducible_merges.at(i);
> 3077: 
> 3078:     if (!can_reduce_phi(phi->as_Phi())) {

You say this is a pure cleanup? There are some slight differences in the code though, right?
This method call checks `PhaseMacroExpand::can_eliminate_allocation`, and has a side effect with `ptn->set_scalar_replaceable(false)`.

Just pointing it out; I'm not an EA expert.

-------------

PR Review: https://git.openjdk.org/jdk/pull/27063#pullrequestreview-3188687569
PR Review Comment: https://git.openjdk.org/jdk/pull/27063#discussion_r2324618370

From shade at openjdk.org  Fri Sep  5 10:16:12 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 5 Sep 2025 10:16:12 GMT
Subject: RFR: 8366588: VectorAPI: Re-intrinsify VectorMask.laneIsSet where
 the input index is a variable
In-Reply-To: <XaokWZpc-AL2hWWGvsmDczO1u4o5uGEaCSgPjSyxkz4=.3251a974-6374-48c8-a8e4-88914b730505@github.com>
References: <XaokWZpc-AL2hWWGvsmDczO1u4o5uGEaCSgPjSyxkz4=.3251a974-6374-48c8-a8e4-88914b730505@github.com>
Message-ID: <QunSTeuoPDHArXeK7cAKGYHYG8uy9S49l8XXAXrxfTc=.34152256-e559-45ee-b91d-2952663005ed@github.com>

On Fri, 5 Sep 2025 08:13:28 GMT, erifan <duke at openjdk.org> wrote:

> Intrinsic support for `VectorMask.laneIsSet` with a **variable** input index was introduced in PR #14200, but was inadvertently broken by PR #25673. This PR restores the intrinsic functionality and adds some JTReg tests.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE:
> 
> Benchmark                        Unit    Before        Score Error  After         Score Error  Uplift
> microMaskLaneIsSetByte128_var    ops/ms  21702.14415   91.902159    103472.9391   36.057447    4.767867
> microMaskLaneIsSetByte64_var     ops/ms  21468.51868   107.94177    103365.6561   69.47736     4.814754
> microMaskLaneIsSetDouble128_var  ops/ms  77489.32791   153.242699   413499.4127   311.854079   5.336211
> microMaskLaneIsSetFloat128_var   ops/ms  41034.95204   399.421823   206840.0988   74.702234    5.040583
> microMaskLaneIsSetFloat64_var    ops/ms  77607.40268   175.938921   413745.3001   149.716794   5.33126
> microMaskLaneIsSetInt128_var     ops/ms  41452.48893   76.143208    206845.9754   59.371129    4.989953
> microMaskLaneIsSetInt64_var      ops/ms  77726.2542    173.180518   413427.8838   363.575023   5.319024
> microMaskLaneIsSetLong128_var    ops/ms  77646.11218   177.496587   413403.4404   236.609314   5.3242
> microMaskLaneIsSetShort128_var   ops/ms  21374.93265   48.13101     103417.4618   34.827021    4.838259
> microMaskLaneIsSetShort64_var    ops/ms  41066.19395   353.320621   206801.109    106.408938   5.035799
> 
> 
> Benchmarks on Intel 6444y machine with 512-bit avx3:
> 
> Benchmark                        Unit    Before        Score Error  After         Score Error  Uplift
> microMaskLaneIsSetByte128_var    ops/ms  57658.45497   240.209309   211643.8406   29.214532    3.670647
> microMaskLaneIsSetByte256_var    ops/ms  57451.68169   116.994128   211609.4652   160.48513    3.683259
> microMaskLaneIsSetByte512_var    ops/ms  57530.22411   311.63868    199802.8084   408.144015   3.473005
> microMaskLaneIsSetByte64_var     ops/ms  57642.2672    161.406221   205252.4464   196.86852    3.560797
> microMaskLaneIsSetDouble256_var  ops/ms  114401.3789   231.797375   361400.344    565.593984   3.159055
> microMaskLaneIsSetDouble512_var  ops/ms  57379.27882   159.699503   211476.1138   136.980026   3.685583
> microMaskLaneIsSetFloat128_var   ops/ms  113943.9512   141.062663   360855.3915   494.471996   3.166955
> microMaskLaneIsSetFloat256_var   ops/ms  57682.78182   138.142053   211659.5098   30.167972    3.66937
> microMaskLaneIsSetFloat512_var   ops/ms  57617.66405   301.748599   211246.8588   597.18949    3.666355
> microMaskLaneIsSetInt128_var     ops/ms  113914.5062   118.681382   360856.4465   555.097397   3.167783
> microMaskLaneIsSetInt256_var     ops/ms  57681.79883   112.391639   211555.6742   217.556981   3.667633
> microMaskLaneIsSetInt512_var     ops/ms  57350.20346   206.146723   211657.7207   68.461571    3.690618
> microMaskLane...
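The Uplift column appears to be the After score divided by the Before score, with both in ops/ms and higher being better. Below is a minimal check of that reading against the first row of each table. `uplift` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>

// Ratio of the patched throughput to the baseline throughput (both in ops/ms).
double uplift(double before_ops_per_ms, double after_ops_per_ms) {
  return after_ops_per_ms / before_ops_per_ms;
}
```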

This looks fine to me. I took another look at [JDK-8358749](https://bugs.openjdk.org/browse/JDK-8358749), and I think this is the only place where we can really accept the non-constant input. In all other cases, we either pull `is_con()` or `const_oop()` out of the input.

I think we will bikeshed about the tests a bit.

test/micro/org/openjdk/bench/jdk/incubator/vector/VectorExtractBenchmark.java line 34:

> 32: @Warmup(iterations = 5, time = 1)
> 33: @Measurement(iterations = 5, time = 1)
> 34: @Fork(value = 1, jvmArgs = {"--add-modules=jdk.incubator.vector"})

Don't do 1 fork, do at least 3.

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27113#pullrequestreview-3188769547
PR Review Comment: https://git.openjdk.org/jdk/pull/27113#discussion_r2324679427

From epeter at openjdk.org  Fri Sep  5 10:52:16 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 10:52:16 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v5]
In-Reply-To: <sikeN2K9EHmLEEnGsNttJSsPefvEtrKk9q7OOIgqQG0=.238662f7-394c-475b-bd07-effb28302b2c@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <sikeN2K9EHmLEEnGsNttJSsPefvEtrKk9q7OOIgqQG0=.238662f7-394c-475b-bd07-effb28302b2c@github.com>
Message-ID: <LeM6JS1sHVJMB2Ziw2pYgcTQr7hHcPUSK6guAV5DCC0=.f3de7008-b13c-4a96-bc85-b50784bb21ce@github.com>

On Mon, 4 Aug 2025 02:31:08 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> This is a follow-up patch of [1], which aims at implementing the subword gather load APIs for AArch64 SVE platform.
>> 
>> ### Background
>> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices. SVE provides native gather load instructions for `byte`/`short` types using `int` vectors for indices. The vector size for a gather-load instruction is determined by the index vector (i.e. `int` elements). Hence, the total size is `32 * elem_num` bits, where `elem_num` is the number of loaded elements in the vector register.
>> 
>> ### Implementation
>> 
>> #### Challenges
>> Due to size differences between `int` indices (32-bit) and `byte`/`short` data (8/16-bit), operations must be split across multiple vector registers based on the target SVE vector register size constraints.
>> 
>> For a 512-bit SVE machine, loading a `byte` vector with different vector species requires different approaches:
>> - SPECIES_64: Single operation with mask (8 elements, 256-bit)
>> - SPECIES_128: Single operation, full register (16 elements, 512-bit)
>> - SPECIES_256: Two operations + merge (32 elements, 1024-bit)
>> - SPECIES_512/MAX: Four operations + merge (64 elements, 2048-bit)
>> 
>> Use `ByteVector.SPECIES_512` as an example:
>> - It contains 64 elements, so the index vector size should be `64 * 32` bits, which is 4 times the SVE vector register size.
>> - It requires 4 vector gather-loads to finish the whole operation.
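The splitting rule above can be checked with a little arithmetic. This sketch assumes, as described, that each gather-load uses 32-bit `int` indices, so one register of `reg_bits` bits covers `reg_bits / 32` loaded elements. `gather_ops_needed` and `needs_partial_mask` are hypothetical helper names.

```cpp
#include <cassert>

// Number of gather-load instructions needed for `elem_num` subword elements.
int gather_ops_needed(int elem_num, int reg_bits) {
  assert(reg_bits >= 32 && reg_bits % 32 == 0);
  int elems_per_op = reg_bits / 32;                      // int index lanes per register
  return (elem_num + elems_per_op - 1) / elems_per_op;   // round up
}

// True if the index vector does not fill a whole number of registers,
// so the (last) gather-load has to run under a governing mask.
bool needs_partial_mask(int elem_num, int reg_bits) {
  return (elem_num * 32) % reg_bits != 0;
}
```

On a 512-bit machine this reproduces the list: SPECIES_64 (8 bytes) needs one masked load, SPECIES_128 one full load, SPECIES_256 two loads and SPECIES_512 four.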
>> 
>> 
>> byte[] arr = [a, a, a, a, ..., a, b, b, b, b, ..., b, c, c, c, c, ..., c, d, d, d, d, ..., d, ...]
>> int[] idx = [0, 1, 2, 3, ..., 63, ...]
>> 
>> 4 gather-load:
>> idx_v1 = [15 14 13 ... 1 0]    gather_v1 = [... 0000 0000 0000 0000 aaaa aaaa aaaa aaaa]
>> idx_v2 = [31 30 29 ... 17 16]  gather_v2 = [... 0000 0000 0000 0000 bbbb bbbb bbbb bbbb]
>> idx_v3 = [47 46 45 ... 33 32]  gather_v3 = [... 0000 0000 0000 0000 cccc cccc cccc cccc]
>> idx_v4 = [63 62 61 ... 49 48]  gather_v4 = [... 0000 0000 0000 0000 dddd dddd dddd dddd]
>> merge: v = [dddd dddd dddd dddd cccc cccc cccc cccc bbbb bbbb bbbb bbbb aaaa aaaa aaaa aaaa]
>> 
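The following is a scalar simulation of the split gather and merge shown in the diagram, for illustration only. It is not SVE code, and `split_gather` is a hypothetical name. Each part gathers `elems_per_op` bytes through `int` indices into `int` lanes. The merge step then narrows those lanes back to bytes and places part k starting at lane `k * elems_per_op`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<int8_t> split_gather(const std::vector<int8_t>& arr,
                                 const std::vector<int>& idx,
                                 std::size_t elems_per_op) {
  assert(elems_per_op > 0);
  std::vector<int8_t> merged(idx.size());
  for (std::size_t lo = 0; lo < idx.size(); lo += elems_per_op) {
    // One "gather-load": each int index selects one byte, which lands in an int lane.
    std::vector<int32_t> lanes;
    for (std::size_t i = lo; i < lo + elems_per_op && i < idx.size(); i++) {
      lanes.push_back(arr[static_cast<std::size_t>(idx[i])]);
    }
    // Merge: narrow each int lane back to a byte and place it at its final lane.
    for (std::size_t j = 0; j < lanes.size(); j++) {
      merged[lo + j] = static_cast<int8_t>(lanes[j]);
    }
  }
  return merged;
}
```

With `elems_per_op = 16` (a 512-bit register of `int` indices) and 64 indices, this performs the four partial gathers from the diagram. The result matches a direct element-by-element gather.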
>> 
>> #### Solution
>> The implementation simplifies backend complexity by defining each gather load IR to handle one vector gather-load operation, with multiple IRs generated in the compiler mid-end.
>> 
>> Here are the main changes:
>> - Enhanced IR generation with architecture-specific patterns based on `gather_scatter_needs_vector_index()` matcher.
>> - Added `VectorSliceNode` for result mer...
>
> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
> 
>  - Merge 'jdk:master' into JDK-8351623-sve
>  - Address review comments
>  - Refine IR pattern and clean backend rules
>  - Fix indentation issue and move the helper matcher method to header files
>  - Merge branch jdk:master into JDK-8351623-sve
>  - 8351623: VectorAPI: Add SVE implementation of subword gather load operation

Looks very interesting. I have a first series of questions / comments :)

There is definitely a tradeoff between complexity in the backend and in the C2 IR, and I'm still trying to wrap my head around that decision. I'm just afraid that adding more very specific C2 IR nodes makes optimizations in the C2 IR more complicated.

src/hotspot/cpu/aarch64/aarch64_vector.ad line 6008:

> 6006: // predicate and place in elements of twice their size within
> 6007: // the destination predicate.
> 6008: 

Suggestion:


unnecessary empty line

src/hotspot/share/opto/vectornode.hpp line 1123:

> 1121:   // The basic type of memory, which might be different with the vector element type
> 1122:   // when it is a subword type loading.
> 1123:   BasicType _mem_bt;

Can you make an example and add it to the comment?
Can you please also add some comment at the node about what we expect the index map to be? What basic type does it have?

src/hotspot/share/opto/vectornode.hpp line 1769:

> 1767: //      dst = [h g f e d c b a]
> 1768: //
> 1769: class VectorConcatenateNode : public VectorNode {

That semantic is not quite what I would expect from `Concatenate`. Maybe we can call it something else?
`VectorConcatenateAndNarrowNode`?
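The sketch below reflects one reading of the node comment `dst = [h g f e d c b a]`, and that reading is an assumption. Every element of `vec1` and `vec2` is narrowed to half its width, which is assumed to be plain truncation. `vec1`'s elements then go into the low lanes and `vec2`'s into the high lanes, with lane 0 as the lowest lane. `concatenate_and_narrow` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

std::vector<int16_t> concatenate_and_narrow(const std::vector<int32_t>& vec1,
                                            const std::vector<int32_t>& vec2) {
  assert(vec1.size() == vec2.size());
  std::vector<int16_t> dst;
  dst.reserve(vec1.size() * 2);
  for (int32_t v : vec1) dst.push_back(static_cast<int16_t>(v));  // low lanes
  for (int32_t v : vec2) dst.push_back(static_cast<int16_t>(v));  // high lanes
  return dst;
}
```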

src/hotspot/share/opto/vectornode.hpp line 1774:

> 1772:     : VectorNode(vec1, vec2, vt) {
> 1773:     assert(type2aelembytes(vec1->bottom_type()->is_vect()->element_basic_type()) ==
> 1774:            type2aelembytes(vt->element_basic_type()) * 2, "must be half size");

What about asserting that `vec1` and `vec2` have the same `vect`?

src/hotspot/share/opto/vectornode.hpp line 1841:

> 1839: 
> 1840: // Unpack the elements to twice size.
> 1841: class VectorMaskWidenNode : public VectorNode {

Can you add a visual example like above for `VectorConcatenateNode`, please?
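Here is a lane-level sketch of widening a mask. It is modelled on the SVE PUNPKLO/PUNPKHI wording in the `aarch64_vector.ad` comment quoted above: a mask over `n` narrow lanes becomes a mask over `n / 2` lanes of twice the element size, taken from the low or high half. Whether `VectorMaskWidenNode` also has a high-half variant is an assumption here, and `mask_widen` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

std::vector<bool> mask_widen(const std::vector<bool>& mask, bool high_half) {
  assert(mask.size() % 2 == 0);
  const std::ptrdiff_t half = static_cast<std::ptrdiff_t>(mask.size() / 2);
  auto first = mask.begin() + (high_half ? half : 0);
  // Each selected lane now describes one element of twice the original size.
  return std::vector<bool>(first, first + half);
}
```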

-------------

PR Review: https://git.openjdk.org/jdk/pull/26236#pullrequestreview-3188813972
PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2324710079
PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2324736345
PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2324740007
PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2324741462
PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2324744990

From epeter at openjdk.org  Fri Sep  5 10:52:17 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 10:52:17 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v3]
In-Reply-To: <b1TzbMFznYJuFizcy93hsTxo9-hoyDe7YKUuIsy7xRA=.6811ef6f-3b3b-4b8a-b63b-75d824e65968@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <O5IGyu-C8N8goFvkFoKQxKuJ67f1_tedjCMqIwsLx1g=.69f50bdd-781e-4379-a8b5-12f8858ea299@github.com>
 <1XFXtkTlDshGtoxEdLVg0f2J2rtn4wz7CdUB9pb9N2g=.25e7e0b5-8468-4d91-adb9-c459bda40933@github.com>
 <b1TzbMFznYJuFizcy93hsTxo9-hoyDe7YKUuIsy7xRA=.6811ef6f-3b3b-4b8a-b63b-75d824e65968@github.com>
Message-ID: <hjuxd7lDyNoeFhxtYBMJQA1IDwzdu5tb1ZQcBqQLSeA=.623134f4-b2b8-4010-a6b5-5815e9d29aaf@github.com>

On Fri, 1 Aug 2025 01:48:51 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> src/hotspot/cpu/arm/matcher_arm.hpp line 160:
>> 
>>> 158:   static const bool supports_encode_ascii_array = false;
>>> 159: 
>>> 160:   // Return true if vector gather-load/scatter-store needs vector index as input.
>> 
>> If the function returns `false`, does it indicate one of the following cases?
>> - Vector gather-load or scatter-store does not accept a vector index for the current use case on this platform.
>> - The current platform does not support vector gather-load or scatter-store at all.
>
> Yes, I think so.

To me a `false` means this:
If we support gather/scatter, then we do not need a vector index; we can do without it.

Is that correct?

But that would contradict @fg1417's statement:
If we support gather/scatter, then we do not permit a vector index.

Can you clarify?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2324726476

From epeter at openjdk.org  Fri Sep  5 10:52:18 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 10:52:18 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v5]
In-Reply-To: <LeM6JS1sHVJMB2Ziw2pYgcTQr7hHcPUSK6guAV5DCC0=.f3de7008-b13c-4a96-bc85-b50784bb21ce@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <sikeN2K9EHmLEEnGsNttJSsPefvEtrKk9q7OOIgqQG0=.238662f7-394c-475b-bd07-effb28302b2c@github.com>
 <LeM6JS1sHVJMB2Ziw2pYgcTQr7hHcPUSK6guAV5DCC0=.f3de7008-b13c-4a96-bc85-b50784bb21ce@github.com>
Message-ID: <wBLVOTyKQkrpbBB04sxxxgZ0wl7A8_HuC2K-3s8j4wQ=.22931e86-0381-46ea-8d34-5020542e063c@github.com>

On Fri, 5 Sep 2025 10:37:39 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
>> 
>>  - Merge 'jdk:master' into JDK-8351623-sve
>>  - Address review comments
>>  - Refine IR pattern and clean backend rules
>>  - Fix indentation issue and move the helper matcher method to header files
>>  - Merge branch jdk:master into JDK-8351623-sve
>>  - 8351623: VectorAPI: Add SVE implementation of subword gather load operation
>
> src/hotspot/share/opto/vectornode.hpp line 1123:
> 
>> 1121:   // The basic type of memory, which might be different with the vector element type
>> 1122:   // when it is a subword type loading.
>> 1123:   BasicType _mem_bt;
> 
> Can you make an example and add it to the comment?
> Can you please also add some comment at the node about what we expect the index map to be? What basic type does it have?

Same for the scatter.

> src/hotspot/share/opto/vectornode.hpp line 1769:
> 
>> 1767: //      dst = [h g f e d c b a]
>> 1768: //
>> 1769: class VectorConcatenateNode : public VectorNode {
> 
> That semantic is not quite what I would expect from `Concatenate`. Maybe we can call it something else?
> `VectorConcatenateAndNarrowNode`?

Have you considered using `2x Cast + Concatenate` instead, and just matching that in the backend? I don't remember how to do the mere Concat, but it should be possible via the `unslice` or some other operation that concatenates two vectors.
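A sketch of the alternative decomposition suggested above: first narrow each input with an element-wise cast, then concatenate the two narrow halves. Truncating casts are assumed. `cast_narrow`, `concatenate` and `cast_then_concat` are hypothetical names, not C2 node names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Element-wise narrowing cast (assumed to truncate).
std::vector<int16_t> cast_narrow(const std::vector<int32_t>& v) {
  std::vector<int16_t> out;
  out.reserve(v.size());
  for (int32_t x : v) out.push_back(static_cast<int16_t>(x));
  return out;
}

// Plain concatenation: `lo` fills the low lanes, `hi` the high lanes.
std::vector<int16_t> concatenate(const std::vector<int16_t>& lo,
                                 const std::vector<int16_t>& hi) {
  std::vector<int16_t> out(lo);
  out.insert(out.end(), hi.begin(), hi.end());
  return out;
}

// Cast + Cast + Concatenate, which a backend rule could match as a single pattern.
std::vector<int16_t> cast_then_concat(const std::vector<int32_t>& vec1,
                                      const std::vector<int32_t>& vec2) {
  assert(vec1.size() == vec2.size());
  return concatenate(cast_narrow(vec1), cast_narrow(vec2));
}
```

Under the truncation assumption, this gives the same lanes as a single concatenate-and-narrow operation.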

> src/hotspot/share/opto/vectornode.hpp line 1774:
> 
>> 1772:     : VectorNode(vec1, vec2, vt) {
>> 1773:     assert(type2aelembytes(vec1->bottom_type()->is_vect()->element_basic_type()) ==
>> 1774:            type2aelembytes(vt->element_basic_type()) * 2, "must be half size");
> 
> What about asserting that `vec1` and `vec2` have the same `vect`?

And what about the vector length being consistent between `vec1`, `vec2` and `vt`?
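Here are the consistency checks raised in these two comments, written against a hypothetical `ToyVectType` rather than C2's real vector type. The sketch assumes the result packs both inputs' lanes into elements of half the size. Under that assumption, both inputs share one type, the element size halves, and the lane count doubles.

```cpp
#include <cassert>

// Hypothetical stand-in for the relevant parts of a vector type.
struct ToyVectType {
  int elem_bytes;  // size of one element in bytes
  int length;      // number of lanes
  bool operator==(const ToyVectType& o) const {
    return elem_bytes == o.elem_bytes && length == o.length;
  }
};

bool concat_narrow_types_consistent(const ToyVectType& vec1, const ToyVectType& vec2,
                                    const ToyVectType& vt) {
  return vec1 == vec2 &&                       // same input vector type
         vec1.elem_bytes == vt.elem_bytes * 2 &&  // elements narrow to half size
         vt.length == vec1.length * 2;          // lanes of both inputs fit in vt
}
```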

> src/hotspot/share/opto/vectornode.hpp line 1841:
> 
>> 1839: 
>> 1840: // Unpack the elements to twice size.
>> 1841: class VectorMaskWidenNode : public VectorNode {
> 
> Can you add a visual example like above for `VectorConcatenateNode`, please?

Did you consider the alternative of `Extract` + `Cast`? Not sure if that would be better; you know more about the code complexity. It would just allow us to have one node fewer.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2324737096
PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2324754984
PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2324742727
PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2324748722

From shade at openjdk.org  Fri Sep  5 11:45:20 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 5 Sep 2025 11:45:20 GMT
Subject: RFR: 8357258: x86: Improve receiver type profiling reliability
Message-ID: <X9RJoDWqk67MlK7P4ACCbzROxLkfEqvU0zDid3zymnc=.7c0086a3-f514-4b0d-8bc0-592bf4a05bba@github.com>

See the bug for a discussion of the issues the current machinery has.

This PR executes the plan outlined in the bug:
 1. Common the receiver type profiling code in interpreter and C1
 2. Rewrite receiver type profiling code to only do atomic receiver slot installations
 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed 

This PR does _not_ make the counter updates themselves atomic, as that may have much wider performance implications, including regressions. This PR should be at least performance neutral.
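This is a simplified C++ model of the slot-installation scheme described above, not the actual interpreter or C1 assembly. A receiver is installed into an empty slot with a CAS, so two threads cannot both claim the same slot for different receivers. The counter increments stay plain read-modify-write updates that can lose counts under contention. `ToyReceiverProfile` and `racy_increment` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Non-atomic read-modify-write on an atomic cell: concurrent updates may be
// lost, but unlike a plain uint64_t this is not a C++ data race.
inline void racy_increment(std::atomic<uint64_t>& c) {
  c.store(c.load(std::memory_order_relaxed) + 1, std::memory_order_relaxed);
}

template <int N>
struct ToyReceiverProfile {
  std::atomic<const void*> receiver[N];
  std::atomic<uint64_t> count[N];
  std::atomic<uint64_t> polymorphic_count{0};

  ToyReceiverProfile() {
    for (int i = 0; i < N; i++) {
      receiver[i].store(nullptr);
      count[i].store(0);
    }
  }

  void record(const void* klass) {
    for (int i = 0; i < N; i++) {
      const void* cur = receiver[i].load();
      if (cur == nullptr) {
        // Claim the empty slot; on failure `cur` holds what another thread installed.
        if (receiver[i].compare_exchange_strong(cur, klass)) {
          racy_increment(count[i]);
          return;
        }
      }
      if (cur == klass) {
        racy_increment(count[i]);
        return;
      }
    }
    // No matching or free slot: account the call as polymorphic.
    racy_increment(polymorphic_count);
  }
};
```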

Additional testing:
  - [x] Linux x86_64 server fastdebug, `compiler/`
  - [ ] Linux x86_64 server fastdebug, `all`

-------------

Commit messages:
 - Drop atomic counters
 - Initial version

Changes: https://git.openjdk.org/jdk/pull/25305/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25305&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8357258
  Stats: 350 lines in 7 files changed: 135 ins; 196 del; 19 mod
  Patch: https://git.openjdk.org/jdk/pull/25305.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25305/head:pull/25305

PR: https://git.openjdk.org/jdk/pull/25305

From shade at openjdk.org  Fri Sep  5 11:45:20 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 5 Sep 2025 11:45:20 GMT
Subject: RFR: 8357258: x86: Improve receiver type profiling reliability
In-Reply-To: <X9RJoDWqk67MlK7P4ACCbzROxLkfEqvU0zDid3zymnc=.7c0086a3-f514-4b0d-8bc0-592bf4a05bba@github.com>
References: <X9RJoDWqk67MlK7P4ACCbzROxLkfEqvU0zDid3zymnc=.7c0086a3-f514-4b0d-8bc0-592bf4a05bba@github.com>
Message-ID: <2TI8gwmSmpcW8-UCscGYU_5qijJhfmmetVox0yDDkOU=.2bf74836-67c3-4b21-92b8-1780f2e03582@github.com>

On Mon, 19 May 2025 14:59:36 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> See the bug for a discussion of the issues the current machinery has.
> 
> This PR executes the plan outlined in the bug:
>  1. Common the receiver type profiling code in interpreter and C1
>  2. Rewrite receiver type profiling code to only do atomic receiver slot installations
>  3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed 
> 
> This PR does _not_ make the counter updates themselves atomic, as that may have much wider performance implications, including regressions. This PR should be at least performance neutral.
> 
> Additional testing:
>   - [x] Linux x86_64 server fastdebug, `compiler/`
>   - [ ] Linux x86_64 server fastdebug, `all`

In addition to the reliability improvements, the denser loop significantly improves tier3 code density. With a larger `TypeProfileWidth`, type profile checks are a significant part of the generated code. This density improvement allows us to do the CAS without increasing the code size. It also allows us to store (more) tier3 code in AOTCache going forward. If/when folks (looking at @theRealAph, really) start doing probabilistic profiling counters, this budget increase would also help to cram in more code.


$ for I in 1 2 3 4; do build/linux-x86_64-server-release/images/jdk/bin/java -XX:TieredStopAtLevel=${I} \
  -Xcomp -XX:+CITime -Xmx2g Hello.java 2>&1 | grep "Tier${I}" | cut -d' ' -f 3,23-; done

=== -XX:TypeProfileWidth=2 (default)

# Baseline
Tier1 nmethods_code_size:  7091616 bytes
Tier2 nmethods_code_size:  7579424 bytes
Tier3 nmethods_code_size: 17494984 bytes
Tier4 nmethods_code_size:  6058128 bytes

# Patched
Tier1 nmethods_code_size:  7091648 bytes
Tier2 nmethods_code_size:  7581808 bytes
Tier3 nmethods_code_size: 16806440 bytes (-3.9%)
Tier4 nmethods_code_size:  6057920 bytes

=== -XX:TypeProfileWidth=8 (default with +UseJVMCICompiler)

# Baseline
Tier1 nmethods_code_size:  7091672 bytes
Tier2 nmethods_code_size:  7580576 bytes
Tier3 nmethods_code_size: 28096448 bytes
Tier4 nmethods_code_size:  6061280 bytes

# Patched
Tier1 nmethods_code_size:  7090760 bytes
Tier2 nmethods_code_size:  7579432 bytes
Tier3 nmethods_code_size: 16837688 bytes (-40.1% !!!)
Tier4 nmethods_code_size:  6058104 bytes
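For reference, the change relative to the baseline size is `(patched - baseline) / baseline`. The trivial helper below (`percent_change`, a hypothetical name) applies that formula to the Tier3 numbers above.

```cpp
#include <cassert>
#include <cmath>

// Relative change of the patched size with respect to the baseline, in percent
// (negative means the patched build generates less code).
double percent_change(double baseline_bytes, double patched_bytes) {
  return (patched_bytes - baseline_bytes) / baseline_bytes * 100.0;
}
```

This gives about -3.9% for Tier3 with `TypeProfileWidth=2` and about -40.1% with `TypeProfileWidth=8`.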

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3258049226

From djelinski at openjdk.org  Fri Sep  5 13:37:41 2025
From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=)
Date: Fri, 5 Sep 2025 13:37:41 GMT
Subject: RFR: 8366971: C2: Remove unused nop_list from PhaseOutput::init_buffer
Message-ID: <2hBEO9Zpoy2wo_pgTXE9v8KG5u1HNdKp3RgQE-4HYcE=.e86088d1-1e25-49b5-9b3c-c2498ec6ca48@github.com>

The nop list has never been used in the history of OpenJDK. Let's clean it up.

Tested with Mach5 tier 1-5, no related failures.

-------------

Commit messages:
 - Update copyright
 - Remove outdated comment
 - Remove nop list

Changes: https://git.openjdk.org/jdk/pull/27117/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27117&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8366971
  Stats: 83 lines in 11 files changed: 1 ins; 77 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/27117.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27117/head:pull/27117

PR: https://git.openjdk.org/jdk/pull/27117

From epeter at openjdk.org  Fri Sep  5 13:51:21 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 13:51:21 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes
Message-ID: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>

I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
https://github.com/openjdk/jdk/pull/20964
[See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)

This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.

---------------------------------

I have to say: I'm very sorry for this refactoring. I took some decisions in https://github.com/openjdk/jdk/pull/19719 that I'm now partially undoing. I moved too much logic from `SuperWord::output` (now called `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`) to the `VTransform...Node::apply`. https://github.com/openjdk/jdk/pull/19719  was a roughly 1.5k line change, and I took about a 0.3k misstep that I'm now correcting here ;)

I had accidentally made the `VTransformGraph` too close to the `PackSet`, and not close enough to the future vectorized C2 graph, which makes some future changes hard.

My vision:
- VLoop / VLoopAnalyzer look at the scalar loop and prepare it for SuperWord
- SuperWord creates the `PackSet`: some nodes are packed, all others are scalar.
- `SuperWordVTransformBuilder` converts the `PackSet` into the `VTransformGraph`
- The `VTransformGraph` very closely represents the C2 vectorized loop after vectorization
  - It does not need to know which `nodes` it packs; it only needs to know how to generate the new vector nodes
  - That means it is straightforward to compute cost
  - And it also makes optimizations on that graph easier
  - And the `apply` methods are simpler too

----------------------------------

So the main goal was to make the `VTransform...Node::apply` calls simpler again, and to move the logic back to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.

One important step toward making the `VTransformGraph` less of a `PackSet` is to remove reliance on `nodes` for the vector nodes.

What I did:
- Moved a lot of the logic from `VTransformElementWiseVectorNode::apply` to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
  - This will make it easier to optimize and compute cost in future RFEs.
- `VTransformVectorNodePrototype`: packs a lot of the info for `VTransformVectorNode`.
  - pass info about `bt`, `vlen`, `sopc` instead of the `pack` -> allows us to eventually remove the dependency on `nodes`.
- New vector nodes, they are special cases I split away from `VTransformElementWiseVectorNode`:
  - `VTransformReinterpretVectorNode`
  - `VTransformElementWiseLongOpWithCastToIntVectorNode`
  - `VTransformCmpVectorNode`
- Rename `set_all_req_with_vectors` -> `init_all_req_with_vectors` (forgot it in #26991)
- A few smaller changes / refactorings.
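The toy illustration below shows the direction described in the list above. All names are hypothetical (`ToyVectorNodePrototype`, `ToyVTransformVectorNode`, `toy_cost`), and the cost table is made up. The idea is that once a vector node carries only a prototype (element type, vector length, scalar opcode) instead of the pack of scalar nodes, a cost estimate can be computed from the VTransform graph alone. The scalar opcode is modelled as a string purely for readability.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class ToyBasicType { Int, Long, Float };

// Everything a node needs to emit its vector operation, with no reference
// to the original scalar nodes.
struct ToyVectorNodePrototype {
  ToyBasicType bt;
  int vlen;          // number of lanes
  std::string sopc;  // scalar opcode the vector op is derived from
};

struct ToyVTransformVectorNode {
  ToyVectorNodePrototype prototype;
};

// Placeholder cost model with made-up weights: it inspects only the prototypes.
int toy_cost(const std::vector<ToyVTransformVectorNode>& graph) {
  int cost = 0;
  for (const auto& n : graph) {
    cost += (n.prototype.sopc == "DivI") ? 4 : 1;
  }
  return cost;
}
```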

-------------

Commit messages:
 - fix merge
 - manual merge conflict resolution
 - flatten
 - cleanup
 - adr_type refactor
 - hide prototype
 - wip x1
 - wip continued 2
 - wip continued
 - wip cleanup
 - ... and 13 more: https://git.openjdk.org/jdk/compare/0dad3f1a...05ee2800

Changes: https://git.openjdk.org/jdk/pull/27056/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27056&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8366702
  Stats: 327 lines in 4 files changed: 169 ins; 55 del; 103 mod
  Patch: https://git.openjdk.org/jdk/pull/27056.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27056/head:pull/27056

PR: https://git.openjdk.org/jdk/pull/27056

From epeter at openjdk.org  Fri Sep  5 13:51:25 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 13:51:25 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes
In-Reply-To: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
Message-ID: <nnUjRLzWULRJlT6s8JO8uKa2mULnxKUrBawK1cXfBcg=.57127717-cbd1-46e9-8ba2-05b3478e233e@github.com>

On Tue, 2 Sep 2025 15:30:06 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
> https://github.com/openjdk/jdk/pull/20964
> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
> 
> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
> 
> ---------------------------------
> 
> I have to say: I'm very sorry for this refactoring. I took some decisions in https://github.com/openjdk/jdk/pull/19719 that I'm now partially undoing. I moved too much logic from `SuperWord::output` (now called `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`) to the `VTransform...Node::apply`. https://github.com/openjdk/jdk/pull/19719  was a roughly 1.5k line change, and I took about a 0.3k misstep that I'm now correcting here ;)
> 
> I had accidentally made the `VTransformGraph` too close to the `PackSet`, and not close enough to the future vectorized C2 graph, which makes some future changes hard.
> 
> My vision:
> - VLoop / VLoopAnalyzer look at the scalar loop and prepare it for SuperWord
> - SuperWord creates the `PackSet`: some nodes are packed, all others are scalar.
> - `SuperWordVTransformBuilder` converts the `PackSet` into the `VTransformGraph`
> - The `VTransformGraph` very closely represents the C2 vectorized loop after vectorization
>   - It does not need to know which `nodes` it packs; it only needs to know how to generate the new vector nodes
>   - That means it is straightforward to compute cost
>   - And it also makes optimizations on that graph easier
>   - And the `apply` methods are simpler too
> 
> ----------------------------------
> 
> So the main goal was to make the `VTransform...Node::apply` calls simpler again, and to move the logic back to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
> 
> One important step toward making the `VTransformGraph` less of a `PackSet` is to remove reliance on `nodes` for the vector nodes.
> 
> What I did:
> - Moved a lot of the logic from `VTransformElementWiseVectorNode::apply` to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>   - This will make it easier to optimize and compute cost in future RFEs.
> - `VTransformVectorNodePrototype`: packs a lot of the info for `VTransformVectorNode`.
>   - pass info about `bt`, `vlen`, `sopc` instead of the `pack` -> allows us to eventually remove the dependency on `nodes`.
> - New vector nodes, they are special cases I split away from `VTransformElementWiseVectorNode`:
>   - `VTransformReinterpretVectorN...

src/hotspot/share/opto/superwordVTransformBuilder.cpp line 91:

> 89:       init_req_with_scalar(p0,   vtn, MemNode::Control);
> 90:       init_req_with_scalar(p0,   vtn, MemNode::Address);
> 91:       init_req_with_vector(pack, vtn, MemNode::ValueIn);

I'm also adding control to the load/store vectors. That allows us to get the control input without access to the `nodes` in `VTransformLoadVectorNode::apply` and `VTransformStoreVectorNode::apply`:
https://github.com/openjdk/jdk/blob/05ee280048757e6ac095bf7e28708dce258635bf/src/hotspot/share/opto/vtransform.cpp#L877
https://github.com/openjdk/jdk/blob/05ee280048757e6ac095bf7e28708dce258635bf/src/hotspot/share/opto/vtransform.cpp#L906

src/hotspot/share/opto/superwordVTransformBuilder.cpp line 119:

> 117:     } else {
> 118:       init_all_req_with_vectors(pack, vtn);
> 119:     }

I'm mostly flattening the control flow here.
There is also a new else case that just does `init_all_req_with_vectors(pack, vtn);`. This applies to the new nodes that I split away from `ElementWiseVector`:
- `VTransformReinterpretVectorNode`
- `VTransformElementWiseLongOpWithCastToIntVectorNode`
- `VTransformCmpVectorNode`

I also adapted the logic for `CMove` to integrate the special handling from `VTransformElementWiseVectorNode::apply`. The inputs are therefore already permuted at this stage, in the same order that the generated `BlendVector` will have them.

src/hotspot/share/opto/superwordVTransformBuilder.cpp line 196:

> 194:     vtn = new (_vtransform.arena()) VTransformElementWiseVectorNode(_vtransform, p0->req(), prototype, vopc);
> 195:   } else if (VectorNode::is_scalar_op_that_returns_int_but_vector_op_returns_long(sopc)) {
> 196:     vtn = new (_vtransform.arena()) VTransformElementWiseLongOpWithCastToIntVectorNode(_vtransform, prototype);

Cases moved from `VTransformElementWiseVectorNode::apply`.
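The pattern behind moving these cases, where the builder picks a specialized node class once so that each `apply` has no opcode special cases left, can be sketched in standalone C++. All class names, opcodes and predicates below are hypothetical stand-ins, not HotSpot's `VTransform` classes; using a long popcount as the "scalar returns int, vector returns long" example is an assumption for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical scalar opcodes standing in for C2's Op_* constants.
enum class Opcode { AddI, PopCountL, Reinterpret, CmpI };

// Hypothetical predicates standing in for the VectorNode:: queries.
bool returns_int_but_vector_returns_long(Opcode op) { return op == Opcode::PopCountL; }
bool is_reinterpret(Opcode op) { return op == Opcode::Reinterpret; }
bool is_cmp(Opcode op) { return op == Opcode::CmpI; }

struct VectorNodeSketch {
  virtual ~VectorNodeSketch() = default;
  // Each subclass only knows how to emit its own kind of vector node.
  virtual std::string apply() const = 0;
};

struct ElementWise : VectorNodeSketch {
  std::string apply() const override { return "elementwise"; }
};
struct LongOpWithCastToInt : VectorNodeSketch {
  std::string apply() const override { return "long-op+cast-to-int"; }
};
struct Reinterpret : VectorNodeSketch {
  std::string apply() const override { return "reinterpret"; }
};
struct Cmp : VectorNodeSketch {
  // Consumed by the Bool node, so nothing is emitted here.
  std::string apply() const override { return ""; }
};

// The builder decides once; apply() no longer inspects the opcode.
std::unique_ptr<VectorNodeSketch> make_vector_node(Opcode sopc) {
  if (is_cmp(sopc))                              { return std::make_unique<Cmp>(); }
  if (is_reinterpret(sopc))                      { return std::make_unique<Reinterpret>(); }
  if (returns_int_but_vector_returns_long(sopc)) { return std::make_unique<LongOpWithCastToInt>(); }
  return std::make_unique<ElementWise>();
}
```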

src/hotspot/share/opto/vtransform.cpp line 108:

> 106: #ifndef PRODUCT
> 107:   if (_trace._info) {
> 108:     print_schedule();

Verbose tracing is often too much, but it is nice to see which `VTransformNode`s are generated and their order after scheduling.

src/hotspot/share/opto/vtransform.cpp line 163:

> 161:       VTransformMemVectorNode* vtn = vtnodes.at(i)->isa_MemVector();
> 162:       if (vtn == nullptr) { continue; }
> 163:       const VPointer& vp = vtn->vpointer();

We can check for `MemVector` directly, and then we know that they all represent `Mem` nodes and they all have a `vpointer`.
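The `isa_` filter idiom can be illustrated with a hypothetical, self-contained class hierarchy (`NodeSketch`, `MemVectorSketch` and the `offset` field are assumptions standing in for the real nodes and their `VPointer`). Every node that passes the filter is guaranteed to carry the pointer information, so the loop body needs no further checks.

```cpp
#include <cassert>
#include <vector>

struct MemVectorSketch;  // forward declaration for the isa_ query

// Hypothetical stand-in for a VTransform node hierarchy with isa_ queries.
struct NodeSketch {
  virtual ~NodeSketch() = default;
  // Returns nullptr unless this node is a memory vector node.
  virtual MemVectorSketch* isa_MemVector() { return nullptr; }
};

struct MemVectorSketch : NodeSketch {
  int offset;  // hypothetical stand-in for the node's VPointer
  explicit MemVectorSketch(int off) : offset(off) {}
  MemVectorSketch* isa_MemVector() override { return this; }
};

struct ArithVectorSketch : NodeSketch {};

// Visits only memory vector nodes, mirroring the loop quoted above.
int sum_offsets(const std::vector<NodeSketch*>& nodes) {
  int sum = 0;
  for (NodeSketch* n : nodes) {
    MemVectorSketch* vtn = n->isa_MemVector();
    if (vtn == nullptr) { continue; }
    sum += vtn->offset;  // safe: every MemVector carries pointer info
  }
  return sum;
}
```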

src/hotspot/share/opto/vtransform.cpp line 798:

> 796:     // Handled by Bool / VTransformBoolVectorNode, so we do not generate any nodes here.
> 797:     return VTransformApplyResult::make_empty();
> 798:   }

Moved to `VTransformCmpVectorNode` -> has empty apply.

src/hotspot/share/opto/vtransform.cpp line 801:

> 799:     vn = VectorNode::make(vopc, in1, in2, vt); // unary and binary
> 800:   } else {
> 801:     vn = VectorNode::make(vopc, in1, in2, in3, vt); // ternary

Moved to `SuperWordVTransformBuilder::build_inputs_for_vector_vtnodes`, to simplify the logic here.

src/hotspot/share/opto/vtransform.cpp line 880:

> 878:   // first has the correct memory state, determined by VTransformGraph::apply_memops_reordering_with_schedule
> 879:   Node* mem  = first->in(MemNode::Memory);
> 880:   Node* adr  = apply_state.transformed_node(in_req(MemNode::Address));

There is still minimal reliance on `nodes` / `first`: but only for `mem` state. And currently, we cannot remove that yet, because we rely on the memory graph being reordered before vectorization, see `VTransformGraph::apply_memops_reordering_with_schedule`.

In a future RFE, I will refactor scheduling, so that we build the memory graph during apply.
See step 3 in [plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)

src/hotspot/share/opto/vtransform.cpp line 909:

> 907:   // first has the correct memory state, determined by VTransformGraph::apply_memops_reordering_with_schedule
> 908:   Node* mem  = first->in(MemNode::Memory);
> 909:   Node* adr  = apply_state.transformed_node(in_req(MemNode::Address));

There is still minimal reliance on `nodes` / `first`: but only for `mem` state. And currently, we cannot remove that yet, because we rely on the memory graph being reordered before vectorization, see `VTransformGraph::apply_memops_reordering_with_schedule`.

In a future RFE, I will refactor scheduling, so that we build the memory graph during apply.
See step 3 in [plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)

src/hotspot/share/opto/vtransform.cpp line 933:

> 931:   phase->register_new_node(vn, apply_state.vloop().cl());
> 932:   phase->igvn()._worklist.push(vn);
> 933:   VectorNode::trace_new_vector(vn, "AutoVectorization");

Removing the argument here removes yet another dependency on the old scalar graph. We only needed it to use the same control as the old graph, but that is not necessary: using the CountedLoop as control is good enough.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325037712
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325051629
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325054470
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325058939
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325065437
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325066973
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325068600
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325076487
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325077256
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325079910

From epeter at openjdk.org  Fri Sep  5 13:51:26 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Sep 2025 13:51:26 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes
In-Reply-To: <nnUjRLzWULRJlT6s8JO8uKa2mULnxKUrBawK1cXfBcg=.57127717-cbd1-46e9-8ba2-05b3478e233e@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <nnUjRLzWULRJlT6s8JO8uKa2mULnxKUrBawK1cXfBcg=.57127717-cbd1-46e9-8ba2-05b3478e233e@github.com>
Message-ID: <Iejrqoxe4Ds6mMcR2JXsMTP4oSIuFpxICcVrBIAl8iI=.40de9a6d-1a6a-466c-94f3-80fe8afc6361@github.com>

On Fri, 5 Sep 2025 13:13:02 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
>> https://github.com/openjdk/jdk/pull/20964
>> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
>> 
>> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
>> 
>> ---------------------------------
>> 
>> I have to say: I'm very sorry for this refactoring. I took some decisions in https://github.com/openjdk/jdk/pull/19719 that I'm now partially undoing. I moved too much logic from `SuperWord::output` (now called `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`) to the `VTransform...Node::apply`. https://github.com/openjdk/jdk/pull/19719  was a roughly 1.5k line change, and I took about a 0.3k misstep that I'm now correcting here ;)
>> 
>> I had accidentally made the `VTransformGraph` too close to the `PackSet`, and not close enough to the future vectorized C2 Graph. And that makes some future changes hard.
>> 
>> My vision:
>> - VLoop / VLoopAnalyzer look at the scalar loop and prepare it for SuperWord
>> - SuperWord creates the `PackSet`: some nodes are packed, all others are scalar.
>> - `SuperWordVTransformBuilder` converts the `PackSet` into the `VTransformGraph`
>> - The `VTransformGraph` very closely represents the C2 vectorized loop after vectorization
>>   - It does not need to know which `nodes` it packs; it only needs to know how to generate the new vector nodes
>>   - That means it is straight-forward to compute cost
>>   - And it also makes optimizations on that graph easier
>>   - And the `apply` methods are simpler too
>> 
>> ----------------------------------
>> 
>> The main goal was therefore to make the `VTransform...Node::apply` calls simpler again, and to move the logic back to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>> 
>> One important step to making the `VTransformGraph` less of a `PackSet` is to remove reliance on `nodes` for the vector nodes.
>> 
>> What I did:
>> - Moving a lot of the logic in `VTransformElementWiseVectorNode::apply` to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>>   - Will make it easier to optimize and compute cost in future RFEs.
>> - `VTransformVectorNodePrototype`: packs a lot of the info for `VTransformVectorNode`.
>>   - pass info about `bt`, `vlen`, `sopc` instead of the `pack` -> allows us to eventually remove the dependency on `nodes`.
>> - New vector nodes: special cases I split away from ...
>
> src/hotspot/share/opto/vtransform.cpp line 801:
> 
>> 799:     vn = VectorNode::make(vopc, in1, in2, vt); // unary and binary
>> 800:   } else {
>> 801:     vn = VectorNode::make(vopc, in1, in2, in3, vt); // ternary
> 
> Moved to `SuperWordVTransformBuilder::build_inputs_for_vector_vtnodes`, to simplify the logic here.

`is_scalar_op_that_returns_int_but_vector_op_returns_long` moved down to `VTransformElementWiseLongOpWithCastToIntVectorNode`.

`is_reinterpret_opcode` moved down to `VTransformReinterpretVectorNode`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325071420

From chagedorn at openjdk.org  Fri Sep  5 14:56:18 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 5 Sep 2025 14:56:18 GMT
Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v8]
In-Reply-To: <7qgNsgKbFFtzVwuDG2yM_vIczHbzMj6ZUKh_7sz1qow=.d9aeab55-f647-43bc-af2a-48f23d5bbcca@github.com>
References: <RYYMa4Btd93FIC9tCnYMCX7HvuP4D-ODaICLXmjmKic=.5892c08b-92e9-4edd-b37f-cc13e90b469e@github.com>
 <7qgNsgKbFFtzVwuDG2yM_vIczHbzMj6ZUKh_7sz1qow=.d9aeab55-f647-43bc-af2a-48f23d5bbcca@github.com>
Message-ID: <1PU55shmn1ijfzU6eeVUqZ4aAMd1szDOfSN24J1wfKE=.475818f9-7120-42bc-9717-54358fd4e855@github.com>

On Tue, 26 Aug 2025 14:47:00 GMT, Kangcheng Xu <kxu at openjdk.org> wrote:

>> This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. 
>> 
>> A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think.
>> 
>> Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759).
>
> Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits:
> 
>  - Merge branch 'openjdk:master' into counted-loop-refactor
>  - Merge remote-tracking branch 'origin/master' into counted-loop-refactor
>    
>    # Conflicts:
>    #	src/hotspot/share/opto/loopnode.cpp
>    #	src/hotspot/share/opto/loopnode.hpp
>  - Merge branch 'master' into counted-loop-refactor
>    
>    # Conflicts:
>    #	src/hotspot/share/opto/loopnode.cpp
>    #	src/hotspot/share/opto/loopnode.hpp
>    #	src/hotspot/share/opto/loopopts.cpp
>  - Merge remote-tracking branch 'origin/master' into counted-loop-refactor
>  - further refactor is_counted_loop() by extracting functions
>  - WIP: refactor is_counted_loop()
>  - WIP: refactor is_counted_loop()
>  - WIP: review followups
>  - reviewer suggested changes
>  - line break
>  - ... and 14 more: https://git.openjdk.org/jdk/compare/173dedfb...763adeda

Hi @tabjy, sorry for keeping you waiting! I was very busy with other things. 

Thanks for coming back with an improved version! This looks much better already. I had a first high-level look and will dive into it more next week. I've left some thoughts/comments here and there. Generally, I think we could improve the classes so that they are not just pure data holders with public fields, but instead offer methods to interact with the data, which we could then hide to prevent modification. Let me know what you think :-)

src/hotspot/share/opto/loopnode.cpp line 442:

> 440: }
> 441: 
> 442: PhaseIdealLoop::LoopExitTest PhaseIdealLoop::loop_exit_test(const Node* back_control, const IdealLoopTree* loop) {

Just an idea here: Could this also be part of `LoopExitTest` instead? Then a user could do something like:

LoopExitTest loop_exit_test(...);
// i.e. = PhaseIdealLoop::loop_exit_test() but with an `_is_valid` flag. At the end, you
// can also check for the right Cmp opcode and that it's not null, which the current callers of
// loop_exit_test() all do. If either check fails, you set `_is_valid` to false accordingly.
loop_exit_test.build();
if (loop_exit_test.is_not_valid()) {
...
}


The same applies to the other classes like `LoopIVIncr`, `LoopIVStride` etc.
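A minimal sketch of the suggested shape, assuming a toy node type and a hypothetical `LoopExitTestSketch` class (not the PR's code): `build()` fills the fields step by step and leaves the validity flag cleared as soon as a check fails, so callers only need to test `is_not_valid()` instead of repeating the opcode and null checks themselves.

```cpp
#include <cassert>

// Hypothetical opcodes; C2 uses its own Op_* constants.
enum class Op { If, CmpI, CmpL, AddI };

// Toy node with a single input, enough to model If -> Cmp.
struct NodeSketch {
  Op op;
  const NodeSketch* in1;
};

class LoopExitTestSketch {
 public:
  explicit LoopExitTestSketch(const NodeSketch* back_control)
      : _back_control(back_control) {}

  // Computes the fields and records whether the result is usable.
  void build() {
    _is_valid = false;
    if (_back_control == nullptr) { return; }
    _cmp = _back_control->in1;
    if (_cmp == nullptr) { return; }
    if (_cmp->op != Op::CmpI && _cmp->op != Op::CmpL) { return; }
    _is_valid = true;
  }

  bool is_not_valid() const { return !_is_valid; }
  const NodeSketch* cmp() const { return _cmp; }

 private:
  const NodeSketch* _back_control;
  const NodeSketch* _cmp = nullptr;
  bool _is_valid = false;
};
```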

src/hotspot/share/opto/loopnode.cpp line 1881:

> 1879:   PhaseIterGVN* igvn = &_phase->igvn();
> 1880: 
> 1881:   LoopStructure structure{};

I think you can remove the `{}`:
Suggestion:

  LoopStructure structure;

src/hotspot/share/opto/loopnode.cpp line 2258:

> 2256: }
> 2257: 
> 2258: bool CountedLoopConverter::build_loop_structure(CountedLoopConverter::LoopStructure& structure) {

Suggestion:

bool CountedLoopConverter::build_loop_structure(LoopStructure& structure) {

src/hotspot/share/opto/loopnode.cpp line 2259:

> 2257: 
> 2258: bool CountedLoopConverter::build_loop_structure(CountedLoopConverter::LoopStructure& structure) {
> 2259:   PhaseIterGVN* igvn = &_phase->igvn();

Not used anymore and can be removed

src/hotspot/share/opto/loopnode.cpp line 2266:

> 2264:   }
> 2265: 
> 2266:   PhaseIdealLoop::LoopExitTest exit_test = _phase->loop_exit_test(back_control, _loop);

Some thoughts/suggestions here:
- The method is still big and you need a moment to figure out what's going on/what checks we do.
- It looks like you are only initializing fields of `LoopStructure`. Couldn't you move the method to this class?
- You could have a separate field `_is_valid` in `LoopStructure`, then you could remove the `bool` return. I.e. this would then look something like this:

LoopStructure loop_structure;
loop_structure.build();
if (loop_structure.is_not_valid()) {
  return false;
}

You might need to pass in some additional info like `phase` to `LoopStructure` but I think that's okay.
- If you do that, you can assign the fields step by step and, as soon as something is off (i.e. it is no longer a counted loop), set `_is_valid` to false and stop parsing further. This would allow you to split the method up further, which also improves documentation and moves field-specific logic into separate initializer methods:

_back_control = _phase->loop_exit_control(_head, _loop);
if (_back_control == nullptr) {
  _is_valid = false;
  return;
}
_exit_test = exit_test();
if (_exit_test.is_not_valid()) {
  _is_valid = false;
  return;
}
_incr = incr();
_iv_incr = PhaseIdealLoop::loop_iv_incr(_incr, _head, _loop);
...

- Btw, you should use a `_` prefix for the fields.

src/hotspot/share/opto/loopnode.cpp line 2329:

> 2327:   structure.phi = phi;
> 2328: 
> 2329:   structure.sfpt = sfpt;

Are you later really going to use all these fields? I haven't double-checked. Another note here: I think it would be better to hide the fields and provide accessor methods. Otherwise, everyone can update them.
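Hiding the fields behind read-only accessors could look roughly like this. `LoopStructureSketch` and its members are hypothetical; only the builder step can set the fields, while other code can read them but not overwrite them. Treating a missing safepoint as acceptable is an assumption of the sketch.

```cpp
#include <cassert>

struct NodeSketch { int idx; };

class LoopStructureSketch {
 public:
  // The only place that writes the fields.
  bool build(const NodeSketch* phi, const NodeSketch* sfpt) {
    if (phi == nullptr) { return false; }
    _phi = phi;
    _sfpt = sfpt;  // may be nullptr in this sketch
    return true;
  }

  // Read-only access for everyone else.
  const NodeSketch* phi() const { return _phi; }
  const NodeSketch* sfpt() const { return _sfpt; }

 private:
  const NodeSketch* _phi = nullptr;
  const NodeSketch* _sfpt = nullptr;
};
```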

src/hotspot/share/opto/loopnode.hpp line 1327:

> 1325:   static PhiNode* loop_iv_phi(const Node* xphi, const Node* phi_incr, const Node* head);
> 1326: 
> 1327:   bool try_convert_to_counted_loop(Node* head, IdealLoopTree*&loop, const BasicType iv_bt);

Suggestion:

  bool try_convert_to_counted_loop(Node* head, IdealLoopTree*& loop, const BasicType iv_bt);

-------------

PR Review: https://git.openjdk.org/jdk/pull/24458#pullrequestreview-3189442899
PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2325295897
PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2325148526
PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2325150664
PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2325150111
PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2325253187
PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2325256932
PR Review Comment: https://git.openjdk.org/jdk/pull/24458#discussion_r2325264509

From chagedorn at openjdk.org  Fri Sep  5 15:29:16 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 5 Sep 2025 15:29:16 GMT
Subject: RFR: 8366890: C2: Split through phi printing with TraceLoopOpts
 misses line break
In-Reply-To: <wJQ26w4TGzd8upKilJcDTMOGM9L3FF2cK-QwV2EyWqI=.5a058edd-bb7a-4f4b-a081-05d5984f07df@github.com>
References: <w38THu82glx944Ctpeyeg9xwQttEGlEuq5iqacW6vTE=.3c2890ac-f09f-4af7-965c-5a9c754619f0@github.com>
 <wJQ26w4TGzd8upKilJcDTMOGM9L3FF2cK-QwV2EyWqI=.5a058edd-bb7a-4f4b-a081-05d5984f07df@github.com>
Message-ID: <LPlJms8dcisLWVZTD14lkMIckYvU2jrj1YhaZhjK9e4=.ecf495c6-7e9d-4451-bb9d-7de70cf89371@github.com>

On Thu, 4 Sep 2025 13:23:24 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> [JDK-8356176](https://bugs.openjdk.org/browse/JDK-8356176) added new printing code for `TraceLoopOpts` when splitting nodes through a phi but missed a line break. This will result in:
>> 
>> Split 974 CmpI through 1465 Phi in 953 RegionSplit 474 Bool through 1468 Phi in 953 RegionSplit-If
>> 
>> instead of
>> 
>> Split 974 CmpI through 1465 Phi in 953 RegionSplit 474 Bool through 1468 Phi in 953 Region
>> Split-If
>> 
>> This patch fixes this.
>> 
>> Thanks,
>> Christian
>
> Looks good, and trivial.

Thanks @robcasloz, @mhaessig and @eme64 for your reviews! And no worries @mhaessig, was easy to overlook :-)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27092#issuecomment-3258771939

From chagedorn at openjdk.org  Fri Sep  5 15:29:17 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 5 Sep 2025 15:29:17 GMT
Subject: Integrated: 8366890: C2: Split through phi printing with TraceLoopOpts
 misses line break
In-Reply-To: <w38THu82glx944Ctpeyeg9xwQttEGlEuq5iqacW6vTE=.3c2890ac-f09f-4af7-965c-5a9c754619f0@github.com>
References: <w38THu82glx944Ctpeyeg9xwQttEGlEuq5iqacW6vTE=.3c2890ac-f09f-4af7-965c-5a9c754619f0@github.com>
Message-ID: <yXlVQS_fGuPhL_ct6borF3Xf4xO9CzCg2YZfE8qXYp0=.bf8b0887-4e2b-48dc-b7e3-678a51e646d0@github.com>

On Thu, 4 Sep 2025 12:44:43 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

> [JDK-8356176](https://bugs.openjdk.org/browse/JDK-8356176) added new printing code for `TraceLoopOpts` when splitting nodes through a phi but missed a line break. This will result in:
> 
> Split 974 CmpI through 1465 Phi in 953 RegionSplit 474 Bool through 1468 Phi in 953 RegionSplit-If
> 
> instead of
> 
> Split 974 CmpI through 1465 Phi in 953 RegionSplit 474 Bool through 1468 Phi in 953 Region
> Split-If
> 
> This patch fixes this.
> 
> Thanks,
> Christian

This pull request has now been integrated.

Changeset: ceacf6f7
Author:    Christian Hagedorn <chagedorn at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/ceacf6f7852514dc9877cfe284f9550c179d913a
Stats:     2 lines in 1 file changed: 0 ins; 0 del; 2 mod

8366890: C2: Split through phi printing with TraceLoopOpts misses line break

Reviewed-by: rcastanedalo, mhaessig, epeter

-------------

PR: https://git.openjdk.org/jdk/pull/27092

From vlivanov at openjdk.org  Fri Sep  5 16:47:23 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Fri, 5 Sep 2025 16:47:23 GMT
Subject: Integrated: 8358751: C2: Recursive inlining check for compiled lambda
 forms is broken
In-Reply-To: <osdBJeRtiCSx5LDg56oHUFfA5vJVsD3ipLTO9fY7Awg=.4b2d1298-9b16-49d6-ab05-dec04644433e@github.com>
References: <osdBJeRtiCSx5LDg56oHUFfA5vJVsD3ipLTO9fY7Awg=.4b2d1298-9b16-49d6-ab05-dec04644433e@github.com>
Message-ID: <H66XKR0QL50gtD0l2YqaSHl79Qi_OQWLCQT2F_UlqQY=.a691a6b5-589a-4705-850e-faac9cb6b76d@github.com>

On Fri, 22 Aug 2025 01:24:52 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

> Recursive inlining checks are relaxed for compiled LambdaForms. Since LambdaForms are heavily reused, the check is performed on `MethodHandle` receivers instead.
> 
> Unfortunately, the current implementation is broken: JVMState doesn't guarantee the presence of receivers for caller frames.
> An attempt to fetch a pruned receiver reports unrelated info or, in the worst case, ends up as an out-of-bounds access into the node's input array and crashes the JVM.
>
> The proposed fix captures receiver information as part of inlining and preserves it on `JVMState` for every compiled LambdaForm frame, so it can be reliably recovered during subsequent inlining attempts.
> 
> Testing: hs-tier1 - hs-tier8
> 
> (Special thanks to @mroth23 who prepared a reproducer of the bug.)
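The idea of recording the receiver on each inlined frame, so that a later recursion check never has to read a possibly pruned input, can be modeled as follows. The frame layout, the field names and the depth-counting rule are assumptions for illustration, not the actual `JVMState` API: a frame counts toward recursion only if both the method and the recorded receiver match.

```cpp
#include <cassert>

// Hypothetical stand-in for a JVMState frame. The receiver id is stored on
// the frame when the LambdaForm is inlined, instead of being re-read later
// from a call node's input array where it may have been pruned.
struct FrameSketch {
  const FrameSketch* caller;
  int method_id;
  int receiver_id;
};

// Counts how often (method, receiver) already appears on the inlining stack.
int recursion_depth(const FrameSketch* jvms, int method_id, int receiver_id) {
  int depth = 0;
  for (const FrameSketch* f = jvms; f != nullptr; f = f->caller) {
    if (f->method_id == method_id && f->receiver_id == receiver_id) {
      depth++;
    }
  }
  return depth;
}
```

The same LambdaForm reached through a different receiver does not count as recursion here, which is the relaxation the description refers to.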

This pull request has now been integrated.

Changeset: 9cca4f7c
Author:    Vladimir Ivanov <vlivanov at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/9cca4f7c760bea9bf79f7c03f37a70449acad51e
Stats:     76 lines in 4 files changed: 42 ins; 1 del; 33 mod

8358751: C2: Recursive inlining check for compiled lambda forms is broken

Reviewed-by: dlong, roland

-------------

PR: https://git.openjdk.org/jdk/pull/26891

From mhaessig at openjdk.org  Fri Sep  5 16:52:20 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Fri, 5 Sep 2025 16:52:20 GMT
Subject: RFR: 8366878: Improve flags of
 compiler/loopopts/superword/TestAlignVectorFuzzer.java
Message-ID: <jqOL72bER1G6aunV15U5OE4iUltIiv9dfhdJrvkyy3k=.7697a330-3de2-4591-96e3-70dc43e7139a@github.com>

The test definitions of `TestAlignVectorFuzzer.java` all contain `printcompilation` directives. These are redundant and slow down a test that already often times out. @eme64 also suggested adding a `compileonly` directive to one of the four tests.

Testing:
 - [ ] Github Actions
 - [ ] tier1 and stress testing (features `TestAlignVectorFuzzer.java`)

-------------

Commit messages:
 - Fix flags

Changes: https://git.openjdk.org/jdk/pull/27122/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27122&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8366878
  Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/27122.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27122/head:pull/27122

PR: https://git.openjdk.org/jdk/pull/27122

From jbhateja at openjdk.org  Fri Sep  5 17:14:27 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 5 Sep 2025 17:14:27 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v2]
In-Reply-To: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
Message-ID: <gozlEWWUYOptCN9oFGHXDOZToKwFAddZfMM5yPVYhlk=.d60b5191-2a6f-4015-90b7-0e1b7e8d3251@github.com>

> This patch optimizes PopCount value transforms using KnownBits information.
> Following are the results of the micro-benchmark included with the patch:
> 
> 
> 
> System: 13th Gen Intel(R) Core(TM) i3-1315U
> 
> Baseline:
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s
> PopCountValueTransform.StockKernelInt         thrpt    2  409295.875          ops/s
> PopCountValueTransform.StockKernelLong        thrpt    2  368025.608          ops/s
> 
> Withopt:
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s
> PopCountValueTransform.StockKernelInt         thrpt    2  418649.269          ops/s
> PopCountValueTransform.StockKernelLong        thrpt    2  381330.221          ops/s
> 
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin
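The general technique of bounding a popcount from known bits can be sketched as follows. This is the textbook approach, assumed here rather than taken from the patch, and `KnownBits32`/`popcount_range` are hypothetical names: the count is at least the number of bits known to be one and at most the number of bits not known to be zero. When both bounds coincide, the result is a constant.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Known-bits facts about a 32-bit value: bits set in `zeros` are known to be
// 0, bits set in `ones` are known to be 1 (the two masks must not overlap).
struct KnownBits32 {
  uint32_t zeros;
  uint32_t ones;
};

struct Range { int lo; int hi; };

Range popcount_range(KnownBits32 kb) {
  int lo = static_cast<int>(std::bitset<32>(kb.ones).count());    // known ones
  int hi = static_cast<int>(std::bitset<32>(~kb.zeros).count());  // not known zero
  return {lo, hi};
}
```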

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  New IR test addition and review resolutions

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27075/files
  - new: https://git.openjdk.org/jdk/pull/27075/files/c83be331..a68fbc08

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=00-01

  Stats: 170 lines in 4 files changed: 155 ins; 4 del; 11 mod
  Patch: https://git.openjdk.org/jdk/pull/27075.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27075/head:pull/27075

PR: https://git.openjdk.org/jdk/pull/27075

From mhaessig at openjdk.org  Fri Sep  5 17:15:47 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Fri, 5 Sep 2025 17:15:47 GMT
Subject: RFR: 8366569: Disable CompileTaskTimeout for known long-running test
 cases
Message-ID: <rCQCdTqE2N4pT5FMErsG2TaTihvD2M5zrTt7AxWVCU0=.3fdb499a-9fb0-4f1b-b8c3-81bab4e66f94@github.com>

This PR deliberately disables compile task timeouts using `-XX:CompileTaskTimeout=0` on some tests that are known to have long compilation times due to their construction. Disabling the timeouts in the test description makes it possible to run all other tests in the suite in a CI with a lower timeout, and thus with a higher chance of discovering degenerate compilations.

In a perfect world, timeout values passed from the commandline would be increased by some factor to also have timeouts on these tests when requested. However, I am working with the tools I know and have...

Testing:
 - [ ] Github Actions
 - [x] tier1,tier2,tier3 and stress testing with fastdebug on Oracle supported platforms.

-------------

Commit messages:
 - Disable timeouts

Changes: https://git.openjdk.org/jdk/pull/27123/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27123&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8366569
  Stats: 21 lines in 5 files changed: 16 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/27123.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27123/head:pull/27123

PR: https://git.openjdk.org/jdk/pull/27123

From jbhateja at openjdk.org  Fri Sep  5 17:17:52 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 5 Sep 2025 17:17:52 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v3]
In-Reply-To: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
Message-ID: <eXrwH7c0C-O1WG8EdYsncgayPjViNP0yIyPL6PNOIk0=.697e0e55-56fa-41ff-aaf4-6a3baf7e65d2@github.com>

> This patch optimizes PopCount value transforms using KnownBits information.
> Following are the results of the micro-benchmark included with the patch:
> 
> 
> 
> System: 13th Gen Intel(R) Core(TM) i3-1315U
> 
> Baseline:
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s
> PopCountValueTransform.StockKernelInt         thrpt    2  409295.875          ops/s
> PopCountValueTransform.StockKernelLong        thrpt    2  368025.608          ops/s
> 
> Withopt:
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s
> PopCountValueTransform.StockKernelInt         thrpt    2  418649.269          ops/s
> PopCountValueTransform.StockKernelLong        thrpt    2  381330.221          ops/s
> 
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Update countbitsnode.cpp

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27075/files
  - new: https://git.openjdk.org/jdk/pull/27075/files/a68fbc08..52ae6bc8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=01-02

  Stats: 9 lines in 1 file changed: 0 ins; 1 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/27075.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27075/head:pull/27075

PR: https://git.openjdk.org/jdk/pull/27075

From cslucas at openjdk.org  Fri Sep  5 17:19:11 2025
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Fri, 5 Sep 2025 17:19:11 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT
In-Reply-To: <ATM_n_8iXzz0fMLRguR9NY3zzlaUJJa3LlcI6eNr_Jw=.238c7672-90a8-470e-962d-40af844d8962@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
 <ATM_n_8iXzz0fMLRguR9NY3zzlaUJJa3LlcI6eNr_Jw=.238c7672-90a8-470e-962d-40af844d8962@github.com>
Message-ID: <qC_SHeldUkzEbgRBMVczPsJhWUgWfL9puQZQzSXjQjs=.b9d45ba5-7a52-4fc0-9864-ee54739ac62a@github.com>

On Fri, 5 Sep 2025 09:39:54 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Please review this patch, which fixes an issue that may occur when reducing allocation merges.
>> 
>> As the assert message describes, the problem is that a `Phi` considered reducible during one invocation of `adjust_scalar_replaceable_state` later turns out to be non-reducible. This situation can happen if a subsequent invocation of the same method causes all inputs to the phi to be NSR; in that case there is no point in reducing the Phi. It can also happen during the propagation of NSR state done by `find_scalar_replaceable_allocs`. 
>> 
>> The change in `revisit_reducible_phi_status` is just a clean-up.
>> The real fix is in `find_scalar_replaceable_allocs`.
>> 
>> Tested on Linux x64/Aarch64 release/fastdebug with JTREG tier1-3.
>
> test/hotspot/jtreg/compiler/escapeAnalysis/TestReduceAllocationNotReducibleAnymore.java line 58:
> 
>> 56:             }
>> 57:         }
>> 58:     }
> 
> Could we make the catch exception matching more precise? I'd just like to avoid a case where we miscompile and throw the wrong exception and that gets caught silently.

I'll do a best effort attempt to minimize this test again.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27063#discussion_r2325647857

From cslucas at openjdk.org  Fri Sep  5 17:24:15 2025
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Fri, 5 Sep 2025 17:24:15 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT
In-Reply-To: <HZn3JeENl-T5K7vUd3hATBtHU7OuYga4gICJ-DC3z5c=.c3d954d1-eef8-4054-bc08-47d140f3f681@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
 <HZn3JeENl-T5K7vUd3hATBtHU7OuYga4gICJ-DC3z5c=.c3d954d1-eef8-4054-bc08-47d140f3f681@github.com>
Message-ID: <wj_1hdt5WcbD5unkehMsdZD9EnEAolkuD9M7HAMIj3g=.57a67b52-b032-4523-ad10-9d14474c3cc2@github.com>

On Fri, 5 Sep 2025 09:44:22 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Please review this patch, which fixes an issue that may occur when reducing allocation merges.
>> 
>> As the assert message describes, the problem is that a `Phi` considered reducible during one invocation of `adjust_scalar_replaceable_state` later turns out to be non-reducible. This situation can happen if a subsequent invocation of the same method causes all inputs to the phi to be NSR; in that case there is no point in reducing the Phi. It can also happen during the propagation of NSR state done by `find_scalar_replaceable_allocs`. 
>> 
>> The change in `revisit_reducible_phi_status` is just a clean-up.
>> The real fix is in `find_scalar_replaceable_allocs`.
>> 
>> Tested on Linux x64/Aarch64 release/fastdebug with JTREG tier1-3.
>
> src/hotspot/share/opto/escape.cpp line 3078:
> 
>> 3076:     Node* phi = reducible_merges.at(i);
>> 3077: 
>> 3078:     if (!can_reduce_phi(phi->as_Phi())) {
> 
> You say this is a pure cleanup? There are some slight differences in the code though, right?
> This method call checks `PhaseMacroExpand::can_eliminate_allocation`, and has a side effect with `ptn->set_scalar_replaceable(false)`.
> 
> Just pointing it out, not an EA expert.

That shouldn't make a difference at this point in the analysis. I mentioned this is just a clean up because the verification that needs to be done at this point is essentially what is already performed in `can_reduce_phi` and this change doesn't have anything to do with the original issue.
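A much-simplified model of that re-verification step, assuming only the rule stated in the PR description (reducing a merge is pointless once all of its inputs are NSR). The real `can_reduce_phi` checks considerably more, and every name below is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model: sr[a] says whether allocation `a` is still
// scalar-replaceable; a merge Phi lists the allocations flowing into it.
struct PhiSketch {
  std::vector<int> inputs;  // indices into the sr table
};

// Simplified stand-in for can_reduce_phi: reducing only pays off while at
// least one input allocation is still scalar-replaceable.
bool can_reduce_phi_sketch(const PhiSketch& phi, const std::vector<bool>& sr) {
  return std::any_of(phi.inputs.begin(), phi.inputs.end(),
                     [&](int a) { return static_cast<bool>(sr[a]); });
}

// After NSR propagation, keep only the merges that are still reducible,
// instead of asserting later that every earlier candidate still is.
std::vector<PhiSketch> revisit_reducible(const std::vector<PhiSketch>& phis,
                                         const std::vector<bool>& sr) {
  std::vector<PhiSketch> kept;
  for (const PhiSketch& p : phis) {
    if (can_reduce_phi_sketch(p, sr)) { kept.push_back(p); }
  }
  return kept;
}
```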

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27063#discussion_r2325657330

From mhaessig at openjdk.org  Fri Sep  5 17:51:14 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Fri, 5 Sep 2025 17:51:14 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes
In-Reply-To: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
Message-ID: <TxVrg4YQuQqoGvnUpOxnQheehz3zaihOJONpmL2MzZU=.8af6afb9-381f-4633-8fa9-73242552d170@github.com>

On Tue, 2 Sep 2025 15:30:06 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
> https://github.com/openjdk/jdk/pull/20964
> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
> 
> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
> 
> ---------------------------------
> 
> I have to say: I'm very sorry for this refactoring. I took some decisions in https://github.com/openjdk/jdk/pull/19719 that I'm now partially undoing. I moved too much logic from `SuperWord::output` (now called `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`) to the `VTransform...Node::apply`. https://github.com/openjdk/jdk/pull/19719  was a roughly 1.5k line change, and I took about a 0.3k misstep that I'm now correcting here ;)
> 
> I had accidentally made the `VTransformGraph` too close to the `PackSet`, and not close enough to the future vectorized C2 graph, which makes some future changes hard.
> 
> My vision:
> - VLoop / VLoopAnalyzer look at the scalar loop and prepare it for SuperWord
> - SuperWord creates the `PackSet`: some nodes are packed, all others are scalar.
> - `SuperWordVTransformBuilder` converts the `PackSet` into the `VTransformGraph`
> - The `VTransformGraph` very closely represents the C2 vectorized loop after vectorization
>   - It does not need to know which `nodes` it packs, it rather just needs to know how to generate the new vector nodes
>   - That means it is straight-forward to compute cost
>   - And it also makes optimizations on that graph easier
>   - And the `apply` methods are simpler too
> 
> ----------------------------------
> 
> So the main goal was to make the `VTransform...Node::apply` calls simpler again, and to move the logic back to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
> 
> One important step to making the `VTransformGraph` less of a `PackSet` is to remove reliance on `nodes` for the vector nodes.
> 
> What I did:
> - Moving a lot of the logic in `VTransformElementWiseVectorNode::apply` to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>   - Will make it easier to optimize and compute cost in future RFEs.
> - `VTransformVectorNodePrototype`: packs a lot of the info for `VTransformVectorNode`.
>   - pass info about `bt`, `vlen`, `sopc` instead of the `pack` -> allows us to eventually remove the dependency on `nodes`.
> - New vector nodes, they are special cases I split away from `VTransformElementWiseVectorNode`:
>   - `VTransformReinterpretVectorN...
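To make the `bt`/`vlen`/`sopc` idea concrete, here is a minimal sketch assuming a pack is just a list of isomorphic scalar nodes. `ScalarNode`, `VectorPrototype`, and `make_from_pack` are hypothetical stand-ins, not the real `VTransformVectorNodePrototype` interface. Once the prototype is extracted, a vector node could in principle be generated from those three values alone, without holding on to the packed `nodes`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins: a scalar node only carries its opcode and basic type.
enum class BasicType { Int, Long, Float, Double };

struct ScalarNode {
  int opcode;
  BasicType bt;
};

// Hypothetical prototype: what a vector node needs in order to be generated,
// without keeping a reference to the packed scalar nodes.
struct VectorPrototype {
  int sopc;            // scalar opcode shared by all packed nodes
  BasicType bt;        // element basic type
  std::uint32_t vlen;  // number of lanes

  static VectorPrototype make_from_pack(const std::vector<ScalarNode>& pack) {
    assert(!pack.empty());
    const ScalarNode& p0 = pack.front();
    for (const ScalarNode& n : pack) {
      // A pack is isomorphic: same operation, same element type.
      assert(n.opcode == p0.opcode && n.bt == p0.bt);
    }
    return VectorPrototype{p0.opcode, p0.bt, static_cast<std::uint32_t>(pack.size())};
  }
};
```

In this toy model `vlen` is exactly the pack size; whether the two terms are always interchangeable in the SuperWord code is a separate question for the author.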

Thank you for your continued effort on cost modelling, @eme64! I have some minor style comments and questions, but this mostly looks good to me. 

Regarding style, I find the alignment of local variables to be a bit distracting, especially when the aligned "things" are different operations and things are sometimes aligned and sometimes not. However, I do not know the style of the rest of the SuperWord code.

src/hotspot/share/opto/superwordVTransformBuilder.cpp line 115:

> 113:       VTransformBoolVectorNode* vtn_mask_cmp = vtn->in_req(3)->isa_BoolVector();
> 114:       if (vtn_mask_cmp->test()._is_negated) {
> 115:         vtn->swap_req(1, 2); // swap if test was negated.

Suggestion:

      // Inputs must be permuted from (mask, blend1, blend2) -> (blend1, blend2, mask)
      // Or, if the test was negated: (blend1, blend2, mask) -> (blend2, blend1, mask)
      vtn->swap_req(1, 3); // Now, the reqs are negated.
      VTransformBoolVectorNode* vtn_mask_cmp = vtn->in_req(3)->isa_BoolVector();
      if (!vtn_mask_cmp->test()._is_negated) {
        vtn->swap_req(1, 2); // Swap if test was not negated.

This would save a swap, but I am unsure whether it is also more readable.
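To check the suggestion, the sketch below models the node's inputs as an array (a hypothetical stand-in for the `VTransformNode` inputs; `Reqs`, `swap_req`, and `permute_blend_inputs` are illustrative names) and applies the suggested sequence. It reaches both target orders stated in the suggestion's comments with at most two swaps.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <utility>

// Hypothetical model of a node's inputs: index 0 stands in for control,
// indices 1..3 hold the data inputs, mirroring the swap_req(i, j) calls.
using Reqs = std::array<std::string, 4>;

void swap_req(Reqs& reqs, int i, int j) {
  std::swap(reqs[i], reqs[j]);
}

// The suggested sequence: one unconditional swap, then a conditional one.
Reqs permute_blend_inputs(Reqs reqs, bool test_is_negated) {
  swap_req(reqs, 1, 3);      // (mask, b1, b2) -> (b2, b1, mask): the negated order
  if (!test_is_negated) {
    swap_req(reqs, 1, 2);    // (b2, b1, mask) -> (b1, b2, mask)
  }
  return reqs;
}
```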

src/hotspot/share/opto/superwordVTransformBuilder.cpp line 154:

> 152:   Node* p0 = pack->at(0);
> 153:   const VTransformVectorNodePrototype prototype = VTransformVectorNodePrototype::make_from_pack(pack, _vloop_analyzer);
> 154:   const int  sopc = prototype.scalar_opcode();

Suggestion:

  const int sopc = prototype.scalar_opcode();

Nit: whitespace
Or were you trying to align with the line below? Personally, I find this a bit too much, but up to you.

src/hotspot/share/opto/superwordVTransformBuilder.cpp line 155:

> 153:   const VTransformVectorNodePrototype prototype = VTransformVectorNodePrototype::make_from_pack(pack, _vloop_analyzer);
> 154:   const int  sopc = prototype.scalar_opcode();
> 155:   const uint vlen = prototype.vector_length();

As someone who is not familiar with the SuperWord code: are "pack size" and "vector length" often used interchangeably? If not, then I would keep the name.

-------------

Changes requested by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27056#pullrequestreview-3190237084
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325655080
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325656896
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325662867

From mhaessig at openjdk.org  Fri Sep  5 17:51:14 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Fri, 5 Sep 2025 17:51:14 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes
In-Reply-To: <nnUjRLzWULRJlT6s8JO8uKa2mULnxKUrBawK1cXfBcg=.57127717-cbd1-46e9-8ba2-05b3478e233e@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <nnUjRLzWULRJlT6s8JO8uKa2mULnxKUrBawK1cXfBcg=.57127717-cbd1-46e9-8ba2-05b3478e233e@github.com>
Message-ID: <-ntIikGlF88WGhDkEeTA2rU7xWHBuSRlaLUVwDvDzUY=.b482c2ed-ddc5-46e8-b7ad-2ee8e9f8fd67@github.com>

On Fri, 5 Sep 2025 13:18:07 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> we can just use the CountedLoop as control, which is good enough

For my understanding: this is because we can only vectorize if there are no other control dependencies in the loop?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325684207

From dlong at openjdk.org  Fri Sep  5 19:58:09 2025
From: dlong at openjdk.org (Dean Long)
Date: Fri, 5 Sep 2025 19:58:09 GMT
Subject: RFR: 8366569: Disable CompileTaskTimeout for known long-running
 test cases
In-Reply-To: <rCQCdTqE2N4pT5FMErsG2TaTihvD2M5zrTt7AxWVCU0=.3fdb499a-9fb0-4f1b-b8c3-81bab4e66f94@github.com>
References: <rCQCdTqE2N4pT5FMErsG2TaTihvD2M5zrTt7AxWVCU0=.3fdb499a-9fb0-4f1b-b8c3-81bab4e66f94@github.com>
Message-ID: <YpZOsVy38nVN6HkEhWECN7Ncq31uvYVNE_-iutFXMQc=.861aa11b-9228-4823-94a2-44cae9b8d88f@github.com>

On Fri, 5 Sep 2025 16:59:18 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

> This PR deliberately disables compile task timeouts using `-XX:CompileTaskTimeout=0` on some tests that are known to have long compilation times due to their construction. Disabling the timeouts in the task description enables running all other tests in the test suite in a CI with a lower timeout and thus a higher chance of discovering degenerate compilations.
> 
> In a perfect world, timeout values passed from the command line would be increased by some factor to also have timeouts on these tests when requested. However, I am working with the tools I know and have...
> 
> Testing:
>  - [ ] GitHub Actions
>  - [x] tier1,tier2,tier3 and stress testing with fastdebug on Oracle supported platforms.

Looks good, and trivial.

-------------

Marked as reviewed by dlong (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27123#pullrequestreview-3190614580

From mhaessig at openjdk.org  Fri Sep  5 20:01:16 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Fri, 5 Sep 2025 20:01:16 GMT
Subject: RFR: 8366569: Disable CompileTaskTimeout for known long-running
 test cases
In-Reply-To: <YpZOsVy38nVN6HkEhWECN7Ncq31uvYVNE_-iutFXMQc=.861aa11b-9228-4823-94a2-44cae9b8d88f@github.com>
References: <rCQCdTqE2N4pT5FMErsG2TaTihvD2M5zrTt7AxWVCU0=.3fdb499a-9fb0-4f1b-b8c3-81bab4e66f94@github.com>
 <YpZOsVy38nVN6HkEhWECN7Ncq31uvYVNE_-iutFXMQc=.861aa11b-9228-4823-94a2-44cae9b8d88f@github.com>
Message-ID: <L_26z-Kp2ZiXR6nh0QFAdkxwm-SZN6gaBh1luuRctfk=.df565f9f-6a88-49fe-9780-3894302d4e1a@github.com>

On Fri, 5 Sep 2025 19:55:20 GMT, Dean Long <dlong at openjdk.org> wrote:

>> This PR deliberately disables compile task timeouts using `-XX:CompileTaskTimeout=0` on some tests that are known to have long compilation times due to their construction. Disabling the timeouts in the task description enables running all other tests in the test suite in a CI with a lower timeout and thus a higher chance of discovering degenerate compilations.
>> 
>> In a perfect world, timeout values passed from the command line would be increased by some factor to also have timeouts on these tests when requested. However, I am working with the tools I know and have...
>> 
>> Testing:
>>  - [x] GitHub Actions
>>  - [x] tier1,tier2,tier3 and stress testing with fastdebug on Oracle supported platforms.
>
> Looks good, and trivial.

Thank you for reviewing, @dean-long!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27123#issuecomment-3259592016

From mhaessig at openjdk.org  Fri Sep  5 20:01:17 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Fri, 5 Sep 2025 20:01:17 GMT
Subject: Integrated: 8366569: Disable CompileTaskTimeout for known long-running
 test cases
In-Reply-To: <rCQCdTqE2N4pT5FMErsG2TaTihvD2M5zrTt7AxWVCU0=.3fdb499a-9fb0-4f1b-b8c3-81bab4e66f94@github.com>
References: <rCQCdTqE2N4pT5FMErsG2TaTihvD2M5zrTt7AxWVCU0=.3fdb499a-9fb0-4f1b-b8c3-81bab4e66f94@github.com>
Message-ID: <C_yP19DVjSwxwP5H6LHCp1ibVE6ZoE7TgFfPQVJoc_I=.12010454-3df9-4921-b9e8-421f46543c1d@github.com>

On Fri, 5 Sep 2025 16:59:18 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

> This PR deliberately disables compile task timeouts using `-XX:CompileTaskTimeout=0` on some tests that are known to have long compilation times due to their construction. Disabling the timeouts in the task description enables running all other tests in the test suite in a CI with a lower timeout and thus a higher chance of discovering degenerate compilations.
> 
> In a perfect world, timeout values passed from the command line would be increased by some factor to also have timeouts on these tests when requested. However, I am working with the tools I know and have...
> 
> Testing:
>  - [x] GitHub Actions
>  - [x] tier1,tier2,tier3 and stress testing with fastdebug on Oracle supported platforms.

This pull request has now been integrated.

Changeset: 4ab2b5bd
Author:    Manuel Hässig <mhaessig at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/4ab2b5bdb4e6d40a747d4088a25f7c6351131759
Stats:     21 lines in 5 files changed: 16 ins; 0 del; 5 mod

8366569: Disable CompileTaskTimeout for known long-running test cases

Reviewed-by: dlong

-------------

PR: https://git.openjdk.org/jdk/pull/27123

From sviswanathan at openjdk.org  Fri Sep  5 22:08:14 2025
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 5 Sep 2025 22:08:14 GMT
Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2 [v2]
In-Reply-To: <sbf00znMv9WzwFzEeUmfpmCPJ02Zdp4RK6vIacVtYH8=.27735940-f027-487b-9ca6-9cfe9944da23@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
 <sbf00znMv9WzwFzEeUmfpmCPJ02Zdp4RK6vIacVtYH8=.27735940-f027-487b-9ca6-9cfe9944da23@github.com>
Message-ID: <RgCxtL-YvvIRVHHMEIBPkeWqCQoCtO9qfpqu6E68x68=.764e737f-d163-4195-bb92-caeac831add9@github.com>

On Thu, 4 Sep 2025 20:11:28 GMT, Srinivas Vamsi Parasa <sparasa at openjdk.org> wrote:

>> This change extends Extended EVEX (EEVEX) to REX2/REX demotion for Intel APX NDD instructions to handle commutative operations when the destination register and the second source register (src2) are the same.
>> 
>> Currently, EEVEX to REX2/REX demotion is only enabled when the first source (src1) and the destination are the same. This enhancement allows additional cases of valid demotion for commutative instructions (add, imul, and, or, xor).
>> 
>> For example:
>> `eaddl r18, r25, r18` can be encoded as `addl r18, r25` using APX REX2 encoding
>> `eaddl r2, r7, r2` can be encoded as `addl r2, r7` using non-APX legacy encoding
>
> Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
> 
>  - nomenclature change
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into cdemotion
>  - remove trailing whitespaces
>  - remove unused instructions
>  - 8354348: Enable Extended EVEX to REX2/REX demotion for commutative operations with same dst and src2
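The commutative demotion rule described above can be sketched as follows, under stated assumptions: registers are plain encodings, and demotion is assumed to require `dst == src1` with flags allowed to be written (a legacy `add` always updates flags, so an NF variant is assumed to stay EEVEX-encoded). `NddOp`, `demotable`, and `try_demote` are hypothetical names, not the assembler's API.

```cpp
#include <cassert>
#include <utility>

// Hypothetical model of a 3-operand APX NDD instruction: dst = src1 OP src2.
struct NddOp {
  int dst;
  int src1;
  int src2;
  bool no_flags;     // NF variant: must not update flags
  bool commutative;  // add, imul, and, or, xor
};

// Assumption: a legacy/REX2 two-operand form is only possible when the
// destination is also the first source and flags may be written.
bool demotable(const NddOp& op) {
  return !op.no_flags && op.dst == op.src1;
}

// Try to make the instruction demotable, swapping the sources of a
// commutative op when dst matches src2 instead of src1.
bool try_demote(NddOp& op) {
  if (demotable(op)) {
    return true;
  }
  if (op.commutative && !op.no_flags && op.dst == op.src2) {
    std::swap(op.src1, op.src2);  // a OP b == b OP a for commutative OP
    return true;
  }
  return false;
}
```

With this model, `eaddl r18, r25, r18` becomes `dst == src1 == r18, src2 == r25`, which corresponds to the two-operand `addl r18, r25` form from the description.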

src/hotspot/cpu/x86/assembler_x86.cpp line 13125:

> 13123:   emit_arith(op1, op2, src1, src2, second_operand_demotable);
> 13124: }
> 13125: 

This could be written something like the following:

void Assembler::emit_eevex_prefix_or_demote_arith_ndd(Register dst, Register src1, Register src2, VexSimdPrefix pre, VexOpcode opc,
                                                      InstructionAttr *attributes, int op1, int op2, bool no_flags, bool use_prefixq, bool is_commutative) {
  bool demotable = is_demotable(no_flags, dst->encoding(), src1->encoding());
  if (!demotable && is_commutative) {
    if (is_demotable(no_flags, dst->encoding(), src2->encoding())) {
      demotable = true;
      // swap src1 and src2
      Register tmp = src1;
      src1 = src2;
      src2 = tmp;
    }
  }
  (void)emit_eevex_prefix_or_demote_ndd(src1->encoding(), dst->encoding(), src2->encoding(), pre, opc, attributes, no_flags, use_prefixq);
  emit_arith(op1, op2, src1, src2);
}


Then we don't need the extra argument in emit_arith() and emit_eevex_prefix_or_demote_ndd().

src/hotspot/cpu/x86/assembler_x86.hpp line 812:

> 810:   void emit_eevex_prefix_or_demote_arith_ndd(Register dst, Register src1, Register src2, VexSimdPrefix pre, VexOpcode opc,
> 811:                                       InstructionAttr *attributes, int op1, int op2, bool no_flags = false, bool use_prefixq = false, bool is_commutative = false);
> 812: 

The attributes parameter could be replaced by an int size, with the attributes computed inside emit_eevex_prefix_or_demote_arith_ndd(). Then there is also no need for use_prefixq as a separate parameter, since (size == EVEX_64bit) implies use_prefixq.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26997#discussion_r2326128354
PR Review Comment: https://git.openjdk.org/jdk/pull/26997#discussion_r2325781623

From missa at openjdk.org  Sat Sep  6 00:31:36 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Sat, 6 Sep 2025 00:31:36 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v6]
In-Reply-To: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
Message-ID: <aTEg_T0IcgmwJyWbKPcOekrWBepNB9Uvaqs0NNScbvo=.b542d137-157b-439e-a18c-d0e97feab330@github.com>

> Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
> 
> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) instruction is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped into the integer range (NaN -> 0; -Infinity and values <= Integer.MIN_VALUE -> Integer.MIN_VALUE; +Infinity and values >= Integer.MAX_VALUE -> Integer.MAX_VALUE) with the help of a few temporary registers that store intermediate results.
> 
> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
> 
> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
> 9. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java`
> 10. `jtreg:test/hotspot/jtreg/com...
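For reference, the value mapping listed in the description for a scalar double-to-int conversion (the same result Java's `(int)` cast produces, per JLS §5.1.3) can be sketched in portable C++. This is a reference model of the expected results only, not the instruction sequence HotSpot emits; `saturating_d2i` is an illustrative name.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Saturating double -> int conversion: the result that either the extra
// fix-up code or a single saturating instruction has to produce.
std::int32_t saturating_d2i(double x) {
  const std::int32_t lo = std::numeric_limits<std::int32_t>::min();
  const std::int32_t hi = std::numeric_limits<std::int32_t>::max();
  if (std::isnan(x)) {
    return 0;                          // NaN -> 0
  }
  if (x <= static_cast<double>(lo)) {
    return lo;                         // -Infinity and values <= MIN_VALUE
  }
  if (x >= static_cast<double>(hi)) {
    return hi;                         // +Infinity and values >= MAX_VALUE
  }
  return static_cast<std::int32_t>(x); // in range: truncate toward zero
}
```

The explicit range checks before the final cast matter in C++ as well: converting an out-of-range double to an integer type with a plain cast is undefined behavior.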

Mohamed Issa has updated the pull request incrementally with two additional commits since the last revision:

 - Add new IR rules to vector float to integer conversion tests
 - Fix match rule for AVX 10.2 double to long scalar conversion

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26919/files
  - new: https://git.openjdk.org/jdk/pull/26919/files/e0c84f69..6407cc48

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=04-05

  Stats: 81 lines in 3 files changed: 64 ins; 0 del; 17 mod
  Patch: https://git.openjdk.org/jdk/pull/26919.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26919/head:pull/26919

PR: https://git.openjdk.org/jdk/pull/26919

From missa at openjdk.org  Sat Sep  6 00:31:36 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Sat, 6 Sep 2025 00:31:36 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v2]
In-Reply-To: <Ome8T1rq6SFBf9AkwRZvjV2UbPPX9EnaEgoTE5oJz7Y=.53fdfa3f-31d4-4a74-8a0c-14317ef5c5f1@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <qCTL7ytC98tTgipvaLxd9U5mgqBWhI45eS-Gy4_EnSo=.939751de-2adf-45e5-924a-4469de333938@github.com>
 <OGnqrh6sJ6pUldrhttHHkG_tSVVv7vM2So_Q2F9F-wI=.224b58bf-6464-4a16-bbc0-6bf61d904009@github.com>
 <Ome8T1rq6SFBf9AkwRZvjV2UbPPX9EnaEgoTE5oJz7Y=.53fdfa3f-31d4-4a74-8a0c-14317ef5c5f1@github.com>
Message-ID: <n-siOErgS2rctFXJMCJCghmKgcF7Zup5Et-PsHNRzIg=.229d78d2-a26c-47b3-b506-d486a2dd17cd@github.com>

On Fri, 29 Aug 2025 23:35:37 GMT, Mohamed Issa <missa at openjdk.org> wrote:

>> @missa-prime Looks like an interesting patch! Do you think you could add some sort of IR test here, to verify that the correct code is generated on AVX10 vs lower AVX?
>
>> @missa-prime Looks like an interesting patch! Do you think you could add some sort of IR test here, to verify that the correct code is generated on AVX10 vs lower AVX?
> 
> @eme64 Thanks for the suggestion. This patch doesn't modify any IR though, so I'm not sure what IR test(s) to add. I could modify existing tests (`test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java`, `test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java`, `test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java`) that use IR nodes as dependencies though. Would that be sufficient? Or did you have something else in mind?

> @missa-prime Could you not match on the mach graph? See example: `test/hotspot/jtreg/compiler/vectorapi/VectorMultiplyOpt.java` with `CompilePhase.FINAL_CODE`.
> 
> Maybe another `CompilePhase` is better. I have never matched on the mach graph myself, but I wonder if it may be useful here.

I modified existing vector conversion tests, and I'll add some matching scalar tests to get full coverage.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26919#issuecomment-3260121943

From dlong at openjdk.org  Sat Sep  6 00:34:11 2025
From: dlong at openjdk.org (Dean Long)
Date: Sat, 6 Sep 2025 00:34:11 GMT
Subject: RFR: 8366875: CompileTaskTimeout should be reset for each
 iteration of RepeatCompilation
In-Reply-To: <4TbOkAMu-KU_tgQPg1sK0L8oto_0nD4mQo7yc0hJPm4=.8d87b900-a614-4c13-a4c6-6fe11e206482@github.com>
References: <4TbOkAMu-KU_tgQPg1sK0L8oto_0nD4mQo7yc0hJPm4=.8d87b900-a614-4c13-a4c6-6fe11e206482@github.com>
Message-ID: <em2n02cGTkEEL9PRHfV9pRcQga7Yft_yndZGv4lzbLA=.5c5f7ef9-6e07-43af-9e9a-e72e7bbfff6e@github.com>

On Fri, 5 Sep 2025 15:27:22 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

> When running a debug JVM on Linux with a compile task timeout and repeated compilation, the execution almost always times out because the timeout is not reset between repetitions of a compilation. The purpose of the compile task timeout is to limit the amount of time a single compilation can take. Thus, this PR resets the `CompileTaskTimeout` for every compilation when running with `-XX:RepeatCompilation=<n>` for n > 1.
> 
> This PR is stacked on top of #27094.
> 
> Testing:
>  - [x] GitHub Actions (failures are unrelated)
>  - [x] tier1, tier2, tier3 plus some additional internal testing

Looks good!
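The difference the fix makes can be modeled with a fake clock. This is a hypothetical sketch, not HotSpot's implementation: `CompileTimer` and `all_compilations_finish` are illustrative names, and a timeout of 0 is treated as "disabled", mirroring `-XX:CompileTaskTimeout=0`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Injected clock so the logic can be exercised deterministically.
using Clock = std::function<std::int64_t()>;  // returns "now" in milliseconds

struct CompileTimer {
  std::int64_t timeout_ms;        // 0 means "no timeout"
  std::int64_t deadline_ms = 0;

  void arm(const Clock& now) { deadline_ms = now() + timeout_ms; }
  bool expired(const Clock& now) const { return timeout_ms > 0 && now() > deadline_ms; }
};

// Runs `repeat` compilations of `cost_ms` fake milliseconds each.
// Without a per-iteration reset, all repetitions share one budget.
bool all_compilations_finish(int repeat, std::int64_t cost_ms, std::int64_t timeout_ms,
                             bool reset_each_iteration) {
  std::int64_t fake_now = 0;
  Clock now = [&fake_now] { return fake_now; };
  CompileTimer timer{timeout_ms};
  timer.arm(now);
  for (int i = 0; i < repeat; i++) {
    if (reset_each_iteration) {
      timer.arm(now);  // fresh budget for every repetition
    }
    fake_now += cost_ms;  // simulate one compilation
    if (timer.expired(now)) {
      return false;
    }
  }
  return true;
}
```

For example, three 40 ms repetitions against a 100 ms budget time out when the budget is shared, but finish when the timer is re-armed for each repetition.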

-------------

Marked as reviewed by dlong (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27120#pullrequestreview-3191180852

From missa at openjdk.org  Sat Sep  6 00:42:15 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Sat, 6 Sep 2025 00:42:15 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v3]
In-Reply-To: <vctouxyF_DZPik3yL78XZ80uIy4XEWV208upb8X6abw=.7c21b163-bda9-49e8-9b1f-7fbc8f0fb4e0@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <_Wv0Roo5xUHjswP_JUy6yzoU5KCwNpIoX3S2QBceUbE=.05b5bbbd-840b-4162-a454-94a9ddc2a69f@github.com>
 <F_GuF6fALkPyF5Iz-sSY4NCu6msyYBT1ZIJ64HAqHHc=.d1c0388c-3f79-4fb6-9800-43a7e74e5643@github.com>
 <vctouxyF_DZPik3yL78XZ80uIy4XEWV208upb8X6abw=.7c21b163-bda9-49e8-9b1f-7fbc8f0fb4e0@github.com>
Message-ID: <1Yb2TM9_3y_98k508MooB4a5amzOh8hhuhVV4HjjmTI=.a19114e4-1eec-4c70-8f50-d6b3941b7f24@github.com>

On Mon, 1 Sep 2025 07:51:49 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> src/hotspot/cpu/x86/x86.ad line 7804:
>> 
>>> 7802:   predicate(VM_Version::supports_avx10_2() &&
>>> 7803:             is_integral_type(Matcher::vector_element_basic_type(n)));
>>> 7804:   match(Set dst (VectorCastD2X src));
>> 
>> I assume your intent here is to feed the memory operand to the vector cast IR. A memory operand is first loaded into a register using a LoadVector IR node, so a CISC/memory variant of the pattern should consume the Load IR node such that the operand is directly exposed to the instruction. Check out https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L8986
>
> Make a similar change in all the newly added memory patterns.

I updated the scalar and vector memory patterns. I'm not completely sure about the vector ones though, so I'll try and test further.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2326312130

From missa at openjdk.org  Sat Sep  6 00:42:17 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Sat, 6 Sep 2025 00:42:17 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v5]
In-Reply-To: <V03-wM4Ds15lpJj6OToGNIWdZVyV-rWCX8WrGo1_mjs=.e277016e-48be-4a5e-9d00-3f8e06a88175@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <eQC5NKVWyupKfHixG9i4qZBbLmoMdR5By9_SddHV1WM=.e21d5302-f02b-4733-8bc5-5e797fef9ab3@github.com>
 <V03-wM4Ds15lpJj6OToGNIWdZVyV-rWCX8WrGo1_mjs=.e277016e-48be-4a5e-9d00-3f8e06a88175@github.com>
Message-ID: <g3PP585na6bPCBvU7e3vBnOjVCI2qO1DfI9quwNJiNU=.63eb59c8-140f-463c-b7da-cb20518ae71a@github.com>

On Thu, 4 Sep 2025 05:40:17 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Add AVX 10.2 CPU feature flag to list of verified ones
>
> test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java line 90:
> 
>> 88:     @Test
>> 89:     @IR(counts = {IRNode.VECTOR_CAST_F2I, IRNode.VECTOR_SIZE_16, "> 0"},
>> 90:         applyIfCPUFeatureOr = {"avx512f", "true", "avx10_2", "true"})
> 
> You should check for the target-specific machine IR that is selected on AVX10_2 targets.

New checks are added.

> test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java line 108:
> 
>> 106:     @Test
>> 107:     @IR(counts = {IRNode.VECTOR_CAST_F2L, IRNode.VECTOR_SIZE_8, "> 0"},
>> 108:         applyIfCPUFeatureOr = {"avx512dq", "true", "avx10_2", "true"})
> 
> avx10_2 is a superset of AVX512DQ; we enable all AVX512 features during VM initialization, and the IR framework relies on the same.

I updated the checks to account for this.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2326312302
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2326312430

From missa at openjdk.org  Sat Sep  6 04:09:17 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Sat, 6 Sep 2025 04:09:17 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v7]
In-Reply-To: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
Message-ID: <Rl8ykv-PSXjjwJURmICqdrnueUVkcR6ai0MTxs9XKB8=.274fd9d7-1530-4c8a-a5ed-090e7feb80c4@github.com>

> Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
> 
> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) instruction is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped into the integer range (NaN -> 0; -Infinity and values <= Integer.MIN_VALUE -> Integer.MIN_VALUE; +Infinity and values >= Integer.MAX_VALUE -> Integer.MAX_VALUE) with the help of a few temporary registers that store intermediate results.
> 
> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
> 
> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
> 9. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java`
> 10. `jtreg:test/hotspot/jtreg/com...

Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:

  Avoid machine instruction searches in IR rules for non-AVX10.2 platforms during Vector floating point to integer conversion tests

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26919/files
  - new: https://git.openjdk.org/jdk/pull/26919/files/6407cc48..709b4439

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=05-06

  Stats: 24 lines in 2 files changed: 0 ins; 0 del; 24 mod
  Patch: https://git.openjdk.org/jdk/pull/26919.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26919/head:pull/26919

PR: https://git.openjdk.org/jdk/pull/26919

From duke at openjdk.org  Sat Sep  6 05:44:27 2025
From: duke at openjdk.org (duke)
Date: Sat, 6 Sep 2025 05:44:27 GMT
Subject: Withdrawn: 8327963: C2: fix construction of memory graph around
 Initialize node to prevent incorrect execution if allocation is removed
In-Reply-To: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
Message-ID: <-zRTU72Jahqjyst5pleNDuMXos0x4i0S5H_YDZsVTzQ=.60ec04de-ca7e-4639-9545-7105fd199a09@github.com>

On Thu, 10 Apr 2025 11:39:36 GMT, Roland Westrelin <roland at openjdk.org> wrote:

> An `Initialize` node for an `Allocate` node is created with a memory
> `Proj` of adr type raw memory. In order for stores to be captured, the
> memory state out of the allocation is a `MergeMem` with slices for the
> various object fields/array element set to the raw memory `Proj` of
> the `Initialize` node. If `Phi`s need to be created during later
> transformations from this memory state, the `Phi` for a particular
> slice gets its adr type from the type of the `Proj` which is raw
> memory. If during macro expansion, the `Allocate` is found to have no
> use and so can be removed, the `Proj` out of the `Initialize` is
> replaced by the memory state on input to the `Allocate`. A `Phi` for
> some slice for a field of an object will end up with the raw memory
> state on input to the `Allocate` node. As a result, memory state at
> the `Phi` is incorrect and incorrect execution can happen.
> 
> The fix I propose is, rather than have a single `Proj` for the memory
> state out of the `Initialize` with adr type raw memory, to use one
> `Proj` per slice added to the memory state after the `Initialize`. Each
> of the `Proj` should return the right adr type for its slice. For that
> I propose having a new type of `Proj`: `NarrowMemProj` that captures
> the right adr type.
> 
> Logic for the construction of the `Allocate`/`Initialize` subgraph is
> tweaked so the right adr type captured in its own `NarrowMemProj` is
> added to the memory subgraph. Code that removes an allocation or moves
> it also has to be changed so it correctly takes the multiple memory
> projections out of the `Initialize` node into account.
> 
> One tricky issue is that when EA splits types for a scalar replaceable
> `Allocate` node:
> 
> 1- the adr type captured in the `NarrowMemProj` becomes out of sync
>   with the type of the slices for the allocation
>   
> 2- before EA, the memory state for one particular field out of the
>   `Initialize` node can be used for a `Store` to the just allocated
>   object or some other. So we can have a chain of `Store`s, some to
>   the newly allocated object, some to some other objects, all of them
>   using the state of `NarrowMemProj` out of the `Initialize`. After
>   split unique types, the `NarrowMemProj` is for the slice of a
>   particular allocation. So `Store`s to some other objects shouldn't
>   use that memory state but the memory state before the `Allocate`.
>   
> For that, I added logic to update the adr type of `NarrowMemProj`
> during split unique types and update the memory input of `Store`s that
> don't depend on the memory state ...

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/24570

From missa at openjdk.org  Sat Sep  6 06:38:20 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Sat, 6 Sep 2025 06:38:20 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v8]
In-Reply-To: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
Message-ID: <KoHIBZJrRo54C3ewtNXiwjy0w_tJKprOg2o2F3IGHHA=.8b38484e-1a4c-48f1-b244-fff3af32bc8a@github.com>

> Intel&reg; AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
> 
> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) instruction is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially, though, the special values are mapped to values in the integer range (NaN -> 0; -Infinity and <= Integer.MIN_VALUE -> Integer.MIN_VALUE; +Infinity and >= Integer.MAX_VALUE -> Integer.MAX_VALUE) with the help of a few temporary registers that store intermediate results.
> 
> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
> 
> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
> 9. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java`
> 10. `jtreg:test/hotspot/jtreg/com...
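
For reference, the results that the fix-up sequence (and the AVX10.2 saturating instructions) must produce are Java's float-to-integer conversion rules (JLS §5.1.3). The plain-Java sketch below is a model only, with hypothetical names: `rawTruncate` stands in for a legacy truncating conversion such as CVTTSS2SI, which returns the x86 "integer indefinite" value 0x80000000 for NaN and out-of-range inputs, and `fixup` shows the kind of special-case handling needed afterwards. It is not HotSpot's code generation.

```java
public class SaturatingCastDemo {
    // Hypothetical model of a legacy truncating conversion (CVTTSS2SI-style):
    // NaN and out-of-range inputs yield the "integer indefinite" 0x80000000.
    static int rawTruncate(float f) {
        if (Float.isNaN(f) || f >= 0x1p31f || f < -0x1p31f) {
            return Integer.MIN_VALUE;
        }
        return (int) f; // in range: plain truncation toward zero
    }

    // Fix-up step that maps the special cases to Java's saturating results.
    static int fixup(float f, int raw) {
        if (raw != Integer.MIN_VALUE) {
            return raw;                      // common case: no special handling
        }
        if (Float.isNaN(f)) {
            return 0;                        // NaN -> 0
        }
        return f > 0 ? Integer.MAX_VALUE     // +Infinity, >= 2^31 -> MAX_VALUE
                     : Integer.MIN_VALUE;    // -Infinity, <= -2^31 -> MIN_VALUE
    }

    static int floatToInt(float f) {
        return fixup(f, rawTruncate(f));
    }

    public static void main(String[] args) {
        float[] inputs = {1.9f, -1.9f, Float.NaN, Float.POSITIVE_INFINITY,
                          Float.NEGATIVE_INFINITY, 3e9f, -3e9f, -0x1p31f};
        for (float f : inputs) {
            // The modelled result should match Java's built-in (int) cast.
            System.out.println(f + " -> " + floatToInt(f) + " (Java cast: " + (int) f + ")");
        }
    }
}
```

For `byte` and `short` destinations, Java first converts to `int` and then narrows, so, for example, `(short) 1e10f` is `-1` rather than `Short.MAX_VALUE`; the `long` case saturates to `Long.MIN_VALUE`/`Long.MAX_VALUE` analogously.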

Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:

  Use applyIfCPUFeatureAnd to check multiple CPU feature pairs in tests

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26919/files
  - new: https://git.openjdk.org/jdk/pull/26919/files/709b4439..b7d3ae34

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=06-07

  Stats: 16 lines in 2 files changed: 0 ins; 0 del; 16 mod
  Patch: https://git.openjdk.org/jdk/pull/26919.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26919/head:pull/26919

PR: https://git.openjdk.org/jdk/pull/26919

From missa at openjdk.org  Sat Sep  6 09:44:56 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Sat, 6 Sep 2025 09:44:56 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v9]
In-Reply-To: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
Message-ID: <mG1sFdV99uAG4cWGfM6kCew9UVLdVuG4_GHADimAsVQ=.8013b182-afc1-4156-9718-13efb348bbb6@github.com>

> Intel&reg; AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
> 
> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) instruction is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially, though, the special values are mapped to values in the integer range (NaN -> 0; -Infinity and <= Integer.MIN_VALUE -> Integer.MIN_VALUE; +Infinity and >= Integer.MAX_VALUE -> Integer.MAX_VALUE) with the help of a few temporary registers that store intermediate results.
> 
> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
> 
> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
> 9. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java`
> 10. `jtreg:test/hotspot/jtreg/com...

Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:

  Check for scalar casting instead of vector casting in tests when disabling vector alignment or compact object headers

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26919/files
  - new: https://git.openjdk.org/jdk/pull/26919/files/b7d3ae34..4d8f3ab6

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=07-08

  Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/26919.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26919/head:pull/26919

PR: https://git.openjdk.org/jdk/pull/26919

From duke at openjdk.org  Mon Sep  8 01:25:18 2025
From: duke at openjdk.org (erifan)
Date: Mon, 8 Sep 2025 01:25:18 GMT
Subject: RFR: 8366588: VectorAPI: Re-intrinsify VectorMask.laneIsSet where
 the input index is a variable
In-Reply-To: <QunSTeuoPDHArXeK7cAKGYHYG8uy9S49l8XXAXrxfTc=.34152256-e559-45ee-b91d-2952663005ed@github.com>
References: <XaokWZpc-AL2hWWGvsmDczO1u4o5uGEaCSgPjSyxkz4=.3251a974-6374-48c8-a8e4-88914b730505@github.com>
 <QunSTeuoPDHArXeK7cAKGYHYG8uy9S49l8XXAXrxfTc=.34152256-e559-45ee-b91d-2952663005ed@github.com>
Message-ID: <U2EmBE3HG7bJwIATZZZ_I-2u-LLXuXEc50677VYSicQ=.4578989a-ea70-4708-8725-72d9e2a2275f@github.com>

On Fri, 5 Sep 2025 10:12:35 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> Intrinsic support for `VectorMask.laneIsSet` with a **variable** input index was introduced in PR #14200, but was inadvertently broken by PR #25673. This PR restores the intrinsic functionality and adds some JTReg tests.
>> 
>> Benchmarks on Nvidia Grace machine with 128-bit SVE:
>> 
>> Benchmark                        Unit    Before       Score Error  After        Score Error  Uplift
>> microMaskLaneIsSetByte128_var    ops/ms  21702.14415  91.902159    103472.9391  36.057447    4.767867
>> microMaskLaneIsSetByte64_var     ops/ms  21468.51868  107.94177    103365.6561  69.47736     4.814754
>> microMaskLaneIsSetDouble128_var  ops/ms  77489.32791  153.242699   413499.4127  311.854079   5.336211
>> microMaskLaneIsSetFloat128_var   ops/ms  41034.95204  399.421823   206840.0988  74.702234    5.040583
>> microMaskLaneIsSetFloat64_var    ops/ms  77607.40268  175.938921   413745.3001  149.716794   5.33126
>> microMaskLaneIsSetInt128_var     ops/ms  41452.48893  76.143208    206845.9754  59.371129    4.989953
>> microMaskLaneIsSetInt64_var      ops/ms  77726.2542   173.180518   413427.8838  363.575023   5.319024
>> microMaskLaneIsSetLong128_var    ops/ms  77646.11218  177.496587   413403.4404  236.609314   5.3242
>> microMaskLaneIsSetShort128_var   ops/ms  21374.93265  48.13101     103417.4618  34.827021    4.838259
>> microMaskLaneIsSetShort64_var    ops/ms  41066.19395  353.320621   206801.109   106.408938   5.035799
>> 
>> 
>> Benchmarks on Intel 6444y machine with 512-bit avx3:
>> 
>> Benchmark                        Unit    Before       Score Error  After        Score Error  Uplift
>> microMaskLaneIsSetByte128_var    ops/ms  57658.45497  240.209309   211643.8406  29.214532    3.670647
>> microMaskLaneIsSetByte256_var    ops/ms  57451.68169  116.994128   211609.4652  160.48513    3.683259
>> microMaskLaneIsSetByte512_var    ops/ms  57530.22411  311.63868    199802.8084  408.144015   3.473005
>> microMaskLaneIsSetByte64_var     ops/ms  57642.2672   161.406221   205252.4464  196.86852    3.560797
>> microMaskLaneIsSetDouble256_var  ops/ms  114401.3789  231.797375   361400.344   565.593984   3.159055
>> microMaskLaneIsSetDouble512_var  ops/ms  57379.27882  159.699503   211476.1138  136.980026   3.685583
>> microMaskLaneIsSetFloat128_var   ops/ms  113943.9512  141.062663   360855.3915  494.471996   3.166955
>> microMaskLaneIsSetFloat256_var   ops/ms  57682.78182  138.142053   211659.5098  30.167972    3.66937
>> microMaskLaneIsSetFloat512_var   ops/ms  57617.66405  301.748599   211246.8588  597.18949    3.666355
>> microMaskLaneIsSetInt128_var     ops/ms  113914.5062  118.681382   360856.4465  555.097397   3.167783
>> microMaskLaneIsSetInt256_var     ops/ms  57681.79883  112.391639   211555.6742  217.556981   3.667633
>> microMaskLaneIsSetInt512_var     ops/ms  573...
>
> test/micro/org/openjdk/bench/jdk/incubator/vector/VectorExtractBenchmark.java line 34:
> 
>> 32: @Warmup(iterations = 5, time = 1)
>> 33: @Measurement(iterations = 5, time = 1)
>> 34: @Fork(value = 1, jvmArgs = {"--add-modules=jdk.incubator.vector"})
> 
> Don't do 1 fork, do at least 3.

The test results show that this test is stable, so I think forking once is enough? We have many JMH benchmarks that fork once.
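
For context on the PR description quoted above: `VectorMask.toLong()` packs up to the first 64 lanes LSB-first, so for such masks `m.laneIsSet(i)` gives the same answer as `((m.toLong() >>> i) & 1) != 0`. The plain-Java model below (a hypothetical class, not the Vector API or the C2 intrinsic) illustrates that a lane test with a variable index amounts to a bounds check plus a bit test; how the intrinsic actually generates code is not shown.

```java
import java.util.Objects;

public class LaneIsSetModel {
    // Hypothetical model of a mask with up to 64 lanes, packed LSB-first
    // (lane 0 in bit 0), like the long returned by VectorMask.toLong().
    private final long bits;
    private final int length;

    LaneIsSetModel(long bits, int length) {
        if (length < 1 || length > 64) {
            throw new IllegalArgumentException("length must be in [1, 64]");
        }
        this.bits = bits;
        this.length = length;
    }

    // Lane test with a runtime (variable) index: bounds check, then bit test.
    boolean laneIsSet(int i) {
        Objects.checkIndex(i, length);
        return ((bits >>> i) & 1L) != 0;
    }

    public static void main(String[] args) {
        LaneIsSetModel m = new LaneIsSetModel(0b1010_0101L, 8);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 8; i++) {
            sb.append(m.laneIsSet(i) ? '1' : '0'); // lane 0 printed first
        }
        System.out.println(sb); // prints 10100101
    }
}
```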

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27113#discussion_r2328949227

From xgong at openjdk.org  Mon Sep  8 02:33:14 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Mon, 8 Sep 2025 02:33:14 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v3]
In-Reply-To: <hjuxd7lDyNoeFhxtYBMJQA1IDwzdu5tb1ZQcBqQLSeA=.623134f4-b2b8-4010-a6b5-5815e9d29aaf@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <O5IGyu-C8N8goFvkFoKQxKuJ67f1_tedjCMqIwsLx1g=.69f50bdd-781e-4379-a8b5-12f8858ea299@github.com>
 <1XFXtkTlDshGtoxEdLVg0f2J2rtn4wz7CdUB9pb9N2g=.25e7e0b5-8468-4d91-adb9-c459bda40933@github.com>
 <b1TzbMFznYJuFizcy93hsTxo9-hoyDe7YKUuIsy7xRA=.6811ef6f-3b3b-4b8a-b63b-75d824e65968@github.com>
 <hjuxd7lDyNoeFhxtYBMJQA1IDwzdu5tb1ZQcBqQLSeA=.623134f4-b2b8-4010-a6b5-5815e9d29aaf@github.com>
Message-ID: <ylIL4AVS9i4oBXIImUlxGzE1uDAToMvzF282-EnOG8A=.a61aaeeb-cfef-44ae-8913-ee8f6f58b781@github.com>

On Fri, 5 Sep 2025 10:32:58 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> To me a `false` means this: If we support gather/scatter, then we do not need a vector index, we can do without it.
> 
> Is that correct?

Thanks for your review! Actually, gather/scatter always needs an index input. What this function decides is how the index elements are passed to the operations.

It doesn't assume anything about whether vector gather_load/scatter_store is supported by the backend. It just checks whether the `index` input of such operations requires a vector register or an address where the indexes are stored. Currently, on x86, an array address is passed for subword types (the indexes are then loaded one by one during backend code generation). On AArch64, however, we require a vector for all types instead (the indexes have already been loaded into vector registers at the IR level).

> The current platform does not support vector gather-load or scatter-store at all.

Sorry, I didn't explain @fg1417's second statement clearly. Whether the current platform supports vector gather-load/scatter-store is still decided by `Matcher::match_rule_supported_vector()`, as for other operations. The function returns `false` here simply because arm doesn't support any vector operations. If it did support vector gather/scatter, the index input would not be a vector, right?
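
To make the two index-passing conventions concrete: in the Vector API, a gather such as `ShortVector.fromArray(species, a, offset, indexMap, mapOffset)` places `a[offset + indexMap[mapOffset + N]]` into lane `N`. The plain-Java sketch below (hypothetical names; it does not model registers or the C2 IR) computes that result both ways described above: reading each index from memory as it is needed, versus first loading the indexes into an "index vector" and then gathering. Both must produce the same lanes; only where the indexes live differs.

```java
import java.util.Arrays;

public class SubwordGatherModel {
    // "Address" style: each index is read from memory while gathering.
    static short[] gatherFromIndexAddress(short[] a, int offset,
                                          int[] indexMap, int mapOffset, int lanes) {
        short[] dst = new short[lanes];
        for (int n = 0; n < lanes; n++) {
            dst[n] = a[offset + indexMap[mapOffset + n]];
        }
        return dst;
    }

    // "Vector" style: all indexes are loaded up front (modelling an index
    // vector register), then the gather uses the pre-loaded indexes.
    static short[] gatherFromIndexVector(short[] a, int offset,
                                         int[] indexMap, int mapOffset, int lanes) {
        int[] indexVector = Arrays.copyOfRange(indexMap, mapOffset, mapOffset + lanes);
        short[] dst = new short[lanes];
        for (int n = 0; n < lanes; n++) {
            dst[n] = a[offset + indexVector[n]];
        }
        return dst;
    }

    public static void main(String[] args) {
        short[] a = {10, 11, 12, 13, 14, 15, 16, 17};
        int[] idx = {7, 0, 3, 3, 1, 6, 2, 5};
        // both print [17, 10, 13, 13, 11, 16, 12, 15]
        System.out.println(Arrays.toString(gatherFromIndexAddress(a, 0, idx, 0, 8)));
        System.out.println(Arrays.toString(gatherFromIndexVector(a, 0, idx, 0, 8)));
    }
}
```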

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2328999842

From xgong at openjdk.org  Mon Sep  8 02:33:15 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Mon, 8 Sep 2025 02:33:15 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v5]
In-Reply-To: <wBLVOTyKQkrpbBB04sxxxgZ0wl7A8_HuC2K-3s8j4wQ=.22931e86-0381-46ea-8d34-5020542e063c@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <sikeN2K9EHmLEEnGsNttJSsPefvEtrKk9q7OOIgqQG0=.238662f7-394c-475b-bd07-effb28302b2c@github.com>
 <LeM6JS1sHVJMB2Ziw2pYgcTQr7hHcPUSK6guAV5DCC0=.f3de7008-b13c-4a96-bc85-b50784bb21ce@github.com>
 <wBLVOTyKQkrpbBB04sxxxgZ0wl7A8_HuC2K-3s8j4wQ=.22931e86-0381-46ea-8d34-5020542e063c@github.com>
Message-ID: <0lnaxN7YsQEddGZfWLgFi2YOl_XtXntDoHRr57Bjp7k=.946b3e40-04c1-4eb5-a205-53347cdc91eb@github.com>

On Fri, 5 Sep 2025 10:47:58 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/vectornode.hpp line 1769:
>> 
>>> 1767: //      dst = [h g f e d c b a]
>>> 1768: //
>>> 1769: class VectorConcatenateNode : public VectorNode {
>> 
>> That semantic is not quite what I would expect from `Concatenate`. Maybe we can call it something else?
>> `VectorConcatenateAndNarrowNode`?
>
> Have you considered using `2x Cast + Concatenate` instead, and just matching that in the backend? I don't remember how to do the mere Concat, but it should be possible via the `unslice` or some other operation that concatenates two vectors.

> That semantic is not quite what I would expect from `Concatenate`. Maybe we can call it something else? `VectorConcatenateAndNarrowNode`?

Yeah, `VectorConcatenateAndNarrowNode` would be a much better match. I just thought the name would be too long. I will change it in the next commit. Thanks for your suggestion!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2329001531

From xgong at openjdk.org  Mon Sep  8 03:03:13 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Mon, 8 Sep 2025 03:03:13 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v5]
In-Reply-To: <0lnaxN7YsQEddGZfWLgFi2YOl_XtXntDoHRr57Bjp7k=.946b3e40-04c1-4eb5-a205-53347cdc91eb@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <sikeN2K9EHmLEEnGsNttJSsPefvEtrKk9q7OOIgqQG0=.238662f7-394c-475b-bd07-effb28302b2c@github.com>
 <LeM6JS1sHVJMB2Ziw2pYgcTQr7hHcPUSK6guAV5DCC0=.f3de7008-b13c-4a96-bc85-b50784bb21ce@github.com>
 <wBLVOTyKQkrpbBB04sxxxgZ0wl7A8_HuC2K-3s8j4wQ=.22931e86-0381-46ea-8d34-5020542e063c@github.com>
 <0lnaxN7YsQEddGZfWLgFi2YOl_XtXntDoHRr57Bjp7k=.946b3e40-04c1-4eb5-a205-53347cdc91eb@github.com>
Message-ID: <P1FNs23o3qks_15w5YJCBfiwLMs1QW_aBI2KSkptKZ4=.83c7e3b3-15fe-47a0-86dc-e1549af59e20@github.com>

On Mon, 8 Sep 2025 02:30:20 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> Have you considered using `2x Cast + Concatenate` instead, and just matching that in the backend? I don't remember how to do the mere Concat, but it should be possible via the `unslice` or some other operation that concatenates two vectors.
>
>> That semantic is not quite what I would expect from `Concatenate`. Maybe we can call it something else? `VectorConcatenateAndNarrowNode`?
> 
> Yeah, `VectorConcatenateAndNarrowNode` would be a much better match. I just thought the name would be too long. I will change it in the next commit. Thanks for your suggestion!

> Have you considered using `2x Cast + Concatenate` instead, and just matching that in the backend? I don't remember how to do the mere Concat, but it should be possible via the `unslice` or some other operation that concatenates two vectors.

Would using `2x Cast + Concatenate` make the IR and the match rule more complex? A plain concatenate would be something like `slice` in the Vector API: it concatenates two vectors into one, with an index denoting the merge position, and it requires the two input vectors and the destination vector to have the same type. Hence, if we want to split this operation into a cast and a concatenate, the IR would be (assuming the original type of `v1/v2` is `4-int` and the result type is `8-short`):
1) Narrow two input vectors:
`v1 = VectorCast(v1)  (4-short); v2 = VectorCast(v2) (4-short)`. 
The vector length is unchanged while the element size is halved, so the vector size in bytes is halved as well.
2) Resize `v1` and `v2` to twice the vector length, with the upper bits cleared:
`v1 = VectorReinterpret(v1) (8-short); v2 = VectorReinterpret(v2) (8-short)`.
3) Concatenate `v1` and `v2` as with slice, with the position at the middle of the vector:
`v = VectorSlice(v1, v2, 4)  (8-short)`.

If we want to merge these IR nodes in the backend, wouldn't the match rule become more complex? I will think about it.
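
The fused node and the three-step decomposition described above should compute the same lanes. The plain-Java sketch below (hypothetical names, arrays standing in for vectors) checks this for the `4-int -> 8-short` case. It assumes `vec1` supplies the low lanes and `vec2` the high lanes of the result and that narrowing is a truncating cast; the exact lane order and the slice/unslice semantics of the real IR nodes are not modelled.

```java
import java.util.Arrays;

public class ConcatNarrowModel {
    // Fused form: narrow both inputs and place v1 in the low lanes,
    // v2 in the high lanes of a result with twice the lane count.
    static short[] concatenateAndNarrow(int[] v1, int[] v2) {
        short[] dst = new short[v1.length + v2.length];
        for (int i = 0; i < v1.length; i++) dst[i] = (short) v1[i];
        for (int i = 0; i < v2.length; i++) dst[v1.length + i] = (short) v2[i];
        return dst;
    }

    // Decomposed form: cast, resize with zeroed upper lanes, then merge.
    static short[] castResizeMerge(int[] v1, int[] v2) {
        int n = v1.length;
        // 1) Narrow each input: n x int -> n x short (lane count unchanged).
        short[] n1 = new short[n];
        short[] n2 = new short[n];
        for (int i = 0; i < n; i++) {
            n1[i] = (short) v1[i];
            n2[i] = (short) v2[i];
        }
        // 2) Resize to twice the lane count; the new upper lanes are zero.
        short[] r1 = Arrays.copyOf(n1, 2 * n);
        short[] r2 = Arrays.copyOf(n2, 2 * n);
        // 3) Merge at the midpoint: low half from r1, high half from r2's low lanes.
        short[] dst = new short[2 * n];
        System.arraycopy(r1, 0, dst, 0, n);
        System.arraycopy(r2, 0, dst, n, n);
        return dst;
    }

    public static void main(String[] args) {
        int[] v1 = {1, 2, 0x12345, -1};
        int[] v2 = {5, 6, 7, 70000};
        // both print [1, 2, 9029, -1, 5, 6, 7, 4464]
        System.out.println(Arrays.toString(concatenateAndNarrow(v1, v2)));
        System.out.println(Arrays.toString(castResizeMerge(v1, v2)));
    }
}
```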

>> And what about the vector length being consistent between `vec1`, `vec2` and `vt`?
>
>> What about asserting that `vec1` and `vec2` have the same `vect`?
> 
> That would be fine. Thanks! I will add it in the next commit.

> And what about the vector length being consistent between `vec1`, `vec2` and `vt`?

Yes, I think the vector length in bytes must be consistent. I will add the assertion as well.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2329024826
PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2329027242

From xgong at openjdk.org  Mon Sep  8 03:03:14 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Mon, 8 Sep 2025 03:03:14 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v5]
In-Reply-To: <wBLVOTyKQkrpbBB04sxxxgZ0wl7A8_HuC2K-3s8j4wQ=.22931e86-0381-46ea-8d34-5020542e063c@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <sikeN2K9EHmLEEnGsNttJSsPefvEtrKk9q7OOIgqQG0=.238662f7-394c-475b-bd07-effb28302b2c@github.com>
 <LeM6JS1sHVJMB2Ziw2pYgcTQr7hHcPUSK6guAV5DCC0=.f3de7008-b13c-4a96-bc85-b50784bb21ce@github.com>
 <wBLVOTyKQkrpbBB04sxxxgZ0wl7A8_HuC2K-3s8j4wQ=.22931e86-0381-46ea-8d34-5020542e063c@github.com>
Message-ID: <B05Ii9Aad79PBnDuHLSJXQs1W7EASfULsQDpqbwru98=.71b6c595-0b8e-432f-87fc-a3d13acaab67@github.com>

On Fri, 5 Sep 2025 10:41:15 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/vectornode.hpp line 1774:
>> 
>>> 1772:     : VectorNode(vec1, vec2, vt) {
>>> 1773:     assert(type2aelembytes(vec1->bottom_type()->is_vect()->element_basic_type()) ==
>>> 1774:            type2aelembytes(vt->element_basic_type()) * 2, "must be half size");
>> 
>> What about asserting that `vec1` and `vec2` have the same `vect`?
>
> And what about the vector length being consistent between `vec1`, `vec2` and `vt`?

> What about asserting that `vec1` and `vec2` have the same `vect`?

That would be fine. Thanks! I will add it in the next commit.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2329026579

From xgong at openjdk.org  Mon Sep  8 03:15:22 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Mon, 8 Sep 2025 03:15:22 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v5]
In-Reply-To: <wBLVOTyKQkrpbBB04sxxxgZ0wl7A8_HuC2K-3s8j4wQ=.22931e86-0381-46ea-8d34-5020542e063c@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <sikeN2K9EHmLEEnGsNttJSsPefvEtrKk9q7OOIgqQG0=.238662f7-394c-475b-bd07-effb28302b2c@github.com>
 <LeM6JS1sHVJMB2Ziw2pYgcTQr7hHcPUSK6guAV5DCC0=.f3de7008-b13c-4a96-bc85-b50784bb21ce@github.com>
 <wBLVOTyKQkrpbBB04sxxxgZ0wl7A8_HuC2K-3s8j4wQ=.22931e86-0381-46ea-8d34-5020542e063c@github.com>
Message-ID: <nqrYrU7Mek9J-2Ogzuo1ZEMXN6iVdKsOztA403z7kcg=.ac9b40ba-5cd0-4f61-b245-a09559fb1de4@github.com>

On Fri, 5 Sep 2025 10:44:28 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/vectornode.hpp line 1841:
>> 
>>> 1839: 
>>> 1840: // Unpack the elements to twice size.
>>> 1841: class VectorMaskWidenNode : public VectorNode {
>> 
>> Can you add a visual example like above for `VectorConcatenateNode`, please?
>
> Did you consider the alternative of `Extract` + `Cast`? Not sure if that would be better, you know more about the code complexity. It would just allow us to have one fewer nodes.

C2 only has the `Extract` node, which extracts a single element from a vector, right? Extracting the lowest part can be implemented easily with `VectorReinterpret`, but what about the higher parts? Maybe they could be implemented with operations like `slice`, but that would also seem to make the IR more complex. For `Cast`, we now have `VectorCastMask`, but it assumes the input and output have the same vector length, so a `VectorReinterpret` or a `VectorExtract` would still be needed.

I can try splitting it into separate IR nodes, but I suspect an additional new node will still be necessary.

> It would just allow us to have one fewer nodes.

That is exactly what I would like as well.
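
One possible executable illustration of "unpack the elements to twice size" for masks: the plain-Java sketch below assumes a mask is represented as one all-ones (`-1`) or all-zeros (`0`) element per lane, and that the node selects either the low or the high half of the lanes. Under that assumption, the fused widening and the `Extract` + `Cast` alternative discussed above give the same result. The names are hypothetical, and SVE actually keeps masks in predicate registers, which this model does not capture.

```java
import java.util.Arrays;

public class MaskWidenModel {
    // Fused form: take the low or high half of the lanes and widen each
    // lane from short to int in one step (sign extension keeps -1 as -1).
    static int[] widen(short[] mask, boolean highPart) {
        int half = mask.length / 2;
        int base = highPart ? half : 0;
        int[] dst = new int[half];
        for (int i = 0; i < half; i++) {
            dst[i] = mask[base + i];
        }
        return dst;
    }

    // Alternative form: extract the half first (element type unchanged),
    // then cast with the lane count unchanged and the element size doubled.
    static int[] extractThenCast(short[] mask, boolean highPart) {
        int half = mask.length / 2;
        int from = highPart ? half : 0;
        short[] part = Arrays.copyOfRange(mask, from, from + half);
        int[] dst = new int[part.length];
        for (int i = 0; i < part.length; i++) {
            dst[i] = part[i];
        }
        return dst;
    }

    public static void main(String[] args) {
        short[] mask = {-1, 0, 0, -1, -1, -1, 0, 0}; // 8 x short lanes, lane 0 first
        System.out.println(Arrays.toString(widen(mask, false))); // [-1, 0, 0, -1]
        System.out.println(Arrays.toString(widen(mask, true)));  // [-1, -1, 0, 0]
    }
}
```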

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2329040437

From epeter at openjdk.org  Mon Sep  8 05:56:19 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 05:56:19 GMT
Subject: RFR: 8366878: Improve flags of
 compiler/loopopts/superword/TestAlignVectorFuzzer.java
In-Reply-To: <jqOL72bER1G6aunV15U5OE4iUltIiv9dfhdJrvkyy3k=.7697a330-3de2-4591-96e3-70dc43e7139a@github.com>
References: <jqOL72bER1G6aunV15U5OE4iUltIiv9dfhdJrvkyy3k=.7697a330-3de2-4591-96e3-70dc43e7139a@github.com>
Message-ID: <5PWmoHhlhYHDD7WBje51yGzGHr1Dq3QCDRNApA64MmY=.ed2e0b11-e144-4e24-97dd-7a7ccdd208c0@github.com>

On Fri, 5 Sep 2025 16:46:09 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

> The test definitions of `TestAlignVectorFuzzer.java` all contain `printcompilation` directives. These are redundant and slow down the test execution of a test that already often times out. @eme64 also suggested adding a `compileonly` directive to one of the four tests.
> 
> Testing:
>  - [ ] Github Actions
>  - [ ] tier1 and stress testing (features `TestAlignVectorFuzzer.java`)

test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java line 35:

> 33:  *                                 -XX:CompileCommand=compileonly,compiler.loopopts.superword.TestAlignVectorFuzzer::*
> 34:  *                                 compiler.loopopts.superword.TestAlignVectorFuzzer
> 35:  */

I think it would be good if we also had the same run but without the compileonly. That's what I meant by duplication ;)
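
A sketch of that duplication, assuming jtreg's support for several `@test` blocks in one file distinguished by `id`. The `id` names and the `@run main/othervm` action are assumptions, and the other tags and VM flags of the existing `@run` lines (not shown in the excerpt above) would need to be carried over unchanged:

```java
/*
 * @test id=NoCompileOnly
 * @run main/othervm compiler.loopopts.superword.TestAlignVectorFuzzer
 */

/*
 * @test id=CompileOnly
 * @run main/othervm -XX:CompileCommand=compileonly,compiler.loopopts.superword.TestAlignVectorFuzzer::*
 *                   compiler.loopopts.superword.TestAlignVectorFuzzer
 */
```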

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27122#discussion_r2329202898

From epeter at openjdk.org  Mon Sep  8 06:01:11 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 06:01:11 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes
In-Reply-To: <TxVrg4YQuQqoGvnUpOxnQheehz3zaihOJONpmL2MzZU=.8af6afb9-381f-4633-8fa9-73242552d170@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <TxVrg4YQuQqoGvnUpOxnQheehz3zaihOJONpmL2MzZU=.8af6afb9-381f-4633-8fa9-73242552d170@github.com>
Message-ID: <4IAtrzmFmfx0HgchA6HNgqifFCbTFxAmfJyQgym5O3w=.c0ed236e-cd04-447e-ac6f-1e4cd14ebdb8@github.com>

On Fri, 5 Sep 2025 17:25:01 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
>> https://github.com/openjdk/jdk/pull/20964
>> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
>> 
>> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
>> 
>> ---------------------------------
>> 
>> I have to say: I'm very sorry for this refactoring. I took some decisions in https://github.com/openjdk/jdk/pull/19719 that I'm now partially undoing. I moved too much logic from `SuperWord::output` (now called `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`) to the `VTransform...Node::apply`. https://github.com/openjdk/jdk/pull/19719  was a roughly 1.5k line change, and I took about a 0.3k misstep that I'm now correcting here ;)
>> 
>> I had accidentally made the `VTransformGraph` too close to the `PackSet`, and not close enough to the future vectorized C2 Graph. And that makes some future changes hard.
>> 
>> My vision:
>> - VLoop / VLoopAnalyzer look at the scalar loop and prepare it for SuperWord
>> - SuperWord creates the `PackSet`: some nodes are packed, all others are scalar.
>> - `SuperWordVTransformBuilder` converts the `PackSet` into the `VTransformGraph`
>> - The `VTransformGraph` very closely represents the C2 vectorized loop after vectorization
>>   - It does not need to know which `nodes` it packs, it rather just needs to know how to generate the new vector nodes
>>   - That means it is straight-forward to compute cost
>>   - And it also makes optimizations on that graph easier
>>   - And the `apply` methods are simpler too
>> 
>> ----------------------------------
>> 
>> So therefore, the main goal was to make the `VTransform...Node::apply` calls simpler again. And move the logic back to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>> 
>> One important step to making the `VTransformGraph` less of a `PackSet` is to remove reliance on `nodes` for the vector nodes.
>> 
>> What I did:
>> - Moving a lot of the logic in `VTransformElementWiseVectorNode::apply` to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>>   - Will make it easier to optimize and compute cost in future RFE's.
>> - `VTransformVectorNodePrototype`: packs a lot of the info for `VTransformVectorNode`.
>>   - pass info about `bt`, `vlen`, `sopc` instead of the `pack` -> allows us to eventually remove the dependency on `nodes`.
>> - New vector nodes, they are special cases I split away from ...
>
> src/hotspot/share/opto/superwordVTransformBuilder.cpp line 155:
> 
>> 153:   const VTransformVectorNodePrototype prototype = VTransformVectorNodePrototype::make_from_pack(pack, _vloop_analyzer);
>> 154:   const int  sopc = prototype.scalar_opcode();
>> 155:   const uint vlen = prototype.vector_length();
> 
> As someone who is not familiar with the Superword code: are "pack size" and "vector length" often used interchangeably? If not, then I would keep the name.

`SuperWord` works with packs, i.e. packs of scalars. So we can measure the size of a pack, i.e. how many scalars we have packed.

But `VTransform` is a "preview" of the new vectorized C2 IR. And there, we talk about the length of vectors. So I think this is exactly the right place to do the transition ;)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2329211537

From epeter at openjdk.org  Mon Sep  8 06:04:11 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 06:04:11 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes
In-Reply-To: <-ntIikGlF88WGhDkEeTA2rU7xWHBuSRlaLUVwDvDzUY=.b482c2ed-ddc5-46e8-b7ad-2ee8e9f8fd67@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <nnUjRLzWULRJlT6s8JO8uKa2mULnxKUrBawK1cXfBcg=.57127717-cbd1-46e9-8ba2-05b3478e233e@github.com>
 <-ntIikGlF88WGhDkEeTA2rU7xWHBuSRlaLUVwDvDzUY=.b482c2ed-ddc5-46e8-b7ad-2ee8e9f8fd67@github.com>
Message-ID: <wleNFVd9vTR5L67ePguMMpPaaxbZ2qlAk-5zVAYNL68=.203526b6-f0b8-4265-962e-f81b6663a4ce@github.com>

On Fri, 5 Sep 2025 17:37:21 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> src/hotspot/share/opto/vtransform.cpp line 933:
>> 
>>> 931:   phase->register_new_node(vn, apply_state.vloop().cl());
>>> 932:   phase->igvn()._worklist.push(vn);
>>> 933:   VectorNode::trace_new_vector(vn, "AutoVectorization");
>> 
>> Removing the argument here allows us yet another removal of dependency on the old scalar graph. We only needed it for using the same control as the old graph - but that is not necessary, we can just use the CountedLoop as control, which is good enough.
>
>> we can just use the CountedLoop as control, which is good enough
> 
> For my understanding: this is because we can only vectorize if there are no other control dependencies in the loop?

Setting control is only for PhaseIdealLoop. It sets the internal ctrl that other loop-opts would rely on if we kept on optimizing the loop in the same PhaseIdealLoop. But if we set major progress, that means that we have messed up the graph so much that the state of PhaseIdealLoop may no longer be correct/accurate enough.

So if we set major progress, we are essentially allowed to mess up the PhaseIdealLoop state. I still have to set ctrl, but it does not matter if it is not correct ;)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2329215458

From epeter at openjdk.org  Mon Sep  8 06:07:12 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 06:07:12 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes
In-Reply-To: <TxVrg4YQuQqoGvnUpOxnQheehz3zaihOJONpmL2MzZU=.8af6afb9-381f-4633-8fa9-73242552d170@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <TxVrg4YQuQqoGvnUpOxnQheehz3zaihOJONpmL2MzZU=.8af6afb9-381f-4633-8fa9-73242552d170@github.com>
Message-ID: <sqxqDGsPq5F9VCNoJRS05ERzJ5g_vQU2tZ8Jg9Kbja0=.db920a99-3ccb-4c7d-9773-9fc6ff9f5fa6@github.com>

On Fri, 5 Sep 2025 17:20:36 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
>> https://github.com/openjdk/jdk/pull/20964
>> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
>> 
>> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
>> 
>> ---------------------------------
>> 
>> I have to say: I'm very sorry for this refactoring. I took some decisions in https://github.com/openjdk/jdk/pull/19719 that I'm now partially undoing. I moved too much logic from `SuperWord::output` (now called `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`) to the `VTransform...Node::apply`. https://github.com/openjdk/jdk/pull/19719  was a roughly 1.5k line change, and I took about a 0.3k misstep that I'm now correcting here ;)
>> 
>> I had accidentally made the `VTransformGraph` too close to the `PackSet`, and not close enough to the future vectorized C2 Graph. And that makes some future changes hard.
>> 
>> My vision:
>> - VLoop / VLoopAnalyzer look at the scalar loop and prepare it for SuperWord
>> - SuperWord creates the `PackSet`: some nodes are packed, all others are scalar.
>> - `SuperWordVTransformBuilder` converts the `PackSet` into the `VTransformGraph`
>> - The `VTransformGraph` very closely represents the C2 vectorized loop after vectorization
>>   - It does not need to know which `nodes` it packs, it rather just needs to know how to generate the new vector nodes
>>   - That means it is straight-forward to compute cost
>>   - And it also makes optimizations on that graph easier
>>   - And the `apply` methods are simpler too
>> 
>> ----------------------------------
>> 
>> Therefore, the main goal was to make the `VTransform...Node::apply` calls simpler again, and to move the logic back to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>> 
>> One important step to making the `VTransformGraph` less of a `PackSet` is to remove reliance on `nodes` for the vector nodes.
>> 
>> What I did:
>> - Moved a lot of the logic in `VTransformElementWiseVectorNode::apply` to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>>   - This will make it easier to optimize and compute cost in future RFEs.
>> - `VTransformVectorNodePrototype`: packs a lot of the info for `VTransformVectorNode`.
>>   - pass info about `bt`, `vlen`, `sopc` instead of the `pack` -> allows us to eventually remove the dependency on `nodes`.
>> - New vector nodes, they are special cases I split away from ...
>
> src/hotspot/share/opto/superwordVTransformBuilder.cpp line 115:
> 
>> 113:       VTransformBoolVectorNode* vtn_mask_cmp = vtn->in_req(3)->isa_BoolVector();
>> 114:       if (vtn_mask_cmp->test()._is_negated) {
>> 115:         vtn->swap_req(1, 2); // swap if test was negated.
> 
> Suggestion:
> 
>       // Inputs must be permuted from (mask, blend1, blend2) -> (blend1, blend2, mask)
>       // Or, if the test was negated: (blend1, blend2, mask) -> (blend2, blend1, mask)
>       vtn->swap_req(1, 3); // Now, the reqs are negated.
>       VTransformBoolVectorNode* vtn_mask_cmp = vtn->in_req(3)->isa_BoolVector();
>       if (!vtn_mask_cmp->test()._is_negated) {
>         vtn->swap_req(1, 2); // Swap if test was not negated.
> 
> This would save a swap, but I am unsure whether it is also more readable.

It would be a nice optimization if swaps were expensive, but they are not really. I think I prefer the more readable solution here. But it's a bit of a toss-up: if another reviewer has a preference, I'm willing to go with the majority ;)
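To make the two orderings concrete, here is a small self-contained model. `ToyNode` and its `swap_req` are hypothetical stand-ins, not the C2 classes. `permute_rotate_then_fix` assumes that the current code first rotates (mask, blend1, blend2) into (blend1, blend2, mask) with two swaps; the lines before 113 are not shown in the excerpt, so that part is a guess. Both variants should end with the same input order, and the swap counter only compares the two orderings under that assumption.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <utility>

// Hypothetical model of a node's input list: index 0 stands in for the
// control input, indices 1..3 are the data inputs. Not actual C2 code.
struct ToyNode {
  std::array<std::string, 4> req;
  int swaps = 0;  // number of swap_req calls made so far

  void swap_req(int i, int j) {
    std::swap(req[i], req[j]);
    swaps++;
  }
};

// Assumed shape of the current code: rotate
// (mask, blend1, blend2) -> (blend1, blend2, mask), then swap the two
// blend inputs if the test was negated.
ToyNode permute_rotate_then_fix(bool negated) {
  ToyNode n{{"ctrl", "mask", "blend1", "blend2"}};
  n.swap_req(1, 2);  // (blend1, mask, blend2)
  n.swap_req(2, 3);  // (blend1, blend2, mask)
  if (negated) {
    n.swap_req(1, 2);  // (blend2, blend1, mask)
  }
  return n;
}

// The suggested ordering: one swap yields the "negated" layout, and a
// second swap restores the regular order if the test was not negated.
ToyNode permute_suggested(bool negated) {
  ToyNode n{{"ctrl", "mask", "blend1", "blend2"}};
  n.swap_req(1, 3);  // (blend2, blend1, mask)
  if (!negated) {
    n.swap_req(1, 2);  // (blend1, blend2, mask)
  }
  return n;
}
```

Under that assumption, both orderings produce the same final layout and differ only in how many swaps they need, which supports treating the choice as mainly a readability question.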

> src/hotspot/share/opto/superwordVTransformBuilder.cpp line 154:
> 
>> 152:   Node* p0 = pack->at(0);
>> 153:   const VTransformVectorNodePrototype prototype = VTransformVectorNodePrototype::make_from_pack(pack, _vloop_analyzer);
>> 154:   const int  sopc = prototype.scalar_opcode();
> 
> Suggestion:
> 
>   const int sopc = prototype.scalar_opcode();
> 
> Nit: whitespace
> Or were you trying to align with the line below? Personally, I find this a bit too much, but up to you.

Yes, I was trying to get alignment. I'll try some alternatives.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2329219093
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2329220235

From dzhang at openjdk.org  Mon Sep  8 06:08:10 2025
From: dzhang at openjdk.org (Dingli Zhang)
Date: Mon, 8 Sep 2025 06:08:10 GMT
Subject: RFR: 8367048: RISC-V: Correct pipeline descriptions of the
 architecture
Message-ID: <sCc5RPOEdpio8GKk_vj-TmYTa2wjSzWhqgtXvp3xZh4=.b6e51a9b-0328-4c1d-a268-abb0d130c19e@github.com>

Hi,
Can you help to review this patch? Thanks!

This patch updates the RISC-V pipeline attributes to variable_size_instructions to properly account for the 2-byte compressed instructions from the C extension. 
Furthermore, it increases the max_instructions_per_bundle to 4 and adjusts the instruction_unit_size to match 4-issue RISC-V hardware like the UR-CP100.

### Test
- [x] Run tier1 and tier2 on sg2042
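For illustration only, here is a sketch (not HotSpot code) of why the instruction stream becomes variable-size once the C extension is in use. Under the base length encoding in the RISC-V unprivileged ISA specification, the low bits of the first 16-bit parcel determine whether an instruction is 2 or 4 bytes long. The helper names below are hypothetical, and encodings longer than 32 bits are deliberately left unhandled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Length in bytes of the RISC-V instruction whose first 16-bit parcel is
// `parcel`, following the base length encoding:
//   bits [1:0] != 0b11                         -> 16-bit (compressed, C extension)
//   bits [1:0] == 0b11 and bits [4:2] != 0b111 -> 32-bit
// Longer encodings return 0 because this sketch does not handle them.
int riscv_insn_length(uint16_t parcel) {
  if ((parcel & 0x3) != 0x3) {
    return 2;
  }
  if ((parcel & 0x1c) != 0x1c) {
    return 4;
  }
  return 0;
}

// Walks a little-endian code buffer and counts instructions. Returns -1 on
// an unhandled (longer) encoding or on a truncated trailing instruction.
int count_instructions(const std::vector<uint8_t>& code) {
  int count = 0;
  size_t pos = 0;
  while (pos + 1 < code.size()) {
    uint16_t parcel = static_cast<uint16_t>(code[pos] | (code[pos + 1] << 8));
    int len = riscv_insn_length(parcel);
    if (len == 0 || pos + static_cast<size_t>(len) > code.size()) {
      return -1;
    }
    pos += static_cast<size_t>(len);
    count++;
  }
  return pos == code.size() ? count : -1;
}
```

For example, `c.nop` (0x0001), `addi x0, x0, 0` (0x00000013) and `c.ret` (0x8082) occupy 2 + 4 + 2 = 8 bytes for three instructions. A fixed 4-byte-per-instruction model would get this count wrong.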

-------------

Commit messages:
 - 8367048: RISC-V: Correct pipeline descriptions of the architecture

Changes: https://git.openjdk.org/jdk/pull/27134/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27134&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367048
  Stats: 12 lines in 1 file changed: 5 ins; 0 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/27134.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27134/head:pull/27134

PR: https://git.openjdk.org/jdk/pull/27134

From epeter at openjdk.org  Mon Sep  8 06:16:13 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 06:16:13 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes
In-Reply-To: <sqxqDGsPq5F9VCNoJRS05ERzJ5g_vQU2tZ8Jg9Kbja0=.db920a99-3ccb-4c7d-9773-9fc6ff9f5fa6@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <TxVrg4YQuQqoGvnUpOxnQheehz3zaihOJONpmL2MzZU=.8af6afb9-381f-4633-8fa9-73242552d170@github.com>
 <sqxqDGsPq5F9VCNoJRS05ERzJ5g_vQU2tZ8Jg9Kbja0=.db920a99-3ccb-4c7d-9773-9fc6ff9f5fa6@github.com>
Message-ID: <9l9xVu-ih86ETpJvp7_L85Jlzzv-lOpiMGR2C004T4E=.48f81a2d-d964-445b-a448-5b2184e4b6d6@github.com>

On Mon, 8 Sep 2025 06:05:02 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/superwordVTransformBuilder.cpp line 154:
>> 
>>> 152:   Node* p0 = pack->at(0);
>>> 153:   const VTransformVectorNodePrototype prototype = VTransformVectorNodePrototype::make_from_pack(pack, _vloop_analyzer);
>>> 154:   const int  sopc = prototype.scalar_opcode();
>> 
>> Suggestion:
>> 
>>   const int sopc = prototype.scalar_opcode();
>> 
>> Nit: whitespace
>> Or were you trying to align with the line below? Personally, I find this a bit too much, but up to you.
>
> Yes, I was trying to get alignment. I'll try some alternatives.

Variant 1: no alignment

~  787   uint vlen = vector_length();
~  788   int vopc = _vector_opcode;
   789   BasicType bt = element_basic_type();
   790   const TypeVect* vt = TypeVect::make(bt, vlen);

It looks a bit noisy to me.

Variant 2: align on assignment operator

~  787   uint vlen          = vector_length();
~  788   int vopc           = _vector_opcode;
~  789   BasicType bt       = element_basic_type();
   790   const TypeVect* vt = TypeVect::make(bt, vlen);

Better. But somehow I'd still prefer if the names were also aligned. The question is whether left or right alignment looks better.

3a

~  787   uint            vlen = vector_length();
~  788   int             vopc = _vector_opcode;
~  789   BasicType       bt   = element_basic_type();
~  790   const TypeVect* vt   = TypeVect::make(bt, vlen);

3b

~  787   uint          vlen = vector_length();
~  788   int           vopc = _vector_opcode;
~  789   BasicType       bt = element_basic_type();
   790   const TypeVect* vt = TypeVect::make(bt, vlen);


Personally, the last one looks the calmest to me. But it's a bit of a funky choice compared to the rest of HotSpot, so it is probably just my own brain that thinks it is good.
I'm going with variant 2, since I think it is a little less controversial, and it is still a bit better than no alignment at all.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2329234097

From epeter at openjdk.org  Mon Sep  8 06:21:11 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 06:21:11 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes
In-Reply-To: <9l9xVu-ih86ETpJvp7_L85Jlzzv-lOpiMGR2C004T4E=.48f81a2d-d964-445b-a448-5b2184e4b6d6@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <TxVrg4YQuQqoGvnUpOxnQheehz3zaihOJONpmL2MzZU=.8af6afb9-381f-4633-8fa9-73242552d170@github.com>
 <sqxqDGsPq5F9VCNoJRS05ERzJ5g_vQU2tZ8Jg9Kbja0=.db920a99-3ccb-4c7d-9773-9fc6ff9f5fa6@github.com>
 <9l9xVu-ih86ETpJvp7_L85Jlzzv-lOpiMGR2C004T4E=.48f81a2d-d964-445b-a448-5b2184e4b6d6@github.com>
Message-ID: <J3UV6077kae5b4vyFB65fijBhb9REpCXEQyaurZpDsM=.ee0c2236-f0eb-4f04-863e-0e06ae71437c@github.com>

On Mon, 8 Sep 2025 06:13:21 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Yes, I was trying to get alignment. I'll try some alternatives.
>
> Variant 1: no alignment
> 
> ~  787   uint vlen = vector_length();
> ~  788   int vopc = _vector_opcode;
>    789   BasicType bt = element_basic_type();
>    790   const TypeVect* vt = TypeVect::make(bt, vlen);
> 
> It looks a bit noisy to me.
> 
> Variant 2: align on assignment operator
> 
> ~  787   uint vlen          = vector_length();
> ~  788   int vopc           = _vector_opcode;
> ~  789   BasicType bt       = element_basic_type();
>    790   const TypeVect* vt = TypeVect::make(bt, vlen);
> 
> Better. But somehow I'd still prefer if the names were also aligned. The question is whether left or right alignment looks better.
> 
> 3a
> 
> ~  787   uint            vlen = vector_length();
> ~  788   int             vopc = _vector_opcode;
> ~  789   BasicType       bt   = element_basic_type();
> ~  790   const TypeVect* vt   = TypeVect::make(bt, vlen);
> 
> 3b
> 
> ~  787   uint          vlen = vector_length();
> ~  788   int           vopc = _vector_opcode;
> ~  789   BasicType       bt = element_basic_type();
>    790   const TypeVect* vt = TypeVect::make(bt, vlen);
> 
> 
> Personally, the last one looks the calmest to me. But it's a bit of a funky choice compared to the rest of HotSpot, so it is probably just my own brain that thinks it is good.
> I'm going with variant 2, since I think it is a little less controversial, and it is still a bit better than no alignment at all.

I'm also refactoring away some of the assignments and putting the values directly at the use sites.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2329242463

From fyang at openjdk.org  Mon Sep  8 06:39:12 2025
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 8 Sep 2025 06:39:12 GMT
Subject: RFR: 8367048: RISC-V: Correct pipeline descriptions of the
 architecture
In-Reply-To: <sCc5RPOEdpio8GKk_vj-TmYTa2wjSzWhqgtXvp3xZh4=.b6e51a9b-0328-4c1d-a268-abb0d130c19e@github.com>
References: <sCc5RPOEdpio8GKk_vj-TmYTa2wjSzWhqgtXvp3xZh4=.b6e51a9b-0328-4c1d-a268-abb0d130c19e@github.com>
Message-ID: <GEuR7djbS47PeURxdISIHeJSFFWyWiTSoIdvAiHigN4=.f7102211-7031-47ae-a277-67f069be2c7d@github.com>

On Mon, 8 Sep 2025 05:13:32 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help to review this patch? Thanks!
> 
> This patch updates the RISC-V pipeline attributes to variable_size_instructions to properly account for the 2-byte compressed instructions from the C extension. 
> Furthermore, it increases the max_instructions_per_bundle to 4 and adjusts the instruction_unit_size to match 4-issue RISC-V hardware like the UR-CP100.
> 
> ### Test
> - [x] Run tier1 and tier2 on sg2042

Looks good. This seems to be a leftover from when support for compressed instructions was added.

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27134#pullrequestreview-3195142444

From chagedorn at openjdk.org  Mon Sep  8 06:47:12 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 8 Sep 2025 06:47:12 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes
In-Reply-To: <wleNFVd9vTR5L67ePguMMpPaaxbZ2qlAk-5zVAYNL68=.203526b6-f0b8-4265-962e-f81b6663a4ce@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <nnUjRLzWULRJlT6s8JO8uKa2mULnxKUrBawK1cXfBcg=.57127717-cbd1-46e9-8ba2-05b3478e233e@github.com>
 <-ntIikGlF88WGhDkEeTA2rU7xWHBuSRlaLUVwDvDzUY=.b482c2ed-ddc5-46e8-b7ad-2ee8e9f8fd67@github.com>
 <wleNFVd9vTR5L67ePguMMpPaaxbZ2qlAk-5zVAYNL68=.203526b6-f0b8-4265-962e-f81b6663a4ce@github.com>
Message-ID: <dIP9rRF7-1wN88o5Tjc6lXHnwndimw2T5I3qYZ374Ms=.8db9e2e6-44fa-41b5-a686-45268661f5b2@github.com>

On Mon, 8 Sep 2025 06:01:35 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>>> we can just use the CountedLoop as control, which is good enough
>> 
>> For my understanding: this is because we can only vectorize if there are no other control dependencies in the loop?
>
> Setting control is only for PhaseIdealLoop. It sets the internal ctrl that other loop-opts would rely on if we kept on optimizing the loop in the same PhaseIdealLoop. But if we set major progress, that means that we have messed up the graph so much that the state of PhaseIdealLoop may no longer be correct/accurate enough.
> 
> So if we set major progress, we are essentially allowed to mess up the PhaseIdealLoop state. I still have to set ctrl, but it does not matter if it is not correct ;)

I think it should generally still be correct, but it might not need to be as accurate as possible. You never know whether some code later in the same loop opts will rely on its correctness: maybe not today, but at some point, especially when someone adds verification code. So, IIUC, you are now just more conservative/less accurate while still being correct. Maybe you can tweak the comment to express that more clearly, since "not always correct" could also imply an actually illegal control.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2329284434

From fjiang at openjdk.org  Mon Sep  8 06:55:10 2025
From: fjiang at openjdk.org (Feilong Jiang)
Date: Mon, 8 Sep 2025 06:55:10 GMT
Subject: RFR: 8367048: RISC-V: Correct pipeline descriptions of the
 architecture
In-Reply-To: <sCc5RPOEdpio8GKk_vj-TmYTa2wjSzWhqgtXvp3xZh4=.b6e51a9b-0328-4c1d-a268-abb0d130c19e@github.com>
References: <sCc5RPOEdpio8GKk_vj-TmYTa2wjSzWhqgtXvp3xZh4=.b6e51a9b-0328-4c1d-a268-abb0d130c19e@github.com>
Message-ID: <jncDSCCYnkFp2xHzaoS53jPRMfSKQ73A6BbtWUBYCDU=.49b10ac5-bc16-4753-a8cd-641e33797e70@github.com>

On Mon, 8 Sep 2025 05:13:32 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help to review this patch? Thanks!
> 
> This patch updates the RISC-V pipeline attributes to variable_size_instructions to properly account for the 2-byte compressed instructions from the C extension. 
> Furthermore, it increases the max_instructions_per_bundle to 4 and adjusts the instruction_unit_size to match 4-issue RISC-V hardware like the UR-CP100.
> 
> ### Test
> - [x] Run tier1 and tier2 on sg2042

Looks good. Thanks for finding this!

-------------

Marked as reviewed by fjiang (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27134#pullrequestreview-3195188030

From epeter at openjdk.org  Mon Sep  8 07:00:54 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 07:00:54 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes [v2]
In-Reply-To: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
Message-ID: <Xx-1QbUjNJoxeX5bJGS0VBt4PXBXA7txNSR_ldSVao4=.52714db0-cb04-4b10-9fca-771d507a662c@github.com>

> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
> https://github.com/openjdk/jdk/pull/20964
> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
> 
> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
> 
> ---------------------------------
> 
> I have to say: I'm very sorry for this refactoring. I took some decisions in https://github.com/openjdk/jdk/pull/19719 that I'm now partially undoing. I moved too much logic from `SuperWord::output` (now called `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`) to the `VTransform...Node::apply`. https://github.com/openjdk/jdk/pull/19719  was a roughly 1.5k line change, and I took about a 0.3k misstep that I'm now correcting here ;)
> 
> I had accidentally made the `VTransformGraph` too close to the `PackSet`, and not close enough to the future vectorized C2 Graph. And that makes some future changes hard.
> 
> My vision:
> - VLoop / VLoopAnalyzer look at the scalar loop and prepare it for SuperWord
> - SuperWord creates the `PackSet`: some nodes are packed, all others are scalar.
> - `SuperWordVTransformBuilder` converts the `PackSet` into the `VTransformGraph`
> - The `VTransformGraph` very closely represents the C2 vectorized loop after vectorization
>   - It does not need to know which `nodes` it packs, it rather just needs to know how to generate the new vector nodes
>   - That means it is straight-forward to compute cost
>   - And it also makes optimizations on that graph easier
>   - And the `apply` methods are simpler too
> 
> ----------------------------------
> 
> Therefore, the main goal was to make the `VTransform...Node::apply` calls simpler again, and to move the logic back to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
> 
> One important step to making the `VTransformGraph` less of a `PackSet` is to remove reliance on `nodes` for the vector nodes.
> 
> What I did:
> - Moved a lot of the logic in `VTransformElementWiseVectorNode::apply` to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>   - This will make it easier to optimize and compute cost in future RFEs.
> - `VTransformVectorNodePrototype`: packs a lot of the info for `VTransformVectorNode`.
>   - pass info about `bt`, `vlen`, `sopc` instead of the `pack` -> allows us to eventually remove the dependency on `nodes`.
> - New vector nodes, they are special cases I split away from `VTransformElementWiseVectorNode`:
>   - `VTransformReinterpretVectorN...

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  review comment implemented

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27056/files
  - new: https://git.openjdk.org/jdk/pull/27056/files/05ee2800..9bc510e4

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27056&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27056&range=00-01

  Stats: 49 lines in 3 files changed: 7 ins; 12 del; 30 mod
  Patch: https://git.openjdk.org/jdk/pull/27056.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27056/head:pull/27056

PR: https://git.openjdk.org/jdk/pull/27056

From epeter at openjdk.org  Mon Sep  8 07:00:54 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 07:00:54 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes [v2]
In-Reply-To: <dIP9rRF7-1wN88o5Tjc6lXHnwndimw2T5I3qYZ374Ms=.8db9e2e6-44fa-41b5-a686-45268661f5b2@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <nnUjRLzWULRJlT6s8JO8uKa2mULnxKUrBawK1cXfBcg=.57127717-cbd1-46e9-8ba2-05b3478e233e@github.com>
 <-ntIikGlF88WGhDkEeTA2rU7xWHBuSRlaLUVwDvDzUY=.b482c2ed-ddc5-46e8-b7ad-2ee8e9f8fd67@github.com>
 <wleNFVd9vTR5L67ePguMMpPaaxbZ2qlAk-5zVAYNL68=.203526b6-f0b8-4265-962e-f81b6663a4ce@github.com>
 <dIP9rRF7-1wN88o5Tjc6lXHnwndimw2T5I3qYZ374Ms=.8db9e2e6-44fa-41b5-a686-45268661f5b2@github.com>
Message-ID: <ozMHcKVAc1sxqOHZsn4IH2kKzvhKEeEGsSLbSgLwxVk=.cb616292-5bc0-409b-8504-434ed63a81ee@github.com>

On Mon, 8 Sep 2025 06:43:20 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Setting control is only for PhaseIdealLoop. It sets the internal ctrl that other loop-opts would rely on if we kept on optimizing the loop in the same PhaseIdealLoop. But if we set major progress, that means that we have messed up the graph so much that the state of PhaseIdealLoop may no longer be correct/accurate enough.
>> 
>> So if we set major progress, we are essentially allowed to mess up the PhaseIdealLoop state. I still have to set ctrl, but it does not matter if it is not correct ;)
>
> I think it should generally still be correct, but it might not need to be as accurate as possible. You never know whether some code later in the same loop opts will rely on its correctness: maybe not today, but at some point, especially when someone adds verification code. So, IIUC, you are now just more conservative/less accurate while still being correct. Maybe you can tweak the comment to express that more clearly, since "not always correct" could also imply an actually illegal control.

Adjusted the comment.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2329312853

From chagedorn at openjdk.org  Mon Sep  8 08:04:14 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 8 Sep 2025 08:04:14 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes [v2]
In-Reply-To: <Xx-1QbUjNJoxeX5bJGS0VBt4PXBXA7txNSR_ldSVao4=.52714db0-cb04-4b10-9fca-771d507a662c@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <Xx-1QbUjNJoxeX5bJGS0VBt4PXBXA7txNSR_ldSVao4=.52714db0-cb04-4b10-9fca-771d507a662c@github.com>
Message-ID: <boun7peEgQOPBVf51XYcHtV7RGBS14O3DvN_O0NIbms=.f7332f17-3370-4dd6-88b3-8cd66b17af5e@github.com>

On Mon, 8 Sep 2025 07:00:54 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
>> https://github.com/openjdk/jdk/pull/20964
>> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
>> 
>> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
>> 
>> ---------------------------------
>> 
>> I have to say: I'm very sorry for this refactoring. I took some decisions in https://github.com/openjdk/jdk/pull/19719 that I'm now partially undoing. I moved too much logic from `SuperWord::output` (now called `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`) to the `VTransform...Node::apply`. https://github.com/openjdk/jdk/pull/19719  was a roughly 1.5k line change, and I took about a 0.3k misstep that I'm now correcting here ;)
>> 
>> I had accidentally made the `VTransformGraph` too close to the `PackSet`, and not close enough to the future vectorized C2 Graph. And that makes some future changes hard.
>> 
>> My vision:
>> - VLoop / VLoopAnalyzer look at the scalar loop and prepare it for SuperWord
>> - SuperWord creates the `PackSet`: some nodes are packed, all others are scalar.
>> - `SuperWordVTransformBuilder` converts the `PackSet` into the `VTransformGraph`
>> - The `VTransformGraph` very closely represents the C2 vectorized loop after vectorization
>>   - It does not need to know which `nodes` it packs, it rather just needs to know how to generate the new vector nodes
>>   - That means it is straight-forward to compute cost
>>   - And it also makes optimizations on that graph easier
>>   - And the `apply` methods are simpler too
>> 
>> ----------------------------------
>> 
>> Therefore, the main goal was to make the `VTransform...Node::apply` calls simpler again, and to move the logic back to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>> 
>> One important step to making the `VTransformGraph` less of a `PackSet` is to remove reliance on `nodes` for the vector nodes.
>> 
>> What I did:
>> - Moved a lot of the logic in `VTransformElementWiseVectorNode::apply` to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>>   - This will make it easier to optimize and compute cost in future RFEs.
>> - `VTransformVectorNodePrototype`: packs a lot of the info for `VTransformVectorNode`.
>>   - pass info about `bt`, `vlen`, `sopc` instead of the `pack` -> allows us to eventually remove the dependency on `nodes`.
>> - New vector nodes, they are special cases I split away from ...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   review comment implemented

Nice refactoring! Just some small suggestions, otherwise, it looks good to me.

src/hotspot/share/opto/superwordVTransformBuilder.cpp line 154:

> 152:   Node* p0 = pack->at(0);
> 153:   const VTransformVectorNodePrototype prototype = VTransformVectorNodePrototype::make_from_pack(pack, _vloop_analyzer);
> 154:   const int sopc     = prototype.scalar_opcode();

You already use this name in other places, but it could be more readable if renamed to `scalar_opc` or `scalar_opcode`.

src/hotspot/share/opto/superwordVTransformBuilder.cpp line 173:

> 171:     vtn = new (_vtransform.arena()) VTransformBoolVectorNode(_vtransform, prototype, kind);
> 172:   } else if (p0->is_CMove()) {
> 173:     vtn = new (_vtransform.arena()) VTransformElementWiseVectorNode(_vtransform, p0->req(), prototype, Op_VectorBlend);

You also seem to use `p0->req()` a lot. Should we create a `const` above for easier access? Could we also have a better name than `p0`? But again, you already use `p0` a lot in other places, and its meaning might be evident in this context.

src/hotspot/share/opto/vtransform.hpp line 600:

> 598: 
> 599: // Bundle the information needed for vector nodes.
> 600: class VTransformVectorNodePrototype : public StackObj {

"Prototype" suggests something concrete that is either not fully set up yet or meant to be copied/cloned as a starting point. But IIUC, this class just serves as a holder for some information. How about renaming it from `Prototype` to `Properties`?

src/hotspot/share/opto/vtransform.hpp line 617:

> 615: 
> 616: public:
> 617:   static VTransformVectorNodePrototype make_from_pack(const Node_List* pack, const VLoopAnalyzer& vloop_analyzer) {

When switching to "Properties", you could also rename this to something like "fetch_from_pack", since `make` also suggests actually creating some kind of dummy node, when it only fetches useful information.
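Here is a minimal sketch of the "holder" reading of the class under the suggested names. It uses hypothetical stand-in types (`ScalarNode`, `Pack`, and a three-value `BasicType`) in place of the real C2 `Node`, `Node_List` and `BasicType`. The accessor names are borrowed from this thread. For simplicity the element type is taken from the first node, whereas the real code presumably derives it via `VLoopAnalyzer`. The point is that `fetch_from_pack` only reads facts out of the pack and creates no node.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for C2 types; not the real HotSpot definitions.
enum class BasicType { T_INT, T_LONG, T_FLOAT };

struct ScalarNode {
  int opcode;
  BasicType bt;
};

using Pack = std::vector<const ScalarNode*>;

// Plain holder for the facts a vector node needs, read from a pack once.
class VTransformVectorNodeProperties {
 public:
  // Only fetches information; no node is created here.
  static VTransformVectorNodeProperties fetch_from_pack(const Pack& pack) {
    assert(!pack.empty());
    const ScalarNode* p0 = pack.front();
    return VTransformVectorNodeProperties(p0->opcode,
                                          static_cast<unsigned>(pack.size()),
                                          p0->bt);
  }

  int scalar_opcode() const { return _scalar_opcode; }
  unsigned vector_length() const { return _vector_length; }
  BasicType element_basic_type() const { return _element_basic_type; }

 private:
  VTransformVectorNodeProperties(int sopc, unsigned vlen, BasicType bt)
      : _scalar_opcode(sopc), _vector_length(vlen), _element_basic_type(bt) {}

  int _scalar_opcode;
  unsigned _vector_length;
  BasicType _element_basic_type;
};
```

Once the properties have been fetched, a caller no longer needs the pack itself. That is in line with the stated goal of eventually removing the vector nodes' dependency on `nodes`.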

-------------

PR Review: https://git.openjdk.org/jdk/pull/27056#pullrequestreview-3195283374
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2329386959
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2329409899
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2329374048
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2329392637

From shade at openjdk.org  Mon Sep  8 08:10:10 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 8 Sep 2025 08:10:10 GMT
Subject: RFR: 8366588: VectorAPI: Re-intrinsify VectorMask.laneIsSet where
 the input index is a variable
In-Reply-To: <U2EmBE3HG7bJwIATZZZ_I-2u-LLXuXEc50677VYSicQ=.4578989a-ea70-4708-8725-72d9e2a2275f@github.com>
References: <XaokWZpc-AL2hWWGvsmDczO1u4o5uGEaCSgPjSyxkz4=.3251a974-6374-48c8-a8e4-88914b730505@github.com>
 <QunSTeuoPDHArXeK7cAKGYHYG8uy9S49l8XXAXrxfTc=.34152256-e559-45ee-b91d-2952663005ed@github.com>
 <U2EmBE3HG7bJwIATZZZ_I-2u-LLXuXEc50677VYSicQ=.4578989a-ea70-4708-8725-72d9e2a2275f@github.com>
Message-ID: <kFPBgSdiSWJhMCHLq55tVdzO6Z-WIklnAG7fgYVQ-jY=.c4cd103d-d340-4885-a394-a8139b195a62@github.com>

On Mon, 8 Sep 2025 01:22:46 GMT, erifan <duke at openjdk.org> wrote:

>> test/micro/org/openjdk/bench/jdk/incubator/vector/VectorExtractBenchmark.java line 34:
>> 
>>> 32: @Warmup(iterations = 5, time = 1)
>>> 33: @Measurement(iterations = 5, time = 1)
>>> 34: @Fork(value = 1, jvmArgs = {"--add-modules=jdk.incubator.vector"})
>> 
>> Don't do 1 fork, do at least 3.
>
> The test results show that this test is stable, so I think forking once is enough? We have many JMH benchmarks that fork once.

OK then.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27113#discussion_r2329468832

From epeter at openjdk.org  Mon Sep  8 08:14:54 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 08:14:54 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes [v3]
In-Reply-To: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
Message-ID: <ps40ZgeR7QzjAfQBt0veYA3HglDcC02BRNTEGHYOfvg=.16262ae0-3b03-4b3a-b2cf-7ea59c39ea42@github.com>

> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
> https://github.com/openjdk/jdk/pull/20964
> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
> 
> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
> 
> ---------------------------------
> 
> I have to say: I'm very sorry for this refactoring. I took some decisions in https://github.com/openjdk/jdk/pull/19719 that I'm now partially undoing. I moved too much logic from `SuperWord::output` (now called `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`) to the `VTransform...Node::apply`. https://github.com/openjdk/jdk/pull/19719  was a roughly 1.5k line change, and I took about a 0.3k misstep that I'm now correcting here ;)
> 
> I had accidentally made the `VTransformGraph` too close to the `PackSet`, and not close enough to the future vectorized C2 Graph. And that makes some future changes hard.
> 
> My vision:
> - VLoop / VLoopAnalyzer look at the scalar loop and prepare it for SuperWord
> - SuperWord creates the `PackSet`: some nodes are packed, all others are scalar.
> - `SuperWordVTransformBuilder` converts the `PackSet` into the `VTransformGraph`
> - The `VTransformGraph` very closely represents the C2 vectorized loop after vectorization
>   - It does not need to know which `nodes` it packs, it rather just needs to know how to generate the new vector nodes
>   - That means it is straightforward to compute cost
>   - And it also makes optimizations on that graph easier
>   - And the `apply` methods are simpler too
> 
> ----------------------------------
> 
> So the main goal was to make the `VTransform...Node::apply` calls simpler again, and to move the logic back to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>
> One important step to making the `VTransformGraph` less of a `PackSet` is to remove reliance on `nodes` for the vector nodes.
>
> What I did:
> - Moving a lot of the logic in `VTransformElementWiseVectorNode::apply` to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>   - Will make it easier to optimize and compute cost in future RFEs.
> - `VTransformVectorNodePrototype`: packs a lot of the info for `VTransformVectorNode`.
>   - pass info about `bt`, `vlen`, `sopc` instead of the `pack` -> allows us to eventually remove the dependency on `nodes`.
> - New vector nodes, they are special cases I split away from `VTransformElementWiseVectorNode`:
>   - `VTransformReinterpretVectorN...

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  prototype -> properties

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27056/files
  - new: https://git.openjdk.org/jdk/pull/27056/files/9bc510e4..8a63899a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27056&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27056&range=01-02

  Stats: 51 lines in 3 files changed: 0 ins; 0 del; 51 mod
  Patch: https://git.openjdk.org/jdk/pull/27056.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27056/head:pull/27056

PR: https://git.openjdk.org/jdk/pull/27056

From epeter at openjdk.org  Mon Sep  8 08:14:54 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 08:14:54 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes [v2]
In-Reply-To: <boun7peEgQOPBVf51XYcHtV7RGBS14O3DvN_O0NIbms=.f7332f17-3370-4dd6-88b3-8cd66b17af5e@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <Xx-1QbUjNJoxeX5bJGS0VBt4PXBXA7txNSR_ldSVao4=.52714db0-cb04-4b10-9fca-771d507a662c@github.com>
 <boun7peEgQOPBVf51XYcHtV7RGBS14O3DvN_O0NIbms=.f7332f17-3370-4dd6-88b3-8cd66b17af5e@github.com>
Message-ID: <gFNP92IB4gpvDcy_qcw4plDysG_zQyoebIGG1fAf-PE=.89ba2f2d-f48d-44c2-bb4b-1959f3102c82@github.com>

On Mon, 8 Sep 2025 07:25:47 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   review comment implemented
>
> src/hotspot/share/opto/vtransform.hpp line 600:
> 
>> 598: 
>> 599: // Bundle the information needed for vector nodes.
>> 600: class VTransformVectorNodePrototype : public StackObj {
> 
> Prototype suggests something concrete that is either not fully set up or serves as a starting point to copy/clone from. But IIUC, this class just serves as a holder for some information. How about renaming `Prototype` -> `Properties`?

I like the idea :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2329474779

From epeter at openjdk.org  Mon Sep  8 08:18:15 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 08:18:15 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes [v2]
In-Reply-To: <boun7peEgQOPBVf51XYcHtV7RGBS14O3DvN_O0NIbms=.f7332f17-3370-4dd6-88b3-8cd66b17af5e@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <Xx-1QbUjNJoxeX5bJGS0VBt4PXBXA7txNSR_ldSVao4=.52714db0-cb04-4b10-9fca-771d507a662c@github.com>
 <boun7peEgQOPBVf51XYcHtV7RGBS14O3DvN_O0NIbms=.f7332f17-3370-4dd6-88b3-8cd66b17af5e@github.com>
Message-ID: <KT4yvv3U8nV1El7_QlApOHmdxkBWOSoWHMKQ5KeQCvo=.f3d588fe-8b3a-48c4-925c-8b0d60a537b0@github.com>

On Mon, 8 Sep 2025 07:31:40 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   review comment implemented
>
> src/hotspot/share/opto/superwordVTransformBuilder.cpp line 154:
> 
>> 152:   Node* p0 = pack->at(0);
>> 153:   const VTransformVectorNodePrototype prototype = VTransformVectorNodePrototype::make_from_pack(pack, _vloop_analyzer);
>> 154:   const int sopc     = prototype.scalar_opcode();
> 
> You already use this in other places, but it could be more readable if renamed to `scalar_opc` or `scalar_opcode`.

We use `sopc` and `vopc` in many places in vectorization code - just grep for it ;)
Otherwise some lines just get much longer, and I think that makes the code less readable generally.

> src/hotspot/share/opto/vtransform.hpp line 617:
> 
>> 615: 
>> 616: public:
>> 617:   static VTransformVectorNodePrototype make_from_pack(const Node_List* pack, const VLoopAnalyzer& vloop_analyzer) {
> 
> When switching to "Properties", you could also rename this to something like "fetch_from_pack", since `make` suggests actually creating a dummy-kind node when it is only fetching useful information.

I prefer the `make_...` naming. I think it is generally used as a "factory" prefix all over the place. `fetch` means we would be "loading" it from somewhere, and that's not what we do here; rather, we just construct the `Properties` given the pack.
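
To make the `make_` vs `fetch_` naming point concrete, here is a minimal C++ sketch of a properties holder built by a static factory. Everything in it is hypothetical: `ScalarNode`, `BasicType`, `VectorNodeProperties` and its accessors are simplified stand-ins, not HotSpot's types, and the real `make_from_pack` also takes a `VLoopAnalyzer`, which this sketch omits. It only illustrates that the factory constructs a value from the pack, so consumers of that value no longer need the pack nodes themselves.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not HotSpot types.
enum class BasicType { Int, Long, Float, Double };

struct ScalarNode {
  int opcode;    // scalar opcode of the packed operation
  BasicType bt;  // element type of the scalar operation
};

// Plain value holder: what a vector node needs to know, without keeping the pack.
class VectorNodeProperties {
 public:
  // "make_" factory: constructs the properties from the pack; nothing is loaded from elsewhere.
  static VectorNodeProperties make_from_pack(const std::vector<ScalarNode>& pack) {
    const ScalarNode& p0 = pack.at(0);  // assumes all pack members share opcode and type
    return VectorNodeProperties(p0.opcode, static_cast<uint32_t>(pack.size()), p0.bt);
  }

  int scalar_opcode() const { return _sopc; }
  uint32_t vector_length() const { return _vlen; }
  BasicType element_basic_type() const { return _bt; }

 private:
  VectorNodeProperties(int sopc, uint32_t vlen, BasicType bt)
      : _sopc(sopc), _vlen(vlen), _bt(bt) {}

  const int _sopc;
  const uint32_t _vlen;
  const BasicType _bt;
};
```

A vector node built from such a value depends only on `sopc`, `vlen` and `bt`, which is the kind of decoupling from `nodes` the PR description aims for.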

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2329483851
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2329487912

From epeter at openjdk.org  Mon Sep  8 08:28:51 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 08:28:51 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes [v4]
In-Reply-To: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
Message-ID: <N8oSF-jg2c9nJQFV8maPYVd16rMTNu6Dv6SZ4o0AJ4I=.f9db0f00-e868-4b7e-a824-c5fb2be21afb@github.com>

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  fix typo

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27056/files
  - new: https://git.openjdk.org/jdk/pull/27056/files/8a63899a..e3fe36ee

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27056&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27056&range=02-03

  Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/27056.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27056/head:pull/27056

PR: https://git.openjdk.org/jdk/pull/27056

From epeter at openjdk.org  Mon Sep  8 08:28:51 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 08:28:51 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes [v2]
In-Reply-To: <boun7peEgQOPBVf51XYcHtV7RGBS14O3DvN_O0NIbms=.f7332f17-3370-4dd6-88b3-8cd66b17af5e@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <Xx-1QbUjNJoxeX5bJGS0VBt4PXBXA7txNSR_ldSVao4=.52714db0-cb04-4b10-9fca-771d507a662c@github.com>
 <boun7peEgQOPBVf51XYcHtV7RGBS14O3DvN_O0NIbms=.f7332f17-3370-4dd6-88b3-8cd66b17af5e@github.com>
Message-ID: <qeX0GbXtNfp7c_B-YNolW7JRzjszmOz1rn4HbCz7QGI=.9d4b34ab-aa95-4733-842d-c173133d075f@github.com>

On Mon, 8 Sep 2025 08:01:52 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   review comment implemented
>
> Nice refactoring! Just some small suggestions, otherwise, it looks good to me.

@chhagedorn Thanks for reviewing! I responded to all your suggestions :)

> src/hotspot/share/opto/superwordVTransformBuilder.cpp line 173:
> 
>> 171:     vtn = new (_vtransform.arena()) VTransformBoolVectorNode(_vtransform, prototype, kind);
>> 172:   } else if (p0->is_CMove()) {
>> 173:     vtn = new (_vtransform.arena()) VTransformElementWiseVectorNode(_vtransform, p0->req(), prototype, Op_VectorBlend);
> 
> You also seem to use `p0->req()` a lot. Should we create a `const` above for easier access? Could we also have a better name than `p0`? But again, you are using `p0` a lot at other places already and it might be evidently clear in this context.

Personally, I'd like to keep `p0`. An alternative is `first`. Or something even much longer that just inflates the code and does not make it more readable either. We also have `t0` and `s0` all over the SuperWord code. And honestly we do the same in all sorts of IGVN code as well, right?

We could make a `uint req = p0->req()` but I don't think that is more helpful. `req` is not a very great name but we are stuck with it because of the definition in `Node`. Detaching it from `p0` would probably not help but rather make it harder to read.

All of this is rather subjective though :/

If a second reviewer wants to see the change, I propose we do that in a separate RFE, and then consistently over the SuperWord code at large.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27056#issuecomment-3265144091
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2329502044

From qxing at openjdk.org  Mon Sep  8 08:32:11 2025
From: qxing at openjdk.org (Qizheng Xing)
Date: Mon, 8 Sep 2025 08:32:11 GMT
Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero
 nodes more precise [v11]
In-Reply-To: <Iv0Ou9LLsRec6RaUaWA4pC7Ds7Hu_KXQbalxo71v8iM=.03c2733d-eaeb-4fec-a85b-cef252aa8c68@github.com>
References: <Iv0Ou9LLsRec6RaUaWA4pC7Ds7Hu_KXQbalxo71v8iM=.03c2733d-eaeb-4fec-a85b-cef252aa8c68@github.com>
Message-ID: <yTWwSREwIf1mkt5RagBHspChSn7tELss49cweCZCSU0=.f7db92bb-059c-46fc-89e6-4dc9c908a4d3@github.com>

> The result of count leading/trailing zeros is always non-negative, and the maximum value is the integer type's size in bits. In previous versions, when C2 cannot know the operand value of a CLZ/CTZ node at compile time, it generates a full-width integer type for its result. This can significantly affect the efficiency of code in some cases.
> 
> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch:
> 
> ```java
> public static int numberOfNibbles(int i) {
>   int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i);
>   return Math.max((mag + 3) / 4, 1);
> }
> ```
> 
> Testing: tier1, IR test

Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision:

  Add proof of correctness comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25928/files
  - new: https://git.openjdk.org/jdk/pull/25928/files/f1c0b45a..5cfe39b6

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=09-10

  Stats: 36 lines in 1 file changed: 36 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/25928.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25928/head:pull/25928

PR: https://git.openjdk.org/jdk/pull/25928

From qxing at openjdk.org  Mon Sep  8 08:32:13 2025
From: qxing at openjdk.org (Qizheng Xing)
Date: Mon, 8 Sep 2025 08:32:13 GMT
Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero
 nodes more precise [v10]
In-Reply-To: <8cq6Lhw9sc_Fd7adnL0t1F10UowOHDr8eEgZSD9MFUc=.d6b189a1-ac3e-4175-8e15-5e16691b6422@github.com>
References: <Iv0Ou9LLsRec6RaUaWA4pC7Ds7Hu_KXQbalxo71v8iM=.03c2733d-eaeb-4fec-a85b-cef252aa8c68@github.com>
 <y1b0oyJhY7YkAtBpuou3hMv2aSy7SnM1M5Y5QH4oLi4=.6502c4df-4bdf-433f-840c-1de76de82c22@github.com>
 <bVqqmEXHIoBacby_IoOzCHbAt4nzoS4M6p-QySMh7gc=.57711175-230b-4ce2-85db-c050e4912509@github.com>
 <8cq6Lhw9sc_Fd7adnL0t1F10UowOHDr8eEgZSD9MFUc=.d6b189a1-ac3e-4175-8e15-5e16691b6422@github.com>
Message-ID: <2GMwFeMbO1iD2MoDsbJs0mAc6ayBANcl_XNS2c9lm4I=.dc9cc10b-ae8a-490c-9513-8eaca2890ab4@github.com>

On Tue, 19 Aug 2025 13:51:36 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/countbitsnode.cpp line 57:
>> 
>>> 55:   const TypeInt* ti = t->is_int();
>>> 56:   return TypeInt::make(count_leading_zeros_int(~ti->_bits._zeros),
>>> 57:                        count_leading_zeros_int(ti->_bits._ones),
>> 
>> I think this is correct, but I would like to see a short comment why it is correct.
>
> Same in other cases below

Updated.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2329519407

From chagedorn at openjdk.org  Mon Sep  8 08:45:17 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 8 Sep 2025 08:45:17 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes [v4]
In-Reply-To: <N8oSF-jg2c9nJQFV8maPYVd16rMTNu6Dv6SZ4o0AJ4I=.f9db0f00-e868-4b7e-a824-c5fb2be21afb@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <N8oSF-jg2c9nJQFV8maPYVd16rMTNu6Dv6SZ4o0AJ4I=.f9db0f00-e868-4b7e-a824-c5fb2be21afb@github.com>
Message-ID: <ZPR_sB_guKS0lMVdVWH3y_jN4750qvUjRKpJTzZTHvo=.f4b57f53-f941-40ce-b7a9-ab48b3239110@github.com>

On Mon, 8 Sep 2025 08:28:51 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fix typo

Looks good, thanks for the updates!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27056#pullrequestreview-3195544178

From chagedorn at openjdk.org  Mon Sep  8 08:45:19 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 8 Sep 2025 08:45:19 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes [v2]
In-Reply-To: <qeX0GbXtNfp7c_B-YNolW7JRzjszmOz1rn4HbCz7QGI=.9d4b34ab-aa95-4733-842d-c173133d075f@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <Xx-1QbUjNJoxeX5bJGS0VBt4PXBXA7txNSR_ldSVao4=.52714db0-cb04-4b10-9fca-771d507a662c@github.com>
 <boun7peEgQOPBVf51XYcHtV7RGBS14O3DvN_O0NIbms=.f7332f17-3370-4dd6-88b3-8cd66b17af5e@github.com>
 <qeX0GbXtNfp7c_B-YNolW7JRzjszmOz1rn4HbCz7QGI=.9d4b34ab-aa95-4733-842d-c173133d075f@github.com>
Message-ID: <8mjKHo07OuMaW0hIahWBm_N-5RhK4umUzrOpV0wr8cs=.bca1780b-9091-4560-a3ab-01cf46d3ed1f@github.com>

On Mon, 8 Sep 2025 08:21:29 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> Personally, I'd like to keep p0. An alternative is first. Or something even much longer that just inflates the code and does not make it more readable either. 

"first_node_in_pack" would be more understandable, I think. But it's indeed much longer than `p0`. Does it matter here whether we pick the first, second or any other node? If not, then maybe "pack_node" would be expressive enough? But anyway, as you point out, we already use `p0` all over the place. An extensive renaming should be done in a separate task in one go, and more people should agree to it before doing it.

> We also have t0 and s0 all over the SuperWord code. And honestly we do the same in all sorts of IGVN code as well, right?

Yes, I personally would prefer full names over abbreviations. But that's subjective again :-)

> We could make a uint req = p0->req() but I don't think that is more helpful. req is not a very great name but we are stuck with it because of the definition in Node. Detaching it from p0 would probably not help but rather make it harder to read.

What if you just name it `p0_req`? It was more about sharing and making it `const` since `p0` does not change. But feel free to leave it as it is.

> All of this is rather subjective though :/

Indeed...

>> src/hotspot/share/opto/vtransform.hpp line 617:
>> 
>>> 615: 
>>> 616: public:
>>> 617:   static VTransformVectorNodePrototype make_from_pack(const Node_List* pack, const VLoopAnalyzer& vloop_analyzer) {
>> 
>> When switching to "Properties", you could also rename this to something like "fetch_from_pack", since `make` suggests actually creating a dummy-kind node when it is only fetching useful information.
>
> I prefer the `make_...` naming. I think it is generally used as a "factory" prefix all over the place. `fetch` means we would be "loading" it from somewhere, and that's not what we do here; rather, we just construct the `Properties` given the pack.

I guess with `Properties` in the name, it's clearer now?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2329554430
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2329553671

From epeter at openjdk.org  Mon Sep  8 09:06:33 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 09:06:33 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes [v4]
In-Reply-To: <ZPR_sB_guKS0lMVdVWH3y_jN4750qvUjRKpJTzZTHvo=.f4b57f53-f941-40ce-b7a9-ab48b3239110@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <N8oSF-jg2c9nJQFV8maPYVd16rMTNu6Dv6SZ4o0AJ4I=.f9db0f00-e868-4b7e-a824-c5fb2be21afb@github.com>
 <ZPR_sB_guKS0lMVdVWH3y_jN4750qvUjRKpJTzZTHvo=.f4b57f53-f941-40ce-b7a9-ab48b3239110@github.com>
Message-ID: <kd6X3R4OLOxFg4K8qHyr2kEZvgqlk84D0hXkmtGaI8c=.ba03c410-ce18-44cc-a2e1-24e2fee92713@github.com>

On Mon, 8 Sep 2025 08:42:51 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   fix typo
>
> Looks good, thanks for the updates!

@chhagedorn Thanks for the approval!

@mhaessig is on vacation - so I'm hoping someone else can help review here ;)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27056#issuecomment-3265317349

From dfenacci at openjdk.org  Mon Sep  8 09:29:26 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Mon, 8 Sep 2025 09:29:26 GMT
Subject: RFR: 8360031: C2 compilation asserts in MemBarNode::remove [v3]
In-Reply-To: <h6lgQmZZxVXPeJl0fRWlz8J713AAhfTAw_5UJ3ZL1S4=.b4b40295-6ea8-4498-abb6-98201776b3c9@github.com>
References: <h6lgQmZZxVXPeJl0fRWlz8J713AAhfTAw_5UJ3ZL1S4=.b4b40295-6ea8-4498-abb6-98201776b3c9@github.com>
Message-ID: <jR9U9f8GNW0wPSQZD_UYRJf4hwGCCP7umyTm0bNDz4o=.ed4f44bd-57e7-4b2a-83b3-c8da05609dc4@github.com>

> # Issue
> While compiling `java.util.zip.ZipFile` in C2 this assert is triggered
> https://github.com/openjdk/jdk/blob/a2e86ff3c56209a14c6e9730781eecd12c81d170/src/hotspot/share/opto/memnode.cpp#L4235
> 
> # Cause
> While compiling the constructor of java.util.zip.ZipFile$CleanableResource the following happens:
> * we insert a trailing `MemBarStoreStore` in the constructor
> <img height="200" alt="before_folding" src="https://github.com/user-attachments/assets/c1aab634-808d-4198-94ac-8093c6b85c5d" />
> 
> * during IGVN we completely fold the memory subtree of the `MemBarStoreStore` node. The node still has a control output attached. 
> <img height="200" alt="after_folding" src="https://github.com/user-attachments/assets/568e9fc3-5f19-4e10-a72e-f0a5e772daed" />
> 
> * later during the same IGVN run the `MemBarStoreStore` node is handled and we try to remove it (because the `Allocate` node of the `MemBar` does not escape the thread) https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4301-L4302
> * the assert https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4235
> triggers because the barrier has only 1 (control) output and is a `MemBarStoreStore` (not `Initialize`) barrier
> 
> The issue happens only when `UseStoreStoreForCtor` is set (the default), which makes C2 use `MemBarStoreStore` instead of `MemBarRelease` at the end of constructors. `MemBarStoreStore` nodes are processed separately by EA, and this happens after the IGVN pass that folds the memory subtree. `MemBarRelease` nodes, on the other hand, are handled during the same IGVN pass before the memory subtree gets removed, while they still have 2 outputs (the assert is skipped).
> 
> # Fix
> Adapting the assert to accept that `MemBarStoreStore` can also have `!= 2` outputs (when `+UseStoreStoreForCtor` is used) seems to be an acceptable solution, as this looks like a perfectly plausible situation.
> 
> # Testing
> Unfortunately, reproducing the issue with a simple regression test has proven very hard. The failure seems to depend on a very particular profile and IGVN worklist order. JBS replay compilation passes. Running JCK's `api/java_util` 100 times triggers the assert a couple of times on average before the fix, and never after.
> Tier 1-3+ tests passed.

Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision:

  JDK-8360031: add MemBarStoreStore node to worklist during escape analysis/adapt remove assert

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26556/files
  - new: https://git.openjdk.org/jdk/pull/26556/files/f7bc08c9..57073b96

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26556&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26556&range=01-02

  Stats: 6 lines in 2 files changed: 0 ins; 4 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/26556.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26556/head:pull/26556

PR: https://git.openjdk.org/jdk/pull/26556

From dfenacci at openjdk.org  Mon Sep  8 09:29:27 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Mon, 8 Sep 2025 09:29:27 GMT
Subject: RFR: 8360031: C2 compilation asserts in MemBarNode::remove [v2]
In-Reply-To: <Oq3YTqRpayxl8xuTMum0szW-ILHPr_Yq_yYGiY0Yfww=.a0bcfb72-6863-497e-8bfd-dc9bff4ec140@github.com>
References: <h6lgQmZZxVXPeJl0fRWlz8J713AAhfTAw_5UJ3ZL1S4=.b4b40295-6ea8-4498-abb6-98201776b3c9@github.com>
 <5CGrcWjFZ7Zqj_Tm0LO6Tqg9cUA-xxvcaa2J-yWW8BE=.af4dea7c-e39d-491d-b924-c89fa82e757a@github.com>
 <Oq3YTqRpayxl8xuTMum0szW-ILHPr_Yq_yYGiY0Yfww=.a0bcfb72-6863-497e-8bfd-dc9bff4ec140@github.com>
Message-ID: <Gz5X5_33-5aAmjEUbfOV144qRnRMecD4zPXAg3t__yY=.197969f0-ea30-48b3-b70b-f405b7354029@github.com>

On Fri, 5 Sep 2025 09:45:38 GMT, Dean Long <dlong at openjdk.org> wrote:

> What happens in the replay crash is the MemBarStoreStore gets onto the worklist through an indirect route in ConnectionGraph::split_unique_types() because of its memory edge.

Oh I see! Thanks @dean-long! I noticed that `MemBarStoreStore` was added later on but didn't really figure out where/why.
 
> I think the conservative fix is to have compute_escape() always add the MemBarStoreStore to the worklist if it has a Precedent edge. Because of StressIGVN randomizing the worklist, I think the outcnt() can be 1 for either MemBarStoreStore or MemBarRelease, so we should relax the assert accordingly. I'm not sure how useful the assert will be after that. It might be better to remove it.

I made `compute_escape` add `MemBarStoreStore` to the worklist. With that, the assert no longer triggers with the reproducer but, as you wrote, there seems to be no reason why `outcnt()` couldn't be 1 for `MemBarStoreStore` or `MemBarRelease`. So I reduced the assert to just the `outcnt() <=2` part.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26556#issuecomment-3265399663

From qxing at openjdk.org  Mon Sep  8 09:32:58 2025
From: qxing at openjdk.org (Qizheng Xing)
Date: Mon, 8 Sep 2025 09:32:58 GMT
Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero
 nodes more precise [v12]
In-Reply-To: <Iv0Ou9LLsRec6RaUaWA4pC7Ds7Hu_KXQbalxo71v8iM=.03c2733d-eaeb-4fec-a85b-cef252aa8c68@github.com>
References: <Iv0Ou9LLsRec6RaUaWA4pC7Ds7Hu_KXQbalxo71v8iM=.03c2733d-eaeb-4fec-a85b-cef252aa8c68@github.com>
Message-ID: <RiSS9ADKu_Krqu_orRPOogzbFaO_5uQ1nbscVRGOnoQ=.5572cd29-d7c9-47f1-9547-feb0a5585814@github.com>

Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision:

  Add more constant folding tests for CLZ/CTZ

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25928/files
  - new: https://git.openjdk.org/jdk/pull/25928/files/5cfe39b6..d09d4cb0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=11
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=10-11

  Stats: 279 lines in 1 file changed: 223 ins; 0 del; 56 mod
  Patch: https://git.openjdk.org/jdk/pull/25928.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25928/head:pull/25928

PR: https://git.openjdk.org/jdk/pull/25928

From qxing at openjdk.org  Mon Sep  8 09:32:59 2025
From: qxing at openjdk.org (Qizheng Xing)
Date: Mon, 8 Sep 2025 09:32:59 GMT
Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero
 nodes more precise [v12]
In-Reply-To: <bVqqmEXHIoBacby_IoOzCHbAt4nzoS4M6p-QySMh7gc=.57711175-230b-4ce2-85db-c050e4912509@github.com>
References: <Iv0Ou9LLsRec6RaUaWA4pC7Ds7Hu_KXQbalxo71v8iM=.03c2733d-eaeb-4fec-a85b-cef252aa8c68@github.com>
 <y1b0oyJhY7YkAtBpuou3hMv2aSy7SnM1M5Y5QH4oLi4=.6502c4df-4bdf-433f-840c-1de76de82c22@github.com>
 <bVqqmEXHIoBacby_IoOzCHbAt4nzoS4M6p-QySMh7gc=.57711175-230b-4ce2-85db-c050e4912509@github.com>
Message-ID: <u8c83QRn__WCvD982glAuAUMWpKn6z-U_AU2IyNVRxo=.b469a529-e0fa-466e-8dfb-d477c50c98c4@github.com>

On Tue, 19 Aug 2025 13:54:31 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Add more constant folding tests for CLZ/CTZ
>
> src/hotspot/share/opto/countbitsnode.cpp line 47:
> 
>> 45:     if (x >> 30 == 0) { n +=  2; x <<=  2; }
>> 46:     n -= x >> 31;
>> 47:     return TypeInt::make(n);
> 
> Is there already a test that covers all the cases that constant fold here? Just to make sure we do not get regressions here.

Added in IR test `TestCountBitsRange.java`.
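For reference, the constant-folding code quoted above is the tail of the classic branchy "number of leading zeros" binary search (Hacker's Delight). Below is a Java transliteration, assuming the elided earlier steps follow that standard form; it is a sketch, not the HotSpot source. `>>>` stands in for unsigned shifts, since the quoted `n -= x >> 31` only yields 0 or 1 when `x` is unsigned. The class name is hypothetical.

```java
// Hypothetical transliteration of the classic branchy CLZ binary search.
final class NlzFold {
    static int nlz(int x) {
        if (x == 0) {
            return 32;
        }
        // Start at 1, assuming the top bit is zero; corrected at the end.
        int n = 1;
        // Each step: if the top k bits are all zero, count them and shift them out.
        if (x >>> 16 == 0) { n += 16; x <<= 16; }
        if (x >>> 24 == 0) { n +=  8; x <<=  8; }
        if (x >>> 28 == 0) { n +=  4; x <<=  4; }
        if (x >>> 30 == 0) { n +=  2; x <<=  2; }
        // Undo the initial assumption if the top bit is actually set.
        n -= x >>> 31;
        return n;
    }
}
```

Checking inputs around every power of two (`1 << k`, `(1 << k) - 1`, `(1 << k) + 1`) against `Integer.numberOfLeadingZeros` exercises both outcomes of each branch, which is one way to cover all the constant-fold cases.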

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2329682681

From mbaesken at openjdk.org  Mon Sep  8 10:26:09 2025
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Mon, 8 Sep 2025 10:26:09 GMT
Subject: RFR: 8366775: TestCompileTaskTimeout should use timeoutFactor
In-Reply-To: <g0rIi1y3ncfP3t1Bju_TJW3Gge7bpFH8I5rh2Unj4Cw=.3738907d-7f4c-4fef-b70a-d68c0bf80c16@github.com>
References: <J7_KNMZeW20dQGNYUpDSZ7oDlXsr5dC_lyZzt1vrl-U=.882667fd-1064-4980-87ea-ee6bbb747d29@github.com>
 <g0rIi1y3ncfP3t1Bju_TJW3Gge7bpFH8I5rh2Unj4Cw=.3738907d-7f4c-4fef-b70a-d68c0bf80c16@github.com>
Message-ID: <pbIbE84GD_g5R1sfxVBiCETZyNgOXW7kPT74pAn0Z7I=.80643c68-5f95-4f38-90c1-b92a6b51e24b@github.com>

On Thu, 4 Sep 2025 15:11:14 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

> Is the reduced default a problem on your side?
I added the PR to our build/test queue. The reduced default should be okay for us.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27094#issuecomment-3265626746

From epeter at openjdk.org  Mon Sep  8 11:03:32 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 11:03:32 GMT
Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some
 small trip counts [v2]
In-Reply-To: <ezYpgocDEtRwma9bmy95F6Ia6Tl7T05HjQwb1RVuzhg=.b465eae1-403b-4924-8ab7-cb39dd4e4b7c@github.com>
References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com>
 <ezYpgocDEtRwma9bmy95F6Ia6Tl7T05HjQwb1RVuzhg=.b465eae1-403b-4924-8ab7-cb39dd4e4b7c@github.com>
Message-ID: <aSz1y8hezDp5vbEekgVKcNl1Rnnx0VCk6bYvI24He-4=.eb07a978-a2ae-4323-91ae-a09da7c05fad@github.com>

On Wed, 3 Sep 2025 16:55:45 GMT, Fei Gao <fgao at openjdk.org> wrote:

>> In C2's loop optimization, for a counted loop, if any of these conditions (RCE, unrolling) is met, we switch to the
>> `pre-main-post-loop` model. The counted loop is then split into `pre-main-post` loops. Meanwhile, C2 inserts minimum trip guards (a.k.a. zero-trip guards) before the main loop and the post loop. These guards test whether the remaining trip count is less than the loop stride (after unrolling). If so, execution jumps over the loop code to avoid loop over-running. For example, if a main loop is unrolled `8x`, the main loop guard tests whether the loop has fewer than `8` iterations and then decides which way to go.
>> 
>> Usually, the vectorized main loop is super-unrolled after vectorization, which further multiplies the main loop's stride. After the main loop is super-unrolled, the minimum trip guard test is updated accordingly. Assuming one vector covers `8` iterations and the super-unrolling count is `4`, the main loop's trip guard tests whether the remaining trip count is less than `8 * 4 = 32`.
>> 
>> To avoid the scalar post loop running too many iterations after super-unrolling, C2 clones the main loop before super-unrolling to create a vectorized drain loop. The vectorized drain loop also has a minimum trip guard, and the trip guards of both the main loop and the vectorized drain loop jump to the scalar post loop.
>> 
>> The problem is that if the remaining trip count when exiting the pre-loop is relatively small but larger than the vector length, the vectorized drain loop is never executed: the main loop's minimum trip guard test fails, and execution jumps over both the main loop and the vectorized drain loop. For example, in the above case, if a loop still has `25` iterations after the pre-loop, we could run `3` rounds of the vectorized drain loop, but that never happens. It would be better if a failing main-loop minimum trip guard did not jump over the vectorized drain loop.
>> 
>> This patch improves this by modifying the control flow taken when the main loop's minimum trip guard test fails. All data uses and control uses then have to be updated to match the changed control flow.
>> 
>> The whole process is done by the function `insert_post_loop()`.
>> 
>> We introduce a new `CloneLoopMode`, `InsertVectorizedDrain`. When we're cloning the vector main loop to vectorized drain loop with mode `InsertVectorizedDrain`:
>> 
>> 1. The fall-in control flow to the vectorized drain loop comes fr...
>
> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits:
> 
>  - Merge branch 'master' into optimize-atomic-post
>  - Clean up comments for consistency and add spacing for readability
>  - Fix some corner case failures and refined part of code
>  - Merge branch 'master' into optimize-atomic-post
>  - Refine ascii art, rename some variables and resolve conflicts
>  - Merge branch 'master' into optimize-atomic-post
>  - Add necessary ASCII art, refactor insert_post_loop() and rename
>    "atomic post loop" with "vectorized drain loop.
>  - Merge branch 'master' into optimize-atomic-post
>  - 8307084: C2: Vector atomic post loop is not executed for some small trip counts
>    
>    In C2's loop optimization, for a counted loop, if we have any of
>    these conditions (RCE, unrolling) met, we switch to the
>    pre-main-post-loop model. Then a counted loop could be split into
>    pre-main-post loops. Meanwhile, C2 inserts minimum trip guards
>    (a.k.a. zero-trip guards) before the main loop and the post loop.
>    These guards test if the remaining trip count is less than the
>    loop stride (after unrolling). If yes, The execution jumps over
>    the loop code to avoid loop over-running. For example, if a main
>    loop is unrolled to 8x, the main loop guard tests if the loop has
>    less than 8 iterations and then decide which way to go.
>    
>    Usually, the vectorized main loop will be super-unrolled after
>    vectorization. In such cases, the main loop's stride is going to
>    be further multiplied. After the main loop is super-unrolled, the
>    minimum trip guard test will be updated. Assuming one vector can
>    operate 8 iterations and the super-unrolling count is 4, the trip
>    guard of the main loop will test if remaining trip is less than
>    8 * 4 = 32.
>    
>    To avoid the scalar post loop running too many iterations after
>    super-unrolling, C2 clones the main loop before super-unrolling to
>    create a vector drain loop, i.e. atomic post loop. The newly
>    inserted post loop also has a minimum trip guard. And, both trip
>    guards of the main loop and vector post loop jump to the scalar
>    post loop.
>    
>    The problem here is, if the remaining trip count when exiting from
>    the pre-loop is relatively small but larger than the vector length,
>    the vector atomic post loop will never be executed. Because the
>    minimum trip guard test of main loop fails, the execution will
>    jump over both the main loop and the atomic p...
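The trip-guard arithmetic from the PR description can be sketched as a small Java model. It only counts how often each loop body runs. It assumes the main loop's stride is `vectorLen * unroll`, the drain loop's stride is `vectorLen`, and the scalar post loop handles one iteration at a time. It is not how C2 represents these loops, and all names are hypothetical.

```java
// Hypothetical model of the pre-main-drain-post guards after the pre-loop.
final class TripGuardModel {
    // Returns how many times each loop body runs: {main, drain, scalar post}.
    // mainGuardFallsToDrain == false models master, true models the patch.
    static int[] run(int remaining, int vectorLen, int unroll, boolean mainGuardFallsToDrain) {
        int mainStride = vectorLen * unroll;
        int mainIters = 0;
        int drainIters = 0;
        boolean reachDrainGuard;
        if (remaining >= mainStride) {
            // Main-loop zero-trip guard passes.
            while (remaining >= mainStride) {
                remaining -= mainStride;
                mainIters++;
            }
            reachDrainGuard = true;
        } else {
            // Main-loop guard fails: master jumps straight to the scalar post
            // loop, the patch falls through to the drain-loop guard instead.
            reachDrainGuard = mainGuardFallsToDrain;
        }
        if (reachDrainGuard && remaining >= vectorLen) {
            // Drain-loop zero-trip guard passes.
            while (remaining >= vectorLen) {
                remaining -= vectorLen;
                drainIters++;
            }
        }
        // The scalar post loop runs whatever is left.
        return new int[] {mainIters, drainIters, remaining};
    }
}
```

With `remaining = 25`, `vectorLen = 8`, `unroll = 4`, the model gives `{0, 0, 25}` for master and `{0, 3, 1}` with the patch, matching the example in the description. For counts at or above the main stride (e.g. 45 gives `{1, 1, 5}`), both variants behave the same.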

I'm really impressed by this change. This was a lot of work, @fg1417!

Please don't be discouraged by the many comments / suggestions. A lot of them are minor code style issues, so should be quick to address.

I have not made it through every detail yet, but I'll get there in the next cycle.

I have one concern:
We now have changed the branches. There is now a long sequence of branches if we have very few iterations, so that we only go through pre and post loop. It would be interesting to see what the performance difference is between master and patch. It would also be interesting to see a case where the SIZE of the array is not constant, and so the branches become impossible to predict, and there are a lot of branch misses. What do you think?

I would also suggest that @chhagedorn or @rwestrel should review this patch, since they are much more familiar with loop-opts structures than I.

I'm super happy that you are putting the time in for this. I think it is a really important task that closes a gap in the small-iteration space. And that is actually quite important :)

src/hotspot/share/opto/loopTransform.cpp line 1325:

> 1323: //   - Clone 'n' into 'preheader_ctrl' if its block does not strictly dominate 'preheader_ctrl'.
> 1324: //   - Otherwise, return 'n'.
> 1325: Node *PhaseIdealLoop::clone_up_backedge_goo(Node *back_ctrl, Node *preheader_ctrl, Node *n, VectorSet &visited, Node_Stack &clones) {

Could you please add a general comment about what this does at the top? The name is a bit funny with `goo`, but that's not your fault. If you have a better name feel free to rename ;)

src/hotspot/share/opto/loopTransform.cpp line 1332:

> 1330:     if (!requires_clone_from_preloop_exit) return n;
> 1331:   } else {
> 1332:     if (get_ctrl(n) != back_ctrl) return n;

Suggestion:

    if (!requires_clone_from_preloop_exit) { return n; }
  } else {
    if (get_ctrl(n) != back_ctrl) { return n; }

We generally like to be explicit with the braces.

src/hotspot/share/opto/loopTransform.cpp line 1394:

> 1392: // now we need to make the fall-in values to the vectorized drain
> 1393: // loop come from phis merging exit values from the pre loop and
> 1394: // the main loop.

Suggestion:

// After inserting zero trip guard for the vectorized drain loop,
// we now need to make the fall-in values to the vectorized drain
// loop come from phis merging exit values from the pre loop and
// the main loop.

src/hotspot/share/opto/loopTransform.cpp line 1437:

> 1435: // We look for an existing Phi node 'drain_input' among the uses of 'main_incr'.
> 1436: // If no valid Phi is found, we create a new Phi that merges output data edges
> 1437: // from both the pre-loop and main loop.

Why can that happen? Do you have a small example?

src/hotspot/share/opto/loopTransform.cpp line 1848:

> 1846: //               /        /
> 1847: //              after loop
> 1848: Node* PhaseIdealLoop::insert_post_loop(IdealLoopTree* loop, Node_List& old_new,

Consider renaming to `insert_post_or_drain_loop`

src/hotspot/share/opto/loopTransform.cpp line 1865:

> 1863:   int dd_main_exit = dom_depth(main_exit);
> 1864: 
> 1865:   // Step 1: Clone the loop body of main loop. The clone becomes the new loop.

Suggestion:

  // Step 1: Clone the loop body of main loop. The clone becomes the new loop (post or drain).

src/hotspot/share/opto/loopTransform.cpp line 1887:

> 1885:     // from the main loop and the pre loop.
> 1886:     zero_ctrl = main_exit->unique_ctrl_out_or_null();
> 1887:     assert(zero_ctrl, "if zero_ctrl doesn't exist, pre-main-post model fails.");

Style guide forbids implicit null / zero checks.
Suggestion:

    assert(zero_ctrl != nullptr, "if zero_ctrl doesn't exist, pre-main-post model fails.");

src/hotspot/share/opto/loopTransform.cpp line 1910:

> 1908:   // Step 2.2: Find 'exit_point', which is taken when zero trip guard fails.
> 1909:   Node* exit_point = nullptr;
> 1910:   uint replace_idx = 0;

Why not name it `exit_ctrl` and `exit_ctrl_idx`? Maybe you have a better name, but I'd make sure that they have a parallel name so it is obvious that they belong together.

src/hotspot/share/opto/loopTransform.cpp line 1927:

> 1925:       assert(exit_point->in(replace_idx) == zero_ctrl,
> 1926:              "The zero_ctrl should be the second input");
> 1927:     )

Here, `exit_point` is a region, right? Why not assert that?
Also: there is no need to wrap an `assert` in a `DEBUG_ONLY`; that only makes sense if you define local variables ;)

src/hotspot/share/opto/loopTransform.cpp line 1934:

> 1932: 
> 1933:   // Step 3: Find a 'new_phi' which is the input trip count of the zero trip guard.
> 1934:   Node* new_incr = nullptr;

Is it called `new_phi` or `new_incr`?

src/hotspot/share/opto/loopTransform.cpp line 1951:

> 1949:       Node* cmp  = main_guard_opaq->unique_out();
> 1950:       Node* pre_incr = cmp->in(1);
> 1951:       assert(new_incr && new_incr->in(1) == pre_incr && new_incr->in(2) == main_incr, "");

Suggestion:

      assert(new_incr != nullptr && new_incr->in(1) == pre_incr && new_incr->in(2) == main_incr, "");

No implicit null check

src/hotspot/share/opto/loopTransform.cpp line 1965:

> 1963:   // trip guard until all unrolling is done.
> 1964:   // For example, when we're inserting vectorized drain loop, after several steps above,
> 1965:   // the loop structure is showed in the comments for handle_data_uses_for_vectorized_drain().

Which "several steps" are you referencing here?
- The steps 1-3 from above?
- Or several steps further down, to the point we draw in `handle_data_uses_for_vectorized_drain`?
Can you please reformulate a bit?

src/hotspot/share/opto/loopTransform.cpp line 2090:

> 2088:         _igvn.hash_delete(post_phi);
> 2089:         post_phi->set_req(LoopNode::EntryControl, fallnew);
> 2090:       }

Looks like a bit much code duplication. But maybe that is justified here. Up to you.

src/hotspot/share/opto/loopnode.hpp line 1359:

> 1357:   // from old-loop now should use new Phis that merges Phis which merges
> 1358:   // values from pre-loop and main-loop and values from the new-loop
> 1359:   // (vectorized drain loop) equivalents.

I'm struggling with reading this. "x that merges y that merges z and w and v" - where do I have to place the brackets?
`x that merges y that (merges z and w) and v`
Probably this?

src/hotspot/share/opto/loopnode.hpp line 1410:

> 1408: 
> 1409:   // Add post loop after the given loop.
> 1410:   Node* insert_post_loop(IdealLoopTree* loop, Node_List& old_new,

Consider renaming to `insert_post_or_drain_loop`, and adjust the comment above.

src/hotspot/share/opto/loopnode.hpp line 1434:

> 1432:   Node* get_vectorized_drain_input(Node* main_backedge_ctrl, VectorSet& visited,
> 1433:                                    Node_Stack& clones, Node* main_merge_region,
> 1434:                                    Node* main_phi);

We don't just do this for the trip-counter though, right? The name `main_incr` rather suggests that here. Could you rephrase to make it more accurate? Do you think that would be worth it? It is also nice to have the analogy to the trip-counter, so I like that in the example ASCII art.

src/hotspot/share/opto/loopopts.cpp line 2341:

> 2339: // Take the loop increment "i" as an example.
> 2340: // Now the data uses about "i" are like:
> 2341: 

Nit: I would do the `//` continuously, like elsewhere.

src/hotspot/share/opto/loopopts.cpp line 2351:

> 2349: //        |       /
> 2350: //        |     /
> 2351: //   main zero-trip guard

Kinda subjective, but I'd prefer if the corners were the other way around ;)
Suggestion:

//   -----> pre loop head ...
//   |      |          \  /
//  IfTrue  |  ----->  PhiNode
//   |      v  |         |
//  loop end   ------ addI('pre_incr')
//        |           /
//    IfFalse       /
//        |       /
//        |     /
//   main zero-trip guard

Otherwise I'm wondering if the line may continue further up and just be cropped? Of course not.
Putting the IfTrue above the `loop end` can also be a little confusing. But it does save space. But not much. You could just extend the picture a little further to the right.

Sorry, this is very much a nit, so feel free to ignore ;)

src/hotspot/share/opto/loopopts.cpp line 2356:

> 2354: //   /       |  \_____________________________
> 2355: //  /        |                                \
> 2356: //  |   |--> main loop head             |---> vectorized drain loop head

Is there some IfNode here that decides between main and drain? Or does that come later?
Suggestion:

// IfFalse  IfTrue
//   /       |  ________(moved later)________
//  /        |                                \
//  |   |--> main loop head             |---> vectorized drain loop head

src/hotspot/share/opto/loopopts.cpp line 2394:

> 2392: // The data uses will become:
> 2393: // (new edges are marked with "*/*" or "*\*".)
> 2394: 

Again: use trip-counter phi instead of `i`

src/hotspot/share/opto/loopopts.cpp line 2435:

> 2433: void PhaseIdealLoop::handle_data_uses_for_vectorized_drain(Node* main_old, Node_List &old_new,
> 2434:                                                            IdealLoopTree* loop, IdealLoopTree* outer_loop,
> 2435:                                                            Node_List& worklist, uint new_counter) {

`handle` is a very generic verb. `fix` is already used elsewhere for the same purpose, so why not use that instead?

Maybe `fix_data_uses_with_drain_merge_phis`?
I'll have to read the code below to make sure that makes sense now.

src/hotspot/share/opto/loopopts.cpp line 2452:

> 2450:         _igvn.replace_node(use, hit);
> 2451:     }
> 2452:   };

The existing code style is to avoid lambdas and use helper methods instead. Would that be possible here? Probably just requires a few more arguments, right? `new_counter` for example.

src/hotspot/share/opto/loopopts.cpp line 2455:

> 2453: 
> 2454:   for (DUIterator_Fast jmax, j = main_old->fast_outs(jmax); j < jmax; j++)
> 2455:     worklist.push(main_old->fast_out(j));

Please use explicit {} everywhere :)

src/hotspot/share/opto/loopopts.cpp line 2458:

> 2456: 
> 2457:   while (worklist.size()) {
> 2458:     Node* use = worklist.pop();

Can you add a quick comment what kind of traversal this is? BFS? Over what nodes?

src/hotspot/share/opto/loopopts.cpp line 2461:

> 2459:     if (!has_node(use)) continue; // Ignore dead nodes
> 2460:     if (use->in(0) == C->top()) continue;
> 2461:     IdealLoopTree* use_loop = get_loop(has_ctrl(use) ? get_ctrl(use) : use);

Could you do this with `ctrl_or_self` instead?

src/hotspot/share/opto/loopopts.cpp line 2466:

> 2464:       // Find the phi node merging the data from pre-loop and vector main-loop.
> 2465:       Node_List visit_list;
> 2466:       Node_List phi_list;

You are doing this in a loop, and you set no `ResourceMark`. I'm afraid this could end up allocating a lot of memory. What do you think?

src/hotspot/share/opto/loopopts.cpp line 2475:

> 2473:       // Use BFS to clone all necessary nodes starting from the 'use' node, which exits the main loop,
> 2474:       // until reaching a merge point with a path from the pre-loop.
> 2475:       while (visit_list.size()) {

Suggestion:

      while (visit_list.size() != 0) {

src/hotspot/share/opto/loopopts.cpp line 2477:

> 2475:       while (visit_list.size()) {
> 2476:         Node* curr = visit_list.at(0);
> 2477:         visit_list.remove(0);

That `remove` ends up calling `Node_Array::remove`, which copies all upper entries. Generally not very performant. Not sure if it matters here, just noticed it.
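One common way to keep a FIFO (BFS) traversal linear is to advance a head index instead of removing element 0. Below is a generic Java sketch over hypothetical adjacency lists, not HotSpot's `Node_List`. It also uses a visited set, so each node is enqueued at most once; whether the patch's loop already prevents revisits is a separate question.

```java
import java.util.*;

// Hypothetical example: BFS with a head cursor instead of remove(0).
final class FifoWorklist {
    // Returns nodes in BFS order from start; each node appears at most once.
    static List<Integer> bfs(Map<Integer, List<Integer>> succs, int start) {
        List<Integer> queue = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        queue.add(start);
        visited.add(start);
        // queue.remove(0) would shift every later entry on each pop, making the
        // traversal quadratic; a head index keeps each pop O(1).
        for (int head = 0; head < queue.size(); head++) {
            int curr = queue.get(head);
            for (int next : succs.getOrDefault(curr, List.of())) {
                if (visited.add(next)) { // add() returns false if already seen
                    queue.add(next);
                }
            }
        }
        return queue;
    }
}
```

The trade-off is that already-processed entries stay in the list until the traversal ends, which is usually fine for short-lived worklists.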

src/hotspot/share/opto/loopopts.cpp line 2481:

> 2479:         if (newcurr) {
> 2480:           continue;
> 2481:         }

Suggestion:

        if (newcurr != nullptr) { continue; }

src/hotspot/share/opto/loopopts.cpp line 2514:

> 2512:           assert(!has_ctrl(outn) || !has_ctrl(curr) || is_dominator(get_ctrl(curr), get_ctrl(outn)),
> 2513:                  "Only these nodes controlled by loop exit edge need to be cloned");
> 2514:           visit_list.push(outn);

Might we visit nodes more than once? Or is that already prevented?

src/hotspot/share/opto/loopopts.cpp line 2518:

> 2516:       }
> 2517: 
> 2518:       // 'use' may have more than one valid "Phi" uses.

Example?

src/hotspot/share/opto/loopopts.cpp line 2844:

> 2842: // from old-loop now should use new Phis that merges Phis which merges
> 2843: // values from pre-loop and main-loop and values from the new-loop
> 2844: // (vectorized drain loop) equivalents.

Same issue as above.

Nit: Language should also just declare what the new form "is", not what it "should" be.
Nit: use space after period `.All` -> `. All`

src/hotspot/share/opto/loopopts.cpp line 2921:

> 2919:                                               worklist, new_counter);
> 2920:       }
> 2921:       break;

Do we need to do both `fix_data_uses` and `handle_data_uses_for_vectorized_drain`? Ah, one handles the old loop and the other the new loop?

It is kinda funny that we do a loop here for the `old` loop, but then do the loop inside `fix_data_uses` for the other loop - did I understand this right? Is there a good way to refactor this a little? We can also do that in a separate RFE first maybe? Because now with the large switch case here things are harder to read and get an overview quickly. What do you think?

test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionRemoveUselessSlowLoop.java line 86:

> 84:                   "multiversion_delayed_slow", "= 0", // The second loop's multiversion_if was also not used, so it is constant folded after loop opts.
> 85:                   "multiversion",              ">= 5", // nothing unexpected
> 86:                   "multiversion",              "<= 7", // nothing unexpected

Can you please also add a lower bound for
`"post .* multiversion_fast", ">= 3",`
That should be correct, right?

Ah ok, now we also vectorize the smaller (first) loop. But we still fully unroll the main-loop, because its stride becomes too large compared to the SIZE, right? But the post-vectorized loop is still reachable. Correct?


I'm a little bit unsure where the `On platforms (> 32 bytes)` is coming from. Does this IR rule fail with a smaller `MaxVectorSize=32`?

I'm wondering if it would make sense to have a few extra IR tests, with various constant SIZEs, and see which ones constant fold which loops, and if that happens as expected. I think that would be worth it.

You could even automate this to some degree with the template framework. We could also make this a follow-up RFE.

test/hotspot/jtreg/compiler/loopopts/superword/TestVectorizedDrainLoop.java line 31:

> 29:  *          generated by fuzzer.
> 30:  *
> 31:  * @run main/othervm -Xint compiler.loopopts.superword.TestVectorizedDrainLoop

What is the interpreter run good for? Why not just have a run without any flags instead?

test/micro/org/openjdk/bench/vm/compiler/VectorizedDrainLoopPerf.java line 51:

> 49: @CompilerControl(CompilerControl.Mode.DONT_INLINE)
> 50: 
> 51: public class VectorizedDrainLoopPerf {

Can you add some comments to this benchmark and to `test/micro/org/openjdk/bench/vm/compiler/VectorThroughputForIterationCount.java`, making sure that people are aware of both if they look at one?

I'm also wondering if we really need to add `VectorizedDrainLoopPerf.java`, since the other benchmark does the same and even more. I have not compared them in super detail now, so maybe there are reasons. In the end, I would prefer to have one benchmark that is really good, rather than multiple ones that do similar things. So feel free to modify `VectorThroughputForIterationCount.java` if it does not do everything you need it to do.

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/22629#pullrequestreview-3195298102
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329800122
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329802649
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329806392
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329812997
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329687426
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329692663
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329698477
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329732109
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329722628
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329736777
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329753240
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329763943
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329780926
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329517111
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329690592
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329789991
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329568979
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329604565
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329613322
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329640978
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329634882
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329829785
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329842181
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329850866
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329846345
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329849286
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329858268
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329864514
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329867796
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329881209
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329877189
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329554220
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329661057
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329435125
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329384829
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329401879

From epeter at openjdk.org  Mon Sep  8 11:03:33 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 11:03:33 GMT
Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some
 small trip counts [v2]
In-Reply-To: <aSz1y8hezDp5vbEekgVKcNl1Rnnx0VCk6bYvI24He-4=.eb07a978-a2ae-4323-91ae-a09da7c05fad@github.com>
References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com>
 <ezYpgocDEtRwma9bmy95F6Ia6Tl7T05HjQwb1RVuzhg=.b465eae1-403b-4924-8ab7-cb39dd4e4b7c@github.com>
 <aSz1y8hezDp5vbEekgVKcNl1Rnnx0VCk6bYvI24He-4=.eb07a978-a2ae-4323-91ae-a09da7c05fad@github.com>
Message-ID: <dG-4RWyk9dZzFNqjocMxYUOX1BMLVqbYY3E4fgtZ_ZY=.fa22c947-b57e-42b6-b633-89ac644215bf@github.com>

On Mon, 8 Sep 2025 10:22:14 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> src/hotspot/share/opto/loopTransform.cpp line 1437:
> 
>> 1435: // We look for an existing Phi node 'drain_input' among the uses of 'main_incr'.
>> 1436: // If no valid Phi is found, we create a new Phi that merges output data edges
>> 1437: // from both the pre-loop and main loop.
> 
> Why can that happen? Do you have a small example?

The solution looks a little complex, so I just want to understand why we need it ;)

> src/hotspot/share/opto/loopTransform.cpp line 1848:
> 
>> 1846: //               /        /
>> 1847: //              after loop
>> 1848: Node* PhaseIdealLoop::insert_post_loop(IdealLoopTree* loop, Node_List& old_new,
> 
> Consider renaming to `insert_post_or_drain_loop`

Should we have an assert that `mode` can only be
- `ControlAroundStripMined` -> post
- `InsertVectorizedDrain` -> drain
That might also help the reader understand the options here.

> src/hotspot/share/opto/loopTransform.cpp line 1887:
> 
>> 1885:     // from the main loop and the pre loop.
>> 1886:     zero_ctrl = main_exit->unique_ctrl_out_or_null();
>> 1887:     assert(zero_ctrl, "if zero_ctrl doesn't exist, pre-main-post model fails.");
> 
> Style guide forbids implicit null / zero checks.
> Suggestion:
> 
>     assert(zero_ctrl != nullptr, "if zero_ctrl doesn't exist, pre-main-post model fails.");

What do you mean by `pre-main-post model fails`? PreMainPost has presumably already succeeded. Can you reformulate?

> src/hotspot/share/opto/loopTransform.cpp line 1934:
> 
>> 1932: 
>> 1933:   // Step 3: Find a 'new_phi' which is the input trip count of the zero trip guard.
>> 1934:   Node* new_incr = nullptr;
> 
> Is it called `new_phi` or `new_incr`?

`phi` is usually the `PhiNode`, and `incr` is the `AddINode`, right?

> src/hotspot/share/opto/loopnode.hpp line 1359:
> 
>> 1357:   // from old-loop now should use new Phis that merges Phis which merges
>> 1358:   // values from pre-loop and main-loop and values from the new-loop
>> 1359:   // (vectorized drain loop) equivalents.
> 
> I'm struggling with reading this. "x that merges y that merges z and w and v" - where do I have to place the brackets?
> `x that merges y that (merges z and w) and v`
> Probably this?

Maybe you can just write it like this instead:

Before:
r_old = Region(pre_loop_exit ... or zero-trip-guard?, main_loop_exit)
phi_old = Phi(pre_loop_outputs, main_loop_outputs)
After:
r_old = Region(pre_loop_exit, main_loop_exit)
phi_old = Phi(....)
r_new = Region(r_old, drain_loop_exit)
phi_new = Phi(phi_old, drain_loop_outputs)

Or you just say that we first merge pre-loop and main-loop, and then merge that with drain-loop? So a more high level comment, and then refer for more details elsewhere?

> src/hotspot/share/opto/loopopts.cpp line 2341:
> 
>> 2339: // Take the loop increment "i" as an example.
>> 2340: // Now the data uses about "i" are like:
>> 2341: 
> 
> Nit: I would do the `//` continuously, like elsewhere.

With `i` do you mean the trip-counter phi? You don't really use `i` below anyway, so I'd just drop it.
Suggestion:

// This function is going to fix all data uses of the new loop body.
//
// Let us look at the trip-counter phi, as an example to understand the data uses:
//

> src/hotspot/share/opto/loopopts.cpp line 2356:
> 
>> 2354: //   /       |  \_____________________________
>> 2355: //  /        |                                \
>> 2356: //  |   |--> main loop head             |---> vectorized drain loop head
> 
> Is there some IfNode here that decides between main and drain? Or does that come later?
> Suggestion:
> 
> // IfFalse  IfTrue
> //   /       |  ________(moved later)________
> //  /        |                                \
> //  |   |--> main loop head             |---> vectorized drain loop head

Ah, also the input to the PhiNode below is not yet fixed, right? Maybe it's not worth mentioning any of it at all then... not sure.

> src/hotspot/share/opto/loopopts.cpp line 2435:
> 
>> 2433: void PhaseIdealLoop::handle_data_uses_for_vectorized_drain(Node* main_old, Node_List &old_new,
>> 2434:                                                            IdealLoopTree* loop, IdealLoopTree* outer_loop,
>> 2435:                                                            Node_List& worklist, uint new_counter) {
> 
> `handle` is a very generic verb. `fix` is already used elsewhere for the same purpose, so why not use that instead?
> 
> Maybe `fix_data_uses_with_drain_merge_phis`?
> I'll have to read the code below to make sure that makes sense now.

It would probably make sense to have things matching with:
`fix_ctrl_uses_for_vectorized_drain`

Suggestion alternatives:
- `fix_ctrl_uses_for_vectorized_drain` and `fix_data_uses_for_vectorized_drain`
- `fix_ctrl_uses_with_drain_merge_region` and `fix_data_uses_with_drain_merge_phis`

> src/hotspot/share/opto/loopopts.cpp line 2452:
> 
>> 2450:         _igvn.replace_node(use, hit);
>> 2451:     }
>> 2452:   };
> 
> The existing code style is to avoid lambdas and use helper methods instead. Would that be possible here? Probably just requires a few more arguments, right? `new_counter` for example.

The comment is a little hard to read. Maybe say this instead:
`For the 'use' node, replace all input occurrences of 'old_in' with 'new_in'.`
You also do more in the method than the name/comment promises: you replace use with hit.
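A minimal sketch of the helper-method shape suggested above, using a hypothetical `ToyNode` that only has a plain input vector (HotSpot's `Node` has its own edge API, e.g. `set_req`, which this does not model). The helper takes everything it needs as explicit arguments instead of capturing it in a lambda, and does only what its name and comment promise; replacing `use` with `hit` would stay a separate step at the call site.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal node: only an input list (not HotSpot's Node).
struct ToyNode {
  std::vector<ToyNode*> in;
};

// For the 'use' node, replace all input occurrences of 'old_in' with 'new_in'.
// Returns the number of replaced edges.
static int replace_input_occurrences(ToyNode* use, ToyNode* old_in, ToyNode* new_in) {
  assert(use != nullptr);
  int count = 0;
  for (ToyNode*& input : use->in) {
    if (input == old_in) {
      input = new_in;
      count++;
    }
  }
  return count;
}
```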

> src/hotspot/share/opto/loopopts.cpp line 2458:
> 
>> 2456: 
>> 2457:   while (worklist.size()) {
>> 2458:     Node* use = worklist.pop();
> 
> Can you add a quick comment what kind of traversal this is? BFS? Over what nodes?

Ah, are we only removing nodes?

> src/hotspot/share/opto/loopopts.cpp line 2477:
> 
>> 2475:       while (visit_list.size()) {
>> 2476:         Node* curr = visit_list.at(0);
>> 2477:         visit_list.remove(0);
> 
> That `remove` ends up calling `Node_Array::remove`, which copies all upper entries. Generally not very performant. Not sure if it matters here, just noticed it.

Maybe you can construct some graph where this really visits a lot of nodes, then this could blow up quadratically.

> src/hotspot/share/opto/loopopts.cpp line 2481:
> 
>> 2479:         if (newcurr) {
>> 2480:           continue;
>> 2481:         }
> 
> Suggestion:
> 
>         if (newcurr != nullptr) { continue; }

You have more implicit zero/null checks below.

> src/hotspot/share/opto/loopopts.cpp line 2518:
> 
>> 2516:       }
>> 2517: 
>> 2518:       // 'use' may have more than one valid "Phi" uses.
> 
> Example?

Can you quickly say what this loop does with each phi?

> test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionRemoveUselessSlowLoop.java line 86:
> 
>> 84:                   "multiversion_delayed_slow", "= 0", // The second loop's multiversion_if was also not used, so it is constant folded after loop opts.
>> 85:                   "multiversion",              ">= 5", // nothing unexpected
>> 86:                   "multiversion",              "<= 7", // nothing unexpected
> 
> Can you please also add a lower bound for
> `"post .* multiversion_fast", ">= 3",`
> That should be correct, right?
> 
> Ah ok, now we also vectorize the smaller (first) loop. But we still fully unroll the main-loop, because its stride becomes too large compared to the SIZE, right? But the post-vectorized loop is still reachable. Correct?
> 
> 
> I'm a little bit unsure where the `On platforms (> 32 bytes)` is coming from. Does this IR rule fail with a smaller `MaxVectorSize=32`?
> 
> I'm wondering if it would make sense to have a few extra IR tests, with various constant SIZEs, and see which ones constant fold which loops, and if that happens as expected. I think that would be worth it.
> 
> You could even automate this to some degree with the template framework. We could also make this a follow-up RFE.

I'm also wondering if it would not be nicer to have a different tag for the vectorized drain loop, instead of `post`. Could we call it `vector_drain` maybe? That would make it easier to spot it correctly and to write more expressive IR rules.

> test/hotspot/jtreg/compiler/loopopts/superword/TestVectorizedDrainLoop.java line 31:
> 
>> 29:  *          generated by fuzzer.
>> 30:  *
>> 31:  * @run main/othervm -Xint compiler.loopopts.superword.TestVectorizedDrainLoop
> 
> What is the interpreter run good for? Why not just have a run without any flags instead?

Ah, you have exact constant results that you compare with. Could be good to state this here as a comment, so that nobody removes this in the future. You are just making sure that the interpreter would have produced the same results.

Still: why not add a run without any flags?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329815918
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329691247
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329702206
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329743606
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329531126
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329591377
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329624643
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329646048
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329841458
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329853895
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329870109
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329870887
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329884145
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329438824
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329388742

From epeter at openjdk.org  Mon Sep  8 11:03:33 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 11:03:33 GMT
Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some
 small trip counts [v2]
In-Reply-To: <dG-4RWyk9dZzFNqjocMxYUOX1BMLVqbYY3E4fgtZ_ZY=.fa22c947-b57e-42b6-b633-89ac644215bf@github.com>
References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com>
 <ezYpgocDEtRwma9bmy95F6Ia6Tl7T05HjQwb1RVuzhg=.b465eae1-403b-4924-8ab7-cb39dd4e4b7c@github.com>
 <aSz1y8hezDp5vbEekgVKcNl1Rnnx0VCk6bYvI24He-4=.eb07a978-a2ae-4323-91ae-a09da7c05fad@github.com>
 <dG-4RWyk9dZzFNqjocMxYUOX1BMLVqbYY3E4fgtZ_ZY=.fa22c947-b57e-42b6-b633-89ac644215bf@github.com>
Message-ID: <YZnv9qklWb6qqWhtXLq2EQHlgkxSCkNCui42QYRr1hA=.1b4d2655-0693-4b13-85cb-da6d4a300aa8@github.com>

On Mon, 8 Sep 2025 09:38:11 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/loopTransform.cpp line 1887:
>> 
>>> 1885:     // from the main loop and the pre loop.
>>> 1886:     zero_ctrl = main_exit->unique_ctrl_out_or_null();
>>> 1887:     assert(zero_ctrl, "if zero_ctrl doesn't exist, pre-main-post model fails.");
>> 
>> Style guide forbids implicit null / zero checks.
>> Suggestion:
>> 
>>     assert(zero_ctrl != nullptr, "if zero_ctrl doesn't exist, pre-main-post model fails.");
>
> What do you mean by `pre-main-post model fails`? PreMainPost has presumably already succeeded. Can you reformulate?

Why not add the `Region` check already here?

>> src/hotspot/share/opto/loopTransform.cpp line 1934:
>> 
>>> 1932: 
>>> 1933:   // Step 3: Find a 'new_phi' which is the input trip count of the zero trip guard.
>>> 1934:   Node* new_incr = nullptr;
>> 
>> Is it called `new_phi` or `new_incr`?
>
> `phi` is usually the `PhiNode`, and `incr` is the `AddINode`, right?

So you could actually make the type more precise than `Node*` :)

>> src/hotspot/share/opto/loopopts.cpp line 2458:
>> 
>>> 2456: 
>>> 2457:   while (worklist.size()) {
>>> 2458:     Node* use = worklist.pop();
>> 
>> Can you add a quick comment what kind of traversal this is? BFS? Over what nodes?
>
> Ah, are we only removing nodes?

Oh, you have another implicit zero check here.

>> src/hotspot/share/opto/loopopts.cpp line 2477:
>> 
>>> 2475:       while (visit_list.size()) {
>>> 2476:         Node* curr = visit_list.at(0);
>>> 2477:         visit_list.remove(0);
>> 
>> That `remove` ends up calling `Node_Array::remove`, which copies all upper entries. Generally not very performant. Not sure if it matters here, just noticed it.
>
> Maybe you can construct some graph where this really visits a lot of nodes, then this could blow up quadratically.

`pop` is more efficient, because it just takes it from the end. But then you'd get a DFS and not BFS.
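If BFS order matters, one common way to keep FIFO order without the quadratic front removal is to never remove entries at all and just advance a head index. A minimal sketch on a toy adjacency-list graph using `std::vector` (not HotSpot's `Node_List`; the graph type and function names are hypothetical), next to the `pop`-style DFS for comparison:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy graph: node ids are indices, g[n] lists the successors of n.
using Graph = std::vector<std::vector<int>>;

// BFS that keeps FIFO order without erasing from the front: a `head` index
// advances instead. Each node is appended at most once, so the walk is
// O(V + E) rather than quadratic.
std::vector<int> bfs_order(const Graph& g, int start) {
  std::vector<bool> visited(g.size(), false);
  std::vector<int> visit_list{start};
  visited[start] = true;
  for (std::size_t head = 0; head < visit_list.size(); head++) {
    int curr = visit_list[head];  // copy: push_back below may reallocate
    for (int succ : g[curr]) {
      if (!visited[succ]) {
        visited[succ] = true;
        visit_list.push_back(succ);
      }
    }
  }
  return visit_list;
}

// DFS variant: taking from the back (like `pop`) is cheap, but the visit
// order is depth-first.
std::vector<int> dfs_order(const Graph& g, int start) {
  std::vector<bool> visited(g.size(), false);
  std::vector<int> stack{start};
  std::vector<int> order;
  while (!stack.empty()) {
    int curr = stack.back();
    stack.pop_back();
    if (visited[curr]) continue;
    visited[curr] = true;
    order.push_back(curr);
    for (int succ : g[curr]) {
      if (!visited[succ]) stack.push_back(succ);
    }
  }
  return order;
}
```

For a diamond `0 -> {1, 2}`, `1 -> 3`, `2 -> 3`, the BFS order is `0, 1, 2, 3` and the DFS order is `0, 2, 3, 1`.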

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329711083
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329746816
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329856226
PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329879100

From epeter at openjdk.org  Mon Sep  8 11:03:33 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 11:03:33 GMT
Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some
 small trip counts [v2]
In-Reply-To: <YZnv9qklWb6qqWhtXLq2EQHlgkxSCkNCui42QYRr1hA=.1b4d2655-0693-4b13-85cb-da6d4a300aa8@github.com>
References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com>
 <ezYpgocDEtRwma9bmy95F6Ia6Tl7T05HjQwb1RVuzhg=.b465eae1-403b-4924-8ab7-cb39dd4e4b7c@github.com>
 <aSz1y8hezDp5vbEekgVKcNl1Rnnx0VCk6bYvI24He-4=.eb07a978-a2ae-4323-91ae-a09da7c05fad@github.com>
 <dG-4RWyk9dZzFNqjocMxYUOX1BMLVqbYY3E4fgtZ_ZY=.fa22c947-b57e-42b6-b633-89ac644215bf@github.com>
 <YZnv9qklWb6qqWhtXLq2EQHlgkxSCkNCui42QYRr1hA=.1b4d2655-0693-4b13-85cb-da6d4a300aa8@github.com>
Message-ID: <bx6bwe-9YrHcwkKaxFGrhBfqEnBFJuLtbUFMXGWKJYs=.7f8189c2-a046-4dac-b49c-bb4d48da82cc@github.com>

On Mon, 8 Sep 2025 09:53:49 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> `phi` is usually the `PhiNode`, and `incr` is the `AddINode`, right?
>
> So you could actually make the type more precise than `Node*` :)

Or do we have to support `long` loops here too? Then we could just make it an `AddNode*`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2329751232

From mli at openjdk.org  Mon Sep  8 12:49:13 2025
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 8 Sep 2025 12:49:13 GMT
Subject: RFR: 8367048: RISC-V: Correct pipeline descriptions of the
 architecture
In-Reply-To: <sCc5RPOEdpio8GKk_vj-TmYTa2wjSzWhqgtXvp3xZh4=.b6e51a9b-0328-4c1d-a268-abb0d130c19e@github.com>
References: <sCc5RPOEdpio8GKk_vj-TmYTa2wjSzWhqgtXvp3xZh4=.b6e51a9b-0328-4c1d-a268-abb0d130c19e@github.com>
Message-ID: <QapL1puZn7vaCGX9fFkSaqbUe-9VQGHiLdYdqm93Mt0=.e11cfbca-b6a0-49b1-bbd4-1e335781bf28@github.com>

On Mon, 8 Sep 2025 05:13:32 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help to review this patch? Thanks!
> 
> This patch updates the RISC-V pipeline attributes to variable_size_instructions to properly account for the 2-byte compressed instructions from the C extension. 
> Furthermore, it increases the max_instructions_per_bundle to 4 and adjusts the instruction_unit_size to match 4-issue RISC-V hardware like the UR-CP100.
> 
> ### Test
> - [x] Run tier1 and tier2 on sg2042

Marked as reviewed by mli (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/27134#pullrequestreview-3196372198

From epeter at openjdk.org  Mon Sep  8 13:34:26 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 13:34:26 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v8]
In-Reply-To: <LMu3QXiw3K_5-conH2FlBXyRWNzNDunp0L86Ubxeju8=.86d4010f-ab7b-42cf-a9c3-5c24c23d2a81@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <LMu3QXiw3K_5-conH2FlBXyRWNzNDunp0L86Ubxeju8=.86d4010f-ab7b-42cf-a9c3-5c24c23d2a81@github.com>
Message-ID: <LSNdJRy0deekqrysCXuqGFogGIkwJ5MJEcefFMskxwY=.a28eb64e-ee52-432c-a4cf-9504aaf4a2e1@github.com>

On Wed, 3 Sep 2025 21:29:43 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> This PR introduces C2 support for `Reference.reachabilityFence()`.
>> 
>> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected.
>> 
>> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality.
>> 
>> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix.
>> 
>> Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. 
>> 
>> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667
>> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints."
>> 
>> Testing:
>> - [x] hs-tier1 - hs-tier8
>> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations
>> - [x] java/lang/foreign microbenchmarks
>
> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   whitespaces

Thanks for all the updates. I went over all your responses quickly, but still need to read through the description in `reachability.cpp` now.

I'll do another pass over everything once you address my responses ;)

src/hotspot/share/opto/callGenerator.cpp line 623:

> 621:       return; // keep the original call node as the holder of reachability info
> 622:     }
> 623:   }

Maybe that's just me. But people use the assert messages both in positive and negative ways, and so this is a bit ambiguous. Maybe you can write:
`no reachability edge should be present`

I'm still a bit unsure what the `SafePointNode::grow_stack` comment means.
In the previous comment https://github.com/openjdk/jdk/pull/25315#discussion_r2320120466 you explained more. Why not add that here instead?

src/hotspot/share/opto/compile.cpp line 2522:

> 2520:     if (failing())  return;
> 2521:     assert(_reachability_fences.length() == 0, "no RF nodes allowed");
> 2522:   }

Looks better than before :)

I'm still wondering: do we need to do a whole loop-opts phase here? It probably has a performance impact, right?
Have you measured that?

If it is measurable: could we just go through `_reachability_fences`, and hack the graph and clean up with IGVN? Or do we really need the loop state to do this successfully?

src/hotspot/share/opto/loopTransform.cpp line 66:

> 64: //------------------------------unique_loop_exit_or_null----------------------
> 65: // Return the loop-exit projection if it is unique.
> 66: Node* IdealLoopTree::unique_loop_exit_or_null() {

I suggested it here:
https://github.com/openjdk/jdk/pull/25315#discussion_r2149677594
Can we change the return type to `IfProjNode`?

Also: when is it possible that there are none or multiple loop exits?
Can you add a comment below where you return nullptr?

src/hotspot/share/opto/macro.cpp line 973:

> 971:         _igvn._worklist.push(ac);
> 972:       } else if (use->is_ReachabilityFence() && OptimizeReachabilityFences) {
> 973:         use->as_ReachabilityFence()->clear_referent(_igvn); // redundant fence

Thanks for refactoring a bit here :)

Is this rf guaranteed to belong to the Allocation somehow?

src/hotspot/share/opto/parse1.cpp line 2233:

> 2231:       insert_reachability_fence(referent);
> 2232:     }
> 2233:   }

Comments look better, thanks :)

But `StressReachabilityFences` seems to promise that it should happen randomly. Did you want to do that or adjust the flag comment?

src/hotspot/share/opto/reachability.cpp line 136:

> 134:     return true;
> 135:   }
> 136: }

Nit: `an no-op` -> `a no-op`

Also: do you need the return value? The only use case does not do anything with it.

src/hotspot/share/opto/reachability.cpp line 438:

> 436:   if (!OptimizeReachabilityFences) {
> 437:     return false;
> 438:   }

Can this ever fail? Could it be an assert?

src/hotspot/share/opto/reachability.cpp line 441:

> 439: 
> 440:   Unique_Node_List redundant_rfs;
> 441:   Node_List worklist;

Not sure if necessary, but maybe good practice anyway: add `ResourceMark`.

src/hotspot/share/opto/reachability.cpp line 453:

> 451:         SafePointNode* sfpt = safepoints.pop()->as_SafePoint();
> 452:         assert(is_dominator(get_ctrl(referent), sfpt), "");
> 453:         assert(sfpt->req() == rf_start_offset(sfpt), "");

Is this the only reason we need this to happen during LoopOpts - i.e. that we can call `get_ctrl` and `is_dominator`?

Because it is potentially a lot of overhead to create the whole loop-opts structures just for this.

-------------

PR Review: https://git.openjdk.org/jdk/pull/25315#pullrequestreview-3196301873
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2330095168
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2330176841
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2330209593
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2330230044
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2330256973
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2330221500
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2330181204
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2330188708
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2330192891

From epeter at openjdk.org  Mon Sep  8 13:34:28 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 13:34:28 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3]
In-Reply-To: <ejAu9M0FYELqOdzDW8uankmdRt0w8bloAwcxWcyx5k0=.9a47c6c4-e9df-40f5-aba9-23073a12bd17@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <hcmJkTFFU9oEe5LE1Q5974w0KM5Pe6SLi0uKLAxU7rM=.70a3f0d7-eb32-44b0-b819-1c4db6273976@github.com>
 <UKkT1Wqi4ftj3eGF2KzT8saeWoWSBTXx5kw0FOiJyLE=.c10dbf15-3348-495b-b9aa-556b78bc1e0b@github.com>
 <ejAu9M0FYELqOdzDW8uankmdRt0w8bloAwcxWcyx5k0=.9a47c6c4-e9df-40f5-aba9-23073a12bd17@github.com>
Message-ID: <L0LfhTmGAmKPwwFXYaibWSIA3rLaL8j1xL4OL4XkutY=.70bbb39a-5e86-4a80-952b-d3b98a2a4a36@github.com>

On Wed, 3 Sep 2025 20:19:38 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> src/hotspot/share/opto/c2_globals.hpp line 83:
>> 
>>> 81:                                                                             \
>>> 82:   product(bool, StressReachabilityFences, false, DIAGNOSTIC,                \
>>> 83:           "Randomly insert ReachabilityFence nodes")                        \
>> 
>> Drive-by sniping: what about a hello-world test where you test out these flags?
>
> Good idea. Added one.

Also: you promise that it happens randomly. But it seems to be added deterministically everywhere. Did I miss something?

>> src/hotspot/share/opto/callnode.hpp line 497:
>> 
>>> 495:   // Are we guaranteed that this node is a safepoint?  Not true for leaf calls and
>>> 496:   // for some macro nodes whose expansion does not have a safepoint on the fast path.
>>> 497:   virtual bool guaranteed_safepoint()  { return true; }
>> 
>> I see you only copied it. It makes me a little nervous when we call the "default" case safe. Because when you add more cases, you just assume it is safe... and if it is not we first have to discover that through a bug. What do you think?
>
> Well, it's a SafePointNode class after all. I lifted it from `CallNode` subclass to avoid elaborate check on SafePoint nodes (`!is_Call() || as_Call() && guaranteed_safepoint()`).
> 
> If some node extends SafePointNode, but doesn't keep JVM state, it has to communicate it to users one way or another. And changing the default doesn't improve the situation IMO: reporting a safepoint node as a non-safepoint is still a bug.

Hmm. The way it is formulated it sounds more like:
- `true` -> we are guaranteed that it is a safepoint.
- `false` -> it may or may not be a safepoint - no guarantees.
Am I understanding this right?

If yes, then it would make more sense to have a default that is `no guarantee`. But maybe that makes things more complicated in other ways. All I'm saying is that it makes me nervous ;)

>> src/hotspot/share/opto/parse.hpp line 361:
>> 
>>> 359:   bool          _wrote_fields;       // Did we write any field?
>>> 360:   Node*         _alloc_with_final_or_stable; // An allocation node with final or @Stable field
>>> 361:   Node*         _stress_rf_hook; // StressReachabilityFences support
>> 
>> You could write out the `rf`
>
> I'd like to avoid that. `_stress_reachability_fence_hook` is way too verbose IMO. The declaration and all the accesses are accompanied by `StressReachabilityFences` which should make it clear what `rf` refers to.

Fair enough. It's always a trade-off. Works here because of `StressReachabilityFences` :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2330253854
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2330166192
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2330245481

From epeter at openjdk.org  Mon Sep  8 13:34:29 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 13:34:29 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v8]
In-Reply-To: <LSNdJRy0deekqrysCXuqGFogGIkwJ5MJEcefFMskxwY=.a28eb64e-ee52-432c-a4cf-9504aaf4a2e1@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <LMu3QXiw3K_5-conH2FlBXyRWNzNDunp0L86Ubxeju8=.86d4010f-ab7b-42cf-a9c3-5c24c23d2a81@github.com>
 <LSNdJRy0deekqrysCXuqGFogGIkwJ5MJEcefFMskxwY=.a28eb64e-ee52-432c-a4cf-9504aaf4a2e1@github.com>
Message-ID: <_n3uP_Dkl3RNq3MFoRDXsS28SM8CcQHaR6vdUJF9U8s=.dcfab97b-be28-4244-93df-c8a23d6d66b8@github.com>

On Mon, 8 Sep 2025 12:29:15 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   whitespaces
>
> src/hotspot/share/opto/callGenerator.cpp line 623:
> 
>> 621:       return; // keep the original call node as the holder of reachability info
>> 622:     }
>> 623:   }
> 
> Maybe that's just me. But people use the assert messages both in positive and negative ways, and so this is a bit ambiguous. Maybe you can write:
> `no reachability edge should be present`
> 
> I'm still a bit unsure what the `SafePointNode::grow_stack` comment means.
> In the previous comment https://github.com/openjdk/jdk/pull/25315#discussion_r2320120466 you explained more. Why not add that here instead?

I'm also not sure yet why there is a difference between incremental inlining and regular inlining.
Do you think it would make sense to explain that here, or is it explained elsewhere?

> src/hotspot/share/opto/macro.cpp line 973:
> 
>> 971:         _igvn._worklist.push(ac);
>> 972:       } else if (use->is_ReachabilityFence() && OptimizeReachabilityFences) {
>> 973:         use->as_ReachabilityFence()->clear_referent(_igvn); // redundant fence
> 
> Thanks for refactoring a bit here :)
> 
> Is this rf guaranteed to belong to the Allocation somehow?

Ah, you could mention that later `ReachabilityFenceNode::Identity` removes the rf.

> src/hotspot/share/opto/reachability.cpp line 136:
> 
>> 134:     return true;
>> 135:   }
>> 136: }
> 
> Nit: `an no-op` -> `a no-op`
> 
> Also: do you need the return value? The only use case does not do anything with it.

You could mention that `Identity` will remove the node later.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2330138204
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2330236031
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2330237394

From epeter at openjdk.org  Mon Sep  8 13:34:30 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 13:34:30 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3]
In-Reply-To: <YwP3BI5-UT6-DwM53nsC1R_zikvBs6dGI-ITm0fABPo=.5de44414-8e0a-4351-bdbc-05d90c21cd79@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <hcmJkTFFU9oEe5LE1Q5974w0KM5Pe6SLi0uKLAxU7rM=.70a3f0d7-eb32-44b0-b819-1c4db6273976@github.com>
 <N4plHRx1Hm8W3kbZb0JeoddmFdkM3tjsA0agyrQ40fE=.2da64dae-d06e-45df-be2e-5c7ceb4005f1@github.com>
 <vtlALBbMlS-Zj5Qqzp6PpEFQq6fq7xUskZaCXfADorM=.cfcb1f7c-489b-47e0-b8ea-8b0a87dc9d5d@github.com>
 <YwP3BI5-UT6-DwM53nsC1R_zikvBs6dGI-ITm0fABPo=.5de44414-8e0a-4351-bdbc-05d90c21cd79@github.com>
Message-ID: <mtq_TRVg-FJ4AF-K15YmH51LnnC2Hv2WLyr4YfoQfBo=.e8ac2a94-a1f0-464e-a8bb-95ee5d10db47@github.com>

On Wed, 3 Sep 2025 20:28:40 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> Can you quickly comment why you changed this?
>
> Some call nodes inspected during `expand_reachability_fences` demonstrate this IR shape where some exception table projections are directly attached to the call node.
> 
> Looks like a missed case in `CallNode::extract_projections` we simply never hit before.

Alright, sounds good! Do you think this could have happened somehow, i.e. was this a bug that we could somehow reproduce?

>> The arguments are less important for me.
>
> There are 2 types of methods here: internal ones (used solely in `reachability.cpp`) and those which are called from loop optimization code (`optimize_reachability_fences` and `eliminate_reachability_fences`). 
> 
> IMO it's counter-productive to repeatedly spell out what "RF" means inside `reachability.cpp`, so I kept the names intact. I split the declarations into public and private ones to stress the distinction.

Great, the private/public split works for me :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2330151757
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2330213859

From epeter at openjdk.org  Mon Sep  8 14:52:18 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 14:52:18 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v8]
In-Reply-To: <LMu3QXiw3K_5-conH2FlBXyRWNzNDunp0L86Ubxeju8=.86d4010f-ab7b-42cf-a9c3-5c24c23d2a81@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <LMu3QXiw3K_5-conH2FlBXyRWNzNDunp0L86Ubxeju8=.86d4010f-ab7b-42cf-a9c3-5c24c23d2a81@github.com>
Message-ID: <xdG4HrlPg5ytK8z03Vz8Z-M1OWgWh1HXoUiuJd4Eaxw=.8e958498-2646-4376-9cd2-5d9ee9461b18@github.com>

A few comments about the `reachability.cpp` intro. I think we are on a good way here :)

src/hotspot/share/opto/reachability.cpp line 49:

> 47:  *
> 48:  * It is tempting to directly attach referents to interfering safepoints right from the beginning, but it
> 49:  * doesn't play well with some optimizations C2 does.

Do you have an example for such optimizations?

src/hotspot/share/opto/reachability.cpp line 67:

> 65:  * RF nodes may interfere with RA, so stand-alone RF nodes are eliminated and their referents are
> 66:  * transferred to corresponding safepoints (phase #2). When safepoints are pruned during macro expansion,
> 67:  * corresponding reachability edges also go away.

Spell out RA on first use. Make it clearer that this is why we eliminate RFs before RA.
Suggestion:

 * RF nodes may interfere with register allocation (RA), hence we eliminate RF nodes and transfer their
 * referents to corresponding safepoints (phase #2). When safepoints are pruned during macro expansion,
 * corresponding reachability edges also go away.

`reachability edges also go away`: and why is that ok? Here is a sketch of what you could write; is it correct?
- reachability only needs to be correct at SafePoints. If all the SafePoints are removed for a referent, then we don't need to ensure its reachability.
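The rationale in the bullet above can be checked against a toy model. This is a minimal sketch, assuming (as the bullet does) that the GC can only observe and reclaim the referent at a safepoint; `ToySafepoint` and `referent_kept_alive` are hypothetical names, not C2 code. The invariant only constrains interfering safepoints, so when all of them are pruned it holds vacuously.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy model, not C2 data structures: a safepoint records the ids of
// the values its oop map keeps alive.
struct ToySafepoint {
  std::set<int> oop_map;
};

// Invariant: every safepoint that interferes with a reachability fence on
// `referent` must keep `referent` in its oop map. With no interfering
// safepoints there is nowhere the GC could reclaim it, so the check passes.
bool referent_kept_alive(int referent, const std::vector<const ToySafepoint*>& interfering) {
  for (const ToySafepoint* sfpt : interfering) {
    if (sfpt->oop_map.count(referent) == 0) {
      return false;  // a GC at this safepoint could reclaim the referent prematurely
    }
  }
  return true;
}
```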

src/hotspot/share/opto/reachability.cpp line 71:

> 69:  * Unfortunately, it's not straightforward to stay with safepoint-attached representation till the very end,
> 70:  * because information about derived oops is attached to safepoints the very same similar way. So, for now RFs are
> 71:  * rematerialized at safepoints before RA (phase #3).

`the very same similar way` sounds a little funny. I'm also not quite seeing the problem yet. What is the issue with the edges being attached to safepoints here?

-------------

PR Review: https://git.openjdk.org/jdk/pull/25315#pullrequestreview-3196820681
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2330441117
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2330487392
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2330491632

From epeter at openjdk.org  Mon Sep  8 14:52:19 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 14:52:19 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3]
In-Reply-To: <ejAu9M0FYELqOdzDW8uankmdRt0w8bloAwcxWcyx5k0=.9a47c6c4-e9df-40f5-aba9-23073a12bd17@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <hcmJkTFFU9oEe5LE1Q5974w0KM5Pe6SLi0uKLAxU7rM=.70a3f0d7-eb32-44b0-b819-1c4db6273976@github.com>
 <UKkT1Wqi4ftj3eGF2KzT8saeWoWSBTXx5kw0FOiJyLE=.c10dbf15-3348-495b-b9aa-556b78bc1e0b@github.com>
 <ejAu9M0FYELqOdzDW8uankmdRt0w8bloAwcxWcyx5k0=.9a47c6c4-e9df-40f5-aba9-23073a12bd17@github.com>
Message-ID: <GNv8IJhp5805ZBw64DNq7LKRXg1UUrw2i_dGn1Xc7UU=.e08dff4b-ffd5-43b9-8ca6-646a2141c965@github.com>

On Wed, 3 Sep 2025 20:14:56 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> src/hotspot/share/opto/reachability.cpp line 51:
>> 
>>> 49:  *
>>> 50:  * It looks attractive to get rid of RF nodes early and transfer to safepoint-attached representation,
>>> 51:  * but it is not correct until loop opts are done.
>> 
>> Why is it not correct? What could go wrong? Why is it safe to do it after loop opts?
>
> Live ranges of values are routinely extended during loop opts. And it can break the invariant that all interfering safepoints contain the referent in their oop map. (If an interfering safepoint doesn't keep the referent alive, then it becomes possible for the referent to be prematurely GCed.)  
> 
> After loop opts are over, it becomes possible to reliably enumerate all interfering safepoints and ensure the referent is present in their oop maps.

Can you make sure this explanation is in the comment ;)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2330449580

From rcastanedalo at openjdk.org  Mon Sep  8 15:43:10 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 8 Sep 2025 15:43:10 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT
In-Reply-To: <1uDOe3Oe-hihmDHea2h8vcvRZsKKBeNp0J9lKYUujxk=.abd111bc-3625-4c71-bfa2-0a4c1f4d3875@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
 <1uDOe3Oe-hihmDHea2h8vcvRZsKKBeNp0J9lKYUujxk=.abd111bc-3625-4c71-bfa2-0a4c1f4d3875@github.com>
Message-ID: <VOmxZ5c0SKETC1N8-S-WrvXIU8qCaA5NMS_68UGwVDc=.00bbe82b-dd42-4393-b57b-9df634a12d88@github.com>

On Thu, 4 Sep 2025 07:44:52 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> Hi Cesar, thanks for addressing this issue. I will run some more comprehensive testing and have a look at it in the next days.

Testing did not reveal any issues. I have, however, a high-level question: could the current two-step design ([SR state adjustment loop](https://github.com/openjdk/jdk/blob/166ef5e7b1c6d6a9f0f1f29fedb7f65b94f53119/src/hotspot/share/opto/escape.cpp#L300-L315) followed by an [NSR propagation loop](https://github.com/openjdk/jdk/blob/166ef5e7b1c6d6a9f0f1f29fedb7f65b94f53119/src/hotspot/share/opto/escape.cpp#L318-L320)) miss marking allocations as NSR in more complex scenarios, e.g. involving longer points-to/merge chains? Wouldn't it be more principled to re-run the SR state adjustment loop until a fixed point is reached, keeping `reducible_merges` consistent as new allocations are discovered to be NSR? (e.g. by calling `revisit_reducible_phi_status` - with your clean-up applied - every time [an allocation is marked as NSR due to non-removable merges](https://github.com/openjdk/jdk/blob/166ef5e7b1c6d6a9f0f1f29fedb7f65b94f53119/src/hotspot/share/opto/escape.cpp#L2962-L2964)).
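To illustrate the fixed-point idea in isolation, here is a toy C++ sketch. The model is a deliberate simplification and an assumption, not the logic in `escape.cpp`: `ToyMerge` and `propagate_nsr_to_fixed_point` are hypothetical, and the rule "one NSR input makes a merge non-reducible and all of its inputs NSR" stands in for the real conditions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model: each merge (Phi) lists the allocations flowing into it.
struct ToyMerge {
  std::vector<int> allocs;
  bool reducible = true;
};

// Repeats the adjustment until nothing changes, so NSR marks propagate across
// arbitrarily long chains of merges that share allocations.
void propagate_nsr_to_fixed_point(std::vector<ToyMerge>& merges, std::vector<bool>& nsr) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (ToyMerge& m : merges) {
      if (!m.reducible) continue;
      bool has_nsr_input = false;
      for (int a : m.allocs) has_nsr_input = has_nsr_input || nsr[a];
      if (!has_nsr_input) continue;
      m.reducible = false;  // keep the set of reducible merges consistent
      for (int a : m.allocs) {
        if (!nsr[a]) {
          nsr[a] = true;
          changed = true;   // a new NSR allocation may affect earlier merges
        }
      }
    }
  }
}
```

In the chain used below (merges {0,1}, {1,2}, {2,3}, only allocation 3 initially NSR), a single in-order pass would mark only allocation 2; the loop keeps iterating until allocations 1 and 0 are marked as well.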

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27063#issuecomment-3266887455

From dlunden at openjdk.org  Mon Sep  8 15:49:27 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Mon, 8 Sep 2025 15:49:27 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <ce_3y-SiKQOv1BaliDzNA3rWZMuJDwHeCSUAU5hTxyY=.e14a6019-9e7b-416c-bf16-da62ce46d210@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
 <AlpIiTJMMMbzzHhby7ZRsvE5HRO4KaOSesck96YewtY=.bc61ee1f-2b40-4c43-81e1-9feb66151de9@github.com>
 <ce_3y-SiKQOv1BaliDzNA3rWZMuJDwHeCSUAU5hTxyY=.e14a6019-9e7b-416c-bf16-da62ce46d210@github.com>
Message-ID: <cEFtp9nMZF_fWFvs9HqaIkmNjr_cgkfT3bJtXMWWtMo=.abd4d017-4330-43b1-acf8-fe5d7608b73d@github.com>

On Tue, 2 Sep 2025 14:08:11 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> The main issue is that register masks are stored as part of certain nodes, and nodes get copied by `Node::clone`. If someone in the future decides to add a register mask to some type of node, and forgets to add a special case (like what I've now added for `MachProj`) in `Node::clone` for the node type, this safeguard will catch it and complain.
>> 
>> Register masks are used in peculiar ways throughout C2, and there may be other unexpected cases as well that this safeguard catches. I doubt the `_read_only` part has a measurable performance effect, I only added it because it was easy and couldn't hurt.
>
>> The main issue is that register masks are stored as part of certain nodes, and nodes get copied by Node::clone
> 
> Ok, that answers it for me. Maybe you can expand the comment a little where you mention that masks are `shallowly copied`

Sure, will do.
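The hazard described above fits in a few lines of C++: when an object only points at its mask bits, a memberwise copy (roughly what `Node::clone` does for node types it does not special-case, per the explanation above) leaves two objects sharing the same bits, and a read-only flag turns an accidental write into something detectable. `ToyMask` is hypothetical, not HotSpot's `RegMask`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mask whose bits live in storage the object only points to.
// The implicit copy constructor copies the pointer: a shallow copy.
class ToyMask {
 public:
  ToyMask(uint64_t* storage, bool read_only) : _bits(storage), _read_only(read_only) {}

  bool member(unsigned i) const { return (*_bits >> i) & 1; }

  // Refuses to write through a read-only mask. A debug-only safeguard would
  // more likely assert; returning false keeps this sketch testable.
  bool insert(unsigned i) {
    if (_read_only) {
      return false;
    }
    *_bits |= uint64_t(1) << i;
    return true;
  }

 private:
  uint64_t* _bits;
  bool _read_only;
};
```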

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2330650060

From dlunden at openjdk.org  Mon Sep  8 15:56:28 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Mon, 8 Sep 2025 15:56:28 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <INWolnROoCsmEkID5uTRUD-dIEd_-V5AWST3c2BEtlA=.4a5ce5ec-92f5-470c-aa0b-9c8985c882be@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
 <dqxTytPsW2lYZV-H1GTUmXITgosk7E7__pGsbUPeXCU=.154f7378-0e1f-4b0d-a5b1-9dc6003fd411@github.com>
 <INWolnROoCsmEkID5uTRUD-dIEd_-V5AWST3c2BEtlA=.4a5ce5ec-92f5-470c-aa0b-9c8985c882be@github.com>
Message-ID: <7oAZrBdRb6r_63mYjkvgPVjc_eTbqVwtD0SSp33MOzo=.54688223-6b53-4811-89b4-e1a1eac60355@github.com>

On Tue, 2 Sep 2025 14:16:18 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Yes, you are correct. There is a detailed explanation in `x86_64.ad` ("Definition of frame structure and management information").
>
> Ok. But that's not immediately apparent here. If you already have a comment, why not mention caller/callee or inner/outer scope?

Sure, I'll add that.

>> Right, we should probably update this terminology as well. It comes from the fact that register masks can always represent all registers (+ a few stack slots), and anything beyond the mask is necessarily additional stack slots. So, if `_all_stack` is set, it means the register mask includes all of the stack slots. Any suggestion for a better name?
>
> So that could mean that we have stack slots that are in the mask, and that are off, but we still have `_all_stack = true`, right? That sounds a little contradictory to me.
> 
> Some ideas:
> - `_value_of_bits_above_mask` - though strictly speaking the mask also represents those bits, and so they are not really "above" the mask.
> - `_value_of_bits_above_...` ah it is above the register mask `size`, right? Of course, it is a bit suboptimal that `size` only covers the bits we explicitly represent, and does not capture those we implicitly represent. Maybe you can think about naming here too. Optional.

I agree that the current naming is a bit contradictory, but I'm not sure how to rename it. I'll think a bit and propose something in the renaming-PR.
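The semantics under discussion can be sketched as follows, assuming the description above: bits below `size()` are stored explicitly, and every bit at or above `size()` implicitly takes the value of one flag (the role `_all_stack` plays in the code under review). `ToyRegMask` and `_bits_above_size` are hypothetical names, the latter echoing the `_value_of_bits_above_...` idea.

```cpp
#include <cassert>
#include <vector>

// Hypothetical illustration, not HotSpot's RegMask.
class ToyRegMask {
 public:
  ToyRegMask(unsigned size, bool bits_above_size)
    : _bits(size, false), _bits_above_size(bits_above_size) {}

  unsigned size() const { return static_cast<unsigned>(_bits.size()); }

  // Only explicitly represented bits can be changed individually.
  void set(unsigned i) {
    assert(i < size());
    _bits[i] = true;
  }

  bool member(unsigned i) const {
    return i < size() ? static_cast<bool>(_bits[i]) : _bits_above_size;
  }

 private:
  std::vector<bool> _bits;
  bool _bits_above_size;  // value of every implicitly represented bit
};
```

With the flag set, explicitly represented stack slots can still be clear, which is exactly the tension with the name `_all_stack` raised above.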

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2330673189
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2330663867

From dlunden at openjdk.org  Mon Sep  8 16:23:29 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Mon, 8 Sep 2025 16:23:29 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <GdM72hQe1NvODLC6vcGtXrL5GnMA2c6IsRcdVW3z6r8=.740386db-28e4-46b7-a321-2218dfe2d846@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
 <_a6JVBA326t8l1U3ZI8C-J3Ju5jm-RklBFGtnR7fbyY=.70638135-7577-44dc-a212-fe5e39b1f5fa@github.com>
 <RF4t87nYZFpJ461_rsT41aOIRigwmG6leniy4j9-QaA=.3e079800-bb8c-4e54-8a60-a060c30b5796@github.com>
 <GdM72hQe1NvODLC6vcGtXrL5GnMA2c6IsRcdVW3z6r8=.740386db-28e4-46b7-a321-2218dfe2d846@github.com>
Message-ID: <zJ_64S_33_XM0PJXxKU5cVJKeawayMUaWU7E0iBKkKw=.d046b0de-1772-4904-916a-cfca5034f634@github.com>

On Tue, 2 Sep 2025 14:38:45 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> Hmm ok. Now I went to `rm_up` and thought that you would do `i - _offset`. But that's not what happens.
>> 
>> Hmm but then here there is a subtraction:
>> 
>>   bool Member(OptoReg::Name reg) const {
>>     reg = reg - offset_bits();
>> 
>> 
>> Is that consistent? I hope you understand why I'm confused.
>
> Yes, the subtraction is consistent, because if the register mask is offset, we can no longer use the OptoReg to directly index the mask. Small simplified example: register mask with 5 bits, offset by 10. First bit (index 0) represents OptoReg 10, second bit (index 1) represents OptoReg 11, etc. If we call `Member(15)`, we need to subtract the offset so we look at the correct index in the register mask (index 5).

Ah, I think I now better understand your question. `rm_up` is a low-level method for internal use in `regmask.hpp` and `regmask.cpp` only (perhaps I should prepend it with an underscore?). It basically makes it so that we can regard the backing storage (`_RM_UP` and `_RM_UP_EXT`) as one contiguous array. `Member` is exposed externally and so needs the offset logic.
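The split between the internal `rm_up` helper and the externally visible `Member` might look like this in a self-contained sketch. Only the roles of `_RM_UP`, `_RM_UP_EXT`, `rm_up`, `Member`, and the offset come from the discussion; the 64-bit words, the two inline words, the lowercase names, and returning `false` for registers outside the represented range are simplifying assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch, not HotSpot's RegMask: a fixed inline array plus an
// extension array, indexed as if they were one contiguous array.
class ToyOffsetMask {
 public:
  static constexpr unsigned kBitsPerWord = 64;
  static constexpr unsigned kInlineWords = 2;

  ToyOffsetMask(uint64_t* ext, unsigned ext_words, int offset_bits)
    : _rm_up{0, 0}, _rm_up_ext(ext), _ext_words(ext_words), _offset_bits(offset_bits) {}

  // External API: takes a global register number, so the offset is subtracted first.
  bool member(int reg) const {
    int idx = reg - _offset_bits;
    if (idx < 0 || unsigned(idx) >= total_bits()) return false;  // simplification
    return (rm_up(idx / kBitsPerWord) >> (idx % kBitsPerWord)) & 1;
  }

  void insert(int reg) {
    int idx = reg - _offset_bits;
    assert(idx >= 0 && unsigned(idx) < total_bits());
    rm_up(idx / kBitsPerWord) |= uint64_t(1) << (idx % kBitsPerWord);
  }

 private:
  unsigned total_bits() const { return (kInlineWords + _ext_words) * kBitsPerWord; }

  // Internal helper: word i of the logical array, whichever storage backs it.
  uint64_t& rm_up(unsigned i) { return i < kInlineWords ? _rm_up[i] : _rm_up_ext[i - kInlineWords]; }
  const uint64_t& rm_up(unsigned i) const { return i < kInlineWords ? _rm_up[i] : _rm_up_ext[i - kInlineWords]; }

  uint64_t _rm_up[kInlineWords];
  uint64_t* _rm_up_ext;
  unsigned _ext_words;
  int _offset_bits;
};
```

With an offset of 10, register 140 maps to logical bit 130, i.e. bit 2 of the first extension word.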

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2330742374

From epeter at openjdk.org  Mon Sep  8 17:07:31 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 17:07:31 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
Message-ID: <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>

On Fri, 5 Sep 2025 08:13:35 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> Some crashes are consequences of earlier misshaped ideal graphs, which could be detected earlier, closer to the source, before the possibly many transformations that lead to the crash.
>> 
>> Let's verify that the ideal graph is well-shaped earlier then! I propose such a feature here. This runs after IGVN, because at this point the graph should be cleaned up of any weirdness introduced earlier or during IGVN.
>> 
>> This feature is enabled with the develop flag `VerifyIdealStructuralInvariants`. Open to renaming, no problem with me! This feature is only available in debug builds, and most of the code is not even compiled in product builds, since it uses some debug-only functions, such as `Node::dump` or `Node::Name`.
>> 
>> For now, only local checks are implemented: they are checks that only look at a node and its neighborhood, wherever it happens in the graph. Typically: under an `If` node, we have an `IfTrue` and an `IfFalse`. To ease development, each check is implemented in its own class, independently of the others. Nevertheless, one always needs to do the same kind of things: checking there is an output of such type, checking there are N inputs, that the k-th input has such type... To ease writing such checks in a readable way, and in a less error-prone way than a pile of copy-pasted code that manually traverses the graph, I propose a set of compositional helpers to write patterns that can be matched against the ideal graph. Since these patterns are... patterns, and so not related to a specific graph, they can be allocated once and for all. When used, one provides the node (called center) around which one wants to check if the pattern holds.
>> 
>> On top of making the description of patterns easier, these helpers allow nice printing in case of error, by showing the path from the center to the violating node. For instance (made up for the purpose of showing the formatting), a violation with a path climbing only inputs:
>> 
>> 1 failure for node
>>  211  OuterStripMinedLoopEnd  === 215 39  [[ 212 198 ]] P=0,948966, C=23799,000000
>> At node
>>     209  CountedLoopEnd  === 182 208  [[ 210 197 ]] [lt] P=0,948966, C=23799,000000 !orig=[196] !jvms: StringLatin1::equals @ bci:12 (line 100)
>>   From path:
>>     [center] 211  OuterStripMinedLoopEnd  === 215 39  [[ 212 198 ]] P=0,948966, C=23799,000000
>>       <-(0)- 215  SafePoint  === 210 1 7 1 1 216 37 54 185  [[ 211 ]]  SafePoint  !orig=186 !jvms: StringLatin1::equals @ bci:29 (line 100)
>>       <-(0)- 210  IfFalse  === 209  [[ 21...
>
> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
> 
>   One more ResourceMark

Alright, I have another barrage of comments.

Things are much better already.

Though it would be good to discuss a bit more how the patterns now look, especially if this becomes something that we do more widely eventually.
I would like to know what are the advantages and disadvantages, and what alternatives we would have ;)

src/hotspot/share/opto/compile.cpp line 702:

> 700:       ,
> 701:       _in_dump_cnt(0),
> 702:       _invariant_checker(GraphInvariantChecker::make_default())

How does this interact with `ResourceMark`s?
Because it is now resource allocated. And so is the `_checks`.
How does this not trip the nesting asserts of allocation there?
I'm probably missing something here.

I would have expected that we need to allocate it from the `_comp_arena`.

src/hotspot/share/opto/graphInvariants.cpp line 32:

> 30: constexpr int LocalGraphInvariant::OutputStep;
> 31: 
> 32: void LocalGraphInvariant::LazyReachableCFGNodes::fill() {

Nit: I would call it `compute`. `fill` sounds like you are going to fill the nodes themselves or something.
And: I know they are synonyms. But why do you use both `reachable` (class name) and `live` (array)?

src/hotspot/share/opto/graphInvariants.cpp line 45:

> 43:       }
> 44:     }
> 45:   }

It seems you are assuming that all CFG nodes are reachable "from below".
That is true in most cases... but:
Have we not had this pesky case of an "infinite loop", where the loop is not reachable from below, but is reachable from above?

See `_root_and_safepoints` in `PhaseCCP`. I'm not sure we need to worry about this, but I'd like to be sure that we have considered infinite loops here.

The risk is that otherwise you just call those nodes dead, and do not verify them, right? Or you would just ignore failures there.
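A toy illustration of the concern, assuming the verifier collects live CFG nodes by walking control inputs backwards from Root: a loop with no exit is never reached that way, whereas also seeding the walk with the loop's safepoint (the idea behind `_root_and_safepoints` in `PhaseCCP`) finds it. `ToyCfgNode` and `reachable_from` are hypothetical.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical toy CFG node: only its control inputs are modeled.
struct ToyCfgNode {
  std::vector<const ToyCfgNode*> ins;
};

// Collects every node reachable by walking control inputs backwards
// ("from below") from the given seeds.
std::unordered_set<const ToyCfgNode*> reachable_from(const std::vector<const ToyCfgNode*>& seeds) {
  std::unordered_set<const ToyCfgNode*> seen;
  std::vector<const ToyCfgNode*> worklist(seeds.begin(), seeds.end());
  while (!worklist.empty()) {
    const ToyCfgNode* n = worklist.back();
    worklist.pop_back();
    if (n == nullptr || !seen.insert(n).second) {
      continue;  // skip null inputs and already visited nodes
    }
    for (const ToyCfgNode* in : n->ins) {
      worklist.push_back(in);
    }
  }
  return seen;
}
```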

src/hotspot/share/opto/graphInvariants.cpp line 61:

> 59:  * and compositional to express complex structures from simple properties.
> 60:  * For instance, we have a pattern for saying "the first input of the center match P" where P is another
> 61:  * Pattern. We end up with trees of patterns matching the graph.

`the first input of the center match P` does not sound like a proper assertion.
Some alternatives I could think of:
- `match P on the first input of center` ok
- `the first input of center must match P` ok
- `match the first input of center with P` meh

src/hotspot/share/opto/graphInvariants.cpp line 63:

> 61:  * Pattern. We end up with trees of patterns matching the graph.
> 62:  */
> 63: struct Pattern : ResourceObj {

Why not move the `Pattern` classes to a separate `pattern.hpp/cpp`? If we did ever use them for `IGVN`, then it would not make so much sense to have them in `graphInvariants.cpp`, right?

src/hotspot/share/opto/graphInvariants.cpp line 64:

> 62:  */
> 63: struct Pattern : ResourceObj {
> 64:   virtual bool check(const Node* center, Node_List& steps, GrowableArray<int>& path, stringStream&) const = 0;

Since this is the abstract class, it could make sense to define all inputs, as well as their invariants: precondition / postcondition. Why do all args have a name except the stream?

src/hotspot/share/opto/graphInvariants.cpp line 67:

> 65: };
> 66: 
> 67: /* This pattern just accepts any node. This is convenient mostly as leaves in a pattern tree.

Suggestion:

/* This pattern just accepts any node. This is convenient mostly as leaf in a pattern tree.

I think this is a bit more consistently singular? Optional.

src/hotspot/share/opto/graphInvariants.cpp line 116:

> 114: private:
> 115:   const N*& _binding;
> 116: };

Would it not make sense to move it a bit closer to the related code? Do you need it much before `NodeClassIsAndBind`?

src/hotspot/share/opto/graphInvariants.cpp line 128:

> 126:  *    new AtInput(1, P1),
> 127:  *    new AtInput(2, P2),
> 128:  * )

`In particular, check a node has enough inputs`: At first it is not clear if the code already does that, or if the user is supposed to do it. Why "in particular"? Does the statement clarify what you just said? Ah no, you are saying that it is best practice to check the number of inputs first for good reporting :)
Suggestion:

 * Evaluation order is guaranteed to be left-to-right.
 * Good practice:
 *   To get better reporting, the number of inputs should be checked first, before checking concrete inputs.
 *    If you know a node has 3 inputs and want patterns to be applied to each input, it would look like
 *   And::make(
 *      new HasExactlyNInputs(3),
 *      new AtInput(0, P0),
 *      new AtInput(1, P1),
 *      new AtInput(2, P2),
 *   )
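To make the discussion of the pattern approach concrete, here is a self-contained toy version of the combinators. It is a sketch under assumptions, not the PR's code: `ToyNode`, `NameIs`, and the use of `std::shared_ptr` instead of resource allocation are illustrative, and `AtInput` reports an out-of-range index as a failure rather than asserting, along the lines proposed for `AtInput` in this review.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct ToyNode {  // hypothetical: a name and its inputs
  std::string name;
  std::vector<const ToyNode*> ins;
};

// A pattern checks a property around a center node and explains failures in ss.
struct Pattern {
  virtual ~Pattern() = default;
  virtual bool check(const ToyNode* center, std::ostringstream& ss) const = 0;
};
using PatternPtr = std::shared_ptr<const Pattern>;

struct HasExactlyNInputs : Pattern {
  explicit HasExactlyNInputs(std::size_t n) : _n(n) {}
  bool check(const ToyNode* center, std::ostringstream& ss) const override {
    if (center->ins.size() == _n) return true;
    ss << "Unexpected number of inputs. Expected exactly: " << _n
       << ". Found: " << center->ins.size() << "\n";
    return false;
  }
 private:
  std::size_t _n;
};

struct NameIs : Pattern {
  explicit NameIs(std::string name) : _name(std::move(name)) {}
  bool check(const ToyNode* center, std::ostringstream& ss) const override {
    if (center->name == _name) return true;
    ss << "Unexpected node " << center->name << ", expected " << _name << "\n";
    return false;
  }
 private:
  std::string _name;
};

// Applies a pattern to one input; a bad index is reported, not asserted.
struct AtInput : Pattern {
  AtInput(std::size_t which, PatternPtr pattern) : _which(which), _pattern(std::move(pattern)) {}
  bool check(const ToyNode* center, std::ostringstream& ss) const override {
    if (_which >= center->ins.size()) {
      ss << "Input index " << _which << " is out of range\n";
      return false;
    }
    const ToyNode* in = center->ins[_which];
    if (in == nullptr) {
      ss << "Input at index " << _which << " is nullptr\n";
      return false;
    }
    bool success = _pattern->check(in, ss);
    if (!success) {
      ss << "  at input " << _which << " of " << center->name << "\n";  // trace the path back
    }
    return success;
  }
 private:
  std::size_t _which;
  PatternPtr _pattern;
};

// Evaluates sub-patterns left to right and stops at the first failure.
struct And : Pattern {
  explicit And(std::vector<PatternPtr> checks) : _checks(std::move(checks)) {}
  bool check(const ToyNode* center, std::ostringstream& ss) const override {
    for (const PatternPtr& p : _checks) {
      if (!p->check(center, ss)) return false;
    }
    return true;
  }
 private:
  std::vector<PatternPtr> _checks;
};
```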

src/hotspot/share/opto/graphInvariants.cpp line 159:

> 157:       if (!_checks.at(i)->check(center, steps, path, ss)) {
> 158:         return false;
> 159:       }

Why do you not update steps and path here? If there is a reason, add a comment ;)
I suppose it is because you don't step to another `center`?

src/hotspot/share/opto/graphInvariants.cpp line 175:

> 173:     }
> 174:   }
> 175: }

`make` suggests that this is a factory pattern. But it rather just prints / dumps.
I'd suggest `print_list_of_inputs`.

src/hotspot/share/opto/graphInvariants.cpp line 181:

> 179:   bool check(const Node* center, Node_List& steps, GrowableArray<int>& path, stringStream& ss) const override {
> 180:     if (center->req() != _expect_req) {
> 181:       ss.print_cr("Unexpected number of input. Expected: %d. Found: %d", _expect_req, center->req());

Suggestion:

      ss.print_cr("Unexpected number of inputs. Expected exactly: %d. Found: %d", _expect_req, center->req());

Something should say that the expected number was exact.

src/hotspot/share/opto/graphInvariants.cpp line 187:

> 185:     return true;
> 186:   }
> 187:   const uint _expect_req;

Suggestion:

private:
  const uint _expect_req;

src/hotspot/share/opto/graphInvariants.cpp line 194:

> 192:   bool check(const Node* center, Node_List& steps, GrowableArray<int>& path, stringStream& ss) const override {
> 193:     if (center->req() < _expect_req) {
> 194:       ss.print_cr("Too small number of input. Expected: %d. Found: %d", _expect_req, center->req());

Grammar: Either "Too few inputs" or "Number of inputs too small".

src/hotspot/share/opto/graphInvariants.cpp line 200:

> 198:     return true;
> 199:   }
> 200:   const uint _expect_req;

Suggestion:

private:
  const uint _expect_req;

src/hotspot/share/opto/graphInvariants.cpp line 211:

> 209:   AtInput(uint which_input, const Pattern* pattern) : _which_input(which_input), _pattern(pattern) {}
> 210:   bool check(const Node* center, Node_List& steps, GrowableArray<int>& path, stringStream& ss) const override {
> 211:     assert(_which_input < center->req(), "Input number is out of range");

Hmm. Could still be nice if we did our best here, and responded nicely.
Just in case someone messes up the pattern, and then we get an assert here.
Maybe the bug is hard to reproduce, and having the printed statements would have helped a little?

src/hotspot/share/opto/graphInvariants.cpp line 215:

> 213:       ss.print_cr("Input at index %d is nullptr.", _which_input);
> 214:       return false;
> 215:     }

So we would never do `AtInput(0, ExpectNullptr())` for example?
Fine with me, just an idea to consider ;)

src/hotspot/share/opto/graphInvariants.cpp line 222:

> 220:     }
> 221:     return result;
> 222:   }

Would this not read better?
Suggestion:

    bool success = _pattern->check(center->in(_which_input), state);
    if (!success) {
      state.trace_failure_path(center, _which_input);
    }
    return success;
  }

src/hotspot/share/opto/graphInvariants.cpp line 224:

> 222:   }
> 223:   const uint _which_input;
> 224:   const Pattern* const _pattern;

Suggestion:

private:
  const uint _which_input;
  const Pattern* const _pattern;

src/hotspot/share/opto/graphInvariants.cpp line 234:

> 232:   bool check(const Node* center, Node_List& steps, GrowableArray<int>& path, stringStream& ss) const override {
> 233:     if (!(center->*_type_check)()) {
> 234:       ss.print_cr("Unexpected type: %s.", center->Name());

Is there a way we could say what we actually do expect? Not really, right? We'd need to do it via macro again.

src/hotspot/share/opto/graphInvariants.cpp line 239:

> 237:     return true;
> 238:   }
> 239:   bool (Node::*_type_check)() const;

Suggestion:

private:
  bool (Node::*_type_check)() const;

I would also suggest that you use a `typedef` here.
Something like:
`typedef bool (Node::*TypeCheckMethod)() const;`
Then you can write
Suggestion:

private:
  const TypeCheckMethod _type_check;
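For reference, here is how the suggested `typedef` reads in a minimal, self-contained setting. `ToyNode` stands in for `Node`, and `NodeClassIs` is a hypothetical pattern class holding the member-function pointer.

```cpp
#include <cassert>

// Hypothetical stand-in for Node with two of its is_XXX() type queries.
struct ToyNode {
  int kind;
  bool is_Phi() const { return kind == 1; }
  bool is_Region() const { return kind == 2; }
};

// Naming the pointer-to-member-function type once keeps declarations readable.
typedef bool (ToyNode::*TypeCheckMethod)() const;

class NodeClassIs {
 public:
  explicit NodeClassIs(TypeCheckMethod type_check) : _type_check(type_check) {}
  // Invokes the stored member function on center.
  bool check(const ToyNode* center) const { return (center->*_type_check)(); }

 private:
  const TypeCheckMethod _type_check;
};
```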

src/hotspot/share/opto/graphInvariants.cpp line 282:

> 280: 
> 281:   bool check(const Node* center, Node_List& steps, GrowableArray<int>& path, stringStream& ss) const override {
> 282:     Node_List outputs_of_correct_type;

You should probably have a `ResourceMark` here.
Or just avoid the allocation by first only holding a pointer, and then if you find multiple you just traverse again.
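A sketch of the allocation-free alternative: remember only the first match and count the rest; the rare failure path can traverse the outputs again to print them. `ToyNode` and `unique_output` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyNode {  // hypothetical: a name and its outputs
  std::string name;
  std::vector<const ToyNode*> outs;
};

// Returns the unique output satisfying pred, or nullptr if there are zero or
// several; count reports how many outputs matched. No temporary list is built.
template <typename Pred>
const ToyNode* unique_output(const ToyNode* center, Pred pred, int& count) {
  const ToyNode* found = nullptr;
  count = 0;
  for (const ToyNode* out : center->outs) {
    if (pred(out)) {
      if (count == 0) found = out;
      count++;
    }
  }
  return count == 1 ? found : nullptr;
}
```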

src/hotspot/share/opto/graphInvariants.cpp line 304:

> 302:   }
> 303:   bool (Node::*_type_check)() const;
> 304:   const Pattern* const _pattern;

Suggestion:

private:
  bool (Node::*_type_check)() const;
  const Pattern* const _pattern;

src/hotspot/share/opto/graphInvariants.cpp line 307:

> 305: };
> 306: 
> 307: /* A LocalGraphInvariant that mostly use a Pattern for checking.

Suggestion:

/* A LocalGraphInvariant that mostly uses a Pattern for checking.

src/hotspot/share/opto/graphInvariants.cpp line 312:

> 310:  */
> 311: struct PatternBasedCheck : LocalGraphInvariant {
> 312:   const Pattern* const _pattern;

Suggestion:

private:
  const Pattern* const _pattern;
public:

src/hotspot/share/opto/graphInvariants.cpp line 336:

> 334:       return CheckResult::NOT_APPLICABLE;
> 335:     }
> 336:     CheckResult r = PatternBasedCheck::check(center, reachable_cfg_nodes, steps, path, ss);

Suggestion:

    CheckResult result = PatternBasedCheck::check(center, reachable_cfg_nodes, state);

Packing the 3 args would give us some extra space to write out a name for `r` ;)

src/hotspot/share/opto/graphInvariants.cpp line 351:

> 349:  */
> 350: struct PhiArity : PatternBasedCheck {
> 351:   const RegionNode* region_node = nullptr;

Suggestion:

private:
  const RegionNode* _region_node = nullptr;

You've been giving fields the `_` prefix consistently up to now, as we usually do in HotSpot ;)

src/hotspot/share/opto/graphInvariants.cpp line 359:

> 357:                     0,
> 358:                     NodeClassIsAndBind(Region, region_node)))) {
> 359:   }

Are there Phis that only have the ctrl input? I'd be quite surprised if they did not have at least a single data input. What do you think?

src/hotspot/share/opto/graphInvariants.cpp line 378:

> 376:     return CheckResult::VALID;
> 377:   }
> 378: };

I am wondering if it is really worth it to do the whole pattern matching approach, if we still have to write so much code.

There is a lot of boilerplate now that has replaced the procedural code.

I'm just wondering if we are there yet, or if we need to find some way to make it more concise.
Maybe we can do something like this:

return <something>
       .applies_if(&Node::is_Phi)
       .check([&]() { return PatternBasedCheck::check(center, reachable_cfg_nodes, steps, path, ss); })
       .require(...)
       .finish();

Just an idea. It would probably be lambda based again, which has its disadvantages.
Maybe you have an even better idea.
I'd just like to understand why the Pattern based approach is really super desirable, what are the advantages and disadvantages?
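One way the fluent idea could look, as a self-contained sketch: `CheckBuilder`, `check_phi`, and the input-count rule are hypothetical, not an existing HotSpot API. It relies on `std::function` and lambdas; HotSpot code generally restricts use of the C++ standard library, so a real version would need a different mechanism.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>

enum class CheckResult { VALID, FAILED, NOT_APPLICABLE };

struct ToyNode {  // hypothetical: just enough state to express a check
  bool phi;
  unsigned num_inputs;
};

// Each step is skipped once the result is no longer VALID, so the first
// inapplicable or failing step determines the outcome.
class CheckBuilder {
 public:
  CheckBuilder(const ToyNode* center, std::ostringstream& ss) : _center(center), _ss(ss) {}

  CheckBuilder& applies_if(bool (*predicate)(const ToyNode*)) {
    if (_result == CheckResult::VALID && !predicate(_center)) {
      _result = CheckResult::NOT_APPLICABLE;
    }
    return *this;
  }

  CheckBuilder& require(const std::function<bool()>& condition, const std::string& message) {
    if (_result == CheckResult::VALID && !condition()) {
      _ss << message << "\n";
      _result = CheckResult::FAILED;
    }
    return *this;
  }

  CheckResult finish() const { return _result; }

 private:
  const ToyNode* _center;
  std::ostringstream& _ss;
  CheckResult _result = CheckResult::VALID;
};

bool is_phi(const ToyNode* n) { return n->phi; }

// Example check in the fluent style (the input-count rule is illustrative only).
CheckResult check_phi(const ToyNode* center, std::ostringstream& ss) {
  return CheckBuilder(center, ss)
      .applies_if(is_phi)
      .require([&]() { return center->num_inputs >= 2; }, "Too few inputs for Phi")
      .finish();
}
```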

src/hotspot/share/opto/graphInvariants.cpp line 400:

> 398:     }
> 399: 
> 400:     uint cfg_out = ctrl_succ.size();

Suggestion:

    const uint cfg_out = ctrl_succ.size();

Though you could also use `ctrl_succ.size()` directly. Matter of taste.

src/hotspot/share/opto/graphInvariants.cpp line 421:

> 419:           ss.print("  ");
> 420:           ctrl_succ.at(i)->dump("\n", false, &ss);
> 421:         }

You repeat this 4x. Can we do something reasonable about that?

src/hotspot/share/opto/graphInvariants.cpp line 438:

> 436:         ss.print_cr("%s node must have at least one control successors. Found %d.", center->Name(), cfg_out);
> 437:         return CheckResult::FAILED;
> 438:       }

Is there some upper bound?

src/hotspot/share/opto/graphInvariants.cpp line 454:

> 452: };
> 453: 
> 454: /* Checks that Region Start and Root nodes' first input is a self loop, except for copy regions, which then must have only one non null input.

Suggestion:

/* Checks that Region, Start and Root nodes' first input is a self loop, except for copy regions, which then must have only one non null input.

src/hotspot/share/opto/graphInvariants.cpp line 487:

> 485:       if (non_null_inputs_count != 1) {
> 486:         // Should be a rare case, hence the second (but more expensive) traversal.
> 487:         Node_List non_null_inputs;

`ResourceMark`?

src/hotspot/share/opto/graphInvariants.cpp line 509:

> 507: // CountedLoopEnd -> IfTrue -> CountedLoop
> 508: struct CountedLoopInvariants : PatternBasedCheck {
> 509:   const BaseCountedLoopEndNode* counted_loop_end = nullptr;

Suggestion:

private:
  const BaseCountedLoopEndNode* _counted_loop_end = nullptr;
public:

src/hotspot/share/opto/graphInvariants.cpp line 528:

> 526:     if (!center->is_CountedLoop() && !center->is_LongCountedLoop()) {
> 527:       return CheckResult::NOT_APPLICABLE;
> 528:     }

Actually: why not apply that to `OuterStripMinedLoop` as well? Or any `BaseCountedLoop`? Are there more than these 3 cases? If there are ever more, they should probably also adhere to this backedge pattern; we'll just need an extension. But it would be nice to trip over something here if we ever do extend it.

src/hotspot/share/opto/graphInvariants.cpp line 547:

> 545:         return CheckResult::FAILED;
> 546:       }
> 547:     }

If you do add `OuterStripMinedLoop`, make it a switch, and assert in the default case ;)
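The switch-with-asserting-default shape could look like the sketch below. `LoopKind` and `expected_loop_end` are hypothetical; C2 distinguishes these nodes via `Node::is_XXX()` queries, so a real version would more likely switch on the node's opcode. The expected loop-end kinds follow the backedge shapes described in the PR comments.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum for illustration only.
enum class LoopKind { CountedLoop, LongCountedLoop, OuterStripMinedLoop };

// Names the loop-end node whose IfTrue projection is expected to feed the
// backedge. The default case fires (in debug builds) if a new kind is added
// without extending this check, instead of silently skipping it.
std::string expected_loop_end(LoopKind kind) {
  switch (kind) {
    case LoopKind::CountedLoop:         return "CountedLoopEnd";
    case LoopKind::LongCountedLoop:     return "LongCountedLoopEnd";
    case LoopKind::OuterStripMinedLoop: return "OuterStripMinedLoopEnd";
    default:
      assert(false && "unexpected loop kind: extend this check");
      return std::string();
  }
}
```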

src/hotspot/share/opto/graphInvariants.cpp line 552:

> 550: };
> 551: 
> 552: // CountedLoopEnd -> IfFalse -> SafePoint -> OuterStripMinedLoopEnd[center] -> IfTrue -> OuterStripMinedLoop -> CountedLoop

Could we close the loop, and check that the CountedLoop match via their backedge?

src/hotspot/share/opto/graphInvariants.hpp line 73:

> 71:    * In addition, if the check fails, it must write its error message in [ss].
> 72:    *
> 73:    * If the check succeeds or is not applicable, [steps], [path] and [ss] must be untouched.

I wonder if we should not have some object that represents these 3 args. You pass them everywhere, and they seem to be a unit. And they have invariants that we may want to check.
You could for example enforce that steps and path stay in sync simply by only providing access methods that preserve it.
What do you think?
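One way to read this suggestion, as a hedged sketch: a hypothetical `CheckTrace` class (the name and methods are illustrative, not from the PR) owns the three values, and steps and path can only be changed together. Standard containers stand in for `Node_List`, `GrowableArray<int>` and `stringStream`, and the sketch assumes one path entry per step.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Node;  // opaque here; stands in for HotSpot's Node

// Hypothetical bundle of the (steps, path, ss) triple. Steps and path are only
// modified together, so they cannot get out of sync.
class CheckTrace {
 public:
  void push(const Node* step, int input_index) {
    _steps.push_back(step);
    _path.push_back(input_index);
  }
  void pop() {
    assert(!_steps.empty());
    _steps.pop_back();
    _path.pop_back();
  }
  void fail(const std::string& msg) {
    _message += msg;
    _message += '\n';
  }

  std::size_t depth() const { return _steps.size(); }
  const std::string& message() const { return _message; }

  // The "untouched on success" rule can be checked by comparing against a
  // snapshot taken before the check ran.
  bool same_as(const CheckTrace& other) const {
    return _steps == other._steps && _path == other._path && _message == other._message;
  }

 private:
  std::vector<const Node*> _steps;
  std::vector<int> _path;
  std::string _message;
};
```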

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26362#pullrequestreview-3196932276
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330519851
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330535043
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330549479
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330572874
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330560727
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330581064
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330584310
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330600482
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330620646
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330682903
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330647056
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330634803
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330699234
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330643795
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330699535
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330652147
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330654896
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330691118
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330699783
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330709824
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330706819
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330732097
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330734834
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330736070
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330737470
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330747146
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330754630
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330750840
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330786828
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330798028
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330801396
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330803456
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330805158
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330814454
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330815461
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330820754
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330822466
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330832982
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330674639

From epeter at openjdk.org  Mon Sep  8 17:07:32 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Sep 2025 17:07:32 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
Message-ID: <r0rzQ5BGSRpGVAiK5E9zAZIYkU3gTPa7KjBATdETP6U=.5bff0999-2bfd-4f6e-9044-fcfb74e5e00d@github.com>

On Mon, 8 Sep 2025 15:17:43 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   One more ResourceMark
>
> src/hotspot/share/opto/graphInvariants.cpp line 61:
> 
>> 59:  * and compositional to express complex structures from simple properties.
>> 60:  * For instance, we have a pattern for saying "the first input of the center match P" where P is another
>> 61:  * Pattern. We end up with trees of patterns matching the graph.
> 
> `the first input of the center match P` does not sound like a proper assertion.
> Some alternatives I could think of:
> - `match P on the first input of center` ok
> - `the first input of center must match P` ok
> - `match the first input of center with P` meh

Also: are we `check`ing or `match`ing? I would pick one consistently.

> src/hotspot/share/opto/graphInvariants.cpp line 234:
> 
>> 232:   bool check(const Node* center, Node_List& steps, GrowableArray<int>& path, stringStream& ss) const override {
>> 233:     if (!(center->*_type_check)()) {
>> 234:       ss.print_cr("Unexpected type: %s.", center->Name());
> 
> Is there a way we could say what we actually do expect? Not really, right? We'd need to do it via macro again.

Or we pass a string: not nice, but it would work with the macro for `NodeClassIsAndBind`. Not sure what's best here.
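A minimal sketch of the "pass a string" option, assuming a macro that stringizes the query name so it is written only once. `ToyNode`, `TypeCheck` and `TYPE_CHECK` are hypothetical; the real code would use `Node`, `stringStream` and the existing `NodeClassIsAndBind` machinery.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy node with HotSpot-style is_X() queries.
struct ToyNode {
  bool phi;
  bool is_Phi() const { return phi; }
  bool is_Region() const { return !phi; }
  const char* Name() const { return phi ? "Phi" : "Region"; }
};

// A type-check pattern that remembers which query it runs, so a failure can
// report what was expected as well as what was found.
class TypeCheck {
 public:
  TypeCheck(bool (ToyNode::*check)() const, const char* expected)
      : _check(check), _expected(expected) {}

  bool check(const ToyNode& n, std::string& ss) const {
    if (!(n.*_check)()) {
      ss += "Unexpected type: ";
      ss += n.Name();
      ss += " (expected ";
      ss += _expected;
      ss += ").";
      return false;
    }
    return true;
  }

 private:
  bool (ToyNode::*_check)() const;
  const char* _expected;
};

// Stringizing the query keeps the member pointer and its printed name in sync.
#define TYPE_CHECK(query) TypeCheck(&ToyNode::query, #query)
```

With this, a failing check on a Region prints `Unexpected type: Region (expected is_Phi).`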

> src/hotspot/share/opto/graphInvariants.cpp line 378:
> 
>> 376:     return CheckResult::VALID;
>> 377:   }
>> 378: };
> 
> I am wondering if it is really worth it to do the whole pattern matching approach, if we still have to write so much code.
> 
> There is a lot of boilerplate now, that has replaced the procedural code.
> 
> I'm just wondering if we are there yet, or if we need to find some way to make it more concise.
> Maybe we can do something like this:
> 
> return <something>
>        .applies_if(&Node::is_Phi)
>        .check([&]() { return PatternBasedCheck::check(center, reachable_cfg_nodes, steps, path, ss); })
>        .require(...)
>        .finish();
> 
> Just an idea. It would probably be lambda based again, which has its disadvantages.
> Maybe you have an even better idea.
> I'd just like to understand why the Pattern based approach is really super desirable, what are the advantages and disadvantages?

One advantage is definitely reporting. And it is still reasonably debuggable, I think; my solution may be a little trickier that way.

I think there are multiple factors:
- Simple: fewer abstractions can be easier to read/debug.
- Concise: few lines of code.
- Reporting: nice output when rules fail.
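To weigh these factors, here is a hedged sketch of the fluent, lambda-based alternative floated above. `Rule`, `applies_if`, `require` and `run` are hypothetical names, `ToyNode` stands in for `Node`, and `std::string` stands in for `stringStream`. Each requirement carries its own message, which keeps the reporting advantage while staying fairly concise.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class CheckResult { VALID, NOT_APPLICABLE, FAILED };

// Hypothetical toy node; stands in for HotSpot's Node.
struct ToyNode {
  bool phi;
  int req;  // number of inputs
  bool is_Phi() const { return phi; }
};

// Hypothetical fluent rule: an applicability guard plus named requirements.
class Rule {
 public:
  Rule& applies_if(bool (ToyNode::*guard)() const) {
    _guard = guard;
    return *this;
  }
  Rule& require(std::string what, std::function<bool(const ToyNode&)> pred) {
    _reqs.emplace_back(std::move(what), std::move(pred));
    return *this;
  }
  CheckResult run(const ToyNode& n, std::string& ss) const {
    if (_guard != nullptr && !(n.*_guard)()) {
      return CheckResult::NOT_APPLICABLE;  // ss untouched
    }
    for (const auto& [what, pred] : _reqs) {
      if (!pred(n)) {
        ss += "Failed requirement: " + what;
        return CheckResult::FAILED;
      }
    }
    return CheckResult::VALID;
  }

 private:
  bool (ToyNode::*_guard)() const = nullptr;
  std::vector<std::pair<std::string, std::function<bool(const ToyNode&)>>> _reqs;
};
```

The trade-off is the one already noted: the lambdas make stepping through a failing rule in a debugger somewhat less direct than with dedicated pattern classes.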

> src/hotspot/share/opto/graphInvariants.cpp line 547:
> 
>> 545:         return CheckResult::FAILED;
>> 546:       }
>> 547:     }
> 
> If you do add `OuterStripMinedLoop`, make it a switch, and assert in the default case ;)

I just saw that you handle `OuterStripMinedLoop` below. A switch may still be good, both to capture the parallel structure and to accommodate possible future extensions.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330614759
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330713629
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330793156
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2330826144

From sparasa at openjdk.org  Mon Sep  8 21:44:52 2025
From: sparasa at openjdk.org (Srinivas Vamsi Parasa)
Date: Mon, 8 Sep 2025 21:44:52 GMT
Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2 [v3]
In-Reply-To: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
Message-ID: <AfOE7sHN1dScGX75jKtPn4UtyNcyK9oWi2hZyb1WV78=.04895073-20c8-4a7e-8642-90d2f7fc116d@github.com>

> This change extends Extended EVEX (EEVEX) to REX2/REX demotion for Intel APX NDD instructions to handle commutative operations when the destination register and the second source register (src2) are the same.
> 
> Currently, EEVEX to REX2/REX demotion is only enabled when the first source (src1) and the destination are the same. This enhancement allows additional cases of valid demotion for commutative instructions (add, imul, and, or, xor).
> 
> For example:
> `eaddl r18, r25, r18` can be encoded as `addl r18, r25` using APX REX2 encoding
> `eaddl r2, r7, r2` can be encoded as `addl r2, r7` using non-APX legacy encoding
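A hedged model of the demotion rule described above, with register encodings as plain ints (r0..r31). `decide`, `Demotion` and `prefix_for` are hypothetical and do not reproduce the HotSpot assembler API. The sketch assumes that NF (no-flags) forms are never demoted, because the legacy forms always write flags. The prefix helper reflects the general x86 rule that r8..r15 need REX and the APX registers r16..r31 need REX2 (REX.W for 64-bit operand size is ignored here).

```cpp
#include <cassert>
#include <utility>

// Result of the (hypothetical) demotion decision.
struct Demotion {
  bool demote;  // can the 3-operand NDD form become a 2-operand form?
  int src1;     // operand that equals dst in the 2-operand form
  int src2;     // remaining source operand
};

Demotion decide(int dst, int src1, int src2, bool no_flags, bool is_commutative) {
  if (no_flags) return {false, src1, src2};  // assumption: NF forms have no legacy twin
  if (dst == src1) return {true, src1, src2};
  if (is_commutative && dst == src2) {
    // op(a, b) == op(b, a) for add, imul, and, or, xor: swap the sources.
    std::swap(src1, src2);
    return {true, src1, src2};
  }
  return {false, src1, src2};
}

enum class Prefix { None, REX, REX2 };

// Smallest prefix able to name both registers of a 32-bit operation.
Prefix prefix_for(int a, int b) {
  if (a >= 16 || b >= 16) return Prefix::REX2;  // r16..r31 are APX-only
  if (a >= 8 || b >= 8) return Prefix::REX;     // r8..r15 need REX
  return Prefix::None;
}
```

For `eaddl r18, r25, r18`, `decide(18, 25, 18, false, true)` swaps the sources so that dst matches src1, and `prefix_for` then picks REX2. For `eaddl r2, r7, r2`, no prefix is needed.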

Srinivas Vamsi Parasa has updated the pull request incrementally with two additional commits since the last revision:

 - refactor emit_eevex_prefix_or_demote_arith_ndd to use size instead of passing attribute
 - undo swap in emit_arith and refactor accordingly

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26997/files
  - new: https://git.openjdk.org/jdk/pull/26997/files/91962f4f..83a22e1c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26997&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26997&range=01-02

  Stats: 52 lines in 2 files changed: 14 ins; 15 del; 23 mod
  Patch: https://git.openjdk.org/jdk/pull/26997.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26997/head:pull/26997

PR: https://git.openjdk.org/jdk/pull/26997

From sparasa at openjdk.org  Mon Sep  8 21:44:52 2025
From: sparasa at openjdk.org (Srinivas Vamsi Parasa)
Date: Mon, 8 Sep 2025 21:44:52 GMT
Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2 [v3]
In-Reply-To: <Z05ERz5_lcuvZcuF7YQ1qBv6eeHMPiH1RdpdvE-aTds=.7f197932-f7a6-4845-9d04-a5c29ee7ca0b@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
 <Z05ERz5_lcuvZcuF7YQ1qBv6eeHMPiH1RdpdvE-aTds=.7f197932-f7a6-4845-9d04-a5c29ee7ca0b@github.com>
Message-ID: <pP9ZsXSxXIL48TQcPHSpEOFBZelrchnJAHvcdpu0s9U=.1790a94a-8a96-4c56-8e8a-43bd33bbdf88@github.com>

On Tue, 2 Sep 2025 02:21:44 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Srinivas Vamsi Parasa has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - refactor emit_eevex_prefix_or_demote_arith_ndd to use size instead of passing attribute
>>  - undo swap in emit_arith and refactor accordingly
>
> src/hotspot/cpu/x86/assembler_x86.cpp line 12932:
> 
>> 12930:   if (is_commutative && is_demotable(no_flags, dst->encoding(), src2->encoding())) {
>> 12931:     if (size == EVEX_64bit) {
>> 12932:       emit_prefix_and_int8(get_prefixq(src1, dst, is_map1), opcode_byte + 2);
> 
> It would be good to add a comment above the opcode_byte adjustment explaining the opcode mismatch between the NDD form and the equivalent demotable variant.
> 
> 
> `EVEX.LLZ.NP.MAP4.SCALABLE 21 /r      AND {NF} {ND=1} rv, rv/mv, rv`
> 
> `REX.W + 23 /r      AND r64, r/m64 | RM | Valid | N.E. | r64 AND r/m64`

Please see the comment added as suggested.
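Background for the `opcode_byte + 2` adjustment, as a hedged sketch. In the classic x86 ALU opcodes (ADD 01/03, OR 09/0B, AND 21/23, XOR 31/33), bit 1 is the direction bit: when it is clear the form is `op r/m, reg`, and when it is set the form is `op reg, r/m`. Switching from 21 to 23 therefore changes which ModRM field names the destination. The helper names are hypothetical. IMUL (`0F AF`) is not in this group, and whether the patch's comment frames it exactly this way is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Direction bit of the classic two-operand x86 ALU opcodes.
constexpr uint8_t kDirectionBit = 0x02;

// True for the "op reg, r/m" form (register operand is the destination).
constexpr bool reg_is_destination(uint8_t opcode) {
  return (opcode & kDirectionBit) != 0;
}

// Switch an "op r/m, reg" opcode to its "op reg, r/m" twin. For these opcodes
// the bit is clear, so OR-ing it in equals adding 2.
constexpr uint8_t to_reg_destination_form(uint8_t opcode) {
  return static_cast<uint8_t>(opcode | kDirectionBit);
}
```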

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26997#discussion_r2331439482

From sparasa at openjdk.org  Mon Sep  8 21:44:55 2025
From: sparasa at openjdk.org (Srinivas Vamsi Parasa)
Date: Mon, 8 Sep 2025 21:44:55 GMT
Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2 [v2]
In-Reply-To: <RgCxtL-YvvIRVHHMEIBPkeWqCQoCtO9qfpqu6E68x68=.764e737f-d163-4195-bb92-caeac831add9@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
 <sbf00znMv9WzwFzEeUmfpmCPJ02Zdp4RK6vIacVtYH8=.27735940-f027-487b-9ca6-9cfe9944da23@github.com>
 <RgCxtL-YvvIRVHHMEIBPkeWqCQoCtO9qfpqu6E68x68=.764e737f-d163-4195-bb92-caeac831add9@github.com>
Message-ID: <nsbd3q6Gn5_WNhAXwCQbpLc7hLHWx9jtadhI8w8kDB8=.e0937deb-53b4-40f5-9770-0e17b487e8f8@github.com>

On Fri, 5 Sep 2025 22:03:59 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>> 
>>  - nomenclature change
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into cdemotion
>>  - remove trailing whitespaces
>>  - remove unused instructions
>>  - 8354348: Enable Extended EVEX to REX2/REX demotion for commutative operations with same dst and src2
>
> src/hotspot/cpu/x86/assembler_x86.cpp line 13125:
> 
>> 13123:   emit_arith(op1, op2, src1, src2, second_operand_demotable);
>> 13124: }
>> 13125: 
> 
> This could be written something like below:
> 
> void Assembler::emit_eevex_prefix_or_demote_arith_ndd(Register dst, Register src1, Register src2, VexSimdPrefix pre, VexOpcode opc,
>                                                       InstructionAttr *attributes, int op1, int op2, bool no_flags, bool use_prefixq, bool is_commutative) {
>   bool demotable = is_demotable(no_flags, dst->encoding(), src1->encoding());
>   if (!demotable && is_commutative) {
>     if (is_demotable(no_flags, dst->encoding(), src2->encoding())) {
>       demotable = true;
>       // swap src1 and src2
>       Register tmp = src1;
>       src1 = src2;
>       src2 = tmp;
>     }
>   }
>   (void)emit_eevex_prefix_or_demote_ndd(src1->encoding(), dst->encoding(), src2->encoding(), pre, opc, attributes, no_flags, use_prefixq);
>   emit_arith(op1, op2, src1, src2);
> }
> 
> 
> Then we don't need an extra argument in emit_arith() and emit_eevex_prefix_or_demote_ndd.

Please see the updated code with the suggestion incorporated.

> src/hotspot/cpu/x86/assembler_x86.hpp line 812:
> 
>> 810:   void emit_eevex_prefix_or_demote_arith_ndd(Register dst, Register src1, Register src2, VexSimdPrefix pre, VexOpcode opc,
>> 811:                                       InstructionAttr *attributes, int op1, int op2, bool no_flags = false, bool use_prefixq = false, bool is_commutative = false);
>> 812: 
> 
> The attributes parameter could be replaced by an int size, with the attributes computed inside emit_eevex_prefix_or_demote_arith_ndd. Then there is also no need for use_prefixq as a separate parameter, since (size == EVEX_64bit) implies use_prefixq.

Please see the updated code, which passes size and computes the attributes inside `emit_eevex_prefix_or_demote_arith_ndd`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26997#discussion_r2331441375
PR Review Comment: https://git.openjdk.org/jdk/pull/26997#discussion_r2331440906

From cslucas at openjdk.org  Mon Sep  8 22:12:16 2025
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Mon, 8 Sep 2025 22:12:16 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT
In-Reply-To: <VOmxZ5c0SKETC1N8-S-WrvXIU8qCaA5NMS_68UGwVDc=.00bbe82b-dd42-4393-b57b-9df634a12d88@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
 <1uDOe3Oe-hihmDHea2h8vcvRZsKKBeNp0J9lKYUujxk=.abd111bc-3625-4c71-bfa2-0a4c1f4d3875@github.com>
 <VOmxZ5c0SKETC1N8-S-WrvXIU8qCaA5NMS_68UGwVDc=.00bbe82b-dd42-4393-b57b-9df634a12d88@github.com>
Message-ID: <o3cb6UBPQ_uOvIAtnivXKsPSoquqbeG6jfejnfElaM4=.30285a62-04de-4648-9316-26b20f6cc2fa@github.com>

On Mon, 8 Sep 2025 15:38:52 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> Hi Cesar, thanks for addressing this issue. I will run some more comprehensive testing and have a look at it in the next days.
>
>> Hi Cesar, thanks for addressing this issue. I will run some more comprehensive testing and have a look at it in the next days.
> 
> Testing did not reveal any issues. I have, however, a high-level question: could the current two-step design ([SR state adjustment loop](https://github.com/openjdk/jdk/blob/166ef5e7b1c6d6a9f0f1f29fedb7f65b94f53119/src/hotspot/share/opto/escape.cpp#L300-L315) followed by an [NSR propagation loop](https://github.com/openjdk/jdk/blob/166ef5e7b1c6d6a9f0f1f29fedb7f65b94f53119/src/hotspot/share/opto/escape.cpp#L318-L320)) miss marking allocations as NSR in more complex scenarios, e.g. involving longer points-to/merge chains? Wouldn't it be more principled to re-run the SR state adjustment loop until a fixed point is reached, keeping `reducible_merges` consistent as new allocations are discovered to be NSR? (e.g. by calling `revisit_reducible_phi_status` - with your clean-up applied - every time [an allocation is marked as NSR due to non-removable merges](https://github.com/openjdk/jdk/blob/166ef5e7b1c6d6a9f0f1f29fedb7f65b94f53119/src/hotspot/share/opto/escape.cpp#L2962-L2964)).

@robcasloz - are you thinking that the "fixed point" loops in `find_scalar_replaceable_allocs` aren't sufficient? At first glance, yes, I think the code would be cleaner if done that way. If the code had been written like that in the first place, we wouldn't have seen the current issue. But I don't think this is a correctness issue: as long as we call `revisit_reducible_phi_status` when an object is marked as NSR, the eventual call to `unique_java_object` should find that NSR object if it's used by a reducible phi. I propose that we move forward with the current patch and work on this refactoring as a separate issue.
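To make the fixed-point question concrete, here is a deliberately simplified, hypothetical model. It is not the logic in `escape.cpp`. It assumes two rules: a merge stays reducible only while all of its inputs are scalar-replaceable (SR), and an allocation feeding a non-reducible merge becomes NSR. With a chain of merges, applying each rule once is not enough, and iterating until nothing changes keeps the two sets consistent. The model says nothing about whether the current patch is already correct.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of allocations merged by Phis.
struct Model {
  std::vector<bool> alloc_sr;                  // per allocation: still SR?
  std::vector<std::vector<int>> merge_inputs;  // per merge: allocation indices
  std::vector<bool> merge_reducible;           // per merge: still reducible?
};

// Apply both rules repeatedly until nothing changes. Values only ever flip
// from true to false, so the loop terminates.
void propagate_to_fixed_point(Model& m) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t p = 0; p < m.merge_inputs.size(); p++) {
      if (!m.merge_reducible[p]) {
        // Rule 2: every input of a non-reducible merge becomes NSR.
        for (int a : m.merge_inputs[p]) {
          if (m.alloc_sr[a]) { m.alloc_sr[a] = false; changed = true; }
        }
        continue;
      }
      // Rule 1: a merge with an NSR input stops being reducible.
      for (int a : m.merge_inputs[p]) {
        if (!m.alloc_sr[a]) { m.merge_reducible[p] = false; changed = true; break; }
      }
    }
  }
}
```

For example, with allocation 0 already NSR, merge 0 over {0, 1} and merge 1 over {1, 2}, NSR status has to travel through both merges before allocation 2 is marked. A single pass of each rule would stop halfway.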

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27063#issuecomment-3268175631

From dlong at openjdk.org  Mon Sep  8 23:27:32 2025
From: dlong at openjdk.org (Dean Long)
Date: Mon, 8 Sep 2025 23:27:32 GMT
Subject: RFR: 8366971: C2: Remove unused nop_list from
 PhaseOutput::init_buffer
In-Reply-To: <2hBEO9Zpoy2wo_pgTXE9v8KG5u1HNdKp3RgQE-4HYcE=.e86088d1-1e25-49b5-9b3c-c2498ec6ca48@github.com>
References: <2hBEO9Zpoy2wo_pgTXE9v8KG5u1HNdKp3RgQE-4HYcE=.e86088d1-1e25-49b5-9b3c-c2498ec6ca48@github.com>
Message-ID: <RNeW9WTvNU9ySAAEihkuseHSlveeaFiThg5xtLo_Rao=.27185a05-07c3-4d6a-b626-6c56a750947a@github.com>

On Fri, 5 Sep 2025 13:02:00 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

> The nop list has never been used in the history of OpenJDK. Let's clean it up.
> 
> Tested with Mach5 tier 1-5, no related failures.

Looks good.

-------------

Marked as reviewed by dlong (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27117#pullrequestreview-3198456672

From dlong at openjdk.org  Mon Sep  8 23:27:48 2025
From: dlong at openjdk.org (Dean Long)
Date: Mon, 8 Sep 2025 23:27:48 GMT
Subject: RFR: 8361376: Regressions 1-6% in several Renaissance in 26-b4
 only MacOSX aarch64 [v5]
In-Reply-To: <LZMIDjRdRM4uKuuhsDrLGrwWJoVMrwwEv_bprRIjddk=.048960e2-ef28-4274-a8c4-1f0d1d417100@github.com>
References: <FYgWIv_iFwkr9an56KHdJqZlyUgtD_4g2f51hvavZWw=.f5c943f7-65e6-4944-afec-0b9c19a7b284@github.com>
 <LZMIDjRdRM4uKuuhsDrLGrwWJoVMrwwEv_bprRIjddk=.048960e2-ef28-4274-a8c4-1f0d1d417100@github.com>
Message-ID: <rJvU2Bv2F6mkpbYUqADsJNNHIBE55P-5Os5Hl-WWG-8=.0f7b0d41-8e7e-48ff-a390-72b0f60a76b3@github.com>

On Mon, 4 Aug 2025 21:26:22 GMT, Dean Long <dlong at openjdk.org> wrote:

>> This PR removes the recently added lock around set_guard_value, using instead Atomic::cmpxchg to atomically update bit-fields of the guard value.  Further, it takes a fast-path that uses the previous direct store when at a safepoint.  Combined, these changes should get us back to almost where we were before in terms of overhead.  If necessary, we could go even further and allow make_not_entrant() to perform a direct byte store, leaving 24 bits for the guard value.
>
> Dean Long has updated the pull request incrementally with one additional commit since the last revision:
> 
>   one unconditional release should be enough

I need another review for this.  Any volunteers?
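The PR description quoted above replaces a lock with `Atomic::cmpxchg` to update bit-fields of the guard value. A minimal sketch of that general technique follows, using `std::atomic` as a stand-in for HotSpot's `Atomic` class. `cas_update_bits` and the 24-bit mask in the usage example are hypothetical, loosely following the description's remark about leaving 24 bits for the guard value. Using release ordering on success is also an assumption of this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Atomically replace only the bits selected by 'mask', preserving the others
// (which may be written concurrently by other threads). Returns the old word.
uint32_t cas_update_bits(std::atomic<uint32_t>& word, uint32_t mask, uint32_t new_bits) {
  uint32_t old_value = word.load(std::memory_order_relaxed);
  uint32_t new_value;
  do {
    new_value = (old_value & ~mask) | (new_bits & mask);
    // On failure, compare_exchange_weak reloads old_value and we recompute.
  } while (!word.compare_exchange_weak(old_value, new_value,
                                       std::memory_order_release,
                                       std::memory_order_relaxed));
  return old_value;
}
```

For example, updating the low 24 bits with mask `0x00FFFFFF` leaves the top byte untouched even if another thread changes it between the load and the CAS; the loop simply retries with the fresh value.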

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26399#issuecomment-3268323422

From sparasa at openjdk.org  Mon Sep  8 23:30:02 2025
From: sparasa at openjdk.org (Srinivas Vamsi Parasa)
Date: Mon, 8 Sep 2025 23:30:02 GMT
Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2 [v4]
In-Reply-To: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
Message-ID: <FajL6klNo82iwN4f-PEyqqJ8nmMCVUJbHV0__ze0E8o=.457778c5-ef85-466d-81cd-2d919a93ed07@github.com>

> This change extends Extended EVEX (EEVEX) to REX2/REX demotion for Intel APX NDD instructions to handle commutative operations when the destination register and the second source register (src2) are the same.
> 
> Currently, EEVEX to REX2/REX demotion is only enabled when the first source (src1) and the destination are the same. This enhancement allows additional cases of valid demotion for commutative instructions (add, imul, and, or, xor).
> 
> For example:
> `eaddl r18, r25, r18` can be encoded as `addl r18, r25` using APX REX2 encoding
> `eaddl r2, r7, r2` can be encoded as `addl r2, r7` using non-APX legacy encoding

Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:

  undo the passing of demotable flag

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26997/files
  - new: https://git.openjdk.org/jdk/pull/26997/files/83a22e1c..9714a9b1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26997&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26997&range=02-03

  Stats: 5 lines in 2 files changed: 0 ins; 1 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/26997.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26997/head:pull/26997

PR: https://git.openjdk.org/jdk/pull/26997

From dlong at openjdk.org  Mon Sep  8 23:56:16 2025
From: dlong at openjdk.org (Dean Long)
Date: Mon, 8 Sep 2025 23:56:16 GMT
Subject: RFR: 8360031: C2 compilation asserts in MemBarNode::remove [v3]
In-Reply-To: <jR9U9f8GNW0wPSQZD_UYRJf4hwGCCP7umyTm0bNDz4o=.ed4f44bd-57e7-4b2a-83b3-c8da05609dc4@github.com>
References: <h6lgQmZZxVXPeJl0fRWlz8J713AAhfTAw_5UJ3ZL1S4=.b4b40295-6ea8-4498-abb6-98201776b3c9@github.com>
 <jR9U9f8GNW0wPSQZD_UYRJf4hwGCCP7umyTm0bNDz4o=.ed4f44bd-57e7-4b2a-83b3-c8da05609dc4@github.com>
Message-ID: <njrDWUd6VvlyU-9HiTyJxDYRF4GNh586kyg5jHziRdI=.b1e51205-11f6-462b-b839-91bed0866fd7@github.com>

On Mon, 8 Sep 2025 09:29:26 GMT, Damon Fenacci <dfenacci at openjdk.org> wrote:

>> # Issue
>> While compiling `java.util.zip.ZipFile` in C2 this assert is triggered
>> https://github.com/openjdk/jdk/blob/a2e86ff3c56209a14c6e9730781eecd12c81d170/src/hotspot/share/opto/memnode.cpp#L4235
>> 
>> # Cause
>> While compiling the constructor of java.util.zip.ZipFile$CleanableResource the following happens:
>> * we insert a trailing `MemBarStoreStore` in the constructor
>> <img height="200" alt="before_folding" src="https://github.com/user-attachments/assets/c1aab634-808d-4198-94ac-8093c6b85c5d" />
>> 
>> * during IGVN we completely fold the memory subtree of the `MemBarStoreStore` node. The node still has a control output attached. 
>> <img height="200" alt="after_folding" src="https://github.com/user-attachments/assets/568e9fc3-5f19-4e10-a72e-f0a5e772daed" />
>> 
>> * later during the same IGVN run the `MemBarStoreStore` node is handled and we try to remove it (because the `Allocate` node of the `MemBar` is not escaping the thread) https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4301-L4302
>> * the assert https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4235
>> triggers because the barrier has only 1 (control) output and is a `MemBarStoreStore` (not `Initialize`) barrier
>> 
>> The issue happens only when `UseStoreStoreForCtor` is set (the default), which makes C2 use `MemBarStoreStore` instead of `MemBarRelease` at the end of constructors. `MemBarStoreStore` nodes are processed separately by EA, and this happens after the IGVN pass that folds the memory subtree. `MemBarRelease` nodes, on the other hand, are handled during the same IGVN pass before the memory subtree gets removed, while they still have 2 outputs (so the assert is skipped).
>> 
>> # Fix
>> Adapting the assert to accept that `MemBarStoreStore` can also have `!= 2` outputs (when `+UseStoreStoreForCtor` is used) seems to be an OK solution as this seems like a perfectly plausible situation.
>> 
>> # Testing
>> Unfortunately reproducing the issue with a simple regression test has proven very hard. The test seems to rely on very peculiar profiling and IGVN worklist sequence. JBS replay compilation passes. Running JCK's `api/java_util` 100 times triggers the assert a couple of times on average before the fix, none after.
>> Tier 1-3+ tests passed.
>
> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision:
> 
>   JDK-8360031: add MemBarStoreStore node to worklist during escape analysis/adapt remove assert

src/hotspot/share/opto/memnode.cpp line 4232:

> 4230: 
> 4231: void MemBarNode::remove(PhaseIterGVN *igvn) {
> 4232:   if (outcnt() != 2) {

By itself, this allows outcnt() == 0, so maybe we need to continue to fail if that happens.
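A hedged restatement of the precondition this comment seems to ask for. `BarrierKind` and `remove_precondition_ok` are hypothetical and only model the node kinds named in the PR description. Two outputs (control and memory) is the usual shape. A single (control) output is accepted for `Initialize` and, with the fix, for `MemBarStoreStore` whose memory user was folded away. Zero outputs is still rejected.

```cpp
#include <cassert>

enum class BarrierKind { Initialize, MemBarStoreStore, MemBarRelease };

// Hypothetical predicate for the assert in MemBarNode::remove.
bool remove_precondition_ok(BarrierKind kind, unsigned outcnt) {
  if (outcnt == 2) return true;  // control + memory projections
  if (outcnt == 1) {
    // Only control is left after the memory subtree has been folded.
    return kind == BarrierKind::Initialize || kind == BarrierKind::MemBarStoreStore;
  }
  return false;  // includes outcnt == 0, which should keep failing
}
```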

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26556#discussion_r2331612715

From sviswanathan at openjdk.org  Tue Sep  9 00:17:42 2025
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Tue, 9 Sep 2025 00:17:42 GMT
Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2 [v4]
In-Reply-To: <FajL6klNo82iwN4f-PEyqqJ8nmMCVUJbHV0__ze0E8o=.457778c5-ef85-466d-81cd-2d919a93ed07@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
 <FajL6klNo82iwN4f-PEyqqJ8nmMCVUJbHV0__ze0E8o=.457778c5-ef85-466d-81cd-2d919a93ed07@github.com>
Message-ID: <a4Ckkdi0WmRwJjWbqZaPtgTsJTBNQrdsabAfexNmB_s=.0b5108ef-afa8-4c81-8a3c-86cfef34c10b@github.com>

On Mon, 8 Sep 2025 23:30:02 GMT, Srinivas Vamsi Parasa <sparasa at openjdk.org> wrote:

>> This change extends Extended EVEX (EEVEX) to REX2/REX demotion for Intel APX NDD instructions to handle commutative operations when the destination register and the second source register (src2) are the same.
>> 
>> Currently, EEVEX to REX2/REX demotion is only enabled when the first source (src1) and the destination are the same. This enhancement allows additional cases of valid demotion for commutative instructions (add, imul, and, or, xor).
>> 
>> For example:
>> `eaddl r18, r25, r18` can be encoded as `addl r18, r25` using APX REX2 encoding
>> `eaddl r2, r7, r2` can be encoded as `addl r2, r7` using non-APX legacy encoding
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   undo the passing of demotable flag

Looks good to me.

-------------

Marked as reviewed by sviswanathan (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26997#pullrequestreview-3198574139

From dzhang at openjdk.org  Tue Sep  9 00:29:58 2025
From: dzhang at openjdk.org (Dingli Zhang)
Date: Tue, 9 Sep 2025 00:29:58 GMT
Subject: RFR: 8367048: RISC-V: Correct pipeline descriptions of the
 architecture [v2]
In-Reply-To: <sCc5RPOEdpio8GKk_vj-TmYTa2wjSzWhqgtXvp3xZh4=.b6e51a9b-0328-4c1d-a268-abb0d130c19e@github.com>
References: <sCc5RPOEdpio8GKk_vj-TmYTa2wjSzWhqgtXvp3xZh4=.b6e51a9b-0328-4c1d-a268-abb0d130c19e@github.com>
Message-ID: <N-Ajw5dK79CC_ySFWVDnqAtk_5c1IXqUr1t9szGvys8=.b7c7df8b-f40b-45c4-8a51-92bf6f22c2a8@github.com>

> Hi,
> Can you help to review this patch? Thanks!
> 
> This patch updates the RISC-V pipeline attributes to variable_size_instructions to properly account for the 2-byte compressed instructions from the C extension. 
> Furthermore, it increases the max_instructions_per_bundle to 4 and adjusts the instruction_unit_size to match 4-issue RISC-V hardware like the UR-CP100.
> 
> ### Test
> - [x] Run tier1 and tier2 on sg2042

Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision:

  fix typo

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27134/files
  - new: https://git.openjdk.org/jdk/pull/27134/files/99846cd6..b5d87735

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27134&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27134&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27134.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27134/head:pull/27134

PR: https://git.openjdk.org/jdk/pull/27134

From fyang at openjdk.org  Tue Sep  9 00:29:59 2025
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 9 Sep 2025 00:29:59 GMT
Subject: RFR: 8367048: RISC-V: Correct pipeline descriptions of the
 architecture [v2]
In-Reply-To: <N-Ajw5dK79CC_ySFWVDnqAtk_5c1IXqUr1t9szGvys8=.b7c7df8b-f40b-45c4-8a51-92bf6f22c2a8@github.com>
References: <sCc5RPOEdpio8GKk_vj-TmYTa2wjSzWhqgtXvp3xZh4=.b6e51a9b-0328-4c1d-a268-abb0d130c19e@github.com>
 <N-Ajw5dK79CC_ySFWVDnqAtk_5c1IXqUr1t9szGvys8=.b7c7df8b-f40b-45c4-8a51-92bf6f22c2a8@github.com>
Message-ID: <oMWpivNABfZu2oy1zRcvifTmsWYLzVgxvO9OJU4ohS8=.24d44efe-cbec-47f7-b21d-4ac1ea054586@github.com>

On Tue, 9 Sep 2025 00:24:38 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

>> Hi,
>> Can you help to review this patch? Thanks!
>> 
>> This patch updates the RISC-V pipeline attributes to variable_size_instructions to properly account for the 2-byte compressed instructions from the C extension. 
>> Furthermore, it increases the max_instructions_per_bundle to 4 and adjusts the instruction_unit_size to match 4-issue RISC-V hardware like the UR-CP100.
>> 
>> ### Test
>> - [x] Run tier1 and tier2 on sg2042
>
> Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fix typo

Marked as reviewed by fyang (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/27134#pullrequestreview-3198583207

From dzhang at openjdk.org  Tue Sep  9 00:30:00 2025
From: dzhang at openjdk.org (Dingli Zhang)
Date: Tue, 9 Sep 2025 00:30:00 GMT
Subject: RFR: 8367048: RISC-V: Correct pipeline descriptions of the
 architecture
In-Reply-To: <sCc5RPOEdpio8GKk_vj-TmYTa2wjSzWhqgtXvp3xZh4=.b6e51a9b-0328-4c1d-a268-abb0d130c19e@github.com>
References: <sCc5RPOEdpio8GKk_vj-TmYTa2wjSzWhqgtXvp3xZh4=.b6e51a9b-0328-4c1d-a268-abb0d130c19e@github.com>
Message-ID: <JCQQyzredf7Cvaopw291683CKFzaoEFPYOxaLxGNwSk=.f6539391-725b-4e1b-bf4e-69028e4b5507@github.com>

On Mon, 8 Sep 2025 05:13:32 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help to review this patch? Thanks!
> 
> This patch updates the RISC-V pipeline attributes to variable_size_instructions to properly account for the 2-byte compressed instructions from the C extension. 
> Furthermore, it increases the max_instructions_per_bundle to 4 and adjusts the instruction_unit_size to match 4-issue RISC-V hardware like the UR-CP100.
> 
> ### Test
> - [x] Run tier1 and tier2 on sg2042

Thanks all for the review!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27134#issuecomment-3268434597

From duke at openjdk.org  Tue Sep  9 00:30:01 2025
From: duke at openjdk.org (duke)
Date: Tue, 9 Sep 2025 00:30:01 GMT
Subject: RFR: 8367048: RISC-V: Correct pipeline descriptions of the
 architecture
In-Reply-To: <sCc5RPOEdpio8GKk_vj-TmYTa2wjSzWhqgtXvp3xZh4=.b6e51a9b-0328-4c1d-a268-abb0d130c19e@github.com>
References: <sCc5RPOEdpio8GKk_vj-TmYTa2wjSzWhqgtXvp3xZh4=.b6e51a9b-0328-4c1d-a268-abb0d130c19e@github.com>
Message-ID: <rInxHzjQ2e9DmrR8aCYDpetiBUtahA48w_AXaMs0Q-Q=.9f0a9c8c-5081-4520-8b6a-bd9690ec072c@github.com>

On Mon, 8 Sep 2025 05:13:32 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help to review this patch? Thanks!
> 
> This patch updates the RISC-V pipeline attributes to variable_size_instructions to properly account for the 2-byte compressed instructions from the C extension. 
> Furthermore, it increases the max_instructions_per_bundle to 4 and adjusts the instruction_unit_size to match 4-issue RISC-V hardware like the UR-CP100.
> 
> ### Test
> - [x] Run tier1 and tier2 on sg2042

@DingliZhang 
Your change (at version b5d87735bc2f6a1540676722c0befcca95557fa9) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27134#issuecomment-3268436960

From dzhang at openjdk.org  Tue Sep  9 00:41:35 2025
From: dzhang at openjdk.org (Dingli Zhang)
Date: Tue, 9 Sep 2025 00:41:35 GMT
Subject: Integrated: 8367048: RISC-V: Correct pipeline descriptions of the
 architecture
In-Reply-To: <sCc5RPOEdpio8GKk_vj-TmYTa2wjSzWhqgtXvp3xZh4=.b6e51a9b-0328-4c1d-a268-abb0d130c19e@github.com>
References: <sCc5RPOEdpio8GKk_vj-TmYTa2wjSzWhqgtXvp3xZh4=.b6e51a9b-0328-4c1d-a268-abb0d130c19e@github.com>
Message-ID: <E7QJlnl9GnDnC-SIzz102j_bbrMkoCUhWI0TGjBfSzQ=.e1ee9455-b8a9-4ba6-98ea-03c8123164d5@github.com>

On Mon, 8 Sep 2025 05:13:32 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help to review this patch? Thanks!
> 
> This patch updates the RISC-V pipeline attributes to variable_size_instructions to properly account for the 2-byte compressed instructions from the C extension. 
> Furthermore, it increases the max_instructions_per_bundle to 4 and adjusts the instruction_unit_size to match 4-issue RISC-V hardware like the UR-CP100.
> 
> ### Test
> - [x] Run tier1 and tier2 on sg2042

This pull request has now been integrated.

Changeset: 0aee7bf2
Author:    Dingli Zhang <dzhang at openjdk.org>
Committer: Fei Yang <fyang at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/0aee7bf24d7f2578d3867bcfa25646cb0bd06d9a
Stats:     12 lines in 1 file changed: 5 ins; 0 del; 7 mod

8367048: RISC-V: Correct pipeline descriptions of the architecture

Reviewed-by: fyang, fjiang, mli

-------------

PR: https://git.openjdk.org/jdk/pull/27134

From jbhateja at openjdk.org  Tue Sep  9 02:12:11 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 9 Sep 2025 02:12:11 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v3]
In-Reply-To: <eXrwH7c0C-O1WG8EdYsncgayPjViNP0yIyPL6PNOIk0=.697e0e55-56fa-41ff-aaf4-6a3baf7e65d2@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <eXrwH7c0C-O1WG8EdYsncgayPjViNP0yIyPL6PNOIk0=.697e0e55-56fa-41ff-aaf4-6a3baf7e65d2@github.com>
Message-ID: <cwluATNnzACJ0UXNLV2hG9aF1bQzVXlewzGHmYhSz0M=.f2d2a6c0-e49f-419c-820b-5d6103eeeba9@github.com>

On Fri, 5 Sep 2025 17:17:52 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> This patch optimizes PopCount value transforms using KnownBits information.
>> Following are the results of the micro-benchmark included with the patch
>> 
>> 
>> 
>> System: 13th Gen Intel(R) Core(TM) i3-1315U
>> 
>> Baseline:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s
>> PopCountValueTransform.StockKernelInt         thrpt    2  409295.875          ops/s
>> PopCountValueTransform.StockKernelLong        thrpt    2  368025.608          ops/s
>> 
>> With opt:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s
>> PopCountValueTransform.StockKernelInt         thrpt    2  418649.269          ops/s
>> PopCountValueTransform.StockKernelLong        thrpt    2  381330.221          ops/s
>> 
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update countbitsnode.cpp

Hi @TobiHartmann, @SirYwell, @eme64, could you kindly verify the changes in the latest patch?
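The general idea behind bounding a popcount with known-bits information fits in a few lines. Bits proven to be one always contribute to the count. Bits not proven to be zero might contribute. This is a minimal sketch of that technique only. It is not the code in countbitsnode.cpp, and `PopCountRange`, `popcount32` and `popcount_range` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; not the code in countbitsnode.cpp.
// known_zeros: bits proven to be 0 in x; known_ones: bits proven to be 1 in x.
struct PopCountRange {
  int lo;  // smallest possible popcount(x)
  int hi;  // largest possible popcount(x)
};

static int popcount32(uint32_t v) {
  int n = 0;
  while (v != 0) {
    v &= v - 1;  // clear the lowest set bit
    ++n;
  }
  return n;
}

PopCountRange popcount_range(uint32_t known_zeros, uint32_t known_ones) {
  assert((known_zeros & known_ones) == 0 && "a bit cannot be both 0 and 1");
  // Every known-one bit contributes; every bit not known to be zero might.
  return {popcount32(known_ones), popcount32(~known_zeros)};
}
```

Take `x = (y & 0xF0) | 0x3`. Here `known_zeros = ~0xF3u` and `known_ones = 0x3u`, which gives the range [2, 6]. When `lo == hi`, the popcount is a compile-time constant.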

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27075#issuecomment-3268608172

From jbhateja at openjdk.org  Tue Sep  9 02:21:11 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 9 Sep 2025 02:21:11 GMT
Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2 [v4]
In-Reply-To: <0X5cvpQZxb1l5Q_8f-iU0K4WtdyFW8ehdPXR2zsnSzo=.7f4f3d03-94db-4482-b5ee-c5f1362d84b5@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
 <Z05ERz5_lcuvZcuF7YQ1qBv6eeHMPiH1RdpdvE-aTds=.7f197932-f7a6-4845-9d04-a5c29ee7ca0b@github.com>
 <0X5cvpQZxb1l5Q_8f-iU0K4WtdyFW8ehdPXR2zsnSzo=.7f4f3d03-94db-4482-b5ee-c5f1362d84b5@github.com>
Message-ID: <hnQ3ti2GcS0BrzRaU2jKby4D1ou_1niFo_l4_WxwYvk=.837b0e66-4472-49be-8f3f-a07ef331af74@github.com>

On Thu, 4 Sep 2025 20:16:30 GMT, Srinivas Vamsi Parasa <sparasa at openjdk.org> wrote:

>> src/hotspot/cpu/x86/x86_64.ad line 7121:
>> 
>>> 7119: %{
>>> 7120:   predicate(UseAPX);
>>> 7121:   match(Set dst (AddI (LoadI src1) src2));
>> 
>> Will this not be covered by the pattern at line 7103, since ADLC automatically generates a DFA to handle both cases?
>
> Will run experiments to make sure that the RegRegMem pattern also applies to RegMemReg case and remove the newly added match rules if they're redundant. Will update you soon.

Hi @vamsi-parasa, your latest patch does not address this.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26997#discussion_r2331827360

From jbhateja at openjdk.org  Tue Sep  9 02:31:18 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 9 Sep 2025 02:31:18 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v9]
In-Reply-To: <mG1sFdV99uAG4cWGfM6kCew9UVLdVuG4_GHADimAsVQ=.8013b182-afc1-4156-9718-13efb348bbb6@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <mG1sFdV99uAG4cWGfM6kCew9UVLdVuG4_GHADimAsVQ=.8013b182-afc1-4156-9718-13efb348bbb6@github.com>
Message-ID: <sSFSX4MSj6uHOW0VOmXBi75QgHEbOGUHh4wJRtxgR44=.b3b4e38f-0719-4c3d-ae38-14d2df8fd9f7@github.com>

On Sat, 6 Sep 2025 09:44:56 GMT, Mohamed Issa <missa at openjdk.org> wrote:

>> Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
>> 
>> Without AVX10.2, the current approach first converts the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific instruction sequence depends on the source type (single vs. double precision), the destination type (long, integer, short, or byte), the level of parallelization (scalar vs. vector), and the supported AVX extension. In essence, though, the special values are mapped into the integer range (NaN -> 0; -Infinity and values <= Integer.MIN_VALUE -> Integer.MIN_VALUE; +Infinity and values >= Integer.MAX_VALUE -> Integer.MAX_VALUE) with the help of a few temporary registers that hold intermediate results.
>> 
>> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or  VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
>> 
>> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
>> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
>> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
>> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
>> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
>> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
>> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
>> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
>> 9. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java`
>> 1...
>
> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Check for scalar casting instead of vector casting in tests when disabling vector alignment or compact object headers

test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java line 432:

> 430:         applyIfCPUFeatureAnd = {"avx", "true", "avx10_2", "false"})
> 431:     @IR(counts = {"cast2DtoX", " >0 "}, phase = CompilePhase.FINAL_CODE,
> 432:         applyIfCPUFeature = {"avx10_2", "true"})

Please refer to https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java#L2638 for adding MachNode IR node based checks
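The special-value mapping in the PR description matches a scalar saturating float-to-int conversion, which is the same result Java's `(int)` cast of a `float` produces. The sketch below is a plain C++ model of those semantics. It is not the AVX10.2 instructions or HotSpot's code, and `saturating_f2i` is a hypothetical name. The legacy CVTTSS2SI instead returns the "integer indefinite" value 0x80000000 for NaN and out-of-range inputs, and that is why the fix-up sequence is needed without AVX10.2.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical scalar model of a saturating float -> int32 conversion:
//   NaN -> 0, values <= INT32_MIN (incl. -Inf) -> INT32_MIN,
//   values >= INT32_MAX (incl. +Inf) -> INT32_MAX, otherwise truncate toward zero.
int32_t saturating_f2i(float f) {
  if (std::isnan(f)) {
    return 0;
  }
  // 2^31 is exactly representable as a float; INT32_MAX is not.
  if (f >= 2147483648.0f) {
    return std::numeric_limits<int32_t>::max();
  }
  if (f <= -2147483648.0f) {
    return std::numeric_limits<int32_t>::min();
  }
  return static_cast<int32_t>(f);  // in range: C++ truncates toward zero
}
```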

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2331837526

From duke at openjdk.org  Tue Sep  9 05:39:14 2025
From: duke at openjdk.org (erifan)
Date: Tue, 9 Sep 2025 05:39:14 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats
In-Reply-To: <-G8GwIflOhFjOL-PAG6_oylu0Fa9c8iNUB57EC6oo4s=.a0126087-2a97-4542-a555-27c12578fccf@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <KE8_jx6V7-oTcCoQKwZ0Gyzdr2qPbWETxrneZvPAxHs=.bd39c34f-972e-4ecc-bc16-eca3fcfc175d@github.com>
 <2R6O7Jhv3catwxc6rXJdh7Uiq-NFBp7beCmP49CLTqU=.7ba72e39-6efd-47fe-8ad9-6df54a45c99b@github.com>
 <-G8GwIflOhFjOL-PAG6_oylu0Fa9c8iNUB57EC6oo4s=.a0126087-2a97-4542-a555-27c12578fccf@github.com>
Message-ID: <AM2FRG6JTLj9k11GrHjNRKtBVMnRRjIwOZFoqedQ56k=.426311e2-d7e0-42e3-b4d0-67c7fba66a86@github.com>

On Tue, 2 Sep 2025 08:10:02 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Thanks @theRealAph .
>> 
>> I've indeed considered and implemented your idea. The code diff:
>> 
>> diff --git a/src/hotspot/cpu/aarch64/assembler_aarch64.hpp b/src/hotspot/cpu/aarch64/assembler_aarch64.hpp
>> index 11d302e9026..841d24f516b 100644
>> --- a/src/hotspot/cpu/aarch64/assembler_aarch64.hpp
>> +++ b/src/hotspot/cpu/aarch64/assembler_aarch64.hpp
>> @@ -3813,8 +3813,9 @@ template<typename R, typename... Rx>
>>                 bool isMerge, bool isFloat) {
>>      starti;
>>      assert(T != Q, "invalid size");
>> +    assert((!isFloat) || (isFloat && T != B), "invalid size");
>>      int sh = 0;
>> -    if (imm8 <= 127 && imm8 >= -128) {
>> +    if ((imm8 <= 127 && imm8 >= -128) || (isFloat && (imm8 >> 8) == 0)) {
>>        sh = 0;
>>      } else if (T != B && imm8 <= 32512 && imm8 >= -32768 && (imm8 & 0xff) == 0) {
>>        sh = 1;
>> @@ -3824,7 +3825,7 @@ template<typename R, typename... Rx>
>>      }
>>      int m = isMerge ? 1 : 0;
>>      f(0b00000101, 31, 24), f(T, 23, 22), f(0b01, 21, 20);
>> -    prf(Pg, 16), f(isFloat ? 1 : 0, 15), f(m, 14), f(sh, 13), sf(imm8, 12, 5), rf(Zd, 0);
>> +    prf(Pg, 16), f(isFloat ? 1 : 0, 15), f(m, 14), f(sh, 13), f(imm8&0xff, 12, 5), rf(Zd, 0);
>>    }
>> 
>>  public:
>> @@ -3834,7 +3835,7 @@ template<typename R, typename... Rx>
>>    }
>>    // SVE copy floating-point immediate to vector elements (predicated)
>>    void sve_cpy(FloatRegister Zd, SIMD_RegVariant T, PRegister Pg, double d) {
>> -    sve_cpy(Zd, T, Pg, checked_cast<int8_t>(pack(d)), /*isMerge*/true, /*isFloat*/true);
>> +    sve_cpy(Zd, T, Pg, checked_cast<uint8_t>(pack(d)), /*isMerge*/true, /*isFloat*/true);
>>    }
>> 
>>    // SVE conditionally select elements from two vectors
>> 
>> 
>> However, some of my colleagues have differing opinions:
>> 1. sve `cpy` and `fcpy` are actually two different instructions, and distinguishing them might be clearer.
>> 2. sve `cpy` 's imm8 is an **int** , while `fcpy` 's imm8 is an **fp8** . While some encoding code can be reused, separating the encodings makes the code clearer.
>> 
>> I think both implementations are fine. If you think it's better to not refactor, I'll revert.
>
>> 1. sve `cpy` and `fcpy` are actually two different instructions, and distinguishing them might be clearer.
> 
> That's a fair point, but the AArch64 name for all four instructions is CPY, and they are distinguished by their operands. Deviation from the names in the Reference Manual is occasionally necessary, but it makes life painful for maintainers when they have to search for what we've called an instruction they want to use.
>  
>>     2. sve `cpy` 's imm8 is an **int** , while `fcpy` 's imm8 is an **fp8** .
> 
> Yes, that's right.
> 
>> While some encoding code can be reused, separating the encodings makes the code clearer.
> 
> I don't agree that it makes the code clearer. In fact, tight factoring emphasizes the fact that these instructions are similar, and explicitly shows where they are different.
> 
> It is true that I have a strong bias against copy-and-paste programming.
> 
>> I think both implementations are fine. If you think it's better to not refactor, I'll revert.
> 
> I do. Thank you.

Hi @theRealAph, @eme64, would you mind sponsoring this PR? Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26951#issuecomment-3268944989

From xgong at openjdk.org  Tue Sep  9 06:53:31 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Tue, 9 Sep 2025 06:53:31 GMT
Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v11]
In-Reply-To: <ezowLR1tocCY7LvboEC3gAfVphplLZH9WcfUgrbiPnk=.0b60331f-89be-4c4f-ab96-380d437d9b74@github.com>
References: <cGkYMFJGc4N5Wwje26vKLpmnV4UpfT8tZpLOeGfosxI=.219cd257-382f-401b-8c15-2e7803ae7b01@github.com>
 <ezowLR1tocCY7LvboEC3gAfVphplLZH9WcfUgrbiPnk=.0b60331f-89be-4c4f-ab96-380d437d9b74@github.com>
Message-ID: <8T7swIJ17tLLg4FO_N5UZ0HsMYrz31ywBiMZohefGTE=.386eeb0d-8541-4c35-8a68-6caf31ea867e@github.com>

On Thu, 14 Aug 2025 14:01:13 GMT, Mikhail Ablakatov <mablakatov at openjdk.org> wrote:

>> Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, existing ASIMD implementation is used.
>> 
>> Nothing changes for <= 128-bit long vectors: the existing ASIMD implementation is still used directly for those.
>> 
>> The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reduction micro-benchmarks.
>> 
>> Benchmarks results:
>> 
>> Neoverse-V1 (SVE 256-bit)
>> 
>>   Benchmark                 (size)   Mode   master         PR  Units
>>   ByteMaxVector.MULLanes      1024  thrpt 5447.643  11455.535 ops/ms
>>   ShortMaxVector.MULLanes     1024  thrpt 3388.183   7144.301 ops/ms
>>   IntMaxVector.MULLanes       1024  thrpt 3010.974   4911.485 ops/ms
>>   LongMaxVector.MULLanes      1024  thrpt 1539.137   2562.835 ops/ms
>>   FloatMaxVector.MULLanes     1024  thrpt 1355.551   4158.128 ops/ms
>>   DoubleMaxVector.MULLanes    1024  thrpt 1715.854   3284.189 ops/ms
>> 
>> 
>> Fujitsu A64FX (SVE 512-bit):
>> 
>>   Benchmark                 (size)   Mode   master         PR  Units
>>   ByteMaxVector.MULLanes      1024  thrpt 1091.692   2887.798 ops/ms
>>   ShortMaxVector.MULLanes     1024  thrpt  597.008   1863.338 ops/ms
>>   IntMaxVector.MULLanes       1024  thrpt  510.642   1348.651 ops/ms
>>   LongMaxVector.MULLanes      1024  thrpt  468.878    878.620 ops/ms
>>   FloatMaxVector.MULLanes     1024  thrpt  376.284   2237.564 ops/ms
>>   DoubleMaxVector.MULLanes    1024  thrpt  431.343   1646.792 ops/ms
>
> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   cleanup: start the SVE Integer Misc - Unpredicated section
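The halving strategy in the PR description can be modelled in scalar code. This illustration makes two assumptions: the lane count is a power of two, and a sequential finish stands in for the ASIMD tail. It is not the SVE/ASIMD code in c2_MacroAssembler_aarch64.cpp, and `mul_reduce_halving` is a hypothetical name. Because the model reorders the multiplications, it is only valid for floating-point reductions that do not require strict order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of a halving ("tree") multiply reduction.
// lanes:      element values of one vector (power-of-two count).
// neon_lanes: number of elements that fit in a 128-bit ASIMD register.
template <typename T>
T mul_reduce_halving(std::vector<T> lanes, std::size_t neon_lanes) {
  std::size_t n = lanes.size();
  assert(n > 0 && (n & (n - 1)) == 0 && "lane count must be a power of two");
  assert(neon_lanes > 0);
  // SVE-like phase: multiply the low half by the high half, lane by lane,
  // until the live elements fit in a 128-bit register.
  while (n > neon_lanes) {
    n /= 2;
    for (std::size_t i = 0; i < n; ++i) {
      lanes[i] *= lanes[i + n];
    }
  }
  // ASIMD-like phase, modelled sequentially: fold the remaining lanes.
  T acc = lanes[0];
  for (std::size_t i = 1; i < n; ++i) {
    acc *= lanes[i];
  }
  return acc;
}
```

For 32-bit ints in a 256-bit vector, `mul_reduce_halving<int>({1, 2, 3, 4, 5, 6, 7, 8}, 4)` performs one halving step (down to 4 lanes), then finishes sequentially and returns 40320.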

Do you intend to ignore ops with >32B vector size? May I ask the reason?

If so, maybe a title like `AArch64: Implement MulReduction for 256-bit SVE` would be more accurate?

src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 2199:

> 2197: 
> 2198: instruct reduce_non_strict_order_mulF_256b(vRegF dst, vRegF fsrc, vReg vsrc, vReg tmp1, vReg tmp2) %{
> 2199:   predicate(Matcher::vector_length_in_bytes(n->in(2)) == 32 && !n->as_Reduction()->requires_strict_order());

Suggestion:

  predicate(Matcher::vector_length_in_bytes(n->in(2)) == 32 &&
            !n->as_Reduction()->requires_strict_order());

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2119:

> 2117:     assert(false, "unsupported");
> 2118:     ShouldNotReachHere();
> 2119:   }

Can we just add a type assertion at the start of the method and remove the switch-case?

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2165:

> 2163:                                                             FloatRegister vtmp1,
> 2164:                                                             FloatRegister vtmp2) {
> 2165:   assert(vector_length_in_bytes > FloatRegister::neon_vl, "ASIMD impl should be used instead");

Is it better to assert `vector_length_in_bytes == 32`  or `vector_length_in_bytes == 2 * FloatRegister::neon_vl`?

-------------

PR Review: https://git.openjdk.org/jdk/pull/23181#pullrequestreview-3199499604
PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2332130585
PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2332153670
PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2332197936

From xgong at openjdk.org  Tue Sep  9 06:53:32 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Tue, 9 Sep 2025 06:53:32 GMT
Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v8]
In-Reply-To: <jORpdOKtHZsNteFjGizf3ZJRng0i7CQvfKKhPd-Dhck=.f305aeaf-b3a4-43a9-82f4-bc978e7996a8@github.com>
References: <cGkYMFJGc4N5Wwje26vKLpmnV4UpfT8tZpLOeGfosxI=.219cd257-382f-401b-8c15-2e7803ae7b01@github.com>
 <nQd6i2ytEm1KQmRIJHFfmOEDAu5sIlK0k_b2gCQXuJo=.44dd06c4-8f04-4e5b-b158-9f102348c1de@github.com>
 <6H9X-NXKOGd9BZVhTDiKNf7OO2KQTciRKGnXY-5C9yA=.e25f9e69-44c2-48d1-b4e3-cb8f1af79546@github.com>
 <_gHaFQTNq2bApeWAE88cWxcNULRDqndSSo3hrY31FgI=.132b7c24-7205-4877-9b95-3d9d13ac7ec8@github.com>
 <d_111-0nDlF3hjrXC7VkiuArMfoYOaN-TNKnabK_VRg=.b21c6a19-5a3f-41ae-8d60-e6652c58bee1@github.com>
 <L983x9x4LmCEGwq9uxApFIFB59cceFD0IPnpFheMehA=.1baac890-743e-400e-a05b-6642860fe642@github.com>
 <-SwJHROQB4jO9nlICIWSwNGXZDIQUy8O54baR-Xe80o=.f7c4fd43-330d-4870-ae4b-316ab7507b06@github.com>
 <jORpdOKtHZsNteFjGizf3ZJRng0i7CQvfKKhPd-Dhck=.f305aeaf-b3a4-43a9-82f4-bc978e7996a8@github.com>
Message-ID: <Fd8MEnF6uX07SHfi_ZC-RzMhaiGpkNnvzP-FwUhaG_4=.fb8195ae-f0b1-4bd6-a7bc-2a394a782223@github.com>

On Fri, 11 Jul 2025 09:32:14 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> @XiaohongGong , JIC, you've referenced the PR you left this comment in. Did you intend to post it somewhere else?
>
> Oh, sorry, my bad. I intended to post this one: https://github.com/openjdk/jdk/pull/21895/files#diff-7b82624b78127158abbce6835eeba196bd062aee59512ec2d4e4c8c7d681573b

So do you still intend to change this? Either is fine with me, but I still prefer not touching code unrelated to this PR.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23181#discussion_r2332201601

From duke at openjdk.org  Tue Sep  9 07:02:31 2025
From: duke at openjdk.org (erifan)
Date: Tue, 9 Sep 2025 07:02:31 GMT
Subject: RFR: 8365911: AArch64: Fix encoding error in sve_cpy for negative
 floats
In-Reply-To: <AM2FRG6JTLj9k11GrHjNRKtBVMnRRjIwOZFoqedQ56k=.426311e2-d7e0-42e3-b4d0-67c7fba66a86@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
 <KE8_jx6V7-oTcCoQKwZ0Gyzdr2qPbWETxrneZvPAxHs=.bd39c34f-972e-4ecc-bc16-eca3fcfc175d@github.com>
 <2R6O7Jhv3catwxc6rXJdh7Uiq-NFBp7beCmP49CLTqU=.7ba72e39-6efd-47fe-8ad9-6df54a45c99b@github.com>
 <-G8GwIflOhFjOL-PAG6_oylu0Fa9c8iNUB57EC6oo4s=.a0126087-2a97-4542-a555-27c12578fccf@github.com>
 <AM2FRG6JTLj9k11GrHjNRKtBVMnRRjIwOZFoqedQ56k=.426311e2-d7e0-42e3-b4d0-67c7fba66a86@github.com>
Message-ID: <AhuLifqz9YvGtJfyCH73Vznm8V_VUve-1rG2ccU5F50=.c9de4094-2a27-4093-94ec-00a73e52c834@github.com>

On Tue, 9 Sep 2025 05:36:35 GMT, erifan <duke at openjdk.org> wrote:

> /sponsor

Thank you very much, @eme64!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26951#issuecomment-3269160855

From duke at openjdk.org  Tue Sep  9 07:02:32 2025
From: duke at openjdk.org (erifan)
Date: Tue, 9 Sep 2025 07:02:32 GMT
Subject: Integrated: 8365911: AArch64: Fix encoding error in sve_cpy for
 negative floats
In-Reply-To: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
References: <EXHv-dYdlu9sWHf5otvUqNqFY2l7MbjkHCyiifwfhqg=.f2a1352f-8d68-4767-9276-742690e1c96f@github.com>
Message-ID: <DsJND0jZP-pZxXdXqinofPHWz1j_CoUvS9a8kNRd2Rk=.ab4bd221-f39c-4ea8-bbde-b6fb240392f7@github.com>

On Wed, 27 Aug 2025 01:34:25 GMT, erifan <duke at openjdk.org> wrote:

> The `sve_cpy` instruction is not correctly implemented for negative floating-point values. The issues include:
> 
> 1. When a negative floating-point number (e.g. `-1.0`) is passed, the `checked_cast<int8_t>(pack(d))` check fails. For example, assume `d = -1.0`:
> - `pack(-1.0)` returns an unsigned int with bit 7 (the sign bit of the 8-bit encoding) set, i.e., `0xf0`.
> - `checked_cast<int8_t>(0xf0)` casts `0xf0` to an int8_t value, which is `-16`.
> - Casting this int8_t `-16` back to unsigned int results in `0xfffffff0`.
> - The check compares `0xf0` to `0xfffffff0`, which obviously fails.
> 
> 2. Additionally, the encoding of the negative floating-point number is incorrect:
> - The imm8 field can fall outside the valid range of **[-128, 127]**.
> - Bit **13** should be encoded as **0** for floating-point numbers.
> 
> This PR fixes these issues and renames the floating-point `sve_cpy` to `sve_fcpy`.
> 
> Some test cases are added to aarch64-asmtest.py, and all tests passed.
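The round-trip failure in point 1, and the unsigned masking that avoids it, can be reproduced with a small standalone sketch. `narrows_losslessly` is a hypothetical stand-in for HotSpot's `checked_cast`, not its real implementation. `insert_imm8` only models placing an 8-bit immediate into bits [12:5] and is not the assembler's `f()`/`sf()` helpers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a checked narrowing cast: the value narrowed to
// To and widened back to From must equal the original.
template <typename To, typename From>
bool narrows_losslessly(From v) {
  return static_cast<From>(static_cast<To>(v)) == v;
}

// Hypothetical: place an 8-bit immediate into bits [12:5] of an instruction
// word. Masking with 0xff treats imm8 as a raw bit pattern, so an fp8
// pattern such as 0xF0 (-1.0) is encoded as-is, whatever its signed value.
uint32_t insert_imm8(uint32_t insn, uint32_t imm8) {
  assert(imm8 <= 0xffu && "imm8 must fit in 8 bits");
  return insn | ((imm8 & 0xffu) << 5);
}
```

On two's-complement targets, `narrows_losslessly<int8_t>(0xF0u)` is false because `0xF0` becomes `-16` and widens back to `0xFFFFFFF0`. `narrows_losslessly<uint8_t>(0xF0u)` is true.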

This pull request has now been integrated.

Changeset: 680bf758
Author:    erifan <erfang at nvidia.com>
Committer: Emanuel Peter <epeter at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/680bf758980452511ea72224066358e5fd38f060
Stats:     136 lines in 3 files changed: 9 ins; 0 del; 127 mod

8365911: AArch64: Fix encoding error in sve_cpy for negative floats

Reviewed-by: aph, epeter

-------------

PR: https://git.openjdk.org/jdk/pull/26951

From rcastanedalo at openjdk.org  Tue Sep  9 07:04:20 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 9 Sep 2025 07:04:20 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT
In-Reply-To: <VOmxZ5c0SKETC1N8-S-WrvXIU8qCaA5NMS_68UGwVDc=.00bbe82b-dd42-4393-b57b-9df634a12d88@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
 <1uDOe3Oe-hihmDHea2h8vcvRZsKKBeNp0J9lKYUujxk=.abd111bc-3625-4c71-bfa2-0a4c1f4d3875@github.com>
 <VOmxZ5c0SKETC1N8-S-WrvXIU8qCaA5NMS_68UGwVDc=.00bbe82b-dd42-4393-b57b-9df634a12d88@github.com>
Message-ID: <2brDXuLmbVBVRaeSyCdKokA706v3t6VsZfGvj_QceJ4=.4483390e-c726-4d82-b220-f1dbdf4efef0@github.com>

On Mon, 8 Sep 2025 15:38:52 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> Hi Cesar, thanks for addressing this issue. I will run some more comprehensive testing and have a look at it in the next days.
>
>> Hi Cesar, thanks for addressing this issue. I will run some more comprehensive testing and have a look at it in the next days.
> 
> Testing did not reveal any issues. I have, however, a high-level question: could the current two-step design ([SR state adjustment loop](https://github.com/openjdk/jdk/blob/166ef5e7b1c6d6a9f0f1f29fedb7f65b94f53119/src/hotspot/share/opto/escape.cpp#L300-L315) followed by an [NSR propagation loop](https://github.com/openjdk/jdk/blob/166ef5e7b1c6d6a9f0f1f29fedb7f65b94f53119/src/hotspot/share/opto/escape.cpp#L318-L320)) miss marking allocations as NSR in more complex scenarios, e.g. involving longer points-to/merge chains? Wouldn't it be more principled to re-run the SR state adjustment loop until a fixed point is reached, keeping `reducible_merges` consistent as new allocations are discovered to be NSR (e.g. by calling `revisit_reducible_phi_status`, with your clean-up applied, every time [an allocation is marked as NSR due to non-removable merges](https://github.com/openjdk/jdk/blob/166ef5e7b1c6d6a9f0f1f29fedb7f65b94f53119/src/hotspot/share/opto/escape.cpp#L2962-L2964))?

> @robcasloz - are you thinking that the "fixed point" loops on `find_scalar_replaceable_allocs` aren't sufficient?

You're right, that should do.

> At first glance yes, I think that the code would be more cleaned up if done that way. If the code had been written like that in the first place we wouldn't have seen the current issue. (...)

Agree, a single fixed point loop combining NSR detection and propagation would be ideal for clarity and maintainability.

>  I propose that we move forward with the current patch and work on this refactoring as a separate issue.

Sounds good, please file an RFE for that. I would then suggest postponing the clean-up in `revisit_reducible_phi_status` to that RFE.
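A single fixed-point loop that combines NSR detection and propagation could look roughly like the sketch below. It uses a deliberately simplified, hypothetical model. The `Merge` struct, `propagate_nsr`, and the rule that one NSR input makes a merge irreducible and marks all of its inputs NSR are assumptions, not C2's actual reducibility criteria in escape.cpp. The sketch shows why one propagation pass can miss allocations on longer merge chains: the loop stops only when a full pass changes nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a merge joins the allocations listed in `inputs`.
struct Merge {
  std::vector<std::size_t> inputs;  // indices into the NSR flag vector
};

// Assumed rule: if any input of a merge is NSR, the merge is not reducible,
// so every allocation flowing into it becomes NSR too. Repeat until a full
// pass changes nothing (a fixed point).
std::vector<bool> propagate_nsr(std::vector<bool> nsr,
                                const std::vector<Merge>& merges) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (const Merge& m : merges) {
      bool any_nsr = false;
      for (std::size_t a : m.inputs) {
        assert(a < nsr.size() && "merge input out of range");
        any_nsr = any_nsr || nsr[a];
      }
      if (!any_nsr) {
        continue;
      }
      for (std::size_t a : m.inputs) {
        if (!nsr[a]) {
          nsr[a] = true;
          changed = true;
        }
      }
    }
  }
  return nsr;
}
```

Consider merges {2,3}, {1,2}, {0,1}, visited in that order, with only allocation 0 initially NSR. One pass marks only allocation 1. The loop needs three changing passes and a final confirming pass to mark allocations 1, 2 and 3.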

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27063#issuecomment-3269166743

From mchevalier at openjdk.org  Tue Sep  9 07:31:30 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 9 Sep 2025 07:31:30 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
Message-ID: <yoQkUQQyMtEYQah2nRalN3sAAI3YT03or-K4rgPSWwI=.d9558bf4-4b78-4467-aa14-25ba3c2ae1c7@github.com>

On Mon, 8 Sep 2025 15:08:54 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   One more ResourceMark
>
> src/hotspot/share/opto/graphInvariants.cpp line 45:
> 
>> 43:       }
>> 44:     }
>> 45:   }
> 
> It seems you are assuming that all CFG nodes are reachable "from below".
> That is true in most cases... but:
> Haven't we had that pesky case of an "infinite loop", which is not reachable from below but is reachable from above?
> 
> See `_root_and_safepoints` in `PhaseCCP`. I'm not sure we need to worry about this, but I'd like to be sure that we have considered infinite loops here.
> 
> The risk is that otherwise you just call those nodes dead, and do not verify them, right? Or you would just ignore failures there.

I guess that is the risk, but I start from root and follow the outputs, so I'm checking reachability from above, no?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2332296280

From epeter at openjdk.org  Tue Sep  9 07:32:36 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 07:32:36 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v3]
In-Reply-To: <ylIL4AVS9i4oBXIImUlxGzE1uDAToMvzF282-EnOG8A=.a61aaeeb-cfef-44ae-8913-ee8f6f58b781@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <O5IGyu-C8N8goFvkFoKQxKuJ67f1_tedjCMqIwsLx1g=.69f50bdd-781e-4379-a8b5-12f8858ea299@github.com>
 <1XFXtkTlDshGtoxEdLVg0f2J2rtn4wz7CdUB9pb9N2g=.25e7e0b5-8468-4d91-adb9-c459bda40933@github.com>
 <b1TzbMFznYJuFizcy93hsTxo9-hoyDe7YKUuIsy7xRA=.6811ef6f-3b3b-4b8a-b63b-75d824e65968@github.com>
 <hjuxd7lDyNoeFhxtYBMJQA1IDwzdu5tb1ZQcBqQLSeA=.623134f4-b2b8-4010-a6b5-5815e9d29aaf@github.com>
 <ylIL4AVS9i4oBXIImUlxGzE1uDAToMvzF282-EnOG8A=.a61aaeeb-cfef-44ae-8913-ee8f6f58b781@github.com>
Message-ID: <k0ubo89q5sh66RtJ1D3sHphTg9s3NCBE_wkQv9KHDD4=.f7875a72-2bd7-45c5-ae16-1558e9997339@github.com>

On Mon, 8 Sep 2025 02:28:16 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> To me a `false` means this:
>> If we support gather/scatter, then we do not need a vector index, we can do without it.
>> 
>> Is that correct?
>> 
>> But that would contradict @fg1417's statement:
>> If we support gather/scatter, then we do not permit a vector index.
>> 
>> Can you clarify?
>
>> To me a `false` means this: If we support gather/scatter, then we do not need a vector index, we can do without it.
>> 
>> Is that correct?
> 
> Thanks for your review! Actually, gather/scatter always needs an index input. What this function decides is how the index elements are passed to the operations.
> 
> It doesn't assume anything about whether vector gather_load/scatter_store is supported by the backend. It just checks whether the `index` input of such operations requires a vector register or an address where the indexes are stored. Currently, on x86, an array address is passed for subword types (the indexes are then loaded one by one in the backend codegen). On AArch64, however, we require a vector type for all types instead (the indexes have already been loaded into vector registers at the IR level).
> 
>> The current platform does not support vector gather-load or scatter-store at all.
> 
> I'm sorry I didn't explain @fg1417's second statement clearly. Whether the current platform supports vector gather-load/scatter-store is still decided by `Matcher::match_rule_supported_vector()`, as for other operations. It returns `false` here just because arm doesn't support any vector operations. Assuming it wanted to support vector gather/scatter, the index input must not be a vector, right?

Thanks for all the explanations, that was very helpful!

Can you please adjust the comment so that all the relevant information is there?
We could also make the name of the method more precise / informative?
Maybe you could write something like this:

// true -> if gather/scatter supported: require index in vector register
// false -> if gather/scatter supported: allows both index in vector register AND array address holding indices

Then give more information about platform specific things that you mentioned about aarch64 and x86 in the relevant files ;)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2332295242

From xgong at openjdk.org  Tue Sep  9 07:32:37 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Tue, 9 Sep 2025 07:32:37 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v3]
In-Reply-To: <k0ubo89q5sh66RtJ1D3sHphTg9s3NCBE_wkQv9KHDD4=.f7875a72-2bd7-45c5-ae16-1558e9997339@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <O5IGyu-C8N8goFvkFoKQxKuJ67f1_tedjCMqIwsLx1g=.69f50bdd-781e-4379-a8b5-12f8858ea299@github.com>
 <1XFXtkTlDshGtoxEdLVg0f2J2rtn4wz7CdUB9pb9N2g=.25e7e0b5-8468-4d91-adb9-c459bda40933@github.com>
 <b1TzbMFznYJuFizcy93hsTxo9-hoyDe7YKUuIsy7xRA=.6811ef6f-3b3b-4b8a-b63b-75d824e65968@github.com>
 <hjuxd7lDyNoeFhxtYBMJQA1IDwzdu5tb1ZQcBqQLSeA=.623134f4-b2b8-4010-a6b5-5815e9d29aaf@github.com>
 <ylIL4AVS9i4oBXIImUlxGzE1uDAToMvzF282-EnOG8A=.a61aaeeb-cfef-44ae-8913-ee8f6f58b781@github.com>
 <k0ubo89q5sh66RtJ1D3sHphTg9s3NCBE_wkQv9KHDD4=.f7875a72-2bd7-45c5-ae16-1558e9997339@github.com>
Message-ID: <uLaQ6INqEPrD-G2XOIiIyhKgldALClCiUiVKgZNLm34=.84d9d564-1b9a-4db6-a14e-82a0c1dad625@github.com>

On Tue, 9 Sep 2025 07:27:46 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>>> To me a `false` means this: If we support gather/scatter, then we do not need a vector index, we can do without it.
>>> 
>>> Is that correct?
>> 
>> Thanks for your review! Actually, gather/scatter always needs an index input. What this function decides is how the index elements are passed to the operations.
>> 
>> It doesn't assume anything about whether vector gather_load/scatter_store is supported in the backend. It just checks whether the `index` input of such operations requires a vector register or an address that stores the indexes. Currently, on x86, an array address is passed for subword types (the indexes are then loaded one by one in backend codegen). On AArch64, however, we require a vector type for all types (the indexes are loaded and saved into vector registers at the IR level).
>> 
>>> The current platform does not support vector gather-load or scatter-store at all.
>> 
>> I'm sorry I didn't explain @fg1417's second statement clearly. Whether the current platform supports vector gather-load/scatter-store is still decided by `Matcher::match_rule_supported_vector()`, like other operations. It returns `false` here just because arm doesn't support any vector operations. If it wanted to support vector gather/scatter, the index input would not need to be a vector, right?
>
> Thanks for all the explanations, that was very helpful!
> 
> Can you please adjust the comment so that all the relevant information is there?
> We could also make the name of the method more precise / informative?
> Maybe you could write something like this:
> 
> // true -> if gather/scatter supported: require index in vector register
> // false -> if gather/scatter supported: allows both index in vector register AND array address holding indices
> 
> Then give more information about platform specific things that you mentioned about aarch64 and x86 in the relevant files ;)

Sure, I will do that in next commit. Thanks for your suggestion!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2332301225

From epeter at openjdk.org  Tue Sep  9 07:32:39 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 07:32:39 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v5]
In-Reply-To: <P1FNs23o3qks_15w5YJCBfiwLMs1QW_aBI2KSkptKZ4=.83c7e3b3-15fe-47a0-86dc-e1549af59e20@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <sikeN2K9EHmLEEnGsNttJSsPefvEtrKk9q7OOIgqQG0=.238662f7-394c-475b-bd07-effb28302b2c@github.com>
 <LeM6JS1sHVJMB2Ziw2pYgcTQr7hHcPUSK6guAV5DCC0=.f3de7008-b13c-4a96-bc85-b50784bb21ce@github.com>
 <wBLVOTyKQkrpbBB04sxxxgZ0wl7A8_HuC2K-3s8j4wQ=.22931e86-0381-46ea-8d34-5020542e063c@github.com>
 <0lnaxN7YsQEddGZfWLgFi2YOl_XtXntDoHRr57Bjp7k=.946b3e40-04c1-4eb5-a205-53347cdc91eb@github.com>
 <P1FNs23o3qks_15w5YJCBfiwLMs1QW_aBI2KSkptKZ4=.83c7e3b3-15fe-47a0-86dc-e1549af59e20@github.com>
Message-ID: <mJ83rN71NdOfHjFP9bpFosqBeBf220ODeE36Bt6wBmA=.74e33425-bda6-41ff-81ac-880210e98c3b@github.com>

On Mon, 8 Sep 2025 02:57:55 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>>> That semantic is not quite what I would expect from `Concatenate`. Maybe we can call it something else? `VectorConcatenateAndNarrowNode`?
>> 
>> Yeah, `VectorConcatenateAndNarrowNode` would be a much better match. I just thought the name would be too long. I will change it in the next commit. Thanks for your suggestion!
>
>> Have you considered using `2x Cast + Concatenate` instead, and just matching that in the backend? I don't remember how to do the mere Concat, but it should be possible via the `unslice` or some other operation that concatenates two vectors.
> 
> Would using `2x Cast + Concatenate` make the IR and match rules more complex? A mere concatenate would be something like `slice` in the Vector API. It concatenates two vectors into one, with an index denoting the merge position, and it requires the two input vectors and the dst vector to have the same type. Hence, if we split this operation into cast and concatenate, the IR would be (assuming the original type of `v1/v2` is `4-int` and the result type is `8-short`):
> 1) Narrow two input vectors:
> `v1 = VectorCast(v1)  (4-short); v2 = VectorCast(v2) (4-short)`. 
> The vector length is unchanged while the element size is halved, so the vector length in bytes is halved as well.
> 2) Resize `v1` and `v2` to double vector length. The higher bits are cleared:
> `v1 = VectorReinterpret(v1) (8-short); v2 = VectorReinterpret(v2) (8-short)`.
> 3) Concatenate `v1` and `v2` like slice. The position is the middle of the vector length.
> `v = VectorSlice(v1, v2, 4)  (8-short)`.
> 
> If we want to merge these IR nodes in the backend, would the match rule be more complex? I will consider it.

I'm not saying I know that this alternative would be better. I'm just worried about having extra IR nodes, and then optimizations are more complex / just don't work because we don't handle all nodes.
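
To make the comparison concrete, here is a scalar model (plain `std::array`s standing in for vector registers) of the three-step decomposition described above next to the single fused node, for the two `4-int` inputs to one `8-short` result case. Step 3 is modelled simply as placing `v2`'s low half after `v1`'s low half, which is the intended effect; it does not try to reproduce the exact operand order of C2's or the Vector API's slice. All function names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model: std::array stands in for vector registers (lane 0 first).
using Int4   = std::array<std::int32_t, 4>;
using Short4 = std::array<std::int16_t, 4>;
using Short8 = std::array<std::int16_t, 8>;

// Step 1 (VectorCast): narrow each int lane to short; the lane count is unchanged.
Short4 narrow(const Int4& v) {
  Short4 r{};
  for (int i = 0; i < 4; i++) r[i] = static_cast<std::int16_t>(v[i]);
  return r;
}

// Step 2 (VectorReinterpret): resize to 8 lanes with the upper lanes zeroed.
Short8 resize_zero_upper(const Short4& v) {
  Short8 r{};  // value-initialized, so lanes 4..7 are 0
  for (int i = 0; i < 4; i++) r[i] = v[i];
  return r;
}

// Step 3: concatenate the low halves of a and b, with b starting at lane `pos`.
Short8 concat_low_halves(const Short8& a, const Short8& b, int pos) {
  assert(pos == 4);  // only the "middle of the vector" case is modelled
  Short8 r{};
  for (int i = 0; i < pos; i++) {
    r[i] = a[i];
    r[pos + i] = b[i];
  }
  return r;
}

// The three-node decomposition.
Short8 decomposed(const Int4& v1, const Int4& v2) {
  return concat_low_halves(resize_zero_upper(narrow(v1)),
                           resize_zero_upper(narrow(v2)), 4);
}

// A single fused concatenate-and-narrow node.
Short8 fused(const Int4& v1, const Int4& v2) {
  Short8 r{};
  for (int i = 0; i < 4; i++) {
    r[i]     = static_cast<std::int16_t>(v1[i]);
    r[4 + i] = static_cast<std::int16_t>(v2[i]);
  }
  return r;
}
```

Both paths compute the same lanes; the open question in the review is only which form is easier for the IR optimizations and backend match rules to handle.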

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2332301985

From epeter at openjdk.org  Tue Sep  9 07:36:52 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 07:36:52 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v5]
In-Reply-To: <nqrYrU7Mek9J-2Ogzuo1ZEMXN6iVdKsOztA403z7kcg=.ac9b40ba-5cd0-4f61-b245-a09559fb1de4@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <sikeN2K9EHmLEEnGsNttJSsPefvEtrKk9q7OOIgqQG0=.238662f7-394c-475b-bd07-effb28302b2c@github.com>
 <LeM6JS1sHVJMB2Ziw2pYgcTQr7hHcPUSK6guAV5DCC0=.f3de7008-b13c-4a96-bc85-b50784bb21ce@github.com>
 <wBLVOTyKQkrpbBB04sxxxgZ0wl7A8_HuC2K-3s8j4wQ=.22931e86-0381-46ea-8d34-5020542e063c@github.com>
 <nqrYrU7Mek9J-2Ogzuo1ZEMXN6iVdKsOztA403z7kcg=.ac9b40ba-5cd0-4f61-b245-a09559fb1de4@github.com>
Message-ID: <b_jxflpoOL-0fjwOgW7C3ldG27PR6VRovvLFG8ccpEM=.8ca0f698-c57b-4a0e-a520-0d46fe4305a6@github.com>

On Mon, 8 Sep 2025 03:12:18 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> Did you consider the alternative of `Extract` + `Cast`? Not sure if that would be better, you know more about the code complexity. It would just allow us to have one fewer node.
>
> C2 only has the `Extract` node, which extracts a single element from a vector, right? Extracting the lowest part can easily be implemented with `VectorReinterpret`, but what about the higher parts? Maybe they could be implemented with operations like `slice`, but it seems this would also make the IR more complex. For `Cast`, we have `VectorCastMask` now, but it assumes the input and output have the same vector length, so a `VectorReinterpret` or a `VectorExtract` is still needed.
> 
> I can have a try with separating the IR. But I guess an additional new node is still necessary. 
> 
>> It would just allow us to have one fewer node.
> 
> This is also what I expect really.

It would just be nice to build on "simple" building blocks and not have too many complex nodes, that have very special semantics (widen + split into two). It just means that the IR optimizations have to take care of more special cases, rather than following simple rules/optimizations because every IR node does a relatively simple thing.

Maybe you find out that we really need a complex node, and can provide good arguments. Looking forward to what you find :)
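
A scalar sketch of the "simple building blocks" idea applied to masks: an `Extract`-like step that selects one half of an 8-lane 16-bit mask, followed by a `Cast`-like step that sign-extends each lane to 32 bits, compared with a single node that does both (widen + split into two). It assumes mask lanes are either all ones (-1) or all zeros; the names are hypothetical and do not correspond to existing C2 nodes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model: each mask lane is -1 (set) or 0 (clear).
using Mask8x16 = std::array<std::int16_t, 8>;
using Mask4x16 = std::array<std::int16_t, 4>;
using Mask4x32 = std::array<std::int32_t, 4>;

// Building block 1 (Extract-like): lanes [4 * part, 4 * part + 4).
Mask4x16 extract_half(const Mask8x16& m, int part) {
  assert(part == 0 || part == 1);
  Mask4x16 r{};
  for (int i = 0; i < 4; i++) r[i] = m[4 * part + i];
  return r;
}

// Building block 2 (Cast-like): widen each lane; sign extension keeps -1 as -1.
Mask4x32 widen(const Mask4x16& m) {
  Mask4x32 r{};
  for (int i = 0; i < 4; i++) r[i] = m[i];
  return r;
}

// Hypothetical fused node: widen and split into two halves in one step.
std::array<Mask4x32, 2> widen_and_split(const Mask8x16& m) {
  std::array<Mask4x32, 2> r{};
  for (int i = 0; i < 8; i++) r[i / 4][i % 4] = m[i];
  return r;
}
```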

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2332311631

From mchevalier at openjdk.org  Tue Sep  9 07:38:59 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 9 Sep 2025 07:38:59 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
Message-ID: <xAmOcPQgDd2KuCxet58jDVYlvXSD4hQolqiGZw8l75o=.4849ddca-0b5d-4347-8642-9bc2c73d237b@github.com>

On Mon, 8 Sep 2025 15:48:48 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   One more ResourceMark
>
> src/hotspot/share/opto/graphInvariants.cpp line 215:
> 
>> 213:       ss.print_cr("Input at index %d is nullptr.", _which_input);
>> 214:       return false;
>> 215:     }
> 
> So we would never do `AtInput(0, ExpectNullptr())` for example?
> Fine with me, just an idea to consider ;)

No, we can't do that because every pattern must be applied on a center. `AtInput` moves the center. We cannot use a parametric pattern to check that a node around is not there: there would be no place to apply the parameter pattern. We can make `InputIsNull(int)` for that.
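
A toy sketch of the suggested `InputIsNull(int)`: unlike `AtInput`, it does not move the center to the input (there is nothing to move to when the input is `nullptr`); it just inspects an input of the current center. `Node`, `Pattern` and the `std::ostringstream` reporting are simplified stand-ins for the HotSpot types, not their real interfaces.

```cpp
#include <cassert>
#include <sstream>
#include <vector>

// Toy stand-in for HotSpot's Node: only the inputs are modelled.
struct Node {
  std::vector<const Node*> in;  // nullptr means "no node at this input"
};

// Toy pattern interface: on failure, write a message to `ss`.
struct Pattern {
  virtual ~Pattern() = default;
  virtual bool check(const Node* center, std::ostringstream& ss) const = 0;
};

// Hypothetical InputIsNull(i): checks an input of the current center
// without moving the center there.
class InputIsNull : public Pattern {
 public:
  explicit InputIsNull(int which) : _which_input(which) {}

  bool check(const Node* center, std::ostringstream& ss) const override {
    if (_which_input >= static_cast<int>(center->in.size())) {
      ss << "Node has no input at index " << _which_input << ".";
      return false;
    }
    if (center->in[_which_input] != nullptr) {
      ss << "Input at index " << _which_input << " is not nullptr.";
      return false;
    }
    return true;
  }

 private:
  const int _which_input;
};
```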

> src/hotspot/share/opto/graphInvariants.hpp line 73:
> 
>> 71:    * In addition, if the check fails, it must write its error message in [ss].
>> 72:    *
>> 73:    * If the check succeeds or is not applicable, [steps], [path] and [ss] must be untouched.
> 
> I wonder if we should not have some object that represents these 3 args. You pass them everywhere, and they seem to be a unit. And they have invariants that we may want to check.
> You could for example enforce that steps and path are in synch just by only providing the access methods that allow it.
> What do you think?

`steps` and `path` could make sense together. I don't think it makes sense for `ss`, because we just fill it from `steps` and `path` at some point; it doesn't really evolve with them. If you like it, I won't fight, but is it worth it? It means one more ad-hoc type to be aware of in exchange for simplifying the code a little: real benefits, but not big ones, imo.
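
For reference, a sketch of what bundling `steps` and `path` could look like: the only mutators push or pop both vectors together, so they cannot get out of sync, and the trace can be rendered into a message when a check fails. `FailurePath`, its methods and the one-input-index-per-step invariant are assumptions for illustration; the real code uses `Node_List` and `GrowableArray<int>`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Node { std::string name; };  // toy stand-in for HotSpot's Node

// Hypothetical wrapper: steps and path can only change together.
class FailurePath {
 public:
  // Record that `node` was reached by following input `input_index`.
  void push(const Node* node, int input_index) {
    _steps.push_back(node);
    _path.push_back(input_index);
  }

  void pop() {
    assert(!_steps.empty());
    _steps.pop_back();
    _path.pop_back();
  }

  std::size_t size() const { return _steps.size(); }

  // Render the trace, e.g. "in(0)=Region -> in(1)=If".
  std::string render() const {
    std::string out;
    for (std::size_t i = 0; i < _steps.size(); i++) {
      if (i > 0) out += " -> ";
      out += "in(" + std::to_string(_path[i]) + ")=" + _steps[i]->name;
    }
    return out;
  }

 private:
  std::vector<const Node*> _steps;
  std::vector<int> _path;  // _path[i] is the input index used to reach _steps[i]
};
```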

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2332310623
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2332318192

From epeter at openjdk.org  Tue Sep  9 07:38:59 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 07:38:59 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <xAmOcPQgDd2KuCxet58jDVYlvXSD4hQolqiGZw8l75o=.4849ddca-0b5d-4347-8642-9bc2c73d237b@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
 <xAmOcPQgDd2KuCxet58jDVYlvXSD4hQolqiGZw8l75o=.4849ddca-0b5d-4347-8642-9bc2c73d237b@github.com>
Message-ID: <W0_kqs2nu1NAKo8ToUG04-CBJ-XPY1EkZ93EZJ4bP2A=.f78e3b75-65e3-4897-b084-3d4fd6cce040@github.com>

On Tue, 9 Sep 2025 07:33:25 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> src/hotspot/share/opto/graphInvariants.cpp line 215:
>> 
>>> 213:       ss.print_cr("Input at index %d is nullptr.", _which_input);
>>> 214:       return false;
>>> 215:     }
>> 
>> So we would never do `AtInput(0, ExpectNullptr())` for example?
>> Fine with me, just an idea to consider ;)
>
> No, we can't do that because every pattern must be applied on a center. `AtInput` moves the center. We cannot use a parametric pattern to check that a node around is not there: there would be no place to apply the parameter pattern. We can make `InputIsNull(int)` for that.

Sounds good, we can do that when we need it :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2332316770

From mchevalier at openjdk.org  Tue Sep  9 07:45:15 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 9 Sep 2025 07:45:15 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
Message-ID: <sYk2XwTRlI1qcJCfFB9-vIbob_OybhJd7dD73KnIrCk=.81140267-24ce-4ff0-a684-63f98fe24278@github.com>

On Mon, 8 Sep 2025 15:59:58 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   One more ResourceMark
>
> src/hotspot/share/opto/graphInvariants.cpp line 222:
> 
>> 220:     }
>> 221:     return result;
>> 222:   }
> 
> Would this not read better?
> Suggestion:
> 
>     bool success = _pattern->check(center->in(_which_input), state);
>     if (!success) {
>       state.trace_failure_path(center, _which_input);
>     }
>     return success;
>   }

I don't think it's terrible, but I don't think it's much better. If I know the code and know that we want to add the new points to the path, I'll read it as such either way. If I don't know the code but know the types of `steps` and `path`, I know what `push` does, while a custom type with custom methods has a higher learning cost. So to me, it's pretty equivalent.

> src/hotspot/share/opto/graphInvariants.cpp line 239:
> 
>> 237:     return true;
>> 238:   }
>> 239:   bool (Node::*_type_check)() const;
> 
> Suggestion:
> 
> private:
>   bool (Node::*_type_check)() const;
> 
> I would also suggest that you use a `typedef` here.
> Something like:
> `typedef bool (Node::*TypeCheckMethod)() const;`
> Then you can write
> Suggestion:
> 
> public:
>   const TypeCheckMethod _type_check;

Again, is it really better? If I know the code, I know what it needs, however it is expressed. If I don't know the code, looking at the signature won't be enough; I'll need to look one level deeper for the definition. Not sure it's a win.
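
For readers unfamiliar with the syntax under discussion, a minimal self-contained example of the suggested `typedef` for a pointer to a `const` member function and of how such a pointer is called with `->*`. `Node` and `NodeClassIs` here are toy stand-ins, not the classes from the patch.

```cpp
#include <cassert>

// Toy Node with HotSpot-style is_Xxx() queries (not the real class).
struct Node {
  bool phi = false;
  bool region = false;
  bool is_Phi() const { return phi; }
  bool is_Region() const { return region; }
};

// The suggested typedef: pointer to a const, argument-free member function of Node returning bool.
typedef bool (Node::*TypeCheckMethod)() const;

class NodeClassIs {
 public:
  explicit NodeClassIs(TypeCheckMethod type_check) : _type_check(type_check) {}

  bool check(const Node* center) const {
    // ->* binds the member-function pointer to the object; the parentheses are required.
    return (center->*_type_check)();
  }

 private:
  const TypeCheckMethod _type_check;
};
```

The same alias can also be written as `using TypeCheckMethod = bool (Node::*)() const;`.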

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2332328768
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2332333010

From mchevalier at openjdk.org  Tue Sep  9 07:48:26 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 9 Sep 2025 07:48:26 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <r0rzQ5BGSRpGVAiK5E9zAZIYkU3gTPa7KjBATdETP6U=.5bff0999-2bfd-4f6e-9044-fcfb74e5e00d@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
 <r0rzQ5BGSRpGVAiK5E9zAZIYkU3gTPa7KjBATdETP6U=.5bff0999-2bfd-4f6e-9044-fcfb74e5e00d@github.com>
Message-ID: <k0FHgFPoG8gM3kEtbMGLfNap6yApYM_fJe3EdxqC1Q8=.1f42f626-dde4-45a3-b756-ca782be6e436@github.com>

On Mon, 8 Sep 2025 16:08:37 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/graphInvariants.cpp line 234:
>> 
>>> 232:   bool check(const Node* center, Node_List& steps, GrowableArray<int>& path, stringStream& ss) const override {
>>> 233:     if (!(center->*_type_check)()) {
>>> 234:       ss.print_cr("Unexpected type: %s.", center->Name());
>> 
>> Is there a way we could say what we actually do expect? Not really, right? We'd need to do it via macro again.
>
> Or we pass a string .. not nice but would work with the macro for `NodeClassIsAndBind`. Not sure what's best here.

I thought about that and I think the current situation is ok. The pattern is not something highly mutable; it's mostly hardcoded. I don't think it's hard to figure out what you're expecting. I'm very reluctant to add ugliness to the patterns, which must stay readable so that they are easy to verify by a human. It could be solved with more templates, though.
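
One way the macro route could work, sketched under assumptions: a macro that names the node class once, token-pastes it into the `is_`-style query and stringizes it for the error message, so a failure can report what was expected. `NODE_CLASS_IS`, `NodeClassIs` and the toy `Node` are hypothetical; the patch's `NodeClassIsAndBind` macro is not reproduced here.

```cpp
#include <cassert>
#include <sstream>

// Toy stand-in for HotSpot's Node.
struct Node {
  const char* name;
  bool phi;
  bool is_Phi() const { return phi; }
  const char* Name() const { return name; }
};

typedef bool (Node::*TypeCheckMethod)() const;

class NodeClassIs {
 public:
  NodeClassIs(TypeCheckMethod type_check, const char* expected)
      : _type_check(type_check), _expected(expected) {}

  bool check(const Node* center, std::ostringstream& ss) const {
    if (!(center->*_type_check)()) {
      ss << "Unexpected type: " << center->Name() << " (expected " << _expected << ").";
      return false;
    }
    return true;
  }

 private:
  const TypeCheckMethod _type_check;
  const char* const _expected;
};

// Hypothetical macro: `cls` is written once and used both for the query
// (token pasting with ##) and for the message (stringizing with #).
#define NODE_CLASS_IS(cls) NodeClassIs(&Node::is_##cls, #cls)
```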

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2332342466

From aph at openjdk.org  Tue Sep  9 07:58:11 2025
From: aph at openjdk.org (Andrew Haley)
Date: Tue, 9 Sep 2025 07:58:11 GMT
Subject: RFR: 8361376: Regressions 1-6% in several Renaissance in 26-b4
 only MacOSX aarch64 [v5]
In-Reply-To: <LZMIDjRdRM4uKuuhsDrLGrwWJoVMrwwEv_bprRIjddk=.048960e2-ef28-4274-a8c4-1f0d1d417100@github.com>
References: <FYgWIv_iFwkr9an56KHdJqZlyUgtD_4g2f51hvavZWw=.f5c943f7-65e6-4944-afec-0b9c19a7b284@github.com>
 <LZMIDjRdRM4uKuuhsDrLGrwWJoVMrwwEv_bprRIjddk=.048960e2-ef28-4274-a8c4-1f0d1d417100@github.com>
Message-ID: <yuxkc5X6CdT6Lo2rMXjYGtzJ6uu2CMYm3MEfgAyUdaI=.a784c733-cf12-422d-aa62-4d202ac4bd3d@github.com>

On Mon, 4 Aug 2025 21:26:22 GMT, Dean Long <dlong at openjdk.org> wrote:

>> This PR removes the recently added lock around set_guard_value, using instead Atomic::cmpxchg to atomically update bit-fields of the guard value.  Further, it takes a fast-path that uses the previous direct store when at a safepoint.  Combined, these changes should get us back to almost where we were before in terms of overhead.  If necessary, we could go even further and allow make_not_entrant() to perform a direct byte store, leaving 24 bits for the guard value.
>
> Dean Long has updated the pull request incrementally with one additional commit since the last revision:
> 
>   one unconditional release should be enough

That looks like a nice improvement. Thanks.
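
A minimal sketch of the technique the PR description mentions: updating only some bit-fields of a shared guard word with a compare-and-swap loop instead of taking a lock. It uses `std::atomic` rather than HotSpot's `Atomic::cmpxchg`, and the 24-bit/8-bit split of the word is an assumption borrowed from the "leaving 24 bits for the guard value" remark, not the actual nmethod entry barrier layout.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed layout for illustration: low 24 bits = guard value, top 8 bits = other state.
constexpr std::uint32_t kValueMask = 0x00FFFFFFu;

// Replace only the bits selected by `mask`, preserving whatever other
// threads have stored in the remaining bits. Retries until the CAS succeeds.
void set_bits(std::atomic<std::uint32_t>& guard, std::uint32_t mask, std::uint32_t bits) {
  assert((bits & ~mask) == 0);
  std::uint32_t old_val = guard.load(std::memory_order_relaxed);
  std::uint32_t new_val;
  do {
    new_val = (old_val & ~mask) | bits;
  } while (!guard.compare_exchange_weak(old_val, new_val,
                                        std::memory_order_release,
                                        std::memory_order_relaxed));
}

void set_guard_value(std::atomic<std::uint32_t>& guard, std::uint32_t value) {
  set_bits(guard, kValueMask, value & kValueMask);
}
```

If the CAS fails because another thread changed the word, `compare_exchange_weak` reloads the current value into `old_val` and the loop recomputes the new value, so concurrent updates to the other bits are not lost.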

-------------

Marked as reviewed by aph (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26399#pullrequestreview-3199895559

From mchevalier at openjdk.org  Tue Sep  9 08:01:24 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 9 Sep 2025 08:01:24 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <r0rzQ5BGSRpGVAiK5E9zAZIYkU3gTPa7KjBATdETP6U=.5bff0999-2bfd-4f6e-9044-fcfb74e5e00d@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
 <r0rzQ5BGSRpGVAiK5E9zAZIYkU3gTPa7KjBATdETP6U=.5bff0999-2bfd-4f6e-9044-fcfb74e5e00d@github.com>
Message-ID: <4jB_I2sHD7IfzhR7ojHfsFPlvZFCOWaHf8aS0AZshj0=.d0162feb-10b8-488d-82fa-eb816ce5dda9@github.com>

On Mon, 8 Sep 2025 16:42:32 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/graphInvariants.cpp line 378:
>> 
>>> 376:     return CheckResult::VALID;
>>> 377:   }
>>> 378: };
>> 
>> I am wondering if it is really worth it to do the whole pattern matching approach, if we still have to write so much code.
>> 
>> There is a lot of boiler plate now, that has replaced the procedural code.
>> 
>> I'm just wondering if we are there yet, or if we need to find some way to make it more concise.
>> Maybe we can do something like this:
>> 
>> return <something>
>>        .applies_if(&Node::is_Phi)
>>        .check([&]() { return PatternBasedCheck::check(center, reachable_cfg_nodes, steps, path, ss); })
>>        .require(...)
>>        .finish();
>> 
>> Just an idea. It would probably be lambda based again, which has its disadvantages.
>> Maybe you have an even better idea.
>> I'd just like to understand why the Pattern based approach is really super desirable, what are the advantages and disadvantages?
>
> One advantage is definitely reporting. And it is still reasonably debuggable I think, my solution may be a little trickier that way.
> 
> I think there are multiple factors:
> - Simple: fewer abstractions can be easier to read/debug.
> - Concise: few lines of code.
> - Reporting: nice output when rules fail.

I could have written this without patterns at all, but I also wanted to give more examples of patterns used at different levels of complexity. Writing it without patterns at all would look pretty similar to me.

I think the boilerplate has to exist somewhere. It's not nice to read and it's long, but it's (actually) simple. If we hide it somewhere, it's nicer to read and gives the impression of being easier to understand, but it's harder to actually understand when something goes wrong. No strong opinion.
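
For comparison, a toy version of the fluent style proposed in the review (`applies_if(...)`, `require(...)`, `finish()`), assuming lambdas for the individual requirements and reporting only the first failing one. Everything here (`Check`, `CheckResult`, the toy `Node`) is hypothetical and only meant to help weigh conciseness against the debuggability concern; it needs C++17.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Node {  // toy stand-in for HotSpot's Node
  std::string name;
  bool phi = false;
  bool is_Phi() const { return phi; }
};

enum class CheckResult { VALID, FAILED, NOT_APPLICABLE };

// Hypothetical fluent builder; none of these names exist in HotSpot.
class Check {
 public:
  explicit Check(const Node* center) : _center(center) {}

  Check& applies_if(bool (Node::*pred)() const) {
    if (!(_center->*pred)()) _applicable = false;
    return *this;
  }

  Check& require(std::function<bool()> cond, std::string msg) {
    _requirements.emplace_back(std::move(cond), std::move(msg));
    return *this;
  }

  CheckResult finish(std::ostringstream& ss) const {
    if (!_applicable) return CheckResult::NOT_APPLICABLE;
    for (const auto& [cond, msg] : _requirements) {
      if (!cond()) {
        ss << _center->name << ": " << msg;  // report the first failing requirement
        return CheckResult::FAILED;
      }
    }
    return CheckResult::VALID;
  }

 private:
  const Node* _center;
  bool _applicable = true;
  std::vector<std::pair<std::function<bool()>, std::string>> _requirements;
};
```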

>> src/hotspot/share/opto/graphInvariants.cpp line 547:
>> 
>>> 545:         return CheckResult::FAILED;
>>> 546:       }
>>> 547:     }
>> 
>> If you do add `OuterStripMinedLoop`, make it a switch, and assert in the default case ;)
>
> I just saw that you do the `OuterStripMinedLoop` below. But to capture the parallel structure it may still be good. And to capture possible future extension.

I don't understand.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2332373118
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2332384303

From mchevalier at openjdk.org  Tue Sep  9 08:01:27 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 9 Sep 2025 08:01:27 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
Message-ID: <q3X_EWZRIU-1SVW3m5JilUivZd2tAv8G-IeGilUZpKY=.ce081014-24ce-44f8-9cd3-776930629e97@github.com>

On Mon, 8 Sep 2025 16:47:23 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   One more ResourceMark
>
> src/hotspot/share/opto/graphInvariants.cpp line 438:
> 
>> 436:         ss.print_cr("%s node must have at least one control successors. Found %d.", center->Name(), cfg_out);
>> 437:         return CheckResult::FAILED;
>> 438:       }
> 
> Is there some upper bound?

I don't think so.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2332377717

From epeter at openjdk.org  Tue Sep  9 08:08:36 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 08:08:36 GMT
Subject: RFR: 8366971: C2: Remove unused nop_list from
 PhaseOutput::init_buffer
In-Reply-To: <2hBEO9Zpoy2wo_pgTXE9v8KG5u1HNdKp3RgQE-4HYcE=.e86088d1-1e25-49b5-9b3c-c2498ec6ca48@github.com>
References: <2hBEO9Zpoy2wo_pgTXE9v8KG5u1HNdKp3RgQE-4HYcE=.e86088d1-1e25-49b5-9b3c-c2498ec6ca48@github.com>
Message-ID: <G8LTw3Itb1nlDmVNaEFLky860NKviRQ5YEY8L8xJwdQ=.9a1d668d-f40d-4982-89f6-24bf999cdece@github.com>

On Fri, 5 Sep 2025 13:02:00 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

> The nop list has never been used in the history of OpenJDK. Let's clean it up.
> 
> Tested with Mach5 tier 1-5, no related failures.

Looks quite reasonable. Thanks for cleaning the code :)

src/hotspot/cpu/ppc/ppc.ad line 4926:

> 4924:   // Unused, list one so that array generated by adlc is not empty.
> 4925:   // Aix compiler chokes if _nop_count = 0.
> 4926:   nops(fxNop);

There seems to be some justification here why we needed to have the list.
Can you quickly say why we should not be worried about that now? ;)

-------------

PR Review: https://git.openjdk.org/jdk/pull/27117#pullrequestreview-3199928796
PR Review Comment: https://git.openjdk.org/jdk/pull/27117#discussion_r2332395694

From epeter at openjdk.org  Tue Sep  9 08:13:24 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 08:13:24 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <yoQkUQQyMtEYQah2nRalN3sAAI3YT03or-K4rgPSWwI=.d9558bf4-4b78-4467-aa14-25ba3c2ae1c7@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
 <yoQkUQQyMtEYQah2nRalN3sAAI3YT03or-K4rgPSWwI=.d9558bf4-4b78-4467-aa14-25ba3c2ae1c7@github.com>
Message-ID: <Cr1UEMl2OZkAAx5it6c6OxAV-J3-UPFin9qRNyfkXlY=.6ff710a7-4563-4ccc-a92d-640ca2a4eb90@github.com>

On Tue, 9 Sep 2025 07:28:14 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> src/hotspot/share/opto/graphInvariants.cpp line 45:
>> 
>>> 43:       }
>>> 44:     }
>>> 45:   }
>> 
>> It seems you are assuming that all CFG nodes are reachable "from below".
>> That is true in most cases... but:
>> Have we not had this pesky case where we have an "infinite loop", where there is really no reachability from below, but from above it is reachable.
>> 
>> See `_root_and_safepoints` in `PhaseCCP`. I'm not sure we need to worry about this, but I'd like to be sure that we have considered infinite loops here.
>> 
>> The risk is that otherwise you just call those nodes dead, and do not verify them, right? Or you would just ignore failures there.
>
> I guess that is the risk, but I'm starting from root and following the outputs, so I'm checking reachability from above, no?

Never mind, I somehow misread this. Sorry!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2332431787

From epeter at openjdk.org  Tue Sep  9 08:18:03 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 08:18:03 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <4jB_I2sHD7IfzhR7ojHfsFPlvZFCOWaHf8aS0AZshj0=.d0162feb-10b8-488d-82fa-eb816ce5dda9@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
 <r0rzQ5BGSRpGVAiK5E9zAZIYkU3gTPa7KjBATdETP6U=.5bff0999-2bfd-4f6e-9044-fcfb74e5e00d@github.com>
 <4jB_I2sHD7IfzhR7ojHfsFPlvZFCOWaHf8aS0AZshj0=.d0162feb-10b8-488d-82fa-eb816ce5dda9@github.com>
Message-ID: <CTvOJ3ySq51MIP-9Edzrguc7qn5NG4oYQC7yuSEpoqo=.e4f47579-2230-45eb-861e-d93d05011505@github.com>

On Tue, 9 Sep 2025 07:58:45 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> I just saw that you do the `OuterStripMinedLoop` below. But to capture the parallel structure it may still be good. And to capture possible future extension.
>
> I don't understand.

I would still consider adding `OuterStripMinedLoop` here, to capture that it has a similar structure. Even if you also verify below specific things for `OuterStripMinedLoop`. Just to check that all these loop structures have the same kind of backedge shape.
And then make a switch out of it, with a default case that fails. In case we add yet another `Loop` shape, we would then catch that and add the logic for it.

But actually: do not all `Loop` shapes have this backedge pattern? Or are there some that have an `IfFalse` on the backedge? Because then you could also add `LoopNode` with `LoopEndNode`.
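
A sketch of the suggested shape: dispatch on the node's opcode so that the loop kinds sharing a backedge structure are listed side by side, and let the `default` case fail so that a newly added loop kind gets noticed. The opcode values and `CheckResult` are toy stand-ins, `backedge_shape_ok` abstracts whatever the shared backedge check is, and treating plain `Loop` as not applicable is an assumption pending the open question above.

```cpp
#include <cassert>

// Toy opcodes; in HotSpot these would be Op_... values returned by Node::Opcode().
enum Opcode { Op_Loop, Op_CountedLoop, Op_LongCountedLoop, Op_OuterStripMinedLoop, Op_Region };
enum class CheckResult { VALID, FAILED, NOT_APPLICABLE };

CheckResult check_loop_backedge(Opcode op, bool backedge_shape_ok) {
  switch (op) {
    case Op_CountedLoop:
    case Op_LongCountedLoop:
    case Op_OuterStripMinedLoop:
      // Parallel structure: these loop heads share one backedge check.
      return backedge_shape_ok ? CheckResult::VALID : CheckResult::FAILED;
    case Op_Loop:
      // Assumption: plain loops are not covered by this check (yet).
      return CheckResult::NOT_APPLICABLE;
    default:
      // Anything not listed, e.g. a loop shape added in the future, fails loudly.
      return CheckResult::FAILED;
  }
}
```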

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2332447521

From epeter at openjdk.org  Tue Sep  9 08:25:23 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 08:25:23 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <q3X_EWZRIU-1SVW3m5JilUivZd2tAv8G-IeGilUZpKY=.ce081014-24ce-44f8-9cd3-776930629e97@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
 <q3X_EWZRIU-1SVW3m5JilUivZd2tAv8G-IeGilUZpKY=.ce081014-24ce-44f8-9cd3-776930629e97@github.com>
Message-ID: <rJiYQRIbfT5zCARiBxUQHlMRb8LadmcxInoc5EslZio=.900e813b-a33e-462b-8700-74922d000cb9@github.com>

On Tue, 9 Sep 2025 07:57:08 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> src/hotspot/share/opto/graphInvariants.cpp line 438:
>> 
>>> 436:         ss.print_cr("%s node must have at least one control successors. Found %d.", center->Name(), cfg_out);
>>> 437:         return CheckResult::FAILED;
>>> 438:       }
>> 
>> Is there some upper bound?
>
> I don't think so.

Can you add a comment, why it can be arbitrarily large?
Do you have an example where we have very many ctrl uses?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2332468642

From epeter at openjdk.org  Tue Sep  9 08:25:24 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 08:25:24 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <rJiYQRIbfT5zCARiBxUQHlMRb8LadmcxInoc5EslZio=.900e813b-a33e-462b-8700-74922d000cb9@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
 <q3X_EWZRIU-1SVW3m5JilUivZd2tAv8G-IeGilUZpKY=.ce081014-24ce-44f8-9cd3-776930629e97@github.com>
 <rJiYQRIbfT5zCARiBxUQHlMRb8LadmcxInoc5EslZio=.900e813b-a33e-462b-8700-74922d000cb9@github.com>
Message-ID: <JlGBhH5VrDCRo0FgBh96FPz45d0mXRkyYfcqIHEDwBY=.50195674-df30-491f-b757-d7da4e44c845@github.com>

On Tue, 9 Sep 2025 08:21:28 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I don't think so.
>
> Can you add a comment, why it can be arbitrarily large?
> Do you have an example where we have very many ctrl uses?

Also: are these all supposed to be projections of a specific kind? We could also test for that. You can also add that to a future RFE.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2332472835

From djelinski at openjdk.org  Tue Sep  9 08:28:44 2025
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Tue, 9 Sep 2025 08:28:44 GMT
Subject: RFR: 8366971: C2: Remove unused nop_list from
 PhaseOutput::init_buffer
In-Reply-To: <G8LTw3Itb1nlDmVNaEFLky860NKviRQ5YEY8L8xJwdQ=.9a1d668d-f40d-4982-89f6-24bf999cdece@github.com>
References: <2hBEO9Zpoy2wo_pgTXE9v8KG5u1HNdKp3RgQE-4HYcE=.e86088d1-1e25-49b5-9b3c-c2498ec6ca48@github.com>
 <G8LTw3Itb1nlDmVNaEFLky860NKviRQ5YEY8L8xJwdQ=.9a1d668d-f40d-4982-89f6-24bf999cdece@github.com>
Message-ID: <GyKeQV23cKD4uCnOUw2XdACewnaVVKh8nOZ9jFy7WD8=.23acbe92-7336-4ee2-9c0b-6ac028cde77b@github.com>

On Tue, 9 Sep 2025 08:01:40 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> The nop list has never been used in the history of OpenJDK. Let's clean it up.
>> 
>> Tested with Mach5 tier 1-5, no related failures.
>
> src/hotspot/cpu/ppc/ppc.ad line 4926:
> 
>> 4924:   // Unused, list one so that array generated by adlc is not empty.
>> 4925:   // Aix compiler chokes if _nop_count = 0.
>> 4926:   nops(fxNop);
> 
> There seems to be some justification here why we needed to have the list.
> Can you quickly say why we should not be worried about that now? ;)

I don't have the AIX compiler at hand, but based on the comment I'd guess that the AIX compiler errored out either on [this](https://github.com/openjdk/jdk/blob/b1fa1ecc988fb07f191892a459625c2c8f2de3b5/src/hotspot/share/opto/output.cpp#L1403) or on [this](https://github.com/openjdk/jdk/blob/91f12600d2b188ca98c5c575a34b85f5835399a0/src/hotspot/share/adlc/output_h.cpp#L1122). Both these lines are removed in this PR.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27117#discussion_r2332483194

From roland at openjdk.org  Tue Sep  9 08:35:14 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Tue, 9 Sep 2025 08:35:14 GMT
Subject: RFR: 8361702: C2: assert(is_dominator(compute_early_ctrl(limit,
 limit_ctrl), pre_end)) failed: node pinned on loop exit test? [v6]
In-Reply-To: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com>
References: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com>
Message-ID: <ta7-wzIVMl8i_e9Cdjk8aHxTCiCL_VusR8tkGUtgqko=.7d04d4a2-b81a-4d64-846d-17426b12b0b5@github.com>

> A node in a pre loop only has uses out of the loop dominated by the
> loop exit. `PhaseIdealLoop::try_sink_out_of_loop()` sets its control
> to the loop exit projection. A range check in the main loop has this
> node as input (through a chain of some other nodes). Range check
> elimination needs to update the exit condition of the pre loop with an
> expression that depends on the node pinned on its exit: that's
> impossible and the assert fires. This is a variant of 8314024 (this
> one was for a node with uses out of the pre loop on multiple paths). I
> propose the same fix: leave the node with control in the pre loop in
> this case.

Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision:

 - Merge branch 'master' into JDK-8361702
 - review
 - Merge branch 'master' into JDK-8361702
 - Update src/hotspot/share/opto/loopopts.cpp
   
   Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
 - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE3.java
   
   Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
 - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java
   
   Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
 - Update src/hotspot/share/opto/loopopts.cpp
   
   Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
 - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java
   
   Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
 - tests
 - fix

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26424/files
  - new: https://git.openjdk.org/jdk/pull/26424/files/6da75e9d..b220867d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26424&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26424&range=04-05

  Stats: 34368 lines in 1338 files changed: 21178 ins; 7472 del; 5718 mod
  Patch: https://git.openjdk.org/jdk/pull/26424.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26424/head:pull/26424

PR: https://git.openjdk.org/jdk/pull/26424

From roland at openjdk.org  Tue Sep  9 08:35:18 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Tue, 9 Sep 2025 08:35:18 GMT
Subject: RFR: 8361702: C2: assert(is_dominator(compute_early_ctrl(limit,
 limit_ctrl), pre_end)) failed: node pinned on loop exit test? [v4]
In-Reply-To: <Kb_eKOfe-eEGcKsnF-ff8f7Uf4c2XMf0cvRlx8s1wIY=.7051a4c1-6525-440f-8ec7-fce603faf1f6@github.com>
References: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com>
 <SGrdYRJonE7IeyU3AwADQcdZrKgkZggYb7utb4-vE0o=.1646c8b9-e5b0-4d0e-bb79-4452b115e4f9@github.com>
 <Kb_eKOfe-eEGcKsnF-ff8f7Uf4c2XMf0cvRlx8s1wIY=.7051a4c1-6525-440f-8ec7-fce603faf1f6@github.com>
Message-ID: <chGOlGrPMBpebb61fkzCf-gXaATEvgi5dQpwj5skC5k=.d4ffa04f-39c6-41c0-9bbe-23dc2a39b348@github.com>

On Mon, 28 Jul 2025 06:34:46 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:
>> 
>>  - Merge branch 'master' into JDK-8361702
>>  - Update src/hotspot/share/opto/loopopts.cpp
>>    
>>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>>  - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE3.java
>>    
>>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>>  - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java
>>    
>>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>>  - Update src/hotspot/share/opto/loopopts.cpp
>>    
>>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>>  - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java
>>    
>>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>>  - tests
>>  - fix
>
> Marked as reviewed by chagedorn (Reviewer).

@chhagedorn would you mind re-approving this change now that I added the run without flags and merged with latest?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26424#issuecomment-3269515987

From epeter at openjdk.org  Tue Sep  9 08:35:50 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 08:35:50 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <sYk2XwTRlI1qcJCfFB9-vIbob_OybhJd7dD73KnIrCk=.81140267-24ce-4ff0-a684-63f98fe24278@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
 <sYk2XwTRlI1qcJCfFB9-vIbob_OybhJd7dD73KnIrCk=.81140267-24ce-4ff0-a684-63f98fe24278@github.com>
Message-ID: <sgsDAVatavD0Cpl4DKpP3aYmcXaO1xAzSDpDthjdOY0=.de774f26-67ae-42bb-a9fb-0c81fb3aba5a@github.com>

On Tue, 9 Sep 2025 07:40:23 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> src/hotspot/share/opto/graphInvariants.cpp line 222:
>> 
>>> 220:     }
>>> 221:     return result;
>>> 222:   }
>> 
>> Would this not read better?
>> Suggestion:
>> 
>>     bool success = _pattern->check(center->in(_which_input), state);
>>     if (!success) {
>>       state.trace_failure_path(center, _which_input);
>>     }
>>     return success;
>>   }
>
> I don't think it's terrible, but I don't think it's much better either. If I know the code, and that we want to add the new points to the path, I'll read it as such either way. If I don't know the code, I still know the types of `steps` and `path` and what `push` does, while a custom type with custom methods has a higher learning cost. So to me, it's pretty equivalent.

For me, it was a lot of overhead to find out where `steps`, `path` and `ss` were defined. If I know it is some state, I can quickly go to the definition and see what it is all about.
You can also call the method `state.push_to_paths_and_steps(center, _which_input)`.

>> src/hotspot/share/opto/graphInvariants.cpp line 239:
>> 
>>> 237:     return true;
>>> 238:   }
>>> 239:   bool (Node::*_type_check)() const;
>> 
>> Suggestion:
>> 
>> private:
>>   bool (Node::*_type_check)() const;
>> 
>> I would also suggest that you use a `typedef` here.
>> Something like:
>> `typedef bool (Node::*TypeCheckMethod)() const;`
>> Then you can write
>> Suggestion:
>> 
>> public:
>>   const TypeCheckMethod _type_check;
>
> Again, is it really better? If I know the code, I know what it needs, however I express it. If I don't know the code, looking at the signature won't be enough; I'll need to look one level deeper, at the definition. Not sure it's a win.

I think it is a matter of taste. I don't personally like the C++ way of expressing pointer types. But I can get used to it.
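For readers less used to C++ pointer-to-member syntax, here is a small self-contained sketch of the `typedef` suggested above. The `Node` class is a toy stand-in, and the `NodeClassIs` pattern with its `check` method is hypothetical, not the code in `graphInvariants.cpp`.

```cpp
#include <cassert>

// Toy stand-in for HotSpot's Node with two type-check predicates.
class Node {
 public:
  explicit Node(int kind) : _kind(kind) {}
  bool is_Proj() const { return _kind == 1; }
  bool is_Region() const { return _kind == 2; }
 private:
  int _kind;
};

// One name for "pointer to a const, argument-less Node member function returning bool".
typedef bool (Node::*TypeCheckMethod)() const;

// Hypothetical pattern storing its type check through the typedef.
class NodeClassIs {
 public:
  explicit NodeClassIs(TypeCheckMethod type_check) : _type_check(type_check) {}
  // Invoke the stored member function on n; note the ->* operator and the parentheses.
  bool check(const Node* n) const { return (n->*_type_check)(); }
 private:
  const TypeCheckMethod _type_check;
};
```

With the typedef, the member declaration reads like any other typed field; the `->*` call site is unchanged either way.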

>> src/hotspot/share/opto/graphInvariants.hpp line 73:
>> 
>>> 71:    * In addition, if the check fails, it must write its error message in [ss].
>>> 72:    *
>>> 73:    * If the check succeeds or is not applicable, [steps], [path] and [ss] must be untouched.
>> 
>> I wonder if we should not have some object that represents these 3 args. You pass them everywhere, and they seem to be a unit. And they have invariants that we may want to check.
>> You could, for example, enforce that steps and path are in sync simply by providing only the access methods that preserve it.
>> What do you think?
>
> `steps` and `path` can make sense. I don't think it makes sense for `ss`, because we just fill it from `steps` and `path` at some point; it doesn't really evolve with them. If you like it, I won't fight it, but is it worth it? It means more ad-hoc types to be aware of, in exchange for simplifying the code a little: real benefits, but not big ones imo.

Let's ask @chhagedorn . He might have a good idea too here ;)
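The bundling idea discussed here could look roughly like the following. This is a hypothetical sketch: `PatternState`, `describe()` and the toy `Node` are invented for illustration, the mutator name reuses the `trace_failure_path` name from the earlier suggestion, and `ss` is deliberately left out, following the point that it is only filled from `steps` and `path` at the end.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a graph node: an index and a class name.
struct Node {
  int idx;
  const char* name;
};

// Hypothetical state object for what is currently passed around as [steps] and [path].
// path[i] is a node on the failing path, steps[i] the input index followed from it.
class PatternState {
 public:
  // The only mutator: both vectors always grow together, so they cannot get out of sync.
  void trace_failure_path(const Node* center, int which_input) {
    _path.push_back(center);
    _steps.push_back(which_input);
    assert(_path.size() == _steps.size());
  }

  bool empty() const { return _path.empty(); }

  // Failures are recorded while the recursion unwinds, so the outermost node
  // was pushed last; print from the back to get outermost-first order.
  std::string describe() const {
    std::string out;
    for (std::size_t i = _path.size(); i-- > 0;) {
      out += std::string(_path[i]->name) + "[" + std::to_string(_path[i]->idx) + "]";
      out += " -in(" + std::to_string(_steps[i]) + ")-> ";
    }
    return out;
  }

 private:
  std::vector<const Node*> _path;
  std::vector<int> _steps;
};
```

Because `trace_failure_path` is the only way to extend the trace, `steps.size() == path.size()` holds by construction rather than by convention.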

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2332506238
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2332490006
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2332509836

From epeter at openjdk.org  Tue Sep  9 08:35:51 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 08:35:51 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <k0FHgFPoG8gM3kEtbMGLfNap6yApYM_fJe3EdxqC1Q8=.1f42f626-dde4-45a3-b756-ca782be6e436@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
 <r0rzQ5BGSRpGVAiK5E9zAZIYkU3gTPa7KjBATdETP6U=.5bff0999-2bfd-4f6e-9044-fcfb74e5e00d@github.com>
 <k0FHgFPoG8gM3kEtbMGLfNap6yApYM_fJe3EdxqC1Q8=.1f42f626-dde4-45a3-b756-ca782be6e436@github.com>
Message-ID: <8F_IhYAZ2XxKl9SzWYNYkGvXzKEj1rl8GsRFrORBWaE=.4bd4bd61-f01d-480b-86b1-e65bbf61b065@github.com>

On Tue, 9 Sep 2025 07:45:18 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> Or we pass a string: not nice, but it would work with the macro for `NodeClassIsAndBind`. Not sure what's best here.
>
> I thought about that, and I think the current situation is ok. The pattern is not something highly mutable; it's mostly hardcoded. I don't think it's hard to figure out what you're expecting. I'm very reluctant to add ugliness to the patterns, which must stay readable so that they are easy for a human to verify. It could be solved with more templates, though.

Once we have more complex patterns, will it really be that easy to see what was expected?
All you will see is what we actually got. You are already all about good reporting, so I just noticed a hole here.
You know the code better, so I'll leave it up to you in the end ;)
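A rough sketch of the "pass a string" idea, assuming the pattern is built through a macro similar to the one used for `NodeClassIsAndBind`. All names here (`NodeClassIs`, `NODE_CLASS_IS`, the toy `Node`) are hypothetical. The point is that the preprocessor can produce both the predicate and its printable name from a single token, so the pattern definitions read the same as before while failures can report what was expected.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for Node with two type-check predicates.
struct Node {
  int kind;
  bool is_Proj() const { return kind == 1; }
  bool is_Region() const { return kind == 2; }
};

typedef bool (Node::*TypeCheckMethod)() const;

// Hypothetical pattern that also remembers what it expects, so a failure
// can say "expected Region" instead of only printing the node it found.
class NodeClassIs {
 public:
  NodeClassIs(TypeCheckMethod check, const char* expected)
      : _check(check), _expected(expected) {}

  bool check(const Node& n, std::string& error) const {
    if ((n.*_check)()) {
      return true;
    }
    error = std::string("expected ") + _expected;
    return false;
  }

 private:
  TypeCheckMethod _check;
  const char* _expected;
};

// The class name is written once: ## builds the predicate name, # builds the string.
#define NODE_CLASS_IS(Class) NodeClassIs(&Node::is_##Class, #Class)
```

The cost is one extra pointer per pattern object; the pattern text itself does not get any uglier.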

>> One advantage is definitely reporting. And it is still reasonably debuggable, I think; my solution may be a little trickier that way.
>> 
>> I think there are multiple factors:
>> - Simple: fewer abstractions can be easier to read/debug.
>> - Concise: few lines of code.
>> - Reporting: nice output when rules fail.
>
> I could have written this without patterns at all, but I also wanted more examples of patterns used at different levels of complexity. Writing it without patterns at all would be pretty similar to me.
> 
> I think the boilerplate has to exist somewhere. It's not nice to read, it's long, but it's (actually) simple. If we hide it somewhere, it's nicer to read and looks easier to understand, but it's harder to actually understand when something goes wrong. No strong opinion.

Yes, these are the trade-offs. Maybe we can discuss in the office, and pull in some others to discuss the pros and cons. Because if we are going to use Patterns more in other places, we should not shy away from doing some design brainstorming together. I really appreciate the new approach, and I can see a lot of benefits, including for IGVN.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2332497359
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2332483756

From epeter at openjdk.org  Tue Sep  9 08:38:32 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 08:38:32 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <zJ_64S_33_XM0PJXxKU5cVJKeawayMUaWU7E0iBKkKw=.d046b0de-1772-4904-916a-cfca5034f634@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
 <_a6JVBA326t8l1U3ZI8C-J3Ju5jm-RklBFGtnR7fbyY=.70638135-7577-44dc-a212-fe5e39b1f5fa@github.com>
 <RF4t87nYZFpJ461_rsT41aOIRigwmG6leniy4j9-QaA=.3e079800-bb8c-4e54-8a60-a060c30b5796@github.com>
 <GdM72hQe1NvODLC6vcGtXrL5GnMA2c6IsRcdVW3z6r8=.740386db-28e4-46b7-a321-2218dfe2d846@github.com>
 <zJ_64S_33_XM0PJXxKU5cVJKeawayMUaWU7E0iBKkKw=.d046b0de-1772-4904-916a-cfca5034f634@github.com>
Message-ID: <Op0K9v60ajIqvDAbyLxf7vvLtHrSaJgAjCIMzK_6WGE=.0106823b-a76c-44cf-b93b-6ab8ef700d77@github.com>

On Mon, 8 Sep 2025 16:20:20 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> Yes, the subtraction is consistent, because if the register mask is offset, we can no longer use the OptoReg to directly index the mask. Small simplified example: register mask with 6 bits, offset by 10. First bit (index 0) represents OptoReg 10, second bit (index 1) represents OptoReg 11, etc. If we call `Member(15)`, we need to subtract the offset so we look at the correct index in the register mask (index 5).
>
> Ah, I think I now better understand your question. `rm_up` is a low-level method for internal use in `regmask.hpp` and `regmask.cpp` only (perhaps I should prepend it with an underscore?). It basically makes it so that we can regard the backing storage (`_RM_UP` and `_RM_UP_EXT`) as one contiguous array. `Member` is exposed externally and so needs the offset logic.

Makes sense. Maybe we can make that a bit more clear in the renaming.
Maybe we can make a clear distinction between the two mappings somehow?
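One way to make the two mappings visibly distinct is by naming, as in this toy model of an offset mask. `OffsetRegMask`, `member` and `bit_at` are hypothetical, and the real mask's split backing storage (`_RM_UP` and `_RM_UP_EXT`) is ignored here; only the register-number versus bit-index distinction is modeled.

```cpp
#include <cassert>
#include <vector>

// Toy register mask: bit i stands for register number (offset + i).
// Two index spaces: external register numbers and internal bit indices.
class OffsetRegMask {
 public:
  OffsetRegMask(int offset, int size) : _offset(offset), _bits(size, false) {}

  // External API: takes a register number and translates it to a bit index.
  bool member(int reg) const {
    int i = reg - _offset;
    return i >= 0 && i < static_cast<int>(_bits.size()) && _bits[i];
  }

  void insert(int reg) {
    int i = reg - _offset;
    assert(i >= 0 && i < static_cast<int>(_bits.size()));
    _bits[i] = true;
  }

  // Internal accessor: takes a raw bit index with no offset applied
  // (the role described for `rm_up` in this thread).
  bool bit_at(int i) const { return _bits[i]; }

 private:
  int _offset;
  std::vector<bool> _bits;
};
```

With an offset of 10, register 15 lives at bit index 5, so `member(15)` and `bit_at(5)` look at the same bit while `member(5)` is simply out of range.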

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2332518245

From roland at openjdk.org  Tue Sep  9 08:39:37 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Tue, 9 Sep 2025 08:39:37 GMT
Subject: RFR: 8361702: C2: assert(is_dominator(compute_early_ctrl(limit,
 limit_ctrl), pre_end)) failed: node pinned on loop exit test? [v7]
In-Reply-To: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com>
References: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com>
Message-ID: <BfbDzTNQbwqUh0pXFbXiFy09JPGfbLPVmUTx-YEe1KM=.cad33852-809f-4907-a41d-628d0d0db07e@github.com>

> A node in a pre loop only has uses out of the loop dominated by the
> loop exit. `PhaseIdealLoop::try_sink_out_of_loop()` sets its control
> to the loop exit projection. A range check in the main loop has this
> node as input (through a chain of some other nodes). Range check
> elimination needs to update the exit condition of the pre loop with an
> expression that depends on the node pinned on its exit: that's
> impossible and the assert fires. This is a variant of 8314024 (this
> one was for a node with uses out of the pre loop on multiple paths). I
> propose the same fix: leave the node with control in the pre loop in
> this case.

Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision:

 - Merge branch 'master' into JDK-8361702
 - Merge branch 'master' into JDK-8361702
 - review
 - Merge branch 'master' into JDK-8361702
 - Update src/hotspot/share/opto/loopopts.cpp
   
   Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
 - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE3.java
   
   Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
 - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java
   
   Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
 - Update src/hotspot/share/opto/loopopts.cpp
   
   Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
 - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java
   
   Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
 - tests
 - ... and 1 more: https://git.openjdk.org/jdk/compare/e3d13e64...91a7d73c

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26424/files
  - new: https://git.openjdk.org/jdk/pull/26424/files/b220867d..91a7d73c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26424&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26424&range=05-06

  Stats: 228 lines in 12 files changed: 43 ins; 163 del; 22 mod
  Patch: https://git.openjdk.org/jdk/pull/26424.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26424/head:pull/26424

PR: https://git.openjdk.org/jdk/pull/26424

From epeter at openjdk.org  Tue Sep  9 08:43:32 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 08:43:32 GMT
Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero
 nodes more precise [v9]
In-Reply-To: <dA3mvVbfZcBhR9Yi6HKk9s_7UZ76kI1CkReQFbyDZms=.cc99b4a1-dff5-4202-8936-86301a41e766@github.com>
References: <Iv0Ou9LLsRec6RaUaWA4pC7Ds7Hu_KXQbalxo71v8iM=.03c2733d-eaeb-4fec-a85b-cef252aa8c68@github.com>
 <py_UgbCQ3Y7BlN3tQkylQSISyMJ3zHa3VoDP7VK83jY=.e71f52c0-ef80-4f7e-afe8-0e60d33cb785@github.com>
 <dA3mvVbfZcBhR9Yi6HKk9s_7UZ76kI1CkReQFbyDZms=.cc99b4a1-dff5-4202-8936-86301a41e766@github.com>
Message-ID: <9xCpJGY6CFKPAt4VtDY23_Tr3SE9tUebdMF3pAYWhFA=.281e0b84-bfad-466b-b290-918cf1fa83d1@github.com>

On Fri, 8 Aug 2025 08:21:56 GMT, Qizheng Xing <qxing at openjdk.org> wrote:

>> Qizheng Xing has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Add microbench
>>  - Add missing test method declarations
>
> Hi @jatin-bhateja, I've added a micro benchmark that includes the `numberOfNibbles` implementation from this PR description and your micro kernel.
> 
> Here's my test results on an Intel(R) Xeon(R) Platinum:
> 
> 
> # Baseline:
> Benchmark                                  Mode  Cnt     Score   Error  Units
> CountLeadingZeros.benchClzLongConstrained  avgt   15  1517.888 ± 5.691  ns/op
> CountLeadingZeros.benchNumberOfNibbles     avgt   15  1094.422 ± 1.753  ns/op
> 
> # This patch:
> Benchmark                                  Mode  Cnt    Score   Error  Units
> CountLeadingZeros.benchClzLongConstrained  avgt   15    0.948 ± 0.002  ns/op
> CountLeadingZeros.benchNumberOfNibbles     avgt   15  942.438 ± 1.742  ns/op

@MaxXSoft Feel free to just ping me again when you want another review :)
FYI: I'll be on a longer vacation starting in about a week, so don't expect me to respond then.
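As a rough illustration of what "a more precise type" can mean for these nodes, here is a sketch that bounds count-leading-zeros of a 64-bit input when only the input's signed range is known. `clz64`, `IntRange` and `clz_range` are hypothetical helpers; the PR's actual type computation may use more information and differs in detail.

```cpp
#include <cassert>
#include <cstdint>

// Portable count-leading-zeros for 64-bit values; clz64(0) == 64.
int clz64(std::uint64_t x) {
  int n = 0;
  for (std::uint64_t bit = std::uint64_t(1) << 63; bit != 0 && (x & bit) == 0; bit >>= 1) {
    n++;
  }
  return n;
}

struct IntRange {
  int lo;
  int hi;
};

// Bounds on clz(x) when x is only known to lie in the signed range [lo, hi].
IntRange clz_range(std::int64_t lo, std::int64_t hi) {
  assert(lo <= hi);
  if (hi < 0) {
    return {0, 0};   // every value has the sign bit set
  }
  if (lo < 0) {
    return {0, 64};  // negative values give 0, zero gives 64
  }
  // For non-negative values, clz never increases as the value grows.
  return {clz64(static_cast<std::uint64_t>(hi)), clz64(static_cast<std::uint64_t>(lo))};
}
```

For an input in `[0, 255]` the result is bounded to `[56, 64]` instead of `[0, 64]`, and for `[16, 31]` it collapses to the single value 59.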

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25928#issuecomment-3269553729

From epeter at openjdk.org  Tue Sep  9 08:45:30 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 08:45:30 GMT
Subject: RFR: 8366588: VectorAPI: Re-intrinsify VectorMask.laneIsSet where
 the input index is a variable
In-Reply-To: <XaokWZpc-AL2hWWGvsmDczO1u4o5uGEaCSgPjSyxkz4=.3251a974-6374-48c8-a8e4-88914b730505@github.com>
References: <XaokWZpc-AL2hWWGvsmDczO1u4o5uGEaCSgPjSyxkz4=.3251a974-6374-48c8-a8e4-88914b730505@github.com>
Message-ID: <FxG5R2UVA9XAmC3t6QNyvO8wWsnNk82E9C-SwfqBNbk=.f3c0015a-6718-4aa7-a246-7b1b1f6739ef@github.com>

On Fri, 5 Sep 2025 08:13:28 GMT, erifan <duke at openjdk.org> wrote:

> Intrinsic support for `VectorMask.laneIsSet` with a **variable** input index was introduced in PR #14200, but was inadvertently broken by PR #25673. This PR restores the intrinsic functionality and adds some JTReg tests.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE:
> 
> Benchmark                        Unit    Before       Error       After        Error       Uplift
> microMaskLaneIsSetByte128_var    ops/ms  21702.14415  91.902159   103472.9391  36.057447   4.767867
> microMaskLaneIsSetByte64_var     ops/ms  21468.51868  107.94177   103365.6561  69.47736    4.814754
> microMaskLaneIsSetDouble128_var  ops/ms  77489.32791  153.242699  413499.4127  311.854079  5.336211
> microMaskLaneIsSetFloat128_var   ops/ms  41034.95204  399.421823  206840.0988  74.702234   5.040583
> microMaskLaneIsSetFloat64_var    ops/ms  77607.40268  175.938921  413745.3001  149.716794  5.33126
> microMaskLaneIsSetInt128_var     ops/ms  41452.48893  76.143208   206845.9754  59.371129   4.989953
> microMaskLaneIsSetInt64_var      ops/ms  77726.2542   173.180518  413427.8838  363.575023  5.319024
> microMaskLaneIsSetLong128_var    ops/ms  77646.11218  177.496587  413403.4404  236.609314  5.3242
> microMaskLaneIsSetShort128_var   ops/ms  21374.93265  48.13101    103417.4618  34.827021   4.838259
> microMaskLaneIsSetShort64_var    ops/ms  41066.19395  353.320621  206801.109   106.408938  5.035799
> 
> Benchmarks on Intel 6444y machine with 512-bit avx3:
> 
> Benchmark                        Unit    Before       Error       After        Error       Uplift
> microMaskLaneIsSetByte128_var    ops/ms  57658.45497  240.209309  211643.8406  29.214532   3.670647
> microMaskLaneIsSetByte256_var    ops/ms  57451.68169  116.994128  211609.4652  160.48513   3.683259
> microMaskLaneIsSetByte512_var    ops/ms  57530.22411  311.63868   199802.8084  408.144015  3.473005
> microMaskLaneIsSetByte64_var     ops/ms  57642.2672   161.406221  205252.4464  196.86852   3.560797
> microMaskLaneIsSetDouble256_var  ops/ms  114401.3789  231.797375  361400.344   565.593984  3.159055
> microMaskLaneIsSetDouble512_var  ops/ms  57379.27882  159.699503  211476.1138  136.980026  3.685583
> microMaskLaneIsSetFloat128_var   ops/ms  113943.9512  141.062663  360855.3915  494.471996  3.166955
> microMaskLaneIsSetFloat256_var   ops/ms  57682.78182  138.142053  211659.5098  30.167972   3.66937
> microMaskLaneIsSetFloat512_var   ops/ms  57617.66405  301.748599  211246.8588  597.18949   3.666355
> microMaskLaneIsSetInt128_var     ops/ms  113914.5062  118.681382  360856.4465  555.097397  3.167783
> microMaskLaneIsSetInt256_var     ops/ms  57681.79883  112.391639  211555.6742  217.556981  3.667633
> microMaskLaneIsSetInt512_var     ops/ms  57350.20346  206.146723  211657.7207  68.461571   3.690618
> microMaskLane...

@erifan This is a regression / bug fix for https://github.com/openjdk/jdk/pull/25673, right? If so, please convert the JBS issue into a bug.
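For context on what the variable-index case has to compute, here is a toy scalar model of `VectorMask.laneIsSet(i)` on a mask packed into a 64-bit word. `PackedMask` and `lane_is_set` are hypothetical. The Java method is specified to reject out-of-range indices, which the sketch only models with an `assert`, and on SVE or AVX-512 the mask may live in a predicate or mask register rather than a general-purpose word.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a mask with up to 64 lanes packed into one word:
// bit i is set exactly when lane i is set.
struct PackedMask {
  std::uint64_t bits;
  int length;
};

// Lane test with a runtime index: check the index,
// then shift the word right by i and test the lowest bit.
bool lane_is_set(const PackedMask& m, int i) {
  assert(i >= 0 && i < m.length);
  return ((m.bits >> i) & 1) != 0;
}
```

Unlike a constant index, a variable index cannot be folded into a fixed bit position at compile time, which is why it needs its own handling in the intrinsic.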

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27113#issuecomment-3269564806

From epeter at openjdk.org  Tue Sep  9 08:51:21 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 08:51:21 GMT
Subject: RFR: 8366588: VectorAPI: Re-intrinsify VectorMask.laneIsSet where
 the input index is a variable
In-Reply-To: <XaokWZpc-AL2hWWGvsmDczO1u4o5uGEaCSgPjSyxkz4=.3251a974-6374-48c8-a8e4-88914b730505@github.com>
References: <XaokWZpc-AL2hWWGvsmDczO1u4o5uGEaCSgPjSyxkz4=.3251a974-6374-48c8-a8e4-88914b730505@github.com>
Message-ID: <gQGD4OgDoIK3W6iFkrD7ib0ZYu7cM3FSB6_5-4uMT6k=.e4c341da-1123-4356-a9f9-632d91d903ce@github.com>

On Fri, 5 Sep 2025 08:13:28 GMT, erifan <duke at openjdk.org> wrote:

> Intrinsic support for `VectorMask.laneIsSet` with a **variable** input index was introduced in PR #14200, but was inadvertently broken by PR #25673. This PR restores the intrinsic functionality and adds some JTReg tests.

The patch looks reasonable, thanks for fixing this and writing an IR test!
I'm launching some internal testing now, should hopefully not take much more than 24h.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27113#issuecomment-3269585613

From epeter at openjdk.org  Tue Sep  9 08:56:40 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 08:56:40 GMT
Subject: RFR: 8366875: CompileTaskTimeout should be reset for each
 iteration of RepeatCompilation
In-Reply-To: <4TbOkAMu-KU_tgQPg1sK0L8oto_0nD4mQo7yc0hJPm4=.8d87b900-a614-4c13-a4c6-6fe11e206482@github.com>
References: <4TbOkAMu-KU_tgQPg1sK0L8oto_0nD4mQo7yc0hJPm4=.8d87b900-a614-4c13-a4c6-6fe11e206482@github.com>
Message-ID: <mb7qlfOHy2o6D1Qrz5IcKEzO6XfyNvVrSaIRKLbZiAc=.0478c423-72c5-4c3c-8d5c-39819f1b6866@github.com>

On Fri, 5 Sep 2025 15:27:22 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

> When running a debug JVM on Linux with a compile task timeout and repeated compilation, the execution almost always times out because the timeout is not reset between repetitions of a compilation. The purpose of the compile task timeout is to limit the time a single compilation can take. Thus, this PR resets the `CompileTaskTimeout` for every compilation when running with `-XX:RepeatCompilation=<n>` for n > 1.
> 
> This PR is stacked on top of #27094.
> 
> Testing:
>  - [x] Github Actions (failures are unrelated)
>  - [x] tier1, tier2, tier3 plus some additional internal testing

Looks reasonable :)
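A toy model of the timing problem described in the PR: with one deadline shared by all repetitions, later repetitions overrun even though each compilation alone is well within budget. `count_timeouts` and its logical clock (measured in ticks so the result is deterministic) are hypothetical and say nothing about how HotSpot implements `CompileTaskTimeout`.

```cpp
#include <cassert>

// Each of `repeat` compilations costs `cost` ticks on a logical clock.
// Returns how many of them end past their deadline.
int count_timeouts(int repeat, long cost, long budget, bool reset_per_iteration) {
  assert(repeat >= 1 && cost >= 0 && budget > 0);
  long now = 0;
  long deadline = now + budget;      // one deadline for the whole task...
  int timeouts = 0;
  for (int i = 0; i < repeat; i++) {
    if (reset_per_iteration) {
      deadline = now + budget;       // ...or a fresh one per repetition
    }
    now += cost;                     // "compile"
    if (now > deadline) {
      timeouts++;
    }
  }
  return timeouts;
}
```

With five repetitions costing 20 ticks each and a 50-tick budget, the shared deadline flags the last three repetitions, while resetting per repetition flags none.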

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27120#pullrequestreview-3200195978

From epeter at openjdk.org  Tue Sep  9 09:02:00 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 09:02:00 GMT
Subject: RFR: 8366971: C2: Remove unused nop_list from
 PhaseOutput::init_buffer
In-Reply-To: <2hBEO9Zpoy2wo_pgTXE9v8KG5u1HNdKp3RgQE-4HYcE=.e86088d1-1e25-49b5-9b3c-c2498ec6ca48@github.com>
References: <2hBEO9Zpoy2wo_pgTXE9v8KG5u1HNdKp3RgQE-4HYcE=.e86088d1-1e25-49b5-9b3c-c2498ec6ca48@github.com>
Message-ID: <HegklSTA_wQupfMv-AG7oA04JsaceJhF3ugV_glnLdY=.bf7d54f3-22a0-4b8c-bef3-30ee8d38fa9b@github.com>

On Fri, 5 Sep 2025 13:02:00 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

> The nop list has never been used in the history of OpenJDK. Let's clean it up.
> 
> Tested with Mach5 tier 1-5, no related failures.

Approved.
(assuming you run the additional stress testing I asked for over Slack)

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27117#pullrequestreview-3200229837

From epeter at openjdk.org  Tue Sep  9 09:02:03 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 09:02:03 GMT
Subject: RFR: 8366971: C2: Remove unused nop_list from
 PhaseOutput::init_buffer
In-Reply-To: <GyKeQV23cKD4uCnOUw2XdACewnaVVKh8nOZ9jFy7WD8=.23acbe92-7336-4ee2-9c0b-6ac028cde77b@github.com>
References: <2hBEO9Zpoy2wo_pgTXE9v8KG5u1HNdKp3RgQE-4HYcE=.e86088d1-1e25-49b5-9b3c-c2498ec6ca48@github.com>
 <G8LTw3Itb1nlDmVNaEFLky860NKviRQ5YEY8L8xJwdQ=.9a1d668d-f40d-4982-89f6-24bf999cdece@github.com>
 <GyKeQV23cKD4uCnOUw2XdACewnaVVKh8nOZ9jFy7WD8=.23acbe92-7336-4ee2-9c0b-6ac028cde77b@github.com>
Message-ID: <X9giKsA1uwfPnOd65buu_wgjK1nmLzaeAYtIrdGiPSY=.e7412b1f-941e-402f-865e-60f5bf1efe69@github.com>

On Tue, 9 Sep 2025 08:25:41 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

>> src/hotspot/cpu/ppc/ppc.ad line 4926:
>> 
>>> 4924:   // Unused, list one so that array generated by adlc is not empty.
>>> 4925:   // Aix compiler chokes if _nop_count = 0.
>>> 4926:   nops(fxNop);
>> 
>> There seems to be some justification here why we needed to have the list.
>> Can you quickly say why we should not be worried about that now? ;)
>
> I don't have the AIX compiler at hand, but based on the comment I'd guess that the AIX compiler errored out either on [this](https://github.com/openjdk/jdk/blob/b1fa1ecc988fb07f191892a459625c2c8f2de3b5/src/hotspot/share/opto/output.cpp#L1403) or on [this](https://github.com/openjdk/jdk/blob/91f12600d2b188ca98c5c575a34b85f5835399a0/src/hotspot/share/adlc/output_h.cpp#L1122). Both these lines are removed in this PR.

Ok, sounds good!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27117#discussion_r2332606173

From epeter at openjdk.org  Tue Sep  9 09:14:05 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 09:14:05 GMT
Subject: RFR: 8366984: Remove delay slot support
In-Reply-To: <msZk11XqMQy93X0WkadtY0MP-sB74iIsygq2iXqFc_I=.e1c48e06-4b7f-4201-98d0-09ec2ecab91e@github.com>
References: <msZk11XqMQy93X0WkadtY0MP-sB74iIsygq2iXqFc_I=.e1c48e06-4b7f-4201-98d0-09ec2ecab91e@github.com>
Message-ID: <d156dQjZQs-9T_5Q-77vP54YAjv8jDOKla1d-fHhlns=.f5ec3359-8d09-4c68-90ce-59529e1f1ff2@github.com>

On Fri, 5 Sep 2025 14:24:50 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

> SPARC was the only supported architecture that uses a delay slot. The SPARC port was removed in JDK 15, and the code is effectively dead. Let's remove it.
> 
> The changes are no-op on all architectures that do not use delay slots. I still tested tier 1-5 on mach5, no related failures.

Looks reasonable, thanks for doing the cleanup! I have 2 minor questions though.
(please also run additional stress testing, see Slack)

src/hotspot/cpu/arm/arm.ad line 3383:

> 3381:     BR     : R;
> 3382: %}
> 3383: 

Where was this used? Or is it an unrelated cleanup?

src/hotspot/share/adlc/adlparse.cpp line 1394:

> 1392:           parse_err(SYNERR, "Using obsolete token, branch_has_delay_slot");
> 1393:           break;
> 1394:         }

I'm curious: why do you add that special warning? It would fail later anyway, right? Are we expecting anyone to parse things produced by different versions?

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27119#pullrequestreview-3200246647
PR Review Comment: https://git.openjdk.org/jdk/pull/27119#discussion_r2332626258
PR Review Comment: https://git.openjdk.org/jdk/pull/27119#discussion_r2332620923

From epeter at openjdk.org  Tue Sep  9 09:17:14 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 09:17:14 GMT
Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI
 expand operation
In-Reply-To: <tHD8aWJ_d1GaBqE6Sw7Ip_Yt_Y2y6m6OhaKj0e1mq7U=.8b515a36-eb3f-47bc-9d1d-861b68d32c6d@github.com>
References: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
 <tmEj88Ez4DURxmS7pPm8t1lhRct4fHlQcBySljEu-tg=.a18e80cf-7711-4ac6-990f-4c630b90f98b@github.com>
 <tHD8aWJ_d1GaBqE6Sw7Ip_Yt_Y2y6m6OhaKj0e1mq7U=.8b515a36-eb3f-47bc-9d1d-861b68d32c6d@github.com>
Message-ID: <Zj6KDeuI8lUiJwjWzj4623OqS3Egmlp_PNZvWb0W-ww=.db6e064b-8519-4ca7-bc0d-81e7841e76dc@github.com>

On Wed, 3 Sep 2025 10:11:38 GMT, erifan <duke at openjdk.org> wrote:

>> The algorithm description here is great. Please paste all of it from "Since there are" to "but with different instructions where appropriate." into this PR, before the vector expand implementation.
>
> @theRealAph @e1iu @XiaohongGong @fg1417 @shqking, could you help take a look at this PR, thanks~

@erifan Feel free to ping me again if I should re-review. I'm going on vacation in a week, so I'll be unresponsive for a while (feel free to contact other reviewers, especially for additional testing).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26740#issuecomment-3269684547

From xgong at openjdk.org  Tue Sep  9 09:17:14 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Tue, 9 Sep 2025 09:17:14 GMT
Subject: RFR: 8366588: VectorAPI: Re-intrinsify VectorMask.laneIsSet where
 the input index is a variable
In-Reply-To: <XaokWZpc-AL2hWWGvsmDczO1u4o5uGEaCSgPjSyxkz4=.3251a974-6374-48c8-a8e4-88914b730505@github.com>
References: <XaokWZpc-AL2hWWGvsmDczO1u4o5uGEaCSgPjSyxkz4=.3251a974-6374-48c8-a8e4-88914b730505@github.com>
Message-ID: <BeU0nih59-PfsS8GJ9H7dmUDelXaOZ7RaKKqgF9tQMU=.425f90aa-f94c-48a1-b164-19965568cdbb@github.com>

On Fri, 5 Sep 2025 08:13:28 GMT, erifan <duke at openjdk.org> wrote:

> Intrinsic support for `VectorMask.laneIsSet` with a **variable** input index was introduced in PR #14200, but was inadvertently broken by PR #25673. This PR restores the intrinsic functionality and adds some JTReg tests.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE:
> 
> | Benchmark | Unit | Before | Error | After | Error | Uplift (After/Before) |
> |---|---|---|---|---|---|---|
> | microMaskLaneIsSetByte128_var | ops/ms | 21702.14415 | 91.902159 | 103472.9391 | 36.057447 | 4.767867 |
> | microMaskLaneIsSetByte64_var | ops/ms | 21468.51868 | 107.94177 | 103365.6561 | 69.47736 | 4.814754 |
> | microMaskLaneIsSetDouble128_var | ops/ms | 77489.32791 | 153.242699 | 413499.4127 | 311.854079 | 5.336211 |
> | microMaskLaneIsSetFloat128_var | ops/ms | 41034.95204 | 399.421823 | 206840.0988 | 74.702234 | 5.040583 |
> | microMaskLaneIsSetFloat64_var | ops/ms | 77607.40268 | 175.938921 | 413745.3001 | 149.716794 | 5.33126 |
> | microMaskLaneIsSetInt128_var | ops/ms | 41452.48893 | 76.143208 | 206845.9754 | 59.371129 | 4.989953 |
> | microMaskLaneIsSetInt64_var | ops/ms | 77726.2542 | 173.180518 | 413427.8838 | 363.575023 | 5.319024 |
> | microMaskLaneIsSetLong128_var | ops/ms | 77646.11218 | 177.496587 | 413403.4404 | 236.609314 | 5.3242 |
> | microMaskLaneIsSetShort128_var | ops/ms | 21374.93265 | 48.13101 | 103417.4618 | 34.827021 | 4.838259 |
> | microMaskLaneIsSetShort64_var | ops/ms | 41066.19395 | 353.320621 | 206801.109 | 106.408938 | 5.035799 |
> microMaskLaneIsSetShort64_var	ops/ms	41066.19395	353.320621	206801.109	106.408938	5.035799
> 
> 
> Benchmarks on Intel 6444y machine with 512-bit avx3:
> 
> | Benchmark | Unit | Before | Error | After | Error | Uplift (After/Before) |
> |---|---|---|---|---|---|---|
> | microMaskLaneIsSetByte128_var | ops/ms | 57658.45497 | 240.209309 | 211643.8406 | 29.214532 | 3.670647 |
> | microMaskLaneIsSetByte256_var | ops/ms | 57451.68169 | 116.994128 | 211609.4652 | 160.48513 | 3.683259 |
> | microMaskLaneIsSetByte512_var | ops/ms | 57530.22411 | 311.63868 | 199802.8084 | 408.144015 | 3.473005 |
> | microMaskLaneIsSetByte64_var | ops/ms | 57642.2672 | 161.406221 | 205252.4464 | 196.86852 | 3.560797 |
> | microMaskLaneIsSetDouble256_var | ops/ms | 114401.3789 | 231.797375 | 361400.344 | 565.593984 | 3.159055 |
> | microMaskLaneIsSetDouble512_var | ops/ms | 57379.27882 | 159.699503 | 211476.1138 | 136.980026 | 3.685583 |
> | microMaskLaneIsSetFloat128_var | ops/ms | 113943.9512 | 141.062663 | 360855.3915 | 494.471996 | 3.166955 |
> | microMaskLaneIsSetFloat256_var | ops/ms | 57682.78182 | 138.142053 | 211659.5098 | 30.167972 | 3.66937 |
> | microMaskLaneIsSetFloat512_var | ops/ms | 57617.66405 | 301.748599 | 211246.8588 | 597.18949 | 3.666355 |
> | microMaskLaneIsSetInt128_var | ops/ms | 113914.5062 | 118.681382 | 360856.4465 | 555.097397 | 3.167783 |
> | microMaskLaneIsSetInt256_var | ops/ms | 57681.79883 | 112.391639 | 211555.6742 | 217.556981 | 3.667633 |
> | microMaskLaneIsSetInt512_var | ops/ms | 57350.20346 | 206.146723 | 211657.7207 | 68.461571 | 3.690618 |
> microMaskLane...

> @erifan This is a regression / bug fix for #25673, right? If so, please convert the JBS issue into a bug.

Thanks for your review! I've changed the JBS type to bug.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27113#issuecomment-3269685052

From djelinski at openjdk.org  Tue Sep  9 09:18:21 2025
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Tue, 9 Sep 2025 09:18:21 GMT
Subject: RFR: 8366984: Remove delay slot support
In-Reply-To: <d156dQjZQs-9T_5Q-77vP54YAjv8jDOKla1d-fHhlns=.f5ec3359-8d09-4c68-90ce-59529e1f1ff2@github.com>
References: <msZk11XqMQy93X0WkadtY0MP-sB74iIsygq2iXqFc_I=.e1c48e06-4b7f-4201-98d0-09ec2ecab91e@github.com>
 <d156dQjZQs-9T_5Q-77vP54YAjv8jDOKla1d-fHhlns=.f5ec3359-8d09-4c68-90ce-59529e1f1ff2@github.com>
Message-ID: <hNbMGy3WTfUw0CUbOC07KZDg9mInqs0CrkDG5YvORrY=.e208e2bf-6f04-4af3-acf8-f241d98edf56@github.com>

On Tue, 9 Sep 2025 09:03:18 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> SPARC was the only supported architecture that uses a delay slot. The SPARC port was removed in JDK 15, and the code is effectively dead. Let's remove it.
>> 
>> The changes are no-op on all architectures that do not use delay slots. I still tested tier 1-5 on mach5, no related failures.
>
> src/hotspot/share/adlc/adlparse.cpp line 1394:
> 
>> 1392:           parse_err(SYNERR, "Using obsolete token, branch_has_delay_slot");
>> 1393:           break;
>> 1394:         }
> 
> I'm curious: why do you add that special warning? It would fail later anyway, right? Are we expecting anyone to parse things produced by different versions?

I took my inspiration from earlier work on adlc (see 6e35bcbf038cec0210c38428a8e1c233e102911a or 3f9c8a39201644952c6d07b97695a5a7ef918622), but I don't mind removing these warnings and the related code block entirely.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27119#discussion_r2332667364

From epeter at openjdk.org  Tue Sep  9 09:24:45 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 09:24:45 GMT
Subject: RFR: 8356779: IGV: dump the index of the SafePointNode containing
 the current JVMS during parsing
In-Reply-To: <dZIsHRR3HjgSVELUlQd4vQX4SeluQeiefbUxn2x_VoQ=.86c48c18-58fe-44ab-9cf8-b32c2c8d8827@github.com>
References: <dZIsHRR3HjgSVELUlQd4vQX4SeluQeiefbUxn2x_VoQ=.86c48c18-58fe-44ab-9cf8-b32c2c8d8827@github.com>
Message-ID: <zurgFhHUkxivZ7XL9VsHmNA3IL-siNo2VQEIvp2_tvA=.87bdf751-e0fc-48ba-9c1c-faf8bbef760c@github.com>

On Thu, 4 Sep 2025 05:22:00 GMT, Saranya Natarajan <snatarajan at openjdk.org> wrote:

> This PR prints the index of the SafePointNode containing the current JVMS during parsing in IGV. As stated in JBS, there are a lot of nodes during parsing, so it would be nice to know which nodes are currently in the local slots or on the stack when looking at a graph.

Looks reasonable.

@merykitty first proposed this, so would be good if he took a look too :)

Just out of curiosity: could you show a before/after IGV screenshot?

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27083#pullrequestreview-3200350110
PR Comment: https://git.openjdk.org/jdk/pull/27083#issuecomment-3269709657

From duke at openjdk.org  Tue Sep  9 09:26:04 2025
From: duke at openjdk.org (erifan)
Date: Tue, 9 Sep 2025 09:26:04 GMT
Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI
 expand operation
In-Reply-To: <tHD8aWJ_d1GaBqE6Sw7Ip_Yt_Y2y6m6OhaKj0e1mq7U=.8b515a36-eb3f-47bc-9d1d-861b68d32c6d@github.com>
References: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
 <tmEj88Ez4DURxmS7pPm8t1lhRct4fHlQcBySljEu-tg=.a18e80cf-7711-4ac6-990f-4c630b90f98b@github.com>
 <tHD8aWJ_d1GaBqE6Sw7Ip_Yt_Y2y6m6OhaKj0e1mq7U=.8b515a36-eb3f-47bc-9d1d-861b68d32c6d@github.com>
Message-ID: <KN9AfwG-JV0lUWDN__wxICuHFXjjrCtd2A11sLiKl5Y=.beb1078e-7238-459c-88b7-bb0a9d16b997@github.com>

On Wed, 3 Sep 2025 10:11:38 GMT, erifan <duke at openjdk.org> wrote:

>> The algorithm description here is great. Please paste all of it from "Since there are" to "but with different instructions where appropriate." into this PR, before the vector expand implementation.
>
> @theRealAph @e1iu @XiaohongGong @fg1417 @shqking, could you help take a look at this PR, thanks~

> @erifan Feel free to ping me again if I should re-review. I'm going on vacation in a week, so I'll be unresponsive for a while (feel free to contact other reviewers, especially for additional testing).

Thanks @eme64, I have made the corresponding changes according to your suggestions. Please help take another look. Thank you!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26740#issuecomment-3269712905

From djelinski at openjdk.org  Tue Sep  9 09:26:04 2025
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Tue, 9 Sep 2025 09:26:04 GMT
Subject: RFR: 8366984: Remove delay slot support
In-Reply-To: <d156dQjZQs-9T_5Q-77vP54YAjv8jDOKla1d-fHhlns=.f5ec3359-8d09-4c68-90ce-59529e1f1ff2@github.com>
References: <msZk11XqMQy93X0WkadtY0MP-sB74iIsygq2iXqFc_I=.e1c48e06-4b7f-4201-98d0-09ec2ecab91e@github.com>
 <d156dQjZQs-9T_5Q-77vP54YAjv8jDOKla1d-fHhlns=.f5ec3359-8d09-4c68-90ce-59529e1f1ff2@github.com>
Message-ID: <nhoHqghEdOAC7F2YzBUjE4oypjXU8HDmsc789gGi8zg=.798a9f40-5b37-442a-bf89-2e5379fdddfa@github.com>

On Tue, 9 Sep 2025 09:04:48 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> SPARC was the only supported architecture that uses a delay slot. The SPARC port was removed in JDK 15, and the code is effectively dead. Let's remove it.
>> 
>> The changes are no-op on all architectures that do not use delay slots. I still tested tier 1-5 on mach5, no related failures.
>
> src/hotspot/cpu/arm/arm.ad line 3383:
> 
>> 3381:     BR     : R;
>> 3382: %}
>> 3383: 
> 
> Where was this used? Or is it an unrelated cleanup?

Removing the comment alone didn't feel quite right, so I removed the following block as well. The block appears to be unused. It was copy-pasted from [SPARC](https://github.com/openjdk/jdk/blob/8153779ad32d1e8ddd37ced826c76c7aafc61894/hotspot/src/cpu/sparc/vm/sparc.ad#L4984), where it was also unused.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27119#discussion_r2332702728

From duke at openjdk.org  Tue Sep  9 09:29:24 2025
From: duke at openjdk.org (erifan)
Date: Tue, 9 Sep 2025 09:29:24 GMT
Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI
 expand operation
In-Reply-To: <tmEj88Ez4DURxmS7pPm8t1lhRct4fHlQcBySljEu-tg=.a18e80cf-7711-4ac6-990f-4c630b90f98b@github.com>
References: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
 <tmEj88Ez4DURxmS7pPm8t1lhRct4fHlQcBySljEu-tg=.a18e80cf-7711-4ac6-990f-4c630b90f98b@github.com>
Message-ID: <YxrR371gpyMeJEbioD-Bupej5RSArZgKCKP1BuQLMQQ=.ae9cdec7-59e4-4969-821c-7e8c0bcefbdf@github.com>

On Wed, 20 Aug 2025 11:27:59 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Currently, on AArch64, the VectorAPI `expand` operation is intrinsified for 32-bit and 64-bit types only when SVE2 is available. In the following cases, `expand` has not yet been intrinsified:
>> 1. **Subword types** on SVE2-capable hardware.
>> 2. **All types** on NEON and SVE1 environments.
>> 
>> As a result, `expand` API performance is very poor in these scenarios. This patch intrinsifies the `expand` operation in the above environments.
>> 
>> Since there are no native instructions directly corresponding to `expand` in these cases, this patch mainly leverages the `TBL` instruction to implement `expand`. To compute the index input for `TBL`, the prefix sum algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used. Take a 128-bit byte vector on SVE2 as an example:
>> 
>> To compute: dst = src.expand(mask)
>> Data direction: high <== low
>> Input:
>>   src                         = p o n m l k j i h g f e d c b a
>>   mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>> Expected result:
>>   dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>> 
>> Step 1: calculate the index input of the TBL instruction.
>> 
>> // Set tmp1 as all 0 vector.
>> tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 
>> // Move the mask bits from the predicate register to a vector register.
>> // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
>> tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>> 
>> // Shift the entire register. Prefix sum algorithm.
>> dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
>> tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
>> 
>> dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
>> tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
>> 
>> dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
>> tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
>> 
>> dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
>> tmp2 += dst                   = 8 8 8 7 6 6 6 5 4 4 4 3 2 2 2 1
>> 
>> // Clear inactive elements.
>> dst = sel(mask, tmp2, tmp1)   = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1
>> 
>> // Set the inactive lane value to -1 and set the active lane to the target index.
>> dst -= 1                      = -1 -1 7 6 -1 -1 5 4 -1 -1 3 2 -1 -1 1 0
>> 
>> Step 2: shuffle the source vector elements to the target vector
>> 
>> tbl(dst, src, dst)            = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>> 
>> 
>> The same algorithm is used for NEON and...
>
> The algorithm description here is great. Please paste all of it from "Since there are" to "but with different instructions where appropriate." into this PR, before the vector expand implementation.
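The quoted Step 1 / Step 2 walk-through can be checked with a small scalar model. This is an illustration only, written in plain Java rather than AArch64 assembly: `ExpandSketch`, `expand` and `tbl` are hypothetical names, lanes are modeled as `int` values with lane 0 first (the rightmost column in the diagrams), and `tbl` assumes the TBL behavior the description relies on, namely that an out-of-range index such as -1 produces 0.

```java
import java.util.Arrays;

// Hypothetical scalar model of the prefix-sum + table-lookup expand quoted above.
// Lane 0 is the lowest lane (rightmost in the quoted diagrams).
public final class ExpandSketch {

    // Table lookup: out-of-range indices (e.g. -1) yield 0, as assumed for TBL.
    static int[] tbl(int[] table, int[] idx) {
        int[] out = new int[idx.length];
        for (int i = 0; i < idx.length; i++) {
            int j = idx[i];
            out[i] = (j >= 0 && j < table.length) ? table[j] : 0;
        }
        return out;
    }

    static int[] expand(int[] src, boolean[] mask) {
        int n = src.length;
        // Step 1a: move the mask into a vector of 0/1 lanes.
        int[] sum = new int[n];
        for (int i = 0; i < n; i++) {
            sum[i] = mask[i] ? 1 : 0;
        }
        // Step 1b: rounds of "shift up by k lanes and add" build an inclusive
        // prefix sum; k = 1, 2, 4, 8 lanes match the << 8, 16, 32, 64 bit
        // shifts of the 16 x byte example.
        for (int k = 1; k < n; k <<= 1) {
            int[] shifted = new int[n];          // shifted-in lanes are 0
            for (int i = k; i < n; i++) {
                shifted[i] = sum[i - k];
            }
            for (int i = 0; i < n; i++) {
                sum[i] += shifted[i];
            }
        }
        // Step 1c: clear inactive lanes, then subtract 1 so active lanes hold
        // their source index and inactive lanes hold -1 (out of range).
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = (mask[i] ? sum[i] : 0) - 1;
        }
        // Step 2: shuffle source elements into their target lanes.
        return tbl(src, idx);
    }

    public static void main(String[] args) {
        int[] src = new int[16];
        boolean[] mask = new boolean[16];
        for (int i = 0; i < 16; i++) {
            src[i] = i + 1;            // a..p encoded as 1..16, lane 0 first
            mask[i] = (i % 4) < 2;     // 1 1 0 0 repeated, lane 0 first
        }
        System.out.println(Arrays.toString(expand(src, mask)));
    }
}
```

Running `main` prints `[1, 2, 0, 0, 3, 4, 0, 0, 5, 6, 0, 0, 7, 8, 0, 0]`, i.e. `a b 0 0 c d 0 0 e f 0 0 g h 0 0` read from lane 0 upward, which is the expected `dst` of the quoted example written low-to-high. The doubling-shift loop is a Hillis-Steele style inclusive scan, which is why only log2(lanes) shift/add rounds are needed.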

@theRealAph @e1iu could you help take another look at this PR? Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26740#issuecomment-3269731241

From xgong at openjdk.org  Tue Sep  9 09:33:34 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Tue, 9 Sep 2025 09:33:34 GMT
Subject: RFR: 8366588: VectorAPI: Re-intrinsify VectorMask.laneIsSet where
 the input index is a variable
In-Reply-To: <XaokWZpc-AL2hWWGvsmDczO1u4o5uGEaCSgPjSyxkz4=.3251a974-6374-48c8-a8e4-88914b730505@github.com>
References: <XaokWZpc-AL2hWWGvsmDczO1u4o5uGEaCSgPjSyxkz4=.3251a974-6374-48c8-a8e4-88914b730505@github.com>
Message-ID: <TbPq9pwx0srZqmT_4tblbgVvxlata8Fes9HemJw8p2c=.4083a780-574f-4064-8fbe-cc3df25e3be3@github.com>

On Fri, 5 Sep 2025 08:13:28 GMT, erifan <duke at openjdk.org> wrote:

> Intrinsic support for `VectorMask.laneIsSet` with a **variable** input index was introduced in PR #14200, but was inadvertently broken by PR #25673. This PR restores the intrinsic functionality and adds some JTReg tests.
> 

LGTM!

-------------

Marked as reviewed by xgong (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27113#pullrequestreview-3200414898

From duke at openjdk.org  Tue Sep  9 09:33:35 2025
From: duke at openjdk.org (erifan)
Date: Tue, 9 Sep 2025 09:33:35 GMT
Subject: RFR: 8366588: VectorAPI: Re-intrinsify VectorMask.laneIsSet where
 the input index is a variable
In-Reply-To: <gQGD4OgDoIK3W6iFkrD7ib0ZYu7cM3FSB6_5-4uMT6k=.e4c341da-1123-4356-a9f9-632d91d903ce@github.com>
References: <XaokWZpc-AL2hWWGvsmDczO1u4o5uGEaCSgPjSyxkz4=.3251a974-6374-48c8-a8e4-88914b730505@github.com>
 <gQGD4OgDoIK3W6iFkrD7ib0ZYu7cM3FSB6_5-4uMT6k=.e4c341da-1123-4356-a9f9-632d91d903ce@github.com>
Message-ID: <m27A3ohirLL6PI7NMZlATfMunTKsWA7xNxK2ec5gCbk=.e24abf6b-4157-42c3-a007-bc416471b511@github.com>

On Tue, 9 Sep 2025 08:48:42 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Intrinsic support for `VectorMask.laneIsSet` with a **variable** input index was introduced in PR #14200, but was inadvertently broken by PR #25673. This PR restores the intrinsic functionality and adds some JTReg tests.
>> 
>
> The patch looks reasonable, thanks for fixing this and writing an IR test!
> I'm launching some internal testing now, should hopefully not take much more than 24h.

Thanks for your help @eme64 @XiaohongGong @shipilev

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27113#issuecomment-3269744719

From mchevalier at openjdk.org  Tue Sep  9 09:39:49 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 9 Sep 2025 09:39:49 GMT
Subject: RFR: 8367135: Test compiler/loopstripmining/CheckLoopStripMining.java
 needs internal timeouts adjusted
Message-ID: <zzmSywW-3ApM-7Q2vNzZomBaJ0rYnNY3gePJtRqZcW4=.0b71f122-920a-41ae-a6ad-d0e7f33ab824@github.com>

As described in the issue, adjust the timeouts to the values they implicitly had before.

Thanks,
Marc

-------------

Commit messages:
 - Explicit * 4, but literal

Changes: https://git.openjdk.org/jdk/pull/27167/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27167&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367135
  Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/27167.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27167/head:pull/27167

PR: https://git.openjdk.org/jdk/pull/27167

From mchevalier at openjdk.org  Tue Sep  9 09:55:05 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 9 Sep 2025 09:55:05 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
Message-ID: <S0di-GnMEDTJmUFIBcNklkkCByloS4VYFGqksp_0dl8=.0b30d561-5332-4555-8659-e01d887df9cb@github.com>

On Mon, 8 Sep 2025 14:58:33 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   One more ResourceMark
>
> src/hotspot/share/opto/compile.cpp line 702:
> 
>> 700:       ,
>> 701:       _in_dump_cnt(0),
>> 702:       _invariant_checker(GraphInvariantChecker::make_default())
> 
> How does this interface with `ResourceMarks`?
> I'm asking because it is now resource-allocated, and so is `_checks`.
> How does this not trip the nesting asserts of allocation there?
> I'm probably missing something here.
> 
> I would have expected that we need to allocate it from the `_comp_arena`.

I guess I can put it in the comp arena if that's better, but I don't see why there would be a problem. These objects are allocated under the ResourceMark of the block where the `Compile` object is created (in `C2Compiler::compile_method`), and freed at the end of that block. In between, all the other structures used by the invariant checkers are created under nested ResourceMarks and freed before the surrounding one is released, so nesting does seem to be respected.
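To make the nesting argument concrete, here is a toy model of a bump-pointer arena with strictly nested (LIFO) marks. It is a hypothetical sketch in plain Java, not HotSpot's `ResourceArea`/`ResourceMark` API: `ToyArena`, `mark`, `release` and `used` are invented names, and the real implementation has additional debug checks that this model omits. It only shows the property the reply relies on: an object allocated under the outer mark survives while inner marks are created and released, as long as every inner mark is released before the outer one.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only (hypothetical names): a bump-pointer arena whose marks
// must be released in strict LIFO order, loosely mirroring nested ResourceMarks.
public final class ToyArena {
    private int top = 0;                                     // next free offset
    private final Deque<Integer> marks = new ArrayDeque<>(); // saved tops, innermost first

    // Bump-allocates `size` units and returns the start offset.
    public int allocate(int size) {
        int start = top;
        top += size;
        return start;
    }

    // Records the current top; returns the nesting depth of the new mark.
    public int mark() {
        marks.push(top);
        return marks.size();
    }

    // Frees everything allocated since the mark at `depth`.
    // Only the innermost live mark may be released.
    public void release(int depth) {
        if (depth != marks.size()) {
            throw new IllegalStateException("mark released out of order: " + depth);
        }
        top = marks.pop();
    }

    public int used() {
        return top;
    }

    public static void main(String[] args) {
        ToyArena arena = new ToyArena();
        int outer = arena.mark();         // e.g. the mark around the whole compilation
        arena.allocate(16);               // long-lived object, e.g. a checker
        int inner = arena.mark();         // nested mark around one verification pass
        arena.allocate(64);               // temporaries of that pass
        arena.release(inner);             // temporaries freed, long-lived object intact
        System.out.println(arena.used()); // 16
        arena.release(outer);
        System.out.println(arena.used()); // 0
    }
}
```

`main` prints 16 after the inner release (only the long-lived allocation remains) and 0 after the outer release; releasing the outer mark while an inner one is still live throws instead.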

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2332826049

From djelinski at openjdk.org  Tue Sep  9 10:13:44 2025
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Tue, 9 Sep 2025 10:13:44 GMT
Subject: RFR: 8366971: C2: Remove unused nop_list from
 PhaseOutput::init_buffer [v2]
In-Reply-To: <2hBEO9Zpoy2wo_pgTXE9v8KG5u1HNdKp3RgQE-4HYcE=.e86088d1-1e25-49b5-9b3c-c2498ec6ca48@github.com>
References: <2hBEO9Zpoy2wo_pgTXE9v8KG5u1HNdKp3RgQE-4HYcE=.e86088d1-1e25-49b5-9b3c-c2498ec6ca48@github.com>
Message-ID: <e8KnT4ERwDtpWM4totC6J-o8LWlglHuR_ep9Me6etR8=.4c173186-e14f-4c42-9835-eee720e84cae@github.com>

> The nop list has never been used in the history of OpenJDK. Let's clean it up.
> 
> Tested with Mach5 tier 1-5, no related failures.

Daniel Jeliński has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:

 - Merge remote-tracking branch 'origin/master' into nops-cleanup
 - Update copyright
 - Remove outdated comment
 - Remove nop list

-------------

Changes: https://git.openjdk.org/jdk/pull/27117/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27117&range=01
  Stats: 83 lines in 11 files changed: 1 ins; 77 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/27117.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27117/head:pull/27117

PR: https://git.openjdk.org/jdk/pull/27117

From thartmann at openjdk.org  Tue Sep  9 10:23:53 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Tue, 9 Sep 2025 10:23:53 GMT
Subject: RFR: 8367135: Test
 compiler/loopstripmining/CheckLoopStripMining.java needs internal timeouts
 adjusted
In-Reply-To: <zzmSywW-3ApM-7Q2vNzZomBaJ0rYnNY3gePJtRqZcW4=.0b71f122-920a-41ae-a6ad-d0e7f33ab824@github.com>
References: <zzmSywW-3ApM-7Q2vNzZomBaJ0rYnNY3gePJtRqZcW4=.0b71f122-920a-41ae-a6ad-d0e7f33ab824@github.com>
Message-ID: <ZwxVjONx98Y6MCm-x3HS70SNdEwodbGyGXSIxXZEmds=.0398e669-eb90-430f-98f9-0b96ac3a1ef8@github.com>

On Tue, 9 Sep 2025 09:31:24 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

> As described, adjust timeout to be as it implicitly used to be.
> 
> Thanks,
> Marc

Thanks for fixing this. Looks good and trivial.

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27167#pullrequestreview-3200720790

From mchevalier at openjdk.org  Tue Sep  9 10:41:32 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 9 Sep 2025 10:41:32 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <8F_IhYAZ2XxKl9SzWYNYkGvXzKEj1rl8GsRFrORBWaE=.4bd4bd61-f01d-480b-86b1-e65bbf61b065@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
 <r0rzQ5BGSRpGVAiK5E9zAZIYkU3gTPa7KjBATdETP6U=.5bff0999-2bfd-4f6e-9044-fcfb74e5e00d@github.com>
 <k0FHgFPoG8gM3kEtbMGLfNap6yApYM_fJe3EdxqC1Q8=.1f42f626-dde4-45a3-b756-ca782be6e436@github.com>
 <8F_IhYAZ2XxKl9SzWYNYkGvXzKEj1rl8GsRFrORBWaE=.4bd4bd61-f01d-480b-86b1-e65bbf61b065@github.com>
Message-ID: <iKPXfiLHsZZkBJ8OeimzGzfPEusxF5HzjhR5-oNhKwg=.72a07dba-f3ca-4c62-b467-bf1e62932c76@github.com>

On Tue, 9 Sep 2025 08:29:51 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I thought about that and I think the current situation is ok. The pattern is not something highly mutable; it's mostly hardcoded. I don't think it's hard to figure out what you're expecting. I'm very reluctant to add ugliness to the patterns, which must stay readable so that a human can easily verify them. It could be solved with more templates, though.
>
> Once we have more complex patterns, will it really be that easy to see what was expected?
> All you will see is what we actually got. You are already all about good reporting, so I just noticed a hole here.
> You know the code better, so I'll leave it up to you in the end ;)

That is not quite true! We will also print the path from the center, so we know how we arrived at the point that has an unexpected type. From there, we can either follow the pattern along that path or rely on our general knowledge of the IR to see that something looks wrong in the displayed part.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2333006921

From epeter at openjdk.org  Tue Sep  9 10:58:41 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 10:58:41 GMT
Subject: RFR: 8366971: C2: Remove unused nop_list from
 PhaseOutput::init_buffer [v2]
In-Reply-To: <e8KnT4ERwDtpWM4totC6J-o8LWlglHuR_ep9Me6etR8=.4c173186-e14f-4c42-9835-eee720e84cae@github.com>
References: <2hBEO9Zpoy2wo_pgTXE9v8KG5u1HNdKp3RgQE-4HYcE=.e86088d1-1e25-49b5-9b3c-c2498ec6ca48@github.com>
 <e8KnT4ERwDtpWM4totC6J-o8LWlglHuR_ep9Me6etR8=.4c173186-e14f-4c42-9835-eee720e84cae@github.com>
Message-ID: <QPKNKCH20jqBQKph_TWvO0tYADf27IcuUx2jHtTduUk=.334e62c3-9e17-42be-860c-9cb340731b7d@github.com>

On Tue, 9 Sep 2025 10:13:44 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

>> The nop list has never been used in the history of OpenJDK. Let's clean it up.
>> 
>> Tested with Mach5 tier 1-5, no related failures.
>
> Daniel Jeliński has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:
> 
>  - Merge remote-tracking branch 'origin/master' into nops-cleanup
>  - Update copyright
>  - Remove outdated comment
>  - Remove nop list

Marked as reviewed by epeter (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/27117#pullrequestreview-3200894301

From epeter at openjdk.org  Tue Sep  9 10:59:48 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 10:59:48 GMT
Subject: RFR: 8366984: Remove delay slot support
In-Reply-To: <msZk11XqMQy93X0WkadtY0MP-sB74iIsygq2iXqFc_I=.e1c48e06-4b7f-4201-98d0-09ec2ecab91e@github.com>
References: <msZk11XqMQy93X0WkadtY0MP-sB74iIsygq2iXqFc_I=.e1c48e06-4b7f-4201-98d0-09ec2ecab91e@github.com>
Message-ID: <RQ6vnLOFZjJ22UYrpOMl9rRVntEQwaHa-DLfVQopMfs=.d5dd313a-198c-4678-88f1-9875ec52e008@github.com>

On Fri, 5 Sep 2025 14:24:50 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

> SPARC was the only supported architecture that uses a delay slot. The SPARC port was removed in JDK 15, and the code is effectively dead. Let's remove it.
> 
> The changes are no-op on all architectures that do not use delay slots. I still tested tier 1-5 on mach5, no related failures.

Thanks for the answers!

You'll of course have to merge the dependency, and get a second review :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27119#issuecomment-3270142011

From epeter at openjdk.org  Tue Sep  9 10:59:51 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 10:59:51 GMT
Subject: RFR: 8366984: Remove delay slot support
In-Reply-To: <nhoHqghEdOAC7F2YzBUjE4oypjXU8HDmsc789gGi8zg=.798a9f40-5b37-442a-bf89-2e5379fdddfa@github.com>
References: <msZk11XqMQy93X0WkadtY0MP-sB74iIsygq2iXqFc_I=.e1c48e06-4b7f-4201-98d0-09ec2ecab91e@github.com>
 <d156dQjZQs-9T_5Q-77vP54YAjv8jDOKla1d-fHhlns=.f5ec3359-8d09-4c68-90ce-59529e1f1ff2@github.com>
 <nhoHqghEdOAC7F2YzBUjE4oypjXU8HDmsc789gGi8zg=.798a9f40-5b37-442a-bf89-2e5379fdddfa@github.com>
Message-ID: <6dsuRmIveoGusgv0MnsHqIv87ZjbwXy3z9DoBvbUwVc=.9437cbe1-e0ad-43ce-bbda-86403a5971fc@github.com>

On Tue, 9 Sep 2025 09:23:46 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

>> src/hotspot/cpu/arm/arm.ad line 3383:
>> 
>>> 3381:     BR     : R;
>>> 3382: %}
>>> 3383: 
>> 
>> Where was this used? Or is it an unrelated cleanup?
>
> Removing the comment alone didn't feel quite right, so I removed the following block as well. The block appears to be unused. It was copy-pasted from [SPARC](https://github.com/openjdk/jdk/blob/8153779ad32d1e8ddd37ced826c76c7aafc61894/hotspot/src/cpu/sparc/vm/sparc.ad#L4984), where it was also unused.

Thanks for the explanation :)

>> src/hotspot/share/adlc/adlparse.cpp line 1394:
>> 
>>> 1392:           parse_err(SYNERR, "Using obsolete token, branch_has_delay_slot");
>>> 1393:           break;
>>> 1394:         }
>> 
>> I'm curious: why do you add that special warning? It would fail later anyway, right? Are we expecting anyone to parse things produced by different versions?
>
> I took my inspiration from earlier work on adlc (see 6e35bcbf038cec0210c38428a8e1c233e102911a or 3f9c8a39201644952c6d07b97695a5a7ef918622), but I don't mind removing these warnings and the related code block entirely.

Sounds good, just keep the "obsolete" error :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27119#discussion_r2333063403
PR Review Comment: https://git.openjdk.org/jdk/pull/27119#discussion_r2333062954

From chagedorn at openjdk.org  Tue Sep  9 11:16:52 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Tue, 9 Sep 2025 11:16:52 GMT
Subject: RFR: 8367135: Test
 compiler/loopstripmining/CheckLoopStripMining.java needs internal timeouts
 adjusted
In-Reply-To: <zzmSywW-3ApM-7Q2vNzZomBaJ0rYnNY3gePJtRqZcW4=.0b71f122-920a-41ae-a6ad-d0e7f33ab824@github.com>
References: <zzmSywW-3ApM-7Q2vNzZomBaJ0rYnNY3gePJtRqZcW4=.0b71f122-920a-41ae-a6ad-d0e7f33ab824@github.com>
Message-ID: <Ua5WfSmx3dnnb4KyDY3J4cQrZWzyh4j5sENnPej8R0A=.a875a469-ae30-4e51-8537-13a740d06f77@github.com>

On Tue, 9 Sep 2025 09:31:24 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

> As described, adjust the timeout to what it implicitly used to be.
> 
> Thanks,
> Marc

Looks good!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27167#pullrequestreview-3200976783

From chagedorn at openjdk.org  Tue Sep  9 11:17:57 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Tue, 9 Sep 2025 11:17:57 GMT
Subject: RFR: 8361702: C2: assert(is_dominator(compute_early_ctrl(limit,
 limit_ctrl), pre_end)) failed: node pinned on loop exit test? [v7]
In-Reply-To: <BfbDzTNQbwqUh0pXFbXiFy09JPGfbLPVmUTx-YEe1KM=.cad33852-809f-4907-a41d-628d0d0db07e@github.com>
References: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com>
 <BfbDzTNQbwqUh0pXFbXiFy09JPGfbLPVmUTx-YEe1KM=.cad33852-809f-4907-a41d-628d0d0db07e@github.com>
Message-ID: <es3n7br2LboEQJGQl5ZTKDcKp8EShNYPcivxI3wGqs8=.2f8e251f-ec85-4f3e-b4ef-608eaed4c748@github.com>

On Tue, 9 Sep 2025 08:39:37 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> A node in a pre loop only has uses out of the loop dominated by the
>> loop exit. `PhaseIdealLoop::try_sink_out_of_loop()` sets its control
>> to the loop exit projection. A range check in the main loop has this
>> node as input (through a chain of some other nodes). Range check
>> elimination needs to update the exit condition of the pre loop with an
>> expression that depends on the node pinned on its exit: that's
>> impossible and the assert fires. This is a variant of 8314024 (this
>> one was for a node with uses out of the pre loop on multiple paths). I
>> propose the same fix: leave the node with control in the pre loop in
>> this case.
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8361702
>  - Merge branch 'master' into JDK-8361702
>  - review
>  - Merge branch 'master' into JDK-8361702
>  - Update src/hotspot/share/opto/loopopts.cpp
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE3.java
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - Update src/hotspot/share/opto/loopopts.cpp
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - tests
>  - ... and 1 more: https://git.openjdk.org/jdk/compare/e2575a25...91a7d73c

Still good!

Since the last test run was quite a while ago, let me rerun the tests.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26424#pullrequestreview-3200982359

From mchevalier at openjdk.org  Tue Sep  9 11:20:11 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 9 Sep 2025 11:20:11 GMT
Subject: RFR: 8367135: Test
 compiler/loopstripmining/CheckLoopStripMining.java needs internal timeouts
 adjusted
In-Reply-To: <zzmSywW-3ApM-7Q2vNzZomBaJ0rYnNY3gePJtRqZcW4=.0b71f122-920a-41ae-a6ad-d0e7f33ab824@github.com>
References: <zzmSywW-3ApM-7Q2vNzZomBaJ0rYnNY3gePJtRqZcW4=.0b71f122-920a-41ae-a6ad-d0e7f33ab824@github.com>
Message-ID: <6Hq93VTf74iXoQqAltS6qNEpiZmB2TXK2mvhZgtiFtc=.c66ef126-4656-4565-9586-463c888f44a4@github.com>

On Tue, 9 Sep 2025 09:31:24 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

> As described, adjust the timeout to what it implicitly used to be.
> 
> Thanks,
> Marc

Thanks @TobiHartmann & @chhagedorn!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27167#issuecomment-3270222653

From mchevalier at openjdk.org  Tue Sep  9 11:20:12 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 9 Sep 2025 11:20:12 GMT
Subject: Integrated: 8367135: Test
 compiler/loopstripmining/CheckLoopStripMining.java needs internal timeouts
 adjusted
In-Reply-To: <zzmSywW-3ApM-7Q2vNzZomBaJ0rYnNY3gePJtRqZcW4=.0b71f122-920a-41ae-a6ad-d0e7f33ab824@github.com>
References: <zzmSywW-3ApM-7Q2vNzZomBaJ0rYnNY3gePJtRqZcW4=.0b71f122-920a-41ae-a6ad-d0e7f33ab824@github.com>
Message-ID: <j0MYDVzjzrx7745A8QZOLrzkPu-01qJ7Z3lg743uUmU=.a9a8530a-8b24-42ef-bd43-0055537b2109@github.com>

On Tue, 9 Sep 2025 09:31:24 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

> As described, adjust the timeout to what it implicitly used to be.
> 
> Thanks,
> Marc

This pull request has now been integrated.

Changeset: 06326176
Author:    Marc Chevalier <mchevalier at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/0632617670f991da23c3892d357e8d1f051d29a0
Stats:     4 lines in 1 file changed: 0 ins; 0 del; 4 mod

8367135: Test compiler/loopstripmining/CheckLoopStripMining.java needs internal timeouts adjusted

Reviewed-by: thartmann, chagedorn

-------------

PR: https://git.openjdk.org/jdk/pull/27167

From roland at openjdk.org  Tue Sep  9 11:27:50 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Tue, 9 Sep 2025 11:27:50 GMT
Subject: RFR: 8327963: C2: fix construction of memory graph around
 Initialize node to prevent incorrect execution if allocation is removed [v12]
In-Reply-To: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
Message-ID: <S83l4ZBZILBlhRpSoVORBxzNNB9bF5xN-QhRWc16_T4=.6d5c1624-5d77-4b3e-9db0-0e01e6b1b36b@github.com>

> An `Initialize` node for an `Allocate` node is created with a memory
> `Proj` of adr type raw memory. In order for stores to be captured, the
> memory state out of the allocation is a `MergeMem` with slices for the
> various object fields/array element set to the raw memory `Proj` of
> the `Initialize` node. If `Phi`s need to be created during later
> transformations from this memory state, the `Phi` for a particular
> slice gets its adr type from the type of the `Proj` which is raw
> memory. If during macro expansion, the `Allocate` is found to have no
> use and so can be removed, the `Proj` out of the `Initialize` is
> replaced by the memory state on input to the `Allocate`. A `Phi` for
> some slice for a field of an object will end up with the raw memory
> state on input to the `Allocate` node. As a result, memory state at
> the `Phi` is incorrect and incorrect execution can happen.
> 
> The fix I propose is, rather than have a single `Proj` for the memory
> state out of the `Initialize` with adr type raw memory, to use one
> `Proj` per slice added to the memory state after the `Initialize`. Each
> of the `Proj` should return the right adr type for its slice. For that
> I propose having a new type of `Proj`: `NarrowMemProj` that captures
> the right adr type.
> 
> Logic for the construction of the `Allocate`/`Initialize` subgraph is
> tweaked so the right adr type, captured in its own `NarrowMemProj`, is
> added to the memory subgraph. Code that removes an allocation or moves
> it also has to be changed so it correctly takes the multiple memory
> projections out of the `Initialize` node into account.
> 
> One tricky issue is that when EA splits unique types for a scalar replaceable
> `Allocate` node:
> 
> 1- the adr type captured in the `NarrowMemProj` becomes out of sync
>   with the type of the slices for the allocation
>   
> 2- before EA, the memory state for one particular field out of the
>   `Initialize` node can be used for a `Store` to the just allocated
>   object or some other. So we can have a chain of `Store`s, some to
>   the newly allocated object, some to some other objects, all of them
>   using the state of `NarrowMemProj` out of the `Initialize`. After
>   split unique types, the `NarrowMemProj` is for the slice of a
>   particular allocation. So `Store`s to some other objects shouldn't
>   use that memory state but the memory state before the `Allocate`.
>   
> For that, I added logic to update the adr type of `NarrowMemProj`
> during split unique types and update the memory input of `Store`s that
> don't depend on the memory state ...

Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 45 commits:

 - more
 - Merge branch 'master' into JDK-8327963
 - more
 - more
 - Merge branch 'master' into JDK-8327963
 - more
 - more
 - lambda return
 - lambda clean up
 - Merge branch 'master' into JDK-8327963
 - ... and 35 more: https://git.openjdk.org/jdk/compare/e16c5100...b701d03e

-------------

Changes: https://git.openjdk.org/jdk/pull/24570/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=11
  Stats: 932 lines in 20 files changed: 845 ins; 25 del; 62 mod
  Patch: https://git.openjdk.org/jdk/pull/24570.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24570/head:pull/24570

PR: https://git.openjdk.org/jdk/pull/24570

From roland at openjdk.org  Tue Sep  9 11:30:13 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Tue, 9 Sep 2025 11:30:13 GMT
Subject: RFR: 8327963: C2: fix construction of memory graph around
 Initialize node to prevent incorrect execution if allocation is removed [v8]
In-Reply-To: <BrNHUWgnhDZWz523gq_a8Smxck7UE0r0gBLQHfydrXk=.d96048bf-497e-426d-bdab-b58e63b1e5c6@github.com>
References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
 <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com>
 <phFEV6ecal3bMYgAt85dr5f6UKm024p2Ssw2l5zDvOQ=.c332a12d-5009-4e99-abc4-e0d58f06a075@github.com>
 <JczlkGMI1ugc2011v3_yecnmAihjcv5YYyixFtvZjvk=.3994dece-26bc-4c73-9850-8f63986b6fc7@github.com>
 <2m1_XtiSsW_LaBRrkX4qv7AKtLOjNgnl4mUp3zisasE=.dda62164-7aa0-4c1a-b83f-fa40ba7902e5@github.com>
 <eMGWpjjtAvxGzXXgDpfqUyz-LHobPg5dEAk99yQYhic=.81804900-b4ae-4b71-9a39-893fa7b6d36c@github.com>
 <LeeKE7VBNvxxD8-1ltyf2CGltyUV90y-ZabbxGVYXZc=.79192936-6954-4b74-a4ec-ead162efe4e2@github.com>
 <4374L3lkQK90wLxxOA7POBmIKNX2DFK-4pO4vj1bkuQ=.5b8d7825-a7f1-497f-ab66-02a85a266659@github.com>
 <QtsENUXeRsA140liru9rjk0KDbNVhKj6qPVU8toDlkI=.4b9eadfe-045e-4bae-a2c8-40c04496cb60@github.com>
 <BrNHUWgnhDZWz523gq_a8Smxck7UE0r0gBLQHfydrXk=.d96048bf-497e-426d-bdab-b58e63b1e5c6@github.com>
Message-ID: <AnynYGT-3bAwKJJk2oz1OA3wOAZFjjwKEKfDFOaEhBw=.5f079314-6878-4de9-b395-515d950dfc13@github.com>

On Fri, 11 Jul 2025 18:20:19 GMT, John R Rose <jrose at openjdk.org> wrote:

>>> I think it would be good (although not necessarily in the context of this PR) to establish the "no duplicate memory projection" invariant in the back-end, for sanity and to make sure we do not break any logic that might be implicitly relying on it. If you agree, could you file a follow-up RFE, ideally with a reproducer where the current logic fails to remove `NarrowMemProj`s?
>> 
>> One way would be to simply assert that there's no `NarrowMemProj`s left during final graph reshape. Is that what you'd like?
>> Stepping back, what's the concern here? The new projections should mostly be harmless.
>
>> I think it would be good (although not necessarily in the context of this PR) to establish the "no duplicate memory projection" invariant in the back-end, for sanity and to make sure we do not break any logic that might be implicitly relying on it. If you agree, could you file a follow-up RFE, ideally with a reproducer where the current logic fails to remove `NarrowMemProj`s?
> 
> I see this as a request for a better "normal form" for the graph.  The trick here is that, if we are allowing temporary "abnormal" forms of the graph, in order to give various transforms some "working room" to rearrange things, we need to decide when are the moments when the graph must be settled back down into a normal form.
> 
> We sometimes check for some kinds of IR normality, and/or enforce some normality, in the "final graph reshape" phase.  The problem with loading up too many ad hoc operations at that point is, it may create a completely new kind of graph with new invariants.  (Don't like the current standard?  Create a new one, and see how that goes!  Same for global IR contracts.)  
> 
> Having two kinds of IR with two sets of invariants (one set more restrictive) has an obvious objection:  We fragment our ability to enforce the rules; we need to write enforcement logic which says "which phase are we in?" before checking the right set of rules.  And if the editing sessions are rare, we don't get much benefit from the rules that are enforced by that editing session.  By definition "final graph reshape" is rare.  It's worth it since we are going to a lower IR, which really must have different rules, but it's not a light thing to add to the design.
> 
> In any case, adding a normalization requirement seems to need a "wash pass" of some sort over the whole graph, to do necessary cleanups.  We do this sometimes, I think, after loop opts or EA, maybe other places, and at "final graph reshape".  This is going to be a runtime expense, I think, unless it can be piggybacked on some other pass we already do.  Maybe a hallmark of these "post-operative" cleanups is that the operation itself required some side data structure, created just for the operation (loop nest or connection graph) and discarded later in order to unleash unconstrained downstream transforms.  During the operation, transforms are specialized just to keep the side data structure relevant.  Afterwards, the graph "opens up" to unconstrained changes.  But in all cases, local updates should be as free as possible, even if their ord...

@rose00 @robcasloz I updated the change with a new way to avoid redundant projections. At matching time, before a `NarrowMemProj` is matched into a `MachProj`, new logic checks whether a `MachProj` already exists. That guarantees that no redundant `MachProj`s are ever added. It also performs the new normalization at a major cut-point. What do you think?
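A toy sketch of the lookup-before-create idea described above. Everything here (`ToyProj`, `ProjCache`, keying on a `(parent_id, con)` pair) is a hypothetical stand-in, not HotSpot's `Matcher` or `MachProjNode`; it only illustrates how checking for an existing projection before creating one rules out duplicates.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a matched projection (not HotSpot's MachProjNode).
struct ToyProj {
  int parent_id;  // id of the node this projection hangs off
  int con;        // projection number, e.g. the memory output
};

// Hypothetical cache that keeps at most one projection per (parent, con) pair.
class ProjCache {
 public:
  // Returns the projection already recorded for (parent_id, con), or creates
  // and records a new one if none exists yet.
  ToyProj* find_or_create(int parent_id, int con) {
    const std::pair<int, int> key(parent_id, con);
    auto it = _by_key.find(key);
    if (it != _by_key.end()) {
      return it->second;  // reuse: no redundant projection is created
    }
    _storage.push_back(std::unique_ptr<ToyProj>(new ToyProj{parent_id, con}));
    ToyProj* proj = _storage.back().get();
    _by_key[key] = proj;
    return proj;
  }

  std::size_t size() const { return _storage.size(); }

 private:
  std::map<std::pair<int, int>, ToyProj*> _by_key;
  std::vector<std::unique_ptr<ToyProj>> _storage;
};
```

Asking twice for the same `(parent_id, con)` pair returns the same object, so the cache never holds more than one projection per pair.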

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-3270256703

From epeter at openjdk.org  Tue Sep  9 11:51:00 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 11:51:00 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v3]
In-Reply-To: <eXrwH7c0C-O1WG8EdYsncgayPjViNP0yIyPL6PNOIk0=.697e0e55-56fa-41ff-aaf4-6a3baf7e65d2@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <eXrwH7c0C-O1WG8EdYsncgayPjViNP0yIyPL6PNOIk0=.697e0e55-56fa-41ff-aaf4-6a3baf7e65d2@github.com>
Message-ID: <AUuedV6hy3s5kl9HPpeKsn2F60SCA9Boow9Id1hblIk=.4d01a071-5024-464c-8e5b-bcc8c1a435c3@github.com>

On Fri, 5 Sep 2025 17:17:52 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> This patch optimizes PopCount value transforms using KnownBits information.
>> The following are the results of the microbenchmark included with the patch:
>> 
>> 
>> 
>> System: 13th Gen Intel(R) Core(TM) i3-1315U
>> 
>> Baseline:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s
>> PopCountValueTransform.StockKernelInt         thrpt    2  409295.875          ops/s
>> PopCountValueTransform.StockKernelLong        thrpt    2  368025.608          ops/s
>> 
>> With optimization:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s
>> PopCountValueTransform.StockKernelInt         thrpt    2  418649.269          ops/s
>> PopCountValueTransform.StockKernelLong        thrpt    2  381330.221          ops/s
>> 
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update countbitsnode.cpp

Very nice improvement @jatin-bhateja , thanks for working on it :)

src/hotspot/share/opto/countbitsnode.cpp line 125:

> 123:         range is computed using the following formulas:-
> 124:         - _hi = ~ZEROS
> 125:         - _lo = ONES

Is there going to be some other Lemma here, that gives rise to the numbering? I'd just remove the numbering.

src/hotspot/share/opto/countbitsnode.cpp line 128:

> 126: Proof:-
> 127:   - KnownBits.ZEROS and KnownBits.ONES are inferred out of the common prefix of the value range
> 128:     delimiting bounds.

It could come from the range, but it could also come from individual bits being forced to 0 or 1 by an and or an or operation. I'll give an alternative suggestion below.

src/hotspot/share/opto/countbitsnode.cpp line 145:

> 143:     B) Now, transform the computed knownbits back to the value range.
> 144:       _new_lo = _known_bits.ones  = 0b11000100
> 145:       _new_hi = ~known_bits.zeros = 0b11000111

This kinda duplicates all the descriptions that we have in KnownBits. I would drop it. Or maybe just refer to something over there.

src/hotspot/share/opto/countbitsnode.cpp line 149:

> 147:   - We now know that ~KnownBits.ZEROS >= UB >= LB >= KnownBits.ONES
> 148:   - Therefore, popcount(ONES) and popcount(~ZEROS) can safely be assumed as the upper and lower
> 149:     bounds of the result value range.

I don't quite see how that follows from the proof. And I'm also worried about the correctness.

You are using the signed `_lo` and `_hi`. But the zeros and ones are unsigned. So it is a bit unclear what your comparisons prove here - you should probably cast one to signed or the other to unsigned to make things explicit.

One crucial step here is also the monotonicity assumption of `popcount`. You'd need to show or at least assert that:

~KnownBits.ZEROS >= UB >= t >= LB >= KnownBits.ONES
implies
popcount(~KnownBits.ZEROS) >= popcount(UB) >= popcount(t) >= popcount(LB) >= popcount(KnownBits.ONES)


It all sounds a bit complicated, and I think I would prefer something along the lines of what @SirYwell suggested.
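A tiny check, assuming 32-bit unsigned values, of why the chained inequality needs more than the ordering of the bounds: for a range such as `[7, 8]` with `t = 7` and `UB = 8`, `UB >= t` holds but `popcount(UB) == 1 < popcount(t) == 3`. The helper names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

using std::uint32_t;

// Portable popcount: each iteration clears the lowest set bit.
static int pop_count(uint32_t x) {
  int n = 0;
  while (x != 0) {
    x &= x - 1;
    n++;
  }
  return n;
}

// True if a <= b numerically but pop_count(a) > pop_count(b), i.e. the pair
// is a counterexample to "popcount is monotonic in the value".
static bool breaks_monotonicity(uint32_t a, uint32_t b) {
  return a <= b && pop_count(a) > pop_count(b);
}
```

So the ordering of the bounds alone cannot justify the popcount bounds; a bit-level argument is needed instead.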

src/hotspot/share/opto/countbitsnode.cpp line 150:

> 148:   - Therefore, popcount(ONES) and popcount(~ZEROS) can safely be assumed as the upper and lower
> 149:     bounds of the result value range.
> 150: */

Suggestion:

// We use the KnownBits information from the integer types to derive how many one bits
// we have at least and at most.
// From the definition of KnownBits, we know:
//   zeros: Indicates which bits must be 0: zeros[i]=1 -> t[i]=0
//   ones:  Indicates which bits must be 1: ones[i]=1  -> t[i]=1
//
// From this, we derive:
//   number_of_zeros_in_t >= pop_count(zeros)
//   -> number_of_ones_in_t <= bits_per_type - pop_count(zeros) = pop_count(~zeros)
//   number_of_ones_in_t >= pop_count(ones)
//
// By definition:
//   pop_count(t) = number_of_ones_in_t
//
// It follows:
//   pop_count(ones) <= pop_count(t) <= pop_count(~zeros)
//
// Note: signed _lo and _hi, as well as unsigned _ulo and _uhi bounds of the integer types
//       are already reflected in the KnownBits information, see TypeInt / TypeLong definitions.

Feel free to adjust the formulation :)
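A minimal sketch of the derivation in the suggested comment, assuming 32-bit values and non-contradictory masks (`zeros & ones == 0`). `pop_count_bounds` returns `{pop_count(ones), pop_count(~zeros)}`, and `bounds_are_exact` brute-forces every value consistent with the masks to check that the bounds hold and are reached. These helpers are illustrative only, not the `KnownBits` API in HotSpot.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

using std::uint32_t;

// Portable popcount: each iteration clears the lowest set bit.
static int pop_count(uint32_t x) {
  int n = 0;
  while (x != 0) {
    x &= x - 1;
    n++;
  }
  return n;
}

// zeros: bit i set => bit i of t is known to be 0
// ones:  bit i set => bit i of t is known to be 1
// Returns {lower, upper} = {pop_count(ones), pop_count(~zeros)}.
static std::pair<int, int> pop_count_bounds(uint32_t zeros, uint32_t ones) {
  return std::make_pair(pop_count(ones), pop_count(~zeros));
}

// Enumerates every t consistent with the masks (all subsets of the unknown
// bits, so keep that count small) and checks that pop_count(t) stays within
// the bounds and that both bounds are actually reached.
// Assumes (zeros & ones) == 0, i.e. the masks do not contradict each other.
static bool bounds_are_exact(uint32_t zeros, uint32_t ones) {
  const std::pair<int, int> bounds = pop_count_bounds(zeros, ones);
  const uint32_t unknown = ~(zeros | ones);
  int seen_min = 33;
  int seen_max = -1;
  uint32_t subset = unknown;
  while (true) {
    const int c = pop_count(ones | subset);
    if (c < bounds.first || c > bounds.second) {
      return false;
    }
    if (c < seen_min) seen_min = c;
    if (c > seen_max) seen_max = c;
    if (subset == 0) {
      break;
    }
    subset = (subset - 1) & unknown;  // next smaller subset of the unknown bits
  }
  return seen_min == bounds.first && seen_max == bounds.second;
}
```

The lower bound is reached by `t == ones` and the upper bound by `t == ~zeros` (all unknown bits set), which is why the range is exact rather than merely conservative.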

test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java line 74:

> 72:         }
> 73:         return 1;
> 74:     }

Can we not assert that there is exactly one popcount? The two should fold to one, no?

test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java line 114:

> 112:         }
> 113:         return 1;
> 114:     }

Thanks for the tests!

I think it would be quite valuable to have some tests that do not just clamp the range, but also create random `KnownBits`, i.e. with random and/or masks.

For example:
`num = (num | ONES) & ZEROS;`

And then you generate `ONES` and `ZEROS` randomly, maybe even using `Generators`?
Then round it off with some random range comparisons at the end:
`if (Integer.bitCount(num) >= CON1 && Integer.bitCount(num) <= CON2) {`
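A sketch of the property such a randomized test would exercise, written in C++ rather than as a jtreg/IR-framework test. After `num = (num | ONES) & ZEROS`, the bits outside the and-mask `ZEROS` are known zeros, and only the bits of `ONES` that survive the and-mask are known ones, so the lower bound uses `ONES & ZEROS`. The names `ones` and `and_mask`, the seed, and the iteration count are arbitrary choices for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

using std::uint32_t;

// Portable popcount: each iteration clears the lowest set bit.
static int pop_count(uint32_t x) {
  int n = 0;
  while (x != 0) {
    x &= x - 1;
    n++;
  }
  return n;
}

// Mimics num = (num | ONES) & ZEROS with random inputs and masks, and checks
// that the bit count stays within the bounds implied by the known bits.
static bool random_masks_respect_bounds(unsigned seed, int iterations) {
  std::mt19937 rng(seed);
  for (int i = 0; i < iterations; i++) {
    const uint32_t num = static_cast<uint32_t>(rng());
    const uint32_t ones = static_cast<uint32_t>(rng());      // or-mask: forces bits to 1
    const uint32_t and_mask = static_cast<uint32_t>(rng());  // the `ZEROS` and-mask
    const uint32_t result = (num | ones) & and_mask;
    // Bits forced to 1 are only known ones if the and-mask keeps them.
    const int lo = pop_count(ones & and_mask);
    // Bits cleared by the and-mask are known zeros, so ~zeros == and_mask.
    const int hi = pop_count(and_mask);
    const int count = pop_count(result);
    if (count < lo || count > hi) {
      return false;
    }
  }
  return true;
}
```

In the Java test, any `CON1`/`CON2` comparison whose range fully contains `[lo, hi]` is then a candidate for folding, which the IR checks could verify.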

test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java line 148:

> 146: 
> 147:     public static void main(String[] args) {
> 148:         TestFramework.runWithFlags("-XX:-TieredCompilation", "-XX:CompileThresholdScaling=0.2");

Can you explain the need for these flags?
The TestFramework eventually enqueues for compilation anyway. Or is there something about profiling?

test/micro/org/openjdk/bench/java/lang/PopCountValueTransform.java line 79:

> 77:         }
> 78:         return res;
> 79:     }

I assume the `stock` kernels are there to show the performance when there is no op, and the `folding` kernels are the ones you hope reach the same performance. It would be nice to have one where the `bitCount` does not fold away, just to keep that comparison :)

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27075#pullrequestreview-3200918107
PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2333099461
PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2333106910
PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2333117632
PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2333147248
PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2333189002
PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2333222066
PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2333199688
PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2333087776
PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2333210925

From epeter at openjdk.org  Tue Sep  9 11:51:02 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 11:51:02 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v3]
In-Reply-To: <AUuedV6hy3s5kl9HPpeKsn2F60SCA9Boow9Id1hblIk=.4d01a071-5024-464c-8e5b-bcc8c1a435c3@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <eXrwH7c0C-O1WG8EdYsncgayPjViNP0yIyPL6PNOIk0=.697e0e55-56fa-41ff-aaf4-6a3baf7e65d2@github.com>
 <AUuedV6hy3s5kl9HPpeKsn2F60SCA9Boow9Id1hblIk=.4d01a071-5024-464c-8e5b-bcc8c1a435c3@github.com>
Message-ID: <88lK21UPhkqWYMU-PNUCMYYH1QWrjiUfftspxZB7GFM=.99f8aa26-0075-4932-a427-054f088d8068@github.com>

On Tue, 9 Sep 2025 11:03:26 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update countbitsnode.cpp
>
> src/hotspot/share/opto/countbitsnode.cpp line 125:
> 
>> 123:         range is computed using the following formulas:-
>> 124:         - _hi = ~ZEROS
>> 125:         - _lo = ONES
> 
> Is there going to be some other Lemma here, that gives rise to the numbering? I'd just remove the numbering.

Also: this is not really a mathematical statement that can be proven, rather some sort of high-level intention.

> src/hotspot/share/opto/countbitsnode.cpp line 150:
> 
>> 148:   - Therefore, popcount(ONES) and popcount(~ZEROS) can safely be assumed as the upper and lower
>> 149:     bounds of the result value range.
>> 150: */
> 
> Suggestion:
> 
> // We use the KnownBits information from the integer types to derive how many one bits
> // we have at least and at most.
> // From the definition of KnownBits, we know:
> //   zeros: Indicates which bits must be 0: zeros[i]=1 -> t[i]=0
> //   ones:  Indicates which bits must be 1: ones[i]=1  -> t[i]=1
> //
> // From this, we derive:
> //   number_of_zeros_in_t >= pop_count(zeros)
> //   -> number_of_ones_in_t <= bits_per_type - pop_count(zeros) = pop_count(~zeros)
> //   number_of_ones_in_t >= pop_count(ones)
> //
> // By definition:
> //   pop_count(t) = number_of_ones_in_t
> //
> // It follows:
> //   pop_count(ones) <= pop_count(t) <= pop_count(~zeros)
> //
> // Note: signed _lo and _hi, as well as unsigned _ulo and _uhi bounds of the integer types
> //       are already reflected in the KnownBits information, see TypeInt / TypeLong definitions.
> 
> Feel free to adjust the formulation :)

It goes along the lines of what @SirYwell proposed.

> test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java line 114:
> 
>> 112:         }
>> 113:         return 1;
>> 114:     }
> 
> Thanks for the tests!
> 
> I think it would be quite valuable to have some tests that do not just clamp the range, but also create random `KnownBits`, i.e. with random and/or masks.
> 
> For example:
> `num = (num | ONES) & ZEROS;`
> 
> And then you generate `ONES` and `ZEROS` randomly, maybe even using `Generators`?
> Then round it off with some random range comparisons at the end:
> `if (Integer.bitCount(num) >= CON1 && Integer.bitCount(num) <= CON2) {`

Also: how many popcount instructions are left? Should there not be at most one?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2333112033
PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2333226941
PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2333218115

From epeter at openjdk.org  Tue Sep  9 12:00:02 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 12:00:02 GMT
Subject: RFR: 8361702: C2: assert(is_dominator(compute_early_ctrl(limit,
 limit_ctrl), pre_end)) failed: node pinned on loop exit test? [v7]
In-Reply-To: <BfbDzTNQbwqUh0pXFbXiFy09JPGfbLPVmUTx-YEe1KM=.cad33852-809f-4907-a41d-628d0d0db07e@github.com>
References: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com>
 <BfbDzTNQbwqUh0pXFbXiFy09JPGfbLPVmUTx-YEe1KM=.cad33852-809f-4907-a41d-628d0d0db07e@github.com>
Message-ID: <H3f75OPDtejeLKc3v8aFN8r-Zkry8odU6FAagWZfOc0=.245fae74-22c7-4469-93a6-29f1c5686688@github.com>

On Tue, 9 Sep 2025 08:39:37 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> A node in a pre loop only has uses out of the loop dominated by the
>> loop exit. `PhaseIdealLoop::try_sink_out_of_loop()` sets its control
>> to the loop exit projection. A range check in the main loop has this
>> node as input (through a chain of some other nodes). Range check
>> elimination needs to update the exit condition of the pre loop with an
>> expression that depends on the node pinned on its exit: that's
>> impossible and the assert fires. This is a variant of 8314024 (this
>> one was for a node with uses out of the pre loop on multiple paths). I
>> propose the same fix: leave the node with control in the pre loop in
>> this case.
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8361702
>  - Merge branch 'master' into JDK-8361702
>  - review
>  - Merge branch 'master' into JDK-8361702
>  - Update src/hotspot/share/opto/loopopts.cpp
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE3.java
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - Update src/hotspot/share/opto/loopopts.cpp
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - tests
>  - ... and 1 more: https://git.openjdk.org/jdk/compare/2676c5f4...91a7d73c

src/hotspot/share/opto/loopopts.cpp line 1936:

> 1934: // Sinking a node from a pre loop to its main loop pins the node between the pre and main loops. If that node is input
> 1935: // to a check that's eliminated by range check elimination, it becomes input to an expression that feeds into the exit
> 1936: // test of the pre loop above the point in the graph where it's pinned.

I guess the alternative would have been not to do that RC elimination, right?
If yes: you could finish the thought and say that we prefer to have a chance at RC elimination, rather than sinking the node out of the pre-loop.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26424#discussion_r2333262985

From epeter at openjdk.org  Tue Sep  9 12:18:23 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 12:18:23 GMT
Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI
 expand operation [v3]
In-Reply-To: <TuBiSPqTkozHX6ZgMqeWkjxvoZR7qZgnZQH9q85B_cs=.0a26c93f-537c-4277-ae1c-7ea2ce0dbc1e@github.com>
References: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
 <TuBiSPqTkozHX6ZgMqeWkjxvoZR7qZgnZQH9q85B_cs=.0a26c93f-537c-4277-ae1c-7ea2ce0dbc1e@github.com>
Message-ID: <aDEkWw3JsCtNwz7lzAp7QGzXb80naqBJYN6wY0Het-k=.4a25187d-64ec-48a5-a245-ca02fc9720bf@github.com>

On Fri, 5 Sep 2025 06:30:34 GMT, erifan <duke at openjdk.org> wrote:

>> Currently, on AArch64, the VectorAPI `expand` operation is intrinsified for 32-bit and 64-bit types only when SVE2 is available. In the following cases, `expand` has not yet been intrinsified:
>> 1. **Subword types** on SVE2-capable hardware.
>> 2. **All types** on NEON and SVE1 environments.
>> 
>> As a result, `expand` API performance is very poor in these scenarios. This patch intrinsifies the `expand` operation in the above environments.
>> 
>> Since there are no native instructions directly corresponding to `expand` in these cases, this patch mainly leverages the `TBL` instruction to implement `expand`. To compute the index input for `TBL`, the prefix sum algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used. Take a 128-bit byte vector on SVE2 as an example:
>> 
>> To compute: dst = src.expand(mask)
>> Data direction: high <== low
>> Input:
>>   src                         = p o n m l k j i h g f e d c b a
>>   mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>> Expected result:
>>   dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>> 
>> Step 1: calculate the index input of the TBL instruction.
>> 
>> // Set tmp1 as all 0 vector.
>> tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 
>> // Move the mask bits from the predicate register to a vector register.
>> // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
>> tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>> 
>> // Shift the entire register. Prefix sum algorithm.
>> dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
>> tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
>> 
>> dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
>> tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
>> 
>> dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
>> tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
>> 
>> dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
>> tmp2 += dst                   = 8 8 8 7 6 6 6 5 4 4 4 3 2 2 2 1
>> 
>> // Clear inactive elements.
>> dst = sel(mask, tmp2, tmp1)   = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1
>> 
>> // Set the inactive lane value to -1 and set the active lane to the target index.
>> dst -= 1                      = -1 -1 7 6 -1 -1 5 4 -1 -1 3 2 -1 -1 1 0
>> 
>> Step 2: shuffle the source vector elements to the target vector
>> 
>> tbl(dst, src, dst)            = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>> 
>> 
>> The same algorithm is used for NEON and...
>
> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
> 
>  - Align code example data for better reading
>  - Merge branch 'master' into JDK-8363989
>  - Improve the comment of the vector expand implementation
>  - Merge branch 'master' into JDK-8363989
>  - 8363989: AArch64: Add missing backend support of VectorAPI expand operation
>    
>    Currently, on AArch64, the VectorAPI `expand` operation is intrinsified
>    for 32-bit and 64-bit types only when SVE2 is available. In the following
>    cases, `expand` has not yet been intrinsified:
>    1. **Subword types** on SVE2-capable hardware.
>    2. **All types** on NEON and SVE1 environments.
>    
>    As a result, `expand` API performance is very poor in these scenarios.
>    This patch intrinsifies the `expand` operation in the above environments.
>    
>    Since there are no native instructions directly corresponding to `expand`
>    in these cases, this patch mainly leverages the `TBL` instruction to
>    implement `expand`. To compute the index input for `TBL`, the prefix sum
>    algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used.
>    Take a 128-bit byte vector on SVE2 as an example:
>    ```
>    To compute: dst = src.expand(mask)
>    Data direction: high <== low
>    Input:
>      src                         = p o n m l k j i h g f e d c b a
>      mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>    Expected result:
>      dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>    ```
>    Step 1: calculate the index input of the TBL instruction.
>    ```
>    // Set tmp1 as all 0 vector.
>    tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>    
>    // Move the mask bits from the predicate register to a vector register.
>    // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
>    tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>    
>    // Shift the entire register. Prefix sum algorithm.
>    dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
>    tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
>    
>    dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
>    tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
>    
>    dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
>    tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2...

Thanks for the updates! The patch looks good to me now.
I'll run some testing now, should take about 24h :)
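Editor's note: the quoted description computes the `TBL` indices with an in-register prefix sum. The scalar Java model below walks through the same two steps so the index arithmetic can be checked by hand. It is a hedged sketch for illustration only. `ExpandSketch`, `tblIndices` and `tbl` are hypothetical names, and the whole-register shifts by 8/16/32/64 bits are modeled as shifts by 1/2/4/8 lanes on plain arrays. It does not reproduce the patch's AArch64 assembly.

```java
import java.util.Arrays;

// Scalar model of a TBL-based vector expand (hypothetical, not HotSpot code).
// Lane 0 is the lowest lane, i.e. the rightmost element in the quoted diagrams.
public class ExpandSketch {

    // Step 1: prefix sum of the 0/1 mask using whole-register lane shifts
    // by 1, 2, 4, 8 lanes (the "<< 8, 16, 32, 64" bit shifts for bytes).
    static int[] tblIndices(boolean[] mask) {
        int n = mask.length;                    // assumed to be a power of two
        int[] sum = new int[n];
        for (int i = 0; i < n; i++) {
            sum[i] = mask[i] ? 1 : 0;
        }
        for (int shift = 1; shift < n; shift <<= 1) {
            int[] shifted = new int[n];         // moved towards higher lanes, zero-filled
            for (int i = shift; i < n; i++) {
                shifted[i] = sum[i - shift];
            }
            for (int i = 0; i < n; i++) {
                sum[i] += shifted[i];
            }
        }
        // Clear inactive lanes, then subtract 1: active lanes get their source
        // index, inactive lanes get -1, which is out of range for TBL.
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = (mask[i] ? sum[i] : 0) - 1;
        }
        return idx;
    }

    // Step 2: TBL semantics, an out-of-range index yields a zeroed lane
    // (shown as '0' to match the diagrams).
    static char[] tbl(char[] table, int[] idx) {
        char[] out = new char[idx.length];
        for (int i = 0; i < idx.length; i++) {
            int k = idx[i];
            out[i] = (k >= 0 && k < table.length) ? table[k] : '0';
        }
        return out;
    }

    static char[] expand(char[] src, boolean[] mask) {
        return tbl(src, tblIndices(mask));
    }

    public static void main(String[] args) {
        char[] src = new char[16];
        boolean[] mask = new boolean[16];
        for (int i = 0; i < 16; i++) {
            src[i] = (char) ('a' + i);          // lane 0 holds 'a'
            mask[i] = (i % 4) < 2;              // "0 0 1 1" read from high to low
        }
        System.out.println(Arrays.toString(tblIndices(mask)));
        System.out.println(new String(expand(src, mask)));
    }
}
```

Running `main` prints `[0, 1, -1, -1, 2, 3, -1, -1, 4, 5, -1, -1, 6, 7, -1, -1]` and `ab00cd00ef00gh00`, which are the quoted index vector and expected `dst` read from lane 0 upwards.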

-------------

PR Review: https://git.openjdk.org/jdk/pull/26740#pullrequestreview-3201222772

From epeter at openjdk.org  Tue Sep  9 13:16:59 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 13:16:59 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare
 [v11]
In-Reply-To: <HKiejePdRHy-xJNBBNnw09SHkkOpY0EWaVIvS-xg36E=.55c2c30d-f84b-4ca0-a4a8-a25ffbd31236@github.com>
References: <mWHKJozX3YTvBOjLDUa1_2QQiNm6xVc2rb1FtXcZzPc=.e2405ec4-09ed-46c1-acd5-3f39083fce3f@github.com>
 <HKiejePdRHy-xJNBBNnw09SHkkOpY0EWaVIvS-xg36E=.55c2c30d-f84b-4ca0-a4a8-a25ffbd31236@github.com>
Message-ID: <Dy9rqrrgUAsownC_lhp5729sObjNAXlCQs6RwOZusCQ=.12827ffc-d4c8-4a98-a309-653af5a97519@github.com>

On Wed, 9 Jul 2025 06:08:33 GMT, erifan <duke at openjdk.org> wrote:

>> This patch optimizes the following patterns:
>> For integer types:
>> 
>> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> 
>> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt; ncond is the negation of cond.
>> 
>> For float and double types:
>> 
>> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> 
>> cond can be eq or ne.
>> 
>> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`:
>> 
>> Benchmark			Unit	Before		Score Error	After		Score Error	Uplift
>> testCompareEQMaskNotByte	ops/s	7912127.225	2677.289518	10266136.26	8955.008548	1.29
>> testCompareEQMaskNotDouble	ops/s	884737.6799	446.963779	1179760.772	448.031844	1.33
>> testCompareEQMaskNotFloat	ops/s	1765045.787	682.332214	2359520.803	896.305743	1.33
>> testCompareEQMaskNotInt		ops/s	1787221.411	977.743935	2353952.519	960.069976	1.31
>> testCompareEQMaskNotLong	ops/s	895297.1974	673.44808	1178449.02	323.804205	1.31
>> testCompareEQMaskNotShort	ops/s	3339987.002	3415.2226	4712761.965	2110.862053	1.41
>> testCompareGEMaskNotByte	ops/s	7907615.16	4094.243652	10251646.9	9486.699831	1.29
>> testCompareGEMaskNotInt		ops/s	1683738.958	4233.813092	2352855.205	1251.952546	1.39
>> testCompareGEMaskNotLong	ops/s	854496.1561	8594.598885	1177811.493	521.1229	1.37
>> testCompareGEMaskNotShort	ops/s	3341860.309	1578.975338	4714008.434	1681.10365	1.41
>> testCompareGTMaskNotByte	ops/s	7910823.674	2993.367032	10245063.58	9774.75138	1.29
>> testCompareGTMaskNotInt		ops/s	1673393.928	3153.099431	2353654.521	1190.848583	1.4
>> testCompareGTMaskNotLong	ops/s	849405.9159	2432.858159	1177952.041	359.96413	1.38
>> testCompareGTMaskNotShort	ops/s	3339509.141	3339.976585	4711442.496	2673.364893	1.41
>> testCompareLEMaskNotByte	ops/s	7911340.004	3114.69191	10231626.5	27134.20035	1.29
>> testCompareLEMaskNotInt		ops/s	1675812.113	1340.969885	2353255.341	1452.4522	1.4
>> testCompareLEMaskNotLong	ops/s	848862.8036	6564.841731	1177763.623	539.290106	1.38
>> testCompareLEMaskNotShort	ops/s	3324951.54	2380.29473	4712116.251	1544.559684	1.41
>> testCompareLTMaskNotByte	ops/s	7910390.844	2630.861436	10239567.69	6487.441672	1.29
>> testCompareLTMaskNotInt		ops/s	16721...
>
> erifan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update the code comment
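Editor's note: the transformation relies on the identity `not(compare(a, b, cond)) == compare(a, b, ncond)`. The quoted description limits float/double to `eq` and `ne`. The usual reason is NaN: every ordered comparison involving NaN is false, so `!(a < b)` is not `a >= b`. The scalar Java check below illustrates this. It is a hedged sketch that uses plain Java operators, not the Vector API's operator definitions. `NegatedCompareSketch`, `Cond` and the helper methods are hypothetical names.

```java
// Scalar check of not(cmp(a, b, cond)) == cmp(a, b, ncond) (hypothetical illustration).
public class NegatedCompareSketch {

    enum Cond { EQ, NE, LT, GE, GT, LE, ULT, UGE, UGT, ULE }

    // ncond: the negated condition.
    static Cond negate(Cond c) {
        return switch (c) {
            case EQ -> Cond.NE;   case NE -> Cond.EQ;
            case LT -> Cond.GE;   case GE -> Cond.LT;
            case GT -> Cond.LE;   case LE -> Cond.GT;
            case ULT -> Cond.UGE; case UGE -> Cond.ULT;
            case UGT -> Cond.ULE; case ULE -> Cond.UGT;
        };
    }

    static boolean cmpInt(int a, int b, Cond c) {
        return switch (c) {
            case EQ -> a == b;
            case NE -> a != b;
            case LT -> a < b;
            case GE -> a >= b;
            case GT -> a > b;
            case LE -> a <= b;
            case ULT -> Integer.compareUnsigned(a, b) < 0;
            case UGE -> Integer.compareUnsigned(a, b) >= 0;
            case UGT -> Integer.compareUnsigned(a, b) > 0;
            case ULE -> Integer.compareUnsigned(a, b) <= 0;
        };
    }

    static boolean cmpFloat(float a, float b, Cond c) {
        return switch (c) {
            case EQ -> a == b;
            case NE -> a != b;    // Java's != is exactly !(a == b), even for NaN
            case LT -> a < b;
            case GE -> a >= b;
            case GT -> a > b;
            case LE -> a <= b;
            default -> throw new IllegalArgumentException("unsigned compare on float: " + c);
        };
    }

    public static void main(String[] args) {
        int[] ints = {Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE};
        for (int a : ints) {
            for (int b : ints) {
                for (Cond c : Cond.values()) {
                    if (!cmpInt(a, b, c) != cmpInt(a, b, negate(c))) {
                        throw new AssertionError(a + " " + c + " " + b);
                    }
                }
            }
        }
        float nan = Float.NaN;
        System.out.println(!cmpFloat(nan, 1f, Cond.LT) + " vs " + cmpFloat(nan, 1f, Cond.GE)); // true vs false
        System.out.println(!cmpFloat(nan, 1f, Cond.EQ) + " vs " + cmpFloat(nan, 1f, Cond.NE)); // true vs true
    }
}
```

The integer loop finds no counterexample for any of the ten conditions. For floats, only the `EQ`/`NE` pair survives NaN inputs.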

Looks much better, thanks for the updates!

I have another small list of suggestions :)

src/hotspot/share/opto/vectornode.cpp line 2243:

> 2241:   if (in1->Opcode() != Op_VectorMaskCmp ||
> 2242:       in1->outcnt() != 1 ||
> 2243:       !(in1->as_VectorMaskCmp())->predicate_can_be_negated() ||

Suggestion:

      !in1->as_VectorMaskCmp()->predicate_can_be_negated() ||

The parentheses are unnecessary and only make it harder to read.

src/hotspot/share/opto/vectornode.cpp line 2277:

> 2275:     res = VectorNode::Ideal(phase, can_reshape);
> 2276:   }
> 2277:   return res;

What if someone comes and wants to add yet another optimization before `VectorNode::Ideal`? Your code layout would give us deeper and deeper nesting. I suggest flattening it like this:
Suggestion:


  Node* res = Ideal_XorV_VectorMaskCmp(phase, can_reshape);
  if (res != nullptr) { return res; }

  return VectorNode::Ideal(phase, can_reshape);

test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 911:

> 909:         testCompareMaskNotLong(L_SPECIES_FOR_CAST, VectorOperators.UGE, (m) -> { return m.cast(I_SPECIES_FOR_CAST).not(); });
> 910:         verifyResultsLong(L_SPECIES_FOR_CAST, VectorOperators.UGE);
> 911:     }

You have some casts in here, and in similar tests.
Can you add an IR rule to check if we do or do not have the expected casts?

test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 1007:

> 1005:         testCompareMaskNotFloat(F_SPECIES, VectorOperators.NE, fa, fninf, (m) -> { return F_SPECIES.maskAll(true).xor(m); });
> 1006:         verifyResultsFloat(F_SPECIES, VectorOperators.NE, fa, fninf);
> 1007:     }

Do you have test cases for the cases other than `EQ` and `NE`? After all, we don't want someone to accidentally mess with the logic you implemented later without us noticing the bug ;)

test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java line 351:

> 349:     public void testCompareULEMaskNotLong() {
> 350:         testCompareMaskNotLong(VectorOperators.ULE);
> 351:     }

You could consider making the operator a `@Param` next time.

There are multiple tricks to do that:
- `test/micro/org/openjdk/bench/vm/compiler/VectorStoreToLoadForwarding.java` using `MethodHandles.constant`
- Some inner class that has a static final, which is initialized from the non-final `@Param` value.
- Probably even `StableValue` would work, but I have not yet experimented with it.

It would be nice if we could do the same with the primitive types, but that's probably not going to work as easily.

Really just an idea for next time.
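Editor's note: the second trick listed above (an inner class with a `static final` initialized from the non-final `@Param` value) can be sketched without JMH as follows. This is a hedged illustration, not code from `MaskCompareNotBenchmark.java`. The JMH annotations are omitted, and `ConstantOperatorSketch`, `Op`, `opName` and `Holder` are hypothetical names. It assumes that each `@Param` value runs in its own forked JVM and that the field is set (e.g. in a `@Setup` method) before `Holder` is first touched.

```java
// Turning a runtime parameter into a static final constant via a lazily
// initialized holder class (hypothetical names, JMH annotations omitted).
public class ConstantOperatorSketch {

    enum Op {
        LT { boolean test(int a, int b) { return a < b; } },
        GE { boolean test(int a, int b) { return a >= b; } };

        abstract boolean test(int a, int b);
    }

    // Stand-in for a non-final @Param field; must be set before Holder is first used.
    static String opName = "LT";

    // Holder is initialized on first access, so OP captures opName at that point.
    // HotSpot's JIT typically treats static final fields of initialized classes as
    // constants, so the call through OP can be devirtualized.
    static final class Holder {
        static final Op OP = Op.valueOf(opName);
    }

    static int countMatches(int[] a, int[] b) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            if (Holder.OP.test(a[i], b[i])) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        opName = args.length > 0 ? args[0] : "LT"; // e.g. done in a @Setup method
        int[] a = {1, 5, 3, 7};
        int[] b = {2, 4, 3, 8};
        System.out.println(Holder.OP + ": " + countMatches(a, b));
    }
}
```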

test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java line 366:

> 364:     public void testCompareNEMaskNotFloat() {
> 365:         testCompareMaskNotFloat(VectorOperators.NE);
> 366:     }

You could still add the other comparisons as well, so we can see the performance difference. Very optional, feel free to ignore this suggestion.

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24674#pullrequestreview-3201347660
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2333480061
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2333418237
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2333510278
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2333503735
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2333545924
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2333516350

From epeter at openjdk.org  Tue Sep  9 13:28:22 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 13:28:22 GMT
Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v7]
In-Reply-To: <UVFPHEq4IAzIGZV0N9oMApm-KTc8fWWJxFlL5nVZfxc=.a31180a8-c8a5-4036-a99e-fa4fa38e6a08@github.com>
References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
 <UVFPHEq4IAzIGZV0N9oMApm-KTc8fWWJxFlL5nVZfxc=.a31180a8-c8a5-4036-a99e-fa4fa38e6a08@github.com>
Message-ID: <mGYtADl5GS7d6lPkAhsqEzEZuA6apXpUfcyyFzb5r08=.c968252b-7f8a-41d0-acbd-6da43db08bf5@github.com>

On Tue, 26 Aug 2025 12:46:31 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> This change improves the precision of the `Mod(I|L)Node::Value()` functions.
>> 
>> I reordered the structure a bit. First, we handle constants, afterwards, we handle ranges. The bottom checks seem to be excessive (`Type::BOTTOM` is covered by using `isa_(int|long)()`, the local bottom is just the full range). Given we can even give reasonable bounds if only one input has any bounds, we don't want to return early.
>> The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions.
>> 
>> ### Monotonicity
>> 
>> Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. I now use `Type(Int|Long)::ZERO` instead, which avoids the problem because zero is always in the resulting value once we cover a range.
>> 
>> ### Testing
>> 
>> I added tests for cases around the relevant bounds. I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something).
>> 
>> Please review and let me know what you think.
>> 
>> ### Other
>> 
>> The `UMod(I|L)Node`s were adjusted to be more in line with their signed variants. This change diverges them again, but similar improvements could be made after #17508.
>> 
>> While experimenting with these changes, I stumbled upon a few things that aren't directly related to this change, but might be worth looking into further:
>> - If the divisor is a constant, we directly replace the `Mod(I|L)Node` with more, but less expensive, nodes in `::Ideal()`. The combined type analysis of these nodes is less precise, which means we miss potential cases where this would help, e.g., removing range checks. Would it make sense to delay the replacement?
>> - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change in https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd.
>
> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
> 
>   review

Thanks for filing the issue! I left some comments there. We could delay div/mod by constants to after loop opts. And we could even optimize div/mod in loops that have loop-invariant divisor ;)
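Editor's note: the monotonicity argument in the quoted description can be made concrete with a simplified interval bound for Java's `int` remainder. This is a hedged sketch and not the patch's `Mod(I|L)Node::Value` logic. `ModRangeSketch` and `modRange` are hypothetical. The sketch uses only language facts: the result of `%` has the sign of the dividend, its magnitude is smaller than the divisor's, and it is never larger than the dividend's. A pure 0 divisor answers `[0, 0]`, mirroring the `ZERO` choice, so that seeing 3 afterwards only widens the range.

```java
import java.util.Arrays;

// Simplified bounds for x % d over int ranges (illustration only).
public class ModRangeSketch {

    // Returns {lo, hi} such that x % d lies in [lo, hi] for every x in [xlo, xhi]
    // and every non-zero d in [dlo, dhi].
    static int[] modRange(int xlo, int xhi, int dlo, int dhi) {
        // Largest divisor magnitude, in long so that |Integer.MIN_VALUE| cannot overflow.
        long maxAbsDivisor = Math.max(Math.abs((long) dlo), Math.abs((long) dhi));
        if (maxAbsDivisor == 0) {
            // The divisor is exactly 0: x % d always throws, so no value flows out.
            // Answering [0, 0] (the analogue of ZERO) keeps later widening monotone.
            return new int[] {0, 0};
        }
        long limit = maxAbsDivisor - 1;                 // |x % d| <= |d| - 1
        // The result has the sign of the dividend, and |x % d| <= |x|.
        long hi = xhi > 0 ? Math.min(xhi, limit) : 0;
        long lo = xlo < 0 ? Math.max(xlo, -limit) : 0;
        return new int[] {(int) lo, (int) hi};
    }

    public static void main(String[] args) {
        // A divisor of 0 first, then 3: [0, 0] widens to [-2, 2].
        System.out.println(Arrays.toString(modRange(Integer.MIN_VALUE, Integer.MAX_VALUE, 0, 0)));
        System.out.println(Arrays.toString(modRange(Integer.MIN_VALUE, Integer.MAX_VALUE, 3, 3)));

        // Brute-force soundness check on small ranges.
        for (int xlo = -6; xlo <= 6; xlo++) {
            for (int xhi = xlo; xhi <= 6; xhi++) {
                for (int dlo = -4; dlo <= 4; dlo++) {
                    for (int dhi = dlo; dhi <= 4; dhi++) {
                        int[] r = modRange(xlo, xhi, dlo, dhi);
                        for (int x = xlo; x <= xhi; x++) {
                            for (int d = dlo; d <= dhi; d++) {
                                if (d != 0 && (x % d < r[0] || x % d > r[1])) {
                                    throw new AssertionError(x + " % " + d + " not in " + Arrays.toString(r));
                                }
                            }
                        }
                    }
                }
            }
        }
        System.out.println("bounds hold");
    }
}
```

Running `main` prints `[0, 0]`, `[-2, 2]` and `bounds hold`. With `POS` (`>=0`) as the answer for the 0 divisor, the second result would not contain the first, which is the non-monotone step described above.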

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25254#issuecomment-3270740800

From epeter at openjdk.org  Tue Sep  9 13:35:27 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 13:35:27 GMT
Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v7]
In-Reply-To: <qvXeqU-AMI1hIL6NtQ92h-Z24x41RIkGG67wcZP6m-8=.df359e17-9726-4cd2-ae95-874099f65b76@github.com>
References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
 <UVFPHEq4IAzIGZV0N9oMApm-KTc8fWWJxFlL5nVZfxc=.a31180a8-c8a5-4036-a99e-fa4fa38e6a08@github.com>
 <qvXeqU-AMI1hIL6NtQ92h-Z24x41RIkGG67wcZP6m-8=.df359e17-9726-4cd2-ae95-874099f65b76@github.com>
Message-ID: <sTuwx9ScBVZSKkAtLSO2IbRAI3v2TmSbMIvd-4uQzdY=.fc7d5549-9f32-430a-bcb1-28e9671183c4@github.com>

On Wed, 3 Sep 2025 15:20:31 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   review
>
> I also filed https://bugs.openjdk.org/browse/JDK-8366815 now regarding the early transformation of div/mod by constants.

@SirYwell The changes look good to me, thanks for working on this!

I'll now run some internal testing, before approving. Please ping me again in 24h if I don't report back by then :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25254#issuecomment-3270770381

From epeter at openjdk.org  Tue Sep  9 13:37:27 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 13:37:27 GMT
Subject: RFR: 8366588: VectorAPI: Re-intrinsify VectorMask.laneIsSet where
 the input index is a variable
In-Reply-To: <XaokWZpc-AL2hWWGvsmDczO1u4o5uGEaCSgPjSyxkz4=.3251a974-6374-48c8-a8e4-88914b730505@github.com>
References: <XaokWZpc-AL2hWWGvsmDczO1u4o5uGEaCSgPjSyxkz4=.3251a974-6374-48c8-a8e4-88914b730505@github.com>
Message-ID: <yxmXgfFE3PWbZguaTWEPrG17ol6Gx7tGPZIZr3SmOdg=.c48ae7c7-4ed1-47d5-8ebc-9f97f34eabaa@github.com>

On Fri, 5 Sep 2025 08:13:28 GMT, erifan <duke at openjdk.org> wrote:

> Intrinsic support for `VectorMask.laneIsSet` with a **variable** input index was introduced in PR #14200, but was inadvertently broken by PR #25673. This PR restores the intrinsic functionality and adds some JTReg tests.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE:
> 
> Benchmark                        Unit    Before       Score Error  After        Score Error  Uplift
> microMaskLaneIsSetByte128_var    ops/ms  21702.14415  91.902159    103472.9391  36.057447    4.767867
> microMaskLaneIsSetByte64_var     ops/ms  21468.51868  107.94177    103365.6561  69.47736     4.814754
> microMaskLaneIsSetDouble128_var  ops/ms  77489.32791  153.242699   413499.4127  311.854079   5.336211
> microMaskLaneIsSetFloat128_var   ops/ms  41034.95204  399.421823   206840.0988  74.702234    5.040583
> microMaskLaneIsSetFloat64_var    ops/ms  77607.40268  175.938921   413745.3001  149.716794   5.33126
> microMaskLaneIsSetInt128_var     ops/ms  41452.48893  76.143208    206845.9754  59.371129    4.989953
> microMaskLaneIsSetInt64_var      ops/ms  77726.2542   173.180518   413427.8838  363.575023   5.319024
> microMaskLaneIsSetLong128_var    ops/ms  77646.11218  177.496587   413403.4404  236.609314   5.3242
> microMaskLaneIsSetShort128_var   ops/ms  21374.93265  48.13101     103417.4618  34.827021    4.838259
> microMaskLaneIsSetShort64_var    ops/ms  41066.19395  353.320621   206801.109   106.408938   5.035799
> 
> 
> Benchmarks on Intel 6444y machine with 512-bit avx3:
> 
> Benchmark                        Unit    Before       Score Error  After        Score Error  Uplift
> microMaskLaneIsSetByte128_var    ops/ms  57658.45497  240.209309   211643.8406  29.214532    3.670647
> microMaskLaneIsSetByte256_var    ops/ms  57451.68169  116.994128   211609.4652  160.48513    3.683259
> microMaskLaneIsSetByte512_var    ops/ms  57530.22411  311.63868    199802.8084  408.144015   3.473005
> microMaskLaneIsSetByte64_var     ops/ms  57642.2672   161.406221   205252.4464  196.86852    3.560797
> microMaskLaneIsSetDouble256_var  ops/ms  114401.3789  231.797375   361400.344   565.593984   3.159055
> microMaskLaneIsSetDouble512_var  ops/ms  57379.27882  159.699503   211476.1138  136.980026   3.685583
> microMaskLaneIsSetFloat128_var   ops/ms  113943.9512  141.062663   360855.3915  494.471996   3.166955
> microMaskLaneIsSetFloat256_var   ops/ms  57682.78182  138.142053   211659.5098  30.167972    3.66937
> microMaskLaneIsSetFloat512_var   ops/ms  57617.66405  301.748599   211246.8588  597.18949    3.666355
> microMaskLaneIsSetInt128_var     ops/ms  113914.5062  118.681382   360856.4465  555.097397   3.167783
> microMaskLaneIsSetInt256_var     ops/ms  57681.79883  112.391639   211555.6742  217.556981   3.667633
> microMaskLaneIsSetInt512_var     ops/ms  57350.20346  206.146723   211657.7207  68.461571    3.690618
> microMaskLane...

Patch looks good to me, testing passed :)

Thanks for working on this @erifan !

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27113#pullrequestreview-3201642369

From mchevalier at openjdk.org  Tue Sep  9 13:40:41 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 9 Sep 2025 13:40:41 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
Message-ID: <hXYBll0HHr4yM2PH-u5LOBetaZDSq_-5CMqExRT2jDA=.eb630964-ec0f-41df-aa84-60c922731b0a@github.com>

On Mon, 8 Sep 2025 15:27:36 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   One more ResourceMark
>
> src/hotspot/share/opto/graphInvariants.cpp line 116:
> 
>> 114: private:
>> 115:   const N*& _binding;
>> 116: };
> 
> Would it not make sense to move it a bit closer to the related code? Do you need it much before `NodeClassIsAndBind`?

`TypedBind` is like `Bind`: they both match the same nodes as `TruePattern` just before. I think keeping them grouped makes more sense than splitting `Bind` (whose comment refers to `TruePattern`) and `TypedBind`. It would make more sense to hoist `NodeClass` here, even if their relation is rather light: they are used in the same macro.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2333658743

From mbaesken at openjdk.org  Tue Sep  9 14:04:57 2025
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Tue, 9 Sep 2025 14:04:57 GMT
Subject: RFR: 8366775: TestCompileTaskTimeout should use timeoutFactor
In-Reply-To: <J7_KNMZeW20dQGNYUpDSZ7oDlXsr5dC_lyZzt1vrl-U=.882667fd-1064-4980-87ea-ee6bbb747d29@github.com>
References: <J7_KNMZeW20dQGNYUpDSZ7oDlXsr5dC_lyZzt1vrl-U=.882667fd-1064-4980-87ea-ee6bbb747d29@github.com>
Message-ID: <FAbshcoZgQbyZL1hY00zT0716kDfRxQ8LINQOuQzjo4=.f3ad54a6-3d07-4713-88fa-607e1b702f1c@github.com>

On Thu, 4 Sep 2025 13:26:22 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

> `TestCompileTaskTimeout.java` employs a timeout to test that methods are compiled faster than a specified `CompileTaskTimeout`. However, it does not make use of the jtreg timeout factor, which led to #26963 increasing the timeout to 2 s. This PR remedies this by using the timeout factor and reducing the default timeout to 500 ms.
> 
> Testing:
>  - [x] Github Actions
>  - [x] tier1, tier2 linux-x64-debug, linux-x64, linux-aarch64-debug, linux-aarch64

Marked as reviewed by mbaesken (Reviewer).

Looks good, the adjustments seem to work for us.
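Editor's note: the fix described above scales the compile-task timeout by jtreg's timeout factor. A minimal sketch of that idea follows. It is not the code from PR 27094. It assumes jtreg exposes the factor as the `test.timeout.factor` system property (the property the JDK test library reads), and `TimeoutFactorSketch` and `scaledTimeoutMs` are hypothetical names. The 500 ms base value is taken from the description.

```java
// Scale a base timeout by jtreg's timeout factor (hypothetical helper).
public class TimeoutFactorSketch {
    static final long BASE_TIMEOUT_MS = 500;   // default timeout from the PR description

    // factorProperty is the raw value of "test.timeout.factor", or null if unset.
    static long scaledTimeoutMs(String factorProperty) {
        double factor = factorProperty == null ? 1.0 : Double.parseDouble(factorProperty);
        return (long) Math.ceil(BASE_TIMEOUT_MS * factor);
    }

    public static void main(String[] args) {
        long timeout = scaledTimeoutMs(System.getProperty("test.timeout.factor"));
        System.out.println("scaled CompileTaskTimeout: " + timeout + " ms");
    }
}
```

Rounding up with `Math.ceil` keeps a fractional factor from shortening the timeout below the scaled value.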

-------------

PR Review: https://git.openjdk.org/jdk/pull/27094#pullrequestreview-3201794914
PR Comment: https://git.openjdk.org/jdk/pull/27094#issuecomment-3270887814

From epeter at openjdk.org  Tue Sep  9 14:07:19 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 14:07:19 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <hXYBll0HHr4yM2PH-u5LOBetaZDSq_-5CMqExRT2jDA=.eb630964-ec0f-41df-aa84-60c922731b0a@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
 <hXYBll0HHr4yM2PH-u5LOBetaZDSq_-5CMqExRT2jDA=.eb630964-ec0f-41df-aa84-60c922731b0a@github.com>
Message-ID: <19sn8mAJlmJgsBYmEyI-9PfMbDDbUiQrpxrkrVb9Q4M=.4114119a-d86a-4488-9249-884652601972@github.com>

On Tue, 9 Sep 2025 13:38:00 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> src/hotspot/share/opto/graphInvariants.cpp line 116:
>> 
>>> 114: private:
>>> 115:   const N*& _binding;
>>> 116: };
>> 
>> Would it not make sense to move it a bit closer to the related code? Do you need it much before `NodeClassIsAndBind`?
>
> `TypedBind` is like `Bind`: they both match the same nodes as `TruePattern` just before. I think keeping them grouped makes more sense than splitting `Bind` (whose comment refers to `TruePattern`) and `TypedBind`. It would make more sense to hoist `NodeClass` here, even if their relation is rather light: they are used in the same macro.

Up to you :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2333768687

From mchevalier at openjdk.org  Tue Sep  9 14:10:35 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 9 Sep 2025 14:10:35 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
Message-ID: <0y1rlPAYPA8mfhOM22UtuR96ztkUjDwFDzEntzK2_ag=.360369c4-f0cb-4dfe-9773-530482a9c551@github.com>

On Mon, 8 Sep 2025 15:47:44 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> having the printed statements

which statements?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2333779354

From epeter at openjdk.org  Tue Sep  9 14:14:46 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 14:14:46 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <0y1rlPAYPA8mfhOM22UtuR96ztkUjDwFDzEntzK2_ag=.360369c4-f0cb-4dfe-9773-530482a9c551@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
 <0y1rlPAYPA8mfhOM22UtuR96ztkUjDwFDzEntzK2_ag=.360369c4-f0cb-4dfe-9773-530482a9c551@github.com>
Message-ID: <dPmhWNmOaMRmoe2Lw7b9Ho5tqB4Xgl0MNNqGHRemmwk=.7bae1c11-0a06-4d9a-b68a-47726420e69c@github.com>

On Tue, 9 Sep 2025 14:08:04 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> src/hotspot/share/opto/graphInvariants.cpp line 211:
>> 
>>> 209:   AtInput(uint which_input, const Pattern* pattern) : _which_input(which_input), _pattern(pattern) {}
>>> 210:   bool check(const Node* center, Node_List& steps, GrowableArray<int>& path, stringStream& ss) const override {
>>> 211:     assert(_which_input < center->req(), "Input number is out of range");
>> 
>> Hmm. Could still be nice if we did our best here, and responded nicely.
>> Just in case someone messes up the pattern, and then we get an assert here.
>> Maybe the bug is hard to reproduce, and having the printed statements would have helped a little?
>
>> having the printed statements
> 
> which statements?

I meant your error messages that you put to the `ss` :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2333793502

From mchevalier at openjdk.org  Tue Sep  9 14:42:23 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 9 Sep 2025 14:42:23 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <JlGBhH5VrDCRo0FgBh96FPz45d0mXRkyYfcqIHEDwBY=.50195674-df30-491f-b757-d7da4e44c845@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
 <q3X_EWZRIU-1SVW3m5JilUivZd2tAv8G-IeGilUZpKY=.ce081014-24ce-44f8-9cd3-776930629e97@github.com>
 <rJiYQRIbfT5zCARiBxUQHlMRb8LadmcxInoc5EslZio=.900e813b-a33e-462b-8700-74922d000cb9@github.com>
 <JlGBhH5VrDCRo0FgBh96FPz45d0mXRkyYfcqIHEDwBY=.50195674-df30-491f-b757-d7da4e44c845@github.com>
Message-ID: <f17ZnGhpT227GCOZjqY8jApl7pLmQXopair-hQzqjvg=.70c389d3-fe46-46f2-94e0-4c4080c675d6@github.com>

On Tue, 9 Sep 2025 08:22:41 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Can you add a comment, why it can be arbitrarily large?
>> Do you have an example where we have very many ctrl uses?
>
> Also: are these all supposed to be projections of a specific kind? We could also test for that. You can also add that to a future RFE.

> Can you add a comment, why it can be arbitrarily large?

Maybe I'm very wrong about what a CatchNode is, but:

try {
...
}
catch ( ... ) { ... }
catch ( ... ) { ... }
catch ( ... ) { ... }

4 outputs: 3 handlers + 1 fallthrough.

>  Also: are these all supposed to be projections of a specific kind? We could also test for that. You can also add that to a future RFE.

I'd rather do it separately. We can always check more things, but we need to draw the line and that is safe to add later.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2333873366

From mchevalier at openjdk.org  Tue Sep  9 14:50:54 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 9 Sep 2025 14:50:54 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
Message-ID: <M5djFQjWJW4UWxTT_vXvr07LMcclQULmfEIe_yM-BwQ=.d8e0dc86-0156-4d84-a12c-467b55bebdd3@github.com>

On Mon, 8 Sep 2025 16:55:38 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   One more ResourceMark
>
> src/hotspot/share/opto/graphInvariants.cpp line 528:
> 
>> 526:     if (!center->is_CountedLoop() && !center->is_LongCountedLoop()) {
>> 527:       return CheckResult::NOT_APPLICABLE;
>> 528:     }
> 
> Actually: why not apply that to `OuterStripMinedLoop` as well? Or any `BaseCountedLoop`? Are there more than these 3 cases? If there are ever more, they should probably also adhere to this backedge pattern, we'll just need an extension. But it would be nice to trip over something here if we ever do extend.

I'm going to push back on that. I'd rather keep this one about counted loops, which have more structure that is HEAVILY relied on; I haven't enumerated all of it yet, but that can be done.

One can make another check for the few things that hold for other flavors.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2333891737

From mchevalier at openjdk.org  Tue Sep  9 14:50:55 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 9 Sep 2025 14:50:55 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <CTvOJ3ySq51MIP-9Edzrguc7qn5NG4oYQC7yuSEpoqo=.e4f47579-2230-45eb-861e-d93d05011505@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
 <r0rzQ5BGSRpGVAiK5E9zAZIYkU3gTPa7KjBATdETP6U=.5bff0999-2bfd-4f6e-9044-fcfb74e5e00d@github.com>
 <4jB_I2sHD7IfzhR7ojHfsFPlvZFCOWaHf8aS0AZshj0=.d0162feb-10b8-488d-82fa-eb816ce5dda9@github.com>
 <CTvOJ3ySq51MIP-9Edzrguc7qn5NG4oYQC7yuSEpoqo=.e4f47579-2230-45eb-861e-d93d05011505@github.com>
Message-ID: <pP-j6CSUrAqLgCZUOdBtbGs52AXBI2j-WmZSxhOI4Oc=.3bb30979-d740-4acb-b452-ade3f8f9f683@github.com>

On Tue, 9 Sep 2025 08:15:26 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I don't understand.
>
> I would still consider adding `OuterStripMinedLoop` here, to capture that it has a similar structure, even if you also verify `OuterStripMinedLoop`-specific things below. Just to check that all these loop structures have the same kind of backedge shape.
> And then make a switch out of it, with a default case that fails. If we ever add yet another `Loop` shape, we would then catch it and add the logic for it.
> 
> But actually: don't all `Loop` shapes have this backedge pattern? Or are there some that have an `IfFalse` on the backedge? Because then you could also add `LoopNode` with `LoopEndNode`.

Same as before: we can extend the checks later.
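
Emanuel's suggestion to turn the loop-kind test into a switch whose default case fails can be sketched as follows. This is a minimal, self-contained model under stated assumptions: `LoopKind`, `LoopShape` and `check_backedge` are hypothetical stand-ins for C2's loop node classes and the real check in `graphInvariants.cpp`; only the `NOT_APPLICABLE` result mirrors the quoted snippet.

```cpp
#include <cassert>

// Hypothetical stand-ins for C2's loop node classes; not HotSpot types.
enum class LoopKind { Counted, LongCounted, OuterStripMined, Plain };

// Only NOT_APPLICABLE mirrors the quoted snippet; VALID/FAILED are assumed.
enum class CheckResult { VALID, NOT_APPLICABLE, FAILED };

// Simplified loop: its kind, and whether the backedge comes from a
// projection of the matching loop-end node.
struct LoopShape {
  LoopKind kind;
  bool backedge_from_matching_end;
};

CheckResult check_backedge(const LoopShape& loop) {
  switch (loop.kind) {
    case LoopKind::Counted:
    case LoopKind::LongCounted:
      // The counted-loop flavors this check is scoped to.
      return loop.backedge_from_matching_end ? CheckResult::VALID
                                             : CheckResult::FAILED;
    case LoopKind::OuterStripMined:
    case LoopKind::Plain:
      // Explicitly out of scope for now; a later check could cover them.
      return CheckResult::NOT_APPLICABLE;
    default:
      // A newly added loop kind lands here, forcing someone to decide
      // whether the backedge pattern applies to it.
      assert(false && "unhandled loop kind");
      return CheckResult::FAILED;
  }
}
```

Listing every kind explicitly makes the scoping decision visible. Dropping the `default` label instead would let `-Wswitch` (enabled by `-Wall` in GCC and Clang) turn a forgotten enumerator into a compile-time warning rather than a runtime failure.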

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2333895804

From djelinski at openjdk.org  Tue Sep  9 15:11:31 2025
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Tue, 9 Sep 2025 15:11:31 GMT
Subject: RFR: 8366971: C2: Remove unused nop_list from
 PhaseOutput::init_buffer [v2]
In-Reply-To: <RNeW9WTvNU9ySAAEihkuseHSlveeaFiThg5xtLo_Rao=.27185a05-07c3-4d6a-b626-6c56a750947a@github.com>
References: <2hBEO9Zpoy2wo_pgTXE9v8KG5u1HNdKp3RgQE-4HYcE=.e86088d1-1e25-49b5-9b3c-c2498ec6ca48@github.com>
 <RNeW9WTvNU9ySAAEihkuseHSlveeaFiThg5xtLo_Rao=.27185a05-07c3-4d6a-b626-6c56a750947a@github.com>
Message-ID: <MEJ3gQ61GBrrv0Hx7Z7rxCYWR1C-dRyTDJojfvczYmw=.6899d73f-d892-406a-a9cd-a7f1c072c687@github.com>

On Mon, 8 Sep 2025 23:15:52 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Daniel Jeliński has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:
>> 
>>  - Merge remote-tracking branch 'origin/master' into nops-cleanup
>>  - Update copyright
>>  - Remove outdated comment
>>  - Remove nop list
>
> Looks good.

Thanks @dean-long @eme64 for the review and re-review. The additional tests came back clean. Given that the merge conflict resolution did not change the diff, I'm going to integrate this now.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27117#issuecomment-3271146356

From djelinski at openjdk.org  Tue Sep  9 15:11:33 2025
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Tue, 9 Sep 2025 15:11:33 GMT
Subject: Integrated: 8366971: C2: Remove unused nop_list from
 PhaseOutput::init_buffer
In-Reply-To: <2hBEO9Zpoy2wo_pgTXE9v8KG5u1HNdKp3RgQE-4HYcE=.e86088d1-1e25-49b5-9b3c-c2498ec6ca48@github.com>
References: <2hBEO9Zpoy2wo_pgTXE9v8KG5u1HNdKp3RgQE-4HYcE=.e86088d1-1e25-49b5-9b3c-c2498ec6ca48@github.com>
Message-ID: <4nmZRAStXqQVsqqb7t9AyH_xhcplf0YCUW4nJ2nMf9E=.0b820a97-cdc7-4139-89d4-856a59ed2cef@github.com>

On Fri, 5 Sep 2025 13:02:00 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

> The nop list has never been used in the history of OpenJDK. Let's clean it up.
> 
> Tested with Mach5 tier 1-5, no related failures.

This pull request has now been integrated.

Changeset: cc6d34b2
Author:    Daniel Jeliński <djelinski at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/cc6d34b2fa299a68a05e65e25c1f41dffa67c118
Stats:     83 lines in 11 files changed: 1 ins; 77 del; 5 mod

8366971: C2: Remove unused nop_list from PhaseOutput::init_buffer

Reviewed-by: epeter, dlong

-------------

PR: https://git.openjdk.org/jdk/pull/27117

From djelinski at openjdk.org  Tue Sep  9 15:25:25 2025
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Tue, 9 Sep 2025 15:25:25 GMT
Subject: RFR: 8366984: Remove delay slot support [v2]
In-Reply-To: <msZk11XqMQy93X0WkadtY0MP-sB74iIsygq2iXqFc_I=.e1c48e06-4b7f-4201-98d0-09ec2ecab91e@github.com>
References: <msZk11XqMQy93X0WkadtY0MP-sB74iIsygq2iXqFc_I=.e1c48e06-4b7f-4201-98d0-09ec2ecab91e@github.com>
Message-ID: <HOivgTBAh1RtTqaZqvW-NHY3CTOrtYJ_9zpVz_Y9sKQ=.ee86238d-9707-4d0e-85e0-fffd686150ce@github.com>

> SPARC was the only supported architecture that used delay slots. The SPARC port was removed in JDK 15, so this code is effectively dead. Let's remove it.
> 
> The changes are a no-op on all architectures that do not use delay slots. I still ran tiers 1-5 on Mach5, with no related failures.

Daniel Jeliński has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27119/files
  - new: https://git.openjdk.org/jdk/pull/27119/files/330d5ad1..330d5ad1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27119&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27119&range=00-01

  Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/27119.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27119/head:pull/27119

PR: https://git.openjdk.org/jdk/pull/27119

From dfenacci at openjdk.org  Tue Sep  9 15:37:50 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Tue, 9 Sep 2025 15:37:50 GMT
Subject: RFR: 8360031: C2 compilation asserts in MemBarNode::remove [v4]
In-Reply-To: <njrDWUd6VvlyU-9HiTyJxDYRF4GNh586kyg5jHziRdI=.b1e51205-11f6-462b-b839-91bed0866fd7@github.com>
References: <h6lgQmZZxVXPeJl0fRWlz8J713AAhfTAw_5UJ3ZL1S4=.b4b40295-6ea8-4498-abb6-98201776b3c9@github.com>
 <jR9U9f8GNW0wPSQZD_UYRJf4hwGCCP7umyTm0bNDz4o=.ed4f44bd-57e7-4b2a-83b3-c8da05609dc4@github.com>
 <njrDWUd6VvlyU-9HiTyJxDYRF4GNh586kyg5jHziRdI=.b1e51205-11f6-462b-b839-91bed0866fd7@github.com>
Message-ID: <lPS-n5zroGHBZl-kuxJWDzaRRbHhZRj3898hXckAl6I=.755eee6b-25ad-4e39-ad75-520491883e75@github.com>

On Mon, 8 Sep 2025 23:53:49 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   JDK-8360031: add assert condition and make consume method argument escape
>
> src/hotspot/share/opto/memnode.cpp line 4232:
> 
>> 4230: 
>> 4231: void MemBarNode::remove(PhaseIterGVN *igvn) {
>> 4232:   if (outcnt() != 2) {
> 
> By itself, this allows outcnt() == 0, so maybe we need to continue to fail if that happens.

I added the condition to the assert.
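
Dean's point can be made concrete with a small model of the condition. The sketch below is hypothetical: `BarrierKind`, `Barrier` and `removal_shape_ok` are illustrative names, not HotSpot code, and the committed assert may be worded differently. It accepts the usual two outputs (control and memory), accepts a lone control output for `Initialize` and, when `UseStoreStoreForCtor` is on, for `MemBarStoreStore`, and keeps rejecting `outcnt() == 0`.

```cpp
#include <cassert>

// Hypothetical, simplified barrier model; names only mirror the discussion.
enum class BarrierKind { Initialize, MemBarStoreStore, MemBarRelease };

struct Barrier {
  BarrierKind kind;
  int outcnt;  // number of outputs (control and memory projections)
};

// Returns true if the barrier's output shape is acceptable for removal.
bool removal_shape_ok(const Barrier& b, bool use_store_store_for_ctor) {
  if (b.outcnt == 2) {
    return true;  // the normal case: control + memory
  }
  // Only the control output is left: tolerated for Initialize and, with
  // the relaxed rule, for MemBarStoreStore under UseStoreStoreForCtor.
  bool kind_ok = b.kind == BarrierKind::Initialize ||
                 (use_store_store_for_ctor &&
                  b.kind == BarrierKind::MemBarStoreStore);
  // Dean's concern: "!= 2" alone would also admit outcnt == 0, so require 1.
  return kind_ok && b.outcnt == 1;
}
```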

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26556#discussion_r2334037886

From dfenacci at openjdk.org  Tue Sep  9 15:37:48 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Tue, 9 Sep 2025 15:37:48 GMT
Subject: RFR: 8360031: C2 compilation asserts in MemBarNode::remove [v4]
In-Reply-To: <h6lgQmZZxVXPeJl0fRWlz8J713AAhfTAw_5UJ3ZL1S4=.b4b40295-6ea8-4498-abb6-98201776b3c9@github.com>
References: <h6lgQmZZxVXPeJl0fRWlz8J713AAhfTAw_5UJ3ZL1S4=.b4b40295-6ea8-4498-abb6-98201776b3c9@github.com>
Message-ID: <WWtmL3EDxwnyvgq0UCczwTKvZf2sumk3l0_9IIy1N74=.5f655235-ed97-4aeb-aa66-fd2afa50da25@github.com>

> # Issue
> While compiling `java.util.zip.ZipFile` in C2 this assert is triggered
> https://github.com/openjdk/jdk/blob/a2e86ff3c56209a14c6e9730781eecd12c81d170/src/hotspot/share/opto/memnode.cpp#L4235
> 
> # Cause
> While compiling the constructor of `java.util.zip.ZipFile$CleanableResource`, the following happens:
> * we insert a trailing `MemBarStoreStore` in the constructor
> <img height="200" alt="before_folding" src="https://github.com/user-attachments/assets/c1aab634-808d-4198-94ac-8093c6b85c5d" />
> 
> * during IGVN we completely fold the memory subtree of the `MemBarStoreStore` node. The node still has a control output attached. 
> <img height="200" alt="after_folding" src="https://github.com/user-attachments/assets/568e9fc3-5f19-4e10-a72e-f0a5e772daed" />
> 
> * later during the same IGVN run, the `MemBarStoreStore` node is handled and we try to remove it (because the `Allocate` node of the `MemBar` does not escape the thread) https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4301-L4302
> * the assert https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4235
> triggers because the barrier has only 1 (control) output and is a `MemBarStoreStore` (not `Initialize`) barrier
> 
> The issue happens only when `UseStoreStoreForCtor` is set (the default), which makes C2 use `MemBarStoreStore` instead of `MemBarRelease` at the end of constructors. `MemBarStoreStore` nodes are processed separately by EA, and this happens after the IGVN pass that folds the memory subtree. `MemBarRelease` nodes, on the other hand, are handled during the same IGVN pass, before the memory subtree is removed, while they still have 2 outputs (so the assert is skipped).
> 
> # Fix
> Adapting the assert to accept that a `MemBarStoreStore` can also have `!= 2` outputs (when `+UseStoreStoreForCtor` is used) seems to be an acceptable solution, since this is a perfectly plausible situation.
> 
> # Testing
> Unfortunately, reproducing the issue with a simple regression test has proven very hard. Reproduction seems to rely on very peculiar profiling and a specific IGVN worklist order. JBS replay compilation passes. Running JCK's `api/java_util` 100 times triggers the assert a couple of times on average before the fix, and never after.
> Tier 1-3+ tests passed.

Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision:

  JDK-8360031: add assert condition and make consume method argument escape

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26556/files
  - new: https://git.openjdk.org/jdk/pull/26556/files/57073b96..f5406f30

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26556&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26556&range=02-03

  Stats: 5 lines in 2 files changed: 3 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/26556.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26556/head:pull/26556

PR: https://git.openjdk.org/jdk/pull/26556

From dfenacci at openjdk.org  Tue Sep  9 15:44:27 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Tue, 9 Sep 2025 15:44:27 GMT
Subject: RFR: 8360031: C2 compilation asserts in MemBarNode::remove [v4]
In-Reply-To: <WWtmL3EDxwnyvgq0UCczwTKvZf2sumk3l0_9IIy1N74=.5f655235-ed97-4aeb-aa66-fd2afa50da25@github.com>
References: <h6lgQmZZxVXPeJl0fRWlz8J713AAhfTAw_5UJ3ZL1S4=.b4b40295-6ea8-4498-abb6-98201776b3c9@github.com>
 <WWtmL3EDxwnyvgq0UCczwTKvZf2sumk3l0_9IIy1N74=.5f655235-ed97-4aeb-aa66-fd2afa50da25@github.com>
Message-ID: <ivt2HwY4kFtIeoC0JoOztJyZzbM0ZQ6ecnVRQFIszrY=.edfee448-0f21-4a37-9e0f-3d01116241b1@github.com>

On Tue, 9 Sep 2025 15:37:48 GMT, Damon Fenacci <dfenacci at openjdk.org> wrote:

>> # Issue
>> While compiling `java.util.zip.ZipFile` in C2 this assert is triggered
>> https://github.com/openjdk/jdk/blob/a2e86ff3c56209a14c6e9730781eecd12c81d170/src/hotspot/share/opto/memnode.cpp#L4235
>> 
>> # Cause
>> While compiling the constructor of `java.util.zip.ZipFile$CleanableResource`, the following happens:
>> * we insert a trailing `MemBarStoreStore` in the constructor
>> <img height="200" alt="before_folding" src="https://github.com/user-attachments/assets/c1aab634-808d-4198-94ac-8093c6b85c5d" />
>> 
>> * during IGVN we completely fold the memory subtree of the `MemBarStoreStore` node. The node still has a control output attached. 
>> <img height="200" alt="after_folding" src="https://github.com/user-attachments/assets/568e9fc3-5f19-4e10-a72e-f0a5e772daed" />
>> 
>> * later during the same IGVN run, the `MemBarStoreStore` node is handled and we try to remove it (because the `Allocate` node of the `MemBar` does not escape the thread) https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4301-L4302
>> * the assert https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4235
>> triggers because the barrier has only 1 (control) output and is a `MemBarStoreStore` (not `Initialize`) barrier
>> 
>> The issue happens only when `UseStoreStoreForCtor` is set (the default), which makes C2 use `MemBarStoreStore` instead of `MemBarRelease` at the end of constructors. `MemBarStoreStore` nodes are processed separately by EA, and this happens after the IGVN pass that folds the memory subtree. `MemBarRelease` nodes, on the other hand, are handled during the same IGVN pass, before the memory subtree is removed, while they still have 2 outputs (so the assert is skipped).
>> 
>> # Fix
>> Adapting the assert to accept that a `MemBarStoreStore` can also have `!= 2` outputs (when `+UseStoreStoreForCtor` is used) seems to be an acceptable solution, since this is a perfectly plausible situation.
>> 
>> # Testing
>> Unfortunately, reproducing the issue with a simple regression test has proven very hard. Reproduction seems to rely on very peculiar profiling and a specific IGVN worklist order. JBS replay compilation passes. Running JCK's `api/java_util` 100 times triggers the assert a couple of times on average before the fix, and never after.
>> Tier 1-3+ tests passed.
>
> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision:
> 
>   JDK-8360031: add assert condition and make consume method argument escape

The fix made the `ConstructorBarrier.java` jtreg test fail because the argument of the `consume` method was not actually escaping (so IGVN removed the MemBar). I therefore added an assignment to a volatile field to make it escape.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26556#issuecomment-3271291568

From epeter at openjdk.org  Tue Sep  9 16:03:30 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 16:03:30 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <f17ZnGhpT227GCOZjqY8jApl7pLmQXopair-hQzqjvg=.70c389d3-fe46-46f2-94e0-4c4080c675d6@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
 <q3X_EWZRIU-1SVW3m5JilUivZd2tAv8G-IeGilUZpKY=.ce081014-24ce-44f8-9cd3-776930629e97@github.com>
 <rJiYQRIbfT5zCARiBxUQHlMRb8LadmcxInoc5EslZio=.900e813b-a33e-462b-8700-74922d000cb9@github.com>
 <JlGBhH5VrDCRo0FgBh96FPz45d0mXRkyYfcqIHEDwBY=.50195674-df30-491f-b757-d7da4e44c845@github.com>
 <f17ZnGhpT227GCOZjqY8jApl7pLmQXopair-hQzqjvg=.70c389d3-fe46-46f2-94e0-4c4080c675d6@github.com>
Message-ID: <XTiLbsVH3xr9VaHB8b_sIxWJsiUSeLMqjZNR-39KIyE=.96fb2db2-410d-40dd-ac4a-ceb646b17112@github.com>

On Tue, 9 Sep 2025 14:39:51 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> Also: are these all supposed to be projections of a specific kind? We could also test for that. You can also add that to a future RFE.
>
>> Can you add a comment, why it can be arbitrarily large?
> 
> Maybe I'm very wrong about what a CatchNode is, but:
> 
> try {
> ...
> }
> catch ( ... ) { ... }
> catch ( ... ) { ... }
> catch ( ... ) { ... }
> 
> 4 outputs: 3 handlers + 1 fallthrough.
> 
>>  Also: are these all supposed to be projections of a specific kind? We could also test for that. You can also add that to a future RFE.
> 
> I'd rather do it separately. We can always check more things, but we need to draw the line and that is safe to add later.

Fair enough, thanks for the explanations :)
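
Marc's explanation of why a `CatchNode` has no fixed number of outputs (N handlers plus one fall-through) can be expressed as a small check. This is a hypothetical model, not HotSpot code: `ProjKind`, `catch_outputs_ok` and `expected_catch_outputs` are illustrative names, and the "every output is a `CatchProj`" test stands in for the refinement deferred to a future RFE; that all outputs are of that kind is an assumption here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical projection kinds hanging off a CatchNode; not HotSpot types.
enum class ProjKind { CatchProj, Other };

// No upper bound on the number of outputs, only a lower bound of one
// (the fall-through); every output must be a catch projection (assumed).
bool catch_outputs_ok(const std::vector<ProjKind>& outputs) {
  if (outputs.empty()) {
    return false;
  }
  for (ProjKind k : outputs) {
    if (k != ProjKind::CatchProj) {
      return false;
    }
  }
  return true;
}

// Output count for a try block with the given number of catch handlers,
// following the "3 handlers + 1 fallthrough" reasoning.
int expected_catch_outputs(int handlers) { return handlers + 1; }
```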

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2334110704

From epeter at openjdk.org  Tue Sep  9 16:06:43 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 16:06:43 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v5]
In-Reply-To: <M5djFQjWJW4UWxTT_vXvr07LMcclQULmfEIe_yM-BwQ=.d8e0dc86-0156-4d84-a12c-467b55bebdd3@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <AxWFxToPeFZtD6SYz4_UETbjzDCyaulW2sU4bCTrcBo=.deaa1648-2126-4a86-a87c-f4e766c70354@github.com>
 <PiHAn_huh5Qzg8nv-k8RJmQVwUPwQc8xDNqkuE9GdOw=.d13aac7a-8e74-4312-b538-833b1684f623@github.com>
 <M5djFQjWJW4UWxTT_vXvr07LMcclQULmfEIe_yM-BwQ=.d8e0dc86-0156-4d84-a12c-467b55bebdd3@github.com>
Message-ID: <GaTItJ1Jgf5AHved9qgxlPZ-AQR55UAucPfPk5wepbI=.e74e4f19-bde7-4a98-bb98-577e9ec6cb29@github.com>

On Tue, 9 Sep 2025 14:46:14 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> src/hotspot/share/opto/graphInvariants.cpp line 528:
>> 
>>> 526:     if (!center->is_CountedLoop() && !center->is_LongCountedLoop()) {
>>> 527:       return CheckResult::NOT_APPLICABLE;
>>> 528:     }
>> 
>> Actually: why not apply that to `OuterStripMinedLoop` as well? Or any `BaseCountedLoop`? Are there more than these 3 cases? If there are ever more, they should probably also adhere to this backedge pattern; we'll just need an extension. But it would be nice to trip over something here if we ever do extend.
>
> I'm going to push back on that. I would rather keep this one about counted loops, which have more structure that is HEAVILY relied on. I haven't enumerated all of it yet, but that can be done.
> 
> A separate check can cover the few properties that also hold for the other loop flavors.

My understanding is this: any kind of loop has to have a matching end node and a backedge. That is essentially the structure you are checking for int and long counted loops, but it also holds for the other loops. If you don't want to do it now, note it down and consider it in a future RFE ;)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2334116311

From mchevalier at openjdk.org  Tue Sep  9 16:17:29 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 9 Sep 2025 16:17:29 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v6]
In-Reply-To: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
Message-ID: <VR7SwSdLWdcGCw_i18MbPdoQhBPuaW6LtA7LEVqWMjo=.0d90ae9b-3915-4c62-bc4e-2428f591320e@github.com>

> Some crashes are consequences of earlier misshaped ideal graphs, which could be detected earlier, closer to the source, before the possibly many transformations that lead to the crash.
> 
> Let's verify that the Ideal graph is well-shaped earlier, then! I propose such a feature here. It runs after IGVN, because at that point the graph should have been cleaned up of any weirdness introduced earlier or during IGVN.
> 
> This feature is enabled with the develop flag `VerifyIdealStructuralInvariants`; I'm open to renaming it. This feature is only available in debug builds, and most of the code is not even compiled in product builds, since it uses some debug-only functions, such as `Node::dump` or `Node::Name`.
> 
> For now, only local checks are implemented: checks that only look at a node and its neighborhood, wherever it happens to be in the graph. Typically: under an `If` node, we have an `IfTrue` and an `IfFalse`. To ease development, each check is implemented in its own class, independently of the others. Nevertheless, one always needs to do the same kind of things: checking that there is an output of a given type, that there are N inputs, that the k-th input has a given type... To make writing such checks readable, and less error-prone than piles of copy-pasted code that manually traverse the graph, I propose a set of compositional helpers to write patterns that can be matched against the Ideal graph. Since these patterns are... patterns, and thus not tied to a specific graph, they can be allocated once and for all. When using one, you provide the node (called the center) around which you want to check whether the pattern holds.
> 
> On top of making patterns easier to describe, these helpers allow nice printing in case of error, by showing the path from the center to the violating node. For instance (made up to show the formatting), a violation with a path climbing only inputs:
> 
> 1 failure for node
>  211  OuterStripMinedLoopEnd  === 215 39  [[ 212 198 ]] P=0,948966, C=23799,000000
> At node
>     209  CountedLoopEnd  === 182 208  [[ 210 197 ]] [lt] P=0,948966, C=23799,000000 !orig=[196] !jvms: StringLatin1::equals @ bci:12 (line 100)
>   From path:
>     [center] 211  OuterStripMinedLoopEnd  === 215 39  [[ 212 198 ]] P=0,948966, C=23799,000000
>       <-(0)- 215  SafePoint  === 210 1 7 1 1 216 37 54 185  [[ 211 ]]  SafePoint  !orig=186 !jvms: StringLatin1::equals @ bci:29 (line 100)
>       <-(0)- 210  IfFalse  === 209  [[ 215 216 ]] #0 !orig=198 !jvms: StringL...
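
The compositional pattern idea described above can be sketched in a few classes. This is a minimal illustration under stated assumptions, not the PR's actual helpers: `Node`, `Pattern`, `IsA`, `Input`, `And`, `is_a`, `input` and `all_of` are hypothetical names, and nodes are identified by a class-name string. Patterns are built once, independently of any graph, then matched against a center node; on failure, `path` holds the input edges followed, in the same `<-(k)-` style as the example output.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an Ideal graph node: a class name and its inputs.
struct Node {
  std::string name;
  std::vector<const Node*> in;
};

// A pattern does not refer to any particular graph, so it can be built once
// and reused. match() checks it around the given node; on failure, `path`
// is left holding the input edges followed to reach the violating node.
struct Pattern {
  virtual ~Pattern() = default;
  virtual bool match(const Node* n, std::vector<std::string>& path) const = 0;
};
using PatternPtr = std::shared_ptr<const Pattern>;

// Matches a node with the given class name.
struct IsA : Pattern {
  std::string expected;
  explicit IsA(std::string e) : expected(std::move(e)) {}
  bool match(const Node* n, std::vector<std::string>&) const override {
    return n != nullptr && n->name == expected;
  }
};

// Follows input k and matches the sub-pattern there.
struct Input : Pattern {
  std::size_t k;
  PatternPtr sub;
  Input(std::size_t index, PatternPtr s) : k(index), sub(std::move(s)) {}
  bool match(const Node* n, std::vector<std::string>& path) const override {
    if (n == nullptr || k >= n->in.size()) return false;
    const Node* next = n->in[k];
    path.push_back("<-(" + std::to_string(k) + ")- " +
                   (next != nullptr ? next->name : std::string("null")));
    if (!sub->match(next, path)) return false;  // keep the step in the path
    path.pop_back();  // matched, so this step is not part of a failure path
    return true;
  }
};

// Matches if every sub-pattern matches at the same node.
struct And : Pattern {
  std::vector<PatternPtr> parts;
  explicit And(std::vector<PatternPtr> p) : parts(std::move(p)) {}
  bool match(const Node* n, std::vector<std::string>& path) const override {
    for (const PatternPtr& p : parts) {
      if (!p->match(n, path)) return false;
    }
    return true;
  }
};

// Small factory helpers to keep pattern definitions readable.
PatternPtr is_a(std::string name) { return std::make_shared<IsA>(std::move(name)); }
PatternPtr input(std::size_t k, PatternPtr sub) { return std::make_shared<Input>(k, std::move(sub)); }
PatternPtr all_of(std::vector<PatternPtr> parts) { return std::make_shared<And>(std::move(parts)); }
```

For example, `input(0, input(0, is_a("IfFalse")))` applied to an `OuterStripMinedLoopEnd` center requires its input 0 to be a node whose own input 0 is an `IfFalse`. If that node were an `IfTrue`, `path` would end up as `<-(0)- SafePoint`, `<-(0)- IfTrue`, much like the "From path" lines above.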

Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:

  lot of fixes, porting patterns in other files

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26362/files
  - new: https://git.openjdk.org/jdk/pull/26362/files/ea78a5a3..99040b8e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26362&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26362&range=04-05

  Stats: 893 lines in 4 files changed: 498 ins; 234 del; 161 mod
  Patch: https://git.openjdk.org/jdk/pull/26362.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26362/head:pull/26362

PR: https://git.openjdk.org/jdk/pull/26362

From epeter at openjdk.org  Tue Sep  9 16:27:26 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 9 Sep 2025 16:27:26 GMT
Subject: RFR: 8367243: Format issues with dist dump debug output in
 PhaseGVN::dead_loop_check
Message-ID: <auxUv9rgsSDSxqghkIa5eSdcluTtyNUby-W269iQIRg=.0b53ca20-dde2-4c95-a992-8bdb6bbbe77c@github.com>

The `#` option adds color to the terminal output. But that usually only works in people's terminals, not when the output is piped to a file on the server. Hence, `#` is really only a local debugging feature, and not one to use for reports in connection with `assert`s.

Simply removed the `#`, and fixed some braces and spaces.
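
To illustrate why colored dumps do not belong in `assert` reports: terminal colors are commonly produced with ANSI escape sequences, which a terminal interprets but a log file stores as raw bytes. The sketch below is generic; `maybe_colored` is a hypothetical helper, and it is an assumption that the `#` option relies on this mechanism.

```cpp
#include <cassert>
#include <string>

// Wrap text in an ANSI "red foreground" sequence and a reset sequence.
// A terminal renders the text in red; a file receives the escape bytes
// verbatim, cluttering the log.
std::string maybe_colored(const std::string& text, bool color) {
  if (!color) {
    return text;
  }
  return "\x1b[31m" + text + "\x1b[0m";
}
```

Programs that do want color often enable it only when the output is a terminal, for example by checking POSIX `isatty(fileno(stdout))`; this PR instead simply stops requesting color in this code path.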

-------------

Commit messages:
 - JDK-8367243

Changes: https://git.openjdk.org/jdk/pull/27175/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27175&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367243
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27175.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27175/head:pull/27175

PR: https://git.openjdk.org/jdk/pull/27175

From thartmann at openjdk.org  Tue Sep  9 16:31:18 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Tue, 9 Sep 2025 16:31:18 GMT
Subject: RFR: 8367243: Format issues with dist dump debug output in
 PhaseGVN::dead_loop_check
In-Reply-To: <auxUv9rgsSDSxqghkIa5eSdcluTtyNUby-W269iQIRg=.0b53ca20-dde2-4c95-a992-8bdb6bbbe77c@github.com>
References: <auxUv9rgsSDSxqghkIa5eSdcluTtyNUby-W269iQIRg=.0b53ca20-dde2-4c95-a992-8bdb6bbbe77c@github.com>
Message-ID: <K_oxZys5zrgre-bHxtI1Bh6aow2VL_eY_iTmNn1gTvc=.ad2a5db0-4dad-42cb-87ed-d39b9ab6ea56@github.com>

On Tue, 9 Sep 2025 16:20:35 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> The `#` option adds color to the terminal output. But that usually only works in people's terminals, not when the output is piped to a file on the server. Hence, `#` is really only a local debugging feature, and not one to use for reports in connection with `assert`s.
> 
> Simply removed the `#`, and fixed some braces and spaces.

Looks good and trivial!

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27175#pullrequestreview-3202398872

From djelinski at openjdk.org  Tue Sep  9 16:59:31 2025
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Tue, 9 Sep 2025 16:59:31 GMT
Subject: RFR: 8366984: Remove delay slot support [v3]
In-Reply-To: <msZk11XqMQy93X0WkadtY0MP-sB74iIsygq2iXqFc_I=.e1c48e06-4b7f-4201-98d0-09ec2ecab91e@github.com>
References: <msZk11XqMQy93X0WkadtY0MP-sB74iIsygq2iXqFc_I=.e1c48e06-4b7f-4201-98d0-09ec2ecab91e@github.com>
Message-ID: <vgpJoJvUnSIN4QlIhrFGV81HVURYnl0_xLd4AATHhOY=.7b6f883f-5fa3-49ad-b2ac-5c454982751d@github.com>

> SPARC was the only supported architecture that used delay slots. The SPARC port was removed in JDK 15, so this code is effectively dead. Let's remove it.
> 
> The changes are a no-op on all architectures that do not use delay slots. I still ran tiers 1-5 on Mach5, with no related failures.

Daniel Jeliński has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits:

 - Merge remote-tracking branch 'origin/master' into delay-slot
 - Revert scope_desc change, breaks macos-aarch64
 - Remove remaining comments
 - Update copyright
 - Remove commented out code
 - Remove unused variables
 - Comment out unused _unconditional_delay_slot
 - Remove bundle flags
 - Remove delay slot support from ADL
 - Clean up delay slot remnants from arm32 code
 - ... and 4 more: https://git.openjdk.org/jdk/compare/cc6d34b2...fb68b5a8

-------------

Changes: https://git.openjdk.org/jdk/pull/27119/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27119&range=02
  Stats: 456 lines in 19 files changed: 1 ins; 407 del; 48 mod
  Patch: https://git.openjdk.org/jdk/pull/27119.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27119/head:pull/27119

PR: https://git.openjdk.org/jdk/pull/27119

From mchevalier at openjdk.org  Tue Sep  9 17:07:40 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 9 Sep 2025 17:07:40 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v7]
In-Reply-To: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
Message-ID: <yV7-E0q8AS7c47YiVbZmioeEAn0KTuZU8-zaI1BV-r8=.c7a00f71-dbac-4911-a183-8af53bc9ee4c@github.com>

> Some crashes are consequences of earlier misshaped ideal graphs, which could be detected earlier, closer to the source, before the possibly many transformations that lead to the crash.
> 
> Let's verify that the Ideal graph is well-shaped earlier, then! I propose such a feature here. It runs after IGVN, because at that point the graph should have been cleaned up of any weirdness introduced earlier or during IGVN.
> 
> This feature is enabled with the develop flag `VerifyIdealStructuralInvariants`; I'm open to renaming it. This feature is only available in debug builds, and most of the code is not even compiled in product builds, since it uses some debug-only functions, such as `Node::dump` or `Node::Name`.
> 
> For now, only local checks are implemented: checks that only look at a node and its neighborhood, wherever it happens to be in the graph. Typically: under an `If` node, we have an `IfTrue` and an `IfFalse`. To ease development, each check is implemented in its own class, independently of the others. Nevertheless, one always needs to do the same kind of things: checking that there is an output of a given type, that there are N inputs, that the k-th input has a given type... To make writing such checks readable, and less error-prone than piles of copy-pasted code that manually traverse the graph, I propose a set of compositional helpers to write patterns that can be matched against the Ideal graph. Since these patterns are... patterns, and thus not tied to a specific graph, they can be allocated once and for all. When using one, you provide the node (called the center) around which you want to check whether the pattern holds.
> 
> On top of making patterns easier to describe, these helpers allow nice printing in case of error, by showing the path from the center to the violating node. For instance (made up to show the formatting), a violation with a path climbing only inputs:
> 
> 1 failure for node
>  211  OuterStripMinedLoopEnd  === 215 39  [[ 212 198 ]] P=0,948966, C=23799,000000
> At node
>     209  CountedLoopEnd  === 182 208  [[ 210 197 ]] [lt] P=0,948966, C=23799,000000 !orig=[196] !jvms: StringLatin1::equals @ bci:12 (line 100)
>   From path:
>     [center] 211  OuterStripMinedLoopEnd  === 215 39  [[ 212 198 ]] P=0,948966, C=23799,000000
>       <-(0)- 215  SafePoint  === 210 1 7 1 1 216 37 54 185  [[ 211 ]]  SafePoint  !orig=186 !jvms: StringLatin1::equals @ bci:29 (line 100)
>       <-(0)- 210  IfFalse  === 209  [[ 215 216 ]] #0 !orig=198 !jvms: StringL...
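
The `If` example from the description (an `If` node with an `IfTrue` and an `IfFalse` below it) is a typical local check. Here it is as a standalone function over a toy node with outputs; `Node`, `CheckResult` and `check_if_projections` are hypothetical names, and requiring exactly two outputs is an assumption of this sketch rather than something the description states.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an Ideal graph node: a class name and its outputs (uses).
struct Node {
  std::string name;
  std::vector<const Node*> out;
};

enum class CheckResult { VALID, NOT_APPLICABLE, FAILED };

// Local check: looks only at the center node and its direct outputs.
CheckResult check_if_projections(const Node& center) {
  if (center.name != "If") {
    return CheckResult::NOT_APPLICABLE;  // the check only concerns If nodes
  }
  if (center.out.size() != 2) {
    return CheckResult::FAILED;  // assumption: exactly two projections
  }
  int true_projs = 0;
  int false_projs = 0;
  for (const Node* use : center.out) {
    if (use->name == "IfTrue") {
      true_projs++;
    } else if (use->name == "IfFalse") {
      false_projs++;
    }
  }
  return (true_projs == 1 && false_projs == 1) ? CheckResult::VALID
                                               : CheckResult::FAILED;
}
```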

Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:

  A better way to make them not debug-only, without very ad-hoc hacking

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26362/files
  - new: https://git.openjdk.org/jdk/pull/26362/files/99040b8e..a69b9677

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26362&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26362&range=05-06

  Stats: 132 lines in 3 files changed: 106 ins; 13 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/26362.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26362/head:pull/26362

PR: https://git.openjdk.org/jdk/pull/26362

From eosterlund at openjdk.org  Tue Sep  9 19:37:31 2025
From: eosterlund at openjdk.org (Erik Österlund)
Date: Tue, 9 Sep 2025 19:37:31 GMT
Subject: RFR: 8361376: Regressions 1-6% in several Renaissance in 26-b4
 only MacOSX aarch64 [v5]
In-Reply-To: <LZMIDjRdRM4uKuuhsDrLGrwWJoVMrwwEv_bprRIjddk=.048960e2-ef28-4274-a8c4-1f0d1d417100@github.com>
References: <FYgWIv_iFwkr9an56KHdJqZlyUgtD_4g2f51hvavZWw=.f5c943f7-65e6-4944-afec-0b9c19a7b284@github.com>
 <LZMIDjRdRM4uKuuhsDrLGrwWJoVMrwwEv_bprRIjddk=.048960e2-ef28-4274-a8c4-1f0d1d417100@github.com>
Message-ID: <rOPmBGP6913XRhJX2iC9sm43vFXvtYznwdPbPurLHL4=.275865c9-b693-4a42-8f9b-116a550b52d2@github.com>

On Mon, 4 Aug 2025 21:26:22 GMT, Dean Long <dlong at openjdk.org> wrote:

>> This PR removes the recently added lock around `set_guard_value`, instead using `Atomic::cmpxchg` to atomically update bit-fields of the guard value. Further, it takes a fast path that uses the previous direct store when at a safepoint. Combined, these changes should bring the overhead almost back to where it was before. If necessary, we could go even further and allow `make_not_entrant()` to perform a direct byte store, leaving 24 bits for the guard value.
>
> Dean Long has updated the pull request incrementally with one additional commit since the last revision:
> 
>   one unconditional release should be enough

Sorry for the delay. Looks good.
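
The lock-free update of a bit-field within the guard value, as described in the quoted PR description, follows the usual compare-and-swap retry loop. This sketch uses `std::atomic` instead of HotSpot's `Atomic::cmpxchg`, and `set_bits` and the mask layout are hypothetical; it only shows the general technique, not the PR's code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Replace only the bits selected by `mask` in `word` with the corresponding
// bits of `value`, leaving all other bits untouched, without taking a lock.
// Returns the word as it was written.
std::uint32_t set_bits(std::atomic<std::uint32_t>& word, std::uint32_t mask,
                       std::uint32_t value) {
  std::uint32_t old_word = word.load();
  std::uint32_t new_word;
  do {
    new_word = (old_word & ~mask) | (value & mask);
    // On failure, compare_exchange_weak reloads old_word with the current
    // value, so the next iteration recomputes from fresh state.
  } while (!word.compare_exchange_weak(old_word, new_word));
  return new_word;
}
```

Because the loop only replaces the bits under `mask`, a concurrent update to other bits of the same word is not lost: the CAS fails and the loop retries with the freshly observed value. The safepoint fast path mentioned in the PR presumably corresponds to skipping the loop and storing directly, since no concurrent writer can race with it there.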

-------------

Marked as reviewed by eosterlund (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26399#pullrequestreview-3203004583

From dlong at openjdk.org  Tue Sep  9 22:50:06 2025
From: dlong at openjdk.org (Dean Long)
Date: Tue, 9 Sep 2025 22:50:06 GMT
Subject: RFR: 8360031: C2 compilation asserts in MemBarNode::remove [v4]
In-Reply-To: <WWtmL3EDxwnyvgq0UCczwTKvZf2sumk3l0_9IIy1N74=.5f655235-ed97-4aeb-aa66-fd2afa50da25@github.com>
References: <h6lgQmZZxVXPeJl0fRWlz8J713AAhfTAw_5UJ3ZL1S4=.b4b40295-6ea8-4498-abb6-98201776b3c9@github.com>
 <WWtmL3EDxwnyvgq0UCczwTKvZf2sumk3l0_9IIy1N74=.5f655235-ed97-4aeb-aa66-fd2afa50da25@github.com>
Message-ID: <dSVDwssOQTTA-jBKq6wOED-ntWdp7gKNXw1lAX0IfyI=.39575f60-2fa7-4eab-9678-90b7aee68aed@github.com>

On Tue, 9 Sep 2025 15:37:48 GMT, Damon Fenacci <dfenacci at openjdk.org> wrote:

>> # Issue
>> While compiling `java.util.zip.ZipFile` in C2, this assert is triggered:
>> https://github.com/openjdk/jdk/blob/a2e86ff3c56209a14c6e9730781eecd12c81d170/src/hotspot/share/opto/memnode.cpp#L4235
>> 
>> # Cause
>> While compiling the constructor of `java.util.zip.ZipFile$CleanableResource`, the following happens:
>> * we insert a trailing `MemBarStoreStore` in the constructor
>> <img height="200" alt="before_folding" src="https://github.com/user-attachments/assets/c1aab634-808d-4198-94ac-8093c6b85c5d" />
>> 
>> * during IGVN we completely fold the memory subtree of the `MemBarStoreStore` node. The node still has a control output attached. 
>> <img height="200" alt="after_folding" src="https://github.com/user-attachments/assets/568e9fc3-5f19-4e10-a72e-f0a5e772daed" />
>> 
>> * later during the same IGVN run the `MemBarStoreStore` node is handled and we try to remove it (because the `Allocate` node of the `MemBar` does not escape the thread) https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4301-L4302
>> * the assert https://github.com/openjdk/jdk/blob/7b7136b4eca15693cfcd46ae63d644efc8a88d2c/src/hotspot/share/opto/memnode.cpp#L4235
>> triggers because the barrier has only 1 (control) output and is a `MemBarStoreStore` (not `Initialize`) barrier
>> 
>> The issue happens only when `UseStoreStoreForCtor` is set (the default). This flag makes C2 use `MemBarStoreStore` instead of `MemBarRelease` at the end of constructors. `MemBarStoreStore` nodes are processed separately by EA, and this happens after the IGVN pass that folds the memory subtree. `MemBarRelease` nodes, on the other hand, are handled during the same IGVN pass, before the memory subtree gets removed, while they still have 2 outputs (so the assert is skipped).
>> 
>> # Fix
>> Adapting the assert to accept that `MemBarStoreStore` can also have `!= 2` outputs (when `+UseStoreStoreForCtor` is used) seems to be an OK solution, as this appears to be a perfectly plausible situation.
>> 
>> # Testing
>> Unfortunately, reproducing the issue with a simple regression test has proven very hard. The failure seems to rely on a very peculiar profiling and IGVN worklist sequence. JBS replay compilation passes. Running JCK's `api/java_util` 100 times triggers the assert a couple of times on average before the fix, and never after.
>> Tier 1-3+ tests passed.
>
> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision:
> 
>   JDK-8360031: add assert condition and make consume method argument escape
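
The relaxed invariant can be pictured as a predicate over the barrier kind and its output count. The sketch below is a hypothetical C++ model, not the code in `memnode.cpp`: `BarrierKind`, `remove_precondition_old` and `remove_precondition_new` are invented names, and the exact condition used by the fix may differ.

```cpp
// Hypothetical model of the MemBarNode::remove precondition; the names are
// illustrative and do not exist in HotSpot.
enum class BarrierKind { Initialize, MemBarRelease, MemBarStoreStore };

// Before the fix: a barrier whose output count is not 2 (only the control
// projection left) was accepted only if it was an Initialize node.
bool remove_precondition_old(BarrierKind kind, int outcnt) {
  return outcnt == 2 || kind == BarrierKind::Initialize;
}

// After the fix, as described in the PR: a constructor-ending MemBarStoreStore
// (emitted under +UseStoreStoreForCtor) may also have lost its memory output.
bool remove_precondition_new(BarrierKind kind, int outcnt,
                             bool use_store_store_for_ctor) {
  return remove_precondition_old(kind, outcnt) ||
         (kind == BarrierKind::MemBarStoreStore && use_store_store_for_ctor);
}
```

Under this model, the crashing case from the PR (a `MemBarStoreStore` left with a single control output) fails the old check and passes the new one.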

LGTM, but let's wait for @vnkozlov to approve it.

-------------

Marked as reviewed by dlong (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26556#pullrequestreview-3203613679

From duke at openjdk.org  Tue Sep  9 23:04:08 2025
From: duke at openjdk.org (Chad Rakoczy)
Date: Tue, 9 Sep 2025 23:04:08 GMT
Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache
 [v47]
In-Reply-To: <CpuGkGuFlcXd3ZwuZCG8oWEEa2GKgTs3LaGwpIESm9g=.4807c870-5cce-4f87-aca7-79c1b87e7b0a@github.com>
References: <CpuGkGuFlcXd3ZwuZCG8oWEEa2GKgTs3LaGwpIESm9g=.4807c870-5cce-4f87-aca7-79c1b87e7b0a@github.com>
Message-ID: <ipQ3Ffpq76wW20yGAebpKIf4Gh47P0sBmj2ciBwY9kI=.1479ef99-c462-409c-aa30-086b589478c3@github.com>

> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186).
> 
> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation.
> 
> This does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created to test the new functionality.
> 
> Additional Testing:
> - [x] Linux x64 fastdebug tier 1/2/3/4
> - [x] Linux aarch64 fastdebug tier 1/2/3/4

Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision:

  Fix race when not installed nmethod is deoptimized

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23573/files
  - new: https://git.openjdk.org/jdk/pull/23573/files/a2051637..bf18a4c8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=46
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=45-46

  Stats: 8 lines in 4 files changed: 2 ins; 2 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/23573.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573

PR: https://git.openjdk.org/jdk/pull/23573

From duke at openjdk.org  Tue Sep  9 23:22:52 2025
From: duke at openjdk.org (Chad Rakoczy)
Date: Tue, 9 Sep 2025 23:22:52 GMT
Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache
 [v46]
In-Reply-To: <K9If679t8ipetKmZAt1YVCXy5vplvdgCEs9O9VT8d30=.cc8cbc06-ce38-4806-a5de-0e7d60957c07@github.com>
References: <CpuGkGuFlcXd3ZwuZCG8oWEEa2GKgTs3LaGwpIESm9g=.4807c870-5cce-4f87-aca7-79c1b87e7b0a@github.com>
 <S-edDUdZcJb2zCePiPAGlUTPvrhVN5GbV2e7kC7Eu78=.f8346731-b24f-4ab1-bb2b-6f8d3435e0a6@github.com>
 <K9If679t8ipetKmZAt1YVCXy5vplvdgCEs9O9VT8d30=.cc8cbc06-ce38-4806-a5de-0e7d60957c07@github.com>
Message-ID: <KppOD2Z0CMuT4g4HFFl4_wPIqZUw2yWTLvyPrqxITaY=.3eb2f78f-ebdd-4be5-8ec4-96ece990a2ce@github.com>

On Sat, 30 Aug 2025 00:32:02 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

> It failed on linux-x64 and linux-aarch64. I tried locally on linux-x64 but it passed.

Sorry for the late response; I have been on vacation.

The test failed due to a race condition involving the de-optimization of a `not_installed` nmethod.

`CompiledICLocker` uses `CompiledICProtectionBehaviour::is_safe(nm)` to determine whether it needs to acquire the `CompiledIC_lock`. If the nmethod is `not_installed` at the time of the check, the lock is not acquired. However, if the nmethod is then de-optimized and its state transitions to `not_entrant`, the next evaluation of `is_safe(nm)` will return false because the nmethod is no longer `not_installed`, even though the lock is still not held.

The fix is to ensure that the `NMethodState_lock` is held when checking `nmethod::is_not_installed`, to prevent concurrent state changes that could lead to this race.
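
A simplified C++ model of this check-then-act problem, assuming `std::mutex` stand-ins for `NMethodState_lock` and `CompiledIC_lock`. The types and function names are hypothetical, not HotSpot's API, and this is only one way to keep the check valid: the decision to skip the IC lock is made while the state lock is held, and that lock is kept until patching ends, so a concurrent `make_not_entrant()` cannot change the state in between.

```cpp
#include <mutex>

enum class NMState { not_installed, in_use, not_entrant };

struct NMethodModel {
  std::mutex state_lock;                   // stand-in for NMethodState_lock
  NMState state = NMState::not_installed;

  // State transitions (e.g. deoptimization) always take the state lock.
  void make_not_entrant() {
    std::lock_guard<std::mutex> guard(state_lock);
    state = NMState::not_entrant;
  }
};

// Runs `patch` under IC protection and returns true if `ic_lock`
// (stand-in for CompiledIC_lock) was taken.
template <class F>
bool patch_with_ic_protection(NMethodModel& nm, std::mutex& ic_lock, F&& patch) {
  // Holding the state lock for the whole operation keeps the
  // "not installed, so no IC lock needed" decision valid until we are done.
  std::lock_guard<std::mutex> state_guard(nm.state_lock);
  if (nm.state == NMState::not_installed) {
    patch();  // not yet visible to other threads
    return false;
  }
  std::lock_guard<std::mutex> ic_guard(ic_lock);
  patch();
  return true;
}
```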

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3272584418

From dlong at openjdk.org  Tue Sep  9 23:31:01 2025
From: dlong at openjdk.org (Dean Long)
Date: Tue, 9 Sep 2025 23:31:01 GMT
Subject: RFR: 8361376: Regressions 1-6% in several Renaissance in 26-b4
 only MacOSX aarch64 [v5]
In-Reply-To: <-MqvO74Up2R0qmEDtgyGY-yScxZ-v6ZQWxDtSxpKO_g=.56d4eeca-670d-41e4-9e96-ba20b1b44100@github.com>
References: <FYgWIv_iFwkr9an56KHdJqZlyUgtD_4g2f51hvavZWw=.f5c943f7-65e6-4944-afec-0b9c19a7b284@github.com>
 <LZMIDjRdRM4uKuuhsDrLGrwWJoVMrwwEv_bprRIjddk=.048960e2-ef28-4274-a8c4-1f0d1d417100@github.com>
 <T06fzDKNa9g5UbLKM_kPYCgTDFK8dRrfLle5iaVKtCA=.9704b0b1-6886-45b9-bb0f-aa26caa25d68@github.com>
 <-MqvO74Up2R0qmEDtgyGY-yScxZ-v6ZQWxDtSxpKO_g=.56d4eeca-670d-41e4-9e96-ba20b1b44100@github.com>
Message-ID: <Cjf_lQobp8s7MFspFELlwB-zawalxntx0P0BQ-0syXc=.d98c6e00-07da-4a97-9fc5-2dcf38c3be62@github.com>

On Wed, 27 Aug 2025 20:17:07 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

>> @fisk , can I get you to review this?
>
>> @fisk , can I get you to review this?
> 
> Sure! Based on the symptoms you described, my main comment is that we might be looking at the wrong places. I don't know if this is really about lock contention. Perhaps it is indirectly. But you mention there is still some regression with ZGC.
> 
> My hypothesis would be that it is the unnecessary incrementing of the global patching epoch that causes the regression when using ZGC. It is only really needed when disarming the nmethod, in other words when the guard value is set to the good value.
> 
> The point of incrementing the patching epoch is to protect other threads from entering the nmethod without executing an instruction cross-modification fence. And all other threads will have to do that.
> 
> Only ZGC uses the mode of nmethod entry barriers that does this, due to being the only GC that updates instructions in a concurrent phase on AArch64. We are conservative on AArch64 and ensure the use of appropriate synchronous cross-modifying code. But that's not needed when arming, which is what we do when making the nmethod not entrant.
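
Erik's hypothesis can be pictured with a small C++ model, assuming a single global epoch counter and an integer guard where one "good" value means disarmed. `g_patching_epoch`, `EntryBarrierModel` and `kDisarmed` are illustrative names, not HotSpot's, and the ordering of the store and the bump is arbitrary here; the only point is that arming skips the epoch bump.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative global epoch: threads that observe a new value must run a
// cross-modification fence before entering patched code.
std::atomic<uint64_t> g_patching_epoch{0};

struct EntryBarrierModel {
  static constexpr int kDisarmed = 0;  // hypothetical "good" guard value
  std::atomic<int> guard{1};           // starts armed

  void set_guard_value(int value) {
    guard.store(value, std::memory_order_release);
    if (value == kDisarmed) {
      // Disarming lets threads run the (possibly patched) code without
      // trapping, so only then must they be forced through a fence.
      g_patching_epoch.fetch_add(1, std::memory_order_release);
    }
    // Arming (e.g. making the nmethod not entrant) skips the epoch bump.
  }
};
```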

Thanks @fisk and @theRealAph.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26399#issuecomment-3272594352

From dlong at openjdk.org  Tue Sep  9 23:31:03 2025
From: dlong at openjdk.org (Dean Long)
Date: Tue, 9 Sep 2025 23:31:03 GMT
Subject: Integrated: 8361376: Regressions 1-6% in several Renaissance in 26-b4
 only MacOSX aarch64
In-Reply-To: <FYgWIv_iFwkr9an56KHdJqZlyUgtD_4g2f51hvavZWw=.f5c943f7-65e6-4944-afec-0b9c19a7b284@github.com>
References: <FYgWIv_iFwkr9an56KHdJqZlyUgtD_4g2f51hvavZWw=.f5c943f7-65e6-4944-afec-0b9c19a7b284@github.com>
Message-ID: <u0UCXLbR1VMoJ_wBLo7_yHfmxRcEToRaMds05kz8BqQ=.abf615fe-796f-4e54-babd-3edd8b399d24@github.com>

On Sat, 19 Jul 2025 01:39:12 GMT, Dean Long <dlong at openjdk.org> wrote:

> This PR removes the recently added lock around set_guard_value, instead using Atomic::cmpxchg to atomically update bit-fields of the guard value.  Further, it takes a fast-path that uses the previous direct store when at a safepoint.  Combined, these changes should get us back to almost where we were before in terms of overhead.  If necessary, we could go even further and allow make_not_entrant() to perform a direct byte store, leaving 24 bits for the guard value.
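
The lock-free update described above can be sketched with `std::atomic<uint32_t>::compare_exchange_weak`. The bit layout below (a top "not entrant" bit above a 31-bit guard value) is a hypothetical stand-in, as are the names `set_guard_bits`, `kNotEntrantBit` and `kValueMask`; the memory orders are also an assumption rather than what the patch uses.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical layout: top bit = "not entrant" marker, low 31 bits = guard value.
constexpr uint32_t kNotEntrantBit = 0x80000000u;
constexpr uint32_t kValueMask = ~kNotEntrantBit;

// Replaces only the bits selected by `mask`, preserving the others even if
// another thread updates them concurrently.
void set_guard_bits(std::atomic<uint32_t>& guard, uint32_t bits, uint32_t mask,
                    bool at_safepoint) {
  if (at_safepoint) {
    // No concurrent writers: a plain load and store is enough.
    uint32_t word = guard.load(std::memory_order_relaxed);
    guard.store((word & ~mask) | (bits & mask), std::memory_order_release);
    return;
  }
  uint32_t old_word = guard.load(std::memory_order_relaxed);
  uint32_t new_word;
  do {
    new_word = (old_word & ~mask) | (bits & mask);
    // On failure, compare_exchange_weak reloads old_word and we retry.
  } while (!guard.compare_exchange_weak(old_word, new_word,
                                        std::memory_order_release,
                                        std::memory_order_relaxed));
}
```

A CAS loop lets two writers that touch disjoint bits (for example, a guard-value update racing with a not-entrant marking) both succeed without a lock; the safepoint branch mirrors the fast path mentioned in the PR, where no concurrent writer exists.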

This pull request has now been integrated.

Changeset: f9640398
Author:    Dean Long <dlong at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/f96403986b99008593e025c4991ee865fce59bb1
Stats:     240 lines in 15 files changed: 128 ins; 71 del; 41 mod

8361376: Regressions 1-6% in several Renaissance in 26-b4 only MacOSX aarch64

Co-authored-by: Martin Doerr <mdoerr at openjdk.org>
Reviewed-by: mdoerr, aph, eosterlund

-------------

PR: https://git.openjdk.org/jdk/pull/26399

From missa at openjdk.org  Wed Sep 10 01:00:52 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Wed, 10 Sep 2025 01:00:52 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v10]
In-Reply-To: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
Message-ID: <xTvNYjyfGzoDTcVYsk4sRMTwpXCNy7-4g2S9moyWrWY=.71ee655c-4eb3-4df4-8003-6e083a97e595@github.com>

> Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
> 
> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) instruction is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially, though, the special values are mapped into the integer range (NaN -> 0; -Infinity and values <= Integer.MIN_VALUE -> Integer.MIN_VALUE; +Infinity and values >= Integer.MAX_VALUE -> Integer.MAX_VALUE) with the help of a few temporary registers to store intermediate results.
> 
> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
> 
> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
> 9. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java`
> 10. `jtreg:test/hotspot/jtreg/com...
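
For reference, these mapping rules are the ones Java prescribes for `(int)` casts from `double` (JLS §5.1.3). The legacy CVTTSD2SI instruction instead returns the "integer indefinite" value `0x80000000` for NaN and out-of-range inputs, which is why a fix-up sequence is needed without AVX10.2. A minimal scalar C++ sketch of the target semantics follows; `saturating_d2i` is an illustrative name, and the sketch does not model the instruction sequences themselves.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Java-style double -> int conversion: NaN maps to 0, out-of-range values
// saturate, everything else truncates toward zero.
int32_t saturating_d2i(double x) {
  constexpr int32_t kMin = std::numeric_limits<int32_t>::min();
  constexpr int32_t kMax = std::numeric_limits<int32_t>::max();
  if (std::isnan(x)) return 0;
  if (x <= static_cast<double>(kMin)) return kMin;  // includes -Infinity
  if (x >= static_cast<double>(kMax)) return kMax;  // includes +Infinity
  return static_cast<int32_t>(x);  // in range, so the cast is well defined
}
```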

Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:

  Add new IR nodes covering x86 floating point conversion instructions

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26919/files
  - new: https://git.openjdk.org/jdk/pull/26919/files/4d8f3ab6..bc59e4d2

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=08-09

  Stats: 121 lines in 3 files changed: 60 ins; 0 del; 61 mod
  Patch: https://git.openjdk.org/jdk/pull/26919.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26919/head:pull/26919

PR: https://git.openjdk.org/jdk/pull/26919

From missa at openjdk.org  Wed Sep 10 01:00:55 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Wed, 10 Sep 2025 01:00:55 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v9]
In-Reply-To: <sSFSX4MSj6uHOW0VOmXBi75QgHEbOGUHh4wJRtxgR44=.b3b4e38f-0719-4c3d-ae38-14d2df8fd9f7@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <mG1sFdV99uAG4cWGfM6kCew9UVLdVuG4_GHADimAsVQ=.8013b182-afc1-4156-9718-13efb348bbb6@github.com>
 <sSFSX4MSj6uHOW0VOmXBi75QgHEbOGUHh4wJRtxgR44=.b3b4e38f-0719-4c3d-ae38-14d2df8fd9f7@github.com>
Message-ID: <0JotX9md-fjgXvgjODrvDQuHSHQQOI_TW-1U4qNDGz4=.25b4ff36-ea2e-4eba-8a5f-0a2bfe405064@github.com>

On Tue, 9 Sep 2025 02:28:45 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Check for scalar casting instead of vector casting in tests when disabling vector alignment or compact object headers
>
> test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java line 432:
> 
>> 430:         applyIfCPUFeatureAnd = {"avx", "true", "avx10_2", "false"})
>> 431:     @IR(counts = {"cast2DtoX", " >0 "}, phase = CompilePhase.FINAL_CODE,
>> 432:         applyIfCPUFeature = {"avx10_2", "true"})
> 
> Please refer to https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java#L2638 for adding MachNode IR node based checks

Thanks, I added some new nodes.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2335208252

From dlong at openjdk.org  Wed Sep 10 01:07:22 2025
From: dlong at openjdk.org (Dean Long)
Date: Wed, 10 Sep 2025 01:07:22 GMT
Subject: RFR: 8366984: Remove delay slot support [v3]
In-Reply-To: <vgpJoJvUnSIN4QlIhrFGV81HVURYnl0_xLd4AATHhOY=.7b6f883f-5fa3-49ad-b2ac-5c454982751d@github.com>
References: <msZk11XqMQy93X0WkadtY0MP-sB74iIsygq2iXqFc_I=.e1c48e06-4b7f-4201-98d0-09ec2ecab91e@github.com>
 <vgpJoJvUnSIN4QlIhrFGV81HVURYnl0_xLd4AATHhOY=.7b6f883f-5fa3-49ad-b2ac-5c454982751d@github.com>
Message-ID: <sM4M2ODWvAXO0IFdJrjCqc5_gdaa2istbhQiCzuVXCM=.ae02757b-6d56-4a31-93c6-794e67c038cc@github.com>

On Tue, 9 Sep 2025 16:59:31 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

>> SPARC was the only supported architecture that uses a delay slot. The SPARC port was removed in JDK 15, and the code is effectively dead. Let's remove it.
>> 
>> The changes are no-op on all architectures that do not use delay slots. I still tested tier 1-5 on mach5, no related failures.
>
> Daniel Jeliński has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits:
> 
>  - Merge remote-tracking branch 'origin/master' into delay-slot
>  - Revert scope_desc change, breaks macos-aarch64
>  - Remove remaining comments
>  - Update copyright
>  - Remove commented out code
>  - Remove unused variables
>  - Comment out unused _unconditional_delay_slot
>  - Remove bundle flags
>  - Remove delay slot support from ADL
>  - Clean up delay slot remnants from arm32 code
>  - ... and 4 more: https://git.openjdk.org/jdk/compare/cc6d34b2...fb68b5a8

Marked as reviewed by dlong (Reviewer).

src/hotspot/share/runtime/sharedRuntime.cpp line 3505:

> 3503:         nm = cb->as_nmethod();
> 3504:         method = nm->method();
> 3505:         for (ScopeDesc *sd = nm->scope_desc_near(fr.pc()); sd != nullptr; sd = sd->sender()) {

It's tempting to try to change this to scope_desc_at(), but also slightly risky if SPARC isn't the only reason it was needed.  Should we investigate this in a separate RFE?

-------------

PR Review: https://git.openjdk.org/jdk/pull/27119#pullrequestreview-3203943528
PR Review Comment: https://git.openjdk.org/jdk/pull/27119#discussion_r2335213805

From dlong at openjdk.org  Wed Sep 10 01:09:16 2025
From: dlong at openjdk.org (Dean Long)
Date: Wed, 10 Sep 2025 01:09:16 GMT
Subject: RFR: 8366461: Remove obsolete method handle invoke logic [v3]
In-Reply-To: <_pqvEs0LIlAc7RjFUwg-bpxS3D2v5U7c6In2sG8XLhQ=.57e3aead-6ac4-4a42-89d2-385d7e6ecedf@github.com>
References: <LQQer6eHAvGEV6clizLClEdOtBBIO7GCQCzibGcEzL8=.7ec9480c-c660-460d-ab5c-69d4d4a4d03d@github.com>
 <_pqvEs0LIlAc7RjFUwg-bpxS3D2v5U7c6In2sG8XLhQ=.57e3aead-6ac4-4a42-89d2-385d7e6ecedf@github.com>
Message-ID: <AwkHmH0d9__b_S9ycjrMsSJx1lSWIgekneIgiPb4PGM=.37491d2c-9b0b-4d84-88ef-f9555865445d@github.com>

On Tue, 2 Sep 2025 20:52:32 GMT, Dean Long <dlong at openjdk.org> wrote:

>> At one time, JSR292 support needed special logic to save and restore SP across method handle intrinsic calls, but that is no longer the case. The only platform that still does the save/restore is arm32, where it is no longer necessary. The save/restore can be removed along with related APIs and logic. Note that the arm32 port is largely based on the x86 port, which stopped doing the save/restore in jdk9 ([JDK-8068945](https://bugs.openjdk.org/browse/JDK-8068945)).
>
> Dean Long has updated the pull request incrementally with three additional commits since the last revision:
> 
>  - revert whitespace change
>  - undo debug changes
>  - cleanup

I need one more review for this.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27059#issuecomment-3272844515

From duke at openjdk.org  Wed Sep 10 01:49:14 2025
From: duke at openjdk.org (duke)
Date: Wed, 10 Sep 2025 01:49:14 GMT
Subject: RFR: 8366588: VectorAPI: Re-intrinsify VectorMask.laneIsSet where
 the input index is a variable
In-Reply-To: <XaokWZpc-AL2hWWGvsmDczO1u4o5uGEaCSgPjSyxkz4=.3251a974-6374-48c8-a8e4-88914b730505@github.com>
References: <XaokWZpc-AL2hWWGvsmDczO1u4o5uGEaCSgPjSyxkz4=.3251a974-6374-48c8-a8e4-88914b730505@github.com>
Message-ID: <K4xpD4elMDHMukQms2WkEFNBdY7y8SIk03fNmX7WP-8=.05c3a4d6-7987-4bd8-a429-2bd694893e7b@github.com>

On Fri, 5 Sep 2025 08:13:28 GMT, erifan <duke at openjdk.org> wrote:

> Intrinsic support for `VectorMask.laneIsSet` with a **variable** input index was introduced in PR #14200, but was inadvertently broken by PR #25673. This PR restores the intrinsic functionality and adds some JTReg tests.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE:
> 
> | Benchmark | Unit | Before | Score Error | After | Score Error | Uplift |
> |---|---|---:|---:|---:|---:|---:|
> | microMaskLaneIsSetByte128_var | ops/ms | 21702.14415 | 91.902159 | 103472.9391 | 36.057447 | 4.767867 |
> | microMaskLaneIsSetByte64_var | ops/ms | 21468.51868 | 107.94177 | 103365.6561 | 69.47736 | 4.814754 |
> | microMaskLaneIsSetDouble128_var | ops/ms | 77489.32791 | 153.242699 | 413499.4127 | 311.854079 | 5.336211 |
> | microMaskLaneIsSetFloat128_var | ops/ms | 41034.95204 | 399.421823 | 206840.0988 | 74.702234 | 5.040583 |
> | microMaskLaneIsSetFloat64_var | ops/ms | 77607.40268 | 175.938921 | 413745.3001 | 149.716794 | 5.33126 |
> | microMaskLaneIsSetInt128_var | ops/ms | 41452.48893 | 76.143208 | 206845.9754 | 59.371129 | 4.989953 |
> | microMaskLaneIsSetInt64_var | ops/ms | 77726.2542 | 173.180518 | 413427.8838 | 363.575023 | 5.319024 |
> | microMaskLaneIsSetLong128_var | ops/ms | 77646.11218 | 177.496587 | 413403.4404 | 236.609314 | 5.3242 |
> | microMaskLaneIsSetShort128_var | ops/ms | 21374.93265 | 48.13101 | 103417.4618 | 34.827021 | 4.838259 |
> | microMaskLaneIsSetShort64_var | ops/ms | 41066.19395 | 353.320621 | 206801.109 | 106.408938 | 5.035799 |
> 
> 
> Benchmarks on Intel 6444y machine with 512-bit avx3:
> 
> | Benchmark | Unit | Before | Score Error | After | Score Error | Uplift |
> |---|---|---:|---:|---:|---:|---:|
> | microMaskLaneIsSetByte128_var | ops/ms | 57658.45497 | 240.209309 | 211643.8406 | 29.214532 | 3.670647 |
> | microMaskLaneIsSetByte256_var | ops/ms | 57451.68169 | 116.994128 | 211609.4652 | 160.48513 | 3.683259 |
> | microMaskLaneIsSetByte512_var | ops/ms | 57530.22411 | 311.63868 | 199802.8084 | 408.144015 | 3.473005 |
> | microMaskLaneIsSetByte64_var | ops/ms | 57642.2672 | 161.406221 | 205252.4464 | 196.86852 | 3.560797 |
> | microMaskLaneIsSetDouble256_var | ops/ms | 114401.3789 | 231.797375 | 361400.344 | 565.593984 | 3.159055 |
> | microMaskLaneIsSetDouble512_var | ops/ms | 57379.27882 | 159.699503 | 211476.1138 | 136.980026 | 3.685583 |
> | microMaskLaneIsSetFloat128_var | ops/ms | 113943.9512 | 141.062663 | 360855.3915 | 494.471996 | 3.166955 |
> | microMaskLaneIsSetFloat256_var | ops/ms | 57682.78182 | 138.142053 | 211659.5098 | 30.167972 | 3.66937 |
> | microMaskLaneIsSetFloat512_var | ops/ms | 57617.66405 | 301.748599 | 211246.8588 | 597.18949 | 3.666355 |
> | microMaskLaneIsSetInt128_var | ops/ms | 113914.5062 | 118.681382 | 360856.4465 | 555.097397 | 3.167783 |
> | microMaskLaneIsSetInt256_var | ops/ms | 57681.79883 | 112.391639 | 211555.6742 | 217.556981 | 3.667633 |
> | microMaskLaneIsSetInt512_var | ops/ms | 57350.20346 | 206.146723 | 211657.7207 | 68.461571 | 3.690618 |
> microMaskLane...

@erifan 
Your change (at version a672dd26c6c7547bca260815ae2e1d7c3652c929) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27113#issuecomment-3272907016

From duke at openjdk.org  Wed Sep 10 01:53:21 2025
From: duke at openjdk.org (erifan)
Date: Wed, 10 Sep 2025 01:53:21 GMT
Subject: Integrated: 8366588: VectorAPI: Re-intrinsify VectorMask.laneIsSet
 where the input index is a variable
In-Reply-To: <XaokWZpc-AL2hWWGvsmDczO1u4o5uGEaCSgPjSyxkz4=.3251a974-6374-48c8-a8e4-88914b730505@github.com>
References: <XaokWZpc-AL2hWWGvsmDczO1u4o5uGEaCSgPjSyxkz4=.3251a974-6374-48c8-a8e4-88914b730505@github.com>
Message-ID: <xX4wg0LJKdi7vrLDcyOvfnZmUnqw_2UpYpgFw-fJfMI=.ac5c7524-5222-41e0-9d41-784083739ccb@github.com>

On Fri, 5 Sep 2025 08:13:28 GMT, erifan <duke at openjdk.org> wrote:

> Intrinsic support for `VectorMask.laneIsSet` with a **variable** input index was introduced in PR #14200, but was inadvertently broken by PR #25673. This PR restores the intrinsic functionality and adds some JTReg tests.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE:
> 
> | Benchmark | Unit | Before | Score Error | After | Score Error | Uplift |
> |---|---|---:|---:|---:|---:|---:|
> | microMaskLaneIsSetByte128_var | ops/ms | 21702.14415 | 91.902159 | 103472.9391 | 36.057447 | 4.767867 |
> | microMaskLaneIsSetByte64_var | ops/ms | 21468.51868 | 107.94177 | 103365.6561 | 69.47736 | 4.814754 |
> | microMaskLaneIsSetDouble128_var | ops/ms | 77489.32791 | 153.242699 | 413499.4127 | 311.854079 | 5.336211 |
> | microMaskLaneIsSetFloat128_var | ops/ms | 41034.95204 | 399.421823 | 206840.0988 | 74.702234 | 5.040583 |
> | microMaskLaneIsSetFloat64_var | ops/ms | 77607.40268 | 175.938921 | 413745.3001 | 149.716794 | 5.33126 |
> | microMaskLaneIsSetInt128_var | ops/ms | 41452.48893 | 76.143208 | 206845.9754 | 59.371129 | 4.989953 |
> | microMaskLaneIsSetInt64_var | ops/ms | 77726.2542 | 173.180518 | 413427.8838 | 363.575023 | 5.319024 |
> | microMaskLaneIsSetLong128_var | ops/ms | 77646.11218 | 177.496587 | 413403.4404 | 236.609314 | 5.3242 |
> | microMaskLaneIsSetShort128_var | ops/ms | 21374.93265 | 48.13101 | 103417.4618 | 34.827021 | 4.838259 |
> | microMaskLaneIsSetShort64_var | ops/ms | 41066.19395 | 353.320621 | 206801.109 | 106.408938 | 5.035799 |
> 
> 
> Benchmarks on Intel 6444y machine with 512-bit avx3:
> 
> | Benchmark | Unit | Before | Score Error | After | Score Error | Uplift |
> |---|---|---:|---:|---:|---:|---:|
> | microMaskLaneIsSetByte128_var | ops/ms | 57658.45497 | 240.209309 | 211643.8406 | 29.214532 | 3.670647 |
> | microMaskLaneIsSetByte256_var | ops/ms | 57451.68169 | 116.994128 | 211609.4652 | 160.48513 | 3.683259 |
> | microMaskLaneIsSetByte512_var | ops/ms | 57530.22411 | 311.63868 | 199802.8084 | 408.144015 | 3.473005 |
> | microMaskLaneIsSetByte64_var | ops/ms | 57642.2672 | 161.406221 | 205252.4464 | 196.86852 | 3.560797 |
> | microMaskLaneIsSetDouble256_var | ops/ms | 114401.3789 | 231.797375 | 361400.344 | 565.593984 | 3.159055 |
> | microMaskLaneIsSetDouble512_var | ops/ms | 57379.27882 | 159.699503 | 211476.1138 | 136.980026 | 3.685583 |
> | microMaskLaneIsSetFloat128_var | ops/ms | 113943.9512 | 141.062663 | 360855.3915 | 494.471996 | 3.166955 |
> | microMaskLaneIsSetFloat256_var | ops/ms | 57682.78182 | 138.142053 | 211659.5098 | 30.167972 | 3.66937 |
> | microMaskLaneIsSetFloat512_var | ops/ms | 57617.66405 | 301.748599 | 211246.8588 | 597.18949 | 3.666355 |
> | microMaskLaneIsSetInt128_var | ops/ms | 113914.5062 | 118.681382 | 360856.4465 | 555.097397 | 3.167783 |
> | microMaskLaneIsSetInt256_var | ops/ms | 57681.79883 | 112.391639 | 211555.6742 | 217.556981 | 3.667633 |
> | microMaskLaneIsSetInt512_var | ops/ms | 57350.20346 | 206.146723 | 211657.7207 | 68.461571 | 3.690618 |
> microMaskLane...

This pull request has now been integrated.

Changeset: 53b3e056
Author:    erifan <erfang at nvidia.com>
Committer: Xiaohong Gong <xgong at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/53b3e0567d2801ddf62c5849b219324ddfcb264a
Stats:     170 lines in 4 files changed: 168 ins; 0 del; 2 mod

8366588: VectorAPI: Re-intrinsify VectorMask.laneIsSet where the input index is a variable

Reviewed-by: shade, xgong, epeter

-------------

PR: https://git.openjdk.org/jdk/pull/27113

From dzhang at openjdk.org  Wed Sep 10 03:16:41 2025
From: dzhang at openjdk.org (Dingli Zhang)
Date: Wed, 10 Sep 2025 03:16:41 GMT
Subject: RFR: 8367293: RISC-V: enable vectorapi test for VectorMask.laneIsSet
Message-ID: <EWSDz9fKUgj_I0INgH88Veng-6sK1t48_tRkFc1pqA4=.3b35a3ab-f369-4c36-8c78-06daee52a8a6@github.com>

Hi,
Can you help to review this patch? Thanks!

[JDK-8366588](https://bugs.openjdk.org/browse/JDK-8366588) adds a vectorapi test for VectorMask.laneIsSet, which we can also enable on RISC-V.

### Test (fastdebug)
- [x] Run compiler/vectorapi/VectorMaskLaneIsSetTest.java on k1, k230 and sg2042

-------------

Commit messages:
 - 8367293: RISC-V: enable vectorapi test for VectorMask.laneIsSet

Changes: https://git.openjdk.org/jdk/pull/27181/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27181&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367293
  Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/27181.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27181/head:pull/27181

PR: https://git.openjdk.org/jdk/pull/27181

From fyang at openjdk.org  Wed Sep 10 04:06:11 2025
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 10 Sep 2025 04:06:11 GMT
Subject: RFR: 8367293: RISC-V: enable vectorapi test for
 VectorMask.laneIsSet
In-Reply-To: <EWSDz9fKUgj_I0INgH88Veng-6sK1t48_tRkFc1pqA4=.3b35a3ab-f369-4c36-8c78-06daee52a8a6@github.com>
References: <EWSDz9fKUgj_I0INgH88Veng-6sK1t48_tRkFc1pqA4=.3b35a3ab-f369-4c36-8c78-06daee52a8a6@github.com>
Message-ID: <kzGt_GkeLG65jLSJiJUdkdWjyxhp9vBTtzDbPM-aJbA=.b508057e-d144-4f9f-93be-e7a06d51ecd5@github.com>

On Wed, 10 Sep 2025 03:10:02 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help to review this patch? Thanks!
> 
> [JDK-8366588](https://bugs.openjdk.org/browse/JDK-8366588) adds a vectorapi test for VectorMask.laneIsSet, which we can also enable on RISC-V.
> 
> ### Test (fastdebug)
> - [x] Run compiler/vectorapi/VectorMaskLaneIsSetTest.java on k1, k230 and sg2042

Thanks!

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27181#pullrequestreview-3204337742

From galder at openjdk.org  Wed Sep 10 04:30:21 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Wed, 10 Sep 2025 04:30:21 GMT
Subject: RFR: 8366845: C2 SuperWord: wrong VectorCast after
 VectorReinterpret with swapped src/dst type
In-Reply-To: <h8Wj-7MocJpnnBqCE_UUJvMMEYKUz5Xv6imVz-Q7ziA=.87f4f694-815b-47b9-ab1f-7916da64ea8a@github.com>
References: <VznBrJ_glHHA4wRo0joE4KO7c5J1F5z7W2u-px1uQzQ=.6cc5a46c-8731-4612-acda-9aab0e177f5a@github.com>
 <wM2-jj9R6aEVh5NPIoI3ycOdwjpABPLXAkzoA9Xa_8I=.d5042609-8733-4a46-9cba-220fd11e44b8@github.com>
 <h8Wj-7MocJpnnBqCE_UUJvMMEYKUz5Xv6imVz-Q7ziA=.87f4f694-815b-47b9-ab1f-7916da64ea8a@github.com>
Message-ID: <ovDvN-ev4E6SpCTC9InByqNpnNwutgUKpee9b_X3jnc=.12c99a07-0fa8-4264-8ec8-e7526cd395c3@github.com>

On Fri, 5 Sep 2025 09:00:45 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Makes sense @eme64. Happy with the fix and tests :)
>
> @galderz @iwanowww @TobiHartmann FYI, I filed:
> [JDK-8366965](https://bugs.openjdk.org/browse/JDK-8366965) C2 SuperWord: add more tests for MoveF2I / Float.floatToRawIntBits and friends

Thanks for the quick turnaround @eme64 on this!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27100#issuecomment-3273275432

From epeter at openjdk.org  Wed Sep 10 05:12:12 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 10 Sep 2025 05:12:12 GMT
Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v7]
In-Reply-To: <UVFPHEq4IAzIGZV0N9oMApm-KTc8fWWJxFlL5nVZfxc=.a31180a8-c8a5-4036-a99e-fa4fa38e6a08@github.com>
References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
 <UVFPHEq4IAzIGZV0N9oMApm-KTc8fWWJxFlL5nVZfxc=.a31180a8-c8a5-4036-a99e-fa4fa38e6a08@github.com>
Message-ID: <jeyCSLLqDxnfoeYJ3YtQx0snqyZPDjsulpC2l-b0YDg=.931cb727-bfd1-4afb-873e-66e4f8f7d57b@github.com>

On Tue, 26 Aug 2025 12:46:31 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> This change improves the precision of the `Mod(I|L)Node::Value()` functions.
>> 
>> I reordered the structure a bit. First, we handle constants, afterwards, we handle ranges. The bottom checks seem to be excessive (`Type::BOTTOM` is covered by using `isa_(int|long)()`, the local bottom is just the full range). Given we can even give reasonable bounds if only one input has any bounds, we don't want to return early.
>> The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions.
>> 
>> ### Monotonicity
>> 
>> Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. I therefore use `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range).
>> 
>> ### Testing
>> 
>> I added tests for cases around the relevant bounds. I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something).
>> 
>> Please review and let me know what you think.
>> 
>> ### Other
>> 
>> The `UMod(I|L)Node`s were adjusted to be more in line with their signed variants. This change diverges them again, but similar improvements could be made after #17508.
>> 
>> While experimenting with these changes, I stumbled upon a few things that aren't directly related to this change but might be worth looking into further:
>> - If the divisor is a constant, we directly replace the `Mod(I|L)Node` with more, but less expensive, nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, which means we miss potential cases where the precise type would help, e.g., removing range checks. Would it make sense to delay the replacement?
>> - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd.
>
> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
> 
>   review

Tests pass, approved.

@merykitty @mhaessig your turn.
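The monotonicity argument is easier to see with a small interval sketch in Java. This is not the patch. `ModRangeSketch`, `Range`, `modRange` and `isMonotoneStep` are hypothetical names, and ranges are plain `[lo, hi]` pairs. The `x % d` bound is deliberately simple and conservative: the sign follows the dividend, and the magnitude stays below the largest `|d|`. The sketch reproduces the `-2..2` example. It also shows why going from `POS` to `-2..2` breaks CCP's widen-only rule, while going from `ZERO` does not.

```java
// Simplified interval model for illustrating Mod(I|L)Node::Value reasoning.
// Not C2's implementation: ranges are plain [lo, hi] longs (used here for int inputs),
// and the bound below is a conservative approximation chosen for clarity.
final class ModRangeSketch {
    record Range(long lo, long hi) {
        Range {
            if (lo > hi) throw new IllegalArgumentException("empty range");
        }

        // Meet in C2's sense: the smallest range containing both inputs.
        Range meet(Range other) {
            return new Range(Math.min(lo, other.lo), Math.max(hi, other.hi));
        }
    }

    static final Range ZERO = new Range(0, 0);

    // Conservative range of x % d for x in dividend, d in divisor (Java semantics:
    // the result has the sign of x, |x % d| <= |x| and |x % d| < |d|).
    static Range modRange(Range dividend, Range divisor) {
        if (divisor.lo() == 0 && divisor.hi() == 0) {
            // The divisor is always zero, so the operation always throws.
            // Returning ZERO (rather than "non-negative") keeps CCP monotone.
            return ZERO;
        }
        // Largest possible |d| minus one bounds the magnitude of the result.
        long m = Math.max(Math.abs(divisor.lo()), Math.abs(divisor.hi())) - 1;
        long lo = dividend.lo() >= 0 ? 0 : Math.max(dividend.lo(), -m);
        long hi = dividend.hi() <= 0 ? 0 : Math.min(dividend.hi(), m);
        return new Range(lo, hi);
    }

    // During CCP a node's type may only widen: new == old.meet(new).
    static boolean isMonotoneStep(Range oldType, Range newType) {
        return oldType.meet(newType).equals(newType);
    }
}
```

Every result of this bound contains 0. That is why starting from `ZERO` for the always-zero divisor lets each later CCP step only widen.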

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/25254#pullrequestreview-3204439034

From epeter at openjdk.org  Wed Sep 10 05:16:10 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 10 Sep 2025 05:16:10 GMT
Subject: RFR: 8367293: RISC-V: enable vectorapi test for
 VectorMask.laneIsSet
In-Reply-To: <EWSDz9fKUgj_I0INgH88Veng-6sK1t48_tRkFc1pqA4=.3b35a3ab-f369-4c36-8c78-06daee52a8a6@github.com>
References: <EWSDz9fKUgj_I0INgH88Veng-6sK1t48_tRkFc1pqA4=.3b35a3ab-f369-4c36-8c78-06daee52a8a6@github.com>
Message-ID: <x-j1fjl4LRf6n_Hp0rcQZ01UuO3zSjpyBbkIM9hdcz4=.9e8a0615-0645-4606-a758-07bba229cb57@github.com>

On Wed, 10 Sep 2025 03:10:02 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help to review this patch? Thanks!
> 
> [JDK-8366588](https://bugs.openjdk.org/browse/JDK-8366588) adds a vectorapi test for VectorMask.laneIsSet, which we can also enable on RISC-V.
> 
> ### Test (fastdebug)
> - [x] Run compiler/vectorapi/VectorMaskLaneIsSetTest.java on k1, k230 and sg2042

Looks reasonable :)

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27181#pullrequestreview-3204446793

From djelinski at openjdk.org  Wed Sep 10 06:08:21 2025
From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=)
Date: Wed, 10 Sep 2025 06:08:21 GMT
Subject: RFR: 8366984: Remove delay slot support [v3]
In-Reply-To: <sM4M2ODWvAXO0IFdJrjCqc5_gdaa2istbhQiCzuVXCM=.ae02757b-6d56-4a31-93c6-794e67c038cc@github.com>
References: <msZk11XqMQy93X0WkadtY0MP-sB74iIsygq2iXqFc_I=.e1c48e06-4b7f-4201-98d0-09ec2ecab91e@github.com>
 <vgpJoJvUnSIN4QlIhrFGV81HVURYnl0_xLd4AATHhOY=.7b6f883f-5fa3-49ad-b2ac-5c454982751d@github.com>
 <sM4M2ODWvAXO0IFdJrjCqc5_gdaa2istbhQiCzuVXCM=.ae02757b-6d56-4a31-93c6-794e67c038cc@github.com>
Message-ID: <_tx-ASKQdoHNnXSOi30eyjBgbDtsMY_WaoRPNuqrX80=.f65d62bd-dd4a-448b-b466-f76ca8d01112@github.com>

On Wed, 10 Sep 2025 01:03:57 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Daniel Jeliński has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits:
>> 
>>  - Merge remote-tracking branch 'origin/master' into delay-slot
>>  - Revert scope_desc change, breaks macos-aarch64
>>  - Remove remaining comments
>>  - Update copyright
>>  - Remove commented out code
>>  - Remove unused variables
>>  - Comment out unused _unconditional_delay_slot
>>  - Remove bundle flags
>>  - Remove delay slot support from ADL
>>  - Clean up delay slot remnants from arm32 code
>>  - ... and 4 more: https://git.openjdk.org/jdk/compare/cc6d34b2...fb68b5a8
>
> src/hotspot/share/runtime/sharedRuntime.cpp line 3505:
> 
>> 3503:         nm = cb->as_nmethod();
>> 3504:         method = nm->method();
>> 3505:         for (ScopeDesc *sd = nm->scope_desc_near(fr.pc()); sd != nullptr; sd = sd->sender()) {
> 
> It's tempting to try to change this to scope_desc_at(), but also slightly risky if SPARC isn't the only reason it was needed.  Should we investigate this in a separate RFE?

[I tried](https://github.com/djelinski/jdk/actions/runs/17495967429) before I posted this PR. Apparently scope_desc_near is also needed on macosx-aarch64:

#  Internal Error (nmethod.cpp:668), pid=11090, tid=43267
#  guarantee(pd != nullptr) failed: scope must be present

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27119#discussion_r2335651132

From djelinski at openjdk.org  Wed Sep 10 06:19:29 2025
From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=)
Date: Wed, 10 Sep 2025 06:19:29 GMT
Subject: RFR: 8366984: Remove delay slot support
In-Reply-To: <RQ6vnLOFZjJ22UYrpOMl9rRVntEQwaHa-DLfVQopMfs=.d5dd313a-198c-4678-88f1-9875ec52e008@github.com>
References: <msZk11XqMQy93X0WkadtY0MP-sB74iIsygq2iXqFc_I=.e1c48e06-4b7f-4201-98d0-09ec2ecab91e@github.com>
 <RQ6vnLOFZjJ22UYrpOMl9rRVntEQwaHa-DLfVQopMfs=.d5dd313a-198c-4678-88f1-9875ec52e008@github.com>
Message-ID: <tIfGRAf_jkcy2mA7fUjBj4hf9sANMYsru2UUZFHnyFQ=.d864b318-448d-4066-b708-2428e3b7bed6@github.com>

On Tue, 9 Sep 2025 10:56:04 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> SPARC was the only supported architecture that uses a delay slot. The SPARC port was removed in JDK 15, and the code is effectively dead. Let's remove it.
>> 
>> The changes are no-op on all architectures that do not use delay slots. I still tested tier 1-5 on mach5, no related failures.
>
> Thanks for the answers!
> 
> You'll of course have to merge the dependency, and get a second review :)

Thanks @eme64 and @dean-long for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27119#issuecomment-3273459678

From djelinski at openjdk.org  Wed Sep 10 06:19:30 2025
From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=)
Date: Wed, 10 Sep 2025 06:19:30 GMT
Subject: Integrated: 8366984: Remove delay slot support
In-Reply-To: <msZk11XqMQy93X0WkadtY0MP-sB74iIsygq2iXqFc_I=.e1c48e06-4b7f-4201-98d0-09ec2ecab91e@github.com>
References: <msZk11XqMQy93X0WkadtY0MP-sB74iIsygq2iXqFc_I=.e1c48e06-4b7f-4201-98d0-09ec2ecab91e@github.com>
Message-ID: <aR1xxMzuuFGNIrACCd0zKkV3DxLCi0D0R_lUXdFtoKc=.3a3266b5-8f6d-4371-9827-8abbf4bc2708@github.com>

On Fri, 5 Sep 2025 14:24:50 GMT, Daniel Jeliński <djelinski at openjdk.org> wrote:

> SPARC was the only supported architecture that uses a delay slot. The SPARC port was removed in JDK 15, and the code is effectively dead. Let's remove it.
> 
> The changes are no-op on all architectures that do not use delay slots. I still tested tier 1-5 on mach5, no related failures.

This pull request has now been integrated.

Changeset: b7b01d6f
Author:    Daniel Jeliński <djelinski at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/b7b01d6f564ae34e913ae51bd2f8243a32807136
Stats:     456 lines in 19 files changed: 1 ins; 407 del; 48 mod

8366984: Remove delay slot support

Reviewed-by: dlong, epeter

-------------

PR: https://git.openjdk.org/jdk/pull/27119

From qxing at openjdk.org  Wed Sep 10 06:58:33 2025
From: qxing at openjdk.org (Qizheng Xing)
Date: Wed, 10 Sep 2025 06:58:33 GMT
Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero
 nodes more precise [v13]
In-Reply-To: <Iv0Ou9LLsRec6RaUaWA4pC7Ds7Hu_KXQbalxo71v8iM=.03c2733d-eaeb-4fec-a85b-cef252aa8c68@github.com>
References: <Iv0Ou9LLsRec6RaUaWA4pC7Ds7Hu_KXQbalxo71v8iM=.03c2733d-eaeb-4fec-a85b-cef252aa8c68@github.com>
Message-ID: <yHej4vjLnXluWqjX0z5UsGaQJjyx071QeDqcL3rmOAk=.4ffaf2c4-8962-4989-8142-53b62458bc61@github.com>

> The result of count leading/trailing zeros is always non-negative, and the maximum value is the integer type's size in bits. In previous versions, when C2 cannot know the operand value of a CLZ/CTZ node at compile time, it generates a full-width integer type for its result. This can significantly affect the efficiency of code in some cases.
> 
> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch:
> 
> 
> public static int numberOfNibbles(int i) {
>   int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i);
>   return Math.max((mag + 3) / 4, 1);
> }
> 
> 
> Testing: tier1, IR test
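The following interval-only sketch shows why a tighter CLZ type helps `numberOfNibbles`. `ClzRangeSketch` is a hypothetical name, and the patch's actual `Value()` logic may use more information than `[lo, hi]`. Once `numberOfLeadingZeros` is known to return a value in `[0, 32]`, every later step in `numberOfNibbles` is bounded. The return value is then provably in `[1, 8]`.

```java
// Interval-only sketch of typing Integer.numberOfLeadingZeros(x) for x in [lo, hi],
// and of what that bound buys numberOfNibbles. Hypothetical helper, not C2 code.
final class ClzRangeSketch {
    record Range(int lo, int hi) {}

    static Range clzRange(int lo, int hi) {
        if (hi < 0) {
            return new Range(0, 0);              // sign bit always set: no leading zeros
        }
        if (lo < 0) {
            return new Range(0, Integer.SIZE);   // negatives give 0, zero gives 32
        }
        // On non-negative inputs clz is non-increasing, so the endpoints bound it.
        return new Range(Integer.numberOfLeadingZeros(hi), Integer.numberOfLeadingZeros(lo));
    }

    // Range of max((SIZE - clz + 3) / 4, 1); every step is monotone in clz.
    static Range nibblesRange(Range clz) {
        int magLo = Integer.SIZE - clz.hi();
        int magHi = Integer.SIZE - clz.lo();
        return new Range(Math.max((magLo + 3) / 4, 1), Math.max((magHi + 3) / 4, 1));
    }

    // Same code as in the PR description.
    static int numberOfNibbles(int i) {
        int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i);
        return Math.max((mag + 3) / 4, 1);
    }
}
```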

Qizheng Xing has updated the pull request incrementally with two additional commits since the last revision:

 - Add random range tests
 - Add more comments to IR test

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25928/files
  - new: https://git.openjdk.org/jdk/pull/25928/files/d09d4cb0..79394a25

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=12
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=11-12

  Stats: 180 lines in 1 file changed: 154 ins; 16 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/25928.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25928/head:pull/25928

PR: https://git.openjdk.org/jdk/pull/25928

From qxing at openjdk.org  Wed Sep 10 06:58:35 2025
From: qxing at openjdk.org (Qizheng Xing)
Date: Wed, 10 Sep 2025 06:58:35 GMT
Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero
 nodes more precise [v10]
In-Reply-To: <bVqqmEXHIoBacby_IoOzCHbAt4nzoS4M6p-QySMh7gc=.57711175-230b-4ce2-85db-c050e4912509@github.com>
References: <Iv0Ou9LLsRec6RaUaWA4pC7Ds7Hu_KXQbalxo71v8iM=.03c2733d-eaeb-4fec-a85b-cef252aa8c68@github.com>
 <y1b0oyJhY7YkAtBpuou3hMv2aSy7SnM1M5Y5QH4oLi4=.6502c4df-4bdf-433f-840c-1de76de82c22@github.com>
 <bVqqmEXHIoBacby_IoOzCHbAt4nzoS4M6p-QySMh7gc=.57711175-230b-4ce2-85db-c050e4912509@github.com>
Message-ID: <mwxOJIaF76Xqln0bzenOODTN7-SY6dQ28dln7qu-KTE=.902baa09-b788-41a0-97ac-f554f6243071@github.com>

On Tue, 19 Aug 2025 14:00:37 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Remove redundant `@require` in IR test
>
> test/hotspot/jtreg/compiler/c2/gvn/TestCountBitsRange.java line 164:
> 
>> 162:         return Long.numberOfTrailingZeros(l) / 8;
>> 163:     }
>> 164: }
> 
> Nice examples! Could you please add a short description to most of them, explaining what you are testing with each? It would help me as a reviewer to see if you cover enough cases.
> 
> I'm also missing some cases where you have non-trivial input ranges. And then verification that the output range is correct.
> 
> You could look at this example:
> https://github.com/openjdk/jdk/pull/25254/files#diff-0e3d89ac8cf0548b69d9bdb0859380bc31de0a772fa7ff211f446a4a5abd4197R220-R248

Added comments for the unit tests, and added random range tests like the example.
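Below is a rough sketch of a random-range test of the kind suggested, written for CTZ. `CtzRangeCheck` is hypothetical and is not the test added to `TestCountBitsRange.java`. It first predicts a range for `Integer.numberOfTrailingZeros` over `[lo, hi]`. It then samples values from `[lo, hi]` and checks that each result falls inside the prediction.

```java
import java.util.Random;

// Hypothetical sketch of a randomized range test: compute a predicted range for
// Integer.numberOfTrailingZeros over [lo, hi], then check sampled values against it.
final class CtzRangeCheck {
    record Range(int lo, int hi) {}

    static Range ctzRange(int lo, int hi) {
        if (lo == hi) {
            int c = Integer.numberOfTrailingZeros(lo);
            return new Range(c, c);
        }
        // Two or more consecutive integers include an odd one, so the minimum is 0.
        if (lo <= 0 && hi >= 0) {
            return new Range(0, Integer.SIZE);   // numberOfTrailingZeros(0) == 32
        }
        // Otherwise the maximum is the largest k such that a multiple of 2^k lies in [lo, hi].
        for (int k = Integer.SIZE - 1; k > 0; k--) {
            long p = 1L << k;
            if (Math.floorDiv((long) hi, p) * p >= lo) {
                return new Range(0, k);
            }
        }
        return new Range(0, 0); // unreachable: the range contains a non-zero even number
    }

    static void check(Random rnd, int iterations) {
        for (int i = 0; i < iterations; i++) {
            int lo = rnd.nextInt();
            // Mix narrow and wide ranges so both the exact and the coarse cases are hit.
            long width = rnd.nextBoolean() ? rnd.nextInt(16) : rnd.nextInt(Integer.MAX_VALUE);
            int hi = (int) Math.min((long) lo + width, Integer.MAX_VALUE);
            Range r = ctzRange(lo, hi);
            long span = (long) hi - lo + 1;
            for (int j = 0; j < 100; j++) {
                int x = (int) (lo + Math.floorMod(rnd.nextLong(), span)); // x in [lo, hi]
                int c = Integer.numberOfTrailingZeros(x);
                if (c < r.lo() || c > r.hi()) {
                    throw new AssertionError("x=" + x + " ctz=" + c + " outside " + r);
                }
            }
        }
    }
}
```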

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2335744000

From qxing at openjdk.org  Wed Sep 10 07:03:02 2025
From: qxing at openjdk.org (Qizheng Xing)
Date: Wed, 10 Sep 2025 07:03:02 GMT
Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero
 nodes more precise [v14]
In-Reply-To: <Iv0Ou9LLsRec6RaUaWA4pC7Ds7Hu_KXQbalxo71v8iM=.03c2733d-eaeb-4fec-a85b-cef252aa8c68@github.com>
References: <Iv0Ou9LLsRec6RaUaWA4pC7Ds7Hu_KXQbalxo71v8iM=.03c2733d-eaeb-4fec-a85b-cef252aa8c68@github.com>
Message-ID: <rfJok2To3wAFUZVTuijpiuD03NQDcR3rouE9TNtoDPM=.f1ebc739-65fe-4c25-9587-5efddec3a0db@github.com>

> The result of count leading/trailing zeros is always non-negative, and the maximum value is the integer type's size in bits. In previous versions, when C2 cannot know the operand value of a CLZ/CTZ node at compile time, it generates a full-width integer type for its result. This can significantly affect the efficiency of code in some cases.
> 
> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch:
> 
> 
> public static int numberOfNibbles(int i) {
>   int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i);
>   return Math.max((mag + 3) / 4, 1);
> }
> 
> 
> Testing: tier1, IR test

Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision:

  Remove redundant import

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25928/files
  - new: https://git.openjdk.org/jdk/pull/25928/files/79394a25..f5d1e53d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=13
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=12-13

  Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/25928.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25928/head:pull/25928

PR: https://git.openjdk.org/jdk/pull/25928

From qxing at openjdk.org  Wed Sep 10 07:18:16 2025
From: qxing at openjdk.org (Qizheng Xing)
Date: Wed, 10 Sep 2025 07:18:16 GMT
Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero
 nodes more precise [v9]
In-Reply-To: <9xCpJGY6CFKPAt4VtDY23_Tr3SE9tUebdMF3pAYWhFA=.281e0b84-bfad-466b-b290-918cf1fa83d1@github.com>
References: <Iv0Ou9LLsRec6RaUaWA4pC7Ds7Hu_KXQbalxo71v8iM=.03c2733d-eaeb-4fec-a85b-cef252aa8c68@github.com>
 <py_UgbCQ3Y7BlN3tQkylQSISyMJ3zHa3VoDP7VK83jY=.e71f52c0-ef80-4f7e-afe8-0e60d33cb785@github.com>
 <dA3mvVbfZcBhR9Yi6HKk9s_7UZ76kI1CkReQFbyDZms=.cc99b4a1-dff5-4202-8936-86301a41e766@github.com>
 <9xCpJGY6CFKPAt4VtDY23_Tr3SE9tUebdMF3pAYWhFA=.281e0b84-bfad-466b-b290-918cf1fa83d1@github.com>
Message-ID: <hg5lN2IyBUXW0t0aLKI_lNs8SXV3Kl-q6Hz0z0-0fOg=.3378a8ac-29d6-4942-939e-4e61933d1c77@github.com>

On Tue, 9 Sep 2025 08:40:35 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Hi @jatin-bhateja, I've added a micro benchmark that includes the `numberOfNibbles` implementation from this PR description and your micro kernel.
>> 
>> Here are my test results on an Intel(R) Xeon(R) Platinum machine:
>> 
>> 
>> # Baseline:
>> Benchmark                                  Mode  Cnt     Score   Error  Units
>> CountLeadingZeros.benchClzLongConstrained  avgt   15  1517.888 ± 5.691  ns/op
>> CountLeadingZeros.benchNumberOfNibbles     avgt   15  1094.422 ± 1.753  ns/op
>> 
>> # This patch:
>> Benchmark                                  Mode  Cnt    Score   Error  Units
>> CountLeadingZeros.benchClzLongConstrained  avgt   15    0.948 ± 0.002  ns/op
>> CountLeadingZeros.benchNumberOfNibbles     avgt   15  942.438 ± 1.742  ns/op
>
> @MaxXSoft Feel free to just ping me again when you want another review :)
> FYI: I'll be on a longer vacation starting in about a week, so don't expect me to respond then.

@eme64 Thank you for your patience and kind reviews! I've updated this patch based on your suggestions.

This patch is now ready for further review.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25928#issuecomment-3273636620

From duke at openjdk.org  Wed Sep 10 07:34:58 2025
From: duke at openjdk.org (erifan)
Date: Wed, 10 Sep 2025 07:34:58 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare
 [v11]
In-Reply-To: <HKiejePdRHy-xJNBBNnw09SHkkOpY0EWaVIvS-xg36E=.55c2c30d-f84b-4ca0-a4a8-a25ffbd31236@github.com>
References: <mWHKJozX3YTvBOjLDUa1_2QQiNm6xVc2rb1FtXcZzPc=.e2405ec4-09ed-46c1-acd5-3f39083fce3f@github.com>
 <HKiejePdRHy-xJNBBNnw09SHkkOpY0EWaVIvS-xg36E=.55c2c30d-f84b-4ca0-a4a8-a25ffbd31236@github.com>
Message-ID: <_ZKvuU_IqxgtXTVqz8yS2XOnItp0mtlemk2CR2p551s=.5c2ce4d5-f851-4acc-9994-adc76813d640@github.com>

On Wed, 9 Jul 2025 06:08:33 GMT, erifan <duke at openjdk.org> wrote:

>> This patch optimizes the following patterns:
>> For integer types:
>> 
>> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> 
>> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt; ncond is the negation of cond.
>> 
>> For float and double types:
>> 
>> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> 
>> cond can be eq or ne.
>> 
>> Benchmarks on an Nvidia Grace machine with 128-bit SVE2, with option `-XX:UseSVE=2`:
>> 
>> Benchmark			Unit	Before		Score Error	After		Score Error	Uplift
>> testCompareEQMaskNotByte	ops/s	7912127.225	2677.289518	10266136.26	8955.008548	1.29
>> testCompareEQMaskNotDouble	ops/s	884737.6799	446.963779	1179760.772	448.031844	1.33
>> testCompareEQMaskNotFloat	ops/s	1765045.787	682.332214	2359520.803	896.305743	1.33
>> testCompareEQMaskNotInt		ops/s	1787221.411	977.743935	2353952.519	960.069976	1.31
>> testCompareEQMaskNotLong	ops/s	895297.1974	673.44808	1178449.02	323.804205	1.31
>> testCompareEQMaskNotShort	ops/s	3339987.002	3415.2226	4712761.965	2110.862053	1.41
>> testCompareGEMaskNotByte	ops/s	7907615.16	4094.243652	10251646.9	9486.699831	1.29
>> testCompareGEMaskNotInt		ops/s	1683738.958	4233.813092	2352855.205	1251.952546	1.39
>> testCompareGEMaskNotLong	ops/s	854496.1561	8594.598885	1177811.493	521.1229	1.37
>> testCompareGEMaskNotShort	ops/s	3341860.309	1578.975338	4714008.434	1681.10365	1.41
>> testCompareGTMaskNotByte	ops/s	7910823.674	2993.367032	10245063.58	9774.75138	1.29
>> testCompareGTMaskNotInt		ops/s	1673393.928	3153.099431	2353654.521	1190.848583	1.4
>> testCompareGTMaskNotLong	ops/s	849405.9159	2432.858159	1177952.041	359.96413	1.38
>> testCompareGTMaskNotShort	ops/s	3339509.141	3339.976585	4711442.496	2673.364893	1.41
>> testCompareLEMaskNotByte	ops/s	7911340.004	3114.69191	10231626.5	27134.20035	1.29
>> testCompareLEMaskNotInt		ops/s	1675812.113	1340.969885	2353255.341	1452.4522	1.4
>> testCompareLEMaskNotLong	ops/s	848862.8036	6564.841731	1177763.623	539.290106	1.38
>> testCompareLEMaskNotShort	ops/s	3324951.54	2380.29473	4712116.251	1544.559684	1.41
>> testCompareLTMaskNotByte	ops/s	7910390.844	2630.861436	10239567.69	6487.441672	1.29
>> testCompareLTMaskNotInt		ops/s	16721...
>
> erifan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update the code comment
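Here is a scalar Java model of the integer rewrite quoted above. The names are hypothetical, masks are simulated as `boolean[]`, and this is not the C2 implementation. XOR-ing a compare mask with an all-true mask gives the same lanes as comparing with the negated condition. This holds for signed and unsigned predicates alike. The last helper shows why the float/double case is restricted to `eq`/`ne`: with NaN, `!(a < b)` and `a >= b` disagree.

```java
// Scalar model of (XorV (VectorMaskCmp a b cond) all-ones) => (VectorMaskCmp a b ncond).
final class MaskNotSketch {
    enum Cond {
        EQ, NE, LT, GE, GT, LE, ULT, UGE, UGT, ULE;

        Cond negate() {
            return switch (this) {
                case EQ -> NE;
                case NE -> EQ;
                case LT -> GE;
                case GE -> LT;
                case GT -> LE;
                case LE -> GT;
                case ULT -> UGE;
                case UGE -> ULT;
                case UGT -> ULE;
                case ULE -> UGT;
            };
        }

        boolean test(int a, int b) {
            int u = Integer.compareUnsigned(a, b);
            return switch (this) {
                case EQ -> a == b;
                case NE -> a != b;
                case LT -> a < b;
                case GE -> a >= b;
                case GT -> a > b;
                case LE -> a <= b;
                case ULT -> u < 0;
                case UGE -> u >= 0;
                case UGT -> u > 0;
                case ULE -> u <= 0;
            };
        }
    }

    // Lane-wise compare, producing a mask.
    static boolean[] compare(int[] a, int[] b, Cond c) {
        boolean[] m = new boolean[a.length];
        for (int i = 0; i < a.length; i++) m[i] = c.test(a[i], b[i]);
        return m;
    }

    // XOR with an all-true mask (Replicate -1 / MaskAll m1) flips every lane.
    static boolean[] xorAllOnes(boolean[] m) {
        boolean[] r = new boolean[m.length];
        for (int i = 0; i < m.length; i++) r[i] = m[i] ^ true;
        return r;
    }

    // For floating point only EQ/NE negate cleanly: with NaN, !(a < b) is not (a >= b).
    static boolean ltNegatesToGe(float a, float b) {
        return !(a < b) == (a >= b);
    }
}
```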

@eme64 Thank you for your patience in reviewing this PR. I'm doing some internal testing and expect to push a new commit next week. I'll be on vacation for the next two days. Thank you!

-------------

PR Review: https://git.openjdk.org/jdk/pull/24674#pullrequestreview-3204218047

From duke at openjdk.org  Wed Sep 10 07:35:03 2025
From: duke at openjdk.org (erifan)
Date: Wed, 10 Sep 2025 07:35:03 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare
 [v11]
In-Reply-To: <Dy9rqrrgUAsownC_lhp5729sObjNAXlCQs6RwOZusCQ=.12827ffc-d4c8-4a98-a309-653af5a97519@github.com>
References: <mWHKJozX3YTvBOjLDUa1_2QQiNm6xVc2rb1FtXcZzPc=.e2405ec4-09ed-46c1-acd5-3f39083fce3f@github.com>
 <HKiejePdRHy-xJNBBNnw09SHkkOpY0EWaVIvS-xg36E=.55c2c30d-f84b-4ca0-a4a8-a25ffbd31236@github.com>
 <Dy9rqrrgUAsownC_lhp5729sObjNAXlCQs6RwOZusCQ=.12827ffc-d4c8-4a98-a309-653af5a97519@github.com>
Message-ID: <zT4GnjtDqBcRgbc-Yd8Ln5JfJfltu4PQu20XMt4z-LI=.5174f54b-727e-4bdb-aa48-ce04b38c7728@github.com>

On Tue, 9 Sep 2025 12:56:40 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> erifan has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update the code comment
>
> src/hotspot/share/opto/vectornode.cpp line 2243:
> 
>> 2241:   if (in1->Opcode() != Op_VectorMaskCmp ||
>> 2242:       in1->outcnt() != 1 ||
>> 2243:       !(in1->as_VectorMaskCmp())->predicate_can_be_negated() ||
> 
> Suggestion:
> 
>       !in1->as_VectorMaskCmp()->predicate_can_be_negated() ||
> 
> The parentheses are unnecessary and only make it harder to read.

Good catch, done.

> src/hotspot/share/opto/vectornode.cpp line 2277:
> 
>> 2275:     res = VectorNode::Ideal(phase, can_reshape);
>> 2276:   }
>> 2277:   return res;
> 
> What if someone comes and wants to add yet another optimization before `VectorNode::Ideal`? Your code layout would give us deeper and deeper nesting. I suggest flattening it like this:
> Suggestion:
> 
> 
>   Node* res = Ideal_XorV_VectorMaskCmp(phase, can_reshape);
>   if (res != nullptr) { return res; }
> 
>   return VectorNode::Ideal(phase, can_reshape);

Makes sense, done.

> test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java line 351:
> 
>> 349:     public void testCompareULEMaskNotLong() {
>> 350:         testCompareMaskNotLong(VectorOperators.ULE);
>> 351:     }
> 
> You could consider making the operator a `@Param` next time.
> 
> There are multiple tricks to do that:
> - `test/micro/org/openjdk/bench/vm/compiler/VectorStoreToLoadForwarding.java` using `MethodHandles.constant`
> - Some inner class that has a static final, which is initialized from the non-final `@Param` value.
> - Probably even `StableValue` would work, but I have not yet experimented with it.
> 
> It would be nice if we could do the same with the primitive types, but that's probably not going to work as easily.
> 
> Really just an idea for next time.

Good point, I didn't know about these methods before. I will submit this change in my next commit, thank you.
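Here is a plain-Java sketch of the holder-class trick mentioned above: a `static final` initialized from a non-final value. JMH annotations are left out so the sketch stays self-contained. `opParam` stands in for an `@Param` value. As an assumption, a `@Setup` method would copy that value into the static field before the holder class is first touched. Class initialization happens on first access, so `Holder.OP` captures the parameter value. From then on, the JIT can treat it as a constant.

```java
// Sketch of the "static final holder" trick for turning a non-final benchmark
// parameter into a JIT constant. All names are hypothetical.
final class ParamHolderSketch {
    enum Op { EQ, NE, LT }

    // Stand-in for a non-final @Param value, set before the measured code first runs.
    static String opParam = "EQ";

    // Initialized on first access, so OP captures whatever opParam holds at that moment.
    static final class Holder {
        static final Op OP = Op.valueOf(opParam);
    }

    static boolean compare(int a, int b) {
        return switch (Holder.OP) {
            case EQ -> a == b;
            case NE -> a != b;
            case LT -> a < b;
        };
    }
}
```

The important ordering constraint is that `opParam` must be written before anything touches `Holder`. Otherwise `OP` is frozen with the default value.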

> test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java line 366:
> 
>> 364:     public void testCompareNEMaskNotFloat() {
>> 365:         testCompareMaskNotFloat(VectorOperators.NE);
>> 366:     }
> 
> You could still add the other comparisons as well, so we can see the performance difference. Very optional, feel free to ignore this suggestion.

Sounds good, this will be added with the above change.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2335413222
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2335421260
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2335825557
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2335827904

From epeter at openjdk.org  Wed Sep 10 07:45:53 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 10 Sep 2025 07:45:53 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare
 [v7]
In-Reply-To: <FFyeak7o5Plkg2ljHZD05VetZ9uI81UnZN1sc65ZqAg=.201bccb4-361c-4869-baac-d73c49f5f8d7@github.com>
References: <mWHKJozX3YTvBOjLDUa1_2QQiNm6xVc2rb1FtXcZzPc=.e2405ec4-09ed-46c1-acd5-3f39083fce3f@github.com>
 <dCstHcUFS9A79fKEf3RWnPrxvnzKjyVfbBzyT_iyzYo=.19255391-54fb-445e-b7e8-faf016e8a79f@github.com>
 <jc11aMooMRS54e6I3rd0HyobUW38VG_SbP60BoHUu48=.6ad63307-03bb-4171-bfa6-4f40741a1fc6@github.com>
 <NOSjg9nd8YCpTLPchcVXO2KxOzfTmYuxaQHqZhmHGUo=.e98cf933-0c08-4761-8210-75d56ece7542@github.com>
 <tLkj61MwZSaQEeLO3reAqAWfAMbs_hcR4wVXuUNpu5E=.197c558b-665f-4d7d-8f0c-97031a0ccf16@github.com>
 <15TW6hiffz65NhHevPefL_6swSC07UD-GwiJ4tPDtFs=.b83081df-8abd-4756-b4e0-1d969678a0d2@github.com>
 <FFyeak7o5Plkg2ljHZD05VetZ9uI81UnZN1sc65ZqAg=.201bccb4-361c-4869-baac-d73c49f5f8d7@github.com>
Message-ID: <aPgI3IJisrH2EWUdZmjG-3STzFMsUnBDcuzY2060JuQ=.3d5b8674-4233-47cc-b556-fedab2f11359@github.com>

On Wed, 3 Sep 2025 10:09:58 GMT, erifan <duke at openjdk.org> wrote:

>>> Oh I think we still cannot use `BoolTest::negate`, because we cannot instantiate a `BoolTest` object with **unsigned** comparison. `BoolTest::negate` is a non-static function.
>> 
>> I see. Ok. Hmm. I still think that the logic should be in `BoolTest`, because that is where the exact implementation of the enum values is. In that context it is easier to see why `^4` does the negation. And imagine we were ever to change the enum values, then it would be harder to find your code and fix it.
>> 
>> Maybe it could be called `BoolTest::negate_mask(mask btm)` and explain in a comment that both signed and unsigned are supported.
>
> Hi @eme64 @theRealAph @XiaohongGong @fg1417 @shqking ,  could you help take a look at this PR, thanks

@erifan Sounds good. No rush, it takes as long as it takes. I'll soon be on vacation too and may not respond until mid-October.
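The sketch below illustrates why a static `negate_mask` can handle both signed and unsigned masks. Suppose each condition and its negation differ only in bit 2 (the `^4` mentioned above), and unsignedness is a separate flag bit. Then a single XOR negates either kind. The signed values below are only picked to satisfy that relation. The `UNSIGNED` value is an assumption. Check `BoolTest::mask` in HotSpot for the real encoding.

```java
// Illustrative mask encoding: each condition differs from its negation only in bit 2,
// so negation is a single XOR with 4; unsigned variants carry a separate flag bit.
// The values are assumptions for this sketch, not copied from C2's BoolTest.
final class NegateMaskSketch {
    static final int EQ = 0, GT = 1, LT = 3, NE = 4, LE = 5, GE = 7;
    static final int UNSIGNED = 16; // assumed flag value
    static final int ULT = UNSIGNED | LT, UGE = UNSIGNED | GE;
    static final int UGT = UNSIGNED | GT, ULE = UNSIGNED | LE;

    // Static, so it also works for unsigned masks that have no BoolTest instance.
    static int negateMask(int btm) {
        return btm ^ 4; // bit 2 is disjoint from the UNSIGNED flag, which is preserved
    }
}
```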

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-3273732881

From galder at openjdk.org  Wed Sep 10 08:16:22 2025
From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=)
Date: Wed, 10 Sep 2025 08:16:22 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes [v4]
In-Reply-To: <N8oSF-jg2c9nJQFV8maPYVd16rMTNu6Dv6SZ4o0AJ4I=.f9db0f00-e868-4b7e-a824-c5fb2be21afb@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <N8oSF-jg2c9nJQFV8maPYVd16rMTNu6Dv6SZ4o0AJ4I=.f9db0f00-e868-4b7e-a824-c5fb2be21afb@github.com>
Message-ID: <Oz7mhp5z2j1OzIHwP1J2fBFr4rN9qS05vVWYr9HZpS0=.9135e1da-8b53-41a7-abaa-858fb21206c1@github.com>

On Mon, 8 Sep 2025 08:28:51 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
>> https://github.com/openjdk/jdk/pull/20964
>> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
>> 
>> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
>> 
>> ---------------------------------
>> 
>> I have to say: I'm very sorry for this refactoring. I took some decisions in https://github.com/openjdk/jdk/pull/19719 that I'm now partially undoing. I moved too much logic from `SuperWord::output` (now called `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`) to the `VTransform...Node::apply`. https://github.com/openjdk/jdk/pull/19719  was a roughly 1.5k line change, and I took about a 0.3k misstep that I'm now correcting here ;)
>> 
>> I had accidentally made the `VTransformGraph` too close to the `PackSet`, and not close enough to the future vectorized C2 Graph. And that makes some future changes hard.
>> 
>> My vision:
>> - VLoop / VLoopAnalyzer look at the scalar loop and prepare it for SuperWord
>> - SuperWord creates the `PackSet`: some nodes are packed, all others are scalar.
>> - `SuperWordVTransformBuilder` converts the `PackSet` into the `VTransformGraph`
>> - The `VTransformGraph` very closely represents the C2 vectorized loop after vectorization
>>   - It does not need to know which `nodes` it packs, it rather just needs to know how to generate the new vector nodes
>>   - That means it is straightforward to compute cost
>>   - And it also makes optimizations on that graph easier
>>   - And the `apply` methods are simpler too
>> 
>> ----------------------------------
>> 
>> Therefore, the main goal was to make the `VTransform...Node::apply` calls simpler again, and to move the logic back to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>> 
>> One important step to making the `VTransformGraph` less of a `PackSet` is to remove reliance on `nodes` for the vector nodes.
>> 
>> What I did:
>> - Moving a lot of the logic in `VTransformElementWiseVectorNode::apply` to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>>   - Will make it easier to optimize and compute cost in future RFE's.
>> - `VTransformVectorNodePrototype`: packs a lot of the info for `VTransformVectorNode`.
>>   - pass info about `bt`, `vlen`, `sopc` instead of the `pack` -> allows us to eventually remove the dependency on `nodes`.
>> - New vector nodes, they are special cases I split away from ...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fix typo
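Here is a toy Java model of the direction described above. All names are hypothetical, and `VTransformVectorNodePrototype` is only mirrored loosely; this is not HotSpot code. The pack is consulted once to extract the element type, vector length and scalar opcode. The resulting vector node keeps only that information, so later cost or optimization passes never need the scalar `nodes`.

```java
import java.util.List;

// Hypothetical model: the vector node keeps only what it needs to emit and cost the
// vector operation, instead of the list of packed scalar nodes.
final class PrototypeSketch {
    record ScalarNode(int id, String opcode, String basicType) {}

    record VectorNodePrototype(String basicType, int vlen, String scalarOpcode) {
        static VectorNodePrototype fromPack(List<ScalarNode> pack) {
            if (pack.isEmpty()) throw new IllegalArgumentException("empty pack");
            ScalarNode first = pack.get(0);
            for (ScalarNode n : pack) {
                if (!n.opcode().equals(first.opcode()) || !n.basicType().equals(first.basicType())) {
                    throw new IllegalArgumentException("pack is not isomorphic");
                }
            }
            return new VectorNodePrototype(first.basicType(), pack.size(), first.opcode());
        }
    }

    // Built from the prototype alone: no reference to the scalar nodes survives.
    record VectorNode(VectorNodePrototype proto) {
        String describe() {
            return "vector " + proto.scalarOpcode() + " on " + proto.vlen() + " x " + proto.basicType();
        }
    }
}
```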

Changes requested by galder (Author).

src/hotspot/share/opto/vtransform.cpp line 795:

> 793: 
> 794:   VectorNode* vn = nullptr;
> 795:   if (req() <= 3) {

I'm wondering whether, with this change, the `assert(2 <= req() && req() <= 4, "Must have 1-3 inputs");` could be moved and made more specific for each side of the condition.

For example, we know that if we go down the `req() <= 3` route, we have 1-2 inputs, and if we're in the other branch we have at least 3 inputs.

Then, with that in mind, I wonder if we couldn't move `  Node* in3 = (req() >= 4) ? apply_state.transformed_node(in_req(3)) : nullptr;` to be computed only in the `else` and convert it to `Node* in3 = apply_state.transformed_node(in_req(3))`?

-------------

PR Review: https://git.openjdk.org/jdk/pull/27056#pullrequestreview-3204977764
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2335935271

From shade at openjdk.org  Wed Sep 10 08:20:33 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 10 Sep 2025 08:20:33 GMT
Subject: RFR: 8367313: CTW: Execute in AWT headless mode
Message-ID: <d8Vd-xRvi6T6oizZvB3M7abUjT8ifE1n157D24UXGmA=.9370efd0-cc85-4190-bf23-fb7ca62301dd@github.com>

I have been doing CTW parallelization improvements, and noticed that some of the AWT clinits run and initialize the graphics stack. This is awkward for a few reasons:

 1. We might be running in a headless environment, where these clinits could fail, shrinking the CTW testing scope.
 2. There are dependencies in graphics stack initialization that can break: in one case in my parallelization tests, I have seen the VM crash due to an uninitialized AWT lock, because the randomized CTW runner managed to execute clinits in an unusual order. Running in headless mode avoids that path altogether.

I think we should be running CTW tests in AWT headless mode to begin with. 
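The two-line change itself isn't quoted here. The sketch below is a generic illustration of the mechanism, not the patch. Headless mode is controlled by the `java.awt.headless` system property, which is typically passed as `-Djava.awt.headless=true` to the JVM that runs the CTW runner. `HeadlessSketch` is a hypothetical name.

```java
import java.awt.GraphicsEnvironment;

// Generic sketch (not the actual patch): select AWT headless mode before any
// AWT class initializer has a chance to touch the graphics stack.
final class HeadlessSketch {
    static boolean enableHeadless() {
        // Same effect as -Djava.awt.headless=true on the command line, provided it
        // runs before the first use of AWT (the headless decision is cached).
        System.setProperty("java.awt.headless", "true");
        return GraphicsEnvironment.isHeadless();
    }
}
```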

Additional testing:
 - [x] Linux x86_64 server fastdebug, `applications/ctw/modules`

-------------

Commit messages:
 - Fix

Changes: https://git.openjdk.org/jdk/pull/27187/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27187&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367313
  Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/27187.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27187/head:pull/27187

PR: https://git.openjdk.org/jdk/pull/27187

From rcastanedalo at openjdk.org  Wed Sep 10 08:31:48 2025
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Wed, 10 Sep 2025 08:31:48 GMT
Subject: RFR: 8327963: C2: fix construction of memory graph around
 Initialize node to prevent incorrect execution if allocation is removed [v8]
In-Reply-To: <BrNHUWgnhDZWz523gq_a8Smxck7UE0r0gBLQHfydrXk=.d96048bf-497e-426d-bdab-b58e63b1e5c6@github.com>
References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
 <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com>
 <phFEV6ecal3bMYgAt85dr5f6UKm024p2Ssw2l5zDvOQ=.c332a12d-5009-4e99-abc4-e0d58f06a075@github.com>
 <JczlkGMI1ugc2011v3_yecnmAihjcv5YYyixFtvZjvk=.3994dece-26bc-4c73-9850-8f63986b6fc7@github.com>
 <2m1_XtiSsW_LaBRrkX4qv7AKtLOjNgnl4mUp3zisasE=.dda62164-7aa0-4c1a-b83f-fa40ba7902e5@github.com>
 <eMGWpjjtAvxGzXXgDpfqUyz-LHobPg5dEAk99yQYhic=.81804900-b4ae-4b71-9a39-893fa7b6d36c@github.com>
 <LeeKE7VBNvxxD8-1ltyf2CGltyUV90y-ZabbxGVYXZc=.79192936-6954-4b74-a4ec-ead162efe4e2@github.com>
 <4374L3lkQK90wLxxOA7POBmIKNX2DFK-4pO4vj1bkuQ=.5b8d7825-a7f1-497f-ab66-02a85a266659@github.com>
 <QtsENUXeRsA140liru9rjk0KDbNVhKj6qPVU8toDlkI=.4b9eadfe-045e-4bae-a2c8-40c04496cb60@github.com>
 <BrNHUWgnhDZWz523gq_a8Smxck7UE0r0gBLQHfydrXk=.d96048bf-497e-426d-bdab-b58e63b1e5c6@github.com>
Message-ID: <hGGgYXj4IJCGws1HtyYZjSjpi88IemdVUxZO1HaVDdc=.9ee892d7-09ec-4752-a4ad-385ff209c5c0@github.com>

On Fri, 11 Jul 2025 18:20:19 GMT, John R Rose <jrose at openjdk.org> wrote:

> Specifically, if we are using narrow memory projections sometimes, we should be prepared to respect them always.

@rose00 I fully agree with your general argument, but note that my request refers to avoiding redundant MachProj memory projections arising after matching narrow memory projections (such as nodes 56-39 in B6 in the following CFG: https://github.com/user-attachments/files/20477560/after-gcm.pdf), not narrow memory projections per se. I see no use in allowing redundant MachProj memory projections in the IR, whereas their ambiguity increases the risk of introducing new bugs or exposing latent ones, e.g. in anti-dependency analysis. So I am happy that Roland has found a cheap way to prevent them from ever appearing in the IR.

> @rose00 @robcasloz I updated the change with a new way to avoid redundant projections. At matching time, before a `NarrowMemProj` is matched into a `MachProj`, new logic checks whether a `MachProj` already exists. That guarantees that no redundant `MachProj` are ever added. It also performs the new normalization at a major cut-point. What do you think?

That sounds good to me, thank you for enforcing this Roland! I will re-run testing and have a new look at the changeset within the next days.
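To illustrate the get-or-create idea Roland describes (check whether an equivalent projection already exists before creating a new one), here is a minimal C++ sketch. It assumes a projection is identified by the node it projects from and the memory slice it represents; `Proj` and `ProjTable` are hypothetical stand-ins, not HotSpot's `MachProjNode` or matcher API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a matched memory projection (not HotSpot's MachProjNode).
struct Proj {
  int parent_id;  // id of the node the projection hangs off
  int slice;      // memory slice (alias index) the projection represents
};

// Get-or-create table: at most one projection per (parent, slice) pair.
class ProjTable {
 public:
  Proj* get_or_create(int parent_id, int slice) {
    const std::pair<int, int> key(parent_id, slice);
    auto it = table_.find(key);
    if (it != table_.end()) {
      return it->second;  // an equivalent projection exists: reuse it
    }
    storage_.push_back(std::make_unique<Proj>(Proj{parent_id, slice}));
    Proj* p = storage_.back().get();
    table_.emplace(key, p);
    return p;
  }

  std::size_t size() const { return table_.size(); }

 private:
  std::map<std::pair<int, int>, Proj*> table_;
  std::vector<std::unique_ptr<Proj>> storage_;
};
```

Because every lookup goes through one table, a second request for the same (parent, slice) pair can never produce a redundant projection; only a request for a different slice creates a new one.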

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-3273887804
PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-3273891349

From duke at openjdk.org  Wed Sep 10 08:48:59 2025
From: duke at openjdk.org (erifan)
Date: Wed, 10 Sep 2025 08:48:59 GMT
Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of
 vector compress
Message-ID: <tP4CE7n1GaoLjhB6BfpKBNaX_GCTwI-cJjUbanwm4Qg=.eb29b9ce-10b0-4d5c-8407-25deda9c3f6d@github.com>

The AArch64 SVE and SVE2 architectures lack an instruction suitable for subword-type `compress` operations. Therefore, the current implementation uses the 32-bit SVE `compact` instruction to compress subword types by first widening the high and low parts to 32 bits, compressing them, and then narrowing them back to their original type. Finally, the high and low parts are merged using the `index + tbl` instructions.

This approach is significantly slower than on architectures with native support. After evaluating all available AArch64 SVE instructions and experimenting with various implementations (such as looping over the active elements, extraction, and insertion), I confirmed that the existing algorithm is optimal given the instruction set. However, there is still room for optimization in the following two aspects:
1. Merging with `index + tbl` is suboptimal due to the high latency of the `index` instruction.
2. For partial subword types, operations on the upper half are unnecessary because those lanes are invalid.

This pull request introduces the following changes:
1. Replaces `index + tbl` with the `whilelt + splice` instructions, which offer lower latency and higher throughput.
2. Eliminates unnecessary compress operations for partial subword type cases.
3. For `sve_compress_byte`, one fewer temporary register is used, to alleviate potential register pressure.

Benchmark results demonstrate that these changes significantly improve performance.

Benchmarks on an Nvidia Grace machine with 128-bit SVE:

Benchmark                Unit    Before   Error   After    Error   Uplift
Byte128Vector.compress   ops/ms  4846.97  26.23   6638.56  31.60   1.36
Byte64Vector.compress    ops/ms  2447.69  12.95   7167.68  34.49   2.92
Short128Vector.compress  ops/ms  7174.88  40.94   8398.45   9.48   1.17
Short64Vector.compress   ops/ms  3618.72   3.04   8618.22  10.91   2.38


This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, and all tests passed.
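The split-and-merge scheme described above can be modeled in scalar C++ to see why `whilelt + splice` is a valid merge: once each half has been compressed independently, the result is the first `cnt_lo` lanes of the compressed low half followed by the compressed high half. This is a minimal sketch of the lane semantics only, assuming 16-bit lanes and an even lane count; `compress_ref`, `splice_first` and `compress_by_halves` are hypothetical names, and the sketch does not reproduce the actual SVE code generation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Lanes = std::vector<std::int16_t>;
using Mask = std::vector<bool>;

// Reference semantics of compress: active lanes move to the front, the rest are zeroed.
Lanes compress_ref(const Lanes& src, const Mask& m) {
  Lanes out(src.size(), 0);
  std::size_t j = 0;
  for (std::size_t i = 0; i < src.size(); i++) {
    if (m[i]) {
      out[j++] = src[i];
    }
  }
  return out;
}

// Scalar model of splice under a whilelt-style predicate that is true for the first
// `count` lanes: lanes [0, count) come from `a`, the remaining lanes are taken from `b`
// starting at lane 0.
Lanes splice_first(std::size_t count, const Lanes& a, const Lanes& b) {
  Lanes out(a.size());
  for (std::size_t i = 0; i < a.size(); i++) {
    out[i] = (i < count) ? a[i] : b[i - count];
  }
  return out;
}

// Compress each half independently (on SVE, the widen + compact + narrow step),
// then merge the two partial results with a single splice. Assumes an even lane count.
Lanes compress_by_halves(const Lanes& src, const Mask& m) {
  const std::size_t n = src.size();
  const std::size_t half = n / 2;
  Lanes lo(src.begin(), src.begin() + half);
  Lanes hi(src.begin() + half, src.end());
  Mask mlo(m.begin(), m.begin() + half);
  Mask mhi(m.begin() + half, m.end());
  Lanes clo = compress_ref(lo, mlo);
  Lanes chi = compress_ref(hi, mhi);
  std::size_t cnt_lo = 0;  // number of active lanes in the low half
  for (bool active : mlo) {
    cnt_lo += active ? 1 : 0;
  }
  // Zero-pad both partial results to full width, like full-width registers.
  clo.resize(n, 0);
  chi.resize(n, 0);
  return splice_first(cnt_lo, clo, chi);
}
```

The same model hints at the saving for partial subword types: assuming the valid lanes occupy only the low half, compressing the low half alone already gives the full answer, so the high-half work can be skipped.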

-------------

Commit messages:
 - 8366333: AArch64: Enhance SVE subword type implementation of vector compress

Changes: https://git.openjdk.org/jdk/pull/27188/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27188&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8366333
  Stats: 414 lines in 9 files changed: 297 ins; 24 del; 93 mod
  Patch: https://git.openjdk.org/jdk/pull/27188.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27188/head:pull/27188

PR: https://git.openjdk.org/jdk/pull/27188

From epeter at openjdk.org  Wed Sep 10 08:49:09 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 10 Sep 2025 08:49:09 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes [v4]
In-Reply-To: <Oz7mhp5z2j1OzIHwP1J2fBFr4rN9qS05vVWYr9HZpS0=.9135e1da-8b53-41a7-abaa-858fb21206c1@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <N8oSF-jg2c9nJQFV8maPYVd16rMTNu6Dv6SZ4o0AJ4I=.f9db0f00-e868-4b7e-a824-c5fb2be21afb@github.com>
 <Oz7mhp5z2j1OzIHwP1J2fBFr4rN9qS05vVWYr9HZpS0=.9135e1da-8b53-41a7-abaa-858fb21206c1@github.com>
Message-ID: <WfsaKxc58NtKWEfEwf2fmcmhR4OhVAi7OxiTKjZP4Ws=.6954d58d-1006-4874-a82c-cf8f58b49b70@github.com>

On Wed, 10 Sep 2025 08:10:07 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   fix typo
>
> src/hotspot/share/opto/vtransform.cpp line 795:
> 
>> 793: 
>> 794:   VectorNode* vn = nullptr;
>> 795:   if (req() <= 3) {
> 
> I'm wondering if, with this change, the `assert(2 <= req() && req() <= 4, "Must have 1-3 inputs");` call could be moved and made more specific for the two sides of the condition.
> 
> For example, we know that if we go down the `req() <= 3` route, then we have 1-2 inputs? And if we're in the other branch, we have at least 3 inputs.
> 
> Then, with that in mind, I wonder if we couldn't move `  Node* in3 = (req() >= 4) ? apply_state.transformed_node(in_req(3)) : nullptr;` to be computed only in the `else` branch and convert it to `Node* in3 = apply_state.transformed_node(in_req(3))`?

We could. But I'd prefer to do the req assert before I access any inputs, to avoid failing in the input access.

And I also like the parallel pattern of fetching the inputs; moving it inside the if/else would, in my opinion, make it harder to read.

We could also just drop the assert and rely on the asserts in the input fetch.

Personally, I would leave it as I have it now, but I'm open to a majority vote ;)

@chhagedorn What would you prefer?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2336034124

From xgong at openjdk.org  Wed Sep 10 08:57:40 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Wed, 10 Sep 2025 08:57:40 GMT
Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v11]
In-Reply-To: <ezowLR1tocCY7LvboEC3gAfVphplLZH9WcfUgrbiPnk=.0b60331f-89be-4c4f-ab96-380d437d9b74@github.com>
References: <cGkYMFJGc4N5Wwje26vKLpmnV4UpfT8tZpLOeGfosxI=.219cd257-382f-401b-8c15-2e7803ae7b01@github.com>
 <ezowLR1tocCY7LvboEC3gAfVphplLZH9WcfUgrbiPnk=.0b60331f-89be-4c4f-ab96-380d437d9b74@github.com>
Message-ID: <RdS8yVu4Yzdq_NtJn32YuXVspdLiiluqKgQSP8u6AzE=.14d35280-01e5-47fb-81e6-f52e07290079@github.com>

On Thu, 14 Aug 2025 14:01:13 GMT, Mikhail Ablakatov <mablakatov at openjdk.org> wrote:

>> Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, existing ASIMD implementation is used.
>> 
>> Nothing changes for <= 128-bit long vectors, as for those the existing ASIMD implementation is still used directly.
>> 
>> The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reduction micro-benchmarks.
>> 
>> Benchmarks results:
>> 
>> Neoverse-V1 (SVE 256-bit)
>> 
>>   Benchmark                 (size)   Mode   master         PR  Units
>>   ByteMaxVector.MULLanes      1024  thrpt 5447.643  11455.535 ops/ms
>>   ShortMaxVector.MULLanes     1024  thrpt 3388.183   7144.301 ops/ms
>>   IntMaxVector.MULLanes       1024  thrpt 3010.974   4911.485 ops/ms
>>   LongMaxVector.MULLanes      1024  thrpt 1539.137   2562.835 ops/ms
>>   FloatMaxVector.MULLanes     1024  thrpt 1355.551   4158.128 ops/ms
>>   DoubleMaxVector.MULLanes    1024  thrpt 1715.854   3284.189 ops/ms
>> 
>> 
>> Fujitsu A64FX (SVE 512-bit):
>> 
>>   Benchmark                 (size)   Mode   master         PR  Units
>>   ByteMaxVector.MULLanes      1024  thrpt 1091.692   2887.798 ops/ms
>>   ShortMaxVector.MULLanes     1024  thrpt  597.008   1863.338 ops/ms
>>   IntMaxVector.MULLanes       1024  thrpt  510.642   1348.651 ops/ms
>>   LongMaxVector.MULLanes      1024  thrpt  468.878    878.620 ops/ms
>>   FloatMaxVector.MULLanes     1024  thrpt  376.284   2237.564 ops/ms
>>   DoubleMaxVector.MULLanes    1024  thrpt  431.343   1646.792 ops/ms
>
> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   cleanup: start the SVE Integer Misc - Unpredicated section
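The halving strategy in the quoted description (multiply the upper half of the vector into the lower half until 128 bits remain, then hand off to the existing ASIMD code) can be sketched in scalar C++ as follows. This models the arithmetic only, not the generated SVE code. It assumes unsigned integer lanes, where wrapping multiplication is associative and commutative, so regrouping the products does not change the result (the low bits are identical for two's-complement signed lanes). `mul_reduce_by_halving`, `wrap_mul` and the `vec_bits`/`target_bits` parameters are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Wrapping lane multiply, done in 64-bit unsigned arithmetic so that narrow lanes
// never hit signed-int overflow through integer promotion.
template <typename T>
T wrap_mul(T a, T b) {
  return static_cast<T>(static_cast<std::uint64_t>(a) * static_cast<std::uint64_t>(b));
}

// Scalar model of a lane-halving multiply reduction over a vec_bits-wide vector.
template <typename T>
T mul_reduce_by_halving(std::vector<T> v, std::size_t vec_bits, std::size_t target_bits = 128) {
  static_assert(std::is_unsigned<T>::value, "sketch assumes unsigned lanes");
  std::size_t lanes = vec_bits / (8 * sizeof(T));
  const std::size_t target_lanes = target_bits / (8 * sizeof(T));
  assert(v.size() >= lanes);
  // Multiply the upper half into the lower half until the live part fits in target_bits.
  while (lanes > target_lanes) {
    const std::size_t half = lanes / 2;
    for (std::size_t i = 0; i < half; i++) {
      v[i] = wrap_mul(v[i], v[i + half]);
    }
    lanes = half;
  }
  // Fold what is left sequentially (the part the PR hands to the existing ASIMD code).
  T acc = 1;
  for (std::size_t i = 0; i < lanes; i++) {
    acc = wrap_mul(acc, v[i]);
  }
  return acc;
}
```

For 512-bit vectors this needs two halving steps for 32-bit lanes (16 to 8 to 4 lanes) before the 128-bit tail, which mirrors why the change only affects vectors wider than 128 bits.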

The following failure is reported when I run this test on a 512-bit SVE vector length simulator:

test Byte256VectorTests.MULByte256VectorTestsMasked(byte[-i * 5], byte[cornerCaseValue(i)], mask[false]): success [15ms]
 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  Internal Error (/tmp/ci-scripts/jdk-src/src/hotspot/cpu/aarch64/aarch64_vector.ad:3522), pid=299515, tid=299551
 #  assert(length_in_bytes == MaxVectorSize) failed: invalid vector length
 #


The same failure happens in the following tests:

jdk/incubator/vector/Byte256VectorTests.java
jdk/incubator/vector/Int256VectorTests.java
jdk/incubator/vector/Long256VectorTests.java
jdk/incubator/vector/Short256VectorTests.java

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23181#issuecomment-3273984366

From qamai at openjdk.org  Wed Sep 10 10:29:37 2025
From: qamai at openjdk.org (Quan Anh Mai)
Date: Wed, 10 Sep 2025 10:29:37 GMT
Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v7]
In-Reply-To: <UVFPHEq4IAzIGZV0N9oMApm-KTc8fWWJxFlL5nVZfxc=.a31180a8-c8a5-4036-a99e-fa4fa38e6a08@github.com>
References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
 <UVFPHEq4IAzIGZV0N9oMApm-KTc8fWWJxFlL5nVZfxc=.a31180a8-c8a5-4036-a99e-fa4fa38e6a08@github.com>
Message-ID: <W982XWCiEvt6xSrgvoqHxToIp9llsI8mjyMV7S9Ygw4=.4c2df39b-a783-464b-a91e-a6682848cf6e@github.com>

On Tue, 26 Aug 2025 12:46:31 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> This change improves the precision of the `Mod(I|L)Node::Value()` functions.
>> 
>> I reordered the structure a bit. First, we handle constants; afterwards, we handle ranges. The bottom checks seem to be redundant (`Type::BOTTOM` is covered by using `isa_(int|long)()`, and the local bottom is just the full range). Given that we can give reasonable bounds even if only one input has any bounds, we don't want to return early.
>> The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions.
>> 
>> ### Monotonicity
>> 
>> Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. I now use `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range).
>> 
>> ### Testing
>> 
>> I added tests for cases around the relevant bounds. I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something).
>> 
>> Please review and let me know what you think.
>> 
>> ### Other
>> 
>> The `UMod(I|L)Node`s were adjusted to be more in line with their signed variants. This change diverges them again, but similar improvements could be made after #17508.
>> 
>> While experimenting with these changes, I stumbled upon a few things that aren't directly related to this change but might be worth looking into further:
>> - If the divisor is a constant, we directly replace the `Mod(I|L)Node` with more, but less expensive, nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, which means we miss potential cases where this would help, e.g., removing range checks. Would it make sense to delay the replacement?
>> - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd.
>
> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
> 
>   review
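The range reasoning in the quoted description can be sketched as follows, using Java's `%` semantics (the result takes the sign of the dividend, and its magnitude is smaller than the divisor's). This is a hypothetical helper, not the code in `divnode.cpp`. It assumes 32-bit ranges stored in `int64_t` so that `|INT_MIN|` cannot overflow, and it leaves an exact-zero divisor to the caller, since the patch special-cases that input. Every bound only widens as the input ranges widen, so a function of this shape stays monotonic, which is the property the description says the old `POS` result for a zero divisor violated during PhaseCCP.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct Range {
  std::int64_t lo;
  std::int64_t hi;
};

// Conservative bounds for x % y with Java semantics, given x in [x.lo, x.hi] and
// y in [y.lo, y.hi]. Assumes 32-bit values held in int64_t and y != exactly {0}.
Range mod_range(Range x, Range y) {
  assert(!(y.lo == 0 && y.hi == 0) && "exact zero divisor is handled by the caller");
  auto abs64 = [](std::int64_t v) { return v < 0 ? -v : v; };
  // |x % y| <= max|y| - 1 for every non-zero y in the divisor range.
  const std::int64_t m = std::max(abs64(y.lo), abs64(y.hi)) - 1;
  // The result is 0 or has the sign of x, and |x % y| <= |x|, so x's bounds,
  // widened to include 0, also bound the result.
  const std::int64_t lo = std::max(std::min<std::int64_t>(x.lo, 0), -m);
  const std::int64_t hi = std::min(std::max<std::int64_t>(x.hi, 0), m);
  return Range{lo, hi};
}
```

For example, with a dividend in `[-10, 10]` and a divisor of exactly 3, the sketch yields `[-2, 2]`, the same range used in the monotonicity example in the description.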

Nice consolidation also. I have only some small style suggestions.

src/hotspot/share/opto/divnode.cpp line 1220:

> 1218:   // Mod by zero?  Throw exception at runtime!
> 1219:   if (t2 == TypeInteger::zero(bt)) {
> 1220:     return TypeInt::TOP;

`TypeInt::TOP` is actually `Type::TOP`

src/hotspot/share/opto/divnode.cpp line 1225:

> 1223:   const TypeInteger* i1 = t1->isa_integer(bt);
> 1224:   const TypeInteger* i2 = t2->isa_integer(bt);
> 1225:   if (i1 == nullptr || i2 == nullptr) {

If they are not `TOP` here, `isa_integer` should never return `nullptr`; it's better to use an `assert` here.

src/hotspot/share/opto/divnode.cpp line 1269:

> 1267:     hi = MIN2(hi, i1->hi_as_long());
> 1268:   }
> 1269:   return TypeInteger::make(lo, hi, MAX2(i1->_widen,i2->_widen), bt);

Small style: space after comma.

-------------

Marked as reviewed by qamai (Committer).

PR Review: https://git.openjdk.org/jdk/pull/25254#pullrequestreview-3205479330
PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2336282089
PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2336297089
PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2336288184

From epeter at openjdk.org  Wed Sep 10 11:35:32 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 10 Sep 2025 11:35:32 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes [v4]
In-Reply-To: <WfsaKxc58NtKWEfEwf2fmcmhR4OhVAi7OxiTKjZP4Ws=.6954d58d-1006-4874-a82c-cf8f58b49b70@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <N8oSF-jg2c9nJQFV8maPYVd16rMTNu6Dv6SZ4o0AJ4I=.f9db0f00-e868-4b7e-a824-c5fb2be21afb@github.com>
 <Oz7mhp5z2j1OzIHwP1J2fBFr4rN9qS05vVWYr9HZpS0=.9135e1da-8b53-41a7-abaa-858fb21206c1@github.com>
 <WfsaKxc58NtKWEfEwf2fmcmhR4OhVAi7OxiTKjZP4Ws=.6954d58d-1006-4874-a82c-cf8f58b49b70@github.com>
Message-ID: <CqKvnX5BsCqs6D7x856P6XXjFe9_HOHghb9PT_L7R_w=.3356117a-6962-4000-a60d-9c1f7b754872@github.com>

On Wed, 10 Sep 2025 08:46:44 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/vtransform.cpp line 795:
>> 
>>> 793: 
>>> 794:   VectorNode* vn = nullptr;
>>> 795:   if (req() <= 3) {
>> 
>> I'm wondering if, with this change, the `assert(2 <= req() && req() <= 4, "Must have 1-3 inputs");` call could be moved and made more specific for the two sides of the condition.
>> 
>> For example, we know that if we go down the `req() <= 3` route, then we have 1-2 inputs? And if we're in the other branch, we have at least 3 inputs.
>> 
>> Then, with that in mind, I wonder if we couldn't move `  Node* in3 = (req() >= 4) ? apply_state.transformed_node(in_req(3)) : nullptr;` to be computed only in the `else` branch and convert it to `Node* in3 = apply_state.transformed_node(in_req(3))`?
>
> We could. But I'd prefer to do the req assert before I access any inputs, to avoid failing in the input access.
> 
> And I also like the parallel pattern of fetching the inputs; moving it inside the if/else would, in my opinion, make it harder to read.
> 
> We could also just drop the assert and rely on the asserts in the input fetch.
> 
> Personally, I would leave it as I have it now, but I'm open to a majority vote ;)
> 
> @chhagedorn What would you prefer?

I discussed this a bit with @chhagedorn.

He thought I could move down the `Node* in3 = apply_state.transformed_node(in_req(3))`.

If we extend the element-wise ops to cases with yet another input, it may have to be moved up again, but it's fine to move it down for now.

The assert we'll leave where it is, it makes more sense as a precondition. As such, I'll move it to the top of the method.
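The layout agreed here (the `req()` precondition first, then the inputs, with `in3` fetched only in the branch that uses it) can be illustrated with a hypothetical, self-contained C++ sketch. `FakeVTNode` and `apply_sketch` are invented stand-ins; the real method works on `VTransform` nodes and `apply_state.transformed_node(...)`, whose signatures are not reproduced here, and the conditional fetch of `in2` is an assumption of this sketch. As in the discussed assert, `req()` counts slot 0 plus the data inputs, so `2 <= req() <= 4` means 1-3 inputs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a VTransform node. inputs[0] plays the role of slot 0,
// so req() == 1 + number of data inputs.
struct FakeVTNode {
  std::vector<int> inputs;
  unsigned req() const { return static_cast<unsigned>(inputs.size()); }
  int in_req(unsigned i) const { return inputs.at(i); }
};

int apply_sketch(const FakeVTNode& n) {
  // Precondition first, before any input is accessed.
  assert(2 <= n.req() && n.req() <= 4 && "Must have 1-3 inputs");
  const int in1 = n.in_req(1);
  const int in2 = (n.req() >= 3) ? n.in_req(2) : 0;  // 0 stands in for nullptr
  if (n.req() <= 3) {
    return in1 + in2;  // stand-in for building a unary/binary vector node
  } else {
    const int in3 = n.in_req(3);  // fetched only where it is needed, no req() check
    return in1 + in2 + in3;  // stand-in for building a ternary vector node
  }
}
```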

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2336454623

From epeter at openjdk.org  Wed Sep 10 11:42:01 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 10 Sep 2025 11:42:01 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes [v5]
In-Reply-To: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
Message-ID: <ejM4uA86PWPLDhM6MYFtjvARXExH15eBN8hSj6tWGB4=.561493b2-52b6-4706-ae39-ecf653b027d4@github.com>

> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
> https://github.com/openjdk/jdk/pull/20964
> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
> 
> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
> 
> ---------------------------------
> 
> I have to say: I'm very sorry for this refactoring. I took some decisions in https://github.com/openjdk/jdk/pull/19719 that I'm now partially undoing. I moved too much logic from `SuperWord::output` (now called `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`) to the `VTransform...Node::apply`. https://github.com/openjdk/jdk/pull/19719  was a roughly 1.5k line change, and I took about a 0.3k misstep that I'm now correcting here ;)
> 
> I had accidentally made the `VTransformGraph` too close to the `PackSet`, and not close enough to the future vectorized C2 graph. That makes some future changes hard.
> 
> My vision:
> - VLoop / VLoopAnalyzer look at the scalar loop and prepare it for SuperWord
> - SuperWord creates the `PackSet`: some nodes are packed, all others are scalar.
> - `SuperWordVTransformBuilder` converts the `PackSet` into the `VTransformGraph`
> - The `VTransformGraph` very closely represents the C2 vectorized loop after vectorization
>   - It does not need to know which `nodes` it packs, it rather just needs to know how to generate the new vector nodes
>   - That means it is straight-forward to compute cost
>   - And it also makes optimizations on that graph easier
>   - And the `apply` methods are simpler too
> 
> ----------------------------------
> 
> Therefore, the main goal was to make the `VTransform...Node::apply` calls simpler again, and to move the logic back to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
> 
> One important step toward making the `VTransformGraph` less of a `PackSet` is to remove reliance on `nodes` for the vector nodes.
> 
> What I did:
> - Moving a lot of the logic in `VTransformElementWiseVectorNode::apply` to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>   - Will make it easier to optimize and compute cost in future RFEs.
> - `VTransformVectorNodePrototype`: packs a lot of the info for `VTransformVectorNode`.
>   - pass info about `bt`, `vlen`, `sopc` instead of the `pack` -> allows us to eventually remove the dependency on `nodes`.
> - New vector nodes, they are special cases I split away from `VTransformElementWiseVectorNode`:
>   - `VTransformReinterpretVectorN...

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  for Galder

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27056/files
  - new: https://git.openjdk.org/jdk/pull/27056/files/e3fe36ee..f346e69f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27056&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27056&range=03-04

  Stats: 5 lines in 1 file changed: 2 ins; 3 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/27056.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27056/head:pull/27056

PR: https://git.openjdk.org/jdk/pull/27056

From jbhateja at openjdk.org  Wed Sep 10 12:25:27 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 10 Sep 2025 12:25:27 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v4]
In-Reply-To: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
Message-ID: <znRBdCvSKtWJ22IsvVHHQkKQQNBYGBUCKtQ0qreXSGk=.85c56b65-2471-467d-82bf-d486843c8cff@github.com>

> This patch optimizes PopCount value transforms using KnownBits information.
> Following are the results of the micro-benchmark included with the patch
> 
> 
> 
> System: 13th Gen Intel(R) Core(TM) i3-1315U
> 
> Baseline:
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s
> PopCountValueTransform.StockKernelInt         thrpt    2  409295.875          ops/s
> PopCountValueTransform.StockKernelLong        thrpt    2  368025.608          ops/s
> 
> Withopt:
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s
> PopCountValueTransform.StockKernelInt         thrpt    2  418649.269          ops/s
> PopCountValueTransform.StockKernelLong        thrpt    2  381330.221          ops/s
> 
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin
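One way known bits bound a population count: every bit known to be 1 contributes to the count and every bit known to be 0 cannot, so `popcount(x)` lies in `[#known_ones, 32 - #known_zeros]`, and when the two bounds meet the count is a constant. The sketch below is a hypothetical C++ illustration of that idea for 32-bit values, not the patch's code in `countbitsnode.cpp`; `KnownBits32`, `PopRange` and both functions are invented stand-ins. It also derives the known bits of `(x | ONES) & ZEROS`, the masking pattern suggested later in review, where `ZEROS` acts as the AND mask.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical known-bits summary of a 32-bit value: bits set in `zeros` are known
// to be 0, bits set in `ones` are known to be 1 (the masks are disjoint).
struct KnownBits32 {
  std::uint32_t zeros;
  std::uint32_t ones;
};

struct PopRange {
  int lo;
  int hi;
};

// Known-one bits always count; known-zero bits never do.
PopRange popcount_range(KnownBits32 kb) {
  const int known_ones = static_cast<int>(std::bitset<32>(kb.ones).count());
  const int known_zeros = static_cast<int>(std::bitset<32>(kb.zeros).count());
  return PopRange{known_ones, 32 - known_zeros};
}

// Known bits of (x | or_mask) & and_mask for an unknown x: bits outside and_mask are 0,
// bits in both masks are 1, everything else is unknown.
KnownBits32 known_bits_of_or_and(std::uint32_t or_mask, std::uint32_t and_mask) {
  return KnownBits32{~and_mask, or_mask & and_mask};
}
```

When `lo == hi`, as for `(x | 0xF0) & 0xF0`, the count is a constant and the popcount could in principle be folded away; otherwise the range can still feed later comparisons such as the `bitCount(num) >= CON1` checks discussed in review.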

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Update src/hotspot/share/opto/countbitsnode.cpp
  
  Co-authored-by: Emanuel Peter <emanuel.peter at oracle.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27075/files
  - new: https://git.openjdk.org/jdk/pull/27075/files/52ae6bc8..36ecb5d1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=02-03

  Stats: 31 lines in 1 file changed: 0 ins; 12 del; 19 mod
  Patch: https://git.openjdk.org/jdk/pull/27075.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27075/head:pull/27075

PR: https://git.openjdk.org/jdk/pull/27075

From jbhateja at openjdk.org  Wed Sep 10 12:30:21 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 10 Sep 2025 12:30:21 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v3]
In-Reply-To: <88lK21UPhkqWYMU-PNUCMYYH1QWrjiUfftspxZB7GFM=.99f8aa26-0075-4932-a427-054f088d8068@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <eXrwH7c0C-O1WG8EdYsncgayPjViNP0yIyPL6PNOIk0=.697e0e55-56fa-41ff-aaf4-6a3baf7e65d2@github.com>
 <AUuedV6hy3s5kl9HPpeKsn2F60SCA9Boow9Id1hblIk=.4d01a071-5024-464c-8e5b-bcc8c1a435c3@github.com>
 <88lK21UPhkqWYMU-PNUCMYYH1QWrjiUfftspxZB7GFM=.99f8aa26-0075-4932-a427-054f088d8068@github.com>
Message-ID: <7GYE4B_fk2sz0pxSjPgYxpTWz1v4T0-V-oMmBcS0tpY=.658141db-9bac-4f19-876f-f859ae00984b@github.com>

On Tue, 9 Sep 2025 11:46:03 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java line 114:
>> 
>>> 112:         }
>>> 113:         return 1;
>>> 114:     }
>> 
>> Thanks for the tests!
>> 
>> I think it would be quite valuable to have some tests that do not just clamp the range, but also create random `KnownBits`, i.e. with random and/or masks.
>> 
>> For example:
>> `num = (num | ONES) & ZEROS;`
>> 
>> And then you generate `ONES` and `ZEROS` randomly, maybe even using `Generators`?
>> Then round it off with some random range comparisons at the end:
>> `        if (Integer.bitCount(num) >= CON1 && Integer.bitCount(num) <= CON2) {`
>
> Also: how many popcount instructions are left? Shouldn't there be at most 1?

> Thanks for the tests!
> 
> I think it would be quite valuable to have some tests that do not just clamp the range, but also create random `KnownBits`, i.e. with random and/or masks.
> 
> For example: `num = (num | ONES) & ZEROS;`
> 
> And then you generate `ONES` and `ZEROS` randomly, maybe even using `Generators`? Then round it off with some random range comparisons at the end: ` if (Integer.bitCount(num) >= CON1 && Integer.bitCount(num) <= CON2) {`

With random ranges, we will not be able to ascertain the count of PopCountI IR nodes, which is why I created separate tests: ones for complete logic sweeping, and one which retains the PopCountI IR node.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2336597616

From epeter at openjdk.org  Wed Sep 10 12:38:18 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 10 Sep 2025 12:38:18 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes [v6]
In-Reply-To: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
Message-ID: <KyCggwF9j8oRNqLQ9yGGTSS58DbD6H-MVBDojQK83Kw=.3f4c0628-d564-4a71-9c35-95da77c1d62c@github.com>

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  fix include order

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27056/files
  - new: https://git.openjdk.org/jdk/pull/27056/files/f346e69f..afd716e3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27056&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27056&range=04-05

  Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/27056.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27056/head:pull/27056

PR: https://git.openjdk.org/jdk/pull/27056

From epeter at openjdk.org  Wed Sep 10 12:50:00 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 10 Sep 2025 12:50:00 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v3]
In-Reply-To: <7GYE4B_fk2sz0pxSjPgYxpTWz1v4T0-V-oMmBcS0tpY=.658141db-9bac-4f19-876f-f859ae00984b@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <eXrwH7c0C-O1WG8EdYsncgayPjViNP0yIyPL6PNOIk0=.697e0e55-56fa-41ff-aaf4-6a3baf7e65d2@github.com>
 <AUuedV6hy3s5kl9HPpeKsn2F60SCA9Boow9Id1hblIk=.4d01a071-5024-464c-8e5b-bcc8c1a435c3@github.com>
 <88lK21UPhkqWYMU-PNUCMYYH1QWrjiUfftspxZB7GFM=.99f8aa26-0075-4932-a427-054f088d8068@github.com>
 <7GYE4B_fk2sz0pxSjPgYxpTWz1v4T0-V-oMmBcS0tpY=.658141db-9bac-4f19-876f-f859ae00984b@github.com>
Message-ID: <JYdOI7YvcPChDpvScIRPuEEPsxfmNNl2FBljazj_TzU=.db1076c4-7526-4722-91be-dd9d5218bf2b@github.com>

On Wed, 10 Sep 2025 12:27:51 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Also: how many popcount instructions are left? Shouldn't there be at most 1?
>
>> Thanks for the tests!
>> 
>> I think it would be quite valuable to have some tests that do not just clamp the range, but also create random `KnownBits`, i.e. with random and/or masks.
>> 
>> For example: `num = (num | ONES) & ZEROS;`
>> 
>> And then you generate `ONES` and `ZEROS` randomly, maybe even using `Generators`? Then round it off with some random range comparisons at the end: ` if (Integer.bitCount(num) >= CON1 && Integer.bitCount(num) <= CON2) {`
> 
> With random ranges, we will not be able to ascertain the count of PopCountI IR nodes, which is why I created separate tests: ones for complete logic sweeping, and one which retains the PopCountI IR node.

Oh, maybe I missed those "complete logic sweeping tests". Can you please point me to them?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2336651060

From epeter at openjdk.org  Wed Sep 10 12:53:14 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 10 Sep 2025 12:53:14 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes [v4]
In-Reply-To: <Oz7mhp5z2j1OzIHwP1J2fBFr4rN9qS05vVWYr9HZpS0=.9135e1da-8b53-41a7-abaa-858fb21206c1@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <N8oSF-jg2c9nJQFV8maPYVd16rMTNu6Dv6SZ4o0AJ4I=.f9db0f00-e868-4b7e-a824-c5fb2be21afb@github.com>
 <Oz7mhp5z2j1OzIHwP1J2fBFr4rN9qS05vVWYr9HZpS0=.9135e1da-8b53-41a7-abaa-858fb21206c1@github.com>
Message-ID: <Me7kAu1tAlWBUHKNw46-yFQ2Yaz3z57Taz2L4ZuoU4I=.2b6ac1ac-fc08-45d5-aca0-17df1ebeb037@github.com>

On Wed, 10 Sep 2025 08:14:04 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   fix typo
>
> Changes requested by galder (Author).

@galderz I addressed your comment, would you mind having another look?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27056#issuecomment-3274845574

From galder at openjdk.org  Wed Sep 10 13:51:47 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Wed, 10 Sep 2025 13:51:47 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes [v6]
In-Reply-To: <KyCggwF9j8oRNqLQ9yGGTSS58DbD6H-MVBDojQK83Kw=.3f4c0628-d564-4a71-9c35-95da77c1d62c@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <KyCggwF9j8oRNqLQ9yGGTSS58DbD6H-MVBDojQK83Kw=.3f4c0628-d564-4a71-9c35-95da77c1d62c@github.com>
Message-ID: <EZjF0t-HeOjSbhkDduTuNca4LCgsrGR4GCFOZ60V0os=.d721264b-c367-466d-89cf-ae2b854dd7b4@github.com>

On Wed, 10 Sep 2025 12:38:18 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
>> https://github.com/openjdk/jdk/pull/20964
>> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
>> 
>> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
>> 
>> ---------------------------------
>> 
>> I have to say: I'm very sorry for this refactoring. I took some decisions in https://github.com/openjdk/jdk/pull/19719 that I'm now partially undoing. I moved too much logic from `SuperWord::output` (now called `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`) to the `VTransform...Node::apply`. https://github.com/openjdk/jdk/pull/19719  was a roughly 1.5k line change, and I took about a 0.3k misstep that I'm now correcting here ;)
>> 
>> I had accidentally made the `VTransformGraph` too close to the `PackSet`, and not close enough to the future vectorized C2 graph, which makes some future changes hard.
>> 
>> My vision:
>> - VLoop / VLoopAnalyzer look at the scalar loop and prepare it for SuperWord
>> - SuperWord creates the `PackSet`: some nodes are packed, all others are scalar.
>> - `SuperWordVTransformBuilder` converts the `PackSet` into the `VTransformGraph`
>> - The `VTransformGraph` very closely represents the C2 vectorized loop after vectorization
>>   - It does not need to know which `nodes` it packs, it rather just needs to know how to generate the new vector nodes
>>   - That means it is straight-forward to compute cost
>>   - And it also makes optimizations on that graph easier
>>   - And the `apply` methods are simpler too
>> 
>> ----------------------------------
>> 
>> Therefore, the main goal was to make the `VTransform...Node::apply` calls simpler again, and to move the logic back to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>> 
>> One important step to making the `VTransformGraph` less of a `PackSet` is to remove reliance on `nodes` for the vector nodes.
>> 
>> What I did:
>> - Moving a lot of the logic in `VTransformElementWiseVectorNode::apply` to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>>   - Will make it easier to optimize and compute cost in future RFEs.
>> - `VTransformVectorNodePrototype`: packs a lot of the info for `VTransformVectorNode`.
>>   - pass info about `bt`, `vlen`, `sopc` instead of the `pack` -> allows us to eventually remove the dependency on `nodes`.
>> - New vector nodes, they are special cases I split away from ...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fix include order

Nice tidy up @eme64!

-------------

Marked as reviewed by galder (Author).

PR Review: https://git.openjdk.org/jdk/pull/27056#pullrequestreview-3206262311

From galder at openjdk.org  Wed Sep 10 13:51:49 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Wed, 10 Sep 2025 13:51:49 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes [v4]
In-Reply-To: <CqKvnX5BsCqs6D7x856P6XXjFe9_HOHghb9PT_L7R_w=.3356117a-6962-4000-a60d-9c1f7b754872@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <N8oSF-jg2c9nJQFV8maPYVd16rMTNu6Dv6SZ4o0AJ4I=.f9db0f00-e868-4b7e-a824-c5fb2be21afb@github.com>
 <Oz7mhp5z2j1OzIHwP1J2fBFr4rN9qS05vVWYr9HZpS0=.9135e1da-8b53-41a7-abaa-858fb21206c1@github.com>
 <WfsaKxc58NtKWEfEwf2fmcmhR4OhVAi7OxiTKjZP4Ws=.6954d58d-1006-4874-a82c-cf8f58b49b70@github.com>
 <CqKvnX5BsCqs6D7x856P6XXjFe9_HOHghb9PT_L7R_w=.3356117a-6962-4000-a60d-9c1f7b754872@github.com>
Message-ID: <9tZVJNOOTP7iLuJZP4csjmyNXD_bOSyy1rINDUJscwU=.551a8cf7-be56-43d6-b066-3cd481bc1186@github.com>

On Wed, 10 Sep 2025 11:32:21 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> We could. But I'd prefer to do the req assert before I access any inputs, to avoid failing in the input access.
>> 
>> And I also like the parallel pattern of fetching the inputs, moving it inside the if/else would in my opinion make it harder to read.
>> 
>> We could also just drop the assert and rely on the asserts in the input fetch.
>> 
>> Personally, I would leave it as I have it now, but I'm open to a majority vote ;)
>> 
>> @chhagedorn What would you prefer?
>
> I discussed a bit with @chhagedorn .
> 
> He thought I could move down the `Node* in3 = apply_state.transformed_node(in_req(3))`.
> 
> Maybe if we extend the element wise ops to cases with yet another input it will have to be moved up again, but it's fine to move down for now.
> 
> The assert we'll leave where it is, it makes more sense as a precondition. As such, I'll move it to the top of the method.

Sounds good, thanks @eme64

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2336833959

From jbhateja at openjdk.org  Wed Sep 10 14:21:00 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 10 Sep 2025 14:21:00 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v3]
In-Reply-To: <AUuedV6hy3s5kl9HPpeKsn2F60SCA9Boow9Id1hblIk=.4d01a071-5024-464c-8e5b-bcc8c1a435c3@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <eXrwH7c0C-O1WG8EdYsncgayPjViNP0yIyPL6PNOIk0=.697e0e55-56fa-41ff-aaf4-6a3baf7e65d2@github.com>
 <AUuedV6hy3s5kl9HPpeKsn2F60SCA9Boow9Id1hblIk=.4d01a071-5024-464c-8e5b-bcc8c1a435c3@github.com>
Message-ID: <ImVwlXfBW6MwuLRxJQfg8Nrl8TZH0R6mUkk9eh57kNU=.afe0a309-19d2-4d9e-91ae-36b4234e6ee8@github.com>

On Tue, 9 Sep 2025 11:00:24 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update countbitsnode.cpp
>
> test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java line 148:
> 
>> 146: 
>> 147:     public static void main(String[] args) {
>> 148:         TestFramework.runWithFlags("-XX:-TieredCompilation", "-XX:CompileThresholdScaling=0.2");
> 
> Can you explain the need for these flags?
> The TestFramework eventually enqueues for compilation anyway. Or is there something about profiling?

Thanks for prompting an IR framework refresher :-). These options are only relevant in standalone run mode.

> test/micro/org/openjdk/bench/java/lang/PopCountValueTransform.java line 79:
> 
>> 77:         }
>> 78:         return res;
>> 79:     }
> 
> I assume the `stock` kernels are there to show performance if there is no op, the `folding` kernels you hope have the same performance. It would be nice to have one where the `bitCount` does not fold away, just to keep that comparison :)

I see your point. On second thought, since the benchmarks compare the performance of the kernels with and without the optimization, it's better to drop the stock variants and retain only the folding kernels.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2336929724
PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2336929500

From jbhateja at openjdk.org  Wed Sep 10 14:22:26 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 10 Sep 2025 14:22:26 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v5]
In-Reply-To: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
Message-ID: <O4M5GhTlZs8F6on8slsggZ5pyAp1WANEkhEkwrZ7NXU=.126d7b01-d139-4164-b0c8-66ef46c09d77@github.com>

> This patch optimizes PopCount value transforms using KnownBits information.
> The following are the results of the micro-benchmark included with the patch:
> 
> 
> 
> System: 13th Gen Intel(R) Core(TM) i3-1315U
> 
> Baseline:
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s
> 
> With opt:
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s
> 
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin
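
The idea behind the optimization can be illustrated with a minimal Java sketch showing how known bits bound `Integer.bitCount`. Every known-ONE bit must be counted, and no known-ZERO bit can be. The helper `bitCountBounds` and its parameter names are hypothetical and are not taken from the patch, which operates on C2 types in C++.

```java
public class PopCountBounds {
    // Hypothetical helper: bounds of Integer.bitCount(x) for every x whose
    // known-zero bits are clear and whose known-one bits are set.
    // Returns {min, max}.
    static int[] bitCountBounds(int knownZeros, int knownOnes) {
        int min = Integer.bitCount(knownOnes);   // known-one bits are always set
        int max = Integer.bitCount(~knownZeros); // only bits not known to be zero can be set
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        // x = (y | 0x0F) & 0xFF: bits 0..3 are known ONE, bits 8..31 are known ZERO.
        int[] bounds = bitCountBounds(~0xFF, 0x0F);
        System.out.println(bounds[0] + " " + bounds[1]); // prints "4 8"
    }
}
```

With these bounds, a check such as `Integer.bitCount(x) >= 4` on that `x` is always true and could in principle be folded to a constant.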

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  review resoultions

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27075/files
  - new: https://git.openjdk.org/jdk/pull/27075/files/36ecb5d1..f1095b58

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=03-04

  Stats: 29 lines in 2 files changed: 2 ins; 20 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/27075.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27075/head:pull/27075

PR: https://git.openjdk.org/jdk/pull/27075

From jbhateja at openjdk.org  Wed Sep 10 14:24:28 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 10 Sep 2025 14:24:28 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v3]
In-Reply-To: <JYdOI7YvcPChDpvScIRPuEEPsxfmNNl2FBljazj_TzU=.db1076c4-7526-4722-91be-dd9d5218bf2b@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <eXrwH7c0C-O1WG8EdYsncgayPjViNP0yIyPL6PNOIk0=.697e0e55-56fa-41ff-aaf4-6a3baf7e65d2@github.com>
 <AUuedV6hy3s5kl9HPpeKsn2F60SCA9Boow9Id1hblIk=.4d01a071-5024-464c-8e5b-bcc8c1a435c3@github.com>
 <88lK21UPhkqWYMU-PNUCMYYH1QWrjiUfftspxZB7GFM=.99f8aa26-0075-4932-a427-054f088d8068@github.com>
 <7GYE4B_fk2sz0pxSjPgYxpTWz1v4T0-V-oMmBcS0tpY=.658141db-9bac-4f19-876f-f859ae00984b@github.com>
 <JYdOI7YvcPChDpvScIRPuEEPsxfmNNl2FBljazj_TzU=.db1076c4-7526-4722-91be-dd9d5218bf2b@github.com>
Message-ID: <SrTJSVBZJcq9X_awIuwMAUdDfFZ9yaUzufdJt3W7QfM=.4bc569a7-074a-423d-aab8-04bf2d6ac9e0@github.com>

On Wed, 10 Sep 2025 12:47:42 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>>> Thanks for the tests!
>>> 
>>> I think it would be quite valuable to have some tests that do not just clamp the range, but also create random `KnownBits`, i.e. with random and/or masks.
>>> 
>>> For example: `num = (num | ONES) & ZEROS;`
>>> 
>>> And then you generate `ONES` and `ZEROS` randomly, maybe even using `Generators`? Then round it off with some random range comparisons at the end: ` if (Integer.bitCount(num) >= CON1 && Integer.bitCount(num) <= CON2) {`
>> 
>> With random ranges, we will not be able to ascertain the count of PopCountI IR nodes, which is why I created separate tests for complete logic sweeping, and the one which retains the PopCount IR node.
>
> Oh, maybe I missed those "complete logic sweeping tests". Can you please point me to them?

testPopCountElisionInt1 and testPopCountElisionLong1 check for absence of PopCount IR nodes.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2336945012

From jbhateja at openjdk.org  Wed Sep 10 14:30:10 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 10 Sep 2025 14:30:10 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v6]
In-Reply-To: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
Message-ID: <PH8_f63ip4PTYI7b3Gp7B4WEiyNrMEy6GAfgk8KUjNA=.659ab08e-9943-4119-a54e-ee223bbcf575@github.com>

> This patch optimizes PopCount value transforms using KnownBits information.
> The following are the results of the micro-benchmark included with the patch:
> 
> 
> 
> System: 13th Gen Intel(R) Core(TM) i3-1315U
> 
> Baseline:
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s
> 
> With opt:
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s
> 
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Update TestPopCountValueTransforms.java

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27075/files
  - new: https://git.openjdk.org/jdk/pull/27075/files/f1095b58..9e3957de

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=04-05

  Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/27075.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27075/head:pull/27075

PR: https://git.openjdk.org/jdk/pull/27075

From hgreule at openjdk.org  Wed Sep 10 14:30:11 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Wed, 10 Sep 2025 14:30:11 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v3]
In-Reply-To: <SrTJSVBZJcq9X_awIuwMAUdDfFZ9yaUzufdJt3W7QfM=.4bc569a7-074a-423d-aab8-04bf2d6ac9e0@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <eXrwH7c0C-O1WG8EdYsncgayPjViNP0yIyPL6PNOIk0=.697e0e55-56fa-41ff-aaf4-6a3baf7e65d2@github.com>
 <AUuedV6hy3s5kl9HPpeKsn2F60SCA9Boow9Id1hblIk=.4d01a071-5024-464c-8e5b-bcc8c1a435c3@github.com>
 <88lK21UPhkqWYMU-PNUCMYYH1QWrjiUfftspxZB7GFM=.99f8aa26-0075-4932-a427-054f088d8068@github.com>
 <7GYE4B_fk2sz0pxSjPgYxpTWz1v4T0-V-oMmBcS0tpY=.658141db-9bac-4f19-876f-f859ae00984b@github.com>
 <JYdOI7YvcPChDpvScIRPuEEPsxfmNNl2FBljazj_TzU=.db1076c4-7526-4722-91be-dd9d5218bf2b@github.com>
 <SrTJSVBZJcq9X_awIuwMAUdDfFZ9yaUzufdJt3W7QfM=.4bc569a7-074a-423d-aab8-04bf2d6ac9e0@github.com>
Message-ID: <4PssROFHsUv9rYCp9KlszXVzJV4jIxbHSOWKmQ8VA0k=.db8cac2d-23fd-4ad0-ac1e-7c6e2f3c7b8e@github.com>

On Wed, 10 Sep 2025 14:22:10 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Oh, maybe I missed those "complete logic sweeping tests". Can you please point me to them?
>
> testPopCountElisionInt1 and testPopCountElisionLong1 check for absence of PopCount IR nodes.

I think Or and And nodes aren't updated to make use of KnownBits themselves (that generally makes testing based on KnownBits a bit difficult).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2336959670

From epeter at openjdk.org  Wed Sep 10 14:58:05 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 10 Sep 2025 14:58:05 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v3]
In-Reply-To: <4PssROFHsUv9rYCp9KlszXVzJV4jIxbHSOWKmQ8VA0k=.db8cac2d-23fd-4ad0-ac1e-7c6e2f3c7b8e@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <eXrwH7c0C-O1WG8EdYsncgayPjViNP0yIyPL6PNOIk0=.697e0e55-56fa-41ff-aaf4-6a3baf7e65d2@github.com>
 <AUuedV6hy3s5kl9HPpeKsn2F60SCA9Boow9Id1hblIk=.4d01a071-5024-464c-8e5b-bcc8c1a435c3@github.com>
 <88lK21UPhkqWYMU-PNUCMYYH1QWrjiUfftspxZB7GFM=.99f8aa26-0075-4932-a427-054f088d8068@github.com>
 <7GYE4B_fk2sz0pxSjPgYxpTWz1v4T0-V-oMmBcS0tpY=.658141db-9bac-4f19-876f-f859ae00984b@github.com>
 <JYdOI7YvcPChDpvScIRPuEEPsxfmNNl2FBljazj_TzU=.db1076c4-7526-4722-91be-dd9d5218bf2b@github.com>
 <SrTJSVBZJcq9X_awIuwMAUdDfFZ9yaUzufdJt3W7QfM=.4bc569a7-074a-423d-aab8-04bf2d6ac9e0@github.com>
 <4PssROFHsUv9rYCp9KlszXVzJV4jIxbHSOWKmQ8VA0k=.db8cac2d-23fd-4ad0-ac1e-7c6e2f3c7b8e@github.com>
Message-ID: <1J6deqVB9WWQRxzc3oLxXLyIxam61rqx5u5KxZPCtqE=.db2c8f33-4297-47b1-9180-2a732eec8ac1@github.com>

On Wed, 10 Sep 2025 14:26:31 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> testPopCountElisionInt1 and testPopCountElisionLong1 check for absence of PopCount IR nodes.
>
> I think Or and And nodes aren't updated to make use of KnownBits themselves (that generally makes testing based on KnownBits a bit difficult).

Ah, I see. We should do that soon; it would give us a good way to do this kind of verification.
So you can decide whether to add the random KnownBits tests now in anticipation, or not yet. Personally, I would add them so that we catch bugs in the future.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2337043857

From epeter at openjdk.org  Wed Sep 10 14:58:07 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 10 Sep 2025 14:58:07 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v3]
In-Reply-To: <1J6deqVB9WWQRxzc3oLxXLyIxam61rqx5u5KxZPCtqE=.db2c8f33-4297-47b1-9180-2a732eec8ac1@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <eXrwH7c0C-O1WG8EdYsncgayPjViNP0yIyPL6PNOIk0=.697e0e55-56fa-41ff-aaf4-6a3baf7e65d2@github.com>
 <AUuedV6hy3s5kl9HPpeKsn2F60SCA9Boow9Id1hblIk=.4d01a071-5024-464c-8e5b-bcc8c1a435c3@github.com>
 <88lK21UPhkqWYMU-PNUCMYYH1QWrjiUfftspxZB7GFM=.99f8aa26-0075-4932-a427-054f088d8068@github.com>
 <7GYE4B_fk2sz0pxSjPgYxpTWz1v4T0-V-oMmBcS0tpY=.658141db-9bac-4f19-876f-f859ae00984b@github.com>
 <JYdOI7YvcPChDpvScIRPuEEPsxfmNNl2FBljazj_TzU=.db1076c4-7526-4722-91be-dd9d5218bf2b@github.com>
 <SrTJSVBZJcq9X_awIuwMAUdDfFZ9yaUzufdJt3W7QfM=.4bc569a7-074a-423d-aab8-04bf2d6ac9e0@github.com>
 <4PssROFHsUv9rYCp9KlszXVzJV4jIxbHSOWKmQ8VA0k=.db8cac2d-23fd-4ad0-ac1e-7c6e2f3c7b8e@github.com>
 <1J6deqVB9WWQRxzc3oLxXLyIxam61rqx5u5KxZPCtqE=.db2c8f33-4297-47b1-9180-2a732eec8ac1@github.com>
Message-ID: <dGVqRRWs4L6kX61szDT9RzDC8dSrU4GNBabBJKqqX3U=.0f70523d-d5dd-4e6a-91f0-5837fac4c992@github.com>

On Wed, 10 Sep 2025 14:54:37 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I think Or and And nodes aren't updated to make use of KnownBits themselves (that generally makes testing based on KnownBits a bit difficult).
>
> Ah, I see. We should do that soon; it would give us a good way to do this kind of verification.
> So you can decide whether to add the random KnownBits tests now in anticipation, or not yet. Personally, I would add them so that we catch bugs in the future.

> testPopCountElisionInt1 and testPopCountElisionLong1 check for absence of PopCount IR nodes.

@jatin-bhateja But there the clamps are with fixed constants. It would be nice if we also had some tests with randomized constants. We don't need IR tests for those, just result verification.
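
A randomized result-verification test along the lines suggested earlier in the thread (`num = (num | ONES) & ZEROS;` followed by random range comparisons) might look roughly like the Java sketch below. It is not part of the PR. The class name, the fixed seed, and the bit-by-bit reference are assumptions. In a real jtreg/IR-framework test, the kernel would be JIT-compiled and its results compared against the interpreter. Here the kernel is compared against a naive reference implementation instead.

```java
import java.util.Random;

public class RandomKnownBitsPopCount {
    // Fixed seed for reproducibility; a real test would log or vary it.
    static final Random RANDOM = new Random(42);
    // Random masks and bounds. As static final fields they may be treated
    // as constants by C2 once the class is initialized.
    static final int ONES = RANDOM.nextInt();
    static final int ZEROS = RANDOM.nextInt();
    static final int CON1 = RANDOM.nextInt(34) - 1; // -1..32
    static final int CON2 = RANDOM.nextInt(34) - 1; // -1..32

    // Kernel: afterwards, (ONES & ZEROS) are known ONE bits and ~ZEROS are known ZERO bits.
    static boolean test(int num) {
        num = (num | ONES) & ZEROS;
        return Integer.bitCount(num) >= CON1 && Integer.bitCount(num) <= CON2;
    }

    // Reference: same logic with a naive bit-by-bit popcount.
    static boolean reference(int num) {
        num = (num | ONES) & ZEROS;
        int count = 0;
        for (int i = 0; i < 32; i++) {
            count += (num >>> i) & 1;
        }
        return count >= CON1 && count <= CON2;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100_000; i++) {
            int num = RANDOM.nextInt();
            if (test(num) != reference(num)) {
                throw new AssertionError("result mismatch for num = " + num);
            }
            // The popcount must also respect the bounds implied by the masks.
            int count = Integer.bitCount((num | ONES) & ZEROS);
            if (count < Integer.bitCount(ONES & ZEROS) || count > Integer.bitCount(ZEROS)) {
                throw new AssertionError("bounds violated for num = " + num);
            }
        }
        System.out.println("ok");
    }
}
```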

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2337048167

From chagedorn at openjdk.org  Wed Sep 10 15:02:37 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 10 Sep 2025 15:02:37 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes [v6]
In-Reply-To: <KyCggwF9j8oRNqLQ9yGGTSS58DbD6H-MVBDojQK83Kw=.3f4c0628-d564-4a71-9c35-95da77c1d62c@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <KyCggwF9j8oRNqLQ9yGGTSS58DbD6H-MVBDojQK83Kw=.3f4c0628-d564-4a71-9c35-95da77c1d62c@github.com>
Message-ID: <NzN9xvMKhEJDWa3aY4kzTvhWGAtxTmiG79-mIpsB0YQ=.f1126533-d2c0-41bf-bc5f-afb261b75040@github.com>

On Wed, 10 Sep 2025 12:38:18 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
>> https://github.com/openjdk/jdk/pull/20964
>> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
>> 
>> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
>> 
>> ---------------------------------
>> 
>> I have to say: I'm very sorry for this refactoring. I took some decisions in https://github.com/openjdk/jdk/pull/19719 that I'm now partially undoing. I moved too much logic from `SuperWord::output` (now called `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`) to the `VTransform...Node::apply`. https://github.com/openjdk/jdk/pull/19719  was a roughly 1.5k line change, and I took about a 0.3k misstep that I'm now correcting here ;)
>> 
>> I had accidentally made the `VTransformGraph` too close to the `PackSet`, and not close enough to the future vectorized C2 graph, which makes some future changes hard.
>> 
>> My vision:
>> - VLoop / VLoopAnalyzer look at the scalar loop and prepare it for SuperWord
>> - SuperWord creates the `PackSet`: some nodes are packed, all others are scalar.
>> - `SuperWordVTransformBuilder` converts the `PackSet` into the `VTransformGraph`
>> - The `VTransformGraph` very closely represents the C2 vectorized loop after vectorization
>>   - It does not need to know which `nodes` it packs, it rather just needs to know how to generate the new vector nodes
>>   - That means it is straight-forward to compute cost
>>   - And it also makes optimizations on that graph easier
>>   - And the `apply` methods are simpler too
>> 
>> ----------------------------------
>> 
>> Therefore, the main goal was to make the `VTransform...Node::apply` calls simpler again, and to move the logic back to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>> 
>> One important step to making the `VTransformGraph` less of a `PackSet` is to remove reliance on `nodes` for the vector nodes.
>> 
>> What I did:
>> - Moving a lot of the logic in `VTransformElementWiseVectorNode::apply` to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>>   - Will make it easier to optimize and compute cost in future RFEs.
>> - `VTransformVectorNodePrototype`: packs a lot of the info for `VTransformVectorNode`.
>>   - pass info about `bt`, `vlen`, `sopc` instead of the `pack` -> allows us to eventually remove the dependency on `nodes`.
>> - New vector nodes, they are special cases I split away from ...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fix include order

Still good!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27056#pullrequestreview-3206586599

From epeter at openjdk.org  Wed Sep 10 15:17:09 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 10 Sep 2025 15:17:09 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v3]
In-Reply-To: <4PssROFHsUv9rYCp9KlszXVzJV4jIxbHSOWKmQ8VA0k=.db8cac2d-23fd-4ad0-ac1e-7c6e2f3c7b8e@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <eXrwH7c0C-O1WG8EdYsncgayPjViNP0yIyPL6PNOIk0=.697e0e55-56fa-41ff-aaf4-6a3baf7e65d2@github.com>
 <AUuedV6hy3s5kl9HPpeKsn2F60SCA9Boow9Id1hblIk=.4d01a071-5024-464c-8e5b-bcc8c1a435c3@github.com>
 <88lK21UPhkqWYMU-PNUCMYYH1QWrjiUfftspxZB7GFM=.99f8aa26-0075-4932-a427-054f088d8068@github.com>
 <7GYE4B_fk2sz0pxSjPgYxpTWz1v4T0-V-oMmBcS0tpY=.658141db-9bac-4f19-876f-f859ae00984b@github.com>
 <JYdOI7YvcPChDpvScIRPuEEPsxfmNNl2FBljazj_TzU=.db1076c4-7526-4722-91be-dd9d5218bf2b@github.com>
 <SrTJSVBZJcq9X_awIuwMAUdDfFZ9yaUzufdJt3W7QfM=.4bc569a7-074a-423d-aab8-04bf2d6ac9e0@github.com>
 <4PssROFHsUv9rYCp9KlszXVzJV4jIxbHSOWKmQ8VA0k=.db8cac2d-23fd-4ad0-ac1e-7c6e2f3c7b8e@github.com>
Message-ID: <VJD8EJAuA7CEMNrXXjYJkmZzfu8SDFsN1-zY_C-toGE=.fc86c304-ba81-4d1c-b5a8-06f9d9e588ed@github.com>

On Wed, 10 Sep 2025 14:26:31 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> testPopCountElisionInt1 and testPopCountElisionLong1 check for absence of PopCount IR nodes.
>
> I think Or and And nodes aren't updated to make use of KnownBits themselves (that generally makes testing based on KnownBits a bit difficult).

@SirYwell @jatin-bhateja I filed an RFE for And / Or. I think these would be really important to do soon, because any other KnownBits optimization relies on those working for verification (generating inputs and verifying outputs). https://bugs.openjdk.org/browse/JDK-8367341

@SirYwell @jatin-bhateja @merykitty I linked this issue here to the KnownBits RFE, to make sure we keep track of all KnownBits extensions. Can you please help me with linking any other RFEs that have already been filed or come up in the future? It would help track progress and avoid duplicated work.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2337091504
PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2337099580

From rasbold at openjdk.org  Wed Sep 10 15:52:54 2025
From: rasbold at openjdk.org (Chuck Rasbold)
Date: Wed, 10 Sep 2025 15:52:54 GMT
Subject: RFR: 8366118: DontCompileHugeMethods is not respected with
 -XX:-TieredCompilation [v5]
In-Reply-To: <kaYxfqjgve0ZNzFAQ3P4s3tJjsaiN3StSdDVj7C71gs=.acbbf467-df7c-4ec8-a6a4-b997ce25c163@github.com>
References: <KeZJJ4fwtgBQCPof9uvquApw5fUZ75EjT2KhM3_ZpEU=.e28f986e-61b8-40d4-b812-58cfae0e2270@github.com>
 <kaYxfqjgve0ZNzFAQ3P4s3tJjsaiN3StSdDVj7C71gs=.acbbf467-df7c-4ec8-a6a4-b997ce25c163@github.com>
Message-ID: <RJmItENr5UuFL6PUeDayfLSN6VanULBQA5FnoRjpEcY=.15b2b462-fe03-46ca-bcec-be64ce6cadeb@github.com>

On Fri, 29 Aug 2025 23:12:18 GMT, Man Cao <manc at openjdk.org> wrote:

>> Hi,
>> 
>> Could anyone review this change that fixes https://bugs.openjdk.org/browse/JDK-8366118? When this bug happens, it is difficult or almost impossible to debug due to the lack of stack trace, hs-err log or core dump. Fortunately we are also experimenting with sigaltstack for https://bugs.openjdk.org/browse/JDK-8364654, and it helped immensely to identify the root cause.
>> 
>> I will also try adding a test case for DontCompileHugeMethods under -XX:-TieredCompilation.
>> 
>> -Man
>
> Man Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8366118-DontCompileHugeMethods
>  - Add -Xbatch to test
>  - Use List.of in test
>  - Add a jtreg test
>  - 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation

Marked as reviewed by rasbold (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/26932#pullrequestreview-3206801879

From mablakatov at openjdk.org  Wed Sep 10 15:57:54 2025
From: mablakatov at openjdk.org (Mikhail Ablakatov)
Date: Wed, 10 Sep 2025 15:57:54 GMT
Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v11]
In-Reply-To: <8T7swIJ17tLLg4FO_N5UZ0HsMYrz31ywBiMZohefGTE=.386eeb0d-8541-4c35-8a68-6caf31ea867e@github.com>
References: <cGkYMFJGc4N5Wwje26vKLpmnV4UpfT8tZpLOeGfosxI=.219cd257-382f-401b-8c15-2e7803ae7b01@github.com>
 <ezowLR1tocCY7LvboEC3gAfVphplLZH9WcfUgrbiPnk=.0b60331f-89be-4c4f-ab96-380d437d9b74@github.com>
 <8T7swIJ17tLLg4FO_N5UZ0HsMYrz31ywBiMZohefGTE=.386eeb0d-8541-4c35-8a68-6caf31ea867e@github.com>
Message-ID: <HvM3USKjA_LPjNtO3v8s-gQRbDnhV8-bGRWF5tPjfG4=.1017c4e6-c08d-46ee-8d7d-c70c27754cf5@github.com>

On Tue, 9 Sep 2025 06:51:00 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

> Do you intend to ignore ops with >32B vector size? May I ask the reason?

The reason is the lack of relevant hardware. The only publicly available platform implementing 512-bit SVE that I'm aware of is Fujitsu A64FX. I used to have access to that platform but no longer do, which makes it difficult to test and benchmark changes for 512-bit SVE. Stripping that functionality and keeping the implementation within the bounds of 256-bit SVE reduces the complexity of this patch.

> If so, maybe a title like `AArch64: Implement MulReduction for 256-bit SVE` would be more accurate?

Given the state of the PR it might be. Thank you for the suggestion, I'll consider it.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23181#issuecomment-3275566957

From jbhateja at openjdk.org  Wed Sep 10 16:00:37 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 10 Sep 2025 16:00:37 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v3]
In-Reply-To: <4PssROFHsUv9rYCp9KlszXVzJV4jIxbHSOWKmQ8VA0k=.db8cac2d-23fd-4ad0-ac1e-7c6e2f3c7b8e@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <eXrwH7c0C-O1WG8EdYsncgayPjViNP0yIyPL6PNOIk0=.697e0e55-56fa-41ff-aaf4-6a3baf7e65d2@github.com>
 <AUuedV6hy3s5kl9HPpeKsn2F60SCA9Boow9Id1hblIk=.4d01a071-5024-464c-8e5b-bcc8c1a435c3@github.com>
 <88lK21UPhkqWYMU-PNUCMYYH1QWrjiUfftspxZB7GFM=.99f8aa26-0075-4932-a427-054f088d8068@github.com>
 <7GYE4B_fk2sz0pxSjPgYxpTWz1v4T0-V-oMmBcS0tpY=.658141db-9bac-4f19-876f-f859ae00984b@github.com>
 <JYdOI7YvcPChDpvScIRPuEEPsxfmNNl2FBljazj_TzU=.db1076c4-7526-4722-91be-dd9d5218bf2b@github.com>
 <SrTJSVBZJcq9X_awIuwMAUdDfFZ9yaUzufdJt3W7QfM=.4bc569a7-074a-423d-aab8-04bf2d6ac9e0@github.com>
 <4PssROFHsUv9rYCp9KlszXVzJV4jIxbHSOWKmQ8VA0k=.db8cac2d-23fd-4ad0-ac1e-7c6e2f3c7b8e@github.com>
Message-ID: <6eJGadjOxt_uInDZmiRc5MZNefslQT3-bOcsTp2tEe0=.0bdc2d27-d09a-4872-9633-51c2b55d1c18@github.com>

On Wed, 10 Sep 2025 14:26:31 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> testPopCountElisionInt1 and testPopCountElisionLong1 check for absence of PopCount IR nodes.
>
> I think Or and And nodes aren't updated to make use of KnownBits themselves (that generally makes testing based on KnownBits a bit difficult).

> @SirYwell @jatin-bhateja @merykitty I linked this issue here to the KnownBits RFE, to make sure we keep track of all KnownBits extensions. Can you please help me with linking any other RFEs that have already been filed or come up in the future? It would help track progress and avoid duplicate work.

Current And Value transforms:
    - Constant folding when both inputs are constants.
    - Known bits extraction from the input ranges, with four possible cases:

        _lo     _hi     Known bits
        < 0     < 0     Possible: the common prefix of _lo and _hi yields known ZERO and ONE bits.
        >= 0    < 0     Not applicable: the lower bound would be greater than the upper bound.
        < 0     >= 0    None: _lo and _hi share no common prefix (their sign bits differ).
        >= 0    >= 0    Possible: the common prefix of _lo and _hi yields known ZERO and ONE bits.


Existing value transforms and canonicalization should furnish known bits in applicable scenarios. 
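
The range-based extraction in the table above can be sketched in Java. When `_lo` and `_hi` have the same sign, signed and unsigned ordering agree, so every value in between shares their common high-order prefix and those bits are known. This is an illustration under that assumption, not the HotSpot implementation. `knownBitsFromRange` is a hypothetical name.

```java
public class RangeKnownBits {
    // Hypothetical helper: known bits shared by every int in [lo, hi] (lo <= hi).
    // Returns {knownZeros, knownOnes}.
    static int[] knownBitsFromRange(int lo, int hi) {
        int diff = lo ^ hi; // bits where the two bounds disagree
        // Common-prefix mask: all bits above the highest differing bit.
        // diff == 0 (a constant) is special-cased because a shift by 32 is a no-op in Java.
        int prefix = (diff == 0) ? -1 : ~(-1 >>> Integer.numberOfLeadingZeros(diff));
        // If the sign bits differ, diff < 0, the shift count is 0 and prefix is 0:
        // no known bits, matching the "_lo < 0, _hi >= 0" row.
        int knownOnes = lo & prefix;
        int knownZeros = ~lo & prefix;
        return new int[] {knownZeros, knownOnes};
    }

    public static void main(String[] args) {
        int[] a = knownBitsFromRange(0x10, 0x1F);
        System.out.println(Integer.toHexString(a[0]) + " " + Integer.toHexString(a[1])); // prints "ffffffe0 10"
        int[] b = knownBitsFromRange(-1, 1);
        System.out.println(b[0] + " " + b[1]); // prints "0 0"
    }
}
```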

For a full solution, we can add another rule to directly AND the known ZERO and ONE bits of participating inputs, and let canonicalization compute the resultant type and clean up existing handling in Value transforms and explicit constant folding
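
A minimal Java sketch of such a rule follows, again with hypothetical names. For `And`, a result bit is known ZERO if it is known ZERO in either input, so the ZERO masks combine with a bitwise OR. A result bit is known ONE only if it is known ONE in both inputs, so the ONE masks combine with a bitwise AND. The tightest signed range is then derived from the resulting known bits, which is roughly the role canonicalization would play.

```java
public class AndKnownBits {
    // Hypothetical transfer function for And on known bits.
    // Returns {knownZeros, knownOnes} of (a & b).
    static int[] and(int zerosA, int onesA, int zerosB, int onesB) {
        int zeros = zerosA | zerosB; // a 0 in either input forces a 0
        int ones = onesA & onesB;    // a 1 needs a 1 in both inputs
        return new int[] {zeros, ones};
    }

    // Tightest signed range consistent with the known bits. Returns {lo, hi}.
    static int[] signedRange(int zeros, int ones) {
        int lo = ones;   // unknown bits cleared
        int hi = ~zeros; // unknown bits set
        if (((zeros | ones) & Integer.MIN_VALUE) == 0) {
            // Sign bit unknown: the minimum is negative, the maximum non-negative.
            lo |= Integer.MIN_VALUE;
            hi &= Integer.MAX_VALUE;
        }
        return new int[] {lo, hi};
    }

    public static void main(String[] args) {
        // a in [0x10, 0x1F]: zeros = 0xFFFFFFE0, ones = 0x10. b is the constant 0x0C.
        int[] r = and(0xFFFFFFE0, 0x10, ~0x0C, 0x0C);
        int[] range = signedRange(r[0], r[1]);
        System.out.println(range[0] + " " + range[1]); // prints "0 12"
    }
}
```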

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2337215333

From epeter at openjdk.org  Wed Sep 10 16:06:04 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 10 Sep 2025 16:06:04 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v3]
In-Reply-To: <6eJGadjOxt_uInDZmiRc5MZNefslQT3-bOcsTp2tEe0=.0bdc2d27-d09a-4872-9633-51c2b55d1c18@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <eXrwH7c0C-O1WG8EdYsncgayPjViNP0yIyPL6PNOIk0=.697e0e55-56fa-41ff-aaf4-6a3baf7e65d2@github.com>
 <AUuedV6hy3s5kl9HPpeKsn2F60SCA9Boow9Id1hblIk=.4d01a071-5024-464c-8e5b-bcc8c1a435c3@github.com>
 <88lK21UPhkqWYMU-PNUCMYYH1QWrjiUfftspxZB7GFM=.99f8aa26-0075-4932-a427-054f088d8068@github.com>
 <7GYE4B_fk2sz0pxSjPgYxpTWz1v4T0-V-oMmBcS0tpY=.658141db-9bac-4f19-876f-f859ae00984b@github.com>
 <JYdOI7YvcPChDpvScIRPuEEPsxfmNNl2FBljazj_TzU=.db1076c4-7526-4722-91be-dd9d5218bf2b@github.com>
 <SrTJSVBZJcq9X_awIuwMAUdDfFZ9yaUzufdJt3W7QfM=.4bc569a7-074a-423d-aab8-04bf2d6ac9e0@github.com>
 <4PssROFHsUv9rYCp9KlszXVzJV4jIxbHSOWKmQ8VA0k=.db8cac2d-23fd-4ad0-ac1e-7c6e2f3c7b8e@github.com>
 <6eJGadjOxt_uInDZmiRc5MZNefslQT3-bOcsTp2tEe0=.0bdc2d27-d09a-4872-9633-51c2b55d1c18@github.com>
Message-ID: <qlo1opmYZQ8czfnsycZObRee3wbq_4nVyLsi2znLv9k=.2c6b1641-a784-4b44-afb1-faa68b733352@github.com>

On Wed, 10 Sep 2025 15:55:42 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

> For a full solution, we can add another rule to directly AND the known ZERO and ONE bits of participating inputs, and let canonicalization compute the resultant type and clean up existing handling in Value transforms and explicit constant folding

Yes, this is what we would end up with after https://bugs.openjdk.org/browse/JDK-8367341 . But I think currently, there is no good way to set / get bits directly.

Using signed comparisons as you mentioned is only of limited help. But it is what we have for now.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2337233209

From manc at openjdk.org  Wed Sep 10 17:45:27 2025
From: manc at openjdk.org (Man Cao)
Date: Wed, 10 Sep 2025 17:45:27 GMT
Subject: RFR: 8366118: DontCompileHugeMethods is not respected with
 -XX:-TieredCompilation [v5]
In-Reply-To: <kaYxfqjgve0ZNzFAQ3P4s3tJjsaiN3StSdDVj7C71gs=.acbbf467-df7c-4ec8-a6a4-b997ce25c163@github.com>
References: <KeZJJ4fwtgBQCPof9uvquApw5fUZ75EjT2KhM3_ZpEU=.e28f986e-61b8-40d4-b812-58cfae0e2270@github.com>
 <kaYxfqjgve0ZNzFAQ3P4s3tJjsaiN3StSdDVj7C71gs=.acbbf467-df7c-4ec8-a6a4-b997ce25c163@github.com>
Message-ID: <zhghWGIcsbNUta67ZZRy7ve5B9KoxruOSQA-aETgJCQ=.097f1491-5f91-4ca8-8eab-f6829b9d47ee@github.com>

On Fri, 29 Aug 2025 23:12:18 GMT, Man Cao <manc at openjdk.org> wrote:

>> Hi,
>> 
>> Could anyone review this change that fixes https://bugs.openjdk.org/browse/JDK-8366118? When this bug happens, it is difficult or almost impossible to debug due to the lack of stack trace, hs-err log or core dump. Fortunately we are also experimenting with sigaltstack for https://bugs.openjdk.org/browse/JDK-8364654, and it helped immensely to identify the root cause.
>> 
>> I will also try adding a test case for DontCompileHugeMethods under -XX:-TieredCompilation.
>> 
>> -Man
>
> Man Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8366118-DontCompileHugeMethods
>  - Add -Xbatch to test
>  - Use List.of in test
>  - Add a jtreg test
>  - 8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation

Thanks for the review!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26932#issuecomment-3275922256

From manc at openjdk.org  Wed Sep 10 17:45:29 2025
From: manc at openjdk.org (Man Cao)
Date: Wed, 10 Sep 2025 17:45:29 GMT
Subject: Integrated: 8366118: DontCompileHugeMethods is not respected with
 -XX:-TieredCompilation
In-Reply-To: <KeZJJ4fwtgBQCPof9uvquApw5fUZ75EjT2KhM3_ZpEU=.e28f986e-61b8-40d4-b812-58cfae0e2270@github.com>
References: <KeZJJ4fwtgBQCPof9uvquApw5fUZ75EjT2KhM3_ZpEU=.e28f986e-61b8-40d4-b812-58cfae0e2270@github.com>
Message-ID: <j5Cbg2qJ36QHxPUjwYQwN2UrXTmA4ZHjpNZvzPl3WJw=.cee3dc2d-7282-4b51-b543-f249b484f168@github.com>

On Mon, 25 Aug 2025 19:38:23 GMT, Man Cao <manc at openjdk.org> wrote:

> Hi,
> 
> Could anyone review this change that fixes https://bugs.openjdk.org/browse/JDK-8366118? When this bug happens, it is difficult or almost impossible to debug due to the lack of stack trace, hs-err log or core dump. Fortunately we are also experimenting with sigaltstack for https://bugs.openjdk.org/browse/JDK-8364654, and it helped immensely to identify the root cause.
> 
> I will also try adding a test case for DontCompileHugeMethods under -XX:-TieredCompilation.
> 
> -Man

This pull request has now been integrated.

Changeset: 4e2a85f7
Author:    Man Cao <manc at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/4e2a85f7500876d65c36aeaf54f5361a1549e7f5
Stats:     143 lines in 2 files changed: 123 ins; 0 del; 20 mod

8366118: DontCompileHugeMethods is not respected with -XX:-TieredCompilation

Co-authored-by: Chuck Rasbold <rasbold at openjdk.org>
Co-authored-by: Justin King <jcking at openjdk.org>
Reviewed-by: rasbold, iveresov, jiangli

-------------

PR: https://git.openjdk.org/jdk/pull/26932

From jbhateja at openjdk.org  Wed Sep 10 18:15:18 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 10 Sep 2025 18:15:18 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v2]
In-Reply-To: <n-siOErgS2rctFXJMCJCghmKgcF7Zup5Et-PsHNRzIg=.229d78d2-a26c-47b3-b506-d486a2dd17cd@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <qCTL7ytC98tTgipvaLxd9U5mgqBWhI45eS-Gy4_EnSo=.939751de-2adf-45e5-924a-4469de333938@github.com>
 <OGnqrh6sJ6pUldrhttHHkG_tSVVv7vM2So_Q2F9F-wI=.224b58bf-6464-4a16-bbc0-6bf61d904009@github.com>
 <Ome8T1rq6SFBf9AkwRZvjV2UbPPX9EnaEgoTE5oJz7Y=.53fdfa3f-31d4-4a74-8a0c-14317ef5c5f1@github.com>
 <n-siOErgS2rctFXJMCJCghmKgcF7Zup5Et-PsHNRzIg=.229d78d2-a26c-47b3-b506-d486a2dd17cd@github.com>
Message-ID: <0KIGO9Uk5uIHhyFupqt0KvRbLPz_YmTxnR0Q4Bpzakw=.df5451ef-2235-46e0-a472-5909a999976b@github.com>

On Sat, 6 Sep 2025 00:28:18 GMT, Mohamed Issa <missa at openjdk.org> wrote:

>>> @missa-prime Looks like an interesting patch! Do you think you could add some sort of IR test here, to verify that the correct code is generated on AVX10 vs lower AVX?
>> 
>> @eme64 Thanks for the suggestion. This patch doesn't modify any IR though, so I'm not sure what IR test(s) to add. I could modify existing tests (`test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java`, `test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java`, `test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java`) that use IR nodes as dependencies though. Would that be sufficient? Or did you have something else in mind?
>
>> @missa-prime Could you not match on the mach graph? See example: `test/hotspot/jtreg/compiler/vectorapi/VectorMultiplyOpt.java` with `CompilePhase.FINAL_CODE`.
>> 
>> Maybe another `CompilePhase` is better. I have never matched on the mach graph myself, but I wonder if it may be useful here.
> 
> I modified existing vector conversion tests, and I'll add some matching scalar tests to get full coverage.

@missa-prime, please have a look at the following failure with the current patch:

2025-09-10T02:04:00.8424130Z 
2025-09-10T02:04:00.8424221Z Failed IR Rules (4) of Methods (4)
2025-09-10T02:04:00.8424462Z ----------------------------------
2025-09-10T02:04:00.8425011Z 1) Method "public char[] compiler.vectorization.runner.ArrayTypeConvertTest.convertDoubleToChar()" - [Failed IR rules: 1]:
2025-09-10T02:04:00.8426351Z    * @IR rule 3: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#CAST_D2X#_", "> 0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={"avx", "true", "avx10_2", "false"}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
2025-09-10T02:04:00.8427967Z      > Phase "Final Code":
2025-09-10T02:04:00.8428412Z        - counts: Graph contains wrong number of nodes:
2025-09-10T02:04:00.8429037Z          * Constraint 1: "(\d+(\s){2}(castD2X_reg_(av|eve)x.*)+(\s){2}===.*)"
2025-09-10T02:04:00.8429650Z            - Failed comparison: [found] 0 > 0 [given]
2025-09-10T02:04:00.8430146Z            - No nodes matched!
2025-09-10T02:04:00.8430409Z 
2025-09-10T02:04:00.8431165Z 2) Method "public int[] compiler.vectorization.runner.ArrayTypeConvertTest.convertDoubleToInt()" - [Failed IR rules: 1]:
2025-09-10T02:04:00.8433283Z    * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#CAST_D2X#_", "> 0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={"avx", "true", "avx10_2", "false"}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
2025-09-10T02:04:00.8434546Z      > Phase "Final Code":
2025-09-10T02:04:00.8434810Z        - counts: Graph contains wrong number of nodes:
2025-09-10T02:04:00.8435174Z          * Constraint 1: "(\d+(\s){2}(castD2X_reg_(av|eve)x.*)+(\s){2}===.*)"
2025-09-10T02:04:00.8435523Z            - Failed comparison: [found] 0 > 0 [given]
2025-09-10T02:04:00.8435792Z            - No nodes matched!
2025-09-10T02:04:00.8435937Z 
2025-09-10T02:04:00.8436340Z 3) Method "public short[] compiler.vectorization.runner.ArrayTypeConvertTest.convertDoubleToShort()" - [Failed IR rules: 1]:
2025-09-10T02:04:00.8437688Z    * @IR rule 3: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#CAST_D2X#_", "> 0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={"avx", "true", "avx10_2", "false"}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
2025-09-10T02:04:00.8438703Z      > Phase "Final Code":
2025-09-10T02:04:00.8438956Z        - counts: Graph contains wrong number of nodes:
2025-09-10T02:04:00.8439301Z          * Constraint 1: "(\d+(\s){2}(castD2X_reg_(av|eve)x.*)+(\s){2}===.*)"
2025-09-10T02:04:00.8439647Z            - Failed comparison: [found] 0 > 0 [given]
2025-09-10T02:04:00.8439913Z            - No nodes matched!
2025-09-10T02:04:00.8440063Z 
2025-09-10T02:04:00.8440436Z 4) Method "public int[] compiler.vectorization.runner.ArrayTypeConvertTest.convertFloatToInt()" - [Failed IR rules: 1]:
2025-09-10T02:04:00.8441891Z    * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#CAST_F2X#_", "> 0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={"avx", "true", "avx10_2", "false"}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
2025-09-10T02:04:00.8442912Z      > Phase "Final Code":
2025-09-10T02:04:00.8443155Z        - counts: Graph contains wrong number of nodes:
2025-09-10T02:04:00.8443492Z          * Constraint 1: "(\d+(\s){2}(castF2X_reg_(av|eve)x.*)+(\s){2}===.*)"
2025-09-10T02:04:00.8443951Z            - Failed comparison: [found] 0 > 0 [given]
2025-09-10T02:04:00.8444210Z            - No nodes matched!
2025-09-10T02:04:00.8444355Z 
2025-09-10T02:04:00.8444498Z >>> Check stdout for compilation output of the failed methods

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26919#issuecomment-3276020517

From cslucas at openjdk.org  Wed Sep 10 18:39:06 2025
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Wed, 10 Sep 2025 18:39:06 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT [v2]
In-Reply-To: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
Message-ID: <gFzwQrRGY91LYbe0DluSjX80ET_kk6Kx7arucF2GW5c=.55deed6b-9e47-4000-9916-f1392967aa99@github.com>

> Please review this patch, which fixes an issue that may occur when reducing allocation merges.
> 
> As the assert message describes, the problem is that a `Phi` considered reducible during one invocation of `adjust_scalar_replaceable_state` later turned out to be non-reducible. This situation can happen if a subsequent invocation of the same method makes all inputs of the Phi NSR, in which case there is no point in reducing the Phi. It can also happen during the propagation of NSR state done by `find_scalar_replaceable_allocs`.
> 
> The change in `revisit_reducible_phi_status` is just a clean-up.
> The real fix is in `find_scalar_replaceable_allocs`.
> 
> Tested on Linux x64/Aarch64 release/fastdebug with JTREG tier1-3.
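The failure mode can be illustrated with a deliberately simplified toy model; it is not HotSpot's ConnectionGraph, and all names are hypothetical. Allocations are either scalar replaceable (SR) or NSR, a merge Phi is only worth reducing while at least one input is still SR, and (as an assumption of the toy) the inputs of a Phi that will not be reduced lose their SR status. Propagating NSR to a fixed point shows how a Phi initially classified as reducible can lose that status, which is why the decision has to be re-validated after propagation.

```cpp
#include <cassert>
#include <vector>

// Toy merge Phi (hypothetical, not C2's PhiNode).
struct ToyPhi {
  std::vector<int> inputs;  // indices of the merged allocations
  bool reducible;           // current "reducible" decision for this Phi
};

// A Phi is only worth reducing while at least one input is still SR.
bool worth_reducing(const ToyPhi& phi, const std::vector<bool>& is_sr) {
  for (int a : phi.inputs) {
    assert(a >= 0 && a < static_cast<int>(is_sr.size()));
    if (is_sr[a]) {
      return true;
    }
  }
  return false;
}

// Propagate NSR state to a fixed point, revoking earlier "reducible"
// decisions that no longer hold.
void propagate_nsr(std::vector<bool>& is_sr, std::vector<ToyPhi>& phis) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (ToyPhi& phi : phis) {
      if (phi.reducible && !worth_reducing(phi, is_sr)) {
        phi.reducible = false;  // previously reducible, no longer reducible
        changed = true;
      }
      if (!phi.reducible) {
        // Toy assumption: an unreduced merge blocks scalar replacement.
        for (int a : phi.inputs) {
          if (is_sr[a]) {
            is_sr[a] = false;
            changed = true;
          }
        }
      }
    }
  }
}
```

With allocations {NSR, SR, SR} and Phis P0 = {0, 1} (non-reducible), P1 = {1, 0} and P2 = {2, 0} (both initially reducible), allocation 1 becomes NSR through P0, which in turn makes P1 non-reducible, while P2 stays reducible.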

Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision:

  Revert clean-up in EA. Make catch statements more specific in test case.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27063/files
  - new: https://git.openjdk.org/jdk/pull/27063/files/7ebd687f..17d5ab22

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27063&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27063&range=00-01

  Stats: 16 lines in 2 files changed: 13 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/27063.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27063/head:pull/27063

PR: https://git.openjdk.org/jdk/pull/27063

From cslucas at openjdk.org  Wed Sep 10 18:39:07 2025
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Wed, 10 Sep 2025 18:39:07 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT
In-Reply-To: <2brDXuLmbVBVRaeSyCdKokA706v3t6VsZfGvj_QceJ4=.4483390e-c726-4d82-b220-f1dbdf4efef0@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
 <1uDOe3Oe-hihmDHea2h8vcvRZsKKBeNp0J9lKYUujxk=.abd111bc-3625-4c71-bfa2-0a4c1f4d3875@github.com>
 <VOmxZ5c0SKETC1N8-S-WrvXIU8qCaA5NMS_68UGwVDc=.00bbe82b-dd42-4393-b57b-9df634a12d88@github.com>
 <2brDXuLmbVBVRaeSyCdKokA706v3t6VsZfGvj_QceJ4=.4483390e-c726-4d82-b220-f1dbdf4efef0@github.com>
Message-ID: <3-CCR9TA1nRh8rYDO8BEs-H6qP-Xa42r2kSjckUcdLw=.87585c25-a3dd-4bb7-825a-db58ddae7abb@github.com>

On Tue, 9 Sep 2025 07:01:15 GMT, Roberto Casta?eda Lozano <rcastanedalo at openjdk.org> wrote:

> Sounds good, please file a RFE for that. I would suggest then to postpone the clean-up in `revisit_reducible_phi_status` to that RFE.

I created this RFE to track that: https://bugs.openjdk.org/browse/JDK-8367367

@robcasloz - I pushed some changes addressing your and @eme64's comments. Could you please re-run your internal tests?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27063#issuecomment-3276082582
PR Comment: https://git.openjdk.org/jdk/pull/27063#issuecomment-3276091930

From rehn at openjdk.org  Wed Sep 10 18:46:52 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 10 Sep 2025 18:46:52 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v5]
In-Reply-To: <64z-PlrnxAISLzKBq-RZz7CXkQirGTvOgTGMJQl833o=.73ea3239-dfb6-4e32-b20f-8398334f2759@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <64z-PlrnxAISLzKBq-RZz7CXkQirGTvOgTGMJQl833o=.73ea3239-dfb6-4e32-b20f-8398334f2759@github.com>
Message-ID: <IzlmxvN2cYF-OVYP_QsLfsGHpdI1EyVMIW-blkQa_Ko=.d3579688-3ffa-456a-a999-c1ec75ccc72e@github.com>

On Thu, 4 Sep 2025 13:32:34 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hey, please consider!
>> 
>> There is a bunch of info in the JBS entry; please read that as well.
>> 
>> I narrowed this issue down to the old jal optimization, i.e. making direct calls when the target is in reach.
>> This patch restores those direct calls and removes this regression.
>> 
>> In essence, we turn "jalr ra,0(t1)" into "jal ra,<dest>" if the destination is reachable, and restore the jalr if a new destination is not reachable.
>> 
>> Please test on your hardware!
>> 
>> 
>> Chi Square (100 runs each, 10 fastest iterations of each run, P550)
>> JDK-23 (last version with trampoline calls)
>> Mean: 3189.5827
>> Standard Deviation: 284.6478
>> 
>> JDK-25
>> Mean: 3424.8905
>> Standard Deviation: 222.2208
>> 
>> Patch:
>> Mean: 3144.8535
>> Standard Deviation: 229.2577
>> 
>> 
>> No issues found in t1; running t2 as well. Stress tested on vf2, bpi-f3, and p550.
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
> 
>  - Merge branch 'master' into 8365926
>  - Review comments
>  - Review comments
>  - Merge branch 'master' into 8365926
>  - Spelling
>  - Merge branch 'master' into 8365926
>  - draft jal<->jalr
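The reachability test behind the jal/jalr switch can be sketched as follows. These helpers are hypothetical and are not HotSpot's MacroAssembler API; they only encode the RISC-V J-type facts: `jal` carries a signed 21-bit, 2-byte-aligned offset (about +/-1 MiB from the instruction), laid out as imm[20|10:1|11|19:12] in instruction bits [31:12] with opcode 0b1101111.

```cpp
#include <cassert>
#include <cstdint>

// Can a `jal` at `pc` reach `dest` directly? The offset must be even and
// within [-1 MiB, +1 MiB).
bool jal_reachable(uint64_t pc, uint64_t dest) {
  int64_t offset = static_cast<int64_t>(dest - pc);
  const int64_t limit = int64_t(1) << 20;
  return (offset & 1) == 0 && offset >= -limit && offset < limit;
}

// Encode `jal rd, offset` using the J-type immediate layout.
uint32_t encode_jal(uint32_t rd, int32_t offset) {
  assert((offset & 1) == 0);
  uint32_t imm = static_cast<uint32_t>(offset);
  return (((imm >> 20) & 0x1u) << 31) |    // imm[20]
         (((imm >> 1) & 0x3FFu) << 21) |   // imm[10:1]
         (((imm >> 11) & 0x1u) << 20) |    // imm[11]
         (((imm >> 12) & 0xFFu) << 12) |   // imm[19:12]
         ((rd & 0x1Fu) << 7) |             // rd
         0x6Fu;                            // JAL opcode
}
```

A call-site patcher built along these lines would emit `encode_jal(1 /* ra */, offset)` when `jal_reachable` holds and otherwise keep the indirect `jalr ra,0(t1)` sequence.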

Hamlin had some offline questions, so I gathered this data for him:

Benchmark Results:

Base: JDK24* +UseTrampoline

JAL OPT: JDK24* +UseTrampoline + JAL OPT


+-----------------+--------------+--------------+----------------+----------------+--------------+------------------+-------------+----------------+------------------+--------------------+
| Benchmark       | Mean (Base)  | SD (Base)    | Fastest (Base) | Mean (JAL OPT) | SD (JAL OPT) | Fastest (JAL OPT)| Diff Mean   | Diff Fastest   | Mean Diff Ratio  | Fastest Diff Ratio |
+-----------------+--------------+--------------+----------------+----------------+--------------+------------------+-------------+----------------+------------------+--------------------+
| future-genetic  | 8317.8449    | 925.0775     | 7824.59        | 8421.137       | 1870.3916    | 7955.19          | 103.2922    | 130.6          | 1.012418145      | 1.01669097         |
| akka-uct        | 54775.8037   | 5220.7361    | 49614.46       | 54149.9939     | 4730.3662    | 48736.7          | -625.8097   | -877.76        | 0.9885750686     | 0.9823083835       |
| movie-lens      | 44859.3268   | 107.8713     | 38160.64       | 43043.6965     | 7932.6525    | 36807.2          | -1815.6295  | -1353.44       | 0.9595261529     | 0.9645330896       |
| scala-doku      | 10792.4933   | 3004.9348    | 970.34         | 10739.0164     | 2692.6155    | 9226.94          | -53.4766    | 256.59         | 0.9950450188     | 1.028605382        |
| chi-square      | 4740.1812    | 3552.9489    | 2579.09        | 4749.0893      | 3484.3178    | 2498.04          | 8.9081      | -81.05         | 1.001879274      | 0.968574187        |
| fj-kmeans       | 18597.656    | 2481.4036    | 17994.43       | 18588.154      | 4458.6089    | 18019.15         | -9.5018     | 24.72          | 0.9994890862     | 1.001373758        |
| db-shootout     | 26529.8048   | 3163.9087    | 21270.43       | 25101.5681     | 2483.0698    | 21419.11         | -1428.2367  | 148.67         | 0.9461648244     | 1.006989986        |
| finagle-http    | 20646.1713   | 1635.9154    | 14898.97       | 20250.4966     | 1046.1738    | 14735.66         | -395.6747   | -163.31        | 0.9808354443     | 0.9890388396       |
| reactors        | 52051.8872   | 2023.7865    | 49188.65       | 51625.9497     | 2150.598     | 48874.49         | -425.9376   | -314.16        | 0.9918170594     | 0.9936131608       |
| dec-tree        | 7532.9295    | 756.8107     | 4076.4         | 7441.0578      | 750.30926    | 4089.08          | -91.8717    | 12.68          | 0.9878039878     | 1.003110588        |
| naive-bayes     | 38973.8684   | 16828.5555   | 31479.37       | 38484.4577     | 16640.458    | 31576.24         | -489.4106   | 96.87          | 0.9874425937     | 1.003077253        |
| als             | 20116.2896   | 42.9005      | 14593.64       | 19553.929      | 947.1711     | 14599.15         | -562.3509   | 5.52           | 0.9720449855     | 1.000377562        |
| par-mnemonics   | 17564.7499   | 744.1041     | 16654.08       | 17239.074      | 1100.0016    | 15942.67         | -325.676    | -711.41        | 0.9814585518     | 0.9572831402       |
| scala-kmeans    | 1201.4918    | 180.6982     | 845            | 1173.5701      | 205.5769     | 791.32           | -27.9217    | -53.68         | 0.9767608069     | 0.9364733728       |
| philosophers    | 4780.9081    | 417.8337     | 3656.22        | 4828.5436      | 1372.1029    | 3926.02          | 47.6356     | 269.8          | 1.009963714      | 1.073792058        |
| log-regression  | 7403.8792    | 8743.3328    | 3675.79        | 7275.2818      | 715.8207     | 3578.2           | -128.5983   | -97.6          | 0.98263097       | 0.9734506052       |
| gauss-mix       | 35128.1145   | 8364.2843    | 27585.27       | 33996.7118     | 7896.5377    | 26810.99         | -1131.4027  | -774.27        | 0.9677921028     | 0.9719313967       |
| mnemonics       | 21426.0608   | 537.9065     | 20202.69       | 20956.9427     | 610.3026     | 19568.55         | -469.1181   | -634.14        | 0.9781052568     | 0.9686111107       |
| dotty           | 16674.7994   | 13824.23     | 12773.145      | 16098.8288     | 13498.268    | 7484.09          | -575.9706   | -247.36        | 0.965458619      | 0.9680060015       |
| finagle-chirper | 20949.0206   | 10776.0049   | 15527.08       | 20286.9623     | 10038.7242   | 15212.05         | -662.0582   | -315.03        | 0.9683966944     | 0.9797109308       |
+-----------------+--------------+--------------+----------------+----------------+--------------+------------------+-------------+----------------+------------------+--------------------+

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26944#issuecomment-3276121311

From hgreule at openjdk.org  Wed Sep 10 19:32:58 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Wed, 10 Sep 2025 19:32:58 GMT
Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v7]
In-Reply-To: <W982XWCiEvt6xSrgvoqHxToIp9llsI8mjyMV7S9Ygw4=.4c2df39b-a783-464b-a91e-a6682848cf6e@github.com>
References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
 <UVFPHEq4IAzIGZV0N9oMApm-KTc8fWWJxFlL5nVZfxc=.a31180a8-c8a5-4036-a99e-fa4fa38e6a08@github.com>
 <W982XWCiEvt6xSrgvoqHxToIp9llsI8mjyMV7S9Ygw4=.4c2df39b-a783-464b-a91e-a6682848cf6e@github.com>
Message-ID: <Be05VlOxO9COHvGN5DUFyaWQJ8sOo5OzOEz3RIiS2b4=.440e225e-bb84-400c-b587-698efc7377b5@github.com>

On Wed, 10 Sep 2025 10:25:00 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   review
>
> src/hotspot/share/opto/divnode.cpp line 1225:
> 
>> 1223:   const TypeInteger* i1 = t1->isa_integer(bt);
>> 1224:   const TypeInteger* i2 = t2->isa_integer(bt);
>> 1225:   if (i1 == nullptr || i2 == nullptr) {
> 
> If they are not `TOP` here, `isa_integer` should never return `nullptr`, it's better to do an `assert` here.

I guess using `is_integer` directly might make sense then?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2337712152

From vlivanov at openjdk.org  Wed Sep 10 22:05:49 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 10 Sep 2025 22:05:49 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v9]
In-Reply-To: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
Message-ID: <TT6Kfkeu68h3ExOhLLK2AIXGNewaptQTQe9YTqoBRxs=.ad741b24-99da-4b20-94f3-159de7ec53be@github.com>

> This PR introduces C2 support for `Reference.reachabilityFence()`.
> 
> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected.
> 
> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality.
> 
> Also, I don't consider this fix complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is a CFG-only node, nothing explicitly forbids memory operations from floating past `Reference.reachabilityFence()` and potentially reaching other safepoints that the current analysis treats as non-interfering. Representing `ReachabilityFence` as a memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but the performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend the referent's live range beyond the `ReachabilityFence` nodes associated with it. It would meet the performance criteria, but I prefer to implement it as a followup fix.
> 
> Another known issue relates to reachability fences on constant oops. If such a constant is GCed (most likely due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. 
> 
> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667
> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints."
> 
> Testing:
> - [x] hs-tier1 - hs-tier8
> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations
> - [x] java/lang/foreign microbenchmarks

Vladimir Ivanov has updated the pull request incrementally with four additional commits since the last revision:

 - update
 - update
 - update
 - MultiNode -> Node

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25315/files
  - new: https://git.openjdk.org/jdk/pull/25315/files/e95d4eb9..6981bd18

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=07-08

  Stats: 68 lines in 12 files changed: 40 ins; 2 del; 26 mod
  Patch: https://git.openjdk.org/jdk/pull/25315.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315

PR: https://git.openjdk.org/jdk/pull/25315

From vlivanov at openjdk.org  Wed Sep 10 22:05:51 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 10 Sep 2025 22:05:51 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3]
In-Reply-To: <L0LfhTmGAmKPwwFXYaibWSIA3rLaL8j1xL4OL4XkutY=.70bbb39a-5e86-4a80-952b-d3b98a2a4a36@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <hcmJkTFFU9oEe5LE1Q5974w0KM5Pe6SLi0uKLAxU7rM=.70a3f0d7-eb32-44b0-b819-1c4db6273976@github.com>
 <UKkT1Wqi4ftj3eGF2KzT8saeWoWSBTXx5kw0FOiJyLE=.c10dbf15-3348-495b-b9aa-556b78bc1e0b@github.com>
 <ejAu9M0FYELqOdzDW8uankmdRt0w8bloAwcxWcyx5k0=.9a47c6c4-e9df-40f5-aba9-23073a12bd17@github.com>
 <L0LfhTmGAmKPwwFXYaibWSIA3rLaL8j1xL4OL4XkutY=.70bbb39a-5e86-4a80-952b-d3b98a2a4a36@github.com>
Message-ID: <IS7_TgFKAYShMIh7km2ILg1eE59Z-llT3T6ID_gA3iU=.a577363d-66c6-4922-bd88-0ff2fcb55e5d@github.com>

On Mon, 8 Sep 2025 13:28:10 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Good idea. Added one.
>
> Also: you promise that it happens randomly. But it seems to be added deterministically everywhere. Did I miss something?

Sorry for the confusion. Reworded the comment. I didn't intend to make it truly random. The idea was to automatically insert RF nodes during parsing to stress the implementation. It doesn't slow down compilation times that much, so aggressive insertion just works.

>> Live ranges of values are routinely extended during loop opts, which can break the invariant that all interfering safepoints contain the referent in their oop map. (If an interfering safepoint doesn't keep the referent alive, then the referent can be prematurely GCed.)  
>> 
>> After loop opts are over, it becomes possible to reliably enumerate all interfering safepoints and ensure the referent is present in their oop maps.
>
> Can you make sure this explanation is in the comment ;)

Done.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2334889253
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2334855917

From vlivanov at openjdk.org  Wed Sep 10 22:05:56 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 10 Sep 2025 22:05:56 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v8]
In-Reply-To: <_n3uP_Dkl3RNq3MFoRDXsS28SM8CcQHaR6vdUJF9U8s=.dcfab97b-be28-4244-93df-c8a23d6d66b8@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <LMu3QXiw3K_5-conH2FlBXyRWNzNDunp0L86Ubxeju8=.86d4010f-ab7b-42cf-a9c3-5c24c23d2a81@github.com>
 <LSNdJRy0deekqrysCXuqGFogGIkwJ5MJEcefFMskxwY=.a28eb64e-ee52-432c-a4cf-9504aaf4a2e1@github.com>
 <_n3uP_Dkl3RNq3MFoRDXsS28SM8CcQHaR6vdUJF9U8s=.dcfab97b-be28-4244-93df-c8a23d6d66b8@github.com>
Message-ID: <IcKeEoxM236ICcdLmdt-k_K1peNrxOnlpCLZa_3H4eA=.648697cb-baad-47f1-96d6-f8b13d9b69db@github.com>

On Mon, 8 Sep 2025 12:45:56 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/callGenerator.cpp line 623:
>> 
>>> 621:       return; // keep the original call node as the holder of reachability info
>>> 622:     }
>>> 623:   }
>> 
>> Maybe that's just me. But people use the assert messages both in positive and negative ways, and so this is a bit ambiguous. Maybe you can write:
>> `no reachability edge should be present`
>> 
>> I'm still a bit unsure what the `SafePointNode::grow_stack` comment means.
>> In the previous comment https://github.com/openjdk/jdk/pull/25315#discussion_r2320120466 you explained more. Why not add that here instead?
>
> I'm also not sure yet why there is a difference between incremental inlining and regular inlining.
> Do you think it would make sense to explain that here, or is it explained elsewhere?

There are no safepoint-attached reachability edges present during normal parsing. For incremental inlining, the JVMS from the original call is taken and extended with the callee state. If reachability edges are present, they have to be treated specially and carried over to all safepoints produced during the incremental inlining attempt. There's no such support in place yet.

>> src/hotspot/share/opto/macro.cpp line 973:
>> 
>>> 971:         _igvn._worklist.push(ac);
>>> 972:       } else if (use->is_ReachabilityFence() && OptimizeReachabilityFences) {
>>> 973:         use->as_ReachabilityFence()->clear_referent(_igvn); // redundant fence
>> 
>> Thanks for refactoring a bit here :)
>> 
>> Is this rf guaranteed to belong to the Allocation somehow?
>
> Ah, you could mention that later `ReachabilityFenceNode::Identity` removes the rf.

> Is this rf guaranteed to belong to the Allocation somehow?

I don't get your question. The code iterates over users of an allocation which is being eliminated.  Semantically, RF is a no-op on a scalarizable referent and has to be removed in order to let the scalarization happen.

> Ah, you could mention that later ReachabilityFenceNode::Identity removes the rf.

Done.

>> src/hotspot/share/opto/reachability.cpp line 136:
>> 
>>> 134:     return true;
>>> 135:   }
>>> 136: }
>> 
>> Nit: `an no-op` -> `a no-op`
>> 
>> Also: do you need the return value? The only use case does not do anything with it.
>
> You could mention that `Identity` will remove the node later.

> Also: do you need the return value? The only use case does not do anything with it.

I decided to keep it for diagnostic purposes even though no existing callers care about it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2334899185
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2337978454
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2337989028

From vlivanov at openjdk.org  Wed Sep 10 22:06:04 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 10 Sep 2025 22:06:04 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v8]
In-Reply-To: <LSNdJRy0deekqrysCXuqGFogGIkwJ5MJEcefFMskxwY=.a28eb64e-ee52-432c-a4cf-9504aaf4a2e1@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <LMu3QXiw3K_5-conH2FlBXyRWNzNDunp0L86Ubxeju8=.86d4010f-ab7b-42cf-a9c3-5c24c23d2a81@github.com>
 <LSNdJRy0deekqrysCXuqGFogGIkwJ5MJEcefFMskxwY=.a28eb64e-ee52-432c-a4cf-9504aaf4a2e1@github.com>
Message-ID: <Xa1zqmkXNFFxwRU6q-sZIXomxJtIQLepUwcx5IuuE-c=.4affc0fb-1036-4635-9c41-ec76225d1f60@github.com>

On Mon, 8 Sep 2025 12:59:48 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> could we just go through _reachability_fences, and hack the graph and clean up with IGVN? Or do we really need the loop state to do this successfully?

RF elimination needs the control of the referent to enumerate all interfering safepoints.

Theoretically, it's possible to use a conservative estimate, but then:
 (1) it can worsen the result (by enumerating more interfering safepoints than needed); and
 (2) it can build an unschedulable graph if the referent doesn't dominate the safepoint node (when the estimate is too conservative).

IMO it's safer to build the full dominator tree here.
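To illustrate why the referent's control matters, here is a toy sketch, not C2's `enumerate_interfering_sfpts()`. On a hypothetical block-level CFG (`ToyCFG` and `interfering_safepoints` are illustrative names), it walks backward from the fence's block and stops at the referent's control block. Assuming the referent's control dominates the fence, every block reached this way is itself dominated by the referent's control, which mirrors the `is_dominator(get_ctrl(referent), sfpt)` assert in `reachability.cpp`.

```cpp
#include <cassert>
#include <queue>
#include <set>
#include <vector>

// Hypothetical toy CFG (not C2's data structures): preds[b] lists the
// predecessor blocks of block b; has_safepoint[b] marks blocks with a safepoint.
struct ToyCFG {
  std::vector<std::vector<int>> preds;
  std::vector<bool> has_safepoint;
};

// Collects safepoint blocks lying on some CFG path from the referent's
// control block (ref_ctrl) to the fence's block (rf_ctrl).
// Assumes ref_ctrl dominates rf_ctrl: then every block reached by walking
// backward from rf_ctrl without passing ref_ctrl is dominated by ref_ctrl.
// Works at block granularity and ignores ordering inside a block.
std::set<int> interfering_safepoints(const ToyCFG& cfg, int ref_ctrl, int rf_ctrl) {
  std::set<int> result;
  std::set<int> visited{rf_ctrl};
  std::queue<int> work;
  work.push(rf_ctrl);
  while (!work.empty()) {
    int b = work.front();
    work.pop();
    if (cfg.has_safepoint[b]) {
      result.insert(b);
    }
    if (b == ref_ctrl) {
      continue;  // do not walk above the referent's control
    }
    for (int p : cfg.preds[b]) {
      if (visited.insert(p).second) {  // enqueue each block once (handles loops)
        work.push(p);
      }
    }
  }
  return result;
}
```

In this toy model, stopping at a block higher than the referent's real control only enlarges the visited set, so extra safepoints can show up (point 1 above); and if the stop block sits above the referent's definition, the collected safepoints are no longer guaranteed to be dominated by the referent (point 2). A safepoint on a loop latch is found as well, since the fence is reached again on the next iteration.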

> It probably has a performance impact, right? Have you measured that? 

It does have a noticeable cost. On my laptop, it bumps the time spent in IdealLoop (which includes RF processing) from about 170 ms to about 210 ms:

```
$ java -Xcomp -XX:-TieredCompilation -XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:-StressReachabilityFences

         IdealLoop:             0.173 s
           ReachabilityFence:   0.000 s
             Optimize:          0.000 s
             Eliminate:         0.000 s
```
vs

```
$ java -Xcomp -XX:-TieredCompilation -XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:+StressReachabilityFences

         IdealLoop:             0.212 s
           ReachabilityFence:   0.030 s
             Optimize:          0.004 s
             Eliminate:         0.004 s
```

I reimplemented it to piggyback on the last loop optimization pass, if there is one, and that drastically improves the situation:

```
$ java -Xcomp -XX:-TieredCompilation -XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:+StressReachabilityFences

         IdealLoop:             0.193 s
           ReachabilityFence:   0.009 s
             Optimize:          0.003 s
             Eliminate:         0.004 s
```

> src/hotspot/share/opto/loopTransform.cpp line 66:
> 
>> 64: //------------------------------unique_loop_exit_or_null----------------------
>> 65: // Return the loop-exit projection if it is unique.
>> 66: Node* IdealLoopTree::unique_loop_exit_or_null() {
> 
> I suggested it here:
> https://github.com/openjdk/jdk/pull/25315#discussion_r2149677594
> Can we change the return type to `IfProjNode`?
> 
> Also: when is it possible that there are none or multiple loop exits?
> Can you add a comment below where you return nullptr?

Done.

> src/hotspot/share/opto/parse1.cpp line 2233:
> 
>> 2231:       insert_reachability_fence(referent);
>> 2232:     }
>> 2233:   }
> 
> Comments look better, thanks :)
> 
> But `StressReachabilityFences` seems to promise that it should happen randomly. Did you want to do that or adjust the flag comment?

I adjusted the flag comment.

> src/hotspot/share/opto/reachability.cpp line 49:
> 
>> 47:  *
>> 48:  * It is tempting to directly attach referents to interfering safepoints right from the beginning, but it
>> 49:  * doesn't play well with some optimizations C2 does.
> 
> Do you have an example for such optimizations?

Loop-invariant code motion is one example. Do you want me to add it to the comment?

After parsing is over, the IR is in a valid state, but loop optimizations are the primary reason it can become invalid later.

> src/hotspot/share/opto/reachability.cpp line 67:
> 
>> 65:  * RF nodes may interfere with RA, so stand-alone RF nodes are eliminated and their referents are
>> 66:  * transferred to corresponding safepoints (phase #2). When safepoints are pruned during macro expansion,
>> 67:  * corresponding reachability edges also go away.
> 
> Spell out RA on first use. Make it clearer that this is why we eliminate RF before RA.
> Suggestion:
> 
>  * RF nodes may interfere with register allocation (RA), hence we eliminate RF nodes and transfer their
>  * referents  to corresponding safepoints (phase #2). When safepoints are pruned during macro expansion,
>  * corresponding reachability edges also go away.
> 
> `reachability edges also go away` ... and why is that ok? Here is a sketch of what you could write; is it correct?
> - reachability only needs to be correct at SafePoints. If all the SafePoints are removed for a referent, then we don't need to ensure its reachability.

Applied your suggested change and elaborated the comment.

> the very same similar way sounds a little funny. I
Fixed. 

>  What is the issue with the edges being attached to safepoints here?

The issue is that the safepoint-attached representation conflicts with the derived oops representation: there's no way to distinguish between them. As of now, the VM treats edges past the debug info as derived oops, which is completely wrong when reachability edges are present. More work is needed to support both cases.

> src/hotspot/share/opto/reachability.cpp line 438:
> 
>> 436:   if (!OptimizeReachabilityFences) {
>> 437:     return false;
>> 438:   }
> 
> Can this ever fail? Could it be an assert?

Done.

> src/hotspot/share/opto/reachability.cpp line 441:
> 
>> 439: 
>> 440:   Unique_Node_List redundant_rfs;
>> 441:   Node_List worklist;
> 
> Not sure if necessary, but maybe good practice anyway: add `ResourceMark`.

Done.

> src/hotspot/share/opto/reachability.cpp line 453:
> 
>> 451:         SafePointNode* sfpt = safepoints.pop()->as_SafePoint();
>> 452:         assert(is_dominator(get_ctrl(referent), sfpt), "");
>> 453:         assert(sfpt->req() == rf_start_offset(sfpt), "");
> 
> Is this the only reason we need this to happen during LoopOpts, i.e. so that we can call `get_ctrl` and `is_dominator`?
> 
> Because it is potentially a lot of overhead to create the whole loop-opts structures just for this.

It's solely for `get_ctrl(referent)` call in `enumerate_interfering_sfpts()`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2337971541
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2337972022
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2337978893
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2334848196
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2334876169
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2337985581
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2337997889
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2337998906
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2337994039

From vlivanov at openjdk.org  Wed Sep 10 22:06:05 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 10 Sep 2025 22:06:05 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v8]
In-Reply-To: <IcKeEoxM236ICcdLmdt-k_K1peNrxOnlpCLZa_3H4eA=.648697cb-baad-47f1-96d6-f8b13d9b69db@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <LMu3QXiw3K_5-conH2FlBXyRWNzNDunp0L86Ubxeju8=.86d4010f-ab7b-42cf-a9c3-5c24c23d2a81@github.com>
 <LSNdJRy0deekqrysCXuqGFogGIkwJ5MJEcefFMskxwY=.a28eb64e-ee52-432c-a4cf-9504aaf4a2e1@github.com>
 <_n3uP_Dkl3RNq3MFoRDXsS28SM8CcQHaR6vdUJF9U8s=.dcfab97b-be28-4244-93df-c8a23d6d66b8@github.com>
 <IcKeEoxM236ICcdLmdt-k_K1peNrxOnlpCLZa_3H4eA=.648697cb-baad-47f1-96d6-f8b13d9b69db@github.com>
Message-ID: <jbCJQiFNr7fqAPDXcVPMM10gsjthnq_HMaUuv7L9gQI=.72a5f375-f03b-4117-9607-2cd3a1950399@github.com>

On Wed, 10 Sep 2025 21:46:17 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> You could mention that `Identity` will remove the node later.
>
>> Also: do you need the return value? The only use case does not do anything with it.
> 
> I decided to keep it for diagnostic purposes even though no existing callers care about it.

> You could mention that Identity will remove the node later.

Done.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2337989216

From dzhang at openjdk.org  Wed Sep 10 23:55:10 2025
From: dzhang at openjdk.org (Dingli Zhang)
Date: Wed, 10 Sep 2025 23:55:10 GMT
Subject: RFR: 8367293: RISC-V: enable vectorapi test for
 VectorMask.laneIsSet
In-Reply-To: <EWSDz9fKUgj_I0INgH88Veng-6sK1t48_tRkFc1pqA4=.3b35a3ab-f369-4c36-8c78-06daee52a8a6@github.com>
References: <EWSDz9fKUgj_I0INgH88Veng-6sK1t48_tRkFc1pqA4=.3b35a3ab-f369-4c36-8c78-06daee52a8a6@github.com>
Message-ID: <4K4PN7Fl15s3VmsLEI4_u7RZqRTBU4m8KUtt7kYSUfc=.eb73d339-bfff-4a96-9c26-c43eacc66314@github.com>

On Wed, 10 Sep 2025 03:10:02 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help review this patch? Thanks!
> 
> [JDK-8366588](https://bugs.openjdk.org/browse/JDK-8366588) adds a vectorapi test for VectorMask.laneIsSet, which we can also enable on RISC-V.
> 
> ### Test (fastdebug)
> - [x] Run compiler/vectorapi/VectorMaskLaneIsSetTest.java on k1, k230 and sg2042

Thanks all for the review!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27181#issuecomment-3276890571

From duke at openjdk.org  Wed Sep 10 23:55:10 2025
From: duke at openjdk.org (duke)
Date: Wed, 10 Sep 2025 23:55:10 GMT
Subject: RFR: 8367293: RISC-V: enable vectorapi test for
 VectorMask.laneIsSet
In-Reply-To: <EWSDz9fKUgj_I0INgH88Veng-6sK1t48_tRkFc1pqA4=.3b35a3ab-f369-4c36-8c78-06daee52a8a6@github.com>
References: <EWSDz9fKUgj_I0INgH88Veng-6sK1t48_tRkFc1pqA4=.3b35a3ab-f369-4c36-8c78-06daee52a8a6@github.com>
Message-ID: <4ve1VKFLupxZj5WpLeVbQ98wjoC7hXFlmgL0p5kwbW8=.0b819012-bdf7-400f-9ef7-95e35fd4dfd7@github.com>

@DingliZhang 
Your change (at version c7a5e95ad5f7b84333509375db249c0797b480c4) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27181#issuecomment-3276892219

From dzhang at openjdk.org  Thu Sep 11 00:07:20 2025
From: dzhang at openjdk.org (Dingli Zhang)
Date: Thu, 11 Sep 2025 00:07:20 GMT
Subject: Integrated: 8367293: RISC-V: enable vectorapi test for
 VectorMask.laneIsSet
In-Reply-To: <EWSDz9fKUgj_I0INgH88Veng-6sK1t48_tRkFc1pqA4=.3b35a3ab-f369-4c36-8c78-06daee52a8a6@github.com>
References: <EWSDz9fKUgj_I0INgH88Veng-6sK1t48_tRkFc1pqA4=.3b35a3ab-f369-4c36-8c78-06daee52a8a6@github.com>
Message-ID: <0GdMbtdpbOdvhk1eNjHjaeuRgHSinMk-oG95r_uiwWc=.e58f778a-3527-40a6-977d-a0c44164d54b@github.com>

This pull request has now been integrated.

Changeset: 134c3ef4
Author:    Dingli Zhang <dzhang at openjdk.org>
Committer: Fei Yang <fyang at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/134c3ef41e774b483bcce32ce2fe0ef416017728
Stats:     7 lines in 1 file changed: 0 ins; 0 del; 7 mod

8367293: RISC-V: enable vectorapi test for VectorMask.laneIsSet

Reviewed-by: fyang, epeter

-------------

PR: https://git.openjdk.org/jdk/pull/27181

From sparasa at openjdk.org  Thu Sep 11 00:45:45 2025
From: sparasa at openjdk.org (Srinivas Vamsi Parasa)
Date: Thu, 11 Sep 2025 00:45:45 GMT
Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2 [v5]
In-Reply-To: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
Message-ID: <CI6jzRQmAQXIQaToisYrZGI8tLzI14TOKLtPY2zrPWw=.55db890f-8f77-4ed7-8a9c-1d99ca9487ba@github.com>

> This change extends Extended EVEX (EEVEX) to REX2/REX demotion for Intel APX NDD instructions to handle commutative operations when the destination register and the second source register (src2) are the same.
> 
> Currently, EEVEX to REX2/REX demotion is only enabled when the first source (src1) and the destination are the same. This enhancement allows additional cases of valid demotion for commutative instructions (add, imul, and, or, xor).
> 
> For example:
> - `eaddl r18, r25, r18` can be encoded as `addl r18, r25` using APX REX2 encoding
> - `eaddl r2, r7, r2` can be encoded as `addl r2, r7` using non-APX legacy encoding
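The demotion rule described above can be sketched as a small decision function. This is a hypothetical model, not HotSpot's `Assembler` code: `NddInsn`, `TwoOpInsn`, `try_demote` and `required_prefix` are illustrative names, instructions are represented as an opcode string plus register numbers, and only register-register forms are considered.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of an APX NDD instruction: dst = src1 <op> src2,
// with registers given by number (r0..r31).
struct NddInsn {
  std::string op;
  int dst, src1, src2;
};

// Two-operand (legacy/REX/REX2) form: dst_src = dst_src <op> src.
struct TwoOpInsn {
  std::string op;
  int dst_src, src;
};

bool is_commutative(const std::string& op) {
  return op == "add" || op == "imul" || op == "and" || op == "or" || op == "xor";
}

// Returns the demoted two-operand form, or nullopt if the NDD form must be kept.
std::optional<TwoOpInsn> try_demote(const NddInsn& i) {
  if (i.dst == i.src1) {
    return TwoOpInsn{i.op, i.dst, i.src2};  // already supported: dst == src1
  }
  if (i.dst == i.src2 && is_commutative(i.op)) {
    return TwoOpInsn{i.op, i.dst, i.src1};  // new case: swap the sources
  }
  return std::nullopt;
}

// Prefix needed by a 32-bit register-register op: r16..r31 need REX2,
// r8..r15 need REX, r0..r7 need neither (64-bit ops would need REX.W anyway).
std::string required_prefix(const TwoOpInsn& t) {
  int hi = t.dst_src > t.src ? t.dst_src : t.src;
  if (hi >= 16) return "REX2";
  if (hi >= 8) return "REX";
  return "legacy";
}
```

For example, `try_demote({"add", 18, 25, 18})` yields `add r18, r25`, which needs REX2, while a non-commutative `sub` with `dst == src2` stays in NDD form, because swapping its sources would change the result.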

Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:

  undo new match rules for RegMemReg for commutative operations

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26997/files
  - new: https://git.openjdk.org/jdk/pull/26997/files/9714a9b1..012511ab

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26997&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26997&range=03-04

  Stats: 120 lines in 1 file changed: 0 ins; 120 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/26997.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26997/head:pull/26997

PR: https://git.openjdk.org/jdk/pull/26997

From sparasa at openjdk.org  Thu Sep 11 00:45:45 2025
From: sparasa at openjdk.org (Srinivas Vamsi Parasa)
Date: Thu, 11 Sep 2025 00:45:45 GMT
Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2 [v5]
In-Reply-To: <hnQ3ti2GcS0BrzRaU2jKby4D1ou_1niFo_l4_WxwYvk=.837b0e66-4472-49be-8f3f-a07ef331af74@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
 <Z05ERz5_lcuvZcuF7YQ1qBv6eeHMPiH1RdpdvE-aTds=.7f197932-f7a6-4845-9d04-a5c29ee7ca0b@github.com>
 <0X5cvpQZxb1l5Q_8f-iU0K4WtdyFW8ehdPXR2zsnSzo=.7f4f3d03-94db-4482-b5ee-c5f1362d84b5@github.com>
 <hnQ3ti2GcS0BrzRaU2jKby4D1ou_1niFo_l4_WxwYvk=.837b0e66-4472-49be-8f3f-a07ef331af74@github.com>
Message-ID: <teidXZJMVhx0zRfX5-K-z4p2jUAY2UC0Q5QXt8dSSVk=.b858ed30-5e0f-43f5-a42d-373237166a7f@github.com>

On Tue, 9 Sep 2025 02:18:32 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Will run experiments to make sure that the RegRegMem pattern also applies to RegMemReg case and remove the newly added match rules if they're redundant. Will update you soon.
>
> Hi @vamsi-parasa, your latest patch does not address this.

Hi Jatin (@jatin-bhateja), please see the latest update, which removes the unnecessary match rules for the RegMemReg case.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26997#discussion_r2338238097

From missa at openjdk.org  Thu Sep 11 02:19:54 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Thu, 11 Sep 2025 02:19:54 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v11]
In-Reply-To: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
Message-ID: <TtiXR9rxmvSXMqM8PsGX6U7_Mg67yjXRw7fl-92oMxM=.e0f48991-d58f-4f0d-84be-59e521b832dc@github.com>

> Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
> 
> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific instruction sequence depends on the source (single vs. double precision), the destination (long, integer, short, or byte), the level of parallelization (scalar vs. vector), and the supported AVX extension. In essence, though, the special values are mapped into the integer range (NaN -> 0; -Infinity and values <= Integer.MIN_VALUE -> Integer.MIN_VALUE; +Infinity and values >= Integer.MAX_VALUE -> Integer.MAX_VALUE) with the help of a few temporary registers that store intermediate results.
> 
> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or  VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
> 
> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
> 9. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java`
> 10. `jtreg:test/hotspot/jtreg/com...
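The difference between the two approaches can be sketched with scalar reference models for a single case (double to 32-bit int). This is a hedged illustration, and none of these names come from HotSpot: `saturating_d2i` models the clamping semantics that the saturating instructions (and Java's `(int)` cast) provide, `cvttsd2si_model` models the legacy instruction returning the "integer indefinite" value 0x80000000 for NaN and out-of-range inputs, and `legacy_d2i_with_fixup` is a hypothetical fix-up path of the kind the extra handling has to implement.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

const int32_t kMin = std::numeric_limits<int32_t>::min();
const int32_t kMax = std::numeric_limits<int32_t>::max();

// Saturating semantics: NaN -> 0, out-of-range values clamp, others truncate.
int32_t saturating_d2i(double x) {
  if (std::isnan(x)) return 0;
  if (x <= static_cast<double>(kMin)) return kMin;
  if (x >= static_cast<double>(kMax)) return kMax;
  return static_cast<int32_t>(x);  // in range: truncate toward zero
}

// Model of legacy CVTTSD2SI with a 32-bit destination: if the truncated value
// does not fit (or the input is NaN), it yields the indefinite value 0x80000000.
int32_t cvttsd2si_model(double x) {
  if (std::isnan(x) || x <= -2147483649.0 || x >= 2147483648.0) {
    return kMin;
  }
  return static_cast<int32_t>(x);
}

// Hypothetical fix-up: only an indefinite-looking result needs a second look
// at the input, since kMin is also a legitimate result for some inputs.
int32_t legacy_d2i_with_fixup(double x) {
  int32_t r = cvttsd2si_model(x);
  if (r != kMin) return r;  // fast path: no special value involved
  if (std::isnan(x)) return 0;
  return x > 0 ? kMax : kMin;
}
```

With the saturating instructions, the whole of `legacy_d2i_with_fixup` collapses to the behavior of `saturating_d2i` in a single instruction, which is what removes the temporary registers and extra branches.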

Mohamed Issa has updated the pull request incrementally with two additional commits since the last revision:

 - Check for instructions that shouldn't appear in vector floating point conversion tests
 - Correctly calculate vector lengths and don't rely on VectorReinterpret in cast2F2X and cast2D2X memory instructions

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26919/files
  - new: https://git.openjdk.org/jdk/pull/26919/files/bc59e4d2..8587952d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=09-10

  Stats: 38 lines in 3 files changed: 2 ins; 0 del; 36 mod
  Patch: https://git.openjdk.org/jdk/pull/26919.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26919/head:pull/26919

PR: https://git.openjdk.org/jdk/pull/26919

From epeter at openjdk.org  Thu Sep 11 05:07:31 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 11 Sep 2025 05:07:31 GMT
Subject: RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes [v6]
In-Reply-To: <TxVrg4YQuQqoGvnUpOxnQheehz3zaihOJONpmL2MzZU=.8af6afb9-381f-4633-8fa9-73242552d170@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
 <TxVrg4YQuQqoGvnUpOxnQheehz3zaihOJONpmL2MzZU=.8af6afb9-381f-4633-8fa9-73242552d170@github.com>
Message-ID: <UZ0SrZBuk6Jsjmjs1IkcyHyeXqf79BzV4prbgzknPS0=.94d7932a-4ebe-4834-bea7-006bc254ed2e@github.com>

On Fri, 5 Sep 2025 17:48:21 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   fix include order
>
> Thank you for your continued effort on cost modelling, @eme64! I have some minor style comments and questions, but this mostly looks good to me. 
> 
> Regarding style, I find the alignment of local variables to be a bit distracting, especially when the aligned "things" are different operations and things are sometimes aligned and sometimes not. However, I do not know the style of the rest of the SuperWord code.

@mhaessig @galderz @chhagedorn Thanks for reviewing and all the helpful suggestions :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27056#issuecomment-3277619564

From epeter at openjdk.org  Thu Sep 11 05:07:33 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 11 Sep 2025 05:07:33 GMT
Subject: Integrated: 8366702: C2 SuperWord: refactor VTransform vector nodes
In-Reply-To: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
References: <W9dH1mEa1uVX7kL0daRw5HmVKoeb5wFLLx7_kbyOZk0=.c02d22d1-1033-49db-a1ff-0ce0c1acbcc3@github.com>
Message-ID: <_4wlIBKArnJ0dC8M_Mfoa3I1JQ77CkOLDtSTP3KYPns=.eb956bae-c733-41c4-aa4a-d997feca2a80@github.com>

On Tue, 2 Sep 2025 15:30:06 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
> https://github.com/openjdk/jdk/pull/20964
> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
> 
> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
> 
> ---------------------------------
> 
> I have to say: I'm very sorry for this refactoring. I took some decisions in https://github.com/openjdk/jdk/pull/19719 that I'm now partially undoing. I moved too much logic from `SuperWord::output` (now called `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`) to the `VTransform...Node::apply`. https://github.com/openjdk/jdk/pull/19719  was a roughly 1.5k line change, and I took about a 0.3k misstep that I'm now correcting here ;)
> 
> I had accidentally made the `VTransformGraph` too close to the `PackSet`, and not close enough to the future vectorized C2 Graph, which makes some future changes hard.
> 
> My vision:
> - VLoop / VLoopAnalyzer look at the scalar loop and prepare it for SuperWord
> - SuperWord creates the `PackSet`: some nodes are packed, all others are scalar.
> - `SuperWordVTransformBuilder` converts the `PackSet` into the `VTransformGraph`
> - The `VTransformGraph` very closely represents the C2 vectorized loop after vectorization
>   - It does not need to know which `nodes` it packs; rather, it just needs to know how to generate the new vector nodes
>   - That means it is straightforward to compute cost
>   - And it also makes optimizations on that graph easier
>   - And the `apply` methods are simpler too
> 
> ----------------------------------
> 
> Therefore, the main goal was to make the `VTransform...Node::apply` calls simpler again, and to move the logic back to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
> 
> One important step toward making the `VTransformGraph` less of a `PackSet` is to remove the reliance on `nodes` for the vector nodes.
> 
> What I did:
> - Moved a lot of the logic in `VTransformElementWiseVectorNode::apply` to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>   - Will make it easier to optimize and compute cost in future RFE's.
> - `VTransformVectorNodePrototype`: packs a lot of the info for `VTransformVectorNode`.
>   - pass info about `bt`, `vlen`, `sopc` instead of the `pack` -> allows us to eventually remove the dependency on `nodes`.
> - New vector nodes, they are special cases I split away from `VTransformElementWiseVectorNode`:
>   - `VTransformReinterpretVectorN...

This pull request has now been integrated.

Changeset: 4cc75be8
Author:    Emanuel Peter <epeter at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/4cc75be80e6a89e0ed293e2f8bbb6d0f94189468
Stats:     352 lines in 4 files changed: 173 ins; 65 del; 114 mod

8366702: C2 SuperWord: refactor VTransform vector nodes

Reviewed-by: chagedorn, galder

-------------

PR: https://git.openjdk.org/jdk/pull/27056

From epeter at openjdk.org  Thu Sep 11 05:08:27 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 11 Sep 2025 05:08:27 GMT
Subject: RFR: 8367243: Format issues with dist dump debug output in
 PhaseGVN::dead_loop_check
In-Reply-To: <K_oxZys5zrgre-bHxtI1Bh6aow2VL_eY_iTmNn1gTvc=.ad2a5db0-4dad-42cb-87ed-d39b9ab6ea56@github.com>
References: <auxUv9rgsSDSxqghkIa5eSdcluTtyNUby-W269iQIRg=.0b53ca20-dde2-4c95-a992-8bdb6bbbe77c@github.com>
 <K_oxZys5zrgre-bHxtI1Bh6aow2VL_eY_iTmNn1gTvc=.ad2a5db0-4dad-42cb-87ed-d39b9ab6ea56@github.com>
Message-ID: <5csypyowSL57JQ2SIkrH7CwktQ4nXeN7eNQSUS9nghQ=.5857a143-c420-48d9-aac9-9658211e1966@github.com>

On Tue, 9 Sep 2025 16:28:24 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

>> The `#` option adds color to the terminal. But that usually only works on people's terminals, not when the output is piped to a file on the server. Hence, `#` is really only a debugging feature, and not one to use for reports in connection with `assert`s.
>> 
>> Simply removed the `#`, and fixed some braces and spaces.
>
> Looks good and trivial!

@TobiHartmann Thanks for the review! I agree it is trivial.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27175#issuecomment-3277641678

From epeter at openjdk.org  Thu Sep 11 05:08:29 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 11 Sep 2025 05:08:29 GMT
Subject: Integrated: 8367243: Format issues with dist dump debug output in
 PhaseGVN::dead_loop_check
In-Reply-To: <auxUv9rgsSDSxqghkIa5eSdcluTtyNUby-W269iQIRg=.0b53ca20-dde2-4c95-a992-8bdb6bbbe77c@github.com>
References: <auxUv9rgsSDSxqghkIa5eSdcluTtyNUby-W269iQIRg=.0b53ca20-dde2-4c95-a992-8bdb6bbbe77c@github.com>
Message-ID: <tsfNTU--NENj2Oxr0cKzIAr8TyNERRHvylX5XurejUo=.a572f1d6-9af1-4893-b1ec-66520d8bd00b@github.com>

On Tue, 9 Sep 2025 16:20:35 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> The `#` option adds color to the terminal. But that usually only works on people's terminals, not when the output is piped to a file on the server. Hence, `#` is really only a debugging feature, and not one to use for reports in connection with `assert`s.
> 
> Simply removed the `#`, and fixed some braces and spaces.

This pull request has now been integrated.

Changeset: 2826d170
Author:    Emanuel Peter <epeter at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/2826d1702534783023802ac5c8d8ea575558f09f
Stats:     1 line in 1 file changed: 0 ins; 0 del; 1 mod

8367243: Format issues with dist dump debug output in PhaseGVN::dead_loop_check

Reviewed-by: thartmann

-------------

PR: https://git.openjdk.org/jdk/pull/27175

From epeter at openjdk.org  Thu Sep 11 05:22:18 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 11 Sep 2025 05:22:18 GMT
Subject: RFR: 8359412: Template-Framework Library: Operations and Expressions
Message-ID: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com>

Implementing ideas from the original draft PR: https://github.com/openjdk/jdk/pull/23418 ([Exceptions](https://github.com/openjdk/jdk/pull/23418/files#diff-77e7db8cc0c5e02786e1c993362f98fabe219042eb342fdaffc09fd11380259dR41), [ExpressionFuzzer](https://github.com/openjdk/jdk/pull/23418/files#diff-01844ca5cb007f5eab5fa4195f2f1378d4e7c64ba477fba64626c98ff4054038R66)).

Specifically, I'm extending the Template Library with `Expression`s, and lists of `Operations` (some basic Expressions). These Expressions can easily be nested and then filled with arguments, and applied in a `Template`.

Details, in the **order you should review them**:
- `Operations.java`: maps lots of primitive operators as Expressions.
- `Expression.java`: the fundamental engine behind Expressions.
- `examples/TestExpressions.java`: basic example using Expressions, filling them with random constants.
- `tests/TestExpression.java`: correctness test of Expression machinery.
- `compiler/igvn/ExpressionFuzzer.java`: expression fuzzer for primitive type expressions, including input range/bits constraints and output range/bits verification.
- `PrimitiveType.java`: added `LibraryRNG` facility. We already had `type.con()` which gave us random constants. But we also want to have `type.callLibraryRNG()` so that we can insert a call to a random number generator of the corresponding primitive type. I use this facility in the `ExpressionFuzzer.java` to generate random arguments for the expressions.
- `examples/TestPrimitiveTypes.java`: added a `LibraryRNG` example with a weak test for randomness: we should get at least 2 different values in 1000 calls.

If the reviewers absolutely insist, I could split out `LibraryRNG` into a separate RFE. But it's really not that much code, and has direct use in the `Expression` examples.

**Future Work**:
- Use `Expression`s in a loop over arrays / MemorySegment: fuzz auto-vectorization.
- Use `Expression`s to model more operations:
  - `Vector API`, more arithmetic operations like from `Math` classes etc.
- Ensure that the constraints / checksum mechanism in `compiler/igvn/ExpressionFuzzer.java` works, using IR rules. We may even need to add new IGVN optimizations. Add unsigned constraints.
- Find a way to delay IGVN optimizations to test worklist notification: For example, we could add a new testing operator, `TestUtils.delay(x) -> x`, which is intrinsified as some new `DelayNode` that in normal circumstances just folds away, but under `StressIGVN` and `StressCCP` can, at arbitrary times during CCP and IGVN, narrow its type or fold away. Initially, it outputs the `bottom_type`, no matter the input type. Eventually, we can progressively update the output to be narrower, as long as it still contains the input type, and at some point fold it away. Each time, this should trigger worklist notification, and could trigger optimizations. If there is a bug, IGVN / CCP verification could catch it.

-------------

Commit messages:
 - fix whitespaces
 - LibraryRNG example
 - fix bug
 - documentation
 - improve expression fuzzer
 - wip constraints
 - add more comments
 - wip test cmp
 - test refactoring
 - handle non-deterministic results
 - ... and 15 more: https://git.openjdk.org/jdk/compare/02fe095d...0709731a

Changes: https://git.openjdk.org/jdk/pull/26885/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26885&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8359412
  Stats: 1702 lines in 7 files changed: 1702 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/26885.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26885/head:pull/26885

PR: https://git.openjdk.org/jdk/pull/26885

From galder at openjdk.org  Thu Sep 11 06:11:11 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Thu, 11 Sep 2025 06:11:11 GMT
Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of
 vector compress
In-Reply-To: <tP4CE7n1GaoLjhB6BfpKBNaX_GCTwI-cJjUbanwm4Qg=.eb29b9ce-10b0-4d5c-8407-25deda9c3f6d@github.com>
References: <tP4CE7n1GaoLjhB6BfpKBNaX_GCTwI-cJjUbanwm4Qg=.eb29b9ce-10b0-4d5c-8407-25deda9c3f6d@github.com>
Message-ID: <G8aVuW-KQmy7GbZY0QblQy5taiBlNGRc6XP_Wz1TwWg=.5515c4a2-e293-4d08-a0cd-7b039cd10f43@github.com>

On Wed, 10 Sep 2025 08:41:51 GMT, erifan <duke at openjdk.org> wrote:

> The AArch64 SVE and SVE2 architectures lack an instruction suitable for subword-type `compress` operations. Therefore, the current implementation uses the 32-bit SVE `compact` instruction to compress subword types by first widening the high and low parts to 32 bits, compressing them, and then narrowing them back to their original type. Finally, the high and low parts are merged using the `index + tbl` instructions.
> 
> This approach is significantly slower than on architectures with native support. After evaluating all available AArch64 SVE instructions and experimenting with various implementations (such as looping over the active elements, extraction, and insertion), I confirmed that the existing algorithm is optimal given the instruction set. However, there is still room for optimization in the following two aspects:
> 1. Merging with `index + tbl` is suboptimal due to the high latency of the `index` instruction.
> 2. For partial subword types, operations on the highest half are unnecessary because those bits are invalid.
> 
> This pull request introduces the following changes:
> 1. Replaces `index + tbl` with the `whilelt + splice` instructions, which offer lower latency and higher throughput.
> 2. Eliminates unnecessary compress operations for partial subword type cases.
> 3. For `sve_compress_byte`, one fewer temporary register is used, to alleviate potential register pressure.
> 
> Benchmark results demonstrate that these changes significantly improve performance.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE:
> 
> Benchmark                Unit    Before   Error  After    Error  Uplift
> Byte128Vector.compress   ops/ms  4846.97  26.23  6638.56  31.60  1.36
> Byte64Vector.compress    ops/ms  2447.69  12.95  7167.68  34.49  2.92
> Short128Vector.compress  ops/ms  7174.88  40.94  8398.45   9.48  1.17
> Short64Vector.compress   ops/ms  3618.72   3.04  8618.22  10.91  2.38
> 
> 
> This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, and all tests passed.

Would it make sense to additionally run the relevant benchmarks on other popular aarch64 platforms such as Graviton, to make sure the improvements are seen there as well?
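For readers unfamiliar with the algorithm, the following is a scalar Java reference model of the approach described in the PR, not the SVE code itself. The per-half `compress` stands in for widen + 32-bit `compact` + narrow, and the merge step plays the role the PR assigns to `whilelt + splice`: the first `n` output lanes come from the compressed low half (with `n` the number of active low-half lanes), followed by the compressed high half. Class and method names are illustrative only.

```java
// Scalar reference model of subword compress via two half-width compresses;
// illustrative only, not the AArch64 SVE implementation.
final class CompressModel {
    // Vector API style compress: selected lanes packed to the bottom, rest zero.
    static byte[] compress(byte[] v, boolean[] mask) {
        byte[] out = new byte[v.length];
        int j = 0;
        for (int i = 0; i < v.length; i++) {
            if (mask[i]) out[j++] = v[i];
        }
        return out;
    }

    static byte[] compressViaHalves(byte[] v, boolean[] mask) {
        int half = v.length / 2;
        // Stand-in for: widen each half, 32-bit COMPACT, narrow back.
        byte[] lo = compress(java.util.Arrays.copyOfRange(v, 0, half),
                             java.util.Arrays.copyOfRange(mask, 0, half));
        byte[] hi = compress(java.util.Arrays.copyOfRange(v, half, v.length),
                             java.util.Arrays.copyOfRange(mask, half, v.length));
        int n = 0;  // active lanes in the low half (what WHILELT would turn into a predicate)
        for (int i = 0; i < half; i++) {
            if (mask[i]) n++;
        }
        // SPLICE-like merge: lanes [0, n) of lo, then the compressed high half.
        byte[] out = new byte[v.length];
        System.arraycopy(lo, 0, out, 0, n);
        System.arraycopy(hi, 0, out, n, hi.length);  // n + hi.length <= v.length
        return out;
    }
}
```

Comparing `compressViaHalves` against the direct `compress` on random inputs is a cheap way to convince oneself that the merge step is correct for every mask.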

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27188#issuecomment-3278225500

From galder at openjdk.org  Thu Sep 11 06:15:14 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Thu, 11 Sep 2025 06:15:14 GMT
Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of
 vector compress
In-Reply-To: <tP4CE7n1GaoLjhB6BfpKBNaX_GCTwI-cJjUbanwm4Qg=.eb29b9ce-10b0-4d5c-8407-25deda9c3f6d@github.com>
References: <tP4CE7n1GaoLjhB6BfpKBNaX_GCTwI-cJjUbanwm4Qg=.eb29b9ce-10b0-4d5c-8407-25deda9c3f6d@github.com>
Message-ID: <ZBjcahJBoRNtaFQl_Fqxfyl9nLiYdvxbB8Sd-bhBkyA=.5af061c8-e9b0-4d70-a3f5-8025e56b7a23@github.com>

On Wed, 10 Sep 2025 08:41:51 GMT, erifan <duke at openjdk.org> wrote:

> The AArch64 SVE and SVE2 architectures lack an instruction suitable for subword-type `compress` operations. Therefore, the current implementation uses the 32-bit SVE `compact` instruction to compress subword types by first widening the high and low parts to 32 bits, compressing them, and then narrowing them back to their original type. Finally, the high and low parts are merged using the `index + tbl` instructions.
> 
> This approach is significantly slower than on architectures with native support. After evaluating all available AArch64 SVE instructions and experimenting with various implementations (such as looping over the active elements, extraction, and insertion), I confirmed that the existing algorithm is optimal given the instruction set. However, there is still room for optimization in the following two aspects:
> 1. Merging with `index + tbl` is suboptimal due to the high latency of the `index` instruction.
> 2. For partial subword types, operations on the highest half are unnecessary because those bits are invalid.
> 
> This pull request introduces the following changes:
> 1. Replaces `index + tbl` with the `whilelt + splice` instructions, which offer lower latency and higher throughput.
> 2. Eliminates unnecessary compress operations for partial subword type cases.
> 3. For `sve_compress_byte`, one fewer temporary register is used, to alleviate potential register pressure.
> 
> Benchmark results demonstrate that these changes significantly improve performance.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE:
> 
> Benchmark                Unit    Before   Error  After    Error  Uplift
> Byte128Vector.compress   ops/ms  4846.97  26.23  6638.56  31.60  1.36
> Byte64Vector.compress    ops/ms  2447.69  12.95  7167.68  34.49  2.92
> Short128Vector.compress  ops/ms  7174.88  40.94  8398.45   9.48  1.17
> Short64Vector.compress   ops/ms  3618.72   3.04  8618.22  10.91  2.38
> 
> 
> This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, and all tests passed.

Changes requested by galder (Author).

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2292:

> 2290:   // Return if the vector length is no more than MaxVectorSize/2, since the
> 2291:   // highest half is invalid.
> 2292:   if (vector_length_in_bytes <= (MaxVectorSize >> 1)) {

Couldn't this check be done first thing when the function is called? That would avoid unnecessary work.

I also wonder whether this check should be done before `sve_compress_byte` is called, but at the very least I think it should be done first thing in this function.

-------------

PR Review: https://git.openjdk.org/jdk/pull/27188#pullrequestreview-3209040850
PR Review Comment: https://git.openjdk.org/jdk/pull/27188#discussion_r2338760542

From rcastanedalo at openjdk.org  Thu Sep 11 07:45:20 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 11 Sep 2025 07:45:20 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT [v2]
In-Reply-To: <gFzwQrRGY91LYbe0DluSjX80ET_kk6Kx7arucF2GW5c=.55deed6b-9e47-4000-9916-f1392967aa99@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
 <gFzwQrRGY91LYbe0DluSjX80ET_kk6Kx7arucF2GW5c=.55deed6b-9e47-4000-9916-f1392967aa99@github.com>
Message-ID: <8gKEdtd0n1SEUAGX-1Q41O0ZkCLNw2jUmXzDo1tWpyk=.30e53d54-d958-422e-8206-60fd56b9e412@github.com>

On Wed, 10 Sep 2025 18:39:06 GMT, Cesar Soares Lucas <cslucas at openjdk.org> wrote:

>> Please review this patch, which fixes an issue that may occur when reducing allocation merges.
>> 
>> As the assert message describes, the problem is that a `Phi` considered reducible during one invocation of `adjust_scalar_replaceable_state` later turns out to be non-reducible. This can happen if a subsequent invocation of the same method causes all inputs of the Phi to become NSR, in which case there is no point in reducing the Phi. It can also happen during the propagation of NSR state done by `find_scalar_replaceable_allocs`.
>> 
>> The change in `revisit_reducible_phi_status` is just a clean-up.
>> The real fix is in `find_scalar_replaceable_allocs`.
>> 
>> Tested on Linux x64/Aarch64 release/fastdebug with JTREG tier1-3.
>
> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Revert clean-up in EA. Make catch statements more specific in test case.

Changes requested by rcastanedalo (Reviewer).

test/hotspot/jtreg/compiler/escapeAnalysis/TestReduceAllocationNotReducibleAnymore.java line 38:

> 36: 
> 37: public class TestReduceAllocationNotReducibleAnymore {
> 38:     public static void main(String[] args) {

Suggestion:

    public static void main (String[] args) {

test/hotspot/jtreg/compiler/escapeAnalysis/TestReduceAllocationNotReducibleAnymore.java line 39:

> 37: public class TestReduceAllocationNotReducibleAnymore {
> 38:     public static void main(String[] args) {
> 39:         for (int i =0; i< 100; i++) {

Suggestion:

        for (int i = 0; i < 100; i++) {

-------------

PR Review: https://git.openjdk.org/jdk/pull/27063#pullrequestreview-3209508720
PR Review Comment: https://git.openjdk.org/jdk/pull/27063#discussion_r2339170016
PR Review Comment: https://git.openjdk.org/jdk/pull/27063#discussion_r2339171222

From rcastanedalo at openjdk.org  Thu Sep 11 07:45:22 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 11 Sep 2025 07:45:22 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT
In-Reply-To: <2brDXuLmbVBVRaeSyCdKokA706v3t6VsZfGvj_QceJ4=.4483390e-c726-4d82-b220-f1dbdf4efef0@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
 <1uDOe3Oe-hihmDHea2h8vcvRZsKKBeNp0J9lKYUujxk=.abd111bc-3625-4c71-bfa2-0a4c1f4d3875@github.com>
 <VOmxZ5c0SKETC1N8-S-WrvXIU8qCaA5NMS_68UGwVDc=.00bbe82b-dd42-4393-b57b-9df634a12d88@github.com>
 <2brDXuLmbVBVRaeSyCdKokA706v3t6VsZfGvj_QceJ4=.4483390e-c726-4d82-b220-f1dbdf4efef0@github.com>
Message-ID: <t_dqm0EN559GBmw4cJBPitWnDxmNFjoGMran3JxdRVI=.cb8a4e4f-7ce7-48f6-9c11-fe646c57efd7@github.com>

On Tue, 9 Sep 2025 07:01:15 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>>> Hi Cesar, thanks for addressing this issue. I will run some more comprehensive testing and have a look at it in the next days.
>> 
>> Testing did not reveal any issues. I have, however, a high-level question: could the current two-step design ([SR state adjustment loop](https://github.com/openjdk/jdk/blob/166ef5e7b1c6d6a9f0f1f29fedb7f65b94f53119/src/hotspot/share/opto/escape.cpp#L300-L315) followed by an [NSR propagation loop](https://github.com/openjdk/jdk/blob/166ef5e7b1c6d6a9f0f1f29fedb7f65b94f53119/src/hotspot/share/opto/escape.cpp#L318-L320)) miss marking allocations as NSR in more complex scenarios, e.g. involving longer points-to/merge chains? Wouldn't it be more principled to re-run the SR state adjustment loop until a fixed point is reached, keeping `reducible_merges` consistent as new allocations are discovered to be NSR? This could be done, e.g., by calling `revisit_reducible_phi_status` (with your clean-up applied) every time [an allocation is marked as NSR due to non-removable merges](https://github.com/openjdk/jdk/blob/166ef5e7b1c6d6a9f0f1f29fedb7f65b94f53119/src/hotspot/share/opto/escape.cpp#L2962-L2964).
>
>> @robcasloz - are you thinking that the "fixed point" loops on `find_scalar_replaceable_allocs` aren't sufficient?
> 
> You're right, that should do.
> 
>> At first glance yes, I think that the code would be more cleaned up if done that way. If the code had been written like that in the first place we wouldn't have seen the current issue. (...)
> 
> Agree, a single fixed point loop combining NSR detection and propagation would be ideal for clarity and maintainability.
> 
>>  I propose that we move forward with the current patch and work on this refactoring as a separate issue.
> 
> Sounds good, please file an RFE for that. I would then suggest postponing the clean-up in `revisit_reducible_phi_status` to that RFE.

> @robcasloz - I pushed some changes addressing yours and @eme64 comments. Could you please re-run your internal tests?

Thanks, I will report back within a couple of days.
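The single fixed-point loop discussed above can be illustrated with a deliberately simplified Java model. It is not HotSpot's escape analysis: the rules (a Phi stays reducible only while at least one input is SR; inputs of a Phi that cannot be reduced become NSR; NSR propagates along assumed `propagatesTo` edges standing in for points-to relations) and all names are assumptions. Iterating until nothing changes guarantees that every Phi left in the reducible set still passes the reducibility check, which is the invariant the failing assert expects.

```java
// Deliberately simplified model of NSR / reducible-Phi bookkeeping; not HotSpot code.
final class NsrFixedPoint {
    record Result(java.util.BitSet nsr, java.util.BitSet reducible) {}

    // escapes[a]        : allocation a is NSR from the start
    // phiInputs[p]      : allocations merged by Phi p
    // phiOtherwiseOk[p] : false if Phi p cannot be reduced for some other reason
    // propagatesTo[a]   : allocations that become NSR once a is NSR (assumed edges)
    static Result run(boolean[] escapes, int[][] phiInputs,
                      boolean[] phiOtherwiseOk, int[][] propagatesTo) {
        java.util.BitSet nsr = new java.util.BitSet();
        for (int a = 0; a < escapes.length; a++) {
            if (escapes[a]) nsr.set(a);
        }
        java.util.BitSet reducible = new java.util.BitSet();
        reducible.set(0, phiInputs.length);           // optimistic start
        boolean changed = true;
        while (changed) {                             // iterate to a fixed point
            changed = false;
            for (int p = 0; p < phiInputs.length; p++) {
                if (reducible.get(p) && !canReduce(p, phiInputs, phiOtherwiseOk, nsr)) {
                    reducible.clear(p);               // keep the reducible set consistent
                    changed = true;
                }
                if (!reducible.get(p)) {
                    // The merge cannot be removed, so its inputs cannot be scalar replaced.
                    for (int a : phiInputs[p]) changed |= markNsr(a, nsr);
                }
            }
            for (int a = nsr.nextSetBit(0); a >= 0; a = nsr.nextSetBit(a + 1)) {
                for (int b : propagatesTo[a]) changed |= markNsr(b, nsr);
            }
        }
        return new Result(nsr, reducible);
    }

    // Assumed rule: reducible only if otherwise OK and at least one input is still SR.
    static boolean canReduce(int p, int[][] phiInputs, boolean[] ok, java.util.BitSet nsr) {
        if (!ok[p]) return false;
        for (int a : phiInputs[p]) {
            if (!nsr.get(a)) return true;
        }
        return false;
    }

    private static boolean markNsr(int a, java.util.BitSet nsr) {
        if (nsr.get(a)) return false;
        nsr.set(a);
        return true;
    }
}
```

For example, with allocation 1 escaping, an edge 1 to 0, and a single Phi over allocation 0, one adjust-then-propagate pass would leave the Phi marked reducible even though its only input ends up NSR; the loop removes it.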

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27063#issuecomment-3278934984

From rcastanedalo at openjdk.org  Thu Sep 11 07:50:33 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 11 Sep 2025 07:50:33 GMT
Subject: RFR: 8327963: C2: fix construction of memory graph around
 Initialize node to prevent incorrect execution if allocation is removed [v8]
In-Reply-To: <hGGgYXj4IJCGws1HtyYZjSjpi88IemdVUxZO1HaVDdc=.9ee892d7-09ec-4752-a4ad-385ff209c5c0@github.com>
References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
 <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com>
 <phFEV6ecal3bMYgAt85dr5f6UKm024p2Ssw2l5zDvOQ=.c332a12d-5009-4e99-abc4-e0d58f06a075@github.com>
 <JczlkGMI1ugc2011v3_yecnmAihjcv5YYyixFtvZjvk=.3994dece-26bc-4c73-9850-8f63986b6fc7@github.com>
 <2m1_XtiSsW_LaBRrkX4qv7AKtLOjNgnl4mUp3zisasE=.dda62164-7aa0-4c1a-b83f-fa40ba7902e5@github.com>
 <eMGWpjjtAvxGzXXgDpfqUyz-LHobPg5dEAk99yQYhic=.81804900-b4ae-4b71-9a39-893fa7b6d36c@github.com>
 <LeeKE7VBNvxxD8-1ltyf2CGltyUV90y-ZabbxGVYXZc=.79192936-6954-4b74-a4ec-ead162efe4e2@github.com>
 <4374L3lkQK90wLxxOA7POBmIKNX2DFK-4pO4vj1bkuQ=.5b8d7825-a7f1-497f-ab66-02a85a266659@github.com>
 <QtsENUXeRsA140liru9rjk0KDbNVhKj6qPVU8toDlkI=.4b9eadfe-045e-4bae-a2c8-40c04496cb60@github.com>
 <BrNHUWgnhDZWz523gq_a8Smxck7UE0r0gBLQHfydrXk=.d96048bf-497e-426d-bdab-b58e63b1e5c6@github.com>
 <hGGgYXj4IJCGws1HtyYZjSjpi88IemdVUxZO1HaVDdc=.9ee892d7-09ec-4752-a4ad-385ff209c5c0@github.com>
Message-ID: <ZChc05Qt2p92YdfYKDubkDBnkvFqv3ETpjXRVyxKhnQ=.24861051-f0b7-4ba6-960d-92a5cf9ecf9a@github.com>

On Wed, 10 Sep 2025 08:29:19 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> That sounds good to me, thank you for enforcing this Roland! I will re-run testing and have a new look at the changeset within the next days.

Test results of b701d03ed335286587c4d2539dde715b091d30bd on top of jdk-26+14 look good. Will have a look at the code within the next days.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-3278966805

From rcastanedalo at openjdk.org  Thu Sep 11 08:40:15 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 11 Sep 2025 08:40:15 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT [v2]
In-Reply-To: <8gKEdtd0n1SEUAGX-1Q41O0ZkCLNw2jUmXzDo1tWpyk=.30e53d54-d958-422e-8206-60fd56b9e412@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
 <gFzwQrRGY91LYbe0DluSjX80ET_kk6Kx7arucF2GW5c=.55deed6b-9e47-4000-9916-f1392967aa99@github.com>
 <8gKEdtd0n1SEUAGX-1Q41O0ZkCLNw2jUmXzDo1tWpyk=.30e53d54-d958-422e-8206-60fd56b9e412@github.com>
Message-ID: <vV3Y_Ti-Gw_s_ul-nDjoBibcO67Hru9DPKC1uTMqoyo=.6abec990-436f-47ce-bb40-a62a1296cfc3@github.com>

On Thu, 11 Sep 2025 07:40:03 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Revert clean-up in EA. Make catch statements more specific in test case.
>
> test/hotspot/jtreg/compiler/escapeAnalysis/TestReduceAllocationNotReducibleAnymore.java line 38:
> 
>> 36: 
>> 37: public class TestReduceAllocationNotReducibleAnymore {
>> 38:     public static void main(String[] args) {
> 
> Suggestion:
> 
>     public static void main (String[] args) {

@JohnTortugo Please disregard this style suggestion, I had not had my morning coffee yet.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27063#discussion_r2339463846

From mli at openjdk.org  Thu Sep 11 09:03:56 2025
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 11 Sep 2025 09:03:56 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v5]
In-Reply-To: <IzlmxvN2cYF-OVYP_QsLfsGHpdI1EyVMIW-blkQa_Ko=.d3579688-3ffa-456a-a999-c1ec75ccc72e@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <64z-PlrnxAISLzKBq-RZz7CXkQirGTvOgTGMJQl833o=.73ea3239-dfb6-4e32-b20f-8398334f2759@github.com>
 <IzlmxvN2cYF-OVYP_QsLfsGHpdI1EyVMIW-blkQa_Ko=.d3579688-3ffa-456a-a999-c1ec75ccc72e@github.com>
Message-ID: <TbX0Ps86Ds60F98KXjt_afTSj_9dhe3jz1ohwM7cL1w=.8f9526fd-8427-45dc-9243-29af915a9278@github.com>

On Wed, 10 Sep 2025 18:43:46 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

> Hamlin had some offline Q so I gather this data for him:

Thanks Robbin for collecting the data!

> So on average using auipc+ld+jalr + JAL opt is 1.73% faster than the old trampolines.

This looks great!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26944#issuecomment-3279324038

From mli at openjdk.org  Thu Sep 11 09:03:58 2025
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 11 Sep 2025 09:03:58 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v2]
In-Reply-To: <R_lZuWiCR0VbCKRNb6cIlcdJKcmUca2HdD4m-Z_lK-w=.f2351d4e-1c54-4eac-9ff6-1e43a06ecfad@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <BymtYSgSlXggycd57z3AGcv-BFeaLsawQuSXWo_93lc=.6dc39f16-8cc8-4956-8214-ba7bc9b648a3@github.com>
 <LUj3nxfsDhJ2kTdroX_W8MZCHAlgjEtQ2byk-ke_Cos=.f8503d8f-02dc-442c-b871-1a7fa735dc93@github.com>
 <GKX55pfbOY1fAST3KbnIX3a-ZSHTg704ry9wrXbMbmQ=.4ebe7c56-6339-4982-a59b-1297f4a35732@github.com>
 <mp1Jr2Vt3uN19r4fINgunk_5JleGfm7HVpYsKSdeF5c=.4097c1ec-0dbd-4ede-8850-d4dac6a33705@github.com>
 <r6BB4foBp1qFm3dabVKnYTWHDXM7ZgiWlBnR4tN0Tdg=.d862d996-2e84-49fd-983a-29be846e23b1@github.com>
 <dK7zHpwKfAakWruo-sKFP5pPlMJV-ZWL0h5oX5Jag5Q=.3d028e3f-2d6b-40d0-90fd-9a0d4cd4c7f8@github.com>
 <R_lZuWiCR0VbCKRNb6cIlcdJKcmUca2HdD4m-Z_lK-w=.f2351d4e-1c54-4eac-9ff6-1e43a06ecfad@github.com>
Message-ID: <Goc1r75ZcZvBM4wRKCVbFBGBK1ZOEzQjaKHfI5rsdlE=.23f50846-23f3-469a-a8c3-ae961857614c@github.com>

On Wed, 3 Sep 2025 09:47:07 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>>> But the AbstractICache::invalidate_range is not documented to guarantee to have this effect.
>> 
>> What does "not documented" mean here? From reading the code, it seems `AbstractICache::invalidate_range` delegates to `icache_flush` on RISC-V, which does the fence and flush.
>> 
>> BTW, here are some comments from hotspot/share/runtime/icache.hpp,
>> 
>> // Default implementation is in icache.cpp, and can be hidden per-platform.
>> // Most platforms must provide only ICacheStubGenerator::generate_icache_flush().
>> 
>> 
>>> If someone executes the new instruction when changed to jalr(3), we did want them to call the new location we stored to the stub(1). By saying 1 happens before 3, we convey our intent.
>>> Aarch64 also have this.
>> 
>> Makes sense!
>> In the worst case, what would happen if we removed the 2 releases here and just relied on the `fence rw, rw` in `AbstractICache::invalidate_range`? It seems we're fine, based on your later comment.
>> I suppose these extra 2 releases carry some performance penalty? If so, I'm not sure it's worth handling such a rare condition so carefully.
>
>> > But the AbstractICache::invalidate_range is not documented to guarantee to have this effect.
>> 
>> What does "not documented" mean here? From reading the code, it seems `AbstractICache::invalidate_range` delegates to `icache_flush` on RISC-V, which does the fence and flush.
>> 
>> BTW, here are some comments from hotspot/share/runtime/icache.hpp,
>> 
>> ```
>> // Default implementation is in icache.cpp, and can be hidden per-platform.
>> // Most platforms must provide only ICacheStubGenerator::generate_icache_flush().
>> ```
> 
> Yes, and it doesn't say that this method also provides a release fence or anything like that.
> In other general code we seem to need it; I can replace release(4) with a comment if you like.
> 
>> 
>> > If someone executes the new instruction when changed to jalr(3), we did want them to call the new location we stored to the stub(1). By saying 1 happens before 3, we convey our intent.
>> > Aarch64 also have this.
>> 
>> Makes sense! In the worst case, what would happen if we removed the 2 releases here and just relied on the `fence rw, rw` in `AbstractICache::invalidate_range`? It seems we're fine, based on your later comment. I suppose these extra 2 releases carry some performance penalty? If so, I'm not sure it's worth handling such a rare condition so carefully.
> 
> Yes, we should be fine, but there is no reason not to store them in the desired order.
> No, there is no performance difference: this code is not executed often, and the call to invalidate_range is so slow that nothing else matters. You are talking about removing a few cycles from something that takes tens of thousands of cycles.

I think we'd better remove code that is not necessary, for both performance and readability.
If needed, we can add some comments here instead.

Otherwise the change looks good to me. Thanks!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26944#discussion_r2339594340

From rehn at openjdk.org  Thu Sep 11 09:19:59 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Thu, 11 Sep 2025 09:19:59 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v6]
In-Reply-To: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
Message-ID: <rYU7bMi_JCAFJKjZdTciL8TTbwA_bomSs3EJzEORWRs=.53465c22-1450-4eae-a4e9-2c94e210d652@github.com>

> Hey, please consider!
> 
> A bunch of info in JBS entry, please read that also.
> 
> I narrowed this issue down to the old jal optimization, making direct calls when in reach.
> This patch restores them and removes this regression.
> 
> In essence we turn "jalr ra,0(t1)" into a "jal ra,<dest>" if reachable, and restore the jalr if a new destination is not reachable.
> 
> Please test on your hardware!
> 
> 
> Chi Square (100 runs each, 10 fastest iterations of each run, P550)
> JDK-23 (last version with trampoline calls)
> Mean: 3189.5827
> Standard Deviation: 284.6478
> 
> JDK-25
> Mean: 3424.8905
> Standard Deviation: 222.2208
> 
> Patch:
> Mean: 3144.8535
> Standard Deviation: 229.2577
> 
> 
> No issues found in t1, running t2 also. Stress tested on vf2, bpi-f3, p550.

Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision:

 - Review fix
 - Merge branch 'master' into 8365926
 - Merge branch 'master' into 8365926
 - Review comments
 - Review comments
 - Merge branch 'master' into 8365926
 - Spelling
 - Merge branch 'master' into 8365926
 - draft jal<->jalr
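As an illustration of the jal/jalr switch described in the PR, here is a hypothetical Java sketch (not HotSpot's patching code; the class and method names are made up). JAL encodes a 21-bit signed, even PC-relative offset, so it reaches targets within roughly ±1 MiB of the call site; otherwise the indirect `jalr ra, 0(t1)` is kept. The encodings follow the RISC-V base ISA J-type and I-type formats.

```java
// Hypothetical sketch of choosing between a direct JAL and the indirect JALR
// at a call site; not HotSpot's RISC-V patching code.
final class RiscvCallPatch {
    static final int RA = 1, T1 = 6;  // x1 and x6

    // JAL reaches PC-relative targets with an even offset in [-1 MiB, 1 MiB - 2].
    static boolean jalReachable(long pc, long dest) {
        long off = dest - pc;
        return (off & 1) == 0 && off >= -(1L << 20) && off < (1L << 20);
    }

    // J-type encoding: imm[20|10:1|11|19:12] rd 1101111.
    static int encodeJal(int rd, long offset) {
        int imm = (int) offset;
        return (((imm >> 20) & 0x1) << 31)
             | (((imm >> 1) & 0x3FF) << 21)
             | (((imm >> 11) & 0x1) << 20)
             | (((imm >> 12) & 0xFF) << 12)
             | (rd << 7)
             | 0b1101111;
    }

    // I-type encoding of JALR: imm[11:0] rs1 000 rd 1100111.
    static int encodeJalr(int rd, int rs1, int imm12) {
        return ((imm12 & 0xFFF) << 20) | (rs1 << 15) | (rd << 7) | 0b1100111;
    }

    // Instruction word for the call's jump slot: direct jal when in range,
    // otherwise fall back to "jalr ra, 0(t1)".
    static int callInstruction(long pc, long dest) {
        return jalReachable(pc, dest) ? encodeJal(RA, dest - pc) : encodeJalr(RA, T1, 0);
    }
}
```

For example, `callInstruction(0x1000, 0x1008)` yields the encoding of `jal ra, 8`, while a target 2 MiB away falls back to `jalr ra, 0(t1)`.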

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26944/files
  - new: https://git.openjdk.org/jdk/pull/26944/files/da18e6b6..b4e6c579

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26944&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26944&range=04-05

  Stats: 16185 lines in 515 files changed: 7189 ins; 6335 del; 2661 mod
  Patch: https://git.openjdk.org/jdk/pull/26944.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26944/head:pull/26944

PR: https://git.openjdk.org/jdk/pull/26944

From mli at openjdk.org  Thu Sep 11 09:25:16 2025
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 11 Sep 2025 09:25:16 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v6]
In-Reply-To: <rYU7bMi_JCAFJKjZdTciL8TTbwA_bomSs3EJzEORWRs=.53465c22-1450-4eae-a4e9-2c94e210d652@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <rYU7bMi_JCAFJKjZdTciL8TTbwA_bomSs3EJzEORWRs=.53465c22-1450-4eae-a4e9-2c94e210d652@github.com>
Message-ID: <Kw6bbUEAQ7F2f0EIVfvAxOodCeQzsfJkyx5AGZGUusc=.70e01c02-ed6e-470f-acc9-59bcb77f236a@github.com>

On Thu, 11 Sep 2025 09:19:59 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hey, please consider!
>> 
>> A bunch of info in JBS entry, please read that also.
>> 
>> I narrowed this issue down to the old jal optimization, making direct calls when in reach.
>> This patch restores them and removes this regression.
>> 
>> In essence we turn "jalr ra,0(t1)" into a "jal ra,<dest>" if reachable, and restore the jalr if a new destination is not reachable.
>> 
>> Please test on your hardware!
>> 
>> 
>> Chi Square (100 runs each, 10 fastest iterations of each run, P550)
>> JDK-23 (last version with trampoline calls)
>> Mean: 3189.5827
>> Standard Deviation: 284.6478
>> 
>> JDK-25
>> Mean: 3424.8905
>> Standard Deviation: 222.2208
>> 
>> Patch:
>> Mean: 3144.8535
>> Standard Deviation: 229.2577
>> 
>> 
>> No issues found in t1, running t2 also. Stress tested on vf2, bpi-f3, p550.
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision:
> 
>  - Review fix
>  - Merge branch 'master' into 8365926
>  - Merge branch 'master' into 8365926
>  - Review comments
>  - Review comments
>  - Merge branch 'master' into 8365926
>  - Spelling
>  - Merge branch 'master' into 8365926
>  - draft jal<->jalr

Looks good. Thanks!

-------------

Marked as reviewed by mli (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26944#pullrequestreview-3210160509

From thartmann at openjdk.org  Thu Sep 11 09:45:29 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 11 Sep 2025 09:45:29 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT [v2]
In-Reply-To: <gFzwQrRGY91LYbe0DluSjX80ET_kk6Kx7arucF2GW5c=.55deed6b-9e47-4000-9916-f1392967aa99@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
 <gFzwQrRGY91LYbe0DluSjX80ET_kk6Kx7arucF2GW5c=.55deed6b-9e47-4000-9916-f1392967aa99@github.com>
Message-ID: <tXYgEjTd-sJKM-pfP6s4-b1ej_0qFGlsCcLB7hLNYxM=.f72a884f-8b9a-43fc-91ff-0b5a1c91fbfc@github.com>

On Wed, 10 Sep 2025 18:39:06 GMT, Cesar Soares Lucas <cslucas at openjdk.org> wrote:

>> Please review this patch, which fixes an issue that may occur when reducing allocation merges.
>> 
>> As the assert message describes, the problem is that a `Phi` considered reducible during one invocation of `adjust_scalar_replaceable_state` later turns out to be non-reducible. This can happen if a subsequent invocation of the same method causes all inputs of the Phi to become NSR, in which case there is no point in reducing the Phi. It can also happen during the propagation of NSR state done by `find_scalar_replaceable_allocs`.
>> 
>> The change in `revisit_reducible_phi_status` is just a clean-up.
>> The real fix is in `find_scalar_replaceable_allocs`.
>> 
>> Tested on Linux x64/Aarch64 release/fastdebug with JTREG tier1-3.
>
> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Revert clean-up in EA. Make catch statements more specific in test case.

src/hotspot/share/opto/escape.cpp line 3135:

> 3133:           Node* phi = use->ideal_node();
> 3134:           if (phi->Opcode() == Op_Phi && reducible_merges.member(phi)) {
> 3135:             if (!can_reduce_phi(phi->as_Phi())) {

Drive-by comment: I think the nested ifs should be merged.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27063#discussion_r2339804735

From dlunden at openjdk.org  Thu Sep 11 10:02:33 2025
From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=)
Date: Thu, 11 Sep 2025 10:02:33 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v26]
In-Reply-To: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
Message-ID: <oRHbPZI1QXNKCfqk8IbdWcTHbIqZsbZHAsgHyUYMf3o=.344b6b42-2978-4a75-9711-9d6e4f0e6e2a@github.com>

> If a method has a large number of parameters, we currently bail out from C2 compilation.
> 
> ### Changeset
> 
> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
> 
> Changes:
> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
> - Remove all `can_represent` checks and bailouts.
> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
> 
> ### Testing
> 
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it).
> 
> ![c2-regression](https:/...

Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:

  Address review comments (renaming on the way in a separate PR)
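As a rough illustration of the register mask design described in the PR (a statically sized base, dynamically allocated words only for the rare masks that need more stack slots, and an early-exit `overlap`), here is a hypothetical Java sketch. It is not HotSpot's `RegMask`; `BASE_WORDS`, the growth policy and all method names are assumptions.

```java
// Hypothetical sketch of a register mask with a fixed base and a lazily
// grown extension; not HotSpot's RegMask.
final class GrowableRegMask {
    private static final int BASE_WORDS = 4;          // assumed: 256 bits cover the common case
    private final long[] base = new long[BASE_WORDS];
    private long[] ext = null;                        // allocated only when needed

    private int words() { return BASE_WORDS + (ext == null ? 0 : ext.length); }

    private long word(int w) {
        if (w < BASE_WORDS) return base[w];
        int e = w - BASE_WORDS;
        return (ext != null && e < ext.length) ? ext[e] : 0L;
    }

    void insert(int reg) {
        int w = reg >>> 6;
        if (w < BASE_WORDS) { base[w] |= 1L << reg; return; }  // long shift uses reg & 63
        int e = w - BASE_WORDS;
        if (ext == null || e >= ext.length) {
            // Rare path: grow the extension (at least doubling) to cover word w.
            int newLen = Math.max(e + 1, ext == null ? 1 : ext.length * 2);
            ext = (ext == null) ? new long[newLen] : java.util.Arrays.copyOf(ext, newLen);
        }
        ext[e] |= 1L << reg;
    }

    boolean member(int reg) { return (word(reg >>> 6) & (1L << reg)) != 0; }

    // True if the masks share any bit; returns at the first common word.
    boolean overlap(GrowableRegMask other) {
        int n = Math.min(words(), other.words());     // beyond n, one side is all zero
        for (int w = 0; w < n; w++) {
            if ((word(w) & other.word(w)) != 0) return true;  // early exit
        }
        return false;
    }

    // Lowest set bit, or -1 if empty (a simplified take on the chunked search).
    int firstSet() {
        for (int w = 0; w < words(); w++) {
            long bits = word(w);
            if (bits != 0) return (w << 6) + Long.numberOfTrailingZeros(bits);
        }
        return -1;
    }
}
```

In this layout, masks that never touch a high stack slot never allocate, so the common path only pays for an extra null/length check, which is in line with the small compile-time cost reported in the PR description.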

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20404/files
  - new: https://git.openjdk.org/jdk/pull/20404/files/c4a706b5..f250a061

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=25
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=24-25

  Stats: 203 lines in 2 files changed: 183 ins; 14 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/20404.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404

PR: https://git.openjdk.org/jdk/pull/20404

From dlunden at openjdk.org  Thu Sep 11 10:02:38 2025
From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=)
Date: Thu, 11 Sep 2025 10:02:38 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <nPTzMYzEUYJny3vO2sSKelMlFnsdxzKrKisedajsGlI=.d58e9b0e-dbfc-4cd1-8010-046621d48351@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <nPTzMYzEUYJny3vO2sSKelMlFnsdxzKrKisedajsGlI=.d58e9b0e-dbfc-4cd1-8010-046621d48351@github.com>
Message-ID: <A6219amqOCQ9jXs3RyRgFQ309qcYfU4cXjPpX0-DTlc=.106790a9-ff9b-4855-a18b-48d292198b67@github.com>

On Mon, 1 Sep 2025 07:30:49 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 35 commits:
>> 
>>  - Restore modified java/lang/invoke tests
>>  - Sort includes (new requirement)
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Add clarifying comments at definitions of register mask sizes
>>  - Fix implicit zero and nullptr checks
>>  - Add deep copy comment
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Fix typo
>>  - Updates after Emanuel's comments
>>  - Refactor and improve TestNestedSynchronize.java
>>  - ... and 25 more: https://git.openjdk.org/jdk/compare/b39c7369...80c6cf47
>
> test/hotspot/jtreg/compiler/arguments/TestMaxMethodArguments.java line 57:
> 
>> 55:         try {
>> 56:             test(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255);
>> 57:         } catch (TestException e) {
> 
> This seems to be the only test that actually tests what your PR title promises: it has a method with many arguments.

I have now pushed a new template framework test `TestMethodArguments.java`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2339895018

From dlunden at openjdk.org  Thu Sep 11 10:13:11 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Thu, 11 Sep 2025 10:13:11 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp
Message-ID: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>

Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.

### Changeset

- Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
- Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
- Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
- Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
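To make the int-sized versus word-sized distinction behind these renamings concrete, here is a minimal standalone sketch of how a single register bit is located in each view. It is not HotSpot's `RegMask` code: the size value 8 is an arbitrary assumption, and the constants drop the leading underscore (`RM_SIZE_IN_WORDS`, `RM_WORD_MAX_INDEX`) because identifiers that start with an underscore followed by a capital letter are reserved in standard C++.

```cpp
#include <cstdint>

// Assumed size for illustration only; the real value depends on the platform.
constexpr int RM_SIZE_IN_INTS = 8;

constexpr int BITS_PER_INT  = 32;
constexpr int BITS_PER_WORD = static_cast<int>(sizeof(uintptr_t) * 8);  // 32 or 64

// Number of machine-word-sized elements that cover the same bits.
constexpr int RM_SIZE_IN_WORDS  = RM_SIZE_IN_INTS * BITS_PER_INT / BITS_PER_WORD;
// Largest valid index into the word-sized array.
constexpr int RM_WORD_MAX_INDEX = RM_SIZE_IN_WORDS - 1;

static_assert(RM_SIZE_IN_INTS * BITS_PER_INT == RM_SIZE_IN_WORDS * BITS_PER_WORD,
              "both views must cover the same number of bits");

struct BitPos {
  int index;  // element index in the array
  int shift;  // bit position inside that element
};

// Position of register bit `reg` in the int-sized (32-bit) view.
constexpr BitPos int_pos(int reg) { return {reg / BITS_PER_INT, reg % BITS_PER_INT}; }

// Position of register bit `reg` in the machine-word-sized view.
constexpr BitPos word_pos(int reg) { return {reg / BITS_PER_WORD, reg % BITS_PER_WORD}; }
```

On a 64-bit platform the same register bit therefore lands at half the element index in the word view as in the int view, which is exactly why the old names `RM_SIZE` and `_RM_SIZE` were easy to confuse.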

### Testing

- [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
- `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.

-------------

Commit messages:
 - Fix issue

Changes: https://git.openjdk.org/jdk/pull/27215/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27215&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367397
  Stats: 108 lines in 12 files changed: 1 ins; 0 del; 107 mod
  Patch: https://git.openjdk.org/jdk/pull/27215.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27215/head:pull/27215

PR: https://git.openjdk.org/jdk/pull/27215

From dlunden at openjdk.org  Thu Sep 11 10:13:11 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Thu, 11 Sep 2025 10:13:11 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp
In-Reply-To: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
Message-ID: <JccL1PeACLU1ktMIxHq9fK4N6BoZ5HTCzOTlFFLuWrQ=.35f0c1f7-605a-4821-8bf0-e89eefc1d80d@github.com>

On Thu, 11 Sep 2025 10:04:47 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

> Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.
> 
> ### Changeset
> 
> - Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
> - Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
> - Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
> - Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
> 
> ### Testing
> 
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.

Suggesting @robcasloz and @eme64 for reviewing, as you are already familiar with the code.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27215#issuecomment-3279707428

From dlunden at openjdk.org  Thu Sep 11 10:16:33 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Thu, 11 Sep 2025 10:16:33 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v23]
In-Reply-To: <qfJuLa2rYGYnrmbp32LpJgVaZfShvNjVkGOuJrSw00A=.5f7b712d-5700-45b5-8beb-fde3611e31de@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <KuhZYofHDkGkzw1Kq6vDvRs4_aDxOJDbTpIL8gnkQL8=.0d25e4bc-1f73-490f-a65b-29bef7ac8903@github.com>
 <qfJuLa2rYGYnrmbp32LpJgVaZfShvNjVkGOuJrSw00A=.5f7b712d-5700-45b5-8beb-fde3611e31de@github.com>
Message-ID: <GjF5qX4BV-4xAWV6kDweN3luDSVQXxxp5i6creb7_L4=.085a85af-0ec5-42ca-a076-bbf554853d3a@github.com>

On Wed, 27 Aug 2025 09:08:09 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Add clarifying comments at definitions of register mask sizes
>
>> For reference, here is now the changeset adding an IFG bailout: #26118
> 
> Since that is now integrated: do we need to make any changes to the patch here? I thought the goal was to use the bailouts instead of increasing `MaxNodeLimit`.
> 
> Because looking at the discussions above: we were worried that there could be compile-time regressions - even if quite rare. But they were in the range of 40s which is quite scary. Are these now gone?

@eme64 I have now addressed your comments (the renaming is in https://github.com/openjdk/jdk/pull/27215, as requested). Please have a look and let me know if I've missed something.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-3279719061

From mli at openjdk.org  Thu Sep 11 10:49:22 2025
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 11 Sep 2025 10:49:22 GMT
Subject: RFR: 8367406: Simple refactoring AOTCodeAddressTable::id_for_address
Message-ID: <-15yudSPZOyKnpwNY9mTVKdXDu4hcjxqZxI2AXodi5Q=.9a3db977-a9d8-4061-8a04-39ce967eb550@github.com>

Hi,
Can you help to review this simple refactoring?

AOTCodeAddressTable::id_for_address is currently implemented with many nested if/else blocks; it seems we could make the code more readable by removing this nesting. But this is quite subjective, so I'll let you judge whether the patch is helpful.

Ran tests (test/hotspot/jtreg/runtime/cds/appcds/aot*); no new failures on x64.

Thanks!
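The flattening proposed here is the usual guard-clause style: check each case and return early instead of nesting if/else blocks. The following is a generic standalone sketch of that style only; `Bucket`, `find_in_bucket`, and the free function `id_for_address` are hypothetical and do not reflect the real `AOTCodeAddressTable` layout or its lookup order.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical address bucket: a table of addresses whose ids start at base_id.
struct Bucket {
  const uintptr_t* addrs;
  int len;
  int base_id;
};

// Returns the id of `addr` within one bucket, or -1 if it is not there.
int find_in_bucket(const Bucket& bucket, uintptr_t addr) {
  for (int i = 0; i < bucket.len; i++) {
    if (bucket.addrs[i] == addr) {
      return bucket.base_id + i;
    }
  }
  return -1;
}

// Flat lookup: guard clauses and early returns instead of nested if/else.
int id_for_address(uintptr_t addr, const Bucket* buckets, int nbuckets) {
  if (addr == 0) {
    return -1;  // guard: a null address never has an id
  }
  for (int b = 0; b < nbuckets; b++) {
    int id = find_in_bucket(buckets[b], addr);
    if (id != -1) {
      return id;  // first matching bucket wins
    }
  }
  return -1;  // not found in any bucket
}
```

Each new kind of address then adds one more independent check rather than another level of nesting.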

-------------

Commit messages:
 - initial commit

Changes: https://git.openjdk.org/jdk/pull/27217/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27217&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367406
  Stats: 60 lines in 1 file changed: 9 ins; 12 del; 39 mod
  Patch: https://git.openjdk.org/jdk/pull/27217.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27217/head:pull/27217

PR: https://git.openjdk.org/jdk/pull/27217

From fandreuzzi at openjdk.org  Thu Sep 11 11:07:34 2025
From: fandreuzzi at openjdk.org (Francesco Andreuzzi)
Date: Thu, 11 Sep 2025 11:07:34 GMT
Subject: RFR: 8367406: Simple refactoring
 AOTCodeAddressTable::id_for_address
In-Reply-To: <-15yudSPZOyKnpwNY9mTVKdXDu4hcjxqZxI2AXodi5Q=.9a3db977-a9d8-4061-8a04-39ce967eb550@github.com>
References: <-15yudSPZOyKnpwNY9mTVKdXDu4hcjxqZxI2AXodi5Q=.9a3db977-a9d8-4061-8a04-39ce967eb550@github.com>
Message-ID: <99qr4pl_RhUPNfhtAirWoeY5fUDaYdEgOYXetw59yGw=.f91f9b32-7a3f-4e1a-848c-c12cf65394a6@github.com>

On Thu, 11 Sep 2025 10:42:48 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Hi,
> Can you help to review this simple refactoring?
> 
> AOTCodeAddressTable::id_for_address currently is implemented in a way that introduce too many nested if/else, seems we could make the code more readable by removing these nested if/else. But it's quite subjective, so I'll let you tell if the patch is helpful.
> 
> Run tests (test/hotspot/jtreg/runtime/cds/appcds/aot*), no new failures on x64.
> 
> Thanks!

src/hotspot/share/code/aotCodeCache.cpp line 1685:

> 1683:       desc = StubCodeDesc::desc_for(addr + frame::pc_return_offset);
> 1684:     }
> 1685:     const char* sub_name = (desc != nullptr) ? desc->name() : "<unknown>";

This seems to be used only in the assertion, maybe it could be hidden behind a `#ifdef ASSERT`?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27217#discussion_r2340199165

From jbhateja at openjdk.org  Thu Sep 11 12:16:47 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 11 Sep 2025 12:16:47 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v7]
In-Reply-To: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
Message-ID: <fSXhxnCbvqLQDqh6nvnQKE61sw4my40lxRcCciDmZxY=.ead513af-c671-449a-87b6-eb2d8d630d18@github.com>

> This patch optimizes PopCount value transforms using KnownBits information.
> Following are the results of the micro-benchmark included with the patch
> 
> 
> 
> System: 13th Gen Intel(R) Core(TM) i3-1315U
> 
> Baseline:
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s
> 
> Withopt:
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s
> 
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin
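The idea in the patch description can be illustrated with a small sketch: known ZERO and ONE bits bound the result of PopCount from both sides. This is a standalone illustration under assumed names (`KnownBits32`, `popcount_range`), not the representation C2 uses.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical known-bits summary of a 32-bit value: a 1 in `zeros` means the
// bit is known to be 0, a 1 in `ones` means the bit is known to be 1.
struct KnownBits32 {
  uint32_t zeros;
  uint32_t ones;
};

struct Range {
  int lo;
  int hi;
};

// Bounds on popcount(x) for every x consistent with `kb`:
//  - each known-one bit is always counted      -> lower bound
//  - each bit not known to be zero may be set  -> upper bound
Range popcount_range(KnownBits32 kb) {
  assert((kb.zeros & kb.ones) == 0 && "a bit cannot be known 0 and known 1");
  int lo = static_cast<int>(std::bitset<32>(kb.ones).count());
  int hi = 32 - static_cast<int>(std::bitset<32>(kb.zeros).count());
  return {lo, hi};
}
```

When the two bounds coincide, the result is a constant, which appears to be the situation targeted by the elision tests mentioned later in this thread (`testPopCountElisionInt1`, `testPopCountElisionLong1`).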

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Adding random bound test point

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27075/files
  - new: https://git.openjdk.org/jdk/pull/27075/files/9e3957de..a7f9b79c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=05-06

  Stats: 60 lines in 1 file changed: 58 ins; 1 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27075.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27075/head:pull/27075

PR: https://git.openjdk.org/jdk/pull/27075

From jbhateja at openjdk.org  Thu Sep 11 12:16:49 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 11 Sep 2025 12:16:49 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v3]
In-Reply-To: <qlo1opmYZQ8czfnsycZObRee3wbq_4nVyLsi2znLv9k=.2c6b1641-a784-4b44-afb1-faa68b733352@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <eXrwH7c0C-O1WG8EdYsncgayPjViNP0yIyPL6PNOIk0=.697e0e55-56fa-41ff-aaf4-6a3baf7e65d2@github.com>
 <AUuedV6hy3s5kl9HPpeKsn2F60SCA9Boow9Id1hblIk=.4d01a071-5024-464c-8e5b-bcc8c1a435c3@github.com>
 <88lK21UPhkqWYMU-PNUCMYYH1QWrjiUfftspxZB7GFM=.99f8aa26-0075-4932-a427-054f088d8068@github.com>
 <7GYE4B_fk2sz0pxSjPgYxpTWz1v4T0-V-oMmBcS0tpY=.658141db-9bac-4f19-876f-f859ae00984b@github.com>
 <JYdOI7YvcPChDpvScIRPuEEPsxfmNNl2FBljazj_TzU=.db1076c4-7526-4722-91be-dd9d5218bf2b@github.com>
 <SrTJSVBZJcq9X_awIuwMAUdDfFZ9yaUzufdJt3W7QfM=.4bc569a7-074a-423d-aab8-04bf2d6ac9e0@github.com>
 <4PssROFHsUv9rYCp9KlszXVzJV4jIxbHSOWKmQ8VA0k=.db8cac2d-23fd-4ad0-ac1e-7c6e2f3c7b8e@github.com>
 <6eJGadjOxt_uInDZmiRc5MZNefslQT3-bOcsTp2tEe0=.0bdc2d27-d09a-4872-9633-51c2b55d1c18@github.com>
 <qlo1opmYZQ8czfnsycZObRee3wbq_4nVyLsi2znLv9k=.2c6b1641-a784-4b44-afb1-faa68b733352@github.com>
Message-ID: <4c4bRGY4F63wqzDBO5mcg5K6o51PkRPlfH65EsIYqXI=.b0a45206-5c7a-4635-8fbe-2f97cd6c6463@github.com>

On Wed, 10 Sep 2025 16:03:10 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>>> @SirYwell @jatin-bhateja @merykitty I linked this issue here to the KnownBits RFE, to make sure we keep track of all KnownBits extensions. Can you please help me with linking any other RFEs that have already been filed or come up in the future? It would help track progress and avoid duplicate work.
>> 
>> Current And Value transforms:
>>     - Constant folding (both inputs constant)
>>     - There are four possible cases for known-bits extraction:
>> 
>>         _lo    _hi
>>         <0     <0     : Possibility of finding a common prefix and known ZERO and ONE bits in the common portion.
>>         >=0    <0     : Not applicable, since the lower bound would be greater than the upper bound.
>>         <0     >=0    : No common prefix between the hi and lo bounds is possible, so no known bits exist.
>>         >=0    >=0    : Possibility of finding a common prefix and known ZERO and ONE bits in the common portion.
>> 
>> 
>> Existing value transforms and canonicalization should furnish known bits in applicable scenarios. 
>> 
>> For a full solution, we can add another rule to directly AND the known ZERO and ONE bits of participating inputs, and let canonicalization compute the resultant type and clean up existing handling in Value transforms and explicit constant folding
>
>> For a full solution, we can add another rule to directly AND the known ZERO and ONE bits of participating inputs, and let canonicalization compute the resultant type and clean up existing handling in Value transforms and explicit constant folding
> 
> Yes, this is what we would end up with after https://bugs.openjdk.org/browse/JDK-8367341 . But I think currently, there is no good way to set / get bits directly.
> 
> Using signed comparisons as you mentioned is only of limited help. But it is what we have for now.

> > testPopCountElisionInt1 and testPopCountElisionLong1 check for absence of PopCount IR nodes.
> 
> @jatin-bhateja But there the clamps are with fixed constants. It would be nice if we also had some tests with randomized constants. We don't need IR tests for those, just result verification.

Hi @eme64, added a random bound test point.
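The lo/hi case table quoted above boils down to one rule: the bits in which `_lo` and `_hi` share a common prefix are fixed for every value in the range. Here is a minimal 32-bit sketch of that rule. The names are hypothetical, and the code is unrelated to C2's actual `TypeInt` code.

```cpp
#include <cassert>
#include <cstdint>

struct KnownBits32 {
  uint32_t zeros;  // 1 = bit known to be 0
  uint32_t ones;   // 1 = bit known to be 1
};

// Number of leading zero bits of a 32-bit value (32 for x == 0).
int leading_zeros(uint32_t x) {
  int n = 0;
  for (uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1) {
    n++;
  }
  return n;
}

// Known bits shared by every value in the signed range [lo, hi], lo <= hi.
KnownBits32 known_bits_from_range(int32_t lo, int32_t hi) {
  assert(lo <= hi);
  uint32_t ulo = static_cast<uint32_t>(lo);
  uint32_t uhi = static_cast<uint32_t>(hi);
  int prefix = leading_zeros(ulo ^ uhi);  // length of the common bit prefix
  // Mask selecting the common prefix; shifting by 32 would be undefined.
  uint32_t mask = (prefix == 32) ? 0xFFFFFFFFu : ~(0xFFFFFFFFu >> prefix);
  return {~ulo & mask, ulo & mask};
}
```

When both bounds have the same sign, signed and unsigned order agree, so every value in the range lies between the endpoints as unsigned numbers and therefore shares their prefix. When `lo < 0 <= hi`, the sign bits already differ, the prefix is empty, and no bits are known, which matches the third row of the table.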

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2340528585

From mli at openjdk.org  Thu Sep 11 12:29:48 2025
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 11 Sep 2025 12:29:48 GMT
Subject: RFR: 8367406: Simple refactoring
 AOTCodeAddressTable::id_for_address
In-Reply-To: <99qr4pl_RhUPNfhtAirWoeY5fUDaYdEgOYXetw59yGw=.f91f9b32-7a3f-4e1a-848c-c12cf65394a6@github.com>
References: <-15yudSPZOyKnpwNY9mTVKdXDu4hcjxqZxI2AXodi5Q=.9a3db977-a9d8-4061-8a04-39ce967eb550@github.com>
 <99qr4pl_RhUPNfhtAirWoeY5fUDaYdEgOYXetw59yGw=.f91f9b32-7a3f-4e1a-848c-c12cf65394a6@github.com>
Message-ID: <h-OUyBSccGLFRzy5Bbexct4XQlMNazG6e235HFjHwHk=.e15db576-be5b-4e92-836a-e58f2d835997@github.com>

On Thu, 11 Sep 2025 11:04:54 GMT, Francesco Andreuzzi <fandreuzzi at openjdk.org> wrote:

>> Hi,
>> Can you help to review this simple refactoring?
>> 
>> AOTCodeAddressTable::id_for_address currently is implemented in a way that introduce too many nested if/else, seems we could make the code more readable by removing these nested if/else. But it's quite subjective, so I'll let you tell if the patch is helpful.
>> 
>> Run tests (test/hotspot/jtreg/runtime/cds/appcds/aot*), no new failures on x64.
>> 
>> Thanks!
>
> src/hotspot/share/code/aotCodeCache.cpp line 1685:
> 
>> 1683:       desc = StubCodeDesc::desc_for(addr + frame::pc_return_offset);
>> 1684:     }
>> 1685:     const char* sub_name = (desc != nullptr) ? desc->name() : "<unknown>";
> 
> This seems to be used only in the assertion, maybe it could be hidden behind a `#ifdef ASSERT`?

I assume the compiler will remove it in the product build, as `sub_name` is not used anywhere else and computing it has no side effects.
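The two options discussed here can be sketched in a standalone form. In the `#ifdef ASSERT` variant, the preprocessor removes the lookup in product builds. Without the guard, dropping it depends on the optimizer proving that the lookup has no side effects. `StubDesc`, `lookup_desc`, and `checked_id` are hypothetical stand-ins, not HotSpot's `StubCodeDesc` API, and the standard `assert` replaces HotSpot's message-taking `assert`.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stub descriptor and lookup (stand-ins only).
struct StubDesc {
  const char* name;
};

static const StubDesc kExampleStub = {"example_stub"};

const StubDesc* lookup_desc(int addr) {
  return (addr == 42) ? &kExampleStub : nullptr;
}

int checked_id(int addr, int id) {
#ifdef ASSERT
  // Compiled only when ASSERT is defined (HotSpot defines it in debug
  // builds), so product builds never perform the lookup at all.
  const StubDesc* desc = lookup_desc(addr);
  const char* sub_name = (desc != nullptr) ? desc->name : "<unknown>";
  if (id == -1) {
    std::fprintf(stderr, "address %d ('%s') missing from table\n", addr, sub_name);
  }
  assert(id != -1);
#endif
  (void)addr;  // avoid an unused-parameter warning when ASSERT is undefined
  return id;
}
```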

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27217#discussion_r2340594415

From epeter at openjdk.org  Thu Sep 11 12:56:45 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 11 Sep 2025 12:56:45 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp
In-Reply-To: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
Message-ID: <2unG-RdDR2e1mI-veaR3AdDGGs1q4XFdITrnQtBGOw8=.47f7d565-b722-434b-96ef-b51ed733b241@github.com>

On Thu, 11 Sep 2025 10:04:47 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

> Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.
> 
> ### Changeset
> 
> - Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
> - Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
> - Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
> - Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
> 
> ### Testing
> 
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.

Thanks for doing this! I think it is a step in the right direction - though I have not checked if it renames everything we should. I think we can just do this one now, integrate it to your other PR, and see if we need to do another round of renamings.

src/hotspot/share/opto/regmask.hpp line 78:

> 76:     // is something like 90+ parameters.
> 77:     int       _RM_INT[RM_SIZE_IN_INTS];
> 78:     uintptr_t _RM_WORD[_RM_SIZE_IN_WORDS];

Is there now still a reason to have `_` for the words and not for the ints?

-------------

PR Review: https://git.openjdk.org/jdk/pull/27215#pullrequestreview-3211313012
PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2340652830

From epeter at openjdk.org  Thu Sep 11 12:56:46 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 11 Sep 2025 12:56:46 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp
In-Reply-To: <2unG-RdDR2e1mI-veaR3AdDGGs1q4XFdITrnQtBGOw8=.47f7d565-b722-434b-96ef-b51ed733b241@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <2unG-RdDR2e1mI-veaR3AdDGGs1q4XFdITrnQtBGOw8=.47f7d565-b722-434b-96ef-b51ed733b241@github.com>
Message-ID: <V01kxeeWm3UAJritt7R3PAFxS8SsVtxcttokr2Y-x84=.512a732e-de28-454a-bed6-ba9b2fb5979b@github.com>

On Thu, 11 Sep 2025 12:40:02 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.
>> 
>> ### Changeset
>> 
>> - Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
>> - Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
>> - Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
>> - Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
>> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>
> src/hotspot/share/opto/regmask.hpp line 78:
> 
>> 76:     // is something like 90+ parameters.
>> 77:     int       _RM_INT[RM_SIZE_IN_INTS];
>> 78:     uintptr_t _RM_WORD[_RM_SIZE_IN_WORDS];
> 
> Is there now still a reason to have `_` for the words and not for the ints?

Generally, we use `_` for fields, but not for constants.
Also: fields should be lower-case, so maybe `_RM_INT` -> `_rm_int`?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2340660689

From adinn at openjdk.org  Thu Sep 11 13:13:27 2025
From: adinn at openjdk.org (Andrew Dinn)
Date: Thu, 11 Sep 2025 13:13:27 GMT
Subject: RFR: 8367406: Simple refactoring
 AOTCodeAddressTable::id_for_address
In-Reply-To: <-15yudSPZOyKnpwNY9mTVKdXDu4hcjxqZxI2AXodi5Q=.9a3db977-a9d8-4061-8a04-39ce967eb550@github.com>
References: <-15yudSPZOyKnpwNY9mTVKdXDu4hcjxqZxI2AXodi5Q=.9a3db977-a9d8-4061-8a04-39ce967eb550@github.com>
Message-ID: <1TN7O_LAUKJXrMDQFCQWVbqK_z1EGwv19lIKZkIRA9U=.c7edd78b-a253-4284-afbc-a775e1983b0b@github.com>

On Thu, 11 Sep 2025 10:42:48 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Hi,
> Can you help to review this simple refactoring?
> 
> AOTCodeAddressTable::id_for_address currently is implemented in a way that introduce too many nested if/else, seems we could make the code more readable by removing these nested if/else. But it's quite subjective, so I'll let you tell if the patch is helpful.
> 
> Run tests (test/hotspot/jtreg/runtime/cds/appcds/aot*), no new failures on x64.
> 
> Thanks!

I'm not convinced this is making anything simpler. Also, it is diverging from the code we have in the Leyden repo which caters for further cases.

If this code does merit a cleanup (which I agree is the case), then that should really wait until we have
1. folded in cases currently catered for in Leyden premain that deal with translation of stub addresses
2. worked out a better way of managing addresses than the current use of several ad hoc bucket lists

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27217#issuecomment-3280585779

From chagedorn at openjdk.org  Thu Sep 11 13:20:58 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 11 Sep 2025 13:20:58 GMT
Subject: RFR: 8361702: C2: assert(is_dominator(compute_early_ctrl(limit,
 limit_ctrl), pre_end)) failed: node pinned on loop exit test? [v7]
In-Reply-To: <BfbDzTNQbwqUh0pXFbXiFy09JPGfbLPVmUTx-YEe1KM=.cad33852-809f-4907-a41d-628d0d0db07e@github.com>
References: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com>
 <BfbDzTNQbwqUh0pXFbXiFy09JPGfbLPVmUTx-YEe1KM=.cad33852-809f-4907-a41d-628d0d0db07e@github.com>
Message-ID: <tMQz65CLLwiEE2Yu9txXYgg-sIz9Aqt_ar5HI95pJ6A=.92c0c9bf-8168-4d92-922b-42ccedb2dcff@github.com>

On Tue, 9 Sep 2025 08:39:37 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> A node in a pre loop only has uses out of the loop dominated by the
>> loop exit. `PhaseIdealLoop::try_sink_out_of_loop()` sets its control
>> to the loop exit projection. A range check in the main loop has this
>> node as input (through a chain of some other nodes). Range check
>> elimination needs to update the exit condition of the pre loop with an
>> expression that depends on the node pinned on its exit: that's
>> impossible and the assert fires. This is a variant of 8314024 (this
>> one was for a node with uses out of the pre loop on multiple paths). I
>> propose the same fix: leave the node with control in the pre loop in
>> this case.
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8361702
>  - Merge branch 'master' into JDK-8361702
>  - review
>  - Merge branch 'master' into JDK-8361702
>  - Update src/hotspot/share/opto/loopopts.cpp
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE3.java
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - Update src/hotspot/share/opto/loopopts.cpp
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>  - tests
>  - ... and 1 more: https://git.openjdk.org/jdk/compare/3ba2cf5f...91a7d73c

Testing looked good!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26424#issuecomment-3280625737

From mli at openjdk.org  Thu Sep 11 13:26:07 2025
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 11 Sep 2025 13:26:07 GMT
Subject: RFR: 8367406: Simple refactoring
 AOTCodeAddressTable::id_for_address
In-Reply-To: <1TN7O_LAUKJXrMDQFCQWVbqK_z1EGwv19lIKZkIRA9U=.c7edd78b-a253-4284-afbc-a775e1983b0b@github.com>
References: <-15yudSPZOyKnpwNY9mTVKdXDu4hcjxqZxI2AXodi5Q=.9a3db977-a9d8-4061-8a04-39ce967eb550@github.com>
 <1TN7O_LAUKJXrMDQFCQWVbqK_z1EGwv19lIKZkIRA9U=.c7edd78b-a253-4284-afbc-a775e1983b0b@github.com>
Message-ID: <VDOnxGE4IQZ3hK_Xg8PTzA7qeBQfn0e0qAOf_ZlbE4w=.dad186c1-8ca8-411c-9e0f-af50199546a4@github.com>

On Thu, 11 Sep 2025 13:10:29 GMT, Andrew Dinn <adinn at openjdk.org> wrote:

> I'm not convinced this is making anything simpler. Also, it is diverging from the code we have in the Leyden repo which caters for further cases.
> 
> If this code does merit a cleanup (which I agree is the case) the that should really wait until we have
> 
> 1. folded in cases currently catered for in Leyden premain that deal with translation of stub addresses
> 2. worked out a better way of managing addresses than the current use of several ad hoc bucket lists

I see, thanks for the information.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27217#issuecomment-3280665372

From dlunden at openjdk.org  Thu Sep 11 13:43:26 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Thu, 11 Sep 2025 13:43:26 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp
In-Reply-To: <V01kxeeWm3UAJritt7R3PAFxS8SsVtxcttokr2Y-x84=.512a732e-de28-454a-bed6-ba9b2fb5979b@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <2unG-RdDR2e1mI-veaR3AdDGGs1q4XFdITrnQtBGOw8=.47f7d565-b722-434b-96ef-b51ed733b241@github.com>
 <V01kxeeWm3UAJritt7R3PAFxS8SsVtxcttokr2Y-x84=.512a732e-de28-454a-bed6-ba9b2fb5979b@github.com>
Message-ID: <1xDonJ67G3hUAWTdngutIb7LBboWxHRviCHXKDCSoN4=.2617f8e9-206b-424d-a1ab-501b182717bb@github.com>

On Thu, 11 Sep 2025 12:41:50 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/regmask.hpp line 78:
>> 
>>> 76:     // is something like 90+ parameters.
>>> 77:     int       _RM_INT[RM_SIZE_IN_INTS];
>>> 78:     uintptr_t _RM_WORD[_RM_SIZE_IN_WORDS];
>> 
>> Is there now still a reason to have `_` for the words and not for the ints?
>
> Generally, we use `_` for fields, but not for constants.
> Also: fields should be lower-case, so maybe `_RM_INT` -> `_rm_int`?

Thanks, I agree that it seems more consistent to use `_rm_int` and `_rm_word` instead. The missing leading underscore for `RM_SIZE_IN_INTS` highlights that it is a macro, unlike `_RM_SIZE_IN_WORDS`. Maybe this is just for historical reasons and not up to date with today's conventions? 

Do we classify constant static fields such as `_RM_SIZE_IN_WORDS` as constants or fields? I.e., do we use upper or lower case? I guess it would be `_rm_size_in_words` if considered a field and `RM_SIZE_IN_WORDS` (without the leading underscore) if considered a constant.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2340921637

From dlunden at openjdk.org  Thu Sep 11 14:01:43 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Thu, 11 Sep 2025 14:01:43 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v2]
In-Reply-To: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
Message-ID: <Dbraqc-jw-wVVDlFAkGZXH_69lxpJYlMjteJRBZx1gM=.3446b723-3eef-475b-b434-af2e815bd695@github.com>

> Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.
> 
> ### Changeset
> 
> - Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
> - Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
> - Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
> - Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
> 
> ### Testing
> 
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
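To make the rationale for renaming `is_AllStack` to `is_infinite` concrete, here is a minimal Java model of a register mask. It is hypothetical and is not HotSpot's C++ `RegMask`: the class name, the assumed size of 4 ints, and the method names are invented for illustration. The model stores a fixed number of 32-bit chunks plus one flag that stands for every slot numbered beyond those explicit bits. A stack slot that falls inside the explicit range therefore has its own bit and is unaffected by the flag, which is why "all-stack" is a misleading name for it.

```java
// Hypothetical Java model of a register mask; HotSpot's RegMask is C++ and differs.
final class RegMaskSketch {
    static final int RM_SIZE_IN_INTS = 4;   // assumed size, for illustration only
    static final int BITS_PER_INT = 32;
    static final int EXPLICIT_BITS = RM_SIZE_IN_INTS * BITS_PER_INT;

    private final int[] rmInt = new int[RM_SIZE_IN_INTS]; // int-sized (32-bit) chunks
    private boolean infinite; // stands for every slot numbered >= EXPLICIT_BITS

    void insert(int reg) {
        if (reg < 0 || reg >= EXPLICIT_BITS) {
            throw new IllegalArgumentException("not an explicit bit: " + reg);
        }
        rmInt[reg / BITS_PER_INT] |= 1 << (reg % BITS_PER_INT);
    }

    void setInfinite(boolean value) { infinite = value; }

    boolean isInfinite() { return infinite; }

    boolean member(int reg) {
        if (reg < EXPLICIT_BITS) {
            // Explicit bits (registers and low-numbered stack slots) are tracked individually.
            return (rmInt[reg / BITS_PER_INT] & (1 << (reg % BITS_PER_INT))) != 0;
        }
        // All slots past the explicit bits share the single infinite flag.
        return infinite;
    }

    // Number of machine words needed to hold the explicit bits for a given word size.
    static int sizeInWords(int wordBits) {
        return (EXPLICIT_BITS + wordBits - 1) / wordBits;
    }

    public static void main(String[] args) {
        RegMaskSketch mask = new RegMaskSketch();
        mask.setInfinite(true);
        System.out.println(mask.member(100));  // false: slot 100 is an explicit bit that was never set
        System.out.println(mask.member(1000)); // true: covered by the infinite flag
    }
}
```

For this assumed size, `sizeInWords(64)` is 2 and `sizeInWords(32)` is 4, which is the int-versus-word distinction that the `RM_SIZE_IN_INTS` and `_RM_SIZE_IN_WORDS` names make explicit.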

Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:

  Lowercase _RM_INT and _RM_WORD

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27215/files
  - new: https://git.openjdk.org/jdk/pull/27215/files/67381d34..61ff4f8c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27215&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27215&range=00-01

  Stats: 50 lines in 2 files changed: 0 ins; 0 del; 50 mod
  Patch: https://git.openjdk.org/jdk/pull/27215.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27215/head:pull/27215

PR: https://git.openjdk.org/jdk/pull/27215

From sparasa at openjdk.org  Thu Sep 11 16:28:21 2025
From: sparasa at openjdk.org (Srinivas Vamsi Parasa)
Date: Thu, 11 Sep 2025 16:28:21 GMT
Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2 [v5]
In-Reply-To: <CI6jzRQmAQXIQaToisYrZGI8tLzI14TOKLtPY2zrPWw=.55db890f-8f77-4ed7-8a9c-1d99ca9487ba@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
 <CI6jzRQmAQXIQaToisYrZGI8tLzI14TOKLtPY2zrPWw=.55db890f-8f77-4ed7-8a9c-1d99ca9487ba@github.com>
Message-ID: <JF-5csR-8C7x2ooGamkx5B1s1eY25ehxH0mc-ngL53k=.d6626a6c-6947-4b4d-a026-1e2956c2b216@github.com>

On Thu, 11 Sep 2025 00:45:45 GMT, Srinivas Vamsi Parasa <sparasa at openjdk.org> wrote:

>> This change extends Extended EVEX (EEVEX) to REX2/REX demotion for Intel APX NDD instructions to handle commutative operations when the destination register and the second source register (src2) are the same.
>> 
>> Currently, EEVEX to REX2/REX demotion is only enabled when the first source (src1) and the destination are the same. This enhancement allows additional cases of valid demotion for commutative instructions (add, imul, and, or, xor).
>> 
>> For example:
>> `eaddl r18, r25, r18` can be encoded as `addl r18, r25` using APX REX2 encoding
>> `eaddl r2, r7, r2` can be encoded as `addl r2, r7` using non-APX legacy encoding
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   undo new match rules for RegMemReg for commutative operations
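The demotion rule described above can be sketched as a small decision function. This is a hypothetical Java model, not the assembler code in the PR: the class and method names are invented, operands are plain register-name strings, mnemonics omit the size suffix, and the choice between REX2 and legacy encoding (which depends on whether extended registers such as r16-r31 are involved) is not modeled.

```java
import java.util.Set;

// Hypothetical model of the EEVEX-to-REX2/REX demotion decision for NDD instructions.
final class NddDemotion {
    // Commutative operations named in the PR description.
    static final Set<String> COMMUTATIVE = Set.of("add", "imul", "and", "or", "xor");

    /**
     * Returns the two-operand form "op dst, src" if the three-operand NDD form
     * "op dst, src1, src2" can be demoted, or null otherwise.
     */
    static String demote(String op, String dst, String src1, String src2) {
        if (dst.equals(src1)) {
            // Previously supported case: dst == src1, so "op dst, src2" computes the same value.
            return op + " " + dst + ", " + src2;
        }
        if (dst.equals(src2) && COMMUTATIVE.contains(op)) {
            // New case: the operands can be swapped, so "op dst, src1" is equivalent.
            return op + " " + dst + ", " + src1;
        }
        return null; // keep the EEVEX (NDD) encoding
    }

    public static void main(String[] args) {
        System.out.println(demote("add", "r18", "r25", "r18")); // add r18, r25
        System.out.println(demote("sub", "r18", "r25", "r18")); // null: sub is not commutative
    }
}
```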

Hi Emanuel (@eme64),

Could you please run the tests for this PR?

Thanks,
Vamsi

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26997#issuecomment-3281736456

From sviswanathan at openjdk.org  Thu Sep 11 17:02:40 2025
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Thu, 11 Sep 2025 17:02:40 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v11]
In-Reply-To: <TtiXR9rxmvSXMqM8PsGX6U7_Mg67yjXRw7fl-92oMxM=.e0f48991-d58f-4f0d-84be-59e521b832dc@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <TtiXR9rxmvSXMqM8PsGX6U7_Mg67yjXRw7fl-92oMxM=.e0f48991-d58f-4f0d-84be-59e521b832dc@github.com>
Message-ID: <POHmORZoc4s0RudsNKQQ6qoGrHc9IHtZCblewvgAF3U=.69a95520-3f32-456f-8080-fb51649bb0d2@github.com>

On Thu, 11 Sep 2025 02:19:54 GMT, Mohamed Issa <missa at openjdk.org> wrote:

>> Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
>> 
>> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped to values (NaN -> 0, -Infinity, <= Integer.MIN_VALUE -> Integer.MIN_VALUE, +Infinity, >= Integer.MAX_VALUE -> Integer.MAX_VALUE) in the integer range with the help of a few temporary registers to store intermediate results.
>> 
>> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
>> 
>> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
>> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
>> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
>> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
>> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
>> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
>> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
>> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
>> 9. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java`
>> 1...
>
> Mohamed Issa has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Check for instructions that shouldn't appear in vector floating point conversion tests
>  - Correctly calculate vector lengths and don't rely on VectorReinterpret in cast2F2X and cast2D2X memory instructions
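For context, Java's own `(int)` cast from `double` already has the saturating semantics described above (JLS §5.1.3), which is why pre-AVX10.2 code needs a fix-up after the raw truncation. The sketch below is a hypothetical Java model of that two-step approach, not the instruction sequence HotSpot actually emits: `rawTruncate` imitates the x86 behavior of returning the "integer indefinite" value 0x80000000 for NaN and out-of-range inputs, and `fixedUp` applies the special-value mapping. Both names are invented for illustration.

```java
// Hypothetical model of "truncate, then fix up special values" for double -> int.
final class SaturatingCastDemo {
    // Model of the raw CVTTSD2SI result: NaN and out-of-range inputs produce
    // the "integer indefinite" value 0x80000000 (Integer.MIN_VALUE).
    static int rawTruncate(double d) {
        if (Double.isNaN(d) || d >= 0x1p31 || d <= -0x1p31 - 1) {
            return Integer.MIN_VALUE;
        }
        return (int) d; // in range: plain truncation toward zero
    }

    // Extra handling layered on top of the raw result to obtain Java semantics.
    static int fixedUp(double d) {
        int raw = rawTruncate(d);
        if (raw != Integer.MIN_VALUE) {
            return raw;                      // common case: no special value involved
        }
        if (Double.isNaN(d)) {
            return 0;                        // NaN -> 0
        }
        return d > 0 ? Integer.MAX_VALUE : Integer.MIN_VALUE; // saturate by sign
    }

    public static void main(String[] args) {
        double[] samples = {Double.NaN, Double.POSITIVE_INFINITY, -3e10, 2.9, -2.9};
        for (double d : samples) {
            System.out.println(d + " -> raw " + rawTruncate(d)
                    + ", fixed " + fixedUp(d) + ", Java cast " + (int) d);
        }
    }
}
```

The fix-up has to run whenever the raw result equals `Integer.MIN_VALUE`, because a legitimate `Integer.MIN_VALUE` result and the indefinite value look identical; the saturating AVX10.2 instructions produce the final value directly.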

test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 496:

> 494:     public static final String CAST_F2X = PREFIX + "CAST_F2X" + POSTFIX;
> 495:     static {
> 496:         machOnlyNameRegex(CAST_F2X, "castF2X_reg_(av|eve)x");

This should be "castFtoX_reg_(av|eve)x".

test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 501:

> 499:     public static final String CAST_D2X = PREFIX + "CAST_D2X" + POSTFIX;
> 500:     static {
> 501:         machOnlyNameRegex(CAST_D2X, "castD2X_reg_(av|eve)x");

This should be "castDtoX_reg_(av|eve)x".

test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 506:

> 504:     public static final String CAST2_F2X = PREFIX + "CAST2_F2X" + POSTFIX;
> 505:     static {
> 506:         machOnlyNameRegex(CAST2_F2X, "cast2F2X_(reg|mem)_evex");

This should be "cast2FtoX_(reg|mem)_evex".

test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 511:

> 509:     public static final String CAST2_D2X = PREFIX + "CAST2_D2X" + POSTFIX;
> 510:     static {
> 511:         machOnlyNameRegex(CAST2_D2X, "cast2D2X_(reg|mem)_evex");

This should be "cast2DtoX_(reg|mem)_evex".
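The suggested regex changes matter because the IR framework matches these patterns against mach node names. A minimal sketch, assuming (as the review comments imply) that the x86.ad instruct names use the `FtoX`/`DtoX` spelling, and assuming full-string matching with `java.util.regex`; the framework's own matching logic may differ, and `IrNodeRegexCheck` is a hypothetical helper.

```java
import java.util.regex.Pattern;

// Hypothetical helper: checks a mach-node-name regex against a node name.
final class IrNodeRegexCheck {
    static boolean matches(String regex, String machNodeName) {
        return Pattern.compile(regex).matcher(machNodeName).matches();
    }

    public static void main(String[] args) {
        String name = "castFtoX_reg_evex"; // assumed instruct name, per the review comment
        System.out.println(matches("castF2X_reg_(av|eve)x", name));  // false: old spelling never matches
        System.out.println(matches("castFtoX_reg_(av|eve)x", name)); // true
    }
}
```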

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2341775174
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2341782092
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2341784055
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2341787294

From cslucas at openjdk.org  Thu Sep 11 17:09:35 2025
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Thu, 11 Sep 2025 17:09:35 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT [v3]
In-Reply-To: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
Message-ID: <9prBDcDkholUOVv1rNRDNQyrjCzn6FCESaQofSTwLN0=.c13a6b10-82ef-479b-b1d5-3102d7ea0165@github.com>

> Please review this patch, which fixes an issue that may occur when reducing allocation merges.
> 
> As the assert message describes, the problem is that a `Phi` considered reducible during one invocation of `adjust_scalar_replaceable_state` later turns out to be non-reducible. This can happen if a subsequent invocation of the same method causes all inputs of the Phi to be NSR (not scalar replaceable), in which case there is no point in reducing the Phi. It can also happen during the propagation of NSR state done by `find_scalar_replaceable_allocs`.
> 
> The change in `revisit_reducible_phi_status` is just a clean-up.
> The real fix is in `find_scalar_replaceable_allocs`.
> 
> Tested on Linux x64/Aarch64 release/fastdebug with JTREG tier1-3.

Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision:

  Update test/hotspot/jtreg/compiler/escapeAnalysis/TestReduceAllocationNotReducibleAnymore.java
  
  Co-authored-by: Roberto Castañeda Lozano <robcasloz at users.noreply.github.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27063/files
  - new: https://git.openjdk.org/jdk/pull/27063/files/17d5ab22..28d9432e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27063&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27063&range=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27063.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27063/head:pull/27063

PR: https://git.openjdk.org/jdk/pull/27063

From jbhateja at openjdk.org  Thu Sep 11 17:18:51 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 11 Sep 2025 17:18:51 GMT
Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2 [v5]
In-Reply-To: <CI6jzRQmAQXIQaToisYrZGI8tLzI14TOKLtPY2zrPWw=.55db890f-8f77-4ed7-8a9c-1d99ca9487ba@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
 <CI6jzRQmAQXIQaToisYrZGI8tLzI14TOKLtPY2zrPWw=.55db890f-8f77-4ed7-8a9c-1d99ca9487ba@github.com>
Message-ID: <F-O1FLFCjIwZ4Qx80nqFI9D3nCJBfjj7TFsL10sYxWo=.a77a5c9c-e7c4-4d95-8ed6-a319fdb8be13@github.com>

On Thu, 11 Sep 2025 00:45:45 GMT, Srinivas Vamsi Parasa <sparasa at openjdk.org> wrote:

>> This change extends Extended EVEX (EEVEX) to REX2/REX demotion for Intel APX NDD instructions to handle commutative operations when the destination register and the second source register (src2) are the same.
>> 
>> Currently, EEVEX to REX2/REX demotion is only enabled when the first source (src1) and the destination are the same. This enhancement allows additional cases of valid demotion for commutative instructions (add, imul, and, or, xor).
>> 
>> For example:
>> `eaddl r18, r25, r18` can be encoded as `addl r18, r25` using APX REX2 encoding
>> `eaddl r2, r7, r2` can be encoded as `addl r2, r7` using non-APX legacy encoding
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   undo new match rules for RegMemReg for commutative operations

Hi @vamsi-parasa , Thanks for addressing my comments.

-------------

Marked as reviewed by jbhateja (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26997#pullrequestreview-3212908454

From jbhateja at openjdk.org  Thu Sep 11 17:31:14 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 11 Sep 2025 17:31:14 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v11]
In-Reply-To: <TtiXR9rxmvSXMqM8PsGX6U7_Mg67yjXRw7fl-92oMxM=.e0f48991-d58f-4f0d-84be-59e521b832dc@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <TtiXR9rxmvSXMqM8PsGX6U7_Mg67yjXRw7fl-92oMxM=.e0f48991-d58f-4f0d-84be-59e521b832dc@github.com>
Message-ID: <8x67EDx2mHmRygqECi1m3BJ8kmBOpogaVvy-V_NnsUU=.f41f9f34-a559-491b-8d9d-8ae05a6890d3@github.com>

On Thu, 11 Sep 2025 02:19:54 GMT, Mohamed Issa <missa at openjdk.org> wrote:

>> Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
>> 
>> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped to values (NaN -> 0, -Infinity, <= Integer.MIN_VALUE -> Integer.MIN_VALUE, +Infinity, >= Integer.MAX_VALUE -> Integer.MAX_VALUE) in the integer range with the help of a few temporary registers to store intermediate results.
>> 
>> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
>> 
>> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
>> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
>> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
>> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
>> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
>> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
>> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
>> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
>> 9. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java`
>> 1...
>
> Mohamed Issa has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Check for instructions that shouldn't appear in vector floating point conversion tests
>  - Correctly calculate vector lengths and don't rely on VectorReinterpret in cast2F2X and cast2D2X memory instructions

src/hotspot/cpu/x86/x86.ad line 7719:

> 7717:             is_integral_type(Matcher::vector_element_basic_type(n)));
> 7718:   match(Set dst (VectorCastF2X src));
> 7719:   format %{ "vector_cast2r_f2x $dst, $src\t!" %}

Suggestion:

  format %{ "vector_cast_f2x_saturating $dst, $src\t!" %}

src/hotspot/cpu/x86/x86.ad line 7732:

> 7730:             is_integral_type(Matcher::vector_element_basic_type(n)));
> 7731:   match(Set dst (VectorCastF2X (LoadVector src)));
> 7732:   format %{ "vector_cast2m_f2x $dst, $src\t!" %}

Suggestion:

  format %{ "vector_cast_f2x_saturating $dst, $src\t!" %}

`src` will be represented by the appropriate addressing scheme for the memory operand.

src/hotspot/cpu/x86/x86.ad line 7793:

> 7791:             is_integral_type(Matcher::vector_element_basic_type(n)));
> 7792:   match(Set dst (VectorCastD2X src));
> 7793:   format %{ "vector_cast2r_d2x $dst, $src\t!" %}

Suggestion:

  format %{ "vector_cast_d2x_saturating $dst, $src\t!" %}

src/hotspot/cpu/x86/x86.ad line 7806:

> 7804:             is_integral_type(Matcher::vector_element_basic_type(n)));
> 7805:   match(Set dst (VectorCastD2X (LoadVector src)));
> 7806:   format %{ "vector_cast2m_d2x $dst, $src\t!" %}

Suggestion:

  format %{ "vector_cast_d2x_saturating $dst, $src\t!" %}

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2341851882
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2341859872
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2341861234
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2341861814

From hgreule at openjdk.org  Thu Sep 11 17:42:46 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Thu, 11 Sep 2025 17:42:46 GMT
Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v8]
In-Reply-To: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
Message-ID: <19JdaOkvM92QSjXvYVr1CNSXD5hkXINl1gh6qj-DCMQ=.6b268ebd-6c9a-4b33-b355-1dc41de53454@github.com>

> This change improves the precision of the `Mod(I|L)Node::Value()` functions.
> 
> I reordered the structure a bit. First, we handle constants; afterwards, we handle ranges. The bottom checks seem to be redundant (`Type::BOTTOM` is covered by using `isa_(int|long)()`, and the local bottom is just the full range). Since we can give reasonable bounds even if only one input has any bounds, we don't want to return early.
> The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions.
> 
> ### Monotonicity
> 
> Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. Using `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range).
> 
> ### Testing
> 
> I added tests for cases around the relevant bounds. I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something).
> 
> Please review and let me know what you think.
> 
> ### Other
> 
> The `UMod(I|L)Node`s were adjusted to be more in line with their signed variants. This change diverges them again, but similar improvements could be made after #17508.
> 
> While experimenting with these changes, I stumbled upon a few things that aren't directly related to this change but might be worth looking into further:
> - If the divisor is a constant, we directly replace the `Mod(I|L)Node` with more, but cheaper, nodes in `::Ideal()`. Type analysis over these combined nodes is less precise, which means we miss potential cases where this would help, e.g., removing range checks. Would it make sense to delay the replacement?
> - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd.
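The monotonicity argument above can be illustrated with integer ranges. This is a simplified, hypothetical Java model rather than C2's `TypeInt`: `IntRange.meet` returns the smallest range containing both inputs (how C2's meet widens integer ranges), `modByConstant` gives the range of `x % d` for an unknown `x` and a constant `d`, and a zero divisor maps to the single value 0, in the spirit of `Type(Int|Long)::ZERO`. During CCP a node's new type must contain its old one; starting from `[0, MAX]` that fails for `[-2, 2]`, while starting from `[0, 0]` it holds.

```java
// Simplified, hypothetical model of integer range types; not C2's TypeInt.
final class ModRangeSketch {
    record IntRange(long lo, long hi) {
        // Smallest range containing both inputs.
        IntRange meet(IntRange o) {
            return new IntRange(Math.min(lo, o.lo), Math.max(hi, o.hi));
        }

        boolean contains(IntRange o) {
            return lo <= o.lo && o.hi <= hi;
        }
    }

    // Range of x % d for unknown int x and constant divisor d (Java's % keeps the dividend's sign).
    // A zero divisor maps to the single value 0, analogous to Type(Int|Long)::ZERO.
    static IntRange modByConstant(int d) {
        if (d == 0) {
            return new IntRange(0, 0);
        }
        long m = Math.abs((long) d) - 1;
        return new IntRange(-m, m);
    }

    public static void main(String[] args) {
        IntRange pos = new IntRange(0, Integer.MAX_VALUE); // old result for a 0 divisor
        IntRange zero = modByConstant(0);                  // new result for a 0 divisor
        IntRange byThree = modByConstant(3);               // [-2, 2]
        // CCP requires the new type to contain the old one.
        System.out.println(byThree.contains(pos));  // false: monotonicity violated
        System.out.println(byThree.contains(zero)); // true
        System.out.println(pos.meet(byThree));      // IntRange[lo=-2, hi=2147483647]
    }
}
```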

Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:

  address comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25254/files
  - new: https://git.openjdk.org/jdk/pull/25254/files/5c74919a..41d0e2c7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25254&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25254&range=06-07

  Stats: 7 lines in 1 file changed: 0 ins; 3 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/25254.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25254/head:pull/25254

PR: https://git.openjdk.org/jdk/pull/25254

From hgreule at openjdk.org  Thu Sep 11 17:42:47 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Thu, 11 Sep 2025 17:42:47 GMT
Subject: RFR: 8356813: Improve Mod(I|L)Node::Value
In-Reply-To: <EOLqK3ulrKNtgzmlWbNpwvCdg8sBaABmXNGdlucIurI=.7ce09643-efd1-4e3f-91f1-6e8040f4a51f@github.com>
References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
 <EOLqK3ulrKNtgzmlWbNpwvCdg8sBaABmXNGdlucIurI=.7ce09643-efd1-4e3f-91f1-6e8040f4a51f@github.com>
Message-ID: <hScWI2VL-Cc2H-kQUfhd32fPCAkbXLHCUNh-2XZutsE=.2518bc77-83c4-4b85-af22-3230fe310130@github.com>

On Thu, 15 May 2025 17:47:16 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> This change improves the precision of the `Mod(I|L)Node::Value()` functions.
>> 
>> I reordered the structure a bit. First, we handle constants; afterwards, we handle ranges. The bottom checks seem to be redundant (`Type::BOTTOM` is covered by using `isa_(int|long)()`, and the local bottom is just the full range). Since we can give reasonable bounds even if only one input has any bounds, we don't want to return early.
>> The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions.
>> 
>> ### Monotonicity
>> 
>> Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. Using `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range).
>> 
>> ### Testing
>> 
>> I added tests for cases around the relevant bounds. I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something).
>> 
>> Please review and let me know what you think.
>> 
>> ### Other
>> 
>> The `UMod(I|L)Node`s were adjusted to be more in line with their signed variants. This change diverges them again, but similar improvements could be made after #17508.
>> 
>> While experimenting with these changes, I stumbled upon a few things that aren't directly related to this change but might be worth looking into further:
>> - If the divisor is a constant, we directly replace the `Mod(I|L)Node` with more, but cheaper, nodes in `::Ideal()`. Type analysis over these combined nodes is less precise, which means we miss potential cases where this would help, e.g., removing range checks. Would it make sense to delay the replacement?
>> - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd.
>
>> Using `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range).
> 
> Can we return `Type::TOP` instead?
> 
> Besides, #17508 should be merged right after the JDK 25 fork; do you want to wait for it first?

@merykitty thanks, I hopefully addressed your comments :)

@eme64 do you want to re-run the tests once again?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25254#issuecomment-3282030670

From vlivanov at openjdk.org  Thu Sep 11 18:13:33 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 11 Sep 2025 18:13:33 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v10]
In-Reply-To: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
Message-ID: <dBG9F5UPHM_kth5KLrKjXvA23-MLzeKiMYYhLJsxsmk=.c1589edc-7eb0-4d87-a956-383b0ece2aa1@github.com>

> This PR introduces C2 support for `Reference.reachabilityFence()`.
> 
> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected.
> 
> `Reference.reachabilityFence()` can be used in performance-critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through the compilation pipeline without negatively affecting generated code quality.
> 
> Also, I don't consider this fix complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is a CFG-only node, nothing explicitly forbids memory operations from floating past `Reference.reachabilityFence()` and potentially reaching other safepoints that the current analysis treats as non-interfering. Representing `ReachabilityFence` as a memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but the performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend the referent's live range beyond the `ReachabilityFence` nodes associated with it. That would meet the performance criteria, but I prefer to implement it as a follow-up fix.
> 
> Another known issue relates to reachability fences on constant oops. If such a constant is GCed (most likely due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately.
> 
> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667
> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints."
> 
> Testing:
> - [x] hs-tier1 - hs-tier8
> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations
> - [x] java/lang/foreign microbenchmarks
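For readers less familiar with the API being intrinsified, the following is a minimal sketch of the usage idiom from the `Reference.reachabilityFence` Javadoc, adapted to use `java.lang.ref.Cleaner` instead of `finalize()`. The class name and the `SLOTS` array, which stands in for native memory, are hypothetical. The fence keeps `this` reachable until the read finishes, so the cleaning action cannot clear the slot while `read()` is still using it.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;

// Hypothetical owner of a "native" slot that is released by a Cleaner.
final class NativeHandleOwner {
    private static final Cleaner CLEANER = Cleaner.create();
    // Stand-in for native memory: a slot table that the cleaning action zeroes.
    static final long[] SLOTS = new long[16];

    private final int slot;

    NativeHandleOwner(int slot, long value) {
        this.slot = slot;
        SLOTS[slot] = value;
        int s = slot; // capture only the index; capturing `this` would keep it alive forever
        CLEANER.register(this, () -> SLOTS[s] = 0L);
    }

    long read() {
        try {
            // Without the fence, once `slot` has been loaded nothing else uses `this`,
            // so the JIT may treat the object as unreachable and the cleaning action
            // could zero SLOTS[slot] before the array read happens.
            return SLOTS[slot];
        } finally {
            Reference.reachabilityFence(this);
        }
    }

    public static void main(String[] args) {
        NativeHandleOwner owner = new NativeHandleOwner(3, 42L);
        System.out.println(owner.read()); // 42
    }
}
```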

Vladimir Ivanov has updated the pull request incrementally with two additional commits since the last revision:

 - minor fixes
 - Fix guaranteed_safepoint usage

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25315/files
  - new: https://git.openjdk.org/jdk/pull/25315/files/6981bd18..267995ce

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=08-09

  Stats: 54 lines in 5 files changed: 33 ins; 14 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/25315.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315

PR: https://git.openjdk.org/jdk/pull/25315

From vlivanov at openjdk.org  Thu Sep 11 18:18:13 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 11 Sep 2025 18:18:13 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v11]
In-Reply-To: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
Message-ID: <1pShdyn-7-wwwiuY1DdMt5iiZ2qc9l_x2F-3AKqkg60=.dd260953-05cc-4b84-b6d1-7f684e74084c@github.com>

> This PR introduces C2 support for `Reference.reachabilityFence()`.
> 
> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected.
> 
> `Reference.reachabilityFence()` can be used in performance-critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through the compilation pipeline without negatively affecting generated code quality.
> 
> Also, I don't consider this fix complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is a CFG-only node, nothing explicitly forbids memory operations from floating past `Reference.reachabilityFence()` and potentially reaching other safepoints that the current analysis treats as non-interfering. Representing `ReachabilityFence` as a memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but the performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend the referent's live range beyond the `ReachabilityFence` nodes associated with it. That would meet the performance criteria, but I prefer to implement it as a follow-up fix.
> 
> Another known issue relates to reachability fences on constant oops. If such a constant is GCed (most likely due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately.
> 
> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667
> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints."
> 
> Testing:
> - [x] hs-tier1 - hs-tier8
> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations
> - [x] java/lang/foreign microbenchmarks

Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:

  Minor fix

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25315/files
  - new: https://git.openjdk.org/jdk/pull/25315/files/267995ce..01eaf64f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=09-10

  Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/25315.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315

PR: https://git.openjdk.org/jdk/pull/25315

From vlivanov at openjdk.org  Thu Sep 11 18:28:12 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 11 Sep 2025 18:28:12 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v2]
In-Reply-To: <sR_6Jh_j5oxAFBzXbSqwD2L3d74pP_-XnXM8RDWjUXA=.35937cef-adbe-418e-a0cd-3d13c4b145e5@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <0WKwHjzEn5dxYLkonrk4h9yfMI3r3bKDdqgG06J69N4=.e19e9441-6197-4d53-a4f4-b196a81f69d8@github.com>
 <Ctg8ZdOt0UVSzfJ18f-0Aj1DCLSGsy3pLQnm8SzcpUg=.9eb433cd-2e53-4a12-a24f-1c493a1a7869@github.com>
 <sR_6Jh_j5oxAFBzXbSqwD2L3d74pP_-XnXM8RDWjUXA=.35937cef-adbe-418e-a0cd-3d13c4b145e5@github.com>
Message-ID: <1FgOFS7aAlEbvVUez6iTfzgf2l7qUbL9C4wfSGmmfo0=.406c10f1-63d5-4333-af6d-525e46203182@github.com>

On Wed, 3 Sep 2025 08:30:47 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>>>> Representing ReachabilityFence as memory barrier (e.g., MemBarCPUOrder) would solve the issue, but performance costs are prohibitively high.
>> 
>>> How bad is it? MemBarCPUOrder pinches all memory, so I assume this breaks a lot of optimizations when RF is sitting in the hot loop? I remember we went through a similar exercise with Blackholes: [JDK-8296545](https://bugs.openjdk.org/browse/JDK-8296545), and decided to pinch only the control. I'm guessing this is not enough to fix RF, or is it?
>> 
>> Yes, if a barrier stays inside loop body, it breaks a lot of important optimizations. It may end up almost as bad as a full-blown call (except a barrier can be moved around while a call can't). And moving a node when it depends both on control and memory is more complicated than just a CFG node. Moreover, as you can see in the proposed solution, even CFG-only representation is problematic for loop opts, so additional care is needed to ensure RFs are moved out of loops. 
>> 
>> As an alternative approach, I thought about reifying RF as a data node (think of `CastPP`) and then linking its referent to all safepoints it dominates after loop opts are over. But that would only affect `optimize_reachability_fences()`. Everything else would stay the same. So I decided to stay with the CFG-only representation for now.
>
> @iwanowww Let me know whenever this is ready for review again?

@eme64 I think I addressed/answered all your suggestions/questions. Please take another look. Thanks!
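For readers less familiar with the API: `java.lang.ref.Reference.reachabilityFence(Object)` (Java 9+) ensures its argument is not reclaimable by garbage collection at least until after the call. The Java sketch below contrasts a fence placed inside a loop body (the hot-loop case discussed above) with a single fence after the loop. The class and method names are hypothetical, and an on-heap array stands in for the native resource a real owner would typically release through a `Cleaner`, so the sketch runs anywhere. It only shows source-level placement and says nothing about how C2 represents or moves the fence.

```java
import java.lang.ref.Reference;

// Hypothetical owner object. Real code would usually wrap a native resource
// that a Cleaner frees once the owner becomes unreachable.
final class FencedBuffer {
    private final long[] data;

    FencedBuffer(int n) { data = new long[n]; }

    void set(int i, long v) { data[i] = v; }

    long get(int i) { return data[i]; }
}

final class FenceInLoop {
    // Fence inside the loop body: one reachabilityFence per iteration.
    static long sumFencedPerIteration(FencedBuffer buf, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            try {
                sum += buf.get(i);
            } finally {
                Reference.reachabilityFence(buf);
            }
        }
        return sum;
    }

    // Single fence after the loop: buf stays reachable until the loop has finished.
    static long sumFencedOnce(FencedBuffer buf, int n) {
        try {
            long sum = 0;
            for (int i = 0; i < n; i++) {
                sum += buf.get(i);
            }
            return sum;
        } finally {
            Reference.reachabilityFence(buf);
        }
    }
}
```

Both methods compute the same result; they differ only in where the fence sits relative to the loop.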

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25315#issuecomment-3282162627

From vlivanov at openjdk.org  Thu Sep 11 18:28:14 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 11 Sep 2025 18:28:14 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3]
In-Reply-To: <L0LfhTmGAmKPwwFXYaibWSIA3rLaL8j1xL4OL4XkutY=.70bbb39a-5e86-4a80-952b-d3b98a2a4a36@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <hcmJkTFFU9oEe5LE1Q5974w0KM5Pe6SLi0uKLAxU7rM=.70a3f0d7-eb32-44b0-b819-1c4db6273976@github.com>
 <UKkT1Wqi4ftj3eGF2KzT8saeWoWSBTXx5kw0FOiJyLE=.c10dbf15-3348-495b-b9aa-556b78bc1e0b@github.com>
 <ejAu9M0FYELqOdzDW8uankmdRt0w8bloAwcxWcyx5k0=.9a47c6c4-e9df-40f5-aba9-23073a12bd17@github.com>
 <L0LfhTmGAmKPwwFXYaibWSIA3rLaL8j1xL4OL4XkutY=.70bbb39a-5e86-4a80-952b-d3b98a2a4a36@github.com>
Message-ID: <CwgjtkeUgu6FVHvMFSeXZ3BOrQaeZU8ZaUsp3ptgXfI=.59baa9b4-aa55-4242-8a7f-44cd08caf990@github.com>

On Mon, 8 Sep 2025 12:55:36 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Well, it's a SafePointNode class after all. I lifted it from the `CallNode` subclass to avoid an elaborate check on SafePoint nodes (`!is_Call() || (as_Call() && guaranteed_safepoint())`).
>> 
>> If some node extends SafePointNode, but doesn't keep JVM state, it has to communicate it to users one way or another. And changing the default doesn't improve the situation IMO: reporting a safepoint node as a non-safepoint is still a bug.
>
> Hmm. The way it is formulated, it sounds more like:
> - `true` -> we are guaranteed that it is a safepoint.
> - `false` -> it may or may not be a safepoint; no guarantees.
>
> Am I understanding this right?
> 
> If yes, then it would make more sense to have a default that is `no guarantee`. But maybe that makes things more complicated in other ways. All I'm saying is that it makes me nervous ;)

You are right. I studied the code and `guaranteed_safepoint()` behaves as you described. It doesn't work for RF purposes, so I migrated the code to a `sfpt->jvms() != nullptr` check and fixed a bug along the way. The changes related to `guaranteed_safepoint()` have been reverted.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2341997278

From sviswanathan at openjdk.org  Thu Sep 11 21:02:24 2025
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Thu, 11 Sep 2025 21:02:24 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v11]
In-Reply-To: <TtiXR9rxmvSXMqM8PsGX6U7_Mg67yjXRw7fl-92oMxM=.e0f48991-d58f-4f0d-84be-59e521b832dc@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <TtiXR9rxmvSXMqM8PsGX6U7_Mg67yjXRw7fl-92oMxM=.e0f48991-d58f-4f0d-84be-59e521b832dc@github.com>
Message-ID: <JdF1SElMVjuCvk2UusfQZt5WSHQmnpXv5zmVajHjEDQ=.a15a9160-0347-44c8-b815-492ac2690476@github.com>

On Thu, 11 Sep 2025 02:19:54 GMT, Mohamed Issa <missa at openjdk.org> wrote:

>> Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
>> 
>> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) instruction is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs. double precision), the destination (long, integer, short, or byte), the level of parallelization (scalar vs. vector), and the supported AVX extension. Essentially, though, the special values are mapped to values in the integer range (NaN -> 0; -Infinity and values <= Integer.MIN_VALUE -> Integer.MIN_VALUE; +Infinity and values >= Integer.MAX_VALUE -> Integer.MAX_VALUE) with the help of a few temporary registers that store intermediate results.
>> 
>> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
>> 
>> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
>> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
>> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
>> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
>> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
>> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
>> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
>> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
>> 9. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java`
>> 1...
>
> Mohamed Issa has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Check for instructions that shouldn't appear in vector floating point conversion tests
>  - Correctly calculate vector lengths and don't rely on VectorReinterpret in cast2F2X and cast2D2X memory instructions

src/hotspot/cpu/x86/x86.ad line 7715:

> 7713: %}
> 7714: 
> 7715: instruct cast2FtoX_reg_evex(vec dst, vec src) %{

Could be named as castFtoX_reg_avx10.

src/hotspot/cpu/x86/x86.ad line 7728:

> 7726: %}
> 7727: 
> 7728: instruct cast2FtoX_mem_evex(vec dst, memory src) %{

Could be named as castFtoX_mem_avx10.

src/hotspot/cpu/x86/x86.ad line 7789:

> 7787: %}
> 7788: 
> 7789: instruct cast2DtoX_reg_evex(vec dst, vec src) %{

Could be named as castDtoX_reg_avx10.

src/hotspot/cpu/x86/x86.ad line 7802:

> 7800: %}
> 7801: 
> 7802: instruct cast2DtoX_mem_evex(vec dst, memory src) %{

Could be named as castDtoX_mem_avx10.

src/hotspot/cpu/x86/x86_64.ad line 11728:

> 11726: %}
> 11727: 
> 11728: instruct conv2F2I_reg_reg(rRegI dst, regF src)

Could be named as convF2I_reg_reg_avx10.

src/hotspot/cpu/x86/x86_64.ad line 11739:

> 11737: %}
> 11738: 
> 11739: instruct conv2F2I_reg_mem(rRegI dst, memory src)

Could be named as convF2I_reg_mem_avx10.

src/hotspot/cpu/x86/x86_64.ad line 11762:

> 11760: %}
> 11761: 
> 11762: instruct conv2F2L_reg_reg(rRegL dst, regF src)

Could be named as convF2L_reg_reg_avx10

src/hotspot/cpu/x86/x86_64.ad line 11773:

> 11771: %}
> 11772: 
> 11773: instruct conv2F2L_reg_mem(rRegL dst, memory src)

Could be named as convF2L_reg_mem_avx10

src/hotspot/cpu/x86/x86_64.ad line 11796:

> 11794: %}
> 11795: 
> 11796: instruct conv2D2I_reg_reg(rRegI dst, regD src)

Could be named as convD2I_reg_reg_avx10.

src/hotspot/cpu/x86/x86_64.ad line 11807:

> 11805: %}
> 11806: 
> 11807: instruct conv2D2I_reg_mem(rRegI dst, memory src)

Could be named as convD2I_reg_mem_avx10.

src/hotspot/cpu/x86/x86_64.ad line 11830:

> 11828: %}
> 11829: 
> 11830: instruct conv2D2L_reg_reg(rRegL dst, regD src)

Could be named as convD2L_reg_reg_avx10.

src/hotspot/cpu/x86/x86_64.ad line 11841:

> 11839: %}
> 11840: 
> 11841: instruct conv2D2L_reg_mem(rRegL dst, memory src)

Could be named as convD2L_reg_mem_avx10.

test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 494:

> 492:     }
> 493: 
> 494:     public static final String CAST_F2X = PREFIX + "CAST_F2X" + POSTFIX;

Maybe we can rename CAST_F2X to X86_VCAST_F2X and CAST2_F2X to X86_VCAST_F2X_AVX10.
Then we can use a similar naming scheme for the other names below as well.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342294949
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342295729
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342298160
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342300778
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342302012
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342304990
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342305755
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342306364
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342307288
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342308076
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342309412
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342310391
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342326871

From sviswanathan at openjdk.org  Thu Sep 11 21:02:26 2025
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Thu, 11 Sep 2025 21:02:26 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v11]
In-Reply-To: <JdF1SElMVjuCvk2UusfQZt5WSHQmnpXv5zmVajHjEDQ=.a15a9160-0347-44c8-b815-492ac2690476@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <TtiXR9rxmvSXMqM8PsGX6U7_Mg67yjXRw7fl-92oMxM=.e0f48991-d58f-4f0d-84be-59e521b832dc@github.com>
 <JdF1SElMVjuCvk2UusfQZt5WSHQmnpXv5zmVajHjEDQ=.a15a9160-0347-44c8-b815-492ac2690476@github.com>
Message-ID: <FHmMvnWk8PthQKVHgNFxsipPXSbvqwFKkk8ETk-bujg=.d3c49857-415c-44e1-853f-5be85161a41c@github.com>

On Thu, 11 Sep 2025 20:50:12 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Mohamed Issa has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Check for instructions that shouldn't appear in vector floating point conversion tests
>>  - Correctly calculate vector lengths and don't rely on VectorReinterpret in cast2F2X and cast2D2X memory instructions
>
> src/hotspot/cpu/x86/x86_64.ad line 11841:
> 
>> 11839: %}
>> 11840: 
>> 11841: instruct conv2D2L_reg_mem(rRegL dst, memory src)
> 
> Could be named as convD2L_reg_mem_avx10.

IRNode.java will need name regex changes accordingly.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342314412

From missa at openjdk.org  Thu Sep 11 23:10:44 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Thu, 11 Sep 2025 23:10:44 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v12]
In-Reply-To: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
Message-ID: <t2javAPv4fqPJOS4or2dIL2lU1jcI6F_Dk88kPEJ2KE=.c9607d43-3d92-437c-8d6d-73558b55b0dd@github.com>

> Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
> 
> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) instruction is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs. double precision), the destination (long, integer, short, or byte), the level of parallelization (scalar vs. vector), and the supported AVX extension. Essentially, though, the special values are mapped to values in the integer range (NaN -> 0; -Infinity and values <= Integer.MIN_VALUE -> Integer.MIN_VALUE; +Infinity and values >= Integer.MAX_VALUE -> Integer.MAX_VALUE) with the help of a few temporary registers that store intermediate results.
> 
> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
> 
> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
> 9. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java`
> 10. `jtreg:test/hotspot/jtreg/com...
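The special-value handling described above can be modeled in a few lines of Java. The sketch assumes the documented behavior of the legacy CVTTSS2SI instruction, which returns the "integer indefinite" value 0x80000000 for NaN and out-of-range inputs. `truncNoSaturate` is a hypothetical model of that instruction, not HotSpot code, and `javaF2I` applies a fix-up so that the result matches Java's `(int)` cast (JLS 5.1.3). It covers only the scalar float -> int case; per the PR description, the AVX10.2 saturating instructions make this kind of fix-up unnecessary.

```java
// Hypothetical model, not HotSpot code: float -> int conversion done with a
// non-saturating truncating convert followed by a fix-up.
final class F2ISketch {
    // x86 "integer indefinite" result for a 32-bit destination.
    static final int INDEFINITE = Integer.MIN_VALUE; // 0x80000000

    // Models a CVTTSS2SI-style convert: NaN and out-of-range inputs yield INDEFINITE.
    static int truncNoSaturate(float f) {
        if (Float.isNaN(f) || f >= 0x1p31f || f < -0x1p31f) {
            return INDEFINITE;
        }
        return (int) f; // in range: truncate toward zero
    }

    // Fix-up that turns the raw result into Java's (int) cast semantics.
    static int javaF2I(float f) {
        int r = truncNoSaturate(f);
        if (r != INDEFINITE) {
            return r;                  // common case: no special value involved
        }
        if (Float.isNaN(f)) {
            return 0;                  // NaN -> 0
        }
        // Either a genuine -2^31 or an out-of-range value: saturate by sign.
        return f > 0 ? Integer.MAX_VALUE : Integer.MIN_VALUE;
    }
}
```

The fast path only needs a single compare against the indefinite value; the extra work happens only when that value shows up, which is the branch the saturating instructions avoid.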

Mohamed Issa has updated the pull request incrementally with two additional commits since the last revision:

 - Change the floating point conversion instruction, IR nodes, and test rules to make them clearer
 - Change debug text format of AVX 10.2 vector conversion instructions

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26919/files
  - new: https://git.openjdk.org/jdk/pull/26919/files/8587952d..df175756

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=11
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=10-11

  Stats: 180 lines in 7 files changed: 60 ins; 60 del; 60 mod
  Patch: https://git.openjdk.org/jdk/pull/26919.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26919/head:pull/26919

PR: https://git.openjdk.org/jdk/pull/26919

From missa at openjdk.org  Thu Sep 11 23:10:54 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Thu, 11 Sep 2025 23:10:54 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v11]
In-Reply-To: <JdF1SElMVjuCvk2UusfQZt5WSHQmnpXv5zmVajHjEDQ=.a15a9160-0347-44c8-b815-492ac2690476@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <TtiXR9rxmvSXMqM8PsGX6U7_Mg67yjXRw7fl-92oMxM=.e0f48991-d58f-4f0d-84be-59e521b832dc@github.com>
 <JdF1SElMVjuCvk2UusfQZt5WSHQmnpXv5zmVajHjEDQ=.a15a9160-0347-44c8-b815-492ac2690476@github.com>
Message-ID: <0wYVVPSsr5S3QTcGPkM0dmXLwJq_ff1yOZCOpqlyMMo=.c0585691-bf34-4da8-9dda-3d7bd2c9339f@github.com>

On Thu, 11 Sep 2025 20:42:16 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Mohamed Issa has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Check for instructions that shouldn't appear in vector floating point conversion tests
>>  - Correctly calculate vector lengths and don't rely on VectorReinterpret in cast2F2X and cast2D2X memory instructions
>
> src/hotspot/cpu/x86/x86.ad line 7715:
> 
>> 7713: %}
>> 7714: 
>> 7715: instruct cast2FtoX_reg_evex(vec dst, vec src) %{
> 
> Could be named as castFtoX_reg_avx10.

Renamed

> src/hotspot/cpu/x86/x86.ad line 7728:
> 
>> 7726: %}
>> 7727: 
>> 7728: instruct cast2FtoX_mem_evex(vec dst, memory src) %{
> 
> Could be named as castFtoX_mem_avx10.

Renamed

> src/hotspot/cpu/x86/x86.ad line 7789:
> 
>> 7787: %}
>> 7788: 
>> 7789: instruct cast2DtoX_reg_evex(vec dst, vec src) %{
> 
> Could be named as castDtoX_reg_avx10.

Renamed

> src/hotspot/cpu/x86/x86.ad line 7802:
> 
>> 7800: %}
>> 7801: 
>> 7802: instruct cast2DtoX_mem_evex(vec dst, memory src) %{
> 
> Could be named as castDtoX_mem_avx10.

Renamed

> src/hotspot/cpu/x86/x86_64.ad line 11728:
> 
>> 11726: %}
>> 11727: 
>> 11728: instruct conv2F2I_reg_reg(rRegI dst, regF src)
> 
> Could be named as convF2I_reg_reg_avx10.

Renamed

> src/hotspot/cpu/x86/x86_64.ad line 11739:
> 
>> 11737: %}
>> 11738: 
>> 11739: instruct conv2F2I_reg_mem(rRegI dst, memory src)
> 
> Could be named as convF2I_reg_mem_avx10.

Renamed

> src/hotspot/cpu/x86/x86_64.ad line 11762:
> 
>> 11760: %}
>> 11761: 
>> 11762: instruct conv2F2L_reg_reg(rRegL dst, regF src)
> 
> Could be named as convF2L_reg_reg_avx10

Renamed

> src/hotspot/cpu/x86/x86_64.ad line 11773:
> 
>> 11771: %}
>> 11772: 
>> 11773: instruct conv2F2L_reg_mem(rRegL dst, memory src)
> 
> Could be named as convF2L_reg_mem_avx10

Renamed

> src/hotspot/cpu/x86/x86_64.ad line 11796:
> 
>> 11794: %}
>> 11795: 
>> 11796: instruct conv2D2I_reg_reg(rRegI dst, regD src)
> 
> Could be named as convD2I_reg_reg_avx10.

Renamed

> src/hotspot/cpu/x86/x86_64.ad line 11807:
> 
>> 11805: %}
>> 11806: 
>> 11807: instruct conv2D2I_reg_mem(rRegI dst, memory src)
> 
> Could be named as convD2I_reg_mem_avx10.

Renamed

> src/hotspot/cpu/x86/x86_64.ad line 11830:
> 
>> 11828: %}
>> 11829: 
>> 11830: instruct conv2D2L_reg_reg(rRegL dst, regD src)
> 
> Could be named as convD2L_reg_reg_avx10.

Renamed

> test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 494:
> 
>> 492:     }
>> 493: 
>> 494:     public static final String CAST_F2X = PREFIX + "CAST_F2X" + POSTFIX;
> 
> Maybe we can rename CAST_F2X to X86_VCAST_F2X and CAST2_F2X to X86_VCAST_F2X_AVX10.
> Then we can use a similar naming scheme for the other names below as well.

Renamed

> test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 496:
> 
>> 494:     public static final String CAST_F2X = PREFIX + "CAST_F2X" + POSTFIX;
>> 495:     static {
>> 496:         machOnlyNameRegex(CAST_F2X, "castF2X_reg_(av|eve)x");
> 
> This should be "castFtoX_reg_(av|eve)x".

Fixed

> test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 501:
> 
>> 499:     public static final String CAST_D2X = PREFIX + "CAST_D2X" + POSTFIX;
>> 500:     static {
>> 501:         machOnlyNameRegex(CAST_D2X, "castD2X_reg_(av|eve)x");
> 
> This should be "castDtoX_reg_(av|eve)x".

Fixed

> test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 506:
> 
>> 504:     public static final String CAST2_F2X = PREFIX + "CAST2_F2X" + POSTFIX;
>> 505:     static {
>> 506:         machOnlyNameRegex(CAST2_F2X, "cast2F2X_(reg|mem)_evex");
> 
> This should be "cast2FtoX_(reg|mem)_evex"

Fixed

> test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 511:
> 
>> 509:     public static final String CAST2_D2X = PREFIX + "CAST2_D2X" + POSTFIX;
>> 510:     static {
>> 511:         machOnlyNameRegex(CAST2_D2X, "cast2D2X_(reg|mem)_evex");
> 
> This should be "cast2DtoX_(reg|mem)_evex".

Fixed
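A quick way to sanity-check corrected name regexes like the ones above is to run them against the instruct names with `java.util.regex`. The helper below is hypothetical and not part of the IR framework. The thread does not show whether `machOnlyNameRegex` patterns are matched against the whole node name or as a substring, so both modes are included. Under substring matching, `castFtoX_reg_(av|eve)x` would also hit a name such as `castFtoX_reg_avx10`, which may be worth keeping in mind alongside the `_avx10` renames.

```java
import java.util.regex.Pattern;

// Hypothetical helper for checking IR-node name regexes against instruct names.
final class MachNameRegexCheck {
    // Whole-string match: the entire instruct name must match the regex.
    static boolean matchesWhole(String regex, String name) {
        return Pattern.matches(regex, name);
    }

    // Substring match: the regex may match anywhere inside the name.
    static boolean matchesPart(String regex, String name) {
        return Pattern.compile(regex).matcher(name).find();
    }
}
```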

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342544829
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342545092
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342545345
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342545547
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342545846
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342546207
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342546448
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342546884
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342547335
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342547642
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342547891
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342548440
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342543063
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342543272
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342543522
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342543785

From missa at openjdk.org  Thu Sep 11 23:11:00 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Thu, 11 Sep 2025 23:11:00 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v11]
In-Reply-To: <FHmMvnWk8PthQKVHgNFxsipPXSbvqwFKkk8ETk-bujg=.d3c49857-415c-44e1-853f-5be85161a41c@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <TtiXR9rxmvSXMqM8PsGX6U7_Mg67yjXRw7fl-92oMxM=.e0f48991-d58f-4f0d-84be-59e521b832dc@github.com>
 <JdF1SElMVjuCvk2UusfQZt5WSHQmnpXv5zmVajHjEDQ=.a15a9160-0347-44c8-b815-492ac2690476@github.com>
 <FHmMvnWk8PthQKVHgNFxsipPXSbvqwFKkk8ETk-bujg=.d3c49857-415c-44e1-853f-5be85161a41c@github.com>
Message-ID: <ZVGPnvfa52A05pHj3KdwoMK0SHkHArU1w24tVgjh2pg=.6d2f288d-a1f8-4b01-9814-7e5d86d41cff@github.com>

On Thu, 11 Sep 2025 20:51:47 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> src/hotspot/cpu/x86/x86_64.ad line 11841:
>> 
>>> 11839: %}
>>> 11840: 
>>> 11841: instruct conv2D2L_reg_mem(rRegL dst, memory src)
>> 
>> Could be named as convD2L_reg_mem_avx10.
>
> IRNode.java will need name regex changes accordingly.

Renamed

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342548095

From missa at openjdk.org  Thu Sep 11 23:10:58 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Thu, 11 Sep 2025 23:10:58 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v11]
In-Reply-To: <8x67EDx2mHmRygqECi1m3BJ8kmBOpogaVvy-V_NnsUU=.f41f9f34-a559-491b-8d9d-8ae05a6890d3@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <TtiXR9rxmvSXMqM8PsGX6U7_Mg67yjXRw7fl-92oMxM=.e0f48991-d58f-4f0d-84be-59e521b832dc@github.com>
 <8x67EDx2mHmRygqECi1m3BJ8kmBOpogaVvy-V_NnsUU=.f41f9f34-a559-491b-8d9d-8ae05a6890d3@github.com>
Message-ID: <Q6ybs4-IUtJayC0RIq1DLjknQE7xZqDe_DG2Uow67tU=.0b45cd7b-7bdc-4d77-976f-d1d68356ef2d@github.com>

On Thu, 11 Sep 2025 17:20:29 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Mohamed Issa has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Check for instructions that shouldn't appear in vector floating point conversion tests
>>  - Correctly calculate vector lengths and don't rely on VectorReinterpret in cast2F2X and cast2D2X memory instructions
>
> src/hotspot/cpu/x86/x86.ad line 7719:
> 
>> 7717:             is_integral_type(Matcher::vector_element_basic_type(n)));
>> 7718:   match(Set dst (VectorCastF2X src));
>> 7719:   format %{ "vector_cast2r_f2x $dst, $src\t!" %}
> 
> Suggestion:
> 
>   format %{ "vector_cast_f2x_saturating $dst, $src\t!" %}

Updated

> src/hotspot/cpu/x86/x86.ad line 7732:
> 
>> 7730:             is_integral_type(Matcher::vector_element_basic_type(n)));
>> 7731:   match(Set dst (VectorCastF2X (LoadVector src)));
>> 7732:   format %{ "vector_cast2m_f2x $dst, $src\t!" %}
> 
> Suggestion:
> 
>   format %{ "vector_cast_f2x_saturating $dst, $src\t!" %}
> 
> `$src` will be represented by the appropriate addressing scheme for the memory operand.

Updated

> src/hotspot/cpu/x86/x86.ad line 7793:
> 
>> 7791:             is_integral_type(Matcher::vector_element_basic_type(n)));
>> 7792:   match(Set dst (VectorCastD2X src));
>> 7793:   format %{ "vector_cast2r_d2x $dst, $src\t!" %}
> 
> Suggestion:
> 
>   format %{ "vector_cast_d2x_saturating $dst, $src\t!" %}

Updated

> src/hotspot/cpu/x86/x86.ad line 7806:
> 
>> 7804:             is_integral_type(Matcher::vector_element_basic_type(n)));
>> 7805:   match(Set dst (VectorCastD2X (LoadVector src)));
>> 7806:   format %{ "vector_cast2m_d2x $dst, $src\t!" %}
> 
> Suggestion:
> 
>   format %{ "vector_cast_d2x_saturating $dst, $src\t!" %}

Updated

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342543956
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342544232
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342544450
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342544575

From dlong at openjdk.org  Thu Sep 11 23:50:22 2025
From: dlong at openjdk.org (Dean Long)
Date: Thu, 11 Sep 2025 23:50:22 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v2]
In-Reply-To: <Dbraqc-jw-wVVDlFAkGZXH_69lxpJYlMjteJRBZx1gM=.3446b723-3eef-475b-b434-af2e815bd695@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <Dbraqc-jw-wVVDlFAkGZXH_69lxpJYlMjteJRBZx1gM=.3446b723-3eef-475b-b434-af2e815bd695@github.com>
Message-ID: <V9oqcN4DtpsHHax36QmUsRqz_KWQR-nXcEavdbNxpys=.0ca41f4e-999b-487c-8780-7ddfdcfa4d38@github.com>

On Thu, 11 Sep 2025 14:01:43 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.
>> 
>> ### Changeset
>> 
>> - Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
>> - Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
>> - Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
>> - Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
>> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Lowercase _RM_INT and _RM_WORD

src/hotspot/share/opto/chaitin.cpp line 1580:

> 1578:     _ifg->re_insert(lidx);
> 1579:     if( !lrg->alive() ) continue;
> 1580:     // capture allstackedness flag before mask is hacked

allstackedness --> infiniteness?
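To make the terminology concrete, here is a minimal Java model (hypothetical, not HotSpot's `RegMask` layout): a finite array of bits plus one flag that stands for every bit past the stored range. Stack slots are just indices here, and some of them (slot 100 in the usage below, an arbitrary choice) live in the finite part. Setting the flag does not set those finite stack bits, which is why "infinite" describes the flag more accurately than "all-stack".

```java
// Hypothetical, simplified model of a register mask with an "infinite" tail.
final class InfiniteMaskSketch {
    private final long[] words;   // explicitly stored (finite) bits
    private boolean infinite;     // value of every bit beyond the stored range

    InfiniteMaskSketch(int sizeInWords) {
        words = new long[sizeInWords];
    }

    int finiteBits() { return words.length * Long.SIZE; }

    void setInfinite(boolean value) { infinite = value; }

    boolean isInfinite() { return infinite; }

    void insert(int bit) {
        if (bit < 0 || bit >= finiteBits()) {
            throw new IllegalArgumentException("bit outside finite range: " + bit);
        }
        words[bit / Long.SIZE] |= 1L << (bit % Long.SIZE);
    }

    boolean member(int bit) {
        if (bit < 0) {
            throw new IllegalArgumentException("negative bit: " + bit);
        }
        if (bit >= finiteBits()) {
            return infinite;      // the infinite tail
        }
        return (words[bit / Long.SIZE] & (1L << (bit % Long.SIZE))) != 0;
    }
}
```

In this model, a stack slot that falls inside the finite part (such as 101 in the test) keeps its own bit and is unaffected by the flag.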

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2342594775

From dlong at openjdk.org  Thu Sep 11 23:50:23 2025
From: dlong at openjdk.org (Dean Long)
Date: Thu, 11 Sep 2025 23:50:23 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v2]
In-Reply-To: <1xDonJ67G3hUAWTdngutIb7LBboWxHRviCHXKDCSoN4=.2617f8e9-206b-424d-a1ab-501b182717bb@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <2unG-RdDR2e1mI-veaR3AdDGGs1q4XFdITrnQtBGOw8=.47f7d565-b722-434b-96ef-b51ed733b241@github.com>
 <V01kxeeWm3UAJritt7R3PAFxS8SsVtxcttokr2Y-x84=.512a732e-de28-454a-bed6-ba9b2fb5979b@github.com>
 <1xDonJ67G3hUAWTdngutIb7LBboWxHRviCHXKDCSoN4=.2617f8e9-206b-424d-a1ab-501b182717bb@github.com>
Message-ID: <JnEOgQliunSIdlAAUkgdvSDzJqjCwKOSn4pwVW3ZD2Q=.7a7a0f01-bc7b-4026-a42d-d62be7d24e7b@github.com>

On Thu, 11 Sep 2025 13:40:35 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> Generally, we use `_` for fields, but not for constants.
>> Also: fields should be lower-case, so maybe `_RM_INT` -> `_rm_int`?
>
> Thanks, I agree that it seems more consistent to use `_rm_int` and `_rm_word` instead. The missing leading underscore for `RM_SIZE_IN_INTS` highlights that it is a macro, unlike `_RM_SIZE_IN_WORDS`. Maybe this is just for historical reasons and not up to date with today's conventions? 
> 
> Do we classify constant static fields such as `_RM_SIZE_IN_WORDS` as constants or fields? I.e., do we use upper or lower case? I guess it would be `_rm_size_in_words` if considered a field and `RM_SIZE_IN_WORDS` (without the leading underscore) if considered a constant.

I vote for `RM_SIZE_IN_WORDS` because it is a constant, the same as if it was a value from an enum.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2342592026

From dlong at openjdk.org  Fri Sep 12 00:12:19 2025
From: dlong at openjdk.org (Dean Long)
Date: Fri, 12 Sep 2025 00:12:19 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v2]
In-Reply-To: <Dbraqc-jw-wVVDlFAkGZXH_69lxpJYlMjteJRBZx1gM=.3446b723-3eef-475b-b434-af2e815bd695@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <Dbraqc-jw-wVVDlFAkGZXH_69lxpJYlMjteJRBZx1gM=.3446b723-3eef-475b-b434-af2e815bd695@github.com>
Message-ID: <WEjhW8W7zNdMKusm1NMRY3-vgNEa_ssBm0hdcLp2_eM=.fd6329a9-98e5-4282-b655-29de105cea8c@github.com>

On Thu, 11 Sep 2025 14:01:43 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.
>> 
>> ### Changeset
>> 
>> - Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
>> - Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
>> - Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
>> - Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
>> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Lowercase _RM_INT and _RM_WORD

src/hotspot/share/opto/regmask.hpp line 66:

> 64: 
> 65:   static const unsigned int _WordBitMask = BitsPerWord - 1U;
> 66:   static const unsigned int _LogWordBits = LogBitsPerWord;

What about just replacing all uses of _LogWordBits with LogBitsPerWord?
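Constants such as `_WordBitMask` and `_LogWordBits` typically serve shift-and-mask indexing into a word array: the shift amount is log2 of the word size, which is exactly what `LogBitsPerWord` already provides. The Java sketch below is a hypothetical model (class and method names are ours, and a 64-bit word is assumed), not HotSpot's `RegMask`.

```java
// Hypothetical model, not HotSpot's RegMask: a bit set backed by 64-bit words.
// LOG_BITS_PER_WORD plays the role of LogBitsPerWord (or _LogWordBits) and
// WORD_BIT_MASK the role of _WordBitMask; a 64-bit platform is assumed.
final class WordMaskSketch {
    static final int BITS_PER_WORD = 64;
    static final int LOG_BITS_PER_WORD = 6;                 // log2(BITS_PER_WORD)
    static final int WORD_BIT_MASK = BITS_PER_WORD - 1;     // low 6 bits set

    private final long[] words;                             // length corresponds to _RM_SIZE_IN_WORDS

    WordMaskSketch(int sizeInWords) {
        words = new long[sizeInWords];
    }

    // Which word holds bit 'reg': reg / 64, computed as a shift.
    static int wordIndex(int reg) {
        return reg >>> LOG_BITS_PER_WORD;
    }

    // Position of bit 'reg' inside that word: reg % 64, computed as a mask.
    static int bitInWord(int reg) {
        return reg & WORD_BIT_MASK;
    }

    void insert(int reg) {
        words[wordIndex(reg)] |= 1L << bitInWord(reg);
    }

    boolean member(int reg) {
        return (words[wordIndex(reg)] & (1L << bitInWord(reg))) != 0;
    }
}
```

With the shift derived directly from the word size, a separate `_LogWordBits` alias adds no information.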

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2342617346

From dlong at openjdk.org  Fri Sep 12 00:21:19 2025
From: dlong at openjdk.org (Dean Long)
Date: Fri, 12 Sep 2025 00:21:19 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v2]
In-Reply-To: <Dbraqc-jw-wVVDlFAkGZXH_69lxpJYlMjteJRBZx1gM=.3446b723-3eef-475b-b434-af2e815bd695@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <Dbraqc-jw-wVVDlFAkGZXH_69lxpJYlMjteJRBZx1gM=.3446b723-3eef-475b-b434-af2e815bd695@github.com>
Message-ID: <dihhaMNeMsnjzbZqg33g-nt8W-AlgH6gFhPKGl1yKfs=.8e24a156-b88d-4454-a63b-b2a060174cb6@github.com>

On Thu, 11 Sep 2025 14:01:43 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.
>> 
>> ### Changeset
>> 
>> - Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
>> - Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
>> - Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
>> - Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
>> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Lowercase _RM_INT and _RM_WORD

src/hotspot/share/opto/regmask.hpp line 166:

> 164:   // indefinitely with ONE bits.  Returns TRUE if mask is infinite or
> 165:   // unbounded in size.  Returns FALSE if mask is finite size.
> 166:   bool is_infinite() const {

"infinite" hides the fact that these unbounded bits are stack bits and not register bits, but `is_UnboundedStack` or `is_InfiniteStack` might be too verbose.  How does `is_InfStack` sound?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2342628324
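To make the "infinite" terminology concrete: the header comment describes a mask that is filled indefinitely with ONE bits past its finite part. The Java model below is illustrative only; its names are hypothetical, and it stores the unbounded tail as a separate flag, which may not be how `RegMask` encodes it. Bit indices beyond the finite array stand in for the unbounded stack slots mentioned in the review comment.

```java
// Hypothetical model, not HotSpot's RegMask: a finite array of 64-bit words
// plus a flag meaning "every bit past the array is ONE". Here the flag is a
// separate field; the real RegMask may encode it differently.
final class InfiniteTailMask {
    private final long[] words;
    private boolean infinite;

    InfiniteTailMask(int sizeInWords) {
        words = new long[sizeInWords];
    }

    int finiteBits() {
        return words.length * Long.SIZE;
    }

    // Marks (or unmarks) the unbounded tail as all ONE bits.
    void setInfinite(boolean value) {
        infinite = value;
    }

    // Counterpart of is_infinite(): true if the mask is unbounded in size.
    boolean isInfinite() {
        return infinite;
    }

    void insert(int bit) {
        if (bit >= finiteBits()) {
            throw new IllegalArgumentException("bit " + bit + " lies in the unbounded tail");
        }
        words[bit >>> 6] |= 1L << (bit & 63);
    }

    // Bits in the finite part are looked up individually; bits past it
    // (the unbounded stack slots) are all ONE or all ZERO together.
    boolean member(int bit) {
        if (bit >= finiteBits()) {
            return infinite;
        }
        return (words[bit >>> 6] & (1L << (bit & 63))) != 0;
    }
}
```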

From sviswanathan at openjdk.org  Fri Sep 12 00:24:20 2025
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 12 Sep 2025 00:24:20 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v12]
In-Reply-To: <t2javAPv4fqPJOS4or2dIL2lU1jcI6F_Dk88kPEJ2KE=.c9607d43-3d92-437c-8d6d-73558b55b0dd@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <t2javAPv4fqPJOS4or2dIL2lU1jcI6F_Dk88kPEJ2KE=.c9607d43-3d92-437c-8d6d-73558b55b0dd@github.com>
Message-ID: <r7BF7aD9Fdk0lipCH8Z0UBddG3buXXIa3SsA3smDNvc=.b5e36dc9-4d07-4b8d-abd4-7d449842e85b@github.com>

On Thu, 11 Sep 2025 23:10:44 GMT, Mohamed Issa <missa at openjdk.org> wrote:

>> Intel&reg; AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
>> 
>> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped to values (NaN -> 0, -Infinity, <= Integer.MIN_VALUE -> Integer.MIN_VALUE, +Infinity, >= Integer.MAX_VALUE -> Integer.MAX_VALUE) in the integer range with the help of a few temporary registers to store intermediate results.
>> 
>> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or  VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
>> 
>> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
>> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
>> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
>> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
>> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
>> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
>> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
>> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
>> 9. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java`
>> 1...
>
> Mohamed Issa has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Change the floating point conversion instruction, IR nodes, and test rules to make them clearer
>  - Change debug text format of AVX 10.2 vector conversion instructions
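The special-value mapping in the description above can be spelled out as a small Java sketch for the float-to-int case only (long, short, and byte destinations follow the same pattern). It is not the x86 code; it only states the target semantics, which coincide with Java's own narrowing cast (JLS §5.1.3), so the test compares against `(int) f`. On the pre-AVX10.2 path, CVTTSS2SI returns the "integer indefinite" value 0x80000000 for NaN and out-of-range inputs, which is why fix-up instructions are needed to reach these results.

```java
// Sketch of the saturating float -> int semantics (not the HotSpot code):
// NaN -> 0, at or below Integer.MIN_VALUE -> MIN_VALUE,
// at or above Integer.MAX_VALUE -> MAX_VALUE, otherwise truncate toward zero.
final class SaturatingF2I {
    static int convert(float f) {
        if (Float.isNaN(f)) {
            return 0;
        }
        if (f <= (float) Integer.MIN_VALUE) {   // -2^31 is exactly representable as a float
            return Integer.MIN_VALUE;
        }
        if (f >= (float) Integer.MAX_VALUE) {   // (float) MAX_VALUE rounds up to 2^31
            return Integer.MAX_VALUE;
        }
        return (int) f;                          // in range: plain truncation toward zero
    }
}
```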

src/hotspot/cpu/x86/x86.ad line 7669:

> 7667:   predicate(!VM_Version::supports_avx10_2() &&
> 7668:             !VM_Version::supports_avx512vl() &&
> 7669:             Matcher::vector_length_in_bytes(n->in(1)) < 64 &&

Good to add "is_integral_type(Matcher::vector_element_basic_type(n)) &&" here.

test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java line 26:

> 24: /**
> 25: * @test
> 26: * @bug 8287835 8320347

Did you mean 8364305 here?

test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java line 364:

> 362:         applyIfCPUFeatureAnd = {"avx2", "true", "avx10_2", "false"})
> 363:     @IR(counts = {IRNode.X86_VCAST_F2X_AVX10, "> 0"},
> 364:         applyIfCPUFeature = {"avx10_2", "true"})

Need to add the following for X86_VCAST_F2X as well as X86_VCAST_F2X_AVX10.
applyIfOr = {"AlignVector", "false", "UseCompactObjectHeaders", "false"},

test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java line 387:

> 385:         applyIfCPUFeatureAnd = {"avx2", "true", "avx10_2", "false"})
> 386:     @IR(counts = {IRNode.X86_VCAST_F2X_AVX10, "> 0"},
> 387:         applyIfCPUFeature = {"avx10_2", "true"})

Need to add the following for X86_VCAST_F2X as well as X86_VCAST_F2X_AVX10.
applyIfOr = {"AlignVector", "false", "UseCompactObjectHeaders", "false"},

test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java line 413:

> 411:         applyIfCPUFeatureAnd = {"avx", "true", "avx10_2", "false"})
> 412:     @IR(counts = {IRNode.X86_VCAST_D2X_AVX10, "> 0"},
> 413:         applyIfCPUFeature = {"avx10_2", "true"})

Need to add the following for X86_VCAST_D2X and X86_VCAST_D2X_AVX10:
applyIf = {"MaxVectorSize", ">=16"},

test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java line 432:

> 430:         applyIfCPUFeatureAnd = {"avx", "true", "avx10_2", "false"})
> 431:     @IR(counts = {IRNode.X86_VCAST_D2X_AVX10, "> 0"},
> 432:         applyIfCPUFeature = {"avx10_2", "true"})

Need to add the following for X86_VCAST_D2X and X86_VCAST_D2X_AVX10:
applyIf = {"MaxVectorSize", ">=16"},

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342571300
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342620816
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342615073
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342615727
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342618205
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2342618455

From dlong at openjdk.org  Fri Sep 12 01:04:21 2025
From: dlong at openjdk.org (Dean Long)
Date: Fri, 12 Sep 2025 01:04:21 GMT
Subject: RFR: 8327963: C2: fix construction of memory graph around
 Initialize node to prevent incorrect execution if allocation is removed [v12]
In-Reply-To: <S83l4ZBZILBlhRpSoVORBxzNNB9bF5xN-QhRWc16_T4=.6d5c1624-5d77-4b3e-9db0-0e01e6b1b36b@github.com>
References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
 <S83l4ZBZILBlhRpSoVORBxzNNB9bF5xN-QhRWc16_T4=.6d5c1624-5d77-4b3e-9db0-0e01e6b1b36b@github.com>
Message-ID: <BuKfkAAcusJ6TNHSHtVaYYcmjnAVTIInXbhd4Z5Fg5w=.067f6b09-67e0-4b97-9753-c727c67343ca@github.com>

On Tue, 9 Sep 2025 11:27:50 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> An `Initialize` node for an `Allocate` node is created with a memory
>> `Proj` of adr type raw memory. In order for stores to be captured, the
>> memory state out of the allocation is a `MergeMem` with slices for the
>> various object fields/array element set to the raw memory `Proj` of
>> the `Initialize` node. If `Phi`s need to be created during later
>> transformations from this memory state, the `Phi` for a particular
>> slice gets its adr type from the type of the `Proj` which is raw
>> memory. If during macro expansion, the `Allocate` is found to have no
>> use and so can be removed, the `Proj` out of the `Initialize` is
>> replaced by the memory state on input to the `Allocate`. A `Phi` for
>> some slice for a field of an object will end up with the raw memory
>> state on input to the `Allocate` node. As a result, memory state at
>> the `Phi` is incorrect and incorrect execution can happen.
>> 
>> The fix I propose is, rather than have a single `Proj` for the memory
>> state out of the `Initialize` with adr type raw memory, to use one
>> `Proj` per slice added to the memory state after the `Initialize`. Each
>> of the `Proj` should return the right adr type for its slice. For that
>> I propose having a new type of `Proj`: `NarrowMemProj` that captures
>> the right adr type.
>> 
>> Logic for the construction of the `Allocate`/`Initialize` subgraph is
>> tweaked so the right adr type captured in its own `NarrowMemProj` is
>> added to the memory subgraph. Code that removes an allocation or moves
>> it also has to be changed so it correctly takes the multiple memory
>> projections out of the `Initialize` node into account.
>> 
>> One tricky issue is that when EA splits types for a scalar replaceable
>> `Allocate` node:
>> 
>> 1- the adr type captured in the `NarrowMemProj` becomes out of sync
>>   with the type of the slices for the allocation
>>   
>> 2- before EA, the memory state for one particular field out of the
>>   `Initialize` node can be used for a `Store` to the just allocated
>>   object or some other. So we can have a chain of `Store`s, some to
>>   the newly allocated object, some to some other objects, all of them
>>   using the state of `NarrowMemProj` out of the `Initialize`. After
>>   split unique types, the `NarrowMemProj` is for the slice of a
>>   particular allocation. So `Store`s to some other objects shouldn't
>>   use that memory state but the memory state before the `Allocate`.
>>   
>> For that, I added logic to update the adr type of `NarrowMemProj`
>> during split uni...
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 45 commits:
> 
>  - more
>  - Merge branch 'master' into JDK-8327963
>  - more
>  - more
>  - Merge branch 'master' into JDK-8327963
>  - more
>  - more
>  - lambda return
>  - lambda clean up
>  - Merge branch 'master' into JDK-8327963
>  - ... and 35 more: https://git.openjdk.org/jdk/compare/e16c5100...b701d03e

src/hotspot/share/opto/loopTransform.cpp line 3992:

> 3990:   Node* frame = new ParmNode(C->start(), TypeFunc::FramePtr);
> 3991:   _igvn.register_new_node_with_optimizer(frame);
> 3992:   call->init_req(TypeFunc::FramePtr,  frame);

This seems unrelated.  Is it needed?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2342681526

From fyang at openjdk.org  Fri Sep 12 01:49:16 2025
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 12 Sep 2025 01:49:16 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v6]
In-Reply-To: <rYU7bMi_JCAFJKjZdTciL8TTbwA_bomSs3EJzEORWRs=.53465c22-1450-4eae-a4e9-2c94e210d652@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <rYU7bMi_JCAFJKjZdTciL8TTbwA_bomSs3EJzEORWRs=.53465c22-1450-4eae-a4e9-2c94e210d652@github.com>
Message-ID: <XrddmzTqWtRBH2x8-z6GMGJPlplj0mgRuM93iT5_dLQ=.14cadbc1-c4fa-451f-815a-5b5447fad33a@github.com>

On Thu, 11 Sep 2025 09:19:59 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Hey, please consider!
>> 
>> A bunch of info in JBS entry, please read that also.
>> 
>> I narrowed this issue down to the old jal optimization, making direct calls when in reach.
>> This patch restores them and removes this regression.
>> 
>> In essence we turn "jalr ra,0(t1)" into a "jal ra,<dest>" if reachable, and restore the jalr if a new destination is not reachable.
>> 
>> Please test on your hardware!
>> 
>> 
>> Chi Square (100 runs each, 10 fastest iterations of each run, P550)
>> JDK-23 (last version with trampoline calls)
>> Mean: 3189.5827
>> Standard Deviation: 284.6478
>> 
>> JDK-25
>> Mean: 3424.8905
>> Standard Deviation: 222.2208
>> 
>> Patch:
>> Mean: 3144.8535
>> Standard Deviation: 229.2577
>> 
>> 
>> No issues found in t1, running t2 also. Stress tested on vf2, bpi-f3, p550.
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision:
> 
>  - Review fix
>  - Merge branch 'master' into 8365926
>  - Merge branch 'master' into 8365926
>  - Review comments
>  - Review comments
>  - Merge branch 'master' into 8365926
>  - Spelling
>  - Merge branch 'master' into 8365926
>  - draft jal<->jalr
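The jal/jalr switch hinges on whether the destination is within JAL's reach. A RISC-V JAL instruction encodes a signed 21-bit PC-relative offset whose lowest bit is always zero, so it covers roughly ±1 MiB around the call site. The Java sketch below models only that range check and the resulting choice; the class and helper names are hypothetical, and the actual patching (plus any alignment or code-cache constraints in the patch) is not modeled.

```java
// Sketch of the reachability test behind choosing jal over jalr (not HotSpot code).
// JAL's immediate is a signed 21-bit, 2-byte-aligned PC-relative offset,
// so reachable offsets lie in [-2^20, 2^20 - 2] bytes.
final class JalReach {
    static final long JAL_MIN_OFFSET = -(1L << 20);
    static final long JAL_MAX_OFFSET = (1L << 20) - 2;

    static boolean isJalReachable(long callSite, long target) {
        long offset = target - callSite;
        return (offset & 1) == 0 && offset >= JAL_MIN_OFFSET && offset <= JAL_MAX_OFFSET;
    }

    // Hypothetical helper: a direct "jal" when in reach, otherwise the
    // indirect "jalr" through a register loaded with the full address.
    static String chooseCallForm(long callSite, long target) {
        return isJalReachable(callSite, target) ? "jal" : "jalr";
    }
}
```

Re-running the same check whenever a call site is re-patched is what allows falling back to `jalr` if a new destination is out of reach.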

Still good to me.

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26944#pullrequestreview-3214325209

From wenanjian at openjdk.org  Fri Sep 12 03:15:27 2025
From: wenanjian at openjdk.org (Anjian Wen)
Date: Fri, 12 Sep 2025 03:15:27 GMT
Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v6]
In-Reply-To: <mMLCDK4Ls37Ebs_3LEM4dktuylNraobBebijN1BxZwM=.30730d56-b8ca-46c9-9dc2-1ca215e66ba4@github.com>
References: <mMLCDK4Ls37Ebs_3LEM4dktuylNraobBebijN1BxZwM=.30730d56-b8ca-46c9-9dc2-1ca215e66ba4@github.com>
Message-ID: <8wqGgE5DEY1mQm5SP3g0Y_LEn8q9ptTtbjY5MEQOCHE=.10c6adb9-8dcd-4f10-bf5b-bd5d0be4f053@github.com>

> Hi everyone, please help review this patch, which implements the `_counterMode_AESCrypt` intrinsic with Zvkned. On my QEMU, with the Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed.

Anjian Wen has updated the pull request incrementally with one additional commit since the last revision:

  fix the counter increase at limit and add test
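The commit message does not say which limit was affected. As a hypothetical illustration of what a CTR counter increment has to handle at a byte boundary, the sketch below treats the 16-byte counter block as one big-endian number, which is assumed here to match the JDK's Java `CounterMode` behavior: a carry ripples through every byte that wraps from 0xFF to 0x00, and an all-0xFF block wraps to zero.

```java
// Hypothetical sketch (not the RISC-V stub): increment a big-endian counter
// block by one, propagating the carry across byte boundaries.
final class CtrCounter {
    static void increment(byte[] counter) {
        for (int i = counter.length - 1; i >= 0; i--) {
            counter[i]++;
            if (counter[i] != 0) {
                return;   // this byte did not wrap, so no carry propagates further
            }
            // the byte wrapped from 0xFF to 0x00: carry into the next more significant byte
        }
        // every byte wrapped: the counter rolled over to all zeros
    }
}
```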

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25281/files
  - new: https://git.openjdk.org/jdk/pull/25281/files/6bd22c4e..0769db02

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=04-05

  Stats: 37 lines in 2 files changed: 29 ins; 1 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/25281.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281

PR: https://git.openjdk.org/jdk/pull/25281

From wenanjian at openjdk.org  Fri Sep 12 03:40:59 2025
From: wenanjian at openjdk.org (Anjian Wen)
Date: Fri, 12 Sep 2025 03:40:59 GMT
Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v7]
In-Reply-To: <mMLCDK4Ls37Ebs_3LEM4dktuylNraobBebijN1BxZwM=.30730d56-b8ca-46c9-9dc2-1ca215e66ba4@github.com>
References: <mMLCDK4Ls37Ebs_3LEM4dktuylNraobBebijN1BxZwM=.30730d56-b8ca-46c9-9dc2-1ca215e66ba4@github.com>
Message-ID: <CK9saRbHrBxaXya098IIqpafnO3lI90UJ1ryPwuXP14=.0ed5e960-675a-4808-a96a-eae2c4f09e07@github.com>

> Hi everyone, please help review this patch, which implements the `_counterMode_AESCrypt` intrinsic with Zvkned. On my QEMU, with the Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed.

Anjian Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:

 - Merge branch 'openjdk:master' into aes_ctr
 - fix the counter increase at limit and add test
 - change format
 - update reg use and instruction
 - change some name and format
 - delete useless Label, change L_judge_used to L_slow_loop
 - add Flags and fix the stubid name
 - RISC-V: implement AES-CTR mode intrinsics

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25281/files
  - new: https://git.openjdk.org/jdk/pull/25281/files/0769db02..ff513708

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=05-06

  Stats: 82462 lines in 2415 files changed: 49550 ins; 22013 del; 10899 mod
  Patch: https://git.openjdk.org/jdk/pull/25281.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281

PR: https://git.openjdk.org/jdk/pull/25281

From epeter at openjdk.org  Fri Sep 12 05:52:12 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 05:52:12 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v2]
In-Reply-To: <WEjhW8W7zNdMKusm1NMRY3-vgNEa_ssBm0hdcLp2_eM=.fd6329a9-98e5-4282-b655-29de105cea8c@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <Dbraqc-jw-wVVDlFAkGZXH_69lxpJYlMjteJRBZx1gM=.3446b723-3eef-475b-b434-af2e815bd695@github.com>
 <WEjhW8W7zNdMKusm1NMRY3-vgNEa_ssBm0hdcLp2_eM=.fd6329a9-98e5-4282-b655-29de105cea8c@github.com>
Message-ID: <Fr0jsNv9me1djItMgWmPIem2FO02xMz7SgRtKQS1Xks=.5d4efbea-ee61-4e5f-bb69-7341d1a12fb0@github.com>

On Fri, 12 Sep 2025 00:08:06 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Lowercase _RM_INT and _RM_WORD
>
> src/hotspot/share/opto/regmask.hpp line 66:
> 
>> 64: 
>> 65:   static const unsigned int _WordBitMask = BitsPerWord - 1U;
>> 66:   static const unsigned int _LogWordBits = LogBitsPerWord;
> 
> What about just replacing all uses of _LogWordBits with LogBitsPerWord?

Yes, that would be a good step in the right direction.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2343046545

From rehn at openjdk.org  Fri Sep 12 06:12:23 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Fri, 12 Sep 2025 06:12:23 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v6]
In-Reply-To: <XrddmzTqWtRBH2x8-z6GMGJPlplj0mgRuM93iT5_dLQ=.14cadbc1-c4fa-451f-815a-5b5447fad33a@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <rYU7bMi_JCAFJKjZdTciL8TTbwA_bomSs3EJzEORWRs=.53465c22-1450-4eae-a4e9-2c94e210d652@github.com>
 <XrddmzTqWtRBH2x8-z6GMGJPlplj0mgRuM93iT5_dLQ=.14cadbc1-c4fa-451f-815a-5b5447fad33a@github.com>
Message-ID: <DOLU_XE25wdobOhe6JjHGiMvh9NSQP4oAVElytvZquA=.3fab8b6b-9563-42d7-9ed2-082d62234f49@github.com>

On Fri, 12 Sep 2025 01:46:57 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision:
>> 
>>  - Review fix
>>  - Merge branch 'master' into 8365926
>>  - Merge branch 'master' into 8365926
>>  - Review comments
>>  - Review comments
>>  - Merge branch 'master' into 8365926
>>  - Spelling
>>  - Merge branch 'master' into 8365926
>>  - draft jal<->jalr
>
> Still good to me.

Thanks  @RealFYang, @Hamlin-Li !

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26944#issuecomment-3283845949

From bmaillard at openjdk.org  Fri Sep 12 07:25:06 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Fri, 12 Sep 2025 07:25:06 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop
Message-ID: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>

This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`.

### Context

The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn, I was able to reduce the reproducer to something very simple. Once the following method was compiled by C2, executing it caused the last statement (`x = 0`) to be ignored:


    static public void test() {
        x = 0;
        for (int i = 0; i < 20000; i++) {
            x += i;
        }
        x = 0;
    }
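A minimal stand-alone harness around the reduced reproducer might look like the following. It is a hypothetical sketch, not the jtreg test added by this PR: the class name and iteration count are illustrative, and whether C2 actually compiles `test()` (and the bug reproduces) depends on the VM version and flags. On an affected build, the check after each call would be expected to fail once the compiled code drops the final `x = 0`.

```java
// Hypothetical harness around the reduced reproducer (not the PR's jtreg test).
final class StoreSinkRepro {
    static int x;

    static void test() {
        x = 0;
        for (int i = 0; i < 20000; i++) {
            x += i;
        }
        x = 0;
    }

    // Calls test() repeatedly and checks that the final store is visible.
    static void run(int iterations) {
        for (int iter = 0; iter < iterations; iter++) {
            test();
            if (x != 0) {
                throw new RuntimeException("wrong result after iteration " + iter + ": x = " + x);
            }
        }
    }

    public static void main(String[] args) {
        run(10_000);
        System.out.println("ok");
    }
}
```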


After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long-running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. `Store` nodes are moved out of the inner loop when encountering this pattern, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations taking place later. This was initially addressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here as macro expansion happens too late.

This PR aims at addressing the specific case of the wrong wiring of `Store` nodes in _post_ loops, but on the longer term further investigations into the missing `Phi` node issue are necessary, as they are likely to cause other issues (cf. related JBS issues).

### Detailed Analysis

In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`.  When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later.

This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect.

This is what the IR looks like after the creation of the post loop in our reproducer:

<img width="1720" height="1908" alt="image" src="https://github.com/user-attachments/assets/ae074f45-4239-4664-99dd-ec247af39da5" />

In the screenshot, node `118 StoreI` takes `24 StoreI` directly as its memory input, even though `96 CountedLoopEnd` (to which `73 NodeI` is attached) is clearly a predecessor of `114 CountedLoopEnd` in the CFG.

After that, we observe a succession of IGVN optimizations that eventually lead to the generation of wrong code:
- The `IfFalse` projection of `128 If` becomes dead, as the _post_ loop is always executed (the number of iterations is known)
- `121 Region` and `123 Phi` are subsequently eliminated (as a result of the dead path)
- Because the `Phi` disappeared, `118 StoreI` becomes the memory input of `89 StoreI`
- `118 StoreI` is eliminated because it is directly followed by a write at the same memory location
- `89 StoreI` is replaced by `24 StoreI` by an `Identity` optimization because it stores the same value at the same location
 
Node `89 StoreI` corresponds to the last `x = 0` assignment, and its elimination directly causes the wrong result (the store node from the `OuterStripMinedLoop` remains, as it is used by the safepoint).

### Proposed Fix

As mentioned previously, the impact of the missing `Phi` nodes needs to be investigated further, as it is likely that this causes other bugs in the compilation process. This is a "local fix" for the specific issue of `Store` nodes moved out of the inner loop.

The approach here is to do the wiring directly in `PhaseIdealLoop::insert_post_loop`, right after having done the usual rewiring based on the `Phi` nodes. As the conditions for moving `Store` nodes out of the loop are quite restrictive, the pattern is predictable: `Store` nodes are attached to the `false` projection of the inner `CountedLoopEnd`, right before the safepoint in the CFG.

In the simplest case, the memory input of the new version of the store node is outside the loop body. In the cloned node, we change it to point to its original version instead (as the original store is always executed before).

It may also be that the memory input of the new node points to another memory node in the loop body. This can happen in the case where we have:


    for (int i = 0; i < 20000; i++) {
        a1.field += i;
        a2.field += i;
    }


Here, the second store has the first one as memory input, as `a1` and `a2` may be aliases. In this case, we only need to change the memory input of the first store in the chain, and it needs to point to the last memory node in the chain in the original version of the loop.
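The rewiring rule for sunk stores can be illustrated with a toy model. In the Java sketch below, which is hypothetical and does not use C2's `Node` API, each store only records the memory state it consumes. Within a chain of sunk stores the cloned stores already point at each other, so only the head of the cloned chain is redirected to the last store of the original chain; the single-store case is a chain of length one, where the clone simply points at its original.

```java
// Toy model (not C2's Node API): each store records the memory state it consumes.
final class MemRewireSketch {
    static final class Store {
        final String name;
        Store memIn;   // memory input; null stands for "some state outside the loop body"

        Store(String name, Store memIn) {
            this.name = name;
            this.memIn = memIn;
        }
    }

    // origChain / clonedChain: stores sunk after the inner loop, in program order.
    // Later stores in each chain already consume earlier ones, so only the head
    // of the cloned chain is rewired to the tail of the original chain.
    static void rewire(Store[] origChain, Store[] clonedChain) {
        clonedChain[0].memIn = origChain[origChain.length - 1];
    }
}
```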

### Testing
- [x] [GitHub Actions](https://github.com/benoitmaillard/jdk/actions?query=branch%3AJDK-8364757)
- [x] tier1-4, plus some internal testing

Thank you for reviewing!

-------------

Commit messages:
 - Fix bad test headers after remaining
 - Fix trailing whitespace
 - Add jtreg tests
 - Fix logic after failing TestStoresSunkInOuterStripMinedLoop
 - 8364757: First attempt at fixing the store node issue

Changes: https://git.openjdk.org/jdk/pull/27225/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27225&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8364757
  Stats: 141 lines in 3 files changed: 141 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/27225.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27225/head:pull/27225

PR: https://git.openjdk.org/jdk/pull/27225

From roland at openjdk.org  Fri Sep 12 07:27:02 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 12 Sep 2025 07:27:02 GMT
Subject: RFR: 8361702: C2: assert(is_dominator(compute_early_ctrl(limit,
 limit_ctrl), pre_end)) failed: node pinned on loop exit test? [v8]
In-Reply-To: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com>
References: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com>
Message-ID: <4Yzeo6gJlk-Jq5zlh3P9HPCm57-7AwIqsywOWbawzcI=.13938c72-a9d4-463d-a54c-a08c70482a6b@github.com>

> A node in a pre loop only has uses out of the loop dominated by the
> loop exit. `PhaseIdealLoop::try_sink_out_of_loop()` sets its control
> to the loop exit projection. A range check in the main loop has this
> node as input (through a chain of some other nodes). Range check
> elimination needs to update the exit condition of the pre loop with an
> expression that depends on the node pinned on its exit: that's
> impossible and the assert fires. This is a variant of 8314024 (this
> one was for a node with uses out of the pre loop on multiple paths). I
> propose the same fix: leave the node with control in the pre loop in
> this case.

Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:

  review

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26424/files
  - new: https://git.openjdk.org/jdk/pull/26424/files/91a7d73c..ec28714e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26424&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26424&range=06-07

  Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/26424.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26424/head:pull/26424

PR: https://git.openjdk.org/jdk/pull/26424

From roland at openjdk.org  Fri Sep 12 07:27:06 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 12 Sep 2025 07:27:06 GMT
Subject: RFR: 8361702: C2: assert(is_dominator(compute_early_ctrl(limit,
 limit_ctrl), pre_end)) failed: node pinned on loop exit test? [v7]
In-Reply-To: <H3f75OPDtejeLKc3v8aFN8r-Zkry8odU6FAagWZfOc0=.245fae74-22c7-4469-93a6-29f1c5686688@github.com>
References: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com>
 <BfbDzTNQbwqUh0pXFbXiFy09JPGfbLPVmUTx-YEe1KM=.cad33852-809f-4907-a41d-628d0d0db07e@github.com>
 <H3f75OPDtejeLKc3v8aFN8r-Zkry8odU6FAagWZfOc0=.245fae74-22c7-4469-93a6-29f1c5686688@github.com>
Message-ID: <dbiKX2DcmqEeT6vyVwC2sjamRYJEponE6smReCCDSlI=.4cd6f95e-2240-428c-8d06-fb5f005dce50@github.com>

On Tue, 9 Sep 2025 11:56:56 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision:
>> 
>>  - Merge branch 'master' into JDK-8361702
>>  - Merge branch 'master' into JDK-8361702
>>  - review
>>  - Merge branch 'master' into JDK-8361702
>>  - Update src/hotspot/share/opto/loopopts.cpp
>>    
>>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>>  - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE3.java
>>    
>>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>>  - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java
>>    
>>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>>  - Update src/hotspot/share/opto/loopopts.cpp
>>    
>>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>>  - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java
>>    
>>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>>  - tests
>>  - ... and 1 more: https://git.openjdk.org/jdk/compare/a6afe4cc...91a7d73c
>
> src/hotspot/share/opto/loopopts.cpp line 1936:
> 
>> 1934: // Sinking a node from a pre loop to its main loop pins the node between the pre and main loops. If that node is input
>> 1935: // to a check that's eliminated by range check elimination, it becomes input to an expression that feeds into the exit
>> 1936: // test of the pre loop above the point in the graph where it's pinned.
> 
> I guess the alternative would have been not to do that RC elimination, right?
> If yes: you could finish the thought and say that we prefer to have a chance at RC elimination, rather than sinking the node out of the pre-loop.

I updated the comment based on your suggestion.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26424#discussion_r2343254430

From roland at openjdk.org  Fri Sep 12 07:30:29 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 12 Sep 2025 07:30:29 GMT
Subject: RFR: 8327963: C2: fix construction of memory graph around
 Initialize node to prevent incorrect execution if allocation is removed [v12]
In-Reply-To: <BuKfkAAcusJ6TNHSHtVaYYcmjnAVTIInXbhd4Z5Fg5w=.067f6b09-67e0-4b97-9753-c727c67343ca@github.com>
References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
 <S83l4ZBZILBlhRpSoVORBxzNNB9bF5xN-QhRWc16_T4=.6d5c1624-5d77-4b3e-9db0-0e01e6b1b36b@github.com>
 <BuKfkAAcusJ6TNHSHtVaYYcmjnAVTIInXbhd4Z5Fg5w=.067f6b09-67e0-4b97-9753-c727c67343ca@github.com>
Message-ID: <PsAetiA4N_lr7Mz7DJKMP7v-pVoRV9LZTvDC0tuNvWw=.4da18f35-c089-4926-a5d4-bcafcb3ab0e3@github.com>

On Fri, 12 Sep 2025 01:00:20 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 45 commits:
>> 
>>  - more
>>  - Merge branch 'master' into JDK-8327963
>>  - more
>>  - more
>>  - Merge branch 'master' into JDK-8327963
>>  - more
>>  - more
>>  - lambda return
>>  - lambda clean up
>>  - Merge branch 'master' into JDK-8327963
>>  - ... and 35 more: https://git.openjdk.org/jdk/compare/e16c5100...b701d03e
>
> src/hotspot/share/opto/loopTransform.cpp line 3992:
> 
>> 3990:   Node* frame = new ParmNode(C->start(), TypeFunc::FramePtr);
>> 3991:   _igvn.register_new_node_with_optimizer(frame);
>> 3992:   call->init_req(TypeFunc::FramePtr,  frame);
> 
> This seems unrelated.  Is it needed?

It's one of the things mentioned in that comment:
https://github.com/openjdk/jdk/pull/24570#issuecomment-2883651987

"I added asserts to catch cases where proj_out is called but the node has more than one matching projection. With those asserts, I caught some false positive/cases where we got lucky and worked around them by reworking the code so it doesn't use proj_out. That's the case in PhaseIdealLoop::intrinsify_fill(): we can end up there with more than one FramePtr projection because the code pattern used elsewhere is to add one more projection and let identical projections common during igvn. "

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2343260134

From dlunden at openjdk.org  Fri Sep 12 08:02:20 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Fri, 12 Sep 2025 08:02:20 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v2]
In-Reply-To: <V9oqcN4DtpsHHax36QmUsRqz_KWQR-nXcEavdbNxpys=.0ca41f4e-999b-487c-8780-7ddfdcfa4d38@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <Dbraqc-jw-wVVDlFAkGZXH_69lxpJYlMjteJRBZx1gM=.3446b723-3eef-475b-b434-af2e815bd695@github.com>
 <V9oqcN4DtpsHHax36QmUsRqz_KWQR-nXcEavdbNxpys=.0ca41f4e-999b-487c-8780-7ddfdcfa4d38@github.com>
Message-ID: <XS7Q_8Gd4T_12YFFmGbulnCe9SVs5CXy9OhpxUqgkRY=.32b71e46-46c5-43b3-a9d2-c0576abbd20a@github.com>

On Thu, 11 Sep 2025 23:47:39 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Lowercase _RM_INT and _RM_WORD
>
> src/hotspot/share/opto/chaitin.cpp line 1580:
> 
>> 1578:     _ifg->re_insert(lidx);
>> 1579:     if( !lrg->alive() ) continue;
>> 1580:     // capture allstackedness flag before mask is hacked
> 
> allstackedness --> infiniteness?

Thanks, I did not think to `grep` for that one...

> src/hotspot/share/opto/regmask.hpp line 166:
> 
>> 164:   // indefinitely with ONE bits.  Returns TRUE if mask is infinite or
>> 165:   // unbounded in size.  Returns FALSE if mask is finite size.
>> 166:   bool is_infinite() const {
> 
> "infinite" hides the fact that these unbounded bits are stack bits and not register bits, but `is_UnboundedStack` or `is_InfiniteStack` might be too verbose.  How does `is_InfStack` sound?

I like the suggestion, but should we not make it `is_infinite_stack` (the current convention according to the style guide)? Or do historic conventions in `regmask.hpp` take precedence?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2343339147
PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2343345578

From dlunden at openjdk.org  Fri Sep 12 08:02:22 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Fri, 12 Sep 2025 08:02:22 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v2]
In-Reply-To: <Fr0jsNv9me1djItMgWmPIem2FO02xMz7SgRtKQS1Xks=.5d4efbea-ee61-4e5f-bb69-7341d1a12fb0@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <Dbraqc-jw-wVVDlFAkGZXH_69lxpJYlMjteJRBZx1gM=.3446b723-3eef-475b-b434-af2e815bd695@github.com>
 <WEjhW8W7zNdMKusm1NMRY3-vgNEa_ssBm0hdcLp2_eM=.fd6329a9-98e5-4282-b655-29de105cea8c@github.com>
 <Fr0jsNv9me1djItMgWmPIem2FO02xMz7SgRtKQS1Xks=.5d4efbea-ee61-4e5f-bb69-7341d1a12fb0@github.com>
Message-ID: <j8gkF2kojcaAK0TYQ8VjGD6O9_VvLeMnenCypqsm0HU=.38cc91b0-ca55-44ae-b8bc-32544c0357bd@github.com>

On Fri, 12 Sep 2025 05:48:39 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/regmask.hpp line 66:
>> 
>>> 64: 
>>> 65:   static const unsigned int _WordBitMask = BitsPerWord - 1U;
>>> 66:   static const unsigned int _LogWordBits = LogBitsPerWord;
>> 
>> What about just replacing all uses of _LogWordBits with LogBitsPerWord?
>
> Yes, that would be a good step in the right direction.

Sure, sounds good
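To make the naming question concrete, here is a toy Java model (hypothetical class and method names, not HotSpot's `RegMask`): a fixed array of 64-bit words plus a flag meaning that every bit past the stored words is implicitly set, i.e. the unbounded run of stack slots discussed above. It assumes 64-bit words, so the word index is `bit >>> 6` (the role `LogBitsPerWord` plays) and the bit within the word is `bit & 63` (the role `_WordBitMask` plays).

```java
/**
 * Toy model, not HotSpot's RegMask: explicit 64-bit words plus a flag that
 * marks every bit beyond the stored words as set (an unbounded stack run).
 * Assumes 64-bit words, so log2(bits per word) == 6.
 */
public final class ToyRegMask {
    static final int BITS_PER_WORD = Long.SIZE;         // 64
    static final int LOG_BITS_PER_WORD = 6;             // log2(BITS_PER_WORD)
    static final int WORD_BIT_MASK = BITS_PER_WORD - 1; // 63, selects the bit within a word

    private final long[] words;
    private boolean infiniteStack;

    ToyRegMask(int numWords) {
        words = new long[numWords];
    }

    /** Sets a bit; the bit must lie within the explicitly stored words. */
    void insert(int bit) {
        words[bit >>> LOG_BITS_PER_WORD] |= 1L << (bit & WORD_BIT_MASK);
    }

    void setInfiniteStack(boolean value) {
        infiniteStack = value;
    }

    boolean isInfiniteStack() {
        return infiniteStack;
    }

    /** Bits beyond the stored words read as set only when the flag is on. */
    boolean member(int bit) {
        int word = bit >>> LOG_BITS_PER_WORD;
        if (word >= words.length) {
            return infiniteStack;
        }
        return (words[word] & (1L << (bit & WORD_BIT_MASK))) != 0;
    }

    public static void main(String[] args) {
        ToyRegMask m = new ToyRegMask(2); // 128 explicit bits
        m.insert(3);
        m.insert(70);
        System.out.println(m.member(70) + " " + m.member(500));
        m.setInfiniteStack(true);
        System.out.println(m.member(500));
    }
}
```

`main` prints `true false` and then `true`: bit 500 lies beyond the two stored words, so in this toy it only reads as set once the flag is on. A name like `isInfiniteStack` says which bits are unbounded; `isInfinite` alone does not.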

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2343336560

From rehn at openjdk.org  Fri Sep 12 08:05:38 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Fri, 12 Sep 2025 08:05:38 GMT
Subject: Integrated: 8365926: RISC-V: Performance regression in renaissance
 (chi-square)
In-Reply-To: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
Message-ID: <d9oFceg8UcrSGNY0MjnDuJYlfn3WycfC5Qui4wh2hi0=.a05e24a7-3513-4b77-9195-00cd489b25e6@github.com>

On Tue, 26 Aug 2025 14:43:05 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

> Hey, please consider!
> 
> There is more information in the JBS entry; please read that as well.
> 
> I narrowed this issue down to the old jal optimization, which makes direct calls when the target is in reach.
> This patch restores that optimization and removes the regression.
> 
> In essence we turn "jalr ra,0(t1)" into a "jal ra,<dest>" if reachable, and restore the jalr if a new destination is not reachable.
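A minimal sketch of the reachability test behind this rewrite, assuming only the RISC-V encoding of JAL: its PC-relative offset is a signed 21-bit value whose lowest bit is always zero, so a JAL reaches offsets in [-2^20, 2^20 - 2] bytes (about ±1 MiB). The class and method names are hypothetical; HotSpot's actual check and patching code may differ.

```java
/**
 * Sketch of the jalr -> jal decision. JAL encodes a signed 21-bit,
 * 2-byte-aligned PC-relative offset: reachable range [-2^20, 2^20 - 2].
 * Names are hypothetical, not HotSpot's.
 */
public final class JalReach {
    static final long JAL_MIN_OFFSET = -(1L << 20);
    static final long JAL_MAX_OFFSET = (1L << 20) - 2;

    /** True if a JAL placed at callPc can branch directly to dest. */
    static boolean jalReachable(long callPc, long dest) {
        long offset = dest - callPc;
        return (offset & 1) == 0
                && offset >= JAL_MIN_OFFSET
                && offset <= JAL_MAX_OFFSET;
    }

    /** Call form a patcher would choose for a (new) destination. */
    static String callForm(long callPc, long dest) {
        // Fall back to the indirect form (target loaded into t1) when out of range.
        return jalReachable(callPc, dest) ? "jal ra,<dest>" : "jalr ra,0(t1)";
    }

    public static void main(String[] args) {
        long pc = 0x4000_0000L;
        System.out.println(callForm(pc, pc + 0x1000));      // 4 KiB away
        System.out.println(callForm(pc, pc + (1L << 24))); // 16 MiB away
    }
}
```

Run as is, `main` prints the `jal` form for the target 4 KiB away and the `jalr` form for the one 16 MiB away.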
> 
> Please test on your hardware!
> 
> 
> Chi Square (100 runs each, 10 fastest iterations of each run, P550)
> JDK-23 (last version with trampoline calls)
> Mean: 3189.5827
> Standard Deviation: 284.6478
> 
> JDK-25
> Mean: 3424.8905
> Standard Deviation: 222.2208
> 
> Patch:
> Mean: 3144.8535
> Standard Deviation: 229.2577
> 
> 
> No issues found in t1, running t2 also. Stress tested on vf2, bpi-f3, p550.
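To put the chi-square means side by side, the small helper below computes the relative change of each mean against a baseline (values in ms, lower is better). The class name is illustrative; the inputs are the means quoted above.

```java
public final class RelativeChange {
    /** Percent change of candidate relative to baseline; negative means faster here. */
    static double percentChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double jdk23 = 3189.5827; // last version with trampoline calls
        double jdk25 = 3424.8905;
        double patch = 3144.8535;
        System.out.printf("JDK-25 vs JDK-23: %+.2f%%%n", percentChange(jdk23, jdk25));
        System.out.printf("Patch  vs JDK-23: %+.2f%%%n", percentChange(jdk23, patch));
        System.out.printf("Patch  vs JDK-25: %+.2f%%%n", percentChange(jdk25, patch));
    }
}
```

This prints about +7.38% for JDK-25 and -1.40% for the patch against JDK-23, and about -8.18% for the patch against JDK-25. The standard deviations (roughly 220 to 285 ms) are much larger than the 45 ms gap between the patch and JDK-23, so that last comparison should be read with caution.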

This pull request has now been integrated.

Changeset: 5c1865a4
Author:    Robbin Ehn <rehn at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/5c1865a4fcd5da80ddcc506f4e41aada0fb93970
Stats:     86 lines in 3 files changed: 68 ins; 0 del; 18 mod

8365926: RISC-V: Performance regression in renaissance (chi-square)

Reviewed-by: fyang, mli

-------------

PR: https://git.openjdk.org/jdk/pull/26944

From wenanjian at openjdk.org  Fri Sep 12 08:11:43 2025
From: wenanjian at openjdk.org (Anjian Wen)
Date: Fri, 12 Sep 2025 08:11:43 GMT
Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v7]
In-Reply-To: <gVCtrw6dXJ629mh1jBcsjZ5UU4NPGZ8Xd9C7VmiKxAM=.839127b7-981b-43ba-aa12-fb2497d3a997@github.com>
References: <mMLCDK4Ls37Ebs_3LEM4dktuylNraobBebijN1BxZwM=.30730d56-b8ca-46c9-9dc2-1ca215e66ba4@github.com>
 <bT6qNgqLvGHT6fA0GYGKbZYUs5Oimz6ZqXygsW-Yp3s=.c951383e-c7fd-4bc7-8003-f95abd23b56e@github.com>
 <gVCtrw6dXJ629mh1jBcsjZ5UU4NPGZ8Xd9C7VmiKxAM=.839127b7-981b-43ba-aa12-fb2497d3a997@github.com>
Message-ID: <os9NxPGTHFi463EdCHidgOgT7Uh5Mu0QsGEnFS0LTZ8=.2bbd2eb3-8433-4a03-99d5-5a33dfed164e@github.com>

On Fri, 25 Jul 2025 10:22:49 GMT, Anjian Wen <wenanjian at openjdk.org> wrote:

>> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2745:
>> 
>>> 2743:     __ vsetivli(x0, 4, Assembler::e32, Assembler::m1);
>>> 2744:     __ vrev8_v(v31, v31, Assembler::VectorMask::v0_t); // convert big-endien to little-endian
>>> 2745:     __ vadd_vi(v31, v31, 1, Assembler::VectorMask::v0_t);
>> 
>> Are you sure this is correct? See `com.sun.crypto.provider.CounterMode::increment`.
>
> Thanks for the review. I'm still developing it.
> Regarding the counter increment, it should use 8 bytes to store the count. I used 4 bytes here, following the OpenSSL AES-CTR code; I will try to fix it later.
> https://github.com/openssl/openssl/blob/master/crypto/aes/asm/aes-riscv64-zvkb-zvkned.pl#L242

> Are you sure this is correct? See `com.sun.crypto.provider.CounterMode::increment`.

Hi @theRealAph, following your advice and the code in `com.sun.crypto.provider.CounterMode::increment`, I have modified my patch to increment the counter as two 8-byte halves. In most cases incrementing the low-order 8 bytes (bytes 8 to 15) is enough; the high-order 8 bytes only need to be incremented when the low-order 8 bytes overflow. I have also added a test for this edge case. Could you please review again?
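For reference, a Java sketch of the two increment strategies being compared, assuming a 16-byte big-endian counter block. `incrementBytewise` re-implements a byte-by-byte carry (the behaviour described for `CounterMode::increment`); `incrementAsTwoLongs` is a hypothetical illustration of the two-halves scheme described above, not the RISC-V stub itself.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public final class Ctr128 {
    /** Byte-wise big-endian increment with carry, wrapping modulo 2^128. */
    static void incrementBytewise(byte[] b) {
        int n = b.length - 1;
        while (n >= 0 && ++b[n] == 0) { // byte wrapped to 0: carry into the next byte
            n--;
        }
    }

    /** Sketch: treat the 16-byte counter as two big-endian 64-bit halves. */
    static void incrementAsTwoLongs(byte[] b) {
        ByteBuffer buf = ByteBuffer.wrap(b); // ByteBuffer is big-endian by default
        long hi = buf.getLong(0);
        long lo = buf.getLong(8) + 1;
        if (lo == 0) {  // low half wrapped around: carry into the high half
            hi += 1;
        }
        buf.putLong(0, hi);
        buf.putLong(8, lo);
    }

    public static void main(String[] args) {
        byte[] a = new byte[16];
        byte[] b = new byte[16];
        Arrays.fill(a, 8, 16, (byte) 0xFF); // low half all ones: forces a carry
        Arrays.fill(b, 8, 16, (byte) 0xFF);
        incrementBytewise(a);
        incrementAsTwoLongs(b);
        System.out.println(Arrays.equals(a, b) + " " + a[7]);
    }
}
```

`main` prints `true 1`. For any input the two methods should leave identical bytes; the interesting case is a low half of all `0xFF` bytes, which is exactly the overflow the new test needs to cover.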

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2343365699

From epeter at openjdk.org  Fri Sep 12 08:44:24 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 08:44:24 GMT
Subject: RFR: 8367483: C2 crash in  PhaseValues::type: assert(t != nullptr)
 failed: must set before get - missing notification for CastX2P(SubL(x, y))
Message-ID: <_kMBdz-PsErEbxlHt7PDZTJmRqNEguaZS4GAgta9KtY=.2f177665-f392-4539-b490-b635a6afbe15@github.com>

`CastX2PNode::Ideal` optimizes cases:

CastX2P(AddX(x, y)) -> AddP(CastX2P(x), y)
CastX2P(SubL(x, y)) -> AddP(CastX2P(x), SubL(0, y))


But the notification code `PhaseIterGVN::add_users_of_use_to_worklist` only adds `CastX2P` to the worklist for the `AddX` and not the `SubX` cases.
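A toy Java model of the missing notification (hypothetical names; real IGVN nodes, opcodes and worklists are far richer): when a node changes, IGVN visits each of its users `use`, and if `use` is an add (or, after the fix, a subtract) it re-enqueues the `CastX2P` users of `use`, because `CastX2PNode::Ideal` may now be able to transform them. The `handleSubX` flag toggles between the behaviour before and after the fix.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public final class WorklistToy {
    enum Op { ADD_X, SUB_X, CAST_X2P, OTHER }

    static final class Node {
        final Op op;
        final List<Node> users = new ArrayList<>();

        Node(Op op, Node... inputs) {
            this.op = op;
            for (Node in : inputs) {
                in.users.add(this); // record the def-use edge
            }
        }
    }

    /**
     * Called for each user `use` of a node that just changed. Re-enqueues the
     * CastX2P users of `use` so their Ideal transform gets another chance.
     */
    static void addUsersOfUseToWorklist(Node use, Set<Node> worklist, boolean handleSubX) {
        boolean relevant = use.op == Op.ADD_X || (handleSubX && use.op == Op.SUB_X);
        if (!relevant) {
            return;
        }
        for (Node u : use.users) {
            if (u.op == Op.CAST_X2P) {
                worklist.add(u);
            }
        }
    }

    public static void main(String[] args) {
        Node x = new Node(Op.OTHER);
        Node y = new Node(Op.OTHER);
        Node sub = new Node(Op.SUB_X, x, y);
        new Node(Op.CAST_X2P, sub); // CastX2P(SubX(x, y))

        Set<Node> before = new LinkedHashSet<>();
        addUsersOfUseToWorklist(sub, before, false); // pre-fix behaviour
        Set<Node> after = new LinkedHashSet<>();
        addUsersOfUseToWorklist(sub, after, true);   // with the SubX case added
        System.out.println("without SubX case: " + before.size() + " queued; with: " + after.size());
    }
}
```

`main` prints `without SubX case: 0 queued; with: 1`: without the `SubX` case, the `CastX2P(SubX(x, y))` node is never revisited after its input changes.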

---------------------------------------

A little brag: this is the second (unrelated, i.e. non aliasing) bug that `TestAliasingFuzzer.java` found. Fuzzing access to native MemorySegment seems to trigger new/rare patterns.

-------------

Commit messages:
 - move test
 - Apply suggestions from code review
 - JDK-8367483

Changes: https://git.openjdk.org/jdk/pull/27249/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27249&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367483
  Stats: 63 lines in 2 files changed: 62 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27249.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27249/head:pull/27249

PR: https://git.openjdk.org/jdk/pull/27249

From chagedorn at openjdk.org  Fri Sep 12 08:44:26 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 12 Sep 2025 08:44:26 GMT
Subject: RFR: 8367483: C2 crash in PhaseValues::type: assert(t != nullptr)
 failed: must set before get - missing notification for CastX2P(SubL(x, y))
In-Reply-To: <_kMBdz-PsErEbxlHt7PDZTJmRqNEguaZS4GAgta9KtY=.2f177665-f392-4539-b490-b635a6afbe15@github.com>
References: <_kMBdz-PsErEbxlHt7PDZTJmRqNEguaZS4GAgta9KtY=.2f177665-f392-4539-b490-b635a6afbe15@github.com>
Message-ID: <r4syH5AR3pNPt7FGqZuqnRi8W4jmhcxXvkWFkJD1dK8=.15e3b04c-8eb4-4b4b-a514-be8f18d84716@github.com>

On Fri, 12 Sep 2025 08:18:40 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> `CastX2PNode::Ideal` optimizes cases:
> 
> CastX2P(AddX(x, y)) -> AddP(CastX2P(x), y)
> CastX2P(SubL(x, y)) -> AddP(CastX2P(x), SubL(0, y))
> 
> 
> But the notification code `PhaseIterGVN::add_users_of_use_to_worklist` only adds `CastX2P` to the worklist for the `AddX` and not the `SubX` cases.
> 
> ---------------------------------------
> 
> A little brag: this is the second (unrelated, i.e. non aliasing) bug that `TestAliasingFuzzer.java` found. Fuzzing access to native MemorySegment seems to trigger new/rare patterns.

Otherwise, looks good!

test/hotspot/jtreg/compiler/c2/gvn/MissedOptimizationWithCastX2PSubX.java line 1:

> 1: /*

There is a `compiler/igvn` test folder. I think this suits better than `c2/gvn`.

test/hotspot/jtreg/compiler/c2/gvn/MissedOptimizationWithCastX2PSubX.java line 34:

> 32:  *           -XX:-TieredCompilation
> 33:  *           -XX:+IgnoreUnrecognizedVMOptions
> 34:  *           -XX:+UnlockDiagnosticVMOptions

These are not required:
Suggestion:

 *           -XX:CompileCommand=compileonly,compiler.c2.gvn.MissedOptimizationWithCastX2PSubX::test
 *           -XX:-TieredCompilation
 *           -XX:+IgnoreUnrecognizedVMOptions

test/hotspot/jtreg/compiler/c2/gvn/MissedOptimizationWithCastX2PSubX.java line 37:

> 35:  *           -XX:VerifyIterativeGVN=1110
> 36:  *           compiler.c2.gvn.MissedOptimizationWithCastX2PSubX
> 37:  * @run driver compiler.c2.gvn.MissedOptimizationWithCastX2PSubX

Should be `main` to allow additional flags to be passed in.
Suggestion:

 * @run main compiler.c2.gvn.MissedOptimizationWithCastX2PSubX

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27249#pullrequestreview-3215370623
PR Review Comment: https://git.openjdk.org/jdk/pull/27249#discussion_r2343421069
PR Review Comment: https://git.openjdk.org/jdk/pull/27249#discussion_r2343415661
PR Review Comment: https://git.openjdk.org/jdk/pull/27249#discussion_r2343416742

From bmaillard at openjdk.org  Fri Sep 12 08:44:27 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Fri, 12 Sep 2025 08:44:27 GMT
Subject: RFR: 8367483: C2 crash in PhaseValues::type: assert(t != nullptr)
 failed: must set before get - missing notification for CastX2P(SubL(x, y))
In-Reply-To: <_kMBdz-PsErEbxlHt7PDZTJmRqNEguaZS4GAgta9KtY=.2f177665-f392-4539-b490-b635a6afbe15@github.com>
References: <_kMBdz-PsErEbxlHt7PDZTJmRqNEguaZS4GAgta9KtY=.2f177665-f392-4539-b490-b635a6afbe15@github.com>
Message-ID: <5-DC_hrp0sdE4QYHLP5ChTq2NlFyXm_xpB2NWiJUuuE=.47b98166-122e-43f9-87cb-cb6d992f84b6@github.com>

On Fri, 12 Sep 2025 08:18:40 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> `CastX2PNode::Ideal` optimizes cases:
> 
> CastX2P(AddX(x, y)) -> AddP(CastX2P(x), y)
> CastX2P(SubL(x, y)) -> AddP(CastX2P(x), SubL(0, y))
> 
> 
> But the notification code `PhaseIterGVN::add_users_of_use_to_worklist` only adds `CastX2P` to the worklist for the `AddX` and not the `SubX` cases.
> 
> ---------------------------------------
> 
> A little brag: this is the second (unrelated, i.e. non aliasing) bug that `TestAliasingFuzzer.java` found. Fuzzing access to native MemorySegment seems to trigger new/rare patterns.

Looks good to me!

-------------

Marked as reviewed by bmaillard (Author).

PR Review: https://git.openjdk.org/jdk/pull/27249#pullrequestreview-3215415303

From epeter at openjdk.org  Fri Sep 12 08:44:27 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 08:44:27 GMT
Subject: RFR: 8367483: C2 crash in PhaseValues::type: assert(t != nullptr)
 failed: must set before get - missing notification for CastX2P(SubL(x, y))
In-Reply-To: <5-DC_hrp0sdE4QYHLP5ChTq2NlFyXm_xpB2NWiJUuuE=.47b98166-122e-43f9-87cb-cb6d992f84b6@github.com>
References: <_kMBdz-PsErEbxlHt7PDZTJmRqNEguaZS4GAgta9KtY=.2f177665-f392-4539-b490-b635a6afbe15@github.com>
 <5-DC_hrp0sdE4QYHLP5ChTq2NlFyXm_xpB2NWiJUuuE=.47b98166-122e-43f9-87cb-cb6d992f84b6@github.com>
Message-ID: <hAm32KJaiKh8RrOIS8EfefpHoLH7y93aG2iK8Kcu4A8=.1fc4d675-6491-47e0-9520-3bc8099c5f5c@github.com>

On Fri, 12 Sep 2025 08:37:55 GMT, Benoît Maillard <bmaillard at openjdk.org> wrote:

>> `CastX2PNode::Ideal` optimizes cases:
>> 
>> CastX2P(AddX(x, y)) -> AddP(CastX2P(x), y)
>> CastX2P(SubL(x, y)) -> AddP(CastX2P(x), SubL(0, y))
>> 
>> 
>> But the notification code `PhaseIterGVN::add_users_of_use_to_worklist` only adds `CastX2P` to the worklist for the `AddX` and not the `SubX` cases.
>> 
>> ---------------------------------------
>> 
>> A little brag: this is the second (unrelated, i.e. non aliasing) bug that `TestAliasingFuzzer.java` found. Fuzzing access to native MemorySegment seems to trigger new/rare patterns.
>
> Looks good to me!

@benoitmaillard Thanks for the review!

@chhagedorn Thanks for the suggestions, can I have your re-approval? ;)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27249#issuecomment-3284329031

From bmaillard at openjdk.org  Fri Sep 12 08:44:29 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Fri, 12 Sep 2025 08:44:29 GMT
Subject: RFR: 8367483: C2 crash in PhaseValues::type: assert(t != nullptr)
 failed: must set before get - missing notification for CastX2P(SubL(x, y))
In-Reply-To: <rithOU9jNgOpx2tP1ezQEfYeUqnC-7kyvghsL_u6Cms=.8664d09f-c8b3-44fd-b395-b884290db55a@github.com>
References: <_kMBdz-PsErEbxlHt7PDZTJmRqNEguaZS4GAgta9KtY=.2f177665-f392-4539-b490-b635a6afbe15@github.com>
 <r4syH5AR3pNPt7FGqZuqnRi8W4jmhcxXvkWFkJD1dK8=.15e3b04c-8eb4-4b4b-a514-be8f18d84716@github.com>
 <rithOU9jNgOpx2tP1ezQEfYeUqnC-7kyvghsL_u6Cms=.8664d09f-c8b3-44fd-b395-b884290db55a@github.com>
Message-ID: <FZc3zHJx9Y1IgeDBZAm7u_NbVjJOjj3kLVCA5gbJEjQ=.6aa48642-716b-4184-a205-ba135dfeae2c@github.com>

On Fri, 12 Sep 2025 08:34:45 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> test/hotspot/jtreg/compiler/c2/gvn/MissedOptimizationWithCastX2PSubX.java line 1:
>> 
>>> 1: /*
>> 
>> There is a `compiler/igvn` test folder. I think this suits better than `c2/gvn`.
>
> Yes. We already have other missed optimization tests in `c2/gvn` though.
> As always: quite a mess.

This makes sense, but I remember that in the past similar tests (for example `test/hotspot/jtreg/compiler/c2/TestEliminateRedundantConversionSequences.java`) were simply put in `c2`. Not sure what the policy is here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27249#discussion_r2343448150

From rehn at openjdk.org  Fri Sep 12 08:45:39 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Fri, 12 Sep 2025 08:45:39 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v5]
In-Reply-To: <TbX0Ps86Ds60F98KXjt_afTSj_9dhe3jz1ohwM7cL1w=.8f9526fd-8427-45dc-9243-29af915a9278@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <64z-PlrnxAISLzKBq-RZz7CXkQirGTvOgTGMJQl833o=.73ea3239-dfb6-4e32-b20f-8398334f2759@github.com>
 <IzlmxvN2cYF-OVYP_QsLfsGHpdI1EyVMIW-blkQa_Ko=.d3579688-3ffa-456a-a999-c1ec75ccc72e@github.com>
 <TbX0Ps86Ds60F98KXjt_afTSj_9dhe3jz1ohwM7cL1w=.8f9526fd-8427-45dc-9243-29af915a9278@github.com>
Message-ID: <Kjbns9HUxkiTIns0MTabAAfOCugdnAyfjoWi_kSBOGg=.07262507-bb20-4b42-b62a-f90e1980a6f9@github.com>

On Thu, 11 Sep 2025 09:01:23 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hamlin had some offline questions, so I gathered this data for him:
>> 
>> Benchmark results, doing 20 iterations and 20 runs of each benchmark for both options:
>> (using P550 where I saw the largest regression)
>> 
>> Base: JDK24* +UseTrampoline
>> JAL OPT: JDK24* -UseTrampoline + JAL OPT
>> 
>> Values are in ms, lower is better.
>> 
>> 
>> +-----------------+--------------+--------------+----------------+----------------+--------------+------------------+-------------+----------------+------------------+--------------------+
>> | Benchmark       | Mean (Base)  | SD (Base)    | Fastest (Base) | Mean (JAL OPT) | SD (JAL OPT) | Fastest (JAL OPT)| Diff Mean   | Diff Fastest   | Mean Diff Ratio  | Fastest Diff Ratio |
>> +-----------------+--------------+--------------+----------------+----------------+--------------+------------------+-------------+----------------+------------------+--------------------+
>> | future-genetic  | 8317.8449    | 925.0775     | 7824.59        | 8421.137       | 1870.3916    | 7955.19          | 103.2922    | 130.6          | 1.012418145      | 1.01669097         |
>> | akka-uct        | 54775.8037   | 5220.7361    | 49614.46       | 54149.9939     | 4730.3662    | 48736.7          | -625.8097   | -877.76        | 0.9885750686     | 0.9823083835       |
>> | movie-lens      | 44859.3268   | 107.8713     | 38160.64       | 43043.6965     | 7932.6525    | 36807.2          | -1815.6295  | -1353.44       | 0.9595261529     | 0.9645330896       |
>> | scala-doku      | 10792.4933   | 3004.9348    | 970.34         | 10739.0164     | 2692.6155    | 9226.94          | -53.4766    | 256.59         | 0.9950450188     | 1.028605382        |
>> | chi-square      | 4740.1812    | 3552.9489    | 2579.09        | 4749.0893      | 3484.3178    | 2498.04          | 8.9081      | -81.05         | 1.001879274      | 0.968574187        |
>> | fj-kmeans       | 18597.656    | 2481.4036    | 17994.43       | 18588.154      | 4458.6089    | 18019.15         | -9.5018     | 24.72          | 0.9994890862     | 1.001373758        |
>> | db-shootout     | 26529.8048   | 3163.9087    | 21270.43       | 25101.5681     | 2483.0698    | 21419.11         | -1428.2367  | 148.67         | 0.9461648244     | 1.006989986        |
>> | finagle-http    | 20646.1713   | 1635.9154    | 14898.97       | 20250.4966     | 1046.1738    | 14735.66         | -395.6747   | -163.31        | 0.9808354443     | 0.9890388396       |
>> | reactors        | 52051.8872   | 2023.7865    | 49188.65       | 51625.9...
>
>> Hamlin had some offline questions, so I gathered this data for him:
> 
> Thanks Robbin for collecting the data!
> 
>> So on average using auipc+ld+jalr + JAL opt is 1.73% faster than the old trampolines.
> 
> This looks great!

@Hamlin-Li @RealFYang 

I broke release builds:
src/hotspot/cpu/riscv/nativeInst_riscv.cpp is missing the include runtime/atomic.hpp.
https://bugs.openjdk.org/browse/JDK-8367498

If you can fix that for me (I'm away for an hour), I would be very thankful!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26944#issuecomment-3284331961

From epeter at openjdk.org  Fri Sep 12 08:44:29 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 08:44:29 GMT
Subject: RFR: 8367483: C2 crash in PhaseValues::type: assert(t != nullptr)
 failed: must set before get - missing notification for CastX2P(SubL(x, y))
In-Reply-To: <r4syH5AR3pNPt7FGqZuqnRi8W4jmhcxXvkWFkJD1dK8=.15e3b04c-8eb4-4b4b-a514-be8f18d84716@github.com>
References: <_kMBdz-PsErEbxlHt7PDZTJmRqNEguaZS4GAgta9KtY=.2f177665-f392-4539-b490-b635a6afbe15@github.com>
 <r4syH5AR3pNPt7FGqZuqnRi8W4jmhcxXvkWFkJD1dK8=.15e3b04c-8eb4-4b4b-a514-be8f18d84716@github.com>
Message-ID: <rithOU9jNgOpx2tP1ezQEfYeUqnC-7kyvghsL_u6Cms=.8664d09f-c8b3-44fd-b395-b884290db55a@github.com>

On Fri, 12 Sep 2025 08:28:51 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> `CastX2PNode::Ideal` optimizes cases:
>> 
>> CastX2P(AddX(x, y)) -> AddP(CastX2P(x), y)
>> CastX2P(SubL(x, y)) -> AddP(CastX2P(x), SubL(0, y))
>> 
>> 
>> But the notification code `PhaseIterGVN::add_users_of_use_to_worklist` only adds `CastX2P` to the worklist for the `AddX` and not the `SubX` cases.
>> 
>> ---------------------------------------
>> 
>> A little brag: this is the second (unrelated, i.e. non aliasing) bug that `TestAliasingFuzzer.java` found. Fuzzing access to native MemorySegment seems to trigger new/rare patterns.
>
> test/hotspot/jtreg/compiler/c2/gvn/MissedOptimizationWithCastX2PSubX.java line 1:
> 
>> 1: /*
> 
> There is a `compiler/igvn` test folder. I think this suits better than `c2/gvn`.

Yes. We already have other missed optimization tests in `c2/gvn` though.
As always: quite a mess.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27249#discussion_r2343441022

From epeter at openjdk.org  Fri Sep 12 08:44:31 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 08:44:31 GMT
Subject: RFR: 8367483: C2 crash in PhaseValues::type: assert(t != nullptr)
 failed: must set before get - missing notification for CastX2P(SubL(x, y))
In-Reply-To: <lifLxKnvEPFWBnq6DfPISF6tbLMMWSHtVCU6x_o2f8I=.0c824d3f-8b5c-4aef-a1c6-7846c37bf20b@github.com>
References: <_kMBdz-PsErEbxlHt7PDZTJmRqNEguaZS4GAgta9KtY=.2f177665-f392-4539-b490-b635a6afbe15@github.com>
 <r4syH5AR3pNPt7FGqZuqnRi8W4jmhcxXvkWFkJD1dK8=.15e3b04c-8eb4-4b4b-a514-be8f18d84716@github.com>
 <rithOU9jNgOpx2tP1ezQEfYeUqnC-7kyvghsL_u6Cms=.8664d09f-c8b3-44fd-b395-b884290db55a@github.com>
 <FZc3zHJx9Y1IgeDBZAm7u_NbVjJOjj3kLVCA5gbJEjQ=.6aa48642-716b-4184-a205-ba135dfeae2c@github.com>
 <lifLxKnvEPFWBnq6DfPISF6tbLMMWSHtVCU6x_o2f8I=.0c824d3f-8b5c-4aef-a1c6-7846c37bf20b@github.com>
Message-ID: <PnpHXOGjWF86p7MDtQ-myd6-CB5dixgbYaD6sUSwlNc=.9c744b68-565e-408c-aca4-717dc5e891c0@github.com>

On Fri, 12 Sep 2025 08:37:12 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> This makes sense, but I remember that in the past similar tests (for example `test/hotspot/jtreg/compiler/c2/TestEliminateRedundantConversionSequences.java`) were simply put in `c2`. Not sure what the policy is here.
>
> Yes, indeed. We should probably move them as well at some point. And we should probably stick more closely to the convention of naming tests "TestXYZ" to distinguish between helper classes and actual tests.

Yes. Maybe we just have to at some point move all tests around. Will hurt a bit for backports maybe. But it should be ok on the whole.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27249#discussion_r2343463923

From chagedorn at openjdk.org  Fri Sep 12 08:44:30 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 12 Sep 2025 08:44:30 GMT
Subject: RFR: 8367483: C2 crash in PhaseValues::type: assert(t != nullptr)
 failed: must set before get - missing notification for CastX2P(SubL(x, y))
In-Reply-To: <FZc3zHJx9Y1IgeDBZAm7u_NbVjJOjj3kLVCA5gbJEjQ=.6aa48642-716b-4184-a205-ba135dfeae2c@github.com>
References: <_kMBdz-PsErEbxlHt7PDZTJmRqNEguaZS4GAgta9KtY=.2f177665-f392-4539-b490-b635a6afbe15@github.com>
 <r4syH5AR3pNPt7FGqZuqnRi8W4jmhcxXvkWFkJD1dK8=.15e3b04c-8eb4-4b4b-a514-be8f18d84716@github.com>
 <rithOU9jNgOpx2tP1ezQEfYeUqnC-7kyvghsL_u6Cms=.8664d09f-c8b3-44fd-b395-b884290db55a@github.com>
 <FZc3zHJx9Y1IgeDBZAm7u_NbVjJOjj3kLVCA5gbJEjQ=.6aa48642-716b-4184-a205-ba135dfeae2c@github.com>
Message-ID: <lifLxKnvEPFWBnq6DfPISF6tbLMMWSHtVCU6x_o2f8I=.0c824d3f-8b5c-4aef-a1c6-7846c37bf20b@github.com>

On Fri, 12 Sep 2025 08:35:47 GMT, Benoît Maillard <bmaillard at openjdk.org> wrote:

>> Yes. We already have other missed optimization tests in `c2/gvn` though.
>> As always: quite a mess.
>
> This makes sense, but I remember that in the past similar tests (for example `test/hotspot/jtreg/compiler/c2/TestEliminateRedundantConversionSequences.java`) were simply put in `c2`. Not sure what the policy is here.

Yes, indeed. We should probably move them as well at some point. And we should probably stick more closely to the convention of naming tests "TestXYZ" to distinguish between helper classes and actual tests.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27249#discussion_r2343455094

From chagedorn at openjdk.org  Fri Sep 12 08:47:09 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 12 Sep 2025 08:47:09 GMT
Subject: RFR: 8367483: C2 crash in PhaseValues::type: assert(t != nullptr)
 failed: must set before get - missing notification for CastX2P(SubL(x, y))
In-Reply-To: <_kMBdz-PsErEbxlHt7PDZTJmRqNEguaZS4GAgta9KtY=.2f177665-f392-4539-b490-b635a6afbe15@github.com>
References: <_kMBdz-PsErEbxlHt7PDZTJmRqNEguaZS4GAgta9KtY=.2f177665-f392-4539-b490-b635a6afbe15@github.com>
Message-ID: <7LnPwg-I-_AG6uTrehfAWH_2eg94EzvX3aWdhtjpiBs=.663f707b-f6a4-4d83-b564-83fcc38d2744@github.com>

On Fri, 12 Sep 2025 08:18:40 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> `CastX2PNode::Ideal` optimizes cases:
> 
> CastX2P(AddX(x, y)) -> AddP(CastX2P(x), y)
> CastX2P(SubL(x, y)) -> AddP(CastX2P(x), SubL(0, y))
> 
> 
> But the notification code `PhaseIterGVN::add_users_of_use_to_worklist` only adds `CastX2P` to the worklist for the `AddX` and not the `SubX` cases.
> 
> ---------------------------------------
> 
> A little brag: this is the second (unrelated, i.e. non aliasing) bug that `TestAliasingFuzzer.java` found. Fuzzing access to native MemorySegment seems to trigger new/rare patterns.

Looks good and trivial, thanks for the update

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27249#pullrequestreview-3215461040

From mli at openjdk.org  Fri Sep 12 08:51:22 2025
From: mli at openjdk.org (Hamlin Li)
Date: Fri, 12 Sep 2025 08:51:22 GMT
Subject: RFR: 8365926: RISC-V: Performance regression in renaissance
 (chi-square) [v5]
In-Reply-To: <TbX0Ps86Ds60F98KXjt_afTSj_9dhe3jz1ohwM7cL1w=.8f9526fd-8427-45dc-9243-29af915a9278@github.com>
References: <viYDaVS4fRIHDctkMwW8VOeCXXZ6XvsUvMSoZyHjxfQ=.1f1ba0c6-be03-4bdb-8b17-321b161eb9e1@github.com>
 <64z-PlrnxAISLzKBq-RZz7CXkQirGTvOgTGMJQl833o=.73ea3239-dfb6-4e32-b20f-8398334f2759@github.com>
 <IzlmxvN2cYF-OVYP_QsLfsGHpdI1EyVMIW-blkQa_Ko=.d3579688-3ffa-456a-a999-c1ec75ccc72e@github.com>
 <TbX0Ps86Ds60F98KXjt_afTSj_9dhe3jz1ohwM7cL1w=.8f9526fd-8427-45dc-9243-29af915a9278@github.com>
Message-ID: <wPX8HyA3jscYmbbZuwxtT0MRlR9nPC1r64hKUHBr5Xs=.a58c255e-d105-4ca5-bdb2-fe806b886985@github.com>

On Thu, 11 Sep 2025 09:01:23 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hamlin had some offline questions, so I gathered this data for him:
>> 
>> Benchmark results, doing 20 iterations and 20 runs of each benchmark for both options:
>> (using P550 where I saw the largest regression)
>> 
>> Base: JDK24* +UseTrampoline
>> JAL OPT: JDK24* -UseTrampoline + JAL OPT
>> 
>> Values are in ms, lower is better.
>> 
>> 
>> +-----------------+--------------+--------------+----------------+----------------+--------------+------------------+-------------+----------------+------------------+--------------------+
>> | Benchmark       | Mean (Base)  | SD (Base)    | Fastest (Base) | Mean (JAL OPT) | SD (JAL OPT) | Fastest (JAL OPT)| Diff Mean   | Diff Fastest   | Mean Diff Ratio  | Fastest Diff Ratio |
>> +-----------------+--------------+--------------+----------------+----------------+--------------+------------------+-------------+----------------+------------------+--------------------+
>> | future-genetic  | 8317.8449    | 925.0775     | 7824.59        | 8421.137       | 1870.3916    | 7955.19          | 103.2922    | 130.6          | 1.012418145      | 1.01669097         |
>> | akka-uct        | 54775.8037   | 5220.7361    | 49614.46       | 54149.9939     | 4730.3662    | 48736.7          | -625.8097   | -877.76        | 0.9885750686     | 0.9823083835       |
>> | movie-lens      | 44859.3268   | 107.8713     | 38160.64       | 43043.6965     | 7932.6525    | 36807.2          | -1815.6295  | -1353.44       | 0.9595261529     | 0.9645330896       |
>> | scala-doku      | 10792.4933   | 3004.9348    | 970.34         | 10739.0164     | 2692.6155    | 9226.94          | -53.4766    | 256.59         | 0.9950450188     | 1.028605382        |
>> | chi-square      | 4740.1812    | 3552.9489    | 2579.09        | 4749.0893      | 3484.3178    | 2498.04          | 8.9081      | -81.05         | 1.001879274      | 0.968574187        |
>> | fj-kmeans       | 18597.656    | 2481.4036    | 17994.43       | 18588.154      | 4458.6089    | 18019.15         | -9.5018     | 24.72          | 0.9994890862     | 1.001373758        |
>> | db-shootout     | 26529.8048   | 3163.9087    | 21270.43       | 25101.5681     | 2483.0698    | 21419.11         | -1428.2367  | 148.67         | 0.9461648244     | 1.006989986        |
>> | finagle-http    | 20646.1713   | 1635.9154    | 14898.97       | 20250.4966     | 1046.1738    | 14735.66         | -395.6747   | -163.31        | 0.9808354443     | 0.9890388396       |
>> | reactors        | 52051.8872   | 2023.7865    | 49188.65       | 51625.9...
>
>> Hamlin had some offline Q so I gather this data for him:
> 
> Thanks Robbin for collecting the data!
> 
>> So on average using auipc+ld+jalr + JAL opt is 1.73% faster than the old trampolines.
> 
> This looks great!

> @Hamlin-Li @RealFYang
> 
> I broke release builds: src/hotspot/cpu/riscv/nativeInst_riscv.cpp is missing this include runtime/atomic.hpp https://bugs.openjdk.org/browse/JDK-8367498
> 
> If you can fix that for me (I'm away for an hour), I would be very thankful!

Sure, let me do it.
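
The derived columns in Robbin's table appear to follow Diff Mean = Mean(JAL OPT) minus Mean(Base), and Mean Diff Ratio = Mean(JAL OPT) / Mean(Base), so a ratio below 1.0 means the JAL OPT build was faster. A minimal sketch that recomputes both columns for two rows of the table (the class and method names are illustrative only):

```java
public class BenchDiffSketch {
    // Diff Mean: JAL OPT mean minus Base mean, in ms; negative means JAL OPT is faster.
    static double diff(double base, double opt) {
        return opt - base;
    }

    // Mean Diff Ratio: JAL OPT mean divided by Base mean; below 1.0 means JAL OPT is faster.
    static double ratio(double base, double opt) {
        return opt / base;
    }

    public static void main(String[] args) {
        String[] names = {"future-genetic", "akka-uct"};
        // {Mean (Base), Mean (JAL OPT)} copied from the table above.
        double[][] means = {{8317.8449, 8421.137}, {54775.8037, 54149.9939}};
        for (int k = 0; k < names.length; k++) {
            double base = means[k][0];
            double opt = means[k][1];
            System.out.printf("%s: diff=%.4f ratio=%.6f%n", names[k], diff(base, opt), ratio(base, opt));
        }
    }
}
```

Small last-digit differences against the table (e.g. 103.2921 vs 103.2922) are expected, since the table was presumably computed from unrounded means.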

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26944#issuecomment-3284352576

From roland at openjdk.org  Fri Sep 12 09:10:20 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 12 Sep 2025 09:10:20 GMT
Subject: RFR: 8366888: C2: incorrect assertion predicate with short running
 long counted loop
Message-ID: <wd6ljAcyNhWdRzCKJkIXgVzSFFfVIzjkNB7N7-qVfvs=.c1429794-3513-455c-ab96-d20e2ef82909@github.com>

In:


        for (int i = 100; i < 1100; i++) {
            v += floatArray[i - 100];
            Objects.checkIndex(i, longRange);
        }


The int counted loop has both an int range check and a long range
check. The int range check is optimized first. Assertion predicates
are inserted above the loop. One predicate checks that:


init - 100 <u floatArray.length


The loop is then transformed to enable the optimization of the long
range check. The loop is short running, so there's no need to create a
loop nest. The counted loop is mostly left as is but, the loop's
bounds are changed from:


        for (int i = 100; i < 1100; i++) {


to:


        for (int i = 0; i < 1000; i++) {


The reason for that is that the long range check transformation
expects the loop to start at 0.

Pre/main/post loops are created. Template assertion predicates are
added above the main loop. The loop is unrolled. Initialized assertion
predicates are created. The one created from the condition:


init - 100 <u floatArray.length


checks the value of `i` out of the pre loop, which is 1. That check fails.

The root cause of the failure is that when the bounds of the counted
loop are changed, template assertion predicates need to be updated
with an adjusted init input.

When the bounds of the loop are known, the assertion predicates can be
updated in place. Otherwise, when the loop is speculated to be short
running, the assertion predicates are updated when they are cloned.
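
To make the arithmetic concrete, here is a minimal sketch that evaluates the predicate `init - 100 <u floatArray.length` in plain Java, once with the new induction variable fed in unchanged (the stale template) and once with the init translated back by the 100-element shift. It assumes `floatArray.length == 1000` (just large enough for `i - 100` over `[100, 1100)`) and models `<u` with `Integer.compareUnsigned`; it only illustrates why the init input must be adjusted and does not model C2's predicate nodes.

```java
public class AssertionPredicateSketch {
    // Unsigned comparison a <u b, the form used by range-check predicates.
    static boolean ltu(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0;
    }

    // Predicate from the original loop, expressed in terms of the original induction variable i.
    static boolean predicate(int init, int length) {
        return ltu(init - 100, length);
    }

    public static void main(String[] args) {
        int length = 1000;   // assumed floatArray.length
        int shift = 100;     // bounds changed from [100, 1100) to [0, 1000)
        int mainInit = 1;    // value of the new induction variable out of the pre loop

        // Stale template: the new induction variable is used as if it were the old one.
        boolean stale = predicate(mainInit, length);            // 1 - 100 = -99, huge when unsigned
        // Adjusted template: translate the init back to the original variable first.
        boolean adjusted = predicate(mainInit + shift, length); // 101 - 100 = 1

        System.out.println("stale=" + stale + " adjusted=" + adjusted); // prints "stale=false adjusted=true"
    }
}
```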

-------------

Commit messages:
 - whitespaces
 - fix

Changes: https://git.openjdk.org/jdk/pull/27250/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27250&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8366888
  Stats: 255 lines in 8 files changed: 243 ins; 3 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/27250.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27250/head:pull/27250

PR: https://git.openjdk.org/jdk/pull/27250

From roland at openjdk.org  Fri Sep 12 09:12:26 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 12 Sep 2025 09:12:26 GMT
Subject: RFR: 8366888: C2: incorrect assertion predicate with short running
 long counted loop
In-Reply-To: <wd6ljAcyNhWdRzCKJkIXgVzSFFfVIzjkNB7N7-qVfvs=.c1429794-3513-455c-ab96-d20e2ef82909@github.com>
References: <wd6ljAcyNhWdRzCKJkIXgVzSFFfVIzjkNB7N7-qVfvs=.c1429794-3513-455c-ab96-d20e2ef82909@github.com>
Message-ID: <NOWKexdrvSclJlw_285EA_qCDQujBI5nk-08Y6EII0Q=.881bc2e2-4c78-4e9d-a967-e7687393083d@github.com>

On Fri, 12 Sep 2025 08:57:57 GMT, Roland Westrelin <roland at openjdk.org> wrote:

> In:
> 
> 
>         for (int i = 100; i < 1100; i++) {
>             v += floatArray[i - 100];
>             Objects.checkIndex(i, longRange);
>         }
> 
> 
> The int counted loop has both an int range check and a long range
> check. The int range check is optimized first. Assertion predicates
> are inserted above the loop. One predicate checks that:
> 
> 
> init - 100 <u floatArray.length
> 
> 
> The loop is then transformed to enable the optimization of the long
> range check. The loop is short running, so there's no need to create a
> loop nest. The counted loop is mostly left as is but, the loop's
> bounds are changed from:
> 
> 
>         for (int i = 100; i < 1100; i++) {
> 
> 
> to:
> 
> 
>         for (int i = 0; i < 1000; i++) {
> 
> 
> The reason for that is that the long range check transformation
> expects the loop to start at 0.
> 
> Pre/main/post loops are created. Template assertion predicates are
> added above the main loop. The loop is unrolled. Initialized assertion
> predicates are created. The one created from the condition:
> 
> 
> init - 100 <u floatArray.length
> 
> 
> checks the value of `i` out of the pre loop, which is 1. That check fails.
> 
> The root cause of the failure is that when the bounds of the counted
> loop are changed, template assertion predicates need to be updated
> with an adjusted init input.
> 
> When the bounds of the loop are known, the assertion predicates can be
> updated in place. Otherwise, when the loop is speculated to be short
> running, the assertion predicates are updated when they are cloned.

Thanks @chhagedorn for the test case

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27250#issuecomment-3284433245

From mli at openjdk.org  Fri Sep 12 09:17:53 2025
From: mli at openjdk.org (Hamlin Li)
Date: Fri, 12 Sep 2025 09:17:53 GMT
Subject: RFR: 8367501: RISC-V: build broken after JDK-8365926
Message-ID: <rfQZwToWKXJLzbg9VEI2_X2paRC4OvMIcAqFpA8W-ug=.46bbaa14-8239-4635-8e2e-86fbdc16d06f@github.com>

Hi,
Can you help to review this patch?

check https://github.com/openjdk/jdk/pull/26944, https://github.com/openjdk/jdk/pull/27135

Thanks

-------------

Commit messages:
 - initial commit

Changes: https://git.openjdk.org/jdk/pull/27251/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27251&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367501
  Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/27251.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27251/head:pull/27251

PR: https://git.openjdk.org/jdk/pull/27251

From roland at openjdk.org  Fri Sep 12 09:24:28 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 12 Sep 2025 09:24:28 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop
In-Reply-To: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
References: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
Message-ID: <0sO2cPw0cvqc012qfyLQLLTukDO2q85ry3tGavZ3ZPM=.6d8c9b58-2eb5-46c0-ac53-d5041588d8ea@github.com>

On Thu, 11 Sep 2025 13:05:21 GMT, Benoît Maillard <bmaillard at openjdk.org> wrote:

> This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`.
> 
> ### Context
> 
> The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn, I was able to reduce the reproducer to something very simple. After being compiled by C2, the execution of the following method caused the last statement (`x = 0`) to be ignored:
> 
> 
>     static public void test() {
>         x = 0;
>         for (int i = 0; i < 20000; i++) {
>             x += i;
>         }
>         x = 0;
>     }
> 
> 
> After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. `Store` nodes are moved out of the inner loop when encountering this pattern, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations taking place later. This was initially addressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here, as macro expansion happens too late.
> 
> This PR aims at addressing the specific case of the wrong wiring of `Store` nodes in _post_ loops, but on the longer term further investigations into the missing `Phi` node issue are necessary, as they are likely to cause other issues (cf. related JBS issues).
> 
> ### Detailed Analysis
> 
> In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`.  When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later.
> 
> This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect.
> 
> This is what the IR looks like after the creation of the post lo...
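
A regression check for this bug could be shaped roughly as follows: a minimal sketch that calls the `test()` method from the description many times and verifies the final `x = 0` store survived. The class name, the `run` helper and the iteration count are assumptions; the count is only meant to make C2 compilation of `test()` likely, not guaranteed, and the sketch does not claim to reproduce the original crash on an unfixed JVM.

```java
public class MissingStoreSketch {
    static int x;

    // Method from the PR description: the final store x = 0 must not be lost.
    static public void test() {
        x = 0;
        for (int i = 0; i < 20000; i++) {
            x += i;
        }
        x = 0;
    }

    // Calls test() repeatedly; returns false if x is ever non-zero afterwards.
    static boolean run(int iterations) {
        for (int iter = 0; iter < iterations; iter++) {
            test();
            if (x != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical warm-up count, chosen so the JIT is likely to compile test().
        if (!run(20_000)) {
            throw new RuntimeException("final store x = 0 was lost: x = " + x);
        }
        System.out.println("ok");
    }
}
```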

Not a review but a comment on the missing Phis. Your description makes it sound as if there would be no issue had the `OuterStripMinedLoop` been created with `Phis` from the start. That's not true AFAICT. The current logic for pre/main/post loop creation would simply not work, because it doesn't expect the `Phis`; it would need to be extended so that things are rewired correctly with the outer loop `Phis`. The inner loop would still have no `Phi` for the sunk store, so the existing logic, once fixed, would not find it either, and you would need some new logic to find it, maybe using the outer loop `Phis`. The current shape of the outer loop (without the Phis) is very simple, and there's only one location where the Store can be: on the exit projection of the inner loop, right above the safepoint, which is right below the exit of the inner loop and can't be anywhere else. So you added logic to find the Store relying on the current shape of the outer loop. If the outer loop had `Phis`, some alternate version of that logic could be used. They seem like 2 ways of doing the same thing to me, and nothing tells us one is better than the other. In short, I don't find this bug a good example of something that would work better if we had `Phi`s on the outer loop. I wouldn't say the root cause is that we don't have `Phi`s on the outer loop either.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27225#issuecomment-3284472055

From rehn at openjdk.org  Fri Sep 12 10:44:28 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Fri, 12 Sep 2025 10:44:28 GMT
Subject: RFR: 8367501: RISC-V: build broken after JDK-8365926
In-Reply-To: <rfQZwToWKXJLzbg9VEI2_X2paRC4OvMIcAqFpA8W-ug=.46bbaa14-8239-4635-8e2e-86fbdc16d06f@github.com>
References: <rfQZwToWKXJLzbg9VEI2_X2paRC4OvMIcAqFpA8W-ug=.46bbaa14-8239-4635-8e2e-86fbdc16d06f@github.com>
Message-ID: <1DzpLBgwYHLES28Ke04iAutvg8CFvfHrWI4hWV5YQng=.13dabf77-37ce-467c-a2cd-e91fe3517ebc@github.com>

On Fri, 12 Sep 2025 09:09:43 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Hi,
> Can you help to review this patch?
> 
> check https://github.com/openjdk/jdk/pull/26944, https://github.com/openjdk/jdk/pull/27135
> 
> Thanks

Haha, I did the same thing as @jdksjolen :) I should have merged first.

Tested locally, thank you @Hamlin-Li!

-------------

Marked as reviewed by rehn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27251#pullrequestreview-3216035215

From mli at openjdk.org  Fri Sep 12 10:44:28 2025
From: mli at openjdk.org (Hamlin Li)
Date: Fri, 12 Sep 2025 10:44:28 GMT
Subject: RFR: 8367501: RISC-V: build broken after JDK-8365926
In-Reply-To: <1DzpLBgwYHLES28Ke04iAutvg8CFvfHrWI4hWV5YQng=.13dabf77-37ce-467c-a2cd-e91fe3517ebc@github.com>
References: <rfQZwToWKXJLzbg9VEI2_X2paRC4OvMIcAqFpA8W-ug=.46bbaa14-8239-4635-8e2e-86fbdc16d06f@github.com>
 <1DzpLBgwYHLES28Ke04iAutvg8CFvfHrWI4hWV5YQng=.13dabf77-37ce-467c-a2cd-e91fe3517ebc@github.com>
Message-ID: <EB9Znye0UCT7qiczhtC7Z4T_eIUQ0BmKyv4qgNn6uho=.8c774fdc-652d-45c8-b98f-fbcc077a952c@github.com>

On Fri, 12 Sep 2025 10:37:23 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

> Haha, I did the same thing as @jdksjolen :) I should have merged first.
> 
> Tested locally, thank you @Hamlin-Li!

Triggered runtime tests; no failures found yet.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27251#issuecomment-3284761602

From mli at openjdk.org  Fri Sep 12 10:44:29 2025
From: mli at openjdk.org (Hamlin Li)
Date: Fri, 12 Sep 2025 10:44:29 GMT
Subject: Integrated: 8367501: RISC-V: build broken after JDK-8365926
In-Reply-To: <rfQZwToWKXJLzbg9VEI2_X2paRC4OvMIcAqFpA8W-ug=.46bbaa14-8239-4635-8e2e-86fbdc16d06f@github.com>
References: <rfQZwToWKXJLzbg9VEI2_X2paRC4OvMIcAqFpA8W-ug=.46bbaa14-8239-4635-8e2e-86fbdc16d06f@github.com>
Message-ID: <mEJzKCBhU-lAo6BpYG3tVKTvdLX1lgWeX7B6BnrfXCI=.8f3718e7-57cc-4272-8a44-9497118413cd@github.com>

On Fri, 12 Sep 2025 09:09:43 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Hi,
> Can you help to review this patch?
> 
> check https://github.com/openjdk/jdk/pull/26944, https://github.com/openjdk/jdk/pull/27135
> 
> Thanks

This pull request has now been integrated.

Changeset: d13769d6
Author:    Hamlin Li <mli at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/d13769d6c12688edffb23965c23cac614a9e6926
Stats:     3 lines in 1 file changed: 1 ins; 0 del; 2 mod

8367501: RISC-V: build broken after JDK-8365926

Reviewed-by: rehn

-------------

PR: https://git.openjdk.org/jdk/pull/27251

From dlunden at openjdk.org  Fri Sep 12 11:31:00 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Fri, 12 Sep 2025 11:31:00 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v3]
In-Reply-To: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
Message-ID: <yQdr-FtgeelfAadYdv3LrUgeJgw0f5LX59fUOsGNyXk=.a1bd3bfe-c77a-46cb-a31c-07ca1c57d646@github.com>

> Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.
> 
> ### Changeset
> 
> - Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
> - Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
> - Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
> - Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
> 
> ### Testing
> 
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
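
The int-versus-word distinction behind the renames can be illustrated with a small model: the same mask storage is counted once in 32-bit ints and once in machine words, and a register bit is located by a word index plus a bit position masked with `BitsPerWord - 1`. This is a hypothetical Java sketch, not HotSpot's `RegMask`; it assumes a 64-bit platform and an arbitrary size of 8 ints, and its constant names only echo the new naming scheme.

```java
// Hypothetical model, not HotSpot's RegMask: it only shows int-sized vs word-sized units.
public class RegMaskUnitsSketch {
    static final int BITS_PER_INT = 32;
    static final int BITS_PER_WORD = 64;                 // assumes a 64-bit platform
    static final int RM_SIZE_IN_INTS = 8;                // arbitrary size for the example
    static final int RM_SIZE_IN_WORDS = RM_SIZE_IN_INTS * BITS_PER_INT / BITS_PER_WORD;
    static final int RM_WORD_MAX_INDEX = RM_SIZE_IN_WORDS - 1;
    static final int WORD_BIT_MASK = BITS_PER_WORD - 1;  // bit position within one word

    static {
        // On 64-bit, an even number of ints packs exactly into whole words.
        if (RM_SIZE_IN_INTS % 2 != 0) {
            throw new ExceptionInInitializerError("RM_SIZE_IN_INTS must be even on 64-bit");
        }
    }

    private final long[] words = new long[RM_SIZE_IN_WORDS];

    // Index of the machine word that holds the bit for register 'reg'.
    static int wordIndex(int reg) {
        return reg / BITS_PER_WORD;
    }

    void insert(int reg) {
        words[wordIndex(reg)] |= 1L << (reg & WORD_BIT_MASK);
    }

    boolean member(int reg) {
        return (words[wordIndex(reg)] & (1L << (reg & WORD_BIT_MASK))) != 0;
    }

    public static void main(String[] args) {
        RegMaskUnitsSketch rm = new RegMaskUnitsSketch();
        rm.insert(70);
        System.out.println(RM_SIZE_IN_INTS + " ints = " + RM_SIZE_IN_WORDS
                + " words, max word index " + RM_WORD_MAX_INDEX);
        System.out.println("reg 70 is in word " + wordIndex(70) + ": " + rm.member(70));
    }
}
```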

Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:

  Fix remaining references to all-stack

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27215/files
  - new: https://git.openjdk.org/jdk/pull/27215/files/61ff4f8c..82b85367

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27215&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27215&range=01-02

  Stats: 7 lines in 2 files changed: 0 ins; 0 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/27215.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27215/head:pull/27215

PR: https://git.openjdk.org/jdk/pull/27215

From dlunden at openjdk.org  Fri Sep 12 11:36:27 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Fri, 12 Sep 2025 11:36:27 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v4]
In-Reply-To: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
Message-ID: <XqX_nPhu_ej5IgW2_z3DFreGRjyfUELqEEiIic2Bg9c=.7ed11c6f-0bbd-48b2-b80f-4d71010886f2@github.com>

> Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.
> 
> ### Changeset
> 
> - Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
> - Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
> - Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
> - Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
> 
> ### Testing
> 
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.

Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:

  Remove _LogWordBits

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27215/files
  - new: https://git.openjdk.org/jdk/pull/27215/files/82b85367..47773ee9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27215&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27215&range=02-03

  Stats: 9 lines in 3 files changed: 0 ins; 1 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/27215.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27215/head:pull/27215

PR: https://git.openjdk.org/jdk/pull/27215

From epeter at openjdk.org  Fri Sep 12 11:39:11 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 11:39:11 GMT
Subject: RFR: 8367483: C2 crash in PhaseValues::type: assert(t != nullptr)
 failed: must set before get - missing notification for CastX2P(SubL(x, y))
In-Reply-To: <7LnPwg-I-_AG6uTrehfAWH_2eg94EzvX3aWdhtjpiBs=.663f707b-f6a4-4d83-b564-83fcc38d2744@github.com>
References: <_kMBdz-PsErEbxlHt7PDZTJmRqNEguaZS4GAgta9KtY=.2f177665-f392-4539-b490-b635a6afbe15@github.com>
 <7LnPwg-I-_AG6uTrehfAWH_2eg94EzvX3aWdhtjpiBs=.663f707b-f6a4-4d83-b564-83fcc38d2744@github.com>
Message-ID: <GfHK6dS7esfFzjtHPRnHuDF5t6tLhMciACpHj3VXU-8=.c3c326b0-ef31-4d5a-a325-82b5c7b4b85f@github.com>

On Fri, 12 Sep 2025 08:45:07 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> `CastX2PNode::Ideal` optimizes cases:
>> 
>> CastX2P(AddX(x, y)) -> AddP(CastX2P(x), y)
>> CastX2P(SubL(x, y)) -> AddP(CastX2P(x), SubL(0, y))
>> 
>> 
>> But the notification code `PhaseIterGVN::add_users_of_use_to_worklist` only adds `CastX2P` to the worklist for the `AddX` and not the `SubX` cases.
>> 
>> ---------------------------------------
>> 
>> A little brag: this is the second (unrelated, i.e. non aliasing) bug that `TestAliasingFuzzer.java` found. Fuzzing access to native MemorySegment seems to trigger new/rare patterns.
>
> Looks good and trivial, thanks for the update

@chhagedorn @benoitmaillard Thanks for the reviews!

I agree, the patch is quite trivial.
I'm risking a Friday afternoon integration, to make sure the CI does not fail on our stress job.
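
The value-preserving part of the `SubL` rewrite is plain two's-complement arithmetic: `x - y` and `x + (0 - y)` are equal for every 64-bit `x` and `y`, overflow included, which is why `CastX2P(SubL(x, y))` can become `AddP(CastX2P(x), SubL(0, y))`. A minimal sketch checking that identity on a few edge values (it does not model IGVN or the missing worklist notification, which was the actual bug):

```java
public class CastX2PSubSketch {
    // Address computed the "CastX2P(SubL(x, y))" way.
    static long subForm(long x, long y) {
        return x - y;
    }

    // Address computed the "AddP(CastX2P(x), SubL(0, y))" way.
    static long addForm(long x, long y) {
        return x + (0L - y);
    }

    // Checks the identity on edge values, including ones that overflow.
    static boolean identityHolds() {
        long[] samples = {0L, 1L, -1L, 42L, Long.MIN_VALUE, Long.MAX_VALUE};
        for (long x : samples) {
            for (long y : samples) {
                if (subForm(x, y) != addForm(x, y)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("x - y == x + (0 - y) on all samples: " + identityHolds());
    }
}
```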

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27249#issuecomment-3284931070

From epeter at openjdk.org  Fri Sep 12 12:09:32 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 12:09:32 GMT
Subject: RFR: 8366940: Test compiler/loopopts/superword/TestAliasingFuzzer.java
 timed out
Message-ID: <-iszfG2luNYZtsMxsMnsDWoQIscvcY37XSpi8fDDcEE=.2d26cf70-ba2e-4967-a083-50867f291784@github.com>

`TestAliasingFuzzer.java` generates 30 subtests for every run. They are randomized. Some vectorize and execute faster, some fail to vectorize and execute slower.

Hence, some natural variance in the duration is expected.
On most machines, the time spent in "Running Tests" seems to be about 30-50sec (total test time about 35-70sec). But on some machines (macosx-x64-debug), execution is a bit slower: 60-100sec in "Running Tests", with some outliers at 110+sec. These occasionally trip the 120sec timeout, and when they do, they somehow cause the harness to take an excessive 9+min to shut everything down.

Solutions:
- Option 1: generate fewer tests in `TestAliasingFuzzer.java`. That would be sad: the test has now found 2 real bugs within 2 weeks.
- Option 2: increase the test timeout. That is what I'll do, because the "outliers" that caused the timeouts were not far from all other runs on the same platform, so they are acceptable.

-------------

Commit messages:
 - JDK-8366940

Changes: https://git.openjdk.org/jdk/pull/27257/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27257&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8366940
  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/27257.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27257/head:pull/27257

PR: https://git.openjdk.org/jdk/pull/27257

From epeter at openjdk.org  Fri Sep 12 12:09:32 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 12:09:32 GMT
Subject: Integrated: 8367483: C2 crash in  PhaseValues::type: assert(t !=
 nullptr) failed: must set before get - missing notification for
 CastX2P(SubL(x, y))
In-Reply-To: <_kMBdz-PsErEbxlHt7PDZTJmRqNEguaZS4GAgta9KtY=.2f177665-f392-4539-b490-b635a6afbe15@github.com>
References: <_kMBdz-PsErEbxlHt7PDZTJmRqNEguaZS4GAgta9KtY=.2f177665-f392-4539-b490-b635a6afbe15@github.com>
Message-ID: <rEAo9HXCkk_0pRsalVNHTdMLqQoz4-3GkQgEr-ipPhI=.d862a3bd-06b3-41e2-b18f-b4333561998b@github.com>

On Fri, 12 Sep 2025 08:18:40 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> `CastX2PNode::Ideal` optimizes cases:
> 
> CastX2P(AddX(x, y)) -> AddP(CastX2P(x), y)
> CastX2P(SubL(x, y)) -> AddP(CastX2P(x), SubL(0, y))
> 
> 
> But the notification code `PhaseIterGVN::add_users_of_use_to_worklist` only adds `CastX2P` to the worklist for the `AddX` and not the `SubX` cases.
> 
> ---------------------------------------
> 
> A little brag: this is the second (unrelated, i.e. non aliasing) bug that `TestAliasingFuzzer.java` found. Fuzzing access to native MemorySegment seems to trigger new/rare patterns.

This pull request has now been integrated.

Changeset: 02d7281b
Author:    Emanuel Peter <epeter at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/02d7281b93296e7700e215804cb9e2f8341cab06
Stats:     63 lines in 2 files changed: 62 ins; 0 del; 1 mod

8367483: C2 crash in  PhaseValues::type: assert(t != nullptr) failed: must set before get - missing notification for CastX2P(SubL(x, y))

Reviewed-by: chagedorn, bmaillard

-------------

PR: https://git.openjdk.org/jdk/pull/27249

From epeter at openjdk.org  Fri Sep 12 12:16:01 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 12:16:01 GMT
Subject: RFR: 8356813: Improve Mod(I|L)Node::Value
In-Reply-To: <hScWI2VL-Cc2H-kQUfhd32fPCAkbXLHCUNh-2XZutsE=.2518bc77-83c4-4b85-af22-3230fe310130@github.com>
References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
 <EOLqK3ulrKNtgzmlWbNpwvCdg8sBaABmXNGdlucIurI=.7ce09643-efd1-4e3f-91f1-6e8040f4a51f@github.com>
 <hScWI2VL-Cc2H-kQUfhd32fPCAkbXLHCUNh-2XZutsE=.2518bc77-83c4-4b85-af22-3230fe310130@github.com>
Message-ID: <aPUCL3Aqezo5Hc4vID-htuZ_M22G0-vQR_-u1ZLT0ds=.2feff632-6e64-47ac-a943-7f54847c9969@github.com>

On Thu, 11 Sep 2025 17:39:45 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>>> Using `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range).
>> 
>> Can we return `Type::TOP` instead?
>> 
>> Besides, #17508 should be merged right after the JDK-25 fork, do you want to wait for it first?
>
> @merykitty thanks, I hopefully addressed your comments :)
> 
> @eme64 do you want to re-run the tests once again?

@SirYwell Launching tests.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25254#issuecomment-3285049047

From epeter at openjdk.org  Fri Sep 12 12:19:33 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 12:19:33 GMT
Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2 [v5]
In-Reply-To: <JF-5csR-8C7x2ooGamkx5B1s1eY25ehxH0mc-ngL53k=.d6626a6c-6947-4b4d-a026-1e2956c2b216@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
 <CI6jzRQmAQXIQaToisYrZGI8tLzI14TOKLtPY2zrPWw=.55db890f-8f77-4ed7-8a9c-1d99ca9487ba@github.com>
 <JF-5csR-8C7x2ooGamkx5B1s1eY25ehxH0mc-ngL53k=.d6626a6c-6947-4b4d-a026-1e2956c2b216@github.com>
Message-ID: <fBtz4TPlnLtzJ5bIjeGZXEr4jJCYHHHwp0t7bJOsyes=.b491f29c-96cc-4d21-84de-50fb7cc7450f@github.com>

On Thu, 11 Sep 2025 16:25:32 GMT, Srinivas Vamsi Parasa <sparasa at openjdk.org> wrote:

>> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   undo new match rules for RegMemReg for commutative operations
>
> Hi Emanuel (@eme64),
> 
> Could you please run the tests for this PR?
> 
> Thanks,
> Vamsi

@vamsi-parasa Quickly scanned the patch, looks reasonable. Launching tests.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26997#issuecomment-3285064018

From epeter at openjdk.org  Fri Sep 12 12:27:11 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 12:27:11 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v4]
In-Reply-To: <JnEOgQliunSIdlAAUkgdvSDzJqjCwKOSn4pwVW3ZD2Q=.7a7a0f01-bc7b-4026-a42d-d62be7d24e7b@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <2unG-RdDR2e1mI-veaR3AdDGGs1q4XFdITrnQtBGOw8=.47f7d565-b722-434b-96ef-b51ed733b241@github.com>
 <V01kxeeWm3UAJritt7R3PAFxS8SsVtxcttokr2Y-x84=.512a732e-de28-454a-bed6-ba9b2fb5979b@github.com>
 <1xDonJ67G3hUAWTdngutIb7LBboWxHRviCHXKDCSoN4=.2617f8e9-206b-424d-a1ab-501b182717bb@github.com>
 <JnEOgQliunSIdlAAUkgdvSDzJqjCwKOSn4pwVW3ZD2Q=.7a7a0f01-bc7b-4026-a42d-d62be7d24e7b@github.com>
Message-ID: <1LfsfA8tLxKr7hmkLM8-ZR49IEblYMEjTkcUPC0P5cs=.e73b68b8-04ec-4714-a56e-f3af91dce5bc@github.com>

On Thu, 11 Sep 2025 23:44:45 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Thanks, I agree that it seems more consistent to use `_rm_int` and `_rm_word` instead. The missing leading underscore for `RM_SIZE_IN_INTS` highlights that it is a macro, unlike `_RM_SIZE_IN_WORDS`. Maybe this is just for historical reasons and not up to date with today's conventions? 
>> 
>> Do we classify constant static fields such as `_RM_SIZE_IN_WORDS` as constants or fields? I.e., do we use upper or lower case? I guess it would be `_rm_size_in_words` if considered a field and `RM_SIZE_IN_WORDS` (without the leading underscore) if considered a constant.
>
> I vote for `RM_SIZE_IN_WORDS` because it is a constant, the same as if it was a value from an enum.

Same as @dean-long: constants and enum values are generally `RM_SIZE_IN_WORDS`. Sometimes we also use `CamelCase`, as with `LogBitsPerWord`. Underscore `_` is really only for fields.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2344084682

From dlunden at openjdk.org  Fri Sep 12 12:33:42 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Fri, 12 Sep 2025 12:33:42 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v5]
In-Reply-To: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
Message-ID: <ZxjDAkYKza3KEFcLc6YgrI-9btMtVVz_-TWOZVXBVMU=.b417ed96-3b18-487c-92cb-2dd205834dd2@github.com>

> Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.
> 
> ### Changeset
> 
> - Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
> - Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
> - Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
> - Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
> 
> ### Testing
> 
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.

Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:

  Rename constants

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27215/files
  - new: https://git.openjdk.org/jdk/pull/27215/files/47773ee9..31c78597

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27215&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27215&range=03-04

  Stats: 21 lines in 2 files changed: 0 ins; 0 del; 21 mod
  Patch: https://git.openjdk.org/jdk/pull/27215.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27215/head:pull/27215

PR: https://git.openjdk.org/jdk/pull/27215

From epeter at openjdk.org  Fri Sep 12 12:33:44 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 12:33:44 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v4]
In-Reply-To: <XqX_nPhu_ej5IgW2_z3DFreGRjyfUELqEEiIic2Bg9c=.7ed11c6f-0bbd-48b2-b80f-4d71010886f2@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <XqX_nPhu_ej5IgW2_z3DFreGRjyfUELqEEiIic2Bg9c=.7ed11c6f-0bbd-48b2-b80f-4d71010886f2@github.com>
Message-ID: <19P8X88PcVoh8x62iBz5baOyefoWgGp-aFd_Bli-vm0=.fbd446d3-0dad-41e8-bb13-ab113a3b9767@github.com>

On Fri, 12 Sep 2025 11:36:27 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.
>> 
>> ### Changeset
>> 
>> - Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
>> - Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
>> - Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
>> - Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
>> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove _LogWordBits

Changes requested by epeter (Reviewer).

src/hotspot/share/opto/regmask.hpp line 65:

> 63:   LP64_ONLY(STATIC_ASSERT(is_aligned(RM_SIZE_IN_INTS, 2)));
> 64: 
> 65:   static const unsigned int _WordBitMask = BitsPerWord - 1U;

You could also remove the `_` here. I suppose we keep `CamelCase` here to keep it parallel to `BitsPerWord`. What do you think @dean-long ?

src/hotspot/share/opto/regmask.hpp line 68:

> 66:   static const unsigned int _RM_SIZE_IN_WORDS =
> 67:       LP64_ONLY(RM_SIZE_IN_INTS >> 1) NOT_LP64(RM_SIZE_IN_INTS);
> 68:   static const unsigned int _RM_WORD_MAX_INDEX = _RM_SIZE_IN_WORDS - 1U;

I would get rid of the `_` here. Constants should preferably be `UPPER_CASE` (most of the time), and occasionally `CamelCase` where we are already doing it (only do it if needed for consistency). Underscore is only used for fields, as far as I know.
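As an illustration of the suggested style, here is a sketch of the same group of constants in `UPPER_CASE` without a leading underscore. It assumes `sizeof(void*) == 8` as a stand-in for HotSpot's `LP64_ONLY`/`NOT_LP64` macros and uses a hypothetical `RM_SIZE_IN_INTS` of 8; the real value is platform-dependent.

```cpp
#include <climits>
#include <cstdint>

// Stand-in for HotSpot's LP64 build check (assumption).
constexpr bool IS_64_BIT = sizeof(void*) == 8;
constexpr unsigned BITS_PER_WORD = sizeof(uintptr_t) * CHAR_BIT;

// Hypothetical value for illustration only.
constexpr unsigned RM_SIZE_IN_INTS = 8;
static_assert(!IS_64_BIT || RM_SIZE_IN_INTS % 2 == 0,
              "on 64-bit, pairs of 32-bit ints must pack into whole words");

// Number of machine words backing the mask, and the last valid word index.
constexpr unsigned RM_SIZE_IN_WORDS = IS_64_BIT ? RM_SIZE_IN_INTS >> 1 : RM_SIZE_IN_INTS;
constexpr unsigned RM_WORD_MAX_INDEX = RM_SIZE_IN_WORDS - 1U;

// Mask that extracts the bit position within a word from a bit index.
constexpr unsigned WORD_BIT_MASK = BITS_PER_WORD - 1U;
```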

-------------

PR Review: https://git.openjdk.org/jdk/pull/27215#pullrequestreview-3216439510
PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2344087574
PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2344091072

From epeter at openjdk.org  Fri Sep 12 12:33:46 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 12:33:46 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v2]
In-Reply-To: <XS7Q_8Gd4T_12YFFmGbulnCe9SVs5CXy9OhpxUqgkRY=.32b71e46-46c5-43b3-a9d2-c0576abbd20a@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <Dbraqc-jw-wVVDlFAkGZXH_69lxpJYlMjteJRBZx1gM=.3446b723-3eef-475b-b434-af2e815bd695@github.com>
 <V9oqcN4DtpsHHax36QmUsRqz_KWQR-nXcEavdbNxpys=.0ca41f4e-999b-487c-8780-7ddfdcfa4d38@github.com>
 <XS7Q_8Gd4T_12YFFmGbulnCe9SVs5CXy9OhpxUqgkRY=.32b71e46-46c5-43b3-a9d2-c0576abbd20a@github.com>
Message-ID: <dS1Cf8BSsIYA4hSxO3v97eHO5-LdcVRqgizTnzA7aRg=.b6b56ab1-e287-4fc5-b853-a61f4e096796@github.com>

On Fri, 12 Sep 2025 07:59:44 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> src/hotspot/share/opto/regmask.hpp line 166:
>> 
>>> 164:   // indefinitely with ONE bits.  Returns TRUE if mask is infinite or
>>> 165:   // unbounded in size.  Returns FALSE if mask is finite size.
>>> 166:   bool is_infinite() const {
>> 
>> "infinite" hides the fact that these unbounded bits are stack bits and not register bits, but `is_UnboundedStack` or `is_InfiniteStack` might be too verbose.  How does `is_InfStack` sound?
>
> I like the suggestion, but should we not make it `is_infinite_stack` (current convention according to the style guide)? Or do historic conventions in `regmask.hpp` take precedence?

I would prefer `is_infinite_stack`.

`is_InfiniteStack` and `is_InfStack` only make sense if `InfiniteStack` is a class / name that is used widely in CamelCase, as with `CountedLoopNode` -> `is_CountedLoop`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2344098469

From dlunden at openjdk.org  Fri Sep 12 12:33:45 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Fri, 12 Sep 2025 12:33:45 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v5]
In-Reply-To: <1LfsfA8tLxKr7hmkLM8-ZR49IEblYMEjTkcUPC0P5cs=.e73b68b8-04ec-4714-a56e-f3af91dce5bc@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <2unG-RdDR2e1mI-veaR3AdDGGs1q4XFdITrnQtBGOw8=.47f7d565-b722-434b-96ef-b51ed733b241@github.com>
 <V01kxeeWm3UAJritt7R3PAFxS8SsVtxcttokr2Y-x84=.512a732e-de28-454a-bed6-ba9b2fb5979b@github.com>
 <1xDonJ67G3hUAWTdngutIb7LBboWxHRviCHXKDCSoN4=.2617f8e9-206b-424d-a1ab-501b182717bb@github.com>
 <JnEOgQliunSIdlAAUkgdvSDzJqjCwKOSn4pwVW3ZD2Q=.7a7a0f01-bc7b-4026-a42d-d62be7d24e7b@github.com>
 <1LfsfA8tLxKr7hmkLM8-ZR49IEblYMEjTkcUPC0P5cs=.e73b68b8-04ec-4714-a56e-f3af91dce5bc@github.com>
Message-ID: <YAZHAUcBni1T13KI6vHdQjAxb3WOBHC5zaq0JgNhsNs=.d91805dd-4bb1-486d-afb0-1358a0a17d24@github.com>

On Fri, 12 Sep 2025 12:23:29 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I vote for `RM_SIZE_IN_WORDS` because it is a constant, the same as if it was a value from an enum.
>
> Same as @dean-long : constants and enum values are generally `RM_SIZE_IN_WORDS`. Sometimes we also do `CamelCase`. Like for `LogBitsPerWord`. Underscore `_` is really only for fields.

Thanks, now updated

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2344100928

From syan at openjdk.org  Fri Sep 12 12:34:10 2025
From: syan at openjdk.org (SendaoYan)
Date: Fri, 12 Sep 2025 12:34:10 GMT
Subject: RFR: 8366940: Test
 compiler/loopopts/superword/TestAliasingFuzzer.java timed out
In-Reply-To: <-iszfG2luNYZtsMxsMnsDWoQIscvcY37XSpi8fDDcEE=.2d26cf70-ba2e-4967-a083-50867f291784@github.com>
References: <-iszfG2luNYZtsMxsMnsDWoQIscvcY37XSpi8fDDcEE=.2d26cf70-ba2e-4967-a083-50867f291784@github.com>
Message-ID: <cywK2fr_VL5axWsEvS5Ucs8krhZWTMKm7b2j18j_Q7s=.6c6965b7-8c2b-4bc2-9395-00555c2a865c@github.com>

On Fri, 12 Sep 2025 12:01:25 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> `TestAliasingFuzzer.java` generates 30 subtests for every run. They are randomized. Some vectorize and execute faster, some fail to vectorize and execute slower.
> 
> Hence, some natural variance in the duration is expected.
> On most machines, it seems the variance in "Running Tests" is about 30-50sec (total test time about 35-70sec). But on some machines (macosx-x64-debug), the execution time is a bit slower: 60-100sec in "Running Tests", with some outliers at 110+sec. These occasionally trip the 120sec timeout, and when they trip it, they somehow cause the harness to take an excessive 9+min to shut everything down.
> 
> Solutions:
> - Option 1: generate fewer tests in `TestAliasingFuzzer.java`. That would be sad, since the test has already found 2 real bugs within 2 weeks.
> - Option 2: increase the test timeout. That is what I'll do, because the "outliers" that caused the timeouts were not far from all other cases on the same platform, so they are acceptable.

Marked as reviewed by syan (Committer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/27257#pullrequestreview-3216462290

From epeter at openjdk.org  Fri Sep 12 12:37:27 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 12:37:27 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v7]
In-Reply-To: <fSXhxnCbvqLQDqh6nvnQKE61sw4my40lxRcCciDmZxY=.ead513af-c671-449a-87b6-eb2d8d630d18@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <fSXhxnCbvqLQDqh6nvnQKE61sw4my40lxRcCciDmZxY=.ead513af-c671-449a-87b6-eb2d8d630d18@github.com>
Message-ID: <01wSitNTzH-39p7KkpV9C_aD-4pvob_CqtnreZGP9L8=.396f1a6a-84b4-465a-91f5-5ef2ecd074b7@github.com>

On Thu, 11 Sep 2025 12:16:47 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> This patch optimizes PopCount value transforms using KnownBits information.
>> Following are the results of the micro-benchmark included with the patch
>> 
>> 
>> 
>> System: 13th Gen Intel(R) Core(TM) i3-1315U
>> 
>> Baseline:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s
>> 
>> Withopt:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s
>> 
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Adding random bound test point

test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java line 56:

> 54:     static final long rand_bndL2 = G.uniformLongs(0xFFL, Long.MAX_VALUE).next();
> 55:     static final long rand_popcL1 = G.uniformLongs(0, 3).next();
> 56:     static final long rand_popcL2 = G.uniformLongs(20, 40).next();

What is the reason for limiting the range on all these values?
For example, we now never generate negative values, i.e. the msb is never set.
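The effect of the bounded ranges can be reproduced outside the test library. This C++ sketch (standing in for the test's Java generator `G.uniformLongs`, which it does not call) counts how often the sign bit is set when sampling a range: a lower bound of `0xFF` and upper bound of `INT64_MAX` never yields a negative value, while the full range does. `count_msb_set` is a hypothetical helper.

```cpp
#include <cstdint>
#include <random>

// Counts how many samples drawn uniformly from [lo, hi] have the most
// significant bit set (i.e. are negative in two's complement).
int count_msb_set(int64_t lo, int64_t hi, int samples, uint64_t seed) {
  std::mt19937_64 rng(seed);
  std::uniform_int_distribution<int64_t> dist(lo, hi);
  int count = 0;
  for (int i = 0; i < samples; i++) {
    if (dist(rng) < 0) {
      count++;
    }
  }
  return count;
}
```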

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2344107725

From dlunden at openjdk.org  Fri Sep 12 12:37:27 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Fri, 12 Sep 2025 12:37:27 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v4]
In-Reply-To: <19P8X88PcVoh8x62iBz5baOyefoWgGp-aFd_Bli-vm0=.fbd446d3-0dad-41e8-bb13-ab113a3b9767@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <XqX_nPhu_ej5IgW2_z3DFreGRjyfUELqEEiIic2Bg9c=.7ed11c6f-0bbd-48b2-b80f-4d71010886f2@github.com>
 <19P8X88PcVoh8x62iBz5baOyefoWgGp-aFd_Bli-vm0=.fbd446d3-0dad-41e8-bb13-ab113a3b9767@github.com>
Message-ID: <-hb0UM3-WL9h72oPsOvK5NA2-pYCyDQZS42R7FzPJ3s=.5dbaea39-391e-4e25-9e9e-f97940d3bc06@github.com>

On Fri, 12 Sep 2025 12:24:45 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Remove _LogWordBits
>
> src/hotspot/share/opto/regmask.hpp line 65:
> 
>> 63:   LP64_ONLY(STATIC_ASSERT(is_aligned(RM_SIZE_IN_INTS, 2)));
>> 64: 
>> 65:   static const unsigned int _WordBitMask = BitsPerWord - 1U;
> 
> You could also remove the `_` here. I suppose we keep `CamelCase` here to keep it parallel to `BitsPerWord`. What do you think @dean-long ?

I renamed this entire group of constants to use the same style (uppercase separated by `_`, without leading `_`). It is now `WORD_BIT_MASK`. I think it makes more sense to use the same style across `regmask.hpp`, rather than following styles in other files.

> src/hotspot/share/opto/regmask.hpp line 68:
> 
>> 66:   static const unsigned int _RM_SIZE_IN_WORDS =
>> 67:       LP64_ONLY(RM_SIZE_IN_INTS >> 1) NOT_LP64(RM_SIZE_IN_INTS);
>> 68:   static const unsigned int _RM_WORD_MAX_INDEX = _RM_SIZE_IN_WORDS - 1U;
> 
> I would get rid of the `_` here. Constants should preferably be `UPPER_CASE` (most of the time), and occasionally `CamelCase` where we are already doing it (only do it if needed for consistency). Underscore is only used for fields, as far as I know.

Yes, I had the same thought (now updated)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2344107536
PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2344109467

From rcastanedalo at openjdk.org  Fri Sep 12 12:40:17 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 12 Sep 2025 12:40:17 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT [v2]
In-Reply-To: <tXYgEjTd-sJKM-pfP6s4-b1ej_0qFGlsCcLB7hLNYxM=.f72a884f-8b9a-43fc-91ff-0b5a1c91fbfc@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
 <gFzwQrRGY91LYbe0DluSjX80ET_kk6Kx7arucF2GW5c=.55deed6b-9e47-4000-9916-f1392967aa99@github.com>
 <tXYgEjTd-sJKM-pfP6s4-b1ej_0qFGlsCcLB7hLNYxM=.f72a884f-8b9a-43fc-91ff-0b5a1c91fbfc@github.com>
Message-ID: <MR6smbt34Y5cBmaZVoGh6lE92Tsg6-598lkohdfAD0o=.d517b6b6-cd86-4beb-a270-71b95c3d50d1@github.com>

On Thu, 11 Sep 2025 09:42:18 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

>> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Revert clean-up in EA. Make catch statements more specific in test case.
>
> src/hotspot/share/opto/escape.cpp line 3135:
> 
>> 3133:           Node* phi = use->ideal_node();
>> 3134:           if (phi->Opcode() == Op_Phi && reducible_merges.member(phi)) {
>> 3135:             if (!can_reduce_phi(phi->as_Phi())) {
> 
> Drive-by comment: I think the ifs should be merged

@JohnTortugo: this comment is marked as resolved in the PR but I cannot see any reply or actual code change, did you perhaps forget pushing the requested change?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27063#discussion_r2344117240

From epeter at openjdk.org  Fri Sep 12 12:43:11 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 12:43:11 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v5]
In-Reply-To: <ZxjDAkYKza3KEFcLc6YgrI-9btMtVVz_-TWOZVXBVMU=.b417ed96-3b18-487c-92cb-2dd205834dd2@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <ZxjDAkYKza3KEFcLc6YgrI-9btMtVVz_-TWOZVXBVMU=.b417ed96-3b18-487c-92cb-2dd205834dd2@github.com>
Message-ID: <j5rDEvsSeUhhziTWPvuxfW3Oy_7njqM5sUgEnGGLJIY=.3ecd5c25-e002-499c-b082-b24ed9c75fdf@github.com>

On Fri, 12 Sep 2025 12:33:42 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.
>> 
>> ### Changeset
>> 
>> - Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
>> - Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
>> - Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
>> - Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
>> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename constants

Nice, thanks for the improvements. I feel like the fog is lifting slowly from this code, and we can see the sunshine :sun_behind_large_cloud: -> :sun_behind_small_cloud: -> ?  :rofl:

src/hotspot/share/opto/chaitin.hpp line 51:

> 49: class LRG : public ResourceObj {
> 50: public:
> 51:   static const uint InfiniteStack_size = 0xFFFFF; // This mask size is used to tell that the mask of this LRG supports stack positions

We may want to prevent this from snowballing everywhere ... but this is also a constant and we probably want to call it `INFINITE_STACK_SIZE`, right?

src/hotspot/share/opto/regmask.cpp line 249:

> 247:     if (_rm_word[i]) {                // Found some bits
> 248:       // Convert to bit number, return hi bit in pair
> 249:       return OptoReg::Name((i<<LogBitsPerWord) + find_lowest_bit(_rm_word[i]) + (size - 1));

Suggestion:

      return OptoReg::Name((i << LogBitsPerWord) + find_lowest_bit(_rm_word[i]) + (size - 1));

Might as well fix code style while we touch it ;)

-------------

PR Review: https://git.openjdk.org/jdk/pull/27215#pullrequestreview-3216478619
PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2344114954
PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2344116691

From epeter at openjdk.org  Fri Sep 12 12:43:13 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 12:43:13 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v4]
In-Reply-To: <-hb0UM3-WL9h72oPsOvK5NA2-pYCyDQZS42R7FzPJ3s=.5dbaea39-391e-4e25-9e9e-f97940d3bc06@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <XqX_nPhu_ej5IgW2_z3DFreGRjyfUELqEEiIic2Bg9c=.7ed11c6f-0bbd-48b2-b80f-4d71010886f2@github.com>
 <19P8X88PcVoh8x62iBz5baOyefoWgGp-aFd_Bli-vm0=.fbd446d3-0dad-41e8-bb13-ab113a3b9767@github.com>
 <-hb0UM3-WL9h72oPsOvK5NA2-pYCyDQZS42R7FzPJ3s=.5dbaea39-391e-4e25-9e9e-f97940d3bc06@github.com>
Message-ID: <wydAw2cAmqyLR3bPxtOSUTr4CbSvVLqV4g6ZD-N6UXw=.3c9a5166-943c-44c2-aa50-eae4b05c4a22@github.com>

On Fri, 12 Sep 2025 12:33:34 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> src/hotspot/share/opto/regmask.hpp line 65:
>> 
>>> 63:   LP64_ONLY(STATIC_ASSERT(is_aligned(RM_SIZE_IN_INTS, 2)));
>>> 64: 
>>> 65:   static const unsigned int _WordBitMask = BitsPerWord - 1U;
>> 
>> You could also remove the `_` here. I suppose we keep `CamelCase` here to keep it parallel to `BitsPerWord`. What do you think @dean-long ?
>
> I renamed this entire group of constants to use the same style (uppercase separated by `_`, without leading `_`). It is now `WORD_BIT_MASK`. I think it makes more sense to use the same style across `regmask.hpp`, rather than following styles in other files.

Nice!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2344123876

From dlunden at openjdk.org  Fri Sep 12 12:53:37 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Fri, 12 Sep 2025 12:53:37 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v6]
In-Reply-To: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
Message-ID: <XpgEeMJB6cUqa1rXQ_k7D_o0rN7c9fGODicDy-4QEt4=.1b570e52-0a57-4266-8ae6-aed3748c6c8b@github.com>

> Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.
> 
> ### Changeset
> 
> - Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
> - Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
> - Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
> - Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
> 
> ### Testing
> 
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.

Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:

  Change infinite to infinite_stack

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27215/files
  - new: https://git.openjdk.org/jdk/pull/27215/files/31c78597..37f3cbd2

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27215&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27215&range=04-05

  Stats: 31 lines in 11 files changed: 0 ins; 0 del; 31 mod
  Patch: https://git.openjdk.org/jdk/pull/27215.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27215/head:pull/27215

PR: https://git.openjdk.org/jdk/pull/27215

From dlunden at openjdk.org  Fri Sep 12 12:53:39 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Fri, 12 Sep 2025 12:53:39 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v2]
In-Reply-To: <dihhaMNeMsnjzbZqg33g-nt8W-AlgH6gFhPKGl1yKfs=.8e24a156-b88d-4454-a63b-b2a060174cb6@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <Dbraqc-jw-wVVDlFAkGZXH_69lxpJYlMjteJRBZx1gM=.3446b723-3eef-475b-b434-af2e815bd695@github.com>
 <dihhaMNeMsnjzbZqg33g-nt8W-AlgH6gFhPKGl1yKfs=.8e24a156-b88d-4454-a63b-b2a060174cb6@github.com>
Message-ID: <bDi1xWb0PT7kkCTjEJECKwGNBIE-HWg9kmC9GaWLg9I=.ac713364-0e0b-4a21-93aa-56d4cd0de1d2@github.com>

On Fri, 12 Sep 2025 00:18:59 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Lowercase _RM_INT and _RM_WORD
>
> src/hotspot/share/opto/regmask.hpp line 166:
> 
>> 164:   // indefinitely with ONE bits.  Returns TRUE if mask is infinite or
>> 165:   // unbounded in size.  Returns FALSE if mask is finite size.
>> 166:   bool is_infinite() const {
> 
> "infinite" hides the fact that these unbounded bits are stack bits and not register bits, but `is_UnboundedStack` or `is_InfiniteStack` might be too verbose.  How does `is_InfStack` sound?

OK, I've now changed it to `is_infinite_stack`. @dean-long Let us know if you feel strongly about this and want to change it. I don't think it is too verbose.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2344154909

From epeter at openjdk.org  Fri Sep 12 12:56:26 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 12:56:26 GMT
Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero
 nodes more precise [v14]
In-Reply-To: <rfJok2To3wAFUZVTuijpiuD03NQDcR3rouE9TNtoDPM=.f1ebc739-65fe-4c25-9587-5efddec3a0db@github.com>
References: <Iv0Ou9LLsRec6RaUaWA4pC7Ds7Hu_KXQbalxo71v8iM=.03c2733d-eaeb-4fec-a85b-cef252aa8c68@github.com>
 <rfJok2To3wAFUZVTuijpiuD03NQDcR3rouE9TNtoDPM=.f1ebc739-65fe-4c25-9587-5efddec3a0db@github.com>
Message-ID: <l__izApglFXlPPQplA8g68Li_484jKC7bHsdj6zhNWU=.19a5e4d0-d453-4faa-8e58-8bb642df7e4e@github.com>

On Wed, 10 Sep 2025 07:03:02 GMT, Qizheng Xing <qxing at openjdk.org> wrote:

>> The result of count leading/trailing zeros is always non-negative, and the maximum value is the integer type's size in bits. In previous versions, when C2 cannot know the operand value of a CLZ/CTZ node at compile time, it generates a full-width integer type for its result. This can significantly affect the efficiency of code in some cases.
>> 
>> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch:
>> 
>> 
>> public static int numberOfNibbles(int i) {
>>   int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i);
>>   return Math.max((mag + 3) / 4, 1);
>> }
>> 
>> 
>> Testing: tier1, IR test
>
> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove redundant import

Thanks for the update. However, I think you modified the test example so much that it no longer works, i.e. it would not constant-fold if the output range were wrong. I've identified 2 sources that would prevent constant folding:
- It is unclear whether `getResultChecksum` gets inlined. If it does not, `result` loses the type information about its range.
- The comparisons themselves would not constant fold, because the values you compare with are not constants, but array element loads. You need to compare `result` with a compile time constant.

Maybe the idea is not 100% clear to you:
Imagine `result` should be in some range `2..10`. But with a bug, we now return `3..10`. This means the output of `numberOfLeadingZeros` is still variable, and it does not constant fold. But: if there is something below it, like a `result < 3` ... this would now constant fold to `false`, even though we could have had a `2` at runtime.
But for this to work, it all needs to be in the same compilation unit, and the value we compare to `result` must be a compile time constant.

Does that make sense?
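The argument above can be modeled with a tiny range type: a comparison against a compile-time constant folds only when the whole range lies on one side of it. This C++ sketch uses hypothetical names (`IntRange`, `fold_less_than`) and is not C2's actual `CmpI`/`Bool` logic.

```cpp
#include <cstdint>
#include <optional>

struct IntRange {
  int32_t lo;  // inclusive
  int32_t hi;  // inclusive
};

// Range-based folding of `x < c` for a compile-time constant c.
// Returns the folded result, or std::nullopt if it must stay a runtime check.
std::optional<bool> fold_less_than(IntRange x, int32_t c) {
  if (x.hi < c) return true;    // every possible x is below c
  if (x.lo >= c) return false;  // no possible x is below c
  return std::nullopt;
}
```

With the correct range `2..10`, `result < 3` stays a runtime check; with the buggy range `3..10` it folds to `false`, which is the miscompilation the test should expose. If `c` is an array load rather than a constant, there is nothing to fold and the bug goes unnoticed.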

test/hotspot/jtreg/compiler/c2/gvn/TestCountBitsRange.java line 516:

> 514:     }
> 515: 
> 516:     int getResultChecksum(int result, int[] LIMITS) {

I would put a `@ForceInline` before this. You are using it in many methods, and so it may not get inlined reliably. And if it does not get inlined, then the result verification would not constant-fold, and so it would be kind of useless. Because we rely on the fact that if the range is wrong, we could get bad constant folding ;)

test/hotspot/jtreg/compiler/c2/gvn/TestCountBitsRange.java line 521:

> 519:             if (result < LIMITS[i]) sum += 1 << i;
> 520:             if (result > LIMITS[i + 1]) sum += 1 << (i + 1);
> 521:         }

I doubt that this works: even if the range were too narrow, these comparisons would not constant-fold, so the test would not detect it.
I think you need to manually unroll the loop, and load the constants from `static final` values, or another method that allows them to be compile-time constants.

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/25928#pullrequestreview-3216519184
PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2344141663
PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2344145678

From epeter at openjdk.org  Fri Sep 12 13:03:19 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 13:03:19 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3]
In-Reply-To: <IS7_TgFKAYShMIh7km2ILg1eE59Z-llT3T6ID_gA3iU=.a577363d-66c6-4922-bd88-0ff2fcb55e5d@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <hcmJkTFFU9oEe5LE1Q5974w0KM5Pe6SLi0uKLAxU7rM=.70a3f0d7-eb32-44b0-b819-1c4db6273976@github.com>
 <UKkT1Wqi4ftj3eGF2KzT8saeWoWSBTXx5kw0FOiJyLE=.c10dbf15-3348-495b-b9aa-556b78bc1e0b@github.com>
 <ejAu9M0FYELqOdzDW8uankmdRt0w8bloAwcxWcyx5k0=.9a47c6c4-e9df-40f5-aba9-23073a12bd17@github.com>
 <L0LfhTmGAmKPwwFXYaibWSIA3rLaL8j1xL4OL4XkutY=.70bbb39a-5e86-4a80-952b-d3b98a2a4a36@github.com>
 <IS7_TgFKAYShMIh7km2ILg1eE59Z-llT3T6ID_gA3iU=.a577363d-66c6-4922-bd88-0ff2fcb55e5d@github.com>
Message-ID: <Iiv2ODWa0B9GyCO_fhLWLlB87EB2DGkpv5pK5OBuBLE=.cc80caec-8b8a-4aef-b71b-7869313fd546@github.com>

On Tue, 9 Sep 2025 21:45:00 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> Also: you promise that it happens randomly. But it seems to be added deterministically everywhere. Did I miss something?
>
> Sorry for the confusion. Reworded the comment. I didn't intend to make it truly random. The idea was to automatically insert RF nodes during parsing to stress the implementation. It doesn't slow down compilation times that much, so aggressive insertion just works.

@iwanowww Ok, makes sense. I wonder though if we should consider random insertion.
It would also be nice to document somewhere that this only really tests the internal mechanism of ReachabilityFence. It does not really stress the case where we should have had a ReachabilityFence but fail to have one. Instead, we just insert more than we need. So we don't really expect this to trigger bugs with missing RFs.

>> Hmm. The way it is formulated it sounds more like:
>> - `true` -> we are guaranteed that it is a safepoint.
>> - `false` -> it may or may not be a safepoint - no guarantees.
>> Am I understanding this right?
>> 
>> If yes, then it would make more sense to have a default that is `no guarantee`. But maybe that makes things more complicated in other ways. All I'm saying is that it makes me nervous ;)
>
> You are right. I studied the code and `guaranteed_safepoint()` behaves as you described. It doesn't work for RF purposes, so I migrated the code to `sfpt->jvms() != nullptr` check and fixed a bug along the way. The changes related to `guaranteed_safepoint()` are reverted.

Maybe it could be worth putting some comments around `guaranteed_safepoint` that describe the logic of it, while you have the advantage of understanding what it means?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344175842
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344184330

From dlunden at openjdk.org  Fri Sep 12 13:08:01 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Fri, 12 Sep 2025 13:08:01 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v7]
In-Reply-To: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
Message-ID: <uuKRfLk4kK4ri09mjxk_-1kS0hTGaNBZDCs3K7aXU5A=.0c1dfb6e-d0ce-4c8c-9259-38ef54f18048@github.com>

> Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.
> 
> ### Changeset
> 
> - Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
> - Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
> - Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
> - Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
> 
> ### Testing
> 
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.

Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:

  Rename InfiniteStack_size and fix style of touched code

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27215/files
  - new: https://git.openjdk.org/jdk/pull/27215/files/37f3cbd2..cf247cd2

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27215&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27215&range=05-06

  Stats: 39 lines in 8 files changed: 15 ins; 0 del; 24 mod
  Patch: https://git.openjdk.org/jdk/pull/27215.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27215/head:pull/27215

PR: https://git.openjdk.org/jdk/pull/27215

From epeter at openjdk.org  Fri Sep 12 13:08:20 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 13:08:20 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v8]
In-Reply-To: <IcKeEoxM236ICcdLmdt-k_K1peNrxOnlpCLZa_3H4eA=.648697cb-baad-47f1-96d6-f8b13d9b69db@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <LMu3QXiw3K_5-conH2FlBXyRWNzNDunp0L86Ubxeju8=.86d4010f-ab7b-42cf-a9c3-5c24c23d2a81@github.com>
 <LSNdJRy0deekqrysCXuqGFogGIkwJ5MJEcefFMskxwY=.a28eb64e-ee52-432c-a4cf-9504aaf4a2e1@github.com>
 <_n3uP_Dkl3RNq3MFoRDXsS28SM8CcQHaR6vdUJF9U8s=.dcfab97b-be28-4244-93df-c8a23d6d66b8@github.com>
 <IcKeEoxM236ICcdLmdt-k_K1peNrxOnlpCLZa_3H4eA=.648697cb-baad-47f1-96d6-f8b13d9b69db@github.com>
Message-ID: <6XMjW5KmnMDigmEXRpEy4lDGEUpElgzTq2YDULaFAAk=.1241a58d-3143-473c-b78b-afd60be2ef4b@github.com>

On Tue, 9 Sep 2025 21:51:31 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> I'm also not sure yet why there is a difference between incremental inlining and regular inlining.
>> Do you think it would make sense to explain that here, or is it explained elsewhere?
>
> There are no safepoint-attached reachability edges present during normal parsing. For incremental inlining, JVMS from the original call is taken and extended with callee state. If there are reachability edges present, they have to be treated specially and carried over to all safepoints produced during incremental inlining attempt. There's no such support in place yet.

@iwanowww Ok, sounds a bit complicated. Maybe that is what we have to do, at least for now. But please make sure that this is documented, maybe right here or elsewhere, because it is only half-clear to me now.

Ok, so if the outer scope has RF edges, we need to make sure the inner scope has those RF edges too, right?
Ah, you are saying we are not doing that yet? Are you keeping track of that information for later?

Could we now create a reproducer that would fail in incremental inlining with a missing RF edge? Probably tricky, but very valuable ;)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344201256

From dlunden at openjdk.org  Fri Sep 12 13:08:06 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Fri, 12 Sep 2025 13:08:06 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v5]
In-Reply-To: <j5rDEvsSeUhhziTWPvuxfW3Oy_7njqM5sUgEnGGLJIY=.3ecd5c25-e002-499c-b082-b24ed9c75fdf@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <ZxjDAkYKza3KEFcLc6YgrI-9btMtVVz_-TWOZVXBVMU=.b417ed96-3b18-487c-92cb-2dd205834dd2@github.com>
 <j5rDEvsSeUhhziTWPvuxfW3Oy_7njqM5sUgEnGGLJIY=.3ecd5c25-e002-499c-b082-b24ed9c75fdf@github.com>
Message-ID: <NfjK0eb95bYLHvWAd8g-b0eanvF_SrfadaVVIQovXYU=.3b7cc018-9378-474f-bc45-4e9f4a55849e@github.com>

On Fri, 12 Sep 2025 12:36:55 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Rename constants
>
> src/hotspot/share/opto/chaitin.hpp line 51:
> 
>> 49: class LRG : public ResourceObj {
>> 50: public:
>> 51:   static const uint InfiniteStack_size = 0xFFFFF; // This mask size is used to tell that the mask of this LRG supports stack positions
> 
> We may want to prevent this from snowballing everywhere ... but this is also a constant and we probably want to call it `INFINITE_STACK_SIZE`, right?

I agree, now fixed (but yes, we need to stop snowballing at some point!)

> src/hotspot/share/opto/regmask.cpp line 249:
> 
>> 247:     if (_rm_word[i]) {                // Found some bits
>> 248:       // Convert to bit number, return hi bit in pair
>> 249:       return OptoReg::Name((i<<LogBitsPerWord) + find_lowest_bit(_rm_word[i]) + (size - 1));
> 
> Suggestion:
> 
>       return OptoReg::Name((i << LogBitsPerWord) + find_lowest_bit(_rm_word[i]) + (size - 1));
> 
> Might as well fix code style while we touch it ;)

Sure, now fixed (and checked and updated all other touched code as well)
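The Java sketch below illustrates the index arithmetic in the restyled line. It makes several assumptions. `Long.numberOfTrailingZeros` stands in for `find_lowest_bit`, and words are taken to be 64 bits (`LogBitsPerWord == 6`). The value `-1` stands in for whatever "no register" value the C++ code returns. Groups of `size` bits are assumed to be aligned so that the lowest set bit starts a group, which is why `size - 1` is added to return the high bit.

```java
// Sketch of the index arithmetic in the quoted regmask.cpp line; not HotSpot code.
final class FirstSetBit {
    static final int LOG_BITS_PER_WORD = 6; // 64-bit words assumed

    // Returns the bit number of the high bit of the first (aligned) group of
    // `size` bits, located via the lowest set bit; -1 if the mask is empty.
    static int findFirstSetHi(long[] words, int size) {
        for (int i = 0; i < words.length; i++) {
            if (words[i] != 0) { // found some bits
                return (i << LOG_BITS_PER_WORD)
                        + Long.numberOfTrailingZeros(words[i])
                        + (size - 1);
            }
        }
        return -1; // hypothetical "no register" marker
    }
}
```

For example, with words `{0, 0b1100}` and `size == 2`, the first set bit is bit 2 of word 1. That gives `64 + 2 + 1 = 67`.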

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2344192294
PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2344194711

From epeter at openjdk.org  Fri Sep 12 13:12:35 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 13:12:35 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v8]
In-Reply-To: <EfupY9ket8JIkRM8Lplq6Crn4q0wFbWplEqBEhsekV8=.2b604f83-823c-47e8-9470-99dbf1541508@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <LMu3QXiw3K_5-conH2FlBXyRWNzNDunp0L86Ubxeju8=.86d4010f-ab7b-42cf-a9c3-5c24c23d2a81@github.com>
 <LSNdJRy0deekqrysCXuqGFogGIkwJ5MJEcefFMskxwY=.a28eb64e-ee52-432c-a4cf-9504aaf4a2e1@github.com>
 <Xa1zqmkXNFFxwRU6q-sZIXomxJtIQLepUwcx5IuuE-c=.4affc0fb-1036-4635-9c41-ec76225d1f60@github.com>
 <EfupY9ket8JIkRM8Lplq6Crn4q0wFbWplEqBEhsekV8=.2b604f83-823c-47e8-9470-99dbf1541508@github.com>
Message-ID: <Vry0AHu7gc4hwodx_bUmO3WgQoWdr4DfsLu_5Bzd6yA=.82e6b04e-1723-4f4c-b0fa-07b55f43a65f@github.com>

On Fri, 12 Sep 2025 13:08:49 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>>> could we just go through _reachability_fences, and hack the graph and clean up with IGVN? Or do we really need the loop state to do this successfully?
>> 
>> RF elimination needs control for referent to enumerate all interfering safepoints. 
>> 
>> Theoretically, it's possible to use a conservative estimate, but then:
>>  (1) it can worsen the result (by enumerating more interfering safepoints than needed); and
>>  (2) it can build an unschedulable graph if the referent doesn't dominate the safepoint node (if the estimate is way too conservative). 
>> 
>> IMO it's safer to build the full dominator tree here.
>> 
>>> It probably has a performance impact, right? Have you measured that? 
>> 
>> It does have a noticeable cost. On my laptop it bumps the time spent doing RF processing from 170ms to 210ms.
>> 
>> ```
>> $ java -Xcomp -XX:-TieredCompilation -XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:-StressReachabilityFences
>> 
>>          IdealLoop:             0.173 s
>>            ReachabilityFence:   0.000 s
>>              Optimize:          0.000 s
>>              Eliminate:         0.000 s
>> ``` 
>> vs
>> 
>> ```
>> $ java -Xcomp -XX:-TieredCompilation -XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:+StressReachabilityFences
>> 
>>          IdealLoop:             0.212 s
>>            ReachabilityFence:   0.030 s
>>              Optimize:          0.004 s
>>              Eliminate:         0.004 s
>> ``` 
>> 
>> I reimplemented it to piggyback on the last loop optimization attempt if there's any and it drastically improves the situation:
>> 
>> ```
>> $ java -Xcomp -XX:-TieredCompilation -XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:+StressReachabilityFences
>> 
>>          IdealLoop:             0.193 s
>>            ReachabilityFence:   0.009 s
>>              Optimize:          0.003 s
>>              Eliminate:         0.004 s
>> ```
>
> @iwanowww 
> Ok, thanks for measuring this. We really need to keep an eye on this, otherwise it will surely trip @robcasloz's C2 compile-time benchmarking eventually ;)
> 
> Can you point me to the code where you are actually using the dominator information? I think I did not find it the last time I reviewed.

Ah, you mentioned it somewhere else:
> It's solely for get_ctrl(referent) call in enumerate_interfering_sfpts().
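The toy Java model below shows why the precise control of the referent matters. It is hypothetical and is not the HotSpot implementation of `enumerate_interfering_sfpts()`. Starting at the fence, it walks the CFG backwards and collects every safepoint reached before the walk hits the node that defines the referent. The stop node plays the role of `get_ctrl(referent)`. If the stop node is only a conservative estimate, the walk goes further up and collects extra safepoints.

```java
import java.util.*;

// Toy CFG model of enumerating safepoints that interfere with a fence.
final class InterferingSafepoints {
    // preds.get(n) lists the CFG predecessors of node n.
    static Set<Integer> enumerate(Map<Integer, List<Integer>> preds,
                                  Set<Integer> safepoints,
                                  int fenceNode,
                                  int referentCtrl) {
        Set<Integer> result = new TreeSet<>();
        Set<Integer> visited = new HashSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.push(fenceNode);
        visited.add(fenceNode);
        while (!work.isEmpty()) {
            int n = work.pop();
            if (n == referentCtrl) {
                continue; // referent is defined here: do not walk further up
            }
            if (n != fenceNode && safepoints.contains(n)) {
                result.add(n); // referent must be live at this safepoint
            }
            for (int p : preds.getOrDefault(n, List.of())) {
                if (visited.add(p)) {
                    work.push(p);
                }
            }
        }
        return result;
    }
}
```

Take a CFG `0 -> 1 (safepoint) -> 2 (referent defined) -> 3 (loop-head safepoint) <-> 4`, plus `3 -> 5 (fence)`. With the precise stop node `2`, the walk finds only `{3}`. With the conservative stop node `0`, it also reports safepoint `1`, which executes before the referent exists. Attaching the referent there produces the kind of unschedulable graph mentioned in point (2).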

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344214962

From epeter at openjdk.org  Fri Sep 12 13:12:34 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 13:12:34 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v8]
In-Reply-To: <Xa1zqmkXNFFxwRU6q-sZIXomxJtIQLepUwcx5IuuE-c=.4affc0fb-1036-4635-9c41-ec76225d1f60@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <LMu3QXiw3K_5-conH2FlBXyRWNzNDunp0L86Ubxeju8=.86d4010f-ab7b-42cf-a9c3-5c24c23d2a81@github.com>
 <LSNdJRy0deekqrysCXuqGFogGIkwJ5MJEcefFMskxwY=.a28eb64e-ee52-432c-a4cf-9504aaf4a2e1@github.com>
 <Xa1zqmkXNFFxwRU6q-sZIXomxJtIQLepUwcx5IuuE-c=.4affc0fb-1036-4635-9c41-ec76225d1f60@github.com>
Message-ID: <EfupY9ket8JIkRM8Lplq6Crn4q0wFbWplEqBEhsekV8=.2b604f83-823c-47e8-9470-99dbf1541508@github.com>

On Wed, 10 Sep 2025 21:34:37 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> src/hotspot/share/opto/compile.cpp line 2522:
>> 
>>> 2520:     if (failing())  return;
>>> 2521:     assert(_reachability_fences.length() == 0, "no RF nodes allowed");
>>> 2522:   }
>> 
>> Looks better than before :)
>> 
>> I'm still wondering: do we need to do a whole loop-opts phase here? It probably has a performance impact, right?
>> Have you measured that?
>> 
>> If it is measurable: could we just go through `_reachability_fences`, and hack the graph and clean up with IGVN? Or do we really need the loop state to do this successfully?
>
>> could we just go through _reachability_fences, and hack the graph and clean up with IGVN? Or do we really need the loop state to do this successfully?
> 
> RF elimination needs control for referent to enumerate all interfering safepoints. 
> 
> Theoretically, it's possible to use a conservative estimate, but then:
>  (1) it can worsen the result (by enumerating more interfering safepoints than needed); and
>  (2) it can build an unschedulable graph if the referent doesn't dominate the safepoint node (if the estimate is way too conservative). 
> 
> IMO it's safer to build the full dominator tree here.
> 
>> It probably has a performance impact, right? Have you measured that? 
> 
> It does have a noticeable cost. On my laptop it bumps the time spent doing RF processing from 170ms to 210ms.
> 
> ```
> $ java -Xcomp -XX:-TieredCompilation -XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:-StressReachabilityFences
> 
>          IdealLoop:             0.173 s
>            ReachabilityFence:   0.000 s
>              Optimize:          0.000 s
>              Eliminate:         0.000 s
> ``` 
> vs
> 
> ```
> $ java -Xcomp -XX:-TieredCompilation -XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:+StressReachabilityFences
> 
>          IdealLoop:             0.212 s
>            ReachabilityFence:   0.030 s
>              Optimize:          0.004 s
>              Eliminate:         0.004 s
> ``` 
> 
> I reimplemented it to piggyback on the last loop optimization attempt if there's any and it drastically improves the situation:
> 
> ```
> $ java -Xcomp -XX:-TieredCompilation -XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:+StressReachabilityFences
> 
>          IdealLoop:             0.193 s
>            ReachabilityFence:   0.009 s
>              Optimize:          0.003 s
>              Eliminate:         0.004 s
> ```

@iwanowww 
Ok, thanks for measuring this. We really need to keep an eye on this, otherwise it will surely trip @robcasloz's C2 compile-time benchmarking eventually ;)

Can you point me to the code where you are actually using the dominator information? I think I did not find it the last time I reviewed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344212858

From epeter at openjdk.org  Fri Sep 12 13:21:41 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 13:21:41 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v8]
In-Reply-To: <IcKeEoxM236ICcdLmdt-k_K1peNrxOnlpCLZa_3H4eA=.648697cb-baad-47f1-96d6-f8b13d9b69db@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <LMu3QXiw3K_5-conH2FlBXyRWNzNDunp0L86Ubxeju8=.86d4010f-ab7b-42cf-a9c3-5c24c23d2a81@github.com>
 <LSNdJRy0deekqrysCXuqGFogGIkwJ5MJEcefFMskxwY=.a28eb64e-ee52-432c-a4cf-9504aaf4a2e1@github.com>
 <_n3uP_Dkl3RNq3MFoRDXsS28SM8CcQHaR6vdUJF9U8s=.dcfab97b-be28-4244-93df-c8a23d6d66b8@github.com>
 <IcKeEoxM236ICcdLmdt-k_K1peNrxOnlpCLZa_3H4eA=.648697cb-baad-47f1-96d6-f8b13d9b69db@github.com>
Message-ID: <hRJMVwoNPY2xD1ntsuHYrmG283r7RCyAcTZZ6USWe4A=.25f49491-163e-44f8-946c-e157f8837250@github.com>

On Wed, 10 Sep 2025 21:39:02 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> Ah, you could mention that later `ReachabilityFenceNode::Identity` removes the rf.
>
>> Is this rf guaranteed to belong to the Allocation somehow?
> 
> I don't get your question. The code iterates over users of an allocation which is being eliminated.  Semantically, RF is a no-op on a scalarizable referent and has to be removed in order to let the scalarization happen.
> 
>> Ah, you could mention that later ReachabilityFenceNode::Identity removes the rf.
> 
> Done.

But are we sure that the `ReachabilityFence` really belongs to the `Allocation` that is eliminated?
Can we check if the referent matches?
Because what if there are multiple allocations:

```
x = allocation;
y = allocation;        // -> eliminate
ReachabilityFence(x);  // is only ctrl use of Allocation for y, but belongs to Allocation of x.
```

Could there be such cases?

@iwanowww

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344241787
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344242372

From epeter at openjdk.org  Fri Sep 12 13:29:28 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 13:29:28 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v8]
In-Reply-To: <Xa1zqmkXNFFxwRU6q-sZIXomxJtIQLepUwcx5IuuE-c=.4affc0fb-1036-4635-9c41-ec76225d1f60@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <LMu3QXiw3K_5-conH2FlBXyRWNzNDunp0L86Ubxeju8=.86d4010f-ab7b-42cf-a9c3-5c24c23d2a81@github.com>
 <LSNdJRy0deekqrysCXuqGFogGIkwJ5MJEcefFMskxwY=.a28eb64e-ee52-432c-a4cf-9504aaf4a2e1@github.com>
 <Xa1zqmkXNFFxwRU6q-sZIXomxJtIQLepUwcx5IuuE-c=.4affc0fb-1036-4635-9c41-ec76225d1f60@github.com>
Message-ID: <Ci-CQQY-qs8vwCJmOqh2gmFHaULHRF1o9MXTu15rCJg=.becf246a-86a9-4adf-a2b6-ebfe27676347@github.com>

On Tue, 9 Sep 2025 21:27:12 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> src/hotspot/share/opto/reachability.cpp line 49:
>> 
>>> 47:  *
>>> 48:  * It is tempting to directly attach referents to interfering safepoints right from the beginning, but it
>>> 49:  * doesn't play well with some optimizations C2 does.
>> 
>> Do you have an example for such optimizations?
>
> Loop-invariant code motion is one example. Do you want me to add it to the comment?
> 
> After parsing is over, the IR is in valid state, but loop optimizations are the primary reason why it can be broken later.

Just make sure that this information is in the code comments - I'm not just asking for myself here ;)

>> src/hotspot/share/opto/reachability.cpp line 71:
>> 
>>> 69:  * Unfortunately, it's not straightforward to stay with safepoint-attached representation till the very end,
>>> 70:  * because information about derived oops is attached to safepoints the very same similar way. So, for now RFs are
>>> 71:  * rematerialized at safepoints before RA (phase #3).
>> 
>> `the very same similar way` sounds a little funny. I'm also not quite seeing the problem yet. What is the issue with the edges being attached to safepoints here?
>
>> the very same similar way sounds a little funny. I
> Fixed. 
> 
>>  What is the issue with the edges being attached to safepoints here?
> 
> The issue is that the safepoint-attached representation conflicts with the derived oops representation. There's no way to distinguish between them. As of now, the VM treats post-debug-info edges as representing derived oops, which is completely wrong when reachability edges are present. More work is needed to support both cases.

Ok, thanks for the explanation. Just make sure that information is in the code comments ;)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344263650
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344265719

From epeter at openjdk.org  Fri Sep 12 14:12:42 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 14:12:42 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v11]
In-Reply-To: <1pShdyn-7-wwwiuY1DdMt5iiZ2qc9l_x2F-3AKqkg60=.dd260953-05cc-4b84-b6d1-7f684e74084c@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <1pShdyn-7-wwwiuY1DdMt5iiZ2qc9l_x2F-3AKqkg60=.dd260953-05cc-4b84-b6d1-7f684e74084c@github.com>
Message-ID: <EZmv1oLssvikAF7lwHm77Q5kmfuKvoqXLhxGvSL1Mho=.c7db1b91-3536-48ad-90c6-c4c098094706@github.com>

On Thu, 11 Sep 2025 18:18:13 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> This PR introduces C2 support for `Reference.reachabilityFence()`.
>> 
>> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected.
>> 
>> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality.
>> 
>> Also, I don't consider this fix complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is a CFG-only node, nothing explicitly forbids memory operations from floating past `Reference.reachabilityFence()` and potentially reaching other safepoints that the current analysis treats as non-interfering. Representing `ReachabilityFence` as a memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but the performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend the referent's live range beyond the `ReachabilityFence` nodes associated with it. That would meet the performance criteria, but I prefer to implement it as a follow-up fix.
>> 
>> Another known issue relates to reachability fences on constant oops. If such a constant is GCed (most likely due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately.
>> 
>> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667
>> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints."
>> 
>> Testing:
>> - [x] hs-tier1 - hs-tier8
>> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations
>> - [x] java/lang/foreign microbenchmarks
>
> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Minor fix

@iwanowww Thanks for all the updates, and the presentation on Tuesday in our staff-meeting!
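For context on the API being intrinsified, here is a minimal Java sketch of the usage pattern described in the `Reference.reachabilityFence` javadoc. It is adapted to `java.lang.ref.Cleaner` instead of a finalizer. `FencedResource` and the array standing in for native memory are hypothetical. Without the fence, `this` may be considered unreachable right after `slot` is read. The cleaning action could then clear the entry before the read completes, and the fence in `finally` prevents that.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;

// Hypothetical resource whose "native" slot is released by a Cleaner action.
class FencedResource {
    private static final Cleaner CLEANER = Cleaner.create();
    // Stands in for native memory indexed by the resource.
    static final long[] NATIVE_TABLE = new long[16];

    private final int slot;

    FencedResource(int slot, long value) {
        this.slot = slot;
        NATIVE_TABLE[slot] = value;
        final int s = slot; // the cleaning action must not capture `this`
        CLEANER.register(this, () -> NATIVE_TABLE[s] = 0L);
    }

    long read() {
        try {
            int s = slot;           // last use of `this` in the method body...
            return NATIVE_TABLE[s]; // ...but its table entry is still being read
        } finally {
            // Keeps `this` strongly reachable up to this point, so the cleaning
            // action cannot clear the entry before the read above completes.
            Reference.reachabilityFence(this);
        }
    }
}
```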

src/hotspot/share/opto/callGenerator.cpp line 620:

> 618:     // Inlining logic doesn't expect any extra edges past debug info and fails with
> 619:     // an assert in SafePointNode::grow_stack.
> 620:     assert(endoff == call->req(), "reachability edges not supported");

Could we trip over this assert by modifying the reproducer and adding some method somewhere that gets inlined late?

src/hotspot/share/opto/compile.hpp line 110:

> 108:   LoopOptsNone,
> 109:   LoopOptsMaxUnroll,
> 110:   LoopOptsEliminateRFs,

With the additional flags, I think we now need some kind of documentation here. I'm losing the overview a bit, and maybe I never really had it.

src/hotspot/share/opto/loopnode.cpp line 5341:

> 5339:     C->print_method(PHASE_ELIMINATE_REACHABILITY_FENCES, 2);
> 5340:     assert(C->reachability_fences_count() == 0, "no RF nodes allowed");
> 5341:   }

Can we somehow assert that we now really will never do loop-opts again?
Why are you checking for `_mode == LoopOptsDefaultFinal` and not for `LoopOptsEliminateRFs`?
If that was a bug, then more verification would be extra justified ;)

src/hotspot/share/opto/reachability.cpp line 52:

> 50:  *
> 51:  * Instead, reachability representation transitions through multiple phases:
> 52:  *   (0) initial set of RFs is materialized during parsing;

Suggestion:

 *   (0) initial set of RFs is materialized during parsing, by intrinsifying calls to Reference.reachabilityFence;

src/hotspot/share/opto/reachability.cpp line 54:

> 52:  *   (0) initial set of RFs is materialized during parsing;
> 53:  *   (1) optimization pass during loop opts eliminates redundant RF nodes and
> 54:  *       moves the ones with loop-invariant referents outside loops;

Suggestion:

 *   (1) optimization pass during loop opts eliminates redundant RF nodes and
 *       moves the ones with loop-invariant referents outside (after) loops;
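The phase (1) wording can be illustrated with a source-level Java analogue. It is hypothetical, since C2 performs this transformation on its IR rather than on Java source. One version executes a fence on a loop-invariant referent in every iteration, and the other executes a single fence after the loop. The second form keeps `h` reachable at least as long as the first, which is why moving the fence out of (after) the loop is safe.

```java
import java.lang.ref.Reference;

// Source-level analogue of moving a fence with a loop-invariant referent
// out of (after) a loop.
class FenceHoisting {
    static final class Holder {
        final int value;
        Holder(int value) { this.value = value; }
    }

    // Fence executed on every iteration.
    static long fenceInLoop(Holder h, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += h.value;
            Reference.reachabilityFence(h);
        }
        return sum;
    }

    // Single fence after the loop: `h` stays reachable at least as long as above.
    static long fenceAfterLoop(Holder h, int n) {
        long sum = 0;
        try {
            for (int i = 0; i < n; i++) {
                sum += h.value;
            }
        } finally {
            Reference.reachabilityFence(h);
        }
        return sum;
    }
}
```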

src/hotspot/share/opto/reachability.cpp line 67:

> 65:  * Live ranges of values are routinely extended during loop opts. And it can break the invariant that
> 66:  * all interfering safepoints contain the referent in their oop map. (If an interfering safepoint doesn't
> 67:  * keep the referent alive, then it becomes possible for the referent to be prematurely GCed.)

Can we have a concrete example? I thought of a store that is sunk out of the loop, but of course that should not cross a SafePoint on the way either, so that is not a good argument. Do you have one that works?

src/hotspot/share/opto/reachability.cpp line 70:

> 68:  *
> 69:  * After loop opts are over, it becomes possible to reliably enumerate all interfering safe points and
> 70:  * ensure the referent present in their oop maps.

Suggestion:

 * After loop opts are over, it becomes possible to reliably enumerate all interfering safe points and
 * to ensure that the referent is present in their oop maps.

Grammar. Maybe you need to fix it in a different way if it does not match the intended semantics ;)

src/hotspot/share/opto/reachability.cpp line 81:

> 79:  * (c) Unfortunately, it's not straightforward to stay with safepoint-attached representation till the very end,
> 80:  * because information about derived oops is attached to safepoints in a similar way. So, for now RFs are
> 81:  * rematerialized at safepoints before RA (phase #3).

I still don't understand this. What is similar to what? And why is that a problem?

src/hotspot/share/opto/reachability.hpp line 32:

> 30: #include "opto/type.hpp"
> 31: 
> 32: //------------------------ReachabilityFenceNode--------------------------

Suggestion:

// Represents a Reference.reachabilityFence call
// See documentation in reachability.cpp

test/hotspot/jtreg/compiler/c2/TestReachabilityFence.java line 40:

> 38:  * @run main/othervm -Xbatch compiler.c2.TestReachabilityFence
> 39:  */
> 40: public class TestReachabilityFence {

This test seems very important to me. Can you please add some extra code comments about what goes wrong before the fix, i.e. if RFs are not present?

Maybe some explanation about what it took to write this test, so that we can build on that to extend the test later?

-------------

PR Review: https://git.openjdk.org/jdk/pull/25315#pullrequestreview-3216761112
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344292203
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344381871
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344313204
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344334081
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344337061
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344345310
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344349802
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344355280
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344359681
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344374341

From epeter at openjdk.org  Fri Sep 12 14:12:45 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 14:12:45 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v11]
In-Reply-To: <EZmv1oLssvikAF7lwHm77Q5kmfuKvoqXLhxGvSL1Mho=.c7db1b91-3536-48ad-90c6-c4c098094706@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <1pShdyn-7-wwwiuY1DdMt5iiZ2qc9l_x2F-3AKqkg60=.dd260953-05cc-4b84-b6d1-7f684e74084c@github.com>
 <EZmv1oLssvikAF7lwHm77Q5kmfuKvoqXLhxGvSL1Mho=.c7db1b91-3536-48ad-90c6-c4c098094706@github.com>
Message-ID: <rJb4N4TiJ4HhAMEFTVFn1e1qPpPfEwiBB6ezo7nM-cg=.fca1c659-150e-4d97-a47a-01bd836d867d@github.com>

On Fri, 12 Sep 2025 13:38:08 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Minor fix
>
> src/hotspot/share/opto/callGenerator.cpp line 620:
> 
>> 618:     // Inlining logic doesn't expect any extra edges past debug info and fails with
>> 619:     // an assert in SafePointNode::grow_stack.
>> 620:     assert(endoff == call->req(), "reachability edges not supported");
> 
> Could we trip over this assert by modifying the reproducer and adding some method somewhere that gets inlined late?

Could we also bail out here? Or what would happen now in production if there is a RF edge?

> src/hotspot/share/opto/loopnode.cpp line 5341:
> 
>> 5339:     C->print_method(PHASE_ELIMINATE_REACHABILITY_FENCES, 2);
>> 5340:     assert(C->reachability_fences_count() == 0, "no RF nodes allowed");
>> 5341:   }
> 
> Can we somehow assert that we now really will never do loop-opts again?
> Why are you checking for `_mode == LoopOptsDefaultFinal` and not for `LoopOptsEliminateRFs`?
> If that was a bug, then more verification would be extra justified ;)

Otherwise, please explain the meaning of `LoopOptsDefaultFinal`. Maybe it should be an OR here?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344294495
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344320789

From epeter at openjdk.org  Fri Sep 12 14:12:43 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 14:12:43 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v2]
In-Reply-To: <1FgOFS7aAlEbvVUez6iTfzgf2l7qUbL9C4wfSGmmfo0=.406c10f1-63d5-4333-af6d-525e46203182@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <0WKwHjzEn5dxYLkonrk4h9yfMI3r3bKDdqgG06J69N4=.e19e9441-6197-4d53-a4f4-b196a81f69d8@github.com>
 <Ctg8ZdOt0UVSzfJ18f-0Aj1DCLSGsy3pLQnm8SzcpUg=.9eb433cd-2e53-4a12-a24f-1c493a1a7869@github.com>
 <sR_6Jh_j5oxAFBzXbSqwD2L3d74pP_-XnXM8RDWjUXA=.35937cef-adbe-418e-a0cd-3d13c4b145e5@github.com>
 <1FgOFS7aAlEbvVUez6iTfzgf2l7qUbL9C4wfSGmmfo0=.406c10f1-63d5-4333-af6d-525e46203182@github.com>
Message-ID: <mgdTRnlYCZYfqFFzzwlsnCv5u4rQQsrVMw5sC7AmdO0=.ac414de9-0972-4efc-accc-e4202ae16797@github.com>

On Thu, 11 Sep 2025 18:24:46 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> @iwanowww Let me know whenever this is ready to review again ?
>
> @eme64 I think I addressed/answered all your suggestions/questions. Please, take another look. Thanks!

@iwanowww Thanks for the updates! I again only looked through most comments as well.

These are the major topics for me:
- `StressReachabilityFences` only inserts RFs where they are not needed. This allows us to test the consistency of the RF machinery, but not whether we are missing RFs where they are needed. That is much harder, and we should probably invest in writing more tests for those cases, even if it is really hard. Maybe we can even write fuzzing tests for it?
- There seems to be missing support for carrying RF edges through incremental inlining, right? File an RFE, or track it elsewhere. Could we create a reproducer for this case / can we extend the existing one? https://github.com/openjdk/jdk/pull/25315#discussion_r2330095168
- Are we sure that we don't eliminate the RF for the wrong allocation? https://github.com/openjdk/jdk/pull/25315#discussion_r2330230044
- Extra compile time due to the extra loop-opts round: https://github.com/openjdk/jdk/pull/25315#discussion_r2330176841 It used to be a 20% increase; now you have managed to reduce it to about 10%. That is still considerable, and all of it is just to call `get_ctrl(referent)` in `enumerate_interfering_sfpts`.
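The percentages can be checked against the `IdealLoop` times in the `-XX:+CITime` output quoted earlier in the thread. Those times are 0.173 s with `-XX:-StressReachabilityFences`, 0.212 s for the first `+StressReachabilityFences` version, and 0.193 s after piggybacking on the last loop-opts round. These are single runs on one laptop, so the figures are rough:

$$
\frac{0.212 - 0.173}{0.173} \approx 0.225 \;(\approx 22\%), \qquad
\frac{0.193 - 0.173}{0.173} \approx 0.116 \;(\approx 12\%)
$$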

I think some of these issues should also be discussed in the PR description / JIRA description.
It would be especially nice if you could summarize the scope of the problem of RF, and which parts are now fixed, and which parts you know are not yet fixed. Of course there may be even more we don't know, but best write everything down we already do know. ;)

Other ideas:
- You should file an RFE to add your stress flags to the stress job, and also the fuzzer.
- I did not yet study the reproducer `TestReachabilityFence.java`. We should consider making a fuzzer style test out of it, maybe using the template framework. Feel free to just file an RFE for that, and assign it to me.

@shipilev @TobiHartmann @chhagedorn 
I'm soon going on vacation (in a week), and so I'd like the other reviewers to be aware of these issues.
I don't want to hold up the patch, so feel free to have someone else review. But I'm also happy to come back to this mid October.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25315#issuecomment-3285447179

From epeter at openjdk.org  Fri Sep 12 14:12:49 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 14:12:49 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v8]
In-Reply-To: <Ci-CQQY-qs8vwCJmOqh2gmFHaULHRF1o9MXTu15rCJg=.becf246a-86a9-4adf-a2b6-ebfe27676347@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <LMu3QXiw3K_5-conH2FlBXyRWNzNDunp0L86Ubxeju8=.86d4010f-ab7b-42cf-a9c3-5c24c23d2a81@github.com>
 <LSNdJRy0deekqrysCXuqGFogGIkwJ5MJEcefFMskxwY=.a28eb64e-ee52-432c-a4cf-9504aaf4a2e1@github.com>
 <Xa1zqmkXNFFxwRU6q-sZIXomxJtIQLepUwcx5IuuE-c=.4affc0fb-1036-4635-9c41-ec76225d1f60@github.com>
 <Ci-CQQY-qs8vwCJmOqh2gmFHaULHRF1o9MXTu15rCJg=.becf246a-86a9-4adf-a2b6-ebfe27676347@github.com>
Message-ID: <JKD0jqllmhfDeNwEd1g7LMAx8V4idsZFTXrS4C0KkUI=.47cf85da-9d62-4339-8bbd-80821a48ac32@github.com>

On Fri, 12 Sep 2025 13:26:08 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Loop-invariant code motion is one example. Do you want me to add it to the comment?
>> 
>> After parsing is over, the IR is in valid state, but loop optimizations are the primary reason why it can be broken later.
>
> Just make sure that this information is in the code comments - I'm not just asking for myself here ;)

Yes, maybe say what the general problem is, and make a concrete example. I'm currently a bit struggling to think of one that is relevant.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344331771

From epeter at openjdk.org  Fri Sep 12 14:12:49 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 14:12:49 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v8]
In-Reply-To: <JKD0jqllmhfDeNwEd1g7LMAx8V4idsZFTXrS4C0KkUI=.47cf85da-9d62-4339-8bbd-80821a48ac32@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <LMu3QXiw3K_5-conH2FlBXyRWNzNDunp0L86Ubxeju8=.86d4010f-ab7b-42cf-a9c3-5c24c23d2a81@github.com>
 <LSNdJRy0deekqrysCXuqGFogGIkwJ5MJEcefFMskxwY=.a28eb64e-ee52-432c-a4cf-9504aaf4a2e1@github.com>
 <Xa1zqmkXNFFxwRU6q-sZIXomxJtIQLepUwcx5IuuE-c=.4affc0fb-1036-4635-9c41-ec76225d1f60@github.com>
 <Ci-CQQY-qs8vwCJmOqh2gmFHaULHRF1o9MXTu15rCJg=.becf246a-86a9-4adf-a2b6-ebfe27676347@github.com>
 <JKD0jqllmhfDeNwEd1g7LMAx8V4idsZFTXrS4C0KkUI=.47cf85da-9d62-4339-8bbd-80821a48ac32@github.com>
Message-ID: <4jTV6y9R_JfATA54LC7FK3DKdBX1srsU09DK1I25Uo0=.94233927-71f2-4f13-894d-206d00f5fdaa@github.com>

On Fri, 12 Sep 2025 13:51:42 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Just make sure that this information is in the code comments - I'm not just asking for myself here ;)
>
> Yes, maybe say what the general problem is, and make a concrete example. I'm currently a bit struggling to think of one that is relevant.

Ah yes: we may for example move a store out (after) the loop. But wait. We can't move a store across a SafePoint, so that's not a good example.
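For context, here is a minimal sketch of the Java-level pattern the intrinsic has to preserve. It assumes a hypothetical `FencedResource` class whose backing data lives in an external table, and a `Cleaner` action frees that data once the object becomes unreachable. Inside the loop the JIT may hoist the `handle` field load (loop-invariant code motion). Once that happens, the loop body no longer needs `this`. The `Reference.reachabilityFence(this)` call in the `finally` block keeps the object reachable until the loop has finished.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class FencedResource {
    private static final Cleaner CLEANER = Cleaner.create();
    // Hypothetical "external" state keyed by handle, standing in for native memory.
    static final Map<Integer, int[]> EXTERNAL = new ConcurrentHashMap<>();
    private static final AtomicInteger NEXT_HANDLE = new AtomicInteger();

    private final int handle;

    FencedResource(int size) {
        int h = NEXT_HANDLE.getAndIncrement();
        int[] data = new int[size];
        for (int i = 0; i < size; i++) {
            data[i] = i;
        }
        EXTERNAL.put(h, data);
        this.handle = h;
        // The cleaning action captures only the handle, never 'this'.
        CLEANER.register(this, () -> EXTERNAL.remove(h));
    }

    int sum(int n) {
        try {
            int s = 0;
            for (int i = 0; i < n; i++) {
                // The load of 'handle' is loop-invariant and may be hoisted. Without
                // the fence, 'this' could become unreachable while the loop still runs.
                s += EXTERNAL.get(handle)[i];
            }
            return s;
        } finally {
            // Keeps 'this' strongly reachable at least until this point.
            Reference.reachabilityFence(this);
        }
    }

    public static void main(String[] args) {
        System.out.println(new FencedResource(4).sum(4)); // prints 6
    }
}
```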

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344342049

From epeter at openjdk.org  Fri Sep 12 14:20:34 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 Sep 2025 14:20:34 GMT
Subject: RFR: 8367389: C2 SuperWord: refactor VTransform to model the whole
 loop instead of just the basic block
Message-ID: <qNs-pEYa6BR200XijY4JB8-DzdxjQPVR28zefXhEFNo=.89143141-f462-45e1-a1cb-cbe66dfcaf5d@github.com>

I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
https://github.com/openjdk/jdk/pull/20964
[See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)

This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.

------------------------------

**Goals**
- VTransform models **all nodes in the loop**, not just the basic block (enables later VTransform::optimize, like moving reductions out of the loop)
- Remove `_nodes` from the vector vtnodes.

**Details**
- Remove: `AUTO_VECTORIZATION2_AFTER_REORDER`, `apply_memops_reordering_with_schedule`, `print_memops_schedule`.
  - Instead of reordering the scalar memops, we create the new memory graph during `VTransform::apply`. That is why the `VTransformApplyState` now needs to track the memory states.
- Refactor `VLoopMemorySlices`: map not just memory slices with phis (have stores in loop), but also those with only loads (no phi).
- Create vtnodes for all nodes in the loop (not just the basic block), as well as inputs (already) and outputs (new). Mapping also the output nodes means during `apply`, we naturally connect the uses after the loop to their inputs from the loop (which may be new nodes after the transformation).
- `_mem_ref_for_main_loop_alignment` -> `_vpointer_for_main_loop_alignment`. Instead of tracking the memory node to later have access to its `VPointer`, we take it directly. That removes one more use of `_nodes` for vector vtnodes.
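The idea of building the memory graph on the fly during `apply` can be sketched as a per-slice "current memory state" map. This is a toy Java model, not the C++ `VTransformApplyState`. `Node`, `ApplyState`, and the slice numbering are hypothetical. A store slice starts from its loop phi. A load-only slice starts from the memory state entering the loop, because it has no phi.

```java
import java.util.HashMap;
import java.util.Map;

public class MemoryStateSketch {
    // Hypothetical stand-in for an IR node: a name plus its memory input (may be null).
    record Node(String name, Node memIn) {}

    // Tracks the current memory state of each slice while nodes are emitted in
    // schedule order, so the new memory graph is built as we go.
    static final class ApplyState {
        private final Map<Integer, Node> memState = new HashMap<>();

        ApplyState(Map<Integer, Node> initialStates) {
            memState.putAll(initialStates);
        }

        // A load reads the current state of its slice but does not change it.
        Node load(int slice, String name) {
            return new Node(name, memState.get(slice));
        }

        // A store consumes the current state and becomes the new state of its slice.
        Node store(int slice, String name) {
            Node st = new Node(name, memState.get(slice));
            memState.put(slice, st);
            return st;
        }

        Node current(int slice) {
            return memState.get(slice);
        }
    }

    public static void main(String[] args) {
        Node phi0 = new Node("phi0", null);     // slice 0 has stores, so it has a loop phi
        Node entry1 = new Node("entry1", null); // slice 1 is load-only: no phi
        ApplyState state = new ApplyState(Map.of(0, phi0, 1, entry1));
        Node st = state.store(0, "store0");
        System.out.println(st.memIn().name());                     // phi0
        System.out.println(state.load(0, "load0").memIn().name()); // store0
        System.out.println(state.load(1, "load1").memIn().name()); // entry1
    }
}
```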

I also made a lot of annotations in the code below, for easier review.

**Suggested order for review**
- Removal of `VTransformGraph::apply_memops_reordering_with_schedule` -> sets up need to build memory graph on the fly.
- Old and new code for `VLoopMemorySlices` -> we now also track load-only slices.
- `build_scalar_vtnodes_for_non_packed_nodes`, `build_inputs_for_scalar_vtnodes`, `build_uses_after_loop`, `apply_vtn_inputs_to_node` (use in `apply`), `apply_backedge`, `fix_memory_state_uses_after_loop`
- `VTransformApplyState`: how it now tracks the memory state.
- `VTransformVectorNode` -> removal of `_nodes` (Big Win!)
- Then look at all the other details.

-------------

Commit messages:
 - fix documentation
 - mem_ref -> vpointer
 - wip rm nodes
 - control dependency
 - phi cleanup
 - apply_backedge
 - hook inputs
 - apply
 - wip init memory state
 - small improvement
 - ... and 6 more: https://git.openjdk.org/jdk/compare/2826d170...3ec3ea2a

Changes: https://git.openjdk.org/jdk/pull/27208/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27208&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367389
  Stats: 690 lines in 10 files changed: 363 ins; 243 del; 84 mod
  Patch: https://git.openjdk.org/jdk/pull/27208.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27208/head:pull/27208

PR: https://git.openjdk.org/jdk/pull/27208

From missa at openjdk.org  Fri Sep 12 20:32:54 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Fri, 12 Sep 2025 20:32:54 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v12]
In-Reply-To: <r7BF7aD9Fdk0lipCH8Z0UBddG3buXXIa3SsA3smDNvc=.b5e36dc9-4d07-4b8d-abd4-7d449842e85b@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <t2javAPv4fqPJOS4or2dIL2lU1jcI6F_Dk88kPEJ2KE=.c9607d43-3d92-437c-8d6d-73558b55b0dd@github.com>
 <r7BF7aD9Fdk0lipCH8Z0UBddG3buXXIa3SsA3smDNvc=.b5e36dc9-4d07-4b8d-abd4-7d449842e85b@github.com>
Message-ID: <_p9DjOv5DH3cP7WAD4Sf4f9pxil8WEx7cf7d-6Od1XI=.b28fbc4a-6632-47f7-a098-860daabd9ec8@github.com>

On Thu, 11 Sep 2025 23:27:31 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Mohamed Issa has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Change the floating point conversion instruction, IR nodes, and test rules to make them clearer
>>  - Change debug text format of AVX 10.2 vector conversion instructions
>
> src/hotspot/cpu/x86/x86.ad line 7669:
> 
>> 7667:   predicate(!VM_Version::supports_avx10_2() &&
>> 7668:             !VM_Version::supports_avx512vl() &&
>> 7669:             Matcher::vector_length_in_bytes(n->in(1)) < 64 &&
> 
> Good to add "is_integral_type(Matcher::vector_element_basic_type(n)) &&" here.

Added

> test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java line 26:
> 
>> 24: /**
>> 25: * @test
>> 26: * @bug 8287835 8320347
> 
> Did you mean 8364305 here?

Yes, I was looking up a different one. I'll correct it now. Thanks.

> test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java line 364:
> 
>> 362:         applyIfCPUFeatureAnd = {"avx2", "true", "avx10_2", "false"})
>> 363:     @IR(counts = {IRNode.X86_VCAST_F2X_AVX10, "> 0"},
>> 364:         applyIfCPUFeature = {"avx10_2", "true"})
> 
> Need to add the following for X86_VCAST_F2X as well as X86_VCAST_F2X_AVX10.
> applyIfOr = {"AlignVector", "false", "UseCompactObjectHeaders", "false"},

Added

> test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java line 387:
> 
>> 385:         applyIfCPUFeatureAnd = {"avx2", "true", "avx10_2", "false"})
>> 386:     @IR(counts = {IRNode.X86_VCAST_F2X_AVX10, "> 0"},
>> 387:         applyIfCPUFeature = {"avx10_2", "true"})
> 
> Need to add the following for X86_VCAST_F2X as well as X86_VCAST_F2X_AVX10.
> applyIfOr = {"AlignVector", "false", "UseCompactObjectHeaders", "false"},

Added

> test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java line 413:
> 
>> 411:         applyIfCPUFeatureAnd = {"avx", "true", "avx10_2", "false"})
>> 412:     @IR(counts = {IRNode.X86_VCAST_D2X_AVX10, "> 0"},
>> 413:         applyIfCPUFeature = {"avx10_2", "true"})
> 
> Need to add the following for X86_VCAST_D2X and X86_VCAST_D2X_AVX10:
> applyIf = {"MaxVectorSize", ">=16"},

Added

> test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java line 432:
> 
>> 430:         applyIfCPUFeatureAnd = {"avx", "true", "avx10_2", "false"})
>> 431:     @IR(counts = {IRNode.X86_VCAST_D2X_AVX10, "> 0"},
>> 432:         applyIfCPUFeature = {"avx10_2", "true"})
> 
> Need to add the following for X86_VCAST_D2X and X86_VCAST_D2X_AVX10:
> applyIf = {"MaxVectorSize", ">=16"},

Added

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2345368353
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2345369890
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2345368764
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2345368997
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2345369217
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2345369350

From missa at openjdk.org  Fri Sep 12 20:32:53 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Fri, 12 Sep 2025 20:32:53 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v13]
In-Reply-To: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
Message-ID: <B6eIsPElSczUZ0wnfNkqxODeuxji1CVkB7O_XhY34V4=.0405478d-d4eb-4d5f-8a26-fc4b5ecf244e@github.com>

> Intel&reg; AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
> 
> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped to values (NaN -> 0, -Infinity, <= Integer.MIN_VALUE -> Integer.MIN_VALUE, +Infinity, >= Integer.MAX_VALUE -> Integer.MAX_VALUE) in the integer range with the help of a few temporary registers to store intermediate results.
> 
> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or  VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
> 
> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
> 9. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java`
> 10. `jtreg:test/hotspot/jtreg/com...
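The special-value mapping described above matches Java's narrowing semantics for `(int)` casts (JLS 5.1.3). Below is a minimal scalar sketch of the pre-AVX10.2 fix-up idea. It assumes the legacy truncating conversion returns the x86 "integer indefinite" value `0x80000000` for NaN and out-of-range inputs. `cvttss2siModel` and `f2iWithFixup` are hypothetical names, and the real HotSpot code sequences differ in detail. With the AVX10.2 saturating instructions, the hardware result already equals the Java result, so the fix-up branch is no longer needed.

```java
public class SaturatingF2I {
    // Model (assumption) of the legacy CVTTSS2SI result: truncate toward zero, but
    // return the "integer indefinite" value 0x80000000 for NaN or out-of-range input.
    static int cvttss2siModel(float f) {
        if (Float.isNaN(f) || f >= 0x1p31f || f < -0x1p31f) {
            return 0x80000000;
        }
        return (int) f; // in range: Java truncation matches the hardware result
    }

    // Fix-up sketch: only the indefinite value needs a second look, because it is
    // ambiguous (a genuine -2^31 input also produces it).
    static int f2iWithFixup(float f) {
        int r = cvttss2siModel(f);
        if (r != Integer.MIN_VALUE) {
            return r;
        }
        if (Float.isNaN(f)) {
            return 0;
        }
        return f > 0 ? Integer.MAX_VALUE : Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        float[] inputs = {Float.NaN, Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY,
                          3.9f, -3.9f, 1e10f, -1e10f, -0x1p31f};
        for (float f : inputs) {
            // Java's (int) cast already has the saturating semantics.
            if (f2iWithFixup(f) != (int) f) {
                throw new AssertionError("mismatch for " + f);
            }
        }
        System.out.println("ok");
    }
}
```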

Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:

  Add extra constraints to vector floating point conversion instruction predicates and tests

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26919/files
  - new: https://git.openjdk.org/jdk/pull/26919/files/df175756..025d815f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=12
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=11-12

  Stats: 11 lines in 3 files changed: 9 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/26919.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26919/head:pull/26919

PR: https://git.openjdk.org/jdk/pull/26919

From vlivanov at openjdk.org  Fri Sep 12 20:53:30 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Fri, 12 Sep 2025 20:53:30 GMT
Subject: RFR: 8367333: C2: Vector math operation intrinsification failure
Message-ID: <qpB4c9uXgeU4UbE9Zm_Pb-52QHm203ICO6IdT1w_bGQ=.be2804be-add6-4157-a335-dea1e0920e08@github.com>

As part of [JDK-8353786](https://bugs.openjdk.org/browse/JDK-8353786), C2 support for operations backed by the vector math library was completely removed. On JDK side, there is a special dispatching logic added to avoid intrinsic calls in `jdk.internal.vm.vector.VectorSupport`. But it's still possible to observe such paradoxical situations (intrinsic calls with obsolete operation IDs) when processing effectively dead code.

Consider `FloatVector::lanewiseTemplate`:

    FloatVector lanewiseTemplate(VectorOperators.Unary op) {
        if (opKind(op, VO_SPECIAL)) {
            ...                             
            else if (opKind(op, VO_MATHLIB)) {
                return unaryMathOp(op);
            }
        }
        int opc = opCode(op);
        return VectorSupport.unaryOp(opc, ...);
    }


At runtime, `unaryMathOp` is unconditionally invoked, but during compilation it's possible to end up with an intrinsification attempt of `VectorSupport.unaryOp()` before `opKind(op, VO_SPECIAL)` is inlined. 

It can be reliably reproduced with the `-XX:+StressIncrementalInlining` flag.

The fix is to make intrinsification fail fast rather than crash the VM.
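The fail-fast approach can be sketched as a Java model only. The actual change lives in HotSpot's C++ intrinsic code, and `tryInlineUnaryOp`, `SUPPORTED`, and the operation IDs below are hypothetical. An unsupported operation ID makes the intrinsic attempt give up, so the regular call is kept. The ID is not treated as an impossible state.

```java
import java.util.Set;

public class FailFastIntrinsic {
    // Hypothetical operation IDs; MATHLIB_OP models an ID whose C2 support was removed.
    static final int OP_NEG = 1, OP_ABS = 2, MATHLIB_OP = 100;
    static final Set<Integer> SUPPORTED = Set.of(OP_NEG, OP_ABS);

    // Returns true if the call was replaced by an intrinsic. An unsupported ID can
    // show up on a path that is only later proven dead, so it is rejected (the
    // regular call is kept) instead of crashing the compiler.
    static boolean tryInlineUnaryOp(int opc) {
        if (!SUPPORTED.contains(opc)) {
            return false;
        }
        // ... a real compiler would emit the vector node here ...
        return true;
    }

    public static void main(String[] args) {
        System.out.println(tryInlineUnaryOp(OP_NEG));     // true
        System.out.println(tryInlineUnaryOp(MATHLIB_OP)); // false
    }
}
```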

Testing: tier1 - tier4

-------------

Commit messages:
 - fix

Changes: https://git.openjdk.org/jdk/pull/27263/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27263&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367333
  Stats: 168 lines in 3 files changed: 168 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/27263.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27263/head:pull/27263

PR: https://git.openjdk.org/jdk/pull/27263

From dlong at openjdk.org  Fri Sep 12 22:15:12 2025
From: dlong at openjdk.org (Dean Long)
Date: Fri, 12 Sep 2025 22:15:12 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v7]
In-Reply-To: <uuKRfLk4kK4ri09mjxk_-1kS0hTGaNBZDCs3K7aXU5A=.0c1dfb6e-d0ce-4c8c-9259-38ef54f18048@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <uuKRfLk4kK4ri09mjxk_-1kS0hTGaNBZDCs3K7aXU5A=.0c1dfb6e-d0ce-4c8c-9259-38ef54f18048@github.com>
Message-ID: <YKOrPyGjNn_bZ_pX19eJLXCtqzNUX0v802ZFuJ1g_V0=.e4cf02b9-13dd-4fd3-86cc-be5cbc83b6ea@github.com>

On Fri, 12 Sep 2025 13:08:01 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.
>> 
>> ### Changeset
>> 
>> - Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
>> - Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
>> - Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
>> - Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
>> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename InfiniteStack_size and fix style of touched code
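The difference between the two renamed size constants is plain arithmetic. Here is a sketch with a hypothetical helper. It assumes the word-sized view packs the int-sized elements (one per word on 32-bit platforms, two on 64-bit) and rounds up.

```java
public class RegMaskSizes {
    // Hypothetical helper: convert a size counted in 32-bit ints into a size
    // counted in machine words, rounding up to a whole word.
    static int sizeInWords(int sizeInInts, int bitsPerWord) {
        int intsPerWord = bitsPerWord / 32;                  // 1 on 32-bit, 2 on 64-bit
        return (sizeInInts + intsPerWord - 1) / intsPerWord; // round up
    }

    public static void main(String[] args) {
        System.out.println(sizeInWords(5, 64)); // 3
        System.out.println(sizeInWords(5, 32)); // 5
    }
}
```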

Marked as reviewed by dlong (Reviewer).

src/hotspot/share/opto/regmask.hpp line 173:

> 171:   }
> 172: 
> 173:   void set_infinite() {

Suggestion:

  void set_infinite_stack() {

For consistency with `is_infinite_stack()`.
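A toy model (not HotSpot's `RegMask`; all names hypothetical) shows why "all-stack" was misleading and why the setter and query should share the `_infinite_stack` name. Bits below `finiteSize` are stored explicitly and may include some stack slots. The infinite flag covers only the bits beyond that range, so setting it says nothing about the stack slots in the finite part.

```java
import java.util.BitSet;

public class InfiniteMaskSketch {
    private final BitSet finite = new BitSet();
    private final int finiteSize;  // explicitly stored bits (registers and some stack slots)
    private boolean infiniteStack; // if set, every bit at index >= finiteSize is implicitly set

    InfiniteMaskSketch(int finiteSize) {
        this.finiteSize = finiteSize;
    }

    void insert(int bit) {
        if (bit >= finiteSize) {
            throw new IllegalArgumentException("bit outside finite part: " + bit);
        }
        finite.set(bit);
    }

    // Setter and query use matching names, as suggested in the review.
    void setInfiniteStack() { infiniteStack = true; }

    boolean isInfiniteStack() { return infiniteStack; }

    boolean member(int bit) {
        return bit < finiteSize ? finite.get(bit) : infiniteStack;
    }
}
```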

-------------

PR Review: https://git.openjdk.org/jdk/pull/27215#pullrequestreview-3218936271
PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2345519216

From dlong at openjdk.org  Fri Sep 12 22:25:18 2025
From: dlong at openjdk.org (Dean Long)
Date: Fri, 12 Sep 2025 22:25:18 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v4]
In-Reply-To: <wydAw2cAmqyLR3bPxtOSUTr4CbSvVLqV4g6ZD-N6UXw=.3c9a5166-943c-44c2-aa50-eae4b05c4a22@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <XqX_nPhu_ej5IgW2_z3DFreGRjyfUELqEEiIic2Bg9c=.7ed11c6f-0bbd-48b2-b80f-4d71010886f2@github.com>
 <19P8X88PcVoh8x62iBz5baOyefoWgGp-aFd_Bli-vm0=.fbd446d3-0dad-41e8-bb13-ab113a3b9767@github.com>
 <-hb0UM3-WL9h72oPsOvK5NA2-pYCyDQZS42R7FzPJ3s=.5dbaea39-391e-4e25-9e9e-f97940d3bc06@github.com>
 <wydAw2cAmqyLR3bPxtOSUTr4CbSvVLqV4g6ZD-N6UXw=.3c9a5166-943c-44c2-aa50-eae4b05c4a22@github.com>
Message-ID: <lLC10-3Ppn9IcN7DxndCJLVL3tO2i6FkCzQ3CvgmiUI=.faffa2a1-41a4-49bd-aa5a-614e5d7f85eb@github.com>

On Fri, 12 Sep 2025 12:40:29 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I renamed this entire group of constants to use the same style (uppercase separated by `_`, without leading `_`). It is now `WORD_BIT_MASK`. I think it makes more sense to use the same style across `regmask.hpp`, rather than following styles in other files.
>
> Nice!

I would have been OK with CamelCase, but then I would have wondered why the constant wasn't defined in globalDefinitions.hpp instead :-)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2345530581

From dlong at openjdk.org  Fri Sep 12 22:44:26 2025
From: dlong at openjdk.org (Dean Long)
Date: Fri, 12 Sep 2025 22:44:26 GMT
Subject: RFR: 8327963: C2: fix construction of memory graph around
 Initialize node to prevent incorrect execution if allocation is removed [v12]
In-Reply-To: <PsAetiA4N_lr7Mz7DJKMP7v-pVoRV9LZTvDC0tuNvWw=.4da18f35-c089-4926-a5d4-bcafcb3ab0e3@github.com>
References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
 <S83l4ZBZILBlhRpSoVORBxzNNB9bF5xN-QhRWc16_T4=.6d5c1624-5d77-4b3e-9db0-0e01e6b1b36b@github.com>
 <BuKfkAAcusJ6TNHSHtVaYYcmjnAVTIInXbhd4Z5Fg5w=.067f6b09-67e0-4b97-9753-c727c67343ca@github.com>
 <PsAetiA4N_lr7Mz7DJKMP7v-pVoRV9LZTvDC0tuNvWw=.4da18f35-c089-4926-a5d4-bcafcb3ab0e3@github.com>
Message-ID: <_YXE9yfxaouyeyMsdurEy_uEx0FJDbGcX8M8L7aDqm0=.770ff0aa-8ae3-46ac-8cc1-7d38710e859e@github.com>

On Fri, 12 Sep 2025 07:26:20 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> src/hotspot/share/opto/loopTransform.cpp line 3992:
>> 
>>> 3990:   Node* frame = new ParmNode(C->start(), TypeFunc::FramePtr);
>>> 3991:   _igvn.register_new_node_with_optimizer(frame);
>>> 3992:   call->init_req(TypeFunc::FramePtr,  frame);
>> 
>> This seems unrelated.  Is it needed?
>
> It's one of the things mentioned in that comment:
> https://github.com/openjdk/jdk/pull/24570#issuecomment-2883651987
> 
> "I added asserts to catch cases where proj_out is called but the node has more than one matching projection. With those asserts, I caught some false positive/cases where we got lucky and worked around them by reworking the code so it doesn't use proj_out. That's the case in PhaseIdealLoop::intrinsify_fill(): we can end up there with more than one FramePtr projection because the code pattern used elsewhere is to add one more projection and let identical projections common during igvn. "

Are we just lucky that we don't have the same problem with ReturnAdr here?
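The hazard in the quoted comment is a node that temporarily carries two identical projections until IGVN commons them. It can be modelled with a small lookup that refuses ambiguous matches. This is a sketch with hypothetical names, not HotSpot's `proj_out`.

```java
import java.util.List;

public class ProjOutSketch {
    // Hypothetical stand-in for a projection of a multi-output node such as a call.
    record Proj(int con, String label) {}

    // Returns the projection with index 'con', or null if there is none. Like an
    // assert-guarded proj_out, it rejects the ambiguous case of several matches.
    static Proj uniqueProj(List<Proj> outs, int con) {
        Proj found = null;
        for (Proj p : outs) {
            if (p.con() == con) {
                if (found != null) {
                    throw new IllegalStateException("more than one projection with con " + con);
                }
                found = p;
            }
        }
        return found;
    }
}
```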

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2345548490

From missa at openjdk.org  Sat Sep 13 01:21:12 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Sat, 13 Sep 2025 01:21:12 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v14]
In-Reply-To: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
Message-ID: <4Eui7URmA1Y5NPrrV4813qb7UUsNVSRP-JSnPdX0Ojg=.4db7c50e-18cd-47ec-ae8c-4ae17597b286@github.com>

> Intel&reg; AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
> 
> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped to values (NaN -> 0, -Infinity, <= Integer.MIN_VALUE -> Integer.MIN_VALUE, +Infinity, >= Integer.MAX_VALUE -> Integer.MAX_VALUE) in the integer range with the help of a few temporary registers to store intermediate results.
> 
> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or  VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
> 
> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
> 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java`
> 10. `jtreg:test/hotspot/jtreg...

Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:

  Introduce scalar floating point conversion tests with IR rules

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26919/files
  - new: https://git.openjdk.org/jdk/pull/26919/files/025d815f..5d26ff48

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=13
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=12-13

  Stats: 262 lines in 3 files changed: 252 ins; 0 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/26919.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26919/head:pull/26919

PR: https://git.openjdk.org/jdk/pull/26919

From epeter at openjdk.org  Sat Sep 13 04:52:10 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Sat, 13 Sep 2025 04:52:10 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v7]
In-Reply-To: <uuKRfLk4kK4ri09mjxk_-1kS0hTGaNBZDCs3K7aXU5A=.0c1dfb6e-d0ce-4c8c-9259-38ef54f18048@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <uuKRfLk4kK4ri09mjxk_-1kS0hTGaNBZDCs3K7aXU5A=.0c1dfb6e-d0ce-4c8c-9259-38ef54f18048@github.com>
Message-ID: <K3qIOyUwT8HFfGZUS03imyxKa2-ZkzDF3l43Zl7zw08=.e99c7a86-4bad-4244-b6a6-a68da598599d@github.com>

On Fri, 12 Sep 2025 13:08:01 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.
>> 
>> ### Changeset
>> 
>> - Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
>> - Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
>> - Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
>> - Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
>> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename InfiniteStack_size and fix style of touched code

Changes requested by epeter (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/27215#pullrequestreview-3219509250

From epeter at openjdk.org  Sat Sep 13 04:52:12 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Sat, 13 Sep 2025 04:52:12 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v7]
In-Reply-To: <YKOrPyGjNn_bZ_pX19eJLXCtqzNUX0v802ZFuJ1g_V0=.e4cf02b9-13dd-4fd3-86cc-be5cbc83b6ea@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <uuKRfLk4kK4ri09mjxk_-1kS0hTGaNBZDCs3K7aXU5A=.0c1dfb6e-d0ce-4c8c-9259-38ef54f18048@github.com>
 <YKOrPyGjNn_bZ_pX19eJLXCtqzNUX0v802ZFuJ1g_V0=.e4cf02b9-13dd-4fd3-86cc-be5cbc83b6ea@github.com>
Message-ID: <4TEpZlUksghJKcxz5Vd0kxvlt13fr-Z-4xdHD91NFtQ=.1ab10a84-1bb7-4704-a86b-dadbda0e38f4@github.com>

On Fri, 12 Sep 2025 22:12:03 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Rename InfiniteStack_size and fix style of touched code
>
> src/hotspot/share/opto/regmask.hpp line 173:
> 
>> 171:   }
>> 172: 
>> 173:   void set_infinite() {
> 
> Suggestion:
> 
>   void set_infinite_stack() {
> 
> For consistency with `is_infinite_stack()`.

Yes, it should be `set_infinite_stack` in parallel with `is_infinite_stack`, nice catch!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2345896440

From jbhateja at openjdk.org  Sat Sep 13 08:40:27 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sat, 13 Sep 2025 08:40:27 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v14]
In-Reply-To: <4Eui7URmA1Y5NPrrV4813qb7UUsNVSRP-JSnPdX0Ojg=.4db7c50e-18cd-47ec-ae8c-4ae17597b286@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <4Eui7URmA1Y5NPrrV4813qb7UUsNVSRP-JSnPdX0Ojg=.4db7c50e-18cd-47ec-ae8c-4ae17597b286@github.com>
Message-ID: <h5zYzw4-3S7--SEB5eAQakfXk41ytIDP2rAAyaSnnfM=.45cb037a-6503-4e5b-b90e-9df9fc3a4bb4@github.com>

On Sat, 13 Sep 2025 01:21:12 GMT, Mohamed Issa <missa at openjdk.org> wrote:

>> Intel&reg; AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
>> 
>> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped to values (NaN -> 0, -Infinity, <= Integer.MIN_VALUE -> Integer.MIN_VALUE, +Infinity, >= Integer.MAX_VALUE -> Integer.MAX_VALUE) in the integer range with the help of a few temporary registers to store intermediate results.
>> 
>> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or  VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
>> 
>> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
>> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
>> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
>> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
>> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
>> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
>> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
>> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
>> 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java`...
>
> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Introduce scalar floating point conversion tests with IR rules

test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java line 70:

> 68:             float_arr[i] = ran.nextFloat(floor_val, ceil_val);
> 69:             double_arr[i] = ran.nextDouble(floor_val, ceil_val);
> 70:         }

Please use Generators instead of direct initialization.

test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java line 89:

> 87:             if (int_arr[i] != expected) {
> 88:                 throw new RuntimeException("Invalid result: int_arr[" + i + "] = " + int_arr[i] + " != " + expected);
> 89:             }

Use Verify.checkEQ instead.

test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java line 109:

> 107:             if (long_arr[i] != expected) {
> 108:                 throw new RuntimeException("Invalid result: long_arr[" + i + "] = " + long_arr[i] + " != " + expected);
> 109:             }

Use Verify.checkEQ; check out the relevant code in https://github.com/openjdk/jdk/tree/master/test/hotspot/jtreg/compiler/lib and its usages.

test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java line 122:

> 120:         checkf2short();
> 121:     }
> 122: 

What is the reason behind the additional level of abstraction? Why not manually inline this code?

test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java line 138:

> 136:         applyIfCPUFeature = {"avx10_2", "false"})
> 137:     @IR(counts = {IRNode.X86_SCONV_F2I_AVX10, "> 0"},
> 138:         applyIfCPUFeature = {"avx10_2", "true"})

These IR rules apply to CompilePhase.MATCHING.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2346076729
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2346078144
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2346080343
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2346082356
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2346097798

From hgreule at openjdk.org  Sun Sep 14 14:44:02 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Sun, 14 Sep 2025 14:44:02 GMT
Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v9]
In-Reply-To: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
Message-ID: <1ZCEMsPvSQaLGWRuNtO89LNP_XUeaz-edeIUrKwRCZY=.9dad5a02-c739-4e24-8692-8941f31e5a49@github.com>

> This change improves the precision of the `Mod(I|L)Node::Value()` functions.
> 
> I reordered the structure a bit. First, we handle constants, afterwards, we handle ranges. The bottom checks seem to be excessive (`Type::BOTTOM` is covered by using `isa_(int|long)()`, the local bottom is just the full range). Given we can even give reasonable bounds if only one input has any bounds, we don't want to return early.
> The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions.
> 
> ### Monotonicity
> 
> Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. For example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. This change therefore uses `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range).
> 
> ### Testing
> 
> I added tests for cases around the relevant bounds. I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something).
> 
> Please review and let me know what you think.
> 
> ### Other
> 
> The `UMod(I|L)Node`s were adjusted to be more in line with their signed variants. This change diverges them again, but similar improvements could be made after #17508.
>
> While experimenting with these changes, I stumbled upon a few things that aren't directly related to this change but might be worth looking into further:
> - If the divisor is a constant, we directly replace the `Mod(I|L)Node` with more, but less expensive, nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, which means we miss potential cases where this would help, e.g., removing range checks. Would it make sense to delay the replacement?
> - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd.
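
To make the range reasoning and the monotonicity argument concrete, here is a deliberately simplified interval sketch in Java. It is not the PR's `Mod(I|L)Node::Value` logic; `ModRangeSketch`, `Range`, and `mod` are hypothetical, and the meet is modeled as the hull of two ranges, consistent with the `>=-2` example in the Monotonicity section.

```java
// Simplified interval sketch; NOT the actual Mod(I|L)Node::Value implementation.
// Class, record and method names are hypothetical.
final class ModRangeSketch {
    // Closed interval [lo, hi]; int bounds are held in longs to avoid overflow.
    record Range(long lo, long hi) {
        // Meet modeled as the hull of both ranges.
        Range meet(Range o) { return new Range(Math.min(lo, o.lo), Math.max(hi, o.hi)); }
        boolean contains(Range o) { return lo <= o.lo && o.hi <= hi; }
    }

    // Sound bounds for x % y, x in [xlo, xhi], y in [ylo, yhi]. Java's remainder
    // takes the sign of the dividend, and |x % y| <= |x| and |x % y| < |y|.
    static Range mod(long xlo, long xhi, long ylo, long yhi) {
        long m = Math.max(Math.abs(ylo), Math.abs(yhi)) - 1; // largest possible |x % y|
        if (m < 0) {
            return new Range(0, 0); // divisor can only be 0: answer ZERO rather than POS
        }
        long lo = xlo >= 0 ? 0 : Math.max(-m, xlo);
        long hi = xhi <= 0 ? 0 : Math.min(m, xhi);
        return new Range(lo, hi);
    }

    public static void main(String[] args) {
        long min = Integer.MIN_VALUE, max = Integer.MAX_VALUE;
        Range pos = new Range(0, max);      // old result for a 0 divisor (POS)
        Range zero = mod(min, max, 0, 0);   // new result for a 0 divisor (ZERO)
        Range three = mod(min, max, 3, 3);  // result once CCP sees divisor 3: [-2, 2]
        // CCP may only widen a type: [-2, 2] does not contain POS, and
        // pos.meet(three) is [-2, MAX] instead of [-2, 2]; ZERO has no such problem.
        System.out.println(three.contains(pos) + " " + pos.meet(three));
        System.out.println(three.contains(zero) + " " + zero.meet(three));
    }
}
```

These bounds rely only on the sign and magnitude properties of `%` noted in the comment, so they are sound but not always tight; for example, `mod(5, 7, 100, 100)` yields `[0, 7]` although the true range is `[5, 7]`.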

Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:

  remove unused parameter

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25254/files
  - new: https://git.openjdk.org/jdk/pull/25254/files/41d0e2c7..96602c67

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25254&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25254&range=07-08

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/25254.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25254/head:pull/25254

PR: https://git.openjdk.org/jdk/pull/25254

From hgreule at openjdk.org  Sun Sep 14 14:44:04 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Sun, 14 Sep 2025 14:44:04 GMT
Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v8]
In-Reply-To: <19JdaOkvM92QSjXvYVr1CNSXD5hkXINl1gh6qj-DCMQ=.6b268ebd-6c9a-4b33-b355-1dc41de53454@github.com>
References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
 <19JdaOkvM92QSjXvYVr1CNSXD5hkXINl1gh6qj-DCMQ=.6b268ebd-6c9a-4b33-b355-1dc41de53454@github.com>
Message-ID: <JOIcF4ymxbs1f3mt7Fr181o8fB0Nuj3DigYyTd6q8oY=.a6fb12bc-e4c3-4ef9-9381-bf64b8e7b2f0@github.com>

On Thu, 11 Sep 2025 17:42:46 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> This change improves the precision of the `Mod(I|L)Node::Value()` functions.
>> 
>> I reordered the structure a bit. First, we handle constants, afterwards, we handle ranges. The bottom checks seem to be excessive (`Type::BOTTOM` is covered by using `isa_(int|long)()`, the local bottom is just the full range). Given we can even give reasonable bounds if only one input has any bounds, we don't want to return early.
>> The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions.
>> 
>> ### Monotonicity
>> 
>> Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. For example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. This change therefore uses `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range).
>> 
>> ### Testing
>> 
>> I added tests for cases around the relevant bounds. I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something).
>> 
>> Please review and let me know what you think.
>> 
>> ### Other
>> 
>> The `UMod(I|L)Node`s were adjusted to be more in line with their signed variants. This change diverges them again, but similar improvements could be made after #17508.
>>
>> While experimenting with these changes, I stumbled upon a few things that aren't directly related to this change but might be worth looking into further:
>> - If the divisor is a constant, we directly replace the `Mod(I|L)Node` with more, but less expensive, nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, which means we miss potential cases where this would help, e.g., removing range checks. Would it make sense to delay the replacement?
>> - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd.
>
> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
> 
>   address comments

I noticed one parameter was unused, so I removed it. This shouldn't affect testing, I guess.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25254#issuecomment-3289599649

From duke at openjdk.org  Mon Sep 15 02:22:46 2025
From: duke at openjdk.org (erifan)
Date: Mon, 15 Sep 2025 02:22:46 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare
 [v12]
In-Reply-To: <mWHKJozX3YTvBOjLDUa1_2QQiNm6xVc2rb1FtXcZzPc=.e2405ec4-09ed-46c1-acd5-3f39083fce3f@github.com>
References: <mWHKJozX3YTvBOjLDUa1_2QQiNm6xVc2rb1FtXcZzPc=.e2405ec4-09ed-46c1-acd5-3f39083fce3f@github.com>
Message-ID: <AsmaBqIpxGH9HwDO0Zhxb5VCE3H-S5fUEbsm6a45Czw=.b0fb6a9d-1ad1-47bb-a921-71867163a81f@github.com>

> This patch optimizes the following patterns:
> For integer types:
> 
> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>     => (VectorMaskCmp src1 src2 ncond)
> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>     => (VectorMaskCmp src1 src2 ncond)
> 
> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult, or ugt; ncond is the negation of cond.
> 
> For float and double types:
> 
> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> 
> cond can be eq or ne.
> 
> Benchmarks on an Nvidia Grace machine with 128-bit SVE2, with option `-XX:UseSVE=2`:
> 
> Benchmark			Unit	Before		Score Error	After		Score Error	Uplift
> testCompareEQMaskNotByte	ops/s	7912127.225	2677.289518	10266136.26	8955.008548	1.29
> testCompareEQMaskNotDouble	ops/s	884737.6799	446.963779	1179760.772	448.031844	1.33
> testCompareEQMaskNotFloat	ops/s	1765045.787	682.332214	2359520.803	896.305743	1.33
> testCompareEQMaskNotInt		ops/s	1787221.411	977.743935	2353952.519	960.069976	1.31
> testCompareEQMaskNotLong	ops/s	895297.1974	673.44808	1178449.02	323.804205	1.31
> testCompareEQMaskNotShort	ops/s	3339987.002	3415.2226	4712761.965	2110.862053	1.41
> testCompareGEMaskNotByte	ops/s	7907615.16	4094.243652	10251646.9	9486.699831	1.29
> testCompareGEMaskNotInt		ops/s	1683738.958	4233.813092	2352855.205	1251.952546	1.39
> testCompareGEMaskNotLong	ops/s	854496.1561	8594.598885	1177811.493	521.1229	1.37
> testCompareGEMaskNotShort	ops/s	3341860.309	1578.975338	4714008.434	1681.10365	1.41
> testCompareGTMaskNotByte	ops/s	7910823.674	2993.367032	10245063.58	9774.75138	1.29
> testCompareGTMaskNotInt		ops/s	1673393.928	3153.099431	2353654.521	1190.848583	1.4
> testCompareGTMaskNotLong	ops/s	849405.9159	2432.858159	1177952.041	359.96413	1.38
> testCompareGTMaskNotShort	ops/s	3339509.141	3339.976585	4711442.496	2673.364893	1.41
> testCompareLEMaskNotByte	ops/s	7911340.004	3114.69191	10231626.5	27134.20035	1.29
> testCompareLEMaskNotInt		ops/s	1675812.113	1340.969885	2353255.341	1452.4522	1.4
> testCompareLEMaskNotLong	ops/s	848862.8036	6564.841731	1177763.623	539.290106	1.38
> testCompareLEMaskNotShort	ops/s	3324951.54	2380.29473	4712116.251	1544.559684	1.41
> testCompareLTMaskNotByte	ops/s	7910390.844	2630.861436	10239567.69	6487.441672	1.29
> testCompareLTMaskNotInt		ops/s	1672180.09	995.238142	2353757.863	853.774734	1.4
> testCompareLTMaskNotLong	ops/s	856502.26...
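
The rewrite relies on `not(a cond b)` being equivalent to `a ncond b`. A scalar sketch of that equivalence follows; `Cond`, `negate`, and `test` are hypothetical stand-ins for the idea behind `BoolTest::negate_mask`, not HotSpot code. It also shows a plausible reason why only eq and ne are handled for float and double: with NaN operands, `!(a < b)` and `a >= b` disagree, whereas `!(a == b)` and `a != b` always agree.

```java
// Scalar model of folding not(compare(a, b, cond)) into compare(a, b, negate(cond)).
// The enum and its methods are hypothetical illustrations, not HotSpot's BoolTest.
enum Cond {
    EQ, NE, LT, GE, LE, GT, ULT, UGE, ULE, UGT;

    // For integers, !(a cond b) == (a cond.negate() b) for every cond.
    Cond negate() {
        return switch (this) {
            case EQ -> NE;
            case NE -> EQ;
            case LT -> GE;
            case GE -> LT;
            case LE -> GT;
            case GT -> LE;
            case ULT -> UGE;
            case UGE -> ULT;
            case ULE -> UGT;
            case UGT -> ULE;
        };
    }

    boolean test(int a, int b) {
        int u = Integer.compareUnsigned(a, b);
        return switch (this) {
            case EQ -> a == b;
            case NE -> a != b;
            case LT -> a < b;
            case GE -> a >= b;
            case LE -> a <= b;
            case GT -> a > b;
            case ULT -> u < 0;
            case UGE -> u >= 0;
            case ULE -> u <= 0;
            case UGT -> u > 0;
        };
    }

    // IEEE 754 float comparisons (signed conditions only).
    boolean test(float a, float b) {
        return switch (this) {
            case EQ -> a == b;
            case NE -> a != b;
            case LT -> a < b;
            case GE -> a >= b;
            case LE -> a <= b;
            case GT -> a > b;
            default -> throw new IllegalArgumentException("no unsigned float compare: " + this);
        };
    }

    public static void main(String[] args) {
        // EQ/NE survive NaN: !(NaN == 1) and (NaN != 1) are both true.
        System.out.println(!EQ.test(Float.NaN, 1f) + " " + NE.test(Float.NaN, 1f));
        // LT/GE do not: !(NaN < 1) is true, but (NaN >= 1) is false.
        System.out.println(!LT.test(Float.NaN, 1f) + " " + GE.test(Float.NaN, 1f));
    }
}
```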

erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision:

 - Simplify JMH testing
 - Merge branch 'master' into JDK-8354242
 - Update the code comment
 - Align indentation
 - Merge branch 'master' into JDK-8354242
 - Address more comments
   
   ATT.
 - Merge branch 'master' into JDK-8354242
 - Support negating unsigned comparison for BoolTest::mask
   
   Added a static method `negate_mask(mask btm)` into BoolTest class to
   negate both signed and unsigned comparison.
 - Addressed some review comments
 - Merge branch 'master' into JDK-8354242
 - ... and 10 more: https://git.openjdk.org/jdk/compare/4d660b21...52bbd3cd

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24674/files
  - new: https://git.openjdk.org/jdk/pull/24674/files/04142a19..52bbd3cd

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=11
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=10-11

  Stats: 129948 lines in 3408 files changed: 76187 ins; 35380 del; 18381 mod
  Patch: https://git.openjdk.org/jdk/pull/24674.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674

PR: https://git.openjdk.org/jdk/pull/24674

From duke at openjdk.org  Mon Sep 15 02:30:21 2025
From: duke at openjdk.org (erifan)
Date: Mon, 15 Sep 2025 02:30:21 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare
 [v11]
In-Reply-To: <Dy9rqrrgUAsownC_lhp5729sObjNAXlCQs6RwOZusCQ=.12827ffc-d4c8-4a98-a309-653af5a97519@github.com>
References: <mWHKJozX3YTvBOjLDUa1_2QQiNm6xVc2rb1FtXcZzPc=.e2405ec4-09ed-46c1-acd5-3f39083fce3f@github.com>
 <HKiejePdRHy-xJNBBNnw09SHkkOpY0EWaVIvS-xg36E=.55c2c30d-f84b-4ca0-a4a8-a25ffbd31236@github.com>
 <Dy9rqrrgUAsownC_lhp5729sObjNAXlCQs6RwOZusCQ=.12827ffc-d4c8-4a98-a309-653af5a97519@github.com>
Message-ID: <bmUZ1_1zQGNoHLj81ccIb0NFgnhDTJG2BMaJLh-wwuc=.ccb4ba93-6406-4d5d-a6b0-55c5cf02149c@github.com>

On Tue, 9 Sep 2025 13:03:03 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> erifan has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update the code comment
>
> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 1007:
> 
>> 1005:         testCompareMaskNotFloat(F_SPECIES, VectorOperators.NE, fa, fninf, (m) -> { return F_SPECIES.maskAll(true).xor(m); });
>> 1006:         verifyResultsFloat(F_SPECIES, VectorOperators.NE, fa, fninf);
>> 1007:     }
> 
> Do you have test cases for conditions other than `EQ` and `NE`? After all, we don't want someone to accidentally break the logic you implemented later without us noticing the bug ;)

For `float` and `double`, only `EQ` and `NE` are supported, so the positive tests only cover these two operators. We also have a negative test for the other, unsupported operators; see `testCompareMaskNotFloatNegative`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2347761055

From duke at openjdk.org  Mon Sep 15 03:34:19 2025
From: duke at openjdk.org (erifan)
Date: Mon, 15 Sep 2025 03:34:19 GMT
Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of
 vector compress
In-Reply-To: <ZBjcahJBoRNtaFQl_Fqxfyl9nLiYdvxbB8Sd-bhBkyA=.5af061c8-e9b0-4d70-a3f5-8025e56b7a23@github.com>
References: <tP4CE7n1GaoLjhB6BfpKBNaX_GCTwI-cJjUbanwm4Qg=.eb29b9ce-10b0-4d5c-8407-25deda9c3f6d@github.com>
 <ZBjcahJBoRNtaFQl_Fqxfyl9nLiYdvxbB8Sd-bhBkyA=.5af061c8-e9b0-4d70-a3f5-8025e56b7a23@github.com>
Message-ID: <IQFCLCgRj2unKoPmC1XYhLLzGRsv316IStEALqkLEIQ=.e7574d3c-6a0f-41f3-8b7b-eb5eefe0c4ca@github.com>

On Thu, 11 Sep 2025 06:10:59 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> The AArch64 SVE and SVE2 architectures lack an instruction suitable for subword-type `compress` operations. Therefore, the current implementation uses the 32-bit SVE `compact` instruction to compress subword types by first widening the high and low parts to 32 bits, compressing them, and then narrowing them back to their original type. Finally, the high and low parts are merged using the `index + tbl` instructions.
>> 
>> This approach is significantly slower than on architectures with native support. After evaluating all available AArch64 SVE instructions and experimenting with various implementations (such as looping over the active elements, extraction, and insertion), I confirmed that the existing algorithm is optimal given the instruction set. However, there is still room for optimization in the following two aspects:
>> 1. Merging with `index + tbl` is suboptimal due to the high latency of the `index` instruction.
>> 2. For partial subword types, operations on the highest half are unnecessary because those bits are invalid.
>> 
>> This pull request introduces the following changes:
>> 1. Replaces `index + tbl` with the `whilelt + splice` instructions, which offer lower latency and higher throughput.
>> 2. Eliminates unnecessary compress operations for partial subword type cases.
>> 3. For `sve_compress_byte`, one less temporary register is used to alleviate potential register pressure.
>> 
>> Benchmark results demonstrate that these changes significantly improve performance.
>> 
>> Benchmarks on Nvidia Grace machine with 128-bit SVE:
>> 
>> Benchmark	            Unit	Before	 Error	After	 Error	Uplift
>> Byte128Vector.compress	ops/ms	4846.97	 26.23	6638.56	 31.60	1.36
>> Byte64Vector.compress	ops/ms	2447.69	 12.95	7167.68	 34.49	2.92
>> Short128Vector.compress	ops/ms	7174.88	 40.94	8398.45	 9.48	1.17
>> Short64Vector.compress	ops/ms	3618.72	 3.04	8618.22	 10.91	2.38
>> 
>> 
>> This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, and all tests passed.
>
> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2292:
> 
>> 2290:   // Return if the vector length is no more than MaxVectorSize/2, since the
>> 2291:   // highest half is invalid.
>> 2292:   if (vector_length_in_bytes <= (MaxVectorSize >> 1)) {
> 
> Couldn't this check be done first thing when the function is called? Then you would avoid unnecessary work?
> 
> I also wonder if this check should be done before `sve_compress_byte` is called, but I think at the very least it should be done first thing in this function.

We need to do the lower half, so I think there's no unnecessary work.
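
The merge step can be modeled with plain arrays. In this sketch (hypothetical names; the real code operates on SVE registers and, per the PR description, widens each half to 32-bit lanes before `compact`), each half is compressed independently and the results are joined the way a `whilelt` predicate plus `splice` would join them: keep the first `cntLow` lanes of the low result, then append the high result.

```java
// Scalar model of the "compress each half, then splice" strategy; helper names
// are hypothetical and this is not the HotSpot implementation.
final class CompressSketch {
    // Reference semantics on lanes [from, to): active lanes are packed toward the
    // lowest lane and the remaining lanes are zero.
    static byte[] compress(byte[] src, boolean[] mask, int from, int to) {
        byte[] out = new byte[to - from];
        int k = 0;
        for (int i = from; i < to; i++) {
            if (mask[i]) {
                out[k++] = src[i];
            }
        }
        return out;
    }

    static byte[] compressBySplice(byte[] src, boolean[] mask) {
        int n = src.length;
        int half = n / 2;
        byte[] lo = compress(src, mask, 0, half);  // compressed low half
        byte[] hi = compress(src, mask, half, n);  // compressed high half
        int cntLow = 0;                            // active lanes in the low half
        for (int i = 0; i < half; i++) {
            if (mask[i]) {
                cntLow++;
            }
        }
        // A whilelt-style predicate selects lanes [0, cntLow) of lo; the splice then
        // appends hi starting from its lane 0. Lanes past both parts stay zero.
        byte[] dst = new byte[n];
        System.arraycopy(lo, 0, dst, 0, cntLow);
        System.arraycopy(hi, 0, dst, cntLow, hi.length);
        return dst;
    }

    public static void main(String[] args) {
        byte[] src = {1, 2, 3, 4, 5, 6, 7, 8};
        boolean[] mask = {true, false, true, false, false, true, true, false};
        System.out.println(java.util.Arrays.toString(compressBySplice(src, mask)));
    }
}
```

Comparing `compressBySplice` against `compress(src, mask, 0, n)` for random inputs is a quick way to check that the splice-based merge preserves element order.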

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27188#discussion_r2347805971

From duke at openjdk.org  Mon Sep 15 05:43:11 2025
From: duke at openjdk.org (erifan)
Date: Mon, 15 Sep 2025 05:43:11 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare
 [v13]
In-Reply-To: <mWHKJozX3YTvBOjLDUa1_2QQiNm6xVc2rb1FtXcZzPc=.e2405ec4-09ed-46c1-acd5-3f39083fce3f@github.com>
References: <mWHKJozX3YTvBOjLDUa1_2QQiNm6xVc2rb1FtXcZzPc=.e2405ec4-09ed-46c1-acd5-3f39083fce3f@github.com>
Message-ID: <Uw3Zu_QQI6cPp7DUf4uHNO_ZsrSWS1jnnJJ3gtfXyi8=.c2a93a2c-f3cf-4489-98aa-e25abc448fd7@github.com>

> This patch optimizes the following patterns:
> For integer types:
> 
> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>     => (VectorMaskCmp src1 src2 ncond)
> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>     => (VectorMaskCmp src1 src2 ncond)
> 
> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult, or ugt; ncond is the negation of cond.
> 
> For float and double types:
> 
> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> 
> cond can be eq or ne.
> 
> Benchmarks on an Nvidia Grace machine with 128-bit SVE2, with option `-XX:UseSVE=2`:
> 
> Benchmark			Unit	Before		Score Error	After		Score Error	Uplift
> testCompareEQMaskNotByte	ops/s	7912127.225	2677.289518	10266136.26	8955.008548	1.29
> testCompareEQMaskNotDouble	ops/s	884737.6799	446.963779	1179760.772	448.031844	1.33
> testCompareEQMaskNotFloat	ops/s	1765045.787	682.332214	2359520.803	896.305743	1.33
> testCompareEQMaskNotInt		ops/s	1787221.411	977.743935	2353952.519	960.069976	1.31
> testCompareEQMaskNotLong	ops/s	895297.1974	673.44808	1178449.02	323.804205	1.31
> testCompareEQMaskNotShort	ops/s	3339987.002	3415.2226	4712761.965	2110.862053	1.41
> testCompareGEMaskNotByte	ops/s	7907615.16	4094.243652	10251646.9	9486.699831	1.29
> testCompareGEMaskNotInt		ops/s	1683738.958	4233.813092	2352855.205	1251.952546	1.39
> testCompareGEMaskNotLong	ops/s	854496.1561	8594.598885	1177811.493	521.1229	1.37
> testCompareGEMaskNotShort	ops/s	3341860.309	1578.975338	4714008.434	1681.10365	1.41
> testCompareGTMaskNotByte	ops/s	7910823.674	2993.367032	10245063.58	9774.75138	1.29
> testCompareGTMaskNotInt		ops/s	1673393.928	3153.099431	2353654.521	1190.848583	1.4
> testCompareGTMaskNotLong	ops/s	849405.9159	2432.858159	1177952.041	359.96413	1.38
> testCompareGTMaskNotShort	ops/s	3339509.141	3339.976585	4711442.496	2673.364893	1.41
> testCompareLEMaskNotByte	ops/s	7911340.004	3114.69191	10231626.5	27134.20035	1.29
> testCompareLEMaskNotInt		ops/s	1675812.113	1340.969885	2353255.341	1452.4522	1.4
> testCompareLEMaskNotLong	ops/s	848862.8036	6564.841731	1177763.623	539.290106	1.38
> testCompareLEMaskNotShort	ops/s	3324951.54	2380.29473	4712116.251	1544.559684	1.41
> testCompareLTMaskNotByte	ops/s	7910390.844	2630.861436	10239567.69	6487.441672	1.29
> testCompareLTMaskNotInt		ops/s	1672180.09	995.238142	2353757.863	853.774734	1.4
> testCompareLTMaskNotLong	ops/s	856502.26...

erifan has updated the pull request incrementally with one additional commit since the last revision:

  Add an IR rule for vector mask cast operation

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24674/files
  - new: https://git.openjdk.org/jdk/pull/24674/files/52bbd3cd..56bb34ff

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=12
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=11-12

  Stats: 40 lines in 1 file changed: 40 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/24674.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674

PR: https://git.openjdk.org/jdk/pull/24674

From duke at openjdk.org  Mon Sep 15 05:43:14 2025
From: duke at openjdk.org (erifan)
Date: Mon, 15 Sep 2025 05:43:14 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare
 [v11]
In-Reply-To: <Dy9rqrrgUAsownC_lhp5729sObjNAXlCQs6RwOZusCQ=.12827ffc-d4c8-4a98-a309-653af5a97519@github.com>
References: <mWHKJozX3YTvBOjLDUa1_2QQiNm6xVc2rb1FtXcZzPc=.e2405ec4-09ed-46c1-acd5-3f39083fce3f@github.com>
 <HKiejePdRHy-xJNBBNnw09SHkkOpY0EWaVIvS-xg36E=.55c2c30d-f84b-4ca0-a4a8-a25ffbd31236@github.com>
 <Dy9rqrrgUAsownC_lhp5729sObjNAXlCQs6RwOZusCQ=.12827ffc-d4c8-4a98-a309-653af5a97519@github.com>
Message-ID: <PBIddzyhuyq2_vDIkQPqdy0rM57w_MiAbHYO1tv48eA=.5591bbc5-57ce-49fc-8e8a-5ef1710a46f6@github.com>

On Tue, 9 Sep 2025 13:04:48 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> erifan has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update the code comment
>
> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 911:
> 
>> 909:         testCompareMaskNotLong(L_SPECIES_FOR_CAST, VectorOperators.UGE, (m) -> { return m.cast(I_SPECIES_FOR_CAST).not(); });
>> 910:         verifyResultsLong(L_SPECIES_FOR_CAST, VectorOperators.UGE);
>> 911:     }
> 
> You have some cast in here, and in similar tests.
> Can you add an IR rule to check if we do or do not have the expected casts?

Done.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2347920402

From duke at openjdk.org  Mon Sep 15 05:46:36 2025
From: duke at openjdk.org (erifan)
Date: Mon, 15 Sep 2025 05:46:36 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare
 [v7]
In-Reply-To: <aPgI3IJisrH2EWUdZmjG-3STzFMsUnBDcuzY2060JuQ=.3d5b8674-4233-47cc-b556-fedab2f11359@github.com>
References: <mWHKJozX3YTvBOjLDUa1_2QQiNm6xVc2rb1FtXcZzPc=.e2405ec4-09ed-46c1-acd5-3f39083fce3f@github.com>
 <dCstHcUFS9A79fKEf3RWnPrxvnzKjyVfbBzyT_iyzYo=.19255391-54fb-445e-b7e8-faf016e8a79f@github.com>
 <jc11aMooMRS54e6I3rd0HyobUW38VG_SbP60BoHUu48=.6ad63307-03bb-4171-bfa6-4f40741a1fc6@github.com>
 <NOSjg9nd8YCpTLPchcVXO2KxOzfTmYuxaQHqZhmHGUo=.e98cf933-0c08-4761-8210-75d56ece7542@github.com>
 <tLkj61MwZSaQEeLO3reAqAWfAMbs_hcR4wVXuUNpu5E=.197c558b-665f-4d7d-8f0c-97031a0ccf16@github.com>
 <15TW6hiffz65NhHevPefL_6swSC07UD-GwiJ4tPDtFs=.b83081df-8abd-4756-b4e0-1d969678a0d2@github.com>
 <FFyeak7o5Plkg2ljHZD05VetZ9uI81UnZN1sc65ZqAg=.201bccb4-361c-4869-baac-d73c49f5f8d7@github.com>
 <aPgI3IJisrH2EWUdZmjG-3STzFMsUnBDcuzY2060JuQ=.3d5b8674-4233-47cc-b556-fedab2f11359@github.com>
Message-ID: <wnuHxPrB-sz6ZPejrDntdoDzrwwdcYbryzswrAE_8TA=.3d97f0b2-31c3-47d0-b904-767527cb5b73@github.com>

On Wed, 10 Sep 2025 07:43:20 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Hi @eme64 @theRealAph @XiaohongGong @fg1417 @shqking ,  could you help take a look at this PR, thanks
>
> @erifan Sounds good. No rush, it takes as long as it takes. I'll soon be on vacation too and may not respond until mid-October.

Hi @eme64, I have addressed all of your suggestions except one that I think has already been covered. Could you please have a look at this PR when you have a chance? Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-3290572086

From duke at openjdk.org  Mon Sep 15 05:55:43 2025
From: duke at openjdk.org (erifan)
Date: Mon, 15 Sep 2025 05:55:43 GMT
Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI
 expand operation [v4]
In-Reply-To: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
References: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
Message-ID: <fDFpIq8vnu5rCRyytTRyBgARzXew-IsyKU6rXZmDLlc=.17b763e9-ca3a-4a75-b57e-75af5d11a9ef@github.com>

> Currently, on AArch64, the VectorAPI `expand` operation is intrinsified for 32-bit and 64-bit types only when SVE2 is available. In the following cases, `expand` has not yet been intrinsified:
> 1. **Subword types** on SVE2-capable hardware.
> 2. **All types** on NEON and SVE1 environments.
> 
> As a result, `expand` API performance is very poor in these scenarios. This patch intrinsifies the `expand` operation in the above environments.
> 
> Since there are no native instructions directly corresponding to `expand` in these cases, this patch mainly leverages the `TBL` instruction to implement `expand`. To compute the index input for `TBL`, the prefix sum algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used. Take a 128-bit byte vector on SVE2 as an example:
> 
> To compute: dst = src.expand(mask)
> Data direction: high <== low
> Input:
>   src                         = p o n m l k j i h g f e d c b a
>   mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
> Expected result:
>   dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
> 
> Step 1: calculate the index input of the TBL instruction.
> 
> // Set tmp1 as all 0 vector.
> tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 
> // Move the mask bits from the predicate register to a vector register.
> // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
> tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
> 
> // Shift the entire register. Prefix sum algorithm.
> dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
> tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
> 
> dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
> tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
> 
> dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
> tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
> 
> dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
> tmp2 += dst                   = 8 8 8 7 6 6 6 5 4 4 4 3 2 2 2 1
> 
> // Clear inactive elements.
> dst = sel(mask, tmp2, tmp1)   = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1
> 
> // Set the inactive lane value to -1 and set the active lane to the target index.
> dst -= 1                      = -1 -1 7 6 -1 -1 5 4 -1 -1 3 2 -1 -1 1 0
> 
> Step 2: shuffle the source vector elements to the target vector
> 
> tbl(dst, src, dst)            = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
> 
> 
> The same algorithm is used for NEON and SVE1, but with different instructions where appropriate.
> 
> The following benchmarks are from panama-...
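
The two steps above can be simulated lane by lane in Java. This is a sketch under the diagram's conventions (lane 0 is the rightmost column); `ExpandSketch` is a hypothetical name, each 8-, 16-, 32-, and 64-bit register shift becomes a shift by 1, 2, 4, and 8 byte lanes, and `TBL`'s rule that out-of-range indices produce 0 is modeled by the `idx >= 0` check, since -1 is 255 as an unsigned byte index.

```java
// Scalar model of expand via an inclusive prefix sum and a table lookup
// (hypothetical names; lane 0 is the lowest lane, i.e. the rightmost column above).
final class ExpandSketch {
    static byte[] expand(byte[] src, boolean[] mask) {
        int n = src.length;
        int[] tmp2 = new int[n];
        for (int i = 0; i < n; i++) tmp2[i] = mask[i] ? 1 : 0; // predicate -> 0/1 lanes
        // Log-step inclusive prefix sum: shift by 1, 2, 4, ... lanes and add.
        for (int shift = 1; shift < n; shift <<= 1) {
            int[] shifted = new int[n];              // lanes moved toward the high end
            for (int i = shift; i < n; i++) shifted[i] = tmp2[i - shift];
            for (int i = 0; i < n; i++) tmp2[i] += shifted[i];
        }
        // Keep sums only in active lanes, then subtract 1: active lanes get their
        // source index, inactive lanes get -1 (out of range for the lookup).
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = (mask[i] ? tmp2[i] : 0) - 1;
        // TBL: out-of-range indices produce 0.
        byte[] dst = new byte[n];
        for (int i = 0; i < n; i++) dst[i] = (idx[i] >= 0 && idx[i] < n) ? src[idx[i]] : 0;
        return dst;
    }

    public static void main(String[] args) {
        byte[] src = new byte[16];
        boolean[] mask = new boolean[16];
        for (int i = 0; i < 16; i++) {
            src[i] = (byte) ('a' + i);  // lane 0 = 'a', ..., lane 15 = 'p'
            mask[i] = (i % 4) < 2;      // 1 1 0 0 repeating from lane 0, as in the example
        }
        byte[] dst = expand(src, mask);
        StringBuilder sb = new StringBuilder();
        for (int i = 15; i >= 0; i--) { // print the high lane first, like the diagram
            sb.append(dst[i] == 0 ? "0" : String.valueOf((char) dst[i])).append(' ');
        }
        System.out.println(sb.toString().trim()); // 0 0 h g 0 0 f e 0 0 d c 0 0 b a
    }
}
```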

erifan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:

 - Merge branch 'master' into JDK-8363989
 - Align code example data for better reading
 - Merge branch 'master' into JDK-8363989
 - Improve the comment of the vector expand implementation
 - Merge branch 'master' into JDK-8363989
 - 8363989: AArch64: Add missing backend support of VectorAPI expand operation
   
   Currently, on AArch64, the VectorAPI `expand` operation is intrinsified
   for 32-bit and 64-bit types only when SVE2 is available. In the following
   cases, `expand` has not yet been intrinsified:
   1. **Subword types** on SVE2-capable hardware.
   2. **All types** on NEON and SVE1 environments.
   
   As a result, `expand` API performance is very poor in these scenarios.
   This patch intrinsifies the `expand` operation in the above environments.
   
   Since there are no native instructions directly corresponding to `expand`
   in these cases, this patch mainly leverages the `TBL` instruction to
   implement `expand`. To compute the index input for `TBL`, the prefix sum
   algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used.
   Take a 128-bit byte vector on SVE2 as an example:
   ```
   To compute: dst = src.expand(mask)
   Data direction: high <== low
   Input:
     src                         = p o n m l k j i h g f e d c b a
     mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
   Expected result:
     dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
   ```
   Step 1: calculate the index input of the TBL instruction.
   ```
   // Set tmp1 as all 0 vector.
   tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   
   // Move the mask bits from the predicate register to a vector register.
   // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
   tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
   
   // Shift the entire register. Prefix sum algorithm.
   dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
   tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
   
   dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
   tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
   
   dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
   tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
   
   dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
   tmp2 += dst                   = 8 8 8 7 6 6 6 5 4 4 4 3 2 2 2 1
   
   // Clear inactive elements.
   dst = sel(mask, tmp2, tmp1)   = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1
   
   // Set the inactive lane value to -1 and set the active lane to the target index.
   dst -= 1                      = -1 -1 7 6 -1 -1 5 4 -1 -1 3 2 -1 -1 1 0
   ```
   Step 2: shuffle the source vector elements to the target vector
   ```
   tbl(dst, src, dst)            = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
   ```
   
   The same algorithm is used for NEON and SVE1, but with different
   instructions where appropriate.
   
   The following benchmarks are from panama-vector/vectorIntrinsics.
   
   On Nvidia Grace machine with option `-XX:UseSVE=2`:
   ```
   Benchmark		Unit	Before		Score Error	After		Score Error	Uplift
   Byte128Vector.expand	ops/ms	1791.022366	5.619883	9633.388683	1.968788	5.37
   Double128Vector.expand	ops/ms	4489.255846	0.48485		4488.772949	0.491596	0.99
   Float128Vector.expand	ops/ms	8863.02424	6.888087	8908.352235	51.487453	1
   Int128Vector.expand	ops/ms	8873.485683	3.275682	8879.635643	1.243863	1
   Long128Vector.expand	ops/ms	4485.1149	4.458073	4489.365269	0.851093	1
   Short128Vector.expand	ops/ms	792.068834	2.640398	5880.811288	6.40683		7.42
   Byte64Vector.expand	ops/ms	854.455002	8.548982	5999.046295	37.209987	7.02
   Double64Vector.expand	ops/ms	46.49763	0.104773	46.526043	0.102451	1
   Float64Vector.expand	ops/ms	4510.596811	0.504477	4509.984244	1.519178	0.99
   Int64Vector.expand	ops/ms	4508.778322	1.664461	4535.216611	26.742484	1
   Long64Vector.expand	ops/ms	45.665462	0.705485	46.496232	0.075648	1.01
   Short64Vector.expand	ops/ms	394.527324	1.284691	3860.199621	0.720015	9.78
   ```
   
   On an Nvidia Grace machine with option `-XX:UseSVE=1`:
   ```
   Benchmark		Unit	Before		Score Error	After		Score Error	Uplift
   Byte128Vector.expand	ops/ms	1767.314171	12.431526	9630.892248	1.478813	5.44
   Double128Vector.expand	ops/ms	197.614381	0.945541	2416.075281	2.664325	12.22
   Float128Vector.expand	ops/ms	390.878183	2.089234	3844.011978	3.792751	9.83
   Int128Vector.expand	ops/ms	394.550044	2.025371	3843.280133	3.528017	9.74
   Long128Vector.expand	ops/ms	198.366863	0.651726	2423.234639	4.911434	12.21
   Short128Vector.expand	ops/ms	790.044704	3.339363	5885.595035	1.440598	7.44
   Byte64Vector.expand	ops/ms	853.479119	7.158898	5942.750116	1.054905	6.96
   Double64Vector.expand	ops/ms	46.550458	0.079191	46.423053	0.057554	0.99
   Float64Vector.expand	ops/ms	197.977215	1.156535	2445.010767	1.992358	12.34
   Int64Vector.expand	ops/ms	198.326857	1.02785		2444.211583	2.5432		12.32
   Long64Vector.expand	ops/ms	46.526513	0.25779		45.984253	0.566691	0.98
   Short64Vector.expand	ops/ms	398.649412	1.87764		3837.495773	3.528926	9.62
   ```
   
   On an Nvidia Grace machine with option `-XX:UseSVE=0`:
   ```
   Benchmark		Unit	Before		Score Error	After		Score Error	Uplift
   Byte128Vector.expand	ops/ms	1802.98702	6.906394	9427.491602	2.067934	5.22
   Double128Vector.expand	ops/ms	198.498191	0.429071	1190.476326	0.247358	5.99
   Float128Vector.expand	ops/ms	392.849005	2.034676	2373.195574	2.006566	6.04
   Int128Vector.expand	ops/ms	395.69179	2.194773	2372.084745	2.058303	5.99
   Long128Vector.expand	ops/ms	198.191673	1.476362	1189.712301	1.006821	6
   Short128Vector.expand	ops/ms	795.785831	5.62611		4731.514053	2.365213	5.94
   Byte64Vector.expand	ops/ms	843.549268	7.174254	5865.556155	37.639415	6.95
   Double64Vector.expand	ops/ms	45.943599	0.484743	46.529755	0.111551	1.01
   Float64Vector.expand	ops/ms	193.945993	0.943338	1463.836772	0.618393	7.54
   Int64Vector.expand	ops/ms	194.168021	0.492286	1473.004575	8.802656	7.58
   Long64Vector.expand	ops/ms	46.570488	0.076372	46.696353	0.078649	1
   Short64Vector.expand	ops/ms	387.973334	2.367312	2920.428114	0.863635	7.52
   ```
   
   JTReg test cases have been added for the above changes. The patch was
   tested on both aarch64 and x64; all tier1, tier2 and tier3 tests passed.

-------------

Changes: https://git.openjdk.org/jdk/pull/26740/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26740&range=03
  Stats: 485 lines in 9 files changed: 388 ins; 12 del; 85 mod
  Patch: https://git.openjdk.org/jdk/pull/26740.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26740/head:pull/26740

PR: https://git.openjdk.org/jdk/pull/26740

From chagedorn at openjdk.org  Mon Sep 15 07:02:22 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 15 Sep 2025 07:02:22 GMT
Subject: RFR: 8366940: Test
 compiler/loopopts/superword/TestAliasingFuzzer.java timed out
In-Reply-To: <-iszfG2luNYZtsMxsMnsDWoQIscvcY37XSpi8fDDcEE=.2d26cf70-ba2e-4967-a083-50867f291784@github.com>
References: <-iszfG2luNYZtsMxsMnsDWoQIscvcY37XSpi8fDDcEE=.2d26cf70-ba2e-4967-a083-50867f291784@github.com>
Message-ID: <IpVJewXYYf1PixPJ2EkzPiCRv0_7jeu95ah5Rnfj67Q=.d70d33f0-76a8-45ac-b8e1-48d09360f855@github.com>

On Fri, 12 Sep 2025 12:01:25 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> `TestAliasingFuzzer.java` generates 30 subtests for every run. They are randomized. Some vectorize and execute faster, some fail to vectorize and execute slower.
> 
> Hence, some natural variance in the duration is expected.
> On most machines, the "Running Tests" phase seems to take about 30-50sec (total test time about 35-70sec). But on some machines (macosx-x64-debug), execution is a bit slower: 60-100sec in "Running Tests", with some outliers at 110+sec. These occasionally trip the 120sec timeout, and when they do, they somehow cause the harness to take an excessive 9+min to shut everything down.
> 
> Solutions:
> - Option 1: generate fewer tests in `TestAliasingFuzzer.java`. That would be sad: the test has now found 2 real bugs within 2 weeks.
> - Option 2: increase the test timeout. That is what I'll do, because the "outliers" that caused the timeouts were not far from all other cases on the same platform, so they are acceptable.

That looks reasonable!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27257#pullrequestreview-3223155288

From epeter at openjdk.org  Mon Sep 15 07:02:23 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 15 Sep 2025 07:02:23 GMT
Subject: RFR: 8366940: Test
 compiler/loopopts/superword/TestAliasingFuzzer.java timed out
In-Reply-To: <cywK2fr_VL5axWsEvS5Ucs8krhZWTMKm7b2j18j_Q7s=.6c6965b7-8c2b-4bc2-9395-00555c2a865c@github.com>
References: <-iszfG2luNYZtsMxsMnsDWoQIscvcY37XSpi8fDDcEE=.2d26cf70-ba2e-4967-a083-50867f291784@github.com>
 <cywK2fr_VL5axWsEvS5Ucs8krhZWTMKm7b2j18j_Q7s=.6c6965b7-8c2b-4bc2-9395-00555c2a865c@github.com>
Message-ID: <zF84kxgjJgZgaSTHHJjwL8uvC32tovncfi19crIcWgs=.249717e9-973c-4774-ab28-d0c3ddc2c4ab@github.com>

On Fri, 12 Sep 2025 12:32:00 GMT, SendaoYan <syan at openjdk.org> wrote:

>> `TestAliasingFuzzer.java` generates 30 subtests for every run. They are randomized. Some vectorize and execute faster, some fail to vectorize and execute slower.
>> 
>> Hence, some natural variance in the duration is expected.
>> On most machines, the "Running Tests" phase seems to take about 30-50sec (total test time about 35-70sec). But on some machines (macosx-x64-debug), execution is a bit slower: 60-100sec in "Running Tests", with some outliers at 110+sec. These occasionally trip the 120sec timeout, and when they do, they somehow cause the harness to take an excessive 9+min to shut everything down.
>> 
>> Solutions:
>> - Option 1: generate fewer tests in `TestAliasingFuzzer.java`. That would be sad: the test has now found 2 real bugs within 2 weeks.
>> - Option 2: increase the test timeout. That is what I'll do, because the "outliers" that caused the timeouts were not far from all other cases on the same platform, so they are acceptable.
>
> Marked as reviewed by syan (Committer).

@sendaoYan @chhagedorn Thanks for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27257#issuecomment-3290745075

From epeter at openjdk.org  Mon Sep 15 07:02:24 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 15 Sep 2025 07:02:24 GMT
Subject: Integrated: 8366940: Test
 compiler/loopopts/superword/TestAliasingFuzzer.java timed out
In-Reply-To: <-iszfG2luNYZtsMxsMnsDWoQIscvcY37XSpi8fDDcEE=.2d26cf70-ba2e-4967-a083-50867f291784@github.com>
References: <-iszfG2luNYZtsMxsMnsDWoQIscvcY37XSpi8fDDcEE=.2d26cf70-ba2e-4967-a083-50867f291784@github.com>
Message-ID: <Sknbc01pbjXCQ7GsskulTuuoEzrTJda8O8Ak6Zlnids=.94dc8df8-9158-4830-ab2d-8be4bfcff77d@github.com>

On Fri, 12 Sep 2025 12:01:25 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> `TestAliasingFuzzer.java` generates 30 subtests for every run. They are randomized. Some vectorize and execute faster, some fail to vectorize and execute slower.
> 
> Hence, some natural variance in the duration is expected.
> On most machines, the "Running Tests" phase seems to take about 30-50sec (total test time about 35-70sec). But on some machines (macosx-x64-debug), execution is a bit slower: 60-100sec in "Running Tests", with some outliers at 110+sec. These occasionally trip the 120sec timeout, and when they do, they somehow cause the harness to take an excessive 9+min to shut everything down.
> 
> Solutions:
> - Option 1: generate fewer tests in `TestAliasingFuzzer.java`. That would be sad: the test has now found 2 real bugs within 2 weeks.
> - Option 2: increase the test timeout. That is what I'll do, because the "outliers" that caused the timeouts were not far from all other cases on the same platform, so they are acceptable.

This pull request has now been integrated.

Changeset: cf00f96f
Author:    Emanuel Peter <epeter at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/cf00f96fd49ac7e6e04fdde74a3015531a0b59c8
Stats:     2 lines in 1 file changed: 0 ins; 0 del; 2 mod

8366940: Test compiler/loopopts/superword/TestAliasingFuzzer.java timed out

Reviewed-by: syan, chagedorn

-------------

PR: https://git.openjdk.org/jdk/pull/27257

From jbhateja at openjdk.org  Mon Sep 15 08:20:35 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 15 Sep 2025 08:20:35 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v8]
In-Reply-To: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
Message-ID: <HPgkmQwoaMXSWdXiMXkbqSoMnI13yPpPBJSPxZKTxnc=.f978ac37-462f-496e-b5ec-bf3005cb7e5a@github.com>

> This patch optimizes PopCount value transforms using KnownBits information.
> The following are the results of the micro-benchmark included with the patch:
> 
> 
> 
> System: 13th Gen Intel(R) Core(TM) i3-1315U
> 
> Baseline:
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s
> 
> With opt:
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s
> 
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin
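For readers new to the idea, here is one way known bits can bound a popcount. This is a sketch, not the patch's implementation. It assumes known bits are tracked as a pair of masks (bits known to be 0, bits known to be 1). `PopCountBounds.bitCountRange` is a hypothetical helper, not HotSpot's KnownBits API.

```java
// Hypothetical sketch: bound Integer.bitCount(x) from the known bits of x.
// Not HotSpot code; known bits are modelled as two plain int masks.
class PopCountBounds {
    // Returns {min, max} of Integer.bitCount(x) over every x consistent with
    // the known bits. knownZeros: bits known to be 0; knownOnes: bits known to be 1.
    static int[] bitCountRange(int knownZeros, int knownOnes) {
        if ((knownZeros & knownOnes) != 0) {
            throw new IllegalArgumentException("a bit cannot be known to be both 0 and 1");
        }
        int min = Integer.bitCount(knownOnes);                  // known-one bits are always set
        int max = Integer.SIZE - Integer.bitCount(knownZeros);  // any bit not known to be 0 may be set
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        // x = (y & 0xFF) | 0x100 for an unknown y: bits 9..31 are known 0, bit 8 is known 1.
        int[] r = bitCountRange(~0x1FF, 0x100);
        System.out.println("bitCount(x) in [" + r[0] + ", " + r[1] + "]"); // bitCount(x) in [1, 9]
    }
}
```

When the two bounds coincide, the popcount folds to a constant. Otherwise the range can still narrow the type of the popcount result.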

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Extending the random ranges

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27075/files
  - new: https://git.openjdk.org/jdk/pull/27075/files/a7f9b79c..278f1dc8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=06-07

  Stats: 29 lines in 1 file changed: 2 ins; 6 del; 21 mod
  Patch: https://git.openjdk.org/jdk/pull/27075.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27075/head:pull/27075

PR: https://git.openjdk.org/jdk/pull/27075

From chagedorn at openjdk.org  Mon Sep 15 09:14:13 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 15 Sep 2025 09:14:13 GMT
Subject: RFR: 8361702: C2: assert(is_dominator(compute_early_ctrl(limit,
 limit_ctrl), pre_end)) failed: node pinned on loop exit test? [v8]
In-Reply-To: <4Yzeo6gJlk-Jq5zlh3P9HPCm57-7AwIqsywOWbawzcI=.13938c72-a9d4-463d-a54c-a08c70482a6b@github.com>
References: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com>
 <4Yzeo6gJlk-Jq5zlh3P9HPCm57-7AwIqsywOWbawzcI=.13938c72-a9d4-463d-a54c-a08c70482a6b@github.com>
Message-ID: <kTPVf6rbq844SSoLk9wLYKxdL6RD7myinShIFZXagHo=.9afe0c48-2ed6-4c48-ba9c-4a982518c62a@github.com>

On Fri, 12 Sep 2025 07:27:02 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> A node in a pre loop only has uses out of the loop dominated by the
>> loop exit. `PhaseIdealLoop::try_sink_out_of_loop()` sets its control
>> to the loop exit projection. A range check in the main loop has this
>> node as input (through a chain of some other nodes). Range check
>> elimination needs to update the exit condition of the pre loop with an
>> expression that depends on the node pinned on its exit: that's
>> impossible and the assert fires. This is a variant of 8314024 (this
>> one was for a node with uses out of the pre loop on multiple paths). I
>> propose the same fix: leave the node with control in the pre loop in
>> this case.
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
> 
>   review

Marked as reviewed by chagedorn (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/26424#pullrequestreview-3223618579

From shade at openjdk.org  Mon Sep 15 09:21:44 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 15 Sep 2025 09:21:44 GMT
Subject: RFR: 8367333: C2: Vector math operation intrinsification failure
In-Reply-To: <qpB4c9uXgeU4UbE9Zm_Pb-52QHm203ICO6IdT1w_bGQ=.be2804be-add6-4157-a335-dea1e0920e08@github.com>
References: <qpB4c9uXgeU4UbE9Zm_Pb-52QHm203ICO6IdT1w_bGQ=.be2804be-add6-4157-a335-dea1e0920e08@github.com>
Message-ID: <sf50SUSio_XVYb96O-NS1MBeKxaiWFpYxdn97cHXlo4=.e6ac17c4-bed2-4ddf-8686-f23809c19a89@github.com>

On Fri, 12 Sep 2025 19:14:18 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

> As part of [JDK-8353786](https://bugs.openjdk.org/browse/JDK-8353786), C2 support for operations backed by the vector math library was completely removed. On the JDK side, special dispatching logic was added to avoid intrinsic calls into `jdk.internal.vm.vector.VectorSupport`. But it's still possible to observe such paradoxical situations (intrinsic calls with obsolete operation IDs) when processing effectively dead code.
> 
> Consider `FloatVector::lanewiseTemplate`:
> 
>     FloatVector lanewiseTemplate(VectorOperators.Unary op) {
>         if (opKind(op, VO_SPECIAL)) {
>             ...                             
>             else if (opKind(op, VO_MATHLIB)) {
>                 return unaryMathOp(op);
>             }
>         }
>         int opc = opCode(op);
>         return VectorSupport.unaryOp(opc, ...);
>     }
> 
> 
> At runtime, `unaryMathOp` is always invoked for math-library operations, but during compilation it's possible to end up attempting to intrinsify `VectorSupport.unaryOp()` before `opKind(op, VO_SPECIAL)` is inlined.
> 
> It can be reliably reproduced with the `-XX:+StressIncrementalInlining` flag.
> 
> The fix is to make intrinsification fail fast rather than crash the VM.
> 
> Testing: tier1 - tier4

Looks reasonable! Thanks.

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27263#pullrequestreview-3223639697

From chagedorn at openjdk.org  Mon Sep 15 09:30:23 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 15 Sep 2025 09:30:23 GMT
Subject: RFR: 8362394: C2: Repeated stacked string concatenation fails with
 "Hit MemLimit" and other resourcing errors [v4]
In-Reply-To: <ImE4GMvRS0mguhEym1s84tDliD6VdBzqsLi_7LVkiiE=.2c7a9e8a-16a8-4b68-a67b-12e3be3317cc@github.com>
References: <oE4pDFEgcIH13lUcCbdn20KwW63_9RRpaZCsmNPZzWQ=.832b9063-9bdc-413a-9741-b7d6bb629e8a@github.com>
 <ImE4GMvRS0mguhEym1s84tDliD6VdBzqsLi_7LVkiiE=.2c7a9e8a-16a8-4b68-a67b-12e3be3317cc@github.com>
Message-ID: <aDSHQ9CGMUkZ9imN6klCEknFfCK-hL24HSXv_ZM1y5Q=.3a39c449-1d01-43e9-831d-c976cc8b4a93@github.com>

On Thu, 21 Aug 2025 07:41:32 GMT, Daniel Skantz <dskantz at openjdk.org> wrote:

>> This PR addresses a bug in the stringopts phase. During string concatenation, repeated stacking of concatenations can lead to excessive compilation resource use and generation of questionable code as the merging of two StringBuilder-append-toString links sc1 and sc2 can result in a new StringBuilder with the size sc1->num_arguments() * sc2->num_arguments().
>> 
>> In the attached test, the size of the successively merged StringBuilder doubles on each merge (there are 24 of them), as the toString result of the first component is used twice in the second component [1], etc. Not only does the compiler hang on this test case, but the string concat optimization also seems to produce an arbitrary number of back-to-back stores in the generated code, depending on the number of stacked concatenations.
>> 
>> The proposed solution is to put an upper bound on the size of a merged concatenation, which guards against this case of repeated concatenations on the same string variable, and potentially other edge cases. 100 seems like a generous limit, and much higher limits might not be restrictive enough, as each argument corresponds to about 20 new nodes later in replace_string_concat [2].
>> 
>> [1] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L303
>> 
>> [2] https://github.com/openjdk/jdk/blob/0ceb366dc26e2e4f6252da9dd8930b016a5d46ba/src/hotspot/share/opto/stringopts.cpp#L1806
>> 
>> Testing: T1-4.
>> 
>> Extra testing: verified that no method in T1-4 is compiled with a merged concat candidate exceeding the suggested limit of 100 arguments, regardless of whether the later checks verify_control_flow() and verify_mem_flow() pass.
>
> Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision:
> 
>   compare order
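The growth described in the quoted PR text can be illustrated with a simple counting model. This is a sketch under two assumptions: every level appends the previous `toString()` result exactly twice, and the innermost concatenation has two arguments. `StackedConcatBound.mergedArguments` is hypothetical. It only mimics the "bail out when the merged argument count exceeds the bound" check and is not the actual `stringopts.cpp` code.

```java
// Hypothetical counting model of stacked string-concat merging with an upper bound.
// Not the stringopts.cpp implementation; it only illustrates the doubling.
class StackedConcatBound {
    // Merged argument count after `merges` levels of stacking, or -1 if it
    // would exceed `bound` (modelling "give up on the merged concat").
    static long mergedArguments(int merges, long bound) {
        long args = 2;                 // innermost concat, e.g. s + s
        for (int i = 0; i < merges; i++) {
            args *= 2;                 // previous result appended twice, each copy inlined
            if (args > bound) {
                return -1;             // bail out instead of building a huge concat
            }
        }
        return args;
    }

    public static void main(String[] args) {
        System.out.println(mergedArguments(24, Long.MAX_VALUE)); // 33554432
        System.out.println(mergedArguments(24, 256));            // -1
        System.out.println(mergedArguments(7, 256));             // 256
    }
}
```

In this model, 24 doublings starting from 2 arguments give 2^25 arguments. At about 20 nodes per argument, that is far beyond a reasonable node budget. A bound of 100 or 256 stops the merging after only a few levels.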

The fix looks good to me, too!  A few small comments/suggestions.

src/hotspot/share/opto/stringopts.cpp line 56:

> 54:                                        // to restart at the initial JVMState.
> 55: 
> 56:   static constexpr uint STACKED_CONCAT_UPPER_BOUND = 256; // argument limit for a merged concat.

Can you add a comment how we ended up with 256?

src/hotspot/share/opto/stringopts.cpp line 319:

> 317:     // -- and bail out in that case.
> 318:     if (arguments_appended > STACKED_CONCAT_UPPER_BOUND) {
> 319:       return nullptr;

Should we also print an error message for `PrintOptimizeStringConcat` here?

test/hotspot/jtreg/compiler/stringopts/TestStackedConcatsMany.java line 26:

> 24: /*
> 25:  * @test
> 26:  * @bug 8357105

Wrong bug number:
Suggestion:

 * @bug 8362394

test/hotspot/jtreg/compiler/stringopts/TestStackedConcatsMany.java line 37:

> 35:  */
> 36: 
> 37: // The test uses -XX:-OptoScheduling to avoid the assert "too many D-U pinch points" on aarch64.

I assume this is due to JDK-8328078? Maybe you can also mention the bug number here for completeness.

-------------

PR Review: https://git.openjdk.org/jdk/pull/26685#pullrequestreview-3223638117
PR Review Comment: https://git.openjdk.org/jdk/pull/26685#discussion_r2348385681
PR Review Comment: https://git.openjdk.org/jdk/pull/26685#discussion_r2348379717
PR Review Comment: https://git.openjdk.org/jdk/pull/26685#discussion_r2348366153
PR Review Comment: https://git.openjdk.org/jdk/pull/26685#discussion_r2348364931

From chagedorn at openjdk.org  Mon Sep 15 09:32:26 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 15 Sep 2025 09:32:26 GMT
Subject: RFR: 8356779: IGV: dump the index of the SafePointNode containing
 the current JVMS during parsing
In-Reply-To: <dZIsHRR3HjgSVELUlQd4vQX4SeluQeiefbUxn2x_VoQ=.86c48c18-58fe-44ab-9cf8-b32c2c8d8827@github.com>
References: <dZIsHRR3HjgSVELUlQd4vQX4SeluQeiefbUxn2x_VoQ=.86c48c18-58fe-44ab-9cf8-b32c2c8d8827@github.com>
Message-ID: <i9BLYUQ_3MLCmz6Rs98x4zratzNeajnL_4vFaJ6QpBM=.64625c11-b517-4302-98f9-3c9190a72221@github.com>

On Thu, 4 Sep 2025 05:22:00 GMT, Saranya Natarajan <snatarajan at openjdk.org> wrote:

> This PR prints the index of the SafePointNode containing the current JVMS during parsing in IGV. As stated in JBS, the motivation is that there are many nodes during parsing, so it would be useful to know which nodes are currently in the local slots or on the stack when looking at a graph.

That's a good addition, looks good to me, too!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27083#pullrequestreview-3223687303

From jbhateja at openjdk.org  Mon Sep 15 09:38:14 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 15 Sep 2025 09:38:14 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare
 [v13]
In-Reply-To: <Uw3Zu_QQI6cPp7DUf4uHNO_ZsrSWS1jnnJJ3gtfXyi8=.c2a93a2c-f3cf-4489-98aa-e25abc448fd7@github.com>
References: <mWHKJozX3YTvBOjLDUa1_2QQiNm6xVc2rb1FtXcZzPc=.e2405ec4-09ed-46c1-acd5-3f39083fce3f@github.com>
 <Uw3Zu_QQI6cPp7DUf4uHNO_ZsrSWS1jnnJJ3gtfXyi8=.c2a93a2c-f3cf-4489-98aa-e25abc448fd7@github.com>
Message-ID: <4D63cqV0LkPYrSMSkfachZzoH_qpH9vhAbo57RRe1Js=.7a21d73b-7963-4e15-b013-8295b274d5d0@github.com>

On Mon, 15 Sep 2025 05:43:11 GMT, erifan <duke at openjdk.org> wrote:

>> This patch optimizes the following patterns:
>> For integer types:
>> 
>> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> 
>> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult or ugt; ncond is the negated comparison of cond.
>> 
>> For float and double types:
>> 
>> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> 
>> cond can be eq or ne.
>> 
>> Benchmarks on an Nvidia Grace machine with 128-bit SVE2, with option `-XX:UseSVE=2`:
>> 
>> Benchmark			Unit	Before		Score Error	After		Score Error	Uplift
>> testCompareEQMaskNotByte	ops/s	7912127.225	2677.289518	10266136.26	8955.008548	1.29
>> testCompareEQMaskNotDouble	ops/s	884737.6799	446.963779	1179760.772	448.031844	1.33
>> testCompareEQMaskNotFloat	ops/s	1765045.787	682.332214	2359520.803	896.305743	1.33
>> testCompareEQMaskNotInt		ops/s	1787221.411	977.743935	2353952.519	960.069976	1.31
>> testCompareEQMaskNotLong	ops/s	895297.1974	673.44808	1178449.02	323.804205	1.31
>> testCompareEQMaskNotShort	ops/s	3339987.002	3415.2226	4712761.965	2110.862053	1.41
>> testCompareGEMaskNotByte	ops/s	7907615.16	4094.243652	10251646.9	9486.699831	1.29
>> testCompareGEMaskNotInt		ops/s	1683738.958	4233.813092	2352855.205	1251.952546	1.39
>> testCompareGEMaskNotLong	ops/s	854496.1561	8594.598885	1177811.493	521.1229	1.37
>> testCompareGEMaskNotShort	ops/s	3341860.309	1578.975338	4714008.434	1681.10365	1.41
>> testCompareGTMaskNotByte	ops/s	7910823.674	2993.367032	10245063.58	9774.75138	1.29
>> testCompareGTMaskNotInt		ops/s	1673393.928	3153.099431	2353654.521	1190.848583	1.4
>> testCompareGTMaskNotLong	ops/s	849405.9159	2432.858159	1177952.041	359.96413	1.38
>> testCompareGTMaskNotShort	ops/s	3339509.141	3339.976585	4711442.496	2673.364893	1.41
>> testCompareLEMaskNotByte	ops/s	7911340.004	3114.69191	10231626.5	27134.20035	1.29
>> testCompareLEMaskNotInt		ops/s	1675812.113	1340.969885	2353255.341	1452.4522	1.4
>> testCompareLEMaskNotLong	ops/s	848862.8036	6564.841731	1177763.623	539.290106	1.38
>> testCompareLEMaskNotShort	ops/s	3324951.54	2380.29473	4712116.251	1544.559684	1.41
>> testCompareLTMaskNotByte	ops/s	7910390.844	2630.861436	10239567.69	6487.441672	1.29
>> testCompareLTMaskNotInt		ops/s	16721...
>
> erifan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add an IR rule for vector mask cast operation
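The rewrite in the quoted description relies on the identity `!(a cond b) == (a ncond b)`, because XOR with an all-ones vector or mask is a bitwise NOT. The following sketch checks that identity for scalar `int` comparisons, including the unsigned ones. It also shows why only eq/ne are safe for floating point, presumably because every ordered comparison involving NaN is false. The `Cond` enum and the helper methods are hypothetical and do not correspond to C2's `BoolTest` API.

```java
// Hypothetical scalar check of the negated-compare identity used by the rewrite.
// Names are illustrative only; this is not C2 code.
class NegatedCompare {
    enum Cond { EQ, NE, LT, GE, LE, GT, ULT, UGE, ULE, UGT }

    // ncond: the comparison that is true exactly when cond is false (for ints).
    static Cond negate(Cond c) {
        switch (c) {
            case EQ:  return Cond.NE;
            case NE:  return Cond.EQ;
            case LT:  return Cond.GE;
            case GE:  return Cond.LT;
            case LE:  return Cond.GT;
            case GT:  return Cond.LE;
            case ULT: return Cond.UGE;
            case UGE: return Cond.ULT;
            case ULE: return Cond.UGT;
            case UGT: return Cond.ULE;
            default:  throw new AssertionError(c);
        }
    }

    static boolean test(Cond c, int a, int b) {
        switch (c) {
            case EQ:  return a == b;
            case NE:  return a != b;
            case LT:  return a < b;
            case GE:  return a >= b;
            case LE:  return a <= b;
            case GT:  return a > b;
            case ULT: return Integer.compareUnsigned(a, b) < 0;
            case UGE: return Integer.compareUnsigned(a, b) >= 0;
            case ULE: return Integer.compareUnsigned(a, b) <= 0;
            case UGT: return Integer.compareUnsigned(a, b) > 0;
            default:  throw new AssertionError(c);
        }
    }

    // Float comparisons: ordered comparisons with NaN are always false,
    // so !(a < b) is not the same as a >= b; EQ/NE still negate each other.
    static boolean testFloat(Cond c, float a, float b) {
        switch (c) {
            case EQ: return a == b;
            case NE: return a != b;
            case LT: return a < b;
            case GE: return a >= b;
            default: throw new IllegalArgumentException(c.toString());
        }
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE};
        for (Cond c : Cond.values()) {
            for (int a : samples) {
                for (int b : samples) {
                    if (!test(c, a, b) != test(negate(c), a, b)) {
                        throw new AssertionError(c + " " + a + " " + b);
                    }
                }
            }
        }
        // !(NaN < 1) is true, yet NaN >= 1 is false.
        System.out.println(!testFloat(Cond.LT, Float.NaN, 1f) + " " + testFloat(Cond.GE, Float.NaN, 1f));
    }
}
```

Running `main` prints `true false`. This shows that negating LT into GE would be wrong once NaN is involved.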

Your benchmark and code changes look good to me. Thanks for addressing my comments.

-------------

Marked as reviewed by jbhateja (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24674#pullrequestreview-3223705838

From eastigeevich at openjdk.org  Mon Sep 15 09:38:33 2025
From: eastigeevich at openjdk.org (Evgeny Astigeevich)
Date: Mon, 15 Sep 2025 09:38:33 GMT
Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache
 [v42]
In-Reply-To: <Da1gP_hlbGUK6sKpxKMPME5zLsoTCU6UYXKXXpkkQQA=.8f5b0159-26e6-4ec5-96f1-c9d16573a1de@github.com>
References: <CpuGkGuFlcXd3ZwuZCG8oWEEa2GKgTs3LaGwpIESm9g=.4807c870-5cce-4f87-aca7-79c1b87e7b0a@github.com>
 <FaROpkj7dY-WEQp3-3tIXs1qY0ChbkPNyqOB0-Ip45w=.7d9afb70-e3be-441b-9420-6683713a5556@github.com>
 <Da1gP_hlbGUK6sKpxKMPME5zLsoTCU6UYXKXXpkkQQA=.8f5b0159-26e6-4ec5-96f1-c9d16573a1de@github.com>
Message-ID: <t4vyoH7Dg8Lq8uE2KAzU0tRbD6LVaF5Z87a60eM9Sro=.81e2d9ec-a901-44a5-bdaf-b4c9377698fb@github.com>

On Thu, 21 Aug 2025 14:56:30 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

> Hey!
> 
> @fisk
> 
> > Also, do you have any numbers showing if iTLB pressure improved? Or performance improved? Or in general that anything improved? I'm guessing so but I'd like to see some data.
> 
> The issue is that some of the major Arm manufacturers seem to have missed appendix C in the Intel optimization manual - "OPTIMIZATION WITH LARGE CODE PAGES".
> 
> E.g. running renaissance dotty on a G3, I saw 37% front-end stalls (G2: 28%; they made significant improvements to the backend on G3, presumably not to the front-end, hence more stalling).
> 
> By using fewer iTLB entries, we can significantly increase IPC on these CPUs. Simple testing with an earlier version of this showed a ~10% reduction in front-end stalls (take that number with a grain of salt). Whether this is the correct approach is still unclear to me.

@robehn @fisk 

I added a microbenchmark demonstrating the performance impact of a sparse CodeCache: https://github.com/openjdk/jdk/pull/23831
It shows that code sparsity affects both Intel and Graviton CPUs. On Graviton CPUs, code sparsity can be measured with the r11c counter per 1000 instructions. Note that code sparsity may cause no ITLB misses at all, or only minor ones.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3291256979

From epeter at openjdk.org  Mon Sep 15 09:42:37 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 15 Sep 2025 09:42:37 GMT
Subject: RFR: 8361702: C2: assert(is_dominator(compute_early_ctrl(limit,
 limit_ctrl), pre_end)) failed: node pinned on loop exit test? [v4]
In-Reply-To: <m5wop1pevTtf0G1NtQlQBQsdMPY0wT0rbQdo4H-k2EY=.10fe2102-936e-44bf-b599-5a3844bbeb15@github.com>
References: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com>
 <6dWR-SxhuKd9-T3q313I6at4vTBcYlufyCBNjGGopv4=.cae3abea-0752-4191-ac08-890476489af3@github.com>
 <wzkv2ry6TsRXprP_LR7q0l9atl1GFIP4RwNgoEGnkV4=.3ce72d24-5da5-423b-b2d3-36f3fcb681dd@github.com>
 <m5wop1pevTtf0G1NtQlQBQsdMPY0wT0rbQdo4H-k2EY=.10fe2102-936e-44bf-b599-5a3844bbeb15@github.com>
Message-ID: <ZR2wT4ta4I8R7bXaqA5deU2wug1ZyEx5INcSFqF8mBI=.5d7ac45a-49a2-43e8-9c44-b0ca71e25fe5@github.com>

On Tue, 26 Aug 2025 09:27:04 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE3.java line 28:
>> 
>>> 26:  * @bug 8361702
>>> 27:  * @summary C2: assert(is_dominator(compute_early_ctrl(limit, limit_ctrl), pre_end)) failed: node pinned on loop exit test?
>>> 28:  * @requires vm.flavor == "server"
>> 
>> Would this test fail without this requires? Or could we remove it, in the hopes of catching something else somewhere else?
>
> The `@requires` is there because the test run needs command line options that are c2 specific.

Ok, but then you should make the run below without flags in a separate `@test` that does not have this restriction.

@rwestrel

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26424#discussion_r2348419584
PR Review Comment: https://git.openjdk.org/jdk/pull/26424#discussion_r2348424709

From duke at openjdk.org  Mon Sep 15 09:49:19 2025
From: duke at openjdk.org (erifan)
Date: Mon, 15 Sep 2025 09:49:19 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare
 [v13]
In-Reply-To: <4D63cqV0LkPYrSMSkfachZzoH_qpH9vhAbo57RRe1Js=.7a21d73b-7963-4e15-b013-8295b274d5d0@github.com>
References: <mWHKJozX3YTvBOjLDUa1_2QQiNm6xVc2rb1FtXcZzPc=.e2405ec4-09ed-46c1-acd5-3f39083fce3f@github.com>
 <Uw3Zu_QQI6cPp7DUf4uHNO_ZsrSWS1jnnJJ3gtfXyi8=.c2a93a2c-f3cf-4489-98aa-e25abc448fd7@github.com>
 <4D63cqV0LkPYrSMSkfachZzoH_qpH9vhAbo57RRe1Js=.7a21d73b-7963-4e15-b013-8295b274d5d0@github.com>
Message-ID: <0SEFllVEITC_xA1OeWHnPC0S9-nbnicZOCKlAcbwH1M=.ecc56fb6-45fd-44c9-a9ca-4a5f5a391a34@github.com>

On Mon, 15 Sep 2025 09:33:47 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> erifan has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Add an IR rule for vector mask cast operation
>
> Your benchmark and code changes look good to me. Thanks for addressing my comments.

Thanks @jatin-bhateja. The updated benchmark results are as follows; not much has changed.

On an Nvidia Grace machine with 128-bit SVE2:
With option `-XX:UseSVE=2`:

Benchmark		COMPARISON_OP	Unit	Before		Score Error	After		Score Error	Uplift
testCompareMaskNotDouble	EQ	ops/s	908008.7644	827.699314	1175289.515	240.548861	1.294359
testCompareMaskNotDouble	NE	ops/s	872199.2489	131.090115	1175667.777	129.741515	1.347934
testCompareMaskNotDouble	LT	ops/s	880166.7559	1570.41653	882160.6889	4723.507639	1.002265
testCompareMaskNotDouble	LE	ops/s	878115.3293	2919.637497	879033.7895	5404.617017	1.001045
testCompareMaskNotDouble	GT	ops/s	877068.5325	9595.275981	865832.864	5054.26002	0.987189
testCompareMaskNotDouble	GE	ops/s	895695.0228	3276.687933	871153.7117	7714.572967	0.9726
testCompareMaskNotFloat	    EQ	ops/s	1811841.295	278.140948	2350971.83	606.667654	1.297559
testCompareMaskNotFloat	    NE	ops/s	1727124.634	1755.717051	2351789.019	269.531198	1.361678
testCompareMaskNotFloat	    LT	ops/s	1735243.319	4912.343726	1726257.01	823.746765	0.994821
testCompareMaskNotFloat	    LE	ops/s	1726151.367	1071.383328	1727029.339	960.336314	1.000508
testCompareMaskNotFloat	    GT	ops/s	1729704.897	1646.026351	1726069.02	440.981281	0.997897
testCompareMaskNotFloat	    GE	ops/s	1726515.227	2171.61643	1728365.682	1404.298156	1.001071
testCompareMaskNotByte	    EQ	ops/s	8480574.694	1254.415788	10200329.86	8560.199493	1.202787
testCompareMaskNotByte	    NE	ops/s	8480141.263	1437.762594	10207424.91	3664.106923	1.203685
testCompareMaskNotByte	    LT	ops/s	8471471.384	7699.585554	10203300.19	4675.047416	1.20443
testCompareMaskNotByte	    LE	ops/s	8476165.519	6045.944392	10204956.23	2174.866199	1.203959
testCompareMaskNotByte	    GT	ops/s	8479397.377	1290.560961	10207032.3	5414.789178	1.203745
testCompareMaskNotByte	    GE	ops/s	8479979.908	1094.823175	10203115.77	2909.433184	1.2032
testCompareMaskNotByte	    ULT	ops/s	8480915.515	1420.30856	10213140.54	19628.56888	1.204249
testCompareMaskNotByte	    ULE	ops/s	8481768.961	1806.086454	10191601.05	9537.089409	1.201589
testCompareMaskNotByte	    UGT	ops/s	8477948.807	3652.437106	10208439.79	8335.226416	1.204116
testCompareMaskNotByte	    UGE	ops/s	8477320.065	2191.753237	10198589.9	5748.761942	1.203044
testCompareMaskNotInt	    EQ	ops/s	1906386.393	208.045573	2346741.129	383.461819	1.230989
testCompareMaskNotInt	    NE	ops/s	1674206.146	169.967081	2346609.602	652.964692	1.401625
testCompareMaskNotInt	    LT	ops/s	1684755.085	4939.806653	2345939.728	738.842445	1.392451
testCompareMaskNotInt	    LE	ops/s	1659985.83	2408.542766	2346929.8	192.550397	1.413825
testCompareMaskNotInt	    GT	ops/s	1674460.437	447.120589	2347037.155	342.433085	1.401667
testCompareMaskNotInt	    GE	ops/s	1658699.073	884.268891	2347411.827	281.885914	1.415212
testCompareMaskNotInt	    ULT	ops/s	1677043.66	6215.834359	2347155.384	425.141786	1.399579
testCompareMaskNotInt	    ULE	ops/s	1667049.76	9521.094204	2346815.213	316.03901	1.407765
testCompareMaskNotInt	    UGT	ops/s	1661045.828	3669.548525	2346711.365	2808.608132	1.412791
testCompareMaskNotInt	    UGE	ops/s	1663715.691	4570.73053	2347096.847	191.804359	1.410755
testCompareMaskNotLong	    EQ	ops/s	885668.5947	203.053456	1174274.006	113.51354	1.325861
testCompareMaskNotLong	    NE	ops/s	837449.9353	198.611966	1174330.269	106.514374	1.402269
testCompareMaskNotLong	    LT	ops/s	846790.2128	7005.585657	1174290.879	93.56413	1.386755
testCompareMaskNotLong	    LE	ops/s	851253.2346	7624.045467	1174162.355	179.854316	1.379333
testCompareMaskNotLong	    GT	ops/s	837715.7563	4272.558281	1173797.819	289.311518	1.401188
testCompareMaskNotLong	    GE	ops/s	883137.593	14804.63746	1174216.909	86.404559	1.329596
testCompareMaskNotLong	    ULT	ops/s	872478.9017	4955.722542	1174341.995	124.656933	1.345983
testCompareMaskNotLong	    ULE	ops/s	866570.738	12541.58528	1174185.197	594.850706	1.354979
testCompareMaskNotLong	    UGT	ops/s	866389.0927	3971.492766	1174210.803	153.960084	1.355292
testCompareMaskNotLong	    UGE	ops/s	848339.3876	4555.514721	1174060.638	240.326562	1.383951
testCompareMaskNotShort	    EQ	ops/s	3336170.783	2286.717236	4684904.156	2134.72575	1.404275
testCompareMaskNotShort	    NE	ops/s	3334775.472	717.588615	4690264.12	3017.756867	1.40647
testCompareMaskNotShort	    LT	ops/s	3334619.058	1138.901707	4685883.864	3808.321694	1.405223
testCompareMaskNotShort	    LE	ops/s	3335538.353	538.676789	4688238.934	1029.406266	1.405541
testCompareMaskNotShort	    GT	ops/s	3301425.217	694.060525	4689167.049	2845.363801	1.420346
testCompareMaskNotShort	    GE	ops/s	3301580.972	317.042851	4688970.211	1292.83929	1.420219
testCompareMaskNotShort	    ULT	ops/s	3336318.051	892.515034	4687549.384	1403.281648	1.405006
testCompareMaskNotShort	    ULE	ops/s	3335188.292	972.230191	4684723.63	3937.599084	1.404635
testCompareMaskNotShort	    UGT	ops/s	3334490.656	930.409628	4688058.378	1166.776081	1.405929
testCompareMaskNotShort	    UGE	ops/s	3333050.033	3146.019596	4689197.9	456.439188	1.406878


With option `-XX:UseSVE=0`:

Benchmark		COMPARISON_OP	Unit	Before		Score Error	After		Score Error	Uplift
testCompareMaskNotDouble	EQ	ops/s	788505.9464	579.254839	769969.5798	138.792325	0.976491
testCompareMaskNotDouble	NE	ops/s	655499.7935	471.970429	915086.3257	183.495964	1.396013
testCompareMaskNotDouble	LT	ops/s	788418.7889	574.263314	789271.7448	51.838991	1.001081
testCompareMaskNotDouble	LE	ops/s	789144.8431	45.334181	789326.1963	84.148011	1.000229
testCompareMaskNotDouble	GT	ops/s	788690.8485	662.950083	789246.9812	99.060588	1.000705
testCompareMaskNotDouble	GE	ops/s	789421.2387	94.012868	789166.4717	111.772533	0.999677
testCompareMaskNotFloat	    EQ	ops/s	1816132.864	1298.2187	1816461.601	311.706275	1.000181
testCompareMaskNotFloat	    NE	ops/s	1550767.697	1142.987761	2301429.148	159.71525	1.484057
testCompareMaskNotFloat	    LT	ops/s	1815531.685	1370.868745	1817187.121	761.68401	1.000911
testCompareMaskNotFloat	    LE	ops/s	1817937.722	484.638134	1817703.209	625.275639	0.999871
testCompareMaskNotFloat	    GT	ops/s	1818618.89	724.324392	1817977.851	481.152488	0.999647
testCompareMaskNotFloat	    GE	ops/s	1815118.411	1327.945736	1817476.414	510.712942	1.001299
testCompareMaskNotByte	    EQ	ops/s	6489599.571	5127.815254	6535895.286	17029.15534	1.007133
testCompareMaskNotByte	    NE	ops/s	9089974.523	4069.346579	15945662.17	22867.48282	1.754203
testCompareMaskNotByte	    LT	ops/s	6499040.898	1250.085336	15939338.57	17451.05939	2.452567
testCompareMaskNotByte	    LE	ops/s	6493612.339	4928.466061	15926355.01	27249.57103	2.452618
testCompareMaskNotByte	    GT	ops/s	6494486.565	5229.4598	15957497.14	6893.237334	2.457083
testCompareMaskNotByte	    GE	ops/s	6499295.661	1030.044749	15903755.01	46454.70992	2.446996
testCompareMaskNotByte	    ULT	ops/s	6494212.684	5194.712704	15944816.71	3467.818892	2.455234
testCompareMaskNotByte	    ULE	ops/s	6493882.576	5092.839387	15936419.25	22755.34523	2.454066
testCompareMaskNotByte	    UGT	ops/s	6493479.899	4678.096391	15958133.18	3483.353667	2.457562
testCompareMaskNotByte	    UGE	ops/s	6500338.419	709.344957	15968155.27	14020.47085	2.456511
testCompareMaskNotInt	    EQ	ops/s	1830787.273	237.597163	1878452.588	142.728192	1.026035
testCompareMaskNotInt	    NE	ops/s	1615081.395	1219.871461	2360913.712	199.556675	1.461792
testCompareMaskNotInt	    LT	ops/s	1827819.867	1360.728526	2360561.422	248.025925	1.291462
testCompareMaskNotInt	    LE	ops/s	1830975.648	416.987529	2360703.924	194.958346	1.289314
testCompareMaskNotInt	    GT	ops/s	1830633.964	301.849017	2360552.203	224.908655	1.289472
testCompareMaskNotInt	    GE	ops/s	1829476.495	1348.361278	2360673.736	137.538696	1.290354
testCompareMaskNotInt	    ULT	ops/s	1829137.773	1285.55232	2360615.95	162.876291	1.290562
testCompareMaskNotInt	    ULE	ops/s	1828107.468	1360.867847	2360790.337	297.267481	1.291384
testCompareMaskNotInt	    UGT	ops/s	1829659.222	1459.098806	2361025.107	266.158075	1.290417
testCompareMaskNotInt	    UGE	ops/s	1829548.187	1427.266787	2360941.943	242.380469	1.29045
testCompareMaskNotLong	    EQ	ops/s	810439.9121	82.577412	802287.4993	73.462086	0.98994
testCompareMaskNotLong	    NE	ops/s	681643.6089	485.657471	932324.6973	158.28799	1.367759
testCompareMaskNotLong	    LT	ops/s	809850.546	680.71673	931404.3219	685.591444	1.150094
testCompareMaskNotLong	    LE	ops/s	810584.5191	115.234753	932234.2412	105.451172	1.150076
testCompareMaskNotLong	    GT	ops/s	810593.5376	117.947863	931879.1829	553.397713	1.149625
testCompareMaskNotLong	    GE	ops/s	810435.8405	81.88737	931833.0348	177.765694	1.149792
testCompareMaskNotLong	    ULT	ops/s	810429.8459	90.005329	932127.5278	74.443387	1.150164
testCompareMaskNotLong	    ULE	ops/s	809740.842	411.655134	932231.6607	76.044104	1.151271
testCompareMaskNotLong	    UGT	ops/s	810493.4369	52.024062	932239.1709	143.915229	1.150211
testCompareMaskNotLong	    UGE	ops/s	810442.0661	64.064396	932361.567	119.570287	1.150435
testCompareMaskNotShort	    EQ	ops/s	4786426.182	299.050738	4694123.013	482.608634	0.980715
testCompareMaskNotShort	    NE	ops/s	3808932.807	2993.590606	5672255.469	6262.526335	1.489198
testCompareMaskNotShort	    LT	ops/s	4782535.485	3699.104322	5668474.071	11101.86452	1.185244
testCompareMaskNotShort	    LE	ops/s	4782896.891	3338.57484	5669188.434	6309.723399	1.185304
testCompareMaskNotShort	    GT	ops/s	4778532.318	3571.547653	5680482.703	10427.66734	1.18875
testCompareMaskNotShort	    GE	ops/s	4786150.851	794.769881	5664644.919	6542.434538	1.183549
testCompareMaskNotShort	    ULT	ops/s	4783623.78	3582.962421	5668267.123	17841.44773	1.184931
testCompareMaskNotShort	    ULE	ops/s	4782752.125	3610.296618	5666231.302	6964.505363	1.184721
testCompareMaskNotShort	    UGT	ops/s	4782469.332	2913.37576	5655837.96	6494.608864	1.182618
testCompareMaskNotShort	    UGE	ops/s	4782606.35	3491.774067	5667295.182	14176.96543	1.18498


On AMD EPYC 9124 16-Core Processor:
With option `-XX:UseAVX=3`:

Benchmark		COMPARISON_OP	Unit	Before		Score Error	After		Score Error	Uplift
testCompareMaskNotDouble	EQ	ops/s	2166357.886	27577.51358	2920183.192	38491.49083	1.347968
testCompareMaskNotDouble	NE	ops/s	2177325.341	32771.27023	2965747.932	39271.62615	1.362106
testCompareMaskNotDouble	LT	ops/s	2123834.711	22890.39919	2197099.169	29107.41329	1.034496
testCompareMaskNotDouble	LE	ops/s	2172931.681	32912.05647	2121686.057	34927.37781	0.976416
testCompareMaskNotDouble	GT	ops/s	2164924.662	30925.91899	2124062.892	37135.0458	0.981125
testCompareMaskNotDouble	GE	ops/s	2150619.038	35515.09022	2192636.533	38672.85716	1.019537
testCompareMaskNotFloat	    EQ	ops/s	4518378.764	74733.72389	6724589.409	50424.63568	1.488274
testCompareMaskNotFloat	    NE	ops/s	4522823.224	78138.66727	6907565.257	203953.3299	1.527268
testCompareMaskNotFloat	    LT	ops/s	4587473.545	62621.25938	4431658.918	52760.23989	0.966034
testCompareMaskNotFloat	    LE	ops/s	4472078.986	79338.23304	4472390.043	66247.285	1.000069
testCompareMaskNotFloat	    GT	ops/s	4451744.39	220787.9755	4440866.486	58674.19154	0.997556
testCompareMaskNotFloat	    GE	ops/s	4459601.349	57873.05167	4481398.426	76819.69285	1.004887
testCompareMaskNotByte	    EQ	ops/s	19415317.92	356367.4937	20649319.86	240515.9459	1.063558
testCompareMaskNotByte	    NE	ops/s	19401162.58	362571.8103	21010358.2	71221.35255	1.082943
testCompareMaskNotByte	    LT	ops/s	19175612.37	273080.6175	20235838.72	396190.6101	1.05529
testCompareMaskNotByte	    LE	ops/s	19036831.33	121135.0491	20674528.84	248839.9471	1.086027
testCompareMaskNotByte	    GT	ops/s	19008302.3	124633.9182	20671390.89	271644.5576	1.087492
testCompareMaskNotByte	    GE	ops/s	19590753.42	429156.452	20491615.07	332912.82	1.045984
testCompareMaskNotByte	    ULT	ops/s	19431604.06	421396.5487	20575805.9	248466.2368	1.058883
testCompareMaskNotByte	    ULE	ops/s	19060425.47	98309.75469	20774930.43	206596.0422	1.089951
testCompareMaskNotByte	    UGT	ops/s	19266788.04	362893.3051	20861521.87	106977.3707	1.082771
testCompareMaskNotByte	    UGE	ops/s	19127964.33	447774.3747	20791221.56	254458.0132	1.086954
testCompareMaskNotInt	    EQ	ops/s	4473402.48	84902.77154	7191777.028	94315.13878	1.607674
testCompareMaskNotInt	    NE	ops/s	4583165.363	73491.79073	7249884.988	80028.31191	1.581851
testCompareMaskNotInt	    LT	ops/s	4618634.192	81869.82512	7242567.732	71211.3697	1.568118
testCompareMaskNotInt	    LE	ops/s	4650524.195	72302.56692	7154948.491	83057.90635	1.538525
testCompareMaskNotInt	    GT	ops/s	4534752.486	94449.20198	7004428.251	38365.18576	1.54461
testCompareMaskNotInt	    GE	ops/s	4540777.389	86331.11847	7129527.341	74343.06996	1.570111
testCompareMaskNotInt	    ULT	ops/s	4528175.644	114213.6504	7220013.98	82850.22587	1.594464
testCompareMaskNotInt	    ULE	ops/s	4619335.448	74203.98889	7118543.128	54457.43284	1.541031
testCompareMaskNotInt	    UGT	ops/s	4572521.254	122912.75	7154797.741	98858.3477	1.564737
testCompareMaskNotInt	    UGE	ops/s	4579627.842	80558.04554	7179020.593	99239.23499	1.567599
testCompareMaskNotLong	    EQ	ops/s	2103965.347	17059.28178	2997338.009	32388.42725	1.424613
testCompareMaskNotLong	    NE	ops/s	2174434.633	36011.24708	2984460.593	29074.42994	1.372522
testCompareMaskNotLong	    LT	ops/s	2110937.378	56642.0052	3020690.893	31167.62537	1.430971
testCompareMaskNotLong	    LE	ops/s	2153414.166	31280.20562	2971696.162	31176.24605	1.379992
testCompareMaskNotLong	    GT	ops/s	2166028.207	49432.18925	3008018.282	26534.78551	1.388725
testCompareMaskNotLong	    GE	ops/s	2178206.136	35757.6799	2933186.687	19824.26727	1.346606
testCompareMaskNotLong	    ULT	ops/s	2104344.728	31405.7728	2964354.007	26871.18289	1.408682
testCompareMaskNotLong	    ULE	ops/s	2210232.578	21993.95777	3032635.261	25545.43656	1.372088
testCompareMaskNotLong	    UGT	ops/s	2167177.931	44896.90807	2996245.236	34153.68941	1.382556
testCompareMaskNotLong	    UGE	ops/s	2117175.328	26131.1893	2977492.164	23227.65519	1.406351
testCompareMaskNotShort	    EQ	ops/s	8131234.179	185997.1777	12414378.38	122648.1579	1.526752
testCompareMaskNotShort	    NE	ops/s	8506016.656	236481.383	12720442.64	322747.8776	1.495464
testCompareMaskNotShort	    LT	ops/s	8487868.819	244943.6097	12150479.62	244300.5456	1.431511
testCompareMaskNotShort	    LE	ops/s	8549184.557	286833.466	12358019.06	136683.2112	1.44552
testCompareMaskNotShort	    GT	ops/s	8375447.45	221237.073	12602058.97	385690.3318	1.504643
testCompareMaskNotShort	    GE	ops/s	8123474.548	127727.1461	12799747.64	197940.1001	1.575649
testCompareMaskNotShort	    ULT	ops/s	8491650.422	313124.2425	12751186.59	255845.1653	1.501614
testCompareMaskNotShort	    ULE	ops/s	8363009.676	203670.1995	12675908.7	279496.9925	1.515711
testCompareMaskNotShort	    UGT	ops/s	8332268.933	279787.2503	12279451.4	436971.6582	1.473722
testCompareMaskNotShort	    UGE	ops/s	8931588.505	203962.9257	12324437.67	330723.3066	1.37987

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-3291304777

From duke at openjdk.org  Mon Sep 15 09:58:20 2025
From: duke at openjdk.org (erifan)
Date: Mon, 15 Sep 2025 09:58:20 GMT
Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of
 vector compress [v2]
In-Reply-To: <tP4CE7n1GaoLjhB6BfpKBNaX_GCTwI-cJjUbanwm4Qg=.eb29b9ce-10b0-4d5c-8407-25deda9c3f6d@github.com>
References: <tP4CE7n1GaoLjhB6BfpKBNaX_GCTwI-cJjUbanwm4Qg=.eb29b9ce-10b0-4d5c-8407-25deda9c3f6d@github.com>
Message-ID: <NsYuK9-Y_-7KzBniFLpkNeiLslPm-a83mE_GMvMN7oo=.109f1b82-42b1-4f09-b8af-99bc2a9f2528@github.com>

> The AArch64 SVE and SVE2 architectures lack an instruction suitable for subword-type `compress` operations. Therefore, the current implementation uses the 32-bit SVE `compact` instruction to compress subword types by first widening the high and low parts to 32 bits, compressing them, and then narrowing them back to their original type. Finally, the high and low parts are merged using the `index + tbl` instructions.
> 
> This approach is significantly slower than on architectures with native support. After evaluating all available AArch64 SVE instructions and experimenting with various implementations (such as looping over the active elements, extraction, and insertion), I confirmed that the existing algorithm is optimal given the instruction set. However, there is still room for optimization in the following two aspects:
> 1. Merging with `index + tbl` is suboptimal due to the high latency of the `index` instruction.
> 2. For partial subword types, operations on the upper half are unnecessary because those bits are invalid.
> 
> This pull request introduces the following changes:
> 1. Replaces `index + tbl` with the `whilelt + splice` instructions, which offer lower latency and higher throughput.
> 2. Eliminates unnecessary compress operations for partial subword type cases.
> 3. For `sve_compress_byte`, one fewer temporary register is used to alleviate potential register pressure.
> 
> Benchmark results demonstrate that these changes significantly improve performance.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE:
> 
> Benchmark	            Unit	Before	 Error	After	 Error	Uplift
> Byte128Vector.compress	ops/ms	4846.97	 26.23	6638.56	 31.60	1.36
> Byte64Vector.compress	ops/ms	2447.69	 12.95	7167.68	 34.49	2.92
> Short128Vector.compress	ops/ms	7174.88	 40.94	8398.45	 9.48	1.17
> Short64Vector.compress	ops/ms	3618.72	 3.04	8618.22	 10.91	2.38
> 
> 
> This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, and all tests passed.
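
The subword strategy described above can be modeled in scalar C++ to make the data movement concrete. This is a hypothetical sketch, not the HotSpot implementation: `compress_ref`, `compact32` and `compress_short_split` are illustrative names, `compact32` merely stands in for the 32-bit SVE `compact` instruction, and the widening, narrowing and merging steps are plain loops rather than the real SVE instruction sequence. The `valid_lanes` parameter models the partial-vector case, where the upper half carries no valid data and can be skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference semantics of compress: lanes whose mask bit is set are packed,
// in order, toward lane 0; the remaining lanes are zeroed.
template <typename T>
std::vector<T> compress_ref(const std::vector<T>& src, const std::vector<bool>& mask) {
  assert(src.size() == mask.size());
  std::vector<T> out(src.size(), T{0});
  std::size_t j = 0;
  for (std::size_t i = 0; i < src.size(); ++i) {
    if (mask[i]) out[j++] = src[i];
  }
  return out;
}

// Stand-in for a 32-bit-lane COMPACT: same packing rule, 32-bit lanes only.
std::vector<int32_t> compact32(const std::vector<int32_t>& v, const std::vector<bool>& m) {
  return compress_ref(v, m);
}

// Model of compressing 16-bit lanes using only the 32-bit primitive.
// `valid_lanes` is the number of meaningful lanes (the Java vector length);
// lanes at or beyond it are assumed to have their mask bit clear.
std::vector<int16_t> compress_short_split(const std::vector<int16_t>& src,
                                          const std::vector<bool>& mask,
                                          std::size_t valid_lanes) {
  assert(src.size() == mask.size() && src.size() % 2 == 0 && valid_lanes <= src.size());
  const std::size_t n = src.size();
  const std::size_t half = n / 2;

  // Widen the low half to 32-bit lanes and compress it.
  std::vector<int32_t> lo(src.begin(), src.begin() + half);
  const std::vector<bool> mlo(mask.begin(), mask.begin() + half);
  lo = compact32(lo, mlo);
  std::size_t nlo = 0;  // number of active lanes in the low half
  for (bool b : mlo) nlo += b ? 1 : 0;

  // Narrow the low result back to 16 bits.
  std::vector<int16_t> out(n, 0);
  for (std::size_t i = 0; i < nlo; ++i) out[i] = static_cast<int16_t>(lo[i]);
  if (valid_lanes <= half) {
    return out;  // partial vector: the upper half holds no valid data, skip it
  }

  // Widen and compress the high half, narrow it, and place it directly after
  // the low half's active lanes (nlo + i < n always holds here).
  std::vector<int32_t> hi(src.begin() + half, src.end());
  const std::vector<bool> mhi(mask.begin() + half, mask.end());
  hi = compact32(hi, mhi);
  for (std::size_t i = 0; i < hi.size(); ++i) out[nlo + i] = static_cast<int16_t>(hi[i]);
  return out;
}
```

For example, with 8 lanes holding `{1, ..., 8}` and the mask `{1, 0, 1, 0, 0, 1, 1, 0}`, the split version yields `{1, 3, 6, 7, 0, 0, 0, 0}`, the same as `compress_ref`.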

erifan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:

 - Merge branch 'master' into JDK-8366333-compress
 - 8366333: AArch64: Enhance SVE subword type implementation of vector compress
   
   The AArch64 SVE and SVE2 architectures lack an instruction suitable for
   subword-type `compress` operations. Therefore, the current implementation
   uses the 32-bit SVE `compact` instruction to compress subword types by
   first widening the high and low parts to 32 bits, compressing them, and
   then narrowing them back to their original type. Finally, the high and
   low parts are merged using the `index + tbl` instructions.
   
   This approach is significantly slower than on architectures with native
   support. After evaluating all available AArch64 SVE instructions and
   experimenting with various implementations (such as looping over the active
   elements, extraction, and insertion), I confirmed that the existing algorithm
   is optimal given the instruction set. However, there is still room for
   optimization in the following two aspects:
   1. Merging with `index + tbl` is suboptimal due to the high latency of
   the `index` instruction.
   2. For partial subword types, operations on the upper half are unnecessary
   because those bits are invalid.
   
   This pull request introduces the following changes:
   1. Replaces `index + tbl` with the `whilelt + splice` instructions, which
   offer lower latency and higher throughput.
   2. Eliminates unnecessary compress operations for partial subword type cases.
   3. For `sve_compress_byte`, one fewer temporary register is used to alleviate
   potential register pressure.
   
   Benchmark results demonstrate that these changes significantly improve performance.
   
   Benchmarks on Nvidia Grace machine with 128-bit SVE:
   ```
   Benchmark	        Unit	Before	 Error	After	 Error	Uplift
   Byte128Vector.compress	ops/ms	4846.97	 26.23	6638.56	 31.60	1.36
   Byte64Vector.compress	ops/ms	2447.69	 12.95	7167.68	 34.49	2.92
   Short128Vector.compress	ops/ms	7174.88	 40.94	8398.45	 9.48	1.17
   Short64Vector.compress	ops/ms	3618.72	 3.04	8618.22	 10.91	2.38
   ```
   
   This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments,
   and all tests passed.
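
The new merge step can be sketched the same way. The models below follow my reading of the Arm SVE descriptions of `WHILELT` (with a start value of 0, lane i is active while i < limit) and `SPLICE` (copy the segment from the lowest to the highest active lane of the first source into the lowest result lanes, then fill the rest from the second source starting at lane 0). `whilelt_model`, `splice_model` and `merge_halves` are hypothetical helpers, not HotSpot code, and the sketch assumes both halves have already been compressed and narrowed into vectors of the same lane width, with the compressed data in the low lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Model of WHILELT with a start value of 0: lane i is active iff i < limit.
std::vector<bool> whilelt_model(std::size_t lanes, std::size_t limit) {
  std::vector<bool> pred(lanes, false);
  for (std::size_t i = 0; i < lanes; ++i) pred[i] = i < limit;
  return pred;
}

// Model of SPLICE: copy the segment from the lowest to the highest active
// lane of `a` into the lowest result lanes, then fill the remaining lanes
// with `b`'s elements starting from lane 0.
template <typename T>
std::vector<T> splice_model(const std::vector<bool>& pg,
                            const std::vector<T>& a, const std::vector<T>& b) {
  assert(pg.size() == a.size() && a.size() == b.size());
  bool any = false;
  std::size_t first = 0, last = 0;
  for (std::size_t i = 0; i < pg.size(); ++i) {
    if (pg[i]) {
      if (!any) first = i;
      last = i;
      any = true;
    }
  }
  std::vector<T> out;
  if (any) out.assign(a.begin() + first, a.begin() + last + 1);
  for (std::size_t i = 0; out.size() < a.size(); ++i) out.push_back(b[i]);
  return out;
}

// Merge two compressed halves: keep the first `n_low` lanes of `low`,
// then continue with the lanes of `high`.
std::vector<int16_t> merge_halves(const std::vector<int16_t>& low,
                                  const std::vector<int16_t>& high,
                                  std::size_t n_low) {
  return splice_model(whilelt_model(low.size(), n_low), low, high);
}
```

With `n_low` active lanes in the low half, the predicate selects exactly those lanes, so the compressed high half lands immediately after them without computing an index vector.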

-------------

Changes: https://git.openjdk.org/jdk/pull/27188/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27188&range=01
  Stats: 414 lines in 9 files changed: 297 ins; 24 del; 93 mod
  Patch: https://git.openjdk.org/jdk/pull/27188.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27188/head:pull/27188

PR: https://git.openjdk.org/jdk/pull/27188

From duke at openjdk.org  Mon Sep 15 10:01:18 2025
From: duke at openjdk.org (erifan)
Date: Mon, 15 Sep 2025 10:01:18 GMT
Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of
 vector compress
In-Reply-To: <G8aVuW-KQmy7GbZY0QblQy5taiBlNGRc6XP_Wz1TwWg=.5515c4a2-e293-4d08-a0cd-7b039cd10f43@github.com>
References: <tP4CE7n1GaoLjhB6BfpKBNaX_GCTwI-cJjUbanwm4Qg=.eb29b9ce-10b0-4d5c-8407-25deda9c3f6d@github.com>
 <G8aVuW-KQmy7GbZY0QblQy5taiBlNGRc6XP_Wz1TwWg=.5515c4a2-e293-4d08-a0cd-7b039cd10f43@github.com>
Message-ID: <UJ7aFOla6ZN9sNBIZF8efrJkN6-ty93pxHeQN6wx4Yk=.36595868-860a-4f0f-8caa-e752e7bedada@github.com>

On Thu, 11 Sep 2025 06:07:42 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

> Would it make sense to additionally run the relevant benchmarks on other popular aarch64 platforms such as Graviton, to make sure the improvements are seen there as well?

@galderz Yes, absolutely. These are the test results on an **AWS Graviton3 (Neoverse V1) machine**; they show a similar performance gain.

Benchmark | Units | Before | Error | After | Error | Uplift
-- | -- | -- | -- | -- | -- | --
Byte128Vector.compress | ops/ms | 2405.511 | 0.763 | 6116.85 | 17.699 | 2.54284848
Byte64Vector.compress | ops/ms | 1151.662 | 11.262 | 5278.924 | 6.74 | 4.58374419
Double128Vector.compress | ops/ms | 4919.017 | 4.909 | 4940.232 | 20.143 | 1.00431285
Double64Vector.compress | ops/ms | 37.071 | 0.778 | 37.109 | 0.945 | 1.00102506
Float128Vector.compress | ops/ms | 9580.312 | 48.341 | 9586.499 | 74.934 | 1.0006458
Float64Vector.compress | ops/ms | 4943.728 | 7.361 | 4941.917 | 5.871 | 0.99963368
Int128Vector.compress | ops/ms | 9496.991 | 34.972 | 9515.122 | 29.204 | 1.00190913
Int64Vector.compress | ops/ms | 4940.23 | 7.141 | 4941.815 | 5.077 | 1.00032084
Long128Vector.compress | ops/ms | 4918.142 | 14.835 | 4917.148 | 9.05 | 0.99979789
Long64Vector.compress | ops/ms | 36.58 | 0.426 | 36.574 | 0.431 | 0.99983598
Short128Vector.compress | ops/ms | 3343.878 | 0.898 | 6813.421 | 4.143 | 2.03758062
Short64Vector.compress | ops/ms | 1595.358 | 3.37 | 3390.959 | 3.55 | 2.12551603
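
The `Uplift` column in these tables is the ratio After / Before of the two scores. A small helper can reproduce it and give a rough sense of which changes stand out from the noise. `uplift` and `intervals_disjoint` are hypothetical names; the overlap check is a conservative heuristic chosen for this sketch, assuming the `Error` columns are JMH's default 99.9% confidence-interval half-widths, and is not a statistical test used in the PR.

```cpp
#include <cassert>
#include <cmath>

// Uplift as reported in the tables: After score divided by Before score.
double uplift(double before, double after) {
  assert(before > 0.0);
  return after / before;
}

// Conservative heuristic: treat each Error value as a half-width around its
// score and report whether the two intervals do not overlap.
bool intervals_disjoint(double before, double err_before,
                        double after, double err_after) {
  return std::fabs(after - before) > err_before + err_after;
}
```

For example, `Byte64Vector.compress` (1151.662 ± 11.262 before, 5278.924 ± 6.74 after) gives an uplift of about 4.584 with clearly separated intervals, while the intervals for `Double64Vector.compress` (37.071 ± 0.778 vs. 37.109 ± 0.945) overlap, i.e. no measurable change.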

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27188#issuecomment-3291355148

From qxing at openjdk.org  Mon Sep 15 10:18:25 2025
From: qxing at openjdk.org (Qizheng Xing)
Date: Mon, 15 Sep 2025 10:18:25 GMT
Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant
 safepoints in loops [v2]
In-Reply-To: <KFnQveyKY4m3hkBHyUJeVticFI23bls1-_dkKzDS1HA=.5ed7ddc0-cd6a-496c-8d1b-fb30b78d6fec@github.com>
References: <gH_7R5UQ0P_p9lE00k_O08uypVhvDiYBQM6fR71lnI4=.fc388dd3-cb23-44fd-8139-7b9fb95c227a@github.com>
 <ReHPFa4lChrvTE_s7sT8w8vy9lb09Q_IhyZLH4wH3gU=.981d3b3c-b8ac-4a08-9660-19cfa69faf8a@github.com>
 <sCZc14VcHXtM5gl49UN-YRjmmOM0QSgcTOmdSS4CA0Y=.5e3a359c-b393-4a5d-a53f-b066ad7cf899@github.com>
 <KFnQveyKY4m3hkBHyUJeVticFI23bls1-_dkKzDS1HA=.5ed7ddc0-cd6a-496c-8d1b-fb30b78d6fec@github.com>
Message-ID: <IOibuwtxyCdrJGFW4qi-UG-_jRa35Fl9vtbjHV8-W7U=.61bd76d1-cfc2-4faf-96cc-d923b2bcd695@github.com>

On Wed, 2 Apr 2025 07:22:13 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> The second question:
>> 
>>> If we now removed safepoints in places where we would actually have needed them: how would we find out? I suppose we would get longer time to safepoint - higher latency in some cases. How would we catch this with our tests?
>> 
>> I tried running tier1 tests with `JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+SafepointTimeout -XX:+AbortVMOnSafepointTimeout -XX:SafepointTimeoutDelay=1000`, and there were no failures.
>> 
>> Running with `-XX:SafepointTimeoutDelay=500` caused one seemingly random JDK test case to fail. However, a JDK built without this patch still showed the same random failure with this option.
>
> @MaxXSoft Would you mind improving the documentation comments, so that they are easier to understand? Maybe you can even add more comments around your code change, to "prove" why it is ok to do what we would do with your change?

Hi @eme64, this PR is now ready for further reviews. Could you please continue to review this PR?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-3291428642

From galder at openjdk.org  Mon Sep 15 10:32:14 2025
From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=)
Date: Mon, 15 Sep 2025 10:32:14 GMT
Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of
 vector compress [v2]
In-Reply-To: <NsYuK9-Y_-7KzBniFLpkNeiLslPm-a83mE_GMvMN7oo=.109f1b82-42b1-4f09-b8af-99bc2a9f2528@github.com>
References: <tP4CE7n1GaoLjhB6BfpKBNaX_GCTwI-cJjUbanwm4Qg=.eb29b9ce-10b0-4d5c-8407-25deda9c3f6d@github.com>
 <NsYuK9-Y_-7KzBniFLpkNeiLslPm-a83mE_GMvMN7oo=.109f1b82-42b1-4f09-b8af-99bc2a9f2528@github.com>
Message-ID: <uLvaXO4MtophJ2T-x9oNWGGWAfMGhneJdwGJgrwNZHs=.ec31b553-15be-4c3a-8cb0-dc38c6b0c35b@github.com>

On Mon, 15 Sep 2025 09:58:20 GMT, erifan <duke at openjdk.org> wrote:

> erifan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
> 
>  - Merge branch 'master' into JDK-8366333-compress
>  - 8366333: AArch64: Enhance SVE subword type implementation of vector compress
>    
>    The AArch64 SVE and SVE2 architectures lack an instruction suitable for
>    subword-type `compress` operations. Therefore, the current implementation
>    uses the 32-bit SVE `compact` instruction to compress subword types by
>    first widening the high and low parts to 32 bits, compressing them, and
>    then narrowing them back to their original type. Finally, the high and
>    low parts are merged using the `index + tbl` instructions.
>    
>    This approach is significantly slower than on architectures with native
>    support. After evaluating all available AArch64 SVE instructions and
>    experimenting with various implementations (such as looping over the active
>    elements, extraction, and insertion), I confirmed that the existing algorithm
>    is optimal given the instruction set. However, there is still room for
>    optimization in the following two aspects:
>    1. Merging with `index + tbl` is suboptimal due to the high latency of
>    the `index` instruction.
>    2. For partial subword types, operations on the upper half are unnecessary
>    because those bits are invalid.
>    
>    This pull request introduces the following changes:
>    1. Replaces `index + tbl` with the `whilelt + splice` instructions, which
>    offer lower latency and higher throughput.
>    2. Eliminates unnecessary compress operations for partial subword type cases.
>    3. For `sve_compress_byte`, one fewer temporary register is used to alleviate
>    potential register pressure.
>    
>    Benchmark results demonstrate that these changes significantly improve performance.
>    
>    Benchmarks on Nvidia Grace machine with 128-bit SVE:
>    ```
>    Benchmark	        Unit	Before	 Error	After	 Error	Uplift
>    Byte128Vector.compress	ops/ms	4846.97	 26.23	6638.56	 31.60	1.36
>    Byte64Vector.compress	ops/ms	2447.69	 12.95	7167.68	 34.49	2.92
>    Short128Vector.compress	ops/ms	7174.88	 40.94	8398.45	 9.48	1.17
>    Short64Vector.compress	ops/ms	3618.72	 3.04	8618.22	 10.91	2.38
>    ```
>    
>    This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments,
>    and all tests passed.

Marked as reviewed by galder (Author).

-------------

PR Review: https://git.openjdk.org/jdk/pull/27188#pullrequestreview-3223928525

From dlunden at openjdk.org  Mon Sep 15 11:46:55 2025
From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=)
Date: Mon, 15 Sep 2025 11:46:55 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v8]
In-Reply-To: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
Message-ID: <KFySrpJo-ZYIw4bBUljBT5RWClzTjl1t91Z8gWctzsA=.2f5f8207-3c69-41ea-9b2f-d7cce6cb0a91@github.com>

> Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.
> 
> ### Changeset
> 
> - Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
> - Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
> - Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
> - Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
> 
> ### Testing
> 
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.

Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:

  Rename set_infinite to set_infinite_stack
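
The naming issue is easier to see with a toy model. The class below is hypothetical and much simpler than HotSpot's `RegMask`: it stores a fixed number of 32-bit ints of explicit bits, derives the equivalent size in machine words, and keeps one flag that implicitly sets every register number beyond the explicit bits. It assumes, for illustration only, that stack slots are numbered after the machine registers, so low-numbered stack slots fall inside the explicit bits; the flag therefore covers an unbounded tail of stack slots rather than all of them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical toy mask, not HotSpot's RegMask.
class ToyRegMask {
 public:
  static constexpr std::size_t kSizeInInts = 4;  // analogous to RM_SIZE_IN_INTS
  static constexpr std::size_t kBitsPerInt = 32;
  static constexpr std::size_t kFiniteBits = kSizeInInts * kBitsPerInt;
  // Machine words are 32 or 64 bits depending on the platform
  // (analogous to _RM_SIZE_IN_WORDS).
  static constexpr std::size_t kBitsPerWord = sizeof(std::uintptr_t) * 8;
  static constexpr std::size_t kSizeInWords = kFiniteBits / kBitsPerWord;
  // Assumption for illustration: registers 0..63 are machine registers and
  // stack slots start at 64, so slots 64..127 live in the explicit bits.
  static constexpr std::size_t kFirstStackSlot = 64;

  void insert(std::size_t reg) {
    assert(reg < kFiniteBits);
    bits_[reg / kBitsPerInt] |= std::uint32_t{1} << (reg % kBitsPerInt);
  }

  // Every register number >= kFiniteBits becomes implicitly part of the mask.
  void set_infinite_stack() { infinite_stack_ = true; }
  bool is_infinite_stack() const { return infinite_stack_; }

  bool member(std::size_t reg) const {
    if (reg >= kFiniteBits) {
      return infinite_stack_;  // implicit ("infinite") bits
    }
    return ((bits_[reg / kBitsPerInt] >> (reg % kBitsPerInt)) & 1u) != 0;
  }

 private:
  std::uint32_t bits_[kSizeInInts] = {};
  bool infinite_stack_ = false;
};
```

Setting the infinite-stack flag makes every slot beyond the explicit bits a member, but a stack slot such as 70 still has to be inserted explicitly, which is why "all-stack" was misleading.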

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27215/files
  - new: https://git.openjdk.org/jdk/pull/27215/files/cf247cd2..79ebf2c3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27215&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27215&range=06-07

  Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/27215.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27215/head:pull/27215

PR: https://git.openjdk.org/jdk/pull/27215

From dlunden at openjdk.org  Mon Sep 15 11:46:57 2025
From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=)
Date: Mon, 15 Sep 2025 11:46:57 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v7]
In-Reply-To: <4TEpZlUksghJKcxz5Vd0kxvlt13fr-Z-4xdHD91NFtQ=.1ab10a84-1bb7-4704-a86b-dadbda0e38f4@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <uuKRfLk4kK4ri09mjxk_-1kS0hTGaNBZDCs3K7aXU5A=.0c1dfb6e-d0ce-4c8c-9259-38ef54f18048@github.com>
 <YKOrPyGjNn_bZ_pX19eJLXCtqzNUX0v802ZFuJ1g_V0=.e4cf02b9-13dd-4fd3-86cc-be5cbc83b6ea@github.com>
 <4TEpZlUksghJKcxz5Vd0kxvlt13fr-Z-4xdHD91NFtQ=.1ab10a84-1bb7-4704-a86b-dadbda0e38f4@github.com>
Message-ID: <HMD-47-16D7k5LESDD6ECgZ1ENXZgLXPrCVQYKgEakg=.b64f3d4d-a57c-4b17-8bb8-6b7f002ad183@github.com>

On Sat, 13 Sep 2025 04:46:28 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/regmask.hpp line 173:
>> 
>>> 171:   }
>>> 172: 
>>> 173:   void set_infinite() {
>> 
>> Suggestion:
>> 
>>   void set_infinite_stack() {
>> 
>> For consistency with `is_infinite_stack()`.
>
> Yes, it should be `set_infinite_stack` in parallel with `is_infinite_stack`, nice catch!

Good catch, now updated.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27215#discussion_r2348729018

From epeter at openjdk.org  Mon Sep 15 12:12:25 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 15 Sep 2025 12:12:25 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v8]
In-Reply-To: <KFySrpJo-ZYIw4bBUljBT5RWClzTjl1t91Z8gWctzsA=.2f5f8207-3c69-41ea-9b2f-d7cce6cb0a91@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <KFySrpJo-ZYIw4bBUljBT5RWClzTjl1t91Z8gWctzsA=.2f5f8207-3c69-41ea-9b2f-d7cce6cb0a91@github.com>
Message-ID: <3cfLDG8pb2LJyTfZBsd4euG-I-MPmP7jFuj4cColG10=.54cdec43-ea4e-4302-b4e6-e652ba754e77@github.com>

On Mon, 15 Sep 2025 11:46:55 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.
>> 
>> ### Changeset
>> 
>> - Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
>> - Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
>> - Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
>> - Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
>> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename set_infinite to set_infinite_stack

Marked as reviewed by epeter (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/27215#pullrequestreview-3224298424

From qamai at openjdk.org  Mon Sep 15 12:55:12 2025
From: qamai at openjdk.org (Quan Anh Mai)
Date: Mon, 15 Sep 2025 12:55:12 GMT
Subject: RFR: 8356779: IGV: dump the index of the SafePointNode containing
 the current JVMS during parsing
In-Reply-To: <dZIsHRR3HjgSVELUlQd4vQX4SeluQeiefbUxn2x_VoQ=.86c48c18-58fe-44ab-9cf8-b32c2c8d8827@github.com>
References: <dZIsHRR3HjgSVELUlQd4vQX4SeluQeiefbUxn2x_VoQ=.86c48c18-58fe-44ab-9cf8-b32c2c8d8827@github.com>
Message-ID: <EvqZHnkOCMUcyknoMYe-nfvcwHSh6mQzHFXI3w40370=.9ceae312-056f-4264-bec1-bc3777909e61@github.com>

On Thu, 4 Sep 2025 05:22:00 GMT, Saranya Natarajan <snatarajan at openjdk.org> wrote:

> This PR prints the index of the SafePointNode containing the current JVMS during parsing in IGV. As stated in JBS, the motivation is that there are a lot of nodes during parsing, so it would be useful to know which nodes are currently in the local slots or on the stack when looking at a graph.
> 
> IGV screenshot before the fix
> <img width="314" height="675" alt="Screenshot 2025-09-15 at 11 56 54" src="https://github.com/user-attachments/assets/3489f580-f4a3-4f22-86a6-0d9351f1d143" />
> 
> IGV screenshot after the fix
> <img width="314" height="652" alt="Screenshot 2025-09-15 at 11 54 55" src="https://github.com/user-attachments/assets/239c7d60-7a2a-4608-acd0-036bab3ae048" />

Marked as reviewed by qamai (Committer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/27083#pullrequestreview-3224467839

From rcastanedalo at openjdk.org  Mon Sep 15 14:01:28 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 15 Sep 2025 14:01:28 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v8]
In-Reply-To: <KFySrpJo-ZYIw4bBUljBT5RWClzTjl1t91Z8gWctzsA=.2f5f8207-3c69-41ea-9b2f-d7cce6cb0a91@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <KFySrpJo-ZYIw4bBUljBT5RWClzTjl1t91Z8gWctzsA=.2f5f8207-3c69-41ea-9b2f-d7cce6cb0a91@github.com>
Message-ID: <Ydm93wL3jzsDXadmcQLQG5el5uk6ymXe1g45woigcx4=.032db51b-8217-4234-ba68-134dbec59e4c@github.com>

On Mon, 15 Sep 2025 11:46:55 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.
>> 
>> ### Changeset
>> 
>> - Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
>> - Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
>> - Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
>> - Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
>> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename set_infinite to set_infinite_stack

Looks good!

-------------

Marked as reviewed by rcastanedalo (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27215#pullrequestreview-3224780646

From shade at openjdk.org  Mon Sep 15 14:08:57 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 15 Sep 2025 14:08:57 GMT
Subject: RFR: 8367313: CTW: Execute in AWT headless mode [v2]
In-Reply-To: <d8Vd-xRvi6T6oizZvB3M7abUjT8ifE1n157D24UXGmA=.9370efd0-cc85-4190-bf23-fb7ca62301dd@github.com>
References: <d8Vd-xRvi6T6oizZvB3M7abUjT8ifE1n157D24UXGmA=.9370efd0-cc85-4190-bf23-fb7ca62301dd@github.com>
Message-ID: <iybHTcZUmGuAWqzGCjVu1j9492b1ANCrc-IsWpf5qVs=.a1f74414-739f-4433-a1c6-653ec3c85e15@github.com>

> I have been doing CTW parallelization improvements, and noticed that some of the AWT clinits run and initialize the graphics stack. This is awkward for a few reasons:
> 
>  1. We might be running in a headless environment, where these clinits could fail, shrinking the CTW testing scope.
>  2. There are dependencies in graphics stack initialization that break: in one case in my parallelization tests, I saw the VM crash due to an uninitialized AWT lock, because the randomized CTW runner managed to execute clinits in an unusual order. Running in headless mode avoids that path altogether.
> 
> I think we should be running CTW tests in AWT headless mode to begin with. 
> 
> Additional testing:
>  - [x] Linux x86_64 server fastdebug, `applications/ctw/modules`
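For reference, the standard switch for AWT headless mode is the `java.awt.headless` system property, usually passed as `-Djava.awt.headless=true`. The sketch below sets it programmatically and checks it through `GraphicsEnvironment.isHeadless()`; the class and method names are hypothetical, and the PR itself may configure the CTW runner differently (for example, on the command line).

```java
import java.awt.GraphicsEnvironment;

// Hypothetical helper, not part of the CTW runner.
final class HeadlessSetup {
    static boolean enableHeadless() {
        // Must run before GraphicsEnvironment is first queried:
        // the headless decision is computed once and then cached.
        System.setProperty("java.awt.headless", "true");
        return GraphicsEnvironment.isHeadless();
    }

    public static void main(String[] args) {
        System.out.println("headless = " + enableHeadless());
    }
}
```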

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:

 - Merge branch 'master' into JDK-8367313-ctw-headless-mode
 - Fix

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27187/files
  - new: https://git.openjdk.org/jdk/pull/27187/files/c4684176..75df3054

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27187&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27187&range=00-01

  Stats: 29954 lines in 1034 files changed: 14950 ins; 9185 del; 5819 mod
  Patch: https://git.openjdk.org/jdk/pull/27187.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27187/head:pull/27187

PR: https://git.openjdk.org/jdk/pull/27187

From shade at openjdk.org  Mon Sep 15 14:08:58 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 15 Sep 2025 14:08:58 GMT
Subject: RFR: 8367313: CTW: Execute in AWT headless mode
In-Reply-To: <d8Vd-xRvi6T6oizZvB3M7abUjT8ifE1n157D24UXGmA=.9370efd0-cc85-4190-bf23-fb7ca62301dd@github.com>
References: <d8Vd-xRvi6T6oizZvB3M7abUjT8ifE1n157D24UXGmA=.9370efd0-cc85-4190-bf23-fb7ca62301dd@github.com>
Message-ID: <lm35bekxfl7WrocJ_USCc8ozYPKrGgi-sq9-vHI63WM=.0794c1ec-246f-4688-acf4-0b30360e6444@github.com>

On Wed, 10 Sep 2025 08:11:43 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> I have been doing CTW parallelization improvements, and noticed that some of the AWT clinits run and initialize the graphics stack. This is awkward for a few reasons:
> 
>  1. We might be running in a headless environment, where these clinits could fail, shrinking the CTW testing scope.
>  2. There are dependencies in graphics stack initialization that break: in one case in my parallelization tests, I saw the VM crash due to an uninitialized AWT lock, because the randomized CTW runner managed to execute clinits in an unusual order. Running in headless mode avoids that path altogether.
> 
> I think we should be running CTW tests in AWT headless mode to begin with. 
> 
> Additional testing:
>  - [x] Linux x86_64 server fastdebug, `applications/ctw/modules`

Friendly reminder; @TobiHartmann, maybe?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27187#issuecomment-3292351557

From roland at openjdk.org  Mon Sep 15 14:25:02 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 15 Sep 2025 14:25:02 GMT
Subject: RFR: 8361702: C2: assert(is_dominator(compute_early_ctrl(limit,
 limit_ctrl), pre_end)) failed: node pinned on loop exit test? [v9]
In-Reply-To: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com>
References: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com>
Message-ID: <tPTg71BiV4FeyzlVCx6nLOzE6ggwOZZqakhbiiPXIB0=.0d0c76f8-47fc-4cb0-a880-7c8b9c80be4b@github.com>

> A node in a pre loop only has uses out of the loop dominated by the
> loop exit. `PhaseIdealLoop::try_sink_out_of_loop()` sets its control
> to the loop exit projection. A range check in the main loop has this
> node as input (through a chain of some other nodes). Range check
> elimination needs to update the exit condition of the pre loop with an
> expression that depends on the node pinned on its exit: that's
> impossible and the assert fires. This is a variant of 8314024 (this
> one was for a node with uses out of the pre loop on multiple paths). I
> propose the same fix: leave the node with control in the pre loop in
> this case.

Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:

  review

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26424/files
  - new: https://git.openjdk.org/jdk/pull/26424/files/ec28714e..ed103c23

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26424&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26424&range=07-08

  Stats: 7 lines in 1 file changed: 6 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/26424.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26424/head:pull/26424

PR: https://git.openjdk.org/jdk/pull/26424

From roland at openjdk.org  Mon Sep 15 14:25:03 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 15 Sep 2025 14:25:03 GMT
Subject: RFR: 8361702: C2: assert(is_dominator(compute_early_ctrl(limit,
 limit_ctrl), pre_end)) failed: node pinned on loop exit test? [v4]
In-Reply-To: <ZR2wT4ta4I8R7bXaqA5deU2wug1ZyEx5INcSFqF8mBI=.5d7ac45a-49a2-43e8-9c44-b0ca71e25fe5@github.com>
References: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com>
 <6dWR-SxhuKd9-T3q313I6at4vTBcYlufyCBNjGGopv4=.cae3abea-0752-4191-ac08-890476489af3@github.com>
 <wzkv2ry6TsRXprP_LR7q0l9atl1GFIP4RwNgoEGnkV4=.3ce72d24-5da5-423b-b2d3-36f3fcb681dd@github.com>
 <m5wop1pevTtf0G1NtQlQBQsdMPY0wT0rbQdo4H-k2EY=.10fe2102-936e-44bf-b599-5a3844bbeb15@github.com>
 <ZR2wT4ta4I8R7bXaqA5deU2wug1ZyEx5INcSFqF8mBI=.5d7ac45a-49a2-43e8-9c44-b0ca71e25fe5@github.com>
Message-ID: <PtGoUUKoJ9OqzJHQ5bhVDoSGfQ6SC9JQ6rqYeoMl4BA=.c104e514-ab8b-4862-9a3d-7b4b51629c4c@github.com>

On Mon, 15 Sep 2025 09:39:21 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> The `@requires` is there because the test run needs command line options that are C2-specific.
>
> @rwestrel

@eme64 is the new commit what you had in mind?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26424#discussion_r2349176886

From shade at openjdk.org  Mon Sep 15 14:27:28 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 15 Sep 2025 14:27:28 GMT
Subject: RFR: 8357258: x86: Improve receiver type profiling reliability
 [v2]
In-Reply-To: <X9RJoDWqk67MlK7P4ACCbzROxLkfEqvU0zDid3zymnc=.7c0086a3-f514-4b0d-8bc0-592bf4a05bba@github.com>
References: <X9RJoDWqk67MlK7P4ACCbzROxLkfEqvU0zDid3zymnc=.7c0086a3-f514-4b0d-8bc0-592bf4a05bba@github.com>
Message-ID: <xpPUX9kLrXOkdpnXBq9YxRg7Xqmqtxanr9exEfPPn-I=.39db801b-1227-4d8e-8103-1317fb914731@github.com>

> See the bug for discussion what issues current machinery has. 
> 
> This PR executes the plan outlined in the bug:
>  1. Common the receiver type profiling code in interpreter and C1
>  2. Rewrite receiver type profiling code to only do atomic receiver slot installations
>  3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed 
> 
> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral.
> 
> Additional testing:
>   - [x] Linux x86_64 server fastdebug, `compiler/`
>   - [ ] Linux x86_64 server fastdebug, `all`
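A rough Java model of step 2 ("only do atomic receiver slot installations") may help. This is an assumption-laden sketch, not the interpreter/C1 code, which emits machine code that updates `MethodData`: the class, the row layout, and the overflow counter are hypothetical. A row's receiver is only ever installed with a compare-and-set, so two threads cannot overwrite each other's receiver type, while the counters stay plain (racy) increments, matching the PR's decision not to make counter updates atomic.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical model of a receiver type profile; not HotSpot's ReceiverTypeData.
final class ReceiverTypeProfile {
    private final AtomicReferenceArray<Class<?>> receivers;
    private final long[] counts;   // plain, intentionally non-atomic increments
    private long polymorphic;      // receivers that did not fit in any row

    ReceiverTypeProfile(int rows) {
        receivers = new AtomicReferenceArray<>(rows);
        counts = new long[rows];
    }

    void record(Object receiver) {
        Class<?> k = receiver.getClass();
        for (int i = 0; i < counts.length; i++) {
            Class<?> cur = receivers.get(i);
            if (cur == null) {
                // Claim the empty row atomically; if another thread won the race,
                // re-read to see which type it installed.
                if (receivers.compareAndSet(i, null, k)) {
                    counts[i]++;
                    return;
                }
                cur = receivers.get(i);
            }
            if (cur == k) {
                counts[i]++;
                return;
            }
            // Rows are never cleared and a thread only claims row i after seeing
            // rows 0..i-1 occupied, so an empty row means k is not installed later.
        }
        polymorphic++;
    }

    Class<?> receiverAt(int row) { return receivers.get(row); }
    long countAt(int row) { return counts[row]; }
    long polymorphicCount() { return polymorphic; }

    public static void main(String[] args) {
        ReceiverTypeProfile p = new ReceiverTypeProfile(2);
        p.record("a");
        p.record(1);
        p.record(2.0); // third type does not fit into two rows
        System.out.println(p.polymorphicCount()); // 1
    }
}
```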

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls
 - Drop atomic counters
 - Initial version

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25305/files
  - new: https://git.openjdk.org/jdk/pull/25305/files/e078cfb1..f934435b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25305&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25305&range=00-01

  Stats: 29954 lines in 1034 files changed: 14950 ins; 9185 del; 5819 mod
  Patch: https://git.openjdk.org/jdk/pull/25305.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25305/head:pull/25305

PR: https://git.openjdk.org/jdk/pull/25305

From epeter at openjdk.org  Mon Sep 15 15:12:42 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 15 Sep 2025 15:12:42 GMT
Subject: RFR: 8361702: C2: assert(is_dominator(compute_early_ctrl(limit,
 limit_ctrl), pre_end)) failed: node pinned on loop exit test? [v9]
In-Reply-To: <tPTg71BiV4FeyzlVCx6nLOzE6ggwOZZqakhbiiPXIB0=.0d0c76f8-47fc-4cb0-a880-7c8b9c80be4b@github.com>
References: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com>
 <tPTg71BiV4FeyzlVCx6nLOzE6ggwOZZqakhbiiPXIB0=.0d0c76f8-47fc-4cb0-a880-7c8b9c80be4b@github.com>
Message-ID: <Qk2Bc6VQMBPc3dro9kabXA1XMjE3m0iXCaP12f3FG0I=.2dd7170a-6354-4e9d-8a1f-d8d201836cf4@github.com>

On Mon, 15 Sep 2025 14:25:02 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> A node in a pre loop only has uses out of the loop dominated by the
>> loop exit. `PhaseIdealLoop::try_sink_out_of_loop()` sets its control
>> to the loop exit projection. A range check in the main loop has this
>> node as input (through a chain of some other nodes). Range check
>> elimination needs to update the exit condition of the pre loop with an
>> expression that depends on the node pinned on its exit: that's
>> impossible and the assert fires. This is a variant of 8314024 (this
>> one was for a node with uses out of the pre loop on multiple paths). I
>> propose the same fix: leave the node with control in the pre loop in
>> this case.
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
> 
>   review

Yes, thanks for the updates.

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26424#pullrequestreview-3225140490

From jbhateja at openjdk.org  Mon Sep 15 15:28:54 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 15 Sep 2025 15:28:54 GMT
Subject: RFR: 8367333: C2: Vector math operation intrinsification failure
In-Reply-To: <qpB4c9uXgeU4UbE9Zm_Pb-52QHm203ICO6IdT1w_bGQ=.be2804be-add6-4157-a335-dea1e0920e08@github.com>
References: <qpB4c9uXgeU4UbE9Zm_Pb-52QHm203ICO6IdT1w_bGQ=.be2804be-add6-4157-a335-dea1e0920e08@github.com>
Message-ID: <5xYVPgSuC3a9kqp_hRs3vgtBDoJzlmf9v6wgMa9XFJ4=.c8abf0f6-b563-4b3f-92c3-d902b6e59950@github.com>

On Fri, 12 Sep 2025 19:14:18 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

> As part of [JDK-8353786](https://bugs.openjdk.org/browse/JDK-8353786), C2 support for operations backed by the vector math library was completely removed. On the JDK side, special dispatching logic was added to avoid intrinsic calls in `jdk.internal.vm.vector.VectorSupport`. However, it is still possible to observe paradoxical situations (intrinsic calls with obsolete operation IDs) when processing effectively dead code.
> 
> Consider `FloatVector::lanewiseTemplate`:
> 
>     FloatVector lanewiseTemplate(VectorOperators.Unary op) {
>         if (opKind(op, VO_SPECIAL)) {
>             ...                             
>             else if (opKind(op, VO_MATHLIB)) {
>                 return unaryMathOp(op);
>             }
>         }
>         int opc = opCode(op);
>         return VectorSupport.unaryOp(opc, ...);
>     }
> 
> 
> At runtime, `unaryMathOp` is unconditionally invoked, but during compilation it's possible to end up with an intrinsification attempt of `VectorSupport.unaryOp()` before `opKind(op, VO_SPECIAL)` is inlined. 
> 
> It can be reliably reproduced with the `-XX:+StressIncrementalInlining` flag.
> 
> The fix is to make intrinsification fail fast rather than crash the VM.
> 
> Testing: tier1 - tier4
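The fail-fast idea can be modeled in a few lines of Java. This is only an illustration: the real change lives in C2's intrinsic handling (C++), and the class, method, and operation IDs below are invented. When an operation ID is not (or no longer) supported, the intrinsic attempt returns false so the ordinary call is kept, which is harmless because that path is effectively dead at runtime.

```java
import java.util.Set;

// Hypothetical model of "fail-fast" intrinsification; not C2 code.
final class IntrinsicDispatch {
    // Invented operation IDs that the modeled compiler can still intrinsify.
    static final Set<Integer> SUPPORTED_UNARY_OPS = Set.of(1 /* NEG */, 2 /* ABS */, 3 /* SQRT */);

    // Returns true if the call was intrinsified, false to keep the regular call.
    static boolean tryInlineUnaryOp(int opc) {
        if (!SUPPORTED_UNARY_OPS.contains(opc)) {
            // E.g. an obsolete math-library operation seen in effectively dead code,
            // before the guarding opKind(op, VO_SPECIAL) check has been inlined.
            // Bail out instead of asserting or crashing.
            return false;
        }
        // A real compiler would build the vector node here (omitted in this model).
        return true;
    }

    public static void main(String[] args) {
        System.out.println(tryInlineUnaryOp(2));   // true
        System.out.println(tryInlineUnaryOp(100)); // false: hypothetical obsolete SIN ID
    }
}
```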

LGTM

Best Regards

test/hotspot/jtreg/compiler/vectorapi/TestVectorMathLib.java line 40:

> 38:  *                   -XX:CompileCommand=compileonly,compiler.vectorapi.TestVectorMathLib::test*
> 39:  *                   -XX:+StressIncrementalInlining
> 40:  *                       compiler.vectorapi.TestVectorMathLib

Suggestion:

 *                   -XX:+StressIncrementalInlining compiler.vectorapi.TestVectorMathLib

-------------

Marked as reviewed by jbhateja (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27263#pullrequestreview-3225199836
PR Review Comment: https://git.openjdk.org/jdk/pull/27263#discussion_r2349369795

From epeter at openjdk.org  Mon Sep 15 15:31:24 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 15 Sep 2025 15:31:24 GMT
Subject: RFR: 8367657: C2 SuperWord: NormalMapping demo from JVMLS 2025
Message-ID: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>

Demo from here:
https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/

Cleaned up and enhanced with a JTREG and IR test.
I also added some additional "generated" normal maps from height functions.
And I display the resulting image side-by-side with the normal map.
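Generating a normal map from a height function works roughly like this: estimate the partial derivatives of h(x, y), then normalize the vector (-dh/dx, -dh/dy, 1). The Java sketch below uses central differences; it is a generic illustration with hypothetical names, not the demo's actual code, which may use a different scheme.

```java
// Hypothetical sketch: surface normal of z = h(x, y) via central differences.
final class NormalFromHeight {
    interface Height { double at(double x, double y); }

    static double[] normal(Height h, double x, double y, double eps) {
        // Central-difference estimates of the partial derivatives.
        double dx = (h.at(x + eps, y) - h.at(x - eps, y)) / (2 * eps);
        double dy = (h.at(x, y + eps) - h.at(x, y - eps)) / (2 * eps);
        // Unnormalized normal of the height field, then scale to unit length.
        double nx = -dx, ny = -dy, nz = 1.0;
        double len = Math.sqrt(nx * nx + ny * ny + nz * nz);
        return new double[] { nx / len, ny / len, nz / len };
    }

    public static void main(String[] args) {
        double[] n = normal((x, y) -> Math.sin(x) * Math.cos(y), 0.5, 0.25, 1e-4);
        System.out.printf("(%.3f, %.3f, %.3f)%n", n[0], n[1], n[2]);
    }
}
```

A flat height function yields (0, 0, 1), and a ramp h = x tilts the normal to (-1, 0, 1)/sqrt(2).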

I decided to put it in a new directory `compiler.galery`, anticipating other compiler tests that are both visually appealing (i.e. can be used for a "galery") and that we may want to back up with other tests like IR testing.

Here are some snapshots:
<img width="2000" height="991" alt="image" src="https://github.com/user-attachments/assets/a693fac8-ecf0-43f2-914b-25f76c2f425d" />
<img width="2000" height="997" alt="image" src="https://github.com/user-attachments/assets/c2202e6b-6a90-4f90-a3ca-b73304e25905" />
<img width="1997" height="992" alt="image" src="https://github.com/user-attachments/assets/0d6da304-6bb9-4b25-9a7b-72019b02d95e" />
<img width="1992" height="994" alt="image" src="https://github.com/user-attachments/assets/9f5f7426-0678-45af-a3eb-ac092c262d4c" />

-------------

Commit messages:
 - more prints
 - comments
 - update
 - more details
 - documentation
 - IR rule
 - simplify timeout
 - handle timeouts, maybe a bit cluncky
 - fix path issuesg
 - add stand-alone
 - ... and 1 more: https://git.openjdk.org/jdk/compare/2826d170...1bdaf5fc

Changes: https://git.openjdk.org/jdk/pull/27282/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27282&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367657
  Stats: 659 lines in 4 files changed: 659 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/27282.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27282/head:pull/27282

PR: https://git.openjdk.org/jdk/pull/27282

From eastigeevich at openjdk.org  Mon Sep 15 15:43:14 2025
From: eastigeevich at openjdk.org (Evgeny Astigeevich)
Date: Mon, 15 Sep 2025 15:43:14 GMT
Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache
 [v43]
In-Reply-To: <sFZEr-LpWdiPRfqZSVsOjFzneGp9ruJ2D_iumNYryIA=.a25df53b-60c0-42ff-bc62-cdc825394d15@github.com>
References: <CpuGkGuFlcXd3ZwuZCG8oWEEa2GKgTs3LaGwpIESm9g=.4807c870-5cce-4f87-aca7-79c1b87e7b0a@github.com>
 <DIk0WWyGJ3TcW7ZgdBXt3Km27oUXEKwUmia-ATfPsHk=.b2dd102b-d7e3-4e47-939e-3e6e627bef91@github.com>
 <sFZEr-LpWdiPRfqZSVsOjFzneGp9ruJ2D_iumNYryIA=.a25df53b-60c0-42ff-bc62-cdc825394d15@github.com>
Message-ID: <_gl77pppG0Zcwu5LuuEHISHJ27TyQuIgvkQ_fovBYJ0=.2c65934b-cab9-45da-9c34-bd45d68d0ef6@github.com>

On Tue, 26 Aug 2025 10:03:29 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix WB_RelocateNMethodFromAddr to not use stale nmethod pointer
>
> A side comment, which I didn't find discussed in the JEP or in the issues (maybe I just missed it):
> There can also be a significant performance improvement from using direct jumps instead of indirect jumps, as well as reduced memory pressure. E.g. a direct BL vs. a BL to LDR + BR + <8 byte address>.
> 
> Hence it would be good to place hot methods within the hot area in "call sequences", as an application may have many hot methods totally unrelated to each other. This also means you really would like to have e.g. vtable stub in reach of BL in above case to get the most out of it.

@robehn 

> A side comment, which I didn't find discussed in the JEP or in the issues (maybe I just missed it): There can also be a significant performance improvement from using direct jumps instead of indirect jumps, as well as reduced memory pressure. E.g. a direct BL vs. a BL to LDR + BR + <8 byte address>.

Java calls, which are indirect in the original nmethod, will become direct if callees get close to the copy.
Java calls, which are direct in the original nmethod, will become indirect if callees get far from the copy.
We can do this because we use trampolines for Java calls.

Runtime calls which are indirect in the original nmethod will stay indirect. Whether runtime calls are direct or indirect depends on the static configuration of the CodeCache, not on the placement of the nmethod within it. See `target_needs_far_branch()` in `src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp`. We don't use trampolines for runtime calls. Maybe it's worth switching to trampolines for runtime calls as well; we already have a mechanism for shared trampolines. Runtime calls are always direct for the default CodeCache configuration: 240MB, three code heaps.
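The direct-versus-indirect distinction comes down to branch reach. On AArch64, `B`/`BL` encode a signed 26-bit offset counted in 4-byte instructions, so a direct call reaches targets from -128 MiB to +128 MiB - 4 bytes relative to the call site; anything farther needs an indirect sequence such as a trampoline. The sketch below (hypothetical names) only models that arithmetic and is not HotSpot's `target_needs_far_branch()`.

```java
// Hypothetical sketch of AArch64 BL reachability; not HotSpot code.
final class Aarch64BranchReach {
    // imm26 is a signed word offset: [-2^25, 2^25 - 1] words = [-2^27, 2^27 - 4] bytes.
    static final long MAX_FORWARD = (1L << 27) - 4;   // +128 MiB - 4 bytes
    static final long MAX_BACKWARD = -(1L << 27);     // -128 MiB

    static boolean reachableByBL(long branchPc, long target) {
        long offset = target - branchPc;
        // Targets must be 4-byte aligned relative to the branch and within range.
        return (offset & 3) == 0 && offset >= MAX_BACKWARD && offset <= MAX_FORWARD;
    }

    public static void main(String[] args) {
        long pc = 0x1_0000_0000L;
        System.out.println(reachableByBL(pc, pc + 4096));              // true
        System.out.println(reachableByBL(pc, pc + 200L * 1024 * 1024)); // false: needs a trampoline
    }
}
```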

> 
> Hence it would be good to place hot methods within the hot area in "call sequences", as an application may have many hot methods totally unrelated to each other.

We are working on an algorithm to identify candidates to be placed together in the hot code heap. It can consider putting together a caller and its callees in the hot code heap.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3292817263

From duke at openjdk.org  Mon Sep 15 15:44:21 2025
From: duke at openjdk.org (Thomas Zimmermann)
Date: Mon, 15 Sep 2025 15:44:21 GMT
Subject: RFR: 8367657: C2 SuperWord: NormalMapping demo from JVMLS 2025
In-Reply-To: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
References: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
Message-ID: <Do8L67wogjgIzwdDapssT4QQwBlchXTtZyrGE5tAzf4=.32f1232d-b6ab-4975-a13d-47d22807583e@github.com>

On Mon, 15 Sep 2025 06:01:46 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> Demo from here:
> https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/
> 
> Cleaned up and enhanced with a JTREG and IR test.
> I also added some additional "generated" normal maps from height functions.
> And I display the resulting image side-by-side with the normal map.
> 
> I decided to put it in a new directory `compiler.galery`, anticipating other compiler tests that are both visually appealing (i.e. can be used for a "galery") and that we may want to back up with other tests like IR testing.
> 
> There is a **stand-alone** way to run the demo:
> `java test/hotspot/jtreg/compiler/galery/NormalMapping.java`
> (though it may only run with JDK22+, probably due to some Amber features)
> 
> Here are some snapshots, but **I really recommend pulling the diff and playing with it; it looks much better in motion**:
> <img width="2000" height="991" alt="image" src="https://github.com/user-attachments/assets/a693fac8-ecf0-43f2-914b-25f76c2f425d" />
> <img width="2000" height="997" alt="image" src="https://github.com/user-attachments/assets/c2202e6b-6a90-4f90-a3ca-b73304e25905" />
> <img width="1997" height="992" alt="image" src="https://github.com/user-attachments/assets/0d6da304-6bb9-4b25-9a7b-72019b02d95e" />
> <img width="1992" height="994" alt="image" src="https://github.com/user-attachments/assets/9f5f7426-0678-45af-a3eb-ac092c262d4c" />

Shouldn't it be "gallery" or am I missing a joke?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27282#issuecomment-3292820860

From cslucas at openjdk.org  Mon Sep 15 16:48:42 2025
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Mon, 15 Sep 2025 16:48:42 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT [v2]
In-Reply-To: <MR6smbt34Y5cBmaZVoGh6lE92Tsg6-598lkohdfAD0o=.d517b6b6-cd86-4beb-a270-71b95c3d50d1@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
 <gFzwQrRGY91LYbe0DluSjX80ET_kk6Kx7arucF2GW5c=.55deed6b-9e47-4000-9916-f1392967aa99@github.com>
 <tXYgEjTd-sJKM-pfP6s4-b1ej_0qFGlsCcLB7hLNYxM=.f72a884f-8b9a-43fc-91ff-0b5a1c91fbfc@github.com>
 <MR6smbt34Y5cBmaZVoGh6lE92Tsg6-598lkohdfAD0o=.d517b6b6-cd86-4beb-a270-71b95c3d50d1@github.com>
Message-ID: <hNyh8-AvnNcz_27wUm_dmeEF2IHgSHwiMoEdioCkTLw=.ac333544-b95a-4860-84f7-5fd46c009f30@github.com>

On Fri, 12 Sep 2025 12:37:54 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> src/hotspot/share/opto/escape.cpp line 3135:
>> 
>>> 3133:           Node* phi = use->ideal_node();
>>> 3134:           if (phi->Opcode() == Op_Phi && reducible_merges.member(phi)) {
>>> 3135:             if (!can_reduce_phi(phi->as_Phi())) {
>> 
>> Drive-by comment: I think the ifs should be merged
>
> @JohnTortugo: this comment is marked as resolved in the PR but I cannot see any reply or actual code change, did you perhaps forget pushing the requested change?

Apologies, I clicked resolve and didn't see it later on. I'll push it as soon as I have some time.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27063#discussion_r2349577410

From epeter at openjdk.org  Mon Sep 15 17:27:58 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 15 Sep 2025 17:27:58 GMT
Subject: RFR: 8367657: C2 SuperWord: NormalMapping demo from JVMLS 2025
 [v2]
In-Reply-To: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
References: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
Message-ID: <NGnJ-KCo1nn836z293riUS24I4eP7aweLQXNL2iGBLI=.784a0d47-0830-4b37-b7f5-df4490166399@github.com>

> Demo from here:
> https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/
> 
> Cleaned up and enhanced with a JTREG and IR test.
> I also added some additional "generated" normal maps from height functions.
> And I display the resulting image side-by-side with the normal map.
> 
> I decided to put it in a new directory `compiler.gallery`, anticipating other compiler tests that are both visually appealing (i.e. can be used for a "gallery") and that we may want to back up with other tests like IR testing.
> 
> There is a **stand-alone** way to run the demo:
> `java test/hotspot/jtreg/compiler/gallery/NormalMapping.java`
> (though it may only run with JDK22+, probably due to some Amber features)
> 
> Here are some snapshots, but **I really recommend pulling the diff and playing with it; it looks much better in motion**:
> <img width="2000" height="991" alt="image" src="https://github.com/user-attachments/assets/a693fac8-ecf0-43f2-914b-25f76c2f425d" />
> <img width="2000" height="997" alt="image" src="https://github.com/user-attachments/assets/c2202e6b-6a90-4f90-a3ca-b73304e25905" />
> <img width="1997" height="992" alt="image" src="https://github.com/user-attachments/assets/0d6da304-6bb9-4b25-9a7b-72019b02d95e" />
> <img width="1992" height="994" alt="image" src="https://github.com/user-attachments/assets/9f5f7426-0678-45af-a3eb-ac092c262d4c" />

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  galery -> gallery

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27282/files
  - new: https://git.openjdk.org/jdk/pull/27282/files/1bdaf5fc..40f1f38f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27282&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27282&range=00-01

  Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/27282.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27282/head:pull/27282

PR: https://git.openjdk.org/jdk/pull/27282

From epeter at openjdk.org  Mon Sep 15 17:41:34 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 15 Sep 2025 17:41:34 GMT
Subject: RFR: 8367657: C2 SuperWord: NormalMapping demo from JVMLS 2025
 [v3]
In-Reply-To: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
References: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
Message-ID: <Q5K0iEuMtU1WeixSMQh9WJ9DUWe-Ci5juCT_BzoO2HE=.ff056548-8238-48bd-9f2f-66e13c43dff8@github.com>

> Demo from here:
> https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/
> 
> Cleaned up and enhanced with a JTREG and IR test.
> I also added some additional "generated" normal maps from height functions.
> And I display the resulting image side-by-side with the normal map.
> 
> I decided to put it in a new directory `compiler.gallery`, anticipating other compiler tests that are both visually appealing (i.e. can be used for a "gallery") and that we may want to back up with other tests like IR testing.
> 
> There is a **stand-alone** way to run the demo:
> `java test/hotspot/jtreg/compiler/gallery/NormalMapping.java`
> (though it may only run with JDK 22+, probably due to some Amber features)
> 
> Here are some snapshots, but **I really recommend pulling the diff and playing with it; it looks much better in motion**:
> <img width="2000" height="991" alt="image" src="https://github.com/user-attachments/assets/a693fac8-ecf0-43f2-914b-25f76c2f425d" />
> <img width="2000" height="997" alt="image" src="https://github.com/user-attachments/assets/c2202e6b-6a90-4f90-a3ca-b73304e25905" />
> <img width="1997" height="992" alt="image" src="https://github.com/user-attachments/assets/0d6da304-6bb9-4b25-9a7b-72019b02d95e" />
> <img width="1992" height="994" alt="image" src="https://github.com/user-attachments/assets/9f5f7426-0678-45af-a3eb-ac092c262d4c" />

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  fix inlining
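The "generated" normal maps mentioned in the description start from a height function. As a rough, hypothetical illustration (not the demo's actual code; the class and method names are made up), the surface normal of z = h(x, y) can be approximated with central differences as (-dh/dx, -dh/dy, 1), normalized, and then used for simple Lambertian shading against a light direction:

```java
import java.util.function.DoubleBinaryOperator;

public class NormalMapSketch {
    // Fills nx, ny, nz (each width*height, row-major) with unit normals of the
    // surface z = h(x, y), approximating the gradient with central differences.
    static void normalsFromHeight(DoubleBinaryOperator h, int width, int height, double eps,
                                  float[] nx, float[] ny, float[] nz) {
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                double dhdx = (h.applyAsDouble(x + eps, y) - h.applyAsDouble(x - eps, y)) / (2 * eps);
                double dhdy = (h.applyAsDouble(x, y + eps) - h.applyAsDouble(x, y - eps)) / (2 * eps);
                // Unnormalized normal of z = h(x, y) is (-dh/dx, -dh/dy, 1).
                double len = Math.sqrt(dhdx * dhdx + dhdy * dhdy + 1.0);
                int i = y * width + x;
                nx[i] = (float) (-dhdx / len);
                ny[i] = (float) (-dhdy / len);
                nz[i] = (float) (1.0 / len);
            }
        }
    }

    // Lambertian shading: brightness = max(0, n . l) for a unit light direction l.
    static void shade(float[] nx, float[] ny, float[] nz,
                      float lx, float ly, float lz, float[] out) {
        for (int i = 0; i < out.length; i++) {
            float d = nx[i] * lx + ny[i] * ly + nz[i] * lz;
            out[i] = Math.max(0f, d);
        }
    }

    public static void main(String[] args) {
        int w = 4, hgt = 3;
        float[] nx = new float[w * hgt], ny = new float[w * hgt], nz = new float[w * hgt];
        // Hypothetical height function: a plane rising along x with slope 1.
        normalsFromHeight((x, y) -> x, w, hgt, 0.5, nx, ny, nz);
        float[] lit = new float[w * hgt];
        shade(nx, ny, nz, 0f, 0f, 1f, lit);
        System.out.println(nx[0] + " " + nz[0] + " " + lit[0]);
    }
}
```

The shading step is a plain counted loop over float arrays, which is the general shape C2's SuperWord pass tries to vectorize; whether a particular loop actually vectorizes depends on the JDK version and the platform.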

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27282/files
  - new: https://git.openjdk.org/jdk/pull/27282/files/40f1f38f..47aa0c7d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27282&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27282&range=01-02

  Stats: 3 lines in 2 files changed: 1 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/27282.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27282/head:pull/27282

PR: https://git.openjdk.org/jdk/pull/27282

From dlunden at openjdk.org  Mon Sep 15 17:47:34 2025
From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=)
Date: Mon, 15 Sep 2025 17:47:34 GMT
Subject: RFR: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp [v8]
In-Reply-To: <KFySrpJo-ZYIw4bBUljBT5RWClzTjl1t91Z8gWctzsA=.2f5f8207-3c69-41ea-9b2f-d7cce6cb0a91@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
 <KFySrpJo-ZYIw4bBUljBT5RWClzTjl1t91Z8gWctzsA=.2f5f8207-3c69-41ea-9b2f-d7cce6cb0a91@github.com>
Message-ID: <azzhawpLAqSF1GfoZb6EV3lA3f48-aGmXPD2sFA-4vU=.86035cbf-567d-490a-af76-829573cecb19@github.com>

On Mon, 15 Sep 2025 11:46:55 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.
>> 
>> ### Changeset
>> 
>> - Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
>> - Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
>> - Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
>> - Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
>> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename set_infinite to set_infinite_stack

Thanks for the reviews everyone!
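To make the int-versus-word distinction behind the new names concrete, here is a small, hypothetical Java model of the index arithmetic. It assumes, as `regmask.hpp` appears to do, that on 64-bit platforms the word count is the int count rounded up to an even number and halved, and that on 32-bit platforms the two counts are equal; the class and method names are illustrative only.

```java
public class RegMaskSizingSketch {
    // Assumed rule: two 32-bit ints share one 64-bit word; on 32-bit platforms
    // a word is an int, so the counts are equal.
    static int sizeInWords(int sizeInInts, int wordBits) {
        if (wordBits == 64) return (sizeInInts + 1) / 2;
        if (wordBits == 32) return sizeInInts;
        throw new IllegalArgumentException("unsupported word size: " + wordBits);
    }

    // Largest valid index into the word-sized array (what `_RM_WORD_MAX_INDEX` names).
    static int wordMaxIndex(int sizeInInts, int wordBits) {
        return sizeInWords(sizeInInts, wordBits) - 1;
    }

    // Position of register bit 'reg' in the word-sized view: {word index, bit in word}.
    static int[] wordPosition(int reg, int wordBits) {
        return new int[] { reg / wordBits, reg % wordBits };
    }

    // Position of the same register bit in the int-sized view: {int index, bit in int}.
    static int[] intPosition(int reg) {
        return new int[] { reg / 32, reg % 32 };
    }

    public static void main(String[] args) {
        System.out.println(sizeInWords(5, 64) + " words, max index " + wordMaxIndex(5, 64));
    }
}
```

The point of the renames is visible here: "size" and "max index" mean different numbers depending on whether they count 32-bit ints or machine words.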

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27215#issuecomment-3293261684

From dlunden at openjdk.org  Mon Sep 15 17:47:36 2025
From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=)
Date: Mon, 15 Sep 2025 17:47:36 GMT
Subject: Integrated: 8367397: Improve naming and terminology in regmask.hpp and
 regmask.cpp
In-Reply-To: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
References: <RWGyjwySOg-MiHPpC_AZF8XckEwED7U31mOwlVhaKOs=.10ff23d0-8ff4-4b53-a9d7-fa890f211963@github.com>
Message-ID: <zhaxD088Xs1xvxoj9qdasZoNZvYa45kJ0_xg97NdkIU=.8fc205a8-60c6-4212-9261-d310794c7794@github.com>

On Thu, 11 Sep 2025 10:04:47 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

> Some names in `regmask.hpp` and `regmask.cpp` are unclear and should be improved.
> 
> ### Changeset
> 
> - Rename `RM_SIZE` to `RM_SIZE_IN_INTS` and `_RM_I` to `_RM_INT` to make it clear that these refer to integer-sized (32-bit) array elements.
> - Rename `_RM_SIZE` to `_RM_SIZE_IN_WORDS` and `_RM_UP` to `_RM_WORD` to make it clear that these refer to machine-word-sized (32 or 64 bits depending on platform) array elements.
> - Rename `_RM_MAX` to `_RM_WORD_MAX_INDEX` for clarity.
> - Rename `is_AllStack` to `is_infinite` (and related resulting changes in comments and local variables). The old terminology "all-stack", referring to the infinite register mask bits, is misleading (as pointed out by @eme64 in https://github.com/openjdk/jdk/pull/20404#discussion_r2316234008). The reason is that the infinite bits do not represent *all* stack bits. Some stack bits are instead part of the non-infinite bits of the register mask.
> 
> ### Testing
> 
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/17638365968)
> - `tier1` and HotSpot parts of `tier2` and `tier3` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.

This pull request has now been integrated.

Changeset: 60930a3e
Author:    Daniel Lundén <dlunden at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/60930a3e196088e239c902216de07e1cce8407e4
Stats:     135 lines in 12 files changed: 15 ins; 0 del; 120 mod

8367397: Improve naming and terminology in regmask.hpp and regmask.cpp

Reviewed-by: epeter, rcastanedalo, dlong

-------------

PR: https://git.openjdk.org/jdk/pull/27215

From sparasa at openjdk.org  Mon Sep 15 17:54:31 2025
From: sparasa at openjdk.org (Srinivas Vamsi Parasa)
Date: Mon, 15 Sep 2025 17:54:31 GMT
Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2 [v5]
In-Reply-To: <JF-5csR-8C7x2ooGamkx5B1s1eY25ehxH0mc-ngL53k=.d6626a6c-6947-4b4d-a026-1e2956c2b216@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
 <CI6jzRQmAQXIQaToisYrZGI8tLzI14TOKLtPY2zrPWw=.55db890f-8f77-4ed7-8a9c-1d99ca9487ba@github.com>
 <JF-5csR-8C7x2ooGamkx5B1s1eY25ehxH0mc-ngL53k=.d6626a6c-6947-4b4d-a026-1e2956c2b216@github.com>
Message-ID: <7RTXYjdRF7b27DdNVQHUQx0vmUhr-sqm8XU1cIAoLLo=.638d7f26-d065-47d3-ae08-1e36f75463d5@github.com>

On Thu, 11 Sep 2025 16:25:32 GMT, Srinivas Vamsi Parasa <sparasa at openjdk.org> wrote:

>> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   undo new match rules for RegMemReg for commutative operations
>
> Hi Emanuel (@eme64),
> 
> Could you please run the tests for this PR?
> 
> Thanks,
> Vamsi

> @vamsi-parasa Quickly scanned the patch, looks reasonable. Launching tests.

Hi Emanuel (@eme64), could you please let me know if the tests passed?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26997#issuecomment-3293301474

From roland at openjdk.org  Mon Sep 15 18:16:05 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 15 Sep 2025 18:16:05 GMT
Subject: RFR: 8361702: C2: assert(is_dominator(compute_early_ctrl(limit,
 limit_ctrl), pre_end)) failed: node pinned on loop exit test? [v4]
In-Reply-To: <df5Xmw34tA_D9dMVgZMhWZMJ5m4Go__wZXeF95bLK9w=.d2bbee06-90ba-4066-b764-333e02b51010@github.com>
References: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com>
 <SGrdYRJonE7IeyU3AwADQcdZrKgkZggYb7utb4-vE0o=.1646c8b9-e5b0-4d0e-bb79-4452b115e4f9@github.com>
 <df5Xmw34tA_D9dMVgZMhWZMJ5m4Go__wZXeF95bLK9w=.d2bbee06-90ba-4066-b764-333e02b51010@github.com>
Message-ID: <-9gD2_d_l2sz1el2iIeJc_GiWA0gmSwdeCygX5aqHbk=.938ce734-f23d-4275-b710-9635e8ce6e2e@github.com>

On Tue, 5 Aug 2025 09:43:35 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:
>> 
>>  - Merge branch 'master' into JDK-8361702
>>  - Update src/hotspot/share/opto/loopopts.cpp
>>    
>>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>>  - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE3.java
>>    
>>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>>  - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java
>>    
>>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>>  - Update src/hotspot/share/opto/loopopts.cpp
>>    
>>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>>  - Update test/hotspot/jtreg/compiler/rangechecks/TestSunkRangeFromPreLoopRCE2.java
>>    
>>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
>>  - tests
>>  - fix
>
> Thank you for working on this, @rwestrel. It looks good to me.

@mhaessig @chhagedorn @eme64 thanks for the reviews

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26424#issuecomment-3293366705

From roland at openjdk.org  Mon Sep 15 18:16:07 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 15 Sep 2025 18:16:07 GMT
Subject: Integrated: 8361702: C2: assert(is_dominator(compute_early_ctrl(limit, 
 limit_ctrl), pre_end)) failed: node pinned on loop exit test?
In-Reply-To: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com>
References: <1-3MDixhdwZEgDMpoAZckhK5_lFjygsKl4q1__tsCKs=.dffa9c0e-8ea1-4465-a1fc-6ad2dbcfe5db@github.com>
Message-ID: <RV9SzkFafDjSgtWytGRgY-wGXviT65Z0rId_iMWqLAg=.974129c0-94d1-4a2d-ab42-c1bd8acaa947@github.com>

On Tue, 22 Jul 2025 08:25:08 GMT, Roland Westrelin <roland at openjdk.org> wrote:

> A node in a pre loop only has uses out of the loop dominated by the
> loop exit. `PhaseIdealLoop::try_sink_out_of_loop()` sets its control
> to the loop exit projection. A range check in the main loop has this
> node as input (through a chain of some other nodes). Range check
> elimination needs to update the exit condition of the pre loop with an
> expression that depends on the node pinned on its exit: that's
> impossible and the assert fires. This is a variant of 8314024 (this
> one was for a node with uses out of the pre loop on multiple paths). I
> propose the same fix: leave the node with control in the pre loop in
> this case.

This pull request has now been integrated.

Changeset: f8ba02f2
Author:    Roland Westrelin <roland at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/f8ba02f2296f0ef0227f90e0e1ed116121e68231
Stats:     184 lines in 4 files changed: 166 ins; 7 del; 11 mod

8361702: C2: assert(is_dominator(compute_early_ctrl(limit, limit_ctrl), pre_end)) failed: node pinned on loop exit test?

Reviewed-by: epeter, chagedorn, mhaessig

-------------

PR: https://git.openjdk.org/jdk/pull/26424

From epeter at openjdk.org  Mon Sep 15 20:25:11 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 15 Sep 2025 20:25:11 GMT
Subject: RFR: 8367657: C2 SuperWord: NormalMapping demo from JVMLS 2025
 [v3]
In-Reply-To: <Q5K0iEuMtU1WeixSMQh9WJ9DUWe-Ci5juCT_BzoO2HE=.ff056548-8238-48bd-9f2f-66e13c43dff8@github.com>
References: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
 <Q5K0iEuMtU1WeixSMQh9WJ9DUWe-Ci5juCT_BzoO2HE=.ff056548-8238-48bd-9f2f-66e13c43dff8@github.com>
Message-ID: <CzIc8swnGGEOGbEVzd0Nz7NRrXeT2fFbenDQ4f_DZos=.f69a9a91-aef2-40dd-9f17-2d3f389cb650@github.com>

On Mon, 15 Sep 2025 17:41:34 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Demo from here:
>> https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/
>> 
>> Cleaned up and enhanced with a JTREG and IR test.
>> I also added some additional "generated" normal maps from height functions.
>> And I display the resulting image side-by-side with the normal map.
>> 
>> I decided to put it in a new directory `compiler.gallery`, anticipating other compiler tests that are both visually appealing (i.e. can be used for a "gallery") and that we may want to back up with other tests like IR testing.
>> 
>> There is a **stand-alone** way to run the demo:
>> `java test/hotspot/jtreg/compiler/gallery/NormalMapping.java`
>> (though it may only run with JDK 22+, probably due to some Amber features)
>> 
>> Here are some snapshots, but **I really recommend pulling the diff and playing with it; it looks much better in motion**:
>> <img width="2000" height="991" alt="image" src="https://github.com/user-attachments/assets/a693fac8-ecf0-43f2-914b-25f76c2f425d" />
>> <img width="2000" height="997" alt="image" src="https://github.com/user-attachments/assets/c2202e6b-6a90-4f90-a3ca-b73304e25905" />
>> <img width="1997" height="992" alt="image" src="https://github.com/user-attachments/assets/0d6da304-6bb9-4b25-9a7b-72019b02d95e" />
>> <img width="1992" height="994" alt="image" src="https://github.com/user-attachments/assets/9f5f7426-0678-45af-a3eb-ac092c262d4c" />
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fix inlining

@grfrost In case you missed it.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27282#issuecomment-3293800874

From dlong at openjdk.org  Mon Sep 15 23:00:37 2025
From: dlong at openjdk.org (Dean Long)
Date: Mon, 15 Sep 2025 23:00:37 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v2]
In-Reply-To: <1FgOFS7aAlEbvVUez6iTfzgf2l7qUbL9C4wfSGmmfo0=.406c10f1-63d5-4333-af6d-525e46203182@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <0WKwHjzEn5dxYLkonrk4h9yfMI3r3bKDdqgG06J69N4=.e19e9441-6197-4d53-a4f4-b196a81f69d8@github.com>
 <Ctg8ZdOt0UVSzfJ18f-0Aj1DCLSGsy3pLQnm8SzcpUg=.9eb433cd-2e53-4a12-a24f-1c493a1a7869@github.com>
 <sR_6Jh_j5oxAFBzXbSqwD2L3d74pP_-XnXM8RDWjUXA=.35937cef-adbe-418e-a0cd-3d13c4b145e5@github.com>
 <1FgOFS7aAlEbvVUez6iTfzgf2l7qUbL9C4wfSGmmfo0=.406c10f1-63d5-4333-af6d-525e46203182@github.com>
Message-ID: <IHy67LkRLB0zhka_u6d5__rQmA8iXUkObu5DzkKcgYI=.8ee80763-b01c-49a8-b4ea-fae2c83eb870@github.com>

On Thu, 11 Sep 2025 18:24:46 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> @iwanowww Let me know whenever this is ready to review again.
>
> @eme64 I think I addressed/answered all your suggestions/questions. Please, take another look. Thanks!

@iwanowww, do you have a test that shows constant oops are a problem? My initial impression is that PreserveReachabilityFencesOnConstants shouldn't be needed, because any oops referenced during the compilation should go into the ciEnv metadata[] and then into the nmethod oops. So the GC can't reclaim these oops, because the nmethod keeps references to them.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25315#issuecomment-3294254199

From dlong at openjdk.org  Tue Sep 16 01:27:22 2025
From: dlong at openjdk.org (Dean Long)
Date: Tue, 16 Sep 2025 01:27:22 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v11]
In-Reply-To: <rJb4N4TiJ4HhAMEFTVFn1e1qPpPfEwiBB6ezo7nM-cg=.fca1c659-150e-4d97-a47a-01bd836d867d@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <1pShdyn-7-wwwiuY1DdMt5iiZ2qc9l_x2F-3AKqkg60=.dd260953-05cc-4b84-b6d1-7f684e74084c@github.com>
 <EZmv1oLssvikAF7lwHm77Q5kmfuKvoqXLhxGvSL1Mho=.c7db1b91-3536-48ad-90c6-c4c098094706@github.com>
 <rJb4N4TiJ4HhAMEFTVFn1e1qPpPfEwiBB6ezo7nM-cg=.fca1c659-150e-4d97-a47a-01bd836d867d@github.com>
Message-ID: <U1p9TRvNC5OzFwxraenVJ-4R4AV5Tnyhho025Fyh-ow=.5b0cc316-6852-4f90-aede-7363eac525e7@github.com>

On Fri, 12 Sep 2025 13:39:03 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/callGenerator.cpp line 620:
>> 
>>> 618:     // Inlining logic doesn't expect any extra edges past debug info and fails with
>>> 619:     // an assert in SafePointNode::grow_stack.
>>> 620:     assert(endoff == call->req(), "reachability edges not supported");
>> 
>> Could we trip over this assert by modifying the reproducer, and add some method somewhere that gets inlined late?
>
> Could we also bail out here? Or what would happen now in production if there is a RF edge?

We also use this area past endoff() for storing the "ex_oop" (see for example GraphKit::has_saved_ex_oop()).  Are ex_oop and reachability edges mutually exclusive?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2350439892

From cslucas at openjdk.org  Tue Sep 16 02:35:01 2025
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Tue, 16 Sep 2025 02:35:01 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT [v4]
In-Reply-To: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
Message-ID: <vMfMlE3PBXuzWabH2c3DNvZRV9xHiQxE9phvtF2QiaM=.1c6d4cb9-c19d-42cd-b6e7-a4123d72d35d@github.com>

> Please, review this patch to fix issue that may occur when reducing allocation merge.
> 
> As the assert message describes, the problem is that a `Phi` considered reducible during one invocation of `adjust_scalar_replaceable_state` later turned out to be non-reducible. This situation can happen if a subsequent invocation of the same method causes all inputs to the Phi to be NSR; in that case there is no point in reducing the Phi. It can also happen during the propagation of NSR state done by `find_scalar_replaceable_allocs`. 
> 
> The change in `revisit_reducible_phi_status` is just a clean-up.
> The real fix is in `find_scalar_replaceable_allocs`.
> 
> Tested on Linux x64/Aarch64 release/fastdebug with JTREG tier1-3.

Cesar Soares Lucas has updated the pull request incrementally with two additional commits since the last revision:

 - Merge remote-tracking branch 'refs/remotes/origin/ram-non-reducible' into ram-non-reducible
 - Merge consecutive ifs

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27063/files
  - new: https://git.openjdk.org/jdk/pull/27063/files/28d9432e..2236348b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27063&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27063&range=02-03

  Stats: 7 lines in 1 file changed: 0 ins; 2 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/27063.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27063/head:pull/27063

PR: https://git.openjdk.org/jdk/pull/27063

From epeter at openjdk.org  Tue Sep 16 05:38:13 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 05:38:13 GMT
Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2 [v5]
In-Reply-To: <CI6jzRQmAQXIQaToisYrZGI8tLzI14TOKLtPY2zrPWw=.55db890f-8f77-4ed7-8a9c-1d99ca9487ba@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
 <CI6jzRQmAQXIQaToisYrZGI8tLzI14TOKLtPY2zrPWw=.55db890f-8f77-4ed7-8a9c-1d99ca9487ba@github.com>
Message-ID: <lHnIF_1S19u7UH5A4QNzvjZ3AZjH9uzRtnAVkZFuX5s=.3871ed99-4c8e-4455-85b4-a841fe34c71d@github.com>

On Thu, 11 Sep 2025 00:45:45 GMT, Srinivas Vamsi Parasa <sparasa at openjdk.org> wrote:

>> This change extends Extended EVEX (EEVEX) to REX2/REX demotion for Intel APX NDD instructions to handle commutative operations when the destination register and the second source register (src2) are the same.
>> 
>> Currently, EEVEX to REX2/REX demotion is only enabled when the first source (src1) and the destination are the same. This enhancement allows additional cases of valid demotion for commutative instructions (add, imul, and, or, xor).
>> 
>> For example:
>> `eaddl r18, r25, r18` can be encoded as `addl r18, r25` using APX REX2 encoding
>> `eaddl r2, r7, r2` can be encoded as `addl r2, r7` using non-APX legacy encoding
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   undo new match rules for RegMemReg for commutative operations

Not reviewed in detail, but looks reasonable.
Tests pass :)
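The demotion condition described in the quoted PR text can be modeled roughly as follows. This is a hypothetical Java sketch, not the HotSpot assembler code: it treats registers as plain numbers, covers only 32-bit operations such as `addl`, and assumes the usual x86 rule that r8-r15 need a REX prefix while the APX registers r16-r31 need REX2.

```java
public class ApxDemotionSketch {
    // Hypothetical model of a three-operand APX NDD instruction: dst = src1 op src2.
    record Ndd(String op, int dst, int src1, int src2, boolean commutative) {}

    // Returns the two-operand form "op dst, other (encoding)" if the NDD
    // instruction can be demoted, or null if it must keep the extended EVEX form.
    static String demote(Ndd insn) {
        int other;
        if (insn.dst() == insn.src1()) {
            other = insn.src2();   // dst = dst op src2: already two-operand shaped
        } else if (insn.commutative() && insn.dst() == insn.src2()) {
            other = insn.src1();   // dst = src1 op dst equals dst op src1 for commutative ops
        } else {
            return null;
        }
        // r16..r31 need REX2, r8..r15 need REX, r0..r7 fit the legacy encoding.
        int maxReg = Math.max(insn.dst(), other);
        String encoding = maxReg >= 16 ? "REX2" : maxReg >= 8 ? "REX" : "legacy";
        return insn.op() + " r" + insn.dst() + ", r" + other + " (" + encoding + ")";
    }

    public static void main(String[] args) {
        System.out.println(demote(new Ndd("addl", 18, 25, 18, true)));
        System.out.println(demote(new Ndd("addl", 2, 7, 2, true)));
    }
}
```

A non-commutative operation such as a subtraction with dst == src2 still returns null in this model, because swapping the operands would change the result.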

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26997#pullrequestreview-3227427375

From epeter at openjdk.org  Tue Sep 16 06:20:26 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 06:20:26 GMT
Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant
 safepoints in loops [v3]
In-Reply-To: <sn_DXhVbdmCWY_uSmAh_vTO-GXigoAEXx8XGDgKen4Q=.21e20dde-0d99-4aa8-a8fb-b647055a2afe@github.com>
References: <gH_7R5UQ0P_p9lE00k_O08uypVhvDiYBQM6fR71lnI4=.fc388dd3-cb23-44fd-8139-7b9fb95c227a@github.com>
 <sn_DXhVbdmCWY_uSmAh_vTO-GXigoAEXx8XGDgKen4Q=.21e20dde-0d99-4aa8-a8fb-b647055a2afe@github.com>
Message-ID: <QnBB1_nSHcpWJ96_Tag0PeCPzigbdbfDkpZQcgg2n3w=.2e5c596f-e67e-4592-9bc8-14cd7e6e0cc1@github.com>

On Thu, 22 May 2025 07:53:39 GMT, Qizheng Xing <qxing at openjdk.org> wrote:

>> In `PhaseIdealLoop`, `IdealLoopTree::check_safepts` method checks if any call that is guaranteed to have a safepoint dominates the tail of the loop. In the previous implementation, `check_safepts` would stop if it found a local non-call safepoint. At this time, if there was a call before the safepoint in the dom-path, this safepoint would not be eliminated.
>> 
>> <img width="353" alt="loop-safepoint" src="https://github.com/user-attachments/assets/c220e103-aaba-4e3f-98ac-1ddb6465c309" />
>> 
>> This patch changes the behavior of `check_safepts` to not stop when it finds a non-local safepoint. This makes simple loops with one method call ~3.8% faster (on aarch64).
>> 
>> 
>> Benchmark                Mode  Cnt       Score      Error  Units
>> LoopSafepoint.loopVar    avgt   15  208296.259 ± 1350.409  ns/op   # baseline
>> LoopSafepoint.loopVar    avgt   15  200692.874 ±  616.770  ns/op   # this patch
>> 
>> 
>> Testing: tier1-2 on x86_64 and aarch64.
>
> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Improve documentation comments

Wow, this took me way too long to have a look at.
But I feel like now I understand what's going on, so thanks for the additional changes on the documentation.

The approach seems very reasonable. I'd like to see a few more tests, though. Alternatively, you could point me to existing tests that already cover this; that would be fine too.

I'm soon going on vacation, so it may take me even more time to re-review.
But I'd suggest that @rwestrel look at this PR, since he last worked on this code.

@MaxXSoft Can you please merge with master as well?

I think we should also run some larger benchmarking on this patch, just to see if there are any surprises (I'd expect mostly improvements, but we shall see).
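A deliberately simplified model of the behavior change described in the quoted PR text might look like the following. It assumes the idom path from the loop tail to the loop head has already been collected into a list, and it ignores the nested-loop bookkeeping that the real `IdealLoopTree::check_safepts` performs; the enum and method names are hypothetical.

```java
import java.util.List;

public class SafepointWalkSketch {
    // Hypothetical, simplified classification of nodes on the idom path
    // walked from the loop tail up to the loop head.
    enum Kind { CALL, LOCAL_NON_CALL_SAFEPOINT, OTHER }

    // Old behavior as described: the walk stops at the first local non-call
    // safepoint, so a dominating call further up is never seen.
    static boolean oldFindsCall(List<Kind> idomPathFromTail) {
        for (Kind k : idomPathFromTail) {
            if (k == Kind.CALL) return true;
            if (k == Kind.LOCAL_NON_CALL_SAFEPOINT) return false;
        }
        return false;
    }

    // New behavior as described: keep walking past local non-call safepoints,
    // so a call that is guaranteed to safepoint and dominates the tail is still found.
    static boolean newFindsCall(List<Kind> idomPathFromTail) {
        for (Kind k : idomPathFromTail) {
            if (k == Kind.CALL) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Tail -> ... -> head: a local safepoint sits below a call on the idom path.
        List<Kind> path = List.of(Kind.OTHER, Kind.LOCAL_NON_CALL_SAFEPOINT, Kind.OTHER, Kind.CALL);
        System.out.println("old: " + oldFindsCall(path) + ", new: " + newFindsCall(path));
    }
}
```

In this model, finding such a call is what makes the loop's own non-call safepoints redundant. The nested-loop cases raised in this review (where an inner loop must not delete a safepoint that an outer loop relies on) are exactly what the simplification leaves out.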

src/hotspot/share/opto/loopnode.cpp line 3818:

> 3816: //               /   |  |
> 3817: //              v    +--+
> 3818: //        exit  4

This drawing seems a bit confusing. There seem to be 3 edges coming out of 2.
Do you think you could fix it too, just to create more clarity in the code?

src/hotspot/share/opto/loopnode.cpp line 3830:

> 3828: //
> 3829: // The insights into the problem:
> 3830: //  A) Counted loops are okay

What does it mean to be "okay"? Why are they "okay"?

src/hotspot/share/opto/loopnode.cpp line 3832:

> 3830: //  A) Counted loops are okay
> 3831: //  B) Innermost loops are okay because there's no inner loops can delete
> 3832: //     their ncsfpts

Suggestion:

//  B) Innermost loops are okay because there's no inner loops that can
//     delete their ncsfpts

Missing `that`. I feel that we are now losing information. The previous comment made a promise that your comment does not make any more. Is that intentional?

It seems the logic was: only outer loops need to mark safepoints for protection, because only loops further in can remove safepoints. Is that still correct?

src/hotspot/share/opto/loopnode.cpp line 3840:

> 3838: //     inside any nested loop, then that loop is okay
> 3839: //  E) Otherwise, if an outer loop's ncsfpt on the idom-path is nested in
> 3840: //     an inner loop, we need to prevent the inner loop from deleting it

Nice, that's indeed an improvement :)

test/hotspot/jtreg/compiler/c2/irTests/TestLoopSafepoint.java line 24:

> 22:  */
> 23: 
> 24: package compiler.c2.irTests;

We'd like to get away from putting all IR tests in `irTests`, and we'd rather put them into thematic directories.
Proposal: `compiler/loopopts/TestRedundantSafePointElimination.java`

test/hotspot/jtreg/compiler/c2/irTests/TestLoopSafepoint.java line 33:

> 31:  * @summary Tests that redundant safepoints can be eliminated in loops.
> 32:  * @library /test/lib /
> 33:  * @requires vm.compiler2.enabled

Is this `@requires` strictly required? If not, remove it so we can run these tests also with C1 and other compilers.

test/hotspot/jtreg/compiler/c2/irTests/TestLoopSafepoint.java line 66:

> 64:             empty();
> 65:         }
> 66:     }

So these do not end up being CountedLoop?

test/hotspot/jtreg/compiler/c2/irTests/TestLoopSafepoint.java line 84:

> 82:             empty();
> 83:         }
> 84:     }

All of the cases here are only single loops, right? But is the algorithm not mostly dealing with nested loops, where we have to make sure that in some cases the `SafePoint` is not eliminated? Could you add some extra cases for that?

test/micro/org/openjdk/bench/vm/compiler/LoopSafepoint.java line 76:

> 74:         }
> 75:         return sum;
> 76:     }

I think it would be nice if you made the examples in the JMH and the JTREG as similar as possible.

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23057#pullrequestreview-3227461101
PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2350903845
PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2350907448
PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2350916805
PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2350922211
PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2350951501
PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2350952358
PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2350961237
PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2350965087
PR Review Comment: https://git.openjdk.org/jdk/pull/23057#discussion_r2350967582

From epeter at openjdk.org  Tue Sep 16 06:28:24 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 06:28:24 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare
 [v13]
In-Reply-To: <Uw3Zu_QQI6cPp7DUf4uHNO_ZsrSWS1jnnJJ3gtfXyi8=.c2a93a2c-f3cf-4489-98aa-e25abc448fd7@github.com>
References: <mWHKJozX3YTvBOjLDUa1_2QQiNm6xVc2rb1FtXcZzPc=.e2405ec4-09ed-46c1-acd5-3f39083fce3f@github.com>
 <Uw3Zu_QQI6cPp7DUf4uHNO_ZsrSWS1jnnJJ3gtfXyi8=.c2a93a2c-f3cf-4489-98aa-e25abc448fd7@github.com>
Message-ID: <2DmjQ5OVBp_bHyQl1gMIB6-vNn8AgXjHbK2Geu9pWr8=.c08a85a0-c63e-4d31-9140-64c44c7b8cd6@github.com>

On Mon, 15 Sep 2025 05:43:11 GMT, erifan <duke at openjdk.org> wrote:

>> This patch optimizes the following patterns:
>> For integer types:
>> 
>> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> 
>> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt; ncond is the negation of cond.
>> 
>> For float and double types:
>> 
>> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> 
>> cond can be eq or ne.
>> 
>> Benchmarks on an Nvidia Grace machine with 128-bit SVE2, with option `-XX:UseSVE=2`:
>> 
>> Benchmark			Unit	Before		Score Error	After		Score Error	Uplift
>> testCompareEQMaskNotByte	ops/s	7912127.225	2677.289518	10266136.26	8955.008548	1.29
>> testCompareEQMaskNotDouble	ops/s	884737.6799	446.963779	1179760.772	448.031844	1.33
>> testCompareEQMaskNotFloat	ops/s	1765045.787	682.332214	2359520.803	896.305743	1.33
>> testCompareEQMaskNotInt		ops/s	1787221.411	977.743935	2353952.519	960.069976	1.31
>> testCompareEQMaskNotLong	ops/s	895297.1974	673.44808	1178449.02	323.804205	1.31
>> testCompareEQMaskNotShort	ops/s	3339987.002	3415.2226	4712761.965	2110.862053	1.41
>> testCompareGEMaskNotByte	ops/s	7907615.16	4094.243652	10251646.9	9486.699831	1.29
>> testCompareGEMaskNotInt		ops/s	1683738.958	4233.813092	2352855.205	1251.952546	1.39
>> testCompareGEMaskNotLong	ops/s	854496.1561	8594.598885	1177811.493	521.1229	1.37
>> testCompareGEMaskNotShort	ops/s	3341860.309	1578.975338	4714008.434	1681.10365	1.41
>> testCompareGTMaskNotByte	ops/s	7910823.674	2993.367032	10245063.58	9774.75138	1.29
>> testCompareGTMaskNotInt		ops/s	1673393.928	3153.099431	2353654.521	1190.848583	1.4
>> testCompareGTMaskNotLong	ops/s	849405.9159	2432.858159	1177952.041	359.96413	1.38
>> testCompareGTMaskNotShort	ops/s	3339509.141	3339.976585	4711442.496	2673.364893	1.41
>> testCompareLEMaskNotByte	ops/s	7911340.004	3114.69191	10231626.5	27134.20035	1.29
>> testCompareLEMaskNotInt		ops/s	1675812.113	1340.969885	2353255.341	1452.4522	1.4
>> testCompareLEMaskNotLong	ops/s	848862.8036	6564.841731	1177763.623	539.290106	1.38
>> testCompareLEMaskNotShort	ops/s	3324951.54	2380.29473	4712116.251	1544.559684	1.41
>> testCompareLTMaskNotByte	ops/s	7910390.844	2630.861436	10239567.69	6487.441672	1.29
>> testCompareLTMaskNotInt		ops/s	16721...
>
> erifan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add an IR rule for vector mask cast operation

@erifan Nice work on the benchmark refactor! And thanks for the other updates.

I'll run some testing now, should take about 24h.
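The rewrite in the quoted description relies on the identity not(a cond b) == (a ncond b). A scalar Java sketch (hypothetical class and method names; it models the per-lane logic, not the C2 IR or the Vector API) can check that identity for the integer conditions, including the unsigned ones via `Integer.compareUnsigned`, and shows why only eq and ne are listed for float and double: with NaN, negating an ordered comparison such as lt does not give ge.

```java
public class MaskNotSketch {
    // Comparison conditions named after the ones listed in the PR description.
    enum Cond { EQ, NE, LT, GE, GT, LE, ULT, UGE, UGT, ULE }

    // The "negated" condition ncond, such that cmp(ncond, a, b) == !cmp(cond, a, b).
    static Cond negate(Cond c) {
        return switch (c) {
            case EQ -> Cond.NE;   case NE -> Cond.EQ;
            case LT -> Cond.GE;   case GE -> Cond.LT;
            case GT -> Cond.LE;   case LE -> Cond.GT;
            case ULT -> Cond.UGE; case UGE -> Cond.ULT;
            case UGT -> Cond.ULE; case ULE -> Cond.UGT;
        };
    }

    // Scalar model of one int lane of a vector mask comparison.
    static boolean cmp(Cond c, int a, int b) {
        int u = Integer.compareUnsigned(a, b);
        return switch (c) {
            case EQ -> a == b;  case NE -> a != b;
            case LT -> a < b;   case GE -> a >= b;
            case GT -> a > b;   case LE -> a <= b;
            case ULT -> u < 0;  case UGE -> u >= 0;
            case UGT -> u > 0;  case ULE -> u <= 0;
        };
    }

    // Scalar model of one float lane; only the conditions needed here.
    static boolean cmpF(Cond c, float a, float b) {
        return switch (c) {
            case EQ -> a == b; case NE -> a != b;
            case LT -> a < b;  case GE -> a >= b;
            default -> throw new IllegalArgumentException("not modeled for floats: " + c);
        };
    }

    public static void main(String[] args) {
        // !(NaN < 1) is true, but (NaN >= 1) is false: the LT -> GE rewrite is unsafe.
        System.out.println(!cmpF(Cond.LT, Float.NaN, 1f) == cmpF(Cond.GE, Float.NaN, 1f)); // false
        // !(NaN == 1) and (NaN != 1) are both true: the EQ -> NE rewrite is safe.
        System.out.println(!cmpF(Cond.EQ, Float.NaN, 1f) == cmpF(Cond.NE, Float.NaN, 1f)); // true
    }
}
```

XOR-ing a mask with all ones flips every lane, so the rewrite is valid exactly when flipping each lane's result agrees with evaluating the negated condition on that lane, which is what the checks above exercise.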

-------------

PR Review: https://git.openjdk.org/jdk/pull/24674#pullrequestreview-3227664072

From epeter at openjdk.org  Tue Sep 16 06:31:34 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 06:31:34 GMT
Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v9]
In-Reply-To: <1ZCEMsPvSQaLGWRuNtO89LNP_XUeaz-edeIUrKwRCZY=.9dad5a02-c739-4e24-8692-8941f31e5a49@github.com>
References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
 <1ZCEMsPvSQaLGWRuNtO89LNP_XUeaz-edeIUrKwRCZY=.9dad5a02-c739-4e24-8692-8941f31e5a49@github.com>
Message-ID: <y2HwUNAB48NE1pgrVSszhEjaL0f3_0ZkjLyuWLTNBBQ=.6095e6ad-1e1a-4d88-ac04-7c6842b70857@github.com>

On Sun, 14 Sep 2025 14:44:02 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> This change improves the precision of the `Mod(I|L)Node::Value()` functions.
>> 
>> I reordered the structure a bit. First, we handle constants, afterwards, we handle ranges. The bottom checks seem to be excessive (`Type::BOTTOM` is covered by using `isa_(int|long)()`, the local bottom is just the full range). Given we can even give reasonable bounds if only one input has any bounds, we don't want to return early.
>> The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions.
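To illustrate why bounds are still useful when only one input is bounded, here is a minimal interval sketch in plain Java. The class and method names are hypothetical, and this is not the C2 implementation. It relies on Java's `%` semantics: the result has the sign of the dividend or is zero, its magnitude is strictly less than the divisor's, and it is no larger than the dividend's.

```java
/** Hypothetical interval sketch for x % y, with x in [xLo, xHi] and y in [yLo, yHi]. Not C2 code. */
final class ModRangeSketch {
    /** Returns {lo, hi}: a sound over-approximation of x % y over all non-zero y in range. */
    static long[] modRange(int xLo, int xHi, int yLo, int yHi) {
        if (yLo == 0 && yHi == 0) {
            // The divisor is always zero, so every execution throws; like the PR, use [0, 0].
            return new long[] {0, 0};
        }
        // Largest possible |y|; long arithmetic avoids overflow on |Integer.MIN_VALUE|.
        long maxAbsY = Math.max(Math.abs((long) yLo), Math.abs((long) yHi));
        long limit = maxAbsY - 1;                  // |x % y| <= |y| - 1
        // The result takes the sign of the dividend and satisfies |x % y| <= |x|.
        long lo = (xLo >= 0) ? 0 : Math.max(xLo, -limit);
        long hi = (xHi <= 0) ? 0 : Math.min(xHi, limit);
        return new long[] {lo, hi};
    }
}
```

Using `long` for the bounds sidesteps overflow at the `int` boundaries; C2 itself works on `TypeInt`/`TypeLong` rather than arrays.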
>> 
>> ### Monotonicity
>> 
>> Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. For example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. We therefore use `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range).
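The monotonicity argument can be reproduced with a tiny model in which the meet of two integer ranges is their hull (the smallest enclosing range), which is how C2 combines integer range types. `Range` and `MonotonicityDemo` are hypothetical names used only for illustration, and widening is ignored.

```java
/** Simplified model of an integer range type; meet() returns the enclosing hull. */
record Range(long lo, long hi) {
    Range meet(Range other) {
        return new Range(Math.min(lo, other.lo), Math.max(hi, other.hi));
    }
}

final class MonotonicityDemo {
    static final Range NON_NEGATIVE = new Range(0, Integer.MAX_VALUE); // like TypeInt::POS
    static final Range ZERO = new Range(0, 0);                          // like TypeInt::ZERO
    static final Range MOD_BY_3 = new Range(-2, 2);                     // x % 3 for unknown x

    public static void main(String[] args) {
        // Old behaviour: a 0 divisor gave ">= 0". Once the divisor becomes 3, CCP must
        // take the meet, which is [-2, MAX] rather than the tighter [-2, 2].
        System.out.println(NON_NEGATIVE.meet(MOD_BY_3)); // Range[lo=-2, hi=2147483647]
        // New behaviour: ZERO lies inside [-2, 2], so the meet is exactly [-2, 2].
        System.out.println(ZERO.meet(MOD_BY_3));         // Range[lo=-2, hi=2]
    }
}
```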
>> 
>> ### Testing
>> 
>> I added tests for cases around the relevant bounds. I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something).
>> 
>> Please review and let me know what you think.
>> 
>> ### Other
>> 
>> The `UMod(I|L)Node`s were adjusted to be more in line with their signed variants. This change diverges them again, but similar improvements could be made after #17508.
>> 
>> While experimenting with these changes, I stumbled upon a few things that aren't directly related to this change but might be worth looking into further:
>> - If the divisor is a constant, we directly replace the `Mod(I|L)Node` with more, but less expensive, nodes in `::Ideal()`. The type analysis of these nodes combined is less precise, which means we miss potential cases where a precise type would help, e.g., removing range checks. Would it make sense to delay the replacement?
>> - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd.
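For context on the first point: replacing `%` by a constant usually means computing the quotient with a multiply-high and a sign correction, then `x - q * d`. Below is a sketch for `d = 3` using the well-known magic constant `0x55555556` (Hacker's Delight). It shows the general technique and is not necessarily the exact node sequence C2 produces; each intermediate value is typed on its own, which is presumably why the combined result can end up with a wider type than `Mod(I|L)Node::Value` would compute.

```java
/** Sketch: signed x % 3 without a division instruction (Hacker's Delight technique). */
final class ModByConstant {
    private static final int MAGIC_3 = 0x55555556; // ceil(2^32 / 3)

    static int mod3(int x) {
        // High 32 bits of the signed 64-bit product: x / 3 for x >= 0, x / 3 - 1 for x < 0.
        int q = (int) (((long) MAGIC_3 * x) >> 32);
        // Add 1 for negative x so q matches Java's truncating division.
        q += x >>> 31;
        return x - q * 3;
    }
}
```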
>
> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
> 
>   remove unused parameter

Testing looks good. Minor changes should be ok, as long as GitHub Actions passes.

Thanks for all the work @SirYwell !

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/25254#pullrequestreview-3227698583

From epeter at openjdk.org  Tue Sep 16 06:49:20 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 06:49:20 GMT
Subject: RFR: 8367333: C2: Vector math operation intrinsification failure
In-Reply-To: <qpB4c9uXgeU4UbE9Zm_Pb-52QHm203ICO6IdT1w_bGQ=.be2804be-add6-4157-a335-dea1e0920e08@github.com>
References: <qpB4c9uXgeU4UbE9Zm_Pb-52QHm203ICO6IdT1w_bGQ=.be2804be-add6-4157-a335-dea1e0920e08@github.com>
Message-ID: <sZvkxuXnOiN1VKfG92NEmZl2f4g0xdTBwF62_lhWlZg=.c5c434d8-00fa-4b26-a127-dad00aea9fe6@github.com>

On Fri, 12 Sep 2025 19:14:18 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

> As part of [JDK-8353786](https://bugs.openjdk.org/browse/JDK-8353786), C2 support for operations backed by the vector math library was completely removed. On the JDK side, special dispatching logic was added to avoid intrinsic calls in `jdk.internal.vm.vector.VectorSupport`. However, it's still possible to observe such paradoxical situations (intrinsic calls with obsolete operation IDs) when processing effectively dead code.
> 
> Consider `FloatVector::lanewiseTemplate`:
> 
>     FloatVector lanewiseTemplate(VectorOperators.Unary op) {
>         if (opKind(op, VO_SPECIAL)) {
>             ...                             
>             else if (opKind(op, VO_MATHLIB)) {
>                 return unaryMathOp(op);
>             }
>         }
>         int opc = opCode(op);
>         return VectorSupport.unaryOp(opc, ...);
>     }
> 
> 
> At runtime, `unaryMathOp` is unconditionally invoked, but during compilation it's possible to end up with an intrinsification attempt of `VectorSupport.unaryOp()` before `opKind(op, VO_SPECIAL)` is inlined. 
> 
> It can be reliably reproduced with the `-XX:+StressIncrementalInlining` flag.
> 
> The fix is to make intrinsification fail fast rather than crash the VM.
> 
> Testing: tier1 - tier4

Changes requested by epeter (Reviewer).

test/hotspot/jtreg/compiler/vectorapi/TestVectorMathLib.java line 33:

> 31:  * @test
> 32:  * @bug 8367333
> 33:  * @requires vm.compiler2.enabled

Do you need this `@requires`? It might be nice to be able to run this with other compilers too.

test/hotspot/jtreg/compiler/vectorapi/TestVectorMathLib.java line 40:

> 38:  *                   -XX:CompileCommand=compileonly,compiler.vectorapi.TestVectorMathLib::test*
> 39:  *                   -XX:+StressIncrementalInlining
> 40:  *                       compiler.vectorapi.TestVectorMathLib

As @jatin-bhateja mentioned, the alignment is off.
I'd also like to see a run without flags, maybe with only `-XX:CompileCommand=compileonly,compiler.vectorapi.TestVectorMathLib::test*`

-------------

PR Review: https://git.openjdk.org/jdk/pull/27263#pullrequestreview-3227803098
PR Review Comment: https://git.openjdk.org/jdk/pull/27263#discussion_r2351063466
PR Review Comment: https://git.openjdk.org/jdk/pull/27263#discussion_r2351069465

From hgreule at openjdk.org  Tue Sep 16 06:50:28 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Tue, 16 Sep 2025 06:50:28 GMT
Subject: RFR: 8356813: Improve Mod(I|L)Node::Value
In-Reply-To: <aPUCL3Aqezo5Hc4vID-htuZ_M22G0-vQR_-u1ZLT0ds=.2feff632-6e64-47ac-a943-7f54847c9969@github.com>
References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
 <EOLqK3ulrKNtgzmlWbNpwvCdg8sBaABmXNGdlucIurI=.7ce09643-efd1-4e3f-91f1-6e8040f4a51f@github.com>
 <hScWI2VL-Cc2H-kQUfhd32fPCAkbXLHCUNh-2XZutsE=.2518bc77-83c4-4b85-af22-3230fe310130@github.com>
 <aPUCL3Aqezo5Hc4vID-htuZ_M22G0-vQR_-u1ZLT0ds=.2feff632-6e64-47ac-a943-7f54847c9969@github.com>
Message-ID: <yNzG0F73DhkS_mTaWyO7bKfPYuYlxinOtC9v5QBNBX0=.c7b924c3-73bf-49d8-8430-ebe8ff309adc@github.com>

On Fri, 12 Sep 2025 12:12:21 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> @merykitty thanks, I hopefully addressed your comments :)
>> 
>> @eme64 do you want to re-run the tests once again?
>
> @SirYwell Launching tests.

Thanks @eme64! Do I need another re-approval from @merykitty or are we ready to integrate?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25254#issuecomment-3295932705

From epeter at openjdk.org  Tue Sep 16 06:51:38 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 06:51:38 GMT
Subject: RFR: 8367313: CTW: Execute in AWT headless mode [v2]
In-Reply-To: <iybHTcZUmGuAWqzGCjVu1j9492b1ANCrc-IsWpf5qVs=.a1f74414-739f-4433-a1c6-653ec3c85e15@github.com>
References: <d8Vd-xRvi6T6oizZvB3M7abUjT8ifE1n157D24UXGmA=.9370efd0-cc85-4190-bf23-fb7ca62301dd@github.com>
 <iybHTcZUmGuAWqzGCjVu1j9492b1ANCrc-IsWpf5qVs=.a1f74414-739f-4433-a1c6-653ec3c85e15@github.com>
Message-ID: <Kuu7xiJHnurxjpmgBW6eSVqZpBCJ4p2kjUPVNoCn4_A=.76ab934b-7ba2-4d53-b66a-119c4f82fb6a@github.com>

On Mon, 15 Sep 2025 14:08:57 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> I have been doing CTW parallelization improvements, and noticed that some of the AWT clinits run and initialize graphics stack. This is awkward for a few reasons:
>> 
>>  1. We might be running in a headless environment, where these clinits could fail, shrinking the CTW testing scope.
>>  2. There are dependencies in graphics stack initialization that break: in one case in my parallelization tests, I saw the VM crash due to an uninitialized AWT lock, because the randomized CTW runner managed to execute clinits in an unusual order. Running in headless mode avoids that path altogether.
>> 
>> I think we should be running CTW tests in AWT headless mode to begin with. 
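A minimal sketch of what AWT headless mode means in practice, using the standard `java.awt.headless` system property. The PR's actual change to the CTW runner is not quoted here. Passing `-Djava.awt.headless=true` on the command line is the usual route; setting the property programmatically only takes effect if it happens before AWT first reads it. `HeadlessCheck` is a hypothetical name.

```java
import java.awt.Frame;
import java.awt.GraphicsEnvironment;
import java.awt.HeadlessException;

final class HeadlessCheck {
    /** Enables headless mode; must run before any AWT code queries the property. */
    static void enableHeadless() {
        System.setProperty("java.awt.headless", "true");
    }

    /** Returns true if creating a top-level window is rejected, as it is in headless mode. */
    static boolean windowCreationRejected() {
        try {
            new Frame("probe").dispose();
            return false;
        } catch (HeadlessException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        enableHeadless();
        System.out.println("headless = " + GraphicsEnvironment.isHeadless());
        System.out.println("window rejected = " + windowCreationRejected());
    }
}
```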
>> 
>> Additional testing:
>>  - [x] Linux x86_64 server fastdebug, `applications/ctw/modules`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8367313-ctw-headless-mode
>  - Fix

@TobiHartmann is on vacation. Maybe @vnkozlov ?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27187#issuecomment-3295941040

From epeter at openjdk.org  Tue Sep 16 06:54:35 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 06:54:35 GMT
Subject: RFR: 8367313: CTW: Execute in AWT headless mode [v2]
In-Reply-To: <iybHTcZUmGuAWqzGCjVu1j9492b1ANCrc-IsWpf5qVs=.a1f74414-739f-4433-a1c6-653ec3c85e15@github.com>
References: <d8Vd-xRvi6T6oizZvB3M7abUjT8ifE1n157D24UXGmA=.9370efd0-cc85-4190-bf23-fb7ca62301dd@github.com>
 <iybHTcZUmGuAWqzGCjVu1j9492b1ANCrc-IsWpf5qVs=.a1f74414-739f-4433-a1c6-653ec3c85e15@github.com>
Message-ID: <5OMky7jrDAsxrSC50xEfkVP1mFvoFQ0VB2trl46a7i8=.bda7d7f9-35b5-4553-b0ff-b26776cfef57@github.com>

On Mon, 15 Sep 2025 14:08:57 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> I have been doing CTW parallelization improvements, and noticed that some of the AWT clinits run and initialize graphics stack. This is awkward for a few reasons:
>> 
>>  1. We might be running in a headless environment, where these clinits could fail, shrinking the CTW testing scope.
>>  2. There are dependencies in graphics stack initialization that break: in one case in my parallelization tests, I saw the VM crash due to an uninitialized AWT lock, because the randomized CTW runner managed to execute clinits in an unusual order. Running in headless mode avoids that path altogether.
>> 
>> I think we should be running CTW tests in AWT headless mode to begin with. 
>> 
>> Additional testing:
>>  - [x] Linux x86_64 server fastdebug, `applications/ctw/modules`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8367313-ctw-headless-mode
>  - Fix

Looks reasonable. I'll run some internal testing, takes about 24h.

-------------

PR Review: https://git.openjdk.org/jdk/pull/27187#pullrequestreview-3227841960

From manc at openjdk.org  Tue Sep 16 06:57:16 2025
From: manc at openjdk.org (Man Cao)
Date: Tue, 16 Sep 2025 06:57:16 GMT
Subject: RFR: 8367613: Test compiler/runtime/TestDontCompileHugeMethods.java
 failed
Message-ID: <Ouu1htv7wGD_GqsyJhwk1-lH-0ix8UnduNiEBIhuFg0=.8f8393db-6357-4c7b-9a5b-620b3834e9b1@github.com>

Hi,

Could anyone approve this change, which excludes this test when running with `-Xcomp`? This avoids the test failure reported in [JDK-8367613](https://bugs.openjdk.org/browse/JDK-8367613).

For reasons I don't yet understand, the `HugeSwitch::shortMethod` method is not compiled under `-Xcomp -XX:TieredStopAtLevel=1`. The method gets compiled with either `-Xcomp` or `-XX:TieredStopAtLevel=1`, but not with both. I would appreciate it if anyone could provide insights into possible reasons.

-------------

Commit messages:
 - 8367613: Test compiler/runtime/TestDontCompileHugeMethods.java failed

Changes: https://git.openjdk.org/jdk/pull/27306/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27306&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367613
  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/27306.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27306/head:pull/27306

PR: https://git.openjdk.org/jdk/pull/27306

From chagedorn at openjdk.org  Tue Sep 16 07:09:03 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Tue, 16 Sep 2025 07:09:03 GMT
Subject: RFR: 8367657: C2 SuperWord: NormalMapping demo from JVMLS 2025
 [v3]
In-Reply-To: <Q5K0iEuMtU1WeixSMQh9WJ9DUWe-Ci5juCT_BzoO2HE=.ff056548-8238-48bd-9f2f-66e13c43dff8@github.com>
References: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
 <Q5K0iEuMtU1WeixSMQh9WJ9DUWe-Ci5juCT_BzoO2HE=.ff056548-8238-48bd-9f2f-66e13c43dff8@github.com>
Message-ID: <z9LUu3aMAonOnbA05xSNFk0xXh0HEMkiT_-Vv3YnIfI=.8c0c7838-17f7-4aa1-a415-c58aad2b7ddd@github.com>

On Mon, 15 Sep 2025 17:41:34 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Demo from here:
>> https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/
>> 
>> Cleaned up and enhanced with a JTREG and IR test.
>> I also added some additional "generated" normal maps from height functions.
>> And I display the resulting image side-by-side with the normal map.
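Normal maps generated from height functions can be sketched with central differences: the unnormalized normal of the surface z = h(x, y) is (-dh/dx, -dh/dy, 1). The `HeightFunction` shape mirrors the interface quoted later in this review; the step size `EPS` and the class name `NormalsFromHeight` are assumptions, not taken from the demo.

```java
/** Sketch: unit surface normals of z = h(x, y) via central differences. */
final class NormalsFromHeight {
    interface HeightFunction {
        double call(double x, double y);
    }

    static final double EPS = 1e-4; // finite-difference step (an assumption)

    /** Returns the unit normal {nx, ny, nz} of z = h(x, y) at (x, y). */
    static double[] normalAt(HeightFunction h, double x, double y) {
        double dhdx = (h.call(x + EPS, y) - h.call(x - EPS, y)) / (2 * EPS);
        double dhdy = (h.call(x, y + EPS) - h.call(x, y - EPS)) / (2 * EPS);
        double nx = -dhdx, ny = -dhdy, nz = 1.0;
        double len = Math.sqrt(nx * nx + ny * ny + nz * nz);
        return new double[] {nx / len, ny / len, nz / len};
    }

    public static void main(String[] args) {
        // The "hill" height function from the demo; at its top the normal should point straight up.
        HeightFunction hill = (x, y) -> 0.5 * Math.exp(-(x * x + y * y));
        double[] n = normalAt(hill, 0.0, 0.0);
        System.out.printf("%.3f %.3f %.3f%n", n[0], n[1], n[2]);
    }
}
```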
>> 
>> I decided to put it in a new directory `compiler.gallery`, anticipating other compiler tests that are both visually appealing (i.e. can be used for a "gallery") and that we may want to back up with other tests like IR testing.
>> 
>> There is a **stand-alone** way to run the demo:
>> `java test/hotspot/jtreg/compiler/gallery/NormalMapping.java`
>> (though it may only run with JDK 22+, probably due to some Amber features)
>> 
>> Here are some snapshots, but **I really recommend pulling the diff and playing with it; it looks much better in motion**:
>> <img width="2000" height="991" alt="image" src="https://github.com/user-attachments/assets/a693fac8-ecf0-43f2-914b-25f76c2f425d" />
>> <img width="2000" height="997" alt="image" src="https://github.com/user-attachments/assets/c2202e6b-6a90-4f90-a3ca-b73304e25905" />
>> <img width="1997" height="992" alt="image" src="https://github.com/user-attachments/assets/0d6da304-6bb9-4b25-9a7b-72019b02d95e" />
>> <img width="1992" height="994" alt="image" src="https://github.com/user-attachments/assets/9f5f7426-0678-45af-a3eb-ac092c262d4c" />
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fix inlining

Great work and thanks for sharing it! A few small suggestions, otherwise, it looks good to me!

test/hotspot/jtreg/compiler/gallery/NormalMapping.java line 40:

> 38: import java.awt.geom.Path2D;
> 39: import javax.swing.JPanel;
> 40: import java.awt.Font;

Some unused imports (double check again after removing):
Suggestion:

import java.awt.Graphics;
import java.awt.Graphics2D;
import java.awt.Color;
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;
import java.io.IOException;
import java.util.Random;
import javax.swing.JPanel;
import java.awt.Font;

test/hotspot/jtreg/compiler/gallery/NormalMapping.java line 88:

> 86:         System.out.println("Welcome to the Normal Mapping Demo!");
> 87:         // Create an applicateion state with 5 lights.
> 88:         State state = new State(5);

I suggest putting `5` into a named constant. This invites readers to play around with a different number of lights.

test/hotspot/jtreg/compiler/gallery/NormalMapping.java line 93:

> 91:         System.out.println("Setting up Window...");
> 92:         MyDrawingPanel panel = new MyDrawingPanel(state);
> 93:         JFrame frame = new JFrame("Normal Mapping Demo (Auto Vectorization)");

Suggestion:

        JFrame frame = new JFrame("Normal Mapping Demo (Auto-Vectorization)");

test/hotspot/jtreg/compiler/gallery/NormalMapping.java line 121:

> 119:     }
> 120: 
> 121:     public static File getLocalFile(String name) {

Isn't `name` always constant (i.e. `normal_map.png`)? Then you could also extract that to a constant and use it here directly.

test/hotspot/jtreg/compiler/gallery/NormalMapping.java line 149:

> 147:     }
> 148: 
> 149:     public static class Light {

Maybe add a quick comment describing what this class does, since it's a demo and readers might want to better understand what's going on. Same for the `State` class below.

test/hotspot/jtreg/compiler/gallery/NormalMapping.java line 170:

> 168:             dy *= 0.99;
> 169:             dx += RANDOM.nextFloat() * 0.001 - 0.0005;;
> 170:             dy += RANDOM.nextFloat() * 0.001 - 0.0005;;

Suggestion:

            dx += RANDOM.nextFloat() * 0.001 - 0.0005;
            dy += RANDOM.nextFloat() * 0.001 - 0.0005;

test/hotspot/jtreg/compiler/gallery/NormalMapping.java line 244:

> 242: 
> 243:         public void nextNormals() {
> 244:             switch(nextNormalsId) {

Suggestion:

            switch (nextNormalsId) {

test/hotspot/jtreg/compiler/gallery/NormalMapping.java line 299:

> 297:         interface HeightFunction {
> 298:             // x and y should be in [0..1]
> 299:             public double call(double x, double y);

`public` is implicit for interface methods:
Suggestion:

            double call(double x, double y);

test/hotspot/jtreg/compiler/gallery/NormalMapping.java line 310:

> 308: 
> 309:                 // A selection of "height functions":
> 310:                 return switch(name) {

Suggestion:

                return switch (name) {

test/hotspot/jtreg/compiler/gallery/NormalMapping.java line 314:

> 312:                     case "heart" -> {
> 313:                         double heart = Math.abs(Math.pow(x*x + y*y - 1, 3) - x*x * Math.pow(-y, 3));
> 314:                         double decay = Math.exp(-(x*x + y*y));

Suggestion:

                        double heart = Math.abs(Math.pow(x * x + y * y - 1, 3) - x * x * Math.pow(-y, 3));
                        double decay = Math.exp(-(x * x + y * y));

test/hotspot/jtreg/compiler/gallery/NormalMapping.java line 318:

> 316:                     }
> 317:                     case "hill" ->    0.5 * Math.exp(-(x*x + y*y));
> 318:                     case "ripple" ->  0.01 * Math.sin(x*x + y*y);

Suggestion:

                    case "hill" ->    0.5 * Math.exp(-(x * x + y * y));
                    case "ripple" ->  0.01 * Math.sin(x * x + y * y);

test/hotspot/jtreg/compiler/gallery/NormalMapping.java line 411:

> 409:             for (int i = 0; i < lights.length; i++) {
> 410:                 lights[i].update();
> 411:             }

As below, you could use enhanced-for:
Suggestion:

            for (Light light : lights) {
                light.update();
            }

test/hotspot/jtreg/compiler/gallery/NormalMapping.java line 417:

> 415:             Arrays.fill(outputArray, 0);
> 416: 
> 417:             // Add inn the contribution of each light

Suggestion:

            // Add in the contribution of each light

test/hotspot/jtreg/compiler/gallery/NormalMapping.java line 463:

> 461:                 float luminosity = Math.max(0, dotProduct / d3) * luminosityCorrection;
> 462: 
> 463:                 // Now we we compute the color values that hopefully end up in the range

Suggestion:

                // Now we compute the color values that hopefully end up in the range
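The luminosity formula quoted above (`Math.max(0, dotProduct / d3) * luminosityCorrection`) looks like a Lambertian point-light term. A minimal sketch, assuming `dotProduct` is the dot product of the pixel normal with the unnormalized vector to the light and `d3` is the cube of that vector's length (one factor normalizes the direction, the other two give inverse-square falloff). The class, parameters, and memory layout are hypothetical, not the demo's.

```java
/** Hypothetical sketch: add one point light's Lambertian contribution per pixel. */
final class LambertSketch {
    /**
     * nx, ny, nz: unit normals per pixel, row-major with width * height entries.
     * Pixels lie on the plane z = 0 at integer coordinates; the light is at (lx, ly, lz),
     * and lz > 0 is assumed so the distance is never zero.
     */
    static void addLight(float[] nx, float[] ny, float[] nz, float[] out,
                         int width, int height,
                         float lx, float ly, float lz, float intensity) {
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int i = y * width + x;
                float dx = lx - x;                        // vector from pixel to light
                float dy = ly - y;
                float dz = lz;
                float d2 = dx * dx + dy * dy + dz * dz;
                float d3 = d2 * (float) Math.sqrt(d2);    // |d|^3
                float dot = nx[i] * dx + ny[i] * dy + nz[i] * dz;
                // dot / |d|^3 = cos(angle) / |d|^2; clamp to 0 for pixels facing away.
                out[i] += Math.max(0f, dot / d3) * intensity;
            }
        }
    }
}
```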

test/hotspot/jtreg/compiler/gallery/NormalMapping.java line 480:

> 478: 
> 479:         // This is a bit of a horrible hack, but it mostly works.
> 480:         // Essencially, it tries to solve the "exposure" problem:

Suggestion:

        // Essentially, it tries to solve the "exposure" problem:

test/hotspot/jtreg/compiler/gallery/TestNormalMapping.java line 29:

> 27:  * @summary Visual example of auto vectorization: normal mapping.
> 28:  * @library /test/lib /
> 29:  * @run main compiler.gallery.TestNormalMapping ir

This should be `driver` because otherwise we will be stressing the driver VM when the test is run with `-Xcomp` etc.
Suggestion:

 * @run driver compiler.gallery.TestNormalMapping ir

-------------

PR Review: https://git.openjdk.org/jdk/pull/27282#pullrequestreview-3227811932
PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2351083668
PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2351084533
PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2351069089
PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2351078912
PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2351088352
PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2351085454
PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2351093619
PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2351098053
PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2351098915
PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2351101451
PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2351102209
PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2351110603
PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2351104989
PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2351112255
PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2351112981
PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2351131694

From epeter at openjdk.org  Tue Sep 16 07:09:58 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 07:09:58 GMT
Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of
 vector compress [v2]
In-Reply-To: <NsYuK9-Y_-7KzBniFLpkNeiLslPm-a83mE_GMvMN7oo=.109f1b82-42b1-4f09-b8af-99bc2a9f2528@github.com>
References: <tP4CE7n1GaoLjhB6BfpKBNaX_GCTwI-cJjUbanwm4Qg=.eb29b9ce-10b0-4d5c-8407-25deda9c3f6d@github.com>
 <NsYuK9-Y_-7KzBniFLpkNeiLslPm-a83mE_GMvMN7oo=.109f1b82-42b1-4f09-b8af-99bc2a9f2528@github.com>
Message-ID: <EsiYouuvqjFpbUVOKPtUBymx12t--iEc7QNUwBrdDJo=.545aa38b-8933-44a7-9ae5-51872308596c@github.com>

On Mon, 15 Sep 2025 09:58:20 GMT, erifan <duke at openjdk.org> wrote:

>> The AArch64 SVE and SVE2 architectures lack an instruction suitable for subword-type `compress` operations. Therefore, the current implementation uses the 32-bit SVE `compact` instruction to compress subword types by first widening the high and low parts to 32 bits, compressing them, and then narrowing them back to their original type. Finally, the high and low parts are merged using the `index + tbl` instructions.
>> 
>> This approach is significantly slower compared to architectures with native support. After evaluating all available AArch64 SVE instructions and experimenting with various implementations (such as looping over the active elements, extraction, and insertion), I confirmed that the existing algorithm is optimal given the instruction set. However, there is still room for optimization in the following two aspects:
>> 1. Merging with `index + tbl` is suboptimal due to the high latency of the `index` instruction.
>> 2. For partial subword types, operations to the highest half are unnecessary because those bits are invalid.
>> 
>> This pull request introduces the following changes:
>> 1. Replaces `index + tbl` with the `whilelt + splice` instructions, which offer lower latency and higher throughput.
>> 2. Eliminates unnecessary compress operations for partial subword type cases.
>> 3. For `sve_compress_byte`, one less temporary register is used to alleviate potential register pressure.
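The `whilelt + splice` merge can be modeled in scalar Java to see why it is correct: `whilelt` activates the first `lowCount` lanes, so `splice` keeps exactly the compressed low half and fills the remaining lanes from the compressed high half. This is a hypothetical lane-level model only (`CompressModel` and its methods are illustrative names); the widening to 32-bit lanes and narrowing back, which the real implementation needs because `compact` does not support byte or halfword elements, are omitted.

```java
/** Scalar model of subword compress: compress each half, then merge with whilelt + splice. */
final class CompressModel {
    /** Like SVE COMPACT on src[from, to): active elements moved to the lowest lanes, rest zero. */
    static int[] compact(int[] src, boolean[] mask, int from, int to, int lanes) {
        int[] dst = new int[lanes];
        int n = 0;
        for (int i = from; i < to; i++) {
            if (mask[i]) dst[n++] = src[i];
        }
        return dst;
    }

    /** Like SVE WHILELT starting at 0: the first n lanes are active. */
    static boolean[] whilelt(int n, int lanes) {
        boolean[] p = new boolean[lanes];
        for (int i = 0; i < n && i < lanes; i++) p[i] = true;
        return p;
    }

    /** Like SVE SPLICE: the active segment of a, followed by the lowest lanes of b. */
    static int[] splice(boolean[] pg, int[] a, int[] b) {
        int first = -1, last = -1;
        for (int i = 0; i < pg.length; i++) {
            if (pg[i]) {
                if (first < 0) first = i;
                last = i;
            }
        }
        int[] dst = new int[a.length];
        int n = 0;
        if (first >= 0) {
            for (int i = first; i <= last; i++) dst[n++] = a[i];
        }
        for (int j = 0; n < dst.length; j++) dst[n++] = b[j];
        return dst;
    }

    static int[] compress(int[] src, boolean[] mask) {
        int lanes = src.length, half = lanes / 2;
        int[] low = compact(src, mask, 0, half, lanes);       // compressed low half
        int[] high = compact(src, mask, half, lanes, lanes);  // compressed high half
        int lowCount = 0;
        for (int i = 0; i < half; i++) if (mask[i]) lowCount++;
        return splice(whilelt(lowCount, lanes), low, high);
    }
}
```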
>> 
>> Benchmark results demonstrate that these changes significantly improve performance.
>> 
>> Benchmarks on Nvidia Grace machine with 128-bit SVE:
>> 
>> Benchmark                Unit    Before   Error  After    Error  Uplift
>> Byte128Vector.compress   ops/ms  4846.97  26.23  6638.56  31.60  1.36
>> Byte64Vector.compress    ops/ms  2447.69  12.95  7167.68  34.49  2.92
>> Short128Vector.compress  ops/ms  7174.88  40.94  8398.45   9.48  1.17
>> Short64Vector.compress   ops/ms  3618.72   3.04  8618.22  10.91  2.38
>> 
>> 
>> This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, and all tests passed.
>
> erifan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
> 
>  - Merge branch 'master' into JDK-8366333-compress
>  - 8366333: AArch64: Enhance SVE subword type implementation of vector compress
>    
>    The AArch64 SVE and SVE2 architectures lack an instruction suitable for
>    subword-type `compress` operations. Therefore, the current implementation
>    uses the 32-bit SVE `compact` instruction to compress subword types by
>    first widening the high and low parts to 32 bits, compressing them, and
>    then narrowing them back to their original type. Finally, the high and
>    low parts are merged using the `index + tbl` instructions.
>    
>    This approach is significantly slower compared to architectures with native
>    support. After evaluating all available AArch64 SVE instructions and
>    experimenting with various implementations (such as looping over the active
>    elements, extraction, and insertion), I confirmed that the existing algorithm
>    is optimal given the instruction set. However, there is still room for
>    optimization in the following two aspects:
>    1. Merging with `index + tbl` is suboptimal due to the high latency of
>    the `index` instruction.
>    2. For partial subword types, operations to the highest half are unnecessary
>    because those bits are invalid.
>    
>    This pull request introduces the following changes:
>    1. Replaces `index + tbl` with the `whilelt + splice` instructions, which
>    offer lower latency and higher throughput.
>    2. Eliminates unnecessary compress operations for partial subword type cases.
>    3. For `sve_compress_byte`, one less temporary register is used to alleviate
>    potential register pressure.
>    
>    Benchmark results demonstrate that these changes significantly improve performance.
>    
>    Benchmarks on Nvidia Grace machine with 128-bit SVE:
>    ```
>    Benchmark	        Unit	Before	 Error	After	 Error	Uplift
>    Byte128Vector.compress	ops/ms	4846.97	 26.23	6638.56	 31.60	1.36
>    Byte64Vector.compress	ops/ms	2447.69	 12.95	7167.68	 34.49	2.92
>    Short128Vector.compress	ops/ms	7174.88	 40.94	8398.45	 9.48	1.17
>    Short64Vector.compress	ops/ms	3618.72	 3.04	8618.22	 10.91	2.38
>    ```
>    
>    This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments,
>    and all tests passed.

Drive-by comments. I'm going on vacation soon, so don't depend on me fully reviewing this any time soon ;)

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2287:

> 2285:   sve_compress_short(dst, vtmp1, ptmp, vtmp2, vtmp3, pgtmp, extended_size > MaxVectorSize ? MaxVectorSize : extended_size);
> 2286:   // Narrow the result back to type BYTE.
> 2287:   // dst   = 0 0 0 0 0 0 0 0 0 0 0 0 0 g c a

Can you make sure that your examples are all nicely aligned?

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2315:

> 2313:   // Combine the compressed low with the compressed high.
> 2314:   // dst   = 0 0 0 0 0 0 0 0 0 0 0 p i g c a
> 2315:   sve_splice(dst, B, ptmp, vtmp1);

Alignment of examples would be nice

test/hotspot/jtreg/compiler/vectorapi/VectorCompressTest.java line 36:

> 34:  * @key randomness
> 35:  * @library /test/lib /
> 36:  * @summary AArch64: Enhance SVE subword type implementation of vector compress

I would change the summary to something a bit more generic, since the test is not only good for aarch64 / SVE.
Suggestion:

 * @summary IR test for VectorAPI compress

test/hotspot/jtreg/compiler/vectorapi/VectorCompressTest.java line 214:

> 212: 
> 213:     @Test
> 214:     @IR(counts = { IRNode.COMPRESS_VD, "= 1" }, applyIfCPUFeature = { "sve", "true" })

Could you please change this so that the `applyIfCPUFeature` is on a new line?
That would make it easier to add more platforms later :)

test/hotspot/jtreg/compiler/vectorapi/VectorCompressTest.java line 228:

> 226:                      .start();
> 227:     }
> 228: }

Question: is there already another test that checks `compress`?

-------------

PR Review: https://git.openjdk.org/jdk/pull/27188#pullrequestreview-3227854355
PR Review Comment: https://git.openjdk.org/jdk/pull/27188#discussion_r2351095704
PR Review Comment: https://git.openjdk.org/jdk/pull/27188#discussion_r2351097303
PR Review Comment: https://git.openjdk.org/jdk/pull/27188#discussion_r2351125031
PR Review Comment: https://git.openjdk.org/jdk/pull/27188#discussion_r2351129802
PR Review Comment: https://git.openjdk.org/jdk/pull/27188#discussion_r2351138273

From epeter at openjdk.org  Tue Sep 16 07:18:37 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 07:18:37 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v8]
In-Reply-To: <HPgkmQwoaMXSWdXiMXkbqSoMnI13yPpPBJSPxZKTxnc=.f978ac37-462f-496e-b5ec-bf3005cb7e5a@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <HPgkmQwoaMXSWdXiMXkbqSoMnI13yPpPBJSPxZKTxnc=.f978ac37-462f-496e-b5ec-bf3005cb7e5a@github.com>
Message-ID: <Q8CV6qZKlcjxQTAp6SCDaQPs-JGWLxgUh0nYz0vdKA0=.4f22adc9-41b7-4b7c-b95a-9ebd7f283a60@github.com>

On Mon, 15 Sep 2025 08:20:35 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> This patch optimizes PopCount value transforms using KnownBits information.
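The known-bits reasoning can be sketched as follows: if `zeros` marks bits known to be 0 and `ones` marks bits known to be 1, then `Integer.bitCount(x)` must lie in `[bitCount(ones), 32 - bitCount(zeros)]`. `PopCountBounds` and its mask representation are assumptions for illustration, not C2's `KnownBits` API.

```java
/** Sketch: bounds for Integer.bitCount(x) given known-zero and known-one bit masks. */
final class PopCountBounds {
    /** Returns {min, max}; requires (zeros & ones) == 0, i.e. consistent known bits. */
    static int[] bounds(int zeros, int ones) {
        if ((zeros & ones) != 0) throw new IllegalArgumentException("contradictory known bits");
        int min = Integer.bitCount(ones);        // every known-one bit is counted
        int max = 32 - Integer.bitCount(zeros);  // known-zero bits are never counted
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        // x = (y & 0xF0) | 0x1: bits outside 0xF1 are 0, bit 0 is 1, bits 4..7 are unknown.
        int[] b = bounds(~0xF1, 0x1);
        System.out.println(b[0] + ".." + b[1]); // 1..5
    }
}
```

When `min == max`, the popcount is a constant and the node could be folded.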
>> The following are the results of the microbenchmark included with the patch:
>> 
>> 
>> 
>> System: 13th Gen Intel(R) Core(TM) i3-1315U
>> 
>> Baseline:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s
>> 
>> With optimization:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s
>> 
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Extending the random ranges

Changes requested by epeter (Reviewer).

test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java line 46:

> 44:     static final int SIZE = 4096;
> 45: 
> 46:     static int rand_numI = G.uniformInts(Integer.MIN_VALUE, Integer.MAX_VALUE).next();

Why not just take `G.ints().next()`?

test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java line 56:

> 54:     static final long rand_bndL2 = G.uniformLongs(-0xFFFFFFL, 0xFFFFFF).next();
> 55:     static final long rand_popcL1 = G.uniformLongs(0, 4).next();
> 56:     static final long rand_popcL2 = G.uniformLongs(0, 32).next();

Can you please add some code comments explaining why you are doing the following:
- Only a uniform distribution. Is that needed? Generators produce special values more often for a good reason: they create interesting edge cases, especially for bit operations like these.
- Why are you restricting the ranges? There could always be surprises outside the ranges you pick, and it would be a shame not to generate those, unless you are absolutely sure they are not needed. Another valid reason to restrict the ranges would be if extending them made the interesting cases too improbable.
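To illustrate the point about special values, here is a hypothetical helper (not the `Generators` API used in the test) that mixes uniformly random ints with edge-case bit patterns such as single set bits and low-bit masks. The mix ratios and the `SPECIALS` list are arbitrary choices for the sketch.

```java
import java.util.Random;

/** Hypothetical generator mixing uniform ints with edge cases that matter for bit operations. */
final class IntsWithSpecials {
    private static final int[] SPECIALS = {
        0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE,
        Integer.MIN_VALUE + 1, Integer.MAX_VALUE - 1, 0x55555555, 0xAAAAAAAA
    };

    private final Random random;

    IntsWithSpecials(long seed) {
        this.random = new Random(seed);
    }

    int next() {
        int choice = random.nextInt(4);
        if (choice == 0) {
            return SPECIALS[random.nextInt(SPECIALS.length)]; // a fixed edge case
        } else if (choice == 1) {
            return 1 << random.nextInt(32);                    // a single set bit
        } else if (choice == 2) {
            return (1 << random.nextInt(32)) - 1;              // a low-bit mask such as 0xFF
        }
        return random.nextInt();                               // uniform over all ints
    }
}
```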

-------------

PR Review: https://git.openjdk.org/jdk/pull/27075#pullrequestreview-3227945975
PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2351152244
PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2351166568

From epeter at openjdk.org  Tue Sep 16 07:22:36 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 07:22:36 GMT
Subject: RFR: 8356813: Improve Mod(I|L)Node::Value
In-Reply-To: <yNzG0F73DhkS_mTaWyO7bKfPYuYlxinOtC9v5QBNBX0=.c7b924c3-73bf-49d8-8430-ebe8ff309adc@github.com>
References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
 <EOLqK3ulrKNtgzmlWbNpwvCdg8sBaABmXNGdlucIurI=.7ce09643-efd1-4e3f-91f1-6e8040f4a51f@github.com>
 <hScWI2VL-Cc2H-kQUfhd32fPCAkbXLHCUNh-2XZutsE=.2518bc77-83c4-4b85-af22-3230fe310130@github.com>
 <aPUCL3Aqezo5Hc4vID-htuZ_M22G0-vQR_-u1ZLT0ds=.2feff632-6e64-47ac-a943-7f54847c9969@github.com>
 <yNzG0F73DhkS_mTaWyO7bKfPYuYlxinOtC9v5QBNBX0=.c7b924c3-73bf-49d8-8430-ebe8ff309adc@github.com>
Message-ID: <T345sRDLncndmjsLh_NEvBXRtJyTfxFeAjmQwICk_yQ=.cd444d29-d85d-4c0d-9382-981124e6bf3f@github.com>

On Tue, 16 Sep 2025 06:47:43 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> @SirYwell Launching tests
>
> Thanks @eme64! Do I need another re-approval from @merykitty or are we ready to integrate?

@SirYwell @merykitty Let's give him 24h. If he does not respond, you can integrate in my opinion.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25254#issuecomment-3296167247

From epeter at openjdk.org  Tue Sep 16 07:33:22 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 07:33:22 GMT
Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI
 expand operation
In-Reply-To: <YxrR371gpyMeJEbioD-Bupej5RSArZgKCKP1BuQLMQQ=.ae9cdec7-59e4-4969-821c-7e8c0bcefbdf@github.com>
References: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
 <tmEj88Ez4DURxmS7pPm8t1lhRct4fHlQcBySljEu-tg=.a18e80cf-7711-4ac6-990f-4c630b90f98b@github.com>
 <YxrR371gpyMeJEbioD-Bupej5RSArZgKCKP1BuQLMQQ=.ae9cdec7-59e4-4969-821c-7e8c0bcefbdf@github.com>
Message-ID: <zQvZZ7A_01Czb0l17glhs7kNeQwQfzNiWFyovr0MTf8=.876dd110-5222-465e-b1e6-99e5e2f41fe9@github.com>

On Tue, 9 Sep 2025 09:26:31 GMT, erifan <duke at openjdk.org> wrote:

>> The algorithm description here is great. Please paste all of it from "Since there are" to "but with different instructions where appropriate." into this PR, before the vector expand implementation.
>
> @theRealAph @e1iu could you help take another look at this PR? Thanks!

@erifan I'm seeing `gtest/GTestWrapper.java` fail on `aarch64` machines.

Looks like this:

[ RUN      ] AssemblerAArch64.validate_vm
[12.324s][warning][os] Loading hsdis library failed
.../test/hotspot/gtest/aarch64/test_assembler_aarch64.cpp:49: Failure
Expected equality of these values:
  insns[i]
    Which is: 335545527
  insns1[i]
    Which is: 335545526
Ours:

Loading hsdis library failed, undisassembled code is shown in MachCode section
[MachCode]
  0x0000ffff97c20548: b604 0014 
[/MachCode]
Theirs:

Loading hsdis library failed, undisassembled code is shown in MachCode section
[MachCode]
  0x0000ffff853eb0a8: b704 0014 
[/MachCode]

Could this be related?
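As a hedged aid to reading the failure above: the two differing words (also visible little-endian in the MachCode dumps as `b704 0014` and `b604 0014`) both appear to decode as AArch64 `B` (unconditional branch) instructions whose branch offsets differ by one instruction (4 bytes). The sketch below decodes them; `BranchDecode` and its methods are hypothetical names, and whether this is caused by the expand change is not established here.

```java
// Decodes the two mismatching 32-bit words from the gtest output,
// assuming both are AArch64 B (unconditional branch) instructions.
public class BranchDecode {
    // B encodes as: bits[31:26] = 0b000101, bits[25:0] = imm26 (offset in words).
    public static boolean isB(int insn) {
        return (insn >>> 26) == 0b000101;
    }

    // Sign-extend imm26 and scale by 4 to get the byte offset.
    public static long byteOffset(int insn) {
        int imm26 = insn & 0x03FF_FFFF;
        int signExtended = (imm26 << 6) >> 6;
        return (long) signExtended * 4;
    }

    public static void main(String[] args) {
        int insns = 335545527;  // insns[i]  = 0x140004B7
        int insns1 = 335545526; // insns1[i] = 0x140004B6
        System.out.printf("0x%08X B=%b offset=%d%n", insns, isB(insns), byteOffset(insns));
        System.out.printf("0x%08X B=%b offset=%d%n", insns1, isB(insns1), byteOffset(insns1));
    }
}
```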

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26740#issuecomment-3296237445

From epeter at openjdk.org  Tue Sep 16 07:41:47 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 07:41:47 GMT
Subject: RFR: 8367657: C2 SuperWord: NormalMapping demo from JVMLS 2025
 [v4]
In-Reply-To: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
References: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
Message-ID: <ZoxXKto9CzFuY2iQfiKzWD2i0NRAJIchjQtM9YliUn8=.7e4b2d41-ba24-4b06-91cb-d46b30e7aa16@github.com>

> Demo from here:
> https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/
> 
> Cleaned up and enhanced with a JTREG and IR test.
> I also added some additional "generated" normal maps from height functions.
> And I display the resulting image side-by-side with the normal map.
> 
> I decided to put it in a new directory `compiler.gallery`, anticipating other compiler tests that are both visually appealing (i.e. can be used for a "gallery") and that we may want to back up with other tests like IR testing.
> 
> There is a **stand-alone** way to run the demo:
> `java test/hotspot/jtreg/compiler/gallery/NormalMapping.java`
> (though it may only run with JDK22+, probably due to some Amber features)
> 
> Here are some snapshots, but **I really recommend pulling the diff and playing with it; it looks much better in motion**:
> <img width="2000" height="991" alt="image" src="https://github.com/user-attachments/assets/a693fac8-ecf0-43f2-914b-25f76c2f425d" />
> <img width="2000" height="997" alt="image" src="https://github.com/user-attachments/assets/c2202e6b-6a90-4f90-a3ca-b73304e25905" />
> <img width="1997" height="992" alt="image" src="https://github.com/user-attachments/assets/0d6da304-6bb9-4b25-9a7b-72019b02d95e" />
> <img width="1992" height="994" alt="image" src="https://github.com/user-attachments/assets/9f5f7426-0678-45af-a3eb-ac092c262d4c" />

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  Apply suggestions from code review
  
  Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27282/files
  - new: https://git.openjdk.org/jdk/pull/27282/files/47aa0c7d..00416267

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27282&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27282&range=02-03

  Stats: 19 lines in 2 files changed: 0 ins; 3 del; 16 mod
  Patch: https://git.openjdk.org/jdk/pull/27282.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27282/head:pull/27282

PR: https://git.openjdk.org/jdk/pull/27282

From epeter at openjdk.org  Tue Sep 16 07:41:50 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 07:41:50 GMT
Subject: RFR: 8367657: C2 SuperWord: NormalMapping demo from JVMLS 2025
 [v3]
In-Reply-To: <z9LUu3aMAonOnbA05xSNFk0xXh0HEMkiT_-Vv3YnIfI=.8c0c7838-17f7-4aa1-a415-c58aad2b7ddd@github.com>
References: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
 <Q5K0iEuMtU1WeixSMQh9WJ9DUWe-Ci5juCT_BzoO2HE=.ff056548-8238-48bd-9f2f-66e13c43dff8@github.com>
 <z9LUu3aMAonOnbA05xSNFk0xXh0HEMkiT_-Vv3YnIfI=.8c0c7838-17f7-4aa1-a415-c58aad2b7ddd@github.com>
Message-ID: <9WNJ1t-LUT2EmkBkPRLkQOjep3EkiMHFWovt9VnJUmA=.8da3417f-5781-4823-a2f5-35392cd2f8df@github.com>

On Tue, 16 Sep 2025 06:49:08 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   fix inlining
>
> test/hotspot/jtreg/compiler/gallery/NormalMapping.java line 121:
> 
>> 119:     }
>> 120: 
>> 121:     public static File getLocalFile(String name) {
> 
> Isn't `name` always constant (i.e. `normal_map.png`)? Then you could also extract that to a constant and use it here directly.

I would like to allow the user to add their own images. I used to have multiple, but the file sizes are a bit of an issue.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2351248006

From epeter at openjdk.org  Tue Sep 16 07:42:42 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 07:42:42 GMT
Subject: RFR: 8367657: C2 SuperWord: NormalMapping demo from JVMLS 2025
 [v3]
In-Reply-To: <z9LUu3aMAonOnbA05xSNFk0xXh0HEMkiT_-Vv3YnIfI=.8c0c7838-17f7-4aa1-a415-c58aad2b7ddd@github.com>
References: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
 <Q5K0iEuMtU1WeixSMQh9WJ9DUWe-Ci5juCT_BzoO2HE=.ff056548-8238-48bd-9f2f-66e13c43dff8@github.com>
 <z9LUu3aMAonOnbA05xSNFk0xXh0HEMkiT_-Vv3YnIfI=.8c0c7838-17f7-4aa1-a415-c58aad2b7ddd@github.com>
Message-ID: <RnCvwXoWKtykMroBM1tksgHLzfjAJq8llk5D94dZDZk=.474e33d5-1385-4041-b553-80a7e7797c5e@github.com>

On Tue, 16 Sep 2025 06:50:50 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   fix inlining
>
> test/hotspot/jtreg/compiler/gallery/NormalMapping.java line 88:
> 
>> 86:         System.out.println("Welcome to the Normal Mapping Demo!");
>> 87:         // Create an application state with 5 lights.
>> 88:         State state = new State(5);
> 
> I suggest putting `5` into a named constant. This invites playing around with different numbers of lights.

Nice idea!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2351262820

From epeter at openjdk.org  Tue Sep 16 07:45:51 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 07:45:51 GMT
Subject: RFR: 8367657: C2 SuperWord: NormalMapping demo from JVMLS 2025
 [v3]
In-Reply-To: <z9LUu3aMAonOnbA05xSNFk0xXh0HEMkiT_-Vv3YnIfI=.8c0c7838-17f7-4aa1-a415-c58aad2b7ddd@github.com>
References: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
 <Q5K0iEuMtU1WeixSMQh9WJ9DUWe-Ci5juCT_BzoO2HE=.ff056548-8238-48bd-9f2f-66e13c43dff8@github.com>
 <z9LUu3aMAonOnbA05xSNFk0xXh0HEMkiT_-Vv3YnIfI=.8c0c7838-17f7-4aa1-a415-c58aad2b7ddd@github.com>
Message-ID: <SXy1lelPI1v-JAgd3aP5m6i3QVN6li_AVDPLEGL9RZA=.3145bb7f-ae14-4dc7-b49f-69d08b55564c@github.com>

On Tue, 16 Sep 2025 06:51:59 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   fix inlining
>
> test/hotspot/jtreg/compiler/gallery/NormalMapping.java line 149:
> 
>> 147:     }
>> 148: 
>> 149:     public static class Light {
> 
> Maybe add a quick comment explaining what this class does, since it's a demo and one might want to better understand what's going on. Same for the `State` class below.

good idea!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2351272790

From chagedorn at openjdk.org  Tue Sep 16 07:51:42 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Tue, 16 Sep 2025 07:51:42 GMT
Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in
 SuperWord truncation: CastII [v2]
In-Reply-To: <V977QzHH4oel8SJt9d3kG1HFtrscXKdCAAIYx0CqCzI=.22e0de4a-1a32-4ce9-b820-6d1247e0d4a4@github.com>
References: <XKOkG-MuA1n1Cy1qrBXCPBBx9RLFjD4iMk-oWNKfSPM=.42d2c171-b4b0-4457-b993-014a3cdfe656@github.com>
 <V977QzHH4oel8SJt9d3kG1HFtrscXKdCAAIYx0CqCzI=.22e0de4a-1a32-4ce9-b820-6d1247e0d4a4@github.com>
Message-ID: <vcVmO3zw1cEkEGwFmSIuh11_gQ1gHez02MhVNQvt79o=.943407c5-d6bc-4ec8-acc0-c2002824fd00@github.com>

On Thu, 21 Aug 2025 15:21:48 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

>> Hi all,
>> This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks!
>
> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update comment for constraint casts

The fix looks good to me, too! I only have one comment about the test.

test/hotspot/jtreg/compiler/vectorization/TestSubwordTruncation.java line 431:

> 429:     }
> 430: 
> 431:     @Test(compLevel = CompLevel.C2)

Any particular reason you've chosen `C2` here and not let the IR framework handle it? (by default it's `ANY` which will compile at the highest available tier). I'm also wondering if this test would fail if someone ran the test with a build without C2.

-------------

PR Review: https://git.openjdk.org/jdk/pull/26827#pullrequestreview-3228138075
PR Review Comment: https://git.openjdk.org/jdk/pull/26827#discussion_r2351288090

From epeter at openjdk.org  Tue Sep 16 07:57:38 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 07:57:38 GMT
Subject: RFR: 8367657: C2 SuperWord: NormalMapping demo from JVMLS 2025
 [v5]
In-Reply-To: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
References: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
Message-ID: <2gGUfvVlIaLGOd5iJUN3-oi9jlytrkULE3WZRUX1x78=.c0da1562-8cac-4215-9ae4-5cb248c89c0b@github.com>

> Demo from here:
> https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/
> 
> Cleaned up and enhanced with a JTREG and IR test.
> I also added some additional "generated" normal maps from height functions.
> And I display the resulting image side-by-side with the normal map.
> 
> I decided to put it in a new directory `compiler.gallery`, anticipating other compiler tests that are both visually appealing (i.e. can be used for a "gallery") and that we may want to back up with other tests like IR testing.
> 
> There is a **stand-alone** way to run the demo:
> `java test/hotspot/jtreg/compiler/gallery/NormalMapping.java`
> (though it may only run with JDK22+, probably due to some Amber features)
> 
> Here are some snapshots, but **I really recommend pulling the diff and playing with it; it looks much better in motion**:
> <img width="2000" height="991" alt="image" src="https://github.com/user-attachments/assets/a693fac8-ecf0-43f2-914b-25f76c2f425d" />
> <img width="2000" height="997" alt="image" src="https://github.com/user-attachments/assets/c2202e6b-6a90-4f90-a3ca-b73304e25905" />
> <img width="1997" height="992" alt="image" src="https://github.com/user-attachments/assets/0d6da304-6bb9-4b25-9a7b-72019b02d95e" />
> <img width="1992" height="994" alt="image" src="https://github.com/user-attachments/assets/9f5f7426-0678-45af-a3eb-ac092c262d4c" />

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  more for Christian

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27282/files
  - new: https://git.openjdk.org/jdk/pull/27282/files/00416267..806c9379

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27282&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27282&range=03-04

  Stats: 20 lines in 1 file changed: 16 ins; 3 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27282.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27282/head:pull/27282

PR: https://git.openjdk.org/jdk/pull/27282

From epeter at openjdk.org  Tue Sep 16 08:00:47 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 08:00:47 GMT
Subject: RFR: 8367657: C2 SuperWord: NormalMapping demo from JVMLS 2025
 [v3]
In-Reply-To: <z9LUu3aMAonOnbA05xSNFk0xXh0HEMkiT_-Vv3YnIfI=.8c0c7838-17f7-4aa1-a415-c58aad2b7ddd@github.com>
References: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
 <Q5K0iEuMtU1WeixSMQh9WJ9DUWe-Ci5juCT_BzoO2HE=.ff056548-8238-48bd-9f2f-66e13c43dff8@github.com>
 <z9LUu3aMAonOnbA05xSNFk0xXh0HEMkiT_-Vv3YnIfI=.8c0c7838-17f7-4aa1-a415-c58aad2b7ddd@github.com>
Message-ID: <nniKuA71IV-DSr2lVJ_KxicQzW49kaBs4AGoqRcSh9Y=.fa494749-46f7-4ca5-9feb-e5d50dfbf641@github.com>

On Tue, 16 Sep 2025 07:05:14 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   fix inlining
>
> Great work and thanks for sharing it! A few small suggestions, otherwise, it looks good to me!

@chhagedorn Thanks a lot for reviewing! I addressed all your suggestions / comments.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27282#issuecomment-3296399618

From chagedorn at openjdk.org  Tue Sep 16 08:13:10 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Tue, 16 Sep 2025 08:13:10 GMT
Subject: RFR: 8367613: Test
 compiler/runtime/TestDontCompileHugeMethods.java failed
In-Reply-To: <Ouu1htv7wGD_GqsyJhwk1-lH-0ix8UnduNiEBIhuFg0=.8f8393db-6357-4c7b-9a5b-620b3834e9b1@github.com>
References: <Ouu1htv7wGD_GqsyJhwk1-lH-0ix8UnduNiEBIhuFg0=.8f8393db-6357-4c7b-9a5b-620b3834e9b1@github.com>
Message-ID: <viR_qPrD-PqsT7ndF1WZzTMK3IOXeRESFAXmNRYZris=.966e109b-5573-44e8-8921-c79c9a5b4c88@github.com>

On Tue, 16 Sep 2025 06:48:23 GMT, Man Cao <manc at openjdk.org> wrote:

> Hi,
> 
> Could anyone approve this change that excludes this test when running with `-Xcomp`? This avoids the test failure reported in [JDK-8367613](https://bugs.openjdk.org/browse/JDK-8367613).
> 
> For reasons I don't yet understand, the `HugeSwitch::shortMethod` method is not compiled under `-Xcomp -XX:TieredStopAtLevel=1`. The method gets compiled with either `-Xcomp` or `-XX:TieredStopAtLevel=1`, but not both. I would appreciate it if anyone could provide insights on possible reasons.

When looking at the test, it seems that we want to verify that `shortMethod()` is compiled while `hugeSwitch()` is not. When running with `-Xcomp`, we will immediately compile `main()` and directly inline `shortMethod()` with C1 (with C2 we fail to inline with "failed initial checks" and thus will compile `shortMethod()` separately when calling it the first time). Therefore, with C1, we will not compile `shortMethod()` separately and the test fails. 

Excluding `-Xcomp` looks reasonable. An alternative would be to exclude `main()` from compilation. But I think for the purpose of this test, excluding `-Xcomp` seems better.
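For reference, jtreg tests are commonly excluded from `-Xcomp` runs with a `@requires` clause on the `vm.compMode` property. A minimal sketch of such a header (all other tags of the real test omitted) might look like:

```java
/*
 * @test
 * @requires vm.compMode != "Xcomp"
 */
```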

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27306#pullrequestreview-3228249028

From duke at openjdk.org  Tue Sep 16 08:27:40 2025
From: duke at openjdk.org (erifan)
Date: Tue, 16 Sep 2025 08:27:40 GMT
Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI
 expand operation [v4]
In-Reply-To: <fDFpIq8vnu5rCRyytTRyBgARzXew-IsyKU6rXZmDLlc=.17b763e9-ca3a-4a75-b57e-75af5d11a9ef@github.com>
References: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
 <fDFpIq8vnu5rCRyytTRyBgARzXew-IsyKU6rXZmDLlc=.17b763e9-ca3a-4a75-b57e-75af5d11a9ef@github.com>
Message-ID: <k4i2IgzPq5sakx1uRKLvr8Q-Nve9N6R6eWktsoLWZ_c=.75d5e48b-1cbf-438a-9b7c-ca14b980dcce@github.com>

On Mon, 15 Sep 2025 05:55:43 GMT, erifan <duke at openjdk.org> wrote:

>> Currently, on AArch64, the VectorAPI `expand` operation is intrinsified for 32-bit and 64-bit types only when SVE2 is available. In the following cases, `expand` has not yet been intrinsified:
>> 1. **Subword types** on SVE2-capable hardware.
>> 2. **All types** on NEON and SVE1 environments.
>> 
>> As a result, `expand` API performance is very poor in these scenarios. This patch intrinsifies the `expand` operation in the above environments.
>> 
>> Since there are no native instructions directly corresponding to `expand` in these cases, this patch mainly leverages the `TBL` instruction to implement `expand`. To compute the index input for `TBL`, the prefix sum algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used. Take a 128-bit byte vector on SVE2 as an example:
>> 
>> To compute: dst = src.expand(mask)
>> Data direction: high <== low
>> Input:
>>   src                         = p o n m l k j i h g f e d c b a
>>   mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>> Expected result:
>>   dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>> 
>> Step 1: calculate the index input of the TBL instruction.
>> 
>> // Set tmp1 as all 0 vector.
>> tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 
>> // Move the mask bits from the predicate register to a vector register.
>> // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
>> tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>> 
>> // Shift the entire register. Prefix sum algorithm.
>> dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
>> tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
>> 
>> dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
>> tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
>> 
>> dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
>> tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
>> 
>> dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
>> tmp2 += dst                   = 8 8 8 7 6 6 6 5 4 4 4 3 2 2 2 1
>> 
>> // Clear inactive elements.
>> dst = sel(mask, tmp2, tmp1)   = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1
>> 
>> // Set the inactive lane value to -1 and set the active lane to the target index.
>> dst -= 1                      = -1 -1 7 6 -1 -1 5 4 -1 -1 3 2 -1 -1 1 0
>> 
>> Step 2: shuffle the source vector elements to the target vector
>> 
>> tbl(dst, src, dst)            = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>> 
>> 
>> The same algorithm is used for NEON and...
>
> erifan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
> 
>  - Merge branch 'master' into JDK-8363989
>  - Align code example data for better reading
>  - Merge branch 'master' into JDK-8363989
>  - Improve the comment of the vector expand implementation
>  - Merge branch 'master' into JDK-8363989
>  - 8363989: AArch64: Add missing backend support of VectorAPI expand operation
>    
>    Currently, on AArch64, the VectorAPI `expand` operation is intrinsified
>    for 32-bit and 64-bit types only when SVE2 is available. In the following
>    cases, `expand` has not yet been intrinsified:
>    1. **Subword types** on SVE2-capable hardware.
>    2. **All types** on NEON and SVE1 environments.
>    
>    As a result, `expand` API performance is very poor in these scenarios.
>    This patch intrinsifies the `expand` operation in the above environments.
>    
>    Since there are no native instructions directly corresponding to `expand`
>    in these cases, this patch mainly leverages the `TBL` instruction to
>    implement `expand`. To compute the index input for `TBL`, the prefix sum
>    algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used.
>    Take a 128-bit byte vector on SVE2 as an example:
>    ```
>    To compute: dst = src.expand(mask)
>    Data direction: high <== low
>    Input:
>      src                         = p o n m l k j i h g f e d c b a
>      mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>    Expected result:
>      dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>    ```
>    Step 1: calculate the index input of the TBL instruction.
>    ```
>    // Set tmp1 as all 0 vector.
>    tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>    
>    // Move the mask bits from the predicate register to a vector register.
>    // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
>    tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>    
>    // Shift the entire register. Prefix sum algorithm.
>    dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
>    tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
>    
>    dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
>    tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
>    
>    dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
>    tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
>    
>    dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
> ...

I'm not sure; all tests pass locally for me. I'll take a look. Thanks!
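To make the prefix-sum index computation in the quoted description easier to follow, here is a scalar Java model of it. It is not the HotSpot implementation, and the names are hypothetical. Each lane is modelled as an `int`; in the real byte-lane code, -1 is 0xFF and therefore out of range for `TBL`, which yields 0.

```java
import java.util.Arrays;

// Scalar model of the TBL-based expand described above (hypothetical names).
// Array index i is lane i; lane 0 is the lowest ("a" in the example).
public class ExpandSketch {
    public static int[] indexVector(boolean[] mask) {
        int n = mask.length; // a power of two, e.g. 16 byte lanes
        int[] tmp2 = new int[n];
        for (int i = 0; i < n; i++) tmp2[i] = mask[i] ? 1 : 0;
        // Log-step inclusive prefix sum: shift by 1, 2, 4, 8 lanes
        // (8, 16, 32, 64 bits for bytes) and add.
        for (int shift = 1; shift < n; shift <<= 1) {
            int[] dst = new int[n];
            for (int i = shift; i < n; i++) dst[i] = tmp2[i - shift];
            for (int i = 0; i < n; i++) tmp2[i] += dst[i];
        }
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            // sel(mask, tmp2, 0), then subtract 1: inactive lanes become -1.
            idx[i] = (mask[i] ? tmp2[i] : 0) - 1;
        }
        return idx;
    }

    // TBL semantics: an out-of-range index yields a zero lane (shown as '0').
    public static char[] tbl(char[] src, int[] idx) {
        char[] out = new char[idx.length];
        for (int i = 0; i < idx.length; i++) {
            out[i] = (idx[i] >= 0 && idx[i] < src.length) ? src[idx[i]] : '0';
        }
        return out;
    }

    // Render lanes high <== low, matching the layout in the description.
    public static String highToLow(char[] lanes) {
        StringBuilder sb = new StringBuilder();
        for (int i = lanes.length - 1; i >= 0; i--) {
            sb.append(lanes[i]);
            if (i > 0) sb.append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char[] src = "abcdefghijklmnop".toCharArray();
        boolean[] mask = new boolean[16];
        for (int i = 0; i < 16; i++) mask[i] = (i % 4) < 2;
        int[] idx = indexVector(mask);
        System.out.println(Arrays.toString(idx));
        System.out.println(highToLow(tbl(src, idx)));
    }
}
```

With the example mask, the last line printed is `0 0 h g 0 0 f e 0 0 d c 0 0 b a`, the expected result from the description.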

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26740#issuecomment-3296543668

From chagedorn at openjdk.org  Tue Sep 16 08:30:54 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Tue, 16 Sep 2025 08:30:54 GMT
Subject: RFR: 8367657: C2 SuperWord: NormalMapping demo from JVMLS 2025
 [v5]
In-Reply-To: <2gGUfvVlIaLGOd5iJUN3-oi9jlytrkULE3WZRUX1x78=.c0da1562-8cac-4215-9ae4-5cb248c89c0b@github.com>
References: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
 <2gGUfvVlIaLGOd5iJUN3-oi9jlytrkULE3WZRUX1x78=.c0da1562-8cac-4215-9ae4-5cb248c89c0b@github.com>
Message-ID: <L-qYHfVfMTYBSZ9ZS09ivw4qqbD0T27Z0nt2NMKwuzI=.4ab88f15-f2f6-468e-b913-c6f1e06f7968@github.com>

On Tue, 16 Sep 2025 07:57:38 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Demo from here:
>> https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/
>> 
>> Cleaned up and enhanced with a JTREG and IR test.
>> I also added some additional "generated" normal maps from height functions.
>> And I display the resulting image side-by-side with the normal map.
>> 
>> I decided to put it in a new directory `compiler.gallery`, anticipating other compiler tests that are both visually appealing (i.e. can be used for a "gallery") and that we may want to back up with other tests like IR testing.
>> 
>> There is a **stand-alone** way to run the demo:
>> `java test/hotspot/jtreg/compiler/gallery/NormalMapping.java`
>> (though it may only run with JDK22+, probably due to some Amber features)
>> 
>> **Quick Performance Numbers**, running on my avx512 laptop.
>> default / AVX3: 105 FPS
>> AVX2: 82 FPS
>> AVX1: 50 FPS
>> No vectorization: 19 FPS
>> GraalJIT: 13 FPS (`jdk-26-ea+5` - probably issue with vectorization / inlining?)
>> 
>> Here are some snapshots, but **I really recommend pulling the diff and playing with it; it looks much better in motion**:
>> <img width="2000" height="991" alt="image" src="https://github.com/user-attachments/assets/a693fac8-ecf0-43f2-914b-25f76c2f425d" />
>> <img width="2000" height="997" alt="image" src="https://github.com/user-attachments/assets/c2202e6b-6a90-4f90-a3ca-b73304e25905" />
>> <img width="1997" height="992" alt="image" src="https://github.com/user-attachments/assets/0d6da304-6bb9-4b25-9a7b-72019b02d95e" />
>> <img width="1992" height="994" alt="image" src="https://github.com/user-attachments/assets/9f5f7426-0678-45af-a3eb-ac092c262d4c" />
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   more for Christian

Looks good (minus two typos), thanks for the updates!

test/hotspot/jtreg/compiler/gallery/NormalMapping.java line 151:

> 149:     /**
> 150:      * This class represents the lights that are located on the normal map,
> 151:      * move around randomyl, and shine their color of light on the scene.

Suggestion:

     * move around randomly, and shine their color of light on the scene.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27282#pullrequestreview-3228335413
PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2351429960

From chagedorn at openjdk.org  Tue Sep 16 08:30:56 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Tue, 16 Sep 2025 08:30:56 GMT
Subject: RFR: 8367657: C2 SuperWord: NormalMapping demo from JVMLS 2025
 [v3]
In-Reply-To: <9WNJ1t-LUT2EmkBkPRLkQOjep3EkiMHFWovt9VnJUmA=.8da3417f-5781-4823-a2f5-35392cd2f8df@github.com>
References: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
 <Q5K0iEuMtU1WeixSMQh9WJ9DUWe-Ci5juCT_BzoO2HE=.ff056548-8238-48bd-9f2f-66e13c43dff8@github.com>
 <z9LUu3aMAonOnbA05xSNFk0xXh0HEMkiT_-Vv3YnIfI=.8c0c7838-17f7-4aa1-a415-c58aad2b7ddd@github.com>
 <9WNJ1t-LUT2EmkBkPRLkQOjep3EkiMHFWovt9VnJUmA=.8da3417f-5781-4823-a2f5-35392cd2f8df@github.com>
Message-ID: <eFx_nKtnWdApWuXWKxlLoEsefi5S5hsXorf-gGp4EW0=.8007f028-d88a-4c76-8fe4-2991e3115903@github.com>

On Tue, 16 Sep 2025 07:36:34 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> test/hotspot/jtreg/compiler/gallery/NormalMapping.java line 121:
>> 
>>> 119:     }
>>> 120: 
>>> 121:     public static File getLocalFile(String name) {
>> 
>> Isn't `name` always constant (i.e. `normal_map.png`)? Then you could also extract that to a constant and use it here directly.
>
> I would like to allow the user to add their own images. I used to have multiple, but the file sizes are a bit of an issue.

I see. Do you want to add a comment somewhere suggesting that users play around with multiple images?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2351440198

From dfenacci at openjdk.org  Tue Sep 16 08:49:24 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Tue, 16 Sep 2025 08:49:24 GMT
Subject: RFR: 8367613: Test
 compiler/runtime/TestDontCompileHugeMethods.java failed
In-Reply-To: <Ouu1htv7wGD_GqsyJhwk1-lH-0ix8UnduNiEBIhuFg0=.8f8393db-6357-4c7b-9a5b-620b3834e9b1@github.com>
References: <Ouu1htv7wGD_GqsyJhwk1-lH-0ix8UnduNiEBIhuFg0=.8f8393db-6357-4c7b-9a5b-620b3834e9b1@github.com>
Message-ID: <4826eQblg2rlidW-mkVYXEDgccQNUBD0xbFpluJHlCA=.d2706893-10a9-4a62-b9c9-ae7407c70856@github.com>

On Tue, 16 Sep 2025 06:48:23 GMT, Man Cao <manc at openjdk.org> wrote:

> Hi,
> 
> Could anyone approve this change that excludes this test when running with `-Xcomp`? This avoids the test failure reported in [JDK-8367613](https://bugs.openjdk.org/browse/JDK-8367613).
> 
> For reasons I don't yet understand, the `HugeSwitch::shortMethod` method is not compiled under `-Xcomp -XX:TieredStopAtLevel=1`. The method gets compiled with either `-Xcomp` or `-XX:TieredStopAtLevel=1`, but not both. I would appreciate it if anyone could provide insights on possible reasons.

Marginal thing: since the issue happens with `-Xcomp` and `-XX:TieredStopAtLevel=1`, it might be good to add the latter to `@requires` to restrict the exclusion as much as possible.
Also you might want to add this bug to the `@bug` tag.

-------------

PR Review: https://git.openjdk.org/jdk/pull/27306#pullrequestreview-3228463710

From dlunden at openjdk.org  Tue Sep 16 08:52:57 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Tue, 16 Sep 2025 08:52:57 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
Message-ID: <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>

> If a method has a large number of parameters, we currently bail out from C2 compilation.
> 
> ### Changeset
> 
> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
> 
> Changes:
> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
> - Remove all `can_represent` checks and bailouts.
> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
> 
> ### Testing
> 
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it).
> 
> ![c2-regression](https:/...
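
The design described above (a statically allocated base that is left unchanged, dynamically allocated memory only in the rare cases where it is needed, and an early exit in the overlap check) can be illustrated with a small Java model. This is a hypothetical sketch, not HotSpot's C++ `RegMask`; the class name, the base size of four 64-bit words and the growth policy are assumptions, and it leaves out everything else the real register mask does.

```java
import java.util.Arrays;

// Hypothetical model: the first BASE_WORDS words live in a fixed array;
// extra words are allocated only when a bit beyond the base range is set.
public class GrowableMask {
    static final int BASE_WORDS = 4; // assumption: fixed base size
    private static final int BITS = 64;

    private final long[] base = new long[BASE_WORDS];
    private long[] ext = null; // rarely allocated

    // Bits are assumed to be non-negative.
    public void set(int bit) {
        int w = bit / BITS;
        long m = 1L << (bit % BITS);
        if (w < BASE_WORDS) {
            base[w] |= m;
            return;
        }
        int e = w - BASE_WORDS;
        if (ext == null) {
            ext = new long[e + 1];
        } else if (e >= ext.length) {
            ext = Arrays.copyOf(ext, Math.max(e + 1, ext.length * 2));
        }
        ext[e] |= m;
    }

    public boolean member(int bit) {
        int w = bit / BITS;
        long m = 1L << (bit % BITS);
        if (w < BASE_WORDS) return (base[w] & m) != 0;
        int e = w - BASE_WORDS;
        return ext != null && e < ext.length && (ext[e] & m) != 0;
    }

    // Returns as soon as a shared bit is found (early exit); the extension
    // is only inspected when both masks actually have one.
    public boolean overlap(GrowableMask other) {
        for (int i = 0; i < BASE_WORDS; i++) {
            if ((base[i] & other.base[i]) != 0) return true;
        }
        if (ext == null || other.ext == null) return false;
        int n = Math.min(ext.length, other.ext.length);
        for (int i = 0; i < n; i++) {
            if ((ext[i] & other.ext[i]) != 0) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        GrowableMask a = new GrowableMask();
        GrowableMask b = new GrowableMask();
        a.set(3);
        b.set(300); // beyond the 4 * 64 = 256 base bits
        System.out.println(a.overlap(b));
        a.set(300);
        System.out.println(a.overlap(b));
    }
}
```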

Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits:

 - Clarify comments in regmask.hpp
 - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
 - Address review comments (renaming on the way in a separate PR)
 - Update src/hotspot/share/opto/regmask.hpp
   
   Co-authored-by: Emanuel Peter <emanuel.peter at oracle.com>
 - Restore modified java/lang/invoke tests
 - Sort includes (new requirement)
 - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
 - Add clarifying comments at definitions of register mask sizes
 - Fix implicit zero and nullptr checks
 - Add deep copy comment
 - ... and 29 more: https://git.openjdk.org/jdk/compare/60930a3e...c1f41288

-------------

Changes: https://git.openjdk.org/jdk/pull/20404/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=26
  Stats: 2852 lines in 29 files changed: 2289 ins; 289 del; 274 mod
  Patch: https://git.openjdk.org/jdk/pull/20404.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404

PR: https://git.openjdk.org/jdk/pull/20404
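The register-mask approach described in the PR description above (statically allocated base storage for the common case, with dynamically allocated storage only when a mask has to grow) can be sketched roughly as follows. This is a minimal sketch, not HotSpot's `RegMask`: the class name `ToyRegMask`, the inline capacity `kBaseWords`, and the use of `std::vector` for the extension are assumptions for illustration. `overlap` simply returns at the first shared word; the actual early-exit condition in the PR may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy register mask (hypothetical, not HotSpot's RegMask): a fixed inline array
// covers the common case, and heap storage is added only for registers past it.
class ToyRegMask {
 public:
  static constexpr size_t kBaseWords = 4;    // assumed inline capacity: 4 * 64 registers
  static constexpr size_t kBitsPerWord = 64;

  void insert(size_t reg) {
    size_t w = reg / kBitsPerWord;
    if (w >= size_in_words()) {
      ext_.resize(w - kBaseWords + 1, 0);  // grow only when a register needs it
    }
    word_at(w) |= uint64_t(1) << (reg % kBitsPerWord);
  }

  bool member(size_t reg) const {
    size_t w = reg / kBitsPerWord;
    if (w >= size_in_words()) {
      return false;  // beyond the allocated storage, so not in the mask
    }
    return ((word_at(w) >> (reg % kBitsPerWord)) & 1) != 0;
  }

  // True if both masks contain a common register; stops at the first shared word.
  bool overlap(const ToyRegMask& other) const {
    size_t n = std::min(size_in_words(), other.size_in_words());
    for (size_t w = 0; w < n; w++) {
      if ((word_at(w) & other.word_at(w)) != 0) {
        return true;
      }
    }
    return false;
  }

  size_t size_in_words() const { return kBaseWords + ext_.size(); }

 private:
  // Treats the inline array and the extension as one contiguous array of words.
  uint64_t& word_at(size_t w) { return w < kBaseWords ? base_[w] : ext_[w - kBaseWords]; }
  uint64_t word_at(size_t w) const { return w < kBaseWords ? base_[w] : ext_[w - kBaseWords]; }

  uint64_t base_[kBaseWords] = {};
  std::vector<uint64_t> ext_;  // stays empty in the common case
};
```

In this sketch, a mask that never sees a register past the inline capacity never allocates, which mirrors the stated goal of keeping the common case (a normal number of method parameters) cheap.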

From dlunden at openjdk.org  Tue Sep 16 09:09:30 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Tue, 16 Sep 2025 09:09:30 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <ahDUkle8YPUMjgTpAS3CWajrsK6AYoo13oVeoPGV16s=.f754aa0f-649a-4cd4-bbf1-85296035b413@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
 <NDDSmCvsbgpWgTU_bCIhNdo8foNn447LmTJ4HsCTv-s=.e0549027-7ac1-4794-bfce-322d3870f9d1@github.com>
 <8L3IGg5YYgi2EjlC-v5U3FkkWvK1swESQFAMwX02I84=.d597910f-0aca-4eb2-b68c-fbe565e73291@github.com>
 <ahDUkle8YPUMjgTpAS3CWajrsK6AYoo13oVeoPGV16s=.f754aa0f-649a-4cd4-bbf1-85296035b413@github.com>
Message-ID: <A-4LTEgD_6WP9Hp3k21RgV5fDjKRrNCh25Dwdq-gymk=.7b102466-d324-47d9-88df-588dd6d63362@github.com>

On Tue, 2 Sep 2025 14:05:25 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Sure, we can rename them. I think `RM_SIZE_IN_INTS` and `RM_SIZE_IN_WORDS` would be most suitable. I avoided such a change in this changeset to not make it bigger than it already is. Isn't it easier to do the renaming in a follow-up RFE, though, instead of before this PR? I'm fine with either; it's not much extra work to do it first.
>
> I think it would be easier to review if you do it first.
> That PR won't be super controversial, and just makes the code nicer.
> And then when we come back here, we may even be able to drop some comments, or be able to catch bugs just because the reviewers understand better what's going on ;)

Closing this thread as https://github.com/openjdk/jdk/pull/27215 is now integrated.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351579319

From dlunden at openjdk.org  Tue Sep 16 09:09:31 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Tue, 16 Sep 2025 09:09:31 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <Op0K9v60ajIqvDAbyLxf7vvLtHrSaJgAjCIMzK_6WGE=.0106823b-a76c-44cf-b93b-6ab8ef700d77@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
 <_a6JVBA326t8l1U3ZI8C-J3Ju5jm-RklBFGtnR7fbyY=.70638135-7577-44dc-a212-fe5e39b1f5fa@github.com>
 <RF4t87nYZFpJ461_rsT41aOIRigwmG6leniy4j9-QaA=.3e079800-bb8c-4e54-8a60-a060c30b5796@github.com>
 <GdM72hQe1NvODLC6vcGtXrL5GnMA2c6IsRcdVW3z6r8=.740386db-28e4-46b7-a321-2218dfe2d846@github.com>
 <zJ_64S_33_XM0PJXxKU5cVJKeawayMUaWU7E0iBKkKw=.d046b0de-1772-4904-916a-cfca5034f634@github.com>
 <Op0K9v60ajIqvDAbyLxf7vvLtHrSaJgAjCIMzK_6WGE=.0106823b-a76c-44cf-b93b-6ab8ef700d77@github.com>
Message-ID: <v80_D87mRRMcNpM-sLjT8tncD6gMqa0hKImiryH7bbU=.e9f37739-327b-4261-9efb-697de2322cf0@github.com>

On Tue, 9 Sep 2025 08:35:37 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Ah, I think I now better understand your question. `rm_up` is a low-level method for internal use in `regmask.hpp` and `regmask.cpp` only (perhaps I should prepend it with an underscore?). It basically makes it so that we can regard the backing storage (`_RM_UP` and `_RM_UP_EXT`) as one contiguous array. `Member` is exposed externally and so needs the offset logic.
>
> Makes sense. Maybe we can make that a bit more clear in the renaming.
> Maybe we can make a clear distinction between the two mappings somehow?

Do you think this is good enough now after the renaming? To me, the distinction is already quite clear (different argument types and method visibility).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351592165
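To illustrate the distinction discussed in this thread between an internal index mapping (like `rm_up`, which treats the two backing stores as one contiguous array) and an externally visible membership test that needs offset logic (like `Member`), here is a hypothetical sketch. The names `OffsetMask`, `storage_word`, and `offset_words_` are invented; the sketch only assumes that a mask can describe a window of registers starting at a non-zero offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mask over a window of registers that starts at a non-zero offset.
class OffsetMask {
 public:
  static constexpr size_t kBaseWords = 2;
  static constexpr size_t kBitsPerWord = 64;

  OffsetMask(size_t offset_words, size_t total_words)
      : offset_words_(offset_words),
        ext_(total_words > kBaseWords ? total_words - kBaseWords : 0, 0) {}

  // External API: takes an absolute register number and applies the offset.
  void insert(size_t reg) {
    assert(in_window(reg));
    size_t rel = reg - first_reg();
    storage_word(rel / kBitsPerWord) |= uint64_t(1) << (rel % kBitsPerWord);
  }

  bool member(size_t reg) const {
    if (!in_window(reg)) {
      return false;
    }
    size_t rel = reg - first_reg();
    return ((storage_word(rel / kBitsPerWord) >> (rel % kBitsPerWord)) & 1) != 0;
  }

  size_t size_in_words() const { return kBaseWords + ext_.size(); }

 private:
  size_t first_reg() const { return offset_words_ * kBitsPerWord; }
  bool in_window(size_t reg) const {
    return reg >= first_reg() && reg < first_reg() + size_in_words() * kBitsPerWord;
  }

  // Internal API: index i is relative to the start of storage and never sees the
  // offset; the inline array and the extension form one contiguous range.
  uint64_t& storage_word(size_t i) {
    assert(i < size_in_words());
    return i < kBaseWords ? base_[i] : ext_[i - kBaseWords];
  }
  uint64_t storage_word(size_t i) const {
    assert(i < size_in_words());
    return i < kBaseWords ? base_[i] : ext_[i - kBaseWords];
  }

  size_t offset_words_;
  uint64_t base_[kBaseWords] = {};
  std::vector<uint64_t> ext_;
};
```

Only `insert` and `member` deal with absolute register numbers; `storage_word` never sees the offset, which is one way to keep the two mappings visibly separate.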

From dlunden at openjdk.org  Tue Sep 16 09:09:33 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Tue, 16 Sep 2025 09:09:33 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v17]
In-Reply-To: <oUIcvgdIomZTx78vZBEJWBWpHU94X5qTjZohgkmuNeo=.f64553b1-24d1-4fc1-b43e-51f2ba8581bf@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <waaqRqn_VDR65p23wpyP_eaTUWQfLYimE9I32FcPt_Y=.4d9b0176-af3c-4a01-92d1-ab59574f4a90@github.com>
 <C1d1Ony-SCP0Kid5CF_M7ElH5GR6DGGTrpmV4K73hFU=.83eb79e7-5456-4142-b460-b7afc691b3e2@github.com>
 <XzaXVkiyPvNqlJbdRL24aDtEPnwhsX7ZtgA9qXPMLhg=.51790093-b781-4734-98bb-1e36f79fa955@github.com>
 <PYjvjyDcI0z3P5K-GlPFweVSb5-_WysVB2wy3VrK4qA=.77151262-e2b4-42a9-a112-6892e7af15e9@github.com>
 <4JpU1sh_7wBfEZG3sJ8z-dWz-Wpk7osUjYZByvqetgc=.acf93145-620b-42e0-af57-ddf20875fd96@github.com>
 <oUIcvgdIomZTx78vZBEJWBWpHU94X5qTjZohgkmuNeo=.f64553b1-24d1-4fc1-b43e-51f2ba8581bf@github.com>
Message-ID: <uiUQ5LKUG0R5VKt0ZFv0SxHt-9F6w5zVjS-6Fv8X12o=.8509533e-1b3c-45c3-8261-2eda44ae7b10@github.com>

On Mon, 23 Jun 2025 14:27:48 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> Alright. Well sure, we don't have to do a full renaming now. Though I do need to understand what is what to be able to review. Is there a good definition somewhere of what is what?
>
> I added comments at definition points of the various sizes. Let me know if something is still confusing.

Resolving this thread as https://github.com/openjdk/jdk/pull/27215 is now integrated.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351570887

From epeter at openjdk.org  Tue Sep 16 09:14:50 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 09:14:50 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop
In-Reply-To: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
References: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
Message-ID: <6G3nK9S5d9u3_esVm6W6hXK3QTDEzTscMlmWPmtp4yU=.21c1289b-83a5-485e-83ad-b30646dfbb89@github.com>

On Thu, 11 Sep 2025 13:05:21 GMT, Benoît Maillard <bmaillard at openjdk.org> wrote:

> This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`.
> 
> ### Context
> 
> The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn, I was able to reduce the reproducer to something very simple. After the following method was compiled by C2, executing it led to the last statement (`x = 0`) being ignored:
> 
> 
>     static public void test() {
>         x = 0;
>         for (int i = 0; i < 20000; i++) {
>             x += i;
>         }
>         x = 0;
>     }
> 
> 
> After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. `Store` nodes are moved out of the inner loop when encountering this pattern, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations taking place later. This was initially addressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here, as macro expansion happens too late.
> 
> This PR aims at addressing the specific case of the wrong wiring of `Store` nodes in _post_ loops, but in the longer term further investigations into the missing `Phi` node issue are necessary, as they are likely to cause other issues (cf. related JBS issues).
> 
> ### Detailed Analysis
> 
> In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`.  When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later.
> 
> This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect.
> 
> This is what the IR looks like after the creation of the post lo...

Thanks for working on this @benoitmaillard !

And thanks for all the explanations.

It seems that the missing Phi at the OuterStripMinedLoop is a decision that implies that Stores will just sort of "hang" between the loop exit and the SafePoint. That is now the new "invariant". Fine for now, but we may want to reconsider adding the Phi for the OuterStripMinedLoop eventually.

I have read through the PR, and was a little confused about names, so bear with my comments.

On the algorithmic level, I was wondering: is it possible to have a chain of stores between the exit and the SafePoint? Do you have such examples?

src/hotspot/share/opto/loopTransform.cpp line 1679:

> 1677:       Node* next = out->fast_out(l);
> 1678:       if (next->is_Mem() && next->in(MemNode::Memory) == out) {
> 1679:         IdealLoopTree* output_loop = get_loop(get_ctrl(next));

I would keep the names for `next` and `output_loop` consistent. Maybe `next_loop`? Or just call them `use` and `use_loop`?

src/hotspot/share/opto/loopTransform.cpp line 1692:

> 1690:   }
> 1691:   return out;
> 1692: }

Note from later me: I was quite confused here. I thought this was going to be some general function that should handle all sorts of memory flow in the loop, but that is not the case. I'll leave all my comments here just to show you what I as the reader thought when reading it ;)

Below, in a code comment you say that this method does:
`Find the last memory node in the loop when following memory usages`

What happens here if we hit an if-diamond (or more complicated), where there can be multiple memory uses, that are then merged again by a memory phi?


store
 |
 +--------+
 |        |
store   store
 |        |
 +---+ +--+
     | |
     phi
      |
    store -> the last one in the loop

I wonder if this is somehow possible. There are surely some IGVN optimizations that would common the stores here, and so the graph would probably have to be even more complicated. But I'm simply wondering if it could be possible that we would have branches / phis in the memory graph. Or what guarantees us that the graph is really linear here?

I'm also not sure how to parse the method name:
`find_mem_out_outer_strip_mined`
- find "mem out" <something> outer-strip-mined <loop?>
- find mem outside of outer-strip-mined loop?

src/hotspot/share/opto/loopTransform.cpp line 1788:

> 1786:   // right after the execution of the inner CountedLoop.
> 1787:   // We have to make sure that such stores in the post loop have the right memory inputs from the main loop
> 1788:   if (loop->tail()->in(0)->is_BaseCountedLoopEnd()) {

Out of curiosity: when would this condition be false?

src/hotspot/share/opto/loopTransform.cpp line 1793:

> 1791:     for (DUIterator j = if_false->outs(); if_false->has_out(j); j++) {
> 1792:       Node* store = if_false->out(j)->isa_Store();
> 1793:       // We don't make changes if the memory input is in the loop body as well

Why? I suppose that is because there must be a Phi in the loop then, right? Maybe state that in the comment here.

src/hotspot/share/opto/loopTransform.cpp line 1794:

> 1792:       Node* store = if_false->out(j)->isa_Store();
> 1793:       // We don't make changes if the memory input is in the loop body as well
> 1794:       if (store && !outer_loop->is_member(get_loop(get_ctrl(store->in(MemNode::Memory))))) {

Suggestion:

      if (store != nullptr && !outer_loop->is_member(get_loop(get_ctrl(store->in(MemNode::Memory))))) {

No implicit null or zero checks, see hotspot style guide ;)

src/hotspot/share/opto/loopTransform.cpp line 1797:

> 1795:         Node* mem_out = find_mem_out_outer_strip_mined(store, outer_loop);
> 1796:         Node* store_new = old_new[store->_idx];
> 1797:         store_new->set_req(MemNode::Memory, mem_out);

Could it be that there are multiple stores in a chain after the loop exit and before the SafePoint?

Loop
Exit
store1
store2
store3
SafePoint

If so, they all have the same control, namely at the `if_false`.
Their memory state should be ordered, where store2 depends on store1 and store3 on store2. Only store1 should then really have its memory input updated.

Your code now finds the `store_new` for each of store1, store2 and store3, and sets all of their memory inputs to `mem_out`. But that means that the "new" stores all have the same memory input, and are not in a chain any more. Did I see this right? Is that ok?

src/hotspot/share/opto/loopnode.hpp line 1384:

> 1382: 
> 1383:   // Find the last memory node in the loop when following memory usages
> 1384:   Node *find_mem_out_outer_strip_mined(Node* store, IdealLoopTree* outer_loop);

The name of the method is a bit confusing. And the comment seems to suggest something different than what the code says.

test/hotspot/jtreg/compiler/loopstripmining/MissingStoreAfterOuterStripMinedLoop.java line 77:

> 75:         a1.field = 0;
> 76:         a2.field = 0;
> 77:     }

Do the field stores both float out of the loop, and end up in a chain between exit and safepoint? Might be nice to add some comments to these tests so we can see what examples you already cover and if we might need some more.

-------------

PR Review: https://git.openjdk.org/jdk/pull/27225#pullrequestreview-3228308675
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351475787
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351447559
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351489419
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351520064
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351496858
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351551611
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351410690
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351608304
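The store-chain question raised for line 1797 can be explored with a toy model of the memory graph. The sketch below is hypothetical (`ToyMemNode`, its `in_outer_loop` flag, and `rewire_post_loop_stores` are invented) and assumes each hanging store is tested the same way as in the quoted condition at line 1794: its post-loop clone is only rewired when the original store's memory input lies outside the outer loop. Under that assumption, only the first store of a chain between the exit and the SafePoint qualifies, because later stores take their memory from an earlier hanging store that is itself inside the outer loop.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy memory node (hypothetical, not C2's Node): a memory input plus a flag that
// stands in for "the control of this node is inside the outer strip-mined loop".
struct ToyMemNode {
  ToyMemNode* mem_in;  // memory input of this node
  bool in_outer_loop;
};

// For each store hanging between the inner loop exit and the SafePoint of the
// main loop, redirect its post-loop clone to `mem_out` only if the original
// store's memory input lies outside the outer loop.
void rewire_post_loop_stores(const std::vector<ToyMemNode*>& hanging_stores,
                             std::map<ToyMemNode*, ToyMemNode*>& old_new,
                             ToyMemNode* mem_out) {
  for (ToyMemNode* store : hanging_stores) {
    bool input_in_loop = store->mem_in != nullptr && store->mem_in->in_outer_loop;
    if (input_in_loop) {
      continue;  // the clone already points at the clone of its in-loop input
    }
    old_new[store]->mem_in = mem_out;
  }
}
```

With a chain `s1 -> s2 -> s3` (all with control at the loop exit) and `s1` fed by pre-loop memory, only the clone of `s1` is redirected in this model, so the cloned chain stays ordered. Whether the real graph always matches this model is exactly what the review question asks.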

From epeter at openjdk.org  Tue Sep 16 09:14:52 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 09:14:52 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop
In-Reply-To: <6G3nK9S5d9u3_esVm6W6hXK3QTDEzTscMlmWPmtp4yU=.21c1289b-83a5-485e-83ad-b30646dfbb89@github.com>
References: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
 <6G3nK9S5d9u3_esVm6W6hXK3QTDEzTscMlmWPmtp4yU=.21c1289b-83a5-485e-83ad-b30646dfbb89@github.com>
Message-ID: <XmnT5_5b88bstnI4s9dViTVJPFIccAizD0yjezUHohI=.8cc48a49-6e9e-4595-9b5b-2f56ee89590f@github.com>

On Tue, 16 Sep 2025 08:30:05 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`.
>> 
>> ### Context
>> 
>> The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn, I was able to reduce the reproducer to something very simple. After the following method was compiled by C2, executing it led to the last statement (`x = 0`) being ignored:
>> 
>> 
>>     static public void test() {
>>         x = 0;
>>         for (int i = 0; i < 20000; i++) {
>>             x += i;
>>         }
>>         x = 0;
>>     }
>> 
>> 
>> After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. `Store` nodes are moved out of the inner loop when encountering this pattern, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations taking place later. This was initially addressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here, as macro expansion happens too late.
>> 
>> This PR aims at addressing the specific case of the wrong wiring of `Store` nodes in _post_ loops, but in the longer term further investigations into the missing `Phi` node issue are necessary, as they are likely to cause other issues (cf. related JBS issues).
>> 
>> ### Detailed Analysis
>> 
>> In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`.  When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later.
>> 
>> This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect.
>> 
>> This is wh...
>
> src/hotspot/share/opto/loopTransform.cpp line 1692:
> 
>> 1690:   }
>> 1691:   return out;
>> 1692: }
> 
> Note from later me: I was quite confused here. I thought this was going to be some general function that should handle all sorts of memory flow in the loop, but that is not the case. I'll leave all my comments here just to show you what I as the reader thought when reading it ;)
> 
> Below, in a code comment you say that this method does:
> `Find the last memory node in the loop when following memory usages`
> 
> What happens here if we hit an if-diamond (or more complicated), where there can be multiple memory uses, that are then merged again by a memory phi?
> 
> 
> store
>  |
>  +--------+
>  |        |
> store   store
>  |        |
>  +---+ +--+
>      | |
>      phi
>       |
>     store -> the last one in the loop
> 
> I wonder if this is somehow possible. There are surely some IGVN optimizations that would common the stores here, and so the graph would probably have to be even more complicated. But I'm simply wondering if it could be possible that we would have branches / phis in the memory graph. Or what guarantees us that the graph is really linear here?
> 
> I'm also not sure how to parse the method name:
> `find_mem_out_outer_strip_mined`
> - find "mem out" <something> outer-strip-mined <loop?>
> - find mem outside of outer-strip-mined loop?

I suppose we would trigger your assert if we found a branch:
`assert(unique_next == nullptr, "memory node should only have one usage in the loop body");`

Now we usually only do pre-main-post for relatively small loop bodies, see `LoopUnrollLimit`. But I wonder if we ever decided to increase this limit, would we then encounter such more complicated memory graphs?

> src/hotspot/share/opto/loopTransform.cpp line 1794:
> 
>> 1792:       Node* store = if_false->out(j)->isa_Store();
>> 1793:       // We don't make changes if the memory input is in the loop body as well
>> 1794:       if (store && !outer_loop->is_member(get_loop(get_ctrl(store->in(MemNode::Memory))))) {
> 
> Suggestion:
> 
>       if (store != nullptr && !outer_loop->is_member(get_loop(get_ctrl(store->in(MemNode::Memory))))) {
> 
> No implicit null or zero checks, see hotspot style guide ;)

The loop nesting check looks a bit convoluted. Consider refactoring a little. Could you get rid of the `!` by swapping things around?
`get_loop(get_ctrl(store->in(MemNode::Memory)))->is_member(outer_loop)`
Does not look that much better either... hmm.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351468893
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351511009

From epeter at openjdk.org  Tue Sep 16 09:14:53 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 09:14:53 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop
In-Reply-To: <XmnT5_5b88bstnI4s9dViTVJPFIccAizD0yjezUHohI=.8cc48a49-6e9e-4595-9b5b-2f56ee89590f@github.com>
References: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
 <6G3nK9S5d9u3_esVm6W6hXK3QTDEzTscMlmWPmtp4yU=.21c1289b-83a5-485e-83ad-b30646dfbb89@github.com>
 <XmnT5_5b88bstnI4s9dViTVJPFIccAizD0yjezUHohI=.8cc48a49-6e9e-4595-9b5b-2f56ee89590f@github.com>
Message-ID: <KKoDMHzNe6b5JRZBdlA9bQumxKfPlU2LWzZtcnjvS7w=.17239127-765f-4d8c-9b4c-4fe589ff5db0@github.com>

On Tue, 16 Sep 2025 08:34:48 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/loopTransform.cpp line 1692:
>> 
>>> 1690:   }
>>> 1691:   return out;
>>> 1692: }
>> 
>> Note from later me: I was quite confused here. I thought this was going to be some general function that should handle all sorts of memory flow in the loop, but that is not the case. I'll leave all my comments here just to show you what I as the reader thought when reading it ;)
>> 
>> Below, in a code comment you say that this method does:
>> `Find the last memory node in the loop when following memory usages`
>> 
>> What happens here if we hit an if-diamond (or more complicated), where there can be multiple memory uses, that are then merged again by a memory phi?
>> 
>> 
>> store
>>  |
>>  +--------+
>>  |        |
>> store   store
>>  |        |
>>  +---+ +--+
>>      | |
>>      phi
>>       |
>>     store -> the last one in the loop
>> 
>> I wonder if this is somehow possible. There are surely some IGVN optimizations that would common the stores here, and so the graph would probably have to be even more complicated. But I'm simply wondering if it could be possible that we would have branches / phis in the memory graph. Or what guarantees us that the graph is really linear here?
>> 
>> I'm also not sure how to parse the method name:
>> `find_mem_out_outer_strip_mined`
>> - find "mem out" <something> outer-strip-mined <loop?>
>> - find mem outside of outer-strip-mined loop?
>
> I suppose we would trigger your assert if we found a branch:
> `assert(unique_next == nullptr, "memory node should only have one usage in the loop body");`
> 
> Now we usually only do pre-main-post for relatively small loop bodies, see `LoopUnrollLimit`. But I wonder if we ever decided to increase this limit, would we then encounter such more complicated memory graphs?

Ok, I think I have been misled by the names / comments.
You are really looking for the last store in the `outer_loop`. And we do have the guarantee of a linear memory graph because it is the one between `if_false` and SafePoint.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351568383
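The reading settled on above (following memory uses from a store to the last memory node still inside `outer_loop`, relying on the memory graph between `if_false` and the SafePoint being linear) can be sketched with a toy graph. `ToyNode` and `last_mem_in_outer_loop` are hypothetical names; the assert plays the role of the quoted `unique_next` assert, so an if-diamond like the one in the diagram would trip it. The sketch assumes there is no memory cycle inside that segment.

```cpp
#include <cassert>
#include <vector>

// Toy memory node (hypothetical): its memory uses, and whether its control is
// inside the outer strip-mined loop.
struct ToyNode {
  std::vector<ToyNode*> mem_uses;
  bool in_outer_loop;
};

// Follows memory uses that stay inside the outer loop and returns the last node
// reached. Assumes the segment is linear and acyclic, as between the loop exit and
// the SafePoint; a second in-loop use (e.g. an if-diamond) trips the assert.
ToyNode* last_mem_in_outer_loop(ToyNode* start) {
  ToyNode* cur = start;
  while (true) {
    ToyNode* unique_next = nullptr;
    for (ToyNode* use : cur->mem_uses) {
      if (use->in_outer_loop) {
        assert(unique_next == nullptr && "memory node should only have one use in the loop");
        unique_next = use;
      }
    }
    if (unique_next == nullptr) {
      return cur;  // no further in-loop memory use: this is the last one
    }
    cur = unique_next;
  }
}
```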

From epeter at openjdk.org  Tue Sep 16 09:14:53 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 09:14:53 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop
In-Reply-To: <KKoDMHzNe6b5JRZBdlA9bQumxKfPlU2LWzZtcnjvS7w=.17239127-765f-4d8c-9b4c-4fe589ff5db0@github.com>
References: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
 <6G3nK9S5d9u3_esVm6W6hXK3QTDEzTscMlmWPmtp4yU=.21c1289b-83a5-485e-83ad-b30646dfbb89@github.com>
 <XmnT5_5b88bstnI4s9dViTVJPFIccAizD0yjezUHohI=.8cc48a49-6e9e-4595-9b5b-2f56ee89590f@github.com>
 <KKoDMHzNe6b5JRZBdlA9bQumxKfPlU2LWzZtcnjvS7w=.17239127-765f-4d8c-9b4c-4fe589ff5db0@github.com>
Message-ID: <EkXO0UA215XqrSBCd9ZD6HRnbwpjkIucIiaJaN8yjuY=.89df443d-16bc-4ab0-8b2e-a6e451e2d7ca@github.com>

On Tue, 16 Sep 2025 08:58:18 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I suppose we would trigger your assert if we found a branch:
>> `assert(unique_next == nullptr, "memory node should only have one usage in the loop body");`
>> 
>> Now we usually only do pre-main-post for relatively small loop bodies, see `LoopUnrollLimit`. But I wonder if we ever decided to increase this limit, would we then encounter such more complicated memory graphs?
>
> Ok, I think I have been misled by the names / comments.
> You are really looking for the last store in the `outer_loop`. And we do have the guarantee of a linear memory graph because it is the one between `if_false` and SafePoint.

I think a better method name would help a lot ;)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2351569804

From duke at openjdk.org  Tue Sep 16 09:43:54 2025
From: duke at openjdk.org (lusou-zhangquan)
Date: Tue, 16 Sep 2025 09:43:54 GMT
Subject: RFR: 8367706: Remove redundant register used by cmove in C1 LIR
 generation
Message-ID: <TL12NFqsBwHIdjdM0xzg_O4xZE5GZB8Pd07WpBYH0aM=.93d5a973-720c-4bed-b570-f21731cedc3b@github.com>

This PR removes the redundant temp register used by cmove in C1's `LIRGenerator::do_LookupSwitch` and `LIRGenerator::do_TableSwitch`. I reported the issue [8367706](https://bugs.openjdk.org/browse/JDK-8367706) myself, and it's my pleasure to fix it.

-------------

Commit messages:
 - 8367706: Remove redundant register used by cmove in C1 LIR generation

Changes: https://git.openjdk.org/jdk/pull/27307/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27307&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367706
  Stats: 8 lines in 1 file changed: 2 ins; 6 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/27307.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27307/head:pull/27307

PR: https://git.openjdk.org/jdk/pull/27307

From qamai at openjdk.org  Tue Sep 16 09:47:14 2025
From: qamai at openjdk.org (Quan Anh Mai)
Date: Tue, 16 Sep 2025 09:47:14 GMT
Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v9]
In-Reply-To: <1ZCEMsPvSQaLGWRuNtO89LNP_XUeaz-edeIUrKwRCZY=.9dad5a02-c739-4e24-8692-8941f31e5a49@github.com>
References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
 <1ZCEMsPvSQaLGWRuNtO89LNP_XUeaz-edeIUrKwRCZY=.9dad5a02-c739-4e24-8692-8941f31e5a49@github.com>
Message-ID: <GvXCK34EyQ2-gCApaUrIHfS519xcQntra-YR5tK9lm8=.a226c512-051f-43f7-9ba1-a9ca6b202fc9@github.com>

On Sun, 14 Sep 2025 14:44:02 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> This change improves the precision of the `Mod(I|L)Node::Value()` functions.
>> 
>> I reordered the structure a bit. First, we handle constants; afterwards, we handle ranges. The bottom checks seem to be unnecessary (`Type::BOTTOM` is covered by using `isa_(int|long)()`, and the local bottom is just the full range). Since we can give reasonable bounds even if only one input has any bounds, we don't want to return early.
>> The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions.
>> 
>> ### Monotonicity
>> 
>> Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. We therefore use `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range).
>> 
>> ### Testing
>> 
>> I added tests for cases around the relevant bounds. I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something).
>> 
>> Please review and let me know what you think.
>> 
>> ### Other
>> 
>> The `UMod(I|L)Node`s were adjusted to be more in line with their signed variants. This change diverges them again, but similar improvements could be made after #17508.
>> 
>> During experimenting with these changes, I stumbled upon a few things that aren't directly related to this change, but might be worth to further look into:
>> - If the divisor is a constant, we will directly replace the `Mod(I|L)Node` with more but less expensive nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, which means we miss potential cases where it would help, e.g., removing range checks. Would it make sense to delay the replacement?
>> - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd.
>
> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
> 
>   remove unused parameter

Marked as reviewed by qamai (Committer).
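The monotonicity argument in the quoted description can be illustrated with a small interval model. This is a sketch under stated assumptions, not the code in the PR: `Range` stands in for `TypeInt`, `meet` is assumed to return the smallest range containing both inputs, and `mod_value` derives a conservative bound from Java's `%` semantics (the result has the sign of the dividend, and its magnitude is smaller than the divisor's).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical closed integer range [lo, hi], a stand-in for C2's TypeInt.
struct Range {
  int32_t lo;
  int32_t hi;
};

// Smallest range containing both inputs (assumed model of meet on integer ranges).
Range meet(Range a, Range b) {
  return {std::min(a.lo, b.lo), std::max(a.hi, b.hi)};
}

// Conservative range of x % d for x in `dividend` and d in `divisor`, using Java
// semantics: the result has the sign of x, |result| <= |x| and |result| < |d|.
Range mod_value(Range dividend, Range divisor) {
  assert(dividend.lo <= dividend.hi && divisor.lo <= divisor.hi);
  // Largest possible |d|, computed in 64 bits so that |INT32_MIN| does not overflow.
  long long max_abs = std::max(std::llabs(static_cast<long long>(divisor.lo)),
                               std::llabs(static_cast<long long>(divisor.hi)));
  if (max_abs == 0) {
    // The divisor is always 0, so the operation always throws. Returning 0..0
    // (rather than all non-negative values) keeps later meets from over-widening.
    return {0, 0};
  }
  long long bound = max_abs - 1;
  long long lo = std::max(std::min(static_cast<long long>(dividend.lo), 0LL), -bound);
  long long hi = std::min(std::max(static_cast<long long>(dividend.hi), 0LL), bound);
  return {static_cast<int32_t>(lo), static_cast<int32_t>(hi)};
}
```

With an unbounded dividend, a constant divisor of 3 yields `-2..2` and a constant divisor of 0 yields `0..0`, whose meet is still `-2..2`. Had the zero-divisor case returned all non-negative values, the meet with `-2..2` would be `-2..INT32_MAX`, the kind of widening the description wants to avoid.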

-------------

PR Review: https://git.openjdk.org/jdk/pull/25254#pullrequestreview-3228765828
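As background for the quoted remark about constant divisors: one common shape of the replacement for a power-of-two divisor computes the remainder from a sign-derived bias, an add, a mask, and a subtract. The sketch below is illustrative only (the function name `mod_pow2` is hypothetical, and C2's actual node sequence may differ); it shows how the result ends up spread over several simple operations whose individual value ranges do not directly say that the result lies in `-(2^k - 1)..2^k - 1`.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative remainder by 2^k (1 <= k <= 30) with Java/C++ truncating semantics,
// built from a sign-derived bias, an add, a mask and a subtract.
// Hypothetical helper; not necessarily the node sequence C2 generates.
int32_t mod_pow2(int32_t x, int k) {
  assert(k >= 1 && k <= 30);
  uint32_t ux = static_cast<uint32_t>(x);
  uint32_t sign = 0u - (ux >> 31);            // all ones if x < 0, else 0
  uint32_t bias = sign >> (32 - k);           // 2^k - 1 if x < 0, else 0
  uint32_t mask = ~((1u << k) - 1u);          // clears the low k bits
  uint32_t rounded = (ux + bias) & mask;      // x rounded toward zero to a multiple of 2^k
  return static_cast<int32_t>(ux - rounded);  // remainder, with the sign of x
}
```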

From rcastanedalo at openjdk.org  Tue Sep 16 09:55:04 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 16 Sep 2025 09:55:04 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT [v4]
In-Reply-To: <vMfMlE3PBXuzWabH2c3DNvZRV9xHiQxE9phvtF2QiaM=.1c6d4cb9-c19d-42cd-b6e7-a4123d72d35d@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
 <vMfMlE3PBXuzWabH2c3DNvZRV9xHiQxE9phvtF2QiaM=.1c6d4cb9-c19d-42cd-b6e7-a4123d72d35d@github.com>
Message-ID: <EGeZnOwXGN_bF6TgDzmX2AHMZIG-zduxkHnjk6lx96M=.62451fe1-7881-4fe2-b3ed-2078b51c6919@github.com>

On Tue, 16 Sep 2025 02:35:01 GMT, Cesar Soares Lucas <cslucas at openjdk.org> wrote:

>> Please, review this patch to fix issue that may occur when reducing allocation merge.
>> 
>> As the assert message describes, the problem is that a `Phi` considered reducible during one invocation of `adjust_scalar_replaceable_state` later turns out to be non-reducible. This situation can happen if a subsequent invocation of the same method causes all inputs to the phi to be NSR, in which case there is no point in reducing the Phi. It can also happen during the propagation of NSR state done by `find_scalar_replaceable_allocs`.
>> 
>> The change in `revisit_reducible_phi_status` is just a clean-up.
>> The real fix is in `find_scalar_replaceable_allocs`.
>> 
>> Tested on Linux x64/Aarch64 release/fastdebug with JTREG tier1-3.
>
> Cesar Soares Lucas has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Merge remote-tracking branch 'refs/remotes/origin/ram-non-reducible' into ram-non-reducible
>  - Merge consecutive ifs

Looks good, thanks! Please consider addressing [JDK-8367367](https://bugs.openjdk.org/browse/JDK-8367367) as follow-up work, while the context is still available in the higher levels of our memory hierarchy ;)
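The failure mode described in the quoted PR text is a `Phi` that was judged reducible earlier but whose inputs all become NSR later during propagation. It can be pictured with a small standalone model. This is a hypothetical sketch, not the patch: `Alloc`, `Phi`, `can_reduce`, and the "at least one scalar-replaceable input" rule are simplifying assumptions, and C2's real reducibility check is considerably more involved. The sketch only shows the ordering idea: finish NSR propagation first, then re-validate every Phi still marked reducible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified model; not HotSpot's ConnectionGraph.
struct Alloc {
  bool scalar_replaceable = true;
  // Allocations that must also become NSR when this one does
  // (the reason for each edge is abstracted away).
  std::vector<std::size_t> propagates_to;
};

struct Phi {
  std::vector<std::size_t> inputs;  // indices of merged allocations
  bool reducible = true;
};

// Simplified rule: reducing a Phi only pays off while at least one of its
// inputs can still be scalar replaced.
bool can_reduce(const Phi& phi, const std::vector<Alloc>& allocs) {
  for (std::size_t i : phi.inputs) {
    assert(i < allocs.size());
    if (allocs[i].scalar_replaceable) {
      return true;
    }
  }
  return false;
}

// Propagate NSR state to a fixpoint, then re-check every Phi that was
// considered reducible so that no stale "reducible" mark survives.
void find_scalar_replaceable(std::vector<Alloc>& allocs, std::vector<Phi>& phis) {
  std::vector<std::size_t> worklist;
  for (std::size_t i = 0; i < allocs.size(); i++) {
    if (!allocs[i].scalar_replaceable) {
      worklist.push_back(i);
    }
  }
  while (!worklist.empty()) {
    std::size_t i = worklist.back();
    worklist.pop_back();
    for (std::size_t j : allocs[i].propagates_to) {
      if (allocs[j].scalar_replaceable) {
        allocs[j].scalar_replaceable = false;
        worklist.push_back(j);
      }
    }
  }
  for (Phi& phi : phis) {
    if (phi.reducible && !can_reduce(phi, allocs)) {
      phi.reducible = false;
    }
  }
}
```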

-------------

Marked as reviewed by rcastanedalo (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27063#pullrequestreview-3228800991

From duke at openjdk.org  Tue Sep 16 10:14:53 2025
From: duke at openjdk.org (erifan)
Date: Tue, 16 Sep 2025 10:14:53 GMT
Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI
 expand operation
In-Reply-To: <zQvZZ7A_01Czb0l17glhs7kNeQwQfzNiWFyovr0MTf8=.876dd110-5222-465e-b1e6-99e5e2f41fe9@github.com>
References: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
 <tmEj88Ez4DURxmS7pPm8t1lhRct4fHlQcBySljEu-tg=.a18e80cf-7711-4ac6-990f-4c630b90f98b@github.com>
 <YxrR371gpyMeJEbioD-Bupej5RSArZgKCKP1BuQLMQQ=.ae9cdec7-59e4-4969-821c-7e8c0bcefbdf@github.com>
 <zQvZZ7A_01Czb0l17glhs7kNeQwQfzNiWFyovr0MTf8=.876dd110-5222-465e-b1e6-99e5e2f41fe9@github.com>
Message-ID: <F2asbwVo6VKd20hlZAsZQoySZ_MGKg-SA38hvLtkrtg=.065cd1c3-bf67-498d-b814-3d71473f9b49@github.com>

On Tue, 16 Sep 2025 07:30:20 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> @theRealAph @e1iu could you help take another look of this PR, thanks !
>
> @erifan I'm seeing `gtest/GTestWrapper.java` fail on `aarch64` machines.
> 
> Looks like this:
> 
> [ RUN      ] AssemblerAArch64.validate_vm
> [12.324s][warning][os] Loading hsdis library failed
> .../test/hotspot/gtest/aarch64/test_assembler_aarch64.cpp:49: Failure
> Expected equality of these values:
>   insns[i]
>     Which is: 335545527
>   insns1[i]
>     Which is: 335545526
> Ours:
> 
> Loading hsdis library failed, undisassembled code is shown in MachCode section
> [MachCode]
>   0x0000ffff97c20548: b604 0014 
> [/MachCode]
> Theirs:
> 
> Loading hsdis library failed, undisassembled code is shown in MachCode section
> [MachCode]
>   0x0000ffff853eb0a8: b704 0014 
> [/MachCode]
> 
> Could this be related?

Hi @eme64, I can't reproduce the test failure in my local and Jenkins test environments. I see this in the above error log:

Loading hsdis library failed, undisassembled code is shown in MachCode section

Not sure if this is related.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26740#issuecomment-3297410878

From fandreuzzi at openjdk.org  Tue Sep 16 10:23:28 2025
From: fandreuzzi at openjdk.org (Francesco Andreuzzi)
Date: Tue, 16 Sep 2025 10:23:28 GMT
Subject: RFR: 8367740: assembler_<cpu>.inline.hpp should not include
 assembler.inline.hpp
Message-ID: <QYuy3O1NQdXFTX_2k292z-7Ni02AHU1VhbIZDzwG8hk=.48b8361e-0442-41d3-a319-5a506d0cf650@github.com>

This is the content of assembler.inline.hpp:
https://github.com/openjdk/jdk/blob/ca89cd06d39ed3a6bbe16f60fea4d7382849edbd/src/hotspot/share/asm/assembler.inline.hpp#L28-L30

Most of the `assembler_<cpu>.inline.hpp` include it:
https://github.com/openjdk/jdk/blob/ca89cd06d39ed3a6bbe16f60fea4d7382849edbd/src/hotspot/cpu/zero/assembler_zero.inline.hpp#L29-L32

They should probably include `assembler.hpp` instead.

Testing: tier1 in GHA

-------------

Commit messages:
 - cc

Changes: https://git.openjdk.org/jdk/pull/27311/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27311&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367740
  Stats: 5 lines in 5 files changed: 0 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/27311.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27311/head:pull/27311

PR: https://git.openjdk.org/jdk/pull/27311

From rcastanedalo at openjdk.org  Tue Sep 16 10:23:23 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 16 Sep 2025 10:23:23 GMT
Subject: RFR: 8367728: IGV: dump node address type
Message-ID: <M9c90F3oUYUfELtyNkyZ4TtBXaYA9U1OW-DaQGCq1hs=.61ac61b7-4004-4fc2-8658-7a3f7daa3d7b@github.com>

This changeset dumps the address type of each node (`Node::adr_type()`), when not null, into the IGV graphs. This should improve the visibility and diagnosability of C2 type inconsistencies, see e.g. [JDK-8367667](https://bugs.openjdk.org/browse/JDK-8367667).

#### Testing
- tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode).
- Tested IGV manually on a few selected graphs. Tested automatically that dumping thousands of graphs does not trigger any assertion failure (by running `java -Xcomp -XX:PrintIdealGraphLevel=1`).

-------------

Commit messages:
 - Dump address type

Changes: https://git.openjdk.org/jdk/pull/27310/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27310&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367728
  Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/27310.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27310/head:pull/27310

PR: https://git.openjdk.org/jdk/pull/27310

From mchevalier at openjdk.org  Tue Sep 16 10:42:21 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 16 Sep 2025 10:42:21 GMT
Subject: RFR: 8367728: IGV: dump node address type
In-Reply-To: <M9c90F3oUYUfELtyNkyZ4TtBXaYA9U1OW-DaQGCq1hs=.61ac61b7-4004-4fc2-8658-7a3f7daa3d7b@github.com>
References: <M9c90F3oUYUfELtyNkyZ4TtBXaYA9U1OW-DaQGCq1hs=.61ac61b7-4004-4fc2-8658-7a3f7daa3d7b@github.com>
Message-ID: <v6V_XIM3EAcmlkVEhYtkCjzc-_cu9k0hzPIfa2Y6tPw=.a16c6a03-b719-4bc7-852e-3eef2b5b80d5@github.com>

On Tue, 16 Sep 2025 10:11:50 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> This changeset dumps the address type of each node (`Node::adr_type()`), when not null, into the IGV graphs. This should improve the visibility and diagnosability of C2 type inconsistencies, see e.g. [JDK-8367667](https://bugs.openjdk.org/browse/JDK-8367667).
> 
> #### Testing
> - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode).
> - Tested IGV manually on a few selected graphs. Tested automatically that dumping thousands of graphs does not trigger any assertion failure (by running `java -Xcomp -XX:PrintIdealGraphLevel=1`).

I'm happy.

src/hotspot/share/opto/idealGraphPrinter.cpp line 452:

> 450:     }
> 451:     if (n->adr_type() != nullptr) {
> 452:       stringStream adr_type_stream;

Other `stringStream`s around here use a preallocated buffer. Would it be a good idea here too?

-------------

Marked as reviewed by mchevalier (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27310#pullrequestreview-3229121349
PR Review Comment: https://git.openjdk.org/jdk/pull/27310#discussion_r2351915650

From epeter at openjdk.org  Tue Sep 16 11:07:57 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 11:07:57 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <aNAOEMb-aBnJ9bjkGbxpKB_Wpv3iw4iM0ukP2O3HrQg=.5a080d76-1ec7-44ae-af53-697a955308cf@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
 <_a6JVBA326t8l1U3ZI8C-J3Ju5jm-RklBFGtnR7fbyY=.70638135-7577-44dc-a212-fe5e39b1f5fa@github.com>
 <RF4t87nYZFpJ461_rsT41aOIRigwmG6leniy4j9-QaA=.3e079800-bb8c-4e54-8a60-a060c30b5796@github.com>
 <GdM72hQe1NvODLC6vcGtXrL5GnMA2c6IsRcdVW3z6r8=.740386db-28e4-46b7-a321-2218dfe2d846@github.com>
 <zJ_64S_33_XM0PJXxKU5cVJKeawayMUaWU7E0iBKkKw=.d046b0de-1772-4904-916a-cfca5034f634@github.com>
 <Op0K9v60ajIqvDAbyLxf7vvLtHrSaJgAjCIMzK_6WGE=.0106823b-a76c-44cf-b93b-6ab8ef700d77@github.com>
 <v80_D87mRRMcNpM-sLjT8tncD6gMqa0hKImiryH7bbU=.e9f37739-327b-4261-9efb-697de2322cf0@github.com>
 <aNAOEMb-aBnJ9bjkGbxpKB_Wpv3iw4iM0ukP2O3HrQg=.5a080d76-1ec7-44ae-af53-697a955308cf@github.com>
Message-ID: <8EDUG032a2-wepy1MeWd6n3Gfxr3_sajeRf07BbI0Wk=.ee7db86d-fd41-4beb-9a68-79812187466e@github.com>

On Tue, 16 Sep 2025 10:44:26 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Do you think this is good enough now after the renaming? To me, the distinction is already quite clear (different argument types and method visibility).
>
> @dlunde It could be helpful to see a small example to see what maps to what if there are multiple views.

Why not move the field down to its explanation? Or move the explanation to the field?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351952889

From epeter at openjdk.org  Tue Sep 16 11:07:57 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 11:07:57 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
Message-ID: <XCQU9EZMg7DCptZ6udk6-QCFBjrn6konix7klz1pM5M=.81e2286a-f9e8-41a6-90fb-0c6a15cd0f90@github.com>

On Tue, 16 Sep 2025 10:52:53 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits:
>> 
>>  - Clarify comments in regmask.hpp
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Address review comments (renaming on the way in a separate PR)
>>  - Update src/hotspot/share/opto/regmask.hpp
>>    
>>    Co-authored-by: Emanuel Peter <emanuel.peter at oracle.com>
>>  - Restore modified java/lang/invoke tests
>>  - Sort includes (new requirement)
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Add clarifying comments at definitions of register mask sizes
>>  - Fix implicit zero and nullptr checks
>>  - Add deep copy comment
>>  - ... and 29 more: https://git.openjdk.org/jdk/compare/60930a3e...c1f41288
>
> src/hotspot/share/opto/regmask.hpp line 267:
> 
>> 265: 
>> 266:   // Where to extend the register mask
>> 267:   Arena* _arena;
> 
> Usually, we try to keep all fields at the top.

Just to keep the overview.

> src/hotspot/share/opto/regmask.hpp line 464:
> 
>> 462:     copy(rm);
>> 463:     return *this;
>> 464:   }
> 
> You could also delete this one, and use the `copy` explicitly at the use site. That would make the allocations a bit more explicit. What do you think?
> Whenever possible, it is nice to be able to declare a type `NONCOPYABLE`. Especially if it does allocations where copy is non-trivial.

You already removed some assignments, like this one, which is good:
https://github.com/openjdk/jdk/pull/20404/files#diff-344e52fd6be79f1d97a33d7ebbf131148df90bb52e3b33952340e8d37a3849d8L1501-R1512

Generally, there are a lot of constructors here. All of them are public, none explicit. Maybe that is just how it has to be, but maybe you can simplify a little.

> src/hotspot/share/opto/regmask.hpp line 659:
> 
>> 657: 
>> 658:   // Fill a register mask with 1's starting from the given register.
>> 659:   void Set_All_From(OptoReg::Name reg) {
> 
> Oh boy, we have a mixture of `lower_case` and `Strange_Case` method names.
> We missed those in the renaming RFE :/

Or is there a particular logic behind it?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351970696
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351760214
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351870907

From epeter at openjdk.org  Tue Sep 16 11:07:56 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 11:07:56 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
Message-ID: <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>

On Tue, 16 Sep 2025 08:52:57 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> If a method has a large number of parameters, we currently bail out from C2 compilation.
>> 
>> ### Changeset
>> 
>> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
>> 
>> Changes:
>> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
>> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
>> - Remove all `can_represent` checks and bailouts.
>> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
>> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
>> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
>> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no...
>
> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits:
> 
>  - Clarify comments in regmask.hpp
>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>  - Address review comments (renaming on the way in a separate PR)
>  - Update src/hotspot/share/opto/regmask.hpp
>    
>    Co-authored-by: Emanuel Peter <emanuel.peter at oracle.com>
>  - Restore modified java/lang/invoke tests
>  - Sort includes (new requirement)
>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>  - Add clarifying comments at definitions of register mask sizes
>  - Fix implicit zero and nullptr checks
>  - Add deep copy comment
>  - ... and 29 more: https://git.openjdk.org/jdk/compare/60930a3e...c1f41288
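The quoted changeset description combines a statically allocated base that covers the common case, extra storage allocated only when a method needs more stack slots, and an early exit in `overlap`. That design can be sketched in isolation as follows. This is a hypothetical standalone model, not `RegMask` itself: `GrowableMask`, the two-word base, and the use of `std::vector` in place of arena memory are assumptions for illustration. The "infinite stack" flag is only loosely modeled on the `_infinite_stack` flag mentioned in `regmask.hpp`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch; not HotSpot's RegMask. std::vector stands in for
// arena-allocated extension words.
class GrowableMask {
 public:
  static constexpr std::size_t kBitsPerWord = 64;
  static constexpr std::size_t kBaseWords = 2;  // inline storage, common case

  explicit GrowableMask(bool infinite_stack = false)
      : _infinite_stack(infinite_stack) {}

  std::size_t size_in_words() const { return kBaseWords + _ext.size(); }

  // Bits past the stored words are implicitly all 1s (infinite stack) or all 0s.
  bool member(std::size_t reg) const {
    return (word(reg / kBitsPerWord) >> (reg % kBitsPerWord)) & 1u;
  }

  void insert(std::size_t reg) {
    std::size_t w = reg / kBitsPerWord;
    grow(w + 1);
    word_ref(w) |= std::uint64_t{1} << (reg % kBitsPerWord);
  }

  // Early exit on the first shared bit keeps the common case cheap.
  bool overlap(const GrowableMask& other) const {
    std::size_t n = std::max(size_in_words(), other.size_in_words());
    for (std::size_t w = 0; w < n; w++) {
      if ((word(w) & other.word(w)) != 0) {
        return true;
      }
    }
    // Past both stored regions, only two infinite stacks can overlap.
    return _infinite_stack && other._infinite_stack;
  }

 private:
  // Allocates only when a register beyond the inline words is needed.
  void grow(std::size_t min_words) {
    if (min_words <= size_in_words()) {
      return;
    }
    // New words must agree with the implicit value they replace.
    std::uint64_t fill = _infinite_stack ? ~std::uint64_t{0} : 0;
    _ext.resize(min_words - kBaseWords, fill);
  }

  std::uint64_t word(std::size_t w) const {
    if (w < kBaseWords) return _base[w];
    if (w < size_in_words()) return _ext[w - kBaseWords];
    return _infinite_stack ? ~std::uint64_t{0} : 0;
  }

  std::uint64_t& word_ref(std::size_t w) {
    assert(w < size_in_words());
    return w < kBaseWords ? _base[w] : _ext[w - kBaseWords];
  }

  std::array<std::uint64_t, kBaseWords> _base{};
  std::vector<std::uint64_t> _ext;
  bool _infinite_stack;
};
```

The property this sketch illustrates: masks that never touch registers past the inline words never allocate. In addition, `grow` fills new words with the implicit value they replace, so membership answers do not change when storage is added.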

A first batch of comments before lunch (made it halfway through `regmask.hpp`)

src/hotspot/share/adlc/formsopt.cpp line 174:

> 172:   // The array of Register Mask bits should be large enough to cover all the
> 173:   // machine registers and usually all parameters that need to be passed on the
> 174:   // stack (stack registers) up to some interesting limit. On Intel, the limit

What do you mean by `usually`? It could be misread as saying that the array is sometimes not large enough to cover even the parameters "up to some interesting limit". Consider rephrasing for clarity ;)

src/hotspot/share/opto/chaitin.cpp line 645:

> 643:     if (C->failing()) {
> 644:       return;
> 645:     }

What can fail here?

src/hotspot/share/opto/chaitin.hpp line 151:

> 149:     _mask_size = _mask.rm_size_in_bits();
> 150:     return _mask.rollover();
> 151:   }

Subjective: I would have kept the one-liner approach consistently here, since that is what the surrounding code does.

src/hotspot/share/opto/ifg.cpp line 732:

> 730: 
> 731:     // Remove bound register(s) from 'l's choices
> 732:     old = interfering_lrg.mask();

Just checking: This is an implicit `copy` case, right?

src/hotspot/share/opto/locknode.cpp line 43:

> 41: 
> 42: BoxLockNode::BoxLockNode(int slot)
> 43:     : Node(Compile::current()->root()), _slot(slot),

Suggestion:

    : Node(Compile::current()->root()),
      _slot(slot),

I would put each initializer on a separate line. Optional.

src/hotspot/share/opto/locknode.cpp line 55:

> 53:   }
> 54:   init_class_id(Class_BoxLock);
> 55:   init_flags(Flag_rematerialize);

Any reason why you moved these after the bailout? Maybe that's fine, but I don't know what the implications might be. Do you?

src/hotspot/share/opto/machnode.hpp line 758:

> 756: public:
> 757:   MachProjNode(Node* multi, uint con, const RegMask& out, uint ideal_reg)
> 758:       : ProjNode(multi, con), _rout(out, Compile::current()->comp_arena()),

Suggestion:

      : ProjNode(multi, con),
        _rout(out, Compile::current()->comp_arena()),

Optional. In my opinion, list either everything horizontally or everything vertically ;)

src/hotspot/share/opto/postaloc.cpp line 681:

> 679:             for (int l = 1; l < n_regs; l++) {
> 680:               OptoReg::Name ureg_lo = OptoReg::add(ureg,-l);
> 681:               bool is_reg = OptoReg::is_reg(ureg_lo);

Only needed in assert. Do you really need to give it a separate name? Subjective, your choice.

Does it have a side-effect?

src/hotspot/share/opto/postaloc.cpp line 685:

> 683:               assert(is_adjacent || is_reg,
> 684:                      "only registers can be non-adjacent");
> 685:               if (!value[ureg_lo] && is_adjacent) { // Nearly always adjacent

`value[ureg_lo]` returns a `Node*`, right? Then that would make this an implicit null check, which is not allowed by the style guide ;)

src/hotspot/share/opto/regmask.hpp line 122:

> 120:     // the machine registers and usually all parameters that need to be passed
> 121:     // on the stack (stack registers) up to some interesting limit. On Intel,
> 122:     // the limit is something like 90+ parameters.

You may say that in the "unusual" case, we have to use `_rm_word_ext`. Just so the reader knows what the ominous "usually" refers to ;)

src/hotspot/share/opto/regmask.hpp line 217:

> 215:   // are included in the register mask. Depending on the value of
> 216:   // _infinite_stack (denoted with as), {s10, s11, ...} are all included (as=1)
> 217:   // or excluded (as=0). Note that all registers/stack locations under _lwm

Do you want to rename `as` now that it does not refer to `all_stack` but `infinite_stack`?

src/hotspot/share/opto/regmask.hpp line 267:

> 265: 
> 266:   // Where to extend the register mask
> 267:   Arena* _arena;

Usually, we try to keep all fields at the top.

src/hotspot/share/opto/regmask.hpp line 270:

> 268: 
> 269:   // Grow the register mask to ensure it can fit at least min_size words.
> 270:   void grow(unsigned int min_size, bool init = true) {

Suggestion:

  void grow(unsigned int min_size, bool initialize_... = true) {

I would spell out what it means. `init` could mean lots of things.

src/hotspot/share/opto/regmask.hpp line 285:

> 283:         assert(_original_ext_address == &_rm_word_ext, "clone sanity check");
> 284:         _rm_word_ext = REALLOC_ARENA_ARRAY(_arena, uintptr_t, _rm_word_ext,
> 285:                                          old_ext_size, new_ext_size);

Suggestion:

                                           old_ext_size, new_ext_size);

src/hotspot/share/opto/regmask.hpp line 450:

> 448:     Insert(reg);
> 449:   }
> 450:   RegMask(OptoReg::Name reg) : RegMask(reg, nullptr) {}

You may want to add `explicit`, so nobody accidentally converts them ;)

src/hotspot/share/opto/regmask.hpp line 458:

> 456:   }
> 457: 
> 458:   RegMask(const RegMask& rm) : RegMask(rm, nullptr) {}

Do you want to add `explicit` here?
This is a shallow copy, right? Maybe add a comment for that.
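The following is a minimal illustration of the `explicit` suggestion using hypothetical stand-in types: `Name` plays the role of `OptoReg::Name`, alongside two toy mask types. A non-explicit single-argument constructor lets a register name silently become a mask wherever a mask is expected. Marking the constructor `explicit` forces that conversion to be written out.

```cpp
#include <type_traits>

// Hypothetical stand-in for a register name.
struct Name {
  int value;
};

struct ImplicitMask {
  ImplicitMask(Name reg) : first(reg.value) {}  // converting constructor
  int first;
};

struct ExplicitMask {
  explicit ExplicitMask(Name reg) : first(reg.value) {}
  int first;
};

// Accepts a Name silently: a temporary ImplicitMask is built behind the scenes.
int first_of(const ImplicitMask& m) { return m.first; }

static_assert(std::is_convertible<Name, ImplicitMask>::value,
              "non-explicit constructors allow implicit conversions");
static_assert(!std::is_convertible<Name, ExplicitMask>::value,
              "explicit constructors must be called by name");
```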

src/hotspot/share/opto/regmask.hpp line 464:

> 462:     copy(rm);
> 463:     return *this;
> 464:   }

You could also delete this one, and use the `copy` explicitly at the use site. That would make the allocations a bit more explicit. What do you think?
Whenever possible, it is nice to be able to declare a type `NONCOPYABLE`. Especially if it does allocations where copy is non-trivial.
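The following illustrates the suggestion, not the patch. A type whose copy constructor and copy assignment are deleted can still offer an explicit `copy` member, so every place that allocates during a copy is spelled out at the call site. `Mask`, `resize_ext`, and `ext_words` are hypothetical names. HotSpot's `NONCOPYABLE` macro is assumed here to achieve the same effect as the deleted declarations written out below.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical mask type that owns separately allocated extension words.
class Mask {
 public:
  Mask() = default;
  // Deleting copy construction and assignment turns every accidental
  // deep copy into a compile error.
  Mask(const Mask&) = delete;
  Mask& operator=(const Mask&) = delete;

  // The one way to copy: explicit and visible at the call site.
  void copy(const Mask& other) { _ext = other._ext; }

  void resize_ext(std::size_t words) { _ext.assign(words, 0); }
  std::size_t ext_words() const { return _ext.size(); }

 private:
  std::vector<std::uint64_t> _ext;
};

static_assert(!std::is_copy_constructible<Mask>::value, "no implicit copies");
static_assert(!std::is_copy_assignable<Mask>::value, "no implicit assignment");
```

With this shape, a line like `old = interfering_lrg.mask();` would no longer compile and would have to become an explicit `old.copy(...)` call.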

src/hotspot/share/opto/regmask.hpp line 659:

> 657: 
> 658:   // Fill a register mask with 1's starting from the given register.
> 659:   void Set_All_From(OptoReg::Name reg) {

Oh boy, we have a mixture of `lower_case` and `Strange_Case` method names.
We missed those in the renaming RFE :/

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-3228765757
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351720167
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351729163
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351797677
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351813653
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351821908
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351831266
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351835412
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351891004
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351885171
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351923440
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351946036
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351970184
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351976938
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351972811
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351741579
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352007574
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351754230
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351864011

From epeter at openjdk.org  Tue Sep 16 11:07:57 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 11:07:57 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <v80_D87mRRMcNpM-sLjT8tncD6gMqa0hKImiryH7bbU=.e9f37739-327b-4261-9efb-697de2322cf0@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
 <_a6JVBA326t8l1U3ZI8C-J3Ju5jm-RklBFGtnR7fbyY=.70638135-7577-44dc-a212-fe5e39b1f5fa@github.com>
 <RF4t87nYZFpJ461_rsT41aOIRigwmG6leniy4j9-QaA=.3e079800-bb8c-4e54-8a60-a060c30b5796@github.com>
 <GdM72hQe1NvODLC6vcGtXrL5GnMA2c6IsRcdVW3z6r8=.740386db-28e4-46b7-a321-2218dfe2d846@github.com>
 <zJ_64S_33_XM0PJXxKU5cVJKeawayMUaWU7E0iBKkKw=.d046b0de-1772-4904-916a-cfca5034f634@github.com>
 <Op0K9v60ajIqvDAbyLxf7vvLtHrSaJgAjCIMzK_6WGE=.0106823b-a76c-44cf-b93b-6ab8ef700d77@github.com>
 <v80_D87mRRMcNpM-sLjT8tncD6gMqa0hKImiryH7bbU=.e9f37739-327b-4261-9efb-697de2322cf0@github.com>
Message-ID: <aNAOEMb-aBnJ9bjkGbxpKB_Wpv3iw4iM0ukP2O3HrQg=.5a080d76-1ec7-44ae-af53-697a955308cf@github.com>

On Tue, 16 Sep 2025 09:05:06 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> Makes sense. Maybe we can make that a bit more clear in the renaming.
>> Maybe we can make a clear distinction between the two mappings somehow?
>
> Do you think this is good enough now after the renaming? To me, the distinction is already quite clear (different argument types and method visibility).

@dlunde It could be helpful to see a small example to see what maps to what if there are multiple views.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2351938661

From dlunden at openjdk.org  Tue Sep 16 11:31:04 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Tue, 16 Sep 2025 11:31:04 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
Message-ID: <hmz592KUGaPyddKKstSM6DwhorHzB359hmnpfUTQMbI=.f263426a-da5a-4938-8d7d-80dc5bcbf201@github.com>

On Tue, 16 Sep 2025 09:45:07 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits:
>> 
>>  - Clarify comments in regmask.hpp
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Address review comments (renaming on the way in a separate PR)
>>  - Update src/hotspot/share/opto/regmask.hpp
>>    
>>    Co-authored-by: Emanuel Peter <emanuel.peter at oracle.com>
>>  - Restore modified java/lang/invoke tests
>>  - Sort includes (new requirement)
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Add clarifying comments at definitions of register mask sizes
>>  - Fix implicit zero and nullptr checks
>>  - Add deep copy comment
>>  - ... and 29 more: https://git.openjdk.org/jdk/compare/60930a3e...c1f41288
>
> src/hotspot/share/adlc/formsopt.cpp line 174:
> 
>> 172:   // The array of Register Mask bits should be large enough to cover all the
>> 173:   // machine registers and usually all parameters that need to be passed on the
>> 174:   // stack (stack registers) up to some interesting limit. On Intel, the limit
> 
> What do you mean by `usually`? It could be misread as saying that the array is sometimes not large enough to cover even the parameters "up to some interesting limit". Consider rephrasing for clarity ;)

Sure, I'll rephrase it

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352096087

From dlunden at openjdk.org  Tue Sep 16 11:34:09 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Tue, 16 Sep 2025 11:34:09 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
Message-ID: <pvmNfa0T_R14138XjxaFlWU1SePlnfzL2ZCU4JT2Dvc=.5ad6258f-08ce-40ed-bfc7-5e77c35d45d1@github.com>

On Tue, 16 Sep 2025 09:48:15 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits:
>> 
>>  - Clarify comments in regmask.hpp
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Address review comments (renaming on the way in a separate PR)
>>  - Update src/hotspot/share/opto/regmask.hpp
>>    
>>    Co-authored-by: Emanuel Peter <emanuel.peter at oracle.com>
>>  - Restore modified java/lang/invoke tests
>>  - Sort includes (new requirement)
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Add clarifying comments at definitions of register mask sizes
>>  - Fix implicit zero and nullptr checks
>>  - Add deep copy comment
>>  - ... and 29 more: https://git.openjdk.org/jdk/compare/60930a3e...c1f41288
>
> src/hotspot/share/opto/chaitin.cpp line 645:
> 
>> 643:     if (C->failing()) {
>> 644:       return;
>> 645:     }
> 
> What can fail here?

This bailout, added in this changeset: https://github.com/openjdk/jdk/blob/c1f41288c7f75b5abd6055fbc032cf4447532548/src/hotspot/share/opto/chaitin.cpp#L1664-L1672

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352106256

From rcastanedalo at openjdk.org  Tue Sep 16 11:52:03 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 16 Sep 2025 11:52:03 GMT
Subject: RFR: 8367728: IGV: dump node address type
In-Reply-To: <v6V_XIM3EAcmlkVEhYtkCjzc-_cu9k0hzPIfa2Y6tPw=.a16c6a03-b719-4bc7-852e-3eef2b5b80d5@github.com>
References: <M9c90F3oUYUfELtyNkyZ4TtBXaYA9U1OW-DaQGCq1hs=.61ac61b7-4004-4fc2-8658-7a3f7daa3d7b@github.com>
 <v6V_XIM3EAcmlkVEhYtkCjzc-_cu9k0hzPIfa2Y6tPw=.a16c6a03-b719-4bc7-852e-3eef2b5b80d5@github.com>
Message-ID: <6CdmgzpYWNBs__UtcZADKa23joZZS2ELePE8tkGCwAQ=.7cc5b8fe-c080-4c11-a3a8-1d8db6482633@github.com>

On Tue, 16 Sep 2025 10:38:13 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> This changeset dumps the address type of each node (`Node::adr_type()`), when not null, into the IGV graphs. This should improve the visibility and diagnosability of C2 type inconsistencies, see e.g. [JDK-8367667](https://bugs.openjdk.org/browse/JDK-8367667).
>> 
>> #### Testing
>> - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode).
>> - Tested IGV manually on a few selected graphs. Tested automatically that dumping thousands of graphs does not trigger any assertion failure (by running `java -Xcomp -XX:PrintIdealGraphLevel=1`).
>
> src/hotspot/share/opto/idealGraphPrinter.cpp line 452:
> 
>> 450:     }
>> 451:     if (n->adr_type() != nullptr) {
>> 452:       stringStream adr_type_stream;
> 
> Other `stringStream`s around here use a preallocated buffer. Would it be a good idea here too?

Thanks for bringing this up. I did not use the pre-allocated buffer, for simplicity, which I think is more important than efficiency in this code, as long as the efficiency is not bad enough to become a usability problem. We should probably investigate (separately) simplifying all other uses of `stringStream` in the IGV dumping logic.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27310#discussion_r2352179908

From dlunden at openjdk.org  Tue Sep 16 11:55:32 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Tue, 16 Sep 2025 11:55:32 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
Message-ID: <Iy6aK7Bc7f8007DiFTY8UziPapdMfq2NFFLe-mUVpqo=.61820a83-0d45-41e1-9b28-ca7dabc375ef@github.com>

On Tue, 16 Sep 2025 09:52:21 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits:
>> 
>>  - Clarify comments in regmask.hpp
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Address review comments (renaming on the way in a separate PR)
>>  - Update src/hotspot/share/opto/regmask.hpp
>>    
>>    Co-authored-by: Emanuel Peter <emanuel.peter at oracle.com>
>>  - Restore modified java/lang/invoke tests
>>  - Sort includes (new requirement)
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Add clarifying comments at definitions of register mask sizes
>>  - Fix implicit zero and nullptr checks
>>  - Add deep copy comment
>>  - ... and 29 more: https://git.openjdk.org/jdk/compare/60930a3e...c1f41288
>
> src/hotspot/share/opto/regmask.hpp line 450:
> 
>> 448:     Insert(reg);
>> 449:   }
>> 450:   RegMask(OptoReg::Name reg) : RegMask(reg, nullptr) {}
> 
> You may want to add `explicit`, so nobody accidentally converts them ;)

Thanks, good point
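To illustrate the `explicit` suggestion, here is a minimal sketch with two hypothetical toy mask types. `ImplicitMask`, `ExplicitMask`, and the `RegName` alias are invented stand-ins, not HotSpot's `RegMask` or `OptoReg::Name`. With a non-explicit single-argument constructor, a bare register number silently converts into a temporary mask wherever a mask is expected; marking the constructor `explicit` turns that into a compile error.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for OptoReg::Name (assumes register numbers < 64).
using RegName = int;

// Without explicit: any RegName silently converts into a mask.
struct ImplicitMask {
  uint64_t bits = 0;
  ImplicitMask(RegName reg) { bits |= uint64_t(1) << reg; }
};

// With explicit: the conversion must be spelled out at the call site.
struct ExplicitMask {
  uint64_t bits = 0;
  explicit ExplicitMask(RegName reg) { bits |= uint64_t(1) << reg; }
};

// Accepts a mask; calling it with a plain int compiles for ImplicitMask only.
inline bool is_empty(const ImplicitMask& m) { return m.bits == 0; }

static_assert(std::is_convertible<RegName, ImplicitMask>::value,
              "implicit conversion is allowed");
static_assert(!std::is_convertible<RegName, ExplicitMask>::value,
              "explicit blocks the accidental conversion");
```

A call such as `is_empty(3)` compiles only because a temporary `ImplicitMask` is built from `3`; the equivalent call against `ExplicitMask` would be rejected.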

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352183022

From dlunden at openjdk.org  Tue Sep 16 11:55:36 2025
From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=)
Date: Tue, 16 Sep 2025 11:55:36 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <XCQU9EZMg7DCptZ6udk6-QCFBjrn6konix7klz1pM5M=.81e2286a-f9e8-41a6-90fb-0c6a15cd0f90@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
 <XCQU9EZMg7DCptZ6udk6-QCFBjrn6konix7klz1pM5M=.81e2286a-f9e8-41a6-90fb-0c6a15cd0f90@github.com>
Message-ID: <-NoNYKu9VID7gHzEAObPa7adchdpdL5CaNLclPBERVI=.3c469bcd-2497-4285-89f1-d62e5fdaf3d3@github.com>

On Tue, 16 Sep 2025 09:58:11 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/regmask.hpp line 464:
>> 
>>> 462:     copy(rm);
>>> 463:     return *this;
>>> 464:   }
>> 
>> You could also delete this one, and use the `copy` explicitly at the use site. That would make the allocations a bit more explicit. What do you think?
>> Whenever possible, it is nice to be able to declare a type `NONCOPYABLE`. Especially if it does allocations where copy is non-trivial.
>
> You already removed some assignments, like this one, which is good:
> https://github.com/openjdk/jdk/pull/20404/files#diff-344e52fd6be79f1d97a33d7ebbf131148df90bb52e3b33952340e8d37a3849d8L1501-R1512
> 
> Generally, there are a lot of constructors here. All of them are public, none explicit. Maybe that is just how it has to be, but maybe you can simplify a little.

I agree with you in principle, but the copy constructor and assignment operator are heavily used by the ADLC-generated code. I prefer not touching it, at least in this PR.
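The alternative suggested here, a non-copyable type whose copies go through an explicit method, could look roughly like the sketch below. The `OwnedMask` class and its `copy_from` method are invented for illustration, and the copy operations are deleted with plain `= delete` in the spirit of HotSpot's `NONCOPYABLE` rather than by using that macro. As noted in the reply, the real `RegMask` keeps its copy constructor and assignment operator because ADLC-generated code relies on them.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <type_traits>

// Hypothetical mask owning a heap buffer; names are illustrative only.
class OwnedMask {
 public:
  explicit OwnedMask(size_t words) : _words(words), _bits(new uint64_t[words]()) {}

  // Non-copyable: implicit copies (and their hidden allocations) cannot sneak in.
  OwnedMask(const OwnedMask&) = delete;
  OwnedMask& operator=(const OwnedMask&) = delete;

  // The only way to duplicate contents, visible at every use site.
  void copy_from(const OwnedMask& other) {
    if (_words < other._words) {
      _bits.reset(new uint64_t[other._words]());  // the allocation is explicit here
      _words = other._words;
    }
    for (size_t i = 0; i < _words; i++) {
      _bits[i] = i < other._words ? other._bits[i] : 0;  // zero-fill any tail
    }
  }

  // Assumes bit < 64 * number of words.
  void set(size_t bit) { _bits[bit / 64] |= uint64_t(1) << (bit % 64); }
  bool test(size_t bit) const {
    return bit / 64 < _words && ((_bits[bit / 64] >> (bit % 64)) & 1) != 0;
  }

 private:
  size_t _words;
  std::unique_ptr<uint64_t[]> _bits;
};

static_assert(!std::is_copy_constructible<OwnedMask>::value, "no implicit copies");
static_assert(!std::is_copy_assignable<OwnedMask>::value, "no implicit assignment");
```

The trade-off is exactly the one raised in the reply: every existing `a = b` would have to be rewritten as `a.copy_from(b)`, which is a large change when much of that code is generated.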

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352191833

From mchevalier at openjdk.org  Tue Sep 16 12:00:38 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 16 Sep 2025 12:00:38 GMT
Subject: RFR: 8367728: IGV: dump node address type
In-Reply-To: <6CdmgzpYWNBs__UtcZADKa23joZZS2ELePE8tkGCwAQ=.7cc5b8fe-c080-4c11-a3a8-1d8db6482633@github.com>
References: <M9c90F3oUYUfELtyNkyZ4TtBXaYA9U1OW-DaQGCq1hs=.61ac61b7-4004-4fc2-8658-7a3f7daa3d7b@github.com>
 <v6V_XIM3EAcmlkVEhYtkCjzc-_cu9k0hzPIfa2Y6tPw=.a16c6a03-b719-4bc7-852e-3eef2b5b80d5@github.com>
 <6CdmgzpYWNBs__UtcZADKa23joZZS2ELePE8tkGCwAQ=.7cc5b8fe-c080-4c11-a3a8-1d8db6482633@github.com>
Message-ID: <rwWvPIZw5EdFSyo-G0qlOv-2pgS88bMbC9P-VoBj2A0=.a11fc97b-ad9d-4717-8491-5afda94c1e8b@github.com>

On Tue, 16 Sep 2025 11:49:11 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> src/hotspot/share/opto/idealGraphPrinter.cpp line 452:
>> 
>>> 450:     }
>>> 451:     if (n->adr_type() != nullptr) {
>>> 452:       stringStream adr_type_stream;
>> 
>> Other `stringStream`s around here use a preallocated buffer. Would it be a good idea here too?
>
> Thanks for bringing this up. I did not use the pre-allocated buffer for simplicity, which I think is more important than efficiency in this code, as long as the efficiency is not bad enough to turn into a usability problem. We should probably investigate (separately) simplifying all other uses of `stringStream` in the IGV dumping logic.

I think that makes sense. Thanks.

No strong opinion whether we should change what is already there, as long as we don't add more.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27310#discussion_r2352218330

From dlunden at openjdk.org  Tue Sep 16 12:07:29 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Tue, 16 Sep 2025 12:07:29 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
Message-ID: <bpnIb7cFZgSV9QSSa86a5Xs8kFHMnOCzmBZ4mUbRP8I=.aa8eef13-4d0b-4e07-a6fb-0e4f58a2d46f@github.com>

On Tue, 16 Sep 2025 10:07:52 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits:
>> 
>>  - Clarify comments in regmask.hpp
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Address review comments (renaming on the way in a separate PR)
>>  - Update src/hotspot/share/opto/regmask.hpp
>>    
>>    Co-authored-by: Emanuel Peter <emanuel.peter at oracle.com>
>>  - Restore modified java/lang/invoke tests
>>  - Sort includes (new requirement)
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Add clarifying comments at definitions of register mask sizes
>>  - Fix implicit zero and nullptr checks
>>  - Add deep copy comment
>>  - ... and 29 more: https://git.openjdk.org/jdk/compare/60930a3e...c1f41288
>
> src/hotspot/share/opto/chaitin.hpp line 151:
> 
>> 149:     _mask_size = _mask.rm_size_in_bits();
>> 150:     return _mask.rollover();
>> 151:   }
> 
> Subjective: I would have kept the one-liner approach consistently here, since that is what surrounding code does.

Fair enough, I'll update!

> src/hotspot/share/opto/ifg.cpp line 732:
> 
>> 730: 
>> 731:     // Remove bound register(s) from 'l's choices
>> 732:     old = interfering_lrg.mask();
> 
> Just checking: This is an implicit `copy` case, right?

Right, it is equivalent to making `copy` public and calling it directly. However, see my other comment for why I think we should (for now) keep `operator=` around.

> src/hotspot/share/opto/locknode.cpp line 43:
> 
>> 41: 
>> 42: BoxLockNode::BoxLockNode(int slot)
>> 43:     : Node(Compile::current()->root()), _slot(slot),
> 
> Suggestion:
> 
>     : Node(Compile::current()->root()),
>       _slot(slot),
> 
> I would put all on separate lines. Optional.

Sure, I'll update

> src/hotspot/share/opto/locknode.cpp line 55:
> 
>> 53:   }
>> 54:   init_class_id(Class_BoxLock);
>> 55:   init_flags(Flag_rematerialize);
> 
> Any reason why you moved these after the bailout? Maybe that's fine, but I don't know what the implications might be. Do you?

No reason that I can remember. I'll move them before the bailout!

> src/hotspot/share/opto/machnode.hpp line 758:
> 
>> 756: public:
>> 757:   MachProjNode(Node* multi, uint con, const RegMask& out, uint ideal_reg)
>> 758:       : ProjNode(multi, con), _rout(out, Compile::current()->comp_arena()),
> 
> Suggestion:
> 
>       : ProjNode(multi, con),
>         _rout(out, Compile::current()->comp_arena()),
> 
> Optional. Either list horizontally or vertically is my opinion ;)

Sure, updated

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352218757
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352225778
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352228859
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352236989
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352240162

From dlunden at openjdk.org  Tue Sep 16 12:07:32 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Tue, 16 Sep 2025 12:07:32 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <XCQU9EZMg7DCptZ6udk6-QCFBjrn6konix7klz1pM5M=.81e2286a-f9e8-41a6-90fb-0c6a15cd0f90@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
 <XCQU9EZMg7DCptZ6udk6-QCFBjrn6konix7klz1pM5M=.81e2286a-f9e8-41a6-90fb-0c6a15cd0f90@github.com>
Message-ID: <UVgvv29Q9y6-Mwvx4yWZ_urMAMoCAYmrtM_8t-tKEcw=.b8148eb9-c820-40ff-9c40-ca1b3923099f@github.com>

On Tue, 16 Sep 2025 10:26:07 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/regmask.hpp line 659:
>> 
>>> 657: 
>>> 658:   // Fill a register mask with 1's starting from the given register.
>>> 659:   void Set_All_From(OptoReg::Name reg) {
>> 
>> Oh boy, we have a mixture of `lower_case` and `Strange_Case` method names.
>> We missed those in the renaming RFE :/
>
> Or is there a particular logic behind it?

No other logic than keeping the same style as the surrounding old code. I can update it to use up-to-date style, but then we increase the scope of this PR. Is a follow-up PR OK?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352247193

From dlunden at openjdk.org  Tue Sep 16 12:11:51 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Tue, 16 Sep 2025 12:11:51 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
Message-ID: <zjaKBlp2oBBNRp9pjJZphn3qnqeS0J8kxqoIgEkrMMc=.fe7a7c02-f146-4e92-a89f-2355c3d32160@github.com>

On Tue, 16 Sep 2025 10:29:58 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits:
>> 
>>  - Clarify comments in regmask.hpp
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Address review comments (renaming on the way in a separate PR)
>>  - Update src/hotspot/share/opto/regmask.hpp
>>    
>>    Co-authored-by: Emanuel Peter <emanuel.peter at oracle.com>
>>  - Restore modified java/lang/invoke tests
>>  - Sort includes (new requirement)
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Add clarifying comments at definitions of register mask sizes
>>  - Fix implicit zero and nullptr checks
>>  - Add deep copy comment
>>  - ... and 29 more: https://git.openjdk.org/jdk/compare/60930a3e...c1f41288
>
> src/hotspot/share/opto/postaloc.cpp line 685:
> 
>> 683:               assert(is_adjacent || is_reg,
>> 684:                      "only registers can be non-adjacent");
>> 685:               if (!value[ureg_lo] && is_adjacent) { // Nearly always adjacent
> 
> `value[ureg_lo]` returns a `Node*`, right? Then that would make this an implicit null check, not allowed by style guide ;)

Here I'll argue for not touching this in this PR (I did not introduce it), as this is the style of the surrounding code. It should be addressed in a follow-up PR, though.
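For reference, the two null-check forms discussed here differ only in style. In this sketch `Node`, the `value` array, and both helper names are hypothetical. `!value[reg]` relies on the implicit pointer-to-`bool` conversion that the reviewer flags, while the explicit comparison against `nullptr` states the same condition directly.

```cpp
#include <cstddef>

struct Node {};  // hypothetical stand-in for C2's Node

// Style-guide form: compare the pointer against nullptr explicitly.
bool slot_is_free_explicit(Node* const* value, size_t reg) {
  return value[reg] == nullptr;
}

// Same result, but relies on an implicit pointer-to-bool conversion.
bool slot_is_free_implicit(Node* const* value, size_t reg) {
  return !value[reg];
}
```

Both functions return the same value for every input, so a follow-up cleanup of this kind is purely cosmetic.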

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352261538

From dlunden at openjdk.org  Tue Sep 16 12:17:27 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Tue, 16 Sep 2025 12:17:27 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
Message-ID: <XXX-ijRjFpOI9z8YGj4dl5G2l2-jaXwU0mXqzmeCAww=.7bc57a52-e9c4-4e6f-96a6-46d222324a0b@github.com>

On Tue, 16 Sep 2025 10:31:33 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits:
>> 
>>  - Clarify comments in regmask.hpp
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Address review comments (renaming on the way in a separate PR)
>>  - Update src/hotspot/share/opto/regmask.hpp
>>    
>>    Co-authored-by: Emanuel Peter <emanuel.peter at oracle.com>
>>  - Restore modified java/lang/invoke tests
>>  - Sort includes (new requirement)
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Add clarifying comments at definitions of register mask sizes
>>  - Fix implicit zero and nullptr checks
>>  - Add deep copy comment
>>  - ... and 29 more: https://git.openjdk.org/jdk/compare/60930a3e...c1f41288
>
> src/hotspot/share/opto/postaloc.cpp line 681:
> 
>> 679:             for (int l = 1; l < n_regs; l++) {
>> 680:               OptoReg::Name ureg_lo = OptoReg::add(ureg,-l);
>> 681:               bool is_reg = OptoReg::is_reg(ureg_lo);
> 
> Only needed in assert. Do you really need to give it a separate name? Subjective, your choice.
> 
> Does it have a side-effect?

Giving it a name is only for clarity, mirroring the style of `is_adjacent` in the `assert`. I'll inline it, no problem. No side-effect.

> src/hotspot/share/opto/regmask.hpp line 122:
> 
>> 120:     // the machine registers and usually all parameters that need to be passed
>> 121:     // on the stack (stack registers) up to some interesting limit. On Intel,
>> 122:     // the limit is something like 90+ parameters.
> 
> You may say that in the "unusual" case, we have to use `_rm_word_ext`. Just so the reader knows what the ominous "usually" refers to ;)

Yes, thanks. I'll update this comment to reflect the new register mask features.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352277426
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352285497

From dlunden at openjdk.org  Tue Sep 16 12:25:02 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Tue, 16 Sep 2025 12:25:02 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v24]
In-Reply-To: <8EDUG032a2-wepy1MeWd6n3Gfxr3_sajeRf07BbI0Wk=.ee7db86d-fd41-4beb-9a68-79812187466e@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <Y6xwbHxXLhE6LWHiUbrEERiMKMu3ac_QKOYqgfvNrf8=.6f0f8ff4-6cee-475e-92a7-4dd2d4eb77ee@github.com>
 <GOyanc2ZP9eYLO0HtzP3JtijiGG-mmNhsY9w5Szocuc=.a903ec8a-ab26-4f74-a8c4-f9ec87e92cc9@github.com>
 <_a6JVBA326t8l1U3ZI8C-J3Ju5jm-RklBFGtnR7fbyY=.70638135-7577-44dc-a212-fe5e39b1f5fa@github.com>
 <RF4t87nYZFpJ461_rsT41aOIRigwmG6leniy4j9-QaA=.3e079800-bb8c-4e54-8a60-a060c30b5796@github.com>
 <GdM72hQe1NvODLC6vcGtXrL5GnMA2c6IsRcdVW3z6r8=.740386db-28e4-46b7-a321-2218dfe2d846@github.com>
 <zJ_64S_33_XM0PJXxKU5cVJKeawayMUaWU7E0iBKkKw=.d046b0de-1772-4904-916a-cfca5034f634@github.com>
 <Op0K9v60ajIqvDAbyLxf7vvLtHrSaJgAjCIMzK_6WGE=.0106823b-a76c-44cf-b93b-6ab8ef700d77@github.com>
 <v80_D87mRRMcNpM-sLjT8tncD6gMqa0hKImiryH7bbU=.e9f37739-327b-4261-9efb-697de2322cf0@github.com>
 <aNAOEMb-aBnJ9bjkGbxpKB_Wpv3iw4iM0ukP2O3HrQg=.5a080d76-1ec7-44ae-af53-697a955308cf@github.com>
 <8EDUG032a2-wepy1MeWd6n3Gfxr3_sajeRf07BbI0Wk=.ee7db86d-fd41-4beb-9a68-79812187466e@github.com>
Message-ID: <UrYRVt_WwGj3HZiCSLYYaDqzpbzBd5A6gWVTkdveyx8=.93094bd4-6861-47b2-8e3e-85624c4a3fa8@github.com>

On Tue, 16 Sep 2025 10:48:14 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> @dlunde It could be helpful to see a small example to see what maps to what if there are multiple views.
>
> Why not move the field down to its explanation? Or move the explanation to the field?

I think my comment about multiple views was misleading, rephrased it a bit now. I also moved the field down to just before the example illustrating it, good suggestion.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352318746

From dlunden at openjdk.org  Tue Sep 16 12:25:04 2025
From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=)
Date: Tue, 16 Sep 2025 12:25:04 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <XCQU9EZMg7DCptZ6udk6-QCFBjrn6konix7klz1pM5M=.81e2286a-f9e8-41a6-90fb-0c6a15cd0f90@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
 <XCQU9EZMg7DCptZ6udk6-QCFBjrn6konix7klz1pM5M=.81e2286a-f9e8-41a6-90fb-0c6a15cd0f90@github.com>
Message-ID: <U2xyEgHYWJi1eLOxiAMPXaiQ3LY_DvGGN-Kcvx-uW6M=.310903b2-339b-4b61-af57-7521f84d3804@github.com>

On Tue, 16 Sep 2025 10:53:04 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/regmask.hpp line 267:
>> 
>>> 265: 
>>> 266:   // Where to extend the register mask
>>> 267:   Arena* _arena;
>> 
>> Usually, we try to keep all fields at the top.
>
> Just to keep the overview.

Sure, moved next to `_rm_word_ext` now.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352323443

From dlunden at openjdk.org  Tue Sep 16 12:30:06 2025
From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=)
Date: Tue, 16 Sep 2025 12:30:06 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
Message-ID: <CgDqueiQ2qerbQb_2ZSx6gLYbcZfeZCAMEydhpCaNPg=.7e987b4c-7fe2-406f-9f91-1b630bbdab7a@github.com>

On Tue, 16 Sep 2025 11:03:28 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits:
>> 
>>  - Clarify comments in regmask.hpp
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Address review comments (renaming on the way in a separate PR)
>>  - Update src/hotspot/share/opto/regmask.hpp
>>    
>>    Co-authored-by: Emanuel Peter <emanuel.peter at oracle.com>
>>  - Restore modified java/lang/invoke tests
>>  - Sort includes (new requirement)
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Add clarifying comments at definitions of register mask sizes
>>  - Fix implicit zero and nullptr checks
>>  - Add deep copy comment
>>  - ... and 29 more: https://git.openjdk.org/jdk/compare/60930a3e...c1f41288
>
> src/hotspot/share/opto/regmask.hpp line 458:
> 
>> 456:   }
>> 457: 
>> 458:   RegMask(const RegMask& rm) : RegMask(rm, nullptr) {}
> 
> Do you want to add `explicit` here?
> This is a shallow copy, right? Maybe add a comment for that.

The ADLC-generated code relies on using the constructor implicitly, so I prefer not touching it in this changeset at least. All the copies are deep, clarified now.
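The deep-copy point can be sketched with a toy mask that keeps one inline word and grows into arena-allocated extension words, loosely mirroring the design discussed in this thread. Everything here (`ToyArena`, `ExtMask`, the single inline word) is an assumption made for illustration, not the actual `RegMask` layout. The copy constructor allocates fresh extension storage, so writes to a copy never leak into the original; a shallow copy of the extension pointer would get exactly this wrong.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a HotSpot Arena: owns every block it hands out.
class ToyArena {
 public:
  uint64_t* alloc_words(size_t n) {
    _blocks.emplace_back(new uint64_t[n]());  // zero-initialized
    return _blocks.back().get();
  }
 private:
  std::vector<std::unique_ptr<uint64_t[]>> _blocks;
};

// One inline word (registers 0..63) plus optional arena-allocated extension words.
struct ExtMask {
  uint64_t base = 0;
  uint64_t* ext = nullptr;  // extension words, owned by the arena
  size_t ext_words = 0;
  ToyArena* arena;

  explicit ExtMask(ToyArena* a) : arena(a) {}

  // Deep copy: the extension is duplicated into fresh arena memory.
  ExtMask(const ExtMask& other) : base(other.base), arena(other.arena) {
    if (other.ext != nullptr) {
      ext = arena->alloc_words(other.ext_words);
      ext_words = other.ext_words;
      for (size_t i = 0; i < ext_words; i++) ext[i] = other.ext[i];
    }
  }
  // Omitted from this sketch; a real version would deep-copy here as well.
  ExtMask& operator=(const ExtMask&) = delete;

  void set(size_t bit) {
    if (bit < 64) { base |= uint64_t(1) << bit; return; }
    size_t word = bit / 64 - 1;  // index into the extension
    if (word >= ext_words) grow(word + 1);
    ext[word] |= uint64_t(1) << (bit % 64);
  }

  bool test(size_t bit) const {
    if (bit < 64) return ((base >> bit) & 1) != 0;
    size_t word = bit / 64 - 1;
    return word < ext_words && ((ext[word] >> (bit % 64)) & 1) != 0;
  }

 private:
  // Grow into a new arena block; the old block is simply abandoned.
  void grow(size_t words) {
    uint64_t* fresh = arena->alloc_words(words);
    for (size_t i = 0; i < ext_words; i++) fresh[i] = ext[i];
    ext = fresh;
    ext_words = words;
  }
};
```

Because the arena owns every block, `grow` can abandon the old extension instead of freeing it, which is the usual trade-off with arena allocation.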

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352336557

From hgreule at openjdk.org  Tue Sep 16 12:38:33 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Tue, 16 Sep 2025 12:38:33 GMT
Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v9]
In-Reply-To: <1ZCEMsPvSQaLGWRuNtO89LNP_XUeaz-edeIUrKwRCZY=.9dad5a02-c739-4e24-8692-8941f31e5a49@github.com>
References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
 <1ZCEMsPvSQaLGWRuNtO89LNP_XUeaz-edeIUrKwRCZY=.9dad5a02-c739-4e24-8692-8941f31e5a49@github.com>
Message-ID: <HHsZ3o64JQf9QrTCxdpjnY-E9S-CnqbXZsjpvjjWVhM=.52a948dd-9015-4ca0-879d-38ce2d979bfe@github.com>

On Sun, 14 Sep 2025 14:44:02 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> This change improves the precision of the `Mod(I|L)Node::Value()` functions.
>> 
>> I reordered the structure a bit. First, we handle constants, afterwards, we handle ranges. The bottom checks seem to be excessive (`Type::BOTTOM` is covered by using `isa_(int|long)()`, the local bottom is just the full range). Given we can even give reasonable bounds if only one input has any bounds, we don't want to return early.
>> The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions.
>> 
>> ### Monotonicity
>> 
>> Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. We therefore use `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range).
>> 
>> ### Testing
>> 
>> I added tests for cases around the relevant bounds. I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something).
>> 
>> Please review and let me know what you think.
>> 
>> ### Other
>> 
>> The `UMod(I|L)Node`s were adjusted to be more in line with their signed variants. This change diverges them again, but similar improvements could be made after #17508.
>> 
>> During experimenting with these changes, I stumbled upon a few things that aren't directly related to this change, but might be worth to further look into:
>> - If the divisor is a constant, we will directly replace the `Mod(I|L)Node` with more, but less expensive, nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, which means we miss potential cases where this would help, e.g., removing range checks. Would it make sense to delay the replacement?
>> - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd.
>
> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
> 
>   remove unused parameter

Thanks everyone for the patience and the reviews :)
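The range reasoning in the description can be illustrated with a simplified, hypothetical helper for Java's `int` remainder. It relies on two facts about `x % y` in Java: the result is zero or has the sign of `x`, and its magnitude is below `|y|` and at most `|x|`. The sketch only covers the range case (no exact constant folding, no `long` variant, no `Type::BOTTOM` handling), uses an invented `IntRange` struct instead of `TypeInt`, and follows the description in returning the zero range when the divisor can only be 0. It is not the code that was integrated.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical closed range [lo, hi]; not C2's TypeInt.
struct IntRange { int32_t lo; int32_t hi; };

static int64_t abs64(int64_t v) { return v < 0 ? -v : v; }

// Conservative bounds for Java's x % y given ranges for x and y.
// Computed in int64_t so that |INT32_MIN| does not overflow.
IntRange mod_range(IntRange x, IntRange y) {
  int64_t max_abs_divisor = std::max(abs64(y.lo), abs64(y.hi));
  if (max_abs_divisor == 0) {
    return {0, 0};  // divisor is always 0: mirror the description's choice of ZERO
  }
  int64_t bound = max_abs_divisor - 1;  // |x % y| <= |y| - 1
  // The result can be negative only if x can be, and never below x itself.
  int64_t lo = std::max<int64_t>(-bound, std::min<int64_t>(0, x.lo));
  // The result can be positive only if x can be, and never above x itself.
  int64_t hi = std::min<int64_t>(bound, std::max<int64_t>(0, x.hi));
  return {static_cast<int32_t>(lo), static_cast<int32_t>(hi)};
}
```

For example, `mod_range({-10, 10}, {3, 3})` gives `-2..2`, and `mod_range({5, 7}, {3, 3})` gives `0..2`, because a non-negative dividend can never produce a negative remainder.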

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25254#issuecomment-3298520392

From hgreule at openjdk.org  Tue Sep 16 12:38:35 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Tue, 16 Sep 2025 12:38:35 GMT
Subject: Integrated: 8356813: Improve Mod(I|L)Node::Value
In-Reply-To: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
Message-ID: <8KSnYgDRvdBlvE0hx2hmWRZaKZ9_XfLHMqqKYFDFRmU=.fd29e8aa-2511-4f96-8976-2c3bcf6c2450@github.com>

On Thu, 15 May 2025 15:13:18 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

> This change improves the precision of the `Mod(I|L)Node::Value()` functions.
> 
> I reordered the structure a bit. First, we handle constants, afterwards, we handle ranges. The bottom checks seem to be excessive (`Type::BOTTOM` is covered by using `isa_(int|long)()`, the local bottom is just the full range). Given we can even give reasonable bounds if only one input has any bounds, we don't want to return early.
> The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions.
> 
> ### Monotonicity
> 
> Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. We therefore use `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range).
> 
> ### Testing
> 
> I added tests for cases around the relevant bounds. I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something).
> 
> Please review and let me know what you think.
> 
> ### Other
> 
> The `UMod(I|L)Node`s were adjusted to be more in line with their signed variants. This change diverges them again, but similar improvements could be made after #17508.
> 
> During experimenting with these changes, I stumbled upon a few things that aren't directly related to this change, but might be worth to further look into:
> - If the divisor is a constant, we will directly replace the `Mod(I|L)Node` with more, but less expensive, nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, which means we miss potential cases where this would help, e.g., removing range checks. Would it make sense to delay the replacement?
> - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd.
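The monotonicity argument can be checked with a few lines of arithmetic. This sketch models an int type as a plain closed range (a hypothetical `Range` struct that ignores C2's widening and other `TypeInt` details) and treats the meet as the smallest range covering both inputs, as in the description's example. A CCP update is acceptable only if the new type already covers the old one, i.e. the meet of old and new equals new.

```cpp
#include <algorithm>
#include <climits>

// Hypothetical closed int range; not C2's TypeInt.
struct Range { int lo; int hi; };

// Meet of two ranges: the smallest range covering both.
Range meet(Range a, Range b) { return {std::min(a.lo, b.lo), std::max(a.hi, b.hi)}; }

// During CCP a type may only move down the lattice: meet(old, new) must equal new.
bool is_monotone_step(Range before, Range after) {
  Range m = meet(before, after);
  return m.lo == after.lo && m.hi == after.hi;
}

const Range kPos = {0, INT_MAX};  // old result for a 0 divisor (>= 0)
const Range kZero = {0, 0};       // new result for a 0 divisor
const Range kModThree = {-2, 2};  // result once the divisor is known to be 3
```

With these definitions, `meet(kPos, kModThree)` is `{-2, INT_MAX}` (the `>=-2` from the description), so moving from `>=0` to `-2..2` is rejected, while moving from `0..0` to `-2..2` is accepted.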

This pull request has now been integrated.

Changeset: c7f014ed
Author:    Hannes Greule <hgreule at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/c7f014ed494409cdf9fc925fe98de08346606408
Stats:     695 lines in 3 files changed: 630 ins; 50 del; 15 mod

8356813: Improve Mod(I|L)Node::Value

Reviewed-by: epeter, qamai

-------------

PR: https://git.openjdk.org/jdk/pull/25254

From dlunden at openjdk.org  Tue Sep 16 12:39:13 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Tue, 16 Sep 2025 12:39:13 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v28]
In-Reply-To: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
Message-ID: <qXc7TNLuk8MPsKlfWYIACWBcbVtwMBG8DgAFKOhpgoM=.13ff2040-0288-4c91-80ff-3af867e0b14b@github.com>

> If a method has a large number of parameters, we currently bail out from C2 compilation.
> 
> ### Changeset
> 
> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
> 
> Changes:
> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
> - Remove all `can_represent` checks and bailouts.
> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
> 
> ### Testing
> 
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it).
> 
> ![c2-regression](https:/...
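The description does not spell out the early-exit change to `RegMask::overlap`, so the following is only a generic sketch of how an overlap test over word-based masks can stay cheap: it scans just the words where both masks may have bits set and returns at the first shared bit. `WordMask` and its `lwm`/`hwm` scan bounds are hypothetical names chosen for this example; the actual patch may do something different.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mask: words[i] covers registers [64*i, 64*i + 63].
// lwm/hwm bound the words that may be non-zero; lwm > hwm means empty.
struct WordMask {
  std::vector<uint64_t> words;
  size_t lwm = 1, hwm = 0;

  void set(size_t reg) {
    size_t w = reg / 64;
    if (w >= words.size()) words.resize(w + 1, 0);
    words[w] |= uint64_t(1) << (reg % 64);
    if (lwm > hwm) {
      lwm = hwm = w;
    } else {
      lwm = std::min(lwm, w);
      hwm = std::max(hwm, w);
    }
  }
};

// Scans only the intersection of both [lwm, hwm] ranges and stops at the
// first word with a shared bit.
bool overlap(const WordMask& a, const WordMask& b) {
  if (a.lwm > a.hwm || b.lwm > b.hwm) return false;  // one mask is empty
  size_t lo = std::max(a.lwm, b.lwm);
  size_t hi = std::min(a.hwm, b.hwm);
  for (size_t i = lo; i <= hi; i++) {  // no iterations when lo > hi
    if ((a.words[i] & b.words[i]) != 0) return true;  // early exit
  }
  return false;
}
```

Masks whose non-zero words do not intersect are rejected without touching any word, which matters in a hot path such as interference checks.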

Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:

  Update after comments from Emanuel

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20404/files
  - new: https://git.openjdk.org/jdk/pull/20404/files/c1f41288..fe69f5a3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=27
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=26-27

  Stats: 84 lines in 7 files changed: 26 ins; 29 del; 29 mod
  Patch: https://git.openjdk.org/jdk/pull/20404.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404

PR: https://git.openjdk.org/jdk/pull/20404

From ayang at openjdk.org  Tue Sep 16 12:48:21 2025
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Tue, 16 Sep 2025 12:48:21 GMT
Subject: RFR: 8367740: assembler_<cpu>.inline.hpp should not include
 assembler.inline.hpp
In-Reply-To: <QYuy3O1NQdXFTX_2k292z-7Ni02AHU1VhbIZDzwG8hk=.48b8361e-0442-41d3-a319-5a506d0cf650@github.com>
References: <QYuy3O1NQdXFTX_2k292z-7Ni02AHU1VhbIZDzwG8hk=.48b8361e-0442-41d3-a319-5a506d0cf650@github.com>
Message-ID: <9nxdJBFv1YD2q97EhtgKjQaSWXLGiNOGnEuE-B4_q1w=.b0ca98e7-34b0-4e45-bd23-4e7d70e62b4d@github.com>

On Tue, 16 Sep 2025 10:15:06 GMT, Francesco Andreuzzi <fandreuzzi at openjdk.org> wrote:

> This is the content of assembler.inline.hpp:
> https://github.com/openjdk/jdk/blob/ca89cd06d39ed3a6bbe16f60fea4d7382849edbd/src/hotspot/share/asm/assembler.inline.hpp#L28-L30
> 
> Most of the `assembler_<cpu>.inline.hpp` include it:
> https://github.com/openjdk/jdk/blob/ca89cd06d39ed3a6bbe16f60fea4d7382849edbd/src/hotspot/cpu/zero/assembler_zero.inline.hpp#L29-L32
> 
> They should probably include `assembler.hpp` instead.
> 
> Testing: tier1 in GHA

Some background on this: https://github.com/openjdk/jdk/pull/27189#discussion_r2344516463, just fyi for others

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27311#issuecomment-3298600193

From epeter at openjdk.org  Tue Sep 16 12:49:00 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 12:49:00 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
Message-ID: <dUFR_8r22-kNkoOcRIfqMuVjQ-a1EegX2YNJ8yGfFZU=.dd0eb3ac-601e-4ca7-ab32-41459788d7a2@github.com>

On Tue, 16 Sep 2025 08:52:57 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> If a method has a large number of parameters, we currently bail out from C2 compilation.
>> 
>> ### Changeset
>> 
>> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
>> 
>> Changes:
>> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
>> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
>> - Remove all `can_represent` checks and bailouts.
>> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
>> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
>> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
>> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no...
>
> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits:
> 
>  - Clarify comments in regmask.hpp
>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>  - Address review comments (renaming on the way in a separate PR)
>  - Update src/hotspot/share/opto/regmask.hpp
>    
>    Co-authored-by: Emanuel Peter <emanuel.peter at oracle.com>
>  - Restore modified java/lang/invoke tests
>  - Sort includes (new requirement)
>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>  - Add clarifying comments at definitions of register mask sizes
>  - Fix implicit zero and nullptr checks
>  - Add deep copy comment
>  - ... and 29 more: https://git.openjdk.org/jdk/compare/60930a3e...c1f41288

Alright, sprinted through the end.

I really appreciate that you added extensive `gtest`s, thanks for that.

And thanks for using the Template Framework, I'm curious to hear if you have any feedback on it :)

src/hotspot/share/opto/chaitin.cpp line 1656:

> 1654: 
> 1655:     // Check if a color is available and if so pick the color
> 1656:     OptoReg::Name reg = choose_color(*lrg);

Accidental find: why is this assert commented out?

src/hotspot/share/opto/chaitin.cpp line 1663:

> 1661:     if (!OptoReg::is_valid(reg) && is_infinite_stack) {
> 1662:       // Bump register mask up to next stack chunk
> 1663:       bool success = lrg->rollover();

Can you add a comment that explains what this does / means? Do we start spilling to the stack slots instead of using registers?

src/hotspot/share/opto/regmask.hpp line 241:

> 239:   //          \_______________________________________________________________________________/
> 240:   //                                                  |
> 241:   //                                  _rm_size_in_words=_offset=5

Can you please add a concise comment explaining why we need `rollover`? Does that happen during register allocation, and if we have a rollover, do we start spilling instead of keeping values in registers?
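One possible reading of "bump register mask up to next stack chunk" (the comment above the `rollover()` call) is sketched below as a simplified model. An infinite-stack mask materializes only a window of stack slots at a time. When every slot in the window is taken by an interfering neighbor, the window moves up one chunk and all of its slots are allowed again. The names `WindowMask`, `kWindow`, and `choose_stack_slot` are hypothetical, and the window of 8 slots is arbitrary. The real code works on register mask words, and its exact semantics are what the requested comment should pin down.

```cpp
#include <bitset>
#include <cassert>
#include <vector>

// Hypothetical model: an "infinite stack" mask only materializes a window
// of kWindow stack slots, starting at `offset`. Not HotSpot's API.
constexpr int kWindow = 8;

struct WindowMask {
  int offset = 0;
  std::bitset<kWindow> bits;  // bit i <=> stack slot offset + i is allowed

  // Move the window to the next chunk and allow every slot in it, since an
  // infinite-stack mask permits all slots above the current window.
  void rollover() {
    offset += kWindow;
    bits.set();
  }
};

// Pick the lowest slot not used by an interfering neighbor, rolling over
// to higher chunks until a free slot is found.
int choose_stack_slot(WindowMask mask, const std::vector<int>& neighbor_slots) {
  while (true) {
    for (int s : neighbor_slots) {
      if (s >= mask.offset && s < mask.offset + kWindow) {
        mask.bits.reset(s - mask.offset);  // slot taken by a neighbor
      }
    }
    for (int i = 0; i < kWindow; i++) {
      if (mask.bits.test(i)) return mask.offset + i;
    }
    mask.rollover();  // no free slot in this chunk: try the next one
  }
}
```

In this model, rollover only ever moves to higher stack slots. It does not say whether the live range was already restricted to the stack before `choose_color` failed, which is the part the reviewer is asking about.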

src/hotspot/share/opto/regmask.hpp line 837:

> 835:   // ----------------------------------------------------------------------
> 836:   // The methods below are only for testing purposes (see test_regmask.cpp)
> 837:   // ----------------------------------------------------------------------

I wonder if it could be solved with `friend` instead, so it does not have to be public and get accidentally used somehow.

Or maybe some `gtest_` prefix? Not sure.
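A minimal sketch of the `friend` approach, with hypothetical names (`Mask`, `MaskTestAccess`). The test-only helpers stay private, and a single accessor class that is used only from the tests is granted access. GoogleTest also ships a `FRIEND_TEST` macro in `gtest/gtest_prod.h` for the same purpose, though whether it fits HotSpot's gtest setup is an open question.

```cpp
#include <cassert>

class Mask {
 public:
  void set(int bit) { _bits |= 1u << bit; }

 private:
  // Only the test accessor may reach the internals below.
  friend class MaskTestAccess;

  // Test-only helper: private, so production code cannot call it by accident.
  int count_for_testing() const {
    int n = 0;
    for (unsigned b = _bits; b != 0; b &= b - 1) n++;  // clear lowest set bit
    return n;
  }

  unsigned _bits = 0;
};

// Lives next to the gtests; forwards to the private helpers.
class MaskTestAccess {
 public:
  static unsigned raw_bits(const Mask& m) { return m._bits; }
  static int count(const Mask& m) { return m.count_for_testing(); }
};
```

The trade-off is that the production header has to name the test accessor, but nothing outside that class can call the helpers.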

test/hotspot/jtreg/compiler/arguments/TestMethodArguments.java line 51:

> 49:     static final int INPUT_SIZE = 100;
> 50: 
> 51:     public static Template.ZeroArgs generateTest(PrimitiveType t, int numberOfArguments) {

You should write out `type` instead of `t`; that would make it consistent with your `let` below.

test/hotspot/jtreg/compiler/arguments/TestMethodArguments.java line 120:

> 118:                 Template.let("classpath", comp.getEscapedClassPathOfCompiledClasses()),
> 119:                 """
> 120:                         import java.util.Arrays;

Personally, I would not indent this deeply. I know that the generated code will not have proper indentation, but that's not so bad. Readability of the Templates is more important, I think. Subjective though.

test/hotspot/jtreg/compiler/arguments/TestMethodArguments.java line 146:

> 144:                                 }
> 145:                                 return array;
> 146:                             }

Seems like we need to add some convenience "fill" methods to the template library. We'll get there eventually, just keep this for now.

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-3229588008
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352217532
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352235070
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352231339
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352309374
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352361517
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352382893
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352378439

From epeter at openjdk.org  Tue Sep 16 12:49:02 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 12:49:02 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <dUFR_8r22-kNkoOcRIfqMuVjQ-a1EegX2YNJ8yGfFZU=.dd0eb3ac-601e-4ca7-ab32-41459788d7a2@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <dUFR_8r22-kNkoOcRIfqMuVjQ-a1EegX2YNJ8yGfFZU=.dd0eb3ac-601e-4ca7-ab32-41459788d7a2@github.com>
Message-ID: <yAWlqsoRw6XTVuYKBC9QXG_5rH8xx2MsotmdBqZ3M3Q=.77688482-c807-4a15-9f22-2b868e977f8c@github.com>

On Tue, 16 Sep 2025 11:57:55 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits:
>> 
>>  - Clarify comments in regmask.hpp
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Address review comments (renaming on the way in a separate PR)
>>  - Update src/hotspot/share/opto/regmask.hpp
>>    
>>    Co-authored-by: Emanuel Peter <emanuel.peter at oracle.com>
>>  - Restore modified java/lang/invoke tests
>>  - Sort includes (new requirement)
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Add clarifying comments at definitions of register mask sizes
>>  - Fix implicit zero and nullptr checks
>>  - Add deep copy comment
>>  - ... and 29 more: https://git.openjdk.org/jdk/compare/60930a3e...c1f41288
>
> src/hotspot/share/opto/chaitin.cpp line 1656:
> 
>> 1654: 
>> 1655:     // Check if a color is available and if so pick the color
>> 1656:     OptoReg::Name reg = choose_color(*lrg);
> 
> Accidental find: why is this assert commented out?

`//assert(is_infinite_stack == lrg->mask().is_infinite_stack(), "nbrs must not change InfiniteStackedness");`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352218812

From dlunden at openjdk.org  Tue Sep 16 12:57:52 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Tue, 16 Sep 2025 12:57:52 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <yAWlqsoRw6XTVuYKBC9QXG_5rH8xx2MsotmdBqZ3M3Q=.77688482-c807-4a15-9f22-2b868e977f8c@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <dUFR_8r22-kNkoOcRIfqMuVjQ-a1EegX2YNJ8yGfFZU=.dd0eb3ac-601e-4ca7-ab32-41459788d7a2@github.com>
 <yAWlqsoRw6XTVuYKBC9QXG_5rH8xx2MsotmdBqZ3M3Q=.77688482-c807-4a15-9f22-2b868e977f8c@github.com>
Message-ID: <Pk7jfQR86D8wKhFB7wFZ6M6dSzWVPLuaaIHK8lV8i-U=.e75818b5-ed36-4b41-bfd5-6750e3df7722@github.com>

On Tue, 16 Sep 2025 11:58:15 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/chaitin.cpp line 1656:
>> 
>>> 1654: 
>>> 1655:     // Check if a color is available and if so pick the color
>>> 1656:     OptoReg::Name reg = choose_color(*lrg);
>> 
>> Accidental find: why is this assert commented out?
>
> `//assert(is_infinite_stack == lrg->mask().is_infinite_stack(), "nbrs must not change InfiniteStackedness");`

No idea, sorry (it has been that way since initial load). I just touched it to change from all_stack to infinite_stack.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352417426

From dfenacci at openjdk.org  Tue Sep 16 13:18:29 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Tue, 16 Sep 2025 13:18:29 GMT
Subject: RFR: 8367728: IGV: dump node address type
In-Reply-To: <M9c90F3oUYUfELtyNkyZ4TtBXaYA9U1OW-DaQGCq1hs=.61ac61b7-4004-4fc2-8658-7a3f7daa3d7b@github.com>
References: <M9c90F3oUYUfELtyNkyZ4TtBXaYA9U1OW-DaQGCq1hs=.61ac61b7-4004-4fc2-8658-7a3f7daa3d7b@github.com>
Message-ID: <IU5nLlur2MAeCmYZv8lv-GdIJrY2OK-TmvniETZ5GR8=.9b935076-e0e0-4dac-8092-0ca0310e7533@github.com>

On Tue, 16 Sep 2025 10:11:50 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> This changeset dumps the address type of each node (`Node::adr_type()`), when not null, into the IGV graphs. This should improve the visibility and diagnosability of C2 type inconsistencies, see e.g. [JDK-8367667](https://bugs.openjdk.org/browse/JDK-8367667).
> 
> #### Testing
> - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode).
> - Tested IGV manually on a few selected graphs. Tested automatically that dumping thousands of graphs does not trigger any assertion failure (by running `java -Xcomp -XX:PrintIdealGraphLevel=1`).

Strange that it wasn't already printed. Thanks for adding this @robcasloz! LGTM

-------------

Marked as reviewed by dfenacci (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27310#pullrequestreview-3229947806

From mhaessig at openjdk.org  Tue Sep 16 13:37:14 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Tue, 16 Sep 2025 13:37:14 GMT
Subject: RFR: 8366775: TestCompileTaskTimeout should use timeoutFactor
In-Reply-To: <FAbshcoZgQbyZL1hY00zT0716kDfRxQ8LINQOuQzjo4=.f3ad54a6-3d07-4713-88fa-607e1b702f1c@github.com>
References: <J7_KNMZeW20dQGNYUpDSZ7oDlXsr5dC_lyZzt1vrl-U=.882667fd-1064-4980-87ea-ee6bbb747d29@github.com>
 <FAbshcoZgQbyZL1hY00zT0716kDfRxQ8LINQOuQzjo4=.f3ad54a6-3d07-4713-88fa-607e1b702f1c@github.com>
Message-ID: <FoNG4T6j6aX_khdsTO5CvoHeNsqlSUHIbb-832goELo=.bfb7f0f1-d192-420b-bf2d-1509c2f67287@github.com>

On Tue, 9 Sep 2025 14:02:01 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> `TestCompileTaskTimeout.java` employs a timeout to test that methods are compiled faster than a specified `CompileTaskTimeout`. However, it does not make use of the jtreg timeout factor, which led to #26963 increasing the timeout to 2 s. This PR remedies this by using the timeout factor and reducing the default timeout to 500 ms.
>> 
>> Testing:
>>  - [x] Github Actions
>>  - [x] tier1, tier2 linux-x64-debug, linux-x64, linux-aarch64-debug, linux-aarch64
>
> Looks good, the adjustments seem to work for us.

Thank you for reviewing @MBaesken, @robcasloz, and @chhagedorn!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27094#issuecomment-3298734770

From mhaessig at openjdk.org  Tue Sep 16 13:37:28 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Tue, 16 Sep 2025 13:37:28 GMT
Subject: Integrated: 8366775: TestCompileTaskTimeout should use timeoutFactor
In-Reply-To: <J7_KNMZeW20dQGNYUpDSZ7oDlXsr5dC_lyZzt1vrl-U=.882667fd-1064-4980-87ea-ee6bbb747d29@github.com>
References: <J7_KNMZeW20dQGNYUpDSZ7oDlXsr5dC_lyZzt1vrl-U=.882667fd-1064-4980-87ea-ee6bbb747d29@github.com>
Message-ID: <zFJQsd7mmmFJaV0Hcek09Yu-kfawuSJU1CFFJ0xJ2bU=.12a9efaf-4e82-4471-8435-293b311be4a6@github.com>

On Thu, 4 Sep 2025 13:26:22 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

> `TestCompileTaskTimeout.java` employs a timeout to test that methods are compiled faster than a specified `CompileTaskTimeout`. However, it does not make use of the jtreg timeout factor, which led to #26963 increasing the timeout to 2 s. This PR remedies this by using the timeout factor and reducing the default timeout to 500 ms.
> 
> Testing:
>  - [x] Github Actions
>  - [x] tier1, tier2 linux-x64-debug, linux-x64, linux-aarch64-debug, linux-aarch64
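A hedged Java sketch of the idea in this PR: scale a base timeout by the jtreg timeout factor before passing it to the JVM under test. It assumes the factor is visible to the test as the `test.timeout.factor` system property, which is where `jdk.test.lib.Utils.TIMEOUT_FACTOR` reads it from. The class and method names are hypothetical, and this is not the actual test change.

```java
// Hypothetical helper, not the code from TestCompileTaskTimeout.java.
public class TimeoutFactorSketch {
    // Base budget per compilation, matching the 500 ms mentioned in the PR.
    static final long DEFAULT_TIMEOUT_MS = 500;

    // Assumption: jtreg exposes its -timeoutFactor to the test JVM as the
    // "test.timeout.factor" system property; default to 1.0 when unset.
    static double timeoutFactor() {
        return Double.parseDouble(System.getProperty("test.timeout.factor", "1.0"));
    }

    // Round up so that a fractional factor never shortens the budget.
    static long scaledTimeoutMs(long baseMs, double factor) {
        return (long) Math.ceil(baseMs * factor);
    }

    // Flag to pass to the child JVM that performs the compilations.
    static String compileTaskTimeoutFlag(double factor) {
        return "-XX:CompileTaskTimeout=" + scaledTimeoutMs(DEFAULT_TIMEOUT_MS, factor);
    }

    public static void main(String[] args) {
        System.out.println(compileTaskTimeoutFlag(timeoutFactor()));
    }
}
```

Scaling the budget this way lets slow configurations (e.g. debug builds run with a larger jtreg timeout factor) keep a tight default instead of hard-coding a larger value such as 2 s.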

This pull request has now been integrated.

Changeset: c82070e6
Author:    Manuel Hässig <mhaessig at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/c82070e6357a1b49f2887ab22267393ba87d9352
Stats:     7 lines in 1 file changed: 6 ins; 0 del; 1 mod

8366775: TestCompileTaskTimeout should use timeoutFactor

Reviewed-by: chagedorn, rcastanedalo, mbaesken

-------------

PR: https://git.openjdk.org/jdk/pull/27094

From dfenacci at openjdk.org  Tue Sep 16 13:56:26 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Tue, 16 Sep 2025 13:56:26 GMT
Subject: RFR: 8367740: assembler_<cpu>.inline.hpp should not include
 assembler.inline.hpp
In-Reply-To: <QYuy3O1NQdXFTX_2k292z-7Ni02AHU1VhbIZDzwG8hk=.48b8361e-0442-41d3-a319-5a506d0cf650@github.com>
References: <QYuy3O1NQdXFTX_2k292z-7Ni02AHU1VhbIZDzwG8hk=.48b8361e-0442-41d3-a319-5a506d0cf650@github.com>
Message-ID: <e5uncFkZ0PiR2YfzEV-0lbIbzBRw9N0gImMhSP9YABo=.335e998a-10c2-4d93-90f1-6040b027abce@github.com>

On Tue, 16 Sep 2025 10:15:06 GMT, Francesco Andreuzzi <fandreuzzi at openjdk.org> wrote:

> This is the content of assembler.inline.hpp:
> https://github.com/openjdk/jdk/blob/ca89cd06d39ed3a6bbe16f60fea4d7382849edbd/src/hotspot/share/asm/assembler.inline.hpp#L28-L30
> 
> Most of the `assembler_<cpu>.inline.hpp` include it:
> https://github.com/openjdk/jdk/blob/ca89cd06d39ed3a6bbe16f60fea4d7382849edbd/src/hotspot/cpu/zero/assembler_zero.inline.hpp#L29-L32
> 
> They should probably include `assembler.hpp` instead.
> 
> Testing: tier1 in GHA

It looks like there were a few include cycles. Thanks for fixing this @fandreuz.
Running tier1-3+ tests...

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27311#issuecomment-3298879857

From mhaessig at openjdk.org  Tue Sep 16 13:57:41 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Tue, 16 Sep 2025 13:57:41 GMT
Subject: RFR: 8366875: CompileTaskTimeout should be reset for each
 iteration of RepeatCompilation [v2]
In-Reply-To: <4TbOkAMu-KU_tgQPg1sK0L8oto_0nD4mQo7yc0hJPm4=.8d87b900-a614-4c13-a4c6-6fe11e206482@github.com>
References: <4TbOkAMu-KU_tgQPg1sK0L8oto_0nD4mQo7yc0hJPm4=.8d87b900-a614-4c13-a4c6-6fe11e206482@github.com>
Message-ID: <jG_GKWUcQaT0ZqrP7zymhd_WwllFZCdQ0TTCSnTpXNk=.2a75e1e6-2900-4546-88b6-21de793b2726@github.com>

> When running a debug JVM on Linux with a compile task timeout and repeated compilation, the execution almost always times out because the timeout is not reset between repetitions of a compilation. The purpose of the compile task timeout is to limit the amount of time a single compilation can take. Thus, this PR resets the `CompileTaskTimeout` for every compilation when running with `-XX:RepeatCompilation=<n>` for n > 1.
> 
> This PR is stacked on top of #27094.
> 
> Testing:
>  - [x] Github Actions (failures are unrelated)
>  - [x] tier1, tier2, tier3 plus some additional internal testing
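The fix described above amounts to re-arming the deadline at the start of each repetition. The simplified C++ model below uses a hypothetical name (`run_repeated`). It also checks a wall-clock `std::chrono::steady_clock` deadline after each run, which stands in for whatever mechanism HotSpot actually uses to enforce the timeout.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

using Clock = std::chrono::steady_clock;

// Runs `compile` `repeat` times and returns false if any single run exceeded
// `budget`. The deadline is re-armed at the start of every iteration, so the
// budget applies per compilation rather than to all repetitions combined.
bool run_repeated(int repeat, Clock::duration budget,
                  const std::function<void()>& compile) {
  for (int i = 0; i < repeat; i++) {
    const Clock::time_point deadline = Clock::now() + budget;  // reset per repetition
    compile();
    if (Clock::now() > deadline) {
      return false;  // this one repetition was too slow
    }
  }
  return true;
}
```

If a single deadline were armed before the loop instead, `n` repetitions that each take slightly more than `budget / n` would be reported as a timeout even though no individual compilation was slow. That matches the failure mode described in the PR.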

Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27120/files
  - new: https://git.openjdk.org/jdk/pull/27120/files/cfe842c7..cfe842c7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27120&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27120&range=00-01

  Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/27120.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27120/head:pull/27120

PR: https://git.openjdk.org/jdk/pull/27120

From epeter at openjdk.org  Tue Sep 16 14:24:51 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 14:24:51 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <zjaKBlp2oBBNRp9pjJZphn3qnqeS0J8kxqoIgEkrMMc=.fe7a7c02-f146-4e92-a89f-2355c3d32160@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
 <zjaKBlp2oBBNRp9pjJZphn3qnqeS0J8kxqoIgEkrMMc=.fe7a7c02-f146-4e92-a89f-2355c3d32160@github.com>
Message-ID: <jUdabfZhjHMX6jJJIgxClIOimKcNN_JyfblrQuF94hY=.b6ad8b57-e731-45d0-88c8-e4906216541e@github.com>

On Tue, 16 Sep 2025 12:08:25 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> src/hotspot/share/opto/postaloc.cpp line 685:
>> 
>>> 683:               assert(is_adjacent || is_reg,
>>> 684:                      "only registers can be non-adjacent");
>>> 685:               if (!value[ureg_lo] && is_adjacent) { // Nearly always adjacent
>> 
>> `value[ureg_lo]` returns a `Node*`, right? Then that would make this an implicit null check, not allowed by style guide ;)
>
> Here I'd argue for not touching this in this PR (I did not introduce this), as this is the style of the surrounding code. Should be addressed in a follow-up PR though.

I'd say this is not just formatting/naming, but code style. We usually fix these cases when we touch the code ;)
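For reference, the HotSpot style guide asks for explicit comparisons instead of using pointers as implicit booleans. Applied to the condition quoted above, the change is small. The helper below is hypothetical and only isolates the expression from `postaloc.cpp`.

```cpp
#include <cassert>

struct Node {};  // stand-in for C2's Node

// Hypothetical helper isolating the condition under review.
bool needs_value(Node* const* value, int ureg_lo, bool is_adjacent) {
  // Implicit null check (discouraged): !value[ureg_lo] && is_adjacent
  // Explicit form preferred by the style guide:
  return value[ureg_lo] == nullptr && is_adjacent;
}
```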

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352685909

From epeter at openjdk.org  Tue Sep 16 14:24:52 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 14:24:52 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <UVgvv29Q9y6-Mwvx4yWZ_urMAMoCAYmrtM_8t-tKEcw=.b8148eb9-c820-40ff-9c40-ca1b3923099f@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
 <XCQU9EZMg7DCptZ6udk6-QCFBjrn6konix7klz1pM5M=.81e2286a-f9e8-41a6-90fb-0c6a15cd0f90@github.com>
 <UVgvv29Q9y6-Mwvx4yWZ_urMAMoCAYmrtM_8t-tKEcw=.b8148eb9-c820-40ff-9c40-ca1b3923099f@github.com>
Message-ID: <fMXf7ImImlVVvS12voq7CO2_EdYoKL1qlOUwB7phpCA=.cc24e237-dd90-4c04-acd5-51fceb2edb47@github.com>

On Tue, 16 Sep 2025 12:05:09 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> Or is there a particular logic behind it?
>
> No other logic than keeping the same style as the surrounding old code. I can update it to use up-to-date style, but then we increase the scope of this PR. Is a follow-up PR OK?

Follow up is fine :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352682181

From epeter at openjdk.org  Tue Sep 16 14:30:49 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 14:30:49 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <CgDqueiQ2qerbQb_2ZSx6gLYbcZfeZCAMEydhpCaNPg=.7e987b4c-7fe2-406f-9f91-1b630bbdab7a@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
 <CgDqueiQ2qerbQb_2ZSx6gLYbcZfeZCAMEydhpCaNPg=.7e987b4c-7fe2-406f-9f91-1b630bbdab7a@github.com>
Message-ID: <GM3p22TWNkH3xEAvP--T2S4XQcxeX0wEGOyLvd-at3Q=.e088d3b8-5ded-4f61-a1db-231655d7e768@github.com>

On Tue, 16 Sep 2025 12:26:53 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> src/hotspot/share/opto/regmask.hpp line 458:
>> 
>>> 456:   }
>>> 457: 
>>> 458:   RegMask(const RegMask& rm) : RegMask(rm, nullptr) {}
>> 
>> Do you want to add `explicit` here?
>> This is a shallow copy, right? Maybe add a comment for that.
>
> The ADLC-generated code relies on using the constructor implicitly, so I prefer not touching it in this changeset at least. All the copies are deep, clarified now.

Ok, I understand. Can you show me an example, so I can understand a little better?
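The following is a generic C++ illustration of the `explicit` question, not ADLC-generated code, which is what the reviewer is asking to see. A deep-copying copy constructor that is not `explicit` permits copy-initialization: `T x = y;`, passing by value, and returning a named local by value. Declaring the copy constructor `explicit` would make each of the lines marked "copy-initialization" below fail to compile. The class `Bits` and the helper functions are hypothetical.

```cpp
#include <cassert>
#include <cstring>

class Bits {
 public:
  explicit Bits(int n) : _n(n), _words(new unsigned[n]()) {}  // zero-filled

  // Deep copy, deliberately NOT explicit.
  Bits(const Bits& other) : _n(other._n), _words(new unsigned[other._n]) {
    std::memcpy(_words, other._words, sizeof(unsigned) * _n);
  }
  Bits& operator=(const Bits&) = delete;  // keep the sketch minimal
  ~Bits() { delete[] _words; }

  void set(int i) { _words[i] = 1; }
  bool get(int i) const { return _words[i] != 0; }

 private:
  int _n;
  unsigned* _words;
};

// Parameter and return value are both copy-initialization.
Bits by_value(Bits b) { return b; }

const Bits kPrototype(4);  // direct-initialization: fine either way

Bits copy_of_prototype() {
  Bits local = kPrototype;  // copy-initialization
  return local;             // copy-initialization of the return value
}
```

Since every copy here is deep, making the constructor implicit does not introduce aliasing. It only allows the compiler to insert copies in these contexts without an explicit `Bits(x)`.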

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2352706428

From epeter at openjdk.org  Tue Sep 16 14:39:20 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 14:39:20 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v28]
In-Reply-To: <qXc7TNLuk8MPsKlfWYIACWBcbVtwMBG8DgAFKOhpgoM=.13ff2040-0288-4c91-80ff-3af867e0b14b@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <qXc7TNLuk8MPsKlfWYIACWBcbVtwMBG8DgAFKOhpgoM=.13ff2040-0288-4c91-80ff-3af867e0b14b@github.com>
Message-ID: <AHeSWnuBT0XGO2epc6Iixwr2NyR6ZSFBCBxJ0LKk6yc=.0af88214-2af5-4b5e-ba33-5a363b790027@github.com>

On Tue, 16 Sep 2025 12:39:13 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> If a method has a large number of parameters, we currently bail out from C2 compilation.
>> 
>> ### Changeset
>> 
>> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
>> 
>> Changes:
>> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
>> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
>> - Remove all `can_represent` checks and bailouts.
>> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
>> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
>> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
>> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no...
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update after comments from Emanuel

You seem to have a build failure:

In file included from /home/runner/work/jdk/jdk/src/hotspot/share/opto/compile.hpp:43,
                 from /home/runner/work/jdk/jdk/src/hotspot/share/opto/type.hpp:29,
                 from /home/runner/work/jdk/jdk/test/hotspot/gtest/opto/test_rangeinference.cpp:26:
/home/runner/work/jdk/jdk/src/hotspot/share/opto/regmask.hpp: In constructor ‘RegMask::RegMask(Arena*)’:
/home/runner/work/jdk/jdk/src/hotspot/share/opto/regmask.hpp:441:53: error: class ‘RegMask’ does not have any field named ‘_read_only’
  441 |       : _rm_word() DEBUG_ONLY(COMMA _arena(arena)), _read_only(read_only),
      |                                                     ^~~~~~~~~~
/home/runner/work/jdk/jdk/src/hotspot/share/opto/regmask.hpp:441:64: error: ‘read_only’ was not declared in this scope
  441 |       : _rm_word() DEBUG_ONLY(COMMA _arena(arena)), _read_only(read_only),
      |

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-3299064676

From epeter at openjdk.org  Tue Sep 16 14:39:22 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Sep 2025 14:39:22 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v23]
In-Reply-To: <GjF5qX4BV-4xAWV6kDweN3luDSVQXxxp5i6creb7_L4=.085a85af-0ec5-42ca-a076-bbf554853d3a@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <KuhZYofHDkGkzw1Kq6vDvRs4_aDxOJDbTpIL8gnkQL8=.0d25e4bc-1f73-490f-a65b-29bef7ac8903@github.com>
 <qfJuLa2rYGYnrmbp32LpJgVaZfShvNjVkGOuJrSw00A=.5f7b712d-5700-45b5-8beb-fde3611e31de@github.com>
 <GjF5qX4BV-4xAWV6kDweN3luDSVQXxxp5i6creb7_L4=.085a85af-0ec5-42ca-a076-bbf554853d3a@github.com>
Message-ID: <gS-eZ4OguAN-N_CI_x4TipmDjyr1XiPce49X0z3AWc4=.afc55282-4ad2-4d65-9912-d513d11585a3@github.com>

On Thu, 11 Sep 2025 10:13:01 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>>> For reference, here is now the changeset adding an IFG bailout: #26118
>> 
>> Since that is now integrated: do we need to make any changes to the patch here? I thought the goal was to use the bailouts instead of increasing `MaxNodeLimit`.
>> 
>> Because looking at the discussions above: we were worried that there could be compile-time regressions - even if quite rare. But they were in the range of 40s which is quite scary. Are these now gone?
>
> @eme64 I have now addressed your comments (the renaming is in https://github.com/openjdk/jdk/pull/27215, as requested). Please have a look and let me know if I've missed something.

@dlunde Thanks for the swift updates! I have in the meantime added some more comments, just making sure you don't miss them :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-3299067317

From mhaessig at openjdk.org  Tue Sep 16 15:38:12 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Tue, 16 Sep 2025 15:38:12 GMT
Subject: RFR: 8366875: CompileTaskTimeout should be reset for each
 iteration of RepeatCompilation [v3]
In-Reply-To: <4TbOkAMu-KU_tgQPg1sK0L8oto_0nD4mQo7yc0hJPm4=.8d87b900-a614-4c13-a4c6-6fe11e206482@github.com>
References: <4TbOkAMu-KU_tgQPg1sK0L8oto_0nD4mQo7yc0hJPm4=.8d87b900-a614-4c13-a4c6-6fe11e206482@github.com>
Message-ID: <6ijTgwXUpwm8C_U7oOsN7RScv-caCal0U67UXFZ6VmY=.5550cf2f-2c57-4fc0-a2cd-3df6627485a2@github.com>

> When running a debug JVM on Linux with a compile task timeout and repeated compilation, the execution almost always times out because the timeout is not reset between repetitions of a compilation. The purpose of the compile task timeout is to limit the amount of time a single compilation can take. Thus, this PR resets the `CompileTaskTimeout` for every compilation when running with `-XX:RepeatCompilation=<n>` for n > 1.
> 
> This PR is stacked on top of #27094.
> 
> Testing:
>  - [x] Github Actions (failures are unrelated)
>  - [x] tier1, tier2, tier3 plus some additional internal testing

Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:

 - Merge branch 'master' into JDK-8366875-repeat-comp-to
 - Reset timeout on repeated compilations
 - Add regression test
 - Use timeuot factor
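To illustrate why a timeout armed only once fails under repeated compilation, here is a minimal sketch using a simulated clock and a fixed cost per compilation. `RepeatTimeoutModel` and its parameters are hypothetical and are not HotSpot code; the sketch only contrasts a single shared deadline with a deadline that is re-armed for each repetition.

```java
// Hypothetical model, not HotSpot code: one compilation repeated n times,
// as with -XX:RepeatCompilation=<n>, against a per-task timeout.
final class RepeatTimeoutModel {
    /** Returns true if every repetition finishes before its deadline. */
    static boolean run(int repetitions, long timeoutMs, long costPerCompileMs,
                       boolean resetPerIteration) {
        long now = 0;                        // simulated clock
        long deadline = now + timeoutMs;     // armed once, before the first compilation
        for (int i = 0; i < repetitions; i++) {
            if (resetPerIteration) {
                deadline = now + timeoutMs;  // re-arm the timeout for this repetition
            }
            now += costPerCompileMs;         // one compilation's worth of work
            if (now > deadline) {
                return false;                // the timeout would fire here
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("shared deadline:     " + run(5, 100, 60, false));
        System.out.println("per-repeat deadline: " + run(5, 100, 60, true));
    }
}
```

With a shared deadline the total budget is one timeout for all n repetitions, so any compilation that legitimately takes more than timeout/n trips it; re-arming keeps the limit meaningful per compilation.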

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27120/files
  - new: https://git.openjdk.org/jdk/pull/27120/files/cfe842c7..f9a170b6

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27120&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27120&range=01-02

  Stats: 31864 lines in 1079 files changed: 16371 ins; 9354 del; 6139 mod
  Patch: https://git.openjdk.org/jdk/pull/27120.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27120/head:pull/27120

PR: https://git.openjdk.org/jdk/pull/27120

From mhaessig at openjdk.org  Tue Sep 16 15:38:14 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Tue, 16 Sep 2025 15:38:14 GMT
Subject: RFR: 8366875: CompileTaskTimeout should be reset for each
 iteration of RepeatCompilation [v2]
In-Reply-To: <jG_GKWUcQaT0ZqrP7zymhd_WwllFZCdQ0TTCSnTpXNk=.2a75e1e6-2900-4546-88b6-21de793b2726@github.com>
References: <4TbOkAMu-KU_tgQPg1sK0L8oto_0nD4mQo7yc0hJPm4=.8d87b900-a614-4c13-a4c6-6fe11e206482@github.com>
 <jG_GKWUcQaT0ZqrP7zymhd_WwllFZCdQ0TTCSnTpXNk=.2a75e1e6-2900-4546-88b6-21de793b2726@github.com>
Message-ID: <eh-M_swib9KP_gKT07QEADoLQ3vCy3SSndiJkw7DjlU=.29c28aff-ad31-4a9a-86b7-9443ee8887e7@github.com>

On Tue, 16 Sep 2025 13:57:41 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> When running a debug JVM on Linux with a compile task timeout and repeated compilation, the execution almost always times out because the timeout is not reset between repetitions of a compilation. The purpose of the compile task timeout is to limit the time a single compilation can take. Thus, this PR resets the `CompileTaskTimeout` for every compilation when running with `-XX:RepeatCompilation=<n>` for n > 1.
>> 
>> This PR is stacked on top of #27094.
>> 
>> Testing:
>>  - [x] Github Actions (failures are unrelated)
>>  - [x] tier1, tier2, tier3 plus some additional internal testing
>
> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.

Merged master and fixed conflicts. I am currently rerunning testing.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27120#issuecomment-3299321385

From mhaessig at openjdk.org  Tue Sep 16 15:42:39 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Tue, 16 Sep 2025 15:42:39 GMT
Subject: RFR: 8366878: Improve flags of
 compiler/loopopts/superword/TestAlignVectorFuzzer.java [v2]
In-Reply-To: <jqOL72bER1G6aunV15U5OE4iUltIiv9dfhdJrvkyy3k=.7697a330-3de2-4591-96e3-70dc43e7139a@github.com>
References: <jqOL72bER1G6aunV15U5OE4iUltIiv9dfhdJrvkyy3k=.7697a330-3de2-4591-96e3-70dc43e7139a@github.com>
Message-ID: <mJBCJpEMT_Yl1s5b8M2Yu6gJo18FZepU9Pj1zqUqZBU=.07df1c29-c004-437f-b611-9a71299aafd7@github.com>

> The test definitions of `TestAlignVectorFuzzer.java` all contain `printcompilation` directives. These are redundant and slow down a test that already often times out. @eme64 also suggested adding a `compileonly` directive to one of the four tests.
> 
> Testing:
>  - [ ] Github Actions
>  - [ ] tier1 and stress testing (which includes `TestAlignVectorFuzzer.java`)

Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - Merge branch 'master' into JDK-8366878-align-fuzz-flags
 - Make compileonly a separate run
 - Fix flags

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27122/files
  - new: https://git.openjdk.org/jdk/pull/27122/files/d2db1697..3aa62f9e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27122&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27122&range=00-01

  Stats: 31883 lines in 1081 files changed: 16389 ins; 9354 del; 6140 mod
  Patch: https://git.openjdk.org/jdk/pull/27122.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27122/head:pull/27122

PR: https://git.openjdk.org/jdk/pull/27122

From mhaessig at openjdk.org  Tue Sep 16 15:42:40 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Tue, 16 Sep 2025 15:42:40 GMT
Subject: RFR: 8366878: Improve flags of
 compiler/loopopts/superword/TestAlignVectorFuzzer.java
In-Reply-To: <jqOL72bER1G6aunV15U5OE4iUltIiv9dfhdJrvkyy3k=.7697a330-3de2-4591-96e3-70dc43e7139a@github.com>
References: <jqOL72bER1G6aunV15U5OE4iUltIiv9dfhdJrvkyy3k=.7697a330-3de2-4591-96e3-70dc43e7139a@github.com>
Message-ID: <9G7_44gEH9_LlCOwXzcSvKgGaF_TibDsAb_anH9ot34=.caedf4a0-560a-4b60-ad56-29b3c9e35bd0@github.com>

On Fri, 5 Sep 2025 16:46:09 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

> The test definitions of `TestAlignVectorFuzzer.java` all contain `printcompilation` directives. These are redundant and slow down a test that already often times out. @eme64 also suggested adding a `compileonly` directive to one of the four tests.
> 
> Testing:
>  - [ ] Github Actions
>  - [ ] tier1 and stress testing (which includes `TestAlignVectorFuzzer.java`)

Merged master and addressed @eme64's comment.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27122#issuecomment-3299328398

From mhaessig at openjdk.org  Tue Sep 16 15:42:44 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Tue, 16 Sep 2025 15:42:44 GMT
Subject: RFR: 8366878: Improve flags of
 compiler/loopopts/superword/TestAlignVectorFuzzer.java [v2]
In-Reply-To: <5PWmoHhlhYHDD7WBje51yGzGHr1Dq3QCDRNApA64MmY=.ed2e0b11-e144-4e24-97dd-7a7ccdd208c0@github.com>
References: <jqOL72bER1G6aunV15U5OE4iUltIiv9dfhdJrvkyy3k=.7697a330-3de2-4591-96e3-70dc43e7139a@github.com>
 <5PWmoHhlhYHDD7WBje51yGzGHr1Dq3QCDRNApA64MmY=.ed2e0b11-e144-4e24-97dd-7a7ccdd208c0@github.com>
Message-ID: <jj4hzorz-0guU5qakdgBP767xkxsX4lTbc81CJ0WK_I=.48b191f5-296b-4693-852c-62ecdc8c1c89@github.com>

On Mon, 8 Sep 2025 05:53:32 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>> 
>>  - Merge branch 'master' into JDK-8366878-align-fuzz-flags
>>  - Make compileonly a separate run
>>  - Fix flags
>
> test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java line 35:
> 
>> 33:  *                                 -XX:CompileCommand=compileonly,compiler.loopopts.superword.TestAlignVectorFuzzer::*
>> 34:  *                                 compiler.loopopts.superword.TestAlignVectorFuzzer
>> 35:  */
> 
> I think it would be good if we also had the same run but without the compileonly. That's what I meant by duplication ;)

I added a separate run in a new commit.
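For readers less familiar with jtreg, "the same run with and without `compileonly`" can be expressed as two `@run` actions that differ only in that flag. This is a minimal sketch under assumptions: the `main/othervm` action is assumed, and the real test's other tags (`@bug`, `@library`, `@requires`) and VM flags are omitted; whether the commit used one `@test` block with two `@run` lines or separate `@test` blocks is not shown in this thread.

```java
/*
 * @test
 * @run main/othervm compiler.loopopts.superword.TestAlignVectorFuzzer
 * @run main/othervm -XX:CompileCommand=compileonly,compiler.loopopts.superword.TestAlignVectorFuzzer::*
 *                   compiler.loopopts.superword.TestAlignVectorFuzzer
 */
```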

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27122#discussion_r2352913724

From dfenacci at openjdk.org  Tue Sep 16 16:29:59 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Tue, 16 Sep 2025 16:29:59 GMT
Subject: RFR: 8367278: Test compiler/startup/StartupOutput.java timed out after
 completion on Windows
Message-ID: <AJYaHORLYxUCNMALa2LNo-21UBiP07SW97l4gwFZ9N0=.ab1bb72d-4db8-4472-921d-4d24356fc58c@github.com>

## Problem
After [JDK-8260555](https://bugs.openjdk.org/browse/JDK-8260555) changed the default TIMEOUT_FACTOR from 4 to 1, the test compiler/startup/StartupOutput.java can occasionally slightly exceed the 2-minute timeout on Windows.

## Change
Rather than increasing the timeout, this change reduces the number of VM runs with randomly generated near-minimum code cache sizes from 200 to 50. This should still provide sufficient coverage while keeping execution well within the timeout.
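A rough estimate of the headroom this buys, assuming each VM run costs about the same, that the runs dominate the test's time, and that the 200-run total sits near the 2-minute limit (the report only says the test "slightly" exceeds it):

$$
t_{\text{run}} \approx \frac{T_{200}}{200} \approx \frac{120\,\text{s}}{200} = 0.6\,\text{s},
\qquad
T_{50} \approx 50 \cdot t_{\text{run}} \approx 30\,\text{s} \approx \tfrac{1}{4}\,T_{200}
$$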

## Testing:
Tiers 1-3+

-------------

Commit messages:
 - JDK-8367278: reduce loop to 50 cycles
 - JDK-8367278: Test compiler/startup/StartupOutput.java timed out after completion on Windows

Changes: https://git.openjdk.org/jdk/pull/27254/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27254&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367278
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27254.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27254/head:pull/27254

PR: https://git.openjdk.org/jdk/pull/27254

From sparasa at openjdk.org  Tue Sep 16 17:42:18 2025
From: sparasa at openjdk.org (Srinivas Vamsi Parasa)
Date: Tue, 16 Sep 2025 17:42:18 GMT
Subject: RFR: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2 [v5]
In-Reply-To: <lHnIF_1S19u7UH5A4QNzvjZ3AZjH9uzRtnAVkZFuX5s=.3871ed99-4c8e-4455-85b4-a841fe34c71d@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
 <CI6jzRQmAQXIQaToisYrZGI8tLzI14TOKLtPY2zrPWw=.55db890f-8f77-4ed7-8a9c-1d99ca9487ba@github.com>
 <lHnIF_1S19u7UH5A4QNzvjZ3AZjH9uzRtnAVkZFuX5s=.3871ed99-4c8e-4455-85b4-a841fe34c71d@github.com>
Message-ID: <5QOi2GBheGqa8c_Hc9yfuq0DTm8UsD1QshPkVdgdFDc=.079d9af8-0ccd-4540-ae10-3ae359a9a6d9@github.com>

On Tue, 16 Sep 2025 05:35:44 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> Not reviewed in detail, but looks reasonable. Tests pass :)

Thank you Emanuel! :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26997#issuecomment-3299735699

From sparasa at openjdk.org  Tue Sep 16 18:16:52 2025
From: sparasa at openjdk.org (Srinivas Vamsi Parasa)
Date: Tue, 16 Sep 2025 18:16:52 GMT
Subject: Integrated: 8354348: Enable Extended EVEX to REX2/REX demotion for
 commutative operations with same dst and src2
In-Reply-To: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
References: <jCJz5X50JLznjIWKMVUz_5JDeIvMRCBOdBMINRS-o0k=.bf483d10-1d72-4f23-ac34-642343b04337@github.com>
Message-ID: <5YUTgEkfPuRs8PZq3uH2XbK9KxOZIQLzuDZ4Lz9VYSg=.1f1e53a5-a1d7-4f56-8a9d-6490a07022ee@github.com>

On Thu, 28 Aug 2025 21:09:03 GMT, Srinivas Vamsi Parasa <sparasa at openjdk.org> wrote:

> This change extends Extended EVEX (EEVEX) to REX2/REX demotion for Intel APX NDD instructions to handle commutative operations when the destination register and the second source register (src2) are the same.
> 
> Currently, EEVEX to REX2/REX demotion is only enabled when the first source (src1) and the destination are the same. This enhancement allows additional cases of valid demotion for commutative instructions (add, imul, and, or, xor).
> 
> For example:
> `eaddl r18, r25, r18` can be encoded as `addl r18, r25` using APX REX2 encoding
> `eaddl r2, r7, r2` can be encoded as `addl r2, r7` using non-APX legacy encoding
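The demotion rule described above can be modeled in a few lines. This is a hypothetical Java sketch, not the HotSpot assembler: `NddOp`, `TwoOperand` and `EevexDemotion` are invented names, registers are plain numbers 0..31 (r16..r31 being the APX extended GPRs), and the encoding choice is simplified to "REX2 if any operand is an extended GPR, otherwise legacy".

```java
import java.util.Optional;

// Hypothetical model of EEVEX (NDD) -> REX2/legacy demotion.
enum NddOp {
    ADD(true), IMUL(true), AND(true), OR(true), XOR(true), SUB(false);

    final boolean commutative;

    NddOp(boolean commutative) { this.commutative = commutative; }
}

// Two-operand form: dst = dst OP src.
record TwoOperand(NddOp op, int dst, int src, String encoding) {}

final class EevexDemotion {
    /** NDD form is dst = src1 OP src2; returns the two-operand form if one exists. */
    static Optional<TwoOperand> demote(NddOp op, int dst, int src1, int src2) {
        int other;
        if (dst == src1) {
            other = src2;                       // pre-existing case: dst = dst OP src2
        } else if (op.commutative && dst == src2) {
            other = src1;                       // new case: src1 OP dst == dst OP src1
        } else {
            return Optional.empty();            // keep the EEVEX (NDD) encoding
        }
        // r16..r31 need REX2; otherwise a legacy (possibly REX-prefixed) encoding suffices.
        String encoding = (dst >= 16 || other >= 16) ? "REX2" : "legacy";
        return Optional.of(new TwoOperand(op, dst, other, encoding));
    }

    public static void main(String[] args) {
        System.out.println(demote(NddOp.ADD, 18, 25, 18)); // eaddl r18, r25, r18
        System.out.println(demote(NddOp.ADD, 2, 7, 2));    // eaddl r2, r7, r2
        System.out.println(demote(NddOp.SUB, 2, 7, 2));    // sub is not commutative
    }
}
```

The non-commutative case (`SUB`) shows why the enhancement is limited to add, imul, and, or and xor: swapping the operands of a subtraction changes its result.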

This pull request has now been integrated.

Changeset: c41add8d
Author:    Srinivas Vamsi Parasa <sparasa at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/c41add8d3e24be5f469f18cfbf0f476f2baf63a6
Stats:     3085 lines in 4 files changed: 518 ins; 169 del; 2398 mod

8354348: Enable Extended EVEX to REX2/REX demotion for commutative operations with same dst and src2

Reviewed-by: jbhateja, epeter, sviswanathan

-------------

PR: https://git.openjdk.org/jdk/pull/26997

From kvn at openjdk.org  Tue Sep 16 19:32:45 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 16 Sep 2025 19:32:45 GMT
Subject: RFR: 8367313: CTW: Execute in AWT headless mode [v2]
In-Reply-To: <iybHTcZUmGuAWqzGCjVu1j9492b1ANCrc-IsWpf5qVs=.a1f74414-739f-4433-a1c6-653ec3c85e15@github.com>
References: <d8Vd-xRvi6T6oizZvB3M7abUjT8ifE1n157D24UXGmA=.9370efd0-cc85-4190-bf23-fb7ca62301dd@github.com>
 <iybHTcZUmGuAWqzGCjVu1j9492b1ANCrc-IsWpf5qVs=.a1f74414-739f-4433-a1c6-653ec3c85e15@github.com>
Message-ID: <GQdQbes3Ih0TrWx1v2f2oGfUXGy87xVgYRHbEh_7FxM=.06a64b8b-d822-4797-aeb8-0c8cf93a675a@github.com>

On Mon, 15 Sep 2025 14:08:57 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> I have been doing CTW parallelization improvements, and noticed that some of the AWT clinits run and initialize the graphics stack. This is awkward for a few reasons:
>> 
>>  1. We might be running in a headless environment, and these clinits could fail, shrinking the CTW testing scope.
>>  2. There are dependencies in the graphics stack initialization that break. In one case in my parallelization tests, I have seen the VM crash due to an uninitialized AWT lock, because the randomized CTW runner managed to execute clinits in an unusual order. Running in headless mode avoids dealing with that path altogether.
>> 
>> I think we should be running CTW tests in AWT headless mode to begin with. 
>> 
>> Additional testing:
>>  - [x] Linux x86_64 server fastdebug, `applications/ctw/modules`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8367313-ctw-headless-mode
>  - Fix

Good.
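The standard switch for AWT headless mode is the `java.awt.headless` system property, usually passed as `-Djava.awt.headless=true`; this thread does not show exactly how the PR wires it into the CTW runner. A minimal sketch of the programmatic equivalent, assuming the property is set before AWT first reads it (`HeadlessMode` is a hypothetical name):

```java
import java.awt.GraphicsEnvironment;

final class HeadlessMode {
    /** Requests headless AWT; must run before any AWT code queries the property. */
    static boolean enable() {
        System.setProperty("java.awt.headless", "true");
        // GraphicsEnvironment.isHeadless() reports the mode AWT will use.
        return GraphicsEnvironment.isHeadless();
    }

    public static void main(String[] args) {
        System.out.println("headless = " + enable());
    }
}
```

Setting it on the command line is the more robust choice for a test runner, since it is in effect before any class initializer (including ones triggered by the CTW loader) can touch AWT.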

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27187#pullrequestreview-3231402860

From vlivanov at openjdk.org  Tue Sep 16 20:09:18 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 16 Sep 2025 20:09:18 GMT
Subject: RFR: 8367333: C2: Vector math operation intrinsification failure
 [v2]
In-Reply-To: <qpB4c9uXgeU4UbE9Zm_Pb-52QHm203ICO6IdT1w_bGQ=.be2804be-add6-4157-a335-dea1e0920e08@github.com>
References: <qpB4c9uXgeU4UbE9Zm_Pb-52QHm203ICO6IdT1w_bGQ=.be2804be-add6-4157-a335-dea1e0920e08@github.com>
Message-ID: <OQOC6AdwvpZrL5teIwHml6xnayliLgsY5VjI87p5XNs=.cee478ee-56f8-4b6b-bb68-0a5c4ea24df7@github.com>

> As part of [JDK-8353786](https://bugs.openjdk.org/browse/JDK-8353786), C2 support for operations backed by the vector math library was completely removed. On the JDK side, special dispatching logic was added to avoid intrinsic calls in `jdk.internal.vm.vector.VectorSupport`. But it's still possible to observe paradoxical situations (intrinsic calls with obsolete operation IDs) when processing effectively dead code.
> 
> Consider `FloatVector::lanewiseTemplate`:
> 
>     FloatVector lanewiseTemplate(VectorOperators.Unary op) {
>         if (opKind(op, VO_SPECIAL)) {
>             ...                             
>             else if (opKind(op, VO_MATHLIB)) {
>                 return unaryMathOp(op);
>             }
>         }
>         int opc = opCode(op);
>         return VectorSupport.unaryOp(opc, ...);
>     }
> 
> 
> At runtime, `unaryMathOp` is unconditionally invoked, but during compilation it's possible to end up with an intrinsification attempt of `VectorSupport.unaryOp()` before `opKind(op, VO_SPECIAL)` is inlined. 
> 
> It can be reliably reproduced with the `-XX:+StressIncrementalInlining` flag.
> 
> The fix is to make intrinsification fail fast rather than crash the VM.
> 
> Testing: tier1 - tier4
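The fail-fast idea can be sketched as an intrinsic that declines unknown operation IDs instead of asserting. This Java model is only illustrative: C2 itself is C++, the actual fix is the small change in the PR, and `VectorIntrinsicModel` and the operation IDs below are hypothetical.

```java
import java.util.Set;

final class VectorIntrinsicModel {
    // Hypothetical operation IDs that still have a C2 implementation.
    private static final Set<Integer> SUPPORTED_OPS = Set.of(1, 2, 3);

    /**
     * Returns true if the call would be replaced by vector IR, or false to keep
     * the ordinary Java call (the fail-fast outcome).
     */
    static boolean tryIntrinsify(int opc) {
        if (!SUPPORTED_OPS.contains(opc)) {
            // Unknown or obsolete op, e.g. reached through not-yet-pruned dead code:
            // give up on intrinsification instead of crashing.
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(tryIntrinsify(1));   // supported op
        System.out.println(tryIntrinsify(99));  // obsolete op: fall back
    }
}
```

Declining is safe here because the fallback is the regular call, and on the paths that are actually executed at runtime the math-library dispatch never reaches this intrinsic.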

Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:

  review feedback

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27263/files
  - new: https://git.openjdk.org/jdk/pull/27263/files/66892f1d..f63e76ce

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27263&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27263&range=00-01

  Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/27263.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27263/head:pull/27263

PR: https://git.openjdk.org/jdk/pull/27263

From vlivanov at openjdk.org  Tue Sep 16 20:09:18 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 16 Sep 2025 20:09:18 GMT
Subject: RFR: 8367333: C2: Vector math operation intrinsification failure
 [v2]
In-Reply-To: <OQOC6AdwvpZrL5teIwHml6xnayliLgsY5VjI87p5XNs=.cee478ee-56f8-4b6b-bb68-0a5c4ea24df7@github.com>
References: <qpB4c9uXgeU4UbE9Zm_Pb-52QHm203ICO6IdT1w_bGQ=.be2804be-add6-4157-a335-dea1e0920e08@github.com>
 <OQOC6AdwvpZrL5teIwHml6xnayliLgsY5VjI87p5XNs=.cee478ee-56f8-4b6b-bb68-0a5c4ea24df7@github.com>
Message-ID: <PJqbHI7xhXxlokivYrcEmraq-vwZy27OIbvDFwa0piQ=.99aab7df-15ff-4c54-aef7-76d9836c7d2f@github.com>

On Tue, 16 Sep 2025 20:05:48 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> As part of [JDK-8353786](https://bugs.openjdk.org/browse/JDK-8353786), C2 support for operations backed by the vector math library was completely removed. On the JDK side, special dispatching logic was added to avoid intrinsic calls in `jdk.internal.vm.vector.VectorSupport`. But it's still possible to observe paradoxical situations (intrinsic calls with obsolete operation IDs) when processing effectively dead code.
>> 
>> Consider `FloatVector::lanewiseTemplate`:
>> 
>>     FloatVector lanewiseTemplate(VectorOperators.Unary op) {
>>         if (opKind(op, VO_SPECIAL)) {
>>             ...                             
>>             else if (opKind(op, VO_MATHLIB)) {
>>                 return unaryMathOp(op);
>>             }
>>         }
>>         int opc = opCode(op);
>>         return VectorSupport.unaryOp(opc, ...);
>>     }
>> 
>> 
>> At runtime, `unaryMathOp` is unconditionally invoked, but during compilation it's possible to end up with an intrinsification attempt of `VectorSupport.unaryOp()` before `opKind(op, VO_SPECIAL)` is inlined. 
>> 
>> It can be reliably reproduced with the `-XX:+StressIncrementalInlining` flag.
>> 
>> The fix is to make intrinsification fail fast rather than crash the VM.
>> 
>> Testing: tier1 - tier4
>
> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   review feedback

Thanks for the reviews.

-------------

PR Review: https://git.openjdk.org/jdk/pull/27263#pullrequestreview-3231511953

From vlivanov at openjdk.org  Tue Sep 16 20:09:22 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 16 Sep 2025 20:09:22 GMT
Subject: RFR: 8367333: C2: Vector math operation intrinsification failure
 [v2]
In-Reply-To: <sZvkxuXnOiN1VKfG92NEmZl2f4g0xdTBwF62_lhWlZg=.c5c434d8-00fa-4b26-a127-dad00aea9fe6@github.com>
References: <qpB4c9uXgeU4UbE9Zm_Pb-52QHm203ICO6IdT1w_bGQ=.be2804be-add6-4157-a335-dea1e0920e08@github.com>
 <sZvkxuXnOiN1VKfG92NEmZl2f4g0xdTBwF62_lhWlZg=.c5c434d8-00fa-4b26-a127-dad00aea9fe6@github.com>
Message-ID: <3Cy6jhWxbaQeWwo22L9nxPnipY1-vHsGZEtk8IZUiq8=.bfefdef7-0137-422b-a7b0-e4fae2a5b282@github.com>

On Tue, 16 Sep 2025 06:44:49 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   review feedback
>
> test/hotspot/jtreg/compiler/vectorapi/TestVectorMathLib.java line 33:
> 
>> 31:  * @test
>> 32:  * @bug 8367333
>> 33:  * @requires vm.compiler2.enabled
> 
> Do you need this `@requires`? It might be nice to be able to run this with other compilers too.

It's intended as C2-specific regression test and it relies on C2-specific VM flags. Vector API unit tests (under test/jdk/jdk/incubator/vector/) exercise the very same functionality, but don't specify flags required to trigger the bug.

> test/hotspot/jtreg/compiler/vectorapi/TestVectorMathLib.java line 40:
> 
>> 38:  *                   -XX:CompileCommand=compileonly,compiler.vectorapi.TestVectorMathLib::test*
>> 39:  *                   -XX:+StressIncrementalInlining
>> 40:  *                       compiler.vectorapi.TestVectorMathLib
> 
> Like @jatin-bhateja mentioned: alignment is off.
> I'd also like to see a run without flags, maybe with only `-XX:CompileCommand=compileonly,compiler.vectorapi.TestVectorMathLib::test*`

Again, IMO it doesn't make sense to run the regression test without stressing incremental inlining. Otherwise, it duplicates existing tests.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27263#discussion_r2353525672
PR Review Comment: https://git.openjdk.org/jdk/pull/27263#discussion_r2353527146

From vlivanov at openjdk.org  Tue Sep 16 20:09:24 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 16 Sep 2025 20:09:24 GMT
Subject: RFR: 8367333: C2: Vector math operation intrinsification failure
 [v2]
In-Reply-To: <5xYVPgSuC3a9kqp_hRs3vgtBDoJzlmf9v6wgMa9XFJ4=.c8abf0f6-b563-4b3f-92c3-d902b6e59950@github.com>
References: <qpB4c9uXgeU4UbE9Zm_Pb-52QHm203ICO6IdT1w_bGQ=.be2804be-add6-4157-a335-dea1e0920e08@github.com>
 <5xYVPgSuC3a9kqp_hRs3vgtBDoJzlmf9v6wgMa9XFJ4=.c8abf0f6-b563-4b3f-92c3-d902b6e59950@github.com>
Message-ID: <Ez-dlrWrYF5Gi4FMGiJG9uKDmeLl6qGlA7yhYwQigqQ=.fdf4af39-7255-464d-a197-20df99058d8b@github.com>

On Mon, 15 Sep 2025 15:24:58 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   review feedback
>
> test/hotspot/jtreg/compiler/vectorapi/TestVectorMathLib.java line 40:
> 
>> 38:  *                   -XX:CompileCommand=compileonly,compiler.vectorapi.TestVectorMathLib::test*
>> 39:  *                   -XX:+StressIncrementalInlining
>> 40:  *                       compiler.vectorapi.TestVectorMathLib
> 
> Suggestion:
> 
>  *                   -XX:+StressIncrementalInlining compiler.vectorapi.TestVectorMathLib

Ok, fixed. I prefer to keep test class name on a separate line.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27263#discussion_r2353534340

From jcking at openjdk.org  Tue Sep 16 21:40:04 2025
From: jcking at openjdk.org (Justin King)
Date: Tue, 16 Sep 2025 21:40:04 GMT
Subject: RFR: 8367789: AArch64 missing acquire in
 JNI_FastGetField::generate_fast_get_int_field0
Message-ID: <BCmnPdUWV9yyc_1MBRX_7gC2OHXNZJJJW3MlP8ZvhTY=.92ba900b-b1d8-4ba6-a774-ce757a2eca47@github.com>

Use a load-acquire to match the store-release used by C++ to update `safepoint_counter` during arming.
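In Java terms, the pairing restored here resembles a seqlock reader: an acquire load of the counter before the speculative field read, matched against release stores by the writer, with a re-check afterwards. This is a minimal single-JVM model, not the generated AArch64 code; the class and method names are hypothetical, and it assumes an odd counter means "safepoint in progress".

```java
import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicLong;

final class SafepointCounterModel {
    private final AtomicLong counter = new AtomicLong(0); // even: no safepoint in progress
    private int field;                                     // the Java field being read

    // Writer side (models the VM arming and disarming a safepoint).
    void beginSafepoint() { counter.setRelease(counter.get() + 1); } // becomes odd
    void endSafepoint()   { counter.setRelease(counter.get() + 1); } // even again
    void setField(int v)  { field = v; }

    /** Fast path: returns the value, or null if the caller must take the slow path. */
    Integer fastGetInt() {
        long before = counter.getAcquire();  // load-acquire, pairs with setRelease above
        if ((before & 1) != 0) {
            return null;                     // safepoint in progress: slow path
        }
        int v = field;                       // speculative read of the field
        VarHandle.loadLoadFence();           // keep the field read ahead of the re-check
        long after = counter.getAcquire();
        return (after == before) ? v : null; // counter moved: discard, take the slow path
    }
}
```

Without the acquire on the first counter load, the field read could be satisfied before the counter read, so a reader could pair an "even" counter with a value read during a safepoint.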

-------------

Commit messages:
 - JDK-8367789: Use load-acquire instead of load

Changes: https://git.openjdk.org/jdk/pull/27325/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27325&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367789
  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/27325.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27325/head:pull/27325

PR: https://git.openjdk.org/jdk/pull/27325

From duke at openjdk.org  Tue Sep 16 21:54:04 2025
From: duke at openjdk.org (Chad Rakoczy)
Date: Tue, 16 Sep 2025 21:54:04 GMT
Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache
 [v46]
In-Reply-To: <K9If679t8ipetKmZAt1YVCXy5vplvdgCEs9O9VT8d30=.cc8cbc06-ce38-4806-a5de-0e7d60957c07@github.com>
References: <CpuGkGuFlcXd3ZwuZCG8oWEEa2GKgTs3LaGwpIESm9g=.4807c870-5cce-4f87-aca7-79c1b87e7b0a@github.com>
 <S-edDUdZcJb2zCePiPAGlUTPvrhVN5GbV2e7kC7Eu78=.f8346731-b24f-4ab1-bb2b-6f8d3435e0a6@github.com>
 <K9If679t8ipetKmZAt1YVCXy5vplvdgCEs9O9VT8d30=.cc8cbc06-ce38-4806-a5de-0e7d60957c07@github.com>
Message-ID: <1mM9usJWy-ZWYMEm1qxiHfxbO1jn6zpBS_t16Xr9i64=.5f4a3243-87e4-439d-b315-fcc4be60fcca@github.com>

On Sat, 30 Aug 2025 00:32:02 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix NMethodRelocationTest.java logging race
>
> It failed on linux-x64 and linux-aarch64.
> I tried locally on linux-x64, but it passed.

@vnkozlov The bug you discovered has been fixed. Can you rerun your testing to confirm on your end?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-3300462043

From manc at openjdk.org  Tue Sep 16 21:59:10 2025
From: manc at openjdk.org (Man Cao)
Date: Tue, 16 Sep 2025 21:59:10 GMT
Subject: RFR: 8367613: Test
 compiler/runtime/TestDontCompileHugeMethods.java failed [v2]
In-Reply-To: <Ouu1htv7wGD_GqsyJhwk1-lH-0ix8UnduNiEBIhuFg0=.8f8393db-6357-4c7b-9a5b-620b3834e9b1@github.com>
References: <Ouu1htv7wGD_GqsyJhwk1-lH-0ix8UnduNiEBIhuFg0=.8f8393db-6357-4c7b-9a5b-620b3834e9b1@github.com>
Message-ID: <5eWiPUhybQOdBZAfm8LnEGLQ8ZwXHcqatCQEf8PVlgo=.ffcd2f7e-f734-49de-a2e4-1099bfb544f5@github.com>

> Hi,
> 
> Could anyone approve this change, which excludes this test when running with `-Xcomp`? This avoids the test failure reported in [JDK-8367613](https://bugs.openjdk.org/browse/JDK-8367613).
> 
> For reasons I don't yet understand, the `HugeSwitch::shortMethod` method is not compiled under `-Xcomp -XX:TieredStopAtLevel=1`. The method gets compiled with either `-Xcomp` or `-XX:TieredStopAtLevel=1`, but not both. I would appreciate it if anyone could provide insights into possible reasons.

Man Cao has updated the pull request incrementally with one additional commit since the last revision:

  Switch to disable inlining for shortMethod

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27306/files
  - new: https://git.openjdk.org/jdk/pull/27306/files/f460dc4d..93540e05

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27306&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27306&range=00-01

  Stats: 6 lines in 1 file changed: 4 ins; 1 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27306.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27306/head:pull/27306

PR: https://git.openjdk.org/jdk/pull/27306

From manc at openjdk.org  Tue Sep 16 22:09:24 2025
From: manc at openjdk.org (Man Cao)
Date: Tue, 16 Sep 2025 22:09:24 GMT
Subject: RFR: 8367613: Test
 compiler/runtime/TestDontCompileHugeMethods.java failed [v2]
In-Reply-To: <5eWiPUhybQOdBZAfm8LnEGLQ8ZwXHcqatCQEf8PVlgo=.ffcd2f7e-f734-49de-a2e4-1099bfb544f5@github.com>
References: <Ouu1htv7wGD_GqsyJhwk1-lH-0ix8UnduNiEBIhuFg0=.8f8393db-6357-4c7b-9a5b-620b3834e9b1@github.com>
 <5eWiPUhybQOdBZAfm8LnEGLQ8ZwXHcqatCQEf8PVlgo=.ffcd2f7e-f734-49de-a2e4-1099bfb544f5@github.com>
Message-ID: <t7dFJpkduQrzcnfAcPaJxmyCmEHSTUm1lr-61lNTETc=.0c1c2732-2d08-4e30-b564-b0c47450ab7d@github.com>

On Tue, 16 Sep 2025 21:59:10 GMT, Man Cao <manc at openjdk.org> wrote:

>> Hi,
>> 
>> Could anyone approve this change, which excludes this test when running with `-Xcomp`? This avoids the test failure reported in [JDK-8367613](https://bugs.openjdk.org/browse/JDK-8367613).
>> 
>> For reasons I don't yet understand, the `HugeSwitch::shortMethod` method is not compiled under `-Xcomp -XX:TieredStopAtLevel=1`. The method gets compiled with either `-Xcomp` or `-XX:TieredStopAtLevel=1`, but not both. I would appreciate it if anyone could provide insights into possible reasons.
>
> Man Cao has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Switch to disable inlining for shortMethod

Thank you for the review and suggestions.

@chhagedorn Thank you for the explanation. I switched to `-XX:CompileCommand=dontinline` for `shortMethod()`. It works for `-Xcomp -XX:TieredStopAtLevel=1`.

The benefit of the `dontinline` approach is that it allows the test to run under `-Xcomp`, especially `-Xcomp` with `-XX:-TieredCompilation`. It is also future-proof, in case C2 manages to inline `shortMethod()` into `main()` under `-Xcomp` in the future.

I also added the bug number, and excluding `-XX:TieredStopAtLevel=1` is no longer needed.
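As a sketch of what the `dontinline` approach amounts to on the command line, here is a hypothetical option list for the JVM that runs `HugeSwitch`. The real test's launcher, class names and remaining flags are not shown in this thread; the leading `*` relies on CompileCommand's prefix/suffix wildcard matching for class names.

```java
import java.util.List;

final class HugeSwitchFlags {
    // Hypothetical VM options; not copied from the actual test.
    static List<String> vmOptions() {
        return List.of(
            "-Xcomp",
            "-XX:TieredStopAtLevel=1",
            // Prevent shortMethod() from being inlined into its caller, so it is still
            // invoked, and therefore compiled, as a method of its own.
            "-XX:CompileCommand=dontinline,*HugeSwitch::shortMethod");
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", vmOptions()));
    }
}
```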

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27306#issuecomment-3300497116

From dlong at openjdk.org  Wed Sep 17 01:09:39 2025
From: dlong at openjdk.org (Dean Long)
Date: Wed, 17 Sep 2025 01:09:39 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v11]
In-Reply-To: <EZmv1oLssvikAF7lwHm77Q5kmfuKvoqXLhxGvSL1Mho=.c7db1b91-3536-48ad-90c6-c4c098094706@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <1pShdyn-7-wwwiuY1DdMt5iiZ2qc9l_x2F-3AKqkg60=.dd260953-05cc-4b84-b6d1-7f684e74084c@github.com>
 <EZmv1oLssvikAF7lwHm77Q5kmfuKvoqXLhxGvSL1Mho=.c7db1b91-3536-48ad-90c6-c4c098094706@github.com>
Message-ID: <sM14v3wTzmIjccAGdJ19bgJ_w8O6ZfVTzCDAYIPtkh4=.4c158d93-6a29-4024-b5e4-413c6ed29481@github.com>

On Fri, 12 Sep 2025 14:00:53 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Minor fix
>
> src/hotspot/share/opto/reachability.cpp line 81:
> 
>> 79:  * (c) Unfortunately, it's not straightforward to stay with safepoint-attached representation till the very end,
>> 80:  * because information about derived oops is attached to safepoints in a similar way. So, for now RFs are
>> 81:  * rematerialized at safepoints before RA (phase #3).
> 
> I still don't understand this. What is similar to what? And why is that a problem?

Why don't we put RF edges somewhere else, so they don't look like derived oops?  I was thinking they could go in the monitor area, or if that causes problems, we introduce a new area.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2353965620

From dlong at openjdk.org  Wed Sep 17 02:14:42 2025
From: dlong at openjdk.org (Dean Long)
Date: Wed, 17 Sep 2025 02:14:42 GMT
Subject: RFR: 8367706: Remove redundant register used by cmove in C1 LIR
 generation
In-Reply-To: <TL12NFqsBwHIdjdM0xzg_O4xZE5GZB8Pd07WpBYH0aM=.93d5a973-720c-4bed-b570-f21731cedc3b@github.com>
References: <TL12NFqsBwHIdjdM0xzg_O4xZE5GZB8Pd07WpBYH0aM=.93d5a973-720c-4bed-b570-f21731cedc3b@github.com>
Message-ID: <fxElxSWQUJU1pgMxWaQciWqmHJ4VmI5aXpkWB3LIJ-w=.db0434aa-8c3d-4262-b072-9f3cd6ed184b@github.com>

On Tue, 16 Sep 2025 09:35:03 GMT, lusou-zhangquan <duke at openjdk.org> wrote:

> This PR removes a redundant temp register used by cmove in C1 `LIRGenerator::do_LookupSwitch` and `LIRGenerator::do_TableSwitch`. The issue [8367706](https://bugs.openjdk.org/browse/JDK-8367706) was reported by me, and it's my pleasure to fix it.

Reversing the order of the two source arguments seems wrong.  Please explain.
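The general concern can be shown with a tiny model: swapping the two sources of a conditional move changes its result unless the condition is negated as well. `CmoveModel` is hypothetical and does not reflect C1's LIR API or the PR's code.

```java
final class CmoveModel {
    /** Models a conditional move: yields {@code a} if {@code cond} holds, else {@code b}. */
    static int cmove(boolean cond, int a, int b) {
        return cond ? a : b;
    }

    public static void main(String[] args) {
        boolean cond = true;
        System.out.println(cmove(cond, 1, 2));   // original operand order
        System.out.println(cmove(cond, 2, 1));   // sources swapped: different result
        System.out.println(cmove(!cond, 2, 1));  // swapped and condition negated: same as original
    }
}
```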

-------------

Changes requested by dlong (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27307#pullrequestreview-3232251609

From xgong at openjdk.org  Wed Sep 17 02:21:42 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Wed, 17 Sep 2025 02:21:42 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v3]
In-Reply-To: <k0ubo89q5sh66RtJ1D3sHphTg9s3NCBE_wkQv9KHDD4=.f7875a72-2bd7-45c5-ae16-1558e9997339@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <O5IGyu-C8N8goFvkFoKQxKuJ67f1_tedjCMqIwsLx1g=.69f50bdd-781e-4379-a8b5-12f8858ea299@github.com>
 <1XFXtkTlDshGtoxEdLVg0f2J2rtn4wz7CdUB9pb9N2g=.25e7e0b5-8468-4d91-adb9-c459bda40933@github.com>
 <b1TzbMFznYJuFizcy93hsTxo9-hoyDe7YKUuIsy7xRA=.6811ef6f-3b3b-4b8a-b63b-75d824e65968@github.com>
 <hjuxd7lDyNoeFhxtYBMJQA1IDwzdu5tb1ZQcBqQLSeA=.623134f4-b2b8-4010-a6b5-5815e9d29aaf@github.com>
 <ylIL4AVS9i4oBXIImUlxGzE1uDAToMvzF282-EnOG8A=.a61aaeeb-cfef-44ae-8913-ee8f6f58b781@github.com>
 <k0ubo89q5sh66RtJ1D3sHphTg9s3NCBE_wkQv9KHDD4=.f7875a72-2bd7-45c5-ae16-1558e9997339@github.com>
Message-ID: <xljifFTeZ3n1kM40N19IQfj-7cmrb2cHHYT85eIy-ig=.60af0a3d-f7be-4bf2-9e8d-eb803f0dddbb@github.com>

On Tue, 9 Sep 2025 07:27:46 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>>> To me a `false` means this: If we support gather/scatter, then we do not need a vector index; we can do without it.
>>> 
>>> Is that correct?
>> 
>> Thanks for your review!  Actually gather/scatter always need an index input. What this function want to decide is how the index elements are passed to the operations.
>> 
>> It doesn't take an assumption whether vector gather_load/scatter_store is supported or not in backend. It just checks whether the `index` input of such operations requires a vector register or an address which stores the indexes. Currently, on x86, it passes an array address for subword types (the indexes are then will be loaded one-by-one in backend codegen). However, on AArch64, we requires it a vector type for all types instead (the indexes have been loaded and saved into vector registers in IR level). 
>> 
>>> The current platform does not support vector gather-load or scatter-store at all.
>> 
>> Sorry, I didn't clarify @fg1417's second statement very clearly. Whether the current platform supports vector gather-load/scatter-store is still decided by `Matcher::match_rule_supported_vector()`, as for other operations. It returns `false` here just because arm doesn't support any vector operations. If it did want to support vector gather/scatter, the index input would not be required to be a vector, right?
>
> Thanks for all the explanations, that was very helpful!
> 
> Can you please adjust the comment so that all the relevant information is there?
> We could also make the name of the method more precise / informative?
> Maybe you could write something like this:
> 
> // true -> if gather/scatter supported: require index in vector register
> // false -> if gather/scatter supported: allows both index in vector register AND array address holding indices
> 
> Then give more information about platform specific things that you mentioned about aarch64 and x86 in the relevant files ;)

Hi @eme64, regarding the method name, is `gather_scatter_requires_index_in_vector()` fine with you? If so, I will rename the method. Or please let me know if you have a better one. Thanks!
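For readers following the naming discussion, here is a minimal scalar sketch of the two index-passing conventions described above. It assumes nothing about HotSpot internals: the class and method names (`SubwordGatherSketch`, `gatherIndexInVector`, `gatherIndexInAddr`) are hypothetical, and the `int[]` copy merely stands in for "indexes already loaded into a vector register at the IR level". Both conventions compute the same gather result; they differ only in what the backend receives.

```java
import java.util.Arrays;

// Hypothetical scalar model of a subword gather load; not HotSpot or Vector API code.
final class SubwordGatherSketch {

    // Gather semantics: lane i reads base[offset + indexMap[mapOffset + i]].
    static byte[] gatherReference(byte[] base, int offset, int[] indexMap, int mapOffset, int lanes) {
        byte[] result = new byte[lanes];
        for (int i = 0; i < lanes; i++) {
            result[i] = base[offset + indexMap[mapOffset + i]];
        }
        return result;
    }

    // "Index in vector" convention: the indexes were loaded into a vector register at the
    // IR level (modeled as an int[] holding exactly one index per lane).
    static byte[] gatherIndexInVector(byte[] base, int offset, int[] indexVector) {
        byte[] result = new byte[indexVector.length];
        for (int lane = 0; lane < indexVector.length; lane++) {
            result[lane] = base[offset + indexVector[lane]];
        }
        return result;
    }

    // "Index in address" convention: the backend receives the index array plus its offset
    // and loads each index one by one while generating code.
    static byte[] gatherIndexInAddr(byte[] base, int offset, int[] indexArray, int indexOffset, int lanes) {
        byte[] result = new byte[lanes];
        for (int lane = 0; lane < lanes; lane++) {
            int index = indexArray[indexOffset + lane]; // one scalar index load per lane
            result[lane] = base[offset + index];
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] base = {10, 11, 12, 13, 14, 15, 16, 17};
        int[] indexMap = {0, 7, 3, 3, 5, 1};
        int[] indexVector = Arrays.copyOfRange(indexMap, 1, 5); // models the IR-level index load
        System.out.println(Arrays.toString(gatherIndexInVector(base, 0, indexVector)));
        System.out.println(Arrays.toString(gatherIndexInAddr(base, 0, indexMap, 1, 4)));
    }
}
```

Both calls in `main` gather lanes 7, 3, 3 and 5 of `base`; only the form in which the indexes reach the "backend" changes.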

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2354066760

From duke at openjdk.org  Wed Sep 17 02:47:21 2025
From: duke at openjdk.org (lusou-zhangquan)
Date: Wed, 17 Sep 2025 02:47:21 GMT
Subject: RFR: 8367706: Remove redundant register used by cmove in C1 LIR
 generation [v2]
In-Reply-To: <TL12NFqsBwHIdjdM0xzg_O4xZE5GZB8Pd07WpBYH0aM=.93d5a973-720c-4bed-b570-f21731cedc3b@github.com>
References: <TL12NFqsBwHIdjdM0xzg_O4xZE5GZB8Pd07WpBYH0aM=.93d5a973-720c-4bed-b570-f21731cedc3b@github.com>
Message-ID: <58MAR1O9tGfnVcoCfbv17BI-IP2qC2BuYDYc3GZQ30Q=.3a60b666-cb80-405a-9a98-d46bf724f7c0@github.com>

> This PR removes a redundant temp register used by cmove in C1's LIRGenerator::do_LookupSwitch and LIRGenerator::do_TableSwitch. The issue [8367706](https://bugs.openjdk.org/browse/JDK-8367706) was reported by me, and it's my pleasure to fix it.

lusou-zhangquan has updated the pull request incrementally with one additional commit since the last revision:

  Fix wrong source register order

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27307/files
  - new: https://git.openjdk.org/jdk/pull/27307/files/233e7681..aeb9cfc4

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27307&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27307&range=00-01

  Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/27307.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27307/head:pull/27307

PR: https://git.openjdk.org/jdk/pull/27307

From duke at openjdk.org  Wed Sep 17 03:16:35 2025
From: duke at openjdk.org (erifan)
Date: Wed, 17 Sep 2025 03:16:35 GMT
Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of
 vector compress [v2]
In-Reply-To: <EsiYouuvqjFpbUVOKPtUBymx12t--iEc7QNUwBrdDJo=.545aa38b-8933-44a7-9ae5-51872308596c@github.com>
References: <tP4CE7n1GaoLjhB6BfpKBNaX_GCTwI-cJjUbanwm4Qg=.eb29b9ce-10b0-4d5c-8407-25deda9c3f6d@github.com>
 <NsYuK9-Y_-7KzBniFLpkNeiLslPm-a83mE_GMvMN7oo=.109f1b82-42b1-4f09-b8af-99bc2a9f2528@github.com>
 <EsiYouuvqjFpbUVOKPtUBymx12t--iEc7QNUwBrdDJo=.545aa38b-8933-44a7-9ae5-51872308596c@github.com>
Message-ID: <Qrf5fEdzfsAlFRuEf2DrPf7Thj4xkdOhM_pjWv3j82Y=.1272dda2-b5bd-4649-a87d-d086f5d99fea@github.com>

On Tue, 16 Sep 2025 07:02:23 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> erifan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>> 
>>  - Merge branch 'master' into JDK-8366333-compress
>>  - 8366333: AArch64: Enhance SVE subword type implementation of vector compress
>>    
>>    The AArch64 SVE and SVE2 architectures lack an instruction suitable for
>>    subword-type `compress` operations. Therefore, the current implementation
>>    uses the 32-bit SVE `compact` instruction to compress subword types by
>>    first widening the high and low parts to 32 bits, compressing them, and
>>    then narrowing them back to their original type. Finally, the high and
>>    low parts are merged using the `index + tbl` instructions.
>>    
>>    This approach is significantly slower compared to architectures with native
>>    support. After evaluating all available AArch64 SVE instructions and
>>    experimenting with various implementations (such as looping over the active
>>    elements, extraction, and insertion), I confirmed that the existing algorithm
>>    is optimal given the instruction set. However, there is still room for
>>    optimization in the following two aspects:
>>    1. Merging with `index + tbl` is suboptimal due to the high latency of
>>    the `index` instruction.
>>    2. For partial subword types, operations on the upper half are unnecessary
>>    because those bits are invalid.
>>    
>>    This pull request introduces the following changes:
>>    1. Replaces `index + tbl` with the `whilelt + splice` instructions, which
>>    offer lower latency and higher throughput.
>>    2. Eliminates unnecessary compress operations for partial subword type cases.
>>    3. For `sve_compress_byte`, one less temporary register is used to alleviate
>>    potential register pressure.
>>    
>>    Benchmark results demonstrate that these changes significantly improve performance.
>>    
>>    Benchmarks on Nvidia Grace machine with 128-bit SVE:
>>    ```
>>    Benchmark	        Unit	Before	 Error	After	 Error	Uplift
>>    Byte128Vector.compress	ops/ms	4846.97	 26.23	6638.56	 31.60	1.36
>>    Byte64Vector.compress	ops/ms	2447.69	 12.95	7167.68	 34.49	2.92
>>    Short128Vector.compress	ops/ms	7174.88	 40.94	8398.45	 9.48	1.17
>>    Short64Vector.compress	ops/ms	3618.72	 3.04	8618.22	 10.91	2.38
>>    ```
>>    
>>    This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments,
>>    and all tests passed.
>
> test/hotspot/jtreg/compiler/vectorapi/VectorCompressTest.java line 36:
> 
>> 34:  * @key randomness
>> 35:  * @library /test/lib /
>> 36:  * @summary AArch64: Enhance SVE subword type implementation of vector compress
> 
> I would change the summary to something a bit more generic, since the test is not only good for aarch64 / SVE.
> Suggestion:
> 
>  * @summary IR test for VectorAPI compress

It seems that the summary and the PR title are usually consistent. Is there any convention or rule for this?

> test/hotspot/jtreg/compiler/vectorapi/VectorCompressTest.java line 228:
> 
>> 226:                      .start();
>> 227:     }
>> 228: }
> 
> Question: is there already another test that checks `compress`?

Yes, just like `expand`, it's here https://github.com/openjdk/jdk/blob/986ecff5f9b16f1b41ff15ad94774d65f3a4631d/test/jdk/jdk/incubator/vector/Byte128VectorTests.java#L5357
This test file is mainly for IR testing.
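To make the algorithm in the commit message concrete, here is a minimal scalar sketch of subword `compress`: a reference version (selected lanes packed toward lane 0, remaining lanes zero, as in the Vector API's `compress`) next to a model of the "split into halves, widen, 32-bit `compact`, narrow, then merge with `whilelt + splice`" approach. This is an illustration under simplifying assumptions, not the actual macro-assembler code: widening is collapsed into a single step, register widths are ignored, and all names (`compact32`, `spliceFirstN`, `compressViaHalves`) are hypothetical. The merge works because `whilelt` yields a predicate covering exactly the first `lowCount` lanes, and `splice` copies that active segment from the first operand and then fills the rest from the start of the second.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical scalar model of SVE subword compress; not HotSpot code.
final class SubwordCompressSketch {

    // Reference semantics: selected lanes packed toward lane 0, remaining lanes zero.
    static byte[] compressReference(byte[] src, boolean[] mask) {
        byte[] out = new byte[src.length];
        int j = 0;
        for (int i = 0; i < src.length; i++) {
            if (mask[i]) out[j++] = src[i];
        }
        return out;
    }

    // Models a 32-bit SVE "compact": active int lanes packed to the bottom, rest zeroed.
    static int[] compact32(int[] src, boolean[] mask) {
        int[] out = new int[src.length];
        int j = 0;
        for (int i = 0; i < src.length; i++) {
            if (mask[i]) out[j++] = src[i];
        }
        return out;
    }

    // Widen lanes [start, start + len) to 32 bits, compact, then narrow back into a
    // full-width "register" whose upper lanes stay zero.
    static byte[] compressHalf(byte[] src, boolean[] mask, int start, int len, int lanes) {
        int[] wide = new int[len];
        boolean[] wideMask = new boolean[len];
        for (int i = 0; i < len; i++) {
            wide[i] = src[start + i];          // sign-extending widen
            wideMask[i] = mask[start + i];
        }
        int[] packed = compact32(wide, wideMask);
        byte[] narrowed = new byte[lanes];
        for (int i = 0; i < len; i++) {
            narrowed[i] = (byte) packed[i];    // truncating narrow
        }
        return narrowed;
    }

    // Models "whilelt + splice": keep the first `count` lanes of `first`,
    // then fill the remaining lanes from the lowest lanes of `second`.
    static byte[] spliceFirstN(byte[] first, int count, byte[] second, int lanes) {
        byte[] out = new byte[lanes];
        System.arraycopy(first, 0, out, 0, count);
        System.arraycopy(second, 0, out, count, lanes - count);
        return out;
    }

    // Assumes an even lane count, as for full-width byte/short vectors.
    static byte[] compressViaHalves(byte[] src, boolean[] mask) {
        int lanes = src.length;
        int half = lanes / 2;
        byte[] low = compressHalf(src, mask, 0, half, lanes);
        byte[] high = compressHalf(src, mask, half, half, lanes);
        int lowCount = 0;
        for (int i = 0; i < half; i++) {
            if (mask[i]) lowCount++;
        }
        return spliceFirstN(low, lowCount, high, lanes);
    }

    public static void main(String[] args) {
        byte[] src = {1, 2, 3, 4, 5, 6, 7, 8};
        boolean[] mask = {true, false, true, false, true, false, true, false};
        System.out.println(Arrays.toString(compressViaHalves(src, mask)));

        // Randomized cross-check of the halves-based model against the reference.
        Random rnd = new Random(42);
        for (int iter = 0; iter < 1000; iter++) {
            byte[] s = new byte[16];
            rnd.nextBytes(s);
            boolean[] m = new boolean[16];
            for (int i = 0; i < m.length; i++) m[i] = rnd.nextBoolean();
            if (!Arrays.equals(compressReference(s, m), compressViaHalves(s, m))) {
                throw new AssertionError("mismatch at iteration " + iter);
            }
        }
    }
}
```

The small example in `main` packs lanes 1, 3, 5 and 7 into the low lanes and zeroes the rest. The randomized loop checks that the halves-based model agrees with the reference semantics.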

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27188#discussion_r2354169473
PR Review Comment: https://git.openjdk.org/jdk/pull/27188#discussion_r2354167428

From dlong at openjdk.org  Wed Sep 17 03:33:34 2025
From: dlong at openjdk.org (Dean Long)
Date: Wed, 17 Sep 2025 03:33:34 GMT
Subject: RFR: 8367706: Remove redundant register used by cmove in C1 LIR
 generation [v2]
In-Reply-To: <58MAR1O9tGfnVcoCfbv17BI-IP2qC2BuYDYc3GZQ30Q=.3a60b666-cb80-405a-9a98-d46bf724f7c0@github.com>
References: <TL12NFqsBwHIdjdM0xzg_O4xZE5GZB8Pd07WpBYH0aM=.93d5a973-720c-4bed-b570-f21731cedc3b@github.com>
 <58MAR1O9tGfnVcoCfbv17BI-IP2qC2BuYDYc3GZQ30Q=.3a60b666-cb80-405a-9a98-d46bf724f7c0@github.com>
Message-ID: <6ZCa4q2sQbr59eHejCBQgdek27IHOuPQkdqln0OiFW8=.4251ab15-a8fa-4c49-8f03-8209be1a787d@github.com>

On Wed, 17 Sep 2025 02:47:21 GMT, lusou-zhangquan <duke at openjdk.org> wrote:

>> This PR removes a redundant temp register used by cmove in C1's LIRGenerator::do_LookupSwitch and LIRGenerator::do_TableSwitch. The issue [8367706](https://bugs.openjdk.org/browse/JDK-8367706) was reported by me, and it's my pleasure to fix it.
>
> lusou-zhangquan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix wrong source register order

Marked as reviewed by dlong (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/27307#pullrequestreview-3232502452

From epeter at openjdk.org  Wed Sep 17 06:01:42 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 17 Sep 2025 06:01:42 GMT
Subject: RFR: 8366878: Improve flags of
 compiler/loopopts/superword/TestAlignVectorFuzzer.java [v2]
In-Reply-To: <mJBCJpEMT_Yl1s5b8M2Yu6gJo18FZepU9Pj1zqUqZBU=.07df1c29-c004-437f-b611-9a71299aafd7@github.com>
References: <jqOL72bER1G6aunV15U5OE4iUltIiv9dfhdJrvkyy3k=.7697a330-3de2-4591-96e3-70dc43e7139a@github.com>
 <mJBCJpEMT_Yl1s5b8M2Yu6gJo18FZepU9Pj1zqUqZBU=.07df1c29-c004-437f-b611-9a71299aafd7@github.com>
Message-ID: <bEv5hn6ZblGLpIUUvTJvxCFBYiJv9GsbsW2FdZv5Zuc=.c73d9d88-eb61-42f0-a9f5-b61b214744f5@github.com>

On Tue, 16 Sep 2025 15:42:39 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> The test definitions of `TestAlignVectorFuzzer.java` all contain `printcompilation` directives. These are redundant and slow down a test that already often times out. @eme64 also suggested adding a `compileonly` directive to one of the four tests.
>> 
>> Testing:
>>  - [ ] Github Actions
>>  - [ ] tier1 and stress testing (features `TestAlignVectorFuzzer.java`)
>
> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8366878-align-fuzz-flags
>  - Make compileonly a separate run
>  - Fix flags

Looks good :)

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27122#pullrequestreview-3232783938

From jwaters at openjdk.org  Wed Sep 17 06:02:38 2025
From: jwaters at openjdk.org (Julian Waters)
Date: Wed, 17 Sep 2025 06:02:38 GMT
Subject: RFR: 8367706: Remove redundant register used by cmove in C1 LIR
 generation [v2]
In-Reply-To: <58MAR1O9tGfnVcoCfbv17BI-IP2qC2BuYDYc3GZQ30Q=.3a60b666-cb80-405a-9a98-d46bf724f7c0@github.com>
References: <TL12NFqsBwHIdjdM0xzg_O4xZE5GZB8Pd07WpBYH0aM=.93d5a973-720c-4bed-b570-f21731cedc3b@github.com>
 <58MAR1O9tGfnVcoCfbv17BI-IP2qC2BuYDYc3GZQ30Q=.3a60b666-cb80-405a-9a98-d46bf724f7c0@github.com>
Message-ID: <wMnf-KsDEXwZE8S8PLP-1pqS5DXbqe5xzj_z-rQ-5Cc=.59b239f3-48f3-4df1-9c73-8152a77907de@github.com>

On Wed, 17 Sep 2025 02:47:21 GMT, lusou-zhangquan <duke at openjdk.org> wrote:

>> This PR removes a redundant temp register used by cmove in C1's LIRGenerator::do_LookupSwitch and LIRGenerator::do_TableSwitch. The issue [8367706](https://bugs.openjdk.org/browse/JDK-8367706) was reported by me, and it's my pleasure to fix it.
>
> lusou-zhangquan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix wrong source register order

According to GitHub Actions, this appears to cause all x64 JDK builds to behave incorrectly.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27307#issuecomment-3301423528

From epeter at openjdk.org  Wed Sep 17 06:07:45 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 17 Sep 2025 06:07:45 GMT
Subject: RFR: 8367333: C2: Vector math operation intrinsification failure
 [v2]
In-Reply-To: <3Cy6jhWxbaQeWwo22L9nxPnipY1-vHsGZEtk8IZUiq8=.bfefdef7-0137-422b-a7b0-e4fae2a5b282@github.com>
References: <qpB4c9uXgeU4UbE9Zm_Pb-52QHm203ICO6IdT1w_bGQ=.be2804be-add6-4157-a335-dea1e0920e08@github.com>
 <sZvkxuXnOiN1VKfG92NEmZl2f4g0xdTBwF62_lhWlZg=.c5c434d8-00fa-4b26-a127-dad00aea9fe6@github.com>
 <3Cy6jhWxbaQeWwo22L9nxPnipY1-vHsGZEtk8IZUiq8=.bfefdef7-0137-422b-a7b0-e4fae2a5b282@github.com>
Message-ID: <WlNycrFJxUGh9p1sE-I-cTwAuidgnPGUQl-98GxqQG0=.4419f936-35e7-4b64-aa50-831247e8390c@github.com>

On Tue, 16 Sep 2025 20:00:10 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> test/hotspot/jtreg/compiler/vectorapi/TestVectorMathLib.java line 33:
>> 
>>> 31:  * @test
>>> 32:  * @bug 8367333
>>> 33:  * @requires vm.compiler2.enabled
>> 
>> Do you need this `@requires`? It might be nice to be able to run this with other compilers too.
>
> It's intended as a C2-specific regression test, and it relies on C2-specific VM flags. Vector API unit tests (under test/jdk/jdk/incubator/vector/) exercise the very same functionality, but don't specify the flags required to trigger the bug.

I leave it up to you. You can always ignore unrecognized flags. And tests tend to diverge over time, so maybe a little duplication would not hurt. But up to you.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27263#discussion_r2354404792

From epeter at openjdk.org  Wed Sep 17 06:11:42 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 17 Sep 2025 06:11:42 GMT
Subject: RFR: 8367333: C2: Vector math operation intrinsification failure
 [v2]
In-Reply-To: <OQOC6AdwvpZrL5teIwHml6xnayliLgsY5VjI87p5XNs=.cee478ee-56f8-4b6b-bb68-0a5c4ea24df7@github.com>
References: <qpB4c9uXgeU4UbE9Zm_Pb-52QHm203ICO6IdT1w_bGQ=.be2804be-add6-4157-a335-dea1e0920e08@github.com>
 <OQOC6AdwvpZrL5teIwHml6xnayliLgsY5VjI87p5XNs=.cee478ee-56f8-4b6b-bb68-0a5c4ea24df7@github.com>
Message-ID: <8j6oTk-ZdlV7VH7N9gfyWYaBLPezIdi11dy5r9892c8=.bb57a43e-2823-4c0b-8cde-c96d8fd3df4f@github.com>

On Tue, 16 Sep 2025 20:09:18 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> As part of [JDK-8353786](https://bugs.openjdk.org/browse/JDK-8353786), C2 support for operations backed by the vector math library was completely removed. On the JDK side, special dispatching logic was added to avoid intrinsic calls in `jdk.internal.vm.vector.VectorSupport`. But it's still possible to observe such paradoxical situations (intrinsic calls with obsolete operation IDs) when processing effectively dead code.
>> 
>> Consider `FloatVector::lanewiseTemplate`:
>> 
>>     FloatVector lanewiseTemplate(VectorOperators.Unary op) {
>>         if (opKind(op, VO_SPECIAL)) {
>>             ...                             
>>             else if (opKind(op, VO_MATHLIB)) {
>>                 return unaryMathOp(op);
>>             }
>>         }
>>         int opc = opCode(op);
>>         return VectorSupport.unaryOp(opc, ...);
>>     }
>> 
>> 
>> At runtime, `unaryMathOp` is unconditionally invoked, but during compilation it's possible to end up with an intrinsification attempt of `VectorSupport.unaryOp()` before `opKind(op, VO_SPECIAL)` is inlined. 
>> 
>> It can be reliably reproduced with the `-XX:+StressIncrementalInlining` flag.
>> 
>> The fix is to make intrinsification fail fast rather than crash the VM.
>> 
>> Testing: tier1 - tier4
>
> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   review feedback

Looks reasonable.

I leave it up to you with the `@requires`. I was wondering why not just add the extra run with special flags to the original test? But I don't want to hold up the PR, so up to you ;)
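The "fail fast instead of crashing" idea from the PR description can be sketched as follows, in plain Java and under loud assumptions: the operation IDs, the `SUPPORTED` set and `inlineUnaryOp` are all hypothetical and do not mirror the real C2 intrinsic code. The point is only the shape of the fix. When an intrinsic handler sees an operation ID it no longer implements (for example a VO_MATHLIB op reached on an effectively dead path, before `opKind(op, VO_SPECIAL)` is inlined), it reports "not intrinsified" and leaves the original call in place rather than asserting.

```java
import java.util.Set;

// Hypothetical model of an intrinsic handler that bails out instead of crashing; not HotSpot code.
final class FailFastIntrinsicSketch {

    // Hypothetical operation IDs.
    static final int OP_NEG = 1;
    static final int OP_ABS = 2;
    static final int OP_SIN_MATHLIB = 100; // obsolete: no longer backed by the compiler

    static final Set<Integer> SUPPORTED = Set.of(OP_NEG, OP_ABS);

    // Returns true if the call would be replaced by a vector node,
    // false to keep the original (non-intrinsified) call in the graph.
    static boolean inlineUnaryOp(int opc) {
        if (!SUPPORTED.contains(opc)) {
            // Only reachable on paths that are dead at runtime, so bailing out is safe:
            // the real dispatch routes math-library ops elsewhere.
            return false;
        }
        return true; // would build the vector IR node here
    }

    public static void main(String[] args) {
        System.out.println("neg intrinsified: " + inlineUnaryOp(OP_NEG));
        System.out.println("sin intrinsified: " + inlineUnaryOp(OP_SIN_MATHLIB));
    }
}
```

Returning `false` rather than asserting keeps compilation going. The leftover call is harmless because, as the PR description notes, math-library ops take a different path at runtime.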

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27263#pullrequestreview-3232804061

From epeter at openjdk.org  Wed Sep 17 06:11:44 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 17 Sep 2025 06:11:44 GMT
Subject: RFR: 8367333: C2: Vector math operation intrinsification failure
 [v2]
In-Reply-To: <WlNycrFJxUGh9p1sE-I-cTwAuidgnPGUQl-98GxqQG0=.4419f936-35e7-4b64-aa50-831247e8390c@github.com>
References: <qpB4c9uXgeU4UbE9Zm_Pb-52QHm203ICO6IdT1w_bGQ=.be2804be-add6-4157-a335-dea1e0920e08@github.com>
 <sZvkxuXnOiN1VKfG92NEmZl2f4g0xdTBwF62_lhWlZg=.c5c434d8-00fa-4b26-a127-dad00aea9fe6@github.com>
 <3Cy6jhWxbaQeWwo22L9nxPnipY1-vHsGZEtk8IZUiq8=.bfefdef7-0137-422b-a7b0-e4fae2a5b282@github.com>
 <WlNycrFJxUGh9p1sE-I-cTwAuidgnPGUQl-98GxqQG0=.4419f936-35e7-4b64-aa50-831247e8390c@github.com>
Message-ID: <k8fR7jU8NQKCVvJ3epN1Me5Yvp19D60-HDCghKqoxsU=.3b6d27c7-b155-4325-8b8c-a5d624eee69a@github.com>

On Wed, 17 Sep 2025 06:05:11 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> It's intended as a C2-specific regression test, and it relies on C2-specific VM flags. Vector API unit tests (under test/jdk/jdk/incubator/vector/) exercise the very same functionality, but don't specify the flags required to trigger the bug.
>
> I leave it up to you. You can always ignore unrecognized flags. And tests tend to diverge over time, so maybe a little duplication would not hurt. But up to you.

And if it is a duplication, you should probably leave a comment linking it to the other one.
Also: why not just add the extra run over at the original test?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27263#discussion_r2354410385

From epeter at openjdk.org  Wed Sep 17 06:21:35 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 17 Sep 2025 06:21:35 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v3]
In-Reply-To: <xljifFTeZ3n1kM40N19IQfj-7cmrb2cHHYT85eIy-ig=.60af0a3d-f7be-4bf2-9e8d-eb803f0dddbb@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <O5IGyu-C8N8goFvkFoKQxKuJ67f1_tedjCMqIwsLx1g=.69f50bdd-781e-4379-a8b5-12f8858ea299@github.com>
 <1XFXtkTlDshGtoxEdLVg0f2J2rtn4wz7CdUB9pb9N2g=.25e7e0b5-8468-4d91-adb9-c459bda40933@github.com>
 <b1TzbMFznYJuFizcy93hsTxo9-hoyDe7YKUuIsy7xRA=.6811ef6f-3b3b-4b8a-b63b-75d824e65968@github.com>
 <hjuxd7lDyNoeFhxtYBMJQA1IDwzdu5tb1ZQcBqQLSeA=.623134f4-b2b8-4010-a6b5-5815e9d29aaf@github.com>
 <ylIL4AVS9i4oBXIImUlxGzE1uDAToMvzF282-EnOG8A=.a61aaeeb-cfef-44ae-8913-ee8f6f58b781@github.com>
 <k0ubo89q5sh66RtJ1D3sHphTg9s3NCBE_wkQv9KHDD4=.f7875a72-2bd7-45c5-ae16-1558e9997339@github.com>
 <xljifFTeZ3n1kM40N19IQfj-7cmrb2cHHYT85eIy-ig=.60af0a3d-f7be-4bf2-9e8d-eb803f0dddbb@github.com>
Message-ID: <8QfxQ-oTWaWPUIHkOODfYCHUyAzxhGksLwX56sKva10=.764e4cbe-4416-47c6-8c11-5d282249b017@github.com>

On Wed, 17 Sep 2025 02:18:17 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> Thanks for all the explanations, that was very helpful!
>> 
>> Can you please adjust the comment so that all the relevant information is there?
>> We could also make the name of the method more precise / informative?
>> Maybe you could write something like this:
>> 
>> // true -> if gather/scatter supported: require index in vector register
>> // false -> if gather/scatter supported: allows both index in vector register AND array address holding indices
>> 
>> Then give more information about platform specific things that you mentioned about aarch64 and x86 in the relevant files ;)
>
> Hi @eme64, regarding the method name, is `gather_scatter_requires_index_in_vector()` fine with you? If so, I will rename the method. Or please let me know if you have a better one. Thanks!
> 
> BTW, do you think it would be better to invert the meaning of this method, e.g. `gather_scatter_requires_index_in_addr()`? Gather/scatter is a vector operation, so accepting a vector index input usually makes sense by default, and this holds for all word and double-word types. The subword type load that requires the indexes to be stored at an address on x86 is a corner case to me.

If I understood right, some platforms only support addr, some only index, right? Are there any that support both? You could also have 2 methods, that say `allows` or maybe more idiomatically for hotspot C2 `implements / implemented`. Yet another alternative: `enum` with the different states.

These are just some ideas. But from what you are telling me, it would really make sense to go with `gather_scatter_requires_index_in_addr`, since the addr case is indeed a corner-case.
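As a rough illustration of the two API shapes floated here (a boolean named after the corner case versus an `enum` with one state per convention), consider the following Java sketch. Everything in it is hypothetical: `Arch`, `IndexStyle`, `gatherScatterRequiresIndexInAddr` and `indexStyle` only model the behavior described in this thread (x86 wants an index array address for subword types, AArch64 wants a vector index for all types). Like the real query, this says nothing about whether gather/scatter is supported at all; that remains the job of `Matcher::match_rule_supported_vector()`.

```java
// Hypothetical model of the per-platform index-style query; not HotSpot code.
final class GatherIndexStyleSketch {

    enum Arch { X86_64, AARCH64 }

    enum IndexStyle { VECTOR_REGISTER, ARRAY_ADDRESS }

    // Element size in bytes: 1 = byte, 2 = short/char, 4 = int/float, 8 = long/double.
    static boolean isSubword(int elemBytes) {
        return elemBytes < 4;
    }

    // Boolean form, named after the corner case: true only when the backend
    // wants the address of the index array instead of a vector of indexes.
    static boolean gatherScatterRequiresIndexInAddr(Arch arch, int elemBytes) {
        return arch == Arch.X86_64 && isSubword(elemBytes);
    }

    // Enum form: the same decision, with each convention named explicitly.
    static IndexStyle indexStyle(Arch arch, int elemBytes) {
        return gatherScatterRequiresIndexInAddr(arch, elemBytes)
                ? IndexStyle.ARRAY_ADDRESS
                : IndexStyle.VECTOR_REGISTER;
    }

    public static void main(String[] args) {
        for (Arch arch : Arch.values()) {
            for (int bytes : new int[] {1, 2, 4, 8}) {
                System.out.println(arch + " elemBytes=" + bytes + " -> " + indexStyle(arch, bytes));
            }
        }
    }
}
```

The boolean form reads naturally when only one platform deviates from the default. The enum form would scale better if a third convention ever appeared.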

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2354427486

From xgong at openjdk.org  Wed Sep 17 06:25:42 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Wed, 17 Sep 2025 06:25:42 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v3]
In-Reply-To: <8QfxQ-oTWaWPUIHkOODfYCHUyAzxhGksLwX56sKva10=.764e4cbe-4416-47c6-8c11-5d282249b017@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <O5IGyu-C8N8goFvkFoKQxKuJ67f1_tedjCMqIwsLx1g=.69f50bdd-781e-4379-a8b5-12f8858ea299@github.com>
 <1XFXtkTlDshGtoxEdLVg0f2J2rtn4wz7CdUB9pb9N2g=.25e7e0b5-8468-4d91-adb9-c459bda40933@github.com>
 <b1TzbMFznYJuFizcy93hsTxo9-hoyDe7YKUuIsy7xRA=.6811ef6f-3b3b-4b8a-b63b-75d824e65968@github.com>
 <hjuxd7lDyNoeFhxtYBMJQA1IDwzdu5tb1ZQcBqQLSeA=.623134f4-b2b8-4010-a6b5-5815e9d29aaf@github.com>
 <ylIL4AVS9i4oBXIImUlxGzE1uDAToMvzF282-EnOG8A=.a61aaeeb-cfef-44ae-8913-ee8f6f58b781@github.com>
 <k0ubo89q5sh66RtJ1D3sHphTg9s3NCBE_wkQv9KHDD4=.f7875a72-2bd7-45c5-ae16-1558e9997339@github.com>
 <xljifFTeZ3n1kM40N19IQfj-7cmrb2cHHYT85eIy-ig=.60af0a3d-f7be-4bf2-9e8d-eb803f0dddbb@github.com>
 <8QfxQ-oTWaWPUIHkOODfYCHUyAzxhGksLwX56sKva10=.764e4cbe-4416-47c6-8c11-5d282249b017@github.com>
Message-ID: <B9YO_DM5yO5xKcFlyAK84hLg6aMNczHcv1WekMXknmE=.b38c7119-f442-4c23-9595-bd50b56ef2ac@github.com>

On Wed, 17 Sep 2025 06:18:30 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> If I understood right, some platforms only support addr, some only index, right? Are there any that support both? 

Right. I don't think any architecture supports both styles. Either a vector index or an array address is enough.

Besides, C2 has the helper `Matcher::match_rule_supported_vector()`, which checks whether an op is implemented.

> These are just some ideas. But from what you are telling me, it would really make sense to go with gather_scatter_requires_index_in_addr, since the addr case is indeed a corner-case.

Yes, I think this would be better. Thanks!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2354434754

From epeter at openjdk.org  Wed Sep 17 06:27:36 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 17 Sep 2025 06:27:36 GMT
Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of
 vector compress [v2]
In-Reply-To: <Qrf5fEdzfsAlFRuEf2DrPf7Thj4xkdOhM_pjWv3j82Y=.1272dda2-b5bd-4649-a87d-d086f5d99fea@github.com>
References: <tP4CE7n1GaoLjhB6BfpKBNaX_GCTwI-cJjUbanwm4Qg=.eb29b9ce-10b0-4d5c-8407-25deda9c3f6d@github.com>
 <NsYuK9-Y_-7KzBniFLpkNeiLslPm-a83mE_GMvMN7oo=.109f1b82-42b1-4f09-b8af-99bc2a9f2528@github.com>
 <EsiYouuvqjFpbUVOKPtUBymx12t--iEc7QNUwBrdDJo=.545aa38b-8933-44a7-9ae5-51872308596c@github.com>
 <Qrf5fEdzfsAlFRuEf2DrPf7Thj4xkdOhM_pjWv3j82Y=.1272dda2-b5bd-4649-a87d-d086f5d99fea@github.com>
Message-ID: <WxnlIDe3oqkkECuLdBvLKF3XxKxVw8VSL-v2jSpyfgY=.d2d9b002-be0b-4d75-a72d-6ac2affd9cd1@github.com>

On Wed, 17 Sep 2025 03:14:02 GMT, erifan <duke at openjdk.org> wrote:

>> test/hotspot/jtreg/compiler/vectorapi/VectorCompressTest.java line 36:
>> 
>>> 34:  * @key randomness
>>> 35:  * @library /test/lib /
>>> 36:  * @summary AArch64: Enhance SVE subword type implementation of vector compress
>> 
>> I would change the summary to something a bit more generic, since the test is not only good for aarch64 / SVE.
>> Suggestion:
>> 
>>  * @summary IR test for VectorAPI compress
>
> It seems that the summary and the PR title are usually consistent. Is there any convention or rule for this?

I think that people often just do whatever they feel like. But I think the summary should summarize the content of the test, give maybe a reason for the test. Sometimes the PR title captures the intent of the test, then I'm fine with that. But sometimes the PR title is not quite adequate, maybe too narrow like here. But it is not a big deal, just a little nit ;)

>> test/hotspot/jtreg/compiler/vectorapi/VectorCompressTest.java line 228:
>> 
>>> 226:                      .start();
>>> 227:     }
>>> 228: }
>> 
>> Question: is there already another test that checks `compress`?
>
> Yes, just like `expand`, it's here https://github.com/openjdk/jdk/blob/986ecff5f9b16f1b41ff15ad94774d65f3a4631d/test/jdk/jdk/incubator/vector/Byte128VectorTests.java#L5357
> This test file is mainly for IR testing.

Nice, thanks! I forgot to search over there.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27188#discussion_r2354439774
PR Review Comment: https://git.openjdk.org/jdk/pull/27188#discussion_r2354435654

From epeter at openjdk.org  Wed Sep 17 07:02:43 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 17 Sep 2025 07:02:43 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare
 [v13]
In-Reply-To: <Uw3Zu_QQI6cPp7DUf4uHNO_ZsrSWS1jnnJJ3gtfXyi8=.c2a93a2c-f3cf-4489-98aa-e25abc448fd7@github.com>
References: <mWHKJozX3YTvBOjLDUa1_2QQiNm6xVc2rb1FtXcZzPc=.e2405ec4-09ed-46c1-acd5-3f39083fce3f@github.com>
 <Uw3Zu_QQI6cPp7DUf4uHNO_ZsrSWS1jnnJJ3gtfXyi8=.c2a93a2c-f3cf-4489-98aa-e25abc448fd7@github.com>
Message-ID: <qkFgbTfGz30T_AzW9FwkZIsMZ7oB9Fj5eydVC5Ry3B4=.09f8c379-e4eb-41c2-9aa3-e7cb967d42ba@github.com>

On Mon, 15 Sep 2025 05:43:11 GMT, erifan <duke at openjdk.org> wrote:

>> This patch optimizes the following patterns:
>> For integer types:
>> 
>> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> 
>> `cond` can be eq, ne, le, ge, lt, gt, ule, uge, ult or ugt; `ncond` is the negation of `cond`.
>> 
>> For float and double types:
>> 
>> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> 
>> cond can be eq or ne.
>> 
>> Benchmarks on an Nvidia Grace machine with 128-bit SVE2, with option `-XX:UseSVE=2`:
>> 
>> Benchmark			Unit	Before		Score Error	After		Score Error	Uplift
>> testCompareEQMaskNotByte	ops/s	7912127.225	2677.289518	10266136.26	8955.008548	1.29
>> testCompareEQMaskNotDouble	ops/s	884737.6799	446.963779	1179760.772	448.031844	1.33
>> testCompareEQMaskNotFloat	ops/s	1765045.787	682.332214	2359520.803	896.305743	1.33
>> testCompareEQMaskNotInt		ops/s	1787221.411	977.743935	2353952.519	960.069976	1.31
>> testCompareEQMaskNotLong	ops/s	895297.1974	673.44808	1178449.02	323.804205	1.31
>> testCompareEQMaskNotShort	ops/s	3339987.002	3415.2226	4712761.965	2110.862053	1.41
>> testCompareGEMaskNotByte	ops/s	7907615.16	4094.243652	10251646.9	9486.699831	1.29
>> testCompareGEMaskNotInt		ops/s	1683738.958	4233.813092	2352855.205	1251.952546	1.39
>> testCompareGEMaskNotLong	ops/s	854496.1561	8594.598885	1177811.493	521.1229	1.37
>> testCompareGEMaskNotShort	ops/s	3341860.309	1578.975338	4714008.434	1681.10365	1.41
>> testCompareGTMaskNotByte	ops/s	7910823.674	2993.367032	10245063.58	9774.75138	1.29
>> testCompareGTMaskNotInt		ops/s	1673393.928	3153.099431	2353654.521	1190.848583	1.4
>> testCompareGTMaskNotLong	ops/s	849405.9159	2432.858159	1177952.041	359.96413	1.38
>> testCompareGTMaskNotShort	ops/s	3339509.141	3339.976585	4711442.496	2673.364893	1.41
>> testCompareLEMaskNotByte	ops/s	7911340.004	3114.69191	10231626.5	27134.20035	1.29
>> testCompareLEMaskNotInt		ops/s	1675812.113	1340.969885	2353255.341	1452.4522	1.4
>> testCompareLEMaskNotLong	ops/s	848862.8036	6564.841731	1177763.623	539.290106	1.38
>> testCompareLEMaskNotShort	ops/s	3324951.54	2380.29473	4712116.251	1544.559684	1.41
>> testCompareLTMaskNotByte	ops/s	7910390.844	2630.861436	10239567.69	6487.441672	1.29
>> testCompareLTMaskNotInt		ops/s	16721...
>
> erifan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add an IR rule for vector mask cast operation

@erifan Thanks for the work! All tests pass on my side, patch looks good to me too :)
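The rewrite in this PR works because XOR with all-ones (`Replicate -1` / `MaskAll m1`) is a lane-wise NOT, and for integer lanes every comparison has an exact complement. The following Java sketch checks that claim exhaustively over a few boundary values. The `Cond` names are hypothetical and are not C2's `BoolTest` encoding. It also shows one plausible reason why floats are limited to eq/ne: with a NaN operand, `!(a < b)` is true while `a >= b` is false, whereas `!(a == b)` and `a != b` always agree.

```java
// Hypothetical sketch of the "not(compare) -> negated compare" rule; not HotSpot code.
final class NegatedCompareSketch {

    enum Cond { EQ, NE, LT, GE, GT, LE, ULT, UGE, UGT, ULE }

    // ncond: the condition whose result is the complement of cond for every input pair.
    static Cond negate(Cond c) {
        switch (c) {
            case EQ: return Cond.NE;
            case NE: return Cond.EQ;
            case LT: return Cond.GE;
            case GE: return Cond.LT;
            case GT: return Cond.LE;
            case LE: return Cond.GT;
            case ULT: return Cond.UGE;
            case UGE: return Cond.ULT;
            case UGT: return Cond.ULE;
            case ULE: return Cond.UGT;
            default: throw new AssertionError(c);
        }
    }

    // Scalar evaluation of one lane; the u* variants compare as unsigned.
    static boolean eval(Cond c, int a, int b) {
        switch (c) {
            case EQ: return a == b;
            case NE: return a != b;
            case LT: return a < b;
            case GE: return a >= b;
            case GT: return a > b;
            case LE: return a <= b;
            case ULT: return Integer.compareUnsigned(a, b) < 0;
            case UGE: return Integer.compareUnsigned(a, b) >= 0;
            case UGT: return Integer.compareUnsigned(a, b) > 0;
            case ULE: return Integer.compareUnsigned(a, b) <= 0;
            default: throw new AssertionError(c);
        }
    }

    // Float eq/ne: negation is always exact, even with NaN.
    static boolean floatEqNegationHolds(float a, float b) {
        return !(a == b) == (a != b);
    }

    // Float lt/ge: negation breaks when an operand is NaN.
    static boolean floatLtNegationHolds(float a, float b) {
        return !(a < b) == (a >= b);
    }

    public static void main(String[] args) {
        int[] vals = {Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE};
        for (Cond c : Cond.values()) {
            for (int a : vals) {
                for (int b : vals) {
                    if (eval(negate(c), a, b) == eval(c, a, b)) {
                        throw new AssertionError(c + " " + a + " " + b);
                    }
                }
            }
        }
        System.out.println("integer negation exact; float lt with NaN: " + floatLtNegationHolds(Float.NaN, 1.0f));
    }
}
```

The float case in `main` prints `false` for the NaN input, which is why a blanket `lt -> ge` rewrite would be unsound for floating-point lanes.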

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24674#pullrequestreview-3232943779

From mhaessig at openjdk.org  Wed Sep 17 07:04:57 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Wed, 17 Sep 2025 07:04:57 GMT
Subject: RFR: 8367721: Test compiler/arguments/TestCompileTaskTimeout.java
 crashed: SIGSEGV
Message-ID: <TOj6QWViAuXCet_-njgSCsuF03jfb4lDHHs9AgUAqcc=.7d00f824-8632-412e-a213-33206968b2cc@github.com>

The test `TestCompileTaskTimeout.java` runs `java -Xcomp -XX:CompileTaskTimeout=1 --version` to demonstrate that the timeout works. When the timeout fires, the handler prints the method of the compile task. In the core file of the execution that failed with a `SIGSEGV` in the compile task timeout signal handler, the backtrace looks as follows:

#n   <called signal handler>
#n+1 CompilerThreadTimeoutLinux::signal_handler()
#n+2 <called signal handler>
#n+3 timer_settime()
#n+4 CompilerThreadTimeoutLinux::disarm()
#n+5 CompileTaskWrapper::~CompileTaskWrapper()

So, the compile task hit the timeout during destruction of the underlying `CompileTaskWrapper`. Since the timeout was disarmed only after the task had been set to null in the destructor, the signal handler segfaulted when trying to access the method of the compile task to print it. This PR fixes the issue by moving the disarming of the timeout to the top of the destructor.
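The ordering argument can be illustrated with a small deterministic Java model. It is hypothetical throughout: `TimeoutTeardownSketch`, `onTimeout` and the `interleave` hook are illustrative stand-ins, not HotSpot code, and the `armed` check is a simplification meaning "a disarmed timer no longer fires". The `interleave` callback simulates the timer firing in the middle of teardown. With the old order the handler dereferences a cleared task. With the fixed order the task is still valid for as long as the timer can fire.

```java
// Hypothetical model of the CompileTaskWrapper teardown order; not HotSpot code.
final class TimeoutTeardownSketch {

    static final class Task {
        final String method;
        Task(String method) { this.method = method; }
    }

    private volatile Task task;
    private volatile boolean armed;

    TimeoutTeardownSketch(Task task) {
        this.task = task;
        this.armed = true; // models arming the per-thread timeout timer
    }

    // Models the timeout signal handler, which prints the task's method.
    String onTimeout() {
        if (!armed) {
            return null; // simplification: a disarmed timer no longer fires
        }
        return "timed out: " + task.method; // dereferences the task
    }

    // Fixed order: disarm first, then clear the task.
    void close(Runnable interleave) {
        armed = false;
        interleave.run(); // timer "fires" mid-teardown
        task = null;
    }

    // Old order: clear the task while the timer is still armed.
    void closeOldOrder(Runnable interleave) {
        task = null;
        interleave.run(); // a timeout here dereferences a null task
        armed = false;
    }

    public static void main(String[] args) {
        TimeoutTeardownSketch fixed = new TimeoutTeardownSketch(new Task("Foo::bar"));
        fixed.close(() -> System.out.println("fixed order handler result: " + fixed.onTimeout()));

        TimeoutTeardownSketch old = new TimeoutTeardownSketch(new Task("Foo::bar"));
        try {
            old.closeOldOrder(old::onTimeout);
        } catch (NullPointerException e) {
            System.out.println("old order: handler hit a null task");
        }
    }
}
```

In Java the failure shows up as a `NullPointerException`. In the VM, the equivalent dereference inside a signal handler is the `SIGSEGV` seen in the backtrace.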

Because this issue can only be triggered with bad (or good, depending on your view) luck in timing, I could not devise a regression test. This is not a big problem, since CI already caught the issue.

Testing:
 - [ ] Github Actions
 - [ ] tier1,tier2,tier3 plus stress testing on Oracle supported platforms

-------------

Commit messages:
 - Move disarmament of timeout to the very beginning of destructor

Changes: https://git.openjdk.org/jdk/pull/27331/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27331&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367721
  Stats: 5 lines in 1 file changed: 4 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/27331.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27331/head:pull/27331

PR: https://git.openjdk.org/jdk/pull/27331

From duke at openjdk.org  Wed Sep 17 07:26:52 2025
From: duke at openjdk.org (erifan)
Date: Wed, 17 Sep 2025 07:26:52 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare
 [v13]
In-Reply-To: <Uw3Zu_QQI6cPp7DUf4uHNO_ZsrSWS1jnnJJ3gtfXyi8=.c2a93a2c-f3cf-4489-98aa-e25abc448fd7@github.com>
References: <mWHKJozX3YTvBOjLDUa1_2QQiNm6xVc2rb1FtXcZzPc=.e2405ec4-09ed-46c1-acd5-3f39083fce3f@github.com>
 <Uw3Zu_QQI6cPp7DUf4uHNO_ZsrSWS1jnnJJ3gtfXyi8=.c2a93a2c-f3cf-4489-98aa-e25abc448fd7@github.com>
Message-ID: <27SoQ3ZhkmDXmpLXeRiBu3eJychQuq-BgZ9VEE5Ab_U=.82d70745-599b-4edf-ba8e-54c4956ea166@github.com>

On Mon, 15 Sep 2025 05:43:11 GMT, erifan <duke at openjdk.org> wrote:

>> This patch optimizes the following patterns:
>> For integer types:
>> 
>> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> 
>> `cond` can be eq, ne, le, ge, lt, gt, ule, uge, ult or ugt; `ncond` is the negation of `cond`.
>> 
>> For float and double types:
>> 
>> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> 
>> cond can be eq or ne.
>> 
>> Benchmarks on an Nvidia Grace machine with 128-bit SVE2, with option `-XX:UseSVE=2`:
>> 
>> Benchmark			Unit	Before		Score Error	After		Score Error	Uplift
>> testCompareEQMaskNotByte	ops/s	7912127.225	2677.289518	10266136.26	8955.008548	1.29
>> testCompareEQMaskNotDouble	ops/s	884737.6799	446.963779	1179760.772	448.031844	1.33
>> testCompareEQMaskNotFloat	ops/s	1765045.787	682.332214	2359520.803	896.305743	1.33
>> testCompareEQMaskNotInt		ops/s	1787221.411	977.743935	2353952.519	960.069976	1.31
>> testCompareEQMaskNotLong	ops/s	895297.1974	673.44808	1178449.02	323.804205	1.31
>> testCompareEQMaskNotShort	ops/s	3339987.002	3415.2226	4712761.965	2110.862053	1.41
>> testCompareGEMaskNotByte	ops/s	7907615.16	4094.243652	10251646.9	9486.699831	1.29
>> testCompareGEMaskNotInt		ops/s	1683738.958	4233.813092	2352855.205	1251.952546	1.39
>> testCompareGEMaskNotLong	ops/s	854496.1561	8594.598885	1177811.493	521.1229	1.37
>> testCompareGEMaskNotShort	ops/s	3341860.309	1578.975338	4714008.434	1681.10365	1.41
>> testCompareGTMaskNotByte	ops/s	7910823.674	2993.367032	10245063.58	9774.75138	1.29
>> testCompareGTMaskNotInt		ops/s	1673393.928	3153.099431	2353654.521	1190.848583	1.4
>> testCompareGTMaskNotLong	ops/s	849405.9159	2432.858159	1177952.041	359.96413	1.38
>> testCompareGTMaskNotShort	ops/s	3339509.141	3339.976585	4711442.496	2673.364893	1.41
>> testCompareLEMaskNotByte	ops/s	7911340.004	3114.69191	10231626.5	27134.20035	1.29
>> testCompareLEMaskNotInt		ops/s	1675812.113	1340.969885	2353255.341	1452.4522	1.4
>> testCompareLEMaskNotLong	ops/s	848862.8036	6564.841731	1177763.623	539.290106	1.38
>> testCompareLEMaskNotShort	ops/s	3324951.54	2380.29473	4712116.251	1544.559684	1.41
>> testCompareLTMaskNotByte	ops/s	7910390.844	2630.861436	10239567.69	6487.441672	1.29
>> testCompareLTMaskNotInt		ops/s	16721...
>
> erifan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add an IR rule for vector mask cast operation

Thanks all for your help, I'll integrate the PR.
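The rewrite described in this PR can be checked lane by lane with a scalar model. This is a sketch under stated assumptions, not C2 code. `Cond`, `negate`, `lane_cmp`, `mask_cmp` and `mask_not` are hypothetical names. A mask is modelled as one `bool` per lane, and XOR with an all-ones vector or mask is modelled as flipping every lane. The float helpers show one reason to be careful outside eq/ne: with a NaN operand, `!(a < b)` differs from `a >= b`, while `!(a == b)` still equals `a != b`. Whether this is the only reason the patch limits float and double to eq and ne is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical condition codes named after the PR text (not C2's BoolTest).
enum class Cond { eq, ne, lt, ge, le, gt, ult, uge, ule, ugt };

// ncond: the condition whose result is the logical negation of cond (integers only).
constexpr Cond negate(Cond c) {
  switch (c) {
    case Cond::eq:  return Cond::ne;
    case Cond::ne:  return Cond::eq;
    case Cond::lt:  return Cond::ge;
    case Cond::ge:  return Cond::lt;
    case Cond::le:  return Cond::gt;
    case Cond::gt:  return Cond::le;
    case Cond::ult: return Cond::uge;
    case Cond::uge: return Cond::ult;
    case Cond::ule: return Cond::ugt;
    case Cond::ugt: return Cond::ule;
  }
  return c;  // not reached
}

// One lane of a VectorMaskCmp on int lanes; the u* conditions compare the bits as unsigned.
bool lane_cmp(int32_t a, int32_t b, Cond c) {
  const uint32_t ua = static_cast<uint32_t>(a);
  const uint32_t ub = static_cast<uint32_t>(b);
  switch (c) {
    case Cond::eq:  return a == b;
    case Cond::ne:  return a != b;
    case Cond::lt:  return a < b;
    case Cond::ge:  return a >= b;
    case Cond::le:  return a <= b;
    case Cond::gt:  return a > b;
    case Cond::ult: return ua < ub;
    case Cond::uge: return ua >= ub;
    case Cond::ule: return ua <= ub;
    case Cond::ugt: return ua > ub;
  }
  return false;
}

constexpr std::size_t kLanes = 4;  // arbitrary lane count for the model
using IntVec = std::array<int32_t, kLanes>;
using Mask = std::array<bool, kLanes>;

constexpr std::array<Cond, 10> kAllConds{Cond::eq, Cond::ne, Cond::lt, Cond::ge, Cond::le,
                                         Cond::gt, Cond::ult, Cond::uge, Cond::ule, Cond::ugt};
const float kNaN = std::numeric_limits<float>::quiet_NaN();

Mask mask_cmp(const IntVec& x, const IntVec& y, Cond c) {
  Mask m{};
  for (std::size_t i = 0; i < kLanes; ++i) m[i] = lane_cmp(x[i], y[i], c);
  return m;
}

// Models XorVMask with MaskAll(-1) (or XorV with Replicate(-1)): flips every lane.
Mask mask_not(Mask m) {
  for (bool& lane : m) lane = !lane;
  return m;
}

// The rewrite: not(cmp(x, y, cond)) == cmp(x, y, ncond).
bool rewrite_holds(const IntVec& x, const IntVec& y, Cond c) {
  return mask_not(mask_cmp(x, y, c)) == mask_cmp(x, y, negate(c));
}

bool rewrite_holds_for_all(const IntVec& x, const IntVec& y) {
  for (Cond c : kAllConds) {
    if (!rewrite_holds(x, y, c)) return false;
  }
  return true;
}

// Float lanes: negating eq gives ne even for NaN, but negating lt does not give ge.
bool not_eq_is_ne(float a, float b) { return (!(a == b)) == (a != b); }
bool not_lt_is_ge(float a, float b) { return (!(a < b)) == (a >= b); }
```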

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-3301646116

From duke at openjdk.org  Wed Sep 17 07:26:53 2025
From: duke at openjdk.org (duke)
Date: Wed, 17 Sep 2025 07:26:53 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare
 [v13]
In-Reply-To: <Uw3Zu_QQI6cPp7DUf4uHNO_ZsrSWS1jnnJJ3gtfXyi8=.c2a93a2c-f3cf-4489-98aa-e25abc448fd7@github.com>
References: <mWHKJozX3YTvBOjLDUa1_2QQiNm6xVc2rb1FtXcZzPc=.e2405ec4-09ed-46c1-acd5-3f39083fce3f@github.com>
 <Uw3Zu_QQI6cPp7DUf4uHNO_ZsrSWS1jnnJJ3gtfXyi8=.c2a93a2c-f3cf-4489-98aa-e25abc448fd7@github.com>
Message-ID: <F0-3zYswq-gXT7BjUQFACoWsk3i9EJbWon7-acrs8NE=.5dbe7df0-3deb-42b6-8635-454e232f302d@github.com>

On Mon, 15 Sep 2025 05:43:11 GMT, erifan <duke at openjdk.org> wrote:

>> This patch optimizes the following patterns:
>> For integer types:
>> 
>> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> 
>> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond.
>> 
>> For float and double types:
>> 
>> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> 
>> cond can be eq or ne.
>> 
>> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`:
>> 
>> Benchmark			Unit	Before		Score Error	After		Score Error	Uplift
>> testCompareEQMaskNotByte	ops/s	7912127.225	2677.289518	10266136.26	8955.008548	1.29
>> testCompareEQMaskNotDouble	ops/s	884737.6799	446.963779	1179760.772	448.031844	1.33
>> testCompareEQMaskNotFloat	ops/s	1765045.787	682.332214	2359520.803	896.305743	1.33
>> testCompareEQMaskNotInt		ops/s	1787221.411	977.743935	2353952.519	960.069976	1.31
>> testCompareEQMaskNotLong	ops/s	895297.1974	673.44808	1178449.02	323.804205	1.31
>> testCompareEQMaskNotShort	ops/s	3339987.002	3415.2226	4712761.965	2110.862053	1.41
>> testCompareGEMaskNotByte	ops/s	7907615.16	4094.243652	10251646.9	9486.699831	1.29
>> testCompareGEMaskNotInt		ops/s	1683738.958	4233.813092	2352855.205	1251.952546	1.39
>> testCompareGEMaskNotLong	ops/s	854496.1561	8594.598885	1177811.493	521.1229	1.37
>> testCompareGEMaskNotShort	ops/s	3341860.309	1578.975338	4714008.434	1681.10365	1.41
>> testCompareGTMaskNotByte	ops/s	7910823.674	2993.367032	10245063.58	9774.75138	1.29
>> testCompareGTMaskNotInt		ops/s	1673393.928	3153.099431	2353654.521	1190.848583	1.4
>> testCompareGTMaskNotLong	ops/s	849405.9159	2432.858159	1177952.041	359.96413	1.38
>> testCompareGTMaskNotShort	ops/s	3339509.141	3339.976585	4711442.496	2673.364893	1.41
>> testCompareLEMaskNotByte	ops/s	7911340.004	3114.69191	10231626.5	27134.20035	1.29
>> testCompareLEMaskNotInt		ops/s	1675812.113	1340.969885	2353255.341	1452.4522	1.4
>> testCompareLEMaskNotLong	ops/s	848862.8036	6564.841731	1177763.623	539.290106	1.38
>> testCompareLEMaskNotShort	ops/s	3324951.54	2380.29473	4712116.251	1544.559684	1.41
>> testCompareLTMaskNotByte	ops/s	7910390.844	2630.861436	10239567.69	6487.441672	1.29
>> testCompareLTMaskNotInt		ops/s	16721...
>
> erifan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add an IR rule for vector mask cast operation

@erifan 
Your change (at version 56bb34ffe3ca104c8f838a41f33b1d90bb10b68b) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-3301651892

From duke at openjdk.org  Wed Sep 17 07:35:04 2025
From: duke at openjdk.org (erifan)
Date: Wed, 17 Sep 2025 07:35:04 GMT
Subject: Integrated: 8354242: VectorAPI: combine vector not operation with
 compare
In-Reply-To: <mWHKJozX3YTvBOjLDUa1_2QQiNm6xVc2rb1FtXcZzPc=.e2405ec4-09ed-46c1-acd5-3f39083fce3f@github.com>
References: <mWHKJozX3YTvBOjLDUa1_2QQiNm6xVc2rb1FtXcZzPc=.e2405ec4-09ed-46c1-acd5-3f39083fce3f@github.com>
Message-ID: <9jXNL4s-eyJLY6-tYH6-4B5AFrZi-Kr_-J-S2H88Lmc=.8fe08e80-6ed1-4971-b23a-9e1a5b8a4916@github.com>

On Wed, 16 Apr 2025 06:39:33 GMT, erifan <duke at openjdk.org> wrote:

> This patch optimizes the following patterns:
> For integer types:
> 
> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>     => (VectorMaskCmp src1 src2 ncond)
> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>     => (VectorMaskCmp src1 src2 ncond)
> 
> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond.
> 
> For float and double types:
> 
> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
> 
> cond can be eq or ne.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`:
> 
> Benchmark			Unit	Before		Score Error	After		Score Error	Uplift
> testCompareEQMaskNotByte	ops/s	7912127.225	2677.289518	10266136.26	8955.008548	1.29
> testCompareEQMaskNotDouble	ops/s	884737.6799	446.963779	1179760.772	448.031844	1.33
> testCompareEQMaskNotFloat	ops/s	1765045.787	682.332214	2359520.803	896.305743	1.33
> testCompareEQMaskNotInt		ops/s	1787221.411	977.743935	2353952.519	960.069976	1.31
> testCompareEQMaskNotLong	ops/s	895297.1974	673.44808	1178449.02	323.804205	1.31
> testCompareEQMaskNotShort	ops/s	3339987.002	3415.2226	4712761.965	2110.862053	1.41
> testCompareGEMaskNotByte	ops/s	7907615.16	4094.243652	10251646.9	9486.699831	1.29
> testCompareGEMaskNotInt		ops/s	1683738.958	4233.813092	2352855.205	1251.952546	1.39
> testCompareGEMaskNotLong	ops/s	854496.1561	8594.598885	1177811.493	521.1229	1.37
> testCompareGEMaskNotShort	ops/s	3341860.309	1578.975338	4714008.434	1681.10365	1.41
> testCompareGTMaskNotByte	ops/s	7910823.674	2993.367032	10245063.58	9774.75138	1.29
> testCompareGTMaskNotInt		ops/s	1673393.928	3153.099431	2353654.521	1190.848583	1.4
> testCompareGTMaskNotLong	ops/s	849405.9159	2432.858159	1177952.041	359.96413	1.38
> testCompareGTMaskNotShort	ops/s	3339509.141	3339.976585	4711442.496	2673.364893	1.41
> testCompareLEMaskNotByte	ops/s	7911340.004	3114.69191	10231626.5	27134.20035	1.29
> testCompareLEMaskNotInt		ops/s	1675812.113	1340.969885	2353255.341	1452.4522	1.4
> testCompareLEMaskNotLong	ops/s	848862.8036	6564.841731	1177763.623	539.290106	1.38
> testCompareLEMaskNotShort	ops/s	3324951.54	2380.29473	4712116.251	1544.559684	1.41
> testCompareLTMaskNotByte	ops/s	7910390.844	2630.861436	10239567.69	6487.441672	1.29
> testCompareLTMaskNotInt		ops/s	1672180.09	995.238142	2353757.863	853.774734	1.4
> testCompareLTMaskNotLong	ops/s	856502.26...

This pull request has now been integrated.

Changeset: 45cc515f
Author:    erifan <erfang at nvidia.com>
Committer: Xiaohong Gong <xgong at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/45cc515f451accfd1a0a36d17ccb38d428a5d035
Stats:     1635 lines in 7 files changed: 1634 ins; 0 del; 1 mod

8354242: VectorAPI: combine vector not operation with compare

Reviewed-by: epeter, jbhateja, xgong

-------------

PR: https://git.openjdk.org/jdk/pull/24674

From snatarajan at openjdk.org  Wed Sep 17 07:50:37 2025
From: snatarajan at openjdk.org (Saranya Natarajan)
Date: Wed, 17 Sep 2025 07:50:37 GMT
Subject: RFR: 8356779: IGV: dump the index of the SafePointNode containing
 the current JVMS during parsing
In-Reply-To: <dZIsHRR3HjgSVELUlQd4vQX4SeluQeiefbUxn2x_VoQ=.86c48c18-58fe-44ab-9cf8-b32c2c8d8827@github.com>
References: <dZIsHRR3HjgSVELUlQd4vQX4SeluQeiefbUxn2x_VoQ=.86c48c18-58fe-44ab-9cf8-b32c2c8d8827@github.com>
Message-ID: <ErdnlkqsQgv5Abp80Rpx_2mQVmACcnQP6pUPVEuZpMI=.ccfdfbd0-56d3-422b-a319-6f0a0592c8c3@github.com>

On Thu, 4 Sep 2025 05:22:00 GMT, Saranya Natarajan <snatarajan at openjdk.org> wrote:

> This PR prints, in IGV, the index of the SafePointNode containing the current JVMS during parsing. As stated in JBS, the motivation is that there are many nodes during parsing, so it is useful to know which nodes are currently in the local slots or on the stack when looking at a graph.
> 
> IGV screenshot before the fix
> <img width="314" height="675" alt="Screenshot 2025-09-15 at 11 56 54" src="https://github.com/user-attachments/assets/3489f580-f4a3-4f22-86a6-0d9351f1d143" />
> 
> IGV screenshot after the fix
> <img width="314" height="652" alt="Screenshot 2025-09-15 at 11 54 55" src="https://github.com/user-attachments/assets/239c7d60-7a2a-4608-acd0-036bab3ae048" />

Thanks for the review. Please sponsor.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27083#issuecomment-3301740477

From mchevalier at openjdk.org  Wed Sep 17 07:52:33 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Wed, 17 Sep 2025 07:52:33 GMT
Subject: RFR: 8367721: Test compiler/arguments/TestCompileTaskTimeout.java
 crashed: SIGSEGV
In-Reply-To: <TOj6QWViAuXCet_-njgSCsuF03jfb4lDHHs9AgUAqcc=.7d00f824-8632-412e-a213-33206968b2cc@github.com>
References: <TOj6QWViAuXCet_-njgSCsuF03jfb4lDHHs9AgUAqcc=.7d00f824-8632-412e-a213-33206968b2cc@github.com>
Message-ID: <dM0zkUAC_VIxTJCq4JavWqZbHlaNtad3fbpn-xwq_7k=.a9dca0b3-ec6b-4375-98a8-21ba89546bb7@github.com>

On Wed, 17 Sep 2025 06:57:29 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

> The test `TestCompileTaskTimeout.java` runs `java -Xcomp -XX:CompileTaskTimeout=1 --version` to demonstrate that the timeout works. Part of the timeout working involves it printing the method of the compile task. Inspecting the core file of the execution that failed with a `SIGSEGV` in the compile task timeout signal handler, the backtrace looks as follows:
> 
> #n   <called signal handler>
> #n+1 CompilerThreadTimeoutLinux::signal_handler()
> #n+2 <called signal handler>
> #n+3 timer_settime()
> #n+4 CompilerThreadTimeoutLinux::disarm()
> #n+5 CompileTaskWrapper::~CompileTaskWrapper()
> 
> So, the compile task hit the timeout during destruction of the underlying `CompileTaskWrapper`. Since the timeout was disarmed only after setting the task to null in the destructor, the signal handler segfaulted when trying to access the method of the compile task to print it out. This PR addresses this issue by moving up the disarmament of the timeout to the top of the destructor.
> 
> Because this issue can only be triggered with bad --- or good, depending on your view --- luck on timing, I could not devise a regression test. But this is not too big of an issue, since the CI already caught this issue.
> 
> Testing:
>  - [ ] Github Actions
>  - [ ] tier1,tier2,tier3 plus stress testing on Oracle supported platforms

That makes a lot of sense to me.

If I understand correctly, it happens when the compile task terminates normally, with compilation finishing almost exactly when the timeout expires. It doesn't come from concurrent handling of the timeout and some other kind of error?
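The ordering argument can be sketched as a single-threaded model. Everything here is hypothetical: `CompileTask`, `TimeoutModel` and `teardown_is_safe` are not HotSpot's classes, and the "signal" is just a function call placed at the worst moment between the two teardown steps. Real signal delivery is asynchronous, which this model does not capture; it only shows why disarming before clearing the task closes the window described above.

```cpp
#include <cassert>
#include <string>

// Hypothetical model, NOT HotSpot code: the names only loosely mirror the PR.
struct CompileTask {
  std::string method;
};

struct TimeoutModel {
  bool armed = false;
  CompileTask* const* task_slot = nullptr;  // where the "handler" finds the current task

  // Models the signal handler: while armed, it reads the task to print its method.
  // Returns false exactly where the real handler would dereference a null task.
  bool on_signal() const {
    if (!armed) return true;  // disarmed: the late signal is ignored
    return *task_slot != nullptr;
  }
};

// Models the end of the wrapper's destructor, with the timer firing
// between the two teardown steps.
bool teardown_is_safe(bool disarm_first) {
  CompileTask task{"Example::method"};  // hypothetical method name
  CompileTask* current = &task;
  TimeoutModel timeout;
  timeout.task_slot = &current;
  timeout.armed = true;

  bool handler_ok;
  if (disarm_first) {
    timeout.armed = false;             // fixed order: disarm first ...
    handler_ok = timeout.on_signal();  // ... so the late signal is a no-op
    current = nullptr;                 // ... and only then clear the task
  } else {
    current = nullptr;                 // old order: clear the task first ...
    handler_ok = timeout.on_signal();  // ... and the late signal sees nullptr
    timeout.armed = false;
  }
  return handler_ok;
}
```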

-------------

Marked as reviewed by mchevalier (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27331#pullrequestreview-3233140061

From shade at openjdk.org  Wed Sep 17 08:02:39 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 17 Sep 2025 08:02:39 GMT
Subject: RFR: 8367313: CTW: Execute in AWT headless mode [v2]
In-Reply-To: <Kuu7xiJHnurxjpmgBW6eSVqZpBCJ4p2kjUPVNoCn4_A=.76ab934b-7ba2-4d53-b66a-119c4f82fb6a@github.com>
References: <d8Vd-xRvi6T6oizZvB3M7abUjT8ifE1n157D24UXGmA=.9370efd0-cc85-4190-bf23-fb7ca62301dd@github.com>
 <iybHTcZUmGuAWqzGCjVu1j9492b1ANCrc-IsWpf5qVs=.a1f74414-739f-4433-a1c6-653ec3c85e15@github.com>
 <Kuu7xiJHnurxjpmgBW6eSVqZpBCJ4p2kjUPVNoCn4_A=.76ab934b-7ba2-4d53-b66a-119c4f82fb6a@github.com>
Message-ID: <HXDOZXi-KEBNs3c4DiU6Ud6JEkbfDFo22mEYC05r8D8=.4f67f93d-154d-42bb-ba30-727a152b157a@github.com>

On Tue, 16 Sep 2025 06:48:54 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
>> 
>>  - Merge branch 'master' into JDK-8367313-ctw-headless-mode
>>  - Fix
>
> @TobiHartmann is on vacation. Maybe @vnkozlov ?

Thanks. @eme64 -- I assume testing came back clean? (There seems to be nothing to break here...)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27187#issuecomment-3301790464

From xgong at openjdk.org  Wed Sep 17 08:48:16 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Wed, 17 Sep 2025 08:48:16 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v6]
In-Reply-To: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
Message-ID: <xc00dQJ-uWgObZ0APgMIRk5rjqgjmixV1VJe92lpzq0=.e6503bf6-68b2-44dc-a2a8-ca9e20803d9d@github.com>

> This is a follow-up patch of [1], which aims at implementing the subword gather load APIs for AArch64 SVE platform.
> 
> ### Background
> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices. SVE provides native gather load instructions for `byte`/`short` types using `int` vectors for indices. The vector size for a gather-load instruction is determined by the index vector (i.e. `int` elements). Hence, the total size is `32 * elem_num` bits, where `elem_num` is the number of loaded elements in the vector register.
> 
> ### Implementation
> 
> #### Challenges
> Due to size differences between `int` indices (32-bit) and `byte`/`short` data (8/16-bit), operations must be split across multiple vector registers based on the target SVE vector register size constraints.
> 
> For a 512-bit SVE machine, loading a `byte` vector requires a different approach for each vector species:
> - SPECIES_64: Single operation with mask (8 elements, 256-bit)
> - SPECIES_128: Single operation, full register (16 elements, 512-bit)
> - SPECIES_256: Two operations + merge (32 elements, 1024-bit)
> - SPECIES_512/MAX: Four operations + merge (64 elements, 2048-bit)
> 
> Use `ByteVector.SPECIES_512` as an example:
> - It contains 64 elements, so the index vector size is `64 * 32` bits, which is four times the SVE vector register size.
> - It therefore requires four vector gather-loads to complete the whole operation.
> 
> 
> byte[] arr = [a, a, a, a, ..., a, b, b, b, b, ..., b, c, c, c, c, ..., c, d, d, d, d, ..., d, ...]
> int[] idx = [0, 1, 2, 3, ..., 63, ...]
> 
> 4 gather-load:
> idx_v1 = [15 14 13 ... 1 0]    gather_v1 = [... 0000 0000 0000 0000 aaaa aaaa aaaa aaaa]
> idx_v2 = [31 30 29 ... 17 16]  gather_v2 = [... 0000 0000 0000 0000 bbbb bbbb bbbb bbbb]
> idx_v3 = [47 46 45 ... 33 32]  gather_v3 = [... 0000 0000 0000 0000 cccc cccc cccc cccc]
> idx_v4 = [63 62 61 ... 49 48]  gather_v4 = [... 0000 0000 0000 0000 dddd dddd dddd dddd]
> merge: v = [dddd dddd dddd dddd cccc cccc cccc cccc bbbb bbbb bbbb bbbb aaaa aaaa aaaa aaaa]
> 
> 
> #### Solution
> The implementation simplifies backend complexity by defining each gather load IR to handle one vector gather-load operation, with multiple IRs generated in the compiler mid-end.
> 
> Here are the main changes:
> - Enhanced IR generation with architecture-specific patterns based on `gather_scatter_needs_vector_index()` matcher.
> - Added `VectorSliceNode` for result merging.
> - Added `VectorMaskWidenNode` for mask splitting and type conversion fo...

Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:

 - Add more comments for IRs and added method
 - Merge branch 'jdk:master' into JDK-8351623-sve
 - Merge 'jdk:master' into JDK-8351623-sve
 - Address review comments
 - Refine IR pattern and clean backend rules
 - Fix indentation issue and move the helper matcher method to header files
 - Merge branch jdk:master into JDK-8351623-sve
 - 8351623: VectorAPI: Add SVE implementation of subword gather load operation
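The index arithmetic in the description can be checked with a small scalar model. This is a sketch under stated assumptions, not the PR's implementation. `gathers_needed`, `split_gather_merge` and `plain_gather` are hypothetical helpers, and each "gather" is a plain loop over 32-bit indices. The merge is modelled as simple concatenation of the per-register parts, not as whatever the new `VectorSliceNode` actually lowers to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// With 32-bit int indices, the index vector occupies 32 * elem_num bits,
// so it needs ceil(32 * elem_num / sve_vector_bits) SVE-register-sized gathers.
constexpr int gathers_needed(int elem_num, int sve_vector_bits) {
  const int index_bits = 32 * elem_num;
  return (index_bits + sve_vector_bits - 1) / sve_vector_bits;  // round up
}

// Reference result: one element per index, no splitting.
std::vector<int8_t> plain_gather(const std::vector<int8_t>& arr, const std::vector<int>& idx) {
  std::vector<int8_t> out;
  out.reserve(idx.size());
  for (int i : idx) out.push_back(arr[static_cast<std::size_t>(i)]);
  return out;
}

// Split the index vector into register-sized chunks, gather each chunk
// separately, then merge the parts in order.
std::vector<int8_t> split_gather_merge(const std::vector<int8_t>& arr,
                                       const std::vector<int>& idx,
                                       int sve_vector_bits) {
  const std::size_t lanes_per_gather = static_cast<std::size_t>(sve_vector_bits / 32);
  std::vector<int8_t> result;
  result.reserve(idx.size());
  for (std::size_t base = 0; base < idx.size(); base += lanes_per_gather) {
    // One gather-load: each lane uses its own 32-bit index.
    std::vector<int8_t> part;
    for (std::size_t i = base; i < idx.size() && i < base + lanes_per_gather; ++i) {
      part.push_back(arr[static_cast<std::size_t>(idx[i])]);
    }
    // Merge: place this part's bytes after the previously gathered ones.
    result.insert(result.end(), part.begin(), part.end());
  }
  return result;
}
```

For a 512-bit machine this reproduces the counts listed for SPECIES_64 through SPECIES_512 (1, 1, 2 and 4 gathers).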

-------------

Changes: https://git.openjdk.org/jdk/pull/26236/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26236&range=05
  Stats: 1070 lines in 20 files changed: 907 ins; 24 del; 139 mod
  Patch: https://git.openjdk.org/jdk/pull/26236.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26236/head:pull/26236

PR: https://git.openjdk.org/jdk/pull/26236

From xgong at openjdk.org  Wed Sep 17 08:48:21 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Wed, 17 Sep 2025 08:48:21 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v5]
In-Reply-To: <LeM6JS1sHVJMB2Ziw2pYgcTQr7hHcPUSK6guAV5DCC0=.f3de7008-b13c-4a96-bc85-b50784bb21ce@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <sikeN2K9EHmLEEnGsNttJSsPefvEtrKk9q7OOIgqQG0=.238662f7-394c-475b-bd07-effb28302b2c@github.com>
 <LeM6JS1sHVJMB2Ziw2pYgcTQr7hHcPUSK6guAV5DCC0=.f3de7008-b13c-4a96-bc85-b50784bb21ce@github.com>
Message-ID: <jml2l7L8vagTGx56ahd0lB3nSfJxBmwIapw-51Mkm9M=.5f96ec72-b888-45cd-abc0-b884e5bdcd13@github.com>

On Fri, 5 Sep 2025 10:49:44 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
>> 
>>  - Merge 'jdk:master' into JDK-8351623-sve
>>  - Address review comments
>>  - Refine IR pattern and clean backend rules
>>  - Fix indentation issue and move the helper matcher method to header files
>>  - Merge branch jdk:master into JDK-8351623-sve
>>  - 8351623: VectorAPI: Add SVE implementation of subword gather load operation
>
> Looks very interesting. I have a first series of questions / comments :)
> 
> There is definitely a tradeoff between complexity in the backend and in the C2 IR, so I'm still trying to wrap my head around that decision. I'm just afraid that adding more very specific C2 IR nodes makes optimizations in the C2 IR more complicated.

Hi @eme64, I just pushed a commit that adds more comments and assertions to the code. This only addresses part of your comments; I need more time to look into the IR refinement. Could you please take another look at the changes related to the method renaming, comments, and assertions? Thanks a lot in advance!

> src/hotspot/cpu/aarch64/aarch64_vector.ad line 6008:
> 
>> 6006: // predicate and place in elements of twice their size within
>> 6007: // the destination predicate.
>> 6008: 
> 
> Suggestion:
> 
> 
> unnecessary empty line

This empty line is auto-generated from the m4 file. I tried several ways to remove it, but all of them failed, so I have to keep it as is.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3301972528
PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2354773033

From dlunden at openjdk.org  Wed Sep 17 08:56:51 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Wed, 17 Sep 2025 08:56:51 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <jUdabfZhjHMX6jJJIgxClIOimKcNN_JyfblrQuF94hY=.b6ad8b57-e731-45d0-88c8-e4906216541e@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
 <zjaKBlp2oBBNRp9pjJZphn3qnqeS0J8kxqoIgEkrMMc=.fe7a7c02-f146-4e92-a89f-2355c3d32160@github.com>
 <jUdabfZhjHMX6jJJIgxClIOimKcNN_JyfblrQuF94hY=.b6ad8b57-e731-45d0-88c8-e4906216541e@github.com>
Message-ID: <oxZIM1T62q2cP55X31ncYqx4eZQhDyK-hRqCnKInu60=.6ebefd31-b117-4f57-be1f-b3c75a6e5034@github.com>

On Tue, 16 Sep 2025 14:21:34 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Here I'll argue not touching this in this PR (I did not introduce this), as this is the style of the surrounding code. Should be addressed in a follow-up PR though.
>
> I'd say this is not just formatting/naming, but code style. We usually fix these cases when we touch the code ;)

All right, I'll fix the two local occurrences for `value[ureg_lo]`. I'm sure there are more in `postaloc.cpp` though.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2354804179

From mhaessig at openjdk.org  Wed Sep 17 09:06:25 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Wed, 17 Sep 2025 09:06:25 GMT
Subject: RFR: 8367389: C2 SuperWord: refactor VTransform to model the whole
 loop instead of just the basic block
In-Reply-To: <qNs-pEYa6BR200XijY4JB8-DzdxjQPVR28zefXhEFNo=.89143141-f462-45e1-a1cb-cbe66dfcaf5d@github.com>
References: <qNs-pEYa6BR200XijY4JB8-DzdxjQPVR28zefXhEFNo=.89143141-f462-45e1-a1cb-cbe66dfcaf5d@github.com>
Message-ID: <CvihFOm8NmwV8tPA1Qr1R4h5FpFfDv0PznlH39Wjb2g=.ea64c956-d341-4a38-967d-7e689de060fb@github.com>

On Thu, 11 Sep 2025 06:52:19 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
> https://github.com/openjdk/jdk/pull/20964
> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
> 
> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
> 
> ------------------------------
> 
> **Goals**
> - VTransform models **all nodes in the loop**, not just the basic block (enables later VTransform::optimize, like moving reductions out of the loop)
> - Remove `_nodes` from the vector vtnodes.
> 
> **Details**
> - Remove: `AUTO_VECTORIZATION2_AFTER_REORDER`, `apply_memops_reordering_with_schedule`, `print_memops_schedule`.
>   - Instead of reordering the scalar memops, we create the new memory graph during `VTransform::apply`. That is why the `VTransformApplyState` now needs to track the memory states.
> - Refactor `VLoopMemorySlices`: map not just memory slices with phis (have stores in loop), but also those with only loads (no phi).
> - Create vtnodes for all nodes in the loop (not just the basic block), as well as inputs (already) and outputs (new). Mapping also the output nodes means during `apply`, we naturally connect the uses after the loop to their inputs from the loop (which may be new nodes after the transformation).
> - `_mem_ref_for_main_loop_alignment` -> `_vpointer_for_main_loop_alignment`. Instead of tracking the memory node to later have access to its `VPointer`, we take it directly. That removes one more use of `_nodes` for vector vtnodes.
> 
> I also made a lot of annotations in the code below, for easier review.
> 
> **Suggested order for review**
> - Removal of `VTransformGraph::apply_memops_reordering_with_schedule` -> sets up need to build memory graph on the fly.
> - Old and new code for `VLoopMemorySlices` -> we now also track load-only slices.
> - `build_scalar_vtnodes_for_non_packed_nodes`, `build_inputs_for_scalar_vtnodes`, `build_uses_after_loop`, `apply_vtn_inputs_to_node` (use in `apply`), `apply_backedge`, `fix_memory_state_uses_after_loop`
> - `VTransformApplyState`: how it now tracks the memory state.
> - `VTransformVectorNode` -> removal of `_nodes` (Big Win!)
> - Then look at all the other details.
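The slice classification in the description can be pictured with a simplified model. This is a hypothetical sketch: `MemOp`, `SliceInfo`, `collect_slices` and `is_load_only` are illustrative names, not the `VLoopMemorySlices` API. Memory nodes of a loop body are grouped by alias index. A slice with stores in the loop is assumed to have a memory phi, and a slice with loads but no phi is the load-only case the refactoring now also maps. Because this model scans the whole body before classifying, the visit order of loads and phis does not matter here; that is a property of the sketch, not a claim about the real code.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical, simplified model (NOT C2's VLoopMemorySlices).
enum class MemKind { Load, Store, Phi };
struct MemOp {
  int alias_idx;
  MemKind kind;
};

struct SliceInfo {
  bool has_phi = false;  // stores in the loop imply a memory phi at the loop head
  int loads = 0;
  int stores = 0;
};

// Group all memory nodes of the loop body by alias index.
std::map<int, SliceInfo> collect_slices(const std::vector<MemOp>& body) {
  std::map<int, SliceInfo> slices;
  for (const MemOp& n : body) {
    SliceInfo& s = slices[n.alias_idx];
    switch (n.kind) {
      case MemKind::Phi:   s.has_phi = true; break;
      case MemKind::Load:  ++s.loads;        break;
      case MemKind::Store: ++s.stores;       break;
    }
  }
  return slices;
}

// A slice with loads but no phi: the "load-only" slice.
bool is_load_only(const SliceInfo& s) { return !s.has_phi && s.loads > 0; }
```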

Thank you for your continued effort on this, @eme64! The overall change looks good to me, but I have a few minor suggestions and questions.

src/hotspot/share/opto/superwordVTransformBuilder.cpp line 143:

> 141:       init_req_with_scalar(n, vtn, MemNode::ValueIn);
> 142:       add_memory_dependencies_of_node_to_vtnode(n, vtn, vtn_memory_dependencies);
> 143:     } else if (n->isa_CountedLoop()) {

Suggestion:

    } else if (n->is_CountedLoop()) {

Otherwise, this is an implicit `!= nullptr` check.
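The difference between the two spellings can be illustrated with a hypothetical miniature of the `is_X()`/`isa_X()` naming convention: `isa_X()` returns a pointer (or `nullptr`) and `is_X()` returns a `bool`. This is a sketch, not C2's `Node`. C2 derives these queries from class-id bits rather than virtual calls, so only the calling convention carries over.

```cpp
#include <cassert>

// Hypothetical miniature of the is_X()/isa_X() convention (not C2's Node).
struct CountedLoopNode;

struct Node {
  virtual ~Node() = default;
  // isa_X(): returns the downcast pointer, or nullptr if this is not an X.
  virtual CountedLoopNode* isa_CountedLoop() { return nullptr; }
  // is_X(): a plain boolean query.
  bool is_CountedLoop() { return isa_CountedLoop() != nullptr; }
};

struct CountedLoopNode : Node {
  CountedLoopNode* isa_CountedLoop() override { return this; }
};

// Equivalent conditions; the second relies on the implicit
// pointer-to-bool conversion, i.e. a hidden "!= nullptr".
bool via_is(Node* n) { return n->is_CountedLoop(); }
bool via_isa(Node* n) { return n->isa_CountedLoop(); }
```

Both forms compute the same result; `is_CountedLoop()` simply states the intent (a yes/no test) without relying on the conversion.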

src/hotspot/share/opto/vectorization.cpp line 228:

> 226:       PhiNode* head = _heads.at(alias_idx);
> 227:       if (head == nullptr) {
> 228:         // We did not find a phi on this slice yet -> must be a slice with only loads.

Could you elaborate on why this is, for my understanding? Could this not encounter the load before the phi?

src/hotspot/share/opto/vtransform.hpp line 30:

> 28: #include "opto/vectorization.hpp"
> 29: #include "opto/vectornode.hpp"
> 30: #include "utilities/debug.hpp"

Am I missing something? I cannot see where this include is used.

-------------

Changes requested by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27208#pullrequestreview-3233203020
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2354695284
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2354674224
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2354824440

From dlunden at openjdk.org  Wed Sep 17 09:17:18 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Wed, 17 Sep 2025 09:17:18 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <dUFR_8r22-kNkoOcRIfqMuVjQ-a1EegX2YNJ8yGfFZU=.dd0eb3ac-601e-4ca7-ab32-41459788d7a2@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <dUFR_8r22-kNkoOcRIfqMuVjQ-a1EegX2YNJ8yGfFZU=.dd0eb3ac-601e-4ca7-ab32-41459788d7a2@github.com>
Message-ID: <ZdPcc8KFmIwQl5pRnfFxihXB5ewGSRX2EHoeAqK9o6w=.3bc10d73-b5a6-4727-adca-ccc9eecb71d5@github.com>

On Tue, 16 Sep 2025 12:01:22 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits:
>> 
>>  - Clarify comments in regmask.hpp
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Address review comments (renaming on the way in a separate PR)
>>  - Update src/hotspot/share/opto/regmask.hpp
>>    
>>    Co-authored-by: Emanuel Peter <emanuel.peter at oracle.com>
>>  - Restore modified java/lang/invoke tests
>>  - Sort includes (new requirement)
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Add clarifying comments at definitions of register mask sizes
>>  - Fix implicit zero and nullptr checks
>>  - Add deep copy comment
>>  - ... and 29 more: https://git.openjdk.org/jdk/compare/60930a3e...c1f41288
>
> src/hotspot/share/opto/regmask.hpp line 241:
> 
>> 239:   //          \_______________________________________________________________________________/
>> 240:   //                                                  |
>> 241:   //                                  _rm_size_in_words=_offset=5
> 
> Can you please add some concise comment why we need `rollover`? Does that happen during register allocation, and if we have rollover then we start spilling instead of keeping values in registers?

I'll update the comment above the definition of `_offset` (which I'll move just above this) to hopefully make this clearer.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2354855077

From dlunden at openjdk.org  Wed Sep 17 09:17:19 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Wed, 17 Sep 2025 09:17:19 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <GM3p22TWNkH3xEAvP--T2S4XQcxeX0wEGOyLvd-at3Q=.e088d3b8-5ded-4f61-a1db-231655d7e768@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <gTSXZnQJJzDX9PwW7gai40uUXCf-0ITQui6gN9KZqvk=.a0732829-927d-4fe7-a462-fd4d00fe77f3@github.com>
 <CgDqueiQ2qerbQb_2ZSx6gLYbcZfeZCAMEydhpCaNPg=.7e987b4c-7fe2-406f-9f91-1b630bbdab7a@github.com>
 <GM3p22TWNkH3xEAvP--T2S4XQcxeX0wEGOyLvd-at3Q=.e088d3b8-5ded-4f61-a1db-231655d7e768@github.com>
Message-ID: <wz8HaJ5kbh5Y_PG7SYX2IPSVZWG6aosH2l4cDcDSZZo=.4f7b1a54-31cf-45fe-8751-909be46bf18a@github.com>

On Tue, 16 Sep 2025 14:27:51 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> The ADLC-generated code relies on using the constructor implicitly, so I prefer not touching it in this changeset at least. All the copies are deep, clarified now.
>
> Ok, I understand. Can you show me an example, so I can understand a little better?

Here is an example (compiled from different files). The constructor gets called when converting from reference to value at the return of `divI_proj_mask`. We could fix this by making the return type of `Matcher::divI_proj_mask` `RegMask&` and updating `UDivModINode::match` accordingly, but I regard this as out of scope for this already large PR. I'd be happy to have a look at this in a follow-up PR.


const RegMask _INT_RAX_REG_mask( 0x100000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, false );

inline const RegMask &INT_RAX_REG_mask() { return _INT_RAX_REG_mask; }

// Register for DIVI projection of divmodI
RegMask Matcher::divI_proj_mask() {
  return INT_RAX_REG_mask();
}

//------------------------------match------------------------------------------
// return result(s) along with their RegMask info
Node* UDivModINode::match( const ProjNode *proj, const Matcher *match ) {
  uint ideal_reg = proj->ideal_reg();
  RegMask rm;
  if (proj->_con == div_proj_num) {
    rm = match->divI_proj_mask();
  } else {
    assert(proj->_con == mod_proj_num, "must be div or mod projection");
    rm = match->modI_proj_mask();
  }
  return new MachProjNode(this, proj->_con, rm, ideal_reg);
}
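To make the implicit copy visible, here is a minimal sketch with a hypothetical `Mask` type that counts copies. It is not HotSpot's `RegMask`; `kRaxMask`, `rax_mask`, `div_mask_by_value` and `div_mask_by_ref` only mirror the shape of the generated code above. The sketch assumes C++17, for guaranteed copy elision and the `static inline` member.

```cpp
#include <cassert>

// Hypothetical stand-in for RegMask that counts copies (not HotSpot's RegMask).
struct Mask {
  static inline int copies = 0;
  unsigned bits = 0;
  explicit Mask(unsigned b) : bits(b) {}
  Mask(const Mask& other) : bits(other.bits) { ++copies; }
  Mask& operator=(const Mask& other) {
    bits = other.bits;
    ++copies;
    return *this;
  }
};

const Mask kRaxMask(0x100000u);               // like _INT_RAX_REG_mask
const Mask& rax_mask() { return kRaxMask; }   // like INT_RAX_REG_mask()

// Like the current divI_proj_mask(): returning by value copies the constant.
Mask div_mask_by_value() { return rax_mask(); }

// The suggested follow-up shape: returning a const reference avoids that copy.
const Mask& div_mask_by_ref() { return rax_mask(); }
```

Note that a caller shaped like `UDivModINode::match`, which assigns the result into a local (`rm = ...`), still copies once through copy assignment even with the reference return; the reference only removes the extra copy made at the return.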

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2354848956

From mhaessig at openjdk.org  Wed Sep 17 09:38:02 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Wed, 17 Sep 2025 09:38:02 GMT
Subject: RFR: 8367721: Test compiler/arguments/TestCompileTaskTimeout.java
 crashed: SIGSEGV
In-Reply-To: <dM0zkUAC_VIxTJCq4JavWqZbHlaNtad3fbpn-xwq_7k=.a9dca0b3-ec6b-4375-98a8-21ba89546bb7@github.com>
References: <TOj6QWViAuXCet_-njgSCsuF03jfb4lDHHs9AgUAqcc=.7d00f824-8632-412e-a213-33206968b2cc@github.com>
 <dM0zkUAC_VIxTJCq4JavWqZbHlaNtad3fbpn-xwq_7k=.a9dca0b3-ec6b-4375-98a8-21ba89546bb7@github.com>
Message-ID: <gfMO-EfyXoYWtuWIg7lrm3V8wNKUVPxu5W76F6rGytc=.1918d5bd-10cf-4ca7-8cea-861b7e1ab6a9@github.com>

On Wed, 17 Sep 2025 07:49:47 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> The test `TestCompileTaskTimeout.java` runs `java -Xcomp -XX:CompileTaskTimeout=1 --version` to demonstrate that the timeout works. Part of the timeout mechanism is printing the method of the compile task. The core file of the execution that failed with a `SIGSEGV` in the compile task timeout signal handler shows the following backtrace:
>> 
>> #n   <called signal handler>
>> #n+1 CompilerThreadTimeoutLinux::signal_handler()
>> #n+2 <called signal handler>
>> #n+3 timer_settime()
>> #n+4 CompilerThreadTimeoutLinux::disarm()
>> #n+5 CompileTaskWrapper::~CompileTaskWrapper()
>> 
>> So, the compile task hit the timeout during destruction of the underlying `CompileTaskWrapper`. Since the timeout was disarmed only after setting the task to null in the destructor, the signal handler segfaulted when trying to access the method of the compile task to print it out. This PR fixes this by moving the disarming of the timeout to the top of the destructor.
>> 
>> Because this issue can only be triggered with bad (or good, depending on your view) luck on timing, I could not devise a regression test. This is not a big problem, since the CI already caught it.
>> 
>> Testing:
>>  - [ ] Github Actions
>>  - [ ] tier1,tier2,tier3 plus stress testing on Oracle supported platforms
>
> That makes a lot of sense to me.
> 
> If I understand correctly, it happens when the compile task terminates naturally, with compilation finishing just about when the timeout expires. It doesn't come from the concurrent handling of the timeout and some other kind of error?

Thank you for taking a look, @marc-chevalier.

> It doesn't come from the concurrent handling of the timeout and some other kind of error?

There is only one timeout per compile task, and thus per compiler thread, and the timeout signal is delivered directly to the native thread running the compiler thread with the timed-out compile task. Since `CompileTask`s are thread-local, this cannot come from concurrent timeouts. I can also rule out other kinds of errors: the timeout signal handler does not use any of the pointers passed to it, so the only way it can segfault is when accessing the task.
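
A minimal single-threaded model of why the destructor order matters. All names here are hypothetical; this is not HotSpot's `CompileTaskWrapper` or `CompilerThreadTimeoutLinux`. The "unlucky" timeout is modelled as arriving while the timer is still being disarmed, which is what the backtrace above shows (the signal lands inside `timer_settime()` in `disarm()`):

```cpp
#include <cassert>

struct Task { const char* method_name; };

struct CompilerThreadState {
  Task* task = nullptr;
  bool timer_armed = false;
};

// Models the timeout handler: it only fires while the timer is armed and
// then reads s.task->method_name. Returns false where the real handler
// would dereference a null task and crash.
bool deliver_timeout(const CompilerThreadState& s) {
  if (!s.timer_armed) return true;  // a disarmed timer delivers nothing
  return s.task != nullptr;
}

// Old order: clear the task, then disarm. A timeout racing with the
// disarm sees an armed timer and a null task.
bool teardown_old_order(CompilerThreadState& s) {
  s.task = nullptr;
  bool ok = deliver_timeout(s);  // signal arrives during disarm
  s.timer_armed = false;
  return ok;
}

// Fixed order: disarm first; the task is still valid while disarming.
bool teardown_new_order(CompilerThreadState& s) {
  bool ok = deliver_timeout(s);  // same unlucky timing
  s.timer_armed = false;
  s.task = nullptr;
  return ok;
}
```

Since the timer is per thread, only the thread's own teardown can race with its own timeout, which is why the model needs no locking.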

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27331#issuecomment-3302151009

From snatarajan at openjdk.org  Wed Sep 17 09:48:36 2025
From: snatarajan at openjdk.org (Saranya Natarajan)
Date: Wed, 17 Sep 2025 09:48:36 GMT
Subject: Integrated: 8356779: IGV: dump the index of the SafePointNode
 containing the current JVMS during parsing
In-Reply-To: <dZIsHRR3HjgSVELUlQd4vQX4SeluQeiefbUxn2x_VoQ=.86c48c18-58fe-44ab-9cf8-b32c2c8d8827@github.com>
References: <dZIsHRR3HjgSVELUlQd4vQX4SeluQeiefbUxn2x_VoQ=.86c48c18-58fe-44ab-9cf8-b32c2c8d8827@github.com>
Message-ID: <KCLRcGZzMhhYr-6JOZCpIDYVYCdCFh2dw1NYX0JeMTI=.2cb902d9-00ce-47f6-91d7-e93a304db8cd@github.com>

On Thu, 4 Sep 2025 05:22:00 GMT, Saranya Natarajan <snatarajan at openjdk.org> wrote:

> This PR dumps the index of the SafePointNode containing the current JVMS during parsing, so that it shows up in IGV. As stated in JBS, the motivation is that there are a lot of nodes during parsing, so when looking at a graph it would be nice to know which nodes are currently in the local slots or on the stack.
> 
> IGV screenshot before the fix:
> <img width="314" height="675" alt="Screenshot 2025-09-15 at 11 56 54" src="https://github.com/user-attachments/assets/3489f580-f4a3-4f22-86a6-0d9351f1d143" />
> 
> IGV screenshot after the fix:
> <img width="314" height="652" alt="Screenshot 2025-09-15 at 11 54 55" src="https://github.com/user-attachments/assets/239c7d60-7a2a-4608-acd0-036bab3ae048" />

This pull request has now been integrated.

Changeset: 6df01178
Author:    Saranya Natarajan <snatarajan at openjdk.org>
Committer: Manuel Hässig <mhaessig at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/6df01178c03968bee7994eddd187f790c74ba541
Stats:     1 line in 1 file changed: 0 ins; 0 del; 1 mod

8356779: IGV: dump the index of the SafePointNode containing the current JVMS during parsing

Reviewed-by: epeter, chagedorn, qamai

-------------

PR: https://git.openjdk.org/jdk/pull/27083

From dlunden at openjdk.org  Wed Sep 17 09:52:03 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Wed, 17 Sep 2025 09:52:03 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <dUFR_8r22-kNkoOcRIfqMuVjQ-a1EegX2YNJ8yGfFZU=.dd0eb3ac-601e-4ca7-ab32-41459788d7a2@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <dUFR_8r22-kNkoOcRIfqMuVjQ-a1EegX2YNJ8yGfFZU=.dd0eb3ac-601e-4ca7-ab32-41459788d7a2@github.com>
Message-ID: <OtWEtUlWaOy4SQOezWhifr0iuXOPLKuhMu0jIzvH3PU=.8022ef56-15e2-4fe6-944c-db80285c4b12@github.com>

On Tue, 16 Sep 2025 12:02:13 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits:
>> 
>>  - Clarify comments in regmask.hpp
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Address review comments (renaming on the way in a separate PR)
>>  - Update src/hotspot/share/opto/regmask.hpp
>>    
>>    Co-authored-by: Emanuel Peter <emanuel.peter at oracle.com>
>>  - Restore modified java/lang/invoke tests
>>  - Sort includes (new requirement)
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Add clarifying comments at definitions of register mask sizes
>>  - Fix implicit zero and nullptr checks
>>  - Add deep copy comment
>>  - ... and 29 more: https://git.openjdk.org/jdk/compare/60930a3e...c1f41288
>
> src/hotspot/share/opto/chaitin.cpp line 1663:
> 
>> 1661:     if (!OptoReg::is_valid(reg) && is_infinite_stack) {
>> 1662:       // Bump register mask up to next stack chunk
>> 1663:       bool success = lrg->rollover();
> 
> Can you add a comment that explains what this does / means? Do we start spilling to the stack slots instead of using registers?

Sure, I'll expand on the existing comment.

> src/hotspot/share/opto/regmask.hpp line 837:
> 
>> 835:   // ----------------------------------------------------------------------
>> 836:   // The methods below are only for testing purposes (see test_regmask.cpp)
>> 837:   // ----------------------------------------------------------------------
> 
> I wonder if it could be solved with `friend` instead, so it does not have to be public and get accidentally used somehow.
> 
> Or maybe some `gtest_` prefix? Not sure.

I like adding a `gtest_` prefix, I'll do that. Not sure how to make gtests work with `friend`.
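
For reference, one plain-C++ way to keep test-only helpers private is a single friend "accessor" class that only the test file uses. This is a hypothetical sketch with made-up names, not what the PR does; googletest also provides a `FRIEND_TEST` macro in `gtest/gtest_prod.h`, but whether it fits HotSpot's gtest setup is not established here.

```cpp
#include <cassert>

class Mask {
  friend class MaskTestAccess;  // the only outsider allowed to see helpers
 public:
  void insert(unsigned bit) { _words[bit / 32] |= 1u << (bit % 32); }

 private:
  unsigned _words[2] = {0, 0};
  // Test-only helper: not callable by regular C2 code.
  unsigned word(unsigned i) const { return _words[i]; }
};

// Would live next to the tests and forward to the private helpers.
class MaskTestAccess {
 public:
  static unsigned word(const Mask& m, unsigned i) { return m.word(i); }
};
```

The trade-off versus a `gtest_` prefix is one extra class in exchange for the helpers not being reachable from production code at all.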

> test/hotspot/jtreg/compiler/arguments/TestMethodArguments.java line 51:
> 
>> 49:     static final int INPUT_SIZE = 100;
>> 50: 
>> 51:     public static Template.ZeroArgs generateTest(PrimitiveType t, int numberOfArguments) {
> 
> You should write out `type` instead of `t`; that would make it consistent with your `let` below.

Thanks, I'll fix it

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2354938220
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2354945861
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2354946296

From dlunden at openjdk.org  Wed Sep 17 09:55:35 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Wed, 17 Sep 2025 09:55:35 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <dUFR_8r22-kNkoOcRIfqMuVjQ-a1EegX2YNJ8yGfFZU=.dd0eb3ac-601e-4ca7-ab32-41459788d7a2@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <dUFR_8r22-kNkoOcRIfqMuVjQ-a1EegX2YNJ8yGfFZU=.dd0eb3ac-601e-4ca7-ab32-41459788d7a2@github.com>
Message-ID: <feRNgjFxa1MZDl41muWzG13bQfvr1EjhiH7GcMSj_I4=.731caa37-c1d1-404a-8f3d-1030b4c97a05@github.com>

On Tue, 16 Sep 2025 12:42:06 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits:
>> 
>>  - Clarify comments in regmask.hpp
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Address review comments (renaming on the way in a separate PR)
>>  - Update src/hotspot/share/opto/regmask.hpp
>>    
>>    Co-authored-by: Emanuel Peter <emanuel.peter at oracle.com>
>>  - Restore modified java/lang/invoke tests
>>  - Sort includes (new requirement)
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
>>  - Add clarifying comments at definitions of register mask sizes
>>  - Fix implicit zero and nullptr checks
>>  - Add deep copy comment
>>  - ... and 29 more: https://git.openjdk.org/jdk/compare/60930a3e...c1f41288
>
> test/hotspot/jtreg/compiler/arguments/TestMethodArguments.java line 120:
> 
>> 118:                 Template.let("classpath", comp.getEscapedClassPathOfCompiledClasses()),
>> 119:                 """
>> 120:                         import java.util.Arrays;
> 
> Personally, I would not indent this deeply. I know that the generated code will not have proper indentation, but that's not so bad. Readability of the Templates is more important, I think. Subjective though.

No strong opinion here; I just went with the eclipse-jdtls autoformatter defaults. The generated code does have reasonable indentation (the indentation in the template source does not add any actual indentation to the generated code). Let me know what you prefer and I'll update it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2354957995

From shade at openjdk.org  Wed Sep 17 09:56:23 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 17 Sep 2025 09:56:23 GMT
Subject: RFR: 8357258: x86: Improve receiver type profiling reliability
 [v2]
In-Reply-To: <xpPUX9kLrXOkdpnXBq9YxRg7Xqmqtxanr9exEfPPn-I=.39db801b-1227-4d8e-8103-1317fb914731@github.com>
References: <X9RJoDWqk67MlK7P4ACCbzROxLkfEqvU0zDid3zymnc=.7c0086a3-f514-4b0d-8bc0-592bf4a05bba@github.com>
 <xpPUX9kLrXOkdpnXBq9YxRg7Xqmqtxanr9exEfPPn-I=.39db801b-1227-4d8e-8103-1317fb914731@github.com>
Message-ID: <KBgV6ORtkFKCFs8HIu7scDOpMyamN7R7oxOMovTwKYI=.7bff04a3-797e-4134-a1ad-3d2ccf925ec5@github.com>

On Mon, 15 Sep 2025 14:27:28 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> See the bug for a discussion of the issues with the current machinery.
>> 
>> This PR executes the plan outlined in the bug:
>>  1. Common the receiver type profiling code in interpreter and C1
>>  2. Rewrite receiver type profiling code to only do atomic receiver slot installations
>>  3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when the receiver is installed
>> 
>> This PR does _not_ make the counter updates themselves atomic, as that may have much wider performance implications, including regressions. This PR should be at least performance-neutral.
>> 
>> Additional testing:
>>   - [x] Linux x86_64 server fastdebug, `compiler/`
>>   - [ ] Linux x86_64 server fastdebug, `all`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls
>  - Drop atomic counters
>  - Initial version

Looking for reviews! @dean-long, @vnkozlov, @veresov -- you would probably be interested in this.
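
To make step 2 of the plan quoted above concrete, here is a hypothetical model (not HotSpot's `MethodData` layout or the PR's assembly) of installing receivers into profile rows with a CAS while leaving the counters as plain, racy increments, as the PR description says:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr std::size_t kRows = 2;

struct ReceiverProfile {
  std::atomic<const void*> receiver[kRows];
  unsigned long count[kRows];
  unsigned long polymorphic = 0;  // receivers that found no free row

  ReceiverProfile() {
    for (std::size_t i = 0; i < kRows; i++) {
      receiver[i].store(nullptr, std::memory_order_relaxed);
      count[i] = 0;
    }
  }

  void record(const void* klass) {
    // Fast path: receiver already installed in some row.
    for (std::size_t i = 0; i < kRows; i++) {
      if (receiver[i].load(std::memory_order_acquire) == klass) {
        count[i]++;  // non-atomic on purpose, like the existing counters
        return;
      }
    }
    // Claim an empty row atomically. If another thread won the race with
    // the same receiver, reuse its row instead of installing a duplicate.
    for (std::size_t i = 0; i < kRows; i++) {
      const void* expected = nullptr;
      if (receiver[i].compare_exchange_strong(expected, klass,
                                              std::memory_order_acq_rel) ||
          expected == klass) {
        count[i]++;
        return;
      }
    }
    polymorphic++;  // table full
  }
};
```

The design point is that a slot, once installed, never changes receiver, so concurrent updaters can at worst lose a count but never attribute counts to the wrong type.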

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3302217390

From dlunden at openjdk.org  Wed Sep 17 09:58:56 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Wed, 17 Sep 2025 09:58:56 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v23]
In-Reply-To: <gS-eZ4OguAN-N_CI_x4TipmDjyr1XiPce49X0z3AWc4=.afc55282-4ad2-4d65-9912-d513d11585a3@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <KuhZYofHDkGkzw1Kq6vDvRs4_aDxOJDbTpIL8gnkQL8=.0d25e4bc-1f73-490f-a65b-29bef7ac8903@github.com>
 <qfJuLa2rYGYnrmbp32LpJgVaZfShvNjVkGOuJrSw00A=.5f7b712d-5700-45b5-8beb-fde3611e31de@github.com>
 <GjF5qX4BV-4xAWV6kDweN3luDSVQXxxp5i6creb7_L4=.085a85af-0ec5-42ca-a076-bbf554853d3a@github.com>
 <gS-eZ4OguAN-N_CI_x4TipmDjyr1XiPce49X0z3AWc4=.afc55282-4ad2-4d65-9912-d513d11585a3@github.com>
Message-ID: <nHJuTQPo-3ZXu6X2rLAIwNzjnTTreNLkt5KuNeR-3mY=.183e982f-fa66-4ddc-b997-8a8203fe16db@github.com>

On Tue, 16 Sep 2025 14:36:59 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> @eme64 I have now addressed your comments (the renaming is in https://github.com/openjdk/jdk/pull/27215, as requested). Please have a look and let me know if I've missed something.
>
> @dlunde Thanks for the swift updates! I have in the meantime added some more comments, just making sure you don't miss them :)

@eme64

> You seem to have a build failure:
> 
> ```
> In file included from /home/runner/work/jdk/jdk/src/hotspot/share/opto/compile.hpp:43,
>                  from /home/runner/work/jdk/jdk/src/hotspot/share/opto/type.hpp:29,
>                  from /home/runner/work/jdk/jdk/test/hotspot/gtest/opto/test_rangeinference.cpp:26:
> /home/runner/work/jdk/jdk/src/hotspot/share/opto/regmask.hpp: In constructor ‘RegMask::RegMask(Arena*)’:
> /home/runner/work/jdk/jdk/src/hotspot/share/opto/regmask.hpp:441:53: error: class ‘RegMask’ does not have any field named ‘_read_only’
>   441 |       : _rm_word() DEBUG_ONLY(COMMA _arena(arena)), _read_only(read_only),
>       |                                                     ^~~~~~~~~~
> /home/runner/work/jdk/jdk/src/hotspot/share/opto/regmask.hpp:441:64: error: ‘read_only’ was not declared in this scope
>   441 |       : _rm_word() DEBUG_ONLY(COMMA _arena(arena)), _read_only(read_only),
>       |     
> ```

Thanks, it only failed on release builds, so I didn't notice. Will fix.
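
The error pattern above is typical for a debug-only field: in release builds the field does not exist, so every initializer for it must also sit inside `DEBUG_ONLY`. The sketch below re-defines simplified `DEBUG_ONLY`/`COMMA` macros in the style of HotSpot's `utilities/macros.hpp` (an assumption; it does not include the real header) on a hypothetical `Mask` class:

```cpp
#include <cassert>

// Simplified re-definitions for this sketch only.
#ifdef ASSERT
#define DEBUG_ONLY(code) code
#else
#define DEBUG_ONLY(code)
#endif
#define COMMA ,

struct Arena {};

class Mask {
 public:
  // Both debug-only initializers are inside DEBUG_ONLY, so the constructor
  // compiles whether or not ASSERT is defined.
  explicit Mask(Arena* arena, bool read_only = false)
      : _size(2) DEBUG_ONLY(COMMA _arena(arena) COMMA _read_only(read_only)) {
    (void)arena;      // unused in release builds
    (void)read_only;  // unused in release builds
  }
  unsigned size() const { return _size; }

 private:
  unsigned _size;
  DEBUG_ONLY(Arena* _arena;)
  DEBUG_ONLY(bool _read_only;)
};
```

Compiling the same file once with and once without `-DASSERT` is a cheap local check for this class of release-only breakage.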

> I really appreciate that you added extensive `gtest`s, thanks for that.

@robcasloz contributed 90% of that, so the credit goes to him!

> And thanks for using the Template Framework, I'm curious to hear if you have any feedback on it :)

Sure, it was quite convenient. Happy to talk about the experience offline.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-3302233210

From dlunden at openjdk.org  Wed Sep 17 10:07:58 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Wed, 17 Sep 2025 10:07:58 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v29]
In-Reply-To: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
Message-ID: <IJFPzAZnUjnsVQZOmcCfGEa-vgUTBCSYvIkTAzWgFyo=.2a3d0da5-7fb5-48af-b598-77241699f350@github.com>

> If a method has a large number of parameters, we currently bail out from C2 compilation.
> 
> ### Changeset
> 
> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
> 
> Changes:
> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
> - Remove all `can_represent` checks and bailouts.
> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
> 
> ### Testing
> 
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by approximately 1% on average for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is the additional checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but the changes I found that lead to improvements also lead to less readable code (and are, in my opinion, not worth it).
> 
> ![c2-regression](https:/...

Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:

  Update after comments from Emanuel
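
The register-mask design in the PR description quoted above (statically allocated base words, dynamic extension only when needed, chunk "rollover", early-exit `overlap`) can be pictured with a small model. Everything here (names, word counts, rollover semantics) is an assumption for illustration, not the `RegMask` code from the PR:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

class GrowableMask {
 public:
  using Word = std::uint64_t;
  static constexpr unsigned kInlineWords = 2;  // common case: no heap use
  static constexpr unsigned kBitsPerWord = 64;

  void insert(unsigned bit) {
    unsigned w = bit / kBitsPerWord;
    grow_to(w + 1);
    word_ref(w) |= Word{1} << (bit % kBitsPerWord);
  }

  // Bit indices are relative to the current chunk offset.
  bool member(unsigned bit) const {
    unsigned w = bit / kBitsPerWord;
    return w < size() && ((word(w) >> (bit % kBitsPerWord)) & 1);
  }

  // Early exit on the first shared word.
  bool overlap(const GrowableMask& other) const {
    unsigned n = size() < other.size() ? size() : other.size();
    for (unsigned w = 0; w < n; w++) {
      if (word(w) & other.word(w)) return true;
    }
    return false;
  }

  // Move the window to the next chunk of stack slots and make every slot
  // in it available (chunk size = current mask size in this model).
  void rollover() {
    _offset += size() * kBitsPerWord;
    for (unsigned w = 0; w < size(); w++) word_ref(w) = ~Word{0};
  }

  unsigned offset() const { return _offset; }
  unsigned size() const { return kInlineWords + unsigned(_ext.size()); }

 private:
  Word word(unsigned w) const {
    return w < kInlineWords ? _base[w] : _ext[w - kInlineWords];
  }
  Word& word_ref(unsigned w) {
    return w < kInlineWords ? _base[w] : _ext[w - kInlineWords];
  }
  void grow_to(unsigned words) {
    if (words > size()) _ext.resize(words - kInlineWords, 0);  // rare path
  }

  Word _base[kInlineWords] = {0, 0};
  std::vector<Word> _ext;  // only used for very large masks
  unsigned _offset = 0;    // first stack slot represented by bit 0
};
```

Keeping the inline words means the common case never touches the heap, which matches the stated goal of not slowing down methods with a normal number of parameters.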

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20404/files
  - new: https://git.openjdk.org/jdk/pull/20404/files/fe69f5a3..9b04b5a7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=28
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=27-28

  Stats: 35 lines in 5 files changed: 6 ins; 0 del; 29 mod
  Patch: https://git.openjdk.org/jdk/pull/20404.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404

PR: https://git.openjdk.org/jdk/pull/20404

From chagedorn at openjdk.org  Wed Sep 17 10:42:03 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 17 Sep 2025 10:42:03 GMT
Subject: RFR: 8367728: IGV: dump node address type
In-Reply-To: <M9c90F3oUYUfELtyNkyZ4TtBXaYA9U1OW-DaQGCq1hs=.61ac61b7-4004-4fc2-8658-7a3f7daa3d7b@github.com>
References: <M9c90F3oUYUfELtyNkyZ4TtBXaYA9U1OW-DaQGCq1hs=.61ac61b7-4004-4fc2-8658-7a3f7daa3d7b@github.com>
Message-ID: <5-sK87ityBleluOy9_tQuCywbTT5b03KyEUMe9Yk7LQ=.6cb6ca6c-b0b0-417f-b3ad-732d422908b2@github.com>

On Tue, 16 Sep 2025 10:11:50 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> This changeset dumps the address type of each node (`Node::adr_type()`), when not null, into the IGV graphs. This should improve the visibility and diagnosability of C2 type inconsistencies, see e.g. [JDK-8367667](https://bugs.openjdk.org/browse/JDK-8367667).
> 
> #### Testing
> - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode).
> - Tested IGV manually on a few selected graphs. Tested automatically that dumping thousands of graphs does not trigger any assertion failure (by running `java -Xcomp -XX:PrintIdealGraphLevel=1`).

Looks good!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27310#pullrequestreview-3233795276

From epeter at openjdk.org  Wed Sep 17 11:32:01 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 17 Sep 2025 11:32:01 GMT
Subject: RFR: 8367313: CTW: Execute in AWT headless mode [v2]
In-Reply-To: <iybHTcZUmGuAWqzGCjVu1j9492b1ANCrc-IsWpf5qVs=.a1f74414-739f-4433-a1c6-653ec3c85e15@github.com>
References: <d8Vd-xRvi6T6oizZvB3M7abUjT8ifE1n157D24UXGmA=.9370efd0-cc85-4190-bf23-fb7ca62301dd@github.com>
 <iybHTcZUmGuAWqzGCjVu1j9492b1ANCrc-IsWpf5qVs=.a1f74414-739f-4433-a1c6-653ec3c85e15@github.com>
Message-ID: <YUojiha5vdS3nD8bQtNKLS3jkWMjFv_HybkUu8VR9Ak=.3ff535d9-9e2c-4a82-b4bf-ad5bd9ef2dbe@github.com>

On Mon, 15 Sep 2025 14:08:57 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> I have been doing CTW parallelization improvements, and noticed that some of the AWT clinits run and initialize the graphics stack. This is awkward for a few reasons:
>> 
>>  1. We might be running in a headless environment, where these clinits could fail, shrinking the CTW testing scope.
>>  2. There are dependencies in graphics stack initialization that break -- in one case in my parallelization tests, I have seen the VM crash due to an uninitialized AWT lock, because the randomized CTW runner managed to execute clinits in an unusual order. Running in headless mode avoids dealing with that path altogether.
>> 
>> I think we should be running CTW tests in AWT headless mode to begin with. 
>> 
>> Additional testing:
>>  - [x] Linux x86_64 server fastdebug, `applications/ctw/modules`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8367313-ctw-headless-mode
>  - Fix

Tests passed, yes :)
Approved!

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27187#pullrequestreview-3233980111

From shade at openjdk.org  Wed Sep 17 11:39:21 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 17 Sep 2025 11:39:21 GMT
Subject: RFR: 8367313: CTW: Execute in AWT headless mode [v2]
In-Reply-To: <iybHTcZUmGuAWqzGCjVu1j9492b1ANCrc-IsWpf5qVs=.a1f74414-739f-4433-a1c6-653ec3c85e15@github.com>
References: <d8Vd-xRvi6T6oizZvB3M7abUjT8ifE1n157D24UXGmA=.9370efd0-cc85-4190-bf23-fb7ca62301dd@github.com>
 <iybHTcZUmGuAWqzGCjVu1j9492b1ANCrc-IsWpf5qVs=.a1f74414-739f-4433-a1c6-653ec3c85e15@github.com>
Message-ID: <HtNZPkjpH9R4GeOIqIb0egda4zGNkRObwSioYLjuP7U=.8113d4fa-5d0c-49a6-824a-c386f4100a8a@github.com>

On Mon, 15 Sep 2025 14:08:57 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> I have been doing CTW parallelization improvements, and noticed that some of the AWT clinits run and initialize the graphics stack. This is awkward for a few reasons:
>> 
>>  1. We might be running in a headless environment, where these clinits could fail, shrinking the CTW testing scope.
>>  2. There are dependencies in graphics stack initialization that break -- in one case in my parallelization tests, I have seen the VM crash due to an uninitialized AWT lock, because the randomized CTW runner managed to execute clinits in an unusual order. Running in headless mode avoids dealing with that path altogether.
>> 
>> I think we should be running CTW tests in AWT headless mode to begin with. 
>> 
>> Additional testing:
>>  - [x] Linux x86_64 server fastdebug, `applications/ctw/modules`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8367313-ctw-headless-mode
>  - Fix

Thanks! Let's go.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27187#issuecomment-3302573510

From shade at openjdk.org  Wed Sep 17 11:39:22 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 17 Sep 2025 11:39:22 GMT
Subject: Integrated: 8367313: CTW: Execute in AWT headless mode
In-Reply-To: <d8Vd-xRvi6T6oizZvB3M7abUjT8ifE1n157D24UXGmA=.9370efd0-cc85-4190-bf23-fb7ca62301dd@github.com>
References: <d8Vd-xRvi6T6oizZvB3M7abUjT8ifE1n157D24UXGmA=.9370efd0-cc85-4190-bf23-fb7ca62301dd@github.com>
Message-ID: <YX6P5BCMUMazr41KJpFrkUYOlncTfuPyHiwXaQSjAYY=.0468fb52-8810-4640-b7a7-ab7895824610@github.com>

On Wed, 10 Sep 2025 08:11:43 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> I have been doing CTW parallelization improvements, and noticed that some of the AWT clinits run and initialize the graphics stack. This is awkward for a few reasons:
> 
>  1. We might be running in a headless environment, where these clinits could fail, shrinking the CTW testing scope.
>  2. There are dependencies in graphics stack initialization that break -- in one case in my parallelization tests, I have seen the VM crash due to an uninitialized AWT lock, because the randomized CTW runner managed to execute clinits in an unusual order. Running in headless mode avoids dealing with that path altogether.
> 
> I think we should be running CTW tests in AWT headless mode to begin with. 
> 
> Additional testing:
>  - [x] Linux x86_64 server fastdebug, `applications/ctw/modules`

This pull request has now been integrated.

Changeset: 7e738f0d
Author:    Aleksey Shipilev <shade at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/7e738f0d906e574706a277fabbc2cc1df6f11f19
Stats:     2 lines in 1 file changed: 2 ins; 0 del; 0 mod

8367313: CTW: Execute in AWT headless mode

Reviewed-by: epeter, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/27187

From epeter at openjdk.org  Wed Sep 17 11:45:37 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 17 Sep 2025 11:45:37 GMT
Subject: RFR: 8367389: C2 SuperWord: refactor VTransform to model the whole
 loop instead of just the basic block [v2]
In-Reply-To: <CvihFOm8NmwV8tPA1Qr1R4h5FpFfDv0PznlH39Wjb2g=.ea64c956-d341-4a38-967d-7e689de060fb@github.com>
References: <qNs-pEYa6BR200XijY4JB8-DzdxjQPVR28zefXhEFNo=.89143141-f462-45e1-a1cb-cbe66dfcaf5d@github.com>
 <CvihFOm8NmwV8tPA1Qr1R4h5FpFfDv0PznlH39Wjb2g=.ea64c956-d341-4a38-967d-7e689de060fb@github.com>
Message-ID: <arfabe5cw4v8BihjkayJJfK5sWdl2dcS8t3qXjPFLoM=.7b8069d6-5a01-4e51-a057-44d51f5573f2@github.com>

On Wed, 17 Sep 2025 09:03:48 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   for Manuel
>
> Thank you for your continued effort on this, @eme64! The overall change looks good to me, but I have a few minor suggestions and questions.

@mhaessig Thanks for the comments! I realized I had some extra code comments "pending" on github, so I added them now.

@mhaessig Ready for re-review ;)

> src/hotspot/share/opto/superwordVTransformBuilder.cpp line 143:
> 
>> 141:       init_req_with_scalar(n, vtn, MemNode::ValueIn);
>> 142:       add_memory_dependencies_of_node_to_vtnode(n, vtn, vtn_memory_dependencies);
>> 143:     } else if (n->isa_CountedLoop()) {
> 
> Suggestion:
> 
>     } else if (n->is_CountedLoop()) {
> 
> Otherwise, this is an implicit `!= nullptr` check.

Good catch!
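
A tiny hypothetical node hierarchy illustrating the point of the suggestion: `isa_X()` returns a possibly-null downcast pointer, so using it as a condition relies on an implicit pointer-to-bool conversion, while `is_X()` states the yes/no question directly. HotSpot implements these checks differently (via class ids); this only models the calling convention:

```cpp
#include <cassert>

struct CountedLoopNode;

struct Node {
  virtual ~Node() = default;
  virtual CountedLoopNode* isa_CountedLoop() { return nullptr; }
  bool is_CountedLoop() { return isa_CountedLoop() != nullptr; }
};

struct CountedLoopNode : Node {
  CountedLoopNode* isa_CountedLoop() override { return this; }
};

// Same behavior; the second form makes the intent explicit.
int classify_implicit(Node* n) { return n->isa_CountedLoop() ? 1 : 0; }
int classify_explicit(Node* n) { return n->is_CountedLoop() ? 1 : 0; }
```

`isa_X()` remains the right choice when the branch actually needs the downcast pointer.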

> src/hotspot/share/opto/vectorization.cpp line 228:
> 
>> 226:       PhiNode* head = _heads.at(alias_idx);
>> 227:       if (head == nullptr) {
>> 228:         // We did not find a phi on this slice yet -> must be a slice with only loads.
> 
> Could you elaborate on why this is, for my understanding? Could this not find the load before the phi?

We loop over `_body.body()`, which is already topologically ordered. So if the `load` depends on the `phi` on the memory graph, then the `phi` must already have been found.

I'll add a comment in the code.
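
The argument above can be modelled in a few lines. In this hypothetical sketch (node kinds, fields, and names are assumptions, not the `VLoopMemorySlices` code), the body is visited in topological order, so any memory phi a load or store depends on has already been recorded as the head of its slice; reaching a load with no recorded head therefore means the slice has no phi and is load-only:

```cpp
#include <cassert>
#include <map>
#include <vector>

enum class Kind { MemPhi, Load, Store };

struct MemNode {
  Kind kind;
  int alias_idx;
};

struct SliceInfo {
  int head = -1;             // body index of the memory phi, or -1
  std::vector<int> members;  // body indices of loads/stores on the slice
};

std::map<int, SliceInfo> find_slices(const std::vector<MemNode>& body) {
  std::map<int, SliceInfo> slices;
  for (int i = 0; i < static_cast<int>(body.size()); i++) {
    SliceInfo& s = slices[body[i].alias_idx];
    if (body[i].kind == Kind::MemPhi) {
      s.head = i;
    } else {
      // Topological order: a phi this node depends on was visited already,
      // so a missing head is only expected for loads on load-only slices.
      assert(s.head != -1 || body[i].kind == Kind::Load);
      s.members.push_back(i);
    }
  }
  return slices;
}

bool is_load_only(const SliceInfo& s) { return s.head == -1; }
```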

> src/hotspot/share/opto/vtransform.hpp line 30:
> 
>> 28: #include "opto/vectorization.hpp"
>> 29: #include "opto/vectornode.hpp"
>> 30: #include "utilities/debug.hpp"
> 
> Am I missing something? I cannot see where it is used.

Good catch!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27208#issuecomment-3302599208
PR Comment: https://git.openjdk.org/jdk/pull/27208#issuecomment-3302600873
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2355221323
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2355213338
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2355228785

From epeter at openjdk.org  Wed Sep 17 11:45:57 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 17 Sep 2025 11:45:57 GMT
Subject: RFR: 8367389: C2 SuperWord: refactor VTransform to model the whole
 loop instead of just the basic block [v2]
In-Reply-To: <i0uwzmuCcxXGG4d4-ronPIB3wETW3sHL6M5yRuHkfA4=.45c69c45-84a0-4234-96f6-eabe01b42a69@github.com>
References: <qNs-pEYa6BR200XijY4JB8-DzdxjQPVR28zefXhEFNo=.89143141-f462-45e1-a1cb-cbe66dfcaf5d@github.com>
 <i0uwzmuCcxXGG4d4-ronPIB3wETW3sHL6M5yRuHkfA4=.45c69c45-84a0-4234-96f6-eabe01b42a69@github.com>
Message-ID: <e_07k0GWUu9mzidOroBKe97Il-l1GJXd6ggPj8-QIIU=.b8d8c5da-5aa3-49f6-92b0-436bca39c2ee@github.com>

On Wed, 17 Sep 2025 11:42:10 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
>> https://github.com/openjdk/jdk/pull/20964
>> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
>> 
>> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
>> 
>> ------------------------------
>> 
>> **Goals**
>> - VTransform models **all nodes in the loop**, not just the basic block (enables later VTransform::optimize, like moving reductions out of the loop)
>> - Remove `_nodes` from the vector vtnodes.
>> 
>> **Details**
>> - Remove: `AUTO_VECTORIZATION2_AFTER_REORDER`, `apply_memops_reordering_with_schedule`, `print_memops_schedule`.
>>   - Instead of reordering the scalar memops, we create the new memory graph during `VTransform::apply`. That is why the `VTransformApplyState` now needs to track the memory states.
>> - Refactor `VLoopMemorySlices`: map not just memory slices with phis (have stores in loop), but also those with only loads (no phi).
>> - Create vtnodes for all nodes in the loop (not just the basic block), as well as inputs (already) and outputs (new). Also mapping the output nodes means that, during `apply`, we naturally connect the uses after the loop to their inputs from the loop (which may be new nodes after the transformation).
>> - `_mem_ref_for_main_loop_alignment` -> `_vpointer_for_main_loop_alignment`. Instead of tracking the memory node to later have access to its `VPointer`, we take it directly. That removes one more use of `_nodes` for vector vtnodes.
>> 
>> I also made a lot of annotations in the code below, for easier review.
>> 
>> **Suggested order for review**
>> - Removal of `VTransformGraph::apply_memops_reordering_with_schedule` -> sets up need to build memory graph on the fly.
>> - Old and new code for `VLoopMemorySlices` -> we now also track load-only slices.
>> - `build_scalar_vtnodes_for_non_packed_nodes`, `build_inputs_for_scalar_vtnodes`, `build_uses_after_loop`, `apply_vtn_inputs_to_node` (use in `apply`), `apply_backedge`, `fix_memory_state_uses_after_loop`
>> - `VTransformApplyState`: how it now tracks the memory state.
>> - `VTransformVectorNode` -> removal of `_nodes` (Big Win!)
>> - Then look at all the other details.
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   for Manuel

src/hotspot/share/opto/phasetype.hpp line 95:

> 93:   flags(AUTO_VECTORIZATION4_AFTER_SPECULATIVE_RUNTIME_CHECKS, "AutoVectorization 3, after Adding Speculative Runtime Checks") \
> 94:   flags(AUTO_VECTORIZATION5_AFTER_APPLY,                      "AutoVectorization 4, after Apply") \
> 95:   flags(BEFORE_CCP1,                    "Before PhaseCCP 1") \

Removing `apply_memops_reordering_with_schedule`.

src/hotspot/share/opto/superword.cpp line 668:

> 666: }
> 667: 
> 668: // Get all memory nodes of a slice, in reverse order

Refactored and moved to `vectorization.hpp`, where it belongs.

src/hotspot/share/opto/superword.cpp line 670:

> 668:   // Iterate over all memory phis
> 669:   for (DUIterator_Fast imax, i = cl->fast_outs(imax); i < imax; i++) {
> 670:     PhiNode* phi = cl->fast_out(i)->isa_Phi();

Note: the old way only tracked memory slices that have a phi (i.e. slices that have stores). But we now also need to track slices that only have loads, and hence no phi.
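A toy C++ sketch of the two cases, to make the distinction concrete. `MemOp`, `SliceKind` and `classify_slices` are hypothetical names; this is not the actual `VLoopMemorySlices` code, which works on C2 nodes and alias indices. A slice with at least one store in the loop has a memory phi. A slice with only loads has none, and all of its loads read the memory state from before the loop.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy memory operation: the alias slice it belongs to, and whether it is a store.
struct MemOp {
    int slice;
    bool is_store;
};

enum class SliceKind { LoadOnly, WithPhi };

// Classify every slice touched by the loop body (hypothetical helper).
// A single store is enough to require a memory phi for the slice.
std::map<int, SliceKind> classify_slices(const std::vector<MemOp>& body) {
    std::map<int, SliceKind> kinds;
    for (const MemOp& op : body) {
        auto it = kinds.find(op.slice);
        if (it == kinds.end()) {
            kinds[op.slice] = op.is_store ? SliceKind::WithPhi : SliceKind::LoadOnly;
        } else if (op.is_store) {
            it->second = SliceKind::WithPhi;  // upgrade a load-only slice
        }
    }
    return kinds;
}
```

With the old approach, only the `WithPhi` slices would have been discovered, because they were found by iterating the memory phis of the loop head.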

src/hotspot/share/opto/superword.cpp line 1555:

> 1553:     assert(pack != nullptr, "memop of final solution must still be packed");
> 1554:     _vpointer_for_main_loop_alignment = &vpointer(mem);
> 1555:     _aw_for_main_loop_alignment = pack->size() * mem->memory_size();

Later, we only need the `VPointer`, and not the `mem` node itself. This removes the dependency on `_nodes` for vtnodes.

src/hotspot/share/opto/superword.cpp line 1994:

> 1992: }
> 1993: 
> 1994: void VTransformGraph::apply_vectorization_for_each_vtnode(uint& max_vector_length, uint& max_vector_width) const {

We now create the memory graph from scratch, during `apply`, `apply_backedge` and `apply_state.fix_memory_state_uses_after_loop`. The `VTransformApplyState` keeps track of the memory states.

src/hotspot/share/opto/superword.cpp line 2675:

> 2673:   for (uint i = 0; i < pack->size(); i++) {
> 2674:     Node* n = pack->at(i);
> 2675:     assert(n->is_Load(), "only meaningful for loads");

We can use the `pack` to access the nodes during construction of the `VTransform`, and we do not need to keep the `pack` nodes in the `_nodes` any more.

src/hotspot/share/opto/superwordVTransformBuilder.cpp line 59:

> 57:   for (uint i = 0; i < _vloop.lpt()->_body.size(); i++) {
> 58:     Node* n = _vloop.lpt()->_body.at(i);
> 59:     if (_packset.get_pack(n) != nullptr) { continue; }

Create nodes for all nodes in the loop, not just the basic block.

src/hotspot/share/opto/superwordVTransformBuilder.cpp line 71:

> 69:       vtn = new (_vtransform.arena()) VTransformCountedLoopNode(_vtransform, n->as_CountedLoop());
> 70:     } else if (n->is_CFG()) {
> 71:       vtn = new (_vtransform.arena()) VTransformCFGNode(_vtransform, n);

`CountedLoop` is a special case of `CFG`, so it has to be checked first.

src/hotspot/share/opto/superwordVTransformBuilder.cpp line 147:

> 145:       init_req_with_scalar(n, vtn, LoopNode::EntryControl);
> 146:       init_req_with_scalar(n, vtn, LoopNode::LoopBackControl);
> 147:     } else {

Also map the backedges of `Phi` and `CountedLoop` - we are mapping the whole loop!

src/hotspot/share/opto/superwordVTransformBuilder.cpp line 178:

> 176:   }
> 177: }
> 178: 

We also create `Outer` vtnodes for all uses after the loop. Mapping also the output nodes means during `apply`, we naturally connect the uses after the loop to their inputs from the loop (which may be new nodes after the transformation).

src/hotspot/share/opto/superwordVTransformBuilder.cpp line 212:

> 210:     vtn = new (_vtransform.arena()) VTransformElementWiseVectorNode(_vtransform, p0->req(), properties, vopc);
> 211:   }
> 212:   vtn->set_nodes(pack);

We don't need `_nodes` any more!

src/hotspot/share/opto/vectorization.cpp line 190:

> 188:   }
> 189: 
> 190:   _memory_slices.find_memory_slices();

`VLoopMemorySlices` needs the body as input, so compute it earlier!

src/hotspot/share/opto/vectorization.cpp line 212:

> 210: // - No memory phi: only loads. All have the same input memory state from before the loop.
> 211: // - With memory phi. Chain of memory operations inside the loop.
> 212: void VLoopMemorySlices::find_memory_slices() {

See `VLoopMemorySlices` for more documentation on the cases.

src/hotspot/share/opto/vectorization.hpp line 382:

> 380: };
> 381: 
> 382: // Submodule of VLoopAnalyzer.

Refactored and moved down.

src/hotspot/share/opto/vectorization.hpp line 474:

> 472:   const VLoopBody& _body;
> 473: 
> 474:   GrowableArray<Node*>    _inputs;

We used to track only slices with phis (i.e. with stores in the loop), and not those with only loads (no phi needed). But now we also need to know the input memory slice for loads during `apply`, when we call `apply_state.memory_state`.

src/hotspot/share/opto/vtransform.cpp line 83:

> 81: 
> 82:         // Skip LoopPhi backedge.
> 83:         if ((use->isa_LoopPhi() != nullptr || use->isa_CountedLoop() != nullptr) && use->in_req(2) == vtn) { continue; }

We now also map the `Phi` and `CountedLoop` backedges, but for scheduling we need to ignore them to get a DAG.
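A minimal sketch of why skipping the backedge yields a DAG, assuming a toy node representation where input 2 of a loop-head node is the backedge (as for `Phi` and `CountedLoop` in the snippet above). `ToyNode` and `schedule` are hypothetical; this is just Kahn's topological sort, not the real scheduler in `vtransform.cpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Toy node: inputs are indices of other nodes (-1 = no input).
// For loop-head nodes (loop phis, the loop itself), input 2 is the backedge.
struct ToyNode {
    std::vector<int> in;
    bool is_loop_head;
};

// Kahn's algorithm. Backedge inputs of loop-head nodes are skipped so that
// the loop body forms a DAG. Returns an empty vector if a cycle remains.
std::vector<int> schedule(const std::vector<ToyNode>& nodes) {
    const size_t n = nodes.size();
    std::vector<int> indegree(n, 0);
    std::vector<std::vector<int>> uses(n);
    for (size_t u = 0; u < n; u++) {
        for (size_t i = 0; i < nodes[u].in.size(); i++) {
            int def = nodes[u].in[i];
            if (def < 0) { continue; }
            if (nodes[u].is_loop_head && i == 2) { continue; }  // skip backedge
            uses[def].push_back(static_cast<int>(u));
            indegree[u]++;
        }
    }
    std::queue<int> ready;
    for (size_t u = 0; u < n; u++) {
        if (indegree[u] == 0) { ready.push(static_cast<int>(u)); }
    }
    std::vector<int> order;
    while (!ready.empty()) {
        int d = ready.front();
        ready.pop();
        order.push_back(d);
        for (int u : uses[d]) {
            if (--indegree[u] == 0) { ready.push(u); }
        }
    }
    if (order.size() != n) { order.clear(); }  // cycle: not a DAG
    return order;
}
```

If input 2 were treated like any other edge, the phi and the node feeding its backedge would form a cycle and the sort would fail.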

src/hotspot/share/opto/vtransform.cpp line 778:

> 776:     }
> 777:   }
> 778: }

We now systematically use the edges of the vtnodes when building the graph. Before, we just relied on the old C2 node edges still being correct, but we need to move away from this to allow more graph reshaping on the vtnodes later.

src/hotspot/share/opto/vtransform.cpp line 787:

> 785:   if (_node->is_Store()) {
> 786:     apply_state.set_memory_state(_node->adr_type(), _node);
> 787:   }

We build the memory graph on the fly, instead of first reordering the scalar mem nodes with `apply_memops_reordering_with_schedule`.
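To illustrate building the memory graph on the fly, here is a toy sketch with hypothetical `ToyApplyState`, `ToyOp` and `apply_schedule` (node names are plain strings; this is not the real `VTransformApplyState`). Each slice starts at its memory phi, or at the memory state from before the loop for load-only slices. Every load takes the current state of its slice as memory input, and every store becomes the new state, much like the `set_memory_state` call in the snippet above.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy per-slice memory state tracker (hypothetical).
class ToyApplyState {
public:
    void init_slice(int slice, const std::string& initial) { _state[slice] = initial; }
    const std::string& memory_state(int slice) const { return _state.at(slice); }
    void set_memory_state(int slice, const std::string& store) { _state[slice] = store; }
private:
    std::unordered_map<int, std::string> _state;
};

struct ToyOp {
    int slice;
    bool is_store;
    std::string name;
};

// Emit memory ops in schedule order. Returns, for each op, the memory
// input it was connected to.
std::vector<std::string> apply_schedule(ToyApplyState& st, const std::vector<ToyOp>& ops) {
    std::vector<std::string> mem_inputs;
    for (const ToyOp& op : ops) {
        mem_inputs.push_back(st.memory_state(op.slice));
        if (op.is_store) { st.set_memory_state(op.slice, op.name); }
    }
    return mem_inputs;
}
```

After the schedule is processed, the final state of a slice with stores is the last store, which is what the backedge of that slice's memory phi would then be connected to.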

src/hotspot/share/opto/vtransform.cpp line 914:

> 912:     Node* n = _nodes.at(i);
> 913:     phase->igvn().replace_node(n, vn);
> 914:   }

We don't need to replace the old nodes any more: since we now systematically use the vtnode edges, the old nodes simply get disconnected. This is also why we need to map all use nodes after the loop with `Outer` vtnodes, so that they then automatically change the edges to the new nodes during `apply`.

See how `VTransformOuterNode::apply` uses `apply_vtn_inputs_to_node`.
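A toy illustration of the rewiring, with hypothetical `OuterUse` and `rewire_outer_use` (not the real `VTransformOuterNode`). The outer node only remembers which loop vtnodes it reads. During apply, its inputs are looked up from whatever those vtnodes were transformed into, so the old scalar nodes are simply left without uses instead of being replaced explicitly.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy "outer" node: a use after the loop, recording the loop nodes it reads.
struct OuterUse {
    std::vector<std::string> vtn_inputs;
};

// Compute the new inputs of the use after the loop from the mapping
// "loop node -> node it was transformed into" (hypothetical helper).
std::vector<std::string> rewire_outer_use(
        const OuterUse& use,
        const std::unordered_map<std::string, std::string>& transformed) {
    std::vector<std::string> new_inputs;
    for (const std::string& in : use.vtn_inputs) {
        auto it = transformed.find(in);
        // Loop nodes that were not transformed stay as they are.
        new_inputs.push_back(it == transformed.end() ? in : it->second);
    }
    return new_inputs;
}
```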

src/hotspot/share/opto/vtransform.cpp line 955:

> 953:   });
> 954: }
> 955: 

Obsolete after removal of `apply_memops_reordering_with_schedule`.

src/hotspot/share/opto/vtransform.hpp line 191:

> 189: 
> 190:   template<typename Callback>
> 191:   void for_each_memop_in_schedule(Callback callback) const;

Obsolete after removal of `apply_memops_reordering_with_schedule`.

src/hotspot/share/opto/vtransform.hpp line 293:

> 291:   // loop. If there is a memory phi, this is initially the memory phi, and each time
> 292:   // a store is processed, it is updated to that store.
> 293:   GrowableArray<Node*> _memory_states;

Needed to build the memory graph on the fly during `apply`.

src/hotspot/share/opto/vtransform.hpp line 452:

> 450:   virtual VTransformApplyResult apply(VTransformApplyState& apply_state) const = 0;
> 451: 
> 452:   Node* find_transformed_input(int i, const GrowableArray<Node*>& vnode_idx_to_transformed_node) const;

Missed the removal in an earlier refactoring. Let's do it now.

src/hotspot/share/opto/vtransform.hpp line 636:

> 634:   const VTransformVectorNodeProperties _properties;
> 635: protected:
> 636:   GrowableArray<Node*> _nodes;

Big win! Saves us some memory per node, and means the vector nodes are no longer tied to scalar nodes. We will soon be able to optimize the graph with vector nodes that have no scalar equivalent, for example shuffles.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343516365
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343519154
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343562260
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343521196
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343524759
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343515369
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343529810
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343527996
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343533310
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343540827
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343541422
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343544731
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343546719
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343548855
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343570394
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343553989
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343577532
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343580534
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343593023
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343595455
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343598080
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343600818
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343602701
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343608818

From epeter at openjdk.org  Wed Sep 17 11:45:34 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 17 Sep 2025 11:45:34 GMT
Subject: RFR: 8367389: C2 SuperWord: refactor VTransform to model the whole
 loop instead of just the basic block [v2]
In-Reply-To: <qNs-pEYa6BR200XijY4JB8-DzdxjQPVR28zefXhEFNo=.89143141-f462-45e1-a1cb-cbe66dfcaf5d@github.com>
References: <qNs-pEYa6BR200XijY4JB8-DzdxjQPVR28zefXhEFNo=.89143141-f462-45e1-a1cb-cbe66dfcaf5d@github.com>
Message-ID: <i0uwzmuCcxXGG4d4-ronPIB3wETW3sHL6M5yRuHkfA4=.45c69c45-84a0-4234-96f6-eabe01b42a69@github.com>

> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
> https://github.com/openjdk/jdk/pull/20964
> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
> 
> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
> 
> ------------------------------
> 
> **Goals**
> - VTransform models **all nodes in the loop**, not just the basic block (enables later VTransform::optimize, like moving reductions out of the loop)
> - Remove `_nodes` from the vector vtnodes.
> 
> **Details**
> - Remove: `AUTO_VECTORIZATION2_AFTER_REORDER`, `apply_memops_reordering_with_schedule`, `print_memops_schedule`.
>   - Instead of reordering the scalar memops, we create the new memory graph during `VTransform::apply`. That is why the `VTransformApplyState` now needs to track the memory states.
> - Refactor `VLoopMemorySlices`: map not just memory slices with phis (have stores in loop), but also those with only loads (no phi).
> - Create vtnodes for all nodes in the loop (not just the basic block), as well as inputs (already) and outputs (new). Mapping also the output nodes means during `apply`, we naturally connect the uses after the loop to their inputs from the loop (which may be new nodes after the transformation).
> - `_mem_ref_for_main_loop_alignment` -> `_vpointer_for_main_loop_alignment`. Instead of tracking the memory node to later have access to its `VPointer`, we take it directly. That removes one more use of `_nodes` for vector vtnodes.
> 
> I also made a lot of annotations in the code below, for easier review.
> 
> **Suggested order for review**
> - Removal of `VTransformGraph::apply_memops_reordering_with_schedule` -> sets up need to build memory graph on the fly.
> - Old and new code for `VLoopMemorySlices` -> we now also track load-only slices.
> - `build_scalar_vtnodes_for_non_packed_nodes`, `build_inputs_for_scalar_vtnodes`, `build_uses_after_loop`, `apply_vtn_inputs_to_node` (use in `apply`), `apply_backedge`, `fix_memory_state_uses_after_loop`
> - `VTransformApplyState`: how it now tracks the memory state.
> - `VTransformVectorNode` -> removal of `_nodes` (Big Win!)
> - Then look at all the other details.

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  for Manuel

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27208/files
  - new: https://git.openjdk.org/jdk/pull/27208/files/3ec3ea2a..469426a7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27208&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27208&range=00-01

  Stats: 4 lines in 3 files changed: 2 ins; 1 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27208.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27208/head:pull/27208

PR: https://git.openjdk.org/jdk/pull/27208

From epeter at openjdk.org  Wed Sep 17 11:45:58 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 17 Sep 2025 11:45:58 GMT
Subject: RFR: 8367389: C2 SuperWord: refactor VTransform to model the whole
 loop instead of just the basic block [v2]
In-Reply-To: <e_07k0GWUu9mzidOroBKe97Il-l1GJXd6ggPj8-QIIU=.b8d8c5da-5aa3-49f6-92b0-436bca39c2ee@github.com>
References: <qNs-pEYa6BR200XijY4JB8-DzdxjQPVR28zefXhEFNo=.89143141-f462-45e1-a1cb-cbe66dfcaf5d@github.com>
 <i0uwzmuCcxXGG4d4-ronPIB3wETW3sHL6M5yRuHkfA4=.45c69c45-84a0-4234-96f6-eabe01b42a69@github.com>
 <e_07k0GWUu9mzidOroBKe97Il-l1GJXd6ggPj8-QIIU=.b8d8c5da-5aa3-49f6-92b0-436bca39c2ee@github.com>
Message-ID: <fl4GnLHqdqPsSbOzEuZRQUCjAgW4l-u9XDyks5uhx7E=.d60285df-f519-4058-ae3b-af768946f077@github.com>

On Fri, 12 Sep 2025 09:09:29 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   for Manuel
>
> src/hotspot/share/opto/vectorization.cpp line 212:
> 
>> 210: // - No memory phi: only loads. All have the same input memory state from before the loop.
>> 211: // - With memory phi. Chain of memory operations inside the loop.
>> 212: void VLoopMemorySlices::find_memory_slices() {
> 
> See `VLoopMemorySlices` for more documentation on the cases.

Note: we used to only track slices with phis (i.e. with stores on the slice), and not those that have only loads (and hence no phi).

> src/hotspot/share/opto/vectorization.hpp line 382:
> 
>> 380: };
>> 381: 
>> 382: // Submodule of VLoopAnalyzer.
> 
> Refactored and moved down.

Needs to be after `VLoopBody`, because `VLoopMemorySlices` now needs `VLoopBody` as input.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343565551
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2343551782

From chagedorn at openjdk.org  Wed Sep 17 12:00:38 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 17 Sep 2025 12:00:38 GMT
Subject: RFR: 8367721: Test compiler/arguments/TestCompileTaskTimeout.java
 crashed: SIGSEGV
In-Reply-To: <TOj6QWViAuXCet_-njgSCsuF03jfb4lDHHs9AgUAqcc=.7d00f824-8632-412e-a213-33206968b2cc@github.com>
References: <TOj6QWViAuXCet_-njgSCsuF03jfb4lDHHs9AgUAqcc=.7d00f824-8632-412e-a213-33206968b2cc@github.com>
Message-ID: <pLXUEKQrdaFVPHDxKNz09fxcxT4_YzwKQiHRbqqAc84=.0bfeb69f-3425-43c0-be52-222edb0bde65@github.com>

On Wed, 17 Sep 2025 06:57:29 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

> The test `TestCompileTaskTimeout.java` runs `java -Xcomp -XX:CompileTaskTimeout=1 --version` to demonstrate that the timeout works. Part of the timeout working involves it printing the method of the compile task. Inspecting the core file of the execution that failed with a `SIGSEGV` in the compile task timeout signal handler, the backtrace looks as follows:
> 
> #n   <called signal handler>
> #n+1 CompilerThreadTimeoutLinux::signal_handler()
> #n+2 <called signal handler>
> #n+3 timer_settime()
> #n+4 CompilerThreadTimeoutLinux::disarm()
> #n+5 CompileTaskWrapper::~CompileTaskWrapper()
> 
> So, the compile task hit the timeout during destruction of the underlying `CompileTaskWrapper`. Since the timeout was disarmed only after setting the task to null in the destructor, the signal handler segfaulted when trying to access the method of the compile task to print it out. This PR addresses this issue by moving up the disarmament of the timeout to the top of the destructor.
> 
> Because this issue can only be triggered with bad --- or good, depending on your view --- luck on timing, I could not devise a regression test. But this is not too big of an issue, since the CI already caught this issue.
> 
> Testing:
>  - [ ] Github Actions
>  - [ ] tier1,tier2,tier3 plus stress testing on Oracle supported platforms
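A tiny model of why the order of the two steps in the destructor matters, assuming the timeout handler can only run while the timeout is armed and that it dereferences the task. `Step` and `destructor_order_is_safe` are hypothetical; this is not the real `CompileTaskWrapper` code. The state is checked after each step, i.e. at every point where the signal could be delivered.

```cpp
#include <cassert>
#include <vector>

enum class Step { DisarmTimeout, ClearTask };

// Simulate the destructor steps in the given order. Returns false if there
// is a point where the timeout is still armed but the task is already null,
// which is when the handler would crash trying to print the method.
bool destructor_order_is_safe(const std::vector<Step>& steps) {
    bool armed = true;
    bool task_present = true;
    for (Step s : steps) {
        if (s == Step::DisarmTimeout) { armed = false; }
        if (s == Step::ClearTask) { task_present = false; }
        if (armed && !task_present) { return false; }  // signal delivered here
    }
    return true;
}
```

With disarming first, a signal delivered while `disarm()` itself is running (as in the backtrace) still finds the task set.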

Looks good to me, too!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27331#pullrequestreview-3234082978

From syan at openjdk.org  Wed Sep 17 12:34:06 2025
From: syan at openjdk.org (SendaoYan)
Date: Wed, 17 Sep 2025 12:34:06 GMT
Subject: RFR: 8367278: Test compiler/startup/StartupOutput.java timed out
 after completion on Windows
In-Reply-To: <AJYaHORLYxUCNMALa2LNo-21UBiP07SW97l4gwFZ9N0=.ab1bb72d-4db8-4472-921d-4d24356fc58c@github.com>
References: <AJYaHORLYxUCNMALa2LNo-21UBiP07SW97l4gwFZ9N0=.ab1bb72d-4db8-4472-921d-4d24356fc58c@github.com>
Message-ID: <e1efsdO8KaPO6d0URflGVOP0BfiVyuxhJ_gBUflYIVU=.f3885612-2de1-4b13-9135-e7cde6ac8440@github.com>

On Fri, 12 Sep 2025 09:56:24 GMT, Damon Fenacci <dfenacci at openjdk.org> wrote:

> ## Problem
> After [JDK-8260555](https://bugs.openjdk.org/browse/JDK-8260555) changed the default TIMEOUT_FACTOR from 4 to 1, the test compiler/startup/StartupOutput.java can occasionally slightly exceed the 2-minute timeout on Windows.
> 
> ## Change
> Rather than increasing the timeout, this change reduces the number of VM runs with randomly generated near-minimum code cache sizes from 200 to 50. This should still provide sufficient coverage while keeping execution well within the timeout.
> 
> ## Testing:
> Tiers 1-3+

LGTM

-------------

Marked as reviewed by syan (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27254#pullrequestreview-3234207633

From rcastanedalo at openjdk.org  Wed Sep 17 12:51:19 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 17 Sep 2025 12:51:19 GMT
Subject: RFR: 8367728: IGV: dump node address type
In-Reply-To: <M9c90F3oUYUfELtyNkyZ4TtBXaYA9U1OW-DaQGCq1hs=.61ac61b7-4004-4fc2-8658-7a3f7daa3d7b@github.com>
References: <M9c90F3oUYUfELtyNkyZ4TtBXaYA9U1OW-DaQGCq1hs=.61ac61b7-4004-4fc2-8658-7a3f7daa3d7b@github.com>
Message-ID: <E-lkJornlB0lWH2reGCp7IAOac7IGLmgChTR_Gvqb_4=.c76c51ca-0340-420b-8444-6a63404a15e6@github.com>

On Tue, 16 Sep 2025 10:11:50 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> This changeset dumps the address type of each node (`Node::adr_type()`), when not null, into the IGV graphs. This should improve the visibility and diagnosability of C2 type inconsistencies, see e.g. [JDK-8367667](https://bugs.openjdk.org/browse/JDK-8367667).
> 
> #### Testing
> - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode).
> - Tested IGV manually on a few selected graphs. Tested automatically that dumping thousands of graphs does not trigger any assertion failure (by running `java -Xcomp -XX:PrintIdealGraphLevel=1`).

Thanks for reviewing Marc, Damon, and Christian!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27310#issuecomment-3302823069

From rcastanedalo at openjdk.org  Wed Sep 17 12:51:20 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 17 Sep 2025 12:51:20 GMT
Subject: Integrated: 8367728: IGV: dump node address type
In-Reply-To: <M9c90F3oUYUfELtyNkyZ4TtBXaYA9U1OW-DaQGCq1hs=.61ac61b7-4004-4fc2-8658-7a3f7daa3d7b@github.com>
References: <M9c90F3oUYUfELtyNkyZ4TtBXaYA9U1OW-DaQGCq1hs=.61ac61b7-4004-4fc2-8658-7a3f7daa3d7b@github.com>
Message-ID: <_Dj-64Fqwn5MpGBIUoIikNcv1gldIO93cgJr15py53U=.274b31d7-f5a8-47ba-9999-8847bb5d95d4@github.com>

On Tue, 16 Sep 2025 10:11:50 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> This changeset dumps the address type of each node (`Node::adr_type()`), when not null, into the IGV graphs. This should improve the visibility and diagnosability of C2 type inconsistencies, see e.g. [JDK-8367667](https://bugs.openjdk.org/browse/JDK-8367667).
> 
> #### Testing
> - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode).
> - Tested IGV manually on a few selected graphs. Tested automatically that dumping thousands of graphs does not trigger any assertion failure (by running `java -Xcomp -XX:PrintIdealGraphLevel=1`).

This pull request has now been integrated.

Changeset: b00e0dae
Author:    Roberto Castañeda Lozano <rcastanedalo at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/b00e0dae9bbd4bd88f8e7307b7c96688fa3194fe
Stats:     5 lines in 1 file changed: 5 ins; 0 del; 0 mod

8367728: IGV: dump node address type

Reviewed-by: mchevalier, dfenacci, chagedorn

-------------

PR: https://git.openjdk.org/jdk/pull/27310

From syan at openjdk.org  Wed Sep 17 13:13:14 2025
From: syan at openjdk.org (SendaoYan)
Date: Wed, 17 Sep 2025 13:13:14 GMT
Subject: RFR: 8367278: Test compiler/startup/StartupOutput.java timed out
 after completion on Windows
In-Reply-To: <AJYaHORLYxUCNMALa2LNo-21UBiP07SW97l4gwFZ9N0=.ab1bb72d-4db8-4472-921d-4d24356fc58c@github.com>
References: <AJYaHORLYxUCNMALa2LNo-21UBiP07SW97l4gwFZ9N0=.ab1bb72d-4db8-4472-921d-4d24356fc58c@github.com>
Message-ID: <a-3PpOX7mJnmJPkcYL2iiyGlbodmYhPOGgxrkT3Fo90=.2794618f-aded-4966-9afc-b924451c08dd@github.com>

On Fri, 12 Sep 2025 09:56:24 GMT, Damon Fenacci <dfenacci at openjdk.org> wrote:

> ## Problem
> After [JDK-8260555](https://bugs.openjdk.org/browse/JDK-8260555) changed the default TIMEOUT_FACTOR from 4 to 1, the test compiler/startup/StartupOutput.java can occasionally slightly exceed the 2-minute timeout on Windows.
> 
> ## Change
> Rather than increasing the timeout, this change reduces the number of VM runs with randomly generated near-minimum code cache sizes from 200 to 50. This should still provide sufficient coverage while keeping execution well within the timeout.
> 
> ## Testing:
> Tiers 1-3+

GHA shows that GetStackTraceALotWhenPinned.java timed out on macOS. That failure has been fixed by [JDK-8366893](https://bugs.openjdk.org/browse/JDK-8366893). I think you can merge master first.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27254#issuecomment-3302952057

From aturbanov at openjdk.org  Wed Sep 17 14:06:49 2025
From: aturbanov at openjdk.org (Andrey Turbanov)
Date: Wed, 17 Sep 2025 14:06:49 GMT
Subject: RFR: 8367657: C2 SuperWord: NormalMapping demo from JVMLS 2025
 [v5]
In-Reply-To: <2gGUfvVlIaLGOd5iJUN3-oi9jlytrkULE3WZRUX1x78=.c0da1562-8cac-4215-9ae4-5cb248c89c0b@github.com>
References: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
 <2gGUfvVlIaLGOd5iJUN3-oi9jlytrkULE3WZRUX1x78=.c0da1562-8cac-4215-9ae4-5cb248c89c0b@github.com>
Message-ID: <nC1nx2doovrcPikWnyA2rZHwr-nz27wJCvJmsuM8zzs=.b38d26ee-ceed-4725-838b-f7773f1e6f5a@github.com>

On Tue, 16 Sep 2025 07:57:38 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Demo from here:
>> https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/
>> 
>> Cleaned up and enhanced with a JTREG and IR test.
>> I also added some additional "generated" normal maps from height functions.
>> And I display the resulting image side-by-side with the normal map.
>> 
>> I decided to put it in a new directory `compiler.gallery`, anticipating other compiler tests that are both visually appealing (i.e. can be used for a "gallery") and that we may want to back up with other tests like IR testing.
>> 
>> There is a **stand-alone** way to run the demo:
>> `java test/hotspot/jtreg/compiler/gallery/NormalMapping.java`
>> (though it may only run with JDK22+, probably due to some Amber features)
>> 
>> **Quick Performance Numbers**, running on my avx512 laptop.
>> default / AVX3: 105 FPS
>> AVX2: 82 FPS
>> AVX1: 50 FPS
>> No vectorization: 19 FPS
>> GraalJIT: 13 FPS (`jdk-26-ea+5`, probably an issue with vectorization / inlining?)
>> 
>> Here are some snapshots, but **I really recommend pulling the diff and playing with it; it looks much better in motion**:
>> <img width="2000" height="991" alt="image" src="https://github.com/user-attachments/assets/a693fac8-ecf0-43f2-914b-25f76c2f425d" />
>> <img width="2000" height="997" alt="image" src="https://github.com/user-attachments/assets/c2202e6b-6a90-4f90-a3ca-b73304e25905" />
>> <img width="1997" height="992" alt="image" src="https://github.com/user-attachments/assets/0d6da304-6bb9-4b25-9a7b-72019b02d95e" />
>> <img width="1992" height="994" alt="image" src="https://github.com/user-attachments/assets/9f5f7426-0678-45af-a3eb-ac092c262d4c" />
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   more for Christian
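For readers curious about how a normal map can be generated from a height function: a common approach (not necessarily exactly what `NormalMapping.java` does) takes central differences of the height `h(x, y)` and normalizes the vector `(-dh/dx, -dh/dy, 1)`. A minimal C++ sketch, with hypothetical `Vec3` and `normal_from_height`:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    double x, y, z;
};

// Unit surface normal of z = h(x, y) at (x, y), using central differences
// with step eps to approximate the partial derivatives.
template <typename H>
Vec3 normal_from_height(H h, double x, double y, double eps) {
    double dhdx = (h(x + eps, y) - h(x - eps, y)) / (2 * eps);
    double dhdy = (h(x, y + eps) - h(x, y - eps)) / (2 * eps);
    double len = std::sqrt(dhdx * dhdx + dhdy * dhdy + 1.0);
    return {-dhdx / len, -dhdy / len, 1.0 / len};
}
```

Evaluating this per pixel is an element-wise loop over an array, which is the kind of code the demo uses to showcase auto vectorization.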

test/hotspot/jtreg/compiler/gallery/TestNormalMapping.java line 58:

> 56:         System.out.println("Running JTREG test in mode: " + mode);
> 57: 
> 58:         switch(mode) {

nit
Suggestion:

        switch (mode) {

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27282#discussion_r2355646822

From roland at openjdk.org  Wed Sep 17 14:14:19 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Wed, 17 Sep 2025 14:14:19 GMT
Subject: RFR: 8327963: C2: fix construction of memory graph around
 Initialize node to prevent incorrect execution if allocation is removed [v12]
In-Reply-To: <_YXE9yfxaouyeyMsdurEy_uEx0FJDbGcX8M8L7aDqm0=.770ff0aa-8ae3-46ac-8cc1-7d38710e859e@github.com>
References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
 <S83l4ZBZILBlhRpSoVORBxzNNB9bF5xN-QhRWc16_T4=.6d5c1624-5d77-4b3e-9db0-0e01e6b1b36b@github.com>
 <BuKfkAAcusJ6TNHSHtVaYYcmjnAVTIInXbhd4Z5Fg5w=.067f6b09-67e0-4b97-9753-c727c67343ca@github.com>
 <PsAetiA4N_lr7Mz7DJKMP7v-pVoRV9LZTvDC0tuNvWw=.4da18f35-c089-4926-a5d4-bcafcb3ab0e3@github.com>
 <_YXE9yfxaouyeyMsdurEy_uEx0FJDbGcX8M8L7aDqm0=.770ff0aa-8ae3-46ac-8cc1-7d38710e859e@github.com>
Message-ID: <_vAArE_XdQUT4nJdyLfvzOzkK87h4e3BtV_KhET-Uuk=.36074582-9265-41f6-a686-d607facb915c@github.com>

On Fri, 12 Sep 2025 22:42:08 GMT, Dean Long <dlong at openjdk.org> wrote:

>> It's one of the things mentioned in that comment:
>> https://github.com/openjdk/jdk/pull/24570#issuecomment-2883651987
>> 
>> "I added asserts to catch cases where proj_out is called but the node has more than one matching projection. With those asserts, I caught some false positive/cases where we got lucky and worked around them by reworking the code so it doesn't use proj_out. That's the case in PhaseIdealLoop::intrinsify_fill(): we can end up there with more than one FramePtr projection because the code pattern used elsewhere is to add one more projection and let identical projections common during igvn. "
>
> Are we just lucky that we don't have the same problem with ReturnAdr here?

Yes, most likely.
This is also a pretty harmless corner case: if there is more than one `Parm` projection, the assert in `proj_out` catches it even though it does no harm to have more than one projection in this particular case. So this change is here, not to fix some broken code, but to make it possible to have a strict assert in `proj_out`.
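For illustration, a toy sketch of the distinction, with hypothetical `Proj`, `proj_out_strict` and `for_each_proj` (not C2's `Node::proj_out`). A strict lookup asserts that at most one projection matches, so code that can legitimately see duplicate, not-yet-commoned projections should visit all matches instead of asking for a unique one.

```cpp
#include <cassert>
#include <vector>

// Toy projection: which output ("con") of a multi-output node it selects.
struct Proj {
    int con;
};

// Strict lookup: at most one projection may match; returns nullptr if none does.
const Proj* proj_out_strict(const std::vector<Proj>& outs, int con) {
    const Proj* found = nullptr;
    for (const Proj& p : outs) {
        if (p.con == con) {
            assert(found == nullptr && "more than one matching projection");
            found = &p;
        }
    }
    return found;
}

// Tolerant alternative: apply f to every matching projection, duplicates included.
template <typename F>
void for_each_proj(const std::vector<Proj>& outs, int con, F f) {
    for (const Proj& p : outs) {
        if (p.con == con) { f(p); }
    }
}
```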

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2355669590

From dfenacci at openjdk.org  Wed Sep 17 14:17:13 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Wed, 17 Sep 2025 14:17:13 GMT
Subject: RFR: 8367278: Test compiler/startup/StartupOutput.java timed out
 after completion on Windows [v2]
In-Reply-To: <AJYaHORLYxUCNMALa2LNo-21UBiP07SW97l4gwFZ9N0=.ab1bb72d-4db8-4472-921d-4d24356fc58c@github.com>
References: <AJYaHORLYxUCNMALa2LNo-21UBiP07SW97l4gwFZ9N0=.ab1bb72d-4db8-4472-921d-4d24356fc58c@github.com>
Message-ID: <TlhYtJCJ7iPOGbDE4FDhtPf3L3nfj95bOf6zUY5eH9I=.855e1d29-cfab-4c78-a0ca-b8acc6698f5e@github.com>

> ## Problem
> After [JDK-8260555](https://bugs.openjdk.org/browse/JDK-8260555) changed the default TIMEOUT_FACTOR from 4 to 1, the test compiler/startup/StartupOutput.java can occasionally slightly exceed the 2-minute timeout on Windows.
> 
> ## Change
> Rather than increasing the timeout, this change reduces the number of VM runs with randomly generated near-minimum code cache sizes from 200 to 50. This should still provide sufficient coverage while keeping execution well within the timeout.
> 
> ## Testing:
> Tiers 1-3+

Damon Fenacci has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - Merge branch 'master' into JDK-8367278
 - JDK-8367278: reduce loop to 50 cycles
 - JDK-8367278: Test compiler/startup/StartupOutput.java timed out after completion on Windows

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27254/files
  - new: https://git.openjdk.org/jdk/pull/27254/files/1b4149c8..8643e5fd

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27254&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27254&range=00-01

  Stats: 38573 lines in 1133 files changed: 19462 ins; 10155 del; 8956 mod
  Patch: https://git.openjdk.org/jdk/pull/27254.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27254/head:pull/27254

PR: https://git.openjdk.org/jdk/pull/27254

From mhaessig at openjdk.org  Wed Sep 17 14:32:37 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Wed, 17 Sep 2025 14:32:37 GMT
Subject: RFR: 8359412: Template-Framework Library: Operations and
 Expressions
In-Reply-To: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com>
References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com>
Message-ID: <4YVAopGtxnlkh39pp0TaW4kNpBuSIXfbz40UDW_We1w=.308dd279-514f-4fa4-b361-ab36f165caf6@github.com>

On Thu, 21 Aug 2025 15:03:57 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> Implementing ideas from the original draft PR: https://github.com/openjdk/jdk/pull/23418 ([Exceptions](https://github.com/openjdk/jdk/pull/23418/files#diff-77e7db8cc0c5e02786e1c993362f98fabe219042eb342fdaffc09fd11380259dR41), [ExpressionFuzzer](https://github.com/openjdk/jdk/pull/23418/files#diff-01844ca5cb007f5eab5fa4195f2f1378d4e7c64ba477fba64626c98ff4054038R66)).
> 
> Specifically, I'm extending the Template Library with `Expression`s, and lists of `Operations` (some basic Expressions). These Expressions can easily be nested and then filled with arguments, and applied in a `Template`.
> 
> Details, in **order you should review**:
> - `Operations.java`: maps lots of primitive operators as Expressions.
> - `Expression.java`: the fundamental engine behind Expressions.
> - `examples/TestExpressions.java`: basic example using Expressions, filling them with random constants.
> - `tests/TestExpression.java`: correctness test of Expression machinery.
> - `compiler/igvn/ExpressionFuzzer.java`: expression fuzzer for primitive type expressions, including input range/bits constraints and output range/bits verification.
> - `PrimitiveType.java`: added `LibraryRNG` facility. We already had `type.con()` which gave us random constants. But we also want to have `type.callLibraryRNG()` so that we can insert a call to a random number generator of the corresponding primitive type. I use this facility in the `ExpressionFuzzer.java` to generate random arguments for the expressions.
> - `examples/TestPrimitiveTypes.java`: added a `LibraryRNG` example with a weak test for randomness: we should see at least 2 different values in 1000 calls.
> 
> If the reviewers absolutely insist, I could split out `LibraryRNG` into a separate RFE. But it's really not that much code, and has direct use in the `Expression` examples.
> 
> **Future Work**:
> - Use `Expression`s in a loop over arrays / MemorySegment: fuzz auto-vectorization.
> - Use `Expression`s to model more operations:
>   - `Vector API`, more arithmetic operations like from `Math` classes etc.
> - Ensure that the constraint / checksum mechanisms in `compiler/igvn/ExpressionFuzzer.java` work, using IR rules. We may even need to add new IGVN optimizations. Add unsigned constraints.
> - Find a way to delay IGVN optimizations to test worklist notification: For example, we could add a new testing operator call `TestUtils.delay(x) -> x`, which is intrinsified as some new `DelayNode` that in normal circumstances just folds away, but under `StressIGVN` and `Stres...
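The "weak test for randomness" described for `LibraryRNG` can be sketched in plain Java. This is a hypothetical illustration: `WeakRandomnessCheck`, `distinctValues`, and `looksRandom` are invented names, not the test's actual code, and `java.util.Random` stands in for the library's generator. The check only rules out a generator that keeps returning the same value; it says nothing about distribution quality.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
import java.util.function.IntSupplier;

// Hypothetical sketch of a weak randomness check: count distinct values.
final class WeakRandomnessCheck {
    // Number of distinct values observed in 'calls' invocations of rng.
    static int distinctValues(IntSupplier rng, int calls) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < calls; i++) {
            seen.add(rng.getAsInt());
        }
        return seen.size();
    }

    // Require at least 2 different values in 1000 calls.
    static boolean looksRandom(IntSupplier rng) {
        return distinctValues(rng, 1000) >= 2;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        System.out.println(looksRandom(random::nextInt)); // true
        System.out.println(looksRandom(() -> 7));         // false
    }
}
```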

Thank you for this enhancement, @eme64! It is nice to see the template framework library evolving.

The changes look good. I mostly have nits.

test/hotspot/jtreg/compiler/igvn/ExpressionFuzzer.java line 31:

> 29:  * @library /test/lib /
> 30:  * @compile ../lib/verify/Verify.java
> 31:  * @run main compiler.igvn.ExpressionFuzzer

Since you are fuzzing, you might want to consider adding a compile task timeout in case the random methods cause degenerate compilations. Below is a suggestion for a timeout of 10 seconds, which should be plenty.
Suggestion:

 * @run main -XX:+IgnoreUnrecognizedVMOptions -XX:CompileTaskTimeout=10000 compiler.igvn.ExpressionFuzzer

test/hotspot/jtreg/compiler/igvn/ExpressionFuzzer.java line 204:

> 202:         // once, and pass the same values into both the compiled and reference method.
> 203:         var valueTemplate = Template.make("name", "type", (String name, CodeGenerationDataNameType type) -> body(
> 204:             //"#type #name = ", type.con(), ";\n"

Suggestion:

test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java line 40:

> 38: /**
> 39:  * {@link Expression}s model Java expressions, that have a list of arguments with specified
> 40:  * argument types, and an result with a specified result type. Once can {@link #make} a new

Suggestion:

 * argument types, and a result with a specified result type. One can {@link #make} a new

Nit: typo
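The Javadoc above describes an Expression as typed arguments plus a result type. One plausible mental model, which also matches the `strings.getLast()` and nesting code touched later in this review, is a list of code fragments interleaved with argument slots, with one more fragment than slots. The sketch below is hypothetical: `MiniExpression`, `asCode`, `nest`, and the use of `String` for types are assumptions, not the library's API. It needs JDK 21+ for `List.getLast()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: fragments interleaved with typed argument slots,
// where strings.size() == argumentTypes.size() + 1.
record MiniExpression(String returnType, List<String> argumentTypes, List<String> strings) {
    MiniExpression {
        if (strings.size() != argumentTypes.size() + 1) {
            throw new IllegalArgumentException("need one more string than arguments");
        }
        argumentTypes = List.copyOf(argumentTypes);
        strings = List.copyOf(strings);
    }

    // Fill the argument slots with concrete code (constants, variable names, ...).
    String asCode(List<String> arguments) {
        if (arguments.size() != argumentTypes.size()) {
            throw new IllegalArgumentException("wrong number of arguments");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < arguments.size(); i++) {
            sb.append(strings.get(i)).append(arguments.get(i));
        }
        sb.append(strings.getLast());
        return sb.toString();
    }

    // Substitute 'inner' into argument slot 'slot'. The result takes this
    // expression's remaining arguments plus all of inner's arguments.
    MiniExpression nest(int slot, MiniExpression inner) {
        if (!argumentTypes.get(slot).equals(inner.returnType())) {
            throw new IllegalArgumentException("type mismatch");
        }
        List<String> newStrings = new ArrayList<>(strings.subList(0, slot + 1));
        List<String> newTypes = new ArrayList<>(argumentTypes.subList(0, slot));
        // Glue inner's first fragment onto the fragment before the slot.
        int last = newStrings.size() - 1;
        newStrings.set(last, newStrings.get(last) + inner.strings().get(0));
        for (int j = 0; j < inner.argumentTypes().size(); j++) {
            newTypes.add(inner.argumentTypes().get(j));
            newStrings.add(inner.strings().get(j + 1));
        }
        // Glue the fragment after the slot onto inner's last fragment.
        last = newStrings.size() - 1;
        newStrings.set(last, newStrings.get(last) + strings.get(slot + 1));
        for (int i = slot + 1; i < argumentTypes.size(); i++) {
            newTypes.add(argumentTypes.get(i));
            newStrings.add(strings.get(i + 1));
        }
        return new MiniExpression(returnType, newTypes, newStrings);
    }
}
```

For example, nesting a negation `"(-(", INT, "))"` into the second slot of `"(", INT, " + ", INT, ")"` and filling it with `a` and `b` yields `(a + (-(b)))`.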

test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java line 152:

> 150: 
> 151:     /**
> 152:      * Creates a new Espression with 1 arguments.

For every make(): s/Espression/Expression/

test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java line 164:

> 162:                                   CodeGenerationDataNameType t0,
> 163:                                   String s1) {
> 164:         return new Expression(returnType, List.of(t0), List.of(s0, s1), new Info());

To reduce code duplication, the overloads without an `Info` argument should probably delegate to the ones that take one.
Suggestion:

        return make(returnType, s0, t0, s1, new Info());
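The suggested delegation can be illustrated with a stripped-down, hypothetical class. `Expr` and its `Info` record are placeholder names, not the library's types. The overload without `Info` forwards a default `Info` to the overload that takes one, so only one method actually constructs the object.

```java
import java.util.List;

final class Expr {
    // Hypothetical stand-in for the library's Info: extra properties of an expression.
    record Info(boolean mayThrowArithmeticException) {
        Info() {
            this(false);
        }
    }

    final String returnType;
    final List<String> argumentTypes;
    final List<String> strings;
    final Info info;

    private Expr(String returnType, List<String> argumentTypes, List<String> strings, Info info) {
        this.returnType = returnType;
        this.argumentTypes = argumentTypes;
        this.strings = strings;
        this.info = info;
    }

    // Convenience overload: no Info given, so delegate with a default Info.
    static Expr make(String returnType, String s0, String t0, String s1) {
        return make(returnType, s0, t0, s1, new Info());
    }

    // The single place that constructs a one-argument Expr.
    static Expr make(String returnType, String s0, String t0, String s1, Info info) {
        return new Expr(returnType, List.of(t0), List.of(s0, s1), info);
    }
}
```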

test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java line 358:

> 356:             tokens.add(arguments.get(i));
> 357:         }
> 358:         tokens.add(strings.get(strings.size()-1));

Suggestion:

        tokens.add(strings.getLast());

A wee bit easier to read.
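For context, `getLast()` is a default method of `SequencedCollection`, which `List` extends since JDK 21. For a non-empty list it returns the same element as `get(size() - 1)`. The only behavioral difference is on an empty list, where it throws `NoSuchElementException` instead of an `IndexOutOfBoundsException`. A small check (the class name is arbitrary):

```java
import java.util.List;
import java.util.NoSuchElementException;

final class GetLastDemo {
    // Reports which exception each access style throws on an empty list.
    static String emptyListBehavior() {
        List<String> empty = List.of();
        String viaIndex;
        try {
            empty.get(empty.size() - 1);
            viaIndex = "no exception";
        } catch (IndexOutOfBoundsException e) {
            viaIndex = "IndexOutOfBoundsException";
        }
        String viaGetLast;
        try {
            empty.getLast();
            viaGetLast = "no exception";
        } catch (NoSuchElementException e) {
            viaGetLast = "NoSuchElementException";
        }
        return viaIndex + " / " + viaGetLast;
    }

    public static void main(String[] args) {
        List<String> strings = List.of("(", " + ", ")");
        System.out.println(strings.get(strings.size() - 1).equals(strings.getLast())); // true
        System.out.println(emptyListBehavior()); // IndexOutOfBoundsException / NoSuchElementException
    }
}
```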

test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java line 380:

> 378:         }
> 379:         sb.append("\"");
> 380:         sb.append(this.strings.get(this.strings.size()-1));

Suggestion:

        sb.append(this.strings.getLast());

test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java line 465:

> 463:             newArgumentTypes.add(nestingExpression.argumentTypes.get(i));
> 464:         }
> 465:         newStrings.add(nestingExpression.strings.get(nestingExpression.strings.size() - 1) +

Suggestion:

        newStrings.add(nestingExpression.strings.getLast() +

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 1:

> 1: /*

I gave it my best shot to suggest a reasonable and reasonably consistent alignment.

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 67:

> 65:         ops.add(Expression.make(BYTES, "(byte)(", LONGS,    ")"));
> 66:         ops.add(Expression.make(BYTES, "(byte)(", FLOATS,   ")"));
> 67:         ops.add(Expression.make(BYTES, "(byte)(", DOUBLES,  ")"));

Suggestion:

        ops.add(Expression.make(BYTES, "(byte)(", BYTES,   ")"));
        ops.add(Expression.make(BYTES, "(byte)(", SHORTS,  ")"));
        ops.add(Expression.make(BYTES, "(byte)(", CHARS,   ")"));
        ops.add(Expression.make(BYTES, "(byte)(", INTS,    ")"));
        ops.add(Expression.make(BYTES, "(byte)(", LONGS,   ")"));
        ops.add(Expression.make(BYTES, "(byte)(", FLOATS,  ")"));
        ops.add(Expression.make(BYTES, "(byte)(", DOUBLES, ")"));

Whitespace example

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 78:

> 76:         ops.add(Expression.make(INTS, "Byte.compareUnsigned(", BYTES, ", ", BYTES, ")"));
> 77:         ops.add(Expression.make(INTS, "Byte.toUnsignedInt(", BYTES, ")"));
> 78:         ops.add(Expression.make(LONGS, "Byte.toUnsignedLong(", BYTES, ")"));

Suggestion:

        ops.add(Expression.make(INTS,  "Byte.compare(",         BYTES, ", ", BYTES, ")"));
        ops.add(Expression.make(INTS,  "Byte.compareUnsigned(", BYTES, ", ", BYTES, ")"));
        ops.add(Expression.make(INTS,  "Byte.toUnsignedInt(",   BYTES, ")"));
        ops.add(Expression.make(LONGS, "Byte.toUnsignedLong(",  BYTES, ")"));

Alignment
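The `Byte` operations above read the same 8 bits as either signed or unsigned, and the narrowing casts keep only the low-order bits. That makes them natural fuzzing targets. A quick demonstration of this standard Java behavior (the class name is arbitrary):

```java
final class ByteOpsDemo {
    public static void main(String[] args) {
        byte minusOne = (byte) -1; // bit pattern 0xFF
        byte one = (byte) 1;
        System.out.println(Byte.toUnsignedInt(minusOne));            // 255
        System.out.println(Byte.toUnsignedLong((byte) -128));        // 128
        System.out.println(Byte.compare(minusOne, one) < 0);         // true: signed, -1 < 1
        System.out.println(Byte.compareUnsigned(minusOne, one) > 0); // true: unsigned, 255 > 1
        System.out.println((byte) 300);                              // 44: only the low 8 bits survive
    }
}
```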

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 87:

> 85:         ops.add(Expression.make(CHARS, "(char)(", LONGS,    ")"));
> 86:         ops.add(Expression.make(CHARS, "(char)(", FLOATS,   ")"));
> 87:         ops.add(Expression.make(CHARS, "(char)(", DOUBLES,  ")"));

Suggestion:

        ops.add(Expression.make(CHARS, "(char)(", BYTES,   ")"));
        ops.add(Expression.make(CHARS, "(char)(", SHORTS,  ")"));
        ops.add(Expression.make(CHARS, "(char)(", CHARS,   ")"));
        ops.add(Expression.make(CHARS, "(char)(", INTS,    ")"));
        ops.add(Expression.make(CHARS, "(char)(", LONGS,   ")"));
        ops.add(Expression.make(CHARS, "(char)(", FLOATS,  ")"));
        ops.add(Expression.make(CHARS, "(char)(", DOUBLES, ")"));

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 96:

> 94:         // ------------ Character -------------
> 95:         ops.add(Expression.make(INTS, "Character.compare(", CHARS, ", ", CHARS, ")"));
> 96:         ops.add(Expression.make(CHARS, "Character.reverseBytes(", CHARS, ")"));

Suggestion:

        ops.add(Expression.make(INTS,  "Character.compare(",      CHARS, ", ", CHARS, ")"));
        ops.add(Expression.make(CHARS, "Character.reverseBytes(", CHARS, ")"));

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 105:

> 103:         ops.add(Expression.make(SHORTS, "(short)(", LONGS,    ")"));
> 104:         ops.add(Expression.make(SHORTS, "(short)(", FLOATS,   ")"));
> 105:         ops.add(Expression.make(SHORTS, "(short)(", DOUBLES,  ")"));

Suggestion:

        ops.add(Expression.make(SHORTS, "(short)(", BYTES,   ")"));
        ops.add(Expression.make(SHORTS, "(short)(", SHORTS,  ")"));
        ops.add(Expression.make(SHORTS, "(short)(", CHARS,   ")"));
        ops.add(Expression.make(SHORTS, "(short)(", INTS,    ")"));
        ops.add(Expression.make(SHORTS, "(short)(", LONGS,   ")"));
        ops.add(Expression.make(SHORTS, "(short)(", FLOATS,  ")"));
        ops.add(Expression.make(SHORTS, "(short)(", DOUBLES, ")"));

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 117:

> 115:         ops.add(Expression.make(SHORTS, "Short.reverseBytes(", SHORTS, ")"));
> 116:         ops.add(Expression.make(INTS, "Short.toUnsignedInt(", SHORTS, ")"));
> 117:         ops.add(Expression.make(LONGS, "Short.toUnsignedLong(", SHORTS, ")"));

Suggestion:

        ops.add(Expression.make(INTS,   "Short.compare(",         SHORTS, ", ", SHORTS, ")"));
        ops.add(Expression.make(INTS,   "Short.compareUnsigned(", SHORTS, ", ", SHORTS, ")"));
        ops.add(Expression.make(SHORTS, "Short.reverseBytes(",    SHORTS, ")"));
        ops.add(Expression.make(INTS,   "Short.toUnsignedInt(",   SHORTS, ")"));
        ops.add(Expression.make(LONGS,  "Short.toUnsignedLong(",  SHORTS, ")"));

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 126:

> 124:         ops.add(Expression.make(INTS, "(int)(", LONGS,    ")"));
> 125:         ops.add(Expression.make(INTS, "(int)(", FLOATS,   ")"));
> 126:         ops.add(Expression.make(INTS, "(int)(", DOUBLES,  ")"));

Suggestion:

        ops.add(Expression.make(INTS, "(int)(", BYTES,   ")"));
        ops.add(Expression.make(INTS, "(int)(", SHORTS,  ")"));
        ops.add(Expression.make(INTS, "(int)(", CHARS,   ")"));
        ops.add(Expression.make(INTS, "(int)(", INTS,    ")"));
        ops.add(Expression.make(INTS, "(int)(", LONGS,   ")"));
        ops.add(Expression.make(INTS, "(int)(", FLOATS,  ")"));
        ops.add(Expression.make(INTS, "(int)(", DOUBLES, ")"));

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 137:

> 135:         ops.add(Expression.make(INTS, "(", INTS, " * ",   INTS, ")"));
> 136:         ops.add(Expression.make(INTS, "(", INTS, " / ",   INTS, ")", withArithmeticException));
> 137:         ops.add(Expression.make(INTS, "(", INTS, " % ",   INTS, ")", withArithmeticException));

Suggestion:

        ops.add(Expression.make(INTS, "(-(", INTS, "))"));
        ops.add(Expression.make(INTS, "(", INTS, " + ", INTS, ")"));
        ops.add(Expression.make(INTS, "(", INTS, " - ", INTS, ")"));
        ops.add(Expression.make(INTS, "(", INTS, " * ", INTS, ")"));
        ops.add(Expression.make(INTS, "(", INTS, " / ", INTS, ")", withArithmeticException));
        ops.add(Expression.make(INTS, "(", INTS, " % ", INTS, ")", withArithmeticException));

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 154:

> 152:         ops.add(Expression.make(BOOLEANS, "(", INTS, " < ",    INTS, ")"));
> 153:         ops.add(Expression.make(BOOLEANS, "(", INTS, " >= ",   INTS, ")"));
> 154:         ops.add(Expression.make(BOOLEANS, "(", INTS, " <= ",   INTS, ")"));

Suggestion:

        ops.add(Expression.make(BOOLEANS, "(", INTS, " == ", INTS, ")"));
        ops.add(Expression.make(BOOLEANS, "(", INTS, " != ", INTS, ")"));
        ops.add(Expression.make(BOOLEANS, "(", INTS, " > ",  INTS, ")"));
        ops.add(Expression.make(BOOLEANS, "(", INTS, " < ",  INTS, ")"));
        ops.add(Expression.make(BOOLEANS, "(", INTS, " >= ", INTS, ")"));
        ops.add(Expression.make(BOOLEANS, "(", INTS, " <= ", INTS, ")"));

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 176:

> 174:         ops.add(Expression.make(INTS, "Integer.signum(", INTS, ")"));
> 175:         ops.add(Expression.make(INTS, "Integer.sum(", INTS, ", ", INTS, ")"));
> 176:         ops.add(Expression.make(LONGS, "Integer.toUnsignedLong(", INTS, ")"));

Suggestion:

        ops.add(Expression.make(INTS,  "Integer.bitCount(", INTS, ")"));
        ops.add(Expression.make(INTS,  "Integer.compare(", INTS, ", ", INTS, ")"));
        ops.add(Expression.make(INTS,  "Integer.compareUnsigned(", INTS, ", ", INTS, ")"));
        ops.add(Expression.make(INTS,  "Integer.compress(", INTS, ", ", INTS, ")"));
        ops.add(Expression.make(INTS,  "Integer.divideUnsigned(", INTS, ", ", INTS, ")", withArithmeticException));
        ops.add(Expression.make(INTS,  "Integer.expand(", INTS, ", ", INTS, ")"));
        ops.add(Expression.make(INTS,  "Integer.highestOneBit(", INTS, ")"));
        ops.add(Expression.make(INTS,  "Integer.lowestOneBit(", INTS, ")"));
        ops.add(Expression.make(INTS,  "Integer.max(", INTS, ", ", INTS, ")"));
        ops.add(Expression.make(INTS,  "Integer.min(", INTS, ", ", INTS, ")"));
        ops.add(Expression.make(INTS,  "Integer.numberOfLeadingZeros(", INTS, ")"));
        ops.add(Expression.make(INTS,  "Integer.numberOfTrailingZeros(", INTS, ")"));
        ops.add(Expression.make(INTS,  "Integer.remainderUnsigned(", INTS, ", ", INTS, ")", withArithmeticException));
        ops.add(Expression.make(INTS,  "Integer.reverse(", INTS, ")"));
        ops.add(Expression.make(INTS,  "Integer.reverseBytes(", INTS, ")"));
        ops.add(Expression.make(INTS,  "Integer.rotateLeft(", INTS, ", ", INTS, ")"));
        ops.add(Expression.make(INTS,  "Integer.rotateRight(", INTS, ", ", INTS, ")"));
        ops.add(Expression.make(INTS,  "Integer.signum(", INTS, ")"));
        ops.add(Expression.make(INTS,  "Integer.sum(", INTS, ", ", INTS, ")"));
        ops.add(Expression.make(LONGS, "Integer.toUnsignedLong(", INTS, ")"));

Also aligning the arguments might be a bit much...
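`Integer.compress` and `Integer.expand` (added in JDK 19) are probably the least familiar entries in this list. `compress` gathers the bits of its first argument selected by the mask into the low-order bits. `expand` scatters low-order bits back to the mask positions, so `expand(compress(i, m), m) == (i & m)`. A small worked example, with values computed by hand (the class name is arbitrary):

```java
final class CompressExpandDemo {
    public static void main(String[] args) {
        int mask = 0xFF00FFF0; // selects nibbles 7, 6, 3, 2, 1
        int compressed = Integer.compress(0xCAFEBABE, mask);
        int expanded = Integer.expand(compressed, mask);
        System.out.printf("%X%n", compressed); // CABAB
        System.out.printf("%X%n", expanded);   // CA00BAB0
        // Round trip recovers exactly the bits selected by the mask.
        System.out.println(expanded == (0xCAFEBABE & mask)); // true
    }
}
```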

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 185:

> 183:         ops.add(Expression.make(LONGS, "(long)(", LONGS,    ")"));
> 184:         ops.add(Expression.make(LONGS, "(long)(", FLOATS,   ")"));
> 185:         ops.add(Expression.make(LONGS, "(long)(", DOUBLES,  ")"));

Suggestion:

        ops.add(Expression.make(LONGS, "(long)(", BYTES,   ")"));
        ops.add(Expression.make(LONGS, "(long)(", SHORTS,  ")"));
        ops.add(Expression.make(LONGS, "(long)(", CHARS,   ")"));
        ops.add(Expression.make(LONGS, "(long)(", INTS,    ")"));
        ops.add(Expression.make(LONGS, "(long)(", LONGS,   ")"));
        ops.add(Expression.make(LONGS, "(long)(", FLOATS,  ")"));
        ops.add(Expression.make(LONGS, "(long)(", DOUBLES, ")"));

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 196:

> 194:         ops.add(Expression.make(LONGS, "(", LONGS, " * ",   LONGS, ")"));
> 195:         ops.add(Expression.make(LONGS, "(", LONGS, " / ",   LONGS, ")", withArithmeticException));
> 196:         ops.add(Expression.make(LONGS, "(", LONGS, " % ",   LONGS, ")", withArithmeticException));

Suggestion:

        ops.add(Expression.make(LONGS, "(-(", LONGS, "))"));
        ops.add(Expression.make(LONGS, "(", LONGS, " + ", LONGS, ")"));
        ops.add(Expression.make(LONGS, "(", LONGS, " - ", LONGS, ")"));
        ops.add(Expression.make(LONGS, "(", LONGS, " * ", LONGS, ")"));
        ops.add(Expression.make(LONGS, "(", LONGS, " / ", LONGS, ")", withArithmeticException));
        ops.add(Expression.make(LONGS, "(", LONGS, " % ", LONGS, ")", withArithmeticException));

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 213:

> 211:         ops.add(Expression.make(BOOLEANS, "(", LONGS, " < ",    LONGS, ")"));
> 212:         ops.add(Expression.make(BOOLEANS, "(", LONGS, " >= ",   LONGS, ")"));
> 213:         ops.add(Expression.make(BOOLEANS, "(", LONGS, " <= ",   LONGS, ")"));

Suggestion:

        ops.add(Expression.make(BOOLEANS, "(", LONGS, " == ", LONGS, ")"));
        ops.add(Expression.make(BOOLEANS, "(", LONGS, " != ", LONGS, ")"));
        ops.add(Expression.make(BOOLEANS, "(", LONGS, " > ",  LONGS, ")"));
        ops.add(Expression.make(BOOLEANS, "(", LONGS, " < ",  LONGS, ")"));
        ops.add(Expression.make(BOOLEANS, "(", LONGS, " >= ", LONGS, ")"));
        ops.add(Expression.make(BOOLEANS, "(", LONGS, " <= ", LONGS, ")"));

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 234:

> 232:         ops.add(Expression.make(LONGS, "Long.rotateRight(", LONGS, ", ", INTS, ")"));
> 233:         ops.add(Expression.make(INTS, "Long.signum(", LONGS, ")"));
> 234:         ops.add(Expression.make(LONGS, "Long.sum(", LONGS, ", ", LONGS, ")"));

Suggestion:

        ops.add(Expression.make(INTS,  "Long.bitCount(", LONGS, ")"));
        ops.add(Expression.make(INTS,  "Long.compare(", LONGS, ", ", LONGS, ")"));
        ops.add(Expression.make(INTS,  "Long.compareUnsigned(", LONGS, ", ", LONGS, ")"));
        ops.add(Expression.make(LONGS, "Long.compress(", LONGS, ", ", LONGS, ")"));
        ops.add(Expression.make(LONGS, "Long.divideUnsigned(", LONGS, ", ", LONGS, ")", withArithmeticException));
        ops.add(Expression.make(LONGS, "Long.expand(", LONGS, ", ", LONGS, ")"));
        ops.add(Expression.make(LONGS, "Long.highestOneBit(", LONGS, ")"));
        ops.add(Expression.make(LONGS, "Long.lowestOneBit(", LONGS, ")"));
        ops.add(Expression.make(LONGS, "Long.max(", LONGS, ", ", LONGS, ")"));
        ops.add(Expression.make(LONGS, "Long.min(", LONGS, ", ", LONGS, ")"));
        ops.add(Expression.make(INTS,  "Long.numberOfLeadingZeros(", LONGS, ")"));
        ops.add(Expression.make(INTS,  "Long.numberOfTrailingZeros(", LONGS, ")"));
        ops.add(Expression.make(LONGS, "Long.remainderUnsigned(", LONGS, ", ", LONGS, ")", withArithmeticException));
        ops.add(Expression.make(LONGS, "Long.reverse(", LONGS, ")"));
        ops.add(Expression.make(LONGS, "Long.reverseBytes(", LONGS, ")"));
        ops.add(Expression.make(LONGS, "Long.rotateLeft(", LONGS, ", ", INTS, ")"));
        ops.add(Expression.make(LONGS, "Long.rotateRight(", LONGS, ", ", INTS, ")"));
        ops.add(Expression.make(INTS,  "Long.signum(", LONGS, ")"));
        ops.add(Expression.make(LONGS, "Long.sum(", LONGS, ", ", LONGS, ")"));

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 243:

> 241:         ops.add(Expression.make(FLOATS, "(float)(", LONGS,    ")"));
> 242:         ops.add(Expression.make(FLOATS, "(float)(", FLOATS,   ")"));
> 243:         ops.add(Expression.make(FLOATS, "(float)(", DOUBLES,  ")"));

Suggestion:

        ops.add(Expression.make(FLOATS, "(float)(", BYTES,   ")"));
        ops.add(Expression.make(FLOATS, "(float)(", SHORTS,  ")"));
        ops.add(Expression.make(FLOATS, "(float)(", CHARS,   ")"));
        ops.add(Expression.make(FLOATS, "(float)(", INTS,    ")"));
        ops.add(Expression.make(FLOATS, "(float)(", LONGS,   ")"));
        ops.add(Expression.make(FLOATS, "(float)(", FLOATS,  ")"));
        ops.add(Expression.make(FLOATS, "(float)(", DOUBLES, ")"));

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 254:

> 252:         ops.add(Expression.make(FLOATS, "(", FLOATS, " * ",   FLOATS, ")"));
> 253:         ops.add(Expression.make(FLOATS, "(", FLOATS, " / ",   FLOATS, ")"));
> 254:         ops.add(Expression.make(FLOATS, "(", FLOATS, " % ",   FLOATS, ")"));

Suggestion:

        ops.add(Expression.make(FLOATS, "(", FLOATS, " + ", FLOATS, ")"));
        ops.add(Expression.make(FLOATS, "(", FLOATS, " - ", FLOATS, ")"));
        ops.add(Expression.make(FLOATS, "(", FLOATS, " * ", FLOATS, ")"));
        ops.add(Expression.make(FLOATS, "(", FLOATS, " / ", FLOATS, ")"));
        ops.add(Expression.make(FLOATS, "(", FLOATS, " % ", FLOATS, ")"));

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 286:

> 284:         ops.add(Expression.make(DOUBLES, "(double)(", LONGS,    ")"));
> 285:         ops.add(Expression.make(DOUBLES, "(double)(", FLOATS,   ")"));
> 286:         ops.add(Expression.make(DOUBLES, "(double)(", DOUBLES,  ")"));

Suggestion:

        ops.add(Expression.make(DOUBLES, "(double)(", BYTES,   ")"));
        ops.add(Expression.make(DOUBLES, "(double)(", SHORTS,  ")"));
        ops.add(Expression.make(DOUBLES, "(double)(", CHARS,   ")"));
        ops.add(Expression.make(DOUBLES, "(double)(", INTS,    ")"));
        ops.add(Expression.make(DOUBLES, "(double)(", LONGS,   ")"));
        ops.add(Expression.make(DOUBLES, "(double)(", FLOATS,  ")"));
        ops.add(Expression.make(DOUBLES, "(double)(", DOUBLES, ")"));

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 297:

> 295:         ops.add(Expression.make(DOUBLES, "(", DOUBLES, " * ",   DOUBLES, ")"));
> 296:         ops.add(Expression.make(DOUBLES, "(", DOUBLES, " / ",   DOUBLES, ")"));
> 297:         ops.add(Expression.make(DOUBLES, "(", DOUBLES, " % ",   DOUBLES, ")"));

Suggestion:

        ops.add(Expression.make(DOUBLES, "(-(", DOUBLES, "))"));
        ops.add(Expression.make(DOUBLES, "(", DOUBLES, " + ", DOUBLES, ")"));
        ops.add(Expression.make(DOUBLES, "(", DOUBLES, " - ", DOUBLES, ")"));
        ops.add(Expression.make(DOUBLES, "(", DOUBLES, " * ", DOUBLES, ")"));
        ops.add(Expression.make(DOUBLES, "(", DOUBLES, " / ", DOUBLES, ")"));
        ops.add(Expression.make(DOUBLES, "(", DOUBLES, " % ", DOUBLES, ")"));
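The `withArithmeticException` flag appears on the `INTS`/`LONGS` division and remainder entries but not on the `FLOATS`/`DOUBLES` ones. That asymmetry follows from the JLS: integer `/` and `%` throw `ArithmeticException` for a zero divisor, while floating-point division and remainder never throw and instead produce `Infinity` or `NaN`. The one integer overflow case, `MIN_VALUE / -1`, wraps silently. A short demonstration (class and helper names are arbitrary):

```java
final class DivisionByZeroDemo {
    // Integer division, reporting an ArithmeticException as a string.
    static String intDivide(int a, int b) {
        try {
            return Integer.toString(a / b);
        } catch (ArithmeticException e) {
            return "ArithmeticException";
        }
    }

    public static void main(String[] args) {
        System.out.println(intDivide(7, 0));                  // ArithmeticException
        System.out.println(7.0 / 0.0);                        // Infinity
        System.out.println(7.0 % 0.0);                        // NaN
        System.out.println(intDivide(Integer.MIN_VALUE, -1)); // -2147483648 (overflow, no exception)
    }
}
```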

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 318:

> 316:         ops.add(Expression.make(DOUBLES, "Double.max(", DOUBLES, ", ", DOUBLES, ")"));
> 317:         ops.add(Expression.make(DOUBLES, "Double.min(", DOUBLES, ", ", DOUBLES, ")"));
> 318:         ops.add(Expression.make(DOUBLES, "Double.sum(", DOUBLES, ", ", DOUBLES, ")"));

Suggestion:

        ops.add(Expression.make(INTS,     "Double.compare(", DOUBLES, ", ", DOUBLES, ")"));
        ops.add(Expression.make(LONGS,    "Double.doubleToLongBits(", DOUBLES, ")"));
        // Note: there are multiple NaN values with different bit representations.
        ops.add(Expression.make(LONGS,    "Double.doubleToRawLongBits(", DOUBLES, ")", withNondeterministicResult));
        ops.add(Expression.make(DOUBLES,  "Double.longBitsToDouble(", LONGS, ")"));
        ops.add(Expression.make(BOOLEANS, "Double.isFinite(", DOUBLES, ")"));
        ops.add(Expression.make(BOOLEANS, "Double.isInfinite(", DOUBLES, ")"));
        ops.add(Expression.make(BOOLEANS, "Double.isNaN(", DOUBLES, ")"));
        ops.add(Expression.make(DOUBLES,  "Double.max(", DOUBLES, ", ", DOUBLES, ")"));
        ops.add(Expression.make(DOUBLES,  "Double.min(", DOUBLES, ", ", DOUBLES, ")"));
        ops.add(Expression.make(DOUBLES,  "Double.sum(", DOUBLES, ", ", DOUBLES, ")"));
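About the `withNondeterministicResult` note: `Double.doubleToLongBits` collapses every NaN to the canonical pattern `0x7ff8000000000000L`, while `doubleToRawLongBits` exposes the NaN's actual payload. Presumably the raw variant is flagged because different execution paths are not guaranteed to produce identical NaN payloads. The sketch below uses `0x7ff8000000000001L` as an arbitrary example payload. It deliberately makes no claim about the exact raw output, since the Javadoc does not guarantee that `longBitsToDouble` preserves every NaN pattern.

```java
final class NaNBitsDemo {
    static final long CANONICAL_NAN = 0x7ff8000000000000L;

    public static void main(String[] args) {
        double a = Double.NaN;
        // A quiet NaN with a non-zero payload bit (arbitrary example pattern).
        double b = Double.longBitsToDouble(0x7ff8000000000001L);
        System.out.println(Double.isNaN(a) && Double.isNaN(b));          // true
        System.out.println(Double.doubleToLongBits(a) == CANONICAL_NAN); // true
        System.out.println(Double.doubleToLongBits(b) == CANONICAL_NAN); // true
        // Raw bits keep the payload where the platform preserves it.
        System.out.printf("%016x %016x%n",
                Double.doubleToRawLongBits(a), Double.doubleToRawLongBits(b));
    }
}
```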

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 331:

> 329:         ops.add(Expression.make(BOOLEANS, "(", BOOLEANS, " || ",   BOOLEANS, ")"));
> 330:         ops.add(Expression.make(BOOLEANS, "(", BOOLEANS, " && ",   BOOLEANS, ")"));
> 331:         ops.add(Expression.make(BOOLEANS, "(", BOOLEANS, " ^ ",   BOOLEANS, ")"));

Suggestion:

        ops.add(Expression.make(BOOLEANS, "(!(", BOOLEANS, "))"));
        ops.add(Expression.make(BOOLEANS, "(", BOOLEANS, " || ", BOOLEANS, ")"));
        ops.add(Expression.make(BOOLEANS, "(", BOOLEANS, " && ", BOOLEANS, ")"));
        ops.add(Expression.make(BOOLEANS, "(", BOOLEANS, " ^ ",  BOOLEANS, ")"));

test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 337:

> 335:         ops.add(Expression.make(BOOLEANS, "Boolean.logicalAnd(", BOOLEANS, ", ", BOOLEANS, ")"));
> 336:         ops.add(Expression.make(BOOLEANS, "Boolean.logicalOr(", BOOLEANS, ", ", BOOLEANS, ")"));
> 337:         ops.add(Expression.make(BOOLEANS, "Boolean.logicalXor(", BOOLEANS, ", ", BOOLEANS, ")"));

Suggestion:

        ops.add(Expression.make(INTS,     "Boolean.compare(",    BOOLEANS, ", ", BOOLEANS, ")"));
        ops.add(Expression.make(BOOLEANS, "Boolean.logicalAnd(", BOOLEANS, ", ", BOOLEANS, ")"));
        ops.add(Expression.make(BOOLEANS, "Boolean.logicalOr(",  BOOLEANS, ", ", BOOLEANS, ")"));
        ops.add(Expression.make(BOOLEANS, "Boolean.logicalXor(", BOOLEANS, ", ", BOOLEANS, ")"));

test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestExpressions.java line 27:

> 25:  * @test
> 26:  * @bug 8359412
> 27:  * @summary Demonstrate the use of Expressions form the Template Library.

Suggestion:

 * @summary Demonstrate the use of Expressions from the Template Library.

Typo

-------------

Changes requested by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/26885#pullrequestreview-3233825268
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355431170
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355670847
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355097557
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355135345
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355130769
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355157237
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355161790
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355198561
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355213439
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355218584
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355223397
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355224657
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355226625
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355227306
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355231339
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355232551
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355236084
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355238196
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355246768
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355239424
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355240287
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355241763
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355248929
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355249864
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355251161
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355253694
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355254950
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355258668
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355259835
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355261549
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2355263778

From epeter at openjdk.org  Wed Sep 17 14:35:23 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 17 Sep 2025 14:35:23 GMT
Subject: RFR: 8367657: C2 SuperWord: NormalMapping demo from JVMLS 2025
 [v6]
In-Reply-To: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
References: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
Message-ID: <lzOofJ3qhJ7tovM62NZNLYKuSRGnY0xCLskI0OkqerM=.31fa5fad-0980-4b6d-ae72-ee4ac6b3f973@github.com>

> Demo from here:
> https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/
> 
> Cleaned up and enhanced with a JTREG and IR test.
> I also added some more "generated" normal maps from height functions.
> And I display the resulting image side-by-side with the normal map.
> 
> I decided to put it in a new directory `compiler.gallery`, anticipating other compiler tests that are both visually appealing (i.e. can be used for a "gallery") and that we may want to back up with other tests like IR testing.
> 
> There is a **stand-alone** way to run the demo:
> `java test/hotspot/jtreg/compiler/gallery/NormalMapping.java`
> (though it may only run with JDK22+, probably due to some Amber features)
> 
> **Quick Performance Numbers**, running on my AVX-512 laptop:
> - default / AVX3: 105 FPS
> - AVX2: 82 FPS
> - AVX1: 50 FPS
> - No vectorization: 19 FPS
> - GraalJIT: 13 FPS (`jdk-26-ea+5`; probably an issue with vectorization / inlining?)
> 
> Here are some snapshots, but **I really recommend pulling the diff and playing with it, it looks much better in motion**:
> <img width="2000" height="991" alt="image" src="https://github.com/user-attachments/assets/a693fac8-ecf0-43f2-914b-25f76c2f425d" />
> <img width="2000" height="997" alt="image" src="https://github.com/user-attachments/assets/c2202e6b-6a90-4f90-a3ca-b73304e25905" />
> <img width="1997" height="992" alt="image" src="https://github.com/user-attachments/assets/0d6da304-6bb9-4b25-9a7b-72019b02d95e" />
> <img width="1992" height="994" alt="image" src="https://github.com/user-attachments/assets/9f5f7426-0678-45af-a3eb-ac092c262d4c" />
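For readers new to normal mapping, the "generated" normal maps mentioned above can be derived from a height function. The sketch below is hypothetical and is not the demo's code. It takes central differences of `h(x, y)` to get the normal `(-dh/dx, -dh/dy, 1)`, normalizes it, and then shades with a simple Lambertian `max(0, n · l)` loop over separate `float` arrays. C2's SuperWord can often vectorize a straight-line array loop of that shape, but whether it does depends on the JVM version and hardware.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

final class HeightToNormal {
    // Unit normal of the surface z = h(x, y) at (x, y), via central differences with step eps.
    static double[] normal(DoubleBinaryOperator h, double x, double y, double eps) {
        double dhdx = (h.applyAsDouble(x + eps, y) - h.applyAsDouble(x - eps, y)) / (2 * eps);
        double dhdy = (h.applyAsDouble(x, y + eps) - h.applyAsDouble(x, y - eps)) / (2 * eps);
        double nx = -dhdx, ny = -dhdy, nz = 1.0;
        double len = Math.sqrt(nx * nx + ny * ny + nz * nz);
        return new double[] {nx / len, ny / len, nz / len};
    }

    // Lambertian shading: out[i] = max(0, n_i . l), with normal components
    // stored in separate arrays (structure of arrays).
    static void shade(float[] nx, float[] ny, float[] nz,
                      float lx, float ly, float lz, float[] out) {
        for (int i = 0; i < out.length; i++) {
            float d = nx[i] * lx + ny[i] * ly + nz[i] * lz;
            out[i] = Math.max(0f, d);
        }
    }

    public static void main(String[] args) {
        // A bumpy example height function.
        double[] n = normal((x, y) -> Math.sin(x) * Math.cos(y), 1.0, 2.0, 1e-4);
        System.out.println(Arrays.toString(n));
    }
}
```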

Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision:

 - Update test/hotspot/jtreg/compiler/gallery/TestNormalMapping.java
   
   Co-authored-by: Andrey Turbanov <turbanoff at gmail.com>
 - Update test/hotspot/jtreg/compiler/gallery/NormalMapping.java
   
   Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27282/files
  - new: https://git.openjdk.org/jdk/pull/27282/files/806c9379..68ac841a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27282&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27282&range=04-05

  Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/27282.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27282/head:pull/27282

PR: https://git.openjdk.org/jdk/pull/27282

From mhaessig at openjdk.org  Wed Sep 17 14:40:16 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Wed, 17 Sep 2025 14:40:16 GMT
Subject: RFR: 8367389: C2 SuperWord: refactor VTransform to model the whole
 loop instead of just the basic block [v2]
In-Reply-To: <i0uwzmuCcxXGG4d4-ronPIB3wETW3sHL6M5yRuHkfA4=.45c69c45-84a0-4234-96f6-eabe01b42a69@github.com>
References: <qNs-pEYa6BR200XijY4JB8-DzdxjQPVR28zefXhEFNo=.89143141-f462-45e1-a1cb-cbe66dfcaf5d@github.com>
 <i0uwzmuCcxXGG4d4-ronPIB3wETW3sHL6M5yRuHkfA4=.45c69c45-84a0-4234-96f6-eabe01b42a69@github.com>
Message-ID: <5zLWoCC7_s5VBF435fL1hk_m9vsk5JQrdZ1tEipatFo=.bc502b75-7074-4923-8dce-d367eb1b71af@github.com>

On Wed, 17 Sep 2025 11:45:34 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
>> https://github.com/openjdk/jdk/pull/20964
>> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
>> 
>> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
>> 
>> ------------------------------
>> 
>> **Goals**
>> - VTransform models **all nodes in the loop**, not just the basic block (enables later VTransform::optimize, like moving reductions out of the loop)
>> - Remove `_nodes` from the vector vtnodes.
>> 
>> **Details**
>> - Remove: `AUTO_VECTORIZATION2_AFTER_REORDER`, `apply_memops_reordering_with_schedule`, `print_memops_schedule`.
>>   - Instead of reordering the scalar memops, we create the new memory graph during `VTransform::apply`. That is why the `VTransformApplyState` now needs to track the memory states.
>> - Refactor `VLoopMemorySlices`: map not just memory slices with phis (which have stores in the loop), but also those with only loads (no phi).
>> - Create vtnodes for all nodes in the loop (not just the basic block), as well as inputs (already done) and outputs (new). Also mapping the output nodes means that during `apply`, we naturally connect the uses after the loop to their inputs from the loop (which may be new nodes after the transformation).
>> - `_mem_ref_for_main_loop_alignment` -> `_vpointer_for_main_loop_alignment`. Instead of tracking the memory node to later have access to its `VPointer`, we take it directly. That removes one more use of `_nodes` for vector vtnodes.
>> 
>> I also made a lot of annotations in the code below, for easier review.
>> 
>> **Suggested order for review**
>> - Removal of `VTransformGraph::apply_memops_reordering_with_schedule` -> sets up need to build memory graph on the fly.
>> - Old and new code for `VLoopMemorySlices` -> we now also track load-only slices.
>> - `build_scalar_vtnodes_for_non_packed_nodes`, `build_inputs_for_scalar_vtnodes`, `build_uses_after_loop`, `apply_vtn_inputs_to_node` (use in `apply`), `apply_backedge`, `fix_memory_state_uses_after_loop`
>> - `VTransformApplyState`: how it now tracks the memory state.
>> - `VTransformVectorNode` -> removal of `_nodes` (Big Win!)
>> - Then look at all the other details.
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   for Manuel

Thank you for addressing my comments and answering my question. Bar the new typo, this looks good to me.

src/hotspot/share/opto/vectorization.cpp line 215:

> 213:   Compile* C = _vloop.phase()->C;
> 214:   // We iterate over the body, which is topologically sorted. Hence, if there is a phi
> 215:   // in a slice, we will find it first, and the loads and stres afterwards.

Suggestion:

  // in a slice, we will find it first, and the loads and stores afterwards.
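The ordering argument in that comment (phi first, then loads and stores) can be illustrated with a simplified, hypothetical C++ model. `MemNode`, `Slice`, and `build_slices` are illustrative names, not the actual `VLoopMemorySlices` API. The sketch also shows the new load-only slices (no phi) being tracked alongside slices that have a phi:

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical, simplified memory node: which slice it belongs to and its kind.
enum class MemKind { Phi, Load, Store };

struct MemNode {
  int alias_idx;  // memory slice the node belongs to
  MemKind kind;
  int id;
};

struct Slice {
  int phi = -1;  // id of the slice's memory phi, or -1 if the slice has none
  std::vector<int> loads;
  std::vector<int> stores;
  // A slice without a phi has no stores in the loop: it is load-only.
  bool load_only() const { return phi == -1; }
};

// Walk a topologically sorted loop body and group memory nodes by slice.
std::map<int, Slice> build_slices(const std::vector<MemNode>& body) {
  std::map<int, Slice> slices;
  for (const MemNode& n : body) {
    Slice& s = slices[n.alias_idx];
    switch (n.kind) {
      case MemKind::Phi:
        // Topological order guarantees the phi precedes its loads and stores.
        assert(s.loads.empty() && s.stores.empty());
        s.phi = n.id;
        break;
      case MemKind::Load:
        s.loads.push_back(n.id);
        break;
      case MemKind::Store:
        s.stores.push_back(n.id);
        break;
    }
  }
  return slices;
}
```

If the body were not sorted, the assertion in the `Phi` case could fire, which is exactly the property the code comment relies on.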

-------------

Marked as reviewed by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27208#pullrequestreview-3234807557
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2355753434

From chagedorn at openjdk.org  Wed Sep 17 15:04:50 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 17 Sep 2025 15:04:50 GMT
Subject: RFR: 8367278: Test compiler/startup/StartupOutput.java timed out
 after completion on Windows [v2]
In-Reply-To: <TlhYtJCJ7iPOGbDE4FDhtPf3L3nfj95bOf6zUY5eH9I=.855e1d29-cfab-4c78-a0ca-b8acc6698f5e@github.com>
References: <AJYaHORLYxUCNMALa2LNo-21UBiP07SW97l4gwFZ9N0=.ab1bb72d-4db8-4472-921d-4d24356fc58c@github.com>
 <TlhYtJCJ7iPOGbDE4FDhtPf3L3nfj95bOf6zUY5eH9I=.855e1d29-cfab-4c78-a0ca-b8acc6698f5e@github.com>
Message-ID: <S04C8UpdKSPcmNl9hueK3JodNxWVilp4nanrNedePZQ=.a862d374-b211-45c1-8024-507d2a37aed4@github.com>

On Wed, 17 Sep 2025 14:17:13 GMT, Damon Fenacci <dfenacci at openjdk.org> wrote:

>> ## Problem
>> After [JDK-8260555](https://bugs.openjdk.org/browse/JDK-8260555) changed the default TIMEOUT_FACTOR from 4 to 1, the test compiler/startup/StartupOutput.java can occasionally slightly exceed the 2-minute timeout on Windows.
>> 
>> ## Change
>> Rather than increasing the timeout, this change reduces the number of VM runs with randomly generated near-minimum code cache sizes from 200 to 50. This should still provide sufficient coverage while keeping execution well within the timeout.
>> 
>> ## Testing:
>> Tiers 1-3+
>
> Damon Fenacci has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8367278
>  - JDK-8367278: reduce loop to 50 cycles
>  - JDK-8367278: Test compiler/startup/StartupOutput.java timed out after completion on Windows

That looks reasonable to me, thanks for fixing it!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27254#pullrequestreview-3234935299

From epeter at openjdk.org  Wed Sep 17 15:22:38 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 17 Sep 2025 15:22:38 GMT
Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add
 of unique value [v18]
In-Reply-To: <53Ado9oN1yU5hgOPU2feecxsArD5yoycn09ZWPNK4AQ=.69035bde-9bec-442e-8dc2-ddd268df9d07@github.com>
References: <uPac6xWF7cP6ux_dPbfCY-XhS0-cAjqO_kQxv9rF_AY=.3972e17c-c80a-4a28-bc91-f74ffe228544@github.com>
 <53Ado9oN1yU5hgOPU2feecxsArD5yoycn09ZWPNK4AQ=.69035bde-9bec-442e-8dc2-ddd268df9d07@github.com>
Message-ID: <gHuX3XkLqHlyEBajTg1mbIADUDKSg-v5bqRXTnsShdM=.30a8b4d3-b8ff-40d9-a96b-14c83842f90f@github.com>

On Tue, 26 Aug 2025 14:47:31 GMT, Kangcheng Xu <kxu at openjdk.org> wrote:

>> [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495), which was [first merged](https://git.openjdk.org/jdk/pull/20754) and then backed out due to a regression. This patch redoes the feature and fixes the bit-shift overflow problem. For more information, please refer to the previous PR.
>> 
>> When computing the constant multiplier of a multiplication (possibly expressed as `lshift`s), the multiplier is widened to long and later narrowed to int if needed. However, when the shift count of an int `lshift` is exactly `32`, overflowing an int, using long gives an unexpected result (i.e., `(1 << 32) = 1` but `(int) (1L << 32) = 0`).
>> 
>> The following was implemented to address this issue.
>> 
>> if (UseNewCode2) {
>>     *multiplier = bt == T_INT
>>         ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows
>>         : ((jlong) 1) << con->get_int();
>> } else {
>>     *multiplier = ((jlong) 1 << con->get_int());
>> }
>> 
>> 
>> Two new bitshift overflow tests were added.
>
> Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 67 commits:
> 
>  - Merge branch 'openjdk:master' into arithmetic-canonicalization
>  - Merge remote-tracking branch 'origin/master' into arithmetic-canonicalization
>  - Allow swapping LHS/RHS in case not matched
>  - Merge branch 'refs/heads/master' into arithmetic-canonicalization
>  - improve comment readability and struct helper functions
>  - remove asserts, add more documentation
>  - fix typo: lhs->rhs
>  - update comments
>  - use java_add to avoid cpp overflow UB
>  - add assertion for MulLNode too
>  - ... and 57 more: https://git.openjdk.org/jdk/compare/173dedfb...7bb7e645
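The shift semantics behind this bug can be checked in isolation. Below is a minimal C++ sketch, assuming Java's rule that an int shift uses only the low 5 bits of the count and a long shift only the low 6 bits; `java_shl_int`, `java_shl_long`, and `shift_multiplier` are hypothetical helpers, not HotSpot functions. It shows why the multiplier for an int `lshift` has to be computed at int width before widening:

```cpp
#include <cassert>
#include <cstdint>

// Java int shift: only the low 5 bits of the count are used, so 1 << 32 == 1.
// Shifting in unsigned arithmetic with a masked count keeps the C++ well-defined.
int32_t java_shl_int(int32_t x, int32_t count) {
  return static_cast<int32_t>(static_cast<uint32_t>(x) << (count & 31));
}

// Java long shift: only the low 6 bits of the count are used.
int64_t java_shl_long(int64_t x, int32_t count) {
  return static_cast<int64_t>(static_cast<uint64_t>(x) << (count & 63));
}

// Multiplier implied by `x << count`, computed at the width of the shift.
// Widening to long first would give 1L << 32 for an int shift by 32,
// which narrows to 0 instead of Java's 1.
int64_t shift_multiplier(bool is_int, int32_t count) {
  return is_int ? static_cast<int64_t>(java_shl_int(1, count))
                : java_shl_long(1, count);
}
```

The explicit masking matters on the C++ side too: shifting a 32-bit value by 32 is undefined behavior in C++, whereas Java defines the result.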

@tabjy Thanks for the ping. Sorry I did not respond earlier. I was hoping others would continue the review, but it seems it got stuck on me here, a classic though unfortunate pattern ;)

@rwestrel asked me if I wanted to continue reviewing. I'm going on a 3-week vacation, so feel free to ask others to review.

--------------------------------

I'll summarize my thoughts now, so others can review the PR in my absence:
- The PR looks much better now, we have made good progress.
- I'm still sad that we are not covering cases like `a * CON1 + a * CON2`, or other patterns that could be collapsed to `a * CON`. But I do understand that this would require some recursive approach, and that could be a little more difficult.

-----------------------------------

I'll leave it at this, and hope that others will review.

src/hotspot/share/opto/addnode.cpp line 424:

> 422: // Note this also converts, for example, original expression `(a*3) + a` into `4*a` and `(a<<2) + a` into `5*a`. A more
> 423: // generalized pattern `(a*b) + (a*c)` into `a*(b + c)` is handled by AddNode::IdealIL().
> 424: Node* AddNode::convert_serial_additions(PhaseGVN* phase, BasicType bt) {

The name `convert_serial_additions` now seems a bit off, because we really cover a lot of other cases too.
Really you cover `a + pattern` and `pattern + a`, where `pattern` is one of the cases from `find_serial_addition_patterns`.

Maybe it could be called `AddNode::Ideal_collapse_variable_times_con`, because in the end you want to find cases that are equivalent to `a * some_con`.

Lead the documentation with that as well, rather than with the series of additions: the series of additions is not the pattern you actually match here. It is only one of the use cases, and there are others.
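The suggested name describes what is actually matched: `a + pattern` or `pattern + a`, where `pattern` is itself equivalent to `a * con`. Here is a minimal, hypothetical C++ sketch of that matching over a toy expression tree; `Expr`, `multiplier_of`, and `collapse_add` are illustrative, not C2's `Node` API, and only the single-level cases `a`, `a * con`, `con * a`, and `a << con` are covered:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

enum class Op { Var, Con, Add, Mul, LShift };

// Toy stand-in for a C2 node: an opcode plus up to two operands.
struct Expr {
  Op op;
  int64_t con;      // constant value, meaningful only when op == Op::Con
  const Expr* in1;  // first operand of a binary op
  const Expr* in2;  // second operand of a binary op
};

// Wrapping add in the spirit of HotSpot's java_add: compute in unsigned
// arithmetic so overflow wraps instead of being undefined behavior.
static int64_t wrap_add(int64_t x, int64_t y) {
  return static_cast<int64_t>(static_cast<uint64_t>(x) + static_cast<uint64_t>(y));
}

static bool is_con(const Expr* e) { return e->op == Op::Con; }

// If `pattern` is equivalent to `a * m` for a constant m, return m.
static std::optional<int64_t> multiplier_of(const Expr* pattern, const Expr* a) {
  if (pattern == a) {
    return 1;  // a
  }
  if (pattern->op == Op::Mul) {
    if (pattern->in1 == a && is_con(pattern->in2)) return pattern->in2->con;  // a * con
    if (pattern->in2 == a && is_con(pattern->in1)) return pattern->in1->con;  // con * a
  }
  if (pattern->op == Op::LShift && pattern->in1 == a && is_con(pattern->in2)) {
    // Long shift semantics: only the low 6 bits of the count are used.
    return static_cast<int64_t>(uint64_t{1} << (pattern->in2->con & 63));  // a << con
  }
  return std::nullopt;
}

// Try `pattern + a`, then `a + pattern` (the LHS/RHS swap). On success, report
// `a` through var_out and return m + 1, since pattern + a == a * (m + 1).
static std::optional<int64_t> collapse_add(const Expr* add, const Expr** var_out) {
  if (add->op != Op::Add) return std::nullopt;
  const Expr* orders[2][2] = {{add->in1, add->in2}, {add->in2, add->in1}};
  for (const auto& order : orders) {
    if (auto m = multiplier_of(order[0], order[1])) {
      *var_out = order[1];
      return wrap_add(*m, 1);
    }
  }
  return std::nullopt;
}
```

With this naming, the swap loop reads naturally as "try both operand orders". The sketch deliberately does not recurse, so `a * CON1 + a * CON2` is not collapsed, matching the limitation noted in the summary above.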

src/hotspot/share/opto/addnode.cpp line 442:

> 440:       return nullptr;
> 441:     }
> 442:   }

Nice, thanks for adding it!
I think it would be nice if we renamed `find_serial_addition_patterns` so that it is clear that we are looking for `a + a * con` or `con*a + a`. Currently, the method name does not make it clear why we need the swapping.

src/hotspot/share/opto/addnode.cpp line 456:

> 454: //     - (3) Simple multiplication: LHS = CON * a
> 455: //     - (4) Power-of-two addition: LHS = (a << CON1) + (a << CON2)
> 456: AddNode::Multiplication AddNode::find_serial_addition_patterns(const Node* lhs, const Node* rhs, BasicType bt) {

Here, we have `rhs = a`, right? I'd suggest just renaming the method arguments `rhs`->`a` and `lhs`->`pattern`, since you already call (1) - (4) patterns in the documentation. That would be a good fit :)

src/hotspot/share/opto/addnode.cpp line 544:

> 542: //     - (2) AddNode(LShiftNode(a, CON), a)
> 543: //     - (3) AddNode(a, LShiftNode(a, CON))
> 544: //     - (4) AddNode(a, a)

You could drop the `Node` part from the cases here, to make it a bit more concise.

test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 24:

> 22:  */
> 23: 
> 24: package compiler.c2;

I would put the test in a more specific directory. I think the `igvn` directory would be a good candidate, because `Ideal` is part of IGVN ;)

test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 38:

> 36:  * @test
> 37:  * @bug 8325495 8347555
> 38:  * @summary C2 should optimize for series of Add of unique value. e.g., a + a + ... + a => a*n

You may want to change the summary here, and also the PR summary, because you really do not just handle these series of additions, but lots of other cases as well. The examples below suggest that too ;)

test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 334:

> 332:     private static long randomPowerOfTwoAdditionL(long a) {
> 333:         return a * CON1_L + a * CON2_L + a * CON3_L + a * CON4_L;
> 334:     }

Nice, thanks for these :)

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23506#pullrequestreview-3234938073
PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2355842882
PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2355851866
PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2355855659
PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2355859306
PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2355868028
PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2355865694
PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2355871718

From epeter at openjdk.org  Wed Sep 17 15:22:40 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 17 Sep 2025 15:22:40 GMT
Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add
 of unique value [v18]
In-Reply-To: <gHuX3XkLqHlyEBajTg1mbIADUDKSg-v5bqRXTnsShdM=.30a8b4d3-b8ff-40d9-a96b-14c83842f90f@github.com>
References: <uPac6xWF7cP6ux_dPbfCY-XhS0-cAjqO_kQxv9rF_AY=.3972e17c-c80a-4a28-bc91-f74ffe228544@github.com>
 <53Ado9oN1yU5hgOPU2feecxsArD5yoycn09ZWPNK4AQ=.69035bde-9bec-442e-8dc2-ddd268df9d07@github.com>
 <gHuX3XkLqHlyEBajTg1mbIADUDKSg-v5bqRXTnsShdM=.30a8b4d3-b8ff-40d9-a96b-14c83842f90f@github.com>
Message-ID: <BtvuhUQ1372di0jFHOa64slWDcAiTQscxkkJyNcbt0w=.7497f7e4-0a77-4334-ae34-967530ce4d55@github.com>

On Wed, 17 Sep 2025 15:08:43 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 67 commits:
>> 
>>  - Merge branch 'openjdk:master' into arithmetic-canonicalization
>>  - Merge remote-tracking branch 'origin/master' into arithmetic-canonicalization
>>  - Allow swapping LHS/RHS in case not matched
>>  - Merge branch 'refs/heads/master' into arithmetic-canonicalization
>>  - improve comment readability and struct helper functions
>>  - remove asserts, add more documentation
>>  - fix typo: lhs->rhs
>>  - update comments
>>  - use java_add to avoid cpp overflow UB
>>  - add assertion for MulLNode too
>>  - ... and 57 more: https://git.openjdk.org/jdk/compare/173dedfb...7bb7e645
>
> src/hotspot/share/opto/addnode.cpp line 544:
> 
>> 542: //     - (2) AddNode(LShiftNode(a, CON), a)
>> 543: //     - (3) AddNode(a, LShiftNode(a, CON))
>> 544: //     - (4) AddNode(a, a)
> 
> You could drop the `Node` part from the cases here, to make it a bit more concise.

Alternatively, you could do it with the `<<` operator like you did in `find_serial_addition_patterns`. I think that would be more consistent.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2355861249

From cslucas at openjdk.org  Wed Sep 17 16:54:51 2025
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Wed, 17 Sep 2025 16:54:51 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT
In-Reply-To: <t_dqm0EN559GBmw4cJBPitWnDxmNFjoGMran3JxdRVI=.cb8a4e4f-7ce7-48f6-9c11-fe646c57efd7@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
 <1uDOe3Oe-hihmDHea2h8vcvRZsKKBeNp0J9lKYUujxk=.abd111bc-3625-4c71-bfa2-0a4c1f4d3875@github.com>
 <VOmxZ5c0SKETC1N8-S-WrvXIU8qCaA5NMS_68UGwVDc=.00bbe82b-dd42-4393-b57b-9df634a12d88@github.com>
 <2brDXuLmbVBVRaeSyCdKokA706v3t6VsZfGvj_QceJ4=.4483390e-c726-4d82-b220-f1dbdf4efef0@github.com>
 <t_dqm0EN559GBmw4cJBPitWnDxmNFjoGMran3JxdRVI=.cb8a4e4f-7ce7-48f6-9c11-fe646c57efd7@github.com>
Message-ID: <P04BPNL5t9aSXXFG_J9us6cLzsQFhpYiAlj_cPB78-c=.53704114-2174-48aa-9882-574259f197c7@github.com>

On Thu, 11 Sep 2025 07:41:55 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>>> @robcasloz - are you thinking that the "fixed point" loops on `find_scalar_replaceable_allocs` aren't sufficient?
>> 
>> You're right, that should do.
>> 
>>> At first glance yes, I think that the code would be more cleaned up if done that way. If the code had been written like that in the first place we wouldn't have seen the current issue. (...)
>> 
>> Agree, a single fixed point loop combining NSR detection and propagation would be ideal for clarity and maintainability.
>> 
>>>  I propose that we move forward with the current patch and work on this refactoring as a separate issue.
>> 
>> Sounds good, please file an RFE for that. I would then suggest postponing the clean-up in `revisit_reducible_phi_status` to that RFE.
>
>> @robcasloz - I pushed some changes addressing your and @eme64's comments. Could you please re-run your internal tests?
> 
> Thanks, I will report back within a couple of days.

Thank you @robcasloz; I'll start working on that early next week.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27063#issuecomment-3303827082

From cslucas at openjdk.org  Wed Sep 17 16:54:53 2025
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Wed, 17 Sep 2025 16:54:53 GMT
Subject: Integrated: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT
In-Reply-To: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
Message-ID: <8WNEENPja-dRlOVY3Bchz8n_eN-3brvuNzauem5SWIU=.0c69796f-5775-4f39-8309-9bc7d3b917eb@github.com>

On Wed, 3 Sep 2025 00:53:59 GMT, Cesar Soares Lucas <cslucas at openjdk.org> wrote:

> Please review this patch, which fixes an issue that may occur when reducing allocation merges.
> 
> As the assert message describes, the problem is that a `Phi` considered reducible during one invocation of `adjust_scalar_replaceable_state` later turned out to be non-reducible. This can happen if a subsequent invocation of the same method causes all inputs to the Phi to become NSR, in which case there is no point in reducing the Phi. It can also happen during the propagation of NSR state done by `find_scalar_replaceable_allocs`.
> 
> The change in `revisit_reducible_phi_status` is just a clean-up.
> The real fix is in `find_scalar_replaceable_allocs`.
> 
> Tested on Linux x64/Aarch64 release/fastdebug with JTREG tier1-3.
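The interaction described above, where a Phi judged reducible in one step later becomes non-reducible once NSR state propagates, can be illustrated with a hypothetical fixed-point model. The names `Alloc`, `MergePhi`, and `propagate_nsr` are illustrative, and the two rules in the loop are simplifying assumptions rather than C2's exact conditions. It also shows the single loop combining NSR detection and propagation that the review discussion favors:

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model; not C2's ConnectionGraph API.
struct Alloc {
  bool nsr;  // not scalar replaceable
};

struct MergePhi {
  std::vector<int> inputs;  // indices into the allocation list
  bool reducible;
};

// Iterate until nothing changes:
//  - a phi whose inputs are all NSR is no longer worth reducing;
//  - an allocation merged by a non-reducible phi becomes NSR.
void propagate_nsr(std::vector<Alloc>& allocs, std::vector<MergePhi>& phis) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (MergePhi& p : phis) {
      if (!p.reducible) continue;
      bool all_nsr = true;
      for (int i : p.inputs) all_nsr = all_nsr && allocs[i].nsr;
      if (all_nsr) {
        p.reducible = false;
        changed = true;
      }
    }
    for (const MergePhi& p : phis) {
      if (p.reducible) continue;
      for (int i : p.inputs) {
        if (!allocs[i].nsr) {
          allocs[i].nsr = true;
          changed = true;
        }
      }
    }
  }
}
```

In a cascade such as phis `{1, 2}` (reducible), `{0, 1}` (not reducible), and `{0, 3}` (reducible) with only allocation 0 initially scalar replaceable, the third phi is still reducible after the first round and only becomes non-reducible in the second, which is the kind of late change the assert tripped over when the state was not iterated to a fixed point.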

This pull request has now been integrated.

Changeset: 6f493b4d
Author:    Cesar Soares Lucas <cslucas at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/6f493b4d2e7120cbe34fb70d595f7626655b47a9
Stats:     71 lines in 2 files changed: 71 ins; 0 del; 0 mod

8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed: Sanity: previous reducible Phi is no longer reducible before SUT

Reviewed-by: rcastanedalo

-------------

PR: https://git.openjdk.org/jdk/pull/27063

From bulasevich at openjdk.org  Wed Sep 17 18:08:58 2025
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Wed, 17 Sep 2025 18:08:58 GMT
Subject: RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift
 exponent 100 is too large for 32-bit type 'unsigned int' [v4]
In-Reply-To: <uKUByg7RkOyLsGYoajinrOf76Uu00PIJ-fBeWOKVNcI=.1d4fc3ed-2fd3-454c-9fa4-af97fc676b48@github.com>
References: <uKUByg7RkOyLsGYoajinrOf76Uu00PIJ-fBeWOKVNcI=.1d4fc3ed-2fd3-454c-9fa4-af97fc676b48@github.com>
Message-ID: <dmcwEVVk5OPlA35gtJ-iNVEbSQGl75guwo7gl470tsw=.1577fc95-f7f2-4c0b-8790-1197aed4ae40@github.com>

> This reworks the recent update https://github.com/openjdk/jdk/pull/24696 to fix a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was not optimal.
> 
> The issue reproduces with the HeapByteBufferTest jtreg test on a UBSan-enabled build. The actual trigger is the `-XX:+OptoScheduling` option used by the test (OptoScheduling is disabled by default on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple `java -version` run.
> 
> This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments.
> 
> The problem is that the shift count `n` may be too large here:
> 
> class Pipeline_Use_Cycle_Mask {
> protected:
>   uint _mask;
>   ..
>   Pipeline_Use_Cycle_Mask& operator<<=(int n) {
>     _mask <<= n;
>     return *this;
>   }
> };
> 
> The recent change attempted to cap the shift amount at one call site:
> 
> class Pipeline_Use_Element {
> protected:
>   ..
>   // Mask of specific used cycles
>   Pipeline_Use_Cycle_Mask _mask;
>   ..
>   void step(uint cycles) {
>     _used = 0;
>     uint max_shift = 8 * sizeof(_mask) - 1;
>     _mask <<= (cycles < max_shift) ? cycles : max_shift;
>   }
> }
> 
> However, there is another site where `Pipeline_Use_Cycle_Mask::operator<<=` can be called with a too-large shift count:
> 
> // The following two routines assume that the root Pipeline_Use entity
> // consists of exactly 1 element for each functional unit
> // start is relative to the current cycle; used for latency-based info
> uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const {
>   for (uint i = 0; i < pred._count; i++) {
>     const Pipeline_Use_Element *predUse = pred.element(i);
>     if (predUse->_multiple) {
>       uint min_delay = 7;
>       // Multiple possible functional units, choose first unused one
>       for (uint j = predUse->_lb; j <= predUse->_ub; j++) {
>         const Pipeline_Use_Element *currUse = element(j);
>         uint curr_delay = delay;
>         if (predUse->_used & currUse->_used) {
>           Pipeline_Use_Cycle_Mask x = predUse->_mask;
>           Pipeline_Use_Cycle_Mask y = currUse->_mask;
> 
>           for ( y <<= curr_delay; x.overlaps(y); curr_delay++ )
>             y <<= 1;
>         }
>         if (min_delay > curr_delay)
>           min_delay = curr_delay;
>       }
>       if (delay < min_delay)
>       delay = min_delay;
>     }
>     else {
>       for (uint j = predUse->_lb; j <= predUse->_ub; j++) {
>         const Pipeline_Use_Element *currUse = element(j);
>         if (predUse->_used & currUse->_used) {
>  ...
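The undefined behavior can be avoided at the operator itself rather than at each call site. Below is a minimal sketch of that idea using a hypothetical `CycleMask` class standing in for the ADLC-generated `Pipeline_Use_Cycle_Mask`; it is an illustration, not necessarily how the PR resolves the issue. `first_free_delay` is a hypothetical helper that mirrors the loop shape quoted from `Pipeline_Use::full_latency`:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the ADLC-generated Pipeline_Use_Cycle_Mask.
class CycleMask {
  uint32_t _mask;

 public:
  explicit CycleMask(uint32_t mask) : _mask(mask) {}

  // Shifting a 32-bit value by 32 or more is undefined behavior in C++.
  // Mathematically every bit is shifted out, so saturate to an empty mask.
  CycleMask& operator<<=(unsigned n) {
    _mask = (n >= 32) ? 0u : (_mask << n);
    return *this;
  }

  bool overlaps(const CycleMask& other) const { return (_mask & other._mask) != 0; }
  uint32_t bits() const { return _mask; }
};

// Mirrors the quoted full_latency loop: shift y until it no longer overlaps x.
// With the saturating shift, y eventually becomes empty, so the loop terminates.
unsigned first_free_delay(CycleMask x, CycleMask y, unsigned delay) {
  for (y <<= delay; x.overlaps(y); delay++) {
    y <<= 1;
  }
  return delay;
}
```

Note the design difference from the capped shift in `step()`: clamping the count to `8 * sizeof(_mask) - 1` avoids the UB but can leave the original lowest bit set at position 31, whereas saturating yields the empty mask that a true shift by 32 or more would produce.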

Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision:

  reduce fixed_latency(100) to fixed_latency(30) for calls/traps on ARM, PPC, RISC-V, X86

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26890/files
  - new: https://git.openjdk.org/jdk/pull/26890/files/e3ac8703..16d28c6d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26890&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26890&range=02-03

  Stats: 8 lines in 4 files changed: 0 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/26890.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26890/head:pull/26890

PR: https://git.openjdk.org/jdk/pull/26890

From bulasevich at openjdk.org  Wed Sep 17 18:32:10 2025
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Wed, 17 Sep 2025 18:32:10 GMT
Subject: RFR: 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift
 exponent 100 is too large for 32-bit type 'unsigned int' [v5]
In-Reply-To: <uKUByg7RkOyLsGYoajinrOf76Uu00PIJ-fBeWOKVNcI=.1d4fc3ed-2fd3-454c-9fa4-af97fc676b48@github.com>
References: <uKUByg7RkOyLsGYoajinrOf76Uu00PIJ-fBeWOKVNcI=.1d4fc3ed-2fd3-454c-9fa4-af97fc676b48@github.com>
Message-ID: <BIlH0ZSo-ztrvtoihxHfQ8IYax2PoM0MBFCrw8wZ4_I=.e1d8e2a1-2623-4551-a65e-801f02cbc6aa@github.com>

> This reworks the recent update https://github.com/openjdk/jdk/pull/24696 to fix a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was not optimal.
> 
> The issue reproduces with the HeapByteBufferTest jtreg test on a UBSan-enabled build. The actual trigger is the `-XX:+OptoScheduling` option used by the test (OptoScheduling is disabled by default on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple `java -version` run.
> 
> This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments.
> 
> The problem is that the shift count `n` may be too large here:
> 
> class Pipeline_Use_Cycle_Mask {
> protected:
>   uint _mask;
>   ..
>   Pipeline_Use_Cycle_Mask& operator<<=(int n) {
>     _mask <<= n;
>     return *this;
>   }
> };
> 
> The recent change attempted to cap the shift amount at one call site:
> 
> class Pipeline_Use_Element {
> protected:
>   ..
>   // Mask of specific used cycles
>   Pipeline_Use_Cycle_Mask _mask;
>   ..
>   void step(uint cycles) {
>     _used = 0;
>     uint max_shift = 8 * sizeof(_mask) - 1;
>     _mask <<= (cycles < max_shift) ? cycles : max_shift;
>   }
> }
> 
> However, there is another site where `Pipeline_Use_Cycle_Mask::operator<<=` can be called with a too-large shift count:
> 
> // The following two routines assume that the root Pipeline_Use entity
> // consists of exactly 1 element for each functional unit
> // start is relative to the current cycle; used for latency-based info
> uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const {
>   for (uint i = 0; i < pred._count; i++) {
>     const Pipeline_Use_Element *predUse = pred.element(i);
>     if (predUse->_multiple) {
>       uint min_delay = 7;
>       // Multiple possible functional units, choose first unused one
>       for (uint j = predUse->_lb; j <= predUse->_ub; j++) {
>         const Pipeline_Use_Element *currUse = element(j);
>         uint curr_delay = delay;
>         if (predUse->_used & currUse->_used) {
>           Pipeline_Use_Cycle_Mask x = predUse->_mask;
>           Pipeline_Use_Cycle_Mask y = currUse->_mask;
> 
>           for ( y <<= curr_delay; x.overlaps(y); curr_delay++ )
>             y <<= 1;
>         }
>         if (min_delay > curr_delay)
>           min_delay = curr_delay;
>       }
>       if (delay < min_delay)
>       delay = min_delay;
>     }
>     else {
>       for (uint j = predUse->_lb; j <= predUse->_ub; j++) {
>         const Pipeline_Use_Element *currUse = element(j);
>         if (predUse->_used & currUse->_used) {
>  ...

Boris Ulasevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:

 - reduce fixed_latency(100) to fixed_latency(30) for calls/traps on ARM, PPC, RISC-V, X86
 - use uint32_t for _mask
 - remove redundant code
 - 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int'

-------------

Changes: https://git.openjdk.org/jdk/pull/26890/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26890&range=04
  Stats: 25 lines in 5 files changed: 0 ins; 6 del; 19 mod
  Patch: https://git.openjdk.org/jdk/pull/26890.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26890/head:pull/26890

PR: https://git.openjdk.org/jdk/pull/26890

From vlivanov at openjdk.org  Wed Sep 17 19:34:46 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 17 Sep 2025 19:34:46 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v11]
In-Reply-To: <U1p9TRvNC5OzFwxraenVJ-4R4AV5Tnyhho025Fyh-ow=.5b0cc316-6852-4f90-aede-7363eac525e7@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <1pShdyn-7-wwwiuY1DdMt5iiZ2qc9l_x2F-3AKqkg60=.dd260953-05cc-4b84-b6d1-7f684e74084c@github.com>
 <EZmv1oLssvikAF7lwHm77Q5kmfuKvoqXLhxGvSL1Mho=.c7db1b91-3536-48ad-90c6-c4c098094706@github.com>
 <rJb4N4TiJ4HhAMEFTVFn1e1qPpPfEwiBB6ezo7nM-cg=.fca1c659-150e-4d97-a47a-01bd836d867d@github.com>
 <U1p9TRvNC5OzFwxraenVJ-4R4AV5Tnyhho025Fyh-ow=.5b0cc316-6852-4f90-aede-7363eac525e7@github.com>
Message-ID: <RehiOszGY462bDoJ1tHZpSjqB1On8H3rUwxHzD8Shlo=.a9612269-df51-4e6c-9a36-f7ad8fe68a2c@github.com>

On Tue, 16 Sep 2025 01:24:35 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Could we also bail out here? Or what would happen now in production if there is a RF edge?
>
> We also use this area past endoff() for storing the "ex_oop" (see for example GraphKit::has_saved_ex_oop()).  Are ex_oop and reachability edges mutually exclusive?

Yes, ex_oop and reachability edges are mutually exclusive, but there's no conflict. ex_oop is kept during parsing while reachability edges stay attached to RF nodes until loop optimizations are over (and no inlining can happen anymore).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2356550410

From vlivanov at openjdk.org  Wed Sep 17 19:41:40 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 17 Sep 2025 19:41:40 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v11]
In-Reply-To: <rJb4N4TiJ4HhAMEFTVFn1e1qPpPfEwiBB6ezo7nM-cg=.fca1c659-150e-4d97-a47a-01bd836d867d@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <1pShdyn-7-wwwiuY1DdMt5iiZ2qc9l_x2F-3AKqkg60=.dd260953-05cc-4b84-b6d1-7f684e74084c@github.com>
 <EZmv1oLssvikAF7lwHm77Q5kmfuKvoqXLhxGvSL1Mho=.c7db1b91-3536-48ad-90c6-c4c098094706@github.com>
 <rJb4N4TiJ4HhAMEFTVFn1e1qPpPfEwiBB6ezo7nM-cg=.fca1c659-150e-4d97-a47a-01bd836d867d@github.com>
Message-ID: <7s8qppZ6lzq5iN-inRFkFuXgElo46UmYyIrvExOLA3A=.cf76da61-89ee-4d29-9b5a-0b6e7b3bac2b@github.com>

On Fri, 12 Sep 2025 13:47:49 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/loopnode.cpp line 5341:
>> 
>>> 5339:     C->print_method(PHASE_ELIMINATE_REACHABILITY_FENCES, 2);
>>> 5340:     assert(C->reachability_fences_count() == 0, "no RF nodes allowed");
>>> 5341:   }
>> 
>> Can we somehow assert that we now really will never do loop-opts again?
>> Why are you checking for `_mode == LoopOptsDefaultFinal` and not for `LoopOptsEliminateRFs`?
>> If that was a bug, then more verification would be extra justified ;)
>
> Otherwise, please explain the meaning of `LoopOptsDefaultFinal`. Maybe it should be an OR here?

> Why are you checking for _mode == LoopOptsDefaultFinal and not for LoopOptsEliminateRFs?

The intention is to avoid an extra `PhaseIdealLoop` construction pass solely for `LoopOptsEliminateRFs` purposes when there's an empty pass during normal flow of loop optimizations.   

`LoopOptsEliminateRFs` is performed as the last resort when there was no previous pass to piggyback on.
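The piggybacking described here can be modeled as a small control-flow sketch. This is a hypothetical model, not HotSpot code: `Mode`, `Compilation`, `run_pass`, and `finish_loop_opts` are illustrative names whose enum values loosely mirror `LoopOptsDefault`, `LoopOptsDefaultFinal`, and `LoopOptsEliminateRFs`, and the assumption is that a final default pass removes RF nodes as a side effect, while a dedicated pass runs only if RF nodes remain afterwards:

```cpp
#include <cassert>
#include <vector>

// Loosely mirrors the loop-opts modes discussed above (hypothetical model).
enum class Mode { Default, DefaultFinal, EliminateRFs };

struct Compilation {
  int rf_count = 0;              // reachability fence nodes still in the graph
  std::vector<Mode> passes_run;  // loop-opts passes executed, for illustration
};

// One loop-opts pass. RF elimination piggybacks on the final default pass,
// or happens in a dedicated pass.
void run_pass(Compilation& c, Mode mode) {
  c.passes_run.push_back(mode);
  if (mode == Mode::DefaultFinal || mode == Mode::EliminateRFs) {
    c.rf_count = 0;  // stands in for expanding RF nodes into safepoint edges
  }
}

// Last resort: construct an extra pass only if nothing eliminated the RFs.
void finish_loop_opts(Compilation& c) {
  if (c.rf_count > 0) {
    run_pass(c, Mode::EliminateRFs);
  }
}
```

In this model, a compilation that already ran a final default pass needs no extra pass, while one that did not gets exactly one `EliminateRFs` pass; either way, no RF nodes remain, which corresponds to the `reachability_fences_count() == 0` assert quoted above.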

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2356569319

From vlivanov at openjdk.org  Wed Sep 17 19:47:21 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 17 Sep 2025 19:47:21 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v11]
In-Reply-To: <7s8qppZ6lzq5iN-inRFkFuXgElo46UmYyIrvExOLA3A=.cf76da61-89ee-4d29-9b5a-0b6e7b3bac2b@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <1pShdyn-7-wwwiuY1DdMt5iiZ2qc9l_x2F-3AKqkg60=.dd260953-05cc-4b84-b6d1-7f684e74084c@github.com>
 <EZmv1oLssvikAF7lwHm77Q5kmfuKvoqXLhxGvSL1Mho=.c7db1b91-3536-48ad-90c6-c4c098094706@github.com>
 <rJb4N4TiJ4HhAMEFTVFn1e1qPpPfEwiBB6ezo7nM-cg=.fca1c659-150e-4d97-a47a-01bd836d867d@github.com>
 <7s8qppZ6lzq5iN-inRFkFuXgElo46UmYyIrvExOLA3A=.cf76da61-89ee-4d29-9b5a-0b6e7b3bac2b@github.com>
Message-ID: <jwNtnLYIGi7j2QdiTUrPGgi5NIbnxTTPlP3eoRkx_mI=.c24b74ae-fdbb-44a9-ac73-9d7fbe6e534b@github.com>

On Wed, 17 Sep 2025 19:38:29 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> Otherwise, please explain the meaning of `LoopOptsDefaultFinal`. Maybe it should be an OR here?
>
>> Why are you checking for _mode == LoopOptsDefaultFinal and not for LoopOptsEliminateRFs?
> 
> The intention is to avoid an extra `PhaseIdealLoop` construction pass solely for `LoopOptsEliminateRFs` purposes when there's an empty pass during normal flow of loop optimizations.   
> 
> `LoopOptsEliminateRFs` is performed as the last resort when there was no previous pass to piggyback on.

Maybe the name `LoopOptsEliminateRFs` should stress that it is intended to happen as the very last step in the flow of loop optimizations, i.e., after all other loop optimizations are over. I'll think more about it.

From a code perspective, what makes things more complicated is that the `PhaseIdealLoop` instance is hidden in `PhaseIdealLoop::optimize()`, so shaping it as a step in the loop opts pipeline feels like the most appropriate thing to do.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2356597895

From vlivanov at openjdk.org  Wed Sep 17 19:51:25 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 17 Sep 2025 19:51:25 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v8]
In-Reply-To: <4jTV6y9R_JfATA54LC7FK3DKdBX1srsU09DK1I25Uo0=.94233927-71f2-4f13-894d-206d00f5fdaa@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <LMu3QXiw3K_5-conH2FlBXyRWNzNDunp0L86Ubxeju8=.86d4010f-ab7b-42cf-a9c3-5c24c23d2a81@github.com>
 <LSNdJRy0deekqrysCXuqGFogGIkwJ5MJEcefFMskxwY=.a28eb64e-ee52-432c-a4cf-9504aaf4a2e1@github.com>
 <Xa1zqmkXNFFxwRU6q-sZIXomxJtIQLepUwcx5IuuE-c=.4affc0fb-1036-4635-9c41-ec76225d1f60@github.com>
 <Ci-CQQY-qs8vwCJmOqh2gmFHaULHRF1o9MXTu15rCJg=.becf246a-86a9-4adf-a2b6-ebfe27676347@github.com>
 <JKD0jqllmhfDeNwEd1g7LMAx8V4idsZFTXrS4C0KkUI=.47cf85da-9d62-4339-8bbd-80821a48ac32@github.com>
 <4jTV6y9R_JfATA54LC7FK3DKdBX1srsU09DK1I25Uo0=.94233927-71f2-4f13-894d-206d00f5fdaa@github.com>
Message-ID: <OPZ9TXu8K1JvTi4C_dvvtXyVMYyT-oLGpzq1lOhDBKI=.cc9d4827-dadb-49d2-9132-0127a81d1854@github.com>

On Fri, 12 Sep 2025 13:55:38 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Yes, maybe say what the general problem is, and make a concrete example. I'm currently a bit struggling to think of one that is relevant.
>
> Ah yes: we may for example move a store out of (after) the loop. But wait. We can't move a store across a SafePoint, so that's not a good example.

For example, loads suffer from the same problems as stores, but constraints on them are more lax.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2356613585

From vlivanov at openjdk.org  Wed Sep 17 19:56:19 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 17 Sep 2025 19:56:19 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v11]
In-Reply-To: <sM14v3wTzmIjccAGdJ19bgJ_w8O6ZfVTzCDAYIPtkh4=.4c158d93-6a29-4024-b5e4-413c6ed29481@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <1pShdyn-7-wwwiuY1DdMt5iiZ2qc9l_x2F-3AKqkg60=.dd260953-05cc-4b84-b6d1-7f684e74084c@github.com>
 <EZmv1oLssvikAF7lwHm77Q5kmfuKvoqXLhxGvSL1Mho=.c7db1b91-3536-48ad-90c6-c4c098094706@github.com>
 <sM14v3wTzmIjccAGdJ19bgJ_w8O6ZfVTzCDAYIPtkh4=.4c158d93-6a29-4024-b5e4-413c6ed29481@github.com>
Message-ID: <abW7rcqDPPGwpB4Z44AyiEI86Ortsd07DoofjbuFtSA=.3c9c3299-88fa-4727-be21-7daf27acabf2@github.com>

On Wed, 17 Sep 2025 01:06:50 GMT, Dean Long <dlong at openjdk.org> wrote:

>> src/hotspot/share/opto/reachability.cpp line 81:
>> 
>>> 79:  * (c) Unfortunately, it's not straightforward to stay with safepoint-attached representation till the very end,
>>> 80:  * because information about derived oops is attached to safepoints in a similar way. So, for now RFs are
>>> 81:  * rematerialized at safepoints before RA (phase #3).
>> 
>> I still don't understand this. What is similar to what? And why is that a problem?
>
> Why don't we put RF edges somewhere else, so they don't look like derived oops?  I was thinking they could go in the monitor area, or if that causes problems, we introduce a new area.

It's solely an implementation limitation. As of now, the only structure imposed on safepoint inputs relates to debug info (represented as JVMState). The rest is ad hoc, and many conflicting use cases have been introduced over time. The proper way to address it is to introduce a proper structure for non-debug inputs, but that requires significant engineering effort to handle correctly across the whole compilation pipeline. For now, I just work around it by performing an additional transformation to avoid conflicts with existing functionality.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2356629144

From vlivanov at openjdk.org  Wed Sep 17 20:22:20 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 17 Sep 2025 20:22:20 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v12]
In-Reply-To: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
Message-ID: <K5JfBeADFtX7k6BeO9nNs93KzCCB2SEVeH5B_8DrxxY=.22becd8e-d1f4-4372-ba34-6a2571ec2094@github.com>

> This PR introduces C2 support for `Reference.reachabilityFence()`.
> 
> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet that it is affected.
> 
> `Reference.reachabilityFence()` can be used in performance-critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through the compilation pipeline without negatively affecting generated code quality.
> 
> Also, I don't consider this fix complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is a CFG-only node, nothing explicitly forbids memory operations from floating past `Reference.reachabilityFence()` and potentially reaching other safepoints that the current analysis treats as non-interfering. Representing `ReachabilityFence` as a memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but the performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend the referent's live range beyond the `ReachabilityFence` nodes associated with it. That would meet the performance criteria, but I prefer to implement it as a follow-up fix.
> 
> Another known issue relates to reachability fences on constant oops. If such a constant is GCed (most likely due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag, `PreserveReachabilityFencesOnConstants`, to keep the fences. I plan to address it separately.
> 
> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667
> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints."
> 
> Testing:
> - [x] hs-tier1 - hs-tier8
> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations
> - [x] java/lang/foreign microbenchmarks
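
Background on the API: below is a minimal sketch of the pattern that `Reference.reachabilityFence()` exists for. It is modeled on the example in the `java.lang.ref.Reference` Javadoc and is not code from this PR. The following are illustrative assumptions:

- the `ReachabilityFenceExample` and `Resource` names;
- the `TABLE` array, which stands in for an external resource;
- the use of `java.lang.ref.Cleaner` for cleanup.

Once `slot` has been loaded, `read()` no longer needs `this`. Without the fence, a JIT compiler could therefore let the object become unreachable, and the cleanup action could clear the slot before it is read. This is the guarantee the C2 intrinsic is meant to preserve.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;

public class ReachabilityFenceExample {
    private static final Cleaner CLEANER = Cleaner.create();

    // Hypothetical stand-in for an external resource table (e.g. native handles).
    static final int[] TABLE = new int[16];

    static final class Resource {
        private final int slot;

        Resource(int slot, int value) {
            this.slot = slot;
            TABLE[slot] = value;
            // The cleanup action captures only the slot number, never `this`;
            // it runs some time after this Resource becomes phantom reachable.
            CLEANER.register(this, () -> TABLE[slot] = 0);
        }

        int read() {
            try {
                // After `slot` is loaded, nothing below uses `this`, so without the
                // fence the object could be considered unreachable here and the
                // cleanup action could clear TABLE[slot] before it is read.
                return TABLE[slot];
            } finally {
                // Keeps `this` strongly reachable at least until this point.
                Reference.reachabilityFence(this);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(new Resource(3, 42).read()); // prints 42
    }
}
```

The fence sits in a `finally` block, as in the Javadoc example, so `this` stays reachable even if the guarded code throws.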

Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:

  Add PreserveReachabilityFencesOnConstants test

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25315/files
  - new: https://git.openjdk.org/jdk/pull/25315/files/01eaf64f..dc37ccad

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=11
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=10-11

  Stats: 134 lines in 5 files changed: 130 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/25315.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315

PR: https://git.openjdk.org/jdk/pull/25315

From vlivanov at openjdk.org  Wed Sep 17 20:22:21 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 17 Sep 2025 20:22:21 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v2]
In-Reply-To: <IHy67LkRLB0zhka_u6d5__rQmA8iXUkObu5DzkKcgYI=.8ee80763-b01c-49a8-b4ea-fae2c83eb870@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <0WKwHjzEn5dxYLkonrk4h9yfMI3r3bKDdqgG06J69N4=.e19e9441-6197-4d53-a4f4-b196a81f69d8@github.com>
 <Ctg8ZdOt0UVSzfJ18f-0Aj1DCLSGsy3pLQnm8SzcpUg=.9eb433cd-2e53-4a12-a24f-1c493a1a7869@github.com>
 <sR_6Jh_j5oxAFBzXbSqwD2L3d74pP_-XnXM8RDWjUXA=.35937cef-adbe-418e-a0cd-3d13c4b145e5@github.com>
 <1FgOFS7aAlEbvVUez6iTfzgf2l7qUbL9C4wfSGmmfo0=.406c10f1-63d5-4333-af6d-525e46203182@github.com>
 <IHy67LkRLB0zhka_u6d5__rQmA8iXUkObu5DzkKcgYI=.8ee80763-b01c-49a8-b4ea-fae2c83eb870@github.com>
Message-ID: <hv6RizPbvAf1T_0KRAb0BM_28cTriU-GaeRVtKLAhIM=.1c50fad0-836f-4c51-9a77-5e2194a440fe@github.com>

On Mon, 15 Sep 2025 22:57:51 GMT, Dean Long <dlong at openjdk.org> wrote:

>> @eme64 I think I addressed/answered all your suggestions/questions. Please, take another look. Thanks!
>
> @iwanowww , do you have a test that shows constant oops are a problem?  My initial impression is that PreserveReachabilityFencesOnConstants shouldn't be needed, because any oops referenced during the compile should go into the ciEnv metadata[] and then into the nmethod oops.  So GC can't reclaim these oops because the nmethod keeps references to them.

@dean-long 

> because any oops referenced during the compile should go into the ciEnv metadata[] and then into the nmethod oop

That's not how it behaves in practice. OOPs observed during compilation don't necessarily end up in nmethod metadata unless there are explicit usages.

> do you have a test that shows constant oops are a problem?

I do. Just pushed one example as `test/hotspot/jtreg/compiler/c2/TestReachabilityFenceOnConstant.java`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25315#issuecomment-3304444302

From vlivanov at openjdk.org  Wed Sep 17 20:31:11 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 17 Sep 2025 20:31:11 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v2]
In-Reply-To: <mgdTRnlYCZYfqFFzzwlsnCv5u4rQQsrVMw5sC7AmdO0=.ac414de9-0972-4efc-accc-e4202ae16797@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <0WKwHjzEn5dxYLkonrk4h9yfMI3r3bKDdqgG06J69N4=.e19e9441-6197-4d53-a4f4-b196a81f69d8@github.com>
 <Ctg8ZdOt0UVSzfJ18f-0Aj1DCLSGsy3pLQnm8SzcpUg=.9eb433cd-2e53-4a12-a24f-1c493a1a7869@github.com>
 <sR_6Jh_j5oxAFBzXbSqwD2L3d74pP_-XnXM8RDWjUXA=.35937cef-adbe-418e-a0cd-3d13c4b145e5@github.com>
 <1FgOFS7aAlEbvVUez6iTfzgf2l7qUbL9C4wfSGmmfo0=.406c10f1-63d5-4333-af6d-525e46203182@github.com>
 <mgdTRnlYCZYfqFFzzwlsnCv5u4rQQsrVMw5sC7AmdO0=.ac414de9-0972-4efc-accc-e4202ae16797@github.com>
Message-ID: <tH_KF4QLs4NRWc5EP4h9vH8FTggUKi3x6ibsoEu_uIo=.33688a4c-90bb-4b5a-b5f8-c7d8d77b0e9f@github.com>

On Fri, 12 Sep 2025 14:09:52 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> @eme64 I think I addressed/answered all your suggestions/questions. Please, take another look. Thanks!
>
> @iwanowww Thanks for the updates! I again only looked through most comments as well.
> 
> These are the major topics for me:
> - `StressReachabilityFences` only inserts RF where they are not needed. So this allows us to test the consistency of the RF machinery, but not to test if we are missing RF where they are needed. That is much harder, and we should probably invest in writing more tests for those cases, even if it is really hard. Maybe we can even write fuzzing tests for it?
> - There seems to be missing support for carrying RF edges through incremental inlining, right? File an RFE, or track it elsewhere. Could we create a reproducer for this case / can we extend the existing one? https://github.com/openjdk/jdk/pull/25315#discussion_r2330095168
> - Are we sure that we don't eliminate the RF for the wrong allocation? https://github.com/openjdk/jdk/pull/25315#discussion_r2330230044
> - Extra compile-time due to extra loop-opts round. https://github.com/openjdk/jdk/pull/25315#discussion_r2330176841 . It used to be a 20% increase, now you managed to make it only 10%. Still considerable. All of it just to call `get_ctrl(referent)` in `enumerate_interfering_sfpts`.
> 
> I think some of these issues should also be discussed in the PR description / JIRA description.
> It would be especially nice if you could summarize the scope of the problem of RF, and which parts are now fixed, and which parts you know are not yet fixed. Of course there may be even more we don't know, but best write everything down we already do know. ;)
> 
> Other ideas:
> - You should file an RFE to add your stress flags to the stress job, and also the fuzzer.
> - I did not yet study the reproducer `TestReachabilityFence.java`. We should consider making a fuzzer style test out of it, maybe using the template framework. Feel free to just file an RFE for that, and assign it to me.
> 
> @shipilev @TobiHartmann @chhagedorn 
> I'm soon going on vacation (in a week), and so I'd like the other reviewers to be aware of these issues.
> I don't want to hold up the patch, so feel free to have someone else review. But I'm also happy to come back to this mid-October.

@eme64 

> There seems to be missing support for carrying RF edges through incremental inlining, right? File an RFE, or track it elsewhere. Could we create a reproducer for this case / can we extend the existing one? https://github.com/openjdk/jdk/pull/25315#discussion_r2330095168

There's no problem there. Safepoint-attached reachability edges are introduced when no inlining is allowed. 
(There's one case when virtual calls can be strength-reduced to direct calls very late -- `Compile::process_late_inline_calls_no_inline()`, but such transformation is simply disabled for now when reachability edges are present.)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25315#issuecomment-3304467597

From vlivanov at openjdk.org  Wed Sep 17 21:32:06 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 17 Sep 2025 21:32:06 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v2]
In-Reply-To: <mgdTRnlYCZYfqFFzzwlsnCv5u4rQQsrVMw5sC7AmdO0=.ac414de9-0972-4efc-accc-e4202ae16797@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <0WKwHjzEn5dxYLkonrk4h9yfMI3r3bKDdqgG06J69N4=.e19e9441-6197-4d53-a4f4-b196a81f69d8@github.com>
 <Ctg8ZdOt0UVSzfJ18f-0Aj1DCLSGsy3pLQnm8SzcpUg=.9eb433cd-2e53-4a12-a24f-1c493a1a7869@github.com>
 <sR_6Jh_j5oxAFBzXbSqwD2L3d74pP_-XnXM8RDWjUXA=.35937cef-adbe-418e-a0cd-3d13c4b145e5@github.com>
 <1FgOFS7aAlEbvVUez6iTfzgf2l7qUbL9C4wfSGmmfo0=.406c10f1-63d5-4333-af6d-525e46203182@github.com>
 <mgdTRnlYCZYfqFFzzwlsnCv5u4rQQsrVMw5sC7AmdO0=.ac414de9-0972-4efc-accc-e4202ae16797@github.com>
Message-ID: <ElNmjqH-DLIN86dLkcRngrZC-vqY7V5fTdV8g9nHKI4=.f7360c5c-01c4-45d8-bb67-ddb2ce3c8636@github.com>

On Fri, 12 Sep 2025 14:09:52 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> Extra compile-time due to extra loop-opts round. https://github.com/openjdk/jdk/pull/25315#discussion_r2330176841 . It used to be a 20% increase, now you managed to make it only 10%. Still considerable. 

FTR, the 10% increase in loop opts time is observed with `-XX:+StressReachabilityFences`.

> All of it just to call get_ctrl(referent) in enumerate_interfering_sfpts.

Well, I wouldn't frame it in such a way. RF elimination transformation relies on dominance information computed by `PhaseIdealLoop` to produce control input for each referent. And there are many other transformations under `PhaseIdealLoop` which "just" rely on dominance info it produces.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25315#issuecomment-3304621503

From vlivanov at openjdk.org  Wed Sep 17 21:39:45 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 17 Sep 2025 21:39:45 GMT
Subject: RFR: 8367333: C2: Vector math operation intrinsification failure
 [v2]
In-Reply-To: <OQOC6AdwvpZrL5teIwHml6xnayliLgsY5VjI87p5XNs=.cee478ee-56f8-4b6b-bb68-0a5c4ea24df7@github.com>
References: <qpB4c9uXgeU4UbE9Zm_Pb-52QHm203ICO6IdT1w_bGQ=.be2804be-add6-4157-a335-dea1e0920e08@github.com>
 <OQOC6AdwvpZrL5teIwHml6xnayliLgsY5VjI87p5XNs=.cee478ee-56f8-4b6b-bb68-0a5c4ea24df7@github.com>
Message-ID: <XI7vKKKSSx-o-wzEKejgUP6mhLds7bdLmQCHmgcRfnQ=.661e2448-1565-4e9a-92c6-f2235bf438d5@github.com>

On Tue, 16 Sep 2025 20:09:18 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> As part of [JDK-8353786](https://bugs.openjdk.org/browse/JDK-8353786), C2 support for operations backed by the vector math library was completely removed. On the JDK side, special dispatching logic was added to avoid intrinsic calls in `jdk.internal.vm.vector.VectorSupport`. But it's still possible to observe such paradoxical situations (intrinsic calls with obsolete operation IDs) when processing effectively dead code.
>> 
>> Consider `FloatVector::lanewiseTemplate`:
>> 
>>     FloatVector lanewiseTemplate(VectorOperators.Unary op) {
>>         if (opKind(op, VO_SPECIAL)) {
>>             ...                             
>>             else if (opKind(op, VO_MATHLIB)) {
>>                 return unaryMathOp(op);
>>             }
>>         }
>>         int opc = opCode(op);
>>         return VectorSupport.unaryOp(opc, ...);
>>     }
>> 
>> 
>> At runtime, `unaryMathOp` is unconditionally invoked, but during compilation it's possible to end up with an intrinsification attempt of `VectorSupport.unaryOp()` before `opKind(op, VO_SPECIAL)` is inlined. 
>> 
>> It can be reliably reproduced with the `-XX:+StressIncrementalInlining` flag.
>> 
>> The fix is to make intrinsification fail fast rather than crash the VM.
>> 
>> Testing: tier1 - tier4
>
> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   review feedback

Thanks for the reviews, Aleksey, Jatin, and Emanuel.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27263#issuecomment-3304631783

From vlivanov at openjdk.org  Wed Sep 17 21:39:46 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 17 Sep 2025 21:39:46 GMT
Subject: RFR: 8367333: C2: Vector math operation intrinsification failure
 [v2]
In-Reply-To: <k8fR7jU8NQKCVvJ3epN1Me5Yvp19D60-HDCghKqoxsU=.3b6d27c7-b155-4325-8b8c-a5d624eee69a@github.com>
References: <qpB4c9uXgeU4UbE9Zm_Pb-52QHm203ICO6IdT1w_bGQ=.be2804be-add6-4157-a335-dea1e0920e08@github.com>
 <sZvkxuXnOiN1VKfG92NEmZl2f4g0xdTBwF62_lhWlZg=.c5c434d8-00fa-4b26-a127-dad00aea9fe6@github.com>
 <3Cy6jhWxbaQeWwo22L9nxPnipY1-vHsGZEtk8IZUiq8=.bfefdef7-0137-422b-a7b0-e4fae2a5b282@github.com>
 <WlNycrFJxUGh9p1sE-I-cTwAuidgnPGUQl-98GxqQG0=.4419f936-35e7-4b64-aa50-831247e8390c@github.com>
 <k8fR7jU8NQKCVvJ3epN1Me5Yvp19D60-HDCghKqoxsU=.3b6d27c7-b155-4325-8b8c-a5d624eee69a@github.com>
Message-ID: <W_qfHU8ykEgV4o39JC6xPsCKQfn0W7MEXAjgfPWg7OE=.12aedba9-633e-4290-b835-e7915ccde3b3@github.com>

On Wed, 17 Sep 2025 06:08:40 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> Also: why not just add the extra run over at the original test?

`test/jdk/jdk/incubator/vector/*VectorTests.java` are huge and already override the default timeout setting. But running them with `-XX:+StressIncrementalInlining` does make some sense (maybe not by default, but as part of one of the stress testing configurations we have).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27263#discussion_r2356845600

From vlivanov at openjdk.org  Wed Sep 17 21:39:48 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 17 Sep 2025 21:39:48 GMT
Subject: Integrated: 8367333: C2: Vector math operation intrinsification
 failure
In-Reply-To: <qpB4c9uXgeU4UbE9Zm_Pb-52QHm203ICO6IdT1w_bGQ=.be2804be-add6-4157-a335-dea1e0920e08@github.com>
References: <qpB4c9uXgeU4UbE9Zm_Pb-52QHm203ICO6IdT1w_bGQ=.be2804be-add6-4157-a335-dea1e0920e08@github.com>
Message-ID: <ZJd7n4_DuHkWqNeBkdL2Hccea_is0CWsUTWkkeAp254=.6ed11247-414e-42ce-9b27-f058663e9508@github.com>

On Fri, 12 Sep 2025 19:14:18 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

> As part of [JDK-8353786](https://bugs.openjdk.org/browse/JDK-8353786), C2 support for operations backed by the vector math library was completely removed. On the JDK side, special dispatching logic was added to avoid intrinsic calls in `jdk.internal.vm.vector.VectorSupport`. But it's still possible to observe such paradoxical situations (intrinsic calls with obsolete operation IDs) when processing effectively dead code.
> 
> Consider `FloatVector::lanewiseTemplate`:
> 
>     FloatVector lanewiseTemplate(VectorOperators.Unary op) {
>         if (opKind(op, VO_SPECIAL)) {
>             ...                             
>             else if (opKind(op, VO_MATHLIB)) {
>                 return unaryMathOp(op);
>             }
>         }
>         int opc = opCode(op);
>         return VectorSupport.unaryOp(opc, ...);
>     }
> 
> 
> At runtime, `unaryMathOp` is unconditionally invoked, but during compilation it's possible to end up with an intrinsification attempt of `VectorSupport.unaryOp()` before `opKind(op, VO_SPECIAL)` is inlined. 
> 
> It can be reliably reproduced with the `-XX:+StressIncrementalInlining` flag.
> 
> The fix is to make intrinsification fail fast rather than crash the VM.
> 
> Testing: tier1 - tier4
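
Below is a toy Java model of the failure mode and the fix. `Op`, `isMathLib` and `expandUnaryIntrinsic` are hypothetical names, not HotSpot or `VectorSupport` code. The model shows only one thing. An intrinsic expander can be reached from code whose guard has not yet been inlined and folded. For an operation ID it no longer supports, it should report "not intrinsified" instead of asserting.

```java
import java.util.ArrayList;
import java.util.List;

public class FailFastIntrinsicModel {
    // Hypothetical operation IDs; SIN stands for an op backed by the vector math library.
    enum Op { NEG, ABS, SIN }

    static boolean isMathLib(Op op) {
        return op == Op.SIN;
    }

    // Hypothetical intrinsic expander. For an operation it no longer supports it
    // returns false, leaving the original call in place, instead of asserting.
    static boolean expandUnaryIntrinsic(Op op, List<String> log) {
        if (isMathLib(op)) {
            log.add("bail out: " + op + " not supported");
            return false;
        }
        log.add("intrinsified " + op);
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        // Models the compiler visiting the intrinsic call for every op ID, including
        // ones that a not-yet-inlined guard would route elsewhere at run time.
        for (Op op : Op.values()) {
            expandUnaryIntrinsic(op, log);
        }
        // prints [intrinsified NEG, intrinsified ABS, bail out: SIN not supported]
        System.out.println(log);
    }
}
```

Returning `false` follows the usual C2 library-intrinsic convention that a failed expansion leaves an ordinary call behind. Here that call sits on a path that is dead at run time anyway.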

This pull request has now been integrated.

Changeset: aa36799a
Author:    Vladimir Ivanov <vlivanov at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/aa36799acb5834d730400fb073a9a3a8ee3c28ef
Stats:     167 lines in 3 files changed: 167 ins; 0 del; 0 mod

8367333: C2: Vector math operation intrinsification failure

Reviewed-by: epeter, shade, jbhateja

-------------

PR: https://git.openjdk.org/jdk/pull/27263

From vlivanov at openjdk.org  Wed Sep 17 21:44:29 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 17 Sep 2025 21:44:29 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v2]
In-Reply-To: <mgdTRnlYCZYfqFFzzwlsnCv5u4rQQsrVMw5sC7AmdO0=.ac414de9-0972-4efc-accc-e4202ae16797@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <0WKwHjzEn5dxYLkonrk4h9yfMI3r3bKDdqgG06J69N4=.e19e9441-6197-4d53-a4f4-b196a81f69d8@github.com>
 <Ctg8ZdOt0UVSzfJ18f-0Aj1DCLSGsy3pLQnm8SzcpUg=.9eb433cd-2e53-4a12-a24f-1c493a1a7869@github.com>
 <sR_6Jh_j5oxAFBzXbSqwD2L3d74pP_-XnXM8RDWjUXA=.35937cef-adbe-418e-a0cd-3d13c4b145e5@github.com>
 <1FgOFS7aAlEbvVUez6iTfzgf2l7qUbL9C4wfSGmmfo0=.406c10f1-63d5-4333-af6d-525e46203182@github.com>
 <mgdTRnlYCZYfqFFzzwlsnCv5u4rQQsrVMw5sC7AmdO0=.ac414de9-0972-4efc-accc-e4202ae16797@github.com>
Message-ID: <VEI1WMrCDV49ynHCV2_5W4Z7vgTPuMjjH0nJMdlL_8c=.a912cc00-4751-46f2-9406-d1233a3cbf96@github.com>

On Fri, 12 Sep 2025 14:09:52 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> StressReachabilityFences only inserts RF where they are not needed. So this allows us to test the consistency of the RF machinery, but not to test if we are missing RF where they are needed. That is much harder, and we should probably invest in writing more tests for those cases, even if it is really hard. Maybe we can even write fuzzing tests for it?

That's a fair point. I'll think more about ways to automatically test RF invariants in positive/negative ways and file RFEs.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25315#issuecomment-3304649404

From vlivanov at openjdk.org  Wed Sep 17 22:29:26 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 17 Sep 2025 22:29:26 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v8]
In-Reply-To: <hRJMVwoNPY2xD1ntsuHYrmG283r7RCyAcTZZ6USWe4A=.25f49491-163e-44f8-946c-e157f8837250@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <LMu3QXiw3K_5-conH2FlBXyRWNzNDunp0L86Ubxeju8=.86d4010f-ab7b-42cf-a9c3-5c24c23d2a81@github.com>
 <LSNdJRy0deekqrysCXuqGFogGIkwJ5MJEcefFMskxwY=.a28eb64e-ee52-432c-a4cf-9504aaf4a2e1@github.com>
 <_n3uP_Dkl3RNq3MFoRDXsS28SM8CcQHaR6vdUJF9U8s=.dcfab97b-be28-4244-93df-c8a23d6d66b8@github.com>
 <IcKeEoxM236ICcdLmdt-k_K1peNrxOnlpCLZa_3H4eA=.648697cb-baad-47f1-96d6-f8b13d9b69db@github.com>
 <hRJMVwoNPY2xD1ntsuHYrmG283r7RCyAcTZZ6USWe4A=.25f49491-163e-44f8-946c-e157f8837250@github.com>
Message-ID: <xP4A6eKQTNQXipJZg6T_seJd34FLciuhATeG7nFMatw=.61c8e108-23c3-4898-9ec5-94bc86c6157d@github.com>

On Fri, 12 Sep 2025 13:18:33 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>>> Is this rf guaranteed to belong to the Allocation somehow?
>> 
>> I don't get your question. The code iterates over users of an allocation which is being eliminated.  Semantically, RF is a no-op on a scalarizable referent and has to be removed in order to let the scalarization happen.
>> 
>>> Ah, you could mention that later ReachabilityFenceNode::Identity removes the rf.
>> 
>> Done.
>
> @iwanowww

The code in `PhaseMacroExpand::process_users_of_allocation` iterates over the direct users of the result cast of an Allocation node, and RF is not special there. Any other case in `PhaseMacroExpand::process_users_of_allocation()` would be affected in the same way.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2356922289

From dlong at openjdk.org  Wed Sep 17 23:27:06 2025
From: dlong at openjdk.org (Dean Long)
Date: Wed, 17 Sep 2025 23:27:06 GMT
Subject: RFR: 8357258: x86: Improve receiver type profiling reliability
 [v2]
In-Reply-To: <xpPUX9kLrXOkdpnXBq9YxRg7Xqmqtxanr9exEfPPn-I=.39db801b-1227-4d8e-8103-1317fb914731@github.com>
References: <X9RJoDWqk67MlK7P4ACCbzROxLkfEqvU0zDid3zymnc=.7c0086a3-f514-4b0d-8bc0-592bf4a05bba@github.com>
 <xpPUX9kLrXOkdpnXBq9YxRg7Xqmqtxanr9exEfPPn-I=.39db801b-1227-4d8e-8103-1317fb914731@github.com>
Message-ID: <VIQ-aNGEAnxFxJBZsGBJsnbK0ZlPWd-BAhpirU1HjMM=.dc6c04f9-53f2-4eec-bc78-8d61fdcddc43@github.com>

On Mon, 15 Sep 2025 14:27:28 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> See the bug for a discussion of the issues the current machinery has.
>> 
>> This PR executes the plan outlined in the bug:
>>  1. Common the receiver type profiling code in interpreter and C1
>>  2. Rewrite receiver type profiling code to only do atomic receiver slot installations
>>  3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed 
>> 
>> This PR does _not_ make the counter updates themselves atomic, as that may have much wider performance implications, including regressions. This PR should be at least performance-neutral.
>> 
>> Additional testing:
>>   - [x] Linux x86_64 server fastdebug, `compiler/`
>>   - [ ] Linux x86_64 server fastdebug, `all`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls
>  - Drop atomic counters
>  - Initial version
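
Plan item 2 ("only do atomic receiver slot installations") and the note that the counters stay non-atomic can be pictured with a small Java model. This is not the x86 assembly from the PR. `ReceiverTypeProfile`, `record`, `count` and the separate polymorphic counter are hypothetical. They loosely mirror a receiver type profile with a fixed number of (receiver, count) rows.

- Installing a receiver into a free row uses a CAS, so two threads cannot both claim the same row for different types.
- The per-row increments stay plain and can lose updates under contention.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class ReceiverTypeProfile {
    // Receiver rows are claimed atomically; in this model a row is never cleared,
    // so rows fill up in order.
    private final AtomicReferenceArray<Class<?>> receivers;
    // Per-row counters use plain (non-atomic) increments and may lose updates.
    private final long[] counts;
    // Calls whose receiver type found neither a matching nor a free row.
    private long polymorphicCount;

    ReceiverTypeProfile(int rows) {
        receivers = new AtomicReferenceArray<>(rows);
        counts = new long[rows];
    }

    void record(Class<?> klass) {
        for (int i = 0; i < counts.length; i++) {
            Class<?> current = receivers.get(i);
            if (current == null) {
                // Try to install the receiver into the first free row.
                if (receivers.compareAndSet(i, null, klass)) {
                    counts[i]++;
                    return;
                }
                // Another thread won the race; see what it installed.
                current = receivers.get(i);
            }
            if (current == klass) {
                counts[i]++;
                return;
            }
        }
        polymorphicCount++;
    }

    long count(Class<?> klass) {
        for (int i = 0; i < counts.length; i++) {
            if (receivers.get(i) == klass) {
                return counts[i];
            }
        }
        return 0;
    }

    long polymorphicCount() {
        return polymorphicCount;
    }

    public static void main(String[] args) {
        ReceiverTypeProfile profile = new ReceiverTypeProfile(2);
        profile.record(String.class);
        profile.record(Integer.class);
        profile.record(String.class);
        profile.record(Long.class); // both rows already hold other types
        // prints "2 1 1"
        System.out.println(profile.count(String.class) + " " + profile.count(Integer.class) + " " + profile.polymorphicCount());
    }
}
```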

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4853:

> 4851:       } else {
> 4852:         // Nothing to do, just go with defaults.
> 4853:         assert_different_registers(rax, mdp, recv, offset);

Can't we do all register shuffling and push/pop outside the loop?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2356988910

From dlong at openjdk.org  Wed Sep 17 23:42:41 2025
From: dlong at openjdk.org (Dean Long)
Date: Wed, 17 Sep 2025 23:42:41 GMT
Subject: RFR: 8357258: x86: Improve receiver type profiling reliability
 [v2]
In-Reply-To: <xpPUX9kLrXOkdpnXBq9YxRg7Xqmqtxanr9exEfPPn-I=.39db801b-1227-4d8e-8103-1317fb914731@github.com>
References: <X9RJoDWqk67MlK7P4ACCbzROxLkfEqvU0zDid3zymnc=.7c0086a3-f514-4b0d-8bc0-592bf4a05bba@github.com>
 <xpPUX9kLrXOkdpnXBq9YxRg7Xqmqtxanr9exEfPPn-I=.39db801b-1227-4d8e-8103-1317fb914731@github.com>
Message-ID: <oNoxJOpfphpVIrQxryFIDOeRjhdBGb8GGpskNXExN1k=.69cd182f-c989-4431-a902-9c89ae136dac@github.com>

On Mon, 15 Sep 2025 14:27:28 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> See the bug for a discussion of the issues the current machinery has.
>> 
>> This PR executes the plan outlined in the bug:
>>  1. Common the receiver type profiling code in interpreter and C1
>>  2. Rewrite receiver type profiling code to only do atomic receiver slot installations
>>  3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed 
>> 
>> This PR does _not_ make the counter updates themselves atomic, as that may have much wider performance implications, including regressions. This PR should be at least performance-neutral.
>> 
>> Additional testing:
>>   - [x] Linux x86_64 server fastdebug, `compiler/`
>>   - [ ] Linux x86_64 server fastdebug, `all`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls
>  - Drop atomic counters
>  - Initial version

src/hotspot/cpu/x86/interp_masm_x86.cpp line 1342:

> 1340: 
> 1341:     // Record the receiver type.
> 1342:     type_profile(receiver, mdp, 0);

Why is 0 the correct offset?  The C1 helper uses md->byte_offset_of_slot().

src/hotspot/cpu/x86/interp_masm_x86.cpp line 1553:

> 1551: 
> 1552:       // Record the object type.
> 1553:       record_klass_in_profile(klass, mdp, reg2, false);

Same question as above about the 0 offset.  Is this because `mdp` has already been adjusted?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2357007843
PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2357010342

From dfenacci at openjdk.org  Thu Sep 18 06:27:45 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Thu, 18 Sep 2025 06:27:45 GMT
Subject: RFR: 8367278: Test compiler/startup/StartupOutput.java timed out
 after completion on Windows
In-Reply-To: <a-3PpOX7mJnmJPkcYL2iiyGlbodmYhPOGgxrkT3Fo90=.2794618f-aded-4966-9afc-b924451c08dd@github.com>
References: <AJYaHORLYxUCNMALa2LNo-21UBiP07SW97l4gwFZ9N0=.ab1bb72d-4db8-4472-921d-4d24356fc58c@github.com>
 <a-3PpOX7mJnmJPkcYL2iiyGlbodmYhPOGgxrkT3Fo90=.2794618f-aded-4966-9afc-b924451c08dd@github.com>
Message-ID: <5E1DUrHS_zkhw6H1ivQak0rhqtxfEivrwhJkkpf2swE=.6dac4e41-4ab4-4fc8-bf89-7af81d78a0b5@github.com>

On Wed, 17 Sep 2025 13:10:33 GMT, SendaoYan <syan at openjdk.org> wrote:

>> ## Problem
>> After [JDK-8260555](https://bugs.openjdk.org/browse/JDK-8260555) changed the default TIMEOUT_FACTOR from 4 to 1, the test compiler/startup/StartupOutput.java can occasionally slightly exceed the 2-minute timeout on Windows.
>> 
>> ## Change
>> Rather than increasing the timeout, this change reduces the number of VM runs with randomly generated near-minimum code cache sizes from 200 to 50. This should still provide sufficient coverage while keeping execution well within the timeout.
>> 
>> ## Testing:
>> Tiers 1-3+
>
> GHA shows GetStackTraceALotWhenPinned.java timed out on macOS. The failure has been fixed by [JDK-8366893](https://bugs.openjdk.org/browse/JDK-8366893). I think you can merge master first.

Thanks for your reviews @sendaoYan @chhagedorn.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27254#issuecomment-3305596171

From dfenacci at openjdk.org  Thu Sep 18 06:27:46 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Thu, 18 Sep 2025 06:27:46 GMT
Subject: Integrated: 8367278: Test compiler/startup/StartupOutput.java timed
 out after completion on Windows
In-Reply-To: <AJYaHORLYxUCNMALa2LNo-21UBiP07SW97l4gwFZ9N0=.ab1bb72d-4db8-4472-921d-4d24356fc58c@github.com>
References: <AJYaHORLYxUCNMALa2LNo-21UBiP07SW97l4gwFZ9N0=.ab1bb72d-4db8-4472-921d-4d24356fc58c@github.com>
Message-ID: <vAY5wk78eY3kqxfTF1pEr4zE-o_oXppCS__CQUcbpFs=.14e93cf0-7b6e-441c-877a-c8600d2ba993@github.com>

On Fri, 12 Sep 2025 09:56:24 GMT, Damon Fenacci <dfenacci at openjdk.org> wrote:

> ## Problem
> After [JDK-8260555](https://bugs.openjdk.org/browse/JDK-8260555) changed the default TIMEOUT_FACTOR from 4 to 1, the test compiler/startup/StartupOutput.java can occasionally slightly exceed the 2-minute timeout on Windows.
> 
> ## Change
> Rather than increasing the timeout, this change reduces the number of VM runs with randomly generated near-minimum code cache sizes from 200 to 50. This should still provide sufficient coverage while keeping execution well within the timeout.
> 
> ## Testing:
> Tiers 1-3+

This pull request has now been integrated.

Changeset: a355edbb
Author:    Damon Fenacci <dfenacci at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/a355edbbe43f7356f9439ecabf0ab8218fc9e3e1
Stats:     1 line in 1 file changed: 0 ins; 0 del; 1 mod

8367278: Test compiler/startup/StartupOutput.java timed out after completion on Windows

Reviewed-by: syan, chagedorn

-------------

PR: https://git.openjdk.org/jdk/pull/27254

From epeter at openjdk.org  Thu Sep 18 06:40:12 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 06:40:12 GMT
Subject: RFR: 8367389: C2 SuperWord: refactor VTransform to model the whole
 loop instead of just the basic block [v3]
In-Reply-To: <qNs-pEYa6BR200XijY4JB8-DzdxjQPVR28zefXhEFNo=.89143141-f462-45e1-a1cb-cbe66dfcaf5d@github.com>
References: <qNs-pEYa6BR200XijY4JB8-DzdxjQPVR28zefXhEFNo=.89143141-f462-45e1-a1cb-cbe66dfcaf5d@github.com>
Message-ID: <ptEL3aeVTmddBQF2sIuu01j37I-iIwhEneFbEh6yZUU=.d132275c-1f67-4568-88e1-1cf580558134@github.com>

> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
> https://github.com/openjdk/jdk/pull/20964
> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
> 
> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
> 
> ------------------------------
> 
> **Goals**
> - VTransform models **all nodes in the loop**, not just the basic block (enables later VTransform::optimize, like moving reductions out of the loop)
> - Remove `_nodes` from the vector vtnodes.
> 
> **Details**
> - Remove: `AUTO_VECTORIZATION2_AFTER_REORDER`, `apply_memops_reordering_with_schedule`, `print_memops_schedule`.
>   - Instead of reordering the scalar memops, we create the new memory graph during `VTransform::apply`. That is why the `VTransformApplyState` now needs to track the memory states.
> - Refactor `VLoopMemorySlices`: map not just memory slices with phis (have stores in loop), but also those with only loads (no phi).
> - Create vtnodes for all nodes in the loop (not just the basic block), as well as inputs (already) and outputs (new). Mapping also the output nodes means during `apply`, we naturally connect the uses after the loop to their inputs from the loop (which may be new nodes after the transformation).
> - `_mem_ref_for_main_loop_alignment` -> `_vpointer_for_main_loop_alignment`. Instead of tracking the memory node to later have access to its `VPointer`, we take it directly. That removes one more use of `_nodes` for vector vtnodes.
> 
> I also made a lot of annotations in the code below, for easier review.
> 
> **Suggested order for review**
> - Removal of `VTransformGraph::apply_memops_reordering_with_schedule` -> sets up need to build memory graph on the fly.
> - Old and new code for `VLoopMemorySlices` -> we now also track load-only slices.
> - `build_scalar_vtnodes_for_non_packed_nodes`, `build_inputs_for_scalar_vtnodes`, `build_uses_after_loop`, `apply_vtn_inputs_to_node` (use in `apply`), `apply_backedge`, `fix_memory_state_uses_after_loop`
> - `VTransformApplyState`: how it now tracks the memory state.
> - `VTransformVectorNode` -> removal of `_nodes` (Big Win!)
> - Then look at all the other details.
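Below is a rough illustration of the `VLoopMemorySlices` change. It is a toy Java model, not HotSpot code, and the `MemOp` record and `classifySlices` method are hypothetical. The model maps every memory slice touched in the loop. Only slices that contain a store get a memory phi. Load-only slices are recorded without one.

```java
import java.util.*;

public class MemorySlicesSketch {
    // Hypothetical stand-in for a memory node in the loop body:
    // aliasIdx identifies the memory slice, isStore distinguishes stores from loads.
    record MemOp(int aliasIdx, boolean isStore) {}

    enum SliceKind { WITH_PHI, LOAD_ONLY }

    // Map every slice touched in the loop. A slice with at least one store
    // needs a memory phi; a slice with only loads is still tracked, without a phi.
    static SortedMap<Integer, SliceKind> classifySlices(List<MemOp> ops) {
        SortedMap<Integer, SliceKind> slices = new TreeMap<>();
        for (MemOp op : ops) {
            if (op.isStore()) {
                slices.put(op.aliasIdx(), SliceKind.WITH_PHI);
            } else {
                slices.putIfAbsent(op.aliasIdx(), SliceKind.LOAD_ONLY);
            }
        }
        return slices;
    }

    public static void main(String[] args) {
        List<MemOp> loop = List.of(
            new MemOp(4, false), new MemOp(4, true),  // slice 4: load + store
            new MemOp(5, false));                     // slice 5: load only
        System.out.println(classifySlices(loop));     // {4=WITH_PHI, 5=LOAD_ONLY}
    }
}
```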

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  Update src/hotspot/share/opto/vectorization.cpp
  
  Co-authored-by: Manuel Hässig <manuel at haessig.org>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27208/files
  - new: https://git.openjdk.org/jdk/pull/27208/files/469426a7..9af66755

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27208&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27208&range=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27208.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27208/head:pull/27208

PR: https://git.openjdk.org/jdk/pull/27208

From epeter at openjdk.org  Thu Sep 18 06:40:14 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 06:40:14 GMT
Subject: RFR: 8367389: C2 SuperWord: refactor VTransform to model the whole
 loop instead of just the basic block [v2]
In-Reply-To: <5zLWoCC7_s5VBF435fL1hk_m9vsk5JQrdZ1tEipatFo=.bc502b75-7074-4923-8dce-d367eb1b71af@github.com>
References: <qNs-pEYa6BR200XijY4JB8-DzdxjQPVR28zefXhEFNo=.89143141-f462-45e1-a1cb-cbe66dfcaf5d@github.com>
 <i0uwzmuCcxXGG4d4-ronPIB3wETW3sHL6M5yRuHkfA4=.45c69c45-84a0-4234-96f6-eabe01b42a69@github.com>
 <5zLWoCC7_s5VBF435fL1hk_m9vsk5JQrdZ1tEipatFo=.bc502b75-7074-4923-8dce-d367eb1b71af@github.com>
Message-ID: <wtGzUcrELon9wS07VoGcdEVScIqNnkcgfbrOw4yAi3c=.9fdb241f-c1bc-4ad8-bb56-104f89aeed9b@github.com>

On Wed, 17 Sep 2025 14:37:00 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   for Manuel
>
> Thank you for addressing my comments and answering my question. Bar the new typo, this looks good to me.

@mhaessig Thanks a lot for the review, suggestions and approval :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27208#issuecomment-3305626314

From epeter at openjdk.org  Thu Sep 18 06:48:36 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 06:48:36 GMT
Subject: RFR: 8359412: Template-Framework Library: Operations and
 Expressions
In-Reply-To: <4YVAopGtxnlkh39pp0TaW4kNpBuSIXfbz40UDW_We1w=.308dd279-514f-4fa4-b361-ab36f165caf6@github.com>
References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com>
 <4YVAopGtxnlkh39pp0TaW4kNpBuSIXfbz40UDW_We1w=.308dd279-514f-4fa4-b361-ab36f165caf6@github.com>
Message-ID: <pbGSvpnJxpBh7PcbTlBBTXaXx_voSOUGcCS-NB0M1uQ=.8da3d926-e9f6-4dd2-8b34-27728f8d67f9@github.com>

On Wed, 17 Sep 2025 11:10:09 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> Implementing ideas from original draft PR: https://github.com/openjdk/jdk/pull/23418 ([Exceptions](https://github.com/openjdk/jdk/pull/23418/files#diff-77e7db8cc0c5e02786e1c993362f98fabe219042eb342fdaffc09fd11380259dR41), [ExpressionFuzzer](https://github.com/openjdk/jdk/pull/23418/files#diff-01844ca5cb007f5eab5fa4195f2f1378d4e7c64ba477fba64626c98ff4054038R66)).
>> 
>> Specifically, I'm extending the Template Library with `Expression`s, and lists of `Operations` (some basic Expressions). These Expressions can easily be nested and then filled with arguments, and applied in a `Template`.
>> 
>> Details, in **order you should review**:
>> - `Operations.java`: maps lots of primitive operators as Expressions.
>> - `Expression.java`: the fundamental engine behind Expressions.
>> - `examples/TestExpressions.java`: basic example using Expressions, filling them with random constants.
>> - `tests/TestExpression.java`: correctness test of Expression machinery.
>> - `compiler/igvn/ExpressionFuzzer.java`: expression fuzzer for primitive type expressions, including input range/bits constraints and output range/bits verification.
>> - `PrimitiveType.java`: added `LibraryRNG` facility. We already had `type.con()` which gave us random constants. But we also want to have `type.callLibraryRNG()` so that we can insert a call to a random number generator of the corresponding primitive type. I use this facility in the `ExpressionFuzzer.java` to generate random arguments for the expressions.
>> - `examples/TestPrimitiveTypes.java`: added a `LibraryRNG` example with a weak randomness test: at least 2 different values must appear in 1000 calls.
>> 
>> If the reviewers absolutely insist, I could split out `LibraryRNG` into a separate RFE. But it's really not that much code, and has direct use in the `Expression` examples.
>> 
>> **Future Work**:
>> - Use `Expression`s in a loop over arrays / MemorySegment: fuzz auto-vectorization.
>> - Use `Expression`s to model more operations:
>>   - `Vector API`, more arithmetic operations like from `Math` classes etc.
>> - Ensure that the constraints / checksum mechanic in `compiler/igvn/ExpressionFuzzer.java` work, using IR rules. We may even need to add new IGVN optimizations. Add unsigned constraints.
>> - Find a way to delay IGVN optimizations to test worklist notification: For example, we could add a new testing operator call `TestUtils.delay(x) -> x`, which is intrinsified as some new `DelayNode` that in normal circumstances just fol...
>
> test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java line 358:
> 
>> 356:             tokens.add(arguments.get(i));
>> 357:         }
>> 358:         tokens.add(strings.get(strings.size()-1));
> 
> Suggestion:
> 
>         tokens.add(strings.getLast());
> 
> A wee bit easier to read.

Did not know this was a thing, nice :)
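For reference, the suggested `getLast()` is `List.getLast()` from the `SequencedCollection` interface added in Java 21. The Java sketch below shows the interleaving that the reviewed loop appears to perform: `n + 1` string fragments placed around `n` arguments. It is written with `getLast()`. The class and method names are illustrative and do not come from `Expression.java`.

```java
import java.util.ArrayList;
import java.util.List;

public class InterleaveSketch {
    // Interleave n+1 string fragments with n arguments:
    // s0, a0, s1, a1, ..., a(n-1), sn.
    static List<String> interleave(List<String> strings, List<String> arguments) {
        if (strings.size() != arguments.size() + 1) {
            throw new IllegalArgumentException("need exactly one more string than arguments");
        }
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < arguments.size(); i++) {
            tokens.add(strings.get(i));
            tokens.add(arguments.get(i));
        }
        // Java 21+: equivalent to strings.get(strings.size() - 1)
        tokens.add(strings.getLast());
        return tokens;
    }

    public static void main(String[] args) {
        // e.g. a binary addition expression with two filled-in arguments
        List<String> tokens = interleave(List.of("(", " + ", ")"), List.of("x", "42"));
        System.out.println(String.join("", tokens)); // prints "(x + 42)"
    }
}
```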

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2357700924

From epeter at openjdk.org  Thu Sep 18 06:52:34 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 06:52:34 GMT
Subject: RFR: 8359412: Template-Framework Library: Operations and
 Expressions [v2]
In-Reply-To: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com>
References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com>
Message-ID: <5YknYR1eLr-C-b-XIo863vtjkT9F8Aej2DYEGMaCodQ=.fb7f55bd-1fc5-426d-a974-c4770d9a2981@github.com>

> Implementing ideas from original draft PR: https://github.com/openjdk/jdk/pull/23418 ([Exceptions](https://github.com/openjdk/jdk/pull/23418/files#diff-77e7db8cc0c5e02786e1c993362f98fabe219042eb342fdaffc09fd11380259dR41), [ExpressionFuzzer](https://github.com/openjdk/jdk/pull/23418/files#diff-01844ca5cb007f5eab5fa4195f2f1378d4e7c64ba477fba64626c98ff4054038R66)).
> 
> Specifically, I'm extending the Template Library with `Expression`s, and lists of `Operations` (some basic Expressions). These Expressions can easily be nested and then filled with arguments, and applied in a `Template`.
> 
> Details, in **order you should review**:
> - `Operations.java`: maps lots of primitive operators as Expressions.
> - `Expression.java`: the fundamental engine behind Expressions.
> - `examples/TestExpressions.java`: basic example using Expressions, filling them with random constants.
> - `tests/TestExpression.java`: correctness test of Expression machinery.
> - `compiler/igvn/ExpressionFuzzer.java`: expression fuzzer for primitive type expressions, including input range/bits constraints and output range/bits verification.
> - `PrimitiveType.java`: added `LibraryRNG` facility. We already had `type.con()` which gave us random constants. But we also want to have `type.callLibraryRNG()` so that we can insert a call to a random number generator of the corresponding primitive type. I use this facility in the `ExpressionFuzzer.java` to generate random arguments for the expressions.
> - `examples/TestPrimitiveTypes.java`: added a `LibraryRNG` example with a weak randomness test: at least 2 different values must appear in 1000 calls.
> 
> If the reviewers absolutely insist, I could split out `LibraryRNG` into a separate RFE. But it's really not that much code, and has direct use in the `Expression` examples.
> 
> **Future Work**:
> - Use `Expression`s in a loop over arrays / MemorySegment: fuzz auto-vectorization.
> - Use `Expression`s to model more operations:
>   - `Vector API`, more arithmetic operations like from `Math` classes etc.
> - Ensure that the constraints / checksum mechanic in `compiler/igvn/ExpressionFuzzer.java` work, using IR rules. We may even need to add new IGVN optimizations. Add unsigned constraints.
> - Find a way to delay IGVN optimizations to test worklist notification: For example, we could add a new testing operator call `TestUtils.delay(x) -> x`, which is intrinsified as some new `DelayNode` that in normal circumstances just folds away, but under `StressIGVN` and `Stres...
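The weak randomness check described for `examples/TestPrimitiveTypes.java` requires at least 2 distinct values in 1000 calls. The Java sketch below illustrates that check using `java.util.Random` as a stand-in generator. The real test goes through the Template Library's `LibraryRNG`, whose API is not shown here. `hasAtLeastTwoDistinct` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
import java.util.function.IntSupplier;

public class WeakRandomnessSketch {
    // Returns true if `calls` invocations of rng produce at least 2 distinct values.
    // This only rules out a generator stuck on a constant; it is not a
    // statistical test of randomness.
    static boolean hasAtLeastTwoDistinct(IntSupplier rng, int calls) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < calls; i++) {
            seen.add(rng.getAsInt());
            if (seen.size() >= 2) {
                return true; // early exit once the property holds
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        System.out.println(hasAtLeastTwoDistinct(random::nextInt, 1000)); // true
        System.out.println(hasAtLeastTwoDistinct(() -> 7, 1000));         // false
    }
}
```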

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  Apply Manuel's suggestions part 1
  
  Co-authored-by: Manuel Hässig <manuel at haessig.org>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26885/files
  - new: https://git.openjdk.org/jdk/pull/26885/files/0709731a..d66aa985

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26885&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26885&range=00-01

  Stats: 134 lines in 3 files changed: 1 ins; 2 del; 131 mod
  Patch: https://git.openjdk.org/jdk/pull/26885.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26885/head:pull/26885

PR: https://git.openjdk.org/jdk/pull/26885

From epeter at openjdk.org  Thu Sep 18 06:52:36 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 06:52:36 GMT
Subject: RFR: 8359412: Template-Framework Library: Operations and
 Expressions [v2]
In-Reply-To: <4YVAopGtxnlkh39pp0TaW4kNpBuSIXfbz40UDW_We1w=.308dd279-514f-4fa4-b361-ab36f165caf6@github.com>
References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com>
 <4YVAopGtxnlkh39pp0TaW4kNpBuSIXfbz40UDW_We1w=.308dd279-514f-4fa4-b361-ab36f165caf6@github.com>
Message-ID: <84_nAzM5h9uSyvzRquE4x9EhrnfmNls0Btzts0zSPFw=.8b10e909-122b-421b-b148-57304b32d68c@github.com>

On Wed, 17 Sep 2025 11:33:28 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Apply Manuel's suggestions part 1
>>   
>>   Co-authored-by: Manuel Hässig <manuel at haessig.org>
>
> test/hotspot/jtreg/compiler/lib/template_framework/library/Operations.java line 1:
> 
>> 1: /*
> 
> I gave it my best shot to suggest a reasonable and reasonably consistent alignment.

Oh wow, nice. Thanks for the work :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2357708984

From dfenacci at openjdk.org  Thu Sep 18 07:00:38 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Thu, 18 Sep 2025 07:00:38 GMT
Subject: RFR: 8367740: assembler_<cpu>.inline.hpp should not include
 assembler.inline.hpp
In-Reply-To: <QYuy3O1NQdXFTX_2k292z-7Ni02AHU1VhbIZDzwG8hk=.48b8361e-0442-41d3-a319-5a506d0cf650@github.com>
References: <QYuy3O1NQdXFTX_2k292z-7Ni02AHU1VhbIZDzwG8hk=.48b8361e-0442-41d3-a319-5a506d0cf650@github.com>
Message-ID: <bBBjc6t8CfgJxG1i1LEG6o41fDyCeWpPAcXkksRSUaI=.af52715b-ac1d-428d-b99f-32be606e5799@github.com>

On Tue, 16 Sep 2025 10:15:06 GMT, Francesco Andreuzzi <fandreuzzi at openjdk.org> wrote:

> This is the content of assembler.inline.hpp:
> https://github.com/openjdk/jdk/blob/ca89cd06d39ed3a6bbe16f60fea4d7382849edbd/src/hotspot/share/asm/assembler.inline.hpp#L28-L30
> 
> Most of the `assembler_<cpu>.inline.hpp` include it:
> https://github.com/openjdk/jdk/blob/ca89cd06d39ed3a6bbe16f60fea4d7382849edbd/src/hotspot/cpu/zero/assembler_zero.inline.hpp#L29-L32
> 
> They should probably include `assembler.hpp` instead.
> 
> Testing: tier1 in GHA

Tests passed. Thanks @fandreuz. LGTM

-------------

Marked as reviewed by dfenacci (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27311#pullrequestreview-3237615976

From rcastanedalo at openjdk.org  Thu Sep 18 07:06:36 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 18 Sep 2025 07:06:36 GMT
Subject: RFR: 8359378: aarch64: crash when using -XX:+UseFPUForSpilling
In-Reply-To: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
References: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
Message-ID: <CC-N4gxN7vPXi8Z7FxD_KJ0SZhsaM9lC2GhZQ2HS1y4=.ec0c7b0a-6c56-4fb8-abb1-01448066cc9a@github.com>

On Wed, 17 Sep 2025 16:19:12 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:

> The AArch64 BarrierSetAssembler path assumes that only FP/vector ideal regs reach the FP spill/restore encoding. With -XX:+UseFPUForSpilling, the register allocator may allocate scalar values in FP registers. When such values (Op_RegI/Op_RegN/Op_RegL/Op_RegP) hit `BarrierSetAssembler::encode_float_vector_register_size`, we trip ShouldNotReachHere in release builds and the **"unexpected ideal register"** assertion in debug builds.
> 
> Fix: teach the encoder to handle scalar ideal regs when they physically live in FP regs:
> - treat Op_RegI / Op_RegN as 32-bit (single slot) - same class as Op_RegF
> - treat Op_RegL / Op_RegP as 64-bit (two slots) - same class as Op_RegD
> 
> Related:
> - reproduced since #19746
> - spilling logic: 
>   - #18967
>   - #17977
> 
> Testing: tier1-3 with javaoptions -Xcomp -Xbatch -XX:+UseFPUForSpilling on AARCH
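The proposed classification amounts to a small lookup. The Java sketch below is illustrative only. The actual change is C++ in the AArch64 `BarrierSetAssembler`. The `IdealReg` enum and the `slotsWhenInFpReg` method are hypothetical names that mirror the scalar `Op_Reg*` constants listed above. Vector kinds are out of scope here.

```java
public class FpSpillSizeSketch {
    // Hypothetical mirror of C2 scalar ideal register kinds (Op_RegI, Op_RegN, ...).
    enum IdealReg { REG_I, REG_N, REG_F, REG_L, REG_P, REG_D }

    // Number of 32-bit slots to save/restore when a value of this kind
    // physically lives in an FP register.
    static int slotsWhenInFpReg(IdealReg r) {
        return switch (r) {
            case REG_I, REG_N, REG_F -> 1; // 32-bit: same class as Op_RegF
            case REG_L, REG_P, REG_D -> 2; // 64-bit: same class as Op_RegD
        };
    }

    public static void main(String[] args) {
        for (IdealReg r : IdealReg.values()) {
            System.out.println(r + " -> " + slotsWhenInFpReg(r) + " slot(s)");
        }
    }
}
```

The switch expression covers every constant, so the compiler rejects any newly added kind that is left unclassified. The C++ encoder achieves the same effect at run time with `ShouldNotReachHere`.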

Hi @bulasevich, thanks for working on this issue, but please note that it was already assigned to me ([JDK-8359378](https://bugs.openjdk.org/browse/JDK-8359378)). I am fine with re-assigning it to you, but [next time please ask first, to avoid work duplication](https://openjdk.org/guide/#i-found-an-issue-in-jbs-that-i-want-to-fix).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27350#issuecomment-3305714305

From epeter at openjdk.org  Thu Sep 18 07:17:50 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 07:17:50 GMT
Subject: RFR: 8359412: Template-Framework Library: Operations and
 Expressions [v3]
In-Reply-To: <4YVAopGtxnlkh39pp0TaW4kNpBuSIXfbz40UDW_We1w=.308dd279-514f-4fa4-b361-ab36f165caf6@github.com>
References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com>
 <4YVAopGtxnlkh39pp0TaW4kNpBuSIXfbz40UDW_We1w=.308dd279-514f-4fa4-b361-ab36f165caf6@github.com>
Message-ID: <Civqq_iSOEBeciSzQ6C3RirbVo8mKocD63REsXGfzW8=.b1bc89e7-7a7a-4dd5-b43d-038206e47e96@github.com>

On Wed, 17 Sep 2025 11:02:12 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Apply Manuel's suggestions part 2
>
> test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java line 152:
> 
>> 150: 
>> 151:     /**
>> 152:      * Creates a new Espression with 1 arguments.
> 
> For every make(): s/Espression/Expression/

Nice catch!

> test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java line 164:
> 
>> 162:                                   CodeGenerationDataNameType t0,
>> 163:                                   String s1) {
>> 164:         return new Expression(returnType, List.of(t0), List.of(s0, s1), new Info());
> 
> To reduce code duplication, the methods without an additional info should probably use the ones with.
> Suggestion:
> 
>         return make(returnType, s0, t0, s1, new Info());

Nice idea :)
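The delegation pattern suggested here can be shown with a simplified, hypothetical `Expression` type. The overload without an `Info` argument forwards to the overload that takes one, so construction logic lives in one place. For brevity, types are reduced to `String`. The real `make` methods take `CodeGenerationDataNameType` parameters.

```java
import java.util.List;

public class DelegationSketch {
    // Hypothetical stand-in for Expression.Info.
    record Info() {}

    record Expression(String returnType, List<String> argumentTypes,
                      List<String> strings, Info info) {}

    // Overload without Info: delegates instead of duplicating the constructor call.
    static Expression make(String returnType, String s0, String t0, String s1) {
        return make(returnType, s0, t0, s1, new Info());
    }

    // The single place that actually builds the Expression.
    static Expression make(String returnType, String s0, String t0, String s1, Info info) {
        return new Expression(returnType, List.of(t0), List.of(s0, s1), info);
    }

    public static void main(String[] args) {
        Expression neg = make("int", "(-", "int", ")");
        System.out.println(neg);
    }
}
```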

> test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestExpressions.java line 27:
> 
>> 25:  * @test
>> 26:  * @bug 8359412
>> 27:  * @summary Demonstrate the use of Expressions form the Template Library.
> 
> Suggestion:
> 
>  * @summary Demonstrate the use of Expressions from the Template Library.
> 
> Typo

Done :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2357779934
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2357777770
PR Review Comment: https://git.openjdk.org/jdk/pull/26885#discussion_r2357782258

From epeter at openjdk.org  Thu Sep 18 07:17:47 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 07:17:47 GMT
Subject: RFR: 8359412: Template-Framework Library: Operations and
 Expressions [v3]
In-Reply-To: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com>
References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com>
Message-ID: <4DQH2DopQ0lMjj78iaff4d1qwotbvZYLgmtq36Hb_MQ=.d0bc6aef-1821-4a97-887b-0cf054667a7f@github.com>

> Implementing ideas from original draft PR: https://github.com/openjdk/jdk/pull/23418 ([Exceptions](https://github.com/openjdk/jdk/pull/23418/files#diff-77e7db8cc0c5e02786e1c993362f98fabe219042eb342fdaffc09fd11380259dR41), [ExpressionFuzzer](https://github.com/openjdk/jdk/pull/23418/files#diff-01844ca5cb007f5eab5fa4195f2f1378d4e7c64ba477fba64626c98ff4054038R66)).
> 
> Specifically, I'm extending the Template Library with `Expression`s, and lists of `Operations` (some basic Expressions). These Expressions can easily be nested and then filled with arguments, and applied in a `Template`.
> 
> Details, in **order you should review**:
> - `Operations.java`: maps lots of primitive operators as Expressions.
> - `Expression.java`: the fundamental engine behind Expressions.
> - `examples/TestExpressions.java`: basic example using Expressions, filling them with random constants.
> - `tests/TestExpression.java`: correctness test of Expression machinery.
> - `compiler/igvn/ExpressionFuzzer.java`: expression fuzzer for primitive type expressions, including input range/bits constraints and output range/bits verification.
> - `PrimitiveType.java`: added `LibraryRNG` facility. We already had `type.con()` which gave us random constants. But we also want to have `type.callLibraryRNG()` so that we can insert a call to a random number generator of the corresponding primitive type. I use this facility in the `ExpressionFuzzer.java` to generate random arguments for the expressions.
> - `examples/TestPrimitiveTypes.java`: added a `LibraryRNG` example with a weak randomness test: at least 2 different values must appear in 1000 calls.
> 
> If the reviewers absolutely insist, I could split out `LibraryRNG` into a separate RFE. But it's really not that much code, and has direct use in the `Expression` examples.
> 
> **Future Work**:
> - Use `Expression`s in a loop over arrays / MemorySegment: fuzz auto-vectorization.
> - Use `Expression`s to model more operations:
>   - `Vector API`, more arithmetic operations like from `Math` classes etc.
> - Ensure that the constraints / checksum mechanic in `compiler/igvn/ExpressionFuzzer.java` work, using IR rules. We may even need to add new IGVN optimizations. Add unsigned constraints.
> - Find a way to delay IGVN optimizations to test worklist notification: For example, we could add a new testing operator call `TestUtils.delay(x) -> x`, which is intrinsified as some new `DelayNode` that in normal circumstances just folds away, but under `StressIGVN` and `Stres...

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  Apply Manuel's suggestions part 2

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26885/files
  - new: https://git.openjdk.org/jdk/pull/26885/files/d66aa985..05fb63c4

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26885&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26885&range=01-02

  Stats: 13 lines in 2 files changed: 0 ins; 0 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/26885.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26885/head:pull/26885

PR: https://git.openjdk.org/jdk/pull/26885

From rcastanedalo at openjdk.org  Thu Sep 18 07:19:07 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 18 Sep 2025 07:19:07 GMT
Subject: RFR: 8361699: C2: assert(can_reduce_phi(n->as_Phi())) failed:
 Sanity: previous reducible Phi is no longer reducible before SUT
In-Reply-To: <t_dqm0EN559GBmw4cJBPitWnDxmNFjoGMran3JxdRVI=.cb8a4e4f-7ce7-48f6-9c11-fe646c57efd7@github.com>
References: <Vq1VCjG5GC30eMniZhaW0fm2Yr9gKtP7FbAlu6p6IXg=.c9aa2d51-ac43-4221-b8c8-c96484c2d953@github.com>
 <1uDOe3Oe-hihmDHea2h8vcvRZsKKBeNp0J9lKYUujxk=.abd111bc-3625-4c71-bfa2-0a4c1f4d3875@github.com>
 <VOmxZ5c0SKETC1N8-S-WrvXIU8qCaA5NMS_68UGwVDc=.00bbe82b-dd42-4393-b57b-9df634a12d88@github.com>
 <2brDXuLmbVBVRaeSyCdKokA706v3t6VsZfGvj_QceJ4=.4483390e-c726-4d82-b220-f1dbdf4efef0@github.com>
 <t_dqm0EN559GBmw4cJBPitWnDxmNFjoGMran3JxdRVI=.cb8a4e4f-7ce7-48f6-9c11-fe646c57efd7@github.com>
Message-ID: <UDiYCPoSJR7W3TPiZKiE2pLHTyklwMnGT7pzsRT7tLM=.54758242-9951-4d86-b514-9878b4e8952d@github.com>

On Thu, 11 Sep 2025 07:41:55 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>>> @robcasloz - are you thinking that the "fixed point" loops on `find_scalar_replaceable_allocs` aren't sufficient?
>> 
>> You're right, that should do.
>> 
>>> At first glance yes, I think that the code would be more cleaned up if done that way. If the code had been written like that in the first place we wouldn't have seen the current issue. (...)
>> 
>> Agree, a single fixed point loop combining NSR detection and propagation would be ideal for clarity and maintainability.
>> 
>>>  I propose that we move forward with the current patch and work on this refactoring as a separate issue.
>> 
>> Sounds good, please file an RFE for that. I would then suggest postponing the clean-up in `revisit_reducible_phi_status` to that RFE.
>
>> @robcasloz - I pushed some changes addressing your and @eme64's comments. Could you please re-run your internal tests?
> 
> Thanks, I will report back within a couple of days.

> Thank you @robcasloz ; I'll start working on that early next week.

@JohnTortugo thanks.

Please, keep in mind that [HotSpot requires two approvals for non-trivial changes like this](https://openjdk.org/guide/#hotspot-development) (apologies if my previous comment somehow could be interpreted as an invitation to integrate, that was not the intention).
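The thread mentions "a single fixed point loop combining NSR detection and propagation". The toy Java model below shows the general shape of such a loop. It is not C2's escape-analysis code. Several parts are simplifying assumptions: the string-named allocations, the `initiallyNsr` set, and the `mergedWith` edges, which stand for allocations that flow into a shared Phi. The loop repeats until no allocation changes status, so propagation always reaches a consistent state.

```java
import java.util.*;

public class NsrFixedPointSketch {
    // Toy rule: an allocation is NSR (not scalar replaceable) if it is NSR on its
    // own, or if it is merged with an allocation that is NSR.
    static Set<String> computeNsr(Set<String> initiallyNsr, Map<String, Set<String>> mergedWith) {
        Set<String> nsr = new HashSet<>(initiallyNsr);
        boolean changed = true;
        while (changed) { // iterate until a fixed point is reached
            changed = false;
            for (Map.Entry<String, Set<String>> e : mergedWith.entrySet()) {
                if (nsr.contains(e.getKey())) continue;
                for (String other : e.getValue()) {
                    if (nsr.contains(other)) {
                        nsr.add(e.getKey());
                        changed = true;
                        break;
                    }
                }
            }
        }
        return nsr;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> merged = Map.of(
            "A", Set.of("B"),
            "B", Set.of("A", "C"),
            "C", Set.of("B"),
            "D", Set.of());
        System.out.println(new TreeSet<>(computeNsr(Set.of("C"), merged))); // [A, B, C]
    }
}
```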

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27063#issuecomment-3305759603

From epeter at openjdk.org  Thu Sep 18 07:20:24 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 07:20:24 GMT
Subject: RFR: 8359412: Template-Framework Library: Operations and
 Expressions [v4]
In-Reply-To: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com>
References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com>
Message-ID: <2zAgocXxD80XNyB_HLyO9JSmsqjJfRGYE-FmFmatuYk=.d42bf76a-39d0-49ba-88c9-df9eebc5aa0f@github.com>

> Implementing ideas from original draft PR: https://github.com/openjdk/jdk/pull/23418 ([Exceptions](https://github.com/openjdk/jdk/pull/23418/files#diff-77e7db8cc0c5e02786e1c993362f98fabe219042eb342fdaffc09fd11380259dR41), [ExpressionFuzzer](https://github.com/openjdk/jdk/pull/23418/files#diff-01844ca5cb007f5eab5fa4195f2f1378d4e7c64ba477fba64626c98ff4054038R66)).
> 
> Specifically, I'm extending the Template Library with `Expression`s, and lists of `Operations` (some basic Expressions). These Expressions can easily be nested and then filled with arguments, and applied in a `Template`.
> 
> Details, in **order you should review**:
> - `Operations.java`: maps lots of primitive operators as Expressions.
> - `Expression.java`: the fundamental engine behind Expressions.
> - `examples/TestExpressions.java`: basic example using Expressions, filling them with random constants.
> - `tests/TestExpression.java`: correctness test of Expression machinery.
> - `compiler/igvn/ExpressionFuzzer.java`: expression fuzzer for primitive type expressions, including input range/bits constraints and output range/bits verification.
> - `PrimitiveType.java`: added `LibraryRNG` facility. We already had `type.con()` which gave us random constants. But we also want to have `type.callLibraryRNG()` so that we can insert a call to a random number generator of the corresponding primitive type. I use this facility in the `ExpressionFuzzer.java` to generate random arguments for the expressions.
> - `examples/TestPrimitiveTypes.java`: added a `LibraryRNG` example with a weak randomness test: at least 2 different values must appear in 1000 calls.
> 
> If the reviewers absolutely insist, I could split out `LibraryRNG` into a separate RFE. But it's really not that much code, and has direct use in the `Expression` examples.
> 
> **Future Work**:
> - Use `Expression`s in a loop over arrays / MemorySegment: fuzz auto-vectorization.
> - Use `Expression`s to model more operations:
>   - `Vector API`, more arithmetic operations like from `Math` classes etc.
> - Ensure that the constraints / checksum mechanic in `compiler/igvn/ExpressionFuzzer.java` work, using IR rules. We may even need to add new IGVN optimizations. Add unsigned constraints.
> - Find a way to delay IGVN optimizations to test worklist notification: For example, we could add a new testing operator call `TestUtils.delay(x) -> x`, which is intrinsified as some new `DelayNode` that in normal circumstances just folds away, but under `StressIGVN` and `Stres...

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  Apply Manuel's suggestions part 3
  
  Co-authored-by: Manuel Hässig <manuel at haessig.org>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26885/files
  - new: https://git.openjdk.org/jdk/pull/26885/files/05fb63c4..0a269c3b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26885&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26885&range=02-03

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/26885.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26885/head:pull/26885

PR: https://git.openjdk.org/jdk/pull/26885

From epeter at openjdk.org  Thu Sep 18 07:34:43 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 07:34:43 GMT
Subject: RFR: 8359412: Template-Framework Library: Operations and
 Expressions [v5]
In-Reply-To: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com>
References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com>
Message-ID: <slxi0av8gU5NvjBV9-seS2gD12MICl4fczNYzu352tc=.385b79b2-d22e-4140-9733-2e22ac1a5bfc@github.com>

> Implementing ideas from original draft PR: https://github.com/openjdk/jdk/pull/23418 ([Exceptions](https://github.com/openjdk/jdk/pull/23418/files#diff-77e7db8cc0c5e02786e1c993362f98fabe219042eb342fdaffc09fd11380259dR41), [ExpressionFuzzer](https://github.com/openjdk/jdk/pull/23418/files#diff-01844ca5cb007f5eab5fa4195f2f1378d4e7c64ba477fba64626c98ff4054038R66)).
> 
> Specifically, I'm extending the Template Library with `Expression`s, and lists of `Operations` (some basic Expressions). These Expressions can easily be nested and then filled with arguments, and applied in a `Template`.
> 
> Details, in **order you should review**:
> - `Operations.java`: maps lots of primitive operators as Expressions.
> - `Expression.java`: the fundamental engine behind Expressions.
> - `examples/TestExpressions.java`: basic example using Expressions, filling them with random constants.
> - `tests/TestExpression.java`: correctness test of Expression machinery.
> - `compiler/igvn/ExpressionFuzzer.java`: expression fuzzer for primitive type expressions, including input range/bits constraints and output range/bits verification.
> - `PrimitiveType.java`: added `LibraryRNG` facility. We already had `type.con()` which gave us random constants. But we also want to have `type.callLibraryRNG()` so that we can insert a call to a random number generator of the corresponding primitive type. I use this facility in the `ExpressionFuzzer.java` to generate random arguments for the expressions.
> - `examples/TestPrimitiveTypes.java`: added a `LibraryRNG` example with a weak randomness test: at least 2 different values must appear in 1000 calls.
> 
> If the reviewers absolutely insist, I could split out `LibraryRNG` into a separate RFE. But it's really not that much code, and has direct use in the `Expression` examples.
> 
> **Future Work**:
> - Use `Expression`s in a loop over arrays / MemorySegment: fuzz auto-vectorization.
> - Use `Expression`s to model more operations:
>   - `Vector API`, more arithmetic operations like from `Math` classes etc.
> - Ensure that the constraints / checksum mechanic in `compiler/igvn/ExpressionFuzzer.java` work, using IR rules. We may even need to add new IGVN optimizations. Add unsigned constraints.
> - Find a way to delay IGVN optimizations to test worklist notification: For example, we could add a new testing operator call `TestUtils.delay(x) -> x`, which is intrinsified as some new `DelayNode` that in normal circumstances just folds away, but under `StressIGVN` and `Stres...

Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 29 additional commits since the last revision:

 - Merge branch 'master' into JDK-8359412-Template-Framework-Expressions
 - Apply Manuel's suggestions part 3
   
   Co-authored-by: Manuel Hässig <manuel at haessig.org>
 - Apply Manuel's suggestions part 2
 - Apply Manuel's suggestions part 1
   
   Co-authored-by: Manuel Hässig <manuel at haessig.org>
 - fix whitespaces
 - LibraryRNG example
 - fix bug
 - documentation
 - improve expression fuzzer
 - wip constraints
 - ... and 19 more: https://git.openjdk.org/jdk/compare/06680b79...a6f83b5a

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26885/files
  - new: https://git.openjdk.org/jdk/pull/26885/files/0a269c3b..a6f83b5a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26885&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26885&range=03-04

  Stats: 69057 lines in 2033 files changed: 39891 ins; 16903 del; 12263 mod
  Patch: https://git.openjdk.org/jdk/pull/26885.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26885/head:pull/26885

PR: https://git.openjdk.org/jdk/pull/26885

From missa at openjdk.org  Thu Sep 18 07:40:46 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Thu, 18 Sep 2025 07:40:46 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v15]
In-Reply-To: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
Message-ID: <uSsJVsGNLDikbl_xiVwcdZCZdR_jY1p-2sqiFj7lttI=.1581ea79-1c65-4039-838e-07877fc026b8@github.com>

> Intel&reg; AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
> 
> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) instruction is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped to values (NaN -> 0, -Infinity, <= Integer.MIN_VALUE -> Integer.MIN_VALUE, +Infinity, >= Integer.MAX_VALUE -> Integer.MAX_VALUE) in the integer range with the help of a few temporary registers to store intermediate results.
> 
> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
> 
> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
> 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java`
> 10. `jtreg:test/hotspot/jtreg...
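For reference, the special-value mapping listed above for `int` destinations is exactly Java's own narrowing-cast semantics (JLS 5.1.3), so any conversion sequence, old or AVX10.2-based, must produce the same values as `(int) x`. The sketch below is a plain-Java model of those rules that a test could use as an expected value; `SaturatingCastModel` and its methods are hypothetical names, not code from this PR. Note that for `short` and `byte` destinations Java first saturates to `int` and then keeps only the low bits, so `(short) Float.POSITIVE_INFINITY` is `-1`, not `Short.MAX_VALUE`.

```java
// Plain-Java reference model of floating point -> integral narrowing (JLS 5.1.3).
// SaturatingCastModel is a hypothetical helper for illustration only.
final class SaturatingCastModel {
    private SaturatingCastModel() {}

    // Explicit, branch-by-branch model of (int) d.
    static int d2i(double d) {
        if (Double.isNaN(d)) {
            return 0;                   // NaN -> 0
        }
        if (d <= Integer.MIN_VALUE) {
            return Integer.MIN_VALUE;   // -Infinity and values <= MIN_VALUE saturate
        }
        if (d >= Integer.MAX_VALUE) {
            return Integer.MAX_VALUE;   // +Infinity and values >= MAX_VALUE saturate
        }
        // In range: round toward zero (the cast is exact after floor/ceil).
        return (int) (d < 0 ? Math.ceil(d) : Math.floor(d));
    }

    // float -> int follows the same rules; widening float to double is exact.
    static int f2i(float f) {
        return d2i(f);
    }

    // float -> short: saturate to int first, then keep only the low 16 bits.
    static short f2s(float f) {
        return (short) f2i(f);
    }
}
```

Comparing such a model against the actual cast over the special inputs is one way to check that the scalar and vector paths agree with the language semantics regardless of which instructions the JIT selects.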

Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:

  Clean up scalar floating point conversion tests

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26919/files
  - new: https://git.openjdk.org/jdk/pull/26919/files/5d26ff48..a7940ee0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=14
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=13-14

  Stats: 83 lines in 1 file changed: 10 ins; 44 del; 29 mod
  Patch: https://git.openjdk.org/jdk/pull/26919.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26919/head:pull/26919

PR: https://git.openjdk.org/jdk/pull/26919

From missa at openjdk.org  Thu Sep 18 07:40:50 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Thu, 18 Sep 2025 07:40:50 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v14]
In-Reply-To: <h5zYzw4-3S7--SEB5eAQakfXk41ytIDP2rAAyaSnnfM=.45cb037a-6503-4e5b-b90e-9df9fc3a4bb4@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <4Eui7URmA1Y5NPrrV4813qb7UUsNVSRP-JSnPdX0Ojg=.4db7c50e-18cd-47ec-ae8c-4ae17597b286@github.com>
 <h5zYzw4-3S7--SEB5eAQakfXk41ytIDP2rAAyaSnnfM=.45cb037a-6503-4e5b-b90e-9df9fc3a4bb4@github.com>
Message-ID: <9qebb_d7KLK6ge1CPFO_5009kTCNbsg4xCZhj3v-H0w=.76ebacee-b653-415c-99c8-aae76bd830a8@github.com>

On Sat, 13 Sep 2025 08:26:13 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Introduce scalar floating point conversion tests with IR rules
>
> test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java line 70:
> 
>> 68:             float_arr[i] = ran.nextFloat(floor_val, ceil_val);
>> 69:             double_arr[i] = ran.nextDouble(floor_val, ceil_val);
>> 70:         }
> 
> Please use Generators instead of direct initialization.

I could do it for int and long. If there's a compact way to do it for the other types, please let me know.

> test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java line 89:
> 
>> 87:             if (int_arr[i] != expected) {
>> 88:                 throw new RuntimeException("Invalid result: int_arr[" + i + "] = " + int_arr[i] + " != " + expected);
>> 89:             }
> 
> Use Verify.checkEQ instead.

Ok, I'm using Verify.checkEQ instead.

> test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java line 109:
> 
>> 107:             if (long_arr[i] != expected) {
>> 108:                 throw new RuntimeException("Invalid result: long_arr[" + i + "] = " + long_arr[i] + " != " + expected);
>> 109:             }
> 
> Use Verify.checkEQ, checkout relevant code in https://github.com/openjdk/jdk/tree/master/test/hotspot/jtreg/compiler/lib and their usages

I modified this. Should I do this for VectorFPtoIntCastTest.java as well? Also, using Verify.checkEQ removes the custom error message unless I use try + catch.
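One way to keep a per-element message while still delegating the comparison to a shared helper is to wrap each check and re-throw with the index attached, keeping the original exception as the cause. The sketch below uses a local stand-in named `checkEQ`; it is hypothetical and is not the real `Verify.checkEQ`, whose exact exception type is not shown here, so with the real library the `catch` clause would have to match whatever it actually throws.

```java
// Sketch only: EqCheck.checkEQ is a hypothetical stand-in for a library equality
// check that throws without caller-specific context.
final class EqCheck {
    private EqCheck() {}

    static void checkEQ(long a, long b) {
        if (a != b) {
            throw new RuntimeException("Value mismatch: " + a + " vs " + b);
        }
    }

    // Compares element by element and re-throws with the array name and index,
    // preserving the original exception as the cause.
    static void checkArray(String name, long[] actual, long[] expected) {
        if (actual.length != expected.length) {
            throw new RuntimeException(name + ": length " + actual.length + " != " + expected.length);
        }
        for (int i = 0; i < actual.length; i++) {
            try {
                checkEQ(actual[i], expected[i]);
            } catch (RuntimeException e) {
                throw new RuntimeException("Invalid result: " + name + "[" + i + "]", e);
            }
        }
    }
}
```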

> test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java line 122:
> 
>> 120:         checkf2short();
>> 121:     }
>> 122: 
> 
> What is the reason behind the additional level of abstraction? Why not manually inline this code?

No reason other than I migrated the code from VectorFPtoIntCastTest.java, so it's gone now.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2357853486
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2357856425
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2357863945
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2357866615

From dfenacci at openjdk.org  Thu Sep 18 08:06:26 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Thu, 18 Sep 2025 08:06:26 GMT
Subject: RFR: 8367613: Test
 compiler/runtime/TestDontCompileHugeMethods.java failed [v2]
In-Reply-To: <5eWiPUhybQOdBZAfm8LnEGLQ8ZwXHcqatCQEf8PVlgo=.ffcd2f7e-f734-49de-a2e4-1099bfb544f5@github.com>
References: <Ouu1htv7wGD_GqsyJhwk1-lH-0ix8UnduNiEBIhuFg0=.8f8393db-6357-4c7b-9a5b-620b3834e9b1@github.com>
 <5eWiPUhybQOdBZAfm8LnEGLQ8ZwXHcqatCQEf8PVlgo=.ffcd2f7e-f734-49de-a2e4-1099bfb544f5@github.com>
Message-ID: <PzIiEEx0dprWa2PAEK3N9Eg9psd6FIl41CYlKMAW3rE=.05fd679e-f2c3-43a1-a3f8-d0221d317e1e@github.com>

On Tue, 16 Sep 2025 21:59:10 GMT, Man Cao <manc at openjdk.org> wrote:

>> Hi,
>> 
>> Could anyone approve this change that excludes this test when running with `-Xcomp`? This avoids the test failure reported in [JDK-8367613](https://bugs.openjdk.org/browse/JDK-8367613).
>> 
>> For reasons I don't yet understand, the `HugeSwitch::shortMethod` method is not compiled under `-Xcomp -XX:TieredStopAtLevel=1`. The method gets compiled with either `-Xcomp` or `-XX:TieredStopAtLevel=1`, but not both. I would appreciate it if anyone could provide insights into possible reasons.
>
> Man Cao has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Switch to disable inlining for shortMethod

Thanks @caoman. LGTM

-------------

Marked as reviewed by dfenacci (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27306#pullrequestreview-3237998259

From epeter at openjdk.org  Thu Sep 18 08:07:17 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 08:07:17 GMT
Subject: RFR: 8367969: C2: compiler/vectorapi/TestVectorMathLib.java fails
 without UnlockDiagnosticVMOptions
Message-ID: <N0Mb54Mx8-EpFwvJ3Fvg11CKp9u_vWPe8U1yhvPvt3I=.60119a79-ced3-4a01-b619-012caa3e38ab@github.com>

Adding missing `-XX:+UnlockDiagnosticVMOptions`.

Seems the test from https://github.com/openjdk/jdk/pull/27263 was not tested with the product build before integration.

-------------

Commit messages:
 - JDK-8367333

Changes: https://git.openjdk.org/jdk/pull/27359/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27359&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367969
  Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27359.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27359/head:pull/27359

PR: https://git.openjdk.org/jdk/pull/27359

From shade at openjdk.org  Thu Sep 18 08:07:17 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 18 Sep 2025 08:07:17 GMT
Subject: RFR: 8367969: C2: compiler/vectorapi/TestVectorMathLib.java fails
 without UnlockDiagnosticVMOptions
In-Reply-To: <N0Mb54Mx8-EpFwvJ3Fvg11CKp9u_vWPe8U1yhvPvt3I=.60119a79-ced3-4a01-b619-012caa3e38ab@github.com>
References: <N0Mb54Mx8-EpFwvJ3Fvg11CKp9u_vWPe8U1yhvPvt3I=.60119a79-ced3-4a01-b619-012caa3e38ab@github.com>
Message-ID: <cABnWSvUvTnsRYLb6lR5PR-IpBp29Y4m7-vgsaIp-xY=.1b3423d3-6001-4f29-a9fd-3ea17323f886@github.com>

On Thu, 18 Sep 2025 07:55:10 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> Adding missing `-XX:+UnlockDiagnosticVMOptions`.
> 
> Seems the test from https://github.com/openjdk/jdk/pull/27263 was not tested with the product build before integration.

Ah, oops. Looks fine and trivial.

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27359#pullrequestreview-3237952363

From epeter at openjdk.org  Thu Sep 18 08:12:50 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 08:12:50 GMT
Subject: RFR: 8359412: Template-Framework Library: Operations and
 Expressions [v6]
In-Reply-To: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com>
References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com>
Message-ID: <F86WzhNjF4KSD7bCieVWq8HEj_7Zr0cbeM-27xKPFzI=.ebeb8e75-25c5-491b-86f4-bbca1ed3487a@github.com>

> Implementing ideas from the original draft PR: https://github.com/openjdk/jdk/pull/23418 ([Exceptions](https://github.com/openjdk/jdk/pull/23418/files#diff-77e7db8cc0c5e02786e1c993362f98fabe219042eb342fdaffc09fd11380259dR41), [ExpressionFuzzer](https://github.com/openjdk/jdk/pull/23418/files#diff-01844ca5cb007f5eab5fa4195f2f1378d4e7c64ba477fba64626c98ff4054038R66)).
> 
> Specifically, I'm extending the Template Library with `Expression`s, and lists of `Operations` (some basic Expressions). These Expressions can easily be nested and then filled with arguments, and applied in a `Template`.
> 
> Details, in **order you should review**:
> - `Operations.java`: maps lots of primitive operators as Expressions.
> - `Expression.java`: the fundamental engine behind Expressions.
> - `examples/TestExpressions.java`: basic example using Expressions, filling them with random constants.
> - `tests/TestExpression.java`: correctness test of Expression machinery.
> - `compiler/igvn/ExpressionFuzzer.java`: expression fuzzer for primitive type expressions, including input range/bits constraints and output range/bits verification.
> - `PrimitiveType.java`: added `LibraryRNG` facility. We already had `type.con()` which gave us random constants. But we also want to have `type.callLibraryRNG()` so that we can insert a call to a random number generator of the corresponding primitive type. I use this facility in the `ExpressionFuzzer.java` to generate random arguments for the expressions.
> - `examples/TestPrimitiveTypes.java`: added a `LibraryRNG` example with a weak randomness test: at least 2 different values must appear in 1000 calls.
> 
> If the reviewers absolutely insist, I could split out `LibraryRNG` into a separate RFE. But it's really not that much code, and has direct use in the `Expression` examples.
> 
> **Future Work**:
> - Use `Expression`s in a loop over arrays / MemorySegment: fuzz auto-vectorization.
> - Use `Expression`s to model more operations:
>   - `Vector API`, more arithmetic operations like from `Math` classes etc.
> - Ensure that the constraints / checksum mechanic in `compiler/igvn/ExpressionFuzzer.java` work, using IR rules. We may even need to add new IGVN optimizations. Add unsigned constraints.
> - Find a way to delay IGVN optimizations to test worklist notification: For example, we could add a new testing operator call `TestUtils.delay(x) -> x`, which is intrinsified as some new `DelayNode` that in normal circumstances just folds away, but under `StressIGVN` and `Stres...
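To make the idea of nestable expressions that are later filled with arguments concrete, here is a toy model in plain Java. It is a hypothetical sketch, not the Template Framework's actual `Expression` API: it ignores types entirely and only tracks text fragments and argument holes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of nestable expressions with argument holes. Hypothetical: this is NOT
// the Template Framework's Expression class and it ignores types entirely.
final class ToyExpression {
    private static final Object HOLE = new Object(); // marks an argument slot
    private final List<Object> tokens;                // String fragments and HOLEs

    private ToyExpression(List<Object> tokens) {
        this.tokens = List.copyOf(tokens);
    }

    static ToyExpression unary(String prefix, String suffix) {
        return new ToyExpression(List.of(prefix, HOLE, suffix));
    }

    static ToyExpression binary(String prefix, String infix, String suffix) {
        return new ToyExpression(List.of(prefix, HOLE, infix, HOLE, suffix));
    }

    int argCount() {
        int n = 0;
        for (Object t : tokens) {
            if (t == HOLE) {
                n++;
            }
        }
        return n;
    }

    // Replaces argument slot `slot` with `inner`; inner's own slots take its place.
    ToyExpression nest(int slot, ToyExpression inner) {
        if (slot < 0 || slot >= argCount()) {
            throw new IllegalArgumentException("no argument slot " + slot);
        }
        List<Object> out = new ArrayList<>();
        int seen = 0;
        for (Object t : tokens) {
            if (t == HOLE && seen++ == slot) {
                out.addAll(inner.tokens);
            } else {
                out.add(t);
            }
        }
        return new ToyExpression(out);
    }

    // Fills the remaining slots left to right with the given argument strings.
    String asString(List<String> args) {
        if (args.size() != argCount()) {
            throw new IllegalArgumentException("expected " + argCount() + " arguments");
        }
        StringBuilder sb = new StringBuilder();
        int next = 0;
        for (Object t : tokens) {
            sb.append(t == HOLE ? args.get(next++) : (String) t);
        }
        return sb.toString();
    }
}
```

For example, nesting `unary("(-", ")")` into slot 1 of `binary("(", " + ", ")")` yields an expression with two slots, which renders as `(x + (-5))` when filled with `x` and `5`.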

Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision:

 - more comments
 - add othervm to test

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26885/files
  - new: https://git.openjdk.org/jdk/pull/26885/files/a6f83b5a..c04c879c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26885&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26885&range=04-05

  Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/26885.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26885/head:pull/26885

PR: https://git.openjdk.org/jdk/pull/26885

From epeter at openjdk.org  Thu Sep 18 08:16:07 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 08:16:07 GMT
Subject: RFR: 8359412: Template-Framework Library: Operations and
 Expressions [v6]
In-Reply-To: <4YVAopGtxnlkh39pp0TaW4kNpBuSIXfbz40UDW_We1w=.308dd279-514f-4fa4-b361-ab36f165caf6@github.com>
References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com>
 <4YVAopGtxnlkh39pp0TaW4kNpBuSIXfbz40UDW_We1w=.308dd279-514f-4fa4-b361-ab36f165caf6@github.com>
Message-ID: <zm5_lRxaJZYRj5veTDdUTm3V65TfPev9pOPwLf60OFY=.0132c6e4-830e-4e8e-9fa6-8db76a8494e8@github.com>

On Wed, 17 Sep 2025 14:29:02 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - more comments
>>  - add othervm to test
>
> Thank you for this enhancement, @eme64! It is nice to see the template framework library evolving.
> 
> The changes look good. I mostly have nits.

@mhaessig Thanks for the review, and the many good suggestions :)
I've applied all, and the PR is ready for re-review :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26885#issuecomment-3306055839

From fandreuzzi at openjdk.org  Thu Sep 18 08:17:12 2025
From: fandreuzzi at openjdk.org (Francesco Andreuzzi)
Date: Thu, 18 Sep 2025 08:17:12 GMT
Subject: RFR: 8367740: assembler_<cpu>.inline.hpp should not include
 assembler.inline.hpp
In-Reply-To: <e5uncFkZ0PiR2YfzEV-0lbIbzBRw9N0gImMhSP9YABo=.335e998a-10c2-4d93-90f1-6040b027abce@github.com>
References: <QYuy3O1NQdXFTX_2k292z-7Ni02AHU1VhbIZDzwG8hk=.48b8361e-0442-41d3-a319-5a506d0cf650@github.com>
 <e5uncFkZ0PiR2YfzEV-0lbIbzBRw9N0gImMhSP9YABo=.335e998a-10c2-4d93-90f1-6040b027abce@github.com>
Message-ID: <NhtWdDOzzamWUjPJWNytz3HfxcpKlw0NmSNUa8P5Gjg=.e5f0e2d9-45f8-4d33-9df8-7f691b2e670a@github.com>

On Tue, 16 Sep 2025 13:52:33 GMT, Damon Fenacci <dfenacci at openjdk.org> wrote:

>> This is the content of assembler.inline.hpp:
>> https://github.com/openjdk/jdk/blob/ca89cd06d39ed3a6bbe16f60fea4d7382849edbd/src/hotspot/share/asm/assembler.inline.hpp#L28-L30
>> 
>> Most of the `assembler_<cpu>.inline.hpp` include it:
>> https://github.com/openjdk/jdk/blob/ca89cd06d39ed3a6bbe16f60fea4d7382849edbd/src/hotspot/cpu/zero/assembler_zero.inline.hpp#L29-L32
>> 
>> They should probably include `assembler.hpp` instead.
>> 
>> Testing: tier1 in GHA
>
> It looks like there were a few include cycles. Thanks for fixing this @fandreuz.
> Running tier1-3+ tests...

Thanks for running the tests @dafedafe

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27311#issuecomment-3306064151

From mhaessig at openjdk.org  Thu Sep 18 08:28:29 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Thu, 18 Sep 2025 08:28:29 GMT
Subject: RFR: 8359412: Template-Framework Library: Operations and
 Expressions [v6]
In-Reply-To: <F86WzhNjF4KSD7bCieVWq8HEj_7Zr0cbeM-27xKPFzI=.ebeb8e75-25c5-491b-86f4-bbca1ed3487a@github.com>
References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com>
 <F86WzhNjF4KSD7bCieVWq8HEj_7Zr0cbeM-27xKPFzI=.ebeb8e75-25c5-491b-86f4-bbca1ed3487a@github.com>
Message-ID: <TzqIM7ebqOd7uSkZlyVKWZY4XjWjQcJwDHzYfJoAnao=.6a86a15e-5adb-4cc9-93a9-eb85865e901f@github.com>

On Thu, 18 Sep 2025 08:12:50 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Implementing ideas from the original draft PR: https://github.com/openjdk/jdk/pull/23418 ([Exceptions](https://github.com/openjdk/jdk/pull/23418/files#diff-77e7db8cc0c5e02786e1c993362f98fabe219042eb342fdaffc09fd11380259dR41), [ExpressionFuzzer](https://github.com/openjdk/jdk/pull/23418/files#diff-01844ca5cb007f5eab5fa4195f2f1378d4e7c64ba477fba64626c98ff4054038R66)).
>> 
>> Specifically, I'm extending the Template Library with `Expression`s, and lists of `Operations` (some basic Expressions). These Expressions can easily be nested and then filled with arguments, and applied in a `Template`.
>> 
>> Details, in **order you should review**:
>> - `Operations.java`: maps lots of primitive operators as Expressions.
>> - `Expression.java`: the fundamental engine behind Expressions.
>> - `examples/TestExpressions.java`: basic example using Expressions, filling them with random constants.
>> - `tests/TestExpression.java`: correctness test of Expression machinery.
>> - `compiler/igvn/ExpressionFuzzer.java`: expression fuzzer for primitive type expressions, including input range/bits constraints and output range/bits verification.
>> - `PrimitiveType.java`: added `LibraryRNG` facility. We already had `type.con()` which gave us random constants. But we also want to have `type.callLibraryRNG()` so that we can insert a call to a random number generator of the corresponding primitive type. I use this facility in the `ExpressionFuzzer.java` to generate random arguments for the expressions.
>> - `examples/TestPrimitiveTypes.java`: added a `LibraryRNG` example with a weak randomness test: at least 2 different values must appear in 1000 calls.
>> 
>> If the reviewers absolutely insist, I could split out `LibraryRNG` into a separate RFE. But it's really not that much code, and has direct use in the `Expression` examples.
>> 
>> **Future Work**:
>> - Use `Expression`s in a loop over arrays / MemorySegment: fuzz auto-vectorization.
>> - Use `Expression`s to model more operations:
>>   - `Vector API`, more arithmetic operations like from `Math` classes etc.
>> - Ensure that the constraints / checksum mechanic in `compiler/igvn/ExpressionFuzzer.java` work, using IR rules. We may even need to add new IGVN optimizations. Add unsigned constraints.
>> - Find a way to delay IGVN optimizations to test worklist notification: For example, we could add a new testing operator call `TestUtils.delay(x) -> x`, which is intrinsified as some new `DelayNode` that in normal circumstances just fol...
>
> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - more comments
>  - add othervm to test
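The weak randomness test described for the `LibraryRNG` example (at least 2 different values in 1000 calls) boils down to counting distinct draws. A minimal sketch of that check, assuming the generator can be wrapped as a `Supplier`; `WeakRandomnessCheck` is a hypothetical name and not part of the PR:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical helper sketching the "at least N distinct values in M calls" check.
final class WeakRandomnessCheck {
    private WeakRandomnessCheck() {}

    // Returns true if at least minDistinct different values appear within `calls` draws.
    static <T> boolean hasEnoughDistinct(Supplier<T> rng, int calls, int minDistinct) {
        Set<T> seen = new HashSet<>();
        for (int i = 0; i < calls; i++) {
            seen.add(rng.get());
            if (seen.size() >= minDistinct) {
                return true; // stop early once the threshold is met
            }
        }
        return false;
    }
}
```

The check is deliberately weak: it only catches a generator that is stuck on a single value, not one with a poor distribution.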

Thank you for addressing my comments.

-------------

Marked as reviewed by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/26885#pullrequestreview-3238136469

From mhaessig at openjdk.org  Thu Sep 18 08:29:34 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Thu, 18 Sep 2025 08:29:34 GMT
Subject: RFR: 8366875: CompileTaskTimeout should be reset for each
 iteration of RepeatCompilation [v2]
In-Reply-To: <em2n02cGTkEEL9PRHfV9pRcQga7Yft_yndZGv4lzbLA=.5c5f7ef9-6e07-43af-9e9a-e72e7bbfff6e@github.com>
References: <4TbOkAMu-KU_tgQPg1sK0L8oto_0nD4mQo7yc0hJPm4=.8d87b900-a614-4c13-a4c6-6fe11e206482@github.com>
 <em2n02cGTkEEL9PRHfV9pRcQga7Yft_yndZGv4lzbLA=.5c5f7ef9-6e07-43af-9e9a-e72e7bbfff6e@github.com>
Message-ID: <KId5PEf6rx3-yttjB6oqjk_xe1AetR_lb29SbBjTrrA=.e4e09fae-4029-4faa-a940-193d986d12a8@github.com>

On Sat, 6 Sep 2025 00:31:56 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
>
> Looks good!

Testing passed. Could you please re-review, @dean-long and @eme64?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27120#issuecomment-3306160497

From mhaessig at openjdk.org  Thu Sep 18 08:32:57 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Thu, 18 Sep 2025 08:32:57 GMT
Subject: RFR: 8367721: Test compiler/arguments/TestCompileTaskTimeout.java
 crashed: SIGSEGV
In-Reply-To: <pLXUEKQrdaFVPHDxKNz09fxcxT4_YzwKQiHRbqqAc84=.0bfeb69f-3425-43c0-be52-222edb0bde65@github.com>
References: <TOj6QWViAuXCet_-njgSCsuF03jfb4lDHHs9AgUAqcc=.7d00f824-8632-412e-a213-33206968b2cc@github.com>
 <pLXUEKQrdaFVPHDxKNz09fxcxT4_YzwKQiHRbqqAc84=.0bfeb69f-3425-43c0-be52-222edb0bde65@github.com>
Message-ID: <vfxgq6zz6IKeVpvTGHhTlaqMnmCRB_S32AdLepR6-Jk=.879b4974-4dd9-485b-a529-201877efd770@github.com>

On Wed, 17 Sep 2025 11:57:54 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> The test `TestCompileTaskTimeout.java` runs `java -Xcomp -XX:CompileTaskTimeout=1 --version` to demonstrate that the timeout works. Part of the timeout working involves it printing the method of the compile task. Inspecting the core file of the execution that failed with a `SIGSEGV` in the compile task timeout signal handler, the backtrace looks as follows:
>> 
>> #n   <called signal handler>
>> #n+1 CompilerThreadTimeoutLinux::signal_handler()
>> #n+2 <called signal handler>
>> #n+3 timer_settime()
>> #n+4 CompilerThreadTimeoutLinux::disarm()
>> #n+5 CompileTaskWrapper::~CompileTaskWrapper()
>> 
>> So, the compile task hit the timeout during destruction of the underlying `CompileTaskWrapper`. Since the timeout was disarmed only after setting the task to null in the destructor, the signal handler segfaulted when trying to access the method of the compile task to print it out. This PR addresses this issue by disarming the timeout at the top of the destructor.
>> 
>> Because this issue can only be triggered with bad --- or good, depending on your view --- luck on timing, I could not devise a regression test. But this is not too big of an issue, since the CI already caught this issue.
>> 
>> Testing:
>>  - [x] Github Actions
>>  - [x] tier1,tier2,tier3 plus stress testing on Oracle supported platforms
>
> Looks good to me, too!

Thank you for your reviews, @chhagedorn and @marc-chevalier!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27331#issuecomment-3306167108

From mhaessig at openjdk.org  Thu Sep 18 08:32:58 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Thu, 18 Sep 2025 08:32:58 GMT
Subject: Integrated: 8367721: Test
 compiler/arguments/TestCompileTaskTimeout.java crashed: SIGSEGV
In-Reply-To: <TOj6QWViAuXCet_-njgSCsuF03jfb4lDHHs9AgUAqcc=.7d00f824-8632-412e-a213-33206968b2cc@github.com>
References: <TOj6QWViAuXCet_-njgSCsuF03jfb4lDHHs9AgUAqcc=.7d00f824-8632-412e-a213-33206968b2cc@github.com>
Message-ID: <VTia5OOBvDrNW13Ij57bl5nlXQCgzDtQn49Ui0Bbq1M=.8689377d-691b-44e7-916b-b0d5b9eaec56@github.com>

On Wed, 17 Sep 2025 06:57:29 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

> The test `TestCompileTaskTimeout.java` runs `java -Xcomp -XX:CompileTaskTimeout=1 --version` to demonstrate that the timeout works. Part of the timeout working involves it printing the method of the compile task. Inspecting the core file of the execution that failed with a `SIGSEGV` in the compile task timeout signal handler, the backtrace looks as follows:
> 
> #n   <called signal handler>
> #n+1 CompilerThreadTimeoutLinux::signal_handler()
> #n+2 <called signal handler>
> #n+3 timer_settime()
> #n+4 CompilerThreadTimeoutLinux::disarm()
> #n+5 CompileTaskWrapper::~CompileTaskWrapper()
> 
> So, the compile task hit the timeout during destruction of the underlying `CompileTaskWrapper`. Since the timeout was disarmed only after setting the task to null in the destructor, the signal handler segfaulted when trying to access the method of the compile task to print it out. This PR addresses this issue by disarming the timeout at the top of the destructor.
> 
> Because this issue can only be triggered with bad --- or good, depending on your view --- luck on timing, I could not devise a regression test. But this is not too big of an issue, since the CI already caught this issue.
> 
> Testing:
>  - [x] Github Actions
>  - [x] tier1,tier2,tier3 plus stress testing on Oracle supported platforms
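The ordering problem can be illustrated with a small, deterministic Java analog. This is not HotSpot code and all names are hypothetical: `fireIfArmed()` simulates the timer expiring at that exact point in the teardown, standing in for the asynchronous signal. With the original order the handler dereferences an already-cleared task; disarming first leaves nothing to fire.

```java
// Deterministic Java analog of the teardown-ordering issue (not HotSpot code).
// All names are hypothetical; fireIfArmed() simulates an unlucky timer expiry.
final class TimeoutOrderingDemo {
    private TimeoutOrderingDemo() {}

    static final class Timer {
        private Runnable handler;

        void arm(Runnable h) { handler = h; }

        void disarm() { handler = null; }

        void fireIfArmed() {
            if (handler != null) {
                handler.run();
            }
        }
    }

    static final class TaskWrapper {
        private final Timer timer = new Timer();
        private String task; // stands in for the compile task

        TaskWrapper(String task, StringBuilder log) {
            this.task = task;
            // Like the timeout handler, this dereferences the task to print it.
            timer.arm(() -> {
                String name = this.task.trim(); // NPE if the task was already cleared
                log.append("timeout in ").append(name);
            });
        }

        // Original order: the task is cleared while the timer is still armed.
        void closeOriginalOrder() {
            task = null;
            timer.fireIfArmed(); // unlucky expiry here: handler sees a null task
            timer.disarm();
        }

        // Fixed order: disarm first, so a late expiry has nothing to run.
        void closeFixedOrder() {
            timer.disarm();
            timer.fireIfArmed(); // same unlucky point, now a no-op
            task = null;
        }
    }
}
```

In HotSpot the handler runs on the compiler thread itself, which is why disarming before any teardown is sufficient there; a multi-threaded Java watchdog would additionally need to wait for an in-flight handler.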

This pull request has now been integrated.

Changeset: 04dcaa34
Author:    Manuel Hässig <mhaessig at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/04dcaa3412d07c407aed604874095acaf81d7309
Stats:     5 lines in 1 file changed: 4 ins; 1 del; 0 mod

8367721: Test compiler/arguments/TestCompileTaskTimeout.java crashed: SIGSEGV

Reviewed-by: mchevalier, chagedorn

-------------

PR: https://git.openjdk.org/jdk/pull/27331

From mhaessig at openjdk.org  Thu Sep 18 08:34:58 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Thu, 18 Sep 2025 08:34:58 GMT
Subject: RFR: 8367969: C2: compiler/vectorapi/TestVectorMathLib.java fails
 without UnlockDiagnosticVMOptions
In-Reply-To: <N0Mb54Mx8-EpFwvJ3Fvg11CKp9u_vWPe8U1yhvPvt3I=.60119a79-ced3-4a01-b619-012caa3e38ab@github.com>
References: <N0Mb54Mx8-EpFwvJ3Fvg11CKp9u_vWPe8U1yhvPvt3I=.60119a79-ced3-4a01-b619-012caa3e38ab@github.com>
Message-ID: <cmbiWPftccgEz_zMFbfTxHIWCFavoGDf8syWM1_rpdU=.25347d2b-2b65-49d6-8a85-67cb0c5498f6@github.com>

On Thu, 18 Sep 2025 07:55:10 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> Adding missing `-XX:+UnlockDiagnosticVMOptions`.
> 
> Seems the test from https://github.com/openjdk/jdk/pull/27263 was not tested with the product build before integration.

Thank you for this fix, @eme64. It looks good to me.

-------------

Marked as reviewed by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27359#pullrequestreview-3238186192

From ayang at openjdk.org  Thu Sep 18 09:04:08 2025
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Thu, 18 Sep 2025 09:04:08 GMT
Subject: RFR: 8367740: assembler_<cpu>.inline.hpp should not include
 assembler.inline.hpp
In-Reply-To: <QYuy3O1NQdXFTX_2k292z-7Ni02AHU1VhbIZDzwG8hk=.48b8361e-0442-41d3-a319-5a506d0cf650@github.com>
References: <QYuy3O1NQdXFTX_2k292z-7Ni02AHU1VhbIZDzwG8hk=.48b8361e-0442-41d3-a319-5a506d0cf650@github.com>
Message-ID: <AjeEULpOQrr-3l8uqbQMkbntG1Y-1X9LyQug02sTvXA=.1c22344e-067b-4172-b08b-093ecfeba029@github.com>

On Tue, 16 Sep 2025 10:15:06 GMT, Francesco Andreuzzi <fandreuzzi at openjdk.org> wrote:

> This is the content of assembler.inline.hpp:
> https://github.com/openjdk/jdk/blob/ca89cd06d39ed3a6bbe16f60fea4d7382849edbd/src/hotspot/share/asm/assembler.inline.hpp#L28-L30
> 
> Most of the `assembler_<cpu>.inline.hpp` include it:
> https://github.com/openjdk/jdk/blob/ca89cd06d39ed3a6bbe16f60fea4d7382849edbd/src/hotspot/cpu/zero/assembler_zero.inline.hpp#L29-L32
> 
> They should probably include `assembler.hpp` instead.
> 
> Testing: tier1 in GHA

Marked as reviewed by ayang (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/27311#pullrequestreview-3238321532

From duke at openjdk.org  Thu Sep 18 09:07:20 2025
From: duke at openjdk.org (duke)
Date: Thu, 18 Sep 2025 09:07:20 GMT
Subject: RFR: 8367740: assembler_<cpu>.inline.hpp should not include
 assembler.inline.hpp
In-Reply-To: <QYuy3O1NQdXFTX_2k292z-7Ni02AHU1VhbIZDzwG8hk=.48b8361e-0442-41d3-a319-5a506d0cf650@github.com>
References: <QYuy3O1NQdXFTX_2k292z-7Ni02AHU1VhbIZDzwG8hk=.48b8361e-0442-41d3-a319-5a506d0cf650@github.com>
Message-ID: <x_iGQRnb3ol8c1BVaCQbKla7N9wBY24cdJofF5fDzMI=.14446950-ee55-4672-a000-cbd3d8322158@github.com>

On Tue, 16 Sep 2025 10:15:06 GMT, Francesco Andreuzzi <fandreuzzi at openjdk.org> wrote:

> This is the content of assembler.inline.hpp:
> https://github.com/openjdk/jdk/blob/ca89cd06d39ed3a6bbe16f60fea4d7382849edbd/src/hotspot/share/asm/assembler.inline.hpp#L28-L30
> 
> Most of the `assembler_<cpu>.inline.hpp` include it:
> https://github.com/openjdk/jdk/blob/ca89cd06d39ed3a6bbe16f60fea4d7382849edbd/src/hotspot/cpu/zero/assembler_zero.inline.hpp#L29-L32
> 
> They should probably include `assembler.hpp` instead.
> 
> Testing: tier1 in GHA

@fandreuz 
Your change (at version ce90f21fb1b61d82f14bd24381914caa81ff2a1f) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27311#issuecomment-3306378862

From fandreuzzi at openjdk.org  Thu Sep 18 09:12:45 2025
From: fandreuzzi at openjdk.org (Francesco Andreuzzi)
Date: Thu, 18 Sep 2025 09:12:45 GMT
Subject: Integrated: 8367740: assembler_<cpu>.inline.hpp should not include
 assembler.inline.hpp
In-Reply-To: <QYuy3O1NQdXFTX_2k292z-7Ni02AHU1VhbIZDzwG8hk=.48b8361e-0442-41d3-a319-5a506d0cf650@github.com>
References: <QYuy3O1NQdXFTX_2k292z-7Ni02AHU1VhbIZDzwG8hk=.48b8361e-0442-41d3-a319-5a506d0cf650@github.com>
Message-ID: <kYG5wzCWH50WrM7N3IVu8N4Bvvyih2tWCZsbDntirL4=.92b7b263-c6d1-45e9-8692-9050cb8a1400@github.com>

On Tue, 16 Sep 2025 10:15:06 GMT, Francesco Andreuzzi <fandreuzzi at openjdk.org> wrote:

> This is the content of assembler.inline.hpp:
> https://github.com/openjdk/jdk/blob/ca89cd06d39ed3a6bbe16f60fea4d7382849edbd/src/hotspot/share/asm/assembler.inline.hpp#L28-L30
> 
> Most of the `assembler_<cpu>.inline.hpp` include it:
> https://github.com/openjdk/jdk/blob/ca89cd06d39ed3a6bbe16f60fea4d7382849edbd/src/hotspot/cpu/zero/assembler_zero.inline.hpp#L29-L32
> 
> They should probably include `assembler.hpp` instead.
> 
> Testing: tier1 in GHA

This pull request has now been integrated.

Changeset: 4c7c009d
Author:    Francesco Andreuzzi <fandreuzzi at openjdk.org>
Committer: Damon Fenacci <dfenacci at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/4c7c009dd6aa2ce1f65f05c05d7376240f3c01cd
Stats:     5 lines in 5 files changed: 0 ins; 0 del; 5 mod

8367740: assembler_<cpu>.inline.hpp should not include assembler.inline.hpp

Reviewed-by: dfenacci, ayang

-------------

PR: https://git.openjdk.org/jdk/pull/27311

From duke at openjdk.org  Thu Sep 18 09:41:40 2025
From: duke at openjdk.org (Don Phelix)
Date: Thu, 18 Sep 2025 09:41:40 GMT
Subject: RFR: 8367969: C2: compiler/vectorapi/TestVectorMathLib.java fails
 without UnlockDiagnosticVMOptions
In-Reply-To: <N0Mb54Mx8-EpFwvJ3Fvg11CKp9u_vWPe8U1yhvPvt3I=.60119a79-ced3-4a01-b619-012caa3e38ab@github.com>
References: <N0Mb54Mx8-EpFwvJ3Fvg11CKp9u_vWPe8U1yhvPvt3I=.60119a79-ced3-4a01-b619-012caa3e38ab@github.com>
Message-ID: <HlArd5_Y1IYR8ayY1BbDUBjcxowGAOjoMh1_19U_0Qw=.cca4a2bb-a051-4dda-bcde-aae75f5e61b1@github.com>

On Thu, 18 Sep 2025 07:55:10 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> Adding missing `-XX:+UnlockDiagnosticVMOptions`.
> 
> Seems the test from https://github.com/openjdk/jdk/pull/27263 was not tested with the product build before integration.

LGTM :)

-------------

Marked as reviewed by donphelix at github.com (no known OpenJDK username).

PR Review: https://git.openjdk.org/jdk/pull/27359#pullrequestreview-3238384691

From bulasevich at openjdk.org  Thu Sep 18 10:47:15 2025
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Thu, 18 Sep 2025 10:47:15 GMT
Subject: RFR: 8359378: aarch64: crash when using -XX:+UseFPUForSpilling
In-Reply-To: <CC-N4gxN7vPXi8Z7FxD_KJ0SZhsaM9lC2GhZQ2HS1y4=.ec0c7b0a-6c56-4fb8-abb1-01448066cc9a@github.com>
References: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
 <CC-N4gxN7vPXi8Z7FxD_KJ0SZhsaM9lC2GhZQ2HS1y4=.ec0c7b0a-6c56-4fb8-abb1-01448066cc9a@github.com>
Message-ID: <0hDCgQdHVY_yIY00TsLYZlcI7aKnw992z_x0DhqvhIY=.a5ef47d5-cfa8-4117-85b2-9e4c45d50975@github.com>

On Thu, 18 Sep 2025 07:03:52 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> The AArch64 BarrierSetAssembler path assumes that only FP/vector ideal regs reach the FP spill/restore encoding. With -XX:+UseFPUForSpilling, the register allocator may allocate scalar values in FP registers. When such values (Op_RegI/Op_RegN/Op_RegL/Op_RegP) hit `BarrierSetAssembler::encode_float_vector_register_size`, we trip ShouldNotReachHere in release builds and the **"unexpected ideal register"** assertion in debug builds.
>> 
>> Fix: teach the encoder to handle scalar ideal regs when they physically live in FP regs:
>> - treat Op_RegI / Op_RegN as 32-bit (single slot) - same class as Op_RegF
>> - treat Op_RegL / Op_RegP as 64-bit (two slots) - same class as Op_RegD
>> 
>> Related:
>> - reproduced since #19746
>> - spilling logic: 
>>   - #18967
>>   - #17977
>> 
>> Testing: tier1-3 with javaoptions -Xcomp -Xbatch -XX:+UseFPUForSpilling on AArch64
>
> Hi @bulasevich, thanks for working on this issue, but please note that it was already assigned to me ([JDK-8359378](https://bugs.openjdk.org/browse/JDK-8359378)). I am fine with re-assigning it to you, but [next time please ask first, to avoid work duplication](https://openjdk.org/guide/#i-found-an-issue-in-jbs-that-i-want-to-fix).

Right, @robcasloz,
I started investigating this issue thinking it was something wrong in my own code. Once I realized it was a common issue already assigned, I decided to propose a fix since it looked a bit abandoned. I didn't mean to bypass your work -- you're right, I should have contacted you first.
Anyway, I'd appreciate your review. Do you think my change is reasonable? If not, let me close this PR and leave it to you.
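The encoder change described in the PR can be modeled roughly as follows. This is a minimal, hypothetical sketch: the enum, its values, and the name `fp_spill_size_in_bytes` are illustrative assumptions, not the actual `BarrierSetAssembler::encode_float_vector_register_size` code. Only the size classes (Op_RegI/Op_RegN with Op_RegF, Op_RegL/Op_RegP with Op_RegD) come from the description.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-ins for C2 ideal register kinds. The names mirror
// HotSpot's Op_Reg*/Op_Vec* constants, but this enum is an assumption.
enum IdealReg {
  Op_RegI, Op_RegN, Op_RegF,
  Op_RegL, Op_RegP, Op_RegD,
  Op_VecD, Op_VecX, Op_RegFlags
};

// Bytes to spill/restore for a value that physically lives in an FP/SIMD
// register. With -XX:+UseFPUForSpilling, scalar values may also end up in
// FP registers, so they are folded into the existing float/double classes.
int fp_spill_size_in_bytes(IdealReg ideal_reg) {
  switch (ideal_reg) {
    case Op_RegI:   // 32-bit scalar, single slot: same class as Op_RegF
    case Op_RegN:   // narrow oop, 32-bit: same class as Op_RegF
    case Op_RegF:
      return 4;
    case Op_RegL:   // 64-bit scalar, two slots: same class as Op_RegD
    case Op_RegP:   // pointer, 64-bit: same class as Op_RegD
    case Op_RegD:
    case Op_VecD:   // 64-bit vector
      return 8;
    case Op_VecX:   // 128-bit vector
      return 16;
    default:
      // Models the "unexpected ideal register" failure from the report.
      std::fprintf(stderr, "unexpected ideal register\n");
      std::abort();
  }
}
```

In this sketch no new size class is introduced: the scalar kinds simply reuse the 32-bit and 64-bit cases that FP values already take.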

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27350#issuecomment-3306764696

From aph at openjdk.org  Thu Sep 18 10:57:17 2025
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 18 Sep 2025 10:57:17 GMT
Subject: RFR: 8359378: aarch64: crash when using -XX:+UseFPUForSpilling
In-Reply-To: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
References: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
Message-ID: <Hf04hvehmScqdm1Bsvw_1n0GQTNnwx6WJ7Pno3zp0EE=.90afbd02-305a-4b11-ae3a-f06423a3f013@github.com>

On Wed, 17 Sep 2025 16:19:12 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:

> The AArch64 BarrierSetAssembler path assumes that only FP/vector ideal regs reach the FP spill/restore encoding. With -XX:+UseFPUForSpilling, the register allocator may allocate scalar values in FP registers. When such values (Op_RegI/Op_RegN/Op_RegL/Op_RegP) hit `BarrierSetAssembler::encode_float_vector_register_size`, we trip ShouldNotReachHere in release builds and the **"unexpected ideal register"** assertion in debug builds.
> 
> Fix: teach the encoder to handle scalar ideal regs when they physically live in FP regs:
> - treat Op_RegI / Op_RegN as 32-bit (single slot) - same class as Op_RegF
> - treat Op_RegL / Op_RegP as 64-bit (two slots) - same class as Op_RegD
> 
> Related:
> - reproduced since #19746
> - spilling logic: 
>   - #18967
>   - #17977
> 
> Testing: tier1-3 with javaoptions -Xcomp -Xbatch -XX:+UseFPUForSpilling on AArch64

Given that you're looking at this, I'd appreciate it if you could form an opinion about whether this option is of any use.

`UseFPUForSpilling` on AArch64 is showing signs of code rot. If it has advantages on some machine we should turn it on by default; if it does not, why support it at all?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27350#issuecomment-3306800418

From epeter at openjdk.org  Thu Sep 18 11:12:24 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 11:12:24 GMT
Subject: RFR: 8367969: C2: compiler/vectorapi/TestVectorMathLib.java fails
 without UnlockDiagnosticVMOptions
In-Reply-To: <cmbiWPftccgEz_zMFbfTxHIWCFavoGDf8syWM1_rpdU=.25347d2b-2b65-49d6-8a85-67cb0c5498f6@github.com>
References: <N0Mb54Mx8-EpFwvJ3Fvg11CKp9u_vWPe8U1yhvPvt3I=.60119a79-ced3-4a01-b619-012caa3e38ab@github.com>
 <cmbiWPftccgEz_zMFbfTxHIWCFavoGDf8syWM1_rpdU=.25347d2b-2b65-49d6-8a85-67cb0c5498f6@github.com>
Message-ID: <JOKFmwRIIGhZL_QAq9LBpwsCb73TbTGWWJG11ezI7qw=.ef886abe-3a77-4f6c-b4e2-64415cc0607f@github.com>

On Thu, 18 Sep 2025 08:32:23 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> Adding missing `-XX:+UnlockDiagnosticVMOptions`.
>> 
>> Seems the test from https://github.com/openjdk/jdk/pull/27263 was not tested with the product build before integration.
>
> Thank you for this fix, @eme64. It looks good to me.

@mhaessig @shipilev Thanks for the reviews!

I agree that it is trivial, so I'm integrating before the 24h mark to quiet the CI.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27359#issuecomment-3306860478

From epeter at openjdk.org  Thu Sep 18 11:12:26 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 11:12:26 GMT
Subject: Integrated: 8367969: C2: compiler/vectorapi/TestVectorMathLib.java
 fails without UnlockDiagnosticVMOptions
In-Reply-To: <N0Mb54Mx8-EpFwvJ3Fvg11CKp9u_vWPe8U1yhvPvt3I=.60119a79-ced3-4a01-b619-012caa3e38ab@github.com>
References: <N0Mb54Mx8-EpFwvJ3Fvg11CKp9u_vWPe8U1yhvPvt3I=.60119a79-ced3-4a01-b619-012caa3e38ab@github.com>
Message-ID: <fIu2SZBkT262567rsBxjw96mRyG2m4iBjqP3AgOveMY=.93ad2dda-8d6a-483b-b897-f2b0bc44bf63@github.com>

On Thu, 18 Sep 2025 07:55:10 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> Adding missing `-XX:+UnlockDiagnosticVMOptions`.
> 
> Seems the test from https://github.com/openjdk/jdk/pull/27263 was not tested with the product build before integration.

This pull request has now been integrated.

Changeset: a49856bb
Author:    Emanuel Peter <epeter at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/a49856bb044057a738ffc4186e1e5e3916c0254c
Stats:     3 lines in 1 file changed: 2 ins; 0 del; 1 mod

8367969: C2: compiler/vectorapi/TestVectorMathLib.java fails without UnlockDiagnosticVMOptions

Reviewed-by: shade, mhaessig

-------------

PR: https://git.openjdk.org/jdk/pull/27359

From bulasevich at openjdk.org  Thu Sep 18 11:57:18 2025
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Thu, 18 Sep 2025 11:57:18 GMT
Subject: RFR: 8359378: aarch64: crash when using -XX:+UseFPUForSpilling
In-Reply-To: <Hf04hvehmScqdm1Bsvw_1n0GQTNnwx6WJ7Pno3zp0EE=.90afbd02-305a-4b11-ae3a-f06423a3f013@github.com>
References: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
 <Hf04hvehmScqdm1Bsvw_1n0GQTNnwx6WJ7Pno3zp0EE=.90afbd02-305a-4b11-ae3a-f06423a3f013@github.com>
Message-ID: <6rmFXFPH5a9AnoLqSQa5XBplICEu961cu-6HuXh4EX4=.30793abf-cfb4-49dc-8733-e5e17357f1a8@github.com>

On Thu, 18 Sep 2025 10:54:58 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> The AArch64 BarrierSetAssembler path assumes that only FP/vector ideal regs reach the FP spill/restore encoding. With -XX:+UseFPUForSpilling, the register allocator may allocate scalar values in FP registers. When such values (Op_RegI/Op_RegN/Op_RegL/Op_RegP) hit `BarrierSetAssembler::encode_float_vector_register_size`, we trip ShouldNotReachHere in release builds and the **"unexpected ideal register"** assertion in debug builds.
>> 
>> Fix: teach the encoder to handle scalar ideal regs when they physically live in FP regs:
>> - treat Op_RegI / Op_RegN as 32-bit (single slot) - same class as Op_RegF
>> - treat Op_RegL / Op_RegP as 64-bit (two slots) - same class as Op_RegD
>> 
>> Related:
>> - reproduced since #19746
>> - spilling logic: 
>>   - #18967
>>   - #17977
>> 
>> Testing: tier1-3 with javaoptions -Xcomp -Xbatch -XX:+UseFPUForSpilling on AArch64
>
> Given that you're looking at this, I'd appreciate it if you could form an opinion about whether this option is of any use.
> 
> `UseFPUForSpilling` on AArch64 is showing signs of code rot. If it has advantages on some machine we should turn it on by default; if it does not, why support it at all?

@theRealAph Andrew, I agree with you. In my experience it is useless on Cortex-A72, Neoverse N1, and Neoverse V1. I have now also checked Neoverse V2 and Apple M4: in both cases UseFPUForSpilling shows a clear performance degradation.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27350#issuecomment-3307050077

From epeter at openjdk.org  Thu Sep 18 11:58:25 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 11:58:25 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v11]
In-Reply-To: <U1p9TRvNC5OzFwxraenVJ-4R4AV5Tnyhho025Fyh-ow=.5b0cc316-6852-4f90-aede-7363eac525e7@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <1pShdyn-7-wwwiuY1DdMt5iiZ2qc9l_x2F-3AKqkg60=.dd260953-05cc-4b84-b6d1-7f684e74084c@github.com>
 <EZmv1oLssvikAF7lwHm77Q5kmfuKvoqXLhxGvSL1Mho=.c7db1b91-3536-48ad-90c6-c4c098094706@github.com>
 <rJb4N4TiJ4HhAMEFTVFn1e1qPpPfEwiBB6ezo7nM-cg=.fca1c659-150e-4d97-a47a-01bd836d867d@github.com>
 <U1p9TRvNC5OzFwxraenVJ-4R4AV5Tnyhho025Fyh-ow=.5b0cc316-6852-4f90-aede-7363eac525e7@github.com>
Message-ID: <IQTZDf2gw-_aL56PTtGFS6j7suZF4a9Hr662V844Xfo=.68420ca5-42dc-4950-b7d5-65fcd3635e0d@github.com>

On Tue, 16 Sep 2025 01:24:35 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Could we also bail out here? Or what would happen now in production if there is an RF edge?
>
> We also use this area past endoff() for storing the "ex_oop" (see for example GraphKit::has_saved_ex_oop()).  Are ex_oop and reachability edges mutually exclusive?

@dean-long @iwanowww OK, but there will probably be a conflict at some point. And if RFs are rather rare, we will not notice it quickly. Or would your stress flag catch the conflict?

Is there not a way to make it clear/explicit which edges are there for what reason?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2358840330

From epeter at openjdk.org  Thu Sep 18 11:58:26 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 11:58:26 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v8]
In-Reply-To: <xP4A6eKQTNQXipJZg6T_seJd34FLciuhATeG7nFMatw=.61c8e108-23c3-4898-9ec5-94bc86c6157d@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <LMu3QXiw3K_5-conH2FlBXyRWNzNDunp0L86Ubxeju8=.86d4010f-ab7b-42cf-a9c3-5c24c23d2a81@github.com>
 <LSNdJRy0deekqrysCXuqGFogGIkwJ5MJEcefFMskxwY=.a28eb64e-ee52-432c-a4cf-9504aaf4a2e1@github.com>
 <_n3uP_Dkl3RNq3MFoRDXsS28SM8CcQHaR6vdUJF9U8s=.dcfab97b-be28-4244-93df-c8a23d6d66b8@github.com>
 <IcKeEoxM236ICcdLmdt-k_K1peNrxOnlpCLZa_3H4eA=.648697cb-baad-47f1-96d6-f8b13d9b69db@github.com>
 <hRJMVwoNPY2xD1ntsuHYrmG283r7RCyAcTZZ6USWe4A=.25f49491-163e-44f8-946c-e157f8837250@github.com>
 <xP4A6eKQTNQXipJZg6T_seJd34FLciuhATeG7nFMatw=.61c8e108-23c3-4898-9ec5-94bc86c6157d@github.com>
Message-ID: <I00Fs-iY2l3PGGyVrp10VB54XTtL7dYRgvjbc4jxKS8=.3ddcdd6a-38d8-48d9-bc8d-6e2301d5ab47@github.com>

On Wed, 17 Sep 2025 22:26:57 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> @iwanowww
>
> The code in `PhaseMacroExpand::process_users_of_allocation` iterates over direct users of the result cast from Allocation nodes. And RF is not special there. Any other case in `PhaseMacroExpand::process_users_of_allocation()` would be affected.

Ah ok. As long as it only iterates over the result cast, that is good :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2358818812

From epeter at openjdk.org  Thu Sep 18 11:58:28 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 11:58:28 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v8]
In-Reply-To: <OPZ9TXu8K1JvTi4C_dvvtXyVMYyT-oLGpzq1lOhDBKI=.cc9d4827-dadb-49d2-9132-0127a81d1854@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <LMu3QXiw3K_5-conH2FlBXyRWNzNDunp0L86Ubxeju8=.86d4010f-ab7b-42cf-a9c3-5c24c23d2a81@github.com>
 <LSNdJRy0deekqrysCXuqGFogGIkwJ5MJEcefFMskxwY=.a28eb64e-ee52-432c-a4cf-9504aaf4a2e1@github.com>
 <Xa1zqmkXNFFxwRU6q-sZIXomxJtIQLepUwcx5IuuE-c=.4affc0fb-1036-4635-9c41-ec76225d1f60@github.com>
 <Ci-CQQY-qs8vwCJmOqh2gmFHaULHRF1o9MXTu15rCJg=.becf246a-86a9-4adf-a2b6-ebfe27676347@github.com>
 <JKD0jqllmhfDeNwEd1g7LMAx8V4idsZFTXrS4C0KkUI=.47cf85da-9d62-4339-8bbd-80821a48ac32@github.com>
 <4jTV6y9R_JfATA54LC7FK3DKdBX1srsU09DK1I25Uo0=.94233927-71f2-4f13-894d-206d00f5fdaa@github.com>
 <OPZ9TXu8K1JvTi4C_dvvtXyVMYyT-oLGpzq1lOhDBKI=.cc9d4827-dadb-49d2-9132-0127a81d1854@github.com>
Message-ID: <sygRN1UY8o8JXngLbwPFbIiyPhGbn87rCqrsyy2VBDI=.bbf51ec0-588a-4da8-acce-24f54878289f@github.com>

On Wed, 17 Sep 2025 19:48:44 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> Ah yes: we may for example move a store out (after) the loop. But wait. We can't move a store across a SafePoint, so that's not a good example.
>
> For example, loads suffer from the same problems as stores, but constraints on them are more lax.

Are you saying we are allowed to move loads across SafePoints, but not across RF? If yes, please add such an example to the code comments ;)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2358830331

From rcastanedalo at openjdk.org  Thu Sep 18 12:00:16 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 18 Sep 2025 12:00:16 GMT
Subject: RFR: 8359378: aarch64: crash when using -XX:+UseFPUForSpilling
In-Reply-To: <CC-N4gxN7vPXi8Z7FxD_KJ0SZhsaM9lC2GhZQ2HS1y4=.ec0c7b0a-6c56-4fb8-abb1-01448066cc9a@github.com>
References: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
 <CC-N4gxN7vPXi8Z7FxD_KJ0SZhsaM9lC2GhZQ2HS1y4=.ec0c7b0a-6c56-4fb8-abb1-01448066cc9a@github.com>
Message-ID: <gIbEsRs0odSigjEw7DvOJaGe-fXQ3pAIcX1so4Kb3xg=.f0c6004a-4dfd-45c7-898c-1ed7e7178236@github.com>

On Thu, 18 Sep 2025 07:03:52 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> The AArch64 BarrierSetAssembler path assumes that only FP/vector ideal regs reach the FP spill/restore encoding. With -XX:+UseFPUForSpilling, the register allocator may allocate scalar values in FP registers. When such values (Op_RegI/Op_RegN/Op_RegL/Op_RegP) hit `BarrierSetAssembler::encode_float_vector_register_size`, we trip ShouldNotReachHere in release builds and the **"unexpected ideal register"** assertion in debug builds.
>> 
>> Fix: teach the encoder to handle scalar ideal regs when they physically live in FP regs:
>> - treat Op_RegI / Op_RegN as 32-bit (single slot) - same class as Op_RegF
>> - treat Op_RegL / Op_RegP as 64-bit (two slots) - same class as Op_RegD
>> 
>> Related:
>> - reproduced since #19746
>> - spilling logic: 
>>   - #18967
>>   - #17977
>> 
>> Testing: tier1-3 with javaoptions -Xcomp -Xbatch -XX:+UseFPUForSpilling on AArch64
>
> Hi @bulasevich, thanks for working on this issue, but please note that it was already assigned to me ([JDK-8359378](https://bugs.openjdk.org/browse/JDK-8359378)). I am fine with re-assigning it to you, but [next time please ask first, to avoid work duplication](https://openjdk.org/guide/#i-found-an-issue-in-jbs-that-i-want-to-fix).

> Right, @robcasloz, I started investigating this issue thinking it was something wrong in my own code. Once I realized it was a common issue already assigned, I decided to propose a fix since it looked a bit abandoned. I didn't mean to bypass your work -- you're right, I should have contacted you first. Anyway, I'd appreciate your review. Do you think my change is reasonable? If not, let me close this PR and leave it to you.

Thanks, I had planned to look at this in the upcoming weeks but had not started yet. I just reassigned the issue to you and will have a look at your fix in the next few days.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27350#issuecomment-3307058150

From epeter at openjdk.org  Thu Sep 18 12:04:20 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 12:04:20 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v11]
In-Reply-To: <jwNtnLYIGi7j2QdiTUrPGgi5NIbnxTTPlP3eoRkx_mI=.c24b74ae-fdbb-44a9-ac73-9d7fbe6e534b@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <1pShdyn-7-wwwiuY1DdMt5iiZ2qc9l_x2F-3AKqkg60=.dd260953-05cc-4b84-b6d1-7f684e74084c@github.com>
 <EZmv1oLssvikAF7lwHm77Q5kmfuKvoqXLhxGvSL1Mho=.c7db1b91-3536-48ad-90c6-c4c098094706@github.com>
 <rJb4N4TiJ4HhAMEFTVFn1e1qPpPfEwiBB6ezo7nM-cg=.fca1c659-150e-4d97-a47a-01bd836d867d@github.com>
 <7s8qppZ6lzq5iN-inRFkFuXgElo46UmYyIrvExOLA3A=.cf76da61-89ee-4d29-9b5a-0b6e7b3bac2b@github.com>
 <jwNtnLYIGi7j2QdiTUrPGgi5NIbnxTTPlP3eoRkx_mI=.c24b74ae-fdbb-44a9-ac73-9d7fbe6e534b@github.com>
Message-ID: <Ylgu-_OUA8cV7PjHLuEAqY3uXBs0elX-QKc4uWBYiVI=.5876bfa5-8c90-40f7-949d-bee250c3ab7c@github.com>

On Wed, 17 Sep 2025 19:44:52 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>>> Why are you checking for _mode == LoopOptsDefaultFinal and not for LoopOptsEliminateRFs?
>> 
>> The intention is to avoid an extra `PhaseIdealLoop` construction pass solely for `LoopOptsEliminateRFs` purposes when there's an empty pass during normal flow of loop optimizations.   
>> 
>> `LoopOptsEliminateRFs` is performed as the last resort when there was no previous pass to piggyback on.
>
> Maybe `LoopOptsEliminateRFs` should stress that it is intended to happen as the very last step in the flow of loop optimizations. Or, something happening after all other loop optimizations are over. I'll think more about it.
> 
> From a code perspective, what makes things more complicated is that the `PhaseIdealLoop` instance is hidden in `PhaseIdealLoop::optimize()`, so shaping it as a step in the loop opts pipeline feels like the most appropriate thing to do.

@iwanowww It is the last step in the pipeline, but the pipeline could get executed again, right? So then you may think that you have reached the last step in the pipeline, but then in the next pipeline execution, you might have already eliminated the RF, and now you would do some loop-opts that you should not. That's what I'm worried about. Can we have some assert for that?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2358850185

From epeter at openjdk.org  Thu Sep 18 12:04:22 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 12:04:22 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v11]
In-Reply-To: <abW7rcqDPPGwpB4Z44AyiEI86Ortsd07DoofjbuFtSA=.3c9c3299-88fa-4727-be21-7daf27acabf2@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <1pShdyn-7-wwwiuY1DdMt5iiZ2qc9l_x2F-3AKqkg60=.dd260953-05cc-4b84-b6d1-7f684e74084c@github.com>
 <EZmv1oLssvikAF7lwHm77Q5kmfuKvoqXLhxGvSL1Mho=.c7db1b91-3536-48ad-90c6-c4c098094706@github.com>
 <sM14v3wTzmIjccAGdJ19bgJ_w8O6ZfVTzCDAYIPtkh4=.4c158d93-6a29-4024-b5e4-413c6ed29481@github.com>
 <abW7rcqDPPGwpB4Z44AyiEI86Ortsd07DoofjbuFtSA=.3c9c3299-88fa-4727-be21-7daf27acabf2@github.com>
Message-ID: <K67KJJ5-d5BakIo0J0bU0R8yUGc1RKgiVaJ5U-lJOQo=.2146ef05-4f44-40f1-ad0a-2297b94871f4@github.com>

On Wed, 17 Sep 2025 19:53:18 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> Why don't we put RF edges somewhere else, so they don't look like derived oops?  I was thinking they could go in the monitor area, or if that causes problems, we introduce a new area.
>
> It's solely an implementation limitation. As of now, the only structure imposed on safepoint inputs relates to debug info (represented as JVMState). The rest is ad hoc and there are many conflicting use cases introduced over time. The proper way to address it is to introduce a proper structure for non-debug inputs, but it requires significant engineering effort to properly handle it across the whole compilation pipeline. For now, I just work around it by performing an additional transformation to avoid conflicts with existing functionality.

Maybe we should make that effort soon; otherwise we just keep heaping up tech debt :/

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2358853253

From epeter at openjdk.org  Thu Sep 18 12:04:23 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 12:04:23 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v11]
In-Reply-To: <K67KJJ5-d5BakIo0J0bU0R8yUGc1RKgiVaJ5U-lJOQo=.2146ef05-4f44-40f1-ad0a-2297b94871f4@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <1pShdyn-7-wwwiuY1DdMt5iiZ2qc9l_x2F-3AKqkg60=.dd260953-05cc-4b84-b6d1-7f684e74084c@github.com>
 <EZmv1oLssvikAF7lwHm77Q5kmfuKvoqXLhxGvSL1Mho=.c7db1b91-3536-48ad-90c6-c4c098094706@github.com>
 <sM14v3wTzmIjccAGdJ19bgJ_w8O6ZfVTzCDAYIPtkh4=.4c158d93-6a29-4024-b5e4-413c6ed29481@github.com>
 <abW7rcqDPPGwpB4Z44AyiEI86Ortsd07DoofjbuFtSA=.3c9c3299-88fa-4727-be21-7daf27acabf2@github.com>
 <K67KJJ5-d5BakIo0J0bU0R8yUGc1RKgiVaJ5U-lJOQo=.2146ef05-4f44-40f1-ad0a-2297b94871f4@github.com>
Message-ID: <Kqxp1L6Eit1uYwFw4qg3ij-WjcQxAwTlJkAQtNvSOus=.11b05b1a-9790-4c1b-9aec-611197f01c5c@github.com>

On Thu, 18 Sep 2025 11:59:52 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> It's solely an implementation limitation. As of now, the only structure imposed on safepoint inputs relates to debug info (represented as JVMState). The rest is ad hoc and there are many conflicting use cases introduced over time. The proper way to address it is to introduce a proper structure for non-debug inputs, but it requires significant engineering effort to properly handle it across the whole compilation pipeline. For now, I just work around it by performing an additional transformation to avoid conflicts with existing functionality.
>
> Maybe we should make that effort soon; otherwise we just keep heaping up tech debt :/

And who knows, maybe conflicts are only avoided by accident, simply because we have not yet encountered cases where the different features actually overlap and conflict. Or are we confident that we have generated enough cases in which the different features that use the safepoint inputs overlap?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2358857194

From epeter at openjdk.org  Thu Sep 18 12:17:27 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 12:17:27 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v6]
In-Reply-To: <xc00dQJ-uWgObZ0APgMIRk5rjqgjmixV1VJe92lpzq0=.e6503bf6-68b2-44dc-a2a8-ca9e20803d9d@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <xc00dQJ-uWgObZ0APgMIRk5rjqgjmixV1VJe92lpzq0=.e6503bf6-68b2-44dc-a2a8-ca9e20803d9d@github.com>
Message-ID: <3LhOW_sYJcS3zgNB2PLXAQ393WU73hdgjSqmsmoy7VQ=.3cbc1e66-c59e-41b1-80c8-24373797259a@github.com>

On Wed, 17 Sep 2025 08:48:16 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> This is a follow-up patch of [1], which aims at implementing the subword gather load APIs for AArch64 SVE platform.
>> 
>> ### Background
>> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices. SVE provides native gather load instructions for `byte`/`short` types using `int` vectors for indices. The vector size for a gather-load instruction is determined by the index vector (i.e. `int` elements). Hence, the total size is `32 * elem_num` bits, where `elem_num` is the number of loaded elements in the vector register.
>> 
>> ### Implementation
>> 
>> #### Challenges
>> Due to size differences between `int` indices (32-bit) and `byte`/`short` data (8/16-bit), operations must be split across multiple vector registers based on the target SVE vector register size constraints.
>> 
>> For a 512-bit SVE machine, loading a `byte` vector with different vector species requires different approaches:
>> - SPECIES_64: Single operation with mask (8 elements, 256-bit)
>> - SPECIES_128: Single operation, full register (16 elements, 512-bit)
>> - SPECIES_256: Two operations + merge (32 elements, 1024-bit)
>> - SPECIES_512/MAX: Four operations + merge (64 elements, 2048-bit)
>> 
>> Use `ByteVector.SPECIES_512` as an example:
>> - It contains 64 elements, so the index vector size should be `64 * 32` bits, which is 4 times the SVE vector register size.
>> - It requires 4 vector gather-loads to complete the whole operation.
>> 
>> 
>> byte[] arr = [a, a, a, a, ..., a, b, b, b, b, ..., b, c, c, c, c, ..., c, d, d, d, d, ..., d, ...]
>> int[] idx = [0, 1, 2, 3, ..., 63, ...]
>> 
>> 4 gather-load:
>> idx_v1 = [15 14 13 ... 1 0]    gather_v1 = [... 0000 0000 0000 0000 aaaa aaaa aaaa aaaa]
>> idx_v2 = [31 30 29 ... 17 16]  gather_v2 = [... 0000 0000 0000 0000 bbbb bbbb bbbb bbbb]
>> idx_v3 = [47 46 45 ... 33 32]  gather_v3 = [... 0000 0000 0000 0000 cccc cccc cccc cccc]
>> idx_v4 = [63 62 61 ... 49 48]  gather_v4 = [... 0000 0000 0000 0000 dddd dddd dddd dddd]
>> merge: v = [dddd dddd dddd dddd cccc cccc cccc cccc bbbb bbbb bbbb bbbb aaaa aaaa aaaa aaaa]
>> 
>> 
>> #### Solution
>> The implementation simplifies backend complexity by defining each gather load IR to handle one vector gather-load operation, with multiple IRs generated in the compiler mid-end.
>> 
>> Here are the main changes:
>> - Enhanced IR generation with architecture-specific patterns based on `gather_scatter_needs_vector_index()` matcher.
>> - Added `VectorSliceNode` for result mer...
>
> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
> 
>  - Add more comments for IRs and added method
>  - Merge branch 'jdk:master' into JDK-8351623-sve
>  - Merge 'jdk:master' into JDK-8351623-sve
>  - Address review comments
>  - Refine IR pattern and clean backend rules
>  - Fix indentation issue and move the helper matcher method to header files
>  - Merge branch jdk:master into JDK-8351623-sve
>  - 8351623: VectorAPI: Add SVE implementation of subword gather load operation

src/hotspot/cpu/aarch64/matcher_aarch64.hpp line 173:

> 171:   // SVE requires vector indices for gather-load/scatter-store operations on all
> 172:   // data types.
> 173:   static bool gather_scatter_requires_index_in_address(BasicType bt) {

I know I agreed to this naming, but I looked at the signature of `Gather` again:
`LoadVectorGatherNode(Node* c, Node* mem, Node* adr, const TypePtr* at, const TypeVect* vt, Node* indices)`

I'm a little confused now about which `address` your name refers to. Is it `adr`? I think not, because that is the base address, right? Can you clarify a little more? Maybe add it to the documentation of the gather and scatter nodes as well, if you think that helps?

src/hotspot/share/opto/vectornode.hpp line 1121:

> 1119: // that has the same vector type as the node's bottom type. For non-subword types, it must
> 1120: // be. However, for subword types, the basic type of index is int. Hence, the index map
> 1121: // can be either a vector with int elements or an address which saves the int indices.

Very nice, that helps!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2358918581
PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2358924085

From epeter at openjdk.org  Thu Sep 18 12:22:32 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 12:22:32 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v6]
In-Reply-To: <3LhOW_sYJcS3zgNB2PLXAQ393WU73hdgjSqmsmoy7VQ=.3cbc1e66-c59e-41b1-80c8-24373797259a@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <xc00dQJ-uWgObZ0APgMIRk5rjqgjmixV1VJe92lpzq0=.e6503bf6-68b2-44dc-a2a8-ca9e20803d9d@github.com>
 <3LhOW_sYJcS3zgNB2PLXAQ393WU73hdgjSqmsmoy7VQ=.3cbc1e66-c59e-41b1-80c8-24373797259a@github.com>
Message-ID: <--dYtit2PWnrw8fxiHum8BLdxnRAWBNfNAz4eGWYI8E=.ac6c9739-e926-47fe-8c5f-db6ef04b906c@github.com>

On Thu, 18 Sep 2025 12:13:55 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
>> 
>>  - Add more comments for IRs and added method
>>  - Merge branch 'jdk:master' into JDK-8351623-sve
>>  - Merge 'jdk:master' into JDK-8351623-sve
>>  - Address review comments
>>  - Refine IR pattern and clean backend rules
>>  - Fix indentation issue and move the helper matcher method to header files
>>  - Merge branch jdk:master into JDK-8351623-sve
>>  - 8351623: VectorAPI: Add SVE implementation of subword gather load operation
>
> src/hotspot/cpu/aarch64/matcher_aarch64.hpp line 173:
> 
>> 171:   // SVE requires vector indices for gather-load/scatter-store operations on all
>> 172:   // data types.
>> 173:   static bool gather_scatter_requires_index_in_address(BasicType bt) {
> 
> I know I agreed to this naming, but I looked at the signature of `Gather` again:
> `LoadVectorGatherNode(Node* c, Node* mem, Node* adr, const TypePtr* at, const TypeVect* vt, Node* indices)`
> 
> I'm a little confused now about which `address` your name refers to. Is it `adr`? I think not, because that is the base address, right? Can you clarify a little more? Maybe add it to the documentation of the gather and scatter nodes as well, if you think that helps?

Actually, you already did add documentation to the gather / scatter nodes now. And based on your explanation there, I suggest you rename the method here to:
`gather_scatter_requires_indices_from_array`
This would say that the indices come from an array, rather than a vector register.

The current name that we had agreed on confuses me because it suggests that the index may already be in the address `adr`, but that does not make much sense.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2358946032

From epeter at openjdk.org  Thu Sep 18 12:30:52 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 12:30:52 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v6]
In-Reply-To: <xc00dQJ-uWgObZ0APgMIRk5rjqgjmixV1VJe92lpzq0=.e6503bf6-68b2-44dc-a2a8-ca9e20803d9d@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <xc00dQJ-uWgObZ0APgMIRk5rjqgjmixV1VJe92lpzq0=.e6503bf6-68b2-44dc-a2a8-ca9e20803d9d@github.com>
Message-ID: <DyyZxb29M5ZRT6cgLF7FUNey8VM-JH4YhqR-qGYyvJM=.57540185-bb6e-48fa-8ef5-193ab035a25b@github.com>

On Wed, 17 Sep 2025 08:48:16 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> This is a follow-up patch of [1], which aims at implementing the subword gather load APIs for AArch64 SVE platform.
>> 
>> ### Background
>> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices. SVE provides native gather load instructions for `byte`/`short` types using `int` vectors for indices. The vector size for a gather-load instruction is determined by the index vector (i.e. `int` elements). Hence, the total size is `32 * elem_num` bits, where `elem_num` is the number of loaded elements in the vector register.
>> 
>> ### Implementation
>> 
>> #### Challenges
>> Due to size differences between `int` indices (32-bit) and `byte`/`short` data (8/16-bit), operations must be split across multiple vector registers based on the target SVE vector register size constraints.
>> 
>> On a 512-bit SVE machine, loading a `byte` vector requires a different approach for each vector species:
>> - SPECIES_64: Single operation with mask (8 elements, 256-bit)
>> - SPECIES_128: Single operation, full register (16 elements, 512-bit)
>> - SPECIES_256: Two operations + merge (32 elements, 1024-bit)
>> - SPECIES_512/MAX: Four operations + merge (64 elements, 2048-bit)
>> 
>> Use `ByteVector.SPECIES_512` as an example:
>> - It contains 64 elements, so the index vector size is `64 * 32` bits, which is 4 times the SVE vector register size.
>> - It therefore requires 4 vector gather-loads to finish the whole operation.
>> 
>> 
>> byte[] arr = [a, a, a, a, ..., a, b, b, b, b, ..., b, c, c, c, c, ..., c, d, d, d, d, ..., d, ...]
>> int[] idx = [0, 1, 2, 3, ..., 63, ...]
>> 
>> 4 gather-loads:
>> idx_v1 = [15 14 13 ... 1 0]    gather_v1 = [... 0000 0000 0000 0000 aaaa aaaa aaaa aaaa]
>> idx_v2 = [31 30 29 ... 17 16]  gather_v2 = [... 0000 0000 0000 0000 bbbb bbbb bbbb bbbb]
>> idx_v3 = [47 46 45 ... 33 32]  gather_v3 = [... 0000 0000 0000 0000 cccc cccc cccc cccc]
>> idx_v4 = [63 62 61 ... 49 48]  gather_v4 = [... 0000 0000 0000 0000 dddd dddd dddd dddd]
>> merge: v = [dddd dddd dddd dddd cccc cccc cccc cccc bbbb bbbb bbbb bbbb aaaa aaaa aaaa aaaa]
>> 
>> 
>> #### Solution
>> The implementation simplifies backend complexity by defining each gather load IR to handle one vector gather-load operation, with multiple IRs generated in the compiler mid-end.
>> 
>> Here are the main changes:
>> - Enhanced IR generation with architecture-specific patterns based on `gather_scatter_needs_vector_index()` matcher.
>> - Added `VectorSliceNode` for result mer...
>
> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
> 
>  - Add more comments for IRs and added method
>  - Merge branch 'jdk:master' into JDK-8351623-sve
>  - Merge 'jdk:master' into JDK-8351623-sve
>  - Address review comments
>  - Refine IR pattern and clean backend rules
>  - Fix indentation issue and move the helper matcher method to header files
>  - Merge branch jdk:master into JDK-8351623-sve
>  - 8351623: VectorAPI: Add SVE implementation of subword gather load operation

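The species split described above is plain arithmetic. Below is a minimal C++ sketch of it. It assumes, as the description says, 32-bit `int` indices, so one gather covers `vector_bits / 32` elements. The helper names are hypothetical and are not part of HotSpot.

```cpp
#include <cassert>
#include <cstddef>

// Assumption taken from the PR description: SVE subword gathers use 32-bit
// int indices, so one gather-load covers vector_bits / 32 elements.
constexpr std::size_t kIndexBits = 32;

// Number of elements in a species, e.g. ByteVector.SPECIES_512 -> 512 / 8 = 64.
constexpr std::size_t elements_in_species(std::size_t species_bits, std::size_t elem_bits) {
  return species_bits / elem_bits;
}

// How many gather-load instructions one subword gather needs (rounded up).
std::size_t gather_ops_needed(std::size_t elem_count, std::size_t vector_bits) {
  assert(vector_bits % kIndexBits == 0);
  const std::size_t lanes_per_op = vector_bits / kIndexBits;  // 16 on a 512-bit machine
  return (elem_count + lanes_per_op - 1) / lanes_per_op;
}

// True when a gather fills only part of the register and must run under a
// mask, as for SPECIES_64 (8 elements, 256 of 512 bits).
bool needs_partial_mask(std::size_t elem_count, std::size_t vector_bits) {
  return elem_count % (vector_bits / kIndexBits) != 0;
}
```

By the same arithmetic, `ShortVector.SPECIES_512` (32 elements) would need two gathers on a 512-bit machine.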
@XiaohongGong I'm going to be away on vacation for about 3 weeks now. So I won't be able to continue with the review until I'm back.

Maybe @vnkozlov or @iwanowww can review instead. Maybe @PaulSandoz or @jatin-bhateja would like to look at it too. If they do, I would want them to consider if the approach with the special vector nodes `VectorConcatenateAndNarrow` and `VectorMaskWiden` are really desirable. The complexity needs to go somewhere, but I'm not sure if it is better in the C2 IR or in the backend.

There are already threads on this in this PR, [here](https://github.com/openjdk/jdk/pull/26236#discussion_r2324740007) and [here](https://github.com/openjdk/jdk/pull/26236#discussion_r2324744990).

-------------

PR Review: https://git.openjdk.org/jdk/pull/26236#pullrequestreview-3239353455

From epeter at openjdk.org  Thu Sep 18 12:57:48 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 12:57:48 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v3]
In-Reply-To: <cwluATNnzACJ0UXNLV2hG9aF1bQzVXlewzGHmYhSz0M=.f2d2a6c0-e49f-419c-820b-5d6103eeeba9@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <eXrwH7c0C-O1WG8EdYsncgayPjViNP0yIyPL6PNOIk0=.697e0e55-56fa-41ff-aaf4-6a3baf7e65d2@github.com>
 <cwluATNnzACJ0UXNLV2hG9aF1bQzVXlewzGHmYhSz0M=.f2d2a6c0-e49f-419c-820b-5d6103eeeba9@github.com>
Message-ID: <1YTjbiOmc3OUXZlJ_Pg4W6En5hjU0wd_JBHERbVLDWc=.11ddbe0f-685b-463e-87b7-fcdd14ad4bb2@github.com>

On Tue, 9 Sep 2025 02:09:53 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update countbitsnode.cpp
>
> Hi @TobiHartmann , @SirYwell , @eme64 , can you kindly verify the changes in the latest patch?

@jatin-bhateja I'm going to be out of the office for about 3 weeks, so feel free to ask others for reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27075#issuecomment-3307301622

From epeter at openjdk.org  Thu Sep 18 12:57:49 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 12:57:49 GMT
Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI
 expand operation [v4]
In-Reply-To: <fDFpIq8vnu5rCRyytTRyBgARzXew-IsyKU6rXZmDLlc=.17b763e9-ca3a-4a75-b57e-75af5d11a9ef@github.com>
References: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
 <fDFpIq8vnu5rCRyytTRyBgARzXew-IsyKU6rXZmDLlc=.17b763e9-ca3a-4a75-b57e-75af5d11a9ef@github.com>
Message-ID: <Mq9bIIIY6NSmosRI_-owwagq5q83ZU5XWmuCrGndbOs=.ce80f718-c6ce-4c97-959b-0e45e22658b9@github.com>

On Mon, 15 Sep 2025 05:55:43 GMT, erifan <duke at openjdk.org> wrote:

>> Currently, on AArch64, the VectorAPI `expand` operation is intrinsified for 32-bit and 64-bit types only when SVE2 is available. In the following cases, `expand` has not yet been intrinsified:
>> 1. **Subword types** on SVE2-capable hardware.
>> 2. **All types** on NEON and SVE1 environments.
>> 
>> As a result, `expand` API performance is very poor in these scenarios. This patch intrinsifies the `expand` operation in the above environments.
>> 
>> Since there are no native instructions directly corresponding to `expand` in these cases, this patch mainly leverages the `TBL` instruction to implement `expand`. To compute the index input for `TBL`, the prefix sum algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used. Take a 128-bit byte vector on SVE2 as an example:
>> 
>> To compute: dst = src.expand(mask)
>> Data direction: high <== low
>> Input:
>>   src                         = p o n m l k j i h g f e d c b a
>>   mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>> Expected result:
>>   dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>> 
>> Step 1: calculate the index input of the TBL instruction.
>> 
>> // Set tmp1 as all 0 vector.
>> tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 
>> // Move the mask bits from the predicate register to a vector register.
>> // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
>> tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>> 
>> // Shift the entire register. Prefix sum algorithm.
>> dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
>> tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
>> 
>> dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
>> tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
>> 
>> dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
>> tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
>> 
>> dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
>> tmp2 += dst                   = 8 8 8 7 6 6 6 5 4 4 4 3 2 2 2 1
>> 
>> // Clear inactive elements.
>> dst = sel(mask, tmp2, tmp1)   = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1
>> 
>> // Set the inactive lane value to -1 and set the active lane to the target index.
>> dst -= 1                      = -1 -1 7 6 -1 -1 5 4 -1 -1 3 2 -1 -1 1 0
>> 
>> Step 2: shuffle the source vector elements to the target vector
>> 
>> tbl(dst, src, dst)            = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>> 
>> 
>> The same algorithm is used for NEON and...
>
> erifan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
> 
>  - Merge branch 'master' into JDK-8363989
>  - Align code example data for better reading
>  - Merge branch 'master' into JDK-8363989
>  - Improve the comment of the vector expand implementation
>  - Merge branch 'master' into JDK-8363989
>  - 8363989: AArch64: Add missing backend support of VectorAPI expand operation
>    
>    Currently, on AArch64, the VectorAPI `expand` operation is intrinsified
>    for 32-bit and 64-bit types only when SVE2 is available. In the following
>    cases, `expand` has not yet been intrinsified:
>    1. **Subword types** on SVE2-capable hardware.
>    2. **All types** on NEON and SVE1 environments.
>    
>    As a result, `expand` API performance is very poor in these scenarios.
>    This patch intrinsifies the `expand` operation in the above environments.
>    
>    Since there are no native instructions directly corresponding to `expand`
>    in these cases, this patch mainly leverages the `TBL` instruction to
>    implement `expand`. To compute the index input for `TBL`, the prefix sum
>    algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used.
>    Take a 128-bit byte vector on SVE2 as an example:
>    ```
>    To compute: dst = src.expand(mask)
>    Data direction: high <== low
>    Input:
>      src                         = p o n m l k j i h g f e d c b a
>      mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>    Expected result:
>      dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>    ```
>    Step 1: calculate the index input of the TBL instruction.
>    ```
>    // Set tmp1 as all 0 vector.
>    tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>    
>    // Move the mask bits from the predicate register to a vector register.
>    // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
>    tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>    
>    // Shift the entire register. Prefix sum algorithm.
>    dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
>    tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
>    
>    dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
>    tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
>    
>    dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
>    tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
>    
>    dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
> ...

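The two steps of the `expand` algorithm quoted above can be modelled on plain arrays. This is a minimal C++ sketch, not the AArch64 backend code, and the function names are illustrative only. Its conventions:

- Lane 0 is the lowest lane.
- The whole-register shifts by 8/16/32/64 bits become shifts by 1/2/4/8 byte lanes.
- As with `TBL`, an out-of-range index (the wrapped `-1`, i.e. `0xFF`) selects 0.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Lane 0 is the lowest lane ("Data direction: high <== low").
constexpr std::size_t kLanes = 16;  // 128-bit vector of bytes
using ByteVec = std::array<std::uint8_t, kLanes>;

// Whole-register left shift by `lanes` byte lanes ("tmp2 << 8" == 1 lane);
// vacated low lanes become 0.
ByteVec shift_left_lanes(const ByteVec& v, std::size_t lanes) {
  assert(lanes <= kLanes);
  ByteVec r{};
  for (std::size_t i = lanes; i < kLanes; ++i) r[i] = v[i - lanes];
  return r;
}

// Step 1: log-step inclusive prefix sum of the 0/1 mask lanes, clear inactive
// lanes (sel), then subtract 1 so inactive lanes wrap to 0xFF (i.e. -1).
ByteVec expand_indices(const ByteVec& mask) {
  ByteVec sum = mask;
  for (std::size_t shift = 1; shift < kLanes; shift *= 2) {  // 8, 16, 32, 64 bits
    const ByteVec shifted = shift_left_lanes(sum, shift);
    for (std::size_t i = 0; i < kLanes; ++i) {
      sum[i] = static_cast<std::uint8_t>(sum[i] + shifted[i]);
    }
  }
  ByteVec idx{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    const int selected = mask[i] ? sum[i] : 0;         // sel(mask, tmp2, tmp1)
    idx[i] = static_cast<std::uint8_t>(selected - 1);  // dst -= 1
  }
  return idx;
}

// Step 2: table lookup; like TBL, an out-of-range index selects 0.
ByteVec tbl(const ByteVec& table, const ByteVec& idx) {
  ByteVec r{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    r[i] = idx[i] < kLanes ? table[idx[i]] : std::uint8_t{0};
  }
  return r;
}

ByteVec expand(const ByteVec& src, const ByteVec& mask) {
  return tbl(src, expand_indices(mask));
}
```

With the `src`/`mask` from the example, `expand_indices` yields `0 1 -1 -1 2 3 -1 -1 ...` (lowest lane first, `-1` stored as `0xFF`). `expand` then reproduces `0 0 h g 0 0 f e 0 0 d c 0 0 b a`.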
I ran testing again, and it passed now. Sorry, must have been an infra issue.

Approved! :)

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26740#pullrequestreview-3239515013

From epeter at openjdk.org  Thu Sep 18 12:58:55 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 18 Sep 2025 12:58:55 GMT
Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of
 vector compress
In-Reply-To: <UJ7aFOla6ZN9sNBIZF8efrJkN6-ty93pxHeQN6wx4Yk=.36595868-860a-4f0f-8caa-e752e7bedada@github.com>
References: <tP4CE7n1GaoLjhB6BfpKBNaX_GCTwI-cJjUbanwm4Qg=.eb29b9ce-10b0-4d5c-8407-25deda9c3f6d@github.com>
 <G8aVuW-KQmy7GbZY0QblQy5taiBlNGRc6XP_Wz1TwWg=.5515c4a2-e293-4d08-a0cd-7b039cd10f43@github.com>
 <UJ7aFOla6ZN9sNBIZF8efrJkN6-ty93pxHeQN6wx4Yk=.36595868-860a-4f0f-8caa-e752e7bedada@github.com>
Message-ID: <yMVtvhhCOyTwJSV14suLPYQGjZ5mR4AhgevjJFttax0=.d48d3c4f-9367-4973-87d8-1aeaa21aeb1d@github.com>

On Mon, 15 Sep 2025 09:58:19 GMT, erifan <duke at openjdk.org> wrote:

>> Would it make sense to additionally run the relevant benchmarks on other popular aarch64 platforms such as Graviton, to make sure the improvements are seen there as well?
>
> @galderz Yeah, absolutely. These are the test results on an **AWS Graviton3 V1 machine**; we see a similar performance gain.
> 
> Benchmark | Units | Before | Error | After | Error | Uplift (After/Before)
> -- | -- | -- | -- | -- | -- | --
> Byte128Vector.compress | ops/ms | 2405.511 | 0.763 | 6116.85 | 17.699 | 2.54284848
> Byte64Vector.compress | ops/ms | 1151.662 | 11.262 | 5278.924 | 6.74 | 4.58374419
> Double128Vector.compress | ops/ms | 4919.017 | 4.909 | 4940.232 | 20.143 | 1.00431285
> Double64Vector.compress | ops/ms | 37.071 | 0.778 | 37.109 | 0.945 | 1.00102506
> Float128Vector.compress | ops/ms | 9580.312 | 48.341 | 9586.499 | 74.934 | 1.0006458
> Float64Vector.compress | ops/ms | 4943.728 | 7.361 | 4941.917 | 5.871 | 0.99963368
> Int128Vector.compress | ops/ms | 9496.991 | 34.972 | 9515.122 | 29.204 | 1.00190913
> Int64Vector.compress | ops/ms | 4940.23 | 7.141 | 4941.815 | 5.077 | 1.00032084
> Long128Vector.compress | ops/ms | 4918.142 | 14.835 | 4917.148 | 9.05 | 0.99979789
> Long64Vector.compress | ops/ms | 36.58 | 0.426 | 36.574 | 0.431 | 0.99983598
> Short128Vector.compress | ops/ms | 3343.878 | 0.898 | 6813.421 | 4.143 | 2.03758062
> Short64Vector.compress | ops/ms | 1595.358 | 3.37 | 3390.959 | 3.55 | 2.12551603

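The `Uplift` column appears to be `After / Before`, which matches every row. Below is a small C++ check of that. It also includes a rough heuristic that treats a change as noise when the two `score ± error` ranges overlap, as in the `Float64Vector.compress` row. The heuristic is hypothetical and is not something JMH reports itself.

```cpp
#include <cassert>
#include <cmath>

// Assumption: "Uplift" is After / Before throughput (ops/ms, higher is better).
double uplift(double before, double after) {
  assert(before > 0.0);
  return after / before;
}

// Rough heuristic only: JMH's "Error" column is typically a confidence-interval
// half-width, so overlapping score +/- error ranges suggest no measurable change.
bool intervals_overlap(double before, double before_err, double after, double after_err) {
  return std::fabs(after - before) <= before_err + after_err;
}
```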
@erifan I'm going to be out of the office for 3 weeks, so feel free to ask others for reviews :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27188#issuecomment-3307304775

From adinn at openjdk.org  Thu Sep 18 15:01:41 2025
From: adinn at openjdk.org (Andrew Dinn)
Date: Thu, 18 Sep 2025 15:01:41 GMT
Subject: RFR: 8359378: aarch64: crash when using -XX:+UseFPUForSpilling
In-Reply-To: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
References: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
Message-ID: <GAW_wgcHPOxqfu7k6IaKVMcFEk3OK7J7_XsE4KQJAvs=.cda6218d-4a3c-49fa-a57d-2c35a5ef03ab@github.com>

On Wed, 17 Sep 2025 16:19:12 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:

> The AArch64 BarrierSetAssembler path assumes that only FP/vector ideal regs reach the FP spill/restore encoding. With -XX:+UseFPUForSpilling, the register allocator may allocate scalar values in FP registers. When such values (Op_RegI/Op_RegN/Op_RegL/Op_RegP) hit `BarrierSetAssembler::encode_float_vector_register_size`, we trip ShouldNotReachHere in release builds and the **"unexpected ideal register"** assertion in debug builds.
> 
> Fix: teach the encoder to handle scalar ideal regs when they physically live in FP regs:
> - treat Op_RegI / Op_RegN as 32-bit (single slot) - same class as Op_RegF
> - treat Op_RegL / Op_RegP as 64-bit (two slots) - same class as Op_RegD
> 
> Related:
> - reproduced since #19746
> - spilling logic: 
>   - #18967
>   - #17977
> 
> Testing: tier1-3 with javaoptions -Xcomp -Xbatch -XX:+UseFPUForSpilling on AArch64

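The size classification in the proposed fix amounts to a simple mapping from ideal register kind to spill size. The C++ sketch below is hedged: `IdealReg` and `fp_spill_slots` are hypothetical stand-ins, not the real `Op_Reg*` constants or the `BarrierSetAssembler` code.

```cpp
#include <cassert>

// Hypothetical stand-ins for the C2 ideal register kinds named in the PR.
enum class IdealReg { RegI, RegN, RegF, RegL, RegP, RegD };

// Number of 32-bit slots a value occupies when it lives in an FP register.
int fp_spill_slots(IdealReg r) {
  switch (r) {
    case IdealReg::RegI:
    case IdealReg::RegN:  // narrow oop: 32 bits
    case IdealReg::RegF:
      return 1;           // single slot, same class as a float
    case IdealReg::RegL:
    case IdealReg::RegP:
    case IdealReg::RegD:
      return 2;           // two slots, same class as a double
  }
  assert(false && "unexpected ideal register");
  return 0;
}
```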
I was wondering about that. So, perhaps a better fix is to change the command line ergonomics so that AArch64 either 1) refuses to run with it set to true or 2) prints a warning and resets it to false.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27350#issuecomment-3307970063

From vlivanov at openjdk.org  Thu Sep 18 15:52:06 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 18 Sep 2025 15:52:06 GMT
Subject: RFR: 8367969: C2: compiler/vectorapi/TestVectorMathLib.java fails
 without UnlockDiagnosticVMOptions
In-Reply-To: <N0Mb54Mx8-EpFwvJ3Fvg11CKp9u_vWPe8U1yhvPvt3I=.60119a79-ced3-4a01-b619-012caa3e38ab@github.com>
References: <N0Mb54Mx8-EpFwvJ3Fvg11CKp9u_vWPe8U1yhvPvt3I=.60119a79-ced3-4a01-b619-012caa3e38ab@github.com>
Message-ID: <FJT7VqKP4UJv2ZqNfe3reLAtZe7B22-Y0k4gADllz-c=.988b7955-f690-4b70-aa8f-b2d6cde4205b@github.com>

On Thu, 18 Sep 2025 07:55:10 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> Adding missing `-XX:+UnlockDiagnosticVMOptions`.
> 
> Seems the test from https://github.com/openjdk/jdk/pull/27263 was not tested with the product build before integration.

Thanks for taking care of it, Emanuel.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27359#issuecomment-3308240304

From galder at openjdk.org  Thu Sep 18 17:35:26 2025
From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=)
Date: Thu, 18 Sep 2025 17:35:26 GMT
Subject: RFR: 8359412: Template-Framework Library: Operations and
 Expressions [v6]
In-Reply-To: <F86WzhNjF4KSD7bCieVWq8HEj_7Zr0cbeM-27xKPFzI=.ebeb8e75-25c5-491b-86f4-bbca1ed3487a@github.com>
References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com>
 <F86WzhNjF4KSD7bCieVWq8HEj_7Zr0cbeM-27xKPFzI=.ebeb8e75-25c5-491b-86f4-bbca1ed3487a@github.com>
Message-ID: <WNaYML3vULg4Ycap4a79o6Siu-8_3Gm_UZo51g-BplE=.8c2951e8-f0fc-4471-aa54-69afc2e67db9@github.com>

On Thu, 18 Sep 2025 08:12:50 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Implementing ideas from original draft PR: https://github.com/openjdk/jdk/pull/23418 ([Exceptions](https://github.com/openjdk/jdk/pull/23418/files#diff-77e7db8cc0c5e02786e1c993362f98fabe219042eb342fdaffc09fd11380259dR41), [ExpressionFuzzer](https://github.com/openjdk/jdk/pull/23418/files#diff-01844ca5cb007f5eab5fa4195f2f1378d4e7c64ba477fba64626c98ff4054038R66)).
>> 
>> Specifically, I'm extending the Template Library with `Expression`s, and lists of `Operations` (some basic Expressions). These Expressions can easily be nested and then filled with arguments, and applied in a `Template`.
>> 
>> Details, in **order you should review**:
>> - `Operations.java`: maps lots of primitive operators as Expressions.
>> - `Expression.java`: the fundamental engine behind Expressions.
>> - `examples/TestExpressions.java`: basic example using Expressions, filling them with random constants.
>> - `tests/TestExpression.java`: correctness test of Expression machinery.
>> - `compiler/igvn/ExpressionFuzzer.java`: expression fuzzer for primitive type expressions, including input range/bits constraints and output range/bits verification.
>> - `PrimitiveType.java`: added `LibraryRNG` facility. We already had `type.con()` which gave us random constants. But we also want to have `type.callLibraryRNG()` so that we can insert a call to a random number generator of the corresponding primitive type. I use this facility in the `ExpressionFuzzer.java` to generate random arguments for the expressions.
>> - `examples/TestPrimitiveTypes.java`: added a `LibraryRNG` example with a weak test for randomness: we should see at least 2 different values in 1000 calls.
>> 
>> If the reviewers absolutely insist, I could split out `LibraryRNG` into a separate RFE. But it's really not that much code, and has direct use in the `Expression` examples.
>> 
>> **Future Work**:
>> - Use `Expression`s in a loop over arrays / MemorySegment: fuzz auto-vectorization.
>> - Use `Expression`s to model more operations:
>>   - `Vector API`, more arithmetic operations like from `Math` classes etc.
>> - Ensure that the constraints / checksum mechanic in `compiler/igvn/ExpressionFuzzer.java` work, using IR rules. We may even need to add new IGVN optimizations. Add unsigned constraints.
>> - Find a way to delay IGVN optimizations to test worklist notification: For example, we could add a new testing operator called `TestUtils.delay(x) -> x`, which is intrinsified as some new `DelayNode` that in normal circumstances just fol...
>
> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - more comments
>  - add othervm to test

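To make the idea of nestable expressions concrete, here is a hypothetical C++ model. The real `Expression`/`Operations` classes are Java, and their API may differ. In this model:

- An expression is a list of text fragments with holes between them.
- `nest` fills a hole with another expression.
- `format` fills the remaining holes with arguments, such as random constants.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: text fragments with one hole between each adjacent pair.
struct Expr {
  std::vector<std::string> parts;  // holes() == parts.size() - 1

  std::size_t holes() const { return parts.size() - 1; }

  // Fill hole `k` with `inner`; inner's holes become holes of the result.
  Expr nest(std::size_t k, const Expr& inner) const {
    assert(k < holes() && !inner.parts.empty());
    std::vector<std::string> mid = inner.parts;
    mid.front() = parts[k] + mid.front();
    mid.back() += parts[k + 1];
    Expr out;
    out.parts.assign(parts.begin(), parts.begin() + k);
    out.parts.insert(out.parts.end(), mid.begin(), mid.end());
    out.parts.insert(out.parts.end(), parts.begin() + k + 2, parts.end());
    return out;
  }

  // Fill every remaining hole with a concrete argument, e.g. a random constant.
  std::string format(const std::vector<std::string>& args) const {
    assert(args.size() == holes());
    std::string s = parts[0];
    for (std::size_t i = 0; i < args.size(); ++i) s += args[i] + parts[i + 1];
    return s;
  }
};
```

For example, nesting `(a * b)` into the first hole of `(a + b)` gives a three-hole expression that formats as `((1 * 2) + 3)`.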
Nice additions @eme64!

I would have liked to see an example of a real use case of this in action included in the PR, e.g. some kind of IR test that takes advantage of it, such as a companion version (and/or replacement) for `VectorReduction2`. A follow-up RFE would of course be fine for this.

-------------

Marked as reviewed by galder (Author).

PR Review: https://git.openjdk.org/jdk/pull/26885#pullrequestreview-3241161843

From galder at openjdk.org  Thu Sep 18 18:09:40 2025
From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=)
Date: Thu, 18 Sep 2025 18:09:40 GMT
Subject: RFR: 8367389: C2 SuperWord: refactor VTransform to model the whole
 loop instead of just the basic block [v3]
In-Reply-To: <ptEL3aeVTmddBQF2sIuu01j37I-iIwhEneFbEh6yZUU=.d132275c-1f67-4568-88e1-1cf580558134@github.com>
References: <qNs-pEYa6BR200XijY4JB8-DzdxjQPVR28zefXhEFNo=.89143141-f462-45e1-a1cb-cbe66dfcaf5d@github.com>
 <ptEL3aeVTmddBQF2sIuu01j37I-iIwhEneFbEh6yZUU=.d132275c-1f67-4568-88e1-1cf580558134@github.com>
Message-ID: <HcNm6d8ShT4ABVrQSUhoBx1aXT4h7KhTtlc4TdkO41w=.20e83854-d71d-409e-91a9-b921ec1fa38f@github.com>

On Thu, 18 Sep 2025 06:40:12 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
>> https://github.com/openjdk/jdk/pull/20964
>> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
>> 
>> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
>> 
>> ------------------------------
>> 
>> **Goals**
>> - VTransform models **all nodes in the loop**, not just the basic block (enables later VTransform::optimize, like moving reductions out of the loop)
>> - Remove `_nodes` from the vector vtnodes.
>> 
>> **Details**
>> - Remove: `AUTO_VECTORIZATION2_AFTER_REORDER`, `apply_memops_reordering_with_schedule`, `print_memops_schedule`.
>>   - Instead of reordering the scalar memops, we create the new memory graph during `VTransform::apply`. That is why the `VTransformApplyState` now needs to track the memory states.
>> - Refactor `VLoopMemorySlices`: map not just memory slices with phis (have stores in loop), but also those with only loads (no phi).
>> - Create vtnodes for all nodes in the loop (not just the basic block), as well as inputs (already) and outputs (new). Also mapping the output nodes means that during `apply` we naturally connect the uses after the loop to their inputs from the loop (which may be new nodes after the transformation).
>> - `_mem_ref_for_main_loop_alignment` -> `_vpointer_for_main_loop_alignment`. Instead of tracking the memory node to later have access to its `VPointer`, we take it directly. That removes one more use of `_nodes` for vector vtnodes.
>> 
>> I also made a lot of annotations in the code below, for easier review.
>> 
>> **Suggested order for review**
>> - Removal of `VTransformGraph::apply_memops_reordering_with_schedule` -> sets up need to build memory graph on the fly.
>> - Old and new code for `VLoopMemorySlices` -> we now also track load-only slices.
>> - `build_scalar_vtnodes_for_non_packed_nodes`, `build_inputs_for_scalar_vtnodes`, `build_uses_after_loop`, `apply_vtn_inputs_to_node` (use in `apply`), `apply_backedge`, `fix_memory_state_uses_after_loop`
>> - `VTransformApplyState`: how it now tracks the memory state.
>> - `VTransformVectorNode` -> removal of `_nodes` (Big Win!)
>> - Then look at all the other details.
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update src/hotspot/share/opto/vectorization.cpp
>   
>   Co-authored-by: Manuel Hässig <manuel at haessig.org>

Small nitpick, the rest looks fine as far as I can understand it :)

src/hotspot/share/opto/vtransform.cpp line 760:

> 758: // We may have reordered the scalar stores, or replaced them with vectors. Now
> 759: // the last memory state in the loop may have changed. Thus, we need to change
> 760: // the uses of the old last memory state the the new last memory state.

Suggestion:

// the uses of the old last memory state to the new last memory state.

-------------

Changes requested by galder (Author).

PR Review: https://git.openjdk.org/jdk/pull/27208#pullrequestreview-3241304926
PR Review Comment: https://git.openjdk.org/jdk/pull/27208#discussion_r2360559524

From psandoz at openjdk.org  Thu Sep 18 20:00:53 2025
From: psandoz at openjdk.org (Paul Sandoz)
Date: Thu, 18 Sep 2025 20:00:53 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v6]
In-Reply-To: <xc00dQJ-uWgObZ0APgMIRk5rjqgjmixV1VJe92lpzq0=.e6503bf6-68b2-44dc-a2a8-ca9e20803d9d@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <xc00dQJ-uWgObZ0APgMIRk5rjqgjmixV1VJe92lpzq0=.e6503bf6-68b2-44dc-a2a8-ca9e20803d9d@github.com>
Message-ID: <YUFBN8XGU-ckgmn3-BhncRqCqYQn1FxHfrFgjt7VEi0=.4feae95f-1bae-456d-86de-2f2d7b7fc319@github.com>

On Wed, 17 Sep 2025 08:48:16 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> This is a follow-up patch of [1], which aims at implementing the subword gather load APIs for AArch64 SVE platform.
>> 
>> ### Background
>> Vector gather load APIs load values from memory addresses calculated by adding a base pointer to integer indices. SVE provides native gather load instructions for `byte`/`short` types using `int` vectors for indices. The vector size for a gather-load instruction is determined by the index vector (i.e. `int` elements). Hence, the total size is `32 * elem_num` bits, where `elem_num` is the number of loaded elements in the vector register.
>> 
>> ### Implementation
>> 
>> #### Challenges
>> Due to size differences between `int` indices (32-bit) and `byte`/`short` data (8/16-bit), operations must be split across multiple vector registers based on the target SVE vector register size constraints.
>> 
>> On a 512-bit SVE machine, loading a `byte` vector requires a different approach for each vector species:
>> - SPECIES_64: Single operation with mask (8 elements, 256-bit)
>> - SPECIES_128: Single operation, full register (16 elements, 512-bit)
>> - SPECIES_256: Two operations + merge (32 elements, 1024-bit)
>> - SPECIES_512/MAX: Four operations + merge (64 elements, 2048-bit)
>> 
>> Use `ByteVector.SPECIES_512` as an example:
>> - It contains 64 elements, so the index vector size is `64 * 32` bits, which is 4 times the SVE vector register size.
>> - It therefore requires 4 vector gather-loads to finish the whole operation.
>> 
>> 
>> byte[] arr = [a, a, a, a, ..., a, b, b, b, b, ..., b, c, c, c, c, ..., c, d, d, d, d, ..., d, ...]
>> int[] idx = [0, 1, 2, 3, ..., 63, ...]
>> 
>> 4 gather-loads:
>> idx_v1 = [15 14 13 ... 1 0]    gather_v1 = [... 0000 0000 0000 0000 aaaa aaaa aaaa aaaa]
>> idx_v2 = [31 30 29 ... 17 16]  gather_v2 = [... 0000 0000 0000 0000 bbbb bbbb bbbb bbbb]
>> idx_v3 = [47 46 45 ... 33 32]  gather_v3 = [... 0000 0000 0000 0000 cccc cccc cccc cccc]
>> idx_v4 = [63 62 61 ... 49 48]  gather_v4 = [... 0000 0000 0000 0000 dddd dddd dddd dddd]
>> merge: v = [dddd dddd dddd dddd cccc cccc cccc cccc bbbb bbbb bbbb bbbb aaaa aaaa aaaa aaaa]
>> 
>> 
>> #### Solution
>> The implementation simplifies backend complexity by defining each gather load IR to handle one vector gather-load operation, with multiple IRs generated in the compiler mid-end.
>> 
>> Here are the main changes:
>> - Enhanced IR generation with architecture-specific patterns based on `gather_scatter_needs_vector_index()` matcher.
>> - Added `VectorSliceNode` for result mer...
>
> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
> 
>  - Add more comments for IRs and added method
>  - Merge branch 'jdk:master' into JDK-8351623-sve
>  - Merge 'jdk:master' into JDK-8351623-sve
>  - Address review comments
>  - Refine IR pattern and clean backend rules
>  - Fix indentation issue and move the helper matcher method to header files
>  - Merge branch jdk:master into JDK-8351623-sve
>  - 8351623: VectorAPI: Add SVE implementation of subword gather load operation

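The split-and-merge in the quoted diagram can be modelled on plain arrays. This C++ sketch is hedged and is not the mid-end IR the PR generates. Each call to the hypothetical `gather_chunk` stands for one gather-load over `vector_bits / 32` consecutive indices. The merge step simply concatenates the narrowed partial results, lowest lanes first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One gather: load `count` bytes from arr at indices idx[first .. first+count).
// Models the narrowed low part of gather_vN in the diagram; real hardware does
// no bounds check, so the sketch asserts instead.
std::vector<std::int8_t> gather_chunk(const std::vector<std::int8_t>& arr,
                                      const std::vector<std::int32_t>& idx,
                                      std::size_t first, std::size_t count) {
  std::vector<std::int8_t> part(count);
  for (std::size_t i = 0; i < count; ++i) {
    const std::int32_t j = idx[first + i];
    assert(j >= 0 && static_cast<std::size_t>(j) < arr.size());
    part[i] = arr[static_cast<std::size_t>(j)];
  }
  return part;
}

// Full subword gather: ceil(elem_count / lanes_per_op) gathers, then merge.
std::vector<std::int8_t> subword_gather(const std::vector<std::int8_t>& arr,
                                        const std::vector<std::int32_t>& idx,
                                        std::size_t elem_count, std::size_t vector_bits) {
  const std::size_t lanes_per_op = vector_bits / 32;  // one 32-bit index per lane
  assert(lanes_per_op > 0 && idx.size() >= elem_count);
  std::vector<std::int8_t> result;
  result.reserve(elem_count);
  for (std::size_t first = 0; first < elem_count; first += lanes_per_op) {
    const std::size_t count = std::min(lanes_per_op, elem_count - first);  // last may be masked
    const std::vector<std::int8_t> part = gather_chunk(arr, idx, first, count);
    result.insert(result.end(), part.begin(), part.end());  // merge step
  }
  return result;
}
```

With the 64-element `arr`/`idx` from the diagram and `vector_bits = 512`, this performs four chunk gathers, and the merged result equals `arr[0..63]`.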
> I would want them to consider if the approach with the special vector nodes `VectorConcatenateAndNarrow` and `VectorMaskWiden` are really desirable. The complexity needs to go somewhere, but I'm not sure if it is better in the C2 IR or in the backend.
> 

> It would just be nice to build on "simple" building blocks and not have too many complex nodes, that have very special semantics (widen + split into two)

Intuitively this seems like the right way to think about it, although I don't have a proposed solution; I am really just agreeing with the above sentiment: a compositional solution, if possible, with the right primitive building blocks will likely be superior.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3309447725

From dlong at openjdk.org  Thu Sep 18 23:12:00 2025
From: dlong at openjdk.org (Dean Long)
Date: Thu, 18 Sep 2025 23:12:00 GMT
Subject: RFR: 8366875: CompileTaskTimeout should be reset for each
 iteration of RepeatCompilation [v3]
In-Reply-To: <6ijTgwXUpwm8C_U7oOsN7RScv-caCal0U67UXFZ6VmY=.5550cf2f-2c57-4fc0-a2cd-3df6627485a2@github.com>
References: <4TbOkAMu-KU_tgQPg1sK0L8oto_0nD4mQo7yc0hJPm4=.8d87b900-a614-4c13-a4c6-6fe11e206482@github.com>
 <6ijTgwXUpwm8C_U7oOsN7RScv-caCal0U67UXFZ6VmY=.5550cf2f-2c57-4fc0-a2cd-3df6627485a2@github.com>
Message-ID: <KChelO2mL1W93m8BNjKl3g656vU_Fe32nGmQY2tSOko=.88ad17c4-5277-46b2-95e7-fa6fda77ec14@github.com>

On Tue, 16 Sep 2025 15:38:12 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> When running a debug JVM on Linux with a compile task timeout and repeated compilation, the execution will time out almost always because the timeout does not reset for repetitions of a compilation. The core of the compile task timeout is to limit the amount of time a single compilation can take. Thus, this PR resets the `CompileTaskTimeout` for every compilation when running with `-XX:RepeatCompilation=<n>` for n > 1.
>> 
>> This PR is stacked on top of #27094.
>> 
>> Testing:
>>  - [x] Github Actions (failures are unrelated)
>>  - [x] tier1, tier2, tier3 plus some additional internal testing
>
> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8366875-repeat-comp-to
>  - Reset timeout on repeated compilations
>  - Add regression test
>  - Use timeout factor

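The difference between one timeout covering all repetitions and a timeout per repetition can be shown with a hypothetical model that takes the duration of each repetition as input. This is not HotSpot's actual timeout mechanism, which the sketch does not try to reproduce.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: each entry is the duration (ms) of one repetition of the
// same compilation under -XX:RepeatCompilation=<n>.

// Behaviour described before the fix: one deadline armed before the first
// repetition, so all repetitions count against a single timeout.
bool times_out_cumulative(const std::vector<long>& rep_ms, long timeout_ms) {
  long elapsed = 0;
  for (long ms : rep_ms) {
    elapsed += ms;
    if (elapsed > timeout_ms) return true;
  }
  return false;
}

// Behaviour after the fix: the deadline is re-armed for every repetition,
// so only a single slow compilation can trigger the timeout.
bool times_out_per_repetition(const std::vector<long>& rep_ms, long timeout_ms) {
  assert(timeout_ms > 0);
  for (long ms : rep_ms) {
    if (ms > timeout_ms) return true;
  }
  return false;
}
```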
Still good.

-------------

Marked as reviewed by dlong (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27120#pullrequestreview-3242316009

From duke at openjdk.org  Fri Sep 19 03:24:56 2025
From: duke at openjdk.org (duke)
Date: Fri, 19 Sep 2025 03:24:56 GMT
Subject: Withdrawn: 8359963: compiler/c2/aarch64/TestStaticCallStub.java fails
 with for code cache > 250MB the static call stub is expected to be
 implemented using far branch
In-Reply-To: <kMTi6b91nKTgnGmga1q7noovZoliq_FdHmcit4VeHb0=.8dcb08b9-ee8e-4a7a-98f6-b96f50be50b5@github.com>
References: <kMTi6b91nKTgnGmga1q7noovZoliq_FdHmcit4VeHb0=.8dcb08b9-ee8e-4a7a-98f6-b96f50be50b5@github.com>
Message-ID: <GHWwgXJifwAp4I-QYT1O8kMa1xk_0jIJW36S1sqZd3g=.b4df888b-d60d-4aa2-a5f2-f8e15969c39f@github.com>

On Mon, 30 Jun 2025 15:24:42 GMT, Mikhail Ablakatov <mablakatov at openjdk.org> wrote:

> The test assumed that hsdis is always available which is not the case. Make the test accept and scan either real or pseudo disassembly.

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/26047

From galder at openjdk.org  Fri Sep 19 04:08:05 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Fri, 19 Sep 2025 04:08:05 GMT
Subject: RFR: 8367657: C2 SuperWord: NormalMapping demo from JVMLS 2025
 [v6]
In-Reply-To: <lzOofJ3qhJ7tovM62NZNLYKuSRGnY0xCLskI0OkqerM=.31fa5fad-0980-4b6d-ae72-ee4ac6b3f973@github.com>
References: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
 <lzOofJ3qhJ7tovM62NZNLYKuSRGnY0xCLskI0OkqerM=.31fa5fad-0980-4b6d-ae72-ee4ac6b3f973@github.com>
Message-ID: <FV_5wk6T4ae26a4rivZYz7G0wLurln3ERbPH6T96b0g=.e923e47d-0d24-494f-ac03-7127b65395ab@github.com>

On Wed, 17 Sep 2025 14:35:23 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Demo from here:
>> https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/
>> 
>> Cleaned up and enhanced with a JTREG and IR test.
>> I also added some additional "generated" normal maps from height functions.
>> And I display the resulting image side-by-side with the normal map.
>> 
>> I decided to put it in a new directory `compiler.gallery`, anticipating other compiler tests that are both visually appealing (i.e. can be used for a "gallery") and that we may want to back up with other tests like IR testing.
>> 
>> There is a **stand-alone** way to run the demo:
>> `java test/hotspot/jtreg/compiler/gallery/NormalMapping.java`
>> (though it may only run with JDK22+, probably due to some Amber features)
>> 
>> **Quick Performance Numbers**, running on my AVX-512 laptop:
>> - default / AVX3: 105 FPS
>> - AVX2: 82 FPS
>> - AVX1: 50 FPS
>> - No vectorization: 19 FPS
>> - GraalJIT: 13 FPS (`jdk-26-ea+5`, probably an issue with vectorization / inlining?)
>> 
>> Here are some snapshots, but **I really recommend pulling the diff and playing with it, it looks much better in motion**:
>> <img width="2000" height="991" alt="image" src="https://github.com/user-attachments/assets/a693fac8-ecf0-43f2-914b-25f76c2f425d" />
>> <img width="2000" height="997" alt="image" src="https://github.com/user-attachments/assets/c2202e6b-6a90-4f90-a3ca-b73304e25905" />
>> <img width="1997" height="992" alt="image" src="https://github.com/user-attachments/assets/0d6da304-6bb9-4b25-9a7b-72019b02d95e" />
>> <img width="1992" height="994" alt="image" src="https://github.com/user-attachments/assets/9f5f7426-0678-45af-a3eb-ac092c262d4c" />
>
> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Update test/hotspot/jtreg/compiler/gallery/TestNormalMapping.java
>    
>    Co-authored-by: Andrey Turbanov <turbanoff at gmail.com>
>  - Update test/hotspot/jtreg/compiler/gallery/NormalMapping.java
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>

Great demo! I ran it on my M4 Pro at 220 FPS with default flags.

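For readers new to normal mapping: a normal map stores a surface normal per pixel, and one common way to "generate" one from a height function is with central differences; the normal is then used for lighting. The following is a minimal sketch of that technique, assuming a made-up height function and simple Lambertian (diffuse) shading. It is not the code in `NormalMapping.java`, and `NormalFromHeight` is a hypothetical name.

```java
// Minimal sketch (not the demo's code): derive a unit surface normal from a
// height function h(x, y) with central differences, then shade it with a
// Lambertian (diffuse) term. The height function below is made up.
final class NormalFromHeight {
    // Hypothetical height field: a gentle ripple.
    static double height(double x, double y) {
        return 0.1 * Math.sin(x) * Math.cos(y);
    }

    // Unit normal {nx, ny, nz} of the surface z = h(x, y) at (x, y).
    static double[] normal(double x, double y, double eps) {
        double dhdx = (height(x + eps, y) - height(x - eps, y)) / (2 * eps);
        double dhdy = (height(x, y + eps) - height(x, y - eps)) / (2 * eps);
        // The un-normalized normal of z = h(x, y) is (-dh/dx, -dh/dy, 1).
        double nx = -dhdx, ny = -dhdy, nz = 1.0;
        double len = Math.sqrt(nx * nx + ny * ny + nz * nz);
        return new double[] { nx / len, ny / len, nz / len };
    }

    // Diffuse brightness max(0, n . l) for a unit light direction l.
    static double diffuse(double[] n, double lx, double ly, double lz) {
        return Math.max(0.0, n[0] * lx + n[1] * ly + n[2] * lz);
    }

    public static void main(String[] args) {
        double[] n = normal(0.0, 0.0, 1e-4);
        System.out.println(n[0] + " " + n[1] + " " + n[2]);
        System.out.println(diffuse(n, 0.0, 0.0, 1.0));
    }
}
```

Applied to every pixel of an image, this is straight-line floating-point arithmetic over arrays, which is the kind of loop body auto-vectorization targets.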
-------------

Marked as reviewed by galder (Author).

PR Review: https://git.openjdk.org/jdk/pull/27282#pullrequestreview-3242796809

From epeter at openjdk.org  Fri Sep 19 05:51:25 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 19 Sep 2025 05:51:25 GMT
Subject: RFR: 8359412: Template-Framework Library: Operations and
 Expressions [v6]
In-Reply-To: <WNaYML3vULg4Ycap4a79o6Siu-8_3Gm_UZo51g-BplE=.8c2951e8-f0fc-4471-aa54-69afc2e67db9@github.com>
References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com>
 <F86WzhNjF4KSD7bCieVWq8HEj_7Zr0cbeM-27xKPFzI=.ebeb8e75-25c5-491b-86f4-bbca1ed3487a@github.com>
 <WNaYML3vULg4Ycap4a79o6Siu-8_3Gm_UZo51g-BplE=.8c2951e8-f0fc-4471-aa54-69afc2e67db9@github.com>
Message-ID: <OgyQiPhMsqRSBiZvM_5rZX1iouEG0AWzJW9jzHU-ZMw=.9354d371-e56b-427a-aa75-483357c98d9f@github.com>

On Thu, 18 Sep 2025 17:32:22 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - more comments
>>  - add othervm to test
>
> Nice additions @eme64!
> 
> I would have liked to see an example of a real use case of this in action included in the PR, e.g. some kind of IR test that takes advantage of this. E.g. a companion version (and/or replacement) for `VectorReduction2`? A follow-up RFE would of course be fine for this.

@galderz Thanks for reviewing!
Can you spell out a little more what you would like to see? For me, `compiler/igvn/ExpressionFuzzer.java` is already "an example of real use". And I have a lot more planned for future RFEs, see the "future work" section in the PR description ;)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26885#issuecomment-3310686831

From epeter at openjdk.org  Fri Sep 19 05:56:20 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 19 Sep 2025 05:56:20 GMT
Subject: RFR: 8367389: C2 SuperWord: refactor VTransform to model the whole
 loop instead of just the basic block [v4]
In-Reply-To: <qNs-pEYa6BR200XijY4JB8-DzdxjQPVR28zefXhEFNo=.89143141-f462-45e1-a1cb-cbe66dfcaf5d@github.com>
References: <qNs-pEYa6BR200XijY4JB8-DzdxjQPVR28zefXhEFNo=.89143141-f462-45e1-a1cb-cbe66dfcaf5d@github.com>
Message-ID: <KfBXAlxouri0d8mAnsue8G5tT7voCqiWVuKFTgsABEg=.2ca9e683-6120-4ce8-883f-3bcc2e23e2df@github.com>

> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
> https://github.com/openjdk/jdk/pull/20964
> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
> 
> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
> 
> ------------------------------
> 
> **Goals**
> - VTransform models **all nodes in the loop**, not just the basic block (enables later VTransform::optimize, like moving reductions out of the loop)
> - Remove `_nodes` from the vector vtnodes.
> 
> **Details**
> - Remove: `AUTO_VECTORIZATION2_AFTER_REORDER`, `apply_memops_reordering_with_schedule`, `print_memops_schedule`.
>   - Instead of reordering the scalar memops, we create the new memory graph during `VTransform::apply`. That is why the `VTransformApplyState` now needs to track the memory states.
> - Refactor `VLoopMemorySlices`: map not just memory slices with phis (those with stores in the loop), but also those with only loads (no phi).
> - Create vtnodes for all nodes in the loop (not just the basic block), as well as inputs (already) and outputs (new). Also mapping the output nodes means that, during `apply`, we naturally connect the uses after the loop to their inputs from the loop (which may be new nodes after the transformation).
> - `_mem_ref_for_main_loop_alignment` -> `_vpointer_for_main_loop_alignment`. Instead of tracking the memory node to later have access to its `VPointer`, we take it directly. That removes one more use of `_nodes` for vector vtnodes.
> 
> I also made a lot of annotations in the code below, for easier review.
> 
> **Suggested order for review**
> - Removal of `VTransformGraph::apply_memops_reordering_with_schedule` -> sets up need to build memory graph on the fly.
> - Old and new code for `VLoopMemorySlices` -> we now also track load-only slices.
> - `build_scalar_vtnodes_for_non_packed_nodes`, `build_inputs_for_scalar_vtnodes`, `build_uses_after_loop`, `apply_vtn_inputs_to_node` (use in `apply`), `apply_backedge`, `fix_memory_state_uses_after_loop`
> - `VTransformApplyState`: how it now tracks the memory state.
> - `VTransformVectorNode` -> removal of `_nodes` (Big Win!)
> - Then look at all the other details.

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  Update src/hotspot/share/opto/vtransform.cpp
  
  Co-authored-by: Galder Zamarreño <galder at ibm.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27208/files
  - new: https://git.openjdk.org/jdk/pull/27208/files/9af66755..99fd1c99

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27208&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27208&range=02-03

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27208.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27208/head:pull/27208

PR: https://git.openjdk.org/jdk/pull/27208

From epeter at openjdk.org  Fri Sep 19 05:56:23 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 19 Sep 2025 05:56:23 GMT
Subject: RFR: 8367389: C2 SuperWord: refactor VTransform to model the whole
 loop instead of just the basic block [v3]
In-Reply-To: <HcNm6d8ShT4ABVrQSUhoBx1aXT4h7KhTtlc4TdkO41w=.20e83854-d71d-409e-91a9-b921ec1fa38f@github.com>
References: <qNs-pEYa6BR200XijY4JB8-DzdxjQPVR28zefXhEFNo=.89143141-f462-45e1-a1cb-cbe66dfcaf5d@github.com>
 <ptEL3aeVTmddBQF2sIuu01j37I-iIwhEneFbEh6yZUU=.d132275c-1f67-4568-88e1-1cf580558134@github.com>
 <HcNm6d8ShT4ABVrQSUhoBx1aXT4h7KhTtlc4TdkO41w=.20e83854-d71d-409e-91a9-b921ec1fa38f@github.com>
Message-ID: <CiEvU8hfyPeZNcKIE7hduZIjxjO7Wps6tK7Gh8RZyeg=.ed481310-a178-4355-a07c-f08f4fa81e6d@github.com>

On Thu, 18 Sep 2025 18:06:33 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update src/hotspot/share/opto/vectorization.cpp
>>   
>>   Co-authored-by: Manuel Hässig <manuel at haessig.org>
>
> Small nitpick, the rest looks fine as far as I can understand it :)

@galderz Thanks for having a look, I applied the suggestion :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27208#issuecomment-3310689125

From chagedorn at openjdk.org  Fri Sep 19 06:37:27 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 19 Sep 2025 06:37:27 GMT
Subject: RFR: 8367657: C2 SuperWord: NormalMapping demo from JVMLS 2025
 [v6]
In-Reply-To: <lzOofJ3qhJ7tovM62NZNLYKuSRGnY0xCLskI0OkqerM=.31fa5fad-0980-4b6d-ae72-ee4ac6b3f973@github.com>
References: <Ynf0FoI7tkuGWS4jQJ8zcnnJUWa3E_vzk74ZVYhKgOc=.e394db56-1deb-4f35-8f30-3970c4f79e26@github.com>
 <lzOofJ3qhJ7tovM62NZNLYKuSRGnY0xCLskI0OkqerM=.31fa5fad-0980-4b6d-ae72-ee4ac6b3f973@github.com>
Message-ID: <OJIwcLOLjdXOkob57L3AQHoSFsM6cZ7q1NiBHLpXggw=.0bff5112-1aff-4d6a-806d-bd60409c1be3@github.com>

On Wed, 17 Sep 2025 14:35:23 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Demo from here:
>> https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/
>> 
>> Cleaned up and enhanced with a JTREG and IR test.
>> I also added some additional "generated" normal maps from height functions.
>> And I display the resulting image side-by-side with the normal map.
>> 
>> I decided to put it in a new directory `compiler.gallery`, anticipating other compiler tests that are both visually appealing (i.e. can be used for a "gallery") and that we may want to back up with other tests like IR testing.
>> 
>> There is a **stand-alone** way to run the demo:
>> `java test/hotspot/jtreg/compiler/gallery/NormalMapping.java`
>> (though it may only run with JDK22+, probably due to some Amber features)
>> 
>> **Quick Performance Numbers**, running on my AVX-512 laptop:
>> - default / AVX3: 105 FPS
>> - AVX2: 82 FPS
>> - AVX1: 50 FPS
>> - No vectorization: 19 FPS
>> - GraalJIT: 13 FPS (`jdk-26-ea+5`, probably an issue with vectorization / inlining?)
>> 
>> Here are some snapshots, but **I really recommend pulling the diff and playing with it, it looks much better in motion**:
>> <img width="2000" height="991" alt="image" src="https://github.com/user-attachments/assets/a693fac8-ecf0-43f2-914b-25f76c2f425d" />
>> <img width="2000" height="997" alt="image" src="https://github.com/user-attachments/assets/c2202e6b-6a90-4f90-a3ca-b73304e25905" />
>> <img width="1997" height="992" alt="image" src="https://github.com/user-attachments/assets/0d6da304-6bb9-4b25-9a7b-72019b02d95e" />
>> <img width="1992" height="994" alt="image" src="https://github.com/user-attachments/assets/9f5f7426-0678-45af-a3eb-ac092c262d4c" />
>
> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Update test/hotspot/jtreg/compiler/gallery/TestNormalMapping.java
>    
>    Co-authored-by: Andrey Turbanov <turbanoff at gmail.com>
>  - Update test/hotspot/jtreg/compiler/gallery/NormalMapping.java
>    
>    Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>

Update looks good, thanks!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27282#pullrequestreview-3243376838

From fyang at openjdk.org  Fri Sep 19 07:12:25 2025
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 19 Sep 2025 07:12:25 GMT
Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v7]
In-Reply-To: <CK9saRbHrBxaXya098IIqpafnO3lI90UJ1ryPwuXP14=.0ed5e960-675a-4808-a96a-eae2c4f09e07@github.com>
References: <mMLCDK4Ls37Ebs_3LEM4dktuylNraobBebijN1BxZwM=.30730d56-b8ca-46c9-9dc2-1ca215e66ba4@github.com>
 <CK9saRbHrBxaXya098IIqpafnO3lI90UJ1ryPwuXP14=.0ed5e960-675a-4808-a96a-eae2c4f09e07@github.com>
Message-ID: <NNd3kziE2dM8snLmu-n1vebuYQ-rXlr7u_CiEE4ETnc=.b63e1623-a720-47d4-b5d3-c17c12ea689b@github.com>

On Fri, 12 Sep 2025 03:40:59 GMT, Anjian Wen <wenanjian at openjdk.org> wrote:

>> Hi everyone, please help review this patch, which implements `_counterMode_AESCrypt` with Zvkned. On my QEMU, with the Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed.
>
> Anjian Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:
> 
>  - Merge branch 'openjdk:master' into aes_ctr
>  - fix the counter increase at limit and add test
>  - change format
>  - update reg use and instruction
>  - change some name and format
>  - delete useless Label, change L_judge_used to L_slow_loop
>  - add Flags and fix the stubid name
>  - RISC-V: implement AES-CTR mode intrinsics

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2667:

> 2665:     __ addi(t0, counter, 8);
> 2666:     __ ld(tmp, Address(t0));
> 2667:     __ rev8(tmp, tmp);

Note that `rev8` is only available under `UseZbb`. Maybe you should use `revb/revbw` instead, which take the availability of the Zbb extension into account.

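Why a byte swap is needed at all: the AES-CTR counter block is stored big-endian, while RISC-V loads are little-endian, so a 64-bit `ld` of the counter's low half yields the byte-reversed value, which must be swapped (with `rev8` when Zbb is present, or a fallback otherwise) before it can be incremented. A minimal Java sketch of that logic, assuming the carry propagates across the full 128-bit counter; `CtrCounter` is a hypothetical name and this is not the stub's code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Minimal sketch (not the stub's code): increment a 16-byte big-endian AES-CTR
// counter on a little-endian machine. Assumes the carry propagates across the
// full 128 bits of the counter.
final class CtrCounter {
    static void increment(byte[] counter) {
        ByteBuffer le = ByteBuffer.wrap(counter).order(ByteOrder.LITTLE_ENDIAN);
        long low = Long.reverseBytes(le.getLong(8)); // like "ld" followed by "rev8"
        low += 1;
        le.putLong(8, Long.reverseBytes(low));       // swap back, then store
        if (low == 0) {                              // low half wrapped: carry into high half
            long high = Long.reverseBytes(le.getLong(0));
            le.putLong(0, Long.reverseBytes(high + 1));
        }
    }

    public static void main(String[] args) {
        byte[] c = new byte[16];
        for (int i = 8; i < 16; i++) c[i] = (byte) 0xFF; // low half at its limit
        increment(c);
        System.out.println(java.util.Arrays.toString(c));
    }
}
```

The wrap-around case in `main` is the "counter at its limit" situation, where forgetting the carry into the upper 64 bits silently produces a repeated counter value.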
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2361999055

From manc at openjdk.org  Fri Sep 19 08:00:40 2025
From: manc at openjdk.org (Man Cao)
Date: Fri, 19 Sep 2025 08:00:40 GMT
Subject: RFR: 8368071: Compilation throughput regressed 2X-8X after JDK-8355003
Message-ID: <WYxcB5a4pW-Lagi-N4oP7VaB7INF2_i6nCJFPyJlvtw=.68ea2dec-842d-4f8e-a1f3-8b720899d462@github.com>

Hi all,

Could anyone review this change that fixes a severe startup performance regression for `-XX:+TieredCompilation`?  See https://bugs.openjdk.org/browse/JDK-8368071 for more details.

-Man

-------------

Commit messages:
 - 8368071: Compilation throughput regressed 2X-8X after JDK-8355003

Changes: https://git.openjdk.org/jdk/pull/27383/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27383&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8368071
  Stats: 13 lines in 1 file changed: 8 ins; 2 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/27383.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27383/head:pull/27383

PR: https://git.openjdk.org/jdk/pull/27383

From manc at openjdk.org  Fri Sep 19 08:04:11 2025
From: manc at openjdk.org (Man Cao)
Date: Fri, 19 Sep 2025 08:04:11 GMT
Subject: RFR: 8367613: Test
 compiler/runtime/TestDontCompileHugeMethods.java failed [v2]
In-Reply-To: <viR_qPrD-PqsT7ndF1WZzTMK3IOXeRESFAXmNRYZris=.966e109b-5573-44e8-8921-c79c9a5b4c88@github.com>
References: <Ouu1htv7wGD_GqsyJhwk1-lH-0ix8UnduNiEBIhuFg0=.8f8393db-6357-4c7b-9a5b-620b3834e9b1@github.com>
 <viR_qPrD-PqsT7ndF1WZzTMK3IOXeRESFAXmNRYZris=.966e109b-5573-44e8-8921-c79c9a5b4c88@github.com>
Message-ID: <1rFKLR9URrdZDzT2kXZMXkhzjWMzgGZyK9CLJXB0Q_A=.ff5535f2-7851-4d17-81a4-664148a3d1fa@github.com>

On Tue, 16 Sep 2025 08:10:11 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Man Cao has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Switch to disable inlining for shortMethod
>
> When looking at the test, it seems that we want to verify that `shortMethod()` is compiled while `hugeSwitch()` is not. When running with `-Xcomp`, we will immediately compile `main()` and directly inline `shortMethod()` with C1 (with C2 we fail to inline with "failed initial checks" and thus will compile `shortMethod()` separately when calling it the first time). Therefore, with C1, we will not compile `shortMethod()` separately and the test fails. 
> 
> Excluding `-Xcomp` looks reasonable. An alternative would be to exclude `main()` from compilation. But I think for the purpose of this test, excluding `-Xcomp` seems better.

@chhagedorn Could you also approve the latest commit?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27306#issuecomment-3311081698

From wenanjian at openjdk.org  Fri Sep 19 08:13:20 2025
From: wenanjian at openjdk.org (Anjian Wen)
Date: Fri, 19 Sep 2025 08:13:20 GMT
Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v8]
In-Reply-To: <mMLCDK4Ls37Ebs_3LEM4dktuylNraobBebijN1BxZwM=.30730d56-b8ca-46c9-9dc2-1ca215e66ba4@github.com>
References: <mMLCDK4Ls37Ebs_3LEM4dktuylNraobBebijN1BxZwM=.30730d56-b8ca-46c9-9dc2-1ca215e66ba4@github.com>
Message-ID: <2lhB_2BCsW-SIBFxtc7KKPRZ2SGoleG41SR_d6IAAzI=.86cbf242-bb10-4d95-9424-f2bbe4cfc7ca@github.com>

> Hi everyone, please help review this patch, which implements `_counterMode_AESCrypt` with Zvkned. On my QEMU, with the Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed.

Anjian Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision:

 - Merge branch 'openjdk:master' into aes_ctr
 - Merge branch 'openjdk:master' into aes_ctr
 - fix the counter increase at limit and add test
 - change format
 - update reg use and instruction
 - change some name and format
 - delete useless Label, change L_judge_used to L_slow_loop
 - add Flags and fix the stubid name
 - RISC-V: implement AES-CTR mode intrinsics

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25281/files
  - new: https://git.openjdk.org/jdk/pull/25281/files/ff513708..35f82e0a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=06-07

  Stats: 26416 lines in 742 files changed: 13049 ins; 7667 del; 5700 mod
  Patch: https://git.openjdk.org/jdk/pull/25281.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281

PR: https://git.openjdk.org/jdk/pull/25281

From shade at openjdk.org  Fri Sep 19 08:23:53 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 19 Sep 2025 08:23:53 GMT
Subject: RFR: 8368071: Compilation throughput regressed 2X-8X after
 JDK-8355003
In-Reply-To: <WYxcB5a4pW-Lagi-N4oP7VaB7INF2_i6nCJFPyJlvtw=.68ea2dec-842d-4f8e-a1f3-8b720899d462@github.com>
References: <WYxcB5a4pW-Lagi-N4oP7VaB7INF2_i6nCJFPyJlvtw=.68ea2dec-842d-4f8e-a1f3-8b720899d462@github.com>
Message-ID: <5vFQJAcARfEjdesFIAbz1F9-xSoEv1IkAt4gfSATgC8=.b0e2ceb9-8deb-48fe-87dc-3364637698f9@github.com>

On Fri, 19 Sep 2025 07:52:16 GMT, Man Cao <manc at openjdk.org> wrote:

> Hi all,
> 
> Could anyone review this change that fixes a severe startup performance regression for `-XX:+TieredCompilation`?  See https://bugs.openjdk.org/browse/JDK-8368071 for more details.
> 
> -Man

This one is for @veresov :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27383#issuecomment-3311139236

From jbhateja at openjdk.org  Fri Sep 19 08:23:50 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 19 Sep 2025 08:23:50 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v9]
In-Reply-To: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
Message-ID: <-cXUJ-9Nbp1h9REUqjyCpkrlDm4WzeiE0-6mx_QuWs4=.dd56a6a1-a626-4b03-b556-19b2b954a08b@github.com>

> This patch optimizes PopCount value transforms using KnownBits information.
> Following are the results of the micro-benchmark included with the patch:
> 
> 
> 
> System: 13th Gen Intel(R) Core(TM) i3-1315U
> 
> Baseline:
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s
> 
> With optimization:
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s
> 
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Review comments resolutions

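The underlying idea can be illustrated as follows: if some bits of `x` are known to be 0 and some are known to be 1, then `popcount(x)` is bounded below by the number of known-one bits and above by the number of bits not known to be zero, which can let a comparison against `bitCount` fold to a constant. A minimal sketch, assuming a simple two-mask representation (`zeros`, `ones`); it illustrates the technique and is not C2's implementation.

```java
// Minimal sketch (not C2's code): bound popcount(x) from known bits.
// zeros: mask of bits known to be 0; ones: mask of bits known to be 1.
final class PopCountKnownBits {
    // Returns {lo, hi} with lo <= Long.bitCount(x) <= hi for every x that
    // agrees with the known bits.
    static int[] bounds(long zeros, long ones) {
        int lo = Long.bitCount(ones);   // known-one bits always contribute
        int hi = Long.bitCount(~zeros); // only bits not known to be 0 can contribute
        return new int[] { lo, hi };
    }

    // Folds `Long.bitCount(x) < c` to a constant when the bounds allow it;
    // returns null when the result depends on the unknown bits.
    static Boolean foldLessThan(long zeros, long ones, int c) {
        int[] b = bounds(zeros, ones);
        if (b[1] < c) return Boolean.TRUE;
        if (b[0] >= c) return Boolean.FALSE;
        return null;
    }

    public static void main(String[] args) {
        // x = (y & 0xF0) | 0x3: bits 0-1 known one, bits 4-7 unknown, rest known zero.
        long zeros = ~0xF3L;
        long ones = 0x3L;
        int[] b = bounds(zeros, ones);
        System.out.println(b[0] + ".." + b[1]);           // 2..6
        System.out.println(foldLessThan(zeros, ones, 7)); // true
        System.out.println(foldLessThan(zeros, ones, 4)); // null
    }
}
```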
-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27075/files
  - new: https://git.openjdk.org/jdk/pull/27075/files/278f1dc8..367622bf

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=07-08

  Stats: 33 lines in 1 file changed: 15 ins; 4 del; 14 mod
  Patch: https://git.openjdk.org/jdk/pull/27075.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27075/head:pull/27075

PR: https://git.openjdk.org/jdk/pull/27075

From jbhateja at openjdk.org  Fri Sep 19 08:23:53 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 19 Sep 2025 08:23:53 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v3]
In-Reply-To: <1YTjbiOmc3OUXZlJ_Pg4W6En5hjU0wd_JBHERbVLDWc=.11ddbe0f-685b-463e-87b7-fcdd14ad4bb2@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <eXrwH7c0C-O1WG8EdYsncgayPjViNP0yIyPL6PNOIk0=.697e0e55-56fa-41ff-aaf4-6a3baf7e65d2@github.com>
 <cwluATNnzACJ0UXNLV2hG9aF1bQzVXlewzGHmYhSz0M=.f2d2a6c0-e49f-419c-820b-5d6103eeeba9@github.com>
 <1YTjbiOmc3OUXZlJ_Pg4W6En5hjU0wd_JBHERbVLDWc=.11ddbe0f-685b-463e-87b7-fcdd14ad4bb2@github.com>
Message-ID: <4VQ6YYLFGU3tscZXp3lYhMPDsRvjUlagiJlMe6xiOMc=.20bf3ee6-0822-42f3-8417-1296f5076456@github.com>

On Thu, 18 Sep 2025 12:55:16 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Hi @TobiHartmann , @SirYwell , @eme64 , can you kindly verify the changes in the latest patch?
>
> @jatin-bhateja I'm going to be out of the office for about 3 weeks, so feel free to ask others for reviews!

Hi @eme64, @chhagedorn, @SirYwell, let me know if it's good to land now.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27075#issuecomment-3311136518

From jbhateja at openjdk.org  Fri Sep 19 08:23:57 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 19 Sep 2025 08:23:57 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v8]
In-Reply-To: <Q8CV6qZKlcjxQTAp6SCDaQPs-JGWLxgUh0nYz0vdKA0=.4f22adc9-41b7-4b7c-b95a-9ebd7f283a60@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <HPgkmQwoaMXSWdXiMXkbqSoMnI13yPpPBJSPxZKTxnc=.f978ac37-462f-496e-b5ec-bf3005cb7e5a@github.com>
 <Q8CV6qZKlcjxQTAp6SCDaQPs-JGWLxgUh0nYz0vdKA0=.4f22adc9-41b7-4b7c-b95a-9ebd7f283a60@github.com>
Message-ID: <_s42gZC5DcP4WobqtuohrzRqET6hpmeaLYwE6BEzcu0=.f24978f2-8ce0-4926-9f5b-0bc2cab57727@github.com>

On Tue, 16 Sep 2025 07:15:00 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Extending the random ranges
>
> test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java line 56:
> 
>> 54:     static final long rand_bndL2 = G.uniformLongs(-0xFFFFFFL, 0xFFFFFF).next();
>> 55:     static final long rand_popcL1 = G.uniformLongs(0, 4).next();
>> 56:     static final long rand_popcL2 = G.uniformLongs(0, 32).next();
> 
> Can you please give us some code comments why you are doing:
> - only uniform distribution. Is that needed? Generators generate special values more often for a good reason: it creates interesting edge cases, especially for bit operations like this here.
> - Why are you restricting the ranges? There could always be surprises outside the ranges you pick, and it would be a shame to not generate those. Unless you are absolutely sure they are not needed. Or if extending the range would mean we would generate interesting cases with a probability that is too small, that could be another reason to restrict the ranges.

Thanks @eme64, comment addressed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2362138729

From mhaessig at openjdk.org  Fri Sep 19 09:11:34 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Fri, 19 Sep 2025 09:11:34 GMT
Subject: RFR: 8366875: CompileTaskTimeout should be reset for each
 iteration of RepeatCompilation [v3]
In-Reply-To: <KChelO2mL1W93m8BNjKl3g656vU_Fe32nGmQY2tSOko=.88ad17c4-5277-46b2-95e7-fa6fda77ec14@github.com>
References: <4TbOkAMu-KU_tgQPg1sK0L8oto_0nD4mQo7yc0hJPm4=.8d87b900-a614-4c13-a4c6-6fe11e206482@github.com>
 <6ijTgwXUpwm8C_U7oOsN7RScv-caCal0U67UXFZ6VmY=.5550cf2f-2c57-4fc0-a2cd-3df6627485a2@github.com>
 <KChelO2mL1W93m8BNjKl3g656vU_Fe32nGmQY2tSOko=.88ad17c4-5277-46b2-95e7-fa6fda77ec14@github.com>
Message-ID: <mF1e6-9mpPXF2GH4GyP3YWPOv83xU9paS6YfQ9qhD3w=.0fbab326-ccd1-43a4-ac5d-53bf6d32b1f7@github.com>

On Thu, 18 Sep 2025 23:09:32 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>> 
>>  - Merge branch 'master' into JDK-8366875-repeat-comp-to
>>  - Reset timeout on repeated compilations
>>  - Add regression test
>>  - Use timeout factor
>
> Still good.

Thank you both for your reviews, @dean-long and @eme64!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27120#issuecomment-3311368487

From mhaessig at openjdk.org  Fri Sep 19 09:11:35 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Fri, 19 Sep 2025 09:11:35 GMT
Subject: Integrated: 8366875: CompileTaskTimeout should be reset for each
 iteration of RepeatCompilation
In-Reply-To: <4TbOkAMu-KU_tgQPg1sK0L8oto_0nD4mQo7yc0hJPm4=.8d87b900-a614-4c13-a4c6-6fe11e206482@github.com>
References: <4TbOkAMu-KU_tgQPg1sK0L8oto_0nD4mQo7yc0hJPm4=.8d87b900-a614-4c13-a4c6-6fe11e206482@github.com>
Message-ID: <ts22An-Kw010Aog0NafGFblmUeoBY2cFjtn1J2zJ7rE=.c548f2e1-3fa7-480d-8096-93e899fb723b@github.com>

On Fri, 5 Sep 2025 15:27:22 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

> When running a debug JVM on Linux with a compile task timeout and repeated compilation, the execution will time out almost always because the timeout does not reset for repetitions of a compilation. The core of the compile task timeout is to limit the amount of time a single compilation can take. Thus, this PR resets the `CompileTaskTimeout` for every compilation when running with `-XX:RepeatCompilation=<n>` for n > 1.
> 
> This PR is stacked on top of #27094.
> 
> Testing:
>  - [x] Github Actions (failures are unrelated)
>  - [x] tier1, tier2, tier3 plus some additional internal testing

This pull request has now been integrated.

Changeset: 94a301a7
Author:    Manuel Hässig <mhaessig at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/94a301a70e19be284f406ebb6d8b94b6f96e1a24
Stats:     17 lines in 4 files changed: 16 ins; 0 del; 1 mod

8366875: CompileTaskTimeout should be reset for each iteration of RepeatCompilation

Reviewed-by: dlong, epeter

-------------

PR: https://git.openjdk.org/jdk/pull/27120

From epeter at openjdk.org  Fri Sep 19 09:41:00 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 19 Sep 2025 09:41:00 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v9]
In-Reply-To: <-cXUJ-9Nbp1h9REUqjyCpkrlDm4WzeiE0-6mx_QuWs4=.dd56a6a1-a626-4b03-b556-19b2b954a08b@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <-cXUJ-9Nbp1h9REUqjyCpkrlDm4WzeiE0-6mx_QuWs4=.dd56a6a1-a626-4b03-b556-19b2b954a08b@github.com>
Message-ID: <ORJC3bgHjy7bYmEWHSK32yPBQEftEsgaQH8Jdh05fvY=.78ce50d7-b35c-4d73-8bac-0a4aea1f4624@github.com>

On Fri, 19 Sep 2025 08:23:50 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> This patch optimizes PopCount value transforms using KnownBits information.
>> Following are the results of the micro-benchmark included with the patch:
>> 
>> 
>> 
>> System: 13th Gen Intel(R) Core(TM) i3-1315U
>> 
>> Baseline:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s
>> 
>> With optimization:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s
>> 
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Review comments resolutions

As I'm about to board my plane for a 3-week vacation, I'm leaving a last little **note for the reviewers**.

I think this is a really nice addition, so thanks for doing it, @jatin-bhateja. Though it will only reach its full potential once we implement more "basic" KnownBits optimizations such as [JDK-8367341](https://bugs.openjdk.org/browse/JDK-8367341).

Please make sure you **test** it, and make sure the random values generated with the Generators in `test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java` make sense. Currently, for example, the popcount threshold for a 64-bit long value is capped at 32, which I think is not correct.

By default, my recommendation is to **not** constrain the Generators ranges, unless there is a really good reason. Generators are already built to produce values close to zero at an over-proportional rate. But by not restricting we may at some point also hit cases that we did not anticipate, and catch bugs that way.

test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java line 54:

> 52:     static final long rand_bndL2 = G.longs().next();
> 53:     static final long rand_popcL1 = G.uniformLongs(0, 32).next();
> 54:     static final long rand_popcL2 = G.uniformLongs(0, 32).next();

Why did you limit the range for longs to 32? Can it not go up to 64?
I asked for an explanation (in a code comment) of those that you restrict here, which you have not done, and just "resolved" it instead:
https://github.com/openjdk/jdk/pull/27075#discussion_r2351166568

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27075#pullrequestreview-3244008016
PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2362301238

From epeter at openjdk.org  Fri Sep 19 09:41:04 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 19 Sep 2025 09:41:04 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v9]
In-Reply-To: <ORJC3bgHjy7bYmEWHSK32yPBQEftEsgaQH8Jdh05fvY=.78ce50d7-b35c-4d73-8bac-0a4aea1f4624@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <-cXUJ-9Nbp1h9REUqjyCpkrlDm4WzeiE0-6mx_QuWs4=.dd56a6a1-a626-4b03-b556-19b2b954a08b@github.com>
 <ORJC3bgHjy7bYmEWHSK32yPBQEftEsgaQH8Jdh05fvY=.78ce50d7-b35c-4d73-8bac-0a4aea1f4624@github.com>
Message-ID: <ieJFo2vKGuPreqcRQbusYOh-JRdpclziowD3caU9Rg4=.cd93f870-1bce-4faf-8e99-25607bb553d1@github.com>

On Fri, 19 Sep 2025 09:25:56 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Review comments resolutions
>
> test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java line 54:
> 
>> 52:     static final long rand_bndL2 = G.longs().next();
>> 53:     static final long rand_popcL1 = G.uniformLongs(0, 32).next();
>> 54:     static final long rand_popcL2 = G.uniformLongs(0, 32).next();
> 
> Why did you limit the range for longs to 32? Can it not go up to 64?
> I asked for an explanation (in a code comment) of those that you restrict here, which you have not done, and just "resolved" it instead:
> https://github.com/openjdk/jdk/pull/27075#discussion_r2351166568

If you do restrict it, then at least go over the range a little bit. Why?
You check `Integer.bitCount(num) < rand_popcI2`. The max value you get here is 32, so we could never get a constant folding case for the range `0..32`. Maybe that is ok, but we potentially miss a chance to find something we did not even anticipate.

That is why I would recommend **not** to constrain the values, unless you really have a good reason and write it down in a code comment.
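A minimal C++ sketch of the folding argument (hypothetical names `Fold`, `fold_popcount_lt` and `count_true_folds`; this is not C2's code, only an illustration assuming the compiler knows that popcount(x) lies in some range `[lo, hi]`). For an unconstrained `int` that range is `[0, 32]`, so `bitCount(num) < c` can only fold to always-true when `c > 32`:

```cpp
#include <cassert>
#include <cstdint>

// Possible outcomes when trying to fold `popcount(x) < c` at compile time.
enum class Fold { AlwaysTrue, AlwaysFalse, Unknown };

// Fold `popcount(x) < c`, given that popcount(x) is known to lie in [lo, hi].
Fold fold_popcount_lt(int lo, int hi, int64_t c) {
  if (hi < c) return Fold::AlwaysTrue;    // every possible popcount is below c
  if (lo >= c) return Fold::AlwaysFalse;  // no possible popcount is below c
  return Fold::Unknown;                   // result depends on x at runtime
}

// Count how many thresholds c in [0, max_c] make the comparison fold to true.
int count_true_folds(int lo, int hi, int64_t max_c) {
  int n = 0;
  for (int64_t c = 0; c <= max_c; ++c) {
    if (fold_popcount_lt(lo, hi, c) == Fold::AlwaysTrue) ++n;
  }
  return n;
}
```

Here `count_true_folds(0, 32, 32)` is 0 while `count_true_folds(0, 32, 40)` is 8, which is why thresholds drawn only from `0..32` never exercise the always-true fold; for longs the analogous bound is 64.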

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2362316509

From epeter at openjdk.org  Fri Sep 19 09:42:37 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 19 Sep 2025 09:42:37 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <feRNgjFxa1MZDl41muWzG13bQfvr1EjhiH7GcMSj_I4=.731caa37-c1d1-404a-8f3d-1030b4c97a05@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <dUFR_8r22-kNkoOcRIfqMuVjQ-a1EegX2YNJ8yGfFZU=.dd0eb3ac-601e-4ca7-ab32-41459788d7a2@github.com>
 <feRNgjFxa1MZDl41muWzG13bQfvr1EjhiH7GcMSj_I4=.731caa37-c1d1-404a-8f3d-1030b4c97a05@github.com>
Message-ID: <Dxq61i03xtwdwM_JdfOHfyvMD-FoWzOdqMnWjdmPO0A=.98465a3a-7f94-4f5a-a10e-453142fe9a1b@github.com>

On Wed, 17 Sep 2025 09:52:34 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> test/hotspot/jtreg/compiler/arguments/TestMethodArguments.java line 120:
>> 
>>> 118:                 Template.let("classpath", comp.getEscapedClassPathOfCompiledClasses()),
>>> 119:                 """
>>> 120:                         import java.util.Arrays;
>> 
>> Personally, I would not indent this deeply. I know that the generated code will not have proper indentation, but that's not so bad. Readability of the Templates is more important I think. Subjective though.
>
> No strong opinion here, I just went with the eclipse-jdtls autoformatter defaults. The generated code does have fairly OK indentation (the indentation in the code does not add any actual indentation in the generated code). Let me know what you prefer and I'll update it.

I would prefer readability of the test, not the generated code.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2362332715

From jbhateja at openjdk.org  Fri Sep 19 09:49:16 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 19 Sep 2025 09:49:16 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v10]
In-Reply-To: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
Message-ID: <HV-W-zZ6kxBI2gg4DnuQyFLxOUCNWcKzwM4GSdFyEPo=.b553be10-cd24-4957-90a3-6ca9970ac2f2@github.com>

> This patch optimizes PopCount value transforms using KnownBits information.
> Following are the results of the micro-benchmark included with the patch
> 
> 
> 
> System: 13th Gen Intel(R) Core(TM) i3-1315U
> 
> Baseline:
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s
> 
> With opt:
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s
> 
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin
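For reference, the throughput numbers above correspond to roughly these speedups (computed from the reported scores; with `Cnt = 2` and no error column they are indicative only):

```math
\frac{389978.082}{215460.670} \approx 1.81 \;(\text{LogicFoldingKerenLong}), \qquad
\frac{417261.583}{294014.826} \approx 1.42 \;(\text{LogicFoldingKerenlInt})
```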

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Update TestPopCountValueTransforms.java

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27075/files
  - new: https://git.openjdk.org/jdk/pull/27075/files/367622bf..92cf2fad

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=08-09

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/27075.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27075/head:pull/27075

PR: https://git.openjdk.org/jdk/pull/27075

From jbhateja at openjdk.org  Fri Sep 19 09:49:18 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 19 Sep 2025 09:49:18 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v9]
In-Reply-To: <ieJFo2vKGuPreqcRQbusYOh-JRdpclziowD3caU9Rg4=.cd93f870-1bce-4faf-8e99-25607bb553d1@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <-cXUJ-9Nbp1h9REUqjyCpkrlDm4WzeiE0-6mx_QuWs4=.dd56a6a1-a626-4b03-b556-19b2b954a08b@github.com>
 <ORJC3bgHjy7bYmEWHSK32yPBQEftEsgaQH8Jdh05fvY=.78ce50d7-b35c-4d73-8bac-0a4aea1f4624@github.com>
 <ieJFo2vKGuPreqcRQbusYOh-JRdpclziowD3caU9Rg4=.cd93f870-1bce-4faf-8e99-25607bb553d1@github.com>
Message-ID: <sh1VBXKXJjVxdHse8vQrCOoyoZBdX9OvjNXi1PlvpjU=.0fc68b72-9b63-49ca-b4e6-b1db0ca54161@github.com>

On Fri, 19 Sep 2025 09:32:30 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java line 54:
>> 
>>> 52:     static final long rand_bndL2 = G.longs().next();
>>> 53:     static final long rand_popcL1 = G.uniformLongs(0, 32).next();
>>> 54:     static final long rand_popcL2 = G.uniformLongs(0, 32).next();
>> 
>> Why did you limit the range for longs to 32? Can it not go up to 64?
>> I asked for an explanation (in a code comment) of those that you restrict here, which you have not done, and just "resolved" it instead:
>> https://github.com/openjdk/jdk/pull/27075#discussion_r2351166568
>
> If you do restrict it, then at least go over the range a little bit. Why?
> You check `Integer.bitCount(num) < rand_popcI2`. The max value you get here is 32, so we could never get a constant folding case for the range `0..32`. Maybe that is ok, but we potentially miss a chance to find something we did not even anticipate.
> 
> That is why I would recommend **not** to constrain the values, unless you really have a good reason and write it down in a code comment.

> Why did you limit the range for longs to 32? Can it not go up to 64? I asked for an explanation (in a code comment) of those that you restrict here, which you have not done, and just "resolved" it instead: [#27075 (comment)](https://github.com/openjdk/jdk/pull/27075#discussion_r2351166568)

A silly typo, so no explanation :-) enjoy your break :-)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2362345323

From jbhateja at openjdk.org  Fri Sep 19 09:53:06 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 19 Sep 2025 09:53:06 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v9]
In-Reply-To: <ORJC3bgHjy7bYmEWHSK32yPBQEftEsgaQH8Jdh05fvY=.78ce50d7-b35c-4d73-8bac-0a4aea1f4624@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <-cXUJ-9Nbp1h9REUqjyCpkrlDm4WzeiE0-6mx_QuWs4=.dd56a6a1-a626-4b03-b556-19b2b954a08b@github.com>
 <ORJC3bgHjy7bYmEWHSK32yPBQEftEsgaQH8Jdh05fvY=.78ce50d7-b35c-4d73-8bac-0a4aea1f4624@github.com>
Message-ID: <IvVVWJde40vcBJO8RNHevXDiMxO_dWCFyRj43aLwAU0=.23399468-b50b-400f-b226-bb651b62238c@github.com>

On Fri, 19 Sep 2025 09:37:45 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> As I'm about to board my plane for a 3-week vacation, I'm leaving a last little **note for the reviewers**.
> 
> I think this is a really nice addition, so thanks for doing it @jatin-bhateja. Though it will only reach its full potential once we implement more "basic" KnownBits optimizations such as [JDK-8367341](https://bugs.openjdk.org/browse/JDK-8367341).
> 

Correct; currently the KnownBits information is limited, since it is only generated for a restricted set of value ranges, as discussed in 
https://github.com/openjdk/jdk/pull/27075#discussion_r2337215333

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27075#issuecomment-3311500441

From epeter at openjdk.org  Fri Sep 19 10:01:46 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 19 Sep 2025 10:01:46 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v29]
In-Reply-To: <IJFPzAZnUjnsVQZOmcCfGEa-vgUTBCSYvIkTAzWgFyo=.2a3d0da5-7fb5-48af-b598-77241699f350@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <IJFPzAZnUjnsVQZOmcCfGEa-vgUTBCSYvIkTAzWgFyo=.2a3d0da5-7fb5-48af-b598-77241699f350@github.com>
Message-ID: <VnK4MpXfvSaF2IOGwZjTStgs1z1j8m6CWGfjnFQ5RHI=.15e345a1-753e-4ae5-befb-66c98b8a40dd@github.com>

On Wed, 17 Sep 2025 10:07:58 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> If a method has a large number of parameters, we currently bail out from C2 compilation.
>> 
>> ### Changeset
>> 
>> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
>> 
>> Changes:
>> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
>> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
>> - Remove all `can_represent` checks and bailouts.
>> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
>> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
>> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
>> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no...
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update after comments from Emanuel

@dlunde I just had another look through the whole code change. And I'm very happy with it now. Especially the additional code comments around `rollover`/`offset` really helped to bring it together for me :)

Thanks for bearing with me through the many comments / suggestions!

I would suggest that either @vnkozlov or @robcasloz have another quick look over the changes, just to see if they agree with what we have been doing ;)

test/hotspot/gtest/opto/test_regmask.cpp line 1222:

> 1220: }
> 1221: 
> 1222: #endif // !PRODUCT

Optional:
You could add some tests that expect a vm assert. You can do that with `TEST_VM_ASSERT_MSG`. Example:
https://github.com/openjdk/jdk/blob/9b04b5a74cc09b64098fb9940aa224f529ff1a01/test/hotspot/gtest/utilities/test_growableArray.cpp

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-3244069001
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2362363610

From epeter at openjdk.org  Fri Sep 19 10:01:47 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 19 Sep 2025 10:01:47 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <Pk7jfQR86D8wKhFB7wFZ6M6dSzWVPLuaaIHK8lV8i-U=.e75818b5-ed36-4b41-bfd5-6750e3df7722@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <dUFR_8r22-kNkoOcRIfqMuVjQ-a1EegX2YNJ8yGfFZU=.dd0eb3ac-601e-4ca7-ab32-41459788d7a2@github.com>
 <yAWlqsoRw6XTVuYKBC9QXG_5rH8xx2MsotmdBqZ3M3Q=.77688482-c807-4a15-9f22-2b868e977f8c@github.com>
 <Pk7jfQR86D8wKhFB7wFZ6M6dSzWVPLuaaIHK8lV8i-U=.e75818b5-ed36-4b41-bfd5-6750e3df7722@github.com>
Message-ID: <ZrN8HsAdCLQyd-KXHk4xJ7Q_wvtmYNQyz6C_BVWhxqo=.b774c00f-5f2f-476b-ae35-02f7fe4cb52d@github.com>

On Tue, 16 Sep 2025 12:54:37 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> `//assert(is_infinite_stack == lrg->mask().is_infinite_stack(), "nbrs must not change InfiniteStackedness");`
>
> No idea, sorry (it has been that way since initial load). I just touched it to change from all_stack to infinite_stack.

@dlunde Would you mind investigating in a follow-up RFE? I would just enable the assert and see if it triggers. If not, add the assert back in, otherwise see why the assert fails, and if that looks reasonable. If yes -> just remove it. If it is not reasonable .... we then investigate more I suppose ;)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2362347841

From jbhateja at openjdk.org  Fri Sep 19 11:10:20 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 19 Sep 2025 11:10:20 GMT
Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize
 NDD instructions [v2]
In-Reply-To: <PrOpTvYJIbrN8uxHoIR7gAOLZuiLCbNDELDI_Rs5mdk=.a536e484-0094-4815-bd07-f1f7cf339d53@github.com>
References: <PrOpTvYJIbrN8uxHoIR7gAOLZuiLCbNDELDI_Rs5mdk=.a536e484-0094-4815-bd07-f1f7cf339d53@github.com>
Message-ID: <sQpWsW33pqJzbp_rOu6loHUUW6RCJuRnfycf1wFVhIM=.6e38ab0a-6dbc-4e38-9a20-c10aa47f73fa@github.com>

> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges.
> 
> With the Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; in general, this saves additional spills for two-address instructions where the destination is also the first source operand and where the source live range extends beyond the current instruction.
> 
> All NDD instructions require the extended EVEX encoding with a bulky 4-byte prefix. [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first-color selection policy of the register allocator, demotions are rare. This patch biases the allocation of an NDD definition towards the first source operand, or towards the second source operand for the commutative class of operations.
> 
> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union-find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size.
> 
> The patch shows around 5-20% improvement in code size by facilitating NDD demotion.
> 
> For the following micro, the JIT code size of the method was reduced from 136 to 120 bytes, i.e. by 16 bytes, which is around a 12% reduction in code size footprint.
>  
> **Micro:-**
> <img width="1344" height="315" alt="image" src="https://github.com/user-attachments/assets/9cbe9da8-d6af-4b1c-bb55-3e5d86eb2cf9" />
> 
> 
> **Baseline :-**
> <img width="1013" height="163" alt="image" src="https://github.com/user-attachments/assets/ff5d50c6-fdfa-40e8-b93d-5f117d5a1ac6" />
> 
> **With opt:-**
> <img width="940" height="160" alt="image" src="https://github.com/user-attachments/assets/bff425b0-f7bf-4ffd-a43d-18bdeb36b000" />
> 
> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html).
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin
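The biasing idea described above can be sketched as follows. This is a simplified, hypothetical model (a 32-bit availability mask, with illustrative helpers `first_available` and `select_biased`), not C2's `PhaseChaitin` code: instead of always taking the first free colour, the allocator first tries the register already assigned to the first source operand, then the second one for commutative operations.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: bit r of `avail` is set if register r (0..31) is not
// used by any already-coloured interfering neighbour of this live range.
using Mask32 = uint32_t;
constexpr int kNoReg = -1;

// Baseline "first available colour" policy.
int first_available(Mask32 avail) {
  for (int r = 0; r < 32; ++r) {
    if (avail & (Mask32{1} << r)) return r;
  }
  return kNoReg;
}

// Biased policy for an NDD definition dst = op(src1, src2): reuse src1's
// register if it is still available (so dst == src1 and the instruction can
// later be demoted to a shorter encoding), else src2's register for
// commutative ops, else fall back to the baseline policy.
int select_biased(Mask32 avail, int src1_reg, int src2_reg, bool commutative) {
  auto is_free = [avail](int r) {
    return r >= 0 && r < 32 && (avail & (Mask32{1} << r)) != 0;
  };
  if (is_free(src1_reg)) return src1_reg;
  if (commutative && is_free(src2_reg)) return src2_reg;
  return first_available(avail);
}
```

Because registers of interfering live ranges are already excluded from `avail`, a source operand's register is only offered when its live range ends at this instruction, which is the case where `dst == src1` is legal and the assembler can then demote the EVEX form.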

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:

 - Updating as per reivew suggestions
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016
 - Some refactoring
 - 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions

-------------

Changes: https://git.openjdk.org/jdk/pull/26283/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=01
  Stats: 87 lines in 2 files changed: 70 ins; 6 del; 11 mod
  Patch: https://git.openjdk.org/jdk/pull/26283.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283

PR: https://git.openjdk.org/jdk/pull/26283

From hgreule at openjdk.org  Fri Sep 19 11:19:13 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Fri, 19 Sep 2025 11:19:13 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v10]
In-Reply-To: <HV-W-zZ6kxBI2gg4DnuQyFLxOUCNWcKzwM4GSdFyEPo=.b553be10-cd24-4957-90a3-6ca9970ac2f2@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <HV-W-zZ6kxBI2gg4DnuQyFLxOUCNWcKzwM4GSdFyEPo=.b553be10-cd24-4957-90a3-6ca9970ac2f2@github.com>
Message-ID: <vZycyM-_4jUo27au1btICpgKr99tMu5qLEqo_aoL-fw=.efe8d355-c4b3-46ae-98eb-4c307839c717@github.com>

On Fri, 19 Sep 2025 09:49:16 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> This patch optimizes PopCount value transforms using KnownBits information.
>> Following are the results of the micro-benchmark included with the patch
>> 
>> 
>> 
>> System: 13th Gen Intel(R) Core(TM) i3-1315U
>> 
>> Baseline:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s
>> 
>> With opt:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s
>> 
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update TestPopCountValueTransforms.java

src/hotspot/share/opto/countbitsnode.cpp line 123:

> 121: // we have at least and at most.
> 122: // From the definition of KnownBits, we know:
> 123: //   zeros: Indicates which bits must be 0: ones[i] =1 -> t[i]=0

I'm a bit confused by this, is ones[i] mixed up with zeros[i]? I.e., t[i]=0 if zeros[i]=1
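As a cross-check of that reading (t[i] = 0 when zeros[i] = 1, and t[i] = 1 when ones[i] = 1), the "at least / at most" bounds the comment aims for can be sketched in C++. The names `KnownBits64`, `PopCountRange` and `popcount_range` are hypothetical and do not reflect the actual code in `countbitsnode.cpp`:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical KnownBits for a value t of `width` bits (width <= 64), read as:
//   zeros bit i set -> t[i] is known to be 0
//   ones  bit i set -> t[i] is known to be 1
// Only the low `width` bits of each mask are assumed to be used.
struct KnownBits64 {
  uint64_t zeros;
  uint64_t ones;
};

struct PopCountRange {
  int lo;
  int hi;
};

// Portable population count: each iteration clears the lowest set bit.
int popcount64(uint64_t v) {
  int n = 0;
  while (v != 0) {
    v &= v - 1;
    ++n;
  }
  return n;
}

// Known-one bits always contribute to popcount(t); known-zero bits never do.
PopCountRange popcount_range(KnownBits64 kb, int width) {
  return PopCountRange{popcount64(kb.ones), width - popcount64(kb.zeros)};
}
```

When `lo == hi` (every bit known), the popcount is a constant; otherwise `[lo, hi]` is the range that comparisons like `Integer.bitCount(num) < rand_popcI2` in the test can be checked against.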

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2362569002

From dlunden at openjdk.org  Fri Sep 19 11:33:39 2025
From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=)
Date: Fri, 19 Sep 2025 11:33:39 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <ZrN8HsAdCLQyd-KXHk4xJ7Q_wvtmYNQyz6C_BVWhxqo=.b774c00f-5f2f-476b-ae35-02f7fe4cb52d@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <dUFR_8r22-kNkoOcRIfqMuVjQ-a1EegX2YNJ8yGfFZU=.dd0eb3ac-601e-4ca7-ab32-41459788d7a2@github.com>
 <yAWlqsoRw6XTVuYKBC9QXG_5rH8xx2MsotmdBqZ3M3Q=.77688482-c807-4a15-9f22-2b868e977f8c@github.com>
 <Pk7jfQR86D8wKhFB7wFZ6M6dSzWVPLuaaIHK8lV8i-U=.e75818b5-ed36-4b41-bfd5-6750e3df7722@github.com>
 <ZrN8HsAdCLQyd-KXHk4xJ7Q_wvtmYNQyz6C_BVWhxqo=.b774c00f-5f2f-476b-ae35-02f7fe4cb52d@github.com>
Message-ID: <27TyCS4slIg2kY1yfm1niF8Rr6jr8BGfXaHAhT11VEA=.cdca78a4-ea0f-4e9d-9da3-c1bab5f5c0e4@github.com>

On Fri, 19 Sep 2025 09:46:29 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> No idea, sorry (it has been that way since initial load). I just touched it to change from all_stack to infinite_stack.
>
> @dlunde Would you mind investigating in a follow-up RFE? I would just enable the assert and see if it triggers. If not, add the assert back in, otherwise see why the assert fails, and if that looks reasonable. If yes -> just remove it. If it is not reasonable .... we then investigate more I suppose ;)

Sure thing, I'll add it to the list.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2362598084

From dlunden at openjdk.org  Fri Sep 19 11:33:40 2025
From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=)
Date: Fri, 19 Sep 2025 11:33:40 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v27]
In-Reply-To: <Dxq61i03xtwdwM_JdfOHfyvMD-FoWzOdqMnWjdmPO0A=.98465a3a-7f94-4f5a-a10e-453142fe9a1b@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <vV7UBcDm4sXTqxjr2Nx3RTqfAxcSqn9z-RKD6EWY4yo=.700773cc-37fb-47c8-abc4-1e4ca7c79b7c@github.com>
 <dUFR_8r22-kNkoOcRIfqMuVjQ-a1EegX2YNJ8yGfFZU=.dd0eb3ac-601e-4ca7-ab32-41459788d7a2@github.com>
 <feRNgjFxa1MZDl41muWzG13bQfvr1EjhiH7GcMSj_I4=.731caa37-c1d1-404a-8f3d-1030b4c97a05@github.com>
 <Dxq61i03xtwdwM_JdfOHfyvMD-FoWzOdqMnWjdmPO0A=.98465a3a-7f94-4f5a-a10e-453142fe9a1b@github.com>
Message-ID: <iQcY4LMUYlTud5nCyu5Fk5iwwDBt_aH0YFy1I4FV1f0=.3d45ed0f-a167-4475-82ef-e0c646c1c859@github.com>

On Fri, 19 Sep 2025 09:39:44 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> No strong opinion here, I just went with the eclipse-jdtls autoformatter defaults. The generated code does have fairly OK indentation (the indentation in the code does not add any actual indentation in the generated code). Let me know what you prefer and I'll update it.
>
> I would prefer readability of the test, not the generated code.

OK, what I meant is that I did not understand how exactly you wanted me to make the test more readable. But, I had a look at the template framework example and will update to use the same style (align the `"""`)!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2362593056

From dlunden at openjdk.org  Fri Sep 19 12:43:06 2025
From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=)
Date: Fri, 19 Sep 2025 12:43:06 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v29]
In-Reply-To: <VnK4MpXfvSaF2IOGwZjTStgs1z1j8m6CWGfjnFQ5RHI=.15e345a1-753e-4ae5-befb-66c98b8a40dd@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <IJFPzAZnUjnsVQZOmcCfGEa-vgUTBCSYvIkTAzWgFyo=.2a3d0da5-7fb5-48af-b598-77241699f350@github.com>
 <VnK4MpXfvSaF2IOGwZjTStgs1z1j8m6CWGfjnFQ5RHI=.15e345a1-753e-4ae5-befb-66c98b8a40dd@github.com>
Message-ID: <GuLzp8GYOyUyn02e-HqBRBTiNp_lT8cSDdu6xRY1r1o=.000989ee-ad7a-40b5-ab1d-53c975a1d7d4@github.com>

On Fri, 19 Sep 2025 09:53:43 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update after comments from Emanuel
>
> test/hotspot/gtest/opto/test_regmask.cpp line 1222:
> 
>> 1220: }
>> 1221: 
>> 1222: #endif // !PRODUCT
> 
> Optional:
> You could add some tests that expect a vm assert. You can do that with `TEST_VM_ASSERT_MSG`. Example:
> https://github.com/openjdk/jdk/blob/9b04b5a74cc09b64098fb9940aa224f529ff1a01/test/hotspot/gtest/utilities/test_growableArray.cpp

Thanks, added a few obvious such tests!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2362753532

From dlunden at openjdk.org  Fri Sep 19 12:43:01 2025
From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=)
Date: Fri, 19 Sep 2025 12:43:01 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v30]
In-Reply-To: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
Message-ID: <ivJSMICnKRy3-cG5lqlKtidwCQGyStE0Ly4DkfXxRXs=.c164db8d-673d-422e-a0cd-7f6272700655@github.com>

> If a method has a large number of parameters, we currently bail out from C2 compilation.
> 
> ### Changeset
> 
> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
> 
> Changes:
> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
> - Remove all `can_represent` checks and bailouts.
> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
> 
> ### Testing
> 
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it).
> 
> ![c2-regression](https:/...
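The approach of keeping the statically sized part of the mask and only allocating extra words for the rare large cases can be illustrated with a small C++ sketch. Everything here (`GrowableMask`, the inline size of 4 words, `std::vector` for the extension) is a hypothetical simplification, not the actual `RegMask` implementation; it also shows an early-exit `overlap` in the spirit of the `RegMask::overlap` tuning mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mask: a fixed inline part plus an extension that is only
// allocated when a bit beyond the inline capacity is set (e.g. a stack slot
// for a method with many arguments).
class GrowableMask {
 public:
  static constexpr size_t kInlineWords = 4;  // illustrative size only
  static constexpr size_t kBitsPerWord = 64;

  void set(size_t bit) {
    size_t w = bit / kBitsPerWord;
    uint64_t m = uint64_t{1} << (bit % kBitsPerWord);
    if (w < kInlineWords) {
      inline_[w] |= m;
    } else {
      size_t ext = w - kInlineWords;
      if (ext >= extra_.size()) extra_.resize(ext + 1, 0);  // grow on demand
      extra_[ext] |= m;
    }
  }

  bool member(size_t bit) const {
    return ((word(bit / kBitsPerWord) >> (bit % kBitsPerWord)) & 1) != 0;
  }

  // True if the masks share a bit. Words beyond the shorter extension are
  // zero in one of the masks, so only the common prefix is scanned, and the
  // loop returns at the first shared word (early exit).
  bool overlap(const GrowableMask& other) const {
    size_t n = kInlineWords + std::min(extra_.size(), other.extra_.size());
    for (size_t w = 0; w < n; ++w) {
      if (word(w) & other.word(w)) return true;
    }
    return false;
  }

  bool uses_extension() const { return !extra_.empty(); }

 private:
  uint64_t word(size_t w) const {
    if (w < kInlineWords) return inline_[w];
    size_t ext = w - kInlineWords;
    return ext < extra_.size() ? extra_[ext] : 0;  // missing words read as 0
  }

  uint64_t inline_[kInlineWords] = {};
  std::vector<uint64_t> extra_;
};
```

In this sketch, masks that only touch the first 256 bits never allocate, which mirrors the goal of keeping the common case (a normal number of method parameters) on the fast, statically sized path.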

Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:

  Add vm-assert tests and improve template framework test indentation

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20404/files
  - new: https://git.openjdk.org/jdk/pull/20404/files/9b04b5a7..e165c961

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=29
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=28-29

  Stats: 99 lines in 2 files changed: 37 ins; 3 del; 59 mod
  Patch: https://git.openjdk.org/jdk/pull/20404.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404

PR: https://git.openjdk.org/jdk/pull/20404

From dlunden at openjdk.org  Fri Sep 19 12:46:42 2025
From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=)
Date: Fri, 19 Sep 2025 12:46:42 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v23]
In-Reply-To: <nHJuTQPo-3ZXu6X2rLAIwNzjnTTreNLkt5KuNeR-3mY=.183e982f-fa66-4ddc-b997-8a8203fe16db@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <KuhZYofHDkGkzw1Kq6vDvRs4_aDxOJDbTpIL8gnkQL8=.0d25e4bc-1f73-490f-a65b-29bef7ac8903@github.com>
 <qfJuLa2rYGYnrmbp32LpJgVaZfShvNjVkGOuJrSw00A=.5f7b712d-5700-45b5-8beb-fde3611e31de@github.com>
 <GjF5qX4BV-4xAWV6kDweN3luDSVQXxxp5i6creb7_L4=.085a85af-0ec5-42ca-a076-bbf554853d3a@github.com>
 <gS-eZ4OguAN-N_CI_x4TipmDjyr1XiPce49X0z3AWc4=.afc55282-4ad2-4d65-9912-d513d11585a3@github.com>
 <nHJuTQPo-3ZXu6X2rLAIwNzjnTTreNLkt5KuNeR-3mY=.183e982f-fa66-4ddc-b997-8a8203fe16db@github.com>
Message-ID: <5Sarw16IDhzy-qXC9hrXwflQ549w5lcDQRSJNTWJwv0=.944fbf98-dd34-4607-b08c-591935df359c@github.com>

On Wed, 17 Sep 2025 09:56:16 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> @dlunde Thanks for the swift updates! I have in the meantime added some more comments, just making sure you don't miss them :)
>
> @eme64
> 
>> You seem to have a build failure:
>> 
>> ```
>> In file included from /home/runner/work/jdk/jdk/src/hotspot/share/opto/compile.hpp:43,
>>                  from /home/runner/work/jdk/jdk/src/hotspot/share/opto/type.hpp:29,
>>                  from /home/runner/work/jdk/jdk/test/hotspot/gtest/opto/test_rangeinference.cpp:26:
>> /home/runner/work/jdk/jdk/src/hotspot/share/opto/regmask.hpp: In constructor ?RegMask::RegMask(Arena*)?:
>> /home/runner/work/jdk/jdk/src/hotspot/share/opto/regmask.hpp:441:53: error: class ?RegMask? does not have any field named ?_read_only?
>>   441 |       : _rm_word() DEBUG_ONLY(COMMA _arena(arena)), _read_only(read_only),
>>       |                                                     ^~~~~~~~~~
>> /home/runner/work/jdk/jdk/src/hotspot/share/opto/regmask.hpp:441:64: error: ‘read_only’ was not declared in this scope
>>   441 |       : _rm_word() DEBUG_ONLY(COMMA _arena(arena)), _read_only(read_only),
>>       |     
>> ```
> 
> Thanks, only failed on release so didn't notice. Will fix.
> 
>> I really appreciate that you added extensive `gtest`s, thanks for that.
> 
> @robcasloz contributed 90% of that, so the credit goes to him!
> 
>> And thanks for using the Template Framework, I'm curious to hear if you have any feedback on it :)
> 
> Sure, it was quite convenient. Happy to talk about the experience offline.

> @dlunde I just had another look through the whole code change. And I'm very happy with it now. Especially the additional code comments around `rollover`/`offset` really helped to bring it together for me :)
> 
> Thanks for bearing with me through the many comments / suggestions.
> 
> I would suggest that either @vnkozlov or @robcasloz have another quick look over the changes, just to see if they agree with what we have been doing ;)

Thank you @eme64, much appreciated! I agree we have improved the changeset a lot from the initial version. Yes, @robcasloz has let me know he will have a look at the changes soon. @vnkozlov is also welcome to review again, of course, but his previous review applies to a long-outdated version of the changeset.
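The release-only build failure quoted above comes from a constructor that names `read_only` and `_read_only` outside `DEBUG_ONLY`. Judging from the error, both exist only in debug builds. Below is a minimal sketch of the usual pattern. It assumes HotSpot-style `DEBUG_ONLY`/`COMMA` macros, recreated locally, and uses a hypothetical `MiniRegMask` class rather than the real `RegMask`. The idea is to wrap every debug-only parameter and initializer, leading comma included, so that the release expansion never mentions them.

```cpp
#include <cassert>

// Local stand-ins for HotSpot's DEBUG_ONLY and COMMA macros, recreated so the
// sketch compiles on its own. Build with -DASSERT to emulate a debug build;
// without it, every DEBUG_ONLY(...) expands to nothing.
#ifdef ASSERT
#define DEBUG_ONLY(code) code
#else
#define DEBUG_ONLY(code)
#endif
#define COMMA ,

struct Arena {};

// Hypothetical, heavily simplified mask whose arena pointer and read-only
// flag exist only in debug builds.
class MiniRegMask {
  unsigned long _rm_word[4];
  DEBUG_ONLY(Arena* _arena;)
  DEBUG_ONLY(bool _read_only;)

 public:
  // The debug-only parameter and initializers sit inside DEBUG_ONLY, leading
  // comma included, so a release build never refers to them.
  explicit MiniRegMask(Arena* arena DEBUG_ONLY(COMMA bool read_only = false))
      : _rm_word() DEBUG_ONLY(COMMA _arena(arena) COMMA _read_only(read_only)) {
    (void)arena;  // silences the unused-parameter warning in release builds
  }

  bool is_empty() const {
    for (unsigned long w : _rm_word) {
      if (w != 0) return false;
    }
    return true;
  }
};
```

Compiling such code both with and without `-DASSERT` is a cheap way to catch this class of error locally, because a default local build may exercise only one of the two configurations.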

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-3312059287

From jbhateja at openjdk.org  Fri Sep 19 12:55:42 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 19 Sep 2025 12:55:42 GMT
Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize
 NDD instructions [v2]
In-Reply-To: <jfGklgmXRANwOFMLnns-6U-iI-E_6RBVYHt3ErYg5RQ=.3614a1ee-2c1e-4e6f-8212-0962b757a332@github.com>
References: <PrOpTvYJIbrN8uxHoIR7gAOLZuiLCbNDELDI_Rs5mdk=.a536e484-0094-4815-bd07-f1f7cf339d53@github.com>
 <jfGklgmXRANwOFMLnns-6U-iI-E_6RBVYHt3ErYg5RQ=.3614a1ee-2c1e-4e6f-8212-0962b757a332@github.com>
Message-ID: <JDZywBBSvq33vGKKY9xSLCVeuhf6neLDWGFkYgZd3jI=.2fea29c2-0b7a-4edc-aeaa-74890b7519b4@github.com>

On Tue, 26 Aug 2025 23:37:01 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:
>> 
>>  - Updating as per reivew suggestions
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016
>>  - Some refactoring
>>  - 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions
>
> src/hotspot/share/opto/chaitin.cpp line 1655:
> 
>> 1653:     };
>> 1654: 
>> 1655:     if (X86_ONLY(UseAPX) NOT_X86(false)) {
> 
> The change looks to be generically applicable and not APX or X86 specific.

Hi @sviswa7, I have generalized the fix by lifting the X86/APX checks, as per your suggestion. However, our intent here is to facilitate the demotion of NDD instructions that carry the 4-byte extended EVEX prefix; for other 3-operand instructions, we may not see any benefit from biasing. If a source operand's live range (LRG) extends beyond the instruction that uses it, the LRGs interfere and RA automatically prevents the definition from sharing that register; otherwise, **it may** already assign the same register to the definition under the first-available-colour policy. Thus, biasing is only beneficial for the APX NDD use case, where the assembler layer is equipped to perform demotion.
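The biasing idea discussed above can be sketched with a toy colour-selection routine. This is a hypothetical model, not the code in `chaitin.cpp`:

- Registers are bits in a 64-bit `FreeMask`. A bit is set when no interfering live range holds that register.
- The default policy takes the lowest free register.
- The biased policy first tries the first source operand's register and, for commutative operations, the second's.

When a source live range extends past the instruction, its register interferes with the definition and is simply absent from `free`, so the bias falls back to the default choice.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: bit r of `free` is set when register r is not held by
// any live range that interferes with the definition being coloured.
using FreeMask = uint64_t;
constexpr int kNoReg = -1;

bool is_free(FreeMask free, int reg) {
  return reg >= 0 && reg < 64 && ((free >> reg) & 1u) != 0;
}

// Default policy: the lowest-numbered free register ("first available colour").
int first_free(FreeMask free) {
  for (int r = 0; r < 64; r++) {
    if (is_free(free, r)) return r;
  }
  return kNoReg;
}

// Biased policy: prefer the first source operand's register, then (only for
// commutative operations) the second's, so that dst == src and the assembler
// can demote the NDD form; otherwise fall back to the default choice.
int select_biased(FreeMask free, int src1_reg, int src2_reg, bool commutative) {
  if (is_free(free, src1_reg)) return src1_reg;
  if (commutative && is_free(free, src2_reg)) return src2_reg;
  return first_free(free);
}
```

The bias is only a preference, so correctness never depends on it: the worst case is the same register the default policy would have picked.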

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26283#discussion_r2362786986

From dlunden at openjdk.org  Fri Sep 19 12:58:40 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Fri, 19 Sep 2025 12:58:40 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v31]
In-Reply-To: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
Message-ID: <FDQaHboSr6jBkncXBlhzEAMuMjDM6fYn-mUki9THBSs=.d6b6bf26-7a23-47e1-9df0-e82bff309d21@github.com>

> If a method has a large number of parameters, we currently bail out from C2 compilation.
> 
> ### Changeset
> 
> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
> 
> Changes:
> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
> - Remove all `can_represent` checks and bailouts.
> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
> 
> ### Testing
> 
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The degradation is caused by additional checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but the changes I found that improve compilation time also make the code less readable (and are, in my opinion, not worth it).
> 
> ![c2-regression](https:/...

Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 43 commits:

 - Merge tag 'jdk-26+16' into many-arguments-8325467+pr-updates
   
   Added tag jdk-26+16 for changeset a49856bb
 - Add vm-assert tests and improve template framework test indentation
 - Update after comments from Emanuel
 - Update after comments from Emanuel
 - Clarify comments in regmask.hpp
 - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
 - Address review comments (renaming on the way in a separate PR)
 - Update src/hotspot/share/opto/regmask.hpp
   
   Co-authored-by: Emanuel Peter <emanuel.peter at oracle.com>
 - Restore modified java/lang/invoke tests
 - Sort includes (new requirement)
 - ... and 33 more: https://git.openjdk.org/jdk/compare/a49856bb...84efc2db

-------------

Changes: https://git.openjdk.org/jdk/pull/20404/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=30
  Stats: 2890 lines in 29 files changed: 2325 ins; 288 del; 277 mod
  Patch: https://git.openjdk.org/jdk/pull/20404.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404

PR: https://git.openjdk.org/jdk/pull/20404
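The core data-structure idea in the description can be illustrated with a small sketch: keep a fixed, statically allocated base and grow into dynamically allocated memory only in the rare large-method case. The following are all assumptions for illustration:

- the class name `GrowableMask`;
- a base size of four 64-bit words;
- `std::vector` for the extension.

HotSpot's `RegMask` uses its own layout and arena allocation. The `overlap` method shows the kind of early exit the description mentions for `RegMask::overlap`: it returns on the first shared bit and never scans words beyond the shorter mask.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical register mask: a fixed inline base that covers the common case
// plus an extension that is allocated only when a high bit is needed.
class GrowableMask {
  static constexpr size_t kBaseWords = 4;  // illustrative base size
  static constexpr size_t kBitsPerWord = 64;

  uint64_t _base[kBaseWords] = {};
  std::vector<uint64_t> _ext;  // stand-in for arena-allocated extra words

  size_t words() const { return kBaseWords + _ext.size(); }

  // Words past the end of the mask read as zero.
  uint64_t word(size_t i) const {
    if (i < kBaseWords) return _base[i];
    size_t j = i - kBaseWords;
    return j < _ext.size() ? _ext[j] : 0;
  }

 public:
  void insert(size_t bit) {
    size_t w = bit / kBitsPerWord;
    uint64_t m = uint64_t(1) << (bit % kBitsPerWord);
    if (w < kBaseWords) {
      _base[w] |= m;  // fast path: no allocation
      return;
    }
    size_t j = w - kBaseWords;  // rare path: grow on demand
    if (j >= _ext.size()) _ext.resize(j + 1, 0);
    _ext[j] |= m;
  }

  bool member(size_t bit) const {
    return ((word(bit / kBitsPerWord) >> (bit % kBitsPerWord)) & 1) != 0;
  }

  bool is_extended() const { return !_ext.empty(); }

  // Early exit on the first shared bit; words beyond the shorter mask are
  // zero in that mask and therefore cannot overlap.
  bool overlap(const GrowableMask& other) const {
    size_t n = words() < other.words() ? words() : other.words();
    for (size_t i = 0; i < n; i++) {
      if ((word(i) & other.word(i)) != 0) return true;
    }
    return false;
  }
};
```

Masks that only use low bits never allocate, which is the property the changeset aims to preserve for the common case.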

From liach at openjdk.org  Fri Sep 19 13:10:33 2025
From: liach at openjdk.org (Chen Liang)
Date: Fri, 19 Sep 2025 13:10:33 GMT
Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v8]
In-Reply-To: <dm6o-XRNWecHjZeenCSRhAOEYI7L_mOOqfguvjlk_Bc=.cdb4e2e5-3500-48d5-8c05-749ff01f06f1@github.com>
References: <dm6o-XRNWecHjZeenCSRhAOEYI7L_mOOqfguvjlk_Bc=.cdb4e2e5-3500-48d5-8c05-749ff01f06f1@github.com>
Message-ID: <lpdw4jDspZqPbC5tKvuFa48-PVaAlapQ82QoKEWX5uE=.a3dfdaed-54c5-4960-9300-5c8c043e6939@github.com>

> In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving whatever can be done in Java to Java; thus I prepared this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list".

Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision:

 - Separate design doc
 - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate
 - More review updates
 - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate
 - Move intrinsic to be a subsection; just one most common function of the annotation
 - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate
 - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate
 - Update src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java
   
   Co-authored-by: Raffaello Giulietti <raffaello.giulietti at oracle.com>
 - Shorter first sentence
 - Updates, thanks to John
 - ... and 2 more: https://git.openjdk.org/jdk/compare/0d5ea5a0...e4afa49d

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24777/files
  - new: https://git.openjdk.org/jdk/pull/24777/files/a312d92b..e4afa49d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24777&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24777&range=06-07

  Stats: 348197 lines in 6043 files changed: 206814 ins; 98457 del; 42926 mod
  Patch: https://git.openjdk.org/jdk/pull/24777.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24777/head:pull/24777

PR: https://git.openjdk.org/jdk/pull/24777

From liach at openjdk.org  Fri Sep 19 13:10:37 2025
From: liach at openjdk.org (Chen Liang)
Date: Fri, 19 Sep 2025 13:10:37 GMT
Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v7]
In-Reply-To: <vfDsc33Hcmqn4buhdw7ZV-jVSMPUkrkNOcCTdkEulCU=.74cb83d5-f2d4-4bc3-96f6-e245ee7f3842@github.com>
References: <dm6o-XRNWecHjZeenCSRhAOEYI7L_mOOqfguvjlk_Bc=.cdb4e2e5-3500-48d5-8c05-749ff01f06f1@github.com>
 <vfDsc33Hcmqn4buhdw7ZV-jVSMPUkrkNOcCTdkEulCU=.74cb83d5-f2d4-4bc3-96f6-e245ee7f3842@github.com>
Message-ID: <dNKHnmkH7M2RIcWPODbIewc-5sYCGFpylYVjUBOz_qA=.8eebdb60-d638-46e5-bb9c-6fee28895c84@github.com>

On Wed, 21 May 2025 21:31:16 GMT, Chen Liang <liach at openjdk.org> wrote:

>> In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving whatever can be done in Java to Java; thus I prepared this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list".
>
> Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision:
> 
>  - More review updates
>  - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate
>  - Move intrinsic to be a subsection; just one most common function of the annotation
>  - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate
>  - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate
>  - Update src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java
>    
>    Co-authored-by: Raffaello Giulietti <raffaello.giulietti at oracle.com>
>  - Shorter first sentence
>  - Updates, thanks to John
>  - Refine validation and defensive copying
>  - 8355223: Improve documentation on @IntrinsicCandidate

Let's continue. I've moved most of the checks and related material into a standalone design doc.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24777#issuecomment-3312125701

From rcastanedalo at openjdk.org  Fri Sep 19 13:12:36 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 19 Sep 2025 13:12:36 GMT
Subject: RFR: 8327963: C2: fix construction of memory graph around
 Initialize node to prevent incorrect execution if allocation is removed [v12]
In-Reply-To: <S83l4ZBZILBlhRpSoVORBxzNNB9bF5xN-QhRWc16_T4=.6d5c1624-5d77-4b3e-9db0-0e01e6b1b36b@github.com>
References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
 <S83l4ZBZILBlhRpSoVORBxzNNB9bF5xN-QhRWc16_T4=.6d5c1624-5d77-4b3e-9db0-0e01e6b1b36b@github.com>
Message-ID: <Q5oX9YdsBPrpdbXle-aox01GBSZGDZXRrIvWaN4r2zs=.49445183-c538-4ea2-bfdb-964c4898137f@github.com>

On Tue, 9 Sep 2025 11:27:50 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> An `Initialize` node for an `Allocate` node is created with a memory
>> `Proj` of adr type raw memory. In order for stores to be captured, the
>> memory state out of the allocation is a `MergeMem` with slices for the
>> various object fields/array element set to the raw memory `Proj` of
>> the `Initialize` node. If `Phi`s need to be created during later
>> transformations from this memory state, the `Phi` for a particular
>> slice gets its adr type from the type of the `Proj` which is raw
>> memory. If during macro expansion, the `Allocate` is found to have no
>> use and so can be removed, the `Proj` out of the `Initialize` is
>> replaced by the memory state on input to the `Allocate`. A `Phi` for
>> some slice for a field of an object will end up with the raw memory
>> state on input to the `Allocate` node. As a result, memory state at
>> the `Phi` is incorrect and incorrect execution can happen.
>> 
>> The fix I propose is, rather than have a single `Proj` for the memory
>> state out of the `Initialize` with adr type raw memory, to use one
>> `Proj` per slice added to the memory state after the `Initialize`. Each
>> of the `Proj` should return the right adr type for its slice. For that
>> I propose having a new type of `Proj`: `NarrowMemProj` that captures
>> the right adr type.
>> 
>> Logic for the construction of the `Allocate`/`Initialize` subgraph is
>> tweaked so the right adr type captured in its own `NarrowMemProj` is
>> added to the memory subgraph. Code that removes an allocation or moves
>> it also has to be changed so it correctly takes the multiple memory
>> projections out of the `Initialize` node into account.
>> 
>> One tricky issue is that when EA splits unique types for a scalar replaceable
>> `Allocate` node:
>> 
>> 1- the adr type captured in the `NarrowMemProj` becomes out of sync
>>   with the type of the slices for the allocation
>>   
>> 2- before EA, the memory state for one particular field out of the
>>   `Initialize` node can be used for a `Store` to the just allocated
>>   object or some other. So we can have a chain of `Store`s, some to
>>   the newly allocated object, some to some other objects, all of them
>>   using the state of `NarrowMemProj` out of the `Initialize`. After
>>   split unique types, the `NarrowMemProj` is for the slice of a
>>   particular allocation. So `Store`s to some other objects shouldn't
>>   use that memory state but the memory state before the `Allocate`.
>>   
>> For that, I added logic to update the adr type of `NarrowMemProj`
>> during split uni...
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 45 commits:
> 
>  - more
>  - Merge branch 'master' into JDK-8327963
>  - more
>  - more
>  - Merge branch 'master' into JDK-8327963
>  - more
>  - more
>  - lambda return
>  - lambda clean up
>  - Merge branch 'master' into JDK-8327963
>  - ... and 35 more: https://git.openjdk.org/jdk/compare/e16c5100...b701d03e

Changes requested by rcastanedalo (Reviewer).

src/hotspot/share/opto/escape.hpp line 567:

> 565:                         // MemNode       - new memory input for this node
> 566:                         // CheckCastPP   - allocation that this is a cast of
> 567:                         // allocation    - CheckCastPP of the allocation

Please add a new entry here explaining how `_node_map` is used for `NarrowMemProjNode` nodes.

src/hotspot/share/opto/graphKit.cpp line 3645:

> 3643:     assert(minit_out->is_Proj() && minit_out->in(0) == init, "");
> 3644:     int mark_idx = C->get_alias_index(oop_type->add_offset(oopDesc::mark_offset_in_bytes()));
> 3645:     // Add an edge in the MergeMem for the header fields so an access to one of those has correct memory state

Suggestion:

    // Add an edge in the MergeMem for the header fields so an access to one of those has correct memory state.

src/hotspot/share/opto/graphKit.cpp line 3647:

> 3645:     // Add an edge in the MergeMem for the header fields so an access to one of those has correct memory state
> 3646:     // Use one NarrowMemProjNode per slice to properly record the adr type of each slice. The Initialize node will have
> 3647:     // multiple projection as a result.

Suggestion:

    // multiple projections as a result.

src/hotspot/share/opto/macro.cpp line 1606:

> 1604:       // elimination. Simply add the MemBarStoreStore after object
> 1605:       // initialization.
> 1606:       MemBarNode* mb = MemBarNode::make(C, Op_MemBarStoreStore, Compile::AliasIdxRaw);

Does the same argument as below apply for relaxing the scope of this memory barrier? Please clarify in a similar comment for this case (if the same argument applies, a reference to the comment below would be enough).

src/hotspot/share/opto/macro.cpp line 1623:

> 1621:       Node* init_ctrl = init->proj_out_or_null(TypeFunc::Control);
> 1622: 
> 1623:       // What we want is to prevent the compiler and the cpu from re-ordering the stores that initialize this object

Suggestion:

      // What we want is to prevent the compiler and the CPU from re-ordering the stores that initialize this object

src/hotspot/share/opto/macro.cpp line 1628:

> 1626:       // only captures/produces a partial memory state making it complicated to insert such a MemBar. Because
> 1627:       // re-ordering by the compiler can't happen by construction (a later Store that publishes the just allocated
> 1628:       // object reference is indirectly control dependent on the Initialize node), preventing reordering by the cpu is

Suggestion:

      // object reference is indirectly control dependent on the Initialize node), preventing reordering by the CPU is

src/hotspot/share/opto/memnode.hpp line 1383:

> 1381:   bool already_has_narrow_mem_proj_with_adr_type(const TypePtr* adr_type) const;
> 1382: 
> 1383:   MachProjNode* mem_mach_proj() const;

Please add a brief comment above this function, possibly clarifying that we do not expect to find more than one Mach memory projection.

src/hotspot/share/opto/multnode.cpp line 73:

> 71:   };
> 72:   return apply_to_projs(filter, which_proj);
> 73: }

Consider moving this implementation to `multnode.hpp`, perhaps next to that of `MultiNode::apply_to_projs(DUIterator_Fast& imax, DUIterator_Fast& i, Callback callback, uint which_proj)`,  for consistency.

src/hotspot/share/opto/multnode.cpp line 279:

> 277: void NarrowMemProjNode::dump_spec(outputStream *st) const {
> 278:   ProjNode::dump_spec(st);
> 279:   dump_adr_type(st);

Do we need to define a special version of `NarrowMemProjNode::dump_adr_type` or could we just have the same effect calling `MemNode::dump_adr_type(this, _adr_type, st)` here?

src/hotspot/share/opto/multnode.cpp line 284:

> 282: void NarrowMemProjNode::dump_compact_spec(outputStream *st) const {
> 283:   ProjNode::dump_compact_spec(st);
> 284:   dump_adr_type(st);

Same here.

src/hotspot/share/opto/multnode.hpp line 71:

> 69:     }
> 70:     Node* current() {
> 71:       return _node->fast_out(_i);;

Suggestion:

      return _node->fast_out(_i);

src/hotspot/share/opto/multnode.hpp line 90:

> 88:     }
> 89:     Node* current() {
> 90:       return _node->out(_i);;

Suggestion:

      return _node->out(_i);

src/hotspot/share/opto/phaseX.cpp line 2621:

> 2619:       add_users_to_worklist0(proj, worklist);
> 2620:       return MultiNode::CONTINUE;
> 2621:     };

Consider defining `enqueue` only once and reusing it in both cases.
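The suggestion to define `enqueue` once can be illustrated with a simplified model of callback-based projection iteration. Only the idea of a callback returning a continue/stop signal mirrors the `MultiNode::CONTINUE` return in the quoted diff. `Iter`, `Proj`, `apply_to_projs`, and `enqueue_users` here are hypothetical stand-ins, not HotSpot signatures.

```cpp
#include <cassert>
#include <vector>

// Hypothetical continue/stop protocol for the callback.
enum class Iter { CONTINUE, BREAK };

// Simplified projection: `con` selects the projection kind, `id` names it.
struct Proj { int con; int id; };

// Calls `callback` on every projection of kind `which_proj`, stopping early
// if the callback asks to.
template <typename Callback>
Iter apply_to_projs(const std::vector<Proj>& outs, int which_proj, Callback callback) {
  for (const Proj& p : outs) {
    if (p.con != which_proj) continue;
    if (callback(p) == Iter::BREAK) return Iter::BREAK;
  }
  return Iter::CONTINUE;
}

// The lambda is defined once and reused for both projection kinds, instead of
// spelling out two identical lambdas.
std::vector<int> enqueue_users(const std::vector<Proj>& outs, int kind_a, int kind_b) {
  std::vector<int> worklist;
  auto enqueue = [&](const Proj& p) {
    worklist.push_back(p.id);
    return Iter::CONTINUE;
  };
  apply_to_projs(outs, kind_a, enqueue);
  apply_to_projs(outs, kind_b, enqueue);
  return worklist;
}
```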

test/hotspot/jtreg/compiler/escapeAnalysis/TestIterativeEA.java line 53:

> 51:     analyzer.shouldContain("++++ Eliminated: 26 Allocate");
> 52:     analyzer.shouldContain("++++ Eliminated: 51 Allocate");
> 53:     analyzer.shouldContain("++++ Eliminated: 84 Allocate");

Did you analyze why there are more allocations removed than before in this test case? I did not expect this changeset to have an effect on the number of removed allocations.

test/hotspot/jtreg/compiler/macronodes/TestEarlyEliminationOfAllocationWithoutUse.java line 1:

> 1: /*

Please add a package declaration (and make the corresponding class names fully qualified in the `@run` directives).

test/hotspot/jtreg/compiler/macronodes/TestEliminationOfAllocationWithoutUse.java line 30:

> 28:  *          Now that array slice depends on the rawslice. And then when the Initialize MemBar gets
> 29:  *          removed in expand_allocate_common, the rawslice sees that it has now no effect, looks
> 30:  *          through the MergeMem and sees the initial stae. That way, also the linked array slice

Suggestion:

 *          through the MergeMem and sees the initial state. That way, also the linked array slice

-------------

PR Review: https://git.openjdk.org/jdk/pull/24570#pullrequestreview-3244667543
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362830370
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362759304
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362760441
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362798596
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362800147
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362800934
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362782847
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362757140
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362743051
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362743403
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362746650
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362750245
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362767659
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362816473
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362810978
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362745517
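The bug and fix described in the PR can be modelled as a toy. Here a memory state maps an alias index (slice) to the name of the node producing memory for it. All names are hypothetical, and alias index 0 stands for raw memory.

- With a single raw `Proj`, removing the allocation can only substitute the incoming raw state for every slice.
- With one projection per slice, each recording its own slice (the role `NarrowMemProj` plays in the PR), every slice gets back its own incoming memory state.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy memory state: alias index (slice) -> name of the producing node.
using MemState = std::map<int, std::string>;
constexpr int kRaw = 0;  // alias index 0 models raw memory

// Old scheme: every slice after the allocation uses the same raw-typed
// projection, so removal can only substitute the incoming raw state.
MemState remove_alloc_single_proj(const MemState& before_alloc, const MemState& after_alloc) {
  MemState result = after_alloc;
  for (auto& kv : result) {
    if (kv.second == "init_raw_proj") kv.second = before_alloc.at(kRaw);
  }
  return result;
}

// Per-slice scheme: each projection remembers its slice, so removal
// substitutes the matching incoming slice (or raw memory if none exists).
MemState remove_alloc_narrow_projs(const MemState& before_alloc, const MemState& after_alloc) {
  MemState result = after_alloc;
  for (auto& kv : result) {
    if (kv.second == "narrow_proj_" + std::to_string(kv.first)) {
      auto it = before_alloc.find(kv.first);
      kv.second = it != before_alloc.end() ? it->second : before_alloc.at(kRaw);
    }
  }
  return result;
}
```

In the old scheme, a field slice that had a prior store ends up pointing at raw memory after removal. A `Phi` built on that slice would then see the wrong state, which is the incorrect execution the PR describes.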

From jbhateja at openjdk.org  Fri Sep 19 13:17:04 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 19 Sep 2025 13:17:04 GMT
Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize
 NDD instructions [v3]
In-Reply-To: <PrOpTvYJIbrN8uxHoIR7gAOLZuiLCbNDELDI_Rs5mdk=.a536e484-0094-4815-bd07-f1f7cf339d53@github.com>
References: <PrOpTvYJIbrN8uxHoIR7gAOLZuiLCbNDELDI_Rs5mdk=.a536e484-0094-4815-bd07-f1f7cf339d53@github.com>
Message-ID: <52RpYM-r-1EZcYjbaNllAEPHQP1nYhQcs-GfydIzP08=.0bfb8185-78a7-4dfb-9700-f4a36a1d0e99@github.com>

> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that is not already assigned to an interfering (neighboring) live range.
> 
> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction.
> 
> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix. [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first-colour selection policy of the register allocator, demotions are rare. This patch biases the register allocation of an NDD definition towards the first source operand, or towards the second source operand for commutative operations.
> 
> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union-find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size.
> 
> The patch shows around 5-20% improvement in code size by facilitating NDD demotion.
> 
> For the following micro, the JIT code size of the method is reduced from 136 to 120 bytes, a reduction of around 12% in code size footprint.
>  
> **Micro:-**
> <img width="1344" height="315" alt="image" src="https://github.com/user-attachments/assets/9cbe9da8-d6af-4b1c-bb55-3e5d86eb2cf9" />
> 
> 
> **Baseline :-**
> <img width="1013" height="163" alt="image" src="https://github.com/user-attachments/assets/ff5d50c6-fdfa-40e8-b93d-5f117d5a1ac6" />
> 
> **With opt:-**
> <img width="940" height="160" alt="image" src="https://github.com/user-attachments/assets/bff425b0-f7bf-4ffd-a43d-18bdeb36b000" />
> 
> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html).
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Fix jtreg, one less spill

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26283/files
  - new: https://git.openjdk.org/jdk/pull/26283/files/cd13fe60..3ebe52fa

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26283&range=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/26283.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26283/head:pull/26283

PR: https://git.openjdk.org/jdk/pull/26283
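The code-size argument in the description reduces to simple prefix arithmetic. The sketch below uses the prefix sizes stated there: 4-byte EVEX, 2-byte REX2 and 1-byte REX. As a simplification, it assumes that demotion changes only the prefix. In practice, the legacy encoding of some opcodes may need extra escape bytes, so real savings per instruction can be smaller.

```cpp
#include <cassert>

// Prefix sizes as stated in the PR description.
constexpr int kEvexPrefixBytes = 4;
constexpr int kRex2PrefixBytes = 2;
constexpr int kRexPrefixBytes = 1;

// Bytes saved by demoting one NDD instruction, assuming (simplification) that
// only the prefix changes between the two encodings.
constexpr int bytes_saved_per_demotion(int demoted_prefix_bytes) {
  return kEvexPrefixBytes - demoted_prefix_bytes;
}

// Relative code-size reduction, measured against the baseline size.
double reduction_percent(int baseline_bytes, int optimized_bytes) {
  return 100.0 * (baseline_bytes - optimized_bytes) / baseline_bytes;
}
```

For the micro quoted in the description, going from 136 to 120 bytes saves 16/136, about 11.8% of the baseline size. Dividing by the optimized size instead gives about 13.3%.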

From qamai at openjdk.org  Fri Sep 19 13:52:21 2025
From: qamai at openjdk.org (Quan Anh Mai)
Date: Fri, 19 Sep 2025 13:52:21 GMT
Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v9]
In-Reply-To: <1ZCEMsPvSQaLGWRuNtO89LNP_XUeaz-edeIUrKwRCZY=.9dad5a02-c739-4e24-8692-8941f31e5a49@github.com>
References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
 <1ZCEMsPvSQaLGWRuNtO89LNP_XUeaz-edeIUrKwRCZY=.9dad5a02-c739-4e24-8692-8941f31e5a49@github.com>
Message-ID: <JsoeZ4pIsbb82caU5edkfowPdV2xEwgCE7zYLNSdeKI=.7482e027-45e6-455c-82b7-bf7ee2e2f209@github.com>

On Sun, 14 Sep 2025 14:44:02 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> This change improves the precision of the `Mod(I|L)Node::Value()` functions.
>> 
>> I reordered the structure a bit: first we handle constants, then we handle ranges. The bottom checks seem to be redundant (`Type::BOTTOM` is covered by using `isa_(int|long)()`, and the local bottom is just the full range). Since we can give reasonable bounds even if only one input has any bounds, we don't want to return early.
>> The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions.
>> 
>> ### Monotonicity
>> 
>> Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. I now return `Type(Int|Long)::ZERO` instead, which keeps the transition monotonic because zero is always contained in the result whenever we compute a range.
>> 
>> ### Testing
>> 
>> I added tests for cases around the relevant bounds. I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something).
>> 
>> Please review and let me know what you think.
>> 
>> ### Other
>> 
>> The `UMod(I|L)Node`s were adjusted to be more in line with their signed variants. This change diverges them again, but similar improvements could be made after #17508.
>> 
>> While experimenting with these changes, I stumbled upon a few things that aren't directly related to this change but might be worth looking into further:
>> - If the divisor is a constant, we directly replace the `Mod(I|L)Node` with more, but cheaper, nodes in `::Ideal()`. The combined type analysis for these nodes is less precise, which means we miss potential cases where this would help, e.g., removing range checks. Would it make sense to delay the replacement?
>> - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to have been changed intentionally in https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge whether that part is necessary, but it seems odd.
>
> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
> 
>   remove unused parameter

src/hotspot/share/opto/divnode.cpp line 1211:

> 1209:   // We always generate the dynamic check for 0.
> 1210:   // 0 MOD X is 0
> 1211:   if (t1 == TypeInteger::zero(bt)) { return t1; }

I think the culprit for [JDK-8356813](https://bugs.openjdk.org/browse/JDK-8356813) is this place. We need to check for the divisor being a constant 0 and return `Type::TOP` before this check and the check below.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2362960218
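The monotonicity concern in the description, and the reviewer's point above about checking for a constant zero divisor first, can be checked with a small model of the remainder's type. This is a hypothetical sketch, not the code in `divnode.cpp`:

- `IntType` is a closed range with a `top` flag standing in for `Type::TOP`.
- The divisor is a constant.
- `meet` widens to the smallest range covering both inputs.

Following the suggestion, a constant zero divisor yields TOP, and that check comes before the `0 MOD X is 0` shortcut. In Java, the result of `%` takes the sign of the dividend and its magnitude is below |divisor|, which gives the bounds used below.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Closed range [lo, hi]; `top` models Type::TOP (no value flows here).
struct IntType {
  bool top;
  int64_t lo, hi;
};

IntType top_type() { return IntType{true, 0, 0}; }
IntType range(int64_t lo, int64_t hi) { return IntType{false, lo, hi}; }

bool operator==(const IntType& a, const IntType& b) {
  if (a.top || b.top) return a.top == b.top;
  return a.lo == b.lo && a.hi == b.hi;
}

// Type of x % c for x in `dividend` and a constant divisor c.
IntType mod_value(IntType dividend, int32_t c) {
  if (dividend.top) return top_type();
  // A constant zero divisor always throws, so no value flows out: TOP.
  // This must come before the "0 % x == 0" shortcut.
  if (c == 0) return top_type();
  if (dividend.lo == 0 && dividend.hi == 0) return range(0, 0);
  int64_t abs_c = c < 0 ? -static_cast<int64_t>(c) : static_cast<int64_t>(c);
  int64_t max_abs = abs_c - 1;  // largest possible |x % c|
  // The result has the sign of the dividend, so 0 is always included.
  int64_t lo = dividend.lo < 0 ? std::max(dividend.lo, -max_abs) : 0;
  int64_t hi = dividend.hi > 0 ? std::min(dividend.hi, max_abs) : 0;
  return range(lo, hi);
}

// Meet on ranges: the smallest range covering both; TOP is the identity.
IntType meet(IntType a, IntType b) {
  if (a.top) return b;
  if (b.top) return a;
  return range(std::min(a.lo, b.lo), std::max(a.hi, b.hi));
}
```

Because TOP is the identity of `meet`, seeing a zero divisor first and `3` later ends at `-2..2`. Starting from a `>=0` type would instead widen to `>=-2`, matching the example in the description.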

From hgreule at openjdk.org  Fri Sep 19 14:17:26 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Fri, 19 Sep 2025 14:17:26 GMT
Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v9]
In-Reply-To: <JsoeZ4pIsbb82caU5edkfowPdV2xEwgCE7zYLNSdeKI=.7482e027-45e6-455c-82b7-bf7ee2e2f209@github.com>
References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com>
 <1ZCEMsPvSQaLGWRuNtO89LNP_XUeaz-edeIUrKwRCZY=.9dad5a02-c739-4e24-8692-8941f31e5a49@github.com>
 <JsoeZ4pIsbb82caU5edkfowPdV2xEwgCE7zYLNSdeKI=.7482e027-45e6-455c-82b7-bf7ee2e2f209@github.com>
Message-ID: <edc2trZzNHV5kGhrpFBvXLEedYXZJYVxlJIKVaUVTHA=.a07a2b7f-5b88-4108-8b9d-68813906f279@github.com>

On Fri, 19 Sep 2025 13:49:08 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   remove unused parameter
>
> src/hotspot/share/opto/divnode.cpp line 1211:
> 
>> 1209:   // We always generate the dynamic check for 0.
>> 1210:   // 0 MOD X is 0
>> 1211:   if (t1 == TypeInteger::zero(bt)) { return t1; }
> 
> I think the culprit for [JDK-8356813](https://bugs.openjdk.org/browse/JDK-8356813) is this place. We need to check for the divisor being a constant 0 and return `Type::TOP` before this check and the check below.

Yes, I already worked a bit on it, see https://github.com/SirYwell/jdk/tree/fix/mod-not-monotonic but I didn't have time to create a PR yet.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2363030436

From dlunden at openjdk.org  Fri Sep 19 16:02:35 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Fri, 19 Sep 2025 16:02:35 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v32]
In-Reply-To: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
Message-ID: <_3SEByIuKhkAQvZ9gvMOHYMH2y_Xh9F4UM1lS2ixzpw=.f572fe77-31f0-4724-9611-9f53231d6bec@github.com>

> If a method has a large number of parameters, we currently bail out from C2 compilation.
> 
> ### Changeset
> 
> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
> 
> Changes:
> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
> - Remove all `can_represent` checks and bailouts.
> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
> 
> ### Testing
> 
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it).
> 
> ![c2-regression](https:/...
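Here is a minimal sketch of the layout described above: a statically allocated base, plus a dynamic extension that is rarely used. It includes an early-exit `overlap`. `SketchRegMask`, `kBaseWords`, and its members are hypothetical. HotSpot's `RegMask` has a different interface and sizing.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical register mask: fixed inline words for the common case,
// heap-allocated extension words only when high stack slots are used.
class SketchRegMask {
  static constexpr size_t kBaseWords = 4;   // 4 * 64 = 256 bits inline
  uint64_t base_[kBaseWords] = {};          // common case: no allocation
  std::vector<uint64_t> ext_;               // rare case: extra stack slots

  // Reads a word; words past the allocated extension are implicitly zero.
  uint64_t word(size_t i) const {
    if (i < kBaseWords) return base_[i];
    size_t j = i - kBaseWords;
    return j < ext_.size() ? ext_[j] : 0;
  }

 public:
  size_t words() const { return kBaseWords + ext_.size(); }

  void insert(size_t bit) {
    size_t i = bit / 64;
    uint64_t m = uint64_t{1} << (bit % 64);
    if (i < kBaseWords) { base_[i] |= m; return; }
    size_t j = i - kBaseWords;
    if (j >= ext_.size()) ext_.resize(j + 1, 0);   // grow on demand
    ext_[j] |= m;
  }

  bool member(size_t bit) const { return (word(bit / 64) >> (bit % 64)) & 1; }

  // Early exit on the first shared bit. Only the common prefix needs
  // scanning because the shorter mask is zero beyond its end.
  bool overlap(const SketchRegMask& o) const {
    size_t n = std::min(words(), o.words());
    for (size_t i = 0; i < n; i++) {
      if (word(i) & o.word(i)) return true;
    }
    return false;
  }
};
```

The common case never touches `ext_`, so ordinary methods keep the cost of a fixed-size mask. Only masks that actually name high stack slots pay for the allocation.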

Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:

  Increase timeout for TestMethodArguments.java

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20404/files
  - new: https://git.openjdk.org/jdk/pull/20404/files/84efc2db..1dd5084f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=31
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=30-31

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20404.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404

PR: https://git.openjdk.org/jdk/pull/20404

From sviswanathan at openjdk.org  Fri Sep 19 16:23:56 2025
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 19 Sep 2025 16:23:56 GMT
Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize
 NDD instructions [v3]
In-Reply-To: <52RpYM-r-1EZcYjbaNllAEPHQP1nYhQcs-GfydIzP08=.0bfb8185-78a7-4dfb-9700-f4a36a1d0e99@github.com>
References: <PrOpTvYJIbrN8uxHoIR7gAOLZuiLCbNDELDI_Rs5mdk=.a536e484-0094-4815-bd07-f1f7cf339d53@github.com>
 <52RpYM-r-1EZcYjbaNllAEPHQP1nYhQcs-GfydIzP08=.0bfb8185-78a7-4dfb-9700-f4a36a1d0e99@github.com>
Message-ID: <x6jAKIYb2ljsZI05JUeCzHER3ADYDoyM5OEd3WnWHcA=.da741556-3b49-485e-af7e-866bee4ce1a1@github.com>

On Fri, 19 Sep 2025 13:17:04 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges.
>> 
>> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction.
>> 
>> All NDD instructions mandate the extended EVEX encoding with a bulky 4-byte prefix. [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first-color selection register allocation policy, demotions are rare. This patch biases the allocation of the NDD definition to the first source operand, or to the second source operand for the commutative class of operations.
>>
>> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union-find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size.
>> 
>> The patch shows around 5-20% improvement in code size by facilitating NDD demotion.
>> 
>> For the following micro, the method JIT code size was reduced from 136 to 120 bytes, which is around a 12% reduction in code size footprint.
>>  
>> **Micro:-**
>> <img width="1344" height="315" alt="image" src="https://github.com/user-attachments/assets/9cbe9da8-d6af-4b1c-bb55-3e5d86eb2cf9" />
>> 
>> 
>> **Baseline :-**
>> <img width="1013" height="163" alt="image" src="https://github.com/user-attachments/assets/ff5d50c6-fdfa-40e8-b93d-5f117d5a1ac6" />
>> 
>> **With opt:-**
>> <img width="940" height="160" alt="image" src="https://github.com/user-attachments/assets/bff425b0-f7bf-4ffd-a43d-18bdeb36b000" />
>> 
>> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html).
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix jtreg, one less spill

Looks good to me.

-------------

Marked as reviewed by sviswanathan (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26283#pullrequestreview-3245738167

From iveresov at openjdk.org  Fri Sep 19 16:32:40 2025
From: iveresov at openjdk.org (Igor Veresov)
Date: Fri, 19 Sep 2025 16:32:40 GMT
Subject: RFR: 8368071: Compilation throughput regressed 2X-8X after
 JDK-8355003
In-Reply-To: <WYxcB5a4pW-Lagi-N4oP7VaB7INF2_i6nCJFPyJlvtw=.68ea2dec-842d-4f8e-a1f3-8b720899d462@github.com>
References: <WYxcB5a4pW-Lagi-N4oP7VaB7INF2_i6nCJFPyJlvtw=.68ea2dec-842d-4f8e-a1f3-8b720899d462@github.com>
Message-ID: <rNKSb3_WKY-mazwdVEoswucuK2EPuTdid92R0TXyNpk=.972ef990-7cb7-4234-ada9-56c2d40f8579@github.com>

On Fri, 19 Sep 2025 07:52:16 GMT, Man Cao <manc at openjdk.org> wrote:

> Hi all,
> 
> Could anyone review this change that fixes a severe startup performance regression for `-XX:+TieredCompilation`?  See https://bugs.openjdk.org/browse/JDK-8368071 for more details.
> 
> -Man

Good catch! I need to refactor some of it in the future but for now it's a good conservative fix. Let me run some internal testing on it first and I'll get back to you.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27383#issuecomment-3312866388

From shade at openjdk.org  Fri Sep 19 16:33:46 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 19 Sep 2025 16:33:46 GMT
Subject: RFR: 8357258: x86: Improve receiver type profiling reliability
 [v2]
In-Reply-To: <VIQ-aNGEAnxFxJBZsGBJsnbK0ZlPWd-BAhpirU1HjMM=.dc6c04f9-53f2-4eec-bc78-8d61fdcddc43@github.com>
References: <X9RJoDWqk67MlK7P4ACCbzROxLkfEqvU0zDid3zymnc=.7c0086a3-f514-4b0d-8bc0-592bf4a05bba@github.com>
 <xpPUX9kLrXOkdpnXBq9YxRg7Xqmqtxanr9exEfPPn-I=.39db801b-1227-4d8e-8103-1317fb914731@github.com>
 <VIQ-aNGEAnxFxJBZsGBJsnbK0ZlPWd-BAhpirU1HjMM=.dc6c04f9-53f2-4eec-bc78-8d61fdcddc43@github.com>
Message-ID: <_P0NOFGGa3uTaZJ17X8jYvgtbbOU90SD6LJ-mM4-P-U=.5910d123-6993-49b2-8d65-4776f0333d4c@github.com>

On Wed, 17 Sep 2025 23:24:16 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>> 
>>  - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls
>>  - Drop atomic counters
>>  - Initial version
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4853:
> 
>> 4851:       } else {
>> 4852:         // Nothing to do, just go with defaults.
>> 4853:         assert_different_registers(rax, mdp, recv, offset);
> 
> Can't we do all register shuffling and push/pop outside the loop?

I remember having an initial version that did it, but the code ended up even hairier and less efficient, because: a) there are different exits from the loop; b) in the majority of cases we do not need to do any shuffling (e.g. none of the registers in question are `rax`); c) it also caused some branches to become un-shortened. For this profiling stencil, every instruction counts :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2363548849

From shade at openjdk.org  Fri Sep 19 16:51:14 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 19 Sep 2025 16:51:14 GMT
Subject: RFR: 8357258: x86: Improve receiver type profiling reliability
 [v2]
In-Reply-To: <oNoxJOpfphpVIrQxryFIDOeRjhdBGb8GGpskNXExN1k=.69cd182f-c989-4431-a902-9c89ae136dac@github.com>
References: <X9RJoDWqk67MlK7P4ACCbzROxLkfEqvU0zDid3zymnc=.7c0086a3-f514-4b0d-8bc0-592bf4a05bba@github.com>
 <xpPUX9kLrXOkdpnXBq9YxRg7Xqmqtxanr9exEfPPn-I=.39db801b-1227-4d8e-8103-1317fb914731@github.com>
 <oNoxJOpfphpVIrQxryFIDOeRjhdBGb8GGpskNXExN1k=.69cd182f-c989-4431-a902-9c89ae136dac@github.com>
Message-ID: <UPCJTknR1OW8hhPJ7BeDMft9L4MPHFhV3pwlcOgclPs=.54905ad7-4ef8-4065-ad35-f0b373fab8e4@github.com>

On Wed, 17 Sep 2025 23:38:39 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>> 
>>  - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls
>>  - Drop atomic counters
>>  - Initial version
>
> src/hotspot/cpu/x86/interp_masm_x86.cpp line 1342:
> 
>> 1340: 
>> 1341:     // Record the receiver type.
>> 1342:     type_profile(receiver, mdp, 0);
> 
> Why is 0 the correct offset?  The C1 helper uses md->byte_offset_of_slot().

Interpreter and C1 do profiling data offsets a bit differently.

Interpreter tracks MDP as BCI changes.  It has to, because it does not really know statically where it is. Take a look at `InterpreterMacroAssembler::update_mdp_*` family of methods, and one of its uses:


void InterpreterMacroAssembler::profile_taken_branch(Register mdp) {
  if (ProfileInterpreter) {
    Label profile_continue;

    // If no method data exists, go to profile_continue.
    test_method_data_pointer(mdp, profile_continue);

    // We are taking a branch.  Increment the taken count.
    increment_mdp_data_at(mdp, in_bytes(JumpData::taken_offset()));

    // The method data pointer needs to be updated to reflect the new target.
    update_mdp_by_offset(mdp, in_bytes(JumpData::displacement_offset()));
    bind(profile_continue);
  }
}


It is fairly confusing in interpreter code that `mdp` does not point to the `MethodData*` head, but is actually an _interior_ pointer somewhere in the method data. The profiling code is woven in such a way that, at any given point, `mdp` points at the area that belongs to the current BCI.

Compilers are able to compute the mapping from BCI to MDP to data slot directly, since they have a good view on the whole method and can ask VM questions about the slot addresses. C1 commonly does this:


  ciProfileData* data = md->bci_to_data(bci);
  md->byte_offset_of_slot(data, <interior-offset>);


Anyhow, I did most of the interface changes mechanically, so the `0` slot offset naturally appeared in these places through refactoring, which gives me additional confidence in its correctness.
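The following toy model contrasts the two addressing schemes described above. It assumes that a flat byte buffer stands in for the MethodData and that a hypothetical table maps each BCI to its data offset. None of these names are HotSpot APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for MethodData: a flat byte buffer plus a table
// mapping each profiled BCI to the offset of its profile data in the buffer.
struct ToyMethodData {
  std::vector<uint8_t> bytes;
  std::map<int, size_t> bci_to_offset;
};

// Compiler style (C1-like): the BCI is known statically, so the slot address
// is base + offset of this BCI's data + offset of the slot inside that data.
uint8_t* compiler_slot(ToyMethodData& md, int bci, size_t slot_offset) {
  return md.bytes.data() + md.bci_to_offset.at(bci) + slot_offset;
}

// Interpreter style: mdp is an interior pointer that has already been moved
// to the current BCI's data as execution proceeds, so the slot offset is
// applied relative to mdp, not to the start of the buffer.
uint8_t* interpreter_slot(uint8_t* mdp, size_t slot_offset) {
  return mdp + slot_offset;
}
```

Both functions reach the same byte. A `0` offset relative to an already-advanced `mdp` can therefore correspond to a nonzero offset computed from the MethodData base in a compiler.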

> src/hotspot/cpu/x86/interp_masm_x86.cpp line 1553:
> 
>> 1551: 
>> 1552:       // Record the object type.
>> 1553:       record_klass_in_profile(klass, mdp, reg2, false);
> 
> Same question as above about the 0 offset.  Is this because `mdp` has already been adjusted?

Same answer as above.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2363591249
PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2363591882

From chagedorn at openjdk.org  Fri Sep 19 19:51:15 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 19 Sep 2025 19:51:15 GMT
Subject: RFR: 8367613: Test
 compiler/runtime/TestDontCompileHugeMethods.java failed [v2]
In-Reply-To: <5eWiPUhybQOdBZAfm8LnEGLQ8ZwXHcqatCQEf8PVlgo=.ffcd2f7e-f734-49de-a2e4-1099bfb544f5@github.com>
References: <Ouu1htv7wGD_GqsyJhwk1-lH-0ix8UnduNiEBIhuFg0=.8f8393db-6357-4c7b-9a5b-620b3834e9b1@github.com>
 <5eWiPUhybQOdBZAfm8LnEGLQ8ZwXHcqatCQEf8PVlgo=.ffcd2f7e-f734-49de-a2e4-1099bfb544f5@github.com>
Message-ID: <KnHt_31cnHQyU5aeujTzilbpsyBmBzFpKx9frCdol_s=.b577a891-7059-4397-baa5-7740d01276c0@github.com>

On Tue, 16 Sep 2025 21:59:10 GMT, Man Cao <manc at openjdk.org> wrote:

>> Hi,
>> 
>> Could anyone approve this change that excludes this test when running with `-Xcomp`? This avoids the test failure reported in [JDK-8367613](https://bugs.openjdk.org/browse/JDK-8367613).
>>
>> For reasons I don't yet understand, the `HugeSwitch::shortMethod` method is not compiled under `-Xcomp -XX:TieredStopAtLevel=1`. The method gets compiled with either `-Xcomp` or `-XX:TieredStopAtLevel=1`, but not both. I would appreciate it if anyone could provide insights on possible reasons.
>
> Man Cao has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Switch to disable inlining for shortMethod

Looks good, thanks!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27306#pullrequestreview-3246876138

From manc at openjdk.org  Fri Sep 19 19:56:55 2025
From: manc at openjdk.org (Man Cao)
Date: Fri, 19 Sep 2025 19:56:55 GMT
Subject: Integrated: 8367613: Test
 compiler/runtime/TestDontCompileHugeMethods.java failed
In-Reply-To: <Ouu1htv7wGD_GqsyJhwk1-lH-0ix8UnduNiEBIhuFg0=.8f8393db-6357-4c7b-9a5b-620b3834e9b1@github.com>
References: <Ouu1htv7wGD_GqsyJhwk1-lH-0ix8UnduNiEBIhuFg0=.8f8393db-6357-4c7b-9a5b-620b3834e9b1@github.com>
Message-ID: <gwPhoPwqZ6oXD9vrhKWPhdQUCIm4pmBlgWusDcQIPAU=.d679953e-ddf5-4ab1-94cc-ff9aa5318a5e@github.com>

On Tue, 16 Sep 2025 06:48:23 GMT, Man Cao <manc at openjdk.org> wrote:

> Hi,
> 
> Could anyone approve this change that excludes this test when running with `-Xcomp`? This avoids the test failure reported in [JDK-8367613](https://bugs.openjdk.org/browse/JDK-8367613).
>
> For reasons I don't yet understand, the `HugeSwitch::shortMethod` method is not compiled under `-Xcomp -XX:TieredStopAtLevel=1`. The method gets compiled with either `-Xcomp` or `-XX:TieredStopAtLevel=1`, but not both. I would appreciate it if anyone could provide insights on possible reasons.

This pull request has now been integrated.

Changeset: 25a4e263
Author:    Man Cao <manc at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/25a4e26320340cdda082cd45639e73b137ce45a2
Stats:     5 lines in 1 file changed: 4 ins; 0 del; 1 mod

8367613: Test compiler/runtime/TestDontCompileHugeMethods.java failed

Reviewed-by: chagedorn, dfenacci

-------------

PR: https://git.openjdk.org/jdk/pull/27306

From jbhateja at openjdk.org  Fri Sep 19 20:44:54 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 19 Sep 2025 20:44:54 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v11]
In-Reply-To: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
Message-ID: <gIml69-bcFkT1V5ug3zT4fKrJBIv1lE8HDSZdas7Qgo=.a4e1f67b-086a-4f22-96aa-d5f4bd5a8a9d@github.com>

> This patch optimizes PopCount value transforms using KnownBits information.
> Following are the results of the micro-benchmark included with the patch
> 
> 
> 
> System: 13th Gen Intel(R) Core(TM) i3-1315U
> 
> Baseline:
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s
> 
> With opt:
> Benchmark                                      Mode  Cnt       Score   Error  Units
> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s
> 
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Update countbitsnode.cpp

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27075/files
  - new: https://git.openjdk.org/jdk/pull/27075/files/92cf2fad..e206ccc3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27075&range=09-10

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/27075.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27075/head:pull/27075

PR: https://git.openjdk.org/jdk/pull/27075

From jbhateja at openjdk.org  Fri Sep 19 20:52:32 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 19 Sep 2025 20:52:32 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v10]
In-Reply-To: <vZycyM-_4jUo27au1btICpgKr99tMu5qLEqo_aoL-fw=.efe8d355-c4b3-46ae-98eb-4c307839c717@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <HV-W-zZ6kxBI2gg4DnuQyFLxOUCNWcKzwM4GSdFyEPo=.b553be10-cd24-4957-90a3-6ca9970ac2f2@github.com>
 <vZycyM-_4jUo27au1btICpgKr99tMu5qLEqo_aoL-fw=.efe8d355-c4b3-46ae-98eb-4c307839c717@github.com>
Message-ID: <3w_iOULgRgaA1kCHlXrprLPdQfMYcvo1kXqvE7VaaQk=.ab753d0d-1184-4865-bace-564a4938d6d5@github.com>

On Fri, 19 Sep 2025 11:16:26 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update TestPopCountValueTransforms.java
>
> src/hotspot/share/opto/countbitsnode.cpp line 123:
> 
>> 121: // we have at least and at most.
>> 122: // From the definition of KnownBits, we know:
>> 123: //   zeros: Indicates which bits must be 0: ones[i] =1 -> t[i]=0
> 
> I'm a bit confused by this, is ones[i] mixed up with zeros[i]? I.e., t[i]=0 if zeros[i]=1

@SirYwell, comment updated.

Links to formal z3 proofs for this:-

https://github.com/openjdk/jdk/pull/25928#discussion_r2256750507

https://bugs.openjdk.org/browse/JDK-8365205?focusedId=14807707&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14807707:~:text=C%3A%5CGithub%5Csoftwares%5Cz3%5Cz3%2D4.15.2%2Dx64%2Dwin%5Cbin%5Cpython%3Epython3%20known_bits_popcount.py%0AMain%20constraints%20satisfiable.%0AConstraints%20are%20valid%20(negation%20unsatisfiable).
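The bound that the corrected comment describes can be sketched as follows. The sketch uses the usual KnownBits convention: `zeros[i] = 1` means bit i is known to be 0, and `ones[i] = 1` means bit i is known to be 1. Under that convention, popcount(t) lies in [popcount(ones), W - popcount(zeros)]. `KnownBits64` is a hypothetical stand-in, not C2's type. The exhaustive 8-bit check is only a sanity test, not a proof like the z3 ones linked above.

```cpp
#include <bitset>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for known-bits information about a value t.
struct KnownBits64 {
  uint64_t zeros;  // bit set => that bit of t is known to be 0
  uint64_t ones;   // bit set => that bit of t is known to be 1
};

inline int pop(uint64_t x) { return static_cast<int>(std::bitset<64>(x).count()); }

// Bounds on popcount(t) for any `width`-bit t consistent with kb.
std::pair<int, int> popcount_bounds(KnownBits64 kb, int width) {
  int lo = pop(kb.ones);           // every known-one bit is always counted
  int hi = width - pop(kb.zeros);  // known-zero bits can never be counted
  return {lo, hi};
}

// Brute force over all 8-bit known-bits pairs and values: every value
// consistent with the knowledge has a popcount inside the bounds.
bool check_all_8bit() {
  for (uint64_t z = 0; z < 256; z++) {
    for (uint64_t o = 0; o < 256; o++) {
      if (z & o) continue;  // contradictory knowledge, skip
      auto [lo, hi] = popcount_bounds({z, o}, 8);
      for (uint64_t v = 0; v < 256; v++) {
        bool consistent = ((v & z) == 0) && ((v & o) == o);
        if (consistent && (pop(v) < lo || pop(v) > hi)) return false;
      }
    }
  }
  return true;
}
```

The bounds are also tight. The value `ones` itself reaches the lower bound, and `~zeros` (masked to the width) reaches the upper bound.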

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2364498821

From snatarajan at openjdk.org  Fri Sep 19 20:58:21 2025
From: snatarajan at openjdk.org (Saranya Natarajan)
Date: Fri, 19 Sep 2025 20:58:21 GMT
Subject: RFR: 8349835: C2: simplify IGV property printing
Message-ID: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com>

The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708).

### Fix 
Implemented the suggested refactoring. 

### Testing 
Github Actions, Tier 1-3
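A common shape for this kind of refactoring is table-driven printing. Each property name and its predicate are listed once, and a loop walks the table, instead of repeating one `if`/print pair per property. The sketch below is hypothetical (`ToyNode`, `PropertyDesc`, `print_properties`, and the property names). It only approximates IGV-style `<p name=...>` output. The actual change follows the suggestion linked above.

```cpp
#include <sstream>
#include <string>

// Hypothetical node with a few boolean properties.
struct ToyNode {
  bool is_block_proj;
  bool is_dontcare;
  bool is_macro;
};

// One table entry per property: its printed name and how to test it.
struct PropertyDesc {
  const char* name;
  bool (*test)(const ToyNode&);
};

const PropertyDesc kProperties[] = {
  {"is_block_proj", [](const ToyNode& n) { return n.is_block_proj; }},
  {"is_dontcare",   [](const ToyNode& n) { return n.is_dontcare; }},
  {"is_macro",      [](const ToyNode& n) { return n.is_macro; }},
};

// Emits only the properties that hold, in table order.
std::string print_properties(const ToyNode& n) {
  std::ostringstream out;
  for (const PropertyDesc& p : kProperties) {
    if (p.test(n)) out << "<p name='" << p.name << "'>true</p>";
  }
  return out.str();
}
```

To add a property, you add one table row, and the printing loop does not change.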

-------------

Commit messages:
 - changing int to bool in a struct
 - fix to failing test
 - initial fix

Changes: https://git.openjdk.org/jdk/pull/26902/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26902&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8349835
  Stats: 117 lines in 2 files changed: 20 ins; 54 del; 43 mod
  Patch: https://git.openjdk.org/jdk/pull/26902.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26902/head:pull/26902

PR: https://git.openjdk.org/jdk/pull/26902

From rriggs at openjdk.org  Fri Sep 19 21:58:22 2025
From: rriggs at openjdk.org (Roger Riggs)
Date: Fri, 19 Sep 2025 21:58:22 GMT
Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v8]
In-Reply-To: <lpdw4jDspZqPbC5tKvuFa48-PVaAlapQ82QoKEWX5uE=.a3dfdaed-54c5-4960-9300-5c8c043e6939@github.com>
References: <dm6o-XRNWecHjZeenCSRhAOEYI7L_mOOqfguvjlk_Bc=.cdb4e2e5-3500-48d5-8c05-749ff01f06f1@github.com>
 <lpdw4jDspZqPbC5tKvuFa48-PVaAlapQ82QoKEWX5uE=.a3dfdaed-54c5-4960-9300-5c8c043e6939@github.com>
Message-ID: <yCYj5SJb98LzA3eczgp7CEpEph1Y8QN0OxHvzG6owXk=.5c8f48df-978b-441e-93d9-fad4318e2538@github.com>

On Fri, 19 Sep 2025 13:10:33 GMT, Chen Liang <liach at openjdk.org> wrote:

>> In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving whatever can be done in Java to Java; thus I prepared this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list".
>
> Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision:
> 
>  - Separate design doc
>  - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate
>  - More review updates
>  - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate
>  - Move intrinsic to be a subsection; just one most common function of the annotation
>  - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate
>  - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate
>  - Update src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java
>    
>    Co-authored-by: Raffaello Giulietti <raffaello.giulietti at oracle.com>
>  - Shorter first sentence
>  - Updates, thanks to John
>  - ... and 2 more: https://git.openjdk.org/jdk/compare/380c643a...e4afa49d

This seems more like guidance for people writing intrinsics and should be in the HotSpot part of the src tree. The annotation can link there.

src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 42:

> 40: /// what intrinsics are and cautions for working with annotated methods.
> 41: ///
> 42: /// @since 16

Let's stick to the javadoc `/** ... */` markup.

src/java.base/share/classes/jdk/internal/vm/annotation/intrinsics.md line 1:

> 1: <!--

Please remove the <cr> characters. LF line endings only please.

-------------

PR Review: https://git.openjdk.org/jdk/pull/24777#pullrequestreview-3247243759
PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2364599695
PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2364600305

From missa at openjdk.org  Sat Sep 20 00:04:43 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Sat, 20 Sep 2025 00:04:43 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v16]
In-Reply-To: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
Message-ID: <GFVrKRIgcLl23x-KBrS8RiTNuR2VC9aWqblxrhzrIbw=.a46690ad-e2d3-44db-81df-1c98f011e6a8@github.com>

> Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
> 
> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) instruction is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped to values (NaN -> 0, -Infinity, <= Integer.MIN_VALUE -> Integer.MIN_VALUE, +Infinity, >= Integer.MAX_VALUE -> Integer.MAX_VALUE) in the integer range with the help of a few temporary registers to store intermediate results.
> 
> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
> 
> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
> 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java`
> 10. `jtreg:test/hotspot/jtreg...
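The following scalar model covers the double-to-int case only; other source and destination widths differ. It shows why the legacy path needs extra handling. Legacy CVTTSD2SI returns the "integer indefinite" value 0x80000000 for NaN and out-of-range inputs, so that one result is ambiguous and must be fixed up. The saturating mapping listed above (NaN -> 0, clamp to Integer.MIN_VALUE / Integer.MAX_VALUE) is also Java's `d2i` semantics. All functions here are hypothetical models, not HotSpot assembler routines.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

constexpr int32_t kMin = std::numeric_limits<int32_t>::min();
constexpr int32_t kMax = std::numeric_limits<int32_t>::max();

// Model of legacy CVTTSD2SI: truncate toward zero; NaN or inputs whose
// truncation does not fit in int32 yield the integer indefinite 0x80000000.
int32_t cvttsd2si_model(double d) {
  if (std::isnan(d) || d >= 2147483648.0 || d <= -2147483649.0) return kMin;
  return static_cast<int32_t>(d);
}

// Fix-up needed without a saturating instruction: only kMin is ambiguous.
int32_t d2i_with_fixup(double d) {
  int32_t r = cvttsd2si_model(d);
  if (r != kMin) return r;      // fast path: definitely a normal result
  if (std::isnan(d)) return 0;  // NaN -> 0
  return d > 0 ? kMax : kMin;   // +overflow -> MAX; -overflow or exact MIN -> MIN
}

// Saturating semantics in one step (what the *S instructions are described
// as computing directly).
int32_t d2i_saturating(double d) {
  if (std::isnan(d)) return 0;
  if (d >= static_cast<double>(kMax)) return kMax;
  if (d <= static_cast<double>(kMin)) return kMin;
  return static_cast<int32_t>(d);  // in range: plain truncation
}
```

The two functions agree on every input. The saturating form simply removes the compare-and-branch fix-up, which is the extra handling the description refers to.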

Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:

  Use compiler generator instead of standard Java streams

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26919/files
  - new: https://git.openjdk.org/jdk/pull/26919/files/a7940ee0..0acc719c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=15
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=14-15

  Stats: 48 lines in 1 file changed: 15 ins; 6 del; 27 mod
  Patch: https://git.openjdk.org/jdk/pull/26919.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26919/head:pull/26919

PR: https://git.openjdk.org/jdk/pull/26919

From qamai at openjdk.org  Sat Sep 20 06:45:16 2025
From: qamai at openjdk.org (Quan Anh Mai)
Date: Sat, 20 Sep 2025 06:45:16 GMT
Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize
 NDD instructions [v3]
In-Reply-To: <52RpYM-r-1EZcYjbaNllAEPHQP1nYhQcs-GfydIzP08=.0bfb8185-78a7-4dfb-9700-f4a36a1d0e99@github.com>
References: <PrOpTvYJIbrN8uxHoIR7gAOLZuiLCbNDELDI_Rs5mdk=.a536e484-0094-4815-bd07-f1f7cf339d53@github.com>
 <52RpYM-r-1EZcYjbaNllAEPHQP1nYhQcs-GfydIzP08=.0bfb8185-78a7-4dfb-9700-f4a36a1d0e99@github.com>
Message-ID: <PM26qVPprFsdzB38XHrGrggvkqTbW4J0H2FgNqQC3Ns=.2d4cd197-a6e3-4eb9-bae4-fd662101db02@github.com>

On Fri, 19 Sep 2025 13:17:04 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges.
>> 
>> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction.
>> 
>> All NDD instructions mandate the extended EVEX encoding with a bulky 4-byte prefix. [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first-color selection register allocation policy, demotions are rare. This patch biases the allocation of the NDD definition to the first source operand, or to the second source operand for the commutative class of operations.
>>
>> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union-find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size.
>> 
>> The patch shows around 5-20% improvement in code size by facilitating NDD demotion.
>> 
>> For the following micro, the method JIT code size was reduced from 136 to 120 bytes, which is around a 12% reduction in code size footprint.
>>  
>> **Micro:-**
>> <img width="900" height="300" alt="image" src="https://github.com/user-attachments/assets/9cbe9da8-d6af-4b1c-bb55-3e5d86eb2cf9" />
>> 
>> 
>> **Baseline :-**
>> <img width="900" height="300" alt="image" src="https://github.com/user-attachments/assets/ff5d50c6-fdfa-40e8-b93d-5f117d5a1ac6" />
>> 
>> **With opt:-**
>> <img width="900" height="300" alt="image" src="https://github.com/user-attachments/assets/bff425b0-f7bf-4ffd-a43d-18bdeb36b000" />
>> 
>> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html).
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix jtreg, one less spill

I can't approve this approach. I think blindly biasing the color of an operation to that of its input is too optimistic and will lead to numerous false-positive cases. It is better to have a more fine-grained selection using the script in the ad file. For example:

    instruct addI_rReg_ndd(rRegI dst, rRegI src1, rRegI src2, rFlagsReg cr)
    %{
      predicate(UseAPX);
      match(Set dst (AddI src1 src2));
      effect(KILL cr);
      bias(src1);
      flag(PD::Flag_sets_overflow_flag, PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_carry_flag, PD::Flag_sets_parity_flag);

      format %{ "eaddl    $dst, $src1, $src2\t# int ndd" %}
      ins_encode %{
        __ eaddl($dst$$Register, $src1$$Register, $src2$$Register, false);
      %}
      ins_pipe(ialu_reg_reg);
    %}

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3314641414

From iveresov at openjdk.org  Sat Sep 20 19:26:20 2025
From: iveresov at openjdk.org (Igor Veresov)
Date: Sat, 20 Sep 2025 19:26:20 GMT
Subject: RFR: 8368071: Compilation throughput regressed 2X-8X after
 JDK-8355003
In-Reply-To: <WYxcB5a4pW-Lagi-N4oP7VaB7INF2_i6nCJFPyJlvtw=.68ea2dec-842d-4f8e-a1f3-8b720899d462@github.com>
References: <WYxcB5a4pW-Lagi-N4oP7VaB7INF2_i6nCJFPyJlvtw=.68ea2dec-842d-4f8e-a1f3-8b720899d462@github.com>
Message-ID: <HGM020Js7lTj5l5z7LFyn4pK6kX0CkF8DC9sLIWv4Wo=.5ca06fd5-1c9e-491d-a308-a8a22fb96973@github.com>

On Fri, 19 Sep 2025 07:52:16 GMT, Man Cao <manc at openjdk.org> wrote:

> Hi all,
> 
> Could anyone review this change that fixes a severe startup performance regression for `-XX:+TieredCompilation`?  See https://bugs.openjdk.org/browse/JDK-8368071 for more details.
> 
> -Man

Testing looks good.

-------------

Marked as reviewed by iveresov (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27383#pullrequestreview-3249207121

From jbhateja at openjdk.org  Sat Sep 20 19:50:16 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sat, 20 Sep 2025 19:50:16 GMT
Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize
 NDD instructions [v3]
In-Reply-To: <PM26qVPprFsdzB38XHrGrggvkqTbW4J0H2FgNqQC3Ns=.2d4cd197-a6e3-4eb9-bae4-fd662101db02@github.com>
References: <PrOpTvYJIbrN8uxHoIR7gAOLZuiLCbNDELDI_Rs5mdk=.a536e484-0094-4815-bd07-f1f7cf339d53@github.com>
 <52RpYM-r-1EZcYjbaNllAEPHQP1nYhQcs-GfydIzP08=.0bfb8185-78a7-4dfb-9700-f4a36a1d0e99@github.com>
 <PM26qVPprFsdzB38XHrGrggvkqTbW4J0H2FgNqQC3Ns=.2d4cd197-a6e3-4eb9-bae4-fd662101db02@github.com>
Message-ID: <7rNGFuFTMSG8xdoIFjIncfu0Ybq2nocT-mXzO6r4wyo=.7df27c0b-b831-4506-a8c7-511393a3ddf3@github.com>

On Sat, 20 Sep 2025 06:42:29 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

> I can't approve this approach. I think blindly biasing the color of an operation to that of its input is too optimistic and will lead to numerous false-positive cases. It is better to have a more fine-grained selection using the script in the ad file. For example:
> 
> ```
> instruct addI_rReg_ndd(rRegI dst, rRegI src1, rRegI src2, rFlagsReg cr)
> %{
>   predicate(UseAPX);
>   match(Set dst (AddI src1 src2));
>   effect(KILL cr);
>   bias(src1);
>   flag(PD::Flag_sets_overflow_flag, PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_carry_flag, PD::Flag_sets_parity_flag);
> 
>   format %{ "eaddl    $dst, $src1, $src2\t# int ndd" %}
>   ins_encode %{
>     __ eaddl($dst$$Register, $src1$$Register, $src2$$Register, false);
>   %}
>   ins_pipe(ialu_reg_reg);
> %}
> ```

The solution takes live range overlaps into consideration: biasing is only enforced if the source live range ends at its user instruction, and while picking the color we don't follow first-fit color selection but give preference to the bias. Second-operand bias is only enabled for commutative operations. Biasing is simply an allocation-time hint used by the allocator during color selection; it does not modify the interference graph of the LRG. Our assembler now supports EEVEX (extended EVEX) to REX/REX2 demotion if dst matches either the first source operand or, for commutative operations, the second source operand. So we intend to bias not just towards the first source but towards the second as well. Also, we don't bias the destination if it has a bounded live range.
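
A minimal Java sketch of the color-selection preference described above, assuming a set of free colors already computed from the interference graph and a precomputed "live range ends here" flag for each source. `BiasedColorSketch`, `pickColor` and its parameters are hypothetical and do not correspond to C2's actual register allocator code; the bounded-live-range exclusion is not modeled.

```java
import java.util.BitSet;

// Hypothetical sketch of biased color selection for an NDD definition.
// "free" holds the colors (registers) not taken by any interfering live range.
final class BiasedColorSketch {
    static int pickColor(BitSet free,
                         int src1Color, boolean src1EndsHere,
                         int src2Color, boolean src2EndsHere,
                         boolean commutative) {
        // Prefer the first source's color when its live range ends at this
        // instruction: dst == src1 lets the assembler demote EVEX to REX/REX2.
        if (src1EndsHere && src1Color >= 0 && free.get(src1Color)) {
            return src1Color;
        }
        // For commutative operations, dst == src2 is equally demotable.
        if (commutative && src2EndsHere && src2Color >= 0 && free.get(src2Color)) {
            return src2Color;
        }
        // Otherwise fall back to first-fit: the lowest free color, or -1 if none.
        return free.nextSetBit(0);
    }
}
```

In this sketch the bias is only a preference inside color selection; the interference information in `free` is never modified, which is the distinction from coalescing drawn in the PR description.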

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3315199238

From duke at openjdk.org  Sun Sep 21 01:57:31 2025
From: duke at openjdk.org (duke)
Date: Sun, 21 Sep 2025 01:57:31 GMT
Subject: Withdrawn: 8356044: Use Double::hashCode and Long::hashCode in
 java.vm.ci.meta
In-Reply-To: <8SlBOjUBPGyZbR9GxEBZlLzOiNPbdws1GTZ4gGY8v9c=.fdefa26b-52ee-48f9-b814-3981b79f6012@github.com>
References: <8SlBOjUBPGyZbR9GxEBZlLzOiNPbdws1GTZ4gGY8v9c=.fdefa26b-52ee-48f9-b814-3981b79f6012@github.com>
Message-ID: <Er0qBFwZQphF3LnW3EsvlEE15-tKdz1GM5gDo3OYC9Y=.7f4bd5e1-67a6-4020-823b-3bd2113d0e39@github.com>

On Thu, 1 May 2025 16:05:15 GMT, Shaojin Wen <swen at openjdk.org> wrote:

> Similar to #24959, #24971 and #24987, AbstractProfiledItem/PrimitiveConstant in java.vm.ci.meta can also be simplified.
> 
> Replace manual bitwise operations in hashCode implementations of java.vm.ci.meta.AbstractProfiledItem/java.vm.ci.meta.PrimitiveConstant with Long::hashCode/Double::hashCode.
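
For reference, the JDK specifies `Long.hashCode(long)` as `(int)(value ^ (value >>> 32))` and `Double.hashCode(double)` as `Long.hashCode(Double.doubleToLongBits(value))`, so such a replacement is behavior-preserving provided the existing code uses that same folding idiom. The sketch below assumes it does; the actual method bodies in `AbstractProfiledItem`/`PrimitiveConstant` are not shown here, and `HashCodeSketch` is a hypothetical class.

```java
// Hypothetical stand-ins for hand-written hashCode folding of long/double fields.
final class HashCodeSketch {
    // Manual folding of a long: XOR the high 32 bits into the low 32 bits.
    static int manualLongHash(long v) {
        return (int) (v ^ (v >>> 32));
    }

    // Manual folding of a double via its canonical bit pattern (all NaNs map to one value).
    static int manualDoubleHash(double d) {
        long bits = Double.doubleToLongBits(d);
        return (int) (bits ^ (bits >>> 32));
    }

    // The library methods the PR description proposes using instead.
    static int simplifiedLongHash(long v) {
        return Long.hashCode(v);
    }

    static int simplifiedDoubleHash(double d) {
        return Double.hashCode(d);
    }
}
```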

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/24988

From hgreule at openjdk.org  Sun Sep 21 06:19:56 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Sun, 21 Sep 2025 06:19:56 GMT
Subject: RFR: 8367967: C2: "fatal error: Not monotonic" with Mod nodes
Message-ID: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>

Generally, we shouldn't return a wider type (ZERO) if a later case would return a narrower type (TOP) for the same input types. If the inputs are widened so that the first case no longer matches but the later one still does, the result is not monotonic with respect to the previous result.
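
A minimal Java sketch of the ordering problem, assuming a heavily simplified lattice in which TOP is the empty type and every other type is a closed int range. The ZERO-producing case is assumed here to be "dividend is the constant 0", and `ModTypeSketch` is hypothetical rather than C2's `ModINode::Value`. Widening the dividend from `0` to `[0, 1]` while the divisor stays `0` moves the buggy result from ZERO to TOP, i.e. it gets narrower; testing the division-by-zero case first keeps both results at TOP.

```java
// Hypothetical, heavily simplified model of a C2-style int type lattice:
// TOP is the empty type (no possible value), otherwise a closed range [lo, hi].
final class ModTypeSketch {
    static final class IntType {
        final boolean top;
        final int lo, hi;

        private IntType(boolean top, int lo, int hi) {
            this.top = top; this.lo = lo; this.hi = hi;
        }

        static final IntType TOP = new IntType(true, 0, 0);
        static final IntType INT = new IntType(false, Integer.MIN_VALUE, Integer.MAX_VALUE);

        static IntType range(int lo, int hi) { return new IntType(false, lo, hi); }
        static IntType con(int v) { return range(v, v); }

        boolean isCon(int v) { return !top && lo == v && hi == v; }

        // True if every value of 'other' is also a value of this type.
        boolean contains(IntType other) {
            if (other.top) return true;
            if (top) return false;
            return lo <= other.lo && other.hi <= hi;
        }
    }

    // Non-monotonic order: the "dividend is 0" case is tested first.
    static IntType modValueBuggy(IntType a, IntType b) {
        if (a.top || b.top) return IntType.TOP;
        if (a.isCon(0)) return IntType.con(0);  // 0 % x == 0 ...
        if (b.isCon(0)) return IntType.TOP;     // ... but int % 0 throws, so no value
        return IntType.INT;                     // conservative for everything else
    }

    // Fixed order: the division-by-zero case is tested first.
    static IntType modValueFixed(IntType a, IntType b) {
        if (a.top || b.top) return IntType.TOP;
        if (b.isCon(0)) return IntType.TOP;
        if (a.isCon(0)) return IntType.con(0);
        return IntType.INT;
    }
}
```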

Please review :)

-------------

Commit messages:
 - move up div by zero check
 - test

Changes: https://git.openjdk.org/jdk/pull/27408/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27408&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367967
  Stats: 75 lines in 2 files changed: 70 ins; 5 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/27408.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27408/head:pull/27408

PR: https://git.openjdk.org/jdk/pull/27408

From hgreule at openjdk.org  Sun Sep 21 06:19:57 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Sun, 21 Sep 2025 06:19:57 GMT
Subject: RFR: 8367967: C2: "fatal error: Not monotonic" with Mod nodes
In-Reply-To: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
References: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
Message-ID: <fK2UI1tXDlrbJdy4xfZbeJ8KdCmbNeg6Rx6egVkLTPU=.1b69b460-1065-424e-b08b-51aa2450b609@github.com>

On Sun, 21 Sep 2025 06:11:11 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

> Generally, we shouldn't return a wider type (ZERO) if a later case would return a narrower type (TOP) for the same input types. If the inputs are widened so that the first case no longer matches but the later one still does, the result is not monotonic with respect to the previous result.
> 
> Please review :)

Thanks for the test case!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27408#issuecomment-3315532390

From duke at openjdk.org  Mon Sep 22 01:53:20 2025
From: duke at openjdk.org (erifan)
Date: Mon, 22 Sep 2025 01:53:20 GMT
Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI
 expand operation [v4]
In-Reply-To: <fDFpIq8vnu5rCRyytTRyBgARzXew-IsyKU6rXZmDLlc=.17b763e9-ca3a-4a75-b57e-75af5d11a9ef@github.com>
References: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
 <fDFpIq8vnu5rCRyytTRyBgARzXew-IsyKU6rXZmDLlc=.17b763e9-ca3a-4a75-b57e-75af5d11a9ef@github.com>
Message-ID: <M8sBZALXVbvRjFhxF1FEq2DIyEkJPJnSw6SSpHoH9SI=.2e023efe-05c0-4d3b-ba9d-d3ae1ba76207@github.com>

On Mon, 15 Sep 2025 05:55:43 GMT, erifan <duke at openjdk.org> wrote:

>> Currently, on AArch64, the VectorAPI `expand` operation is intrinsified for 32-bit and 64-bit types only when SVE2 is available. In the following cases, `expand` has not yet been intrinsified:
>> 1. **Subword types** on SVE2-capable hardware.
>> 2. **All types** on NEON and SVE1 environments.
>> 
>> As a result, `expand` API performance is very poor in these scenarios. This patch intrinsifies the `expand` operation in the above environments.
>> 
>> Since there are no native instructions directly corresponding to `expand` in these cases, this patch mainly leverages the `TBL` instruction to implement `expand`. To compute the index input for `TBL`, the prefix sum algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used. Take a 128-bit byte vector on SVE2 as an example:
>> 
>> To compute: dst = src.expand(mask)
>> Data direction: high <== low
>> Input:
>>   src                         = p o n m l k j i h g f e d c b a
>>   mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>> Expected result:
>>   dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>> 
>> Step 1: calculate the index input of the TBL instruction.
>> 
>> // Set tmp1 as all 0 vector.
>> tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 
>> // Move the mask bits from the predicate register to a vector register.
>> // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
>> tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>> 
>> // Shift the entire register. Prefix sum algorithm.
>> dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
>> tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
>> 
>> dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
>> tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
>> 
>> dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
>> tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
>> 
>> dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
>> tmp2 += dst                   = 8 8 8 7 6 6 6 5 4 4 4 3 2 2 2 1
>> 
>> // Clear inactive elements.
>> dst = sel(mask, tmp2, tmp1)   = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1
>> 
>> // Set the inactive lane value to -1 and set the active lane to the target index.
>> dst -= 1                      = -1 -1 7 6 -1 -1 5 4 -1 -1 3 2 -1 -1 1 0
>> 
>> Step 2: shuffle the source vector elements to the target vector
>> 
>> tbl(dst, src, dst)            = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>> 
>> 
>> The same algorithm is used for NEON and...
>
> erifan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
> 
>  - Merge branch 'master' into JDK-8363989
>  - Align code example data for better reading
>  - Merge branch 'master' into JDK-8363989
>  - Improve the comment of the vector expand implementation
>  - Merge branch 'master' into JDK-8363989
>  - 8363989: AArch64: Add missing backend support of VectorAPI expand operation
>    
>    Currently, on AArch64, the VectorAPI `expand` operation is intrinsified
>    for 32-bit and 64-bit types only when SVE2 is available. In the following
>    cases, `expand` has not yet been intrinsified:
>    1. **Subword types** on SVE2-capable hardware.
>    2. **All types** on NEON and SVE1 environments.
>    
>    As a result, `expand` API performance is very poor in these scenarios.
>    This patch intrinsifies the `expand` operation in the above environments.
>    
>    Since there are no native instructions directly corresponding to `expand`
>    in these cases, this patch mainly leverages the `TBL` instruction to
>    implement `expand`. To compute the index input for `TBL`, the prefix sum
>    algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used.
>    Take a 128-bit byte vector on SVE2 as an example:
>    ```
>    To compute: dst = src.expand(mask)
>    Data direction: high <== low
>    Input:
>      src                         = p o n m l k j i h g f e d c b a
>      mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>    Expected result:
>      dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>    ```
>    Step 1: calculate the index input of the TBL instruction.
>    ```
>    // Set tmp1 as all 0 vector.
>    tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>    
>    // Move the mask bits from the predicate register to a vector register.
>    // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
>    tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>    
>    // Shift the entire register. Prefix sum algorithm.
>    dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
>    tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
>    
>    dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
>    tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
>    
>    dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
>    tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
>    
>    dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
> ...

Thanks all for your help!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26740#issuecomment-3316494033

From duke at openjdk.org  Mon Sep 22 01:53:20 2025
From: duke at openjdk.org (duke)
Date: Mon, 22 Sep 2025 01:53:20 GMT
Subject: RFR: 8363989: AArch64: Add missing backend support of VectorAPI
 expand operation [v4]
In-Reply-To: <fDFpIq8vnu5rCRyytTRyBgARzXew-IsyKU6rXZmDLlc=.17b763e9-ca3a-4a75-b57e-75af5d11a9ef@github.com>
References: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
 <fDFpIq8vnu5rCRyytTRyBgARzXew-IsyKU6rXZmDLlc=.17b763e9-ca3a-4a75-b57e-75af5d11a9ef@github.com>
Message-ID: <1Mw7GB0izL4AcBvqhFSOjtRe7ARd_tSatEPH5ELUVF4=.4608a334-702e-41c1-acf0-67923d183a5a@github.com>

On Mon, 15 Sep 2025 05:55:43 GMT, erifan <duke at openjdk.org> wrote:

>> Currently, on AArch64, the VectorAPI `expand` operation is intrinsified for 32-bit and 64-bit types only when SVE2 is available. In the following cases, `expand` has not yet been intrinsified:
>> 1. **Subword types** on SVE2-capable hardware.
>> 2. **All types** on NEON and SVE1 environments.
>> 
>> As a result, `expand` API performance is very poor in these scenarios. This patch intrinsifies the `expand` operation in the above environments.
>> 
>> Since there are no native instructions directly corresponding to `expand` in these cases, this patch mainly leverages the `TBL` instruction to implement `expand`. To compute the index input for `TBL`, the prefix sum algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used. Take a 128-bit byte vector on SVE2 as an example:
>> 
>> To compute: dst = src.expand(mask)
>> Data direction: high <== low
>> Input:
>>   src                         = p o n m l k j i h g f e d c b a
>>   mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>> Expected result:
>>   dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>> 
>> Step 1: calculate the index input of the TBL instruction.
>> 
>> // Set tmp1 as all 0 vector.
>> tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 
>> // Move the mask bits from the predicate register to a vector register.
>> // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
>> tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>> 
>> // Shift the entire register. Prefix sum algorithm.
>> dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
>> tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
>> 
>> dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
>> tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
>> 
>> dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
>> tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
>> 
>> dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
>> tmp2 += dst                   = 8 8 8 7 6 6 6 5 4 4 4 3 2 2 2 1
>> 
>> // Clear inactive elements.
>> dst = sel(mask, tmp2, tmp1)   = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1
>> 
>> // Set the inactive lane value to -1 and set the active lane to the target index.
>> dst -= 1                      = -1 -1 7 6 -1 -1 5 4 -1 -1 3 2 -1 -1 1 0
>> 
>> Step 2: shuffle the source vector elements to the target vector
>> 
>> tbl(dst, src, dst)            = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>> 
>> 
>> The same algorithm is used for NEON and...
>
> erifan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
> 
>  - Merge branch 'master' into JDK-8363989
>  - Align code example data for better reading
>  - Merge branch 'master' into JDK-8363989
>  - Improve the comment of the vector expand implementation
>  - Merge branch 'master' into JDK-8363989
>  - 8363989: AArch64: Add missing backend support of VectorAPI expand operation
>    
>    Currently, on AArch64, the VectorAPI `expand` operation is intrinsified
>    for 32-bit and 64-bit types only when SVE2 is available. In the following
>    cases, `expand` has not yet been intrinsified:
>    1. **Subword types** on SVE2-capable hardware.
>    2. **All types** on NEON and SVE1 environments.
>    
>    As a result, `expand` API performance is very poor in these scenarios.
>    This patch intrinsifies the `expand` operation in the above environments.
>    
>    Since there are no native instructions directly corresponding to `expand`
>    in these cases, this patch mainly leverages the `TBL` instruction to
>    implement `expand`. To compute the index input for `TBL`, the prefix sum
>    algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used.
>    Take a 128-bit byte vector on SVE2 as an example:
>    ```
>    To compute: dst = src.expand(mask)
>    Data direction: high <== low
>    Input:
>      src                         = p o n m l k j i h g f e d c b a
>      mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>    Expected result:
>      dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
>    ```
>    Step 1: calculate the index input of the TBL instruction.
>    ```
>    // Set tmp1 as all 0 vector.
>    tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>    
>    // Move the mask bits from the predicate register to a vector register.
>    // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
>    tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
>    
>    // Shift the entire register. Prefix sum algorithm.
>    dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
>    tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
>    
>    dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
>    tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
>    
>    dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
>    tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
>    
>    dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
> ...

@erifan 
Your change (at version a5b7fe9c67d5bbbc0fb4443b8517f4e204dbe21f) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26740#issuecomment-3316496001

From duke at openjdk.org  Mon Sep 22 02:06:31 2025
From: duke at openjdk.org (erifan)
Date: Mon, 22 Sep 2025 02:06:31 GMT
Subject: Integrated: 8363989: AArch64: Add missing backend support of VectorAPI
 expand operation
In-Reply-To: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
References: <v7NVC0Vbj-j8nsmID1xYD8tDF8fIvmnk8YDIJzPRwg4=.1038b719-e9b0-4ec9-b071-59fe37255da6@github.com>
Message-ID: <FjGYAc4T72nlj0vqh_0slFU205DcuadjIu1_jrNFBw0=.a1dc3880-ec5c-4c00-910f-f11f25278c17@github.com>

On Tue, 12 Aug 2025 09:02:01 GMT, erifan <duke at openjdk.org> wrote:

> Currently, on AArch64, the VectorAPI `expand` operation is intrinsified for 32-bit and 64-bit types only when SVE2 is available. In the following cases, `expand` has not yet been intrinsified:
> 1. **Subword types** on SVE2-capable hardware.
> 2. **All types** on NEON and SVE1 environments.
> 
> As a result, `expand` API performance is very poor in these scenarios. This patch intrinsifies the `expand` operation in the above environments.
> 
> Since there are no native instructions directly corresponding to `expand` in these cases, this patch mainly leverages the `TBL` instruction to implement `expand`. To compute the index input for `TBL`, the prefix sum algorithm (see https://en.wikipedia.org/wiki/Prefix_sum) is used. Take a 128-bit byte vector on SVE2 as an example:
> 
> To compute: dst = src.expand(mask)
> Data direction: high <== low
> Input:
>   src                         = p o n m l k j i h g f e d c b a
>   mask                        = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
> Expected result:
>   dst                         = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
> 
> Step 1: calculate the index input of the TBL instruction.
> 
> // Set tmp1 as all 0 vector.
> tmp1                          = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 
> // Move the mask bits from the predicate register to a vector register.
> // **1-bit** mask lane of P register to **8-bit** mask lane of V register.
> tmp2 = mask                   = 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
> 
> // Shift the entire register. Prefix sum algorithm.
> dst = tmp2 << 8               = 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
> tmp2 += dst                   = 0 1 2 1 0 1 2 1 0 1 2 1 0 1 2 1
> 
> dst = tmp2 << 16              = 2 1 0 1 2 1 0 1 2 1 0 1 2 1 0 0
> tmp2 += dst                   = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
> 
> dst = tmp2 << 32              = 2 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
> tmp2 += dst                   = 4 4 4 4 4 4 4 4 4 4 4 3 2 2 2 1
> 
> dst = tmp2 << 64              = 4 4 4 3 2 2 2 1 0 0 0 0 0 0 0 0
> tmp2 += dst                   = 8 8 8 7 6 6 6 5 4 4 4 3 2 2 2 1
> 
> // Clear inactive elements.
> dst = sel(mask, tmp2, tmp1)   = 0 0 8 7 0 0 6 5 0 0 4 3 0 0 2 1
> 
> // Set the inactive lane value to -1 and set the active lane to the target index.
> dst -= 1                      = -1 -1 7 6 -1 -1 5 4 -1 -1 3 2 -1 -1 1 0
> 
> Step 2: shuffle the source vector elements to the target vector
> 
> tbl(dst, src, dst)            = 0 0 h g 0 0 f e 0 0 d c 0 0 b a
> 
> 
> The same algorithm is used for NEON and SVE1, but with different instructions where appropriate.
> 
> The following benchmarks are from panama-...
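
To make the walkthrough concrete, here is a scalar Java model of the same steps. It assumes lanes are held in an `int[]` with index 0 as the lowest lane (the rightmost column in the diagrams), that a whole-register shift by 8·k bits can be modeled as moving each byte lane up by k lanes, and that `TBL` returns 0 for any out-of-range index (-1 is 0xFF as an unsigned byte). `ExpandSketch` and its methods are illustrative names, not part of the patch, and the model covers only the data flow, not the actual SVE2/NEON instruction selection.

```java
// Scalar model of the TBL-based expand for one 128-bit vector of byte lanes.
// Array index i corresponds to byte lane i of the register (lane 0 = lowest).
final class ExpandSketch {
    // Whole-register left shift by 8 * k bits: every lane moves up by k, zeros enter at the bottom.
    static int[] shiftUpLanes(int[] v, int k) {
        int[] r = new int[v.length];
        for (int i = k; i < v.length; i++) {
            r[i] = v[i - k];
        }
        return r;
    }

    // Inclusive prefix sum of the 0/1 mask lanes using shift-and-add steps.
    static int[] prefixSum(boolean[] mask) {
        int n = mask.length;
        int[] tmp2 = new int[n];
        for (int i = 0; i < n; i++) {
            tmp2[i] = mask[i] ? 1 : 0;
        }
        for (int k = 1; k < n; k <<= 1) {   // k = 1, 2, 4, 8 lanes == 8, 16, 32, 64 bits
            int[] shifted = shiftUpLanes(tmp2, k);
            for (int i = 0; i < n; i++) {
                tmp2[i] += shifted[i];
            }
        }
        return tmp2;
    }

    // TBL: out-of-range indices (here -1, i.e. 0xFF as an unsigned byte) produce 0.
    static int[] tbl(int[] table, int[] idx) {
        int[] r = new int[idx.length];
        for (int i = 0; i < idx.length; i++) {
            int j = idx[i];
            r[i] = (j >= 0 && j < table.length) ? table[j] : 0;
        }
        return r;
    }

    static int[] expand(int[] src, boolean[] mask) {
        int[] sums = prefixSum(mask);
        int[] idx = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            // sel(mask, tmp2, 0) then "-= 1": active lanes get their source index, inactive lanes -1.
            idx[i] = (mask[i] ? sums[i] : 0) - 1;
        }
        return tbl(src, idx);
    }
}
```

Reading `prefixSum` from lane 15 down to lane 0 for the example mask gives `8 8 8 7 6 6 6 5 4 4 4 3 2 2 2 1`, the final `tmp2` row in the walkthrough.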

This pull request has now been integrated.

Changeset: e6f8450d
Author:    erifan <erfang at nvidia.com>
Committer: Xiaohong Gong <xgong at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/e6f8450d957f79beacf2fc70e545db3a4bb58742
Stats:     485 lines in 9 files changed: 388 ins; 12 del; 85 mod

8363989: AArch64: Add missing backend support of VectorAPI expand operation

Reviewed-by: epeter, eliu, xgong

-------------

PR: https://git.openjdk.org/jdk/pull/26740

From xgong at openjdk.org  Mon Sep 22 02:24:14 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Mon, 22 Sep 2025 02:24:14 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v6]
In-Reply-To: <YUFBN8XGU-ckgmn3-BhncRqCqYQn1FxHfrFgjt7VEi0=.4feae95f-1bae-456d-86de-2f2d7b7fc319@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <xc00dQJ-uWgObZ0APgMIRk5rjqgjmixV1VJe92lpzq0=.e6503bf6-68b2-44dc-a2a8-ca9e20803d9d@github.com>
 <YUFBN8XGU-ckgmn3-BhncRqCqYQn1FxHfrFgjt7VEi0=.4feae95f-1bae-456d-86de-2f2d7b7fc319@github.com>
Message-ID: <0Drsoxc4WEDr9aTxI_jbqBnKLWqmL2GH_CEbuOQ2Umk=.6e6422c4-9f33-45ae-94d4-e0e4b70b9c6f@github.com>

On Thu, 18 Sep 2025 19:58:09 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:

> > I would want them to consider if the approach with the special vector nodes `VectorConcatenateAndNarrow` and `VectorMaskWiden` are really desirable. The complexity needs to go somewhere, but I'm not sure if it is better in the C2 IR or in the backend.
> 
> > It would just be nice to build on "simple" building blocks and not have too many complex nodes, that have very special semantics (widen + split into two)
> 
> Intuitively this seems like the right way to think about it. Although I don't have a proposed solution, I am really just agreeing with the above sentiment: a compositional solution, if possible, with the right primitive building blocks will likely be superior.

Thanks for your input, @PaulSandoz! I agree with keeping the IR simple. I'm now working on finding a better way to express these two complex operations and hope to have it fixed soon. Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26236#issuecomment-3316545262

From xgong at openjdk.org  Mon Sep 22 02:24:16 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Mon, 22 Sep 2025 02:24:16 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v6]
In-Reply-To: <--dYtit2PWnrw8fxiHum8BLdxnRAWBNfNAz4eGWYI8E=.ac6c9739-e926-47fe-8c5f-db6ef04b906c@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <xc00dQJ-uWgObZ0APgMIRk5rjqgjmixV1VJe92lpzq0=.e6503bf6-68b2-44dc-a2a8-ca9e20803d9d@github.com>
 <3LhOW_sYJcS3zgNB2PLXAQ393WU73hdgjSqmsmoy7VQ=.3cbc1e66-c59e-41b1-80c8-24373797259a@github.com>
 <--dYtit2PWnrw8fxiHum8BLdxnRAWBNfNAz4eGWYI8E=.ac6c9739-e926-47fe-8c5f-db6ef04b906c@github.com>
Message-ID: <t9K7ijx_LDe1RgPpUGxZd8qL3-LpOFNDJ7dr38aHMNA=.9bcf2840-a137-433b-b3cc-cf0d89208e9b@github.com>

On Thu, 18 Sep 2025 12:19:51 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/matcher_aarch64.hpp line 173:
>> 
>>> 171:   // SVE requires vector indices for gather-load/scatter-store operations on all
>>> 172:   // data types.
>>> 173:   static bool gather_scatter_requires_index_in_address(BasicType bt) {
>> 
>> I know I agreed to this naming, but I looked at the signature of `Gather` again:
>> `LoadVectorGatherNode(Node* c, Node* mem, Node* adr, const TypePtr* at, const TypeVect* vt, Node* indices)`
>> 
>> I'm a little confused now what is the `address` that your name references. Is it the `adr`? I think not, because that is the base address, right? Can you clarify a little more? Maybe add to the documentation of the gather and scatter node as well, if you think that helps?
>
> Actually, you already did add documentation to the gather / scatter nodes now. And based on your explanation there, I suggest you rename the method here to:
> `gather_scatter_requires_indices_from_array`
> This would say that the indices come from an array, rather than a vector register.
> 
> The current name we had agreed on confuses me because it suggests that the index may already be in the address `adr`, but that does not make much sense.

Ok, `gather_scatter_requires_indices_from_array` sounds better to me. I will change it soon.

>I'm a little confused now what is the address that your name references. Is it the adr? I think not, because that is the base address, right? Can you clarify a little more? Maybe add to the documentation of the gather and scatter node as well, if you think that helps?

It means that if this method returns true, the `indices` input is an address pointing to where the indices are stored; otherwise, `indices` is a vector register. You are right that it has nothing to do with the `adr` input, which is the memory base address.
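
As a scalar illustration of where gather indices live, the sketch below follows the per-lane rule documented for the Vector API gather overloads such as `ShortVector.fromArray(species, a, offset, indexMap, mapOffset)`: lane N loads `a[offset + indexMap[mapOffset + N]]`. The indices start out in an `int[]` in memory, separate from the base array and offset; whether the backend consumes them via the address of that array or from a vector register is what the predicate name is meant to convey. `GatherSketch.gather` is a hypothetical helper with no masking or bounds handling beyond Java's own checks.

```java
// Scalar model of a subword (short) gather load.
final class GatherSketch {
    static short[] gather(short[] a, int offset, int[] indexMap, int mapOffset, int lanes) {
        short[] result = new short[lanes];
        for (int n = 0; n < lanes; n++) {
            // The per-lane index is read from an int[] in memory...
            int index = indexMap[mapOffset + n];
            // ...and then used to address the data array relative to the base offset.
            result[n] = a[offset + index];
        }
        return result;
    }
}
```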

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2366559782

From jkarthikeyan at openjdk.org  Mon Sep 22 03:03:22 2025
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Mon, 22 Sep 2025 03:03:22 GMT
Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in
 SuperWord truncation: CastII [v2]
In-Reply-To: <vcVmO3zw1cEkEGwFmSIuh11_gQ1gHez02MhVNQvt79o=.943407c5-d6bc-4ec8-acc0-c2002824fd00@github.com>
References: <XKOkG-MuA1n1Cy1qrBXCPBBx9RLFjD4iMk-oWNKfSPM=.42d2c171-b4b0-4457-b993-014a3cdfe656@github.com>
 <V977QzHH4oel8SJt9d3kG1HFtrscXKdCAAIYx0CqCzI=.22e0de4a-1a32-4ce9-b820-6d1247e0d4a4@github.com>
 <vcVmO3zw1cEkEGwFmSIuh11_gQ1gHez02MhVNQvt79o=.943407c5-d6bc-4ec8-acc0-c2002824fd00@github.com>
Message-ID: <vXlb9QDOlU7Nap7M30K2iL22yFcmibCZ0WFz5vXQyCI=.2343f701-9dc9-4d7f-9f74-6a594d98373f@github.com>

On Tue, 16 Sep 2025 07:47:50 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update comment for constraint casts
>
> test/hotspot/jtreg/compiler/vectorization/TestSubwordTruncation.java line 431:
> 
>> 429:     }
>> 430: 
>> 431:     @Test(compLevel = CompLevel.C2)
> 
> Any particular reason you've chosen `C2` here and not let the IR framework handle it? (by default it's `ANY` which will compile at the highest available tier). I'm also wondering if this test would fail if someone ran the test with a build without C2.

Thanks for the comment! I used `CompLevel.C2` here to simulate an -Xcomp environment, since unfortunately I couldn't reproduce the crash in the IR framework without it. I'll investigate how to ensure that the test won't fail on builds without C2.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26827#discussion_r2366609927

From jkarthikeyan at openjdk.org  Mon Sep 22 03:03:24 2025
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Mon, 22 Sep 2025 03:03:24 GMT
Subject: RFR: 8350468: x86: Improve implementation of vectorized
 numberOfLeadingZeros for int and long
In-Reply-To: <_YfxynUy7BxtFo15BZG2bdhrUaCkIPSC6l8fTAVyJE8=.ffc4ce38-d9bd-4ab7-b5dd-0ffd847d5c2d@github.com>
References: <_YfxynUy7BxtFo15BZG2bdhrUaCkIPSC6l8fTAVyJE8=.ffc4ce38-d9bd-4ab7-b5dd-0ffd847d5c2d@github.com>
Message-ID: <yHNrHZPQHRCeyuxZRI4G7Jfiw_lxXLxROKfrXBWS_-U=.d6b4abe9-bca8-4f94-bd80-3f454bb93672@github.com>

On Mon, 4 Aug 2025 02:20:31 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

> Hi all,
> This is a patch that optimizes the x86 backend implementation of `CountLeadingZerosV` for int and long. In the review of [JDK-8349637](https://bugs.openjdk.org/browse/JDK-8349637), an [optimized algorithm](https://github.com/openjdk/jdk/pull/23579#issuecomment-2661332497) was proposed by @rgiulietti, which this PR implements. For int operands, the optimized algorithm reduces the number of vector instructions from 19 to 13. The same algorithm does not work for long operands, however, since AVX2 lacks a vectorized long->double conversion instruction. Instead, I found an optimized algorithm that reuses the int code to compute the leading zeros for long with only 4 additional instructions. I added a benchmark, and on my Zen 3 machine I get these results:
> 
>                                  Baseline                        Patch        
> Benchmark              Mode  Cnt    Score   Error  Units    Score   Error  Units  Improvement
> LeadingZeros.testInt   avgt   15   91.097 ± 3.276  ns/op   68.665 ± 1.740  ns/op  (+ 28.1%)
> LeadingZeros.testLong  avgt   15  342.545 ± 4.470  ns/op  228.668 ± 5.994  ns/op  (+ 39.9%)
> 
> I've updated the unit tests to more thoroughly test longs and they pass on my machine. Thoughts and reviews would be appreciated!
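
A scalar Java sketch of the general idea of counting leading zeros through the floating-point exponent, plus one plausible way to derive the long result from two int results. It assumes an unsigned int->double conversion (which vector code has to emulate or avoid) and is not necessarily the exact instruction sequence from the linked comment or this PR; `LeadingZerosSketch` is a hypothetical name.

```java
// Scalar sketch: leading zeros via the exponent of an exact int -> double conversion.
final class LeadingZerosSketch {
    static int nlz32(int x) {
        // Zero-extend so negative ints become large unsigned values;
        // every 32-bit unsigned value converts to double exactly.
        double d = (double) (x & 0xFFFFFFFFL);
        long bits = Double.doubleToRawLongBits(d);
        int exponent = (int) ((bits >>> 52) & 0x7FF) - 1023; // floor(log2(d)) for d >= 1
        // For d == 0 the biased exponent field is 0, giving 31 + 1023; clamp to 32.
        return Math.min(31 - exponent, 32);
    }

    // Assumed combination for long: use the high half unless it is entirely zero.
    static int nlz64(long x) {
        int hi = (int) (x >>> 32);
        int lo = (int) x;
        return hi != 0 ? nlz32(hi) : 32 + nlz32(lo);
    }
}
```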

Hi! May I have some reviews on this? Maybe @jatin-bhateja or @sviswa7 since this is an x86 backend change.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26610#issuecomment-3316600692

From dzhang at openjdk.org  Mon Sep 22 03:22:00 2025
From: dzhang at openjdk.org (Dingli Zhang)
Date: Mon, 22 Sep 2025 03:22:00 GMT
Subject: RFR: 8368206: RISC-V: compiler/vectorapi/VectorMaskCompareNotTest.java
 fails when running without RVV
Message-ID: <zeLrBLLs95EZvueIFs073vLvuX2LSn6XX51lOZtxh64=.8ed10e1e-08df-49d3-b987-48349f07eeb0@github.com>

Hi,
Can you help to review this patch? Thanks!

We noticed that compiler/vectorapi/VectorMaskCompareNotTest.java fails when running on sg2042.
On RISC-V without RVV, ofLargestShape(long.class) falls back to 64 bits (see getMaxVectorBitSize in VectorShape.java),
leading to VectorShape.forBitSize(32) which is unsupported and throws IllegalArgumentException.
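As a rough model of the failure, assuming the test derives its int shape by halving the largest long shape (as the JDK-8368205 description in this thread states), the lookup only succeeds for bit sizes that name a shape. `shape_for_bit_size` is a hypothetical stand-in for `VectorShape.forBitSize`, and it ignores `S_Max_BIT`, whose size is platform-dependent.

```cpp
#include <stdexcept>

// Hypothetical stand-in for VectorShape.forBitSize: only the fixed shape sizes
// are accepted here (S_Max_BIT is left out of this model).
inline int shape_for_bit_size(int bits) {
  switch (bits) {
    case 64: case 128: case 256: case 512:
      return bits;
    default:
      // Plays the role of the IllegalArgumentException thrown by the Java API.
      throw std::invalid_argument("unsupported vector bit size");
  }
}

// Int shape derived as half of the largest long shape.
inline int int_shape_bits(int largest_long_shape_bits) {
  return shape_for_bit_size(largest_long_shape_bits / 2);
}
```

When the largest long shape is at least 128 bits, the halved size still names a shape; the 64-bit fallback without RVV yields 32, which throws.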

### Test (fastdebug)
- [x] Run compiler/vectorapi/VectorMaskCompareNotTest.java on sg2042

-------------

Commit messages:
 - 8368206: RISC-V: compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without RVV

Changes: https://git.openjdk.org/jdk/pull/27414/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27414&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8368206
  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/27414.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27414/head:pull/27414

PR: https://git.openjdk.org/jdk/pull/27414

From fyang at openjdk.org  Mon Sep 22 06:17:13 2025
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 22 Sep 2025 06:17:13 GMT
Subject: RFR: 8368206: RISC-V:
 compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without
 RVV
In-Reply-To: <zeLrBLLs95EZvueIFs073vLvuX2LSn6XX51lOZtxh64=.8ed10e1e-08df-49d3-b987-48349f07eeb0@github.com>
References: <zeLrBLLs95EZvueIFs073vLvuX2LSn6XX51lOZtxh64=.8ed10e1e-08df-49d3-b987-48349f07eeb0@github.com>
Message-ID: <iuPxJhDj66UuoJ12uIqvtGiI7UDyVUkIoLVgnnQOO3I=.fe5f6c32-00cf-41ac-87da-fa64c0e3b457@github.com>

On Mon, 22 Sep 2025 03:16:29 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help to review this patch? Thanks!
> 
> We noticed that compiler/vectorapi/VectorMaskCompareNotTest.java fails when running on sg2042.
> On RISC-V without RVV, ofLargestShape(long.class) falls back to 64 bits (see getMaxVectorBitSize in VectorShape.java),
> leading to VectorShape.forBitSize(32) which is unsupported and throws IllegalArgumentException.
> 
> ### Test (fastdebug)
> - [x] Run compiler/vectorapi/VectorMaskCompareNotTest.java on sg2042

LGTM.

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27414#pullrequestreview-3250747910

From dfenacci at openjdk.org  Mon Sep 22 06:46:20 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Mon, 22 Sep 2025 06:46:20 GMT
Subject: RFR: 8349835: C2: simplify IGV property printing
In-Reply-To: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com>
References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com>
Message-ID: <TDI60JyBoKBzhv-jldCQB39rN82rfktYiuIQCqsC2G8=.36d8a2a5-0b41-4f77-9654-05ba1be622b2@github.com>

On Fri, 22 Aug 2025 13:28:22 GMT, Saranya Natarajan <snatarajan at openjdk.org> wrote:

> The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708).
> 
> ### Fix 
> Implemented the suggested refactoring. 
> 
> ### Testing 
> Github Actions, Tier 1-3

Thanks @sarannat for cleaning this up! The fix looks OK to me. I just left a couple of inline comments on minor things.

src/hotspot/share/opto/idealGraphPrinter.cpp line 259:

> 257: }
> 258: 
> 259: void IdealGraphPrinter::print_prop_record(const IdealGraphPrintRecord rec[], int size) {

Just a small naming note: the method actually prints multiple records. Could it be preferable to use the plural form?

src/hotspot/share/opto/idealGraphPrinter.cpp line 534:

> 532:         {((flags & Node::Flag_has_call) != 0), "has_call", "true"},
> 533:         {((flags & Node::Flag_has_swapped_edges) != 0), "has_swapped_edges", "true"}
> 534:       };

The indentation seems a bit off.

src/hotspot/share/opto/idealGraphPrinter.cpp line 1089:

> 1087:           {1, "cost", nullptr, (int) lrg._cost},
> 1088:           {1, "area", nullptr, (int) lrg._area},
> 1089:           {1, "score", nullptr, (int) lrg.score()},

It is mainly a matter of style, but as `_cond` is a `bool`, it might be better to use `true` rather than `1`.

src/hotspot/share/opto/idealGraphPrinter.cpp line 1109:

> 1107:           {(lrg._is_bound != 0), "is_bound", TRUE_VALUE},
> 1108:           {lrg._msize_valid && lrg._degree_valid && lrg.lo_degree(), "trivial", TRUE_VALUE}
> 1109:         };

Same indentation issue here.
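A standalone sketch of the table-driven pattern under review, with the suggestions from this review applied (plural method name, `bool` condition tested directly). The record layout is inferred from the snippets above: a condition, a property name, and either a string or an int value. `PrintRecord` is a hypothetical stand-in for `IdealGraphPrintRecord`, and the function returns a string instead of writing IGV output so its behavior is easy to check.

```cpp
#include <string>

// Hypothetical stand-in for IdealGraphPrintRecord; only _cond is a name from the review.
struct PrintRecord {
  bool _cond;           // print this property only if true
  const char* _name;    // property name
  const char* _svalue;  // string value; nullptr means "use _ivalue"
  int _ivalue;          // integer value, used when _svalue is nullptr
};

// Plural name, since it handles a whole array of records.
inline std::string print_prop_records(const PrintRecord rec[], int size) {
  std::string out;
  for (int i = 0; i < size; i++) {
    if (rec[i]._cond) {  // bool tested directly, no "!= 0"
      out += rec[i]._name;
      out += '=';
      out += (rec[i]._svalue != nullptr) ? std::string(rec[i]._svalue)
                                         : std::to_string(rec[i]._ivalue);
      out += ';';
    }
  }
  return out;
}
```

Records are then written as `{true, "cost", nullptr, 3}` rather than `{1, ...}`, matching the `bool` type of `_cond`.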

-------------

PR Review: https://git.openjdk.org/jdk/pull/26902#pullrequestreview-3250796143
PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2366871980
PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2366873355
PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2366858280
PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2366874365

From mbaesken at openjdk.org  Mon Sep 22 06:52:17 2025
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Mon, 22 Sep 2025 06:52:17 GMT
Subject: RFR: 8367967: C2: "fatal error: Not monotonic" with Mod nodes
In-Reply-To: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
References: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
Message-ID: <xytvQB_dSP5xe_coKl2GxxxF_0wxL4t_bkGSA6N7S9E=.5b1ab6ce-05a5-44a8-9935-a3157e159041@github.com>

On Sun, 21 Sep 2025 06:11:11 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

> Generally, we shouldn't return a wider type (ZERO) if there is a later case that would return a more narrow type (TOP) for the same input types. If the inputs are widened and the first case doesn't match anymore but the later one still does, the result is not monotonic with the previous result.
> 
> Please review :)

I'll test this in our CI to see if it fixes the Linux AArch64 issues (observed when running test java/foreign/TestUpcallStress.java).

BTW, why do we always get zero-size replay files when running into the issue?


# Internal Error (/priv/jenkins/client-home/workspace/openjdk-jdk-dev-linux_aarch64-dbg/jdk/src/hotspot/share/opto/phaseX.cpp:2763), pid=1089937, tid=1089972
# fatal error: Not monotonic


Is it another bug of the replay file generation or a known limitation ?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27408#issuecomment-3317214622
PR Comment: https://git.openjdk.org/jdk/pull/27408#issuecomment-3317218427

From dfenacci at openjdk.org  Mon Sep 22 06:54:20 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Mon, 22 Sep 2025 06:54:20 GMT
Subject: RFR: 8349835: C2: simplify IGV property printing
In-Reply-To: <TDI60JyBoKBzhv-jldCQB39rN82rfktYiuIQCqsC2G8=.36d8a2a5-0b41-4f77-9654-05ba1be622b2@github.com>
References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com>
 <TDI60JyBoKBzhv-jldCQB39rN82rfktYiuIQCqsC2G8=.36d8a2a5-0b41-4f77-9654-05ba1be622b2@github.com>
Message-ID: <rr91vjqgcQKN0_AqKgKC7aUSQ6i3IU81X_qH2JTNVLc=.ae52b6e9-9c4a-44ff-96dc-da0a1acc338a@github.com>

On Mon, 22 Sep 2025 06:41:46 GMT, Damon Fenacci <dfenacci at openjdk.org> wrote:

>> The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708).
>> 
>> ### Fix 
>> Implemented the suggested refactoring. 
>> 
>> ### Testing 
>> Github Actions, Tier 1-3
>
> src/hotspot/share/opto/idealGraphPrinter.cpp line 534:
> 
>> 532:         {((flags & Node::Flag_has_call) != 0), "has_call", "true"},
>> 533:         {((flags & Node::Flag_has_swapped_edges) != 0), "has_swapped_edges", "true"}
>> 534:       };
> 
> The indentation seems a bit off.

BTW we might want to use `TRUE_VALUE` instead of `"true"` here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2366890675

From dfenacci at openjdk.org  Mon Sep 22 06:58:21 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Mon, 22 Sep 2025 06:58:21 GMT
Subject: RFR: 8349835: C2: simplify IGV property printing
In-Reply-To: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com>
References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com>
Message-ID: <-wTyKNrZRuNp5TCveUIESvMlEs1Gupj7N4BZWm_RsRw=.f6662f5d-5191-49b5-8054-7763bfb483c0@github.com>

On Fri, 22 Aug 2025 13:28:22 GMT, Saranya Natarajan <snatarajan at openjdk.org> wrote:

> The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708).
> 
> ### Fix 
> Implemented the suggested refactoring. 
> 
> ### Testing 
> Github Actions, Tier 1-3

src/hotspot/share/opto/idealGraphPrinter.cpp line 261:

> 259: void IdealGraphPrinter::print_prop_record(const IdealGraphPrintRecord rec[], int size) {
> 260:   for ( int i = 0; i < size; i++ ) {
> 261:     if (rec[i]._cond != 0) {

As with the comment below, it might be more consistent to use just `rec[i]._cond` as the condition.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2366898344

From chagedorn at openjdk.org  Mon Sep 22 07:00:16 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 22 Sep 2025 07:00:16 GMT
Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in
 SuperWord truncation: CastII [v2]
In-Reply-To: <vXlb9QDOlU7Nap7M30K2iL22yFcmibCZ0WFz5vXQyCI=.2343f701-9dc9-4d7f-9f74-6a594d98373f@github.com>
References: <XKOkG-MuA1n1Cy1qrBXCPBBx9RLFjD4iMk-oWNKfSPM=.42d2c171-b4b0-4457-b993-014a3cdfe656@github.com>
 <V977QzHH4oel8SJt9d3kG1HFtrscXKdCAAIYx0CqCzI=.22e0de4a-1a32-4ce9-b820-6d1247e0d4a4@github.com>
 <vcVmO3zw1cEkEGwFmSIuh11_gQ1gHez02MhVNQvt79o=.943407c5-d6bc-4ec8-acc0-c2002824fd00@github.com>
 <vXlb9QDOlU7Nap7M30K2iL22yFcmibCZ0WFz5vXQyCI=.2343f701-9dc9-4d7f-9f74-6a594d98373f@github.com>
Message-ID: <1z430wmE_HRTJqmLIC15VMUktLyUEE7qjkppr1GniAI=.e560a4e9-59f0-4013-ad65-5d7261cdbf0e@github.com>

On Mon, 22 Sep 2025 03:00:18 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

>> test/hotspot/jtreg/compiler/vectorization/TestSubwordTruncation.java line 431:
>> 
>>> 429:     }
>>> 430: 
>>> 431:     @Test(compLevel = CompLevel.C2)
>> 
>> Any particular reason you've chosen `C2` here and not let the IR framework handle it? (by default it's `ANY` which will compile at the highest available tier). I'm also wondering if this test would fail if someone ran the test with a build without C2.
>
> Thanks for the comment! I used `CompLevel.C2` here to simulate an -Xcomp environment, since unfortunately I couldn't replicate the crash without it with the IR framework. I'll do some investigation to find a way to ensure that it won't fail without C2.

When you specify `@Warmup(0)`, the IR framework should directly compile it at the highest level which should be C2 if you are not running with a client build. So, I would have expected that it makes no difference. Can you double-check if you can reproduce it with `CompLevel.C2` but not without?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26827#discussion_r2366902189

From chagedorn at openjdk.org  Mon Sep 22 07:01:20 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 22 Sep 2025 07:01:20 GMT
Subject: RFR: 8367967: C2: "fatal error: Not monotonic" with Mod nodes
In-Reply-To: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
References: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
Message-ID: <_oHjwEyKU4sl2D4hY_NyKhu7JLZj7v_adWzdtxXM1Dk=.640f029b-24a4-4bed-b78a-6a43a4b72988@github.com>

On Sun, 21 Sep 2025 06:11:11 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

> Generally, we shouldn't return a wider type (ZERO) if there is a later case that would return a more narrow type (TOP) for the same input types. If the inputs are widened and the first case doesn't match anymore but the later one still does, the result is not monotonic with the previous result.
> 
> Please review :)

The fix looks good to me, thanks for the fix and the credit for the test!

I'll give it a spin in our testing as well.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27408#pullrequestreview-3250859104

From chagedorn at openjdk.org  Mon Sep 22 07:09:14 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 22 Sep 2025 07:09:14 GMT
Subject: RFR: 8367967: C2: "fatal error: Not monotonic" with Mod nodes
In-Reply-To: <xytvQB_dSP5xe_coKl2GxxxF_0wxL4t_bkGSA6N7S9E=.5b1ab6ce-05a5-44a8-9935-a3157e159041@github.com>
References: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
 <xytvQB_dSP5xe_coKl2GxxxF_0wxL4t_bkGSA6N7S9E=.5b1ab6ce-05a5-44a8-9935-a3157e159041@github.com>
Message-ID: <Y3dA6Y7w5UTe7sesEh3PZi2RQO3jRjrp-rkLOI-wego=.d6f72843-e2f4-49ca-8c75-7058cefd904f@github.com>

On Mon, 22 Sep 2025 06:49:49 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> BTW, why do we always get zero-size replay files when running into the issue?
> 
> ```
> # Internal Error (/priv/jenkins/client-home/workspace/openjdk-jdk-dev-linux_aarch64-dbg/jdk/src/hotspot/share/opto/phaseX.cpp:2763), pid=1089937, tid=1089972
> # fatal error: Not monotonic
> ```
> 
> Is it another bug of the replay file generation or a known limitation ?

We've encountered empty replay files before, which could be traced back to a timeout in error reporting due to stuck threads. We filed [JDK-8297588](https://bugs.openjdk.org/browse/JDK-8297588) for it, but it's not fixed yet. I did a closer investigation back then (see [summary](https://bugs.openjdk.org/browse/JDK-8297588?focusedId=14543145&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14543145)). You might be hitting the same issue.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27408#issuecomment-3317270620

From duke at openjdk.org  Mon Sep 22 07:46:46 2025
From: duke at openjdk.org (erifan)
Date: Mon, 22 Sep 2025 07:46:46 GMT
Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when
 MaxVectorSize=8
Message-ID: <RF-pB-qfXweWi6FuSCZZWnxElKTYN1m6JjP74TDVWqo=.81c23890-ed5e-454e-baab-b6119a3941e8@github.com>

The VectorShape size of `I_SPECIES_FOR_CAST` declared in test **VectorMaskCompareNotTest.java** is half that of `L_SPECIES_FOR_CAST`, and `L_SPECIES_FOR_CAST` is created with the maximum shape. Therefore, if `MaxVectorSize` is set to 8, the shape size of `I_SPECIES_FOR_CAST` is 4 bytes, which is illegal because the minimum vector size is 8 bytes.

This pull request addresses the issue by ensuring that this test runs only when `MaxVectorSize` is set to 16 bytes or higher.
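The size arithmetic in the description can be written as a small predicate. A minimal sketch, assuming `L_SPECIES_FOR_CAST` spans the full `MaxVectorSize` and `I_SPECIES_FOR_CAST` half of it; the helper names are hypothetical and this is not how the test itself expresses the requirement.

```cpp
// Minimum legal vector size in bytes, as stated in the PR description.
constexpr int kMinVectorBytes = 8;

// I_SPECIES_FOR_CAST is half the size of L_SPECIES_FOR_CAST, which uses the maximum shape.
constexpr int int_species_bytes(int max_vector_size) { return max_vector_size / 2; }

// True when the derived int species still meets the minimum vector size.
constexpr bool cast_species_are_legal(int max_vector_size) {
  return int_species_bytes(max_vector_size) >= kMinVectorBytes;
}

static_assert(!cast_species_are_legal(8), "MaxVectorSize=8 gives a 4-byte int species");
static_assert(cast_species_are_legal(16), "16 bytes is the smallest size that works");
```

This is why a 16-byte lower bound on `MaxVectorSize` is the natural cutoff for running the test.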

-------------

Commit messages:
 - 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when MaxVectorSize=8

Changes: https://git.openjdk.org/jdk/pull/27418/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27418&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8368205
  Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27418.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27418/head:pull/27418

PR: https://git.openjdk.org/jdk/pull/27418

From chagedorn at openjdk.org  Mon Sep 22 08:01:02 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 22 Sep 2025 08:01:02 GMT
Subject: RFR: 8349835: C2: simplify IGV property printing
In-Reply-To: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com>
References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com>
Message-ID: <0VdOTpwqYxNtt46bdcwe0rgTnSW8-KDY0GjXGAqvC9c=.b89da49e-82c1-4856-a899-59c3cc9aa17b@github.com>

On Fri, 22 Aug 2025 13:28:22 GMT, Saranya Natarajan <snatarajan at openjdk.org> wrote:

> The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708).
> 
> ### Fix 
> Implemented the suggested refactoring. 
> 
> ### Testing 
> Github Actions, Tier 1-3

src/hotspot/share/opto/idealGraphPrinter.cpp line 1082:

> 1080:       lrg.mask().dump(&lrg_mask_stream);
> 1081:       IdealGraphPrintRecord rec[] = {
> 1082:           {1, "mask", buffer},

Thanks for trying to clean this up! I see the benefit, since it looks like we can get rid of the repetitions. But this approach now feels like we squeezed too much into a single generic method and fell back to arrays of structs with optional fields that are sometimes set to null, which is not easy to comprehend. And the method `visit_node()` is still quite large. What if we just had different methods or even classes for different properties? This would help make the code self-documenting and also avoid passing in `nullptr` for unused values. For example (in pseudocode):
- node flag properties all share the same pattern:

class NodeFlags:
  NodeFlags _flags;
  print_properties() {
    print_property(Flag_is_Copy, "is_copy");
    print_property(Flag_rematerialize, "rematerialize");
    ...
  }

  print_property(flag, property) {
    if (_flags & flag) {
      print_prop(property, "true");
    }
  }

- For lrg: We could also create a class and store `lrg` as a field. We could have different printing methods, for example for unconditional prints, for printing true, etc.

This is just an idea, I have not actually tried it out. What are your thoughts about that alternative approach? 

We might have even more opportunities to refactor `visit_node()` since there are some more repetitive patterns here and there. But this could also be done separately.
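The pseudocode above can be fleshed out into a compilable sketch of the class-per-property idea. Everything here is hypothetical: the flag values, the `NodeFlagPrinter` name, and collecting strings instead of calling `print_prop`. In HotSpot the flags come from `Node` and printing goes through the IGV printer. The class stores the flags as a plain integer rather than the pseudocode's `NodeFlags _flags`.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical flag bits; the real values live in Node's flag enum.
enum : std::uint32_t {
  Flag_is_Copy       = 1u << 0,
  Flag_rematerialize = 1u << 1,
  Flag_has_call      = 1u << 2,
};

// Owns the flags and emits one "true" property per set bit, so callers never
// pass unused nullptr fields.
class NodeFlagPrinter {
 public:
  explicit NodeFlagPrinter(std::uint32_t flags) : _flags(flags) {}

  std::vector<std::string> print_properties() const {
    std::vector<std::string> out;
    print_property(out, Flag_is_Copy, "is_copy");
    print_property(out, Flag_rematerialize, "rematerialize");
    print_property(out, Flag_has_call, "has_call");
    return out;
  }

 private:
  std::uint32_t _flags;

  void print_property(std::vector<std::string>& out, std::uint32_t flag, const char* name) const {
    if ((_flags & flag) != 0) {
      out.push_back(std::string(name) + "=true");  // stands in for print_prop(name, "true")
    }
  }
};
```

Each property name appears exactly once, next to its flag, which is the self-documenting effect the suggestion aims for.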

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2367031472

From shade at openjdk.org  Mon Sep 22 08:02:07 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 22 Sep 2025 08:02:07 GMT
Subject: RFR: 8368071: Compilation throughput regressed 2X-8X after
 JDK-8355003
In-Reply-To: <WYxcB5a4pW-Lagi-N4oP7VaB7INF2_i6nCJFPyJlvtw=.68ea2dec-842d-4f8e-a1f3-8b720899d462@github.com>
References: <WYxcB5a4pW-Lagi-N4oP7VaB7INF2_i6nCJFPyJlvtw=.68ea2dec-842d-4f8e-a1f3-8b720899d462@github.com>
Message-ID: <7bXrjlQEr9kJsgOtJILZyHXbU8peGShP56PzeA6Z7SA=.00a70312-b23a-4b46-b53f-10dc3b058139@github.com>

On Fri, 19 Sep 2025 07:52:16 GMT, Man Cao <manc at openjdk.org> wrote:

> Hi all,
> 
> Could anyone review this change that fixes a severe startup performance regression for `-XX:+TieredCompilation`?  See https://bugs.openjdk.org/browse/JDK-8368071 for more details.
> 
> -Man

Looks okay for the fix that restores previous tiered policy behavior.

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27383#pullrequestreview-3251050569

From galder at openjdk.org  Mon Sep 22 08:06:16 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Mon, 22 Sep 2025 08:06:16 GMT
Subject: RFR: 8359412: Template-Framework Library: Operations and
 Expressions [v6]
In-Reply-To: <OgyQiPhMsqRSBiZvM_5rZX1iouEG0AWzJW9jzHU-ZMw=.9354d371-e56b-427a-aa75-483357c98d9f@github.com>
References: <6Bm5VrrqCOzdOooIU-wud7c3aCSuv_7GNZe7pe7D7Jk=.c99a9df1-e6bb-4c8d-94e9-029978fae6ab@github.com>
 <F86WzhNjF4KSD7bCieVWq8HEj_7Zr0cbeM-27xKPFzI=.ebeb8e75-25c5-491b-86f4-bbca1ed3487a@github.com>
 <WNaYML3vULg4Ycap4a79o6Siu-8_3Gm_UZo51g-BplE=.8c2951e8-f0fc-4471-aa54-69afc2e67db9@github.com>
 <OgyQiPhMsqRSBiZvM_5rZX1iouEG0AWzJW9jzHU-ZMw=.9354d371-e56b-427a-aa75-483357c98d9f@github.com>
Message-ID: <UjAJarnh3vwA1-Dcg8FnP3O_KkOUCkD59q9jI0qfZUA=.b01ee7f1-f7ab-4d03-a785-36c139d75ec7@github.com>

On Fri, 19 Sep 2025 05:48:15 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Nice additions @eme64!
>> 
>> I would have liked to see an example of a real use case of this in action included in the PR, e.g. some kind of IR test that takes advantage of this. E.g. a companion version (and/or replacement) for `VectorReduction2`? A follow-up RFE would of course be fine for this.
>
> @galderz Thanks for reviewing!
> Can you spell out a little more what you would like to see? For me, `compiler/igvn/ExpressionFuzzer.java` is already "an example of real use". And I have a lot more planned in future RFEs; see the "future work" section in the PR description ;)

@eme64 Fair enough, thanks for reminding me about the future work, that will likely cover my suggestion more easily. Enjoy time off!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26885#issuecomment-3317454298

From shade at openjdk.org  Mon Sep 22 08:06:16 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 22 Sep 2025 08:06:16 GMT
Subject: RFR: 8367967: C2: "fatal error: Not monotonic" with Mod nodes
In-Reply-To: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
References: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
Message-ID: <HSIg1l0rj7-rUeyllzqVda1ihzg7CreogP-9EZsiSb8=.24140e55-19fb-4321-9a13-690f3ad76801@github.com>

On Sun, 21 Sep 2025 06:11:11 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

> Generally, we shouldn't return a wider type (ZERO) if there is a later case that would return a more narrow type (TOP) for the same input types. If the inputs are widened and the first case doesn't match anymore but the later one still does, the result is not monotonic with the previous result.
> 
> Please review :)

This makes sense. I was about to ask what would be the result of `0 mod 0` then, but I see it is also covered: we return `TOP` on any `X mod 0` early on.
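The ordering problem can be illustrated with a toy model, which is not C2's actual Mod `Value()` code: integer types are closed ranges, an empty range plays the role of TOP, and a type is narrower when it is contained in another. With the ZERO check first, widening the dividend from `0` to `[0, 10]` while the divisor stays `0` moves the result from ZERO to the narrower TOP, which breaks monotonicity. Checking the divisor first, as in the `X mod 0` early return mentioned above, avoids it.

```cpp
#include <climits>

// Toy integer type: the closed range [lo, hi]; an empty range (lo > hi) stands in for TOP.
struct Range {
  long long lo, hi;
  bool is_top() const { return lo > hi; }
  bool is_con(long long v) const { return lo == v && hi == v; }
};

const Range TOP_T{1, 0};
const Range ZERO_T{0, 0};
const Range INT_T{INT_MIN, INT_MAX};  // conservative "any int" result

// a is at least as narrow as b if a is contained in b (TOP is contained in everything).
inline bool narrower_or_equal(const Range& a, const Range& b) {
  if (a.is_top()) return true;
  if (b.is_top()) return false;
  return b.lo <= a.lo && a.hi <= b.hi;
}

// Problematic order: the wider ZERO case is tested before the narrower TOP case.
inline Range mod_value_zero_first(const Range& a, const Range& b) {
  if (a.is_top() || b.is_top()) return TOP_T;
  if (a.is_con(0)) return ZERO_T;  // dividend 0: result 0 (for a non-zero divisor)
  if (b.is_con(0)) return TOP_T;   // divisor 0: no value
  return INT_T;
}

// Fixed order: x mod 0 is TOP regardless of the dividend.
inline Range mod_value_divisor_first(const Range& a, const Range& b) {
  if (a.is_top() || b.is_top()) return TOP_T;
  if (b.is_con(0)) return TOP_T;
  if (a.is_con(0)) return ZERO_T;
  return INT_T;
}
```

Monotonicity here means: if each input is narrower or equal to the corresponding widened input, the result must be narrower or equal to the widened result.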

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27408#pullrequestreview-3251064181

From dzhang at openjdk.org  Mon Sep 22 09:16:27 2025
From: dzhang at openjdk.org (Dingli Zhang)
Date: Mon, 22 Sep 2025 09:16:27 GMT
Subject: RFR: 8368247: RISC-V: enable vectorapi test for expand operation
Message-ID: <nWwtWid-xgAmIRSJnFvIPs-R5-kwVNtDMi3Fv9onfkY=.1cad59db-459d-46f3-b495-97c3bbe1fa3a@github.com>

Hi,
Can you help to review this patch? Thanks!

[JDK-8363989](https://bugs.openjdk.org/browse/JDK-8363989) adds a Vector API test for the expand operation, which we can also enable on RISC-V.

### Test (fastdebug)
- [x] Run compiler/vectorapi/VectorExpandTest.java on k1 and sg2042
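For readers unfamiliar with the operation being tested, here is a scalar reference for a masked expand, following the Vector API documentation's description as I understand it: consecutive lanes of the source, starting at lane 0, are placed into the lanes where the mask is set, and unset lanes are zero. `expand_reference` is a hypothetical helper, not part of the test or of any RISC-V backend code.

```cpp
#include <array>
#include <cstddef>

// Scalar reference for a masked expand: the i-th active lane of the result receives
// the i-th element of src (taken in order from lane 0); inactive lanes are zero.
template <typename T, std::size_t N>
std::array<T, N> expand_reference(const std::array<T, N>& src, const std::array<bool, N>& mask) {
  std::array<T, N> result{};  // value-initialized: inactive lanes stay zero
  std::size_t next = 0;       // next source lane to consume
  for (std::size_t lane = 0; lane < N; lane++) {
    if (mask[lane]) {
      result[lane] = src[next++];
    }
  }
  return result;
}
```

For example, expanding `{1, 2, 3, 4}` under the mask `{T, F, T, F}` gives `{1, 0, 2, 0}`.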

-------------

Commit messages:
 - 8368247: RISC-V: enable vectorapi test for expand operation

Changes: https://git.openjdk.org/jdk/pull/27420/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27420&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8368247
  Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/27420.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27420/head:pull/27420

PR: https://git.openjdk.org/jdk/pull/27420

From mchevalier at openjdk.org  Mon Sep 22 11:22:24 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Mon, 22 Sep 2025 11:22:24 GMT
Subject: RFR: 8366878: Improve flags of
 compiler/loopopts/superword/TestAlignVectorFuzzer.java [v2]
In-Reply-To: <mJBCJpEMT_Yl1s5b8M2Yu6gJo18FZepU9Pj1zqUqZBU=.07df1c29-c004-437f-b611-9a71299aafd7@github.com>
References: <jqOL72bER1G6aunV15U5OE4iUltIiv9dfhdJrvkyy3k=.7697a330-3de2-4591-96e3-70dc43e7139a@github.com>
 <mJBCJpEMT_Yl1s5b8M2Yu6gJo18FZepU9Pj1zqUqZBU=.07df1c29-c004-437f-b611-9a71299aafd7@github.com>
Message-ID: <nZ6k3de38hcHySzy-zb669J0AhMZ_WleywonZtVuxBo=.929f3074-a149-4664-9a42-9da23715b2bc@github.com>

On Tue, 16 Sep 2025 15:42:39 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> The test definitions of `TestAlignVectorFuzzer.java` all contain `printcompilation` directives. These are redundant and slow down a test that already often times out. @eme64 also suggested adding a `compileonly` directive to one of the four tests.
>> 
>> Testing:
>>  - [x] Github Actions
>>  - [x] tier1 and stress testing (features `TestAlignVectorFuzzer.java`)
>
> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8366878-align-fuzz-flags
>  - Make compileonly a separate run
>  - Fix flags

Looks good to me.

-------------

Marked as reviewed by mchevalier (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27122#pullrequestreview-3252043642

From mhaessig at openjdk.org  Mon Sep 22 11:28:10 2025
From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=)
Date: Mon, 22 Sep 2025 11:28:10 GMT
Subject: Integrated: 8366878: Improve flags of
 compiler/loopopts/superword/TestAlignVectorFuzzer.java
In-Reply-To: <jqOL72bER1G6aunV15U5OE4iUltIiv9dfhdJrvkyy3k=.7697a330-3de2-4591-96e3-70dc43e7139a@github.com>
References: <jqOL72bER1G6aunV15U5OE4iUltIiv9dfhdJrvkyy3k=.7697a330-3de2-4591-96e3-70dc43e7139a@github.com>
Message-ID: <vMdyq0Xg1uYAx8L_Sz5ISIeN7-dlZsAKK2Bb7trzKYU=.63197370-4bc3-4e80-ad1f-72f2444c1290@github.com>

On Fri, 5 Sep 2025 16:46:09 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

> The test definitions of `TestAlignVectorFuzzer.java` all contain `printcompilation` directives. These are redundant and slow down a test that already often times out. @eme64 also suggested adding a `compileonly` directive to one of the four tests.
> 
> Testing:
>  - [x] Github Actions
>  - [x] tier1 and stress testing (features `TestAlignVectorFuzzer.java`)

This pull request has now been integrated.

Changeset: 0ba4141c
Author:    Manuel Hässig <mhaessig at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/0ba4141cb12414c08be88b37ea2a163aacbfa7de
Stats:     17 lines in 1 file changed: 12 ins; 3 del; 2 mod

8366878: Improve flags of compiler/loopopts/superword/TestAlignVectorFuzzer.java

Co-authored-by: Emanuel Peter <epeter at openjdk.org>
Reviewed-by: epeter, mchevalier

-------------

PR: https://git.openjdk.org/jdk/pull/27122

From mhaessig at openjdk.org  Mon Sep 22 11:28:08 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Mon, 22 Sep 2025 11:28:08 GMT
Subject: RFR: 8366878: Improve flags of
 compiler/loopopts/superword/TestAlignVectorFuzzer.java [v2]
In-Reply-To: <bEv5hn6ZblGLpIUUvTJvxCFBYiJv9GsbsW2FdZv5Zuc=.c73d9d88-eb61-42f0-a9f5-b61b214744f5@github.com>
References: <jqOL72bER1G6aunV15U5OE4iUltIiv9dfhdJrvkyy3k=.7697a330-3de2-4591-96e3-70dc43e7139a@github.com>
 <mJBCJpEMT_Yl1s5b8M2Yu6gJo18FZepU9Pj1zqUqZBU=.07df1c29-c004-437f-b611-9a71299aafd7@github.com>
 <bEv5hn6ZblGLpIUUvTJvxCFBYiJv9GsbsW2FdZv5Zuc=.c73d9d88-eb61-42f0-a9f5-b61b214744f5@github.com>
Message-ID: <LBmigbbDy5UkxgxZ57VJeIdJ5flHjAtXM-7M40wB5_Y=.64dc8bc7-7792-4221-a7b2-4a35ebd9f98f@github.com>

On Wed, 17 Sep 2025 05:58:45 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>> 
>>  - Merge branch 'master' into JDK-8366878-align-fuzz-flags
>>  - Make compileonly a separate run
>>  - Fix flags
>
> Looks good :)

Thank you for your reviews, @eme64 and @marc-chevalier!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27122#issuecomment-3318379337

From roland at openjdk.org  Mon Sep 22 12:07:34 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 22 Sep 2025 12:07:34 GMT
Subject: RFR: 8327963: C2: fix construction of memory graph around
 Initialize node to prevent incorrect execution if allocation is removed [v13]
In-Reply-To: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
Message-ID: <r0pgsB3vqLwVHac-RqJ2NWRJ7iTa6_JXMuG9csYPQ30=.d58b5a68-b045-40b1-9fd1-e59361df359b@github.com>

> An `Initialize` node for an `Allocate` node is created with a memory
> `Proj` of adr type raw memory. In order for stores to be captured, the
> memory state out of the allocation is a `MergeMem` with slices for the
> various object fields/array element set to the raw memory `Proj` of
> the `Initialize` node. If `Phi`s need to be created during later
> transformations from this memory state, the `Phi` for a particular
> slice gets its adr type from the type of the `Proj` which is raw
> memory. If during macro expansion, the `Allocate` is found to have no
> use and so can be removed, the `Proj` out of the `Initialize` is
> replaced by the memory state on input to the `Allocate`. A `Phi` for
> some slice for a field of an object will end up with the raw memory
> state on input to the `Allocate` node. As a result, memory state at
> the `Phi` is incorrect and incorrect execution can happen.
> 
> The fix I propose is, rather than have a single `Proj` for the memory
> state out of the `Initialize` with adr type raw memory, to use one
> `Proj` per slice added to the memory state after the `Initialize`. Each
> of the `Proj` should return the right adr type for its slice. For that
> I propose having a new type of `Proj`: `NarrowMemProj` that captures
> the right adr type.
> 
> Logic for the construction of the `Allocate`/`Initialize` subgraph is
> tweaked so the right adr type, captured in its own `NarrowMemProj`, is
> added to the memory subgraph. Code that removes an allocation or moves
> it also has to be changed so it correctly takes the multiple memory
> projections out of the `Initialize` node into account.
> 
> One tricky issue is that when EA splits unique types for a scalar replaceable
> `Allocate` node:
> 
> 1- the adr type captured in the `NarrowMemProj` becomes out of sync
>   with the type of the slices for the allocation
>   
> 2- before EA, the memory state for one particular field out of the
>   `Initialize` node can be used for a `Store` to the just allocated
>   object or some other. So we can have a chain of `Store`s, some to
>   the newly allocated object, some to some other objects, all of them
>   using the state of `NarrowMemProj` out of the `Initialize`. After
>   split unique types, the `NarrowMemProj` is for the slice of a
>   particular allocation. So `Store`s to some other objects shouldn't
>   use that memory state but the memory state before the `Allocate`.
>   
> For that, I added logic to update the adr type of `NarrowMemProj`
> during split unique types and update the memory input of `Store`s that
> don't depend on the memory state ...

Roland Westrelin has updated the pull request incrementally with seven additional commits since the last revision:

 - Update src/hotspot/share/opto/macro.cpp
   
   Co-authored-by: Roberto Castañeda Lozano <robcasloz at users.noreply.github.com>
 - Update src/hotspot/share/opto/macro.cpp
   
   Co-authored-by: Roberto Castañeda Lozano <robcasloz at users.noreply.github.com>
 - Update src/hotspot/share/opto/graphKit.cpp
   
   Co-authored-by: Roberto Castañeda Lozano <robcasloz at users.noreply.github.com>
 - Update src/hotspot/share/opto/graphKit.cpp
   
   Co-authored-by: Roberto Castañeda Lozano <robcasloz at users.noreply.github.com>
 - Update src/hotspot/share/opto/multnode.hpp
   
   Co-authored-by: Roberto Castañeda Lozano <robcasloz at users.noreply.github.com>
 - Update src/hotspot/share/opto/multnode.hpp
   
   Co-authored-by: Roberto Casta?eda Lozano <robcasloz at users.noreply.github.com>
 - Update test/hotspot/jtreg/compiler/macronodes/TestEliminationOfAllocationWithoutUse.java
   
   Co-authored-by: Roberto Casta?eda Lozano <robcasloz at users.noreply.github.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24570/files
  - new: https://git.openjdk.org/jdk/pull/24570/files/b701d03e..6ea8c811

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=12
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=11-12

  Stats: 7 lines in 4 files changed: 0 ins; 0 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/24570.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24570/head:pull/24570

PR: https://git.openjdk.org/jdk/pull/24570

From mli at openjdk.org  Mon Sep 22 12:20:36 2025
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 22 Sep 2025 12:20:36 GMT
Subject: RFR: 8368247: RISC-V: enable vectorapi test for expand operation
In-Reply-To: <nWwtWid-xgAmIRSJnFvIPs-R5-kwVNtDMi3Fv9onfkY=.1cad59db-459d-46f3-b495-97c3bbe1fa3a@github.com>
References: <nWwtWid-xgAmIRSJnFvIPs-R5-kwVNtDMi3Fv9onfkY=.1cad59db-459d-46f3-b495-97c3bbe1fa3a@github.com>
Message-ID: <BYrmV_tbSgVLiFcCIpayun5UXCj-tsSK6LMoxG6Pc4c=.d1b756c2-d37c-45b8-9248-1835abf4b00c@github.com>

On Mon, 22 Sep 2025 09:09:05 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help to review this patch? Thanks!
> 
> [JDK-8363989](https://bugs.openjdk.org/browse/JDK-8363989) adds a Vector API test for the expand operation, which we can also enable on RISC-V.
> 
> ### Test (fastdebug)
> - [x] Run compiler/vectorapi/VectorExpandTest.java on k1 and sg2042

Looks good. Thanks!
I saw that https://github.com/openjdk/jdk/pull/26740 added `EXPAND_VX` in IRNode.java. Does this mean we did not test ExpandV when it was implemented?

-------------

Marked as reviewed by mli (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27420#pullrequestreview-3252354644

From mli at openjdk.org  Mon Sep 22 12:28:51 2025
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 22 Sep 2025 12:28:51 GMT
Subject: RFR: 8368206: RISC-V:
 compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without
 RVV
In-Reply-To: <zeLrBLLs95EZvueIFs073vLvuX2LSn6XX51lOZtxh64=.8ed10e1e-08df-49d3-b987-48349f07eeb0@github.com>
References: <zeLrBLLs95EZvueIFs073vLvuX2LSn6XX51lOZtxh64=.8ed10e1e-08df-49d3-b987-48349f07eeb0@github.com>
Message-ID: <Vg8YsRztlY_-bhe05rpBS39jcLb9vHAGG4kXibYMa7M=.3281cc09-e70f-4cd2-9bbf-698329486546@github.com>

On Mon, 22 Sep 2025 03:16:29 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help to review this patch? Thanks!
> 
> We noticed that compiler/vectorapi/VectorMaskCompareNotTest.java fails when running on sg2042.
> On RISC-V without RVV, ofLargestShape(long.class) falls back to 64 bits (see getMaxVectorBitSize in VectorShape.java),
> leading to VectorShape.forBitSize(32), which is unsupported and throws IllegalArgumentException.
> 
> ### Test (fastdebug)
> - [x] Run compiler/vectorapi/VectorMaskCompareNotTest.java on sg2042

Hey, I'm wondering whether all the tests under hotspot/jtreg/compiler/vectorapi should require RVV via `@requires`.
Otherwise, it seems they are not really testing anything useful?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27414#issuecomment-3318704172

From bmaillard at openjdk.org  Mon Sep 22 13:26:51 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Mon, 22 Sep 2025 13:26:51 GMT
Subject: RFR: 8367967: C2: "fatal error: Not monotonic" with Mod nodes
In-Reply-To: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
References: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
Message-ID: <H_fX6kXWVfcTAcqzz6vID1if2ytB9MlSIiGze3y3JDw=.f0a3718c-6e00-49c1-963a-14519ae3262e@github.com>

On Sun, 21 Sep 2025 06:11:11 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

> Generally, we shouldn't return a wider type (ZERO) if there is a later case that would return a narrower type (TOP) for the same input types. If the inputs are widened and the first case no longer matches but the later one still does, the result is not monotonic with respect to the previous result.
> 
> Please review :)
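To see why the order of the checks matters, here is a hypothetical toy model (not C2's `TypeInt` or `ModINode::Value`) in which a type is either an integer range or TOP, the empty set of values and therefore the narrowest element. The check order in `mod_value_buggy` is an assumption chosen to mirror the description: a ZERO case tested before a TOP case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy lattice, not C2's TypeInt: either TOP (no possible
// value, the narrowest element) or an integer range [lo, hi].
struct Range {
  bool top;
  long long lo, hi;
  static Range make_top() { return {true, 0, 0}; }
  static Range con(long long v) { return {false, v, v}; }
  static Range of(long long l, long long h) { return {false, l, h}; }
  bool is_con(long long v) const { return !top && lo == v && hi == v; }
  // True if every value of *this is also a value of `other`.
  bool narrower_or_equal(const Range& other) const {
    if (top) return true;
    if (other.top) return false;
    return other.lo <= lo && hi <= other.hi;
  }
};

const Range kInt = Range::of(INT32_MIN, INT32_MAX);

// Wider result (ZERO) checked before the narrower one (TOP): not monotonic.
Range mod_value_buggy(const Range& a, const Range& b) {
  if (a.top || b.top) return Range::make_top();
  if (a.is_con(0)) return Range::con(0);      // 0 % x == 0
  if (b.is_con(0)) return Range::make_top();  // x % 0 always throws
  return kInt;
}

// Narrower result (TOP) checked first.
Range mod_value_fixed(const Range& a, const Range& b) {
  if (a.top || b.top) return Range::make_top();
  if (b.is_con(0)) return Range::make_top();
  if (a.is_con(0)) return Range::con(0);
  return kInt;
}

// Checks monotonicity for one pair: (a, b) is narrower than (a_wide, b_wide).
template <typename ValueFn>
bool monotonic_on(ValueFn f, Range a, Range b, Range a_wide, Range b_wide) {
  return f(a, b).narrower_or_equal(f(a_wide, b_wide));
}
```

With inputs `(0, 0)` the buggy version returns ZERO, but after widening the dividend to `[0, 5]` it returns TOP, which is narrower, so the result shrank while the inputs grew.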

Looks good to me, I only have one minor comment.

test/hotspot/jtreg/compiler/c2/TestModValueMonotonic.java line 28:

> 26:  * @bug 8367967
> 27:  * @summary Ensure ModI/LNode::Value is monotonic with potential division by 0
> 28:  * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:CompileOnly=compiler.c2.TestModValueMonotonic::test*

You could probably add another `@run main ...` without flags, to potentially catch other issues in the future.

-------------

PR Review: https://git.openjdk.org/jdk/pull/27408#pullrequestreview-3252765581
PR Review Comment: https://git.openjdk.org/jdk/pull/27408#discussion_r2368382691

From roland at openjdk.org  Mon Sep 22 13:29:27 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 22 Sep 2025 13:29:27 GMT
Subject: RFR: 8327963: C2: fix construction of memory graph around
 Initialize node to prevent incorrect execution if allocation is removed [v12]
In-Reply-To: <Q5oX9YdsBPrpdbXle-aox01GBSZGDZXRrIvWaN4r2zs=.49445183-c538-4ea2-bfdb-964c4898137f@github.com>
References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
 <S83l4ZBZILBlhRpSoVORBxzNNB9bF5xN-QhRWc16_T4=.6d5c1624-5d77-4b3e-9db0-0e01e6b1b36b@github.com>
 <Q5oX9YdsBPrpdbXle-aox01GBSZGDZXRrIvWaN4r2zs=.49445183-c538-4ea2-bfdb-964c4898137f@github.com>
Message-ID: <G2Wc1K9r04C8Etypi5QVNMPeIMIbEbcRM8X92EhbQEI=.746fa8d9-179d-4ba8-88c8-73e7d119926e@github.com>

On Fri, 19 Sep 2025 13:02:43 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 45 commits:
>> 
>>  - more
>>  - Merge branch 'master' into JDK-8327963
>>  - more
>>  - more
>>  - Merge branch 'master' into JDK-8327963
>>  - more
>>  - more
>>  - lambda return
>>  - lambda clean up
>>  - Merge branch 'master' into JDK-8327963
>>  - ... and 35 more: https://git.openjdk.org/jdk/compare/e16c5100...b701d03e
>
> test/hotspot/jtreg/compiler/escapeAnalysis/TestIterativeEA.java line 53:
> 
>> 51:     analyzer.shouldContain("++++ Eliminated: 26 Allocate");
>> 52:     analyzer.shouldContain("++++ Eliminated: 51 Allocate");
>> 53:     analyzer.shouldContain("++++ Eliminated: 84 Allocate");
> 
> Did you analyze why there are more allocations removed than before in this test case? I did not expect this changeset to have an effect on the number of removed allocations.

No additional allocations are removed; the message is confusing.
"Eliminated: 84 Allocate" logs that node number 84 was eliminated (not 84 nodes).
This patch changes the number of nodes needed at allocations, so it also affects node numbering.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2368395832

From roland at openjdk.org  Mon Sep 22 13:38:02 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 22 Sep 2025 13:38:02 GMT
Subject: RFR: 8327963: C2: fix construction of memory graph around
 Initialize node to prevent incorrect execution if allocation is removed [v12]
In-Reply-To: <Q5oX9YdsBPrpdbXle-aox01GBSZGDZXRrIvWaN4r2zs=.49445183-c538-4ea2-bfdb-964c4898137f@github.com>
References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
 <S83l4ZBZILBlhRpSoVORBxzNNB9bF5xN-QhRWc16_T4=.6d5c1624-5d77-4b3e-9db0-0e01e6b1b36b@github.com>
 <Q5oX9YdsBPrpdbXle-aox01GBSZGDZXRrIvWaN4r2zs=.49445183-c538-4ea2-bfdb-964c4898137f@github.com>
Message-ID: <w17qVoqlDsyaXEHj9cmgpZrpTF8DTUQd4Y6GAyO9c8o=.5f185e45-e654-4a8e-8fcd-cf12a794525c@github.com>

On Fri, 19 Sep 2025 12:41:06 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 45 commits:
>> 
>>  - more
>>  - Merge branch 'master' into JDK-8327963
>>  - more
>>  - more
>>  - Merge branch 'master' into JDK-8327963
>>  - more
>>  - more
>>  - lambda return
>>  - lambda clean up
>>  - Merge branch 'master' into JDK-8327963
>>  - ... and 35 more: https://git.openjdk.org/jdk/compare/e16c5100...b701d03e
>
> src/hotspot/share/opto/multnode.cpp line 73:
> 
>> 71:   };
>> 72:   return apply_to_projs(filter, which_proj);
>> 73: }
> 
> Consider moving this implementation to `multnode.hpp`, perhaps next to that of `MultiNode::apply_to_projs(DUIterator_Fast& imax, DUIterator_Fast& i, Callback callback, uint which_proj)`, for consistency.

Isn't it better practice to leave the implementation in the cpp file? That's not always possible because of templates, so some of the related methods are implemented in the hpp file, but wouldn't we want to keep that to a minimum?
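The template constraint mentioned here can be illustrated with a hypothetical single-file sketch (`ToyProj` and `ToyMultiNode` are invented stand-ins, not HotSpot's `ProjNode`/`MultiNode`): the callback-taking template must be defined where callers can see it, while a fixed-signature wrapper that forwards a lambda only needs its declaration in the header.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins, not HotSpot's MultiNode/ProjNode.
struct ToyProj {
  unsigned con;  // projection number
  int id;
};

struct ToyMultiNode {
  std::vector<ToyProj> outs;

  // Template member: the callback type differs at every call site, so the
  // body must be visible where it is instantiated, i.e. in the header.
  template <typename Callback>
  void apply_to_projs(Callback callback, unsigned which_proj) const {
    for (const ToyProj& p : outs) {
      if (p.con == which_proj) callback(p);
    }
  }

  // Non-template member: fixed signature, so only the declaration needs to
  // be in the header and the definition can stay in the .cpp file.
  int count_projs(unsigned which_proj) const;
};

// In a real header/source split, this definition would live in the .cpp file.
int ToyMultiNode::count_projs(unsigned which_proj) const {
  int n = 0;
  apply_to_projs([&](const ToyProj&) { n++; }, which_proj);
  return n;
}
```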

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2368422480

From roland at openjdk.org  Mon Sep 22 13:37:55 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 22 Sep 2025 13:37:55 GMT
Subject: RFR: 8327963: C2: fix construction of memory graph around
 Initialize node to prevent incorrect execution if allocation is removed [v8]
In-Reply-To: <ZChc05Qt2p92YdfYKDubkDBnkvFqv3ETpjXRVyxKhnQ=.24861051-f0b7-4ba6-960d-92a5cf9ecf9a@github.com>
References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
 <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com>
 <phFEV6ecal3bMYgAt85dr5f6UKm024p2Ssw2l5zDvOQ=.c332a12d-5009-4e99-abc4-e0d58f06a075@github.com>
 <JczlkGMI1ugc2011v3_yecnmAihjcv5YYyixFtvZjvk=.3994dece-26bc-4c73-9850-8f63986b6fc7@github.com>
 <2m1_XtiSsW_LaBRrkX4qv7AKtLOjNgnl4mUp3zisasE=.dda62164-7aa0-4c1a-b83f-fa40ba7902e5@github.com>
 <eMGWpjjtAvxGzXXgDpfqUyz-LHobPg5dEAk99yQYhic=.81804900-b4ae-4b71-9a39-893fa7b6d36c@github.com>
 <LeeKE7VBNvxxD8-1ltyf2CGltyUV90y-ZabbxGVYXZc=.79192936-6954-4b74-a4ec-ead162efe4e2@github.com>
 <4374L3lkQK90wLxxOA7POBmIKNX2DFK-4pO4vj1bkuQ=.5b8d7825-a7f1-497f-ab66-02a85a266659@github.com>
 <QtsENUXeRsA140liru9rjk0KDbNVhKj6qPVU8toDlkI=.4b9eadfe-045e-4bae-a2c8-40c04496cb60@github.com>
 <BrNHUWgnhDZWz523gq_a8Smxck7UE0r0gBLQHfydrXk=.d96048bf-497e-426d-bdab-b58e63b1e5c6@github.com>
 <hGGgYXj4IJCGws1HtyYZjSjpi88IemdVUxZO1HaVDdc=.9ee892d7-09ec-4752-a4ad-385ff209c5c0@github.com>
 <ZChc05Qt2p92YdfYKDubkDBnkvFqv3ETpjXRVyxKhnQ=.24861051-f0b7-4ba6-960d-92a5cf9ecf9a@github.com>
Message-ID: <qOFPKmn1w0ERjCLrlall5UyPyC6cIjWgw3vswQ-nCxI=.d20f3729-4fdb-4c0a-9ff6-32b88500eccd@github.com>

On Thu, 11 Sep 2025 07:48:10 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>>> @rose00 @robcasloz I updated the change with a new way to avoid redundant projections. At matching time, before a `NarrowMemProj` is matched into a `MachProj`, new logic checks whether a `MachProj` already exists. That guarantees that no redundant `MachProj` are ever added. It also performs the new normalization at a major cut-point. What do you think?
>> 
>> That sounds good to me, thank you for enforcing this Roland! I will re-run testing and have a new look at the changeset within the next days.
>
>> That sounds good to me, thank you for enforcing this Roland! I will re-run testing and have a new look at the changeset within the next days.
> 
> Test results of b701d03ed335286587c4d2539dde715b091d30bd on top of jdk-26+14 look good. Will have a look at the code within the next days.

@robcasloz thanks for the review. New commit addresses most of your comments.
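The deduplication at matching time described above amounts to a get-or-create lookup. This is a hypothetical toy (`ToyMachProj` and `ToyProjMatcher` are invented, not C2's `Matcher`), and it assumes two ideal projections are redundant when they share a projection number, which simplifies whatever key the actual change uses.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>

// Hypothetical toy model, not C2's Matcher: a machine-level projection
// identified only by its projection number.
struct ToyMachProj {
  unsigned con;
};

class ToyProjMatcher {
 public:
  // Before creating a machine projection for `con`, check whether one
  // already exists and reuse it, so no redundant projections are produced.
  ToyMachProj* match_proj(unsigned con) {
    auto it = _existing.find(con);
    if (it != _existing.end()) return it->second.get();
    auto proj = std::make_unique<ToyMachProj>(ToyMachProj{con});
    ToyMachProj* result = proj.get();
    _existing.emplace(con, std::move(proj));
    return result;
  }

  std::size_t count() const { return _existing.size(); }

 private:
  std::map<unsigned, std::unique_ptr<ToyMachProj>> _existing;
};
```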

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-3319071258

From roland at openjdk.org  Mon Sep 22 13:37:54 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 22 Sep 2025 13:37:54 GMT
Subject: RFR: 8327963: C2: fix construction of memory graph around
 Initialize node to prevent incorrect execution if allocation is removed [v14]
In-Reply-To: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
Message-ID: <zXDTE10_qJh9J34Y8g-rfTqHPKi17Afgs2aWRW382DY=.170b9116-fec3-4b69-b934-6e30400b5c17@github.com>

> An `Initialize` node for an `Allocate` node is created with a memory
> `Proj` of adr type raw memory. In order for stores to be captured, the
> memory state out of the allocation is a `MergeMem` with slices for the
> various object fields/array elements set to the raw memory `Proj` of
> the `Initialize` node. If `Phi`s need to be created from this memory
> state during later transformations, the `Phi` for a particular slice
> gets its adr type from the type of the `Proj`, which is raw memory. If,
> during macro expansion, the `Allocate` is found to have no use and so
> can be removed, the `Proj` out of the `Initialize` is replaced by the
> memory state on input to the `Allocate`. A `Phi` for the slice of an
> object field then ends up with the raw memory state on input to the
> `Allocate` node. As a result, the memory state at the `Phi` is
> incorrect, which can lead to incorrect execution.
> 
> The fix I propose is to use one `Proj` per slice added to the memory
> state after the `Initialize`, rather than a single `Proj` with adr type
> raw memory. Each `Proj` should return the right adr type for its
> slice. For that, I propose a new type of `Proj`, `NarrowMemProj`, that
> captures the right adr type.
> 
> Logic for the construction of the `Allocate`/`Initialize` subgraph is
> tweaked so the right adr type, captured in its own `NarrowMemProj`, is
> added to the memory subgraph. Code that removes an allocation or moves
> it also has to be changed so it correctly takes the multiple memory
> projections out of the `Initialize` node into account.
> 
> One tricky issue is that when EA splits types for a scalar replaceable
> `Allocate` node:
> 
> 1- the adr type captured in the `NarrowMemProj` becomes out of sync
>   with the type of the slices for the allocation
>   
> 2- before EA, the memory state for one particular field out of the
>   `Initialize` node can be used for a `Store` to the just-allocated
>   object or to some other object. So we can have a chain of `Store`s,
>   some to the newly allocated object, some to other objects, all of
>   them using the state of the `NarrowMemProj` out of the `Initialize`.
>   After split unique types, the `NarrowMemProj` is for the slice of a
>   particular allocation, so `Store`s to other objects shouldn't use
>   that memory state but the memory state before the `Allocate`.
>   
> For that, I added logic to update the adr type of `NarrowMemProj`
> during split unique types and update the memory input of `Store`s that
> don't depend on the memory state ...
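The difference between the two graph shapes can be sketched with a hypothetical toy model in which a memory state is a map from alias class to the node producing that slice (all names are invented; this is not C2's `MergeMem` API). Removing the allocation substitutes the producer of each slice: with a single raw-memory `Proj`, a field slice falls back to raw memory, while per-slice `NarrowMemProj`s let each slice pick up its own input state.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy model, not C2's MergeMem API: a memory state maps an
// alias class ("raw", "A.f", ...) to the name of the node producing it.
using MemState = std::map<std::string, std::string>;

// Old shape: every slice after the Initialize is produced by one raw-memory
// Proj. Removing the allocation replaces that Proj with the raw input state,
// so field slices end up wired to raw memory.
MemState remove_alloc_with_raw_proj(const MemState& before_alloc,
                                    const MemState& after_alloc) {
  MemState result = after_alloc;
  for (auto& [slice, producer] : result) {
    if (producer == "Proj(raw)") producer = before_alloc.at("raw");
  }
  return result;
}

// Proposed shape: one NarrowMemProj per slice, each recording its own adr
// type, so removal substitutes the matching slice of the input state.
MemState remove_alloc_with_narrow_projs(const MemState& before_alloc,
                                        const MemState& after_alloc) {
  MemState result = after_alloc;
  for (auto& [slice, producer] : result) {
    if (producer == "NarrowMemProj(" + slice + ")") producer = before_alloc.at(slice);
  }
  return result;
}
```

The sketch needs C++17 for the structured bindings.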

Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:

  review

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24570/files
  - new: https://git.openjdk.org/jdk/pull/24570/files/6ea8c811..9fd8dc1c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=13
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=12-13

  Stats: 42 lines in 10 files changed: 10 ins; 21 del; 11 mod
  Patch: https://git.openjdk.org/jdk/pull/24570.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24570/head:pull/24570

PR: https://git.openjdk.org/jdk/pull/24570

From jsjolen at openjdk.org  Mon Sep 22 14:37:56 2025
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Mon, 22 Sep 2025 14:37:56 GMT
Subject: RFR: 8349835: C2: simplify IGV property printing
In-Reply-To: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com>
References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com>
Message-ID: <3p7tjcioIRB9sg923yLA2OGUCvAJ4XweNgQSzLFF4sw=.7dc31408-4528-4d54-b2eb-4591aa23dcc0@github.com>

On Fri, 22 Aug 2025 13:28:22 GMT, Saranya Natarajan <snatarajan at openjdk.org> wrote:

> The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708).
> 
> ### Fix 
> Implemented the suggested refactoring. 
> 
> ### Testing 
> Github Actions, Tier 1-3

Hi,

I'd change the data structure to use a tagged union with separate constructors, to make the struct easier to understand.

src/hotspot/share/opto/idealGraphPrinter.cpp line 534:

> 532:         {((flags & Node::Flag_has_call) != 0), "has_call", "true"},
> 533:         {((flags & Node::Flag_has_swapped_edges) != 0), "has_swapped_edges", "true"}
> 534:       };

This code desperately needs a utility to check the corresponding bits in `flags` :-).
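A minimal sketch of such a utility, assuming a table-driven layout; the flag values and the names `has_flag`, `FlagProperty` and `set_flag_properties` are made up for illustration and are not HotSpot's.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical bit values standing in for Node::Flag_has_call and
// Node::Flag_has_swapped_edges (the real values differ).
enum : std::uint32_t {
  Flag_has_call          = 1u << 0,
  Flag_has_swapped_edges = 1u << 1,
};

// Hypothetical helper replacing repeated `((flags & X) != 0)` expressions.
constexpr bool has_flag(std::uint32_t flags, std::uint32_t mask) {
  return (flags & mask) != 0;
}

struct FlagProperty {
  std::uint32_t mask;
  const char* name;
};

// Table-driven property collection: each row names its flag once, and the
// bit test is written a single time in the loop.
std::vector<std::string> set_flag_properties(std::uint32_t flags) {
  static const FlagProperty table[] = {
    {Flag_has_call, "has_call"},
    {Flag_has_swapped_edges, "has_swapped_edges"},
  };
  std::vector<std::string> names;
  for (const FlagProperty& p : table) {
    if (has_flag(flags, p.mask)) names.push_back(p.name);
  }
  return names;
}
```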

src/hotspot/share/opto/idealGraphPrinter.hpp line 123:

> 121:     const char *_svalue = nullptr;
> 122:     int _ivalue = -1;
> 123:   };

Get rid of optionals and explicit condition checking by introducing a tagged union.

```c++
struct IdealGraphPrintRecord {
  enum class State {
    False, String, Integer
  };
  State _state;
  const char* _name;
  union {
    const char* _svalue;
    int _ivalue;
  };
  IdealGraphPrintRecord(bool cond, const char* name, int value)
  : _state( cond ? State::Integer : State::False ), _name(name), _ivalue(value) {}
  IdealGraphPrintRecord(bool cond, const char* name, const char* value)
  : _state( cond ? State::String : State::False ), _name(name), _svalue(value) {}

  bool has_string() { return _state == State::String; }
  bool has_int() { return _state == State::Integer; }
  const char* string_value() { return _svalue; }
  const char* key() { return _name; }
  int int_value() { return _ivalue; }
};
```

-------------

PR Review: https://git.openjdk.org/jdk/pull/26902#pullrequestreview-3253147647
PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2368702964
PR Review Comment: https://git.openjdk.org/jdk/pull/26902#discussion_r2368689760

From bmaillard at openjdk.org  Mon Sep 22 15:26:06 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Mon, 22 Sep 2025 15:26:06 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop
In-Reply-To: <6G3nK9S5d9u3_esVm6W6hXK3QTDEzTscMlmWPmtp4yU=.21c1289b-83a5-485e-83ad-b30646dfbb89@github.com>
References: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
 <6G3nK9S5d9u3_esVm6W6hXK3QTDEzTscMlmWPmtp4yU=.21c1289b-83a5-485e-83ad-b30646dfbb89@github.com>
Message-ID: <4ueY5v0Oe3KJPCXMjjXoJN9QxVvbego84EgL1Zt42mw=.81431be0-b959-4d4d-a5bc-0d1e73080ec3@github.com>

On Tue, 16 Sep 2025 08:53:46 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`.
>> 
>> ### Context
>> 
>> The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn, I was able to reduce the reproducer to something very simple. After the following method was compiled by C2, executing it caused the last statement (`x = 0`) to be ignored:
>> 
>> 
>>     static public void test() {
>>         x = 0;
>>         for (int i = 0; i < 20000; i++) {
>>             x += i;
>>         }
>>         x = 0;
>>     }
>> 
>> 
>> After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long-running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. `Store` nodes are moved out of the inner loop when encountering this pattern, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations taking place later. This was initially addressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here, as macro expansion happens too late.
>> 
>> This PR aims to address the specific case of the wrong wiring of `Store` nodes in _post_ loops, but in the longer term further investigation into the missing `Phi` node issue is necessary, as it is likely to cause other issues (cf. related JBS issues).
>> 
>> ### Detailed Analysis
>> 
>> In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`.  When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later.
>> 
>> This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect.
>> 
>> This is wh...
>
> src/hotspot/share/opto/loopTransform.cpp line 1797:
> 
>> 1795:         Node* mem_out = find_mem_out_outer_strip_mined(store, outer_loop);
>> 1796:         Node* store_new = old_new[store->_idx];
>> 1797:         store_new->set_req(MemNode::Memory, mem_out);
> 
> Could it be that there are multiple stores in a chain after the loop exit and before the SafePoint?
> 
> Loop
> Exit
> store1
> store2
> store3
> SafePoint
> 
> If so, they all have the same control, namely at the `if_false`.
> Their memory state should be ordered, where store2 depends on store1 and store3 on store2. Only store1 should then really have its memory input updated.
> 
> Your code now finds the `store_new` for each of store1, store2 and store3, and sets all of their memory inputs to `mem_out`. But that means that the "new" stores all have the same memory input, and are not in a chain any more. Did I see this right? Is that ok?

Yes, this can happen. This is actually what we test with the last test case (`test3`), and this is why we have the following:
```c++
// We don't make changes if the memory input is in the loop body as well
if (store && !outer_loop->is_member(get_loop(get_ctrl(store->in(MemNode::Memory))))) {
```

In this case, the condition is only true for `store1` (as its memory input would be the last memory operation before the loop, or the memory `Parm`), but not for `store2` or `store3`. We would only end up rewiring `store1`, and leave `store2` and `store3` as they are.
Does that make sense?
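The effect of that condition on a chain of three stores can be modeled with a hypothetical toy (`ToyNode` and `stores_to_rewire` are invented, not C2 code), where each node only records whether its control is inside the outer strip-mined loop and which node is its memory input.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model, not C2 code.
struct ToyNode {
  std::string name;
  bool in_outer_loop;  // is this node's control inside the outer strip-mined loop?
  ToyNode* mem_in;     // memory input (nullptr for the entry memory state)
};

// Only stores whose memory input lies outside the outer loop are rewired in
// the post loop. A store fed by another store in the loop keeps its input,
// since that input is cloned together with it.
std::vector<std::string> stores_to_rewire(const std::vector<ToyNode*>& stores) {
  std::vector<std::string> rewired;
  for (const ToyNode* st : stores) {
    if (st->mem_in != nullptr && !st->mem_in->in_outer_loop) {
      rewired.push_back(st->name);
    }
  }
  return rewired;
}
```

For the chain `store1 -> store2 -> store3`, only `store1` is selected, which keeps the cloned stores chained.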

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2368979720

From kxu at openjdk.org  Mon Sep 22 15:41:40 2025
From: kxu at openjdk.org (Kangcheng Xu)
Date: Mon, 22 Sep 2025 15:41:40 GMT
Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v9]
In-Reply-To: <RYYMa4Btd93FIC9tCnYMCX7HvuP4D-ODaICLXmjmKic=.5892c08b-92e9-4edd-b37f-cc13e90b469e@github.com>
References: <RYYMa4Btd93FIC9tCnYMCX7HvuP4D-ODaICLXmjmKic=.5892c08b-92e9-4edd-b37f-cc13e90b469e@github.com>
Message-ID: <E00bbRL9EN-UnlshSe7Da9KSJU8t7rzdrUbK4dSSenU=.fa58a8cd-ea01-4dd7-b73e-acceb5f012bc@github.com>

> This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. 
> 
> A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think.
> 
> Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759).
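The detect-then-convert split can be illustrated with a hypothetical toy (`ToyLoop`, `ToyCountedLoopConverter` and `convert_first_counted` are invented names; the real class works on C2's ideal graph and checks far more conditions). Because detection does not mutate anything, a driver can probe several candidate shapes and convert only the first one that qualifies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy loop shape, not the PR's code.
struct ToyLoop {
  int stride;
  bool has_exit_test;
  bool converted = false;
};

class ToyCountedLoopConverter {
 public:
  explicit ToyCountedLoopConverter(ToyLoop& loop) : _loop(loop) {}

  // Detection: inspects the loop but never mutates it.
  bool is_counted_loop() const { return _loop.has_exit_test && _loop.stride != 0; }

  // Conversion: only legal after detection succeeded.
  void convert() {
    assert(is_counted_loop());
    _loop.converted = true;
  }

 private:
  ToyLoop& _loop;
};

// Try candidates in order and convert only the first counted one.
// Returns its index, or -1 if none qualifies.
int convert_first_counted(std::vector<ToyLoop>& candidates) {
  for (std::size_t i = 0; i < candidates.size(); i++) {
    ToyCountedLoopConverter conv(candidates[i]);
    if (conv.is_counted_loop()) {
      conv.convert();
      return static_cast<int>(i);
    }
  }
  return -1;
}
```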

Kangcheng Xu has updated the pull request incrementally with two additional commits since the last revision:

 - WIP: refactor structs to classes
 - WIP: removed dead code, renamed fields and signatures

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24458/files
  - new: https://git.openjdk.org/jdk/pull/24458/files/763adeda..5fd98f48

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=07-08

  Stats: 564 lines in 3 files changed: 275 ins; 187 del; 102 mod
  Patch: https://git.openjdk.org/jdk/pull/24458.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458

PR: https://git.openjdk.org/jdk/pull/24458

From bmaillard at openjdk.org  Mon Sep 22 15:44:33 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Mon, 22 Sep 2025 15:44:33 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop
In-Reply-To: <6G3nK9S5d9u3_esVm6W6hXK3QTDEzTscMlmWPmtp4yU=.21c1289b-83a5-485e-83ad-b30646dfbb89@github.com>
References: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
 <6G3nK9S5d9u3_esVm6W6hXK3QTDEzTscMlmWPmtp4yU=.21c1289b-83a5-485e-83ad-b30646dfbb89@github.com>
Message-ID: <_0kPqH6Tw8sVUGLF70gvPpZouDAuMT5gzBrLuSaO5NY=.93189aa3-a110-442e-b952-b1e5ffafe70e@github.com>

On Tue, 16 Sep 2025 08:46:21 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`.
>> 
>> ### Context
>> 
>> The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn, I was able to reduce the reproducer to something very simple. After the following method was compiled by C2, executing it caused the last statement (`x = 0`) to be ignored:
>> 
>> 
>>     static public void test() {
>>         x = 0;
>>         for (int i = 0; i < 20000; i++) {
>>             x += i;
>>         }
>>         x = 0;
>>     }
>> 
>> 
>> After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long-running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. `Store` nodes are moved out of the inner loop when encountering this pattern, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations taking place later. This was initially addressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here, as macro expansion happens too late.
>> 
>> This PR aims to address the specific case of the wrong wiring of `Store` nodes in _post_ loops, but in the longer term further investigation into the missing `Phi` node issue is necessary, as it is likely to cause other issues (cf. related JBS issues).
>> 
>> ### Detailed Analysis
>> 
>> In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`.  When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later.
>> 
>> This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect.
>> 
>> This is wh...
>
> src/hotspot/share/opto/loopTransform.cpp line 1793:
> 
>> 1791:     for (DUIterator j = if_false->outs(); if_false->has_out(j); j++) {
>> 1792:       Node* store = if_false->out(j)->isa_Store();
>> 1793:       // We don't make changes if the memory input is in the loop body as well
> 
> Why? I suppose that is because there must be a Phi in the loop then, right? Maybe state that in the comment here.

Having a memory input that is outside of the loop body is the situation where we would normally expect a `Phi`, and this is where we would like to intervene.

If the memory input is in the loop body as well, we can safely assume it is still correct, as the whole body gets cloned as a unit.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2369078466

From bmaillard at openjdk.org  Mon Sep 22 15:55:14 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Mon, 22 Sep 2025 15:55:14 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop [v2]
In-Reply-To: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
References: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
Message-ID: <MRFu5BvzRkiFqyvk1qT1slc7O4k0BspHokZU2NyCjoQ=.1fe586a0-00d6-4709-80c6-c2ac3c5ac75b@github.com>

> This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`.
> 
> ### Context
> 
> The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn, I was able to reduce the reproducer to something very simple. After the following method was compiled by C2, executing it caused the last statement (`x = 0`) to be ignored:
> 
> 
>     static public void test() {
>         x = 0;
>         for (int i = 0; i < 20000; i++) {
>             x += i;
>         }
>         x = 0;
>     }
> 
> 
> After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long-running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. `Store` nodes are moved out of the inner loop when encountering this pattern, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations taking place later. This was initially addressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here, as macro expansion happens too late.
> 
> This PR aims to address the specific case of the wrong wiring of `Store` nodes in _post_ loops, but in the longer term further investigation into the missing `Phi` node issue is necessary, as it is likely to cause other issues (cf. related JBS issues).
> 
> ### Detailed Analysis
> 
> In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`.  When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later.
> 
> This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect.
> 
> This is what the IR looks like after the creation of the post lo...
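
For readers who want to run the reduced reproducer directly, a self-contained harness might look like the sketch below. The class name `StoreSinkRepro`, the declaration of `x` as a `static int`, and the 10,000 warm-up iterations in `main` are assumptions made for illustration; the regression test added by the PR may be structured differently. While `test()` runs in the interpreter the check always passes; a wrong result can only appear once C2 has compiled the method (running with `-Xbatch`, which makes compilation synchronous, may help trigger it sooner).

```java
/** Hypothetical standalone harness around the reduced reproducer from the PR description. */
class StoreSinkRepro {
    static int x; // assumed declaration; the reduced snippet only shows uses of x

    static public void test() {
        x = 0;
        for (int i = 0; i < 20000; i++) {
            x += i;
        }
        x = 0; // the store that was lost after C2 compilation
    }

    public static void main(String[] args) {
        // Call test() repeatedly so that it becomes hot and gets compiled by C2.
        for (int iter = 0; iter < 10_000; iter++) {
            test();
            if (x != 0) {
                throw new RuntimeException("Wrong result: x = " + x + " at iteration " + iter);
            }
        }
        System.out.println("ok");
    }
}
```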

Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision:

  Improve comment about the is_member condition

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27225/files
  - new: https://git.openjdk.org/jdk/pull/27225/files/5142bbf0..0fc0be30

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27225&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27225&range=00-01

  Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27225.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27225/head:pull/27225

PR: https://git.openjdk.org/jdk/pull/27225

From kxu at openjdk.org  Mon Sep 22 16:13:35 2025
From: kxu at openjdk.org (Kangcheng Xu)
Date: Mon, 22 Sep 2025 16:13:35 GMT
Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v10]
In-Reply-To: <RYYMa4Btd93FIC9tCnYMCX7HvuP4D-ODaICLXmjmKic=.5892c08b-92e9-4edd-b37f-cc13e90b469e@github.com>
References: <RYYMa4Btd93FIC9tCnYMCX7HvuP4D-ODaICLXmjmKic=.5892c08b-92e9-4edd-b37f-cc13e90b469e@github.com>
Message-ID: <LD0ReC6QdvradTfp137BD3SEGnYj39N0KkNIccneePY=.52b484e0-3827-4627-bef6-6cb10bfa50a2@github.com>

> This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. 
> 
> A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think.
> 
> Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759).
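
The split between detection and conversion can be illustrated with a small Java analogy. `LoopShape` and `ToyCountedLoopConverter` are hypothetical stand-ins, far simpler than the C++ `PhaseIdealLoop::CountedLoopConverter` proposed in the PR: `isCountedLoop()` only inspects the loop and records what it found, and `convert()` is the only method that mutates anything, so a failed detection attempt leaves the loop untouched and another configuration can be tried.

```java
/** Toy description of a loop; the fields are hypothetical simplifications of C2's IR. */
final class LoopShape {
    final boolean hasIntInductionVar;
    final boolean hasSingleExitTest;
    final int stride;
    boolean counted; // only ever set by ToyCountedLoopConverter.convert()

    LoopShape(boolean hasIntInductionVar, boolean hasSingleExitTest, int stride) {
        this.hasIntInductionVar = hasIntInductionVar;
        this.hasSingleExitTest = hasSingleExitTest;
        this.stride = stride;
    }
}

/** Detection and conversion kept apart: isCountedLoop() is read-only, convert() mutates. */
final class ToyCountedLoopConverter {
    private final LoopShape loop;
    private boolean detected;
    private int detectedStride; // meaningful only while 'detected' is true

    ToyCountedLoopConverter(LoopShape loop) {
        this.loop = loop;
    }

    /** Inspects the loop without changing it and remembers the facts needed by convert(). */
    boolean isCountedLoop() {
        detected = loop.hasIntInductionVar && loop.hasSingleExitTest && loop.stride != 0;
        detectedStride = detected ? loop.stride : 0;
        return detected;
    }

    /** Applies the transformation; must only be called after a successful isCountedLoop(). */
    void convert() {
        if (!detected) {
            throw new IllegalStateException("convert() requires a successful isCountedLoop()");
        }
        loop.counted = true;
    }

    int detectedStride() {
        return detectedStride;
    }
}
```

A caller could try several candidate configurations, calling only `isCountedLoop()` on each, and invoke `convert()` once on the first one that is accepted.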

Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision:

  WIP: remove unused #include

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24458/files
  - new: https://git.openjdk.org/jdk/pull/24458/files/5fd98f48..a17bfb28

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=08-09

  Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/24458.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458

PR: https://git.openjdk.org/jdk/pull/24458

From bmaillard at openjdk.org  Mon Sep 22 16:15:31 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Mon, 22 Sep 2025 16:15:31 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop [v2]
In-Reply-To: <6G3nK9S5d9u3_esVm6W6hXK3QTDEzTscMlmWPmtp4yU=.21c1289b-83a5-485e-83ad-b30646dfbb89@github.com>
References: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
 <6G3nK9S5d9u3_esVm6W6hXK3QTDEzTscMlmWPmtp4yU=.21c1289b-83a5-485e-83ad-b30646dfbb89@github.com>
Message-ID: <ialvmY2U9NGsBirLb2Kzej7sn0ZRwYsEARMiMpGnry8=.6ae1b044-01fb-497a-8984-a17a3970cf83@github.com>

On Tue, 16 Sep 2025 08:39:38 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Improve comment about the is_member condition
>
> src/hotspot/share/opto/loopTransform.cpp line 1788:
> 
>> 1786:   // right after the execution of the inner CountedLoop.
>> 1787:   // We have to make sure that such stores in the post loop have the right memory inputs from the main loop
>> 1788:   if (loop->tail()->in(0)->is_BaseCountedLoopEnd()) {
> 
> Out of curiosity: when would this condition be false?

I don't think it is ever false, I just changed it to use `main_end` directly instead. Thanks for pointing it out!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2369228441

From sviswanathan at openjdk.org  Mon Sep 22 16:20:14 2025
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Mon, 22 Sep 2025 16:20:14 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v16]
In-Reply-To: <GFVrKRIgcLl23x-KBrS8RiTNuR2VC9aWqblxrhzrIbw=.a46690ad-e2d3-44db-81df-1c98f011e6a8@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <GFVrKRIgcLl23x-KBrS8RiTNuR2VC9aWqblxrhzrIbw=.a46690ad-e2d3-44db-81df-1c98f011e6a8@github.com>
Message-ID: <XCLSTeZYYSGmV2eHsn1YrnPe9BBKjWO15a57QSKX1h0=.ea388958-05a2-4b23-a0b0-60a03f249e39@github.com>

On Sat, 20 Sep 2025 00:04:43 GMT, Mohamed Issa <missa at openjdk.org> wrote:

>> Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
>> 
>> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) instruction is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially, though, the special values are mapped to values in the integer range (NaN -> 0; -Infinity and <= Integer.MIN_VALUE -> Integer.MIN_VALUE; +Infinity and >= Integer.MAX_VALUE -> Integer.MAX_VALUE) with the help of a few temporary registers that store intermediate results.
>> 
>> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or  VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
>> 
>> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
>> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
>> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
>> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
>> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
>> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
>> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
>> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
>> 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java`...
>
> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Use compiler generator instead of standard Java streams
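
The special-value mapping described above is what Java itself requires for narrowing primitive conversions from `float`/`double` (JLS §5.1.3), so it can be observed from plain Java. The sketch below is illustrative only: the class and method names are made up, and the loops are merely the kind of simple array loop that SuperWord may vectorize (whether it actually does depends on the platform and on flags such as `-XX:+UseSuperWord`).

```java
import java.util.Arrays;

/** Float/double to integral casts in Java saturate and map NaN to 0 (JLS 5.1.3). */
final class SaturatingCasts {
    /** Simple counted loop of the shape SuperWord can consider for vectorization. */
    static int[] toInts(float[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = (int) src[i];
        }
        return dst;
    }

    static long[] toLongs(double[] src) {
        long[] dst = new long[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = (long) src[i];
        }
        return dst;
    }

    public static void main(String[] args) {
        float[] specials = {Float.NaN, Float.NEGATIVE_INFINITY, Float.POSITIVE_INFINITY,
                            -3.0e9f, 3.0e9f, 1.9f, -1.9f};
        // prints [0, -2147483648, 2147483647, -2147483648, 2147483647, 1, -1]
        System.out.println(Arrays.toString(toInts(specials)));
    }
}
```

Finite values inside the range are truncated toward zero (`1.9f -> 1`, `-1.9f -> -1`), which is why the x86 code only needs extra handling for the special inputs.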

Looks good to me.

-------------

Marked as reviewed by sviswanathan (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26919#pullrequestreview-3253836381

From bmaillard at openjdk.org  Mon Sep 22 16:21:03 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Mon, 22 Sep 2025 16:21:03 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop [v3]
In-Reply-To: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
References: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
Message-ID: <-7pSSnHXqMofCoAHqnzlEfViPzVfaGHfWrRiqv3Hfps=.9913af97-e93d-4726-8bd2-5151f836edcd@github.com>

> This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`.
> 
> ### Context
> 
> The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn, I was able to reduce the reproducer to something very simple. After the following method was compiled by C2, executing it caused the last statement (`x = 0`) to be ignored:
> 
> 
>     static public void test() {
>         x = 0;
>         for (int i = 0; i < 20000; i++) {
>             x += i;
>         }
>         x = 0;
>     }
> 
> 
> After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long-running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. `Store` nodes are moved out of the inner loop when this pattern is encountered, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations that take place later. This was initially addressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here, because macro expansion happens too late.
> 
> This PR aims at addressing the specific case of the wrong wiring of `Store` nodes in _post_ loops, but in the longer term, further investigation into the missing `Phi` node issue is necessary, as it is likely to cause other issues (cf. related JBS issues).
> 
> ### Detailed Analysis
> 
> In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`.  When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later.
> 
> This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect.
> 
> This is what the IR looks like after the creation of the post lo...
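
To visualize the loop structure discussed above, the following Java class is a source-level approximation of the shape the reproducer's loop takes after strip mining, unrolling, and store sinking. It is not actual C2 output: C2 performs these transformations on its sea-of-nodes IR, and the miswired memory input is an IR-level bug that cannot be reproduced at the Java source level. The class name and the `STRIP` and `UNROLL` values are hypothetical.

```java
/**
 * Source-level approximation (not actual C2 output) of the reproducer's loop after strip
 * mining, unrolling and store sinking. STRIP and UNROLL are hypothetical values.
 */
final class StripMinedShape {
    static int x;
    static final int N = 20000;
    static final int STRIP = 1000; // hypothetical number of iterations between safepoint polls
    static final int UNROLL = 4;   // hypothetical unroll factor; the inner body below matches it

    /** Runs the transformed loop starting from the current x and returns the accumulated value. */
    static int transformedLoop() {
        int i = 0;
        int acc = x; // x is kept in a register inside the inner loop
        // Main loop: the outer strip-mined loop around the unrolled inner loop.
        while (N - i >= UNROLL) {
            int limit = Math.min(N - (UNROLL - 1), i + STRIP);
            while (i < limit) { // inner loop: no safepoint poll and no store to x
                acc += i;
                acc += i + 1;
                acc += i + 2;
                acc += i + 3;
                i += UNROLL;
            }
            x = acc; // store sunk out of the inner loop into the outer loop
            // (the safepoint poll would sit here, in the outer loop)
        }
        // Post loop: iterations left over after unrolling (none when N % UNROLL == 0).
        while (i < N) {
            acc += i;
            i++;
        }
        x = acc;
        return acc;
    }

    static void testTransformed() {
        x = 0;
        transformedLoop();
        x = 0; // the final store that C2 dropped because of the bad post-loop wiring
    }
}
```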

Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision:

  Use main_end instead of loop->tail()->in(0)

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27225/files
  - new: https://git.openjdk.org/jdk/pull/27225/files/0fc0be30..f2eb376f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27225&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27225&range=01-02

  Stats: 14 lines in 1 file changed: 0 ins; 2 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/27225.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27225/head:pull/27225

PR: https://git.openjdk.org/jdk/pull/27225

From manc at openjdk.org  Mon Sep 22 18:08:57 2025
From: manc at openjdk.org (Man Cao)
Date: Mon, 22 Sep 2025 18:08:57 GMT
Subject: RFR: 8368071: Compilation throughput regressed 2X-8X after
 JDK-8355003
In-Reply-To: <WYxcB5a4pW-Lagi-N4oP7VaB7INF2_i6nCJFPyJlvtw=.68ea2dec-842d-4f8e-a1f3-8b720899d462@github.com>
References: <WYxcB5a4pW-Lagi-N4oP7VaB7INF2_i6nCJFPyJlvtw=.68ea2dec-842d-4f8e-a1f3-8b720899d462@github.com>
Message-ID: <efhw9DWcFFm6KNF8D929pABqGE0XYLh2Gp3smsWPh2k=.0abcd7d5-26db-45c0-9158-d3f9451dd915@github.com>

On Fri, 19 Sep 2025 07:52:16 GMT, Man Cao <manc at openjdk.org> wrote:

> Hi all,
> 
> Could anyone review this change that fixes a severe startup performance regression for `-XX:+TieredCompilation`?  See https://bugs.openjdk.org/browse/JDK-8368071 for more details.
> 
> -Man

Thanks for the reviews and testing.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27383#issuecomment-3320577301

From manc at openjdk.org  Mon Sep 22 18:08:58 2025
From: manc at openjdk.org (Man Cao)
Date: Mon, 22 Sep 2025 18:08:58 GMT
Subject: Integrated: 8368071: Compilation throughput regressed 2X-8X after
 JDK-8355003
In-Reply-To: <WYxcB5a4pW-Lagi-N4oP7VaB7INF2_i6nCJFPyJlvtw=.68ea2dec-842d-4f8e-a1f3-8b720899d462@github.com>
References: <WYxcB5a4pW-Lagi-N4oP7VaB7INF2_i6nCJFPyJlvtw=.68ea2dec-842d-4f8e-a1f3-8b720899d462@github.com>
Message-ID: <-Hz6VfUuIve0WL6f4qE6rmt9A_n4jlhbiZcSBn6MX3g=.7637e91e-ae87-4dcc-a226-1ba6f83fe274@github.com>

On Fri, 19 Sep 2025 07:52:16 GMT, Man Cao <manc at openjdk.org> wrote:

> Hi all,
> 
> Could anyone review this change that fixes a severe startup performance regression for `-XX:+TieredCompilation`?  See https://bugs.openjdk.org/browse/JDK-8368071 for more details.
> 
> -Man

This pull request has now been integrated.

Changeset: bdfe05b5
Author:    Man Cao <manc at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/bdfe05b595d86c62f7dad78549023a3426423679
Stats:     13 lines in 1 file changed: 8 ins; 2 del; 3 mod

8368071: Compilation throughput regressed 2X-8X after JDK-8355003

Reviewed-by: iveresov, shade

-------------

PR: https://git.openjdk.org/jdk/pull/27383

From hgreule at openjdk.org  Mon Sep 22 21:40:47 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Mon, 22 Sep 2025 21:40:47 GMT
Subject: RFR: 8367967: C2: "fatal error: Not monotonic" with Mod nodes [v2]
In-Reply-To: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
References: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
Message-ID: <Ph9HXqSUYRL1mc3ZIcTpgW74Zi2BRyLNKb2ZgsOId14=.f59bf6db-576b-494e-be4d-0080c3a55e96@github.com>

> Generally, we shouldn't return a wider type (ZERO) if there is a later case that would return a narrower type (TOP) for the same input types. If the inputs are widened so that the first case no longer matches but the later one still does, the result is not monotonic with respect to the previous result.
> 
> Please review :)

Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:

  add a second @run

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27408/files
  - new: https://git.openjdk.org/jdk/pull/27408/files/9ba78d4e..ade824e0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27408&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27408&range=00-01

  Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27408.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27408/head:pull/27408

PR: https://git.openjdk.org/jdk/pull/27408

From hgreule at openjdk.org  Mon Sep 22 21:40:48 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Mon, 22 Sep 2025 21:40:48 GMT
Subject: RFR: 8367967: C2: "fatal error: Not monotonic" with Mod nodes [v2]
In-Reply-To: <H_fX6kXWVfcTAcqzz6vID1if2ytB9MlSIiGze3y3JDw=.f0a3718c-6e00-49c1-963a-14519ae3262e@github.com>
References: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
 <H_fX6kXWVfcTAcqzz6vID1if2ytB9MlSIiGze3y3JDw=.f0a3718c-6e00-49c1-963a-14519ae3262e@github.com>
Message-ID: <T2ypAgAcdTSOT5NhhE4uLbwkyYVjfk6AGA-4TXn2lXw=.7a820e51-010c-4d63-b258-90b54d43cd91@github.com>

On Mon, 22 Sep 2025 13:24:00 GMT, Benoît Maillard <bmaillard at openjdk.org> wrote:

>> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   add a second @run
>
> Looks good to me, I only have one minor comment.

I added the suggestion from @benoitmaillard and fixed a typo in the test summary. Please let me know when the test results are in :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27408#issuecomment-3321571721

From sviswanathan at openjdk.org  Mon Sep 22 23:37:59 2025
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Mon, 22 Sep 2025 23:37:59 GMT
Subject: RFR: 8350468: x86: Improve implementation of vectorized
 numberOfLeadingZeros for int and long
In-Reply-To: <yHNrHZPQHRCeyuxZRI4G7Jfiw_lxXLxROKfrXBWS_-U=.d6b4abe9-bca8-4f94-bd80-3f454bb93672@github.com>
References: <_YfxynUy7BxtFo15BZG2bdhrUaCkIPSC6l8fTAVyJE8=.ffc4ce38-d9bd-4ab7-b5dd-0ffd847d5c2d@github.com>
 <yHNrHZPQHRCeyuxZRI4G7Jfiw_lxXLxROKfrXBWS_-U=.d6b4abe9-bca8-4f94-bd80-3f454bb93672@github.com>
Message-ID: <Bgp2buwK62tN9xMFs5Dk6tIAMD9ObJHLXQBfXym_s2E=.dbdd2e0c-2b97-4279-be26-a6f95341abc4@github.com>

On Mon, 22 Sep 2025 03:00:30 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

> Hi! May I have some reviews on this? Maybe @jatin-bhateja or @sviswa7 since this is an x86 backend change.

I will try to review it this week.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26610#issuecomment-3321892883

From fyang at openjdk.org  Tue Sep 23 03:36:02 2025
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 23 Sep 2025 03:36:02 GMT
Subject: RFR: 8368247: RISC-V: enable vectorapi test for expand operation
In-Reply-To: <nWwtWid-xgAmIRSJnFvIPs-R5-kwVNtDMi3Fv9onfkY=.1cad59db-459d-46f3-b495-97c3bbe1fa3a@github.com>
References: <nWwtWid-xgAmIRSJnFvIPs-R5-kwVNtDMi3Fv9onfkY=.1cad59db-459d-46f3-b495-97c3bbe1fa3a@github.com>
Message-ID: <tPDyIdYPu-q5LLliQt58CQ9FLAIzVw3dUSsY62QFfKI=.beb1b06d-8a18-4c96-a926-da6bff1a73a5@github.com>

On Mon, 22 Sep 2025 09:09:05 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help to review this patch? Thanks!
> 
> [JDK-8363989](https://bugs.openjdk.org/browse/JDK-8363989) adds a vectorapi test for VectorAPI expand operation, which we can also enable on RISC-V.
> 
> ### Test (fastdebug)
> - [x] Run compiler/vectorapi/VectorExpandTest.java on k1 and sg2042
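
For context, the expand operation moves consecutive lanes of the source, starting at lane 0, into the lanes selected by a mask and sets all other lanes to zero (this is how the javadoc of `Vector.expand(VectorMask)` describes it). The scalar reference below is a hypothetical helper, not Vector API code, and may be handy for cross-checking vectorized results by hand.

```java
import java.util.Arrays;

/** Hypothetical scalar reference for the lane-wise "expand" operation. */
final class ExpandReference {
    /**
     * Source lanes are consumed in order starting at lane 0 and written to the result lanes
     * whose mask bit is set; all other result lanes stay zero.
     */
    static int[] expand(int[] src, boolean[] mask) {
        if (src.length != mask.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        int[] result = new int[src.length]; // unselected lanes remain 0
        int next = 0;                       // next source lane to consume
        for (int lane = 0; lane < mask.length; lane++) {
            if (mask[lane]) {
                result[lane] = src[next++];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] src = {1, 2, 3, 4};
        boolean[] mask = {false, true, false, true};
        System.out.println(Arrays.toString(expand(src, mask))); // prints [0, 1, 0, 2]
    }
}
```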

Looks fine assuming you are testing with a fastdebug build.

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27420#pullrequestreview-3256025250

From dzhang at openjdk.org  Tue Sep 23 06:19:11 2025
From: dzhang at openjdk.org (Dingli Zhang)
Date: Tue, 23 Sep 2025 06:19:11 GMT
Subject: RFR: 8368247: RISC-V: enable vectorapi test for expand operation
In-Reply-To: <nWwtWid-xgAmIRSJnFvIPs-R5-kwVNtDMi3Fv9onfkY=.1cad59db-459d-46f3-b495-97c3bbe1fa3a@github.com>
References: <nWwtWid-xgAmIRSJnFvIPs-R5-kwVNtDMi3Fv9onfkY=.1cad59db-459d-46f3-b495-97c3bbe1fa3a@github.com>
Message-ID: <Uf68Qq8HK1NEhpZw1ww0s9WLd_4mV9_kwSKSR2ZPCgQ=.c1bb8cf0-edef-4ace-a6ac-8855076ea1f7@github.com>

On Mon, 22 Sep 2025 09:09:05 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help to review this patch? Thanks!
> 
> [JDK-8363989](https://bugs.openjdk.org/browse/JDK-8363989) adds a vectorapi test for VectorAPI expand operation, which we can also enable on RISC-V.
> 
> ### Test (fastdebug)
> - [x] Run compiler/vectorapi/VectorExpandTest.java on k1 and sg2042

> Looks good. Thanks! I saw in #26740 it added `EXPAND_VX` in IRNode.java. Does this mean we did not test ExpandV previously when it was implemented?

@Hamlin-Li  Thanks for the review! We did not add IR-related tests when we introduced it, but instead used the general tests under test/jdk/jdk/incubator/vector to print the nodes.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27420#issuecomment-3322573855

From dzhang at openjdk.org  Tue Sep 23 06:47:31 2025
From: dzhang at openjdk.org (Dingli Zhang)
Date: Tue, 23 Sep 2025 06:47:31 GMT
Subject: RFR: 8368206: RISC-V:
 compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without
 RVV
In-Reply-To: <Vg8YsRztlY_-bhe05rpBS39jcLb9vHAGG4kXibYMa7M=.3281cc09-e70f-4cd2-9bbf-698329486546@github.com>
References: <zeLrBLLs95EZvueIFs073vLvuX2LSn6XX51lOZtxh64=.8ed10e1e-08df-49d3-b987-48349f07eeb0@github.com>
 <Vg8YsRztlY_-bhe05rpBS39jcLb9vHAGG4kXibYMa7M=.3281cc09-e70f-4cd2-9bbf-698329486546@github.com>
Message-ID: <cJoNPYLDWCQzkl7IoEdkvoHe1jwW8M67lDRx0DJefIQ=.47d597da-b267-4b03-9390-0910c364e909@github.com>

On Mon, 22 Sep 2025 12:26:37 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Hey, I'm wondering if all the tests under hotspot/jtreg/compiler/vectorapi should `@requires` RVV? Otherwise it seems they are not really testing anything useful.

@Hamlin-Li Good question! I think almost all IR related tests need RVV. Some non-IR tests can use scalars to implement vectorapi, such as `compiler/vectorapi/TestVectorShuffleIotaByte.java`, which also passes on sg2042 (without RVV).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27414#issuecomment-3322635979

From dzhang at openjdk.org  Tue Sep 23 06:55:58 2025
From: dzhang at openjdk.org (Dingli Zhang)
Date: Tue, 23 Sep 2025 06:55:58 GMT
Subject: RFR: 8368247: RISC-V: enable vectorapi test for expand operation
In-Reply-To: <nWwtWid-xgAmIRSJnFvIPs-R5-kwVNtDMi3Fv9onfkY=.1cad59db-459d-46f3-b495-97c3bbe1fa3a@github.com>
References: <nWwtWid-xgAmIRSJnFvIPs-R5-kwVNtDMi3Fv9onfkY=.1cad59db-459d-46f3-b495-97c3bbe1fa3a@github.com>
Message-ID: <Cgb6SkE9RdqymnISwErp2GyXRieq2nA0tCwNdqFNnwU=.0bce5b89-3ee1-4a07-b922-6bee3fc83366@github.com>

On Mon, 22 Sep 2025 09:09:05 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help to review this patch? Thanks!
> 
> [JDK-8363989](https://bugs.openjdk.org/browse/JDK-8363989) adds a vectorapi test for VectorAPI expand operation, which we can also enable on RISC-V.
> 
> ### Test (fastdebug)
> - [x] Run compiler/vectorapi/VectorExpandTest.java on k1, k230 and sg2042

Thanks all for the review!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27420#issuecomment-3322649230

From duke at openjdk.org  Tue Sep 23 06:55:59 2025
From: duke at openjdk.org (duke)
Date: Tue, 23 Sep 2025 06:55:59 GMT
Subject: RFR: 8368247: RISC-V: enable vectorapi test for expand operation
In-Reply-To: <nWwtWid-xgAmIRSJnFvIPs-R5-kwVNtDMi3Fv9onfkY=.1cad59db-459d-46f3-b495-97c3bbe1fa3a@github.com>
References: <nWwtWid-xgAmIRSJnFvIPs-R5-kwVNtDMi3Fv9onfkY=.1cad59db-459d-46f3-b495-97c3bbe1fa3a@github.com>
Message-ID: <8ClI27TV-YQb1V403n8bHz_V72SopCZTn5G_yGmQUO4=.f0ecfc0a-a8c1-450b-b502-8e4e644366d3@github.com>

On Mon, 22 Sep 2025 09:09:05 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help to review this patch? Thanks!
> 
> [JDK-8363989](https://bugs.openjdk.org/browse/JDK-8363989) adds a vectorapi test for VectorAPI expand operation, which we can also enable on RISC-V.
> 
> ### Test (fastdebug)
> - [x] Run compiler/vectorapi/VectorExpandTest.java on k1, k230 and sg2042

@DingliZhang 
Your change (at version 7bce3039c3d053653b0b5d3a5b0022a3443aa5c2) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27420#issuecomment-3322653722

From dzhang at openjdk.org  Tue Sep 23 07:03:41 2025
From: dzhang at openjdk.org (Dingli Zhang)
Date: Tue, 23 Sep 2025 07:03:41 GMT
Subject: Integrated: 8368247: RISC-V: enable vectorapi test for expand
 operation
In-Reply-To: <nWwtWid-xgAmIRSJnFvIPs-R5-kwVNtDMi3Fv9onfkY=.1cad59db-459d-46f3-b495-97c3bbe1fa3a@github.com>
References: <nWwtWid-xgAmIRSJnFvIPs-R5-kwVNtDMi3Fv9onfkY=.1cad59db-459d-46f3-b495-97c3bbe1fa3a@github.com>
Message-ID: <_VBIPziPiwAXj0Ts_i6Yti50K_nx1MeaBKDccmq8BfY=.58d8ac5b-db61-4ba0-9b8a-87937fe13b2a@github.com>

On Mon, 22 Sep 2025 09:09:05 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help to review this patch? Thanks!
> 
> [JDK-8363989](https://bugs.openjdk.org/browse/JDK-8363989) adds a vectorapi test for VectorAPI expand operation, which we can also enable on RISC-V.
> 
> ### Test (fastdebug)
> - [x] Run compiler/vectorapi/VectorExpandTest.java on k1, k230 and sg2042

This pull request has now been integrated.

Changeset: 942b2177
Author:    Dingli Zhang <dzhang at openjdk.org>
Committer: Fei Yang <fyang at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/942b21772a05e30af344742a02db1643ad0e0227
Stats:     6 lines in 1 file changed: 0 ins; 0 del; 6 mod

8368247: RISC-V: enable vectorapi test for expand operation

Reviewed-by: mli, fyang

-------------

PR: https://git.openjdk.org/jdk/pull/27420

From bmaillard at openjdk.org  Tue Sep 23 07:37:38 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Tue, 23 Sep 2025 07:37:38 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop [v4]
In-Reply-To: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
References: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
Message-ID: <IUG2JNI6aldp0_pCgJtEeeEdZFOPuXRGUQxUsA3B-9A=.78f565d8-854f-4110-b468-4bc23ccd799f@github.com>

> This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`.
> 
> ### Context
> 
> The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn, I was able to reduce the reproducer to something very simple. After the following method was compiled by C2, executing it caused the last statement (`x = 0`) to be ignored:
> 
> 
>     static public void test() {
>         x = 0;
>         for (int i = 0; i < 20000; i++) {
>             x += i;
>         }
>         x = 0;
>     }
> 
> 
> After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long-running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. `Store` nodes are moved out of the inner loop when this pattern is encountered, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations that take place later. This was initially addressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here, because macro expansion happens too late.
> 
> This PR aims at addressing the specific case of the wrong wiring of `Store` nodes in _post_ loops, but in the longer term, further investigation into the missing `Phi` node issue is necessary, as it is likely to cause other issues (cf. related JBS issues).
> 
> ### Detailed Analysis
> 
> In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`.  When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later.
> 
> This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect.
> 
> This is what the IR looks like after the creation of the post lo...

Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision:

  Change naming as suggested

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27225/files
  - new: https://git.openjdk.org/jdk/pull/27225/files/f2eb376f..af346054

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27225&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27225&range=02-03

  Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/27225.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27225/head:pull/27225

PR: https://git.openjdk.org/jdk/pull/27225

From bmaillard at openjdk.org  Tue Sep 23 07:37:40 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Tue, 23 Sep 2025 07:37:40 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop [v4]
In-Reply-To: <6G3nK9S5d9u3_esVm6W6hXK3QTDEzTscMlmWPmtp4yU=.21c1289b-83a5-485e-83ad-b30646dfbb89@github.com>
References: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
 <6G3nK9S5d9u3_esVm6W6hXK3QTDEzTscMlmWPmtp4yU=.21c1289b-83a5-485e-83ad-b30646dfbb89@github.com>
Message-ID: <T3jXVeZHabIzbeN2TN8E0hFhDBbAKfDyQHqRRGpLikY=.f45e5626-98fa-4b0d-8993-59d414503fec@github.com>

On Tue, 16 Sep 2025 08:36:25 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Change naming as suggested
>
> src/hotspot/share/opto/loopTransform.cpp line 1679:
> 
>> 1677:       Node* next = out->fast_out(l);
>> 1678:       if (next->is_Mem() && next->in(MemNode::Memory) == out) {
>> 1679:         IdealLoopTree* output_loop = get_loop(get_ctrl(next));
> 
> I would keep the names for `next` and `output_loop` consistent. Maybe `next_loop`? Or just call them `use` and `use_loop`?

Good point, I have changed it to `use` and `use_loop`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2371440440

From chagedorn at openjdk.org  Tue Sep 23 07:49:21 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Tue, 23 Sep 2025 07:49:21 GMT
Subject: RFR: 8367967: C2: "fatal error: Not monotonic" with Mod nodes [v2]
In-Reply-To: <Ph9HXqSUYRL1mc3ZIcTpgW74Zi2BRyLNKb2ZgsOId14=.f59bf6db-576b-494e-be4d-0080c3a55e96@github.com>
References: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
 <Ph9HXqSUYRL1mc3ZIcTpgW74Zi2BRyLNKb2ZgsOId14=.f59bf6db-576b-494e-be4d-0080c3a55e96@github.com>
Message-ID: <oyzBL6N5XyYE0A2gv8Js0zfG-JIixBVwB08DYYraxnA=.6fd6b835-6c74-4ab8-9327-d2cc3de1a702@github.com>

On Mon, 22 Sep 2025 21:40:47 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> Generally, we shouldn't return a wider type (ZERO) if there is a later case that would return a narrower type (TOP) for the same input types. If the inputs are widened so that the first case no longer matches but the later one still does, the result is not monotonic with respect to the previous result.
>> 
>> Please review :)
>
> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
> 
>   add a second @run

test/hotspot/jtreg/compiler/c2/TestModValueMonotonic.java line 1:

> 1: /*

Another thing: You could move this test to `compiler/ccp` which fits better than the generic `c2` folder.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27408#discussion_r2371467349

From snatarajan at openjdk.org  Tue Sep 23 07:55:24 2025
From: snatarajan at openjdk.org (Saranya Natarajan)
Date: Tue, 23 Sep 2025 07:55:24 GMT
Subject: RFR: 8349835: C2: simplify IGV property printing
In-Reply-To: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com>
References: <16DzrQX_urXyKeFfY1FlaEM8Q9QjYgc0CHa25EWQV84=.ecec4b5d-c7d9-4060-9b76-0a4e4e0786e3@github.com>
Message-ID: <fEsNbYcHypkrZCE_vFA5xK7cF_tc5ZXfnoRuHq5-mtU=.41e6e983-46d6-4fed-a075-358f83ca990c@github.com>

On Fri, 22 Aug 2025 13:28:22 GMT, Saranya Natarajan <snatarajan at openjdk.org> wrote:

> The code that prints node properties and live range properties is very verbose and repetitive and could be simplified by applying a refactoring suggested [here](https://github.com/openjdk/jdk/pull/23558#discussion_r1950785708).
> 
> ### Fix 
> Implemented the suggested refactoring. 
> 
> ### Testing 
> Github Actions, Tier 1-3

Thank you for the review. 
@jdksjolen : I did think of tagged unions while fixing the issue. I did not implement them because my understanding was that they do not necessarily decrease the code size. However, I agree they are better than the current implementation. I plan to try out the changes suggested by @chhagedorn to see if they open up more opportunities to refactor `visit_node()`. If that is not feasible, I will fall back to a tagged union.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26902#issuecomment-3322823817

From hgreule at openjdk.org  Tue Sep 23 08:28:00 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Tue, 23 Sep 2025 08:28:00 GMT
Subject: RFR: 8367967: C2: "fatal error: Not monotonic" with Mod nodes [v3]
In-Reply-To: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
References: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
Message-ID: <EDPz9FTr56gUdjcVeokF55vuUSb71g-806n8urrSrVM=.0ea0eebc-9545-48b8-9acf-35f592c14c71@github.com>

> Generally, we shouldn't return a wider type (ZERO) if there is a later case that would return a more narrow type (TOP) for the same input types. If the inputs are widened and the first case doesn't match anymore but the later one still does, the result is not monotonic with the previous result.
> 
> Please review :)

Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:

  move test

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27408/files
  - new: https://git.openjdk.org/jdk/pull/27408/files/ade824e0..0193749b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27408&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27408&range=01-02

  Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/27408.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27408/head:pull/27408

PR: https://git.openjdk.org/jdk/pull/27408

From hgreule at openjdk.org  Tue Sep 23 08:28:03 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Tue, 23 Sep 2025 08:28:03 GMT
Subject: RFR: 8367967: C2: "fatal error: Not monotonic" with Mod nodes [v2]
In-Reply-To: <oyzBL6N5XyYE0A2gv8Js0zfG-JIixBVwB08DYYraxnA=.6fd6b835-6c74-4ab8-9327-d2cc3de1a702@github.com>
References: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
 <Ph9HXqSUYRL1mc3ZIcTpgW74Zi2BRyLNKb2ZgsOId14=.f59bf6db-576b-494e-be4d-0080c3a55e96@github.com>
 <oyzBL6N5XyYE0A2gv8Js0zfG-JIixBVwB08DYYraxnA=.6fd6b835-6c74-4ab8-9327-d2cc3de1a702@github.com>
Message-ID: <vNA5eUaNmfH-cIk9TOIoNWZYH2rE3h92qpLF5zPP0EU=.2ba091da-6ef8-4eb1-abf2-1916d44176de@github.com>

On Tue, 23 Sep 2025 07:46:49 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   add a second @run
>
> test/hotspot/jtreg/compiler/ccp/TestModValueMonotonic.java line 1:
> 
>> (failed to retrieve contents of file, check the PR for context)
> Another thing: You could move this test to `compiler/ccp` which fits better than the generic `c2` folder.

Done!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27408#discussion_r2371553540

From bmaillard at openjdk.org  Tue Sep 23 08:36:31 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Tue, 23 Sep 2025 08:36:31 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop [v4]
In-Reply-To: <EkXO0UA215XqrSBCd9ZD6HRnbwpjkIucIiaJaN8yjuY=.89df443d-16bc-4ab0-8b2e-a6e451e2d7ca@github.com>
References: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
 <6G3nK9S5d9u3_esVm6W6hXK3QTDEzTscMlmWPmtp4yU=.21c1289b-83a5-485e-83ad-b30646dfbb89@github.com>
 <XmnT5_5b88bstnI4s9dViTVJPFIccAizD0yjezUHohI=.8cc48a49-6e9e-4595-9b5b-2f56ee89590f@github.com>
 <KKoDMHzNe6b5JRZBdlA9bQumxKfPlU2LWzZtcnjvS7w=.17239127-765f-4d8c-9b4c-4fe589ff5db0@github.com>
 <EkXO0UA215XqrSBCd9ZD6HRnbwpjkIucIiaJaN8yjuY=.89df443d-16bc-4ab0-8b2e-a6e451e2d7ca@github.com>
Message-ID: <KBRORT0pmrdgUE6S8AbKqPL0bEaBnXr8XyRenGGE1CI=.17196ca9-4790-40cd-9689-d406433b6536@github.com>

On Tue, 16 Sep 2025 08:58:45 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Ok, I think I have been misled by the names / comments.
>> You are really looking for the last store in the `outer_loop`. And we do have the guarantee of a linear memory graph because it is the one between `if_false` and SafePoint.
>
> I think a better method name would help a lot ;)

> What happens here if we hit an if-diamond (or more complicated), where there can be multiple memory uses, that are then merged again by a memory phi?

This actually cannot happen because of the conditions in `PhaseIdealLoop::try_move_store_after_loop`. [There](https://github.com/benoitmaillard/jdk/blob/af346054e27919bb407ece9c3b8ce206899458ca/src/hotspot/share/opto/loopopts.cpp#L1002-L1023), before moving the store, we make sure that any user of the store is either:
- the `Phi` node attached to the loop head
- outside of the loop body

This means we cannot have any branch (though we can have chains), and it guarantees that the memory subgraph is linear within the loop body.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2371576206

From bmaillard at openjdk.org  Tue Sep 23 08:56:58 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Tue, 23 Sep 2025 08:56:58 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop [v4]
In-Reply-To: <XmnT5_5b88bstnI4s9dViTVJPFIccAizD0yjezUHohI=.8cc48a49-6e9e-4595-9b5b-2f56ee89590f@github.com>
References: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
 <6G3nK9S5d9u3_esVm6W6hXK3QTDEzTscMlmWPmtp4yU=.21c1289b-83a5-485e-83ad-b30646dfbb89@github.com>
 <XmnT5_5b88bstnI4s9dViTVJPFIccAizD0yjezUHohI=.8cc48a49-6e9e-4595-9b5b-2f56ee89590f@github.com>
Message-ID: <KK49mHoT92-rzlY8l_8xYnQLgnJSMi5SnXzYSpmYTaA=.000bae69-5fcb-4f8f-9a25-3eb98ef589bf@github.com>

On Tue, 16 Sep 2025 08:44:25 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/loopTransform.cpp line 1794:
>> 
>>> 1792:       Node* store = if_false->out(j)->isa_Store();
>>> 1793:       // We don't make changes if the memory input is in the loop body as well
>>> 1794:       if (store && !outer_loop->is_member(get_loop(get_ctrl(store->in(MemNode::Memory))))) {
>> 
>> Suggestion:
>> 
>>       if (store != nullptr && !outer_loop->is_member(get_loop(get_ctrl(store->in(MemNode::Memory))))) {
>> 
>> No implicit null or zero checks, see hotspot style guide ;)
>
> The loop nesting check looks a bit convoluted. Consider refactoring a little. Could you get rid of the `!` by swapping things around?
> `get_loop(get_ctrl(store->in(MemNode::Memory))))->is_member(outer_loop)`
> Does not look that much better either... hmm.

> No implicit null or zero checks, see hotspot style guide ;)

Missed that, thanks for the reminder!

> The loop nesting check looks a bit convoluted. Consider refactoring a little. Could you get rid of the ! by swapping things around?

I personally think it looks more intuitive with the `!`, but I agree it is a bit convoluted. I have added an intermediate variable to make it more readable.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2371622867

From bmaillard at openjdk.org  Tue Sep 23 09:16:49 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Tue, 23 Sep 2025 09:16:49 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop [v5]
In-Reply-To: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
References: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
Message-ID: <JzZXvfvKwK6XzVNc03q79kZIwXOya03xad2LpjhOhNE=.54f6a773-741e-423d-b945-c910b6b8dff7@github.com>

> This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`.
> 
> ### Context
> 
> The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn, I was able to reduce the reproducer to something very simple. Once the following method was compiled by C2, executing it caused the last statement (`x = 0`) to be ignored:
> 
> 
>     static public void test() {
>         x = 0;
>         for (int i = 0; i < 20000; i++) {
>             x += i;
>         }
>         x = 0;
>     }
> 
> 
> After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long-running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. `Store` nodes are moved out of the inner loop when encountering this pattern, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations taking place later. This was initially addressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here, as macro expansion happens too late.
> 
> This PR aims at addressing the specific case of the wrong wiring of `Store` nodes in _post_ loops, but in the longer term, further investigation into the missing `Phi` node issue is necessary, as it is likely to cause other issues (cf. related JBS issues).
> 
> ### Detailed Analysis
> 
> In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`.  When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later.
> 
> This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect.
> 
> This is what the IR looks like after the creation of the post lo...

Benoît Maillard has updated the pull request incrementally with two additional commits since the last revision:

 - More minor refactoring and renaming
 - Minor refactor

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27225/files
  - new: https://git.openjdk.org/jdk/pull/27225/files/af346054..cc818739

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27225&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27225&range=03-04

  Stats: 11 lines in 2 files changed: 5 ins; 0 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/27225.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27225/head:pull/27225

PR: https://git.openjdk.org/jdk/pull/27225

From bmaillard at openjdk.org  Tue Sep 23 09:16:52 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Tue, 23 Sep 2025 09:16:52 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop [v5]
In-Reply-To: <6G3nK9S5d9u3_esVm6W6hXK3QTDEzTscMlmWPmtp4yU=.21c1289b-83a5-485e-83ad-b30646dfbb89@github.com>
References: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
 <6G3nK9S5d9u3_esVm6W6hXK3QTDEzTscMlmWPmtp4yU=.21c1289b-83a5-485e-83ad-b30646dfbb89@github.com>
Message-ID: <eMiGFmNWH0jCuKhHLhaet5f4IGZHmHVevrZBafznn6A=.369cb9e0-bc06-42c4-8db7-7921c1186585@github.com>

On Tue, 16 Sep 2025 08:22:28 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Benoît Maillard has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - More minor refactoring and renaming
>>  - Minor refactor
>
> src/hotspot/share/opto/loopnode.hpp line 1384:
> 
>> 1382: 
>> 1383:   // Find the last memory node in the loop when following memory usages
>> 1384:   Node *find_mem_out_outer_strip_mined(Node* store, IdealLoopTree* outer_loop);
> 
> The name of the method is a bit confusing. And the comment seems to suggest something different than what the code says.

The name was really bad indeed, sorry for that. I have renamed it to `find_last_store_in_outer_loop`, and added a comment to explain why we have the guarantee of a linear graph here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2371664300

From duke at openjdk.org  Tue Sep 23 09:54:53 2025
From: duke at openjdk.org (erifan)
Date: Tue, 23 Sep 2025 09:54:53 GMT
Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of
 vector compress [v3]
In-Reply-To: <tP4CE7n1GaoLjhB6BfpKBNaX_GCTwI-cJjUbanwm4Qg=.eb29b9ce-10b0-4d5c-8407-25deda9c3f6d@github.com>
References: <tP4CE7n1GaoLjhB6BfpKBNaX_GCTwI-cJjUbanwm4Qg=.eb29b9ce-10b0-4d5c-8407-25deda9c3f6d@github.com>
Message-ID: <jPOasUwP5m_uEo6K07ybBr_QQKmv-vunDU-78Kz6VWg=.6d66e01a-03a2-477a-8368-de983eaa88c6@github.com>

> The AArch64 SVE and SVE2 architectures lack an instruction suitable for subword-type `compress` operations. Therefore, the current implementation uses the 32-bit SVE `compact` instruction to compress subword types by first widening the high and low parts to 32 bits, compressing them, and then narrowing them back to their original type. Finally, the high and low parts are merged using the `index + tbl` instructions.
> 
> This approach is significantly slower compared to architectures with native support. After evaluating all available AArch64 SVE instructions and experimenting with various implementations (such as looping over the active elements, extraction, and insertion), I confirmed that the existing algorithm is optimal given the instruction set. However, there is still room for optimization in the following two aspects:
> 1. Merging with `index + tbl` is suboptimal due to the high latency of the `index` instruction.
> 2. For partial subword types, operations to the highest half are unnecessary because those bits are invalid.
> 
> This pull request introduces the following changes:
> 1. Replaces `index + tbl` with the `whilelt + splice` instructions, which offer lower latency and higher throughput.
> 2. Eliminates unnecessary compress operations for partial subword type cases.
> 3. For `sve_compress_byte`, one less temporary register is used to alleviate potential register pressure.
> 
> Benchmark results demonstrate that these changes significantly improve performance.
> 
> Benchmarks on Nvidia Grace machine with 128-bit SVE:
> 
> Benchmark                Unit    Before   Error  After    Error  Uplift
> Byte128Vector.compress   ops/ms  4846.97  26.23  6638.56  31.60  1.36
> Byte64Vector.compress    ops/ms  2447.69  12.95  7167.68  34.49  2.92
> Short128Vector.compress  ops/ms  7174.88  40.94  8398.45   9.48  1.17
> Short64Vector.compress   ops/ms  3618.72   3.04  8618.22  10.91  2.38
> 
> 
> This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, and all tests passed.

erifan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:

 - Improve some code style
 - Merge branch 'master' into JDK-8366333-compress
 - Merge branch 'master' into JDK-8366333-compress
 - 8366333: AArch64: Enhance SVE subword type implementation of vector compress
   
   The AArch64 SVE and SVE2 architectures lack an instruction suitable for
   subword-type `compress` operations. Therefore, the current implementation
   uses the 32-bit SVE `compact` instruction to compress subword types by
   first widening the high and low parts to 32 bits, compressing them, and
   then narrowing them back to their original type. Finally, the high and
   low parts are merged using the `index + tbl` instructions.
   
   This approach is significantly slower compared to architectures with native
   support. After evaluating all available AArch64 SVE instructions and
   experimenting with various implementations (such as looping over the active
   elements, extraction, and insertion), I confirmed that the existing algorithm
   is optimal given the instruction set. However, there is still room for
   optimization in the following two aspects:
   1. Merging with `index + tbl` is suboptimal due to the high latency of
   the `index` instruction.
   2. For partial subword types, operations to the highest half are unnecessary
   because those bits are invalid.
   
   This pull request introduces the following changes:
   1. Replaces `index + tbl` with the `whilelt + splice` instructions, which
   offer lower latency and higher throughput.
   2. Eliminates unnecessary compress operations for partial subword type cases.
   3. For `sve_compress_byte`, one less temporary register is used to alleviate
   potential register pressure.
   
   Benchmark results demonstrate that these changes significantly improve performance.
   
   Benchmarks on Nvidia Grace machine with 128-bit SVE:
   ```
   Benchmark                Unit    Before   Error  After    Error  Uplift
   Byte128Vector.compress   ops/ms  4846.97  26.23  6638.56  31.60  1.36
   Byte64Vector.compress    ops/ms  2447.69  12.95  7167.68  34.49  2.92
   Short128Vector.compress  ops/ms  7174.88  40.94  8398.45   9.48  1.17
   Short64Vector.compress   ops/ms  3618.72   3.04  8618.22  10.91  2.38
   ```
   
   This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments,
   and all tests passed.

-------------

Changes: https://git.openjdk.org/jdk/pull/27188/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27188&range=02
  Stats: 420 lines in 9 files changed: 303 ins; 24 del; 93 mod
  Patch: https://git.openjdk.org/jdk/pull/27188.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27188/head:pull/27188

PR: https://git.openjdk.org/jdk/pull/27188

From duke at openjdk.org  Tue Sep 23 10:00:57 2025
From: duke at openjdk.org (erifan)
Date: Tue, 23 Sep 2025 10:00:57 GMT
Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of
 vector compress [v3]
In-Reply-To: <jPOasUwP5m_uEo6K07ybBr_QQKmv-vunDU-78Kz6VWg=.6d66e01a-03a2-477a-8368-de983eaa88c6@github.com>
References: <tP4CE7n1GaoLjhB6BfpKBNaX_GCTwI-cJjUbanwm4Qg=.eb29b9ce-10b0-4d5c-8407-25deda9c3f6d@github.com>
 <jPOasUwP5m_uEo6K07ybBr_QQKmv-vunDU-78Kz6VWg=.6d66e01a-03a2-477a-8368-de983eaa88c6@github.com>
Message-ID: <K75xpUg3sG9NBOTfAY4uQwWZBwBf8ELb_YvgSsmxR1c=.0647dd84-17c0-401c-827f-69f965167c75@github.com>

On Tue, 23 Sep 2025 09:54:53 GMT, erifan <duke at openjdk.org> wrote:

>> The AArch64 SVE and SVE2 architectures lack an instruction suitable for subword-type `compress` operations. Therefore, the current implementation uses the 32-bit SVE `compact` instruction to compress subword types by first widening the high and low parts to 32 bits, compressing them, and then narrowing them back to their original type. Finally, the high and low parts are merged using the `index + tbl` instructions.
>> 
>> This approach is significantly slower compared to architectures with native support. After evaluating all available AArch64 SVE instructions and experimenting with various implementations (such as looping over the active elements, extraction, and insertion), I confirmed that the existing algorithm is optimal given the instruction set. However, there is still room for optimization in the following two aspects:
>> 1. Merging with `index + tbl` is suboptimal due to the high latency of the `index` instruction.
>> 2. For partial subword types, operations to the highest half are unnecessary because those bits are invalid.
>> 
>> This pull request introduces the following changes:
>> 1. Replaces `index + tbl` with the `whilelt + splice` instructions, which offer lower latency and higher throughput.
>> 2. Eliminates unnecessary compress operations for partial subword type cases.
>> 3. For `sve_compress_byte`, one less temporary register is used to alleviate potential register pressure.
>> 
>> Benchmark results demonstrate that these changes significantly improve performance.
>> 
>> Benchmarks on Nvidia Grace machine with 128-bit SVE:
>> 
>> Benchmark	            Unit	Before	 Error	After	 Error	Uplift
>> Byte128Vector.compress	ops/ms	4846.97	 26.23	6638.56	 31.60	1.36
>> Byte64Vector.compress	ops/ms	2447.69	 12.95	7167.68	 34.49	2.92
>> Short128Vector.compress	ops/ms	7174.88	 40.94	8398.45	 9.48	1.17
>> Short64Vector.compress	ops/ms	3618.72	 3.04	8618.22	 10.91	2.38
>> 
>> 
>> This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, and all tests passed.
>
> erifan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:
> 
>  - Improve some code style
>  - Merge branch 'master' into JDK-8366333-compress
>  - Merge branch 'master' into JDK-8366333-compress
>  - 8366333: AArch64: Enhance SVE subword type implementation of vector compress
>    
>    The AArch64 SVE and SVE2 architectures lack an instruction suitable for
>    subword-type `compress` operations. Therefore, the current implementation
>    uses the 32-bit SVE `compact` instruction to compress subword types by
>    first widening the high and low parts to 32 bits, compressing them, and
>    then narrowing them back to their original type. Finally, the high and
>    low parts are merged using the `index + tbl` instructions.
>    
>    This approach is significantly slower compared to architectures with native
>    support. After evaluating all available AArch64 SVE instructions and
>    experimenting with various implementations?such as looping over the active
>    elements, extraction, and insertion?I confirmed that the existing algorithm
>    is optimal given the instruction set. However, there is still room for
>    optimization in the following two aspects:
>    1. Merging with `index + tbl` is suboptimal due to the high latency of
>    the `index` instruction.
>    2. For partial subword types, operations to the highest half are unnecessary
>    because those bits are invalid.
>    
>    This pull request introduces the following changes:
>    1. Replaces `index + tbl` with the `whilelt + splice` instructions, which
>    offer lower latency and higher throughput.
>    2. Eliminates unnecessary compress operations for partial subword type cases.
>    3. For `sve_compress_byte`, one less temporary register is used to alleviate
>    potential register pressure.
>    
>    Benchmark results demonstrate that these changes significantly improve performance.
>    
>    Benchmarks on Nvidia Grace machine with 128-bit SVE:
>    ```
>    Benchmark	        Unit	Before	 Error	After	 Error	Uplift
>    Byte128Vector.compress	ops/ms	4846.97	 26.23	6638.56	 31.60	1.36
>    Byte64Vector.compress	ops/ms	2447.69	 12.95	7167.68	 34.49	2.92
>    Short128Vector.compress	ops/ms	7174.88	 40.94	8398.45	 9.48	1.17
>    Short64Vector.compress	ops/ms	3618.72	 3.04	8618.22	 10.91	2.38
>    ```
>    
>    This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments,
>    and all tests passed.

Thanks for your review @eme64 . Have a nice trip!

-------------

PR Review: https://git.openjdk.org/jdk/pull/27188#pullrequestreview-3257150852

From duke at openjdk.org  Tue Sep 23 10:01:02 2025
From: duke at openjdk.org (erifan)
Date: Tue, 23 Sep 2025 10:01:02 GMT
Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of
 vector compress [v2]
In-Reply-To: <EsiYouuvqjFpbUVOKPtUBymx12t--iEc7QNUwBrdDJo=.545aa38b-8933-44a7-9ae5-51872308596c@github.com>
References: <tP4CE7n1GaoLjhB6BfpKBNaX_GCTwI-cJjUbanwm4Qg=.eb29b9ce-10b0-4d5c-8407-25deda9c3f6d@github.com>
 <NsYuK9-Y_-7KzBniFLpkNeiLslPm-a83mE_GMvMN7oo=.109f1b82-42b1-4f09-b8af-99bc2a9f2528@github.com>
 <EsiYouuvqjFpbUVOKPtUBymx12t--iEc7QNUwBrdDJo=.545aa38b-8933-44a7-9ae5-51872308596c@github.com>
Message-ID: <cNMiRf91z-H_yl6qHNtvczMcsU0w50W27JwXDh2BojM=.7ce8e47f-eda8-4b76-8b7d-9d98da67912c@github.com>

On Tue, 16 Sep 2025 06:54:06 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> erifan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>> 
>>  - Merge branch 'master' into JDK-8366333-compress
>>  - 8366333: AArch64: Enhance SVE subword type implementation of vector compress
>>    
>>    The AArch64 SVE and SVE2 architectures lack an instruction suitable for
>>    subword-type `compress` operations. Therefore, the current implementation
>>    uses the 32-bit SVE `compact` instruction to compress subword types by
>>    first widening the high and low parts to 32 bits, compressing them, and
>>    then narrowing them back to their original type. Finally, the high and
>>    low parts are merged using the `index + tbl` instructions.
>>    
>>    This approach is significantly slower compared to architectures with native
>>    support. After evaluating all available AArch64 SVE instructions and
>>    experimenting with various implementations (such as looping over the active
>>    elements, extraction, and insertion), I confirmed that the existing algorithm
>>    is optimal given the instruction set. However, there is still room for
>>    optimization in the following two aspects:
>>    1. Merging with `index + tbl` is suboptimal due to the high latency of
>>    the `index` instruction.
>>    2. For partial subword types, operations to the highest half are unnecessary
>>    because those bits are invalid.
>>    
>>    This pull request introduces the following changes:
>>    1. Replaces `index + tbl` with the `whilelt + splice` instructions, which
>>    offer lower latency and higher throughput.
>>    2. Eliminates unnecessary compress operations for partial subword type cases.
>>    3. For `sve_compress_byte`, one less temporary register is used to alleviate
>>    potential register pressure.
>>    
>>    Benchmark results demonstrate that these changes significantly improve performance.
>>    
>>    Benchmarks on Nvidia Grace machine with 128-bit SVE:
>>    ```
>>    Benchmark	        Unit	Before	 Error	After	 Error	Uplift
>>    Byte128Vector.compress	ops/ms	4846.97	 26.23	6638.56	 31.60	1.36
>>    Byte64Vector.compress	ops/ms	2447.69	 12.95	7167.68	 34.49	2.92
>>    Short128Vector.compress	ops/ms	7174.88	 40.94	8398.45	 9.48	1.17
>>    Short64Vector.compress	ops/ms	3618.72	 3.04	8618.22	 10.91	2.38
>>    ```
>>    
>>    This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments,
>>    and all tests passed.
>
> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2287:
> 
>> 2285:   sve_compress_short(dst, vtmp1, ptmp, vtmp2, vtmp3, pgtmp, extended_size > MaxVectorSize ? MaxVectorSize : extended_size);
>> 2286:   // Narrow the result back to type BYTE.
>> 2287:   // dst   = 0 0 0 0 0 0 0 0 0 0 0 0 0 g c a
> 
> Can you make sure that your examples are all nicely aligned?

Done, thanks.

> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2315:
> 
>> 2313:   // Combine the compressed low with the compressed high.
>> 2314:   // dst   = 0 0 0 0 0 0 0 0 0 0 0 p i g c a
>> 2315:   sve_splice(dst, B, ptmp, vtmp1);
> 
> Alignment of examples would be nice

Done

> test/hotspot/jtreg/compiler/vectorapi/VectorCompressTest.java line 214:
> 
>> 212: 
>> 213:     @Test
>> 214:     @IR(counts = { IRNode.COMPRESS_VD, "= 1" }, applyIfCPUFeature = { "sve", "true" })
> 
> Could you please change this so that the `applyIfCPUFeature` is on a new line?
> That would make it easier to add more platforms later :)

Done

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27188#discussion_r2371774311
PR Review Comment: https://git.openjdk.org/jdk/pull/27188#discussion_r2371775262
PR Review Comment: https://git.openjdk.org/jdk/pull/27188#discussion_r2371776740

From duke at openjdk.org  Tue Sep 23 10:01:05 2025
From: duke at openjdk.org (erifan)
Date: Tue, 23 Sep 2025 10:01:05 GMT
Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of
 vector compress [v2]
In-Reply-To: <WxnlIDe3oqkkECuLdBvLKF3XxKxVw8VSL-v2jSpyfgY=.d2d9b002-be0b-4d75-a72d-6ac2affd9cd1@github.com>
References: <tP4CE7n1GaoLjhB6BfpKBNaX_GCTwI-cJjUbanwm4Qg=.eb29b9ce-10b0-4d5c-8407-25deda9c3f6d@github.com>
 <NsYuK9-Y_-7KzBniFLpkNeiLslPm-a83mE_GMvMN7oo=.109f1b82-42b1-4f09-b8af-99bc2a9f2528@github.com>
 <EsiYouuvqjFpbUVOKPtUBymx12t--iEc7QNUwBrdDJo=.545aa38b-8933-44a7-9ae5-51872308596c@github.com>
 <Qrf5fEdzfsAlFRuEf2DrPf7Thj4xkdOhM_pjWv3j82Y=.1272dda2-b5bd-4649-a87d-d086f5d99fea@github.com>
 <WxnlIDe3oqkkECuLdBvLKF3XxKxVw8VSL-v2jSpyfgY=.d2d9b002-be0b-4d75-a72d-6ac2affd9cd1@github.com>
Message-ID: <T0W_kxxCLhBoOkAXi_ZlCz_iiEHfsonW3MLwu-bc5eA=.386320b2-7a55-40b8-8f1d-75cd6bebbde6@github.com>

On Wed, 17 Sep 2025 06:25:09 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> It seems that the summary and the PR title are usually consistent. Is there any convention or rule for this?
>
> I think that people often just do whatever they feel like. But I think the summary should summarize the content of the test, give maybe a reason for the test. Sometimes the PR title captures the intent of the test, then I'm fine with that. But sometimes the PR title is not quite adequate, maybe too narrow like here. But it is not a big deal, just a little nit ;)

Done

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27188#discussion_r2371776192

From duke at openjdk.org  Tue Sep 23 10:10:49 2025
From: duke at openjdk.org (erifan)
Date: Tue, 23 Sep 2025 10:10:49 GMT
Subject: RFR: 8367391: Loss of precision on implicit conversion in
 vectornode.cpp
Message-ID: <pDqLghIz44U2SLx5QaJfV6aK-tt9OoKx2vCmc3L2-WE=.0b282db5-dd33-4871-a0cf-33283eff024b@github.com>

There is an issue with the fix for JDK-8356760 on windows-x64, related to the following lines:

  long mask = (-1ULL >> (64 - vlen));
  long bit  = type->get_con() & mask;


`-1ULL` is an unsigned **64-bit** value; on Linux/macOS-x64, `long` is **64** bits, but on Windows-x64 it's **32** bits. When `-1ULL >> (64 - vlen)` is assigned to a `long` on Windows-x64, the **64-bit** result is truncated to **32** bits, causing precision loss as the upper 32 bits are discarded.

This pull request addresses the issue by replacing the `long` type with `jlong`. The fix has been verified on a Windows x64 machine with AVX-512 support and resolves the reported problem.

-------------

Commit messages:
 - 8367391: Loss of precision on implicit conversion in vectornode.cpp

Changes: https://git.openjdk.org/jdk/pull/27449/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27449&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367391
  Stats: 33 lines in 2 files changed: 6 ins; 0 del; 27 mod
  Patch: https://git.openjdk.org/jdk/pull/27449.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27449/head:pull/27449

PR: https://git.openjdk.org/jdk/pull/27449

From chagedorn at openjdk.org  Tue Sep 23 11:42:04 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Tue, 23 Sep 2025 11:42:04 GMT
Subject: RFR: 8367391: Loss of precision on implicit conversion in
 vectornode.cpp
In-Reply-To: <pDqLghIz44U2SLx5QaJfV6aK-tt9OoKx2vCmc3L2-WE=.0b282db5-dd33-4871-a0cf-33283eff024b@github.com>
References: <pDqLghIz44U2SLx5QaJfV6aK-tt9OoKx2vCmc3L2-WE=.0b282db5-dd33-4871-a0cf-33283eff024b@github.com>
Message-ID: <KFFFqjZN8t0d27h5gkFAi-nIlmBRZvK09G_S3awY214=.ad7eca4e-6b83-46cf-ad5e-2fc68a7e48e1@github.com>

On Tue, 23 Sep 2025 10:03:09 GMT, erifan <duke at openjdk.org> wrote:

> There is an issue with the fix for JDK-8356760 on Windows x64, related to the following lines:
> 
>   long mask = (-1ULL >> (64 - vlen));
>   long bit  = type->get_con() & mask;
> 
> 
> `-1ULL` is an unsigned **64-bit** value. On Linux/macOS x64, `long` is **64** bits, but on Windows x64 it's **32** bits. When `-1ULL >> (64 - vlen)` is assigned to a `long` on Windows x64, the **64-bit** result is truncated to **32** bits, causing a loss of precision because the upper 32 bits are discarded.
> 
> This pull request addresses the issue by replacing the `long` type with `jlong`. The fix has been verified on a Windows x64 machine with AVX-512 support and resolves the reported problem.

That was easy to miss, thanks for the fix. Looks good to me!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27449#pullrequestreview-3257541918

From roland at openjdk.org  Tue Sep 23 11:48:36 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Tue, 23 Sep 2025 11:48:36 GMT
Subject: RFR: 8367391: Loss of precision on implicit conversion in
 vectornode.cpp
In-Reply-To: <pDqLghIz44U2SLx5QaJfV6aK-tt9OoKx2vCmc3L2-WE=.0b282db5-dd33-4871-a0cf-33283eff024b@github.com>
References: <pDqLghIz44U2SLx5QaJfV6aK-tt9OoKx2vCmc3L2-WE=.0b282db5-dd33-4871-a0cf-33283eff024b@github.com>
Message-ID: <yuZcr-sFkUa4JC5-W8PoAYqmB_zL0RMc1i9-T9N8wEI=.46fbab03-ab44-49ac-8c6e-f5d16b69bc38@github.com>

On Tue, 23 Sep 2025 10:03:09 GMT, erifan <duke at openjdk.org> wrote:

> There is an issue with the fix for JDK-8356760 on Windows x64, related to the following lines:
> 
>   long mask = (-1ULL >> (64 - vlen));
>   long bit  = type->get_con() & mask;
> 
> 
> `-1ULL` is an unsigned **64-bit** value. On Linux/macOS x64, `long` is **64** bits, but on Windows x64 it's **32** bits. When `-1ULL >> (64 - vlen)` is assigned to a `long` on Windows x64, the **64-bit** result is truncated to **32** bits, causing a loss of precision because the upper 32 bits are discarded.
> 
> This pull request addresses the issue by replacing the `long` type with `jlong`. The fix has been verified on a Windows x64 machine with AVX-512 support and resolves the reported problem.

Looks good to me too.

-------------

Marked as reviewed by roland (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27449#pullrequestreview-3257568107

From vlivanov at openjdk.org  Tue Sep 23 14:15:38 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 23 Sep 2025 14:15:38 GMT
Subject: RFR: 8367967: C2: "fatal error: Not monotonic" with Mod nodes [v3]
In-Reply-To: <EDPz9FTr56gUdjcVeokF55vuUSb71g-806n8urrSrVM=.0ea0eebc-9545-48b8-9acf-35f592c14c71@github.com>
References: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
 <EDPz9FTr56gUdjcVeokF55vuUSb71g-806n8urrSrVM=.0ea0eebc-9545-48b8-9acf-35f592c14c71@github.com>
Message-ID: <TjIJ8kANFmXd5c4CRKd2yndTcblECPfgiuXM3lT4pS0=.cf5fda40-7cef-4f25-a74c-4f7a9899dfa2@github.com>

On Tue, 23 Sep 2025 08:28:00 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> Generally, we shouldn't return a wider type (ZERO) if a later case would return a narrower type (TOP) for the same input types. If the inputs are widened so that the first case no longer matches but the later one still does, the result is not monotonic with respect to the previous result.
>> 
>> Please review :)
>
> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
> 
>   move test

src/hotspot/share/opto/divnode.cpp line 1209:

> 1207:   if (t2 == Type::TOP) { return Type::TOP; }
> 1208: 
> 1209:   // Mod by zero?  Throw exception at runtime!

The comment is a bit confusing. It's not the node itself that produces the exception, but a dominating zero check (inserted during parsing). So, if a divisor becomes 0, it means the node is effectively dead and can go away.

Also, the node should go away anyway as part of CFG pruning of dead branches when the corresponding guard goes away.

BTW, if there are cases where control is not eliminated, it may irrevocably break the IR, causing crashes down the road (take a look at JDK-8154831 as an example). So, maybe it's safer to just rely on dead control pruning to eliminate effectively dead ModI/ModL nodes, and assert that no effectively dead ModI/ModL nodes are present after the GVN pass is over.
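The monotonicity rule quoted from the PR description can be illustrated with a toy model. This is a hypothetical three-level lattice (`Top` narrower than a constant, a constant narrower than `Int`), not C2's `TypeInt` or the actual `ModINode::Value()`; it only shows why checking for a zero dividend before a zero divisor can make the result narrower when an input is widened.

```cpp
#include <cstdint>

// Hypothetical toy lattice, from narrowest to widest:
// Top (no values) < Con(k) (exactly k) < Int (any int).
struct ToyType {
  enum Kind { Top, Con, Int } kind;
  int32_t value;  // only meaningful when kind == Con
};

constexpr ToyType top() { return {ToyType::Top, 0}; }
constexpr ToyType con(int32_t k) { return {ToyType::Con, k}; }
constexpr ToyType any_int() { return {ToyType::Int, 0}; }

bool is_zero(ToyType t) { return t.kind == ToyType::Con && t.value == 0; }

// True if a is at most as wide as b in the toy lattice.
bool narrower_or_equal(ToyType a, ToyType b) {
  if (a.kind == ToyType::Top || b.kind == ToyType::Int) return true;
  return a.kind == ToyType::Con && b.kind == ToyType::Con && a.value == b.value;
}

// Non-monotonic ordering: the wider result (Con(0)) is checked first.
ToyType mod_value_unordered(ToyType a, ToyType b) {
  if (a.kind == ToyType::Top || b.kind == ToyType::Top) return top();
  if (is_zero(a)) return con(0);  // 0 % y == 0
  if (is_zero(b)) return top();   // x % 0: the dominating zero check makes this dead
  if (a.kind == ToyType::Con && b.kind == ToyType::Con) {
    // Computed in 64 bits so that INT32_MIN % -1 cannot overflow.
    return con(static_cast<int32_t>(static_cast<int64_t>(a.value) % b.value));
  }
  return any_int();
}

// Monotonic ordering: the case returning the narrower type (Top) comes first.
ToyType mod_value_ordered(ToyType a, ToyType b) {
  if (a.kind == ToyType::Top || b.kind == ToyType::Top) return top();
  if (is_zero(b)) return top();
  if (is_zero(a)) return con(0);
  if (a.kind == ToyType::Con && b.kind == ToyType::Con) {
    return con(static_cast<int32_t>(static_cast<int64_t>(a.value) % b.value));
  }
  return any_int();
}
```

With a zero divisor, widening the dividend from the constant 0 to `Int` moves the result of `mod_value_unordered` from `Con(0)` to `Top`, i.e. the result becomes narrower, whereas `mod_value_ordered` returns `Top` in both cases.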

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27408#discussion_r2372479648

From rcastanedalo at openjdk.org  Tue Sep 23 14:35:08 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 23 Sep 2025 14:35:08 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v32]
In-Reply-To: <_3SEByIuKhkAQvZ9gvMOHYMH2y_Xh9F4UM1lS2ixzpw=.f572fe77-31f0-4724-9611-9f53231d6bec@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <_3SEByIuKhkAQvZ9gvMOHYMH2y_Xh9F4UM1lS2ixzpw=.f572fe77-31f0-4724-9611-9f53231d6bec@github.com>
Message-ID: <Ph1cqeYUFa06PoXorpykpGSeOOMBmVNTkEalJRUy2wM=.e3463a64-d42b-4b5c-b91f-c4813c24a73a@github.com>

On Fri, 19 Sep 2025 16:02:35 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> If a method has a large number of parameters, we currently bail out from C2 compilation.
>> 
>> ### Changeset
>> 
>> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
>> 
>> Changes:
>> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
>> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
>> - Remove all `can_represent` checks and bailouts.
>> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
>> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
>> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
>> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no...
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Increase timeout for TestMethodArguments.java
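The first and fourth bullet points in the description (a statically allocated base that is only extended dynamically when needed, and an early exit in `RegMask::overlap`) can be sketched as follows. `GrowableMask`, its inline size of four 64-bit words, and the use of `std::vector` are assumptions made to keep the example short and self-contained; this is not C2's `RegMask`, which does not use standard library containers.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bit mask with a fixed inline part and an optional heap extension.
class GrowableMask {
 public:
  static constexpr size_t kInlineWords = 4;   // assumed size of the static part
  static constexpr size_t kBitsPerWord = 64;

  void insert(size_t bit) {
    size_t w = bit / kBitsPerWord;
    if (w >= kInlineWords) {
      size_t ext = w - kInlineWords;
      if (ext >= extension_.size()) extension_.resize(ext + 1, 0);  // grow only when needed
    }
    word(w) |= uint64_t{1} << (bit % kBitsPerWord);
  }

  bool member(size_t bit) const {
    size_t w = bit / kBitsPerWord;
    return w < size_in_words() && ((word_at(w) >> (bit % kBitsPerWord)) & 1) != 0;
  }

  // True if the masks share a bit; stops at the first word with a common bit.
  bool overlap(const GrowableMask& other) const {
    size_t n = std::min(size_in_words(), other.size_in_words());
    for (size_t w = 0; w < n; w++) {
      if ((word_at(w) & other.word_at(w)) != 0) return true;  // early exit
    }
    return false;
  }

  size_t size_in_words() const { return kInlineWords + extension_.size(); }

 private:
  uint64_t word_at(size_t w) const {
    return w < kInlineWords ? inline_[w] : extension_[w - kInlineWords];
  }
  uint64_t& word(size_t w) {
    return w < kInlineWords ? inline_[w] : extension_[w - kInlineWords];
  }

  uint64_t inline_[kInlineWords] = {};  // common case: no dynamic allocation
  std::vector<uint64_t> extension_;     // rare case: many stack slots
};
```

Keeping the inline part fixed means that masks which never touch high stack slots behave exactly as before, and the extension is only paid for by the rare methods that need it.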

Looks good to me, thanks for the good work done here, Daniel! I only have a couple of minor suggestions.

src/hotspot/share/opto/regmask.hpp line 89:

> 87: 
> 88:   // RM_SIZE_IN_INTS_MIN, but in number of machine words
> 89:   static const unsigned int RM_SIZE_IN_WORDS_MIN =

This appears to be unused, consider removing.

src/hotspot/share/opto/regmask.hpp line 198:

> 196:   // (for a made-up platform with 10 registers and 4-bit
> 197:   // words) that has been extended with two additional words to represent more
> 198:   // stack locations:

Suggestion:

  // (for a made-up platform with 10 registers and 4-bit words) that has been
  // extended with two additional words to represent more stack locations:

src/hotspot/share/opto/regmask.hpp line 417:

> 415: 
> 416:   // A constructor only used by the ADLC output.  All mask fields are filled
> 417:   // in directly.  Calls to this look something like RM(1,2,3,4);

Consider updating this comment after the introduction of the `infinite_stack` parameter:

Suggestion:

  // in directly.  Calls to this look something like RM(0xc0, 0x0, 0x0, false);

test/hotspot/jtreg/compiler/arguments/TestMaxMethodArguments.java line 63:

> 61: 
> 62:     public static int test(int x1, int x2, int x3, int x4, int x5, int x6, int x7, int x8, int x9, int x10, int x11, int x12, int x13, int x14, int x15, int x16, int x17, int x18, int x19, int x20, int x21, int x22, int x23, int x24, int x25, int x26, int x27, int x28, int x29, int x30, int x31, int x32, int x33, int x34, int x35, int x36, int x37, int x38, int x39, int x40, int x41, int x42, int x43, int x44, int x45, int x46, int x47, int x48, int x49, int x50, int x51, int x52, int x53, int x54, int x55, int x56, int x57, int x58, int x59, int x60, int x61, int x62, int x63, int x64, int x65, int x66, int x67, int x68, int x69, int x70, int x71, int x72, int x73, int x74, int x75, int x76, int x77, int x78, int x79, int x80, int x81, int x82, int x83, int x84, int x85, int x86, int x87, int x88, int x89, int x90, int x91, int x92, int x93, int x94, int x95, int x96, int x97, int x98, int x99, int x100, int x101, int x102, int x103, int x104, int x105, int x106, int x107, int x108, int x109, int x110, int x111, int x112, int x113, int x114, int x115, int x116, int x117, int x118, int x119, int x120, int x121, int x122, int x123, int x124, int x125, int x126, int x127, int x128, int x129, int x130, int x131, int x132, int x133, int x134, int x135, int x136, int x137, int x138, int x139, int x140, int x141, int x142, int x143, int x144, int x145, int x146, int x147, int x148, int x149, int x150, int x151, int x152, int x153, int x154, int x155, int x156, int x157, int x158, int x159, int x160, int x161, int x162, int x163, int x164, int x165, int x166, int x167, int x168, int x169, int x170, int x171, int x172, int x173, int x174, int x175, int x176, int x177, int x178, int x179, int x180, int x181, int x182, int x183, int x184, int x185, int x186, int x187, int x188, int x189, int x190, int x191, int x192, int x193, int x194, int x195, int x196, int x197, int x198, int x199, int x200, int x201, int x202, int x203, int x204, int x205, int x206, int x207, int x208, int x209, int x210, int x211, int x212, int x213, int x214, int x215, int x216, int x217, int x218, int x219, int x220, int x221, int x222, int x223, int x224, int x225, int x226, int x227, int x228, int x229, int x230, int x231, int x232, int x233, int x234, int x235, int x236, int x237, int x238, int x239, int x240, int x241, int x242, int x243, int x244, int x245, int x246, int x247, int x248, int x249, int x250, int x251, int x252, int x253, int x254, int x255) throws TestException {
> 63:         // Exceptions after every definition of a temporary forces the

Suggestion:

        // Exceptions after every definition of a temporary force the

-------------

Marked as reviewed by rcastanedalo (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-3258293347
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2372522663
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2372525679
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2372529815
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2372531549

From vlivanov at openjdk.org  Tue Sep 23 14:43:29 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 23 Sep 2025 14:43:29 GMT
Subject: RFR: 8350468: x86: Improve implementation of vectorized
 numberOfLeadingZeros for int and long
In-Reply-To: <_YfxynUy7BxtFo15BZG2bdhrUaCkIPSC6l8fTAVyJE8=.ffc4ce38-d9bd-4ab7-b5dd-0ffd847d5c2d@github.com>
References: <_YfxynUy7BxtFo15BZG2bdhrUaCkIPSC6l8fTAVyJE8=.ffc4ce38-d9bd-4ab7-b5dd-0ffd847d5c2d@github.com>
Message-ID: <Pq72U2VOOpl6TaTP1uzv_w5vpyCo8wEwI08BM90YObI=.0f38e5d8-fcb3-46a5-bc12-216a1ccf3d19@github.com>

On Mon, 4 Aug 2025 02:20:31 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

> Hi all,
> This is a patch that optimizes the x86 backend implementation of `CountLeadingZerosV` for int and long. In the review of [JDK-8349637](https://bugs.openjdk.org/browse/JDK-8349637), an [optimized algorithm](https://github.com/openjdk/jdk/pull/23579#issuecomment-2661332497) was proposed by @rgiulietti, which this PR implements. For integer operands, the optimized algorithm reduces the number of vector instructions from 19 to 13. The same algorithm does not work for long operands, however, since AVX2 lacks a vectorized long->double conversion instruction. Instead, I found an optimized algorithm that reuses the code for int and computes the leading zeros for long with only 4 additional instructions. I added a benchmark, and on my Zen 3 machine I get these results:
> 
>                                  Baseline                        Patch        
> Benchmark              Mode  Cnt    Score   Error  Units    Score   Error  Units  Improvement
> LeadingZeros.testInt   avgt   15   91.097 ± 3.276  ns/op   68.665 ± 1.740  ns/op  (+ 28.1%)
> LeadingZeros.testLong  avgt   15  342.545 ± 4.470  ns/op  228.668 ± 5.994  ns/op  (+ 39.9%)
> 
> I've updated the unit tests to more thoroughly test longs and they pass on my machine. Thoughts and reviews would be appreciated!
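As background for the conversion-based approach, here is a scalar sketch of the underlying idea: converting a nonzero 32-bit value to `double` is exact, and the unbiased exponent of the result equals the index of its highest set bit, so the leading-zero count is `31 - exponent`. This is a hypothetical illustration assuming IEEE 754 binary64 `double`, not the vector instruction sequence in the PR; in particular, `clz64_via_halves` simply combines two 32-bit counts and is not necessarily the 4-instruction scheme described above.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scalar sketch: count leading zeros of a 32-bit value by reading
// the exponent of its exact double representation (assumes IEEE 754 binary64).
int clz32_via_double(uint32_t x) {
  if (x == 0) return 32;                    // no set bit: the exponent trick does not apply
  double d = static_cast<double>(x);        // exact, since x < 2^53
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);      // reinterpret the bit pattern of d
  int exponent = static_cast<int>((bits >> 52) & 0x7FF) - 1023;  // index of highest set bit
  return 31 - exponent;
}

// Hypothetical 64-bit variant built from two 32-bit counts.
int clz64_via_halves(uint64_t x) {
  uint32_t hi = static_cast<uint32_t>(x >> 32);
  uint32_t lo = static_cast<uint32_t>(x);
  return hi != 0 ? clz32_via_double(hi) : 32 + clz32_via_double(lo);
}
```

The vector versions apply the same idea lane by lane; the scalar form is only meant to make the arithmetic easy to check.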

Looks good. I'll submit it for testing.

-------------

PR Review: https://git.openjdk.org/jdk/pull/26610#pullrequestreview-3258348601

From bmaillard at openjdk.org  Tue Sep 23 14:43:51 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Tue, 23 Sep 2025 14:43:51 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop [v6]
In-Reply-To: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
References: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
Message-ID: <uslSZHqByK4m7mzyRcHop88pkf6_wt8leS0bqDerYoo=.bff3fd72-45fa-4f74-ba39-04091285d095@github.com>

> This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`.
> 
> ### Context
> 
> The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn, I was able to reduce the reproducer to something very simple. After being compiled by C2, the execution of the following method led to the last statement (`x = 0`) being ignored:
> 
> 
>     static public void test() {
>         x = 0;
>         for (int i = 0; i < 20000; i++) {
>             x += i;
>         }
>         x = 0;
>     }
> 
> 
> After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long-running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. `Store` nodes are moved out of the inner loop when encountering this pattern, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations that take place later. This was initially addressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here, as macro expansion happens too late.
> 
> This PR aims at addressing the specific case of the wrong wiring of `Store` nodes in _post_ loops, but in the longer term, further investigation into the missing `Phi` node issue is necessary, as it is likely to cause other issues (cf. related JBS issues).
> 
> ### Detailed Analysis
> 
> In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`.  When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later.
> 
> This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect.
> 
> This is what the IR looks like after the creation of the post lo...
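To make the loop shapes easier to picture, here is a hypothetical source-level analogue of strip mining combined with store sinking for the reproducer above. It is not C2 IR and does not reproduce the bug (which lies in how `insert_post_loop` rewires memory inputs); `strip_mined`, `safepoint_poll`, the global `x`, and the `strip` parameter (loosely corresponding to `-XX:LoopStripMiningIter`) are illustrative assumptions.

```cpp
// Hypothetical stand-in for the static field written by the reproducer.
int x;

// Hypothetical placeholder for a safepoint poll.
void safepoint_poll() {}

// Source-level analogue (not C2 IR) of a strip-mined loop whose store has been
// sunk out of the inner loop. Requires n >= 0 and strip >= 1.
// Returns the value of x observed just before the final `x = 0`.
int strip_mined(int n, int strip) {
  x = 0;
  int i = 0;
  while (i < n) {                                 // outer loop, cf. OuterStripMinedLoop
    int limit = (n - i > strip) ? i + strip : n;  // avoids overflow in i + strip
    int acc = x;                                  // value carried in a register
    for (; i < limit; i++) acc += i;              // inner counted loop: no store inside
    x = acc;                                      // store sunk below the inner loop
    safepoint_poll();                             // safepoint sits in the outer loop
  }
  int before_reset = x;
  x = 0;                                          // the store the bug caused to be lost
  return before_reset;
}
```

In this shape the sunk store sits between the inner loop exit and the outer loop's safepoint, which is the chain that the post-loop cloning has to rewire correctly.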

Benoît Maillard has updated the pull request incrementally with two additional commits since the last revision:

 - Change last test and add comments
 - More refactoring

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27225/files
  - new: https://git.openjdk.org/jdk/pull/27225/files/cc818739..32686981

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27225&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27225&range=04-05

  Stats: 58 lines in 2 files changed: 40 ins; 2 del; 16 mod
  Patch: https://git.openjdk.org/jdk/pull/27225.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27225/head:pull/27225

PR: https://git.openjdk.org/jdk/pull/27225

From dlunden at openjdk.org  Tue Sep 23 14:51:10 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Tue, 23 Sep 2025 14:51:10 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v33]
In-Reply-To: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
Message-ID: <wBkA2djfIlxl9tHTav_c0175hLKdGvYocJtvAPyGQdw=.64036373-ccda-4ecc-9077-3397db9e2719@github.com>

> If a method has a large number of parameters, we currently bail out from C2 compilation.
> 
> ### Changeset
> 
> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
> 
> Changes:
> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
> - Remove all `can_represent` checks and bailouts.
> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
> 
> ### Testing
> 
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it).
> 
> ![c2-regression](https:/...

Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:

  Update after Roberto's comments. Do not run TestMethodArguments under Xcomp. Further bump TestMethodArguments timeout to 1000 seconds.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20404/files
  - new: https://git.openjdk.org/jdk/pull/20404/files/1dd5084f..b61dd25c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=32
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=31-32

  Stats: 11 lines in 3 files changed: 1 ins; 5 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/20404.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404

PR: https://git.openjdk.org/jdk/pull/20404

From dlunden at openjdk.org  Tue Sep 23 14:51:14 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Tue, 23 Sep 2025 14:51:14 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v11]
In-Reply-To: <Wlji0f1gsOXK2jeh--1BFWrOnsEny-h73uysAQe-rfU=.36c13175-5397-4e71-8f06-ed8533dbb365@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com>
 <Wlji0f1gsOXK2jeh--1BFWrOnsEny-h73uysAQe-rfU=.36c13175-5397-4e71-8f06-ed8533dbb365@github.com>
Message-ID: <S58uQYnMPLbeIiM3frIyJe4_UF6lFhGkxRY_okB-5V4=.fabeb2ab-3681-4323-8bf7-f85876cc4e4c@github.com>

On Fri, 28 Mar 2025 13:24:09 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Extend example with offset register mask
>
> As we discussed offline, the test coverage of register mask operations with extended dynamic parts, non-zero offsets, etc. is fairly low (basically limited to the new JTReg tests included in this changeset). To increase coverage, I have extended `test_regmask.cpp` with tests that perform random operations on a register mask and on a reference bit set and check that the result is equivalent on both data structures. Here is the extension: https://github.com/openjdk/jdk/commit/4ee703f1ab73f8f43d4603d7fa88dcc8f4950ec0. I ran the random tests a few times on different platforms and could not find any failure, which gives good confidence in the correctness of the register mask operation changes. I also tested the effectiveness of the tests themselves by injecting a few failures into the register mask implementation and confirming that they were detected. Feel free to include the test extensions in this changeset (you might want to go through the code and clean it up a bit first, though, e.g. for naming consistency).
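The randomized differential testing approach described above can be sketched as follows: apply the same random operations to the implementation under test and to `std::bitset`, and compare the two after every step. `WordMask` and `run_differential_test` are hypothetical and much simpler than the register masks exercised in `test_regmask.cpp` (no dynamic extension and no offsets).

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical implementation under test: a plain word-array bit mask.
struct WordMask {
  static constexpr size_t kWords = 4;
  uint64_t words[kWords] = {};
  void insert(size_t b) { words[b / 64] |= uint64_t{1} << (b % 64); }
  void remove(size_t b) { words[b / 64] &= ~(uint64_t{1} << (b % 64)); }
  void and_with(const WordMask& o) { for (size_t i = 0; i < kWords; i++) words[i] &= o.words[i]; }
  void or_with(const WordMask& o) { for (size_t i = 0; i < kWords; i++) words[i] |= o.words[i]; }
  bool member(size_t b) const { return ((words[b / 64] >> (b % 64)) & 1) != 0; }
};

constexpr size_t kBits = WordMask::kWords * 64;
using Reference = std::bitset<kBits>;

// Compares every bit of the mask under test with the reference.
bool equivalent(const WordMask& m, const Reference& r) {
  for (size_t b = 0; b < kBits; b++) {
    if (m.member(b) != r.test(b)) return false;
  }
  return true;
}

// Applies identical random operations to both structures. Returns the step at
// which they first diverge, or `steps` if they never do.
size_t run_differential_test(uint32_t seed, size_t steps) {
  std::mt19937 rng(seed);  // fixed seed: failures are reproducible
  std::uniform_int_distribution<size_t> bit(0, kBits - 1);
  std::uniform_int_distribution<int> op(0, 3);
  WordMask m, other_m;
  Reference r, other_r;
  for (size_t s = 0; s < steps; s++) {
    size_t b = bit(rng);
    size_t b2 = bit(rng);
    other_m.insert(b2);  // evolve a second operand for the binary operations
    other_r.set(b2);
    switch (op(rng)) {
      case 0: m.insert(b); r.set(b); break;
      case 1: m.remove(b); r.reset(b); break;
      case 2: m.and_with(other_m); r &= other_r; break;
      case 3: m.or_with(other_m); r |= other_r; break;
    }
    if (!equivalent(m, r)) return s;
  }
  return steps;
}
```

Injecting a deliberate bug into `WordMask` (for example, an off-by-one in `remove`) would typically make the returned step count drop below `steps`, which mirrors the fault-injection check mentioned above.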

Thanks @robcasloz, updated! I'm running some additional sanity testing for the changeset currently. I plan to integrate tomorrow.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-3324353001

From bmaillard at openjdk.org  Tue Sep 23 14:52:19 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Tue, 23 Sep 2025 14:52:19 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop [v7]
In-Reply-To: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
References: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
Message-ID: <7pLx5T3gRd__Q-IHE0FvyVFwElhbQZRh0y2CJ9v9-v8=.f69d8910-31ba-4728-9b6e-7947247c2785@github.com>

> This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`.
> 
> ### Context
> 
> The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn, I was able to reduce the reproducer to something very simple. After being compiled by C2, the execution of the following method led to the last statement (`x = 0`) being ignored:
> 
> 
>     static public void test() {
>         x = 0;
>         for (int i = 0; i < 20000; i++) {
>             x += i;
>         }
>         x = 0;
>     }
> 
> 
> After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long-running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. `Store` nodes are moved out of the inner loop when encountering this pattern, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations that take place later. This was initially addressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here, as macro expansion happens too late.
> 
> This PR aims at addressing the specific case of the wrong wiring of `Store` nodes in _post_ loops, but in the longer term, further investigation into the missing `Phi` node issue is necessary, as it is likely to cause other issues (cf. related JBS issues).
> 
> ### Detailed Analysis
> 
> In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`.  When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later.
> 
> This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect.
> 
> This is what the IR looks like after the creation of the post lo...

Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision:

  More comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27225/files
  - new: https://git.openjdk.org/jdk/pull/27225/files/32686981..4b3b9a67

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27225&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27225&range=05-06

  Stats: 7 lines in 1 file changed: 2 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/27225.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27225/head:pull/27225

PR: https://git.openjdk.org/jdk/pull/27225

From bmaillard at openjdk.org  Tue Sep 23 14:52:23 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Tue, 23 Sep 2025 14:52:23 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop [v7]
In-Reply-To: <6G3nK9S5d9u3_esVm6W6hXK3QTDEzTscMlmWPmtp4yU=.21c1289b-83a5-485e-83ad-b30646dfbb89@github.com>
References: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
 <6G3nK9S5d9u3_esVm6W6hXK3QTDEzTscMlmWPmtp4yU=.21c1289b-83a5-485e-83ad-b30646dfbb89@github.com>
Message-ID: <VNAsFYdXIoS-5gQx9g-lXoYQGj6NdxBr4UuBhi1aNmA=.63cd60db-a41d-4ffd-a706-d662b4c1b172@github.com>

On Tue, 16 Sep 2025 09:10:00 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   More comments
>
> test/hotspot/jtreg/compiler/loopstripmining/MissingStoreAfterOuterStripMinedLoop.java line 77:
> 
>> 75:         a1.field = 0;
>> 76:         a2.field = 0;
>> 77:     }
> 
> Do the field stores both float out of the loop, and end up in a chain between exit and safepoint? Might be nice to add some comments to these tests so we can see what examples you already cover and if we might need some more.

Yes, the entire chain floats out of the loop (each store is moved successively). I have added some comments about the structure that we are trying to expose, and changed the test slightly as well.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2372583728

From rcastanedalo at openjdk.org  Tue Sep 23 14:54:24 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 23 Sep 2025 14:54:24 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v33]
In-Reply-To: <wBkA2djfIlxl9tHTav_c0175hLKdGvYocJtvAPyGQdw=.64036373-ccda-4ecc-9077-3397db9e2719@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <wBkA2djfIlxl9tHTav_c0175hLKdGvYocJtvAPyGQdw=.64036373-ccda-4ecc-9077-3397db9e2719@github.com>
Message-ID: <vLqIDLo-jlqzmflgxyx2UVYyYE62cj0WI2WhHwc12Kc=.94765398-56cf-434c-886b-22eb7e7884de@github.com>

On Tue, 23 Sep 2025 14:51:10 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> If a method has a large number of parameters, we currently bail out from C2 compilation.
>> 
>> ### Changeset
>> 
>> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
>> 
>> Changes:
>> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
>> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
>> - Remove all `can_represent` checks and bailouts.
>> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
>> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
>> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
>> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no...
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update after Roberto's comments. Do not run TestMethodArguments under Xcomp. Further bump TestMethodArguments timeout to 1000 seconds.

Marked as reviewed by rcastanedalo (Reviewer).
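The register mask design summarized in the quoted description can be sketched roughly as follows. The description mentions a statically allocated base that is only extended with dynamically allocated memory when needed, plus an early exit in the overlap test. This is a minimal sketch assuming a plain bit set over 64-bit words. `GrowableMask`, its inline size of four words, and its method names are hypothetical and are not HotSpot's `RegMask` API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not HotSpot's RegMask: a small, statically sized
// inline part covers the common case, and extra words are heap-allocated
// only when a bit beyond the inline part is inserted.
class GrowableMask {
  static constexpr size_t kInlineWords = 4;  // assumed inline size
  uint64_t inline_words_[kInlineWords] = {};
  std::vector<uint64_t> ext_;  // stays empty unless the mask must grow

 public:
  size_t words() const { return kInlineWords + ext_.size(); }

  // Words past the end of the mask read as zero.
  uint64_t word(size_t i) const {
    if (i < kInlineWords) return inline_words_[i];
    size_t j = i - kInlineWords;
    return j < ext_.size() ? ext_[j] : 0;
  }

  void insert(size_t bit) {
    size_t w = bit / 64;
    uint64_t b = uint64_t{1} << (bit % 64);
    if (w < kInlineWords) {
      inline_words_[w] |= b;
      return;
    }
    size_t j = w - kInlineWords;
    if (j >= ext_.size()) ext_.resize(j + 1, 0);  // grow on demand
    ext_[j] |= b;
  }

  bool member(size_t bit) const {
    return (word(bit / 64) >> (bit % 64)) & 1;
  }

  // Overlap test that returns on the first shared word. Only the common
  // prefix needs scanning, because missing words are zero.
  bool overlap(const GrowableMask& other) const {
    size_t n = std::min(words(), other.words());
    for (size_t i = 0; i < n; i++) {
      if (word(i) & other.word(i)) return true;
    }
    return false;
  }
};
```

In this sketch, masks that only use the inline words never touch the heap, which mirrors the stated goal of keeping the common case (a normal number of method parameters) fast.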

-------------

PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-3258397667

From bmaillard at openjdk.org  Tue Sep 23 14:55:18 2025
From: bmaillard at openjdk.org (Benoît Maillard)
Date: Tue, 23 Sep 2025 14:55:18 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop [v7]
In-Reply-To: <6G3nK9S5d9u3_esVm6W6hXK3QTDEzTscMlmWPmtp4yU=.21c1289b-83a5-485e-83ad-b30646dfbb89@github.com>
References: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
 <6G3nK9S5d9u3_esVm6W6hXK3QTDEzTscMlmWPmtp4yU=.21c1289b-83a5-485e-83ad-b30646dfbb89@github.com>
Message-ID: <HnZ96p1AeMxAzSFDdZY4vol3YVcvgXYLiu1e-8Jc35E=.3a423a5b-a096-48d4-a1d5-7014c2e4ca62@github.com>

On Tue, 16 Sep 2025 09:11:32 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   More comments
>
> Thanks for working on this @benoitmaillard !
> 
> And thanks for all the explanations.
> 
> It seems the missing Phis at the OuterStripMinedLoop are a decision that implies that Stores will just sort of "hang" between the loop exit and the SafePoint. That is now the new "invariant". Fine for now, but we may want to reconsider adding the Phi for the OuterStripMinedLoop eventually.
> 
> I have read through the PR, and was a little confused about names, so bear with my comments.
> 
> On the algorithm level, I was wondering whether it is possible to have a chain of stores between the exit and the SafePoint. Do you have such examples?

@eme64 Thanks a lot for your detailed comments, this is really helpful. I have tried to address all of them; let me know what you think once you get the chance.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27225#issuecomment-3324371206

From vlivanov at openjdk.org  Tue Sep 23 19:17:10 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 23 Sep 2025 19:17:10 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v14]
In-Reply-To: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
Message-ID: <7jsfljWuvc_f50TXMXT5W7hb-3zO1CCnmmNCNkTxIe4=.2fa89a5a-3c84-438f-b467-5a8be8fa36f2@github.com>

> This PR introduces C2 support for `Reference.reachabilityFence()`.
> 
> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet that it is affected.
> 
> `Reference.reachabilityFence()` can be used in performance-critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through the compilation pipeline without negatively affecting generated code quality.
> 
> Also, I don't consider this fix complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is a CFG-only node, nothing explicitly forbids memory operations from floating past `Reference.reachabilityFence()` and potentially reaching other safepoints that the current analysis treats as non-interfering. Representing `ReachabilityFence` as a memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but the performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend the referent's live range beyond the `ReachabilityFence` nodes associated with it. That would meet the performance criteria, but I prefer to implement it as a follow-up fix.
> 
> Another known issue relates to reachability fences on constant oops. If such a constant is GCed (most likely due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately.
> 
> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667
> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints."
> 
> Testing:
> - [x] hs-tier1 - hs-tier8
> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations
> - [x] java/lang/foreign microbenchmarks

Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:

  scalarization support

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25315/files
  - new: https://git.openjdk.org/jdk/pull/25315/files/68150cc6..15fee72c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=13
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=12-13

  Stats: 49 lines in 4 files changed: 45 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/25315.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315

PR: https://git.openjdk.org/jdk/pull/25315

From hgreule at openjdk.org  Tue Sep 23 21:07:00 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Tue, 23 Sep 2025 21:07:00 GMT
Subject: RFR: 8367967: C2: "fatal error: Not monotonic" with Mod nodes [v3]
In-Reply-To: <TjIJ8kANFmXd5c4CRKd2yndTcblECPfgiuXM3lT4pS0=.cf5fda40-7cef-4f25-a74c-4f7a9899dfa2@github.com>
References: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
 <EDPz9FTr56gUdjcVeokF55vuUSb71g-806n8urrSrVM=.0ea0eebc-9545-48b8-9acf-35f592c14c71@github.com>
 <TjIJ8kANFmXd5c4CRKd2yndTcblECPfgiuXM3lT4pS0=.cf5fda40-7cef-4f25-a74c-4f7a9899dfa2@github.com>
Message-ID: <sHPYPAQwLUPuullEK4PKAe_Ph7z0oOORzlz_wF04eZA=.4e93153e-63f1-4b42-b955-15a1d60ab21f@github.com>

On Tue, 23 Sep 2025 14:12:40 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   move test
>
> src/hotspot/share/opto/divnode.cpp line 1209:
> 
>> 1207:   if (t2 == Type::TOP) { return Type::TOP; }
>> 1208: 
>> 1209:   // Mod by zero?  Throw exception at runtime!
> 
> The comment is a bit confusing. It's not the node itself which produces the exception, but a dominating zero check (inserted during parsing). So, if a divisor becomes 0, it means the node is effectively dead and can go away.  
> 
> Also, the node should go away anyway as part of CFG pruning of dead branches when the corresponding guard goes away.
> 
> BTW, if there are cases where control is not eliminated, it may irrevocably break the IR, causing crashes down the road (take a look at JDK-8154831 as an example). So, maybe it's safer to just rely on dead control pruning to eliminate effectively dead ModI/ModL nodes and assert that there are no effectively dead ModI/ModL nodes present after the GVN pass is over.

The comment comes from the original code, from before my change in #25254. That path also returned `POS` there, but that is no longer monotonic with my changes.

> So, if a divisor becomes 0, it means the node is effectively dead and can go away.

I think this check mostly comes down to CCP. We need to return *something* for a zero divisor, and that something has to be monotonic with subsequent wider inputs.

If you agree with that observation, I can change the comment to better reflect what's going on, e.g., `Mod by zero can be observed in PhaseCCP, return TOP to ensure monotonic results` (I'm open to other suggestions).
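A minimal sketch of the monotonicity argument above. It uses a hypothetical, simplified integer-range lattice instead of C2's `TypeInt`, and the value rule is a deliberately coarse stand-in for the real ModI logic. In CCP, types start optimistically at TOP and may only move down as inputs widen. The result computed while the divisor is exactly zero must therefore be contained in whatever is computed once the divisor range widens. Returning TOP satisfies that trivially. A fixed non-negative range such as "POS" can be violated.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstdlib>

// Hypothetical, simplified integer-range lattice (not C2's TypeInt).
// `top` marks the TOP element (no value observed yet).
struct Range {
  bool top;
  long long lo, hi;
};

const Range kTop = {true, 0, 0};

// True if `wide` contains every value of `narrow`. TOP is contained in
// everything, so moving down from TOP is always monotonic.
bool contains(const Range& wide, const Range& narrow) {
  if (narrow.top) return true;
  if (wide.top) return false;
  return wide.lo <= narrow.lo && narrow.hi <= wide.hi;
}

// Simplified value rule for x % y on 32-bit ints: the result takes the sign
// of the dividend and |x % y| <= |y| - 1. `top_on_zero` selects what to
// return when the divisor is known to be exactly 0.
Range mod_value(const Range& x, const Range& y, bool top_on_zero) {
  if (x.top || y.top) return kTop;
  if (y.lo == 0 && y.hi == 0) {
    return top_on_zero ? kTop : Range{false, 0, INT_MAX};  // TOP vs. "POS"
  }
  long long m = std::max(std::llabs(y.lo), std::llabs(y.hi)) - 1;
  if (x.lo >= 0) return {false, 0, m};
  if (x.hi <= 0) return {false, -m, 0};
  return {false, -m, m};
}
```

With a dividend in `[-10, 10]`, widening the divisor from `[0, 0]` to `[0, 7]` yields `[-6, 6]`. That range does not contain the earlier "POS" result `[0, INT_MAX]`, but it does contain TOP.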

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27408#discussion_r2373447161

From duke at openjdk.org  Wed Sep 24 01:31:22 2025
From: duke at openjdk.org (erifan)
Date: Wed, 24 Sep 2025 01:31:22 GMT
Subject: RFR: 8367391: Loss of precision on implicit conversion in
 vectornode.cpp
In-Reply-To: <pDqLghIz44U2SLx5QaJfV6aK-tt9OoKx2vCmc3L2-WE=.0b282db5-dd33-4871-a0cf-33283eff024b@github.com>
References: <pDqLghIz44U2SLx5QaJfV6aK-tt9OoKx2vCmc3L2-WE=.0b282db5-dd33-4871-a0cf-33283eff024b@github.com>
Message-ID: <qL1Ib3DdKJjSfDsebCAH97tuvFhECGCZjB7HMPtqyWM=.a88329a6-c77b-47c1-b253-5055a1cb2919@github.com>

On Tue, 23 Sep 2025 10:03:09 GMT, erifan <duke at openjdk.org> wrote:

> There is an issue with the fix for JDK-8356760 on Windows x64, related to the following lines:
> 
>   long mask = (-1ULL >> (64 - vlen));
>   long bit  = type->get_con() & mask;
> 
> 
> `-1ULL` is an unsigned **64-bit** value; on Linux/macOS-x64, `long` is **64** bits, but on Windows-x64 it's **32** bits. When assigning `-1ULL >> (64 - vlen)` to a `long` on Windows-x64, the **64-bit** result is truncated to **32** bits, causing precision loss as the upper 32 bits are discarded.
> 
> This pull request addresses the issue by replacing the `long` type with `jlong`. The fix has been verified on a Windows x64 machine with avx-512 support and resolves the reported problem.

Thanks for your review!
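The truncation described in the quoted explanation can be reproduced with fixed-width types, without a Windows machine. This is a minimal sketch assuming `uint32_t` as a stand-in for the 32-bit `long` of Windows x64 (an LLP64 platform) and `uint64_t` as a stand-in for `jlong`. Unsigned types are used so the truncation is well defined. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Same expression as the quoted snippet, kept in 64 bits (the jlong case).
// Requires 1 <= vlen <= 64; a shift by 64 would be undefined.
uint64_t masked_bits_64(uint64_t con, int vlen) {
  uint64_t mask = -1ULL >> (64 - vlen);  // low `vlen` bits set
  return con & mask;
}

// Same computation stored into a 32-bit destination (the LLP64 `long` case).
uint32_t masked_bits_32(uint64_t con, int vlen) {
  uint64_t mask = -1ULL >> (64 - vlen);
  return static_cast<uint32_t>(con & mask);  // upper 32 bits are discarded
}
```

For `vlen = 64` and a constant such as `0xFFFF0000FFFF0000`, the 64-bit version returns the constant unchanged, while the 32-bit version keeps only `0xFFFF0000`. For small `vlen`, the two agree.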

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27449#issuecomment-3326105945

From duke at openjdk.org  Wed Sep 24 01:31:22 2025
From: duke at openjdk.org (duke)
Date: Wed, 24 Sep 2025 01:31:22 GMT
Subject: RFR: 8367391: Loss of precision on implicit conversion in
 vectornode.cpp
In-Reply-To: <pDqLghIz44U2SLx5QaJfV6aK-tt9OoKx2vCmc3L2-WE=.0b282db5-dd33-4871-a0cf-33283eff024b@github.com>
References: <pDqLghIz44U2SLx5QaJfV6aK-tt9OoKx2vCmc3L2-WE=.0b282db5-dd33-4871-a0cf-33283eff024b@github.com>
Message-ID: <oXwNntf9Fx-Qn3Q2zLHkWE4n9-s_Kb_OW6p-lok2cPQ=.2853e9e8-995c-4332-a2ab-e0589af7e522@github.com>

On Tue, 23 Sep 2025 10:03:09 GMT, erifan <duke at openjdk.org> wrote:

> There is an issue with the fix for JDK-8356760 on Windows x64, related to the following lines:
> 
>   long mask = (-1ULL >> (64 - vlen));
>   long bit  = type->get_con() & mask;
> 
> 
> `-1ULL` is an unsigned **64-bit** value; on Linux/macOS-x64, `long` is **64** bits, but on Windows-x64 it's **32** bits. When assigning `-1ULL >> (64 - vlen)` to a `long` on Windows-x64, the **64-bit** result is truncated to **32** bits, causing precision loss as the upper 32 bits are discarded.
> 
> This pull request addresses the issue by replacing the `long` type with `jlong`. The fix has been verified on a Windows x64 machine with avx-512 support and resolves the reported problem.

@erifan 
Your change (at version 67bc7f4c2bcc7df4cceee3b485f142124c72f9d2) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27449#issuecomment-3326107567

From duke at openjdk.org  Wed Sep 24 01:38:25 2025
From: duke at openjdk.org (erifan)
Date: Wed, 24 Sep 2025 01:38:25 GMT
Subject: Integrated: 8367391: Loss of precision on implicit conversion in
 vectornode.cpp
In-Reply-To: <pDqLghIz44U2SLx5QaJfV6aK-tt9OoKx2vCmc3L2-WE=.0b282db5-dd33-4871-a0cf-33283eff024b@github.com>
References: <pDqLghIz44U2SLx5QaJfV6aK-tt9OoKx2vCmc3L2-WE=.0b282db5-dd33-4871-a0cf-33283eff024b@github.com>
Message-ID: <O7Yl9PKPplyF0rP4GzWsx9f-YLXmOMmMECliPBJJ8jY=.dfdedd83-9cfe-43c0-a255-bdadb8e0b42d@github.com>

On Tue, 23 Sep 2025 10:03:09 GMT, erifan <duke at openjdk.org> wrote:

> There is an issue with the fix for JDK-8356760 on Windows x64, related to the following lines:
> 
>   long mask = (-1ULL >> (64 - vlen));
>   long bit  = type->get_con() & mask;
> 
> 
> `-1ULL` is an unsigned **64-bit** value; on Linux/macOS-x64, `long` is **64** bits, but on Windows-x64 it's **32** bits. When assigning `-1ULL >> (64 - vlen)` to a `long` on Windows-x64, the **64-bit** result is truncated to **32** bits, causing precision loss as the upper 32 bits are discarded.
> 
> This pull request addresses the issue by replacing the `long` type with `jlong`. The fix has been verified on a Windows x64 machine with avx-512 support and resolves the reported problem.

This pull request has now been integrated.

Changeset: 528f93f8
Author:    erifan <erfang at nvidia.com>
Committer: Xiaohong Gong <xgong at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/528f93f8cb9f1fb9c19f31ab80c8a546f47beed2
Stats:     33 lines in 2 files changed: 6 ins; 0 del; 27 mod

8367391: Loss of precision on implicit conversion in vectornode.cpp

Reviewed-by: chagedorn, roland

-------------

PR: https://git.openjdk.org/jdk/pull/27449

From dlong at openjdk.org  Wed Sep 24 02:52:46 2025
From: dlong at openjdk.org (Dean Long)
Date: Wed, 24 Sep 2025 02:52:46 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v14]
In-Reply-To: <7jsfljWuvc_f50TXMXT5W7hb-3zO1CCnmmNCNkTxIe4=.2fa89a5a-3c84-438f-b467-5a8be8fa36f2@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <7jsfljWuvc_f50TXMXT5W7hb-3zO1CCnmmNCNkTxIe4=.2fa89a5a-3c84-438f-b467-5a8be8fa36f2@github.com>
Message-ID: <yuabmVq92ZQp1Ma41_NqJKsoLt9Z-kRN-3U57nt8sRo=.37156021-8a4f-4261-9221-4ef3a3f8b45c@github.com>

On Tue, 23 Sep 2025 19:17:10 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> This PR introduces C2 support for `Reference.reachabilityFence()`.
>> 
>> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet that it is affected.
>> 
>> `Reference.reachabilityFence()` can be used in performance-critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through the compilation pipeline without negatively affecting generated code quality.
>> 
>> Also, I don't consider this fix complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is a CFG-only node, nothing explicitly forbids memory operations from floating past `Reference.reachabilityFence()` and potentially reaching other safepoints that the current analysis treats as non-interfering. Representing `ReachabilityFence` as a memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but the performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend the referent's live range beyond the `ReachabilityFence` nodes associated with it. That would meet the performance criteria, but I prefer to implement it as a follow-up fix.
>> 
>> Another known issue relates to reachability fences on constant oops. If such a constant is GCed (most likely due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately.
>> 
>> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667
>> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints."
>> 
>> Testing:
>> - [x] hs-tier1 - hs-tier8
>> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations
>> - [x] java/lang/foreign microbenchmarks
>
> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   scalarization support

src/hotspot/share/opto/escape.cpp line 1230:

> 1228:     SafePointNode* sfpt = safepoints.at(spi)->as_SafePoint();
> 1229:     JVMState *jvms      = sfpt->jvms();
> 1230:     uint merge_idx      = (sfpt->req() - jvms->scloff());

The use of `sfpt->req()` looks wrong here, if `sfpt` still has non-debug edges.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2373908504

From dlong at openjdk.org  Wed Sep 24 02:55:40 2025
From: dlong at openjdk.org (Dean Long)
Date: Wed, 24 Sep 2025 02:55:40 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v14]
In-Reply-To: <7jsfljWuvc_f50TXMXT5W7hb-3zO1CCnmmNCNkTxIe4=.2fa89a5a-3c84-438f-b467-5a8be8fa36f2@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <7jsfljWuvc_f50TXMXT5W7hb-3zO1CCnmmNCNkTxIe4=.2fa89a5a-3c84-438f-b467-5a8be8fa36f2@github.com>
Message-ID: <ToGm4EZeX-x8lkgIEgPEr9I4c8RyVUqY4J7FwlPlLTQ=.436cfc33-6b95-4728-a047-3376912909b5@github.com>

On Tue, 23 Sep 2025 19:17:10 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> This PR introduces C2 support for `Reference.reachabilityFence()`.
>> 
>> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet that it is affected.
>> 
>> `Reference.reachabilityFence()` can be used in performance-critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through the compilation pipeline without negatively affecting generated code quality.
>> 
>> Also, I don't consider this fix complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is a CFG-only node, nothing explicitly forbids memory operations from floating past `Reference.reachabilityFence()` and potentially reaching other safepoints that the current analysis treats as non-interfering. Representing `ReachabilityFence` as a memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but the performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend the referent's live range beyond the `ReachabilityFence` nodes associated with it. That would meet the performance criteria, but I prefer to implement it as a follow-up fix.
>> 
>> Another known issue relates to reachability fences on constant oops. If such a constant is GCed (most likely due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately.
>> 
>> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667
>> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints."
>> 
>> Testing:
>> - [x] hs-tier1 - hs-tier8
>> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations
>> - [x] java/lang/foreign microbenchmarks
>
> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   scalarization support

src/hotspot/share/opto/escape.cpp line 1248:

> 1246:     sfpt->add_req(nsr_merge_pointer);
> 1247:     sfpt->add_req(selector);
> 1248:     sfpt->jvms()->set_endoff(sfpt->req());

This seems like a subtle change that deserves a comment.  Is it changing the behavior?  Now these two edges are considered part of scalar/debug edges, when before they weren't?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2373911923

From mhaessig at openjdk.org  Wed Sep 24 08:35:06 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Wed, 24 Sep 2025 08:35:06 GMT
Subject: RFR: 8364757: Missing Store nodes caused by bad wiring in
 PhaseIdealLoop::insert_post_loop [v7]
In-Reply-To: <7pLx5T3gRd__Q-IHE0FvyVFwElhbQZRh0y2CJ9v9-v8=.f69d8910-31ba-4728-9b6e-7947247c2785@github.com>
References: <tj-ST3-S5Lj8H0pyjgsbhrPzGDXXpwl7p75M8lR2Nbk=.f4119e47-837b-4802-ac89-cec929b0033e@github.com>
 <7pLx5T3gRd__Q-IHE0FvyVFwElhbQZRh0y2CJ9v9-v8=.f69d8910-31ba-4728-9b6e-7947247c2785@github.com>
Message-ID: <GUrKOVEONnmRIwNHHmv6DY7xvuGJ8Nk8NMzooQ0rdoA=.4b15908e-c98a-4e91-baba-0964c50b8e11@github.com>

On Tue, 23 Sep 2025 14:52:19 GMT, Benoît Maillard <bmaillard at openjdk.org> wrote:

>> This PR introduces a fix for wrong results caused by missing `Store` nodes in C2 IR due to incorrect wiring in `PhaseIdealLoop::insert_post_loop`.
>> 
>> ### Context
>> 
>> The issue was initially found by the fuzzer. After some trial and error, and with the help of @chhagedorn, I was able to reduce the reproducer to something very simple. Once the following method was compiled by C2, executing it caused the last statement (`x = 0`) to be ignored:
>> 
>> 
>>     static public void test() {
>>         x = 0;
>>         for (int i = 0; i < 20000; i++) {
>>             x += i;
>>         }
>>         x = 0;
>>     }
>> 
>> 
>> After some investigation and discussions with @robcasloz and @chhagedorn, it appeared that this issue is linked to how safepoints are inserted into long-running loops, causing the loop to be transformed into a nested loop with an `OuterStripMinedLoop` node. `Store` nodes are moved out of the inner loop when encountering this pattern, and the associated `Phi` nodes are removed in order to avoid inhibiting loop optimizations taking place later. This was initially addressed in [JDK-8356708](https://bugs.openjdk.org/browse/JDK-8356708) by making the necessary corrections in macro expansion. As explained in the next section, this is not enough here, as macro expansion happens too late.
>> 
>> This PR aims to address the specific case of the wrong wiring of `Store` nodes in _post_ loops, but in the longer term, further investigation into the missing `Phi` node issue is necessary, as it is likely to cause other issues (cf. related JBS issues).
>> 
>> ### Detailed Analysis
>> 
>> In `PhaseIdealLoop::create_outer_strip_mined_loop`, a simple `CountedLoop` is turned into a nested loop with an `OuterStripMinedLoop`. The body of the initial loop remains in the inner loop, but the safepoint is moved to the outer loop. Later, we attempt to move `Store` nodes after the inner loop in `PhaseIdealLoop::try_move_store_after_loop`.  When the `Store` node is moved to the outer loop, we also get rid of its input `Phi` node in order not to confuse loop optimizations happening later.
>> 
>> This only becomes a problem in `PhaseIdealLoop::insert_post_loop`, where we clone the body of the inner/outer loop for the iterations remaining after unrolling. There, we use `Phi` nodes to do the necessary rewiring between the original body and the cloned one. Because we do not have `Phi` nodes for the moved `Store` nodes, their memory inputs may end up being incorrect.
>> 
>> This is wh...
>
> Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision:
> 
>   More comments

Thank you for working on this and for the clear analysis of this tricky issue, @benoitmaillard!

Your solution seems good, but I have a few coding suggestions below.
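The loop shapes from the quoted analysis can be illustrated at source level with the reproducer's computation. This is a hypothetical model written in plain C++, not C2 IR. The inner loop keeps `x` in a local. The store is sunk after the inner loop, ahead of where the outer loop's safepoint poll would sit. The post loop must then start from the memory state that sunk store produced. `strip` and `unroll` are illustrative parameters, not C2 flags. The model only captures the intended semantics and cannot reproduce the miswiring itself, which happens in C2's IR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// g_x stands in for the static field `x`; uint32_t gives Java-like
// wrap-around on overflow.
uint32_t g_x;
// Value of x just before the final `x = 0`, recorded for comparison.
uint32_t g_before_final_store;

void test_reference(int n) {
  g_x = 0;
  for (int i = 0; i < n; i++) g_x += i;
  g_before_final_store = g_x;
  g_x = 0;
}

// Strip-mined main loop plus post loop. Requires strip > 0 and unroll > 0.
void test_strip_mined(int n, int strip, int unroll) {
  g_x = 0;
  int main_limit = n - n % unroll;  // iterations covered by the unrolled main loop
  int i = 0;
  while (i < main_limit) {                    // outer strip-mined loop
    int inner_end = std::min(i + strip, main_limit);
    uint32_t x = g_x;                         // value carried into the strip
    for (; i < inner_end; i++) x += i;        // inner counted loop
    g_x = x;                                  // store sunk out of the inner loop
    // (the safepoint poll would go here)
  }
  for (; i < n; i++) g_x += i;                // post loop reads the sunk store
  g_before_final_store = g_x;
  g_x = 0;                                    // the store that went missing in the bug
}
```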

src/hotspot/share/opto/loopTransform.cpp line 1672:

> 1670: 
> 1671: Node* PhaseIdealLoop::find_last_store_in_outer_loop(Node* store, IdealLoopTree* outer_loop) {
> 1672:   Node* out = store;

Since you want a store, you should probably assert that `store` is not null and actually a store.

src/hotspot/share/opto/loopTransform.cpp line 1694:

> 1692:     }
> 1693:     out = unique_next;
> 1694:   }

I found the loop a bit hard to read. Below is a proposal for a restructured loop. If you like it, take it, otherwise leave it.

Suggestion:

  Node* unique_next = store;
  do {
    out = unique_next;
    for (DUIterator_Fast imax, l = out->fast_outs(imax); l < imax; l++) {
      Node* use = out->fast_out(l);
      if (use->is_Mem() && use->in(MemNode::Memory) == out) {
        IdealLoopTree* use_loop = get_loop(get_ctrl(use));
        if (outer_loop->is_member(use_loop)) {
          assert(unique_next == out, "memory node should only have one usage in the loop body");
          unique_next = use;
        }
      }
    }
  } while (out != unique_next);
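The suggested `do`/`while` walk can be exercised outside HotSpot with a toy node type. This is a minimal sketch assuming hypothetical `MemNodeSketch` nodes that have a single memory input, a use list, and a flag standing in for the `outer_loop->is_member(get_loop(get_ctrl(use)))` test. It is not HotSpot's `Node` API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified stand-in for C2 memory nodes.
struct MemNodeSketch {
  MemNodeSketch* mem_in = nullptr;    // plays the role of in(MemNode::Memory)
  std::vector<MemNodeSketch*> outs;   // all uses
  bool in_loop = true;                // stands in for the outer-loop membership test
};

// Record that `use` consumes the memory state produced by `def`.
void add_mem_use(MemNodeSketch* def, MemNodeSketch* use) {
  use->mem_in = def;
  def->outs.push_back(use);
}

// Follow memory uses that stay inside the loop until no further in-loop
// use exists, with the same structure as the suggested loop.
MemNodeSketch* find_last_store_sketch(MemNodeSketch* store) {
  assert(store != nullptr);
  MemNodeSketch* out;
  MemNodeSketch* unique_next = store;
  do {
    out = unique_next;
    for (MemNodeSketch* use : out->outs) {
      if (use->mem_in == out && use->in_loop) {
        assert(unique_next == out && "memory node should only have one use in the loop body");
        unique_next = use;
      }
    }
  } while (out != unique_next);
  return out;
}
```

Take a chain of three in-loop stores followed by a use outside the loop. Starting from the first store, the walk stops at the third store, which covers the "chain of stores" case raised earlier in the thread.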

src/hotspot/share/opto/loopnode.hpp line 1384:

> 1382: 
> 1383:   // Find the last store in the body of an OuterStripMinedLoop when following memory uses
> 1384:   Node *find_last_store_in_outer_loop(Node* store, IdealLoopTree* outer_loop);

If I am not mistaken, this could be `const` since you are only using `PhaseIdealLoop::get_loop()`, which is also `const`.
Suggestion:

  Node *find_last_store_in_outer_loop(Node* store, IdealLoopTree* outer_loop) const;

-------------

Changes requested by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27225#pullrequestreview-3261328829
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2374645703
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2374958370
PR Review Comment: https://git.openjdk.org/jdk/pull/27225#discussion_r2374676449

From mhaessig at openjdk.org  Wed Sep 24 08:56:53 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Wed, 24 Sep 2025 08:56:53 GMT
Subject: RFR: 8368206: RISC-V:
 compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without
 RVV
In-Reply-To: <zeLrBLLs95EZvueIFs073vLvuX2LSn6XX51lOZtxh64=.8ed10e1e-08df-49d3-b987-48349f07eeb0@github.com>
References: <zeLrBLLs95EZvueIFs073vLvuX2LSn6XX51lOZtxh64=.8ed10e1e-08df-49d3-b987-48349f07eeb0@github.com>
Message-ID: <G3C4Cfsug5riamAXsXtQ8OylCemyospqibf3SdZN8_s=.d9ec3c8b-cd5d-4f7b-9248-1cc1f4b704b9@github.com>

On Mon, 22 Sep 2025 03:16:29 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help to review this patch? Thanks!
> 
> We noticed that compiler/vectorapi/VectorMaskCompareNotTest.java fails when running on sg2042.
> On RISC-V without RVV, ofLargestShape(long.class) falls back to 64 bits (see getMaxVectorBitSize in VectorShape.java),
> leading to VectorShape.forBitSize(32) which is unsupported and throws IllegalArgumentException.
> 
> ### Test (fastdebug)
> - [x] Run compiler/vectorapi/VectorMaskCompareNotTest.java on sg2042

Thank you for fixing this, @DingliZhang. Your change looks good, but let me just kick off some testing on our side to ensure the test still runs on other platforms. I'll get back to you as soon as the results are in.

-------------

Changes requested by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27414#pullrequestreview-3261867172

From mbaesken at openjdk.org  Wed Sep 24 09:24:34 2025
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Wed, 24 Sep 2025 09:24:34 GMT
Subject: RFR: 8367967: C2: "fatal error: Not monotonic" with Mod nodes
In-Reply-To: <xytvQB_dSP5xe_coKl2GxxxF_0wxL4t_bkGSA6N7S9E=.5b1ab6ce-05a5-44a8-9935-a3157e159041@github.com>
References: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
 <xytvQB_dSP5xe_coKl2GxxxF_0wxL4t_bkGSA6N7S9E=.5b1ab6ce-05a5-44a8-9935-a3157e159041@github.com>
Message-ID: <BhjvxMMKh9lLZY63BkjH0ccRnjCvAHns7S2j12QMpyU=.2a036654-6a83-4c9a-8778-76974c8d7ada@github.com>

On Mon, 22 Sep 2025 06:48:45 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> I'll test this in our CI to see if this fixes the linux aarch64 issues (observed when running Test java/foreign/TestUpcallStress.java ) .

Unfortunately, we still see an assert in the test java/foreign/TestUpcallStress on Linux aarch64.
However, this time it is not the 'old' one, but

`#  assert(oopDesc::is_oop(obj)) failed: not an oop: 0x0000000000000001`

Maybe it is unrelated; I'm not sure.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27408#issuecomment-3327392422

From mli at openjdk.org  Wed Sep 24 09:57:21 2025
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 24 Sep 2025 09:57:21 GMT
Subject: RFR: 8368525: nmethod ic cleanup
Message-ID: <KUTGfmq27Ye-obMm6eyQw1elN2hK-i3y4E5uF4ziCMI=.069f279e-2e5e-4384-95f2-6dd68b28885c@github.com>

Hi,
Can you help to review this simple patch?

There are some unused parameters and an unnecessary/misleading method in nmethod.cpp; it is better to clean them up.

Thanks!

-------------

Commit messages:
 - initial commit
 - Merge branch 'openjdk:master' into master
 - Merge branch 'openjdk:master' into master
 - Merge branch 'openjdk:master' into master
 - Merge branch 'openjdk:master' into master
 - initial commit

Changes: https://git.openjdk.org/jdk/pull/27464/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27464&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8368525
  Stats: 10 lines in 1 file changed: 0 ins; 6 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/27464.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27464/head:pull/27464

PR: https://git.openjdk.org/jdk/pull/27464

From xgong at openjdk.org  Wed Sep 24 09:59:11 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Wed, 24 Sep 2025 09:59:11 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v5]
In-Reply-To: <mJ83rN71NdOfHjFP9bpFosqBeBf220ODeE36Bt6wBmA=.74e33425-bda6-41ff-81ac-880210e98c3b@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <sikeN2K9EHmLEEnGsNttJSsPefvEtrKk9q7OOIgqQG0=.238662f7-394c-475b-bd07-effb28302b2c@github.com>
 <LeM6JS1sHVJMB2Ziw2pYgcTQr7hHcPUSK6guAV5DCC0=.f3de7008-b13c-4a96-bc85-b50784bb21ce@github.com>
 <wBLVOTyKQkrpbBB04sxxxgZ0wl7A8_HuC2K-3s8j4wQ=.22931e86-0381-46ea-8d34-5020542e063c@github.com>
 <0lnaxN7YsQEddGZfWLgFi2YOl_XtXntDoHRr57Bjp7k=.946b3e40-04c1-4eb5-a205-53347cdc91eb@github.com>
 <P1FNs23o3qks_15w5YJCBfiwLMs1QW_aBI2KSkptKZ4=.83c7e3b3-15fe-47a0-86dc-e1549af59e20@github.com>
 <mJ83rN71NdOfHjFP9bpFosqBeBf220ODeE36Bt6wBmA=.74e33425-bda6-41ff-81ac-880210e98c3b@github.com>
Message-ID: <l_hsWVcB5TbA42zDJMgwquys1I9dGw-BxvjdB4i3ys0=.43acfff8-07c6-4982-ac33-e24a6fc4cbe1@github.com>

On Tue, 9 Sep 2025 07:30:18 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>>> Have you considered using `2x Cast + Concatenate` instead, and just matching that in the backend? I don't remember how to do the mere Concat, but it should be possible via the `unslice` or some other operation that concatenates two vectors.
>> 
>> Would using `2x Cast + Concatenate` make the IRs and match rule more complex? Mere concatenate would be something like `vector slice` in Vector API.  It concatenates two vectors into one with an index denoting the merging position. And it requires the vector types are the same for two input vectors and the dst vector. Hence, if we want to separate this operation with cast and concatenate, the IRs would be (assume original type of `v1/v2` is `4-int`, the result type should be `8-short`):
>> 1) Narrow two input vectors:
>> `v1 = VectorCast(v1)  (4-short); v2 = VectorCast(v2) (4-short)`. 
>> The vector length is unchanged while the element size is halved, so the vector size in bytes is halved as well.
>> 2) Resize `v1` and `v2` to double vector length. The higher bits are cleared:
>> `v1 = VectorReinterpret(v1) (8-short); v2 = VectorReinterpret(v2) (8-short)`.
>> 3) Concatenate `v1` and `v2` like slice. The position is the middle of the vector length.
>> `v = VectorSlice(v1, v2, 4)  (8-short)`.
>> 
>> If we want to merge these IRs in the backend, would the match rule be more complex? I will consider it.
>
> I'm not saying I know that this alternative would be better. I'm just worried about having extra IR nodes, and then optimizations are more complex / just don't work because we don't handle all nodes.
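As a rough scalar model of the three quoted steps (a sketch, not C2 or Vector API code; every type and function name below is hypothetical), the decomposition can be checked against a `uzp1`-style fused operation, modeled as "take the even-numbered 16-bit lanes of `v1`, then those of `v2`". It assumes `v1` supplies the low lanes of the result and that lane 2i of an int vector viewed as shorts is the low half of int lane i (little-endian lane numbering).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of the lanes; not C2 or Vector API code.
using Int4 = std::array<int32_t, 4>;
using Short4 = std::array<int16_t, 4>;
using Short8 = std::array<int16_t, 8>;

// Step 1: VectorCast 4-int -> 4-short keeps the low 16 bits of each lane
// (the conversion wraps modulo 2^16; guaranteed since C++20, universal in practice).
Short4 narrow(const Int4& v) {
  Short4 r{};
  for (int i = 0; i < 4; i++) r[i] = static_cast<int16_t>(v[i]);
  return r;
}

// Step 2: VectorReinterpret 4-short -> 8-short; the upper lanes are cleared.
Short8 widen(const Short4& v) {
  Short8 r{};  // zero-initialized
  for (int i = 0; i < 4; i++) r[i] = v[i];
  return r;
}

// Step 3: concatenate at the middle lane (assumption: a supplies lanes 0..3).
Short8 concat_at_middle(const Short8& a, const Short8& b) {
  Short8 r{};
  for (int i = 0; i < 4; i++) {
    r[i] = a[i];
    r[i + 4] = b[i];
  }
  return r;
}

Short8 decomposed(const Int4& v1, const Int4& v2) {
  return concat_at_middle(widen(narrow(v1)), widen(narrow(v2)));
}

// View 4 ints as 8 shorts, little-endian lane numbering (lane 2i = low half of int i).
Short8 as_shorts(const Int4& v) {
  Short8 s{};
  for (int i = 0; i < 4; i++) {
    uint32_t u = static_cast<uint32_t>(v[i]);
    s[2 * i] = static_cast<int16_t>(u & 0xFFFFu);
    s[2 * i + 1] = static_cast<int16_t>(u >> 16);
  }
  return s;
}

// uzp1-style fused op on 16-bit lanes: even lanes of v1, then even lanes of v2.
Short8 uzp1_model(const Int4& v1, const Int4& v2) {
  Short8 a = as_shorts(v1), b = as_shorts(v2), r{};
  for (int i = 0; i < 4; i++) {
    r[i] = a[2 * i];
    r[i + 4] = b[2 * i];
  }
  return r;
}
```

In this model the three-node chain and the single fused operation produce the same lanes, which is why matching the whole chain to one instruction is attractive, and also why it needs a fairly large match rule.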

Hi @eme64, I tried my best to simplify the complex IR of `VectorConcatenateAndNarrow`. To keep each IR simple enough, it can be split into IRs with the following pattern:

![Screenshot 2025-09-24 163340](https://github.com/user-attachments/assets/b0e3471a-4991-4c9b-8c6f-7df000672a15)

Here I used a new IR named `VectorSliceNode`, which corresponds to the Vector API slice operation and will be added to C2 in the future by PR https://github.com/openjdk/jdk/pull/24104. However, it does not seem easy to optimize such a complex IR pattern into a single SVE instruction (`uzp1`) with a match rule. In addition, `VectorSlice` accepts the same two inputs, which prevents the rule from matching because its input node `VectorReinterpret` is not single-use.

Hence, I think we still need to add a new IR. I have two ideas:
1) Add an IR like `VectorSlice`, but one that accepts a single vector input. It would be used to shift element lanes.
   ``` 
   e.g. src: abcd efgh   idx: 4         -> dst: efgh 0000
   ```
   This IR may overlap with `VectorSlice`, so I personally do not favor it.
2) Add an IR `VectorConcatenate` to concatenate two vectors. The element basic type is unchanged, while the vector length is doubled.
    ```
    e.g. src1: abcd   src2: efgh     -> dst: efgh abcd
    ```
WDYT?
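The two proposed IR semantics can be sketched as scalar lane operations. This is a hypothetical model, not C2 code. Idea 1 follows the example above (`abcd efgh` with idx 4 gives `efgh 0000`, reading lane 0 on the left, with zeros shifted in). For idea 2, the lane order of the result (first input in the low lanes) is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar models of the two proposed IRs; not C2 code.

// Idea 1: single-input slice / lane shift: dst[i] = src[i + idx],
// with lanes past the end filled with zero.
template <typename T, std::size_t N>
std::array<T, N> lane_shift(const std::array<T, N>& src, std::size_t idx) {
  std::array<T, N> dst{};  // value-initialized -> all zeros
  for (std::size_t i = 0; i + idx < N; i++) dst[i] = src[i + idx];
  return dst;
}

// Idea 2: VectorConcatenate: same element type, twice the lanes.
// Assumption: src1 fills lanes 0..N-1 and src2 fills lanes N..2N-1.
template <typename T, std::size_t N>
std::array<T, 2 * N> concatenate(const std::array<T, N>& src1,
                                 const std::array<T, N>& src2) {
  std::array<T, 2 * N> dst{};
  for (std::size_t i = 0; i < N; i++) {
    dst[i] = src1[i];
    dst[N + i] = src2[i];
  }
  return dst;
}
```

Idea 2 keeps the element type and only changes the lane count. In this model it composes with a plain lane-wise narrowing cast, which may make it easier to reuse than a second slice-like node.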

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2375234209

From chagedorn at openjdk.org  Wed Sep 24 11:20:59 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 24 Sep 2025 11:20:59 GMT
Subject: RFR: 8368525: nmethod ic cleanup
In-Reply-To: <KUTGfmq27Ye-obMm6eyQw1elN2hK-i3y4E5uF4ziCMI=.069f279e-2e5e-4384-95f2-6dd68b28885c@github.com>
References: <KUTGfmq27Ye-obMm6eyQw1elN2hK-i3y4E5uF4ziCMI=.069f279e-2e5e-4384-95f2-6dd68b28885c@github.com>
Message-ID: <jqdUMtZGeEHelcIeCKpyMtr4DSA8DbUI4EJciDc9r0M=.3169115b-7a8c-423a-86d7-0d022c137986@github.com>

On Wed, 24 Sep 2025 09:50:24 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Hi,
> Can you help to review this simple patch?
> 
> There are some unused parameters and an unnecessary/misleading method in nmethod.cpp; it is better to clean them up.
> I guess it might be a leftover after some previous refactoring? But I did not check further.
> 
> Thanks!

Looks good, thanks for cleaning it up!

src/hotspot/share/code/nmethod.cpp line 871:

> 869:         // If class unloading occurred we first clear ICs where the cached metadata
> 870:         // is referring to an unloaded klass or method.
> 871:         CompiledIC_at(&iter)->clean_metadata();;

Suggestion:

        CompiledIC_at(&iter)->clean_metadata();

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27464#pullrequestreview-3262382473
PR Review Comment: https://git.openjdk.org/jdk/pull/27464#discussion_r2375428238

From rcastanedalo at openjdk.org  Wed Sep 24 11:55:46 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 24 Sep 2025 11:55:46 GMT
Subject: RFR: 8327963: C2: fix construction of memory graph around
 Initialize node to prevent incorrect execution if allocation is removed [v12]
In-Reply-To: <w17qVoqlDsyaXEHj9cmgpZrpTF8DTUQd4Y6GAyO9c8o=.5f185e45-e654-4a8e-8fcd-cf12a794525c@github.com>
References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
 <S83l4ZBZILBlhRpSoVORBxzNNB9bF5xN-QhRWc16_T4=.6d5c1624-5d77-4b3e-9db0-0e01e6b1b36b@github.com>
 <Q5oX9YdsBPrpdbXle-aox01GBSZGDZXRrIvWaN4r2zs=.49445183-c538-4ea2-bfdb-964c4898137f@github.com>
 <w17qVoqlDsyaXEHj9cmgpZrpTF8DTUQd4Y6GAyO9c8o=.5f185e45-e654-4a8e-8fcd-cf12a794525c@github.com>
Message-ID: <2rgLRKD7peDnD-efre0nNmYy_7xONt3R0jbnQ7Se47Q=.1df361f5-6c9d-4f4a-b93f-fa6fbbdf93a1@github.com>

On Mon, 22 Sep 2025 13:32:13 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> src/hotspot/share/opto/multnode.cpp line 73:
>> 
>>> 71:   };
>>> 72:   return apply_to_projs(filter, which_proj);
>>> 73: }
>> 
>> Consider moving this implementation to `multnode.hpp`, perhaps next to that of `MultiNode::apply_to_projs(DUIterator_Fast& imax, DUIterator_Fast& i, Callback callback, uint which_proj)`,  for consistency.
>
> Isn't it better practice to leave the implementation in the cpp file? It's not always possible because of templates, so some of the related methods' implementations are in the hpp file, but wouldn't we want to keep that to a minimum?

Fair enough.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2375531130

From mhaessig at openjdk.org  Wed Sep 24 12:00:20 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Wed, 24 Sep 2025 12:00:20 GMT
Subject: RFR: 8366461: Remove obsolete method handle invoke logic [v3]
In-Reply-To: <_pqvEs0LIlAc7RjFUwg-bpxS3D2v5U7c6In2sG8XLhQ=.57e3aead-6ac4-4a42-89d2-385d7e6ecedf@github.com>
References: <LQQer6eHAvGEV6clizLClEdOtBBIO7GCQCzibGcEzL8=.7ec9480c-c660-460d-ab5c-69d4d4a4d03d@github.com>
 <_pqvEs0LIlAc7RjFUwg-bpxS3D2v5U7c6In2sG8XLhQ=.57e3aead-6ac4-4a42-89d2-385d7e6ecedf@github.com>
Message-ID: <OqHedOUnvbvpWP-wxFSQ-tT8vzpkKfndlbBzmpun-Fk=.94ced09e-0ed3-42ca-ace4-14473a97bede@github.com>

On Tue, 2 Sep 2025 20:52:32 GMT, Dean Long <dlong at openjdk.org> wrote:

>> At one time, JSR292 support needed special logic to save and restore SP across method handle intrinsic calls, but that is no longer the case. The only platform that still does the save/restore is arm32, where it is no longer necessary. The save/restore can be removed along with related APIs and logic. Note that the arm32 port is largely based on the x86 port, which stopped doing the save/restore in jdk9 ([JDK-8068945](https://bugs.openjdk.org/browse/JDK-8068945)).
>
> Dean Long has updated the pull request incrementally with three additional commits since the last revision:
> 
>  - revert whitespace change
>  - undo debug changes
>  - cleanup

Thank you again for this extensive cleanup. I did another, more thorough, pass and have a few questions and suggestions.

src/hotspot/cpu/arm/arm_32.ad line 436:

> 434:   bool far = (_method == nullptr) ? maybe_far_call(this) : !cache_reachable();
> 435:   return (far ? 3 : 1) * NativeInstruction::instruction_size;
> 436: }

Why do we still need the `instruction_size` offset? Are all static Java calls now method handles?

src/hotspot/cpu/arm/frame_arm.cpp line 365:

> 363:       DEBUG_ONLY(verify_deopt_original_pc(sender_nm, _unextended_sp));
> 364:     }
> 365:   }

All of this could be `NOT_PRODUCT` and the method `const` if I did not miss any side effects.

src/hotspot/cpu/arm/frame_arm.hpp line 1:

> 1: /*

Please update the copyright year.

src/hotspot/cpu/arm/register_arm.hpp line 1:

> 1: /*

Please update the copyright year.

src/hotspot/share/code/debugInfoRec.hpp line 1:

> 1: /*

Please update the copyright year.

src/hotspot/share/code/nmethod.inline.hpp line 1:

> 1: /*

Please update the copyright year.

src/hotspot/share/code/pcDesc.hpp line 1:

> 1: /*

Please update the copyright year.

src/hotspot/share/jvmci/jvmciCodeInstaller.hpp line 1:

> 1: /*

Please update the copyright year.

src/hotspot/share/opto/matcher.hpp line 1:

> 1: /*

Please update the copyright year.

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/code/PCDesc.java line 1:

> 1: /*

Please update the copyright year.

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/aarch64/AARCH64Frame.java line 1:

> 1: /*

Please update the copyright year.

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/riscv64/RISCV64Frame.java line 1:

> 1: /*

Please update the copyright year.

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/x86/X86Frame.java line 1:

> 1: /*

Please update the copyright year.

-------------

Changes requested by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27059#pullrequestreview-3262358336
PR Review Comment: https://git.openjdk.org/jdk/pull/27059#discussion_r2375411757
PR Review Comment: https://git.openjdk.org/jdk/pull/27059#discussion_r2375419504
PR Review Comment: https://git.openjdk.org/jdk/pull/27059#discussion_r2375518959
PR Review Comment: https://git.openjdk.org/jdk/pull/27059#discussion_r2375519168
PR Review Comment: https://git.openjdk.org/jdk/pull/27059#discussion_r2375519398
PR Review Comment: https://git.openjdk.org/jdk/pull/27059#discussion_r2375523797
PR Review Comment: https://git.openjdk.org/jdk/pull/27059#discussion_r2375524042
PR Review Comment: https://git.openjdk.org/jdk/pull/27059#discussion_r2375524330
PR Review Comment: https://git.openjdk.org/jdk/pull/27059#discussion_r2375524675
PR Review Comment: https://git.openjdk.org/jdk/pull/27059#discussion_r2375525018
PR Review Comment: https://git.openjdk.org/jdk/pull/27059#discussion_r2375525797
PR Review Comment: https://git.openjdk.org/jdk/pull/27059#discussion_r2375526227
PR Review Comment: https://git.openjdk.org/jdk/pull/27059#discussion_r2375527000

From rcastanedalo at openjdk.org  Wed Sep 24 12:01:28 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 24 Sep 2025 12:01:28 GMT
Subject: RFR: 8327963: C2: fix construction of memory graph around
 Initialize node to prevent incorrect execution if allocation is removed [v12]
In-Reply-To: <G2Wc1K9r04C8Etypi5QVNMPeIMIbEbcRM8X92EhbQEI=.746fa8d9-179d-4ba8-88c8-73e7d119926e@github.com>
References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
 <S83l4ZBZILBlhRpSoVORBxzNNB9bF5xN-QhRWc16_T4=.6d5c1624-5d77-4b3e-9db0-0e01e6b1b36b@github.com>
 <Q5oX9YdsBPrpdbXle-aox01GBSZGDZXRrIvWaN4r2zs=.49445183-c538-4ea2-bfdb-964c4898137f@github.com>
 <G2Wc1K9r04C8Etypi5QVNMPeIMIbEbcRM8X92EhbQEI=.746fa8d9-179d-4ba8-88c8-73e7d119926e@github.com>
Message-ID: <o5V-WnXi__QaZhJmasOqHuc13Sq-qnlFH6hA3B1tABg=.4f3c6583-7fbd-4b84-a659-b21808056a19@github.com>

On Mon, 22 Sep 2025 13:26:05 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> test/hotspot/jtreg/compiler/escapeAnalysis/TestIterativeEA.java line 53:
>> 
>>> 51:     analyzer.shouldContain("++++ Eliminated: 26 Allocate");
>>> 52:     analyzer.shouldContain("++++ Eliminated: 51 Allocate");
>>> 53:     analyzer.shouldContain("++++ Eliminated: 84 Allocate");
>> 
>> Did you analyze why there are more allocations removed than before in this test case? I did not expect this changeset to have an effect on the number of removed allocations.
>
> There are not more allocations removed. The message is confusing.
> "Eliminated: 84 Allocate" logs that node number 84 was eliminated (and not 84 nodes).
> This patch changes the number of nodes required at allocations so it also has an impact on node numbering.

I see, thanks. Expecting specific C2 node identifiers seems fragile. I understand it is a pre-existing issue, but since this changeset needs to address it anyway, please consider making it more robust by, e.g., using regular expression matching. Here is a suggestion, feel free to incorporate it: https://github.com/openjdk/jdk/commit/9fd6378156187e497b1e4233d57282cad9ede29f. The ultimate improvement would be using the IR test framework, but that is out of scope here.
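The regular-expression idea can be illustrated with a small sketch. It matches the log line by pattern and counts matches instead of expecting particular node numbers. It is written in C++ only as an illustration. The real jtreg test is Java, the helper name is hypothetical, and the linked commit may take a different approach.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts "++++ Eliminated: <node id> Allocate" lines
// without depending on the node id, which shifts whenever node numbering changes.
std::size_t count_eliminated_allocates(const std::string& output) {
  static const std::regex pattern(R"(\+\+\+\+ Eliminated: \d+ Allocate)");
  auto begin = std::sregex_iterator(output.begin(), output.end(), pattern);
  auto end = std::sregex_iterator();
  return static_cast<std::size_t>(std::distance(begin, end));
}
```

With this approach, a changeset that renumbers nodes (as this one does) leaves the expectation ("three allocations eliminated") intact.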

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2375544305

From mhaessig at openjdk.org  Wed Sep 24 12:18:34 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Wed, 24 Sep 2025 12:18:34 GMT
Subject: RFR: 8368206: RISC-V:
 compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without
 RVV
In-Reply-To: <zeLrBLLs95EZvueIFs073vLvuX2LSn6XX51lOZtxh64=.8ed10e1e-08df-49d3-b987-48349f07eeb0@github.com>
References: <zeLrBLLs95EZvueIFs073vLvuX2LSn6XX51lOZtxh64=.8ed10e1e-08df-49d3-b987-48349f07eeb0@github.com>
Message-ID: <wi_YXp8Al6w9RYYfKBO0GxzN-H_CAbcmaVbrEu8SEWs=.500a853b-0ee0-4376-9397-7d2cc7a4a6c2@github.com>

On Mon, 22 Sep 2025 03:16:29 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help to review this patch? Thanks!
> 
> We noticed that compiler/vectorapi/VectorMaskCompareNotTest.java fails when running on sg2042.
> On RISC-V without RVV, ofLargestShape(long.class) falls back to 64 bits (see getMaxVectorBitSize in VectorShape.java),
> leading to VectorShape.forBitSize(32) which is unsupported and throws IllegalArgumentException.
> 
> ### Test (fastdebug)
> - [x] Run compiler/vectorapi/VectorMaskCompareNotTest.java on sg2042

Testing passed.

-------------

Marked as reviewed by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27414#pullrequestreview-3262608568

From rcastanedalo at openjdk.org  Wed Sep 24 12:23:00 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 24 Sep 2025 12:23:00 GMT
Subject: RFR: 8327963: C2: fix construction of memory graph around
 Initialize node to prevent incorrect execution if allocation is removed [v12]
In-Reply-To: <Q5oX9YdsBPrpdbXle-aox01GBSZGDZXRrIvWaN4r2zs=.49445183-c538-4ea2-bfdb-964c4898137f@github.com>
References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
 <S83l4ZBZILBlhRpSoVORBxzNNB9bF5xN-QhRWc16_T4=.6d5c1624-5d77-4b3e-9db0-0e01e6b1b36b@github.com>
 <Q5oX9YdsBPrpdbXle-aox01GBSZGDZXRrIvWaN4r2zs=.49445183-c538-4ea2-bfdb-964c4898137f@github.com>
Message-ID: <wICOo_JWo5evWww9S5JhIcDXDBaTWbjk3tLS7TGQaKs=.7a83fb5c-db96-42c8-9f23-644c032c8ec3@github.com>

On Fri, 19 Sep 2025 12:55:55 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 45 commits:
>> 
>>  - more
>>  - Merge branch 'master' into JDK-8327963
>>  - more
>>  - more
>>  - Merge branch 'master' into JDK-8327963
>>  - more
>>  - more
>>  - lambda return
>>  - lambda clean up
>>  - Merge branch 'master' into JDK-8327963
>>  - ... and 35 more: https://git.openjdk.org/jdk/compare/e16c5100...b701d03e
>
> src/hotspot/share/opto/macro.cpp line 1606:
> 
>> 1604:       // elimination. Simply add the MemBarStoreStore after object
>> 1605:       // initialization.
>> 1606:       MemBarNode* mb = MemBarNode::make(C, Op_MemBarStoreStore, Compile::AliasIdxRaw);
> 
> Does the same argument as below apply for relaxing the scope of this memory barrier? Please clarify in a similar comment for this case (if the same argument applies, a reference to the comment below would be enough).

Thanks for adding the comment. A follow-up question: the full comment below makes the argument that _re-ordering by the compiler can't happen by construction_ because _a later Store that publishes the just allocated object reference is indirectly control dependent on the Initialize node_. However, in this case, there may be no such Initialize node (`init == nullptr || init->req() < InitializeNode::RawStores`).  I assume the memory barrier relaxation is still OK in this scenario because we cannot have later, publishing stores of the allocated object reference? That is, if there exists such a store then there must necessarily exist an Initialize node? Or is there any other reason I am missing? It would be good to clarify this point in the comment.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2375585788

From rcastanedalo at openjdk.org  Wed Sep 24 12:22:52 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 24 Sep 2025 12:22:52 GMT
Subject: RFR: 8327963: C2: fix construction of memory graph around
 Initialize node to prevent incorrect execution if allocation is removed [v14]
In-Reply-To: <zXDTE10_qJh9J34Y8g-rfTqHPKi17Afgs2aWRW382DY=.170b9116-fec3-4b69-b934-6e30400b5c17@github.com>
References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
 <zXDTE10_qJh9J34Y8g-rfTqHPKi17Afgs2aWRW382DY=.170b9116-fec3-4b69-b934-6e30400b5c17@github.com>
Message-ID: <8-Rrpyw2hYDMyFFmFreO9lCQhCIH7oiqxxO3yUeDyI0=.5edf4b3d-6b35-4338-a053-6aba56a95133@github.com>

On Mon, 22 Sep 2025 13:37:54 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> An `Initialize` node for an `Allocate` node is created with a memory
>> `Proj` of adr type raw memory. In order for stores to be captured, the
>> memory state out of the allocation is a `MergeMem` with slices for the
>> various object fields/array element set to the raw memory `Proj` of
>> the `Initialize` node. If `Phi`s need to be created during later
>> transformations from this memory state, The `Phi` for a particular
>> slice gets its adr type from the type of the `Proj` which is raw
>> memory. If during macro expansion, the `Allocate` is found to have no
>> use and so can be removed, the `Proj` out of the `Initialize` is
>> replaced by the memory state on input to the `Allocate`. A `Phi` for
>> some slice for a field of an object will end up with the raw memory
>> state on input to the `Allocate` node. As a result, memory state at
>> the `Phi` is incorrect and incorrect execution can happen.
>> 
>> The fix I propose is, rather than have a single `Proj` for the memory
>> state out of the `Initialize` with adr type raw memory, to use one
>> `Proj` per slice added to the memory state after the `Initialize`. Each
>> of the `Proj` should return the right adr type for its slice. For that
>> I propose having a new type of `Proj`: `NarrowMemProj` that captures
>> the right adr type.
>> 
>> Logic for the construction of the `Allocate`/`Initialize` subgraph is
>> tweaked so the right adr type, captured in its own `NarrowMemProj`, is
>> added to the memory subgraph. Code that removes an allocation or moves
>> it also has to be changed so it correctly takes the multiple memory
>> projections out of the `Initialize` node into account.
>> 
>> One tricky issue is that when EA splits types for a scalar replaceable
>> `Allocate` node:
>> 
>> 1- the adr type captured in the `NarrowMemProj` becomes out of sync
>>   with the type of the slices for the allocation
>>   
>> 2- before EA, the memory state for one particular field out of the
>>   `Initialize` node can be used for a `Store` to the just allocated
>>   object or some other. So we can have a chain of `Store`s, some to
>>   the newly allocated object, some to some other objects, all of them
>>   using the state of `NarrowMemProj` out of the `Initialize`. After
>>   split unique types, the `NarrowMemProj` is for the slice of a
>>   particular allocation. So `Store`s to some other objects shouldn't
>>   use that memory state but the memory state before the `Allocate`.
>>   
>> For that, I added logic to update the adr type of `NarrowMemProj`
>> during split uni...
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
> 
>   review

Thanks for addressing my comments, Roland. I have a couple of follow-up questions. I also realized that we need to adjust IGV's custom logic to schedule the new projection nodes more accurately and combine them into their parent nodes when using the "Condense graph" filter. Please consider incorporating the following patch into this changeset: https://github.com/openjdk/jdk/commit/63a536a1f83aaa10b938eff2d25aac3c68ed57a1.

-------------

Changes requested by rcastanedalo (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24570#pullrequestreview-3262606037

From mli at openjdk.org  Wed Sep 24 12:38:32 2025
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 24 Sep 2025 12:38:32 GMT
Subject: RFR: 8368525: nmethod ic cleanup [v2]
In-Reply-To: <KUTGfmq27Ye-obMm6eyQw1elN2hK-i3y4E5uF4ziCMI=.069f279e-2e5e-4384-95f2-6dd68b28885c@github.com>
References: <KUTGfmq27Ye-obMm6eyQw1elN2hK-i3y4E5uF4ziCMI=.069f279e-2e5e-4384-95f2-6dd68b28885c@github.com>
Message-ID: <HMsUlnEn4t3RsBNmzZW2uAxHOwnntebvxp390bUUp8g=.1102e0df-0a8e-4ac3-b862-43e542232fd5@github.com>

> Hi,
> Can you help to review this simple patch?
> 
> There are some unused parameters and an unnecessary/misleading method in nmethod.cpp; it is better to clean them up.
> I guess it might be a leftover after some previous refactoring? But I did not check further.
> 
> Thanks!

Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

  typo

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27464/files
  - new: https://git.openjdk.org/jdk/pull/27464/files/a2043d44..303c0932

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27464&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27464&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27464.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27464/head:pull/27464

PR: https://git.openjdk.org/jdk/pull/27464

From mli at openjdk.org  Wed Sep 24 12:38:35 2025
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 24 Sep 2025 12:38:35 GMT
Subject: RFR: 8368525: nmethod ic cleanup [v2]
In-Reply-To: <jqdUMtZGeEHelcIeCKpyMtr4DSA8DbUI4EJciDc9r0M=.3169115b-7a8c-423a-86d7-0d022c137986@github.com>
References: <KUTGfmq27Ye-obMm6eyQw1elN2hK-i3y4E5uF4ziCMI=.069f279e-2e5e-4384-95f2-6dd68b28885c@github.com>
 <jqdUMtZGeEHelcIeCKpyMtr4DSA8DbUI4EJciDc9r0M=.3169115b-7a8c-423a-86d7-0d022c137986@github.com>
Message-ID: <G6sYf9UQ6Ck8hQZIf3B31Jv3VHRd48GmCjfos6iO_Xs=.37768071-003d-4396-ab90-612f18dc601d@github.com>

On Wed, 24 Sep 2025 11:18:41 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

> Looks good, thanks for cleaning it up!

Thank you for having a look!

> src/hotspot/share/code/nmethod.cpp line 871:
> 
>> 869:         // If class unloading occurred we first clear ICs where the cached metadata
>> 870:         // is referring to an unloaded klass or method.
>> 871:         CompiledIC_at(&iter)->clean_metadata();;
> 
> Suggestion:
> 
>         CompiledIC_at(&iter)->clean_metadata();

Thanks for catching, fixed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27464#issuecomment-3328195860
PR Review Comment: https://git.openjdk.org/jdk/pull/27464#discussion_r2375635334

From shade at openjdk.org  Wed Sep 24 13:08:14 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 24 Sep 2025 13:08:14 GMT
Subject: RFR: 8357258: x86: Improve receiver type profiling reliability
 [v3]
In-Reply-To: <X9RJoDWqk67MlK7P4ACCbzROxLkfEqvU0zDid3zymnc=.7c0086a3-f514-4b0d-8bc0-592bf4a05bba@github.com>
References: <X9RJoDWqk67MlK7P4ACCbzROxLkfEqvU0zDid3zymnc=.7c0086a3-f514-4b0d-8bc0-592bf4a05bba@github.com>
Message-ID: <yB5KaY5VtroZ8-I1Bu5GImcI9ihR3cmVCwaNwC-Wd6A=.4715ee55-5c7c-4476-a917-184f9b1f6175@github.com>

> See the bug for discussion what issues current machinery has. 
> 
> This PR executes the plan outlined in the bug:
>  1. Common the receiver type profiling code in interpreter and C1
>  2. Rewrite receiver type profiling code to only do atomic receiver slot installations
>  3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed 
> 
> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral.
> 
> Additional testing:
>   - [x] Linux x86_64 server fastdebug, `compiler/`
>   - [x] Linux x86_64 server fastdebug, `all`

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:

 - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls
 - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls
 - Drop atomic counters
 - Initial version

-------------

Changes: https://git.openjdk.org/jdk/pull/25305/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25305&range=02
  Stats: 350 lines in 7 files changed: 135 ins; 196 del; 19 mod
  Patch: https://git.openjdk.org/jdk/pull/25305.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25305/head:pull/25305

PR: https://git.openjdk.org/jdk/pull/25305

From bulasevich at openjdk.org  Wed Sep 24 14:21:08 2025
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Wed, 24 Sep 2025 14:21:08 GMT
Subject: RFR: 8359378: aarch64: crash when using -XX:+UseFPUForSpilling
In-Reply-To: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
References: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
Message-ID: <dA5q5rDHHcvEDUI0HISAcNT2EKvQ3Rp_QLWbz_EPEkM=.a05d0dcd-b0a8-4d9b-9a09-7cbba31f8d3e@github.com>

On Wed, 17 Sep 2025 16:19:12 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:

> The AArch64 BarrierSetAssembler path assumes only FP/vector ideal regs reach the FP spill/restore encoding. With -XX:+UseFPUForSpilling, the register allocator may allocate scalar values in FP registers. When such values (Op_RegI/Op_RegN/Op_RegL/Op_RegP) hit `BarrierSetAssembler::encode_float_vector_register_size`, we trip ShouldNotReachHere in release builds and the **"unexpected ideal register"** assertion in debug builds.
> 
> Fix: teach the encoder to handle scalar ideal regs when they physically live in FP regs:
> - treat Op_RegI / Op_RegN as 32-bit (single slot) - same class as Op_RegF
> - treat Op_RegL / Op_RegP as 64-bit (two slots) - same class as Op_RegD
> 
> Related:
> - reproduced since #19746
> - spilling logic: 
>   - #18967
>   - #17977
> 
> Testing: tier1-3 with javaoptions -Xcomp -Xbatch -XX:+UseFPUForSpilling on AARCH
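The proposed encoder rule can be sketched as a size mapping: scalar kinds that physically live in an FP register take the slot count of the FP kind with the same width. The enum below is a hypothetical stand-in for C2's `Op_Reg*` ideal register constants. Vector kinds are omitted from this model, and the exception stands in for the assert / ShouldNotReachHere.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for C2 ideal register kinds; not the real Op_Reg* constants.
enum class IdealReg { RegI, RegN, RegL, RegP, RegF, RegD, RegFlags };

// Number of 32-bit slots a value occupies when it is spilled to an FP register.
int fp_spill_slots(IdealReg r) {
  switch (r) {
    case IdealReg::RegF:
    case IdealReg::RegI:  // 32-bit int: same class as RegF
    case IdealReg::RegN:  // 32-bit narrow oop: same class as RegF
      return 1;
    case IdealReg::RegD:
    case IdealReg::RegL:  // 64-bit long: same class as RegD
    case IdealReg::RegP:  // 64-bit pointer: same class as RegD
      return 2;
    default:
      // Models the "unexpected ideal register" assert / ShouldNotReachHere.
      throw std::logic_error("unexpected ideal register");
  }
}
```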

I suggest handling this in two steps:
- In JDK 25 we fix the crash when UseFPUForSpilling is enabled.
- In the next release, we softly disable the option: if it is set on the command line, the VM prints a warning and resets it to false. Proposed change for the latter:

diff --git a/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp b/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp
index 308deeaf5e2..7702988c11c 100644
--- a/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp
@@ -621,4 +621,9 @@ void VM_Version::initialize() {
     FLAG_SET_DEFAULT(UseVectorizedHashCodeIntrinsic, true);
   }
+
+  if (UseFPUForSpilling) {
+    warning("UseFPUForSpilling is known to degrade performance on this platform and will be ignored.");
+    FLAG_SET_DEFAULT(UseFPUForSpilling, false);
+  }
 #endif

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27350#issuecomment-3312695352

From rcastanedalo at openjdk.org  Wed Sep 24 14:21:10 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 24 Sep 2025 14:21:10 GMT
Subject: RFR: 8359378: aarch64: crash when using -XX:+UseFPUForSpilling
In-Reply-To: <gIbEsRs0odSigjEw7DvOJaGe-fXQ3pAIcX1so4Kb3xg=.f0c6004a-4dfd-45c7-898c-1ed7e7178236@github.com>
References: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
 <CC-N4gxN7vPXi8Z7FxD_KJ0SZhsaM9lC2GhZQ2HS1y4=.ec0c7b0a-6c56-4fb8-abb1-01448066cc9a@github.com>
 <gIbEsRs0odSigjEw7DvOJaGe-fXQ3pAIcX1so4Kb3xg=.f0c6004a-4dfd-45c7-898c-1ed7e7178236@github.com>
Message-ID: <YqqTIbd0jkToMej_OxkoEAmQPogW55D3iV8zu_K0T6Y=.819fbc7b-5806-46ab-883d-8b73052c37dd@github.com>

On Thu, 18 Sep 2025 11:57:13 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> > Right, @robcasloz, I started investigating this issue thinking it was something wrong in my own code. Once I realized it was a common issue already assigned, I decided to propose a fix since it looked a bit abandoned. I didn't mean to bypass your work -- you're right, I should have contacted you first. Anyway, I'd appreciate your review. Do you think my change is reasonable? If not, let me close this PR and leave it to you.

The changeset looks good to me, let me just run some testing before approval.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27350#issuecomment-3328772823

From aph at openjdk.org  Wed Sep 24 14:45:47 2025
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 24 Sep 2025 14:45:47 GMT
Subject: RFR: 8359378: aarch64: crash when using -XX:+UseFPUForSpilling
In-Reply-To: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
References: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
Message-ID: <oKRLpi1M9qhhMd0QBXjQKJXmOmCF10Jf1kQIwS3u-bs=.8f96156f-8395-4d43-a333-b81aac1871b1@github.com>

On Wed, 17 Sep 2025 16:19:12 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:

> The AArch64 BarrierSetAssembler path assumes only FP/vector ideal regs reach the FP spill/restore encoding. With -XX:+UseFPUForSpilling, the register allocator may allocate scalar values in FP registers. When such values (Op_RegI/Op_RegN/Op_RegL/Op_RegP) hit `BarrierSetAssembler::encode_float_vector_register_size`, we trip ShouldNotReachHere in release builds and the **"unexpected ideal register"** assertion in debug builds.
> 
> Fix: teach the encoder to handle scalar ideal regs when they physically live in FP regs:
> - treat Op_RegI / Op_RegN as 32-bit (single slot) - same class as Op_RegF
> - treat Op_RegL / Op_RegP as 64-bit (two slots) - same class as Op_RegD
> 
> Related:
> - reproduced since #19746
> - spilling logic: 
>   - #18967
>   - #17977
> 
> Testing: tier1-3 with javaoptions -Xcomp -Xbatch -XX:+UseFPUForSpilling on AARCH

Marked as reviewed by aph (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/27350#pullrequestreview-3263280018

From dlunden at openjdk.org  Wed Sep 24 15:02:14 2025
From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=)
Date: Wed, 24 Sep 2025 15:02:14 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v33]
In-Reply-To: <wBkA2djfIlxl9tHTav_c0175hLKdGvYocJtvAPyGQdw=.64036373-ccda-4ecc-9077-3397db9e2719@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
 <wBkA2djfIlxl9tHTav_c0175hLKdGvYocJtvAPyGQdw=.64036373-ccda-4ecc-9077-3397db9e2719@github.com>
Message-ID: <PSGrABkyXQ2BySKZ1gJefh3BbDOIKNM1Bcze8zowTSM=.bbe36e5a-6180-4c8a-b6f9-90f799b5fa24@github.com>

On Tue, 23 Sep 2025 14:51:10 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> If a method has a large number of parameters, we currently bail out from C2 compilation.
>> 
>> ### Changeset
>> 
>> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
>> 
>> Changes:
>> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
>> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
>> - Remove all `can_represent` checks and bailouts.
>> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
>> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
>> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
>> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no...
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update after Roberto's comments. Do not run TestMethodArguments under Xcomp. Further bump TestMethodArguments timeout to 1000 seconds.
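To make the "static base plus dynamic extension" design from the changeset description concrete, here is a small C++ sketch. It is a hypothetical model, not HotSpot's `RegMask`: the class name `GrowableMask`, the 4-word base, and the use of `std::vector` for the extension are all assumptions. It only illustrates the two properties the description emphasizes: masks that fit in the base never allocate, and `overlap` exits at the first shared word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a register mask with a fixed inline base and an
// optional heap-allocated extension. Not HotSpot's RegMask.
class GrowableMask {
  static constexpr std::size_t kBaseWords = 4;  // assumed inline capacity: 256 bits
  uint64_t _base[kBaseWords] = {};              // common case: no allocation
  std::vector<uint64_t> _ext;                   // words beyond the base, grown lazily

  uint64_t word(std::size_t w) const {
    if (w < kBaseWords) return _base[w];
    std::size_t e = w - kBaseWords;
    return e < _ext.size() ? _ext[e] : 0;
  }

 public:
  void insert(std::size_t bit) {
    std::size_t w = bit / 64;
    uint64_t m = uint64_t{1} << (bit % 64);
    if (w < kBaseWords) {
      _base[w] |= m;
      return;
    }
    std::size_t e = w - kBaseWords;
    if (e >= _ext.size()) _ext.resize(e + 1, 0);  // only large masks pay for this
    _ext[e] |= m;
  }

  bool member(std::size_t bit) const {
    return ((word(bit / 64) >> (bit % 64)) & 1) != 0;
  }

  // Early-exit overlap: return at the first shared word, and only scan
  // extension words that both masks actually have.
  bool overlap(const GrowableMask& other) const {
    for (std::size_t w = 0; w < kBaseWords; ++w) {
      if (_base[w] & other._base[w]) return true;
    }
    std::size_t n = _ext.size() < other._ext.size() ? _ext.size() : other._ext.size();
    for (std::size_t e = 0; e < n; ++e) {
      if (_ext[e] & other._ext[e]) return true;
    }
    return false;
  }

  bool uses_extension() const { return !_ext.empty(); }
};
```

Keeping the base inline means the frequent small-mask operations touch no heap memory; only methods with very many stack slots trigger the `resize`.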

Final testing looks good, so I'm going ahead with the integration now. Thanks everyone for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-3329117041

From dlunden at openjdk.org  Wed Sep 24 15:06:05 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Wed, 24 Sep 2025 15:06:05 GMT
Subject: Integrated: 8325467: Support methods with many arguments in C2
In-Reply-To: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
References: <kb7aig5eMa34tS6IdufO8e2qFxtazLxkpxUwAhiwBqw=.3a484884-69e8-48cf-bcc0-dd35bd65d217@github.com>
Message-ID: <cE0P5mtfKfc8cFgxfEFHvPAljJ-dBSe8EQEL6Ii8xxA=.22461786-f90b-4d64-89a2-5c2968a0c282@github.com>

On Wed, 31 Jul 2024 12:36:38 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

> If a method has a large number of parameters, we currently bail out from C2 compilation.
> 
> ### Changeset
> 
> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
> 
> Changes:
> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
> - Remove all `can_represent` checks and bailouts.
> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
> 
> ### Testing
> 
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it).
> 
> ![c2-regression](https:/...

This pull request has now been integrated.

Changeset: faf6df54
Author:    Daniel Lundén <dlunden at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/faf6df5462d6c915434128a876e76fa48f7e3599
Stats:     2887 lines in 29 files changed: 2321 ins; 288 del; 278 mod

8325467: Support methods with many arguments in C2

Co-authored-by: Roberto Castañeda Lozano <rcastanedalo at openjdk.org>
Reviewed-by: rcastanedalo, kvn, epeter

-------------

PR: https://git.openjdk.org/jdk/pull/20404

From mhaessig at openjdk.org  Wed Sep 24 16:05:34 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Wed, 24 Sep 2025 16:05:34 GMT
Subject: RFR: 8350864: C2: verify structural invariants of the Ideal graph
 [v7]
In-Reply-To: <yV7-E0q8AS7c47YiVbZmioeEAn0KTuZU8-zaI1BV-r8=.c7a00f71-dbac-4911-a183-8af53bc9ee4c@github.com>
References: <XuwKaN3NfsAcX_wY3fTzObQsUX-Bp8vgPJkkN9poL2s=.ce1154be-f110-419d-a03a-7bed408bcd32@github.com>
 <yV7-E0q8AS7c47YiVbZmioeEAn0KTuZU8-zaI1BV-r8=.c7a00f71-dbac-4911-a183-8af53bc9ee4c@github.com>
Message-ID: <P_T5OX1J4Euo9MlhXXUOfWVsYKaYvd89CESbmMJhatY=.4c1e7854-40d1-4bbb-8cb1-3d29f8171ca2@github.com>

On Tue, 9 Sep 2025 17:07:40 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> Some crashes are consequences of earlier misshaped ideal graphs, which could be detected earlier, closer to the source, before the possibly many transformations that lead to the crash.
>> 
>> Let's verify earlier that the ideal graph is well-shaped, then! I propose such a feature here. This runs after IGVN, because at this point the graph should have been cleaned of any weirdness introduced earlier or during IGVN.
>> 
>> This feature is enabled with the develop flag `VerifyIdealStructuralInvariants`. Open to renaming. No problem with me! This feature is only available in debug builds, and most of the code is not even compiled in product builds, since it uses some debug-only functions, such as `Node::dump` or `Node::Name`.
>> 
>> For now, only local checks are implemented: checks that only look at a node and its neighborhood, wherever it appears in the graph. Typically: under an `If` node, we have an `IfTrue` and an `IfFalse`. To ease development, each check is implemented in its own class, independently of the others. Nevertheless, one always needs to do the same kinds of things: checking that there is an output of a given type, that there are N inputs, that the k-th input has a given type... To make such checks easier to write, more readable, and less error-prone than a pile of copy-pasted code that manually traverses the graph, I propose a set of compositional helpers to write patterns that can be matched against the ideal graph. Since these patterns are... patterns, and so not tied to a specific graph, they can be allocated once and for all. When using a pattern, one provides the node (called the center) around which to check whether the pattern holds.
>> 
>> On top of making patterns easier to describe, these helpers allow nice printing in case of error by showing the path from the center to the violating node. For instance (made up to show the formatting), a violation with a path climbing only inputs:
>> 
>> 1 failure for node
>>  211  OuterStripMinedLoopEnd  === 215 39  [[ 212 198 ]] P=0,948966, C=23799,000000
>> At node
>>     209  CountedLoopEnd  === 182 208  [[ 210 197 ]] [lt] P=0,948966, C=23799,000000 !orig=[196] !jvms: StringLatin1::equals @ bci:12 (line 100)
>>   From path:
>>     [center] 211  OuterStripMinedLoopEnd  === 215 39  [[ 212 198 ]] P=0,948966, C=23799,000000
>>       <-(0)- 215  SafePoint  === 210 1 7 1 1 216 37 54 185  [[ 211 ]]  SafePoint  !orig=186 !jvms: StringLatin1::equals @ bci:29 (line 100)
>>       <-(0)- 210  IfFalse  === 209  [[ 21...
>
> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
> 
>   A better way to make them not debug-only, without very ad-hoc hacking

Thank you for working on this, @marc-chevalier! The verification of the graph structure is incredibly important and I am looking forward to writing my own invariants. I like that you put an emphasis on the quality of the error messages and the readability of the verification code, which is solved really well with the expression based matching you introduced. Still, I have a few comments and suggestions below.

Regarding other comments:

> > IMO it's better to have node-specific invariant checks co-located with corresponding node (as Node::verify() maybe?); it would make it clearer what are the expectations when changing the implementation.
>
> For instance, PhiArity would be a good candidate (about a special kind of node, no context needed, no exception). So, maybe a solution would be to split the checks in two sources.

I think this would be a good idea. Is this reasonably simple to do with your current centralized approach of adding checkers?

> Though it would be good to discuss a bit more how the patterns now look, especially if this becomes something that we do more widely eventually.

IMO, the focus of the pattern matching in this PR is on providing the best error reporting rather than on a performant implementation that could eventually be used in IGVN. I think it would be detrimental to conflate the two at this stage.
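As an illustration of the compositional, error-reporting-first style discussed above, here is a hypothetical C++ sketch on a toy graph. `ToyNode`, `Pattern`, `KindIs`, `AtInput`, `AtSingleOutputOfKind`, and `And` are invented for this example; the PR's real helpers (such as `AtSingleOutputOfType`) operate on C2's `Node` and differ in detail. Patterns are built once and then checked against any center node; on failure, the path string records the walk from the center, similar in spirit to the `<-(0)-` notation in the PR's sample output.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an Ideal graph node (hypothetical, not HotSpot's Node).
struct ToyNode {
  int idx;
  std::string kind;
  std::vector<ToyNode*> in;
  std::vector<ToyNode*> out;
};

// A pattern is matched around a "center" node. On failure, `path` describes
// how the matcher walked from the center to the violating node.
struct Pattern {
  virtual ~Pattern() = default;
  virtual bool check(const ToyNode* n, std::string& path) const = 0;
};
using PatternPtr = std::shared_ptr<const Pattern>;

struct Anything : Pattern {
  bool check(const ToyNode*, std::string&) const override { return true; }
};

struct KindIs : Pattern {
  std::string kind;
  explicit KindIs(std::string k) : kind(std::move(k)) {}
  bool check(const ToyNode* n, std::string& path) const override {
    if (n->kind == kind) return true;
    path += " [expected " + kind + ", found " + n->kind + "]";
    return false;
  }
};

// Climb input k, then match `sub` there.
struct AtInput : Pattern {
  std::size_t k;
  PatternPtr sub;
  AtInput(std::size_t k_, PatternPtr s) : k(k_), sub(std::move(s)) {}
  bool check(const ToyNode* n, std::string& path) const override {
    path += " <-(" + std::to_string(k) + ")-";
    if (k >= n->in.size() || n->in[k] == nullptr) {
      path += " [missing input]";
      return false;
    }
    path += " " + std::to_string(n->in[k]->idx);
    return sub->check(n->in[k], path);
  }
};

// Require exactly one output of kind `kind`, then match `sub` there.
struct AtSingleOutputOfKind : Pattern {
  std::string kind;
  PatternPtr sub;
  AtSingleOutputOfKind(std::string k, PatternPtr s) : kind(std::move(k)), sub(std::move(s)) {}
  bool check(const ToyNode* n, std::string& path) const override {
    const ToyNode* found = nullptr;
    int count = 0;
    for (const ToyNode* o : n->out) {
      if (o->kind == kind) { found = o; ++count; }
    }
    if (count != 1) {
      path += " [expected 1 " + kind + " output, found " + std::to_string(count) + "]";
      return false;
    }
    path += " -> " + std::to_string(found->idx);
    return sub->check(found, path);
  }
};

// All sub-patterns must hold at the same node; only the failing path is kept.
struct And : Pattern {
  std::vector<PatternPtr> parts;
  explicit And(std::vector<PatternPtr> ps) : parts(std::move(ps)) {}
  bool check(const ToyNode* n, std::string& path) const override {
    for (const PatternPtr& p : parts) {
      std::string local;
      if (!p->check(n, local)) { path += local; return false; }
    }
    return true;
  }
};
```

For example, the `If` shape check is built once as `And{AtSingleOutputOfKind("IfTrue", Anything), AtSingleOutputOfKind("IfFalse", Anything)}` and then reused for every `If` node; because patterns hold no per-graph state, sharing them is safe.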

src/hotspot/share/opto/graphInvariants.cpp line 172:

> 170:                 new AtSingleOutputOfType(&Node::is_IfTrue, new TruePattern()),
> 171:                 new AtSingleOutputOfType(&Node::is_IfFalse, new TruePattern()))) {
> 172:   }

Suggestion:

                new AtSingleOutputOfType(&Node::is_IfFalse, new TruePattern()))) {}

This is what you did on line 355 and I find it much more readable.

src/hotspot/share/opto/graphInvariants.cpp line 196:

> 194:                     0,
> 195:                     NodeClassIsAndBind(Region, _region_node)))) {
> 196:   }

Suggestion:

                    NodeClassIsAndBind(Region, _region_node)))) {}

src/hotspot/share/opto/graphInvariants.cpp line 275:

> 273: 
> 274: private:
> 275:   static void print_node_list(const Node_List& ctrl_succ, stringStream& ss) {

Perhaps this would be a good addition to `Node_List` as an analog to `Node::dump(suffix, mark, ss, dc)` instead of a private method of some verification class.

src/hotspot/share/opto/graphInvariants.cpp line 327:

> 325:           for (uint i = 0; i < non_null_inputs.size(); ++i) {
> 326:             non_null_inputs.at(i)->dump("\n", false, &ss);
> 327:           }

That's `ControlSuccessor::print_node_list` from above...

src/hotspot/share/opto/graphInvariants.cpp line 452:

> 450:     MultiBranchNode* mb = center->as_MultiBranch();
> 451:     if (mb->required_outcnt() < static_cast<int>(mb->outcnt())) {
> 452:       ss.print_cr("The required_outcnt of a MultiBranch node must be smaller than or equal to its outcnt. But required_outcnt=%d vs. outcnt=%d", mb->required_outcnt(), mb->outcnt());

Suggestion:

      ss.print_cr("The required_outcnt of a MultiBranch node must be smaller than or equal to its outcnt. But required_outcnt=%d vs. outcnt=%u", mb->required_outcnt(), mb->outcnt());

src/hotspot/share/opto/graphInvariants.hpp line 91:

> 89:   bool run() const;
> 90: };
> 91: #endif

Suggestion:

#endif // !PRODUCT

-------------

Changes requested by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/26362#pullrequestreview-3234827746
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2376113619
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2376114238
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2376061890
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2376089020
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2376193017
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2355766897

From galder at openjdk.org  Wed Sep 24 16:21:39 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Wed, 24 Sep 2025 16:21:39 GMT
Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when
 MaxVectorSize=8
In-Reply-To: <RF-pB-qfXweWi6FuSCZZWnxElKTYN1m6JjP74TDVWqo=.81c23890-ed5e-454e-baab-b6119a3941e8@github.com>
References: <RF-pB-qfXweWi6FuSCZZWnxElKTYN1m6JjP74TDVWqo=.81c23890-ed5e-454e-baab-b6119a3941e8@github.com>
Message-ID: <fVnd20lp7ktkJCn1jfkdPJ_btnf0FTqs5TBJVCToFQI=.2a23d3cc-f75c-464a-9f97-2032579629fb@github.com>

On Mon, 22 Sep 2025 07:39:24 GMT, erifan <duke at openjdk.org> wrote:

> The VectorShape size of `I_SPECIES_FOR_CAST` declared in test **VectorMaskCompareNotTest.java** is half that of `L_SPECIES_FOR_CAST`. And `L_SPECIES_FOR_CAST` is created with the maximum shape. Therefore, if `MaxVectorSize` is set to 8, the shape size of `I_SPECIES_FOR_CAST` is 4, which is an illegal value because the minimum vector size requirement is 8 bytes.
> 
> This pull request addresses the issue by ensuring that this test runs only when `MaxVectorSize` is set to 16 bytes or higher.

Looking at the test, I see that there are other tests that at a glance don't seem to use `I_SPECIES_FOR_CAST`. Shouldn't this limitation be applied to only the tests that do assert that?

-------------

PR Review: https://git.openjdk.org/jdk/pull/27418#pullrequestreview-3263696891

From vlivanov at openjdk.org  Wed Sep 24 18:07:58 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 24 Sep 2025 18:07:58 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v5]
In-Reply-To: <l_hsWVcB5TbA42zDJMgwquys1I9dGw-BxvjdB4i3ys0=.43acfff8-07c6-4982-ac33-e24a6fc4cbe1@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <sikeN2K9EHmLEEnGsNttJSsPefvEtrKk9q7OOIgqQG0=.238662f7-394c-475b-bd07-effb28302b2c@github.com>
 <LeM6JS1sHVJMB2Ziw2pYgcTQr7hHcPUSK6guAV5DCC0=.f3de7008-b13c-4a96-bc85-b50784bb21ce@github.com>
 <wBLVOTyKQkrpbBB04sxxxgZ0wl7A8_HuC2K-3s8j4wQ=.22931e86-0381-46ea-8d34-5020542e063c@github.com>
 <0lnaxN7YsQEddGZfWLgFi2YOl_XtXntDoHRr57Bjp7k=.946b3e40-04c1-4eb5-a205-53347cdc91eb@github.com>
 <P1FNs23o3qks_15w5YJCBfiwLMs1QW_aBI2KSkptKZ4=.83c7e3b3-15fe-47a0-86dc-e1549af59e20@github.com>
 <mJ83rN71NdOfHjFP9bpFosqBeBf220ODeE36Bt6wBmA=.74e33425-bda6-41ff-81ac-880210e98c3b@github.com>
 <l_hsWVcB5TbA42zDJMgwquys1I9dGw-BxvjdB4i3ys0=.43acfff8-07c6-4982-ac33-e24a6fc4cbe1@github.com>
Message-ID: <ek0-OWP6uD4tPufaQ7o6QgOsuP4jd_RmoC19-qnYsNI=.966ffaa2-4d35-4815-aadb-3b5ee78484a5@github.com>

On Wed, 24 Sep 2025 09:54:37 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> I'm not saying I know that this alternative would be better. I'm just worried about having extra IR nodes, and then optimizations are more complex / just don't work because we don't handle all nodes.
>
> Hi @eme64, I tried my best to simplify the complex IR of `VectorConcatenateAndNarrow`. To keep each IR node simple, it can be split into IR nodes with the following pattern:
> 
> ![Screenshot 2025-09-24 163340](https://github.com/user-attachments/assets/b0e3471a-4991-4c9b-8c6f-7df000672a15)
> 
> Here I used a new IR node named `VectorSliceNode`, which corresponds to the Vector API slice operation; it will be added to C2 by PR https://github.com/openjdk/jdk/pull/24104 in the future. However, it does not seem easy to optimize such a complex IR pattern into a single SVE instruction (`uzp1`) with a match rule. In addition, `VectorSlice` takes the same node for both of its inputs, so the rule cannot be matched because its input node `VectorReinterpret` is not single-use.
> 
> Hence, I think we still need to add a new IR. I have two ideas:
> 1) Add an IR node like `VectorSlice`, but one that accepts a single vector input. It is used to shift element lanes.
>    ``` 
>    e.g. src: abcd efgh   idx: 4         -> dst: efgh 0000
>    ```
>    This IR may have overlap with `VectorSlice`. So I personally do not bias toward it.
> 2) Add an IR of `VectorConcatenate`, which is used to concatenate two vectors. The element basic type is not changed, while the vector length is extended to double size.
>     ```
>     e.g. src1: abcd   src2: efgh     -> dst: efgh abcd
>     ```
> WDYT?

I started looking at the PR and it looks appealing to simplify VM intrinsics and lift more code into Java. In other words, subword gather operation can be coded as a composition of operations on int vectors. Have you considered that?

It doesn't solve the problem of how to reliably match a complex graph into a single instruction, though. The matcher favors a tree representation, but there are multiple ways to work around it. Personally, I'd prefer to address it separately.

For now, a dedicated node to concatenate vectors looks appropriate (please note there's the existing PackNode et al.).
It can either be exposed through a VM intrinsic or substituted for a well-known complex IR shape during IGVN (like the one you depicted). The nice thing is it'll uniformly cover all usages irrespective of whether they come from the Vector API implementation itself or from user code. 

In the context of Vector API, the plan was to expose generic element rearranges/shuffles through API, but then enable various strength-reductions to optimize well-known/popular shapes. Packing multiple vectors perfectly fits that effort.
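A scalar C++ model of the decomposition suggested above: a byte (subword) gather expressed as gathers on int lanes, followed by narrowing and a concatenation. The lane counts (4 int lanes, 8 byte lanes), the helper names, and the lane order of `concatenate` (first operand in the low lanes) are assumptions of this sketch, not the PR's code; the narrow-then-concatenate step is roughly what the PR's `VectorConcatenateAndNarrow` combines.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed shapes for illustration: an "int vector" has 4 lanes and the
// byte result has 8 lanes (two int vectors' worth of elements).
using IntVec = std::array<int32_t, 4>;
using ByteHalf = std::array<int8_t, 4>;
using ByteVec = std::array<int8_t, 8>;

// Int-lane gather: lane i loads base[idx[i]] and sign-extends it to int.
IntVec gather_to_ints(const int8_t* base, const IntVec& idx) {
  IntVec r{};
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = base[idx[i]];
  return r;
}

// Lane-wise narrowing int -> byte. The values originate from bytes,
// so the cast is lossless here.
ByteHalf narrow(const IntVec& v) {
  ByteHalf r{};
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = static_cast<int8_t>(v[i]);
  return r;
}

// Hypothetical concatenate: `lo` fills lanes 0..3, `hi` fills lanes 4..7.
ByteVec concatenate(const ByteHalf& lo, const ByteHalf& hi) {
  ByteVec r{};
  for (std::size_t i = 0; i < lo.size(); ++i) {
    r[i] = lo[i];
    r[i + lo.size()] = hi[i];
  }
  return r;
}

// Byte gather expressed as two int gathers, narrowing, and one concatenation.
ByteVec subword_gather(const int8_t* base, const std::array<int32_t, 8>& idx) {
  IntVec lo_idx{idx[0], idx[1], idx[2], idx[3]};
  IntVec hi_idx{idx[4], idx[5], idx[6], idx[7]};
  return concatenate(narrow(gather_to_ints(base, lo_idx)),
                     narrow(gather_to_ints(base, hi_idx)));
}
```

Because every step here is an ordinary lane-wise or whole-vector operation, a dedicated concatenate node would cover this shape regardless of whether it comes from the Vector API implementation or from user code.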

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2376644298

From vlivanov at openjdk.org  Wed Sep 24 20:30:15 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 24 Sep 2025 20:30:15 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v15]
In-Reply-To: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
Message-ID: <qDhZhPrIo7aY1r7YF4Ja4HrtJ7OlOLBbANKsZXkfCJQ=.55a7682e-aebb-4f2a-9abe-15e73f06c6bf@github.com>

> This PR introduces C2 support for `Reference.reachabilityFence()`.
> 
> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected.
> 
> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality.
> 
> Also, I don't consider this fix complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is a CFG-only node, nothing explicitly forbids memory operations from floating past `Reference.reachabilityFence()` and potentially reaching other safepoints that the current analysis treats as non-interfering. Representing `ReachabilityFence` as a memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but the performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend the referent's live range beyond the `ReachabilityFence` nodes associated with it. That would meet the performance criteria, but I prefer to implement it as a follow-up fix.
> 
> Another known issue relates to reachability fences on constant oops. If such a constant is GCed (most likely due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately.
> 
> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667
> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints."
> 
> Testing:
> - [x] hs-tier1 - hs-tier8
> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations
> - [x] java/lang/foreign microbenchmarks

Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits:

 - Merge branch 'master' into 8290892.rf
 - scalarization support
 - Remove comment
 - Add PreserveReachabilityFencesOnConstants test
 - Minor fix
 - minor fixes
 - Fix guaranteed_safepoint usage
 - update
 - update
 - update
 - ... and 12 more: https://git.openjdk.org/jdk/compare/84aa2952...3890119b

-------------

Changes: https://git.openjdk.org/jdk/pull/25315/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=14
  Stats: 1481 lines in 38 files changed: 1428 ins; 16 del; 37 mod
  Patch: https://git.openjdk.org/jdk/pull/25315.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315

PR: https://git.openjdk.org/jdk/pull/25315

From vlivanov at openjdk.org  Wed Sep 24 21:04:02 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 24 Sep 2025 21:04:02 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v14]
In-Reply-To: <yuabmVq92ZQp1Ma41_NqJKsoLt9Z-kRN-3U57nt8sRo=.37156021-8a4f-4261-9221-4ef3a3f8b45c@github.com>
References: <cvY7oXGFUkuBDOVVcBAwv8pV_i7iy53SNQ8xeMvMpYY=.a761b146-16c9-4d1b-9268-902b888a9456@github.com>
 <7jsfljWuvc_f50TXMXT5W7hb-3zO1CCnmmNCNkTxIe4=.2fa89a5a-3c84-438f-b467-5a8be8fa36f2@github.com>
 <yuabmVq92ZQp1Ma41_NqJKsoLt9Z-kRN-3U57nt8sRo=.37156021-8a4f-4261-9221-4ef3a3f8b45c@github.com>
Message-ID: <3P4hBfq-8PSHDMBjwUP5ivBm2d_sA9cRxPPYULN0lWo=.98fa214a-f07d-4dbf-9332-156d31ef96ab@github.com>

On Wed, 24 Sep 2025 02:50:13 GMT, Dean Long <dlong at openjdk.org> wrote:

>> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   scalarization support
>
> src/hotspot/share/opto/escape.cpp line 1230:
> 
>> 1228:     SafePointNode* sfpt = safepoints.at(spi)->as_SafePoint();
>> 1229:     JVMState *jvms      = sfpt->jvms();
>> 1230:     uint merge_idx      = (sfpt->req() - jvms->scloff());
> 
> The use of `sfpt->req()` looks wrong here, if `sfpt` still has non-debug edges.

Good catch, Dean! Will be fixed in the next update.

> Is it changing the behavior?

It doesn't. The sole purpose of the change is to please the assert in `create_scalarized_object_description()`.

At the end of a successful call, `create_scalarized_object_description()` extends debug info to cover all newly added inputs, and a failure is non-recoverable. So the very first iteration of the loop extends debug info beyond those 2 edges.

I'll add a comment.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2377036545
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2377068613

From sviswanathan at openjdk.org  Wed Sep 24 22:03:37 2025
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Wed, 24 Sep 2025 22:03:37 GMT
Subject: RFR: 8350468: x86: Improve implementation of vectorized
 numberOfLeadingZeros for int and long
In-Reply-To: <_YfxynUy7BxtFo15BZG2bdhrUaCkIPSC6l8fTAVyJE8=.ffc4ce38-d9bd-4ab7-b5dd-0ffd847d5c2d@github.com>
References: <_YfxynUy7BxtFo15BZG2bdhrUaCkIPSC6l8fTAVyJE8=.ffc4ce38-d9bd-4ab7-b5dd-0ffd847d5c2d@github.com>
Message-ID: <cldId-zPgR_5glmAvLLuG-PtrT2je_Dtd8gl2umv-y0=.c989e532-c8ab-4856-b8ff-75848330dee3@github.com>

On Mon, 4 Aug 2025 02:20:31 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

> Hi all,
> This is a patch that optimizes the x86 backend implementation of `CountLeadingZerosV` for int and long. In the review of [JDK-8349637](https://bugs.openjdk.org/browse/JDK-8349637) an [optimized algorithm](https://github.com/openjdk/jdk/pull/23579#issuecomment-2661332497) was proposed by @rgiulietti, which this PR implements. For integer operands, the optimized algorithm reduces the number of vector instructions from 19 to 13. The same algorithm does not work for long operands, however, since AVX2 lacks a vectorized long->double conversion instruction. Instead, I found an optimized algorithm that reuses the code for int and computes the leading zeros for long with only 4 additional instructions. I added a benchmark, and on my Zen 3 machine I get these results:
> 
>                                  Baseline                        Patch        
> Benchmark              Mode  Cnt    Score   Error  Units    Score   Error  Units  Improvement
> LeadingZeros.testInt   avgt   15   91.097 ± 3.276  ns/op   68.665 ± 1.740  ns/op  (+ 28.1%)
> LeadingZeros.testLong  avgt   15  342.545 ± 4.470  ns/op  228.668 ± 5.994  ns/op  (+ 39.9%)
> 
> I've updated the unit tests to more thoroughly test longs and they pass on my machine. Thoughts and reviews would be appreciated!
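For readers unfamiliar with the floating-point trick behind this kind of algorithm, here is a scalar C++ sketch of the general idea: the leading-zero count of a nonzero 32-bit value follows from the exponent of its conversion to `double`, and the 64-bit count can be assembled from the two 32-bit counts without a branch. This is not the PR's actual vector sequence. Hardware int-to-double conversions such as AVX2's `vcvtdq2pd` are signed, so a vector version presumably has to treat inputs with the top bit set (and zero) specially; this sketch sidesteps that by converting from `uint32_t` and special-casing zero.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Leading zeros of a 32-bit value via the exponent of its double conversion.
// uint32 -> double is exact (32 significant bits fit in the 53-bit mantissa),
// so the unbiased exponent is exactly floor(log2(x)) for x != 0.
uint32_t lz32(uint32_t x) {
  if (x == 0) return 32;  // a SIMD version would blend this case instead of branching
  double d = static_cast<double>(x);
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  int exp = static_cast<int>((bits >> 52) & 0x7FF) - 1023;
  return static_cast<uint32_t>(31 - exp);
}

// Leading zeros of a 64-bit value from its two 32-bit halves, branch-free:
// hi_lz is in [0, 32], so (hi_lz >> 5) is 1 exactly when the high half is 0.
uint32_t lz64(uint64_t x) {
  uint32_t hi_lz = lz32(static_cast<uint32_t>(x >> 32));
  uint32_t lo_lz = lz32(static_cast<uint32_t>(x));
  uint32_t mask = 0u - (hi_lz >> 5);  // all ones iff the high half was empty
  return hi_lz + (lo_lz & mask);
}
```

The 64-bit combination only adds a shift, a negation, an AND, and an add on top of the 32-bit computation, which is the kind of small fixed overhead the description refers to, though the PR's exact instructions may differ.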

The PR looks good to me. Nice improvement. I have two minor comments.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6289:

> 6287:   // Move the top half result to the bottom half of xtmp1, setting the top half to 0.
> 6288:   vpsrlq(xtmp1, dst, 32, vec_enc);
> 6289:   // By moving the top half result to the right by 6 bytes, if the top half was empty (i.e. 32 is returned) the result bit will

I think you mean 6 bits here and not 6 bytes.

test/hotspot/jtreg/compiler/vectorization/TestNumberOfContinuousZeros.java line 49:

> 47: 
> 48: public class TestNumberOfContinuousZeros {
> 49:     private static final int[] SPECIAL_INT = { 0, 0x01FFFFFF, 0x03FFFFFE, 0x07FFFFFC, 0x0FFFFFF8, 0x1FFFFFF0, 0x3FFFFFE0, 0xFFFFFFFF };

Please also update the copyright year for the file to 2025.

-------------

PR Review: https://git.openjdk.org/jdk/pull/26610#pullrequestreview-3259874081
PR Review Comment: https://git.openjdk.org/jdk/pull/26610#discussion_r2373662506
PR Review Comment: https://git.openjdk.org/jdk/pull/26610#discussion_r2377150533

From dlong at openjdk.org  Wed Sep 24 23:14:27 2025
From: dlong at openjdk.org (Dean Long)
Date: Wed, 24 Sep 2025 23:14:27 GMT
Subject: RFR: 8362117: C2:
 compiler/stringopts/TestStackedConcatsAppendUncommonTrap.java fails with a
 wrong result due to invalidated liveness assumptions for data phis [v2]
In-Reply-To: <11lcsXkMGpKMQr60NCKofzldqpnJka1XZtrGRrUai3o=.c2201234-bbf2-465a-b237-cd9fe8505491@github.com>
References: <qvfOeUiGNqYakTHBhSYMZt1unv88E27bXXswOlWixOQ=.67376c9c-fe60-4263-82ca-4dd02dedad0c@github.com>
 <11lcsXkMGpKMQr60NCKofzldqpnJka1XZtrGRrUai3o=.c2201234-bbf2-465a-b237-cd9fe8505491@github.com>
Message-ID: <NIlzZFTI98cUCQ-ix2gQwLB25KoiwwEckzVO_UX3rag=.5b4882bd-7a1b-408c-965c-959c7fa21f67@github.com>

On Wed, 3 Sep 2025 08:02:04 GMT, Daniel Skantz <dskantz at openjdk.org> wrote:

>> This PR addresses a wrong compilation during string optimizations.
>> 
>> During stacked string concatenation of two StringBuilder links SB1 and SB2, the pattern "append -> Phi -> Region -> (True, False) -> If -> Bool -> CmpP -> Proj (Result) -> toString" may be observed, where toString is the end of SB1, and the simple diamond is part of SB2.
>> 
>> After JDK-8291775, the Bool test to the diamond If is set to a constant zero to allow for folding the simple diamond away during IGVN, while not letting the top() value from the result projection of SB1 propagate through the graph too quickly. The assumption was that any data Phi of the Region would go away during PhaseRemoveUseless as they are no longer live -- I think that in the case of JDK-8291775, the user of phi was the constructor of SB2. However, in the attached test case, the Phi stays live as it's a parameter (input to an append) of SB2 and will be used during the transformation in `copy_string`. When the diamond region is later folded, the Phi's user picks up the wrong input corresponding to the false branch.
>> 
>> The proposed solution is to disable the stacked concatenation optimization for this specific pattern. This might be pragmatic as it's an edge case and there's already a bug tail: JDK-8271341-> JDK-8291775 -> JDK-8362117.
>> 
>> Testing: T1-3 (aed5952).
>> 
>> Extra testing: ran T1-3 on Linux with an instrumented build and verified that the pattern I am excluding in this PR is not seen during any other compilation than that of the proposed regression test.
>
> Daniel Skantz has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - store intermediate calculations
>  - direction convention

I think we could find a more general pattern, like "no conditional code at all", but that might hurt performance. It seems like we should be able to handle conditional code, and this test case in particular. I don't have a perfect understanding of this code, but if I were going to try fixing it to handle the problematic cases, I would look at StringConcat::eliminate_unneeded_control(), because it seems to be blindly removing nodes/edges that may still be needed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27028#issuecomment-3330980650

From wenanjian at openjdk.org  Thu Sep 25 02:35:02 2025
From: wenanjian at openjdk.org (Anjian Wen)
Date: Thu, 25 Sep 2025 02:35:02 GMT
Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v10]
In-Reply-To: <mMLCDK4Ls37Ebs_3LEM4dktuylNraobBebijN1BxZwM=.30730d56-b8ca-46c9-9dc2-1ca215e66ba4@github.com>
References: <mMLCDK4Ls37Ebs_3LEM4dktuylNraobBebijN1BxZwM=.30730d56-b8ca-46c9-9dc2-1ca215e66ba4@github.com>
Message-ID: <K0F5C6r5FRPRk62leqBPRApjDkY5kQoVq_hjXDHKPkw=.32deca2c-1632-427c-884e-f1eaf1363afe@github.com>

> Hi everyone, please help review this patch, which implements `_counterMode_AESCrypt` with Zvkned. On my QEMU setup, with the Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed.

Anjian Wen has updated the pull request incrementally with one additional commit since the last revision:

  add assertion and change test

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25281/files
  - new: https://git.openjdk.org/jdk/pull/25281/files/529f7cf8..8b872327

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=08-09

  Stats: 8 lines in 2 files changed: 2 ins; 6 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/25281.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281

PR: https://git.openjdk.org/jdk/pull/25281

From xgong at openjdk.org  Thu Sep 25 03:16:21 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Thu, 25 Sep 2025 03:16:21 GMT
Subject: RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE
Message-ID: <Cm7FWhlkKC_7UkwS-GWdPqAuDJrr7TXEhDa-6KpvfmI=.e327ed69-ac72-497b-a3d8-c254b2bbd25e@github.com>

The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support native predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output of `fromLong` and the input of `toLong` are masks held in predicate registers on SVE architectures.

For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen.

These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures.

This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations.

It also modifies the Vector API jtreg tests to improve their coverage. Here are the details:

1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are actually tested. These two APIs were not well tested before, because in the original test the C2 IR nodes for `fromLong` and `toLong` were completely optimized out by the compiler due to the following IR identity:

  VectorMaskToLong (VectorLongToMask l) => l

Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2.

2) Refine existing IR tests to verify the expected IR patterns after this patch. The tests also now require the exact CPU feature needed on AArch64 for these ops: `fromLong` requires "svebitperm" instead of "sve2".
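
To make the round-trip problem in 1) concrete, here is a minimal scalar sketch of the bit-to-lane mapping of `fromLong`/`toLong` (lane i corresponds to bit i, starting from the least significant bit). `MaskLongModel` and its methods are hypothetical names. The sketch models only the documented API semantics for species with at most 64 lanes and says nothing about how C2 or SVE implements the conversions.

```java
import java.util.Arrays;

// Hypothetical scalar model of VectorMask.fromLong/toLong for a species with
// `lanes` lanes (1..64). Not the Vector API and not C2's SVE code: it only
// illustrates the documented bit <-> lane mapping (bit i <-> lane i).
final class MaskLongModel {

    // Lane i is set iff bit i of `bits` is set; bits at index >= lanes are ignored.
    static boolean[] fromLong(long bits, int lanes) {
        if (lanes < 1 || lanes > Long.SIZE) {
            throw new IllegalArgumentException("lanes must be in 1..64: " + lanes);
        }
        boolean[] mask = new boolean[lanes];
        for (int i = 0; i < lanes; i++) {
            mask[i] = ((bits >>> i) & 1L) != 0;
        }
        return mask;
    }

    // Bit i of the result is set iff lane i of the mask is set.
    static long toLong(boolean[] mask) {
        long bits = 0L;
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        long x = 0b1011_0110L;
        boolean[] m = fromLong(x, 8);
        System.out.println(Arrays.toString(m));
        // The round trip gives back x (masked to 8 bits), so a test that only
        // compares toLong(fromLong(x)) with x cannot tell a real conversion
        // from one that the compiler folded away.
        System.out.println(toLong(m) == x);
    }
}
```

Under this model the pair reduces to the input masked to the lane count. A test that only checks the round-trip result therefore cannot distinguish a correct conversion from one that was folded away by the identity above.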

Benchmarks show a significant improvement on NVIDIA's Grace CPU.

Here is the performance data with `-XX:UseSVE=2`:

Benchmark                                   bits inputs Mode   Unit     Before       After    Gain
MaskQueryOperationsBenchmark.testToLongByte  128    1  thrpt  ops/ms  322151.976  1318576.736 4.09
MaskQueryOperationsBenchmark.testToLongByte  128    2  thrpt  ops/ms  322187.144  1315736.931 4.08
MaskQueryOperationsBenchmark.testToLongByte  128    3  thrpt  ops/ms  322213.330  1353272.882 4.19
MaskQueryOperationsBenchmark.testToLongInt   128    1  thrpt  ops/ms 1009426.292  1339834.833 1.32
MaskQueryOperationsBenchmark.testToLongInt   128    2  thrpt  ops/ms 1010311.371  1368379.465 1.35
MaskQueryOperationsBenchmark.testToLongInt   128    3  thrpt  ops/ms 1013333.729  1368077.534 1.35
MaskQueryOperationsBenchmark.testToLongLong  128    1  thrpt  ops/ms  892649.449  1301954.698 1.45
MaskQueryOperationsBenchmark.testToLongLong  128    2  thrpt  ops/ms  894593.615  1324922.719 1.48
MaskQueryOperationsBenchmark.testToLongLong  128    3  thrpt  ops/ms  884498.938  1289828.319 1.45
MaskQueryOperationsBenchmark.testToLongShort 128    1  thrpt  ops/ms 1093444.011  1374164.132 1.25
MaskQueryOperationsBenchmark.testToLongShort 128    2  thrpt  ops/ms 1080117.255  1369234.390 1.26
MaskQueryOperationsBenchmark.testToLongShort 128    3  thrpt  ops/ms 1076327.072  1373219.435 1.27


And here is the performance data with `-XX:UseSVE=1`:

Benchmark                                   bits inputs Mode   Unit   Before        After     Gain
MaskQueryOperationsBenchmark.testToLongByte  128    1  thrpt  ops/ms 686584.179   800329.010  1.16
MaskQueryOperationsBenchmark.testToLongByte  128    2  thrpt  ops/ms 686184.083   801754.893  1.16
MaskQueryOperationsBenchmark.testToLongByte  128    3  thrpt  ops/ms 686426.883   799058.199  1.16
MaskQueryOperationsBenchmark.testToLongInt   128    1  thrpt  ops/ms 945359.331  1179824.693  1.24
MaskQueryOperationsBenchmark.testToLongInt   128    2  thrpt  ops/ms 946546.502  1169208.723  1.23
MaskQueryOperationsBenchmark.testToLongInt   128    3  thrpt  ops/ms 943207.037  1176056.895  1.24
MaskQueryOperationsBenchmark.testToLongLong  128    1  thrpt  ops/ms 874121.577  1179473.834  1.34
MaskQueryOperationsBenchmark.testToLongLong  128    2  thrpt  ops/ms 881023.640  1180854.086  1.34
MaskQueryOperationsBenchmark.testToLongLong  128    3  thrpt  ops/ms 880149.334  1160048.226  1.31
MaskQueryOperationsBenchmark.testToLongShort 128    1  thrpt  ops/ms 938451.594  1164668.529  1.24
MaskQueryOperationsBenchmark.testToLongShort 128    2  thrpt  ops/ms 939189.649  1187096.328  1.26
MaskQueryOperationsBenchmark.testToLongShort 128    3  thrpt  ops/ms 938601.147  1181154.558  1.25

-------------

Commit messages:
 - 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE

Changes: https://git.openjdk.org/jdk/pull/27481/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27481&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367292
  Stats: 710 lines in 48 files changed: 355 ins; 79 del; 276 mod
  Patch: https://git.openjdk.org/jdk/pull/27481.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27481/head:pull/27481

PR: https://git.openjdk.org/jdk/pull/27481

From dzhang at openjdk.org  Thu Sep 25 04:34:30 2025
From: dzhang at openjdk.org (Dingli Zhang)
Date: Thu, 25 Sep 2025 04:34:30 GMT
Subject: RFR: 8368206: RISC-V:
 compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without
 RVV
In-Reply-To: <cJoNPYLDWCQzkl7IoEdkvoHe1jwW8M67lDRx0DJefIQ=.47d597da-b267-4b03-9390-0910c364e909@github.com>
References: <zeLrBLLs95EZvueIFs073vLvuX2LSn6XX51lOZtxh64=.8ed10e1e-08df-49d3-b987-48349f07eeb0@github.com>
 <Vg8YsRztlY_-bhe05rpBS39jcLb9vHAGG4kXibYMa7M=.3281cc09-e70f-4cd2-9bbf-698329486546@github.com>
 <cJoNPYLDWCQzkl7IoEdkvoHe1jwW8M67lDRx0DJefIQ=.47d597da-b267-4b03-9390-0910c364e909@github.com>
Message-ID: <fZHER5Q-_hOJz_9euc1xJXNQRz6hj366NLURfp3tW-Y=.825ca905-fd2b-4f44-8afc-f3efa564ba19@github.com>

On Tue, 23 Sep 2025 06:44:46 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hey, I'm wondering if all the tests under hotspot/jtreg/compiler/vectorapi should `@require` rvv? Otherwise seems they are not really testing anything useful?

@Hamlin-Li Sorry, I think I misunderstood you earlier.
Currently, this is the only test case under `hotspot/jtreg/compiler/vectorapi` that fails without RVV, so I plan to merge this PR first and then open a separate issue [JDK-8368602](https://bugs.openjdk.org/browse/JDK-8368602) to discuss the situation you mentioned.
Does that work for you?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27414#issuecomment-3332098367

From chagedorn at openjdk.org  Thu Sep 25 05:42:16 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 25 Sep 2025 05:42:16 GMT
Subject: RFR: 8368525: nmethod ic cleanup [v2]
In-Reply-To: <HMsUlnEn4t3RsBNmzZW2uAxHOwnntebvxp390bUUp8g=.1102e0df-0a8e-4ac3-b862-43e542232fd5@github.com>
References: <KUTGfmq27Ye-obMm6eyQw1elN2hK-i3y4E5uF4ziCMI=.069f279e-2e5e-4384-95f2-6dd68b28885c@github.com>
 <HMsUlnEn4t3RsBNmzZW2uAxHOwnntebvxp390bUUp8g=.1102e0df-0a8e-4ac3-b862-43e542232fd5@github.com>
Message-ID: <jZLF2ZPq7zihkxCy6zertVN61fU6-dYP8RVkrfnemZk=.86f93485-fe23-49b1-af97-006a3c5b797c@github.com>

On Wed, 24 Sep 2025 12:38:32 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review this simple patch?
>> 
>> There is an unused parameter and an unnecessary/misleading method in nmethod.cpp; it is better to clean them up.
>> I guess it might be a leftover after some previous refactoring? But I did not check further.
>> 
>> Thanks!
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   typo

Marked as reviewed by chagedorn (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/27464#pullrequestreview-3265799709

From xgong at openjdk.org  Thu Sep 25 05:47:24 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Thu, 25 Sep 2025 05:47:24 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v5]
In-Reply-To: <ek0-OWP6uD4tPufaQ7o6QgOsuP4jd_RmoC19-qnYsNI=.966ffaa2-4d35-4815-aadb-3b5ee78484a5@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <sikeN2K9EHmLEEnGsNttJSsPefvEtrKk9q7OOIgqQG0=.238662f7-394c-475b-bd07-effb28302b2c@github.com>
 <LeM6JS1sHVJMB2Ziw2pYgcTQr7hHcPUSK6guAV5DCC0=.f3de7008-b13c-4a96-bc85-b50784bb21ce@github.com>
 <wBLVOTyKQkrpbBB04sxxxgZ0wl7A8_HuC2K-3s8j4wQ=.22931e86-0381-46ea-8d34-5020542e063c@github.com>
 <0lnaxN7YsQEddGZfWLgFi2YOl_XtXntDoHRr57Bjp7k=.946b3e40-04c1-4eb5-a205-53347cdc91eb@github.com>
 <P1FNs23o3qks_15w5YJCBfiwLMs1QW_aBI2KSkptKZ4=.83c7e3b3-15fe-47a0-86dc-e1549af59e20@github.com>
 <mJ83rN71NdOfHjFP9bpFosqBeBf220ODeE36Bt6wBmA=.74e33425-bda6-41ff-81ac-880210e98c3b@github.com>
 <l_hsWVcB5TbA42zDJMgwquys1I9dGw-BxvjdB4i3ys0=.43acfff8-07c6-4982-ac33-e24a6fc4cbe1@github.com>
 <ek0-OWP6uD4tPufaQ7o6QgOsuP4jd_RmoC19-qnYsNI=.966ffaa2-4d35-4815-aadb-3b5ee78484a5@github.com>
Message-ID: <GNqs63R5tR5m27cXQhQxgqyrGmjDbiuwXNJi_97Hrf0=.54411d88-2f83-44b2-ba54-e43f3f07cd30@github.com>

On Wed, 24 Sep 2025 18:04:50 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

> I started looking at the PR and it looks appealing to simplify VM intrinsics and lift more code into Java. In other words, subword gather operation can be coded as a composition of operations on int vectors. Have you considered that?

Thanks so much for looking at this PR! Yes, I think we could move the generation of these ops to the Java level for the subword gather operation, and I also considered this when I started working on this task. However, this may break the current backend implementations for other architectures like x86. I'm not sure whether moving to Java would also be friendly to non-SVE architectures. As I understand it, subword gather depends much more on the backend solution.

> For now, a dedicated node to concatenate vectors look appropriate (please, note there's existing PackNode et al).
> It can be either exposed through VM intrinsic or substituted for a well-known complex IR shape during IGVN (like the one you depicted). The nice thing is it'll uniformly cover all usages irrespective of whether they come from Vector API implementation itself or from user code.
>
> In the context of Vector API, the plan was to expose generic element rearranges/shuffles through API, but then enable various strength-reductions to optimize well-known/popular shapes. Packing multiple vectors perfectly fits that effort.

Thanks for your input on the IR choice. I agree with adding such a vector concatenation node in C2. And if we decide to move the complex implementation to the Java level, we should also add a corresponding API for vector concatenation, right?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2377806425

From duke at openjdk.org  Thu Sep 25 06:46:52 2025
From: duke at openjdk.org (erifan)
Date: Thu, 25 Sep 2025 06:46:52 GMT
Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when
 MaxVectorSize=8
In-Reply-To: <fVnd20lp7ktkJCn1jfkdPJ_btnf0FTqs5TBJVCToFQI=.2a23d3cc-f75c-464a-9f97-2032579629fb@github.com>
References: <RF-pB-qfXweWi6FuSCZZWnxElKTYN1m6JjP74TDVWqo=.81c23890-ed5e-454e-baab-b6119a3941e8@github.com>
 <fVnd20lp7ktkJCn1jfkdPJ_btnf0FTqs5TBJVCToFQI=.2a23d3cc-f75c-464a-9f97-2032579629fb@github.com>
Message-ID: <t_5c0G7oxWo_STsHm7cD69tDbOnO1b89bTfqT97-rHs=.f0074364-8ba8-4b75-8c4f-4a9f8edb7398@github.com>

On Wed, 24 Sep 2025 16:19:25 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

> Looking at the test, I see that there are other tests that at a glance don't seem to use `I_SPECIES_FOR_CAST`. Shouldn't this limitation be applied to only the tests that do assert that?

Hi @galderz thanks for your input.

Yeah, I agree it would be great if we could precisely control which tests are skipped, so that only the relevant ones don't run when MaxVectorSize==8, but the current test framework doesn't handle this well. One approach is to introduce a runtime check, but I feel this approach is inelegant and lacks precedent.

Given that the optimizations tested here are independent of MaxVectorSize, and that many architectures don't generate some vector nodes when MaxVectorSize==8, I think excluding the MaxVectorSize==8 case is OK. What do you think?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27418#issuecomment-3332414430

From mhaessig at openjdk.org  Thu Sep 25 06:56:21 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Thu, 25 Sep 2025 06:56:21 GMT
Subject: RFR: 8368525: nmethod ic cleanup [v2]
In-Reply-To: <HMsUlnEn4t3RsBNmzZW2uAxHOwnntebvxp390bUUp8g=.1102e0df-0a8e-4ac3-b862-43e542232fd5@github.com>
References: <KUTGfmq27Ye-obMm6eyQw1elN2hK-i3y4E5uF4ziCMI=.069f279e-2e5e-4384-95f2-6dd68b28885c@github.com>
 <HMsUlnEn4t3RsBNmzZW2uAxHOwnntebvxp390bUUp8g=.1102e0df-0a8e-4ac3-b862-43e542232fd5@github.com>
Message-ID: <qN0GtOkjDMJmXrpuavOjgbmg7kzFOHLhdhVf1puBIJ8=.544f4f47-2a73-462b-b519-7440220ac60f@github.com>

On Wed, 24 Sep 2025 12:38:32 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review this simple patch?
>> 
>> There is an unused parameter and an unnecessary/misleading method in nmethod.cpp; it is better to clean them up.
>> I guess it might be a leftover after some previous refactoring? But I did not check further.
>> 
>> Thanks!
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   typo

Thank you for cleaning this up, @Hamlin-Li! This looks good to me.

-------------

Marked as reviewed by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27464#pullrequestreview-3266043420

From rcastanedalo at openjdk.org  Thu Sep 25 07:40:20 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 25 Sep 2025 07:40:20 GMT
Subject: RFR: 8359378: aarch64: crash when using -XX:+UseFPUForSpilling
In-Reply-To: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
References: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
Message-ID: <tTiM2e_KQWl6qahYN4Ehlebb1JS-Z9HS7CdMxsbC1xI=.b4092927-8be8-4c8c-929a-1d90e8ada482@github.com>

On Wed, 17 Sep 2025 16:19:12 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:

> The AArch64 BarrierSetAssembler path assumes that only FP/vector ideal registers reach the FP spill/restore encoding. With -XX:+UseFPUForSpilling, the register allocator may allocate scalar values in FP registers. When such values (Op_RegI/Op_RegN/Op_RegL/Op_RegP) hit `BarrierSetAssembler::encode_float_vector_register_size`, we trip ShouldNotReachHere in release builds and the **"unexpected ideal register"** assertion in debug builds.
> 
> Fix: teach the encoder to handle scalar ideal regs when they physically live in FP regs:
> - treat Op_RegI / Op_RegN as 32-bit (single slot) - same class as Op_RegF
> - treat Op_RegL / Op_RegP as 64-bit (two slots) - same class as Op_RegD
> 
> Related:
> - reproduced since #19746
> - spilling logic: 
>   - #18967
>   - #17977
> 
> Testing: tier1-3 with javaoptions -Xcomp -Xbatch -XX:+UseFPUForSpilling on AArch64
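
Here is a minimal sketch of the register-class mapping described in the fix. It is written in Java for illustration only. The real change is in HotSpot's AArch64 C++ code, and `FpSpillSlots`, `IdealReg` and `slotsInFpRegister` are hypothetical names that merely mirror C2's `Op_Reg*` ideal register kinds. Vector kinds, which the real encoder also handles, are left out.

```java
// Hypothetical Java rendering of the fix's register-class mapping. The real
// change is C++ in HotSpot's AArch64 BarrierSetAssembler; these names only
// mirror C2's ideal register kinds (Op_RegI, Op_RegN, ...).
final class FpSpillSlots {

    enum IdealReg { RegI, RegN, RegF, RegL, RegP, RegD }

    // Number of 32-bit slots the value occupies when it lives in an FP register.
    static int slotsInFpRegister(IdealReg r) {
        switch (r) {
            case RegF:
            case RegI:   // 32-bit int spilled to an FP reg: same class as RegF
            case RegN:   // 32-bit narrow oop: same class as RegF
                return 1;
            case RegD:
            case RegL:   // 64-bit long: same class as RegD
            case RegP:   // 64-bit pointer: same class as RegD
                return 2;
            default:
                // Stands in for the "unexpected ideal register" assert; unreachable
                // for the kinds modeled here (vector kinds are not modeled).
                throw new IllegalStateException("unexpected ideal register: " + r);
        }
    }

    public static void main(String[] args) {
        for (IdealReg r : IdealReg.values()) {
            System.out.println(r + " -> " + slotsInFpRegister(r) + " slot(s)");
        }
    }
}
```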

The fix and the test results look good.

-------------

Marked as reviewed by rcastanedalo (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27350#pullrequestreview-3266208316

From mli at openjdk.org  Thu Sep 25 08:13:53 2025
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 25 Sep 2025 08:13:53 GMT
Subject: RFR: 8368525: nmethod ic cleanup [v2]
In-Reply-To: <jZLF2ZPq7zihkxCy6zertVN61fU6-dYP8RVkrfnemZk=.86f93485-fe23-49b1-af97-006a3c5b797c@github.com>
References: <KUTGfmq27Ye-obMm6eyQw1elN2hK-i3y4E5uF4ziCMI=.069f279e-2e5e-4384-95f2-6dd68b28885c@github.com>
 <HMsUlnEn4t3RsBNmzZW2uAxHOwnntebvxp390bUUp8g=.1102e0df-0a8e-4ac3-b862-43e542232fd5@github.com>
 <jZLF2ZPq7zihkxCy6zertVN61fU6-dYP8RVkrfnemZk=.86f93485-fe23-49b1-af97-006a3c5b797c@github.com>
Message-ID: <NVNOw_txR0iIK6Mxr86Z6GLD0-R61xWicRA6dXn1eZQ=.46e9ed57-e0e5-4651-8b10-6e38c6fd42e9@github.com>

On Thu, 25 Sep 2025 05:39:16 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   typo
>
> Marked as reviewed by chagedorn (Reviewer).

Thank you @chhagedorn @mhaessig!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27464#issuecomment-3332729950

From mli at openjdk.org  Thu Sep 25 08:13:55 2025
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 25 Sep 2025 08:13:55 GMT
Subject: Integrated: 8368525: nmethod ic cleanup
In-Reply-To: <KUTGfmq27Ye-obMm6eyQw1elN2hK-i3y4E5uF4ziCMI=.069f279e-2e5e-4384-95f2-6dd68b28885c@github.com>
References: <KUTGfmq27Ye-obMm6eyQw1elN2hK-i3y4E5uF4ziCMI=.069f279e-2e5e-4384-95f2-6dd68b28885c@github.com>
Message-ID: <d1X-IdlsxvH3c5xSBoq76t9Ijq241TEgByOtv8MLlnY=.83b4d45c-a57e-46ee-8ed2-ba9aa84762ee@github.com>

On Wed, 24 Sep 2025 09:50:24 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Hi,
> Can you help to review this simple patch?
> 
> There is an unused parameter and an unnecessary/misleading method in nmethod.cpp; it is better to clean them up.
> I guess it might be a leftover after some previous refactoring? But I did not check further.
> 
> Thanks!

This pull request has now been integrated.

Changeset: e6ddb396
Author:    Hamlin Li <mli at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/e6ddb39635cb8b5a21445a50b28aeeebc9e1d9d3
Stats:     10 lines in 1 file changed: 0 ins; 6 del; 4 mod

8368525: nmethod ic cleanup

Reviewed-by: chagedorn, mhaessig

-------------

PR: https://git.openjdk.org/jdk/pull/27464

From mli at openjdk.org  Thu Sep 25 08:19:07 2025
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 25 Sep 2025 08:19:07 GMT
Subject: RFR: 8368206: RISC-V:
 compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without
 RVV
In-Reply-To: <zeLrBLLs95EZvueIFs073vLvuX2LSn6XX51lOZtxh64=.8ed10e1e-08df-49d3-b987-48349f07eeb0@github.com>
References: <zeLrBLLs95EZvueIFs073vLvuX2LSn6XX51lOZtxh64=.8ed10e1e-08df-49d3-b987-48349f07eeb0@github.com>
Message-ID: <NAoIEX4iT6yMyPf_DMZyqoOeJWDt5cVMBY4nKSqydjg=.d46f8e74-fbe1-4bef-ba68-7099ce2c7d47@github.com>

On Mon, 22 Sep 2025 03:16:29 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help to review this patch? Thanks!
> 
> We noticed that compiler/vectorapi/VectorMaskCompareNotTest.java fails when running on sg2042.
> On RISC-V without RVV, ofLargestShape(long.class) falls back to 64 bits (see getMaxVectorBitSize in VectorShape.java),
> leading to VectorShape.forBitSize(32) which is unsupported and throws IllegalArgumentException.
> 
> ### Test (fastdebug)
> - [x] Run compiler/vectorapi/VectorMaskCompareNotTest.java on sg2042
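
Here is a hypothetical sketch of the failure mode described above. It assumes, without confirmation from the patch text, that the test derives an int species with the same lane count as the largest long species (e.g. for casts). `ShapeModel` and its methods are illustrative names that mimic `VectorShape.forBitSize` but are not it. Only the fixed 64/128/256/512-bit shapes are modeled; the real method may accept additional sizes for `S_Max_BIT`. The smallest shape is 64 bits, so a 32-bit request is rejected.

```java
// Hypothetical model, not jdk.incubator.vector.VectorShape: only the fixed
// shapes are modeled, and the smallest of them is 64 bits.
final class ShapeModel {
    static final int[] FIXED_SHAPE_BITS = {64, 128, 256, 512};

    // Returns the bit size if a fixed shape matches; otherwise throws
    // IllegalArgumentException, like the failure reported in the PR.
    static int forBitSize(int bits) {
        for (int s : FIXED_SHAPE_BITS) {
            if (s == bits) {
                return s;
            }
        }
        throw new IllegalArgumentException("unsupported vector bit size: " + bits);
    }

    // Assumption: an int species is sized to match the lane count of the
    // largest long species (Long.SIZE bits per long lane, Integer.SIZE per int lane).
    static int intBitsMatchingLongLanes(int maxLongVectorBits) {
        int lanes = maxLongVectorBits / Long.SIZE;
        return lanes * Integer.SIZE;
    }

    public static void main(String[] args) {
        int withoutRvv = intBitsMatchingLongLanes(64);      // 1 long lane -> 32 bits
        int with128BitMax = intBitsMatchingLongLanes(128);  // 2 long lanes -> 64 bits
        System.out.println(forBitSize(with128BitMax));
        try {
            forBitSize(withoutRvv);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```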

Marked as reviewed by mli (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/27414#pullrequestreview-3266394363

From mli at openjdk.org  Thu Sep 25 08:19:08 2025
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 25 Sep 2025 08:19:08 GMT
Subject: RFR: 8368206: RISC-V:
 compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without
 RVV
In-Reply-To: <fZHER5Q-_hOJz_9euc1xJXNQRz6hj366NLURfp3tW-Y=.825ca905-fd2b-4f44-8afc-f3efa564ba19@github.com>
References: <zeLrBLLs95EZvueIFs073vLvuX2LSn6XX51lOZtxh64=.8ed10e1e-08df-49d3-b987-48349f07eeb0@github.com>
 <Vg8YsRztlY_-bhe05rpBS39jcLb9vHAGG4kXibYMa7M=.3281cc09-e70f-4cd2-9bbf-698329486546@github.com>
 <cJoNPYLDWCQzkl7IoEdkvoHe1jwW8M67lDRx0DJefIQ=.47d597da-b267-4b03-9390-0910c364e909@github.com>
 <fZHER5Q-_hOJz_9euc1xJXNQRz6hj366NLURfp3tW-Y=.825ca905-fd2b-4f44-8afc-f3efa564ba19@github.com>
Message-ID: <e2Fo04vuD-ezva9RtdefjjKxro6vyTKMEOk_kwRJfok=.8fcdff23-0c8a-4234-b569-e98441901cc1@github.com>

On Thu, 25 Sep 2025 04:31:55 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

>>> Hey, I'm wondering if all the tests under hotspot/jtreg/compiler/vectorapi should `@require` rvv? Otherwise seems they are not really testing anything useful?
>> 
>> @Hamlin-Li Good question! I think almost all IR related tests need RVV. Some non-IR tests can use scalars to implement vectorapi, such as `compiler/vectorapi/TestVectorShuffleIotaByte.java`, which also passes on sg2042 (without RVV).
>
>> Hey, I'm wondering if all the tests under hotspot/jtreg/compiler/vectorapi should `@require` rvv? Otherwise seems they are not really testing anything useful?
> 
> @Hamlin-Li Sorry, I think I misunderstood you earlier.
> Currently, this is the only test case under `hotspot/jtreg/compiler/vectorapi` that fails without RVV, so I plan to merge this PR first and then open a separate issue [JDK-8368602](https://bugs.openjdk.org/browse/JDK-8368602) to discuss the situation you mentioned.
> Does that work for you ?

@DingliZhang Thank you. The current PR looks good.

Not sure what to do with the other vectorapi tests, but it would be good to have a look if you have time. Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27414#issuecomment-3332760654

From duke at openjdk.org  Thu Sep 25 08:49:47 2025
From: duke at openjdk.org (Yuri Gaevsky)
Date: Thu, 25 Sep 2025 08:49:47 GMT
Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v28]
In-Reply-To: <kPh3dXb-xnnjPpbfqTpLZ-veI7xdJSVP1DXAjd-gRJk=.3f697579-afd3-480c-b937-f5400e5d3883@github.com>
References: <zLCHjD8YiUNz4lIXlaVpeUlxRFNkExCwc-3lwUl2lVw=.72822718-a2b3-49e0-b3cb-ca1c803cbb4f@github.com>
 <kPh3dXb-xnnjPpbfqTpLZ-veI7xdJSVP1DXAjd-gRJk=.3f697579-afd3-480c-b937-f5400e5d3883@github.com>
Message-ID: <JYlb9XBTF_RIi2siWnBij0s9aWkmV-ovIbk1dpjtARI=.a9d6e310-4b60-42a5-b4c1-18e04aa5f7a0@github.com>

On Mon, 18 Aug 2025 08:37:06 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:

>> The patch adds the possibility of using RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware.
>> 
>> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.
>
> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
> 
>   - minor updates requested by reviewer

Kind reminder...

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3332914366

From duke at openjdk.org  Thu Sep 25 09:17:16 2025
From: duke at openjdk.org (erifan)
Date: Thu, 25 Sep 2025 09:17:16 GMT
Subject: RFR: 8303762: Optimize vector slice operation with constant index
 using VPALIGNR instruction [v8]
In-Reply-To: <ledEIa5Cj9jGqJfRfGBCIF6_nvGBw5lG8j2ksYwePs8=.6dd826a7-c935-40be-afea-bd8ed146560f@github.com>
References: <oHhjicRldNzRK9vWNQFhpglJ-yICPm9ZgXH7VdwKaug=.8c5ec00b-7b29-4719-9d89-7dcbb28f6c79@github.com>
 <ledEIa5Cj9jGqJfRfGBCIF6_nvGBw5lG8j2ksYwePs8=.6dd826a7-c935-40be-afea-bd8ed146560f@github.com>
Message-ID: <yMkNR3nPhpeTop0AxSZMW7f7sIHMYcEvDtwV86IFUVc=.cbbd64f8-ee7a-4dbd-8252-9ab24e39f5ae@github.com>

On Wed, 20 Aug 2025 10:11:47 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> This patch optimizes the Vector.slice operation with a constant index using the x86 ALIGNR instruction.
>> It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails.
>> 
>> The idea here is to add infrastructure support to enable intrinsification of the fast path for selected vector APIs, or else enable inlining of the fallback implementation if it is based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bimorphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and is invoked during the incremental inlining optimization. It also relieves the inline expander of handling slow paths, which can easily be implemented on the library side (Java).
>> 
>> Vector API jtreg tests pass at AVX level 2, remaining validation in progress.
>> 
>> Performance numbers:
>> 
>> 
>> System : 13th Gen Intel(R) Core(TM) i3-1315U
>> 
>> Baseline:
>> Benchmark                                                (size)   Mode  Cnt      Score   Error   Units
>> VectorSliceBenchmark.byteVectorSliceWithConstantIndex1     1024  thrpt    2   9444.444          ops/ms
>> VectorSliceBenchmark.byteVectorSliceWithConstantIndex2     1024  thrpt    2  10009.319          ops/ms
>> VectorSliceBenchmark.byteVectorSliceWithVariableIndex      1024  thrpt    2   9081.926          ops/ms
>> VectorSliceBenchmark.intVectorSliceWithConstantIndex1      1024  thrpt    2   6085.825          ops/ms
>> VectorSliceBenchmark.intVectorSliceWithConstantIndex2      1024  thrpt    2   6505.378          ops/ms
>> VectorSliceBenchmark.intVectorSliceWithVariableIndex       1024  thrpt    2   6204.489          ops/ms
>> VectorSliceBenchmark.longVectorSliceWithConstantIndex1     1024  thrpt    2   1651.334          ops/ms
>> VectorSliceBenchmark.longVectorSliceWithConstantIndex2     1024  thrpt    2   1642.784          ops/ms
>> VectorSliceBenchmark.longVectorSliceWithVariableIndex      1024  thrpt    2   1474.808          ops/ms
>> VectorSliceBenchmark.shortVectorSliceWithConstantIndex1    1024  thrpt    2  10399.394          ops/ms
>> VectorSliceBenchmark.shortVectorSliceWithConstantIndex2    1024  thrpt    2  10502.894          ops/ms
>> VectorSliceB...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update callGenerator.hpp copyright year

src/hotspot/share/classfile/vmIntrinsics.hpp line 1178:

> 1176:                                    "Ljdk/internal/vm/vector/VectorSupport$Vector;"                                                             \
> 1177:                                    "Ljdk/internal/vm/vector/VectorSupport$VectorSliceOp;)"                                                     \
> 1178:                                    "Ljdk/internal/vm/vector/VectorSupport$Vector;")                                                     \

It seems this `\` is not aligned?

src/hotspot/share/classfile/vmIntrinsics.hpp line 1179:

> 1177:                                    "Ljdk/internal/vm/vector/VectorSupport$VectorSliceOp;)"                                                     \
> 1178:                                    "Ljdk/internal/vm/vector/VectorSupport$Vector;")                                                     \
> 1179:    do_name(vector_slice_name, "sliceOp")                                                                                                         \

ditto

test/hotspot/jtreg/compiler/vectorapi/TestSliceOptValueTransforms.java line 45:

> 43:     public static final VectorSpecies<Short> SSP = ShortVector.SPECIES_PREFERRED;
> 44:     public static final VectorSpecies<Integer> ISP = IntVector.SPECIES_PREFERRED;
> 45:     public static final VectorSpecies<Long> LSP = LongVector.SPECIES_PREFERRED;

The implementation supports floating point types, but why doesn't the test include fp types?

test/hotspot/jtreg/compiler/vectorapi/TestSliceOptValueTransforms.java line 122:

> 120:                       .intoArray(bdst, i);
> 121:         }
> 122:     }

Since this optimization also benefits the slice variant with mask, could you add some tests for it as well?
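
For reference while reviewing the masked-variant question, here is a scalar sketch of what `slice(origin, w)` and `slice(origin, w, m)` compute. It follows my reading of the Vector API javadoc, where the masked form behaves like `broadcast(0).blend(slice(origin, w), m)`. Arrays stand in for vectors of equal length, and `SliceModel` is a hypothetical name. This is not the intrinsic or its ALIGNR lowering.

```java
import java.util.Arrays;

// Scalar reference model of Vector.slice(origin, w) and slice(origin, w, m),
// following the Vector API javadoc. int[] arrays of equal length stand in for
// vectors. Hypothetical helper, not the intrinsic or its ALIGNR lowering.
final class SliceModel {

    // Result lane i is lane (origin + i) of the concatenation [v, w].
    static int[] slice(int[] v, int origin, int[] w) {
        int n = v.length;
        if (w.length != n || origin < 0 || origin > n) {
            throw new IndexOutOfBoundsException("origin " + origin + " for length " + n);
        }
        int[] r = new int[n];
        for (int i = 0; i < n; i++) {
            int j = origin + i;
            r[i] = (j < n) ? v[j] : w[j - n];
        }
        return r;
    }

    // Masked form: like broadcast(0).blend(slice(origin, w), m), i.e. lanes whose
    // mask bit is unset become zero.
    static int[] slice(int[] v, int origin, int[] w, boolean[] m) {
        int[] r = slice(v, origin, w);
        for (int i = 0; i < r.length; i++) {
            if (!m[i]) {
                r[i] = 0;
            }
        }
        return r;
    }

    public static void main(String[] args) {
        int[] v = {0, 1, 2, 3};
        int[] w = {4, 5, 6, 7};
        System.out.println(Arrays.toString(slice(v, 1, w)));
        System.out.println(Arrays.toString(slice(v, 2, w, new boolean[] {true, false, true, false})));
    }
}
```

With a constant `origin`, the whole operation is a fixed lane rearrangement, which is presumably what makes a single ALIGNR-style instruction applicable. Masked tests would additionally exercise the zeroing step.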

test/micro/org/openjdk/bench/jdk/incubator/vector/VectorSliceBenchmark.java line 59:

> 57:     static final VectorSpecies<Short> sspecies   = ShortVector.SPECIES_PREFERRED;
> 58:     static final VectorSpecies<Integer> ispecies = IntVector.SPECIES_PREFERRED;
> 59:     static final VectorSpecies<Long> lspecies    = LongVector.SPECIES_PREFERRED;

Ditto: no FP types?

test/micro/org/openjdk/bench/jdk/incubator/vector/VectorSliceBenchmark.java line 133:

> 131:                       .intoArray(bdst, i);
> 132:         }
> 133:     }

Ditto: could you add a benchmark for the masked slice variant?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2378092410
PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2378093047
PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2378310217
PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2378337340
PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2378312763
PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2378342519

From dzhang at openjdk.org  Thu Sep 25 10:10:32 2025
From: dzhang at openjdk.org (Dingli Zhang)
Date: Thu, 25 Sep 2025 10:10:32 GMT
Subject: RFR: 8368206: RISC-V:
 compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without
 RVV
In-Reply-To: <zeLrBLLs95EZvueIFs073vLvuX2LSn6XX51lOZtxh64=.8ed10e1e-08df-49d3-b987-48349f07eeb0@github.com>
References: <zeLrBLLs95EZvueIFs073vLvuX2LSn6XX51lOZtxh64=.8ed10e1e-08df-49d3-b987-48349f07eeb0@github.com>
Message-ID: <qikZ-5eEEee73i0P4wkWmfk_68QMOkgEPIkHI2iTzbQ=.28c3fac3-85c9-4a2d-bac6-da250af3d39e@github.com>

On Mon, 22 Sep 2025 03:16:29 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help to review this patch? Thanks!
> 
> We noticed that compiler/vectorapi/VectorMaskCompareNotTest.java fails when running on sg2042.
> On RISC-V without RVV, ofLargestShape(long.class) falls back to 64 bits (see getMaxVectorBitSize in VectorShape.java),
> leading to VectorShape.forBitSize(32) which is unsupported and throws IllegalArgumentException.
> 
> ### Test (fastdebug)
> - [x] Run compiler/vectorapi/VectorMaskCompareNotTest.java on sg2042

Thanks all for the review!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27414#issuecomment-3333219107

From dzhang at openjdk.org  Thu Sep 25 10:10:34 2025
From: dzhang at openjdk.org (Dingli Zhang)
Date: Thu, 25 Sep 2025 10:10:34 GMT
Subject: Integrated: 8368206: RISC-V:
 compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without
 RVV
In-Reply-To: <zeLrBLLs95EZvueIFs073vLvuX2LSn6XX51lOZtxh64=.8ed10e1e-08df-49d3-b987-48349f07eeb0@github.com>
References: <zeLrBLLs95EZvueIFs073vLvuX2LSn6XX51lOZtxh64=.8ed10e1e-08df-49d3-b987-48349f07eeb0@github.com>
Message-ID: <sy9yzMMhhs9h6F6FO8pllcFUFeVvUE12EaYLMvQWbx4=.806db115-5ba7-4519-af41-9d6d6bd48de9@github.com>

On Mon, 22 Sep 2025 03:16:29 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi,
> Can you help to review this patch? Thanks!
> 
> We noticed that compiler/vectorapi/VectorMaskCompareNotTest.java fails when running on sg2042.
> On RISC-V without RVV, ofLargestShape(long.class) falls back to 64 bits (see getMaxVectorBitSize in VectorShape.java),
> leading to VectorShape.forBitSize(32) which is unsupported and throws IllegalArgumentException.
> 
> ### Test (fastdebug)
> - [x] Run compiler/vectorapi/VectorMaskCompareNotTest.java on sg2042

This pull request has now been integrated.

Changeset: 67cb53d0
Author:    Dingli Zhang <dzhang at openjdk.org>
Committer: Fei Yang <fyang at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/67cb53d0888adfeb2909296e21d0532bc3643326
Stats:     1 line in 1 file changed: 1 ins; 0 del; 0 mod

8368206: RISC-V: compiler/vectorapi/VectorMaskCompareNotTest.java fails when running without RVV

Reviewed-by: fyang, mhaessig, mli

-------------

PR: https://git.openjdk.org/jdk/pull/27414

From rehn at openjdk.org  Thu Sep 25 11:30:07 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Thu, 25 Sep 2025 11:30:07 GMT
Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v28]
In-Reply-To: <JYlb9XBTF_RIi2siWnBij0s9aWkmV-ovIbk1dpjtARI=.a9d6e310-4b60-42a5-b4c1-18e04aa5f7a0@github.com>
References: <zLCHjD8YiUNz4lIXlaVpeUlxRFNkExCwc-3lwUl2lVw=.72822718-a2b3-49e0-b3cb-ca1c803cbb4f@github.com>
 <kPh3dXb-xnnjPpbfqTpLZ-veI7xdJSVP1DXAjd-gRJk=.3f697579-afd3-480c-b937-f5400e5d3883@github.com>
 <JYlb9XBTF_RIi2siWnBij0s9aWkmV-ovIbk1dpjtARI=.a9d6e310-4b60-42a5-b4c1-18e04aa5f7a0@github.com>
Message-ID: <r2TQQzhri52nuJ3SS64Zpd5XpqaAels7jSVptr49BXk=.da14e211-10a2-4e8d-8248-a9c5c11e423f@github.com>

On Thu, 25 Sep 2025 08:46:14 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:

> Kind reminder...

Sorry, I have been (and still am) very busy; thanks for the reminder!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3333497749

From duke at openjdk.org  Thu Sep 25 12:48:47 2025
From: duke at openjdk.org (Yuri Gaevsky)
Date: Thu, 25 Sep 2025 12:48:47 GMT
Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v28]
In-Reply-To: <kPh3dXb-xnnjPpbfqTpLZ-veI7xdJSVP1DXAjd-gRJk=.3f697579-afd3-480c-b937-f5400e5d3883@github.com>
References: <zLCHjD8YiUNz4lIXlaVpeUlxRFNkExCwc-3lwUl2lVw=.72822718-a2b3-49e0-b3cb-ca1c803cbb4f@github.com>
 <kPh3dXb-xnnjPpbfqTpLZ-veI7xdJSVP1DXAjd-gRJk=.3f697579-afd3-480c-b937-f5400e5d3883@github.com>
Message-ID: <HvDu6X-88FWwR1IkQdvR5pA8oLCFYSE_liDrcspuQ3I=.08f1abe2-8944-4227-b646-92de7b56d55f@github.com>

On Mon, 18 Aug 2025 08:37:06 GMT, Yuri Gaevsky <duke at openjdk.org> wrote:

>> The patch adds the possibility of using RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0-capable hardware.
>> 
>> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.
>
> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
> 
>   - minor updates requested by reviewer

Oh, no rush at all, do it at your convenience! Thanks...
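For readers following the RVV intrinsic: the hash being accelerated is the polynomial `h = 31*h + a[i]`, and it vectorizes because a chunk of K elements can be folded in at once using precomputed powers of 31. The sketch below is a scalar C++ model of that identity, not the RISC-V code in the PR; `hash_scalar`, `hash_chunked`, and the chunk size `K` (standing in for the vector lane count) are hypothetical names, and `uint32_t` is used to model Java's wrapping `int` arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference: h = 31*h + a[i], with Java's wrapping int arithmetic
// modeled by uint32_t (unsigned overflow is well defined in C++).
uint32_t hash_scalar(const std::vector<int32_t>& a, uint32_t h) {
  for (int32_t e : a) h = 31u * h + static_cast<uint32_t>(e);
  return h;
}

// Chunked form: for a chunk of K elements,
//   h' = 31^K * h + a[0]*31^(K-1) + ... + a[K-1]*31^0
// Each "lane" j multiplies its element by a precomputed power of 31 and the
// products are summed (a reduction), which is what enables vectorization.
uint32_t hash_chunked(const std::vector<int32_t>& a, uint32_t h, size_t K) {
  if (K == 0) return hash_scalar(a, h);  // degenerate chunk size
  std::vector<uint32_t> pow31(K + 1, 1u);
  for (size_t i = 1; i <= K; ++i) pow31[i] = pow31[i - 1] * 31u;
  size_t i = 0;
  for (; i + K <= a.size(); i += K) {
    uint32_t sum = 0;
    for (size_t j = 0; j < K; ++j)  // one lane per j
      sum += static_cast<uint32_t>(a[i + j]) * pow31[K - 1 - j];
    h = pow31[K] * h + sum;
  }
  for (; i < a.size(); ++i) h = 31u * h + static_cast<uint32_t>(a[i]);  // tail
  return h;
}
```

In a vector implementation, the inner `j` loop would become a lane-wise multiply by a vector of powers followed by a reduction, with the scalar tail handling the leftover elements.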

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-3333763304

From bulasevich at openjdk.org  Thu Sep 25 13:31:35 2025
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Thu, 25 Sep 2025 13:31:35 GMT
Subject: RFR: 8338197: [ubsan] ad_x86.hpp:6417:11: runtime error: shift
 exponent 100 is too large for 32-bit type 'unsigned int' [v3]
In-Reply-To: <NZwq8Kkwxohsja0ieCwyDN7OxhyqajxijdrgPcNAg8g=.38231d0f-9125-45b7-be7b-066de2b82f41@github.com>
References: <uKUByg7RkOyLsGYoajinrOf76Uu00PIJ-fBeWOKVNcI=.1d4fc3ed-2fd3-454c-9fa4-af97fc676b48@github.com>
 <-m2kLcudWsrunonBZQcUx_JfBOKYe3gsnAbwC4eHGGI=.9da84c19-c8fb-428a-979a-d18c8769ea6c@github.com>
 <cG2zLnsmi15Rz63eNkvKyfUKn2MRXYXDkv2XoicRdgU=.58e81cbd-10d8-4c72-8e13-4ad4d445d7f4@github.com>
 <NZwq8Kkwxohsja0ieCwyDN7OxhyqajxijdrgPcNAg8g=.38231d0f-9125-45b7-be7b-066de2b82f41@github.com>
Message-ID: <_Fj7QK00LVOaxn93CUKj_B9EsvqGs1JhNLCCrJ-odUc=.61708167-afde-42c5-8194-ab3e6e0b80b7@github.com>

On Tue, 26 Aug 2025 10:51:43 GMT, Andrew Dinn <adinn at openjdk.org> wrote:

> Yes, I think we should fix both, output_h.cpp and fixed_latency(100) on all platforms, then we can get rid of the workarounds and arm32-specific logic.

Thanks, @dean-long

Along with capping the possible max shift in the generated ad files, I also changed the hard-coded artificially large latency value from 100 to 30; it's still sufficiently large but stays within the bit-width range. Do you think this form is OK for integration, or should we refine it further?
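Background for the UBSan report: in C++, shifting a 32-bit unsigned value by a count greater than or equal to 32 is undefined behavior, so a latency of 100 used as a shift count is invalid no matter what result the hardware happens to produce. A minimal sketch of the capping approach, assuming the generated code builds a one-hot cycle bit from a latency value; `capped_cycle_bit` and `kMaskBits` are hypothetical and do not reflect the actual ADLC output:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Width of the (hypothetical) resource/cycle mask.
constexpr unsigned kMaskBits = 32;

// Builds a one-hot bit for the given latency. An unguarded `1u << latency`
// is undefined behavior for latency >= 32, so the shift count is clamped
// to the highest valid bit position.
uint32_t capped_cycle_bit(unsigned latency) {
  unsigned shift = std::min(latency, kMaskBits - 1);
  return uint32_t{1} << shift;
}
```

With the cap, every latency of 31 or more maps to the same cycle bit, so distinct large latencies become indistinguishable in the mask.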

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26890#issuecomment-3334057450

From bulasevich at openjdk.org  Thu Sep 25 13:35:53 2025
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Thu, 25 Sep 2025 13:35:53 GMT
Subject: RFR: 8359378: aarch64: crash when using -XX:+UseFPUForSpilling
In-Reply-To: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
References: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
Message-ID: <WCK7wq3o14zo6xmh79P3sMSOUendPaKgunvdl8mfbFQ=.66ad6075-bb7a-4844-8d35-bf1e8eda84ec@github.com>

On Wed, 17 Sep 2025 16:19:12 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:

> The AArch64 BarrierSetAssembler path assumes that only FP/vector ideal regs reach the FP spill/restore encoding. With -XX:+UseFPUForSpilling, the register allocator may allocate scalar values in FP registers. When such values (Op_RegI/Op_RegN/Op_RegL/Op_RegP) hit `BarrierSetAssembler::encode_float_vector_register_size`, we trip ShouldNotReachHere in release builds and the **"unexpected ideal register"** assertion in debug builds.
> 
> Fix: teach the encoder to handle scalar ideal regs when they physically live in FP regs:
> - treat Op_RegI / Op_RegN as 32-bit (single slot) - same class as Op_RegF
> - treat Op_RegL / Op_RegP as 64-bit (two slots) - same class as Op_RegD
> 
> Related:
> - reproduced since #19746
> - spilling logic: 
>   - #18967
>   - #17977
> 
> Testing: tier1-3 with javaoptions -Xcomp -Xbatch -XX:+UseFPUForSpilling on AArch64
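A sketch of the size mapping the fix describes, using a hypothetical `IdealReg` enum in place of C2's `Op_Reg*` opcodes. The real `encode_float_vector_register_size` signature and its return encoding are not shown in this thread, so this only models "scalar kind living in an FP register -> spill width":

```cpp
#include <cassert>

// Hypothetical stand-ins for C2's ideal register kinds (Op_RegI, Op_RegN, ...).
enum class IdealReg { RegI, RegN, RegL, RegP, RegF, RegD };

// Spill width in bits for a value that physically lives in an FP register.
int fp_spill_size_bits(IdealReg r) {
  switch (r) {
    case IdealReg::RegF:
    case IdealReg::RegI:  // scalar int in an FP register: single 32-bit slot
    case IdealReg::RegN:  // narrow oop: single 32-bit slot
      return 32;
    case IdealReg::RegD:
    case IdealReg::RegL:  // long: two slots, same class as RegD
    case IdealReg::RegP:  // pointer: two slots, same class as RegD
      return 64;
  }
  return -1;  // only reachable for out-of-range enum values
}
```

Before the fix, the scalar cases simply had no entry, which is why they fell through to ShouldNotReachHere.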

Good. Thanks for reviewing!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27350#issuecomment-3334091938

From bulasevich at openjdk.org  Thu Sep 25 13:38:15 2025
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Thu, 25 Sep 2025 13:38:15 GMT
Subject: Integrated: 8359378: aarch64: crash when using -XX:+UseFPUForSpilling
In-Reply-To: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
References: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
Message-ID: <I6QlmTUDze0GTcN16lEllmiSQwXn8z7Di0anmf5ctgM=.36782076-eb8a-44d4-af83-4820a898ae2a@github.com>

On Wed, 17 Sep 2025 16:19:12 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:

> The AArch64 BarrierSetAssembler path assumes that only FP/vector ideal regs reach the FP spill/restore encoding. With -XX:+UseFPUForSpilling, the register allocator may allocate scalar values in FP registers. When such values (Op_RegI/Op_RegN/Op_RegL/Op_RegP) hit `BarrierSetAssembler::encode_float_vector_register_size`, we trip ShouldNotReachHere in release builds and the **"unexpected ideal register"** assertion in debug builds.
> 
> Fix: teach the encoder to handle scalar ideal regs when they physically live in FP regs:
> - treat Op_RegI / Op_RegN as 32-bit (single slot) - same class as Op_RegF
> - treat Op_RegL / Op_RegP as 64-bit (two slots) - same class as Op_RegD
> 
> Related:
> - reproduced since #19746
> - spilling logic: 
>   - #18967
>   - #17977
> 
> Testing: tier1-3 with javaoptions -Xcomp -Xbatch -XX:+UseFPUForSpilling on AArch64

This pull request has now been integrated.

Changeset: 2b451131
Author:    Boris Ulasevich <bulasevich at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/2b451131a57dc7080c4ccb77d6cb5a96ee24d891
Stats:     5 lines in 1 file changed: 4 ins; 0 del; 1 mod

8359378: aarch64: crash when using -XX:+UseFPUForSpilling

Reviewed-by: aph, rcastanedalo

-------------

PR: https://git.openjdk.org/jdk/pull/27350

From jbhateja at openjdk.org  Thu Sep 25 16:01:05 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 25 Sep 2025 16:01:05 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits
In-Reply-To: <o6g1ThW4JWbqZyNKK_r51cAcF5yaYx9bBEeST44uT8k=.e6596448-3dbb-4fc8-a061-50fb37f3d843@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <o6g1ThW4JWbqZyNKK_r51cAcF5yaYx9bBEeST44uT8k=.e6596448-3dbb-4fc8-a061-50fb37f3d843@github.com>
Message-ID: <68xIuXbjAp7T-KmruiU9wzLuRAz9L4HeDqZ6h8R9sTw=.5b7f21ec-61d8-499f-bd5b-f32ff31b9bfc@github.com>

On Thu, 4 Sep 2025 06:26:36 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> This patch optimizes PopCount value transforms using KnownBits information.
>> Following are the results of the micro-benchmark included with the patch
>> 
>> 
>> 
>> System: 13th Gen Intel(R) Core(TM) i3-1315U
>> 
>> Baseline:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s
>> 
>> Withopt:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s
>> 
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> The change looks good, but I wonder:
> 
> - if it makes sense to have some kind of IR tests (i.e., it's folded away when unneeded, when the input is a constant, ...)?
> - whether the explanation could be simplified: Assuming a correct implementation of the KnownBits canonicalization, we can argue
> 	- `_zeroes` has the bits set that are known to be always 0. So `BitsPer<Type> - popCount(x)` gives you an upper limit of how many bits *might* be 1. And `BitsPer<Type> - popCount(_zeroes)` is equivalent to `popCount(~_zeroes)`.
> 	- `_ones` has the bits set that are known to be always 1. Trivially, `popCount(_ones)` is a valid lower bound.
> 	- The rest repeats how `adjust_bits_from_unsigned_bounds` works, but that's not specific to the popcount nodes.

Hi @SirYwell , @chhagedorn , @eme64 , I have addressed your comments. Let me know if this is good to land in.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27075#issuecomment-3334870778

From liach at openjdk.org  Thu Sep 25 16:45:06 2025
From: liach at openjdk.org (Chen Liang)
Date: Thu, 25 Sep 2025 16:45:06 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits
In-Reply-To: <68xIuXbjAp7T-KmruiU9wzLuRAz9L4HeDqZ6h8R9sTw=.5b7f21ec-61d8-499f-bd5b-f32ff31b9bfc@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <o6g1ThW4JWbqZyNKK_r51cAcF5yaYx9bBEeST44uT8k=.e6596448-3dbb-4fc8-a061-50fb37f3d843@github.com>
 <68xIuXbjAp7T-KmruiU9wzLuRAz9L4HeDqZ6h8R9sTw=.5b7f21ec-61d8-499f-bd5b-f32ff31b9bfc@github.com>
Message-ID: <niTFyq3ZywC5bNQDPRGDrlaaxmQHRuUUv8NUV_F3HeA=.3fa3f127-120e-4aef-ba05-cc5b380c88e6@github.com>

On Thu, 25 Sep 2025 15:58:27 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> The change looks good, but I wonder:
>> 
>> - if it makes sense to have some kind of IR tests (i.e., it's folded away when unneeded, when the input is a constant, ...)?
>> - whether the explanation could be simplified: Assuming a correct implementation of the KnownBits canonicalization, we can argue
>> 	- `_zeroes` has the bits set that are known to be always 0. So `BitsPer<Type> - popCount(x)` gives you an upper limit of how many bits *might* be 1. And `BitsPer<Type> - popCount(_zeroes)` is equivalent to `popCount(~_zeroes)`.
>> 	- `_ones` has the bits set that are known to be always 1. Trivially, `popCount(_ones)` is a valid lower bound.
>> 	- The rest repeats how `adjust_bits_from_unsigned_bounds` works, but that's not specific to the popcount nodes.
>
> Hi @SirYwell , @chhagedorn , @eme64 , I have addressed your comments. Let me know if this is good to land in.

Hi @jatin-bhateja, sorry for the off-topic comment, but I wish to ask about the status of [lworld+vector](https://github.com/openjdk/valhalla/tree/lworld%2Bvector) in Project Valhalla. It makes use of Unsafe.makePrivateBuffer and finishPrivateBuffer, which are outdated in the current Value Objects model (the larval bit will be gone). I just wonder whether I can proceed with the removal in https://github.com/openjdk/valhalla/pull/1593, or whether I should keep these legacy APIs for further vector work. (FYI, we can probably migrate to Unsafe.allocateInstance to do the same, as is done in method handles.)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27075#issuecomment-3335032730

From hgreule at openjdk.org  Thu Sep 25 17:56:15 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Thu, 25 Sep 2025 17:56:15 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v11]
In-Reply-To: <gIml69-bcFkT1V5ug3zT4fKrJBIv1lE8HDSZdas7Qgo=.a4e1f67b-086a-4f22-96aa-d5f4bd5a8a9d@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <gIml69-bcFkT1V5ug3zT4fKrJBIv1lE8HDSZdas7Qgo=.a4e1f67b-086a-4f22-96aa-d5f4bd5a8a9d@github.com>
Message-ID: <GqbS2RA3cfVN-hd5wBJJ1cycyHXdwnojk7fpq-TZmbQ=.802ea387-ceb4-4de7-8687-232f7f5a7dea@github.com>

On Fri, 19 Sep 2025 20:44:54 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> This patch optimizes PopCount value transforms using KnownBits information.
>> Following are the results of the micro-benchmark included with the patch
>> 
>> 
>> 
>> System: 13th Gen Intel(R) Core(TM) i3-1315U
>> 
>> Baseline:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s
>> 
>> Withopt:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s
>> 
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update countbitsnode.cpp

Looks good to me now, although I'm not exactly sure about the semantics of widen and whether `Type::WidenMax` is the right choice here. Someone else has to look at that.

Thanks for the work!

-------------

Marked as reviewed by hgreule (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27075#pullrequestreview-3268869427

From sparasa at openjdk.org  Thu Sep 25 19:18:35 2025
From: sparasa at openjdk.org (Srinivas Vamsi Parasa)
Date: Thu, 25 Sep 2025 19:18:35 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v16]
In-Reply-To: <GFVrKRIgcLl23x-KBrS8RiTNuR2VC9aWqblxrhzrIbw=.a46690ad-e2d3-44db-81df-1c98f011e6a8@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <GFVrKRIgcLl23x-KBrS8RiTNuR2VC9aWqblxrhzrIbw=.a46690ad-e2d3-44db-81df-1c98f011e6a8@github.com>
Message-ID: <tWsTUwWx1In_p2tlr12rzmnh3Uz7n07EowgJnAhQBow=.92ec903d-ed02-4cdc-8266-9b5b5f89acb9@github.com>

On Sat, 20 Sep 2025 00:04:43 GMT, Mohamed Issa <missa at openjdk.org> wrote:

>> Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
>> 
>> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) instruction is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped to values (NaN -> 0, -Infinity, <= Integer.MIN_VALUE -> Integer.MIN_VALUE, +Infinity, >= Integer.MAX_VALUE -> Integer.MAX_VALUE) in the integer range with the help of a few temporary registers to store intermediate results.
>> 
>> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or  VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
>> 
>> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
>> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
>> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
>> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
>> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
>> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
>> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
>> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
>> 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java`...
>
> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Use compiler generator instead of standard Java streams

The code changes to enable AVX10.2 saturated FP conversion look good to me and I also independently ran and verified that the tests are passing. 

Thanks,
Srinivas Vamsi Parasa
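For reference, a scalar model of the result the JDK requires for a `double` to `int` conversion, matching the special-value mapping listed in the PR description (NaN -> 0, out-of-range values clamp to `Integer.MIN_VALUE`/`Integer.MAX_VALUE`, everything else truncates toward zero). On pre-AVX10.2 hardware, CVTTSD2SI returns the "integer indefinite" value 0x80000000 for NaN and out-of-range inputs, which is what the extra fix-up sequence has to correct. `java_d2i` is a hypothetical name; this is not the HotSpot code:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Java-style saturating double -> int conversion.
int32_t java_d2i(double d) {
  if (std::isnan(d)) return 0;  // NaN -> 0
  if (d >= 2147483647.0) return std::numeric_limits<int32_t>::max();   // incl. +Inf
  if (d <= -2147483648.0) return std::numeric_limits<int32_t>::min();  // incl. -Inf
  return static_cast<int32_t>(d);  // in range: C++ truncates toward zero
}
```

The range checks come before the cast because converting an out-of-range double to an integer is undefined behavior in C++.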

-------------

Marked as reviewed by sparasa (Committer).

PR Review: https://git.openjdk.org/jdk/pull/26919#pullrequestreview-3269139531

From missa at openjdk.org  Thu Sep 25 19:34:36 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Thu, 25 Sep 2025 19:34:36 GMT
Subject: RFR: 8360558: Use hex literals instead of decimal literals in math
 intrinsic constants
Message-ID: <JqJDah4V1rq9XBUq5WxLd_fnn1CjYCFpBlh2NadSFgI=.969a9af0-0d3f-414b-a3db-8f17e5e6164d@github.com>

A simple change to use hex literals instead of decimal literals in the constant arrays of the x86 cbrt and tanh stubs. The JTREG tests listed below were used to verify correctness. The baseline build used is [OpenJDK v26-b17](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B17).

1. `jtreg:test/jdk/java/lang/Math/CubeRootTests.java`
2. `jtreg:test/jdk/java/lang/Math/HyperbolicTests.java`
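Why hex helps: each constant is a raw IEEE-754 bit pattern, and a hex spelling exposes the sign, exponent, and mantissa fields directly while denoting exactly the same value. A minimal sketch, assuming the tables store raw 32- or 64-bit words; `split_double_bits` and `bits_to_double` are illustrative helpers, not code from the stubs:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// The decimal and hex spellings denote the same 32-bit word.
static_assert(1431655765u == 0x55555555u, "same value, different spelling");

struct DoubleFields {
  unsigned sign;      // bit 63
  unsigned exponent;  // bits 62..52, biased by 1023
  uint64_t mantissa;  // bits 51..0
};

DoubleFields split_double_bits(uint64_t bits) {
  return {static_cast<unsigned>(bits >> 63),
          static_cast<unsigned>((bits >> 52) & 0x7FF),
          bits & ((uint64_t{1} << 52) - 1)};
}

// Reinterpret a 64-bit pattern as a double without violating aliasing rules.
double bits_to_double(uint64_t bits) {
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}
```

For example, `0x3FF0000000000000` is visibly "sign 0, exponent 0x3FF, mantissa 0", that is 1.0, which is much harder to see in its decimal form.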

-------------

Commit messages:
 - Convert x86 cbrt and tanh static constants from decimal literals to hex literals

Changes: https://git.openjdk.org/jdk/pull/27497/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27497&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8360558
  Stats: 272 lines in 2 files changed: 0 ins; 0 del; 272 mod
  Patch: https://git.openjdk.org/jdk/pull/27497.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27497/head:pull/27497

PR: https://git.openjdk.org/jdk/pull/27497

From dlong at openjdk.org  Thu Sep 25 21:15:31 2025
From: dlong at openjdk.org (Dean Long)
Date: Thu, 25 Sep 2025 21:15:31 GMT
Subject: RFR: 8338197: [ubsan] ad_x86.hpp:6417:11: runtime error: shift
 exponent 100 is too large for 32-bit type 'unsigned int' [v3]
In-Reply-To: <NZwq8Kkwxohsja0ieCwyDN7OxhyqajxijdrgPcNAg8g=.38231d0f-9125-45b7-be7b-066de2b82f41@github.com>
References: <uKUByg7RkOyLsGYoajinrOf76Uu00PIJ-fBeWOKVNcI=.1d4fc3ed-2fd3-454c-9fa4-af97fc676b48@github.com>
 <-m2kLcudWsrunonBZQcUx_JfBOKYe3gsnAbwC4eHGGI=.9da84c19-c8fb-428a-979a-d18c8769ea6c@github.com>
 <cG2zLnsmi15Rz63eNkvKyfUKn2MRXYXDkv2XoicRdgU=.58e81cbd-10d8-4c72-8e13-4ad4d445d7f4@github.com>
 <NZwq8Kkwxohsja0ieCwyDN7OxhyqajxijdrgPcNAg8g=.38231d0f-9125-45b7-be7b-066de2b82f41@github.com>
Message-ID: <xqFLQvKhgyZ3gwPedkj3xqAzjK2rfmr_RMaBrVqdpek=.4e8c251d-b36c-42bc-ba79-449e2504a6a0@github.com>

On Tue, 26 Aug 2025 10:51:43 GMT, Andrew Dinn <adinn at openjdk.org> wrote:

>> Yes, I think we should fix both, output_h.cpp and fixed_latency(100) on all platforms, then we can get rid of the workarounds and arm32-specific logic.
>
>> Yes, I think we should fix both, output_h.cpp and fixed_latency(100) on all platforms, then we can get rid of the workarounds and arm32-specific logic.
> 
> When I looked into this earlier I thought the obvious thing needed to fix this was to reassign all the latencies so they represented a realizable pipeline delay. A proper fix would sensibly require each latency to be less than the pipeline length declared in the CPU model -- which for most arches is much less than 32. However, I didn't suggest such a rationalization because I believed (perhaps wrongly) that the latencies were also used to pick a preferred choice when we have alternative instruction/operand rule matches. The selection process involves comparing the cumulative latencies for subgraph nodes against the latency of each node defined by a match rule for the subgraph and picking the lowest latency result. After looking at some of the rules I was not sure that it would be easy to reduce all current latencies so they lie in the range 0-31 and still guarantee the current selection order. It would be even harder when the range was correctly reduced to 0 - lengthof(pipeline).
> 
> I don't even think most rule authors understand that the latencies are used by the pipeline model and instead they simply use latency as a weight to enforce orderings. That's certainly how I understood it until I ran into this issue. If so then perhaps we would be better sticking with the de facto use and fixing the shift issue with a maximum shift bound. The mask tests which rely on this shift count may help with deriving scheduling delays for some instructions with small latencies but I don't believe it is very reliable even in cases where the accumulated shifts lie within the 32 bit range. If we are to change anything here then I think we need a review of the accuracy of pipeline models and their current or potential value before doing so.

Capping the max value was initially my preferred solution, but that was before @adinn, who has looked into this more than me, explained his reservations with that approach.  I don't think we know for sure that capping the value won't cause a change in behavior.  I think the safest solution is this:

> We could preserve the large latencies for now, and let them trigger the _maxcycleused > 32 code for more platforms.

In other words, we already have code that can handle larger bit masks and shifts, but it is not being enabled when needed because we aren't computing _maxcycleused correctly (it doesn't take fixed_latency into account).
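A hypothetical model of this point: if the maximum cycle index is computed from the pipeline stages only, a large `fixed_latency` is never considered, and the narrow single-word mask path is chosen even though a shift by that latency needs more bits. The names below are invented for illustration and do not correspond to ADLC's actual data structures:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Highest cycle index any mask or shift can reach. Including the fixed
// latencies is the input the thread says is currently missing.
unsigned max_cycle_used(const std::vector<unsigned>& stage_cycles,
                        const std::vector<unsigned>& fixed_latencies) {
  unsigned m = 0;
  for (unsigned c : stage_cycles) m = std::max(m, c);
  for (unsigned l : fixed_latencies) m = std::max(m, l);
  return m;
}

// Number of 32-bit words needed so that bit `max_cycle` is addressable
// without a shift count of 32 or more on any single word.
unsigned words_needed(unsigned max_cycle) {
  return max_cycle / 32 + 1;
}
```

With stages alone the result would be small (one word), but once a fixed latency of 100 is taken into account the multi-word path is selected.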

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26890#issuecomment-3335962602

From liach at openjdk.org  Thu Sep 25 21:27:45 2025
From: liach at openjdk.org (Chen Liang)
Date: Thu, 25 Sep 2025 21:27:45 GMT
Subject: RFR: 8365205: C2: Optimize popcount value computation using
 knownbits [v11]
In-Reply-To: <gIml69-bcFkT1V5ug3zT4fKrJBIv1lE8HDSZdas7Qgo=.a4e1f67b-086a-4f22-96aa-d5f4bd5a8a9d@github.com>
References: <TRqk-W4Y6V__UK59PvyxxHIVBKHmXlbTYhbnJxGC7PY=.8b79d76c-6346-4567-8d59-6d8f3e0fb1ac@github.com>
 <gIml69-bcFkT1V5ug3zT4fKrJBIv1lE8HDSZdas7Qgo=.a4e1f67b-086a-4f22-96aa-d5f4bd5a8a9d@github.com>
Message-ID: <_dGJWd6T9XyISpZ2F23n8SvsCBt-29zaEQ-VXOE5w2I=.6bec03cb-a042-46fc-a26b-2e54ca461881@github.com>

On Fri, 19 Sep 2025 20:44:54 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> This patch optimizes PopCount value transforms using KnownBits information.
>> Following are the results of the micro-benchmark included with the patch
>> 
>> 
>> 
>> System: 13th Gen Intel(R) Core(TM) i3-1315U
>> 
>> Baseline:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s
>> 
>> Withopt:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s
>> 
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update countbitsnode.cpp

A suggestion, we might do this as a later RFE.

src/hotspot/share/opto/countbitsnode.cpp line 124:

> 122: // From the definition of KnownBits, we know:
> 123: //   zeros: Indicates which bits must be 0: zeros[i]=1 -> t[i]=0
> 124: //   ones:  Indicates which bits must be 1: ones[i]=1 -> t[i]=1

I don't think we should duplicate the information available in rangeinference.hpp. Instead, we should enhance the documentation there to help comprehension, e.g. by noting that `~zeros` is the value with the maximum possible number of bits set to 1.

If we know `ones` is `min_ones` and `~zeros` is `max_ones`, we can easily derive that `pop_count(ones) <= pop_count(t) <= pop_count(~zeros)`
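The bound `pop_count(ones) <= pop_count(t) <= pop_count(~zeros)` can be checked with a small scalar model. `KnownBits32` and `popcount_bounds` are hypothetical names for illustration, not the types in rangeinference.hpp, and the sketch assumes a consistent KnownBits value (`zeros & ones == 0`):

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Portable 32-bit population count.
int popcount32(uint32_t v) {
  int n = 0;
  for (; v != 0; v &= v - 1) ++n;  // clear the lowest set bit each step
  return n;
}

// zeros[i] = 1 means bit i is known to be 0; ones[i] = 1 means bit i is known to be 1.
struct KnownBits32 {
  uint32_t zeros;
  uint32_t ones;
};

// [lo, hi] bounds on popcount(t) for every t consistent with kb.
std::pair<int, int> popcount_bounds(KnownBits32 kb) {
  int lo = popcount32(kb.ones);    // known-1 bits are always counted
  int hi = popcount32(~kb.zeros);  // every bit not known to be 0 might be 1
  return {lo, hi};
}
```

The identity `32 - popcount(zeros) == popcount(~zeros)` from the earlier review comment holds because `zeros` and `~zeros` partition the 32 bit positions.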

-------------

PR Review: https://git.openjdk.org/jdk/pull/27075#pullrequestreview-3269507322
PR Review Comment: https://git.openjdk.org/jdk/pull/27075#discussion_r2380355747

From vlivanov at openjdk.org  Thu Sep 25 21:28:43 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 25 Sep 2025 21:28:43 GMT
Subject: RFR: 8350468: x86: Improve implementation of vectorized
 numberOfLeadingZeros for int and long
In-Reply-To: <_YfxynUy7BxtFo15BZG2bdhrUaCkIPSC6l8fTAVyJE8=.ffc4ce38-d9bd-4ab7-b5dd-0ffd847d5c2d@github.com>
References: <_YfxynUy7BxtFo15BZG2bdhrUaCkIPSC6l8fTAVyJE8=.ffc4ce38-d9bd-4ab7-b5dd-0ffd847d5c2d@github.com>
Message-ID: <ShBLBYqM4Gr-VbgPX2kuBwRTjnZ2UvunkZSKdV1IBlY=.6b593002-aaa0-4c19-915b-88fd7c94bf89@github.com>

On Mon, 4 Aug 2025 02:20:31 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

> Hi all,
> This is a patch that optimizes the x86 backend implementation of `CountLeadingZerosV` for int and long. In the review of [JDK-8349637](https://bugs.openjdk.org/browse/JDK-8349637), an [optimized algorithm](https://github.com/openjdk/jdk/pull/23579#issuecomment-2661332497) was proposed by @rgiulietti, which this PR implements. For integer operands, the optimized algorithm reduces the number of vector instructions from 19 to 13. The same algorithm does not work for long operands, however, since AVX2 lacks a vectorized long->double conversion instruction. Instead, I found an optimized algorithm that reuses the code for int and computes the leading zeros for long with only 4 additional instructions. I added a benchmark, and on my Zen 3 machine I get these results:
> 
>                                  Baseline                        Patch        
> Benchmark              Mode  Cnt    Score   Error  Units    Score   Error  Units  Improvement
> LeadingZeros.testInt   avgt   15   91.097 ± 3.276  ns/op   68.665 ± 1.740  ns/op  (+ 28.1%)
> LeadingZeros.testLong  avgt   15  342.545 ± 4.470  ns/op  228.668 ± 5.994  ns/op  (+ 39.9%)
> 
> I've updated the unit tests to more thoroughly test longs and they pass on my machine. Thoughts and reviews would be appreciated!

Test results are clean.
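The exponent-based idea can be illustrated in scalar form: for `x != 0`, the unbiased exponent of `(double) x` is `floor(log2(x))`, so the leading-zero count is `31 - exponent`. This is a sketch of the general technique and of combining two 32-bit results for a 64-bit lane, not the PR's AVX2 sequence (which may use single-precision conversions and extra steps to avoid rounding); the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Leading zeros of a 32-bit value via the exponent of its double conversion.
// Every uint32_t converts to double exactly, so rounding cannot bump the
// exponent; x == 0 has no set bit and is handled separately.
int clz32_via_double(uint32_t x) {
  if (x == 0) return 32;
  double d = static_cast<double>(x);
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  int exponent = static_cast<int>((bits >> 52) & 0x7FF) - 1023;  // floor(log2(x))
  return 31 - exponent;
}

// 64-bit leading zeros built from the 32-bit routine applied to the two halves.
int clz64_from_clz32(uint64_t x) {
  uint32_t hi = static_cast<uint32_t>(x >> 32);
  uint32_t lo = static_cast<uint32_t>(x);
  return hi != 0 ? clz32_via_double(hi) : 32 + clz32_via_double(lo);
}
```

Java's `int` inputs are treated here as raw 32-bit patterns, so negative values (top bit set) correctly yield 0.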

-------------

Marked as reviewed by vlivanov (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26610#pullrequestreview-3269516902

From vlivanov at openjdk.org  Thu Sep 25 21:42:32 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 25 Sep 2025 21:42:32 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v5]
In-Reply-To: <GNqs63R5tR5m27cXQhQxgqyrGmjDbiuwXNJi_97Hrf0=.54411d88-2f83-44b2-ba54-e43f3f07cd30@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <sikeN2K9EHmLEEnGsNttJSsPefvEtrKk9q7OOIgqQG0=.238662f7-394c-475b-bd07-effb28302b2c@github.com>
 <LeM6JS1sHVJMB2Ziw2pYgcTQr7hHcPUSK6guAV5DCC0=.f3de7008-b13c-4a96-bc85-b50784bb21ce@github.com>
 <wBLVOTyKQkrpbBB04sxxxgZ0wl7A8_HuC2K-3s8j4wQ=.22931e86-0381-46ea-8d34-5020542e063c@github.com>
 <0lnaxN7YsQEddGZfWLgFi2YOl_XtXntDoHRr57Bjp7k=.946b3e40-04c1-4eb5-a205-53347cdc91eb@github.com>
 <P1FNs23o3qks_15w5YJCBfiwLMs1QW_aBI2KSkptKZ4=.83c7e3b3-15fe-47a0-86dc-e1549af59e20@github.com>
 <mJ83rN71NdOfHjFP9bpFosqBeBf220ODeE36Bt6wBmA=.74e33425-bda6-41ff-81ac-880210e98c3b@github.com>
 <l_hsWVcB5TbA42zDJMgwquys1I9dGw-BxvjdB4i3ys0=.43acfff8-07c6-4982-ac33-e24a6fc4cbe1@github.com>
 <ek0-OWP6uD4tPufaQ7o6QgOsuP4jd_RmoC19-qnYsNI=.966ffaa2-4d35-4815-aadb-3b5ee78484a5@github.com>
 <GNqs63R5tR5m27cXQhQxgqyrGmjDbiuwXNJi_97Hrf0=.54411d88-2f83-44b2-ba54-e43f3f07cd30@github.com>
Message-ID: <2P4CwjNjmwwWUKtNkWq9jk8fONLyr6Xg9i-g6gd6PSg=.85313b30-d390-4477-92fe-2e5701253107@github.com>

On Thu, 25 Sep 2025 05:44:49 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

> if we decide to move the complex implementation to Java-level, we'd better also add such an API for vector concatenate, right?

There's already a generic shuffle operation (rearrange). But there are precedents where more specific operations became part of the API for convenience (e.g., slice/unslice). So a dedicated operation for vector concatenation may be well justified.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2380386172

From vlivanov at openjdk.org  Thu Sep 25 22:04:08 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 25 Sep 2025 22:04:08 GMT
Subject: RFR: 8351623: VectorAPI: Add SVE implementation of subword gather
 load operation [v5]
In-Reply-To: <2P4CwjNjmwwWUKtNkWq9jk8fONLyr6Xg9i-g6gd6PSg=.85313b30-d390-4477-92fe-2e5701253107@github.com>
References: <rUEz8UheUmVNYOSMK23ihVdoKuE6QFT3VDZq10zXTYY=.d0073db7-ca68-477b-9aa6-32304ab93f53@github.com>
 <sikeN2K9EHmLEEnGsNttJSsPefvEtrKk9q7OOIgqQG0=.238662f7-394c-475b-bd07-effb28302b2c@github.com>
 <LeM6JS1sHVJMB2Ziw2pYgcTQr7hHcPUSK6guAV5DCC0=.f3de7008-b13c-4a96-bc85-b50784bb21ce@github.com>
 <wBLVOTyKQkrpbBB04sxxxgZ0wl7A8_HuC2K-3s8j4wQ=.22931e86-0381-46ea-8d34-5020542e063c@github.com>
 <0lnaxN7YsQEddGZfWLgFi2YOl_XtXntDoHRr57Bjp7k=.946b3e40-04c1-4eb5-a205-53347cdc91eb@github.com>
 <P1FNs23o3qks_15w5YJCBfiwLMs1QW_aBI2KSkptKZ4=.83c7e3b3-15fe-47a0-86dc-e1549af59e20@github.com>
 <mJ83rN71NdOfHjFP9bpFosqBeBf220ODeE36Bt6wBmA=.74e33425-bda6-41ff-81ac-880210e98c3b@github.com>
 <l_hsWVcB5TbA42zDJMgwquys1I9dGw-BxvjdB4i3ys0=.43acfff8-07c6-4982-ac33-e24a6fc4cbe1@github.com>
 <ek0-OWP6uD4tPufaQ7o6QgOsuP4jd_RmoC19-qnYsNI=.966ffaa2-4d35-4815-aadb-3b5ee78484a5@github.com>
 <GNqs63R5tR5m27cXQhQxgqyrGmjDbiuwXNJi_97Hrf0=.54411d88-2f83-44b2-ba54-e43f3f07cd30@github.com>
 <2P4CwjNjmwwWUKtNkWq9jk8fONLyr6Xg9i-g6gd6PSg=.85313b30-d390-4477-92fe-2e5701253107@github.com>
Message-ID: <yWrvIkVouZrpcG39VxTSxMKJ__v155E0Ut07viMxF1M=.564a01d7-fab5-4f5a-8d99-e07ef694c4d0@github.com>

On Thu, 25 Sep 2025 21:39:56 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>>> I started looking at the PR and it looks appealing to simplify VM intrinsics and lift more code into Java. In other words, subword gather operation can be coded as a composition of operations on int vectors. Have you considered that?
>> 
>> Thanks so much for looking at this PR! Yes, personally I think we can move the generation of these ops to the Java level for the subword gather operation. I also considered this when I started working on this task. However, this may break the current backend implementation for other architectures like x86. I'm not sure whether moving to Java will also be friendly to non-SVE arches. As I understand it, subword gather depends much more on the backend solution.
>> 
>>> For now, a dedicated node to concatenate vectors looks appropriate (please note there's an existing PackNode et al.).
>>> It can be either exposed through a VM intrinsic or substituted for a well-known complex IR shape during IGVN (like the one you depicted). The nice thing is it'll uniformly cover all usages irrespective of whether they come from the Vector API implementation itself or from user code.
>>>
>>>In the context of Vector API, the plan was to expose generic element rearranges/shuffles through API, but then enable various strength-reductions to optimize well-known/popular shapes. Packing multiple vectors perfectly fits that effort.
>> 
>> Thanks for your inputs on the IR choice. I agree with you about adding such a vector concatenate node in C2. And if we decide to move the complex implementation to Java-level, we'd better also add such an API for vector concatenate, right?
>
>> if we decide to move the complex implementation to Java-level, we'd better also add such an API for vector concatenate, right?
> 
> There's already a generic shuffle operation (rearrange). But there are precedents where more specific operations became part of the API for convenience (e.g., slice/unslice). So a dedicated operation for vector concatenation may be well justified.

> However, this may break current backend implementation for other architectures like X86. I'm not sure whether moving to Java will be also friendly for non-SVE arches. Per my understanding, subword gather depends much more on the backend solution.

IMO that's a clear sign that the current abstraction is way too ad hoc and platform-specific. The x86 ISA lacks native support, so the operation is emulated with hand-written assembly. Even a less performant implementation, as long as it relies on a uniform cross-platform VM interface, would be a clear winner.

The PR, as it is now, introduces a new IR representation which complicates things even more. Instead, I'd encourage you to work on a uniform API even if x86 won't be immediately migrated.
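To make the "composition of operations on int vectors" idea concrete, here is a scalar Java model (not Vector API code) of a byte gather built from three generic steps: int-lane gathers, a narrowing cast, and a concatenation of the partial results. All class and method names are hypothetical, the int-lane gather is only simulated, and the sketch assumes the number of indexes is a multiple of the int lane count.

```java
final class SubwordGatherModel {
    // Stand-in for an int-vector gather: each addressed byte lands, sign-extended,
    // in its own 32-bit lane. (Hardware would load a wider element and mask it.)
    static int[] gatherIntoInts(byte[] base, int[] indexes, int from, int lanes) {
        int[] r = new int[lanes];
        for (int i = 0; i < lanes; i++) {
            r[i] = base[indexes[from + i]];
        }
        return r;
    }

    // Narrow each int lane back to a byte (truncating cast).
    static byte[] narrow(int[] v) {
        byte[] r = new byte[v.length];
        for (int i = 0; i < v.length; i++) {
            r[i] = (byte) v[i];
        }
        return r;
    }

    // Concatenate partial vectors in order: the role a dedicated
    // "vector concatenate" operation would play.
    static byte[] concat(byte[][] parts) {
        int total = 0;
        for (byte[] p : parts) {
            total += p.length;
        }
        byte[] r = new byte[total];
        int pos = 0;
        for (byte[] p : parts) {
            System.arraycopy(p, 0, r, pos, p.length);
            pos += p.length;
        }
        return r;
    }

    // Byte gather composed from the three steps above.
    // Assumes indexes.length is a multiple of intLanes.
    static byte[] gatherBytes(byte[] base, int[] indexes, int intLanes) {
        byte[][] parts = new byte[indexes.length / intLanes][];
        for (int p = 0; p < parts.length; p++) {
            parts[p] = narrow(gatherIntoInts(base, indexes, p * intLanes, intLanes));
        }
        return concat(parts);
    }
}
```

With 16 byte indexes and 4 int lanes, the model performs four partial gathers and one four-way concatenation. The point of the sketch is that nothing in it is specific to one ISA: only int gathers, casts, and a concatenation are required.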

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26236#discussion_r2380418706

From missa at openjdk.org  Thu Sep 25 22:10:01 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Thu, 25 Sep 2025 22:10:01 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v17]
In-Reply-To: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
Message-ID: <pyUhO3pJd4eUVlaJ4dSVjXaD2adG9iwxtuMcFxV4NXk=.1f5f5133-648f-460d-90e0-528b650d24e7@github.com>

> Intel&reg; AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
> 
> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped to values (NaN -> 0, -Infinity, <= Integer.MIN_VALUE -> Integer.MIN_VALUE, +Infinity, >= Integer.MAX_VALUE -> Integer.MAX_VALUE) in the integer range with the help of a few temporary registers to store intermediate results.
> 
> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or  VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
> 
> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
> 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java`
> 10. `jtreg:test/hotspot/jtreg...

Mohamed Issa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 21 additional commits since the last revision:

 - Merge branch 'master' of https://github.com/missa-prime/jdk into user/missa-prime/avx10_2
 - Use compiler generator instead of standard Java streams
 - Clean up scalar floating point conversion tests
 - Introduce scalar floating point conversion tests with IR rules
 - Add extra constraints to vector floating point conversion instruction predicates and tests
 - Change the floating point conversion instruction, IR nodes, and test rules to make them clearer
 - Change debug text format of AVX 10.2 vector conversion instructions
 - Check for instructions that shouldn't appear in vector floating point conversion tests
 - Correctly calculate vector lengths and don't rely on VectorReinterpret in cast2F2X and cast2D2X memory instructions
 - Add new IR nodes covering x86 floating point conversion instructions
 - ... and 11 more: https://git.openjdk.org/jdk/compare/2c62849a...0415ddf2
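The special-value handling described in the PR text can be modeled in plain Java for the scalar float-to-int case. This is a sketch, not HotSpot code: `cvttss2siModel` is a hypothetical stand-in for the legacy CVTTSS2SI instruction, assuming its documented behavior of returning the "integer indefinite" value 0x80000000 for NaN and out-of-range inputs. Java's own `(int)` cast already has the target semantics, which, per the PR description, the AVX10.2 saturating instructions provide directly.

```java
final class SaturatingCastModel {
    // Hypothetical stand-in for legacy CVTTSS2SI: truncates toward zero, but yields the
    // "integer indefinite" value 0x80000000 for NaN or values outside [-2^31, 2^31).
    static int cvttss2siModel(float f) {
        if (Float.isNaN(f) || f >= 0x1p31f || f < -0x1p31f) {
            return 0x80000000;
        }
        return (int) f; // in range: plain truncation toward zero
    }

    // Fix-up needed on top of the legacy instruction to obtain Java semantics.
    static int legacyWithFixup(float f) {
        int r = cvttss2siModel(f);
        if (r != Integer.MIN_VALUE) {
            return r;                 // fast path: the result cannot be a special case
        }
        if (Float.isNaN(f)) {
            return 0;                 // NaN -> 0
        }
        // +Infinity or >= 2^31 -> MAX_VALUE; -Infinity or <= -2^31 -> MIN_VALUE
        return f > 0 ? Integer.MAX_VALUE : Integer.MIN_VALUE;
    }
}
```

The fast path costs a single compare; the special-case branch is what the extra instructions and temporary registers implement in the vector case, and it is the part the saturating instructions are meant to remove.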

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26919/files
  - new: https://git.openjdk.org/jdk/pull/26919/files/0acc719c..0415ddf2

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=16
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=15-16

  Stats: 220481 lines in 3346 files changed: 167519 ins; 33959 del; 19003 mod
  Patch: https://git.openjdk.org/jdk/pull/26919.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26919/head:pull/26919

PR: https://git.openjdk.org/jdk/pull/26919

From vlivanov at openjdk.org  Thu Sep 25 22:18:20 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 25 Sep 2025 22:18:20 GMT
Subject: RFR: 8367967: C2: "fatal error: Not monotonic" with Mod nodes [v3]
In-Reply-To: <EDPz9FTr56gUdjcVeokF55vuUSb71g-806n8urrSrVM=.0ea0eebc-9545-48b8-9acf-35f592c14c71@github.com>
References: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
 <EDPz9FTr56gUdjcVeokF55vuUSb71g-806n8urrSrVM=.0ea0eebc-9545-48b8-9acf-35f592c14c71@github.com>
Message-ID: <Vea24HNry6zqTpnEG3Qhp0GajE9VaAZ6_KIfafMMxYQ=.5be212ac-809b-4174-9391-0ea145a4f937@github.com>

On Tue, 23 Sep 2025 08:28:00 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> Generally, we shouldn't return a wider type (ZERO) if there is a later case that would return a more narrow type (TOP) for the same input types. If the inputs are widened and the first case doesn't match anymore but the later one still does, the result is not monotonic with the previous result.
>> 
>> Please review :)
>
> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
> 
>   move test

Marked as reviewed by vlivanov (Reviewer).
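The ordering problem described in the PR can be illustrated with a toy lattice in plain Java. This is not C2's `Type` hierarchy: `Range`, `ModValueModel`, and the catch-all result are hypothetical simplifications, and the rule order in `valueBuggy` (ZERO for a zero dividend checked before TOP for a zero divisor) is our reading of the description. TOP is treated as the narrowest element.

```java
// Toy lattice element: TOP (no value, i.e. dead) or a closed range [lo, hi].
record Range(boolean top, long lo, long hi) {
    static final Range TOP = new Range(true, 0, 0);

    static Range of(long lo, long hi) {
        return new Range(false, lo, hi);
    }

    boolean isCon(long v) {
        return !top && lo == v && hi == v;
    }

    // True if this type is at most as wide as `other`; TOP is the narrowest element.
    boolean narrowerOrEqual(Range other) {
        if (top) return true;
        if (other.top) return false;
        return other.lo <= lo && hi <= other.hi;
    }
}

final class ModValueModel {
    static final Range ANY = Range.of(Long.MIN_VALUE, Long.MAX_VALUE);

    // Problematic order: "0 % x" returns ZERO before the zero-divisor rule can return TOP.
    static Range valueBuggy(Range dividend, Range divisor) {
        if (dividend.top() || divisor.top()) return Range.TOP;
        if (dividend.isCon(0)) return Range.of(0, 0);
        if (divisor.isCon(0)) return Range.TOP;
        return ANY; // imprecise but safe fallback
    }

    // Fixed order: the narrower result (TOP) for a zero divisor is checked first.
    static Range valueFixed(Range dividend, Range divisor) {
        if (dividend.top() || divisor.top()) return Range.TOP;
        if (divisor.isCon(0)) return Range.TOP;
        if (dividend.isCon(0)) return Range.of(0, 0);
        return ANY;
    }
}
```

With a zero divisor, widening the dividend from `[0, 0]` to `[0, 5]` moves `valueBuggy` from ZERO to TOP, so the result narrows while the inputs widen; `valueFixed` returns TOP in both cases.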

-------------

PR Review: https://git.openjdk.org/jdk/pull/27408#pullrequestreview-3269606906

From vlivanov at openjdk.org  Thu Sep 25 22:18:24 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 25 Sep 2025 22:18:24 GMT
Subject: RFR: 8367967: C2: "fatal error: Not monotonic" with Mod nodes [v3]
In-Reply-To: <sHPYPAQwLUPuullEK4PKAe_Ph7z0oOORzlz_wF04eZA=.4e93153e-63f1-4b42-b955-15a1d60ab21f@github.com>
References: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
 <EDPz9FTr56gUdjcVeokF55vuUSb71g-806n8urrSrVM=.0ea0eebc-9545-48b8-9acf-35f592c14c71@github.com>
 <TjIJ8kANFmXd5c4CRKd2yndTcblECPfgiuXM3lT4pS0=.cf5fda40-7cef-4f25-a74c-4f7a9899dfa2@github.com>
 <sHPYPAQwLUPuullEK4PKAe_Ph7z0oOORzlz_wF04eZA=.4e93153e-63f1-4b42-b955-15a1d60ab21f@github.com>
Message-ID: <jrX_Ohi3k6KtgveHMKnva3wwK8WBD8NoXLvy2A0_Pjk=.00397364-2eb0-44aa-837e-eeb1de1a98ed@github.com>

On Tue, 23 Sep 2025 21:03:41 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> src/hotspot/share/opto/divnode.cpp line 1209:
>> 
>>> 1207:   if (t2 == Type::TOP) { return Type::TOP; }
>>> 1208: 
>>> 1209:   // Mod by zero?  Throw exception at runtime!
>> 
>> The comment is a bit confusing. It's not the node itself which produces the exception, but a dominating zero check (inserted during parsing). So, if a divisor becomes 0, it means the node is effectively dead and can go away.  
>> 
>> Also, the node should go away anyway as part of CFG pruning of dead branches when corresponding guard goes away. 
>> 
>> BTW if there are cases when control is not eliminated, it may irrevocably break the IR causing crashes down the road (take a look at JDK-8154831 as an example). So, maybe it's safer to just rely on dead control pruning to eliminate effectively dead ModI/ModL nodes and assert that there are no effectively dead ModI/ModL nodes present after GVN pass is over.
>
> The comment comes from the original code before my change in #25254, where that path also returned `POS` but that wasn't monotonic with my changes anymore.
> 
>> So, if a divisor becomes 0, it means the node is effectively dead and can go away.
> 
> I think this check mostly comes down to CCP. We need to return *something* for a zero divisor, and that something has to be monotonic with subsequent wider inputs.
> 
> If you agree with that observation, I can change the comment to better reflect what's going on, e.g., `Mod by zero can be observed in PhaseCCP, return TOP to ensure monotonic results` (I'm open for other suggestions).

Thanks for the clarifications. I thought about it for some time, but as things work now, I don't see a better alternative except just ignoring 0 divisor case. So, please, proceed with the fix as it is now.

Alternatively, to improve robustness, a dead ModI/ModL can kill dependent control akin to what Roland did for Type nodes with JDK-8349479.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27408#discussion_r2380437024

From missa at openjdk.org  Thu Sep 25 22:18:35 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Thu, 25 Sep 2025 22:18:35 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v17]
In-Reply-To: <pyUhO3pJd4eUVlaJ4dSVjXaD2adG9iwxtuMcFxV4NXk=.1f5f5133-648f-460d-90e0-528b650d24e7@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <pyUhO3pJd4eUVlaJ4dSVjXaD2adG9iwxtuMcFxV4NXk=.1f5f5133-648f-460d-90e0-528b650d24e7@github.com>
Message-ID: <dqpGGD68eL4KKqWncYxjkYl6GdGktS_r8gRNV9nDpIc=.3339a931-6699-4988-bfb7-49b22f278b4b@github.com>

On Thu, 25 Sep 2025 22:10:01 GMT, Mohamed Issa <missa at openjdk.org> wrote:

>> Intel&reg; AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
>> 
>> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped to values (NaN -> 0, -Infinity, <= Integer.MIN_VALUE -> Integer.MIN_VALUE, +Infinity, >= Integer.MAX_VALUE -> Integer.MAX_VALUE) in the integer range with the help of a few temporary registers to store intermediate results.
>> 
>> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or  VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
>> 
>> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
>> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
>> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
>> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
>> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
>> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
>> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
>> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
>> 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java`...
>
> Mohamed Issa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 21 additional commits since the last revision:
> 
>  - Merge branch 'master' of https://github.com/missa-prime/jdk into user/missa-prime/avx10_2
>  - Use compiler generator instead of standard Java streams
>  - Clean up scalar floating point conversion tests
>  - Introduce scalar floating point conversion tests with IR rules
>  - Add extra constraints to vector floating point conversion instruction predicates and tests
>  - Change the floating point conversion instruction, IR nodes, and test rules to make them clearer
>  - Change debug text format of AVX 10.2 vector conversion instructions
>  - Check for instructions that shouldn't appear in vector floating point conversion tests
>  - Correctly calculate vector lengths and don't rely on VectorReinterpret in cast2F2X and cast2D2X memory instructions
>  - Add new IR nodes covering x86 floating point conversion instructions
>  - ... and 11 more: https://git.openjdk.org/jdk/compare/886bd319...0415ddf2

@mhaessig Since @eme64 is on vacation, could you test this one as well?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26919#issuecomment-3336103359

From duke at openjdk.org  Thu Sep 25 22:44:11 2025
From: duke at openjdk.org (duke)
Date: Thu, 25 Sep 2025 22:44:11 GMT
Subject: Withdrawn: 8360557: CTW: Inline cold methods to reach more code
In-Reply-To: <Z6Aa1UbcJyKY9FTZfIi0NmAJq31jR8efqUHwcwCrEQk=.7294b0f3-00a1-46a2-856f-0a52173b3f59@github.com>
References: <Z6Aa1UbcJyKY9FTZfIi0NmAJq31jR8efqUHwcwCrEQk=.7294b0f3-00a1-46a2-856f-0a52173b3f59@github.com>
Message-ID: <niHvjOrF2xdN56aCArqKFQd5ddFQoyE3XF0Sm4vhO0I=.04db6e57-1f93-41c1-bcd1-2422292b0b2c@github.com>

On Tue, 1 Jul 2025 12:26:44 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> We use CTW testing to make sure compilers behave well. But we compile code that is not executed at all, and since our inlining heuristics often look at profiles, we end up not actually inlining all that much! This means CTW testing likely misses lots of bugs that normal code is exposed to, especially e.g. in loop optimizations.
> 
> There is an intrinsic tradeoff with accepting more inlined methods in CTW: the compilation time gets significantly worse. With just accepting the cold methods we have reasonable CTW times, eating the improvements we have committed in mainline recently. And it still finds bugs. See the RFE for sample data.
> 
> After this lands and CTW starts to compile cold methods, one can greatly expand the scope of the CTW testing by overriding the static inlining limits. Doing e.g. `TEST_VM_OPTS="-XX:MaxInlineSize=70 -XX:C1MaxInlineSize=70"` finds even more bugs. Unfortunately, the compilation times suffer so much that they are impractical to run in standard configurations; see data in the RFE. We will enable some of that testing in special testing pipelines.
> 
> Pre-empting the question: "Well, why not use -Xcomp then, and make sure it inlines well?" The answer is in RFE as well: Xcomp causes _a lot_ of stray compilations for JDK and CTW infra itself. For small JARs in large corpus this eats precious testing time that we would instead like to spend on deeper inlining in the actual JAR code. This also does not force us to look into how CTW works in Xcomp at all; I expect some surprises there. Feather-touching the inlining heuristic paths to just accept methods without looking at profiles looks better.
> 
> Tobias had an idea to implement the stress randomized inlining that would expand the scope of inlining. This improvement stacks well with it. This improvement provides the base case of inlining most reasonable methods, and then allow stress infra to inline some more on top of that.
> 
> Additional testing:
>  - [x] GHA
>  - [x] Linux x86_64 server fastdebug, `applications/ctw/modules`
>  - [x] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways)

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/26068

From duke at openjdk.org  Fri Sep 26 03:13:52 2025
From: duke at openjdk.org (erifan)
Date: Fri, 26 Sep 2025 03:13:52 GMT
Subject: RFR: 8303762: Optimize vector slice operation with constant index
 using VPALIGNR instruction [v8]
In-Reply-To: <ledEIa5Cj9jGqJfRfGBCIF6_nvGBw5lG8j2ksYwePs8=.6dd826a7-c935-40be-afea-bd8ed146560f@github.com>
References: <oHhjicRldNzRK9vWNQFhpglJ-yICPm9ZgXH7VdwKaug=.8c5ec00b-7b29-4719-9d89-7dcbb28f6c79@github.com>
 <ledEIa5Cj9jGqJfRfGBCIF6_nvGBw5lG8j2ksYwePs8=.6dd826a7-c935-40be-afea-bd8ed146560f@github.com>
Message-ID: <afgT8ZVXM4RkNyC5K-a_5-SCK5Hcudxx1KHcFON2Gk8=.e40ac5fc-eb26-4875-8206-bced2b4df39d@github.com>

On Wed, 20 Aug 2025 10:11:47 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> The patch optimizes the Vector.slice operation with a constant index using the x86 ALIGNR instruction.
>> It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails.
>> 
>> The idea here is to add infrastructure support that enables intrinsification of the fast path for selected vector APIs, or else inlining of the fallback implementation if it is based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bimorphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and is invoked during incremental inlining. It also relieves the inline expander from handling slow paths, which can easily be implemented on the library side (Java).
>> 
>> Vector API jtreg tests pass at AVX level 2, remaining validation in progress.
>> 
>> Performance numbers:
>> 
>> 
>> System : 13th Gen Intel(R) Core(TM) i3-1315U
>> 
>> Baseline:
>> Benchmark                                                (size)   Mode  Cnt      Score   Error   Units
>> VectorSliceBenchmark.byteVectorSliceWithConstantIndex1     1024  thrpt    2   9444.444          ops/ms
>> VectorSliceBenchmark.byteVectorSliceWithConstantIndex2     1024  thrpt    2  10009.319          ops/ms
>> VectorSliceBenchmark.byteVectorSliceWithVariableIndex      1024  thrpt    2   9081.926          ops/ms
>> VectorSliceBenchmark.intVectorSliceWithConstantIndex1      1024  thrpt    2   6085.825          ops/ms
>> VectorSliceBenchmark.intVectorSliceWithConstantIndex2      1024  thrpt    2   6505.378          ops/ms
>> VectorSliceBenchmark.intVectorSliceWithVariableIndex       1024  thrpt    2   6204.489          ops/ms
>> VectorSliceBenchmark.longVectorSliceWithConstantIndex1     1024  thrpt    2   1651.334          ops/ms
>> VectorSliceBenchmark.longVectorSliceWithConstantIndex2     1024  thrpt    2   1642.784          ops/ms
>> VectorSliceBenchmark.longVectorSliceWithVariableIndex      1024  thrpt    2   1474.808          ops/ms
>> VectorSliceBenchmark.shortVectorSliceWithConstantIndex1    1024  thrpt    2  10399.394          ops/ms
>> VectorSliceBenchmark.shortVectorSliceWithConstantIndex2    1024  thrpt    2  10502.894          ops/ms
>> VectorSliceB...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update callGenerator.hpp copyright year

test/micro/org/openjdk/bench/jdk/incubator/vector/VectorSliceBenchmark.java line 137:

> 135:     @Benchmark
> 136:     public void shortVectorSliceWithConstantIndex1() {
> 137:         for (int i = 0; i < sspecies.loopBound(sdst.length); i += bspecies.length()) {

Typo? `bspecies` -> `sspecies`, here and in the following cases.
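For readers unfamiliar with the operation being optimized, the slice semantics and their correspondence to ALIGNR can be modeled on 16-byte arrays in plain Java. This is a hypothetical sketch, not the patch's code: `SliceModel` is illustrative, it covers a single 128-bit lane only (wider VPALIGNR forms operate per 128-bit lane), and it assumes `Vector.slice(origin, v2)` takes lanes `origin..N-1` of the receiver followed by lanes `0..origin-1` of `v2`.

```java
import java.util.Arrays;

final class SliceModel {
    // Model of v1.slice(origin, v2) on byte lanes.
    static byte[] slice(byte[] v1, int origin, byte[] v2) {
        int n = v1.length;
        byte[] r = new byte[n];
        for (int i = 0; i < n; i++) {
            int j = origin + i;
            r[i] = j < n ? v1[j] : v2[j - n];
        }
        return r;
    }

    // Model of 128-bit PALIGNR: concatenate hi:lo (hi in the upper 16 bytes),
    // shift right by imm bytes and keep the low 16 bytes. Assumes 0 <= imm <= 16.
    static byte[] palignr(byte[] hi, byte[] lo, int imm) {
        byte[] concat = new byte[32];
        System.arraycopy(lo, 0, concat, 0, 16);
        System.arraycopy(hi, 0, concat, 16, 16);
        return Arrays.copyOfRange(concat, imm, imm + 16);
    }
}
```

Under these assumptions, `slice(v1, origin, v2)` equals `palignr(v2, v1, origin)`, which is why a constant origin maps onto a single ALIGNR with an immediate operand.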

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2380745327

From bmaillard at openjdk.org  Fri Sep 26 07:02:20 2025
From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard)
Date: Fri, 26 Sep 2025 07:02:20 GMT
Subject: RFR: 8367967: C2: "fatal error: Not monotonic" with Mod nodes [v3]
In-Reply-To: <EDPz9FTr56gUdjcVeokF55vuUSb71g-806n8urrSrVM=.0ea0eebc-9545-48b8-9acf-35f592c14c71@github.com>
References: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
 <EDPz9FTr56gUdjcVeokF55vuUSb71g-806n8urrSrVM=.0ea0eebc-9545-48b8-9acf-35f592c14c71@github.com>
Message-ID: <-k8uy90M9Av5jbIq1WZVMZcuB5NnRuw2vHAjSWb8FhU=.d588a39e-94bf-4066-bef8-fd1dad610c5c@github.com>

On Tue, 23 Sep 2025 08:28:00 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> Generally, we shouldn't return a wider type (ZERO) if there is a later case that would return a more narrow type (TOP) for the same input types. If the inputs are widened and the first case doesn't match anymore but the later one still does, the result is not monotonic with the previous result.
>> 
>> Please review :)
>
> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
> 
>   move test

Marked as reviewed by bmaillard (Author).

-------------

PR Review: https://git.openjdk.org/jdk/pull/27408#pullrequestreview-3270427832

From jbhateja at openjdk.org  Fri Sep 26 08:03:12 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 26 Sep 2025 08:03:12 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v17]
In-Reply-To: <pyUhO3pJd4eUVlaJ4dSVjXaD2adG9iwxtuMcFxV4NXk=.1f5f5133-648f-460d-90e0-528b650d24e7@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <pyUhO3pJd4eUVlaJ4dSVjXaD2adG9iwxtuMcFxV4NXk=.1f5f5133-648f-460d-90e0-528b650d24e7@github.com>
Message-ID: <0ODl2Qx_vAsMtfkz7Wz_rZNWxJNZ5Fgn_NQKA1rHt9s=.2e4363aa-47bc-42ce-ba41-da1e94b27a38@github.com>

On Thu, 25 Sep 2025 22:10:01 GMT, Mohamed Issa <missa at openjdk.org> wrote:

>> Intel&reg; AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
>> 
>> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped to values (NaN -> 0, -Infinity, <= Integer.MIN_VALUE -> Integer.MIN_VALUE, +Infinity, >= Integer.MAX_VALUE -> Integer.MAX_VALUE) in the integer range with the help of a few temporary registers to store intermediate results.
>> 
>> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or  VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
>> 
>> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
>> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
>> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
>> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
>> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
>> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
>> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
>> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
>> 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java`...
>
> Mohamed Issa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 21 additional commits since the last revision:
> 
>  - Merge branch 'master' of https://github.com/missa-prime/jdk into user/missa-prime/avx10_2
>  - Use compiler generator instead of standard Java streams
>  - Clean up scalar floating point conversion tests
>  - Introduce scalar floating point conversion tests with IR rules
>  - Add extra constraints to vector floating point conversion instruction predicates and tests
>  - Change the floating point conversion instruction, IR nodes, and test rules to make them clearer
>  - Change debug text format of AVX 10.2 vector conversion instructions
>  - Check for instructions that shouldn't appear in vector floating point conversion tests
>  - Correctly calculate vector lengths and don't rely on VectorReinterpret in cast2F2X and cast2D2X memory instructions
>  - Add new IR nodes covering x86 floating point conversion instructions
>  - ... and 11 more: https://git.openjdk.org/jdk/compare/4f01109d...0415ddf2

Thanks @missa-prime for addressing my comments, patch looks good to me.

Best Regards

-------------

Marked as reviewed by jbhateja (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26919#pullrequestreview-3270710468

From hgreule at openjdk.org  Fri Sep 26 08:38:25 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Fri, 26 Sep 2025 08:38:25 GMT
Subject: RFR: 8367967: C2: "fatal error: Not monotonic" with Mod nodes
In-Reply-To: <BhjvxMMKh9lLZY63BkjH0ccRnjCvAHns7S2j12QMpyU=.2a036654-6a83-4c9a-8778-76974c8d7ada@github.com>
References: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
 <xytvQB_dSP5xe_coKl2GxxxF_0wxL4t_bkGSA6N7S9E=.5b1ab6ce-05a5-44a8-9935-a3157e159041@github.com>
 <BhjvxMMKh9lLZY63BkjH0ccRnjCvAHns7S2j12QMpyU=.2a036654-6a83-4c9a-8778-76974c8d7ada@github.com>
Message-ID: <NTCvvXj8IspMWlaSFbHZCGz-NIc_5au5VobqM-wsL80=.73349bc9-0a84-479d-a0de-19b89f358ffb@github.com>

On Wed, 24 Sep 2025 09:21:32 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

> Unfortunately we still see an assert in the test java/foreign/TestUpcallStress on Linux aarch64. But this time it is not the 'old' one but
> 
> `# assert(oopDesc::is_oop(obj)) failed: not an oop: 0x0000000000000001`
> 
> Maybe it is unrelated; not sure.

@MBaesken this looks rather unrelated, but hard to tell without more output.

@chhagedorn did your tests come back green?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27408#issuecomment-3337378419

From chagedorn at openjdk.org  Fri Sep 26 08:43:30 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 26 Sep 2025 08:43:30 GMT
Subject: RFR: 8367967: C2: "fatal error: Not monotonic" with Mod nodes [v3]
In-Reply-To: <jrX_Ohi3k6KtgveHMKnva3wwK8WBD8NoXLvy2A0_Pjk=.00397364-2eb0-44aa-837e-eeb1de1a98ed@github.com>
References: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
 <EDPz9FTr56gUdjcVeokF55vuUSb71g-806n8urrSrVM=.0ea0eebc-9545-48b8-9acf-35f592c14c71@github.com>
 <TjIJ8kANFmXd5c4CRKd2yndTcblECPfgiuXM3lT4pS0=.cf5fda40-7cef-4f25-a74c-4f7a9899dfa2@github.com>
 <sHPYPAQwLUPuullEK4PKAe_Ph7z0oOORzlz_wF04eZA=.4e93153e-63f1-4b42-b955-15a1d60ab21f@github.com>
 <jrX_Ohi3k6KtgveHMKnva3wwK8WBD8NoXLvy2A0_Pjk=.00397364-2eb0-44aa-837e-eeb1de1a98ed@github.com>
Message-ID: <fxGh1tHWmXWuNd-h0pXHvE6b-gjdPGSbW7qKTAY7i6g=.3fcd8d17-582c-40d8-86bc-78e003a07d6d@github.com>

On Thu, 25 Sep 2025 22:15:47 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

> I don't see a better alternative except just ignoring 0 divisor case

That probably also works. It seems that for `DivI/L`, we already ignore this case as well. 

The question is which is better when the zero check is not folded but we observe zero for the divisor: returning top, which may corrupt the graph, or accepting the risk of a miscompilation or a division-by-zero crash at runtime if the zero check really is off (although not folding the zero check does not necessarily mean the code is wrong at runtime). The former is probably easy to catch when it happens, while the latter seems more robust but, when the zero check is off, is probably harder to detect and trace back.

> Alternatively, to improve robustness, a dead ModI/ModL can kill dependent control akin to what Roland did for Type nodes with JDK-8349479.

Could be an option. We should then probably also extend it to Div nodes. It might be worth investigating separately.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27408#discussion_r2381407218

From chagedorn at openjdk.org  Fri Sep 26 08:51:31 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 26 Sep 2025 08:51:31 GMT
Subject: RFR: 8367967: C2: "fatal error: Not monotonic" with Mod nodes [v3]
In-Reply-To: <EDPz9FTr56gUdjcVeokF55vuUSb71g-806n8urrSrVM=.0ea0eebc-9545-48b8-9acf-35f592c14c71@github.com>
References: <EIXct5gcS8DwqvQtU2I1D1o3CjeeSxQv6ricuvQPzxg=.b0a43839-c433-4fc8-882b-6aa21bef06b9@github.com>
 <EDPz9FTr56gUdjcVeokF55vuUSb71g-806n8urrSrVM=.0ea0eebc-9545-48b8-9acf-35f592c14c71@github.com>
Message-ID: <XZ1iAG0BHvJ9hrpQfCmdlmUTk-pAzAcxK-jDa3hiYqw=.d1c5c0dc-928b-4827-9654-5c0bd1e602c5@github.com>

On Tue, 23 Sep 2025 08:28:00 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> Generally, we shouldn't return a wider type (ZERO) if there is a later case that would return a more narrow type (TOP) for the same input types. If the inputs are widened and the first case doesn't match anymore but the later one still does, the result is not monotonic with the previous result.
>> 
>> Please review :)
>
> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision:
> 
>   move test

Testing looks good!

I also left a comment about ignoring the zero divisor case. It's an interesting thought to just ignore/remove it. Anyway, the current patch just fixes the current situation and does not make it worse. So, I agree with it but if you want to switch to the ignoring case, I'm also fine. In the latter case, I won't be able to review it anymore since I will be on vacation next week (assuming we also wait for Vladimir's additional input about it). But you would have my implicit approval :-)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27408#issuecomment-3337417287

From rcastanedalo at openjdk.org  Fri Sep 26 09:04:44 2025
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Fri, 26 Sep 2025 09:04:44 GMT
Subject: RFR: 8368675: IGV: nodes are wrongly marked as changed in the
 difference view
Message-ID: <16kJmV4PRouP_riAC1VJkGcLNYkUKeD8PxnXZK7U-qs=.33d0f082-84eb-43c1-b8be-8112b41a29f8@github.com>

This changeset refines IGV's node difference analysis to ignore changes in node properties that are derived by IGV, as opposed to generated by HotSpot. Derived properties include the label and color of each node. Ignoring changes in these properties prevents IGV from wrongly marking equal nodes as "changed" (colored in yellow) when showing the difference between two graphs:

<img width="2323" height="651" alt="before-after" src="https://github.com/user-attachments/assets/ea51c86d-4719-45c0-b615-e6e4b8aec023" />

**Testing:** tier1 and manual testing on a few graphs.
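A minimal sketch of the idea, assuming node properties are available as a string-to-string map. `NodeDiff`, the other property names used in the usage example, and the map representation are hypothetical and do not mirror IGV's actual classes; only the choice of `label` and `color` as derived properties comes from the description above.

```java
final class NodeDiff {
    // Properties computed by IGV itself rather than emitted by HotSpot (per the description above).
    static final java.util.Set<String> DERIVED = java.util.Set.of("label", "color");

    // Returns true if any HotSpot-generated property differs; IGV-derived properties are ignored.
    static boolean isChanged(java.util.Map<String, String> before, java.util.Map<String, String> after) {
        java.util.Set<String> keys = new java.util.HashSet<>(before.keySet());
        keys.addAll(after.keySet());
        for (String key : keys) {
            if (DERIVED.contains(key)) {
                continue;
            }
            if (!java.util.Objects.equals(before.get(key), after.get(key))) {
                return true;
            }
        }
        return false;
    }
}
```

A node whose only difference is its `color` is then reported as unchanged, while a difference in any other property still marks it as changed.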

-------------

Commit messages:
 - Ignore 'label' and 'color' properties when analyzing node changes

Changes: https://git.openjdk.org/jdk/pull/27515/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27515&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8368675
  Stats: 27 lines in 5 files changed: 14 ins; 2 del; 11 mod
  Patch: https://git.openjdk.org/jdk/pull/27515.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27515/head:pull/27515

PR: https://git.openjdk.org/jdk/pull/27515

From mhaessig at openjdk.org  Fri Sep 26 09:14:05 2025
From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=)
Date: Fri, 26 Sep 2025 09:14:05 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v17]
In-Reply-To: <pyUhO3pJd4eUVlaJ4dSVjXaD2adG9iwxtuMcFxV4NXk=.1f5f5133-648f-460d-90e0-528b650d24e7@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <pyUhO3pJd4eUVlaJ4dSVjXaD2adG9iwxtuMcFxV4NXk=.1f5f5133-648f-460d-90e0-528b650d24e7@github.com>
Message-ID: <jFiBDO6uYX5EXKBqZsgDnHIkwYOblvtsogSzpCz4Xag=.9beaabfc-c4e9-4bbb-88ec-f7b25dac49d9@github.com>

On Thu, 25 Sep 2025 22:10:01 GMT, Mohamed Issa <missa at openjdk.org> wrote:

>> Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
>> 
>> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) instruction is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs. double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs. vector), and supported AVX extension type. Essentially though, the special values are mapped to values in the integer range (NaN -> 0, -Infinity, <= Integer.MIN_VALUE -> Integer.MIN_VALUE, +Infinity, >= Integer.MAX_VALUE -> Integer.MAX_VALUE) with the help of a few temporary registers to store intermediate results.
>> 
>> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or  VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
>> 
>> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
>> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
>> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
>> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
>> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
>> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
>> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
>> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
>> 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java`...
>
> Mohamed Issa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 21 additional commits since the last revision:
> 
>  - Merge branch 'master' of https://github.com/missa-prime/jdk into user/missa-prime/avx10_2
>  - Use compiler generator instead of standard Java streams
>  - Clean up scalar floating point conversion tests
>  - Introduce scalar floating point conversion tests with IR rules
>  - Add extra constraints to vector floating point conversion instruction predicates and tests
>  - Change the floating point conversion instruction, IR nodes, and test rules to make them clearer
>  - Change debug text format of AVX 10.2 vector conversion instructions
>  - Check for instructions that shouldn't appear in vector floating point conversion tests
>  - Correctly calculate vector lengths and don't rely on VectorReinterpret in cast2F2X and cast2D2X memory instructions
>  - Add new IR nodes covering x86 floating point conversion instructions
>  - ... and 11 more: https://git.openjdk.org/jdk/compare/eae8f2a6...0415ddf2

Thank you for implementing these new instructions. I kicked off testing on our side.
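For reference, the mapping listed in the PR description (NaN -> 0, values at or below the minimum -> minimum, values at or above the maximum -> maximum) is the one Java's own narrowing casts define (JLS §5.1.3). The legacy CVTT* instructions instead return the "integer indefinite" value (0x80000000 for 32-bit results) in those cases, which is why the fix-up sequence is needed. The sketch below spells the mapping out explicitly for float to int and double to long; `SaturatingCast` is a hypothetical helper, e.g. usable as an oracle in tests.

```java
final class SaturatingCast {
    // Explicit float -> int saturating conversion.
    static int f2i(float f) {
        if (Float.isNaN(f)) return 0;                                 // NaN -> 0
        if (f <= (float) Integer.MIN_VALUE) return Integer.MIN_VALUE;  // -Infinity and too-small values
        if (f >= (float) Integer.MAX_VALUE) return Integer.MAX_VALUE;  // +Infinity and too-large values
        return (int) f;                                                // in range: plain truncation toward zero
    }

    // Explicit double -> long saturating conversion.
    static long d2l(double d) {
        if (Double.isNaN(d)) return 0L;
        if (d <= (double) Long.MIN_VALUE) return Long.MIN_VALUE;
        if (d >= (double) Long.MAX_VALUE) return Long.MAX_VALUE;
        return (long) d;
    }
}
```

`(float) Integer.MAX_VALUE` is 2^31, so every float strictly between the two bounds fits in an `int` and the final cast only truncates. The results agree with a plain `(int) f` / `(long) d`, for example `f2i(Float.NaN)` is 0 and `f2i(1e10f)` is `Integer.MAX_VALUE`.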

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5182:

> 5180:       evpmovdb(dst, dst, vec_enc);
> 5181:       break;
> 5182:     default: assert(false, "%s", type2name(to_elem_bt));

Perhaps you could provide a richer assert message like this:
Suggestion:

    default: assert(false, "unexpected basic type for vector castF2X AVX10: %s", type2name(to_elem_bt));

-------------

Changes requested by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/26919#pullrequestreview-3271055265
PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2381527655

From mhaessig at openjdk.org  Fri Sep 26 09:21:00 2025
From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=)
Date: Fri, 26 Sep 2025 09:21:00 GMT
Subject: RFR: 8360558: Use hex literals instead of decimal literals in math
 intrinsic constants
In-Reply-To: <JqJDah4V1rq9XBUq5WxLd_fnn1CjYCFpBlh2NadSFgI=.969a9af0-0d3f-414b-a3db-8f17e5e6164d@github.com>
References: <JqJDah4V1rq9XBUq5WxLd_fnn1CjYCFpBlh2NadSFgI=.969a9af0-0d3f-414b-a3db-8f17e5e6164d@github.com>
Message-ID: <qP6bJTXjzcBmrnjMqW9uLAOJkghpDfm1KT7UPBVYhlM=.635c3d5b-ce3d-4d71-af9a-5d58fd4fd1c8@github.com>

On Thu, 25 Sep 2025 19:26:34 GMT, Mohamed Issa <missa at openjdk.org> wrote:

> A simple change to use hex literals instead of decimal literals in the constant arrays of the x86 cbrt and tanh stubs. The JTREG tests listed below were used to verify correctness. The baseline build used is [OpenJDK v26-b17](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B17).
> 
> 1. `jtreg:test/jdk/java/lang/Math/CubeRootTests.java`
> 2. `jtreg:test/jdk/java/lang/Math/HyperbolicTests.java`

Thank you for this improvement. The few values I checked match. I assume this is generated and there is not really a way to review it?

I kicked off testing and will report back with results.
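One way to spot-check such a conversion by hand is to compute the bit pattern of each constant. The sketch below assumes that the stub tables store each `double` as two 32-bit words, low word first; verify that layout against the actual arrays before relying on it. `HexWords` is a hypothetical helper.

```java
final class HexWords {
    // Formats a 32-bit word as 0xXXXXXXXX; negative ints print as their unsigned bit pattern.
    static String hex32(int word) {
        return String.format("0x%08X", word);
    }

    // Splits a double into {low word, high word} (the layout assumed for the stub tables).
    static String[] doubleToHexWords(double d) {
        long bits = Double.doubleToRawLongBits(d);
        return new String[] { hex32((int) bits), hex32((int) (bits >>> 32)) };
    }

    // Converts one decimal table entry (an unsigned 32-bit value) to its hex spelling.
    static String decimalWordToHex(long unsignedDecimal) {
        return hex32((int) unsignedDecimal);
    }
}
```

For example, `1.0` splits into `0x00000000` (low) and `0x3FF00000` (high), and the decimal entry `1032192` corresponds to `0x000FC000`.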

-------------

PR Review: https://git.openjdk.org/jdk/pull/27497#pullrequestreview-3271109691

From rcastanedalo at openjdk.org  Fri Sep 26 10:11:42 2025
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Fri, 26 Sep 2025 10:11:42 GMT
Subject: RFR: 8368753: IGV: improve CFG view of difference graphs
Message-ID: <_gL57sBFOa4QlLN9_di6aDM-CwiFPjhR2jgVnUZsVDQ=.7b916923-ad53-49b2-99dc-cb01254e69f7@github.com>

This changeset improves the control-flow graph view of difference graphs by:

1. ensuring that nodes are scheduled locally within each block, and
2. hiding internal, artificial blocks containing nodes that remain in the graph even if they are dead, such as the top constant node.

The following screenshot illustrates the effect of scheduling nodes locally:

<img width="3853" height="1033" alt="JDK-8368753" src="https://github.com/user-attachments/assets/bdc0f6de-3d28-4615-9e0d-221de2ad4770" />

For example, before this changeset (left) the `Return` node in B9 is scheduled at the beginning of the block. After the changeset (right), this node is scheduled last, as expected.
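A generic way to obtain such a local order is a topological sort restricted to the nodes of one block, so that a node like `Return` ends up after every in-block node it uses. The sketch below applies Kahn's algorithm to integer node ids; `LocalScheduler` and its map-based input representation are hypothetical and not IGV's scheduler. Ready nodes are taken in ascending id order for determinism, and nodes caught in cycles are appended in their original order.

```java
final class LocalScheduler {
    // Orders the nodes of one block so that each node follows its in-block inputs.
    // 'inputs' maps a node id to the ids it uses; inputs outside the block are ignored.
    static java.util.List<Integer> schedule(java.util.List<Integer> blockNodes,
                                            java.util.Map<Integer, java.util.List<Integer>> inputs) {
        java.util.Set<Integer> inBlock = new java.util.HashSet<>(blockNodes);
        java.util.Map<Integer, Integer> pending = new java.util.HashMap<>();
        java.util.Map<Integer, java.util.List<Integer>> users = new java.util.HashMap<>();
        for (int n : blockNodes) {
            int count = 0;
            for (int input : inputs.getOrDefault(n, java.util.List.of())) {
                if (input != n && inBlock.contains(input)) {
                    count++;
                    users.computeIfAbsent(input, k -> new java.util.ArrayList<>()).add(n);
                }
            }
            pending.put(n, count);
        }
        // Nodes with no unscheduled in-block inputs, smallest id first.
        java.util.PriorityQueue<Integer> ready = new java.util.PriorityQueue<>();
        for (int n : blockNodes) {
            if (pending.get(n) == 0) ready.add(n);
        }
        java.util.List<Integer> order = new java.util.ArrayList<>();
        while (!ready.isEmpty()) {
            int n = ready.poll();
            order.add(n);
            for (int u : users.getOrDefault(n, java.util.List.of())) {
                if (pending.merge(u, -1, Integer::sum) == 0) ready.add(u);
            }
        }
        // Nodes that never became ready (in-block cycles) keep their original relative order.
        for (int n : blockNodes) {
            if (!order.contains(n)) order.add(n);
        }
        return order;
    }
}
```

For a block listing `Return` (9) first, where 9 uses 5 and 7 and 7 uses 5, the result is `[5, 7, 9]`: the `Return` node is scheduled last.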

**Testing:** tier1 and manual testing on a few graphs.

-------------

Commit messages:
 - Schedule difference graphs locally and mark their artificial block

Changes: https://git.openjdk.org/jdk/pull/27520/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27520&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8368753
  Stats: 64 lines in 4 files changed: 44 ins; 11 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/27520.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27520/head:pull/27520

PR: https://git.openjdk.org/jdk/pull/27520

From rcastanedalo at openjdk.org  Fri Sep 26 10:14:42 2025
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Fri, 26 Sep 2025 10:14:42 GMT
Subject: RFR: 8359378: aarch64: crash when using -XX:+UseFPUForSpilling
In-Reply-To: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
References: <0MbLaZYRoz-raa9x1eZycxeHE7mvPiGJujQpg8vBdek=.e3483c47-ad65-4d09-baf5-2db9b780669d@github.com>
Message-ID: <iDLBS-Roxb8ovuxKejNLUvB6gu8twGdW9BSitF1gWnY=.b0d97be9-9363-48c3-8083-23b9e5b9dfc9@github.com>

On Wed, 17 Sep 2025 16:19:12 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:

> The AArch64 BarrierSetAssembler path assumes that only FP/vector ideal regs reach the FP spill/restore encoding. With -XX:+UseFPUForSpilling, the register allocator may allocate scalar values in FP registers. When such values (Op_RegI/Op_RegN/Op_RegL/Op_RegP) hit `BarrierSetAssembler::encode_float_vector_register_size`, we trip ShouldNotReachHere in release builds and the **"unexpected ideal register"** assertion in debug builds.
> 
> Fix: teach the encoder to handle scalar ideal regs when they physically live in FP regs:
> - treat Op_RegI / Op_RegN as 32-bit (single slot) - same class as Op_RegF
> - treat Op_RegL / Op_RegP as 64-bit (two slots) - same class as Op_RegD
> 
> Related:
> - reproduced since #19746
> - spilling logic: 
>   - #18967
>   - #17977
> 
> Testing: tier1-3 with javaoptions -Xcomp -Xbatch -XX:+UseFPUForSpilling on AArch64
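The classification rule of the fix can be modeled as a small lookup. This is a Java toy, not the C++ in `BarrierSetAssembler`: the `IdealReg` enum and the `slots` method are hypothetical, and vector register kinds are left out. It only encodes the mapping stated above: 32-bit scalars share the `Op_RegF` class and 64-bit scalars share the `Op_RegD` class.

```java
final class SpillSlotModel {
    // Ideal register kinds named after the Op_Reg* constants mentioned in the PR (hypothetical enum).
    enum IdealReg { RegI, RegN, RegL, RegP, RegF, RegD }

    // Number of 32-bit slots a value occupies when it physically lives in an FP register.
    static int slots(IdealReg r) {
        switch (r) {
            case RegF: case RegI: case RegN:
                return 1;  // 32-bit: same class as RegF
            case RegD: case RegL: case RegP:
                return 2;  // 64-bit: same class as RegD
            default:
                throw new IllegalStateException("unexpected ideal register: " + r);
        }
    }
}
```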

> I suggest handling this in two steps:
> 
> * In JDK 25 we fix the crash when UseFPUForSpilling is enabled.
> * In the next release we prohibit the option softly: if it is set on the command line, the VM prints a warning and resets it to false. Proposed change for the latter:
> 
> ```diff
> diff --git a/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp b/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp
> index 308deeaf5e2..7702988c11c 100644
> --- a/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp
> +++ b/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp
> @@ -621,4 +621,9 @@ void VM_Version::initialize() {
>      FLAG_SET_DEFAULT(UseVectorizedHashCodeIntrinsic, true);
>    }
> +
> +  if (UseFPUForSpilling) {
> +    warning("UseFPUForSpilling is known to degrade performance on this platform and will be ignored.");
> +    FLAG_SET_DEFAULT(UseFPUForSpilling, false);
> +  }
>  #endif
>  
> ```

This seems reasonable to me as well.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27350#issuecomment-3337898957

From mchevalier at openjdk.org  Fri Sep 26 10:30:05 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Fri, 26 Sep 2025 10:30:05 GMT
Subject: RFR: 8368675: IGV: nodes are wrongly marked as changed in the
 difference view
In-Reply-To: <16kJmV4PRouP_riAC1VJkGcLNYkUKeD8PxnXZK7U-qs=.33d0f082-84eb-43c1-b8be-8112b41a29f8@github.com>
References: <16kJmV4PRouP_riAC1VJkGcLNYkUKeD8PxnXZK7U-qs=.33d0f082-84eb-43c1-b8be-8112b41a29f8@github.com>
Message-ID: <zZ3_F_9EdGZNTSIlgUqks_JoFeItedKrVX6LD6HDY-8=.65644db7-9510-42f6-a148-0ece2a339e2d@github.com>

On Fri, 26 Sep 2025 08:34:34 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> This changeset refines IGV's node difference analysis to ignore changes in node properties that are derived by IGV, as opposed to generated by HotSpot. Derived properties include the label and color of each node. Ignoring changes in these properties prevents IGV from wrongly marking equal nodes as "changed" (colored in yellow) when showing the difference between two graphs:
> 
> <img width="2323" height="651" alt="before-after" src="https://github.com/user-attachments/assets/ea51c86d-4719-45c0-b615-e6e4b8aec023" />
> 
> **Testing:** tier1 and manual testing on a few graphs.

Makes sense, looks good, will help!

I've actually already noticed such yellow nodes without change, and couldn't understand. I assumed the color code had some semantics I didn't know. That will be less confusing, thanks.

-------------

Marked as reviewed by mchevalier (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27515#pullrequestreview-3271513172

From rcastanedalo at openjdk.org  Fri Sep 26 11:35:49 2025
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Fri, 26 Sep 2025 11:35:49 GMT
Subject: RFR: 8368675: IGV: nodes are wrongly marked as changed in the
 difference view
In-Reply-To: <zZ3_F_9EdGZNTSIlgUqks_JoFeItedKrVX6LD6HDY-8=.65644db7-9510-42f6-a148-0ece2a339e2d@github.com>
References: <16kJmV4PRouP_riAC1VJkGcLNYkUKeD8PxnXZK7U-qs=.33d0f082-84eb-43c1-b8be-8112b41a29f8@github.com>
 <zZ3_F_9EdGZNTSIlgUqks_JoFeItedKrVX6LD6HDY-8=.65644db7-9510-42f6-a148-0ece2a339e2d@github.com>
Message-ID: <FpXCbeVQPo_jpvHaGE8JeigJRhrkEInS45jDSPdaHbI=.4cb302be-85cf-4ab2-907c-3f091282e48e@github.com>

On Fri, 26 Sep 2025 10:27:07 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

> Makes sense, looks good, will help!
> 
> I've actually already noticed such yellow nodes without change, and couldn't understand. I assumed the color code had some semantics I didn't know. That will be less confusing, thanks.

Thanks for reviewing, Marc!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27515#issuecomment-3338245937

From bkilambi at openjdk.org  Fri Sep 26 12:11:59 2025
From: bkilambi at openjdk.org (Bhavana Kilambi)
Date: Fri, 26 Sep 2025 12:11:59 GMT
Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16
Message-ID: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com>

This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species.

Add and mul reductions vectorized through autovectorization require a strictly ordered implementation. The following describes how each of these reductions is implemented on different aarch64 targets:

**For AddReduction :**
On Neon-only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths, because Neon does not provide a direct instruction for computing a strictly ordered floating point add reduction.

On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order.

**For MulReduction :**
Neither Neon nor SVE provides a direct instruction for computing a strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of `fmul` instructions is generated; multiply reduction for vector lengths > 16B is not supported.
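Strict ordering matters because FP16 addition is not associative, so a tree-shaped vector reduction can give a different answer from the sequential Java loop. The sketch below (JDK 20+ for `Float.floatToFloat16` / `Float.float16ToFloat`) emulates FP16 addition by adding in `float` and rounding back to FP16. Because `float` has 24 significand bits, at least 2 × 11 + 2, this double rounding yields the correctly rounded FP16 sum. `Fp16Reduction` is a hypothetical helper, not part of the patch.

```java
final class Fp16Reduction {
    // Adds two FP16 values held as raw IEEE binary16 bits: add in float, then round once to FP16.
    static short add(short a, short b) {
        return Float.floatToFloat16(Float.float16ToFloat(a) + Float.float16ToFloat(b));
    }

    // Strictly ordered reduction (((0 + v[0]) + v[1]) + ...), the order the Java loop specifies.
    static short orderedSum(short[] v) {
        short acc = Float.floatToFloat16(0.0f);
        for (short x : v) {
            acc = add(acc, x);
        }
        return acc;
    }

    // Pairwise (tree-shaped) reduction over v[from, to), to > from: a reordering strict order forbids.
    static short pairwiseSum(short[] v, int from, int to) {
        if (to - from == 1) return v[from];
        int mid = (from + to) >>> 1;
        return add(pairwiseSum(v, from, mid), pairwiseSum(v, mid, to));
    }
}
```

For the input {2048, 1, 1}: representable FP16 values near 2048 are 2 apart, so 2048 + 1 = 2049 is a tie that rounds to even (2048), and the ordered sum stays at 2048. The pairwise sum computes 1 + 1 = 2 first and reaches 2050.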

Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` -

Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch.
Ratio > 1 indicates the performance with this patch is better than the master branch.

**N1 (UseSVE = 0, max vector length = 16B):**

Benchmark         vectorDim  Mode   Cnt  8B     16B
ReductionAddFP16  256        thrpt  9    1.41   1.40
ReductionAddFP16  512        thrpt  9    1.41   1.41
ReductionAddFP16  1024       thrpt  9    1.43   1.40
ReductionAddFP16  2048       thrpt  9    1.43   1.40
ReductionMulFP16  256        thrpt  9    1.22   1.22
ReductionMulFP16  512        thrpt  9    1.21   1.23
ReductionMulFP16  1024       thrpt  9    1.21   1.22
ReductionMulFP16  2048       thrpt  9    1.20   1.22


On N1, the scalarized `fadd`/`fmul` sequence is generated for both 8B and 16B `MaxVectorSize` for add reduction and mul reduction, respectively.

**V1 (UseSVE = 1, max vector length = 32B):**

Benchmark         vectorDim  Mode   Cnt  8B     16B     32B
ReductionAddFP16  256        thrpt  9    1.11   1.75    2.02
ReductionAddFP16  512        thrpt  9    1.02   1.64    1.93
ReductionAddFP16  1024       thrpt  9    1.02   1.59    1.85
ReductionAddFP16  2048       thrpt  9    1.02   1.56    1.80
ReductionMulFP16  256        thrpt  9    1.12   0.99    1.09
ReductionMulFP16  512        thrpt  9    1.04   1.01    1.04
ReductionMulFP16  1024       thrpt  9    1.02   1.02    1.00
ReductionMulFP16  2048       thrpt  9    1.01   1.01    1.00


On V1, for MaxVectorSize = 8: a scalarized `fadd/fmul` sequence is generated for `AddReductionVHF/MulReductionVHF`, as UseSVE defaults to 0 [2].
For MaxVectorSize = 16: a scalarized `fmul` sequence is generated for `MulReductionVHF`, and `fadda` is generated for `AddReductionVHF`, which yields significant gains.
For MaxVectorSize = 32: autovectorization of `MulReductionVHF` is disabled for MaxVectorSize > 16B, so the autovectorizer checks for the maximal implemented size [1], which is 16B, and generates the scalarized `fmul` sequence for 16B. For `AddReductionVHF`, it generates the `fadda` instruction.
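The fallback for `MulReductionVHF` can be pictured as trying the largest power-of-two size first and halving until a size has a backend implementation. This is a hypothetical model of the behavior described above, not the SuperWord code linked in [1]; `VectorSizeModel` and the `implemented` predicate are illustrative.

```java
final class VectorSizeModel {
    // Largest power-of-two size in bytes, between minBytes and maxVectorSize, that 'implemented'
    // accepts; returns 0 if no such size exists.
    static int largestImplemented(int maxVectorSize, int minBytes, java.util.function.IntPredicate implemented) {
        if (minBytes < 1) throw new IllegalArgumentException("minBytes must be positive");
        for (int size = maxVectorSize; size >= minBytes; size /= 2) {
            if (implemented.test(size)) return size;
        }
        return 0;
    }
}
```

With a predicate that accepts only 8 and 16 bytes (as for `MulReductionVHF` here), a `MaxVectorSize` of 32 resolves to 16.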

**V2 (UseSVE = 2, max vector length = 16B)**

Benchmark         vectorDim  Mode   Cnt  8B     16B
ReductionAddFP16  256        thrpt  9    1.16   1.70
ReductionAddFP16  512        thrpt  9    1.02   1.61
ReductionAddFP16  1024       thrpt  9    1.01   1.53
ReductionAddFP16  2048       thrpt  9    1.00   1.49
ReductionMulFP16  256        thrpt  9    1.18   0.99
ReductionMulFP16  512        thrpt  9    1.04   1.01
ReductionMulFP16  1024       thrpt  9    1.02   1.02
ReductionMulFP16  2048       thrpt  9    1.01   1.01


On V2, for MaxVectorSize = 8: scalarized `fadd/fmul` sequence will be generated as UseSVE defaults to 0 [2].
For MaxVectorSize = 16: `fadda` instruction is generated for `AddReductionVHF` which results in significant gains in performance. For `MulReductionVHF`, the scalarized `fmul` sequence will be generated.

**Testing:**
hotspot_all, jdk(tiers1-3) and langtools(tier1) all pass on N1/V1/V2.

[1] https://github.com/openjdk/jdk/blob/a272696813f2e5e896ac9de9985246aaeb9d476c/src/hotspot/share/opto/superword.cpp#L1677
[2] https://github.com/openjdk/jdk/blob/a272696813f2e5e896ac9de9985246aaeb9d476c/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp#L479

-------------

Commit messages:
 - 8366444: Add support for add/mul reduction operations for Float16

Changes: https://git.openjdk.org/jdk/pull/27526/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27526&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8366444
  Stats: 500 lines in 12 files changed: 421 ins; 2 del; 77 mod
  Patch: https://git.openjdk.org/jdk/pull/27526.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27526/head:pull/27526

PR: https://git.openjdk.org/jdk/pull/27526

From mchevalier at openjdk.org  Fri Sep 26 12:39:59 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Fri, 26 Sep 2025 12:39:59 GMT
Subject: RFR: 8366444: Add support for add/mul reduction operations for
 Float16
In-Reply-To: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com>
References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com>
Message-ID: <j-nP_OUlCQUBzGnT3b18X6mbm0K25fzuBpo1lsD4YzA=.188e31a9-da84-4215-9492-92aae01c6674@github.com>

On Fri, 26 Sep 2025 12:00:31 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:

> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species.
> 
> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets -
> 
> **For AddReduction :**
> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction.
> 
> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order.
> 
> **For MulReduction :**
> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported.
> 
> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` -
> 
> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch.
> Ratio > 1 indicates the performance with this patch is better than the master branch.
> 
> **N1 (UseSVE = 0, max vector length = 16B):**
> 
> Benchmark         vectorDim  Mode   Cnt  8B     16B
> ReductionAddFP16  256        thrpt  9    1.41   1.40
> ReductionAddFP16  512        thrpt  9    1.41   1.41
> ReductionAddFP16  1024       thrpt  9    1.43   1.40
> ReductionAddFP16  2048       thrpt  9    1.43   1.40
> ReductionMulFP16  256        thrpt  9    1.22   1.22
> ReductionMulFP16  512        thrpt  9    1.21   1.23
> ReductionMulFP16  1024       thrpt  9    1.21   1.22
> ReductionMulFP16  2048       thrpt  9    1.20   1.22
> 
> 
> On N1, the scalarized sequence of `fadd/fmul` are generated for both `MaxVectorSize` of 8B and 16B for add reduction ...

I'm not an expert in that, so I have mostly superficial comments.

I'm also running some tests, will come back with results eventually.

src/hotspot/share/opto/vectornode.cpp line 1515:

> 1513:   case Op_AndReductionV:   return new AndReductionVNode (ctrl, n1, n2);
> 1514:   case Op_OrReductionV:    return new OrReductionVNode  (ctrl, n1, n2);
> 1515:   case Op_XorReductionV:   return new XorReductionVNode (ctrl, n1, n2);

Do we feel strongly about this alignment? I find it unfortunate to have such a big diff for 2 actual new lines.

src/hotspot/share/opto/vectornode.hpp line 340:

> 338: 
> 339:   virtual const Type* bottom_type() const { return Type::HALF_FLOAT; }
> 340:   virtual uint ideal_reg() const { return Op_RegF; }

Why not `override` instead of `virtual`? That has various advantages (like not accidentally declaring a new virtual method in case of mistake in the signature).

test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line 464:

> 462:         applyIfCPUFeatureAnd = {"fphp", "true", "asimdhp", "true"})
> 463:     public short vectorAddReductionFloat16() {
> 464:     short result = (short) 0;

Suggestion:

        short result = (short) 0;

-------------

PR Review: https://git.openjdk.org/jdk/pull/27526#pullrequestreview-3272041223
PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2382272305
PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2382276562
PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2382280192

From mhaessig at openjdk.org  Fri Sep 26 12:42:53 2025
From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=)
Date: Fri, 26 Sep 2025 12:42:53 GMT
Subject: RFR: 8368675: IGV: nodes are wrongly marked as changed in the
 difference view
In-Reply-To: <16kJmV4PRouP_riAC1VJkGcLNYkUKeD8PxnXZK7U-qs=.33d0f082-84eb-43c1-b8be-8112b41a29f8@github.com>
References: <16kJmV4PRouP_riAC1VJkGcLNYkUKeD8PxnXZK7U-qs=.33d0f082-84eb-43c1-b8be-8112b41a29f8@github.com>
Message-ID: <ClJzZL7ZpkOX3LlQLouDXlS1fzd4nk3e9wBdjHqi2lY=.a2e93933-f02a-4420-a9b9-74dc686add32@github.com>

On Fri, 26 Sep 2025 08:34:34 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> This changeset refines IGV's node difference analysis to ignore changes in node properties that are derived by IGV, as opposed to generated by HotSpot. Derived properties include the label and color of each node. Ignoring changes in these properties prevents IGV from wrongly marking equal nodes as "changed" (colored in yellow) when showing the difference between two graphs:
> 
> <img width="2323" height="651" alt="before-after" src="https://github.com/user-attachments/assets/ea51c86d-4719-45c0-b615-e6e4b8aec023" />
> 
> **Testing:** tier1 and manual testing on a few graphs.

Thank you very much for fixing this, @robcasloz! I have been seeing this and wondered if that is intentional.

I tested this on my machine and it works a treat! Looks good to me.

-------------

Marked as reviewed by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27515#pullrequestreview-3272075747

From mhaessig at openjdk.org  Fri Sep 26 12:49:32 2025
From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=)
Date: Fri, 26 Sep 2025 12:49:32 GMT
Subject: RFR: 8368753: IGV: improve CFG view of difference graphs
In-Reply-To: <_gL57sBFOa4QlLN9_di6aDM-CwiFPjhR2jgVnUZsVDQ=.7b916923-ad53-49b2-99dc-cb01254e69f7@github.com>
References: <_gL57sBFOa4QlLN9_di6aDM-CwiFPjhR2jgVnUZsVDQ=.7b916923-ad53-49b2-99dc-cb01254e69f7@github.com>
Message-ID: <iI1kvuxszaga0LzEQswgtUsW1jjKjn3IT1L9esbw4X0=.d73f8ad8-545f-4bad-8203-f07eec4ff2b1@github.com>

On Fri, 26 Sep 2025 09:48:57 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> This changeset improves the control-flow graph view of difference graphs by:
> 
> 1. ensuring that nodes are scheduled locally within each block, and
> 2. hiding internal, artificial blocks containing nodes that remain in the graph even if they are dead, such as the top constant node.
> 
> The following screenshot illustrates the effect of scheduling nodes locally:
> 
> <img width="3853" height="1033" alt="JDK-8368753" src="https://github.com/user-attachments/assets/bdc0f6de-3d28-4615-9e0d-221de2ad4770" />
> 
> For example, before this changeset (left) the `Return` node in B9 is scheduled at the beginning of the block. After the changeset (right), this node is scheduled last, as expected.
> 
> **Testing:** tier1 and manual testing on a few graphs.

Thank you for your continued work on IGV, @robcasloz! This is really helpful.

The changes look good to me.

-------------

Marked as reviewed by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27520#pullrequestreview-3272107049

From chagedorn at openjdk.org  Fri Sep 26 13:43:37 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 26 Sep 2025 13:43:37 GMT
Subject: RFR: 8368753: IGV: improve CFG view of difference graphs
In-Reply-To: <_gL57sBFOa4QlLN9_di6aDM-CwiFPjhR2jgVnUZsVDQ=.7b916923-ad53-49b2-99dc-cb01254e69f7@github.com>
References: <_gL57sBFOa4QlLN9_di6aDM-CwiFPjhR2jgVnUZsVDQ=.7b916923-ad53-49b2-99dc-cb01254e69f7@github.com>
Message-ID: <FbUjBd0L3b0V2IQ6Y87nMKTXCfpk6iVpnnZF9h9CMdY=.da48340f-da70-4c43-b3c8-e80ab5ddb6ec@github.com>

On Fri, 26 Sep 2025 09:48:57 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> This changeset improves the control-flow graph view of difference graphs by:
> 
> 1. ensuring that nodes are scheduled locally within each block, and
> 2. hiding internal, artificial blocks containing nodes that remain in the graph even if they are dead, such as the top constant node.
> 
> The following screenshot illustrates the effect of scheduling nodes locally:
> 
> <img width="3853" height="1033" alt="JDK-8368753" src="https://github.com/user-attachments/assets/bdc0f6de-3d28-4615-9e0d-221de2ad4770" />
> 
> For example, before this changeset (left) the `Return` node in B9 is scheduled at the beginning of the block. After the changeset (right), this node is scheduled last, as expected.
> 
> **Testing:** tier1 and manual testing on a few graphs.

Nice! Looks good to me, too.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/27520#pullrequestreview-3272344543

From bkilambi at openjdk.org  Fri Sep 26 14:38:31 2025
From: bkilambi at openjdk.org (Bhavana Kilambi)
Date: Fri, 26 Sep 2025 14:38:31 GMT
Subject: RFR: 8366444: Add support for add/mul reduction operations for
 Float16
In-Reply-To: <j-nP_OUlCQUBzGnT3b18X6mbm0K25fzuBpo1lsD4YzA=.188e31a9-da84-4215-9492-92aae01c6674@github.com>
References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com>
 <j-nP_OUlCQUBzGnT3b18X6mbm0K25fzuBpo1lsD4YzA=.188e31a9-da84-4215-9492-92aae01c6674@github.com>
Message-ID: <_OARcwRoClL9e8vvQSReGYNFsyQf6qq1Summ3nPuYYE=.6a354493-c27d-4002-a84e-3143be907b14@github.com>

On Fri, 26 Sep 2025 12:37:16 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species.
>> 
>> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets -
>> 
>> **For AddReduction :**
>> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction.
>> 
>> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order.
>> 
>> **For MulReduction :**
>> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported.
>> 
>> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` -
>> 
>> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch.
>> Ratio > 1 indicates the performance with this patch is better than the master branch.
>> 
>> **N1 (UseSVE = 0, max vector length = 16B):**
>> 
>> Benchmark         vectorDim  Mode   Cnt  8B     16B
>> ReductionAddFP16  256        thrpt  9    1.41   1.40
>> ReductionAddFP16  512        thrpt  9    1.41   1.41
>> ReductionAddFP16  1024       thrpt  9    1.43   1.40
>> ReductionAddFP16  2048       thrpt  9    1.43   1.40
>> ReductionMulFP16  256        thrpt  9    1.22   1.22
>> ReductionMulFP16  512        thrpt  9    1.21   1.23
>> ReductionMulFP16  1024       thrpt  9    1.21   1.22
>> ReductionMulFP16  2048       thrpt  9    1.20   1.22
>> 
>> 
>> On N1, the scalarized sequence of `fadd/fmul` are gener...
>
> I'm not an expert in that, so I have mostly superficial comments.
> 
> I'm also running some tests, will come back with results eventually.

Thanks for reviewing @marc-chevalier . I'll address your review comments soon.
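For context on the strict-ordering requirement in the quoted description: floating-point addition is not associative, so a lane-wise vector reduction can round differently from the scalar loop it replaces. The sketch below is only an illustration under stated assumptions. It uses C++ `float` (binary32) as a stand-in for Float16, since C++ has no portable half-precision type, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Strictly ordered reduction: ((((init + a0) + a1) + a2) + ...),
// i.e. the evaluation order of a plain scalar loop.
float ordered_sum(const std::vector<float>& a, float init = 0.0f) {
  float acc = init;
  for (float x : a) {
    acc += x;
  }
  return acc;
}

// Lane-wise (unordered) reduction, roughly what a vector unit would do:
// keep `lanes` partial sums, then combine them at the end.
// Assumes lanes > 0. Rounding happens in a different order than above.
float lanewise_sum(const std::vector<float>& a, std::size_t lanes) {
  std::vector<float> partial(lanes, 0.0f);
  for (std::size_t i = 0; i < a.size(); ++i) {
    partial[i % lanes] += a[i];
  }
  float acc = 0.0f;
  for (float p : partial) {
    acc += p;
  }
  return acc;
}
```

With `{1e8f, 1.0f, -1e8f, 1.0f}`, the ordered sum is 1 because adding 1 to 1e8 is lost to rounding, while a two-lane reduction gives 2. This is why autovectorized floating-point reductions have to preserve the scalar order.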

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3338993568

From mhaessig at openjdk.org  Fri Sep 26 15:02:00 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Fri, 26 Sep 2025 15:02:00 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v17]
In-Reply-To: <jFiBDO6uYX5EXKBqZsgDnHIkwYOblvtsogSzpCz4Xag=.9beaabfc-c4e9-4bbb-88ec-f7b25dac49d9@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <pyUhO3pJd4eUVlaJ4dSVjXaD2adG9iwxtuMcFxV4NXk=.1f5f5133-648f-460d-90e0-528b650d24e7@github.com>
 <jFiBDO6uYX5EXKBqZsgDnHIkwYOblvtsogSzpCz4Xag=.9beaabfc-c4e9-4bbb-88ec-f7b25dac49d9@github.com>
Message-ID: <a-GDSAI2UbuSAzb3_OhTqX4-VBMX3lYgmxQDAr2amIg=.6a061060-ba29-497d-821a-7ea86b41e3fa@github.com>

On Fri, 26 Sep 2025 09:11:22 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

> I kicked off testing on our side.

Testing passed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26919#issuecomment-3339071655

From mhaessig at openjdk.org  Fri Sep 26 15:04:00 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Fri, 26 Sep 2025 15:04:00 GMT
Subject: RFR: 8360558: Use hex literals instead of decimal literals in math
 intrinsic constants
In-Reply-To: <JqJDah4V1rq9XBUq5WxLd_fnn1CjYCFpBlh2NadSFgI=.969a9af0-0d3f-414b-a3db-8f17e5e6164d@github.com>
References: <JqJDah4V1rq9XBUq5WxLd_fnn1CjYCFpBlh2NadSFgI=.969a9af0-0d3f-414b-a3db-8f17e5e6164d@github.com>
Message-ID: <RMEdsaqTIEpqWEpQzWQIAdLHiDG_Ae5kYwN5YTI6Qy8=.6e9a5b8e-1a75-45e9-b36b-385e38e79cd4@github.com>

On Thu, 25 Sep 2025 19:26:34 GMT, Mohamed Issa <missa at openjdk.org> wrote:

> A simple change to use hex literals instead of decimal literals in the constant arrays of the x86 cbrt and tanh stubs. The JTREG tests listed below were used to verify correctness. The baseline build used is [OpenJDK v26-b17](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B17).
> 
> 1. `jtreg:test/jdk/java/lang/Math/CubeRootTests.java`
> 2. `jtreg:test/jdk/java/lang/Math/HyperbolicTests.java`

> I kicked off testing and will report back with results.

Testing passed.
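A small illustration of why hex literals are easier to audit in such constant tables: the hex form shows the IEEE-754 bit fields directly. The constant below is an example chosen for this sketch, not one taken from the cbrt or tanh stubs, and `from_words` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// The same 32-bit constant written both ways. In hex, 0x7FF00000 reads as
// sign 0, exponent all ones, upper mantissa bits 0: the high word of +infinity.
constexpr std::uint32_t kHighWordDecimal = 2146435072u;
constexpr std::uint32_t kHighWordHex     = 0x7FF00000u;
static_assert(kHighWordDecimal == kHighWordHex, "same bit pattern");

// Rebuild a double from a (high, low) pair of 32-bit words, the layout in
// which such constant tables are often written.
double from_words(std::uint32_t hi, std::uint32_t lo) {
  std::uint64_t bits = (static_cast<std::uint64_t>(hi) << 32) | lo;
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}
```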

-------------

Marked as reviewed by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27497#pullrequestreview-3272662897

From bulasevich at openjdk.org  Fri Sep 26 15:15:05 2025
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Fri, 26 Sep 2025 15:15:05 GMT
Subject: RFR: 8338197: [ubsan] ad_x86.hpp:6417:11: runtime error: shift
 exponent 100 is too large for 32-bit type 'unsigned int' [v5]
In-Reply-To: <BIlH0ZSo-ztrvtoihxHfQ8IYax2PoM0MBFCrw8wZ4_I=.e1d8e2a1-2623-4551-a65e-801f02cbc6aa@github.com>
References: <uKUByg7RkOyLsGYoajinrOf76Uu00PIJ-fBeWOKVNcI=.1d4fc3ed-2fd3-454c-9fa4-af97fc676b48@github.com>
 <BIlH0ZSo-ztrvtoihxHfQ8IYax2PoM0MBFCrw8wZ4_I=.e1d8e2a1-2623-4551-a65e-801f02cbc6aa@github.com>
Message-ID: <b0KrdtM_oA9bhmFKpPhLvh8_ReGNa0YN6X8J6ctnSYk=.b037693f-bbfd-408d-a592-36f322e49320@github.com>

On Wed, 17 Sep 2025 18:32:10 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:

>> This reworks the recent update https://github.com/openjdk/jdk/pull/24696 to fix a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was not optimal.
>> 
>> The issue reproduces with a HeapByteBufferTest jtreg test on a UBSan-enabled build. The actual trigger is the `-XX:+OptoScheduling` option used by the test (OptoScheduling is disabled by default on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple `java -version` run.
>> 
>> This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments.
>> 
>> The problem is that the shift count `n` may be too large here:
>> 
>> class Pipeline_Use_Cycle_Mask {
>> protected:
>>   uint _mask;
>>   ..
>>   Pipeline_Use_Cycle_Mask& operator<<=(int n) {
>>     _mask <<= n;
>>     return *this;
>>   }
>> };
>> 
>> The recent change attempted to cap the shift amount at one call site:
>> 
>> class Pipeline_Use_Element {
>> protected:
>>   ..
>>   // Mask of specific used cycles
>>   Pipeline_Use_Cycle_Mask _mask;
>>   ..
>>   void step(uint cycles) {
>>     _used = 0;
>>     uint max_shift = 8 * sizeof(_mask) - 1;
>>     _mask <<= (cycles < max_shift) ? cycles : max_shift;
>>   }
>> }
>> 
>> However, there is another site where `Pipeline_Use_Cycle_Mask::operator<<=` can be called with a too-large shift count:
>> 
>> // The following two routines assume that the root Pipeline_Use entity
>> // consists of exactly 1 element for each functional unit
>> // start is relative to the current cycle; used for latency-based info
>> uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const {
>>   for (uint i = 0; i < pred._count; i++) {
>>     const Pipeline_Use_Element *predUse = pred.element(i);
>>     if (predUse->_multiple) {
>>       uint min_delay = 7;
>>       // Multiple possible functional units, choose first unused one
>>       for (uint j = predUse->_lb; j <= predUse->_ub; j++) {
>>         const Pipeline_Use_Element *currUse = element(j);
>>         uint curr_delay = delay;
>>         if (predUse->_used & currUse->_used) {
>>           Pipeline_Use_Cycle_Mask x = predUse->_mask;
>>           Pipeline_Use_Cycle_Mask y = currUse->_mask;
>> 
>>           for ( y <<= curr_delay; x.overlaps(y); curr_delay++ )
>>             y <<= 1;
>>         }
>>         if (min_delay > curr_delay)
>>           min_delay = curr_delay;
>>       }
>>       if (delay < min_delay)
>>       delay = min_delay;
>>     }
>>     else {
>>       for (uint j = predUse->_lb; j <= pre...
>
> Boris Ulasevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:
> 
>  - reduce fixed_latency(100) to fixed_latency(30) for calls/traps on ARM, PPC, RISC-V, X86
>  - use uint32_t for _mask
>  - remove redundant code
>  - 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int'

Thanks for the feedback.

For reference, a minimal change that accounts for fixed_latency looks like this:

diff --git a/src/hotspot/share/adlc/adlparse.cpp b/src/hotspot/share/adlc/adlparse.cpp
index 356c24760e8..854314eddc3 100644
--- a/src/hotspot/share/adlc/adlparse.cpp
+++ b/src/hotspot/share/adlc/adlparse.cpp
@@ -1740,2 +1740,5 @@ void ADLParser::pipe_class_parse(PipelineForm &pipeline) {
       pipe_class->setFixedLatency(fixed_latency);
+      if (pipeline._maxcycleused < fixed_latency) {
+        pipeline._maxcycleused = fixed_latency;
+      }
       next_char(); skipws();


However, switching more platforms to the multi-word mask path doesn't seem reasonable to me. Aligning _maxcycleused with real latencies is fine in principle, but here fixed_latency=100 is just a sentinel.

I'm confident the cap is a safe, semantics-preserving change: it mirrors what hardware effectively does for oversized shifts. The reduced artificial fixed_latency of 30 is still effectively huge, and testing shows it doesn't break anything. By contrast, modifying _maxcycleused would touch long-standing ADLC behavior and carry a much larger risk, so I'd rather avoid it in this static-analysis bug fix.
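The capping approach discussed above can be sketched as a clamp inside the shift operator itself, so that no call site (such as `step()` or `full_latency()`) can pass an oversized count. This is a hypothetical standalone class (`CycleMask`), not the ADLC-generated `Pipeline_Use_Cycle_Mask`; the clamp mirrors the `step()` code quoted earlier.

```cpp
#include <cassert>
#include <cstdint>

// In C++, shifting a 32-bit value by 32 or more is undefined behavior,
// so the shift count is clamped to the operand width minus one.
class CycleMask {
 public:
  explicit CycleMask(std::uint32_t mask) : _mask(mask) {}

  CycleMask& operator<<=(unsigned n) {
    const unsigned max_shift = 8 * sizeof(_mask) - 1;  // 31 for uint32_t
    _mask <<= ((n < max_shift) ? n : max_shift);
    return *this;
  }

  bool overlaps(const CycleMask& other) const { return (_mask & other._mask) != 0; }
  std::uint32_t bits() const { return _mask; }

 private:
  std::uint32_t _mask;
};
```

Note that with this clamp a mask holding only bit 0 ends up in bit 31 after an oversized shift instead of being shifted out entirely; subsequent shifts by 1 (as in the `full_latency()` loop) still move it out normally.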

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26890#issuecomment-3339111350

From Fei.Gao2 at arm.com  Fri Sep 26 15:15:15 2025
From: Fei.Gao2 at arm.com (Fei Gao)
Date: Fri, 26 Sep 2025 15:15:15 +0000
Subject: Leverage profiled compiled size to avoid aggressive inlining and code
 growth
In-Reply-To: <PA4PR08MB60801239477769432BDCFEB8D41EA@PA4PR08MB6080.eurprd08.prod.outlook.com>
References: <PA4PR08MB60801239477769432BDCFEB8D41EA@PA4PR08MB6080.eurprd08.prod.outlook.com>
Message-ID: <PA4PR08MB6080DD1BC95A2DD29D900ECFD41EA@PA4PR08MB6080.eurprd08.prod.outlook.com>

Post to hotspot-compiler-dev at openjdk.org<mailto:hotspot-compiler-dev at openjdk.org> instead of hotspot-compiler-dev at openjdk.java.net<mailto:hotspot-compiler-dev at openjdk.java.net>.

Sorry for the repetition.

From: Fei Gao <Fei.Gao2 at arm.com>
Date: Friday, 26 September 2025 at 14:52
To: leyden-dev <leyden-dev at openjdk.org>, hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Subject: Leverage profiled compiled size to avoid aggressive inlining and code growth
Hi @leyden-dev<mailto:leyden-dev at openjdk.org> and @hotspot compiler<mailto:hotspot-compiler-dev at openjdk.java.net>,

TL;DR

https://github.com/openjdk/jdk/pull/27527

I proposed a PoC that explores leveraging profiled compiled sizes to improve C2 inlining decisions and mitigate code bloat. The approach records method sizes during a pre-run and feeds them back via compiler directives, helping to reduce aggressive inlining of large methods.

Testing on Renaissance and SPECjbb2015 showed clear code size differences but no significant performance impact on either AArch64 or x86. An alternative AOT-cache-based approach was also evaluated but did not produce meaningful code size changes.

Open questions remain about the long-term value of profiling given Project Leyden's direction of caching compiled code in AOT, and whether global profiling information could help C2 make better inlining decisions.

1. Motivation

In the current C2 behavior, the inliner only considers the estimated inlined size [1] [2] of a callee if the method has already been compiled by C2. In particular, C2 will explicitly reject inlining in the following cases:
    Hot methods with bytecode size > FreqInlineSize (325) [3]
    Cold methods with bytecode size > MaxInlineSize (35)

However, a common situation arises where a method's bytecode size is below 325, yet once compiled by C2 it produces a very large machine code body. If this method has not been compiled at the time its caller is being compiled, the inliner may aggressively inline it, potentially bloating the caller, even though an independent compiled copy might eventually exist.

To mitigate such cases, we can make previously profiled compiled sizes available early, allowing the inliner to make more informed decisions and reduce excessive code growth.

[1] https://github.com/openjdk/jdk/blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/bytecodeInfo.cpp#L180
[2] https://github.com/openjdk/jdk/blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/bytecodeInfo.cpp#L274
[3] https://github.com/openjdk/jdk/blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/bytecodeInfo.cpp#L184

2. Proof of Concept

To validate this idea, I created a proof-of-concept: https://github.com/openjdk/jdk/pull/27527

In this PoC:

1) A dumping interface was added to record C2-compiled method sizes, enabled via the `-XX:+PrintOptoMethodSize` flag.

2) A new attribute was introduced in InlineMatcher: _inline_instructions_size. This attribute stores the estimated inlined size of a method, sourced from a compiler directive JSON file generated during a prior profiling run.

3) The inliner was updated to use these previously profiled method sizes to prevent aggressive inlining of large methods.
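To make the size heuristics in the motivation concrete, here is a heavily simplified, hypothetical model of the checks. Names such as `Callee` and `size_allows_inlining` are invented for illustration, and the real logic in `bytecodeInfo.cpp` has many more cases. It assumes the profiled size would be compared against an `InlineSmallCode`-style limit, which the PoC description does not state explicitly.

```cpp
#include <cassert>
#include <optional>

// Hypothetical, simplified view of a callee for the size checks only.
struct Callee {
  int bytecode_size;
  bool is_hot;
  std::optional<int> compiled_size;  // known only if C2 already compiled it
  std::optional<int> profiled_size;  // hypothetical: size recorded in a pre-run
};

struct InlineLimits {
  int max_inline_size = 35;      // MaxInlineSize (cold call sites)
  int freq_inline_size = 325;    // FreqInlineSize (hot call sites)
  int inline_small_code = 1000;  // InlineSmallCode-like limit (assumed value, platform dependent)
};

bool size_allows_inlining(const Callee& c, const InlineLimits& lim = InlineLimits{}) {
  const int bytecode_limit = c.is_hot ? lim.freq_inline_size : lim.max_inline_size;
  if (c.bytecode_size > bytecode_limit) {
    return false;
  }
  // Prefer the size of an existing C2 compilation; otherwise fall back to a
  // previously profiled size (the PoC's idea). With neither, no size check applies.
  const std::optional<int> size = c.compiled_size ? c.compiled_size : c.profiled_size;
  return !(size && *size > lim.inline_small_code);
}
```

The last check is where a profiled size would matter: a callee not yet compiled by C2 otherwise passes it unconditionally.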

3. How to Use

To apply this approach to any workload, the workload must be run twice:
1) Pre-run: collect inlined sizes for all C2-compiled methods.
2) Product run: use the profiled method sizes to improve C2 inlining.

Step 1 Profile method size (pre-run)

Log the compiled method size:
`-XX:+UnlockDiagnosticVMOptions -XX:+LogVMOutput -XX:+PrintOptoMethodSize -XX:LogFile=logmethodsize.out` This will generate a log containing method size information from C2.

Step 2 Generate the compiler directive file

Use the provided Python script to extract method size info and generate a JSON file:
`python3 extract_size_to_directives.py logmethodsize.out output_directives.json`

This file contains estimated inlined sizes to guide inlining decisions in the product run. If the same method is compiled multiple times, the script conservatively retains the smallest observed size.
Note: Methods that are not accepted by the CompilerDirective format need to be excluded.
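The contents of `extract_size_to_directives.py` are not shown here, so the following is only a hypothetical sketch of the one rule the description states: when a method appears several times in the pre-run log, keep the smallest observed size. Log parsing and JSON output are omitted, and `SizeRecord` and `smallest_sizes` are invented names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record parsed from a pre-run log: (method name, compiled size).
using SizeRecord = std::pair<std::string, int>;

// Keep the smallest observed size per method.
std::map<std::string, int> smallest_sizes(const std::vector<SizeRecord>& records) {
  std::map<std::string, int> result;
  for (const auto& [method, size] : records) {
    auto it = result.find(method);
    if (it == result.end()) {
      result.emplace(method, size);
    } else if (size < it->second) {
      it->second = size;
    }
  }
  return result;
}
```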

Step 3 Use the compiler directive in a product run

Pass the generated JSON to the JVM as a directive:
`-XX:+UnlockDiagnosticVMOptions -XX:CompilerDirectivesFile=output_directives.json`
This enables the inliner to make decisions using previously profiled method sizes, potentially avoiding aggressive inlining of large methods.
Note: The patch reuses the existing `inline` directive attribute for inlining control. If multiple inline rules match the same method, only the first match is effective.

4. Testing

I tested the following workloads using the method above and measured the code cache size with `-XX:+PrintCodeCache`. The results are shown below, compared against the mainline code. All statistics (min, max, median, mean) are based on three runs.

(patch - mainline) / mainline
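One reading consistent with the tables that follow is that each statistic (min, max, median, mean over the three runs) is computed separately for the patch and mainline runs and only then turned into a relative change. This is an assumption, but it would explain rows where the "min" entry exceeds the "max" entry (e.g. SPECjbb non-profiled on AArch64). A sketch of that computation, with hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct Stats { double min, max, median, mean; };

// Summary statistics over a non-empty set of runs.
Stats summarize(std::vector<double> runs) {
  std::sort(runs.begin(), runs.end());
  const std::size_t n = runs.size();
  const double median = (n % 2 == 1) ? runs[n / 2] : (runs[n / 2 - 1] + runs[n / 2]) / 2.0;
  const double mean = std::accumulate(runs.begin(), runs.end(), 0.0) / n;
  return {runs.front(), runs.back(), median, mean};
}

// (patch - mainline) / mainline, in percent.
double rel_change_pct(double patch, double mainline) {
  return (patch - mainline) / mainline * 100.0;
}

// Assumed reading: summarize each side first, then compare statistic by statistic.
Stats relative_change(const std::vector<double>& patch, const std::vector<double>& mainline) {
  const Stats p = summarize(patch);
  const Stats m = summarize(mainline);
  return {rel_change_pct(p.min, m.min), rel_change_pct(p.max, m.max),
          rel_change_pct(p.median, m.median), rel_change_pct(p.mean, m.mean)};
}
```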

1) Renaissance.dotty

Code size change:

AArch64:

```
used           min      max      median   mean
non-profiled   -9.88%   -8.13%   -8.92%   -8.98%
profiled       -0.73%   -0.21%   -0.40%   -0.45%
non-nmethods   -15.20%  -0.02%   -14.92%  -10.32%
codecache      -2.82%   -2.88%   -2.97%   -2.89%
max_used       min      max      median   mean
non-profiled   -9.88%   -8.13%   -8.92%   -8.98%
profiled       2.37%    1.41%    1.50%    1.76%
non-nmethods   -0.95%   -1.73%   -0.93%   -1.21%
codecache      -0.35%   -1.00%   -0.95%   -0.77%
```

X86:

```
used            min      max      median   mean
non-profiled    -9.72%   -9.61%   -9.36%   -9.56%
profiled        -0.81%   -0.90%   -1.15%   -0.95%
non-nmethods    -0.04%   0.04%    -0.02%   -0.01%
codecache       -2.94%   -2.96%   -3.11%   -3.00%
max_used        min      max      median   mean
non-profiled    -9.72%   -9.61%   -9.36%   -9.56%
profiled        2.32%    2.60%    2.51%    2.48%
non-nmethods    -0.63%   -2.25%   -1.28%   -1.39%
codecache       -0.68%   -0.59%   -0.70%   -0.66%
```

No significant performance changes were observed on either platform.

2) SPECjbb 2015

Code size change:

AArch64:

```
used           min      max       median    mean
non-profiled   -1.00%   -11.68%   -12.73%   -8.62%
profiled       9.07%    -6.93%    -2.34%    -0.29%
non-nmethods   0.02%    -0.02%    0.00%     0.00%
codecache      2.98%    -7.18%    -5.35%    -3.28%
max_used       min      max       median    mean
non-profiled   -10.85%  -11.68%   -12.73%   -11.76%
profiled       -2.09%   -11.65%   -1.26%    -5.62%
non-nmethods   0.13%    -1.21%    -0.16%    -0.41%
codecache      -6.42%   -6.33%    -6.10%   -6.29%
```

On the AArch64 platform, no significant performance changes were observed for either high-bound IR or max jOPS.

For critical jOPS:
```
Min      Median   Mean     Max     Var%
-2.45%   -1.87%   -2.45%   -3.00%  1.9%
```

X86:

```
used           min      max      median   mean
non-profiled   -9.02%   -9.65%   -7.93%   -8.87%
profiled       -6.09%   -3.18%   -4.52%   -4.61%
non-nmethods   -0.02%   0.25%    0.04%    0.09%
codecache      -5.36%   -4.75%   -4.58%   -4.90%
max_used       min      max      median   mean
non-profiled   -4.03%   -9.65%   -7.93%   -7.23%
profiled       -2.86%   1.16%    -1.03%   -0.93%
non-nmethods   0.02%    -0.08%   0.08%    0.01%
codecache      -0.23%   -4.20%   -3.70%   -2.73%
```

No significant performance change was observed on the x86 platform.

5. AOT cache

The current procedure above requires three steps:
a pre-run to record method sizes,
a separate step to process the JSON file,
and finally a product run using the profiled method sizes.

This workflow may add extra burden to workload deployment.

With JEP 515 [4], we can instead store the estimated inlined size in the AOT cache when ciMethod::inline_instructions_size() is called during the premain run, and later load this size from the AOT cache during the product run [5].

The store-load mechanism for inlined size can help reduce the overhead of recomputing actual sizes, but it does not provide the inliner with much additional information about the callee, since the compilation order in the product run generally follows that of the premain run, even if not exactly.

To give the inliner more profiled information about callees, I tried another simple draft that records inlined sizes for more C2-compiled methods:
https://github.com/openjdk/jdk/pull/27519/commits/ef5e61f3d68ad565ee11e2cc6aa57b6e2697ae6d

However, with this draft using the AOT cache, I did not observe any significant code size changes for any workloads. This may require further investigation.

[4] https://openjdk.org/jeps/515
[5] https://github.com/openjdk/jdk/blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/ci/ciMethod.cpp#L1152

6. Questions

1) Relation to Project Leyden

Project Leyden aims to enhance the AOT cache to store compiled code from training runs [6]. This suggests that we may eventually prefer to cache compiled code directly from the AOT cache rather than rely solely on JIT compilation.

Given this direction, is it still worthwhile to invest further in using profiled method sizes as a means to improve inlining heuristics?

Could such profiling provide complementary benefits even if compiled code is cached?

2) Global profiling information for C2

Should we consider leveraging profiled information stored in the AOT cache to give the C2 inliner a broader, more global view of methods, enabling better inlining decisions?

For example, could global visibility into method sizes and call sites help address pathological cases of code bloat or missed optimization opportunities? [7]

[6] https://openjdk.org/jeps/8335368
[7] https://wiki.openjdk.org/display/hotspot/inlining

I'd greatly appreciate any feedback. Thank you for your time and consideration.

Thanks,
Fei

From galder at openjdk.org  Fri Sep 26 15:16:02 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Fri, 26 Sep 2025 15:16:02 GMT
Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when
 MaxVectorSize=8
In-Reply-To: <RF-pB-qfXweWi6FuSCZZWnxElKTYN1m6JjP74TDVWqo=.81c23890-ed5e-454e-baab-b6119a3941e8@github.com>
References: <RF-pB-qfXweWi6FuSCZZWnxElKTYN1m6JjP74TDVWqo=.81c23890-ed5e-454e-baab-b6119a3941e8@github.com>
Message-ID: <gwuQ4sAnJ6RQZKJubVoq6D6VNbzYIrwsVpZA2gtsstc=.f1e3e85d-162b-4380-9884-02dd96ee0ef8@github.com>

On Mon, 22 Sep 2025 07:39:24 GMT, erifan <duke at openjdk.org> wrote:

> The VectorShape size of `I_SPECIES_FOR_CAST` declared in test **VectorMaskCompareNotTest.java** is half that of `L_SPECIES_FOR_CAST`. And `L_SPECIES_FOR_CAST` is created with the maximum shape. Therefore, if `MaxVectorSize` is set to 8, the shape size of `I_SPECIES_FOR_CAST` is 4, which is an illegal value because the minimum vector size requirement is 8 bytes.
> 
> This pull request addresses the issue by ensuring that this test runs only when `MaxVectorSize` is set to 16 bytes or higher.

Marked as reviewed by galder (Author).

-------------

PR Review: https://git.openjdk.org/jdk/pull/27418#pullrequestreview-3272701788

From galder at openjdk.org  Fri Sep 26 15:16:04 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Fri, 26 Sep 2025 15:16:04 GMT
Subject: RFR: 8368205: [TESTBUG] VectorMaskCompareNotTest.java crashes when
 MaxVectorSize=8
In-Reply-To: <t_5c0G7oxWo_STsHm7cD69tDbOnO1b89bTfqT97-rHs=.f0074364-8ba8-4b75-8c4f-4a9f8edb7398@github.com>
References: <RF-pB-qfXweWi6FuSCZZWnxElKTYN1m6JjP74TDVWqo=.81c23890-ed5e-454e-baab-b6119a3941e8@github.com>
 <fVnd20lp7ktkJCn1jfkdPJ_btnf0FTqs5TBJVCToFQI=.2a23d3cc-f75c-464a-9f97-2032579629fb@github.com>
 <t_5c0G7oxWo_STsHm7cD69tDbOnO1b89bTfqT97-rHs=.f0074364-8ba8-4b75-8c4f-4a9f8edb7398@github.com>
Message-ID: <HPfZfQmZWveO5dkWLT6p3vJ2diLPPJqcCZM-dQCGc30=.e47051b3-131e-4dd2-9a2b-4a8512db4341@github.com>

On Thu, 25 Sep 2025 06:44:24 GMT, erifan <duke at openjdk.org> wrote:

> Yeah, I agree with you that it would be great if we could precisely control that the relevant tests don't run when MaxVectorSize==8, but the current test framework doesn't handle this well. One approach is to introduce a runtime check, but I feel this approach is inelegant and lacks precedent.

Hmmm, I had thought that maybe you could have an individual test run only if MaxVectorSize >= 16, but actually all you can do is restrict IR checks to, say, MaxVectorSize >= 16. So the approach you've chosen seems to be the best option I can see.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27418#issuecomment-3339107246

From dfenacci at openjdk.org  Fri Sep 26 15:35:17 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Fri, 26 Sep 2025 15:35:17 GMT
Subject: RFR: 8368675: IGV: nodes are wrongly marked as changed in the
 difference view
In-Reply-To: <16kJmV4PRouP_riAC1VJkGcLNYkUKeD8PxnXZK7U-qs=.33d0f082-84eb-43c1-b8be-8112b41a29f8@github.com>
References: <16kJmV4PRouP_riAC1VJkGcLNYkUKeD8PxnXZK7U-qs=.33d0f082-84eb-43c1-b8be-8112b41a29f8@github.com>
Message-ID: <TqZe1NLFyePdxUhhxEtPlYX9DocqxmQXHorZfybyzpk=.cd713cd6-4ced-4f0c-8db8-8f413941fa17@github.com>

On Fri, 26 Sep 2025 08:34:34 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> This changeset refines IGV's node difference analysis to ignore changes in node properties that are derived by IGV, as opposed to generated by HotSpot. Derived properties include the label and color of each node. Ignoring changes in these properties prevents IGV from wrongly marking equal nodes as "changed" (colored in yellow) when showing the difference between two graphs:
> 
> <img width="2323" height="651" alt="before-after" src="https://github.com/user-attachments/assets/ea51c86d-4719-45c0-b615-e6e4b8aec023" />
> 
> **Testing:** tier1 and manual testing on a few graphs.

Thanks @robcasloz! I've been wondering why there sometimes seemed to be a lot of yellow nodes. Looks good to me too.
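The idea behind the change, as described in the quoted PR text, is to exclude IGV-derived properties (such as label and color) when deciding whether a node changed between two graphs. IGV itself is written in Java; the following is only a minimal C++ model of that comparison, with `Properties` and `properties_changed` as hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using Properties = std::map<std::string, std::string>;

// Returns true if two nodes differ in any property other than those derived
// by the viewer itself (passed in `derived`).
bool properties_changed(const Properties& a, const Properties& b,
                        const std::set<std::string>& derived) {
  auto is_derived = [&](const std::string& key) { return derived.count(key) != 0; };
  for (const auto& [key, value] : a) {
    if (is_derived(key)) continue;
    auto it = b.find(key);
    if (it == b.end() || it->second != value) return true;
  }
  for (const auto& entry : b) {
    // A non-derived property present only in `b` also counts as a change.
    if (!is_derived(entry.first) && a.find(entry.first) == a.end()) return true;
  }
  return false;
}
```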

src/utils/IdealGraphVisualizer/Data/src/main/java/com/sun/hotspot/igv/data/serialization/Parser.java line 84:

> 82:     public static final String TO_INDEX_PROPERTY = "toIndex";
> 83:     public static final String TO_INDEX_ALT_PROPERTY = "index";
> 84:     public static final String EDGE_LABEL_PROPERTY = "label";

changed to improve its "expressiveness"?

-------------

Marked as reviewed by dfenacci (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27515#pullrequestreview-3272764606
PR Review Comment: https://git.openjdk.org/jdk/pull/27515#discussion_r2382759560

From vladimir.kozlov at oracle.com  Fri Sep 26 15:40:16 2025
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 26 Sep 2025 08:40:16 -0700
Subject: Leverage profiled compiled size to avoid aggressive inlining and
 code growth
In-Reply-To: <PA4PR08MB6080DD1BC95A2DD29D900ECFD41EA@PA4PR08MB6080.eurprd08.prod.outlook.com>
References: <PA4PR08MB60801239477769432BDCFEB8D41EA@PA4PR08MB6080.eurprd08.prod.outlook.com>
 <PA4PR08MB6080DD1BC95A2DD29D900ECFD41EA@PA4PR08MB6080.eurprd08.prod.outlook.com>
Message-ID: <8e23a680-cc52-4b43-8620-f74b982f58a1@oracle.com>

Hi Fei,

I think you stumbled on `InlineSmallCode` (1000 or 1500) issues. The flag
is used exactly to filter out inlining of previously big compiled code.
But it does not always help - for example, if the method is inlined, some
paths could be removed due to constant (exact klass) propagation, or EA
can eliminate some allocations. We know about such limitations and have
numerous RFEs to improve it.

On the other hand, the FreqInlineSize and MaxInlineSize flags are based on
the bytecode size of a method. These are more stable.

Note that AOT profile caching also preserves inlining decisions for C2,
which are used during JIT compilation in the production run to reproduce
the compilation decisions made in the training run.

We don't advise using JSON. Please store the information in the AOT cache instead.

Regards,
Vladimir K

On 9/26/25 8:15 AM, Fei Gao wrote:
> Post to hotspot-compiler-dev at openjdk.org <mailto:hotspot-compiler-dev at openjdk.org>
> instead of hotspot-compiler-dev at openjdk.java.net
> <mailto:hotspot-compiler-dev at openjdk.java.net>.
> 
> Sorry for the repetition.
> 
> *From: *Fei Gao <Fei.Gao2 at arm.com>
> *Date: *Friday, 26 September 2025 at 14:52
> *To: *leyden-dev <leyden-dev at openjdk.org>, hotspot compiler
> <hotspot-compiler-dev at openjdk.java.net>
> *Subject: *Leverage profiled compiled size to avoid aggressive inlining 
> and code growth
> 
> Hi @leyden-dev <mailto:leyden-dev at openjdk.org> and @hotspot compiler
> <mailto:hotspot-compiler-dev at openjdk.java.net>,
> 
> *TL;DR*
> 
> *https://github.com/openjdk/jdk/pull/27527* <https://github.com/openjdk/jdk/pull/27527>
> 
> I proposed a PoC that explores leveraging profiled compiled sizes to 
> improve C2 inlining decisions and mitigate code bloat. The approach 
> records method sizes during a pre-run and feeds them back via compiler 
> directives, helping to reduce aggressive inlining of large methods.
> 
> Testing on Renaissance and SPECjbb2015 showed clear code size 
> differences but no significant performance impact on either AArch64 or 
> x86. An alternative AOT-cache-based approach was also evaluated but did 
> not produce meaningful code size changes.
> 
> Open questions remain about the long-term value of profiling given 
> Project Leyden's direction of caching compiled code in AOT, and whether 
> global profiling information could help C2 make better inlining decisions.
> 
> *1. Motivation*
> 
> In the current C2 behavior, the inliner only considers the estimated 
> inlined size [1] [2] of a callee if the method has already been compiled 
> by C2. In particular, C2 will explicitly reject inlining in the 
> following cases:
> 
>     Hot methods with bytecode size > FreqInlineSize (325) [3]
>
>     Cold methods with bytecode size > MaxInlineSize (35)
> 
> However, a common situation arises where a method's bytecode size is 
> below 325, yet once compiled by C2 it produces a very large machine code 
> body. If this method has not been compiled at the time its caller is 
> being compiled, the inliner may aggressively inline it, potentially 
> bloating the caller, even though an independent compiled copy might 
> eventually exist.
> 
> To mitigate such cases, we can make previously profiled compiled sizes 
> available early, allowing the inliner to make more informed decisions 
> and reduce excessive code growth.
> 
> [1] https://github.com/openjdk/jdk/blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/bytecodeInfo.cpp#L180
>
> [2] https://github.com/openjdk/jdk/blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/bytecodeInfo.cpp#L274
>
> [3] https://github.com/openjdk/jdk/blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/opto/bytecodeInfo.cpp#L184
> 
> *2. Proof of Concept*
> 
> To validate this idea, I created a proof-of-concept:
> *https://github.com/openjdk/jdk/pull/27527* <https://github.com/openjdk/jdk/pull/27527>
> 
> In this PoC:
> 
> 1) A dumping interface was added to record C2-compiled method sizes, 
> enabled via the `-XX:+PrintOptoMethodSize` flag.
> 
> 2) A new attribute was introduced in InlineMatcher: 
> _inline_instructions_size. This attribute stores the estimated inlined 
> size of a method, sourced from a compiler directive JSON file generated 
> during a prior profiling run.
> 
> 3) The inliner was updated to use these previously profiled method sizes 
> to prevent aggressive inlining of large methods.
> 
> *3. How to Use*
> 
> To apply this approach to any workload, the workload must be run twice:
> 
> 1) Pre-run: collect inlined sizes for all C2-compiled methods.
> 
> 2) Product run: use the profiled method sizes to improve C2 inlining.
> 
> Step 1 Profile method size (pre-run)
> 
> Log the compiled method size:
> 
> `-XX:+UnlockDiagnosticVMOptions -XX:+LogVMOutput -XX:+PrintOptoMethodSize -XX:LogFile=logmethodsize.out`
> This will generate a log containing method size information from C2.
> 
> Step 2 Generate the compiler directive file
> 
> Use the provided Python script to extract method size info and generate 
> a JSON file:
> 
> `python3 extract_size_to_directives.py logmethodsize.out output_directives.json`
> 
> This file contains estimated inlined sizes to guide inlining decisions 
> in product run. If the same method is compiled multiple times, the 
> script conservatively retains the smallest observed size.
> 
> Note: Methods that are not accepted by the CompilerDirective format need 
> to be excluded.
> 
> Step 3 Use the compiler directive in a product run
> 
> Pass the generated JSON to the JVM as a directive:
> 
> `-XX:+UnlockDiagnosticVMOptions -XX:CompilerDirectivesFile=output_directives.json`
> 
> This enables the inliner to make decisions using previously profiled 
> method sizes, potentially avoiding aggressive inlining of large methods.
> 
> Note: The patch reuses the existing `inline` directive attribute for 
> inlining control. If multiple inline rules match the same method, only 
> the first match is effective.
> 
> *4. Testing*
> 
> I tested the following workloads using the method above and measured the 
> code cache size with `-XX:+PrintCodeCache`. The results are shown below, 
> compared against the mainline code. All statistics (min, max, median, 
> mean) are based on three runs.
> 
> (patch - mainline) / mainline
> 
> 1) Renaissance.dotty
> 
> Code size change:
> 
> AArch64:
> 
> ```
> used           min      max      median   mean
> non-profiled   -9.88%   -8.13%   -8.92%   -8.98%
> profiled       -0.73%   -0.21%   -0.40%   -0.45%
> non-nmethods   -15.20%  -0.02%   -14.92%  -10.32%
> codecache      -2.82%   -2.88%   -2.97%   -2.89%
> max_used       min      max      median   mean
> non-profiled   -9.88%   -8.13%   -8.92%   -8.98%
> profiled       2.37%    1.41%    1.50%    1.76%
> non-nmethods   -0.95%   -1.73%   -0.93%   -1.21%
> codecache      -0.35%   -1.00%   -0.95%   -0.77%
> ```
> 
> X86:
> 
> ```
> used           min      max      median   mean
> non-profiled   -9.72%   -9.61%   -9.36%   -9.56%
> profiled       -0.81%   -0.90%   -1.15%   -0.95%
> non-nmethods   -0.04%   0.04%    -0.02%   -0.01%
> codecache      -2.94%   -2.96%   -3.11%   -3.00%
> 
> max_used       min      max      median   mean
> non-profiled   -9.72%   -9.61%   -9.36%   -9.56%
> profiled       2.32%    2.60%    2.51%    2.48%
> non-nmethods   -0.63%   -2.25%   -1.28%   -1.39%
> codecache      -0.68%   -0.59%   -0.70%   -0.66%
> ```
> 
> No significant performance changes were observed on either platform.
> 
> 2) SPECjbb 2015
> 
> Code size change:
> 
> AArch64:
> 
> ```
> used           min       max       median    mean
> non-profiled   -1.00%    -11.68%   -12.73%   -8.62%
> profiled       9.07%     -6.93%    -2.34%    -0.29%
> non-nmethods   0.02%     -0.02%    0.00%     0.00%
> codecache      2.98%     -7.18%    -5.35%    -3.28%
> 
> max_used       min       max       median    mean
> non-profiled   -10.85%   -11.68%   -12.73%   -11.76%
> profiled       -2.09%    -11.65%   -1.26%    -5.62%
> non-nmethods   0.13%     -1.21%    -0.16%    -0.41%
> codecache      -6.42%    -6.33%    -6.10%    -6.29%
> ```
> 
> On the AArch64 platform, no significant performance changes were 
> observed for either high-bound IR or max jOPS.
> 
> For critical jOPS:
> 
> ```
> Min      Median   Mean     Max      Var%
> -2.45%   -1.87%   -2.45%   -3.00%   1.9%
> ```
> 
> X86:
> 
> ```
> used           min      max      median   mean
> non-profiled   -9.02%   -9.65%   -7.93%   -8.87%
> profiled       -6.09%   -3.18%   -4.52%   -4.61%
> non-nmethods   -0.02%   0.25%    0.04%    0.09%
> codecache      -5.36%   -4.75%   -4.58%   -4.90%
> 
> max_used       min      max      median   mean
> non-profiled   -4.03%   -9.65%   -7.93%   -7.23%
> profiled       -2.86%   1.16%    -1.03%   -0.93%
> non-nmethods   0.02%    -0.08%   0.08%    0.01%
> codecache      -0.23%   -4.20%   -3.70%   -2.73%
> ```
> 
> No significant performance change was observed on the x86 platform.
> 
> *5. AOT cache*
> 
> The current procedure above requires three steps:
> 
> 1) a pre-run to record method sizes,
> 2) a separate step to generate the JSON file, and
> 3) a product run using the profiled method sizes.
> 
> This workflow may add extra burden to workload deployment.
> 
> With JEP 515 [4], we can instead store the estimated inlined size in the 
> AOT cache when ciMethod::inline_instructions_size() is called during the 
> premain run, and later load this size from the AOT cache during the 
> product run [5].
> 
> The store-load mechanism for inlined size can help reduce the overhead 
> of recomputing actual sizes, but it does not provide the inliner with 
> much additional information about the callee, since the compilation 
> order in the product run generally follows that of the premain run, even 
> if not exactly.
> 
> To give the inliner more profiled information about callees, I tried 
> another simple draft that records inlined sizes for more C2-compiled 
> methods:
> 
> https://github.com/openjdk/jdk/pull/27519/commits/ef5e61f3d68ad565ee11e2cc6aa57b6e2697ae6d
> 
> However, with this draft using the AOT cache, I did not observe any 
> significant code size changes for any workloads. This may require 
> further investigation.
> 
> [4] https://openjdk.org/jeps/515
> 
> [5] https://github.com/openjdk/jdk/blob/0ba4141cb12414c08be88b37ea2a163aacbfa7de/src/hotspot/share/ci/ciMethod.cpp#L1152
> 
> *6. Questions*
> 
> 1) Relation to Project Leyden
> 
> Project Leyden aims to enhance the AOT cache to store compiled code from 
> training runs [6]. This suggests that we may eventually prefer to load 
> compiled code directly from the AOT cache rather than relying solely on 
> JIT compilation.
> 
> Given this direction, is it still worthwhile to invest further in using 
> profiled method sizes as a means to improve inlining heuristics?
> 
> Could such profiling provide complementary benefits even if compiled 
> code is cached?
> 
> 2) Global profiling information for C2
> 
> Should we consider leveraging profiled information stored in the AOT 
> cache to give the C2 inliner a broader, more global view of methods, 
> enabling better inlining decisions?
> 
> For example, could global visibility into method sizes and call sites 
> help address pathological cases of code bloat or missed optimization 
> opportunities? [7]
> 
> [6] https://openjdk.org/jeps/8335368
> 
> [7] https://wiki.openjdk.org/display/hotspot/inlining
> 
> I'd greatly appreciate any feedback. Thank you for your time and 
> consideration.
> 
> Thanks,
> 
> Fei
> 


From dfenacci at openjdk.org  Fri Sep 26 15:42:59 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Fri, 26 Sep 2025 15:42:59 GMT
Subject: RFR: 8368753: IGV: improve CFG view of difference graphs
In-Reply-To: <_gL57sBFOa4QlLN9_di6aDM-CwiFPjhR2jgVnUZsVDQ=.7b916923-ad53-49b2-99dc-cb01254e69f7@github.com>
References: <_gL57sBFOa4QlLN9_di6aDM-CwiFPjhR2jgVnUZsVDQ=.7b916923-ad53-49b2-99dc-cb01254e69f7@github.com>
Message-ID: <u-OmHbkin-AV3l3isNbPwBIIoNCbgHezCnn5RoVaQI4=.c5f46abc-a0e7-44dc-aa5c-6e4fa432f575@github.com>

On Fri, 26 Sep 2025 09:48:57 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> This changeset improves the control-flow graph view of difference graphs by:
> 
> 1. ensuring that nodes are scheduled locally within each block, and
> 2. hiding internal, artificial blocks containing nodes that remain in the graph even if they are dead, such as the top constant node.
> 
> The following screenshot illustrates the effect of scheduling nodes locally:
> 
> <img width="3853" height="1033" alt="JDK-8368753" src="https://github.com/user-attachments/assets/bdc0f6de-3d28-4615-9e0d-221de2ad4770" />
> 
> For example, before this changeset (left) the `Return` node in B9 is scheduled at the beginning of the block. After the changeset (right), this node is scheduled last, as expected.
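As a rough illustration of what "scheduling nodes locally within a block" means, the C++ sketch below orders a block's nodes so that each node comes after the in-block nodes it uses (Kahn's topological sort). It is not IGV's `ServerCompilerScheduler` logic; the node numbering and the `inputs` representation are hypothetical. A `Return` that depends on the block's other nodes necessarily ends up last, as in the screenshot.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// inputs[i] lists the nodes of the same block that node i uses.
// Returns an order in which every node follows its in-block inputs.
std::vector<int> schedule_block(const std::vector<std::vector<int>>& inputs) {
  const int n = static_cast<int>(inputs.size());
  std::vector<int> pending(n, 0);          // unscheduled in-block inputs per node
  std::vector<std::vector<int>> users(n);  // reverse edges: input -> users
  for (int i = 0; i < n; i++) {
    for (int in : inputs[i]) {
      pending[i]++;
      users[in].push_back(i);
    }
  }
  std::queue<int> ready;
  for (int i = 0; i < n; i++) {
    if (pending[i] == 0) ready.push(i);
  }
  std::vector<int> order;
  while (!ready.empty()) {
    const int node = ready.front();
    ready.pop();
    order.push_back(node);
    for (int u : users[node]) {
      if (--pending[u] == 0) ready.push(u);
    }
  }
  // If the in-block inputs form a cycle, some nodes are never ready and
  // the returned order is shorter than n.
  return order;
}
```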
> 
> **Testing:** tier1 and manual testing on a few graphs.

Thanks for this fix too @robcasloz. Looks good to me.

src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java line 2:

> 1: /*
> 2:  * Copyright (c) 1998, 2024, Oracle and/or its affiliates. All rights reserved.

Since you've changed the copyright year in the other files...

-------------

Marked as reviewed by dfenacci (Committer).

PR Review: https://git.openjdk.org/jdk/pull/27520#pullrequestreview-3272794922
PR Review Comment: https://git.openjdk.org/jdk/pull/27520#discussion_r2382777696

From mdoerr at openjdk.org  Fri Sep 26 16:19:51 2025
From: mdoerr at openjdk.org (Martin Doerr)
Date: Fri, 26 Sep 2025 16:19:51 GMT
Subject: RFR: 8368787: Error reporting: hs_err files should show instructions
 when referencing code in nmethods
Message-ID: <H7tb5BgV6pWDJ2g7Zrk9p_kLlAsD-LP2JRwHFK0wZZk=.35899ed5-f4e9-4984-af75-9c6949ffa7ed@github.com>

We'd like to have a little more information in hs_err files in the following scenario: The VM crashes in code which does something with an nmethod. We often have a register pointing into code of the nmethod, but the nmethod is not disassembled in the hs_err file because the crash happens outside of it.

We can disassemble some instructions around the address inside the nmethod code. This is tricky on platforms which have variable length instructions (like x86). We need to find correct instruction start addresses. I'm proposing to use relocations for this purpose. There are usually enough of them distributed over the nmethod and they point to instruction start addresses.
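The relocation idea can be sketched in C++ as a binary search over the sorted addresses that relocations point to. This is a hypothetical helper, not the code in the PR, and it assumes that every relocation address and the start of the main code are instruction starts. If a relocation points at `0x...179` (the `movabs` in the output below), a lookup for `RBX=0x...180` would return `0x...179`, which is consistent with where the printed disassembly begins.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Return the closest known instruction start at or before addr.
// reloc_addrs must be sorted ascending; code_begin is assumed to be the
// first instruction of the main code.
uintptr_t instruction_start_before(const std::vector<uintptr_t>& reloc_addrs,
                                   uintptr_t code_begin, uintptr_t addr) {
  // First relocation address strictly greater than addr.
  auto it = std::upper_bound(reloc_addrs.begin(), reloc_addrs.end(), addr);
  if (it == reloc_addrs.begin()) {
    return code_begin;  // no relocation at or before addr
  }
  return *(it - 1);
}
```

Disassembly can then start at the returned address and continue past `addr`, so on variable-length ISAs decoding never begins in the middle of an instruction.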

I've tested this proposal by the following code on x86_64:

diff --git a/src/hotspot/cpu/x86/interp_masm_x86.cpp b/src/hotspot/cpu/x86/interp_masm_x86.cpp
index a6b4efbe4f2..d715e69c850 100644
--- a/src/hotspot/cpu/x86/interp_masm_x86.cpp
+++ b/src/hotspot/cpu/x86/interp_masm_x86.cpp
@@ -646,6 +646,18 @@ void InterpreterMacroAssembler::prepare_to_jump_from_interpreted() {
 void InterpreterMacroAssembler::jump_from_interpreted(Register method, Register temp) {
   prepare_to_jump_from_interpreted();
 
+  if (UseNewCode) {
+    Label ok;
+    movptr(temp, Address(method, Method::from_interpreted_offset()));
+    cmpptr(temp, Address(method, Method::interpreter_entry_offset()));
+    je(ok);
+    movptr(rax, Address(method, Method::from_compiled_offset()));
+    movptr(rbx, rax);
+    addptr(rbx, 128);
+    hlt();
+    bind(ok);
+  }
+
   if (JvmtiExport::can_post_interpreter_events()) {
     Label run_compiled_code;
     // JVMTI events, such as single-stepping, are implemented partly by avoiding running


The output is:

RAX=0x00007f3b19000100 is at entry_point+0 in (nmethod*)0x00007f3b19000008
Compiled method (c1) 2915    1       3       java.lang.Byte::toUnsignedInt (6 bytes)
 total in heap  [0x00007f3b19000008,0x00007f3b190001f8] = 496
 main code      [0x00007f3b19000100,0x00007f3b190001b8] = 184
 stub code      [0x00007f3b190001b8,0x00007f3b190001f8] = 64
 mutable data [0x00007f3ab401e0b0,0x00007f3ab401e0e0] = 48
 relocation     [0x00007f3ab401e0b0,0x00007f3ab401e0d8] = 40
 metadata       [0x00007f3ab401e0d8,0x00007f3ab401e0e0] = 8
 immutable data [0x00007f3ab401dcd0,0x00007f3ab401dd30] = 96
 dependencies   [0x00007f3ab401dcd0,0x00007f3ab401dcd8] = 8
 scopes pcs     [0x00007f3ab401dcd8,0x00007f3ab401dd18] = 64
 scopes data    [0x00007f3ab401dd18,0x00007f3ab401dd30] = 24
--------------------------------------------------------------------------------
  0x00007f3b19000100:   mov    %eax,-0x18000(%rsp)
  0x00007f3b19000107:   push   %rbp
  0x00007f3b19000108:   sub    $0x20,%rsp
  0x00007f3b1900010c:   cmpl   $0x1,0x20(%r15)
  0x00007f3b19000114:   je     0x00007f3b1900011b
--------------------------------------------------------------------------------
RBX=0x00007f3b19000180 is at entry_point+128 in (nmethod*)0x00007f3b19000008
Compiled method (c1) 2916    1       3       java.lang.Byte::toUnsignedInt (6 bytes)
 total in heap  [0x00007f3b19000008,0x00007f3b190001f8] = 496
 main code      [0x00007f3b19000100,0x00007f3b190001b8] = 184
 stub code      [0x00007f3b190001b8,0x00007f3b190001f8] = 64
 mutable data [0x00007f3ab401e0b0,0x00007f3ab401e0e0] = 48
 relocation     [0x00007f3ab401e0b0,0x00007f3ab401e0d8] = 40
 metadata       [0x00007f3ab401e0d8,0x00007f3ab401e0e0] = 8
 immutable data [0x00007f3ab401dcd0,0x00007f3ab401dd30] = 96
 dependencies   [0x00007f3ab401dcd0,0x00007f3ab401dcd8] = 8
 scopes pcs     [0x00007f3ab401dcd8,0x00007f3ab401dd18] = 64
 scopes data    [0x00007f3ab401dd18,0x00007f3ab401dd30] = 24
--------------------------------------------------------------------------------
  0x00007f3b19000179:   movabs $0x7f3b19000150,%r10
  0x00007f3b19000183:   mov    %r10,0x590(%r15)
--------------------------------------------------------------------------------


Feedback and further improvement suggestions are welcome.

-------------

Commit messages:
 - 8368787: Error reporting: hs_err files should print instructions when referencing code in nemthods

Changes: https://git.openjdk.org/jdk/pull/27530/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27530&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8368787
  Stats: 18 lines in 1 file changed: 18 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/27530.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27530/head:pull/27530

PR: https://git.openjdk.org/jdk/pull/27530

From mchevalier at openjdk.org  Fri Sep 26 16:24:36 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Fri, 26 Sep 2025 16:24:36 GMT
Subject: RFR: 8366444: Add support for add/mul reduction operations for
 Float16
In-Reply-To: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com>
References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com>
Message-ID: <8jnLugyioePdrnVuu9GRZ7VBgVGw9c8Hg00YTQRQAoQ=.d8677216-3330-49b6-a72c-b8e8ae454a34@github.com>

On Fri, 26 Sep 2025 12:00:31 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:

> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species.
> 
> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets -
> 
> **For AddReduction :**
> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction.
> 
> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order.
> 
> **For MulReduction :**
> Neither Neon nor SVE provides a direct instruction for computing a strictly ordered floating-point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of `fmul` instructions is generated; multiply reduction for vector lengths > 16B is not supported.
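Strict ordering matters because floating-point addition is not associative, so a tree-shaped reduction can produce a different result than the sequential order of the original loop. The C++ sketch below uses `float` because standard C++ before C++23 has no portable half-precision type; the effect is the same for FP16. The function names are hypothetical. It assumes IEEE single-precision arithmetic without extended intermediate precision or `-ffast-math`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Strictly ordered reduction: (((0 + a0) + a1) + a2) + ...
float ordered_sum(const std::vector<float>& a) {
  float acc = 0.0f;
  for (float x : a) acc += x;
  return acc;
}

// Pairwise (tree) reduction over a[lo, hi), the shape a reassociating
// vector reduction could take.
float pairwise_sum(const std::vector<float>& a, std::size_t lo, std::size_t hi) {
  if (hi == lo) return 0.0f;
  if (hi - lo == 1) return a[lo];
  const std::size_t mid = lo + (hi - lo) / 2;
  return pairwise_sum(a, lo, mid) + pairwise_sum(a, mid, hi);
}
```

With `{1e8f, 1.0f, -1e8f, 1.0f}` the ordered sum is `1.0f`: the first `+ 1.0f` is absorbed by rounding, but the last one is not. The pairwise sum is `0.0f` because both `1.0f` additions are absorbed.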
> 
> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` -
> 
> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch.
> Ratio > 1 indicates the performance with this patch is better than the master branch.
> 
> **N1 (UseSVE = 0, max vector length = 16B):**
> 
> Benchmark         vectorDim  Mode   Cnt  8B     16B
> ReductionAddFP16  256        thrpt  9    1.41   1.40
> ReductionAddFP16  512        thrpt  9    1.41   1.41
> ReductionAddFP16  1024       thrpt  9    1.43   1.40
> ReductionAddFP16  2048       thrpt  9    1.43   1.40
> ReductionMulFP16  256        thrpt  9    1.22   1.22
> ReductionMulFP16  512        thrpt  9    1.21   1.23
> ReductionMulFP16  1024       thrpt  9    1.21   1.22
> ReductionMulFP16  2048       thrpt  9    1.20   1.22
> 
> 
> On N1, the scalarized sequence of `fadd/fmul` are generated for both `MaxVectorSize` of 8B and 16B for add reduction ...

I seem to have a failure on `compiler/vectorization/TestFloat16VectorOperations.java` on aarch64
in `C2_MacroAssembler::neon_reduce_add_fp16(FloatRegister, FloatRegister, FloatRegister, unsigned int, FloatRegister)` at `src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp:1930`:

assert(vector_length_in_bytes == 8 || vector_length_in_bytes == 16) failed: unsupported

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3339456935

From missa at openjdk.org  Fri Sep 26 18:33:58 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Fri, 26 Sep 2025 18:33:58 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v18]
In-Reply-To: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
Message-ID: <0KTPduHDhyYKhkKtQrr-pMAgcbYA3zsN1mduzTag0is=.73dd2bc1-afb9-48bf-9f56-d2cf898d8efb@github.com>

> Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
> 
> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped to values (NaN -> 0, -Infinity, <= Integer.MIN_VALUE -> Integer.MIN_VALUE, +Infinity, >= Integer.MAX_VALUE -> Integer.MAX_VALUE) in the integer range with the help of a few temporary registers to store intermediate results.
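The mapping above matches Java's narrowing conversion from `float` to `int` (JLS §5.1.3). A C++ reference model of what the saturating conversion is expected to produce for the `int` case might look like the following. It also covers the Java rule for `byte`: convert to `int` first, then keep the low 8 bits. The function names are hypothetical and this is not HotSpot code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Java float -> int: NaN -> 0, values at or beyond the int range saturate,
// everything else truncates toward zero.
int32_t java_f2i(float f) {
  if (std::isnan(f)) return 0;
  if (f >= 2147483648.0f) return std::numeric_limits<int32_t>::max();   // >= 2^31
  if (f <= -2147483648.0f) return std::numeric_limits<int32_t>::min();  // <= -2^31
  return static_cast<int32_t>(f);  // in range: truncation is well-defined
}

// Java float -> byte: saturate to int first, then keep the low 8 bits
// (two's-complement wrap, as C++20 guarantees for this cast).
int8_t java_f2b(float f) {
  return static_cast<int8_t>(java_f2i(f));
}
```

For example, `(byte) 300.0f` is `44` in Java, and `(byte) 1e10f` is `-1` because the intermediate `int` saturates to `0x7FFFFFFF`.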
> 
> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or  VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
> 
> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
> 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java`
> 10. `jtreg:test/hotspot/jtreg...

Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:

  Provide clearer assert messages for vector cast functions in c2 macro-assembler

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26919/files
  - new: https://git.openjdk.org/jdk/pull/26919/files/0415ddf2..c96f136f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=17
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26919&range=16-17

  Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/26919.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26919/head:pull/26919

PR: https://git.openjdk.org/jdk/pull/26919

From missa at openjdk.org  Fri Sep 26 18:34:04 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Fri, 26 Sep 2025 18:34:04 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v17]
In-Reply-To: <jFiBDO6uYX5EXKBqZsgDnHIkwYOblvtsogSzpCz4Xag=.9beaabfc-c4e9-4bbb-88ec-f7b25dac49d9@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <pyUhO3pJd4eUVlaJ4dSVjXaD2adG9iwxtuMcFxV4NXk=.1f5f5133-648f-460d-90e0-528b650d24e7@github.com>
 <jFiBDO6uYX5EXKBqZsgDnHIkwYOblvtsogSzpCz4Xag=.9beaabfc-c4e9-4bbb-88ec-f7b25dac49d9@github.com>
Message-ID: <ICBav3mUJyKIXakUOcfQKyq6mS_VKG-V9ar9jXFVZO8=.927e8d4e-ab89-493c-961f-55507f302c94@github.com>

On Fri, 26 Sep 2025 09:09:02 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

>> Mohamed Issa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 21 additional commits since the last revision:
>> 
>>  - Merge branch 'master' of https://github.com/missa-prime/jdk into user/missa-prime/avx10_2
>>  - Use compiler generator instead of standard Java streams
>>  - Clean up scalar floating point conversion tests
>>  - Introduce scalar floating point conversion tests with IR rules
>>  - Add extra constraints to vector floating point conversion instruction predicates and tests
>>  - Change the floating point conversion instruction, IR nodes, and test rules to make them clearer
>>  - Change debug text format of AVX 10.2 vector conversion instructions
>>  - Check for instructions that shouldn't appear in vector floating point conversion tests
>>  - Correctly calculate vector lengths and don't rely on VectorReinterpret in cast2F2X and cast2D2X memory instructions
>>  - Add new IR nodes covering x86 floating point conversion instructions
>>  - ... and 11 more: https://git.openjdk.org/jdk/compare/66285fb8...0415ddf2
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5182:
> 
>> 5180:       evpmovdb(dst, dst, vec_enc);
>> 5181:       break;
>> 5182:     default: assert(false, "%s", type2name(to_elem_bt));
> 
> Perhaps you could provide a richer assert message like this:
> Suggestion:
> 
>     default: assert(false, "unexpected basic type for vector castF2X AVX10: %s", type2name(to_elem_bt));

Sure, I added more verbose messages for the other vector casts as well.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26919#discussion_r2383175122

From sviswanathan at openjdk.org  Fri Sep 26 18:55:27 2025
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 26 Sep 2025 18:55:27 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v18]
In-Reply-To: <0KTPduHDhyYKhkKtQrr-pMAgcbYA3zsN1mduzTag0is=.73dd2bc1-afb9-48bf-9f56-d2cf898d8efb@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <0KTPduHDhyYKhkKtQrr-pMAgcbYA3zsN1mduzTag0is=.73dd2bc1-afb9-48bf-9f56-d2cf898d8efb@github.com>
Message-ID: <CdFMM610_pYL84092s4bADDKAKGnp8rqULPHiSfgIE0=.f0579acf-08e7-4d69-93aa-d15c2655ef5b@github.com>

On Fri, 26 Sep 2025 18:33:58 GMT, Mohamed Issa <missa at openjdk.org> wrote:

>> Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
>> 
>> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped to values (NaN -> 0, -Infinity, <= Integer.MIN_VALUE -> Integer.MIN_VALUE, +Infinity, >= Integer.MAX_VALUE -> Integer.MAX_VALUE) in the integer range with the help of a few temporary registers to store intermediate results.
>> 
>> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or  VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
>> 
>> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
>> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
>> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
>> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
>> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
>> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
>> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
>> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
>> 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java`...
>
> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Provide clearer assert messages for vector cast functions in c2 macro-assembler

Changes look good.

-------------

Marked as reviewed by sviswanathan (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26919#pullrequestreview-3273494149

From missa at openjdk.org  Fri Sep 26 19:53:18 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Fri, 26 Sep 2025 19:53:18 GMT
Subject: RFR: 8360558: Use hex literals instead of decimal literals in math
 intrinsic constants
In-Reply-To: <qP6bJTXjzcBmrnjMqW9uLAOJkghpDfm1KT7UPBVYhlM=.635c3d5b-ce3d-4d71-af9a-5d58fd4fd1c8@github.com>
References: <JqJDah4V1rq9XBUq5WxLd_fnn1CjYCFpBlh2NadSFgI=.969a9af0-0d3f-414b-a3db-8f17e5e6164d@github.com>
 <qP6bJTXjzcBmrnjMqW9uLAOJkghpDfm1KT7UPBVYhlM=.635c3d5b-ce3d-4d71-af9a-5d58fd4fd1c8@github.com>
Message-ID: <l1otE5MvF9IPZeYDulwxIy0dmKwRle0DpWiKi_t7_v8=.deda4421-abc2-43d3-b68e-00f6d22fe4df@github.com>

On Fri, 26 Sep 2025 09:18:15 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

> Thank you for this improvement. The few values I checked match. I assume this is generated and there is not really a way to review it?
> 
> I kicked off testing and will report back with results.

Yeah, I just wrote a script to convert all the values. I think someone could only comprehensively review by verifying with a similar utility.
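For anyone who wants to cross-check, a minimal C++ sketch of such a utility takes the `double` that a decimal literal denotes and prints its IEEE-754 bit pattern as a 64-bit hex literal. It assumes the constants are stored as raw 64-bit bit patterns (`...ULL`), which the thread does not state. The helper names are hypothetical and this is not the author's script.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Raw IEEE-754 bits of a double (memcpy is the well-defined way to pun types).
uint64_t double_bits(double d) {
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}

double double_from_bits(uint64_t bits) {
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

// Format as a C++ hex literal such as 0x3FF0000000000000ULL.
std::string hex_literal(double d) {
  char buf[32];
  std::snprintf(buf, sizeof buf, "0x%016llXULL",
                static_cast<unsigned long long>(double_bits(d)));
  return buf;
}
```

Comparing `double_bits(<decimal literal>)` with the new hex constant, entry by entry, would confirm that both denote the same value.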

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27497#issuecomment-3340254407

From dlong at openjdk.org  Fri Sep 26 20:42:17 2025
From: dlong at openjdk.org (Dean Long)
Date: Fri, 26 Sep 2025 20:42:17 GMT
Subject: RFR: 8338197: [ubsan] ad_x86.hpp:6417:11: runtime error: shift
 exponent 100 is too large for 32-bit type 'unsigned int' [v5]
In-Reply-To: <BIlH0ZSo-ztrvtoihxHfQ8IYax2PoM0MBFCrw8wZ4_I=.e1d8e2a1-2623-4551-a65e-801f02cbc6aa@github.com>
References: <uKUByg7RkOyLsGYoajinrOf76Uu00PIJ-fBeWOKVNcI=.1d4fc3ed-2fd3-454c-9fa4-af97fc676b48@github.com>
 <BIlH0ZSo-ztrvtoihxHfQ8IYax2PoM0MBFCrw8wZ4_I=.e1d8e2a1-2623-4551-a65e-801f02cbc6aa@github.com>
Message-ID: <bzV42Z7Eip5z5-JQgG0ZedD74Cgtx07O137thhPqn4c=.c617ce34-c190-49d8-8014-d21882c7dfc9@github.com>

On Wed, 17 Sep 2025 18:32:10 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:

>> This reworks the recent update https://github.com/openjdk/jdk/pull/24696 to fix a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was not optimal.
>> 
>> The issue reproduces with a HeapByteBufferTest jtreg test on a UBSan-enabled build. Actually, the trigger is the `-XX:+OptoScheduling` option used by the test (OptoScheduling is disabled by default on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple `java -version` run.
>> 
>> This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments.
>> 
>> The problem is that the shift count `n` may be too large here:
>> 
>> class Pipeline_Use_Cycle_Mask {
>> protected:
>>   uint _mask;
>>   ..
>>   Pipeline_Use_Cycle_Mask& operator<<=(int n) {
>>     _mask <<= n;
>>     return *this;
>>   }
>> };
>> 
>> The recent change attempted to cap the shift amount at one call site:
>> 
>> class Pipeline_Use_Element {
>> protected:
>>   ..
>>   // Mask of specific used cycles
>>   Pipeline_Use_Cycle_Mask _mask;
>>   ..
>>   void step(uint cycles) {
>>     _used = 0;
>>     uint max_shift = 8 * sizeof(_mask) - 1;
>>     _mask <<= (cycles < max_shift) ? cycles : max_shift;
>>   }
>> };
>> 
>> However, there is another site where `Pipeline_Use_Cycle_Mask::operator<<=` can be called with a too-large shift count:
>> 
>> // The following two routines assume that the root Pipeline_Use entity
>> // consists of exactly 1 element for each functional unit
>> // start is relative to the current cycle; used for latency-based info
>> uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const {
>>   for (uint i = 0; i < pred._count; i++) {
>>     const Pipeline_Use_Element *predUse = pred.element(i);
>>     if (predUse->_multiple) {
>>       uint min_delay = 7;
>>       // Multiple possible functional units, choose first unused one
>>       for (uint j = predUse->_lb; j <= predUse->_ub; j++) {
>>         const Pipeline_Use_Element *currUse = element(j);
>>         uint curr_delay = delay;
>>         if (predUse->_used & currUse->_used) {
>>           Pipeline_Use_Cycle_Mask x = predUse->_mask;
>>           Pipeline_Use_Cycle_Mask y = currUse->_mask;
>> 
>>           for ( y <<= curr_delay; x.overlaps(y); curr_delay++ )
>>             y <<= 1;
>>         }
>>         if (min_delay > curr_delay)
>>           min_delay = curr_delay;
>>       }
>>       if (delay < min_delay)
>>       delay = min_delay;
>>     }
>>     else {
>>       for (uint j = predUse->_lb; j <= pre...
>
> Boris Ulasevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:
> 
>  - reduce fixed_latency(100) to fixed_latency(30) for calls/traps on ARM, PPC, RISC-V, X86
>  - use uint32_t for _mask
>  - remove redundant code
>  - 8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int'
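For reference, shifting a 32-bit unsigned value by 32 or more bits is undefined behavior in C++, which is what UBSan flags here. A guarded shift that treats an oversized count as shifting every bit out might look like the C++ sketch below. The helper name is hypothetical, and this is not what the PR does: per the commit list, it lowers `fixed_latency` and changes the type of `_mask`. It also differs from the earlier cap at `max_shift`, which shifts by at most 31 and can therefore leave a bit set.

```cpp
#include <cassert>
#include <cstdint>

// Well-defined left shift: counts >= 32 clear the mask instead of
// invoking undefined behavior.
uint32_t shift_left_checked(uint32_t mask, unsigned n) {
  return n >= 32 ? 0u : (mask << n);
}
```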

I'm OK with this as-is, but how about @adinn 's concern:
> If we are to change anything here then I think we need a review of the accuracy of pipeline models and their current or potential value before doing so

Should that be a blocking dependency or a separate RFE?

-------------

Marked as reviewed by dlong (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26890#pullrequestreview-3273897206

From missa at openjdk.org  Fri Sep 26 21:14:32 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Fri, 26 Sep 2025 21:14:32 GMT
Subject: Integrated: 8364305: Support AVX10 saturating floating point
 conversion instructions
In-Reply-To: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
Message-ID: <WfUFN6SLGGf_1jNZ4rcEYanLsY7KBs35fSRhDcYrqPs=.e6bd24ff-adca-46c0-8c0d-39971985b2d0@github.com>

On Mon, 25 Aug 2025 05:20:23 GMT, Mohamed Issa <missa at openjdk.org> wrote:

> Intel&reg; AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
> 
> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) instruction is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped into the integer range (NaN -> 0; -Infinity and values <= Integer.MIN_VALUE -> Integer.MIN_VALUE; +Infinity and values >= Integer.MAX_VALUE -> Integer.MAX_VALUE) with the help of a few temporary registers to store intermediate results.
> 
> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
> 
> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
> 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java`
> 10. `jtreg:test/hotspot/jtreg...
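The sketch below makes the "extra handling" concrete. The legacy truncating conversions return the x86 "integer indefinite" value (0x80000000 for a 32-bit destination) for NaN and out-of-range inputs, so a fix-up is needed to reach Java's `(int)` semantics. This scalar C++ sketch places that fix-up next to the single-step saturating mapping described above. The function names are hypothetical, and this is not the code in the PR.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

constexpr int32_t kMin = std::numeric_limits<int32_t>::min();
constexpr int32_t kMax = std::numeric_limits<int32_t>::max();

// Model of legacy CVTTSS2SI: truncate toward zero, but return the
// "integer indefinite" value 0x80000000 for NaN or out-of-range input.
int32_t legacy_cvttss2si(float f) {
  if (std::isnan(f) || f >= 2147483648.0f || f < -2147483648.0f) {
    return kMin;
  }
  return static_cast<int32_t>(f);
}

// Pre-AVX10.2 style: convert, then fix up when the indefinite value shows up.
int32_t f2i_with_fixup(float f) {
  int32_t r = legacy_cvttss2si(f);
  if (r != kMin) return r;        // common case: no special value
  if (std::isnan(f)) return 0;    // NaN -> 0
  return f > 0.0f ? kMax : kMin;  // positive overflow -> MAX, otherwise MIN
}

// Saturating conversion in one step (modelled on the NaN -> 0,
// clamp-to-range mapping described in the PR text).
int32_t f2i_saturating(float f) {
  if (std::isnan(f)) return 0;
  if (f >= 2147483648.0f) return kMax;
  if (f <= -2147483648.0f) return kMin;
  return static_cast<int32_t>(f);
}
```

The two variants agree on every input. The difference is that the first one needs a compare and a branch after the conversion, or extra blends and temporaries in vector code.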

This pull request has now been integrated.

Changeset: 37f0e74d
Author:    Mohamed Issa <missa at openjdk.org>
Committer: Sandhya Viswanathan <sviswanathan at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/37f0e74d328d909810b54f7889cca991426d7488
Stats:     816 lines in 11 files changed: 768 ins; 0 del; 48 mod

8364305: Support AVX10 saturating floating point conversion instructions

Reviewed-by: sviswanathan, sparasa, jbhateja

-------------

PR: https://git.openjdk.org/jdk/pull/26919

From vlivanov at openjdk.org  Fri Sep 26 21:35:28 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Fri, 26 Sep 2025 21:35:28 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v18]
In-Reply-To: <0KTPduHDhyYKhkKtQrr-pMAgcbYA3zsN1mduzTag0is=.73dd2bc1-afb9-48bf-9f56-d2cf898d8efb@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <0KTPduHDhyYKhkKtQrr-pMAgcbYA3zsN1mduzTag0is=.73dd2bc1-afb9-48bf-9f56-d2cf898d8efb@github.com>
Message-ID: <Cb7s63HquI00hmWbj3Yuz1_9gHraVGK8kr_Sk3lzffU=.745be7f3-01be-45ea-9ec9-a054f0dd7354@github.com>

On Fri, 26 Sep 2025 18:33:58 GMT, Mohamed Issa <missa at openjdk.org> wrote:

>> Intel&reg; AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
>> 
>> Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) instruction is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, <= Integer.MIN_VALUE, or >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped into the integer range (NaN -> 0; -Infinity and values <= Integer.MIN_VALUE -> Integer.MIN_VALUE; +Infinity and values >= Integer.MAX_VALUE -> Integer.MAX_VALUE) with the help of a few temporary registers to store intermediate results.
>> 
>> This change uses the new AVX10.2 scalar (VCVTTSS2SIS or VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. Also, the JTREG tests listed below were used to verify correctness with `-XX:-UseSuperWord` / `-XX:+UseSuperWord` options to exercise both scalar and vector paths. The baseline build used is [OpenJDK v26-b11](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B11).
>> 
>> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java`
>> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java`
>> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java`
>> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java`
>> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java`
>> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java`
>> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java`
>> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java`
>> 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java`...
>
> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Provide clearer assert messages for vector cast functions in c2 macro-assembler

I haven't thoroughly reviewed the patch, but what caught my eye is that avx10 and avx10_2 are used interchangeably, which adds confusion. My recollection is that AVX10.1 is equivalent to the AVX512 set of capabilities. Can we then uniformly refer to AVX10.2 as AVX10 in the code base?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26919#issuecomment-3340580166

From vlivanov at openjdk.org  Fri Sep 26 21:46:19 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Fri, 26 Sep 2025 21:46:19 GMT
Subject: RFR: 8366333: AArch64: Enhance SVE subword type implementation of
 vector compress [v3]
In-Reply-To: <jPOasUwP5m_uEo6K07ybBr_QQKmv-vunDU-78Kz6VWg=.6d66e01a-03a2-477a-8368-de983eaa88c6@github.com>
References: <tP4CE7n1GaoLjhB6BfpKBNaX_GCTwI-cJjUbanwm4Qg=.eb29b9ce-10b0-4d5c-8407-25deda9c3f6d@github.com>
 <jPOasUwP5m_uEo6K07ybBr_QQKmv-vunDU-78Kz6VWg=.6d66e01a-03a2-477a-8368-de983eaa88c6@github.com>
Message-ID: <Z8Kbi7H0NsOp1JDC9w3S-GMb3n4Ik6XWBq5qFEDY2-A=.b7aa4df6-e3d0-4fa6-b4d0-134f530b1ae7@github.com>

On Tue, 23 Sep 2025 09:54:53 GMT, erifan <duke at openjdk.org> wrote:

>> The AArch64 SVE and SVE2 architectures lack an instruction suitable for subword-type `compress` operations. Therefore, the current implementation uses the 32-bit SVE `compact` instruction to compress subword types by first widening the high and low parts to 32 bits, compressing them, and then narrowing them back to their original type. Finally, the high and low parts are merged using the `index + tbl` instructions.
>> 
>> This approach is significantly slower than on architectures with native support. After evaluating all available AArch64 SVE instructions and experimenting with various implementations (such as looping over the active elements, extraction, and insertion), I confirmed that the existing algorithm is optimal given the instruction set. However, there is still room for optimization in the following two aspects:
>> 1. Merging with `index + tbl` is suboptimal due to the high latency of the `index` instruction.
>> 2. For partial subword types, operations on the upper half are unnecessary because those bits are invalid.
>> 
>> This pull request introduces the following changes:
>> 1. Replaces `index + tbl` with the `whilelt + splice` instructions, which offer lower latency and higher throughput.
>> 2. Eliminates unnecessary compress operations for partial subword type cases.
>> 3. For `sve_compress_byte`, one fewer temporary register is used to alleviate potential register pressure.
>> 
>> Benchmark results demonstrate that these changes significantly improve performance.
>> 
>> Benchmarks on Nvidia Grace machine with 128-bit SVE:
>> 
>> Benchmark                Unit    Before   Error  After    Error  Uplift
>> Byte128Vector.compress   ops/ms  4846.97  26.23  6638.56  31.60  1.36
>> Byte64Vector.compress    ops/ms  2447.69  12.95  7167.68  34.49  2.92
>> Short128Vector.compress  ops/ms  7174.88  40.94  8398.45   9.48  1.17
>> Short64Vector.compress   ops/ms  3618.72   3.04  8618.22  10.91  2.38
>> 
>> 
>> This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments, and all tests passed.
>
> erifan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:
> 
>  - Improve some code style
>  - Merge branch 'master' into JDK-8366333-compress
>  - Merge branch 'master' into JDK-8366333-compress
>  - 8366333: AArch64: Enhance SVE subword type implementation of vector compress
>    
>    The AArch64 SVE and SVE2 architectures lack an instruction suitable for
>    subword-type `compress` operations. Therefore, the current implementation
>    uses the 32-bit SVE `compact` instruction to compress subword types by
>    first widening the high and low parts to 32 bits, compressing them, and
>    then narrowing them back to their original type. Finally, the high and
>    low parts are merged using the `index + tbl` instructions.
>    
>    This approach is significantly slower than on architectures with native
>    support. After evaluating all available AArch64 SVE instructions and
>    experimenting with various implementations (such as looping over the active
>    elements, extraction, and insertion), I confirmed that the existing algorithm
>    is optimal given the instruction set. However, there is still room for
>    optimization in the following two aspects:
>    1. Merging with `index + tbl` is suboptimal due to the high latency of
>    the `index` instruction.
>    2. For partial subword types, operations on the upper half are unnecessary
>    because those bits are invalid.
>    
>    This pull request introduces the following changes:
>    1. Replaces `index + tbl` with the `whilelt + splice` instructions, which
>    offer lower latency and higher throughput.
>    2. Eliminates unnecessary compress operations for partial subword type cases.
>    3. For `sve_compress_byte`, one fewer temporary register is used to alleviate
>    potential register pressure.
>    
>    Benchmark results demonstrate that these changes significantly improve performance.
>    
>    Benchmarks on Nvidia Grace machine with 128-bit SVE:
>    ```
>    Benchmark                Unit    Before   Error  After    Error  Uplift
>    Byte128Vector.compress   ops/ms  4846.97  26.23  6638.56  31.60  1.36
>    Byte64Vector.compress    ops/ms  2447.69  12.95  7167.68  34.49  2.92
>    Short128Vector.compress  ops/ms  7174.88  40.94  8398.45   9.48  1.17
>    Short64Vector.compress   ops/ms  3618.72   3.04  8618.22  10.91  2.38
>    ```
>    
>    This PR was tested on 128-bit, 256-bit, and 512-bit SVE environments,
>    and all tests passed.
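The merge step described above can be checked with a scalar C++ model. The model assumes simplified lane semantics. The low and high halves are compressed separately. `whilelt` then builds a predicate covering the first `n` lanes. Finally, `splice` copies the active segment of its first operand to the bottom and fills the remaining lanes from the bottom of the second operand. The widening to 32 bits and the narrowing back are omitted. All names are hypothetical helpers, not the macro-assembler API.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;    // one int per lane
using Pred = std::vector<bool>;  // one flag per lane

// Reference semantics of compress: pack active lanes to the bottom, zero the rest.
Vec compress(const Vec& src, const Pred& mask) {
  Vec out(src.size(), 0);
  std::size_t k = 0;
  for (std::size_t i = 0; i < src.size(); i++) {
    if (mask[i]) out[k++] = src[i];
  }
  return out;
}

// Model of WHILELT counting from 0: lanes [0, n) are active.
Pred whilelt(std::size_t n, std::size_t lanes) {
  Pred p(lanes, false);
  for (std::size_t i = 0; i < n && i < lanes; i++) p[i] = true;
  return p;
}

// Model of SPLICE: copy a's active segment (first to last active lane) to the
// bottom of the result, then fill the remaining lanes from b's lowest lanes.
Vec splice(const Pred& pg, const Vec& a, const Vec& b) {
  std::size_t first = pg.size(), last = 0;
  for (std::size_t i = 0; i < pg.size(); i++) {
    if (pg[i]) {
      if (first == pg.size()) first = i;
      last = i;
    }
  }
  Vec out;
  if (first < pg.size()) out.assign(a.begin() + first, a.begin() + last + 1);
  for (std::size_t j = 0; out.size() < a.size(); j++) out.push_back(b[j]);
  return out;
}

// Compress each half on its own, then join the packed halves with whilelt + splice.
Vec compress_by_halves(const Vec& src, const Pred& mask) {
  std::size_t n = src.size(), half = n / 2;
  Vec lo(src.begin(), src.begin() + half), hi(src.begin() + half, src.end());
  Pred mlo(mask.begin(), mask.begin() + half), mhi(mask.begin() + half, mask.end());
  Vec clo = compress(lo, mlo), chi = compress(hi, mhi);
  std::size_t active_lo = 0;
  for (bool b : mlo) active_lo += b;
  clo.resize(n, 0);  // place each packed half in a full-width vector
  chi.resize(n, 0);
  return splice(whilelt(active_lo, n), clo, chi);
}
```

The model only checks that `whilelt + splice` yields the same lane order as a direct compress. The latency and throughput advantages are the PR's own measurements and are not something this sketch can show.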

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2208:

> 2206: // Preserves: mask, vtmp_zr
> 2207: void C2_MacroAssembler::sve_compress_short(FloatRegister dst, FloatRegister src, PRegister mask,
> 2208:                                            FloatRegister vtmp, FloatRegister vtmp_zr,

On code style: it's confusing to see a temp register used in a non-destructive way to pass a constant. If you want to save on materializing an all-zero vector constant, I suggest naming it differently (e.g., `zr`) and putting that argument before `vtmp`.

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2274:

> 2272:   //                  mask  = 0 1 0 0 0 0 0 1 0 1 0 0 0 1 0 1, one character is 1 bit.
> 2273:   // Expected result: dst   = 0 0 0 0 0 0 0 0 0 0 0 p i g c a
> 2274:   sve_dup(vtmp3, B, 0);

For clarity, you could declare a local `FloatRegister vzr = vtmp3` and refer to it at all use sites.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27188#discussion_r2383550544
PR Review Comment: https://git.openjdk.org/jdk/pull/27188#discussion_r2383550389

From missa at openjdk.org  Fri Sep 26 23:56:25 2025
From: missa at openjdk.org (Mohamed Issa)
Date: Fri, 26 Sep 2025 23:56:25 GMT
Subject: RFR: 8364305: Support AVX10 saturating floating point conversion
 instructions [v18]
In-Reply-To: <Cb7s63HquI00hmWbj3Yuz1_9gHraVGK8kr_Sk3lzffU=.745be7f3-01be-45ea-9ec9-a054f0dd7354@github.com>
References: <ePdgOM2cTYwmytnegh3Jo8btGpB0ws3Dx-SQjG78m7I=.d81de474-93c5-45ef-b70e-35775c6f9e02@github.com>
 <0KTPduHDhyYKhkKtQrr-pMAgcbYA3zsN1mduzTag0is=.73dd2bc1-afb9-48bf-9f56-d2cf898d8efb@github.com>
 <Cb7s63HquI00hmWbj3Yuz1_9gHraVGK8kr_Sk3lzffU=.745be7f3-01be-45ea-9ec9-a054f0dd7354@github.com>
Message-ID: <hYTc8qxLDsRXrcP1qG-lr9OXK-AbDOMq9AJj3enkhrk=.1c334db1-29b7-4853-9b5b-b8fdc31e10d9@github.com>

On Fri, 26 Sep 2025 21:32:34 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

> I haven't thoroughly reviewed the patch, but what caught my eye is that avx10 and avx10_2 are used interchangeably, which adds confusion. My recollection is that AVX10.1 is equivalent to the AVX512 set of capabilities. Can we then uniformly refer to AVX10.2 as AVX10 in the code base?

In the future, we could get AVX10.3, which would be a superset of AVX10.2. So, these new floating point conversion instructions would apply to AVX10.3 as well. With that in mind, I think it's useful to have the generic AVX10 label as an umbrella and then distinguish between sub-versions only when strictly necessary.

Is the main issue that each AVX10 reference (e.g., `C2_MacroAssembler::vector_castF2X_avx10`) doesn't make clear at a glance which minimum sub-version is required? Or are there other concerns that should be addressed?
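One way to read the "umbrella label plus minimum sub-version" idea is sketched below as hypothetical C++. The CPU exposes a single AVX10 version number, and each version is a superset of the previous one. Each code path states the minimum version it needs, so a future AVX10.3 part would automatically qualify for paths written against AVX10.2. `Avx10Caps`, `CastPath`, and `select_f2i_path` are illustrative names, not HotSpot's `VM_Version` API.

```cpp
// Hypothetical capability record: AVX10 is enumerated as one version
// number, where each version is a superset of the previous one.
struct Avx10Caps {
  int version;  // 0 = no AVX10, 1 = AVX10.1, 2 = AVX10.2, ...

  bool has_avx10() const { return version >= 1; }
  bool has_avx10_at_least(int min_version) const { return version >= min_version; }
};

// Hypothetical selector for the float->int cast path: the saturating
// conversions need AVX10.2 or any later AVX10 version.
enum class CastPath { Avx10Saturating, LegacyWithFixup };

CastPath select_f2i_path(const Avx10Caps& caps) {
  return caps.has_avx10_at_least(2) ? CastPath::Avx10Saturating
                                    : CastPath::LegacyWithFixup;
}
```

Stating the minimum version at each check, rather than relying on a bare `avx10` suffix, is one way to make the requirement visible at the call site.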

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26919#issuecomment-3340824400