RFR: 8366427: C2 SuperWord: refactor VTransform scalar nodes

Fri Aug 29 14:37:51 UTC 2025

On Fri, 29 Aug 2025 09:49:46 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
> https://github.com/openjdk/jdk/pull/20964
> 
> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
> 
> The goal is to split up some cases that are currently treated the same, but will alter have different behavior. There may be a little bit of code duplication, but the code will soon be made different ;)
> 
> We split the `VTransformScalarNode`:
> - `VTransformMemopScalarNode`
>   - Uses that only wanted scalar mem nodes can now directly check for `isa_MemopScalar`.
>   - We can directly store the `_vpointer` in a field, that way we don't need to do a lookup via `vloop_analyzer`. This could also be helpful later on if we ever do widening (unrolling during auto vectorization): we could then do the necessary modifications to the `vpointer`.
> - `VTransformLoopPhiNode`
>   - Later on, they will play a more special role, they will give us easy access to the beginning state of the loop body and the backedges.
> - `VTransformCFGNode`
>   - Calling them scalar nodes is not 100% accurate. We'll probably have to further refine them later on. But splitting them off now seems like a reasonable choice. Once we do if-conversion we'll have to do more work on CFG.
> - `VTransformDataScalarNode`
>   - These represent all the normal "calculation" nodes in the loop.
> - `VTransformInputScalarNode` -> `VTransformOuterNode`:
>   - For now, we are still just tracking input nodes, but soon we will need to track input and output nodes: basically just the 1-hop neighbourhood of nodes outside the loop. I'm already renaming them now, so it will be less noise later.
> 
> I decided to rather split up more, and avoid the `VTransformScalarNode` together, avoiding having to override overrides - that can be really confusing (e.g. what I had with `is_load_in_loop`).

src/hotspot/share/opto/vtransform.cpp line 711:

> 709: }
> 710: 
> 711: VTransformApplyResult VTransformMemopScalarNode::apply(VTransformApplyState& apply_state) const {

Why we need to pass unused `apply_state` in these methods?

src/hotspot/share/opto/vtransform.cpp line 1009:

> 1007:   tty->print("node[%d %s] ", _node->_idx, _node->Name());
> 1008:   _vpointer.print_on(tty, false);
> 1009: }

Consider separate RFE to use `outputStream*` for all prints.  If we go into UL word we need to collect all outputs in one buffer as we discussed on recent meeting.

src/hotspot/share/opto/vtransform.hpp line 457:

> 455: class VTransformMemopScalarNode : public VTransformNode {
> 456: private:
> 457:   MemNode* _node;

Why not have `_node` in `VTransformNode` class and use `MemNode* node() const { return _node->as_Mem(); }` in this class?  similar to other new classes.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27002#discussion_r2310324107
PR Review Comment: https://git.openjdk.org/jdk/pull/27002#discussion_r2310320298
PR Review Comment: https://git.openjdk.org/jdk/pull/27002#discussion_r2310339568