RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v13]

Jatin Bhateja jbhateja at openjdk.org
Thu Feb 19 06:24:11 UTC 2026


On Tue, 17 Feb 2026 12:16:43 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

> I don't agree with this change because the benefit is little, the intrinsic does not stand in the future, and the implementation is hacky and not trivial. I think I can accept this PR if:
> 
> * It can be proved that the intrinsic is necessary even when we have constant information for `TypeVect`.
> * The fall back inlining can be made more generally applicable, and it is reliable under the non-determinism of incremental inlining.

Hi @merykitty,

Thank you for the detailed feedback. I've looked into the alternative you suggested: adding constant/lane information to `TypeVect` and having the compiler recognize the slice pattern (rearrange + blend with a constant shuffle and mask) and replace it with `VectorSliceNode`, without a dedicated intrinsic. I'd like to explain why I still believe the hybrid call generator is the more practical choice for this change, while keeping the door open for the pattern-based approach later.

**Why the pattern-based approach is still very non-trivial**

Even *after* we have constant information for the shuffle and the mask, recognizing the slice idiom and recovering `origin` remains highly non-trivial:

1. **The shuffle is not a single constant node.** It is the output of `iotaShuffle(origin, 1, true)`, which is implemented by *several* nodes in the IR. So we must match that `iotaShuffle` subgraph, find the node whose input is the scalar `origin`, and require that input to be a `ConI`.

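2. **The mask** is a compare against a constant bound (e.g. `iota.compare(LT, filter)` with `filter = vlen - origin`), so we must also match that compare and verify that its bound is consistent with the `origin` recovered from the shuffle.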

3. **The actual pattern match over the constant shuffle and mask** depends on getting the blend/rearrange wiring right: which input is `vec1` vs `vec2`, both rearranges using the *same* shuffle, and mask semantics matching slice.

So even with constant information in `TypeVect`, we would still need a very complex *pattern* match, probably the largest pattern match in idealization.
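To make the shape of that idiom concrete, here is a scalar Java model of `a.slice(origin, b)` built from the pieces listed above: an `iotaShuffle` with wrap, an `LT` compare mask, two rearranges sharing one shuffle, and a blend. It is a sketch under assumptions: the helper names and the blend operand order are hypothetical, plain `int[]` arrays stand in for vectors, and it is not the JDK's fallback code. It shows how many separate values a matcher would have to tie back to the single `origin`.

```java
import java.util.Arrays;

// Hypothetical scalar model of the lane-wise idiom a C2 pattern match would
// have to recognize for a.slice(origin, b). Plain int arrays stand in for
// vectors; this is not JDK code and does not use jdk.incubator.vector.
// The blend operand order below is an assumption.
class SliceIdiomModel {

    // iotaShuffle(start, step, wrap = true): lane i -> (start + i * step) mod vlen
    static int[] iotaShuffle(int vlen, int start, int step) {
        int[] idx = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            idx[i] = Math.floorMod(start + i * step, vlen);
        }
        return idx;
    }

    // v.rearrange(shuffle): lane i takes v[shuffle[i]]
    static int[] rearrange(int[] v, int[] shuffle) {
        int[] r = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            r[i] = v[shuffle[i]];
        }
        return r;
    }

    // iota.compare(LT, filter): lane i is set when i < filter
    static boolean[] ltMask(int vlen, int filter) {
        boolean[] m = new boolean[vlen];
        for (int i = 0; i < vlen; i++) {
            m[i] = i < filter;
        }
        return m;
    }

    // x.blend(y, m): lane i takes y[i] where m[i] is set, else x[i]
    static int[] blend(int[] x, int[] y, boolean[] m) {
        int[] r = new int[x.length];
        for (int i = 0; i < x.length; i++) {
            r[i] = m[i] ? y[i] : x[i];
        }
        return r;
    }

    // The idiom: both rearranges share ONE shuffle, and the mask bound
    // (vlen - origin) must agree with the shuffle's start (origin).
    static int[] sliceViaIdiom(int[] a, int[] b, int origin) {
        int vlen = a.length;
        int[] shuffle = iotaShuffle(vlen, origin, 1);
        boolean[] mask = ltMask(vlen, vlen - origin);
        return blend(rearrange(b, shuffle), rearrange(a, shuffle), mask);
    }

    // Reference semantics: lanes origin .. origin + vlen - 1 of the concatenation a:b
    static int[] sliceReference(int[] a, int[] b, int origin) {
        int vlen = a.length;
        int[] r = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int j = i + origin;
            r[i] = j < vlen ? a[j] : b[j - vlen];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = {0, 1, 2, 3, 4, 5, 6, 7};
        int[] b = {8, 9, 10, 11, 12, 13, 14, 15};
        // prints [3, 4, 5, 6, 7, 8, 9, 10]
        System.out.println(Arrays.toString(sliceViaIdiom(a, b, 3)));
    }
}
```

Checking `sliceViaIdiom` against `sliceReference` for every `origin` in `0..vlen` is a cheap way to confirm the idiom really is a slice; a graph matcher would have to establish the same equivalence structurally, node by node.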

**Why the hybrid call generator is a better fit for this change**

- **Bounded complexity:** We have one clear boundary—the slice call—and one check—`origin->is_con()`. If true, we emit VectorSliceNode; otherwise we inline the Java fallback. No graph pattern matching, no decoding of shuffle/mask vectors, no dependency on the exact shape of iotaShuffle or compare nodes.

- **Same user benefit when it works:** In both approaches, a constant-index slice becomes `VectorSliceNode` → VPALIGNR. With the hybrid approach, if we don't intrinsify (e.g. `origin` is not yet a constant), we still inline the fallback, and the subsequent Vector API calls can be optimized as they are today.
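The decision at the call boundary can be modeled in a few lines. This is a hypothetical Java sketch, not C2 code: `Node`, `ConI`, `Param`, `VectorSlice`, `InlinedFallback` and `generateSlice` are illustrative names standing in for the C++ call generator and IR nodes; only the idea of a single `origin->is_con()` check comes from the comment above.

```java
import java.util.List;

// Hypothetical, heavily simplified model of the hybrid call generator's choice.
// All type and method names are illustrative; C2's real classes are C++.
class HybridSliceModel {
    sealed interface Node permits ConI, Param, VectorSlice, InlinedFallback {}

    record ConI(int value) implements Node {}               // compile-time constant int
    record Param(String name) implements Node {}            // value unknown at compile time
    record VectorSlice(int origin) implements Node {}       // intrinsic node (VPALIGNR on x86)
    record InlinedFallback(Node origin) implements Node {}  // Java fallback inlined as usual

    // The single check at the call boundary: is the origin argument a constant?
    static Node generateSlice(Node origin) {
        if (origin instanceof ConI c) {
            return new VectorSlice(c.value());
        }
        return new InlinedFallback(origin);
    }

    public static void main(String[] args) {
        for (Node origin : List.<Node>of(new ConI(3), new Param("origin"))) {
            System.out.println(origin + " -> " + generateSlice(origin));
        }
    }
}
```

The design point being modeled is that no shuffle or mask subgraph is inspected at all: a non-constant `origin` simply falls through to the ordinary inlined Java body.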
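For reference, the arithmetic that lets a constant-index slice map onto VPALIGNR can be checked with a scalar model of the 128-bit `PALIGNR` operation: concatenate two registers, shift right by an immediate byte count, and keep the low 16 bytes. For `a.slice(origin, b)` on four `int` lanes, the immediate is `origin * 4`. This is a sketch under assumptions: the operand order (`b` as the high half, `a` as the low half) and the method names are illustrative, not taken from the PR's backend. On YMM/ZMM registers VPALIGNR works independently within each 128-bit lane, so a full-width slice on wider vectors generally needs additional cross-lane handling that this model does not cover.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Scalar model of 128-bit PALIGNR, used to check that a.slice(origin, b) on
// 4 x int equals a byte-aligned extract with imm = origin * 4.
// Operand order is an assumption, not the PR's actual backend code.
class PalignrModel {

    // PALIGNR with hi as destination and lo as source: concatenate hi:lo
    // (hi in the upper 16 bytes), shift right by imm bytes, keep the low 16 bytes.
    static byte[] palignr(byte[] hi, byte[] lo, int imm) {
        byte[] concat = new byte[32];
        System.arraycopy(lo, 0, concat, 0, 16);
        System.arraycopy(hi, 0, concat, 16, 16);
        byte[] r = new byte[16];
        for (int j = 0; j < 16; j++) {
            int k = j + imm;
            r[j] = k < 32 ? concat[k] : 0; // bytes shifted in from above are zero
        }
        return r;
    }

    // x86 lanes are little-endian, so model the register bytes the same way
    static byte[] toBytes(int[] v) {
        ByteBuffer bb = ByteBuffer.allocate(v.length * Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (int x : v) {
            bb.putInt(x);
        }
        return bb.array();
    }

    static int[] toInts(byte[] bytes) {
        ByteBuffer bb = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        int[] r = new int[bytes.length / Integer.BYTES];
        for (int i = 0; i < r.length; i++) {
            r[i] = bb.getInt();
        }
        return r;
    }

    // a.slice(origin, b) for 4 int lanes: byte shift = origin * Integer.BYTES
    static int[] sliceViaPalignr(int[] a, int[] b, int origin) {
        return toInts(palignr(toBytes(b), toBytes(a), origin * Integer.BYTES));
    }

    public static void main(String[] args) {
        int[] a = {0, 1, 2, 3};
        int[] b = {4, 5, 6, 7};
        // prints [1, 2, 3, 4]
        System.out.println(Arrays.toString(sliceViaPalignr(a, b, 1)));
    }
}
```

Because the shift amount is an immediate, this mapping only works when `origin` is a compile-time constant, which is exactly the case the intrinsic targets.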

**Proposed path**

I think the most practical path is to **proceed with the hybrid call generator** for this PR so we can ship a reliable slice optimization (including for use cases like simdjson) without taking on the full cost and risk of TypeVect constant info and slice-pattern matching right now. Once this is in, we have a clear baseline: constant-index slice is optimized; variable index uses the inlined fallback.

If we later introduce `TypeVect` constant information and a more generic framework for recognizing vector idioms (e.g. patterns that consume a constant shuffle/mask or that match subgraphs like `iotaShuffle`), we can **revisit** slice: add a pattern-based recognition that either supplements or eventually replaces the intrinsic. Please note that the x86 backend implementation will still be usable if we later remove the intrinsic and use more complex pattern matching to derive `VectorSliceNode`.

Thanks again for the review.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24104#issuecomment-3924942814


More information about the core-libs-dev mailing list