RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences

Tue Jun 3 17:52:47 UTC 2025

On Tue, 27 May 2025 17:26:59 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:

> ## Summary
> 
> On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result:
> 
> ; OptoAssembly
> 03d     decode_heap_oop_not_null R8,R10
> 041     leaq    R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32
> 
> ; x86
> 0x00007f1f210625bd:   lea    (%r12,%r10,8),%r8        ; result is unused
> 0x00007f1f210625c1:   lea    0xc(%r12,%r10,8),%r10    ; the same computation as decode, but with offset
> 
> 
> This PR adds a peephole optimization to remove such redundant `lea`s.
> 
> ## The Issue in Detail
> 
> The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes
> 
> LoadN -> decodeHeapOop_not_null -> leaP*
>     ______________________________Î
> 
> where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode. Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`:
> 
> https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897
> 
> On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode,  so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers.
> 
> This leaves us with a handful of possible solutions:
>  1. implement narrow bases for derived oops in oop maps,
>  2. perform some dead code elimination after we know which oops are part of oop maps,
>  3. add a peephole optimization to simply remove unused `lea`s.
> 
> Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the register allocator. However, rewriting the oop map machinery to remove a...

src/hotspot/cpu/x86/peephole_x86_64.cpp line 244:

> 242: // the DecodeN. However, after matching the DecodeN is added back as the base for the leaP*,
> 243: // which is nessecary if the oop derived by the leaP* gets added to an OopMap, because OopMaps
> 244: // cannot contain derived oops with narrow oops as a base.

Am I correct to assume that if it is referenced in OopMap (which is side table) it will by referenced by some Safepoint node in graph?

src/hotspot/cpu/x86/peephole_x86_64.cpp line 255:

> 253: // This peephole recognizes graphs of the shape as shown above, ensures that the result of the
> 254: // decode is only used by the derived oop and removes that decode if this is the case. Futher,
> 255: // multipe leaP*s can have the same decode as their base. This peephole will remove the decode

Typo `multipe`

src/hotspot/cpu/x86/peephole_x86_64.cpp line 267:

> 265: //           |   /              \
> 266: //           leaP*          MachProj (leaf)
> 267: // In this case where te common parent of the leaP* and the decode is one MemToRegSpill Copy

Typo: `te`

src/hotspot/cpu/x86/peephole_x86_64.cpp line 268:

> 266: //           leaP*          MachProj (leaf)
> 267: // In this case where te common parent of the leaP* and the decode is one MemToRegSpill Copy
> 268: // away, this peephole can als recognize the decode as redundant and also remove the spill copy

Typo: `als`

src/hotspot/cpu/x86/peephole_x86_64.cpp line 270:

> 268: // away, this peephole can als recognize the decode as redundant and also remove the spill copy
> 269: // if that is only used by the decode.
> 270: bool Peephole::lea_remove_redundant(Block* block, int block_index, PhaseCFG* cfg_, PhaseRegAlloc* ra_,

Why do you need `_` suffix?

src/hotspot/cpu/x86/peephole_x86_64.cpp line 324:

> 322: 
> 323:   // Ensure the MachProj is in the same block as the decode and the lea.
> 324:   if (!block->contains(proj)) {

Should we check `proj == nullptr` ?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2124504924
PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2124512685
PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2124513810
PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2124514534
PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2124516088
PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2124528016