RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences [v7]
Roberto Castañeda Lozano
rcastanedalo at openjdk.org
Tue Jun 17 17:51:34 UTC 2025
On Tue, 17 Jun 2025 11:15:49 GMT, Manuel Hässig <mhaessig at openjdk.org> wrote:
>> ## Summary
>>
>> On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result:
>>
>> ; OptoAssembly
>> 03d decode_heap_oop_not_null R8,R10
>> 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32
>>
>> ; x86
>> 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused
>> 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset
>>
>>
>> This PR adds a peephole optimization to remove such redundant `lea`s.
>>
>> ## The Issue in Detail
>>
>> The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes
>>
>> LoadN -> decodeHeapOop_not_null -> leaP*
>> ______________________________Î
>>
>> where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode. Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`:
>>
>> https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897
>>
>> On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers.
>>
>> This leaves us with a handful of possible solutions:
>> 1. implement narrow bases for derived oops in oop maps,
>> 2. perform some dead code elimination after we know which oops are part of oop maps,
>> 3. add a peephole optimization to simply remove unused `lea`s.
>>
>> Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the regi...
>
> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix test flags
Looks good!
src/hotspot/cpu/x86/peephole_x86_64.cpp line 351:
> 349: Node* dependant_lea = decode->fast_out(i);
> 350: if (dependant_lea->is_Mach() && dependant_lea->as_Mach()->ideal_Opcode() == Op_AddP) {
> 351:
Nit: you could remove this empty line.
test/hotspot/jtreg/compiler/codegen/TestRedundantLea.java line 146:
> 144: }
> 145: i += 1;
> 146: }
Now that you only have one dimension, I think it is simpler to replace this loop with:
scenarios[0] = new Scenario(0, "-XX:+IgnoreUnrecognizedVMOptions", "-XX:-OptoPeephole");
scenarios[1] = new Scenario(1, "-XX:+IgnoreUnrecognizedVMOptions", "-XX:+OptoPeephole");
test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 895:
> 893: machOnly(LEA_P_32_NARROW, "leaP32Narrow");
> 894: }
> 895:
This rule is unused and could be removed.
-------------
Marked as reviewed by rcastanedalo (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25471#pullrequestreview-2936529256
PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2152811481
PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2152823062
PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2152818974
More information about the hotspot-compiler-dev
mailing list