RFR: 8373480: Optimize multiplication by constant multiplier using LEA instructions [v6]

Emanuel Peter epeter at openjdk.org
Thu Jan 8 16:10:08 UTC 2026


On Mon, 22 Dec 2025 12:09:01 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Emulate multiplier using LEA addressing scheme, where effective address = BASE + INDEX * SCALE + OFFSET
>> Refer to section "3.5.1.2 Using LEA" of Intel's optimization manual for details reagarding slow vs fast lea instructions.
>> Given that latency of IMUL with register operands is 3 cycles, a combination of two fast LEAs each with 1 cycle latency to emulate multipler is performant.
>> 
>> Consider X as the multiplicand, by variying the scale of  first LEA instruction we can generate 4 input i.e.
>> 
>> 
>>     X + X * 1 = 2X
>>     X + X * 2 = 3X
>>     X + X * 4 = 5X
>>     X + X * 8 = 9X
>> 
>> 
>> Following table list downs various multiplier combinations for output of first LEA at BASE and/or INDEX by varying the 
>> scale of second fast LEA instruction. We will only handle the cases which cannot be handled by just shift + add.
>> 
>> 
>>       BASE   INDEX   SCALE  MULTIPLER
>>         X      X       1       2       (Terminal)
>>         X      X       2       3       (Terminal)
>>         X      X       4       5       (Terminal)
>>         X      X       8       9       (Terminal)
>>        3X     3X       1       6
>>         X     3X       2       7
>>        5X     5X       1      10
>>         X     5X       2      11
>>         X     3X       4      13
>>        5X     5X       2      15
>>         X     2X       8      17
>>        9X     9X       1      18
>>         X     9X       2      19
>>         X     5X       4      21
>>        5X     5X       4      25
>>        9X     9X       2      27
>>         X     9X       4      37
>>         X     5X       8      41
>>        9X     9X       4      45
>>         X     9X       8      73
>>        9X     9X       8      81
>> 
>> 
>> All the non-unity inputs tied to BASE / INDEX  are derived out of terminal cases which represent first FAST LEA. Thus, all the multipliers can be computed using just two LEA instructions.
>> 
>> Performance numbers for new micro benchmark included with this patch shows around **5-50% improvments** on latest x86 servers.
>> 
>> 
>> System: INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.10GHz Emerald Rapids:-
>> Baseline:-
>> Benchmark                                             Mode  Cnt    Score   Error    Units
>> ConstantMultiplierOptimization.testConstMultiplierI  thrpt    2  189.690          ops/min
>> ConstantMultiplierOptimization.testConstMultiplierL  thrpt    2  196.283          ops/min
>> 
>> 
>> Withopt:-
>> Benchmark                                             Mode  Cnt    Score   Error    Units
>> Constant...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Extending micro and jtreg tests for memory patterns

Looks like a neat optimization :)
I have a few initial comments.

test/hotspot/jtreg/compiler/c2/TestConstantMultiplier.java line 89:

> 87:         ));
> 88:         var testTemplate1 = Template.make(() -> scope(
> 89:             IntStream.of(81, 73, 45, 41, 37, 27, 25, 21, 19, 13, 11).mapToObj(

Is there something special about these values? If yes: add a code comment :)
If no: could we add random values to the list to improve coverage and find edge cases?

test/hotspot/jtreg/compiler/c2/TestConstantMultiplier.java line 102:

> 100:                         private static void runMultBy#{multiplier}I() {
> 101:                             int multiplicand = RANDOM.nextInt();
> 102:                             Verify.checkEQ(#{multiplier} * multiplicand, testMultBy#{multiplier}I(multiplicand));

I think that the `@Run` method also gets compiled, so probably both sides of the verification are compiled. Is that your intention? Probably not, right?

-------------

PR Review: https://git.openjdk.org/jdk/pull/28759#pullrequestreview-3640094256
PR Review Comment: https://git.openjdk.org/jdk/pull/28759#discussion_r2672927447
PR Review Comment: https://git.openjdk.org/jdk/pull/28759#discussion_r2672963980


More information about the hotspot-compiler-dev mailing list