RFR: 8346964: C2: Improve integer multiplication with constant in MulINode::Ideal()
Emanuel Peter
epeter at openjdk.org
Mon Feb 10 21:25:02 UTC 2025
On Wed, 8 Jan 2025 02:06:11 GMT, erifan <duke at openjdk.org> wrote:
>> Is this an improvement on aarch64 for all implementations? What about x64?
>
> If `a*6` is in a loop and can be vectorized, there may be a big performance improvement. If it is not vectorized, there may be a small performance loss. See the test results for `a*18 = (a<<4) + (a<<1)` (the same applies to `a*6 = (a<<2) + (a<<1)`) in three different cases:
>
>
> Benchmark        V2-now  V2-after  Uplift  Genoa-now  Genoa-after  Uplift  Notes
> testInt18         98.90    102.94    0.96     142.48       140.75    1.01  scalar
> testInt18AddSum   68.70     48.10    1.42      26.88        16.78    1.6   vectorized
> testInt18Store    41.31     43.39    0.95      21.23        20.88    1.01  vectorized
>
>
> We can see that for the scalar case the conversion `a*6 => (a<<2) + (a<<1)` is profitable on aarch64. I have a follow-up patch that reimplements this pattern in the aarch64 backend, and I'll file it later. For x64, there is no obvious performance change whether or not we do this conversion, which is why I left a TODO in [mulnode.cpp](https://github.com/openjdk/jdk/pull/22922/files/193dc4e5760007784cffd64ef14e0050b0be92b3#diff-b1bd52f0743843e15452764f48ff43c15dd3192a28bfb684b34149f0e964996e)
> Benchmark        V2-now  V2-after  Uplift  Genoa-now  Genoa-after  Uplift  Notes
> testInt18         98.90    102.94    0.96     142.48       140.75    1.01  scalar
Ok, that would be a 4% regression on V2. It is not much, but still possibly relevant.
I think I would need to see a clear strategy that we can actually pull off. Otherwise you may introduce a regression here that nobody gets around to fixing later.
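For reference, a minimal Java sketch of the two scalar shapes being compared. The class and method names are made up for illustration and are not taken from the PR's benchmark. It only checks that the shift-add forms agree with the plain multiplications, including under int overflow; it says nothing about which form is faster on a given CPU.

```java
public class MulByConstSketch {
    // a * 6 as shift-add, since 6 = 4 + 2 (hypothetical helper, not from the PR)
    static int mul6ShiftAdd(int a) {
        return (a << 2) + (a << 1);
    }

    // a * 18 as shift-add, since 18 = 16 + 2 (hypothetical helper, not from the PR)
    static int mul18ShiftAdd(int a) {
        return (a << 4) + (a << 1);
    }

    public static void main(String[] args) {
        // Include extreme values: int multiply, shift, and add all wrap modulo 2^32,
        // so the two forms must agree even when the result overflows.
        int[] samples = {0, 1, -1, 7, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            if (a * 6 != mul6ShiftAdd(a) || a * 18 != mul18ShiftAdd(a)) {
                throw new AssertionError("mismatch for a = " + a);
            }
        }
        System.out.println("shift-add forms match for all samples");
    }
}
```

Whether C2 should emit the multiply or the shift-add form is exactly the per-platform question above; the sketch only pins down what the two forms are.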
>> Also: this is very confusing: why does the result differ depending on `AlignVector`?
>
> Without this patch, because of the VPointer issue you mentioned, this loop does not vectorize at all.
> With this patch, `i*6` is not changed to `(i<<2) + (i<<1)`, so the VPointer issue is bypassed. So if `AlignVector` is false, this loop will be vectorized.
>
>> why does the result differ depending on `AlignVector` ?
>
> Because this loop operates on a discontinuous and unaligned array address space. If we require aligned vectors (that is, `AlignVector` is true), this loop will not be vectorized; otherwise it will be vectorized.
Right, that makes sense. I actually thought about that too after work yesterday evening.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/22922#discussion_r1906565583
PR Review Comment: https://git.openjdk.org/jdk/pull/22922#discussion_r1906561384