RFR: 8318650: Optimized subword gather for x86 targets. [v14]
Emanuel Peter
epeter at openjdk.org
Mon Feb 26 13:27:00 UTC 2024
On Mon, 26 Feb 2024 13:09:22 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1584:
>>
>>> 1582: if (elem_bt == T_SHORT) {
>>> 1583: Label case0, case1, case2, case3;
>>> 1584: Label* larr[] = {&case0, &case1, &case2, &case3};
>>
>> Not sure if I asked this already: why define them all here, rather than locally in the loop?
>
> To avoid invariant initializations happening within the loop. The compiler will likely unroll this small loop and forward the initializations; if it does not, we still save a redundant allocation within the loop.
At the risk of becoming too nit-picky: which allocations are you talking about? Given that you only have a single src and a single dst for this label/jump, you won't use `_patch_overflow`, and therefore all allocations are on the stack. The way you do it now, it seems you would allocate 4x the stack memory here, compared to doing it locally in the loop, where the stack space could potentially be reused between iterations.
It seems to me this is an optimization at the cost of code style. Having the labels local makes it clearer that you are only jumping inside an iteration, and not between iterations.
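To illustrate the two styles, here is a minimal sketch. It does not use the real HotSpot classes: `Label` and `ToyAsm` below are hypothetical stand-ins that only log what would be emitted, and the names `emit_hoisted` and `emit_local` are made up for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for HotSpot's Label: it only records a bind position.
struct Label {
  int pos = -1;
  void bind(int p) { pos = p; }
};

// Hypothetical toy assembler that logs pseudo-instructions instead of emitting code.
struct ToyAsm {
  std::vector<std::string> code;
  void jcc(Label&, int i) { code.push_back("jcc skip" + std::to_string(i)); }
  void load(int i) { code.push_back("load lane " + std::to_string(i)); }
  void bind(Label& l, int i) {
    l.bind(static_cast<int>(code.size()));
    code.push_back("skip" + std::to_string(i) + ":");
  }
};

// Style A: all labels declared up front, as in the patch under review.
// All four labels stay live for the whole function.
std::vector<std::string> emit_hoisted() {
  ToyAsm masm;
  Label case0, case1, case2, case3;
  Label* larr[] = {&case0, &case1, &case2, &case3};
  for (int i = 0; i < 4; i++) {
    masm.jcc(*larr[i], i);   // conditional jump over the load
    masm.load(i);
    masm.bind(*larr[i], i);  // jump target, still inside iteration i
  }
  return masm.code;
}

// Style B: one label local to the loop body.
// The scope itself shows that no jump crosses an iteration boundary.
std::vector<std::string> emit_local() {
  ToyAsm masm;
  for (int i = 0; i < 4; i++) {
    Label skip;
    masm.jcc(skip, i);
    masm.load(i);
    masm.bind(skip, i);
  }
  return masm.code;
}
```

Both functions log the same sequence in this toy model; the difference is only in how many label objects are live at once and in what the scoping tells the reader.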
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1502606358