RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register
Robbin Ehn
rehn at openjdk.org
Thu May 16 07:58:02 UTC 2024
On Wed, 15 May 2024 09:34:11 GMT, Robbin Ehn <rehn at openjdk.org> wrote:
> Hi, please consider!
>
> Using an additional register, we can materialize a 48-bit pointer with:
> lui + lui + slli + add + addi
> This is 15% faster than movptr(), both on VF2 and in CPU models.
>
> As we often materialize during calls, there are free registers.
>
> I have chosen just a few spots to use it; many more could use it.
> E.g. la() with a tmp register can use li48 instead of movptr.
>
> Running tests now (so far so good); if I screwed up IC calls, it should show up quickly.
> Benchmarks will follow when hardware is free.
Yes, but it's a long-term job, as you need to free a register in many cases (at non-call sites).
All call sites should be easy to change, as you have plenty of callee-saved registers which are already saved when using movptr.
As li48 is faster than li when materializing more than 32 bits, these cases should also use li48.
I.e. mv t0, addr
But mv is fishy, partly because of the RegisterOrConstant constructor, so we can't tell in mv whether the value was an address or not.
I have been looking into cleaning that up, so that mv with a literal and mv with an address are two separate cases.
One way to keep them apart would be to use e.g. "li reg, literal" and "li48 reg, temp_reg, address".
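
To illustrate the ambiguity, here is a minimal C++ sketch. It is not HotSpot code: RegOrConstSketch, Literal, Address and the string-returning mv/li/li48 functions are hypothetical stand-ins, assumed only for illustration. It shows how an implicitly constructed wrapper hides whether the caller passed a literal or an address, while distinct argument types keep the two cases apart.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for a wrapper with implicit constructors.
// Once constructed, only the raw value survives; its origin is lost.
struct RegOrConstSketch {
  intptr_t value;
  RegOrConstSketch(intptr_t v) : value(v) {}
  RegOrConstSketch(const void* p) : value(reinterpret_cast<intptr_t>(p)) {}
};

// Inside mv there is no way to tell a literal from an address.
std::string mv(RegOrConstSketch) { return "unknown"; }

// Distinct types make the caller state what is being materialized.
struct Literal { int64_t value; };
struct Address { const void* ptr; };

std::string li(Literal) { return "li"; }      // literal path
std::string li48(Address) { return "li48"; }  // address path (would also take a temp register)

// Sample object whose address is used in the usage lines below.
static const int sample_target = 0;
```

With this sketch, mv(42) and mv(&sample_target) both end up on the same path, whereas li(Literal{42}) and li48(Address{&sample_target}) cannot be confused.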
As there is much work, this PR is intended as the first step, with the hardest pieces implemented already, i.e. li48 is ready to go.
If we also fix mov_metadata (la() -> li48), we reduce the static call stub size from 12 to 10 instructions, which is significant.
That one is on my todo list.
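
For readers who want to see why lui + lui + slli + add + addi can produce a 48-bit pointer, below is a rough C++ model of the sequence. It is a sketch under assumptions, not the code in the PR: the names split48 and materialize48 are made up, and the 18/18/12 bit split is just one way to choose the immediates. The only ISA facts relied on are that lui on RV64 places a 20-bit immediate in bits 31..12 and sign-extends the result, and that addi sign-extends its 12-bit immediate.

```cpp
#include <cassert>
#include <cstdint>

// Immediates for the five-instruction sequence (illustrative names).
struct Li48Parts {
  uint32_t upper;  // immediate for the first lui  (bits 47..30 of the adjusted address)
  uint32_t mid;    // immediate for the second lui (bits 29..12 of the adjusted address)
  int32_t  lo12;   // sign-extended 12-bit immediate for the final addi
};

Li48Parts split48(uint64_t addr) {
  assert(addr < (1ULL << 48));  // sketch only handles 48-bit values
  // addi sign-extends, so read the low 12 bits as a value in [-2048, 2047].
  int32_t lo12 = static_cast<int32_t>((addr & 0xfff) ^ 0x800) - 0x800;
  // Subtract lo12 (unsigned wrap-around); the remainder is a multiple of 4096.
  uint64_t rest = addr - static_cast<uint64_t>(static_cast<int64_t>(lo12));
  Li48Parts p;
  p.upper = static_cast<uint32_t>(rest >> 30);
  p.mid   = static_cast<uint32_t>((rest >> 12) & 0x3ffff);
  p.lo12  = lo12;
  return p;
}

// Model of lui on RV64: imm20 goes to bits 31..12, result is sign-extended.
int64_t lui(uint32_t imm20) {
  return static_cast<int64_t>(static_cast<int32_t>(imm20 << 12));
}

// Executes the modeled sequence and returns the value left in rd.
uint64_t materialize48(uint64_t addr) {
  Li48Parts p = split48(addr);
  int64_t rd  = lui(p.upper);  // lui  rd,  upper
  int64_t tmp = lui(p.mid);    // lui  tmp, mid
  rd = rd << 18;               // slli rd,  rd, 18
  rd = rd + tmp;               // add  rd,  rd, tmp
  rd = rd + p.lo12;            // addi rd,  rd, lo12
  return static_cast<uint64_t>(rd);
}
```

In this model the two lui instructions do not depend on each other, and both lui immediates stay small enough that their sign-extension never triggers; only the addi immediate needs the sign adjustment.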
-------------
PR Comment: https://git.openjdk.org/jdk/pull/19246#issuecomment-2114315982