RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v2]

Ludovic Henry luhenry at openjdk.org
Thu Nov 3 15:22:17 UTC 2022


On Thu, 3 Nov 2022 12:16:41 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> The offset needs to be aligned on 32 bytes (the lower 5 bits must be zero). There is then no guarantee that `$mem$$base + ($mem$$disp & ~((1<<5)-1))` is still on the same cache line. It's then easier to do a prefetch of `base+disp` with `offset = 0`.
>
> But what if we are passed some $mem$$disp which is a multiple of 32 and thus satisfies the constraint? Then this "addi" instruction could be optimized out, right? Because we could encode $mem$$disp in the offset field.

We want to guarantee that `$mem$$base + $mem$$disp` is on the same cache line as `$mem$$base + ($mem$$disp & ~0x1f)` or even `$mem$$base + ($mem$$disp & ~0x1f) + 0x20`. It's easy to find cases that break each of these two candidate solutions:
1. for `$mem$$base + ($mem$$disp & ~0x1f)`, if `$mem$$base = 0x30` and `$mem$$disp = 0x10`, then `$mem$$base + $mem$$disp = 0x40`, while `$mem$$base + ($mem$$disp & ~0x1f) = 0x30`, and the two are not on the same 64-byte cache line.
2. for `$mem$$base + ($mem$$disp & ~0x1f) + 0x20`, if `$mem$$base = 0x30` and `$mem$$disp = 0x0`, then `$mem$$base + $mem$$disp = 0x30`, while `$mem$$base + ($mem$$disp & ~0x1f) + 0x20 = 0x50`, and again the two are not on the same 64-byte cache line.
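
The two counterexamples above can be checked mechanically. The following is only an illustrative sketch, not code from the patch: it assumes 64-byte cache lines as in the examples, and the helper names (`cache_line`, `round_down`, `round_down_plus_32`) are made up for this illustration.

```cpp
#include <cstdint>

// Assumption for illustration: 64-byte cache lines, as in the examples above.
constexpr std::uint64_t kCacheLineSize = 64;

// Start address of the cache line that contains addr.
constexpr std::uint64_t cache_line(std::uint64_t addr) {
  return addr & ~(kCacheLineSize - 1);
}

constexpr bool same_cache_line(std::uint64_t a, std::uint64_t b) {
  return cache_line(a) == cache_line(b);
}

// Candidate 1: clear the lower 5 bits of the displacement.
constexpr std::uint64_t round_down(std::uint64_t base, std::uint64_t disp) {
  return base + (disp & ~std::uint64_t{0x1f});
}

// Candidate 2: clear the lower 5 bits, then add 0x20.
constexpr std::uint64_t round_down_plus_32(std::uint64_t base, std::uint64_t disp) {
  return base + (disp & ~std::uint64_t{0x1f}) + 0x20;
}

// Case 1: base = 0x30, disp = 0x10. The real target is 0x40, candidate 1 gives 0x30.
static_assert(!same_cache_line(0x30 + 0x10, round_down(0x30, 0x10)),
              "candidate 1 lands on a different cache line");

// Case 2: base = 0x30, disp = 0x0. The real target is 0x30, candidate 2 gives 0x50.
static_assert(!same_cache_line(0x30 + 0x0, round_down_plus_32(0x30, 0x0)),
              "candidate 2 lands on a different cache line");
```

Both `static_assert`s hold at compile time, which matches the arithmetic in the two cases.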

The simplest and most accurate solution is to emit the `__ addi` just before the prefetch. Given that it's an immediate add and the disp values are at a maximum of `64 * CacheLineSize`, it's always going to fit in the `addi` immediate.

I can add a compile-time check on the value of `$mem$$disp`: if it is a multiple of `32`, we can encode it in the offset field directly; otherwise we use `addi`.
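
A rough sketch of what such a check could decide, written as standalone C++ rather than as the actual `.ad` change. `PrefetchPlan` and `plan_prefetch` are hypothetical names, and the range test assumes the prefetch offset is a signed 12-bit value whose lower 5 bits must be zero.

```cpp
#include <cstdint>

// Hypothetical description of what to emit for a given displacement.
struct PrefetchPlan {
  bool needs_addi;               // true: emit addi first, then prefetch with offset 0
  std::int64_t addi_imm;         // immediate for the addi (only meaningful if needs_addi)
  std::int64_t prefetch_offset;  // value placed in the prefetch offset field
};

// Decide whether disp can go straight into the prefetch offset field.
// Assumption: the offset is a signed 12-bit value with the lower 5 bits zero.
constexpr PrefetchPlan plan_prefetch(std::int64_t disp) {
  const bool aligned = (disp & 0x1f) == 0;
  const bool fits = disp >= -2048 && disp <= 2047;
  if (aligned && fits) {
    return {false, 0, disp};  // no addi needed
  }
  return {true, disp, 0};     // fall back to addi + prefetch with offset 0
}

static_assert(!plan_prefetch(0x40).needs_addi, "multiple of 32: use the offset field");
static_assert(plan_prefetch(0x10).needs_addi, "not a multiple of 32: use addi");
```

In both branches the prefetched address is exactly `base + disp`, so the cache-line concern above does not arise.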

-------------

PR: https://git.openjdk.org/jdk/pull/10884

