RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V [v2]

Fei Yang fyang at openjdk.org
Thu Nov 3 12:25:39 UTC 2022


On Wed, 2 Nov 2022 13:19:56 GMT, Ludovic Henry <luhenry at openjdk.org> wrote:

>> src/hotspot/cpu/riscv/riscv.ad line 5196:
>> 
>>> 5194: 
>>> 5195:   ins_encode %{
>>> 5196:     __ addi(t0, as_Register($mem$$base), $mem$$disp);
>> 
>> This might be further improved as I see prefetch instructions can receive some immediate offset.
>
> The offset needs to be aligned to 32 bytes (the lower 5 bits must be zero). There is then no guarantee that `$mem$$base + ($mem$$disp & ~((1<<5)-1))` is still on the same cache line. It's then easier to do a prefetch of `base+disp` with `offset = 0`.

But what if we are passed some $mem$$disp which is a multiple of 32 and thus satisfies the constraint? Then this "addi" instruction could be optimized out, right? Because we could encode $mem$$disp in the offset field.
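
To make the condition concrete, here is a minimal sketch of the check I have in mind. The helper name `fits_prefetch_offset` is hypothetical and not part of the patch. It assumes the prefetch offset field is a 12-bit signed immediate whose low 5 bits are zero, so only multiples of 32 in [-2048, 2016] qualify:

```cpp
#include <cstdint>

// Hypothetical helper, not part of the patch: returns true when `disp`
// could be placed directly in the prefetch offset field.
// Assumption: the field is a 12-bit signed immediate whose low 5 bits
// must be zero, i.e. a multiple of 32 in the range [-2048, 2016].
static bool fits_prefetch_offset(int64_t disp) {
  const int64_t low_bits_mask = (1 << 5) - 1;   // 0x1f
  const bool aligned = (disp & low_bits_mask) == 0;
  const bool in_range = disp >= -2048 && disp <= 2047;
  return aligned && in_range;
}
```

When this returns true the encoding could pass $mem$$disp as the offset and skip the "addi"; otherwise it would fall back to the current `addi` plus `offset = 0` sequence.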

>> src/hotspot/os_cpu/linux_riscv/prefetch_linux_riscv.inline.hpp line 36:
>> 
>>> 34:         (void (*)(const void*, intptr_t))StubRoutines::riscv::prefetch_r();
>>> 35:     if (interval >= 0 && stub != NULL) {
>>> 36:         stub(loc, interval);
>> 
>> I am not sure if it is really worth it to call a stub for read / write here. It looks to me that the case the stub tries to catch and resolve is not a big issue. And I see aarch64 simply plants a 'prfm' instruction for prefetching [1]. I guess we could do the same?
>> 
>> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_aarch64/prefetch_linux_aarch64.inline.hpp#L34
>
> We would need to check for `UseZicbop` in any case; the access to a global variable is then required.
> 
> It would be the same issue as https://github.com/openjdk/jdk/pull/10884/files/e968f7164124dcf560807c9ff7765e6f82b64cdd#diff-e3c18b8b83898e82b5a3069319df6a47468e91cc2527bf065e704a685a20f26bR5196 without the stub.
> 
> I've to admit that the `interval` naming here is confusing since no implementation ever uses it as an interval but always as an offset. Also, the callers assume it to be an offset, like `ContiguousSpace::prepare_for_compaction` for example.

Yes, I agree that a check for UseZicbop option would be necessary. But I still don't understand why we should implement this through a stub here. It looks to me that CPP code with inline assembly would also do. At least this could help eliminate the prologue & epilogue cost of calling the stub.
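
Something along these lines is what I mean; this is only a rough sketch under assumptions, not a proposed patch. `use_zicbop_sketch` stands in for the real UseZicbop flag and `prefetch_read_sketch` is a made-up name. The `.option arch, +zicbop` directive assumes a toolchain recent enough to know the extension, and the non-RISC-V branch exists only so the sketch builds elsewhere with GCC or Clang:

```cpp
#include <cstdint>

// Stand-in for the real UseZicbop VM flag (hypothetical, illustration only).
static bool use_zicbop_sketch = true;

// Hypothetical inline replacement for the stub call: prefetch for read at
// loc + interval, where `interval` is treated as a byte offset.
inline void prefetch_read_sketch(const void* loc, intptr_t interval) {
  if (interval < 0 || !use_zicbop_sketch) {
    return;  // nothing to do when the extension is disabled
  }
  // Compute the target address with integer arithmetic.
  const uintptr_t addr = reinterpret_cast<uintptr_t>(loc) +
                         static_cast<uintptr_t>(interval);
#if defined(__riscv)
  // Assumption: the assembler accepts ".option arch, +zicbop".
  __asm__ volatile(
      ".option push\n\t"
      ".option arch, +zicbop\n\t"
      "prefetch.r 0(%0)\n\t"
      ".option pop"
      :
      : "r"(addr)
      : "memory");
#else
  // Portable fallback so the sketch compiles on other targets.
  __builtin_prefetch(reinterpret_cast<const void*>(addr), 0);
#endif
}
```

The flag is still read from a global, as you point out, but there is no call, so no prologue & epilogue around the single prefetch instruction.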

-------------

PR: https://git.openjdk.org/jdk/pull/10884