RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v10]

Mon May 19 06:26:51 UTC 2025

On Sun, 18 May 2025 12:10:59 GMT, Anjian-Wen <duke at openjdk.org> wrote:

>> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1749:
>> 
>>> 1747:     __ addi(to, to, 1);
>>> 1748:     __ subi(count, count, 1);
>>> 1749:     __ bnez(count, L_loop);
>> 
>> If we unroll the byte storage, will there be additional performance gains when the `count` is less than 8?
>
> Yes, I think unrolling the byte stores would normally give some additional performance. But sometimes the dest address may not be aligned with the count, which makes performance very poor on some alignment-sensitive hardware. An additional alignment step would be required, so storing the bytes one by one in a loop seems like a simple approach with limited performance loss by comparison?

The current version is okay. What I mean is that we can unroll the byte stores and save some `bnez` branches by eliminating the loop, something like this:

bind(unroll_4);
test_bit(tmp, count, 2);
beqz(tmp, unroll_2);
sb(value, Address(dest, 0));
sb(value, Address(dest, 1));
sb(value, Address(dest, 2));
sb(value, Address(dest, 3));
addi(dest, dest, 4);
subi(count, count, 4);

bind(unroll_2);
test_bit(tmp, count, 1);
beqz(tmp, unroll_1);
sb(value, Address(dest, 0));
sb(value, Address(dest, 1));
addi(dest, dest, 2);
subi(count, count, 2);

bind(unroll_1);
test_bit(tmp, count, 0);
beqz(tmp, end);
sb(value, Address(dest, 0));
addi(dest, dest, 1);
subi(count, count, 1);

bind(end);
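
For illustration only, the same idea can be modelled in plain C++ (this is not HotSpot assembler code). It is a minimal sketch that assumes `count` is less than 8, as in the question above; the names `fill_tail_loop`, `fill_tail_unrolled` and `tails_match` are made up for this example. Each set bit of `count` selects one straight-line group of byte stores, so the tail needs at most three forward branches instead of one backward branch per byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reference version: one store and one loop branch per byte.
inline void fill_tail_loop(uint8_t* dest, size_t count, uint8_t value) {
  while (count != 0) {
    *dest++ = value;
    --count;
  }
}

// Unrolled version: test bits 2, 1 and 0 of count in turn and emit
// 4, 2 or 1 byte stores for each bit that is set.
inline void fill_tail_unrolled(uint8_t* dest, size_t count, uint8_t value) {
  assert(count < 8);  // assumption of this sketch: only the sub-8-byte tail
  if (count & 4) {    // corresponds to the unroll_4 group
    dest[0] = value;
    dest[1] = value;
    dest[2] = value;
    dest[3] = value;
    dest += 4;
  }
  if (count & 2) {    // corresponds to the unroll_2 group
    dest[0] = value;
    dest[1] = value;
    dest += 2;
  }
  if (count & 1) {    // corresponds to the unroll_1 group
    dest[0] = value;
  }
}

// Returns true when both versions leave identical buffer contents.
inline bool tails_match(size_t count, uint8_t value) {
  uint8_t a[8] = {0};
  uint8_t b[8] = {0};
  fill_tail_loop(a, count, value);
  fill_tail_unrolled(b, count, value);
  return std::memcmp(a, b, sizeof a) == 0;
}
```

Calling `tails_match(n, v)` for every `n` from 0 to 7 is a quick way to check that the unrolled form writes exactly the same bytes as the loop. Note that it still uses byte stores only, so it does not add any alignment requirement on the destination.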

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2094920943