RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v10]
Anjian-Wen
duke at openjdk.org
Mon May 19 06:47:53 UTC 2025
On Mon, 19 May 2025 06:22:21 GMT, Feilong Jiang <fjiang at openjdk.org> wrote:
>> Yes, I think that if we unroll the byte stores we would normally gain additional performance. But sometimes the dest address is not suitably aligned for the count being stored, which makes performance very poor on some alignment-sensitive hardware. Extra alignment handling would then be required, so storing the bytes one by one in a loop seems like a simpler approach, with limited performance loss compared to that?
>
> The current version is okay. What I mean is that we can unroll the byte stores and save some "bnez" instructions by eliminating the loop, something like this:
>
> bind(unroll_4);
> test_bit(tmp, count, 2);
> beqz(tmp, unroll_2);
> sb(value, Address(dest, 0));
> sb(value, Address(dest, 1));
> sb(value, Address(dest, 2));
> sb(value, Address(dest, 3));
> addi(dest, dest, 4);
> subi(count, count, 4);
>
> bind(unroll_2);
> test_bit(tmp, count, 1);
> beqz(tmp, unroll_1);
> sb(value, Address(dest, 0));
> sb(value, Address(dest, 1));
> addi(dest, dest, 2);
> subi(count, count, 2);
>
> bind(unroll_1);
> test_bit(tmp, count, 0);
> beqz(tmp, end);
> sb(value, Address(dest, 0));
> addi(dest, dest, 1);
> subi(count, count, 1);
>
> bind(end);
I understand, that makes sense. It seems this can save 4 jumps when count is 7. I will test it later, thanks!!
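To make the branch-count argument concrete, here is a minimal C++ sketch (not HotSpot code) that models both tail strategies on a plain byte buffer. The function names `fill_loop` and `fill_unrolled_tail` are hypothetical. It assumes the count reaching the unrolled tail is already below 8, and it tallies only the per-byte `bnez` of the loop and the three `test_bit`/`beqz` pairs of the unrolled version; any entry check before the loop is not counted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the byte-at-a-time loop: one store and one
// loop-back branch (bnez) per byte. The loop's entry check is not counted.
std::size_t fill_loop(std::uint8_t* dest, std::size_t count, std::uint8_t value) {
  std::size_t branches = 0;
  while (count != 0) {
    *dest++ = value;  // sb(value, Address(dest, 0)); addi(dest, dest, 1)
    --count;          // subi(count, count, 1)
    ++branches;       // one bnez(count, loop) per stored byte
  }
  return branches;
}

// Hypothetical model of the unrolled tail: test bits 2, 1 and 0 of count,
// with exactly one conditional branch per bit regardless of count.
std::size_t fill_unrolled_tail(std::uint8_t* dest, std::size_t count, std::uint8_t value) {
  assert(count < 8);  // assumption: larger counts were handled earlier
  std::size_t branches = 0;

  ++branches;         // test_bit(tmp, count, 2); beqz(tmp, unroll_2)
  if (count & 4) {
    dest[0] = value; dest[1] = value; dest[2] = value; dest[3] = value;
    dest += 4; count -= 4;
  }

  ++branches;         // test_bit(tmp, count, 1); beqz(tmp, unroll_1)
  if (count & 2) {
    dest[0] = value; dest[1] = value;
    dest += 2; count -= 2;
  }

  ++branches;         // test_bit(tmp, count, 0); beqz(tmp, end)
  if (count & 1) {
    dest[0] = value;
    dest += 1; count -= 1;
  }
  return branches;
}
```

Under this counting, count = 7 gives 7 branches for the loop and 3 for the unrolled tail, i.e. 4 fewer, which is consistent with the estimate above. Whether that turns into a measurable speedup still depends on the core's branch predictor and needs to be measured.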
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2094953863
More information about the hotspot-compiler-dev mailing list