RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v9]
Vladimir Kempik
vkempik at openjdk.org
Thu May 11 05:09:48 UTC 2023
On Wed, 10 May 2023 14:24:30 GMT, Vladimir Kempik <vkempik at openjdk.org> wrote:
>> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1166:
>>
>>> 1164: slli(cnt1, cnt1, LogBitsPerByte);
>>> 1165: sll(tmp1, tmp1, cnt1);
>>> 1166: bnez(tmp1, DONE);
>>
>> I guess the following sequence would help better utilize the instruction pipeline stall:
>>
>> ld(tmp1, Address(a1));
>> ld(tmp2, Address(a2));
>> neg(cnt1, cnt1);
>> slli(cnt1, cnt1, LogBitsPerByte);
>> xorr(tmp1, tmp1, tmp2);
>> sll(tmp1, tmp1, cnt1);
>> bnez(tmp1, DONE);
>
> That is hard to say.
>
> Out-of-order microarchitectures such as T-Head's do not care where the xor instruction is placed here.
>
> In-order microarchitectures such as the U74 (HiFive) might be affected by such a change. However, the memory at addresses a1/a2 is very likely already in the L1 data cache because of earlier accesses in the same function, so the loads will be pretty cheap.
> The U74 is dual-issue, so it may execute these two loads (from L1D) in parallel; with these addresses cached in L1D, the effect of such an optimisation would be hard to spot.
>
> To say for sure, we need to check with the JMH test org.openjdk.bench.java.lang.StringEquals on a HiFive.
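For reference, here is roughly what the quoted sequence computes for the last (possibly partial) word. This is only a minimal C++ sketch, not HotSpot code: `load_le64` and `tail_differs` are hypothetical names, and it assumes a little-endian 64-bit target where `cnt` is the number of leading bytes that still need comparing (1..8) and both pointers have 8 readable bytes.

```cpp
#include <cstdint>

// Hypothetical helper: assemble 8 bytes into a little-endian 64-bit word,
// standing in for ld(tmp, Address(a)) on RV64.
static uint64_t load_le64(const unsigned char* p) {
    uint64_t w = 0;
    for (int i = 7; i >= 0; --i) {
        w = (w << 8) | p[i];
    }
    return w;
}

// Hypothetical model of the ld/ld/neg/slli/xorr/sll/bnez sequence.
// Returns true when the first `cnt` bytes of a1 and a2 differ.
bool tail_differs(const unsigned char* a1, const unsigned char* a2, int64_t cnt) {
    uint64_t w1 = load_le64(a1);            // ld(tmp1, Address(a1))
    uint64_t w2 = load_le64(a2);            // ld(tmp2, Address(a2))
    uint64_t diff = w1 ^ w2;                // xorr(tmp1, tmp1, tmp2)
    // neg + slli give -cnt * 8; RV64 sll only uses the low 6 bits of the
    // shift amount, which the explicit mask reproduces here.
    uint64_t shift = (static_cast<uint64_t>(-cnt) * 8) & 63;
    // Shifting left pushes the bytes beyond `cnt` out of the register,
    // so only the bytes that belong to the strings can make this non-zero.
    return (diff << shift) != 0;            // sll + bnez(tmp1, DONE)
}
```

Moving the xor only changes where it sits relative to the loads and the shift-amount computation; the value tested by the final branch is the same in both orderings.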
Before the PR:

Benchmark                      Mode  Cnt     Score   Error  Units
StringEquals.almostEqual       avgt   25  1214.131 ± 4.400  ns/op
StringEquals.almostEqualUTF16  avgt   25  1213.310 ± 7.156  ns/op
StringEquals.different         avgt   25    20.102 ± 2.306  ns/op
StringEquals.differentCoders   avgt   25    14.780 ± 1.147  ns/op
StringEquals.equal             avgt   25  1218.393 ± 5.275  ns/op
StringEquals.equalsUTF16       avgt   25  1216.750 ± 4.383  ns/op

With this PR:

Benchmark                      Mode  Cnt     Score   Error  Units
StringEquals.almostEqual       avgt   25    28.584 ± 1.178  ns/op
StringEquals.almostEqualUTF16  avgt   25    28.375 ± 1.052  ns/op
StringEquals.different         avgt   25    19.572 ± 1.031  ns/op
StringEquals.differentCoders   avgt   25    14.969 ± 2.348  ns/op
StringEquals.equal             avgt   25    28.603 ± 0.148  ns/op
StringEquals.equalsUTF16       avgt   25    29.217 ± 1.969  ns/op

Xor moved:

Benchmark                      Mode  Cnt     Score   Error  Units
StringEquals.almostEqual       avgt   25    28.455 ± 1.068  ns/op
StringEquals.almostEqualUTF16  avgt   25    28.244 ± 0.920  ns/op
StringEquals.different         avgt   25    18.940 ± 0.831  ns/op
StringEquals.differentCoders   avgt   25    14.566 ± 1.298  ns/op
StringEquals.equal             avgt   25    27.891 ± 0.606  ns/op
StringEquals.equalsUTF16       avgt   25    28.294 ± 0.913  ns/op
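
Hard to say: the differences between the "With this PR" and "Xor moved" runs are all within the reported error margins.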
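
One rough way to read the tables is to check whether the score ± error ranges of two runs intersect. The sketch below does only that; `Result` and `intervals_overlap` are hypothetical names, and overlapping ranges are an informal heuristic rather than a proper significance test.

```cpp
#include <cmath>

// Hypothetical container for one JMH row: average score and reported error.
struct Result {
    double score;
    double error;
};

// True when [score - error, score + error] of the two results intersect,
// i.e. the gap between the scores is no larger than the combined errors.
bool intervals_overlap(Result a, Result b) {
    return std::fabs(a.score - b.score) <= a.error + b.error;
}
```

For StringEquals.equal, the "With this PR" row (28.603 ± 0.148) and the "Xor moved" row (27.891 ± 0.606) overlap under this check, while the "Before the PR" row (1218.393 ± 5.275) is far outside either.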
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1190639944