RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v9]

Vladimir Kempik vkempik at openjdk.org
Thu May 11 05:09:48 UTC 2023


On Wed, 10 May 2023 14:24:30 GMT, Vladimir Kempik <vkempik at openjdk.org> wrote:

>> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1166:
>> 
>>> 1164:   slli(cnt1, cnt1, LogBitsPerByte);
>>> 1165:   sll(tmp1, tmp1, cnt1);
>>> 1166:   bnez(tmp1, DONE);
>> 
>> I guess the following sequence would make better use of the instruction pipeline stall:
>> 
>>   ld(tmp1, Address(a1));
>>   ld(tmp2, Address(a2));
>>   neg(cnt1, cnt1);
>>   slli(cnt1, cnt1, LogBitsPerByte);
>>   xorr(tmp1, tmp1, tmp2);
>>   sll(tmp1, tmp1, cnt1);
>>   bnez(tmp1, DONE);
>
> That is hard to say.
> 
> Out-of-order (OoO) microarchitectures, such as T-Head, don't care where the xor opcode sits here.
> 
> In-order microarchitectures, such as the U74 (HiFive), might be affected by such a change. However, the memory at addresses a1/a2 would very likely already be in the L1D cache, due to previous accesses in the same function, so the loads will be pretty cheap.
> The U74 is dual-issue, so it may execute these two loads (from L1D) in parallel. With these addresses cached in L1D, the effect of such an optimisation would be hard to spot.
> 
> To say for sure, we need to run the JMH test org.openjdk.bench.java.lang.StringEquals on a HiFive.
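
For reference, the two orderings should differ only in scheduling, not in result, because the xor does not depend on cnt1. Below is a rough Java model of the tail check, not the actual HotSpot code. It assumes RV64 semantics, where sll uses only the low 6 bits of the shift amount (Java's shift on a long does the same). The class name, method names and input words are made up for illustration, and "xorFirst" is only assumed to match the order in the patch under review.

```java
public class TailCheckModel {
    static final int LOG_BITS_PER_BYTE = 3; // models LogBitsPerByte (8 bits per byte)

    // xor right after the loads (assumed order in the patch under review).
    static long xorFirst(long w1, long w2, long cnt1) {
        long tmp1 = w1 ^ w2;                  // xorr(tmp1, tmp1, tmp2)
        cnt1 = -cnt1;                         // neg(cnt1, cnt1)
        cnt1 = cnt1 << LOG_BITS_PER_BYTE;     // slli(cnt1, cnt1, LogBitsPerByte)
        return tmp1 << cnt1;                  // sll(tmp1, tmp1, cnt1): low 6 bits of cnt1 used
    }

    // xor moved down, after the shift amount has been computed (suggested order).
    static long xorMoved(long w1, long w2, long cnt1) {
        cnt1 = -cnt1;                         // neg(cnt1, cnt1)
        cnt1 = cnt1 << LOG_BITS_PER_BYTE;     // slli(cnt1, cnt1, LogBitsPerByte)
        long tmp1 = w1 ^ w2;                  // xorr(tmp1, tmp1, tmp2)
        return tmp1 << cnt1;                  // sll(tmp1, tmp1, cnt1)
    }

    public static void main(String[] args) {
        long a = 0x1122334455667788L;         // hypothetical 8-byte words
        long b = 0x1122334455667799L;         // differs from a in the low byte only
        for (long cnt1 = -7; cnt1 <= -1; cnt1++) {
            boolean same = xorFirst(a, b, cnt1) == xorMoved(a, b, cnt1);
            System.out.println("cnt1=" + cnt1 + " same result: " + same);
        }
    }
}
```

With cnt1 = -k the shift amount is 8*k bits, so the top k bytes of the xor result are dropped before the bnez test. Both methods compute the same value for any input, so any benefit of the reordering would have to come from instruction scheduling alone.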

Before the PR


Benchmark                      Mode  Cnt     Score   Error  Units
StringEquals.almostEqual       avgt   25  1214.131 ± 4.400  ns/op
StringEquals.almostEqualUTF16  avgt   25  1213.310 ± 7.156  ns/op
StringEquals.different         avgt   25    20.102 ± 2.306  ns/op
StringEquals.differentCoders   avgt   25    14.780 ± 1.147  ns/op
StringEquals.equal             avgt   25  1218.393 ± 5.275  ns/op
StringEquals.equalsUTF16       avgt   25  1216.750 ± 4.383  ns/op



With this PR


Benchmark                      Mode  Cnt   Score   Error  Units
StringEquals.almostEqual       avgt   25  28.584 ± 1.178  ns/op
StringEquals.almostEqualUTF16  avgt   25  28.375 ± 1.052  ns/op
StringEquals.different         avgt   25  19.572 ± 1.031  ns/op
StringEquals.differentCoders   avgt   25  14.969 ± 2.348  ns/op
StringEquals.equal             avgt   25  28.603 ± 0.148  ns/op
StringEquals.equalsUTF16       avgt   25  29.217 ± 1.969  ns/op


With this PR, xor moved as suggested


Benchmark                      Mode  Cnt   Score   Error  Units
StringEquals.almostEqual       avgt   25  28.455 ± 1.068  ns/op
StringEquals.almostEqualUTF16  avgt   25  28.244 ± 0.920  ns/op
StringEquals.different         avgt   25  18.940 ± 0.831  ns/op
StringEquals.differentCoders   avgt   25  14.566 ± 1.298  ns/op
StringEquals.equal             avgt   25  27.891 ± 0.606  ns/op
StringEquals.equalsUTF16       avgt   25  28.294 ± 0.913  ns/op


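Hard to say: the score ± error ranges of the last two runs overlap for every benchmark, so any gain from moving the xor is within the noise.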
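
A quick way to check whether moving the xor is distinguishable from noise is to test whether the score ± error ranges of the "With this PR" and "Xor moved" runs overlap. The sketch below does that with the numbers from the two tables above. The class and method names are made up for illustration, and overlapping ranges are only a rough heuristic, not a formal significance test.

```java
public class OverlapCheck {
    // True if [s1 - e1, s1 + e1] and [s2 - e2, s2 + e2] intersect.
    static boolean overlaps(double s1, double e1, double s2, double e2) {
        return Math.abs(s1 - s2) <= e1 + e2;
    }

    public static void main(String[] args) {
        String[] names = {
            "almostEqual", "almostEqualUTF16", "different",
            "differentCoders", "equal", "equalsUTF16"
        };
        // {score, error} in ns/op, copied from the "With this PR" table.
        double[][] withPr = {
            {28.584, 1.178}, {28.375, 1.052}, {19.572, 1.031},
            {14.969, 2.348}, {28.603, 0.148}, {29.217, 1.969}
        };
        // {score, error} in ns/op, copied from the "Xor moved" table.
        double[][] xorMoved = {
            {28.455, 1.068}, {28.244, 0.920}, {18.940, 0.831},
            {14.566, 1.298}, {27.891, 0.606}, {28.294, 0.913}
        };
        for (int i = 0; i < names.length; i++) {
            boolean o = overlaps(withPr[i][0], withPr[i][1],
                                 xorMoved[i][0], xorMoved[i][1]);
            System.out.printf("StringEquals.%-17s ranges overlap: %b%n", names[i], o);
        }
    }
}
```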

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1190639944


More information about the hotspot-dev mailing list