RFR: 8310268: RISC-V: misaligned memory access in String.Compare intrinsic [v4]

Vladimir Kempik vkempik at openjdk.org
Mon Jul 24 11:01:44 UTC 2023


On Mon, 24 Jul 2023 09:59:16 GMT, Vladimir Kempik <vkempik at openjdk.org> wrote:

>> Please review this fix. It eliminates misaligned loads in String.compare on RISC-V.
>> 
>> For small compares (<= 72 bytes), the intrinsic in c2_MacroAssembler.cpp is used;
>> it reads (in the UU/LL case) 8 bytes per loop iteration, and at the end it reads the tail with a misaligned load of the last 8 bytes of the string.
>> 
>> So if the string length is not a multiple of 8 bytes, the last load is misaligned; it also re-reads and re-compares some data that has already been processed.
>> 
>> I have changed that to compare only the last length%8 bytes, using the SHORT_STRING part of the intrinsic for UL/LU. For UU/LL I have made an optimised version.
>> 
>> Thanks to optimisations for conditional branching at line [947](https://github.com/openjdk/jdk/pull/14534/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6R947) I’ve got no perf drop on thead (with +AvoidUnalignedAccesses), which supports misaligned access.
>> 
>> Improvements to the inflate_XX() methods give a 3-5% improvement for UL/LU cases on thead, and almost no perf change on hifive.
>> 
>> For large strings, the intrinsics from stubGenerator.cpp are used.
>> For UU/LL (generate_compare_long_string_same_encoding), I have just replaced the misaligned ld with load_long_misaligned. Since this tail read is not on the hot path, this gives a small penalty on thead with -XX:+AvoidUnalignedAccesses.
>> 
>> Large LU/UL comparison is done in compare_long_string_different_encoding in stubGenerator.cpp:
>> These changes are partially based on feilongjiang's patch, but I have changed tail reading to prevent reading past the end of string. I have observed no perf difference between feilongjiang's and my version.
>> 
>> This also enables the regression test for string.Compare, which was previously aarch64-only.
>> 
>> Testing: tier1 and tier2 clean on hifive.
>> 
>> JMH testing, hifive:
>> before:
>> 
>> Benchmark                                   (delta)  (size)  Mode  Cnt     Score    Error  Units
>> StringCompareToDifferentLength.compareToLL        2      24  avgt    9     6.474 ±  1.475  ms/op
>> StringCompareToDifferentLength.compareToLL        2      36  avgt    9   125.823 ±  1.947  ms/op
>> StringCompareToDifferentLength.compareToLL        2      72  avgt    9    10.512 ±  0.236  ms/op
>> StringCompareToDifferentLength.compareToLL        2     128  avgt    9    13.032 ±  0.821  ms/op
>> StringCompareToDifferentLength.compareToLL        2     256  avgt    9    18.983 ±  0.318  ms/op
>> StringCompareToDifferentLength.compareToLL        2     512  avgt    9    29.925 ± ...
>
> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision:
> 
>   remove some branches and moves
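
For anyone following the tail handling described in the quoted text, here is a minimal Java sketch of the idea: compare 8 bytes per iteration at offsets that are multiples of 8, then compare only the remaining length%8 bytes one at a time, so no load overlaps data that was already processed. This is an illustration only, not the assembly emitted by c2_MacroAssembler.cpp; the class and method names are hypothetical, and it models just the LL (Latin-1 vs Latin-1) case.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration class; not part of the JDK or of the patch.
class TailCompareSketch {

    // Compares two Latin-1 byte arrays with String.compareTo semantics.
    static int compareLL(byte[] a, byte[] b) {
        int min = Math.min(a.length, b.length);
        ByteBuffer ba = ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer bb = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN);

        int i = 0;
        // Main loop: one 8-byte word per iteration, offsets are multiples of 8.
        for (; i + 8 <= min; i += 8) {
            long diff = ba.getLong(i) ^ bb.getLong(i);
            if (diff != 0) {
                // Little-endian: the lowest set bit lies in the first differing byte.
                int k = i + (Long.numberOfTrailingZeros(diff) >>> 3);
                return (a[k] & 0xff) - (b[k] & 0xff);
            }
        }
        // Tail: only the remaining min % 8 bytes, one byte at a time,
        // instead of an overlapping 8-byte load ending at the last byte.
        for (; i < min; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        // Common prefix is equal: the shorter string sorts first.
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] x = "abcdefghij".getBytes(java.nio.charset.StandardCharsets.ISO_8859_1);
        byte[] y = "abcdefghiz".getBytes(java.nio.charset.StandardCharsets.ISO_8859_1);
        System.out.println(compareLL(x, y));
    }
}
```

The byte-wise tail costs up to 7 extra iterations per compare, which is the trade-off against the single overlapping load that the old code used.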

Updated results from hifive:
good perf improvements on compareToLU and compareToUL for sizes > 72.

Benchmark                                   (delta)  (size)  Mode  Cnt    Score   Error  Units
StringCompareToDifferentLength.compareToLL        2      24  avgt    9    8.610 ± 0.524  ms/op
StringCompareToDifferentLength.compareToLL        2      36  avgt    9    9.623 ± 0.980  ms/op
StringCompareToDifferentLength.compareToLL        2      72  avgt    9   11.483 ± 0.607  ms/op
StringCompareToDifferentLength.compareToLL        2     128  avgt    9   15.931 ± 0.306  ms/op
StringCompareToDifferentLength.compareToLL        2     256  avgt    9   21.179 ± 0.179  ms/op
StringCompareToDifferentLength.compareToLL        2     512  avgt    9   32.687 ± 0.713  ms/op
StringCompareToDifferentLength.compareToLL        2     520  avgt    9   31.122 ± 0.580  ms/op
StringCompareToDifferentLength.compareToLL        2     523  avgt    9   33.225 ± 0.478  ms/op
StringCompareToDifferentLength.compareToLU        2      24  avgt    9   15.019 ± 0.631  ms/op
StringCompareToDifferentLength.compareToLU        2      36  avgt    9   18.538 ± 1.178  ms/op
StringCompareToDifferentLength.compareToLU        2      72  avgt    9   30.966 ± 1.096  ms/op
StringCompareToDifferentLength.compareToLU        2     128  avgt    9   48.397 ± 1.622  ms/op
StringCompareToDifferentLength.compareToLU        2     256  avgt    9   87.368 ± 1.432  ms/op
StringCompareToDifferentLength.compareToLU        2     512  avgt    9  164.575 ± 0.816  ms/op
StringCompareToDifferentLength.compareToLU        2     520  avgt    9  167.250 ± 1.221  ms/op
StringCompareToDifferentLength.compareToLU        2     523  avgt    9  172.279 ± 1.525  ms/op
StringCompareToDifferentLength.compareToUL        2      24  avgt    9   16.391 ± 0.456  ms/op
StringCompareToDifferentLength.compareToUL        2      36  avgt    9   19.760 ± 0.283  ms/op
StringCompareToDifferentLength.compareToUL        2      72  avgt    9   31.841 ± 0.888  ms/op
StringCompareToDifferentLength.compareToUL        2     128  avgt    9   49.545 ± 1.115  ms/op
StringCompareToDifferentLength.compareToUL        2     256  avgt    9   88.728 ± 0.877  ms/op
StringCompareToDifferentLength.compareToUL        2     512  avgt    9  166.147 ± 1.468  ms/op
StringCompareToDifferentLength.compareToUL        2     520  avgt    9  168.843 ± 1.251  ms/op
StringCompareToDifferentLength.compareToUL        2     523  avgt    9  173.655 ± 1.518  ms/op
StringCompareToDifferentLength.compareToUU        2      24  avgt    9    9.462 ± 0.572  ms/op
StringCompareToDifferentLength.compareToUU        2      36  avgt    9   11.976 ± 0.696  ms/op
StringCompareToDifferentLength.compareToUU        2      72  avgt    9   15.301 ± 0.673  ms/op
StringCompareToDifferentLength.compareToUU        2     128  avgt    9   19.836 ± 0.841  ms/op
StringCompareToDifferentLength.compareToUU        2     256  avgt    9   31.328 ± 0.619  ms/op
StringCompareToDifferentLength.compareToUU        2     512  avgt    9   52.381 ± 1.249  ms/op
StringCompareToDifferentLength.compareToUU        2     520  avgt    9   53.119 ± 1.195  ms/op
StringCompareToDifferentLength.compareToUU        2     523  avgt    9   53.588 ± 1.803  ms/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14534#issuecomment-1647689516

