RFR: 8310268: RISC-V: misaligned memory access in String.Compare intrinsic [v2]

Vladimir Kempik vkempik at openjdk.org
Tue Jul 11 08:38:02 UTC 2023


On Thu, 29 Jun 2023 05:50:06 GMT, Vladimir Kempik <vkempik at openjdk.org> wrote:

>> Please review this fix. it eliminates misaligned loads in String.compare on risc-v
>> 
>> for small compares ( <= 72 bytes), the instrinsic in c2_MacroAssembler.cpp is used,
>> it reads ( in case of UU/LL) 8 bytes per loop, and at then end, it reads tail - misaligned load of last 8 bytes from the string.
>> 
>> so if string length is not 8x bytes long then last load is misaligned, also it performs read/compare of some data which already was processed.
>> 
>> I have changed that to compare only last length%8 bytes using SHORT_STRING part of intrinsic for UL/LU. But for UU/LL I have made an optimised version.
>> 
>> Thanks to optimisations for conditional branching at line [947](https://github.com/openjdk/jdk/pull/14534/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6R947) I’ve got no perf drop on thead ( with +AvoidUnalignedAccesses) which supports misaligned access.
>> 
>> Improvements to inflate_XX() methods gives 3-5% improvements for UL/LU cases on thead, almost no perf change on hifive.
>> 
>> for large strings, the instrinsics from stubGenerator.cpp is used
>> for UU/LL - generate_compare_long_string_same_encoding, I have just replaced misaligned ld with load_long_misaligned. Since this tail reading is not on hot path, this give some small penalty for thead when -XX:+AvoidUnalignedAccesses.
>> 
>> large LU/UL comparision is done in compare_long_string_different_encoding in sutbGenerator.cpp:
>> These changes are partially based on feilongjiang's patch, but I have changed tail reading to prevent reading past the end of string. I have observed no perf difference between feilongjiang's and my version.
>> 
>> This also enables regression test for string.Compare which previously was aarch64-only
>> 
>> Testing: tier1 and tier2 clean on hifive.
>> 
>> JMH testing, hifive:
>> before:
>> 
>> Benchmark                                   (delta)  (size)  Mode  Cnt     Score    Error  Units
>> StringCompareToDifferentLength.compareToLL        2      24  avgt    9     6.474 ±  1.475  ms/op
>> StringCompareToDifferentLength.compareToLL        2      36  avgt    9   125.823 ±  1.947  ms/op
>> StringCompareToDifferentLength.compareToLL        2      72  avgt    9    10.512 ±  0.236  ms/op
>> StringCompareToDifferentLength.compareToLL        2     128  avgt    9    13.032 ±  0.821  ms/op
>> StringCompareToDifferentLength.compareToLL        2     256  avgt    9    18.983 ±  0.318  ms/op
>> StringCompareToDifferentLength.compareToLL        2     512  avgt    9    29.925 ± ...
>
> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Simplify tail for shrot string compare

Now that 17u backport is done, can we please get back to this PR?
we have got compare case for small strings reviewed, only left the case for large strings, where the case for UU/LL is pretty simple ( just replacing ld with misaligned_load)
and the case for large UL/LU strings is the only complex thing

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14534#issuecomment-1630392602


More information about the hotspot-dev mailing list