RFR: 8310268: RISC-V: misaligned memory access in String.Compare intrinsic [v3]
Vladimir Kempik
vkempik at openjdk.org
Thu Jul 20 08:54:43 UTC 2023
On Thu, 20 Jul 2023 08:31:56 GMT, Vladimir Kempik <vkempik at openjdk.org> wrote:
>> Please review this fix. It eliminates misaligned loads in String.compare on RISC-V.
>>
>> For small compares (<= 72 bytes), the intrinsic in c2_MacroAssembler.cpp is used.
>> It reads (in the UU/LL case) 8 bytes per loop iteration, and at the end it reads the tail with a misaligned load of the last 8 bytes of the string.
>>
>> So if the string length is not a multiple of 8 bytes, the last load is misaligned; it also re-reads and re-compares some data that has already been processed.
>>
>> I have changed that to compare only the last length%8 bytes, using the SHORT_STRING part of the intrinsic for UL/LU. For UU/LL I have made an optimised version.
>>
>> Thanks to optimisations for conditional branching at line [947](https://github.com/openjdk/jdk/pull/14534/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6R947), I’ve got no perf drop on thead (with +AvoidUnalignedAccesses), which supports misaligned access.
>>
>> Improvements to the inflate_XX() methods give a 3-5% improvement for the UL/LU cases on thead, and almost no perf change on hifive.
>>
>> For large strings, the intrinsics from stubGenerator.cpp are used.
>> For UU/LL (generate_compare_long_string_same_encoding), I have just replaced the misaligned ld with load_long_misaligned. Since this tail reading is not on the hot path, this gives only a small penalty on thead with -XX:+AvoidUnalignedAccesses.
>>
>> Large LU/UL comparison is done in compare_long_string_different_encoding in stubGenerator.cpp.
>> These changes are partially based on feilongjiang's patch, but I have changed tail reading to prevent reading past the end of string. I have observed no perf difference between feilongjiang's and my version.
>>
>> This also enables the regression test for String.compare, which was previously aarch64-only.
>>
>> Testing: tier1 and tier2 clean on hifive.
>>
>> JMH testing, hifive:
>> before:
>>
>> Benchmark (delta) (size) Mode Cnt Score Error Units
>> StringCompareToDifferentLength.compareToLL 2 24 avgt 9 6.474 ± 1.475 ms/op
>> StringCompareToDifferentLength.compareToLL 2 36 avgt 9 125.823 ± 1.947 ms/op
>> StringCompareToDifferentLength.compareToLL 2 72 avgt 9 10.512 ± 0.236 ms/op
>> StringCompareToDifferentLength.compareToLL 2 128 avgt 9 13.032 ± 0.821 ms/op
>> StringCompareToDifferentLength.compareToLL 2 256 avgt 9 18.983 ± 0.318 ms/op
>> StringCompareToDifferentLength.compareToLL 2 512 avgt 9 29.925 ± ...
>
> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision:
>
> Simplify case for long LU UL compares
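For readers following the quoted description, the difference between the two tail strategies for the same-encoding (UU/LL) short case can be sketched in plain C++. This is only an illustration and not the HotSpot code: the names load8, compare_bytes, compare_overlapping_tail and compare_exact_tail are made up for this example, both inputs are assumed to have the same length n, and the real intrinsic works on registers and handles differing lengths separately.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Models one 8-byte load (`ld`). memcpy keeps the C++ well-defined for any
// address; on real hardware the cost depends on whether `p` is 8-byte aligned.
static uint64_t load8(const unsigned char* p) {
    uint64_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}

// Byte-by-byte compare of the range [from, to). Returns the difference of the
// first mismatching bytes, or 0 if the range is equal.
static int compare_bytes(const unsigned char* a, const unsigned char* b,
                         size_t from, size_t to) {
    for (size_t i = from; i < to; i++) {
        if (a[i] != b[i]) return int(a[i]) - int(b[i]);
    }
    return 0;
}

// Old scheme (illustrative): 8 bytes per iteration, then one extra 8-byte
// load that ends exactly at the end of the data.
int compare_overlapping_tail(const unsigned char* a, const unsigned char* b,
                             size_t n) {
    if (n < 8) return compare_bytes(a, b, 0, n);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        if (load8(a + i) != load8(b + i)) return compare_bytes(a, b, i, i + 8);
    }
    if (i < n) {
        // Offset n - 8 is not a multiple of 8 when n % 8 != 0, so this load
        // is misaligned and re-reads bytes the loop has already compared.
        size_t t = n - 8;
        if (load8(a + t) != load8(b + t)) return compare_bytes(a, b, t, n);
    }
    return 0;
}

// New scheme (illustrative): same main loop, then only the remaining
// n % 8 bytes are compared, with no overlapping 8-byte load.
int compare_exact_tail(const unsigned char* a, const unsigned char* b,
                       size_t n) {
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        if (load8(a + i) != load8(b + i)) return compare_bytes(a, b, i, i + 8);
    }
    return compare_bytes(a, b, i, n);
}
```

Both functions return the same result for any pair of equal-length inputs; they differ only in how the last n % 8 bytes are reached.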
Results for the latest update, from thead.
With -XX:+AvoidUnalignedAccesses:
Benchmark (delta) (size) Mode Cnt Score Error Units
StringCompareToDifferentLength.compareToLL 2 24 avgt 9 4.000 ± 0.106 ms/op
StringCompareToDifferentLength.compareToLL 2 36 avgt 9 4.562 ± 0.089 ms/op
StringCompareToDifferentLength.compareToLL 2 72 avgt 9 7.536 ± 0.085 ms/op
StringCompareToDifferentLength.compareToLL 2 128 avgt 9 10.341 ± 0.287 ms/op
StringCompareToDifferentLength.compareToLL 2 256 avgt 9 15.275 ± 0.249 ms/op
StringCompareToDifferentLength.compareToLL 2 512 avgt 9 21.731 ± 0.413 ms/op
StringCompareToDifferentLength.compareToLL 2 520 avgt 9 20.255 ± 0.287 ms/op
StringCompareToDifferentLength.compareToLL 2 523 avgt 9 22.114 ± 0.641 ms/op
StringCompareToDifferentLength.compareToLU 2 24 avgt 9 7.615 ± 0.032 ms/op
StringCompareToDifferentLength.compareToLU 2 36 avgt 9 10.566 ± 0.096 ms/op
StringCompareToDifferentLength.compareToLU 2 72 avgt 9 21.975 ± 0.288 ms/op
StringCompareToDifferentLength.compareToLU 2 128 avgt 9 36.078 ± 0.419 ms/op
StringCompareToDifferentLength.compareToLU 2 256 avgt 9 65.567 ± 0.715 ms/op
StringCompareToDifferentLength.compareToLU 2 512 avgt 9 124.196 ± 0.636 ms/op
StringCompareToDifferentLength.compareToLU 2 520 avgt 9 126.580 ± 1.431 ms/op
StringCompareToDifferentLength.compareToLU 2 523 avgt 9 129.830 ± 1.857 ms/op
StringCompareToDifferentLength.compareToUL 2 24 avgt 9 10.386 ± 0.368 ms/op
StringCompareToDifferentLength.compareToUL 2 36 avgt 9 12.981 ± 0.271 ms/op
StringCompareToDifferentLength.compareToUL 2 72 avgt 9 23.726 ± 0.532 ms/op
StringCompareToDifferentLength.compareToUL 2 128 avgt 9 37.997 ± 0.482 ms/op
StringCompareToDifferentLength.compareToUL 2 256 avgt 9 67.834 ± 0.915 ms/op
StringCompareToDifferentLength.compareToUL 2 512 avgt 9 126.500 ± 0.771 ms/op
StringCompareToDifferentLength.compareToUL 2 520 avgt 9 128.853 ± 2.059 ms/op
StringCompareToDifferentLength.compareToUL 2 523 avgt 9 132.825 ± 3.318 ms/op
StringCompareToDifferentLength.compareToUU 2 24 avgt 9 4.013 ± 0.012 ms/op
StringCompareToDifferentLength.compareToUU 2 36 avgt 9 4.845 ± 0.148 ms/op
StringCompareToDifferentLength.compareToUU 2 72 avgt 9 10.276 ± 0.313 ms/op
StringCompareToDifferentLength.compareToUU 2 128 avgt 9 14.338 ± 0.201 ms/op
StringCompareToDifferentLength.compareToUU 2 256 avgt 9 20.912 ± 0.550 ms/op
StringCompareToDifferentLength.compareToUU 2 512 avgt 9 34.264 ± 0.660 ms/op
StringCompareToDifferentLength.compareToUU 2 520 avgt 9 34.557 ± 0.252 ms/op
StringCompareToDifferentLength.compareToUU 2 523 avgt 9 34.841 ± 0.380 ms/op
With -XX:-AvoidUnalignedAccesses:
Benchmark (delta) (size) Mode Cnt Score Error Units
StringCompareToDifferentLength.compareToLL 2 24 avgt 9 2.557 ± 0.034 ms/op
StringCompareToDifferentLength.compareToLL 2 36 avgt 9 3.507 ± 0.035 ms/op
StringCompareToDifferentLength.compareToLL 2 72 avgt 9 7.513 ± 0.033 ms/op
StringCompareToDifferentLength.compareToLL 2 128 avgt 9 9.095 ± 0.210 ms/op
StringCompareToDifferentLength.compareToLL 2 256 avgt 9 13.666 ± 0.134 ms/op
StringCompareToDifferentLength.compareToLL 2 512 avgt 9 20.131 ± 0.234 ms/op
StringCompareToDifferentLength.compareToLL 2 520 avgt 9 20.115 ± 0.065 ms/op
StringCompareToDifferentLength.compareToLL 2 523 avgt 9 20.865 ± 0.224 ms/op
StringCompareToDifferentLength.compareToLU 2 24 avgt 9 7.091 ± 0.067 ms/op
StringCompareToDifferentLength.compareToLU 2 36 avgt 9 9.883 ± 0.109 ms/op
StringCompareToDifferentLength.compareToLU 2 72 avgt 9 22.037 ± 0.327 ms/op
StringCompareToDifferentLength.compareToLU 2 128 avgt 9 35.914 ± 0.307 ms/op
StringCompareToDifferentLength.compareToLU 2 256 avgt 9 65.673 ± 1.075 ms/op
StringCompareToDifferentLength.compareToLU 2 512 avgt 9 124.257 ± 0.722 ms/op
StringCompareToDifferentLength.compareToLU 2 520 avgt 9 126.128 ± 0.453 ms/op
StringCompareToDifferentLength.compareToLU 2 523 avgt 9 129.413 ± 1.567 ms/op
StringCompareToDifferentLength.compareToUL 2 24 avgt 9 9.661 ± 0.440 ms/op
StringCompareToDifferentLength.compareToUL 2 36 avgt 9 12.106 ± 0.290 ms/op
StringCompareToDifferentLength.compareToUL 2 72 avgt 9 23.903 ± 0.441 ms/op
StringCompareToDifferentLength.compareToUL 2 128 avgt 9 38.722 ± 1.049 ms/op
StringCompareToDifferentLength.compareToUL 2 256 avgt 9 67.640 ± 0.957 ms/op
StringCompareToDifferentLength.compareToUL 2 512 avgt 9 126.744 ± 1.904 ms/op
StringCompareToDifferentLength.compareToUL 2 520 avgt 9 129.400 ± 2.463 ms/op
StringCompareToDifferentLength.compareToUL 2 523 avgt 9 130.664 ± 1.380 ms/op
StringCompareToDifferentLength.compareToUU 2 24 avgt 9 3.662 ± 0.166 ms/op
StringCompareToDifferentLength.compareToUU 2 36 avgt 9 4.552 ± 0.217 ms/op
StringCompareToDifferentLength.compareToUU 2 72 avgt 9 9.399 ± 0.270 ms/op
StringCompareToDifferentLength.compareToUU 2 128 avgt 9 13.688 ± 0.294 ms/op
StringCompareToDifferentLength.compareToUU 2 256 avgt 9 20.033 ± 0.290 ms/op
StringCompareToDifferentLength.compareToUU 2 512 avgt 9 33.512 ± 0.433 ms/op
StringCompareToDifferentLength.compareToUU 2 520 avgt 9 33.796 ± 0.435 ms/op
StringCompareToDifferentLength.compareToUU 2 523 avgt 9 33.983 ± 0.152 ms/op
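To make the two thead runs above easier to compare, the relative difference per row can be computed with a few lines of C++. This is just a convenience sketch: the helper names (overhead_percent, Row, kRows, print_overheads) are made up here, only a handful of rows are copied from the tables, and the reported error margins are ignored, so small differences should not be over-interpreted.

```cpp
#include <cstdio>

// Relative slowdown, in percent, of the +AvoidUnalignedAccesses score
// compared with the -AvoidUnalignedAccesses score (both in ms/op).
double overhead_percent(double avoid_on_ms, double avoid_off_ms) {
    return (avoid_on_ms / avoid_off_ms - 1.0) * 100.0;
}

struct Row {
    const char* name;
    int size;
    double avoid_on;   // score with +AvoidUnalignedAccesses
    double avoid_off;  // score with -AvoidUnalignedAccesses
};

// A few rows copied from the two tables above (delta = 2 in all of them).
static const Row kRows[] = {
    {"compareToLL", 24, 4.000, 2.557},
    {"compareToLL", 523, 22.114, 20.865},
    {"compareToLU", 523, 129.830, 129.413},
    {"compareToUU", 24, 4.013, 3.662},
    {"compareToUU", 523, 34.841, 33.983},
};

// Prints one line per row with the relative difference in percent.
void print_overheads() {
    for (const Row& r : kRows) {
        std::printf("%s size=%d: %+.1f%%\n", r.name, r.size,
                    overhead_percent(r.avoid_on, r.avoid_off));
    }
}
```

Calling print_overheads() lists the percentage for each selected row; more rows can be added to kRows in the same way.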
hifive results will follow soon.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14534#issuecomment-1643534917