RFR: 8310268: RISC-V: misaligned memory access in String.Compare intrinsic [v2]
Vladimir Kempik
vkempik at openjdk.org
Thu Jun 29 05:50:06 UTC 2023
> Please review this fix. it eliminates misaligned loads in String.compare on risc-v
>
> for small compares ( <= 72 bytes), the instrinsic in c2_MacroAssembler.cpp is used,
> it reads ( in case of UU/LL) 8 bytes per loop, and at then end, it reads tail - misaligned load of last 8 bytes from the string.
>
> so if string length is not 8x bytes long then last load is misaligned, also it performs read/compare of some data which already was processed.
>
> I have changed that to compare only last length%8 bytes using SHORT_STRING part of intrinsic for UL/LU. But for UU/LL I have made an optimised version.
>
> Thanks to optimisations for conditional branching at line [947](https://github.com/openjdk/jdk/pull/14534/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6R947) I’ve got no perf drop on thead ( with +AvoidUnalignedAccesses) which supports misaligned access.
>
> Improvements to inflate_XX() methods gives 3-5% improvements for UL/LU cases on thead, almost no perf change on hifive.
>
> for large strings, the instrinsics from stubGenerator.cpp is used
> for UU/LL - generate_compare_long_string_same_encoding, I have just replaced misaligned ld with load_long_misaligned. Since this tail reading is not on hot path, this give some small penalty for thead when -XX:+AvoidUnalignedAccesses.
>
> large LU/UL comparision is done in compare_long_string_different_encoding in sutbGenerator.cpp:
> These changes are partially based on feilongjiang's patch, but I have changed tail reading to prevent reading past the end of string. I have observed no perf difference between feilongjiang's and my version.
>
> This also enables regression test for string.Compare which previously was aarch64-only
>
> Testing: tier1 and tier2 clean on hifive.
>
> JMH testing, hifive:
> before:
>
> Benchmark (delta) (size) Mode Cnt Score Error Units
> StringCompareToDifferentLength.compareToLL 2 24 avgt 9 6.474 ± 1.475 ms/op
> StringCompareToDifferentLength.compareToLL 2 36 avgt 9 125.823 ± 1.947 ms/op
> StringCompareToDifferentLength.compareToLL 2 72 avgt 9 10.512 ± 0.236 ms/op
> StringCompareToDifferentLength.compareToLL 2 128 avgt 9 13.032 ± 0.821 ms/op
> StringCompareToDifferentLength.compareToLL 2 256 avgt 9 18.983 ± 0.318 ms/op
> StringCompareToDifferentLength.compareToLL 2 512 avgt 9 29.925 ± 0.458 ms/op
> StringCompareToDifferentLength.compareToLL 2 ...
Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision:
Simplify tail for shrot string compare
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/14534/files
- new: https://git.openjdk.org/jdk/pull/14534/files/618c249c..3784bae9
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=14534&range=01
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=14534&range=00-01
Stats: 102 lines in 1 file changed: 0 ins; 89 del; 13 mod
Patch: https://git.openjdk.org/jdk/pull/14534.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14534/head:pull/14534
PR: https://git.openjdk.org/jdk/pull/14534
More information about the hotspot-dev
mailing list