RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v5]

Vladimir Kempik vkempik at openjdk.org
Thu Jun 8 13:30:51 UTC 2023


On Thu, 8 Jun 2023 12:24:54 GMT, Vladimir Kempik <vkempik at openjdk.org> wrote:

>> Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V
>> 
>> Initially found these misaligned loads when profiling finagle-http test from renaissance suite.
>> The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706)
>> The other two produced about 100 events combined.
>> Later I've found this can partially be reproduced with StringIndexOf.advancedWithMediumSub.
>> Numbers on hifive before and after applying the patch:
>> 
>> Before:
>> 
>> Benchmark                                                  Mode  Cnt       Score      Error  Units
>> StringIndexOf.advancedWithMediumSub                        avgt   25   47031.406 ±  144.005  ns/op
>> 
>> 
>> After:
>> 
>> Benchmark                                                 Mode  Cnt       Score     Error  Units
>> StringIndexOf.advancedWithMediumSub                       avgt   25    4256.830 ±  23.075  ns/op
>> 
>> 
>> Testing: tier1/tier2 is clean on hifive.
>
> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Increase granularity when isLL is false

The first change, at [line 496](https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6R496), regresses the performance of the Boyer-Moore-Horspool-based indexOf on T-Head:
Before:

org.openjdk.bench.java.lang.StringIndexOf.advancedWithMediumSub   2790.160 ±  56.442  ns/op

After:

org.openjdk.bench.java.lang.StringIndexOf.advancedWithMediumSub  3377.943 ± 42.496 ns/op


I think this could be improved.

Currently, when we compare the needle with a region of the haystack, we first read the last 8 bytes of both regions and compare them; only if they match do we compare the rest byte by byte.
The 8-byte read from the haystack is not always aligned, so it may be a misaligned load. We could read only 4 or 2 bytes for the first comparison instead, reducing wasted reads from the haystack.
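
To illustrate the idea, here is a rough Java sketch of a Horspool search whose first comparison covers only the last `width` bytes of the needle and of the haystack window. It is not the intrinsic's code: the class name `TailFirstIndexOf`, the helper `tail`, and the `width` parameter are all hypothetical, and it assumes Latin-1 (one byte per character) input only. The tail is assembled one byte at a time, so a smaller `width` means fewer haystack bytes read before a mismatch is detected.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TailFirstIndexOf {
    // Hypothetical helper: pack the `width` bytes ending at `end` (exclusive)
    // into a long, one byte at a time, so no wide (possibly misaligned) load is needed.
    static long tail(byte[] a, int end, int width) {
        long v = 0;
        for (int i = end - width; i < end; i++) {
            v = (v << 8) | (a[i] & 0xFF);
        }
        return v;
    }

    // Horspool search over Latin-1 bytes (an assumption of this sketch).
    // `width` is the size in bytes of the first (tail) comparison: 8, 4 or 2.
    static int indexOf(byte[] haystack, byte[] needle, int width) {
        int n = haystack.length;
        int m = needle.length;
        if (m == 0) return 0;
        if (m > n) return -1;
        int w = Math.min(width, m);

        // Bad-character shift table built from all needle bytes except the last.
        int[] shift = new int[256];
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            shift[needle[i] & 0xFF] = m - 1 - i;
        }

        long needleTail = tail(needle, m, w);
        int pos = 0;
        while (pos <= n - m) {
            // First comparison: only the last w bytes of the window.
            if (tail(haystack, pos + m, w) == needleTail) {
                // Tails match: compare the remaining bytes one by one, right to left.
                int i = m - w - 1;
                while (i >= 0 && haystack[pos + i] == needle[i]) i--;
                if (i < 0) return pos;
            }
            // Shift by the haystack byte aligned with the needle's last byte.
            pos += shift[haystack[pos + m - 1] & 0xFF];
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] h = "the quick brown fox jumps".getBytes(StandardCharsets.ISO_8859_1);
        byte[] nd = "brown fox".getBytes(StandardCharsets.ISO_8859_1);
        for (int width : new int[] {8, 4, 2}) {
            System.out.println("width=" + width + " index=" + indexOf(h, nd, width));
        }
    }
}
```

All three widths return the same index; they differ only in how many haystack bytes are touched per candidate position before the window is rejected.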

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1582582020


More information about the hotspot-compiler-dev mailing list