RFR: 8334554: RISC-V: verify & fix perf of string comparison

Tue Jun 25 09:33:10 UTC 2024

On Tue, 25 Jun 2024 08:56:25 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Thanks for testing!
>> I merged your test results into the table below. For the UU/LL tests, the change causes a regression when count == 24 or 36, improves performance when count > 36, and the improvement stays stable as count grows.
>> My previous test results showed a similar pattern; the only difference is that the regression appeared only when count == 24.
>> 
>> Benchmark | (delta) | (size) | Mode | Cnt | Score(+rvv) - before | Error | Units | Score(+rvv) - after | Before/after (bigger is better)
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> StringCompareToDifferentLength.compareToLL | 2 | 24 | avgt | 9 | 4130.834 | 32.876 | us/op | 4738.613 | 0.872
>> StringCompareToDifferentLength.compareToLL | 2 | 36 | avgt | 9 | 4194.66 | 50.024 | us/op | 4791.263 | 0.875
>> StringCompareToDifferentLength.compareToLL | 2 | 72 | avgt | 9 | 5632.843 | 39.958 | us/op | 4746.75 | 1.187
>> StringCompareToDifferentLength.compareToLL | 2 | 128 | avgt | 9 | 5537.939 | 102.826 | us/op | 4745.569 | 1.167
>> StringCompareToDifferentLength.compareToLL | 2 | 256 | avgt | 9 | 8410.254 | 48.978 | us/op | 6770.867 | 1.242
>> StringCompareToDifferentLength.compareToLL | 2 | 512 | avgt | 9 | 14190.077 | 58.298 | us/op | 10931.753 | 1.298
>> StringCompareToDifferentLength.compareToLU | 2 | 24 | avgt | 9 | 4746.32 | 26.752 | us/op | 4747.007 | 1
>> StringCompareToDifferentLength.compareToLU | 2 | 36 | avgt | 9 | 4745.934 | 29.01 | us/op | 4742.046 | 1.001
>> StringCompareToDifferentLength.compareToLU | 2 | 72 | avgt | 9 | 7010.726 | 34.604 | us/op | 7013.791 | 1
>> StringCompareToDifferentLength.compareToLU | 2 | 128 | avgt | 9 | 6932.81 | 116.194 | us/op | 6935.089 | 1
>> StringCompareToDifferentLength.compareToLU | 2 | 256 | avgt | 9 | 11299.32 | 71.107 | us/op | 11467.796 | 0.985
>> StringCompareToDifferentLength.compareToLU | 2 | 512 | avgt | 9 | 20284.136 | 518.531 | us/op | 20280.133 | 1
>> StringCompareToDifferentLength.compareToUL | 2 | 24 | avgt | 9 | 4909.746 | 62.347 | us/op | 4918.766 | 0.998
>> StringCompareToDifferentLength.compareToUL | 2 | 36 | avgt | 9 | 4931.501 | 21.065 | us/op | 4927.695 | 1.001
>> StringCompareToDifferentLength.compareToUL | 2 | 72 | avgt | 9 | 7120.069 | 121.244 | us/op | 7159.904 | 0.994
>> StringCompareToDifferentLength.compareToUL | 2 | 128 | avgt | 9 | 7082.143 | 37.576 | us/op | 7097.52 | 0.998
>> StringCompareToDifferentLength.compareToUL | 2 | 256 | avgt | 9 | 11519.615 | 159.86 | us/op | 11734.633 | 0.982
>> StringCompareToDifferentLength.compareToUL | 2 | 512 | avgt...
>
> Based on the above observation, I wonder if we can pass a different lmul value depending on the count value? E.g. when count <= 36 we use m2, and when count > 36 (maybe not exactly 36, but some value close to it) we use m4? But that means there will be a branch in the code at runtime.

Sounds reasonable. Further, I wonder if we should consider m1 for these <=36 cases.
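
To make the idea concrete, here is a minimal sketch of selecting lmul by count. It is plain C++, not the actual MacroAssembler code: the `Lmul` enum, `select_lmul`, and the `kSmallCountLimit` threshold are hypothetical names, and 36 is only the rough cut-over suggested by the numbers above, so the real value would need to be tuned by benchmarking.

```cpp
#include <cstddef>

// Hypothetical stand-in for the RVV register-group multiplier (LMUL).
enum class Lmul { m1 = 1, m2 = 2, m4 = 4 };

// Assumed cut-over point taken from the benchmark discussion; not a tuned value.
constexpr std::size_t kSmallCountLimit = 36;

// Pick an LMUL from the element count. The comparison here is the runtime
// branch mentioned above: short inputs get a narrower register group
// (m2 by default, or m1 if that turns out to be better), long inputs get m4.
constexpr Lmul select_lmul(std::size_t count, Lmul small_lmul = Lmul::m2) {
    return count <= kSmallCountLimit ? small_lmul : Lmul::m4;
}

// Compile-time checks of the intended behaviour at the benchmarked sizes.
static_assert(select_lmul(24) == Lmul::m2, "short input uses the small group");
static_assert(select_lmul(36) == Lmul::m2, "threshold is inclusive");
static_assert(select_lmul(72) == Lmul::m4, "long input uses m4");
static_assert(select_lmul(24, Lmul::m1) == Lmul::m1, "m1 variant for short input");
```

Whether the extra branch pays for itself at small counts is exactly what the 24/36 cases would have to show.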

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19825#discussion_r1652354533