RFR: 8349556: RISC-V: improve the performance when -COH and -AvoidUnalignedAccesses for UL and LU string comparison [v2]
Vladimir Kempik
vkempik at openjdk.org
Mon Feb 10 11:03:11 UTC 2025
On Fri, 7 Feb 2025 12:29:33 GMT, Hamlin Li <mli at openjdk.org> wrote:
>> Hi,
>>
>> Can you help to review the patch?
>>
>> It tries to improve the string compare when AvoidUnalignedAccesses == false && encoding is LU or UL (i.e. 2 strings encodings are different with each other).
>> The jmh test shows when `-CompactObjectHeaders` (i.e. -COH) && `-AvoidUnalignedAccesses`, the patch bring much better performance, and in other cases, it does not bring obvious regression. And currently by default it's -COH.
>>
>> Thanks
>>
>> ### Performance
>>
>> it's run on bananapi.
>>
>> -COH-AvoidUnalignedAccesses
>> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">
>> “-COH-Avoid” | (delta) | (size) | (utf16) | Mode | Cnt | Score - master | Score - patch | Error | Units | Improvement
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLU | 2 | 24 | N/A | avgt | 10 | 6438443.073 | 6383881.891 | 36912.539 | ns/op | 0.009
>> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLU | 2 | 36 | N/A | avgt | 10 | 9421176.34 | 9390907.1 | 21034.266 | ns/op | 0.003
>> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLU | 2 | 72 | N/A | avgt | 10 | 18592342.33 | 16871350.38 | 15550.827 | ns/op | 0.102
>> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLU | 2 | 128 | N/A | avgt | 10 | 30916157.05 | 28646961.11 | 9263.556 | ns/op | 0.079
>> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLU | 2 | 256 | N/A | avgt | 10 | 58945069.17 | 55505097.77 | 8803.847 | ns/op | 0.062
>> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToLU | 2 | 512 | N/A | avgt | 10 | 115520355.5 | 110233842.8 | 35056.972 | ns/op | 0.048
>> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToUL | 2 | 24 | N/A | avgt | 10 | 7541299.83 | 7481385.995 | 43240.713 | ns/op | 0.008
>> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToUL | 2 | 36 | N/A | avgt | 10 | 10295051.77 | 10264978.04 | 38938.956 | ns/op | 0.003
>> com.arm.benchmarks.intrinsics.StringCompareToDifferentLength.compareToUL | 2 | 72 | N/A | avgt | 10 | 19652419.64 | 17953481.41 | 10987.17 | ns/op | 0.095
>> com.arm.benchmarks....
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
> refine comments
I can't see how this end up improving perf, You have moved loading job for first few bytes from one place to another.
Technically it's correct and if jmh shows improvement - let it be
-------------
PR Comment: https://git.openjdk.org/jdk/pull/23495#issuecomment-2647644475
More information about the hotspot-compiler-dev
mailing list