RFR: 8345120: A likely bug in StringSupport::chunkedStrlenShort

Fri Nov 29 08:12:13 UTC 2024

On Fri, 29 Nov 2024 07:55:25 GMT, Per Minborg <pminborg at openjdk.org> wrote:

> This PR proposes to rewrite the `StringSupport::chunkedStrlen*` methods and move them to the class `SegmentBulkOperations`, where the other bulk operations reside.
> 
> This PR fixes a bug in the `short_strlen` variant for odd offsets (`offset % 2 != 0`).
> 
> This PR also improves performance. On modern hardware there is no need for a pre-loop alignment step, and removing it speeds up larger strings by about 30%.
> 
> Passes the `jdk_foreign` test suite.
> 
> Base:
> 
> Benchmark                               (size)  Mode  Cnt    Score   Error  Units
> InternalStrLen.changedElementQuad            1  avgt   30    2.057 ± 0.012  ns/op
> InternalStrLen.changedElementQuad            4  avgt   30    3.776 ± 0.031  ns/op
> InternalStrLen.changedElementQuad           16  avgt   30    6.690 ± 0.060  ns/op
> InternalStrLen.changedElementQuad          251  avgt   30   48.581 ± 0.764  ns/op
> InternalStrLen.changedElementQuad         1024  avgt   30  196.188 ± 3.484  ns/op
> InternalStrLen.chunkedDouble                 1  avgt   30    1.903 ± 0.013  ns/op
> InternalStrLen.chunkedDouble                 4  avgt   30    3.446 ± 0.025  ns/op
> InternalStrLen.chunkedDouble                16  avgt   30    5.759 ± 0.062  ns/op
> InternalStrLen.chunkedDouble               251  avgt   30   26.892 ± 0.141  ns/op
> InternalStrLen.chunkedDouble              1024  avgt   30   72.940 ± 1.562  ns/op
> InternalStrLen.chunkedSingle                 1  avgt   30    1.897 ± 0.015  ns/op
> InternalStrLen.chunkedSingle                 4  avgt   30    5.357 ± 0.560  ns/op
> InternalStrLen.chunkedSingle                16  avgt   30    3.821 ± 0.052  ns/op
> InternalStrLen.chunkedSingle               251  avgt   30   19.482 ± 0.190  ns/op
> InternalStrLen.chunkedSingle              1024  avgt   30   38.938 ± 0.411  ns/op
> InternalStrLen.chunkedSingleMisaligned       1  avgt   30    2.230 ± 0.147  ns/op
> InternalStrLen.chunkedSingleMisaligned       4  avgt   30    5.424 ± 0.688  ns/op
> InternalStrLen.chunkedSingleMisaligned      16  avgt   30    9.573 ± 0.063  ns/op
> InternalStrLen.chunkedSingleMisaligned     251  avgt   30   22.242 ± 0.182  ns/op
> InternalStrLen.chunkedSingleMisaligned    1024  avgt   30   45.442 ± 0.252  ns/op
> InternalStrLen.elementByteMisaligned         1  avgt   30    1.616 ± 0.041  ns/op
> InternalStrLen.elementByteMisaligned         4  avgt   30    2.982 ± 0.018  ns/op
> InternalStrLen.elementByteMisaligned        16  avgt   30    8.662 ± 0.085  ns/op
> InternalStrLen.elementByteMisaligned       251  avgt   30 ...
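
To make the odd-offset issue concrete, here is a minimal Java sketch of the property a correct 2-byte strlen must have. It is not the code in this PR. The idea: step in 2-byte units counted from the string's start offset, whatever that offset's parity. Everything in it is an assumption for illustration: the class `ShortStrlenSketch` and its methods, the plain `byte[]` read through `java.nio.ByteBuffer` (standing in for a `MemorySegment`), and the convention of returning the length in bytes. The chunked variant checks four 16-bit lanes per 8-byte read with a common SWAR zero-lane test and then finishes element by element.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration (not the JDK code): strlen over 2-byte units stored in a byte[].
// Both methods return the length in bytes, excluding the 2-byte zero terminator.
final class ShortStrlenSketch {

    // 0x0001 and 0x8000 repeated in each 16-bit lane of a long.
    private static final long LANE_ONES = 0x0001_0001_0001_0001L;
    private static final long LANE_HIGHS = 0x8000_8000_8000_8000L;

    // Reference version: one 16-bit unit per step, counted from `offset` (odd or even).
    static int naiveShortStrlen(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        for (int i = offset; i + Short.BYTES <= data.length; i += Short.BYTES) {
            if (buf.getShort(i) == 0) {
                return i - offset;
            }
        }
        throw new IllegalArgumentException("no 2-byte terminator found");
    }

    // Chunked version: each 8-byte read covers four 16-bit lanes starting at
    // i, i + 2, i + 4 and i + 6, where i advances from `offset` in steps of 8,
    // so lane boundaries follow the string rather than the absolute index.
    static int chunkedShortStrlen(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int i = offset;
        for (; i + Long.BYTES <= data.length; i += Long.BYTES) {
            long v = buf.getLong(i);
            // Nonzero exactly when at least one 16-bit lane of v is zero.
            if (((v - LANE_ONES) & ~v & LANE_HIGHS) != 0) {
                break;
            }
        }
        // Locate the zero unit inside the flagged chunk, or scan the remaining tail.
        for (; i + Short.BYTES <= data.length; i += Short.BYTES) {
            if (buf.getShort(i) == 0) {
                return i - offset;
            }
        }
        throw new IllegalArgumentException("no 2-byte terminator found");
    }
}
```

With `data = {7, 'a', 0, 'b', 0, 0, 0, 9, ...}` and offset 1, both methods return 4. Units taken one byte off (starting at index 2) would pair bytes differently and stop at the wrong place. This shows how a parity mistake corrupts the result; the message itself does not describe the original code's exact failure.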

Here is a chart of the performance improvements on a Mac M1 for 251-element strings:

![image](https://github.com/user-attachments/assets/88682680-bcf0-4b7e-b35c-0a0cad523852)
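
A second hedged sketch illustrates dropping the pre-loop alignment for the single-byte case. The scan issues 8-byte reads straight from the given offset, with no byte-by-byte prologue to reach an 8-byte boundary. The class `ByteStrlenSketch` and the `byte[]`/`ByteBuffer` model are assumptions for illustration. The zero-byte test is the well-known SWAR bit trick, and the PR text does not say which test `SegmentBulkOperations` actually uses.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration (not the JDK code): strlen of a NUL-terminated byte
// string that reads 8 bytes at a time from `offset`, with no alignment prologue.
final class ByteStrlenSketch {

    // 0x01 and 0x80 repeated in each byte of a long.
    private static final long BYTE_ONES = 0x0101_0101_0101_0101L;
    private static final long BYTE_HIGHS = 0x8080_8080_8080_8080L;

    // Returns the number of bytes before the first 0 byte at or after `offset`.
    static int chunkedStrlen(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int i = offset;
        // Main loop: one 8-byte read per iteration, whatever the parity or alignment of i.
        for (; i + Long.BYTES <= data.length; i += Long.BYTES) {
            long v = buf.getLong(i);
            // Nonzero exactly when at least one byte of v is zero.
            if (((v - BYTE_ONES) & ~v & BYTE_HIGHS) != 0) {
                break;
            }
        }
        // Locate the terminator inside the flagged chunk, or scan the last < 8 bytes.
        for (; i < data.length; i++) {
            if (data[i] == 0) {
                return i - offset;
            }
        }
        throw new IllegalArgumentException("no terminator found");
    }
}
```

Skipping the prologue pays off only when unaligned 8-byte loads are cheap. That is generally the case on current x86-64 and AArch64 cores, which is consistent with the roughly 30% gain reported above for larger strings.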

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22451#issuecomment-2507283173