RFR: 8345120: A likely bug in StringSupport::chunkedStrlenShort
Per Minborg
pminborg at openjdk.org
Fri Nov 29 08:12:13 UTC 2024
On Fri, 29 Nov 2024 07:55:25 GMT, Per Minborg <pminborg at openjdk.org> wrote:
> This PR proposes to rewrite the `StringSupport::chunkedStrlen*` methods and move them to the class `SegmentBulkOperations` where other bulk operations reside.
>
> This PR fixes a bug in the `short_strlen` variant for offsets that were odd (`offset % 2 != 0`).
>
> This PR also improves performance on modern hardware, as there is no need for pre-looping alignment. Removing this improves performance by about 30% for larger strings.
>
> Passes the `jdk_foreign` test suit.
>
> Base:
>
>
> Benchmark (size) Mode Cnt Score Error Units
> InternalStrLen.changedElementQuad 1 avgt 30 2.057 ? 0.012 ns/op
> InternalStrLen.changedElementQuad 4 avgt 30 3.776 ? 0.031 ns/op
> InternalStrLen.changedElementQuad 16 avgt 30 6.690 ? 0.060 ns/op
> InternalStrLen.changedElementQuad 251 avgt 30 48.581 ? 0.764 ns/op
> InternalStrLen.changedElementQuad 1024 avgt 30 196.188 ? 3.484 ns/op
> InternalStrLen.chunkedDouble 1 avgt 30 1.903 ? 0.013 ns/op
> InternalStrLen.chunkedDouble 4 avgt 30 3.446 ? 0.025 ns/op
> InternalStrLen.chunkedDouble 16 avgt 30 5.759 ? 0.062 ns/op
> InternalStrLen.chunkedDouble 251 avgt 30 26.892 ? 0.141 ns/op
> InternalStrLen.chunkedDouble 1024 avgt 30 72.940 ? 1.562 ns/op
> InternalStrLen.chunkedSingle 1 avgt 30 1.897 ? 0.015 ns/op
> InternalStrLen.chunkedSingle 4 avgt 30 5.357 ? 0.560 ns/op
> InternalStrLen.chunkedSingle 16 avgt 30 3.821 ? 0.052 ns/op
> InternalStrLen.chunkedSingle 251 avgt 30 19.482 ? 0.190 ns/op
> InternalStrLen.chunkedSingle 1024 avgt 30 38.938 ? 0.411 ns/op
> InternalStrLen.chunkedSingleMisaligned 1 avgt 30 2.230 ? 0.147 ns/op
> InternalStrLen.chunkedSingleMisaligned 4 avgt 30 5.424 ? 0.688 ns/op
> InternalStrLen.chunkedSingleMisaligned 16 avgt 30 9.573 ? 0.063 ns/op
> InternalStrLen.chunkedSingleMisaligned 251 avgt 30 22.242 ? 0.182 ns/op
> InternalStrLen.chunkedSingleMisaligned 1024 avgt 30 45.442 ? 0.252 ns/op
> InternalStrLen.elementByteMisaligned 1 avgt 30 1.616 ? 0.041 ns/op
> InternalStrLen.elementByteMisaligned 4 avgt 30 2.982 ? 0.018 ns/op
> InternalStrLen.elementByteMisaligned 16 avgt 30 8.662 ? 0.085 ns/op
> InternalStrLen.elementByteMisaligned 251 avgt 30 ...
Here is a chart of the performance improvements on a Mac M1 with 251 element strings:

-------------
PR Comment: https://git.openjdk.org/jdk/pull/22451#issuecomment-2507283173
More information about the core-libs-dev
mailing list