RFR: 8364320: String encodeUTF8 latin1 with negatives
Brett Okken
duke at openjdk.org
Mon Aug 11 13:35:13 UTC 2025
On Fri, 1 Aug 2025 16:12:46 GMT, Chen Liang <liach at openjdk.org> wrote:
>> Benchmark on win64
>>
>> Baseline:
>>
>>
>> Benchmark                           (charsetName)  Mode  Cnt      Score     Error  Units
>> StringEncode.encodeAllMixed                 UTF-8  avgt   10  20067.519 ± 528.152  ns/op
>> StringEncode.encodeAsciiLong                UTF-8  avgt   10  12115.389 ± 307.491  ns/op
>> StringEncode.encodeAsciiShort               UTF-8  avgt   10     70.098 ±   1.696  ns/op
>> StringEncode.encodeLatin1LongEnd            UTF-8  avgt   10   1974.391 ± 162.405  ns/op
>> StringEncode.encodeLatin1LongOnly           UTF-8  avgt   10    270.097 ±  13.840  ns/op
>> StringEncode.encodeLatin1LongStart          UTF-8  avgt   10   1876.366 ±  51.971  ns/op
>> StringEncode.encodeLatin1Mixed              UTF-8  avgt   10   4973.070 ± 130.426  ns/op
>> StringEncode.encodeLatin1Short              UTF-8  avgt   10     96.227 ±   2.816  ns/op
>> StringEncode.encodeShortMixed               UTF-8  avgt   10    360.586 ±   8.691  ns/op
>> StringEncode.encodeUTF16LongEnd             UTF-8  avgt   10   1534.748 ±  34.584  ns/op
>> StringEncode.encodeUTF16LongOnly            UTF-8  avgt   10    528.919 ±  15.143  ns/op
>> StringEncode.encodeUTF16LongStart           UTF-8  avgt   10   2275.117 ±  50.152  ns/op
>> StringEncode.encodeUTF16Mixed               UTF-8  avgt   10   4398.943 ± 116.607  ns/op
>> StringEncode.encodeUTF16Short               UTF-8  avgt   10    152.219 ±   8.677  ns/op
>>
>>
>>
>> Patch:
>>
>> Benchmark                           (charsetName)  Mode  Cnt      Score     Error  Units
>> StringEncode.encodeAllMixed                 UTF-8  avgt   10  18876.056 ± 330.644  ns/op
>> StringEncode.encodeAsciiLong                UTF-8  avgt   10  12040.590 ± 165.905  ns/op
>> StringEncode.encodeAsciiShort               UTF-8  avgt   10     69.895 ±   0.318  ns/op
>> StringEncode.encodeLatin1LongEnd            UTF-8  avgt   10    574.455 ±  14.769  ns/op
>> StringEncode.encodeLatin1LongOnly           UTF-8  avgt   10    284.553 ±   1.886  ns/op
>> StringEncode.encodeLatin1LongStart          UTF-8  avgt   10   2230.789 ±  11.043  ns/op
>> StringEncode.encodeLatin1Mixed              UTF-8  avgt   10   3278.998 ±  96.779  ns/op
>> StringEncode.encodeLatin1Short              UTF-8  avgt   10     99.332 ±   1.977  ns/op
>> StringEncode.encodeShortMixed               UTF-8  avgt   10    378.183 ±  17.504  ns/op
>> StringEncode.encodeUTF16LongEnd             UTF-8  avgt   10   1531.960 ±  19.300  ns/op
>> StringEncode.encodeUTF16LongOnly U...
>
> @bokken FYI to make JMH comparison easier, you can let JMH generate JSON reports, upload them to github gists, and use https://jmh.morethan.io/ to compare the two results from two gists.
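For a quick read of the two tables quoted above without any tooling, the relative change per benchmark can be computed directly from the scores. The following is only a minimal sketch: the class name `ScoreDelta` and the helper `percentChange` are hypothetical names chosen for illustration, and the numbers are copied by hand from the quoted baseline and patch tables for a few of the rows. Since the mode is `avgt` (ns/op), a negative change means the patch is faster. Changes that fall within the reported error margins should probably be treated as noise.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the JDK or JMH.
final class ScoreDelta {

    // Relative change of the patch score versus the baseline score, in percent.
    // For avgt (ns/op) results a negative value means the patch is faster.
    static double percentChange(double baseline, double patch) {
        return (patch - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // {baseline, patch} scores in ns/op, copied from the quoted tables.
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("encodeAllMixed", new double[] {20067.519, 18876.056});
        scores.put("encodeLatin1LongEnd", new double[] {1974.391, 574.455});
        scores.put("encodeLatin1LongStart", new double[] {1876.366, 2230.789});
        scores.put("encodeLatin1Mixed", new double[] {4973.070, 3278.998});

        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double[] s = e.getValue();
            System.out.printf(Locale.ROOT, "%-22s %10.3f -> %10.3f ns/op (%+.1f%%)%n",
                    e.getKey(), s[0], s[1], percentChange(s[0], s[1]));
        }
    }
}
```

On these rows the sketch reports roughly a 71% reduction for `encodeLatin1LongEnd` and roughly a 19% increase for `encodeLatin1LongStart`.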
@liach / @RogerRiggs I have been experimenting locally with some other options, which are a bit more complex:
https://github.com/bokken/jdk/commits/string-utf8-mincopylength/
This commit seems like a decent balance of complexity versus gain: https://github.com/bokken/jdk/commit/ee9d9e3496052fd5084f989bd7181504989d812b
I am continuing to evaluate various options.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/26597#issuecomment-3174871115