RFR: 8364320: String encodeUTF8 latin1 with negatives

Brett Okken duke at openjdk.org
Fri Aug 1 12:41:56 UTC 2025


On Fri, 1 Aug 2025 12:34:15 GMT, Brett Okken <duke at openjdk.org> wrote:

> As suggested on the mailing list, when encoding latin1 bytes to UTF-8 we can count the leading positive bytes. If a negative byte is present, we can copy all of the leading positive values to the target byte[] before processing the remaining data 1 byte at a time.
> 
> https://mail.openjdk.org/pipermail/core-libs-dev/2025-July/149417.html
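
For readers who want the idea in code, here is a minimal sketch of the approach described above. It is not the code from the PR: the class name Latin1ToUtf8Sketch and both method names are hypothetical, and the plain loop in countLeadingPositives only stands in for whatever optimized counting the JDK uses internally.

```java
import java.util.Arrays;

public class Latin1ToUtf8Sketch {

    // Hypothetical helper: number of leading bytes that are >= 0 (plain ASCII).
    static int countLeadingPositives(byte[] src) {
        int i = 0;
        while (i < src.length && src[i] >= 0) {
            i++;
        }
        return i;
    }

    // Encodes latin1 bytes as UTF-8 (illustrative only, not the JDK implementation).
    static byte[] encode(byte[] latin1) {
        int ascii = countLeadingPositives(latin1);
        if (ascii == latin1.length) {
            // No negative bytes: the UTF-8 form is identical to the input.
            return Arrays.copyOf(latin1, latin1.length);
        }
        // Worst case: every byte after the ASCII prefix expands to 2 bytes.
        byte[] dst = new byte[ascii + 2 * (latin1.length - ascii)];
        // Bulk-copy the positive prefix before switching to the slow loop.
        System.arraycopy(latin1, 0, dst, 0, ascii);
        int dp = ascii;
        for (int sp = ascii; sp < latin1.length; sp++) {
            byte b = latin1[sp];
            if (b >= 0) {
                dst[dp++] = b;
            } else {
                // Code points 0x80..0xFF become a 2-byte UTF-8 sequence.
                dst[dp++] = (byte) (0xC0 | ((b & 0xFF) >> 6));
                dst[dp++] = (byte) (0x80 | (b & 0x3F));
            }
        }
        return Arrays.copyOf(dst, dp);
    }
}
```

With this shape, an input whose only negative bytes sit near the end gets almost all of its content moved by the bulk copy, which is the case the encodeLatin1LongEnd benchmark exercises.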

Benchmark on win64

Baseline:


Benchmark                           (charsetName)  Mode  Cnt      Score     Error  Units
StringEncode.encodeAllMixed                 UTF-8  avgt   10  20067.519 ± 528.152  ns/op
StringEncode.encodeAsciiLong                UTF-8  avgt   10  12115.389 ± 307.491  ns/op
StringEncode.encodeAsciiShort               UTF-8  avgt   10     70.098 ±   1.696  ns/op
StringEncode.encodeLatin1LongEnd            UTF-8  avgt   10   1974.391 ± 162.405  ns/op
StringEncode.encodeLatin1LongOnly           UTF-8  avgt   10    270.097 ±  13.840  ns/op
StringEncode.encodeLatin1LongStart          UTF-8  avgt   10   1876.366 ±  51.971  ns/op
StringEncode.encodeLatin1Mixed              UTF-8  avgt   10   4973.070 ± 130.426  ns/op
StringEncode.encodeLatin1Short              UTF-8  avgt   10     96.227 ±   2.816  ns/op
StringEncode.encodeShortMixed               UTF-8  avgt   10    360.586 ±   8.691  ns/op
StringEncode.encodeUTF16LongEnd             UTF-8  avgt   10   1534.748 ±  34.584  ns/op
StringEncode.encodeUTF16LongOnly            UTF-8  avgt   10    528.919 ±  15.143  ns/op
StringEncode.encodeUTF16LongStart           UTF-8  avgt   10   2275.117 ±  50.152  ns/op
StringEncode.encodeUTF16Mixed               UTF-8  avgt   10   4398.943 ± 116.607  ns/op
StringEncode.encodeUTF16Short               UTF-8  avgt   10    152.219 ±   8.677  ns/op



Patch:

Benchmark                           (charsetName)  Mode  Cnt      Score     Error  Units
StringEncode.encodeAllMixed                 UTF-8  avgt   10  18876.056 ± 330.644  ns/op
StringEncode.encodeAsciiLong                UTF-8  avgt   10  12040.590 ± 165.905  ns/op
StringEncode.encodeAsciiShort               UTF-8  avgt   10     69.895 ±   0.318  ns/op
StringEncode.encodeLatin1LongEnd            UTF-8  avgt   10    574.455 ±  14.769  ns/op
StringEncode.encodeLatin1LongOnly           UTF-8  avgt   10    284.553 ±   1.886  ns/op
StringEncode.encodeLatin1LongStart          UTF-8  avgt   10   2230.789 ±  11.043  ns/op
StringEncode.encodeLatin1Mixed              UTF-8  avgt   10   3278.998 ±  96.779  ns/op
StringEncode.encodeLatin1Short              UTF-8  avgt   10     99.332 ±   1.977  ns/op
StringEncode.encodeShortMixed               UTF-8  avgt   10    378.183 ±  17.504  ns/op
StringEncode.encodeUTF16LongEnd             UTF-8  avgt   10   1531.960 ±  19.300  ns/op
StringEncode.encodeUTF16LongOnly            UTF-8  avgt   10    563.810 ±   4.811  ns/op
StringEncode.encodeUTF16LongStart           UTF-8  avgt   10   2270.970 ±  28.495  ns/op
StringEncode.encodeUTF16Mixed               UTF-8  avgt   10   4403.824 ±  60.338  ns/op
StringEncode.encodeUTF16Short               UTF-8  avgt   10    158.600 ±   2.044  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26597#issuecomment-3144446972

