RFR: 8316426: Optimization for HexFormat.formatHex

温绍锦 duke at openjdk.org
Mon Sep 18 15:45:01 UTC 2023

On Fri, 15 Sep 2023 18:04:29 GMT, 温绍锦 <duke at openjdk.org> wrote:

> PR #15591 by @cl4es discussed the advantages of avoiding a lookup table.
> But when the input is a byte[], a lookup table can improve performance.
> For HexFormat#formatHex(Appendable, byte[]) and HexFormat#formatHex(byte[]), the longer the byte[], the more pronounced the gain from the table lookup.

The performance test results are as follows:

## 0. script

bash configure
make images
sh make/devkit/createJMHBundle.sh
bash configure --with-jmh=build/jmh/jars
make test TEST="micro:java.util.HexFormatBench.*"

## 1. [aliyun_ecs_c8i.xlarge](https://help.aliyun.com/document_detail/25378.html#c8i)
* cpu : intel xeon sapphire rapids (x64)
* os debian linux

-Benchmark                           (size)  Mode  Cnt  Score   Error  Units (baseline)
-HexFormatBench.appenderLower           512  avgt   15  2.768 ? 0.034  us/op
-HexFormatBench.appenderLowerCached     512  avgt   15  2.796 ? 0.042  us/op
-HexFormatBench.appenderUpper           512  avgt   15  2.800 ? 0.032  us/op
-HexFormatBench.appenderUpperCached     512  avgt   15  2.781 ? 0.018  us/op
-HexFormatBench.formatLower             512  avgt   15  0.544 ? 0.002  us/op
-HexFormatBench.formatLowerCached       512  avgt   15  0.548 ? 0.004  us/op
-HexFormatBench.formatUpper             512  avgt   15  0.546 ? 0.007  us/op
-HexFormatBench.formatUpperCached       512  avgt   15  0.550 ? 0.005  us/op
-HexFormatBench.toHexDigitsByte         512  avgt   15  3.364 ? 0.015  us/op
-HexFormatBench.toHexDigitsInt          512  avgt   15  3.770 ? 0.017  us/op
-HexFormatBench.toHexDigitsLong         512  avgt   15  4.990 ? 0.018  us/op
-HexFormatBench.toHexDigitsShort        512  avgt   15  3.466 ? 0.017  us/op
-HexFormatBench.toHexLower              512  avgt   15  0.415 ? 0.005  us/op
-HexFormatBench.toHexLowerCached        512  avgt   15  0.422 ? 0.005  us/op
-HexFormatBench.toHexUpper              512  avgt   15  0.413 ? 0.005  us/op
-HexFormatBench.toHexUpperCached        512  avgt   15  0.423 ? 0.004  us/op

+Benchmark                           (size)  Mode  Cnt  Score    Error  Units (optimized)
+HexFormatBench.appenderLower           512  avgt   15  0.163 ?  0.001  us/op (+1598.16)
+HexFormatBench.appenderLowerCached     512  avgt   15  0.161 ?  0.001  us/op (+1636.65)
+HexFormatBench.appenderUpper           512  avgt   15  0.251 ?  0.023  us/op (+1015.54)
+HexFormatBench.appenderUpperCached     512  avgt   15  0.266 ?  0.001  us/op (+945.49)
+HexFormatBench.formatLower             512  avgt   15  0.275 ?  0.001  us/op (+97.82)
+HexFormatBench.formatLowerCached       512  avgt   15  0.277 ?  0.001  us/op (+97.84)
+HexFormatBench.formatUpper             512  avgt   15  0.285 ?  0.001  us/op (+91.58)
+HexFormatBench.formatUpperCached       512  avgt   15  0.285 ?  0.001  us/op (+92.99)
+HexFormatBench.toHexDigitsByte         512  avgt   15  3.554 ?  0.028  us/op (-5.35)
+HexFormatBench.toHexDigitsInt          512  avgt   15  3.910 ?  0.015  us/op (-3.59)
+HexFormatBench.toHexDigitsLong         512  avgt   15  5.288 ?  0.018  us/op (-5.64)
+HexFormatBench.toHexDigitsShort        512  avgt   15  3.637 ?  0.012  us/op (-4.71)
+HexFormatBench.toHexLower              512  avgt   15  0.445 ?  0.001  us/op (-6.75)
+HexFormatBench.toHexLowerCached        512  avgt   15  0.442 ?  0.001  us/op (-4.53)
+HexFormatBench.toHexUpper              512  avgt   15  0.445 ?  0.001  us/op (-7.20)
+HexFormatBench.toHexUpperCached        512  avgt   15  0.441 ?  0.001  us/op (-4.09)
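The parenthesized figures next to the optimized scores are consistent with the relative improvement `(baseline / optimized - 1) * 100`, so a positive value means the optimized build is faster. A minimal sketch of that arithmetic follows; the class and method names are made up for illustration and are not part of the PR.

```java
import java.util.Locale;

public class SpeedupSketch {
    // Relative improvement in percent for "lower is better" scores (us/op).
    static double improvementPercent(double baseline, double optimized) {
        return (baseline / optimized - 1.0) * 100.0;
    }

    // Formats the value the way it appears in the tables, e.g. "+97.82".
    static String format(double baseline, double optimized) {
        return String.format(Locale.ROOT, "%+.2f", improvementPercent(baseline, optimized));
    }

    public static void main(String[] args) {
        // appenderLower on the x64 machine: 2.768 us/op -> 0.163 us/op
        System.out.println(format(2.768, 0.163)); // +1598.16
        // toHexDigitsByte on the x64 machine: 3.364 us/op -> 3.554 us/op
        System.out.println(format(3.364, 3.554)); // -5.35
    }
}
```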

## 2. [aliyun_ecs_c8y.xlarge](https://help.aliyun.com/document_detail/25378.html#c8y)
* cpu : aliyun yitian 710 (aarch64)
* os debian linux

-Benchmark                           (size)  Mode  Cnt  Score   Error  Units (baseline)
-HexFormatBench.appenderLower           512  avgt   15  2.857 ? 0.791  us/op
-HexFormatBench.appenderLowerCached     512  avgt   15  2.832 ? 0.758  us/op
-HexFormatBench.appenderUpper           512  avgt   15  2.360 ? 0.010  us/op
-HexFormatBench.appenderUpperCached     512  avgt   15  2.361 ? 0.013  us/op
-HexFormatBench.formatLower             512  avgt   15  0.947 ? 0.406  us/op
-HexFormatBench.formatLowerCached       512  avgt   15  0.616 ? 0.002  us/op
-HexFormatBench.formatUpper             512  avgt   15  1.212 ? 0.411  us/op
-HexFormatBench.formatUpperCached       512  avgt   15  0.616 ? 0.001  us/op
-HexFormatBench.toHexDigitsByte         512  avgt   15  5.844 ? 0.264  us/op
-HexFormatBench.toHexDigitsInt          512  avgt   15  7.392 ? 0.207  us/op
-HexFormatBench.toHexDigitsLong         512  avgt   15  8.068 ? 0.303  us/op
-HexFormatBench.toHexDigitsShort        512  avgt   15  6.214 ? 0.266  us/op
-HexFormatBench.toHexLower              512  avgt   15  0.926 ? 0.003  us/op
-HexFormatBench.toHexLowerCached        512  avgt   15  1.000 ? 0.005  us/op
-HexFormatBench.toHexUpper              512  avgt   15  0.927 ? 0.002  us/op
-HexFormatBench.toHexUpperCached        512  avgt   15  0.999 ? 0.020  us/op

+Benchmark                           (size)  Mode  Cnt  Score    Error  Units (optimized)
+HexFormatBench.appenderLower           512  avgt   15  0.356 ?  0.001  us/op (+702.53)
+HexFormatBench.appenderLowerCached     512  avgt   15  0.356 ?  0.001  us/op (+695.51)
+HexFormatBench.appenderUpper           512  avgt   15  0.304 ?  0.001  us/op (+676.32)
+HexFormatBench.appenderUpperCached     512  avgt   15  0.304 ?  0.001  us/op (+676.65)
+HexFormatBench.formatLower             512  avgt   15  0.461 ?  0.001  us/op (+105.43)
+HexFormatBench.formatLowerCached       512  avgt   15  0.485 ?  0.001  us/op (+27.02)
+HexFormatBench.formatUpper             512  avgt   15  0.644 ?  0.003  us/op (+88.20)
+HexFormatBench.formatUpperCached       512  avgt   15  0.595 ?  0.003  us/op (+3.53)
+HexFormatBench.toHexDigitsByte         512  avgt   15  5.804 ?  0.237  us/op (+0.69)
+HexFormatBench.toHexDigitsInt          512  avgt   15  7.209 ?  0.212  us/op (+2.54)
+HexFormatBench.toHexDigitsLong         512  avgt   15  8.301 ?  0.422  us/op (-2.81)
+HexFormatBench.toHexDigitsShort        512  avgt   15  5.908 ?  0.255  us/op (+5.18)
+HexFormatBench.toHexLower              512  avgt   15  0.494 ?  0.001  us/op (+87.45)
+HexFormatBench.toHexLowerCached        512  avgt   15  0.494 ?  0.001  us/op (+102.43)
+HexFormatBench.toHexUpper              512  avgt   15  0.494 ?  0.001  us/op (+87.66)
+HexFormatBench.toHexUpperCached        512  avgt   15  0.493 ?  0.001  us/op (+102.64)

## 3. MacBook Pro M1 Pro

-Benchmark                           (size)  Mode  Cnt  Score    Error  Units (baseline)
-HexFormatBench.appenderLower           512  avgt   15  2.867 ?  0.035  us/op
-HexFormatBench.appenderLowerCached     512  avgt   15  1.656 ?  0.875  us/op
-HexFormatBench.appenderUpper           512  avgt   15  2.813 ?  0.085  us/op
-HexFormatBench.appenderUpperCached     512  avgt   15  1.575 ?  0.901  us/op
-HexFormatBench.formatLower             512  avgt   15  0.385 ?  0.001  us/op
-HexFormatBench.formatLowerCached       512  avgt   15  0.385 ?  0.002  us/op
-HexFormatBench.formatUpper             512  avgt   15  0.385 ?  0.001  us/op
-HexFormatBench.formatUpperCached       512  avgt   15  0.384 ?  0.001  us/op
-HexFormatBench.toHexDigitsByte         512  avgt   15  1.688 ?  0.009  us/op
-HexFormatBench.toHexDigitsInt          512  avgt   15  2.991 ?  0.015  us/op
-HexFormatBench.toHexDigitsLong         512  avgt   15  3.719 ?  0.081  us/op
-HexFormatBench.toHexDigitsShort        512  avgt   15  1.868 ?  0.010  us/op
-HexFormatBench.toHexLower              512  avgt   15  0.321 ?  0.001  us/op
-HexFormatBench.toHexLowerCached        512  avgt   15  0.322 ?  0.001  us/op
-HexFormatBench.toHexUpper              512  avgt   15  0.324 ?  0.001  us/op
-HexFormatBench.toHexUpperCached        512  avgt   15  0.325 ?  0.001  us/op

+Benchmark                           (size)  Mode  Cnt  Score    Error  Units (optimized)
+HexFormatBench.appenderLower           512  avgt   15  0.212 ?  0.003  us/op (+1252.36)
+HexFormatBench.appenderLowerCached     512  avgt   15  0.211 ?  0.001  us/op (+684.84)
+HexFormatBench.appenderUpper           512  avgt   15  0.199 ?  0.002  us/op (+1313.57)
+HexFormatBench.appenderUpperCached     512  avgt   15  0.198 ?  0.001  us/op (+695.46)
+HexFormatBench.formatLower             512  avgt   15  0.221 ?  0.001  us/op (+74.21)
+HexFormatBench.formatLowerCached       512  avgt   15  0.192 ?  0.001  us/op (+100.53)
+HexFormatBench.formatUpper             512  avgt   15  0.317 ?  0.002  us/op (+21.46)
+HexFormatBench.formatUpperCached       512  avgt   15  0.348 ?  0.003  us/op (+10.35)
+HexFormatBench.toHexDigitsByte         512  avgt   15  1.715 ?  0.011  us/op (-1.58)
+HexFormatBench.toHexDigitsInt          512  avgt   15  2.261 ?  0.012  us/op (+32.29)
+HexFormatBench.toHexDigitsLong         512  avgt   15  3.776 ?  0.023  us/op (-1.51)
+HexFormatBench.toHexDigitsShort        512  avgt   15  1.862 ?  0.011  us/op (+0.33)
+HexFormatBench.toHexLower              512  avgt   15  0.289 ?  0.004  us/op (+11.08)
+HexFormatBench.toHexLowerCached        512  avgt   15  0.294 ?  0.002  us/op (+9.53)
+HexFormatBench.toHexUpper              512  avgt   15  0.288 ?  0.001  us/op (+12.50)
+HexFormatBench.toHexUpperCached        512  avgt   15  0.295 ?  0.001  us/op (+10.17)

I added internal methods to StringBuilder as a performance optimization; I saw that the implementation of JEP 430 (String Templates) does something similar.

class AbstractStringBuilder {
    long mix(long lengthCoder) {}
    long prepend(long lengthCoder, byte[] buffer) {}
    // ...
}

However, a StringBuilder.appendHex method would be useful in more scenarios and could be considered for the public API. Should I submit a separate PR to add these methods?

class AbstractStringBuilder {
    public void appendHex(byte[] bytes) {}
    public void appendHex(byte[] bytes, boolean ucase) {}
    public void appendHex(byte[] bytes, int fromIndex, int toIndex) {}
    public void appendHex(byte[] bytes, int fromIndex, int toIndex, boolean ucase) {}
}
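For comparison, the existing public java.util.HexFormat API (JDK 17+) already covers the same four combinations by formatting into an Appendable. The sketch below is only an illustration of that equivalence; the class name and the helper `appendHex` are hypothetical and are not the proposed AbstractStringBuilder methods.

```java
import java.util.HexFormat;

public class HexFormatAppendSketch {
    // Hypothetical helper mirroring the widest proposed overload,
    // implemented purely with the public HexFormat API.
    static StringBuilder appendHex(StringBuilder sb, byte[] bytes,
                                   int fromIndex, int toIndex, boolean ucase) {
        HexFormat fmt = ucase ? HexFormat.of().withUpperCase() : HexFormat.of();
        // formatHex(Appendable, byte[], int, int) appends and returns the Appendable.
        return fmt.formatHex(sb, bytes, fromIndex, toIndex);
    }

    public static void main(String[] args) {
        byte[] data = {0x01, (byte) 0xab, (byte) 0xcd, 0x02};
        StringBuilder sb = new StringBuilder("0x");
        appendHex(sb, data, 1, 3, true);
        System.out.println(sb); // 0xABCD
    }
}
```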

As for the lookup table, I think it pays off when the byte[] is longer than 8. In real-world use the byte[] is very likely to be longer than that. The number 8 is only a rough guess; the threshold could just as well be 12 or 16.

HexDecimal#DIGITS is a 512-byte table. For a table of that size, I think the lookups are worthwhile when it is used repeatedly in a loop.
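To make the table-lookup idea concrete, here is a minimal sketch assuming a 256-entry short[] (256 x 2 bytes = 512 bytes) in which each entry packs the two ASCII hex digits of one byte value. The class name, the packing order, and the method are assumptions for illustration; the actual JDK table and the code in the PR may be laid out differently.

```java
import java.nio.charset.StandardCharsets;

public class HexTableSketch {
    // Assumed layout: high hex digit in the upper 8 bits, low digit in the lower 8 bits.
    private static final short[] DIGITS = new short[256];

    static {
        String hex = "0123456789abcdef";
        for (int i = 0; i < 256; i++) {
            DIGITS[i] = (short) ((hex.charAt(i >> 4) << 8) | hex.charAt(i & 0xF));
        }
    }

    // One table load per input byte yields both output characters.
    static String toHex(byte[] bytes) {
        byte[] out = new byte[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            short packed = DIGITS[bytes[i] & 0xFF];
            out[2 * i] = (byte) (packed >> 8);
            out[2 * i + 1] = (byte) packed;
        }
        return new String(out, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(toHex(new byte[] {0x00, 0x7f, (byte) 0xff})); // 007fff
    }
}
```

The table costs a one-time initialization and some cache footprint, which is why it is more attractive for longer inputs where the loop runs many times.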

I removed the newly added AbstractStringBuilder.appendHex method. This makes the change smaller, and the performance improvement is similar.

The new performance test results are as follows:

## 1. [aliyun_ecs_c8i.xlarge](https://help.aliyun.com/document_detail/25378.html#c8i)
* cpu : intel xeon sapphire rapids (x64)
* os debian linux

-Benchmark                           (size)  Mode  Cnt  Score   Error  Units (baseline)
-HexFormatBench.appenderLower           512  avgt   15  2.768 ? 0.034  us/op
-HexFormatBench.appenderLowerCached     512  avgt   15  2.796 ? 0.042  us/op
-HexFormatBench.appenderUpper           512  avgt   15  2.800 ? 0.032  us/op
-HexFormatBench.appenderUpperCached     512  avgt   15  2.781 ? 0.018  us/op
-HexFormatBench.formatLower             512  avgt   15  0.544 ? 0.002  us/op
-HexFormatBench.formatLowerCached       512  avgt   15  0.548 ? 0.004  us/op
-HexFormatBench.formatUpper             512  avgt   15  0.546 ? 0.007  us/op
-HexFormatBench.formatUpperCached       512  avgt   15  0.550 ? 0.005  us/op
-HexFormatBench.toHexDigitsByte         512  avgt   15  3.364 ? 0.015  us/op
-HexFormatBench.toHexDigitsInt          512  avgt   15  3.770 ? 0.017  us/op
-HexFormatBench.toHexDigitsLong         512  avgt   15  4.990 ? 0.018  us/op
-HexFormatBench.toHexDigitsShort        512  avgt   15  3.466 ? 0.017  us/op
-HexFormatBench.toHexLower              512  avgt   15  0.415 ? 0.005  us/op
-HexFormatBench.toHexLowerCached        512  avgt   15  0.422 ? 0.005  us/op
-HexFormatBench.toHexUpper              512  avgt   15  0.413 ? 0.005  us/op
-HexFormatBench.toHexUpperCached        512  avgt   15  0.423 ? 0.004  us/op

+Benchmark                           (size)  Mode  Cnt  Score    Error  Units (optimized)
+HexFormatBench.appenderLower           512  avgt   15  0.211 ?  0.002  us/op (+1211.85)
+HexFormatBench.appenderLowerCached     512  avgt   15  0.210 ?  0.004  us/op (+1231.43)
+HexFormatBench.appenderUpper           512  avgt   15  0.289 ?  0.002  us/op (+868.86)
+HexFormatBench.appenderUpperCached     512  avgt   15  0.296 ?  0.019  us/op (+839.53)
+HexFormatBench.formatLower             512  avgt   15  0.265 ?  0.001  us/op (+105.29)
+HexFormatBench.formatLowerCached       512  avgt   15  0.267 ?  0.002  us/op (+105.25)
+HexFormatBench.formatUpper             512  avgt   15  0.274 ?  0.002  us/op (+99.28)
+HexFormatBench.formatUpperCached       512  avgt   15  0.286 ?  0.019  us/op (+92.31)
+HexFormatBench.toHexDigitsByte         512  avgt   15  3.351 ?  0.011  us/op (+0.39)
+HexFormatBench.toHexDigitsInt          512  avgt   15  3.708 ?  0.011  us/op (+1.68)
+HexFormatBench.toHexDigitsLong         512  avgt   15  5.051 ?  0.014  us/op (-1.21)
+HexFormatBench.toHexDigitsShort        512  avgt   15  3.456 ?  0.012  us/op (+0.29)
+HexFormatBench.toHexLower              512  avgt   15  0.445 ?  0.001  us/op (-6.75)
+HexFormatBench.toHexLowerCached        512  avgt   15  0.441 ?  0.001  us/op (-4.31)
+HexFormatBench.toHexUpper              512  avgt   15  0.444 ?  0.001  us/op (-6.99)
+HexFormatBench.toHexUpperCached        512  avgt   15  0.441 ?  0.001  us/op (-4.09)

## 2. [aliyun_ecs_c8y.xlarge](https://help.aliyun.com/document_detail/25378.html#c8y)
* cpu : aliyun yitian 710 (aarch64)
* os debian linux

-Benchmark                           (size)  Mode  Cnt  Score   Error  Units (baseline)
-HexFormatBench.appenderLower           512  avgt   15  2.857 ? 0.791  us/op
-HexFormatBench.appenderLowerCached     512  avgt   15  2.832 ? 0.758  us/op
-HexFormatBench.appenderUpper           512  avgt   15  2.360 ? 0.010  us/op
-HexFormatBench.appenderUpperCached     512  avgt   15  2.361 ? 0.013  us/op
-HexFormatBench.formatLower             512  avgt   15  0.947 ? 0.406  us/op
-HexFormatBench.formatLowerCached       512  avgt   15  0.616 ? 0.002  us/op
-HexFormatBench.formatUpper             512  avgt   15  1.212 ? 0.411  us/op
-HexFormatBench.formatUpperCached       512  avgt   15  0.616 ? 0.001  us/op
-HexFormatBench.toHexDigitsByte         512  avgt   15  5.844 ? 0.264  us/op
-HexFormatBench.toHexDigitsInt          512  avgt   15  7.392 ? 0.207  us/op
-HexFormatBench.toHexDigitsLong         512  avgt   15  8.068 ? 0.303  us/op
-HexFormatBench.toHexDigitsShort        512  avgt   15  6.214 ? 0.266  us/op
-HexFormatBench.toHexLower              512  avgt   15  0.926 ? 0.003  us/op
-HexFormatBench.toHexLowerCached        512  avgt   15  1.000 ? 0.005  us/op
-HexFormatBench.toHexUpper              512  avgt   15  0.927 ? 0.002  us/op
-HexFormatBench.toHexUpperCached        512  avgt   15  0.999 ? 0.020  us/op

+Benchmark                           (size)  Mode  Cnt  Score    Error  Units (optimized)
+HexFormatBench.appenderLower           512  avgt   15  0.343 ?  0.001  us/op (+732.95)
+HexFormatBench.appenderLowerCached     512  avgt   15  0.345 ?  0.001  us/op (+720.87)
+HexFormatBench.appenderUpper           512  avgt   15  0.352 ?  0.002  us/op (+570.46)
+HexFormatBench.appenderUpperCached     512  avgt   15  0.349 ?  0.001  us/op (+576.51)
+HexFormatBench.formatLower             512  avgt   15  0.464 ?  0.001  us/op (+104.10)
+HexFormatBench.formatLowerCached       512  avgt   15  0.484 ?  0.002  us/op (+27.28)
+HexFormatBench.formatUpper             512  avgt   15  0.650 ?  0.001  us/op (+86.47)
+HexFormatBench.formatUpperCached       512  avgt   15  0.598 ?  0.001  us/op (+3.02)
+HexFormatBench.toHexDigitsByte         512  avgt   15  5.591 ?  0.058  us/op (+4.53)
+HexFormatBench.toHexDigitsInt          512  avgt   15  7.080 ?  0.114  us/op (+4.41)
+HexFormatBench.toHexDigitsLong         512  avgt   15  7.754 ?  0.040  us/op (+4.05)
+HexFormatBench.toHexDigitsShort        512  avgt   15  5.779 ?  0.076  us/op (+7.53)
+HexFormatBench.toHexLower              512  avgt   15  0.494 ?  0.001  us/op (+87.45)
+HexFormatBench.toHexLowerCached        512  avgt   15  0.493 ?  0.001  us/op (+102.84)
+HexFormatBench.toHexUpper              512  avgt   15  0.494 ?  0.001  us/op (+87.66)
+HexFormatBench.toHexUpperCached        512  avgt   15  0.493 ?  0.001  us/op (+102.64)

## 3. MacBook Pro M1 Pro

-Benchmark                           (size)  Mode  Cnt  Score    Error  Units (baseline)
-HexFormatBench.appenderLower           512  avgt   15  2.867 ?  0.035  us/op
-HexFormatBench.appenderLowerCached     512  avgt   15  1.656 ?  0.875  us/op
-HexFormatBench.appenderUpper           512  avgt   15  2.813 ?  0.085  us/op
-HexFormatBench.appenderUpperCached     512  avgt   15  1.575 ?  0.901  us/op
-HexFormatBench.formatLower             512  avgt   15  0.385 ?  0.001  us/op
-HexFormatBench.formatLowerCached       512  avgt   15  0.385 ?  0.002  us/op
-HexFormatBench.formatUpper             512  avgt   15  0.385 ?  0.001  us/op
-HexFormatBench.formatUpperCached       512  avgt   15  0.384 ?  0.001  us/op
-HexFormatBench.toHexDigitsByte         512  avgt   15  1.688 ?  0.009  us/op
-HexFormatBench.toHexDigitsInt          512  avgt   15  2.991 ?  0.015  us/op
-HexFormatBench.toHexDigitsLong         512  avgt   15  3.719 ?  0.081  us/op
-HexFormatBench.toHexDigitsShort        512  avgt   15  1.868 ?  0.010  us/op
-HexFormatBench.toHexLower              512  avgt   15  0.321 ?  0.001  us/op
-HexFormatBench.toHexLowerCached        512  avgt   15  0.322 ?  0.001  us/op
-HexFormatBench.toHexUpper              512  avgt   15  0.324 ?  0.001  us/op
-HexFormatBench.toHexUpperCached        512  avgt   15  0.325 ?  0.001  us/op

+Benchmark                           (size)  Mode  Cnt  Score    Error  Units (optimized)
+HexFormatBench.appenderLower           512  avgt   15  0.207 ?  0.001  us/op (+1285.03)
+HexFormatBench.appenderLowerCached     512  avgt   15  0.206 ?  0.001  us/op (+703.89)
+HexFormatBench.appenderUpper           512  avgt   15  0.225 ?  0.001  us/op (+1150.23)
+HexFormatBench.appenderUpperCached     512  avgt   15  0.225 ?  0.001  us/op (+600.00)
+HexFormatBench.formatLower             512  avgt   15  0.211 ?  0.003  us/op (+82.47)
+HexFormatBench.formatLowerCached       512  avgt   15  0.186 ?  0.001  us/op (+106.99)
+HexFormatBench.formatUpper             512  avgt   15  0.312 ?  0.001  us/op (+23.40)
+HexFormatBench.formatUpperCached       512  avgt   15  0.344 ?  0.001  us/op (+11.63)
+HexFormatBench.toHexDigitsByte         512  avgt   15  1.718 ?  0.054  us/op (-1.75)
+HexFormatBench.toHexDigitsInt          512  avgt   15  2.255 ?  0.010  us/op (+32.64)
+HexFormatBench.toHexDigitsLong         512  avgt   15  3.764 ?  0.005  us/op (-1.20)
+HexFormatBench.toHexDigitsShort        512  avgt   15  1.858 ?  0.008  us/op (+0.54)
+HexFormatBench.toHexLower              512  avgt   15  0.289 ?  0.004  us/op (+11.08)
+HexFormatBench.toHexLowerCached        512  avgt   15  0.295 ?  0.001  us/op (+9.16)
+HexFormatBench.toHexUpper              512  avgt   15  0.288 ?  0.001  us/op (+12.50)
+HexFormatBench.toHexUpperCached        512  avgt   15  0.297 ?  0.005  us/op (+9.43)


PR Comment: https://git.openjdk.org/jdk/pull/15768#issuecomment-1721723317
PR Comment: https://git.openjdk.org/jdk/pull/15768#issuecomment-1721944547
PR Comment: https://git.openjdk.org/jdk/pull/15768#issuecomment-1722180550
