RFR: 8316426: Optimization for HexFormat.formatHex
温绍锦
duke at openjdk.org
Mon Sep 18 15:45:01 UTC 2023
On Fri, 15 Sep 2023 18:04:29 GMT, 温绍锦 <duke at openjdk.org> wrote:
> PR #15591 by @cl4es discussed the advantages of avoiding a lookup table.
>
> However, when the input is a byte[], a lookup table can improve performance.
>
> For HexFormat#formatHex(Appendable, byte[]) and HexFormat#formatHex(byte[]), the longer the byte[], the larger the gain from the table lookup.
The performance test results are as follows:
## 0. script
bash configure
make images
sh make/devkit/createJMHBundle.sh
bash configure --with-jmh=build/jmh/jars
make test TEST="micro:java.util.HexFormatBench.*"
## 1. [aliyun_ecs_c8i.xlarge](https://help.aliyun.com/document_detail/25378.html#c8i)
* cpu: Intel Xeon Sapphire Rapids (x64)
* os: Debian Linux
-Benchmark (size) Mode Cnt Score Error Units (baseline)
-HexFormatBench.appenderLower 512 avgt 15 2.768 ± 0.034 us/op
-HexFormatBench.appenderLowerCached 512 avgt 15 2.796 ± 0.042 us/op
-HexFormatBench.appenderUpper 512 avgt 15 2.800 ± 0.032 us/op
-HexFormatBench.appenderUpperCached 512 avgt 15 2.781 ± 0.018 us/op
-HexFormatBench.formatLower 512 avgt 15 0.544 ± 0.002 us/op
-HexFormatBench.formatLowerCached 512 avgt 15 0.548 ± 0.004 us/op
-HexFormatBench.formatUpper 512 avgt 15 0.546 ± 0.007 us/op
-HexFormatBench.formatUpperCached 512 avgt 15 0.550 ± 0.005 us/op
-HexFormatBench.toHexDigitsByte 512 avgt 15 3.364 ± 0.015 us/op
-HexFormatBench.toHexDigitsInt 512 avgt 15 3.770 ± 0.017 us/op
-HexFormatBench.toHexDigitsLong 512 avgt 15 4.990 ± 0.018 us/op
-HexFormatBench.toHexDigitsShort 512 avgt 15 3.466 ± 0.017 us/op
-HexFormatBench.toHexLower 512 avgt 15 0.415 ± 0.005 us/op
-HexFormatBench.toHexLowerCached 512 avgt 15 0.422 ± 0.005 us/op
-HexFormatBench.toHexUpper 512 avgt 15 0.413 ± 0.005 us/op
-HexFormatBench.toHexUpperCached 512 avgt 15 0.423 ± 0.004 us/op
+Benchmark (size) Mode Cnt Score Error Units (optimized)
+HexFormatBench.appenderLower 512 avgt 15 0.163 ± 0.001 us/op (+1598.16)
+HexFormatBench.appenderLowerCached 512 avgt 15 0.161 ± 0.001 us/op (+1636.65)
+HexFormatBench.appenderUpper 512 avgt 15 0.251 ± 0.023 us/op (+1015.54)
+HexFormatBench.appenderUpperCached 512 avgt 15 0.266 ± 0.001 us/op (+945.49)
+HexFormatBench.formatLower 512 avgt 15 0.275 ± 0.001 us/op (+97.82)
+HexFormatBench.formatLowerCached 512 avgt 15 0.277 ± 0.001 us/op (+97.84)
+HexFormatBench.formatUpper 512 avgt 15 0.285 ± 0.001 us/op (+91.58)
+HexFormatBench.formatUpperCached 512 avgt 15 0.285 ± 0.001 us/op (+92.99)
+HexFormatBench.toHexDigitsByte 512 avgt 15 3.554 ± 0.028 us/op (-5.35)
+HexFormatBench.toHexDigitsInt 512 avgt 15 3.910 ± 0.015 us/op (-3.59)
+HexFormatBench.toHexDigitsLong 512 avgt 15 5.288 ± 0.018 us/op (-5.64)
+HexFormatBench.toHexDigitsShort 512 avgt 15 3.637 ± 0.012 us/op (-4.71)
+HexFormatBench.toHexLower 512 avgt 15 0.445 ± 0.001 us/op (-6.75)
+HexFormatBench.toHexLowerCached 512 avgt 15 0.442 ± 0.001 us/op (-4.53)
+HexFormatBench.toHexUpper 512 avgt 15 0.445 ± 0.001 us/op (-7.20)
+HexFormatBench.toHexUpperCached 512 avgt 15 0.441 ± 0.001 us/op (-4.09)
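The post does not say how the parenthesized figures are derived. They appear consistent with the relative improvement in percent, (baseline / optimized - 1) * 100, for an average-time score where lower is better. A minimal sketch under that assumption follows; the class and method names are hypothetical.

```java
import java.util.Locale;

final class SpeedupSketch {
    // Assumed formula (inferred from the numbers, not stated in the post):
    // relative improvement in percent for a lower-is-better score.
    static double improvementPercent(double baselineUsPerOp, double optimizedUsPerOp) {
        return (baselineUsPerOp / optimizedUsPerOp - 1.0) * 100.0;
    }

    // Formats with an explicit sign and two decimals, like the table annotations.
    static String format(double baselineUsPerOp, double optimizedUsPerOp) {
        return String.format(Locale.ROOT, "%+.2f",
                improvementPercent(baselineUsPerOp, optimizedUsPerOp));
    }

    public static void main(String[] args) {
        // appenderLower on the x64 machine: 2.768 us/op -> 0.163 us/op
        System.out.println(format(2.768, 0.163)); // +1598.16
        // toHexDigitsByte on the x64 machine: 3.364 us/op -> 3.554 us/op
        System.out.println(format(3.364, 3.554)); // -5.35
    }
}
```

Read this way, a value of +100 means the optimized run takes half the time per operation, and a negative value means a regression.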
## 2. [aliyun_ecs_c8y.xlarge](https://help.aliyun.com/document_detail/25378.html#c8y)
* cpu: Aliyun Yitian 710 (aarch64)
* os: Debian Linux
-Benchmark (size) Mode Cnt Score Error Units (baseline)
-HexFormatBench.appenderLower 512 avgt 15 2.857 ± 0.791 us/op
-HexFormatBench.appenderLowerCached 512 avgt 15 2.832 ± 0.758 us/op
-HexFormatBench.appenderUpper 512 avgt 15 2.360 ± 0.010 us/op
-HexFormatBench.appenderUpperCached 512 avgt 15 2.361 ± 0.013 us/op
-HexFormatBench.formatLower 512 avgt 15 0.947 ± 0.406 us/op
-HexFormatBench.formatLowerCached 512 avgt 15 0.616 ± 0.002 us/op
-HexFormatBench.formatUpper 512 avgt 15 1.212 ± 0.411 us/op
-HexFormatBench.formatUpperCached 512 avgt 15 0.616 ± 0.001 us/op
-HexFormatBench.toHexDigitsByte 512 avgt 15 5.844 ± 0.264 us/op
-HexFormatBench.toHexDigitsInt 512 avgt 15 7.392 ± 0.207 us/op
-HexFormatBench.toHexDigitsLong 512 avgt 15 8.068 ± 0.303 us/op
-HexFormatBench.toHexDigitsShort 512 avgt 15 6.214 ± 0.266 us/op
-HexFormatBench.toHexLower 512 avgt 15 0.926 ± 0.003 us/op
-HexFormatBench.toHexLowerCached 512 avgt 15 1.000 ± 0.005 us/op
-HexFormatBench.toHexUpper 512 avgt 15 0.927 ± 0.002 us/op
-HexFormatBench.toHexUpperCached 512 avgt 15 0.999 ± 0.020 us/op
+Benchmark (size) Mode Cnt Score Error Units (optimized)
+HexFormatBench.appenderLower 512 avgt 15 0.356 ± 0.001 us/op (+702.53)
+HexFormatBench.appenderLowerCached 512 avgt 15 0.356 ± 0.001 us/op (+695.51)
+HexFormatBench.appenderUpper 512 avgt 15 0.304 ± 0.001 us/op (+676.32)
+HexFormatBench.appenderUpperCached 512 avgt 15 0.304 ± 0.001 us/op (+676.65)
+HexFormatBench.formatLower 512 avgt 15 0.461 ± 0.001 us/op (+105.43)
+HexFormatBench.formatLowerCached 512 avgt 15 0.485 ± 0.001 us/op (+27.02)
+HexFormatBench.formatUpper 512 avgt 15 0.644 ± 0.003 us/op (+88.20)
+HexFormatBench.formatUpperCached 512 avgt 15 0.595 ± 0.003 us/op (+3.53)
+HexFormatBench.toHexDigitsByte 512 avgt 15 5.804 ± 0.237 us/op (+0.69)
+HexFormatBench.toHexDigitsInt 512 avgt 15 7.209 ± 0.212 us/op (+2.54)
+HexFormatBench.toHexDigitsLong 512 avgt 15 8.301 ± 0.422 us/op (-2.81)
+HexFormatBench.toHexDigitsShort 512 avgt 15 5.908 ± 0.255 us/op (+5.18)
+HexFormatBench.toHexLower 512 avgt 15 0.494 ± 0.001 us/op (+87.45)
+HexFormatBench.toHexLowerCached 512 avgt 15 0.494 ± 0.001 us/op (+102.43)
+HexFormatBench.toHexUpper 512 avgt 15 0.494 ± 0.001 us/op (+87.66)
+HexFormatBench.toHexUpperCached 512 avgt 15 0.493 ± 0.001 us/op (+102.64)
## 3. MacBook Pro M1 Pro
-Benchmark (size) Mode Cnt Score Error Units (baseline)
-HexFormatBench.appenderLower 512 avgt 15 2.867 ± 0.035 us/op
-HexFormatBench.appenderLowerCached 512 avgt 15 1.656 ± 0.875 us/op
-HexFormatBench.appenderUpper 512 avgt 15 2.813 ± 0.085 us/op
-HexFormatBench.appenderUpperCached 512 avgt 15 1.575 ± 0.901 us/op
-HexFormatBench.formatLower 512 avgt 15 0.385 ± 0.001 us/op
-HexFormatBench.formatLowerCached 512 avgt 15 0.385 ± 0.002 us/op
-HexFormatBench.formatUpper 512 avgt 15 0.385 ± 0.001 us/op
-HexFormatBench.formatUpperCached 512 avgt 15 0.384 ± 0.001 us/op
-HexFormatBench.toHexDigitsByte 512 avgt 15 1.688 ± 0.009 us/op
-HexFormatBench.toHexDigitsInt 512 avgt 15 2.991 ± 0.015 us/op
-HexFormatBench.toHexDigitsLong 512 avgt 15 3.719 ± 0.081 us/op
-HexFormatBench.toHexDigitsShort 512 avgt 15 1.868 ± 0.010 us/op
-HexFormatBench.toHexLower 512 avgt 15 0.321 ± 0.001 us/op
-HexFormatBench.toHexLowerCached 512 avgt 15 0.322 ± 0.001 us/op
-HexFormatBench.toHexUpper 512 avgt 15 0.324 ± 0.001 us/op
-HexFormatBench.toHexUpperCached 512 avgt 15 0.325 ± 0.001 us/op
+Benchmark (size) Mode Cnt Score Error Units (optimized)
+HexFormatBench.appenderLower 512 avgt 15 0.212 ± 0.003 us/op (+1252.36)
+HexFormatBench.appenderLowerCached 512 avgt 15 0.211 ± 0.001 us/op (+684.84)
+HexFormatBench.appenderUpper 512 avgt 15 0.199 ± 0.002 us/op (+1313.57)
+HexFormatBench.appenderUpperCached 512 avgt 15 0.198 ± 0.001 us/op (+695.46)
+HexFormatBench.formatLower 512 avgt 15 0.221 ± 0.001 us/op (+74.21)
+HexFormatBench.formatLowerCached 512 avgt 15 0.192 ± 0.001 us/op (+100.53)
+HexFormatBench.formatUpper 512 avgt 15 0.317 ± 0.002 us/op (+21.46)
+HexFormatBench.formatUpperCached 512 avgt 15 0.348 ± 0.003 us/op (+10.35)
+HexFormatBench.toHexDigitsByte 512 avgt 15 1.715 ± 0.011 us/op (-1.58)
+HexFormatBench.toHexDigitsInt 512 avgt 15 2.261 ± 0.012 us/op (+32.29)
+HexFormatBench.toHexDigitsLong 512 avgt 15 3.776 ± 0.023 us/op (-1.51)
+HexFormatBench.toHexDigitsShort 512 avgt 15 1.862 ± 0.011 us/op (+0.33)
+HexFormatBench.toHexLower 512 avgt 15 0.289 ± 0.004 us/op (+11.08)
+HexFormatBench.toHexLowerCached 512 avgt 15 0.294 ± 0.002 us/op (+9.53)
+HexFormatBench.toHexUpper 512 avgt 15 0.288 ± 0.001 us/op (+12.50)
+HexFormatBench.toHexUpperCached 512 avgt 15 0.295 ± 0.001 us/op (+10.17)
I added internal methods to StringBuilder as a performance optimization. I saw that the implementation of JEP 430 (String Templates) does something similar:
class AbstractStringBuilder {
    long mix(long lengthCoder) {}
    long prepend(long lengthCoder, byte[] buffer) {}
    // ...
}
However, a StringBuilder.appendHex method would be useful in more scenarios and could be considered as a public method. Should I submit a separate PR to add these methods?
class AbstractStringBuilder {
    public void appendHex(byte[] bytes) {}
    public void appendHex(byte[] bytes, boolean ucase) {}
    public void appendHex(byte[] bytes, int fromIndex, int toIndex) {}
    public void appendHex(byte[] bytes, int fromIndex, int toIndex, boolean ucase) {}
}
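To make the proposed signatures concrete, here is a minimal sketch of what the four-argument overload could do, written as a static helper outside the JDK. The class name, the StringBuilder parameter, and the return value are assumptions for illustration; the methods proposed above are instance methods that return void, and their actual implementation is not shown in this message.

```java
import java.util.Objects;

final class AppendHexSketch {
    private static final char[] LOWER = "0123456789abcdef".toCharArray();
    private static final char[] UPPER = "0123456789ABCDEF".toCharArray();

    // Hypothetical stand-in for appendHex(byte[], int, int, boolean):
    // appends two hex digits per byte in [fromIndex, toIndex).
    static StringBuilder appendHex(StringBuilder sb, byte[] bytes,
                                   int fromIndex, int toIndex, boolean ucase) {
        Objects.checkFromToIndex(fromIndex, toIndex, bytes.length);
        char[] digits = ucase ? UPPER : LOWER;
        // Reserve space once so the loop does not trigger repeated growth.
        sb.ensureCapacity(sb.length() + 2 * (toIndex - fromIndex));
        for (int i = fromIndex; i < toIndex; i++) {
            int b = bytes[i] & 0xff;       // treat the byte as unsigned
            sb.append(digits[b >> 4])      // high nibble
              .append(digits[b & 0xf]);    // low nibble
        }
        return sb;
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xCA, (byte) 0xFE, 0x01};
        System.out.println(appendHex(new StringBuilder("0x"), data, 0, 2, true)); // 0xCAFE
    }
}
```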
As for the lookup table, I think it pays off when the byte[] is longer than 8 bytes, and in real use the byte[] is very likely to be longer than that. The number 8 is only a rough guess; the threshold could just as well be 12 or 16.
HexDecimal#DIGITS is a 512-byte table. For a table of that size, I think the lookup is worthwhile when it is used repeatedly, as it is when formatting a whole byte[].
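For readers unfamiliar with the technique, a 512-byte table can be built as 256 entries of two bytes each, one entry per possible byte value, so that both hex digits come from a single lookup. The sketch below is an illustration under that assumption; the class name and the packing layout (high digit in the upper 8 bits) are hypothetical and are not taken from the JDK source.

```java
import java.nio.charset.StandardCharsets;

final class HexTableSketch {
    // 256 entries * 2 bytes each = 512 bytes of table data.
    // Each entry packs the two ASCII hex digits of one byte value.
    private static final short[] DIGITS = new short[256];

    static {
        String hex = "0123456789abcdef";
        for (int i = 0; i < 256; i++) {
            int hi = hex.charAt(i >> 4);   // ASCII code of the high-nibble digit
            int lo = hex.charAt(i & 0xf);  // ASCII code of the low-nibble digit
            DIGITS[i] = (short) ((hi << 8) | lo);
        }
    }

    // Formats the whole array with one table lookup per input byte.
    static String formatHex(byte[] bytes) {
        byte[] out = new byte[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            short packed = DIGITS[bytes[i] & 0xff];
            out[2 * i] = (byte) (packed >> 8);
            out[2 * i + 1] = (byte) packed;
        }
        // All output bytes are ASCII, so a Latin-1 decode is lossless.
        return new String(out, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(formatHex(new byte[] {0, 0x7f, (byte) 0xff})); // 007fff
    }
}
```

For lowercase output without delimiters, the result should match HexFormat.of().formatHex(bytes). The trade-off discussed in the thread is that the table costs memory traffic for a single conversion but amortizes well over a long array.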
I removed the newly added AbstractStringBuilder.appendHex method. This makes the change smaller, and the performance improvement is similar.
The new performance test results are as follows:
## 1. [aliyun_ecs_c8i.xlarge](https://help.aliyun.com/document_detail/25378.html#c8i)
* cpu: Intel Xeon Sapphire Rapids (x64)
* os: Debian Linux
-Benchmark (size) Mode Cnt Score Error Units (baseline)
-HexFormatBench.appenderLower 512 avgt 15 2.768 ± 0.034 us/op
-HexFormatBench.appenderLowerCached 512 avgt 15 2.796 ± 0.042 us/op
-HexFormatBench.appenderUpper 512 avgt 15 2.800 ± 0.032 us/op
-HexFormatBench.appenderUpperCached 512 avgt 15 2.781 ± 0.018 us/op
-HexFormatBench.formatLower 512 avgt 15 0.544 ± 0.002 us/op
-HexFormatBench.formatLowerCached 512 avgt 15 0.548 ± 0.004 us/op
-HexFormatBench.formatUpper 512 avgt 15 0.546 ± 0.007 us/op
-HexFormatBench.formatUpperCached 512 avgt 15 0.550 ± 0.005 us/op
-HexFormatBench.toHexDigitsByte 512 avgt 15 3.364 ± 0.015 us/op
-HexFormatBench.toHexDigitsInt 512 avgt 15 3.770 ± 0.017 us/op
-HexFormatBench.toHexDigitsLong 512 avgt 15 4.990 ± 0.018 us/op
-HexFormatBench.toHexDigitsShort 512 avgt 15 3.466 ± 0.017 us/op
-HexFormatBench.toHexLower 512 avgt 15 0.415 ± 0.005 us/op
-HexFormatBench.toHexLowerCached 512 avgt 15 0.422 ± 0.005 us/op
-HexFormatBench.toHexUpper 512 avgt 15 0.413 ± 0.005 us/op
-HexFormatBench.toHexUpperCached 512 avgt 15 0.423 ± 0.004 us/op
+Benchmark (size) Mode Cnt Score Error Units (optimized)
+HexFormatBench.appenderLower 512 avgt 15 0.211 ± 0.002 us/op (+1211.85)
+HexFormatBench.appenderLowerCached 512 avgt 15 0.210 ± 0.004 us/op (+1231.43)
+HexFormatBench.appenderUpper 512 avgt 15 0.289 ± 0.002 us/op (+868.86)
+HexFormatBench.appenderUpperCached 512 avgt 15 0.296 ± 0.019 us/op (+839.53)
+HexFormatBench.formatLower 512 avgt 15 0.265 ± 0.001 us/op (+105.29)
+HexFormatBench.formatLowerCached 512 avgt 15 0.267 ± 0.002 us/op (+105.25)
+HexFormatBench.formatUpper 512 avgt 15 0.274 ± 0.002 us/op (+99.28)
+HexFormatBench.formatUpperCached 512 avgt 15 0.286 ± 0.019 us/op (+92.31)
+HexFormatBench.toHexDigitsByte 512 avgt 15 3.351 ± 0.011 us/op (+0.39)
+HexFormatBench.toHexDigitsInt 512 avgt 15 3.708 ± 0.011 us/op (+1.68)
+HexFormatBench.toHexDigitsLong 512 avgt 15 5.051 ± 0.014 us/op (-1.21)
+HexFormatBench.toHexDigitsShort 512 avgt 15 3.456 ± 0.012 us/op (+0.29)
+HexFormatBench.toHexLower 512 avgt 15 0.445 ± 0.001 us/op (-6.75)
+HexFormatBench.toHexLowerCached 512 avgt 15 0.441 ± 0.001 us/op (-4.31)
+HexFormatBench.toHexUpper 512 avgt 15 0.444 ± 0.001 us/op (-6.99)
+HexFormatBench.toHexUpperCached 512 avgt 15 0.441 ± 0.001 us/op (-4.09)
## 2. [aliyun_ecs_c8y.xlarge](https://help.aliyun.com/document_detail/25378.html#c8y)
* cpu: Aliyun Yitian 710 (aarch64)
* os: Debian Linux
-Benchmark (size) Mode Cnt Score Error Units (baseline)
-HexFormatBench.appenderLower 512 avgt 15 2.857 ± 0.791 us/op
-HexFormatBench.appenderLowerCached 512 avgt 15 2.832 ± 0.758 us/op
-HexFormatBench.appenderUpper 512 avgt 15 2.360 ± 0.010 us/op
-HexFormatBench.appenderUpperCached 512 avgt 15 2.361 ± 0.013 us/op
-HexFormatBench.formatLower 512 avgt 15 0.947 ± 0.406 us/op
-HexFormatBench.formatLowerCached 512 avgt 15 0.616 ± 0.002 us/op
-HexFormatBench.formatUpper 512 avgt 15 1.212 ± 0.411 us/op
-HexFormatBench.formatUpperCached 512 avgt 15 0.616 ± 0.001 us/op
-HexFormatBench.toHexDigitsByte 512 avgt 15 5.844 ± 0.264 us/op
-HexFormatBench.toHexDigitsInt 512 avgt 15 7.392 ± 0.207 us/op
-HexFormatBench.toHexDigitsLong 512 avgt 15 8.068 ± 0.303 us/op
-HexFormatBench.toHexDigitsShort 512 avgt 15 6.214 ± 0.266 us/op
-HexFormatBench.toHexLower 512 avgt 15 0.926 ± 0.003 us/op
-HexFormatBench.toHexLowerCached 512 avgt 15 1.000 ± 0.005 us/op
-HexFormatBench.toHexUpper 512 avgt 15 0.927 ± 0.002 us/op
-HexFormatBench.toHexUpperCached 512 avgt 15 0.999 ± 0.020 us/op
+Benchmark (size) Mode Cnt Score Error Units (optimized)
+HexFormatBench.appenderLower 512 avgt 15 0.343 ± 0.001 us/op (+732.95)
+HexFormatBench.appenderLowerCached 512 avgt 15 0.345 ± 0.001 us/op (+720.87)
+HexFormatBench.appenderUpper 512 avgt 15 0.352 ± 0.002 us/op (+570.46)
+HexFormatBench.appenderUpperCached 512 avgt 15 0.349 ± 0.001 us/op (+576.51)
+HexFormatBench.formatLower 512 avgt 15 0.464 ± 0.001 us/op (+104.10)
+HexFormatBench.formatLowerCached 512 avgt 15 0.484 ± 0.002 us/op (+27.28)
+HexFormatBench.formatUpper 512 avgt 15 0.650 ± 0.001 us/op (+86.47)
+HexFormatBench.formatUpperCached 512 avgt 15 0.598 ± 0.001 us/op (+3.02)
+HexFormatBench.toHexDigitsByte 512 avgt 15 5.591 ± 0.058 us/op (+4.53)
+HexFormatBench.toHexDigitsInt 512 avgt 15 7.080 ± 0.114 us/op (+4.41)
+HexFormatBench.toHexDigitsLong 512 avgt 15 7.754 ± 0.040 us/op (+4.05)
+HexFormatBench.toHexDigitsShort 512 avgt 15 5.779 ± 0.076 us/op (+7.53)
+HexFormatBench.toHexLower 512 avgt 15 0.494 ± 0.001 us/op (+87.45)
+HexFormatBench.toHexLowerCached 512 avgt 15 0.493 ± 0.001 us/op (+102.84)
+HexFormatBench.toHexUpper 512 avgt 15 0.494 ± 0.001 us/op (+87.66)
+HexFormatBench.toHexUpperCached 512 avgt 15 0.493 ± 0.001 us/op (+102.64)
## 3. MacBook Pro M1 Pro
-Benchmark (size) Mode Cnt Score Error Units (baseline)
-HexFormatBench.appenderLower 512 avgt 15 2.867 ± 0.035 us/op
-HexFormatBench.appenderLowerCached 512 avgt 15 1.656 ± 0.875 us/op
-HexFormatBench.appenderUpper 512 avgt 15 2.813 ± 0.085 us/op
-HexFormatBench.appenderUpperCached 512 avgt 15 1.575 ± 0.901 us/op
-HexFormatBench.formatLower 512 avgt 15 0.385 ± 0.001 us/op
-HexFormatBench.formatLowerCached 512 avgt 15 0.385 ± 0.002 us/op
-HexFormatBench.formatUpper 512 avgt 15 0.385 ± 0.001 us/op
-HexFormatBench.formatUpperCached 512 avgt 15 0.384 ± 0.001 us/op
-HexFormatBench.toHexDigitsByte 512 avgt 15 1.688 ± 0.009 us/op
-HexFormatBench.toHexDigitsInt 512 avgt 15 2.991 ± 0.015 us/op
-HexFormatBench.toHexDigitsLong 512 avgt 15 3.719 ± 0.081 us/op
-HexFormatBench.toHexDigitsShort 512 avgt 15 1.868 ± 0.010 us/op
-HexFormatBench.toHexLower 512 avgt 15 0.321 ± 0.001 us/op
-HexFormatBench.toHexLowerCached 512 avgt 15 0.322 ± 0.001 us/op
-HexFormatBench.toHexUpper 512 avgt 15 0.324 ± 0.001 us/op
-HexFormatBench.toHexUpperCached 512 avgt 15 0.325 ± 0.001 us/op
+Benchmark (size) Mode Cnt Score Error Units (optimized)
+HexFormatBench.appenderLower 512 avgt 15 0.207 ± 0.001 us/op (+1285.03)
+HexFormatBench.appenderLowerCached 512 avgt 15 0.206 ± 0.001 us/op (+703.89)
+HexFormatBench.appenderUpper 512 avgt 15 0.225 ± 0.001 us/op (+1150.23)
+HexFormatBench.appenderUpperCached 512 avgt 15 0.225 ± 0.001 us/op (+600.00)
+HexFormatBench.formatLower 512 avgt 15 0.211 ± 0.003 us/op (+82.47)
+HexFormatBench.formatLowerCached 512 avgt 15 0.186 ± 0.001 us/op (+106.99)
+HexFormatBench.formatUpper 512 avgt 15 0.312 ± 0.001 us/op (+23.40)
+HexFormatBench.formatUpperCached 512 avgt 15 0.344 ± 0.001 us/op (+11.63)
+HexFormatBench.toHexDigitsByte 512 avgt 15 1.718 ± 0.054 us/op (-1.75)
+HexFormatBench.toHexDigitsInt 512 avgt 15 2.255 ± 0.010 us/op (+32.64)
+HexFormatBench.toHexDigitsLong 512 avgt 15 3.764 ± 0.005 us/op (-1.20)
+HexFormatBench.toHexDigitsShort 512 avgt 15 1.858 ± 0.008 us/op (+0.54)
+HexFormatBench.toHexLower 512 avgt 15 0.289 ± 0.004 us/op (+11.08)
+HexFormatBench.toHexLowerCached 512 avgt 15 0.295 ± 0.001 us/op (+9.16)
+HexFormatBench.toHexUpper 512 avgt 15 0.288 ± 0.001 us/op (+12.50)
+HexFormatBench.toHexUpperCached 512 avgt 15 0.297 ± 0.005 us/op (+9.43)
-------------
PR Comment: https://git.openjdk.org/jdk/pull/15768#issuecomment-1721723317
PR Comment: https://git.openjdk.org/jdk/pull/15768#issuecomment-1721944547
PR Comment: https://git.openjdk.org/jdk/pull/15768#issuecomment-1722180550