RFR: 8339290: Optimize ClassFile Utf8EntryImpl#writeTo [v4]
Shaojin Wen
swen at openjdk.org
Mon Sep 2 14:06:43 UTC 2024
On Fri, 30 Aug 2024 19:39:32 GMT, Shaojin Wen <swen at openjdk.org> wrote:
>> Use fast path for ascii characters 1 to 127 to improve the performance of writing Utf8Entry to BufferWriter.
>
> Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision:
>
> optimize Utf8EntryImpl#writeTo(UTF)
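As an illustration of the fast-path idea quoted above, here is a minimal sketch, not the actual `Utf8EntryImpl#writeTo` code. It assumes the class-file modified UTF-8 encoding, in which chars 1 to 127 take one byte and U+0000 takes two; `Utf8FastPathSketch` and `encode` are hypothetical names used only for this example.

```java
import java.util.Arrays;

public class Utf8FastPathSketch {
    // Encodes s as modified UTF-8 (hypothetical helper, not JDK code).
    static byte[] encode(String s) {
        int len = s.length();
        byte[] buf = new byte[len * 3]; // worst case: 3 bytes per char
        int pos = 0;
        int i = 0;
        // Fast path: leading chars in 1..127 are copied as single bytes.
        for (; i < len; i++) {
            char c = s.charAt(i);
            if (c == 0 || c >= 0x80) {
                break;
            }
            buf[pos++] = (byte) c;
        }
        // General path for the remainder of the string.
        for (; i < len; i++) {
            char c = s.charAt(i);
            if (c != 0 && c < 0x80) {
                buf[pos++] = (byte) c;
            } else if (c < 0x800) {
                // Two bytes; this branch also covers U+0000 (C0 80).
                buf[pos++] = (byte) (0xC0 | (c >> 6));
                buf[pos++] = (byte) (0x80 | (c & 0x3F));
            } else {
                // Three bytes; surrogates are encoded individually.
                buf[pos++] = (byte) (0xE0 | (c >> 12));
                buf[pos++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                buf[pos++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        return Arrays.copyOf(buf, pos);
    }
}
```

A string made only of chars 1 to 127 never leaves the first loop, so each char costs one comparison and one byte store.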
Below are the performance numbers from a MacBook M1. With the options `-Xint -XX:TieredStopAtLevel=1`, overall performance improves by about 1% to 3%; without them, it improves by about 4%.
## 1. TieredStopAtLevel=1
### 1.1 Script
git remote add wenshao git@github.com:wenshao/jdk.git
git fetch wenshao
# baseline
git checkout d53c9f153463d2a1e36f4157574d890763edbf0f
make test TEST="micro:java.lang.classfile.Utf8EntryWriteTo" MICRO="VM_OPTIONS=-Xint -XX:TieredStopAtLevel=1"
# current
git checkout f055f3fda508873241f28621cbc71ff19874e5ca
make test TEST="micro:java.lang.classfile.Utf8EntryWriteTo" MICRO="VM_OPTIONS=-Xint -XX:TieredStopAtLevel=1"
### 1.2 Performance Numbers
-# baseline
-Benchmark (charType) Mode Cnt Score Error Units
-Utf8EntryWriteTo.writeTo ascii avgt 9 1376.779 ± 4.304 us/op
-Utf8EntryWriteTo.writeTo utf8_2_bytes avgt 9 1384.241 ± 2.586 us/op
-Utf8EntryWriteTo.writeTo utf8_3_bytes avgt 9 1423.933 ± 4.316 us/op
-Utf8EntryWriteTo.writeTo emoji avgt 9 1513.636 ± 7.290 us/op
+# current
+Benchmark (charType) Mode Cnt Score Error Units
+Utf8EntryWriteTo.writeTo ascii avgt 9 1361.663 ± 2.395 us/op
+Utf8EntryWriteTo.writeTo utf8_2_bytes avgt 9 1362.041 ± 4.400 us/op
+Utf8EntryWriteTo.writeTo utf8_3_bytes avgt 9 1398.706 ± 3.941 us/op
+Utf8EntryWriteTo.writeTo emoji avgt 9 1475.106 ± 3.335 us/op
| | charType | baseline | current | delta |
| --- | --- | --- | --- | --- |
| Utf8EntryWriteTo.writeTo | ascii | 1376.779 | 1361.663 | 1.11% |
| Utf8EntryWriteTo.writeTo | utf8_2_bytes | 1384.241 | 1362.041 | 1.63% |
| Utf8EntryWriteTo.writeTo | utf8_3_bytes | 1423.933 | 1398.706 | 1.80% |
| Utf8EntryWriteTo.writeTo | emoji | 1513.636 | 1475.106 | 2.61% |
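The delta column is consistent with the relative speedup `baseline / current - 1` applied to the average times (us/op, lower is better). A small sketch that recomputes it, with `DeltaSketch` and `speedupPercent` as hypothetical names:

```java
import java.util.Locale;

public class DeltaSketch {
    // Relative speedup in percent; positive means "current" is faster.
    static double speedupPercent(double baselineUsPerOp, double currentUsPerOp) {
        return (baselineUsPerOp / currentUsPerOp - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // ascii row of the table above
        System.out.println(String.format(Locale.ROOT, "%.2f%%",
                speedupPercent(1376.779, 1361.663))); // prints 1.11%
    }
}
```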
## 2. No JVM Options
### 2.1 Script
git remote add wenshao git@github.com:wenshao/jdk.git
git fetch wenshao
# baseline
git checkout d53c9f153463d2a1e36f4157574d890763edbf0f
make test TEST="micro:java.lang.classfile.Utf8EntryWriteTo"
# current
git checkout f055f3fda508873241f28621cbc71ff19874e5ca
make test TEST="micro:java.lang.classfile.Utf8EntryWriteTo"
### 2.2 Performance Numbers
-# baseline
-Benchmark (charType) Mode Cnt Score Error Units
-Utf8EntryWriteTo.writeTo ascii avgt 9 20.013 ± 0.291 us/op
-Utf8EntryWriteTo.writeTo utf8_2_bytes avgt 9 19.956 ± 0.312 us/op
-Utf8EntryWriteTo.writeTo utf8_3_bytes avgt 9 20.294 ± 0.297 us/op
-Utf8EntryWriteTo.writeTo emoji avgt 9 20.444 ± 0.343 us/op
+# current
+Benchmark (charType) Mode Cnt Score Error Units
+Utf8EntryWriteTo.writeTo ascii avgt 9 19.161 ± 0.750 us/op
+Utf8EntryWriteTo.writeTo utf8_2_bytes avgt 9 18.967 ± 0.225 us/op
+Utf8EntryWriteTo.writeTo utf8_3_bytes avgt 9 19.474 ± 0.362 us/op
+Utf8EntryWriteTo.writeTo emoji avgt 9 19.743 ± 0.474 us/op
| | charType | baseline | current | delta |
| --- | --- | --- | --- | --- |
| Utf8EntryWriteTo.writeTo | ascii | 20.013 | 19.161 | 4.45% |
| Utf8EntryWriteTo.writeTo | utf8_2_bytes | 19.956 | 18.967 | 5.21% |
| Utf8EntryWriteTo.writeTo | utf8_3_bytes | 20.294 | 19.474 | 4.21% |
| Utf8EntryWriteTo.writeTo | emoji | 20.444 | 19.743 | 3.55% |
I reverted the use of JLA#isLatin1GreaterThanZero. Using it does improve performance, but it needs a threshold to avoid a possible slowdown from cache misses caused by running two loops over the same data. Liach mentioned above that the Jetty startup performance regression may be caused by this. The threshold is currently set to 256, which may be too large.
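To make the trade-off concrete, here is a rough sketch of a threshold-gated two-pass approach. It is an assumption-laden illustration, not the reverted code: `allInAsciiRange` is a hypothetical stand-in for a check like JLA#isLatin1GreaterThanZero, and applying the threshold as an upper bound on string length is my reading of the comment above.

```java
import java.nio.charset.StandardCharsets;

public class TwoPassSketch {
    // Value mentioned in the post; treating it as a maximum length is an assumption.
    static final int THRESHOLD = 256;

    // First loop: hypothetical stand-in for a "all chars in 1..127" check.
    static boolean allInAsciiRange(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == 0 || c >= 0x80) {
                return false;
            }
        }
        return true;
    }

    // Only short strings take the two-pass route, so the data scanned by the
    // first loop is likely still in cache when the copy loop runs.
    static boolean useBulkCopy(String s) {
        return s.length() <= THRESHOLD && allInAsciiRange(s);
    }

    // Second loop: for chars 1..127 the encoded bytes equal the char values.
    static byte[] bulkCopy(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

Strings that fail `useBulkCopy` would fall back to a single-loop encoder, which touches each char only once.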
Here are the performance numbers on a MacBook M1 Pro (aarch64) and an Aliyun ECS C8i instance (Intel x64). With the `TieredStopAtLevel=1` option configured, performance improves by about 2%. Without the option, performance improves by about 7% on the MacBook M1 Pro and by at most about 0.7% on Linux Intel x64, where two of the four cases are slightly slower but within the error margin.
## 1. TieredStopAtLevel=1
### 1.1 Script
git remote add wenshao git@github.com:wenshao/jdk.git
git fetch wenshao
# baseline
git checkout d53c9f153463d2a1e36f4157574d890763edbf0f
make test TEST="micro:java.lang.classfile.Utf8EntryWriteTo" MICRO="VM_OPTIONS=-Xint -XX:TieredStopAtLevel=1"
# current
git checkout a773005cee59d1ffae494ebbe2a552b15c30a5a6
make test TEST="micro:java.lang.classfile.Utf8EntryWriteTo" MICRO="VM_OPTIONS=-Xint -XX:TieredStopAtLevel=1"
### 1.2 MacBook M1 Pro Performance Numbers
-# baseline
-Benchmark (charType) Mode Cnt Score Error Units
-Utf8EntryWriteTo.writeTo ascii avgt 9 1374.766 ± 1.419 us/op
-Utf8EntryWriteTo.writeTo utf8_2_bytes avgt 9 1382.998 ± 4.286 us/op
-Utf8EntryWriteTo.writeTo utf8_3_bytes avgt 9 1425.600 ± 5.542 us/op
-Utf8EntryWriteTo.writeTo emoji avgt 9 1510.438 ± 4.431 us/op
+# current
+Benchmark (charType) Mode Cnt Score Error Units
+Utf8EntryWriteTo.writeTo ascii avgt 9 1345.666 ± 1.985 us/op
+Utf8EntryWriteTo.writeTo utf8_2_bytes avgt 9 1351.628 ± 2.796 us/op
+Utf8EntryWriteTo.writeTo utf8_3_bytes avgt 9 1388.559 ± 3.768 us/op
+Utf8EntryWriteTo.writeTo emoji avgt 9 1463.979 ± 4.763 us/op
| | charType | baseline | current | delta |
| --- | --- | --- | --- | --- |
| Utf8EntryWriteTo.writeTo | ascii | 1374.766 | 1345.666 | 2.16% |
| Utf8EntryWriteTo.writeTo | utf8_2_bytes | 1382.998 | 1351.628 | 2.32% |
| Utf8EntryWriteTo.writeTo | utf8_3_bytes | 1425.600 | 1388.559 | 2.67% |
| Utf8EntryWriteTo.writeTo | emoji | 1510.438 | 1463.979 | 3.17% |
### 1.3 Aliyun ECS Performance Numbers
* Linux, Intel x64 cpu
-# baseline
-Benchmark (charType) Mode Cnt Score Error Units
-Utf8EntryWriteTo.writeTo ascii avgt 9 1641.666 ± 22.443 us/op
-Utf8EntryWriteTo.writeTo utf8_2_bytes avgt 9 1642.376 ± 10.525 us/op
-Utf8EntryWriteTo.writeTo utf8_3_bytes avgt 9 1695.354 ± 9.464 us/op
-Utf8EntryWriteTo.writeTo emoji avgt 9 1806.769 ± 11.571 us/op
+# current
+Benchmark (charType) Mode Cnt Score Error Units
+Utf8EntryWriteTo.writeTo ascii avgt 9 1611.783 ± 14.023 us/op
+Utf8EntryWriteTo.writeTo utf8_2_bytes avgt 9 1630.246 ± 45.576 us/op
+Utf8EntryWriteTo.writeTo utf8_3_bytes avgt 9 1659.347 ± 7.682 us/op
+Utf8EntryWriteTo.writeTo emoji avgt 9 1763.372 ± 13.568 us/op
| | charType | baseline | current | delta |
| --- | --- | --- | --- | --- |
| Utf8EntryWriteTo.writeTo | ascii | 1641.666 | 1611.783 | 1.85% |
| Utf8EntryWriteTo.writeTo | utf8_2_bytes | 1642.376 | 1630.246 | 0.74% |
| Utf8EntryWriteTo.writeTo | utf8_3_bytes | 1695.354 | 1659.347 | 2.17% |
| Utf8EntryWriteTo.writeTo | emoji | 1806.769 | 1763.372 | 2.46% |
## 2. No JVM Options
### 2.2 MacBook M1 Pro Performance Numbers
| | charType | baseline | current | delta |
| --- | --- | --- | --- | --- |
| Utf8EntryWriteTo.writeTo | ascii | 19.960 | 18.637 | 7.10% |
| Utf8EntryWriteTo.writeTo | utf8_2_bytes | 19.973 | 18.669 | 6.98% |
| Utf8EntryWriteTo.writeTo | utf8_3_bytes | 20.153 | 18.794 | 7.23% |
| Utf8EntryWriteTo.writeTo | emoji | 20.688 | 19.001 | 8.88% |
### 2.3 Aliyun ECS Performance Numbers
* Linux, Intel x64 cpu
-# baseline
-Benchmark (charType) Mode Cnt Score Error Units
-Utf8EntryWriteTo.writeTo ascii avgt 9 21.146 ± 0.649 us/op
-Utf8EntryWriteTo.writeTo utf8_2_bytes avgt 9 21.232 ± 0.248 us/op
-Utf8EntryWriteTo.writeTo utf8_3_bytes avgt 9 21.358 ± 0.216 us/op
-Utf8EntryWriteTo.writeTo emoji avgt 9 21.381 ± 0.599 us/op
+# current
+Benchmark (charType) Mode Cnt Score Error Units
+Utf8EntryWriteTo.writeTo ascii avgt 9 21.000 ± 0.181 us/op
+Utf8EntryWriteTo.writeTo utf8_2_bytes avgt 9 21.141 ± 0.520 us/op
+Utf8EntryWriteTo.writeTo utf8_3_bytes avgt 9 21.381 ± 0.281 us/op
+Utf8EntryWriteTo.writeTo emoji avgt 9 21.436 ± 0.390 us/op
| | charType | baseline | current | delta |
| --- | --- | --- | --- | --- |
| Utf8EntryWriteTo.writeTo | ascii | 21.146 | 21.000 | 0.70% |
| Utf8EntryWriteTo.writeTo | utf8_2_bytes | 21.232 | 21.141 | 0.43% |
| Utf8EntryWriteTo.writeTo | utf8_3_bytes | 21.358 | 21.381 | -0.11% |
| Utf8EntryWriteTo.writeTo | emoji | 21.381 | 21.436 | -0.26% |

The two red boxes in the figure are the optimization targets of PR #20772 and PR #20756
-------------
PR Comment: https://git.openjdk.org/jdk/pull/20772#issuecomment-2322789508
PR Comment: https://git.openjdk.org/jdk/pull/20772#issuecomment-2323063113
PR Comment: https://git.openjdk.org/jdk/pull/20772#issuecomment-2323372154