RFR: 8339290: Optimize ClassFile Utf8EntryImpl#writeTo [v4]

Shaojin Wen swen at openjdk.org
Mon Sep 2 14:06:43 UTC 2024


On Fri, 30 Aug 2024 19:39:32 GMT, Shaojin Wen <swen at openjdk.org> wrote:

>> Use fast path for ascii characters 1 to 127 to improve the performance of writing Utf8Entry to BufferWriter.
>
> Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision:
> 
>   optimize Utf8EntryImpl#writeTo(UTF)

Below are the performance numbers from a MacBook M1. With the option `-Xint -XX:TieredStopAtLevel=1`, overall performance improves by about 1%; without the option, it improves by about 4%.

## 1. TieredStopAtLevel=1

### 1.1 Script

git remote add wenshao git@github.com:wenshao/jdk.git
git fetch wenshao

# baseline
git checkout d53c9f153463d2a1e36f4157574d890763edbf0f
make test TEST="micro:java.lang.classfile.Utf8EntryWriteTo" MICRO="VM_OPTIONS=-Xint -XX:TieredStopAtLevel=1"

# current
git checkout f055f3fda508873241f28621cbc71ff19874e5ca
make test TEST="micro:java.lang.classfile.Utf8EntryWriteTo" MICRO="VM_OPTIONS=-Xint -XX:TieredStopAtLevel=1"


### 1.2 Performance Numbers


-# baseline
-Benchmark                   (charType)  Mode  Cnt     Score   Error  Units
-Utf8EntryWriteTo.writeTo         ascii  avgt    9  1376.779 ± 4.304  us/op
-Utf8EntryWriteTo.writeTo  utf8_2_bytes  avgt    9  1384.241 ± 2.586  us/op
-Utf8EntryWriteTo.writeTo  utf8_3_bytes  avgt    9  1423.933 ± 4.316  us/op
-Utf8EntryWriteTo.writeTo         emoji  avgt    9  1513.636 ± 7.290  us/op

+# current
+Benchmark                   (charType)  Mode  Cnt     Score   Error  Units
+Utf8EntryWriteTo.writeTo         ascii  avgt    9  1361.663 ± 2.395  us/op
+Utf8EntryWriteTo.writeTo  utf8_2_bytes  avgt    9  1362.041 ± 4.400  us/op
+Utf8EntryWriteTo.writeTo  utf8_3_bytes  avgt    9  1398.706 ± 3.941  us/op
+Utf8EntryWriteTo.writeTo         emoji  avgt    9  1475.106 ± 3.335  us/op


|   | charType | baseline  | current | delta |
| --- | --- | --- | --- | --- |
| Utf8EntryWriteTo.writeTo | ascii | 1376.779 | 1361.663 | 1.11% |
| Utf8EntryWriteTo.writeTo | utf8_2_bytes | 1384.241 | 1362.041 | 1.63% |
| Utf8EntryWriteTo.writeTo | utf8_3_bytes | 1423.933 | 1398.706 | 1.80% |
| Utf8EntryWriteTo.writeTo | emoji | 1513.636 | 1475.106 | 2.61% |
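
The delta column is not defined in the post, but the values are consistent with `baseline / current - 1`, expressed as a percentage. The following is a minimal sketch of that calculation; the class and method names are hypothetical, and the formula is inferred from the table rather than stated by the author.

```java
public class DeltaCalc {
    /**
     * Relative improvement in percent, assuming delta = baseline / current - 1.
     * Both scores are average times (us/op), so lower is better and a positive
     * result means the current build is faster.
     */
    static double deltaPercent(double baseline, double current) {
        return (baseline / current - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Scores copied from the ascii and emoji rows of the table.
        System.out.println("ascii delta % = " + deltaPercent(1376.779, 1361.663));
        System.out.println("emoji delta % = " + deltaPercent(1513.636, 1475.106));
    }
}
```

Rounded to two decimals, these two calls reproduce the 1.11% and 2.61% entries in the table.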


## 2. Without JVM Options

### 2.1 Script

git remote add wenshao git@github.com:wenshao/jdk.git
git fetch wenshao

# baseline
git checkout d53c9f153463d2a1e36f4157574d890763edbf0f
make test TEST="micro:java.lang.classfile.Utf8EntryWriteTo"

# current
git checkout f055f3fda508873241f28621cbc71ff19874e5ca
make test TEST="micro:java.lang.classfile.Utf8EntryWriteTo"


### 2.2 Performance Numbers

-# baseline
-Benchmark                   (charType)  Mode  Cnt   Score   Error  Units
-Utf8EntryWriteTo.writeTo         ascii  avgt    9  20.013 ± 0.291  us/op
-Utf8EntryWriteTo.writeTo  utf8_2_bytes  avgt    9  19.956 ± 0.312  us/op
-Utf8EntryWriteTo.writeTo  utf8_3_bytes  avgt    9  20.294 ± 0.297  us/op
-Utf8EntryWriteTo.writeTo         emoji  avgt    9  20.444 ± 0.343  us/op

+# current
+Benchmark                   (charType)  Mode  Cnt   Score   Error  Units
+Utf8EntryWriteTo.writeTo         ascii  avgt    9  19.161 ± 0.750  us/op
+Utf8EntryWriteTo.writeTo  utf8_2_bytes  avgt    9  18.967 ± 0.225  us/op
+Utf8EntryWriteTo.writeTo  utf8_3_bytes  avgt    9  19.474 ± 0.362  us/op
+Utf8EntryWriteTo.writeTo         emoji  avgt    9  19.743 ± 0.474  us/op


|   | charType | baseline  | current | delta |
| --- | --- | --- | --- | --- |
| Utf8EntryWriteTo.writeTo | ascii | 20.013 | 19.161 | 4.45% |
| Utf8EntryWriteTo.writeTo | utf8_2_bytes | 19.956 | 18.967 | 5.21% |
| Utf8EntryWriteTo.writeTo | utf8_3_bytes | 20.294 | 19.474 | 4.21% |
| Utf8EntryWriteTo.writeTo | emoji | 20.444 | 19.743 | 3.55% |

I reverted the use of JLA#isLatin1GreaterThanZero. Using JLA#isLatin1GreaterThanZero does help performance, but it needs a threshold value to avoid a possible slowdown from cache misses caused by the two loops. Liach mentioned above that the performance regression in Jetty startup may be caused by this. Currently, this threshold is set to 256, which may be larger.
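
To illustrate the idea of a fast path for characters 1 to 127, here is a minimal single-pass sketch of modified UTF-8 encoding as used for class-file `CONSTANT_Utf8` entries. It is not the code from this PR: the class name `Utf8FastPathSketch`, the `encode` method, and the use of `ByteArrayOutputStream` in place of the ClassFile API's buffer writer are all assumptions made for illustration, and the length prefix of the entry is omitted.

```java
import java.io.ByteArrayOutputStream;

public class Utf8FastPathSketch {
    /** Encodes s as modified UTF-8 (payload only, no length prefix). */
    static byte[] encode(String s) {
        int len = s.length();
        ByteArrayOutputStream out = new ByteArrayOutputStream(len);
        int i = 0;

        // Fast path: leading chars in the range 1..127 map to exactly one byte.
        for (; i < len; i++) {
            char c = s.charAt(i);
            if (c == 0 || c >= 0x80) {
                break; // first char that needs multi-byte encoding
            }
            out.write(c);
        }

        // Slow path: general encoding for the remainder of the string.
        for (; i < len; i++) {
            char c = s.charAt(i);
            if (c >= 1 && c <= 0x7F) {
                out.write(c);
            } else if (c <= 0x7FF) {
                // Two bytes; this branch also covers c == 0 (encoded as C0 80).
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                // Three bytes; surrogate halves are each encoded on their own.
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }
}
```

Because the fast loop and the fallback loop share one index, the string is traversed only once. A separate pre-scan followed by a bulk copy would traverse it twice, which is the two-loop cost that the threshold discussed above is meant to bound.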

Here are the performance numbers on a MacBook M1 Pro (aarch64) and an Aliyun ECS C8i instance (Intel x64). When the `TieredStopAtLevel=1` option is configured, performance improves by about 2%. Without the option, performance improves by about 7% on the MacBook M1 Pro and by at most about 0.7% on Linux Intel x64.

## 1. TieredStopAtLevel=1

### 1.1 Script

git remote add wenshao git@github.com:wenshao/jdk.git
git fetch wenshao

# baseline
git checkout d53c9f153463d2a1e36f4157574d890763edbf0f
make test TEST="micro:java.lang.classfile.Utf8EntryWriteTo" MICRO="VM_OPTIONS=-Xint -XX:TieredStopAtLevel=1"

# current
git checkout a773005cee59d1ffae494ebbe2a552b15c30a5a6
make test TEST="micro:java.lang.classfile.Utf8EntryWriteTo" MICRO="VM_OPTIONS=-Xint -XX:TieredStopAtLevel=1"


### 1.2 MacBook M1 Pro Performance Numbers


-# baseline
-Benchmark                   (charType)  Mode  Cnt     Score   Error  Units
-Utf8EntryWriteTo.writeTo         ascii  avgt    9  1374.766 ± 1.419  us/op
-Utf8EntryWriteTo.writeTo  utf8_2_bytes  avgt    9  1382.998 ± 4.286  us/op
-Utf8EntryWriteTo.writeTo  utf8_3_bytes  avgt    9  1425.600 ± 5.542  us/op
-Utf8EntryWriteTo.writeTo         emoji  avgt    9  1510.438 ± 4.431  us/op

+# current
+Benchmark                   (charType)  Mode  Cnt     Score   Error  Units
+Utf8EntryWriteTo.writeTo         ascii  avgt    9  1345.666 ± 1.985  us/op
+Utf8EntryWriteTo.writeTo  utf8_2_bytes  avgt    9  1351.628 ± 2.796  us/op
+Utf8EntryWriteTo.writeTo  utf8_3_bytes  avgt    9  1388.559 ± 3.768  us/op
+Utf8EntryWriteTo.writeTo         emoji  avgt    9  1463.979 ± 4.763  us/op


|   | charType | baseline  | current | delta |
| --- | --- | --- | --- | --- |
| Utf8EntryWriteTo.writeTo | ascii | 1374.766 | 1345.666 | 2.16% |
| Utf8EntryWriteTo.writeTo | utf8_2_bytes | 1382.998 | 1351.628 | 2.32% |
| Utf8EntryWriteTo.writeTo | utf8_3_bytes | 1425.600 | 1388.559 | 2.67% |
| Utf8EntryWriteTo.writeTo | emoji | 1510.438 | 1463.979 | 3.17% |

### 1.3 Aliyun ECS Performance Numbers
* Linux, Intel x64 CPU


-# baseline
-Benchmark                   (charType)  Mode  Cnt     Score    Error  Units
-Utf8EntryWriteTo.writeTo         ascii  avgt    9  1641.666 ± 22.443  us/op
-Utf8EntryWriteTo.writeTo  utf8_2_bytes  avgt    9  1642.376 ± 10.525  us/op
-Utf8EntryWriteTo.writeTo  utf8_3_bytes  avgt    9  1695.354 ±  9.464  us/op
-Utf8EntryWriteTo.writeTo         emoji  avgt    9  1806.769 ± 11.571  us/op

+# current
+Benchmark                   (charType)  Mode  Cnt     Score    Error  Units
+Utf8EntryWriteTo.writeTo         ascii  avgt    9  1611.783 ± 14.023  us/op
+Utf8EntryWriteTo.writeTo  utf8_2_bytes  avgt    9  1630.246 ± 45.576  us/op
+Utf8EntryWriteTo.writeTo  utf8_3_bytes  avgt    9  1659.347 ±  7.682  us/op
+Utf8EntryWriteTo.writeTo         emoji  avgt    9  1763.372 ± 13.568  us/op



|   | charType | baseline  | current | delta |
| --- | --- | --- | --- | --- |
| Utf8EntryWriteTo.writeTo | ascii | 1641.666 | 1611.783 | 1.85% |
| Utf8EntryWriteTo.writeTo | utf8_2_bytes | 1642.376 | 1630.246 | 0.74% |
| Utf8EntryWriteTo.writeTo | utf8_3_bytes | 1695.354 | 1659.347 | 2.17% |
| Utf8EntryWriteTo.writeTo | emoji | 1806.769 | 1763.372 | 2.46% |


## 2. Without JVM Options

### 2.2 MacBook M1 Pro Performance Numbers



|   | charType | baseline  | current | delta |
| --- | --- | --- | --- | --- |
| Utf8EntryWriteTo.writeTo | ascii | 19.960 | 18.637 | 7.10% |
| Utf8EntryWriteTo.writeTo | utf8_2_bytes | 19.973 | 18.669 | 6.98% |
| Utf8EntryWriteTo.writeTo | utf8_3_bytes | 20.153 | 18.794 | 7.23% |
| Utf8EntryWriteTo.writeTo | emoji | 20.688 | 19.001 | 8.88% |


### 2.3 Aliyun ECS Performance Numbers
* Linux, Intel x64 CPU

-# baseline
-Benchmark                   (charType)  Mode  Cnt   Score   Error  Units
-Utf8EntryWriteTo.writeTo         ascii  avgt    9  21.146 ± 0.649  us/op
-Utf8EntryWriteTo.writeTo  utf8_2_bytes  avgt    9  21.232 ± 0.248  us/op
-Utf8EntryWriteTo.writeTo  utf8_3_bytes  avgt    9  21.358 ± 0.216  us/op
-Utf8EntryWriteTo.writeTo         emoji  avgt    9  21.381 ± 0.599  us/op

+# current
+Benchmark                   (charType)  Mode  Cnt   Score   Error  Units
+Utf8EntryWriteTo.writeTo         ascii  avgt    9  21.000 ± 0.181  us/op
+Utf8EntryWriteTo.writeTo  utf8_2_bytes  avgt    9  21.141 ± 0.520  us/op
+Utf8EntryWriteTo.writeTo  utf8_3_bytes  avgt    9  21.381 ± 0.281  us/op
+Utf8EntryWriteTo.writeTo         emoji  avgt    9  21.436 ± 0.390  us/op



|   | charType | baseline  | current | delta |
| --- | --- | --- | --- | --- |
| Utf8EntryWriteTo.writeTo | ascii | 21.146 | 21.000 | 0.70% |
| Utf8EntryWriteTo.writeTo | utf8_2_bytes | 21.232 | 21.141 | 0.43% |
| Utf8EntryWriteTo.writeTo | utf8_3_bytes | 21.358 | 21.381 | -0.11% |
| Utf8EntryWriteTo.writeTo | emoji | 21.381 | 21.436 | -0.26% |

![image](https://github.com/user-attachments/assets/f116e89a-c784-4c20-bd10-843f792c5265)

The two red boxes in the figure are the optimization targets of PR #20772 and PR #20756.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20772#issuecomment-2322789508
PR Comment: https://git.openjdk.org/jdk/pull/20772#issuecomment-2323063113
PR Comment: https://git.openjdk.org/jdk/pull/20772#issuecomment-2323372154

