RFR: 8349176: Speed up Integer/Long.toString via Unsafe.allocateUninitializedArray

Shaojin Wen swen at openjdk.org
Sat Feb 1 00:51:59 UTC 2025


On Wed, 29 Jan 2025 16:36:24 GMT, Shaojin Wen <swen at openjdk.org> wrote:

> The byte[] allocated in Integer/Long.toString is always completely filled, so it can be created with Unsafe.allocateUninitializedArray, which skips zeroing the array, to improve performance.

This change yields speedups of up to about 23% in nearly all of the aarch64 and x64 scenarios measured below, but introduces an ~18% regression in the Integers.toStringTiny benchmark on AMD EPYC™ Genoa processors. That regression is non-deterministic and cannot be reproduced consistently.
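Why skipping the zeroing is safe can be illustrated with a simplified sketch. It is not the JDK code: `stringSize` and `fillDigits` are hypothetical helpers (the former modeled on the JDK's internal digit-count logic), and the real implementation writes two digits per step and builds the `String` through internal Latin-1 constructors. Because user code cannot call `jdk.internal.misc.Unsafe.allocateUninitializedArray`, the sketch pre-fills an ordinary `byte[]` with `'?'` to stand in for stale memory; if any slot were left unwritten, the `'?'` would leak into the result.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Simplified sketch: the digit buffer is always fully written.
// stringSize and fillDigits are hypothetical helpers, not the JDK's actual code.
class DigitFill {
    // Number of bytes needed for x in decimal, including a '-' sign.
    // Works on the negated value so that Integer.MIN_VALUE cannot overflow.
    static int stringSize(int x) {
        int d = 1;
        if (x >= 0) {
            d = 0;
            x = -x;
        }
        int p = -10;
        for (int i = 1; i < 10; i++) {
            if (x > p) {
                return i + d;
            }
            p = 10 * p;
        }
        return 10 + d;
    }

    // Writes digits from the last index backwards, then the sign, so every
    // index in [0, buf.length) is assigned exactly once when
    // buf.length == stringSize(x).
    static void fillDigits(int x, byte[] buf) {
        int pos = buf.length;
        boolean negative = x < 0;
        if (!negative) {
            x = -x;
        }
        do {
            buf[--pos] = (byte) ('0' - (x % 10)); // x <= 0, so x % 10 is in [-9, 0]
            x /= 10;
        } while (x != 0);
        if (negative) {
            buf[--pos] = '-';
        }
    }

    static String toStringSketch(int x) {
        int size = stringSize(x);
        // The patch allocates this array with
        // Unsafe.allocateUninitializedArray(byte.class, size) from
        // jdk.internal.misc, which user code cannot call. Pre-filling a normal
        // array with '?' stands in for whatever stale bytes that memory holds.
        byte[] buf = new byte[size];
        Arrays.fill(buf, (byte) '?');
        fillDigits(x, buf);
        return new String(buf, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        int[] samples = {0, 7, -7, 42, -100, 123456789, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            String s = toStringSketch(x);
            if (!s.equals(Integer.toString(x))) {
                throw new AssertionError(x + " -> " + s);
            }
        }
        System.out.println("all samples match Integer.toString");
    }
}
```

Since every index is overwritten before the array escapes, the initial contents never become observable, which is the property the patch relies on.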


## 1. Script

```bash
git remote add wenshao git@github.com:wenshao/jdk.git
git fetch wenshao

# baseline
git checkout f98d9a330128302207fb66dfa2555885ad93135f
make test TEST="micro:java.lang.Longs.toString"
make test TEST="micro:java.lang.Integers.toString"

# current
git checkout 2a06d12fcb7822395960c813d91a34eda0d661ce
make test TEST="micro:java.lang.Longs.toString"
make test TEST="micro:java.lang.Integers.toString"
```


## 2. MacBook M1 Pro (aarch64)

```diff
-# baseline
-Benchmark               (size)  Mode  Cnt  Score   Error  Units (f98d9a33012)
-Longs.toStringBig          500  avgt   15  7.265 ± 0.063  us/op
-Longs.toStringSmall        500  avgt   15  3.043 ± 0.051  us/op
-Integers.toStringBig       500  avgt   15  4.837 ± 0.076  us/op
-Integers.toStringSmall     500  avgt   15  2.922 ± 0.020  us/op
-Integers.toStringTiny      500  avgt   15  2.136 ± 0.010  us/op

+# current
+Benchmark               (size)  Mode  Cnt  Score   Error  Units (2a06d12fcb7)
+Longs.toStringBig          500  avgt   15  7.025 ± 0.024  us/op
+Longs.toStringSmall        500  avgt   15  2.735 ± 0.008  us/op
+Integers.toStringBig       500  avgt   15  4.592 ± 0.015  us/op
+Integers.toStringSmall     500  avgt   15  2.632 ± 0.026  us/op
+Integers.toStringTiny      500  avgt   15  1.734 ± 0.006  us/op
```


| Benchmark | size | baseline (us/op) | current (us/op) | delta (positive = faster) |
| --- | --- | --- | --- | --- |
| Longs.toStringBig | 500 | 7.265 | 7.025 | 3.42% |
| Longs.toStringSmall | 500 | 3.043 | 2.735 | 11.26% |
| Integers.toStringBig | 500 | 4.837 | 4.592 | 5.34% |
| Integers.toStringSmall | 500 | 2.922 | 2.632 | 11.02% |
| Integers.toStringTiny | 500 | 2.136 | 1.734 | 23.18% |
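The `delta` column can be reproduced from the two scores. Every row in these tables matches the baseline time minus the current time, divided by the current time, so a positive value means the current build is faster. For Integers.toStringTiny on the M1 Pro:

$$
\Delta = \frac{t_\text{baseline} - t_\text{current}}{t_\text{current}} \times 100\%
= \frac{2.136 - 1.734}{1.734} \times 100\%
= \frac{0.402}{1.734} \times 100\% \approx 23.18\%
$$

Normalizing by the baseline instead would give $0.402 / 2.136 \approx 18.82\%$ for the same row.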


## 3. aliyun_ecs_c8a_x64 (CPU AMD EPYC™ Genoa)

```diff
+# baseline
+Benchmark               (size)  Mode  Cnt  Score   Error  Units (f98d9a33012)
+Longs.toStringBig          500  avgt   15  8.126 ± 0.027  us/op
+Longs.toStringSmall        500  avgt   15  3.296 ± 0.029  us/op
+Integers.toStringBig       500  avgt   15  4.957 ± 0.008  us/op
+Integers.toStringSmall     500  avgt   15  3.467 ± 0.020  us/op
+Integers.toStringTiny      500  avgt   15  2.534 ± 0.040  us/op

-# current
-Benchmark               (size)  Mode  Cnt  Score   Error  Units (2a06d12fcb7)
-Longs.toStringBig          500  avgt   15  7.540 ± 0.019  us/op
-Longs.toStringSmall        500  avgt   15  3.055 ± 0.006  us/op
-Integers.toStringBig       500  avgt   15  4.646 ± 0.024  us/op
-Integers.toStringSmall     500  avgt   15  3.173 ± 0.008  us/op
-Integers.toStringTiny      500  avgt   15  3.118 ± 0.029  us/op
```


| Benchmark | size | baseline (us/op) | current (us/op) | delta (positive = faster) |
| --- | --- | --- | --- | --- |
| Longs.toStringBig | 500 | 8.126 | 7.540 | 7.77% |
| Longs.toStringSmall | 500 | 3.296 | 3.055 | 7.89% |
| Integers.toStringBig | 500 | 4.957 | 4.646 | 6.69% |
| Integers.toStringSmall | 500 | 3.467 | 3.173 | 9.27% |
| Integers.toStringTiny | 500 | 2.534 | 3.118 | -18.73% |


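In the Integers.toStringTiny run, the slowdown first appears at warmup iteration 3: the first two warmup iterations run at about 2.25–2.33 us/op, after which the score stays near 3.1 us/op for the rest of the run.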


```
# Warmup Iteration   1: 2.333 us/op
# Warmup Iteration   2: 2.248 us/op
# Warmup Iteration   3: 3.118 us/op
# Warmup Iteration   4: 3.121 us/op
# Warmup Iteration   5: 3.129 us/op
# Warmup Iteration   6: 3.122 us/op
# Warmup Iteration   7: 3.118 us/op
# Warmup Iteration   8: 3.154 us/op
# Warmup Iteration   9: 3.097 us/op
# Warmup Iteration  10: 3.090 us/op
Iteration   1: 3.090 us/op
Iteration   2: 3.091 us/op
Iteration   3: 3.092 us/op
Iteration   4: 3.093 us/op
Iteration   5: 3.098 us/op
```
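The step change can also be located mechanically. The sketch below is a hypothetical helper, not part of JMH: it flags the first iteration that is more than 20% slower than the best score seen before it (the 20% threshold is an arbitrary choice). The scores are copied from the log above, which appears to belong to the Integers.toStringTiny run.

```java
// Hypothetical helper for spotting a step change in JMH warmup scores.
class WarmupStep {
    // Returns the 1-based index of the first iteration whose score is more than
    // `threshold` (0.20 = 20%) above the lowest score seen before it, or -1 if
    // there is no such jump. Scores are in us/op, so higher means slower.
    static int firstRegression(double[] scores, double threshold) {
        double best = Double.POSITIVE_INFINITY;
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] > best * (1.0 + threshold)) {
                return i + 1;
            }
            best = Math.min(best, scores[i]);
        }
        return -1;
    }

    public static void main(String[] args) {
        // Warmup iterations 1..10 from the log (assumed: Integers.toStringTiny on EPYC Genoa).
        double[] warmup = {2.333, 2.248, 3.118, 3.121, 3.129, 3.122, 3.118, 3.154, 3.097, 3.090};
        // Measurement iterations 1..5 of the same run.
        double[] measured = {3.090, 3.091, 3.092, 3.093, 3.098};
        System.out.println("step change at warmup iteration " + firstRegression(warmup, 0.20));
        System.out.println("measured iterations flat: " + (firstRegression(measured, 0.20) == -1));
    }
}
```

On this data it reports warmup iteration 3 (3.118 us/op against a best of 2.248 us/op so far) and finds no jump among the five measurement iterations.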


## 4. aliyun_ecs_c8i_x64 (CPU Intel® Xeon® Emerald Rapids)

```diff
+# baseline
+Benchmark               (size)  Mode  Cnt  Score   Error  Units (f98d9a33012)
+Longs.toStringBig          500  avgt   15  7.992 ± 0.039  us/op
+Longs.toStringSmall        500  avgt   15  3.578 ± 0.022  us/op
+Integers.toStringBig       500  avgt   15  5.536 ± 0.017  us/op
+Integers.toStringSmall     500  avgt   15  3.657 ± 0.152  us/op
+Integers.toStringTiny      500  avgt   15  2.638 ± 0.047  us/op

-# current
-Benchmark               (size)  Mode  Cnt  Score   Error  Units (2a06d12fcb7)
-Longs.toStringBig          500  avgt   15  7.731 ± 0.011  us/op
-Longs.toStringSmall        500  avgt   15  3.413 ± 0.020  us/op
-Integers.toStringBig       500  avgt   15  4.738 ± 0.021  us/op
-Integers.toStringSmall     500  avgt   15  3.184 ± 0.140  us/op
-Integers.toStringTiny      500  avgt   15  2.621 ± 0.126  us/op
```


| Benchmark | size | baseline (us/op) | current (us/op) | delta (positive = faster) |
| --- | --- | --- | --- | --- |
| Longs.toStringBig | 500 | 7.992 | 7.731 | 3.38% |
| Longs.toStringSmall | 500 | 3.578 | 3.413 | 4.83% |
| Integers.toStringBig | 500 | 5.536 | 4.738 | 16.84% |
| Integers.toStringSmall | 500 | 3.657 | 3.184 | 14.86% |
| Integers.toStringTiny | 500 | 2.638 | 2.621 | 0.65% |


## 5. aliyun_ecs_c8y_aarch64 (CPU Aliyun Yitian 710)

```diff
+# baseline
+Benchmark               (size)  Mode  Cnt  Score   Error  Units (f98d9a33012)
+Longs.toStringBig          500  avgt   15  11.017 ± 0.084  us/op
+Longs.toStringSmall        500  avgt   15   4.400 ± 0.078  us/op
+Integers.toStringBig       500  avgt   15   7.377 ± 0.103  us/op
+Integers.toStringSmall     500  avgt   15   4.504 ± 0.083  us/op
+Integers.toStringTiny      500  avgt   15   3.693 ± 0.107  us/op

-# current
-Benchmark               (size)  Mode  Cnt  Score   Error  Units (2a06d12fcb7)
-Longs.toStringBig          500  avgt   15  10.696 ± 0.055  us/op
-Longs.toStringSmall        500  avgt   15   4.111 ± 0.113  us/op
-Integers.toStringBig       500  avgt   15   6.815 ± 0.097  us/op
-Integers.toStringSmall     500  avgt   15   4.136 ± 0.103  us/op
-Integers.toStringTiny      500  avgt   15   3.588 ± 0.102  us/op
```


| Benchmark | size | baseline (us/op) | current (us/op) | delta (positive = faster) |
| --- | --- | --- | --- | --- |
| Longs.toStringBig | 500 | 11.017 | 10.696 | 3.00% |
| Longs.toStringSmall | 500 | 4.400 | 4.111 | 7.03% |
| Integers.toStringBig | 500 | 7.377 | 6.815 | 8.25% |
| Integers.toStringSmall | 500 | 4.504 | 4.136 | 8.90% |
| Integers.toStringTiny | 500 | 3.693 | 3.588 | 2.93% |


## 6. orange_pi5_aarch64 (CPU RK3588S)

```diff
+# baseline
+Benchmark               (size)  Mode  Cnt   Score   Error  Units (f98d9a33012)
+Longs.toStringBig          500  avgt   15  23.235 ± 1.973  us/op
+Longs.toStringSmall        500  avgt   15   8.262 ± 0.555  us/op
+Integers.toStringBig       500  avgt   15  14.435 ± 0.819  us/op
+Integers.toStringSmall     500  avgt   15   8.384 ± 0.669  us/op
+Integers.toStringTiny      500  avgt   15   5.661 ± 0.404  us/op

-# current
-Benchmark               (size)  Mode  Cnt   Score   Error  Units (2a06d12fcb7)
-Longs.toStringBig          500  avgt   15  21.727 ± 1.396  us/op
-Longs.toStringSmall        500  avgt   15   7.591 ± 0.581  us/op
-Integers.toStringBig       500  avgt   15  13.682 ± 0.930  us/op
-Integers.toStringSmall     500  avgt   15   7.691 ± 0.575  us/op
-Integers.toStringTiny      500  avgt   15   4.943 ± 0.473  us/op
```


| Benchmark | size | baseline (us/op) | current (us/op) | delta (positive = faster) |
| --- | --- | --- | --- | --- |
| Longs.toStringBig | 500 | 23.235 | 21.727 | 6.94% |
| Longs.toStringSmall | 500 | 8.262 | 7.591 | 8.84% |
| Integers.toStringBig | 500 | 14.435 | 13.682 | 5.50% |
| Integers.toStringSmall | 500 | 8.384 | 7.691 | 9.01% |
| Integers.toStringTiny | 500 | 5.661 | 4.943 | 14.53% |
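The five tables can be summarized in one pass. This sketch copies the scores from the tables, recomputes each delta as (baseline − current) / current × 100, which reproduces the published delta column, and reports any regression together with the range of improvements. The class name and the short platform labels are illustrative only.

```java
// Illustrative summary of the five benchmark tables (scores in us/op, lower is better).
class DeltaSummary {
    static final String[] PLATFORMS = {"M1 Pro", "EPYC Genoa", "Xeon Emerald Rapids", "Yitian 710", "RK3588S"};
    static final String[] BENCHMARKS = {"Longs.toStringBig", "Longs.toStringSmall", "Integers.toStringBig",
                                        "Integers.toStringSmall", "Integers.toStringTiny"};
    // Rows follow PLATFORMS, columns follow BENCHMARKS.
    static final double[][] BASELINE = {
        {7.265, 3.043, 4.837, 2.922, 2.136},
        {8.126, 3.296, 4.957, 3.467, 2.534},
        {7.992, 3.578, 5.536, 3.657, 2.638},
        {11.017, 4.400, 7.377, 4.504, 3.693},
        {23.235, 8.262, 14.435, 8.384, 5.661},
    };
    static final double[][] CURRENT = {
        {7.025, 2.735, 4.592, 2.632, 1.734},
        {7.540, 3.055, 4.646, 3.173, 3.118},
        {7.731, 3.413, 4.738, 3.184, 2.621},
        {10.696, 4.111, 6.815, 4.136, 3.588},
        {21.727, 7.591, 13.682, 7.691, 4.943},
    };

    // Positive result: the current build is faster than the baseline.
    static double deltaPercent(double baseline, double current) {
        return (baseline - current) / current * 100.0;
    }

    public static void main(String[] args) {
        double smallestGain = Double.POSITIVE_INFINITY;
        double largestGain = Double.NEGATIVE_INFINITY;
        for (int p = 0; p < PLATFORMS.length; p++) {
            for (int b = 0; b < BENCHMARKS.length; b++) {
                double d = deltaPercent(BASELINE[p][b], CURRENT[p][b]);
                if (d < 0) {
                    System.out.printf("regression: %s on %s (%.2f%%)%n", BENCHMARKS[b], PLATFORMS[p], d);
                } else {
                    smallestGain = Math.min(smallestGain, d);
                    largestGain = Math.max(largestGain, d);
                }
            }
        }
        System.out.printf("improvements range from %.2f%% to %.2f%%%n", smallestGain, largestGain);
    }
}
```

Run against these numbers, it reports a single regression (Integers.toStringTiny on EPYC Genoa, -18.73%) and improvements from 0.65% to 23.18% for the other 24 rows.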

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23353#issuecomment-2623354805


More information about the core-libs-dev mailing list