RFR: 8349176: Speed up Integer/Long.toString via Unsafe.allocateUninitializedArray
Shaojin Wen
swen at openjdk.org
Sat Feb 1 00:51:59 UTC 2025
On Wed, 29 Jan 2025 16:36:24 GMT, Shaojin Wen <swen at openjdk.org> wrote:
> The byte[] allocated in Integer/Long.toString is fully filled, so we can use Unsafe.allocateUninitializedArray to create byte[] to improve performance.
This change yields speedups of roughly 3–23% in most of the aarch64 and x64 scenarios below, but shows an ~18% regression in the Integers.toStringTiny benchmark on AMD EPYC™ Genoa. The regression does not reproduce on every run.
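To illustrate why skipping the zero-fill is safe for a buffer that is "fully filled", here is a minimal sketch of a decimal renderer that computes the exact length first and then writes every byte. It is not the JDK's actual implementation: the class and method names are illustrative, and `new byte[size]` stands in for the JDK-internal `Unsafe.allocateUninitializedArray`, which ordinary application code cannot call.

```java
import java.nio.charset.StandardCharsets;

public class FullyFilledBufferSketch {
    // Number of characters needed for the decimal form of x, including '-'.
    public static int stringSize(int x) {
        int sign = 1;
        if (x >= 0) {
            sign = 0;
            x = -x; // work on negatives so Integer.MIN_VALUE needs no special case
        }
        int p = -10;
        for (int i = 1; i < 10; i++) {
            if (x > p) {
                return i + sign;
            }
            p = 10 * p;
        }
        return 10 + sign;
    }

    public static String toDecimalString(int value) {
        int size = stringSize(value);
        // Zero-filled here. Because every index in [0, size) is written below,
        // the zeroing is redundant work (assumption: this is the property the
        // PR relies on when it switches to an uninitialized allocation).
        byte[] buf = new byte[size];
        int pos = size;
        int n = value < 0 ? value : -value; // negative accumulator
        do {
            buf[--pos] = (byte) ('0' - (n % 10)); // n % 10 is in -9..0
            n /= 10;
        } while (n != 0);
        if (value < 0) {
            buf[--pos] = '-';
        }
        if (pos != 0) {
            throw new AssertionError("buffer not fully filled");
        }
        return new String(buf, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(toDecimalString(0));
        System.out.println(toDecimalString(-12345));
        System.out.println(toDecimalString(Integer.MIN_VALUE));
    }
}
```

The `pos != 0` check makes the invariant explicit: if any byte were left unwritten, an uninitialized array would leak stale heap contents into the resulting string.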
## 1. Script
git remote add wenshao git@github.com:wenshao/jdk.git
git fetch wenshao
# baseline
git checkout f98d9a330128302207fb66dfa2555885ad93135f
make test TEST="micro:java.lang.Longs.toString"
make test TEST="micro:java.lang.Integers.toString"
# current
git checkout 2a06d12fcb7822395960c813d91a34eda0d661ce
make test TEST="micro:java.lang.Longs.toString"
make test TEST="micro:java.lang.Integers.toString"
## 2. MacBook M1 Pro (aarch64)
-# baseline
-Benchmark (size) Mode Cnt Score Error Units (f98d9a33012)
-Longs.toStringBig 500 avgt 15 7.265 ± 0.063 us/op
-Longs.toStringSmall 500 avgt 15 3.043 ± 0.051 us/op
-Integers.toStringBig 500 avgt 15 4.837 ± 0.076 us/op
-Integers.toStringSmall 500 avgt 15 2.922 ± 0.020 us/op
-Integers.toStringTiny 500 avgt 15 2.136 ± 0.010 us/op
+# current
+Benchmark (size) Mode Cnt Score Error Units (2a06d12fcb7)
+Longs.toStringBig 500 avgt 15 7.025 ± 0.024 us/op
+Longs.toStringSmall 500 avgt 15 2.735 ± 0.008 us/op
+Integers.toStringBig 500 avgt 15 4.592 ± 0.015 us/op
+Integers.toStringSmall 500 avgt 15 2.632 ± 0.026 us/op
+Integers.toStringTiny 500 avgt 15 1.734 ± 0.006 us/op
| benchmark | size | baseline (us/op) | current (us/op) | delta |
| --- | --- | --- | --- | --- |
| Longs.toStringBig | 500 | 7.265 | 7.025 | 3.42% |
| Longs.toStringSmall | 500 | 3.043 | 2.735 | 11.26% |
| Integers.toStringBig | 500 | 4.837 | 4.592 | 5.34% |
| Integers.toStringSmall | 500 | 2.922 | 2.632 | 11.02% |
| Integers.toStringTiny | 500 | 2.136 | 1.734 | 23.18% |
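The delta column appears to be computed as baseline / current − 1, so a positive value means the current commit is faster. This formula is inferred from the numbers and is not stated in the original post. A small sketch, with illustrative class and method names, that recomputes two rows of the table above:

```java
import java.util.Locale;

public class DeltaSketch {
    // Speedup of `current` relative to `baseline`, in percent.
    // Assumption: delta = baseline / current - 1 (inferred from the table).
    public static double deltaPercent(double baseline, double current) {
        return (baseline / current - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Longs.toStringBig and Integers.toStringTiny on the M1 Pro, in us/op
        System.out.println(String.format(Locale.ROOT, "%.2f%%", deltaPercent(7.265, 7.025)));
        System.out.println(String.format(Locale.ROOT, "%.2f%%", deltaPercent(2.136, 1.734)));
    }
}
```

With this definition, a slower current build produces a negative delta.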
## 3. aliyun_ecs_c8a_x64 (CPU AMD EPYC™ Genoa)
-# baseline
-Benchmark (size) Mode Cnt Score Error Units (f98d9a33012)
-Longs.toStringBig 500 avgt 15 8.126 ± 0.027 us/op
-Longs.toStringSmall 500 avgt 15 3.296 ± 0.029 us/op
-Integers.toStringBig 500 avgt 15 4.957 ± 0.008 us/op
-Integers.toStringSmall 500 avgt 15 3.467 ± 0.020 us/op
-Integers.toStringTiny 500 avgt 15 2.534 ± 0.040 us/op
+# current
+Benchmark (size) Mode Cnt Score Error Units (2a06d12fcb7)
+Longs.toStringBig 500 avgt 15 7.540 ± 0.019 us/op
+Longs.toStringSmall 500 avgt 15 3.055 ± 0.006 us/op
+Integers.toStringBig 500 avgt 15 4.646 ± 0.024 us/op
+Integers.toStringSmall 500 avgt 15 3.173 ± 0.008 us/op
+Integers.toStringTiny 500 avgt 15 3.118 ± 0.029 us/op
| benchmark | size | baseline (us/op) | current (us/op) | delta |
| --- | --- | --- | --- | --- |
| Longs.toStringBig | 500 | 8.126 | 7.540 | 7.77% |
| Longs.toStringSmall | 500 | 3.296 | 3.055 | 7.89% |
| Integers.toStringBig | 500 | 4.957 | 4.646 | 6.69% |
| Integers.toStringSmall | 500 | 3.467 | 3.173 | 9.27% |
| Integers.toStringTiny | 500 | 2.534 | 3.118 | -18.73% |
In the run with the current commit, the slowdown first appears at Warmup Iteration 3 and persists through the remaining warmup and measurement iterations:
# Warmup Iteration 1: 2.333 us/op
# Warmup Iteration 2: 2.248 us/op
# Warmup Iteration 3: 3.118 us/op
# Warmup Iteration 4: 3.121 us/op
# Warmup Iteration 5: 3.129 us/op
# Warmup Iteration 6: 3.122 us/op
# Warmup Iteration 7: 3.118 us/op
# Warmup Iteration 8: 3.154 us/op
# Warmup Iteration 9: 3.097 us/op
# Warmup Iteration 10: 3.090 us/op
Iteration 1: 3.090 us/op
Iteration 2: 3.091 us/op
Iteration 3: 3.092 us/op
Iteration 4: 3.093 us/op
Iteration 5: 3.098 us/op
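One way to locate such a step in an iteration log is to compare each score with its predecessor. The sketch below does that for the warmup scores listed above. The class name, method name, and 20% threshold are arbitrary assumptions, not part of JMH or of the original analysis.

```java
public class WarmupStepSketch {
    // Returns the 1-based index of the first iteration whose score exceeds
    // the previous one by more than `threshold` (0.20 = 20%), or -1 if none.
    public static int firstJump(double[] scores, double threshold) {
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[i - 1] * (1.0 + threshold)) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Warmup scores copied from the log above, in us/op
        double[] warmup = {2.333, 2.248, 3.118, 3.121, 3.129,
                           3.122, 3.118, 3.154, 3.097, 3.090};
        System.out.println("first jump at warmup iteration " + firstJump(warmup, 0.20));
    }
}
```

A jump that appears after a couple of iterations and then stays flat is consistent with a change in compiled code partway through warmup, though the log alone does not establish the cause.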
## 4. aliyun_ecs_c8i_x64 (CPU Intel® Xeon® Emerald Rapids)
-# baseline
-Benchmark (size) Mode Cnt Score Error Units (f98d9a33012)
-Longs.toStringBig 500 avgt 15 7.992 ± 0.039 us/op
-Longs.toStringSmall 500 avgt 15 3.578 ± 0.022 us/op
-Integers.toStringBig 500 avgt 15 5.536 ± 0.017 us/op
-Integers.toStringSmall 500 avgt 15 3.657 ± 0.152 us/op
-Integers.toStringTiny 500 avgt 15 2.638 ± 0.047 us/op
+# current
+Benchmark (size) Mode Cnt Score Error Units (2a06d12fcb7)
+Longs.toStringBig 500 avgt 15 7.731 ± 0.011 us/op
+Longs.toStringSmall 500 avgt 15 3.413 ± 0.020 us/op
+Integers.toStringBig 500 avgt 15 4.738 ± 0.021 us/op
+Integers.toStringSmall 500 avgt 15 3.184 ± 0.140 us/op
+Integers.toStringTiny 500 avgt 15 2.621 ± 0.126 us/op
| benchmark | size | baseline (us/op) | current (us/op) | delta |
| --- | --- | --- | --- | --- |
| Longs.toStringBig | 500 | 7.992 | 7.731 | 3.38% |
| Longs.toStringSmall | 500 | 3.578 | 3.413 | 4.83% |
| Integers.toStringBig | 500 | 5.536 | 4.738 | 16.84% |
| Integers.toStringSmall | 500 | 3.657 | 3.184 | 14.86% |
| Integers.toStringTiny | 500 | 2.638 | 2.621 | 0.65% |
## 5. aliyun_ecs_c8y_aarch64 (CPU Aliyun Yitian 710)
-# baseline
-Benchmark (size) Mode Cnt Score Error Units (f98d9a33012)
-Longs.toStringBig 500 avgt 15 11.017 ± 0.084 us/op
-Longs.toStringSmall 500 avgt 15 4.400 ± 0.078 us/op
-Integers.toStringBig 500 avgt 15 7.377 ± 0.103 us/op
-Integers.toStringSmall 500 avgt 15 4.504 ± 0.083 us/op
-Integers.toStringTiny 500 avgt 15 3.693 ± 0.107 us/op
+# current
+Benchmark (size) Mode Cnt Score Error Units (2a06d12fcb7)
+Longs.toStringBig 500 avgt 15 10.696 ± 0.055 us/op
+Longs.toStringSmall 500 avgt 15 4.111 ± 0.113 us/op
+Integers.toStringBig 500 avgt 15 6.815 ± 0.097 us/op
+Integers.toStringSmall 500 avgt 15 4.136 ± 0.103 us/op
+Integers.toStringTiny 500 avgt 15 3.588 ± 0.102 us/op
| benchmark | size | baseline (us/op) | current (us/op) | delta |
| --- | --- | --- | --- | --- |
| Longs.toStringBig | 500 | 11.017 | 10.696 | 3.00% |
| Longs.toStringSmall | 500 | 4.400 | 4.111 | 7.03% |
| Integers.toStringBig | 500 | 7.377 | 6.815 | 8.25% |
| Integers.toStringSmall | 500 | 4.504 | 4.136 | 8.90% |
| Integers.toStringTiny | 500 | 3.693 | 3.588 | 2.93% |
## 6. orange_pi5_aarch64 (CPU RK3588S)
-# baseline
-Benchmark (size) Mode Cnt Score Error Units (f98d9a33012)
-Longs.toStringBig 500 avgt 15 23.235 ± 1.973 us/op
-Longs.toStringSmall 500 avgt 15 8.262 ± 0.555 us/op
-Integers.toStringBig 500 avgt 15 14.435 ± 0.819 us/op
-Integers.toStringSmall 500 avgt 15 8.384 ± 0.669 us/op
-Integers.toStringTiny 500 avgt 15 5.661 ± 0.404 us/op
+# current
+Benchmark (size) Mode Cnt Score Error Units (2a06d12fcb7)
+Longs.toStringBig 500 avgt 15 21.727 ± 1.396 us/op
+Longs.toStringSmall 500 avgt 15 7.591 ± 0.581 us/op
+Integers.toStringBig 500 avgt 15 13.682 ± 0.930 us/op
+Integers.toStringSmall 500 avgt 15 7.691 ± 0.575 us/op
+Integers.toStringTiny 500 avgt 15 4.943 ± 0.473 us/op
| benchmark | size | baseline (us/op) | current (us/op) | delta |
| --- | --- | --- | --- | --- |
| Longs.toStringBig | 500 | 23.235 | 21.727 | 6.94% |
| Longs.toStringSmall | 500 | 8.262 | 7.591 | 8.84% |
| Integers.toStringBig | 500 | 14.435 | 13.682 | 5.50% |
| Integers.toStringSmall | 500 | 8.384 | 7.691 | 9.01% |
| Integers.toStringTiny | 500 | 5.661 | 4.943 | 14.53% |
-------------
PR Comment: https://git.openjdk.org/jdk/pull/23353#issuecomment-2623354805