RFR: 8359424: Eliminate table lookup in Integer/Long toHexString
Shaojin Wen
swen at openjdk.org
Fri Jun 13 12:07:21 UTC 2025
On Tue, 7 Jan 2025 10:39:18 GMT, Shaojin Wen <swen at openjdk.org> wrote:
> In PR #22928, UUID introduced a long-based, vectorized (SWAR) conversion of numbers to hexadecimal strings. The same technique can be used in Integer::toHexString and Long::toHexString to eliminate table lookups. Because the conversion no longer reads a digit table that might not be in the cache, it performs better when cache misses occur.
Test data from both aarch64 and x64 shows a performance improvement of roughly 10% to 30%. On a MacBook M1 Pro, the improvement for Integer.toHexString exceeds 100%.
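To make the idea concrete, here is a minimal sketch of a table-free conversion for a 32-bit value, assuming JDK 19+ (for `Long.expand`). The class and method names (`HexSwarSketch`, `hex8`, `toHexString`) are hypothetical, and the digit arithmetic is one common SWAR (SIMD within a register) formulation, not necessarily the exact code in #22928 or in this PR: each nibble is spread into its own byte, then a per-byte offset maps 0-9 to '0'-'9' and 10-15 to 'a'-'f' without indexing a table.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: table-free int-to-hex conversion using SWAR on a long.
class HexSwarSketch {
    // Returns the 8 lowercase hex digits of value packed one per byte,
    // with the most significant digit in the lowest byte.
    static long hex8(int value) {
        long i = value & 0xFFFF_FFFFL;  // zero-extend to 64 bits
        // Nibble k goes to the low nibble of byte k; reversing the bytes
        // moves the most significant nibble into byte 0.
        i = Long.reverseBytes(Long.expand(i, 0x0F0F_0F0F_0F0F_0F0FL));
        // Every byte is 0..15. Adding 6 sets bit 4 exactly in the bytes
        // holding 10..15, so m has 0x10 in each byte that needs a letter.
        long m = (i + 0x0606_0606_0606_0606L) & 0x1010_1010_1010_1010L;
        // Add '0' (0x30) to every byte, plus 'a' - '0' - 10 = 39 for letters.
        // The largest byte is 15 + 48 + 39 = 102 ('f'), so nothing carries.
        return i + 0x3030_3030_3030_3030L + (m >>> 4) * 39;
    }

    // Same result as Integer.toHexString: leading zeros dropped, at least one digit.
    static String toHexString(int value) {
        long digits = hex8(value);
        byte[] buf = new byte[8];
        for (int k = 0; k < 8; k++) {
            buf[k] = (byte) (digits >>> (8 * k));  // byte k is the k-th digit from the left
        }
        int len = Math.max(1, (32 - Integer.numberOfLeadingZeros(value) + 3) / 4);
        return new String(buf, 8 - len, len, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 10, 255, 0xCAFEBABE, -1, Integer.MIN_VALUE};
        for (int v : samples) {
            String s = toHexString(v);
            if (!s.equals(Integer.toHexString(v))) {
                throw new AssertionError(v + " -> " + s);
            }
            System.out.println(s);
        }
    }
}
```

The only array this sketch touches is its own output buffer; all digit computation happens in registers.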
## 1. Script
```shell
git remote add wenshao git@github.com:wenshao/jdk.git
git fetch wenshao
# baseline 91db7c0877a
git checkout 91db7c0877a68ad171da2b4501280fc24630ae83
make test TEST="micro:java.lang.Integers.toHexString"
make test TEST="micro:java.lang.Longs.toHexString"
# current 1788d09787c
git checkout 1788d09787cadfe6ec23b9b10bef87a2cdc029a3
make test TEST="micro:java.lang.Integers.toHexString"
make test TEST="micro:java.lang.Longs.toHexString"
```
## 2. aliyun_ecs_c8a_x64 (CPU AMD EPYC™ Genoa)
```diff
-Benchmark            (size)  Mode  Cnt  Score   Error  Units  (baseline 91db7c0877a)
-Integers.toHexString    500  avgt   15  4.855 ± 0.058  us/op
-Longs.toHexString       500  avgt   15  6.098 ± 0.034  us/op
+Benchmark            (size)  Mode  Cnt  Score   Error  Units  (current 1788d09787c)
+Integers.toHexString    500  avgt   15  4.105 ± 0.010  us/op  +18.27%
+Longs.toHexString       500  avgt   15  4.682 ± 0.116  us/op  +30.24%
```
## 3. aliyun_ecs_c8i_x64 (CPU Intel® Xeon® Emerald Rapids)
```diff
-Benchmark            (size)  Mode  Cnt  Score   Error  Units
-Integers.toHexString    500  avgt   15  5.158 ± 0.025  us/op
-Longs.toHexString       500  avgt   15  6.072 ± 0.020  us/op
+Benchmark            (size)  Mode  Cnt  Score   Error  Units
+Integers.toHexString    500  avgt   15  4.691 ± 0.024  us/op  +9.95%
+Longs.toHexString       500  avgt   15  4.947 ± 0.024  us/op  +22.74%
```
## 4. aliyun_ecs_c8y_aarch64 (CPU Aliyun Yitian 710)
```diff
-Benchmark            (size)  Mode  Cnt  Score   Error  Units
-Integers.toHexString    500  avgt   15  5.880 ± 0.017  us/op
-Longs.toHexString       500  avgt   15  7.183 ± 0.013  us/op
+Benchmark            (size)  Mode  Cnt  Score   Error  Units
+Integers.toHexString    500  avgt   15  5.282 ± 0.012  us/op  +11.32%
+Longs.toHexString       500  avgt   15  5.530 ± 0.013  us/op  +29.89%
```
## 5. MacBook M1 Pro (aarch64)
```diff
-Benchmark            (size)  Mode  Cnt   Score   Error  Units  (baseline 91db7c0877a)
-Integers.toHexString    500  avgt   15  10.519 ± 1.573  us/op
-Longs.toHexString       500  avgt   15   5.754 ± 0.264  us/op
+Benchmark            (size)  Mode  Cnt   Score   Error  Units  (current 1788d09787c)
+Integers.toHexString    500  avgt   15   5.057 ± 0.015  us/op  +108.00%
+Longs.toHexString       500  avgt   15   5.147 ± 0.095  us/op  +11.79%
```
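The percentage column can be reproduced from the scores. Since avgt reports time per operation (lower is better), every figure in the tables is consistent with baseline / current - 1, rounded down to two decimals. That is a throughput gain rather than a reduction in time: the M1 Pro Integers row, +108%, corresponds to about 52% less time per operation. A small check, with a hypothetical class name and assuming that formula:

```java
import java.util.Locale;

// Hypothetical helper for reading the benchmark tables.
class SpeedupCheck {
    // avgt scores are microseconds per operation (lower is better). Returns
    // baseline / current - 1 as a percentage, rounded down to two decimals.
    static String speedup(double baselineUs, double currentUs) {
        double percent = (baselineUs / currentUs - 1.0) * 100.0;
        return String.format(Locale.ROOT, "%.2f", Math.floor(percent * 100.0) / 100.0);
    }

    // Share of the baseline time per operation that is saved, as a percentage.
    static double timeSavedPercent(double baselineUs, double currentUs) {
        return (1.0 - currentUs / baselineUs) * 100.0;
    }

    public static void main(String[] args) {
        System.out.println(speedup(4.855, 4.105));   // Genoa, Integers: 18.27
        System.out.println(speedup(10.519, 5.057));  // M1 Pro, Integers: 108.00
        System.out.printf(Locale.ROOT, "%.1f%n", timeSavedPercent(10.519, 5.057));  // 51.9
    }
}
```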
Because this algorithm is slower than the original version for small numbers, I have marked this PR as a draft.
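For contrast, here is a table-driven conversion in the style of the code being replaced. It is a hypothetical simplification, not the JDK source, and the names `TableHexSketch` and `DIGITS` are made up. It performs one iteration and one table load per significant digit, so its work shrinks with the number of digits, whereas a long-based conversion that handles all 8 nibbles of an int at once does roughly fixed work for any value. That is one plausible reason small numbers favor the original version; it is an assumption, not something stated in the message.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified table-driven conversion for comparison.
class TableHexSketch {
    // The lookup table: digit value -> ASCII character.
    private static final byte[] DIGITS =
            "0123456789abcdef".getBytes(StandardCharsets.ISO_8859_1);

    static String toHexString(int value) {
        // Number of significant hex digits, at least one (for 0).
        int len = Math.max(1, (32 - Integer.numberOfLeadingZeros(value) + 3) / 4);
        byte[] buf = new byte[len];
        int v = value;
        // One iteration and one table load per significant digit.
        for (int pos = len - 1; pos >= 0; pos--) {
            buf[pos] = DIGITS[v & 0xF];
            v >>>= 4;
        }
        return new String(buf, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        for (int v : new int[] {0, 7, 0x1F, 0xCAFEBABE, -1}) {
            if (!toHexString(v).equals(Integer.toHexString(v))) {
                throw new AssertionError(Integer.toHexString(v));
            }
        }
        System.out.println(toHexString(7));  // one digit, one loop iteration
    }
}
```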
Additionally, this algorithm is also used in PR #22928 [Speed up UUID::toString](https://github.com/openjdk/jdk/pull/22928), where Long.expand still causes a performance regression on older CPU architectures.
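```java
// Both methods move the 8 nibbles of the low 32 bits of i into the low
// nibble of each byte, so that the most significant nibble ends up in byte 0.

// Method 1: Long.expand deposits nibble k into byte k; reverseBytes flips the byte order.
i = Long.reverseBytes(Long.expand(i, 0x0F0F_0F0F_0F0F_0F0FL));

// Method 2: the same permutation, unrolled into masks and shifts.
i = ((i & 0xF0000000L) >> 28)
  | ((i & 0xF000000L) >> 16)
  | ((i & 0xF00000L) >> 4)
  | ((i & 0xF0000L) << 8)
  | ((i & 0xF000L) << 20)
  | ((i & 0xF00L) << 32)
  | ((i & 0xF0L) << 44)
  | ((i & 0xFL) << 56);
```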
Note: Long.reverseBytes + Long.expand (Method 1) is faster on x64 and ARMv9.
However, on ARMv8 AArch64 CPUs it is slower than the manual unrolling in Method 2.
ARMv8 CPUs include Apple M1/M2 and AWS Graviton 3; ARMv9.0 CPUs include Apple M3/M4 and Aliyun Yitian 710.
I haven't tested this on older x64 CPUs such as AMD Zen 1, but they may have the same issue.
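That Method 1 and Method 2 compute the same value can be checked directly. This is a minimal sketch assuming JDK 19+ (where Long.expand was added); the class and method names are hypothetical. As an assumption about HotSpot internals that the message does not state: Long.expand is expected to compile to the BMI2 PDEP instruction on x64 and to SVE2 BDEP on AArch64 CPUs that support it, and otherwise to run its pure-Java fallback. PDEP is also widely reported to be microcoded and slow on AMD Zen 1 and Zen 2, which would fit the suspicion about older x64 CPUs. Method 2 uses only AND, shift, and OR, so its cost does not depend on such hardware support.

```java
import java.util.Random;

// Hypothetical equivalence check for the two nibble-spreading methods.
class SpreadNibblesCheck {
    // Method 1 (Long.expand requires JDK 19+).
    static long spreadWithExpand(long i) {
        return Long.reverseBytes(Long.expand(i, 0x0F0F_0F0F_0F0F_0F0FL));
    }

    // Method 2: the same permutation with masks and shifts only.
    static long spreadUnrolled(long i) {
        return ((i & 0xF0000000L) >> 28)
                | ((i & 0xF000000L) >> 16)
                | ((i & 0xF00000L) >> 4)
                | ((i & 0xF0000L) << 8)
                | ((i & 0xF000L) << 20)
                | ((i & 0xF00L) << 32)
                | ((i & 0xF0L) << 44)
                | ((i & 0xFL) << 56);
    }

    static void check(int value) {
        long i = value & 0xFFFF_FFFFL;  // zero-extended 32-bit input
        if (spreadWithExpand(i) != spreadUnrolled(i)) {
            throw new AssertionError(Integer.toHexString(value));
        }
    }

    public static void main(String[] args) {
        int[] edge = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE, 0x12345678};
        for (int v : edge) check(v);
        Random rnd = new Random(42);
        for (int n = 0; n < 1_000_000; n++) check(rnd.nextInt());
        System.out.println("Method 1 and Method 2 agree");
    }
}
```

For example, both methods map 0x12345678L to 0x0807060504030201L: byte 0 holds the leading nibble 1 and byte 7 holds the trailing nibble 8.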
-------------
PR Comment: https://git.openjdk.org/jdk/pull/22942#issuecomment-2576197320
PR Comment: https://git.openjdk.org/jdk/pull/22942#issuecomment-2578863538
More information about the core-libs-dev mailing list