RFR: 8359424: Eliminate table lookup in Integer/Long toHexString

Fri Jun 13 12:07:21 UTC 2025

On Tue, 7 Jan 2025 10:39:18 GMT, Shaojin Wen <swen at openjdk.org> wrote:

> In PR #22928, UUID introduced long-based vectorized hexadecimal to string conversion, which can also be used in Integer::toHexString and Long::toHexString to eliminate table lookups. The benefit of eliminating table lookups is that the performance is better when cache misses occur.

The test data from both aarch64 and x64 show a performance improvement of roughly 10% to 30%. On a MacBook M1 Pro, however, the improvement for Integer.toHexString is about 108%.
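To make the idea of eliminating the table lookup concrete, here is a minimal sketch of converting eight nibbles packed in a long to ASCII hex digits using only arithmetic. It is an illustration under stated assumptions, not the code from this PR: the class and method names (`TableFreeHex`, `nibblesToAscii`, `toHex8`) are hypothetical, the nibbles are spread with a plain loop, and the output is fixed-width (eight digits), whereas Integer::toHexString strips leading zeros.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; names are not taken from the JDK sources.
public class TableFreeHex {
    // Input: eight bytes, each holding a value 0..15.
    // Output: eight bytes, each holding the ASCII code of the matching hex digit.
    static long nibblesToAscii(long nibbles) {
        // Adding 6 sets bit 4 of a byte exactly when its nibble is >= 10.
        long m = (nibbles + 0x0606_0606_0606_0606L) & 0x1010_1010_1010_1010L;
        // For those bytes, build 0x20 + 0x08 - 0x01 = 39 = 'a' - '0' - 10.
        long letterOffset = (m << 1) + (m >> 1) - (m >> 4);
        // Every byte gets '0' (0x30); letter bytes additionally get 39.
        return nibbles + 0x3030_3030_3030_3030L + letterOffset;
    }

    // Fixed-width, lower-case hex of an int (always eight digits).
    static String toHex8(int value) {
        long nibbles = 0;
        for (int k = 0; k < 8; k++) {
            long nibble = (value >>> (4 * k)) & 0xF;
            nibbles |= nibble << (8 * k);   // nibble k goes into byte k
        }
        long ascii = nibblesToAscii(nibbles);
        byte[] out = new byte[8];
        for (int j = 0; j < 8; j++) {
            // Most significant byte is the first character.
            out[j] = (byte) (ascii >>> (8 * (7 - j)));
        }
        return new String(out, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(toHex8(0x1234abcd));
    }
}
```

No byte can overflow into its neighbour: the largest intermediate value in any byte is 15 + 48 + 39 = 102, which is 'f'.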

## 1. Script

git remote add wenshao git@github.com:wenshao/jdk.git
git fetch wenshao

# baseline 91db7c0877a
git checkout 91db7c0877a68ad171da2b4501280fc24630ae83
make test TEST="micro:java.lang.Integers.toHexString"
make test TEST="micro:java.lang.Longs.toHexString"

# current 1788d09787c
git checkout 1788d09787cadfe6ec23b9b10bef87a2cdc029a3
make test TEST="micro:java.lang.Integers.toHexString"
make test TEST="micro:java.lang.Longs.toHexString"

## 2. aliyun_ecs_c8a_x64 (CPU AMD EPYC™ Genoa)

-Benchmark             (size)  Mode  Cnt  Score   Error  Units (baseline 91db7c0877a)
-Integers.toHexString     500  avgt   15  4.855 ± 0.058  us/op
-Longs.toHexString        500  avgt   15  6.098 ± 0.034  us/op

+Benchmark             (size)  Mode  Cnt  Score   Error  Units (current 1788d09787c)
+Integers.toHexString     500  avgt   15  4.105 ± 0.010  us/op +18.27%
+Longs.toHexString        500  avgt   15  4.682 ± 0.116  us/op +30.24%

## 3. aliyun_ecs_c8i_x64 (CPU Intel®Xeon®Emerald Rapids)

-Benchmark             (size)  Mode  Cnt  Score   Error  Units
-Integers.toHexString     500  avgt   15  5.158 ± 0.025  us/op
-Longs.toHexString        500  avgt   15  6.072 ± 0.020  us/op

+Benchmark             (size)  Mode  Cnt  Score   Error  Units
+Integers.toHexString     500  avgt   15  4.691 ± 0.024  us/op  +9.95%
+Longs.toHexString        500  avgt   15  4.947 ± 0.024  us/op +22.74%

## 4. aliyun_ecs_c8y_aarch64 (CPU Aliyun Yitian 710)

-Benchmark             (size)  Mode  Cnt  Score   Error  Units
-Integers.toHexString     500  avgt   15  5.880 ± 0.017  us/op
-Longs.toHexString        500  avgt   15  7.183 ± 0.013  us/op

+Benchmark             (size)  Mode  Cnt  Score   Error  Units
+Integers.toHexString     500  avgt   15  5.282 ± 0.012  us/op +11.32%
+Longs.toHexString        500  avgt   15  5.530 ± 0.013  us/op +29.89%

## 5. MacBook M1 Pro (aarch64)

-Benchmark             (size)  Mode  Cnt   Score   Error  Units (baseline 91db7c0877a)
-Integers.toHexString     500  avgt   15  10.519 ± 1.573  us/op
-Longs.toHexString        500  avgt   15   5.754 ± 0.264  us/op

+Benchmark             (size)  Mode  Cnt  Score   Error  Units (current 1788d09787c)
+Integers.toHexString     500  avgt   15  5.057 ± 0.015  us/op +108.00%
+Longs.toHexString        500  avgt   15  5.147 ± 0.095  us/op  +11.79%

Because this algorithm is slower than the original version when handling smaller numbers, I have marked this PR as a draft.

Additionally, this algorithm is also used in PR #22928, [Speed up UUID::toString](https://github.com/openjdk/jdk/pull/22928), where Long.expand still causes a performance degradation on older CPU architectures.

// Method 1:
i = Long.reverseBytes(Long.expand(i, 0x0F0F_0F0F_0F0F_0F0FL));

// Method 2:
i = ((i & 0xF0000000L) >> 28)
  | ((i & 0xF000000L) >> 16)
  | ((i & 0xF00000L) >> 4)
  | ((i & 0xF0000L) << 8)
  | ((i & 0xF000L) << 20)
  | ((i & 0xF00L) << 32)
  | ((i & 0xF0L) << 44)
  | ((i & 0xFL) << 56);

Note: Using Long.reverseBytes + Long.expand is faster on x64 and ARMv9.
However, on AArch64 with ARMv8, it is slower than the manual unrolling shown in Method 2.
ARMv8 includes Apple M1/M2, AWS Graviton 3; ARMv9.0 includes Apple M3/M4, Aliyun Yitian 710.

I haven't tested this on older x64 CPUs, like the AMD ZEN1, but it's possible that they experience the same issue.
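The two methods above can be checked against each other with a small harness. The sketch below wraps each method in a static helper and compares them on a few sample values. The class and method names (`SpreadNibbles`, `spreadWithExpand`, `spreadWithShifts`) are hypothetical, and Long.expand requires JDK 19 or later. This only checks that the results are equal; it says nothing about relative speed, which needs the JMH runs shown above.

```java
// Hypothetical harness; names are for illustration only.
public class SpreadNibbles {
    // Method 1: deposit the low 32 bits into the low nibble of each byte,
    // then reverse the byte order.
    static long spreadWithExpand(long i) {
        return Long.reverseBytes(Long.expand(i, 0x0F0F_0F0F_0F0F_0F0FL));
    }

    // Method 2: move each of the eight nibbles to its target byte by hand.
    static long spreadWithShifts(long i) {
        return ((i & 0xF0000000L) >> 28)
             | ((i & 0xF000000L) >> 16)
             | ((i & 0xF00000L) >> 4)
             | ((i & 0xF0000L) << 8)
             | ((i & 0xF000L) << 20)
             | ((i & 0xF00L) << 32)
             | ((i & 0xF0L) << 44)
             | ((i & 0xFL) << 56);
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, 0x12345678L, 0xFFFFFFFFL, 0x80000000L, 0xDEADBEEFL};
        for (long s : samples) {
            long a = spreadWithExpand(s);
            long b = spreadWithShifts(s);
            if (a != b) {
                throw new AssertionError("mismatch for 0x" + Long.toHexString(s));
            }
        }
        // 0x12345678 becomes 0x0807060504030201: the most significant nibble (1)
        // lands in the least significant byte.
        System.out.println(spreadWithExpand(0x12345678L) == 0x0807060504030201L);
    }
}
```

Both helpers use only the low 32 bits of the argument, so they agree for any long input.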

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22942#issuecomment-2576197320
PR Comment: https://git.openjdk.org/jdk/pull/22942#issuecomment-2578863538