RFR: 8315789: Minor HexFormat performance improvements

Fri Sep 8 08:49:38 UTC 2023

On Wed, 6 Sep 2023 13:36:22 GMT, Claes Redestad <redestad at openjdk.org> wrote:

> This PR seeks to improve formatting of hex digits using `java.util.HexFormat` somewhat.
> 
> This is achieved getting rid of a couple of lookup tables, caching the result of `HexFormat.of().withUpperCase()`, and removing tiny allocation that happens in the `formatHex(A, byte)` method. Improvements range from 20-40% on throughput, and some operations allocate less:
> 
> 
> Name                               Cnt   Base   Error   Test   Error   Unit   Diff%
> HexFormatBench.appenderLower        15  1,330 ± 0,021  1,065 ± 0,067  us/op   19,9% (p = 0,000*)
>   :gc.alloc.rate                    15 11,481 ± 0,185  0,007 ± 0,000 MB/sec  -99,9% (p = 0,000*)
>   :gc.alloc.rate.norm               15 16,009 ± 0,000  0,007 ± 0,000   B/op -100,0% (p = 0,000*)
>   :gc.count                         15  3,000          0,000         counts
>   :gc.time                           3  2,000                            ms
> HexFormatBench.appenderLowerCached  15  1,317 ± 0,013  1,065 ± 0,054  us/op   19,1% (p = 0,000*)
>   :gc.alloc.rate                    15 11,590 ± 0,111  0,007 ± 0,000 MB/sec  -99,9% (p = 0,000*)
>   :gc.alloc.rate.norm               15 16,009 ± 0,000  0,007 ± 0,000   B/op -100,0% (p = 0,000*)
>   :gc.count                         15  3,000          0,000         counts
>   :gc.time                           3  2,000                            ms
> HexFormatBench.appenderUpper        15  1,330 ± 0,022  1,065 ± 0,036  us/op   19,9% (p = 0,000*)
>   :gc.alloc.rate                    15 34,416 ± 0,559  0,007 ± 0,000 MB/sec -100,0% (p = 0,000*)
>   :gc.alloc.rate.norm               15 48,009 ± 0,000  0,007 ± 0,000   B/op -100,0% (p = 0,000*)
>   :gc.count                         15  0,000          0,000         counts
> HexFormatBench.appenderUpperCached  15  1,353 ± 0,009  1,033 ± 0,014  us/op   23,6% (p = 0,000*)
>   :gc.alloc.rate                    15 11,284 ± 0,074  0,007 ± 0,000 MB/sec  -99,9% (p = 0,000*)
>   :gc.alloc.rate.norm               15 16,009 ± 0,000  0,007 ± 0,000   B/op -100,0% (p = 0,000*)
>   :gc.count                         15  3,000          0,000         counts
>   :gc.time                           3  2,000                            ms
> HexFormatBench.toHexLower           15  0,198 ± 0,001  0,119 ± 0,008  us/op   40,1% (p = 0,000*)
>   :gc.alloc.rate                    15  0,007 ± 0,000  0,007 ± 0,000 MB/sec   -0,0% (p = 0,816 )
>   :gc.alloc.rate.norm               15  0,001 ± 0,000  0,001 ± 0,000   B/op  -40,1% (p = 0,000*)
>   :gc.count                         15  0,000          0,000    ...

I'm not sure that micro-benchmarks are very indicative on whether a lookup table performs better than short and straightforward code.
The reason is that, once in the CPU caches, a lookup table in micro-benchmarks stays there, whereas in more realistic situations, where access is more spread out in time, it is often evicted to make room for other, more lively data.
A micro-benchmark using a lookup table shows good results because of a high rate of cache hits, whereas in other real-world workloads a lookup table might result in many cache misses on access.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15591#issuecomment-1711307164