RFR: 8315789: Minor HexFormat performance improvements

Fri Sep 8 09:36:43 UTC 2023

On Fri, 8 Sep 2023 08:47:04 GMT, Raffaello Giulietti <rgiulietti at openjdk.org> wrote:

>> This PR seeks to improve formatting of hex digits using `java.util.HexFormat` somewhat.
>> 
>> This is achieved getting rid of a couple of lookup tables, caching the result of `HexFormat.of().withUpperCase()`, and removing tiny allocation that happens in the `formatHex(A, byte)` method. Improvements range from 20-40% on throughput, and some operations allocate less:
>> 
>> 
>> Name                               Cnt   Base   Error   Test   Error   Unit   Diff%
>> HexFormatBench.appenderLower        15  1,330 ± 0,021  1,065 ± 0,067  us/op   19,9% (p = 0,000*)
>>   :gc.alloc.rate                    15 11,481 ± 0,185  0,007 ± 0,000 MB/sec  -99,9% (p = 0,000*)
>>   :gc.alloc.rate.norm               15 16,009 ± 0,000  0,007 ± 0,000   B/op -100,0% (p = 0,000*)
>>   :gc.count                         15  3,000          0,000         counts
>>   :gc.time                           3  2,000                            ms
>> HexFormatBench.appenderLowerCached  15  1,317 ± 0,013  1,065 ± 0,054  us/op   19,1% (p = 0,000*)
>>   :gc.alloc.rate                    15 11,590 ± 0,111  0,007 ± 0,000 MB/sec  -99,9% (p = 0,000*)
>>   :gc.alloc.rate.norm               15 16,009 ± 0,000  0,007 ± 0,000   B/op -100,0% (p = 0,000*)
>>   :gc.count                         15  3,000          0,000         counts
>>   :gc.time                           3  2,000                            ms
>> HexFormatBench.appenderUpper        15  1,330 ± 0,022  1,065 ± 0,036  us/op   19,9% (p = 0,000*)
>>   :gc.alloc.rate                    15 34,416 ± 0,559  0,007 ± 0,000 MB/sec -100,0% (p = 0,000*)
>>   :gc.alloc.rate.norm               15 48,009 ± 0,000  0,007 ± 0,000   B/op -100,0% (p = 0,000*)
>>   :gc.count                         15  0,000          0,000         counts
>> HexFormatBench.appenderUpperCached  15  1,353 ± 0,009  1,033 ± 0,014  us/op   23,6% (p = 0,000*)
>>   :gc.alloc.rate                    15 11,284 ± 0,074  0,007 ± 0,000 MB/sec  -99,9% (p = 0,000*)
>>   :gc.alloc.rate.norm               15 16,009 ± 0,000  0,007 ± 0,000   B/op -100,0% (p = 0,000*)
>>   :gc.count                         15  3,000          0,000         counts
>>   :gc.time                           3  2,000                            ms
>> HexFormatBench.toHexLower           15  0,198 ± 0,001  0,119 ± 0,008  us/op   40,1% (p = 0,000*)
>>   :gc.alloc.rate                    15  0,007 ± 0,000  0,007 ± 0,000 MB/sec   -0,0% (p = 0,816 )
>>   :gc.alloc.rate.norm               15  0,001 ± 0,000  0,001 ± 0,000   B/op  -40,1% (p = 0,000*)
>>   :gc....
>
> I'm not sure that micro-benchmarks are very indicative on whether a lookup table performs better than short and straightforward code.
> The reason is that, once in the CPU caches, a lookup table in micro-benchmarks stays there, whereas in more realistic situations, where access is more spread out in time, it is often evicted to make room for other, more lively data.
> A micro-benchmark using a lookup table shows good results because of a high rate of cache hits, whereas in other real-world workloads a lookup table might result in many cache misses on access.

As @rgiulietti says lookup-table algorithms may outperform in microbenchmarks but lose out in real world scenarios, so we need to stay clear unless there's major benefit. 

And as it turns out, the relative benefit seem to come mainly from the use of `ByteArrayLittleEndian`, not from using the lookup table in `HexDigits.DIGITS`:

     public String toHexDigits(byte value) {
         byte[] rep = new byte[2];
-        rep[0] = (byte)toHighHexDigit(value);
-        rep[1] = (byte)toLowHexDigit(value);
+        ByteArrayLittleEndian.setChar(rep, 0, (char)(toHighHexDigit(value) << 8 | toLowHexDigit(value)));
         try {
             return jla.newStringNoRepl(rep, StandardCharsets.ISO_8859_1);
         } catch (CharacterCodingException cce) {

This gets the same speed-up (4%) as calling `HexDigits.DIGITS`:

Name                       Cnt      Base    Error       Test    Error   Unit   Diff%
HexFormatBench.toHexDigits  15     1,943 ±  0,014      1,860 ±  0,014  us/op    4,2% (p = 0,000*)
  * = significant

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15591#issuecomment-1711370758