RFR: 8315789: Minor HexFormat performance improvements
Claes Redestad
redestad at openjdk.org
Fri Sep 8 09:36:43 UTC 2023
On Fri, 8 Sep 2023 08:47:04 GMT, Raffaello Giulietti <rgiulietti at openjdk.org> wrote:
>> This PR seeks to improve formatting of hex digits using `java.util.HexFormat` somewhat.
>>
>> This is achieved getting rid of a couple of lookup tables, caching the result of `HexFormat.of().withUpperCase()`, and removing tiny allocation that happens in the `formatHex(A, byte)` method. Improvements range from 20-40% on throughput, and some operations allocate less:
>>
>>
>> Name Cnt Base Error Test Error Unit Diff%
>> HexFormatBench.appenderLower 15 1,330 ± 0,021 1,065 ± 0,067 us/op 19,9% (p = 0,000*)
>> :gc.alloc.rate 15 11,481 ± 0,185 0,007 ± 0,000 MB/sec -99,9% (p = 0,000*)
>> :gc.alloc.rate.norm 15 16,009 ± 0,000 0,007 ± 0,000 B/op -100,0% (p = 0,000*)
>> :gc.count 15 3,000 0,000 counts
>> :gc.time 3 2,000 ms
>> HexFormatBench.appenderLowerCached 15 1,317 ± 0,013 1,065 ± 0,054 us/op 19,1% (p = 0,000*)
>> :gc.alloc.rate 15 11,590 ± 0,111 0,007 ± 0,000 MB/sec -99,9% (p = 0,000*)
>> :gc.alloc.rate.norm 15 16,009 ± 0,000 0,007 ± 0,000 B/op -100,0% (p = 0,000*)
>> :gc.count 15 3,000 0,000 counts
>> :gc.time 3 2,000 ms
>> HexFormatBench.appenderUpper 15 1,330 ± 0,022 1,065 ± 0,036 us/op 19,9% (p = 0,000*)
>> :gc.alloc.rate 15 34,416 ± 0,559 0,007 ± 0,000 MB/sec -100,0% (p = 0,000*)
>> :gc.alloc.rate.norm 15 48,009 ± 0,000 0,007 ± 0,000 B/op -100,0% (p = 0,000*)
>> :gc.count 15 0,000 0,000 counts
>> HexFormatBench.appenderUpperCached 15 1,353 ± 0,009 1,033 ± 0,014 us/op 23,6% (p = 0,000*)
>> :gc.alloc.rate 15 11,284 ± 0,074 0,007 ± 0,000 MB/sec -99,9% (p = 0,000*)
>> :gc.alloc.rate.norm 15 16,009 ± 0,000 0,007 ± 0,000 B/op -100,0% (p = 0,000*)
>> :gc.count 15 3,000 0,000 counts
>> :gc.time 3 2,000 ms
>> HexFormatBench.toHexLower 15 0,198 ± 0,001 0,119 ± 0,008 us/op 40,1% (p = 0,000*)
>> :gc.alloc.rate 15 0,007 ± 0,000 0,007 ± 0,000 MB/sec -0,0% (p = 0,816 )
>> :gc.alloc.rate.norm 15 0,001 ± 0,000 0,001 ± 0,000 B/op -40,1% (p = 0,000*)
>> :gc....
>
> I'm not sure that micro-benchmarks are very indicative on whether a lookup table performs better than short and straightforward code.
> The reason is that, once in the CPU caches, a lookup table in micro-benchmarks stays there, whereas in more realistic situations, where access is more spread out in time, it is often evicted to make room for other, more lively data.
> A micro-benchmark using a lookup table shows good results because of a high rate of cache hits, whereas in other real-world workloads a lookup table might result in many cache misses on access.
As @rgiulietti says lookup-table algorithms may outperform in microbenchmarks but lose out in real world scenarios, so we need to stay clear unless there's major benefit.
And as it turns out, the relative benefit seem to come mainly from the use of `ByteArrayLittleEndian`, not from using the lookup table in `HexDigits.DIGITS`:
public String toHexDigits(byte value) {
byte[] rep = new byte[2];
- rep[0] = (byte)toHighHexDigit(value);
- rep[1] = (byte)toLowHexDigit(value);
+ ByteArrayLittleEndian.setChar(rep, 0, (char)(toHighHexDigit(value) << 8 | toLowHexDigit(value)));
try {
return jla.newStringNoRepl(rep, StandardCharsets.ISO_8859_1);
} catch (CharacterCodingException cce) {
This gets the same speed-up (4%) as calling `HexDigits.DIGITS`:
Name Cnt Base Error Test Error Unit Diff%
HexFormatBench.toHexDigits 15 1,943 ± 0,014 1,860 ± 0,014 us/op 4,2% (p = 0,000*)
* = significant
-------------
PR Comment: https://git.openjdk.org/jdk/pull/15591#issuecomment-1711370758
More information about the core-libs-dev
mailing list