RFR: 8316426: Optimization for HexFormat.formatHex

Mon Sep 18 15:45:05 UTC 2023

On Mon, 18 Sep 2023 13:51:06 GMT, Chen Liang <liach at openjdk.org> wrote:

>> In the improvement of @cl4es PR #15591, the advantages of non-lookup-table were discussed.
>> 
>> But if the input is byte[], using lookup table can improve performance.
>> 
>> For HexFormat#formatHex(Appendable, byte[]) and HexFormat#formatHex(byte[]), If the length of byte[] is larger, the performance of table lookup will be improved more obviously.
>
> src/java.base/share/classes/java/util/HexFormat.java line 664:
> 
>> 662:     public char toHighHexDigit(int value) {
>> 663:         value = (value >> 4) & 0xf;
>> 664:         return (char) ((value < 10 ? '0' : hexBase) + value);
> 
> These changes seem to be the reason `toHexDigitsLong` consistently sees a slight performance drop. Can any professional engineer explain why the old version, which would be like `(char) (value < 10 ? '0' + value : hexBase + value)`, is slightly faster?
> 
> Since the effect of this change `hexBase` is not clear, I recommend dropping this for now (remove the field and changes to the 2 methods).

Yes, please drop this. Possibly re-attempting it in a future PR, though I'm a bit skeptical of the potential win here.

I did test something like `hexBase + value` in #15591 to a similar effect, so opted for just storing `digitCase`. My best guess as to why are that the JIT is simply better at (or more eager to) eliminating untaken paths than it is at constant fold the field load. Or the branch isn't eliminated and we're seeing the effect of branch prediction. I'd recommend analyzing the asm generated (via e.g. `-prof perfasm`) to understand this in depth

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/15768#discussion_r1328842647