RFR: 8316426: Optimization for HexFormat.formatHex

Mon Sep 18 15:44:59 UTC 2023

On Fri, 15 Sep 2023 18:04:29 GMT, 温绍锦 <duke at openjdk.org> wrote:

> In the improvement of @cl4es PR #15591, the advantages of non-lookup-table were discussed.
> 
> But if the input is byte[], using lookup table can improve performance.
> 
> For HexFormat#formatHex(Appendable, byte[]) and HexFormat#formatHex(byte[]), If the length of byte[] is larger, the performance of table lookup will be improved more obviously.

What numbers do you get with this? 

In my experiments the improvement was either negligible or small enough to hardly matter. I also tried flipping bits similar to what you're doing for uppercasing but saw a net regression from that.

I opted for simplicity in #15991 - a simple cleanup that removed a few tiny lookup-tables and still led to a decent improvement. Going the other direction seems like the wrong move.

Results of an experiment to remove the lookup table use from `StringBuilder::appendHex`:

-p size=512
Benchmark                           (size)  Mode  Cnt  Score   Error  Units
HexFormatBench.appenderLowerCached     512  avgt   15  1,629 ± 0,378  us/op # baseline
HexFormatBench.appenderLowerCached     512  avgt   15  0,222 ± 0,009  us/op # 15768, 7.3x
HexFormatBench.appenderLowerCached     512  avgt   15  0,366 ± 0,002  us/op # no_lookup_tables.diff, 4.5x

-p size=4
Benchmark                           (size)  Mode  Cnt   Score   Error  Units
HexFormatBench.appenderLowerCached       4  avgt    5  26,723 ± 1,042  ns/op
HexFormatBench.appenderLowerCached       4  avgt    5   7,492 ± 0,140  ns/op # 15768, 3.6x
HexFormatBench.appenderLowerCached       4  avgt    5  10,205 ± 0,055  ns/op # no_lookup_tables.diff, 2.6x


Most of the win is from having a loop that is easier for the JIT to optimize; the lookup-table impl is faster than no lookup table in these micros, by a factor that is in the same ballpark as your `format` micro numbers. This on a M1. You might get different numbers.

I'm not convinced 30-100% wins on these micro is worth the increase in cache traffic, and would like to see a deeper analysis of the pros and cons of lookup tables before we go ahead with any optimizations that depend on their use.

The `appendHex` method in `StringBuilder` is interesting, but has other maintainability issues. It's a slippery slope to start adding specialized internal routines to `StringBuilder`, and perhaps there are better ways to model this at a higher level.

no_lookup_table.diff:

diff --git a/src/java.base/share/classes/java/lang/AbstractStringBuilder.java b/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
index 57a16bb769a..70727375c0b 100644
--- a/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
+++ b/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
@@ -1825,6 +1825,15 @@ private final void appendChars(CharSequence s, int off, int end) {
         count += end - off;
     }

+    private static char toHighHexDigit(boolean ucase, int value) {
+        value = (value >> 4) & 0xf;
+        return (char) ((value < 10 ? '0' : ucase ? 'A' : 'a') + value);
+    }
+    private static char toLowHexDigit(boolean ucase, int value) {
+        value = (value) & 0xf;
+        return (char) ((value < 10 ? '0' : ucase ? 'A' : 'a') + value);
+    }
+
     final void appendHex(boolean ucase, byte[] bytes, int fromIndex, int toIndex) {
         Preconditions.checkFromToIndex(fromIndex, toIndex, bytes.length, Preconditions.IOOBE_FORMATTER);

@@ -1832,17 +1841,14 @@ final void appendHex(boolean ucase, byte[] bytes, int fromIndex, int toIndex) {

         int charPos = count;
         for (int i = fromIndex; i < toIndex; ++i) {
-            short packed = HexDigits.digitPair(bytes[i], ucase);
+            byte b = bytes[i];
             if (isLatin1()) {
-                ByteArrayLittleEndian.setShort(
-                        value,
-                        charPos,
-                        packed);
+                StringLatin1.putChar(value, charPos++, toHighHexDigit(ucase, b));
+                StringLatin1.putChar(value, charPos++, toLowHexDigit(ucase, b));
             } else {
-                StringUTF16.putChar(value, charPos, packed & 0xff);
-                StringUTF16.putChar(value, charPos + 1, packed >> 8);
+                StringUTF16.putChar(value, charPos++, toHighHexDigit(ucase, b));
+                StringUTF16.putChar(value, charPos++, toLowHexDigit(ucase, b));
             }
-            charPos += 2;
         }

         count = charPos;

I don't think `appendHex` would make the cut as a public API. I think we would need something with much more general utility for that.

A more high-level idea would be to improve said `String` templates JEP to be able to format more generally into `Appendable`/`StringBuilder`, so that hex, octal - any custom format - can be efficiently appended to a `StringBuilder`. 

BTW, here's a patch on top of #15768 that simplifies `formatHex(Appendable..)` to simply append the result of `formatHex(byte[]...)`:


Benchmark                                              (size)  Mode  Cnt     Score     Error   Units
HexFormatBench.appenderLowerCached                        512  avgt   15     0,202 ±   0,001   us/op
HexFormatBench.formatLowerCached                          512  avgt   15     0,189 ±   0,016   us/op


Same speed. Drawback is that the `appender` becomes allocating but short-lived allocations is one of my least concerns:

```java 
diff --git a/src/java.base/share/classes/java/util/HexFormat.java b/src/java.base/share/classes/java/util/HexFormat.java
index e1d88e3ceb6..68a15307fe0 100644
--- a/src/java.base/share/classes/java/util/HexFormat.java
+++ b/src/java.base/share/classes/java/util/HexFormat.java
@@ -407,38 +407,7 @@ public <A extends Appendable> A formatHex(A out, byte[] bytes, int fromIndex, in
         int length = toIndex - fromIndex;
         if (length > 0) {
             try {
-                boolean prefixEmpty = prefix.isEmpty();
-                boolean suffixEmpty = suffix.isEmpty();
-                boolean delimiterEmpty = delimiter.isEmpty();
-                if (!prefixEmpty) {
-                    out.append(prefix);
-                }
-                if (suffixEmpty && delimiterEmpty && prefixEmpty) {
-                    if (out instanceof StringBuilder sb) {
-                        jla.appendHex(sb, digitCase == Case.UPPERCASE, bytes, fromIndex, toIndex);
-                    } else {
-                        for (int i = 0; i < length; i++) {
-                            toHexDigits(out, bytes[fromIndex + i]);
-                        }
-                    }
-                } else {
-                    toHexDigits(out, bytes[fromIndex]);
-                    for (int i = 1; i < length; i++) {
-                        if (!suffixEmpty) {
-                            out.append(suffix);
-                        }
-                        if (!delimiterEmpty) {
-                            out.append(delimiter);
-                        }
-                        if (!prefixEmpty) {
-                            out.append(prefix);
-                        }
-                        toHexDigits(out, bytes[fromIndex + i]);
-                    }
-                }
-                if (!suffixEmpty) {
-                    out.append(suffix);
-                }
+                out.append(formatHex(bytes, fromIndex, toIndex));
             } catch (IOException ioe) {
                 throw new UncheckedIOException(ioe.getMessage(), ioe);
             }

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15768#issuecomment-1721672736
PR Comment: https://git.openjdk.org/jdk/pull/15768#issuecomment-1721851723
PR Comment: https://git.openjdk.org/jdk/pull/15768#issuecomment-1722048682