Improving performance and reducing object allocations of java.util.UUID to/from string

Steven Schlansker steven at likeness.com
Fri Dec 28 15:34:51 PST 2012


Hello JDK developers,

My company uses UUIDs throughout our software.  We recently identified that the java.util.UUID implementations of fromString and toString are horridly inefficient.

An incomplete list of the inefficiencies:

* fromString uses a regular expression that is not even precompiled
* fromString uses a regular expression to parse out the "-" markers, requiring the allocation of many String and String[] objects
* fromString unnecessarily allocates multiple Long objects

* toString creates a char[64], a String object which copies that, and a sub-String object for *each* of the 5 hex digit segments
* toString produces a fixed-length result but involves multiple StringBuilder concatenations

Everyone that I have shown the relevant code to has been horrified by the complete lack of care towards object allocations.  While I am a big fan of the "object allocation is free until otherwise proven" philosophy, we traced multiple production problems down to these hotspots.

But instead of just whining about it, I wish to contribute a patch which I believe fixes the problem.  This is my first attempt to submit anything to the JDK, so please be nice :-)

My initial attempt has yielded micro benchmarks with very favorable outcomes.  By taking the appropriate code from java.lang.Long, modifying it to work on a single pre-allocated array+offset rather than returning a String, and scanning for dashes instead of regex splitting I reduced times 3-6x and object allocations to the low single digits.

My Google Caliper benchmark is available here, to ensure that I have not committed any benchmark sins:
https://gist.github.com/4356621

         benchmark instances  Bytes    ns linear runtime
 JdkUuidFromString     51.00 1544.0 608.2 ==============================
 NewUuidFromString      2.00   72.0 179.1 ========

   JdkUuidToString     31.00 1720.0 321.4 ===============
   NewUuidToString      3.00  200.0  51.5 ==

In particular, the reduction of object allocations from ~1.5KB to 72/200 bytes dramatically reduces GC pressure when you sit in a tight loop deserializing millions of UUIDs from strings.

I believe the same patch can (and should?) apply to jdk7u and jdk8, as the code does not seem to have changed.

I would appreciate any feedback on the code style, efficiency, or correctness that you can offer.  I have run the "java/util/UUID" jtreg suite against this and it passes.  We stole the more thorough Apache Harmony tests for this and they pass as well: https://github.com/stevenschlansker/components-ness-core/blob/master/src/test/java/com/nesscomputing/uuid/TestNessUUID.java

I have not yet secured a CLA, but my company is in the process of signing one.

The rest of the message is a "hg export" of my change set.
Happy holidays, and thank you for your time!
Steven Schlansker




# HG changeset patch
# User Steven Schlansker <steven at nesscomputing.com>
# Date 1356737383 0
# Node ID a812c963b96f08112f6805cbc380b12af196f788
# Parent  9b8c96f96a0f9a5801b55530a387fefbe50482a3
java.util.UUID: improve performance of UUID#toString and UUID#fromString by avoiding many unnecessary object allocations.

         benchmark instances  Bytes    ns linear runtime
 JdkUuidFromString     51.00 1544.0 608.2 ==============================
 NewUuidFromString      2.00   72.0 179.1 ========

   JdkUuidToString     31.00 1720.0 321.4 ===============
   NewUuidToString      3.00  200.0  51.5 ==

diff -r 9b8c96f96a0f -r a812c963b96f src/share/classes/java/util/UUID.java
--- a/src/share/classes/java/util/UUID.java	Mon Jun 27 13:21:34 2011 -0700
+++ b/src/share/classes/java/util/UUID.java	Fri Dec 28 23:29:43 2012 +0000
@@ -189,26 +189,79 @@
      *          described in {@link #toString}
      *
      */
-    public static UUID fromString(String name) {
-        String[] components = name.split("-");
-        if (components.length != 5)
-            throw new IllegalArgumentException("Invalid UUID string: "+name);
-        for (int i=0; i<5; i++)
-            components[i] = "0x"+components[i];
+    public static UUID fromString(String str) {
+        int dashCount = 4;
+        int [] dashPos = new int [6];
+        dashPos[0] = -1;
+        dashPos[5] = str.length();
 
-        long mostSigBits = Long.decode(components[0]).longValue();
+        for (int i = str.length()-1; i >= 0; i--) {
+            if (str.charAt(i) == '-') {
+                if (dashCount == 0) {
+                    throw new IllegalArgumentException("Invalid UUID string: " + str);
+                }
+                dashPos[dashCount--] = i;
+            }
+        }
+
+        if (dashCount > 0) {
+            throw new IllegalArgumentException("Invalid UUID string: " + str);
+        }
+
+        long mostSigBits = decode(str, dashPos, 0) & 0xffffffffL;
         mostSigBits <<= 16;
-        mostSigBits |= Long.decode(components[1]).longValue();
+        mostSigBits |= decode(str, dashPos, 1) & 0xffffL;
         mostSigBits <<= 16;
-        mostSigBits |= Long.decode(components[2]).longValue();
+        mostSigBits |= decode(str,  dashPos, 2) & 0xffff);
 
-        long leastSigBits = Long.decode(components[3]).longValue();
+        long leastSigBits = decode(str,  dashPos, 3) & 0xffffL;
         leastSigBits <<= 48;
-        leastSigBits |= Long.decode(components[4]).longValue();
+        leastSigBits |= decode(str,  dashPos, 4) & 0xffffffffffffL;
 
         return new UUID(mostSigBits, leastSigBits);
     }
 
+    private static long decode(final String str, final int [] dashPos, final int field) {
+        int start = dashPos[field]+1;
+        int end = dashPos[field+1];
+        if (start >= end) {
+            throw new IllegalArgumentException("Invalid UUID string: " + str);
+        }
+        long curr = 0;
+        for (int i = start; i < end; i++) {
+            int x = getNibbleFromChar(str.charAt(i));
+            curr <<= 4;
+            if (curr < 0) {
+                throw new NumberFormatException("long overflow");
+            }
+            curr |= x;
+        }
+        return curr;
+    }
+    
+    private static final int NUM_ALPHA_DIFF = 'A' - '9' - 1;
+    private static final int LOWER_UPPER_DIFF = 'a' - 'A';
+
+    private static int getNibbleFromChar(final char c)
+    {
+        int x = c - '0';
+        if (x > 9) {
+            x -= NUM_ALPHA_DIFF; // difference between '9' and 'A'
+            if (x > 15) {
+                x -= LOWER_UPPER_DIFF; // difference between 'a' and 'A'
+            }
+            if (x < 10) {
+                throw new IllegalArgumentException(c + " is not a valid character for an UUID string");
+            }
+        }
+
+        if (x < 0 || x > 15) {
+            throw new IllegalArgumentException(c + " is not a valid character for an UUID string");
+        }
+
+        return x;
+    }
+
     // Field Accessor Methods
 
     /**
@@ -372,19 +425,46 @@
      * @return  A string representation of this {@code UUID}
      */
     public String toString() {
-        return (digits(mostSigBits >> 32, 8) + "-" +
-                digits(mostSigBits >> 16, 4) + "-" +
-                digits(mostSigBits, 4) + "-" +
-                digits(leastSigBits >> 48, 4) + "-" +
-                digits(leastSigBits, 12));
+        return toString(getMostSignificantBits(), getLeastSignificantBits());
     }
 
-    /** Returns val represented by the specified number of hex digits. */
-    private static String digits(long val, int digits) {
-        long hi = 1L << (digits * 4);
-        return Long.toHexString(hi | (val & (hi - 1))).substring(1);
+    public static String toString(long msb, long lsb) {
+        char[] uuidChars = new char[36];
+
+        hexDigits(uuidChars, 0, 8, msb >> 32);
+        uuidChars[8] = '-';
+        hexDigits(uuidChars, 9, 4, msb >> 16);
+        uuidChars[13] = '-';
+        hexDigits(uuidChars, 14, 4, msb);
+        uuidChars[18] = '-';
+        hexDigits(uuidChars, 19, 4, lsb >> 48);
+        uuidChars[23] = '-';
+        hexDigits(uuidChars, 24, 12, lsb);
+
+        return new String(uuidChars);
     }
 
+    private static void hexDigits(char[] dest, int offset, int digits, long val) {
+        long hi = 1L << (digits * 4);
+        toUnsignedString(dest, offset, digits, hi | (val & (hi - 1)), 4);
+    }
+
+    private final static char[] HEX_DIGITS = {
+        '0' , '1' , '2' , '3' , '4' , '5' ,
+        '6' , '7' , '8' , '9' , 'a' , 'b' ,
+        'c' , 'd' , 'e' , 'f'
+    };
+
+    private static void toUnsignedString(char[] dest, int offset, int len, long i, int shift) {
+        int charPos = len;
+        int radix = 1 << shift;
+        long mask = radix - 1;
+        do {
+            dest[offset + --charPos] = HEX_DIGITS[(int)(i & mask)];
+            i >>>= shift;
+        } while (i != 0 && charPos > 0);
+    }
+    
     /**
      * Returns a hash code for this {@code UUID}.
      *


More information about the jdk7u-dev mailing list