Improving performance and reducing object allocations of java.util.UUID to/from string

Steven Schlansker steven at likeness.com
Sat Dec 29 09:19:26 PST 2012


On Dec 28, 2012, at 11:41 PM, Alan Bateman <Alan.Bateman at oracle.com> wrote:
> Steven,
> 
> You should bring your patch to core-libs-dev to discuss.
> 

Thank you, I shall do so.

> -Alan
> 
> On 28/12/2012 23:34, Steven Schlansker wrote:
>> Hello JDK developers,
>> 
>> My company uses UUIDs throughout our software.  We recently identified that the java.util.UUID implementations of fromString and toString are horridly inefficient.
>> 
>> An incomplete list of the inefficiencies:
>> 
>> * fromString uses a regular expression that is not even precompiled
>> * fromString uses a regular expression to parse out the "-" markers, requiring the allocation of many String and String[] objects
>> * fromString unnecessarily allocates multiple Long objects
>> 
>> * toString creates a char[64], a String object which copies that, and a sub-String object for *each* of the 5 hex digit segments
>> * toString produces a fixed-length result but involves multiple StringBuilder concatenations
>> 
>> Everyone that I have shown the relevant code to has been horrified by the complete lack of care towards object allocations.  While I am a big fan of the "object allocation is free until otherwise proven" philosophy, we traced multiple production problems down to these hotspots.
>> 
>> But instead of just whining about it, I wish to contribute a patch which I believe fixes the problem.  This is my first attempt to submit anything to the JDK, so please be nice :-)
>> 
>> My initial attempt has yielded micro benchmarks with very favorable outcomes.  By taking the appropriate code from java.lang.Long, modifying it to work on a single pre-allocated array+offset rather than returning a String, and scanning for dashes instead of regex splitting I reduced times 3-6x and object allocations to the low single digits.
>> 
>> My Google Caliper benchmark is available here, to ensure that I have not committed any benchmark sins:
>> https://gist.github.com/4356621
>> 
>>          benchmark instances  Bytes    ns linear runtime
>>  JdkUuidFromString     51.00 1544.0 608.2 ==============================
>>  NewUuidFromString      2.00   72.0 179.1 ========
>> 
>>    JdkUuidToString     31.00 1720.0 321.4 ===============
>>    NewUuidToString      3.00  200.0  51.5 ==
>> 
>> In particular, the reduction of object allocations from ~1.5KB to 72/200 bytes dramatically reduces GC pressure when you sit in a tight loop deserializing millions of UUIDs from strings.
>> 
>> I believe the same patch can (and should?) apply to jdk7u and jdk8, as the code does not seem to have changed.
>> 
>> I would appreciate any feedback on the code style, efficiency, or correctness that you can offer.  I have run the "java/util/UUID" jtreg suite against this and it passes.  We stole the more thorough Apache Harmony tests for this and they pass as well: https://github.com/stevenschlansker/components-ness-core/blob/master/src/test/java/com/nesscomputing/uuid/TestNessUUID.java
>> 
>> I have not yet secured a CLA, but my company is in the process of signing one.
>> 
>> The rest of the message is a "hg export" of my change set.
>> Happy holidays, and thank you for your time!
>> Steven Schlansker
>> 
>> 
>> 
>> 
>> # HG changeset patch
>> # User Steven Schlansker<steven at nesscomputing.com>
>> # Date 1356737383 0
>> # Node ID a812c963b96f08112f6805cbc380b12af196f788
>> # Parent  9b8c96f96a0f9a5801b55530a387fefbe50482a3
>> java.util.UUID: improve performance of UUID#toString and UUID#fromString by avoiding many unnecessary object allocations.
>> 
>>          benchmark instances  Bytes    ns linear runtime
>>  JdkUuidFromString     51.00 1544.0 608.2 ==============================
>>  NewUuidFromString      2.00   72.0 179.1 ========
>> 
>>    JdkUuidToString     31.00 1720.0 321.4 ===============
>>    NewUuidToString      3.00  200.0  51.5 ==
>> 
>> diff -r 9b8c96f96a0f -r a812c963b96f src/share/classes/java/util/UUID.java
>> --- a/src/share/classes/java/util/UUID.java	Mon Jun 27 13:21:34 2011 -0700
>> +++ b/src/share/classes/java/util/UUID.java	Fri Dec 28 23:29:43 2012 +0000
>> @@ -189,26 +189,79 @@
>>       *          described in {@link #toString}
>>       *
>>       */
>> -    public static UUID fromString(String name) {
>> -        String[] components = name.split("-");
>> -        if (components.length != 5)
>> -            throw new IllegalArgumentException("Invalid UUID string: "+name);
>> -        for (int i=0; i<5; i++)
>> -            components[i] = "0x"+components[i];
>> +    public static UUID fromString(String str) {
>> +        int dashCount = 4;
>> +        int [] dashPos = new int [6];
>> +        dashPos[0] = -1;
>> +        dashPos[5] = str.length();
>> 
>> -        long mostSigBits = Long.decode(components[0]).longValue();
>> +        for (int i = str.length()-1; i>= 0; i--) {
>> +            if (str.charAt(i) == '-') {
>> +                if (dashCount == 0) {
>> +                    throw new IllegalArgumentException("Invalid UUID string: " + str);
>> +                }
>> +                dashPos[dashCount--] = i;
>> +            }
>> +        }
>> +
>> +        if (dashCount>  0) {
>> +            throw new IllegalArgumentException("Invalid UUID string: " + str);
>> +        }
>> +
>> +        long mostSigBits = decode(str, dashPos, 0)&  0xffffffffL;
>>          mostSigBits<<= 16;
>> -        mostSigBits |= Long.decode(components[1]).longValue();
>> +        mostSigBits |= decode(str, dashPos, 1)&  0xffffL;
>>          mostSigBits<<= 16;
>> -        mostSigBits |= Long.decode(components[2]).longValue();
>> +        mostSigBits |= decode(str,  dashPos, 2)&  0xffff);
>> 
>> -        long leastSigBits = Long.decode(components[3]).longValue();
>> +        long leastSigBits = decode(str,  dashPos, 3)&  0xffffL;
>>          leastSigBits<<= 48;
>> -        leastSigBits |= Long.decode(components[4]).longValue();
>> +        leastSigBits |= decode(str,  dashPos, 4)&  0xffffffffffffL;
>> 
>>          return new UUID(mostSigBits, leastSigBits);
>>      }
>> 
>> +    private static long decode(final String str, final int [] dashPos, final int field) {
>> +        int start = dashPos[field]+1;
>> +        int end = dashPos[field+1];
>> +        if (start>= end) {
>> +            throw new IllegalArgumentException("Invalid UUID string: " + str);
>> +        }
>> +        long curr = 0;
>> +        for (int i = start; i<  end; i++) {
>> +            int x = getNibbleFromChar(str.charAt(i));
>> +            curr<<= 4;
>> +            if (curr<  0) {
>> +                throw new NumberFormatException("long overflow");
>> +            }
>> +            curr |= x;
>> +        }
>> +        return curr;
>> +    }
>> +
>> +    private static final int NUM_ALPHA_DIFF = 'A' - '9' - 1;
>> +    private static final int LOWER_UPPER_DIFF = 'a' - 'A';
>> +
>> +    private static int getNibbleFromChar(final char c)
>> +    {
>> +        int x = c - '0';
>> +        if (x>  9) {
>> +            x -= NUM_ALPHA_DIFF; // difference between '9' and 'A'
>> +            if (x>  15) {
>> +                x -= LOWER_UPPER_DIFF; // difference between 'a' and 'A'
>> +            }
>> +            if (x<  10) {
>> +                throw new IllegalArgumentException(c + " is not a valid character for an UUID string");
>> +            }
>> +        }
>> +
>> +        if (x<  0 || x>  15) {
>> +            throw new IllegalArgumentException(c + " is not a valid character for an UUID string");
>> +        }
>> +
>> +        return x;
>> +    }
>> +
>>      // Field Accessor Methods
>> 
>>      /**
>> @@ -372,19 +425,46 @@
>>       * @return  A string representation of this {@code UUID}
>>       */
>>      public String toString() {
>> -        return (digits(mostSigBits>>  32, 8) + "-" +
>> -                digits(mostSigBits>>  16, 4) + "-" +
>> -                digits(mostSigBits, 4) + "-" +
>> -                digits(leastSigBits>>  48, 4) + "-" +
>> -                digits(leastSigBits, 12));
>> +        return toString(getMostSignificantBits(), getLeastSignificantBits());
>>      }
>> 
>> -    /** Returns val represented by the specified number of hex digits. */
>> -    private static String digits(long val, int digits) {
>> -        long hi = 1L<<  (digits * 4);
>> -        return Long.toHexString(hi | (val&  (hi - 1))).substring(1);
>> +    public static String toString(long msb, long lsb) {
>> +        char[] uuidChars = new char[36];
>> +
>> +        hexDigits(uuidChars, 0, 8, msb>>  32);
>> +        uuidChars[8] = '-';
>> +        hexDigits(uuidChars, 9, 4, msb>>  16);
>> +        uuidChars[13] = '-';
>> +        hexDigits(uuidChars, 14, 4, msb);
>> +        uuidChars[18] = '-';
>> +        hexDigits(uuidChars, 19, 4, lsb>>  48);
>> +        uuidChars[23] = '-';
>> +        hexDigits(uuidChars, 24, 12, lsb);
>> +
>> +        return new String(uuidChars);
>>      }
>> 
>> +    private static void hexDigits(char[] dest, int offset, int digits, long val) {
>> +        long hi = 1L<<  (digits * 4);
>> +        toUnsignedString(dest, offset, digits, hi | (val&  (hi - 1)), 4);
>> +    }
>> +
>> +    private final static char[] HEX_DIGITS = {
>> +        '0' , '1' , '2' , '3' , '4' , '5' ,
>> +        '6' , '7' , '8' , '9' , 'a' , 'b' ,
>> +        'c' , 'd' , 'e' , 'f'
>> +    };
>> +
>> +    private static void toUnsignedString(char[] dest, int offset, int len, long i, int shift) {
>> +        int charPos = len;
>> +        int radix = 1<<  shift;
>> +        long mask = radix - 1;
>> +        do {
>> +            dest[offset + --charPos] = HEX_DIGITS[(int)(i&  mask)];
>> +            i>>>= shift;
>> +        } while (i != 0&&  charPos>  0);
>> +    }
>> +
>>      /**
>>       * Returns a hash code for this {@code UUID}.
>>       *
> 




More information about the jdk7u-dev mailing list