Improving performance and reducing object allocations of java.util.UUID to/from string
Alan Bateman
Alan.Bateman at oracle.com
Fri Dec 28 23:41:12 PST 2012
Steven,
You should bring your patch to core-libs-dev to discuss.
-Alan
On 28/12/2012 23:34, Steven Schlansker wrote:
> Hello JDK developers,
>
> My company uses UUIDs throughout our software. We recently identified that the java.util.UUID implementations of fromString and toString are horridly inefficient.
>
> An incomplete list of the inefficiencies:
>
> * fromString uses a regular expression that is not even precompiled
> * fromString uses a regular expression to parse out the "-" markers, requiring the allocation of many String and String[] objects
> * fromString unnecessarily allocates multiple Long objects
>
> * toString creates a char[64], a String object which copies that, and a sub-String object for *each* of the 5 hex digit segments
> * toString produces a fixed-length result but involves multiple StringBuilder concatenations
>
> Everyone that I have shown the relevant code to has been horrified by the complete lack of care towards object allocations. While I am a big fan of the "object allocation is free until otherwise proven" philosophy, we traced multiple production problems down to these hotspots.
>
> But instead of just whining about it, I wish to contribute a patch which I believe fixes the problem. This is my first attempt to submit anything to the JDK, so please be nice :-)
>
> My initial attempt has yielded micro benchmarks with very favorable outcomes. By taking the appropriate code from java.lang.Long, modifying it to work on a single pre-allocated array+offset rather than returning a String, and scanning for dashes instead of regex splitting I reduced times 3-6x and object allocations to the low single digits.
>
> My Google Caliper benchmark is available here, to ensure that I have not committed any benchmark sins:
> https://gist.github.com/4356621
>
> benchmark instances Bytes ns linear runtime
> JdkUuidFromString 51.00 1544.0 608.2 ==============================
> NewUuidFromString 2.00 72.0 179.1 ========
>
> JdkUuidToString 31.00 1720.0 321.4 ===============
> NewUuidToString 3.00 200.0 51.5 ==
>
> In particular, the reduction of object allocations from ~1.5KB to 72/200 bytes dramatically reduces GC pressure when you sit in a tight loop deserializing millions of UUIDs from strings.
>
> I believe the same patch can (and should?) apply to jdk7u and jdk8, as the code does not seem to have changed.
>
> I would appreciate any feedback on the code style, efficiency, or correctness that you can offer. I have run the "java/util/UUID" jtreg suite against this and it passes. We stole the more thorough Apache Harmony tests for this and they pass as well: https://github.com/stevenschlansker/components-ness-core/blob/master/src/test/java/com/nesscomputing/uuid/TestNessUUID.java
>
> I have not yet secured a CLA, but my company is in the process of signing one.
>
> The rest of the message is a "hg export" of my change set.
> Happy holidays, and thank you for your time!
> Steven Schlansker
>
>
>
>
> # HG changeset patch
> # User Steven Schlansker<steven at nesscomputing.com>
> # Date 1356737383 0
> # Node ID a812c963b96f08112f6805cbc380b12af196f788
> # Parent 9b8c96f96a0f9a5801b55530a387fefbe50482a3
> java.util.UUID: improve performance of UUID#toString and UUID#fromString by avoiding many unnecessary object allocations.
>
> benchmark instances Bytes ns linear runtime
> JdkUuidFromString 51.00 1544.0 608.2 ==============================
> NewUuidFromString 2.00 72.0 179.1 ========
>
> JdkUuidToString 31.00 1720.0 321.4 ===============
> NewUuidToString 3.00 200.0 51.5 ==
>
> diff -r 9b8c96f96a0f -r a812c963b96f src/share/classes/java/util/UUID.java
> --- a/src/share/classes/java/util/UUID.java Mon Jun 27 13:21:34 2011 -0700
> +++ b/src/share/classes/java/util/UUID.java Fri Dec 28 23:29:43 2012 +0000
> @@ -189,26 +189,79 @@
> * described in {@link #toString}
> *
> */
> - public static UUID fromString(String name) {
> - String[] components = name.split("-");
> - if (components.length != 5)
> - throw new IllegalArgumentException("Invalid UUID string: "+name);
> - for (int i=0; i<5; i++)
> - components[i] = "0x"+components[i];
> + public static UUID fromString(String str) {
> + int dashCount = 4;
> + int [] dashPos = new int [6];
> + dashPos[0] = -1;
> + dashPos[5] = str.length();
>
> - long mostSigBits = Long.decode(components[0]).longValue();
> + for (int i = str.length()-1; i>= 0; i--) {
> + if (str.charAt(i) == '-') {
> + if (dashCount == 0) {
> + throw new IllegalArgumentException("Invalid UUID string: " + str);
> + }
> + dashPos[dashCount--] = i;
> + }
> + }
> +
> + if (dashCount> 0) {
> + throw new IllegalArgumentException("Invalid UUID string: " + str);
> + }
> +
> + long mostSigBits = decode(str, dashPos, 0)& 0xffffffffL;
> mostSigBits<<= 16;
> - mostSigBits |= Long.decode(components[1]).longValue();
> + mostSigBits |= decode(str, dashPos, 1)& 0xffffL;
> mostSigBits<<= 16;
> - mostSigBits |= Long.decode(components[2]).longValue();
> + mostSigBits |= decode(str, dashPos, 2)& 0xffff);
>
> - long leastSigBits = Long.decode(components[3]).longValue();
> + long leastSigBits = decode(str, dashPos, 3)& 0xffffL;
> leastSigBits<<= 48;
> - leastSigBits |= Long.decode(components[4]).longValue();
> + leastSigBits |= decode(str, dashPos, 4)& 0xffffffffffffL;
>
> return new UUID(mostSigBits, leastSigBits);
> }
>
> + private static long decode(final String str, final int [] dashPos, final int field) {
> + int start = dashPos[field]+1;
> + int end = dashPos[field+1];
> + if (start>= end) {
> + throw new IllegalArgumentException("Invalid UUID string: " + str);
> + }
> + long curr = 0;
> + for (int i = start; i< end; i++) {
> + int x = getNibbleFromChar(str.charAt(i));
> + curr<<= 4;
> + if (curr< 0) {
> + throw new NumberFormatException("long overflow");
> + }
> + curr |= x;
> + }
> + return curr;
> + }
> +
> + private static final int NUM_ALPHA_DIFF = 'A' - '9' - 1;
> + private static final int LOWER_UPPER_DIFF = 'a' - 'A';
> +
> + private static int getNibbleFromChar(final char c)
> + {
> + int x = c - '0';
> + if (x> 9) {
> + x -= NUM_ALPHA_DIFF; // difference between '9' and 'A'
> + if (x> 15) {
> + x -= LOWER_UPPER_DIFF; // difference between 'a' and 'A'
> + }
> + if (x< 10) {
> + throw new IllegalArgumentException(c + " is not a valid character for an UUID string");
> + }
> + }
> +
> + if (x< 0 || x> 15) {
> + throw new IllegalArgumentException(c + " is not a valid character for an UUID string");
> + }
> +
> + return x;
> + }
> +
> // Field Accessor Methods
>
> /**
> @@ -372,19 +425,46 @@
> * @return A string representation of this {@code UUID}
> */
> public String toString() {
> - return (digits(mostSigBits>> 32, 8) + "-" +
> - digits(mostSigBits>> 16, 4) + "-" +
> - digits(mostSigBits, 4) + "-" +
> - digits(leastSigBits>> 48, 4) + "-" +
> - digits(leastSigBits, 12));
> + return toString(getMostSignificantBits(), getLeastSignificantBits());
> }
>
> - /** Returns val represented by the specified number of hex digits. */
> - private static String digits(long val, int digits) {
> - long hi = 1L<< (digits * 4);
> - return Long.toHexString(hi | (val& (hi - 1))).substring(1);
> + public static String toString(long msb, long lsb) {
> + char[] uuidChars = new char[36];
> +
> + hexDigits(uuidChars, 0, 8, msb>> 32);
> + uuidChars[8] = '-';
> + hexDigits(uuidChars, 9, 4, msb>> 16);
> + uuidChars[13] = '-';
> + hexDigits(uuidChars, 14, 4, msb);
> + uuidChars[18] = '-';
> + hexDigits(uuidChars, 19, 4, lsb>> 48);
> + uuidChars[23] = '-';
> + hexDigits(uuidChars, 24, 12, lsb);
> +
> + return new String(uuidChars);
> }
>
> + private static void hexDigits(char[] dest, int offset, int digits, long val) {
> + long hi = 1L<< (digits * 4);
> + toUnsignedString(dest, offset, digits, hi | (val& (hi - 1)), 4);
> + }
> +
> + private final static char[] HEX_DIGITS = {
> + '0' , '1' , '2' , '3' , '4' , '5' ,
> + '6' , '7' , '8' , '9' , 'a' , 'b' ,
> + 'c' , 'd' , 'e' , 'f'
> + };
> +
> + private static void toUnsignedString(char[] dest, int offset, int len, long i, int shift) {
> + int charPos = len;
> + int radix = 1<< shift;
> + long mask = radix - 1;
> + do {
> + dest[offset + --charPos] = HEX_DIGITS[(int)(i& mask)];
> + i>>>= shift;
> + } while (i != 0&& charPos> 0);
> + }
> +
> /**
> * Returns a hash code for this {@code UUID}.
> *
More information about the jdk7u-dev
mailing list