[PATCH] SignatureParser performance problems due to excessive StringBuilder usage
Claes Redestad
claes.redestad at oracle.com
Tue Nov 29 12:08:02 UTC 2016
Hi and welcome!
I've filed https://bugs.openjdk.java.net/browse/JDK-8170467 to track
this.
Introducing a parseIdentifierInto to avoid repeated creation of
StringBuilder/String pairs seems like a no-brainer (my guess is
this is the largest allocation reduction here?), and although I have a
bit of a knee-jerk reaction against the magic number for the
pre-sized builder it doesn't seem like an outrageous default for
package.Class specifiers in the current ecosystem.
Noting that there's no nested StringBuilder use in this code "pooling"
them would be trivial (a single ThreadLocal<StringBuilder> which we
setLength(0) after toString() would do), which could reduce allocations
further at the cost of a very slight retained footprint increase and
some loss of cache locality. It'd be interesting to see numbers on this,
but I won't insist. :-)
Nit: the use of break-to-label in a non-nested while-loop seems
awkward to me.
/Claes
On 2016-11-29 11:48, Max Kanat-Alexander wrote:
> Currently, any heavy use of sun.reflect.generics.parser.SignatureParser
> leads to many calls to Arrays.copyOf, due to using StringBuilder inside of
> SignatureParser to build up a string a character at a time. This showed up
> as a top CPU-usage culprit in profiling of a large number of tests,
> particularly those which use reflection-heavy dependency injection systems
> such as Guice.
>
> To determine the size to pre-size the StringBuilder in this patch, I made
> AbstractStringBuilder.expandCapacity print out the original capacity and
> new capacity every time it did a resize. I measured the outputs that
> happened during a heavy real-world user of SignatureParser, and took
> particular measurements at different StringBuilder sizes:
>
> 16: Reallocs: 907310 Total Alloc Counting Only Realloc: 84324048
> Estimated Total Alloc: 84310608 Copy Bytes: 27249937 Reallocs from initial
> size: 517883
> 32: Reallocs: 497220 Total Alloc Counting Only Realloc: 68184799
> Estimated Total Alloc: 75412799 Copy Bytes: 22138274 Reallocs from initial
> size: 291168
> 40: Reallocs: 331045 Total Alloc Counting Only Realloc: 54170531
> Estimated Total Alloc: 69096811 Copy Bytes: 17560068 Reallocs from initial
> size: 143886
> 48: Reallocs: 222232 Total Alloc Counting Only Realloc: 42141244
> Estimated Total Alloc: 65213692 Copy Bytes: 13637722 Reallocs from initial
> size: 36367
> 56: Reallocs: 196595 Total Alloc Counting Only Realloc: 39246690
> Estimated Total Alloc: 67601618 Copy Bytes: 12698496 Reallocs from initial
> size: 10705
> 64: Reallocs: 185941 Total Alloc Counting Only Realloc: 38031623
> Estimated Total Alloc: 71112263 Copy Bytes: 12304703 Reallocs from initial
> size: 158
>
> As you can see, 48 gives us the lowest total memory allocation as well as
> significantly reducing the number of times we reallocate arrays. With
> higher values it becomes significantly non-linear in improvement and we
> start to allocate more memory overall, so 48 seemed like the right number
> to pick.
>
> There is no bug which exactly matches this problem.
> https://bugs.openjdk.java.net/browse/JDK-8035424 is close, but describes a
> different root cause for performance issues--one which I did not actually
> see as occupying significant CPU time in profiles.
>
> This is my first OpenJDK contribution, so I can't file a bug myself.
>
> There is no test included as this is entirely a performance problem. This
> patch should be entirely behavior-preserving.
>
> # HG changeset patch
> # User mkanat
> # Date 1480414839 28800
> # Tue Nov 29 02:20:39 2016 -0800
> # Node ID 514fca7d9aa19b074eb32ee8ac700c796b4d9444
> # Parent 7901a13a051ce1630477ef9f8a17c5af6c515e69
> Optimize SignatureParser's use of StringBuilder.
>
> diff --git a/src/java.base/share/classes/sun/reflect/generics/parser/SignatureParser.java
> b/src/java.base/share/classes/sun/reflect/generics/parser/SignatureParser.java
> --- a/src/java.base/share/classes/sun/reflect/generics/parser/SignatureParser.java
> +++ b/src/java.base/share/classes/sun/reflect/generics/parser/SignatureParser.java
> @@ -70,6 +70,11 @@
> private static final char EOI = ':';
> private static final boolean DEBUG = false;
>
> + // StringBuilder does a lot of array copies if we don't pre-size
> + // it when parsing a full class name. This value was determined
> + // empirically by measurements of real-world code.
> + public static final int CLASS_NAME_SB_SIZE = 48;
> +
> // private constructor - enforces use of static factory
> private SignatureParser(){}
>
> @@ -251,9 +256,19 @@
> return FormalTypeParameter.make(id, bs);
> }
>
> - private String parseIdentifier(){
> + private String parseIdentifier() {
> StringBuilder result = new StringBuilder();
> - while (!Character.isWhitespace(current())) {
> + parseIdentifierInto(result);
> + return result.toString();
> + }
> +
> + // This is separate from parseIdentifier for performance reasons.
> + // For a caller who already has a StringBuilder, it's much faster to
> + // re-use the existing builder, because it results in far fewer internal
> + // array copies inside of StringBuilder.
> + private void parseIdentifierInto(StringBuilder result) {
> + int startIndex = index;
> + parseLoop: while (!Character.isWhitespace(current())) {
> char c = current();
> switch(c) {
> case ';':
> @@ -263,16 +278,14 @@
> case ':':
> case '>':
> case '<':
> - return result.toString();
> - default:{
> - result.append(c);
> + break parseLoop;
> + default:
> advance();
> }
> + }
> + result.append(input, startIndex, index - startIndex);
> + }
>
> - }
> - }
> - return result.toString();
> - }
> /**
> * FieldTypeSignature:
> * ClassTypeSignature
> @@ -325,18 +338,16 @@
> // Parse both any optional leading PackageSpecifier as well as
> // the following SimpleClassTypeSignature.
>
> - String id = parseIdentifier();
> + StringBuilder idBuild = new StringBuilder(CLASS_NAME_SB_SIZE);
> + parseIdentifierInto(idBuild);
>
> - if (current() == '/') { // package name
> - StringBuilder idBuild = new StringBuilder(id);
> + while (current() == '/') { // package name
> + advance();
> + idBuild.append('.');
> + parseIdentifierInto(idBuild);
> + }
>
> - while(current() == '/') {
> - advance();
> - idBuild.append(".");
> - idBuild.append(parseIdentifier());
> - }
> - id = idBuild.toString();
> - }
> + String id = idBuild.toString();
>
> switch (current()) {
> case ';':
More information about the core-libs-dev
mailing list