RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v3]
Claes Redestad
redestad at openjdk.org
Sun Apr 14 14:36:07 UTC 2024
On Sat, 13 Apr 2024 17:54:54 GMT, Brett Okken <duke at openjdk.org> wrote:
>> Possibly. I tried a few simple variants that initialized the `StringBuilder` capacity at various guesses, such as sum of constant sizes + some factor based on args, but across a diverse set of micros that gives both some wins and some regressions. Getting the estimation just right is pretty tricky, especially when size of the arguments are arbitrary (like for any String/Object arg).
>
> What are the scenarios which had regressions?
> Given the conservative growth for StringBuilder, it surprises me a bit that any scenario would regress.
I took a second look and it turns out that there were neither regressions nor improvements in my test, only a few false positives: For C2 the SB chain is recognized by the (fragile) C2 OptimizeStringConcat pass and transformed into a shape where the initial size in java bytecode - if any - is ignored.
With C1 having an initial size can have a significant effect. One way to provoke a regression there is to have a huge constant suffix while underestimating the size of the operands, which can lead to excessive allocation:
Name Cnt Base Error Test Error Unit Change
StringConcat.concat23StringConst 5 385,268 ± 5,238 341,251 ± 2,606 ns/op 1,13x (p = 0,000*)
:gc.alloc.rate 6039,095 ± 82,309 12764,169 ± 97,146 MB/sec 2,11x (p = 0,000*)
:gc.alloc.rate.norm 2440,003 ± 0,000 4568,002 ± 0,000 B/op 1,87x (p = 0,000*)
Still a bit better on throughput from less actual copying.
*If* the `StringBuilder` is sized sufficiently (constants + args * N) things look much better, of course:
-XX:TieredStopAtLevel=1
Name Cnt Base Error Test Error Unit Change
StringConcat.concat23StringConst 5 385,268 ± 5,238 259,628 ± 2,515 ns/op 1,48x (p = 0,000*)
:gc.alloc.rate 6039,095 ± 82,309 8902,803 ± 86,563 MB/sec 1,47x (p = 0,000*)
:gc.alloc.rate.norm 2440,003 ± 0,000 2424,002 ± 0,000 B/op 0,99x (p = 0,000*)
For most cases having a size based on the sum of the constants plus some small factor per argument seem to be a decent heuristic - for C1. Since this adds just one bytecode to the generated method I guess it wouldn't hurt.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1564753612
More information about the core-libs-dev
mailing list