RFR: 8335366: Improve String.format performance with fastpath [v7]
Shaojin Wen
duke at openjdk.org
Sun Jun 30 10:07:18 UTC 2024
On Sun, 30 Jun 2024 04:54:35 GMT, Shaojin Wen <duke at openjdk.org> wrote:
>> We need a String format solution with good performance. String Template was once expected, but it has been removed. j.u.Formatter is powerful, but its performance is not good enough.
>>
>> This PR implements a subset of j.u.Formatter capabilities. It is fast enough to serve as a fastpath for commonly used functions. When a format string goes beyond the supported subset, it falls back to j.u.Formatter.
>>
>> The fastpath has a low detection cost, so there is no noticeable performance degradation when falling back to j.u.Formatter via the fastpath.
>>
>> Below is a comparison of String.format with string concatenation and StringBuilder:
>>
>> * benchmark java code
>>
>> public class StringFormat {
>>     @Benchmark
>>     public String stringIntFormat() {
>>         return "%s %d".formatted(s, i);
>>     }
>>
>>     @Benchmark
>>     public String stringIntConcat() {
>>         return s + " " + i;
>>     }
>>
>>     @Benchmark
>>     public String stringIntStringBuilder() {
>>         return new StringBuilder(s).append(" ").append(i).toString();
>>     }
>> }
>>
>>
>> * benchmark number on macbook m1 pro
>>
>> Benchmark                            Mode  Cnt   Score   Error  Units
>> StringFormat.stringIntConcat         avgt   15   6.541 ± 0.056  ns/op
>> StringFormat.stringIntFormat         avgt   15  17.399 ± 0.133  ns/op
>> StringFormat.stringIntStringBuilder  avgt   15   8.004 ± 0.063  ns/op
>>
>>
>> From the above data, we can see that the fastpath narrows the performance gap between String.format and StringBuilder from about 10x to 2-3x.
>>
>> The implementation of fastpath supports the following four specifiers, which can appear at most twice and support a width of 1 to 9.
>>
>> d
>> x
>> X
>> s
>>
>> If necessary, we can add a few more.
>>
>>
>> Below is a comparison of performance numbers running on a MacBook M1, showing a significant performance improvement.
>>
>> -Benchmark                          Mode  Cnt    Score    Error  Units (baseline)
>> -StringFormat.complexFormat         avgt   15  895.954 ± 52.541  ns/op
>> -StringFormat.decimalFormat         avgt   15  277.420 ± 18.254  ns/op
>> -StringFormat.stringFormat          avgt   15   66.787 ±  2.715  ns/op
>> -StringFormat.stringIntFormat       avgt   15   81.046 ±  1.879  ns/op
>> -StringFormat.widthStringFormat     avgt   15   38.897 ±  0.114  ns/op
>> -StringFormat.widthStringIntFormat  avgt   15  109.841 ±  1.028  ns/op
>>
>> +Benchmark ...
>
> Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision:
>
> lazy init for `decimal fast path locale`
In most cases when calling String.format, the format parameter is a constant. Is there any chance that C2 can optimize this?

void my_func() {
    String str;
    int d;
    // ..
    String info = "info %s %d".formatted(str, d);
}

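To make the question concrete: with a constant format string, the best possible outcome would be code equivalent to plain concatenation. The following is only a sketch of that equivalence, not code from the PR. The class and method names are made up for illustration, and Locale.ROOT is my assumption to keep the %d output independent of the default locale.

```java
import java.util.Locale;

// Hypothetical sketch; the names here are illustrative and not part of the PR.
class ConstantFormatSketch {
    // What the call site does today: the format string is parsed at run time.
    static String viaFormat(String str, int d) {
        return String.format(Locale.ROOT, "info %s %d", str, d);
    }

    // What a fully folded call site would amount to: plain concatenation.
    static String folded(String str, int d) {
        return "info " + str + " " + d;
    }

    public static void main(String[] args) {
        // Both paths are expected to produce the same string.
        System.out.println(viaFormat("abc", 42).equals(folded("abc", 42)));
    }
}
```
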
final class StringFormat {
    @ForceInline
    static String format(String format, Object... args) {
        if (args != null) { // 1
            int off = format.indexOf('%'); // 2
            if (off == -1) { // 3
                return format;
            }
            int len = format.length(); // 4
            if (off + 1 != len) { // 5
                int off1 = format.indexOf('%', off + 2); // 6
                String s = null;
                if (args.length == 1) { // 7
                    if (off1 == -1) { // 8
                        s = format1(format, off, args[0]);
                    }
                } else if (args.length == 2) { // 9
                    if (off1 != -1 && off1 + 1 != len) { // 10
                        s = format2(format, off, off1, args[0], args[1]);
                    }
                }
                if (s != null) {
                    return s;
                }
            }
        }
        return new Formatter().format(format, args).toString();
    }

    private static String format1(String format, int off, Object arg) {
        int len = format.length(); // 11
        char conv = format.charAt(off + 1); // 12
        int width = 0;
        if (conv >= '1' && conv <= '9') { // 13
            width = conv - '0'; // 14
            if (off + 2 < len) { // 15
                conv = format.charAt(off + 2); // 16
            }
        }
        if (conv == STRING) { // 17
            if (isLong(arg)) {
                conv = DECIMAL_INTEGER;
            } else {
                arg = String.valueOf(arg);
            }
        }
        int size = stringSize(conv, arg);
        if (size == -1) {
            return null;
        }
        return format1(format, off, conv, arg, width, size);
    }
}

There are 18 places in the code that C2 could optimize, which would further improve performance.
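
As a small illustration of why a constant format string matters here, the scans that pick the fastpath branch depend only on the format string, never on the arguments. The sketch below is not from the PR, and its class and method names are hypothetical. It repeats those scans for the format "info %s %d", which gives off = 5, len = 10 and off1 = 8.

```java
// Hypothetical sketch; the class and method names are illustrative only.
class ConstantFormatOffsets {
    static final String FORMAT = "info %s %d";

    // Position of the first '%'; depends only on the format string.
    static int firstSpecifier() {
        return FORMAT.indexOf('%');
    }

    // Position of the second '%', searched from two characters past the first.
    static int secondSpecifier() {
        return FORMAT.indexOf('%', firstSpecifier() + 2);
    }

    public static void main(String[] args) {
        System.out.println("off  = " + firstSpecifier());
        System.out.println("len  = " + FORMAT.length());
        System.out.println("off1 = " + secondSpecifier());
    }
}
```

With two arguments, these values satisfy off1 != -1 && off1 + 1 != len, so the two-argument branch would be chosen regardless of what the arguments are.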
-------------
PR Comment: https://git.openjdk.org/jdk/pull/19956#issuecomment-2198505435