RFR: 8316704: Regex-free parsing of Formatter and FormatProcessor specifiers
Shaojin Wen
duke at openjdk.org
Mon Oct 16 16:18:21 UTC 2023
On Mon, 25 Sep 2023 12:28:06 GMT, Claes Redestad <redestad at openjdk.org> wrote:
>> @cl4es made performance optimizations for the simple specifiers of String.format in PR https://github.com/openjdk/jdk/pull/2830. Building on the same idea, I extended the optimization to patterns such as %2d and %02d.
>>
>> Benchmark results on a MacBook Pro (M1 Pro):
>>
>>
>> -Benchmark Mode Cnt Score Error Units
>> -StringFormat.complexFormat avgt 15 1862.233 ± 217.479 ns/op
>> -StringFormat.int02Format avgt 15 312.491 ± 26.021 ns/op
>> -StringFormat.intFormat avgt 15 84.432 ± 4.145 ns/op
>> -StringFormat.longFormat avgt 15 87.330 ± 6.111 ns/op
>> -StringFormat.stringFormat avgt 15 63.985 ± 11.366 ns/op
>> -StringFormat.stringIntFormat avgt 15 87.422 ± 0.147 ns/op
>> -StringFormat.widthStringFormat avgt 15 250.740 ± 32.639 ns/op
>> -StringFormat.widthStringIntFormat avgt 15 312.474 ± 16.309 ns/op
>>
>> +Benchmark Mode Cnt Score Error Units
>> +StringFormat.complexFormat avgt 15 740.626 ± 66.671 ns/op (+151.45)
>> +StringFormat.int02Format avgt 15 131.049 ± 0.432 ns/op (+138.46)
>> +StringFormat.intFormat avgt 15 67.229 ± 4.155 ns/op (+25.59)
>> +StringFormat.longFormat avgt 15 66.444 ± 0.614 ns/op (+31.44)
>> +StringFormat.stringFormat avgt 15 62.619 ± 4.652 ns/op (+2.19)
>> +StringFormat.stringIntFormat avgt 15 89.606 ± 13.966 ns/op (-2.44)
>> +StringFormat.widthStringFormat avgt 15 52.462 ± 15.649 ns/op (+377.95)
>> +StringFormat.widthStringIntFormat avgt 15 101.814 ± 3.147 ns/op (+206.91)
>
> src/java.base/share/classes/java/util/Formatter.java line 2949:
>
>> 2947: }
>> 2948: } else {
>> 2949: if (first == '0') {
>
> While it's clever to avoid re-parsing, I think it muddies the control flow. It would be simpler if we always reset to `off = start; c = first` in this `else` block and then unconditionally call `parseFlags(); parseWidth();` outside in `parse`. The few extra calls to `s.charAt(..)` might add a little overhead on some tests, but the JIT might like the brevity and the less branchy structure, both overall and on larger benchmarks. Maybe worth experimenting with.

Good idea. In addition, I plan to simplify the for statements by folding the increment into the charAt call, for example:

    for (int size = 0; off < max; ++off, c = s.charAt(off), size++) {
==>
    for (int size = 0; off < max; c = s.charAt(++off), size++) {

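As a rough check that the two loop headers behave the same, here is a minimal sketch that is not taken from the PR. The class LoopForms, the methods scanOriginal and scanCompact, and the digit test in the loop body are hypothetical stand-ins. The sketch also assumes that max is the index of the last character, so that charAt in the update clause stays in bounds.

```java
import java.util.Arrays;

// Hypothetical stand-in for a width-scanning loop; not the Formatter code.
final class LoopForms {
    // Original form: increment, then read the character at the new offset.
    static int[] scanOriginal(String s, int start, int max) {
        int off = start;
        char c = s.charAt(off);
        int size = 0;
        for (; off < max; ++off, c = s.charAt(off), size++) {
            if (c < '0' || c > '9') {
                break; // stop at the first non-digit
            }
        }
        return new int[] {off, size};
    }

    // Compact form: the pre-increment is folded into the charAt call.
    static int[] scanCompact(String s, int start, int max) {
        int off = start;
        char c = s.charAt(off);
        int size = 0;
        for (; off < max; c = s.charAt(++off), size++) {
            if (c < '0' || c > '9') {
                break; // stop at the first non-digit
            }
        }
        return new int[] {off, size};
    }

    public static void main(String[] args) {
        String s = "%12345d";
        int max = s.length() - 1; // assumption: a conversion character follows
        System.out.println(Arrays.toString(scanOriginal(s, 1, max))); // [6, 5]
        System.out.println(Arrays.toString(scanCompact(s, 1, max)));  // [6, 5]
    }
}
```

Both forms evaluate ++off before s.charAt, so the final offset and the digit count are the same; the change only shortens the header.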
> src/java.base/share/classes/java/util/Formatter.java line 3420:
>
>> 3418: && fmt.a instanceof StringBuilder sb
>> 3419: ) {
>> 3420: sb.append(value);
>
> There's a lot of `if`s here, and this doesn't take into account locales with non-ASCII digits:
>
> Locale ar = new Locale.Builder().setLanguageTag("ar-SA-u-nu-arab").build();
> Locale.setDefault(ar);
> System.out.println("%d".formatted(10000)); // should print "١٠٠٠٠" but prints "10000"
The print fast-path change has been removed, and the parse fast-path now also supports patterns such as "%8.3f".
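To make the locale concern concrete, the following sketch contrasts locale-aware formatting with a plain StringBuilder.append. The class LocaleDigits and its two helper methods are hypothetical names used only for illustration; the locale tag is the one from the review comment.

```java
import java.util.Locale;

// Hypothetical illustration; not code from the PR.
final class LocaleDigits {
    // Locale-aware path: Formatter localizes the digits for the given locale.
    static String viaFormatter(Locale locale, int value) {
        return String.format(locale, "%d", value);
    }

    // Shortcut path: StringBuilder.append(int) always emits ASCII digits.
    static String viaAppend(int value) {
        return new StringBuilder().append(value).toString();
    }

    public static void main(String[] args) {
        Locale ar = Locale.forLanguageTag("ar-SA-u-nu-arab");
        // Per the review comment, this is expected to use Arabic-Indic digits.
        System.out.println(viaFormatter(ar, 10000));
        // The shortcut ignores the locale entirely.
        System.out.println(viaAppend(10000));
    }
}
```

A print fast path built on append would therefore need an extra check that the locale's digits are ASCII, which adds back some of the branching it was meant to avoid.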
Where should the line be drawn for the parse fast-path? I have seen these patterns cause performance problems in practice, and they are easy to support, so I added them.
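For readers who want a picture of what hand-parsing a specifier such as "%8.3f" involves, here is a simplified sketch. It is not the Formatter implementation: the class SpecParser, the record Spec, and the method parse are hypothetical, and the sketch deliberately ignores argument indexes ("1$"), the "<" flag, date/time conversions, and integer overflow in width or precision.

```java
// Hypothetical, simplified hand-written specifier parser; not the JDK code.
final class SpecParser {
    // width and precision are -1 when absent; end is the index after the specifier.
    record Spec(String flags, int width, int precision, char conversion, int end) {}

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Parses %[flags][width][.precision]conversion starting at s.charAt(start) == '%'.
    static Spec parse(String s, int start) {
        if (s.charAt(start) != '%') {
            throw new IllegalArgumentException("specifier must start with '%'");
        }
        int off = start + 1;
        int max = s.length();

        // Flags: any run of the standard flag characters.
        int flagStart = off;
        while (off < max && "-#+ 0,(".indexOf(s.charAt(off)) >= 0) {
            off++;
        }
        String flags = s.substring(flagStart, off);

        // Width: optional run of decimal digits.
        int width = -1;
        if (off < max && isDigit(s.charAt(off))) {
            width = 0;
            while (off < max && isDigit(s.charAt(off))) {
                width = width * 10 + (s.charAt(off++) - '0');
            }
        }

        // Precision: optional '.' followed by at least one digit.
        int precision = -1;
        if (off < max && s.charAt(off) == '.') {
            off++;
            if (off >= max || !isDigit(s.charAt(off))) {
                throw new IllegalArgumentException("missing precision digits");
            }
            precision = 0;
            while (off < max && isDigit(s.charAt(off))) {
                precision = precision * 10 + (s.charAt(off++) - '0');
            }
        }

        // Conversion: a single letter.
        if (off >= max || !Character.isLetter(s.charAt(off))) {
            throw new IllegalArgumentException("missing conversion character");
        }
        return new Spec(flags, width, precision, s.charAt(off), off + 1);
    }

    public static void main(String[] args) {
        System.out.println(parse("%8.3f", 0));
        System.out.println(parse("%02d", 0));
    }
}
```

Each additional pattern the fast path accepts adds a branch like the ones above, which is the trade-off behind the question of where to stop.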
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1335909991
PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1328178751