RFR: 8316704: Regex-free parsing of Formatter and FormatProcessor specifiers [v13]

Raffaello Giulietti rgiulietti at openjdk.org
Thu Sep 28 10:03:36 UTC 2023


On Wed, 27 Sep 2023 21:26:46 GMT, 温绍锦 <duke at openjdk.org> wrote:

>> @cl4es made performance optimizations for the simple specifiers of String.format in PR https://github.com/openjdk/jdk/pull/2830. Based on the same idea, I continued to make improvements. I made patterns like %2d %02d also be optimized.
>> 
>> The following are the test results based on MacBookPro M1 Pro: 
>> 
>> 
>> -Benchmark                          Mode  Cnt     Score     Error  Units
>> -StringFormat.complexFormat         avgt   15  1862.233 ? 217.479  ns/op
>> -StringFormat.int02Format           avgt   15   312.491 ?  26.021  ns/op
>> -StringFormat.intFormat             avgt   15    84.432 ?   4.145  ns/op
>> -StringFormat.longFormat            avgt   15    87.330 ?   6.111  ns/op
>> -StringFormat.stringFormat          avgt   15    63.985 ?  11.366  ns/op
>> -StringFormat.stringIntFormat       avgt   15    87.422 ?   0.147  ns/op
>> -StringFormat.widthStringFormat     avgt   15   250.740 ?  32.639  ns/op
>> -StringFormat.widthStringIntFormat  avgt   15   312.474 ?  16.309  ns/op
>> 
>> +Benchmark                          Mode  Cnt    Score    Error  Units
>> +StringFormat.complexFormat         avgt   15  740.626 ? 66.671  ns/op (+151.45)
>> +StringFormat.int02Format           avgt   15  131.049 ?  0.432  ns/op (+138.46)
>> +StringFormat.intFormat             avgt   15   67.229 ?  4.155  ns/op (+25.59)
>> +StringFormat.longFormat            avgt   15   66.444 ?  0.614  ns/op (+31.44)
>> +StringFormat.stringFormat          avgt   15   62.619 ?  4.652  ns/op (+2.19)
>> +StringFormat.stringIntFormat       avgt   15   89.606 ? 13.966  ns/op (-2.44)
>> +StringFormat.widthStringFormat     avgt   15   52.462 ? 15.649  ns/op (+377.95)
>> +StringFormat.widthStringIntFormat  avgt   15  101.814 ?  3.147  ns/op (+206.91)
>
> 温绍锦 has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Refactor according to rgiulietti's suggestion and add testcases

For maintainability purposes, we should strive for simplicity and readability.

src/java.base/share/classes/java/util/Formatter.java line 2899:

> 2897:             if (c == '.') {
> 2898:                 // (\.\d+)?
> 2899:                 precisionSize = parsePrecision();

`precisionSize == 0` has two different meanings: (1) there's a '.' not followed by digits and (2) there's no precision field at all.

In the proposed [alternative](https://github.com/openjdk/jdk/pull/15776#issuecomment-1737986458), case (1) is indicated by `-1`, while case (2) is indicated by `0`. Besides, the decision is taken in the `parsePrecision()` method, not partly here and partly in the method.

src/java.base/share/classes/java/util/Formatter.java line 2934:

> 2932:             // (\d+\$)?
> 2933:             int i = off;
> 2934:             for (; i < max && isDigit(c); c = next(++i));  // empty body

No need for `next()` if done as in the proposed alternative.

src/java.base/share/classes/java/util/Formatter.java line 2947:

> 2945:                     c = first;
> 2946:                 }
> 2947:             }

I think the logic in the proposed alternative is more readable.

src/java.base/share/classes/java/util/Formatter.java line 2953:

> 2951:             // ([-#+ 0,(\<]*)?
> 2952:             int i = off;
> 2953:             for (; i < max && Flags.isFlag(c); c = next(++i));  // empty body

No need for `next()` if done as in the proposed alternative.

src/java.base/share/classes/java/util/Formatter.java line 2961:

> 2959:             // (\d+)?
> 2960:             int i = off;
> 2961:             for (; i < max && isDigit(c); c = next(++i));  // empty body

You are using `next()` here, but the simpler solution in `parsePrecision()`.
I think consistency simplifies understanding.

src/java.base/share/classes/java/util/Formatter.java line 2978:

> 2976:         }
> 2977: 
> 2978:         private char next(int off) {

There's no real need for this more expensive method, compared with a simple `s.charAt(i)`, in the alternative proposed [before](https://github.com/openjdk/jdk/pull/15776#issuecomment-1737986458).

-------------

PR Review: https://git.openjdk.org/jdk/pull/15776#pullrequestreview-1647449239
PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339852931
PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339856693
PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339856725
PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339857403
PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339844252
PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339842414


More information about the core-libs-dev mailing list