RFR: 8316704: Regex-free parsing of Formatter and FormatProcessor specifiers

Shaojin Wen duke at openjdk.org
Mon Oct 16 16:18:17 UTC 2023


On Sun, 17 Sep 2023 16:01:33 GMT, Shaojin Wen <duke at openjdk.org> wrote:

> @cl4es made performance optimizations for the simple specifiers of String.format in PR https://github.com/openjdk/jdk/pull/2830. Based on the same idea, I continued to make improvements. I made patterns like %2d %02d also be optimized.
> 
> The following are the test results based on MacBookPro M1 Pro: 
> 
> 
> -Benchmark                          Mode  Cnt     Score     Error  Units
> -StringFormat.complexFormat         avgt   15  1862.233 ? 217.479  ns/op
> -StringFormat.int02Format           avgt   15   312.491 ?  26.021  ns/op
> -StringFormat.intFormat             avgt   15    84.432 ?   4.145  ns/op
> -StringFormat.longFormat            avgt   15    87.330 ?   6.111  ns/op
> -StringFormat.stringFormat          avgt   15    63.985 ?  11.366  ns/op
> -StringFormat.stringIntFormat       avgt   15    87.422 ?   0.147  ns/op
> -StringFormat.widthStringFormat     avgt   15   250.740 ?  32.639  ns/op
> -StringFormat.widthStringIntFormat  avgt   15   312.474 ?  16.309  ns/op
> 
> +Benchmark                          Mode  Cnt    Score    Error  Units
> +StringFormat.complexFormat         avgt   15  740.626 ? 66.671  ns/op (+151.45)
> +StringFormat.int02Format           avgt   15  131.049 ?  0.432  ns/op (+138.46)
> +StringFormat.intFormat             avgt   15   67.229 ?  4.155  ns/op (+25.59)
> +StringFormat.longFormat            avgt   15   66.444 ?  0.614  ns/op (+31.44)
> +StringFormat.stringFormat          avgt   15   62.619 ?  4.652  ns/op (+2.19)
> +StringFormat.stringIntFormat       avgt   15   89.606 ? 13.966  ns/op (-2.44)
> +StringFormat.widthStringFormat     avgt   15   52.462 ? 15.649  ns/op (+377.95)
> +StringFormat.widthStringIntFormat  avgt   15  101.814 ?  3.147  ns/op (+206.91)

I enhanced parse fast-path to support more specifiers, including:

% flag_1 width_1
% flag_2
% width_2
% width_1 . precesion_1


now benchmark on macbook m1 pro result is:

-Benchmark                           Mode  Cnt     Score     Error  Units (optimized)
-StringFormat.complexFormat          avgt   15  2049.387 ? 121.539  ns/op
-StringFormat.flags2Format           avgt   15   430.964 ?   2.414  ns/op
-StringFormat.flagsFormat            avgt   15   257.851 ?  23.833  ns/op
-StringFormat.stringFormat           avgt   15    63.564 ?  10.490  ns/op
-StringFormat.stringIntFormat        avgt   15    88.111 ?   0.678  ns/op
-StringFormat.width2Format           avgt   15   349.304 ?  31.349  ns/op
-StringFormat.width2PrecisionFormat  avgt   15   464.621 ?  53.918  ns/op
-StringFormat.widthFormat            avgt   15   301.997 ?  34.974  ns/op
-StringFormat.widthPrecisionFormat   avgt   15   484.526 ?  38.098  ns/op
-StringFormat.widthStringFormat      avgt   15   235.421 ?  32.955  ns/op
-StringFormat.widthStringIntFormat   avgt   15   315.178 ?  15.154  ns/op

+Benchmark                           Mode  Cnt    Score    Error  Units
+StringFormat.complexFormat          avgt   15  702.407 ? 85.481  ns/op (+191.77)
+StringFormat.flags2Format           avgt   15  329.551 ?  1.610  ns/op (+30.78)
+StringFormat.flagsFormat            avgt   15  125.798 ?  1.109  ns/op (+104.98)
+StringFormat.stringFormat           avgt   15   60.029 ?  6.275  ns/op (+5.89)
+StringFormat.stringIntFormat        avgt   15   89.020 ?  0.575  ns/op (-1.03)
+StringFormat.width2Format           avgt   15  135.743 ?  0.643  ns/op (+157.33)
+StringFormat.width2PrecisionFormat  avgt   15  351.408 ? 21.031  ns/op (+32.22)
+StringFormat.widthFormat            avgt   15  208.843 ? 47.504  ns/op (+44.61)
+StringFormat.widthPrecisionFormat   avgt   15  354.375 ? 67.314  ns/op (+36.73)
+StringFormat.widthStringFormat      avgt   15   74.846 ? 19.604  ns/op (+214.55)
+StringFormat.widthStringIntFormat   avgt   15  101.638 ?  0.961  ns/op (+210.10)

> I was worried this would sprawl out more, but perhaps ~230 lines of code is a reasonable extra weight to make the long tail of `String.format`'s regex-free.
> 
> I was going to comment that the flag parsing was broken in [f303f29](https://github.com/openjdk/jdk/commit/f303f2959d108d993dc03e86a27ef42bb892647f) but it seems that it was fixed in the latest. I think we need to make a review pass over all existing tests to make sure all imaginable variants are covered.
> 
> The parser code also ought to be shared between `Formatter` and `FormatProcessor` so that there's a single source of truth going forward.

The codes of Formatter and FormatProcessor have been regex-free. There are many changes and require more detailed review.

I will delete redundant performance tests later. and I will delete redundant performance tests ,The current results are as follows :

# Performance Numbers

## 1. [aliyun_ecs_c8i.xlarge](https://help.aliyun.com/document_detail/25378.html#c8i)
* cpu : intel xeon sapphire rapids (x64)
* os debian linux


-Benchmark                           Mode  Cnt     Score    Error  Units (baseline)
-StringFormat.complexFormat          avgt   15  1426.696 ? 18.469  ns/op
-StringFormat.flags2Format           avgt   15   164.141 ?  2.264  ns/op
-StringFormat.flagsFormat            avgt   15   169.313 ?  6.616  ns/op
-StringFormat.stringFormat           avgt   15    34.710 ?  0.075  ns/op
-StringFormat.stringIntFormat        avgt   15    85.152 ?  0.337  ns/op
-StringFormat.width2Format           avgt   15   242.483 ?  5.586  ns/op
-StringFormat.width2PrecisionFormat  avgt   15   282.838 ?  2.564  ns/op
-StringFormat.widthFormat            avgt   15   175.460 ?  4.458  ns/op
-StringFormat.widthPrecisionFormat   avgt   15   244.593 ?  3.605  ns/op
-StringFormat.widthStringFormat      avgt   15   144.487 ?  5.271  ns/op
-StringFormat.widthStringIntFormat   avgt   15   223.913 ?  6.387  ns/op

+Benchmark                           Mode  Cnt    Score    Error  Units  (59c2983b)
+StringFormat.complexFormat          avgt   15  582.650 ? 17.399  ns/op (+144.87)
+StringFormat.flags2Format           avgt   15   74.214 ?  0.703  ns/op (+121.18)
+StringFormat.flagsFormat            avgt   15   67.764 ?  0.572  ns/op (+149.86)
+StringFormat.stringFormat           avgt   15   34.659 ?  0.201  ns/op (+0.15)
+StringFormat.stringIntFormat        avgt   15   84.448 ?  0.532  ns/op (+0.84)
+StringFormat.width2Format           avgt   15  123.012 ?  0.513  ns/op (+97.13)
+StringFormat.width2PrecisionFormat  avgt   15  148.092 ?  1.273  ns/op (+90.99)
+StringFormat.widthFormat            avgt   15   69.575 ?  1.023  ns/op (+152.19)
+StringFormat.widthPrecisionFormat   avgt   15  116.187 ?  0.938  ns/op (+110.52)
+StringFormat.widthStringFormat      avgt   15   48.389 ?  0.298  ns/op (+198.60)
+StringFormat.widthStringIntFormat   avgt   15  103.617 ?  2.204  ns/op (+116.10)



## 2. [aliyun_ecs_c8y.xlarge](https://help.aliyun.com/document_detail/25378.html#c8y)
* cpu : aliyun yitian 710 (aarch64)
* os debian linux


-Benchmark                           Mode  Cnt     Score    Error  Units (baseline)
-StringFormat.complexFormat          avgt   15  2321.319 ?  9.624  ns/op
-StringFormat.flags2Format           avgt   15   310.377 ? 10.367  ns/op
-StringFormat.flagsFormat            avgt   15   295.118 ?  8.645  ns/op
-StringFormat.stringFormat           avgt   15    55.966 ?  0.949  ns/op
-StringFormat.stringIntFormat        avgt   15   157.949 ?  2.972  ns/op
-StringFormat.width2Format           avgt   15   380.621 ? 11.301  ns/op
-StringFormat.width2PrecisionFormat  avgt   15   447.285 ?  7.323  ns/op
-StringFormat.widthFormat            avgt   15   312.622 ?  5.104  ns/op
-StringFormat.widthPrecisionFormat   avgt   15   407.196 ?  6.466  ns/op
-StringFormat.widthStringFormat      avgt   15   248.538 ?  2.356  ns/op
-StringFormat.widthStringIntFormat   avgt   15   416.661 ?  6.685  ns/op

+Benchmark                           Mode  Cnt    Score    Error  Units (59c2983b)
+StringFormat.complexFormat          avgt   15  930.922 ? 91.995  ns/op (+149.36)
+StringFormat.flags2Format           avgt   15  132.746 ? 10.809  ns/op (+133.82)
+StringFormat.flagsFormat            avgt   15  119.267 ? 11.709  ns/op (+147.45)
+StringFormat.stringFormat           avgt   15   55.820 ?  0.324  ns/op (+0.27)
+StringFormat.stringIntFormat        avgt   15  154.045 ?  7.327  ns/op (+2.54)
+StringFormat.width2Format           avgt   15  177.655 ?  4.797  ns/op (+114.25)
+StringFormat.width2PrecisionFormat  avgt   15  236.680 ?  4.266  ns/op (+88.99)
+StringFormat.widthFormat            avgt   15  132.043 ? 15.730  ns/op (+136.76)
+StringFormat.widthPrecisionFormat   avgt   15  204.085 ? 10.300  ns/op (+99.53)
+StringFormat.widthStringFormat      avgt   15  106.971 ?  5.527  ns/op (+132.35)
+StringFormat.widthStringIntFormat   avgt   15  215.329 ?  3.786  ns/op (+93.50)

> Please don't pile on new refactorings and improvements on a PR that has been opened for review. Better to let things brew as a draft for a bit if you're not sure you're done before opening the PR for review, then once it's been opened (like this one) consider preparing follow-up PR instead of refactoring as you go.
> 
> Specifically I'm not sure [0d977b2](https://github.com/openjdk/jdk/commit/0d977b2febe455f4535e6ee2cb19d3b168d764e3) is a good idea and would like you to roll those changes back. Object pooling for trivial, short-lived objects are considered an anti-pattern, as they add references to old GC generations and share many of the same drawbacks as lookup tables, such as increased cache traffic. Showing great wins on microbenchmarks while being a wash or even regressing real applications.

Sorry, I will pay attention to it in the future and modify it in the open review code. I have revert commit to #0d977b2. I agree with your view on the performance issues of old reference.

int parse(List<FormatString> al, char first, int start, String s, int max)

java.util.Formatter::parse (469 bytes)   failed to inline: callee is too large

The reason why I split it into multiple small methods is to avoid a single method codeSize > 325. After merging small methods, the performance will decrease.

I know, so I'm asking for your opinion, and if you don't agree, I won't submit big changes. There have been a lot of changes now, and I will continue to complete this PR based on the current version.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1723733247
PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1730164585
PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1731163579
PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1732567054
PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1733490226
PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1740015776


More information about the core-libs-dev mailing list