RFR: 8316704: Regex-free parsing of Formatter and FormatProcessor specifiers
Shaojin Wen
duke at openjdk.org
Mon Oct 16 16:18:17 UTC 2023
On Sun, 17 Sep 2023 16:01:33 GMT, Shaojin Wen <duke at openjdk.org> wrote:
> @cl4es optimized the handling of simple `String.format` specifiers in PR https://github.com/openjdk/jdk/pull/2830. Building on the same idea, I extended the optimization to patterns such as `%2d` and `%02d`.
>
> The following are the benchmark results on a MacBook Pro (M1 Pro):
>
>
> -Benchmark Mode Cnt Score Error Units
> -StringFormat.complexFormat avgt 15 1862.233 ± 217.479 ns/op
> -StringFormat.int02Format avgt 15 312.491 ± 26.021 ns/op
> -StringFormat.intFormat avgt 15 84.432 ± 4.145 ns/op
> -StringFormat.longFormat avgt 15 87.330 ± 6.111 ns/op
> -StringFormat.stringFormat avgt 15 63.985 ± 11.366 ns/op
> -StringFormat.stringIntFormat avgt 15 87.422 ± 0.147 ns/op
> -StringFormat.widthStringFormat avgt 15 250.740 ± 32.639 ns/op
> -StringFormat.widthStringIntFormat avgt 15 312.474 ± 16.309 ns/op
>
> +Benchmark Mode Cnt Score Error Units
> +StringFormat.complexFormat avgt 15 740.626 ± 66.671 ns/op (+151.45%)
> +StringFormat.int02Format avgt 15 131.049 ± 0.432 ns/op (+138.46%)
> +StringFormat.intFormat avgt 15 67.229 ± 4.155 ns/op (+25.59%)
> +StringFormat.longFormat avgt 15 66.444 ± 0.614 ns/op (+31.44%)
> +StringFormat.stringFormat avgt 15 62.619 ± 4.652 ns/op (+2.19%)
> +StringFormat.stringIntFormat avgt 15 89.606 ± 13.966 ns/op (-2.44%)
> +StringFormat.widthStringFormat avgt 15 52.462 ± 15.649 ns/op (+377.95%)
> +StringFormat.widthStringIntFormat avgt 15 101.814 ± 3.147 ns/op (+206.91%)
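
The parenthesized figures in the `+` rows appear to be percentage speedups relative to the baseline: (baseline / optimized - 1) × 100, computed on the average time per operation. The sketch below recomputes two of them from the rounded table values. The results agree with the reported figures to within about 0.01 percentage points, presumably because the author worked from unrounded scores. `SpeedupPercent` and `speedupPercent` are illustrative names and are not part of the PR.

```java
/** Recomputes the parenthesized columns of the StringFormat tables (illustrative helper, not part of the PR). */
public class SpeedupPercent {

    /** Speedup in percent for a lower-is-better metric such as ns/op. */
    static double speedupPercent(double baselineNsPerOp, double optimizedNsPerOp) {
        return (baselineNsPerOp / optimizedNsPerOp - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // complexFormat, first M1 Pro run: 1862.233 ns/op -> 740.626 ns/op (reported +151.45)
        System.out.printf("complexFormat:   %+.2f%n", speedupPercent(1862.233, 740.626));
        // stringIntFormat, same run: 87.422 ns/op -> 89.606 ns/op (reported -2.44, a slight slowdown)
        System.out.printf("stringIntFormat: %+.2f%n", speedupPercent(87.422, 89.606));
    }
}
```

Because the metric is time per operation, +100% means twice the throughput. A small negative value, as for `stringIntFormat`, lies well within the reported error.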
I enhanced the parse fast path to support more specifiers, including:
* `% flag_1 width_1`
* `% flag_2`
* `% width_2`
* `% width_1 . precision_1`
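
A fast path for shapes like the ones listed above can be written as a plain character scan with no regex. The sketch below is a hypothetical illustration and not the code in the PR. `FastPathSpec`, `Spec`, `tryParse` and the limits of two flag characters and two digits are all assumptions; the limits are one reading of the `flag_2`/`width_2` shapes. Any input outside those shapes returns `null`, so that a general parser can take over. That covers argument indexes like `%1$s`, the `<` flag, `%tY`-style date/time conversions, and malformed input. Checking flag/conversion compatibility is left to later validation.

```java
/** Hypothetical regex-free fast path for simple specifiers (illustration only). */
public final class FastPathSpec {

    /** Parse result; width and precision are -1 when absent, end is the index after the conversion. */
    record Spec(String flags, int width, int precision, char conversion, int end) {}

    private static final String FLAG_CHARS = "-#+ 0,(";
    private static final int MAX_FLAGS = 2;   // assumed limit
    private static final int MAX_DIGITS = 2;  // assumed limit for width and precision

    /** Parses "%[flags][width][.precision]conversion" at s[start]; null means "use the general parser". */
    static Spec tryParse(String s, int start) {
        int n = s.length();
        if (start >= n || s.charAt(start) != '%') return null;
        int i = start + 1;

        // Flags are consumed greedily, so in "%02d" the '0' is a flag and '2' is the width.
        int flagsStart = i;
        while (i < n && FLAG_CHARS.indexOf(s.charAt(i)) >= 0) {
            if (i - flagsStart == MAX_FLAGS) return null;      // too many flags for the fast path
            i++;
        }
        String flags = s.substring(flagsStart, i);

        int width = -1;
        int widthStart = i;
        while (i < n && isDigit(s.charAt(i))) {
            if (i - widthStart == MAX_DIGITS) return null;     // width too long for the fast path
            width = (width < 0 ? 0 : width * 10) + (s.charAt(i) - '0');
            i++;
        }

        int precision = -1;
        if (i < n && s.charAt(i) == '.') {
            i++;
            int precStart = i;
            while (i < n && isDigit(s.charAt(i))) {
                if (i - precStart == MAX_DIGITS) return null;
                precision = (precision < 0 ? 0 : precision * 10) + (s.charAt(i) - '0');
                i++;
            }
            if (precision < 0) return null;                     // '.' without digits
        }

        if (i >= n) return null;
        char c = s.charAt(i);
        boolean letterOrPercent = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '%';
        // '$' (argument index), '<' and 't'/'T' (date/time) all end up here and fall back.
        if (!letterOrPercent || c == 't' || c == 'T') return null;
        return new Spec(flags, width, precision, c, i + 1);
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }
}
```

For example, `tryParse("%02d", 0)` yields flags `"0"`, width 2 and conversion `'d'`, while `tryParse("%1$s", 0)` returns `null`. Returning `null` instead of throwing keeps error reporting in one place: the general parser.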
The benchmark results on a MacBook Pro (M1 Pro) are now:
-Benchmark Mode Cnt Score Error Units (baseline)
-StringFormat.complexFormat avgt 15 2049.387 ± 121.539 ns/op
-StringFormat.flags2Format avgt 15 430.964 ± 2.414 ns/op
-StringFormat.flagsFormat avgt 15 257.851 ± 23.833 ns/op
-StringFormat.stringFormat avgt 15 63.564 ± 10.490 ns/op
-StringFormat.stringIntFormat avgt 15 88.111 ± 0.678 ns/op
-StringFormat.width2Format avgt 15 349.304 ± 31.349 ns/op
-StringFormat.width2PrecisionFormat avgt 15 464.621 ± 53.918 ns/op
-StringFormat.widthFormat avgt 15 301.997 ± 34.974 ns/op
-StringFormat.widthPrecisionFormat avgt 15 484.526 ± 38.098 ns/op
-StringFormat.widthStringFormat avgt 15 235.421 ± 32.955 ns/op
-StringFormat.widthStringIntFormat avgt 15 315.178 ± 15.154 ns/op
+Benchmark Mode Cnt Score Error Units (optimized)
+StringFormat.complexFormat avgt 15 702.407 ± 85.481 ns/op (+191.77%)
+StringFormat.flags2Format avgt 15 329.551 ± 1.610 ns/op (+30.78%)
+StringFormat.flagsFormat avgt 15 125.798 ± 1.109 ns/op (+104.98%)
+StringFormat.stringFormat avgt 15 60.029 ± 6.275 ns/op (+5.89%)
+StringFormat.stringIntFormat avgt 15 89.020 ± 0.575 ns/op (-1.03%)
+StringFormat.width2Format avgt 15 135.743 ± 0.643 ns/op (+157.33%)
+StringFormat.width2PrecisionFormat avgt 15 351.408 ± 21.031 ns/op (+32.22%)
+StringFormat.widthFormat avgt 15 208.843 ± 47.504 ns/op (+44.61%)
+StringFormat.widthPrecisionFormat avgt 15 354.375 ± 67.314 ns/op (+36.73%)
+StringFormat.widthStringFormat avgt 15 74.846 ± 19.604 ns/op (+214.55%)
+StringFormat.widthStringIntFormat avgt 15 101.638 ± 0.961 ns/op (+210.10%)
> I was worried this would sprawl out more, but perhaps ~230 lines of code is a reasonable extra weight to make the long tail of `String.format` specifiers regex-free.
>
> I was going to comment that the flag parsing was broken in [f303f29](https://github.com/openjdk/jdk/commit/f303f2959d108d993dc03e86a27ef42bb892647f), but it seems to have been fixed in the latest commit. I think we need to make a review pass over all existing tests to make sure all imaginable variants are covered.
>
> The parser code also ought to be shared between `Formatter` and `FormatProcessor` so that there's a single source of truth going forward.
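
One way to make a pass over "all imaginable variants" systematic is a differential test. It enumerates every short specifier over a small alphabet and checks that a hand-written parser agrees with the regular expression it replaces. The sketch assumes that pre-change `java.util.Formatter` used the specifier regex `%(\d+\$)?([-#+ 0,(\<]*)?(\d+)?(\.\d+)?([tT])?([a-zA-Z%])`; verify this against the sources of the JDK you compare with. The parser, the class name `SpecifierDiffTest` and the alphabet are illustrative and are not the PR's code. Note that the parser must mirror regex backtracking: `%t` matches with `t` as the conversion.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpecifierDiffTest {

    // Assumed to be the specifier regex of pre-change java.util.Formatter.
    static final Pattern FS = Pattern.compile(
            "%(\\d+\\$)?([-#+ 0,(\\<]*)?(\\d+)?(\\.\\d+)?([tT])?([a-zA-Z%])");

    /** Returns {index, flags, width, precision, tT, conversion, end} ("" = absent), or null on no match. */
    static String[] parse(String s) {
        int n = s.length();
        int i = 1;                                          // s.charAt(0) == '%'
        String index = "";
        int d = skipDigits(s, i);
        if (d > i && d < n && s.charAt(d) == '$') {         // argument index such as "12$"
            index = s.substring(i, d + 1);
            i = d + 1;
        }
        int f = i;
        while (i < n && "-#+ 0,(<".indexOf(s.charAt(i)) >= 0) i++;  // greedy, like the regex
        String flags = s.substring(f, i);
        d = skipDigits(s, i);
        String width = s.substring(i, d);
        i = d;
        String precision = "";
        if (i + 1 < n && s.charAt(i) == '.' && isDigit(s.charAt(i + 1))) {
            d = skipDigits(s, i + 1);
            precision = s.substring(i, d);                  // includes the '.', like group 4
            i = d;
        }
        String dateTime = "";
        // The regex backtracks when 't'/'T' is not followed by a conversion and then
        // uses the 't'/'T' itself as the conversion; mirror that behavior.
        if (i + 1 < n && (s.charAt(i) == 't' || s.charAt(i) == 'T') && isConversion(s.charAt(i + 1))) {
            dateTime = s.substring(i, i + 1);
            i++;
        }
        if (i >= n || !isConversion(s.charAt(i))) return null;
        return new String[] {index, flags, width, precision, dateTime,
                s.substring(i, i + 1), String.valueOf(i + 1)};
    }

    static int skipDigits(String s, int i) {
        while (i < s.length() && isDigit(s.charAt(i))) i++;
        return i;
    }

    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    static boolean isConversion(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '%';
    }

    /** Throws AssertionError if the regex and the parser disagree on s. */
    static void compare(String s) {
        Matcher m = FS.matcher(s);
        String[] expected = null;
        if (m.lookingAt()) {
            expected = new String[7];
            for (int g = 1; g <= 6; g++) expected[g - 1] = Objects.toString(m.group(g), "");
            expected[6] = String.valueOf(m.end());
        }
        String[] actual = parse(s);
        if (!Arrays.equals(expected, actual)) {
            throw new AssertionError(s + ": regex=" + Arrays.toString(expected)
                    + " parser=" + Arrays.toString(actual));
        }
    }

    /** Checks prefix and all its extensions by up to `remaining` alphabet chars; returns how many were checked. */
    static int check(String prefix, char[] alphabet, int remaining) {
        compare(prefix);
        int count = 1;
        if (remaining > 0) {
            for (char c : alphabet) count += check(prefix + c, alphabet, remaining - 1);
        }
        return count;
    }

    public static void main(String[] args) {
        int n = check("%", "01$-<.dt%,".toCharArray(), 5);
        System.out.println("parser and regex agree on " + n + " inputs");
    }
}
```

If a fast path covers only part of the grammar, the same harness can compare results only on inputs the fast path accepts and assert that every other input is handed to the general parser.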
The code in `Formatter` and `FormatProcessor` is now regex-free. The changes are extensive and need a more detailed review.
I will delete the redundant performance tests later. The current results are as follows:
# Performance Numbers
## 1. [aliyun_ecs_c8i.xlarge](https://help.aliyun.com/document_detail/25378.html#c8i)
* CPU: Intel Xeon Sapphire Rapids (x64)
* OS: Debian Linux
-Benchmark Mode Cnt Score Error Units (baseline)
-StringFormat.complexFormat avgt 15 1426.696 ± 18.469 ns/op
-StringFormat.flags2Format avgt 15 164.141 ± 2.264 ns/op
-StringFormat.flagsFormat avgt 15 169.313 ± 6.616 ns/op
-StringFormat.stringFormat avgt 15 34.710 ± 0.075 ns/op
-StringFormat.stringIntFormat avgt 15 85.152 ± 0.337 ns/op
-StringFormat.width2Format avgt 15 242.483 ± 5.586 ns/op
-StringFormat.width2PrecisionFormat avgt 15 282.838 ± 2.564 ns/op
-StringFormat.widthFormat avgt 15 175.460 ± 4.458 ns/op
-StringFormat.widthPrecisionFormat avgt 15 244.593 ± 3.605 ns/op
-StringFormat.widthStringFormat avgt 15 144.487 ± 5.271 ns/op
-StringFormat.widthStringIntFormat avgt 15 223.913 ± 6.387 ns/op
+Benchmark Mode Cnt Score Error Units (59c2983b)
+StringFormat.complexFormat avgt 15 582.650 ± 17.399 ns/op (+144.87%)
+StringFormat.flags2Format avgt 15 74.214 ± 0.703 ns/op (+121.18%)
+StringFormat.flagsFormat avgt 15 67.764 ± 0.572 ns/op (+149.86%)
+StringFormat.stringFormat avgt 15 34.659 ± 0.201 ns/op (+0.15%)
+StringFormat.stringIntFormat avgt 15 84.448 ± 0.532 ns/op (+0.84%)
+StringFormat.width2Format avgt 15 123.012 ± 0.513 ns/op (+97.13%)
+StringFormat.width2PrecisionFormat avgt 15 148.092 ± 1.273 ns/op (+90.99%)
+StringFormat.widthFormat avgt 15 69.575 ± 1.023 ns/op (+152.19%)
+StringFormat.widthPrecisionFormat avgt 15 116.187 ± 0.938 ns/op (+110.52%)
+StringFormat.widthStringFormat avgt 15 48.389 ± 0.298 ns/op (+198.60%)
+StringFormat.widthStringIntFormat avgt 15 103.617 ± 2.204 ns/op (+116.10%)
## 2. [aliyun_ecs_c8y.xlarge](https://help.aliyun.com/document_detail/25378.html#c8y)
* CPU: Aliyun Yitian 710 (aarch64)
* OS: Debian Linux
-Benchmark Mode Cnt Score Error Units (baseline)
-StringFormat.complexFormat avgt 15 2321.319 ± 9.624 ns/op
-StringFormat.flags2Format avgt 15 310.377 ± 10.367 ns/op
-StringFormat.flagsFormat avgt 15 295.118 ± 8.645 ns/op
-StringFormat.stringFormat avgt 15 55.966 ± 0.949 ns/op
-StringFormat.stringIntFormat avgt 15 157.949 ± 2.972 ns/op
-StringFormat.width2Format avgt 15 380.621 ± 11.301 ns/op
-StringFormat.width2PrecisionFormat avgt 15 447.285 ± 7.323 ns/op
-StringFormat.widthFormat avgt 15 312.622 ± 5.104 ns/op
-StringFormat.widthPrecisionFormat avgt 15 407.196 ± 6.466 ns/op
-StringFormat.widthStringFormat avgt 15 248.538 ± 2.356 ns/op
-StringFormat.widthStringIntFormat avgt 15 416.661 ± 6.685 ns/op
+Benchmark Mode Cnt Score Error Units (59c2983b)
+StringFormat.complexFormat avgt 15 930.922 ± 91.995 ns/op (+149.36%)
+StringFormat.flags2Format avgt 15 132.746 ± 10.809 ns/op (+133.82%)
+StringFormat.flagsFormat avgt 15 119.267 ± 11.709 ns/op (+147.45%)
+StringFormat.stringFormat avgt 15 55.820 ± 0.324 ns/op (+0.27%)
+StringFormat.stringIntFormat avgt 15 154.045 ± 7.327 ns/op (+2.54%)
+StringFormat.width2Format avgt 15 177.655 ± 4.797 ns/op (+114.25%)
+StringFormat.width2PrecisionFormat avgt 15 236.680 ± 4.266 ns/op (+88.99%)
+StringFormat.widthFormat avgt 15 132.043 ± 15.730 ns/op (+136.76%)
+StringFormat.widthPrecisionFormat avgt 15 204.085 ± 10.300 ns/op (+99.53%)
+StringFormat.widthStringFormat avgt 15 106.971 ± 5.527 ns/op (+132.35%)
+StringFormat.widthStringIntFormat avgt 15 215.329 ± 3.786 ns/op (+93.50%)
> Please don't pile new refactorings and improvements onto a PR that has been opened for review. If you're not sure you're done, it is better to let things brew as a draft for a bit before opening the PR for review; once it has been opened (like this one), consider preparing a follow-up PR instead of refactoring as you go.
>
> Specifically, I'm not sure [0d977b2](https://github.com/openjdk/jdk/commit/0d977b2febe455f4535e6ee2cb19d3b168d764e3) is a good idea, and I would like you to roll those changes back. Object pooling for trivial, short-lived objects is considered an anti-pattern: it adds references to old GC generations and shares many of the drawbacks of lookup tables, such as increased cache traffic. It can show great wins on microbenchmarks while being a wash, or even a regression, in real applications.
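
To make the pooling concern concrete, here is a generic contrast between pooling a trivial, short-lived object and simply allocating it. This is a hypothetical sketch and does not reproduce the contents of commit 0d977b2. `PoolingContrast`, `Spec` and the `ThreadLocal` pool are illustrative assumptions.

```java
/** Generic contrast: pooled vs. freshly allocated trivial objects (hypothetical, not commit 0d977b2). */
public class PoolingContrast {

    /** A trivial, short-lived value such as a parsed specifier. */
    static final class Spec {
        String flags;
        int width;
        char conversion;

        Spec(String flags, int width, char conversion) {
            this.flags = flags;
            this.width = width;
            this.conversion = conversion;
        }
    }

    // Discouraged: a long-lived, per-thread pooled instance. It survives young
    // collections, so storing a young object (the flags string) into it creates an
    // old-to-young reference that a generational collector has to track.
    private static final ThreadLocal<Spec> POOL = ThreadLocal.withInitial(() -> new Spec("", 0, ' '));

    static String describePooled(String flags, int width, char conversion) {
        Spec spec = POOL.get();
        spec.flags = flags;
        spec.width = width;
        spec.conversion = conversion;
        return spec.flags + spec.width + spec.conversion;
    }

    // Preferred: allocate a fresh object per call. Young-generation allocation is
    // cheap, and because the object does not escape, the JIT may scalar-replace it.
    static String describeFresh(String flags, int width, char conversion) {
        Spec spec = new Spec(flags, width, conversion);
        return spec.flags + spec.width + spec.conversion;
    }

    public static void main(String[] args) {
        System.out.println(describePooled("0", 2, 'd') + " " + describeFresh("0", 2, 'd'));
    }
}
```

Both methods return the same result. The difference lies only in allocation and GC behavior, which a microbenchmark that keeps the pooled object hot in cache may not reflect.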
Sorry, I will pay attention to this in the future. I have fixed it in the code under review by reverting commit 0d977b2. I agree with your point about the performance cost of references into old GC generations.
`int parse(List<FormatString> al, char first, int start, String s, int max)`
`java.util.Formatter::parse (469 bytes) failed to inline: callee is too large`
I split the parser into multiple small methods to keep each one under 325 bytes of bytecode, which is HotSpot's default `FreqInlineSize` limit for inlining hot methods. The single 469-byte `parse` above fails to inline, and merging the small methods back into one reduces performance.
I understand, which is why I'm asking for your opinion; if you don't agree, I won't submit large changes. This PR already contains many changes, and I will continue to complete it based on the current version.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1723733247
PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1730164585
PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1731163579
PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1732567054
PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1733490226
PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1740015776
More information about the core-libs-dev mailing list