Integrated: 8316582: Minor startup regression in 22-b15 due JDK-8310929

Claes Redestad redestad at openjdk.org
Thu Sep 21 09:41:35 UTC 2023


On Wed, 20 Sep 2023 09:12:48 GMT, Claes Redestad <redestad at openjdk.org> wrote:

> This patch reverts the use of `ByteArrayLittleEndian` in `StringLatin1`. 
> 
> This use is the cause of a small (~1.5ms) startup regression in 22-b15. While a manageable startup regression in and of itself, the use of `VarHandles` in core utility classes brings an increased risk of bootstrap circularity issues, for example disqualifying the use of things like `Integer.toString` in some places.
> 
> Reverting this partially rolls back the performance improvement gained by JDK-8310929. It seems reasonable that the compiler can be enhanced to win that loss back.
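
For readers who want the trade-off in concrete terms, here is a rough sketch of the two ways to store a packed digit pair into a `byte[]`. It is not the JDK source: `ByteArrayLittleEndian` is internal, so the public `MethodHandles.byteArrayViewVarHandle` stands in for it here (an assumption), and the class and method names are made up for illustration.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

final class PackedStoreSketch {
    // Public-API stand-in (assumption) for the internal helper:
    // views a byte[] as little-endian shorts.
    private static final VarHandle SHORT_LE =
            MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);

    // One VarHandle-based store that writes both bytes at once.
    static void putViaVarHandle(byte[] buf, int offset, short packed) {
        SHORT_LE.set(buf, offset, packed);
    }

    // Two plain byte stores; needs no java.lang.invoke machinery at startup.
    static void putViaByteStores(byte[] buf, int offset, short packed) {
        buf[offset] = (byte) packed;            // low byte first
        buf[offset + 1] = (byte) (packed >> 8); // then high byte
    }

    public static void main(String[] args) {
        short packed = (short) (('2' << 8) | '4'); // low byte '4', high byte '2'
        byte[] a = new byte[2];
        byte[] b = new byte[2];
        putViaVarHandle(a, 0, packed);
        putViaByteStores(b, 0, packed);
        // Both paths are expected to leave identical bytes in the buffer.
        System.out.println(Arrays.equals(a, b));
    }
}
```

Both methods leave the same bytes in the buffer; the difference is that the first one pulls `java.lang.invoke` into the class initialization path of whatever uses it.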

This PR vs a 22-b15 baseline:

Name                   Cnt  Base   Error   Test   Error  Unit   Diff%
Integers.toStringBig    15 5,318 ± 0,043  6,628 ± 0,127 us/op  -24,6% (p = 0,000*)
Integers.toStringSmall  15 3,202 ± 0,018  3,562 ± 0,027 us/op  -11,2% (p = 0,000*)
Integers.toStringTiny   15 2,286 ± 0,017  2,352 ± 0,024 us/op   -2,9% (p = 0,000*)
  * = significant


This PR vs a 22-b14 baseline:

Name                   Cnt   Base   Error   Test   Error  Unit   Diff%
Integers.toStringBig    15 12,313 ± 0,143  6,628 ± 0,127 us/op   46,2% (p = 0,000*)
Integers.toStringSmall  15  4,816 ± 0,074  3,562 ± 0,027 us/op   26,0% (p = 0,000*)
Integers.toStringTiny   15  2,611 ± 0,022  2,352 ± 0,024 us/op    9,9% (p = 0,000*)
  * = significant


There's still a substantial win compared to 22-b14, stemming from the use of a packed lookup table rather than two disjoint tables for the tens digit and the ones digit.
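
To illustrate the difference between the two table layouts, here is a minimal sketch. The names (`TENS`, `ONES`, `PACKED`, `putPairTwoTables`, `putPairPacked`) are hypothetical and the code is simplified relative to the JDK; the byte order of the packed entries is assumed to be tens digit in the low byte, ones digit in the high byte.

```java
import java.nio.charset.StandardCharsets;

final class DigitPairSketch {
    // Two disjoint tables: one lookup per digit.
    static final byte[] TENS = new byte[100];
    static final byte[] ONES = new byte[100];
    // Packed table: one lookup per pair of digits.
    // Assumed layout: low byte = tens digit, high byte = ones digit.
    static final short[] PACKED = new short[100];

    static {
        for (int i = 0; i < 100; i++) {
            TENS[i] = (byte) ('0' + i / 10);
            ONES[i] = (byte) ('0' + i % 10);
            PACKED[i] = (short) ((ONES[i] << 8) | TENS[i]);
        }
    }

    // Two array loads, two byte stores.
    static void putPairTwoTables(byte[] buf, int pos, int value) {
        buf[pos] = TENS[value];
        buf[pos + 1] = ONES[value];
    }

    // One array load, then the two bytes are split out of the short.
    static void putPairPacked(byte[] buf, int pos, int value) {
        short packed = PACKED[value];
        buf[pos] = (byte) packed;
        buf[pos + 1] = (byte) (packed >> 8);
    }

    public static void main(String[] args) {
        byte[] a = new byte[2];
        byte[] b = new byte[2];
        for (int v = 0; v < 100; v++) {
            putPairTwoTables(a, 0, v);
            putPairPacked(b, 0, v);
            if (a[0] != b[0] || a[1] != b[1]) {
                throw new AssertionError("mismatch at " + v);
            }
        }
        System.out.println(new String(b, StandardCharsets.ISO_8859_1));
    }
}
```

The packed variant halves the number of table loads per digit pair, which is presumably where the remaining gain over 22-b14 comes from.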

Startup numbers improve with the above patch to levels on par with 22-b14:

Name                Cnt          Base         Error           Test         Error         Unit   Diff%
Perfstartup-Noop-G1  20        30,000 ±       0,000         28,500 ±       3,181        ms/op    5,0% (p = 0,083 )
  :.cycles           20  88166516,750 ± 2119868,114   84226439,550 ± 1792195,203       cycles   -4,5% (p = 0,000*)
  :.instructions     20 204321816,400 ±  248867,819  195313416,200 ±  196361,902 instructions   -4,4% (p = 0,000*)
  :.taskclock        20        12,000 ±       4,543         10,000 ±       0,000           ms  -16,7% (p = 0,104 )
  * = significant

(This is simply a Noop/Hello World program in a loop, with stats collected by `/usr/bin/time -l`, run on a MacBook M1)

FWIW when initializing `DIGITS` directly (`DIGITS = new short[] { ...`) the `DecimalDigits` class is 2610 bytes; with the for loop in a `static` block it drops down to 2112 bytes. Array constants like this generate sad and bloated bytecode:

         0: bipush        100
         2: newarray       short
         4: dup
         5: iconst_0
         6: sipush        12336
         9: sastore
         ...
        40: dup
        41: bipush        6
        43: sipush        13872
        46: sastore
        ...
       691: dup
       692: bipush        99
       694: sipush        14649
       697: sastore
       698: putstatic     #13                 // Field DIGITS:[S
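
For comparison, a loop-based initializer along the following lines produces the same table with a handful of instructions instead of one `dup`/`sipush`/`sastore` sequence per element. This is a sketch under assumptions, not the actual `DecimalDigits` source: the class name is made up, and the layout (tens digit in the low byte, ones digit in the high byte) is inferred from the constants in the listing above.

```java
final class DigitsInitSketch {
    static final short[] DIGITS;

    static {
        short[] digits = new short[100];
        for (int i = 0; i < 10; i++) {
            short tens = (short) (i + '0');            // low byte: tens digit
            for (int j = 0; j < 10; j++) {
                short ones = (short) ((j + '0') << 8); // high byte: ones digit
                digits[i * 10 + j] = (short) (tens | ones);
            }
        }
        DIGITS = digits;
    }

    public static void main(String[] args) {
        // Spot-check against the constants visible in the bytecode listing.
        System.out.println(DIGITS[0] + " " + DIGITS[6] + " " + DIGITS[99]);
    }
}
```

The entries at indices 0, 6 and 99 come out as 12336, 13872 and 14649, matching the `sipush` operands in the listing.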

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15836#issuecomment-1727317896
PR Comment: https://git.openjdk.org/jdk/pull/15836#issuecomment-1727402036
PR Comment: https://git.openjdk.org/jdk/pull/15836#issuecomment-1727935701

