Integrated: 8316582: Minor startup regression in 22-b15 due JDK-8310929
Claes Redestad
redestad at openjdk.org
Thu Sep 21 09:41:35 UTC 2023
On Wed, 20 Sep 2023 09:12:48 GMT, Claes Redestad <redestad at openjdk.org> wrote:
> This patch reverts the use of `ByteArrayLittleEndian` in `StringLatin1`.
>
> This use is the cause of a small (~1.5ms) startup regression in 22-b15. The regression is manageable in and of itself, but using `VarHandle`s in core utility classes increases the risk of bootstrap circularity issues, for example ruling out the use of things like `Integer.toString` in some places.
>
> Reverting this partially rolls back the performance improvement gained by JDK-8310929. It seems reasonable to expect that the compiler can be enhanced to win that loss back.
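
To illustrate the trade-off described in the quoted text, here is a minimal sketch contrasting a `VarHandle`-based two-byte store with plain byte stores. The class and method names are hypothetical, and the public `MethodHandles.byteArrayViewVarHandle` API is used as a stand-in for the JDK-internal `ByteArrayLittleEndian` helper; this is not the code touched by the PR.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration only; not the JDK's StringLatin1 code.
final class ShortStoreSketch {
    // Views a byte[] as little-endian shorts. Resolving this handle pulls in
    // java.lang.invoke infrastructure when the class is initialized.
    private static final VarHandle SHORT_LE =
            MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);

    // One combined two-byte store through the VarHandle.
    static void putShortVarHandle(byte[] buf, int off, short v) {
        SHORT_LE.set(buf, off, v);
    }

    // The same result with two plain byte stores and no VarHandle dependency.
    static void putShortPlain(byte[] buf, int off, short v) {
        buf[off] = (byte) v;            // low byte first (little-endian)
        buf[off + 1] = (byte) (v >> 8); // then the high byte
    }

    public static void main(String[] args) {
        byte[] a = new byte[4];
        byte[] b = new byte[4];
        putShortVarHandle(a, 1, (short) 0x3630);
        putShortPlain(b, 1, (short) 0x3630);
        // Both buffers should hold identical bytes.
        System.out.println(Arrays.equals(a, b));
    }
}
```

The plain variant leaves it to the JIT compiler to merge the two stores, which is presumably the kind of compiler enhancement the description has in mind.
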
This PR vs a 22-b15 baseline:
Name                    Cnt     Base    Error     Test    Error   Unit   Diff%
Integers.toStringBig     15    5,318 ± 0,043    6,628 ± 0,127  us/op  -24,6% (p = 0,000*)
Integers.toStringSmall   15    3,202 ± 0,018    3,562 ± 0,027  us/op  -11,2% (p = 0,000*)
Integers.toStringTiny    15    2,286 ± 0,017    2,352 ± 0,024  us/op   -2,9% (p = 0,000*)
  * = significant
This PR vs a 22-b14 baseline:
Name                    Cnt     Base    Error     Test    Error   Unit   Diff%
Integers.toStringBig     15   12,313 ± 0,143    6,628 ± 0,127  us/op   46,2% (p = 0,000*)
Integers.toStringSmall   15    4,816 ± 0,074    3,562 ± 0,027  us/op   26,0% (p = 0,000*)
Integers.toStringTiny    15    2,611 ± 0,022    2,352 ± 0,024  us/op    9,9% (p = 0,000*)
  * = significant
There's still a substantial win compared to 22-b14, stemming from the use of a packed lookup table rather than two disjoint tables for the tens and ones digits.
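
As a rough illustration of the packed-table idea, the sketch below stores both ASCII digits of every value 00..99 in a single `short` and emits two digits per lookup using plain byte stores. The class name, method name, and restriction to non-negative values are assumptions made for the example; it is not the JDK implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a packed two-digit lookup table.
final class PackedDigits {
    // Entry i packs the two ASCII digits of i: tens digit in the low byte,
    // ones digit in the high byte (e.g. entry 6 is 0x3630, i.e. "06").
    private static final short[] DIGITS = new short[100];

    static {
        for (int i = 0; i < 100; i++) {
            int tens = '0' + i / 10;
            int ones = '0' + i % 10;
            DIGITS[i] = (short) (tens | (ones << 8));
        }
    }

    // Formats a non-negative int, consuming two decimal digits per table lookup.
    static String toString(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("sketch handles non-negative values only");
        }
        byte[] buf = new byte[10]; // Integer.MAX_VALUE has 10 digits
        int pos = buf.length;
        while (value >= 100) {
            int q = value / 100;
            short pair = DIGITS[value - q * 100];
            buf[--pos] = (byte) (pair >> 8); // ones digit
            buf[--pos] = (byte) pair;        // tens digit
            value = q;
        }
        if (value >= 10) {
            short pair = DIGITS[value];
            buf[--pos] = (byte) (pair >> 8);
            buf[--pos] = (byte) pair;
        } else {
            buf[--pos] = (byte) ('0' + value);
        }
        return new String(buf, pos, buf.length - pos, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(toString(1234));
        System.out.println(toString(Integer.MAX_VALUE));
    }
}
```

A single table means one array load per digit pair instead of one load from each of two tables, which is where the saving over the 22-b14 layout would be expected to come from.
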
Startup numbers improve with the above patch to levels on par with 22-b14:
Name                    Cnt          Base          Error          Test          Error          Unit   Diff%
Perfstartup-Noop-G1      20        30,000 ±        0,000        28,500 ±        3,181         ms/op    5,0% (p = 0,083 )
  :.cycles               20  88166516,750 ±  2119868,114  84226439,550 ±  1792195,203        cycles   -4,5% (p = 0,000*)
  :.instructions         20 204321816,400 ±   248867,819 195313416,200 ±   196361,902  instructions   -4,4% (p = 0,000*)
  :.taskclock            20        12,000 ±        4,543        10,000 ±        0,000            ms  -16,7% (p = 0,104 )
  * = significant
(This is simply a Noop/Hello World program in a loop, with stats collected by `/usr/bin/time -l`, run on a MacBook M1)
FWIW, when initializing `DIGITS` directly (`DIGITS = new byte[] { ...`) the `DecimalDigits` class is 2610 bytes; with the for loop in a `static` block it drops down to 2112 bytes. Array constants like this generate sad and bloated bytecode:
0: bipush 100
2: newarray short
4: dup
5: iconst_0
6: sipush 12336
9: sastore
...
40: dup
41: bipush 6
43: sipush 13872
46: sastore
...
691: dup
692: bipush 99
694: sipush 14649
697: sastore
698: putstatic #13 // Field DIGITS:[S
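
The bloat follows from how an array literal compiles: each element gets its own dup / index push / value push / sastore sequence (6 or 7 bytes per element in the listing above, roughly 700 bytes for 100 entries), whereas a loop has a fixed-size body. A minimal sketch of the loop form, checked against the three stores visible in the listing; the class and method names are hypothetical, and this is not the actual `DecimalDigits` source.

```java
// Hypothetical sketch: build the 100-entry table with a loop instead of a literal.
final class DigitsInit {
    static final short[] DIGITS = buildDigits();

    // A loop compiles to a small, constant amount of bytecode regardless of
    // table size, unlike `new short[] { 12336, ... }` which grows per element.
    static short[] buildDigits() {
        short[] digits = new short[100];
        for (int i = 0; i < 100; i++) {
            int tens = '0' + i / 10;  // low byte
            int ones = '0' + i % 10;  // high byte
            digits[i] = (short) (tens | (ones << 8));
        }
        return digits;
    }

    public static void main(String[] args) {
        // Values taken from the sipush operands at indices 0, 6 and 99 in the listing.
        System.out.println(DIGITS[0] == 12336);
        System.out.println(DIGITS[6] == 13872);
        System.out.println(DIGITS[99] == 14649);
    }
}
```

The cost of the loop form is a little work at class-initialization time in exchange for the smaller class file.
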
-------------
PR Comment: https://git.openjdk.org/jdk/pull/15836#issuecomment-1727317896
PR Comment: https://git.openjdk.org/jdk/pull/15836#issuecomment-1727402036
PR Comment: https://git.openjdk.org/jdk/pull/15836#issuecomment-1727935701