RFR: 8343704: Bad GC parallelism with processing Cleaner queues
Aleksey Shipilev
shade at openjdk.org
Tue Nov 12 16:07:48 UTC 2024
On Tue, 12 Nov 2024 16:00:39 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
> See the bug for more discussion and a reproducer. This PR replaces the linked list with an `ArrayList` wrapper that manages synchronization, search, and replacements effectively. There are possible improvements here; the most glaring is parallelism, which is currently knee-capped by global synchronization. The synchronization scheme follows what we already have, and I think it is safer to continue with it for now.
>
> I'll put performance data in a separate comment.
>
> Additional testing:
> - [x] Original reproducer improves drastically
> - [x] New microbenchmark shows no regression on "churning" tests, which covers insertion/removal perf
> - [x] New microbenchmark shows improvement on Full GC times (crude, but repeatable), which serves as a proxy for the reproducer
> - [x] `java/lang/ref` tests in release
> - [ ] `all` tests in fastdebug
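
For readers who want a picture of the data structure described above, here is a minimal sketch of an array-backed list with global synchronization. It is not the code in the PR: the class name `IndexedListSketch`, the `Node` type, and the swap-with-last removal are assumptions made for illustration. Each node remembers its own slot, so removal needs no search, and the elements sit in a flat array rather than a long pointer chain.

```java
import java.util.ArrayList;

/** Hypothetical sketch of an array-backed node list; not the JDK implementation. */
public class IndexedListSketch {

    /** A node that remembers its slot in the backing list (assumed design). */
    public static final class Node {
        final String name;
        private int index = -1; // -1 means "not currently in the list"

        public Node(String name) {
            this.name = name;
        }
    }

    private final ArrayList<Node> nodes = new ArrayList<>();

    /** Appends the node and records its slot. Guarded by one global lock. */
    public synchronized void insert(Node n) {
        if (n.index != -1) {
            throw new IllegalStateException("node is already inserted");
        }
        n.index = nodes.size();
        nodes.add(n);
    }

    /**
     * Removes the node in O(1): the last element is moved into the freed slot,
     * so the backing array stays dense. Returns false if the node is not present.
     */
    public synchronized boolean remove(Node n) {
        int i = n.index;
        if (i < 0 || i >= nodes.size() || nodes.get(i) != n) {
            return false;
        }
        Node last = nodes.remove(nodes.size() - 1);
        if (last != n) {
            nodes.set(i, last); // replacement: fill the hole with the last node
            last.index = i;
        }
        n.index = -1;
        return true;
    }

    public synchronized int size() {
        return nodes.size();
    }

    public static void main(String[] args) {
        IndexedListSketch list = new IndexedListSketch();
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        list.insert(a);
        list.insert(b);
        list.insert(c);
        list.remove(a); // "c" is moved into the slot "a" occupied
        System.out.println("size after removal: " + list.size());
    }
}
```

The single `synchronized` lock mirrors the global synchronization mentioned above; it keeps the sketch simple at the cost of parallelism between inserting and removing threads.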
Original reproducer on my M1:
# Before
...
[8.989s][info ][gc ] GC(50) Pause Young (Normal) (G1 Evacuation Pause) 608M->21M(1011M) 46.562ms
[9.187s][info ][gc ] GC(51) Pause Young (Normal) (G1 Evacuation Pause) 608M->22M(1011M) 45.286ms
[9.387s][info ][gc ] GC(52) Pause Young (Normal) (G1 Evacuation Pause) 609M->21M(1011M) 45.636ms
[9.592s][info ][gc ] GC(53) Pause Young (Normal) (G1 Evacuation Pause) 608M->22M(1015M) 47.514ms
[9.794s][info ][gc ] GC(54) Pause Young (Normal) (G1 Evacuation Pause) 612M->22M(1015M) 46.807ms
[9.993s][info ][gc ] GC(55) Pause Young (Normal) (G1 Evacuation Pause) 612M->21M(1015M) 45.964ms
# After
...
[6.964s][info ][gc ] GC(50) Pause Young (Normal) (G1 Evacuation Pause) 521M->36M(830M) 11.096ms
[7.108s][info ][gc ] GC(51) Pause Young (Normal) (G1 Evacuation Pause) 521M->36M(830M) 11.380ms
[7.252s][info ][gc ] GC(52) Pause Young (Normal) (G1 Evacuation Pause) 521M->36M(830M) 11.293ms
[7.397s][info ][gc ] GC(53) Pause Young (Normal) (G1 Evacuation Pause) 520M->35M(830M) 12.407ms
[7.540s][info ][gc ] GC(54) Pause Young (Normal) (G1 Evacuation Pause) 520M->37M(830M) 11.096ms
The closest reproducer, in the form of a JMH test, also improves:
Benchmark       (count)  (recipFreq)  Mode  Cnt   Score    Error  Units
# Before
CleanerGC.test    16384          N/A  avgt   15   2.170 ±  0.082  ms/op
CleanerGC.test    65536          N/A  avgt   15   2.281 ±  0.104  ms/op
CleanerGC.test   262144          N/A  avgt   15   6.176 ±  0.466  ms/op
CleanerGC.test  1048576          N/A  avgt   15  22.913 ±  5.171  ms/op
CleanerGC.test  4194304          N/A  avgt   15  77.781 ± 14.937  ms/op
# After
CleanerGC.test    16384          N/A  avgt   15   2.169 ±  0.061  ms/op
CleanerGC.test    65536          N/A  avgt   15   2.247 ±  0.083  ms/op
CleanerGC.test   262144          N/A  avgt   15   3.822 ±  0.191  ms/op
CleanerGC.test  1048576          N/A  avgt   15   9.750 ±  0.638  ms/op
CleanerGC.test  4194304          N/A  avgt   15  33.842 ±  5.382  ms/op
The churn benchmark, which covers insertion/removal performance, matches the original implementation closely:
Benchmark          (count)  (recipFreq)  Mode  Cnt  Score   Error  Units
# Before
CleanerChurn.test      N/A          128  avgt    9  7.063 ± 0.262  ns/op
CleanerChurn.test      N/A          256  avgt    9  5.669 ± 0.118  ns/op
CleanerChurn.test      N/A          512  avgt    9  5.025 ± 0.066  ns/op
CleanerChurn.test      N/A         1024  avgt    9  4.714 ± 0.086  ns/op
CleanerChurn.test      N/A         2048  avgt    9  4.595 ± 0.091  ns/op
# After
CleanerChurn.test      N/A          128  avgt    9  7.050 ± 0.847  ns/op
CleanerChurn.test      N/A          256  avgt    9  5.378 ± 0.186  ns/op
CleanerChurn.test      N/A          512  avgt    9  4.896 ± 0.112  ns/op
CleanerChurn.test      N/A         1024  avgt    9  4.712 ± 0.063  ns/op
CleanerChurn.test      N/A         2048  avgt    9  4.671 ± 0.071  ns/op
-------------
PR Comment: https://git.openjdk.org/jdk/pull/22043#issuecomment-2470928360