RFR: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration [v4]
Thomas Schatzl
tschatzl at openjdk.org
Mon Sep 25 14:13:34 UTC 2023
> Hi all,
>
> please review this change that modifies the code root (remembered) set to use the CHT (ConcurrentHashTable) as its internal representation.
>
> This removes lots of locking (which inhibited throughput), provides automatic balancing for the code root scan phase, and allows (parallel) bulk unregistering of nmethods during code cache unloading, improving the performance of the various pauses that deal with code root sets.
>
> With a stress test that frequently loads and unloads 6000 classes and their associated methods, we could previously see the following issues:
>
> During collection pauses:
>
> [4179,965s][gc,phases ] GC(273) Evacuate Collection Set: 812,18ms
> [..]
> [4179,965s][gc,phases ] GC(273) Code Root Scan (ms): Min: 0,00, Avg: 59,03, Max: 775,12, Diff: 775,12, Sum: 944,44, Workers: 16
> [...]
> [4179,965s][gc,phases ] GC(273) Termination (ms): Min: 0,03, Avg: 643,90, Max: 690,96, Diff: 690,93, Sum: 10302,47, Workers: 16
>
>
> With this change, the maximum code root scan time drops to ~22ms on average in this case.
>
> Class unloading (breaking down the code cache flushing, i.e. `CodeCache::flush_unlinked_nmethods`):
>
> Clear Exception Caches 35,5ms
> Unregister NMethods 598,5ms <---- this is nmethod unregistering.
> Unregister Old NMethods 3,0ms
> CodeBlob flush 41,1ms
> CodeCache free 5730,3ms
>
>
> With this change, the `unregister nmethods` phase takes ~25ms max on that stress test. @walulyai contributed this part.
>
> We have recently seen some imbalances in code root scan and long Remark pauses (thankfully not to that extreme) in other real-world applications too:
>
> [2466.979s][gc,phases ] GC(131) Code Root Scan (ms): Min: 0.0, Avg: 5.7, Max: 46.4, Diff: 46.4, Sum: 57.0, Workers: 10
>
>
> One additional comment:
> * the mutex for the CHT had to be decreased in priority by one so that it does not conflict with `CodeCache_lock`. This does not seem to be detrimental otherwise. At the same time, I had to move the locks at `nosafepoint-3` to `nosafepoint-4` as well to keep the previous ordering. All mutexes that use `nosafepoint` as their rank seem to be good now.
>
> Testing: tier1-5
>
> Thanks,
> Thomas
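
To illustrate the "automatic balancing" mentioned in the description: a minimal sketch, assuming a table whose buckets are claimed in fixed-size chunks through a shared atomic counter. This is not HotSpot's ConcurrentHashTable API; `Bucket`, `scan_in_chunks`, and the chunk size are hypothetical names chosen for the example. The point is only that a worker which finishes early claims more chunks, so no single worker is stuck scanning one large set alone.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a hash table: a vector of buckets holding entries.
using Bucket = std::vector<int>;

// Every worker repeatedly claims the next chunk of buckets via an atomic
// counter and scans it. Returns how many entries each worker visited.
std::vector<std::size_t> scan_in_chunks(const std::vector<Bucket>& table,
                                        unsigned num_workers,
                                        std::size_t chunk_size) {
  if (chunk_size == 0) chunk_size = 1;  // avoid claiming empty chunks forever
  std::atomic<std::size_t> next_bucket{0};
  std::vector<std::size_t> visited(num_workers, 0);
  std::vector<std::thread> workers;

  for (unsigned w = 0; w < num_workers; ++w) {
    workers.emplace_back([&, w] {
      for (;;) {
        // Claim [start, end); fetch_add hands out disjoint ranges.
        std::size_t start = next_bucket.fetch_add(chunk_size);
        if (start >= table.size()) break;
        std::size_t end = std::min(start + chunk_size, table.size());
        for (std::size_t i = start; i < end; ++i) {
          visited[w] += table[i].size();  // "scan" = count the entries
        }
      }
    });
  }
  for (auto& t : workers) t.join();
  return visited;
}
```

Calling `scan_in_chunks(table, 16, 64)` visits every entry exactly once across all workers. How the entries split between workers depends on scheduling, which is the intended effect: the split follows who is free, not a fixed up-front assignment.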
Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:
- Merge branch 'master' into 8315503-code-root-scan-imbalance
- iwalulya review - more (gtest) cleanup
- iwalulya review
- initial version that seems to work
  Contains kludge to avoid modification of currently scanned code root set.
  Ought to be fixed differently.
  Contains debug code in table scanners of CodeRootSet/CardSet to find out problems with table growing
  Hashcode hack for code root set, using copy&paste ZHash
  Shrink table after clean
  Bulk removal of nmethods from code root sets after class unloading. From Ivan.
  Cleanup, resize after bulk delete, hashcode verification
-------------
Changes: https://git.openjdk.org/jdk/pull/15811/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15811&range=03
Stats: 458 lines in 23 files changed: 283 ins; 109 del; 66 mod
Patch: https://git.openjdk.org/jdk/pull/15811.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/15811/head:pull/15811
PR: https://git.openjdk.org/jdk/pull/15811