[crac] RFR: 8362418: [CRaC] Leave unused G1 heap regions committed for C/R

Timofei Pushkin tpushkin at openjdk.org
Wed Jul 16 14:03:35 UTC 2025


See the JBS issue for the problem statement and the suggested approach to fix it.

This PR:
1. Implements the approach by expanding the G1 heap back after shrinking and uncommitting it: shrink+uncommit leaves the memory of free regions uncommitted, and the subsequent expand commits it back.
2. Removes `CollectedHeap::finish_collection()`, added in #93. It was only used in G1 to synchronize uncommitting with C/R, but uncommitting is now performed synchronously during the CRaC-invoked full GC, so the method became redundant.
3. Renames `CollectedHeap::do_cleanup_unused()` to `CollectedHeap::should_cleanup_unused()` because I find the old name unintuitive: it sounds like the method _does_ the actual work while in reality it just returns whether the work _should_ be done.
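
As a rough illustration of item 1, here is a toy Python model of the shrink+uncommit then expand sequence. It is not G1 code: the `Region` class, its fields and the `pretouch` parameter (a stand-in for `-XX:+AlwaysPreTouch`) are hypothetical names made up for this sketch, and real G1 tracks far more state per region.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical stand-in for a heap region (not a G1 data structure)."""
    in_use: bool            # still holds live objects after the full GC
    committed: bool = True  # backed by committed memory
    touched: bool = True    # pages were written to, so a C/R engine may have to save them


def shrink_and_uncommit(heap):
    """Uncommit every free region, which drops its pages."""
    for region in heap:
        if not region.in_use:
            region.committed = False
            region.touched = False


def expand(heap, pretouch=False):
    """Commit the uncommitted regions again; pages stay untouched unless pretouch is set."""
    for region in heap:
        if not region.committed:
            region.committed = True
            region.touched = pretouch


heap = [Region(in_use=True), Region(in_use=False), Region(in_use=False)]
shrink_and_uncommit(heap)
expand(heap)

committed = sum(r.committed for r in heap)
touched = sum(r.touched for r in heap)
print(committed, touched)  # 3 1
```

In this model the capacity after the sequence equals the capacity before it, but only the in-use region still has touched pages.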

Notes:
- If `-XX:+AlwaysPreTouch` is used (it is off by default), the cleaned-up pages immediately become OS-reserved again. If the C/R engine saves such memory into the image, the image will be larger after this change than it was before. This is OK since the flag is off by default and some engines can handle pre-touched pages without fully saving them into the image.
- The G1 heap still gets shrunk during C/R because of the automatic shrinking, but less aggressively than before, which in my testing is enough not to cause immediate full/mixed GCs. If needed, the user can also set `MaxHeapFreeRatio` to a larger value during C/R to reduce the shrinking or even disable it completely; that was not possible before.
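
To make the effect of `MaxHeapFreeRatio` concrete, the following Python sketch computes the largest capacity the heap may keep after a GC under a simplified reading of the flag: the free part of the heap must not exceed `MaxHeapFreeRatio` percent of its capacity. The function name is hypothetical, and the real G1 sizing policy additionally aligns to region size and clamps to the configured heap bounds, so the numbers are illustrative and not meant to reproduce the capacities in the logs below.

```python
import math


def max_capacity_after_gc(used_mb: float, max_heap_free_ratio: int) -> float:
    """Largest capacity (MB) such that free/capacity <= max_heap_free_ratio percent.

    Simplified model: ignores region-size alignment and min/max heap size clamping.
    """
    if not 0 <= max_heap_free_ratio <= 100:
        raise ValueError("ratio must be a percentage between 0 and 100")
    min_used_fraction = 1.0 - max_heap_free_ratio / 100.0
    if min_used_fraction == 0.0:
        # With a ratio of 100 any amount of free space is allowed: no shrinking is forced.
        return math.inf
    return used_mb / min_used_fraction


# With 3 MB live and the default ratio of 70, roughly 10 MB may stay committed.
print(round(max_capacity_after_gc(3, 70), 1))
# With a ratio of 100 the policy imposes no upper limit.
print(max_capacity_after_gc(3, 100))
```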

I used a benchmark based on wrk and Helidon's hello-world example to check for performance gains. On JDK 25+14 I saw a 2-3% improvement in the performance of the first after-restore iteration (the regression of the first after-restore iteration relative to the last before-checkpoint iteration went from ~4% to ~1.4%) and a 20% start-up speed-up (100ms vs 80ms). However, some change made between 25+14 and 25+18 improved the baseline performance to roughly the same level, so the improvement brought by my fix is almost unnoticeable in this particular benchmark on JDK 25+18.

Still, the GC logs clearly show that less work is performed on restore, so I believe the change is beneficial:

<details>

<summary>W/o the change</summary>

The heap is shrunk from 252MB to 4MB and preparations for mixed GC are performed. Sometimes (but not this time) mixed GCs themselves occur; on 25+14 there were even full GCs.


[1752668313926ms] ################# benchmark before C/R start  #################
[26.924s][info][gc] GC(121) Pause Young (Normal) (G1 Evacuation Pause) 152M->4M(252M) 0.691ms
[27.146s][info][gc] GC(122) Pause Young (Normal) (G1 Evacuation Pause) 152M->4M(252M) 0.676ms
[27.368s][info][gc] GC(123) Pause Young (Normal) (G1 Evacuation Pause) 152M->4M(252M) 0.645ms
[27.589s][info][gc] GC(124) Pause Young (Normal) (G1 Evacuation Pause) 152M->4M(252M) 0.657ms
[27.807s][info][gc] GC(125) Pause Young (Normal) (G1 Evacuation Pause) 152M->4M(252M) 0.678ms
[1752668315029ms] ################# benchmark before C/R finish #################
2025.07.16 14:18:35.218 INFO Starting checkpoint
2025.07.16 14:18:35.221 INFO [0x2bde2f6b] @default socket closed.
[28.124s][info][gc] GC(126) Pause Full (System.gc()) 84M->3M(28M) 8.329ms
[28.145s][info][gc] GC(127) Pause Full (System.gc()) 4M->3M(14M) 10.228ms
[28.165s][info][gc] GC(128) Pause Full (FullGCAlot) 4M->3M(4M) 7.684ms
[28.169s][info][crac] Checkpoint ...
[28.750s][info][gc  ] GC(129) Pause Young (Concurrent Start) (G1 Evacuation Pause) 3M->3M(4M) 5.813ms
[28.750s][info][gc  ] GC(130) Concurrent Mark Cycle
[28.762s][info][gc  ] GC(130) Pause Remark 3M->3M(8M) 3.104ms
[28.763s][info][gc  ] GC(130) Pause Cleanup 3M->3M(8M) 0.007ms
[28.763s][info][gc  ] GC(130) Concurrent Mark Cycle 13.344ms
2025.07.16 14:18:35.872 INFO [0x3789f1cb] http://0.0.0.0:8080 bound for socket '@default'
2025.07.16 14:18:35.874 INFO Restored all channels in 1 milliseconds. 104 milliseconds since JVM snapshot restore. Java 25-internal-adhoc.timpushkin.crac
[1752668315905ms] ################# benchmark after C/R start  #################
[28.803s][info][gc  ] GC(131) Pause Young (Normal) (G1 Evacuation Pause) 5M->3M(8M) 0.800ms
[28.807s][info][gc  ] GC(132) Pause Young (Concurrent Start) (G1 Evacuation Pause) 5M->4M(10M) 0.766ms
[28.807s][info][gc  ] GC(133) Concurrent Mark Cycle
[28.810s][info][gc  ] GC(133) Pause Remark 4M->4M(10M) 0.684ms
[28.810s][info][gc  ] GC(133) Pause Cleanup 4M->4M(10M) 0.004ms
[28.810s][info][gc  ] GC(133) Concurrent Mark Cycle 3.516ms
[28.813s][info][gc  ] GC(134) Pause Young (Normal) (G1 Evacuation Pause) 6M->4M(10M) 0.516ms
[28.816s][info][gc  ] GC(135) Pause Young (Normal) (G1 Evacuation Pause) 6M->4M(10M) 0.524ms
[28.821s][info][gc  ] GC(136) Pause Young (Normal) (G1 Evacuation Pause) 6M->4M(130M) 0.633ms
[28.911s][info][gc  ] GC(137) Pause Young (Normal) (G1 Evacuation Pause) 80M->4M(130M) 0.720ms
[29.013s][info][gc  ] GC(138) Pause Young (Normal) (G1 Evacuation Pause) 80M->4M(130M) 0.671ms
[29.117s][info][gc  ] GC(139) Pause Young (Normal) (G1 Evacuation Pause) 80M->4M(130M) 0.592ms
[29.218s][info][gc  ] GC(140) Pause Young (Normal) (G1 Evacuation Pause) 80M->4M(130M) 0.702ms
[29.322s][info][gc  ] GC(141) Pause Young (Normal) (G1 Evacuation Pause) 80M->4M(130M) 0.657ms
[29.423s][info][gc  ] GC(142) Pause Young (Normal) (G1 Evacuation Pause) 80M->4M(130M) 0.622ms
[29.528s][info][gc  ] GC(143) Pause Young (Normal) (G1 Evacuation Pause) 80M->4M(130M) 0.654ms
[29.632s][info][gc  ] GC(144) Pause Young (Normal) (G1 Evacuation Pause) 80M->4M(130M) 0.622ms
[29.738s][info][gc  ] GC(145) Pause Young (Normal) (G1 Evacuation Pause) 80M->4M(130M) 0.732ms
[29.843s][info][gc  ] GC(146) Pause Young (Normal) (G1 Evacuation Pause) 80M->4M(130M) 0.647ms
[1752668317008ms] ################# benchmark after C/R finish #################


</details>

<details>

<summary>W/o the change + -XX:MaxHeapFreeRatio=100</summary>

The heap is shrunk from 252MB to 10MB now but this is not enough to stop mixed GCs from occurring.


[1752668979182ms] ################# benchmark before C/R start  #################
[26.943s][info][gc] GC(134) Pause Young (Normal) (G1 Evacuation Pause) 152M->4M(252M) 0.693ms
[27.144s][info][gc] GC(135) Pause Young (Normal) (G1 Evacuation Pause) 152M->4M(252M) 0.640ms
[27.346s][info][gc] GC(136) Pause Young (Normal) (G1 Evacuation Pause) 152M->4M(252M) 0.677ms
[27.547s][info][gc] GC(137) Pause Young (Normal) (G1 Evacuation Pause) 152M->4M(252M) 0.663ms
[27.746s][info][gc] GC(138) Pause Young (Normal) (G1 Evacuation Pause) 152M->4M(252M) 0.627ms
[1752668980284ms] ################# benchmark before C/R finish #################
2025.07.16 14:29:40.475 INFO Starting checkpoint
2025.07.16 14:29:40.478 INFO [0x2bde2f6b] @default socket closed.
[28.121s][info][gc] GC(139) Pause Full (System.gc()) 117M->3M(252M) 7.355ms
[28.133s][info][gc] GC(140) Pause Full (System.gc()) 4M->3M(252M) 4.566ms
[28.146s][info][gc] GC(141) Pause Full (FullGCAlot) 5M->3M(10M) 4.140ms
[28.158s][info][crac] Checkpoint ...
[28.570s][info][gc  ] GC(142) Pause Young (Concurrent Start) (G1 Evacuation Pause) 3M->3M(10M) 4.823ms
[28.570s][info][gc  ] GC(143) Concurrent Mark Cycle
[28.581s][info][gc  ] GC(143) Pause Remark 3M->3M(18M) 2.299ms
[28.582s][info][gc  ] GC(143) Pause Cleanup 3M->3M(18M) 0.008ms
[28.583s][info][gc  ] GC(143) Concurrent Mark Cycle 12.906ms
2025.07.16 14:29:40.952 INFO [0x6f7e84d6] http://0.0.0.0:8080 bound for socket '@default'
2025.07.16 14:29:40.953 INFO Restored all channels in 1 milliseconds. 103 milliseconds since JVM snapshot restore. Java 25-internal-adhoc.timpushkin.crac
[1752668980986ms] ################# benchmark after C/R start  #################
[28.624s][info][gc  ] GC(144) Pause Young (Prepare Mixed) (G1 Evacuation Pause) 5M->4M(18M) 0.755ms
[28.629s][info][gc  ] GC(145) Pause Young (Mixed) (G1 Evacuation Pause) 8M->4M(20M) 0.669ms
[28.639s][info][gc  ] GC(146) Pause Young (Mixed) (G1 Evacuation Pause) 10M->4M(20M) 2.089ms
[28.653s][info][gc  ] GC(147) Pause Young (Normal) (G1 Evacuation Pause) 14M->4M(136M) 0.739ms
[28.758s][info][gc  ] GC(148) Pause Young (Normal) (G1 Evacuation Pause) 82M->4M(136M) 0.600ms
[28.866s][info][gc  ] GC(149) Pause Young (Normal) (G1 Evacuation Pause) 82M->4M(136M) 0.676ms
[28.968s][info][gc  ] GC(150) Pause Young (Normal) (G1 Evacuation Pause) 82M->4M(136M) 0.621ms
[29.072s][info][gc  ] GC(151) Pause Young (Normal) (G1 Evacuation Pause) 82M->4M(136M) 0.689ms
[29.177s][info][gc  ] GC(152) Pause Young (Normal) (G1 Evacuation Pause) 82M->4M(136M) 0.648ms
[29.282s][info][gc  ] GC(153) Pause Young (Normal) (G1 Evacuation Pause) 82M->4M(136M) 0.635ms
[29.388s][info][gc  ] GC(154) Pause Young (Normal) (G1 Evacuation Pause) 82M->4M(136M) 0.608ms
[29.494s][info][gc  ] GC(155) Pause Young (Normal) (G1 Evacuation Pause) 82M->4M(136M) 0.596ms
[29.601s][info][gc  ] GC(156) Pause Young (Normal) (G1 Evacuation Pause) 82M->4M(136M) 0.614ms
[29.707s][info][gc  ] GC(157) Pause Young (Normal) (G1 Evacuation Pause) 82M->4M(136M) 0.633ms
[1752668982088ms] ################# benchmark after C/R finish #################


</details>

<details>

<summary>W/ the change</summary>

The heap is shrunk from 252MB to 14MB; there are no mixed GCs or preparations for them.


[1752668544274ms] ################# wrk start  #################
[27.010s][info][gc] GC(121) Pause Young (Normal) (G1 Evacuation Pause) 152M->4M(252M) 0.671ms
[27.228s][info][gc] GC(122) Pause Young (Normal) (G1 Evacuation Pause) 152M->4M(252M) 0.684ms
[27.451s][info][gc] GC(123) Pause Young (Normal) (G1 Evacuation Pause) 152M->4M(252M) 0.636ms
[27.672s][info][gc] GC(124) Pause Young (Normal) (G1 Evacuation Pause) 152M->4M(252M) 0.656ms
[27.893s][info][gc] GC(125) Pause Young (Normal) (G1 Evacuation Pause) 152M->4M(252M) 0.665ms
[1752668545377ms] ################# wrk finish #################
2025.07.16 14:22:25.567 INFO Starting checkpoint
2025.07.16 14:22:25.572 INFO [0x341b3650] @default socket closed.
[28.221s][info][gc] GC(126) Pause Full (System.gc()) 89M->3M(34M) 7.894ms
[28.242s][info][gc] GC(127) Pause Full (System.gc()) 4M->3M(14M) 9.701ms
[28.262s][info][gc] GC(128) Pause Full (FullGCAlot) 4M->3M(14M) 7.277ms
[28.265s][info][crac] Checkpoint ...
2025.07.16 14:22:26.200 INFO [0x0334b7cb] http://0.0.0.0:8080 bound for socket '@default'
2025.07.16 14:22:26.202 INFO Restored all channels in 3 milliseconds. 93 milliseconds since JVM snapshot restore. Java 25-internal-adhoc.timpushkin.crac
[1752668546247ms] ################# wrk start  #################
[28.902s][info][gc  ] GC(129) Pause Young (Normal) (G1 Evacuation Pause) 11M->4M(14M) 1.228ms
[28.915s][info][gc  ] GC(130) Pause Young (Normal) (G1 Evacuation Pause) 10M->4M(14M) 0.535ms
[28.923s][info][gc  ] GC(131) Pause Young (Normal) (G1 Evacuation Pause) 10M->4M(14M) 0.579ms
[28.932s][info][gc  ] GC(132) Pause Young (Normal) (G1 Evacuation Pause) 10M->4M(14M) 0.553ms
[28.940s][info][gc  ] GC(133) Pause Young (Normal) (G1 Evacuation Pause) 10M->4M(132M) 0.719ms
[29.042s][info][gc  ] GC(134) Pause Young (Normal) (G1 Evacuation Pause) 80M->4M(132M) 0.669ms
[29.146s][info][gc  ] GC(135) Pause Young (Normal) (G1 Evacuation Pause) 80M->4M(132M) 0.627ms
[29.252s][info][gc  ] GC(136) Pause Young (Normal) (G1 Evacuation Pause) 80M->4M(132M) 0.629ms
[29.357s][info][gc  ] GC(137) Pause Young (Normal) (G1 Evacuation Pause) 80M->4M(132M) 0.675ms
[29.460s][info][gc  ] GC(138) Pause Young (Normal) (G1 Evacuation Pause) 80M->4M(132M) 0.656ms
[29.562s][info][gc  ] GC(139) Pause Young (Normal) (G1 Evacuation Pause) 80M->4M(132M) 0.615ms
[29.666s][info][gc  ] GC(140) Pause Young (Normal) (G1 Evacuation Pause) 80M->4M(132M) 0.618ms
[29.772s][info][gc  ] GC(141) Pause Young (Normal) (G1 Evacuation Pause) 80M->4M(132M) 0.663ms
[29.876s][info][gc  ] GC(142) Pause Young (Normal) (G1 Evacuation Pause) 80M->4M(132M) 0.626ms
[29.982s][info][gc  ] GC(143) Pause Young (Normal) (G1 Evacuation Pause) 80M->4M(132M) 0.643ms
[1752668547350ms] ################# wrk finish #################


</details>

<details>

<summary>W/ the change + -XX:MaxHeapFreeRatio=100</summary>

The heap is not shrunk at all; the number of GCs before and after C/R is roughly the same.


[1752669161441ms] ################# wrk start  #################
[26.872s][info][gc] GC(133) Pause Young (Normal) (G1 Evacuation Pause) 152M->4M(252M) 0.666ms
[27.068s][info][gc] GC(134) Pause Young (Normal) (G1 Evacuation Pause) 152M->4M(252M) 0.648ms
[27.270s][info][gc] GC(135) Pause Young (Normal) (G1 Evacuation Pause) 152M->4M(252M) 0.697ms
[27.469s][info][gc] GC(136) Pause Young (Normal) (G1 Evacuation Pause) 152M->4M(252M) 0.687ms
[27.666s][info][gc] GC(137) Pause Young (Normal) (G1 Evacuation Pause) 152M->4M(252M) 0.671ms
[27.865s][info][gc] GC(138) Pause Young (Normal) (G1 Evacuation Pause) 152M->4M(252M) 0.606ms
[1752669162544ms] ################# wrk finish #################
2025.07.16 14:32:42.733 INFO Starting checkpoint
2025.07.16 14:32:42.736 INFO [0x2bde2f6b] @default socket closed.
[28.113s][info][gc] GC(139) Pause Full (System.gc()) 40M->3M(252M) 8.054ms
[28.125s][info][gc] GC(140) Pause Full (System.gc()) 4M->3M(252M) 4.707ms
[28.146s][info][gc] GC(141) Pause Full (FullGCAlot) 5M->3M(252M) 13.005ms
[28.148s][info][crac] Checkpoint ...
2025.07.16 14:32:43.336 INFO [0x0c20af4b] http://0.0.0.0:8080 bound for socket '@default'
2025.07.16 14:32:43.338 INFO Restored all channels in 3 milliseconds. 84 milliseconds since JVM snapshot restore. Java 25-internal-adhoc.timpushkin.crac
[1752669163387ms] ################# wrk start  #################
[28.966s][info][gc  ] GC(142) Pause Young (Normal) (G1 Evacuation Pause) 153M->4M(252M) 1.357ms
[29.170s][info][gc  ] GC(143) Pause Young (Normal) (G1 Evacuation Pause) 152M->3M(252M) 0.711ms
[29.372s][info][gc  ] GC(144) Pause Young (Normal) (G1 Evacuation Pause) 151M->3M(252M) 0.699ms
[29.573s][info][gc  ] GC(145) Pause Young (Normal) (G1 Evacuation Pause) 151M->3M(252M) 0.700ms
[29.774s][info][gc  ] GC(146) Pause Young (Normal) (G1 Evacuation Pause) 151M->4M(252M) 0.648ms
[1752669164490ms] ################# wrk finish #################


</details>
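
For anyone who wants to compare such logs mechanically rather than by eye, here is a minimal Python sketch that extracts the pause kind and the heap figures from `[gc]` pause lines in the format shown above. The regular expression and the `parse_pause` helper are assumptions written for this log format only; they are not part of the PR and have not been checked against other `-Xlog` decorations.

```python
import re

# Matches e.g. "GC(126) Pause Full (System.gc()) 84M->3M(28M) 8.329ms"
PAUSE = re.compile(
    r"GC\((?P<id>\d+)\) Pause (?P<kind>.+?) "
    r"(?P<before>\d+)M->(?P<after>\d+)M\((?P<capacity>\d+)M\) "
    r"(?P<ms>[\d.]+)ms"
)


def parse_pause(line):
    """Return a dict describing a GC pause line, or None for any other line."""
    m = PAUSE.search(line)
    if m is None:
        return None
    return {
        "id": int(m["id"]),
        "kind": m["kind"],
        "before_mb": int(m["before"]),
        "after_mb": int(m["after"]),
        "capacity_mb": int(m["capacity"]),
        "pause_ms": float(m["ms"]),
    }


sample = [
    "[28.165s][info][gc] GC(128) Pause Full (FullGCAlot) 4M->3M(4M) 7.684ms",
    "[28.750s][info][gc  ] GC(129) Pause Young (Concurrent Start) (G1 Evacuation Pause) 3M->3M(4M) 5.813ms",
    "[28.750s][info][gc  ] GC(130) Concurrent Mark Cycle",
    "[28.762s][info][gc  ] GC(130) Pause Remark 3M->3M(8M) 3.104ms",
]

pauses = [p for p in map(parse_pause, sample) if p is not None]
for p in pauses:
    print(p["id"], p["kind"], p["capacity_mb"])
```

Feeding the four logs through such a parser and grouping by `kind` would make the absence of `Concurrent Start`, `Remark` and `Mixed` pauses after restore easy to check.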

-------------

Commit messages:
 - Cleanup memory instead of uncommitting it on G1

Changes: https://git.openjdk.org/crac/pull/243/files
  Webrev: https://webrevs.openjdk.org/?repo=crac&pr=243&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8362418
  Stats: 57 lines in 10 files changed: 28 ins; 24 del; 5 mod
  Patch: https://git.openjdk.org/crac/pull/243.diff
  Fetch: git fetch https://git.openjdk.org/crac.git pull/243/head:pull/243

PR: https://git.openjdk.org/crac/pull/243

