RFR: 8335356: Shenandoah: Improve concurrent cleanup locking

Xiaolong Peng xpeng at openjdk.org
Tue Jul 9 23:53:23 UTC 2024


Hi all,
   This PR is to improve the usage of heap lock in ShenandoahFreeSet::recycle_trash, the original observation mentioned in the [bug](https://bugs.openjdk.org/browse/JDK-8335356) should be caused by an uncommitted/reverted change I added when Aleksey and I worked on [JDK-8331411](https://bugs.openjdk.org/browse/JDK-8331411). Even the change was not committed, the way ShenandoahFreeSet::recycle_trash using heap lock is still not efficient, we think we should improve it. 
    With the logs added in this commit https://github.com/openjdk/jdk/pull/20086/commits/5688ee2c0754483818a89bb7915f58a7464c2df2, I got some key metrics: average time to acquire heap lock is about 450 ~ 900 ns, average time to recycle one trash regions is about 600ns (if not batched, it is 1000+ns, might be related to branch prediction). The current implementation takes heap lock once for every trash region, assume there are 1000 regions to recycle, the time wasted on acquiring heap lock is more than 0.6ms.
    
   The PR splits the recycling process into two steps: 1. Filter out all the trash regions; 2. recycle the trash regions in batches. I can see some benefits which improve the performance:
    1. Less time spent on acquiring heap lock, less contention with mutators/allocators
    2.Simpler loops in filtering and batch recycling, presumably benefit CPU branch prediction
   
Here are some logs from test running h2 benchmark:

TIP with debug log, [code link](https://github.com/openjdk/jdk/compare/master...pengxiaolong:jdk:JDK-8335356-baseline?expand=1), Average time per region: 2312 ns

[6.013s][info][gc] GC(0) Recycled 0 regions in 58675ns, break down: acquiring lock -> 0, recycling -> 0.
[6.093s][info][gc] GC(0) Recycled 641 regions in 3025757ns, break down: acquiring lock -> 260016, recycling -> 548345.
[9.354s][info][gc] GC(1) Recycled 1 regions in 61793ns, break down: acquiring lock -> 481, recycling -> 1141.
[9.428s][info][gc] GC(1) Recycled 600 regions in 1083206ns, break down: acquiring lock -> 256578, recycling -> 511334.
[12.145s][info][gc] GC(2) Recycled 35 regions in 118390ns, break down: acquiring lock -> 13703, recycling -> 27438.
[12.202s][info][gc] GC(2) Recycled 553 regions in 911747ns, break down: acquiring lock -> 209511, recycling -> 426575.
[15.086s][info][gc] GC(3) Recycled 106 regions in 218396ns, break down: acquiring lock -> 39089, recycling -> 80520.
[15.164s][info][gc] GC(3) Recycled 454 regions in 762128ns, break down: acquiring lock -> 172583, recycling -> 351263.
[18.781s][info][gc] GC(4) Recycled 119 regions in 244275ns, break down: acquiring lock -> 45741, recycling -> 92841.
[18.866s][info][gc] GC(4) Recycled 437 regions in 721638ns, break down: acquiring lock -> 162149, recycling -> 329194.
[20.735s][info][gc] GC(5) Recycled 119 regions in 244292ns, break down: acquiring lock -> 45992, recycling -> 93320.
[20.788s][info][gc] GC(5) Recycled 193 regions in 364782ns, break down: acquiring lock -> 73267, recycling -> 149695.
[21.699s][info][gc] GC(6) Recycled 0 regions in 92333ns, break down: acquiring lock -> 0, recycling -> 0.
[21.856s][info][gc] GC(6) Recycled 552 regions in 1372852ns, break down: acquiring lock -> 302017, recycling -> 621198.
[22.196s][info][gc] GC(7) Recycled 0 regions in 80586ns, break down: acquiring lock -> 0, recycling -> 0.
[22.361s][info][gc] GC(7) Recycled 531 regions in 1433166ns, break down: acquiring lock -> 365550, recycling -> 632302.
[22.720s][info][gc] GC(8) Recycled 0 regions in 74306ns, break down: acquiring lock -> 0, recycling -> 0.
[22.898s][info][gc] GC(8) Recycled 530 regions in 1331485ns, break down: acquiring lock -> 299571, recycling -> 620722.
[23.147s][info][gc] GC(9) Recycled 0 regions in 77732ns, break down: acquiring lock -> 0, recycling -> 0.
[23.331s][info][gc] GC(9) Recycled 531 regions in 1361709ns, break down: acquiring lock -> 311774, recycling -> 653233.
[24.257s][info][gc] GC(10) Recycled 0 regions in 62440ns, break down: acquiring lock -> 0, recycling -> 0.
[24.460s][info][gc] GC(10) Recycled 1480 regions in 3611397ns, break down: acquiring lock -> 906760, recycling -> 1729345.
[25.407s][info][gc] GC(11) Recycled 1 regions in 61455ns, break down: acquiring lock -> 585, recycling -> 1362.
[25.597s][info][gc] GC(11) Recycled 1438 regions in 2313578ns, break down: acquiring lock -> 546099, recycling -> 1126582.


Optimized, but w/o batching optimization, basically recycle all trash with one single lock acquirement , [code link](https://github.com/openjdk/jdk/pull/20086/commits/5688ee2c0754483818a89bb7915f58a7464c2df2), Average time per region: 560 ns

[6.097s][info][gc] GC(0) Recycled 641 regions in 280280ns, break down: filtering -> 20216, taking heap lock -> 569, recycling -> 259495.
[9.568s][info][gc] GC(1) Recycled 1 regions in 10592ns, break down: filtering -> 8695, taking heap lock -> 568, recycling -> 1329.
[9.643s][info][gc] GC(1) Recycled 600 regions in 260624ns, break down: filtering -> 8023, taking heap lock -> 736, recycling -> 251865.
[12.648s][info][gc] GC(2) Recycled 34 regions in 24651ns, break down: filtering -> 9739, taking heap lock -> 524, recycling -> 14388.
[12.706s][info][gc] GC(2) Recycled 552 regions in 252231ns, break down: filtering -> 26613, taking heap lock -> 624, recycling -> 224994.
[15.579s][info][gc] GC(3) Recycled 102 regions in 50140ns, break down: filtering -> 7846, taking heap lock -> 550, recycling -> 41744.
[15.662s][info][gc] GC(3) Recycled 461 regions in 187735ns, break down: filtering -> 8851, taking heap lock -> 479, recycling -> 178405.
[19.269s][info][gc] GC(4) Recycled 117 regions in 55504ns, break down: filtering -> 8709, taking heap lock -> 548, recycling -> 46247.
[19.360s][info][gc] GC(4) Recycled 437 regions in 187981ns, break down: filtering -> 9582, taking heap lock -> 505, recycling -> 177894.
[21.269s][info][gc] GC(5) Recycled 124 regions in 57666ns, break down: filtering -> 8986, taking heap lock -> 537, recycling -> 48143.
[21.327s][info][gc] GC(5) Recycled 190 regions in 85890ns, break down: filtering -> 7768, taking heap lock -> 494, recycling -> 77628.
[22.367s][info][gc] GC(6) Recycled 547 regions in 378074ns, break down: filtering -> 11634, taking heap lock -> 714, recycling -> 365726.
[22.733s][info][gc] GC(7) Recycled 2 regions in 27277ns, break down: filtering -> 24172, taking heap lock -> 741, recycling -> 2364.
[22.895s][info][gc] GC(7) Recycled 533 regions in 339216ns, break down: filtering -> 12006, taking heap lock -> 778, recycling -> 326432.
[23.213s][info][gc] GC(8) Recycled 1 regions in 28104ns, break down: filtering -> 25781, taking heap lock -> 722, recycling -> 1601.
[23.393s][info][gc] GC(8) Recycled 529 regions in 341289ns, break down: filtering -> 10257, taking heap lock -> 682, recycling -> 330350.
[23.861s][info][gc] GC(9) Recycled 523 regions in 339914ns, break down: filtering -> 12715, taking heap lock -> 746, recycling -> 326453.
[25.120s][info][gc] GC(10) Recycled 1515 regions in 1148153ns, break down: filtering -> 13906, taking heap lock -> 739, recycling -> 1133508.
[25.873s][info][gc] GC(11) Recycled 1 regions in 13233ns, break down: filtering -> 11375, taking heap lock -> 568, recycling -> 1290.
[26.109s][info][gc] GC(11) Recycled 1237 regions in 493244ns, break down: filtering -> 12178, taking heap lock -> 557, recycling -> 480509.



With batch size of 128, [code link](https://github.com/openjdk/jdk/pull/20086/commits/ece3f3c7612560e1b66ca1e56e896488e3c593bc), Average time per region: 533 ns

[6.066s][info][gc] GC(0) Recycled 641 regions in 290048ns, break down: filtering -> 20937, recycling -> 257691, yields -> 6088.
[9.514s][info][gc] GC(1) Recycled 1 regions in 12863ns, break down: filtering -> 9186, recycling -> 1255, yields -> 1487.
[9.591s][info][gc] GC(1) Recycled 601 regions in 285321ns, break down: filtering -> 11941, recycling -> 265095, yields -> 5540.
[12.590s][info][gc] GC(2) Recycled 35 regions in 27005ns, break down: filtering -> 9873, recycling -> 14893, yields -> 1341.
[12.650s][info][gc] GC(2) Recycled 551 regions in 231127ns, break down: filtering -> 10833, recycling -> 212840, yields -> 5054.
[15.504s][info][gc] GC(3) Recycled 101 regions in 54762ns, break down: filtering -> 9759, recycling -> 42579, yields -> 1500.
[15.591s][info][gc] GC(3) Recycled 466 regions in 197675ns, break down: filtering -> 9672, recycling -> 181928, yields -> 4095.
[19.231s][info][gc] GC(4) Recycled 121 regions in 58985ns, break down: filtering -> 8601, recycling -> 47931, yields -> 1565.
[19.322s][info][gc] GC(4) Recycled 439 regions in 186173ns, break down: filtering -> 9754, recycling -> 170179, yields -> 4269.
[21.204s][info][gc] GC(5) Recycled 120 regions in 59352ns, break down: filtering -> 8120, recycling -> 48746, yields -> 1602.
[21.257s][info][gc] GC(5) Recycled 191 regions in 89995ns, break down: filtering -> 8695, recycling -> 77421, yields -> 2596.
[22.291s][info][gc] GC(6) Recycled 550 regions in 344021ns, break down: filtering -> 7845, recycling -> 325227, yields -> 7251.
[22.789s][info][gc] GC(7) Recycled 535 regions in 352193ns, break down: filtering -> 11420, recycling -> 328723, yields -> 8067.
[23.265s][info][gc] GC(8) Recycled 530 regions in 344795ns, break down: filtering -> 12356, recycling -> 321523, yields -> 7389.
[23.731s][info][gc] GC(9) Recycled 526 regions in 268314ns, break down: filtering -> 11727, recycling -> 248182, yields -> 5403.
[24.913s][info][gc] GC(10) Recycled 1520 regions in 1035594ns, break down: filtering -> 15586, recycling -> 995272, yields -> 16585.
[25.827s][info][gc] GC(11) Recycled 1 regions in 12540ns, break down: filtering -> 9191, recycling -> 1162, yields -> 1286.
[25.997s][info][gc] GC(11) Recycled 1406 regions in 593511ns, break down: filtering -> 11703, recycling -> 566761, yields -> 10335.


Batch with timed lock up to 30us, PR version, Average time per region: 1118 ns

[6.103s][info][gc] GC(0) Recycled 0 regions in 9421ns with 0 batches.
[6.189s][info][gc] GC(0) Recycled 641 regions in 570953ns with 18 batches.
[9.481s][info][gc] GC(1) Recycled 1 regions in 11745ns with 1 batches.
[9.552s][info][gc] GC(1) Recycled 597 regions in 495402ns with 16 batches.
[12.295s][info][gc] GC(2) Recycled 35 regions in 36924ns with 1 batches.
[12.353s][info][gc] GC(2) Recycled 546 regions in 443037ns with 14 batches.
[15.226s][info][gc] GC(3) Recycled 100 regions in 92537ns with 3 batches.
[15.310s][info][gc] GC(3) Recycled 463 regions in 423945ns with 14 batches.
[19.031s][info][gc] GC(4) Recycled 118 regions in 107655ns with 4 batches.
[19.121s][info][gc] GC(4) Recycled 440 regions in 359861ns with 12 batches.
[21.094s][info][gc] GC(5) Recycled 125 regions in 108325ns with 4 batches.
[21.155s][info][gc] GC(5) Recycled 191 regions in 192925ns with 6 batches.
[22.038s][info][gc] GC(6) Recycled 0 regions in 23493ns with 0 batches.
[22.213s][info][gc] GC(6) Recycled 574 regions in 748833ns with 23 batches.
[22.548s][info][gc] GC(7) Recycled 1 regions in 24498ns with 1 batches.
[22.716s][info][gc] GC(7) Recycled 532 regions in 684182ns with 21 batches.
[23.232s][info][gc] GC(8) Recycled 1 regions in 14411ns with 1 batches.
[23.455s][info][gc] GC(8) Recycled 528 regions in 715436ns with 22 batches.
[23.766s][info][gc] GC(9) Recycled 0 regions in 12247ns with 0 batches.
[23.982s][info][gc] GC(9) Recycled 703 regions in 685842ns with 22 batches.
[24.557s][info][gc] GC(10) Recycled 1 regions in 15890ns with 1 batches.
[24.760s][info][gc] GC(10) Recycled 1142 regions in 1506585ns with 47 batches.
[25.524s][info][gc] GC(11) Recycled 1 regions in 12424ns with 1 batches.
[25.731s][info][gc] GC(11) Recycled 1262 regions in 1695506ns with 52 batches.


Decided on batch with timed lock for following reasons:
1. We can manage exactly how long it holds the heap lock(set to up to 30us), therefore manage exactly how much it could impact on long tail latencies in the worst case.
2. Not like static batch size, with timed lock the algorithm is adaptive to different hardwares/runtime, batch size will be automatically adjusted.  

Additional test:
- [x]  ```make clean test TEST=hotspot_gc_shenandoah```

==============================
   TEST                                              TOTAL  PASS  FAIL ERROR   
   jtreg:test/hotspot/jtreg:hotspot_gc_shenandoah      261   261     0     0   
==============================
TEST SUCCESS

-------------

Commit messages:
 - Code polish
 - Remove logs
 - Increase deadline to 30us
 - Timed lock
 - Remove inline and code polish
 - Polish
 - Remove empty lines
 - Polish the loops for recycling trash regions
 - Fix -Werror=reorder
 - Pre-allocate array for trash regions used in recycle_trash
 - ... and 9 more: https://git.openjdk.org/jdk/compare/a9b7f42f...dac1ae6e

Changes: https://git.openjdk.org/jdk/pull/20086/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20086&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8335356
  Stats: 18 lines in 2 files changed: 14 ins; 1 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/20086.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20086/head:pull/20086

PR: https://git.openjdk.org/jdk/pull/20086


More information about the hotspot-gc-dev mailing list