Allocation pacing and graceful degradation in ShenandoahGC

Alex Dubrouski adubrouski at linkedin.com
Tue Nov 22 19:49:48 UTC 2022


Hi Ashutosh,

Thanks a lot for your response. I checked the wiki, but it did not contain the details, so I had to look into the source code.
I also noticed one thing: the wiki says that a degenerated STW GC continues the concurrent cycle, while the code shows that the logic is more complex:
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahDegeneratedGC.cpp#L134

Thanks again.

Regards,
Alex Dubrouski

From: Ashutosh Mehra <asmehra at redhat.com>
Date: Tuesday, November 22, 2022 at 9:14 AM
To: Alex Dubrouski <adubrouski at linkedin.com>
Cc: "shenandoah-dev at openjdk.org" <shenandoah-dev at openjdk.org>
Subject: Re: Allocation pacing and graceful degradation in ShenandoahGC

Hi Alex,
I have spent some time understanding the Shenandoah Pacer and I will try to answer your questions as best I can.

Does that mean that each Java thread goes through runtime -> heap to allocate, and that's how pacer paces it? So we just pace any allocating thread and threads that allocate more will just hit this code more often.

Allocations from a TLAB do not go through the pacer, but allocating a new TLAB does. And yes, a thread that allocates more is more likely to hit the pacer.
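
A toy sketch of that point, in Python rather than HotSpot C++. The names ToyPacer, ToyThread and pacer_hits, and the TLAB size of 1024 words, are made up for illustration; the model also assumes every request fits into a fresh TLAB, which ignores large allocations made outside TLABs.

```python
class ToyPacer:
    """Hypothetical stand-in that only counts how often it is consulted."""

    def __init__(self):
        self.calls = 0

    def pace(self, words):
        self.calls += 1


class ToyThread:
    def __init__(self, pacer, tlab_words):
        self.pacer = pacer
        self.tlab_words = tlab_words
        self.tlab_free = 0  # the thread starts without a TLAB

    def allocate(self, words):
        if words <= self.tlab_free:
            # Fast path: bump-allocate inside the current TLAB, no pacer involved.
            self.tlab_free -= words
            return
        # Slow path: a fresh TLAB is requested from the heap, which consults the pacer.
        self.pacer.pace(self.tlab_words)
        self.tlab_free = self.tlab_words - words


def pacer_hits(num_allocs, words_per_alloc, tlab_words=1024):
    """Return how many times the pacer was consulted for a run of allocations."""
    pacer = ToyPacer()
    thread = ToyThread(pacer, tlab_words)
    for _ in range(num_allocs):
        thread.allocate(words_per_alloc)
    return pacer.calls
```

In this model, a thread that allocates ten times as much (pacer_hits(1000, 8) versus pacer_hits(100, 8)) reaches the pacer 8 times instead of once.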

so I assume if there is no budget available it will pace a thread for up to 10ms, but it does not imply allocation failure.

Yes, it does not imply allocation failure. It is just a mechanism to ensure that the concurrent GC is able to keep pace with the allocation rate.

Heap class tries to allocate under lock and, if unsuccessful, treats this as an allocation failure and handles it by calling ShenandoahControlThread. Does that mean that Pacer can’t cause GC to switch to degenerated mode, or am I missing something?

Pacer itself does not cause GC to degenerate. It only delays the mutator thread. As you mentioned earlier, after the wait time expires the mutator thread still attempts the allocation, which may succeed.
If the allocation rate is high, the pacer may not be able to keep up, and in that case the mutator thread may suffer an allocation failure, which results in a degenerated GC cycle.
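
The separation between pacing and allocation failure can be sketched as a toy Python model. This is not the HotSpot logic: the function allocate, the Outcome type, the linear replenish_per_ms rate and the abstract "words" are all assumptions for illustration. Only the 10 ms cap comes from this thread.

```python
from dataclasses import dataclass

MAX_DELAY_MS = 10  # the ShenandoahPacingMaxDelay value quoted in this thread


@dataclass
class Outcome:
    paced_ms: int
    result: str  # "ok" or "alloc_failure"


def allocate(request, pacer_budget, heap_free, replenish_per_ms):
    """Toy model of one paced allocation; all sizes are abstract words."""
    paced_ms = 0
    # Pacing: wait while the budget is short, but never longer than the cap.
    while pacer_budget < request and paced_ms < MAX_DELAY_MS:
        pacer_budget += replenish_per_ms  # assumed: GC progress refills the budget
        paced_ms += 1
    # The pacer never reports failure; the allocation is attempted regardless.
    if heap_free >= request:
        return Outcome(paced_ms, "ok")
    # Only a failed heap allocation is handed to the control thread.
    return Outcome(paced_ms, "alloc_failure")
```

In this model a thread can be paced for the full 10 ms and still succeed (allocate(100, 0, 1000, 1)), and it can fail without being paced at all (allocate(100, 200, 50, 0)).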

If Pacer doesn't have budget to allocate memory it paces the thread, but is there any global budget for pacing time, or is it only a per-thread max (ShenandoahPacingMaxDelay)?

The wait time introduced by Pacer is per thread and bounded by ShenandoahPacingMaxDelay. I don't think there is any global budget for pacing time.
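
As a small illustration of what a per-thread bound without a global budget implies, here is a hedged Python sketch. The function total_pacing_ms and its inputs are hypothetical; it assumes each entry is the delay one thread would need on a single pass through the pacer.

```python
def total_pacing_ms(wanted_delays_ms, max_delay_ms=10):
    """Sum per-thread delays, each capped individually; no global cap is applied."""
    return sum(min(delay, max_delay_ms) for delay in wanted_delays_ms)
```

With 100 threads each wanting 25 ms, every thread waits at most 10 ms, yet the aggregate is 1000 ms, so the total grows with the number of paced threads.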

It would be really nice if you could shed some light on these transitions: Concurrent Mode -> Pacing (single thread and total pacing time for all threads) -> most importantly, the logic of transitioning from pacing to degenerated GC

I will try to summarize the transition to degenerated GC.
Allocation failures are signaled by the mutator thread, which sets the flag _alloc_failure_gc [0] in ShenandoahControlThread::handle_alloc_failure() and then waits on _alloc_failure_waiters_lock [1] for a notification from the control thread once it has handled the allocation failure.
The control thread executing run_service() checks whether the flag _alloc_failure_gc is set [2]; if so, an allocation failure is pending. It then handles the failure by running either a Degenerated GC or a Full GC cycle. That decision depends on ShenandoahHeuristics::should_degenerate_cycle(), which performs a simple counter check on the number of consecutive degenerated GC cycles.
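
The handshake and the counter check described above can be sketched in Python. This is a simplified model, not the HotSpot code: the class ToyControlThread, the single condition variable, the threshold of 3 and the `<=` comparison are assumptions, and the real control loop does far more (a successful concurrent cycle, for example, is not modeled here).

```python
import threading


class ToyControlThread:
    def __init__(self, full_gc_threshold=3):
        self.full_gc_threshold = full_gc_threshold  # assumed value for illustration
        self.degenerated_in_a_row = 0
        self.alloc_failure_gc = False  # flag raised by mutators
        self.lock = threading.Condition()
        self.log = []

    def should_degenerate_cycle(self):
        # Simple counter check on consecutive degenerated cycles.
        return self.degenerated_in_a_row <= self.full_gc_threshold

    def handle_alloc_failure(self):
        """Mutator side: raise the flag, then block until the failure is handled."""
        with self.lock:
            self.alloc_failure_gc = True
            self.lock.notify_all()
            self.lock.wait_for(lambda: not self.alloc_failure_gc)

    def service_once(self):
        """Control side: wait for a pending failure, pick a cycle type, notify waiters."""
        with self.lock:
            self.lock.wait_for(lambda: self.alloc_failure_gc)
            if self.should_degenerate_cycle():
                self.degenerated_in_a_row += 1
                self.log.append("degenerated")
            else:
                self.degenerated_in_a_row = 0  # a full GC resets the streak
                self.log.append("full")
            self.alloc_failure_gc = False
            self.lock.notify_all()


def run(failures):
    """Drive the model through a number of back-to-back allocation failures."""
    control = ToyControlThread()
    for _ in range(failures):
        worker = threading.Thread(target=control.service_once)
        worker.start()
        control.handle_alloc_failure()
        worker.join()
    return control.log
```

Under these assumptions, six back-to-back failures yield four degenerated cycles, then one full cycle, then a degenerated cycle again.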

There are some details on pacing and degenerated GC in the wiki [4], in case you have not looked at it.

I hope this helps you to move forward in your effort.

[0] https://github.com/openjdk/jdk/blob/fba763f82528d2825831a26b4ae4e090c602208f/src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp#L531
[1] https://github.com/openjdk/jdk/blob/fba763f82528d2825831a26b4ae4e090c602208f/src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp#L543
[2] https://github.com/openjdk/jdk/blob/fba763f82528d2825831a26b4ae4e090c602208f/src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp#L100
[3] https://github.com/openjdk/jdk/blob/fba763f82528d2825831a26b4ae4e090c602208f/src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp#L127
[4] https://wiki.openjdk.org/display/shenandoah/Main

Thanks,
Ashutosh Mehra


On Mon, Nov 14, 2022 at 4:07 PM Alex Dubrouski <adubrouski at linkedin.com> wrote:
Good afternoon everyone,

I went through all the video presentations and slides by Aleksey Shipilev and Roman Kennke about ShenandoahGC but could not find an answer to my question. I am trying to find more details about the transitions between modes in ShenandoahGC.
I am looking for a way to assess concurrent collector health in real time using different metrics.

Here is the schema of transitions. It shows that an allocation failure causes a degenerated GC cycle, but it does not mention allocation pacing at all:
https://github.com/openjdk/jdk/blob/fba763f82528d2825831a26b4ae4e090c602208f/src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp#L361

I tried to dig further into this logic, but I need your help to put all the pieces together.
I was not able to trace the entry point reliably, but this might be it, allocation on the heap outside of a TLAB:
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shared/memAllocator.cpp#L258
In the case of ShenandoahGC I assume we call
https://github.com/openjdk/jdk/blame/739769c8fc4b496f08a92225a12d07414537b6c0/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp#L901
which then calls
https://github.com/openjdk/jdk/blame/739769c8fc4b496f08a92225a12d07414537b6c0/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp#L821
If a mutator is allocating and the pacer is enabled (the default), we enter the Pacer:
https://github.com/openjdk/jdk/blame/739769c8fc4b496f08a92225a12d07414537b6c0/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp#L828
https://github.com/openjdk/jdk/blame/master/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L229
and, I assume, it tries to handle it nicely; if not, we start pacing:
https://github.com/openjdk/jdk/blame/master/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L253

I have a few questions here:
- Could you please explain a bit how the system of taxes works? I assume mutators claim budget while GC replenishes it asynchronously, but the details are missing and there are no comments in the code.
- To pace, we use the wait function from the Monitor class
https://github.com/openjdk/jdk/blame/master/src/hotspot/share/runtime/mutex.cpp#L232
  but the first thing it does is get the current Java thread. Does that mean that each Java thread goes through runtime -> heap to allocate, and that's how pacer paces it? So we just pace any allocating thread and threads that allocate more will just hit this code more often.

- Pacer uses ShenandoahPacingMaxDelay (10ms) as max, but pace_for_allocation returns void
https://github.com/openjdk/jdk/blob/fba763f82528d2825831a26b4ae4e090c602208f/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp#L826
https://github.com/openjdk/jdk/blame/master/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L225
so I assume that if there is no budget available it will pace a thread for up to 10ms, but this does not imply allocation failure.
Heap class tries to allocate under lock and, if unsuccessful, treats this as an allocation failure and handles it by calling ShenandoahControlThread. Does that mean that Pacer can’t cause GC to switch to degenerated mode, or am I missing something?

- If Pacer doesn't have budget to allocate memory it paces the thread, but is there any global budget for pacing time, or is it only a per-thread max (ShenandoahPacingMaxDelay)?
- It would be really nice if you could shed some light on these transitions: Concurrent Mode -> Pacing (single thread and total pacing time for all threads) -> most importantly, the logic of transitioning from pacing to degenerated GC

I am trying to build a model that can tell me whether GC is healthy (fully concurrent), a bit unhealthy (pacing), or unhealthy (degenerated or full GC), and how close we are to the edge of the next state (a bit unhealthy -> unhealthy).
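
A minimal Python sketch of the three-state classification described above. The names Health and classify are hypothetical, and the inputs are assumed to be counts collected over one observation window; how to obtain them from a running JVM is not covered here.

```python
from enum import Enum


class Health(Enum):
    HEALTHY = "fully concurrent"
    PACING = "a bit unhealthy"
    UNHEALTHY = "degenerated or full GC"


def classify(paced_ms, degenerated_cycles, full_cycles):
    """Map one observation window to a health state, worst condition first."""
    if degenerated_cycles > 0 or full_cycles > 0:
        return Health.UNHEALTHY
    if paced_ms > 0:
        return Health.PACING
    return Health.HEALTHY
```

The sketch covers only the state itself; estimating how close a window is to the next state would need additional signals that this thread does not specify.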

No rush and thanks a lot in advance.

Regards,
Alex Dubrouski
