RFR: 8338534: GenShen: Handle alloc failure differently when immediate garbage is pending

Y. Srinivas Ramakrishna ysr at openjdk.org
Fri Aug 30 23:29:52 UTC 2024


On Wed, 21 Aug 2024 14:48:55 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

> Several changes are implemented here:
> 1. Re-order the phases that execute immediately after final-mark so that we do concurrent-cleanup quicker (but still after concurrent weak references)
> 2. After immediate garbage has been reclaimed by concurrent cleanup, notify waiting allocators
> 3. If an allocation failure occurs while immediate garbage recycling is pending, stall the allocation but do not cancel the concurrent gc.

As you suggested offline, I agree that it might make sense to do (1) separately, and then do (2+3).

Left a few comments mainly on (2+3), but I'm still missing the solution to the problem described in the ticket. I'll chat with you offline to get clarity.

src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1071:

> 1069:   ShenandoahHeap* heap = ShenandoahHeap::heap();
> 1070:   if (heap->free_set()->recycle_trash()) {
> 1071:     heap->control_thread()->notify_alloc_failure_waiters(false);

Can you motivate this notification? As far as I can tell, all waiters will react to the notification by waking up, finding that the variables are still set, and consequently go back to wait.

I am sure I am missing something here, or you didn't make an intended change to allow waiters to retry allocation after waking up and go back to sleep if they didn't succeed?

A documentation comment would definitely help cross the t's and dot the i's for the reader.

src/hotspot/share/gc/shenandoah/shenandoahController.cpp line 56:

> 54: }
> 55: 
> 56: void ShenandoahController::anticipate_immediate_garbage(size_t anticipated_immediate_garbage) {

Suggested rename see further above. `set_<field>`.

src/hotspot/share/gc/shenandoah/shenandoahController.cpp line 104:

> 102:     _alloc_failure_gc.unset();
> 103:     _humongous_alloc_failure_gc.unset();
> 104:   }

For good hygiene, I'd move the variable value changes into the monitor which is held when waiting or notifying. I realize this doesn't matter for correctness, but makes debugging easier.

Further, if you protect the updates and reads of the variables with the lock, you don't need to do the extra atomic ops.

You'd need to examine all sets/gets and waits/notifys to make sure this works, but I am guessing it will, and it'll also improve performance.

However, that can be done in a separate effort, if you prefer, for which I'm happy to file a separate ticket for that investigation/change.

src/hotspot/share/gc/shenandoah/shenandoahController.hpp line 76:

> 74:   void handle_alloc_failure(ShenandoahAllocRequest& req, bool block);
> 75: 
> 76:   void anticipate_immediate_garbage(size_t anticipated_immediate_garbage_words);

a 1-line documentation comment on the role of the field (and that the method sets it -- why not simply call it `set_foo(value)` for field `_foo` ? I realize you want readers to get the most recently written value, hence the atomic store & load.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1243:

> 1241:   }
> 1242:   return true;
> 1243: }

If I understood your intent, I think this has a bug because it always returns true here. I believe you just wanted:


    if (r->is_trash()) {
      r->recycle();
      return true;
    }
   return false;

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1246:

> 1244: 
> 1245: bool ShenandoahFreeSet::recycle_trash() {
> 1246:   bool result = false;

I'd take the opportunity to do some counting verification here.


int n_trash_regions = 0;


Read on ...

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1267:

> 1265:     while (idx < count && os::javaTimeNanos() < deadline) {
> 1266:       if (try_recycle_trashed(_trash_regions[idx++])) {
> 1267:         result = true;

...

   n_trash_regions++;

...

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1272:

> 1270:   }
> 1271:   _heap->control_thread()->anticipate_immediate_garbage((size_t) 0);
> 1272:   return result;

...
``` 
clear_anticipated_immediate_garage(n_trash_regions*HeapRegionSize);
   return n_trash_regions > 0;

with

void ...::clear_anticipated_immediate_garbage(size_t aig) {
    assert(_anticipated_immediate_garbage == aig, "Mismatch?");
    _anticipated_immediate_garbage = 0;
}


Is this an intended invariant? I think it is, but don't understand enough of the details to be certain.

src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 756:

> 754:     size_t first_old, last_old, num_old;
> 755:     heap->free_set()->prepare_to_rebuild(young_cset_regions, old_cset_regions, first_old, last_old, num_old);
> 756:     size_t anticipated_immediate_garbage = (old_cset_regions + young_cset_regions) * ShenandoahHeapRegion::region_size_words();

This makes it sound like old_cset_regions & young_cset_regions hold all regions that will be part of `immediate garbage` -- which indeed is the case. The name `prepare_to_rebuild()` does not give one much clue as to the fact that that's happening when we return from the call, and the API spec of the method does not explicitly specify it. One needs to read into the code of the method `find_regions_with_alloc_capacity()` to realize that this is what is happening.

In summary, firstly, I feel some of these methods are in need of tighter header file documentation of post-conditions that callers are relying on and, secondly, I feel given the extremely fat APIs (lots of reference variables that are modified by these methods) that some amount of refactoring is needed in the longer term. The refactoring should be a separate effort, but in the shorter term I think the API/spec documentation of `prepare_to_rebuild` and `find_regions_with_alloc_capacity` could be improved.

src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2467:

> 2465:   size_t anticipated_immediate_garbage = (old_cset_regions + young_cset_regions) * ShenandoahHeapRegion::region_size_words();
> 2466:   control_thread()->anticipate_immediate_garbage(anticipated_immediate_garbage);
> 2467:   

This is the line that has two whitespaces, vide the jcheck whitespace error above.

src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.cpp line 478:

> 476:     heap->free_set()->prepare_to_rebuild(cset_young_regions, cset_old_regions, first_old, last_old, num_old);
> 477:     size_t anticipated_immediate_garbage = (cset_young_regions + cset_old_regions) * ShenandoahHeapRegion::region_size_words();
> 478:     heap->control_thread()->anticipate_immediate_garbage(anticipated_immediate_garbage);

suggest `set_<foo>` for changing field value `<foo>`.

-------------

PR Review: https://git.openjdk.org/shenandoah/pull/479#pullrequestreview-2270677989
PR Review Comment: https://git.openjdk.org/shenandoah/pull/479#discussion_r1737591476
PR Review Comment: https://git.openjdk.org/shenandoah/pull/479#discussion_r1737612723
PR Review Comment: https://git.openjdk.org/shenandoah/pull/479#discussion_r1737584419
PR Review Comment: https://git.openjdk.org/shenandoah/pull/479#discussion_r1737559164
PR Review Comment: https://git.openjdk.org/shenandoah/pull/479#discussion_r1737618171
PR Review Comment: https://git.openjdk.org/shenandoah/pull/479#discussion_r1737602478
PR Review Comment: https://git.openjdk.org/shenandoah/pull/479#discussion_r1737603682
PR Review Comment: https://git.openjdk.org/shenandoah/pull/479#discussion_r1737609657
PR Review Comment: https://git.openjdk.org/shenandoah/pull/479#discussion_r1739499541
PR Review Comment: https://git.openjdk.org/shenandoah/pull/479#discussion_r1737601762
PR Review Comment: https://git.openjdk.org/shenandoah/pull/479#discussion_r1739511544


More information about the shenandoah-dev mailing list