RFR: Traversal GC heuristics

Aleksey Shipilev shade at redhat.com
Tue Jan 23 16:06:20 UTC 2018


On 01/22/2018 11:17 PM, Roman Kennke wrote:
> - when growing the heap, we must make sure that the TAMS points to the end for the new regions, 
> otherwise we'd treat them as implicitly marked.

Is this the source of this spooky change?

   inline ShenandoahHeapRegion* get(size_t i) const {
-    assert (i < _active_end, "sanity");
+    assert (i < _reserved_end, "sanity");
     return _regions[i];
   }

get() is supposed to return only added regions. I think if you return something beyond
_active_end, you read garbage.
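
For reference, here is a minimal self-contained sketch (hypothetical names, plain C++, not the
actual ShenandoahHeapRegionSet code) of why I think asserting against _active_end is the right
thing: slots between _active_end and _reserved_end are reserved but not yet populated, so reading
them yields garbage:

   #include <cassert>
   #include <cstddef>

   struct Region { size_t index; };

   class RegionArraySketch {
     Region** _regions;      // backing array with _reserved_end slots
     size_t   _active_end;   // number of regions actually added so far
     size_t   _reserved_end; // total reserved slots; entries past _active_end are uninitialized

   public:
     RegionArraySketch(Region** backing, size_t reserved)
       : _regions(backing), _active_end(0), _reserved_end(reserved) {}

     void add(Region* r) {
       assert(_active_end < _reserved_end && "no room");
       _regions[_active_end++] = r;
     }

     Region* get(size_t i) const {
       // Checking against _active_end (not _reserved_end) catches callers
       // that would otherwise read an uninitialized slot.
       assert(i < _active_end && "sanity");
       return _regions[i];
     }
   };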


> - folded 'SATB' queue processing into thread stack scanning. The problem here is that iterating 
> the threads 2x is cumbersome because of the claiming protocol: we need to fire the task/workers
> 2x: once for the SATB queues, once for the thread scanning. I folded it into one pass. This
> required a (trivial) extension in the upstream parallel thread scanning/iteration protocol.

So this is the source of the extension of RootProcessor and Thread methods? I am a bit uneasy about
this (mostly because it raises backporting questions). I'd rather include this in RootProcessor
right away, and assert that nothing passes a non-NULL ThreadClosure there. Then, the sh/jdk8u, sh/jdk9
and sh/jdk10 versions would agree on the shape of the RootProcessor methods and the calls to them,
while sh/jdk10 would call RootProcessor with a non-NULL ThreadClosure, and it would *also* implement
the relevant parts in Thread, as sketched below.
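
Roughly, the shape I have in mind looks like this (a hedged sketch with simplified, hypothetical
signatures, not the actual RootProcessor code): all branches accept the ThreadClosure parameter so
the call sites agree, and the branches that do not implement the combined pass just assert it is
NULL:

   #include <cassert>
   #include <cstddef>

   class OopClosure;     // stand-ins for the real closure types
   class ThreadClosure;

   // sh/jdk8u, sh/jdk9: same signature, but the extra closure must be NULL.
   class RootProcessorOldSketch {
   public:
     void process_all_roots(OopClosure* oops, ThreadClosure* thread_cl) {
       assert(thread_cl == NULL && "combined pass not implemented on this branch");
       // ... existing root scanning, no thread closure applied ...
       (void)oops;
     }
   };

   // sh/jdk10: same signature, a non-NULL ThreadClosure is applied during the
   // single thread-iteration pass, folding SATB draining into stack scanning.
   class RootProcessorJDK10Sketch {
   public:
     void process_all_roots(OopClosure* oops, ThreadClosure* thread_cl) {
       // ... iterate threads once, applying both oops and thread_cl ...
       (void)oops; (void)thread_cl;
     }
   };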


> - I tripped an assert in SHR::increase_live_data(). I think the reason is that we have a race 
> here: a GC thread might not yet see the updated SHR::_top but already accounts for the updated 
> live data. I excluded conc-traversal from that check. This could probably be fixed by doing the
> proper concurrency membars, but do we care? For assertion code?

But wait, this change means "s" is greater than max_jint on some paths during Traversal?!

 inline void ShenandoahHeapRegion::increase_live_data_words(size_t s) {
-  assert (s <= (size_t)max_jint, "sanity");
+  assert (s <= (size_t)max_jint || _heap->is_concurrent_traversal_in_progress(), "sanity");
   increase_live_data_words((int)s);
 }

Also, I am confused about where Traversal calls increase_live_data_words(size_t), because both call
sites are already protected:

    if (!sh->is_concurrent_traversal_in_progress()) {
      r->increase_live_data_words(used_words);
    }

...

    if (!ShenandoahHeap::heap()->is_concurrent_traversal_in_progress()) {
      r->increase_live_data_words(word_size);
    }

> Full:
> http://cr.openjdk.java.net/~rkennke/traversal/webrev.04/

Since we have "adaptive" failures with C2 and/or -ShWBMemBar, I propose we chicken out and drop all
C2 changes (apart from the actual enqueue_barrier) from this change, then follow up on the
optimization story in subsequent changesets. This way we could integrate Traversal GC and not risk
an immediate regression in non-Traversal code.

This is done, along with other minor touchups here (apply over webrev.04):
  http://cr.openjdk.java.net/~shade/shenandoah/traversal-shade-updates-1.patch

Thanks,
-Aleksey


