RFR: Traveral GC heuristics

Tue Jan 23 20:41:10 UTC 2018

Am 23.01.2018 um 17:06 schrieb Aleksey Shipilev:
> On 01/22/2018 11:17 PM, Roman Kennke wrote:
>> - when growing the heap, we must make sure that the TAMS points to end for the new regions,
>> otherwise we'd treat them implicitely marked.
> 
> Is this the source of this spooky change?
> 
>     inline ShenandoahHeapRegion* get(size_t i) const {
> -    assert (i < _active_end, "sanity");
> +    assert (i < _reserved_end, "sanity");
>       return _regions[i];
>     }
> 
> get() is supposed to only return added regions. I think if you return something farther the
> _active_end, you read garbage.

All SHR are created and added to the regions list at the start. Which 
means iterating to active_end actually does what I wanted. Reverted back 
that change.

>> - folded 'SATB' queue processing into thread stack scanning. The problem here is that iterating
>> the threads 2x is cumbersome because of the claiming protocol: we need to fire the task/workers
>> 2x: once for the SATB queues, once for the thread scanning. I folded it into one pass. This
>> required a (trivial) extension in the upstream parallel thread scanning/iteration protocol.
> 
> So this is the source for extension of RootProcessor and Thread methods? I am a bit uneasy about
> this (mostly because it raises backporting questions). I'd rather include this into RootProcessor
> right away, and assert nothing passes non-NULL ThreadClosure there. Then, sh/jdk8u, sh/jdk9 and
> sh/jdk10 versions would agree on the shape of RootProcessor methods and the calls to it, while
> sh/jdk10 would call RootProcessor with non-NULL ThreadClosure, and it would *also* implement the
> relevant parts in Thread.

Done in separate patch.

>> - I tripped an assert in SHR::increase_live_data(). I think the reason is that we have a race
>> here: a GC thread might not yet see the updated SHR::_top but already accounts for the updated
>> live data. I excluded conc-traversal from that check. This could probably be fixed by doing the
>> proper concurrency membars, but do we care? For assertion code?
> 
> But wait, this change means "s" is greater than max_jint on some paths during Traversal?!
> 
>   inline void ShenandoahHeapRegion::increase_live_data_words(size_t s) {
> -  assert (s <= (size_t)max_jint, "sanity");
> +  assert (s <= (size_t)max_jint || _heap->is_concurrent_traversal_in_progress(), "sanity");
>     increase_live_data_words((int)s);
>   }

I removed the change to that assert. We only need the other one.

> Also, I am confused where Traversal calls increase_live_data_words(size_t), because both call sites
> are already protected:
> 
>      if (!sh->is_concurrent_traversal_in_progress()) {
>        r->increase_live_data_words(used_words);
>      }
> 
> ...
> 
>      if (!ShenandoahHeap::heap()->is_concurrent_traversal_in_progress()) {
>        r->increase_live_data_words(word_size);
>      }

It's called from Traveral's own code.

> Since we have "adaptive" failures with C2 and/or -ShWBMemBar, I propose we chicken out, and drop all
> C2 changes (apart from the actual enqueue_barrier) from this change, then follow up on optimization
> story in subsequent changesets. This way we could integrated Traversal GC, and not risk immediate
> regression in non-Traversal code.
> 
> This is done, along with other minor touchups here (apply over webrev.04):
>    http://cr.openjdk.java.net/~shade/shenandoah/traversal-shade-updates-1.patch

Cool, thanks.

Differential patch:
http://cr.openjdk.java.net/~rkennke/traversal/webrev.05.diff/
Full patch, including your changes:
http://cr.openjdk.java.net/~rkennke/traversal/webrev.05/

(give it some seconds to fully upload)

Roman