RFR: SATB compaction hides unmarked objects until final-mark
Roman Kennke
rkennke at redhat.com
Tue Jun 19 13:51:53 UTC 2018
Wow, very nice.
Patch looks ok.
We need to see and decide soon-ish what we do with our divergences in
SATB ptr queue. Fork it? Upstream the changes? But not now.
Roman
> http://cr.openjdk.java.net/~shade/shenandoah/satb-prompt/webrev.02/
>
> Current SATB filtering code strives to avoid enqueueing buffers from SATB barriers into the
> global list if the buffer contains a lot of non-interesting objects. In the G1/Shenandoah case, it
> filters out already-marked objects, does two-finger compaction, and then decides whether it wants to
> return the buffer to the mutator to fill with more data. In many cases, it returns the pristine
> buffer back.
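The filter-and-compact step described above can be sketched as follows. This is an illustrative model, not HotSpot's actual implementation: `is_marked` stands in for the real mark-bitmap query, and survivors are simply packed toward the end of the buffer while preserving order.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the mark-bitmap query; here, "marked"
// objects are those whose int payload is even.
static bool is_marked(void* obj) {
  return (*static_cast<int*>(obj) % 2) == 0;
}

// Sketch of SATB buffer filtering: drop already-marked entries and
// compact the survivors toward the end of the buffer. Live entries
// occupy [index, capacity) on entry; returns the new index, so that
// survivors occupy [new_index, capacity) on exit.
static size_t filter_compact(void** buf, size_t index, size_t capacity) {
  size_t dst = capacity;
  // Scan from the end toward the start so survivors can be packed in
  // place toward the end without clobbering unread slots.
  for (size_t src = capacity; src > index; ) {
    void* entry = buf[--src];
    if (!is_marked(entry)) {
      buf[--dst] = entry;  // retain: not yet marked, still interesting
    }
  }
  return dst;
}
```

If few entries survive, the caller can hand the mostly-empty buffer back to the mutator instead of enqueueing it; that is the shortcut the rest of the mail is about.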
>
> But this comes with an interesting caveat: if there is an unmarked object surrounded by
> already-marked objects that get filtered all the time, there is a significant chance that the
> unmarked object would never be shown to the GC code. In Shenandoah, we would discover that object
> only during final-mark, when we drain all SATB buffers regardless of filtering.
>
> In some interesting workloads, that hidden object might be a large oop array, and scanning it
> inflates final-mark times. Also, even if the object is not very heavy-weight, marking it eagerly
> makes the subsequent filtering more efficient. Otherwise, there is a significant chance that we
> would touch the mark bitmaps on every filter-compact for objects that stay below the enqueueing
> threshold.
>
> The way out of this is to cap the number of times we take the "not-enqueue" shortcut, and enqueue
> the buffer when that cap is reached. I chose 50 taken shortcuts as the threshold; it works well in
> my experiments.
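The capped shortcut can be sketched like this. The struct, its field names, and the half-full retainment threshold are all illustrative, not HotSpot's; only the cap of 50 comes from the mail. The point is that after enough consecutive shortcuts, the buffer is enqueued anyway, so entries hidden by repeated filtering eventually reach the collector during concurrent mark.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-queue sketch of the capped "not-enqueue" shortcut:
// a mostly-empty filtered buffer is normally handed back to the mutator,
// but only up to kMaxShortcuts consecutive times before it is enqueued
// to the global list anyway.
struct SatbQueueSketch {
  static const int kMaxShortcuts = 50;  // threshold chosen in the mail
  int shortcuts_taken = 0;

  // Decide what to do with a filtered buffer: true means "enqueue to
  // the global list", false means "return to mutator for refilling".
  bool should_enqueue(size_t retained, size_t capacity) {
    bool sparse = retained < capacity / 2;  // illustrative threshold
    if (sparse && shortcuts_taken < kMaxShortcuts) {
      shortcuts_taken++;   // take the not-enqueue shortcut once more
      return false;
    }
    shortcuts_taken = 0;   // enqueueing resets the shortcut counter
    return true;
  }
};
```

With this shape, a buffer whose few surviving entries keep it below the threshold still gets enqueued on the 51st attempt, bounding how long an unmarked object can stay invisible to the marking threads.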
>
> (My very first experiment used the time since the last enqueue as the threshold. That feels more
> reliable, but it queries the time on the critical path, and that was a potential scalability
> bottleneck.)
>
> For example, one of our benchmarks:
>
> Before:
>
> Pause Final Mark (G)      = 1.22 s (a = 10895 us)
> Pause Final Mark (N)      = 0.95 s (a =  8483 us)
>   Finish Queues           = 0.84 s (a =  7458 us)
>   Weak References         = 0.02 s (a =   739 us)
>     Process               = 0.02 s (a =   733 us)
>   Prepare Evacuation      = 0.06 s (a =   515 us)
>   Initial Evacuation      = 0.03 s (a =   300 us)
>     E: Thread Roots       = 0.02 s (a =   194 us)
>     E: Code Cache Roots   = 0.00 s (a =    43 us)
>
> After:
>
> Pause Final Mark (G)      = 0.06 s (a =  2677 us)
> Pause Final Mark (N)      = 0.03 s (a =  1217 us)
>   Finish Queues           = 0.01 s (a =   248 us) <--- (1)
>   Weak References         = 0.00 s (a =   361 us) <--- (2)
>     Process               = 0.00 s (a =   355 us)
>   Prepare Evacuation      = 0.01 s (a =   491 us)
>   Initial Evacuation      = 0.01 s (a =   365 us)
>     E: Thread Roots       = 0.00 s (a =   182 us)
>     E: Code Cache Roots   = 0.00 s (a =    38 us)
>
> (1): Significantly less final-mark queue work, because most hidden objects are now discovered
> during concurrent mark.
> (2): Apparently, concurrent precleaning works better, because more hidden objects got marked in the
> concurrent phase, and concurrent precleaning then piggybacked on that.
>
> Testing: tier3_gc_shenandoah, specjbb
>
> Thanks,
> -Aleksey
>