RFR (S): Mark scan prefetch

Tue Nov 8 14:53:45 UTC 2016

BTW: Are you also planning to write-prefetch? Most users of this
routine also write (copy-object writes to a different location,
though). I'd expect even more benefit from that.

Also, what's the reason for not prefetching when doing accurate
traversal? (not relevant for mark-compact I guess)

Roman

On Tuesday, 08.11.2016 at 12:47 +0100, Aleksey Shipilev wrote:
> Hi,
> 
> Not a surprise that our GC blows the CPU caches when walking the heap.
> Within the mark phase, there is little we can do, because the object
> graph is random in the worst case. But once we have marked, we have
> the marked addresses bitmap in our hands, which we scan *linearly*.
> Since we know we will access oop fields, headers, etc. while scanning
> that bitmap, we could prefetch oop contents in advance, long before we
> actually reference them.
> 
> The answer is to prefetch when we get the "mark" from the bitmap:
>  http://cr.openjdk.java.net/~shade/shenandoah/markscan-prefetch/webrev.01/
> 
> It does improve Full GC times significantly, because the users of the
> marked bitmap (Calculate Addresses, Adjust Pointers, Copy Objects)
> improve:
>   http://cr.openjdk.java.net/~shade/shenandoah/markscan-prefetch/prefetches
> 
> Concurrent GC users (parallel_evacuate) are not affected, because
> there are more bottlenecks in them, e.g. CASing the fwdptr.
> 
> Testing: hotspot_gc_shenandoah, jcstress-all (quick), microbenchmarks
> 
> Thanks,
> -Aleksey
>
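
For readers without the webrev at hand, the idea in the quoted message
can be sketched roughly as follows. This is not the code from the
patch: MarkBitmap, next_marked, scan_marked and PREFETCH_READ are
hypothetical names made up for illustration, the "heap" is just an
array of 64-bit words, and __builtin_prefetch is the GCC/Clang
intrinsic (a no-op fallback is used elsewhere). The only point shown is
that the next marked address is prefetched before the current object is
processed, so the memory access overlaps with useful work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: GCC/Clang intrinsic; other compilers get a no-op.
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(addr) __builtin_prefetch((addr), 0 /* read */, 3 /* high locality */)
#else
#define PREFETCH_READ(addr) ((void)(addr))
#endif

// Hypothetical mark bitmap: bit i set means heap word i starts a marked object.
struct MarkBitmap {
    std::vector<std::uint64_t> words;

    // Index of the first set bit in [from, limit), or limit if there is none.
    // Deliberately naive bit-by-bit search to keep the sketch short.
    std::size_t next_marked(std::size_t from, std::size_t limit) const {
        for (std::size_t i = from; i < limit; ++i) {
            if (words[i / 64] & (std::uint64_t{1} << (i % 64))) return i;
        }
        return limit;
    }
};

// Walks all marked heap words in address order and calls visit() on each.
// The prefetch for the *next* marked word is issued before visiting the
// current one. Returns the number of words visited.
template <typename Visitor>
std::size_t scan_marked(const MarkBitmap& bitmap, const std::uint64_t* heap,
                        std::size_t heap_words, Visitor visit) {
    assert(heap_words <= bitmap.words.size() * 64);  // bitmap must cover the heap
    std::size_t visited = 0;
    std::size_t cur = bitmap.next_marked(0, heap_words);
    if (cur < heap_words) PREFETCH_READ(&heap[cur]);
    while (cur < heap_words) {
        std::size_t next = bitmap.next_marked(cur + 1, heap_words);
        if (next < heap_words) PREFETCH_READ(&heap[next]);  // start loading early
        visit(heap[cur]);                                   // work on current object
        ++visited;
        cur = next;
    }
    return visited;
}
```

A write-prefetch variant, as asked about above, would pass 1 instead of
0 as the second argument to __builtin_prefetch; whether that helps is
not something this sketch can show.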