RFC (S): Prefetching during mark scans

Wed Nov 2 13:12:50 UTC 2016

Hi,

This is very interesting. Some notes:
- You said the users of the bitmap improve, but what you prefetch is the
oop. Would it be useful to prefetch the bitmap too?
- You're prefetching for read. However, most users also write. Maybe
prefetch for write too? That would be two different kinds of write,
though: the copy destination in one case, and the updated refs in the
other.

Roman

On Wednesday, 02.11.2016 at 13:33 +0100, Aleksey Shipilev wrote:
> Hi,
> 
> This describes work in progress, but I would like early feedback,
> because re-running perf experiments is tedious, and every little
> change there affects performance.
> 
> It is no surprise that our GC blows the CPU caches when walking the
> heap. Within the mark phase, there is little we can do, because the
> object graph is random in the worst case. But once we have marked, we
> have the marked-addresses bitmap in our hands, which we scan
> *linearly*. Since we know we will access oop fields, headers, etc.
> while scanning that bitmap, we can prefetch the oop contents in
> advance, long before we actually reference them.
> 
> This is the prototype patch that affects only mark-compact via
> ShenandoahHeapRegion::marked_object_iterate:
>   http://cr.openjdk.java.net/~shade/shenandoah/markscan-prefetch/webrev.00/
> 
> It does improve Full GC times significantly, because the users of the
> marked bitmap (Calculate Addresses, Adjust Pointers, Copy Objects)
> improve:
>   http://cr.openjdk.java.net/~shade/shenandoah/markscan-prefetch/prefetches
> 
> Roman is exploring whether we can merge the ShenandoahHeapRegion and
> ShenandoahHeap versions of marked_object_iterate, and I would
> forward-port the patch there. After that, the prefetching would also
> affect our regular concurrent GC (e.g. the scan for concurrent
> evacuation).
> 
> Thanks,
> -Aleksey
>
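
For illustration, the idea discussed above (scan the mark bitmap linearly and
prefetch object contents some distance ahead of the visiting cursor) might be
sketched roughly as below. This is not the code from the webrev: Obj, prefetch
and this standalone marked_object_iterate are hypothetical stand-ins, the heap
is modelled as one slot per bitmap bit, and __builtin_prefetch is the GCC/Clang
builtin rather than HotSpot's own prefetch wrappers. The RW template parameter
selects a read hint (0) or a write hint (1), which is one possible way to try
the "prefetch for write" variant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a heap object; not HotSpot's oopDesc.
struct Obj {
    std::uint64_t header;
    std::uint64_t payload;
};

// Thin wrapper over the GCC/Clang builtin. RW = 0 hints a read,
// RW = 1 hints a write. On other compilers this is a no-op.
template <int RW>
inline void prefetch(const void* p) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(p, RW, 3);
#else
    (void)p;
#endif
}

// Visits every marked slot in address order. Before visiting a slot, a
// second cursor ("ahead") is advanced until up to `distance` later marked
// objects have had a prefetch issued for them.
// Returns the number of objects visited.
template <int RW, typename Visitor>
std::size_t marked_object_iterate(const std::vector<bool>& marks,
                                  std::vector<Obj>& heap,
                                  std::size_t distance,
                                  Visitor visit) {
    assert(heap.size() == marks.size());
    const std::size_t n = marks.size();
    std::size_t ahead = 0;      // prefetch cursor, always kept > cur
    std::size_t in_flight = 0;  // marked slots in (cur, ahead) already prefetched
    std::size_t visited = 0;

    for (std::size_t cur = 0; cur < n; ++cur) {
        if (!marks[cur]) continue;

        // If the visiting cursor caught up, restart the prefetch cursor.
        if (ahead <= cur) {
            ahead = cur + 1;
            in_flight = 0;
        }
        // Top up the prefetch window to `distance` marked objects.
        while (ahead < n && in_flight < distance) {
            if (marks[ahead]) {
                prefetch<RW>(&heap[ahead]);
                ++in_flight;
            }
            ++ahead;
        }

        visit(heap[cur]);
        ++visited;

        // The next marked slot we visit is the oldest prefetched one.
        if (in_flight > 0) --in_flight;
    }
    return visited;
}
```

A caller would pick the hint per phase, e.g. marked_object_iterate<0>(...) for
a read-mostly pass and marked_object_iterate<1>(...) where the visitor updates
the object in place. The right distance is workload-dependent and would have
to be found by measurement.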