RFC (S): Prefetching during mark scans
Roman Kennke
rkennke at redhat.com
Wed Nov 2 13:12:50 UTC 2016
Hi,
this is very interesting. Some notes:
- You said the users of the bitmap improve. You're prefetching the oop
though. Would it be useful to prefetch the bitmap too?
- You're prefetching for read. However, most users also write. Maybe
prefetch for write too? That would be two different writes, though: the
copy destination in one case, and the updated refs in the other.
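
To make the read-versus-write distinction concrete, here is a minimal sketch, not the patch itself. It assumes GCC or Clang, whose __builtin_prefetch takes a compile-time read/write flag (0 = read, 1 = write). CopyTask, copy_all, prefetch_read and prefetch_write are hypothetical names made up for this example, and the one-task lookahead is arbitrary.

```cpp
#include <cstddef>
#include <cstring>

// Thin wrappers: the rw argument of __builtin_prefetch must be a constant,
// so read and write hints get separate functions. No-ops elsewhere.
#if defined(__GNUC__) || defined(__clang__)
inline void prefetch_read(const void* p)  { __builtin_prefetch(p, 0, 3); }
inline void prefetch_write(const void* p) { __builtin_prefetch(p, 1, 3); }
#else
inline void prefetch_read(const void*)  {}
inline void prefetch_write(const void*) {}
#endif

// Hypothetical description of one object copy (not a Shenandoah type).
struct CopyTask {
    void*       to;    // copy destination, will be written
    const void* from;  // current object location, will be read
    std::size_t size;  // object size in bytes
};

// Copy all tasks in order. While copying task i, hint the source of task
// i + 1 for read and its destination for write.
inline void copy_all(const CopyTask* tasks, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (i + 1 < n) {
            prefetch_read(tasks[i + 1].from);
            prefetch_write(tasks[i + 1].to);
        }
        std::memcpy(tasks[i].to, tasks[i].from, tasks[i].size);
    }
}
```

The ref-updating case would use the write hint on the object that holds the refs, since that object is both read and modified in place.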
Roman
On Wednesday, 02.11.2016, at 13:33 +0100, Aleksey Shipilev wrote:
> Hi,
>
> This describes work in progress, but I would like early feedback,
> because re-running perf experiments is tedious, and every little
> change there affects performance.
>
> It is no surprise that our GC blows the CPU caches when walking the
> heap. Within the mark phase, there is little we can do, because the
> object graph is random in the worst case. But once we have marked, we
> have the bitmap of marked addresses in our hands, which we scan
> *linearly*. Since we know that we will access oop fields, headers,
> etc. while scanning that bitmap, we could prefetch the oop contents in
> advance, long before we actually reference them.
>
> This is the prototype patch that affects only mark-compact via
> ShenandoahHeapRegion::marked_object_iterate:
> http://cr.openjdk.java.net/~shade/shenandoah/markscan-prefetch/webrev.00/
>
> It does improve Full GC times significantly, because the users of the
> marked bitmap (Calculate Addresses, Adjust Pointers, Copy Objects)
> improve:
> http://cr.openjdk.java.net/~shade/shenandoah/markscan-prefetch/prefetches
>
> Roman is exploring whether we can merge the ShenandoahHeapRegion and
> ShenandoahHeap versions of marked_object_iterate, and I would
> forward-port the patch there. After that, the prefetching would also
> affect our regular concurrent GC (e.g. the scan for concurrent
> evacuation).
>
> Thanks,
> -Aleksey
>
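
For readers who have not opened the webrev, the idea in the quoted message can be sketched roughly as below. This is an illustration under stated assumptions, not the code from the patch. Obj, scan_marked, prefetch_for_read and the distance parameter are hypothetical names; the bitmap is modeled as one bit per heap slot in a std::vector<bool>; and the prefetch hint relies on the GCC/Clang __builtin_prefetch builtin. A lookahead cursor runs ahead of the main cursor over the same bitmap and issues read hints for the next few marked slots, so their cache lines are on the way by the time the visitor touches them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a heap object; not a Shenandoah type.
struct Obj {
    std::uint64_t header;
    std::uint64_t field;
};

// Read-prefetch hint; a no-op on compilers without the builtin.
inline void prefetch_for_read(const void* p) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(p, 0, 3);
#else
    (void)p;
#endif
}

// Linearly scan `marks` and call visit(heap[i]) for every marked slot i,
// keeping up to `distance` marked slots prefetched ahead of the visitor.
// Returns the number of visited slots.
template <typename Visitor>
std::size_t scan_marked(const std::vector<bool>& marks,
                        std::vector<Obj>& heap,
                        std::size_t distance,
                        Visitor visit) {
    assert(marks.size() == heap.size());
    const std::size_t n = marks.size();
    std::size_t ahead = 0;      // next bitmap index the lookahead examines
    std::size_t in_flight = 0;  // marked slots prefetched but not yet visited
    std::size_t visited = 0;

    for (std::size_t cur = 0; cur < n; ++cur) {
        // Top up the lookahead window from the same bitmap.
        while (ahead < n && in_flight < distance) {
            if (marks[ahead]) {
                prefetch_for_read(&heap[ahead]);
                ++in_flight;
            }
            ++ahead;
        }
        if (marks[cur]) {
            visit(heap[cur]);
            ++visited;
            if (in_flight > 0) --in_flight;  // distance == 0 means no prefetching
        }
    }
    return visited;
}
```

A caller would pass a visitor such as a lambda that reads the object's fields. The right distance depends on object sizes and memory latency, so it would have to be found by measurement.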