RFR 8087198: G1 card refinement: batching, sorting, joining, prefetching

Thu Jun 11 11:20:33 UTC 2015

Hi guys,

I have been experimenting a bit with G1 card refinement. Thought I'd check
if anyone finds this interesting. ;)

Basically, we currently process cards one by one even though they always
come in groups of at least G1UpdateBufferSize. Depending on how lucky we
are, the cards may or may not arrive in a cache-friendly order. We also
have to seek the start of the card quite often and cut arrays at the card
boundaries, even though batches of cards are often contiguous and could be
scanned linearly.

What my patch does is to first clean all the G1UpdateBufferSize cards and
potentially swap some with the hot card cache. Then after that, the cards
are sorted in address order for nicer cache interactions. Then the cards
are processed 16 at a time. The cards are joined together if consecutive,
and intervals with re-dirtied cards are split so that finding the start of
the card does not happen so often. The start of the next scanning interval
is prefetched while the current interval is scanned for references.
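
To make the sort/join/prefetch idea concrete, here is a minimal standalone
sketch. It is not the code from the webrev and does not use HotSpot types:
CardIndex, CardInterval, join_cards, scan_intervals and prefetch_read are
names made up for illustration, the 512-byte card size is an assumption of
the sketch, and the cleaning, hot card cache and re-dirtied card handling
are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption for this sketch: one card covers 512 bytes of heap.
constexpr std::size_t kCardSize = 512;

// Hypothetical stand-in: a card is identified by its index in the card table.
using CardIndex = std::size_t;

// Half-open range of card indices [first, last) scanned in one linear pass.
struct CardInterval {
    CardIndex first;
    CardIndex last;
};

// Sort a batch of cards in address order and join consecutive (or duplicate)
// cards into intervals, so that each interval needs only one "find start".
std::vector<CardInterval> join_cards(std::vector<CardIndex> cards) {
    std::sort(cards.begin(), cards.end());
    std::vector<CardInterval> out;
    for (CardIndex c : cards) {
        if (!out.empty() && c <= out.back().last) {
            // c == last: adjacent card, extend. c < last: duplicate, skip.
            if (c == out.back().last) {
                out.back().last = c + 1;
            }
        } else {
            out.push_back({c, c + 1});
        }
    }
    return out;
}

// Prefetch hint; a no-op on compilers without __builtin_prefetch.
inline void prefetch_read(const void* p) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(p);
#else
    (void)p;
#endif
}

// Scan intervals in order, prefetching the start of the next interval
// before scanning the current one.
template <typename ScanFn>
void scan_intervals(const std::vector<CardInterval>& intervals,
                    const unsigned char* heap_base, ScanFn scan) {
    for (std::size_t i = 0; i < intervals.size(); ++i) {
        assert(intervals[i].first < intervals[i].last);
        if (i + 1 < intervals.size()) {
            prefetch_read(heap_base + intervals[i + 1].first * kCardSize);
        }
        scan(intervals[i]);
    }
}
```

With the batch {7, 3, 4, 5, 10, 4}, join_cards yields the intervals [3, 6),
[7, 8) and [10, 11), so six buffer entries turn into three linear scans.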

Webrev with demo:
http://cr.openjdk.java.net/~eosterlund/g1_experiments/card_refinement/webrev.00/

RFE ID:
https://bugs.openjdk.java.net/browse/JDK-8087198

In the DaCapo benchmarks I found, for instance, a 3.2% speedup in fop
(configured to achieve 1 ms latency with a small 64M heap) and a 3% speedup
in h2 running default settings on a 512M heap. The measurements were taken
after 40 warmup iterations, averaging over the following 10 iterations.

If somebody is interested in sponsoring something like this, it would be
interesting to see how this performs in your favourite benchmarks. I
suspect this kind of stuff is more important when you deal with larger
heaps.

Thanks,
/Erik