RFR: Deferred, batched, parallel Matrix cleanup
Aleksey Shipilev
shade at redhat.com
Fri May 5 18:51:36 UTC 2017
Hi,
The largest problem in our Matrix implementation is the footprint and associated
operation costs. With just 2K regions we have a 4 MB matrix (2K x 2K bytes) that
we need to at least clean up sparsely, byte by byte, at the end of some phases.
With a larger number of regions (e.g. to-gc1 has 8K), this problem is
exacerbated further.
There are three things to do:
a) Defer region cleanup: this makes sure we can then...
b) Batch matrix cleanups: this allows more cache-friendly cleanups, see the
comment in SHMatrix::clean_batched. It is a good optimization in itself, but it
also allows us to...
c) Parallelize matrix cleanups: lots of regions usually mean a large heap, which
means more threads are available. This will alleviate matrix cleanup costs. Note
that without batching, you cannot easily avoid false sharing there -- indeed,
this is why the patch performs better than the current, already parallelised
recycling in partial GC.
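
To illustrate why (b) enables (c), here is a minimal standalone sketch. It is
not the code from the webrev and not the actual SHMatrix::clean_batched: the
names ByteMatrix, clean_one, clean_rows and clean_batched_parallel are
hypothetical, and the row-major byte layout is an assumption. The point is
only that with a batch of regions in hand, each matrix row can be visited once
and whole row ranges can be handed to separate threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical stand-in for the matrix: n x n bytes, row-major (assumed layout).
struct ByteMatrix {
    std::size_t n;
    std::vector<unsigned char> cells;
    explicit ByteMatrix(std::size_t regions)
        : n(regions), cells(regions * regions, 0) {}
    unsigned char& at(std::size_t row, std::size_t col) {
        return cells[row * n + col];
    }
};

// Unbatched cleanup of one region: a contiguous row clear plus a strided
// column walk that touches a single byte in each of the n rows.
void clean_one(ByteMatrix& m, std::size_t r) {
    assert(r < m.n);
    std::memset(&m.at(r, 0), 0, m.n);
    for (std::size_t row = 0; row < m.n; ++row) m.at(row, r) = 0;
}

// Batched cleanup of rows [row_begin, row_end): every row is visited once.
// Rows that belong to the batch are wiped whole; in all other rows only the
// columns of the batched regions are cleared.
void clean_rows(ByteMatrix& m, const std::vector<std::size_t>& batch,
                const std::vector<bool>& in_batch,
                std::size_t row_begin, std::size_t row_end) {
    for (std::size_t row = row_begin; row < row_end; ++row) {
        if (in_batch[row]) {
            std::memset(&m.at(row, 0), 0, m.n);
        } else {
            for (std::size_t col : batch) m.at(row, col) = 0;
        }
    }
}

// Parallel driver: each worker owns a disjoint, contiguous range of rows, so
// threads can share a cache line at most at a range boundary (and not at all
// if rows are cache-line aligned, which std::vector does not guarantee).
void clean_batched_parallel(ByteMatrix& m,
                            const std::vector<std::size_t>& batch,
                            unsigned workers) {
    assert(workers > 0);
    std::vector<bool> in_batch(m.n, false);
    for (std::size_t r : batch) {
        assert(r < m.n);
        in_batch[r] = true;
    }
    const std::size_t chunk = (m.n + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(m.n, w * chunk);
        const std::size_t end = std::min(m.n, begin + chunk);
        if (begin == end) break;
        pool.emplace_back([&m, &batch, &in_batch, begin, end] {
            clean_rows(m, batch, in_batch, begin, end);
        });
    }
    for (std::thread& t : pool) t.join();
}
```

In this sketch, calling clean_one once per region makes every thread walk all
rows, one byte at a time, which is where the false sharing comes from when
regions are recycled in parallel. The batched form leaves the matrix in the
same final state, but each row is written by exactly one thread.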
Patch:
http://cr.openjdk.java.net/~shade/shenandoah/matrix-dbp/webrev.01/
Sample experiments on my desktop:
http://cr.openjdk.java.net/~shade/shenandoah/matrix-dbp/perf.txt
4x faster cleanups with "default" 2K regions
up to 18x faster cleanups with artificially high 32K regions
Testing: hotspot_gc_shenandoah, some benchmarks
Thanks,
-Aleksey