RFR: Deferred, batched, parallel Matrix cleanup

Fri May 5 19:58:50 UTC 2017

Cool! Go!

On 5 May 2017 20:51:36 CEST, Aleksey Shipilev <shade at redhat.com> wrote:
>Hi,
>
>The largest problem in our Matrix implementation is the footprint and the
>associated operation costs. With just 2K regions we have a 4 MB matrix that
>we need to clean up at least sparsely, byte by byte, at the end of some
>phases. With a larger number of regions (e.g. to-gc1 has 8K), the problem
>gets worse.
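
For a sense of scale, here is a minimal sketch of the footprint arithmetic. It assumes one byte per (from, to) region pair, which is what the 2K -> 4 MB figure implies. The name matrix_bytes is made up for illustration and is not taken from the patch.

```cpp
#include <cstddef>

// Assumption: the matrix is square, with one byte per (from, to) region pair.
// matrix_bytes is a hypothetical helper, not part of the Shenandoah sources.
constexpr std::size_t matrix_bytes(std::size_t regions) {
    return regions * regions;
}

// 2K regions -> 4 MB, matching the figure quoted above.
static_assert(matrix_bytes(2048) == 4u * 1024 * 1024, "2K regions need 4 MB");
```

Under the same assumption, 8K regions come to 64 MB and 32K regions to 1 GB, so the footprint grows quadratically with the region count.
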
>
>There are three things to do:
>
> a) Defer region cleanup: this makes sure we can then...
>
> b) Batch matrix cleanups: this allows more cache-friendly cleanups, see the
>    comment in SHMatrix::clean_batched. It is a good optimization in itself,
>    but it also lets us...
>
> c) Parallelize matrix cleanups: lots of regions usually means a large heap,
>    which means more threads are available. This will alleviate matrix
>    cleanup costs. Note that without batching, you cannot easily avoid false
>    sharing there -- indeed, this is why the patch performs better than the
>    current, already parallelised recycling in partial GC.
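
To illustrate why batching helps with both cache behaviour and false sharing, here is a rough sketch. It is not the code from the webrev. It assumes a square byte matrix in which cleaning a region means zeroing its row and its column. ByteMatrix, clean_rows and clean_batched_parallel are hypothetical names, and plain std::thread stands in for the GC worker threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical N x N byte matrix, stored row by row.
struct ByteMatrix {
    std::size_t n;
    std::vector<std::uint8_t> cells;
    explicit ByteMatrix(std::size_t regions)
        : n(regions), cells(regions * regions, 0) {}
    std::uint8_t& at(std::size_t row, std::size_t col) {
        return cells[row * n + col];
    }
};

// Cleans rows [row_begin, row_end) for the whole batch of regions at once.
// A row that belongs to a cleaned region is zeroed with one memset; every
// other row only has the batch's columns zeroed.
void clean_rows(ByteMatrix& m,
                const std::vector<std::size_t>& batch,
                const std::vector<char>& in_batch,
                std::size_t row_begin, std::size_t row_end) {
    for (std::size_t r = row_begin; r < row_end; ++r) {
        std::uint8_t* row = m.cells.data() + r * m.n;
        if (in_batch[r]) {
            std::memset(row, 0, m.n);
        } else {
            for (std::size_t c : batch) row[c] = 0;
        }
    }
}

// Splits the rows into contiguous chunks, one chunk per worker thread.
// No two workers ever write to the same row.
void clean_batched_parallel(ByteMatrix& m,
                            const std::vector<std::size_t>& batch,
                            unsigned workers) {
    if (workers == 0) workers = 1;
    std::vector<char> in_batch(m.n, 0);
    for (std::size_t c : batch) {
        assert(c < m.n);
        in_batch[c] = 1;
    }
    const std::size_t chunk = (m.n + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = w * chunk;
        const std::size_t end = std::min(m.n, begin + chunk);
        if (begin >= end) break;
        pool.emplace_back(clean_rows, std::ref(m), std::cref(batch),
                          std::cref(in_batch), begin, end);
    }
    for (std::thread& t : pool) t.join();
}
```

In this sketch each row is visited once per batch rather than once per cleaned region, and each worker owns a disjoint range of rows. Cleaning one region at a time would instead walk a whole column per region, touching one byte in every row, so workers handling different regions would keep writing into the same cache lines.
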
>
>Patch:
>  http://cr.openjdk.java.net/~shade/shenandoah/matrix-dbp/webrev.01/
>
>Sample experiments on my desktop:
> http://cr.openjdk.java.net/~shade/shenandoah/matrix-dbp/perf.txt
>
> 4x faster cleanups with "default" 2K regions
> up to 18x faster cleanups with artificially high 32K regions
>
>Testing: hotspot_gc_shenandoah, some benchmarks
>
>Thanks,
>-Aleksey
