RFR: Deferred, batched, parallel Matrix cleanup

Fri May 5 18:51:36 UTC 2017

Hi,

The largest problem in our Matrix implementation is its footprint and the
associated operation costs. With just 2K regions we have a 4MB matrix that we
need to clean up, at least sparsely (byte by byte), at the end of some phases.
With a larger number of regions (e.g. to-gc1 has 8K), this problem gets worse.
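For scale: assuming the matrix keeps one byte per (from, to) region pair, which
is what the 4MB figure for 2K regions implies, its size grows quadratically with
the region count. A quick C++ check of that arithmetic (matrix_bytes and MiB are
hypothetical helpers for illustration, not Shenandoah code):

```cpp
#include <cstdint>

// Assumption: one byte per (from, to) region pair, i.e. N * N bytes in total.
constexpr std::uint64_t matrix_bytes(std::uint64_t regions) {
  return regions * regions;
}

constexpr std::uint64_t MiB = 1024 * 1024;

static_assert(matrix_bytes(2 * 1024) == 4 * MiB, "2K regions -> 4 MB");
static_assert(matrix_bytes(8 * 1024) == 64 * MiB, "8K regions -> 64 MB");
static_assert(matrix_bytes(32 * 1024) == 1024 * MiB, "32K regions -> 1 GB");
```

Under that assumption, 8K regions already mean a 64MB matrix, and 32K regions
a 1GB one.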

There are three things to do:

  a) Defer region cleanup: this makes sure we can then...

  b) Batch matrix cleanups: this allows more cache-friendly cleanups (see the
comment in SHMatrix::clean_batched). It is a good optimization in itself, but it
also allows us to...

  c) Parallelize matrix cleanups: lots of regions usually mean a large heap,
which means more threads are available. This will alleviate matrix cleanup
costs. Note that without batching, you cannot easily avoid false sharing there;
indeed, this is why the patch performs better than the current, already
parallelized recycling in partial GC.
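To make the three steps concrete, here is a minimal C++ sketch of deferred,
batched, parallel cleanup over a plain N x N byte matrix. ConnMatrix and
defer_clear are hypothetical names, and clean_batched only borrows the name
from the patch; this is not the SHMatrix code from the webrev, and it uses
std::thread instead of HotSpot worker threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical N x N byte matrix: entry (from, to) != 0 means region `from`
// may reference region `to`. Illustration only, not SHMatrix.
class ConnMatrix {
 public:
  explicit ConnMatrix(std::size_t regions)
      : n_(regions), data_(regions * regions, 0), pending_(regions, 0) {}

  void set(std::size_t from, std::size_t to) {
    assert(from < n_ && to < n_);
    data_[from * n_ + to] = 1;
  }
  bool get(std::size_t from, std::size_t to) const {
    return data_[from * n_ + to] != 0;
  }

  // (a) Deferred: recycling a region only records it; the matrix is untouched.
  void defer_clear(std::size_t region) { pending_[region] = 1; }

  // (b) Batched + (c) parallel: one pass over the rows clears the rows of all
  // pending regions and the pending columns of every other row. Each worker
  // owns a contiguous block of rows, so workers write disjoint bytes.
  void clean_batched(unsigned workers) {
    std::vector<std::size_t> cols;  // pending regions, ascending
    for (std::size_t i = 0; i < n_; i++) {
      if (pending_[i]) cols.push_back(i);
    }
    if (cols.empty()) return;

    workers = std::max(1u, workers);
    std::size_t chunk = (n_ + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (std::size_t lo = 0; lo < n_; lo += chunk) {
      std::size_t hi = std::min(n_, lo + chunk);
      pool.emplace_back([this, &cols, lo, hi] { clean_rows(cols, lo, hi); });
    }
    for (std::thread& t : pool) t.join();  // cols outlives all workers
    std::fill(pending_.begin(), pending_.end(), 0);
  }

 private:
  void clean_rows(const std::vector<std::size_t>& cols, std::size_t lo,
                  std::size_t hi) {
    for (std::size_t r = lo; r < hi; r++) {
      std::uint8_t* row = data_.data() + r * n_;
      if (pending_[r]) {
        std::memset(row, 0, n_);  // whole row belongs to a recycled region
      } else {
        for (std::size_t c : cols) row[c] = 0;  // ascending addresses in row
      }
    }
  }

  std::size_t n_;
  std::vector<std::uint8_t> data_;
  std::vector<std::uint8_t> pending_;  // only read while workers run
};
```

The design point this illustrates: clearing one recycled region at a time means
clearing its column, i.e. one byte in every row, so two threads clearing
different columns keep writing the same cache lines. Partitioning the batched
sweep by rows gives each thread its own contiguous memory, with at most one
shared cache line at each chunk boundary.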

Patch:
  http://cr.openjdk.java.net/~shade/shenandoah/matrix-dbp/webrev.01/

Sample experiments on my desktop:
 http://cr.openjdk.java.net/~shade/shenandoah/matrix-dbp/perf.txt

 4x faster cleanups with the default 2K regions
 up to 18x faster cleanups with artificially high 32K regions

Testing: hotspot_gc_shenandoah, some benchmarks

Thanks,
-Aleksey