RFR (S): Chunked array processing should first push the continuation

Fri Oct 28 09:18:01 UTC 2016

Hi,

This is one of those "LOL" performance bugs. If you profile the
ArrayFragger test [1], which eventually scans a large array, you will
notice that TaskQueues are the hotspots, with lots of stealing. If you
wonder why, here is why: in chunked processing we *first* process our
own chunk, and only then let others know we have more work (and of
course, the next thing we do is pop that work back ourselves, pulling
it out from under their feet).

The solution is to first fork out the continuation, and then process our
own chunk in solitude:

http://cr.openjdk.java.net/~shade/shenandoah/concmark-cont-first/webrev.01/
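To make the two orderings concrete, here is a minimal single-threaded C++ sketch. It is not the code in the webrev. All names (ArrayChunkTask, TaskQueue, process_chunk, drain, kChunkSize) are hypothetical. The mutex-based queue is only a simplified stand-in for HotSpot's lock-free work-stealing queues. The sketch shows where the continuation push sits relative to scanning our own chunk:

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <vector>

// Hypothetical task: the slice [from, to) of one large array.
struct ArrayChunkTask {
  std::size_t from;
  std::size_t to;
};

// Simplified stand-in for a work-stealing queue: the owner pushes/pops at the
// back, thieves steal from the front. A mutex keeps the sketch short; this is
// not how HotSpot's lock-free task queues are built.
class TaskQueue {
 public:
  void push(ArrayChunkTask t) {
    std::lock_guard<std::mutex> g(m_);
    q_.push_back(t);
  }
  std::optional<ArrayChunkTask> pop() {  // owner side (LIFO)
    std::lock_guard<std::mutex> g(m_);
    if (q_.empty()) return std::nullopt;
    ArrayChunkTask t = q_.back();
    q_.pop_back();
    return t;
  }
  std::optional<ArrayChunkTask> steal() {  // thief side (FIFO)
    std::lock_guard<std::mutex> g(m_);
    if (q_.empty()) return std::nullopt;
    ArrayChunkTask t = q_.front();
    q_.pop_front();
    return t;
  }

 private:
  std::mutex m_;
  std::deque<ArrayChunkTask> q_;
};

constexpr std::size_t kChunkSize = 4;  // hypothetical chunk size

enum class Order { kChunkFirst, kContinuationFirst };

// Scans one chunk of t and forks the remainder (if any) onto the queue.
// `visits` stands in for marking array elements; `log` records the order
// in which the two steps happen.
void process_chunk(const ArrayChunkTask& t, TaskQueue& queue, Order order,
                   std::vector<int>& visits, std::vector<std::string>& log) {
  const std::size_t end = std::min(t.from + kChunkSize, t.to);
  auto fork_continuation = [&] {
    if (end < t.to) {
      queue.push({end, t.to});
      log.push_back("push");
    }
  };
  // Continuation-first: the remainder is visible to thieves while we scan.
  if (order == Order::kContinuationFirst) fork_continuation();
  for (std::size_t i = t.from; i < end; i++) visits[i]++;
  log.push_back("scan");
  // Chunk-first: thieves find nothing to steal until the scan is done.
  if (order == Order::kChunkFirst) fork_continuation();
}

// Owner-side drain loop. Under either order the owner pops the continuation
// back if no thief has taken it; with kChunkFirst that pop follows the push
// immediately, leaving waiting thieves almost no window to steal it.
void drain(TaskQueue& queue, Order order, std::vector<int>& visits,
           std::vector<std::string>& log) {
  while (std::optional<ArrayChunkTask> t = queue.pop()) {
    process_chunk(*t, queue, order, visits, log);
  }
}
```

Both orders visit every element exactly once. The only difference is whether the remainder becomes stealable before or after the owner spends time on its own chunk. For a 10-element array with kChunkSize = 4, continuation-first logs push, scan, push, scan, scan, while chunk-first logs scan, push, scan, push, scan.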

This improves the stress test in question considerably (about 36% lower
time per op):

Benchmark          (ldsMB)  (objSize)  Mode  Cnt    Score    Error  Units

# Before
ArrayFragger.test      500        100  avgt  100  903.449 ± 23.912  ns/op

# After
ArrayFragger.test      500        100  avgt  100  581.849 ± 53.288  ns/op

Testing: hotspot_gc_shenandoah

Thanks,
-Aleksey

[1]
http://cr.openjdk.java.net/~shade/shenandoah/shenandoah-gc-bench/src/main/java/org/openjdk/shenandoah/fragger/ArrayFragger.java