Shenandoah WB and tableswitch

Tue Dec 19 18:11:00 UTC 2017

I think I have zeroed in on at least one issue with WBs. Successively dissecting the problematic
workloads first yields a workload like this, derived from the UTF-8 decoders in the JDK:

http://icedtea.classpath.org/hg/gc-bench/file/d04b4bbbc39f/src/main/java/org/openjdk/gcbench/wip/WriteBarrierUTF8Scan.java

...and then a minimal version of the same:

http://icedtea.classpath.org/hg/gc-bench/file/d04b4bbbc39f/src/main/java/org/openjdk/gcbench/wip/WriteBarrierTableSwitch.java
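
For readers who do not have the link handy, here is a rough sketch of the shape of such a test. This
is not the linked source: the class name, field names, case values and the exact split between
"common" and "separate" are assumptions made for illustration. The idea assumed here is that both
loops run a dense switch (which javac normally compiles to tableswitch) over the input bytes, and
differ only in whether the store to cbuf happens once after the switch or separately in each case.

```java
// Hypothetical sketch, NOT the actual WriteBarrierTableSwitch benchmark.
// All names and case values are illustrative assumptions.
final class TableSwitchSketch {
    static final int SIZE = 1000;

    final byte[] buf = new byte[SIZE];
    final char[] cbuf = new char[SIZE];

    TableSwitchSketch() {
        // Fill the input so that every case of the switch is exercised.
        for (int i = 0; i < SIZE; i++) {
            buf[i] = (byte) (i % 4);
        }
    }

    // "common" (assumed shape): each case only picks a value,
    // and a single shared store to cbuf follows the switch.
    void common() {
        byte[] src = buf;
        char[] dst = cbuf;
        for (int i = 0; i < src.length; i++) {
            char c;
            switch (src[i]) {
                case 0:  c = 'a'; break;
                case 1:  c = 'b'; break;
                case 2:  c = 'c'; break;
                case 3:  c = 'd'; break;
                default: c = '?'; break;
            }
            dst[i] = c;
        }
    }

    // "separate" (assumed shape): every case performs its own store to cbuf.
    void separate() {
        byte[] src = buf;
        char[] dst = cbuf;
        for (int i = 0; i < src.length; i++) {
            switch (src[i]) {
                case 0:  dst[i] = 'a'; break;
                case 1:  dst[i] = 'b'; break;
                case 2:  dst[i] = 'c'; break;
                case 3:  dst[i] = 'd'; break;
                default: dst[i] = '?'; break;
            }
        }
    }

    public static void main(String[] args) {
        TableSwitchSketch s = new TableSwitchSketch();
        s.common();
        char[] fromCommon = s.cbuf.clone();
        s.separate();
        // Both variants are meant to produce the same output.
        System.out.println(java.util.Arrays.equals(fromCommon, s.cbuf));
    }
}
```

Both methods compute the same result, so any difference in timing comes from how the compiler
treats the stores, not from the work being done.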

Now, running it with current sh/jdk10 yields interesting results.

First, running with C1:

------------------------------------------------------------------------------
Benchmark                         (size)  Mode  Cnt     Score    Error  Units

# Parallel, -XX:TieredStopAtLevel=1
WriteBarrierTableSwitch.common      1000  avgt   15  2137.543 ± 9.084  ns/op
WriteBarrierTableSwitch.separate    1000  avgt   15  2260.783 ± 6.355  ns/op

# Shenandoah passive, -XX:TieredStopAtLevel=1
WriteBarrierTableSwitch.common      1000  avgt   15  2144.273 ± 7.565  ns/op
WriteBarrierTableSwitch.separate    1000  avgt   15  2270.335 ± 6.433  ns/op

# Shenandoah passive, -XX:TieredStopAtLevel=1, -XX:+ShenandoahWriteBarrier
WriteBarrierTableSwitch.common      1000  avgt   15  2613.767 ± 29.567  ns/op
WriteBarrierTableSwitch.separate    1000  avgt   15  2670.697 ±  8.822  ns/op
------------------------------------------------------------------------------

Everything seems to be in order: passive Shenandoah is as fast as Parallel, and enabling WBs makes
everything consistently slower, because there are writes to the cbuf array all the time.

With C2 the picture gets murkier:

------------------------------------------------------------------------------
Benchmark                         (size)  Mode  Cnt     Score    Error  Units

# Parallel, -XX:-TieredCompilation
WriteBarrierTableSwitch.common      1000  avgt   15  1518.773 ±  3.962  ns/op
WriteBarrierTableSwitch.separate    1000  avgt   15  2302.127 ± 49.734  ns/op

# Shenandoah passive, -XX:-TieredCompilation
WriteBarrierTableSwitch.common      1000  avgt   15  1575.086 ±  4.616  ns/op
WriteBarrierTableSwitch.separate    1000  avgt   15  2832.982 ± 70.375  ns/op

# Shenandoah passive, -XX:-TieredCompilation, -XX:+ShenandoahWriteBarrier
WriteBarrierTableSwitch.common      1000  avgt   15  1499.475 ± 38.896  ns/op
WriteBarrierTableSwitch.separate    1000  avgt   15  3135.664 ± 11.811  ns/op
------------------------------------------------------------------------------

First of all, why does Shenandoah passive perform worse than Parallel even without barriers? That
one is explained by interaction with counted loop safepoints / loop strip mining, see:

------------------------------------------------------------------------------
Benchmark                         (size)  Mode  Cnt     Score    Error  Units

# Shenandoah passive, -XX:-TieredCompilation, -XX:-UseCountedLoopSafepoints
WriteBarrierTableSwitch.common      1000  avgt   15  1526.821 ±  7.644  ns/op
WriteBarrierTableSwitch.separate    1000  avgt   15  2327.750 ± 73.020  ns/op
------------------------------------------------------------------------------

It is still weird to see CLS/LSM pessimize this case so much.

Then, why does "separate" regress when WB is enabled, while "common" does not? Perfasm suggests
that in the "common" case we are able to hoist the WB out of the loop, which is why there is no +WB
impact. We fail to do the same with "separate", for some reason. Disabling CLS/LSM helps just a
little:

------------------------------------------------------------------------------
Benchmark                         (size)  Mode  Cnt     Score    Error  Units

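# Shenandoah passive, -XX:-TieredCompilation, -XX:+ShenandoahWriteBarrier, -XX:-UseCountedLoopSafepoints
WriteBarrierTableSwitch.common      1000  avgt   15  1535.884 ± 21.498  ns/op
WriteBarrierTableSwitch.separate    1000  avgt   15  2876.315 ± 43.569  ns/op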
------------------------------------------------------------------------------
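
To make "hoisting the WB out of the loop" concrete, here is a conceptual model in plain Java. It is
only a sketch under assumptions: the real barrier is compiler-generated code and not a Java method,
and writeBarrier() below is a hypothetical stand-in that just counts invocations. The point it
illustrates is that cbuf is loop-invariant, so the barrier on it can run once before the loop
instead of before every store.

```java
// Conceptual model only; this is not what C2 emits.
// writeBarrier() is a hypothetical stand-in for the real barrier.
final class BarrierHoistSketch {
    static int barrierCalls = 0;

    // A real write barrier would make sure the store goes to the to-space
    // copy of the object. Here we only count calls and return the argument.
    static char[] writeBarrier(char[] obj) {
        barrierCalls++;
        return obj;
    }

    // Barrier left inside the loop: executed before every store.
    static void barrierInLoop(char[] cbuf, int n) {
        for (int i = 0; i < n; i++) {
            char[] dst = writeBarrier(cbuf);
            dst[i] = 'x';
        }
    }

    // Barrier hoisted: executed once, the loop body has plain stores only.
    static void barrierHoisted(char[] cbuf, int n) {
        char[] dst = writeBarrier(cbuf);
        for (int i = 0; i < n; i++) {
            dst[i] = 'x';
        }
    }

    public static void main(String[] args) {
        char[] cbuf = new char[1000];

        barrierCalls = 0;
        barrierInLoop(cbuf, cbuf.length);
        System.out.println("in loop: " + barrierCalls + " barrier calls");

        barrierCalls = 0;
        barrierHoisted(cbuf, cbuf.length);
        System.out.println("hoisted: " + barrierCalls + " barrier calls");
    }
}
```

In this model the "common" case corresponds to barrierHoisted() and the "separate" case to
barrierInLoop(), which matches the perfasm observation above but says nothing about why the
compiler fails to hoist in the second case.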

This pinpoints at least one problem with WBs that impacts the Stringy/UTF-8-y code we have in
benchmarks.

Thanks,
-Aleksey