RFR: Match barrier fastpath checks better

Wed Jan 10 11:45:37 UTC 2018

On 09.01.2018 at 16:28, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/match-barrier-checks/webrev.01/
> (Roland made the draft revision of this patch last year)
> 
> Current barrier fastpath checks the flags like this:
> 
>     0x0: movzbl 0x3d8(%r15),%r10d ; check evac-in-progress
>    +0x8: test   %r10d,%r10d
>    +0xB: jne    SLOW-PATH
>   +0x11: ...
> 
> This wastes a register (%r10 here), which is bad when barriers are back-to-back and register pressure is
> high. The fix folds the load and test into a single cmpb of the memory operand against a byte-sized
> immediate, so the resulting code is register-less and shorter:
> 
>     0x0: cmpb   $0x0,0x3d8(%r15)
>    +0x8: jne    SLOW-PATH
>    +0xE: ...
> 
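
For anyone who wants to check the size claim, here is a minimal sketch that derives the instruction lengths from the offsets in the two listings quoted above. The names BEFORE, AFTER, lengths and total are made up for this illustration; only the offsets and mnemonics come from the listings.

```python
# Offsets copied from the two quoted listings. Each list ends with the offset
# of the first instruction after the barrier check ("next").
BEFORE = [("movzbl", 0x0), ("test", 0x8), ("jne", 0xB), ("next", 0x11)]
AFTER = [("cmpb", 0x0), ("jne", 0x8), ("next", 0xE)]


def lengths(listing):
    """Map each mnemonic to its encoded length, from consecutive offsets."""
    return {name: nxt - off for (name, off), (_, nxt) in zip(listing, listing[1:])}


def total(listing):
    """Total bytes occupied by the check, first offset to the 'next' offset."""
    return listing[-1][1] - listing[0][1]


if __name__ == "__main__":
    print("before:", lengths(BEFORE), "total", total(BEFORE))
    print("after: ", lengths(AFTER), "total", total(AFTER))
    print("saved: ", total(BEFORE) - total(AFTER), "bytes")
```

By these offsets the check shrinks from 17 to 14 bytes: the 3-byte test disappears and the 8-byte movzbl is replaced by an 8-byte cmpb, with no scratch register needed.
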
> This follows similar .ad patterns that fold particular cmp shapes, and the fix would be upstreamed
> separately. We would like to have this in Shenandoah repos for more thorough testing. "Unsigned"
> shape covers Shenandoah WB checks, and "signed" covers SATB checks. (Amusingly, this affects C2, but
> not C1, which generates cmpb for cases like these.) We actually need only tests against zero, but
> nothing prevents us from matching the entire range of byte values.
> 
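
A rough sketch of what "the entire range of bytes" could mean for the two shapes. This is my reading, not taken from the webrev: I am assuming the "unsigned" shape compares a zero-extended byte load and the "signed" shape a sign-extended one, so the constants that can be folded into an 8-bit immediate differ. fits_imm8 and imm8_encoding are hypothetical helper names, not anything from the .ad files.

```python
def fits_imm8(value, signed):
    """True if the constant is representable as a byte under the given signedness."""
    lo, hi = (-128, 127) if signed else (0, 255)
    return lo <= value <= hi


def imm8_encoding(value, signed):
    """Return the raw byte that would sit in the immediate field (assumed model)."""
    if not fits_imm8(value, signed):
        raise ValueError(f"{value} does not fit in a {'signed' if signed else 'unsigned'} byte")
    return value & 0xFF


if __name__ == "__main__":
    for value, signed in [(0, False), (255, False), (-1, True), (127, True)]:
        print(value, "signed" if signed else "unsigned", hex(imm8_encoding(value, signed)))
```

Under that assumption, zero fits both shapes, which is all the barrier checks need; the wider ranges only matter if other byte comparisons end up matching the same patterns.
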
> Regular benchmarks are affected very little, with some tiny improvements -- because barriers there
> are already well-optimized. But in cases where barriers are not optimized(-able), the improvement is
> substantial. For example, in recent SPSCQueue benchmarks [1], the score improved by around 50%.
> 
> Testing: hotspot_gc_shenandoah {fastdebug|release}, specjvm
> 
> Thanks,
> -Aleksey
> 
> [1] http://cr.openjdk.java.net/~shade/shenandoah/jctools-QueueThroughputBackoffNone.txt
> 

I tested it with traversal GC. It works and doesn't crash. It doesn't 
seem faster. But traversal GC is handicapped anyway until we get some 
proper optimizations.

Roman