Shenandoah WB fastpath and optimizations

Tue Dec 19 14:19:33 UTC 2017

On 12/19/2017 03:14 PM, Roland Westrelin wrote:
>> Thoughts?
> 
> Could the 18M stores be spills and somewhere in the 77..101M extra loads
> would be their counterpart spill loads? The WB needs at least one extra
> register and there's also the possibility that the WB slow path messes
> up the register allocator heuristics (as we've seen with the XMM
> spills).

Could be, and it was my base theory at some point. But in that case I'd expect the extra loads to
show up more reliably. As it stands, the extra L1 loads seem well within what the WB loads alone
would account for.

In fact, I wanted to ask you what it would take to teach C2 to emit a C1-style check, e.g. instead of:

    movzbl 0x3d8(%rTLS), %rScratch  ; read evac-in-progress
    test %rScratch, %rScratch
    jne EVAC-ENABLED-SLOW-PATH
    mov -0x8(%rObj), %rObj          ; read barrier

...do:

    cmpb $0x0, 0x3d8(%rTLS)         ; read evac-in-progress
    jne EVAC-ENABLED-SLOW-PATH
    mov -0x8(%rObj), %rObj          ; read barrier

...thus freeing up the register?
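
For reference, here is a rough C sketch of the logic that both sequences implement. It is only an illustration under stated assumptions, not Shenandoah code: the names (thread_state, oop, write_barrier, wb_slow_path) are hypothetical, the object is modelled as a pointer to its first word with the forwarding pointer in the word just before it, and the slow path is a stub that merely counts calls where the real one would do the evacuation work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the thread-local block; only the flag matters.
 * It plays the role of the byte at 0x3d8(%rTLS) in the listings. */
typedef struct {
    volatile uint8_t evac_in_progress;
} thread_state;

/* Hypothetical object model: an object is a pointer to its first word,
 * and the word immediately before it holds the forwarding pointer
 * (the -0x8(%rObj) load in the listings, on a 64-bit target). */
typedef void **oop;

/* Read barrier: follow the forwarding pointer stored one word before obj. */
static oop read_barrier(oop obj) {
    return (oop)obj[-1];
}

/* Stub slow path (assumption): the real one would evacuate the object.
 * Here it only records that it was taken and resolves the forwardee. */
static int slow_path_calls = 0;

static oop wb_slow_path(oop obj) {
    slow_path_calls++;
    return read_barrier(obj);
}

/* Write barrier fast path: test the flag and branch. The flag value is
 * needed only for the comparison, so nothing in the source requires it
 * to stay live in a register after the test. */
static oop write_barrier(const thread_state *t, oop obj) {
    assert(obj != NULL);
    if (t->evac_in_progress != 0) {
        return wb_slow_path(obj);   /* EVAC-ENABLED-SLOW-PATH */
    }
    return read_barrier(obj);       /* mov -0x8(%rObj), %rObj */
}
```

The point of the sketch is that the flag is dead right after the branch, so whether it passes through a scratch register or is compared directly against memory is purely a code generation choice.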

Thanks,
-Aleksey