RFR: Eliminate write-barrier assembly stub (part 2)

Roman Kennke roman at kennke.org
Mon Mar 12 09:38:13 UTC 2018


On 12.03.2018 at 10:25, Aleksey Shipilev wrote:
> On 03/12/2018 10:17 AM, Roman Kennke wrote:
>> On 12.03.2018 at 10:03, Aleksey Shipilev wrote:
>>> On 03/10/2018 12:15 PM, Roman Kennke wrote:
>>>> Differential:
>>>> http://cr.openjdk.java.net/~rkennke/eliminate_wb_stub-c2/webrev.02.diff/
>>>> Full patch:
>>>> http://cr.openjdk.java.net/~rkennke/eliminate_wb_stub-c2/webrev.02/
>>>
>>> Oh, I see why we cannot remove the stub: FPU spills.
>>>
>>> shenandoah_wb_C is called with CallLeafNoFPNode for a reason: it makes the compiler
>>> believe the stub is not using FPU/XMM registers, and thus it can spill registers to them. When the
>>> actual barrier hits, our shenandoah_wb_C saves the FPU/XMM registers before calling into the runtime
>>> (and that call would use XMM for, say, object copying). See:
>>>   http://hg.openjdk.java.net/shenandoah/jdk/rev/842e412a3f86
>>
>> But the actual stub now does nothing to C2. I.e. C2 compiled code would
>> jump to the stub, the stub would shuffle registers, spill FPU and so on
>> and then call straight into the runtime.
> 
> Yes, but from the perspective of C2, the shenandoah_wb_C does not use XMM registers (it caller-saves all
> of them), and so we can use FP spills around the barrier and its slowpath. This is *not* about the
> asm stub we removed, see the current code:
> 
>     __ save_vector_registers();  <--- !!!!
>     __ movptr(rdi, rax);
>     __ call_VM_leaf(CAST_FROM_FN_PTR(address, ShenandoahBarrierSet::write_barrier_JRT), rdi);
>     __ restore_vector_registers();   <--- !!!!
> 
> And this happens inside the stub, so we can just call shenandoah_wb_C with CallLeafNoFP, and
> everything works out fine.
> 

It sort of is. The asm stub ensured that we could take the fast
path without touching any FPU regs. Now we *always* jump into the
runtime and thus *always* have to push/pop the FPU regs. I'd argue that
the compiler/register allocator would know better which registers to
push/pop, or whether to push/pop any registers at all.

Notice that the situation before was different: we'd have a fast path
that avoided FPU altogether, and would still call the stub with FPU
spilling, which was a bit braindead.

>> However, I don't really understand the FPU spilling issue. In my mind,
>> it *should* turn out something like:
>>
>> if (evac-in-progress && in_cset(obj)) {
>>   save_fpu_regs();
>>   call_runtime_stub();
>>   restore_fpu_regs();
>> }
> 
> Maybe, but I would not vouch for how C2 tracks the register dependencies. The trouble is, how do we
> communicate that both branches do not affect XMM registers? Doing CallLeafNoFP to wb_stub is
> supposed to do that, I think.

CallLeaf tells C2 that the call might clobber FPU regs. C2/the register
allocator should be smart enough to notice that the other (empty) paths are
free of any FPU reg usage, shouldn't it?
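
To make the expected shape concrete, here is a toy C++ model of the
pseudocode quoted above. It is only a sketch: every name in it (Obj,
Stats, write_barrier, runtime_write_barrier, run_barriers) is hypothetical
and none of it is HotSpot code. It just counts how often the FPU
save/restore pair would run if it were confined to the slow path. It says
nothing about what C2 actually emits, which still has to be checked in the
generated assembly.

```cpp
#include <vector>

// Toy model only: every name here is hypothetical, none is a HotSpot API.
struct Obj { bool in_cset; };

struct Stats {
  int fpu_saves = 0;      // how often the FPU/XMM save+restore pair ran
  int runtime_calls = 0;  // how often the slow path entered the "runtime"
};

// Stand-in for the runtime slow path; the real one would evacuate the object.
inline Obj* runtime_write_barrier(Obj* obj, Stats& s) {
  ++s.runtime_calls;
  return obj;
}

// Shape from the quoted pseudocode: FPU state is saved only on the slow path.
inline Obj* write_barrier(Obj* obj, bool evac_in_progress, Stats& s) {
  if (evac_in_progress && obj->in_cset) {
    ++s.fpu_saves;                        // models save_fpu_regs()
    obj = runtime_write_barrier(obj, s);  // models call_runtime_stub()
    // restore_fpu_regs() would go here
  }
  return obj;                             // fast path: no FPU traffic at all
}

// Runs the barrier over a batch of objects and reports the counters.
inline Stats run_barriers(std::vector<Obj>& objs, bool evac_in_progress) {
  Stats s;
  for (Obj& o : objs) {
    write_barrier(&o, evac_in_progress, s);
  }
  return s;
}
```

In this model, the save/restore count equals the number of objects that
are in the collection set while evacuation is in progress, and is zero
otherwise. A stub that saves unconditionally would instead pay once per
stub entry.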

>> It's probably worth inspecting the generated assembly code to look for
>> problems. Do you happen to remember which program was affected by the
>> FPU spilling issue?
> 
> http://mail.openjdk.java.net/pipermail/shenandoah-dev/2017-December/004468.html

Thanks. I will thoroughly check it.

>>> The patch is at least inconsistent here (should be Op_CallLeaf):
>>>
>>> 4242     if (n->Opcode() == Op_CallLeafNoFP && n->as_Call()->_entry_point ==
>>> CAST_FROM_FN_PTR(address, ShenandoahBarrierSet::write_barrier_JRT)) {
>>>

I fixed it locally. I will RFR it, depending on the findings of more testing.

Thanks, Roman


