RFR: Eliminate write-barrier assembly stub (part 2)
Aleksey Shipilev
shade at redhat.com
Mon Mar 12 09:51:31 UTC 2018
On 03/12/2018 10:38 AM, Roman Kennke wrote:
> Notice that the situation before was different: we'd have a fast path
> that avoided FPU altogether, and would still call the stub with FPU
> spilling, which was a bit braindead.
Um, yes. But we are optimizing the fast path here: the part where we _do not_ enter the slow path, and
we want to enjoy FPU spills in the generated code.
It helps to back-track a bit: did eliminating AsmWB regress our workloads? It should have, right, if
saving FPU registers was the issue on the slow path? It apparently wasn't the issue then, so let's
concentrate on making the real fast-path code fast with FPU spills. If AsmWB removal turns out to be
performance-sensitive, then we should back it out too, because the choice between callee-saving the
vector registers and enabling FPU spills is not trivial.
>>> However, I don't really understand the FPU spilling issue. In my mind,
>>> it *should* turn out something like:
>>>
>>> if (evac-in-progress && in_cset(obj)) {
>>>   save_fpu_regs();
>>>   call_runtime_stub();
>>>   restore_fpu_regs();
>>> }
>>
>> Maybe, but I would not testify how C2 tracks the register dependencies. The trouble is, how do we
>> communicate that both branches do not affect XMM registers? Doing CallLeafNoFP to wb_stub is
>> supposed to do that, I think.
>
> CallLeaf tells C2 that the call might spoil FPU regs. C2/register
> allocator should be smart enough to notice the other (empty) paths are
> free of any FPU reg usage, wouldn't it?
Ah, the "magic compiler" card! But the control flow would merge anyhow at some point, and I don't think
C2 is doing caller-saves on "rare" paths to free up registers (especially vector ones). So we are
where we are: going with the no-brainer solution, CallLeafNoFP, and handling the vector register
spills in the stub itself.
-Aleksey