8234160: ZGC: Enable optimized mitigation for Intel jcc erratum in C2 load barrier
Erik Österlund
erik.osterlund at oracle.com
Thu Nov 21 08:40:12 UTC 2019
Hi Paul,
On 2019-11-20 23:22, Hohensee, Paul wrote:
> It's not just the zgc load barrier that's affected, it's every jcc and fused jcc (e.g., cmp/jcc and sub/jcc, because the pairs are issued on the same clock).
Yeah. We also have to mitigate against unconditional branches, like ret
and jmp. So I suppose the name "jcc erratum"
is slightly misleading in this context.
> There's a code pattern attribute called ins_alignment(<n>) in the ad file, viz.
>
> ins_attrib ins_alignment(1); // Required alignment attribute (must
> // be a power of 2) specifies the
> // alignment that some part of the
> // instruction (not necessarily the
> // start) requires. If > 1, a
> // compute_padding() function must be
> // provided for the instruction
>
> Would it be possible to use/enhance ins_alignment() rather than do something zgc-specific?
> Thanks,
That is a good question. Unfortunately, there are a few problems
applying such a strategy:
1) We do not want to constrain the alignment such that the instruction
(+ specific offset) sits at e.g. the beginning of a 32-byte boundary. We
want to be looser and say that any alignment is fine... except the bad
ones (crossing or ending at a 32-byte boundary). Otherwise I fear we
will find ourselves bloating the code cache with unnecessary nops to
align instructions that would never have been a problem. So in terms of
alignment constraints, I think such a hammer is too big.
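To make the "any alignment except the bad ones" idea concrete, here is
a minimal sketch in plain C++. It is not HotSpot code: the names
kBoundary, crosses_or_ends_at_boundary and padding_needed are made up
for illustration, and it assumes the instruction (or fused pair) is
shorter than 32 bytes.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not part of HotSpot.
constexpr std::uint64_t kBoundary = 32;

// True if the bytes [start, start + size) cross a 32-byte boundary
// or end exactly on one. Those are the only placements to avoid.
inline bool crosses_or_ends_at_boundary(std::uint64_t start,
                                        std::uint64_t size) {
  if (size == 0) return false;
  std::uint64_t end = start + size;  // one past the last byte
  // If the first byte and the one-past-the-end address fall into
  // different 32-byte chunks, the instruction crosses or ends at
  // a boundary.
  return (start / kBoundary) != (end / kBoundary);
}

// Number of nop bytes needed to move a bad placement to the next
// 32-byte boundary; 0 if the placement is already fine.
// Assumes size < kBoundary.
inline std::uint64_t padding_needed(std::uint64_t start,
                                    std::uint64_t size) {
  if (!crosses_or_ends_at_boundary(start, size)) return 0;
  return kBoundary - (start % kBoundary);
}
```

With this check, a 4-byte instruction at offset 4 needs no padding at
all, whereas a strict "start at a 32-byte boundary" constraint would
have cost 28 bytes of nops.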
2) Another issue is that the alignment constraints apply not just to
one Mach node. Sometimes they apply to a fused op + jcc pair. We
currently match the conditions and their branches separately, and the
conditions do not necessarily know that they are conditions to a branch
(an and instruction, for example). So aligning the jcc alone is not
necessarily going to help, unless its alignment logic knows what its
preceding instruction is and whether the two will be fused. Depending on
that, we want different alignment properties. So here the hammer is
seemingly too loose.
I'm not 100% sure what to suggest for the generic case, but perhaps:
Once things have stopped moving around, add a pass over the Mach nodes,
similar to branch shortening, that:
1) Set up a new flag (Flags_intel_jcc_mitigation or something) to be
used on Mach nodes to mark affected nodes.
2) Walk the Mach nodes and tag branches and conditions used by fused
branches (by walking edges), checking that the two are adjacent (by
looking at the node index in the block), and possibly also checking that
it is one of the affected condition instructions that will get fused.
3) Now that we know what Mach nodes (and sequences of macro fused nodes)
are problematic, we can put some code where the mach nodes are emitted
that checks for consecutively tagged nodes and inject nops in the code
buffer if they cross or end at 32 byte boundaries.
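The three steps above could be sketched roughly as follows. This is a
standalone C++ model, not C2 code: SketchNode, tag_block,
emit_with_mitigation and the fields used here are all hypothetical
stand-ins for Mach nodes, the proposed flag and the emission loop. It
also assumes a tagged group is either a single branch or a fusible
condition immediately followed by its branch, and that each group is
shorter than 32 bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a Mach node in one block.
struct SketchNode {
  std::uint32_t size;  // encoded size in bytes
  bool is_branch;      // jcc, jmp, ret, ...
  bool fusible_cond;   // cmp/test/and/... that may macro-fuse with a jcc
  int cond_input;      // block index of the condition feeding a branch, or -1
  bool tagged;         // models the proposed mitigation flag
};

struct EmitResult {
  std::uint64_t end_offset;  // code offset after the block
  std::uint64_t nop_bytes;   // total padding injected
};

// True if [start, start + size) crosses or ends at a 32-byte boundary.
inline bool bad_placement(std::uint64_t start, std::uint64_t size) {
  return size != 0 && (start / 32) != ((start + size) / 32);
}

// Steps 1 and 2: tag branches, plus conditions that are adjacent to
// their branch and of a kind that would be fused with it.
inline void tag_block(std::vector<SketchNode>& block) {
  for (SketchNode& n : block) n.tagged = false;
  for (int i = 0; i < static_cast<int>(block.size()); ++i) {
    SketchNode& n = block[i];
    if (!n.is_branch) continue;
    n.tagged = true;
    if (n.cond_input >= 0 && n.cond_input == i - 1 &&
        block[n.cond_input].fusible_cond) {
      block[n.cond_input].tagged = true;
    }
  }
}

// Step 3: walk the nodes in emission order and pad only the tagged
// groups that would cross or end at a 32-byte boundary.
inline EmitResult emit_with_mitigation(const std::vector<SketchNode>& block,
                                       std::uint64_t offset) {
  std::uint64_t nops = 0;
  for (std::size_t i = 0; i < block.size();) {
    std::uint64_t group_size = block[i].size;
    std::size_t group_len = 1;
    // A tagged condition is always followed by its tagged branch.
    if (block[i].tagged && !block[i].is_branch && i + 1 < block.size() &&
        block[i + 1].tagged && block[i + 1].is_branch) {
      group_size += block[i + 1].size;
      group_len = 2;
    }
    if (block[i].tagged && bad_placement(offset, group_size)) {
      std::uint64_t pad = 32 - (offset % 32);
      nops += pad;    // a real emitter would write nops here
      offset += pad;
    }
    offset += group_size;
    i += group_len;
  }
  return EmitResult{offset, nops};
}
```

For example, a 4-byte cmp followed by a 2-byte jcc starting at offset
28 would be treated as one 6-byte group and pushed to offset 32 with 4
bytes of nops, while an untagged 8-byte instruction at the same offset
would be emitted unchanged.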
I suppose an alternative strategy is to make sure that any problematic
instruction sequence that would be fused is also fused into one Mach
node, by sprinkling more rules in the AD file for the various forms of
conditional branches that we think cover all the cases, and then
applying the alignment constraint on individual nodes only. But it feels
like that could be more intrusive and less efficient.
Since the generic problem is more involved compared to the simpler ZGC
load barrier fix (which will need special treatment anyway), I would
like to focus this RFE only on the ZGC load barrier branch, because it
makes me sad when it has to suffer. Having said that, we will certainly
look into fixing the generic problem too after this.
Thanks,
/Erik