8234160: ZGC: Enable optimized mitigation for Intel jcc erratum in C2 load barrier

Thu Nov 21 08:40:12 UTC 2019

Hi Paul,

On 2019-11-20 23:22, Hohensee, Paul wrote:
> It's not just the zgc load barrier that's affected, it's every jcc and fuzed jcc (e.g., cmp/jcc and sub/jcc, because the pairs are issued on the same clock).

Yeah. We also have to mitigate against unconditional branches, like ret
and jmp. So I suppose the name "jcc erratum" is slightly misleading in
this context.

> There's a code pattern attribute called ins_alignment(<n>) in the ad file, vis
>
> ins_attrib ins_alignment(1);    // Required alignment attribute (must
>                                  // be a power of 2) specifies the
>                                  // alignment that some part of the
>                                  // instruction (not necessarily the
>                                  // start) requires.  If > 1, a
>                                  // compute_padding() function must be
>                                  // provided for the instruction
>
> Would it be possible to use/enhance ins_alignment() rather than do something zgc-specific?
> Thanks,

That is a good question. Unfortunately, there are a few problems with
applying such a strategy:

1) We do not want to constrain the alignment such that the instruction
(+ specific offset) starts at, e.g., a 32-byte boundary. We want to be
looser and say that any alignment is fine... except the bad ones
(crossing or ending at a 32-byte boundary). Otherwise I fear we will
find ourselves bloating the code cache with unnecessary nops to align
instructions that would never have been a problem. So in terms of
alignment constraints, I think such a hammer is too big.
2) Another issue is that the alignment constraints do not apply to just
one Mach node. Sometimes they apply to a fused op + jcc pair, and we
currently match the conditions and their branches separately (the
conditions do not necessarily know that they are conditions to a
branch, as with an and instruction, for example). So aligning the jcc
alone is not necessarily going to help, unless its alignment logic knows
what the preceding instruction is and whether the two will be fused.
Depending on that, we want different alignment properties. So here the
hammer is seemingly too loose.
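
To make the "any alignment is fine except the bad ones" rule from 1)
concrete, here is a minimal sketch. It is not HotSpot code: the names
is_bad_placement and padding_for are made up for illustration, and
padding up to the next 32-byte boundary is just one assumed way to
escape a bad placement. The size passed in is the length of the whole
affected sequence, so for a fused pair it is the op plus the jcc.

```cpp
#include <cassert>
#include <cstdint>

// Boundary size discussed in this thread.
constexpr std::uint64_t kBoundary = 32;

// Hypothetical helper: true if a sequence occupying [start, start + size)
// crosses a 32-byte boundary or ends exactly at one.
inline bool is_bad_placement(std::uint64_t start, std::uint64_t size) {
  assert(size > 0 && size < kBoundary);  // longer sequences cannot be fixed by padding
  std::uint64_t end = start + size;      // one past the last byte
  return (start / kBoundary) != (end / kBoundary);
}

// Hypothetical helper: number of nop bytes to emit before the sequence.
// Zero for every placement that is already fine, which is the point of
// preferring this over a strict alignment requirement.
inline std::uint64_t padding_for(std::uint64_t start, std::uint64_t size) {
  if (!is_bad_placement(start, size)) {
    return 0;
  }
  return kBoundary - (start % kBoundary);  // assumed policy: move to the next boundary
}
```

With these definitions, a 4-byte sequence at offset 30 (crossing) gets 2
bytes of padding, one at offset 28 (ending at the boundary) gets 4, and
a 6-byte sequence at offset 5 gets none.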

I'm not 100% sure what to suggest for the generic case, but perhaps:

After things have stopped moving around, add a pass over the Mach nodes,
similar to branch shortening, along these lines:

1) Set up a new flag (Flags_intel_jcc_mitigation or something) to be 
used on Mach nodes to mark affected nodes.
2) Walk the Mach nodes and tag branches and conditions used by fused 
branches (by walking edges), checking that the two are adjacent (by 
looking at the node index in the block), and possibly also checking that 
it is one of the affected condition instructions that will get fused.
3) Now that we know which Mach nodes (and sequences of macro-fused
nodes) are problematic, we can put some code where the Mach nodes are
emitted that checks for consecutively tagged nodes and injects nops into
the code buffer if they cross or end at 32-byte boundaries.
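
The three steps above could be modelled roughly like this. This is a toy
sketch under stated assumptions, not C2 code: ToyNode, Kind,
tag_affected and layout are all hypothetical names, the tagged field
stands in for the proposed flag, adjacency in a std::vector stands in
for the node index in the block, and padding to the next 32-byte
boundary is an assumed policy.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-ins for Mach nodes; none of these names exist in HotSpot.
enum class Kind { Other, FusibleCond, CondBranch, UncondBranch };

struct ToyNode {
  Kind kind;
  std::uint32_t size;  // encoded size in bytes
  bool tagged;         // stand-in for the proposed mitigation flag
};

// Steps 1 and 2: tag every branch, and tag a condition only when it sits
// directly in front of a conditional branch it would fuse with.
void tag_affected(std::vector<ToyNode>& block) {
  for (std::size_t i = 0; i < block.size(); ++i) {
    Kind k = block[i].kind;
    if (k == Kind::CondBranch || k == Kind::UncondBranch) {
      block[i].tagged = true;
    } else if (k == Kind::FusibleCond && i + 1 < block.size() &&
               block[i + 1].kind == Kind::CondBranch) {
      block[i].tagged = true;
    }
  }
}

// Step 3: compute each node's start offset, padding in front of a tagged
// unit (a lone branch, or a tagged condition plus its branch) whenever the
// unit would cross or end at a 32-byte boundary. Assumes units are shorter
// than 32 bytes.
std::vector<std::uint64_t> layout(const std::vector<ToyNode>& block,
                                  std::uint64_t offset) {
  std::vector<std::uint64_t> starts(block.size());
  std::size_t i = 0;
  while (i < block.size()) {
    std::size_t count = 1;
    std::uint64_t len = block[i].size;
    if (block[i].tagged && block[i].kind == Kind::FusibleCond &&
        i + 1 < block.size()) {
      len += block[i + 1].size;  // treat the fused pair as one unit
      count = 2;
    }
    if (block[i].tagged && offset / 32 != (offset + len) / 32) {
      offset += 32 - offset % 32;  // nops would be emitted here
    }
    for (std::size_t j = 0; j < count; ++j) {
      starts[i + j] = offset;
      offset += block[i + j].size;
    }
    i += count;
  }
  return starts;
}
```

For example, a 28-byte node followed by a 3-byte condition and a 2-byte
jcc would, at offset 0, have the condition pushed from 28 to 32 so the
fused pair no longer crosses the boundary, while untagged nodes are
never padded.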

I suppose an alternative strategy is to make sure that any problematic
instruction sequence that would be fused is also fused into one Mach
node, by sprinkling more rules in the AD file for the various forms of
conditional branches that we think cover all the cases, and then
applying the alignment constraint on individual nodes only. But it feels
like that could be more intrusive and less efficient.

Since the generic problem is more involved compared to the simpler ZGC 
load barrier fix (which will need special treatment anyway), I would 
like to focus this RFE only on the ZGC load barrier branch, because it 
makes me sad when it has to suffer. Having said that, we will certainly 
look into fixing the generic problem too after this.

Thanks,
/Erik