8234160: ZGC: Enable optimized mitigation for Intel jcc erratum in C2 load barrier

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Mon Nov 25 15:31:03 UTC 2019


Hi Erik,

>> But I'd include stubs as well. Many of them are extensively used from 
>> C2-generated code.
> 
> Okay. Any specific stubs you have in mind? If there are some critical
> ones, we can sprinkle some scope objects like I did in the ZGC code.

There are intrinsics for compressed strings [1], numerous copy stubs 
[2], trigonometric functions [3].

It would be unfortunate if we have to go over all that code and manually 
instrument all the places where problematic instructions are issued. 
Moreover, the process has to be repeated for new code being added over time.
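
(For reference, what makes an instruction "problematic" here is purely 
positional, which is why enumerating the affected sites by hand is so 
unattractive: any jump, or macro-fused cmp/jcc pair, that crosses or ends 
on a 32-byte boundary is exposed to the erratum. A stand-alone sketch of 
the check, with a hypothetical helper name, not the actual HotSpot code:)

```cpp
#include <cstdint>

// Sketch of the erratum condition: an instruction (or macro-fused
// cmp+jcc pair) is affected if it crosses a 32-byte boundary or its
// last byte ends exactly on one. Hypothetical helper, not HotSpot code.
static bool intersects_32_byte_boundary(uint64_t start, uint64_t len) {
  const uint64_t mask = ~uint64_t(0x1f);           // 32-byte block of an address
  return (start & mask) != ((start + len) & mask); // block of first byte vs byte after last
}
```

Padding with NOPs (or prefixes) until the branch sits fully inside a 
32-byte block removes the exposure, which is exactly why mitigated code 
changes size.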

> I do have concerns though about injecting magic into the MacroAssembler 
> that tries to solve this automagically on the assembly level, by having 
> the assembler spit out different
> instructions than you requested.
> The following comment from assembler.hpp captures my thought exactly:
> 
> 207: // The Abstract Assembler: Pure assembler doing NO optimizations on 
> the
> 208: // instruction level; i.e., what you write is what you get.
> 209: // The Assembler is generating code into a CodeBuffer.

While I see that the Assembler follows that (one instruction per 
method), MacroAssembler does not: there are cases where the generated 
code differs depending on runtime flags (e.g., verification code) or 
input values (e.g., whether an AddressLiteral is reachable or not).

> I think it is desirable to keep the property that when we tell the 
> *Assembler to generate a __ cmp(); __ jcc(); it will do exactly that.
> When such assumptions break, any code that has calculated the size of 
> instructions, making assumptions about their size, will fail.
> For example, any MachNode with hardcoded size() might underestimate how 
> much memory is really needed, and code such as nmethod entry barriers
> that have calculated the offset to the cmp immediate might suddenly stop 
> working. There is similar code for oop maps where we
> calculate offsets into mach nodes with oop maps to describe the PC after 
> a call, which will stop working:
> 
> // !!!!! Special hack to get all types of calls to specify the byte offset
> //       from the start of the call to the point where the return address
> //       will point.
> int MachCallStaticJavaNode::ret_addr_offset()
> {
>    int offset = 5; // 5 bytes from start of call to where return address 
> points
>    offset += clear_avx_size();
>    return offset;
> }
> 
> Basically, I think you might be able to mitigate more branches on the 
> MacroAssembler layer, but I think it would also be more risky, as code 
> that was
> not built for having random size will start failing, in places we didn't 
> think of. I can think of a few, and feel like there are probably other 
> places I have not thought about.
> 
> So from that point of view, I think I would rather do this on Mach nodes 
> where it is safe, and I think we can catch the most important ones there,
> and miss a few branches that the macro assembler would have found with 
> magic, than apply it to all branches and hope we find all the bugs due 
> to unexpected magic.
> 
> Do you agree? Or perhaps I misunderstood what you are suggesting.

You raise a valid point: there are places in the VM which rely on 
hard-coded instruction sequences. If such a sequence changes, all the 
relevant places have to be adjusted. And the JVM is already very 
cautious about such cases.
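
To make the breakage concrete: code like ret_addr_offset() above bakes 
the emitted length of the call sequence into an offset, so any padding 
an assembler injects behind its back shifts the real return address. A 
toy model of that hazard (all names hypothetical, not HotSpot code):

```cpp
#include <cstdint>

// Toy model: a call is emitted as 5 bytes, and the oop-map/ret-addr
// machinery hardcodes that size. If a mitigation layer silently pads
// before the call, the hardcoded offset no longer points at the return
// address. All names here are hypothetical.
struct ToyAssembler {
  uint64_t pc  = 0;
  int      pad = 0;                 // NOPs a mitigation might inject
  uint64_t emit_call() {            // returns PC of the call instruction
    pc += pad;                      // mitigation padding (0 if disabled)
    uint64_t call_pc = pc;
    pc += 5;                        // 5-byte call, as in ret_addr_offset()
    return call_pc;
  }
};

static const int hardcoded_ret_addr_offset = 5; // mirrors the MachNode code

// Return address as computed from the start of the node's emission.
static uint64_t assumed_ret_addr(uint64_t node_start) {
  return node_start + hardcoded_ret_addr_offset;
}
```

With pad == 0 the assumption holds; with any nonzero padding the 
computed return address and the real one (call PC + 5) diverge.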

I agree with you that the MacroAssembler-based approach is riskier, but 
IMO the risk is modest (few places are affected) and manageable (a 
dedicated stress mode should greatly improve test effectiveness).

My opinion is that if we are satisfied with the coverage the C2 CFG 
instrumentation provides and don't expect any more work on mitigations, 
then there's no motivation to invest in a MacroAssembler-based approach.

Otherwise, there are basically 2 options:

   * "opt-in": explicitly mark all the places where mitigations are 
applied, by default nothing is mitigated

   * "opt-out": mitigate everything unless mitigations are explicitly 
disabled

Both approaches provide fine-grained control over what's being 
mitigated, but with "opt-out" there's more code to care about: it's easy 
to miss important cases and too tempting to enable more than we are 100% 
certain about.

Both can be applied to individual CFG nodes and make CFG instrumentation 
redundant.

But if there's a need to instrument large portions of (macro)assembly 
code, then IMO opt-in adds too much in terms of work required, noise (on 
the code level), maintenance, and burden for future code changes. So, I 
don't consider it a feasible option in such a situation.

It looks like a mixture of opt-in (explicitly enable in some context: in 
C2 during code emission, during particular stub generation, etc.) and 
opt-out (on the level of individual instructions) gives the best of both 
approaches.
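
For illustration, such a mixture could look like a pair of scope 
objects: an opt-in scope enabling mitigation for a whole region (a stub, 
C2 code emission), and a nested opt-out scope suppressing it for 
individual size-sensitive instructions. A stand-alone sketch with purely 
hypothetical names (the actual ZGC patch uses its own scope objects):

```cpp
// Sketch of mixed opt-in/opt-out control over mitigation. A scope
// object enables mitigation for a region (opt-in), and a nested scope
// suppresses it for instructions whose size must not change (opt-out).
// All names are hypothetical, not the actual HotSpot code.
class ToyMacroAssembler {
  bool _mitigate = false;
public:
  bool mitigation_enabled() const { return _mitigate; }
  void set_mitigation(bool on)    { _mitigate = on; }
};

class MitigationScope {            // RAII: restores the previous state
  ToyMacroAssembler& _masm;
  bool _saved;
public:
  MitigationScope(ToyMacroAssembler& masm, bool on)
    : _masm(masm), _saved(masm.mitigation_enabled()) {
    _masm.set_mitigation(on);
  }
  ~MitigationScope() { _masm.set_mitigation(_saved); }
};

// Demonstrates nesting: opt-in for a stub, opt-out for one instruction.
static void demo(ToyMacroAssembler& masm, bool* states /* size 4 */) {
  states[0] = masm.mitigation_enabled();        // default: off
  {
    MitigationScope stub_scope(masm, true);     // opt-in: whole stub
    states[1] = masm.mitigation_enabled();      // on
    {
      MitigationScope fixed(masm, false);       // opt-out: size-sensitive insn
      states[2] = masm.mitigation_enabled();    // off again
    }
    states[3] = masm.mitigation_enabled();      // back on
  }
}
```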

But, again, if C2 CFG instrumentation is good enough, then it'll be a 
wasted effort.

So, I envision 3 possible scenarios:

   (1) just instrument Mach IR and be done with it;

   (2) (a) start with Mach IR;
       (b) later it turns out that extensive portions of (macro)assembly 
code have to be instrumented (or, for example, C1/Interpreter)
       (c) implement MacroAssembler mitigations

   (3) start with MacroAssembler mitigations and be done with it
      * doesn't preclude gradual roll out across different subsystems

Mach IR instrumentation (#1/#2) is the safest variant, but it may 
require more work.

#3 is broadly applicable, but also riskier.

What I don't consider a viable option is C2 CFG instrumentation 
accompanied by numerous per-instruction mitigations scattered across the 
code base.

>>> I have made a prototype, what this might look like and it looks like 
>>> this:
>>> http://cr.openjdk.java.net/~eosterlund/8234160/webrev.01/
>>
>> Just one more comment: it's weird to see intel_jcc_erratum referenced 
>> in shared code. You could #ifdef it for x86-only, but it's much better 
>> to move the code to x86-specific location.
> 
> Sure, I can move that to an x86 file and make it build only on x86_64.

Yes, sounds good. But let's agree on general direction first.

Best regards,
Vladimir Ivanov

[1] 
http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l1666

[2] 
http://hg.openjdk.java.net/jdk/jdk/file/623722a6aeb9/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp

[3] 
http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/macroAssembler_x86_(sin|cos|...).cpp
