8234160: ZGC: Enable optimized mitigation for Intel jcc erratum in C2 load barrier
erik.osterlund at oracle.com
Fri Feb 14 16:46:42 UTC 2020
Hi everyone,
Eric Caspole helped me evaluate my solution thoroughly. Thanks a lot for
that. The results were good.
In practice, my patch mitigated almost all of the regressions in C2 code
caused by the JCC erratum microcode mitigation.
There were a few cases where the nop sprinkling did hurt performance
by a few percent, possibly due to instruction cache pollution.
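For reference, the mitigation boils down to padding with nops so that
a (potentially macro-fused) branch neither crosses nor ends on a
32-byte boundary, which is the pattern the microcode update penalizes.
A rough standalone sketch of the boundary check (illustrative names,
not the actual webrev code):

#include <cstdint>

// Number of nop bytes needed so that an instruction sequence of
// inst_size bytes emitted at code offset 'offset' neither crosses
// nor ends on a 32-byte boundary.
static int jcc_erratum_padding(uint64_t offset, int inst_size) {
  const uint64_t boundary = 32;
  const uint64_t end = offset + inst_size;  // one past the last byte
  const bool crosses = (offset / boundary) != ((end - 1) / boundary);
  const bool ends_on = (end % boundary) == 0;
  if (!crosses && !ends_on) {
    return 0;  // already safe, no padding needed
  }
  // Pad up to the next boundary so the branch starts right on it;
  // x86 instructions are at most 15 bytes, so the sequence cannot
  // then cross or end on the following boundary.
  return (int)(boundary - (offset % boundary));
}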
Because of the positive outcome of the experiments, I would like to
propose we go with the opt-in, safety-first solution that I already
posted earlier.
I have talked to Vladimir, and I think at this point we agree about
doing this.
As a reminder, here is the webrev I proposed (plus a rebase that I
just made):
http://cr.openjdk.java.net/~eosterlund/8234160/webrev.01/
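For the curious: the opt-in works by sprinkling RAII scope objects
around the protected instruction sequences, so the padding is
requested explicitly only where we know it is safe. Simplified from
the actual patch (reusing the jcc_erratum_padding() sketch above):

// Simplified sketch of the opt-in scope object; the class in the
// webrev differs in detail.
class IntelJccErratumAlignment {
 private:
  MacroAssembler& _masm;
  const int       _size;  // byte size of the protected branch sequence

 public:
  IntelJccErratumAlignment(MacroAssembler& masm, int size)
    : _masm(masm), _size(size) {
    // Pad with nops so that the upcoming _size bytes neither cross
    // nor end on a 32-byte boundary.
    const int padding = jcc_erratum_padding(_masm.offset(), _size);
    if (padding != 0) {
      _masm.nop(padding);
    }
  }
};

The ZGC load barrier then emits its cmp/jcc pair inside such a scope,
and everything outside a scope is left untouched.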
Thanks,
/Erik
On 11/25/19 4:31 PM, Vladimir Ivanov wrote:
> Hi Erik,
>
>>> But I'd include stubs as well. Many of them are extensively used
>>> from C2-generated code.
>>
>> Okay. Any specific stubs you have in mind? If there are some critical
>> ones, we can sprinkle some scope objects like I did in the ZGC code.
>
> There are intrinsics for compressed strings [1], numerous copy stubs
> [2], trigonometric functions [3].
>
> It would be unfortunate if we have to go over all that code and
> manually instrument all the places where problematic instructions are
> issued. Moreover, the process has to be repeated for new code being
> added over time.
>
>> I do have concerns, though, about injecting magic into the
>> MacroAssembler that tries to solve this automagically at the assembly
>> level, by having the assembler emit different instructions than you
>> requested.
>> The following comment from assembler.hpp captures my thought exactly:
>>
>> 207: // The Abstract Assembler: Pure assembler doing NO optimizations on the
>> 208: // instruction level; i.e., what you write is what you get.
>> 209: // The Assembler is generating code into a CodeBuffer.
>
> While I see that Assembler follows that (instruction per method),
> MacroAssembler does not: there are cases when generated code differs
> depending on runtime flags (e.g., verification code) or input values
> (e.g., whether an AddressLiteral is reachable or not).
>
>> I think it is desirable to keep the property that when we tell the
>> *Assembler to generate a __ cmp(); __ jcc(); it will do exactly that.
>> When such assumptions break, any code that has calculated the size of
>> instructions, making assumptions about their size, will fail.
>> For example, any MachNode with a hardcoded size() might underestimate
>> how much memory is really needed, and code such as nmethod entry
>> barriers that have calculated the offset to the cmp immediate might
>> suddenly stop working. There is similar code for oop maps, where we
>> calculate offsets into mach nodes to describe the PC after a call,
>> which will also stop working:
>>
>> // !!!!! Special hack to get all types of calls to specify the byte
>> //       offset from the start of the call to the point where the
>> //       return address will point.
>> int MachCallStaticJavaNode::ret_addr_offset()
>> {
>>   int offset = 5; // 5 bytes from start of call to where return address points
>>   offset += clear_avx_size();
>>   return offset;
>> }
>>
>> Basically, I think you might be able to mitigate more branches on the
>> MacroAssembler layer, but I think it would also be more risky, as
>> code that was not built for instructions having unexpected sizes will
>> start failing, in places we didn't think of. I can think of a few,
>> and feel like there are probably other places I have not thought
>> about.
>>
>> So from that point of view, I think I would rather do this on Mach
>> nodes, where it is safe, and I think we can catch the most important
>> ones there, and miss a few branches that the macro assembler would
>> have found with magic, than apply it to all branches and hope we find
>> all the bugs due to unexpected magic.
>>
>> Do you agree? Or perhaps I misunderstood what you are suggesting.
>
> You raise a valid point: there are places in the VM which rely on
> hard-coded instruction sequences. If such an instruction sequence
> changes, all relevant places have to be adjusted. And the JVM is
> already very cautious about such cases.
>
> I agree with you that the MacroAssembler-based approach is more risky,
> but IMO the risk is modest (few places are affected) and manageable (a
> dedicated stress mode should greatly improve test effectiveness).
>
> My opinion is that if we are satisfied with the coverage the C2 CFG
> instrumentation provides and don't expect any more work on
> mitigations, then there's no motivation to invest in a
> MacroAssembler-based approach.
>
> Otherwise, there are basically 2 options:
>
> * "opt-in": explicitly mark all the places where mitigations are
> applied, by default nothing is mitigated
>
> * "opt-out": mitigate everything unless mitigations are explicitly
> disabled
>
> Both approaches provide fine-grained control over what's being
> mitigated, but with "opt-out" there's more code to care about: it's
> easy to miss important cases and too tempting to enable more than we
> are 100% certain about.
>
> Both can be applied to individual CFG nodes and make CFG
> instrumentation redundant.
>
> But if there's a need to instrument large portions of (macro)assembly
> code, then IMO opt-in adds too much in terms of work required, noise
> (on the code level), maintenance, and burden for future code changes.
> So, I don't consider it a feasible option in such a situation.
>
> It looks like a mixture of opt-in (explicitly enable in some context:
> in C2 during code emission, particular stub generation, etc.) and
> opt-out (on the level of individual instructions) gives the best of
> both approaches.
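>
> To illustrate the shape I have in mind (hypothetical names, not a
> concrete API proposal):
>
> // Hypothetical sketch only: a flag enabled per emission context
> // (opt-in), with a guard that individual emitters can use to
> // locally suppress the mitigation (opt-out).
> static thread_local bool g_mitigate_jcc_erratum = false;
>
> struct JccMitigationContext {        // opt-in for a whole context
>   bool _saved;
>   JccMitigationContext() : _saved(g_mitigate_jcc_erratum) {
>     g_mitigate_jcc_erratum = true;
>   }
>   ~JccMitigationContext() { g_mitigate_jcc_erratum = _saved; }
> };
>
> struct NoJccMitigation {             // opt-out for one instruction
>   bool _saved;
>   NoJccMitigation() : _saved(g_mitigate_jcc_erratum) {
>     g_mitigate_jcc_erratum = false;
>   }
>   ~NoJccMitigation() { g_mitigate_jcc_erratum = _saved; }
> };
>
> A branch emitter would then consult g_mitigate_jcc_erratum before
> deciding whether to pad.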
>
> But, again, if C2 CFG instrumentation is good enough, then it'll be a
> wasted effort.
>
> So, I envision 3 possible scenarios:
>
> (1) just instrument Mach IR and be done with it;
>
> (2) (a) start with Mach IR;
> (b) later it turns out that extensive portions of
> (macro)assembly code have to be instrumented (or, for example,
> C1/Interpreter);
> (c) implement MacroAssembler mitigations
>
> (3) start with MacroAssembler mitigations and be done with it
> * doesn't preclude gradual rollout across different subsystems
>
> Mach IR instrumentation (#1/#2) is the safest variant, but it may
> require more work.
>
> #3 is broadly applicable, but also riskier.
>
> What I don't consider a viable option is C2 CFG instrumentation
> accompanied by numerous per-instruction mitigations scattered across
> the code base.
>
>>>> I have made a prototype of what this might look like:
>>>> http://cr.openjdk.java.net/~eosterlund/8234160/webrev.01/
>>>
>>> Just one more comment: it's weird to see intel_jcc_erratum
>>> referenced in shared code. You could #ifdef it for x86-only, but
>>> it's much better to move the code to an x86-specific location.
>>
>> Sure, I can move that to an x86 file and make it build only on x86_64.
>
> Yes, sounds good. But let's agree on general direction first.
>
> Best regards,
> Vladimir Ivanov
>
> [1]
> http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/macroAssembler_x86.hpp#l1666
>
> [2]
> http://hg.openjdk.java.net/jdk/jdk/file/623722a6aeb9/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp
>
> [3]
> http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/macroAssembler_x86_(sin|cos|...).cpp
>