RFR: 8272493: Suboptimal code generation around Preconditions.checkIndex intrinsic with AVX2
Yi Yang
yyang at openjdk.java.net
Thu Mar 10 09:02:40 UTC 2022
On Thu, 10 Mar 2022 08:19:50 GMT, Quan Anh Mai <duke at openjdk.java.net> wrote:
> IMO the hotspot seems to be too conservative and yet not cover all the cases regarding the generation of `vzeroupper`. Given the assembler itself doesn't emit SSE legacy code on AVX machines, this instruction could be emitted only on transition to native/VM code (and maybe the interpreter?). The current state generates `vzeroupper` on every function return and function call if 256-bit vector is involved, which is less than optimal. On the other hand, `clear_upper_avx` only emits `vzeroupper` on AVX2, when we clearly have 256-bit vectors on AVX1?
>
> Please correct me if I miss something important here, thanks.
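For illustration, the transition-based policy suggested in the quote can be sketched as follows. `CallTarget` and `needs_vzeroupper` are hypothetical names, not HotSpot code. The sketch only encodes the rule as stated: clear the upper YMM halves when they may be dirty and control may leave compiled Java code for native, VM or interpreter code. The interpreter case, marked "maybe" in the quote, is treated conservatively.

```cpp
// Hypothetical classification of where control goes on a call or return
// out of C2-compiled code (not a HotSpot type).
enum class CallTarget { CompiledJava, Interpreter, VMRuntime, Native };

// Sketch of the transition-based rule from the quote. On AVX machines compiled
// Java code does not use legacy SSE encodings, so vzeroupper is needed only when
// the upper YMM bits may be dirty and control may reach some other kind of code.
// The interpreter is conservatively treated as needing it.
constexpr bool needs_vzeroupper(CallTarget target, bool upper_bits_dirty) {
  return upper_bits_dirty && target != CallTarget::CompiledJava;
}
```

Under this rule, a call from one compiled method to another would carry no vzeroupper. A call into the VM runtime from a method that used 256-bit vectors would still need one.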
Hi @merykitty, sorry, I don't fully understand what you mean. Are you saying the JVM is missing vzeroupper somewhere? Could you elaborate on where we could emit it but currently don't?
> The current state generates vzeroupper on every function return and function call if 256-bit vector is involved, which is less than optimal.
It seems that we currently emit vzeroupper at calls and returns only when the compilation sets the clear_upper_avx flag or has max_vector_size > 16:
// Call
enc_class clear_avx %{
  if (generate_vzeroupper(Compile::current())) {
    MacroAssembler _masm(&cbuf);
    __ vzeroupper();
  }
%}

instruct CallDynamicJavaDirect(method meth)
%{
  match(CallDynamicJava);
  effect(USE meth);
  ins_cost(300);
  format %{ "movq rax, #Universe::non_oop_word()\n\t"
            "call,dynamic " %}
  ins_encode(clear_avx, Java_Dynamic_Call(meth), call_epilog);
  ins_pipe(pipe_slow);
  ins_alignment(4);
%}

instruct CallRuntimeDirect(method meth)
%{
  match(CallRuntime);
  effect(USE meth);
  ins_cost(300);
  format %{ "call,runtime " %}
  ins_encode(clear_avx, Java_To_Runtime(meth));
  ins_pipe(pipe_slow);
%}

instruct CallLeafDirect(method meth)
%{
  match(CallLeaf);
  effect(USE meth);
  ins_cost(300);
  format %{ "call_leaf,runtime " %}
  ins_encode(clear_avx, Java_To_Runtime(meth));
  ins_pipe(pipe_slow);
%}

// Return
void MachEpilogNode::emit(CodeBuffer& cbuf, PhaseRegAlloc* ra_) const
{
  Compile* C = ra_->C;
  MacroAssembler _masm(&cbuf);
  if (generate_vzeroupper(C)) {
    __ vzeroupper();
  }
  ...
}
They are all guarded by generate_vzeroupper:
static bool generate_vzeroupper(Compile* C) {
  return (VM_Version::supports_vzeroupper() && (C->max_vector_size() > 16 || C->clear_upper_avx() == true)) ? true: false;
}
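To make the effect of this guard concrete, here is a small model of the decision. `CompileModel`, `generate_vzeroupper_model` and `count_vzeroupper` are hypothetical names, not HotSpot code. The CPU capability that `VM_Version::supports_vzeroupper()` reports is passed in as a plain flag. The model shows that the choice is made once per compilation. If the predicate holds, every call using the `clear_avx` encoding and every epilog gets a vzeroupper. Otherwise, none do.

```cpp
// Hypothetical stand-in for the two Compile fields that generate_vzeroupper reads;
// HotSpot's real Compile object exposes them through accessors.
struct CompileModel {
  int  max_vector_size;  // widest vector, in bytes, used by this compilation
  bool clear_upper_avx;  // set when the compilation asks for upper-bit clearing
};

// Mirrors the predicate above, with the CPU check passed in explicitly
// instead of calling VM_Version::supports_vzeroupper().
constexpr bool generate_vzeroupper_model(bool cpu_supports_vzeroupper, CompileModel c) {
  return cpu_supports_vzeroupper && (c.max_vector_size > 16 || c.clear_upper_avx);
}

// Number of vzeroupper instructions one compiled method would contain under the
// scheme quoted above: one per call node using the clear_avx encoding plus one per
// epilog. The decision is per compilation, not per call site.
constexpr int count_vzeroupper(bool cpu_supports_vzeroupper, CompileModel c,
                               int clear_avx_calls, int epilogs) {
  return generate_vzeroupper_model(cpu_supports_vzeroupper, c) ? clear_avx_calls + epilogs : 0;
}
```

For example, `count_vzeroupper(true, CompileModel{32, false}, 3, 1)` evaluates to 4. The same method with `max_vector_size` of 16 and `clear_upper_avx` unset yields 0.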
-------------
PR: https://git.openjdk.java.net/jdk/pull/7770
More information about the hotspot-compiler-dev mailing list