RFR: 8257772: Vectorizing clear memory operation using AVX-512 masked operations [v2]

Tue Dec 8 12:23:11 UTC 2020

On Tue, 8 Dec 2020 11:52:55 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

>>> Submitted some quick testing for this and there are failures with tests in `compiler/c2/cr6340864/`:
>>> 
>>> ```
>>> #  Internal Error (workspace/open/src/hotspot/cpu/x86/macroAssembler_x86.cpp:8178), pid=27510, tid=27529
>>> #  assert(MaxVectorSize >= 32) failed: vector length should be >= 32
>>> 
>>> Current CompileTask:
>>> C2:    259   28    b        java.lang.StringCoding::encodeASCII (158 bytes)
>>> 
>>> Stack: [0x00007f2d144f8000,0x00007f2d145f9000],  sp=0x00007f2d145f3750,  free space=1005k
>>> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>>> V  [libjvm.so+0x13a326c]  MacroAssembler::fill64_avx(RegisterImpl*, int, XMMRegisterImpl*, bool)+0x11c
>>> V  [libjvm.so+0x13a3415]  MacroAssembler::xmm_clear_mem(RegisterImpl*, RegisterImpl*, RegisterImpl*, XMMRegisterImpl*)+0x195
>>> V  [libjvm.so+0x13a458b]  MacroAssembler::clear_mem(RegisterImpl*, RegisterImpl*, RegisterImpl*, XMMRegisterImpl*, bool)+0x19b
>>> V  [libjvm.so+0x395487]  rep_stosNode::emit(CodeBuffer&, PhaseRegAlloc*) const+0x167
>>> V  [libjvm.so+0x15b79da]  PhaseOutput::scratch_emit_size(Node const*)+0x3fa
>>> V  [libjvm.so+0x15ae88c]  PhaseOutput::shorten_branches(unsigned int*)+0x2ac
>>> V  [libjvm.so+0x15c045a]  PhaseOutput::Output()+0xcda
>>> V  [libjvm.so+0xa0a798]  Compile::Code_Gen()+0x438
>>> V  [libjvm.so+0xa13fe7]  Compile::Compile(ciEnv*, ciMethod*, int, bool, bool, bool, bool, DirectiveSet*)+0x1917
>>> V  [libjvm.so+0x8466ac]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1dc
>>> V  [libjvm.so+0xa24498]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0xe08
>>> V  [libjvm.so+0xa24fe8]  CompileBroker::compiler_thread_loop()+0x5a8
>>> V  [libjvm.so+0x18ae756]  JavaThread::thread_main_inner()+0x256
>>> V  [libjvm.so+0x18b50e0]  Thread::call_run()+0x100
>>> V  [libjvm.so+0x1598346]  thread_native_entry(Thread*)+0x116
>>> ```
>>> 
>>> Tests are executed with `-XX:CompileThreshold=100 -XX:-TieredCompilation`.
>> 
>> Hi Tobi, thanks. I missed a safety check for `MaxVectorSize >= 32` in `xmm_clear_mem` on platforms that support AVX. I have fixed this and am running tests. Could you kindly run the patch with default options over your internal performance suite and confirm there is no performance degradation?
>
> Okay, will do and report back once it has finished.

Just noticed that you haven't updated the patch yet. Could you first push the fix?
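
For anyone following the thread, the kind of check being discussed can be sketched roughly as below. This is only a standalone model, not the HotSpot code and not the actual fix: `max_vector_size`, `fill64_wide`, `fill_narrow` and `clear_mem_model` are hypothetical names, and `memset` stands in for the emitted vector stores. The point is that the 64-byte fill path asserts on the vector size, so the caller has to test that size before choosing it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the MaxVectorSize flag (in bytes).
static int max_vector_size = 16;

// Hypothetical model of a 64-byte fill that needs vectors of at least 32 bytes.
// It carries the same kind of assert that fired in the report above.
static void fill64_wide(unsigned char* dst) {
    assert(max_vector_size >= 32 && "vector length should be >= 32");
    std::memset(dst, 0, 64);
}

// Hypothetical fallback that clears in chunks of at most 16 bytes.
static void fill_narrow(unsigned char* dst, std::size_t len) {
    for (std::size_t i = 0; i < len; i += 16) {
        std::memset(dst + i, 0, std::min<std::size_t>(16, len - i));
    }
}

// Clears len bytes and returns which path was taken.
// Assumption: the wide path is guarded by the vector-size test, so
// fill64_wide is never reached when max_vector_size < 32.
static std::string clear_mem_model(unsigned char* dst, std::size_t len, bool use_avx) {
    if (use_avx && max_vector_size >= 32 && len >= 64) {
        std::size_t off = 0;
        for (; off + 64 <= len; off += 64) {
            fill64_wide(dst + off);
        }
        fill_narrow(dst + off, len - off);  // remaining tail, if any
        return "wide";
    }
    fill_narrow(dst, len);
    return "narrow";
}
```

With `max_vector_size` set to 16, `clear_mem_model(buf, 100, true)` takes the narrow path and never reaches the assert. With 32 it clears one 64-byte chunk through the wide path and the rest through the fallback.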

-------------

PR: https://git.openjdk.java.net/jdk/pull/1631