RFR: 8257772: Vectorizing clear memory operation using AVX-512 masked operations [v2]

Tobias Hartmann thartmann at openjdk.java.net
Tue Dec 8 11:55:13 UTC 2020


On Mon, 7 Dec 2020 12:20:21 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Submitted some quick testing for this and there are failures with tests in `compiler/c2/cr6340864/`:
>> 
>> ```
>> #  Internal Error (workspace/open/src/hotspot/cpu/x86/macroAssembler_x86.cpp:8178), pid=27510, tid=27529
>> #  assert(MaxVectorSize >= 32) failed: vector length should be >= 32
>> 
>> Current CompileTask:
>> C2:    259   28    b        java.lang.StringCoding::encodeASCII (158 bytes)
>> 
>> Stack: [0x00007f2d144f8000,0x00007f2d145f9000],  sp=0x00007f2d145f3750,  free space=1005k
>> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> V  [libjvm.so+0x13a326c]  MacroAssembler::fill64_avx(RegisterImpl*, int, XMMRegisterImpl*, bool)+0x11c
>> V  [libjvm.so+0x13a3415]  MacroAssembler::xmm_clear_mem(RegisterImpl*, RegisterImpl*, RegisterImpl*, XMMRegisterImpl*)+0x195
>> V  [libjvm.so+0x13a458b]  MacroAssembler::clear_mem(RegisterImpl*, RegisterImpl*, RegisterImpl*, XMMRegisterImpl*, bool)+0x19b
>> V  [libjvm.so+0x395487]  rep_stosNode::emit(CodeBuffer&, PhaseRegAlloc*) const+0x167
>> V  [libjvm.so+0x15b79da]  PhaseOutput::scratch_emit_size(Node const*)+0x3fa
>> V  [libjvm.so+0x15ae88c]  PhaseOutput::shorten_branches(unsigned int*)+0x2ac
>> V  [libjvm.so+0x15c045a]  PhaseOutput::Output()+0xcda
>> V  [libjvm.so+0xa0a798]  Compile::Code_Gen()+0x438
>> V  [libjvm.so+0xa13fe7]  Compile::Compile(ciEnv*, ciMethod*, int, bool, bool, bool, bool, DirectiveSet*)+0x1917
>> V  [libjvm.so+0x8466ac]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1dc
>> V  [libjvm.so+0xa24498]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0xe08
>> V  [libjvm.so+0xa24fe8]  CompileBroker::compiler_thread_loop()+0x5a8
>> V  [libjvm.so+0x18ae756]  JavaThread::thread_main_inner()+0x256
>> V  [libjvm.so+0x18b50e0]  Thread::call_run()+0x100
>> V  [libjvm.so+0x1598346]  thread_native_entry(Thread*)+0x116
>> ```
>> 
>> Tests are executed with `-XX:CompileThreshold=100 -XX:-TieredCompilation`.
> 
> Hi Tobi, thanks. I missed a safety check for MaxVectorSize >= 32 in xmm_clear_mem on platforms that support AVX. I have fixed this and am running tests. Could you kindly run the patch with default options over your internal performance suite and confirm that there is no performance degradation?

Okay, will do. I'll report back once it has finished.
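
For readers following the thread, the missing guard described in the quoted message can be sketched roughly as below. This is a simplified, standalone model and not the HotSpot source: `VectorConfig`, `store_width`, and `clear_mem_sketch` are hypothetical names, the `UseAVX >= 2` threshold is an assumption, and `memset` stands in for the emitted vector stores. It only illustrates the idea that the 32-byte path should be chosen when AVX is available and MaxVectorSize is at least 32, with a 16-byte fallback otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the two VM flags mentioned in the thread.
struct VectorConfig {
    int use_avx;          // models UseAVX (assumed: 0 = no AVX, 2 = AVX2)
    int max_vector_size;  // models MaxVectorSize, in bytes
};

// Pick the store width in bytes. The 32-byte path is taken only when BOTH
// conditions hold; checking the AVX level alone is the kind of omission
// that would reach an assert such as "MaxVectorSize >= 32".
inline int store_width(const VectorConfig& cfg) {
    if (cfg.use_avx >= 2 && cfg.max_vector_size >= 32) {
        return 32;
    }
    return 16;  // fallback width (assumption for this sketch)
}

// Zero `len` bytes starting at `base`, in chunks of the selected width,
// then clear any remaining tail. Returns the width that was used.
inline int clear_mem_sketch(std::uint8_t* base, std::size_t len,
                            const VectorConfig& cfg) {
    const std::size_t width = static_cast<std::size_t>(store_width(cfg));
    std::size_t off = 0;
    for (; off + width <= len; off += width) {
        std::memset(base + off, 0, width);  // stands in for one vector store
    }
    if (off < len) {
        std::memset(base + off, 0, len - off);  // tail shorter than one chunk
    }
    return static_cast<int>(width);
}
```

With a configuration such as `VectorConfig{2, 16}` (AVX available, vector size capped at 16 bytes), the sketch falls back to 16-byte chunks instead of entering the 32-byte path.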

-------------

PR: https://git.openjdk.java.net/jdk/pull/1631
