RFR: 8257772: Vectorizing clear memory operation using AVX-512 masked operations [v2]
Tobias Hartmann
thartmann at openjdk.java.net
Tue Dec 8 11:55:13 UTC 2020
On Mon, 7 Dec 2020 12:20:21 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Submitted some quick testing for this and there are failures with tests in `compiler/c2/cr6340864/`:
>>
>> ```
>> # Internal Error (workspace/open/src/hotspot/cpu/x86/macroAssembler_x86.cpp:8178), pid=27510, tid=27529
>> # assert(MaxVectorSize >= 32) failed: vector length should be >= 32
>>
>> Current CompileTask:
>> C2: 259 28 b java.lang.StringCoding::encodeASCII (158 bytes)
>>
>> Stack: [0x00007f2d144f8000,0x00007f2d145f9000], sp=0x00007f2d145f3750, free space=1005k
>> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> V [libjvm.so+0x13a326c] MacroAssembler::fill64_avx(RegisterImpl*, int, XMMRegisterImpl*, bool)+0x11c
>> V [libjvm.so+0x13a3415] MacroAssembler::xmm_clear_mem(RegisterImpl*, RegisterImpl*, RegisterImpl*, XMMRegisterImpl*)+0x195
>> V [libjvm.so+0x13a458b] MacroAssembler::clear_mem(RegisterImpl*, RegisterImpl*, RegisterImpl*, XMMRegisterImpl*, bool)+0x19b
>> V [libjvm.so+0x395487] rep_stosNode::emit(CodeBuffer&, PhaseRegAlloc*) const+0x167
>> V [libjvm.so+0x15b79da] PhaseOutput::scratch_emit_size(Node const*)+0x3fa
>> V [libjvm.so+0x15ae88c] PhaseOutput::shorten_branches(unsigned int*)+0x2ac
>> V [libjvm.so+0x15c045a] PhaseOutput::Output()+0xcda
>> V [libjvm.so+0xa0a798] Compile::Code_Gen()+0x438
>> V [libjvm.so+0xa13fe7] Compile::Compile(ciEnv*, ciMethod*, int, bool, bool, bool, bool, DirectiveSet*)+0x1917
>> V [libjvm.so+0x8466ac] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1dc
>> V [libjvm.so+0xa24498] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xe08
>> V [libjvm.so+0xa24fe8] CompileBroker::compiler_thread_loop()+0x5a8
>> V [libjvm.so+0x18ae756] JavaThread::thread_main_inner()+0x256
>> V [libjvm.so+0x18b50e0] Thread::call_run()+0x100
>> V [libjvm.so+0x1598346] thread_native_entry(Thread*)+0x116
>> ```
>>
>> Tests are executed with `-XX:CompileThreshold=100 -XX:-TieredCompilation`.
>
> Hi Tobi, thanks. I missed a safety check for `MaxVectorSize >= 32` in `xmm_clear_mem` on platforms that support AVX. I have fixed this and am running tests. Could you kindly run the patch with default options over your internal performance suite and confirm there is no performance degradation?
Okay, will do. I'll report back once it has finished.
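
For readers following along, the kind of guard discussed above can be sketched in standalone C++. This is only an illustration, not the code in the patch: `pick_store_width` and `clear_mem_sketch` are hypothetical names, the `max_vector_size` parameter merely models the role of `MaxVectorSize`, and `memset` stands in for the vector stores the assembler would emit. The point is that the width is chosen by checking the limit first, so a 32-byte path is never taken when the limit is below 32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: pick the widest store (in bytes) allowed by the
// configured vector size limit. Models the MaxVectorSize check only.
static std::size_t pick_store_width(std::size_t max_vector_size) {
    if (max_vector_size >= 64) return 64;  // one 512-bit store per chunk
    if (max_vector_size >= 32) return 32;  // 256-bit stores
    if (max_vector_size >= 16) return 16;  // 128-bit stores
    return 8;                              // fall back to 64-bit scalar stores
}

// Hypothetical helper: zero `len` bytes at `dst` using chunks no wider than
// the permitted store width. Returns the number of stores issued, counting
// a final partial store for any tail.
static std::size_t clear_mem_sketch(std::uint8_t* dst, std::size_t len,
                                    std::size_t max_vector_size) {
    assert(dst != nullptr);
    const std::size_t width = pick_store_width(max_vector_size);
    std::size_t stores = 0;
    std::size_t off = 0;
    while (len - off >= width) {           // full-width chunks
        std::memset(dst + off, 0, width);  // stand-in for one vector store
        off += width;
        ++stores;
    }
    if (off < len) {                       // tail shorter than one chunk
        std::memset(dst + off, 0, len - off);
        ++stores;
    }
    return stores;
}
```

With a limit of 16, clearing 64 bytes in this sketch takes four 16-byte stores instead of asserting, which is the behaviour the missing check was meant to guarantee.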
-------------
PR: https://git.openjdk.java.net/jdk/pull/1631