RFR: 8251994: VM crashed running TestComplexAddrExpr.java test with -XX:UseAVX=X [v2]

Wed Oct 28 08:52:22 UTC 2020

On Mon, 26 Oct 2020 17:40:40 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> To improve a chance to vectorize a loop, a special code in superword tries to hoist loads to the beginning of loop by replacing their memory input with corresponding (same memory slice) loop's memory Phi : 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/superword.cpp#L474
>> 
>> Originally loads are ordered by corresponding stores on the same memory slice. But when they are hoisted they loose that ordering - nothing enforce the order. 
>> In TestComplexAddrExpr.test6 case the ordering is preserved (luckily?) after hoisting only when vector size is 32 bytes (avx2) but they become unordered with 16 (avx=0 or avx1) or 64 (avx512) bytes vectors. 
>> 
>> The mystery of why the test did not fail in our teting environment is also solved! We have old Skylake machines (even my local machine) for which AVX is switched to avx2: 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L721
>> 
>> I have simple fix (use original loads ordering indexes) but looking on the code which causing the issue I see that it is bogus/incomplete - it does not help cases listed for JDK-8076284 changes: 
>> 
>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017645.html 
>> 
>> Using unrolling and cloning information to vectorize is interesting idea but as I see it is not complete. Even if pack_parallel() method is able created packs they are currently all removed by filter_packs() method. 
>> And additionally the above cases from JDK-8076284 are vectorized without hoisting loads and pack_parallel - I verified it.
>> 
>> The code added by  JDK-8076284  is useless now and I am excluding ti. It needs more work to be useful. I reluctant to remove the code because may be in a future we will have time to invest into it.
>> 
>> There were 2 way to exclude it: add new field in superword class:
>> 
>>    bool           _do_vector_loop;  // whether to do vectorization/simd style
>> +  bool           _do_vector_loop_experimental; // experimental optimization
>>    bool           _do_reserve_copy; // do reserve copy of the graph(loop) before final modification in output
>> or use `#if VECTOR_LOOP_SIMD` as in current changes. I prefer this one to avoid wasting compilation time. I used first one for testing by setting `_do_vector_loop_experimental(UseNewCode)`.
>> 
>> Testing: tier1-7 with avx2 and avx512.
>> Performance testing - no regrression.
>> I also compared jtreg tests output with -XX:+TraceNewVectors to verify that number of created vector nodes did not change.
>
> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Use static const bool local variable

src/hotspot/share/opto/superword.cpp line 94:

> 92: }
> 93: 
> 94: static const bool _do_vector_loop_experimental = false; // Experimental vectorization which uses data from loop unrolling.

For such purposes, I find debug VM flags a better option: they are constant in product binaries, easy to experiment/change from command-line with debug binaries, and enumerated in a single place.

Any reason to prefer `static const` declarations?

-------------

PR: https://git.openjdk.java.net/jdk/pull/859