RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092
    Vladimir Ivanov 
    vladimir.x.ivanov at oracle.com
       
    Tue Nov 26 09:26:20 UTC 2019
    
    
  
> http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.02/
Looks good.
Testing results (hs-precheckin-comp,hs-tier1,hs-tier2) are clean.
Best regards,
Vladimir Ivanov
> -----Original Message-----
> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> Sent: Monday, November 25, 2019 12:23 PM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Vladimir Kozlov <vladimir.kozlov at oracle.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092
> 
> 
>>   From the source code, it looks like os_supports_avx_vectors() does two different things based on the AVX level.
>> If AVX=3, it checks if save/restore of zmm worked properly.
>> If AVX=1 or 2, it checks is save/restore of ymm worked properly.
>> Since only save/restore of zmm is done conditionally for Skylake, the problem is only with AVX=3.  And that is what this patch is trying to fix.
> 
> Ok, now I see how it works:
> 
> os_supports_avx_vectors() uses supports_evex() and supports_avx() to dispatch which inspect supported CPU features:
> 
>     static bool supports_avx()      { return (_features & CPU_AVX) != 0; }
> 
>     static bool supports_evex()     { return (_features & CPU_AVX512F) !=
> 0; }
> 
> But VM_Version::get_processor_features() clears detected features depending on UseAVX level.
> 
> So, as you described os_supports_avx_vectors() goes through different paths between UseAVX=3 and UseAVX=1/2.
> 
>> The rest we are entering 8221092 discussion.
>> I think Vivek's intent there was not do any AVX 512 instructions at all if AVX < 3 to overcome performance regressions observed by Scott Oaks.
> 
> My best guess is it helped analysis by eliminating all problematic instructions.
> 
> Anyway, I'm fine with leaving that code. (Though it would be nice to simply get rid of it.)
> 
> It looks like all the code after start_simd_check (and checks to
> legacy_save_restore) can go under use_evex check. And it'll make the code clearer (I spent too much time reasoning about interactions between use_evex and FLAG_IS_DEFAULT(UseAVX) you added).
> 
> Otherwise, looks good.
> 
> Best regards,
> Vladimir Ivanov
> 
>> -----Original Message-----
>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
>> Sent: Monday, November 25, 2019 10:15 AM
>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Vladimir
>> Kozlov <vladimir.kozlov at oracle.com>
>> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
>> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when
>> UseAVX=3 is specified after JDK-8221092
>>
>> Thanks for the clarifications, Sandhya!
>>
>> Some more questions inlined.
>>> The bug happens like below:
>>>
>>> * User specifies -XX:UseAVX=3 on command line.
>>> * On Skylake platform due to the following lines:
>>>        __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset())));
>>>        __ movl(rax, Address(rsi, 0));
>>>        __ cmpl(rax, 0x50654);              // If it is Skylake
>>>        __ jcc(Assembler::equal, legacy_setup);
>>>       The zmm registers are not saved/restored.
>>> * This results in os_supports_avx_vectors() returning false when called from VM_Version::get_processor_features():
>>>      int max_vector_size = 0;
>>>      if (UseSSE < 2) {
>>>        // Vectors (in XMM) are only supported with SSE2+
>>>        // SSE is always 2 on x64.
>>>        max_vector_size = 0;
>>>      } else if (UseAVX == 0 || !os_supports_avx_vectors()) {
>>>        // 16 byte vectors (in XMM) are supported with SSE2+
>>>        max_vector_size = 16;                              ====> This is the point where max_vector_size is set to 16
>>>      } else else if (UseAVX == 1 || UseAVX == 2) {
>>>         ...
>>>      }
>>> * And so we get UseAVX=3 and max_vector_size = 16.
>>>
>>> So the fix is to save/restore zmm registers when the flag is not default and user specifies UseAVX > 2.
>>
>> Unfortunately, I'm still confused :-(
>>
>> If it turns os_supports_avx_vectors() == false, why -XX:UseAVX=1 and
>> -XX:UseAVX=2 aren't affected on Skylake as well?
>>
>>> On your question regarding why 8221092 needs the code you conditionally exclude:
>>> This was introduced so as not to do any AVX512 execution if not required.  ZMM register save/restore uses AVX512 instruction.
>>
>> Are you talking about completely avoiding execution of AVX512 instructions on Skylakes if UseAVX < 3?
>>
>> Considering it is in generate_get_cpu_info() which is run only once early at startup, what kind of effects you intend to avoid?
>>
>> Best regards,
>> Vladimir Ivanov
>>
>>>
>>> Best Regards,
>>> Sandhya
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
>>> Sent: Friday, November 22, 2019 4:39 AM
>>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot
>>> compiler <hotspot-compiler-dev at openjdk.java.net>
>>> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when
>>> UseAVX=3 is specified after JDK-8221092
>>>
>>> Hi Sandhya,
>>>
>>>> On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>.
>>>>
>>>> When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument.
>>>> This should automatically result in MaxVectorSize being set to 64 bytes.
>>>>
>>>> However post JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes.
>>>
>>> Please, elaborate how it happens and how legacy_setup affects it?
>>>
>>> Why 8221092 needs the code you conditionally exclude.
>>> Why the following isn't enough?
>>>
>>>        if (FLAG_IS_DEFAULT(UseAVX)) {
>>>          FLAG_SET_DEFAULT(UseAVX, use_avx_limit);
>>> +    if (is_intel_family_core() && _model == CPU_MODEL_SKYLAKE &&
>>> _stepping < 5) {
>>> +      FLAG_SET_DEFAULT(UseAVX, 2);  //Set UseAVX=2 for Skylake
>>> +    }
>>>        } else if (UseAVX > use_avx_limit) {
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
    
    
More information about the hotspot-compiler-dev
mailing list