RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092
Viswanathan, Sandhya
sandhya.viswanathan at intel.com
Mon Nov 25 23:31:42 UTC 2019
Hi Vladimir,
Please find below updated webrev with your comments implemented:
http://cr.openjdk.java.net/~sviswanathan/8234610/webrev.02/
Best Regards,
Sandhya
-----Original Message-----
From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
Sent: Monday, November 25, 2019 12:23 PM
To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Vladimir Kozlov <vladimir.kozlov at oracle.com>
Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when UseAVX=3 is specified after JDK-8221092
> From the source code, it looks like os_supports_avx_vectors() does two different things based on the AVX level.
> If AVX=3, it checks if save/restore of zmm worked properly.
> If AVX=1 or 2, it checks is save/restore of ymm worked properly.
> Since only save/restore of zmm is done conditionally for Skylake, the problem is only with AVX=3. And that is what this patch is trying to fix.
Ok, now I see how it works:
os_supports_avx_vectors() uses supports_evex() and supports_avx() to dispatch which inspect supported CPU features:
static bool supports_avx() { return (_features & CPU_AVX) != 0; }
static bool supports_evex() { return (_features & CPU_AVX512F) !=
0; }
But VM_Version::get_processor_features() clears detected features depending on UseAVX level.
So, as you described os_supports_avx_vectors() goes through different paths between UseAVX=3 and UseAVX=1/2.
> The rest we are entering 8221092 discussion.
> I think Vivek's intent there was not do any AVX 512 instructions at all if AVX < 3 to overcome performance regressions observed by Scott Oaks.
My best guess is it helped analysis by eliminating all problematic instructions.
Anyway, I'm fine with leaving that code. (Though it would be nice to simply get rid of it.)
It looks like all the code after start_simd_check (and checks to
legacy_save_restore) can go under use_evex check. And it'll make the code clearer (I spent too much time reasoning about interactions between use_evex and FLAG_IS_DEFAULT(UseAVX) you added).
Otherwise, looks good.
Best regards,
Vladimir Ivanov
> -----Original Message-----
> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> Sent: Monday, November 25, 2019 10:15 AM
> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Vladimir
> Kozlov <vladimir.kozlov at oracle.com>
> Cc: hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when
> UseAVX=3 is specified after JDK-8221092
>
> Thanks for the clarifications, Sandhya!
>
> Some more questions inlined.
>> The bug happens like below:
>>
>> * User specifies -XX:UseAVX=3 on command line.
>> * On Skylake platform due to the following lines:
>> __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset())));
>> __ movl(rax, Address(rsi, 0));
>> __ cmpl(rax, 0x50654); // If it is Skylake
>> __ jcc(Assembler::equal, legacy_setup);
>> The zmm registers are not saved/restored.
>> * This results in os_supports_avx_vectors() returning false when called from VM_Version::get_processor_features():
>> int max_vector_size = 0;
>> if (UseSSE < 2) {
>> // Vectors (in XMM) are only supported with SSE2+
>> // SSE is always 2 on x64.
>> max_vector_size = 0;
>> } else if (UseAVX == 0 || !os_supports_avx_vectors()) {
>> // 16 byte vectors (in XMM) are supported with SSE2+
>> max_vector_size = 16; ====> This is the point where max_vector_size is set to 16
>> } else else if (UseAVX == 1 || UseAVX == 2) {
>> ...
>> }
>> * And so we get UseAVX=3 and max_vector_size = 16.
>>
>> So the fix is to save/restore zmm registers when the flag is not default and user specifies UseAVX > 2.
>
> Unfortunately, I'm still confused :-(
>
> If it turns os_supports_avx_vectors() == false, why -XX:UseAVX=1 and
> -XX:UseAVX=2 aren't affected on Skylake as well?
>
>> On your question regarding why 8221092 needs the code you conditionally exclude:
>> This was introduced so as not to do any AVX512 execution if not required. ZMM register save/restore uses AVX512 instruction.
>
> Are you talking about completely avoiding execution of AVX512 instructions on Skylakes if UseAVX < 3?
>
> Considering it is in generate_get_cpu_info() which is run only once early at startup, what kind of effects you intend to avoid?
>
> Best regards,
> Vladimir Ivanov
>
>>
>> Best Regards,
>> Sandhya
>>
>>
>> -----Original Message-----
>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
>> Sent: Friday, November 22, 2019 4:39 AM
>> To: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; hotspot
>> compiler <hotspot-compiler-dev at openjdk.java.net>
>> Subject: Re: RFR (XXS) 8234610: MaxVectorSize set wrongly when
>> UseAVX=3 is specified after JDK-8221092
>>
>> Hi Sandhya,
>>
>>> On Skylake platform the JVM by default sets the UseAVX level to 2 and accordingly sets MaxVectorSize to 32 bytes as per JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>.
>>>
>>> When the user explicitly wants to use AVX3 on Skylake platform, they need to invoke the JVM with -XX:UseAVX=3 command line argument.
>>> This should automatically result in MaxVectorSize being set to 64 bytes.
>>>
>>> However post JDK-8221092<https://bugs.openjdk.java.net/browse/JDK-8221092>, when -XX:UseAVX=3 is given on command line, the MaxVectorSize is being wrongly set to 16 bytes.
>>
>> Please, elaborate how it happens and how legacy_setup affects it?
>>
>> Why 8221092 needs the code you conditionally exclude.
>> Why the following isn't enough?
>>
>> if (FLAG_IS_DEFAULT(UseAVX)) {
>> FLAG_SET_DEFAULT(UseAVX, use_avx_limit);
>> + if (is_intel_family_core() && _model == CPU_MODEL_SKYLAKE &&
>> _stepping < 5) {
>> + FLAG_SET_DEFAULT(UseAVX, 2); //Set UseAVX=2 for Skylake
>> + }
>> } else if (UseAVX > use_avx_limit) {
>>
>> Best regards,
>> Vladimir Ivanov
>>
More information about the hotspot-compiler-dev
mailing list