RFR: Newer AMD 17h (EPYC) Processor family defaults

Vladimir Kozlov vladimir.kozlov at oracle.com
Tue Sep 5 17:49:01 UTC 2017


On 9/4/17 11:02 PM, David Holmes wrote:
> Hi Rohit,
> 
> I couldn't see a bug filed for this so I did it:
> 
> https://bugs.openjdk.java.net/browse/JDK-8187219
> 
> I also hosted the webrev as I wanted to see the change in context:
> 
> http://cr.openjdk.java.net/~dholmes/8187219/webrev/

Thank you, David, for filing RFE and preparing webrev.

> 
> I have a couple of comments/queries:
> 
> src/cpu/x86/vm/vm_version_x86.hpp
> 
> So this moved the adx/bmi2/sha/fam settings out from being Intel 
> specific to applying to AMD as well - ok. Have these features always 
> been available in AMD chips? Just wondering if they might not be valid 
> for some older processors.

Looks like AMD used the *same* CPUID bits to check availability of these 
features. Older CPUs will not have these bits set.

> 
> You added:
> 
>   526       if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>   527         result |= CPU_HT;
> 
> and I'm wondering of there would be any case where this would not be 
> covered by the earlier:
> 
>   448     if (threads_per_core() > 1)
>   449       result |= CPU_HT;
> 
> ?
> ---

Valid question.

Thanks,
Vladimir

> 
> src/cpu/x86/vm/vm_version_x86.cpp
> 
> No comments on AMD specific changes.
> 
> Thanks,
> David
> -----
> 
> On 5/09/2017 3:43 PM, David Holmes wrote:
>> On 5/09/2017 3:29 PM, Rohit Arul Raj wrote:
>>> Hello David,
>>>
>>> On Tue, Sep 5, 2017 at 10:31 AM, David Holmes 
>>> <david.holmes at oracle.com> wrote:
>>>> Hi Rohit,
>>>>
>>>> I was unable to apply your patch to latest jdk10/hs/hotspot repo.
>>>>
>>>
>>> I checked out the latest jdk10/hs/hotspot [parent: 13548:1a9c2e07a826]
>>> and was able to apply the patch [epyc-amd17h-defaults-3Sept.patch]
>>> without any issues.
>>> Can you share the error message that you are getting?
>>
>> I was getting this:
>>
>> applying hotspot.patch
>> patching file src/cpu/x86/vm/vm_version_x86.cpp
>> Hunk #1 FAILED at 1108
>> 1 out of 1 hunks FAILED -- saving rejects to file 
>> src/cpu/x86/vm/vm_version_x86.cpp.rej
>> patching file src/cpu/x86/vm/vm_version_x86.hpp
>> Hunk #2 FAILED at 522
>> 1 out of 2 hunks FAILED -- saving rejects to file 
>> src/cpu/x86/vm/vm_version_x86.hpp.rej
>> abort: patch failed to apply
>>
>> but I started again and this time it applied fine, so not sure what 
>> was going on there.
>>
>> Cheers,
>> David
>>
>>> Regards,
>>> Rohit
>>>
>>>
>>>>
>>>>
>>>> On 4/09/2017 2:42 AM, Rohit Arul Raj wrote:
>>>>>
>>>>> Hello Vladimir,
>>>>>
>>>>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov
>>>>> <vladimir.kozlov at oracle.com> wrote:
>>>>>>
>>>>>> Hi Rohit,
>>>>>>
>>>>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hello Vladimir,
>>>>>>>
>>>>>>>> Changes look good. Only question I have is about MaxVectorSize. 
>>>>>>>> It is
>>>>>>>> set
>>>>>>>>>
>>>>>>>>>
>>>>>>>> 16 only in presence of AVX:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945 
>>>>>>>>
>>>>>>>>
>>>>>>>> Does that code works for AMD 17h too?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks for pointing that out. Yes, the code works fine for AMD 
>>>>>>> 17h. So
>>>>>>> I have removed the surplus check for MaxVectorSize from my patch. I
>>>>>>> have updated, re-tested and attached the patch.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Which check you removed?
>>>>>>
>>>>>
>>>>> My older patch had the below mentioned check which was required on
>>>>> JDK9 where the default MaxVectorSize was 64. It has been handled
>>>>> better in openJDK10. So this check is not required anymore.
>>>>>
>>>>> +    // Some defaults for AMD family 17h
>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>> ...
>>>>> ...
>>>>> +      if (MaxVectorSize > 32) {
>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>> +      }
>>>>> ..
>>>>> ..
>>>>> +      }
>>>>>
>>>>>>>
>>>>>>> I have one query regarding the setting of UseSHA flag:
>>>>>>>
>>>>>>>
>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821 
>>>>>>>
>>>>>>>
>>>>>>> AMD 17h has support for SHA.
>>>>>>> AMD 15h doesn't have  support for SHA. Still "UseSHA" flag gets
>>>>>>> enabled for it based on the availability of BMI2 and AVX2. Is 
>>>>>>> there an
>>>>>>> underlying reason for this? I have handled this in the patch but 
>>>>>>> just
>>>>>>> wanted to confirm.
>>>>>>
>>>>>>
>>>>>>
>>>>>> It was done with next changes which use only AVX2 and BMI2 
>>>>>> instructions
>>>>>> to
>>>>>> calculate SHA-256:
>>>>>>
>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974
>>>>>>
>>>>>> I don't know if AMD 15h supports these instructions and can 
>>>>>> execute that
>>>>>> code. You need to test it.
>>>>>>
>>>>>
>>>>> Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions,
>>>>> it should work.
>>>>> Confirmed by running following sanity tests:
>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java
>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java 
>>>>>
>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java 
>>>>>
>>>>>
>>>>> So I have removed those SHA checks from my patch too.
>>>>>
>>>>> Please find attached updated, re-tested patch.
>>>>>
>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> @@ -1109,11 +1109,27 @@
>>>>>        }
>>>>>
>>>>>    #ifdef COMPILER2
>>>>> -    if (MaxVectorSize > 16) {
>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>          FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>        }
>>>>>    #endif // COMPILER2
>>>>> +
>>>>> +    // Some defaults for AMD family 17h
>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for
>>>>> Array Copy
>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
>>>>> +      }
>>>>> +      if (supports_sse2() && 
>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>>> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
>>>>> +      }
>>>>> +#ifdef COMPILER2
>>>>> +      if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>> +        FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>> +      }
>>>>> +#endif
>>>>> +    }
>>>>>      }
>>>>>
>>>>>      if( is_intel() ) { // Intel cpus specific settings
>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> @@ -505,6 +505,14 @@
>>>>>          result |= CPU_CLMUL;
>>>>>        if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>>>          result |= CPU_RTM;
>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>> +       result |= CPU_ADX;
>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>> +      result |= CPU_BMI2;
>>>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>> +      result |= CPU_SHA;
>>>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>> +      result |= CPU_FMA;
>>>>>
>>>>>        // AMD features.
>>>>>        if (is_amd()) {
>>>>> @@ -515,19 +523,13 @@
>>>>>            result |= CPU_LZCNT;
>>>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>            result |= CPU_SSE4A;
>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>> +        result |= CPU_HT;
>>>>>        }
>>>>>        // Intel features.
>>>>>        if(is_intel()) {
>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>> -         result |= CPU_ADX;
>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>> -        result |= CPU_BMI2;
>>>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>> -        result |= CPU_SHA;
>>>>>          if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>>>            result |= CPU_LZCNT;
>>>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>> -        result |= CPU_FMA;
>>>>>          // for Intel, ecx.bits.misalignsse bit (bit 8) indicates
>>>>> support for prefetchw
>>>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>>>            result |= CPU_3DNOW_PREFETCH;
>>>>>
>>>>> Please let me know your comments.
>>>>>
>>>>> Thanks for your time.
>>>>> Rohit
>>>>>
>>>>>>>
>>>>>>> Thanks for taking time to review the code.
>>>>>>>
>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>> @@ -1088,6 +1088,22 @@
>>>>>>>           }
>>>>>>>           FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>>         }
>>>>>>> +    if (supports_sha()) {
>>>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>>>> +      }
>>>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || 
>>>>>>> UseSHA256Intrinsics ||
>>>>>>> UseSHA512Intrinsics) {
>>>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>>>> +      }
>>>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>> +    }
>>>>>>>
>>>>>>>         // some defaults for AMD family 15h
>>>>>>>         if ( cpu_family() == 0x15 ) {
>>>>>>> @@ -1109,11 +1125,40 @@
>>>>>>>         }
>>>>>>>
>>>>>>>     #ifdef COMPILER2
>>>>>>> -    if (MaxVectorSize > 16) {
>>>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>>           FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>>         }
>>>>>>>     #endif // COMPILER2
>>>>>>> +
>>>>>>> +    // Some defaults for AMD family 17h
>>>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>>>> +      // On family 17h processors use XMM and 
>>>>>>> UnalignedLoadStores for
>>>>>>> Array Copy
>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>>>> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
>>>>>>> +      }
>>>>>>> +      if (supports_sse2() && 
>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>>>>> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
>>>>>>> +      }
>>>>>>> +      if (supports_bmi2() && 
>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>>>> +        FLAG_SET_DEFAULT(UseBMI2Instructions, true);
>>>>>>> +      }
>>>>>>> +      if (UseSHA) {
>>>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>>>>>> functions not available on this CPU.");
>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>> +        }
>>>>>>> +      }
>>>>>>> +#ifdef COMPILER2
>>>>>>> +      if (supports_sse4_2()) {
>>>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>>>> +        }
>>>>>>> +      }
>>>>>>> +#endif
>>>>>>> +    }
>>>>>>>       }
>>>>>>>
>>>>>>>       if( is_intel() ) { // Intel cpus specific settings
>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>> @@ -505,6 +505,14 @@
>>>>>>>           result |= CPU_CLMUL;
>>>>>>>         if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>>>>>           result |= CPU_RTM;
>>>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>> +       result |= CPU_ADX;
>>>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>> +      result |= CPU_BMI2;
>>>>>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>> +      result |= CPU_SHA;
>>>>>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>> +      result |= CPU_FMA;
>>>>>>>
>>>>>>>         // AMD features.
>>>>>>>         if (is_amd()) {
>>>>>>> @@ -515,19 +523,13 @@
>>>>>>>             result |= CPU_LZCNT;
>>>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>>             result |= CPU_SSE4A;
>>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>> +        result |= CPU_HT;
>>>>>>>         }
>>>>>>>         // Intel features.
>>>>>>>         if(is_intel()) {
>>>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>> -         result |= CPU_ADX;
>>>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>> -        result |= CPU_BMI2;
>>>>>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>> -        result |= CPU_SHA;
>>>>>>>           if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>>>>>             result |= CPU_LZCNT;
>>>>>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>> -        result |= CPU_FMA;
>>>>>>>           // for Intel, ecx.bits.misalignsse bit (bit 8) indicates
>>>>>>> support for prefetchw
>>>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>>>>>             result |= CPU_3DNOW_PREFETCH;
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Rohit
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj
>>>>>>>>> <rohitarulraj at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes
>>>>>>>>>> <david.holmes at oracle.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Rohit,
>>>>>>>>>>>
>>>>>>>>>>> I think the patch needs updating for jdk10 as I already see a 
>>>>>>>>>>> lot of
>>>>>>>>>>> logic
>>>>>>>>>>> around UseSHA in vm_version_x86.cpp.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks David, I will update the patch wrt JDK10 source base, 
>>>>>>>>>> test and
>>>>>>>>>> resubmit for review.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Rohit
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> I have updated the patch wrt openjdk10/hotspot (parent:
>>>>>>>>> 13519:71337910df60), did regression testing using jtreg ($make
>>>>>>>>> default) and didnt find any regressions.
>>>>>>>>>
>>>>>>>>> Can anyone please volunteer to review this patch  which sets 
>>>>>>>>> flag/ISA
>>>>>>>>> defaults for newer AMD 17h (EPYC) processor?
>>>>>>>>>
>>>>>>>>> ************************* Patch ****************************
>>>>>>>>>
>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>> @@ -1088,6 +1088,22 @@
>>>>>>>>>            }
>>>>>>>>>            FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>>>>          }
>>>>>>>>> +    if (supports_sha()) {
>>>>>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>>>>>> +      }
>>>>>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || 
>>>>>>>>> UseSHA256Intrinsics ||
>>>>>>>>> UseSHA512Intrinsics) {
>>>>>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>>> +        warning("SHA instructions are not available on this 
>>>>>>>>> CPU");
>>>>>>>>> +      }
>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>> +    }
>>>>>>>>>
>>>>>>>>>          // some defaults for AMD family 15h
>>>>>>>>>          if ( cpu_family() == 0x15 ) {
>>>>>>>>> @@ -1109,11 +1125,43 @@
>>>>>>>>>          }
>>>>>>>>>
>>>>>>>>>      #ifdef COMPILER2
>>>>>>>>> -    if (MaxVectorSize > 16) {
>>>>>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>>>>            FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>>>>          }
>>>>>>>>>      #endif // COMPILER2
>>>>>>>>> +
>>>>>>>>> +    // Some defaults for AMD family 17h
>>>>>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>>>>>> +      // On family 17h processors use XMM and 
>>>>>>>>> UnalignedLoadStores for
>>>>>>>>> Array Copy
>>>>>>>>> +      if (supports_sse2() && 
>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>>>>>> +        UseXMMForArrayCopy = true;
>>>>>>>>> +      }
>>>>>>>>> +      if (supports_sse2() && 
>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores))
>>>>>>>>> {
>>>>>>>>> +        UseUnalignedLoadStores = true;
>>>>>>>>> +      }
>>>>>>>>> +      if (supports_bmi2() && 
>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>>>>>> +        UseBMI2Instructions = true;
>>>>>>>>> +      }
>>>>>>>>> +      if (MaxVectorSize > 32) {
>>>>>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>>>>>> +      }
>>>>>>>>> +      if (UseSHA) {
>>>>>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>>>>>>>> functions not available on this CPU.");
>>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>> +        }
>>>>>>>>> +      }
>>>>>>>>> +#ifdef COMPILER2
>>>>>>>>> +      if (supports_sse4_2()) {
>>>>>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>>>>>> +        }
>>>>>>>>> +      }
>>>>>>>>> +#endif
>>>>>>>>> +    }
>>>>>>>>>        }
>>>>>>>>>
>>>>>>>>>        if( is_intel() ) { // Intel cpus specific settings
>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>> @@ -505,6 +505,14 @@
>>>>>>>>>            result |= CPU_CLMUL;
>>>>>>>>>          if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>>>>>>>            result |= CPU_RTM;
>>>>>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>>>> +       result |= CPU_ADX;
>>>>>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>>>> +      result |= CPU_BMI2;
>>>>>>>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>>>> +      result |= CPU_SHA;
>>>>>>>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>>>> +      result |= CPU_FMA;
>>>>>>>>>
>>>>>>>>>          // AMD features.
>>>>>>>>>          if (is_amd()) {
>>>>>>>>> @@ -515,19 +523,13 @@
>>>>>>>>>              result |= CPU_LZCNT;
>>>>>>>>>            if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>>>>              result |= CPU_SSE4A;
>>>>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>>>> +        result |= CPU_HT;
>>>>>>>>>          }
>>>>>>>>>          // Intel features.
>>>>>>>>>          if(is_intel()) {
>>>>>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>>>> -         result |= CPU_ADX;
>>>>>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>>>> -        result |= CPU_BMI2;
>>>>>>>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>>>> -        result |= CPU_SHA;
>>>>>>>>>            if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>>>>>>>              result |= CPU_LZCNT;
>>>>>>>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>>>> -        result |= CPU_FMA;
>>>>>>>>>            // for Intel, ecx.bits.misalignsse bit (bit 8) 
>>>>>>>>> indicates
>>>>>>>>> support for prefetchw
>>>>>>>>>            if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>>>>>>>              result |= CPU_3DNOW_PREFETCH;
>>>>>>>>>
>>>>>>>>> **************************************************************
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Rohit
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes
>>>>>>>>>>>> <david.holmes at oracle.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Rohit,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would like an volunteer to review this patch (openJDK9) 
>>>>>>>>>>>>>> which
>>>>>>>>>>>>>> sets
>>>>>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and 
>>>>>>>>>>>>>> help us
>>>>>>>>>>>>>> with
>>>>>>>>>>>>>> the commit process.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately patches can not be accepted from systems 
>>>>>>>>>>>>> outside the
>>>>>>>>>>>>> OpenJDK
>>>>>>>>>>>>> infrastructure and ...
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ... unfortunately patches tend to get stripped by the mail
>>>>>>>>>>>>> servers.
>>>>>>>>>>>>> If
>>>>>>>>>>>>> the
>>>>>>>>>>>>> patch is small please include it inline. Otherwise you will 
>>>>>>>>>>>>> need
>>>>>>>>>>>>> to
>>>>>>>>>>>>> find
>>>>>>>>>>>>> an
>>>>>>>>>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>> 3) I have done regression testing using jtreg ($make 
>>>>>>>>>>>>>> default) and
>>>>>>>>>>>>>> didnt find any regressions.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sounds good, but until I see the patch it is hard to 
>>>>>>>>>>>>> comment on
>>>>>>>>>>>>> testing
>>>>>>>>>>>>> requirements.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> David
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks David,
>>>>>>>>>>>> Yes, it's a small patch.
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>>>>> @@ -1051,6 +1051,22 @@
>>>>>>>>>>>>             }
>>>>>>>>>>>>             FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>>>>>>>           }
>>>>>>>>>>>> +    if (supports_sha()) {
>>>>>>>>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>>>>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>>>>>>>>> +      }
>>>>>>>>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || 
>>>>>>>>>>>> UseSHA256Intrinsics
>>>>>>>>>>>> ||
>>>>>>>>>>>> UseSHA512Intrinsics) {
>>>>>>>>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>>>>>> +        warning("SHA instructions are not available on this 
>>>>>>>>>>>> CPU");
>>>>>>>>>>>> +      }
>>>>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>>>>> +    }
>>>>>>>>>>>>
>>>>>>>>>>>>           // some defaults for AMD family 15h
>>>>>>>>>>>>           if ( cpu_family() == 0x15 ) {
>>>>>>>>>>>> @@ -1072,11 +1088,43 @@
>>>>>>>>>>>>           }
>>>>>>>>>>>>
>>>>>>>>>>>>       #ifdef COMPILER2
>>>>>>>>>>>> -    if (MaxVectorSize > 16) {
>>>>>>>>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>>>>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>>>>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>>>>>>>             FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>>>>>>>           }
>>>>>>>>>>>>       #endif // COMPILER2
>>>>>>>>>>>> +
>>>>>>>>>>>> +    // Some defaults for AMD family 17h
>>>>>>>>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>>>>>>>>> +      // On family 17h processors use XMM and 
>>>>>>>>>>>> UnalignedLoadStores
>>>>>>>>>>>> for
>>>>>>>>>>>> Array Copy
>>>>>>>>>>>> +      if (supports_sse2() && 
>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy))
>>>>>>>>>>>> {
>>>>>>>>>>>> +        UseXMMForArrayCopy = true;
>>>>>>>>>>>> +      }
>>>>>>>>>>>> +      if (supports_sse2() &&
>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores))
>>>>>>>>>>>> {
>>>>>>>>>>>> +        UseUnalignedLoadStores = true;
>>>>>>>>>>>> +      }
>>>>>>>>>>>> +      if (supports_bmi2() && 
>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions))
>>>>>>>>>>>> {
>>>>>>>>>>>> +        UseBMI2Instructions = true;
>>>>>>>>>>>> +      }
>>>>>>>>>>>> +      if (MaxVectorSize > 32) {
>>>>>>>>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>>>>>>>>> +      }
>>>>>>>>>>>> +      if (UseSHA) {
>>>>>>>>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>>>>>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 
>>>>>>>>>>>> crypto hash
>>>>>>>>>>>> functions not available on this CPU.");
>>>>>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +      }
>>>>>>>>>>>> +#ifdef COMPILER2
>>>>>>>>>>>> +      if (supports_sse4_2()) {
>>>>>>>>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>>>>>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +      }
>>>>>>>>>>>> +#endif
>>>>>>>>>>>> +    }
>>>>>>>>>>>>         }
>>>>>>>>>>>>
>>>>>>>>>>>>         if( is_intel() ) { // Intel cpus specific settings
>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>>>> @@ -513,6 +513,16 @@
>>>>>>>>>>>>               result |= CPU_LZCNT;
>>>>>>>>>>>>             if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>>>>>>>               result |= CPU_SSE4A;
>>>>>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>>>>>>> +        result |= CPU_BMI2;
>>>>>>>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>>>>>>> +        result |= CPU_HT;
>>>>>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>>>>>>> +        result |= CPU_ADX;
>>>>>>>>>>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>>>>>>> +        result |= CPU_SHA;
>>>>>>>>>>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>>>>>>> +        result |= CPU_FMA;
>>>>>>>>>>>>           }
>>>>>>>>>>>>           // Intel features.
>>>>>>>>>>>>           if(is_intel()) {
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Rohit
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>


More information about the hotspot-dev mailing list