RFR: Newer AMD 17h (EPYC) Processor family defaults

David Holmes david.holmes at oracle.com
Tue Sep 5 05:43:03 UTC 2017


On 5/09/2017 3:29 PM, Rohit Arul Raj wrote:
> Hello David,
> 
> On Tue, Sep 5, 2017 at 10:31 AM, David Holmes <david.holmes at oracle.com> wrote:
>> Hi Rohit,
>>
>> I was unable to apply your patch to latest jdk10/hs/hotspot repo.
>>
> 
> I checked out the latest jdk10/hs/hotspot [parent: 13548:1a9c2e07a826]
> and was able to apply the patch [epyc-amd17h-defaults-3Sept.patch]
> without any issues.
> Can you share the error message that you are getting?

I was getting this:

applying hotspot.patch
patching file src/cpu/x86/vm/vm_version_x86.cpp
Hunk #1 FAILED at 1108
1 out of 1 hunks FAILED -- saving rejects to file 
src/cpu/x86/vm/vm_version_x86.cpp.rej
patching file src/cpu/x86/vm/vm_version_x86.hpp
Hunk #2 FAILED at 522
1 out of 2 hunks FAILED -- saving rejects to file 
src/cpu/x86/vm/vm_version_x86.hpp.rej
abort: patch failed to apply

but I started again and this time it applied fine, so not sure what was 
going on there.

Cheers,
David

> Regards,
> Rohit
> 
> 
>>
>>
>> On 4/09/2017 2:42 AM, Rohit Arul Raj wrote:
>>>
>>> Hello Vladimir,
>>>
>>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov
>>> <vladimir.kozlov at oracle.com> wrote:
>>>>
>>>> Hi Rohit,
>>>>
>>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>>>>
>>>>>
>>>>> Hello Vladimir,
>>>>>
>>>>>> Changes look good. Only question I have is about MaxVectorSize. It is
>>>>>> set
>>>>>>>
>>>>>>>
>>>>>> 16 only in presence of AVX:
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>>>>>
>>>>>> Does that code works for AMD 17h too?
>>>>>
>>>>>
>>>>>
>>>>> Thanks for pointing that out. Yes, the code works fine for AMD 17h. So
>>>>> I have removed the surplus check for MaxVectorSize from my patch. I
>>>>> have updated, re-tested and attached the patch.
>>>>
>>>>
>>>>
>>>> Which check you removed?
>>>>
>>>
>>> My older patch had the below mentioned check which was required on
>>> JDK9 where the default MaxVectorSize was 64. It has been handled
>>> better in openJDK10. So this check is not required anymore.
>>>
>>> +    // Some defaults for AMD family 17h
>>> +    if ( cpu_family() == 0x17 ) {
>>> ...
>>> ...
>>> +      if (MaxVectorSize > 32) {
>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>> +      }
>>> ..
>>> ..
>>> +      }
>>>
>>>>>
>>>>> I have one query regarding the setting of UseSHA flag:
>>>>>
>>>>>
>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821
>>>>>
>>>>> AMD 17h has support for SHA.
>>>>> AMD 15h doesn't have  support for SHA. Still "UseSHA" flag gets
>>>>> enabled for it based on the availability of BMI2 and AVX2. Is there an
>>>>> underlying reason for this? I have handled this in the patch but just
>>>>> wanted to confirm.
>>>>
>>>>
>>>>
>>>> It was done with next changes which use only AVX2 and BMI2 instructions
>>>> to
>>>> calculate SHA-256:
>>>>
>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974
>>>>
>>>> I don't know if AMD 15h supports these instructions and can execute that
>>>> code. You need to test it.
>>>>
>>>
>>> Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions,
>>> it should work.
>>> Confirmed by running following sanity tests:
>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java
>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java
>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java
>>>
>>> So I have removed those SHA checks from my patch too.
>>>
>>> Please find attached updated, re-tested patch.
>>>
>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>> @@ -1109,11 +1109,27 @@
>>>        }
>>>
>>>    #ifdef COMPILER2
>>> -    if (MaxVectorSize > 16) {
>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>          FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>        }
>>>    #endif // COMPILER2
>>> +
>>> +    // Some defaults for AMD family 17h
>>> +    if ( cpu_family() == 0x17 ) {
>>> +      // On family 17h processors use XMM and UnalignedLoadStores for
>>> Array Copy
>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
>>> +      }
>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
>>> +      }
>>> +#ifdef COMPILER2
>>> +      if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>> +        FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>> +      }
>>> +#endif
>>> +    }
>>>      }
>>>
>>>      if( is_intel() ) { // Intel cpus specific settings
>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>> @@ -505,6 +505,14 @@
>>>          result |= CPU_CLMUL;
>>>        if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>          result |= CPU_RTM;
>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>> +       result |= CPU_ADX;
>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>> +      result |= CPU_BMI2;
>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>> +      result |= CPU_SHA;
>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>> +      result |= CPU_FMA;
>>>
>>>        // AMD features.
>>>        if (is_amd()) {
>>> @@ -515,19 +523,13 @@
>>>            result |= CPU_LZCNT;
>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>            result |= CPU_SSE4A;
>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>> +        result |= CPU_HT;
>>>        }
>>>        // Intel features.
>>>        if(is_intel()) {
>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>> -         result |= CPU_ADX;
>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>> -        result |= CPU_BMI2;
>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>> -        result |= CPU_SHA;
>>>          if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>            result |= CPU_LZCNT;
>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>> -        result |= CPU_FMA;
>>>          // for Intel, ecx.bits.misalignsse bit (bit 8) indicates
>>> support for prefetchw
>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>            result |= CPU_3DNOW_PREFETCH;
>>>
>>> Please let me know your comments.
>>>
>>> Thanks for your time.
>>> Rohit
>>>
>>>>>
>>>>> Thanks for taking time to review the code.
>>>>>
>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> @@ -1088,6 +1088,22 @@
>>>>>           }
>>>>>           FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>         }
>>>>> +    if (supports_sha()) {
>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>> +      }
>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics ||
>>>>> UseSHA512Intrinsics) {
>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>> +      }
>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>> +    }
>>>>>
>>>>>         // some defaults for AMD family 15h
>>>>>         if ( cpu_family() == 0x15 ) {
>>>>> @@ -1109,11 +1125,40 @@
>>>>>         }
>>>>>
>>>>>     #ifdef COMPILER2
>>>>> -    if (MaxVectorSize > 16) {
>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>           FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>         }
>>>>>     #endif // COMPILER2
>>>>> +
>>>>> +    // Some defaults for AMD family 17h
>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for
>>>>> Array Copy
>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
>>>>> +      }
>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>>> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
>>>>> +      }
>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>> +        FLAG_SET_DEFAULT(UseBMI2Instructions, true);
>>>>> +      }
>>>>> +      if (UseSHA) {
>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>>>> functions not available on this CPU.");
>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>> +        }
>>>>> +      }
>>>>> +#ifdef COMPILER2
>>>>> +      if (supports_sse4_2()) {
>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>> +        }
>>>>> +      }
>>>>> +#endif
>>>>> +    }
>>>>>       }
>>>>>
>>>>>       if( is_intel() ) { // Intel cpus specific settings
>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> @@ -505,6 +505,14 @@
>>>>>           result |= CPU_CLMUL;
>>>>>         if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>>>           result |= CPU_RTM;
>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>> +       result |= CPU_ADX;
>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>> +      result |= CPU_BMI2;
>>>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>> +      result |= CPU_SHA;
>>>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>> +      result |= CPU_FMA;
>>>>>
>>>>>         // AMD features.
>>>>>         if (is_amd()) {
>>>>> @@ -515,19 +523,13 @@
>>>>>             result |= CPU_LZCNT;
>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>             result |= CPU_SSE4A;
>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>> +        result |= CPU_HT;
>>>>>         }
>>>>>         // Intel features.
>>>>>         if(is_intel()) {
>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>> -         result |= CPU_ADX;
>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>> -        result |= CPU_BMI2;
>>>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>> -        result |= CPU_SHA;
>>>>>           if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>>>             result |= CPU_LZCNT;
>>>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>> -        result |= CPU_FMA;
>>>>>           // for Intel, ecx.bits.misalignsse bit (bit 8) indicates
>>>>> support for prefetchw
>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>>>             result |= CPU_3DNOW_PREFETCH;
>>>>>
>>>>>
>>>>> Regards,
>>>>> Rohit
>>>>>
>>>>>
>>>>>
>>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj
>>>>>>> <rohitarulraj at gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes
>>>>>>>> <david.holmes at oracle.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Rohit,
>>>>>>>>>
>>>>>>>>> I think the patch needs updating for jdk10 as I already see a lot of
>>>>>>>>> logic
>>>>>>>>> around UseSHA in vm_version_x86.cpp.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks David, I will update the patch wrt JDK10 source base, test and
>>>>>>>> resubmit for review.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Rohit
>>>>>>>>
>>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I have updated the patch wrt openjdk10/hotspot (parent:
>>>>>>> 13519:71337910df60), did regression testing using jtreg ($make
>>>>>>> default) and didnt find any regressions.
>>>>>>>
>>>>>>> Can anyone please volunteer to review this patch  which sets flag/ISA
>>>>>>> defaults for newer AMD 17h (EPYC) processor?
>>>>>>>
>>>>>>> ************************* Patch ****************************
>>>>>>>
>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>> @@ -1088,6 +1088,22 @@
>>>>>>>            }
>>>>>>>            FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>>          }
>>>>>>> +    if (supports_sha()) {
>>>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>>>> +      }
>>>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics ||
>>>>>>> UseSHA512Intrinsics) {
>>>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>>>> +      }
>>>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>> +    }
>>>>>>>
>>>>>>>          // some defaults for AMD family 15h
>>>>>>>          if ( cpu_family() == 0x15 ) {
>>>>>>> @@ -1109,11 +1125,43 @@
>>>>>>>          }
>>>>>>>
>>>>>>>      #ifdef COMPILER2
>>>>>>> -    if (MaxVectorSize > 16) {
>>>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>>            FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>>          }
>>>>>>>      #endif // COMPILER2
>>>>>>> +
>>>>>>> +    // Some defaults for AMD family 17h
>>>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for
>>>>>>> Array Copy
>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>>>> +        UseXMMForArrayCopy = true;
>>>>>>> +      }
>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores))
>>>>>>> {
>>>>>>> +        UseUnalignedLoadStores = true;
>>>>>>> +      }
>>>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>>>> +        UseBMI2Instructions = true;
>>>>>>> +      }
>>>>>>> +      if (MaxVectorSize > 32) {
>>>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>>>> +      }
>>>>>>> +      if (UseSHA) {
>>>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>>>>>> functions not available on this CPU.");
>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>> +        }
>>>>>>> +      }
>>>>>>> +#ifdef COMPILER2
>>>>>>> +      if (supports_sse4_2()) {
>>>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>>>> +        }
>>>>>>> +      }
>>>>>>> +#endif
>>>>>>> +    }
>>>>>>>        }
>>>>>>>
>>>>>>>        if( is_intel() ) { // Intel cpus specific settings
>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>> @@ -505,6 +505,14 @@
>>>>>>>            result |= CPU_CLMUL;
>>>>>>>          if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>>>>>            result |= CPU_RTM;
>>>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>> +       result |= CPU_ADX;
>>>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>> +      result |= CPU_BMI2;
>>>>>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>> +      result |= CPU_SHA;
>>>>>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>> +      result |= CPU_FMA;
>>>>>>>
>>>>>>>          // AMD features.
>>>>>>>          if (is_amd()) {
>>>>>>> @@ -515,19 +523,13 @@
>>>>>>>              result |= CPU_LZCNT;
>>>>>>>            if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>>              result |= CPU_SSE4A;
>>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>> +        result |= CPU_HT;
>>>>>>>          }
>>>>>>>          // Intel features.
>>>>>>>          if(is_intel()) {
>>>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>> -         result |= CPU_ADX;
>>>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>> -        result |= CPU_BMI2;
>>>>>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>> -        result |= CPU_SHA;
>>>>>>>            if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>>>>>              result |= CPU_LZCNT;
>>>>>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>> -        result |= CPU_FMA;
>>>>>>>            // for Intel, ecx.bits.misalignsse bit (bit 8) indicates
>>>>>>> support for prefetchw
>>>>>>>            if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>>>>>              result |= CPU_3DNOW_PREFETCH;
>>>>>>>
>>>>>>> **************************************************************
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Rohit
>>>>>>>
>>>>>>>>>
>>>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes
>>>>>>>>>> <david.holmes at oracle.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Rohit,
>>>>>>>>>>>
>>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I would like an volunteer to review this patch (openJDK9) which
>>>>>>>>>>>> sets
>>>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us
>>>>>>>>>>>> with
>>>>>>>>>>>> the commit process.
>>>>>>>>>>>>
>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately patches can not be accepted from systems outside the
>>>>>>>>>>> OpenJDK
>>>>>>>>>>> infrastructure and ...
>>>>>>>>>>>
>>>>>>>>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ... unfortunately patches tend to get stripped by the mail
>>>>>>>>>>> servers.
>>>>>>>>>>> If
>>>>>>>>>>> the
>>>>>>>>>>> patch is small please include it inline. Otherwise you will need
>>>>>>>>>>> to
>>>>>>>>>>> find
>>>>>>>>>>> an
>>>>>>>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> 3) I have done regression testing using jtreg ($make default) and
>>>>>>>>>>>> didnt find any regressions.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Sounds good, but until I see the patch it is hard to comment on
>>>>>>>>>>> testing
>>>>>>>>>>> requirements.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks David,
>>>>>>>>>> Yes, it's a small patch.
>>>>>>>>>>
>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>>> @@ -1051,6 +1051,22 @@
>>>>>>>>>>             }
>>>>>>>>>>             FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>>>>>           }
>>>>>>>>>> +    if (supports_sha()) {
>>>>>>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>>>>>>> +      }
>>>>>>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics
>>>>>>>>>> ||
>>>>>>>>>> UseSHA512Intrinsics) {
>>>>>>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>>>>>>> +      }
>>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>>> +    }
>>>>>>>>>>
>>>>>>>>>>           // some defaults for AMD family 15h
>>>>>>>>>>           if ( cpu_family() == 0x15 ) {
>>>>>>>>>> @@ -1072,11 +1088,43 @@
>>>>>>>>>>           }
>>>>>>>>>>
>>>>>>>>>>       #ifdef COMPILER2
>>>>>>>>>> -    if (MaxVectorSize > 16) {
>>>>>>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>>>>>             FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>>>>>           }
>>>>>>>>>>       #endif // COMPILER2
>>>>>>>>>> +
>>>>>>>>>> +    // Some defaults for AMD family 17h
>>>>>>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>>>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores
>>>>>>>>>> for
>>>>>>>>>> Array Copy
>>>>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy))
>>>>>>>>>> {
>>>>>>>>>> +        UseXMMForArrayCopy = true;
>>>>>>>>>> +      }
>>>>>>>>>> +      if (supports_sse2() &&
>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores))
>>>>>>>>>> {
>>>>>>>>>> +        UseUnalignedLoadStores = true;
>>>>>>>>>> +      }
>>>>>>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions))
>>>>>>>>>> {
>>>>>>>>>> +        UseBMI2Instructions = true;
>>>>>>>>>> +      }
>>>>>>>>>> +      if (MaxVectorSize > 32) {
>>>>>>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>>>>>>> +      }
>>>>>>>>>> +      if (UseSHA) {
>>>>>>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>>>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>>>>>>>>> functions not available on this CPU.");
>>>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>>> +        }
>>>>>>>>>> +      }
>>>>>>>>>> +#ifdef COMPILER2
>>>>>>>>>> +      if (supports_sse4_2()) {
>>>>>>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>>>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>>>>>>> +        }
>>>>>>>>>> +      }
>>>>>>>>>> +#endif
>>>>>>>>>> +    }
>>>>>>>>>>         }
>>>>>>>>>>
>>>>>>>>>>         if( is_intel() ) { // Intel cpus specific settings
>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>> @@ -513,6 +513,16 @@
>>>>>>>>>>               result |= CPU_LZCNT;
>>>>>>>>>>             if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>>>>>               result |= CPU_SSE4A;
>>>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>>>>> +        result |= CPU_BMI2;
>>>>>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>>>>> +        result |= CPU_HT;
>>>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>>>>> +        result |= CPU_ADX;
>>>>>>>>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>>>>> +        result |= CPU_SHA;
>>>>>>>>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>>>>> +        result |= CPU_FMA;
>>>>>>>>>>           }
>>>>>>>>>>           // Intel features.
>>>>>>>>>>           if(is_intel()) {
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Rohit
>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>
>>


More information about the hotspot-dev mailing list