RFR: Newer AMD 17h (EPYC) Processor family defaults

Rohit Arul Raj rohitarulraj at gmail.com
Tue Sep 5 05:29:08 UTC 2017


Hello David,

On Tue, Sep 5, 2017 at 10:31 AM, David Holmes <david.holmes at oracle.com> wrote:
> Hi Rohit,
>
> I was unable to apply your patch to latest jdk10/hs/hotspot repo.
>

I checked out the latest jdk10/hs/hotspot [parent: 13548:1a9c2e07a826]
and was able to apply the patch [epyc-amd17h-defaults-3Sept.patch]
without any issues.
Can you share the error message that you are getting?

Regards,
Rohit


>
>
> On 4/09/2017 2:42 AM, Rohit Arul Raj wrote:
>>
>> Hello Vladimir,
>>
>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov
>> <vladimir.kozlov at oracle.com> wrote:
>>>
>>> Hi Rohit,
>>>
>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>>>
>>>>
>>>> Hello Vladimir,
>>>>
>>>>> Changes look good. Only question I have is about MaxVectorSize. It is
>>>>> set
>>>>>>
>>>>>>
>>>>> 16 only in presence of AVX:
>>>>>
>>>>>
>>>>>
>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>>>>
>>>>> Does that code works for AMD 17h too?
>>>>
>>>>
>>>>
>>>> Thanks for pointing that out. Yes, the code works fine for AMD 17h. So
>>>> I have removed the surplus check for MaxVectorSize from my patch. I
>>>> have updated, re-tested and attached the patch.
>>>
>>>
>>>
>>> Which check you removed?
>>>
>>
>> My older patch had the below mentioned check which was required on
>> JDK9 where the default MaxVectorSize was 64. It has been handled
>> better in openJDK10. So this check is not required anymore.
>>
>> +    // Some defaults for AMD family 17h
>> +    if ( cpu_family() == 0x17 ) {
>> ...
>> ...
>> +      if (MaxVectorSize > 32) {
>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>> +      }
>> ..
>> ..
>> +      }
>>
>>>>
>>>> I have one query regarding the setting of UseSHA flag:
>>>>
>>>>
>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821
>>>>
>>>> AMD 17h has support for SHA.
>>>> AMD 15h doesn't have  support for SHA. Still "UseSHA" flag gets
>>>> enabled for it based on the availability of BMI2 and AVX2. Is there an
>>>> underlying reason for this? I have handled this in the patch but just
>>>> wanted to confirm.
>>>
>>>
>>>
>>> It was done with next changes which use only AVX2 and BMI2 instructions
>>> to
>>> calculate SHA-256:
>>>
>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974
>>>
>>> I don't know if AMD 15h supports these instructions and can execute that
>>> code. You need to test it.
>>>
>>
>> Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions,
>> it should work.
>> Confirmed by running following sanity tests:
>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java
>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java
>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java
>>
>> So I have removed those SHA checks from my patch too.
>>
>> Please find attached updated, re-tested patch.
>>
>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>> b/src/cpu/x86/vm/vm_version_x86.cpp
>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>> @@ -1109,11 +1109,27 @@
>>       }
>>
>>   #ifdef COMPILER2
>> -    if (MaxVectorSize > 16) {
>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>         FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>       }
>>   #endif // COMPILER2
>> +
>> +    // Some defaults for AMD family 17h
>> +    if ( cpu_family() == 0x17 ) {
>> +      // On family 17h processors use XMM and UnalignedLoadStores for
>> Array Copy
>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
>> +      }
>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
>> +      }
>> +#ifdef COMPILER2
>> +      if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>> +        FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>> +      }
>> +#endif
>> +    }
>>     }
>>
>>     if( is_intel() ) { // Intel cpus specific settings
>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>> b/src/cpu/x86/vm/vm_version_x86.hpp
>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>> @@ -505,6 +505,14 @@
>>         result |= CPU_CLMUL;
>>       if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>         result |= CPU_RTM;
>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>> +       result |= CPU_ADX;
>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>> +      result |= CPU_BMI2;
>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>> +      result |= CPU_SHA;
>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>> +      result |= CPU_FMA;
>>
>>       // AMD features.
>>       if (is_amd()) {
>> @@ -515,19 +523,13 @@
>>           result |= CPU_LZCNT;
>>         if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>           result |= CPU_SSE4A;
>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>> +        result |= CPU_HT;
>>       }
>>       // Intel features.
>>       if(is_intel()) {
>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>> -         result |= CPU_ADX;
>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>> -        result |= CPU_BMI2;
>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>> -        result |= CPU_SHA;
>>         if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>           result |= CPU_LZCNT;
>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>> -        result |= CPU_FMA;
>>         // for Intel, ecx.bits.misalignsse bit (bit 8) indicates
>> support for prefetchw
>>         if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>           result |= CPU_3DNOW_PREFETCH;
>>
>> Please let me know your comments.
>>
>> Thanks for your time.
>> Rohit
>>
>>>>
>>>> Thanks for taking time to review the code.
>>>>
>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>> @@ -1088,6 +1088,22 @@
>>>>          }
>>>>          FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>        }
>>>> +    if (supports_sha()) {
>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>> +      }
>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics ||
>>>> UseSHA512Intrinsics) {
>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>> +        warning("SHA instructions are not available on this CPU");
>>>> +      }
>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>> +    }
>>>>
>>>>        // some defaults for AMD family 15h
>>>>        if ( cpu_family() == 0x15 ) {
>>>> @@ -1109,11 +1125,40 @@
>>>>        }
>>>>
>>>>    #ifdef COMPILER2
>>>> -    if (MaxVectorSize > 16) {
>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>          FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>        }
>>>>    #endif // COMPILER2
>>>> +
>>>> +    // Some defaults for AMD family 17h
>>>> +    if ( cpu_family() == 0x17 ) {
>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for
>>>> Array Copy
>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
>>>> +      }
>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
>>>> +      }
>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>> +        FLAG_SET_DEFAULT(UseBMI2Instructions, true);
>>>> +      }
>>>> +      if (UseSHA) {
>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>> +        } else if (UseSHA512Intrinsics) {
>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>>> functions not available on this CPU.");
>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>> +        }
>>>> +      }
>>>> +#ifdef COMPILER2
>>>> +      if (supports_sse4_2()) {
>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>> +        }
>>>> +      }
>>>> +#endif
>>>> +    }
>>>>      }
>>>>
>>>>      if( is_intel() ) { // Intel cpus specific settings
>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>> @@ -505,6 +505,14 @@
>>>>          result |= CPU_CLMUL;
>>>>        if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>>          result |= CPU_RTM;
>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>> +       result |= CPU_ADX;
>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>> +      result |= CPU_BMI2;
>>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>> +      result |= CPU_SHA;
>>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>> +      result |= CPU_FMA;
>>>>
>>>>        // AMD features.
>>>>        if (is_amd()) {
>>>> @@ -515,19 +523,13 @@
>>>>            result |= CPU_LZCNT;
>>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>            result |= CPU_SSE4A;
>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>> +        result |= CPU_HT;
>>>>        }
>>>>        // Intel features.
>>>>        if(is_intel()) {
>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>> -         result |= CPU_ADX;
>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>> -        result |= CPU_BMI2;
>>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>> -        result |= CPU_SHA;
>>>>          if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>>            result |= CPU_LZCNT;
>>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>> -        result |= CPU_FMA;
>>>>          // for Intel, ecx.bits.misalignsse bit (bit 8) indicates
>>>> support for prefetchw
>>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>>            result |= CPU_3DNOW_PREFETCH;
>>>>
>>>>
>>>> Regards,
>>>> Rohit
>>>>
>>>>
>>>>
>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj
>>>>>> <rohitarulraj at gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes
>>>>>>> <david.holmes at oracle.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Rohit,
>>>>>>>>
>>>>>>>> I think the patch needs updating for jdk10 as I already see a lot of
>>>>>>>> logic
>>>>>>>> around UseSHA in vm_version_x86.cpp.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>
>>>>>>> Thanks David, I will update the patch wrt JDK10 source base, test and
>>>>>>> resubmit for review.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Rohit
>>>>>>>
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I have updated the patch wrt openjdk10/hotspot (parent:
>>>>>> 13519:71337910df60), did regression testing using jtreg ($make
>>>>>> default) and didnt find any regressions.
>>>>>>
>>>>>> Can anyone please volunteer to review this patch  which sets flag/ISA
>>>>>> defaults for newer AMD 17h (EPYC) processor?
>>>>>>
>>>>>> ************************* Patch ****************************
>>>>>>
>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>> @@ -1088,6 +1088,22 @@
>>>>>>           }
>>>>>>           FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>         }
>>>>>> +    if (supports_sha()) {
>>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>>> +      }
>>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics ||
>>>>>> UseSHA512Intrinsics) {
>>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>>> +      }
>>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>> +    }
>>>>>>
>>>>>>         // some defaults for AMD family 15h
>>>>>>         if ( cpu_family() == 0x15 ) {
>>>>>> @@ -1109,11 +1125,43 @@
>>>>>>         }
>>>>>>
>>>>>>     #ifdef COMPILER2
>>>>>> -    if (MaxVectorSize > 16) {
>>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>           FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>         }
>>>>>>     #endif // COMPILER2
>>>>>> +
>>>>>> +    // Some defaults for AMD family 17h
>>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for
>>>>>> Array Copy
>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>>> +        UseXMMForArrayCopy = true;
>>>>>> +      }
>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores))
>>>>>> {
>>>>>> +        UseUnalignedLoadStores = true;
>>>>>> +      }
>>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>>> +        UseBMI2Instructions = true;
>>>>>> +      }
>>>>>> +      if (MaxVectorSize > 32) {
>>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>>> +      }
>>>>>> +      if (UseSHA) {
>>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>>>>> functions not available on this CPU.");
>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>> +        }
>>>>>> +      }
>>>>>> +#ifdef COMPILER2
>>>>>> +      if (supports_sse4_2()) {
>>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>>> +        }
>>>>>> +      }
>>>>>> +#endif
>>>>>> +    }
>>>>>>       }
>>>>>>
>>>>>>       if( is_intel() ) { // Intel cpus specific settings
>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>> @@ -505,6 +505,14 @@
>>>>>>           result |= CPU_CLMUL;
>>>>>>         if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>>>>           result |= CPU_RTM;
>>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>> +       result |= CPU_ADX;
>>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>> +      result |= CPU_BMI2;
>>>>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>> +      result |= CPU_SHA;
>>>>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>> +      result |= CPU_FMA;
>>>>>>
>>>>>>         // AMD features.
>>>>>>         if (is_amd()) {
>>>>>> @@ -515,19 +523,13 @@
>>>>>>             result |= CPU_LZCNT;
>>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>             result |= CPU_SSE4A;
>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>> +        result |= CPU_HT;
>>>>>>         }
>>>>>>         // Intel features.
>>>>>>         if(is_intel()) {
>>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>> -         result |= CPU_ADX;
>>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>> -        result |= CPU_BMI2;
>>>>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>> -        result |= CPU_SHA;
>>>>>>           if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>>>>             result |= CPU_LZCNT;
>>>>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>> -        result |= CPU_FMA;
>>>>>>           // for Intel, ecx.bits.misalignsse bit (bit 8) indicates
>>>>>> support for prefetchw
>>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>>>>             result |= CPU_3DNOW_PREFETCH;
>>>>>>
>>>>>> **************************************************************
>>>>>>
>>>>>> Thanks,
>>>>>> Rohit
>>>>>>
>>>>>>>>
>>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes
>>>>>>>>> <david.holmes at oracle.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Rohit,
>>>>>>>>>>
>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I would like an volunteer to review this patch (openJDK9) which
>>>>>>>>>>> sets
>>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us
>>>>>>>>>>> with
>>>>>>>>>>> the commit process.
>>>>>>>>>>>
>>>>>>>>>>> Webrev:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Unfortunately patches can not be accepted from systems outside the
>>>>>>>>>> OpenJDK
>>>>>>>>>> infrastructure and ...
>>>>>>>>>>
>>>>>>>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ... unfortunately patches tend to get stripped by the mail
>>>>>>>>>> servers.
>>>>>>>>>> If
>>>>>>>>>> the
>>>>>>>>>> patch is small please include it inline. Otherwise you will need
>>>>>>>>>> to
>>>>>>>>>> find
>>>>>>>>>> an
>>>>>>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> 3) I have done regression testing using jtreg ($make default) and
>>>>>>>>>>> didnt find any regressions.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Sounds good, but until I see the patch it is hard to comment on
>>>>>>>>>> testing
>>>>>>>>>> requirements.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks David,
>>>>>>>>> Yes, it's a small patch.
>>>>>>>>>
>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>> @@ -1051,6 +1051,22 @@
>>>>>>>>>            }
>>>>>>>>>            FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>>>>          }
>>>>>>>>> +    if (supports_sha()) {
>>>>>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>>>>>> +      }
>>>>>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics
>>>>>>>>> ||
>>>>>>>>> UseSHA512Intrinsics) {
>>>>>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>>>>>> +      }
>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>> +    }
>>>>>>>>>
>>>>>>>>>          // some defaults for AMD family 15h
>>>>>>>>>          if ( cpu_family() == 0x15 ) {
>>>>>>>>> @@ -1072,11 +1088,43 @@
>>>>>>>>>          }
>>>>>>>>>
>>>>>>>>>      #ifdef COMPILER2
>>>>>>>>> -    if (MaxVectorSize > 16) {
>>>>>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>>>>            FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>>>>          }
>>>>>>>>>      #endif // COMPILER2
>>>>>>>>> +
>>>>>>>>> +    // Some defaults for AMD family 17h
>>>>>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores
>>>>>>>>> for
>>>>>>>>> Array Copy
>>>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy))
>>>>>>>>> {
>>>>>>>>> +        UseXMMForArrayCopy = true;
>>>>>>>>> +      }
>>>>>>>>> +      if (supports_sse2() &&
>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores))
>>>>>>>>> {
>>>>>>>>> +        UseUnalignedLoadStores = true;
>>>>>>>>> +      }
>>>>>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions))
>>>>>>>>> {
>>>>>>>>> +        UseBMI2Instructions = true;
>>>>>>>>> +      }
>>>>>>>>> +      if (MaxVectorSize > 32) {
>>>>>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>>>>>> +      }
>>>>>>>>> +      if (UseSHA) {
>>>>>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>>>>>>>> functions not available on this CPU.");
>>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>> +        }
>>>>>>>>> +      }
>>>>>>>>> +#ifdef COMPILER2
>>>>>>>>> +      if (supports_sse4_2()) {
>>>>>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>>>>>> +        }
>>>>>>>>> +      }
>>>>>>>>> +#endif
>>>>>>>>> +    }
>>>>>>>>>        }
>>>>>>>>>
>>>>>>>>>        if( is_intel() ) { // Intel cpus specific settings
>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>> @@ -513,6 +513,16 @@
>>>>>>>>>              result |= CPU_LZCNT;
>>>>>>>>>            if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>>>>              result |= CPU_SSE4A;
>>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>>>> +        result |= CPU_BMI2;
>>>>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>>>> +        result |= CPU_HT;
>>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>>>> +        result |= CPU_ADX;
>>>>>>>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>>>> +        result |= CPU_SHA;
>>>>>>>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>>>> +        result |= CPU_FMA;
>>>>>>>>>          }
>>>>>>>>>          // Intel features.
>>>>>>>>>          if(is_intel()) {
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Rohit
>>>>>>>>>
>>>>>>>>
>>>>>
>>>
>


More information about the hotspot-dev mailing list