RFR: Newer AMD 17h (EPYC) Processor family defaults
David Holmes
david.holmes at oracle.com
Tue Sep 5 05:01:44 UTC 2017
Hi Rohit,
I was unable to apply your patch to the latest jdk10/hs/hotspot repo.
Vladimir: are you able to host a webrev for this change please?
Thanks,
David
----
On 4/09/2017 2:42 AM, Rohit Arul Raj wrote:
> Hello Vladimir,
>
> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov
> <vladimir.kozlov at oracle.com> wrote:
>> Hi Rohit,
>>
>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>>
>>> Hello Vladimir,
>>>
>>>> Changes look good. Only question I have is about MaxVectorSize. It is set
>>>> to more than 16 only in presence of AVX:
>>>>
>>>>
>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>>>
>>>> Does that code work for AMD 17h too?
>>>
>>>
>>> Thanks for pointing that out. Yes, the code works fine for AMD 17h. So
>>> I have removed the surplus check for MaxVectorSize from my patch. I
>>> have updated, re-tested and attached the patch.
>>
>>
>> Which check did you remove?
>>
>
> My older patch had the check below, which was required on
> JDK9, where the default MaxVectorSize was 64. This is handled
> better in OpenJDK10, so the check is no longer required.
>
> + // Some defaults for AMD family 17h
> + if ( cpu_family() == 0x17 ) {
> ...
> ...
> + if (MaxVectorSize > 32) {
> + FLAG_SET_DEFAULT(MaxVectorSize, 32);
> + }
> ..
> ..
> + }
>
>>>
>>> I have one query regarding the setting of UseSHA flag:
>>>
>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821
>>>
>>> AMD 17h has support for SHA.
>>> AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets
>>> enabled for it based on the availability of BMI2 and AVX2. Is there an
>>> underlying reason for this? I have handled this in the patch but just
>>> wanted to confirm.
>>
>>
>> It was done with the following change, which uses only AVX2 and BMI2
>> instructions to calculate SHA-256:
>>
>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974
>>
>> I don't know if AMD 15h supports these instructions and can execute that
>> code. You need to test it.
>>
>
> Ok, got it. Since AMD 15h has support for AVX2 and BMI2 instructions,
> it should work.
> Confirmed by running the following sanity tests:
> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java
> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java
> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java
>
> So I have removed those SHA checks from my patch too.
>
> Please find attached updated, re-tested patch.
>
> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
> --- a/src/cpu/x86/vm/vm_version_x86.cpp
> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
> @@ -1109,11 +1109,27 @@
> }
>
> #ifdef COMPILER2
> - if (MaxVectorSize > 16) {
> - // Limit vectors size to 16 bytes on current AMD cpus.
> + if (cpu_family() < 0x17 && MaxVectorSize > 16) {
> + // Limit vectors size to 16 bytes on AMD cpus < 17h.
> FLAG_SET_DEFAULT(MaxVectorSize, 16);
> }
> #endif // COMPILER2
> +
> + // Some defaults for AMD family 17h
> + if ( cpu_family() == 0x17 ) {
> + // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
> + }
> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
> + }
> +#ifdef COMPILER2
> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) {
> + FLAG_SET_DEFAULT(UseFPUForSpilling, true);
> + }
> +#endif
> + }
> }
>
> if( is_intel() ) { // Intel cpus specific settings
> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
> --- a/src/cpu/x86/vm/vm_version_x86.hpp
> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
> @@ -505,6 +505,14 @@
> result |= CPU_CLMUL;
> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
> result |= CPU_RTM;
> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
> + result |= CPU_ADX;
> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
> + result |= CPU_BMI2;
> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
> + result |= CPU_SHA;
> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
> + result |= CPU_FMA;
>
> // AMD features.
> if (is_amd()) {
> @@ -515,19 +523,13 @@
> result |= CPU_LZCNT;
> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
> result |= CPU_SSE4A;
> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
> + result |= CPU_HT;
> }
> // Intel features.
> if(is_intel()) {
> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
> - result |= CPU_ADX;
> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
> - result |= CPU_BMI2;
> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
> - result |= CPU_SHA;
> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
> result |= CPU_LZCNT;
> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
> - result |= CPU_FMA;
> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
> result |= CPU_3DNOW_PREFETCH;
>
> Please let me know your comments.
>
> Thanks for your time.
> Rohit
>
>>>
>>> Thanks for taking time to review the code.
>>>
>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>> @@ -1088,6 +1088,22 @@
>>> }
>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>> }
>>> + if (supports_sha()) {
>>> + if (FLAG_IS_DEFAULT(UseSHA)) {
>>> + FLAG_SET_DEFAULT(UseSHA, true);
>>> + }
>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
>>> + if (!FLAG_IS_DEFAULT(UseSHA) ||
>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>> + warning("SHA instructions are not available on this CPU");
>>> + }
>>> + FLAG_SET_DEFAULT(UseSHA, false);
>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>> + }
>>>
>>> // some defaults for AMD family 15h
>>> if ( cpu_family() == 0x15 ) {
>>> @@ -1109,11 +1125,40 @@
>>> }
>>>
>>> #ifdef COMPILER2
>>> - if (MaxVectorSize > 16) {
>>> - // Limit vectors size to 16 bytes on current AMD cpus.
>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>> FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>> }
>>> #endif // COMPILER2
>>> +
>>> + // Some defaults for AMD family 17h
>>> + if ( cpu_family() == 0x17 ) {
>>> + // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
>>> + }
>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
>>> + }
>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>> + FLAG_SET_DEFAULT(UseBMI2Instructions, true);
>>> + }
>>> + if (UseSHA) {
>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>> + } else if (UseSHA512Intrinsics) {
>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>> + }
>>> + }
>>> +#ifdef COMPILER2
>>> + if (supports_sse4_2()) {
>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>> + }
>>> + }
>>> +#endif
>>> + }
>>> }
>>>
>>> if( is_intel() ) { // Intel cpus specific settings
>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>> @@ -505,6 +505,14 @@
>>> result |= CPU_CLMUL;
>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>> result |= CPU_RTM;
>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>> + result |= CPU_ADX;
>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>> + result |= CPU_BMI2;
>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>> + result |= CPU_SHA;
>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>> + result |= CPU_FMA;
>>>
>>> // AMD features.
>>> if (is_amd()) {
>>> @@ -515,19 +523,13 @@
>>> result |= CPU_LZCNT;
>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>> result |= CPU_SSE4A;
>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>> + result |= CPU_HT;
>>> }
>>> // Intel features.
>>> if(is_intel()) {
>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>> - result |= CPU_ADX;
>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>> - result |= CPU_BMI2;
>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>> - result |= CPU_SHA;
>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>> result |= CPU_LZCNT;
>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>> - result |= CPU_FMA;
>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>> result |= CPU_3DNOW_PREFETCH;
>>>
>>>
>>> Regards,
>>> Rohit
>>>
>>>
>>>
>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote:
>>>>>
>>>>>
>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj <rohitarulraj at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes <david.holmes at oracle.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Rohit,
>>>>>>>
>>>>>>> I think the patch needs updating for jdk10 as I already see a lot of logic
>>>>>>> around UseSHA in vm_version_x86.cpp.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>
>>>>>> Thanks David, I will update the patch wrt JDK10 source base, test and
>>>>>> resubmit for review.
>>>>>>
>>>>>> Regards,
>>>>>> Rohit
>>>>>>
>>>>>
>>>>> Hi All,
>>>>>
>>>>> I have updated the patch wrt openjdk10/hotspot (parent:
>>>>> 13519:71337910df60), did regression testing using jtreg ($make
>>>>> default) and didn't find any regressions.
>>>>>
>>>>> Can anyone please volunteer to review this patch which sets flag/ISA
>>>>> defaults for newer AMD 17h (EPYC) processor?
>>>>>
>>>>> ************************* Patch ****************************
>>>>>
>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> @@ -1088,6 +1088,22 @@
>>>>> }
>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>> }
>>>>> + if (supports_sha()) {
>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>> + FLAG_SET_DEFAULT(UseSHA, true);
>>>>> + }
>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>> + warning("SHA instructions are not available on this CPU");
>>>>> + }
>>>>> + FLAG_SET_DEFAULT(UseSHA, false);
>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>> + }
>>>>>
>>>>> // some defaults for AMD family 15h
>>>>> if ( cpu_family() == 0x15 ) {
>>>>> @@ -1109,11 +1125,43 @@
>>>>> }
>>>>>
>>>>> #ifdef COMPILER2
>>>>> - if (MaxVectorSize > 16) {
>>>>> - // Limit vectors size to 16 bytes on current AMD cpus.
>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>> }
>>>>> #endif // COMPILER2
>>>>> +
>>>>> + // Some defaults for AMD family 17h
>>>>> + if ( cpu_family() == 0x17 ) {
>>>>> + // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>> + UseXMMForArrayCopy = true;
>>>>> + }
>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>>> + UseUnalignedLoadStores = true;
>>>>> + }
>>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>> + UseBMI2Instructions = true;
>>>>> + }
>>>>> + if (MaxVectorSize > 32) {
>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>> + }
>>>>> + if (UseSHA) {
>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>> + } else if (UseSHA512Intrinsics) {
>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>> + }
>>>>> + }
>>>>> +#ifdef COMPILER2
>>>>> + if (supports_sse4_2()) {
>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>> + }
>>>>> + }
>>>>> +#endif
>>>>> + }
>>>>> }
>>>>>
>>>>> if( is_intel() ) { // Intel cpus specific settings
>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> @@ -505,6 +505,14 @@
>>>>> result |= CPU_CLMUL;
>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>>> result |= CPU_RTM;
>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>> + result |= CPU_ADX;
>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>> + result |= CPU_BMI2;
>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>> + result |= CPU_SHA;
>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>> + result |= CPU_FMA;
>>>>>
>>>>> // AMD features.
>>>>> if (is_amd()) {
>>>>> @@ -515,19 +523,13 @@
>>>>> result |= CPU_LZCNT;
>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>> result |= CPU_SSE4A;
>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>> + result |= CPU_HT;
>>>>> }
>>>>> // Intel features.
>>>>> if(is_intel()) {
>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>> - result |= CPU_ADX;
>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>> - result |= CPU_BMI2;
>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>> - result |= CPU_SHA;
>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>>> result |= CPU_LZCNT;
>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>> - result |= CPU_FMA;
>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>>> result |= CPU_3DNOW_PREFETCH;
>>>>>
>>>>> **************************************************************
>>>>>
>>>>> Thanks,
>>>>> Rohit
>>>>>
>>>>>>>
>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes
>>>>>>>> <david.holmes at oracle.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Rohit,
>>>>>>>>>
>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I would like a volunteer to review this patch (openJDK9), which sets
>>>>>>>>>> flag/ISA defaults for the newer AMD 17h (EPYC) processor, and to help us
>>>>>>>>>> with the commit process.
>>>>>>>>>>
>>>>>>>>>> Webrev:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Unfortunately patches cannot be accepted from systems outside the
>>>>>>>>> OpenJDK infrastructure and ...
>>>>>>>>>
>>>>>>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ... unfortunately patches tend to get stripped by the mail servers. If the
>>>>>>>>> patch is small please include it inline. Otherwise you will need to find an
>>>>>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net.
>>>>>>>>>
>>>>>>>>
>>>>>>>>>> 3) I have done regression testing using jtreg ($make default) and
>>>>>>>>>> didn't find any regressions.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sounds good, but until I see the patch it is hard to comment on testing
>>>>>>>>> requirements.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks David,
>>>>>>>> Yes, it's a small patch.
>>>>>>>>
>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>> @@ -1051,6 +1051,22 @@
>>>>>>>> }
>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>>> }
>>>>>>>> + if (supports_sha()) {
>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true);
>>>>>>>> + }
>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>> + warning("SHA instructions are not available on this CPU");
>>>>>>>> + }
>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false);
>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>> + }
>>>>>>>>
>>>>>>>> // some defaults for AMD family 15h
>>>>>>>> if ( cpu_family() == 0x15 ) {
>>>>>>>> @@ -1072,11 +1088,43 @@
>>>>>>>> }
>>>>>>>>
>>>>>>>> #ifdef COMPILER2
>>>>>>>> - if (MaxVectorSize > 16) {
>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>>> }
>>>>>>>> #endif // COMPILER2
>>>>>>>> +
>>>>>>>> + // Some defaults for AMD family 17h
>>>>>>>> + if ( cpu_family() == 0x17 ) {
>>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>>>>> + UseXMMForArrayCopy = true;
>>>>>>>> + }
>>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>>>>>> + UseUnalignedLoadStores = true;
>>>>>>>> + }
>>>>>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>>>>> + UseBMI2Instructions = true;
>>>>>>>> + }
>>>>>>>> + if (MaxVectorSize > 32) {
>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>>>>> + }
>>>>>>>> + if (UseSHA) {
>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>> + } else if (UseSHA512Intrinsics) {
>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>> + }
>>>>>>>> + }
>>>>>>>> +#ifdef COMPILER2
>>>>>>>> + if (supports_sse4_2()) {
>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>>>>> + }
>>>>>>>> + }
>>>>>>>> +#endif
>>>>>>>> + }
>>>>>>>> }
>>>>>>>>
>>>>>>>> if( is_intel() ) { // Intel cpus specific settings
>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>> @@ -513,6 +513,16 @@
>>>>>>>> result |= CPU_LZCNT;
>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>>> result |= CPU_SSE4A;
>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>>> + result |= CPU_BMI2;
>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>>> + result |= CPU_HT;
>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>>> + result |= CPU_ADX;
>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>>> + result |= CPU_SHA;
>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>>> + result |= CPU_FMA;
>>>>>>>> }
>>>>>>>> // Intel features.
>>>>>>>> if(is_intel()) {
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Rohit
>>>>>>>>
>>>>>>>
>>>>
>>
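
For readers following the thread, here is a minimal, standalone sketch of the
pattern the latest patch uses: a family 17h default is applied only when the
user has not set the flag, while MaxVectorSize is clamped only on families
below 17h. This is not HotSpot code. The Flag, Flags and Cpu types, the
flag_is_default/flag_set_default helpers and the starting MaxVectorSize of 32
are all hypothetical stand-ins for FLAG_IS_DEFAULT, FLAG_SET_DEFAULT and the
VM's real flag storage.

```cpp
#include <cassert>

// Hypothetical model of a VM flag: its value plus whether it still has
// "default" origin (i.e. the user did not set it on the command line).
template <typename T>
struct Flag {
  T value;
  bool is_default;
};

// Hypothetical stand-in for FLAG_IS_DEFAULT.
template <typename T>
bool flag_is_default(const Flag<T>& f) { return f.is_default; }

// Hypothetical stand-in for FLAG_SET_DEFAULT: changes the value only and
// leaves the origin marker untouched.
template <typename T>
void flag_set_default(Flag<T>& f, T v) { f.value = v; }

struct Flags {
  Flag<bool> UseXMMForArrayCopy{false, true};
  Flag<bool> UseUnalignedLoadStores{false, true};
  Flag<int>  MaxVectorSize{32, true};  // assumed starting value, illustration only
};

// Hypothetical CPU description; the real code queries CPUID.
struct Cpu {
  int family;
  bool sse2;
};

void apply_amd_defaults(const Cpu& cpu, Flags& f) {
  assert(f.MaxVectorSize.value >= 0);

  // Limit vector size to 16 bytes on AMD families older than 17h.
  if (cpu.family < 0x17 && f.MaxVectorSize.value > 16) {
    flag_set_default(f.MaxVectorSize, 16);
  }

  // Family 17h: enable the array-copy defaults unless the user chose otherwise.
  if (cpu.family == 0x17) {
    if (cpu.sse2 && flag_is_default(f.UseXMMForArrayCopy)) {
      flag_set_default(f.UseXMMForArrayCopy, true);
    }
    if (cpu.sse2 && flag_is_default(f.UseUnalignedLoadStores)) {
      flag_set_default(f.UseUnalignedLoadStores, true);
    }
  }
}
```

In this model a flag the user has explicitly turned off keeps its value on
17h, which is the behaviour the FLAG_IS_DEFAULT guards in the patch are meant
to give.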
More information about the hotspot-dev
mailing list