From coleen.phillimore at oracle.com Fri Sep 1 00:03:31 2017
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Thu, 31 Aug 2017 20:03:31 -0400
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <59A804F8.9000501@oracle.com>
References: <59A804F8.9000501@oracle.com>
Message-ID: <331bf921-243a-9c28-eb0f-c945bdc11384@oracle.com>

Hi,

I'm trying to parse the templates to review this. Maybe it's convention,
but decoding these with parameters that are single capital letters makes
reading the templates very difficult. There are already a lot of
non-alphanumeric characters. When the letter is T, that is expected by
convention, but D or especially I makes it really hard. Can these be
normalized to all use T when there is only one template parameter? It'll
be clear that T* is a pointer and T is an integer without having it be P.

+template
+struct Atomic::IncImpl::value>::type> VALUE_OBJ_CLASS_SPEC {
+  void operator()(I volatile* dest) const {
+    typedef IntegralConstant Adjustment;
+    typedef PlatformInc PlatformOp;
+    PlatformOp()(dest);
+  }
+};

This one isn't as difficult, because it's short, but it would be faster
to understand with T.

+template
+struct Atomic::IncImpl::value>::type> VALUE_OBJ_CLASS_SPEC {
+  void operator()(T volatile* dest) const {
+    typedef IntegralConstant Adjustment;
+    typedef PlatformInc PlatformOp;
+    PlatformOp()(dest);
+  }
+};

+template<>
+struct Atomic::IncImpl VALUE_OBJ_CLASS_SPEC {
+  void operator()(jshort volatile* dest) const {
+    add(jshort(1), dest);
+  }
+};

Did I already ask if this could be changed to u2 rather than jshort? Or
is that the follow-on RFE?

+// Helper for platforms wanting a constant adjustment.
+template
+struct Atomic::IncUsingConstant VALUE_OBJ_CLASS_SPEC {
+  typedef PlatformInc Derived;

I can't find the caller of this. Is it really so much faster than the
platform-independent add(1, T) / add(-1, T) that all this code is worth
having? How is this called? I couldn't parse the trick. Atomic::inc() is
always a "constant adjustment", so I'm confused about what the comment
means and what motivates all the asm code. Do these platform
implementations exist because they don't have two's complement integer
representation? Really?

Also, the function name This() is really disturbing and distracting. Can
it be called some verb() representing what it does? cast_to_derived()?

+  template
+  void operator()(I volatile* dest) const {
+    This()->template inc(dest);
+  }

I didn't know you could put "template" there. What does this call?

Rather than I for the integer case and P for the pointer case, can you
add a one-line comment above this like:

// Helper for integer types
and
// Helper for pointer types

Small local comments would be really helpful for many of these
functions. Just to get more English words in there... Since Kim's on
vacation, can you help me understand this code and add comments so I
remember the reasons for some of this?

Thanks!
Coleen

On 8/31/17 8:45 AM, Erik Österlund wrote:
> Hi everyone,
>
> Bug ID:
> https://bugs.openjdk.java.net/browse/JDK-8186838
>
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>
> The time has come for the next step in generalizing Atomic with
> templates. Today I will focus on Atomic::inc/dec.
>
> I have tried to mimic the new Kim style that seems to have been
> universally accepted. Like Atomic::add and Atomic::cmpxchg, the
> structure looks like this:
>
> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object
> that performs some basic type checks.
> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec, which can
> define the operation arbitrarily for a given platform. The default
> implementation, if not specialized for a platform, is to call
> Atomic::add. So only platforms that want to do something different
> from that as an optimization have to provide a specialization.
> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec
> to be more optimized may inherit from a helper class
> IncUsingConstant/DecUsingConstant. This helper uses CRTP to compute
> what the increment/decrement should be after pointer scaling. The
> PlatformInc/PlatformDec operation then only needs to define an
> inc/dec member function, and will then get all the context
> information necessary to generate a more optimized implementation.
> Easy peasy.
>
> It is worth noticing that the generalized Atomic::dec operation
> assumes a two's complement integer machine and potentially sends the
> unary negative of a potentially unsigned type to Atomic::add. I have
> the following comments about this:
> 1) We already assume in other code that two's complement integers must
> be present.
> 2) A machine that does not have two's complement integers may still
> simply provide a specialization that solves the problem in a different
> way.
> 3) The alternative that does not make assumptions about that would use
> the good old IntegerTypes::cast_to_signed metaprogramming stuff, and I
> seem to recall we thought that was a bit too involved and complicated.
> This is the reason why I have chosen to use unary minus on the
> potentially unsigned type in the shared helper code that sends the
> decrement as an addend to Atomic::add.
>
> It would also be nice if somebody with access to PPC and s390 machines
> could try out the relevant changes there so I do not accidentally
> break those platforms. I have blind-coded the addition of the
> immediate values passed in to the inline assembly in a way that I
> think looks like it should work.
>
> Testing:
> RBT hs-tier3, JPRT --testset hotspot
>
> Thanks,
> /Erik

From david.holmes at oracle.com Fri Sep 1 00:49:43 2017
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 1 Sep 2017 10:49:43 +1000
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <59A804F8.9000501@oracle.com>
References: <59A804F8.9000501@oracle.com>
Message-ID: <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com>

Hi Erik,

Sorry but this one is really losing me.

What is the role of Adjustment?

How are inc/dec anything but "using constant"?

Why do we special case jshort?

This is indecipherable to normal people ;-)

  This()->template inc(dest);

For something as trivial as adding or subtracting 1, the template
machinations here are just mind boggling!

Cheers,
David

From rohitarulraj at gmail.com Fri Sep 1 04:57:34 2017
From: rohitarulraj at gmail.com (Rohit Arul Raj)
Date: Fri, 1 Sep 2017 10:27:34 +0530
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To:
References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com>
Message-ID:

On Fri, Sep 1, 2017 at 3:01 AM, David Holmes wrote:
> Hi Rohit,
>
> I think the patch needs updating for jdk10 as I already see a lot of logic
> around UseSHA in vm_version_x86.cpp.
>
> Thanks,
> David
>

Thanks David, I will update the patch against the JDK10 source base,
test it, and resubmit it for review.

Regards,
Rohit

>
> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>
>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes wrote:
>>>
>>> Hi Rohit,
>>>
>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>
>>>> I would like a volunteer to review this patch (openJDK9) which sets
>>>> flag/ISA defaults for the newer AMD 17h (EPYC) processor and help us
>>>> with the commit process.
>>>>
>>>> Webrev:
>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>
>>> Unfortunately patches can not be accepted from systems outside the
>>> OpenJDK infrastructure and ...
>>>
>>>> I have also attached the patch (hg diff -g) for reference.
>>>
>>> ... unfortunately patches tend to get stripped by the mail servers. If
>>> the patch is small please include it inline. Otherwise you will need to
>>> find an OpenJDK Author who can host it for you on cr.openjdk.java.net.
>>>
>>>> 3) I have done regression testing using jtreg ($make default) and
>>>> didn't find any regressions.
>>>
>>> Sounds good, but until I see the patch it is hard to comment on testing
>>> requirements.
>>>
>>> Thanks,
>>> David
>>
>> Thanks David,
>> Yes, it's a small patch.
>>
>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>> @@ -1051,6 +1051,22 @@
>>        }
>>        FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>      }
>> +    if (supports_sha()) {
>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>> +        FLAG_SET_DEFAULT(UseSHA, true);
>> +      }
>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>> +        warning("SHA instructions are not available on this CPU");
>> +      }
>> +      FLAG_SET_DEFAULT(UseSHA, false);
>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>> +    }
>>
>>      // some defaults for AMD family 15h
>>      if ( cpu_family() == 0x15 ) {
>> @@ -1072,11 +1088,43 @@
>>      }
>>
>>  #ifdef COMPILER2
>> -    if (MaxVectorSize > 16) {
>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>        FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>      }
>>  #endif // COMPILER2
>> +
>> +    // Some defaults for AMD family 17h
>> +    if ( cpu_family() == 0x17 ) {
>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>> +        UseXMMForArrayCopy = true;
>> +      }
>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>> +        UseUnalignedLoadStores = true;
>> +      }
>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>> +        UseBMI2Instructions = true;
>> +      }
>> +      if (MaxVectorSize > 32) {
>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>> +      }
>> +      if (UseSHA) {
>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>> +        } else if (UseSHA512Intrinsics) {
>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>> +        }
>> +      }
>> +#ifdef COMPILER2
>> +      if (supports_sse4_2()) {
>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>> +        }
>> +      }
>> +#endif
>> +    }
>>    }
>>
>>    if( is_intel() ) { // Intel cpus specific settings
>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>> @@ -513,6 +513,16 @@
>>          result |= CPU_LZCNT;
>>        if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>          result |= CPU_SSE4A;
>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>> +        result |= CPU_BMI2;
>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>> +        result |= CPU_HT;
>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>> +        result |= CPU_ADX;
>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>> +        result |= CPU_SHA;
>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>> +        result |= CPU_FMA;
>>      }
>>      // Intel features.
>>      if(is_intel()) {
>>
>> Regards,
>> Rohit
>>
>

From rohitarulraj at gmail.com Fri Sep 1 05:14:44 2017
From: rohitarulraj at gmail.com (Rohit Arul Raj)
Date: Fri, 1 Sep 2017 10:44:44 +0530
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To: <5f8def30-1554-29a5-dde7-62b9940d0161@oracle.com>
References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com> <5f8def30-1554-29a5-dde7-62b9940d0161@oracle.com>
Message-ID:

Hello Vladimir,

> But it also means that AMD will have to do Java testing for this new
> platform and be responsible for it.

Can you please elaborate on this a little more?
Which Java test suites would you like us to run on our end?

> In the future we may forward CPU-related problems to you to analyze and
> fix.

Sure, looking forward to it.

Regards,
Rohit

> Regards,
> Vladimir

From vladimir.kozlov at oracle.com Fri Sep 1 07:19:18 2017
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 1 Sep 2017 00:19:18 -0700
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To:
References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com> <5f8def30-1554-29a5-dde7-62b9940d0161@oracle.com>
Message-ID: <95adc793-0faf-9379-ae56-8dae1d5c498f@oracle.com>

On 8/31/17 10:14 PM, Rohit Arul Raj wrote:
> Hello Vladimir,
>
>> But it also means that AMD will have to do Java testing for this new
>> platform and be responsible for it.
>
> Can you please elaborate on this a little more?
> Which Java test suites would you like us to run on our end?

First, I am talking only about testing on your platform. In this case it
is AMD 17h.

You need to build and use a fastdebug JVM for testing:

configure --with-debug-level=fastdebug

You need to make sure to run the hotspot and jdk jtreg tests. At least
the following set of tests:

make test JOBS=1 TEST_JOBS=1 TEST="hotspot_compiler hotspot_gc hotspot_runtime hotspot_serviceability hotspot_misc jdk_util jdk_lang"

It will take time. You can try increasing the JOBS=1 and TEST_JOBS=1
values to run tests in parallel, but depending on memory and swap sizes
it may not work.

In addition to that, it would be nice if you tracked performance changes
with specjvm2008 and specjbb2015 on your CPU, to avoid regressions when
you apply new changes or pull changes from OpenJDK.

If you have questions, please ask.

>
>> In the future we may forward CPU-related problems to you to analyze and
>> fix.
>
> Sure, looking forward to it.

Best regards,
Vladimir

>
> Regards,
> Rohit

From erik.osterlund at oracle.com Fri Sep 1 08:40:06 2017
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Fri, 1 Sep 2017 10:40:06 +0200
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <331bf921-243a-9c28-eb0f-c945bdc11384@oracle.com>
References: <59A804F8.9000501@oracle.com> <331bf921-243a-9c28-eb0f-c945bdc11384@oracle.com>
Message-ID: <59A91CE6.2080206@oracle.com>

Hi Coleen,

Thank you for taking the time to review this.

On 2017-09-01 02:03, coleen.phillimore at oracle.com wrote:
>
> Hi, I'm trying to parse the templates to review this but maybe it's
> convention but decoding these with parameters that are single capital
> letters make reading the template very difficult. There are already a
> lot of non-alphanumeric characters. When the letter is T, that is
> expected by convention, but D or especially I makes it really hard.
> Can these be normalized to all use T when there is only one template
> parameter? It'll be clear that T* is a pointer and T is an integer
> without having it be P.

I apologize that the names of the template parameters are hard to
understand. For what it's worth, I am only consistently applying Kim's
conventions here. It seemed like a bad idea to violate conventions
already set up - that would arguably be more confusing. The convention
from earlier work by Kim is:

D: Type of destination
I: Operand type that has to be an integral type
P: Operand type that is a pointer element type
T: Generic operand type, may be integral or pointer type

Personally, I do not mind this convention. It is more specific and
annotates things we know about the type into the name of the type.
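
To make the letters concrete, here is a minimal sketch of how such names read in a signature. It is only an illustration under stated assumptions, not code from the webrev: `add_to_integer` and `add_to_pointer` are hypothetical, non-atomic stand-ins, and only the choice of template parameter names matters.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, NON-atomic stand-ins (not HotSpot code). They exist only
// to show which role each single-letter template parameter plays.

// I: integral operand type, D: type of the destination.
template<typename I, typename D>
D add_to_integer(I add_value, D volatile* dest) {
  D result = *dest + add_value;
  *dest = result;
  return result;
}

// I: integral operand type, P: pointer element type.
// The destination holds a P*, so the addition is scaled by sizeof(P).
template<typename I, typename P>
P* add_to_pointer(I add_value, P* volatile* dest) {
  P* result = *dest + add_value;
  *dest = result;
  return result;
}
```

Read this way, a signature that mentions P tells the reader up front that pointer scaling is involved, which a uniform T would not.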

Do you want me to:

1) Keep the convention, now that I have explained what the convention is
and why it is your friend?
2) Break the convention for this change only, making the naming
inconsistent?
3) Change the convention throughout consistently, including all earlier
work from Kim?

>
> +template
> +struct Atomic::IncImpl EnableIf::value>::type> VALUE_OBJ_CLASS_SPEC {
> +  void operator()(I volatile* dest) const {
> +    typedef IntegralConstant Adjustment;
> +    typedef PlatformInc PlatformOp;
> +    PlatformOp()(dest);
> +  }
> +};
>
> This one isn't as difficult, because it's short, but it would be
> faster to understand with T.
>
> +template
> +struct Atomic::IncImpl EnableIf::value>::type> VALUE_OBJ_CLASS_SPEC {
> +  void operator()(T volatile* dest) const {
> +    typedef IntegralConstant Adjustment;
> +    typedef PlatformInc PlatformOp;
> +    PlatformOp()(dest);
> +  }
> +};
>
> +template<>
> +struct Atomic::IncImpl VALUE_OBJ_CLASS_SPEC {
> +  void operator()(jshort volatile* dest) const {
> +    add(jshort(1), dest);
> +  }
> +};
>
>
> Did I already ask if this could be changed to u2 rather than jshort?
> Or is that the follow-on RFE?

That is a follow-on RFE.

> +// Helper for platforms wanting a constant adjustment.
> +template
> +struct Atomic::IncUsingConstant VALUE_OBJ_CLASS_SPEC {
> +  typedef PlatformInc Derived;
>
>
> I can't find the caller of this. Is it really a lot faster than
> having the platform independent add(1, T) / add(-1, T) to make all
> this code worth having? How is this called? I couldn't parse the
> trick. Atomic::inc() is always a "constant adjustment" so I'm
> confused about what the comment means and what motivates all the asm
> code. Do these platform implementations exist because they don't
> have twos complement for integer representation? really?

This is used by some x86, PPC and s390 platforms. Personally I question
its usefulness for x86. I believe it might be one of those things where
we ran some benchmarks a decade ago and concluded that it was slightly
faster to have a slimmed path for Atomic::inc rather than reusing
Atomic::add.

I did not initially want to bring this up, as it seems like none of my
business, but now that the question has been asked about differences, I
could not help but notice that the advertised "leading sync" convention
of Atomic::inc on PPC is not respected. That is, there is no "sync"
fence before the atomic increment, as required by the specified
semantics. There is not even a leading "lwsync". The corresponding
Atomic::add operation, though, does have a leading lwsync (unlike
Atomic::inc). Now this should arguably be reinforced to sync rather than
lwsync to respect the advertised semantics of both Atomic::add and
Atomic::inc on PPC. Hopefully that statement will not turn into a long
unrelated mailing thread...

In conclusion, there is definitely a substantial difference in the
fencing when comparing the PPC implementation of Atomic::inc to
Atomic::add. Whether either one of them conforms to the intended
semantics is a different matter - one that I was hoping not to have to
deal with in this RFE, as I am merely templateifying what was already
there, without judging the existing specializations. And it is my
observation that, as the code looks now, replacing the PPC
specialization with a plain call to Atomic::add would incur a bunch more
fencing than the code does today on PPC.

> Also, the function name This() is really disturbing and distracting.
> Can it be called some verb() representing what it does?
> cast_to_derived()?
>
> +  template
> +  void operator()(I volatile* dest) const {
> +    This()->template inc(dest);
> +  }
>

Yes, I will change the name accordingly as you suggest.

> I didn't know you could put "template" there.
It is required to put the template keyword before the member function name when calling a template member function with explicit template parameters (as opposed to implicitly inferred template parameters) on a template type. > What does this call? This calls the platform-defined intrinsic that is defined in the platform files - the one that contains the inline assembly. > Rather than I for integer case, and P for pointer case, can you add a > one line comment above this like: > // Helper for integer types > and > // Helper for pointer types Or perhaps we could do both? Nevertheless, I will add these comments. But as per the discussion above, I would be happy if we could keep the convention that Kim has already set up for the template type names. > Small local comments would be really helpful for many of these > functions. Just to get more english words in there... Since Kim's > on vacation can you help me understand this code and add comments so I > remember the reasons for some of this? Sure - I will decorate the code with some comments to help understanding. I will send an updated webrev when I get your reply regarding the typename naming convention verdict. Thanks for the review! /Erik > > Thanks! > Coleen > > > On 8/31/17 8:45 AM, Erik ?sterlund wrote: >> Hi everyone, >> >> Bug ID: >> https://bugs.openjdk.java.net/browse/JDK-8186838 >> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >> >> The time has come for the next step in generalizing Atomic with >> templates. Today I will focus on Atomic::inc/dec. >> >> I have tried to mimic the new Kim style that seems to have been >> universally accepted. Like Atomic::add and Atomic::cmpxchg, the >> structure looks like this: >> >> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object >> that performs some basic type checks. >> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can >> define the operation arbitrarily for a given platform. 
The default >> implementation if not specialized for a platform is to call >> Atomic::add. So only platforms that want to do something different >> than that as an optimization have to provide a specialization. >> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec >> to be more optimized may inherit from a helper class >> IncUsingConstant/DecUsingConstant. This helper helps performing the >> necessary computation what the increment/decrement should be after >> pointer scaling using CRTP. The PlatformInc/PlatformDec operation >> then only needs to define an inc/dec member function, and will then >> get all the context information necessary to generate a more >> optimized implementation. Easy peasy. >> >> It is worth noticing that the generalized Atomic::dec operation >> assumes a two's complement integer machine and potentially sends the >> unary negative of a potentially unsigned type to Atomic::add. I have >> the following comments about this: >> 1) We already assume in other code that two's complement integers >> must be present. >> 2) A machine that does not have two's complement integers may still >> simply provide a specialization that solves the problem in a >> different way. >> 3) The alternative that does not make assumptions about that would >> use the good old IntegerTypes::cast_to_signed metaprogramming stuff, >> and I seem to recall we thought that was a bit too involved and >> complicated. >> This is the reason why I have chosen to use unary minus on the >> potentially unsigned type in the shared helper code that sends the >> decrement as an addend to Atomic::add. >> >> It would also be nice if somebody with access to PPC and s390 >> machines could try out the relevant changes there so I do not >> accidentally break those platforms. I have blind-coded the addition >> of the immediate values passed in to the inline assembly in a way >> that I think looks like it should work. 
>>
>> Testing:
>> RBT hs-tier3, JPRT --testset hotspot
>>
>> Thanks,
>> /Erik
>

From erik.osterlund at oracle.com Fri Sep 1 09:29:58 2017
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Fri, 1 Sep 2017 11:29:58 +0200
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com>
References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com>
Message-ID: <59A92896.9010604@oracle.com>

Hi David,

On 2017-09-01 02:49, David Holmes wrote:
> Hi Erik,
>
> Sorry but this one is really losing me.
>
> What is the role of Adjustment ?

Adjustment represents the increment/decrement value as an IntegralConstant - your template friend for passing around a constant with both a specified type and value in templates. When the destination is an integral type, the type of the increment/decrement is the type of the destination; when the destination is a pointer type, the increment/decrement type is ptrdiff_t.

> How are inc/dec anything but "using constant" ?

I was also a bit torn on that name (I assume you are referring to IncUsingConstant/DecUsingConstant). It was hard to find a name that depicted what this platform helper does. I considered calling the helper something with "immediate" in the name, because today it is really used to embed the constant as an immediate value in inline assembly. But then again that seemed too specific, as it is not obvious that all platform specializations will use it in that way. One might just want to specialize this to pass the constant into some compiler Atomic::inc intrinsic, for example. Do you have any other preferred names? Here are a few possible names for IncUsingConstant:

IncUsingScaledConstant
IncUsingAdjustedConstant
IncUsingPlatformHelper

Any favourites?

> Why do we special case jshort?

To be consistent with the special case of Atomic::add on jshort. Do you want it removed?
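
To make the role of Adjustment and the "using constant" helper a bit more concrete, here is a minimal sketch. It is not the code from the webrev. It assumes C++11, uses std::integral_constant as a stand-in for HotSpot's IntegralConstant, and uses the GCC/Clang __atomic_fetch_add builtin as a stand-in for the platform inline assembly. The names AdjustmentFor, IncUsingConstantSketch, BuiltinPlatformInc and adjustment_demo are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical trait (not a HotSpot name): picks the Adjustment constant.
// Integral destination: the constant 1, typed like the destination itself.
template <typename T>
struct AdjustmentFor {
  static_assert(std::is_integral<T>::value, "integral or pointer types only");
  typedef std::integral_constant<T, 1> type;
};

// Pointer destination: one element, i.e. sizeof(P) bytes, typed as ptrdiff_t.
// This is the "pointer scaling" step, done once at compile time.
template <typename P>
struct AdjustmentFor<P*> {
  typedef std::integral_constant<std::ptrdiff_t,
                                 static_cast<std::ptrdiff_t>(sizeof(P))> type;
};

// CRTP helper: computes the Adjustment type and hands it to Derived::inc.
template <typename Derived>
struct IncUsingConstantSketch {
  template <typename T>
  void operator()(T volatile* dest) const {
    typedef typename AdjustmentFor<T>::type Adjustment;
    // Derived is a dependent type, so the "template" keyword is required
    // before a member template call with explicit template arguments.
    static_cast<const Derived*>(this)->template inc<T, Adjustment>(dest);
  }
};

// Hypothetical "platform" layer. A real port could embed Adjustment::value
// as an immediate in inline assembly; here a compiler builtin stands in.
// Assumption: GCC or Clang, where __atomic_fetch_add on a pointer operand
// adds the raw byte count without scaling it again.
struct BuiltinPlatformInc : IncUsingConstantSketch<BuiltinPlatformInc> {
  template <typename T, typename Adjustment>
  void inc(T volatile* dest) const {
    __atomic_fetch_add(dest, Adjustment::value, __ATOMIC_SEQ_CST);
  }
};

// Small usage demo: one integral destination, one pointer destination.
void adjustment_demo() {
  BuiltinPlatformInc platform_inc;

  int counter = 41;
  platform_inc(&counter);        // Adjustment is integral_constant<int, 1>
  assert(counter == 42);

  long slots[2] = {0, 0};
  long* cursor = slots;
  platform_inc(&cursor);         // Adjustment is sizeof(long) as ptrdiff_t
  assert(cursor == slots + 1);   // advanced by exactly one element
}
```

The point of carrying the adjustment as a type rather than as a run-time argument is that the platform layer sees both its type and its value at compile time, so nothing about the "+1" or the pointer scaling has to be computed when the increment executes.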

> This is indecipherable to normal people ;-)
>
> This()->template inc(dest);
>
> For something as trivial as adding or subtracting 1 the template
> machinations here are just mind boggling!

This uses the CRTP (Curiously Recurring Template Pattern) C++ idiom. The idea is to replace what would otherwise be a virtual call with a statically resolved one: the derived type is passed in as a template parameter to the base class, and the base class then static_casts itself to the derived class and calls the member function directly. I hope this explanation sheds some light on what is going on. The same CRTP idiom was used in the Atomic::add implementation in a similar fashion.

I will add some comments describing this in the next round after Coleen replies.

Thanks for looking at this.

/Erik

>
> Cheers,
> David
>
> On 31/08/2017 10:45 PM, Erik Österlund wrote:
>> Hi everyone,
>>
>> Bug ID:
>> https://bugs.openjdk.java.net/browse/JDK-8186838
>>
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>>
>> The time has come for the next step in generalizing Atomic with
>> templates. Today I will focus on Atomic::inc/dec.
>>
>> I have tried to mimic the new Kim style that seems to have been
>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the
>> structure looks like this:
>>
>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object
>> that performs some basic type checks.
>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can
>> define the operation arbitrarily for a given platform. The default
>> implementation if not specialized for a platform is to call
>> Atomic::add. So only platforms that want to do something different
>> than that as an optimization have to provide a specialization.
>> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec
>> to be more optimized may inherit from a helper class
>> IncUsingConstant/DecUsingConstant. This helper helps performing the
>> necessary computation what the increment/decrement should be after
>> pointer scaling using CRTP.
The PlatformInc/PlatformDec operation >> then only needs to define an inc/dec member function, and will then >> get all the context information necessary to generate a more >> optimized implementation. Easy peasy. >> >> It is worth noticing that the generalized Atomic::dec operation >> assumes a two's complement integer machine and potentially sends the >> unary negative of a potentially unsigned type to Atomic::add. I have >> the following comments about this: >> 1) We already assume in other code that two's complement integers >> must be present. >> 2) A machine that does not have two's complement integers may still >> simply provide a specialization that solves the problem in a >> different way. >> 3) The alternative that does not make assumptions about that would >> use the good old IntegerTypes::cast_to_signed metaprogramming stuff, >> and I seem to recall we thought that was a bit too involved and >> complicated. >> This is the reason why I have chosen to use unary minus on the >> potentially unsigned type in the shared helper code that sends the >> decrement as an addend to Atomic::add. >> >> It would also be nice if somebody with access to PPC and s390 >> machines could try out the relevant changes there so I do not >> accidentally break those platforms. I have blind-coded the addition >> of the immediate values passed in to the inline assembly in a way >> that I think looks like it should work. 
>> >> Testing: >> RBT hs-tier3, JPRT --testset hotspot >> >> Thanks, >> /Erik From rohitarulraj at gmail.com Fri Sep 1 09:34:52 2017 From: rohitarulraj at gmail.com (Rohit Arul Raj) Date: Fri, 1 Sep 2017 15:04:52 +0530 Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults In-Reply-To: <95adc793-0faf-9379-ae56-8dae1d5c498f@oracle.com> References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com> <5f8def30-1554-29a5-dde7-62b9940d0161@oracle.com> <95adc793-0faf-9379-ae56-8dae1d5c498f@oracle.com> Message-ID: On Fri, Sep 1, 2017 at 12:49 PM, Vladimir Kozlov wrote: > On 8/31/17 10:14 PM, Rohit Arul Raj wrote: >> >> Hello Vladimir, >> >>> But it also mean that AMD will have to do Java testing for this new >>> platform >>> and be responsible for it. >> >> >> Can you please elaborate on this a little more? >> What all Java test suites would you like us to test from our end? > > > First, I am talking only about testing on your platform. In this case it is > AMD 17h. > > You need to build and use fastdebug JVM for testing: configure > --with-debug-level=fastdebug > > You need to make sure to run hotspot and jdk jtreg tests. At least next set > of tests: > > make test JOBS=1 TEST_JOBS=1 TEST="hotspot_compiler hotspot_gc > hotspot_runtime hotspot_serviceability hotspot_misc jdk_util jdk_lang" > > It will take time. You can try to increase JOBS=1 and TEST_JOBS=1 numbers to > run tests in parallel but depending on memory and swap sizes it may not > work. Yes, We will do that. > In addition to that would be nice if you track performance changes with > specjvm2008 and specjbb2015 on your cpu to avoid regression when you apply > new changes or pull changes from OpenJDK. We do run SPECjbb2015. Regarding SPECjvm2008, we tried the base run but the results are pretty inconsistent. The base throughput varies from run to run (~30%). This is the command we use to generate the numbers (startup.compiler.sunflow & compiler.sunflow have been disabled). 
Is there any benchmark option we may be missing? java -jar SPECjvm2008.jar startup.helloworld startup.compiler.compiler startup.compress startup.crypto.aes startup.crypto.rsa startup.crypto.signverify startup.mpegaudio startup.scimark.fft startup.scimark.lu startup.scimark.monte_carlo startup.scimark.sor startup.scimark.sparse startup.serial startup.sunflow startup.xml.transform startup.xml.validation compiler.compiler compress crypto.aes crypto.rsa crypto.signverify derby mpegaudio scimark.fft.large scimark.lu.large scimark.sor.large scimark.sparse.large scimark.fft.small scimark.lu.small scimark.sor.small scimark.sparse.small scimark.monte_carlo serial sunflow xml.transform xml.validation Regards, Rohit > If you have questions, please ask. > >> >>> In a future we may forward this CPU related problems to you to analyze >>> and >>> fix. >> >> >> Sure, looking forward to it. > > > Best regards, > Vladimir > > >> >> Regards, >> Rohit >> >>> Regards, >>> Vladimir >>> >>> >>> On 8/31/17 2:31 PM, David Holmes wrote: >>>> >>>> >>>> Hi Rohit, >>>> >>>> I think the patch needs updating for jdk10 as I already see a lot of >>>> logic >>>> around UseSHA in vm_version_x86.cpp. >>>> >>>> Thanks, >>>> David >>>> >>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>> >>>>> >>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>> wrote: >>>>>> >>>>>> >>>>>> Hi Rohit, >>>>>> >>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> I would like an volunteer to review this patch (openJDK9) which sets >>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with >>>>>>> the commit process. >>>>>>> >>>>>>> Webrev: >>>>>>> >>>>>>> >>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Unfortunately patches can not be accepted from systems outside the >>>>>> OpenJDK >>>>>> infrastructure and ... >>>>>> >>>>>>> I have also attached the patch (hg diff -g) for reference. 
>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> ... unfortunately patches tend to get stripped by the mail servers. If >>>>>> the >>>>>> patch is small please include it inline. Otherwise you will need to >>>>>> find >>>>>> an >>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net. >>>>>> >>>>> >>>>>>> 3) I have done regression testing using jtreg ($make default) and >>>>>>> didnt find any regressions. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Sounds good, but until I see the patch it is hard to comment on >>>>>> testing >>>>>> requirements. >>>>>> >>>>>> Thanks, >>>>>> David >>>>> >>>>> >>>>> >>>>> Thanks David, >>>>> Yes, it's a small patch. >>>>> >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> @@ -1051,6 +1051,22 @@ >>>>> } >>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>> } >>>>> + if (supports_sha()) { >>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>> + } >>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>>> UseSHA512Intrinsics) { >>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>> + warning("SHA instructions are not available on this CPU"); >>>>> + } >>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } >>>>> >>>>> // some defaults for AMD family 15h >>>>> if ( cpu_family() == 0x15 ) { >>>>> @@ -1072,11 +1088,43 @@ >>>>> } >>>>> >>>>> #ifdef COMPILER2 >>>>> - if (MaxVectorSize > 16) { >>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>> } >>>>> #endif // COMPILER2 >>>>> + >>>>> + // Some defaults for AMD family 17h >>>>> + if ( cpu_family() == 0x17 ) { >>>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>>> Array Copy >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>> + UseXMMForArrayCopy = true; >>>>> + } >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>> { >>>>> + UseUnalignedLoadStores = true; >>>>> + } >>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>> + UseBMI2Instructions = true; >>>>> + } >>>>> + if (MaxVectorSize > 32) { >>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>> + } >>>>> + if (UseSHA) { >>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } else if (UseSHA512Intrinsics) { >>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>> functions not available on this CPU."); >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } >>>>> + } >>>>> +#ifdef COMPILER2 >>>>> + if (supports_sse4_2()) { >>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>> + } >>>>> + } >>>>> +#endif >>>>> + } >>>>> } >>>>> >>>>> if( is_intel() ) { // Intel cpus specific settings >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> @@ -513,6 +513,16 @@ >>>>> result |= CPU_LZCNT; >>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>> result |= CPU_SSE4A; >>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>> + result |= CPU_BMI2; >>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>> + result |= CPU_HT; >>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>> + result |= CPU_ADX; >>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>> + result |= CPU_SHA; >>>>> + if 
(_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>> + result |= CPU_FMA;
>>>>> }
>>>>> // Intel features.
>>>>> if(is_intel()) {
>>>>>
>>>>> Regards,
>>>>> Rohit
>>>>>
>>>
>

From glaubitz at physik.fu-berlin.de Fri Sep 1 09:35:21 2017
From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz)
Date: Fri, 1 Sep 2017 11:35:21 +0200
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <59A804F8.9000501@oracle.com>
References: <59A804F8.9000501@oracle.com>
Message-ID: <602b39a1-85e3-0e34-ff0c-c9076885c206@physik.fu-berlin.de>

On 08/31/2017 02:45 PM, Erik Österlund wrote:
> It would also be nice if somebody with access to PPC and s390 machines
> could try out the relevant changes there so I do not accidentally break
> those platforms.

And linux-zero and linux-sparc, of course :). I will test that.

Adrian

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz at debian.org
`. `'   Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

From david.holmes at oracle.com Fri Sep 1 10:34:22 2017
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 1 Sep 2017 20:34:22 +1000
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <59A92896.9010604@oracle.com>
References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com>
Message-ID: <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com>

Hi Erik,

I just wanted to add that I would expect the cmpxchg, add and inc Atomic APIs to all require a similar basic structure for manipulating types/values etc., yet all three seem to have quite different structures, which I find very confusing. I'm still at a loss to fathom the CRTP and the hoops we seemingly have to jump through just to add or subtract 1!!!
Cheers, David On 1/09/2017 7:29 PM, Erik ?sterlund wrote: > Hi David, > > On 2017-09-01 02:49, David Holmes wrote: >> Hi Erik, >> >> Sorry but this one is really losing me. >> >> What is the role of Adjustment ?? > > Adjustment represents the increment/decrement value as an > IntegralConstant - your template friend for passing around a constant > with both a specified type and value in templates. The type of the > increment/decrement is the type of the destination when the destination > is an integral type, otherwise if it is a pointer type, the > increment/decrement type is ptrdiff_t. > >> How are inc/dec anything but "using constant" ?? > > I was also a bit torn on that name (I assume you are referring to > IncUsingConstant/DecUsingConstant). It was hard to find a name that > depicted what this platform helper does. I considered calling the helper > something with immediate in the name because it is really used to embed > the constant as immediate values in inline assembly today. But then > again that seemed too specific, as it is not completely obvious platform > specializations will use it in that way. One might just want to > specialize this to send it into some compiler Atomic::inc intrinsic for > example. Do you have any other preferred names? Here are a few possible > names for IncUsingConstant: > > IncUsingScaledConstant > IncUsingAdjustedConstant > IncUsingPlatformHelper > > Any favourites? > >> Why do we special case jshort?? > > To be consistent with the special case of Atomic::add on jshort. Do you > want it removed? > >> This is indecipherable to normal people ;-) >> >> ?This()->template inc(dest); >> >> For something as trivial as adding or subtracting 1 the template >> machinations here are just mind boggling! > > This uses the CRTP (Curiously Recurring Template Pattern) C++ idiom. 
The > idea is to devirtualize a virtual call by passing in the derived type as > a template parameter to a base class, and then let the base class > static_cast to the derived class to devirtualize the call. I hope this > explanation sheds some light on what is going on. The same CRTP idiom > was used in the Atomic::add implementation in a similar fashion. > > I will add some comments describing this in the next round after Coleen > replies. > > Thanks for looking at this. > > /Erik > >> >> Cheers, >> David >> >> On 31/08/2017 10:45 PM, Erik ?sterlund wrote: >>> Hi everyone, >>> >>> Bug ID: >>> https://bugs.openjdk.java.net/browse/JDK-8186838 >>> >>> Webrev: >>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >>> >>> The time has come for the next step in generalizing Atomic with >>> templates. Today I will focus on Atomic::inc/dec. >>> >>> I have tried to mimic the new Kim style that seems to have been >>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the >>> structure looks like this: >>> >>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object >>> that performs some basic type checks. >>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can >>> define the operation arbitrarily for a given platform. The default >>> implementation if not specialized for a platform is to call >>> Atomic::add. So only platforms that want to do something different >>> than that as an optimization have to provide a specialization. >>> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec >>> to be more optimized may inherit from a helper class >>> IncUsingConstant/DecUsingConstant. This helper helps performing the >>> necessary computation what the increment/decrement should be after >>> pointer scaling using CRTP. The PlatformInc/PlatformDec operation >>> then only needs to define an inc/dec member function, and will then >>> get all the context information necessary to generate a more >>> optimized implementation. 
Easy peasy. >>> >>> It is worth noticing that the generalized Atomic::dec operation >>> assumes a two's complement integer machine and potentially sends the >>> unary negative of a potentially unsigned type to Atomic::add. I have >>> the following comments about this: >>> 1) We already assume in other code that two's complement integers >>> must be present. >>> 2) A machine that does not have two's complement integers may still >>> simply provide a specialization that solves the problem in a >>> different way. >>> 3) The alternative that does not make assumptions about that would >>> use the good old IntegerTypes::cast_to_signed metaprogramming stuff, >>> and I seem to recall we thought that was a bit too involved and >>> complicated. >>> This is the reason why I have chosen to use unary minus on the >>> potentially unsigned type in the shared helper code that sends the >>> decrement as an addend to Atomic::add. >>> >>> It would also be nice if somebody with access to PPC and s390 >>> machines could try out the relevant changes there so I do not >>> accidentally break those platforms. I have blind-coded the addition >>> of the immediate values passed in to the inline assembly in a way >>> that I think looks like it should work. >>> >>> Testing: >>> RBT hs-tier3, JPRT --testset hotspot >>> >>> Thanks, >>> /Erik > From neugens at redhat.com Fri Sep 1 10:43:41 2017 From: neugens at redhat.com (Mario Torre) Date: Fri, 1 Sep 2017 12:43:41 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59A92896.9010604@oracle.com> References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> Message-ID: On Fri, Sep 1, 2017 at 11:29 AM, Erik ?sterlund wrote: > Hi David, > > On 2017-09-01 02:49, David Holmes wrote: >> >> Hi Erik, >> >> Sorry but this one is really losing me. >> >> What is the role of Adjustment ?? 
> > > Adjustment represents the increment/decrement value as an IntegralConstant - > your template friend for passing around a constant with both a specified > type and value in templates. The type of the increment/decrement is the type > of the destination when the destination is an integral type, otherwise if it > is a pointer type, the increment/decrement type is ptrdiff_t. > >> How are inc/dec anything but "using constant" ?? > > > I was also a bit torn on that name (I assume you are referring to > IncUsingConstant/DecUsingConstant). It was hard to find a name that depicted > what this platform helper does. I considered calling the helper something > with immediate in the name because it is really used to embed the constant > as immediate values in inline assembly today. But then again that seemed too > specific, as it is not completely obvious platform specializations will use > it in that way. One might just want to specialize this to send it into some > compiler Atomic::inc intrinsic for example. Do you have any other preferred > names? Here are a few possible names for IncUsingConstant: > > IncUsingScaledConstant > IncUsingAdjustedConstant > IncUsingPlatformHelper > > Any favourites? > >> Why do we special case jshort?? > > > To be consistent with the special case of Atomic::add on jshort. Do you want > it removed? > >> This is indecipherable to normal people ;-) >> >> This()->template inc(dest); >> >> For something as trivial as adding or subtracting 1 the template >> machinations here are just mind boggling! > > > This uses the CRTP (Curiously Recurring Template Pattern) C++ idiom. The > idea is to devirtualize a virtual call by passing in the derived type as a > template parameter to a base class, and then let the base class static_cast > to the derived class to devirtualize the call. I hope this explanation sheds > some light on what is going on. The same CRTP idiom was used in the > Atomic::add implementation in a similar fashion. 
>
> I will add some comments describing this in the next round after Coleen
> replies.
>

Isn't that a lot slower than the current inline?

BTW, I think I see what those magic constants are (4, 8... rings a bell ;), but I think a define here could make things more readable.

Cheers,
Mario

From erik.osterlund at oracle.com Fri Sep 1 10:49:55 2017
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Fri, 1 Sep 2017 12:49:55 +0200
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com>
References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com>
Message-ID: <59A93B53.9010505@oracle.com>

Hi David,

The shared structure for all operations is the following:

An Atomic::something call creates a SomethingImpl function object that performs some basic type checking and then forwards the call straight to a PlatformSomething function object. This PlatformSomething object could decide to do anything. But to make life easier, it may inherit from a shared SomethingHelper function object with CRTP that calls back into the PlatformSomething function object to emit inline assembly.

I hope this explanation helps in understanding the intended structure of this work.

Thanks,
/Erik

On 2017-09-01 12:34, David Holmes wrote:
> Hi Erik,
>
> I just wanted to add that I would expect the cmpxchg, add and inc,
> Atomic API's to all require similar basic structure for manipulating
> types/values etc, yet all three seem to have quite different
> structures that I find very confusing. I'm still at a loss to fathom
> the CRTP and the hoops we seemingly have to jump through just to add
> or subtract 1!!!
>
> Cheers,
> David
>
> On 1/09/2017 7:29 PM, Erik Österlund wrote:
>> Hi David,
>>
>> On 2017-09-01 02:49, David Holmes wrote:
>>> Hi Erik,
>>>
>>> Sorry but this one is really losing me.
>>> >>> What is the role of Adjustment ?? >> >> Adjustment represents the increment/decrement value as an >> IntegralConstant - your template friend for passing around a constant >> with both a specified type and value in templates. The type of the >> increment/decrement is the type of the destination when the >> destination is an integral type, otherwise if it is a pointer type, >> the increment/decrement type is ptrdiff_t. >> >>> How are inc/dec anything but "using constant" ?? >> >> I was also a bit torn on that name (I assume you are referring to >> IncUsingConstant/DecUsingConstant). It was hard to find a name that >> depicted what this platform helper does. I considered calling the >> helper something with immediate in the name because it is really used >> to embed the constant as immediate values in inline assembly today. >> But then again that seemed too specific, as it is not completely >> obvious platform specializations will use it in that way. One might >> just want to specialize this to send it into some compiler >> Atomic::inc intrinsic for example. Do you have any other preferred >> names? Here are a few possible names for IncUsingConstant: >> >> IncUsingScaledConstant >> IncUsingAdjustedConstant >> IncUsingPlatformHelper >> >> Any favourites? >> >>> Why do we special case jshort?? >> >> To be consistent with the special case of Atomic::add on jshort. Do >> you want it removed? >> >>> This is indecipherable to normal people ;-) >>> >>> This()->template inc(dest); >>> >>> For something as trivial as adding or subtracting 1 the template >>> machinations here are just mind boggling! >> >> This uses the CRTP (Curiously Recurring Template Pattern) C++ idiom. >> The idea is to devirtualize a virtual call by passing in the derived >> type as a template parameter to a base class, and then let the base >> class static_cast to the derived class to devirtualize the call. I >> hope this explanation sheds some light on what is going on. 
The same >> CRTP idiom was used in the Atomic::add implementation in a similar >> fashion. >> >> I will add some comments describing this in the next round after >> Coleen replies. >> >> Thanks for looking at this. >> >> /Erik >> >>> >>> Cheers, >>> David >>> >>> On 31/08/2017 10:45 PM, Erik ?sterlund wrote: >>>> Hi everyone, >>>> >>>> Bug ID: >>>> https://bugs.openjdk.java.net/browse/JDK-8186838 >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >>>> >>>> The time has come for the next step in generalizing Atomic with >>>> templates. Today I will focus on Atomic::inc/dec. >>>> >>>> I have tried to mimic the new Kim style that seems to have been >>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the >>>> structure looks like this: >>>> >>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function >>>> object that performs some basic type checks. >>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can >>>> define the operation arbitrarily for a given platform. The default >>>> implementation if not specialized for a platform is to call >>>> Atomic::add. So only platforms that want to do something different >>>> than that as an optimization have to provide a specialization. >>>> Layer 3) Platforms that decide to specialize >>>> PlatformInc/PlatformDec to be more optimized may inherit from a >>>> helper class IncUsingConstant/DecUsingConstant. This helper helps >>>> performing the necessary computation what the increment/decrement >>>> should be after pointer scaling using CRTP. The >>>> PlatformInc/PlatformDec operation then only needs to define an >>>> inc/dec member function, and will then get all the context >>>> information necessary to generate a more optimized implementation. >>>> Easy peasy. 
>>>> >>>> It is worth noticing that the generalized Atomic::dec operation >>>> assumes a two's complement integer machine and potentially sends >>>> the unary negative of a potentially unsigned type to Atomic::add. I >>>> have the following comments about this: >>>> 1) We already assume in other code that two's complement integers >>>> must be present. >>>> 2) A machine that does not have two's complement integers may still >>>> simply provide a specialization that solves the problem in a >>>> different way. >>>> 3) The alternative that does not make assumptions about that would >>>> use the good old IntegerTypes::cast_to_signed metaprogramming >>>> stuff, and I seem to recall we thought that was a bit too involved >>>> and complicated. >>>> This is the reason why I have chosen to use unary minus on the >>>> potentially unsigned type in the shared helper code that sends the >>>> decrement as an addend to Atomic::add. >>>> >>>> It would also be nice if somebody with access to PPC and s390 >>>> machines could try out the relevant changes there so I do not >>>> accidentally break those platforms. I have blind-coded the addition >>>> of the immediate values passed in to the inline assembly in a way >>>> that I think looks like it should work. >>>> >>>> Testing: >>>> RBT hs-tier3, JPRT --testset hotspot >>>> >>>> Thanks, >>>> /Erik >> From erik.osterlund at oracle.com Fri Sep 1 11:42:51 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Fri, 1 Sep 2017 13:42:51 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> Message-ID: <59A947BB.3040506@oracle.com> Hi Mario, On 2017-09-01 12:43, Mario Torre wrote: > On Fri, Sep 1, 2017 at 11:29 AM, Erik ?sterlund > wrote: >> Hi David, >> >> On 2017-09-01 02:49, David Holmes wrote: >>> Hi Erik, >>> >>> Sorry but this one is really losing me. 
>>>
>>> What is the role of Adjustment?
>>
>> Adjustment represents the increment/decrement value as an IntegralConstant -
>> your template friend for passing around a constant with both a specified
>> type and value in templates. The type of the increment/decrement is the type
>> of the destination when the destination is an integral type, otherwise if it
>> is a pointer type, the increment/decrement type is ptrdiff_t.
>>
>>> How are inc/dec anything but "using constant"?
>>
>> I was also a bit torn on that name (I assume you are referring to
>> IncUsingConstant/DecUsingConstant). It was hard to find a name that depicted
>> what this platform helper does. I considered calling the helper something
>> with immediate in the name because it is really used to embed the constant
>> as immediate values in inline assembly today. But then again that seemed too
>> specific, as it is not completely obvious platform specializations will use
>> it in that way. One might just want to specialize this to send it into some
>> compiler Atomic::inc intrinsic for example. Do you have any other preferred
>> names? Here are a few possible names for IncUsingConstant:
>>
>> IncUsingScaledConstant
>> IncUsingAdjustedConstant
>> IncUsingPlatformHelper
>>
>> Any favourites?
>>
>>> Why do we special case jshort?
>>
>> To be consistent with the special case of Atomic::add on jshort. Do you want
>> it removed?
>>
>>> This is indecipherable to normal people ;-)
>>>
>>> This()->template inc(dest);
>>>
>>> For something as trivial as adding or subtracting 1 the template
>>> machinations here are just mind boggling!
>>
>> This uses the CRTP (Curiously Recurring Template Pattern) C++ idiom. The
>> idea is to devirtualize a virtual call by passing in the derived type as a
>> template parameter to a base class, and then let the base class static_cast
>> to the derived class to devirtualize the call. I hope this explanation sheds
>> some light on what is going on.
>> The same CRTP idiom was used in the
>> Atomic::add implementation in a similar fashion.
>>
>> I will add some comments describing this in the next round after Coleen
>> replies.
>>
> Isn't that a lot more slower than the current inline?

What makes you think so? Everything is inlined all the way to the
underlying platform layer. Achieving that is the very reason why CRTP
is used instead of virtual calls.

> BTW, I think I see what those magic constants are (4, 8... rings a
> bell ;), but I think a define here could make things more readable.

Sorry, I am not sure I am following what you mean here.

Thanks,
/Erik

> Cheers,
> Mario

From jesper.wilhelmsson at oracle.com Fri Sep 1 11:54:26 2017
From: jesper.wilhelmsson at oracle.com (jesper.wilhelmsson at oracle.com)
Date: Fri, 1 Sep 2017 13:54:26 +0200
Subject: Hotspot repository jdk10/hs closes today
In-Reply-To: <2081B825-B14B-4846-A1BA-294B4ECC1B5A@oracle.com>
References: <2081B825-B14B-4846-A1BA-294B4ECC1B5A@oracle.com>
Message-ID: <26D1BC95-12A8-4FFC-BF52-661B338233B4@oracle.com>

Hi,

Just a reminder that this is happening today at 2 pm PT. The repository
will be made read only for approx two weeks.

Thanks,
/Jesper

> On 29 Aug 2017, at 19:08, jesper.wilhelmsson at oracle.com wrote:
>
> Hi,
>
> The repository consolidation is approaching and to prepare for that we need to push all new changes from jdk10/hs to jdk10/jdk10. Once that push is done the hotspot repository jdk10/hs will be closed for all pushes until the consolidation is done.
>
> The current plan is to integrate 10/hs to 10/10 on Friday/Saturday. The snapshot will be taken at 2pm PST. Pushes not completed before 2pm will be killed and rejected.
>
> To increase the likelihood of this proceeding smoothly, please act quickly if a bug is filed due to any change you are pushing this week.
>
> The repo consolidation will likely take at least two weeks. The preliminary date for opening 10/hs is September 18.
> This is subject to change depending on the duration of the consolidation effort.
>
> Thanks,
> /Jesper
>

From shade at redhat.com Fri Sep 1 12:00:29 2017
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 1 Sep 2017 14:00:29 +0200
Subject: Hotspot repository jdk10/hs closes today
In-Reply-To: <26D1BC95-12A8-4FFC-BF52-661B338233B4@oracle.com>
References: <2081B825-B14B-4846-A1BA-294B4ECC1B5A@oracle.com> <26D1BC95-12A8-4FFC-BF52-661B338233B4@oracle.com>
Message-ID: <7e46d1d2-e058-cd70-2d2b-73437806b7c3@redhat.com>

Hi,

On 09/01/2017 01:54 PM, jesper.wilhelmsson at oracle.com wrote:
> Just a reminder that this is happening today at 2 pm PT. The repository will be made read only
> for approx two weeks.

Auxiliary question: does that mean jdk10/hs is "stable" now? I.e. no
pending integrations, integration blockers, etc? We are preparing the
derived shenandoah/jdk10 forest for consolidation too, and want to pull
latest stable jdk10/hs to shenandoah/jdk10 to test in the interim two
weeks of consolidation.

Thanks,
-Aleksey

From neugens at redhat.com Fri Sep 1 12:20:37 2017
From: neugens at redhat.com (Mario Torre)
Date: Fri, 1 Sep 2017 14:20:37 +0200
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <59A947BB.3040506@oracle.com>
References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <59A947BB.3040506@oracle.com>
Message-ID:

On Fri, Sep 1, 2017 at 1:42 PM, Erik Österlund wrote:
>> Isn't that a lot more slower than the current inline?
>
>
> What makes you think so? Everything is inlined all the way to the underlying
> platform layer. Achieving that is the very reason why CRTP is used instead
> of virtual calls.

I'm not familiar with the CRTP so that's probably what confuses me, but
I assume the templates are inlined, but the actual function calls
aren't, are they?
I understand that inline is just a suggestion and with more aggressive
optimisation the compiler will probably inline those anyway, but have
you done some measurement to see what's the cost of all those templates?

>> BTW, I think I see what those magic constants are (4, 8... rings a
>> bell ;), but I think a define here could make things more readable.
>
>
> Sorry, I am not sure I am following what you mean here.

I mean this:

+template
+struct Atomic::PlatformInc<4, Adjustment>: Atomic::IncUsingConstant<4, Adjustment> {

I need to look at atomic.hpp to find out that this 4 is a sizeof. I
would rather make that more explicit. Hard coding numbers is also error
prone. Since you are refactoring this code anyway, I think this is a
nice touch that makes things a bit easier, especially given that those
templates are quite cryptic to the untrained.

Cheers,
Mario

From coleen.phillimore at oracle.com Fri Sep 1 12:51:55 2017
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Fri, 1 Sep 2017 08:51:55 -0400
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <59A91CE6.2080206@oracle.com>
References: <59A804F8.9000501@oracle.com> <331bf921-243a-9c28-eb0f-c945bdc11384@oracle.com> <59A91CE6.2080206@oracle.com>
Message-ID: <516f938c-3ed5-2d95-2a3b-418ad2d2a149@oracle.com>

On 9/1/17 4:40 AM, Erik Österlund wrote:
> Hi Coleen,
>
> Thank you for taking your time to review this.
>
> On 2017-09-01 02:03, coleen.phillimore at oracle.com wrote:
>>
>> Hi, I'm trying to parse the templates to review this but maybe it's
>> convention but decoding these with parameters that are single capital
>> letters make reading the template very difficult. There are already
>> a lot of non-alphanumeric characters. When the letter is T, that is
>> expected by convention, but D or especially I makes it really hard.
>> Can these be normalized to all use T when there is only one template
>> parameter?
>> It'll be clear that T* is a pointer and T is an integer
>> without having it be P.
>
> I apologize the names of the template parameters are hard to
> understand. For what it's worth, I am only consistently applying Kim's
> conventions here. It seemed like a bad idea to violate conventions
> already set up - that would arguably be more confusing.
>
> The convention from earlier work by Kim is:
> D: Type of destination
> I: Operand type that has to be an integral type
> P: Operand type that is a pointer element type
> T: Generic operand type, may be integral or pointer type
>
> Personally, I do not mind this convention. It is more specific and
> annotates things we know about the type into the name of the type.
>
> Do you want me to:
>
> 1) Keep the convention, now that I have explained what the convention
> is and why it is your friend

It is not my friend. It's not helpful. I have to go through multiple
non-alphabetic characters looking for the letter I or the letter P to
mentally make the substitution of the template type.

> 2) Break the convention for this change only making the naming
> inconsistent

Break it for this changeset and we'll fix it later for the earlier work
from Kim. I don't remember P and I in Kim's changeset but realized while
looking at your changeset, this was one thing that makes these templates
slower and more difficult to read.

In the case of cmpxchg templates with a source, destination and original
values, it was necessary to have more than T be the template type,
although unsatisfying, because it turned out that the types couldn't be
the same.
> 3) Change the convention throughout consistently, including all
> earlier work from Kim
>
>>
>> +template
>> +struct Atomic::IncImpl
>> EnableIf::value>::type> VALUE_OBJ_CLASS_SPEC {
>> +  void operator()(I volatile* dest) const {
>> +    typedef IntegralConstant Adjustment;
>> +    typedef PlatformInc PlatformOp;
>> +    PlatformOp()(dest);
>> +  }
>> +};
>>
>> This one isn't as difficult, because it's short, but it would be
>> faster to understand with T.
>>
>> +template
>> +struct Atomic::IncImpl
>> EnableIf::value>::type> VALUE_OBJ_CLASS_SPEC {
>> +  void operator()(T volatile* dest) const {
>> +    typedef IntegralConstant Adjustment;
>> +    typedef PlatformInc PlatformOp;
>> +    PlatformOp()(dest);
>> +  }
>> +};
>>
>> +template<>
>> +struct Atomic::IncImpl VALUE_OBJ_CLASS_SPEC {
>> +  void operator()(jshort volatile* dest) const {
>> +    add(jshort(1), dest);
>> +  }
>> +};
>>
>>
>> Did I already ask if this could be changed to u2 rather than jshort?
>> Or is that the follow-on RFE?
>
> That is a follow-on RFE.

Good. I think that's the one that I assigned to myself.

>
>> +// Helper for platforms wanting a constant adjustment.
>> +template
>> +struct Atomic::IncUsingConstant VALUE_OBJ_CLASS_SPEC {
>> +  typedef PlatformInc Derived;
>>
>>
>> I can't find the caller of this. Is it really a lot faster than
>> having the platform independent add(1, T) / add(-1, T) to make all
>> this code worth having? How is this called? I couldn't parse the
>> trick. Atomic::inc() is always a "constant adjustment" so I'm
>> confused about what the comment means and what motivates all the asm
>> code. Do these platform implementations exist because they don't
>> have twos complement for integer representation? really?
>
> This is used by some x86, PPC and s390 platforms. Personally I
> question its usefulness for x86.
> I believe it might be one of those
> things were we ran some benchmarks a decade ago and concluded that it
> was slightly faster to have a slimmed path for Atomic::inc rather than
> reusing Atomic::add.

Yes, there are a lot of optimizations that we slog along in the code
base because they might have either theoretically or measurably made
some difference in something we don't have anymore.

>
> I did not initially want to bring this up as it seems like none of my
> business, but now that the question has been asked about differences,
> I could not help but notice the advertised "leading sync" convention
> of Atomic::inc on PPC is not respected. That is, there is no "sync"
> fence before the atomic increment, as required by the specified
> semantics. There is not even a leading "lwsync". The corresponding
> Atomic::add operation though, does have leading lwsync (unlike
> Atomic::inc). Now this should arguably be reinforced to sync rather
> than lwsync to respect the advertised semantics of both Atomic::add
> and Atomic::inc on PPC. Hopefully that statement will not turn into a
> long unrelated mailing thread...

Could you file a bug with this observation?

>
> Conclusively though, there is definitely a substantial difference in
> the fencing comparing the PPC implementation of Atomic::inc to
> Atomic::add. Whether either one of them conforms to intended semantics
> or not is a different matter - one that I was hoping not to have to
> deal with in this RFE as I am merely templateifying what was already
> there, without judging the existing specializations. And it is my
> observation that as the code looks now, we would incur a bunch of more
> fencing compared to what the code does today on PPC.
>

Completely understand. How are these called exactly though? I couldn't
figure it out.

>> Also, the function name This() is really disturbing and distracting.
>> Can it be called some verb() representing what it does?
>> cast_to_derived()?
>>
>> +  template
>> +  void operator()(I volatile* dest) const {
>> +    This()->template inc(dest);
>> +  }
>>
>
> Yes, I will change the name accordingly as you suggest.
>
>> I didn't know you could put "template" there.
>
> It is required to put the template keyword before the member function
> name when calling a template member function with explicit template
> parameters (as opposed to implicitly inferred template parameters) on
> a template type.

I thought you could just say inc() in the call, but my C++ template
vocabulary is minimal.

>
>> What does this call?
>
> This calls the platform-defined intrinsic that is defined in the
> platform files - the one that contains the inline assembly.

How? I don't see how... :(

>
>> Rather than I for integer case, and P for pointer case, can you add a
>> one line comment above this like:
>> // Helper for integer types
>> and
>> // Helper for pointer types
>
> Or perhaps we could do both? Nevertheless, I will add these comments.
> But as per the discussion above, I would be happy if we could keep the
> convention that Kim has already set up for the template type names.
>
>> Small local comments would be really helpful for many of these
>> functions. Just to get more english words in there... Since Kim's
>> on vacation can you help me understand this code and add comments so
>> I remember the reasons for some of this?
>
> Sure - I will decorate the code with some comments to help
> understanding. I will send an updated webrev when I get your reply
> regarding the typename naming convention verdict.

That's my opinion anyway. David might have the opposite opinion.

Thanks,
Coleen

From erik.osterlund at oracle.com Fri Sep 1 13:31:24 2017
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Fri, 1 Sep 2017 15:31:24 +0200
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <516f938c-3ed5-2d95-2a3b-418ad2d2a149@oracle.com>
References: <59A804F8.9000501@oracle.com> <331bf921-243a-9c28-eb0f-c945bdc11384@oracle.com> <59A91CE6.2080206@oracle.com> <516f938c-3ed5-2d95-2a3b-418ad2d2a149@oracle.com>
Message-ID: <59A9612C.40900@oracle.com>

Hi Coleen,

On 2017-09-01 14:51, coleen.phillimore at oracle.com wrote:
>
>
> On 9/1/17 4:40 AM, Erik Österlund wrote:
>> Hi Coleen,
>>
>> Thank you for taking your time to review this.
>>
>> On 2017-09-01 02:03, coleen.phillimore at oracle.com wrote:
>>>
>>> Hi, I'm trying to parse the templates to review this but maybe it's
>>> convention but decoding these with parameters that are single
>>> capital letters make reading the template very difficult. There are
>>> already a lot of non-alphanumeric characters.
>>> When the letter is
>>> T, that is expected by convention, but D or especially I makes it
>>> really hard. Can these be normalized to all use T when there is
>>> only one template parameter? It'll be clear that T* is a pointer
>>> and T is an integer without having it be P.
>>
>> I apologize the names of the template parameters are hard to
>> understand. For what it's worth, I am only consistently applying
>> Kim's conventions here. It seemed like a bad idea to violate
>> conventions already set up - that would arguably be more confusing.
>>
>> The convention from earlier work by Kim is:
>> D: Type of destination
>> I: Operand type that has to be an integral type
>> P: Operand type that is a pointer element type
>> T: Generic operand type, may be integral or pointer type
>>
>> Personally, I do not mind this convention. It is more specific and
>> annotates things we know about the type into the name of the type.
>>
>> Do you want me to:
>>
>> 1) Keep the convention, now that I have explained what the convention
>> is and why it is your friend
>
> It is not my friend. It's not helpful. I have to go through
> multiple non-alphabetic characters looking for the letter I or the
> letter P to mentally make the substitution of the template type.

Okay. I understand now that the pre-existing naming convention of types
named I and P differentiating integral types from pointer types is not
helpful to you. And if I understand you correctly, you would like to
introduce a new naming convention that you find more helpful, one that
uses the more general type name T regardless of whether it refers to an
integral type or a pointer type. That leaves the exercise of figuring
out whether T is intentionally constrained to be a pointer type or an
integral type to the reader, who would go to the declaration and read
some kind of comment describing such properties in text instead?

Do we have a consensus that this new convention is indeed more
desirable?
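
For illustration only, here is a minimal sketch of the two styles under
discussion. It is not the code from the webrev: IncImplSketch and
inc_sketch are hypothetical names, standard <type_traits> stands in for
HotSpot's EnableIf/IsIntegral, and plain volatile reads and writes stand
in for the real atomic operations. The point is only that the constraint
lives in the declaration either way; the letters I and P merely repeat
it in the parameter name.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical primary template; T is the unconstrained, generic name.
template<typename T, typename Enable = void>
struct IncImplSketch;

// "I" convention: the operand must be an integral type. The enable_if
// in the declaration enforces that; the name only hints at it.
template<typename I>
struct IncImplSketch<I, typename std::enable_if<std::is_integral<I>::value>::type> {
  void operator()(I volatile* dest) const {
    *dest = *dest + I(1);  // NOT atomic; placeholder for the real operation
  }
};

// "P" convention: P is the pointee type, the destination holds a P*.
// Stepping by one element moves the address by sizeof(P) bytes.
template<typename P>
struct IncImplSketch<P*, void> {
  void operator()(P* volatile* dest) const {
    *dest = *dest + 1;  // NOT atomic; placeholder for the real operation
  }
};

// Entry point: picks the integral or pointer specialization from T.
template<typename T>
inline void inc_sketch(T volatile* dest) {
  IncImplSketch<T>()(dest);
}
```

Renaming I and P to T in the two specializations would compile to
exactly the same thing; only the hint to the reader changes.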
>
>> 2) Break the convention for this change only making the naming
>> inconsistent
>
> Break it for this changeset and we'll fix it later for the earlier
> work from Kim. I don't remember P and I in Kim's changeset but
> realized while looking at your changeset, this was one thing that
> makes these templates slower and more difficult to read.

Okay.

> In the case of cmpxchg templates with a source, destination and
> original values, it was necessary to have more than T be the template
> type, although unsatisfying, because it turned out that the types
> couldn't be the same.

Okay.

>
>> 3) Change the convention throughout consistently, including all
>> earlier work from Kim
>>
>>>
>>> +template
>>> +struct Atomic::IncImpl
>>> EnableIf::value>::type> VALUE_OBJ_CLASS_SPEC {
>>> +  void operator()(I volatile* dest) const {
>>> +    typedef IntegralConstant Adjustment;
>>> +    typedef PlatformInc PlatformOp;
>>> +    PlatformOp()(dest);
>>> +  }
>>> +};
>>>
>>> This one isn't as difficult, because it's short, but it would be
>>> faster to understand with T.
>>>
>>> +template
>>> +struct Atomic::IncImpl
>>> EnableIf::value>::type> VALUE_OBJ_CLASS_SPEC {
>>> +  void operator()(T volatile* dest) const {
>>> +    typedef IntegralConstant Adjustment;
>>> +    typedef PlatformInc PlatformOp;
>>> +    PlatformOp()(dest);
>>> +  }
>>> +};
>>>
>>> +template<>
>>> +struct Atomic::IncImpl VALUE_OBJ_CLASS_SPEC {
>>> +  void operator()(jshort volatile* dest) const {
>>> +    add(jshort(1), dest);
>>> +  }
>>> +};
>>>
>>>
>>> Did I already ask if this could be changed to u2 rather than
>>> jshort? Or is that the follow-on RFE?
>>
>> That is a follow-on RFE.
>
> Good. I think that's the one that I assigned to myself.

Yes, you are right.

>>
>>> +// Helper for platforms wanting a constant adjustment.
>>> +template
>>> +struct Atomic::IncUsingConstant VALUE_OBJ_CLASS_SPEC {
>>> +  typedef PlatformInc Derived;
>>>
>>>
>>> I can't find the caller of this.
>>> Is it really a lot faster than
>>> having the platform independent add(1, T) / add(-1, T) to make all
>>> this code worth having? How is this called? I couldn't parse the
>>> trick. Atomic::inc() is always a "constant adjustment" so I'm
>>> confused about what the comment means and what motivates all the asm
>>> code. Do these platform implementations exist because they don't
>>> have twos complement for integer representation? really?
>>
>> This is used by some x86, PPC and s390 platforms. Personally I
>> question its usefulness for x86. I believe it might be one of those
>> things were we ran some benchmarks a decade ago and concluded that it
>> was slightly faster to have a slimmed path for Atomic::inc rather
>> than reusing Atomic::add.
>
> Yes, there are a lot of optimizations that we slog along in the code
> base because they might have either theoretically or measurably made
> some difference in something we don't have anymore.

I noticed. :)

>
>>
>> I did not initially want to bring this up as it seems like none of my
>> business, but now that the question has been asked about differences,
>> I could not help but notice the advertised "leading sync" convention
>> of Atomic::inc on PPC is not respected. That is, there is no "sync"
>> fence before the atomic increment, as required by the specified
>> semantics. There is not even a leading "lwsync". The corresponding
>> Atomic::add operation though, does have leading lwsync (unlike
>> Atomic::inc). Now this should arguably be reinforced to sync rather
>> than lwsync to respect the advertised semantics of both Atomic::add
>> and Atomic::inc on PPC. Hopefully that statement will not turn into a
>> long unrelated mailing thread...
>
> Could you file a bug with this observation?

Sure.

>>
>> Conclusively though, there is definitely a substantial difference in
>> the fencing comparing the PPC implementation of Atomic::inc to
>> Atomic::add.
>> Whether either one of them conforms to intended
>> semantics or not is a different matter - one that I was hoping not to
>> have to deal with in this RFE as I am merely templateifying what was
>> already there, without judging the existing specializations. And it
>> is my observation that as the code looks now, we would incur a bunch
>> of more fencing compared to what the code does today on PPC.
>>
>
> Completely understand. How are these called exactly though? I
> couldn't figure it out.

They are called like this: IncImpl::operator() calls
PlatformInc::operator(), which has its class partially specialized by
the platform (e.g. atomic_linux_ppc.hpp). Its operator() is defined by
the super class helper, IncUsingConstant::operator(), that scales the
addend accordingly and subsequently calls the PlatformInc::inc function
that is defined in the PPC-specific atomic header and performs some
suitable inline assembly for the operation.

>
>>> Also, the function name This() is really disturbing and
>>> distracting. Can it be called some verb() representing what it
>>> does? cast_to_derived()?
>>>
>>> +  template
>>> +  void operator()(I volatile* dest) const {
>>> +    This()->template inc(dest);
>>> +  }
>>>
>>
>> Yes, I will change the name accordingly as you suggest.
>>
>>> I didn't know you could put "template" there.
>>
>> It is required to put the template keyword before the member function
>> name when calling a template member function with explicit template
>> parameters (as opposed to implicitly inferred template parameters) on
>> a template type.
>
> I thought you could just say inc() in the call, but my C++
> template vocabulary is minimal.
>>
>>> What does this call?
>>
>> This calls the platform-defined intrinsic that is defined in the
>> platform files - the one that contains the inline assembly.
>
> How? I don't see how... :(

Hopefully I already explained this above.
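
The call chain described above can be sketched in a few lines. This is a
minimal illustration under stated assumptions, not the code from the
webrev: IncUsingConstantSketch, PlatformIncSketch and increment_once are
hypothetical names, std::atomic stands in for the platform's inline
assembly, std::integral_constant stands in for IntegralConstant, and
pointer scaling is left out. It shows the base class static_cast-ing to
the derived type and why the call needs the "template" keyword.

```cpp
#include <atomic>
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the shared helper. Derived is the platform
// class; the base reaches it through a static_cast (CRTP), so the call
// is resolved at compile time and no virtual dispatch is involved.
template<typename Derived>
struct IncUsingConstantSketch {
  const Derived* cast_to_derived() const {
    return static_cast<const Derived*>(this);
  }

  template<typename I>
  void operator()(std::atomic<I>* dest) const {
    // The constant adjustment is carried as a type, value included.
    typedef std::integral_constant<I, I(1)> Adjustment;
    // "template" is required here: inc is a member template of a
    // dependent type, called with explicit template arguments.
    cast_to_derived()->template inc<I, Adjustment>(dest);
  }
};

// Hypothetical platform specialization: it only has to define inc.
struct PlatformIncSketch : IncUsingConstantSketch<PlatformIncSketch> {
  template<typename I, typename Adjustment>
  void inc(std::atomic<I>* dest) const {
    // Placeholder for inline assembly with an immediate operand.
    dest->fetch_add(Adjustment::value);
  }
};

// Small driver: increments a fresh atomic once and returns the result.
inline int increment_once(int start) {
  std::atomic<int> value(start);
  PlatformIncSketch()(&value);
  return value.load();
}
```

Because Adjustment cannot be deduced from the function arguments, it has
to be passed explicitly, and that is the situation in which the
"template" disambiguator becomes mandatory on a dependent type.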
>>
>>> Rather than I for integer case, and P for pointer case, can you add
>>> a one line comment above this like:
>>> // Helper for integer types
>>> and
>>> // Helper for pointer types
>>
>> Or perhaps we could do both? Nevertheless, I will add these comments.
>> But as per the discussion above, I would be happy if we could keep
>> the convention that Kim has already set up for the template type names.
>>
>>> Small local comments would be really helpful for many of these
>>> functions. Just to get more english words in there... Since Kim's
>>> on vacation can you help me understand this code and add comments so
>>> I remember the reasons for some of this?
>>
>> Sure - I will decorate the code with some comments to help
>> understanding. I will send an updated webrev when I get your reply
>> regarding the typename naming convention verdict.
>
> That's my opinion anyway. David might have the opposite opinion.

David? I am curious if you have the same opinion. If you both want to
replace the template names I and P with T, then I am happy to do that.

Thanks for the review.

/Erik

From erik.osterlund at oracle.com Fri Sep 1 13:31:43 2017
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Fri, 1 Sep 2017 15:31:43 +0200
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <602b39a1-85e3-0e34-ff0c-c9076885c206@physik.fu-berlin.de>
References: <59A804F8.9000501@oracle.com> <602b39a1-85e3-0e34-ff0c-c9076885c206@physik.fu-berlin.de>
Message-ID: <59A9613F.3080902@oracle.com>

Hi Adrian,

Thank you for trying this for me.

/Erik

On 2017-09-01 11:35, John Paul Adrian Glaubitz wrote:
> On 08/31/2017 02:45 PM, Erik Österlund wrote:
>> It would also be nice if somebody with access to PPC and s390 machines
>> could try out the relevant changes there so I do not accidentally break
>> those platforms.
>
> And linux-zero and linux-sparc, of course :). I will test that.
>
> Adrian
>

From aph at redhat.com Fri Sep 1 13:41:01 2017
From: aph at redhat.com (Andrew Haley)
Date: Fri, 1 Sep 2017 14:41:01 +0100
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <59A804F8.9000501@oracle.com>
References: <59A804F8.9000501@oracle.com>
Message-ID:

On 31/08/17 13:45, Erik Österlund wrote:
> Hi everyone,
>
> Bug ID:
> https://bugs.openjdk.java.net/browse/JDK-8186838
>
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>
> The time has come for the next step in generalizing Atomic with
> templates. Today I will focus on Atomic::inc/dec.
>
> I have tried to mimic the new Kim style that seems to have been
> universally accepted. Like Atomic::add and Atomic::cmpxchg, the
> structure looks like this:
>
> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object
> that performs some basic type checks.
> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can define > the operation arbitrarily for a given platform. The default > implementation if not specialized for a platform is to call Atomic::add. > So only platforms that want to do something different than that as an > optimization have to provide a specialization. > Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec to > be more optimized may inherit from a helper class > IncUsingConstant/DecUsingConstant. This helper helps performing the > necessary computation what the increment/decrement should be after > pointer scaling using CRTP. The PlatformInc/PlatformDec operation then > only needs to define an inc/dec member function, and will then get all > the context information necessary to generate a more optimized > implementation. Easy peasy. I wanted to say something nice, but I honestly can't. I am dismayed. I hoped that inc/dec would turn out to be much simpler than the cmpxchg functions: I think they should, because they don't have to deal with the complexity of potentially three different types. Instead we have, again, a large and complex patch. Even on AArch64, which should be the simplest case because Atomic::inc can be defined as template inc(T1 *dest) { return __sync_add_and_fetch(dest, 1); } or something similar, we have Atomic::inc Atomic::IncImpl::operator() Atomic::PlatformInc<4ul, IntegralConstant >::operator() Atomic::add Atomic::AddImpl::operator() Atomic::AddAndFetch >::operator() Atomic::PlatformAdd<4ul>::add_and_fetch __sync_add_and_fetch I quite understand that it isn't so easy on some systems, and they need a generic form that explodes into four different calls, one for each size of integer. I completely accept that it will be more complex for everything else. But is it necessary to have so much code for something so simple? This is a 1400 line patch. 
Granted, much of it is simply moving stuff around, but despite the potential of template code to simplify the implementation we have a more complex solution than we had before. I ask you, is this the simplest solution that you believe is possible? -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From erik.osterlund at oracle.com Fri Sep 1 14:15:57 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Fri, 1 Sep 2017 16:15:57 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: References: <59A804F8.9000501@oracle.com> Message-ID: <59A96B9D.6070002@oracle.com> Hi Andrew, On 2017-09-01 15:41, Andrew Haley wrote: > On 31/08/17 13:45, Erik ?sterlund wrote: >> Hi everyone, >> >> Bug ID: >> https://bugs.openjdk.java.net/browse/JDK-8186838 >> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >> >> The time has come for the next step in generalizing Atomic with >> templates. Today I will focus on Atomic::inc/dec. >> >> I have tried to mimic the new Kim style that seems to have been >> universally accepted. Like Atomic::add and Atomic::cmpxchg, the >> structure looks like this: >> >> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object >> that performs some basic type checks. >> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can define >> the operation arbitrarily for a given platform. The default >> implementation if not specialized for a platform is to call Atomic::add. >> So only platforms that want to do something different than that as an >> optimization have to provide a specialization. >> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec to >> be more optimized may inherit from a helper class >> IncUsingConstant/DecUsingConstant. This helper helps performing the >> necessary computation what the increment/decrement should be after >> pointer scaling using CRTP. 
The PlatformInc/PlatformDec operation then >> only needs to define an inc/dec member function, and will then get all >> the context information necessary to generate a more optimized >> implementation. Easy peasy. > I wanted to say something nice, but I honestly can't. I am dismayed. Okay. > I hoped that inc/dec would turn out to be much simpler than the > cmpxchg functions: I think they should, because they don't have to > deal with the complexity of potentially three different types. > Instead we have, again, a large and complex patch. > > Even on AArch64, which should be the simplest case because Atomic::inc > can be defined as > > template > inc(T1 *dest) { > return __sync_add_and_fetch(dest, 1); > } AArch64 is indeed the simplest case. It does not have a specialization in my patch. It simply expresses Atomic::inc in terms of Atomic::add. > or something similar, we have > > Atomic::inc > Atomic::IncImpl::operator() > Atomic::PlatformInc<4ul, IntegralConstant >::operator() > Atomic::add > Atomic::AddImpl::operator() > Atomic::AddAndFetch >::operator() > Atomic::PlatformAdd<4ul>::add_and_fetch > __sync_add_and_fetch > > I quite understand that it isn't so easy on some systems, and they > need a generic form that explodes into four different calls, one for > each size of integer. I completely accept that it will be more > complex for everything else. But is it necessary to have so much code > for something so simple? This is a 1400 line patch. Granted, much of > it is simply moving stuff around, but despite the potential of > template code to simplify the implementation we have a more complex > solution than we had before. > > I ask you, is this the simplest solution that you believe is possible? It is not the simplest solution I can think of. The simplest solution I can think of is to remove all specialized versions of Atomic::inc/dec and just have it call Atomic::add directly. That would remove the optimizations we have today, for whatever reason we have them. 
It would lead to slightly more conservative fencing on PPC/S390, and would lead to slightly less optimal machine encoding on x86 (without immediate values in the instructions). But it would be simpler for sure. I did not put any judgement into whether our existing optimizations are worthwhile or not. But if you want to prioritize simplicity, removing those optimizations is one possible solution. Would you prefer that? Thanks, /Erik From coleen.phillimore at oracle.com Fri Sep 1 14:42:48 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 1 Sep 2017 10:42:48 -0400 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59A96B9D.6070002@oracle.com> References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> Message-ID: <87f3f12e-26d8-9806-6821-3fb5783bf832@oracle.com> On 9/1/17 10:15 AM, Erik ?sterlund wrote: > Hi Andrew, > > On 2017-09-01 15:41, Andrew Haley wrote: >> On 31/08/17 13:45, Erik ?sterlund wrote: >>> Hi everyone, >>> >>> Bug ID: >>> https://bugs.openjdk.java.net/browse/JDK-8186838 >>> >>> Webrev: >>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >>> >>> The time has come for the next step in generalizing Atomic with >>> templates. Today I will focus on Atomic::inc/dec. >>> >>> I have tried to mimic the new Kim style that seems to have been >>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the >>> structure looks like this: >>> >>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object >>> that performs some basic type checks. >>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can define >>> the operation arbitrarily for a given platform. The default >>> implementation if not specialized for a platform is to call >>> Atomic::add. >>> So only platforms that want to do something different than that as an >>> optimization have to provide a specialization. 
>>> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec to >>> be more optimized may inherit from a helper class >>> IncUsingConstant/DecUsingConstant. This helper helps performing the >>> necessary computation what the increment/decrement should be after >>> pointer scaling using CRTP. The PlatformInc/PlatformDec operation then >>> only needs to define an inc/dec member function, and will then get all >>> the context information necessary to generate a more optimized >>> implementation. Easy peasy. >> I wanted to say something nice, but I honestly can't. I am dismayed. > > Okay. > >> I hoped that inc/dec would turn out to be much simpler than the >> cmpxchg functions: I think they should, because they don't have to >> deal with the complexity of potentially three different types. >> Instead we have, again, a large and complex patch. >> >> Even on AArch64, which should be the simplest case because Atomic::inc >> can be defined as >> >> template >> inc(T1 *dest) { >> return __sync_add_and_fetch(dest, 1); >> } > > AArch64 is indeed the simplest case. It does not have a specialization > in my patch. It simply expresses Atomic::inc in terms of Atomic::add. > >> or something similar, we have >> >> Atomic::inc >> Atomic::IncImpl::operator() >> Atomic::PlatformInc<4ul, IntegralConstant >::operator() >> Atomic::add >> Atomic::AddImpl::operator() >> Atomic::AddAndFetch >::operator() >> Atomic::PlatformAdd<4ul>::add_and_fetch >> __sync_add_and_fetch >> >> I quite understand that it isn't so easy on some systems, and they >> need a generic form that explodes into four different calls, one for >> each size of integer. I completely accept that it will be more >> complex for everything else. But is it necessary to have so much code >> for something so simple? This is a 1400 line patch.
Granted, much of >> it is simply moving stuff around, but despite the potential of >> template code to simplify the implementation we have a more complex >> solution than we had before. >> >> I ask you, is this the simplest solution that you believe is possible? > > It is not the simplest solution I can think of. The simplest solution > I can think of is to remove all specialized versions of > Atomic::inc/dec and just have it call Atomic::add directly. That would > remove the optimizations we have today, for whatever reason we have > them. It would lead to slightly more conservative fencing on PPC/S390, > and would lead to slightly less optimal machine encoding on x86 > (without immediate values in the instructions). But it would be > simpler for sure. I did not put any judgement into whether our > existing optimizations are worthwhile or not. But if you want to > prioritize simplicity, removing those optimizations is one possible > solution. Would you prefer that? I wonder if you could remove the linux x86 asm code for inc/dec, recode it to use add, and do a dev submit run against your patch? While we're discussing this. thanks, Coleen > > Thanks, > /Erik From rohitarulraj at gmail.com Fri Sep 1 15:04:04 2017 From: rohitarulraj at gmail.com (Rohit Arul Raj) Date: Fri, 1 Sep 2017 20:34:04 +0530 Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults In-Reply-To: References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com> Message-ID: On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj wrote: > On Fri, Sep 1, 2017 at 3:01 AM, David Holmes wrote: >> Hi Rohit, >> >> I think the patch needs updating for jdk10 as I already see a lot of logic >> around UseSHA in vm_version_x86.cpp. >> >> Thanks, >> David >> > > Thanks David, I will update the patch wrt JDK10 source base, test and > resubmit for review. 
> > Regards, > Rohit > Hi All, I have updated the patch wrt openjdk10/hotspot (parent: 13519:71337910df60), did regression testing using jtreg ($make default) and didnt find any regressions. Can anyone please volunteer to review this patch which sets flag/ISA defaults for newer AMD 17h (EPYC) processor? ************************* Patch **************************** diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp --- a/src/cpu/x86/vm/vm_version_x86.cpp +++ b/src/cpu/x86/vm/vm_version_x86.cpp @@ -1088,6 +1088,22 @@ } FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); } + if (supports_sha()) { + if (FLAG_IS_DEFAULT(UseSHA)) { + FLAG_SET_DEFAULT(UseSHA, true); + } + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) { + if (!FLAG_IS_DEFAULT(UseSHA) || + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { + warning("SHA instructions are not available on this CPU"); + } + FLAG_SET_DEFAULT(UseSHA, false); + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); + } // some defaults for AMD family 15h if ( cpu_family() == 0x15 ) { @@ -1109,11 +1125,43 @@ } #ifdef COMPILER2 - if (MaxVectorSize > 16) { - // Limit vectors size to 16 bytes on current AMD cpus. + if (cpu_family() < 0x17 && MaxVectorSize > 16) { + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
FLAG_SET_DEFAULT(MaxVectorSize, 16); } #endif // COMPILER2 + + // Some defaults for AMD family 17h + if ( cpu_family() == 0x17 ) { + // On family 17h processors use XMM and UnalignedLoadStores for Array Copy + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { + UseXMMForArrayCopy = true; + } + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { + UseUnalignedLoadStores = true; + } + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { + UseBMI2Instructions = true; + } + if (MaxVectorSize > 32) { + FLAG_SET_DEFAULT(MaxVectorSize, 32); + } + if (UseSHA) { + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); + } else if (UseSHA512Intrinsics) { + warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU."); + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); + } + } +#ifdef COMPILER2 + if (supports_sse4_2()) { + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { + FLAG_SET_DEFAULT(UseFPUForSpilling, true); + } + } +#endif + } } if( is_intel() ) { // Intel cpus specific settings diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp --- a/src/cpu/x86/vm/vm_version_x86.hpp +++ b/src/cpu/x86/vm/vm_version_x86.hpp @@ -505,6 +505,14 @@ result |= CPU_CLMUL; if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) result |= CPU_RTM; + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) + result |= CPU_ADX; + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) + result |= CPU_BMI2; + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) + result |= CPU_SHA; + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) + result |= CPU_FMA; // AMD features. if (is_amd()) { @@ -515,19 +523,13 @@ result |= CPU_LZCNT; if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) result |= CPU_SSE4A; + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) + result |= CPU_HT; } // Intel features. 
if(is_intel()) { - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) - result |= CPU_ADX; - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) - result |= CPU_BMI2; - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) - result |= CPU_SHA; if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) result |= CPU_LZCNT; - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) - result |= CPU_FMA; // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { result |= CPU_3DNOW_PREFETCH; ************************************************************** Thanks, Rohit >> >> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>> >>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>> wrote: >>>> >>>> Hi Rohit, >>>> >>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>> >>>>> >>>>> I would like an volunteer to review this patch (openJDK9) which sets >>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with >>>>> the commit process. >>>>> >>>>> Webrev: >>>>> >>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>> >>>> >>>> >>>> Unfortunately patches can not be accepted from systems outside the >>>> OpenJDK >>>> infrastructure and ... >>>> >>>>> I have also attached the patch (hg diff -g) for reference. >>>> >>>> >>>> >>>> ... unfortunately patches tend to get stripped by the mail servers. If >>>> the >>>> patch is small please include it inline. Otherwise you will need to find >>>> an >>>> OpenJDK Author who can host it for you on cr.openjdk.java.net. >>>> >>> >>>>> 3) I have done regression testing using jtreg ($make default) and >>>>> didnt find any regressions. >>>> >>>> >>>> >>>> Sounds good, but until I see the patch it is hard to comment on testing >>>> requirements. >>>> >>>> Thanks, >>>> David >>> >>> >>> Thanks David, >>> Yes, it's a small patch. 
>>> >>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>> b/src/cpu/x86/vm/vm_version_x86.cpp >>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>> @@ -1051,6 +1051,22 @@ >>> } >>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>> } >>> + if (supports_sha()) { >>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>> + FLAG_SET_DEFAULT(UseSHA, true); >>> + } >>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>> UseSHA512Intrinsics) { >>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>> + warning("SHA instructions are not available on this CPU"); >>> + } >>> + FLAG_SET_DEFAULT(UseSHA, false); >>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>> + } >>> >>> // some defaults for AMD family 15h >>> if ( cpu_family() == 0x15 ) { >>> @@ -1072,11 +1088,43 @@ >>> } >>> >>> #ifdef COMPILER2 >>> - if (MaxVectorSize > 16) { >>> - // Limit vectors size to 16 bytes on current AMD cpus. >>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>> } >>> #endif // COMPILER2 >>> + >>> + // Some defaults for AMD family 17h >>> + if ( cpu_family() == 0x17 ) { >>> + // On family 17h processors use XMM and UnalignedLoadStores for >>> Array Copy >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>> + UseXMMForArrayCopy = true; >>> + } >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>> + UseUnalignedLoadStores = true; >>> + } >>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>> + UseBMI2Instructions = true; >>> + } >>> + if (MaxVectorSize > 32) { >>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>> + } >>> + if (UseSHA) { >>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>> + } else if (UseSHA512Intrinsics) { >>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>> functions not available on this CPU."); >>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>> + } >>> + } >>> +#ifdef COMPILER2 >>> + if (supports_sse4_2()) { >>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>> + } >>> + } >>> +#endif >>> + } >>> } >>> >>> if( is_intel() ) { // Intel cpus specific settings >>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>> b/src/cpu/x86/vm/vm_version_x86.hpp >>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>> @@ -513,6 +513,16 @@ >>> result |= CPU_LZCNT; >>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>> result |= CPU_SSE4A; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>> + result |= CPU_BMI2; >>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>> + result |= CPU_HT; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>> + result |= CPU_ADX; >>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>> + result |= CPU_SHA; >>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>> + result |= CPU_FMA; >>> } >>> // Intel features. 
>>> if(is_intel()) { >>> >>> Regards, >>> Rohit >>> >> From erik.osterlund at oracle.com Fri Sep 1 15:23:49 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Fri, 1 Sep 2017 17:23:49 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <87f3f12e-26d8-9806-6821-3fb5783bf832@oracle.com> References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <87f3f12e-26d8-9806-6821-3fb5783bf832@oracle.com> Message-ID: <59A97B85.8@oracle.com> Hi Coleen, On 2017-09-01 16:42, coleen.phillimore at oracle.com wrote: > > > On 9/1/17 10:15 AM, Erik ?sterlund wrote: >> Hi Andrew, >> >> On 2017-09-01 15:41, Andrew Haley wrote: >>> On 31/08/17 13:45, Erik ?sterlund wrote: >>>> Hi everyone, >>>> >>>> Bug ID: >>>> https://bugs.openjdk.java.net/browse/JDK-8186838 >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >>>> >>>> The time has come for the next step in generalizing Atomic with >>>> templates. Today I will focus on Atomic::inc/dec. >>>> >>>> I have tried to mimic the new Kim style that seems to have been >>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the >>>> structure looks like this: >>>> >>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object >>>> that performs some basic type checks. >>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can define >>>> the operation arbitrarily for a given platform. The default >>>> implementation if not specialized for a platform is to call >>>> Atomic::add. >>>> So only platforms that want to do something different than that as an >>>> optimization have to provide a specialization. >>>> Layer 3) Platforms that decide to specialize >>>> PlatformInc/PlatformDec to >>>> be more optimized may inherit from a helper class >>>> IncUsingConstant/DecUsingConstant. This helper helps performing the >>>> necessary computation what the increment/decrement should be after >>>> pointer scaling using CRTP. 
The PlatformInc/PlatformDec operation then >>>> only needs to define an inc/dec member function, and will then get all >>>> the context information necessary to generate a more optimized >>>> implementation. Easy peasy. >>> I wanted to say something nice, but I honestly can't. I am dismayed. >> >> Okay. >> >>> I hoped that inc/dec would turn out to be much simpler than the >>> cmpxchg functions: I think they should, because they don't have to >>> deal with the complexity of potentially three different types. >>> Instead we have, again, a large and complex patch. >>> >>> Even on AArch64, which should be the simplest case because Atomic::inc >>> can be defined as >>> >>> template >>> inc(T1 *dest) { >>> return __sync_add_and_fetch(dest, 1); >>> } >> >> AArch64 is indeed the simplest case. It does not have a >> specialization in my patch. It simply expresses Atomic::inc in terms >> of Atomic::add. >> >>> or something similar, we have >>> >>> Atomic::inc >>> Atomic::IncImpl::operator() >>> Atomic::PlatformInc<4ul, IntegralConstant >::operator() >>> Atomic::add >>> Atomic::AddImpl::operator() >>> Atomic::AddAndFetch >::operator() >>> Atomic::PlatformAdd<4ul>::add_and_fetch >>> __sync_add_and_fetch >>> >>> I quite understand that it isn't so easy on some systems, and they >>> need a generic form that explodes into four different calls, one for >>> each size of integer. I completely accept that it will be more >>> complex for everything else. But is it necessary to have so much code >>> for something so simple? This is a 1400 line patch. Granted, much of >>> it is simply moving stuff around, but despite the potential of >>> template code to simplify the implementation we have a more complex >>> solution than we had before. >>> >>> I ask you, is this the simplest solution that you believe is possible? >> >> It is not the simplest solution I can think of. 
The simplest solution >> I can think of is to remove all specialized versions of >> Atomic::inc/dec and just have it call Atomic::add directly. That >> would remove the optimizations we have today, for whatever reason we >> have them. It would lead to slightly more conservative fencing on >> PPC/S390, and would lead to slightly less optimal machine encoding on >> x86 (without immediate values in the instructions). But it would be >> simpler for sure. I did not put any judgement into whether our >> existing optimizations are worthwhile or not. But if you want to >> prioritize simplicity, removing those optimizations is one possible >> solution. Would you prefer that? > > I wonder if you could remove the linux x86 asm code for inc/dec, > recode it to use add, and do a dev submit run against your patch? > While we're discussing this. Okay, I will try that. /Erik > thanks, > Coleen > >> >> Thanks, >> /Erik > From jesper.wilhelmsson at oracle.com Fri Sep 1 15:42:03 2017 From: jesper.wilhelmsson at oracle.com (Jesper Wilhelmsson) Date: Fri, 1 Sep 2017 17:42:03 +0200 Subject: Hotspot repository jdk10/hs closes today In-Reply-To: <7e46d1d2-e058-cd70-2d2b-73437806b7c3@redhat.com> References: <2081B825-B14B-4846-A1BA-294B4ECC1B5A@oracle.com> <26D1BC95-12A8-4FFC-BF52-661B338233B4@oracle.com> <7e46d1d2-e058-cd70-2d2b-73437806b7c3@redhat.com> Message-ID: <3DE5D3A5-0E5E-413D-9191-9C6B7132D22F@oracle.com> Jdk10/hs has been fairly stable lately. There are no open integration blockers right now. I'll send out a status update tomorrow when I have looked at the results of the Friday nightly. /Jesper > 1 sep. 2017 kl. 14:00 skrev Aleksey Shipilev : > > Hi, > >> On 09/01/2017 01:54 PM, jesper.wilhelmsson at oracle.com wrote: >> Just a reminder that this is happening today at 2 pm PT. The repository will be made read only >> for approx two weeks. > Auxiliary question: does that mean jdk10/hs is "stable" now? I.e. no pending integrations, > integration blockers, etc? 
We are preparing the derived shenandoah/jdk10 forest for consolidation > too, and want to pull latest stable jdk10/hs to shenandoah/jdk10 to test in the interim two weeks of > consolidation. > > Thanks, > -Aleksey > From volker.simonis at gmail.com Fri Sep 1 15:42:53 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Fri, 1 Sep 2017 17:42:53 +0200 Subject: RFR(S): 8187091: ReturnBlobToWrongHeapTest fails because of problems in CodeHeap::contains_blob() Message-ID: Hi, can I please have a review and sponsor for the following small fix: http://cr.openjdk.java.net/~simonis/webrevs/2017/8187091/ https://bugs.openjdk.java.net/browse/JDK-8187091 We see failures in test/compiler/codecache/stress/ReturnBlobToWrongHeapTest.java which are caused by problems in CodeHeap::contains_blob() for corner cases with CodeBlobs of zero size: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (heap.cpp:248), pid=27586, tid=27587 # guarantee((char*) b >= _memory.low_boundary() && (char*) b < _memory.high()) failed: The block to be deallocated 0x00007fffe6666f80 is not within the heap starting with 0x00007fffe6667000 and ending with 0x00007fffe6ba000 The problem is that JDK-8183573 replaced virtual bool contains_blob(const CodeBlob* blob) const { return low_boundary() <= (char*) blob && (char*) blob < high(); } by: bool contains_blob(const CodeBlob* blob) const { return contains(blob->code_begin()); } But that may be wrong in the corner case where the size of the CodeBlob's payload is zero (i.e. the CodeBlob consists only of the 'header' - i.e. the C++ object itself) because in that case CodeBlob::code_begin() points right behind the CodeBlob's header which is a memory location which doesn't belong to the CodeBlob anymore. This exact corner case is exercised by ReturnBlobToWrongHeapTest which allocates CodeBlobs of size zero (i.e. zero 'payload') with the help of sun.hotspot.WhiteBox.allocateCodeBlob() until the CodeCache fills up.
The test first fills the 'non-profiled nmethods' CodeHeap. If the 'non-profiled nmethods' CodeHeap is full, the VM automatically tries to allocate from the 'profiled nmethods' CodeHeap until that fills up as well. But in the CodeCache the 'profiled nmethods' CodeHeap is located right before the 'non-profiled nmethods' CodeHeap. So if the last CodeBlob allocated from the 'profiled nmethods' CodeHeap has a payload size of zero and uses all of the CodeHeap's remaining size, we will end up with a CodeBlob whose code_begin() address will point right behind the actual CodeHeap (i.e. it will point right at the beginning of the adjacent, 'non-profiled nmethods' CodeHeap). This causes the above guarantee to fire when we try to free the last allocated CodeBlob (with sun.hotspot.WhiteBox.freeCodeBlob()). In a previous mail thread (http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-August/028175.html) Vladimir explained why JDK-8183573 was done: > About contains_blob(). The problem is that AOTCompiledMethod allocated in CHeap and not in aot code section (which is RO): > > http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124 > > It is allocated in CHeap after AOT library is loaded. Its code_begin() points to AOT code section but AOTCompiledMethod* > points outside it (to normal malloced space) so you can't use (char*)blob address. and proposed these two fixes: > There are 2 ways to fix it, I think. > One is to add new field to CodeBlobLayout and set it to blob* address for normal CodeCache blobs and to code_begin for > AOT code. > Second is to use contains(blob->code_end() - 1) assuming that AOT code is never zero. I came up with a slightly different solution - just use 'CodeHeap::code_blob_type()' to decide whether to use 'blob->code_begin()' (for the AOT case) or '(void*)blob' (for all other blobs) as input for the call to 'CodeHeap::contains()'. It's simple and still much cheaper than a virtual call. What do you think?
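The corner case and the proposed check can be illustrated with a minimal sketch. This is not the actual patch: ToyHeap and ToyBlob are hypothetical stand-ins for CodeHeap and CodeBlob, addresses are modelled as plain integers, the header size of 0x80 is made up, and the is_aot flag is an assumed simplification of the 'CodeHeap::code_blob_type()' check.

```cpp
#include <cstdint>

// Hypothetical stand-in for a CodeBlob: only the two addresses that matter here.
struct ToyBlob {
  std::uintptr_t self;        // address of the blob object itself (its header)
  std::uintptr_t code_begin;  // address where the payload (code) starts
};

// Hypothetical stand-in for a CodeHeap covering the range [low, high).
struct ToyHeap {
  std::uintptr_t low;
  std::uintptr_t high;
  bool is_aot;  // assumption: stands in for inspecting code_blob_type()

  bool contains(std::uintptr_t p) const { return low <= p && p < high; }

  // Variant quoted above as introduced by JDK-8183573: always uses code_begin.
  bool contains_blob_by_code_begin(const ToyBlob& b) const {
    return contains(b.code_begin);
  }

  // Sketch of the proposed variant: code_begin for AOT blobs, the blob's own
  // address for everything else.
  bool contains_blob(const ToyBlob& b) const {
    return contains(is_aot ? b.code_begin : b.self);
  }
};

// Two adjacent heaps; the last blob of the first heap has a zero-sized payload,
// so its code_begin equals the start of the second heap.
bool misattributed_by_code_begin() {
  ToyHeap profiled{0x1000, 0x2000, false};
  ToyHeap non_profiled{0x2000, 0x3000, false};
  ToyBlob last{0x2000 - 0x80, 0x2000};
  return non_profiled.contains_blob_by_code_begin(last) &&
         !profiled.contains_blob_by_code_begin(last);
}

bool attributed_correctly_by_proposal() {
  ToyHeap profiled{0x1000, 0x2000, false};
  ToyHeap non_profiled{0x2000, 0x3000, false};
  ToyBlob last{0x2000 - 0x80, 0x2000};
  return profiled.contains_blob(last) && !non_profiled.contains_blob(last);
}

// AOT case: the blob object lives outside the heap (malloc'ed), its code inside.
bool aot_blob_still_found() {
  ToyHeap aot{0x5000, 0x6000, true};
  ToyBlob method{0x9000, 0x5100};
  return aot.contains_blob(method);
}
```

In this toy model the code_begin-only check assigns the zero-payload blob to the neighbouring heap, while the type-dependent check keeps both the regular and the AOT case correct without a virtual call.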
I've also updated the documentation of the CodeBlob class hierarchy in codeBlob.hpp. Please let me know if I've missed something. Thank you and best regards, Volker From volker.simonis at gmail.com Fri Sep 1 15:46:32 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Fri, 1 Sep 2017 17:46:32 +0200 Subject: RFR(M): 8166317: InterpreterCodeSize should be computed In-Reply-To: References: <50cda0ab-f403-372a-ce51-1a27d8821448@oracle.com> Message-ID: Hi, I've decided to split the fix for the 'CodeHeap::contains_blob()' problem into its own issue "8187091: ReturnBlobToWrongHeapTest fails because of problems in CodeHeap::contains_blob()" (https://bugs.openjdk.java.net/browse/JDK-8187091) and started a new review thread for discussing it at: http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028206.html So please let's keep this thread for discussing the interpreter code size issue only. I've prepared a new version of the webrev which is the same as the first one with the only difference that the change to 'CodeHeap::contains_blob()' has been removed: http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v1/ Thanks, Volker On Thu, Aug 31, 2017 at 6:35 PM, Volker Simonis wrote: > On Thu, Aug 31, 2017 at 6:05 PM, Vladimir Kozlov > wrote: >> Very good change. Thank you, Volker. >> >> About contains_blob(). The problem is that AOTCompiledMethod allocated in >> CHeap and not in aot code section (which is RO): >> >> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124 >> >> It is allocated in CHeap after AOT library is loaded. Its code_begin() >> points to AOT code section but AOTCompiledMethod* points outside it (to >> normal malloced space) so you can't use (char*)blob address. >> > > Thanks for the explanation - now I got it. > >> There are 2 ways to fix it, I think.
>> One is to add new field to CodeBlobLayout and set it to blob* address for >> normal CodeCache blobs and to code_begin for AOT code. >> Second is to use contains(blob->code_end() - 1) assuming that AOT code is >> never zero. >> > > I'll give it a try tomorrow and will send out a new webrev. > > Regards, > Volker > >> Thanks, >> Vladimir >> >> >> On 8/31/17 5:43 AM, Volker Simonis wrote: >>> >>> On Thu, Aug 31, 2017 at 12:14 PM, Claes Redestad >>> wrote: >>>> >>>> >>>> >>>> On 2017-08-31 08:54, Volker Simonis wrote: >>>>> >>>>> >>>>> While working on this, I found another problem which is related to the >>>>> fix of JDK-8183573 and leads to crashes when executing the JTreg test >>>>> compiler/codecache/stress/ReturnBlobToWrongHeapTest.java. >>>>> >>>>> The problem is that JDK-8183573 replaced >>>>> >>>>> virtual bool contains_blob(const CodeBlob* blob) const { return >>>>> low_boundary() <= (char*) blob && (char*) blob < high(); } >>>>> >>>>> by: >>>>> >>>>> bool contains_blob(const CodeBlob* blob) const { return >>>>> contains(blob->code_begin()); } >>>>> >>>>> But that may be wrong in the corner case where the size of the >>>>> CodeBlob's payload is zero (i.e. the CodeBlob consists only of the >>>>> 'header' - i.e. the C++ object itself) because in that case >>>>> CodeBlob::code_begin() points right behind the CodeBlob's header which >>>>> is a memory location which doesn't belong to the CodeBlob anymore. >>>> >>>> >>>> >>>> I recall this change was somehow necessary to allow merging >>>> AOTCodeHeap::contains_blob and CodeHeap::contains_blob into >>>> one devirtualized method, so you need to ensure all AOT tests >>>> pass with this change (on linux-x64). >>>> >>> >>> All of hotspot/test/aot and hotspot/test/jvmci executed and passed >>> successfully. Are there any other tests I should check? >>> >>> That said, it is a little hard to follow the stages of your change.
It >>> seems like >>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.00/ >>> was reviewed [1] but then finally the slightly changed version from >>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.01/ was >>> checked in and linked to the bug report. >>> >>> The first, reviewed version of the change still had a correct version >>> of 'CodeHeap::contains_blob(const CodeBlob* blob)' while the second, >>> checked in version has the faulty version of that method. >>> >>> I don't know why you finally did that change to 'contains_blob()' but >>> I don't see any reason why we shouldn't be able to directly use the >>> blob's address for inclusion checking. From what I understand, it >>> should ALWAYS be contained in the corresponding CodeHeap so no reason >>> to mess with 'CodeBlob::code_begin()'. >>> >>> Please let me know if I'm missing something. >>> >>> [1] >>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-July/026624.html >>> >>>> I can't help to wonder if we'd not be better served by disallowing >>>> zero-sized payloads. Is this something that can ever actually >>>> happen except by abuse of the white box API? >>>> >>> >>> The corresponding test (ReturnBlobToWrongHeapTest.java) specifically >>> wants to allocate "segment sized" blocks which is most easily achieved >>> by allocation zero-sized CodeBlobs. And I think there's nothing wrong >>> about it if we handle the inclusion tests correctly. 
>>>
>>> Thank you and best regards,
>>> Volker
>>>
>>>> /Claes

From coleen.phillimore at oracle.com Fri Sep 1 15:52:01 2017
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Fri, 1 Sep 2017 11:52:01 -0400
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <59A97B85.8@oracle.com>
References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <87f3f12e-26d8-9806-6821-3fb5783bf832@oracle.com> <59A97B85.8@oracle.com>
Message-ID: <2914a34f-845a-3dd1-0407-c42dbac04b19@oracle.com>

The only Atomic::inc* that I found in product code that wasn't printing
statistics or exception cases was mostly in G1 and one interesting case
in objectMonitor and safepointing, where a lot of other CAS operations
already have been done.  I'm willing to bet this platform specific
optimization has no value.  I would vote removal, pending examination
of these places.

share/vm/gc/g1/dirtyCardQueue.cpp

  if (result) {
    assert_fully_consumed(node, buffer_size());
    Atomic::inc(&_processed_buffers_mut);
  }
  ...
      Atomic::inc(&_processed_buffers_rs_thread);

share/vm/gc/g1/heapRegionRemSet.cpp

          Atomic::inc(&_occupied);
  Atomic::inc(&_n_coarsenings);

share/vm/runtime/objectMonitor.cpp

ObjectMonitor::enter()

  // Prevent deflation at STW-time.  See deflate_idle_monitors() and is_busy().
  // Ensure the object-monitor relationship remains stable while there's contention.
  Atomic::inc(&_count);

share/vm/runtime/safepoint.cpp

      if (is_synchronizing()) {
         Atomic::inc (&TryingToBlock) ;
      }

share/vm/code/nmethod.cpp
nmethodLocker
  Atomic::inc(&nm->_lock_count);

Coleen

On 9/1/17 11:23 AM, Erik Österlund wrote:
> Hi Coleen,
>
> On 2017-09-01 16:42, coleen.phillimore at oracle.com wrote:
>>
>>
>> On 9/1/17 10:15 AM, Erik Österlund wrote:
>>> Hi Andrew,
>>>
>>> On 2017-09-01 15:41, Andrew Haley wrote:
>>>> On 31/08/17 13:45, Erik Österlund wrote:
>>>>> Hi everyone,
>>>>>
>>>>> Bug ID:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8186838
>>>>>
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>>>>>
>>>>> The time has come for the next step in generalizing Atomic with
>>>>> templates. Today I will focus on Atomic::inc/dec.
>>>>>
>>>>> I have tried to mimic the new Kim style that seems to have been
>>>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the
>>>>> structure looks like this:
>>>>>
>>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object
>>>>> that performs some basic type checks.
>>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can define
>>>>> the operation arbitrarily for a given platform. The default
>>>>> implementation if not specialized for a platform is to call Atomic::add.
>>>>> So only platforms that want to do something different than that as an
>>>>> optimization have to provide a specialization.
>>>>> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec to
>>>>> be more optimized may inherit from a helper class
>>>>> IncUsingConstant/DecUsingConstant. This helper helps performing the
>>>>> necessary computation what the increment/decrement should be after
>>>>> pointer scaling using CRTP. The PlatformInc/PlatformDec operation then
>>>>> only needs to define an inc/dec member function, and will then get all
>>>>> the context information necessary to generate a more optimized
>>>>> implementation. Easy peasy.
>>>> I wanted to say something nice, but I honestly can't.  I am dismayed.
>>>
>>> Okay.
>>>
>>>> I hoped that inc/dec would turn out to be much simpler than the
>>>> cmpxchg functions: I think they should, because they don't have to
>>>> deal with the complexity of potentially three different types.
>>>> Instead we have, again, a large and complex patch.
>>>>
>>>> Even on AArch64, which should be the simplest case because Atomic::inc
>>>> can be defined as
>>>>
>>>> template
>>>> inc(T1 *dest) {
>>>>    return __sync_add_and_fetch(dest, 1);
>>>> }
>>>
>>> AArch64 is indeed the simplest case. It does not have a
>>> specialization in my patch. It simply expresses Atomic::inc in terms
>>> of Atomic::add.
>>>
>>>> or something similar, we have
>>>>
>>>> Atomic::inc
>>>> Atomic::IncImpl::operator()
>>>> Atomic::PlatformInc<4ul, IntegralConstant >::operator()
>>>> Atomic::add
>>>> Atomic::AddImpl::operator()
>>>> Atomic::AddAndFetch >::operator()
>>>> Atomic::PlatformAdd<4ul>::add_and_fetch
>>>> __sync_add_and_fetch
>>>>
>>>> I quite understand that it isn't so easy on some systems, and they
>>>> need a generic form that explodes into four different calls, one for
>>>> each size of integer.  I completely accept that it will be more
>>>> complex for everything else.  But is it necessary to have so much code
>>>> for something so simple?  This is a 1400 line patch. Granted, much of
>>>> it is simply moving stuff around, but despite the potential of
>>>> template code to simplify the implementation we have a more complex
>>>> solution than we had before.
>>>>
>>>> I ask you, is this the simplest solution that you believe is possible?
>>>
>>> It is not the simplest solution I can think of. The simplest
>>> solution I can think of is to remove all specialized versions of
>>> Atomic::inc/dec and just have it call Atomic::add directly. That
>>> would remove the optimizations we have today, for whatever reason we
>>> have them. It would lead to slightly more conservative fencing on
>>> PPC/S390, and would lead to slightly less optimal machine encoding
>>> on x86 (without immediate values in the instructions). But it would
>>> be simpler for sure. I did not put any judgement into whether our
>>> existing optimizations are worthwhile or not. But if you want to
>>> prioritize simplicity, removing those optimizations is one possible
>>> solution. Would you prefer that?
>>
>> I wonder if you could remove the linux x86 asm code for inc/dec,
>> recode it to use add, and do a dev submit run against your patch?
>> While we're discussing this.
>
> Okay, I will try that.
>
> /Erik
>
>> thanks,
>> Coleen
>>
>>>
>>> Thanks,
>>> /Erik
>>
>

From vladimir.kozlov at oracle.com Fri Sep 1 16:00:42 2017
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 1 Sep 2017 09:00:42 -0700
Subject: RFR(S): 8187091: ReturnBlobToWrongHeapTest fails because of problems in CodeHeap::contains_blob()
In-Reply-To:
References:
Message-ID:

Checking the type is an emulation of a virtual call ;-)

But I agree that it is the simplest solution - a one-line change
(excluding the comment - the comment is good BTW).

You can also add an AOT_ONLY() guard around the AOT-specific code:

  const void* start = AOT_ONLY( (code_blob_type() == CodeBlobType::AOT) ? blob->code_begin() : ) (void*)blob;

because we do have builds without AOT.
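
To make the corner case concrete, here is a minimal, self-contained sketch of the check discussed above. It is not HotSpot code: ToyBlob, ToyHeap, BlobKind and place_empty_blob_at_end are hypothetical stand-ins, and the layout (payload starting right behind the header) is an assumption taken from the description in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-ins; none of these are the real HotSpot types.
enum class BlobKind { Regular, AOT };

struct ToyBlob {
  const char* code_begin_;  // first byte of the payload
  const char* code_begin() const { return code_begin_; }
};

struct ToyHeap {
  const char* low;   // inclusive lower bound
  const char* high;  // exclusive upper bound
  BlobKind kind;

  // Address-range check; compares integer values so that pointers to
  // unrelated objects can be tested as well.
  bool contains(const void* p) const {
    const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<std::uintptr_t>(low) <= a &&
           a < reinterpret_cast<std::uintptr_t>(high);
  }

  // Check by payload start only (the variant that misfires for empty blobs).
  bool contains_blob_by_code_begin(const ToyBlob* blob) const {
    return contains(blob->code_begin());
  }

  // Check by payload start for AOT heaps, by header address otherwise.
  bool contains_blob(const ToyBlob* blob) const {
    const void* start = (kind == BlobKind::AOT)
        ? static_cast<const void*>(blob->code_begin())
        : static_cast<const void*>(blob);
    return contains(start);
  }
};

// Places a header-only blob (zero payload) in the last slot of the buffer,
// so that its code_begin() equals the end of the buffer.
// The buffer must be suitably aligned and sized for ToyBlob.
inline ToyBlob* place_empty_blob_at_end(char* heap_memory, std::size_t heap_size) {
  char* slot = heap_memory + heap_size - sizeof(ToyBlob);
  return new (slot) ToyBlob{slot + sizeof(ToyBlob)};
}
```

With a regular heap, a header-only blob in the last slot is rejected by the code_begin()-based check but accepted by the type-aware one. With an AOT-style heap, the header lives outside the heap range and only code_begin() points into it, so the type-aware check still succeeds.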

Thanks,
Vladimir

On 9/1/17 8:42 AM, Volker Simonis wrote:
> Hi,
>
> can I please have a review and sponsor for the following small fix:
>
> http://cr.openjdk.java.net/~simonis/webrevs/2017/8187091/
> https://bugs.openjdk.java.net/browse/JDK-8187091
>
> We see failures in
> test/compiler/codecache/stress/ReturnBlobToWrongHeapTest.java which
> are caused by problems in CodeHeap::contains_blob() for corner cases
> with CodeBlobs of zero size:
>
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # Internal Error (heap.cpp:248), pid=27586, tid=27587
> # guarantee((char*) b >= _memory.low_boundary() && (char*) b <
> _memory.high()) failed: The block to be deallocated 0x00007fffe6666f80
> is not within the heap starting with 0x00007fffe6667000 and ending
> with 0x00007fffe6ba000
>
> The problem is that JDK-8183573 replaced
>
> virtual bool contains_blob(const CodeBlob* blob) const { return
> low_boundary() <= (char*) blob && (char*) blob < high(); }
>
> by:
>
> bool contains_blob(const CodeBlob* blob) const { return
> contains(blob->code_begin()); }
>
> But that may be wrong in the corner case where the size of the
> CodeBlob's payload is zero (i.e. the CodeBlob consists only of the
> 'header' - i.e. the C++ object itself) because in that case
> CodeBlob::code_begin() points right behind the CodeBlob's header which
> is a memory location which doesn't belong to the CodeBlob anymore.
>
> This exact corner case is exercised by ReturnBlobToWrongHeapTest which
> allocates CodeBlobs of size zero (i.e. zero 'payload') with the help
> of sun.hotspot.WhiteBox.allocateCodeBlob() until the CodeCache fills
> up. The test first fills the 'non-profiled nmethods' CodeHeap. If the
> 'non-profiled nmethods' CodeHeap is full, the VM automatically tries
> to allocate from the 'profiled nmethods' CodeHeap until that fills up
> as well. But in the CodeCache the 'profiled nmethods' CodeHeap is
> located right before the 'non-profiled nmethods' CodeHeap. So if the
> last CodeBlob allocated from the 'profiled nmethods' CodeHeap has a
> payload size of zero and uses all the CodeHeap's remaining size, we
> will end up with a CodeBlob whose code_begin() address will point
> right behind the actual CodeHeap (i.e. it will point right at the
> beginning of the adjacent, 'non-profiled nmethods' CodeHeap). This
> will result in the above guarantee to fire, when we will try to free
> the last allocated CodeBlob (with
> sun.hotspot.WhiteBox.freeCodeBlob()).
>
> In a previous mail thread
> (http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-August/028175.html)
> Vladimir explained why JDK-8183573 was done:
>
>> About contains_blob(). The problem is that AOTCompiledMethod allocated in CHeap and not in aot code section (which is RO):
>>
>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124
>>
>> It is allocated in CHeap after AOT library is loaded. Its code_begin() points to AOT code section but AOTCompiledMethod*
>> points outside it (to normal malloced space) so you can't use (char*)blob address.
>
> and proposed these two fixes:
>
>> There are 2 ways to fix it, I think.
>> One is to add new field to CodeBlobLayout and set it to blob* address for normal CodeCache blobs and to code_begin for
>> AOT code.
>> Second is to use contains(blob->code_end() - 1) assuming that AOT code is never zero.
>
> I came up with a slightly different solution - just use
> 'CodeHeap::code_blob_type()' to decide whether to use
> 'blob->code_begin()' (for the AOT case) or '(void*)blob' (for all
> other blobs) as input for the call to 'CodeHeap::contains()'. It's
> simple and still much cheaper than a virtual call. What do you think?
>
> I've also updated the documentation of the CodeBlob class hierarchy in
> codeBlob.hpp. Please let me know if I've missed something.
>
> Thank you and best regards,
> Volker
>

From erik.osterlund at oracle.com Fri Sep 1 16:10:38 2017
From: erik.osterlund at oracle.com (Erik Osterlund)
Date: Fri, 1 Sep 2017 18:10:38 +0200
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <2914a34f-845a-3dd1-0407-c42dbac04b19@oracle.com>
References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <87f3f12e-26d8-9806-6821-3fb5783bf832@oracle.com> <59A97B85.8@oracle.com> <2914a34f-845a-3dd1-0407-c42dbac04b19@oracle.com>
Message-ID: <5F1D016E-C92B-4C26-8A3C-D4BF59033751@oracle.com>

Hi Coleen,

I tend to agree. I would happily nuke this optimization in the name of simplicity.

Thanks,
/Erik

> On 1 Sep 2017, at 17:52, coleen.phillimore at oracle.com wrote:
>
>
> The only Atomic::inc* that I found in product code that wasn't printing statistics or exception cases was mostly in G1 and one interesting case in objectMonitor and safepointing, where a lot of other CAS operations already have been done. I'm willing to bet this platform specific optimization has no value. I would vote removal, pending examination of these places.
>
> share/vm/gc/g1/dirtyCardQueue.cpp
>
>   if (result) {
>     assert_fully_consumed(node, buffer_size());
>     Atomic::inc(&_processed_buffers_mut);
>   }
>   ...
>       Atomic::inc(&_processed_buffers_rs_thread);
>
> share/vm/gc/g1/heapRegionRemSet.cpp
>
>           Atomic::inc(&_occupied);
>   Atomic::inc(&_n_coarsenings);
>
> share/vm/runtime/objectMonitor.cpp
>
> ObjectMonitor::enter()
>
>   // Prevent deflation at STW-time. See deflate_idle_monitors() and is_busy().
>   // Ensure the object-monitor relationship remains stable while there's contention.
> Atomic::inc(&_count); > > share/vm/runtime/safepoint.cpp > > if (is_synchronizing()) { > Atomic::inc (&TryingToBlock) ; > } > > share/vm/code/nmethod.cpp > nmethodLocker > > Atomic::inc(&nm->_lock_count); > > > Coleen > > >> On 9/1/17 11:23 AM, Erik ?sterlund wrote: >> Hi Coleen, >> >>> On 2017-09-01 16:42, coleen.phillimore at oracle.com wrote: >>> >>> >>>> On 9/1/17 10:15 AM, Erik ?sterlund wrote: >>>> Hi Andrew, >>>> >>>>> On 2017-09-01 15:41, Andrew Haley wrote: >>>>>> On 31/08/17 13:45, Erik ?sterlund wrote: >>>>>> Hi everyone, >>>>>> >>>>>> Bug ID: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8186838 >>>>>> >>>>>> Webrev: >>>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >>>>>> >>>>>> The time has come for the next step in generalizing Atomic with >>>>>> templates. Today I will focus on Atomic::inc/dec. >>>>>> >>>>>> I have tried to mimic the new Kim style that seems to have been >>>>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the >>>>>> structure looks like this: >>>>>> >>>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object >>>>>> that performs some basic type checks. >>>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can define >>>>>> the operation arbitrarily for a given platform. The default >>>>>> implementation if not specialized for a platform is to call Atomic::add. >>>>>> So only platforms that want to do something different than that as an >>>>>> optimization have to provide a specialization. >>>>>> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec to >>>>>> be more optimized may inherit from a helper class >>>>>> IncUsingConstant/DecUsingConstant. This helper helps performing the >>>>>> necessary computation what the increment/decrement should be after >>>>>> pointer scaling using CRTP. 
The PlatformInc/PlatformDec operation then >>>>>> only needs to define an inc/dec member function, and will then get all >>>>>> the context information necessary to generate a more optimized >>>>>> implementation. Easy peasy. >>>>> I wanted to say something nice, but I honestly can't. I am dismayed. >>>> >>>> Okay. >>>> >>>>> I hoped that inc/dec would turn out to be much simpler than the >>>>> cmpxchg functions: I think they should, because they don't have to >>>>> deal with the complexity of potentially three different types. >>>>> Instead we have, again, a large and complex patch. >>>>> >>>>> Even on AArch64, which should be the simplest case because Atomic::inc >>>>> can be defined as >>>>> >>>>> template >>>>> inc(T1 *dest) { >>>>> return __sync_add_and_fetch(dest, 1); >>>>> } >>>> >>>> AArch64 is indeed the simplest case. It does not have a specialization in my patch. It simply expresses Atomic::inc in terms of Atomic::add. >>>> >>>>> or something similar, we have >>>>> >>>>> Atomic::inc >>>>> Atomic::IncImpl::operator() >>>>> Atomic::PlatformInc<4ul, IntegralConstant >::operator() >>>>> Atomic::add >>>>> Atomic::AddImpl::operator() >>>>> Atomic::AddAndFetch >::operator() >>>>> Atomic::PlatformAdd<4ul>::add_and_fetch >>>>> __sync_add_and_fetch >>>>> >>>>> I quite understand that it isn't so easy on some systems, and they >>>>> need a generic form that explodes into four different calls, one for >>>>> each size of integer. I completely accept that it will be more >>>>> complex for everything else. But is it necessary to have so much code >>>>> for something so simple? This is a 1400 line patch. Granted, much of >>>>> it is simply moving stuff around, but despite the potential of >>>>> template code to simplify the implementation we have a more complex >>>>> solution than we had before. >>>>> >>>>> I ask you, is this the simplest solution that you believe is possible? >>>> >>>> It is not the simplest solution I can think of. 
The simplest solution I can think of is to remove all specialized versions of Atomic::inc/dec and just have it call Atomic::add directly. That would remove the optimizations we have today, for whatever reason we have them. It would lead to slightly more conservative fencing on PPC/S390, and would lead to slightly less optimal machine encoding on x86 (without immediate values in the instructions). But it would be simpler for sure. I did not put any judgement into whether our existing optimizations are worthwhile or not. But if you want to prioritize simplicity, removing those optimizations is one possible solution. Would you prefer that?
>>>
>>> I wonder if you could remove the linux x86 asm code for inc/dec, recode it to use add, and do a dev submit run against your patch? While we're discussing this.
>>
>> Okay, I will try that.
>>
>> /Erik
>>
>>> thanks,
>>> Coleen
>>>
>>>>
>>>> Thanks,
>>>> /Erik
>>>
>>
>

From vladimir.kozlov at oracle.com Fri Sep 1 16:16:28 2017
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 1 Sep 2017 09:16:28 -0700
Subject: RFR(M): 8166317: InterpreterCodeSize should be computed
In-Reply-To:
References: <50cda0ab-f403-372a-ce51-1a27d8821448@oracle.com>
Message-ID: <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com>

Maybe add a new CodeBlob method to adjust the sizes instead of setting
them directly in CodeCache::free_unused_tail(). Then you would not need
friend class CodeCache in CodeBlob.

Also, I think the adjustment for header_size should be done in
CodeCache::free_unused_tail() to limit the scope of the code that knows
about the blob layout.
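
A rough sketch of what such a method could look like, with the layout fields kept private. Everything here is hypothetical: ToyBlob, adjust_size() and the free function free_unused_tail() are illustration-only names that model the idea, not the actual CodeBlob or CodeCache interfaces.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a blob whose layout fields stay private,
// so no friend declaration is needed for the code that shrinks it.
class ToyBlob {
 public:
  ToyBlob(std::size_t header_size, std::size_t payload_size)
      : _header_size(header_size), _size(header_size + payload_size) {}

  std::size_t header_size() const { return _header_size; }
  std::size_t size() const { return _size; }

  // Shrinks the blob to new_size bytes (header included).
  void adjust_size(std::size_t new_size) {
    assert(new_size >= _header_size && new_size <= _size &&
           "a blob can only shrink, and never below its header");
    _size = new_size;
  }

 private:
  std::size_t _header_size;
  std::size_t _size;  // total size: header plus payload
};

// Hypothetical stand-in for the cache-side routine. The caller passes only
// the payload bytes actually used; the header adjustment happens here.
// Returns the number of bytes given back.
inline std::size_t free_unused_tail(ToyBlob& blob, std::size_t used_payload) {
  const std::size_t old_size = blob.size();
  const std::size_t new_size = blob.header_size() + used_payload;
  blob.adjust_size(new_size);
  return old_size - new_size;
}
```

In this model, a blob with a 64-byte header and 1024 bytes of payload that ends up using 256 bytes would give back 768 bytes and report a total size of 320 afterwards.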
Thanks, Vladimir On 9/1/17 8:46 AM, Volker Simonis wrote: > Hi, > > I've decided to split the fix for the 'CodeHeap::contains_blob()' > problem into its own issue "8187091: ReturnBlobToWrongHeapTest fails > because of problems in CodeHeap::contains_blob()" > (https://bugs.openjdk.java.net/browse/JDK-8187091) and started a new > review thread for discussing it at: > http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028206.html > > So please lets keep this thread for discussing the interpreter code > size issue only. I've prepared a new version of the webrev which is > the same as the first one with the only difference that the change to > 'CodeHeap::contains_blob()' has been removed: > > http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v1/ > > Thanks, > Volker > > > On Thu, Aug 31, 2017 at 6:35 PM, Volker Simonis > wrote: >> On Thu, Aug 31, 2017 at 6:05 PM, Vladimir Kozlov >> wrote: >>> Very good change. Thank you, Volker. >>> >>> About contains_blob(). The problem is that AOTCompiledMethod allocated in >>> CHeap and not in aot code section (which is RO): >>> >>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124 >>> >>> It is allocated in CHeap after AOT library is loaded. Its code_begin() >>> points to AOT code section but AOTCompiledMethod* points outside it (to >>> normal malloced space) so you can't use (char*)blob address. >>> >> >> Thanks for the explanation - now I got it. >> >>> There are 2 ways to fix it, I think. >>> One is to add new field to CodeBlobLayout and set it to blob* address for >>> normal CodeCache blobs and to code_begin for AOT code. >>> Second is to use contains(blob->code_end() - 1) assuming that AOT code is >>> never zero. >>> >> >> I'll give it a try tomorrow and will send out a new webrev. 
>> >> Regards, >> Volker >> >>> Thanks, >>> Vladimir >>> >>> >>> On 8/31/17 5:43 AM, Volker Simonis wrote: >>>> >>>> On Thu, Aug 31, 2017 at 12:14 PM, Claes Redestad >>>> wrote: >>>>> >>>>> >>>>> >>>>> On 2017-08-31 08:54, Volker Simonis wrote: >>>>>> >>>>>> >>>>>> While working on this, I found another problem which is related to the >>>>>> fix of JDK-8183573 and leads to crashes when executing the JTreg test >>>>>> compiler/codecache/stress/ReturnBlobToWrongHeapTest.java. >>>>>> >>>>>> The problem is that JDK-8183573 replaced >>>>>> >>>>>> virtual bool contains_blob(const CodeBlob* blob) const { return >>>>>> low_boundary() <= (char*) blob && (char*) blob < high(); } >>>>>> >>>>>> by: >>>>>> >>>>>> bool contains_blob(const CodeBlob* blob) const { return >>>>>> contains(blob->code_begin()); } >>>>>> >>>>>> But that my be wrong in the corner case where the size of the >>>>>> CodeBlob's payload is zero (i.e. the CodeBlob consists only of the >>>>>> 'header' - i.e. the C++ object itself) because in that case >>>>>> CodeBlob::code_begin() points right behind the CodeBlob's header which >>>>>> is a memory location which doesn't belong to the CodeBlob anymore. >>>>> >>>>> >>>>> >>>>> I recall this change was somehow necessary to allow merging >>>>> AOTCodeHeap::contains_blob and CodeHead::contains_blob into >>>>> one devirtualized method, so you need to ensure all AOT tests >>>>> pass with this change (on linux-x64). >>>>> >>>> >>>> All of hotspot/test/aot and hotspot/test/jvmci executed and passed >>>> successful. Are there any other tests I should check? >>>> >>>> That said, it is a little hard to follow the stages of your change. It >>>> seems like >>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.00/ >>>> was reviewed [1] but then finally the slightly changed version from >>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.01/ was >>>> checked in and linked to the bug report. 
>>>>
>>>> The first, reviewed version of the change still had a correct version
>>>> of 'CodeHeap::contains_blob(const CodeBlob* blob)' while the second,
>>>> checked in version has the faulty version of that method.
>>>>
>>>> I don't know why you finally did that change to 'contains_blob()' but
>>>> I don't see any reason why we shouldn't be able to directly use the
>>>> blob's address for inclusion checking. From what I understand, it
>>>> should ALWAYS be contained in the corresponding CodeHeap so no reason
>>>> to mess with 'CodeBlob::code_begin()'.
>>>>
>>>> Please let me know if I'm missing something.
>>>>
>>>> [1]
>>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-July/026624.html
>>>>
>>>>> I can't help to wonder if we'd not be better served by disallowing
>>>>> zero-sized payloads. Is this something that can ever actually
>>>>> happen except by abuse of the white box API?
>>>>>
>>>>
>>>> The corresponding test (ReturnBlobToWrongHeapTest.java) specifically
>>>> wants to allocate "segment sized" blocks which is most easily achieved
>>>> by allocating zero-sized CodeBlobs. And I think there's nothing wrong
>>>> about it if we handle the inclusion tests correctly.
>>>>
>>>> Thank you and best regards,
>>>> Volker
>>>>
>>>>> /Claes

From vladimir.kozlov at oracle.com Fri Sep 1 16:43:54 2017
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 1 Sep 2017 09:43:54 -0700
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To:
References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com>
Message-ID:

Hi Rohit,

Changes look good. Only question I have is about MaxVectorSize. It is
set > 16 only in presence of AVX:

http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945

Does that code work for AMD 17h too?
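
For reference, the vector-size clamping in the patch quoted below can be modelled as a small pure function. This is only a sketch: clamp_max_vector_size_amd is a hypothetical helper that takes plain integers instead of reading VM flags, and it does not model the AVX-dependent code at the link above.

```cpp
#include <cassert>

// Hypothetical helper mirroring the two MaxVectorSize checks in the quoted
// patch: AMD families before 17h are limited to 16 bytes, family 17h to 32.
inline int clamp_max_vector_size_amd(int cpu_family, int max_vector_size) {
  if (cpu_family < 0x17 && max_vector_size > 16) {
    return 16;
  }
  if (cpu_family == 0x17 && max_vector_size > 32) {
    return 32;
  }
  return max_vector_size;
}
```

Under this model, a family 15h CPU with a requested size of 32 ends up at 16, while a family 17h CPU keeps 32 and is only reduced when the value exceeds 32. Whether a value above 16 is ever reached on 17h still depends on the AVX check asked about above.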
Thanks, Vladimir On 9/1/17 8:04 AM, Rohit Arul Raj wrote: > On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj wrote: >> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes wrote: >>> Hi Rohit, >>> >>> I think the patch needs updating for jdk10 as I already see a lot of logic >>> around UseSHA in vm_version_x86.cpp. >>> >>> Thanks, >>> David >>> >> >> Thanks David, I will update the patch wrt JDK10 source base, test and >> resubmit for review. >> >> Regards, >> Rohit >> > > Hi All, > > I have updated the patch wrt openjdk10/hotspot (parent: > 13519:71337910df60), did regression testing using jtreg ($make > default) and didnt find any regressions. > > Can anyone please volunteer to review this patch which sets flag/ISA > defaults for newer AMD 17h (EPYC) processor? > > ************************* Patch **************************** > > diff --git a/src/cpu/x86/vm/vm_version_x86.cpp > b/src/cpu/x86/vm/vm_version_x86.cpp > --- a/src/cpu/x86/vm/vm_version_x86.cpp > +++ b/src/cpu/x86/vm/vm_version_x86.cpp > @@ -1088,6 +1088,22 @@ > } > FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); > } > + if (supports_sha()) { > + if (FLAG_IS_DEFAULT(UseSHA)) { > + FLAG_SET_DEFAULT(UseSHA, true); > + } > + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || > UseSHA512Intrinsics) { > + if (!FLAG_IS_DEFAULT(UseSHA) || > + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || > + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || > + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { > + warning("SHA instructions are not available on this CPU"); > + } > + FLAG_SET_DEFAULT(UseSHA, false); > + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); > + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); > + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); > + } > > // some defaults for AMD family 15h > if ( cpu_family() == 0x15 ) { > @@ -1109,11 +1125,43 @@ > } > > #ifdef COMPILER2 > - if (MaxVectorSize > 16) { > - // Limit vectors size to 16 bytes on current AMD cpus. 
> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { > + // Limit vectors size to 16 bytes on AMD cpus < 17h. > FLAG_SET_DEFAULT(MaxVectorSize, 16); > } > #endif // COMPILER2 > + > + // Some defaults for AMD family 17h > + if ( cpu_family() == 0x17 ) { > + // On family 17h processors use XMM and UnalignedLoadStores for > Array Copy > + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { > + UseXMMForArrayCopy = true; > + } > + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { > + UseUnalignedLoadStores = true; > + } > + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { > + UseBMI2Instructions = true; > + } > + if (MaxVectorSize > 32) { > + FLAG_SET_DEFAULT(MaxVectorSize, 32); > + } > + if (UseSHA) { > + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { > + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); > + } else if (UseSHA512Intrinsics) { > + warning("Intrinsics for SHA-384 and SHA-512 crypto hash > functions not available on this CPU."); > + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); > + } > + } > +#ifdef COMPILER2 > + if (supports_sse4_2()) { > + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { > + FLAG_SET_DEFAULT(UseFPUForSpilling, true); > + } > + } > +#endif > + } > } > > if( is_intel() ) { // Intel cpus specific settings > diff --git a/src/cpu/x86/vm/vm_version_x86.hpp > b/src/cpu/x86/vm/vm_version_x86.hpp > --- a/src/cpu/x86/vm/vm_version_x86.hpp > +++ b/src/cpu/x86/vm/vm_version_x86.hpp > @@ -505,6 +505,14 @@ > result |= CPU_CLMUL; > if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) > result |= CPU_RTM; > + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) > + result |= CPU_ADX; > + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) > + result |= CPU_BMI2; > + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) > + result |= CPU_SHA; > + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) > + result |= CPU_FMA; > > // AMD features. 
>      if (is_amd()) {
> @@ -515,19 +523,13 @@
>          result |= CPU_LZCNT;
>        if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>          result |= CPU_SSE4A;
> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
> +        result |= CPU_HT;
>      }
>      // Intel features.
>      if(is_intel()) {
> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
> -        result |= CPU_ADX;
> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
> -        result |= CPU_BMI2;
> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
> -        result |= CPU_SHA;
>        if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>          result |= CPU_LZCNT;
> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
> -        result |= CPU_FMA;
>        // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
>        if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>          result |= CPU_3DNOW_PREFETCH;
>
> **************************************************************
>
> Thanks,
> Rohit
>
>>>
>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>>>
>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes wrote:
>>>>>
>>>>> Hi Rohit,
>>>>>
>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>>>
>>>>>> I would like an volunteer to review this patch (openJDK9) which sets
>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with
>>>>>> the commit process.
>>>>>>
>>>>>> Webrev:
>>>>>>
>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>>>
>>>>> Unfortunately patches can not be accepted from systems outside the
>>>>> OpenJDK infrastructure and ...
>>>>>
>>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>>
>>>>> ... unfortunately patches tend to get stripped by the mail servers. If
>>>>> the patch is small please include it inline. Otherwise you will need to
>>>>> find an OpenJDK Author who can host it for you on cr.openjdk.java.net.
>>>>>
>>>>
>>>>>> 3) I have done regression testing using jtreg ($make default) and
>>>>>> didnt find any regressions.
>>>>>
>>>>> Sounds good, but until I see the patch it is hard to comment on testing
>>>>> requirements.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>
>>>> Thanks David,
>>>> Yes, it's a small patch.
>>>>
>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>> @@ -1051,6 +1051,22 @@
>>>>        }
>>>>        FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>      }
>>>> +    if (supports_sha()) {
>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>> +      }
>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>> +        warning("SHA instructions are not available on this CPU");
>>>> +      }
>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>> +    }
>>>>
>>>>      // some defaults for AMD family 15h
>>>>      if ( cpu_family() == 0x15 ) {
>>>> @@ -1072,11 +1088,43 @@
>>>>      }
>>>>
>>>>  #ifdef COMPILER2
>>>> -    if (MaxVectorSize > 16) {
>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>        FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>      }
>>>>  #endif // COMPILER2
>>>> +
>>>> +    // Some defaults for AMD family 17h
>>>> +    if ( cpu_family() == 0x17 ) {
>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>> +        UseXMMForArrayCopy = true;
>>>> +      }
>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>> +        UseUnalignedLoadStores = true;
>>>> +      }
>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>> +        UseBMI2Instructions = true;
>>>> +      }
>>>> +      if (MaxVectorSize > 32) {
>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>> +      }
>>>> +      if (UseSHA) {
>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>> +        } else if (UseSHA512Intrinsics) {
>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>> +        }
>>>> +      }
>>>> +#ifdef COMPILER2
>>>> +      if (supports_sse4_2()) {
>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>> +        }
>>>> +      }
>>>> +#endif
>>>> +    }
>>>>    }
>>>>
>>>>    if( is_intel() ) { // Intel cpus specific settings
>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>> @@ -513,6 +513,16 @@
>>>>          result |= CPU_LZCNT;
>>>>        if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>          result |= CPU_SSE4A;
>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>> +        result |= CPU_BMI2;
>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>> +        result |= CPU_HT;
>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>> +        result |= CPU_ADX;
>>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>> +        result |= CPU_SHA;
>>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>> +        result |= CPU_FMA;
>>>>      }
>>>>      // Intel features.
>>>>      if(is_intel()) {
>>>>
>>>> Regards,
>>>> Rohit
>>>>
>>>

From david.holmes at oracle.com Fri Sep 1 21:51:17 2017
From: david.holmes at oracle.com (David Holmes)
Date: Sat, 2 Sep 2017 07:51:17 +1000
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <59A9612C.40900@oracle.com>
References: <59A804F8.9000501@oracle.com>
 <331bf921-243a-9c28-eb0f-c945bdc11384@oracle.com>
 <59A91CE6.2080206@oracle.com>
 <516f938c-3ed5-2d95-2a3b-418ad2d2a149@oracle.com>
 <59A9612C.40900@oracle.com>
Message-ID: <20fe4ff5-5ea6-449a-09a8-3859db022477@oracle.com>

> David? I am curious if you have the same opinion. If you both want to
> replace the template names I and P with T, then I am happy to do that.

I don't mind the P, I convention, but probably would not miss it either.
So I'm on the fence.

David
-----

On 1/09/2017 11:31 PM, Erik Österlund wrote:
> Hi Coleen,
>
> On 2017-09-01 14:51, coleen.phillimore at oracle.com wrote:
>>
>> On 9/1/17 4:40 AM, Erik Österlund wrote:
>>> Hi Coleen,
>>>
>>> Thank you for taking your time to review this.
>>>
>>> On 2017-09-01 02:03, coleen.phillimore at oracle.com wrote:
>>>>
>>>> Hi, I'm trying to parse the templates to review this but maybe it's
>>>> convention but decoding these with parameters that are single
>>>> capital letters make reading the template very difficult. There are
>>>> already a lot of non-alphanumeric characters. When the letter is
>>>> T, that is expected by convention, but D or especially I makes it
>>>> really hard. Can these be normalized to all use T when there is
>>>> only one template parameter? It'll be clear that T* is a pointer
>>>> and T is an integer without having it be P.
>>>
>>> I apologize the names of the template parameters are hard to
>>> understand. For what it's worth, I am only consistently applying
>>> Kim's conventions here. It seemed like a bad idea to violate
>>> conventions already set up - that would arguably be more confusing.
>>>
>>> The convention from earlier work by Kim is:
>>> D: Type of destination
>>> I: Operand type that has to be an integral type
>>> P: Operand type that is a pointer element type
>>> T: Generic operand type, may be integral or pointer type
>>>
>>> Personally, I do not mind this convention. It is more specific and
>>> annotates things we know about the type into the name of the type.
>>>
>>> Do you want me to:
>>>
>>> 1) Keep the convention, now that I have explained what the convention
>>> is and why it is your friend
>>
>> It is not my friend. It's not helpful. I have to go through
>> multiple non-alphabetic characters looking for the letter I or the
>> letter P to mentally make the substitution of the template type.
>
> Okay. I understand now that the pre-existing naming convention of types
> named I and P differentiating integral types from pointer types is not
> helpful to you. And if I understand you correctly, you would like to
> introduce a new naming convention that you find more helpful that uses
> the more general type name T instead, regardless if it refers to an
> integral type or a pointer type, and save the exercise of figuring out
> whether it is intentionally constrained to be a pointer type or an
> integral type to the reader by going to the declaration, and there
> reading some kind of comment describing such properties in text instead?
>
> Do we have a consensus that this new convention is indeed more desirable?
>
>>
>>> 2) Break the convention for this change only making the naming
>>> inconsistent
>>
>> Break it for this changeset and we'll fix it later for the earlier
>> work from Kim. I don't remember P and I in Kim's changeset but
>> realized while looking at your changeset, this was one thing that
>> makes these templates slower and more difficult to read.
>
> Okay.
>
>> In the case of cmpxchg templates with a source, destination and
>> original values, it was necessary to have more than T be the template
>> type, although unsatisfying, because it turned out that the types
>> couldn't be the same.
>
> Okay.
>
>>
>>> 3) Change the convention throughout consistently, including all
>>> earlier work from Kim
>>>
>>>>
>>>> +template
>>>> +struct Atomic::IncImpl
>>>> EnableIf::value>::type> VALUE_OBJ_CLASS_SPEC {
>>>> +  void operator()(I volatile* dest) const {
>>>> +    typedef IntegralConstant Adjustment;
>>>> +    typedef PlatformInc PlatformOp;
>>>> +    PlatformOp()(dest);
>>>> +  }
>>>> +};
>>>>
>>>> This one isn't as difficult, because it's short, but it would be
>>>> faster to understand with T.
>>>>
>>>> +template
>>>> +struct Atomic::IncImpl
>>>> EnableIf::value>::type> VALUE_OBJ_CLASS_SPEC {
>>>> +  void operator()(T volatile* dest) const {
>>>> +    typedef IntegralConstant Adjustment;
>>>> +    typedef PlatformInc PlatformOp;
>>>> +    PlatformOp()(dest);
>>>> +  }
>>>> +};
>>>>
>>>> +template<>
>>>> +struct Atomic::IncImpl VALUE_OBJ_CLASS_SPEC {
>>>> +  void operator()(jshort volatile* dest) const {
>>>> +    add(jshort(1), dest);
>>>> +  }
>>>> +};
>>>>
>>>> Did I already ask if this could be changed to u2 rather than
>>>> jshort? Or is that the follow-on RFE?
>>>
>>> That is a follow-on RFE.
>>
>> Good. I think that's the one that I assigned to myself.
>
> Yes, you are right.
>
>>>
>>>> +// Helper for platforms wanting a constant adjustment.
>>>> +template
>>>> +struct Atomic::IncUsingConstant VALUE_OBJ_CLASS_SPEC {
>>>> +  typedef PlatformInc Derived;
>>>>
>>>> I can't find the caller of this. Is it really a lot faster than
>>>> having the platform independent add(1, T) / add(-1, T) to make all
>>>> this code worth having? How is this called? I couldn't parse the
>>>> trick. Atomic::inc() is always a "constant adjustment" so I'm
>>>> confused about what the comment means and what motivates all the asm
>>>> code. Do these platform implementations exist because they don't
>>>> have twos complement for integer representation? really?
>>>
>>> This is used by some x86, PPC and s390 platforms. Personally I
>>> question its usefulness for x86. I believe it might be one of those
>>> things were we ran some benchmarks a decade ago and concluded that it
>>> was slightly faster to have a slimmed path for Atomic::inc rather
>>> than reusing Atomic::add.
>>
>> Yes, there are a lot of optimizations that we slog along in the code
>> base because they might have either theoretically or measurably made
>> some difference in something we don't have anymore.
>
> I noticed. :)
>
>>
>>>
>>> I did not initially want to bring this up as it seems like none of my
>>> business, but now that the question has been asked about differences,
>>> I could not help but notice the advertised "leading sync" convention
>>> of Atomic::inc on PPC is not respected. That is, there is no "sync"
>>> fence before the atomic increment, as required by the specified
>>> semantics. There is not even a leading "lwsync". The corresponding
>>> Atomic::add operation though, does have leading lwsync (unlike
>>> Atomic::inc). Now this should arguably be reinforced to sync rather
>>> than lwsync to respect the advertised semantics of both Atomic::add
>>> and Atomic::inc on PPC. Hopefully that statement will not turn into a
>>> long unrelated mailing thread...
>>
>> Could you file an bug with this observation?
>
> Sure.
>
>>>
>>> Conclusively though, there is definitely a substantial difference in
>>> the fencing comparing the PPC implementation of Atomic::inc to
>>> Atomic::add. Whether either one of them conforms to intended
>>> semantics or not is a different matter - one that I was hoping not to
>>> have to deal with in this RFE as I am merely templateifying what was
>>> already there, without judging the existing specializations. And it
>>> is my observation that as the code looks now, we would incur a bunch
>>> of more fencing compared to what the code does today on PPC.
>>>
>>
>> Completely understand. How are these called exactly though? I
>> couldn't figure it out.
>
> They are called like this:
> IncImpl::operator() calls PlatformInc::operator(), which has its class
> partially specialized by the platform (e.g. atomic_linux_pcc.hpp). Its
> operator() is defined by the super class helper,
> IncUsingConstant::operator(), that scales the addend accordingly and
> subsequently calls the PlatformInc::inc function that is defined in the
> PPC-specific atomic header and performs some suitable inline assembly
> for the operation.
>
>>
>>>> Also, the function name This() is really disturbing and
>>>> distracting. Can it be called some verb() representing what it
>>>> does? cast_to_derived()?
>>>>
>>>> +  template
>>>> +  void operator()(I volatile* dest) const {
>>>> +    This()->template inc(dest);
>>>> +  }
>>>>
>>>
>>> Yes, I will change the name accordingly as you suggest.
>>>
>>>> I didn't know you could put "template" there.
>>>
>>> It is required to put the template keyword before the member function
>>> name when calling a template member function with explicit template
>>> parameters (as opposed to implicitly inferred template parameters) on
>>> a template type.
>>
>> I thought you could just stay inc() in the call, but my C++
>> template vocabularly is minimal.
>>>
>>>> What does this call?
>>>
>>> This calls the platform-defined intrinsic that is defined in the
>>> platform files - the one that contains the inline assembly.
>>
>> How? I don't see how... :(
>
> Hopefully I already explained this above.
>
>>>
>>>> Rather than I for integer case, and P for pointer case, can you add
>>>> a one line comment above this like:
>>>> // Helper for integer types
>>>> and
>>>> // Helper for pointer types
>>>
>>> Or perhaps we could do both? Nevertheless, I will add these comments.
>>> But as per the discussion above, I would be happy if we could keep
>>> the convention that Kim has already set up for the template type names.
>>>
>>>> Small local comments would be really helpful for many of these
>>>> functions. Just to get more english words in there... Since Kim's
>>>> on vacation can you help me understand this code and add comments so
>>>> I remember the reasons for some of this?
>>>
>>> Sure - I will decorate the code with some comments to help
>>> understanding. I will send an updated webrev when I get your reply
>>> regarding the typename naming convention verdict.
>>
>> That's my opinion anyway. David might have the opposite opinion.
>
> David? I am curious if you have the same opinion. If you both want to
> replace the template names I and P with T, then I am happy to do that.
>
> Thanks for the review.
>
> /Erik
>
>> Thanks,
>> Coleen
>>
>>>
>>> Thanks for the review!
>>>
>>> /Erik
>>>
>>>>
>>>> Thanks!
>>>> Coleen
>>>>
>>>>
>>>> On 8/31/17 8:45 AM, Erik Österlund wrote:
>>>>> Hi everyone,
>>>>>
>>>>> Bug ID:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8186838
>>>>>
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>>>>>
>>>>> The time has come for the next step in generalizing Atomic with
>>>>> templates. Today I will focus on Atomic::inc/dec.
>>>>>
>>>>> I have tried to mimic the new Kim style that seems to have been
>>>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the
>>>>> structure looks like this:
>>>>>
>>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function
>>>>> object that performs some basic type checks.
>>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can
>>>>> define the operation arbitrarily for a given platform. The default
>>>>> implementation if not specialized for a platform is to call
>>>>> Atomic::add. So only platforms that want to do something different
>>>>> than that as an optimization have to provide a specialization.
>>>>> Layer 3) Platforms that decide to specialize
>>>>> PlatformInc/PlatformDec to be more optimized may inherit from a
>>>>> helper class IncUsingConstant/DecUsingConstant. This helper helps
>>>>> performing the necessary computation what the increment/decrement
>>>>> should be after pointer scaling using CRTP. The
>>>>> PlatformInc/PlatformDec operation then only needs to define an
>>>>> inc/dec member function, and will then get all the context
>>>>> information necessary to generate a more optimized implementation.
>>>>> Easy peasy.
>>>>>
>>>>> It is worth noticing that the generalized Atomic::dec operation
>>>>> assumes a two's complement integer machine and potentially sends
>>>>> the unary negative of a potentially unsigned type to Atomic::add. I
>>>>> have the following comments about this:
>>>>> 1) We already assume in other code that two's complement integers
>>>>> must be present.
>>>>> 2) A machine that does not have two's complement integers may still
>>>>> simply provide a specialization that solves the problem in a
>>>>> different way.
>>>>> 3) The alternative that does not make assumptions about that would
>>>>> use the good old IntegerTypes::cast_to_signed metaprogramming
>>>>> stuff, and I seem to recall we thought that was a bit too involved
>>>>> and complicated.
>>>>> This is the reason why I have chosen to use unary minus on the
>>>>> potentially unsigned type in the shared helper code that sends the
>>>>> decrement as an addend to Atomic::add.
>>>>>
>>>>> It would also be nice if somebody with access to PPC and s390
>>>>> machines could try out the relevant changes there so I do not
>>>>> accidentally break those platforms. I have blind-coded the addition
>>>>> of the immediate values passed in to the inline assembly in a way
>>>>> that I think looks like it should work.
>>>>>
>>>>> Testing:
>>>>> RBT hs-tier3, JPRT --testset hotspot
>>>>>
>>>>> Thanks,
>>>>> /Erik
>>>>
>>>
>>
>

From david.holmes at oracle.com Fri Sep 1 21:57:14 2017
From: david.holmes at oracle.com (David Holmes)
Date: Sat, 2 Sep 2017 07:57:14 +1000
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <59A96B9D.6070002@oracle.com>
References: <59A804F8.9000501@oracle.com>
 <59A96B9D.6070002@oracle.com>
Message-ID:

On 2/09/2017 12:15 AM, Erik Österlund wrote:
> Hi Andrew,
>
> On 2017-09-01 15:41, Andrew Haley wrote:
>> On 31/08/17 13:45, Erik Österlund wrote:
>>> Hi everyone,
>>>
>>> Bug ID:
>>> https://bugs.openjdk.java.net/browse/JDK-8186838
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>>>
>>> The time has come for the next step in generalizing Atomic with
>>> templates. Today I will focus on Atomic::inc/dec.
>>>
>>> I have tried to mimic the new Kim style that seems to have been
>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the
>>> structure looks like this:
>>>
>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object
>>> that performs some basic type checks.
>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can define
>>> the operation arbitrarily for a given platform. The default
>>> implementation if not specialized for a platform is to call Atomic::add.
>>> So only platforms that want to do something different than that as an
>>> optimization have to provide a specialization.
>>> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec to
>>> be more optimized may inherit from a helper class
>>> IncUsingConstant/DecUsingConstant. This helper helps performing the
>>> necessary computation what the increment/decrement should be after
>>> pointer scaling using CRTP. The PlatformInc/PlatformDec operation then
>>> only needs to define an inc/dec member function, and will then get all
>>> the context information necessary to generate a more optimized
>>> implementation. Easy peasy.
>> I wanted to say something nice, but I honestly can't. I am dismayed.
>
> Okay.
>
>> I hoped that inc/dec would turn out to be much simpler than the
>> cmpxchg functions: I think they should, because they don't have to
>> deal with the complexity of potentially three different types.
>> Instead we have, again, a large and complex patch.
>>
>> Even on AArch64, which should be the simplest case because Atomic::inc
>> can be defined as
>>
>> template
>> inc(T1 *dest) {
>>   return __sync_add_and_fetch(dest, 1);
>> }
>
> AArch64 is indeed the simplest case. It does not have a specialization
> in my patch. It simply expresses Atomic::inc in terms of Atomic::add.
>
>> or something similar, we have
>>
>> Atomic::inc
>> Atomic::IncImpl::operator()
>> Atomic::PlatformInc<4ul, IntegralConstant >::operator()
>> Atomic::add
>> Atomic::AddImpl::operator()
>> Atomic::AddAndFetch >::operator()
>> Atomic::PlatformAdd<4ul>::add_and_fetch
>> __sync_add_and_fetch
>>
>> I quite understand that it isn't so easy on some systems, and they
>> need a generic form that explodes into four different calls, one for
>> each size of integer. I completely accept that it will be more
>> complex for everything else. But is it necessary to have so much code
>> for something so simple? This is a 1400 line patch. Granted, much of
>> it is simply moving stuff around, but despite the potential of
>> template code to simplify the implementation we have a more complex
>> solution than we had before.
>>
>> I ask you, is this the simplest solution that you believe is possible?
>
> It is not the simplest solution I can think of. The simplest solution I
> can think of is to remove all specialized versions of Atomic::inc/dec
> and just have it call Atomic::add directly. That would remove the
That would remove the I don't think this is the source of complexity that screams for simplification. It is all the template stuff. I can't get right into this right now (it's Saturday) but I still don't see why we have such seemingly different things in cmpxchg, add and inc/dec when the basic jobs at each level are the same. Maybe it is just use of different names that is confusing me. David ----- > optimizations we have today, for whatever reason we have them. It would > lead to slightly more conservative fencing on PPC/S390, and would lead > to slightly less optimal machine encoding on x86 (without immediate > values in the instructions). But it would be simpler for sure. I did not > put any judgement into whether our existing optimizations are worthwhile > or not. But if you want to prioritize simplicity, removing those > optimizations is one possible solution. Would you prefer that? > > Thanks, > /Erik From serguei.spitsyn at oracle.com Fri Sep 1 22:48:15 2017 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Fri, 1 Sep 2017 15:48:15 -0700 Subject: RFR (S) 8081323: ConstantPool::_resolved_references is missing in heap dump In-Reply-To: <915c3300-2528-2b85-2492-0b54a783c622@oracle.com> References: <915c3300-2528-2b85-2492-0b54a783c622@oracle.com> Message-ID: Hi Coleen, The fix looks good. Thanks, Serguei On 8/31/17 09:02, coleen.phillimore at oracle.com wrote: > Summary: Add resolved_references and init_lock as hidden static field > in class so root is found. > > Tested manually with YourKit. See bug for images. Also ran > serviceability tests. 
>
> open webrev at http://cr.openjdk.java.net/~coleenp/8081323.01/webrev
> bug link https://bugs.openjdk.java.net/browse/JDK-8081323
>
> Thanks,
> Coleen
>

From rohitarulraj at gmail.com Sat Sep 2 08:16:33 2017
From: rohitarulraj at gmail.com (Rohit Arul Raj)
Date: Sat, 2 Sep 2017 13:46:33 +0530
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To:
References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com>
Message-ID:

Hello Vladimir,

> Changes look good. Only question I have is about MaxVectorSize. It is set > 16
> only in presence of AVX:
>
> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>
> Does that code works for AMD 17h too?

Thanks for pointing that out. Yes, the code works fine for AMD 17h. So
I have removed the surplus check for MaxVectorSize from my patch. I
have updated, re-tested and attached the patch.

I have one query regarding the setting of UseSHA flag:
http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821

AMD 17h has support for SHA.
AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets
enabled for it based on the availability of BMI2 and AVX2. Is there an
underlying reason for this? I have handled this in the patch but just
wanted to confirm.

Thanks for taking time to review the code.

diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
--- a/src/cpu/x86/vm/vm_version_x86.cpp
+++ b/src/cpu/x86/vm/vm_version_x86.cpp
@@ -1088,6 +1088,22 @@
       }
       FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
     }
+    if (supports_sha()) {
+      if (FLAG_IS_DEFAULT(UseSHA)) {
+        FLAG_SET_DEFAULT(UseSHA, true);
+      }
+    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
+      if (!FLAG_IS_DEFAULT(UseSHA) ||
+          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
+          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
+          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
+        warning("SHA instructions are not available on this CPU");
+      }
+      FLAG_SET_DEFAULT(UseSHA, false);
+      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
+      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
+      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
+    }
 
     // some defaults for AMD family 15h
     if ( cpu_family() == 0x15 ) {
@@ -1109,11 +1125,40 @@
     }
 
 #ifdef COMPILER2
-    if (MaxVectorSize > 16) {
-      // Limit vectors size to 16 bytes on current AMD cpus.
+    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
+      // Limit vectors size to 16 bytes on AMD cpus < 17h.
       FLAG_SET_DEFAULT(MaxVectorSize, 16);
     }
 #endif // COMPILER2
+
+    // Some defaults for AMD family 17h
+    if ( cpu_family() == 0x17 ) {
+      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
+      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
+        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
+      }
+      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
+        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
+      }
+      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
+        FLAG_SET_DEFAULT(UseBMI2Instructions, true);
+      }
+      if (UseSHA) {
+        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
+          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
+        } else if (UseSHA512Intrinsics) {
+          warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
+          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
+        }
+      }
+#ifdef COMPILER2
+      if (supports_sse4_2()) {
+        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
+          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
+        }
+      }
+#endif
+    }
   }
 
   if( is_intel() ) { // Intel cpus specific settings
diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
--- a/src/cpu/x86/vm/vm_version_x86.hpp
+++ b/src/cpu/x86/vm/vm_version_x86.hpp
@@ -505,6 +505,14 @@
       result |= CPU_CLMUL;
     if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
       result |= CPU_RTM;
+    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
+      result |= CPU_ADX;
+    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
+      result |= CPU_BMI2;
+    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
+      result |= CPU_SHA;
+    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
+      result |= CPU_FMA;
 
     // AMD features.
     if (is_amd()) {
@@ -515,19 +523,13 @@
         result |= CPU_LZCNT;
       if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
         result |= CPU_SSE4A;
+      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
+        result |= CPU_HT;
     }
     // Intel features.
     if(is_intel()) {
-      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
-        result |= CPU_ADX;
-      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
-        result |= CPU_BMI2;
-      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
-        result |= CPU_SHA;
       if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
         result |= CPU_LZCNT;
-      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
-        result |= CPU_FMA;
       // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
       if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
         result |= CPU_3DNOW_PREFETCH;

Regards,
Rohit

> On 9/1/17 8:04 AM, Rohit Arul Raj wrote:
>>
>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj wrote:
>>>
>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes wrote:
>>>>
>>>> Hi Rohit,
>>>>
>>>> I think the patch needs updating for jdk10 as I already see a lot of
>>>> logic around UseSHA in vm_version_x86.cpp.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>
>>> Thanks David, I will update the patch wrt JDK10 source base, test and
>>> resubmit for review.
>>>
>>> Regards,
>>> Rohit
>>>
>>
>> Hi All,
>>
>> I have updated the patch wrt openjdk10/hotspot (parent:
>> 13519:71337910df60), did regression testing using jtreg ($make
>> default) and didnt find any regressions.
>>
>> Can anyone please volunteer to review this patch which sets flag/ISA
>> defaults for newer AMD 17h (EPYC) processor?
>>
>> ************************* Patch ****************************
>>
>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>> @@ -1088,6 +1088,22 @@
>>        }
>>        FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>      }
>> +    if (supports_sha()) {
>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>> +        FLAG_SET_DEFAULT(UseSHA, true);
>> +      }
>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>> +        warning("SHA instructions are not available on this CPU");
>> +      }
>> +      FLAG_SET_DEFAULT(UseSHA, false);
>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>> +    }
>>
>>      // some defaults for AMD family 15h
>>      if ( cpu_family() == 0x15 ) {
>> @@ -1109,11 +1125,43 @@
>>      }
>>
>>  #ifdef COMPILER2
>> -    if (MaxVectorSize > 16) {
>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>        FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>      }
>>  #endif // COMPILER2
>> +
>> +    // Some defaults for AMD family 17h
>> +    if ( cpu_family() == 0x17 ) {
>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>> +        UseXMMForArrayCopy = true;
>> +      }
>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>> +        UseUnalignedLoadStores = true;
>> +      }
>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>> +        UseBMI2Instructions = true;
>> +      }
>> +      if (MaxVectorSize > 32) {
>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>> +      }
>> +      if (UseSHA) {
>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>> +        } else if (UseSHA512Intrinsics) {
>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>> +        }
>> +      }
>> +#ifdef COMPILER2
>> +      if (supports_sse4_2()) {
>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>> +        }
>> +      }
>> +#endif
>> +    }
>>    }
>>
>>    if( is_intel() ) { // Intel cpus specific settings
>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>> @@ -505,6 +505,14 @@
>>        result |= CPU_CLMUL;
>>      if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>        result |= CPU_RTM;
>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>> +      result |= CPU_ADX;
>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>> +      result |= CPU_BMI2;
>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>> +      result |= CPU_SHA;
>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>> +      result |= CPU_FMA;
>>
>>      // AMD features.
>>      if (is_amd()) {
>> @@ -515,19 +523,13 @@
>>          result |= CPU_LZCNT;
>>        if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>          result |= CPU_SSE4A;
>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>> +        result |= CPU_HT;
>>      }
>>      // Intel features.
>>      if(is_intel()) {
>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>> -        result |= CPU_ADX;
>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>> -        result |= CPU_BMI2;
>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>> -        result |= CPU_SHA;
>>        if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>          result |= CPU_LZCNT;
>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>> -        result |= CPU_FMA;
>>        // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
>>        if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>          result |= CPU_3DNOW_PREFETCH;
>>
>> **************************************************************
>>
>> Thanks,
>> Rohit
>>
>>>>
>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>>>>
>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes wrote:
>>>>>>
>>>>>> Hi Rohit,
>>>>>>
>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>>>>
>>>>>>> I would like an volunteer to review this patch (openJDK9) which sets
>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with
>>>>>>> the commit process.
>>>>>>>
>>>>>>> Webrev:
>>>>>>>
>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>>>>
>>>>>> Unfortunately patches can not be accepted from systems outside the
>>>>>> OpenJDK infrastructure and ...
>>>>>>
>>>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>>>
>>>>>> ... unfortunately patches tend to get stripped by the mail servers. If
>>>>>> the patch is small please include it inline. Otherwise you will need to
>>>>>> find an OpenJDK Author who can host it for you on cr.openjdk.java.net.
>>>>>> >>>>> >>>>>>> 3) I have done regression testing using jtreg ($make default) and >>>>>>> didnt find any regressions. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Sounds good, but until I see the patch it is hard to comment on >>>>>> testing >>>>>> requirements. >>>>>> >>>>>> Thanks, >>>>>> David >>>>> >>>>> >>>>> >>>>> Thanks David, >>>>> Yes, it's a small patch. >>>>> >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> @@ -1051,6 +1051,22 @@ >>>>> } >>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>> } >>>>> + if (supports_sha()) { >>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>> + } >>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>>> UseSHA512Intrinsics) { >>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>> + warning("SHA instructions are not available on this CPU"); >>>>> + } >>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } >>>>> >>>>> // some defaults for AMD family 15h >>>>> if ( cpu_family() == 0x15 ) { >>>>> @@ -1072,11 +1088,43 @@ >>>>> } >>>>> >>>>> #ifdef COMPILER2 >>>>> - if (MaxVectorSize > 16) { >>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>> } >>>>> #endif // COMPILER2 >>>>> + >>>>> + // Some defaults for AMD family 17h >>>>> + if ( cpu_family() == 0x17 ) { >>>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>>> Array Copy >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>> + UseXMMForArrayCopy = true; >>>>> + } >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>> { >>>>> + UseUnalignedLoadStores = true; >>>>> + } >>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>> + UseBMI2Instructions = true; >>>>> + } >>>>> + if (MaxVectorSize > 32) { >>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>> + } >>>>> + if (UseSHA) { >>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } else if (UseSHA512Intrinsics) { >>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>> functions not available on this CPU."); >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } >>>>> + } >>>>> +#ifdef COMPILER2 >>>>> + if (supports_sse4_2()) { >>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>> + } >>>>> + } >>>>> +#endif >>>>> + } >>>>> } >>>>> >>>>> if( is_intel() ) { // Intel cpus specific settings >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> @@ -513,6 +513,16 @@ >>>>> result |= CPU_LZCNT; >>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>> result |= CPU_SSE4A; >>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>> + result |= CPU_BMI2; >>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>> + result |= CPU_HT; >>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>> + result |= CPU_ADX; >>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>> + result |= CPU_SHA; >>>>> + if 
(_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>> + result |= CPU_FMA; >>>>> } >>>>> // Intel features. >>>>> if(is_intel()) { >>>>> >>>>> Regards, >>>>> Rohit >>>>> >>>> > From aph at redhat.com Sat Sep 2 08:31:46 2017 From: aph at redhat.com (Andrew Haley) Date: Sat, 2 Sep 2017 09:31:46 +0100 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59A96B9D.6070002@oracle.com> References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> Message-ID: On 01/09/17 15:15, Erik Österlund wrote: > It is not the simplest solution I can think of. The simplest solution I > can think of is to remove all specialized versions of Atomic::inc/dec > and just have it call Atomic::add directly. That would remove the > optimizations we have today, for whatever reason we have them. It would > lead to slightly more conservative fencing on PPC/S390, I see. Can you say what instructions would be different? > and would lead to slightly less optimal machine encoding on x86 > (without immediate values in the instructions). But it would be > simpler for sure. I did not put any judgement into whether our > existing optimizations are worthwhile or not. But if you want to > prioritize simplicity, removing those optimizations is one possible > solution. Would you prefer that? Is this really about optimization? If we cared about getting this stuff as optimized as possible we'd use intrinsics on GCC/x86 targets. These have been supported for a long time. But it seems we're determined to preserve the legacy assembly-language implementations and use them everywhere, even where they are not necessary. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From jesper.wilhelmsson at oracle.com Sat Sep 2 10:15:30 2017 From: jesper.wilhelmsson at oracle.com (jesper.wilhelmsson at oracle.com) Date: Sat, 2 Sep 2017 12:15:30 +0200 Subject: jdk10/hs integration status Message-ID: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com> Hi, After going through the results of our nightlies it seems we are in fairly good shape for integration. There was one issue with a typo in a recent fix that caused some failures, this issue was resolved yesterday just after the nightly snapshot was taken. There is currently one issue that I didn't recognise and at the moment it is marked as an integration blocker: JDK-8187124 TestInterpreterMethodEntries.java: Unable to create shared archive file This could as well be a problem with the test execution in which case it is not a blocker, but someone needs to look into the details here. There are four test failures that looks slightly different: tools/jar/modularJar/Basic.java tools/jar/multiRelease/ApiValidatorTest.java tools/jar/multiRelease/Basic.java tools/launcher/InfoStreams.java These four tests fails because they get a warning on stderr: Option MaxRAMFraction was deprecated in version 10.0 and will likely be removed in a future release. I do not consider this a blocker for integration, bug filed: JDK-8187125 JDK10/hs now has restricted write access. Basically it is locked but in order to fix any urgent issues that might pop up over the next couple of days these people have write access: Vladimir Kozlov, Dan Daugherty, Stefan Karlsson, and myself. 
/Jesper From david.holmes at oracle.com Sat Sep 2 11:03:03 2017 From: david.holmes at oracle.com (David Holmes) Date: Sat, 2 Sep 2017 21:03:03 +1000 Subject: jdk10/hs integration status In-Reply-To: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com> References: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com> Message-ID: <9657c73c-ded2-c12e-587f-fa18ceda63a7@oracle.com> Hi Jesper, On 2/09/2017 8:15 PM, jesper.wilhelmsson at oracle.com wrote: > Hi, > > After going through the results of our nightlies it seems we are in fairly good shape for integration. There was one issue with a typo in a recent fix that caused some failures, this issue was resolved yesterday just after the nightly snapshot was taken. The JPRT job that was used for the nightly testing was not valid. The repos were out of sync due to a re-run with an intervening integration job. The MaxRAMFraction failures in the tools tests below were caused by that. David ----- > > There is currently one issue that I didn't recognise and at the moment it is marked as an integration blocker: > > JDK-8187124 > TestInterpreterMethodEntries.java: Unable to create shared archive file > > This could as well be a problem with the test execution in which case it is not a blocker, but someone needs to look into the details here. > > > There are four test failures that looks slightly different: > tools/jar/modularJar/Basic.java > tools/jar/multiRelease/ApiValidatorTest.java > tools/jar/multiRelease/Basic.java > tools/launcher/InfoStreams.java > > These four tests fails because they get a warning on stderr: > Option MaxRAMFraction was deprecated in version 10.0 and will likely be removed in a future release. > I do not consider this a blocker for integration, bug filed: JDK-8187125 > > > JDK10/hs now has restricted write access. 
Basically it is locked but in order to fix any urgent issues that might pop up over the next couple of days these people have write access: Vladimir Kozlov, Dan Daugherty, Stefan Karlsson, and myself. > > /Jesper > From jesper.wilhelmsson at oracle.com Sat Sep 2 12:30:45 2017 From: jesper.wilhelmsson at oracle.com (jesper.wilhelmsson at oracle.com) Date: Sat, 2 Sep 2017 14:30:45 +0200 Subject: jdk10/hs integration status In-Reply-To: <9657c73c-ded2-c12e-587f-fa18ceda63a7@oracle.com> References: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com> <9657c73c-ded2-c12e-587f-fa18ceda63a7@oracle.com> Message-ID: <8B87D638-C2CC-42CB-8C94-8DE3F4F6AC81@oracle.com> > On 2 Sep 2017, at 13:03, David Holmes wrote: > > Hi Jesper, > > On 2/09/2017 8:15 PM, jesper.wilhelmsson at oracle.com wrote: >> Hi, >> After going through the results of our nightlies it seems we are in fairly good shape for integration. There was one issue with a typo in a recent fix that caused some failures, this issue was resolved yesterday just after the nightly snapshot was taken. > > The JPRT job that was used for the nightly testing was not valid. The repos were out of sync due to a re-run with an intervening integration job. The MaxRAMFraction failures in the tools tests below were caused by that. Sigh... I thought we didn't use rerun for integration jobs. Thanks for the heads-up David! I'll start a new nightly now to get a trustworthy result. /Jesper > > David > ----- > >> There is currently one issue that I didn't recognise and at the moment it is marked as an integration blocker: >> JDK-8187124 >> TestInterpreterMethodEntries.java: Unable to create shared archive file >> This could as well be a problem with the test execution in which case it is not a blocker, but someone needs to look into the details here. 
>> There are four test failures that looks slightly different: >> tools/jar/modularJar/Basic.java >> tools/jar/multiRelease/ApiValidatorTest.java >> tools/jar/multiRelease/Basic.java >> tools/launcher/InfoStreams.java >> These four tests fails because they get a warning on stderr: >> Option MaxRAMFraction was deprecated in version 10.0 and will likely be removed in a future release. >> I do not consider this a blocker for integration, bug filed: JDK-8187125 >> JDK10/hs now has restricted write access. Basically it is locked but in order to fix any urgent issues that might pop up over the next couple of days these people have write access: Vladimir Kozlov, Dan Daugherty, Stefan Karlsson, and myself. >> /Jesper From daniel.daugherty at oracle.com Sat Sep 2 14:49:36 2017 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Sat, 2 Sep 2017 08:49:36 -0600 Subject: jdk10/hs integration status In-Reply-To: <8B87D638-C2CC-42CB-8C94-8DE3F4F6AC81@oracle.com> References: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com> <9657c73c-ded2-c12e-587f-fa18ceda63a7@oracle.com> <8B87D638-C2CC-42CB-8C94-8DE3F4F6AC81@oracle.com> Message-ID: We're not completely out of the woods. These tests: tools/jar/modularJar/Basic.java tools/jar/multiRelease/ApiValidatorTest.java tools/jar/multiRelease/Basic.java tools/launcher/InfoStreams.java still failed in the 2017-09-01 JDK10-hs nightly with: java.lang.AssertionError: Unknown value Java HotSpot(TM) 64-Bit Server VM warning: Option MaxRAMFraction was deprecated in version 10.0 and will likely be removed in a future release. Dan On 9/2/17 6:30 AM, jesper.wilhelmsson at oracle.com wrote: >> On 2 Sep 2017, at 13:03, David Holmes wrote: >> >> Hi Jesper, >> >> On 2/09/2017 8:15 PM, jesper.wilhelmsson at oracle.com wrote: >>> Hi, >>> After going through the results of our nightlies it seems we are in fairly good shape for integration. 
There was one issue with a typo in a recent fix that caused some failures, this issue was resolved yesterday just after the nightly snapshot was taken. >> The JPRT job that was used for the nightly testing was not valid. The repos were out of sync due to a re-run with an intervening integration job. The MaxRAMFraction failures in the tools tests below were caused by that. > Sigh... I thought we didn't use rerun for integration jobs. > > Thanks for the heads-up David! I'll start a new nightly now to get a trustworthy result. > > /Jesper > > >> David >> ----- >> >>> There is currently one issue that I didn't recognise and at the moment it is marked as an integration blocker: >>> JDK-8187124 >>> TestInterpreterMethodEntries.java: Unable to create shared archive file >>> This could as well be a problem with the test execution in which case it is not a blocker, but someone needs to look into the details here. >>> There are four test failures that looks slightly different: >>> tools/jar/modularJar/Basic.java >>> tools/jar/multiRelease/ApiValidatorTest.java >>> tools/jar/multiRelease/Basic.java >>> tools/launcher/InfoStreams.java >>> These four tests fails because they get a warning on stderr: >>> Option MaxRAMFraction was deprecated in version 10.0 and will likely be removed in a future release. >>> I do not consider this a blocker for integration, bug filed: JDK-8187125 >>> JDK10/hs now has restricted write access. Basically it is locked but in order to fix any urgent issues that might pop up over the next couple of days these people have write access: Vladimir Kozlov, Dan Daugherty, Stefan Karlsson, and myself. 
>>> /Jesper > From vladimir.kozlov at oracle.com Sat Sep 2 17:55:31 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sat, 2 Sep 2017 10:55:31 -0700 Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults In-Reply-To: References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com> Message-ID: Hi Rohit, On 9/2/17 1:16 AM, Rohit Arul Raj wrote: > Hello Vladimir, > >> Changes look good. Only question I have is about MaxVectorSize. It is set > >> 16 only in presence of AVX: >> >> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945 >> >> Does that code works for AMD 17h too? > > Thanks for pointing that out. Yes, the code works fine for AMD 17h. So > I have removed the surplus check for MaxVectorSize from my patch. I > have updated, re-tested and attached the patch. Which check you removed? > > I have one query regarding the setting of UseSHA flag: > http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821 > > AMD 17h has support for SHA. > AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets > enabled for it based on the availability of BMI2 and AVX2. Is there an > underlying reason for this? I have handled this in the patch but just > wanted to confirm. It was done with next changes which use only AVX2 and BMI2 instructions to calculate SHA-256: http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974 I don't know if AMD 15h supports these instructions and can execute that code. You need to test it. May be you should move your new UseSHA related code to the line 821 to set UseSHA for AMD. Then you don't need to overwrite UseSHA*Intrinsics flags which are set after that line. Regards, Vladimir > > Thanks for taking time to review the code. 
> > diff --git a/src/cpu/x86/vm/vm_version_x86.cpp > b/src/cpu/x86/vm/vm_version_x86.cpp > --- a/src/cpu/x86/vm/vm_version_x86.cpp > +++ b/src/cpu/x86/vm/vm_version_x86.cpp > @@ -1088,6 +1088,22 @@ > } > FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); > } > + if (supports_sha()) { > + if (FLAG_IS_DEFAULT(UseSHA)) { > + FLAG_SET_DEFAULT(UseSHA, true); > + } > + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || > UseSHA512Intrinsics) { > + if (!FLAG_IS_DEFAULT(UseSHA) || > + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || > + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || > + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { > + warning("SHA instructions are not available on this CPU"); > + } > + FLAG_SET_DEFAULT(UseSHA, false); > + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); > + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); > + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); > + } > > // some defaults for AMD family 15h > if ( cpu_family() == 0x15 ) { > @@ -1109,11 +1125,40 @@ > } > > #ifdef COMPILER2 > - if (MaxVectorSize > 16) { > - // Limit vectors size to 16 bytes on current AMD cpus. > + if (cpu_family() < 0x17 && MaxVectorSize > 16) { > + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
> FLAG_SET_DEFAULT(MaxVectorSize, 16); > } > #endif // COMPILER2 > + > + // Some defaults for AMD family 17h > + if ( cpu_family() == 0x17 ) { > + // On family 17h processors use XMM and UnalignedLoadStores for > Array Copy > + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { > + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); > + } > + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { > + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); > + } > + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { > + FLAG_SET_DEFAULT(UseBMI2Instructions, true); > + } > + if (UseSHA) { > + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { > + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); > + } else if (UseSHA512Intrinsics) { > + warning("Intrinsics for SHA-384 and SHA-512 crypto hash > functions not available on this CPU."); > + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); > + } > + } > +#ifdef COMPILER2 > + if (supports_sse4_2()) { > + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { > + FLAG_SET_DEFAULT(UseFPUForSpilling, true); > + } > + } > +#endif > + } > } > > if( is_intel() ) { // Intel cpus specific settings > diff --git a/src/cpu/x86/vm/vm_version_x86.hpp > b/src/cpu/x86/vm/vm_version_x86.hpp > --- a/src/cpu/x86/vm/vm_version_x86.hpp > +++ b/src/cpu/x86/vm/vm_version_x86.hpp > @@ -505,6 +505,14 @@ > result |= CPU_CLMUL; > if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) > result |= CPU_RTM; > + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) > + result |= CPU_ADX; > + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) > + result |= CPU_BMI2; > + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) > + result |= CPU_SHA; > + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) > + result |= CPU_FMA; > > // AMD features. > if (is_amd()) { > @@ -515,19 +523,13 @@ > result |= CPU_LZCNT; > if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) > result |= CPU_SSE4A; > + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) > + result |= CPU_HT; > } > // Intel features. 
> if(is_intel()) { > - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) > - result |= CPU_ADX; > - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) > - result |= CPU_BMI2; > - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) > - result |= CPU_SHA; > if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) > result |= CPU_LZCNT; > - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) > - result |= CPU_FMA; > // for Intel, ecx.bits.misalignsse bit (bit 8) indicates > support for prefetchw > if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { > result |= CPU_3DNOW_PREFETCH; > > > Regards, > Rohit > > > >> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>> >>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>> wrote: >>>> >>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>> wrote: >>>>> >>>>> Hi Rohit, >>>>> >>>>> I think the patch needs updating for jdk10 as I already see a lot of >>>>> logic >>>>> around UseSHA in vm_version_x86.cpp. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>> >>>> Thanks David, I will update the patch wrt JDK10 source base, test and >>>> resubmit for review. >>>> >>>> Regards, >>>> Rohit >>>> >>> >>> Hi All, >>> >>> I have updated the patch wrt openjdk10/hotspot (parent: >>> 13519:71337910df60), did regression testing using jtreg ($make >>> default) and didnt find any regressions. >>> >>> Can anyone please volunteer to review this patch which sets flag/ISA >>> defaults for newer AMD 17h (EPYC) processor? 
>>> >>> ************************* Patch **************************** >>> >>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>> b/src/cpu/x86/vm/vm_version_x86.cpp >>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>> @@ -1088,6 +1088,22 @@ >>> } >>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>> } >>> + if (supports_sha()) { >>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>> + FLAG_SET_DEFAULT(UseSHA, true); >>> + } >>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>> UseSHA512Intrinsics) { >>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>> + warning("SHA instructions are not available on this CPU"); >>> + } >>> + FLAG_SET_DEFAULT(UseSHA, false); >>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>> + } >>> >>> // some defaults for AMD family 15h >>> if ( cpu_family() == 0x15 ) { >>> @@ -1109,11 +1125,43 @@ >>> } >>> >>> #ifdef COMPILER2 >>> - if (MaxVectorSize > 16) { >>> - // Limit vectors size to 16 bytes on current AMD cpus. >>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>> } >>> #endif // COMPILER2 >>> + >>> + // Some defaults for AMD family 17h >>> + if ( cpu_family() == 0x17 ) { >>> + // On family 17h processors use XMM and UnalignedLoadStores for >>> Array Copy >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>> + UseXMMForArrayCopy = true; >>> + } >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>> + UseUnalignedLoadStores = true; >>> + } >>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>> + UseBMI2Instructions = true; >>> + } >>> + if (MaxVectorSize > 32) { >>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>> + } >>> + if (UseSHA) { >>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>> + } else if (UseSHA512Intrinsics) { >>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>> functions not available on this CPU."); >>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>> + } >>> + } >>> +#ifdef COMPILER2 >>> + if (supports_sse4_2()) { >>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>> + } >>> + } >>> +#endif >>> + } >>> } >>> >>> if( is_intel() ) { // Intel cpus specific settings >>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>> b/src/cpu/x86/vm/vm_version_x86.hpp >>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>> @@ -505,6 +505,14 @@ >>> result |= CPU_CLMUL; >>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>> result |= CPU_RTM; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>> + result |= CPU_ADX; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>> + result |= CPU_BMI2; >>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>> + result |= CPU_SHA; >>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>> + result |= CPU_FMA; >>> >>> // AMD features. 
>>> if (is_amd()) { >>> @@ -515,19 +523,13 @@ >>> result |= CPU_LZCNT; >>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>> result |= CPU_SSE4A; >>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>> + result |= CPU_HT; >>> } >>> // Intel features. >>> if(is_intel()) { >>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>> - result |= CPU_ADX; >>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>> - result |= CPU_BMI2; >>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>> - result |= CPU_SHA; >>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>> result |= CPU_LZCNT; >>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>> - result |= CPU_FMA; >>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>> support for prefetchw >>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>> result |= CPU_3DNOW_PREFETCH; >>> >>> ************************************************************** >>> >>> Thanks, >>> Rohit >>> >>>>> >>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>> >>>>>> >>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> Hi Rohit, >>>>>>> >>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I would like an volunteer to review this patch (openJDK9) which sets >>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with >>>>>>>> the commit process. >>>>>>>> >>>>>>>> Webrev: >>>>>>>> >>>>>>>> >>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Unfortunately patches can not be accepted from systems outside the >>>>>>> OpenJDK >>>>>>> infrastructure and ... >>>>>>> >>>>>>>> I have also attached the patch (hg diff -g) for reference. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> ... unfortunately patches tend to get stripped by the mail servers. If >>>>>>> the >>>>>>> patch is small please include it inline. 
Otherwise you will need to >>>>>>> find >>>>>>> an >>>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net. >>>>>>> >>>>>> >>>>>>>> 3) I have done regression testing using jtreg ($make default) and >>>>>>>> didnt find any regressions. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Sounds good, but until I see the patch it is hard to comment on >>>>>>> testing >>>>>>> requirements. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>> >>>>>> >>>>>> >>>>>> Thanks David, >>>>>> Yes, it's a small patch. >>>>>> >>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> @@ -1051,6 +1051,22 @@ >>>>>> } >>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>> } >>>>>> + if (supports_sha()) { >>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>> + } >>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>>>> UseSHA512Intrinsics) { >>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>> + warning("SHA instructions are not available on this CPU"); >>>>>> + } >>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>> + } >>>>>> >>>>>> // some defaults for AMD family 15h >>>>>> if ( cpu_family() == 0x15 ) { >>>>>> @@ -1072,11 +1088,43 @@ >>>>>> } >>>>>> >>>>>> #ifdef COMPILER2 >>>>>> - if (MaxVectorSize > 16) { >>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>> } >>>>>> #endif // COMPILER2 >>>>>> + >>>>>> + // Some defaults for AMD family 17h >>>>>> + if ( cpu_family() == 0x17 ) { >>>>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>>>> Array Copy >>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>> + UseXMMForArrayCopy = true; >>>>>> + } >>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>> { >>>>>> + UseUnalignedLoadStores = true; >>>>>> + } >>>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>> + UseBMI2Instructions = true; >>>>>> + } >>>>>> + if (MaxVectorSize > 32) { >>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>> + } >>>>>> + if (UseSHA) { >>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>> + } else if (UseSHA512Intrinsics) { >>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>> functions not available on this CPU."); >>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>> + } >>>>>> + } >>>>>> +#ifdef COMPILER2 >>>>>> + if (supports_sse4_2()) { >>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>> + } >>>>>> + } >>>>>> +#endif >>>>>> + } >>>>>> } >>>>>> >>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> @@ -513,6 +513,16 @@ >>>>>> result |= CPU_LZCNT; >>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>> result |= CPU_SSE4A; >>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>> + result |= CPU_BMI2; >>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>> + result |= CPU_HT; >>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>> + result |= CPU_ADX; >>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha 
!= 0) >>>>>> + result |= CPU_SHA; >>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>> + result |= CPU_FMA; >>>>>> } >>>>>> // Intel features. >>>>>> if(is_intel()) { >>>>>> >>>>>> Regards, >>>>>> Rohit >>>>>> >>>>> >> From ioi.lam at oracle.com Sun Sep 3 00:05:55 2017 From: ioi.lam at oracle.com (Ioi Lam) Date: Sat, 2 Sep 2017 17:05:55 -0700 Subject: jdk10/hs integration status In-Reply-To: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com> References: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com> Message-ID: <6962b514-502c-9f56-b87f-ff68a1911847@oracle.com> On 9/2/17 3:15 AM, jesper.wilhelmsson at oracle.com wrote: > Hi, > > After going through the results of our nightlies it seems we are in fairly good shape for integration. There was one issue with a typo in a recent fix that caused some failures, this issue was resolved yesterday just after the nightly snapshot was taken. > > > There is currently one issue that I didn't recognise and at the moment it is marked as an integration blocker: > > JDK-8187124 > TestInterpreterMethodEntries.java: Unable to create shared archive file > > This could as well be a problem with the test execution in which case it is not a blocker, but someone needs to look into the details here. JDK-8187124 is not a new regression and it's a test bug, so I've removed the integration_blocker label. Thanks - Ioi > > There are four test failures that looks slightly different: > tools/jar/modularJar/Basic.java > tools/jar/multiRelease/ApiValidatorTest.java > tools/jar/multiRelease/Basic.java > tools/launcher/InfoStreams.java > > These four tests fails because they get a warning on stderr: > Option MaxRAMFraction was deprecated in version 10.0 and will likely be removed in a future release. > I do not consider this a blocker for integration, bug filed: JDK-8187125 > > > JDK10/hs now has restricted write access. 
Basically it is locked but in order to fix any urgent issues that might pop up over the next couple of days these people have write access: Vladimir Kozlov, Dan Daugherty, Stefan Karlsson, and myself. > > /Jesper > From david.holmes at oracle.com Sun Sep 3 04:40:32 2017 From: david.holmes at oracle.com (David Holmes) Date: Sun, 3 Sep 2017 14:40:32 +1000 Subject: jdk10/hs integration status In-Reply-To: References: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com> <9657c73c-ded2-c12e-587f-fa18ceda63a7@oracle.com> <8B87D638-C2CC-42CB-8C94-8DE3F4F6AC81@oracle.com> Message-ID: On 3/09/2017 12:49 AM, Daniel D. Daugherty wrote: > We're not completely out of the woods. These tests: > > tools/jar/modularJar/Basic.java > tools/jar/multiRelease/ApiValidatorTest.java > tools/jar/multiRelease/Basic.java > tools/launcher/InfoStreams.java > > still failed in the 2017-09-01 JDK10-hs nightly with: > > java.lang.AssertionError: Unknown value Java HotSpot(TM) 64-Bit Server > VM warning: Option MaxRAMFraction was deprecated in version 10.0 and > will likely be removed in a future release. Isn't that the nightly we're talking about Dan? Those tests only fail if the closed repo has not got Bob's changes that switch from MaxRAMFraction to MaxRAMPercentage. David > Dan > > > On 9/2/17 6:30 AM, jesper.wilhelmsson at oracle.com wrote: >>> On 2 Sep 2017, at 13:03, David Holmes wrote: >>> >>> Hi Jesper, >>> >>> On 2/09/2017 8:15 PM, jesper.wilhelmsson at oracle.com wrote: >>>> Hi, >>>> After going through the results of our nightlies it seems we are in >>>> fairly good shape for integration. There was one issue with a typo >>>> in a recent fix that caused some failures, this issue was resolved >>>> yesterday just after the nightly snapshot was taken. >>> The JPRT job that was used for the nightly testing was not valid. The >>> repos were out of sync due to a re-run with an intervening >>> integration job. The MaxRAMFraction failures in the tools tests below >>> were caused by that. 
>> Sigh... I thought we didn't use rerun for integration jobs. >> >> Thanks for the heads-up David! I'll start a new nightly now to get a >> trustworthy result. >> >> /Jesper >> >> >>> David >>> ----- >>> >>>> There is currently one issue that I didn't recognise and at the >>>> moment it is marked as an integration blocker: >>>> JDK-8187124 >>>> TestInterpreterMethodEntries.java: Unable to create shared archive >>>> file >>>> This could as well be a problem with the test execution in which >>>> case it is not a blocker, but someone needs to look into the details >>>> here. >>>> There are four test failures that looks slightly different: >>>> tools/jar/modularJar/Basic.java >>>> tools/jar/multiRelease/ApiValidatorTest.java >>>> tools/jar/multiRelease/Basic.java >>>> tools/launcher/InfoStreams.java >>>> These four tests fails because they get a warning on stderr: >>>> Option MaxRAMFraction was deprecated in version 10.0 and will likely >>>> be removed in a future release. >>>> I do not consider this a blocker for integration, bug filed: >>>> JDK-8187125 >>>> JDK10/hs now has restricted write access. Basically it is locked but >>>> in order to fix any urgent issues that might pop up over the next >>>> couple of days these people have write access: Vladimir Kozlov, Dan >>>> Daugherty, Stefan Karlsson, and myself. >>>> /Jesper >> > From daniel.daugherty at oracle.com Sun Sep 3 04:52:38 2017 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Sat, 2 Sep 2017 22:52:38 -0600 Subject: jdk10/hs integration status In-Reply-To: References: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com> <9657c73c-ded2-c12e-587f-fa18ceda63a7@oracle.com> <8B87D638-C2CC-42CB-8C94-8DE3F4F6AC81@oracle.com> Message-ID: <232afa04-3bf5-1ea5-3462-6b0920b8f0d9@oracle.com> On 9/2/17 10:40 PM, David Holmes wrote: > On 3/09/2017 12:49 AM, Daniel D. Daugherty wrote: >> We're not completely out of the woods. 
These tests: >> >> tools/jar/modularJar/Basic.java >> tools/jar/multiRelease/ApiValidatorTest.java >> tools/jar/multiRelease/Basic.java >> tools/launcher/InfoStreams.java >> >> still failed in the 2017-09-01 JDK10-hs nightly with: >> >> java.lang.AssertionError: Unknown value Java HotSpot(TM) 64-Bit >> Server VM warning: Option MaxRAMFraction was deprecated in version >> 10.0 and will likely be removed in a future release. > > Isn't that the nightly we're talking about Dan? No. The failures that were originally discussed were in the 2017-08-31 JDK10-hs nightly and were mostly caused by Calvin's rerun JPRT job. I found a couple of places in the current JDK10-hs repo where MaxRAMFraction is still used, but none that explain the above four test failures. My conclusion is that they are picking up the MaxRAMFraction from whatever mechanism is being used to launch those tests... These two kitchensink config files still use MaxRAMFraction: ./hotspot/test/closed/applications/kitchensink/kitchensink.default.properties:test.jvm.args=-XX:MaxRAMFraction=2 -XX:+CrashOnOutOfMemoryError -Djava.net.preferIPv6Addresses=false -XX:-PrintVMOptions -XX:+DisplayVMOutputToStderr -XX:+UsePerfData -Xlog:gc*:gc.log -XX:+DisableExplicitGC -XX:+PrintFlagsFinal -XX:+StartAttachListener -XX:+UnlockCommercialFeatures -XX:NativeMemoryTracking=detail -XX:+ResourceManagement -XX:+FlightRecorder ./hotspot/test/closed/applications/kitchensink/kitchensink.default.properties:original.jvm.args=-XX:MaxRAMFraction=8 -Djava.net.preferIPv6Addresses=false Dan
From daniel.daugherty at oracle.com Sun Sep 3 04:56:52 2017 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Sat, 2 Sep 2017 22:56:52 -0600 Subject: jdk10/hs integration status In-Reply-To: <232afa04-3bf5-1ea5-3462-6b0920b8f0d9@oracle.com> References: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com> <9657c73c-ded2-c12e-587f-fa18ceda63a7@oracle.com> <8B87D638-C2CC-42CB-8C94-8DE3F4F6AC81@oracle.com> <232afa04-3bf5-1ea5-3462-6b0920b8f0d9@oracle.com> Message-ID: The test suite execution directory has a jtreg.sh script: 'time' ''/bin/bash'' '/export/home/aginfra/CommonData/jtreg_dir/bin/jtreg' '-testjdk:"/export/home/aginfra/CommonData/TEST_JAVA_HOME"' '-dir:/export/home/aginfra/CommonData/j2se_jdk/jdk//test' '-w:/export/home/aginfra/sandbox/results/workDir' '-r:/export/home/aginfra/sandbox/results/report' '-retain:fail,error' '-status:notRun,error,fail' '-ignore:quiet' '-a' '-javacoptions:' '-javaoption:-Xmixed' '-javaoption:-server' '-javaoption:-XX:MaxRAMPercentage=12.5' ''-k:!ignore'' '-timeout:16' '-verbose:summary' '-nativepath:/export/home/aginfra/sandbox/JTREG_NATIVEPATH_LIBRARY_PREPARED' '-thd:/export/home/aginfra/CommonData/JTREG_EFH_HOME/jtregFailureHandler.jar' '-th:jdk.test.failurehandler.jtreg.GatherProcessInfoTimeoutHandler' '-od:/export/home/aginfra/CommonData/JTREG_EFH_HOME/jtregFailureHandler.jar' '-o:jdk.test.failurehandler.jtreg.GatherDiagnosticInfoObserver' '-J-Djava.library.path='/export/home/aginfra/CommonData/JTREG_EFH_HOME'' '-othervm' '-conc:3' '-vmoptions:'-XX:MaxRAMFraction=6'' '-exclude:/export/home/aginfra/sandbox/results/exclude1.jtx' '-exclude:/export/home/aginfra/sandbox/results/exclude2.jtx' 'tools' so whatever Aurora/RBT used to setup this job added: -XX:MaxRAMFraction=6 Dan
From daniel.daugherty at oracle.com Sun Sep 3 05:01:19 2017 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Sat, 2 Sep 2017 23:01:19 -0600 Subject: jdk10/hs integration status In-Reply-To: References: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com> <9657c73c-ded2-c12e-587f-fa18ceda63a7@oracle.com> <8B87D638-C2CC-42CB-8C94-8DE3F4F6AC81@oracle.com> <232afa04-3bf5-1ea5-3462-6b0920b8f0d9@oracle.com> Message-ID: <93becaf6-6dd5-1471-0200-7a73c8c82156@oracle.com> Based on the Aurora job log for this testsuite run, it looks to me like UTE is being used to execute JTREG which executes these tests. Dan
From rohitarulraj at gmail.com Sun Sep 3 16:42:42 2017 From: rohitarulraj at gmail.com (Rohit Arul Raj) Date: Sun, 3 Sep 2017 22:12:42 +0530 Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults In-Reply-To: References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com> Message-ID: Hello Vladimir, On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov wrote: > Hi Rohit, > > On 9/2/17 1:16 AM, Rohit Arul Raj wrote: >> >> Hello Vladimir, >> >>> Changes look good. Only question I have is about MaxVectorSize. It is set
It is set >>> > >>> 16 only in presence of AVX: >>> >>> >>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945 >>> >>> Does that code works for AMD 17h too? >> >> >> Thanks for pointing that out. Yes, the code works fine for AMD 17h. So >> I have removed the surplus check for MaxVectorSize from my patch. I >> have updated, re-tested and attached the patch. > > > Which check you removed? > My older patch had the below mentioned check which was required on JDK9 where the default MaxVectorSize was 64. It has been handled better in openJDK10. So this check is not required anymore. + // Some defaults for AMD family 17h + if ( cpu_family() == 0x17 ) { ... ... + if (MaxVectorSize > 32) { + FLAG_SET_DEFAULT(MaxVectorSize, 32); + } .. .. + } >> >> I have one query regarding the setting of UseSHA flag: >> >> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821 >> >> AMD 17h has support for SHA. >> AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets >> enabled for it based on the availability of BMI2 and AVX2. Is there an >> underlying reason for this? I have handled this in the patch but just >> wanted to confirm. > > > It was done with next changes which use only AVX2 and BMI2 instructions to > calculate SHA-256: > > http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974 > > I don't know if AMD 15h supports these instructions and can execute that > code. You need to test it. > Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions, it should work. Confirmed by running following sanity tests: ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java So I have removed those SHA checks from my patch too. Please find attached updated, re-tested patch. 
diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp --- a/src/cpu/x86/vm/vm_version_x86.cpp +++ b/src/cpu/x86/vm/vm_version_x86.cpp @@ -1109,11 +1109,27 @@ } #ifdef COMPILER2 - if (MaxVectorSize > 16) { - // Limit vectors size to 16 bytes on current AMD cpus. + if (cpu_family() < 0x17 && MaxVectorSize > 16) { + // Limit vectors size to 16 bytes on AMD cpus < 17h. FLAG_SET_DEFAULT(MaxVectorSize, 16); } #endif // COMPILER2 + + // Some defaults for AMD family 17h + if ( cpu_family() == 0x17 ) { + // On family 17h processors use XMM and UnalignedLoadStores for Array Copy + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); + } + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); + } +#ifdef COMPILER2 + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { + FLAG_SET_DEFAULT(UseFPUForSpilling, true); + } +#endif + } } if( is_intel() ) { // Intel cpus specific settings diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp --- a/src/cpu/x86/vm/vm_version_x86.hpp +++ b/src/cpu/x86/vm/vm_version_x86.hpp @@ -505,6 +505,14 @@ result |= CPU_CLMUL; if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) result |= CPU_RTM; + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) + result |= CPU_ADX; + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) + result |= CPU_BMI2; + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) + result |= CPU_SHA; + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) + result |= CPU_FMA; // AMD features. if (is_amd()) { @@ -515,19 +523,13 @@ result |= CPU_LZCNT; if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) result |= CPU_SSE4A; + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) + result |= CPU_HT; } // Intel features. 
if(is_intel()) { - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) - result |= CPU_ADX; - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) - result |= CPU_BMI2; - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) - result |= CPU_SHA; if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) result |= CPU_LZCNT; - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) - result |= CPU_FMA; // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { result |= CPU_3DNOW_PREFETCH; Please let me know your comments. Thanks for your time. Rohit >> >> Thanks for taking time to review the code. >> >> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >> b/src/cpu/x86/vm/vm_version_x86.cpp >> --- a/src/cpu/x86/vm/vm_version_x86.cpp >> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >> @@ -1088,6 +1088,22 @@ >> } >> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >> } >> + if (supports_sha()) { >> + if (FLAG_IS_DEFAULT(UseSHA)) { >> + FLAG_SET_DEFAULT(UseSHA, true); >> + } >> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >> UseSHA512Intrinsics) { >> + if (!FLAG_IS_DEFAULT(UseSHA) || >> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >> + warning("SHA instructions are not available on this CPU"); >> + } >> + FLAG_SET_DEFAULT(UseSHA, false); >> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >> + } >> >> // some defaults for AMD family 15h >> if ( cpu_family() == 0x15 ) { >> @@ -1109,11 +1125,40 @@ >> } >> >> #ifdef COMPILER2 >> - if (MaxVectorSize > 16) { >> - // Limit vectors size to 16 bytes on current AMD cpus. >> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >> } >> #endif // COMPILER2 >> + >> + // Some defaults for AMD family 17h >> + if ( cpu_family() == 0x17 ) { >> + // On family 17h processors use XMM and UnalignedLoadStores for >> Array Copy >> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >> + } >> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >> + } >> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >> + FLAG_SET_DEFAULT(UseBMI2Instructions, true); >> + } >> + if (UseSHA) { >> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >> + } else if (UseSHA512Intrinsics) { >> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >> functions not available on this CPU."); >> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >> + } >> + } >> +#ifdef COMPILER2 >> + if (supports_sse4_2()) { >> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >> + } >> + } >> +#endif >> + } >> } >> >> if( is_intel() ) { // Intel cpus specific settings >> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >> b/src/cpu/x86/vm/vm_version_x86.hpp >> --- a/src/cpu/x86/vm/vm_version_x86.hpp >> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >> @@ -505,6 +505,14 @@ >> result |= CPU_CLMUL; >> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >> result |= CPU_RTM; >> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >> + result |= CPU_ADX; >> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >> + result |= CPU_BMI2; >> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >> + result |= CPU_SHA; >> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >> + result |= CPU_FMA; >> >> // AMD features. 
>> if (is_amd()) { >> @@ -515,19 +523,13 @@ >> result |= CPU_LZCNT; >> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >> result |= CPU_SSE4A; >> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >> + result |= CPU_HT; >> } >> // Intel features. >> if(is_intel()) { >> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >> - result |= CPU_ADX; >> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >> - result |= CPU_BMI2; >> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >> - result |= CPU_SHA; >> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >> result |= CPU_LZCNT; >> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >> - result |= CPU_FMA; >> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >> support for prefetchw >> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >> result |= CPU_3DNOW_PREFETCH; >> >> >> Regards, >> Rohit >> >> >> >>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>> >>>> >>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>> wrote: >>>>> >>>>> >>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>> wrote: >>>>>> >>>>>> >>>>>> Hi Rohit, >>>>>> >>>>>> I think the patch needs updating for jdk10 as I already see a lot of >>>>>> logic >>>>>> around UseSHA in vm_version_x86.cpp. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>> >>>>> Thanks David, I will update the patch wrt JDK10 source base, test and >>>>> resubmit for review. >>>>> >>>>> Regards, >>>>> Rohit >>>>> >>>> >>>> Hi All, >>>> >>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>> 13519:71337910df60), did regression testing using jtreg ($make >>>> default) and didnt find any regressions. >>>> >>>> Can anyone please volunteer to review this patch which sets flag/ISA >>>> defaults for newer AMD 17h (EPYC) processor? 
>>>> >>>> ************************* Patch **************************** >>>> >>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>> @@ -1088,6 +1088,22 @@ >>>> } >>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>> } >>>> + if (supports_sha()) { >>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>> + } >>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>> UseSHA512Intrinsics) { >>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>> + warning("SHA instructions are not available on this CPU"); >>>> + } >>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>> + } >>>> >>>> // some defaults for AMD family 15h >>>> if ( cpu_family() == 0x15 ) { >>>> @@ -1109,11 +1125,43 @@ >>>> } >>>> >>>> #ifdef COMPILER2 >>>> - if (MaxVectorSize > 16) { >>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>> } >>>> #endif // COMPILER2 >>>> + >>>> + // Some defaults for AMD family 17h >>>> + if ( cpu_family() == 0x17 ) { >>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>> Array Copy >>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>> + UseXMMForArrayCopy = true; >>>> + } >>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>> + UseUnalignedLoadStores = true; >>>> + } >>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>> + UseBMI2Instructions = true; >>>> + } >>>> + if (MaxVectorSize > 32) { >>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>> + } >>>> + if (UseSHA) { >>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>> + } else if (UseSHA512Intrinsics) { >>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>> functions not available on this CPU."); >>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>> + } >>>> + } >>>> +#ifdef COMPILER2 >>>> + if (supports_sse4_2()) { >>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>> + } >>>> + } >>>> +#endif >>>> + } >>>> } >>>> >>>> if( is_intel() ) { // Intel cpus specific settings >>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>> @@ -505,6 +505,14 @@ >>>> result |= CPU_CLMUL; >>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>> result |= CPU_RTM; >>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>> + result |= CPU_ADX; >>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>> + result |= CPU_BMI2; >>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>> + result |= CPU_SHA; >>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>> + result |= CPU_FMA; >>>> >>>> // AMD features. 
>>>> if (is_amd()) { >>>> @@ -515,19 +523,13 @@ >>>> result |= CPU_LZCNT; >>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>> result |= CPU_SSE4A; >>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>> + result |= CPU_HT; >>>> } >>>> // Intel features. >>>> if(is_intel()) { >>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>> - result |= CPU_ADX; >>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>> - result |= CPU_BMI2; >>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>> - result |= CPU_SHA; >>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>> result |= CPU_LZCNT; >>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>> - result |= CPU_FMA; >>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>> support for prefetchw >>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>> result |= CPU_3DNOW_PREFETCH; >>>> >>>> ************************************************************** >>>> >>>> Thanks, >>>> Rohit >>>> >>>>>> >>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>> >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Hi Rohit, >>>>>>>> >>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> I would like an volunteer to review this patch (openJDK9) which >>>>>>>>> sets >>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us >>>>>>>>> with >>>>>>>>> the commit process. >>>>>>>>> >>>>>>>>> Webrev: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Unfortunately patches can not be accepted from systems outside the >>>>>>>> OpenJDK >>>>>>>> infrastructure and ... >>>>>>>> >>>>>>>>> I have also attached the patch (hg diff -g) for reference. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> ... 
unfortunately patches tend to get stripped by the mail servers. >>>>>>>> If >>>>>>>> the >>>>>>>> patch is small please include it inline. Otherwise you will need to >>>>>>>> find >>>>>>>> an >>>>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net. >>>>>>>> >>>>>>> >>>>>>>>> 3) I have done regression testing using jtreg ($make default) and >>>>>>>>> didnt find any regressions. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Sounds good, but until I see the patch it is hard to comment on >>>>>>>> testing >>>>>>>> requirements. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Thanks David, >>>>>>> Yes, it's a small patch. >>>>>>> >>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>> } >>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>> } >>>>>>> + if (supports_sha()) { >>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>> + } >>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>>>>> UseSHA512Intrinsics) { >>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>> + warning("SHA instructions are not available on this CPU"); >>>>>>> + } >>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>> + } >>>>>>> >>>>>>> // some defaults for AMD family 15h >>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>> } >>>>>>> >>>>>>> #ifdef COMPILER2 >>>>>>> - if (MaxVectorSize > 16) { >>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. 
>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>> } >>>>>>> #endif // COMPILER2 >>>>>>> + >>>>>>> + // Some defaults for AMD family 17h >>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>> for >>>>>>> Array Copy >>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>> + UseXMMForArrayCopy = true; >>>>>>> + } >>>>>>> + if (supports_sse2() && >>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>> { >>>>>>> + UseUnalignedLoadStores = true; >>>>>>> + } >>>>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>>> + UseBMI2Instructions = true; >>>>>>> + } >>>>>>> + if (MaxVectorSize > 32) { >>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>> + } >>>>>>> + if (UseSHA) { >>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>> functions not available on this CPU."); >>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>> + } >>>>>>> + } >>>>>>> +#ifdef COMPILER2 >>>>>>> + if (supports_sse4_2()) { >>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>> + } >>>>>>> + } >>>>>>> +#endif >>>>>>> + } >>>>>>> } >>>>>>> >>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> @@ -513,6 +513,16 @@ >>>>>>> result |= CPU_LZCNT; >>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>> result |= CPU_SSE4A; >>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>> + result |= CPU_BMI2; >>>>>>> + 
if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>> + result |= CPU_HT;
>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>> + result |= CPU_ADX;
>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>> + result |= CPU_SHA;
>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>> + result |= CPU_FMA;
>>>>>>> }
>>>>>>> // Intel features.
>>>>>>> if(is_intel()) {
>>>>>>>
>>>>>>> Regards,
>>>>>>> Rohit
>>>>>>>
>>>>>>
>>>
>

From jesper.wilhelmsson at oracle.com Sun Sep 3 20:02:14 2017
From: jesper.wilhelmsson at oracle.com (jesper.wilhelmsson at oracle.com)
Date: Sun, 3 Sep 2017 22:02:14 +0200
Subject: jdk10/hs integration status
In-Reply-To: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com>
References: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com>
Message-ID: <47E6B89B-D08C-46AB-B8ED-B2FE02F5D8CC@oracle.com>

Hi,

JDK-8187124 is no longer considered a blocker. Thanks to everyone involved in the investigation!

The integration is now completed. jdk10/hs will now remain closed until the repo consolidation is done.

Thanks,
/Jesper

> On 2 Sep 2017, at 12:15, jesper.wilhelmsson at oracle.com wrote:
>
> Hi,
>
> After going through the results of our nightlies it seems we are in fairly good shape for integration. There was one issue with a typo in a recent fix that caused some failures, this issue was resolved yesterday just after the nightly snapshot was taken.
>
>
> There is currently one issue that I didn't recognise and at the moment it is marked as an integration blocker:
>
> JDK-8187124
> TestInterpreterMethodEntries.java: Unable to create shared archive file
>
> This could as well be a problem with the test execution in which case it is not a blocker, but someone needs to look into the details here.
>
>
> There are four test failures that look slightly different:
> tools/jar/modularJar/Basic.java
> tools/jar/multiRelease/ApiValidatorTest.java
> tools/jar/multiRelease/Basic.java
> tools/launcher/InfoStreams.java
>
> These four tests fail because they get a warning on stderr:
> Option MaxRAMFraction was deprecated in version 10.0 and will likely be removed in a future release.
> I do not consider this a blocker for integration, bug filed: JDK-8187125
>
>
> JDK10/hs now has restricted write access. Basically it is locked but in order to fix any urgent issues that might pop up over the next couple of days these people have write access: Vladimir Kozlov, Dan Daugherty, Stefan Karlsson, and myself.
>
> /Jesper
>

From david.holmes at oracle.com Mon Sep 4 01:24:32 2017
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 4 Sep 2017 11:24:32 +1000
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <59A93B53.9010505@oracle.com>
References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> <59A93B53.9010505@oracle.com>
Message-ID: <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com>

Hi Erik,

On 1/09/2017 8:49 PM, Erik Österlund wrote:
> Hi David,
>
> The shared structure for all operations is the following:
>
> An Atomic::something call creates a SomethingImpl function object that
> performs some basic type checking and then forwards the call straight to
> a PlatformSomething function object. This PlatformSomething object could
> decide to do anything. But to make life easier, it may inherit from a
> shared SomethingHelper function object with CRTP that calls back into
> the PlatformSomething function object to emit inline assembly.

Right, but! Let's look at some details.
Atomic::add
  AddImpl
  PlatformAdd
  FetchAndAdd
  AddAndFetch
  add_using_helper

Atomic::cmpxchg
  CmpxchgImpl
  PlatformCmpxchg
  cmpxchg_using_helper

Atomic::inc
  IncImpl
  PlatformInc
  IncUsingConstant

Why is it that the simplest operation (inc/dec) has the most complex platform template definition? Why do we need Adjustment? You previously said "Adjustment represents the increment/decrement value as an IntegralConstant - your template friend for passing around a constant with both a specified type and value in templates". But add passes around values and doesn't need this. Further, inc/dec don't need to pass anything around anywhere - inc adds 1, dec subtracts 1! This "1" does not need to appear anywhere in the API or get passed across layers - the only place this "1" becomes evident is in the actual platform asm that does the logic of "add 1" or "subtract 1".

My understanding from previous discussions is that much of the template machinations were to deal with type management for "dest" and the values being passed around. But here, for inc/dec there are no values being passed so we don't have to make "dest" type-compatible with any value.

Cheers,
David
-----

> Hope this explanation helps understanding the intended structure of this
> work.
>
> Thanks,
> /Erik
>
> On 2017-09-01 12:34, David Holmes wrote:
>> Hi Erik,
>>
>> I just wanted to add that I would expect the cmpxchg, add and inc,
>> Atomic APIs to all require similar basic structure for manipulating
>> types/values etc, yet all three seem to have quite different
>> structures that I find very confusing. I'm still at a loss to fathom
>> the CRTP and the hoops we seemingly have to jump through just to add
>> or subtract 1!!!
>>
>> Cheers,
>> David
>>
>> On 1/09/2017 7:29 PM, Erik Österlund wrote:
>>> Hi David,
>>>
>>> On 2017-09-01 02:49, David Holmes wrote:
>>>> Hi Erik,
>>>>
>>>> Sorry but this one is really losing me.
>>>>
>>>> What is the role of Adjustment?
>>>
>>> Adjustment represents the increment/decrement value as an
>>> IntegralConstant - your template friend for passing around a constant
>>> with both a specified type and value in templates. The type of the
>>> increment/decrement is the type of the destination when the
>>> destination is an integral type, otherwise if it is a pointer type,
>>> the increment/decrement type is ptrdiff_t.
>>>
>>>> How are inc/dec anything but "using constant"?
>>>
>>> I was also a bit torn on that name (I assume you are referring to
>>> IncUsingConstant/DecUsingConstant). It was hard to find a name that
>>> depicted what this platform helper does. I considered calling the
>>> helper something with immediate in the name because it is really used
>>> to embed the constant as immediate values in inline assembly today.
>>> But then again that seemed too specific, as it is not completely
>>> obvious platform specializations will use it in that way. One might
>>> just want to specialize this to send it into some compiler
>>> Atomic::inc intrinsic for example. Do you have any other preferred
>>> names? Here are a few possible names for IncUsingConstant:
>>>
>>> IncUsingScaledConstant
>>> IncUsingAdjustedConstant
>>> IncUsingPlatformHelper
>>>
>>> Any favourites?
>>>
>>>> Why do we special case jshort?
>>>
>>> To be consistent with the special case of Atomic::add on jshort. Do
>>> you want it removed?
>>>
>>>> This is indecipherable to normal people ;-)
>>>>
>>>> This()->template inc(dest);
>>>>
>>>> For something as trivial as adding or subtracting 1 the template
>>>> machinations here are just mind boggling!
>>>
>>> This uses the CRTP (Curiously Recurring Template Pattern) C++ idiom.
>>> The idea is to devirtualize a virtual call by passing in the derived
>>> type as a template parameter to a base class, and then let the base
>>> class static_cast to the derived class to devirtualize the call. I
>>> hope this explanation sheds some light on what is going on.
The same
>>> CRTP idiom was used in the Atomic::add implementation in a similar
>>> fashion.
>>>
>>> I will add some comments describing this in the next round after
>>> Coleen replies.
>>>
>>> Thanks for looking at this.
>>>
>>> /Erik
>>>
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>> On 31/08/2017 10:45 PM, Erik Österlund wrote:
>>>>> Hi everyone,
>>>>>
>>>>> Bug ID:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8186838
>>>>>
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>>>>>
>>>>> The time has come for the next step in generalizing Atomic with
>>>>> templates. Today I will focus on Atomic::inc/dec.
>>>>>
>>>>> I have tried to mimic the new Kim style that seems to have been
>>>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the
>>>>> structure looks like this:
>>>>>
>>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function
>>>>> object that performs some basic type checks.
>>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can
>>>>> define the operation arbitrarily for a given platform. The default
>>>>> implementation if not specialized for a platform is to call
>>>>> Atomic::add. So only platforms that want to do something different
>>>>> than that as an optimization have to provide a specialization.
>>>>> Layer 3) Platforms that decide to specialize
>>>>> PlatformInc/PlatformDec to be more optimized may inherit from a
>>>>> helper class IncUsingConstant/DecUsingConstant. This helper uses
>>>>> CRTP to compute what the increment/decrement should be after
>>>>> pointer scaling. The
>>>>> PlatformInc/PlatformDec operation then only needs to define an
>>>>> inc/dec member function, and will then get all the context
>>>>> information necessary to generate a more optimized implementation.
>>>>> Easy peasy.
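The three layers described above can be sketched in a few lines of standalone C++. This is only an illustration under stated assumptions, not the code from the webrev: every name ending in Sketch is hypothetical, Layer 1 is folded into a single free function without the type checks, plain non-atomic arithmetic stands in for the platform's inline assembly, and the byte-wise pointer bump goes through uintptr_t, which is implementation-defined but behaves as expected on mainstream platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Layer 3 (hypothetical): CRTP helper. It works out the constant adjustment
// (1 for integers, sizeof(pointee) for pointers) and passes it to
// Derived::inc as a compile-time value.
template<typename Derived>
struct IncUsingConstantSketch {
  // Integer destination: the adjustment is 1.
  template<typename I>
  void operator()(I volatile* dest) const {
    self()->template inc<I, 1>(dest);
  }

  // Pointer destination: the adjustment is scaled to sizeof(P) bytes.
  // Partial ordering prefers this overload when *dest is itself a pointer.
  template<typename P>
  void operator()(P* volatile* dest) const {
    self()->template inc<P*, sizeof(P)>(dest);
  }

private:
  // Static downcast to the derived class; this is the CRTP step that
  // takes the place of a virtual call.
  const Derived* self() const { return static_cast<const Derived*>(this); }
};

// Layer 2 (hypothetical): a "platform" operation. It only defines inc() and
// receives the already-scaled adjustment as a template argument.
// NOTE: this stand-in is NOT atomic; a real platform would emit an atomic
// instruction with Adjustment as an immediate operand.
struct PlatformIncSketch : IncUsingConstantSketch<PlatformIncSketch> {
  template<typename T, std::size_t Adjustment>
  void inc(T volatile* dest) const {
    *dest = bump(*dest, Adjustment);
  }

private:
  // Integers: add n directly.
  template<typename I>
  static I bump(I value, std::size_t n) {
    return static_cast<I>(value + static_cast<I>(n));
  }

  // Pointers: add n raw bytes, because the scaling was already done above.
  template<typename P>
  static P* bump(P* value, std::size_t n) {
    return reinterpret_cast<P*>(reinterpret_cast<std::uintptr_t>(value) + n);
  }
};

// Layer 1 (hypothetical): the entry point forwards to the platform object.
template<typename T>
inline void atomic_inc_sketch(T volatile* dest) {
  PlatformIncSketch()(dest);
}
```

With `int x = 41;`, calling `atomic_inc_sketch(&x)` leaves `x` at 42; with a `long* p` pointing at the first element of an array, the same call advances `p` by one element. The `self()->template inc<...>` spelling needs the `template` keyword because `inc` is a member template reached through a dependent type, which is the construct questioned earlier in this thread.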
>>>>>
>>>>> It is worth noticing that the generalized Atomic::dec operation
>>>>> assumes a two's complement integer machine and potentially sends
>>>>> the unary negative of a potentially unsigned type to Atomic::add. I
>>>>> have the following comments about this:
>>>>> 1) We already assume in other code that two's complement integers
>>>>> must be present.
>>>>> 2) A machine that does not have two's complement integers may still
>>>>> simply provide a specialization that solves the problem in a
>>>>> different way.
>>>>> 3) The alternative that does not make assumptions about that would
>>>>> use the good old IntegerTypes::cast_to_signed metaprogramming
>>>>> stuff, and I seem to recall we thought that was a bit too involved
>>>>> and complicated.
>>>>> This is the reason why I have chosen to use unary minus on the
>>>>> potentially unsigned type in the shared helper code that sends the
>>>>> decrement as an addend to Atomic::add.
>>>>>
>>>>> It would also be nice if somebody with access to PPC and s390
>>>>> machines could try out the relevant changes there so I do not
>>>>> accidentally break those platforms. I have blind-coded the addition
>>>>> of the immediate values passed in to the inline assembly in a way
>>>>> that I think looks like it should work.
>>>>>
>>>>> Testing:
>>>>> RBT hs-tier3, JPRT --testset hotspot
>>>>>
>>>>> Thanks,
>>>>> /Erik
>>>
>

From vladimir.kozlov at oracle.com Mon Sep 4 02:39:15 2017
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Sun, 3 Sep 2017 19:39:15 -0700
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To:
References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com>
Message-ID: <998a7014-4199-26b0-a8f5-20441f4d3f04@oracle.com>

Looks good.

Currently jdk10 repository is undergoing "consolidation" update. It may take 2 weeks. You need to wait when we can push your changes.
Regards,
Vladimir

On 9/3/17 9:42 AM, Rohit Arul Raj wrote:
> Hello Vladimir,
>
> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov
> wrote:
>> Hi Rohit,
>>
>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>>
>>> Hello Vladimir,
>>>
>>>> Changes look good. Only question I have is about MaxVectorSize. It is set
>>>> > 16 only in presence of AVX:
>>>>
>>>>
>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>>>
>>>> Does that code work for AMD 17h too?
>>>
>>>
>>> Thanks for pointing that out. Yes, the code works fine for AMD 17h. So
>>> I have removed the surplus check for MaxVectorSize from my patch. I
>>> have updated, re-tested and attached the patch.
>>
>>
>> Which check did you remove?
>>
>
> My older patch had the below mentioned check which was required on
> JDK9 where the default MaxVectorSize was 64. It has been handled
> better in openJDK10. So this check is not required anymore.
>
> + // Some defaults for AMD family 17h
> + if ( cpu_family() == 0x17 ) {
> ...
> ...
> + if (MaxVectorSize > 32) {
> + FLAG_SET_DEFAULT(MaxVectorSize, 32);
> + }
> ..
> ..
> + }
>
>>>
>>> I have one query regarding the setting of UseSHA flag:
>>>
>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821
>>>
>>> AMD 17h has support for SHA.
>>> AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets
>>> enabled for it based on the availability of BMI2 and AVX2. Is there an
>>> underlying reason for this? I have handled this in the patch but just
>>> wanted to confirm.
>>
>>
>> It was done with next changes which use only AVX2 and BMI2 instructions to
>> calculate SHA-256:
>>
>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974
>>
>> I don't know if AMD 15h supports these instructions and can execute that
>> code. You need to test it.
>>
>
> Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions,
> it should work.
> Confirmed by running following sanity tests: > ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java > ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java > ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java > > So I have removed those SHA checks from my patch too. > > Please find attached updated, re-tested patch. > > diff --git a/src/cpu/x86/vm/vm_version_x86.cpp > b/src/cpu/x86/vm/vm_version_x86.cpp > --- a/src/cpu/x86/vm/vm_version_x86.cpp > +++ b/src/cpu/x86/vm/vm_version_x86.cpp > @@ -1109,11 +1109,27 @@ > } > > #ifdef COMPILER2 > - if (MaxVectorSize > 16) { > - // Limit vectors size to 16 bytes on current AMD cpus. > + if (cpu_family() < 0x17 && MaxVectorSize > 16) { > + // Limit vectors size to 16 bytes on AMD cpus < 17h. > FLAG_SET_DEFAULT(MaxVectorSize, 16); > } > #endif // COMPILER2 > + > + // Some defaults for AMD family 17h > + if ( cpu_family() == 0x17 ) { > + // On family 17h processors use XMM and UnalignedLoadStores for > Array Copy > + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { > + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); > + } > + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { > + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); > + } > +#ifdef COMPILER2 > + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { > + FLAG_SET_DEFAULT(UseFPUForSpilling, true); > + } > +#endif > + } > } > > if( is_intel() ) { // Intel cpus specific settings > diff --git a/src/cpu/x86/vm/vm_version_x86.hpp > b/src/cpu/x86/vm/vm_version_x86.hpp > --- a/src/cpu/x86/vm/vm_version_x86.hpp > +++ b/src/cpu/x86/vm/vm_version_x86.hpp > @@ -505,6 +505,14 @@ > result |= CPU_CLMUL; > if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) > result |= CPU_RTM; > + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) > + result |= CPU_ADX; > + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) > + result |= CPU_BMI2; > + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) > + result |= CPU_SHA; > + if 
(_cpuid_info.std_cpuid1_ecx.bits.fma != 0) > + result |= CPU_FMA; > > // AMD features. > if (is_amd()) { > @@ -515,19 +523,13 @@ > result |= CPU_LZCNT; > if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) > result |= CPU_SSE4A; > + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) > + result |= CPU_HT; > } > // Intel features. > if(is_intel()) { > - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) > - result |= CPU_ADX; > - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) > - result |= CPU_BMI2; > - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) > - result |= CPU_SHA; > if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) > result |= CPU_LZCNT; > - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) > - result |= CPU_FMA; > // for Intel, ecx.bits.misalignsse bit (bit 8) indicates > support for prefetchw > if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { > result |= CPU_3DNOW_PREFETCH; > > Please let me know your comments. > > Thanks for your time. > Rohit > >>> >>> Thanks for taking time to review the code. 
>>> >>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>> b/src/cpu/x86/vm/vm_version_x86.cpp >>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>> @@ -1088,6 +1088,22 @@ >>> } >>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>> } >>> + if (supports_sha()) { >>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>> + FLAG_SET_DEFAULT(UseSHA, true); >>> + } >>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>> UseSHA512Intrinsics) { >>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>> + warning("SHA instructions are not available on this CPU"); >>> + } >>> + FLAG_SET_DEFAULT(UseSHA, false); >>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>> + } >>> >>> // some defaults for AMD family 15h >>> if ( cpu_family() == 0x15 ) { >>> @@ -1109,11 +1125,40 @@ >>> } >>> >>> #ifdef COMPILER2 >>> - if (MaxVectorSize > 16) { >>> - // Limit vectors size to 16 bytes on current AMD cpus. >>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>> } >>> #endif // COMPILER2 >>> + >>> + // Some defaults for AMD family 17h >>> + if ( cpu_family() == 0x17 ) { >>> + // On family 17h processors use XMM and UnalignedLoadStores for >>> Array Copy >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>> + } >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>> + } >>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>> + FLAG_SET_DEFAULT(UseBMI2Instructions, true); >>> + } >>> + if (UseSHA) { >>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>> + } else if (UseSHA512Intrinsics) { >>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>> functions not available on this CPU."); >>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>> + } >>> + } >>> +#ifdef COMPILER2 >>> + if (supports_sse4_2()) { >>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>> + } >>> + } >>> +#endif >>> + } >>> } >>> >>> if( is_intel() ) { // Intel cpus specific settings >>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>> b/src/cpu/x86/vm/vm_version_x86.hpp >>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>> @@ -505,6 +505,14 @@ >>> result |= CPU_CLMUL; >>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>> result |= CPU_RTM; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>> + result |= CPU_ADX; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>> + result |= CPU_BMI2; >>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>> + result |= CPU_SHA; >>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>> + result |= CPU_FMA; >>> >>> // AMD features. 
>>> if (is_amd()) { >>> @@ -515,19 +523,13 @@ >>> result |= CPU_LZCNT; >>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>> result |= CPU_SSE4A; >>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>> + result |= CPU_HT; >>> } >>> // Intel features. >>> if(is_intel()) { >>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>> - result |= CPU_ADX; >>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>> - result |= CPU_BMI2; >>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>> - result |= CPU_SHA; >>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>> result |= CPU_LZCNT; >>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>> - result |= CPU_FMA; >>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>> support for prefetchw >>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>> result |= CPU_3DNOW_PREFETCH; >>> >>> >>> Regards, >>> Rohit >>> >>> >>> >>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>>> >>>>> >>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>>> wrote: >>>>>> >>>>>> >>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> Hi Rohit, >>>>>>> >>>>>>> I think the patch needs updating for jdk10 as I already see a lot of >>>>>>> logic >>>>>>> around UseSHA in vm_version_x86.cpp. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>> >>>>>> Thanks David, I will update the patch wrt JDK10 source base, test and >>>>>> resubmit for review. >>>>>> >>>>>> Regards, >>>>>> Rohit >>>>>> >>>>> >>>>> Hi All, >>>>> >>>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>>> 13519:71337910df60), did regression testing using jtreg ($make >>>>> default) and didnt find any regressions. >>>>> >>>>> Can anyone please volunteer to review this patch which sets flag/ISA >>>>> defaults for newer AMD 17h (EPYC) processor? 
>>>>> >>>>> ************************* Patch **************************** >>>>> >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> @@ -1088,6 +1088,22 @@ >>>>> } >>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>> } >>>>> + if (supports_sha()) { >>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>> + } >>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>>> UseSHA512Intrinsics) { >>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>> + warning("SHA instructions are not available on this CPU"); >>>>> + } >>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } >>>>> >>>>> // some defaults for AMD family 15h >>>>> if ( cpu_family() == 0x15 ) { >>>>> @@ -1109,11 +1125,43 @@ >>>>> } >>>>> >>>>> #ifdef COMPILER2 >>>>> - if (MaxVectorSize > 16) { >>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>> } >>>>> #endif // COMPILER2 >>>>> + >>>>> + // Some defaults for AMD family 17h >>>>> + if ( cpu_family() == 0x17 ) { >>>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>>> Array Copy >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>> + UseXMMForArrayCopy = true; >>>>> + } >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>>> + UseUnalignedLoadStores = true; >>>>> + } >>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>> + UseBMI2Instructions = true; >>>>> + } >>>>> + if (MaxVectorSize > 32) { >>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>> + } >>>>> + if (UseSHA) { >>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } else if (UseSHA512Intrinsics) { >>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>> functions not available on this CPU."); >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } >>>>> + } >>>>> +#ifdef COMPILER2 >>>>> + if (supports_sse4_2()) { >>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>> + } >>>>> + } >>>>> +#endif >>>>> + } >>>>> } >>>>> >>>>> if( is_intel() ) { // Intel cpus specific settings >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> @@ -505,6 +505,14 @@ >>>>> result |= CPU_CLMUL; >>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>> result |= CPU_RTM; >>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>> + result |= CPU_ADX; >>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>> + result |= CPU_BMI2; >>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>> + result |= CPU_SHA; >>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>> + result |= CPU_FMA; >>>>> >>>>> // AMD features. 
>>>>> if (is_amd()) { >>>>> @@ -515,19 +523,13 @@ >>>>> result |= CPU_LZCNT; >>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>> result |= CPU_SSE4A; >>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>> + result |= CPU_HT; >>>>> } >>>>> // Intel features. >>>>> if(is_intel()) { >>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>> - result |= CPU_ADX; >>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>> - result |= CPU_BMI2; >>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>> - result |= CPU_SHA; >>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>> result |= CPU_LZCNT; >>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>> - result |= CPU_FMA; >>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>> support for prefetchw >>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>> result |= CPU_3DNOW_PREFETCH; >>>>> >>>>> ************************************************************** >>>>> >>>>> Thanks, >>>>> Rohit >>>>> >>>>>>> >>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>>> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Hi Rohit, >>>>>>>>> >>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I would like an volunteer to review this patch (openJDK9) which >>>>>>>>>> sets >>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us >>>>>>>>>> with >>>>>>>>>> the commit process. >>>>>>>>>> >>>>>>>>>> Webrev: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Unfortunately patches can not be accepted from systems outside the >>>>>>>>> OpenJDK >>>>>>>>> infrastructure and ... >>>>>>>>> >>>>>>>>>> I have also attached the patch (hg diff -g) for reference. 
>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> ... unfortunately patches tend to get stripped by the mail servers. >>>>>>>>> If >>>>>>>>> the >>>>>>>>> patch is small please include it inline. Otherwise you will need to >>>>>>>>> find >>>>>>>>> an >>>>>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net. >>>>>>>>> >>>>>>>> >>>>>>>>>> 3) I have done regression testing using jtreg ($make default) and >>>>>>>>>> didnt find any regressions. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Sounds good, but until I see the patch it is hard to comment on >>>>>>>>> testing >>>>>>>>> requirements. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> David >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Thanks David, >>>>>>>> Yes, it's a small patch. >>>>>>>> >>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>>> } >>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>> } >>>>>>>> + if (supports_sha()) { >>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>> + } >>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>>>>>> UseSHA512Intrinsics) { >>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>> + warning("SHA instructions are not available on this CPU"); >>>>>>>> + } >>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>> + } >>>>>>>> >>>>>>>> // some defaults for AMD family 15h >>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>>> } >>>>>>>> >>>>>>>> #ifdef 
COMPILER2 >>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>> } >>>>>>>> #endif // COMPILER2 >>>>>>>> + >>>>>>>> + // Some defaults for AMD family 17h >>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>> for >>>>>>>> Array Copy >>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>> + } >>>>>>>> + if (supports_sse2() && >>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>> { >>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>> + } >>>>>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>>>> + UseBMI2Instructions = true; >>>>>>>> + } >>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>> + } >>>>>>>> + if (UseSHA) { >>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>>> functions not available on this CPU."); >>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>> + } >>>>>>>> + } >>>>>>>> +#ifdef COMPILER2 >>>>>>>> + if (supports_sse4_2()) { >>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>> + } >>>>>>>> + } >>>>>>>> +#endif >>>>>>>> + } >>>>>>>> } >>>>>>>> >>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> @@ -513,6 +513,16 @@ >>>>>>>> result |= CPU_LZCNT; >>>>>>>> if 
(_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>>> result |= CPU_SSE4A;
>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>>> + result |= CPU_BMI2;
>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>>> + result |= CPU_HT;
>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>>> + result |= CPU_ADX;
>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>>> + result |= CPU_SHA;
>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>>> + result |= CPU_FMA;
>>>>>>>> }
>>>>>>>> // Intel features.
>>>>>>>> if(is_intel()) {
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Rohit
>>>>>>>>
>>>>>>>
>>>>
>>
>

From rohitarulraj at gmail.com Mon Sep 4 03:59:09 2017
From: rohitarulraj at gmail.com (Rohit Arul Raj)
Date: Mon, 4 Sep 2017 09:29:09 +0530
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To: <998a7014-4199-26b0-a8f5-20441f4d3f04@oracle.com>
References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com>
 <998a7014-4199-26b0-a8f5-20441f4d3f04@oracle.com>
Message-ID:

On Mon, Sep 4, 2017 at 8:09 AM, Vladimir Kozlov wrote:
> Looks good.
>
> Currently the jdk10 repository is undergoing a "consolidation" update. It
> may take 2 weeks. You need to wait until we can push your changes.
>

Sure Vladimir, Thanks for the support.

Regards,
Rohit

>
> On 9/3/17 9:42 AM, Rohit Arul Raj wrote:
>>
>> Hello Vladimir,
>>
>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov
>> wrote:
>>>
>>> Hi Rohit,
>>>
>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>>>
>>>> Hello Vladimir,
>>>>
>>>>> Changes look good. Only question I have is about MaxVectorSize. It is
>>>>> set > 16 only in presence of AVX:
>>>>>
>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>>>>
>>>>> Does that code work for AMD 17h too?
>>>>
>>>> Thanks for pointing that out. Yes, the code works fine for AMD 17h.
>>>> So I have removed the surplus check for MaxVectorSize from my patch. I
>>>> have updated, re-tested and attached the patch.
>>>
>>> Which check did you remove?
>>>
>>
>> My older patch had the check shown below, which was required on
>> JDK9 where the default MaxVectorSize was 64. It is handled
>> better in OpenJDK10, so this check is not required anymore.
>>
>> + // Some defaults for AMD family 17h
>> + if ( cpu_family() == 0x17 ) {
>> ...
>> ...
>> + if (MaxVectorSize > 32) {
>> + FLAG_SET_DEFAULT(MaxVectorSize, 32);
>> + }
>> ..
>> ..
>> + }
>>
>>>>
>>>> I have one query regarding the setting of the UseSHA flag:
>>>>
>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821
>>>>
>>>> AMD 17h has support for SHA.
>>>> AMD 15h doesn't have support for SHA. Still, the "UseSHA" flag gets
>>>> enabled for it based on the availability of BMI2 and AVX2. Is there an
>>>> underlying reason for this? I have handled this in the patch but just
>>>> wanted to confirm.
>>>
>>> It was done with the following change, which uses only AVX2 and BMI2
>>> instructions to calculate SHA-256:
>>>
>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974
>>>
>>> I don't know if AMD 15h supports these instructions and can execute that
>>> code. You need to test it.
>>>
>>
>> Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions,
>> it should work.
>> Confirmed by running the following sanity tests:
>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java
>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java
>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java
>>
>> So I have removed those SHA checks from my patch too.
>>
>> Please find attached the updated, re-tested patch.
>> >> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >> b/src/cpu/x86/vm/vm_version_x86.cpp >> --- a/src/cpu/x86/vm/vm_version_x86.cpp >> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >> @@ -1109,11 +1109,27 @@ >> } >> >> #ifdef COMPILER2 >> - if (MaxVectorSize > 16) { >> - // Limit vectors size to 16 bytes on current AMD cpus. >> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >> FLAG_SET_DEFAULT(MaxVectorSize, 16); >> } >> #endif // COMPILER2 >> + >> + // Some defaults for AMD family 17h >> + if ( cpu_family() == 0x17 ) { >> + // On family 17h processors use XMM and UnalignedLoadStores for >> Array Copy >> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >> + } >> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >> + } >> +#ifdef COMPILER2 >> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >> + } >> +#endif >> + } >> } >> >> if( is_intel() ) { // Intel cpus specific settings >> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >> b/src/cpu/x86/vm/vm_version_x86.hpp >> --- a/src/cpu/x86/vm/vm_version_x86.hpp >> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >> @@ -505,6 +505,14 @@ >> result |= CPU_CLMUL; >> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >> result |= CPU_RTM; >> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >> + result |= CPU_ADX; >> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >> + result |= CPU_BMI2; >> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >> + result |= CPU_SHA; >> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >> + result |= CPU_FMA; >> >> // AMD features. 
>> if (is_amd()) { >> @@ -515,19 +523,13 @@ >> result |= CPU_LZCNT; >> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >> result |= CPU_SSE4A; >> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >> + result |= CPU_HT; >> } >> // Intel features. >> if(is_intel()) { >> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >> - result |= CPU_ADX; >> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >> - result |= CPU_BMI2; >> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >> - result |= CPU_SHA; >> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >> result |= CPU_LZCNT; >> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >> - result |= CPU_FMA; >> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >> support for prefetchw >> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >> result |= CPU_3DNOW_PREFETCH; >> >> Please let me know your comments. >> >> Thanks for your time. >> Rohit >> >>>> >>>> Thanks for taking time to review the code. >>>> >>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>> @@ -1088,6 +1088,22 @@ >>>> } >>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>> } >>>> + if (supports_sha()) { >>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>> + } >>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>> UseSHA512Intrinsics) { >>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>> + warning("SHA instructions are not available on this CPU"); >>>> + } >>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>> + } >>>> >>>> // some defaults for AMD family 15h >>>> if ( cpu_family() == 0x15 ) { >>>> @@ 
-1109,11 +1125,40 @@ >>>> } >>>> >>>> #ifdef COMPILER2 >>>> - if (MaxVectorSize > 16) { >>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>> } >>>> #endif // COMPILER2 >>>> + >>>> + // Some defaults for AMD family 17h >>>> + if ( cpu_family() == 0x17 ) { >>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>> Array Copy >>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>> + } >>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>> + } >>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>> + FLAG_SET_DEFAULT(UseBMI2Instructions, true); >>>> + } >>>> + if (UseSHA) { >>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>> + } else if (UseSHA512Intrinsics) { >>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>> functions not available on this CPU."); >>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>> + } >>>> + } >>>> +#ifdef COMPILER2 >>>> + if (supports_sse4_2()) { >>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>> + } >>>> + } >>>> +#endif >>>> + } >>>> } >>>> >>>> if( is_intel() ) { // Intel cpus specific settings >>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>> @@ -505,6 +505,14 @@ >>>> result |= CPU_CLMUL; >>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>> result |= CPU_RTM; >>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>> + result |= CPU_ADX; >>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>> + result |= CPU_BMI2; >>>> + if 
(_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>> + result |= CPU_SHA; >>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>> + result |= CPU_FMA; >>>> >>>> // AMD features. >>>> if (is_amd()) { >>>> @@ -515,19 +523,13 @@ >>>> result |= CPU_LZCNT; >>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>> result |= CPU_SSE4A; >>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>> + result |= CPU_HT; >>>> } >>>> // Intel features. >>>> if(is_intel()) { >>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>> - result |= CPU_ADX; >>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>> - result |= CPU_BMI2; >>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>> - result |= CPU_SHA; >>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>> result |= CPU_LZCNT; >>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>> - result |= CPU_FMA; >>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>> support for prefetchw >>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>> result |= CPU_3DNOW_PREFETCH; >>>> >>>> >>>> Regards, >>>> Rohit >>>> >>>> >>>> >>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>>>> >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>>>> >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Hi Rohit, >>>>>>>> >>>>>>>> I think the patch needs updating for jdk10 as I already see a lot of >>>>>>>> logic >>>>>>>> around UseSHA in vm_version_x86.cpp. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> >>>>>>> >>>>>>> Thanks David, I will update the patch wrt JDK10 source base, test and >>>>>>> resubmit for review. >>>>>>> >>>>>>> Regards, >>>>>>> Rohit >>>>>>> >>>>>> >>>>>> Hi All, >>>>>> >>>>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>>>> 13519:71337910df60), did regression testing using jtreg ($make >>>>>> default) and didnt find any regressions. 
>>>>>> >>>>>> Can anyone please volunteer to review this patch which sets flag/ISA >>>>>> defaults for newer AMD 17h (EPYC) processor? >>>>>> >>>>>> ************************* Patch **************************** >>>>>> >>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> @@ -1088,6 +1088,22 @@ >>>>>> } >>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>> } >>>>>> + if (supports_sha()) { >>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>> + } >>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>>>> UseSHA512Intrinsics) { >>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>> + warning("SHA instructions are not available on this CPU"); >>>>>> + } >>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>> + } >>>>>> >>>>>> // some defaults for AMD family 15h >>>>>> if ( cpu_family() == 0x15 ) { >>>>>> @@ -1109,11 +1125,43 @@ >>>>>> } >>>>>> >>>>>> #ifdef COMPILER2 >>>>>> - if (MaxVectorSize > 16) { >>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>> } >>>>>> #endif // COMPILER2 >>>>>> + >>>>>> + // Some defaults for AMD family 17h >>>>>> + if ( cpu_family() == 0x17 ) { >>>>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>>>> Array Copy >>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>> + UseXMMForArrayCopy = true; >>>>>> + } >>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>> { >>>>>> + UseUnalignedLoadStores = true; >>>>>> + } >>>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>> + UseBMI2Instructions = true; >>>>>> + } >>>>>> + if (MaxVectorSize > 32) { >>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>> + } >>>>>> + if (UseSHA) { >>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>> + } else if (UseSHA512Intrinsics) { >>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>> functions not available on this CPU."); >>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>> + } >>>>>> + } >>>>>> +#ifdef COMPILER2 >>>>>> + if (supports_sse4_2()) { >>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>> + } >>>>>> + } >>>>>> +#endif >>>>>> + } >>>>>> } >>>>>> >>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> @@ -505,6 +505,14 @@ >>>>>> result |= CPU_CLMUL; >>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>> result |= CPU_RTM; >>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>> + result |= CPU_ADX; >>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>> + result |= CPU_BMI2; >>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>> + result |= CPU_SHA; >>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma 
!= 0) >>>>>> + result |= CPU_FMA; >>>>>> >>>>>> // AMD features. >>>>>> if (is_amd()) { >>>>>> @@ -515,19 +523,13 @@ >>>>>> result |= CPU_LZCNT; >>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>> result |= CPU_SSE4A; >>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>> + result |= CPU_HT; >>>>>> } >>>>>> // Intel features. >>>>>> if(is_intel()) { >>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>> - result |= CPU_ADX; >>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>> - result |= CPU_BMI2; >>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>> - result |= CPU_SHA; >>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>> result |= CPU_LZCNT; >>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>> - result |= CPU_FMA; >>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>>> support for prefetchw >>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>> >>>>>> ************************************************************** >>>>>> >>>>>> Thanks, >>>>>> Rohit >>>>>> >>>>>>>> >>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>>>> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi Rohit, >>>>>>>>>> >>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I would like an volunteer to review this patch (openJDK9) which >>>>>>>>>>> sets >>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us >>>>>>>>>>> with >>>>>>>>>>> the commit process. 
>>>>>>>>>>> >>>>>>>>>>> Webrev: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Unfortunately patches can not be accepted from systems outside the >>>>>>>>>> OpenJDK >>>>>>>>>> infrastructure and ... >>>>>>>>>> >>>>>>>>>>> I have also attached the patch (hg diff -g) for reference. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ... unfortunately patches tend to get stripped by the mail >>>>>>>>>> servers. >>>>>>>>>> If >>>>>>>>>> the >>>>>>>>>> patch is small please include it inline. Otherwise you will need >>>>>>>>>> to >>>>>>>>>> find >>>>>>>>>> an >>>>>>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net. >>>>>>>>>> >>>>>>>>> >>>>>>>>>>> 3) I have done regression testing using jtreg ($make default) and >>>>>>>>>>> didnt find any regressions. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Sounds good, but until I see the patch it is hard to comment on >>>>>>>>>> testing >>>>>>>>>> requirements. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> David >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks David, >>>>>>>>> Yes, it's a small patch. 
>>>>>>>>> >>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>>>> } >>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>> } >>>>>>>>> + if (supports_sha()) { >>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>> + } >>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics >>>>>>>>> || >>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>> + warning("SHA instructions are not available on this CPU"); >>>>>>>>> + } >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>> + } >>>>>>>>> >>>>>>>>> // some defaults for AMD family 15h >>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>>>> } >>>>>>>>> >>>>>>>>> #ifdef COMPILER2 >>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>> } >>>>>>>>> #endif // COMPILER2 >>>>>>>>> + >>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>>> for >>>>>>>>> Array Copy >>>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>> { >>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>> + } >>>>>>>>> + if (supports_sse2() && >>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>> { >>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>> + } >>>>>>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>> { >>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>> + } >>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>> + } >>>>>>>>> + if (UseSHA) { >>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>>>> functions not available on this CPU."); >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> +#ifdef COMPILER2 >>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> +#endif >>>>>>>>> + } >>>>>>>>> } >>>>>>>>> >>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> @@ -513,6 +513,16 @@ >>>>>>>>> result |= CPU_LZCNT; >>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>> result |= CPU_SSE4A; >>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>> + result |= CPU_BMI2; >>>>>>>>> + 
if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>>>> + result |= CPU_HT;
>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>>>> + result |= CPU_ADX;
>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>>>> + result |= CPU_SHA;
>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>>>> + result |= CPU_FMA;
>>>>>>>>> }
>>>>>>>>> // Intel features.
>>>>>>>>> if(is_intel()) {
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Rohit
>>>>>>>>>
>>>>>>>>
>>>>>
>>>
>>
>

From erik.osterlund at oracle.com Mon Sep 4 07:15:02 2017
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 4 Sep 2017 09:15:02 +0200
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com>
References: <59A804F8.9000501@oracle.com>
 <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com>
 <59A92896.9010604@oracle.com>
 <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com>
 <59A93B53.9010505@oracle.com>
 <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com>
Message-ID: <59ACFD76.3000606@oracle.com>

Hi David,

On 2017-09-04 03:24, David Holmes wrote:
> Hi Erik,
>
> On 1/09/2017 8:49 PM, Erik Österlund wrote:
>> Hi David,
>>
>> The shared structure for all operations is the following:
>>
>> An Atomic::something call creates a SomethingImpl function object
>> that performs some basic type checking and then forwards the call
>> straight to a PlatformSomething function object. This
>> PlatformSomething object could decide to do anything. But to make
>> life easier, it may inherit from a shared SomethingHelper function
>> object with CRTP that calls back into the PlatformSomething function
>> object to emit inline assembly.
>
> Right, but! Let's look at some details.
>
> Atomic::add
> AddImpl
> PlatformAdd
> FetchAndAdd
> AddAndFetch
> add_using_helper
>
> Atomic::cmpxchg
> CmpxchgImpl
> PlatformCmpxchg
> cmpxchg_using_helper
>
> Atomic::inc
> IncImpl
> PlatformInc
> IncUsingConstant
>
> Why is it that the simplest operation (inc/dec) has the most complex
> platform template definition? Why do we need Adjustment? You
> previously said "Adjustment represents the increment/decrement value
> as an IntegralConstant - your template friend for passing around a
> constant with both a specified type and value in templates". But add
> passes around values and doesn't need this. Further inc/dec don't need
> to pass anything around anywhere - inc adds 1, dec subtracts 1! This
> "1" does not need to appear anywhere in the API or get passed across
> layers - the only place this "1" becomes evident is in the actual
> platform asm that does the logic of "add 1" or "subtract 1".
>
> My understanding from previous discussions is that much of the
> template machinations was to deal with type management for "dest" and
> the values being passed around. But here, for inc/dec there are no
> values being passed so we don't have to make "dest" type-compatible
> with any value.

Dealing with different types being passed in is one part of the
problem - a problem that almost all operations seem to have. But
Atomic::add and inc/dec have more problems to deal with.

The Atomic::add operation has two more problems that cmpxchg does not
have.
1) It needs to scale pointer arithmetic. So if you have a P* and you
add 2 to it, then you really add 2 * sizeof(P) to the underlying
value, and the scaled addend needs to be of the right type - the type
of the destination for integral types and ptrdiff_t for pointers. This
is similar semantics to ++pointer.
2) It connects backends with different semantics - either
fetch_and_add or add_and_fetch - to a common public interface with
add_and_fetch semantics.
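
The two Atomic::add problems listed above can be sketched in a few
lines. This is only an illustration under stated assumptions: it uses
std::atomic instead of HotSpot's Atomic class, and the names
backend_fetch_and_add, add_and_fetch and scaled_addend are hypothetical
helpers invented for this sketch, not functions from the webrev.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical backend primitive with fetch_and_add semantics:
// it returns the value that was stored BEFORE the addition.
inline std::intptr_t backend_fetch_and_add(std::atomic<std::intptr_t>& dest,
                                           std::intptr_t addend) {
  return dest.fetch_add(addend);
}

// Problem 2: expose add_and_fetch semantics on top of such a backend by
// adding the addend once more to the fetched (old) value.
inline std::intptr_t add_and_fetch(std::atomic<std::intptr_t>& dest,
                                   std::intptr_t addend) {
  return backend_fetch_and_add(dest, addend) + addend;
}

// Problem 1: for a destination that holds a P*, "add n" really means
// "add n * sizeof(P)" to the underlying address, expressed as ptrdiff_t.
template <typename P>
inline std::ptrdiff_t scaled_addend(std::ptrdiff_t n) {
  return n * static_cast<std::ptrdiff_t>(sizeof(P));
}
```

For example, advancing an address that refers to an int array by two
elements would pass scaled_addend<int>(2), that is 2 * sizeof(int), as
the addend rather than the plain value 2.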

These two problems are the reason that Atomic::add might appear more
complicated than Atomic::cmpxchg: Atomic::cmpxchg only had the
differing-type problems to deal with - no pointer arithmetic.

The reason why Atomic::inc/dec looks more complicated than Atomic::add
is that it needs to preserve the pointer arithmetic as constants
rather than values, because the scaled addend is embedded in the
inline assembly as an immediate value. Therefore it passes around an
IntegralConstant that embeds both the type and size of the addend. And
it is not just 1/-1. For integral destinations the constant used is
1/-1 of the type stored at the destination. For pointers the constant
is a ptrdiff_t with a value representing the size of the element
pointed to.

Having said that - I am not opposed to simply removing the
specializations of inc/dec if we are scared of the complexity of
passing this constant to the platform layer. A bunch of benchmarks
that I ran over the weekend showed no significant regressions after
removal. Now of course that might not tell the full story - they could
have missed that some critical operation in the JVM takes longer. But
I would be very surprised if that was the case.

Thanks,
/Erik

>
> Cheers,
> David
> -----
>
>> Hope this explanation helps understanding the intended structure of
>> this work.
>>
>> Thanks,
>> /Erik
>>
>> On 2017-09-01 12:34, David Holmes wrote:
>>> Hi Erik,
>>>
>>> I just wanted to add that I would expect the cmpxchg, add and inc,
>>> Atomic API's to all require similar basic structure for manipulating
>>> types/values etc, yet all three seem to have quite different
>>> structures that I find very confusing. I'm still at a loss to fathom
>>> the CRTP and the hoops we seemingly have to jump through just to add
>>> or subtract 1!!!
>>>
>>> Cheers,
>>> David
>>>
>>> On 1/09/2017 7:29 PM, Erik Österlund wrote:
>>>> Hi David,
>>>>
>>>> On 2017-09-01 02:49, David Holmes wrote:
>>>>> Hi Erik,
>>>>>
>>>>> Sorry but this one is really losing me.
>>>>>
>>>>> What is the role of Adjustment?
>>>>
>>>> Adjustment represents the increment/decrement value as an
>>>> IntegralConstant - your template friend for passing around a
>>>> constant with both a specified type and value in templates. The
>>>> type of the increment/decrement is the type of the destination when
>>>> the destination is an integral type, otherwise if it is a pointer
>>>> type, the increment/decrement type is ptrdiff_t.
>>>>
>>>>> How are inc/dec anything but "using constant"?
>>>>
>>>> I was also a bit torn on that name (I assume you are referring to
>>>> IncUsingConstant/DecUsingConstant). It was hard to find a name that
>>>> depicted what this platform helper does. I considered calling the
>>>> helper something with immediate in the name because it is really
>>>> used to embed the constant as immediate values in inline assembly
>>>> today. But then again that seemed too specific, as it is not
>>>> completely obvious platform specializations will use it in that
>>>> way. One might just want to specialize this to send it into some
>>>> compiler Atomic::inc intrinsic for example. Do you have any other
>>>> preferred names? Here are a few possible names for IncUsingConstant:
>>>>
>>>> IncUsingScaledConstant
>>>> IncUsingAdjustedConstant
>>>> IncUsingPlatformHelper
>>>>
>>>> Any favourites?
>>>>
>>>>> Why do we special case jshort?
>>>>
>>>> To be consistent with the special case of Atomic::add on jshort. Do
>>>> you want it removed?
>>>>
>>>>> This is indecipherable to normal people ;-)
>>>>>
>>>>> This()->template inc(dest);
>>>>>
>>>>> For something as trivial as adding or subtracting 1 the template
>>>>> machinations here are just mind boggling!
>>>>
>>>> This uses the CRTP (Curiously Recurring Template Pattern) C++
>>>> idiom.
>>>> The idea is to devirtualize a virtual call by passing in the
>>>> derived type as a template parameter to a base class, and then let
>>>> the base class static_cast to the derived class to devirtualize the
>>>> call. I hope this explanation sheds some light on what is going on.
>>>> The same CRTP idiom was used in the Atomic::add implementation in a
>>>> similar fashion.
>>>>
>>>> I will add some comments describing this in the next round after
>>>> Coleen replies.
>>>>
>>>> Thanks for looking at this.
>>>>
>>>> /Erik
>>>>
>>>>>
>>>>> Cheers,
>>>>> David
>>>>>
>>>>> On 31/08/2017 10:45 PM, Erik Österlund wrote:
>>>>>> Hi everyone,
>>>>>>
>>>>>> Bug ID:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8186838
>>>>>>
>>>>>> Webrev:
>>>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>>>>>>
>>>>>> The time has come for the next step in generalizing Atomic with
>>>>>> templates. Today I will focus on Atomic::inc/dec.
>>>>>>
>>>>>> I have tried to mimic the new Kim style that seems to have been
>>>>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the
>>>>>> structure looks like this:
>>>>>>
>>>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function
>>>>>> object that performs some basic type checks.
>>>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can
>>>>>> define the operation arbitrarily for a given platform. The
>>>>>> default implementation if not specialized for a platform is to
>>>>>> call Atomic::add. So only platforms that want to do something
>>>>>> different than that as an optimization have to provide a
>>>>>> specialization.
>>>>>> Layer 3) Platforms that decide to specialize
>>>>>> PlatformInc/PlatformDec to be more optimized may inherit from a
>>>>>> helper class IncUsingConstant/DecUsingConstant. This helper helps
>>>>>> perform the necessary computation of what the increment/decrement
>>>>>> should be after pointer scaling, using CRTP.
>>>>>> The PlatformInc/PlatformDec operation then only needs to define an
>>>>>> inc/dec member function, and will then get all the context
>>>>>> information necessary to generate a more optimized
>>>>>> implementation. Easy peasy.
>>>>>>
>>>>>> It is worth noticing that the generalized Atomic::dec operation
>>>>>> assumes a two's complement integer machine and potentially sends
>>>>>> the unary negative of a potentially unsigned type to Atomic::add.
>>>>>> I have the following comments about this:
>>>>>> 1) We already assume in other code that two's complement integers
>>>>>> must be present.
>>>>>> 2) A machine that does not have two's complement integers may
>>>>>> still simply provide a specialization that solves the problem in
>>>>>> a different way.
>>>>>> 3) The alternative that does not make assumptions about that
>>>>>> would use the good old IntegerTypes::cast_to_signed
>>>>>> metaprogramming stuff, and I seem to recall we thought that was a
>>>>>> bit too involved and complicated.
>>>>>> This is the reason why I have chosen to use unary minus on the
>>>>>> potentially unsigned type in the shared helper code that sends
>>>>>> the decrement as an addend to Atomic::add.
>>>>>>
>>>>>> It would also be nice if somebody with access to PPC and s390
>>>>>> machines could try out the relevant changes there so I do not
>>>>>> accidentally break those platforms. I have blind-coded the
>>>>>> addition of the immediate values passed in to the inline assembly
>>>>>> in a way that I think looks like it should work.
>>>>>> >>>>>> Testing: >>>>>> RBT hs-tier3, JPRT --testset hotspot >>>>>> >>>>>> Thanks, >>>>>> /Erik >>>> >> From erik.osterlund at oracle.com Mon Sep 4 08:14:48 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Mon, 4 Sep 2017 10:14:48 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> Message-ID: <59AD0B78.4000707@oracle.com> Hi Andrew, On 2017-09-02 10:31, Andrew Haley wrote: > On 01/09/17 15:15, Erik ?sterlund wrote: >> It is not the simplest solution I can think of. The simplest solution I >> can think of is to remove all specialized versions of Atomic::inc/dec >> and just have it call Atomic::add directly. That would remove the >> optimizations we have today, for whatever reason we have them. It would >> lead to slightly more conservative fencing on PPC/S390, > I see. Can you say what instructions would be different? Sure. Specializations exist on x86, PPC and S390. Removing these specializations would have the following consequences: ------------------------------------------------------------------- On x86 Atomic::inc of 4 byte sized types: lock addl $immediateAddend,(rDest) becomes lock xaddl rAddend,(rDest) # stores the value that was there back in rAddend upon completion So the inc optimization currently makes sure the addend can be encoded as an immediate value in the code stream, and exploits that we do not need to see the returned value. Therefore a lock addl is good enough for those purposes and does not require the use of an extra register. But it is not obvious that on a modern machine today that slimmed encoding will make any significant difference at all. In the contended case it arguably will not matter. Similar arguments apply for 8 byte sized types and the Atomic::dec variants. 
-------------------------------------------------------------------

On PPC Atomic::inc/dec and Atomic::add have the following differences:

Atomic::inc/dec uses addic between the LL and SC instructions with an immediate value for adding, whereas Atomic::add uses the add instruction with an extra register.
Atomic::add has a leading lwsync fence and Atomic::inc/dec has no leading fence.
Atomic::add has a trailing isync fence and Atomic::inc/dec has no trailing fence.

So the current implementation of Atomic::add uses heavier fencing than Atomic::inc/dec. I can imagine that does matter for performance today. However, the documented semantics of Atomic::inc/dec require a leading sync fence, so they are both arguably too weak and should have stronger fencing than they do today. And I would argue that if both conformed to the fencing required by our public API, then the difference would probably be small. If dodging those fences on PPC is crucial for performance, then I believe the right way of fixing that is by introducing relaxed atomics, should that be necessary.

-------------------------------------------------------------------

On S390 Atomic::inc/dec and Atomic::add look almost identical. But I spotted the following tiny differences:

Atomic::inc on 4-byte sized types loads the increment with LGHI, whereas Atomic::add loads it with LGFR.
Similarly, Atomic::inc calculates the new value with AGHI and Atomic::add calculates the new value with AR.

I am not too familiar with S390, but if I get this right then Atomic::add uses a fetch_and_add instruction, and then adds the addend to the fetched value in the assembly to conform to add_and_fetch semantics. Atomic::inc also uses a fetch_and_add instruction and seems to also calculate the add_and_fetch result value, without returning it or in any other way using it.
If the native fetch_and_add instruction is not available, it resorts to using a load, add and CAS loop, and the two look identical except for Atomic::inc using an immediate value. The same applies for Atomic::dec and 8 byte sized types.

Either way, the differences between add and inc/dec currently seem to be mostly related to using immediate values vs a register, if I get it right. And I would be surprised if that makes a huge difference.

-------------------------------------------------------------------

All in all, I would not be unhappy about dropping the Atomic::inc specializations in the name of simplicity, and potentially introducing relaxed atomics instead for the platforms that rely on fence elision, should that be required.

Thanks,
/Erik

>
>> and would lead to slightly less optimal machine encoding on x86
>> (without immediate values in the instructions). But it would be
>> simpler for sure. I did not put any judgement into whether our
>> existing optimizations are worthwhile or not. But if you want to
>> prioritize simplicity, removing those optimizations is one possible
>> solution. Would you prefer that?
> Is this really about optimization? If we cared about getting this
> stuff as optimized as possible we'd use intrinsics on GCC/x86 targets.
> These have been supported for a long time. But it seems we're
> determined to preserve the legacy assembly-language implementations
> and use them everywhere, even where they are not necessary.
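
To make the simplification discussed in the message above concrete, here is a minimal sketch of inc/dec that simply forward to an add operation, with dec sending the unary negative of one as the addend. This is not the HotSpot code: the names sketch_add, sketch_inc, sketch_dec and sketch_demo are hypothetical, the gcc/clang __atomic builtins stand in for the platform layer, and seq_cst ordering is used only for illustration (the thread points out that HotSpot's conservative ordering is stronger than seq_cst).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for Atomic::add: add_and_fetch semantics.
// Assumes a gcc/clang toolchain; the ordering here is illustrative only.
template <typename T>
inline T sketch_add(T addend, volatile T* dest) {
  return __atomic_add_fetch(dest, addend, __ATOMIC_SEQ_CST);
}

// inc discards the result, so after inlining the compiler is free to pick
// an encoding with an immediate operand instead of an extra register.
template <typename T>
inline void sketch_inc(volatile T* dest) {
  (void)sketch_add(T(1), dest);
}

// dec passes the unary negative of one. For an unsigned T the conversion
// back to T wraps modulo 2^N, so adding this value subtracts one.
template <typename T>
inline void sketch_dec(volatile T* dest) {
  (void)sketch_add(static_cast<T>(-T(1)), dest);
}

// Small usage example on an unsigned counter.
inline void sketch_demo() {
  volatile uint32_t counter = 0;
  sketch_inc(&counter);
  sketch_inc(&counter);
  sketch_dec(&counter);
  assert(counter == 1u);
}
```

Whether a given compiler really selects the immediate form for the discarded result depends on the target and optimization level, so the generated assembly should be inspected rather than assumed.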
> From aph at redhat.com Mon Sep 4 09:21:01 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 4 Sep 2017 10:21:01 +0100 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59ACFD76.3000606@oracle.com> References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> <59A93B53.9010505@oracle.com> <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com> <59ACFD76.3000606@oracle.com> Message-ID: <227233be-69a3-44f6-ea74-c86ed66aa44e@redhat.com> On 04/09/17 08:15, Erik ?sterlund wrote: > Having said that - I am not opposed to simply removing the > specializations of inc/dec if we are scared of the complexity of > passing this constant to the platform layer. It isn't exactly about fear, but of course we should be cautious about adding complexity. Simplicity is prerequisite for reliability. [One of Dijkstra's pithiest comments.] > After running a bunch of benchmarks over the weekend, it showed no > significant regressions after removal. Now of course that might not > tell the full story - it could have missed that some critical > operation in the JVM takes longer. But I would be very surprised if > that was the case. Good. So would I. Fred Brooks distinguishes between two types of complexity: accidental and essential. Essential complexity is determined by the problem to be solved, and nothing can remove it. Accidental complexity is caused by the implementation: programming language, use of assembly code, and so on. In this case, the idea of atomically incrementing a variable is extremely simple. It's barely even worthy of the name "algorithm". I believe that almost all of the complexity of a solution is accidental: it's mostly caused by C++, the C++ compilers we use, and the internal conventions of HotSpot. The question in my mind is: how much of the accidental complexity can we remove? 
-- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Mon Sep 4 09:24:14 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 4 Sep 2017 10:24:14 +0100 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59AD0B78.4000707@oracle.com> References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <59AD0B78.4000707@oracle.com> Message-ID: <8482b0b5-8791-9495-7d3c-d9155bb32518@redhat.com> On 04/09/17 09:14, Erik ?sterlund wrote: > On PPC Atomic::inc/dec and Atomic::add have the following differences: > > Atomic::inc/dec uses addic between the LL and SC instructions with an > immediate value for adding, whereas Atomic::add uses the add instruction > with an extra register. > Atomic::add has a leading lwsync fence and Atomic::inc/dec has no > leading fence. > Atomic::add has a trailing isync fence and Atomic::inc/dec has no > trailing fence. One of those must be a bug. Either one of them is unnecessary or both are necessary. > So the current implementation of Atomic::add uses heavier fencing than > Atomic::inc/dec. I can imagine that does matter for performance today. > However, the documented semantics of Atomic::inc/dec requires a leading > sync fence - so they are both arguably too weak and should have stronger > fencing than they do today. And I would argue that if both conformed to > the fencing required by our public API, then the difference would > probably be small. Right. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From robbin.ehn at oracle.com Mon Sep 4 09:34:46 2017
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Mon, 4 Sep 2017 11:34:46 +0200
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To:
References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com>
Message-ID: <6f623116-8213-1523-ba37-fb4ce17c4afa@oracle.com>

Hi,

On 09/02/2017 10:31 AM, Andrew Haley wrote:
> On 01/09/17 15:15, Erik Österlund wrote:
>> It is not the simplest solution I can think of. The simplest solution I
>> can think of is to remove all specialized versions of Atomic::inc/dec
>> and just have it call Atomic::add directly. That would remove the
>> optimizations we have today, for whatever reason we have them. It would
>> lead to slightly more conservative fencing on PPC/S390,
>
> I see. Can you say what instructions would be different?
>
>> and would lead to slightly less optimal machine encoding on x86
>> (without immediate values in the instructions). But it would be
>> simpler for sure. I did not put any judgement into whether our
>> existing optimizations are worthwhile or not. But if you want to
>> prioritize simplicity, removing those optimizations is one possible
>> solution. Would you prefer that?
>
> Is this really about optimization? If we cared about getting this
> stuff as optimized as possible we'd use intrinsics on GCC/x86 targets.
> These have been supported for a long time. But it seems we're
> determined to preserve the legacy assembly-language implementations
> and use them everywhere, even where they are not necessary.
>

Why not use the gcc/clang intrinsics on all platforms where we use gcc/clang (not just gcc/x86)?
For "__atomic_fetch_add (&value, inc, __ATOMIC_RELAXED);" gcc seems to generate "lock addl" on x86 and ldxr/stxr on armv8; with acq_rel it generates ldaxr/stlxr, which is what I would expect.

And thus we can remove a lot of code!
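
The intrinsic named above can be wrapped as follows; this is only a sketch of the two orderings mentioned (relaxed and acq_rel), assuming a gcc or clang toolchain. The function names fetch_add_relaxed, fetch_add_acq_rel and fetch_add_demo are made up for illustration and are not part of HotSpot, and the instructions actually emitted depend on the compiler, target and options.

```cpp
#include <cassert>
#include <cstdint>

// Relaxed ordering: the read-modify-write is atomic, but it imposes no
// ordering on surrounding memory accesses.
inline int32_t fetch_add_relaxed(volatile int32_t* value, int32_t inc) {
  return __atomic_fetch_add(value, inc, __ATOMIC_RELAXED);
}

// Acquire-release ordering on the same read-modify-write.
inline int32_t fetch_add_acq_rel(volatile int32_t* value, int32_t inc) {
  return __atomic_fetch_add(value, inc, __ATOMIC_ACQ_REL);
}

// Usage: both variants return the value held before the addition.
inline void fetch_add_demo() {
  volatile int32_t value = 10;
  int32_t before_first = fetch_add_relaxed(&value, 5);
  int32_t before_second = fetch_add_acq_rel(&value, 1);
  assert(before_first == 10);
  assert(before_second == 15);
  assert(value == 16);
  (void)before_first;
  (void)before_second;
}
```

Compiling such a file with -O2 -S for each target is one way to check which instructions the compiler picks.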
(if we should have the relaxed version in API is another question) /Robbin From erik.osterlund at oracle.com Mon Sep 4 09:50:14 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Mon, 4 Sep 2017 11:50:14 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <6f623116-8213-1523-ba37-fb4ce17c4afa@oracle.com> References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <6f623116-8213-1523-ba37-fb4ce17c4afa@oracle.com> Message-ID: <59AD21D6.8040305@oracle.com> Hi Robbin, I agree that on x86, there isn't a whole lot of other things the compiler could do with the intrinsics than what we want it to do due to the relatively strong memory model of the machine. So this might be a possible simplification on x86 gcc/clang targets (but still not all x86 targets). As for PPC and ARMv7 though, that is not true any longer. For example, our conservative memory model is more conservative than seq_cst semantics. E.g. it also has "leading sync" semantics always guaranteed, which is exploited in our code base and would be broken if translated simply as seq_cst. Also, since the fencing from the C++ compiler must be compliant with what our code generation does, they could end up being incompatible due to choice of different fencing conventions. Intrinsic provided operations may or may not have leading sync semantics. We can hope for it, but we should never rely on it. Thanks, /Erik On 2017-09-04 11:34, Robbin Ehn wrote: > Hi, > > On 09/02/2017 10:31 AM, Andrew Haley wrote: >> On 01/09/17 15:15, Erik ?sterlund wrote: >>> It is not the simplest solution I can think of. The simplest solution I >>> can think of is to remove all specialized versions of Atomic::inc/dec >>> and just have it call Atomic::add directly. That would remove the >>> optimizations we have today, for whatever reason we have them. It would >>> lead to slightly more conservative fencing on PPC/S390, >> >> I see. 
Can you say what instructions would be different? >> >>> and would lead to slightly less optimal machine encoding on x86 >>> (without immediate values in the instructions). But it would be >>> simpler for sure. I did not put any judgement into whether our >>> existing optimizations are worthwhile or not. But if you want to >>> prioritize simplicity, removing those optimizations is one possible >>> solution. Would you prefer that? >> >> Is this really about optimization? If we cared about getting this >> stuff as optimized as possible we'd use intrinsics on GCC/x86 targets. >> These have been supported for a long time. But it seems we're >> determined to preserve the legacy assembly-language implementations >> and use them everywhere, even where they are not necessary. >> > > Why not use gcc/clang intrinsic on for all platforms we use gcc/clang? > (not just gcc/x86) > For "__atomic_fetch_add (&value, inc, __ATOMIC_RELAXED);" > gcc seem to generate "lock addl" on x86 and armv8 ldxr,stxr, with > acq_rel ldaxr,stlxr, which is what I would expect. > > And thus we can remove a lot of code! > > (if we should have the relaxed version in API is another question) > > /Robbin From aph at redhat.com Mon Sep 4 10:05:53 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 4 Sep 2017 11:05:53 +0100 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59AD21D6.8040305@oracle.com> References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <6f623116-8213-1523-ba37-fb4ce17c4afa@oracle.com> <59AD21D6.8040305@oracle.com> Message-ID: On 04/09/17 10:50, Erik ?sterlund wrote: > As for PPC and ARMv7 though, that is not true any longer. For > example, our conservative memory model is more conservative than > seq_cst semantics. E.g. it also has "leading sync" semantics always > guaranteed, which is exploited in our code base and would be broken > if translated simply as seq_cst. 
Also, since the fencing from the > C++ compiler must be compliant with what our code generation does, > they could end up being incompatible due to choice of different > fencing conventions. Intrinsic provided operations may or may not > have leading sync semantics. We can hope for it, but we should never > rely on it. We can use intrinsics to get any fencing we want. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From robbin.ehn at oracle.com Mon Sep 4 10:18:04 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Mon, 4 Sep 2017 12:18:04 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <6f623116-8213-1523-ba37-fb4ce17c4afa@oracle.com> <59AD21D6.8040305@oracle.com> Message-ID: <4cdc378a-960f-6ffd-96cf-23e932da0dda@oracle.com> On 09/04/2017 12:05 PM, Andrew Haley wrote: > On 04/09/17 10:50, Erik ?sterlund wrote: > >> As for PPC and ARMv7 though, that is not true any longer. For >> example, our conservative memory model is more conservative than >> seq_cst semantics. E.g. it also has "leading sync" semantics always >> guaranteed, which is exploited in our code base and would be broken >> if translated simply as seq_cst. Also, since the fencing from the >> C++ compiler must be compliant with what our code generation does, >> they could end up being incompatible due to choice of different >> fencing conventions. Intrinsic provided operations may or may not >> have leading sync semantics. We can hope for it, but we should never >> rely on it. > > We can use intrinsics to get any fencing we want. > +1, was just writing the same thing. 
/Robbin

From erik.osterlund at oracle.com Mon Sep 4 10:26:44 2017
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 4 Sep 2017 12:26:44 +0200
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To:
References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <6f623116-8213-1523-ba37-fb4ce17c4afa@oracle.com> <59AD21D6.8040305@oracle.com>
Message-ID: <59AD2A64.3070507@oracle.com>

Hi Andrew,

On 2017-09-04 12:05, Andrew Haley wrote:
> On 04/09/17 10:50, Erik Österlund wrote:
>
>> As for PPC and ARMv7 though, that is not true any longer. For
>> example, our conservative memory model is more conservative than
>> seq_cst semantics. E.g. it also has "leading sync" semantics always
>> guaranteed, which is exploited in our code base and would be broken
>> if translated simply as seq_cst. Also, since the fencing from the
>> C++ compiler must be compliant with what our code generation does,
>> they could end up being incompatible due to choice of different
>> fencing conventions. Intrinsic provided operations may or may not
>> have leading sync semantics. We can hope for it, but we should never
>> rely on it.
> We can use intrinsics to get any fencing we want.

1) I want evidence for this claim. Can you get leading and trailing dmb sy (rather than dmb ish) for atomic operations on ARMv7?

2) Even if you could and the compiler happens to generate that, we cannot rely on it because there is no contract with the compiler about which fence instructions it elects to use. The only contract the compiler needs to abide by is how atomic C++ operations interact with other C++ operations. And we do not want the underlying fencing to silently change when performing compiler upgrades.
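
For readers following the disagreement, this is roughly what "getting fencing from intrinsics" could look like: a relaxed atomic increment bracketed by explicit full fences, assuming a gcc or clang toolchain. It is a hedged sketch with hypothetical names (inc_with_explicit_fences, fence_demo), not HotSpot code. It shows only the C++-level request; which machine instruction each fence becomes on a given target is decided by the compiler, which is exactly the concern raised in point 2 above.

```cpp
#include <cassert>
#include <cstdint>

// Relaxed increment with an explicit leading and trailing full fence.
// The fences are requested at the language level; their machine encoding
// is chosen by the compiler and is not fixed by this source code.
inline void inc_with_explicit_fences(volatile int32_t* dest) {
  __atomic_thread_fence(__ATOMIC_SEQ_CST);              // leading fence
  (void)__atomic_fetch_add(dest, 1, __ATOMIC_RELAXED);  // atomic add, no ordering of its own
  __atomic_thread_fence(__ATOMIC_SEQ_CST);              // trailing fence
}

// Usage: functionally this is still just an increment.
inline void fence_demo() {
  volatile int32_t counter = 41;
  inc_with_explicit_fences(&counter);
  assert(counter == 42);
}
```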
Thanks, /Erik From magnus.ihse.bursie at oracle.com Mon Sep 4 10:30:20 2017 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Mon, 4 Sep 2017 12:30:20 +0200 Subject: [RFR]: 8186578: Zero fails to build on linux-sparc due to sparc-specific code In-Reply-To: References: <088045c0-efcc-ab7a-a088-e80579a3c12b@physik.fu-berlin.de> <14f28ddc-6929-dd0d-a77a-1a463c47d40b@oracle.com> <79503f9c-bc57-725e-b8f1-40cb522b9218@physik.fu-berlin.de> <90649da7-48e5-22b1-3118-5861c6bb0e24@physik.fu-berlin.de> <3cb65ceb-c575-e446-bb66-a50c4b02684a@physik.fu-berlin.de> Message-ID: <12eb6779-8b25-89b5-b3c0-ea30828979fd@oracle.com> On 2017-08-24 18:19, Thomas St?fe wrote: > On Thu, Aug 24, 2017 at 3:51 PM, John Paul Adrian Glaubitz < > glaubitz at physik.fu-berlin.de> wrote: > >> On 08/24/2017 03:22 PM, John Paul Adrian Glaubitz wrote: >> >>> Do the gtests (especially test_memset_with_concurrent_readers.cpp) run >>>> through with your patch? >>>> >>> I will run the testsuite in a second and report back. >>> >> Ok. I have to admit I don't understand how to run the testsuite out of the >> build tree. It mentions jtreg which I have installed: >> >> glaubitz at deb4g:~$ jtreg -version >> jtreg, version 4.2 src b07 >> Installed in /usr/share/java/jtreg.jar >> Running on platform version 1.8.0_144 from /usr/lib/jvm/java-8-openjdk-sp >> arc64/jre. >> Built with 1.8.0_131 on Tue, 20 Jun 2017 10:54:14 +0200. >> Copyright (c) 1999, 2016, Oracle and/or its affiliates. All rights >> reserved. >> Use is subject to license terms. >> glaubitz at deb4g:~$ >> >> But the configure script complains about jtreg missing: >> >> checking if jtreg failure handler should be built... configure: error: >> Cannot enable jtreg failure handler without jtreg. >> configure exiting with result code 1 >> glaubitz at deb4g:~/openjdk/hs$ >> >> I also don't fully understand how the testsuite is run as mentioned in >> [1]. 
It >> talks about jtreg and then about jtreg harness which doesn't have clear >> build >> instructions [2]. >> >> Adrian >> > Sorry, I should have been more specific. The gtests have nothing to do with > the jtreg suite, they are a set of native tests using google test. > > Just execute (from your build directory): > ./hotspot/variant-server/libjvm/gtest/gtestLauncher -jdk:./images/jdk > > There is also a way to execute them from the make, but I do not know how. For the record: "make run-test-gtest" or "make run-test TEST=gtest" The latter form also allows for a test selection, like this: "make run-test TEST=gtest:LogDecorations". See common/doc/testing.md for more information. /Magnus > > Best Regards, Thomas > > >> [1] http://download.java.net/openjdk/testresults/8/docs/howtoruntests.html >>> [2] http://openjdk.java.net/jtreg/build.html >>> >> -- >> .''`. John Paul Adrian Glaubitz >> : :' : Debian Developer - glaubitz at debian.org >> `. `' Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de >> `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 >> From aph at redhat.com Mon Sep 4 10:41:38 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 4 Sep 2017 11:41:38 +0100 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59AD2A64.3070507@oracle.com> References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <6f623116-8213-1523-ba37-fb4ce17c4afa@oracle.com> <59AD21D6.8040305@oracle.com> <59AD2A64.3070507@oracle.com> Message-ID: On 04/09/17 11:26, Erik ?sterlund wrote: > 1) I want evidence for this claim. Can you get leading and trailing dmb > sy (rather than dmb ish) for atomic operations on ARMv7? I hope not. There is no reason for us to want such a thing in HotSpot. But even if we did want such a thing, we could crop down to asm: the point is the usual cases, not weird corner cases. 
> 2) Even if you could and the compiler happens to generate that - we can > not rely on it because there is no contract to the compiler what fence > instructions it elects to use. The only contract the compiler needs to > abide to is how atomic C++ operations interact with other C++ > operations. And we do not want the underlying fencing to silently change > when performing compiler upgrades. There is no way that GCC writers would break ABI compatibility in such a fundamental way. There would be a firestorm. I know this because even if no-one else started the fire, I would. I am a GCC author. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From glaubitz at physik.fu-berlin.de Mon Sep 4 11:18:30 2017 From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz) Date: Mon, 4 Sep 2017 13:18:30 +0200 Subject: How to suppress verbosity when settting _JAVA_OPTIONS? Message-ID: <4f21fbf9-9802-6f39-e90c-fda58c09bf72@physik.fu-berlin.de> Hello! I'm currently testing Zero builds on Linux Alpha, in my particular case on QEMU in a Debian unstable alpha chroot, using OpenJDK 8 for bootstrapping. For some reason, OpenJDK 8 from Debian's openjdk8 assumes a heap size which is too small and refuses to start: (sid-alpha-sbuild)root at nofan:/# java -version Error occurred during initialization of VM Too small initial heap (sid-alpha-sbuild)root at nofan:/# This can be fixed by overriding the heap settings with _JAVA_OPTIONS: (sid-alpha-sbuild)root at nofan:/# export _JAVA_OPTIONS="-Xmx1024m -Xms256m" (sid-alpha-sbuild)root at nofan:/# java -version Picked up _JAVA_OPTIONS: -Xmx1024m -Xms256m openjdk version "1.8.0_141" OpenJDK Runtime Environment (build 1.8.0_141-8u141-b15-3-b15) OpenJDK 64-Bit Zero VM (build 25.141-b15, interpreted mode) (sid-alpha-sbuild)root at nofan:/# As you can see, this has the side effect that the JVM becomes very chatty about the fact that _JAVA_OPTIONS were set. 
While this doesn't seem to be a problem at first sight, it becomes a problem when trying to run configure for JDK10 which will fail because of the unexpected output when trying to determine the version of the boot JDK: configure: Found potential Boot JDK using configure arguments configure: Potential Boot JDK found at /usr/lib/jvm/java-8-openjdk-alpha/ is incorrect JDK version (Picked up _JAVA_OPTIONS: -Xmx1024m -Xms256m); ignoring configure: (Your Boot JDK must be version 8 or 9) configure: error: The path given by --with-boot-jdk does not contain a valid Boot JDK configure exiting with result code 1 Is there any way to silence the JVM regarding "_JAVA_OPTIONS"? If no, we should probably patch the JVM to do that by default. Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaubitz at debian.org `. `' Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 From aph at redhat.com Mon Sep 4 11:36:55 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 4 Sep 2017 12:36:55 +0100 Subject: How to suppress verbosity when settting _JAVA_OPTIONS? In-Reply-To: <4f21fbf9-9802-6f39-e90c-fda58c09bf72@physik.fu-berlin.de> References: <4f21fbf9-9802-6f39-e90c-fda58c09bf72@physik.fu-berlin.de> Message-ID: On 04/09/17 12:18, John Paul Adrian Glaubitz wrote: > Hello! > > I'm currently testing Zero builds on Linux Alpha, in my particular case on > QEMU in a Debian unstable alpha chroot, using OpenJDK 8 for bootstrapping. > > For some reason, OpenJDK 8 from Debian's openjdk8 assumes a heap size which > is too small and refuses to start: > > (sid-alpha-sbuild)root at nofan:/# java -version > Error occurred during initialization of VM > Too small initial heap > (sid-alpha-sbuild)root at nofan:/# > > This can be fixed by overriding the heap settings with _JAVA_OPTIONS: We should probably just fix the bug. I recently did something very similar for another target, but I can't find it. 
:-) -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From Alan.Bateman at oracle.com Mon Sep 4 11:37:15 2017 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Mon, 4 Sep 2017 12:37:15 +0100 Subject: How to suppress verbosity when settting _JAVA_OPTIONS? In-Reply-To: <4f21fbf9-9802-6f39-e90c-fda58c09bf72@physik.fu-berlin.de> References: <4f21fbf9-9802-6f39-e90c-fda58c09bf72@physik.fu-berlin.de> Message-ID: On 04/09/2017 12:18, John Paul Adrian Glaubitz wrote: > : > > Is there any way to silence the JVM regarding "_JAVA_OPTIONS"? If no, > we should probably patch the JVM to do that by default. The undocumented/unsupported _JAVA_OPTIONS option is highly problematic. One of its flaws is that it appends rather than prepends so it potentially overrides options that you specify on the command lines. So the output message is deliberate, it would be too confusing to have VM options magically overridden. For the issue you are running into then I assume the probe in the build can be updated to ignore the message. -Alan From glaubitz at physik.fu-berlin.de Mon Sep 4 11:53:14 2017 From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz) Date: Mon, 4 Sep 2017 13:53:14 +0200 Subject: How to suppress verbosity when settting _JAVA_OPTIONS? In-Reply-To: References: <4f21fbf9-9802-6f39-e90c-fda58c09bf72@physik.fu-berlin.de> Message-ID: <7b3b22cf-0419-9f43-b74f-1ba628a9f500@physik.fu-berlin.de> On 09/04/2017 01:36 PM, Andrew Haley wrote: >> This can be fixed by overriding the heap settings with _JAVA_OPTIONS: > > We should probably just fix the bug. I recently did something very similar > for another target, but I can't find it. :-) Oh, I agree. I just wasn't sure where the default heap settings come from. Can you point me to the place in the sources? Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaubitz at debian.org `. 
`' Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 From magnus.ihse.bursie at oracle.com Mon Sep 4 12:15:41 2017 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Mon, 4 Sep 2017 14:15:41 +0200 Subject: How to suppress verbosity when settting _JAVA_OPTIONS? In-Reply-To: <4f21fbf9-9802-6f39-e90c-fda58c09bf72@physik.fu-berlin.de> References: <4f21fbf9-9802-6f39-e90c-fda58c09bf72@physik.fu-berlin.de> Message-ID: On 2017-09-04 13:18, John Paul Adrian Glaubitz wrote: > Hello! > > I'm currently testing Zero builds on Linux Alpha, in my particular > case on > QEMU in a Debian unstable alpha chroot, using OpenJDK 8 for > bootstrapping. > > For some reason, OpenJDK 8 from Debian's openjdk8 assumes a heap size > which > is too small and refuses to start: > > (sid-alpha-sbuild)root at nofan:/# java -version > Error occurred during initialization of VM > Too small initial heap > (sid-alpha-sbuild)root at nofan:/# > > This can be fixed by overriding the heap settings with _JAVA_OPTIONS: > > (sid-alpha-sbuild)root at nofan:/# export _JAVA_OPTIONS="-Xmx1024m -Xms256m" > (sid-alpha-sbuild)root at nofan:/# java -version > Picked up _JAVA_OPTIONS: -Xmx1024m -Xms256m > openjdk version "1.8.0_141" > OpenJDK Runtime Environment (build 1.8.0_141-8u141-b15-3-b15) > OpenJDK 64-Bit Zero VM (build 25.141-b15, interpreted mode) > (sid-alpha-sbuild)root at nofan:/# > > As you can see, this has the side effect that the JVM becomes very > chatty about the fact that _JAVA_OPTIONS were set. 
> > While this doesn't seem to be a problem at first sight, it becomes > a problem when trying to run configure for JDK10 which will fail > because of the unexpected output when trying to determine the version > of the boot JDK: > > configure: Found potential Boot JDK using configure arguments > configure: Potential Boot JDK found at > /usr/lib/jvm/java-8-openjdk-alpha/ is incorrect JDK version (Picked up > _JAVA_OPTIONS: -Xmx1024m -Xms256m); ignoring > configure: (Your Boot JDK must be version 8 or 9) > configure: error: The path given by --with-boot-jdk does not contain a > valid Boot JDK > configure exiting with result code 1 > > Is there any way to silence the JVM regarding "_JAVA_OPTIONS"? If no, > we should probably patch the JVM to do that by default. Ouch! Lots of small, idiotic issues. For the build identification part: are both the _JAVA_OPTIONS and the version outputted to stdout? Or can you separate them by separating stdout/stderr? Otherwise, this patch would solve the issue in your case. I'm not sure how it would affect all other java instances we try to detect, so I'm a bit reluctant to take it in. diff --git a/common/autoconf/boot-jdk.m4 b/common/autoconf/boot-jdk.m4 --- a/common/autoconf/boot-jdk.m4 +++ b/common/autoconf/boot-jdk.m4 @@ -74,7 +74,7 @@ BOOT_JDK_FOUND=no else # Oh, this is looking good! We probably have found a proper JDK. Is it the correct version? - BOOT_JDK_VERSION=`"$BOOT_JDK/bin/java" -version 2>&1 | $HEAD -n 1` + BOOT_JDK_VERSION=`"$BOOT_JDK/bin/java" -version 2>&1 | $GREP version | $HEAD -n 1` # Extra M4 quote needed to protect [] in grep expression. [FOUND_CORRECT_VERSION=`$ECHO $BOOT_JDK_VERSION | $EGREP '\"9([\.+-].*)?\"|(1\.[89]\.)'`] But the main problem here seems to be the Debian openjdk8 instance that crashes on "java -version". Seems like a good and simple test to add to your test matrix. 
;-) /Magnus > > Adrian > From thomas.stuefe at gmail.com Mon Sep 4 12:36:29 2017 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Mon, 4 Sep 2017 14:36:29 +0200 Subject: [RFR]: 8186578: Zero fails to build on linux-sparc due to sparc-specific code In-Reply-To: <12eb6779-8b25-89b5-b3c0-ea30828979fd@oracle.com> References: <088045c0-efcc-ab7a-a088-e80579a3c12b@physik.fu-berlin.de> <14f28ddc-6929-dd0d-a77a-1a463c47d40b@oracle.com> <79503f9c-bc57-725e-b8f1-40cb522b9218@physik.fu-berlin.de> <90649da7-48e5-22b1-3118-5861c6bb0e24@physik.fu-berlin.de> <3cb65ceb-c575-e446-bb66-a50c4b02684a@physik.fu-berlin.de> <12eb6779-8b25-89b5-b3c0-ea30828979fd@oracle.com> Message-ID: On Mon, Sep 4, 2017 at 12:30 PM, Magnus Ihse Bursie < magnus.ihse.bursie at oracle.com> wrote: > > On 2017-08-24 18:19, Thomas St?fe wrote: > >> On Thu, Aug 24, 2017 at 3:51 PM, John Paul Adrian Glaubitz < >> glaubitz at physik.fu-berlin.de> wrote: >> >> On 08/24/2017 03:22 PM, John Paul Adrian Glaubitz wrote: >>> >>> Do the gtests (especially test_memset_with_concurrent_readers.cpp) run >>>> >>>>> through with your patch? >>>>> >>>>> I will run the testsuite in a second and report back. >>>> >>>> Ok. I have to admit I don't understand how to run the testsuite out of >>> the >>> build tree. It mentions jtreg which I have installed: >>> >>> glaubitz at deb4g:~$ jtreg -version >>> jtreg, version 4.2 src b07 >>> Installed in /usr/share/java/jtreg.jar >>> Running on platform version 1.8.0_144 from /usr/lib/jvm/java-8-openjdk-sp >>> arc64/jre. >>> Built with 1.8.0_131 on Tue, 20 Jun 2017 10:54:14 +0200. >>> Copyright (c) 1999, 2016, Oracle and/or its affiliates. All rights >>> reserved. >>> Use is subject to license terms. >>> glaubitz at deb4g:~$ >>> >>> But the configure script complains about jtreg missing: >>> >>> checking if jtreg failure handler should be built... configure: error: >>> Cannot enable jtreg failure handler without jtreg. 
>>> configure exiting with result code 1 >>> glaubitz at deb4g:~/openjdk/hs$ >>> >>> I also don't fully understand how the testsuite is run as mentioned in >>> [1]. It >>> talks about jtreg and then about jtreg harness which doesn't have clear >>> build >>> instructions [2]. >>> >>> Adrian >>> >>> Sorry, I should have been more specific. The gtests have nothing to do >> with >> the jtreg suite, they are a set of native tests using google test. >> >> Just execute (from your build directory): >> ./hotspot/variant-server/libjvm/gtest/gtestLauncher -jdk:./images/jdk >> >> There is also a way to execute them from the make, but I do not know how. >> > For the record: > > "make run-test-gtest" > or > "make run-test TEST=gtest" > > The latter form also allows for a test selection, like this: "make > run-test TEST=gtest:LogDecorations". > > See common/doc/testing.md for more information. > > /Magnus > > Thank you Magnus! I usually prefer running the test directly, because I might have to fire up the debugger and debug them, and this is difficult if the test is a sub process of make. But yes, this is easier if one expects no errors. ..Thomas > > >> Best Regards, Thomas >> >> >> [1] http://download.java.net/openjdk/testresults/8/docs/howtorun >>> tests.html >>> >>>> [2] http://openjdk.java.net/jtreg/build.html >>>> >>>> -- >>> .''`. John Paul Adrian Glaubitz >>> : :' : Debian Developer - glaubitz at debian.org >>> `. `' Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de >>> `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 >>> >>> > From aph at redhat.com Mon Sep 4 12:36:44 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 4 Sep 2017 13:36:44 +0100 Subject: How to suppress verbosity when settting _JAVA_OPTIONS? 
In-Reply-To: <7b3b22cf-0419-9f43-b74f-1ba628a9f500@physik.fu-berlin.de> References: <4f21fbf9-9802-6f39-e90c-fda58c09bf72@physik.fu-berlin.de> <7b3b22cf-0419-9f43-b74f-1ba628a9f500@physik.fu-berlin.de> Message-ID: On 04/09/17 12:53, John Paul Adrian Glaubitz wrote: > On 09/04/2017 01:36 PM, Andrew Haley wrote: >>> This can be fixed by overriding the heap settings with _JAVA_OPTIONS: >> >> We should probably just fix the bug. I recently did something very similar >> for another target, but I can't find it. :-) > > Oh, I agree. I just wasn't sure where the default heap settings come from. > > Can you point me to the place in the sources? This is the one I was thinking of. It's perhaps not what you need. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 -------------- next part -------------- A non-text attachment was scrubbed... Name: java-1.8.0-openjdk-s390-java-opts.patch Type: text/x-patch Size: 1578 bytes Desc: not available URL: From magnus.ihse.bursie at oracle.com Mon Sep 4 12:56:30 2017 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Mon, 4 Sep 2017 14:56:30 +0200 Subject: How to suppress verbosity when settting _JAVA_OPTIONS? In-Reply-To: References: <4f21fbf9-9802-6f39-e90c-fda58c09bf72@physik.fu-berlin.de> <7b3b22cf-0419-9f43-b74f-1ba628a9f500@physik.fu-berlin.de> Message-ID: This will only affect the arguments when subsequently running java calls during the build. Adrian's issue was that a simple "java -version" call failed during detection of what JDK to use as boot JDK. However, it is likely that even if Adrian gets the initial test to pass, something similar to this will be needed for make-time calls to java. /Magnus On 2017-09-04 14:36, Andrew Haley wrote: > On 04/09/17 12:53, John Paul Adrian Glaubitz wrote: >> On 09/04/2017 01:36 PM, Andrew Haley wrote: >>>> This can be fixed by overriding the heap settings with _JAVA_OPTIONS: >>> We should probably just fix the bug. 
I recently did something very similar >>> for another target, but I can't find it. :-) >> Oh, I agree. I just wasn't sure where the default heap settings come from. >> >> Can you point me to the place in the sources? > This is the one I was thinking of. It's perhaps not what you need. > From aph at redhat.com Mon Sep 4 13:22:36 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 4 Sep 2017 14:22:36 +0100 Subject: How to suppress verbosity when settting _JAVA_OPTIONS? In-Reply-To: References: <4f21fbf9-9802-6f39-e90c-fda58c09bf72@physik.fu-berlin.de> <7b3b22cf-0419-9f43-b74f-1ba628a9f500@physik.fu-berlin.de> Message-ID: <7907c4f8-e9e7-cf32-d013-e82d47aeafce@redhat.com> On 04/09/17 13:56, Magnus Ihse Bursie wrote: > This will only affect the arguments when subsequently running java calls > during the build. Adrian's issue was that a simple "java -version" call > failed during detection of what JDK to use as boot JDK. Oh yeah, right. Which means I still haven't found the change I was thinking of. :-( -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From erik.osterlund at oracle.com Mon Sep 4 13:31:24 2017 From: erik.osterlund at oracle.com (Erik Österlund) Date: Mon, 4 Sep 2017 15:31:24 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <6f623116-8213-1523-ba37-fb4ce17c4afa@oracle.com> <59AD21D6.8040305@oracle.com> <59AD2A64.3070507@oracle.com> Message-ID: <59AD55AC.4030105@oracle.com> Hi Andrew, On 2017-09-04 12:41, Andrew Haley wrote: > On 04/09/17 11:26, Erik Österlund wrote: >> 1) I want evidence for this claim. Can you get leading and trailing dmb >> sy (rather than dmb ish) for atomic operations on ARMv7? > I hope not. There is no reason for us to want such a thing in HotSpot. 
> But even if we did want such a thing, we could crop down to asm: the > point is the usual cases, not weird corner cases. So we can not emit any fencing we want with GCC intrinsics, let alone the fencing we already have and rely on today on ARMv7. The discussion about whether we should relax our ARMv7 fencing or not is a different discussion, and is unrelated to the claim that we can get any fencing we want with GCC intrinsics. The point is that we can not control the fencing arbitrarily, let alone even get the fencing we have today. >> 2) Even if you could and the compiler happens to generate that - we can >> not rely on it because there is no contract to the compiler what fence >> instructions it elects to use. The only contract the compiler needs to >> abide to is how atomic C++ operations interact with other C++ >> operations. And we do not want the underlying fencing to silently change >> when performing compiler upgrades. > There is no way that GCC writers would break ABI compatibility in such a > fundamental way. There would be a firestorm. I know this because even > if no-one else started the fire, I would. I am a GCC author. Thank you for your reassurance. I appreciate that you take ABI compatibility seriously. Yet over the years, the bindings have changed over time as our understanding of implications of the memory model has evolved - especially when mixing stronger and weaker accesses on the same fields. Even 2017, there are still papers published about how seq_cst mixed with weaker memory ordering needs fixing in the bindings (cf. "Repairing sequential consistency in C/C++11", PLDI'17), resulting in new bindings with both leading sync and trailing sync conventions being proposed (the choice of convention is up to compiler writers). I do not feel confident we can rely on these bindings never changing. 
As there is no contract or explicit ABI, compiler writers are free to do whatever that is consistent within the boundaries of C++ code and the C++ memory model. The actual ABI is hidden from that contract. And I would not happily embed reliance on intentionally undocumented, implicit, unofficial ABIs that are known to have different fencing conventions that may or may not be compatible with what our generated code requires. Generating the code, disassembling, and then assuming whatever binding was observed in the disassembly is a binding contract, is not a reliable approach. If we require a specific fence, then I do not see why we would not simply emit this specific fence that we require explicitly, rather than insisting on using some intrinsic and hoping it will emit that exact fence that we rely on through some implicit, undocumented, unofficial ABI, that may silently change over time. I fail to see the attraction. Thanks, /Erik From aph at redhat.com Mon Sep 4 16:05:25 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 4 Sep 2017 17:05:25 +0100 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59AD55AC.4030105@oracle.com> References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <6f623116-8213-1523-ba37-fb4ce17c4afa@oracle.com> <59AD21D6.8040305@oracle.com> <59AD2A64.3070507@oracle.com> <59AD55AC.4030105@oracle.com> Message-ID: <3a6fbae3-cddb-6ac0-890d-da4b33308b5e@redhat.com> Hi, On 04/09/17 14:31, Erik Österlund wrote: > On 2017-09-04 12:41, Andrew Haley wrote: >> On 04/09/17 11:26, Erik Österlund wrote: >>> 1) I want evidence for this claim. Can you get leading and trailing dmb >>> sy (rather than dmb ish) for atomic operations on ARMv7? >> I hope not. There is no reason for us to want such a thing in HotSpot. >> But even if we did want such a thing, we could crop down to asm: the >> point is the usual cases, not weird corner cases. 
> > So we can not emit any fencing we want with GCC intrinsics, let alone > the fencing we already have and rely on today on ARMv7. There are corner cases, for which asm can be used, yes. > The discussion about whether we should relax our ARMv7 fencing or > not is a different discussion, and is unrelated to the claim that we > can get any fencing we want with GCC intrinsics. I accept the point in principle, but I suggest it's a bad example: I do not believe that we want DMB SY. > The point is that we can not control the > fencing arbitrarily, let alone even get the fencing we have today. Arbitrarily, no. But I guess you'd expect me to point out that argument can be flipped on its head: if we'd used intrinsics rather than asms the mistake of using DMB SY would have been averted. You can look at this issue in (at least) two ways. :-) >>> 2) Even if you could and the compiler happens to generate that - we can >>> not rely on it because there is no contract to the compiler what fence >>> instructions it elects to use. The only contract the compiler needs to >>> abide to is how atomic C++ operations interact with other C++ >>> operations. And we do not want the underlying fencing to silently change >>> when performing compiler upgrades. >> >> There is no way that GCC writers would break ABI compatibility in such a >> fundamental way. There would be a firestorm. I know this because even >> if no-one else started the fire, I would. I am a GCC author. > > Thank you for your reassurance. I appreciate that you take ABI > compatibility seriously. Yet over the years, the bindings have changed > over time as our understanding of implications of the memory model has > evolved - especially when mixing stronger and weaker accesses on the > same fields. Absolutely so, yes, and IMVHO such code should be taken from HotSpot and quietly put out of its misery. 
Even if the program is correct it'll require a lot of analysis, and pity the poor programmer who comes across it in a few years' time. > Even 2017, there are still papers published about how > seq_cst mixed with weaker memory ordering needs fixing in the bindings > (cf. "Repairing sequential consistency in C/C++11", PLDI'17), resulting > in new bindings with both leading sync and trailing sync conventions > being proposed (the choice of convention is up to compiler writers). Sure, but there's no way that GCC (or any other serious compiler) is going to make changes in a way that isn't at least compatible with existing binaries. Power PC has its problems, mostly due to being rather old, and I'm not at all surprised to hear that mistakes have been made, given that the language used in the processor definition and the language used in the C++ language standard don't map onto each other in an obvious way. But none of this extends to Linux/x86, which has a straightforward implementation of all of this stuff. > I do not feel confident we can rely on these bindings never > changing. As there is no contract or explicit ABI, compiler writers > are free to do whatever that is consistent within the boundaries of > C++ code and the C++ memory model. The actual ABI is hidden from > that contract. And I would not happily embed reliance on > intentionally undocumented, implicit, unofficial ABIs that are known > to have different fencing conventions that may or may not be > compatible with what our generated code requires. Generating the > code, disassembling, and then assuming whatever binding was observed > in the disassembly is a binding contract, is not a reliable > approach. I suppose I can understand this difference in opinion because my view of GCC is very different from yours: to me it's a white box, not a black box, and I certainly wouldn't take the approach of just looking at the generated code. 
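To make the two styles compared in this thread concrete, here is a minimal sketch, not HotSpot code. It assumes GCC or Clang (for the `__atomic_add_fetch` builtin), and the names `inc_with_intrinsic` and `inc_with_inline_asm` are hypothetical. The asm variant is only spelled out for x86 and falls back to the builtin elsewhere.

```cpp
#include <cassert>

// Intrinsic style: state the intent and let the compiler choose the
// instructions and fences for the requested memory order.
// Assumes GCC or Clang; the function name is hypothetical.
template <typename T>
inline void inc_with_intrinsic(T volatile* dest) {
  __atomic_add_fetch(dest, T(1), __ATOMIC_SEQ_CST);
}

// Inline-asm style: the instruction is written by hand, so the author is
// responsible for the operand constraints and for the "memory" clobber.
// Illustrative only; on non-x86 targets this sketch uses the builtin.
inline void inc_with_inline_asm(int volatile* dest) {
#if defined(__x86_64__) || defined(__i386__)
  __asm__ volatile("lock addl $1,(%0)"
                   :                   // no outputs
                   : "r"(dest)         // address of the counter
                   : "cc", "memory");  // flags and memory are modified
#else
  inc_with_intrinsic(dest);
#endif
}
```

With the builtin, the fence selection is the compiler's decision; with the asm, it is fixed by whoever wrote the string, which is the trade-off both sides of the thread are weighing.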
> If we require a specific fence, then I do not see why we would not > simply emit this specific fence that we require explicitly, rather than > insisting on using some intrinsic and hoping it will emit that exact > fence that we rely on through some implicit, undocumented, unofficial > ABI, that may silently change over time. I fail to see the attraction. That one is easy: if you tell the compiler what you're doing rather than hiding it inside an asm, the compiler can generate better code. The resulting program is also much simpler. Also, you avoid the risks inherent in writing inline asms: only recently have the x86/Linux asms been corrected to add a memory clobber. This is an extremely serious flaw, and it's been around for a very long while. We're talking about risk, yet your risk is of a rather theoretical nature, rather than that one which has already happened. We can, of course, argue that we are where we are, and that bug is fixed, so it no longer matters, but it does IMO point to where the real risk in using inline asm lies. However, having said all of that, let me be clear: while I do not believe that the inline asms for each platform are the best way of doing this, to change them at this point would be unduly disruptive. I am not suggesting that they should be changed now. I am very strongly suggesting that they should be changed in the future, and that we should move to using intrinsics. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From volker.simonis at gmail.com Mon Sep 4 17:23:09 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 4 Sep 2017 19:23:09 +0200 Subject: RFR(S): 8187091: ReturnBlobToWrongHeapTest fails because of problems in CodeHeap::contains_blob() In-Reply-To: References: Message-ID: On Fri, Sep 1, 2017 at 6:00 PM, Vladimir Kozlov wrote: > Checking type is emulation of virtual call ;-) I agree :) But it is only a bimorphic dispatch in this case which should be still faster than a normal virtual call. > But I agree that it is simplest solution - one line change (excluding > comment - comment is good BTW). > Thanks. > You can also add guard AOT_ONLY() around aot specific code: > > const void* start = AOT_ONLY( (code_blob_type() == CodeBlobType::AOT) ? > blob->code_begin() : ) (void*)blob; > > because we do have builds without AOT. > Done. Please find the new webrev here: http://cr.openjdk.java.net/~simonis/webrevs/2017/8187091.v1/ Could you please sponsor the change once jdk10-hs opens again? Thanks, Volker PS: one thing which is still unclear to me is why you haven't caught this issue before? Isn't test/compiler/codecache/stress/ReturnBlobToWrongHeapTest.java part of JPRT and/or your regular tests? 
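The zero-payload corner case behind this fix can be reproduced in a few lines. The following is a minimal sketch with hypothetical stand-in types (`BlobSketch`, `HeapSketch`); it does not use the real HotSpot classes and only models two adjacent heaps inside one buffer. It contrasts a check based on `code_begin()` with one that picks the start address by blob type, as in the one-line change discussed above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration; these are NOT the HotSpot classes.
enum BlobKind { NonAOT, AOT };

struct BlobSketch {
  const char* header;        // address of the C++ object itself
  std::size_t header_size;   // size of that object
  std::size_t payload_size;  // zero in the corner case
  // First byte after the header, as CodeBlob::code_begin() is described.
  const char* code_begin() const { return header + header_size; }
};

struct HeapSketch {
  const char* low;   // inclusive lower bound
  const char* high;  // exclusive upper bound
  BlobKind kind;

  bool contains(const void* p) const {
    const char* c = static_cast<const char*>(p);
    return low <= c && c < high;
  }

  // Check based only on code_begin(): wrong for a zero-payload blob that
  // ends exactly at the heap boundary.
  bool contains_blob_by_code_begin(const BlobSketch& b) const {
    return contains(b.code_begin());
  }

  // Type-based choice of the start address: code_begin() for AOT blobs,
  // the header address for everything else.
  bool contains_blob(const BlobSketch& b) const {
    const void* start = (kind == AOT)
        ? static_cast<const void*>(b.code_begin())
        : static_cast<const void*>(b.header);
    return contains(start);
  }
};
```

For a blob whose header occupies the last bytes of the first heap and whose payload is empty, `code_begin()` equals the first heap's `high`, so the first check attributes the blob to the neighbouring heap while the type-based check keeps it in the heap that actually holds it.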
> Thanks, > Vladimir > > > On 9/1/17 8:42 AM, Volker Simonis wrote: >> >> Hi, >> >> can I please have a review and sponsor for the following small fix: >> >> http://cr.openjdk.java.net/~simonis/webrevs/2017/8187091/ >> https://bugs.openjdk.java.net/browse/JDK-8187091 >> >> We see failures in >> test/compiler/codecache/stress/ReturnBlobToWrongHeapTest.java which >> are caused by problems in CodeHeap::contains_blob() for corner cases >> with CodeBlobs of zero size: >> >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (heap.cpp:248), pid=27586, tid=27587 >> # guarantee((char*) b >= _memory.low_boundary() && (char*) b < >> _memory.high()) failed: The block to be deallocated 0x00007fffe6666f80 >> is not within the heap starting with 0x00007fffe6667000 and ending >> with 0x00007fffe6ba000 >> >> The problem is that JDK-8183573 replaced >> >> virtual bool contains_blob(const CodeBlob* blob) const { return >> low_boundary() <= (char*) blob && (char*) blob < high(); } >> >> by: >> >> bool contains_blob(const CodeBlob* blob) const { return >> contains(blob->code_begin()); } >> >> But that may be wrong in the corner case where the size of the >> CodeBlob's payload is zero (i.e. the CodeBlob consists only of the >> 'header' - i.e. the C++ object itself) because in that case >> CodeBlob::code_begin() points right behind the CodeBlob's header which >> is a memory location which doesn't belong to the CodeBlob anymore. >> >> This exact corner case is exercised by ReturnBlobToWrongHeapTest which >> allocates CodeBlobs of size zero (i.e. zero 'payload') with the help >> of sun.hotspot.WhiteBox.allocateCodeBlob() until the CodeCache fills >> up. The test first fills the 'non-profiled nmethods' CodeHeap. If the >> 'non-profiled nmethods' CodeHeap is full, the VM automatically tries >> to allocate from the 'profiled nmethods' CodeHeap until that fills up >> as well. 
But in the CodeCache the 'profiled nmethods' CodeHeap is >> located right before the non-profiled nmethods' CodeHeap. So if the >> last CodeBlob allocated from the 'profiled nmethods' CodeHeap has a >> payload size of zero and uses all the CodeHeaps remaining size, we >> will end up with a CodeBlob whose code_begin() address will point >> right behind the actual CodeHeap (i.e. it will point right at the >> beginning of the adjacent, 'non-profiled nmethods' CodeHeap). This >> will result in the above guarantee to fire, when we will try to free >> the last allocated CodeBlob (with >> sun.hotspot.WhiteBox.freeCodeBlob()). >> >> In a previous mail thread >> >> (http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-August/028175.html) >> Vladimir explained why JDK-8183573 was done: >> >>> About contains_blob(). The problem is that AOTCompiledMethod allocated in >>> CHeap and not in aot code section (which is RO): >>> >>> >>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124 >>> >>> It is allocated in CHeap after AOT library is loaded. Its code_begin() >>> points to AOT code section but AOTCompiledMethod* >>> points outside it (to normal malloced space) so you can't use (char*)blob >>> address. >> >> >> and proposed these two fixes: >> >>> There are 2 ways to fix it, I think. >>> One is to add new field to CodeBlobLayout and set it to blob* address for >>> normal CodeCache blobs and to code_begin for >>> AOT code. >>> Second is to use contains(blob->code_end() - 1) assuming that AOT code is >>> never zero. >> >> >> I came up with a slightly different solution - just use >> 'CodeHeap::code_blob_type()' whether to use 'blob->code_begin()' (for >> the AOT case) or '(void*)blob' (for all other blobs) as input for the >> call to 'CodeHeap::contain()'. It's simple and still much cheaper than >> a virtual call. What do you think? 
>> >> I've also updated the documentation of the CodeBlob class hierarchy in >> codeBlob.hpp. Please let me know if I've missed something. >> >> Thank you and best regards, >> Volker >> > From david.holmes at oracle.com Mon Sep 4 21:59:12 2017 From: david.holmes at oracle.com (David Holmes) Date: Tue, 5 Sep 2017 07:59:12 +1000 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59ACFD76.3000606@oracle.com> References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> <59A93B53.9010505@oracle.com> <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com> <59ACFD76.3000606@oracle.com> Message-ID: <8af30028-5ab1-5623-21c7-f6e2ce2ab8cb@oracle.com> Hi Erik, On 4/09/2017 5:15 PM, Erik Österlund wrote: > Hi David, > > On 2017-09-04 03:24, David Holmes wrote: >> Hi Erik, >> >> On 1/09/2017 8:49 PM, Erik Österlund wrote: >>> Hi David, >>> >>> The shared structure for all operations is the following: >>> >>> An Atomic::something call creates a SomethingImpl function object >>> that performs some basic type checking and then forwards the call >>> straight to a PlatformSomething function object. This >>> PlatformSomething object could decide to do anything. But to make >>> life easier, it may inherit from a shared SomethingHelper function >>> object with CRTP that calls back into the PlatformSomething function >>> object to emit inline assembly. >> >> Right, but! Lets look at some details. >> >> Atomic::add >> AddImpl >> PlatformAdd >> FetchAndAdd >> AddAndFetch >> add_using_helper >> >> Atomic::cmpxchg >> CmpxchgImpl >> PlatformCmpxchg >> cmpxchg_using_helper >> >> Atomic::inc >> IncImpl >> PlatformInc >> IncUsingConstant >> >> Why is it that the simplest operation (inc/dec) has the most complex >> platform template definition? Why do we need Adjustment? 
You >> previously said "Adjustment represents the increment/decrement value >> as an IntegralConstant - your template friend for passing around a >> constant with both a specified type and value in templates". But add >> passes around values and doesn't need this. Further inc/dec don't need >> to pass anything around anywhere - inc adds 1, dec subtracts 1! This >> "1" does not need to appear anywhere in the API or get passed across >> layers - the only place this "1" becomes evident is in the actual >> platform asm that does the logic of "add 1" or "subtract 1". >> >> My understanding from previous discussions is that much of the >> template machinations was to deal with type management for "dest" and >> the values being passed around. But here, for inc/dec there are no >> values being passed so we don't have to make "dest" type-compatible >> with any value. > > Dealing with different types being passed in is one part of the problem > - a problem that almost all operations seems to have. But Atomic::add > and inc/dec have more problems to deal with. > > The Atomic::add operation has two more problems that cmpxchg does not have. > 1) It needs to scale pointer arithmetic. So if you have a P* and you add > it by 2, then you really add the underlying value by 2 * sizeof(P), and > the scaled addend needs to be of the right type - the type of the > destination for integral types and ptrdiff_t for pointers. This is > similar semantics to ++pointer. I'll address this below - but yes I overlooked this aspect. > 2) It connects backends with different semantics - either fetch_and_add > or add_and_fetch to a common public interface with add_and_fetch semantics. Not at all clear why this has to manifest in the upper/middle layers instead of being handled by the actual lowest-layer ?? > This is the reason that Atomic::add might appear more complicated than > Atomic::cmpxchg. Because Atomic::cmpxchg only had the different type > problems to deal with - no pointer arithmetics. 
> > The reason why Atomic::inc/dec looks more complicated than Atomic::add > is that it needs to preserve the pointer arithmetic as constants rather > than values, because the scaled addend is embedded in the inline > assembly as immediate values. Therefore it passes around an > IntegralConstant that embeds both the type and size of the addend. And > it is not just 1/-1. For integral destinations the constant used is 1/-1 > of the type stored at the destination. For pointers the constant is > ptrdiff_t with a value representing the size of the element pointed to. This is insanely complicated (I think that counts as 'accidental complexity' per Andrew's comment ;-) ). Pointer arithmetic is a basic/fundamental part of C/C++, yet this template stuff has to jump through multiple inverted hoops to do something the language "just does"! All this complexity to manage a conversion addend -> addend * sizeof(*dest) ?? And the fact that inc/dec are simpler than add, yet result in far more complicated templates because the simpler addend is a constant, is just as unfathomable to me! > Having said that - I am not opposed to simply removing the > specializations of inc/dec if we are scared of the complexity of passing > this constant to the platform layer. After running a bunch of benchmarks > over the weekend, it showed no significant regressions after removal. > Now of course that might not tell the full story - it could have missed > that some critical operation in the JVM takes longer. But I would be > very surprised if that was the case. I can imagine we use an "add immediate" form for inc/dec of 1, do we actually use that for other values? I would expect inc_ptr/dec_ptr to always translate to add_ptr, with no special case for when ptr is char* and so we only add/sub 1. ?? Thanks, David > Thanks, > /Erik > >> >> Cheers, >> David >> ----- >> >>> Hope this explanation helps understanding the intended structure of >>> this work. 
>>> >>> Thanks, >>> /Erik >>> >>> On 2017-09-01 12:34, David Holmes wrote: >>>> Hi Erik, >>>> >>>> I just wanted to add that I would expect the cmpxchg, add and inc, >>>> Atomic API's to all require similar basic structure for manipulating >>>> types/values etc, yet all three seem to have quite different >>>> structures that I find very confusing. I'm still at a loss to fathom >>>> the CRTP and the hoops we seemingly have to jump through just to add >>>> or subtract 1!!! >>>> >>>> Cheers, >>>> David >>>> >>>> On 1/09/2017 7:29 PM, Erik Österlund wrote: >>>>> Hi David, >>>>> >>>>> On 2017-09-01 02:49, David Holmes wrote: >>>>>> Hi Erik, >>>>>> >>>>>> Sorry but this one is really losing me. >>>>>> >>>>>> What is the role of Adjustment ?? >>>>> >>>>> Adjustment represents the increment/decrement value as an >>>>> IntegralConstant - your template friend for passing around a >>>>> constant with both a specified type and value in templates. The >>>>> type of the increment/decrement is the type of the destination when >>>>> the destination is an integral type, otherwise if it is a pointer >>>>> type, the increment/decrement type is ptrdiff_t. >>>>> >>>>>> How are inc/dec anything but "using constant" ?? >>>>> >>>>> I was also a bit torn on that name (I assume you are referring to >>>>> IncUsingConstant/DecUsingConstant). It was hard to find a name that >>>>> depicted what this platform helper does. I considered calling the >>>>> helper something with immediate in the name because it is really >>>>> used to embed the constant as immediate values in inline assembly >>>>> today. But then again that seemed too specific, as it is not >>>>> completely obvious platform specializations will use it in that >>>>> way. One might just want to specialize this to send it into some >>>>> compiler Atomic::inc intrinsic for example. Do you have any other >>>>> preferred names? 
Here are a few possible names for IncUsingConstant: >>>>> >>>>> IncUsingScaledConstant >>>>> IncUsingAdjustedConstant >>>>> IncUsingPlatformHelper >>>>> >>>>> Any favourites? >>>>> >>>>>> Why do we special case jshort?? >>>>> >>>>> To be consistent with the special case of Atomic::add on jshort. Do >>>>> you want it removed? >>>>> >>>>>> This is indecipherable to normal people ;-) >>>>>> >>>>>> This()->template inc(dest); >>>>>> >>>>>> For something as trivial as adding or subtracting 1 the template >>>>>> machinations here are just mind boggling! >>>>> >>>>> This uses the CRTP (Curiously Recurring Template Pattern) C++ >>>>> idiom. The idea is to devirtualize a virtual call by passing in the >>>>> derived type as a template parameter to a base class, and then let >>>>> the base class static_cast to the derived class to devirtualize the >>>>> call. I hope this explanation sheds some light on what is going on. >>>>> The same CRTP idiom was used in the Atomic::add implementation in a >>>>> similar fashion. >>>>> >>>>> I will add some comments describing this in the next round after >>>>> Coleen replies. >>>>> >>>>> Thanks for looking at this. >>>>> >>>>> /Erik >>>>> >>>>>> >>>>>> Cheers, >>>>>> David >>>>>> >>>>>> On 31/08/2017 10:45 PM, Erik Österlund wrote: >>>>>>> Hi everyone, >>>>>>> >>>>>>> Bug ID: >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8186838 >>>>>>> >>>>>>> Webrev: >>>>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >>>>>>> >>>>>>> The time has come for the next step in generalizing Atomic with >>>>>>> templates. Today I will focus on Atomic::inc/dec. >>>>>>> >>>>>>> I have tried to mimic the new Kim style that seems to have been >>>>>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the >>>>>>> structure looks like this: >>>>>>> >>>>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function >>>>>>> object that performs some basic type checks. 
>>>>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can >>>>>>> define the operation arbitrarily for a given platform. The >>>>>>> default implementation if not specialized for a platform is to >>>>>>> call Atomic::add. So only platforms that want to do something >>>>>>> different than that as an optimization have to provide a >>>>>>> specialization. >>>>>>> Layer 3) Platforms that decide to specialize >>>>>>> PlatformInc/PlatformDec to be more optimized may inherit from a >>>>>>> helper class IncUsingConstant/DecUsingConstant. This helper helps >>>>>>> performing the necessary computation what the increment/decrement >>>>>>> should be after pointer scaling using CRTP. The >>>>>>> PlatformInc/PlatformDec operation then only needs to define an >>>>>>> inc/dec member function, and will then get all the context >>>>>>> information necessary to generate a more optimized >>>>>>> implementation. Easy peasy. >>>>>>> >>>>>>> It is worth noticing that the generalized Atomic::dec operation >>>>>>> assumes a two's complement integer machine and potentially sends >>>>>>> the unary negative of a potentially unsigned type to Atomic::add. >>>>>>> I have the following comments about this: >>>>>>> 1) We already assume in other code that two's complement integers >>>>>>> must be present. >>>>>>> 2) A machine that does not have two's complement integers may >>>>>>> still simply provide a specialization that solves the problem in >>>>>>> a different way. >>>>>>> 3) The alternative that does not make assumptions about that >>>>>>> would use the good old IntegerTypes::cast_to_signed >>>>>>> metaprogramming stuff, and I seem to recall we thought that was a >>>>>>> bit too involved and complicated. >>>>>>> This is the reason why I have chosen to use unary minus on the >>>>>>> potentially unsigned type in the shared helper code that sends >>>>>>> the decrement as an addend to Atomic::add. 
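The layering and the CRTP call-back described in the quoted proposal can be sketched as follows. This is an illustration under stated assumptions, not the code from the webrev: `Constant`, `AdjustmentFor`, `IncUsingConstantSketch` and `PlatformIncSketch` are hypothetical names, the "platform" operation uses GCC/Clang `__atomic` builtins instead of inline assembly, and the pointer case uses a compare-and-exchange loop purely to keep the sketch portable.

```cpp
#include <cassert>
#include <cstddef>

// Compile-time constant carrying both a type and a value
// (hypothetical stand-in for HotSpot's IntegralConstant).
template <typename T, T v>
struct Constant { static const T value = v; };

// Adjustment for integral T is T(1); for P* it is sizeof(P) as ptrdiff_t,
// which is the pointer scaling discussed in the thread.
template <typename T>
struct AdjustmentFor { typedef Constant<T, T(1)> type; };
template <typename P>
struct AdjustmentFor<P*> {
  typedef Constant<std::ptrdiff_t, std::ptrdiff_t(sizeof(P))> type;
};

// Shared helper: computes the Adjustment, then calls back into the derived
// platform class through a static_cast (CRTP), so no virtual call is needed.
template <typename Derived>
struct IncUsingConstantSketch {
  template <typename T>
  void operator()(T volatile* dest) const {
    typedef typename AdjustmentFor<T>::type Adjustment;
    static_cast<const Derived*>(this)->template inc<T, Adjustment>(dest);
  }
};

// "Platform" specialization: only has to define inc(). A real platform
// could embed Adjustment::value as an immediate in its inline assembly.
struct PlatformIncSketch : IncUsingConstantSketch<PlatformIncSketch> {
  template <typename T, typename Adjustment>
  void inc(T volatile* dest) const { add(dest, Adjustment::value); }

 private:
  // Integral destination: add the constant of the destination's type.
  template <typename T>
  static void add(T volatile* dest, T addend) {
    __atomic_add_fetch(dest, addend, __ATOMIC_SEQ_CST);
  }
  // Pointer destination: advance by a byte count using a CAS loop.
  template <typename P>
  static void add(P* volatile* dest, std::ptrdiff_t bytes) {
    P* old_value = __atomic_load_n(dest, __ATOMIC_RELAXED);
    P* new_value;
    do {
      new_value = reinterpret_cast<P*>(reinterpret_cast<char*>(old_value) + bytes);
    } while (!__atomic_compare_exchange_n(dest, &old_value, new_value, false,
                                          __ATOMIC_SEQ_CST, __ATOMIC_RELAXED));
  }
};
```

Calling `PlatformIncSketch()(&counter)` on an `int volatile` adds 1, while calling it on a `long* volatile` advances the pointer by `sizeof(long)` bytes, i.e. by one element; the `->template inc<...>` spelling is needed because `inc` is a member template of a dependent type.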
>>>>>>> >>>>>>> It would also be nice if somebody with access to PPC and s390 >>>>>>> machines could try out the relevant changes there so I do not >>>>>>> accidentally break those platforms. I have blind-coded the >>>>>>> addition of the immediate values passed in to the inline assembly >>>>>>> in a way that I think looks like it should work. >>>>>>> >>>>>>> Testing: >>>>>>> RBT hs-tier3, JPRT --testset hotspot >>>>>>> >>>>>>> Thanks, >>>>>>> /Erik >>>>> >>> > From kim.barrett at oracle.com Tue Sep 5 00:38:07 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 5 Sep 2017 01:38:07 +0100 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <8af30028-5ab1-5623-21c7-f6e2ce2ab8cb@oracle.com> References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> <59A93B53.9010505@oracle.com> <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com> <59ACFD76.3000606@oracle.com> <8af30028-5ab1-5623-21c7-f6e2ce2ab8cb@oracle.com> Message-ID: <14533625-0A6E-45FE-8EF2-65CE40931D94@oracle.com> > On Sep 4, 2017, at 10:59 PM, David Holmes wrote: > > Hi Erik, > > On 4/09/2017 5:15 PM, Erik Österlund wrote: >> Hi David, >> On 2017-09-04 03:24, David Holmes wrote: >>> Hi Erik, >>> >>> On 1/09/2017 8:49 PM, Erik Österlund wrote: >>>> Hi David, >>>> >>>> The shared structure for all operations is the following: >>>> >>>> An Atomic::something call creates a SomethingImpl function object that performs some basic type checking and then forwards the call straight to a PlatformSomething function object. This PlatformSomething object could decide to do anything. But to make life easier, it may inherit from a shared SomethingHelper function object with CRTP that calls back into the PlatformSomething function object to emit inline assembly. >>> >>> Right, but! Lets look at some details. 
>>>
>>> Atomic::add
>>>   AddImpl
>>>     PlatformAdd
>>>       FetchAndAdd
>>>       AddAndFetch
>>>       add_using_helper
>>>
>>> Atomic::cmpxchg
>>>   CmpxchgImpl
>>>     PlatformCmpxchg
>>>       cmpxchg_using_helper
>>>
>>> Atomic::inc
>>>   IncImpl
>>>     PlatformInc
>>>       IncUsingConstant
>>>
>>> Why is it that the simplest operation (inc/dec) has the most complex platform template definition? Why do we need Adjustment? You previously said "Adjustment represents the increment/decrement value as an IntegralConstant - your template friend for passing around a constant with both a specified type and value in templates". But add passes around values and doesn't need this. Further inc/dec don't need to pass anything around anywhere - inc adds 1, dec subtracts 1! This "1" does not need to appear anywhere in the API or get passed across layers - the only place this "1" becomes evident is in the actual platform asm that does the logic of "add 1" or "subtract 1".
>>>
>>> My understanding from previous discussions is that much of the template machinations was to deal with type management for "dest" and the values being passed around. But here, for inc/dec there are no values being passed so we don't have to make "dest" type-compatible with any value.
>> Dealing with different types being passed in is one part of the problem - a problem that almost all operations seem to have. But Atomic::add and inc/dec have more problems to deal with.
>> The Atomic::add operation has two more problems that cmpxchg does not have.
>> 1) It needs to scale pointer arithmetic. So if you have a P* and you add 2 to it, then you really add 2 * sizeof(P) to the underlying value, and the scaled addend needs to be of the right type - the type of the destination for integral types and ptrdiff_t for pointers. This is similar semantics to ++pointer.
>
> I'll address this below - but yes I overlooked this aspect.
> >> 2) It connects backends with different semantics - either fetch_and_add or add_and_fetch to a common public interface with add_and_fetch semantics. > > Not at all clear why this has to manifest in the upper/middle layers instead of being handled by the actual lowest-layer ?? > >> This is the reason that Atomic::add might appear more complicated than Atomic::cmpxchg. Because Atomic::cmpxchg only had the different type problems to deal with - no pointer arithmetics. >> The reason why Atomic::inc/dec looks more complicated than Atomic::add is that it needs to preserve the pointer arithmetic as constants rather than values, because the scaled addend is embedded in the inline assembly as immediate values. Therefore it passes around an IntegralConstant that embeds both the type and size of the addend. And it is not just 1/-1. For integral destinations the constant used is 1/-1 of the type stored at the destination. For pointers the constant is ptrdiff_t with a value representing the size of the element pointed to. > > This is insanely complicated (I think that counts as 'accidental complexity' per Andrew's comment ;-) ). Pointer arithmetic is a basic/fundamental part of C/C++, yet this template stuff has to jump through multiple inverted hoops to do something the language "just does"! All this complexity to manage a conversion addend -> addend * sizeof(*dest) ?? > > And the fact that inc/dec are simpler than add, yet result in far more complicated templates because the simpler addend is a constant, is just as unfathomable to me! > >> Having said that - I am not opposed to simply removing the specializations of inc/dec if we are scared of the complexity of passing this constant to the platform layer. After running a bunch of benchmarks over the weekend, it showed no significant regressions after removal. Now of course that might not tell the full story - it could have missed that some critical operation in the JVM takes longer. 
But I would be very surprised if that was the case.
>
> I can imagine we use an "add immediate" form for inc/dec of 1, do we actually use that for other values? I would expect inc_ptr/dec_ptr to always translate to add_ptr, with no special case for when ptr is char* and so we only add/sub 1. ??

[Delurking briefly.]

Sorry I've been silent until now in this discussion, but I'm on vacation, and won't have time until next week to really pay attention. But this seems to have gone somewhat awry, so I'm popping in briefly.

David objected to some of the complexity, apparently based on forgetting the scaling for pointer arithmetic. That seems to have been cleared up. However, David (and I think others) are also objecting to other points of complexity, and I think I agree.

I was working on an approach that was structurally similar to Atomic::add, but using IntegralConstant to retain access to the literal value, for those platforms that benefit from that. Erik's proposal also uses IntegralConstant for that purpose, but (from a quick skim) I think got that part wrong, and that is a source of additional complexity. Erik might want to re-read my handoff email to him. I don't know whether that approach would satisfy folks though.

I was also looking into the possibility that more platforms might be able to just use Atomic::add to implement Atomic::inc (and maybe Atomic::dec), without a change to the generated code for inc/dec. This would be accomplished by improving the inline assembler for Atomic::add, using an "n" constraint for the addend when appropriate. In some cases this perhaps might be done by providing it as an alternative (e.g. using an "nr" constraint). I hadn't gotten very far in exploring that possibility though, so it might not go anywhere.

And I agree the existing barriers in inc/dec for powerpc (both aix and linux) look contrary to the documented requirements.
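As a rough illustration of what "retaining access to the literal value" can look like, here is a minimal sketch. It is not the code from the webrev or from the handoff email: `IncAdjustment` and `inc_sketch` are hypothetical names, `std::integral_constant` stands in for HotSpot's own IntegralConstant, and the operations are deliberately non-atomic. The point is only that the scaled addend (1 for integers, the element size as ptrdiff_t for pointers) is a compile-time constant, which is what an immediate operand would need.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Integral destinations: the adjustment is 1 of the destination type.
template <typename T>
struct IncAdjustment {
  typedef std::integral_constant<T, static_cast<T>(1)> type;
};

// Pointer destinations: the adjustment is a ptrdiff_t constant equal to the
// size of the pointed-to element (the pointer scaling discussed above).
template <typename P>
struct IncAdjustment<P*> {
  typedef std::integral_constant<std::ptrdiff_t,
                                 static_cast<std::ptrdiff_t>(sizeof(P))> type;
};

// NON-atomic stand-in for a platform inc on an integral destination.
template <typename T>
void inc_sketch(T* dest) {
  typedef typename IncAdjustment<T>::type Adjustment;
  *dest = static_cast<T>(*dest + Adjustment::value);
}

// NON-atomic stand-in for a platform inc on a pointer destination: the
// constant is applied as a byte offset, mimicking a raw "add immediate".
template <typename P>
void inc_sketch(P** dest) {
  typedef typename IncAdjustment<P*>::type Adjustment;
  char* raw = reinterpret_cast<char*>(*dest);
  *dest = reinterpret_cast<P*>(raw + Adjustment::value);
}

// Small demo: an int moves by 1, a long* moves by sizeof(long) bytes.
inline void inc_sketch_demo() {
  int i = 41;
  inc_sketch(&i);
  assert(i == 42);
  long arr[2] = {0, 0};
  long* p = &arr[0];
  inc_sketch(&p);
  assert(p == &arr[1]);
}
```

Whether a real platform implementation would feed such a constant into inline assembly, or rely on an "n" constraint in Atomic::add instead, is exactly the open question in this thread.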
I'm a little bit reluctant to just give up on per-platform microoptimized inc/dec and simply transform those operations into corresponding add operations. Requiring an additional register and its initialization for some platforms seems like poor form.

If this discussion hasn't reached consensus by next week, I'll start working with Erik then to get us there.

From david.holmes at oracle.com Tue Sep 5 05:01:44 2017
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 5 Sep 2017 15:01:44 +1000
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To:
References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com>
Message-ID: <4d4fe028-ea6a-4f77-ab69-5c2bc752e1f5@oracle.com>

Hi Rohit,

I was unable to apply your patch to latest jdk10/hs/hotspot repo.

Vladimir: are you able to host a webrev for this change please?

Thanks,
David
----

On 4/09/2017 2:42 AM, Rohit Arul Raj wrote:
> Hello Vladimir,
>
> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov
> wrote:
>> Hi Rohit,
>>
>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>>
>>> Hello Vladimir,
>>>
>>>> Changes look good. Only question I have is about MaxVectorSize. It is set
>>>> > 16 only in presence of AVX:
>>>>
>>>>
>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>>>
>>>> Does that code works for AMD 17h too?
>>>
>>>
>>> Thanks for pointing that out. Yes, the code works fine for AMD 17h. So
>>> I have removed the surplus check for MaxVectorSize from my patch. I
>>> have updated, re-tested and attached the patch.
>>
>>
>> Which check you removed?
>>
>
> My older patch had the below mentioned check which was required on
> JDK9 where the default MaxVectorSize was 64. It has been handled
> better in openJDK10. So this check is not required anymore.
>
> + // Some defaults for AMD family 17h
> + if ( cpu_family() == 0x17 ) {
> ...
> ...
> + if (MaxVectorSize > 32) {
> + FLAG_SET_DEFAULT(MaxVectorSize, 32);
> + }
> ..
> ..
> + } > >>> >>> I have one query regarding the setting of UseSHA flag: >>> >>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821 >>> >>> AMD 17h has support for SHA. >>> AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets >>> enabled for it based on the availability of BMI2 and AVX2. Is there an >>> underlying reason for this? I have handled this in the patch but just >>> wanted to confirm. >> >> >> It was done with next changes which use only AVX2 and BMI2 instructions to >> calculate SHA-256: >> >> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974 >> >> I don't know if AMD 15h supports these instructions and can execute that >> code. You need to test it. >> > > Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions, > it should work. > Confirmed by running following sanity tests: > ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java > ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java > ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java > > So I have removed those SHA checks from my patch too. > > Please find attached updated, re-tested patch. > > diff --git a/src/cpu/x86/vm/vm_version_x86.cpp > b/src/cpu/x86/vm/vm_version_x86.cpp > --- a/src/cpu/x86/vm/vm_version_x86.cpp > +++ b/src/cpu/x86/vm/vm_version_x86.cpp > @@ -1109,11 +1109,27 @@ > } > > #ifdef COMPILER2 > - if (MaxVectorSize > 16) { > - // Limit vectors size to 16 bytes on current AMD cpus. > + if (cpu_family() < 0x17 && MaxVectorSize > 16) { > + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
> FLAG_SET_DEFAULT(MaxVectorSize, 16); > } > #endif // COMPILER2 > + > + // Some defaults for AMD family 17h > + if ( cpu_family() == 0x17 ) { > + // On family 17h processors use XMM and UnalignedLoadStores for > Array Copy > + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { > + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); > + } > + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { > + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); > + } > +#ifdef COMPILER2 > + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { > + FLAG_SET_DEFAULT(UseFPUForSpilling, true); > + } > +#endif > + } > } > > if( is_intel() ) { // Intel cpus specific settings > diff --git a/src/cpu/x86/vm/vm_version_x86.hpp > b/src/cpu/x86/vm/vm_version_x86.hpp > --- a/src/cpu/x86/vm/vm_version_x86.hpp > +++ b/src/cpu/x86/vm/vm_version_x86.hpp > @@ -505,6 +505,14 @@ > result |= CPU_CLMUL; > if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) > result |= CPU_RTM; > + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) > + result |= CPU_ADX; > + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) > + result |= CPU_BMI2; > + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) > + result |= CPU_SHA; > + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) > + result |= CPU_FMA; > > // AMD features. > if (is_amd()) { > @@ -515,19 +523,13 @@ > result |= CPU_LZCNT; > if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) > result |= CPU_SSE4A; > + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) > + result |= CPU_HT; > } > // Intel features. 
> if(is_intel()) { > - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) > - result |= CPU_ADX; > - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) > - result |= CPU_BMI2; > - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) > - result |= CPU_SHA; > if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) > result |= CPU_LZCNT; > - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) > - result |= CPU_FMA; > // for Intel, ecx.bits.misalignsse bit (bit 8) indicates > support for prefetchw > if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { > result |= CPU_3DNOW_PREFETCH; > > Please let me know your comments. > > Thanks for your time. > Rohit > >>> >>> Thanks for taking time to review the code. >>> >>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>> b/src/cpu/x86/vm/vm_version_x86.cpp >>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>> @@ -1088,6 +1088,22 @@ >>> } >>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>> } >>> + if (supports_sha()) { >>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>> + FLAG_SET_DEFAULT(UseSHA, true); >>> + } >>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>> UseSHA512Intrinsics) { >>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>> + warning("SHA instructions are not available on this CPU"); >>> + } >>> + FLAG_SET_DEFAULT(UseSHA, false); >>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>> + } >>> >>> // some defaults for AMD family 15h >>> if ( cpu_family() == 0x15 ) { >>> @@ -1109,11 +1125,40 @@ >>> } >>> >>> #ifdef COMPILER2 >>> - if (MaxVectorSize > 16) { >>> - // Limit vectors size to 16 bytes on current AMD cpus. >>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>> } >>> #endif // COMPILER2 >>> + >>> + // Some defaults for AMD family 17h >>> + if ( cpu_family() == 0x17 ) { >>> + // On family 17h processors use XMM and UnalignedLoadStores for >>> Array Copy >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>> + } >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>> + } >>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>> + FLAG_SET_DEFAULT(UseBMI2Instructions, true); >>> + } >>> + if (UseSHA) { >>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>> + } else if (UseSHA512Intrinsics) { >>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>> functions not available on this CPU."); >>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>> + } >>> + } >>> +#ifdef COMPILER2 >>> + if (supports_sse4_2()) { >>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>> + } >>> + } >>> +#endif >>> + } >>> } >>> >>> if( is_intel() ) { // Intel cpus specific settings >>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>> b/src/cpu/x86/vm/vm_version_x86.hpp >>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>> @@ -505,6 +505,14 @@ >>> result |= CPU_CLMUL; >>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>> result |= CPU_RTM; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>> + result |= CPU_ADX; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>> + result |= CPU_BMI2; >>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>> + result |= CPU_SHA; >>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>> + result |= CPU_FMA; >>> >>> // AMD features. 
>>> if (is_amd()) { >>> @@ -515,19 +523,13 @@ >>> result |= CPU_LZCNT; >>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>> result |= CPU_SSE4A; >>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>> + result |= CPU_HT; >>> } >>> // Intel features. >>> if(is_intel()) { >>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>> - result |= CPU_ADX; >>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>> - result |= CPU_BMI2; >>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>> - result |= CPU_SHA; >>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>> result |= CPU_LZCNT; >>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>> - result |= CPU_FMA; >>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>> support for prefetchw >>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>> result |= CPU_3DNOW_PREFETCH; >>> >>> >>> Regards, >>> Rohit >>> >>> >>> >>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>>> >>>>> >>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>>> wrote: >>>>>> >>>>>> >>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> Hi Rohit, >>>>>>> >>>>>>> I think the patch needs updating for jdk10 as I already see a lot of >>>>>>> logic >>>>>>> around UseSHA in vm_version_x86.cpp. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>> >>>>>> Thanks David, I will update the patch wrt JDK10 source base, test and >>>>>> resubmit for review. >>>>>> >>>>>> Regards, >>>>>> Rohit >>>>>> >>>>> >>>>> Hi All, >>>>> >>>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>>> 13519:71337910df60), did regression testing using jtreg ($make >>>>> default) and didnt find any regressions. >>>>> >>>>> Can anyone please volunteer to review this patch which sets flag/ISA >>>>> defaults for newer AMD 17h (EPYC) processor? 
>>>>> >>>>> ************************* Patch **************************** >>>>> >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> @@ -1088,6 +1088,22 @@ >>>>> } >>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>> } >>>>> + if (supports_sha()) { >>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>> + } >>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>>> UseSHA512Intrinsics) { >>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>> + warning("SHA instructions are not available on this CPU"); >>>>> + } >>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } >>>>> >>>>> // some defaults for AMD family 15h >>>>> if ( cpu_family() == 0x15 ) { >>>>> @@ -1109,11 +1125,43 @@ >>>>> } >>>>> >>>>> #ifdef COMPILER2 >>>>> - if (MaxVectorSize > 16) { >>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>> } >>>>> #endif // COMPILER2 >>>>> + >>>>> + // Some defaults for AMD family 17h >>>>> + if ( cpu_family() == 0x17 ) { >>>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>>> Array Copy >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>> + UseXMMForArrayCopy = true; >>>>> + } >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>>> + UseUnalignedLoadStores = true; >>>>> + } >>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>> + UseBMI2Instructions = true; >>>>> + } >>>>> + if (MaxVectorSize > 32) { >>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>> + } >>>>> + if (UseSHA) { >>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } else if (UseSHA512Intrinsics) { >>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>> functions not available on this CPU."); >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } >>>>> + } >>>>> +#ifdef COMPILER2 >>>>> + if (supports_sse4_2()) { >>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>> + } >>>>> + } >>>>> +#endif >>>>> + } >>>>> } >>>>> >>>>> if( is_intel() ) { // Intel cpus specific settings >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> @@ -505,6 +505,14 @@ >>>>> result |= CPU_CLMUL; >>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>> result |= CPU_RTM; >>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>> + result |= CPU_ADX; >>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>> + result |= CPU_BMI2; >>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>> + result |= CPU_SHA; >>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>> + result |= CPU_FMA; >>>>> >>>>> // AMD features. 
>>>>> if (is_amd()) { >>>>> @@ -515,19 +523,13 @@ >>>>> result |= CPU_LZCNT; >>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>> result |= CPU_SSE4A; >>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>> + result |= CPU_HT; >>>>> } >>>>> // Intel features. >>>>> if(is_intel()) { >>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>> - result |= CPU_ADX; >>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>> - result |= CPU_BMI2; >>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>> - result |= CPU_SHA; >>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>> result |= CPU_LZCNT; >>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>> - result |= CPU_FMA; >>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>> support for prefetchw >>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>> result |= CPU_3DNOW_PREFETCH; >>>>> >>>>> ************************************************************** >>>>> >>>>> Thanks, >>>>> Rohit >>>>> >>>>>>> >>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>>> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Hi Rohit, >>>>>>>>> >>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I would like an volunteer to review this patch (openJDK9) which >>>>>>>>>> sets >>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us >>>>>>>>>> with >>>>>>>>>> the commit process. >>>>>>>>>> >>>>>>>>>> Webrev: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Unfortunately patches can not be accepted from systems outside the >>>>>>>>> OpenJDK >>>>>>>>> infrastructure and ... >>>>>>>>> >>>>>>>>>> I have also attached the patch (hg diff -g) for reference. 
>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> ... unfortunately patches tend to get stripped by the mail servers. >>>>>>>>> If >>>>>>>>> the >>>>>>>>> patch is small please include it inline. Otherwise you will need to >>>>>>>>> find >>>>>>>>> an >>>>>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net. >>>>>>>>> >>>>>>>> >>>>>>>>>> 3) I have done regression testing using jtreg ($make default) and >>>>>>>>>> didnt find any regressions. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Sounds good, but until I see the patch it is hard to comment on >>>>>>>>> testing >>>>>>>>> requirements. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> David >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Thanks David, >>>>>>>> Yes, it's a small patch. >>>>>>>> >>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>>> } >>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>> } >>>>>>>> + if (supports_sha()) { >>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>> + } >>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>>>>>> UseSHA512Intrinsics) { >>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>> + warning("SHA instructions are not available on this CPU"); >>>>>>>> + } >>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>> + } >>>>>>>> >>>>>>>> // some defaults for AMD family 15h >>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>>> } >>>>>>>> >>>>>>>> #ifdef 
COMPILER2 >>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>> } >>>>>>>> #endif // COMPILER2 >>>>>>>> + >>>>>>>> + // Some defaults for AMD family 17h >>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>> for >>>>>>>> Array Copy >>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>> + } >>>>>>>> + if (supports_sse2() && >>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>> { >>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>> + } >>>>>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>>>> + UseBMI2Instructions = true; >>>>>>>> + } >>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>> + } >>>>>>>> + if (UseSHA) { >>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>>> functions not available on this CPU."); >>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>> + } >>>>>>>> + } >>>>>>>> +#ifdef COMPILER2 >>>>>>>> + if (supports_sse4_2()) { >>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>> + } >>>>>>>> + } >>>>>>>> +#endif >>>>>>>> + } >>>>>>>> } >>>>>>>> >>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> @@ -513,6 +513,16 @@ >>>>>>>> result |= CPU_LZCNT; >>>>>>>> if 
(_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>> result |= CPU_SSE4A; >>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>> + result |= CPU_BMI2; >>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>> + result |= CPU_HT; >>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>> + result |= CPU_ADX; >>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>> + result |= CPU_SHA; >>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>> + result |= CPU_FMA; >>>>>>>> } >>>>>>>> // Intel features. >>>>>>>> if(is_intel()) { >>>>>>>> >>>>>>>> Regards, >>>>>>>> Rohit >>>>>>>> >>>>>>> >>>> >> From rohitarulraj at gmail.com Tue Sep 5 05:29:08 2017 From: rohitarulraj at gmail.com (Rohit Arul Raj) Date: Tue, 5 Sep 2017 10:59:08 +0530 Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults In-Reply-To: <4d4fe028-ea6a-4f77-ab69-5c2bc752e1f5@oracle.com> References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com> <4d4fe028-ea6a-4f77-ab69-5c2bc752e1f5@oracle.com> Message-ID: Hello David, On Tue, Sep 5, 2017 at 10:31 AM, David Holmes wrote: > Hi Rohit, > > I was unable to apply your patch to latest jdk10/hs/hotspot repo. > I checked out the latest jdk10/hs/hotspot [parent: 13548:1a9c2e07a826] and was able to apply the patch [epyc-amd17h-defaults-3Sept.patch] without any issues. Can you share the error message that you are getting? Regards, Rohit > > > On 4/09/2017 2:42 AM, Rohit Arul Raj wrote: >> >> Hello Vladimir, >> >> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov >> wrote: >>> >>> Hi Rohit, >>> >>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote: >>>> >>>> >>>> Hello Vladimir, >>>> >>>>> Changes look good. Only question I have is about MaxVectorSize. It is >>>>> set >>>>>> >>>>>> >>>>> 16 only in presence of AVX: >>>>> >>>>> >>>>> >>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945 >>>>> >>>>> Does that code works for AMD 17h too? 
>>>> >>>> >>>> >>>> Thanks for pointing that out. Yes, the code works fine for AMD 17h. So >>>> I have removed the surplus check for MaxVectorSize from my patch. I >>>> have updated, re-tested and attached the patch. >>> >>> >>> >>> Which check you removed? >>> >> >> My older patch had the below mentioned check which was required on >> JDK9 where the default MaxVectorSize was 64. It has been handled >> better in openJDK10. So this check is not required anymore. >> >> + // Some defaults for AMD family 17h >> + if ( cpu_family() == 0x17 ) { >> ... >> ... >> + if (MaxVectorSize > 32) { >> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >> + } >> .. >> .. >> + } >> >>>> >>>> I have one query regarding the setting of UseSHA flag: >>>> >>>> >>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821 >>>> >>>> AMD 17h has support for SHA. >>>> AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets >>>> enabled for it based on the availability of BMI2 and AVX2. Is there an >>>> underlying reason for this? I have handled this in the patch but just >>>> wanted to confirm. >>> >>> >>> >>> It was done with next changes which use only AVX2 and BMI2 instructions >>> to >>> calculate SHA-256: >>> >>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974 >>> >>> I don't know if AMD 15h supports these instructions and can execute that >>> code. You need to test it. >>> >> >> Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions, >> it should work. >> Confirmed by running following sanity tests: >> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java >> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java >> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java >> >> So I have removed those SHA checks from my patch too. >> >> Please find attached updated, re-tested patch. 
>> >> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >> b/src/cpu/x86/vm/vm_version_x86.cpp >> --- a/src/cpu/x86/vm/vm_version_x86.cpp >> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >> @@ -1109,11 +1109,27 @@ >> } >> >> #ifdef COMPILER2 >> - if (MaxVectorSize > 16) { >> - // Limit vectors size to 16 bytes on current AMD cpus. >> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >> FLAG_SET_DEFAULT(MaxVectorSize, 16); >> } >> #endif // COMPILER2 >> + >> + // Some defaults for AMD family 17h >> + if ( cpu_family() == 0x17 ) { >> + // On family 17h processors use XMM and UnalignedLoadStores for >> Array Copy >> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >> + } >> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >> + } >> +#ifdef COMPILER2 >> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >> + } >> +#endif >> + } >> } >> >> if( is_intel() ) { // Intel cpus specific settings >> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >> b/src/cpu/x86/vm/vm_version_x86.hpp >> --- a/src/cpu/x86/vm/vm_version_x86.hpp >> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >> @@ -505,6 +505,14 @@ >> result |= CPU_CLMUL; >> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >> result |= CPU_RTM; >> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >> + result |= CPU_ADX; >> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >> + result |= CPU_BMI2; >> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >> + result |= CPU_SHA; >> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >> + result |= CPU_FMA; >> >> // AMD features. 
>> if (is_amd()) { >> @@ -515,19 +523,13 @@ >> result |= CPU_LZCNT; >> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >> result |= CPU_SSE4A; >> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >> + result |= CPU_HT; >> } >> // Intel features. >> if(is_intel()) { >> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >> - result |= CPU_ADX; >> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >> - result |= CPU_BMI2; >> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >> - result |= CPU_SHA; >> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >> result |= CPU_LZCNT; >> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >> - result |= CPU_FMA; >> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >> support for prefetchw >> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >> result |= CPU_3DNOW_PREFETCH; >> >> Please let me know your comments. >> >> Thanks for your time. >> Rohit >> >>>> >>>> Thanks for taking time to review the code. >>>> >>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>> @@ -1088,6 +1088,22 @@ >>>> } >>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>> } >>>> + if (supports_sha()) { >>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>> + } >>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>> UseSHA512Intrinsics) { >>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>> + warning("SHA instructions are not available on this CPU"); >>>> + } >>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>> + } >>>> >>>> // some defaults for AMD family 15h >>>> if ( cpu_family() == 0x15 ) { >>>> @@ 
-1109,11 +1125,40 @@ >>>> } >>>> >>>> #ifdef COMPILER2 >>>> - if (MaxVectorSize > 16) { >>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>> } >>>> #endif // COMPILER2 >>>> + >>>> + // Some defaults for AMD family 17h >>>> + if ( cpu_family() == 0x17 ) { >>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>> Array Copy >>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>> + } >>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>> + } >>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>> + FLAG_SET_DEFAULT(UseBMI2Instructions, true); >>>> + } >>>> + if (UseSHA) { >>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>> + } else if (UseSHA512Intrinsics) { >>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>> functions not available on this CPU."); >>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>> + } >>>> + } >>>> +#ifdef COMPILER2 >>>> + if (supports_sse4_2()) { >>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>> + } >>>> + } >>>> +#endif >>>> + } >>>> } >>>> >>>> if( is_intel() ) { // Intel cpus specific settings >>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>> @@ -505,6 +505,14 @@ >>>> result |= CPU_CLMUL; >>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>> result |= CPU_RTM; >>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>> + result |= CPU_ADX; >>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>> + result |= CPU_BMI2; >>>> + if 
(_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>> + result |= CPU_SHA; >>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>> + result |= CPU_FMA; >>>> >>>> // AMD features. >>>> if (is_amd()) { >>>> @@ -515,19 +523,13 @@ >>>> result |= CPU_LZCNT; >>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>> result |= CPU_SSE4A; >>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>> + result |= CPU_HT; >>>> } >>>> // Intel features. >>>> if(is_intel()) { >>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>> - result |= CPU_ADX; >>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>> - result |= CPU_BMI2; >>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>> - result |= CPU_SHA; >>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>> result |= CPU_LZCNT; >>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>> - result |= CPU_FMA; >>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>> support for prefetchw >>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>> result |= CPU_3DNOW_PREFETCH; >>>> >>>> >>>> Regards, >>>> Rohit >>>> >>>> >>>> >>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>>>> >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>>>> >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Hi Rohit, >>>>>>>> >>>>>>>> I think the patch needs updating for jdk10 as I already see a lot of >>>>>>>> logic >>>>>>>> around UseSHA in vm_version_x86.cpp. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> >>>>>>> >>>>>>> Thanks David, I will update the patch wrt JDK10 source base, test and >>>>>>> resubmit for review. >>>>>>> >>>>>>> Regards, >>>>>>> Rohit >>>>>>> >>>>>> >>>>>> Hi All, >>>>>> >>>>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>>>> 13519:71337910df60), did regression testing using jtreg ($make >>>>>> default) and didnt find any regressions. 
>>>>>> >>>>>> Can anyone please volunteer to review this patch which sets flag/ISA >>>>>> defaults for newer AMD 17h (EPYC) processor? >>>>>> >>>>>> ************************* Patch **************************** >>>>>> >>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> @@ -1088,6 +1088,22 @@ >>>>>> } >>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>> } >>>>>> + if (supports_sha()) { >>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>> + } >>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>>>> UseSHA512Intrinsics) { >>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>> + warning("SHA instructions are not available on this CPU"); >>>>>> + } >>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>> + } >>>>>> >>>>>> // some defaults for AMD family 15h >>>>>> if ( cpu_family() == 0x15 ) { >>>>>> @@ -1109,11 +1125,43 @@ >>>>>> } >>>>>> >>>>>> #ifdef COMPILER2 >>>>>> - if (MaxVectorSize > 16) { >>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>> } >>>>>> #endif // COMPILER2 >>>>>> + >>>>>> + // Some defaults for AMD family 17h >>>>>> + if ( cpu_family() == 0x17 ) { >>>>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>>>> Array Copy >>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>> + UseXMMForArrayCopy = true; >>>>>> + } >>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>> { >>>>>> + UseUnalignedLoadStores = true; >>>>>> + } >>>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>> + UseBMI2Instructions = true; >>>>>> + } >>>>>> + if (MaxVectorSize > 32) { >>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>> + } >>>>>> + if (UseSHA) { >>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>> + } else if (UseSHA512Intrinsics) { >>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>> functions not available on this CPU."); >>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>> + } >>>>>> + } >>>>>> +#ifdef COMPILER2 >>>>>> + if (supports_sse4_2()) { >>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>> + } >>>>>> + } >>>>>> +#endif >>>>>> + } >>>>>> } >>>>>> >>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> @@ -505,6 +505,14 @@ >>>>>> result |= CPU_CLMUL; >>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>> result |= CPU_RTM; >>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>> + result |= CPU_ADX; >>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>> + result |= CPU_BMI2; >>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>> + result |= CPU_SHA; >>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma 
!= 0) >>>>>> + result |= CPU_FMA; >>>>>> >>>>>> // AMD features. >>>>>> if (is_amd()) { >>>>>> @@ -515,19 +523,13 @@ >>>>>> result |= CPU_LZCNT; >>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>> result |= CPU_SSE4A; >>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>> + result |= CPU_HT; >>>>>> } >>>>>> // Intel features. >>>>>> if(is_intel()) { >>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>> - result |= CPU_ADX; >>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>> - result |= CPU_BMI2; >>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>> - result |= CPU_SHA; >>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>> result |= CPU_LZCNT; >>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>> - result |= CPU_FMA; >>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>>> support for prefetchw >>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>> >>>>>> ************************************************************** >>>>>> >>>>>> Thanks, >>>>>> Rohit >>>>>> >>>>>>>> >>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>>>> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi Rohit, >>>>>>>>>> >>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I would like an volunteer to review this patch (openJDK9) which >>>>>>>>>>> sets >>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us >>>>>>>>>>> with >>>>>>>>>>> the commit process. 
>>>>>>>>>>>
>>>>>>>>>>> Webrev:
>>>>>>>>>>>
>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>>>>>>>>
>>>>>>>>>> Unfortunately patches cannot be accepted from systems outside the
>>>>>>>>>> OpenJDK infrastructure and ...
>>>>>>>>>>
>>>>>>>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>>>>>>>
>>>>>>>>>> ... unfortunately patches tend to get stripped by the mail servers.
>>>>>>>>>> If the patch is small please include it inline. Otherwise you will
>>>>>>>>>> need to find an OpenJDK Author who can host it for you on
>>>>>>>>>> cr.openjdk.java.net.
>>>>>>>>>>
>>>>>>>>>>> 3) I have done regression testing using jtreg ($make default) and
>>>>>>>>>>> didn't find any regressions.
>>>>>>>>>>
>>>>>>>>>> Sounds good, but until I see the patch it is hard to comment on
>>>>>>>>>> testing requirements.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>
>>>>>>>>> Thanks David,
>>>>>>>>> Yes, it's a small patch.
>>>>>>>>> >>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>>>> } >>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>> } >>>>>>>>> + if (supports_sha()) { >>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>> + } >>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics >>>>>>>>> || >>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>> + warning("SHA instructions are not available on this CPU"); >>>>>>>>> + } >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>> + } >>>>>>>>> >>>>>>>>> // some defaults for AMD family 15h >>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>>>> } >>>>>>>>> >>>>>>>>> #ifdef COMPILER2 >>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>> } >>>>>>>>> #endif // COMPILER2 >>>>>>>>> + >>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>>> for >>>>>>>>> Array Copy >>>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>> { >>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>> + } >>>>>>>>> + if (supports_sse2() && >>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>> { >>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>> + } >>>>>>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>> { >>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>> + } >>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>> + } >>>>>>>>> + if (UseSHA) { >>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>>>> functions not available on this CPU."); >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> +#ifdef COMPILER2 >>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> +#endif >>>>>>>>> + } >>>>>>>>> } >>>>>>>>> >>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> @@ -513,6 +513,16 @@ >>>>>>>>> result |= CPU_LZCNT; >>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>> result |= CPU_SSE4A; >>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>> + result |= CPU_BMI2; >>>>>>>>> + 
if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>>>> +        result |= CPU_HT;
>>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>>>> +        result |= CPU_ADX;
>>>>>>>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>>>> +        result |= CPU_SHA;
>>>>>>>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>>>> +        result |= CPU_FMA;
>>>>>>>>>      }
>>>>>>>>>      // Intel features.
>>>>>>>>>      if(is_intel()) {
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Rohit
>>>>>>>>>
>>>>>>>>
>>>>>
>>>
>

From david.holmes at oracle.com Tue Sep 5 05:43:03 2017
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 5 Sep 2017 15:43:03 +1000
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To:
References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com> <4d4fe028-ea6a-4f77-ab69-5c2bc752e1f5@oracle.com>
Message-ID: <47bc0a90-ed6a-220a-c3d1-b4df2d8bbc74@oracle.com>

On 5/09/2017 3:29 PM, Rohit Arul Raj wrote:
> Hello David,
>
> On Tue, Sep 5, 2017 at 10:31 AM, David Holmes wrote:
>> Hi Rohit,
>>
>> I was unable to apply your patch to latest jdk10/hs/hotspot repo.
>>
>
> I checked out the latest jdk10/hs/hotspot [parent: 13548:1a9c2e07a826]
> and was able to apply the patch [epyc-amd17h-defaults-3Sept.patch]
> without any issues.
> Can you share the error message that you are getting?

I was getting this:

applying hotspot.patch
patching file src/cpu/x86/vm/vm_version_x86.cpp
Hunk #1 FAILED at 1108
1 out of 1 hunks FAILED -- saving rejects to file src/cpu/x86/vm/vm_version_x86.cpp.rej
patching file src/cpu/x86/vm/vm_version_x86.hpp
Hunk #2 FAILED at 522
1 out of 2 hunks FAILED -- saving rejects to file src/cpu/x86/vm/vm_version_x86.hpp.rej
abort: patch failed to apply

but I started again and this time it applied fine, so not sure what was going on there.
Cheers,
David

> Regards,
> Rohit
>
>
>>
>>
>> On 4/09/2017 2:42 AM, Rohit Arul Raj wrote:
>>>
>>> Hello Vladimir,
>>>
>>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov
>>> wrote:
>>>>
>>>> Hi Rohit,
>>>>
>>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>>>>
>>>>> Hello Vladimir,
>>>>>
>>>>>> Changes look good. Only question I have is about MaxVectorSize. It is
>>>>>> set > 16 only in presence of AVX:
>>>>>>
>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>>>>>
>>>>>> Does that code work for AMD 17h too?
>>>>>
>>>>> Thanks for pointing that out. Yes, the code works fine for AMD 17h. So
>>>>> I have removed the surplus check for MaxVectorSize from my patch. I
>>>>> have updated, re-tested and attached the patch.
>>>>
>>>> Which check did you remove?
>>>>
>>>
>>> My older patch had the check shown below, which was required on
>>> JDK9 where the default MaxVectorSize was 64. It has been handled
>>> better in openJDK10. So this check is not required anymore.
>>>
>>> +  // Some defaults for AMD family 17h
>>> +  if ( cpu_family() == 0x17 ) {
>>> ...
>>> ...
>>> +    if (MaxVectorSize > 32) {
>>> +      FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>> +    }
>>> ..
>>> ..
>>> +  }
>>>
>>>>>
>>>>> I have one query regarding the setting of UseSHA flag:
>>>>>
>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821
>>>>>
>>>>> AMD 17h has support for SHA.
>>>>> AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets
>>>>> enabled for it based on the availability of BMI2 and AVX2. Is there an
>>>>> underlying reason for this? I have handled this in the patch but just
>>>>> wanted to confirm.
>>>>
>>>> It was done with the following change, which uses only AVX2 and BMI2
>>>> instructions to calculate SHA-256:
>>>>
>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974
>>>>
>>>> I don't know if AMD 15h supports these instructions and can execute that
>>>> code. You need to test it.
>>>>
>>>
>>> Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions,
>>> it should work.
>>> Confirmed by running the following sanity tests:
>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java
>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java
>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java
>>>
>>> So I have removed those SHA checks from my patch too.
>>>
>>> Please find attached the updated, re-tested patch.
>>>
>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>> @@ -1109,11 +1109,27 @@
>>>      }
>>>
>>>  #ifdef COMPILER2
>>> -    if (MaxVectorSize > 16) {
>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>> } >>> #endif // COMPILER2 >>> + >>> + // Some defaults for AMD family 17h >>> + if ( cpu_family() == 0x17 ) { >>> + // On family 17h processors use XMM and UnalignedLoadStores for >>> Array Copy >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>> + } >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>> + } >>> +#ifdef COMPILER2 >>> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>> + } >>> +#endif >>> + } >>> } >>> >>> if( is_intel() ) { // Intel cpus specific settings >>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>> b/src/cpu/x86/vm/vm_version_x86.hpp >>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>> @@ -505,6 +505,14 @@ >>> result |= CPU_CLMUL; >>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>> result |= CPU_RTM; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>> + result |= CPU_ADX; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>> + result |= CPU_BMI2; >>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>> + result |= CPU_SHA; >>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>> + result |= CPU_FMA; >>> >>> // AMD features. >>> if (is_amd()) { >>> @@ -515,19 +523,13 @@ >>> result |= CPU_LZCNT; >>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>> result |= CPU_SSE4A; >>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>> + result |= CPU_HT; >>> } >>> // Intel features. 
>>> if(is_intel()) { >>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>> - result |= CPU_ADX; >>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>> - result |= CPU_BMI2; >>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>> - result |= CPU_SHA; >>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>> result |= CPU_LZCNT; >>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>> - result |= CPU_FMA; >>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>> support for prefetchw >>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>> result |= CPU_3DNOW_PREFETCH; >>> >>> Please let me know your comments. >>> >>> Thanks for your time. >>> Rohit >>> >>>>> >>>>> Thanks for taking time to review the code. >>>>> >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> @@ -1088,6 +1088,22 @@ >>>>> } >>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>> } >>>>> + if (supports_sha()) { >>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>> + } >>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>>> UseSHA512Intrinsics) { >>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>> + warning("SHA instructions are not available on this CPU"); >>>>> + } >>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } >>>>> >>>>> // some defaults for AMD family 15h >>>>> if ( cpu_family() == 0x15 ) { >>>>> @@ -1109,11 +1125,40 @@ >>>>> } >>>>> >>>>> #ifdef COMPILER2 >>>>> - if (MaxVectorSize > 16) { >>>>> - // Limit vectors size to 16 bytes on current AMD cpus. 
>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>> } >>>>> #endif // COMPILER2 >>>>> + >>>>> + // Some defaults for AMD family 17h >>>>> + if ( cpu_family() == 0x17 ) { >>>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>>> Array Copy >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>> + } >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>> + } >>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>> + FLAG_SET_DEFAULT(UseBMI2Instructions, true); >>>>> + } >>>>> + if (UseSHA) { >>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } else if (UseSHA512Intrinsics) { >>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>> functions not available on this CPU."); >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } >>>>> + } >>>>> +#ifdef COMPILER2 >>>>> + if (supports_sse4_2()) { >>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>> + } >>>>> + } >>>>> +#endif >>>>> + } >>>>> } >>>>> >>>>> if( is_intel() ) { // Intel cpus specific settings >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> @@ -505,6 +505,14 @@ >>>>> result |= CPU_CLMUL; >>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>> result |= CPU_RTM; >>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>> + result |= CPU_ADX; >>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>> + result |= CPU_BMI2; >>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>> + result |= CPU_SHA; >>>>> + if 
(_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>> + result |= CPU_FMA; >>>>> >>>>> // AMD features. >>>>> if (is_amd()) { >>>>> @@ -515,19 +523,13 @@ >>>>> result |= CPU_LZCNT; >>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>> result |= CPU_SSE4A; >>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>> + result |= CPU_HT; >>>>> } >>>>> // Intel features. >>>>> if(is_intel()) { >>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>> - result |= CPU_ADX; >>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>> - result |= CPU_BMI2; >>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>> - result |= CPU_SHA; >>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>> result |= CPU_LZCNT; >>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>> - result |= CPU_FMA; >>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>> support for prefetchw >>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>> result |= CPU_3DNOW_PREFETCH; >>>>> >>>>> >>>>> Regards, >>>>> Rohit >>>>> >>>>> >>>>> >>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>>>>> >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>>>>> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Hi Rohit, >>>>>>>>> >>>>>>>>> I think the patch needs updating for jdk10 as I already see a lot of >>>>>>>>> logic >>>>>>>>> around UseSHA in vm_version_x86.cpp. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> David >>>>>>>>> >>>>>>>> >>>>>>>> Thanks David, I will update the patch wrt JDK10 source base, test and >>>>>>>> resubmit for review. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Rohit >>>>>>>> >>>>>>> >>>>>>> Hi All, >>>>>>> >>>>>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>>>>> 13519:71337910df60), did regression testing using jtreg ($make >>>>>>> default) and didnt find any regressions. 
>>>>>>> >>>>>>> Can anyone please volunteer to review this patch which sets flag/ISA >>>>>>> defaults for newer AMD 17h (EPYC) processor? >>>>>>> >>>>>>> ************************* Patch **************************** >>>>>>> >>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>> } >>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>> } >>>>>>> + if (supports_sha()) { >>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>> + } >>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>>>>> UseSHA512Intrinsics) { >>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>> + warning("SHA instructions are not available on this CPU"); >>>>>>> + } >>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>> + } >>>>>>> >>>>>>> // some defaults for AMD family 15h >>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>> @@ -1109,11 +1125,43 @@ >>>>>>> } >>>>>>> >>>>>>> #ifdef COMPILER2 >>>>>>> - if (MaxVectorSize > 16) { >>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>> } >>>>>>> #endif // COMPILER2 >>>>>>> + >>>>>>> + // Some defaults for AMD family 17h >>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>>>>> Array Copy >>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>> + UseXMMForArrayCopy = true; >>>>>>> + } >>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>> { >>>>>>> + UseUnalignedLoadStores = true; >>>>>>> + } >>>>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>>> + UseBMI2Instructions = true; >>>>>>> + } >>>>>>> + if (MaxVectorSize > 32) { >>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>> + } >>>>>>> + if (UseSHA) { >>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>> functions not available on this CPU."); >>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>> + } >>>>>>> + } >>>>>>> +#ifdef COMPILER2 >>>>>>> + if (supports_sse4_2()) { >>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>> + } >>>>>>> + } >>>>>>> +#endif >>>>>>> + } >>>>>>> } >>>>>>> >>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> @@ -505,6 +505,14 @@ >>>>>>> result |= CPU_CLMUL; >>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>> result |= CPU_RTM; >>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>> + result |= CPU_ADX; >>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>> + result |= CPU_BMI2; >>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>> + result |= 
CPU_SHA; >>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>> + result |= CPU_FMA; >>>>>>> >>>>>>> // AMD features. >>>>>>> if (is_amd()) { >>>>>>> @@ -515,19 +523,13 @@ >>>>>>> result |= CPU_LZCNT; >>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>> result |= CPU_SSE4A; >>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>> + result |= CPU_HT; >>>>>>> } >>>>>>> // Intel features. >>>>>>> if(is_intel()) { >>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>> - result |= CPU_ADX; >>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>> - result |= CPU_BMI2; >>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>> - result |= CPU_SHA; >>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>> result |= CPU_LZCNT; >>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>> - result |= CPU_FMA; >>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>>>> support for prefetchw >>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>> >>>>>>> ************************************************************** >>>>>>> >>>>>>> Thanks, >>>>>>> Rohit >>>>>>> >>>>>>>>> >>>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi Rohit, >>>>>>>>>>> >>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I would like an volunteer to review this patch (openJDK9) which >>>>>>>>>>>> sets >>>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us >>>>>>>>>>>> with >>>>>>>>>>>> the commit process. 
>>>>>>>>>>>>
>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>
>>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately patches cannot be accepted from systems outside the
>>>>>>>>>>> OpenJDK infrastructure and ...
>>>>>>>>>>>
>>>>>>>>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>>>>>>>>
>>>>>>>>>>> ... unfortunately patches tend to get stripped by the mail servers.
>>>>>>>>>>> If the patch is small please include it inline. Otherwise you will
>>>>>>>>>>> need to find an OpenJDK Author who can host it for you on
>>>>>>>>>>> cr.openjdk.java.net.
>>>>>>>>>>>
>>>>>>>>>>>> 3) I have done regression testing using jtreg ($make default) and
>>>>>>>>>>>> didn't find any regressions.
>>>>>>>>>>>
>>>>>>>>>>> Sounds good, but until I see the patch it is hard to comment on
>>>>>>>>>>> testing requirements.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>> Thanks David,
>>>>>>>>>> Yes, it's a small patch.
>>>>>>>>>> >>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>>>>> } >>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>> } >>>>>>>>>> + if (supports_sha()) { >>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>> + } >>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics >>>>>>>>>> || >>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>> + warning("SHA instructions are not available on this CPU"); >>>>>>>>>> + } >>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>> + } >>>>>>>>>> >>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>> } >>>>>>>>>> #endif // COMPILER2 >>>>>>>>>> + >>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>>>> for >>>>>>>>>> Array Copy >>>>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>> { >>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>> + } >>>>>>>>>> + if (supports_sse2() && >>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>> { >>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>> + } >>>>>>>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>> { >>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>> + } >>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>> + } >>>>>>>>>> + if (UseSHA) { >>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>>>>> functions not available on this CPU."); >>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>> + } >>>>>>>>>> + } >>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>> + } >>>>>>>>>> + } >>>>>>>>>> +#endif >>>>>>>>>> + } >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>> @@ -513,6 +513,16 @@ >>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 
0)
>>>>>>>>>> +        result |= CPU_BMI2;
>>>>>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>>>>> +        result |= CPU_HT;
>>>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>>>>> +        result |= CPU_ADX;
>>>>>>>>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>>>>> +        result |= CPU_SHA;
>>>>>>>>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>>>>> +        result |= CPU_FMA;
>>>>>>>>>>      }
>>>>>>>>>>      // Intel features.
>>>>>>>>>>      if(is_intel()) {
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Rohit
>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>
>>

From david.holmes at oracle.com  Tue Sep  5 06:02:44 2017
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 5 Sep 2017 16:02:44 +1000
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To: <47bc0a90-ed6a-220a-c3d1-b4df2d8bbc74@oracle.com>
References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com>
 <4d4fe028-ea6a-4f77-ab69-5c2bc752e1f5@oracle.com>
 <47bc0a90-ed6a-220a-c3d1-b4df2d8bbc74@oracle.com>
Message-ID: <9c53f889-e58e-33ac-3c05-874779b469d6@oracle.com>

Hi Rohit,

I couldn't see a bug filed for this so I did it:

https://bugs.openjdk.java.net/browse/JDK-8187219

I also hosted the webrev as I wanted to see the change in context:

http://cr.openjdk.java.net/~dholmes/8187219/webrev/

I have a couple of comments/queries:

src/cpu/x86/vm/vm_version_x86.hpp

So this moved the adx/bmi2/sha/fma settings out from being Intel
specific to applying to AMD as well - ok. Have these features always
been available in AMD chips? Just wondering if they might not be valid
for some older processors.

You added:

  526       if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
  527         result |= CPU_HT;

and I'm wondering if there would be any case where this would not be
covered by the earlier:

  448     if (threads_per_core() > 1)
  449       result |= CPU_HT;

?

---

src/cpu/x86/vm/vm_version_x86.cpp

No comments on AMD specific changes.
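
For anyone following the CPU_HT question, the two checks can be compared side by side in a small standalone sketch. This is not HotSpot code: `CpuidSnapshot`, `feature_flags` and the flag values below are hypothetical stand-ins, and only the shape of the logic follows the lines quoted from vm_version_x86.hpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag values for illustration only; the real CPU_* constants
// are defined in vm_version_x86.hpp.
enum Feature : uint64_t {
  CPU_HT   = 1u << 0,
  CPU_ADX  = 1u << 1,
  CPU_BMI2 = 1u << 2,
  CPU_SHA  = 1u << 3,
  CPU_FMA  = 1u << 4
};

// Hypothetical snapshot of the few CPUID-derived values the checks read.
struct CpuidSnapshot {
  bool     is_amd;            // vendor check, stands in for is_amd()
  unsigned threads_per_core;  // stands in for threads_per_core()
  bool     edx_ht;            // std_cpuid1_edx.bits.ht
  bool     ebx_adx;           // sef_cpuid7_ebx.bits.adx
  bool     ebx_bmi2;          // sef_cpuid7_ebx.bits.bmi2
  bool     ebx_sha;           // sef_cpuid7_ebx.bits.sha
  bool     ecx_fma;           // std_cpuid1_ecx.bits.fma
};

uint64_t feature_flags(const CpuidSnapshot& c) {
  uint64_t result = 0;

  // Existing, vendor-neutral check based on the reported topology.
  if (c.threads_per_core > 1) result |= CPU_HT;

  // Checks that the patch moves out of the Intel-only block.
  if (c.ebx_adx)  result |= CPU_ADX;
  if (c.ebx_bmi2) result |= CPU_BMI2;
  if (c.ebx_sha)  result |= CPU_SHA;
  if (c.ecx_fma)  result |= CPU_FMA;

  // Check the patch adds inside the AMD block: it reads the CPUID bit
  // directly instead of relying on the thread count.
  if (c.is_amd && c.edx_ht) result |= CPU_HT;

  return result;
}
```

In this sketch the added AMD check only changes the result when `threads_per_core` is 1 while the CPUID bit is set; in every other case the earlier check already covers it.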

Thanks,
David
-----

On 5/09/2017 3:43 PM, David Holmes wrote:
> On 5/09/2017 3:29 PM, Rohit Arul Raj wrote:
>> Hello David,
>>
>> On Tue, Sep 5, 2017 at 10:31 AM, David Holmes wrote:
>>> Hi Rohit,
>>>
>>> I was unable to apply your patch to latest jdk10/hs/hotspot repo.
>>>
>>
>> I checked out the latest jdk10/hs/hotspot [parent: 13548:1a9c2e07a826]
>> and was able to apply the patch [epyc-amd17h-defaults-3Sept.patch]
>> without any issues.
>> Can you share the error message that you are getting?
>
> I was getting this:
>
> applying hotspot.patch
> patching file src/cpu/x86/vm/vm_version_x86.cpp
> Hunk #1 FAILED at 1108
> 1 out of 1 hunks FAILED -- saving rejects to file
> src/cpu/x86/vm/vm_version_x86.cpp.rej
> patching file src/cpu/x86/vm/vm_version_x86.hpp
> Hunk #2 FAILED at 522
> 1 out of 2 hunks FAILED -- saving rejects to file
> src/cpu/x86/vm/vm_version_x86.hpp.rej
> abort: patch failed to apply
>
> but I started again and this time it applied fine, so not sure what was
> going on there.
>
> Cheers,
> David
>
>> Regards,
>> Rohit
>>
>>>
>>> On 4/09/2017 2:42 AM, Rohit Arul Raj wrote:
>>>>
>>>> Hello Vladimir,
>>>>
>>>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov wrote:
>>>>>
>>>>> Hi Rohit,
>>>>>
>>>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>>>>>
>>>>>> Hello Vladimir,
>>>>>>
>>>>>>> Changes look good. Only question I have is about MaxVectorSize.
>>>>>>> It is set > 16 only in presence of AVX:
>>>>>>>
>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>>>>>>
>>>>>>> Does that code works for AMD 17h too?
>>>>>>
>>>>>> Thanks for pointing that out. Yes, the code works fine for AMD
>>>>>> 17h. So
>>>>>> I have removed the surplus check for MaxVectorSize from my patch. I
>>>>>> have updated, re-tested and attached the patch.
>>>>>
>>>>> Which check you removed?
>>>>> >>>> >>>> My older patch had the below mentioned check which was required on >>>> JDK9 where the default MaxVectorSize was 64. It has been handled >>>> better in openJDK10. So this check is not required anymore. >>>> >>>> +??? // Some defaults for AMD family 17h >>>> +??? if ( cpu_family() == 0x17 ) { >>>> ... >>>> ... >>>> +????? if (MaxVectorSize > 32) { >>>> +??????? FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>> +????? } >>>> .. >>>> .. >>>> +????? } >>>> >>>>>> >>>>>> I have one query regarding the setting of UseSHA flag: >>>>>> >>>>>> >>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821 >>>>>> >>>>>> >>>>>> AMD 17h has support for SHA. >>>>>> AMD 15h doesn't have? support for SHA. Still "UseSHA" flag gets >>>>>> enabled for it based on the availability of BMI2 and AVX2. Is >>>>>> there an >>>>>> underlying reason for this? I have handled this in the patch but just >>>>>> wanted to confirm. >>>>> >>>>> >>>>> >>>>> It was done with next changes which use only AVX2 and BMI2 >>>>> instructions >>>>> to >>>>> calculate SHA-256: >>>>> >>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974 >>>>> >>>>> I don't know if AMD 15h supports these instructions and can execute >>>>> that >>>>> code. You need to test it. >>>>> >>>> >>>> Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions, >>>> it should work. >>>> Confirmed by running following sanity tests: >>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java >>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java >>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java >>>> >>>> So I have removed those SHA checks from my patch too. >>>> >>>> Please find attached updated, re-tested patch. 
>>>> >>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>> @@ -1109,11 +1109,27 @@ >>>> ?????? } >>>> >>>> ?? #ifdef COMPILER2 >>>> -??? if (MaxVectorSize > 16) { >>>> -????? // Limit vectors size to 16 bytes on current AMD cpus. >>>> +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>> +????? // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>> ???????? FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>> ?????? } >>>> ?? #endif // COMPILER2 >>>> + >>>> +??? // Some defaults for AMD family 17h >>>> +??? if ( cpu_family() == 0x17 ) { >>>> +????? // On family 17h processors use XMM and UnalignedLoadStores for >>>> Array Copy >>>> +????? if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>> +??????? FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>> +????? } >>>> +????? if (supports_sse2() && >>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>> +??????? FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>> +????? } >>>> +#ifdef COMPILER2 >>>> +????? if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>> +??????? FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>> +????? } >>>> +#endif >>>> +??? } >>>> ???? } >>>> >>>> ???? if( is_intel() ) { // Intel cpus specific settings >>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>> @@ -505,6 +505,14 @@ >>>> ???????? result |= CPU_CLMUL; >>>> ?????? if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>> ???????? result |= CPU_RTM; >>>> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>> +?????? result |= CPU_ADX; >>>> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>> +????? result |= CPU_BMI2; >>>> +??? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>> +????? result |= CPU_SHA; >>>> +??? 
if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>> +????? result |= CPU_FMA; >>>> >>>> ?????? // AMD features. >>>> ?????? if (is_amd()) { >>>> @@ -515,19 +523,13 @@ >>>> ?????????? result |= CPU_LZCNT; >>>> ???????? if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>> ?????????? result |= CPU_SSE4A; >>>> +????? if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>> +??????? result |= CPU_HT; >>>> ?????? } >>>> ?????? // Intel features. >>>> ?????? if(is_intel()) { >>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>> -???????? result |= CPU_ADX; >>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>> -??????? result |= CPU_BMI2; >>>> -????? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>> -??????? result |= CPU_SHA; >>>> ???????? if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>> ?????????? result |= CPU_LZCNT; >>>> -????? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>> -??????? result |= CPU_FMA; >>>> ???????? // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>> support for prefetchw >>>> ???????? if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>> ?????????? result |= CPU_3DNOW_PREFETCH; >>>> >>>> Please let me know your comments. >>>> >>>> Thanks for your time. >>>> Rohit >>>> >>>>>> >>>>>> Thanks for taking time to review the code. >>>>>> >>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> @@ -1088,6 +1088,22 @@ >>>>>> ????????? } >>>>>> ????????? FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>> ??????? } >>>>>> +??? if (supports_sha()) { >>>>>> +????? if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>> +??????? FLAG_SET_DEFAULT(UseSHA, true); >>>>>> +????? } >>>>>> +??? } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>>>> UseSHA512Intrinsics) { >>>>>> +????? if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>> +????????? 
!FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>> +??????? warning("SHA instructions are not available on this CPU"); >>>>>> +????? } >>>>>> +????? FLAG_SET_DEFAULT(UseSHA, false); >>>>>> +????? FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>> +????? FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>> +????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>> +??? } >>>>>> >>>>>> ??????? // some defaults for AMD family 15h >>>>>> ??????? if ( cpu_family() == 0x15 ) { >>>>>> @@ -1109,11 +1125,40 @@ >>>>>> ??????? } >>>>>> >>>>>> ??? #ifdef COMPILER2 >>>>>> -??? if (MaxVectorSize > 16) { >>>>>> -????? // Limit vectors size to 16 bytes on current AMD cpus. >>>>>> +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>> +????? // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>> ????????? FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>> ??????? } >>>>>> ??? #endif // COMPILER2 >>>>>> + >>>>>> +??? // Some defaults for AMD family 17h >>>>>> +??? if ( cpu_family() == 0x17 ) { >>>>>> +????? // On family 17h processors use XMM and UnalignedLoadStores >>>>>> for >>>>>> Array Copy >>>>>> +????? if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>> +??????? FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>> +????? } >>>>>> +????? if (supports_sse2() && >>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>>>> +??????? FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>> +????? } >>>>>> +????? if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>> +??????? FLAG_SET_DEFAULT(UseBMI2Instructions, true); >>>>>> +????? } >>>>>> +????? if (UseSHA) { >>>>>> +??????? if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>> +????????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>> +??????? } else if (UseSHA512Intrinsics) { >>>>>> +????????? warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>> functions not available on this CPU."); >>>>>> +????????? 
FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>> +??????? } >>>>>> +????? } >>>>>> +#ifdef COMPILER2 >>>>>> +????? if (supports_sse4_2()) { >>>>>> +??????? if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>> +????????? FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>> +??????? } >>>>>> +????? } >>>>>> +#endif >>>>>> +??? } >>>>>> ????? } >>>>>> >>>>>> ????? if( is_intel() ) { // Intel cpus specific settings >>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> @@ -505,6 +505,14 @@ >>>>>> ????????? result |= CPU_CLMUL; >>>>>> ??????? if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>> ????????? result |= CPU_RTM; >>>>>> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>> +?????? result |= CPU_ADX; >>>>>> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>> +????? result |= CPU_BMI2; >>>>>> +??? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>> +????? result |= CPU_SHA; >>>>>> +??? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>> +????? result |= CPU_FMA; >>>>>> >>>>>> ??????? // AMD features. >>>>>> ??????? if (is_amd()) { >>>>>> @@ -515,19 +523,13 @@ >>>>>> ??????????? result |= CPU_LZCNT; >>>>>> ????????? if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>> ??????????? result |= CPU_SSE4A; >>>>>> +????? if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>> +??????? result |= CPU_HT; >>>>>> ??????? } >>>>>> ??????? // Intel features. >>>>>> ??????? if(is_intel()) { >>>>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>> -???????? result |= CPU_ADX; >>>>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>> -??????? result |= CPU_BMI2; >>>>>> -????? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>> -??????? result |= CPU_SHA; >>>>>> ????????? if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>> ??????????? result |= CPU_LZCNT; >>>>>> -????? 
if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>> -??????? result |= CPU_FMA; >>>>>> ????????? // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>>> support for prefetchw >>>>>> ????????? if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>> ??????????? result |= CPU_3DNOW_PREFETCH; >>>>>> >>>>>> >>>>>> Regards, >>>>>> Rohit >>>>>> >>>>>> >>>>>> >>>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>>>>>> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>>>>>> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi Rohit, >>>>>>>>>> >>>>>>>>>> I think the patch needs updating for jdk10 as I already see a >>>>>>>>>> lot of >>>>>>>>>> logic >>>>>>>>>> around UseSHA in vm_version_x86.cpp. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> David >>>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks David, I will update the patch wrt JDK10 source base, >>>>>>>>> test and >>>>>>>>> resubmit for review. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Rohit >>>>>>>>> >>>>>>>> >>>>>>>> Hi All, >>>>>>>> >>>>>>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>>>>>> 13519:71337910df60), did regression testing using jtreg ($make >>>>>>>> default) and didnt find any regressions. >>>>>>>> >>>>>>>> Can anyone please volunteer to review this patch? which sets >>>>>>>> flag/ISA >>>>>>>> defaults for newer AMD 17h (EPYC) processor? >>>>>>>> >>>>>>>> ************************* Patch **************************** >>>>>>>> >>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>> ?????????? } >>>>>>>> ?????????? FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>> ???????? } >>>>>>>> +??? if (supports_sha()) { >>>>>>>> +????? 
if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>> +??????? FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>> +????? } >>>>>>>> +??? } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>> UseSHA256Intrinsics || >>>>>>>> UseSHA512Intrinsics) { >>>>>>>> +????? if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>> +??????? warning("SHA instructions are not available on this CPU"); >>>>>>>> +????? } >>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>> +??? } >>>>>>>> >>>>>>>> ???????? // some defaults for AMD family 15h >>>>>>>> ???????? if ( cpu_family() == 0x15 ) { >>>>>>>> @@ -1109,11 +1125,43 @@ >>>>>>>> ???????? } >>>>>>>> >>>>>>>> ???? #ifdef COMPILER2 >>>>>>>> -??? if (MaxVectorSize > 16) { >>>>>>>> -????? // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>> +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>> +????? // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>> ?????????? FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>> ???????? } >>>>>>>> ???? #endif // COMPILER2 >>>>>>>> + >>>>>>>> +??? // Some defaults for AMD family 17h >>>>>>>> +??? if ( cpu_family() == 0x17 ) { >>>>>>>> +????? // On family 17h processors use XMM and >>>>>>>> UnalignedLoadStores for >>>>>>>> Array Copy >>>>>>>> +????? if (supports_sse2() && >>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>>> +??????? UseXMMForArrayCopy = true; >>>>>>>> +????? } >>>>>>>> +????? if (supports_sse2() && >>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>> { >>>>>>>> +??????? UseUnalignedLoadStores = true; >>>>>>>> +????? } >>>>>>>> +????? if (supports_bmi2() && >>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>>>> +??????? 
UseBMI2Instructions = true; >>>>>>>> +????? } >>>>>>>> +????? if (MaxVectorSize > 32) { >>>>>>>> +??????? FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>> +????? } >>>>>>>> +????? if (UseSHA) { >>>>>>>> +??????? if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>> +????????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>> +??????? } else if (UseSHA512Intrinsics) { >>>>>>>> +????????? warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>>> functions not available on this CPU."); >>>>>>>> +????????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>> +??????? } >>>>>>>> +????? } >>>>>>>> +#ifdef COMPILER2 >>>>>>>> +????? if (supports_sse4_2()) { >>>>>>>> +??????? if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>> +????????? FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>> +??????? } >>>>>>>> +????? } >>>>>>>> +#endif >>>>>>>> +??? } >>>>>>>> ?????? } >>>>>>>> >>>>>>>> ?????? if( is_intel() ) { // Intel cpus specific settings >>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>> ?????????? result |= CPU_CLMUL; >>>>>>>> ???????? if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>> ?????????? result |= CPU_RTM; >>>>>>>> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>> +?????? result |= CPU_ADX; >>>>>>>> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>> +????? result |= CPU_BMI2; >>>>>>>> +??? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>> +????? result |= CPU_SHA; >>>>>>>> +??? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>> +????? result |= CPU_FMA; >>>>>>>> >>>>>>>> ???????? // AMD features. >>>>>>>> ???????? if (is_amd()) { >>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>> ???????????? result |= CPU_LZCNT; >>>>>>>> ?????????? if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>> ???????????? result |= CPU_SSE4A; >>>>>>>> +????? 
if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>> +??????? result |= CPU_HT; >>>>>>>> ???????? } >>>>>>>> ???????? // Intel features. >>>>>>>> ???????? if(is_intel()) { >>>>>>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>> -???????? result |= CPU_ADX; >>>>>>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>> -??????? result |= CPU_BMI2; >>>>>>>> -????? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>> -??????? result |= CPU_SHA; >>>>>>>> ?????????? if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>> ???????????? result |= CPU_LZCNT; >>>>>>>> -????? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>> -??????? result |= CPU_FMA; >>>>>>>> ?????????? // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>>>>> support for prefetchw >>>>>>>> ?????????? if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>>> ???????????? result |= CPU_3DNOW_PREFETCH; >>>>>>>> >>>>>>>> ************************************************************** >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Rohit >>>>>>>> >>>>>>>>>> >>>>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>> >>>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I would like an volunteer to review this patch (openJDK9) >>>>>>>>>>>>> which >>>>>>>>>>>>> sets >>>>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and >>>>>>>>>>>>> help us >>>>>>>>>>>>> with >>>>>>>>>>>>> the commit process. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Webrev: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Unfortunately patches can not be accepted from systems >>>>>>>>>>>> outside the >>>>>>>>>>>> OpenJDK >>>>>>>>>>>> infrastructure and ... >>>>>>>>>>>> >>>>>>>>>>>>> I have also attached the patch (hg diff -g) for reference. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> ... unfortunately patches tend to get stripped by the mail >>>>>>>>>>>> servers. >>>>>>>>>>>> If >>>>>>>>>>>> the >>>>>>>>>>>> patch is small please include it inline. Otherwise you will >>>>>>>>>>>> need >>>>>>>>>>>> to >>>>>>>>>>>> find >>>>>>>>>>>> an >>>>>>>>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>> 3) I have done regression testing using jtreg ($make >>>>>>>>>>>>> default) and >>>>>>>>>>>>> didnt find any regressions. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Sounds good, but until I see the patch it is hard to comment on >>>>>>>>>>>> testing >>>>>>>>>>>> requirements. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> David >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks David, >>>>>>>>>>> Yes, it's a small patch. >>>>>>>>>>> >>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>>>>>> ??????????? } >>>>>>>>>>> ??????????? FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>> ????????? } >>>>>>>>>>> +??? if (supports_sha()) { >>>>>>>>>>> +????? if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>> +??????? 
FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>> +????? } >>>>>>>>>>> +??? } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>> || >>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>> +????? if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>> +??????? warning("SHA instructions are not available on this >>>>>>>>>>> CPU"); >>>>>>>>>>> +????? } >>>>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>> +??? } >>>>>>>>>>> >>>>>>>>>>> ????????? // some defaults for AMD family 15h >>>>>>>>>>> ????????? if ( cpu_family() == 0x15 ) { >>>>>>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>>>>>> ????????? } >>>>>>>>>>> >>>>>>>>>>> ????? #ifdef COMPILER2 >>>>>>>>>>> -??? if (MaxVectorSize > 16) { >>>>>>>>>>> -????? // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>> +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>> +????? // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>>>>> ??????????? FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>> ????????? } >>>>>>>>>>> ????? #endif // COMPILER2 >>>>>>>>>>> + >>>>>>>>>>> +??? // Some defaults for AMD family 17h >>>>>>>>>>> +??? if ( cpu_family() == 0x17 ) { >>>>>>>>>>> +????? // On family 17h processors use XMM and >>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>> for >>>>>>>>>>> Array Copy >>>>>>>>>>> +????? if (supports_sse2() && >>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>> { >>>>>>>>>>> +??????? UseXMMForArrayCopy = true; >>>>>>>>>>> +????? } >>>>>>>>>>> +????? if (supports_sse2() && >>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>> { >>>>>>>>>>> +??????? 
UseUnalignedLoadStores = true; >>>>>>>>>>> +????? } >>>>>>>>>>> +????? if (supports_bmi2() && >>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>> { >>>>>>>>>>> +??????? UseBMI2Instructions = true; >>>>>>>>>>> +????? } >>>>>>>>>>> +????? if (MaxVectorSize > 32) { >>>>>>>>>>> +??????? FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>> +????? } >>>>>>>>>>> +????? if (UseSHA) { >>>>>>>>>>> +??????? if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>> +????????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>> +??????? } else if (UseSHA512Intrinsics) { >>>>>>>>>>> +????????? warning("Intrinsics for SHA-384 and SHA-512 crypto >>>>>>>>>>> hash >>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>> +????????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>> +??????? } >>>>>>>>>>> +????? } >>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>> +????? if (supports_sse4_2()) { >>>>>>>>>>> +??????? if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>> +????????? FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>> +??????? } >>>>>>>>>>> +????? } >>>>>>>>>>> +#endif >>>>>>>>>>> +??? } >>>>>>>>>>> ??????? } >>>>>>>>>>> >>>>>>>>>>> ??????? if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>> @@ -513,6 +513,16 @@ >>>>>>>>>>> ????????????? result |= CPU_LZCNT; >>>>>>>>>>> ??????????? if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>> ????????????? result |= CPU_SSE4A; >>>>>>>>>>> +????? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>> +??????? result |= CPU_BMI2; >>>>>>>>>>> +????? if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>> +??????? result |= CPU_HT; >>>>>>>>>>> +????? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>> +??????? result |= CPU_ADX; >>>>>>>>>>> +????? 
if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>> +??????? result |= CPU_SHA; >>>>>>>>>>> +????? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>> +??????? result |= CPU_FMA; >>>>>>>>>>> ????????? } >>>>>>>>>>> ????????? // Intel features. >>>>>>>>>>> ????????? if(is_intel()) { >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Rohit >>>>>>>>>>> >>>>>>>>>> >>>>>>> >>>>> >>> From rohitarulraj at gmail.com Tue Sep 5 07:28:08 2017 From: rohitarulraj at gmail.com (Rohit Arul Raj) Date: Tue, 5 Sep 2017 12:58:08 +0530 Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults In-Reply-To: <9c53f889-e58e-33ac-3c05-874779b469d6@oracle.com> References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com> <4d4fe028-ea6a-4f77-ab69-5c2bc752e1f5@oracle.com> <47bc0a90-ed6a-220a-c3d1-b4df2d8bbc74@oracle.com> <9c53f889-e58e-33ac-3c05-874779b469d6@oracle.com> Message-ID: Hello David, Thanks for taking time to review the patch. On Tue, Sep 5, 2017 at 11:32 AM, David Holmes wrote: > > src/cpu/x86/vm/vm_version_x86.hpp > > So this moved the adx/bmi2/sha/fam settings out from being Intel specific to > applying to AMD as well - ok. Have these features always been available in > AMD chips? Just wondering if they might not be valid for some older > processors. ADX,SHA - Support started from AMD17h EPYC. BMI2 - Support started from AMD15h Excavator on-wards. FMA - Support started from AMD15h Piledriver on-wards. For processors not having these feature support, the CPUID bits would be reserved. Since the existing code in Intel was generic and not under any specific CPU family and CPUID bits were the same, I had made these changes generic. > You added: > > 526 if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) > 527 result |= CPU_HT; > > and I'm wondering of there would be any case where this would not be covered > by the earlier: > > 448 if (threads_per_core() > 1) > 449 result |= CPU_HT; > > ? The CPU_HT bit was not getting set with AMD17h. 
We get threads_per_core() as 1, even with HT enabled. I didn't want to disturb the existing check, hence I set it separately. But this check of mine seems to be incorrect, as this bit only indicates whether HT is available, not whether it is enabled or disabled. I will work on setting this HT bit separately. So for now, I will remove this check, update the patch, test it and send it for review again.

Thanks,
Rohit

>
>
> On 5/09/2017 3:43 PM, David Holmes wrote:
>>
>> On 5/09/2017 3:29 PM, Rohit Arul Raj wrote:
>>>
>>> Hello David,
>>>
>>> On Tue, Sep 5, 2017 at 10:31 AM, David Holmes
>>> wrote:
>>>>
>>>> Hi Rohit,
>>>>
>>>> I was unable to apply your patch to latest jdk10/hs/hotspot repo.
>>>>
>>>
>>> I checked out the latest jdk10/hs/hotspot [parent: 13548:1a9c2e07a826]
>>> and was able to apply the patch [epyc-amd17h-defaults-3Sept.patch]
>>> without any issues.
>>> Can you share the error message that you are getting?
>>
>>
>> I was getting this:
>>
>> applying hotspot.patch
>> patching file src/cpu/x86/vm/vm_version_x86.cpp
>> Hunk #1 FAILED at 1108
>> 1 out of 1 hunks FAILED -- saving rejects to file
>> src/cpu/x86/vm/vm_version_x86.cpp.rej
>> patching file src/cpu/x86/vm/vm_version_x86.hpp
>> Hunk #2 FAILED at 522
>> 1 out of 2 hunks FAILED -- saving rejects to file
>> src/cpu/x86/vm/vm_version_x86.hpp.rej
>> abort: patch failed to apply
>>
>> but I started again and this time it applied fine, so not sure what was
>> going on there.
>>
>> Cheers,
>> David
>>
>>> Regards,
>>> Rohit
>>>
>>>
>>>>
>>>>
>>>> On 4/09/2017 2:42 AM, Rohit Arul Raj wrote:
>>>>>
>>>>>
>>>>> Hello Vladimir,
>>>>>
>>>>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi Rohit,
>>>>>>
>>>>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hello Vladimir,
>>>>>>>
>>>>>>>> Changes look good. Only question I have is about MaxVectorSize.
>>>>>>>> It is set > 16 only in presence of AVX:
>>>>>>>>
>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>>>>>>>
>>>>>>>> Does that code works for AMD 17h too?
>>>>>>>
>>>>>>> Thanks for pointing that out. Yes, the code works fine for AMD 17h.
>>>>>>> So I have removed the surplus check for MaxVectorSize from my patch. I
>>>>>>> have updated, re-tested and attached the patch.
>>>>>>
>>>>>> Which check you removed?
>>>>>>
>>>>>
>>>>> My older patch had the below mentioned check which was required on
>>>>> JDK9 where the default MaxVectorSize was 64. It has been handled
>>>>> better in openJDK10. So this check is not required anymore.
>>>>>
>>>>> + // Some defaults for AMD family 17h
>>>>> + if ( cpu_family() == 0x17 ) {
>>>>> ...
>>>>> ...
>>>>> + if (MaxVectorSize > 32) {
>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>> + }
>>>>> ..
>>>>> ..
>>>>> + }
>>>>>
>>>>>>>
>>>>>>> I have one query regarding the setting of UseSHA flag:
>>>>>>>
>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821
>>>>>>>
>>>>>>> AMD 17h has support for SHA.
>>>>>>> AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets
>>>>>>> enabled for it based on the availability of BMI2 and AVX2. Is there
>>>>>>> an underlying reason for this? I have handled this in the patch but just
>>>>>>> wanted to confirm.
>>>>>>
>>>>>> It was done with next changes which use only AVX2 and BMI2
>>>>>> instructions to calculate SHA-256:
>>>>>>
>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974
>>>>>>
>>>>>> I don't know if AMD 15h supports these instructions and can execute
>>>>>> that code. You need to test it.
>>>>>>
>>>>>
>>>>> Ok, got it.
Since AMD15h has support for AVX2 and BMI2 instructions, >>>>> it should work. >>>>> Confirmed by running following sanity tests: >>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java >>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java >>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java >>>>> >>>>> So I have removed those SHA checks from my patch too. >>>>> >>>>> Please find attached updated, re-tested patch. >>>>> >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> @@ -1109,11 +1109,27 @@ >>>>> } >>>>> >>>>> #ifdef COMPILER2 >>>>> - if (MaxVectorSize > 16) { >>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>> } >>>>> #endif // COMPILER2 >>>>> + >>>>> + // Some defaults for AMD family 17h >>>>> + if ( cpu_family() == 0x17 ) { >>>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>>> Array Copy >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>> + } >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>> { >>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>> + } >>>>> +#ifdef COMPILER2 >>>>> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>> + } >>>>> +#endif >>>>> + } >>>>> } >>>>> >>>>> if( is_intel() ) { // Intel cpus specific settings >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> @@ -505,6 +505,14 @@ >>>>> result |= CPU_CLMUL; >>>>> if 
(_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>> result |= CPU_RTM; >>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>> + result |= CPU_ADX; >>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>> + result |= CPU_BMI2; >>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>> + result |= CPU_SHA; >>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>> + result |= CPU_FMA; >>>>> >>>>> // AMD features. >>>>> if (is_amd()) { >>>>> @@ -515,19 +523,13 @@ >>>>> result |= CPU_LZCNT; >>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>> result |= CPU_SSE4A; >>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>> + result |= CPU_HT; >>>>> } >>>>> // Intel features. >>>>> if(is_intel()) { >>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>> - result |= CPU_ADX; >>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>> - result |= CPU_BMI2; >>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>> - result |= CPU_SHA; >>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>> result |= CPU_LZCNT; >>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>> - result |= CPU_FMA; >>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>> support for prefetchw >>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>> result |= CPU_3DNOW_PREFETCH; >>>>> >>>>> Please let me know your comments. >>>>> >>>>> Thanks for your time. >>>>> Rohit >>>>> >>>>>>> >>>>>>> Thanks for taking time to review the code. 
>>>>>>> >>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>> } >>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>> } >>>>>>> + if (supports_sha()) { >>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>> + } >>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>>>>> UseSHA512Intrinsics) { >>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>> + warning("SHA instructions are not available on this CPU"); >>>>>>> + } >>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>> + } >>>>>>> >>>>>>> // some defaults for AMD family 15h >>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>> @@ -1109,11 +1125,40 @@ >>>>>>> } >>>>>>> >>>>>>> #ifdef COMPILER2 >>>>>>> - if (MaxVectorSize > 16) { >>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>> } >>>>>>> #endif // COMPILER2 >>>>>>> + >>>>>>> + // Some defaults for AMD family 17h >>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>> for >>>>>>> Array Copy >>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>> + } >>>>>>> + if (supports_sse2() && >>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>> + } >>>>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>>> + FLAG_SET_DEFAULT(UseBMI2Instructions, true); >>>>>>> + } >>>>>>> + if (UseSHA) { >>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>> functions not available on this CPU."); >>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>> + } >>>>>>> + } >>>>>>> +#ifdef COMPILER2 >>>>>>> + if (supports_sse4_2()) { >>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>> + } >>>>>>> + } >>>>>>> +#endif >>>>>>> + } >>>>>>> } >>>>>>> >>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> @@ -505,6 +505,14 @@ >>>>>>> result |= CPU_CLMUL; >>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>> result |= CPU_RTM; >>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>> + result |= CPU_ADX; >>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>> + result |= CPU_BMI2; >>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>> + result |= CPU_SHA; >>>>>>> + if 
(_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>> + result |= CPU_FMA; >>>>>>> >>>>>>> // AMD features. >>>>>>> if (is_amd()) { >>>>>>> @@ -515,19 +523,13 @@ >>>>>>> result |= CPU_LZCNT; >>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>> result |= CPU_SSE4A; >>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>> + result |= CPU_HT; >>>>>>> } >>>>>>> // Intel features. >>>>>>> if(is_intel()) { >>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>> - result |= CPU_ADX; >>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>> - result |= CPU_BMI2; >>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>> - result |= CPU_SHA; >>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>> result |= CPU_LZCNT; >>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>> - result |= CPU_FMA; >>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>>>> support for prefetchw >>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>> >>>>>>> >>>>>>> Regards, >>>>>>> Rohit >>>>>>> >>>>>>> >>>>>>> >>>>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>>>>>>> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi Rohit, >>>>>>>>>>> >>>>>>>>>>> I think the patch needs updating for jdk10 as I already see a lot >>>>>>>>>>> of >>>>>>>>>>> logic >>>>>>>>>>> around UseSHA in vm_version_x86.cpp. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> David >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks David, I will update the patch wrt JDK10 source base, test >>>>>>>>>> and >>>>>>>>>> resubmit for review. 
>>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Rohit >>>>>>>>>> >>>>>>>>> >>>>>>>>> Hi All, >>>>>>>>> >>>>>>>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>>>>>>> 13519:71337910df60), did regression testing using jtreg ($make >>>>>>>>> default) and didnt find any regressions. >>>>>>>>> >>>>>>>>> Can anyone please volunteer to review this patch which sets >>>>>>>>> flag/ISA >>>>>>>>> defaults for newer AMD 17h (EPYC) processor? >>>>>>>>> >>>>>>>>> ************************* Patch **************************** >>>>>>>>> >>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>> } >>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>> } >>>>>>>>> + if (supports_sha()) { >>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>> + } >>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics >>>>>>>>> || >>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>> + warning("SHA instructions are not available on this CPU"); >>>>>>>>> + } >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>> + } >>>>>>>>> >>>>>>>>> // some defaults for AMD family 15h >>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>> @@ -1109,11 +1125,43 @@ >>>>>>>>> } >>>>>>>>> >>>>>>>>> #ifdef COMPILER2 >>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. 
>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>> } >>>>>>>>> #endif // COMPILER2 >>>>>>>>> + >>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>>> for >>>>>>>>> Array Copy >>>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>> { >>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>> + } >>>>>>>>> + if (supports_sse2() && >>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>> { >>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>> + } >>>>>>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>> { >>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>> + } >>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>> + } >>>>>>>>> + if (UseSHA) { >>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>>>> functions not available on this CPU."); >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> +#ifdef COMPILER2 >>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> +#endif >>>>>>>>> + } >>>>>>>>> } >>>>>>>>> >>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>> result |= CPU_CLMUL; >>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>> 
result |= CPU_RTM; >>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>> + result |= CPU_ADX; >>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>> + result |= CPU_BMI2; >>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>> + result |= CPU_SHA; >>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>> + result |= CPU_FMA; >>>>>>>>> >>>>>>>>> // AMD features. >>>>>>>>> if (is_amd()) { >>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>> result |= CPU_LZCNT; >>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>> result |= CPU_SSE4A; >>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>> + result |= CPU_HT; >>>>>>>>> } >>>>>>>>> // Intel features. >>>>>>>>> if(is_intel()) { >>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>> - result |= CPU_ADX; >>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>> - result |= CPU_BMI2; >>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>> - result |= CPU_SHA; >>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>>> result |= CPU_LZCNT; >>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>> - result |= CPU_FMA; >>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>>>>>> support for prefetchw >>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>> >>>>>>>>> ************************************************************** >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Rohit >>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>> >>>>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I would like an volunteer to review this patch (openJDK9) >>>>>>>>>>>>>> which >>>>>>>>>>>>>> sets >>>>>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help >>>>>>>>>>>>>> us >>>>>>>>>>>>>> with >>>>>>>>>>>>>> the commit process. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Unfortunately patches can not be accepted from systems outside >>>>>>>>>>>>> the >>>>>>>>>>>>> OpenJDK >>>>>>>>>>>>> infrastructure and ... >>>>>>>>>>>>> >>>>>>>>>>>>>> I have also attached the patch (hg diff -g) for reference. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> ... unfortunately patches tend to get stripped by the mail >>>>>>>>>>>>> servers. >>>>>>>>>>>>> If >>>>>>>>>>>>> the >>>>>>>>>>>>> patch is small please include it inline. Otherwise you will >>>>>>>>>>>>> need >>>>>>>>>>>>> to >>>>>>>>>>>>> find >>>>>>>>>>>>> an >>>>>>>>>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>> 3) I have done regression testing using jtreg ($make default) >>>>>>>>>>>>>> and >>>>>>>>>>>>>> didnt find any regressions. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Sounds good, but until I see the patch it is hard to comment on >>>>>>>>>>>>> testing >>>>>>>>>>>>> requirements. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> David >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks David, >>>>>>>>>>>> Yes, it's a small patch. 
>>>>>>>>>>>> >>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>>>>>>> } >>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>> } >>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>> + } >>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>> || >>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>>>> CPU"); >>>>>>>>>>>> + } >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>> + } >>>>>>>>>>>> >>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>> } >>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>> + >>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>> for >>>>>>>>>>>> Array Copy >>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>> { >>>>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>>>> + } >>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>> { >>>>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>>>> + } >>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>>> { >>>>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>>>> + } >>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>> + } >>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto >>>>>>>>>>>> hash >>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>> + } >>>>>>>>>>>> + } >>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>> + } >>>>>>>>>>>> + } >>>>>>>>>>>> +#endif >>>>>>>>>>>> + } >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> @@ -513,6 +513,16 @@ >>>>>>>>>>>> result |= 
CPU_LZCNT;
>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>>>>>>> result |= CPU_SSE4A;
>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>>>>>>> + result |= CPU_BMI2;
>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>>>>>>> + result |= CPU_HT;
>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>>>>>>> + result |= CPU_ADX;
>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>>>>>>> + result |= CPU_SHA;
>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>>>>>>> + result |= CPU_FMA;
>>>>>>>>>>>> }
>>>>>>>>>>>> // Intel features.
>>>>>>>>>>>> if(is_intel()) {
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Rohit
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>

From erik.osterlund at oracle.com Tue Sep 5 08:04:52 2017
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Tue, 5 Sep 2017 10:04:52 +0200
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <3a6fbae3-cddb-6ac0-890d-da4b33308b5e@redhat.com>
References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <6f623116-8213-1523-ba37-fb4ce17c4afa@oracle.com> <59AD21D6.8040305@oracle.com> <59AD2A64.3070507@oracle.com> <59AD55AC.4030105@oracle.com> <3a6fbae3-cddb-6ac0-890d-da4b33308b5e@redhat.com>
Message-ID: <59AE5AA4.1070803@oracle.com>

Hi Andrew,

On 2017-09-04 18:05, Andrew Haley wrote:
> Hi,
>
> On 04/09/17 14:31, Erik Österlund wrote:
>
>> On 2017-09-04 12:41, Andrew Haley wrote:
>>> On 04/09/17 11:26, Erik Österlund wrote:
>>>> 1) I want evidence for this claim. Can you get leading and trailing dmb
>>>> sy (rather than dmb ish) for atomic operations on ARMv7?
>>> I hope not. There is no reason for us to want such a thing in HotSpot.
>>> But even if we did want such a thing, we could drop down to asm: the
>>> point is the usual cases, not weird corner cases.
>> So we can not emit any fencing we want with GCC intrinsics, let alone
>> the fencing we already have and rely on today on ARMv7.
> There are corner cases, for which asm can be used, yes.

Yes, and unfortunately the corner cases are precisely what we want.

>> The discussion about whether we should relax our ARMv7 fencing or
>> not is a different discussion, and is unrelated to the claim that we
>> can get any fencing we want with GCC intrinsics.
> I accept the point in principle, but I suggest it's a bad example: I
> do not believe that we want DMB SY.

I am glad we at least agree about the point in principle. :)
If you are willing to make a case for relaxing dmb sy, please feel free to do so in another RFE. Note however that the choice was anything but accidental. It was a conscious choice. I would prefer not to derail this thread more than we already have, if that is okay.

>> The point is that we can not control the
>> fencing arbitrarily, let alone even get the fencing we have today.
> Arbitrarily, no. But I guess you'd expect me to point out that
> argument can be flipped on its head: if we'd used intrinsics rather
> than asms the mistake of using DMB SY would have been averted. You
> can look at this issue in (at least) two ways. :-)

I accept the point that GCC may get something wrong, and equally that we may get something wrong where GCC does not. Please note however that this does not mean that using GCC intrinsics will allow us to blindly trust GCC. The fencing done in our runtime must be reflected in our JIT. So anything we have messed up in our fencing that is fixed in GCC but not in our code generation would still end up biting us, as long as we have a JIT that must be tightly bound to our runtime. Conversely, the problem might get worse if different versions of GCC have some fix and others do not. And clang, which uses the same intrinsics, might have subtly different behaviour.
And our memory model itself is subtly different from their memory model, making the two incompatible. For example, our atomics typically conservatively guarantee bidirectional full fencing, while theirs do not.

>
>>>> 2) Even if you could and the compiler happens to generate that - we can
>>>> not rely on it because there is no contract to the compiler what fence
>>>> instructions it elects to use. The only contract the compiler needs to
>>>> abide to is how atomic C++ operations interact with other C++
>>>> operations. And we do not want the underlying fencing to silently change
>>>> when performing compiler upgrades.
>>> There is no way that GCC writers would break ABI compatibility in such a
>>> fundamental way. There would be a firestorm. I know this because even
>>> if no-one else started the fire, I would. I am a GCC author.
>> Thank you for your reassurance. I appreciate that you take ABI
>> compatibility seriously. Yet over the years, the bindings have changed
>> over time as our understanding of implications of the memory model has
>> evolved - especially when mixing stronger and weaker accesses on the
>> same fields.
> Absolutely so, yes, and IMVHO such code should be taken from HotSpot
> and quietly put out of its misery. Even if the program is correct
> it'll require a lot of analysis, and pity the poor programmer who
> comes across it in a few years' time.

Analysis that still has to be done.

>> Even 2017, there are still papers published about how
>> seq_cst mixed with weaker memory ordering needs fixing in the bindings
>> (cf. "Repairing sequential consistency in C/C++11", PLDI'17), resulting
>> in new bindings with both leading sync and trailing sync conventions
>> being proposed (the choice of convention is up to compiler writers).
> Sure, but there's no way that GCC (or any other serious compiler) is
> going to make changes in a way that isn't at least compatible with
> existing binaries.
> Power PC has its problems, mostly due to being
> rather old, and I'm not at all surprised to hear that mistakes have
> been made, given that the language used in the processor definition
> and the language used in the C++ language standard don't map onto each
> other in an obvious way.

Perhaps. But it is nevertheless an abstraction crime to rely on that. A reliance I would personally prefer to avoid if possible (and it is).

> But none of this extends to Linux/x86, which has a straightforward
> implementation of all of this stuff.

I must agree here. The x86 ISA does not leave a whole lot to the imagination of the implementors of the intrinsics. It's gonna be a locked atomic instruction of some sort. There is almost zero risk of incompatible ABIs. So that is an abstraction crime I would not object to if it is desired down the road.

>> I do not feel confident we can rely on these bindings never
>> changing. As there is no contract or explicit ABI, compiler writers
>> are free to do whatever that is consistent within the boundaries of
>> C++ code and the C++ memory model. The actual ABI is hidden from
>> that contract. And I would not happily embed reliance on
>> intentionally undocumented, implicit, unofficial ABIs that are known
>> to have different fencing conventions that may or may not be
>> compatible with what our generated code requires. Generating the
>> code, disassembling, and then assuming whatever binding was observed
>> in the disassembly is a binding contract, is not a reliable
>> approach.
> I suppose I can understand this difference in opinion because my view
> of GCC is very different from yours: to me it's a white box, not a
> black box, and I certainly wouldn't take the approach of just looking
> at the generated code.

I understand where you are coming from.
The reason it is a black box to me though is primarily because the lack of explicit contracts with the compiler forces me to think of it as a black box, unless I want to perform an abstraction crime and start relying on unofficial implementation details, knowing there are incompatible ABI proposals floating around in papers to this date, for compiler writers to choose between.

>
>> If we require a specific fence, then I do not see why we would not
>> simply emit this specific fence that we require explicitly, rather than
>> insisting on using some intrinsic and hoping it will emit that exact
>> fence that we rely on through some implicit, undocumented, unofficial
>> ABI, that may silently change over time. I fail to see the attraction.
> That one is easy: if you tell the compiler what you're doing rather
> than hiding it inside an asm, the compiler can generate better code.
> The resulting program is also much simpler.

Okay. For me personally, safety and correctness are more important than optimal code generation. But I see your point. Your intrinsic will (on x86) generate lock addl with immediate operands on invocations to Atomic::add with constant addends where we will emit lock xaddl, wasting a register with current bindings. For me that seems like an unimportant premature optimization, but I hear it has some attraction for some people (for reasons I do not understand).

> Also, you avoid the risks inherent in writing inline asms: only
> recently have the x86/Linux asms been corrected to add a memory
> clobber. This is an extremely serious flaw, and it's been around for
> a very long while. We're talking about risk, yet your risk is of a
> rather theoretical nature, rather than that one which has already
> happened. We can, of course, argue that we are where we are, and that
> bug is fixed, so it no longer matters, but it does IMO point to where
> the real risk in using inline asm lies.

Point taken - the risk goes two ways.
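
To make the trade-off between the two styles concrete, here is a minimal sketch that is not taken from the webrev or from HotSpot. The helper names (add_using_asm, add_using_builtin, inc_using_builtin) are hypothetical and exist only for illustration. The first helper pins the instruction with inline asm and carries the "memory" clobber mentioned above; the other two use the GCC/clang __atomic builtins and leave instruction selection to the compiler. It assumes a GCC-compatible compiler, and the asm path is only compiled on x86.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (illustration only): atomic add through hand-written
// inline asm. "lock xaddl" writes the previous value of *dest back into the
// register operand, so a register is occupied even if the caller ignores
// the result.
inline int32_t add_using_asm(int32_t add_value, volatile int32_t* dest) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  int32_t old_value = add_value;
  __asm__ volatile("lock xaddl %0, %1"
                   : "+r"(old_value), "+m"(*dest)
                   :
                   : "cc", "memory");  // "memory": compiler must not move
                                       // other memory accesses across the asm
  return old_value + add_value;        // new value, like an add-and-fetch
#else
  // Assumption: on other targets this sketch simply falls back to the builtin.
  return __atomic_add_fetch(dest, add_value, __ATOMIC_SEQ_CST);
#endif
}

// Hypothetical helper: the same operation expressed with a compiler builtin.
inline int32_t add_using_builtin(int32_t add_value, volatile int32_t* dest) {
  return __atomic_add_fetch(dest, add_value, __ATOMIC_SEQ_CST);
}

// Hypothetical helper: increment with the result discarded. Because the
// compiler can see that the old value is unused and the addend is constant,
// it is free to pick a cheaper encoding than xadd.
inline void inc_using_builtin(volatile int32_t* dest) {
  (void)__atomic_fetch_add(dest, 1, __ATOMIC_SEQ_CST);
}
```

Which instruction the builtin version ends up using, and which fences it implies on other architectures, is decided by the compiler; that is exactly the property the two sides of this thread weigh differently.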
> However, having said all of that, let me be clear: while I do not > believe that the inline asms for each platform are the best way of > doing this, to change them at this point would be unduly disruptive. > I am not suggesting that they should be changed now. I am very > strongly suggesting that they should be changed in the future, and > that we should move to using intrinsics. Perhaps we can evaluate that on a case by case basis. As for x86, I am not as opposed to moving to GCC intrinsics. I think it seems like a premature optimization and abstraction crime, but I will not oppose it, because it can't do much harm in practice (as opposed to theory) I guess. As for this current RFE, the only voice I have heard against removing all Atomic::inc specializations is Kim. He thought it would be nice to keep the micro optimization that turned Atomic::inc bindings to lock add instead of lock xadd on x86. Perhaps instead we nuke Atomic::inc/dec specializations and replace Atomic::add with GCC intrinsics where they are available. As mentioned previously, the compiler then automatically detects that it can use lock add instead for Atomic::inc, as well as in a few more cases where Atomic::add is used with constants that can be embedded into immediate values and the result is not used. If we do that, is anyone unhappy with the idea of nuking Atomic::inc/dec specializations altogether? Thanks, /Erik From thomas.stuefe at gmail.com Tue Sep 5 09:28:57 2017 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Tue, 5 Sep 2017 11:28:57 +0200 Subject: RFR(xs): 8187028: [aix] Real thread stack size may be up to 64K smaller than requested one In-Reply-To: References: Message-ID: Hi Guys, New webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8187028-aix-Real-thread-stack-size-may-be-up-to-64K-smaller-than-requested-one/webrev.02/webrev/ Nothing changed in the fix itself, but I added a fix to a whitebox test which broke due to the change.
runtime/whitebox/WBStackSize.java tests that the actual stack size a thread receives correlates closely to the requested stack size, and that test had to be loosened for AIX. Kind Regards, Thomas On Thu, Aug 31, 2017 at 12:08 PM, Thomas St?fe wrote: > Hi all, > > please review this change: > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8187028 > > change: > http://cr.openjdk.java.net/~stuefe/webrevs/8187028-aix- > Real-thread-stack-size-may-be-up-to-64K-smaller-than- > requested-one/webrev.00/webrev/ > > The issue is that on AIX, pthread library seems to have a bug where it > sometimes gives us less thread stack space than we requested (a variable > amount, but seems to be 0..64K). This may cause intermittent stack overflow > errors if the stacks are very small to begin with. > > The workaround is to add 64K to the requested stack size to account for > the fact that the OS may give us up to 64K less stack. > > Also, improved logging. > > Thanks, Thomas > From erik.osterlund at oracle.com Tue Sep 5 09:55:59 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Tue, 5 Sep 2017 11:55:59 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <14533625-0A6E-45FE-8EF2-65CE40931D94@oracle.com> References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> <59A93B53.9010505@oracle.com> <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com> <59ACFD76.3000606@oracle.com> <8af30028-5ab1-5623-21c7-f6e2ce2ab8cb@oracle.com> <14533625-0A6E-45FE-8EF2-65CE40931D94@oracle.com> Message-ID: <59AE74AF.70308@oracle.com> Hi Kim, On 2017-09-05 02:38, Kim Barrett wrote: >> On Sep 4, 2017, at 10:59 PM, David Holmes wrote: >> >> Hi Erik, >> >> On 4/09/2017 5:15 PM, Erik ?sterlund wrote: >>> Hi David, >>> On 2017-09-04 03:24, David Holmes wrote: >>>> Hi Erik, >>>> >>>> On 1/09/2017 8:49 PM, Erik ?sterlund wrote: >>>>> Hi David, 
>>>>> >>>>> The shared structure for all operations is the following: >>>>> >>>>> An Atomic::something call creates a SomethingImpl function object that performs some basic type checking and then forwards the call straight to a PlatformSomething function object. This PlatformSomething object could decide to do anything. But to make life easier, it may inherit from a shared SomethingHelper function object with CRTP that calls back into the PlatformSomething function object to emit inline assembly. >>>> Right, but! Lets look at some details. >>>> >>>> Atomic::add >>>> AddImpl >>>> PlatformAdd >>>> FetchAndAdd >>>> AddAndFetch >>>> add_using_helper >>>> >>>> Atomic::cmpxchg >>>> CmpxchgImpl >>>> PlatformCmpxchg >>>> cmpxchg_using_helper >>>> >>>> Atomic::inc >>>> IncImpl >>>> PlatformInc >>>> IncUsingConstant >>>> >>>> Why is it that the simplest operation (inc/dec) has the most complex platform template definition? Why do we need Adjustment? You previously said "Adjustment represents the increment/decrement value as an IntegralConstant - your template friend for passing around a constant with both a specified type and value in templates". But add passes around values and doesn't need this. Further inc/dec don't need to pass anything around anywhere - inc adds 1, dec subtracts 1! This "1" does not need to appear anywhere in the API or get passed across layers - the only place this "1" becomes evident is in the actual platform asm that does the logic of "add 1" or "subtract 1". >>>> >>>> My understanding from previous discussions is that much of the template machinations was to deal with type management for "dest" and the values being passed around. But here, for inc/dec there are no values being passed so we don't have to make "dest" type-compatible with any value. >>> Dealing with different types being passed in is one part of the problem - a problem that almost all operations seems to have. But Atomic::add and inc/dec have more problems to deal with. 
>>> The Atomic::add operation has two more problems that cmpxchg does not have. >>> 1) It needs to scale pointer arithmetic. So if you have a P* and you add it by 2, then you really add the underlying value by 2 * sizeof(P), and the scaled addend needs to be of the right type - the type of the destination for integral types and ptrdiff_t for pointers. This is similar semantics to ++pointer. >> I'll address this below - but yes I overlooked this aspect. >> >>> 2) It connects backends with different semantics - either fetch_and_add or add_and_fetch to a common public interface with add_and_fetch semantics. >> Not at all clear why this has to manifest in the upper/middle layers instead of being handled by the actual lowest-layer ?? >> >>> This is the reason that Atomic::add might appear more complicated than Atomic::cmpxchg. Because Atomic::cmpxchg only had the different type problems to deal with - no pointer arithmetics. >>> The reason why Atomic::inc/dec looks more complicated than Atomic::add is that it needs to preserve the pointer arithmetic as constants rather than values, because the scaled addend is embedded in the inline assembly as immediate values. Therefore it passes around an IntegralConstant that embeds both the type and size of the addend. And it is not just 1/-1. For integral destinations the constant used is 1/-1 of the type stored at the destination. For pointers the constant is ptrdiff_t with a value representing the size of the element pointed to. >> This is insanely complicated (I think that counts as 'accidental complexity' per Andrew's comment ;-) ). Pointer arithmetic is a basic/fundamental part of C/C++, yet this template stuff has to jump through multiple inverted hoops to do something the language "just does"! All this complexity to manage a conversion addend -> addend * sizeof(*dest) ?? 
>> >> And the fact that inc/dec are simpler than add, yet result in far more complicated templates because the simpler addend is a constant, is just as unfathomable to me! >> >>> Having said that - I am not opposed to simply removing the specializations of inc/dec if we are scared of the complexity of passing this constant to the platform layer. After running a bunch of benchmarks over the weekend, it showed no significant regressions after removal. Now of course that might not tell the full story - it could have missed that some critical operation in the JVM takes longer. But I would be very surprised if that was the case. >> I can imagine we use an "add immediate" form for inc/dec of 1, do we actually use that for other values? I would expect inc_ptr/dec_ptr to always translate to add_ptr, with no special case for when ptr is char* and so we only add/sub 1. ?? > [Delurking briefly.] > > Sorry I've been silent until now in this discussion, but I'm on > vacation, and won't have time until next week to really pay attention. > But this seems to have gone somewhat awry, so I'm popping in briefly. > > David objected to some of the complexity, apparently based on > forgetting the scaling for pointer arithmetic. That seems to have been > cleared up. > > However, David (and I think others) are also objecting to other points > of complexity, and I think I agree. I was working on an approach that > was structurally similar to Atomic::add, but using IntegralContant to > retain access to the literal value, for those platforms that benefit > from that. Erik's proposal also uses IntegralContant for that purpose, > but (from a quick skim) I think got that part wrong, and that is a > source of additional complexity. Erik might want to re-read my handoff > email to him. I don't know whether that approach would satisfy folks > though. Unfortunately that approach did not work. 
It passed IntegralConstant as an rvalue to operator(), which is illegal as IntegralConstant is an AllStatic class. > I was also looking into the possibility that more platforms might be > able to just use Atomic::add to implement Atomic::inc (and maybe > Atomic::dec), without any change to the generated code for inc/dec. > This would be accomplished by improving the inline assembler for > Atomic::add, using an "n" constraint for the addend when appropriate. > In some cases this perhaps might be done by providing it as an > alternative (e.g. using an "nr" constraint). I hadn't gotten > very far in exploring that possibility though, so it might not go > anywhere. > > And I agree the existing barriers in inc/dec for powerpc (both aix and > linux) look contrary to the documented requirements. I am glad we agree. > I'm a little bit reluctant to just give up on per-platform > micro-optimized inc/dec and simply transform those operations into > corresponding add operations. Requiring an additional register and > its initialization for some platforms seems like poor form. If we insist on having these micro optimizations, I am not opposed to using GCC intrinsics for Atomic::add on selected x86 GCC platforms (where the risk of future ABI breakage is nearly zero). If Atomic::inc/dec simply calls Atomic::add, then those GCC intrinsics will be able to micro optimize as you desire to the optimal lock add encoding. Thanks, /Erik > If this discussion hasn't reached consensus by next week, I'll start > working with Erik then to get us there. From glaubitz at physik.fu-berlin.de Tue Sep 5 10:00:46 2017 From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz) Date: Tue, 5 Sep 2017 12:00:46 +0200 Subject: [RFR]: 8187227: __m68k_cmpxchg() is not being used correctly Message-ID: Hi! Please review the changeset in [1] which fixes the incorrect use of __m68k_cmpxchg().
The description for this change is [2]: ============================================================= On m68k, linux-zero has platform-specific implementations for compare_and_swap(), add_and_fetch() and lock_test_and_set(). These functions all use __m68k_cmpxchg(), which is basically a wrapper around the m68k assembly instruction CAS. Currently, all three functions make incorrect assumptions about how CAS and its wrapper actually work and consequently use __m68k_cmpxchg() incorrectly. The source code comment for __m68k_cmpxchg() states: * Atomically store newval in *ptr if *ptr is equal to oldval for user space. * Returns newval on success and oldval if no exchange happened. * This implementation is processor specific and works on * 68020 68030 68040 and 68060. However, looking at the documentation for the CAS instruction on m68k [1] and the implementation of __m68k_cmpxchg(), this is actually not how the function works. It does not return the updated value on a successful exchange but rather the contents of the compare operand, i.e. oldval. If no exchange happened, it will actually return the contents of the memory location. newval is never returned, and consequently testing for "newval" in compare_and_swap(), add_and_fetch() and lock_test_and_set() is a bug. I have prepared a patch that fixes this issue by making correct use of __m68k_cmpxchg() in compare_and_swap(), add_and_fetch() and lock_test_and_set(). This patch has been tested to work on Debian m68k. > [1] http://68k.hax.com/CAS ============================================================= > [1] http://cr.openjdk.java.net/~glaubitz/8187227/webrev.01/ > [2] https://bugs.openjdk.java.net/browse/JDK-8187227 -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaubitz at debian.org `.
`' Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 From erik.osterlund at oracle.com Tue Sep 5 10:07:26 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Tue, 5 Sep 2017 12:07:26 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <8af30028-5ab1-5623-21c7-f6e2ce2ab8cb@oracle.com> References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> <59A93B53.9010505@oracle.com> <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com> <59ACFD76.3000606@oracle.com> <8af30028-5ab1-5623-21c7-f6e2ce2ab8cb@oracle.com> Message-ID: <59AE775E.1070503@oracle.com> Hi David, On 2017-09-04 23:59, David Holmes wrote: > Hi Erik, > > On 4/09/2017 5:15 PM, Erik ?sterlund wrote: >> Hi David, >> >> On 2017-09-04 03:24, David Holmes wrote: >>> Hi Erik, >>> >>> On 1/09/2017 8:49 PM, Erik ?sterlund wrote: >>>> Hi David, >>>> >>>> The shared structure for all operations is the following: >>>> >>>> An Atomic::something call creates a SomethingImpl function object >>>> that performs some basic type checking and then forwards the call >>>> straight to a PlatformSomething function object. This >>>> PlatformSomething object could decide to do anything. But to make >>>> life easier, it may inherit from a shared SomethingHelper function >>>> object with CRTP that calls back into the PlatformSomething >>>> function object to emit inline assembly. >>> >>> Right, but! Lets look at some details. >>> >>> Atomic::add >>> AddImpl >>> PlatformAdd >>> FetchAndAdd >>> AddAndFetch >>> add_using_helper >>> >>> Atomic::cmpxchg >>> CmpxchgImpl >>> PlatformCmpxchg >>> cmpxchg_using_helper >>> >>> Atomic::inc >>> IncImpl >>> PlatformInc >>> IncUsingConstant >>> >>> Why is it that the simplest operation (inc/dec) has the most complex >>> platform template definition? 
Why do we need Adjustment? You >>> previously said "Adjustment represents the increment/decrement value >>> as an IntegralConstant - your template friend for passing around a >>> constant with both a specified type and value in templates". But add >>> passes around values and doesn't need this. Further inc/dec don't >>> need to pass anything around anywhere - inc adds 1, dec subtracts 1! >>> This "1" does not need to appear anywhere in the API or get passed >>> across layers - the only place this "1" becomes evident is in the >>> actual platform asm that does the logic of "add 1" or "subtract 1". >>> >>> My understanding from previous discussions is that much of the >>> template machinations was to deal with type management for "dest" >>> and the values being passed around. But here, for inc/dec there are >>> no values being passed so we don't have to make "dest" >>> type-compatible with any value. >> >> Dealing with different types being passed in is one part of the >> problem - a problem that almost all operations seems to have. But >> Atomic::add and inc/dec have more problems to deal with. >> >> The Atomic::add operation has two more problems that cmpxchg does not >> have. >> 1) It needs to scale pointer arithmetic. So if you have a P* and you >> add it by 2, then you really add the underlying value by 2 * >> sizeof(P), and the scaled addend needs to be of the right type - the >> type of the destination for integral types and ptrdiff_t for >> pointers. This is similar semantics to ++pointer. > > I'll address this below - but yes I overlooked this aspect. > >> 2) It connects backends with different semantics - either >> fetch_and_add or add_and_fetch to a common public interface with >> add_and_fetch semantics. > > Not at all clear why this has to manifest in the upper/middle layers > instead of being handled by the actual lowest-layer ?? It could have been addressed in the lowest layer indeed. 
I suppose Kim found it nicer to do that on a higher level while you find it nicer to do it on a lower level. I have no opinion here. > >> This is the reason that Atomic::add might appear more complicated >> than Atomic::cmpxchg. Because Atomic::cmpxchg only had the different >> type problems to deal with - no pointer arithmetics. >> >> The reason why Atomic::inc/dec looks more complicated than >> Atomic::add is that it needs to preserve the pointer arithmetic as >> constants rather than values, because the scaled addend is embedded >> in the inline assembly as immediate values. Therefore it passes >> around an IntegralConstant that embeds both the type and size of the >> addend. And it is not just 1/-1. For integral destinations the >> constant used is 1/-1 of the type stored at the destination. For >> pointers the constant is ptrdiff_t with a value representing the size >> of the element pointed to. > > This is insanely complicated (I think that counts as 'accidental > complexity' per Andrew's comment ;-) ). Pointer arithmetic is a > basic/fundamental part of C/C++, yet this template stuff has to jump > through multiple inverted hoops to do something the language "just > does"! All this complexity to manage a conversion addend -> addend * > sizeof(*dest) ?? Okay. > And the fact that inc/dec are simpler than add, yet result in far more > complicated templates because the simpler addend is a constant, is > just as unfathomable to me! My latest proposal is to nuke the Atomic::inc/dec specializations and make it call Atomic::add. Any objections on that? It is arguably simpler, and then we can leave the complexity discussion behind. >> Having said that - I am not opposed to simply removing the >> specializations of inc/dec if we are scared of the complexity of >> passing this constant to the platform layer. After running a bunch of >> benchmarks over the weekend, it showed no significant regressions >> after removal. 
Now of course that might not tell the full story - it >> could have missed that some critical operation in the JVM takes >> longer. But I would be very surprised if that was the case. > > I can imagine we use an "add immediate" form for inc/dec of 1, do we > actually use that for other values? I would expect inc_ptr/dec_ptr to > always translate to add_ptr, with no special case for when ptr is > char* and so we only add/sub 1. ?? Yes we currently only inc/sub by 1. Thanks, /Erik > Thanks, > David > >> Thanks, >> /Erik >> >>> >>> Cheers, >>> David >>> ----- >>> >>>> Hope this explanation helps understanding the intended structure of >>>> this work. >>>> >>>> Thanks, >>>> /Erik >>>> >>>> On 2017-09-01 12:34, David Holmes wrote: >>>>> Hi Erik, >>>>> >>>>> I just wanted to add that I would expect the cmpxchg, add and inc, >>>>> Atomic API's to all require similar basic structure for >>>>> manipulating types/values etc, yet all three seem to have quite >>>>> different structures that I find very confusing. I'm still at a >>>>> loss to fathom the CRTP and the hoops we seemingly have to jump >>>>> through just to add or subtract 1!!! >>>>> >>>>> Cheers, >>>>> David >>>>> >>>>> On 1/09/2017 7:29 PM, Erik ?sterlund wrote: >>>>>> Hi David, >>>>>> >>>>>> On 2017-09-01 02:49, David Holmes wrote: >>>>>>> Hi Erik, >>>>>>> >>>>>>> Sorry but this one is really losing me. >>>>>>> >>>>>>> What is the role of Adjustment ?? >>>>>> >>>>>> Adjustment represents the increment/decrement value as an >>>>>> IntegralConstant - your template friend for passing around a >>>>>> constant with both a specified type and value in templates. The >>>>>> type of the increment/decrement is the type of the destination >>>>>> when the destination is an integral type, otherwise if it is a >>>>>> pointer type, the increment/decrement type is ptrdiff_t. >>>>>> >>>>>>> How are inc/dec anything but "using constant" ?? 
>>>>>> >>>>>> I was also a bit torn on that name (I assume you are referring to >>>>>> IncUsingConstant/DecUsingConstant). It was hard to find a name >>>>>> that depicted what this platform helper does. I considered >>>>>> calling the helper something with immediate in the name because >>>>>> it is really used to embed the constant as immediate values in >>>>>> inline assembly today. But then again that seemed too specific, >>>>>> as it is not completely obvious platform specializations will use >>>>>> it in that way. One might just want to specialize this to send it >>>>>> into some compiler Atomic::inc intrinsic for example. Do you have >>>>>> any other preferred names? Here are a few possible names for >>>>>> IncUsingConstant: >>>>>> >>>>>> IncUsingScaledConstant >>>>>> IncUsingAdjustedConstant >>>>>> IncUsingPlatformHelper >>>>>> >>>>>> Any favourites? >>>>>> >>>>>>> Why do we special case jshort?? >>>>>> >>>>>> To be consistent with the special case of Atomic::add on jshort. >>>>>> Do you want it removed? >>>>>> >>>>>>> This is indecipherable to normal people ;-) >>>>>>> >>>>>>> This()->template inc(dest); >>>>>>> >>>>>>> For something as trivial as adding or subtracting 1 the template >>>>>>> machinations here are just mind boggling! >>>>>> >>>>>> This uses the CRTP (Curiously Recurring Template Pattern) C++ >>>>>> idiom. The idea is to devirtualize a virtual call by passing in >>>>>> the derived type as a template parameter to a base class, and >>>>>> then let the base class static_cast to the derived class to >>>>>> devirtualize the call. I hope this explanation sheds some light >>>>>> on what is going on. The same CRTP idiom was used in the >>>>>> Atomic::add implementation in a similar fashion. >>>>>> >>>>>> I will add some comments describing this in the next round after >>>>>> Coleen replies. >>>>>> >>>>>> Thanks for looking at this. 
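The CRTP dispatch described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the webrev's code: IncHelperSketch and PlatformIncSketch are hypothetical names, the adjustment is passed as a plain int template argument instead of an IntegralConstant, and the GCC/Clang __atomic_add_fetch builtin stands in for platform inline assembly.

```cpp
#include <cassert>

// Hypothetical shared helper. Derived is the platform class itself (CRTP).
template <typename Derived>
struct IncHelperSketch {
  template <typename I>
  void operator()(I volatile* dest) const {
    // static_cast to the derived type is resolved at compile time, so no
    // virtual dispatch is involved. The "template" keyword is needed because
    // inc is a member template of a type that depends on Derived.
    static_cast<const Derived*>(this)->template inc<I, 1>(dest);
  }
};

// Hypothetical platform class: it only has to provide inc().
struct PlatformIncSketch : IncHelperSketch<PlatformIncSketch> {
  template <typename I, int adjustment>
  void inc(I volatile* dest) const {
    // A real platform would embed the adjustment as an immediate in inline
    // asm; a compiler builtin is used here to keep the sketch portable.
    __atomic_add_fetch(dest, I(adjustment), __ATOMIC_SEQ_CST);
  }
};
```

Calling an object of the derived type, as in `PlatformIncSketch op; op(&counter);`, enters the base class operator(), which then calls back into PlatformIncSketch::inc.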
>>>>>> >>>>>> /Erik >>>>>> >>>>>>> >>>>>>> Cheers, >>>>>>> David >>>>>>> >>>>>>> On 31/08/2017 10:45 PM, Erik ?sterlund wrote: >>>>>>>> Hi everyone, >>>>>>>> >>>>>>>> Bug ID: >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8186838 >>>>>>>> >>>>>>>> Webrev: >>>>>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >>>>>>>> >>>>>>>> The time has come for the next step in generalizing Atomic with >>>>>>>> templates. Today I will focus on Atomic::inc/dec. >>>>>>>> >>>>>>>> I have tried to mimic the new Kim style that seems to have been >>>>>>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the >>>>>>>> structure looks like this: >>>>>>>> >>>>>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function >>>>>>>> object that performs some basic type checks. >>>>>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can >>>>>>>> define the operation arbitrarily for a given platform. The >>>>>>>> default implementation if not specialized for a platform is to >>>>>>>> call Atomic::add. So only platforms that want to do something >>>>>>>> different than that as an optimization have to provide a >>>>>>>> specialization. >>>>>>>> Layer 3) Platforms that decide to specialize >>>>>>>> PlatformInc/PlatformDec to be more optimized may inherit from a >>>>>>>> helper class IncUsingConstant/DecUsingConstant. This helper >>>>>>>> helps performing the necessary computation what the >>>>>>>> increment/decrement should be after pointer scaling using CRTP. >>>>>>>> The PlatformInc/PlatformDec operation then only needs to define >>>>>>>> an inc/dec member function, and will then get all the context >>>>>>>> information necessary to generate a more optimized >>>>>>>> implementation. Easy peasy. >>>>>>>> >>>>>>>> It is worth noticing that the generalized Atomic::dec operation >>>>>>>> assumes a two's complement integer machine and potentially >>>>>>>> sends the unary negative of a potentially unsigned type to >>>>>>>> Atomic::add. 
I have the following comments about this: >>>>>>>> 1) We already assume in other code that two's complement >>>>>>>> integers must be present. >>>>>>>> 2) A machine that does not have two's complement integers may >>>>>>>> still simply provide a specialization that solves the problem >>>>>>>> in a different way. >>>>>>>> 3) The alternative that does not make assumptions about that >>>>>>>> would use the good old IntegerTypes::cast_to_signed >>>>>>>> metaprogramming stuff, and I seem to recall we thought that was >>>>>>>> a bit too involved and complicated. >>>>>>>> This is the reason why I have chosen to use unary minus on the >>>>>>>> potentially unsigned type in the shared helper code that sends >>>>>>>> the decrement as an addend to Atomic::add. >>>>>>>> >>>>>>>> It would also be nice if somebody with access to PPC and s390 >>>>>>>> machines could try out the relevant changes there so I do not >>>>>>>> accidentally break those platforms. I have blind-coded the >>>>>>>> addition of the immediate values passed in to the inline >>>>>>>> assembly in a way that I think looks like it should work. 
>>>>>>>> >>>>>>>> Testing: >>>>>>>> RBT hs-tier3, JPRT --testset hotspot >>>>>>>> >>>>>>>> Thanks, >>>>>>>> /Erik >>>>>> >>>> >> From kim.barrett at oracle.com Tue Sep 5 10:31:51 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 5 Sep 2017 11:31:51 +0100 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59AE74AF.70308@oracle.com> References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> <59A93B53.9010505@oracle.com> <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com> <59ACFD76.3000606@oracle.com> <8af30028-5ab1-5623-21c7-f6e2ce2ab8cb@oracle.com> <14533625-0A6E-45FE-8EF2-65CE40931D94@oracle.com> <59AE74AF.70308@oracle.com> Message-ID: <92BFF96B-BB27-4740-9C8B-6BDC3EAC31F7@oracle.com> > On Sep 5, 2017, at 10:55 AM, Erik Österlund wrote: > On 2017-09-05 02:38, Kim Barrett wrote: >> [...] Erik's proposal also uses IntegralConstant for that purpose, >> but (from a quick skim) I think got that part wrong, and that is a >> source of additional complexity. Erik might want to re-read my handoff >> email to him. I don't know whether that approach would satisfy folks >> though. > > Unfortunately that approach did not work. It passed IntegralConstant as an rvalue to operator(), which is illegal as IntegralConstant is an AllStatic class. IntegralConstant is an AllStatic? That's just wrong! And probably my fault too. // A type n is a model of Integral Constant if it meets the following // requirements: // [...] // n::value_type const c = n() : c == n::value >> I'm a little bit reluctant to just give up on per-platform >> micro-optimized inc/dec and simply transform those operations into >> corresponding add operations. Requiring an additional register and >> its initialization for some platforms seems like poor form.
> > If we insist on having these micro optimizations, I am not opposed to using GCC intrinsics for Atomic::add on selected x86 GCC platforms (where the risk of future ABI breakage is nearly zero). If Atomic::inc/dec simply calls Atomic::add, then those GCC intrinsics will be able to micro optimize as you desire to the optimal lock add encoding. I think I'd be okay with that too. From goetz.lindenmaier at sap.com Tue Sep 5 10:58:05 2017 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 5 Sep 2017 10:58:05 +0000 Subject: RFR(xs): 8187028: [aix] Real thread stack size may be up to 64K smaller than requested one In-Reply-To: References: Message-ID: <0f2837a40bd0484c9811fd04232e7926@sap.com> Hi Thomas, thanks for fixing this. Looks good. Best regards, Goetz. > -----Original Message----- > From: Thomas Stüfe [mailto:thomas.stuefe at gmail.com] > Sent: Tuesday, 5 September 2017 11:29 > To: Volker Simonis ; Lindenmaier, Goetz > > Cc: ppc-aix-port-dev at openjdk.java.net; HotSpot Open Source Developers > > Subject: Re: RFR(xs): 8187028: [aix] Real thread stack size may be up to 64K > smaller than requested one > > Hi Guys, > > New webrev: > http://cr.openjdk.java.net/~stuefe/webrevs/8187028-aix-Real-thread-stack-size-may-be-up-to-64K-smaller-than-requested-one/webrev.02/webrev/ > > > Nothing changed for the fix, but I added a fix to a whitebox test which broke > due to the fix. > > runtime/whitebox/WBStackSize.java tests that the actual stack size a thread > receives correlates closely to the requested stack size, and that test had to > be loosened for AIX.
> > Kind Regards, Thomas > > > On Thu, Aug 31, 2017 at 12:08 PM, Thomas Stüfe > wrote: > > > Hi all, > > please review this change: > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8187028 > > > > change: > http://cr.openjdk.java.net/~stuefe/webrevs/8187028-aix-Real-thread-stack-size-may-be-up-to-64K-smaller-than-requested-one/webrev.00/webrev/ > > The issue is that on AIX, pthread library seems to have a bug where it > sometimes gives us less thread stack space than we requested (a variable > amount, but seems to be 0..64K). This may cause intermittent stack overflow > errors if the stacks are very small to begin with. > > The workaround is to add 64K to the requested stack size to account > for the fact that the OS may give us up to 64K less stack. > > Also, improved logging. > > Thanks, Thomas > From erik.osterlund at oracle.com Tue Sep 5 12:09:18 2017 From: erik.osterlund at oracle.com (Erik Österlund) Date: Tue, 5 Sep 2017 14:09:18 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <92BFF96B-BB27-4740-9C8B-6BDC3EAC31F7@oracle.com> References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> <59A93B53.9010505@oracle.com> <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com> <59ACFD76.3000606@oracle.com> <8af30028-5ab1-5623-21c7-f6e2ce2ab8cb@oracle.com> <14533625-0A6E-45FE-8EF2-65CE40931D94@oracle.com> <59AE74AF.70308@oracle.com> <92BFF96B-BB27-4740-9C8B-6BDC3EAC31F7@oracle.com> Message-ID: <59AE93EE.2010600@oracle.com> Hi Kim, On 2017-09-05 12:31, Kim Barrett wrote: >> On Sep 5, 2017, at 10:55 AM, Erik Österlund wrote: >> On 2017-09-05 02:38, Kim Barrett wrote: >>> [...]
Erik's proposal also uses IntegralConstant for that purpose, >>> but (from a quick skim) I think got that part wrong, and that is a >>> source of additional complexity. Erik might want to re-read my handoff >>> email to him. I don't know whether that approach would satisfy folks >>> though. >> Unfortunately that approach did not work. It passed IntegralConstant as an rvalue to operator(), which is illegal as IntegralConstant is an AllStatic class. > IntegralConstant is an AllStatic? That's just wrong! And probably my fault too. It is indeed. > // A type n is a model of Integral Constant if it meets the following > // requirements: > // > [...] > // n::value_type const c = n() : c == n::value Fair point. That should probably be fixed if somebody wants to pass around IntegralConstant by value in some later RFE. > >>> I'm a little bit reluctant to just give up on per-platform >>> micro-optimized inc/dec and simply transform those operations into >>> corresponding add operations. Requiring an additional register and >>> its initialization for some platforms seems like poor form. >> If we insist on having these micro optimizations, I am not opposed to using GCC intrinsics for Atomic::add on selected x86 GCC platforms (where the risk of future ABI breakage is nearly zero). If Atomic::inc/dec simply calls Atomic::add, then those GCC intrinsics will be able to micro optimize as you desire to the optimal lock add encoding. > I think I'd be okay with that too. Okay, great. So far it sounds like there are no loud voices against the idea of removing the Atomic::inc/dec specializations. So I propose this new webrev that does exactly that. Full webrev: http://cr.openjdk.java.net/~eosterlund/8186838/webrev.01/ Incremental over last webrev: http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00_01/ I hope this looks simpler.
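For readers following along, the shape of "inc/dec simply call add" can be sketched like this. It is an illustration only and does not reproduce the webrev: atomic_add, atomic_inc and atomic_dec are hypothetical free functions, and the GCC/Clang __atomic_add_fetch builtin (which does not scale pointer operands by itself) stands in for the platform layer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for Atomic::add on integral destinations.
template <typename T>
inline T atomic_add(T addend, T volatile* dest) {
  return __atomic_add_fetch(dest, addend, __ATOMIC_SEQ_CST);
}

// Pointer destinations: the addend is a ptrdiff_t that is scaled by
// sizeof(P), mirroring the semantics of ++pointer.
template <typename P>
inline P* atomic_add(ptrdiff_t addend, P* volatile* dest) {
  return __atomic_add_fetch(dest, addend * static_cast<ptrdiff_t>(sizeof(P)),
                            __ATOMIC_SEQ_CST);
}

// inc/dec carry no platform-specific code; they only forward to add.
template <typename T>
inline void atomic_inc(T volatile* dest) { atomic_add(T(1), dest); }

template <typename P>
inline void atomic_inc(P* volatile* dest) { atomic_add(ptrdiff_t(1), dest); }

// For unsigned T, T(-1) wraps around, which relies on two's complement
// style modular arithmetic as discussed earlier in the thread.
template <typename T>
inline void atomic_dec(T volatile* dest) { atomic_add(T(-1), dest); }

template <typename P>
inline void atomic_dec(P* volatile* dest) { atomic_add(ptrdiff_t(-1), dest); }
```

With this layout a platform only has to get add right, and any immediate-operand encoding for the constant 1 is left to whatever implements add.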
Thanks, /Erik From vladimir.kozlov at oracle.com Tue Sep 5 16:35:43 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 5 Sep 2017 09:35:43 -0700 Subject: RFR(S): 8187091: ReturnBlobToWrongHeapTest fails because of problems in CodeHeap::contains_blob() In-Reply-To: References: Message-ID: On 9/4/17 10:23 AM, Volker Simonis wrote: > On Fri, Sep 1, 2017 at 6:00 PM, Vladimir Kozlov > wrote: >> Checking type is emulation of virtual call ;-) > > I agree :) But it is only a bimorphic dispatch in this case which > should be still faster than a normal virtual call. > >> But I agree that it is simplest solution - one line change (excluding >> comment - comment is good BTW). >> > > Thanks. > >> You can also add guard AOT_ONLY() around aot specific code: >> >> const void* start = AOT_ONLY( (code_blob_type() == CodeBlobType::AOT) ? >> blob->code_begin() : ) (void*)blob; >> >> because we do have builds without AOT. >> > > Done. Please find the new webrev here: > > http://cr.openjdk.java.net/~simonis/webrevs/2017/8187091.v1/ Looks good. Thank you for updated CodeBlob description comment. > > Could you please sponsor the change once jdk10-hs opens again? We have to wait when jdk10 "consolidation" is finished. It may take 2 weeks. > > Thanks, > Volker > > PS: one thing which is still unclear to me is why you haven't caught > this issue before? Isn't > test/compiler/codecache/stress/ReturnBlobToWrongHeapTest.java part of > JPRT and/or your regular tests? test/compiler/codecache/stress are excluded from JPRT runs: https://bugs.openjdk.java.net/browse/JDK-8069021 Also these tests are marked with @key stress. Originally it was only 2 tests and ReturnBlobToWrongHeapTest.java was added later: https://bugs.openjdk.java.net/browse/JDK-8069021 I am trying to find which testing tier runs them. I will follow this. 
Thanks, Vladimir > > >> Thanks, >> Vladimir >> >> >> On 9/1/17 8:42 AM, Volker Simonis wrote: >>> >>> Hi, >>> >>> can I please have a review and sponsor for the following small fix: >>> >>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8187091/ >>> https://bugs.openjdk.java.net/browse/JDK-8187091 >>> >>> We see failures in >>> test/compiler/codecache/stress/ReturnBlobToWrongHeapTest.java which >>> are cause by problems in CodeHeap::contains_blob() for corner cases >>> with CodeBlobs of zero size: >>> >>> # A fatal error has been detected by the Java Runtime Environment: >>> # >>> # Internal Error (heap.cpp:248), pid=27586, tid=27587 >>> # guarantee((char*) b >= _memory.low_boundary() && (char*) b < >>> _memory.high()) failed: The block to be deallocated 0x00007fffe6666f80 >>> is not within the heap starting with 0x00007fffe6667000 and ending >>> with 0x00007fffe6ba000 >>> >>> The problem is that JDK-8183573 replaced >>> >>> virtual bool contains_blob(const CodeBlob* blob) const { return >>> low_boundary() <= (char*) blob && (char*) blob < high(); } >>> >>> by: >>> >>> bool contains_blob(const CodeBlob* blob) const { return >>> contains(blob->code_begin()); } >>> >>> But that my be wrong in the corner case where the size of the >>> CodeBlob's payload is zero (i.e. the CodeBlob consists only of the >>> 'header' - i.e. the C++ object itself) because in that case >>> CodeBlob::code_begin() points right behind the CodeBlob's header which >>> is a memory location which doesn't belong to the CodeBlob anymore. >>> >>> This exact corner case is exercised by ReturnBlobToWrongHeapTest which >>> allocates CodeBlobs of size zero (i.e. zero 'payload') with the help >>> of sun.hotspot.WhiteBox.allocateCodeBlob() until the CodeCache fills >>> up. The test first fills the 'non-profiled nmethods' CodeHeap. If the >>> 'non-profiled nmethods' CodeHeap is full, the VM automatically tries >>> to allocate from the 'profiled nmethods' CodeHeap until that fills up >>> as well. 
>>> But in the CodeCache the 'profiled nmethods' CodeHeap is
>>> located right before the 'non-profiled nmethods' CodeHeap. So if the
>>> last CodeBlob allocated from the 'profiled nmethods' CodeHeap has a
>>> payload size of zero and uses all of the CodeHeap's remaining size, we
>>> will end up with a CodeBlob whose code_begin() address will point
>>> right behind the actual CodeHeap (i.e. it will point right at the
>>> beginning of the adjacent, 'non-profiled nmethods' CodeHeap). This
>>> will cause the above guarantee to fire when we try to free
>>> the last allocated CodeBlob (with
>>> sun.hotspot.WhiteBox.freeCodeBlob()).
>>>
>>> In a previous mail thread
>>>
>>> (http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-August/028175.html)
>>> Vladimir explained why JDK-8183573 was done:
>>>
>>>> About contains_blob(). The problem is that AOTCompiledMethod is allocated in
>>>> CHeap and not in aot code section (which is RO):
>>>>
>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124
>>>>
>>>> It is allocated in CHeap after AOT library is loaded. Its code_begin()
>>>> points to AOT code section but AOTCompiledMethod*
>>>> points outside it (to normal malloced space) so you can't use (char*)blob
>>>> address.
>>>
>>> and proposed these two fixes:
>>>
>>>> There are 2 ways to fix it, I think.
>>>> One is to add new field to CodeBlobLayout and set it to blob* address for
>>>> normal CodeCache blobs and to code_begin for
>>>> AOT code.
>>>> Second is to use contains(blob->code_end() - 1) assuming that AOT code is
>>>> never zero.
>>>
>>> I came up with a slightly different solution - just use
>>> 'CodeHeap::code_blob_type()' to decide whether to use 'blob->code_begin()' (for
>>> the AOT case) or '(void*)blob' (for all other blobs) as input for the
>>> call to 'CodeHeap::contains()'. It's simple and still much cheaper than
>>> a virtual call. What do you think?
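The zero-payload corner case and the type-based check discussed above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not HotSpot code: `BlobModel`, `HeapModel`, `addr_t` and both `contains_blob_*` helpers are hypothetical names, and addresses are plain integers so that only the range arithmetic is modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: addresses are plain integers, not real pointers.
typedef std::uintptr_t addr_t;

// Stand-in for a CodeBlob: where the C++ object ("header") lives and
// where its payload starts. For a zero-sized payload, code_begin is the
// first address behind the header.
struct BlobModel {
  addr_t self;        // address of the blob object itself
  addr_t code_begin;  // first address of the payload
};

// Stand-in for a CodeHeap covering the half-open range [low, high).
struct HeapModel {
  addr_t low;
  addr_t high;
  bool   is_aot;      // models code_blob_type() == CodeBlobType::AOT

  bool contains(addr_t p) const { return low <= p && p < high; }

  // Check that only looks at code_begin (the variant that fails for
  // zero-sized payloads at the very end of a heap).
  bool contains_blob_by_code_begin(const BlobModel& b) const {
    return contains(b.code_begin);
  }

  // Type-based check: use code_begin only for AOT heaps, otherwise use
  // the address of the blob object itself.
  bool contains_blob_by_type(const BlobModel& b) const {
    const addr_t start = is_aot ? b.code_begin : b.self;
    return contains(start);
  }
};
```

In this model, a blob whose header occupies the last bytes of one heap and whose payload is empty has a `code_begin` equal to that heap's `high`, so the `code_begin`-only check attributes it to the adjacent heap, while the type-based check keeps it in the heap it was allocated from.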
>>>
>>> I've also updated the documentation of the CodeBlob class hierarchy in
>>> codeBlob.hpp. Please let me know if I've missed something.
>>>
>>> Thank you and best regards,
>>> Volker
>>>
>>

From volker.simonis at gmail.com Tue Sep 5 16:49:34 2017
From: volker.simonis at gmail.com (Volker Simonis)
Date: Tue, 5 Sep 2017 18:49:34 +0200
Subject: RFR(M): 8166317: InterpreterCodeSize should be computed
In-Reply-To: <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com>
References: <50cda0ab-f403-372a-ce51-1a27d8821448@oracle.com> <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com>
Message-ID:

On Fri, Sep 1, 2017 at 6:16 PM, Vladimir Kozlov wrote:
> Maybe add a new CodeBlob method to adjust sizes instead of directly setting
> them in CodeCache::free_unused_tail(). Then you would not need friend class
> CodeCache in CodeBlob.
>

Changed as suggested (I didn't like the friend declaration either :)

> Also I think adjustment to header_size should be done in
> CodeCache::free_unused_tail() to limit the scope of code that knows about blob
> layout.
>

Yes, that's much cleaner. Please find the updated webrev here:

http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/

I've also found another "day 1" problem in StubQueue::next():

   Stub* next(Stub* s) const { int i = index_of(s) + stub_size(s);
-                              if (i == _buffer_limit) i = 0;
+                              // Only wrap around in the non-contiguous case (see stubs.cpp)
+                              if (i == _buffer_limit && _queue_end < _buffer_limit) i = 0;
                               return (i == _queue_end) ? NULL : stub_at(i);
                             }

The problem was that the method was not prepared to handle the case where _buffer_limit == _queue_end == _buffer_size, which led to an infinite loop when iterating over a StubQueue with StubQueue::next() until next() returns NULL (as this was for example done with -XX:+PrintInterpreter). But with the new, trimmed CodeBlob we run into exactly this situation.
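The wrap-around corner case can be reproduced with a small standalone model of the index arithmetic. This is only a sketch, not the HotSpot code: `QueueModel`, `next_old`, `next_fixed`, `walk` and `kEnd` are hypothetical names, stubs are reduced to fixed-size index steps, and the step bound exists only so that the pre-fix behaviour terminates.

```cpp
#include <cassert>

// Stands in for the NULL return value of next().
const int kEnd = -1;

// Hypothetical, simplified model of a stub queue: only indices are kept.
struct QueueModel {
  int buffer_limit;  // end of the used part of the buffer
  int queue_end;     // index one past the last stub
  int stub_size;     // every stub has the same size in this model

  // Rule before the fix: always wrap when the limit is reached.
  int next_old(int i) const {
    i += stub_size;
    if (i == buffer_limit) i = 0;
    return (i == queue_end) ? kEnd : i;
  }

  // Rule after the fix: only wrap in the non-contiguous case.
  int next_fixed(int i) const {
    i += stub_size;
    if (i == buffer_limit && queue_end < buffer_limit) i = 0;
    return (i == queue_end) ? kEnd : i;
  }
};

// Iterates from 'start' until the end marker is returned.
// Returns the number of calls made, or -1 if 'max_steps' calls were
// not enough (i.e. the iteration would not terminate).
int walk(const QueueModel& q, bool fixed, int start, int max_steps) {
  int i = start;
  for (int steps = 1; steps <= max_steps; ++steps) {
    i = fixed ? q.next_fixed(i) : q.next_old(i);
    if (i == kEnd) return steps;
  }
  return -1;
}
```

With `buffer_limit == queue_end` (the contiguous, completely filled case) the old rule wraps back to index 0 and never reaches `queue_end`, so the walk only stops at the step bound; the fixed rule ends after the last stub. In the non-contiguous case both rules behave the same.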
While doing this last fix I also noticed that "StubQueue::stubs_do()", "StubQueue::queues_do()" and "StubQueue::register_queue()" don't seem to be used anywhere in the open code base (please correct me if I'm wrong). What do you think, maybe we should remove this code in a follow up change if it is really not needed? Finally, could you please run the new version through JPRT and sponsor it once jdk10/hs will be opened again? Thanks, Volker > Thanks, > Vladimir > > > On 9/1/17 8:46 AM, Volker Simonis wrote: >> >> Hi, >> >> I've decided to split the fix for the 'CodeHeap::contains_blob()' >> problem into its own issue "8187091: ReturnBlobToWrongHeapTest fails >> because of problems in CodeHeap::contains_blob()" >> (https://bugs.openjdk.java.net/browse/JDK-8187091) and started a new >> review thread for discussing it at: >> >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028206.html >> >> So please lets keep this thread for discussing the interpreter code >> size issue only. I've prepared a new version of the webrev which is >> the same as the first one with the only difference that the change to >> 'CodeHeap::contains_blob()' has been removed: >> >> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v1/ >> >> Thanks, >> Volker >> >> >> On Thu, Aug 31, 2017 at 6:35 PM, Volker Simonis >> wrote: >>> >>> On Thu, Aug 31, 2017 at 6:05 PM, Vladimir Kozlov >>> wrote: >>>> >>>> Very good change. Thank you, Volker. >>>> >>>> About contains_blob(). The problem is that AOTCompiledMethod allocated >>>> in >>>> CHeap and not in aot code section (which is RO): >>>> >>>> >>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124 >>>> >>>> It is allocated in CHeap after AOT library is loaded. Its code_begin() >>>> points to AOT code section but AOTCompiledMethod* points outside it (to >>>> normal malloced space) so you can't use (char*)blob address. >>>> >>> >>> Thanks for the explanation - now I got it. 
>>> >>>> There are 2 ways to fix it, I think. >>>> One is to add new field to CodeBlobLayout and set it to blob* address >>>> for >>>> normal CodeCache blobs and to code_begin for AOT code. >>>> Second is to use contains(blob->code_end() - 1) assuming that AOT code >>>> is >>>> never zero. >>>> >>> >>> I'll give it a try tomorrow and will send out a new webrev. >>> >>> Regards, >>> Volker >>> >>>> Thanks, >>>> Vladimir >>>> >>>> >>>> On 8/31/17 5:43 AM, Volker Simonis wrote: >>>>> >>>>> >>>>> On Thu, Aug 31, 2017 at 12:14 PM, Claes Redestad >>>>> wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 2017-08-31 08:54, Volker Simonis wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> While working on this, I found another problem which is related to >>>>>>> the >>>>>>> fix of JDK-8183573 and leads to crashes when executing the JTreg test >>>>>>> compiler/codecache/stress/ReturnBlobToWrongHeapTest.java. >>>>>>> >>>>>>> The problem is that JDK-8183573 replaced >>>>>>> >>>>>>> virtual bool contains_blob(const CodeBlob* blob) const { return >>>>>>> low_boundary() <= (char*) blob && (char*) blob < high(); } >>>>>>> >>>>>>> by: >>>>>>> >>>>>>> bool contains_blob(const CodeBlob* blob) const { return >>>>>>> contains(blob->code_begin()); } >>>>>>> >>>>>>> But that my be wrong in the corner case where the size of the >>>>>>> CodeBlob's payload is zero (i.e. the CodeBlob consists only of the >>>>>>> 'header' - i.e. the C++ object itself) because in that case >>>>>>> CodeBlob::code_begin() points right behind the CodeBlob's header >>>>>>> which >>>>>>> is a memory location which doesn't belong to the CodeBlob anymore. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> I recall this change was somehow necessary to allow merging >>>>>> AOTCodeHeap::contains_blob and CodeHead::contains_blob into >>>>>> one devirtualized method, so you need to ensure all AOT tests >>>>>> pass with this change (on linux-x64). >>>>>> >>>>> >>>>> All of hotspot/test/aot and hotspot/test/jvmci executed and passed >>>>> successful. 
Are there any other tests I should check? >>>>> >>>>> That said, it is a little hard to follow the stages of your change. It >>>>> seems like >>>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.00/ >>>>> was reviewed [1] but then finally the slightly changed version from >>>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.01/ was >>>>> checked in and linked to the bug report. >>>>> >>>>> The first, reviewed version of the change still had a correct version >>>>> of 'CodeHeap::contains_blob(const CodeBlob* blob)' while the second, >>>>> checked in version has the faulty version of that method. >>>>> >>>>> I don't know why you finally did that change to 'contains_blob()' but >>>>> I don't see any reason why we shouldn't be able to directly use the >>>>> blob's address for inclusion checking. From what I understand, it >>>>> should ALWAYS be contained in the corresponding CodeHeap so no reason >>>>> to mess with 'CodeBlob::code_begin()'. >>>>> >>>>> Please let me know if I'm missing something. >>>>> >>>>> [1] >>>>> >>>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-July/026624.html >>>>> >>>>>> I can't help to wonder if we'd not be better served by disallowing >>>>>> zero-sized payloads. Is this something that can ever actually >>>>>> happen except by abuse of the white box API? >>>>>> >>>>> >>>>> The corresponding test (ReturnBlobToWrongHeapTest.java) specifically >>>>> wants to allocate "segment sized" blocks which is most easily achieved >>>>> by allocation zero-sized CodeBlobs. And I think there's nothing wrong >>>>> about it if we handle the inclusion tests correctly. 
>>>>>
>>>>> Thank you and best regards,
>>>>> Volker
>>>>>
>>>>>> /Claes

From coleen.phillimore at oracle.com Tue Sep 5 16:50:03 2017
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Tue, 5 Sep 2017 12:50:03 -0400
Subject: RFR (S) 8081323: ConstantPool::_resolved_references is missing in heap dump
In-Reply-To:
References: <915c3300-2528-2b85-2492-0b54a783c622@oracle.com>
Message-ID:

Thank you, Serguei!
Coleen

On 9/1/17 6:48 PM, serguei.spitsyn at oracle.com wrote:
> Hi Coleen,
>
> The fix looks good.
>
> Thanks,
> Serguei
>
>
> On 8/31/17 09:02, coleen.phillimore at oracle.com wrote:
>> Summary: Add resolved_references and init_lock as hidden static field
>> in class so root is found.
>>
>> Tested manually with YourKit. See bug for images. Also ran
>> serviceability tests.
>>
>> open webrev at http://cr.openjdk.java.net/~coleenp/8081323.01/webrev
>> bug link https://bugs.openjdk.java.net/browse/JDK-8081323
>>
>> Thanks,
>> Coleen
>>
>

From vladimir.kozlov at oracle.com Tue Sep 5 17:17:29 2017
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 5 Sep 2017 10:17:29 -0700
Subject: RFR(M): 8166317: InterpreterCodeSize should be computed
In-Reply-To:
References: <50cda0ab-f403-372a-ce51-1a27d8821448@oracle.com> <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com>
Message-ID: <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com>

On 9/5/17 9:49 AM, Volker Simonis wrote:
> On Fri, Sep 1, 2017 at 6:16 PM, Vladimir Kozlov
> wrote:
>> Maybe add a new CodeBlob method to adjust sizes instead of directly setting
>> them in CodeCache::free_unused_tail(). Then you would not need friend class
>> CodeCache in CodeBlob.
>>
>
> Changed as suggested (I didn't like the friend declaration either :)
>
>> Also I think adjustment to header_size should be done in
>> CodeCache::free_unused_tail() to limit the scope of code that knows about blob
>> layout.
>>
>
> Yes, that's much cleaner.
Please find the updated webrev here: > > http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/ Good. > > I've also found another "day 1" problem in StubQueue::next(): > > Stub* next(Stub* s) const { int i = > index_of(s) + stub_size(s); > - if (i == > _buffer_limit) i = 0; > + // Only wrap > around in the non-contiguous case (see stubss.cpp) > + if (i == > _buffer_limit && _queue_end < _buffer_limit) i = 0; > return (i == > _queue_end) ? NULL : stub_at(i); > } > > The problem was that the method was not prepared to handle the case > where _buffer_limit == _queue_end == _buffer_size which lead to an > infinite recursion when iterating over a StubQueue with > StubQueue::next() until next() returns NULL (as this was for example > done with -XX:+PrintInterpreter). But with the new, trimmed CodeBlob > we run into exactly this situation. Okay. > > While doing this last fix I also noticed that "StubQueue::stubs_do()", > "StubQueue::queues_do()" and "StubQueue::register_queue()" don't seem > to be used anywhere in the open code base (please correct me if I'm > wrong). What do you think, maybe we should remove this code in a > follow up change if it is really not needed? register_queue() is used in constructor. Other 2 you can remove. stub_code_begin() and stub_code_end() are not used too -remove. I thought we run on linux with flag which warn about unused code. > > Finally, could you please run the new version through JPRT and sponsor > it once jdk10/hs will be opened again? Will do when jdk10 "consolidation" is finished. Please, remind me later if I forget. 
Thanks, Vladimir > > Thanks, > Volker > >> Thanks, >> Vladimir >> >> >> On 9/1/17 8:46 AM, Volker Simonis wrote: >>> >>> Hi, >>> >>> I've decided to split the fix for the 'CodeHeap::contains_blob()' >>> problem into its own issue "8187091: ReturnBlobToWrongHeapTest fails >>> because of problems in CodeHeap::contains_blob()" >>> (https://bugs.openjdk.java.net/browse/JDK-8187091) and started a new >>> review thread for discussing it at: >>> >>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028206.html >>> >>> So please lets keep this thread for discussing the interpreter code >>> size issue only. I've prepared a new version of the webrev which is >>> the same as the first one with the only difference that the change to >>> 'CodeHeap::contains_blob()' has been removed: >>> >>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v1/ >>> >>> Thanks, >>> Volker >>> >>> >>> On Thu, Aug 31, 2017 at 6:35 PM, Volker Simonis >>> wrote: >>>> >>>> On Thu, Aug 31, 2017 at 6:05 PM, Vladimir Kozlov >>>> wrote: >>>>> >>>>> Very good change. Thank you, Volker. >>>>> >>>>> About contains_blob(). The problem is that AOTCompiledMethod allocated >>>>> in >>>>> CHeap and not in aot code section (which is RO): >>>>> >>>>> >>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124 >>>>> >>>>> It is allocated in CHeap after AOT library is loaded. Its code_begin() >>>>> points to AOT code section but AOTCompiledMethod* points outside it (to >>>>> normal malloced space) so you can't use (char*)blob address. >>>>> >>>> >>>> Thanks for the explanation - now I got it. >>>> >>>>> There are 2 ways to fix it, I think. >>>>> One is to add new field to CodeBlobLayout and set it to blob* address >>>>> for >>>>> normal CodeCache blobs and to code_begin for AOT code. >>>>> Second is to use contains(blob->code_end() - 1) assuming that AOT code >>>>> is >>>>> never zero. 
>>>>> >>>> >>>> I'll give it a try tomorrow and will send out a new webrev. >>>> >>>> Regards, >>>> Volker >>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> >>>>> On 8/31/17 5:43 AM, Volker Simonis wrote: >>>>>> >>>>>> >>>>>> On Thu, Aug 31, 2017 at 12:14 PM, Claes Redestad >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 2017-08-31 08:54, Volker Simonis wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> While working on this, I found another problem which is related to >>>>>>>> the >>>>>>>> fix of JDK-8183573 and leads to crashes when executing the JTreg test >>>>>>>> compiler/codecache/stress/ReturnBlobToWrongHeapTest.java. >>>>>>>> >>>>>>>> The problem is that JDK-8183573 replaced >>>>>>>> >>>>>>>> virtual bool contains_blob(const CodeBlob* blob) const { return >>>>>>>> low_boundary() <= (char*) blob && (char*) blob < high(); } >>>>>>>> >>>>>>>> by: >>>>>>>> >>>>>>>> bool contains_blob(const CodeBlob* blob) const { return >>>>>>>> contains(blob->code_begin()); } >>>>>>>> >>>>>>>> But that my be wrong in the corner case where the size of the >>>>>>>> CodeBlob's payload is zero (i.e. the CodeBlob consists only of the >>>>>>>> 'header' - i.e. the C++ object itself) because in that case >>>>>>>> CodeBlob::code_begin() points right behind the CodeBlob's header >>>>>>>> which >>>>>>>> is a memory location which doesn't belong to the CodeBlob anymore. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> I recall this change was somehow necessary to allow merging >>>>>>> AOTCodeHeap::contains_blob and CodeHead::contains_blob into >>>>>>> one devirtualized method, so you need to ensure all AOT tests >>>>>>> pass with this change (on linux-x64). >>>>>>> >>>>>> >>>>>> All of hotspot/test/aot and hotspot/test/jvmci executed and passed >>>>>> successful. Are there any other tests I should check? >>>>>> >>>>>> That said, it is a little hard to follow the stages of your change. 
It >>>>>> seems like >>>>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.00/ >>>>>> was reviewed [1] but then finally the slightly changed version from >>>>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.01/ was >>>>>> checked in and linked to the bug report. >>>>>> >>>>>> The first, reviewed version of the change still had a correct version >>>>>> of 'CodeHeap::contains_blob(const CodeBlob* blob)' while the second, >>>>>> checked in version has the faulty version of that method. >>>>>> >>>>>> I don't know why you finally did that change to 'contains_blob()' but >>>>>> I don't see any reason why we shouldn't be able to directly use the >>>>>> blob's address for inclusion checking. From what I understand, it >>>>>> should ALWAYS be contained in the corresponding CodeHeap so no reason >>>>>> to mess with 'CodeBlob::code_begin()'. >>>>>> >>>>>> Please let me know if I'm missing something. >>>>>> >>>>>> [1] >>>>>> >>>>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-July/026624.html >>>>>> >>>>>>> I can't help to wonder if we'd not be better served by disallowing >>>>>>> zero-sized payloads. Is this something that can ever actually >>>>>>> happen except by abuse of the white box API? >>>>>>> >>>>>> >>>>>> The corresponding test (ReturnBlobToWrongHeapTest.java) specifically >>>>>> wants to allocate "segment sized" blocks which is most easily achieved >>>>>> by allocation zero-sized CodeBlobs. And I think there's nothing wrong >>>>>> about it if we handle the inclusion tests correctly. 
>>>>>>
>>>>>> Thank you and best regards,
>>>>>> Volker
>>>>>>
>>>>>>> /Claes

From vladimir.kozlov at oracle.com Tue Sep 5 17:49:01 2017
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 5 Sep 2017 10:49:01 -0700
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To: <9c53f889-e58e-33ac-3c05-874779b469d6@oracle.com>
References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com> <4d4fe028-ea6a-4f77-ab69-5c2bc752e1f5@oracle.com> <47bc0a90-ed6a-220a-c3d1-b4df2d8bbc74@oracle.com> <9c53f889-e58e-33ac-3c05-874779b469d6@oracle.com>
Message-ID: <45619e1a-9eb0-a540-193b-5187da3bf6bc@oracle.com>

On 9/4/17 11:02 PM, David Holmes wrote:
> Hi Rohit,
>
> I couldn't see a bug filed for this so I did it:
>
> https://bugs.openjdk.java.net/browse/JDK-8187219
>
> I also hosted the webrev as I wanted to see the change in context:
>
> http://cr.openjdk.java.net/~dholmes/8187219/webrev/

Thank you, David, for filing RFE and preparing webrev.

>
> I have a couple of comments/queries:
>
> src/cpu/x86/vm/vm_version_x86.hpp
>
> So this moved the adx/bmi2/sha/fma settings out from being Intel
> specific to applying to AMD as well - ok. Have these features always
> been available in AMD chips? Just wondering if they might not be valid
> for some older processors.

Looks like AMD used the *same* CPUID bits to check availability of these features. Older CPUs will not have these bits set.

>
> You added:
>
>  526       if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>  527         result |= CPU_HT;
>
> and I'm wondering if there would be any case where this would not be
> covered by the earlier:
>
>  448     if (threads_per_core() > 1)
>  449       result |= CPU_HT;
>
> ?
> ---

Valid question.

Thanks,
Vladimir

>
> src/cpu/x86/vm/vm_version_x86.cpp
>
> No comments on AMD specific changes.
>
> Thanks,
> David
> -----
>
> On 5/09/2017 3:43 PM, David Holmes wrote:
>> On 5/09/2017 3:29 PM, Rohit Arul Raj wrote:
>>> Hello David,
>>>
>>> On Tue, Sep 5, 2017 at 10:31 AM, David Holmes
>>> wrote:
>>>> Hi Rohit,
>>>>
>>>> I was unable to apply your patch to latest jdk10/hs/hotspot repo.
>>>>
>>>
>>> I checked out the latest jdk10/hs/hotspot [parent: 13548:1a9c2e07a826]
>>> and was able to apply the patch [epyc-amd17h-defaults-3Sept.patch]
>>> without any issues.
>>> Can you share the error message that you are getting?
>>
>> I was getting this:
>>
>> applying hotspot.patch
>> patching file src/cpu/x86/vm/vm_version_x86.cpp
>> Hunk #1 FAILED at 1108
>> 1 out of 1 hunks FAILED -- saving rejects to file
>> src/cpu/x86/vm/vm_version_x86.cpp.rej
>> patching file src/cpu/x86/vm/vm_version_x86.hpp
>> Hunk #2 FAILED at 522
>> 1 out of 2 hunks FAILED -- saving rejects to file
>> src/cpu/x86/vm/vm_version_x86.hpp.rej
>> abort: patch failed to apply
>>
>> but I started again and this time it applied fine, so not sure what
>> was going on there.
>>
>> Cheers,
>> David
>>
>>> Regards,
>>> Rohit
>>>
>>>>
>>>> On 4/09/2017 2:42 AM, Rohit Arul Raj wrote:
>>>>>
>>>>> Hello Vladimir,
>>>>>
>>>>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov
>>>>> wrote:
>>>>>>
>>>>>> Hi Rohit,
>>>>>>
>>>>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>>>>>>
>>>>>>> Hello Vladimir,
>>>>>>>
>>>>>>>> Changes look good. Only question I have is about MaxVectorSize.
>>>>>>>> It is set > 16 only in presence of AVX:
>>>>>>>>
>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>>>>>>>
>>>>>>>> Does that code work for AMD 17h too?
>>>>>>>
>>>>>>> Thanks for pointing that out. Yes, the code works fine for AMD
>>>>>>> 17h. So
>>>>>>> I have removed the surplus check for MaxVectorSize from my patch.
>>>>>>> I have updated, re-tested and attached the patch.
>>>>>>
>>>>>> Which check did you remove?
>>>>>>
>>>>>
>>>>> My older patch had the below mentioned check which was required on
>>>>> JDK9 where the default MaxVectorSize was 64. It has been handled
>>>>> better in openJDK10. So this check is not required anymore.
>>>>>
>>>>> +    // Some defaults for AMD family 17h
>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>> ...
>>>>> ...
>>>>> +      if (MaxVectorSize > 32) {
>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>> +      }
>>>>> ..
>>>>> ..
>>>>> +      }
>>>>>
>>>>>>>
>>>>>>> I have one query regarding the setting of UseSHA flag:
>>>>>>>
>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821
>>>>>>>
>>>>>>> AMD 17h has support for SHA.
>>>>>>> AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets
>>>>>>> enabled for it based on the availability of BMI2 and AVX2. Is
>>>>>>> there an
>>>>>>> underlying reason for this? I have handled this in the patch but
>>>>>>> just
>>>>>>> wanted to confirm.
>>>>>>
>>>>>> It was done with next changes which use only AVX2 and BMI2
>>>>>> instructions
>>>>>> to
>>>>>> calculate SHA-256:
>>>>>>
>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974
>>>>>>
>>>>>> I don't know if AMD 15h supports these instructions and can
>>>>>> execute that
>>>>>> code. You need to test it.
>>>>>>
>>>>>
>>>>> Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions,
>>>>> it should work.
>>>>> Confirmed by running the following sanity tests:
>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java
>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java
>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java
>>>>>
>>>>> So I have removed those SHA checks from my patch too.
>>>>>
>>>>> Please find attached updated, re-tested patch.
>>>>>
>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> @@ -1109,11 +1109,27 @@
>>>>>         }
>>>>>
>>>>>     #ifdef COMPILER2
>>>>> -    if (MaxVectorSize > 16) {
>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>           FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>         }
>>>>>     #endif // COMPILER2
>>>>> +
>>>>> +    // Some defaults for AMD family 17h
>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
>>>>> +      }
>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>>> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
>>>>> +      }
>>>>> +#ifdef COMPILER2
>>>>> +      if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>> +        FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>> +      }
>>>>> +#endif
>>>>> +    }
>>>>>       }
>>>>>
>>>>>       if( is_intel() ) { // Intel cpus specific settings
>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> @@ -505,6 +505,14 @@
>>>>>           result |= CPU_CLMUL;
>>>>>         if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>>>           result |= CPU_RTM;
>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>> +       result |= CPU_ADX;
>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>> +      result |= CPU_BMI2;
>>>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>> +      
result |= CPU_SHA;
>>>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>> +      result |= CPU_FMA;
>>>>>
>>>>>         // AMD features.
>>>>>         if (is_amd()) {
>>>>> @@ -515,19 +523,13 @@
>>>>>             result |= CPU_LZCNT;
>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>             result |= CPU_SSE4A;
>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>> +        result |= CPU_HT;
>>>>>         }
>>>>>         // Intel features.
>>>>>         if(is_intel()) {
>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>> -         result |= CPU_ADX;
>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>> -        result |= CPU_BMI2;
>>>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>> -        result |= CPU_SHA;
>>>>>           if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>>>             result |= CPU_LZCNT;
>>>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>> -        result |= CPU_FMA;
>>>>>           // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>>>             result |= CPU_3DNOW_PREFETCH;
>>>>>
>>>>> Please let me know your comments.
>>>>>
>>>>> Thanks for your time.
>>>>> Rohit
>>>>>
>>>>>>>
>>>>>>> Thanks for taking time to review the code.
>>>>>>>
>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>> @@ -1088,6 +1088,22 @@
>>>>>>>            }
>>>>>>>            FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>>          }
>>>>>>> +    if (supports_sha()) {
>>>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>>>> +      }
>>>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
>>>>>>> +      
if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>>>> +      }
>>>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>> +    }
>>>>>>>
>>>>>>>          // some defaults for AMD family 15h
>>>>>>>          if ( cpu_family() == 0x15 ) {
>>>>>>> @@ -1109,11 +1125,40 @@
>>>>>>>          }
>>>>>>>
>>>>>>>      #ifdef COMPILER2
>>>>>>> -    if (MaxVectorSize > 16) {
>>>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>>            FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>>          }
>>>>>>>      #endif // COMPILER2
>>>>>>> +
>>>>>>> +    // Some defaults for AMD family 17h
>>>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>>>> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
>>>>>>> +      }
>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>>>>> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
>>>>>>> +      }
>>>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>>>> +        FLAG_SET_DEFAULT(UseBMI2Instructions, true);
>>>>>>> +      }
>>>>>>> +      if (UseSHA) {
>>>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>> +        
} else if (UseSHA512Intrinsics) { >>>>>>> +????????? warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>> functions not available on this CPU."); >>>>>>> +????????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>> +??????? } >>>>>>> +????? } >>>>>>> +#ifdef COMPILER2 >>>>>>> +????? if (supports_sse4_2()) { >>>>>>> +??????? if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>> +????????? FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>> +??????? } >>>>>>> +????? } >>>>>>> +#endif >>>>>>> +??? } >>>>>>> ????? } >>>>>>> >>>>>>> ????? if( is_intel() ) { // Intel cpus specific settings >>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> @@ -505,6 +505,14 @@ >>>>>>> ????????? result |= CPU_CLMUL; >>>>>>> ??????? if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>> ????????? result |= CPU_RTM; >>>>>>> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>> +?????? result |= CPU_ADX; >>>>>>> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>> +????? result |= CPU_BMI2; >>>>>>> +??? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>> +????? result |= CPU_SHA; >>>>>>> +??? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>> +????? result |= CPU_FMA; >>>>>>> >>>>>>> ??????? // AMD features. >>>>>>> ??????? if (is_amd()) { >>>>>>> @@ -515,19 +523,13 @@ >>>>>>> ??????????? result |= CPU_LZCNT; >>>>>>> ????????? if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>> ??????????? result |= CPU_SSE4A; >>>>>>> +????? if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>> +??????? result |= CPU_HT; >>>>>>> ??????? } >>>>>>> ??????? // Intel features. >>>>>>> ??????? if(is_intel()) { >>>>>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>> -???????? result |= CPU_ADX; >>>>>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>> -??????? result |= CPU_BMI2; >>>>>>> -????? 
if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>> -??????? result |= CPU_SHA; >>>>>>> ????????? if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>> ??????????? result |= CPU_LZCNT; >>>>>>> -????? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>> -??????? result |= CPU_FMA; >>>>>>> ????????? // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>>>> support for prefetchw >>>>>>> ????????? if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>> ??????????? result |= CPU_3DNOW_PREFETCH; >>>>>>> >>>>>>> >>>>>>> Regards, >>>>>>> Rohit >>>>>>> >>>>>>> >>>>>>> >>>>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>>>>>>> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi Rohit, >>>>>>>>>>> >>>>>>>>>>> I think the patch needs updating for jdk10 as I already see a >>>>>>>>>>> lot of >>>>>>>>>>> logic >>>>>>>>>>> around UseSHA in vm_version_x86.cpp. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> David >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks David, I will update the patch wrt JDK10 source base, >>>>>>>>>> test and >>>>>>>>>> resubmit for review. >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Rohit >>>>>>>>>> >>>>>>>>> >>>>>>>>> Hi All, >>>>>>>>> >>>>>>>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>>>>>>> 13519:71337910df60), did regression testing using jtreg ($make >>>>>>>>> default) and didnt find any regressions. >>>>>>>>> >>>>>>>>> Can anyone please volunteer to review this patch? which sets >>>>>>>>> flag/ISA >>>>>>>>> defaults for newer AMD 17h (EPYC) processor? 
>>>>>>>>> >>>>>>>>> ************************* Patch **************************** >>>>>>>>> >>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>> ?????????? } >>>>>>>>> ?????????? FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>> ???????? } >>>>>>>>> +??? if (supports_sha()) { >>>>>>>>> +????? if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>> +??????? FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>> +????? } >>>>>>>>> +??? } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>> UseSHA256Intrinsics || >>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>> +????? if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>> +??????? warning("SHA instructions are not available on this >>>>>>>>> CPU"); >>>>>>>>> +????? } >>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>> +??? } >>>>>>>>> >>>>>>>>> ???????? // some defaults for AMD family 15h >>>>>>>>> ???????? if ( cpu_family() == 0x15 ) { >>>>>>>>> @@ -1109,11 +1125,43 @@ >>>>>>>>> ???????? } >>>>>>>>> >>>>>>>>> ???? #ifdef COMPILER2 >>>>>>>>> -??? if (MaxVectorSize > 16) { >>>>>>>>> -????? // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>> +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>> +????? // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>>> ?????????? FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>> ???????? } >>>>>>>>> ???? #endif // COMPILER2 >>>>>>>>> + >>>>>>>>> +??? // Some defaults for AMD family 17h >>>>>>>>> +??? 
if ( cpu_family() == 0x17 ) { >>>>>>>>> +????? // On family 17h processors use XMM and >>>>>>>>> UnalignedLoadStores for >>>>>>>>> Array Copy >>>>>>>>> +????? if (supports_sse2() && >>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>>>> +??????? UseXMMForArrayCopy = true; >>>>>>>>> +????? } >>>>>>>>> +????? if (supports_sse2() && >>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>> { >>>>>>>>> +??????? UseUnalignedLoadStores = true; >>>>>>>>> +????? } >>>>>>>>> +????? if (supports_bmi2() && >>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>>>>> +??????? UseBMI2Instructions = true; >>>>>>>>> +????? } >>>>>>>>> +????? if (MaxVectorSize > 32) { >>>>>>>>> +??????? FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>> +????? } >>>>>>>>> +????? if (UseSHA) { >>>>>>>>> +??????? if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>> +????????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>> +??????? } else if (UseSHA512Intrinsics) { >>>>>>>>> +????????? warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>>>> functions not available on this CPU."); >>>>>>>>> +????????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>> +??????? } >>>>>>>>> +????? } >>>>>>>>> +#ifdef COMPILER2 >>>>>>>>> +????? if (supports_sse4_2()) { >>>>>>>>> +??????? if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>> +????????? FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>> +??????? } >>>>>>>>> +????? } >>>>>>>>> +#endif >>>>>>>>> +??? } >>>>>>>>> ?????? } >>>>>>>>> >>>>>>>>> ?????? if( is_intel() ) { // Intel cpus specific settings >>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>> ?????????? result |= CPU_CLMUL; >>>>>>>>> ???????? if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>> ?????????? result |= CPU_RTM; >>>>>>>>> +??? 
if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>> +?????? result |= CPU_ADX; >>>>>>>>> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>> +????? result |= CPU_BMI2; >>>>>>>>> +??? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>> +????? result |= CPU_SHA; >>>>>>>>> +??? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>> +????? result |= CPU_FMA; >>>>>>>>> >>>>>>>>> ???????? // AMD features. >>>>>>>>> ???????? if (is_amd()) { >>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>> ???????????? result |= CPU_LZCNT; >>>>>>>>> ?????????? if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>> ???????????? result |= CPU_SSE4A; >>>>>>>>> +????? if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>> +??????? result |= CPU_HT; >>>>>>>>> ???????? } >>>>>>>>> ???????? // Intel features. >>>>>>>>> ???????? if(is_intel()) { >>>>>>>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>> -???????? result |= CPU_ADX; >>>>>>>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>> -??????? result |= CPU_BMI2; >>>>>>>>> -????? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>> -??????? result |= CPU_SHA; >>>>>>>>> ?????????? if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>>> ???????????? result |= CPU_LZCNT; >>>>>>>>> -????? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>> -??????? result |= CPU_FMA; >>>>>>>>> ?????????? // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>> indicates >>>>>>>>> support for prefetchw >>>>>>>>> ?????????? if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>>>> ???????????? 
result |= CPU_3DNOW_PREFETCH; >>>>>>>>> >>>>>>>>> ************************************************************** >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Rohit >>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>> >>>>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I would like an volunteer to review this patch (openJDK9) >>>>>>>>>>>>>> which >>>>>>>>>>>>>> sets >>>>>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and >>>>>>>>>>>>>> help us >>>>>>>>>>>>>> with >>>>>>>>>>>>>> the commit process. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Unfortunately patches can not be accepted from systems >>>>>>>>>>>>> outside the >>>>>>>>>>>>> OpenJDK >>>>>>>>>>>>> infrastructure and ... >>>>>>>>>>>>> >>>>>>>>>>>>>> I have also attached the patch (hg diff -g) for reference. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> ... unfortunately patches tend to get stripped by the mail >>>>>>>>>>>>> servers. >>>>>>>>>>>>> If >>>>>>>>>>>>> the >>>>>>>>>>>>> patch is small please include it inline. Otherwise you will >>>>>>>>>>>>> need >>>>>>>>>>>>> to >>>>>>>>>>>>> find >>>>>>>>>>>>> an >>>>>>>>>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net. 
>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>> 3) I have done regression testing using jtreg ($make >>>>>>>>>>>>>> default) and >>>>>>>>>>>>>> didnt find any regressions. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Sounds good, but until I see the patch it is hard to >>>>>>>>>>>>> comment on >>>>>>>>>>>>> testing >>>>>>>>>>>>> requirements. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> David >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks David, >>>>>>>>>>>> Yes, it's a small patch. >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>>>>>>> ??????????? } >>>>>>>>>>>> ??????????? FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>> ????????? } >>>>>>>>>>>> +??? if (supports_sha()) { >>>>>>>>>>>> +????? if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>> +??????? FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>> +????? } >>>>>>>>>>>> +??? } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>> || >>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>> +????? if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>> +??????? warning("SHA instructions are not available on this >>>>>>>>>>>> CPU"); >>>>>>>>>>>> +????? } >>>>>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>> +??? } >>>>>>>>>>>> >>>>>>>>>>>> ????????? 
// some defaults for AMD family 15h >>>>>>>>>>>> ????????? if ( cpu_family() == 0x15 ) { >>>>>>>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>>>>>>> ????????? } >>>>>>>>>>>> >>>>>>>>>>>> ????? #ifdef COMPILER2 >>>>>>>>>>>> -??? if (MaxVectorSize > 16) { >>>>>>>>>>>> -????? // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>> +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>> +????? // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>>>>>> ??????????? FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>> ????????? } >>>>>>>>>>>> ????? #endif // COMPILER2 >>>>>>>>>>>> + >>>>>>>>>>>> +??? // Some defaults for AMD family 17h >>>>>>>>>>>> +??? if ( cpu_family() == 0x17 ) { >>>>>>>>>>>> +????? // On family 17h processors use XMM and >>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>> for >>>>>>>>>>>> Array Copy >>>>>>>>>>>> +????? if (supports_sse2() && >>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>> { >>>>>>>>>>>> +??????? UseXMMForArrayCopy = true; >>>>>>>>>>>> +????? } >>>>>>>>>>>> +????? if (supports_sse2() && >>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>> { >>>>>>>>>>>> +??????? UseUnalignedLoadStores = true; >>>>>>>>>>>> +????? } >>>>>>>>>>>> +????? if (supports_bmi2() && >>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>>> { >>>>>>>>>>>> +??????? UseBMI2Instructions = true; >>>>>>>>>>>> +????? } >>>>>>>>>>>> +????? if (MaxVectorSize > 32) { >>>>>>>>>>>> +??????? FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>> +????? } >>>>>>>>>>>> +????? if (UseSHA) { >>>>>>>>>>>> +??????? if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>> +????????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>> +??????? } else if (UseSHA512Intrinsics) { >>>>>>>>>>>> +????????? warning("Intrinsics for SHA-384 and SHA-512 >>>>>>>>>>>> crypto hash >>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>> +????????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>> +??????? } >>>>>>>>>>>> +????? 
} >>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>> +????? if (supports_sse4_2()) { >>>>>>>>>>>> +??????? if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>> +????????? FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>> +??????? } >>>>>>>>>>>> +????? } >>>>>>>>>>>> +#endif >>>>>>>>>>>> +??? } >>>>>>>>>>>> ??????? } >>>>>>>>>>>> >>>>>>>>>>>> ??????? if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> @@ -513,6 +513,16 @@ >>>>>>>>>>>> ????????????? result |= CPU_LZCNT; >>>>>>>>>>>> ??????????? if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>>> ????????????? result |= CPU_SSE4A; >>>>>>>>>>>> +????? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>> +??????? result |= CPU_BMI2; >>>>>>>>>>>> +????? if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>> +??????? result |= CPU_HT; >>>>>>>>>>>> +????? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>> +??????? result |= CPU_ADX; >>>>>>>>>>>> +????? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>> +??????? result |= CPU_SHA; >>>>>>>>>>>> +????? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>> +??????? result |= CPU_FMA; >>>>>>>>>>>> ????????? } >>>>>>>>>>>> ????????? // Intel features. >>>>>>>>>>>> ????????? 
if(is_intel()) {
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Rohit
>>>>>>>>>>>>

From coleen.phillimore at oracle.com Tue Sep 5 17:59:29 2017
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Tue, 5 Sep 2017 13:59:29 -0400
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <59AE93EE.2010600@oracle.com>
References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> <59A93B53.9010505@oracle.com> <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com> <59ACFD76.3000606@oracle.com> <8af30028-5ab1-5623-21c7-f6e2ce2ab8cb@oracle.com> <14533625-0A6E-45FE-8EF2-65CE40931D94@oracle.com> <59AE74AF.70308@oracle.com> <92BFF96B-BB27-4740-9C8B-6BDC3EAC31F7@oracle.com> <59AE93EE.2010600@oracle.com>
Message-ID: <3e795f9e-c778-fcef-3718-6f33bbdbd371@oracle.com>

On 9/5/17 8:09 AM, Erik Österlund wrote:
> Hi Kim,
>
> On 2017-09-05 12:31, Kim Barrett wrote:
>>> On Sep 5, 2017, at 10:55 AM, Erik Österlund wrote:
>>> On 2017-09-05 02:38, Kim Barrett wrote:
>>>> [...] Erik's proposal also uses IntegralConstant for that purpose,
>>>> but (from a quick skim) I think got that part wrong, and that is a
>>>> source of additional complexity. Erik might want to re-read my handoff
>>>> email to him. I don't know whether that approach would satisfy folks
>>>> though.
>>> Unfortunately that approach did not work. It passed IntegralConstant
>>> as rvalue to operator(), which is illegal as IntegralConstant is an
>>> AllStatic class.
>> IntegralConstant is an AllStatic? That's just wrong! And probably
>> my fault too.
>
> It is indeed.
>
>> // A type n is a model of Integral Constant if it meets the following
>> // requirements:
>> //
>> [...]
>> // n::value_type const c = n() : c == n::value
>
> Fair point.
> That should probably be fixed if somebody wants to pass
> around IntegralConstant by value in some later RFE.
>
>>>> I'm a little bit reluctant to just give up on per-platform
>>>> microoptimized inc/dec and simply transform those operations into
>>>> corresponding add operations. Requiring an additional register and
>>>> its initialization for some platforms seems like poor form.
>>> If we insist on having these micro optimizations, I am not opposed
>>> to using GCC intrinsics for Atomic::add on selected x86 GCC platforms
>>> (where the risk for future ABI breakage is nearly zero). If
>>> Atomic::inc/dec simply calls Atomic::add, then those GCC intrinsics
>>> will be able to micro optimize as you desire to the optimal lock add
>>> encoding.
>> I think I'd be okay with that too.
>
> Okay, great. So far it sounds like, as for Atomic::inc/dec, there are
> no loud voices against the idea of removing the Atomic::inc/dec
> specializations. So I propose this new webrev that does exactly that.
>
> Full webrev:
> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.01/
>
> Incremental over last webrev:
> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00_01/
>
> I hope this looks simpler.

Erik,

This looks a ton better to me. With the simplification, the template
parameters being D rather than T don't cause as much of a headache.

+template<typename D>
+inline void Atomic::inc(D volatile* dest) {
+  STATIC_ASSERT(IsPointer<D>::value || IsIntegral<D>::value);
+  typedef typename Conditional<IsPointer<D>::value, ptrdiff_t, D>::type T;
+  Atomic::add(T(1), dest);
+}
+
+template<typename D>
+inline void Atomic::dec(D volatile* dest) {
+  STATIC_ASSERT(IsPointer<D>::value || IsIntegral<D>::value);
+  typedef typename Conditional<IsPointer<D>::value, ptrdiff_t, D>::type T;
+  // Assumes two's complement integer representation.
+  #pragma warning(suppress: 4146)
+  Atomic::add(T(-1), dest);
+}
+

It's taking me too long to parse the metaprogramming trick with T.
What does this do? Please write a one line comment.
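For readers following the question about the typedef, here is a minimal standalone sketch of what the Conditional selects. It is not the webrev code: it assumes std::atomic and the <type_traits> equivalents (std::conditional, std::is_pointer, std::is_integral) in place of HotSpot's Atomic::add, Conditional, IsPointer and IsIntegral, and the names sketch_inc, sketch_dec and sketch_inc_dec_usage are hypothetical. In this sketch the addend type is ptrdiff_t when D is a pointer and D itself when D is an integer, and std::atomic's pointer fetch_add steps by whole elements. Whether Atomic::add in the webrev scales the same way is the open question in this message.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for Atomic::inc, built on std::atomic (an assumption).
template <typename D>
inline void sketch_inc(std::atomic<D>* dest) {
  static_assert(std::is_pointer<D>::value || std::is_integral<D>::value,
                "D must be a pointer or an integral type");
  // Addend type: ptrdiff_t for pointers, D itself for integers.
  typedef typename std::conditional<std::is_pointer<D>::value,
                                    std::ptrdiff_t, D>::type T;
  dest->fetch_add(T(1));
}

// Hypothetical stand-in for Atomic::dec: adds T(-1) instead of subtracting.
template <typename D>
inline void sketch_dec(std::atomic<D>* dest) {
  static_assert(std::is_pointer<D>::value || std::is_integral<D>::value,
                "D must be a pointer or an integral type");
  typedef typename std::conditional<std::is_pointer<D>::value,
                                    std::ptrdiff_t, D>::type T;
  // For unsigned D, T(-1) is the all-ones value and the addition wraps.
  dest->fetch_add(T(-1));
}

// Usage: an integer counter and a pointer into an array.
inline void sketch_inc_dec_usage() {
  std::atomic<int> counter(5);
  sketch_inc(&counter);
  assert(counter.load() == 6);

  int slots[3] = {0, 1, 2};
  std::atomic<int*> cursor(slots);
  sketch_inc(&cursor);                // advances by one int, not one byte
  assert(cursor.load() == slots + 1);
  sketch_dec(&cursor);
  assert(cursor.load() == slots);
}
```

Under these assumptions, a one line comment of the kind requested could read: "T is the addend type: ptrdiff_t for pointers, D for integers".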
I am guessing that this is going to scale the pointer case by the size
of the pointed-to value. So for a pointer it becomes:
Atomic::add(ptrdiff_t(1), dest). Or doesn't this do that?

Thank you for making this simpler. I agree with Andrew's assessment
about accidental complexity. This seems like something that shouldn't be
difficult.

Lastly, does this deprecate inc_ptr()? After this change, can you file
an RFE?

Coleen

>
> Thanks,
> /Erik

From coleen.phillimore at oracle.com Tue Sep 5 19:36:49 2017
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Tue, 5 Sep 2017 15:36:49 -0400
Subject: RFR(M): 8166317: InterpreterCodeSize should be computed
In-Reply-To: <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com>
References: <50cda0ab-f403-372a-ce51-1a27d8821448@oracle.com> <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com> <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com>
Message-ID: <7fee08f1-8304-3026-19e9-844e618e98ea@oracle.com>

I was going to make the same comment about the friend declaration in v1,
so v2 looks better to me. Looks good. Thank you for finding a solution
to this problem that we've had for a long time. I will sponsor this
(remind me if I forget after the 18th).

thanks,
Coleen

On 9/5/17 1:17 PM, Vladimir Kozlov wrote:
> On 9/5/17 9:49 AM, Volker Simonis wrote:
>> On Fri, Sep 1, 2017 at 6:16 PM, Vladimir Kozlov wrote:
>>> Maybe add a new CodeBlob method to adjust sizes instead of directly
>>> setting them in CodeCache::free_unused_tail(). Then you would not
>>> need friend class CodeCache in CodeBlob.
>>>
>>
>> Changed as suggested (I didn't like the friend declaration either :)
>>
>>> Also I think the adjustment to header_size should be done in
>>> CodeCache::free_unused_tail() to limit the scope of code that knows
>>> about blob layout.
>>>
>>
>> Yes, that's much cleaner. Please find the updated webrev here:
>>
>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/
>
> Good.
>
>>
>> I've also found another "day 1" problem in StubQueue::next():
>>
>>     Stub* next(Stub* s) const                      { int i = index_of(s) + stub_size(s);
>> -                                                   if (i == _buffer_limit) i = 0;
>> +                                                   // Only wrap around in the non-contiguous case (see stubs.cpp)
>> +                                                   if (i == _buffer_limit && _queue_end < _buffer_limit) i = 0;
>>                                                     return (i == _queue_end) ? NULL : stub_at(i);
>>                                                   }
>>
>> The problem was that the method was not prepared to handle the case
>> where _buffer_limit == _queue_end == _buffer_size, which led to an
>> infinite recursion when iterating over a StubQueue with
>> StubQueue::next() until next() returns NULL (as this was for example
>> done with -XX:+PrintInterpreter). But with the new, trimmed CodeBlob
>> we run into exactly this situation.
>
> Okay.
>
>>
>> While doing this last fix I also noticed that "StubQueue::stubs_do()",
>> "StubQueue::queues_do()" and "StubQueue::register_queue()" don't seem
>> to be used anywhere in the open code base (please correct me if I'm
>> wrong). What do you think, maybe we should remove this code in a
>> follow up change if it is really not needed?
>
> register_queue() is used in the constructor. The other 2 you can remove.
> stub_code_begin() and stub_code_end() are not used either, so remove them too.
> I thought we ran on linux with a flag which warns about unused code.
>
>>
>> Finally, could you please run the new version through JPRT and sponsor
>> it once jdk10/hs will be opened again?
>
> Will do when jdk10 "consolidation" is finished. Please, remind me
> later if I forget.
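The wrap-around condition in the StubQueue::next() fix quoted above can be modelled in a few lines. This is only a sketch under simplifying assumptions: stubs are reduced to (index, size) pairs, -1 stands in for NULL, and SketchQueue, next_old, next_fixed and sketch_queue_usage are hypothetical names, not HotSpot code.

```cpp
#include <cassert>

// Hypothetical, simplified model of the index arithmetic in StubQueue::next().
struct SketchQueue {
  int buffer_limit;  // end of the usable buffer
  int queue_end;     // index one past the last stub

  // Before the fix: always wrap to 0 when the limit is reached.
  int next_old(int index, int size) const {
    int i = index + size;
    if (i == buffer_limit) i = 0;
    return (i == queue_end) ? -1 : i;
  }

  // After the fix: wrap only in the non-contiguous case.
  int next_fixed(int index, int size) const {
    int i = index + size;
    if (i == buffer_limit && queue_end < buffer_limit) i = 0;
    return (i == queue_end) ? -1 : i;
  }
};

inline void sketch_queue_usage() {
  // Contiguous, completely full queue: buffer_limit == queue_end.
  SketchQueue full = {8, 8};
  assert(full.next_old(6, 2) == 0);     // wraps, so iteration never sees the end
  assert(full.next_fixed(6, 2) == -1);  // end of queue is reported

  // Non-contiguous queue: the live region wraps past the buffer limit.
  SketchQueue wrapped = {8, 2};
  assert(wrapped.next_fixed(6, 2) == 0);   // still wraps to the front
  assert(wrapped.next_fixed(0, 2) == -1);  // and stops at queue_end
}
```

In the full-queue case the old arithmetic maps the last stub back to index 0, so a loop that runs until next() returns the end marker never terminates in this model.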
> > Thanks, > Vladimir > >> >> Thanks, >> Volker >> >>> Thanks, >>> Vladimir >>> >>> >>> On 9/1/17 8:46 AM, Volker Simonis wrote: >>>> >>>> Hi, >>>> >>>> I've decided to split the fix for the 'CodeHeap::contains_blob()' >>>> problem into its own issue "8187091: ReturnBlobToWrongHeapTest fails >>>> because of problems in CodeHeap::contains_blob()" >>>> (https://bugs.openjdk.java.net/browse/JDK-8187091) and started a new >>>> review thread for discussing it at: >>>> >>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028206.html >>>> >>>> >>>> So please lets keep this thread for discussing the interpreter code >>>> size issue only. I've prepared a new version of the webrev which is >>>> the same as the first one with the only difference that the change to >>>> 'CodeHeap::contains_blob()' has been removed: >>>> >>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v1/ >>>> >>>> Thanks, >>>> Volker >>>> >>>> >>>> On Thu, Aug 31, 2017 at 6:35 PM, Volker Simonis >>>> wrote: >>>>> >>>>> On Thu, Aug 31, 2017 at 6:05 PM, Vladimir Kozlov >>>>> wrote: >>>>>> >>>>>> Very good change. Thank you, Volker. >>>>>> >>>>>> About contains_blob(). The problem is that AOTCompiledMethod >>>>>> allocated >>>>>> in >>>>>> CHeap and not in aot code section (which is RO): >>>>>> >>>>>> >>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124 >>>>>> >>>>>> >>>>>> It is allocated in CHeap after AOT library is loaded. Its >>>>>> code_begin() >>>>>> points to AOT code section but AOTCompiledMethod* points outside >>>>>> it (to >>>>>> normal malloced space) so you can't use (char*)blob address. >>>>>> >>>>> >>>>> Thanks for the explanation - now I got it. >>>>> >>>>>> There are 2 ways to fix it, I think. >>>>>> One is to add new field to CodeBlobLayout and set it to blob* >>>>>> address >>>>>> for >>>>>> normal CodeCache blobs and to code_begin for AOT code. 
>>>>>> Second is to use contains(blob->code_end() - 1) assuming that AOT >>>>>> code >>>>>> is >>>>>> never zero. >>>>>> >>>>> >>>>> I'll give it a try tomorrow and will send out a new webrev. >>>>> >>>>> Regards, >>>>> Volker >>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> >>>>>> On 8/31/17 5:43 AM, Volker Simonis wrote: >>>>>>> >>>>>>> >>>>>>> On Thu, Aug 31, 2017 at 12:14 PM, Claes Redestad >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 2017-08-31 08:54, Volker Simonis wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> While working on this, I found another problem which is >>>>>>>>> related to >>>>>>>>> the >>>>>>>>> fix of JDK-8183573 and leads to crashes when executing the >>>>>>>>> JTreg test >>>>>>>>> compiler/codecache/stress/ReturnBlobToWrongHeapTest.java. >>>>>>>>> >>>>>>>>> The problem is that JDK-8183573 replaced >>>>>>>>> >>>>>>>>> ????? virtual bool contains_blob(const CodeBlob* blob) const { >>>>>>>>> return >>>>>>>>> low_boundary() <= (char*) blob && (char*) blob < high(); } >>>>>>>>> >>>>>>>>> by: >>>>>>>>> >>>>>>>>> ????? bool contains_blob(const CodeBlob* blob) const { return >>>>>>>>> contains(blob->code_begin()); } >>>>>>>>> >>>>>>>>> But that my be wrong in the corner case where the size of the >>>>>>>>> CodeBlob's payload is zero (i.e. the CodeBlob consists only of >>>>>>>>> the >>>>>>>>> 'header' - i.e. the C++ object itself) because in that case >>>>>>>>> CodeBlob::code_begin() points right behind the CodeBlob's header >>>>>>>>> which >>>>>>>>> is a memory location which doesn't belong to the CodeBlob >>>>>>>>> anymore. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I recall this change was somehow necessary to allow merging >>>>>>>> AOTCodeHeap::contains_blob and CodeHead::contains_blob into >>>>>>>> one devirtualized method, so you need to ensure all AOT tests >>>>>>>> pass with this change (on linux-x64). >>>>>>>> >>>>>>> >>>>>>> All of hotspot/test/aot and hotspot/test/jvmci executed and passed >>>>>>> successful. 
Are there any other tests I should check? >>>>>>> >>>>>>> That said, it is a little hard to follow the stages of your >>>>>>> change. It >>>>>>> seems like >>>>>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.00/ >>>>>>> was reviewed [1] but then finally the slightly changed version from >>>>>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.01/ >>>>>>> was >>>>>>> checked in and linked to the bug report. >>>>>>> >>>>>>> The first, reviewed version of the change still had a correct >>>>>>> version >>>>>>> of 'CodeHeap::contains_blob(const CodeBlob* blob)' while the >>>>>>> second, >>>>>>> checked in version has the faulty version of that method. >>>>>>> >>>>>>> I don't know why you finally did that change to >>>>>>> 'contains_blob()' but >>>>>>> I don't see any reason why we shouldn't be able to directly use the >>>>>>> blob's address for inclusion checking. From what I understand, it >>>>>>> should ALWAYS be contained in the corresponding CodeHeap so no >>>>>>> reason >>>>>>> to mess with 'CodeBlob::code_begin()'. >>>>>>> >>>>>>> Please let me know if I'm missing something. >>>>>>> >>>>>>> [1] >>>>>>> >>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-July/026624.html >>>>>>> >>>>>>> >>>>>>>> I can't help to wonder if we'd not be better served by disallowing >>>>>>>> zero-sized payloads. Is this something that can ever actually >>>>>>>> happen except by abuse of the white box API? >>>>>>>> >>>>>>> >>>>>>> The corresponding test (ReturnBlobToWrongHeapTest.java) >>>>>>> specifically >>>>>>> wants to allocate "segment sized" blocks which is most easily >>>>>>> achieved >>>>>>> by allocation zero-sized CodeBlobs. And I think there's nothing >>>>>>> wrong >>>>>>> about it if we handle the inclusion tests correctly. 
>>>>>>> >>>>>>> Thank you and best regards, >>>>>>> Volker >>>>>>> >>>>>>>> /Claes From coleen.phillimore at oracle.com Tue Sep 5 19:39:41 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 5 Sep 2017 15:39:41 -0400 Subject: RFR(M): 8166317: InterpreterCodeSize should be computed In-Reply-To: <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com> References: <50cda0ab-f403-372a-ce51-1a27d8821448@oracle.com> <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com> <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com> Message-ID: On 9/5/17 1:17 PM, Vladimir Kozlov wrote: >> >> Finally, could you please run the new version through JPRT and sponsor >> it once jdk10/hs will be opened again? > > Will do when jdk10 "consolidation" is finished. Please, remind me > later if I forget. I didn't see this - Vladimir, you can sponsor. Thanks, Coleen From david.holmes at oracle.com Wed Sep 6 02:11:10 2017 From: david.holmes at oracle.com (David Holmes) Date: Wed, 6 Sep 2017 12:11:10 +1000 Subject: [RFR]: 8187227: __m68k_cmpxchg() is not being used correctly In-Reply-To: References: Message-ID: Hi Adrian, Not really a review but I was curious about this ... On 5/09/2017 8:00 PM, John Paul Adrian Glaubitz wrote: > Hi! > > Please review the changeset in [1] which fixes the incorrect use of > __m68k_cmpxchg(). > > The description for this change is [2]: > > ============================================================= > > On m68k, linux-zero has platform-specific implementations for > compare_and_swap(), add_and_fetch() and lock_test_and_set(). These > functions are all using __m68k_cmpxchg() which is basically a wrapper > around the m68k assembly instruction CAS. > > Currently, all three functions make incorrect assumptions about how CAS > and its wrapper actually work and consequently use __m68k_cmpxchg() > incorrectly. The source code comment for __m68_cmpxchg() states: > > ? * Atomically store newval in *ptr if *ptr is equal to oldval for user > space. > ? 
* Returns newval on success and oldval if no exchange happened.
>   * This implementation is processor specific and works on
>   * 68020 68030 68040 and 68060.
>
> However, looking at the documentation for the CAS instruction on m68k
> [1] and the implementation of __m68k_cmpxchg(), this is actually not how
> the function works. It does not return the updated value on a successful
> exchange but rather the contents of the compare operand, i.e. oldval. If no
> exchange happened, it will actually return the contents of the memory
> location. newval is never returned and consequently testing for "newval"
> in compare_and_swap(), add_and_fetch() and lock_test_and_set() is a bug.

I am surprised this even works at all. So trying to follow the logic: if
initially "prev == oldval" then the cas actually succeeds, but the loop
logic thinks it failed and so retries. It re-reads the current value
into prev, which no longer equals oldval, so the loop terminates and it
returns "prev", which may actually be the value that was updated by the
successful cas; or it could be a different value if some other thread
has since also performed a successful cas. So this function would always
report failure, even on success (except in an ABA situation)! I can't see
how anything would work in that case.

BTW m68k_compare_and_swap does not need to have a loop at all; it only
has to do the cas and return the correct value. A loop would only be
needed if the low-level cas can fail spuriously, which does not seem to
be the case.

Cheers,
David

> I have prepared a patch that fixes this issue by making correct use of
> __m68k_cmpxchg() in compare_and_swap(), add_and_fetch() and
> lock_test_and_set(). This patch has been tested to work on Debian m68k.
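The return-value convention discussed above can be sketched in portable C++. This is not the m68k code and not the patch under review: sketch_cmpxchg is a hypothetical model built on std::atomic (an assumption) that returns the value observed in memory, that is, oldval when the exchange happened and the current contents otherwise. The callers then test the result against oldval, never against newval; the other function names are hypothetical as well.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model of a CAS primitive that returns the observed value:
// oldval on success, the current memory contents on failure.
inline int sketch_cmpxchg(int oldval, int newval, std::atomic<int>* ptr) {
  int observed = oldval;
  // On failure compare_exchange_strong stores the current value in 'observed'.
  ptr->compare_exchange_strong(observed, newval);
  return observed;
}

// No loop needed: a single CAS already yields the value to report.
inline int sketch_compare_and_swap(std::atomic<int>* ptr, int oldval, int newval) {
  return sketch_cmpxchg(oldval, newval, ptr);
}

// Retry until the CAS reports that it saw the value we read.
inline int sketch_add_and_fetch(std::atomic<int>* ptr, int add_value) {
  for (;;) {
    int prev = ptr->load();
    if (sketch_cmpxchg(prev, prev + add_value, ptr) == prev) {
      return prev + add_value;
    }
  }
}

// Same pattern, but the previous value is returned.
inline int sketch_lock_test_and_set(std::atomic<int>* ptr, int newval) {
  for (;;) {
    int prev = ptr->load();
    if (sketch_cmpxchg(prev, newval, ptr) == prev) {
      return prev;
    }
  }
}

inline void sketch_cas_usage() {
  std::atomic<int> cell(10);
  assert(sketch_compare_and_swap(&cell, 10, 20) == 10);  // success: returns oldval
  assert(sketch_compare_and_swap(&cell, 10, 30) == 20);  // failure: returns contents
  assert(cell.load() == 20);
}
```

The success test in this sketch is "result == oldval"; comparing the result with newval, as the thread describes the old code doing, would treat a successful exchange as a failure.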
> >> [1] http://68k.hax.com/CAS > > ============================================================= > >> [1] http://cr.openjdk.java.net/~glaubitz/8187227/webrev.01/ >> [2] https://bugs.openjdk.java.net/browse/JDK-8187227 > From volker.simonis at gmail.com Wed Sep 6 13:20:03 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 6 Sep 2017 15:20:03 +0200 Subject: RFR(M): 8166317: InterpreterCodeSize should be computed In-Reply-To: <7fee08f1-8304-3026-19e9-844e618e98ea@oracle.com> References: <50cda0ab-f403-372a-ce51-1a27d8821448@oracle.com> <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com> <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com> <7fee08f1-8304-3026-19e9-844e618e98ea@oracle.com> Message-ID: On Tue, Sep 5, 2017 at 9:36 PM, wrote: > > I was going to make the same comment about the friend declaration in v1, so > v2 looks better to me. Looks good. Thank you for finding a solution to > this problem that we've had for a long time. I will sponsor this (remind me > if I forget after the 18th). > Thanks Coleen! I've updated http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/ in-place and added you as a second reviewer. Regards, Volker > thanks, > Coleen > > > > On 9/5/17 1:17 PM, Vladimir Kozlov wrote: >> >> On 9/5/17 9:49 AM, Volker Simonis wrote: >>> >>> On Fri, Sep 1, 2017 at 6:16 PM, Vladimir Kozlov >>> wrote: >>>> >>>> May be add new CodeBlob's method to adjust sizes instead of directly >>>> setting >>>> them in CodeCache::free_unused_tail(). Then you would not need friend >>>> class >>>> CodeCache in CodeBlob. >>>> >>> >>> Changed as suggested (I didn't liked the friend declaration as well :) >>> >>>> Also I think adjustment to header_size should be done in >>>> CodeCache::free_unused_tail() to limit scope of code who knows about >>>> blob >>>> layout. >>>> >>> >>> Yes, that's much cleaner. Please find the updated webrev here: >>> >>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/ >> >> >> Good. 
>> >>> >>> I've also found another "day 1" problem in StubQueue::next(): >>> >>> Stub* next(Stub* s) const { int i = >>> index_of(s) + stub_size(s); >>> - if (i == >>> _buffer_limit) i = 0; >>> + // Only wrap >>> around in the non-contiguous case (see stubss.cpp) >>> + if (i == >>> _buffer_limit && _queue_end < _buffer_limit) i = 0; >>> return (i == >>> _queue_end) ? NULL : stub_at(i); >>> } >>> >>> The problem was that the method was not prepared to handle the case >>> where _buffer_limit == _queue_end == _buffer_size which lead to an >>> infinite recursion when iterating over a StubQueue with >>> StubQueue::next() until next() returns NULL (as this was for example >>> done with -XX:+PrintInterpreter). But with the new, trimmed CodeBlob >>> we run into exactly this situation. >> >> >> Okay. >> >>> >>> While doing this last fix I also noticed that "StubQueue::stubs_do()", >>> "StubQueue::queues_do()" and "StubQueue::register_queue()" don't seem >>> to be used anywhere in the open code base (please correct me if I'm >>> wrong). What do you think, maybe we should remove this code in a >>> follow up change if it is really not needed? >> >> >> register_queue() is used in constructor. Other 2 you can remove. >> stub_code_begin() and stub_code_end() are not used too -remove. >> I thought we run on linux with flag which warn about unused code. >> >>> >>> Finally, could you please run the new version through JPRT and sponsor >>> it once jdk10/hs will be opened again? >> >> >> Will do when jdk10 "consolidation" is finished. Please, remind me later if >> I forget. 
>> >> Thanks, >> Vladimir >> >>> >>> Thanks, >>> Volker >>> >>>> Thanks, >>>> Vladimir >>>> >>>> >>>> On 9/1/17 8:46 AM, Volker Simonis wrote: >>>>> >>>>> >>>>> Hi, >>>>> >>>>> I've decided to split the fix for the 'CodeHeap::contains_blob()' >>>>> problem into its own issue "8187091: ReturnBlobToWrongHeapTest fails >>>>> because of problems in CodeHeap::contains_blob()" >>>>> (https://bugs.openjdk.java.net/browse/JDK-8187091) and started a new >>>>> review thread for discussing it at: >>>>> >>>>> >>>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028206.html >>>>> >>>>> So please lets keep this thread for discussing the interpreter code >>>>> size issue only. I've prepared a new version of the webrev which is >>>>> the same as the first one with the only difference that the change to >>>>> 'CodeHeap::contains_blob()' has been removed: >>>>> >>>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v1/ >>>>> >>>>> Thanks, >>>>> Volker >>>>> >>>>> >>>>> On Thu, Aug 31, 2017 at 6:35 PM, Volker Simonis >>>>> wrote: >>>>>> >>>>>> >>>>>> On Thu, Aug 31, 2017 at 6:05 PM, Vladimir Kozlov >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> Very good change. Thank you, Volker. >>>>>>> >>>>>>> About contains_blob(). The problem is that AOTCompiledMethod >>>>>>> allocated >>>>>>> in >>>>>>> CHeap and not in aot code section (which is RO): >>>>>>> >>>>>>> >>>>>>> >>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124 >>>>>>> >>>>>>> It is allocated in CHeap after AOT library is loaded. Its >>>>>>> code_begin() >>>>>>> points to AOT code section but AOTCompiledMethod* points outside it >>>>>>> (to >>>>>>> normal malloced space) so you can't use (char*)blob address. >>>>>>> >>>>>> >>>>>> Thanks for the explanation - now I got it. >>>>>> >>>>>>> There are 2 ways to fix it, I think. 
>>>>>>> One is to add new field to CodeBlobLayout and set it to blob* address >>>>>>> for >>>>>>> normal CodeCache blobs and to code_begin for AOT code. >>>>>>> Second is to use contains(blob->code_end() - 1) assuming that AOT >>>>>>> code >>>>>>> is >>>>>>> never zero. >>>>>>> >>>>>> >>>>>> I'll give it a try tomorrow and will send out a new webrev. >>>>>> >>>>>> Regards, >>>>>> Volker >>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> >>>>>>> On 8/31/17 5:43 AM, Volker Simonis wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Aug 31, 2017 at 12:14 PM, Claes Redestad >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 2017-08-31 08:54, Volker Simonis wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> While working on this, I found another problem which is related to >>>>>>>>>> the >>>>>>>>>> fix of JDK-8183573 and leads to crashes when executing the JTreg >>>>>>>>>> test >>>>>>>>>> compiler/codecache/stress/ReturnBlobToWrongHeapTest.java. >>>>>>>>>> >>>>>>>>>> The problem is that JDK-8183573 replaced >>>>>>>>>> >>>>>>>>>> virtual bool contains_blob(const CodeBlob* blob) const { >>>>>>>>>> return >>>>>>>>>> low_boundary() <= (char*) blob && (char*) blob < high(); } >>>>>>>>>> >>>>>>>>>> by: >>>>>>>>>> >>>>>>>>>> bool contains_blob(const CodeBlob* blob) const { return >>>>>>>>>> contains(blob->code_begin()); } >>>>>>>>>> >>>>>>>>>> But that my be wrong in the corner case where the size of the >>>>>>>>>> CodeBlob's payload is zero (i.e. the CodeBlob consists only of the >>>>>>>>>> 'header' - i.e. the C++ object itself) because in that case >>>>>>>>>> CodeBlob::code_begin() points right behind the CodeBlob's header >>>>>>>>>> which >>>>>>>>>> is a memory location which doesn't belong to the CodeBlob anymore. 
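The corner case described above can be reproduced with a toy model. This is a sketch only: ToyHeap and ToyBlob are hypothetical types that use plain integers as addresses, and they cover only a blob whose header lives inside the heap (the normal CodeCache case, not the AOT layout mentioned elsewhere in the thread).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model, not HotSpot code: addresses are plain integers.
struct ToyHeap {
  uintptr_t low;   // first address that belongs to the heap
  uintptr_t high;  // one past the last address that belongs to the heap
  bool contains(uintptr_t p) const { return low <= p && p < high; }
};

struct ToyBlob {
  uintptr_t start;      // address of the blob header (the object itself)
  size_t header_size;   // size of the header
  size_t payload_size;  // size of the code that follows the header
  uintptr_t code_begin() const { return start + header_size; }
};

// Inclusion check based on the blob's own address.
static bool contains_blob_by_address(const ToyHeap& h, const ToyBlob& b) {
  return h.contains(b.start);
}

// Inclusion check based on the start of the payload.
static bool contains_blob_by_code_begin(const ToyHeap& h, const ToyBlob& b) {
  return h.contains(b.code_begin());
}
```

For a header-only blob placed in the last 64 bytes of a heap covering [1000, 2000), code_begin() is 2000, so the payload-based check rejects a blob that the address-based check accepts.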
>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> I recall this change was somehow necessary to allow merging >>>>>>>>> AOTCodeHeap::contains_blob and CodeHead::contains_blob into >>>>>>>>> one devirtualized method, so you need to ensure all AOT tests >>>>>>>>> pass with this change (on linux-x64). >>>>>>>>> >>>>>>>> >>>>>>>> All of hotspot/test/aot and hotspot/test/jvmci executed and passed >>>>>>>> successful. Are there any other tests I should check? >>>>>>>> >>>>>>>> That said, it is a little hard to follow the stages of your change. >>>>>>>> It >>>>>>>> seems like >>>>>>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.00/ >>>>>>>> was reviewed [1] but then finally the slightly changed version from >>>>>>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.01/ >>>>>>>> was >>>>>>>> checked in and linked to the bug report. >>>>>>>> >>>>>>>> The first, reviewed version of the change still had a correct >>>>>>>> version >>>>>>>> of 'CodeHeap::contains_blob(const CodeBlob* blob)' while the second, >>>>>>>> checked in version has the faulty version of that method. >>>>>>>> >>>>>>>> I don't know why you finally did that change to 'contains_blob()' >>>>>>>> but >>>>>>>> I don't see any reason why we shouldn't be able to directly use the >>>>>>>> blob's address for inclusion checking. From what I understand, it >>>>>>>> should ALWAYS be contained in the corresponding CodeHeap so no >>>>>>>> reason >>>>>>>> to mess with 'CodeBlob::code_begin()'. >>>>>>>> >>>>>>>> Please let me know if I'm missing something. >>>>>>>> >>>>>>>> [1] >>>>>>>> >>>>>>>> >>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-July/026624.html >>>>>>>> >>>>>>>>> I can't help to wonder if we'd not be better served by disallowing >>>>>>>>> zero-sized payloads. Is this something that can ever actually >>>>>>>>> happen except by abuse of the white box API? 
>>>>>>>>> >>>>>>>> >>>>>>>> The corresponding test (ReturnBlobToWrongHeapTest.java) specifically >>>>>>>> wants to allocate "segment sized" blocks which is most easily >>>>>>>> achieved >>>>>>>> by allocating zero-sized CodeBlobs. And I think there's nothing >>>>>>>> wrong >>>>>>>> about it if we handle the inclusion tests correctly. >>>>>>>> >>>>>>>> Thank you and best regards, >>>>>>>> Volker >>>>>>>> >>>>>>>>> /Claes > > From aph at redhat.com Wed Sep 6 13:20:03 2017 From: aph at redhat.com (Andrew Haley) Date: Wed, 6 Sep 2017 14:20:03 +0100 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59AE5AA4.1070803@oracle.com> References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <6f623116-8213-1523-ba37-fb4ce17c4afa@oracle.com> <59AD21D6.8040305@oracle.com> <59AD2A64.3070507@oracle.com> <59AD55AC.4030105@oracle.com> <3a6fbae3-cddb-6ac0-890d-da4b33308b5e@redhat.com> <59AE5AA4.1070803@oracle.com> Message-ID: On 05/09/17 09:04, Erik Österlund wrote: > For example, our atomics typically conservatively guarantees > bidirectional full fencing, while theirs does not. Firstly, we can insert whatever fences we want, using intrinsics. We don't need assembly language to do that. Secondly, I don't see bidirectional full fencing in x86 atomics, and I don't think we really want bidirectional full fencing anyway. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From felix.yang at linaro.org Wed Sep 6 13:55:52 2017 From: felix.yang at linaro.org (Felix Yang) Date: Wed, 6 Sep 2017 21:55:52 +0800 Subject: Question regarding "native-compiled frame" Message-ID: Hi, Can someone help explain what is a so called "native-compiled frame" in hotspot please?
This is mentioned in the frame::sender function which is defined in file frame_x86.cpp: ---------------------------------------------------------------------------------------------------------- frame frame::sender(RegisterMap* map) const { // Default is we done have to follow them. The sender_for_xxx will // update it accordingly map->set_include_argument_oops(false); if (is_entry_frame()) return sender_for_entry_frame(map); if (is_interpreted_frame()) return sender_for_interpreter_frame(map); assert(_cb == CodeCache::find_blob(pc()),"Must be the same"); if (_cb != NULL) { return sender_for_compiled_frame(map); } // Must be native-compiled frame, i.e. the marshaling code for native // methods that exists in the core system. => return frame(sender_sp(), link(), sender_pc()); } ---------------------------------------------------------------------------------------------------------- I did some experiments and found that the last "return" statement of the function never gets executed when I do a specjbb2005 & specjbb2015 test. From the comments in the code, this "return" statement handles the "native-compiled frame" case. I am curious about the condition of generating a native-compile frame in hotspot. Thanks for your help, Felix From volker.simonis at gmail.com Wed Sep 6 14:02:53 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 6 Sep 2017 16:02:53 +0200 Subject: RFR(XS): 8187280: Remove unused methods from StubQueue Message-ID: Hi, can I please get a review and a sponsor for the following trivial change: http://cr.openjdk.java.net/~simonis/webrevs/2017/8187280/ https://bugs.openjdk.java.net/browse/JDK-8187280 While working on "8166317: InterpreterCodeSize should be computed" I found that several methods on StubQueue are not used any more in the current code base. 
As StubQueue's code is "more subtle than it looks" (see stubs.cpp) I think it helps to at least remove the unused parts :) Tested by doing a product/slowdebug build and running the hotspot regression tests. Thank you and best regards, Volker From aph at redhat.com Wed Sep 6 14:10:31 2017 From: aph at redhat.com (Andrew Haley) Date: Wed, 6 Sep 2017 15:10:31 +0100 Subject: Question regarding "native-compiled frame" In-Reply-To: References: Message-ID: On 06/09/17 14:55, Felix Yang wrote: > Can some help explain what is a so called "native-compiled frame" in > hotspot please? Compiled by C++. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From erik.osterlund at oracle.com Wed Sep 6 14:13:03 2017 From: erik.osterlund at oracle.com (Erik Österlund) Date: Wed, 6 Sep 2017 16:13:03 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <6f623116-8213-1523-ba37-fb4ce17c4afa@oracle.com> <59AD21D6.8040305@oracle.com> <59AD2A64.3070507@oracle.com> <59AD55AC.4030105@oracle.com> <3a6fbae3-cddb-6ac0-890d-da4b33308b5e@redhat.com> <59AE5AA4.1070803@oracle.com> Message-ID: <59B0026F.3040406@oracle.com> Hi Andrew, On 2017-09-06 15:20, Andrew Haley wrote: > On 05/09/17 09:04, Erik Österlund wrote: > >> For example, our atomics typically conservatively guarantees >> bidirectional full fencing, while theirs does not. > Firstly, we can insert whatever fences we want, using intrinsics. We > don't need assembly language to do that. Since I thought we already had (and finished) that discussion, and it is no longer relevant to the current proposal of removing inc/dec specializations, I hope you are okay with me preferring not to re-open that discussion in this RFE. Another day, perhaps?
> Secondly, I don't see bidirectional full fencing in x86 atomics, and I > don't think we really want bidirectional full fencing anyway. That is because an atomic x86 locked instruction is observably equivalent to having bidirectional fencing surrounding the access due to the stronger memory model of the machine. Thanks, /Erik From volker.simonis at gmail.com Wed Sep 6 14:16:02 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 6 Sep 2017 16:16:02 +0200 Subject: RFR(M): 8166317: InterpreterCodeSize should be computed In-Reply-To: <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com> References: <50cda0ab-f403-372a-ce51-1a27d8821448@oracle.com> <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com> <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com> Message-ID: On Tue, Sep 5, 2017 at 7:17 PM, Vladimir Kozlov wrote: > On 9/5/17 9:49 AM, Volker Simonis wrote: >> >> On Fri, Sep 1, 2017 at 6:16 PM, Vladimir Kozlov >> wrote: >>> >>> May be add new CodeBlob's method to adjust sizes instead of directly >>> setting >>> them in CodeCache::free_unused_tail(). Then you would not need friend >>> class >>> CodeCache in CodeBlob. >>> >> >> Changed as suggested (I didn't liked the friend declaration as well :) >> >>> Also I think adjustment to header_size should be done in >>> CodeCache::free_unused_tail() to limit scope of code who knows about blob >>> layout. >>> >> >> Yes, that's much cleaner. Please find the updated webrev here: >> >> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/ > > > Good. > >> >> I've also found another "day 1" problem in StubQueue::next(): >> >> Stub* next(Stub* s) const { int i = >> index_of(s) + stub_size(s); >> - if (i == >> _buffer_limit) i = 0; >> + // Only wrap >> around in the non-contiguous case (see stubss.cpp) >> + if (i == >> _buffer_limit && _queue_end < _buffer_limit) i = 0; >> return (i == >> _queue_end) ? 
NULL : stub_at(i); >> } >> >> The problem was that the method was not prepared to handle the case >> where _buffer_limit == _queue_end == _buffer_size which lead to an >> infinite recursion when iterating over a StubQueue with >> StubQueue::next() until next() returns NULL (as this was for example >> done with -XX:+PrintInterpreter). But with the new, trimmed CodeBlob >> we run into exactly this situation. > > > Okay. > >> >> While doing this last fix I also noticed that "StubQueue::stubs_do()", >> "StubQueue::queues_do()" and "StubQueue::register_queue()" don't seem >> to be used anywhere in the open code base (please correct me if I'm >> wrong). What do you think, maybe we should remove this code in a >> follow up change if it is really not needed? > > > register_queue() is used in constructor. Other 2 you can remove. > stub_code_begin() and stub_code_end() are not used too -remove. > I thought we run on linux with flag which warn about unused code. > Yes, but register_queue() is only required for iterating over all queues with queues_do() so we can remove it as well if we remove queues_do(). I've opened "8187280: Remove unused methods from StubQueue" for this and started a new review thread at: http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028283.html Please let's follow up on this issue there. >> >> Finally, could you please run the new version through JPRT and sponsor >> it once jdk10/hs will be opened again? > > > Will do when jdk10 "consolidation" is finished. Please, remind me later if I > forget. 
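The StubQueue::next() change quoted earlier in this message can be illustrated with a toy model of the index arithmetic. This is a sketch under stated assumptions: ToyQueue is hypothetical, every stub has the same size, and kEnd stands in for the NULL that the real method returns. Only the two conditions are taken from the quoted diff.

```cpp
#include <cassert>

static const int kEnd = -1;  // stands in for NULL in this model

// Toy model of the index arithmetic only; not the HotSpot class.
struct ToyQueue {
  int stub_size;     // every stub has the same size in this model
  int queue_end;     // index one past the last stub
  int buffer_limit;  // index where the used part of the buffer ends

  // Behaviour before the fix: always wrap at buffer_limit.
  int next_old(int s) const {
    int i = s + stub_size;
    if (i == buffer_limit) i = 0;
    return (i == queue_end) ? kEnd : i;
  }

  // Behaviour after the fix: wrap only in the non-contiguous case.
  int next_new(int s) const {
    int i = s + stub_size;
    if (i == buffer_limit && queue_end < buffer_limit) i = 0;
    return (i == queue_end) ? kEnd : i;
  }
};

// Walk from index 0 and count visited stubs, giving up after max_steps.
static int count_stubs(const ToyQueue& q, bool fixed, int max_steps) {
  int n = 0;
  for (int s = 0; s != kEnd && n < max_steps; ) {
    ++n;
    s = fixed ? q.next_new(s) : q.next_old(s);
  }
  return n;
}
```

When queue_end equals buffer_limit (a completely filled, contiguous queue), next_old() wraps to index 0 and never reports the end, so a walk only stops at the step limit. next_new() reports the end after the last stub.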
> Thanks, Volker > Thanks, > Vladimir > > >> >> Thanks, >> Volker >> >>> Thanks, >>> Vladimir >>> >>> >>> On 9/1/17 8:46 AM, Volker Simonis wrote: >>>> >>>> >>>> Hi, >>>> >>>> I've decided to split the fix for the 'CodeHeap::contains_blob()' >>>> problem into its own issue "8187091: ReturnBlobToWrongHeapTest fails >>>> because of problems in CodeHeap::contains_blob()" >>>> (https://bugs.openjdk.java.net/browse/JDK-8187091) and started a new >>>> review thread for discussing it at: >>>> >>>> >>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028206.html >>>> >>>> So please lets keep this thread for discussing the interpreter code >>>> size issue only. I've prepared a new version of the webrev which is >>>> the same as the first one with the only difference that the change to >>>> 'CodeHeap::contains_blob()' has been removed: >>>> >>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v1/ >>>> >>>> Thanks, >>>> Volker >>>> >>>> >>>> On Thu, Aug 31, 2017 at 6:35 PM, Volker Simonis >>>> wrote: >>>>> >>>>> >>>>> On Thu, Aug 31, 2017 at 6:05 PM, Vladimir Kozlov >>>>> wrote: >>>>>> >>>>>> >>>>>> Very good change. Thank you, Volker. >>>>>> >>>>>> About contains_blob(). The problem is that AOTCompiledMethod allocated >>>>>> in >>>>>> CHeap and not in aot code section (which is RO): >>>>>> >>>>>> >>>>>> >>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124 >>>>>> >>>>>> It is allocated in CHeap after AOT library is loaded. Its code_begin() >>>>>> points to AOT code section but AOTCompiledMethod* points outside it >>>>>> (to >>>>>> normal malloced space) so you can't use (char*)blob address. >>>>>> >>>>> >>>>> Thanks for the explanation - now I got it. >>>>> >>>>>> There are 2 ways to fix it, I think. >>>>>> One is to add new field to CodeBlobLayout and set it to blob* address >>>>>> for >>>>>> normal CodeCache blobs and to code_begin for AOT code. 
>>>>>> Second is to use contains(blob->code_end() - 1) assuming that AOT code >>>>>> is >>>>>> never zero. >>>>>> >>>>> >>>>> I'll give it a try tomorrow and will send out a new webrev. >>>>> >>>>> Regards, >>>>> Volker >>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> >>>>>> On 8/31/17 5:43 AM, Volker Simonis wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Thu, Aug 31, 2017 at 12:14 PM, Claes Redestad >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 2017-08-31 08:54, Volker Simonis wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> While working on this, I found another problem which is related to >>>>>>>>> the >>>>>>>>> fix of JDK-8183573 and leads to crashes when executing the JTreg >>>>>>>>> test >>>>>>>>> compiler/codecache/stress/ReturnBlobToWrongHeapTest.java. >>>>>>>>> >>>>>>>>> The problem is that JDK-8183573 replaced >>>>>>>>> >>>>>>>>> virtual bool contains_blob(const CodeBlob* blob) const { >>>>>>>>> return >>>>>>>>> low_boundary() <= (char*) blob && (char*) blob < high(); } >>>>>>>>> >>>>>>>>> by: >>>>>>>>> >>>>>>>>> bool contains_blob(const CodeBlob* blob) const { return >>>>>>>>> contains(blob->code_begin()); } >>>>>>>>> >>>>>>>>> But that my be wrong in the corner case where the size of the >>>>>>>>> CodeBlob's payload is zero (i.e. the CodeBlob consists only of the >>>>>>>>> 'header' - i.e. the C++ object itself) because in that case >>>>>>>>> CodeBlob::code_begin() points right behind the CodeBlob's header >>>>>>>>> which >>>>>>>>> is a memory location which doesn't belong to the CodeBlob anymore. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I recall this change was somehow necessary to allow merging >>>>>>>> AOTCodeHeap::contains_blob and CodeHead::contains_blob into >>>>>>>> one devirtualized method, so you need to ensure all AOT tests >>>>>>>> pass with this change (on linux-x64). >>>>>>>> >>>>>>> >>>>>>> All of hotspot/test/aot and hotspot/test/jvmci executed and passed >>>>>>> successful. 
Are there any other tests I should check? >>>>>>> >>>>>>> That said, it is a little hard to follow the stages of your change. >>>>>>> It >>>>>>> seems like >>>>>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.00/ >>>>>>> was reviewed [1] but then finally the slightly changed version from >>>>>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.01/ >>>>>>> was >>>>>>> checked in and linked to the bug report. >>>>>>> >>>>>>> The first, reviewed version of the change still had a correct version >>>>>>> of 'CodeHeap::contains_blob(const CodeBlob* blob)' while the second, >>>>>>> checked in version has the faulty version of that method. >>>>>>> >>>>>>> I don't know why you finally did that change to 'contains_blob()' but >>>>>>> I don't see any reason why we shouldn't be able to directly use the >>>>>>> blob's address for inclusion checking. From what I understand, it >>>>>>> should ALWAYS be contained in the corresponding CodeHeap so no reason >>>>>>> to mess with 'CodeBlob::code_begin()'. >>>>>>> >>>>>>> Please let me know if I'm missing something. >>>>>>> >>>>>>> [1] >>>>>>> >>>>>>> >>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-July/026624.html >>>>>>> >>>>>>>> I can't help to wonder if we'd not be better served by disallowing >>>>>>>> zero-sized payloads. Is this something that can ever actually >>>>>>>> happen except by abuse of the white box API? >>>>>>>> >>>>>>> >>>>>>> The corresponding test (ReturnBlobToWrongHeapTest.java) specifically >>>>>>> wants to allocate "segment sized" blocks which is most easily >>>>>>> achieved >>>>>>> by allocation zero-sized CodeBlobs. And I think there's nothing wrong >>>>>>> about it if we handle the inclusion tests correctly. 
>>>>>>> >>>>>>> Thank you and best regards, >>>>>>> Volker >>>>>>> >>>>>>>> /Claes From coleen.phillimore at oracle.com Wed Sep 6 16:04:45 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 6 Sep 2017 12:04:45 -0400 Subject: RFR (L) 8186777: Make Klass::_java_mirror an OopHandle Message-ID: <9adb92ce-fb2d-11df-01dc-722e482a4d40@oracle.com> Summary: Add indirection for fetching mirror so that GC doesn't have to follow CLD::_klasses Thank you to Tom Rodriguez for Graal changes and Rickard for the C2 changes. Ran nightly tests through Mach5 and RBT. Early performance testing showed good performance improvement in GC class loader data processing time, but nmethod processing time continues to dominate. Also performance testing showed no throughput regression. I'm rerunning both of these performance tests and will post the numbers. bug link https://bugs.openjdk.java.net/browse/JDK-8186777 local webrev at http://oklahoma.us.oracle.com/~cphillim/webrev/8186777.01/webrev Thanks, Coleen From vladimir.kozlov at oracle.com Wed Sep 6 16:14:34 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 6 Sep 2017 09:14:34 -0700 Subject: RFR(XS): 8187280: Remove unused methods from StubQueue In-Reply-To: References: Message-ID: <63AB2BF3-F230-4A5E-963F-B9958FDA6F58@oracle.com> What about stub_code_begin and stub_code_end? Thanks Vladimir Sent from my iPhone > On Sep 6, 2017, at 7:02 AM, Volker Simonis wrote: > > Hi, > > can I please get a review and a sponsor for the following trivial change: > > http://cr.openjdk.java.net/~simonis/webrevs/2017/8187280/ > https://bugs.openjdk.java.net/browse/JDK-8187280 > > > While working on "8166317: InterpreterCodeSize should be computed" I > found that several methods on StubQueue are not used any more in the > current code base.
As StubQueue's code is "more subtle than it looks" > (see stubs.cpp) I think it helps to at least remove the unused parts > :) > > Tested by doing a product/slowdebug build and running the hotspot > regression tests. > > Thank you and best regards, > Volker From volker.simonis at gmail.com Wed Sep 6 17:31:44 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 6 Sep 2017 19:31:44 +0200 Subject: RFR(XS): 8187280: Remove unused methods from StubQueue In-Reply-To: <63AB2BF3-F230-4A5E-963F-B9958FDA6F58@oracle.com> References: <63AB2BF3-F230-4A5E-963F-B9958FDA6F58@oracle.com> Message-ID: On Wed, Sep 6, 2017 at 6:14 PM, Vladimir Kozlov wrote: > What about stub_code_begin and stub_code_end? > You're right! Removed them as well: http://cr.openjdk.java.net/~simonis/webrevs/2017/8187280.v1/ Thanks, Volker > Thanks > Vladimir > > Sent from my iPhone > >> On Sep 6, 2017, at 7:02 AM, Volker Simonis wrote: >> >> Hi, >> >> can I please get a review and a sponsor for the following trivial change: >> >> http://cr.openjdk.java.net/~simonis/webrevs/2017/8187280/ >> https://bugs.openjdk.java.net/browse/JDK-8187280 >> >> >> While working on "8166317: InterpreterCodeSize should be computed" I >> found that several methods on StubQueue are not used any more in the >> current code base. As StubQueue's code is "more subtle than it looks" >> (see stubs.cpp) I think it helps to at least remove the unused parts >> :) >> >> Tested by doing a product/slowdebug build and running the hotspot >> regression tests. 
>> >> Thank you and best regards, >> Volker > From vladimir.kozlov at oracle.com Wed Sep 6 20:06:47 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 6 Sep 2017 13:06:47 -0700 Subject: RFR(XS): 8187280: Remove unused methods from StubQueue In-Reply-To: References: <63AB2BF3-F230-4A5E-963F-B9958FDA6F58@oracle.com> Message-ID: Good Thanks Vladimir > On Sep 6, 2017, at 10:31 AM, Volker Simonis wrote: > > On Wed, Sep 6, 2017 at 6:14 PM, Vladimir Kozlov > wrote: >> What about stub_code_begin and stub_code_end? >> > > You're right! Removed them as well: > > http://cr.openjdk.java.net/~simonis/webrevs/2017/8187280.v1/ > > Thanks, > Volker > >> Thanks >> Vladimir >> >> Sent from my iPhone >> >>> On Sep 6, 2017, at 7:02 AM, Volker Simonis wrote: >>> >>> Hi, >>> >>> can I please get a review and a sponsor for the following trivial change: >>> >>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8187280/ >>> https://bugs.openjdk.java.net/browse/JDK-8187280 >>> >>> >>> While working on "8166317: InterpreterCodeSize should be computed" I >>> found that several methods on StubQueue are not used any more in the >>> current code base. As StubQueue's code is "more subtle than it looks" >>> (see stubs.cpp) I think it helps to at least remove the unused parts >>> :) >>> >>> Tested by doing a product/slowdebug build and running the hotspot >>> regression tests. 
>>> >>> Thank you and best regards, >>> Volker >> From coleen.phillimore at oracle.com Thu Sep 7 01:47:27 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 6 Sep 2017 21:47:27 -0400 Subject: RFR (L) 8186777: Make Klass::_java_mirror an OopHandle In-Reply-To: <9adb92ce-fb2d-11df-01dc-722e482a4d40@oracle.com> References: <9adb92ce-fb2d-11df-01dc-722e482a4d40@oracle.com> Message-ID: On 9/6/17 12:04 PM, coleen.phillimore at oracle.com wrote: > Summary: Add indirection for fetching mirror so that GC doesn't have > to follow CLD::_klasses > > Thank you to Tom Rodriguez for Graal changes and Rickard for the C2 > changes. > > Ran nightly tests through Mach5 and RBT. Early performance testing > showed good performance improvement in GC class loader data processing > time, but nmethod processing time continues to dominate. Also > performance testing showed no throughput regression. I'm rerunning > both of these performance tests and will post the numbers.
> > bug link https://bugs.openjdk.java.net/browse/JDK-8186777 > local webrev at > http://oklahoma.us.oracle.com/~cphillim/webrev/8186777.01/webrev Sorry, the open webrev is at http://cr.openjdk.java.net/~coleenp/8186777.01/webrev Coleen > > Thanks, > Coleen From david.holmes at oracle.com Thu Sep 7 02:10:59 2017 From: david.holmes at oracle.com (David Holmes) Date: Thu, 7 Sep 2017 12:10:59 +1000 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59AE93EE.2010600@oracle.com> References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> <59A93B53.9010505@oracle.com> <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com> <59ACFD76.3000606@oracle.com> <8af30028-5ab1-5623-21c7-f6e2ce2ab8cb@oracle.com> <14533625-0A6E-45FE-8EF2-65CE40931D94@oracle.com> <59AE74AF.70308@oracle.com> <92BFF96B-BB27-4740-9C8B-6BDC3EAC31F7@oracle.com> <59AE93EE.2010600@oracle.com> Message-ID: <724317c6-0773-a6ac-bfba-5b9e97ca85c6@oracle.com> On 5/09/2017 10:09 PM, Erik Österlund wrote: > Okay, great. So far it sounds like as for Atomic::inc/dec, there are no > loud voices against the idea of removing the Atomic::inc/dec > specializations. So I propose this new webrev that does exactly that. > > Full webrev: > http://cr.openjdk.java.net/~eosterlund/8186838/webrev.01/ > > Incremental over last webrev: > http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00_01/ > > I hope this looks simpler. Yes this is much simpler. I am still totally dismayed by the complexity that was needed to retain the inc/dec specializations. To me it just screams that there is something fundamentally wrong with what was being done. :( I'm also somewhat perplexed. I can't read inline assembly fluently, but looking at the existing inc_ptr implementations, eg for x86, I'm not seeing code that adds 1*sizeof(*dest). ??
Thanks, David > Thanks, > /Erik From rohitarulraj at gmail.com Thu Sep 7 05:24:21 2017 From: rohitarulraj at gmail.com (Rohit Arul Raj) Date: Thu, 7 Sep 2017 10:54:21 +0530 Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults In-Reply-To: <45619e1a-9eb0-a540-193b-5187da3bf6bc@oracle.com> References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com> <4d4fe028-ea6a-4f77-ab69-5c2bc752e1f5@oracle.com> <47bc0a90-ed6a-220a-c3d1-b4df2d8bbc74@oracle.com> <9c53f889-e58e-33ac-3c05-874779b469d6@oracle.com> <45619e1a-9eb0-a540-193b-5187da3bf6bc@oracle.com> Message-ID: Hello Vladimir, David, > >> >> You added: >> >> 526 if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >> 527 result |= CPU_HT; >> >> and I'm wondering if there would be any case where this would not be >> covered by the earlier: >> >> 448 if (threads_per_core() > 1) >> 449 result |= CPU_HT; >> >> ? >> --- > > > Valid question. > Thanks for your review and comments. I have updated the patch to calculate threads per core by using the CPUID bit field CPUID_Fn8000001E_EBX [15:8]. Reference: https://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf [Pg 82] CPUID_Fn8000001E_EBX [Core Identifiers] (CoreId) 15:8 ThreadsPerCore: threads per core. Read-only. Reset: XXh. The number of threads per core is ThreadsPerCore+1.
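The field extraction described in the reference above can be sketched as follows. This is an illustration only: threads_per_core_from_ebx() is a hypothetical helper, not a function from the patch, and it takes the raw EBX value of CPUID leaf 0x8000001E as a plain integer instead of going through a bit-field union.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: 'ebx' is the EBX value returned by CPUID leaf 0x8000001E.
static uint32_t threads_per_core_from_ebx(uint32_t ebx) {
  uint32_t field = (ebx >> 8) & 0xFFu;  // bits 15:8, the ThreadsPerCore field
  return field + 1;                     // the field stores the count minus one
}
```

For example, a field value of 1 in bits 15:8 (EBX = 0x00000100) corresponds to two threads per core, and a field value of 0 to one thread per core.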
diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp --- a/src/cpu/x86/vm/vm_version_x86.cpp +++ b/src/cpu/x86/vm/vm_version_x86.cpp @@ -70,7 +70,7 @@ bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2); Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4; - Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, done, wrapup; + Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, ext_cpuid8, done, wrapup; Label legacy_setup, save_restore_except, legacy_save_restore, start_simd_check; StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub"); @@ -272,9 +272,23 @@ __ jccb(Assembler::belowEqual, ext_cpuid5); __ cmpl(rax, 0x80000007); // Is cpuid(0x80000008) supported? __ jccb(Assembler::belowEqual, ext_cpuid7); + __ cmpl(rax, 0x80000008); // Is cpuid(0x8000001E) supported? + __ jccb(Assembler::belowEqual, ext_cpuid8); + // + // Extended cpuid(0x8000001E) + // + __ movl(rax, 0x8000001E); + __ cpuid(); + __ lea(rsi, Address(rbp, in_bytes(VM_Version::ext_cpuid1E_offset()))); + __ movl(Address(rsi, 0), rax); + __ movl(Address(rsi, 4), rbx); + __ movl(Address(rsi, 8), rcx); + __ movl(Address(rsi,12), rdx); + // // Extended cpuid(0x80000008) // + __ bind(ext_cpuid8); __ movl(rax, 0x80000008); __ cpuid(); __ lea(rsi, Address(rbp, in_bytes(VM_Version::ext_cpuid8_offset()))); @@ -1109,11 +1123,27 @@ } #ifdef COMPILER2 - if (MaxVectorSize > 16) { - // Limit vectors size to 16 bytes on current AMD cpus. + if (cpu_family() < 0x17 && MaxVectorSize > 16) { + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
FLAG_SET_DEFAULT(MaxVectorSize, 16); } #endif // COMPILER2 + + // Some defaults for AMD family 17h + if ( cpu_family() == 0x17 ) { + // On family 17h processors use XMM and UnalignedLoadStores for Array Copy + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); + } + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); + } +#ifdef COMPILER2 + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { + FLAG_SET_DEFAULT(UseFPUForSpilling, true); + } +#endif + } } if( is_intel() ) { // Intel cpus specific settings diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp --- a/src/cpu/x86/vm/vm_version_x86.hpp +++ b/src/cpu/x86/vm/vm_version_x86.hpp @@ -228,6 +228,15 @@ } bits; }; + union ExtCpuid1EEx { + uint32_t value; + struct { + uint32_t : 8, + threads_per_core : 8, + : 16; + } bits; + }; + union XemXcr0Eax { uint32_t value; struct { @@ -398,6 +407,12 @@ ExtCpuid8Ecx ext_cpuid8_ecx; uint32_t ext_cpuid8_edx; // reserved + // cpuid function 0x8000001E // AMD 17h + uint32_t ext_cpuid1E_eax; + ExtCpuid1EEx ext_cpuid1E_ebx; // threads per core (AMD17h) + uint32_t ext_cpuid1E_ecx; + uint32_t ext_cpuid1E_edx; // unused currently + // extended control register XCR0 (the XFEATURE_ENABLED_MASK register) XemXcr0Eax xem_xcr0_eax; uint32_t xem_xcr0_edx; // reserved @@ -505,6 +520,14 @@ result |= CPU_CLMUL; if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) result |= CPU_RTM; + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) + result |= CPU_ADX; + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) + result |= CPU_BMI2; + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) + result |= CPU_SHA; + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) + result |= CPU_FMA; // AMD features. if (is_amd()) { @@ -518,16 +541,8 @@ } // Intel features. 
if(is_intel()) { - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) - result |= CPU_ADX; - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) - result |= CPU_BMI2; - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) - result |= CPU_SHA; if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) result |= CPU_LZCNT; - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) - result |= CPU_FMA; // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { result |= CPU_3DNOW_PREFETCH; @@ -590,6 +605,7 @@ static ByteSize ext_cpuid5_offset() { return byte_offset_of(CpuidInfo, ext_cpuid5_eax); } static ByteSize ext_cpuid7_offset() { return byte_offset_of(CpuidInfo, ext_cpuid7_eax); } static ByteSize ext_cpuid8_offset() { return byte_offset_of(CpuidInfo, ext_cpuid8_eax); } + static ByteSize ext_cpuid1E_offset() { return byte_offset_of(CpuidInfo, ext_cpuid1E_eax); } static ByteSize tpl_cpuidB0_offset() { return byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } static ByteSize tpl_cpuidB1_offset() { return byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } static ByteSize tpl_cpuidB2_offset() { return byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } @@ -673,8 +689,11 @@ if (is_intel() && supports_processor_topology()) { result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { - result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / - cores_per_cpu(); + if (cpu_family() >= 0x17) + result = _cpuid_info.ext_cpuid1E_ebx.bits.threads_per_core + 1; + else + result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / + cores_per_cpu(); } return (result == 0 ? 1 : result); } I have attached the patch for review. Please let me know your comments. Thanks, Rohit > Thanks, > Vladimir > > >> >> src/cpu/x86/vm/vm_version_x86.cpp >> >> No comments on AMD specific changes. 
>> >> Thanks, >> David >> ----- >> >> On 5/09/2017 3:43 PM, David Holmes wrote: >>> >>> On 5/09/2017 3:29 PM, Rohit Arul Raj wrote: >>>> >>>> Hello David, >>>> >>>> On Tue, Sep 5, 2017 at 10:31 AM, David Holmes >>>> wrote: >>>>> >>>>> Hi Rohit, >>>>> >>>>> I was unable to apply your patch to latest jdk10/hs/hotspot repo. >>>>> >>>> >>>> I checked out the latest jdk10/hs/hotspot [parent: 13548:1a9c2e07a826] >>>> and was able to apply the patch [epyc-amd17h-defaults-3Sept.patch] >>>> without any issues. >>>> Can you share the error message that you are getting? >>> >>> >>> I was getting this: >>> >>> applying hotspot.patch >>> patching file src/cpu/x86/vm/vm_version_x86.cpp >>> Hunk #1 FAILED at 1108 >>> 1 out of 1 hunks FAILED -- saving rejects to file >>> src/cpu/x86/vm/vm_version_x86.cpp.rej >>> patching file src/cpu/x86/vm/vm_version_x86.hpp >>> Hunk #2 FAILED at 522 >>> 1 out of 2 hunks FAILED -- saving rejects to file >>> src/cpu/x86/vm/vm_version_x86.hpp.rej >>> abort: patch failed to apply >>> >>> but I started again and this time it applied fine, so not sure what was >>> going on there. >>> >>> Cheers, >>> David >>> >>>> Regards, >>>> Rohit >>>> >>>> >>>>> >>>>> >>>>> On 4/09/2017 2:42 AM, Rohit Arul Raj wrote: >>>>>> >>>>>> >>>>>> Hello Vladimir, >>>>>> >>>>>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> Hi Rohit, >>>>>>> >>>>>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Hello Vladimir, >>>>>>>> >>>>>>>>> Changes look good. Only question I have is about MaxVectorSize. It >>>>>>>>> is >>>>>>>>> set >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> 16 only in presence of AVX: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945 >>>>>>>>> >>>>>>>>> Does that code works for AMD 17h too? >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Thanks for pointing that out. 
Yes, the code works fine for AMD 17h. >>>>>>>> So >>>>>>>> I have removed the surplus check for MaxVectorSize from my patch. I >>>>>>>> have updated, re-tested and attached the patch. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Which check you removed? >>>>>>> >>>>>> >>>>>> My older patch had the below mentioned check which was required on >>>>>> JDK9 where the default MaxVectorSize was 64. It has been handled >>>>>> better in openJDK10. So this check is not required anymore. >>>>>> >>>>>> + // Some defaults for AMD family 17h >>>>>> + if ( cpu_family() == 0x17 ) { >>>>>> ... >>>>>> ... >>>>>> + if (MaxVectorSize > 32) { >>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>> + } >>>>>> .. >>>>>> .. >>>>>> + } >>>>>> >>>>>>>> >>>>>>>> I have one query regarding the setting of UseSHA flag: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821 >>>>>>>> >>>>>>>> AMD 17h has support for SHA. >>>>>>>> AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets >>>>>>>> enabled for it based on the availability of BMI2 and AVX2. Is there >>>>>>>> an >>>>>>>> underlying reason for this? I have handled this in the patch but >>>>>>>> just >>>>>>>> wanted to confirm. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> It was done with next changes which use only AVX2 and BMI2 >>>>>>> instructions >>>>>>> to >>>>>>> calculate SHA-256: >>>>>>> >>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974 >>>>>>> >>>>>>> I don't know if AMD 15h supports these instructions and can execute >>>>>>> that >>>>>>> code. You need to test it. >>>>>>> >>>>>> >>>>>> Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions, >>>>>> it should work. 
>>>>>> Confirmed by running following sanity tests: >>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java >>>>>> >>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java >>>>>> >>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java >>>>>> >>>>>> So I have removed those SHA checks from my patch too. >>>>>> >>>>>> Please find attached updated, re-tested patch. >>>>>> >>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> @@ -1109,11 +1109,27 @@ >>>>>> } >>>>>> >>>>>> #ifdef COMPILER2 >>>>>> - if (MaxVectorSize > 16) { >>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>> } >>>>>> #endif // COMPILER2 >>>>>> + >>>>>> + // Some defaults for AMD family 17h >>>>>> + if ( cpu_family() == 0x17 ) { >>>>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>>>> Array Copy >>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>> + } >>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>> { >>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>> + } >>>>>> +#ifdef COMPILER2 >>>>>> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>> + } >>>>>> +#endif >>>>>> + } >>>>>> } >>>>>> >>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> @@ -505,6 +505,14 @@ >>>>>> result |= CPU_CLMUL; >>>>>> if 
(_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>> result |= CPU_RTM; >>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>> + result |= CPU_ADX; >>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>> + result |= CPU_BMI2; >>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>> + result |= CPU_SHA; >>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>> + result |= CPU_FMA; >>>>>> >>>>>> // AMD features. >>>>>> if (is_amd()) { >>>>>> @@ -515,19 +523,13 @@ >>>>>> result |= CPU_LZCNT; >>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>> result |= CPU_SSE4A; >>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>> + result |= CPU_HT; >>>>>> } >>>>>> // Intel features. >>>>>> if(is_intel()) { >>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>> - result |= CPU_ADX; >>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>> - result |= CPU_BMI2; >>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>> - result |= CPU_SHA; >>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>> result |= CPU_LZCNT; >>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>> - result |= CPU_FMA; >>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>>> support for prefetchw >>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>> >>>>>> Please let me know your comments. >>>>>> >>>>>> Thanks for your time. >>>>>> Rohit >>>>>> >>>>>>>> >>>>>>>> Thanks for taking time to review the code. 
>>>>>>>> >>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>> } >>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>> } >>>>>>>> + if (supports_sha()) { >>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>> + } >>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics >>>>>>>> || >>>>>>>> UseSHA512Intrinsics) { >>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>> + warning("SHA instructions are not available on this CPU"); >>>>>>>> + } >>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>> + } >>>>>>>> >>>>>>>> // some defaults for AMD family 15h >>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>> @@ -1109,11 +1125,40 @@ >>>>>>>> } >>>>>>>> >>>>>>>> #ifdef COMPILER2 >>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>> } >>>>>>>> #endif // COMPILER2 >>>>>>>> + >>>>>>>> + // Some defaults for AMD family 17h >>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>> for >>>>>>>> Array Copy >>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>> + } >>>>>>>> + if (supports_sse2() && >>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>> + } >>>>>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>> { >>>>>>>> + FLAG_SET_DEFAULT(UseBMI2Instructions, true); >>>>>>>> + } >>>>>>>> + if (UseSHA) { >>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>>> functions not available on this CPU."); >>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>> + } >>>>>>>> + } >>>>>>>> +#ifdef COMPILER2 >>>>>>>> + if (supports_sse4_2()) { >>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>> + } >>>>>>>> + } >>>>>>>> +#endif >>>>>>>> + } >>>>>>>> } >>>>>>>> >>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>> result |= CPU_CLMUL; >>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>> result |= CPU_RTM; >>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>> + result |= CPU_ADX; >>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>> + result |= CPU_BMI2; >>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) 
>>>>>>>> + result |= CPU_SHA; >>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>> + result |= CPU_FMA; >>>>>>>> >>>>>>>> // AMD features. >>>>>>>> if (is_amd()) { >>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>> result |= CPU_LZCNT; >>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>> result |= CPU_SSE4A; >>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>> + result |= CPU_HT; >>>>>>>> } >>>>>>>> // Intel features. >>>>>>>> if(is_intel()) { >>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>> - result |= CPU_ADX; >>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>> - result |= CPU_BMI2; >>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>> - result |= CPU_SHA; >>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>> result |= CPU_LZCNT; >>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>> - result |= CPU_FMA; >>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>>>>> support for prefetchw >>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>> >>>>>>>> >>>>>>>> Regards, >>>>>>>> Rohit >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>> >>>>>>>>>>>> I think the patch needs updating for jdk10 as I already see a >>>>>>>>>>>> lot of >>>>>>>>>>>> logic >>>>>>>>>>>> around UseSHA in vm_version_x86.cpp. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> David >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks David, I will update the patch wrt JDK10 source base, test >>>>>>>>>>> and >>>>>>>>>>> resubmit for review. 
>>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Rohit >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi All, >>>>>>>>>> >>>>>>>>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>>>>>>>> 13519:71337910df60), did regression testing using jtreg ($make >>>>>>>>>> default) and didnt find any regressions. >>>>>>>>>> >>>>>>>>>> Can anyone please volunteer to review this patch which sets >>>>>>>>>> flag/ISA >>>>>>>>>> defaults for newer AMD 17h (EPYC) processor? >>>>>>>>>> >>>>>>>>>> ************************* Patch **************************** >>>>>>>>>> >>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>>> } >>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>> } >>>>>>>>>> + if (supports_sha()) { >>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>> + } >>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics >>>>>>>>>> || >>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>> CPU"); >>>>>>>>>> + } >>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>> + } >>>>>>>>>> >>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>> @@ -1109,11 +1125,43 @@ >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. 
>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>> } >>>>>>>>>> #endif // COMPILER2 >>>>>>>>>> + >>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>>>> for >>>>>>>>>> Array Copy >>>>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>> { >>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>> + } >>>>>>>>>> + if (supports_sse2() && >>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>> { >>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>> + } >>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>> + } >>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>> + } >>>>>>>>>> + if (UseSHA) { >>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>>>>> functions not available on this CPU."); >>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>> + } >>>>>>>>>> + } >>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>> + } >>>>>>>>>> + } >>>>>>>>>> +#endif >>>>>>>>>> + } >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>> if 
(_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>> result |= CPU_RTM; >>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>> >>>>>>>>>> // AMD features. >>>>>>>>>> if (is_amd()) { >>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>> + result |= CPU_HT; >>>>>>>>>> } >>>>>>>>>> // Intel features. >>>>>>>>>> if(is_intel()) { >>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>>> indicates >>>>>>>>>> support for prefetchw >>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>> >>>>>>>>>> ************************************************************** >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Rohit >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I would like an volunteer to review this patch (openJDK9) >>>>>>>>>>>>>>> which >>>>>>>>>>>>>>> sets >>>>>>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help >>>>>>>>>>>>>>> us >>>>>>>>>>>>>>> with >>>>>>>>>>>>>>> the commit process. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Unfortunately patches can not be accepted from systems outside >>>>>>>>>>>>>> the >>>>>>>>>>>>>> OpenJDK >>>>>>>>>>>>>> infrastructure and ... >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have also attached the patch (hg diff -g) for reference. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> ... unfortunately patches tend to get stripped by the mail >>>>>>>>>>>>>> servers. >>>>>>>>>>>>>> If >>>>>>>>>>>>>> the >>>>>>>>>>>>>> patch is small please include it inline. Otherwise you will >>>>>>>>>>>>>> need >>>>>>>>>>>>>> to >>>>>>>>>>>>>> find >>>>>>>>>>>>>> an >>>>>>>>>>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>> 3) I have done regression testing using jtreg ($make default) >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> didnt find any regressions. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sounds good, but until I see the patch it is hard to comment >>>>>>>>>>>>>> on >>>>>>>>>>>>>> testing >>>>>>>>>>>>>> requirements. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> David >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks David, >>>>>>>>>>>>> Yes, it's a small patch. >>>>>>>>>>>>> >>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>>>>>>>> } >>>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>>> } >>>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>> || >>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>>>>> CPU"); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>> + } >>>>>>>>>>>>> >>>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>> } >>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>> + >>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>> for >>>>>>>>>>>>> Array Copy >>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>> { >>>>>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>>>>> + } >>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>> { >>>>>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>>>>> + } >>>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>>>> { >>>>>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>>>>> + } >>>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto >>>>>>>>>>>>> hash >>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + } >>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + } >>>>>>>>>>>>> +#endif >>>>>>>>>>>>> + } >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp 
>>>>>>>>>>>>> @@ -513,6 +513,16 @@ >>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>>> } >>>>>>>>>>>>> // Intel features. >>>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>>> >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> Rohit >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> > From erik.osterlund at oracle.com Thu Sep 7 09:34:38 2017 From: erik.osterlund at oracle.com (Erik Österlund) Date: Thu, 7 Sep 2017 11:34:38 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <724317c6-0773-a6ac-bfba-5b9e97ca85c6@oracle.com> References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> <59A93B53.9010505@oracle.com> <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com> <59ACFD76.3000606@oracle.com> <8af30028-5ab1-5623-21c7-f6e2ce2ab8cb@oracle.com> <14533625-0A6E-45FE-8EF2-65CE40931D94@oracle.com> <59AE74AF.70308@oracle.com> <92BFF96B-BB27-4740-9C8B-6BDC3EAC31F7@oracle.com> <59AE93EE.2010600@oracle.com> <724317c6-0773-a6ac-bfba-5b9e97ca85c6@oracle.com> Message-ID: <59B112AE.9040403@oracle.com> Hi David, On 2017-09-07 04:10, David Holmes wrote: > > > On 5/09/2017 10:09 PM, Erik Österlund wrote: >> Okay, great. So far it sounds like as for Atomic::inc/dec, there are >> no loud voices against the idea of removing the Atomic::inc/dec >> specializations.
So I propose this new webrev that does exactly that. >> >> Full webrev: >> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.01/ >> >> Incremental over last webrev: >> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00_01/ >> >> I hope this looks simpler. > > Yes this is much simpler. Glad to hear it! > I am still totally dismayed by the complexity that was needed to > retain the inc/dec specializations. To me it just screams that there > is something fundamentally wrong with what was being done. :( Okay. > I'm also somewhat perplexed. I can't read inline assembly fluently, > but looking at the existing inc_ptr implementations, eg for x86, I'm > not seeing code that adds 1*sizeof(*dest). ?? Neither can I. Thanks, /Erik > Thanks, > David > >> Thanks, >> /Erik From thomas.stuefe at gmail.com Thu Sep 7 10:02:01 2017 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 7 Sep 2017 12:02:01 +0200 Subject: RFR(xxs): 8187230: [aix] Leave OS guard page size at default for non-java threads instead of explicitly setting it Message-ID: Hi all, may I please have a review for this small change: Bug: https://bugs.openjdk.java.net/browse/JDK-8187230 Webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8187230-aix-leave-os-guard-page-size-at-default-for-non-java-threads/webrev.00/webrev/ The change is very subtle. Before, we would set the OS guard page size for every thread: for java threads we would disable them, and for non-java threads we'd set them to 4K. Now, we still disable them for java threads but leave them at the OS default size for non-java threads. The really important part is the disabling of OS guard pages for java threads, where we have VM guard pages in place and do not want to spend more memory on OS guards. We do not really care about the exact size of the OS guard pages for non-java threads, and therefore should not set it; we should leave in place the size the OS deems sufficient.
That also spares us the complexity of handling the thread stack page size, which on AIX may be different from os::vm_page_size(). Thank you and Kind Regards, Thomas From felix.yang at linaro.org Thu Sep 7 10:35:20 2017 From: felix.yang at linaro.org (Felix Yang) Date: Thu, 7 Sep 2017 18:35:20 +0800 Subject: Question regarding "native-compiled frame" In-Reply-To: References: Message-ID: Hi, Thanks for the reply. Then when will the last return statement of frame::sender got a chance to be executed? As I see it, when JVM does something in safepoint state and need to traverse Java thread stack, we never calculate the sender of a native-compiled frame. Except for running Specjbb, I also performed a full jtreg test in order to catch one such case, but still the last return statement not hit. So it's strange to me that frame::sender needs to handle such a frame. Maybe I missed something. Could you elaborate more on this please? Thanks for your help, Felix On 6 September 2017 at 22:10, Andrew Haley wrote: > On 06/09/17 14:55, Felix Yang wrote: > > Can some help explain what is a so called "native-compiled frame" in > > hotspot please? > > Compiled by C++. > > -- > Andrew Haley > Java Platform Lead Engineer > Red Hat UK Ltd. 
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > From david.holmes at oracle.com Thu Sep 7 10:53:04 2017 From: david.holmes at oracle.com (David Holmes) Date: Thu, 7 Sep 2017 20:53:04 +1000 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59B112AE.9040403@oracle.com> References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> <59A93B53.9010505@oracle.com> <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com> <59ACFD76.3000606@oracle.com> <8af30028-5ab1-5623-21c7-f6e2ce2ab8cb@oracle.com> <14533625-0A6E-45FE-8EF2-65CE40931D94@oracle.com> <59AE74AF.70308@oracle.com> <92BFF96B-BB27-4740-9C8B-6BDC3EAC31F7@oracle.com> <59AE93EE.2010600@oracle.com> <724317c6-0773-a6ac-bfba-5b9e97ca85c6@oracle.com> <59B112AE.9040403@oracle.com> Message-ID: On 7/09/2017 7:34 PM, Erik ?sterlund wrote: > Hi David, > > On 2017-09-07 04:10, David Holmes wrote: >> >> >> On 5/09/2017 10:09 PM, Erik ?sterlund wrote: >>> Okay, great. So far it sounds like as for Atomic::inc/dec, there are >>> no loud voices against the idea of removing the Atomic::inc/dec >>> specializations. So I propose this new webrev that does exactly that. >>> >>> Full webrev: >>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.01/ >>> >>> Incremental over last webrev: >>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00_01/ >>> >>> I hope this looks simpler. >> >> Yes this is much simpler. > > Glad to hear it! > >> I am still totally dismayed by the complexity that was needed to >> retain the inc/dec specializations. To me it just screams that there >> is something fundamentally wrong with what was being done. :( > > Okay. > >> I'm also somewhat perplexed. I can't read inline assembly fluently, >> but looking at the existing inc_ptr implementations, eg for x86, I'm >> not seeing code that adds 1*sizeof(*dest). ?? > > Neither can I. Okay so ... something not right here. 
:) It seems to me that the uses of inc_ptr are actually just bumping a counter by 1, where the counter is defined to be the same size as a ptr, i.e. 32-bit on 32-bit and 64-bit on 64-bit. Cheers, David > Thanks, > /Erik > >> Thanks, >> David >> >>> Thanks, >>> /Erik > From adinn at redhat.com Thu Sep 7 11:04:46 2017 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 7 Sep 2017 12:04:46 +0100 Subject: Question regarding "native-compiled frame" In-Reply-To: References: Message-ID: On 07/09/17 11:35, Felix Yang wrote: > Thanks for the reply. > Then when will the last return statement of frame::sender got a chance to > be executed? > As I see it, when JVM does something in safepoint state and need to > traverse Java thread stack, we never calculate the sender of a > native-compiled frame. It is possible for Java to call out into the JVM and then for the JVM to call back into Java. For example, when a class is loaded the JVM calls into Java to run the class initializer. This re-entry may happen multiple times. In that case a stack walk under the re-entry may find a Java start frame, and its parent frame will be the native frame where Java entered the VM. Note that the native frame will always be returned by the call to sender_for_entry_frame(map). That method skips all the C frames between the Java entry frame and the native frame which exited Java. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No.
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From erik.osterlund at oracle.com Thu Sep 7 11:05:44 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Thu, 7 Sep 2017 13:05:44 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> <59A93B53.9010505@oracle.com> <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com> <59ACFD76.3000606@oracle.com> <8af30028-5ab1-5623-21c7-f6e2ce2ab8cb@oracle.com> <14533625-0A6E-45FE-8EF2-65CE40931D94@oracle.com> <59AE74AF.70308@oracle.com> <92BFF96B-BB27-4740-9C8B-6BDC3EAC31F7@oracle.com> <59AE93EE.2010600@oracle.com> <724317c6-0773-a6ac-bfba-5b9e97ca85c6@oracle.com> <59B112AE.9040403@oracle.com> Message-ID: <59B12808.605@oracle.com> Hi David, On 2017-09-07 12:53, David Holmes wrote: > On 7/09/2017 7:34 PM, Erik ?sterlund wrote: >> Hi David, >> >> On 2017-09-07 04:10, David Holmes wrote: >>> >>> >>> On 5/09/2017 10:09 PM, Erik ?sterlund wrote: >>>> Okay, great. So far it sounds like as for Atomic::inc/dec, there >>>> are no loud voices against the idea of removing the Atomic::inc/dec >>>> specializations. So I propose this new webrev that does exactly that. >>>> >>>> Full webrev: >>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.01/ >>>> >>>> Incremental over last webrev: >>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00_01/ >>>> >>>> I hope this looks simpler. >>> >>> Yes this is much simpler. >> >> Glad to hear it! >> >>> I am still totally dismayed by the complexity that was needed to >>> retain the inc/dec specializations. To me it just screams that there >>> is something fundamentally wrong with what was being done. :( >> >> Okay. >> >>> I'm also somewhat perplexed. 
I can't read inline assembly fluently, >>> but looking at the existing inc_ptr implementations, eg for x86, I'm >>> not seeing code that adds 1*sizeof(*dest). ?? >> >> Neither can I. > > Okay so ... something not right here. :) > > It seems to me that the uses of inc_ptr are actually just bumping a > counter by 1, where the counter is defined to be the same size as a > ptr ie 32-bit on 32-bit and 64-bit on 64-bit. That seems right to me. The old add/inc/dec_ptr overloads have the same semantics they have always had: they all forward the call to add 1/-1 with a destination of type pointer to intptr_t or char*. That is - either it performs normal integer addition (for intptr_t), or pointer scaled add (char*) where the element size of the pointer is 1 (sizeof(char)). In both cases inc_ptr will add by one when going down these old non-generalized explicit *_ptr overloads. Thanks, /Erik > Cheers, > David > >> Thanks, >> /Erik >> >>> Thanks, >>> David >>> >>>> Thanks, >>>> /Erik >> From goetz.lindenmaier at sap.com Thu Sep 7 12:20:30 2017 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 7 Sep 2017 12:20:30 +0000 Subject: RFR(xxs): 8187230: [aix] Leave OS guard page size at default for non-java threads instead of explicitly setting it In-Reply-To: References: Message-ID: Hi Thomas, looks good except for that you missed setting the guard pages size to zero for compiler threads. Compiler threads are Java Threads and thus get our guard pages. Best regards, Goetz. > -----Original Message----- > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On > Behalf Of Thomas St?fe > Sent: Donnerstag, 7. 
September 2017 12:02 > To: ppc-aix-port-dev at openjdk.java.net > Cc: HotSpot Open Source Developers > Subject: RFR(xxs): 8187230: [aix] Leave OS guard page size at default for non- > java threads instead of explicitly setting it > > Hi all, > > may I please have a review for this small change: > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8187230 > > Webrev: > http://cr.openjdk.java.net/~stuefe/webrevs/8187230-aix- > leave-os-guard-page-size-at-default-for-non-java- > threads/webrev.00/webrev/ > > The change is very subtle. > > Before, we would set the OS guard page size for every thread - for java > threads disable them, for non-java threads we'd set them to 4K. > > Now, we still disable them for java threads but leave them at the OS > default size for non-java threads. > > The really important part is the disabling of OS guard pages for java > threads, where we have a VM guard pages in place and do not want to spend > more memory on OS guards. We do not really care for the exact size of the > OS guard pages for non-java threads, and therefore should not set it - we > should leave the size in place the OS deems sufficient. That also spares us > the complexity of handling the thread stack page size, which on AIX may be > different from os::vm_page_size(). > > Thank you and Kind Regards, Thomas From thomas.stuefe at gmail.com Thu Sep 7 13:46:36 2017 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Thu, 7 Sep 2017 15:46:36 +0200 Subject: RFR(xxs): 8187230: [aix] Leave OS guard page size at default for non-java threads instead of explicitly setting it In-Reply-To: References: Message-ID: Hi Goetz, thanks for the review! 
Corrected webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8187230-aix-leave-os-guard-page-size-at-default-for-non-java-threads/webrev.01/webrev/ Thanks, Thomas On Thu, Sep 7, 2017 at 2:20 PM, Lindenmaier, Goetz < goetz.lindenmaier at sap.com> wrote: > Hi Thomas, > > looks good except for that you missed setting the guard pages size to zero > for compiler threads. Compiler threads are Java Threads and thus get > our guard pages. > > Best regards, > Goetz. > > > -----Original Message----- > > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On > > Behalf Of Thomas St?fe > > Sent: Donnerstag, 7. September 2017 12:02 > > To: ppc-aix-port-dev at openjdk.java.net > > Cc: HotSpot Open Source Developers > > Subject: RFR(xxs): 8187230: [aix] Leave OS guard page size at default > for non- > > java threads instead of explicitly setting it > > > > Hi all, > > > > may I please have a review for this small change: > > > > Bug: > > https://bugs.openjdk.java.net/browse/JDK-8187230 > > > > Webrev: > > http://cr.openjdk.java.net/~stuefe/webrevs/8187230-aix- > > leave-os-guard-page-size-at-default-for-non-java- > > threads/webrev.00/webrev/ > > > > The change is very subtle. > > > > Before, we would set the OS guard page size for every thread - for java > > threads disable them, for non-java threads we'd set them to 4K. > > > > Now, we still disable them for java threads but leave them at the OS > > default size for non-java threads. > > > > The really important part is the disabling of OS guard pages for java > > threads, where we have a VM guard pages in place and do not want to spend > > more memory on OS guards. We do not really care for the exact size of the > > OS guard pages for non-java threads, and therefore should not set it - we > > should leave the size in place the OS deems sufficient. That also spares > us > > the complexity of handling the thread stack page size, which on AIX may > be > > different from os::vm_page_size(). 
> > > > Thank you and Kind Regards, Thomas > From goetz.lindenmaier at sap.com Thu Sep 7 14:14:14 2017 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 7 Sep 2017 14:14:14 +0000 Subject: RFR(xxs): 8187230: [aix] Leave OS guard page size at default for non-java threads instead of explicitly setting it In-Reply-To: References: Message-ID: <49c5502dfe5742309858979e546e0769@sap.com> Thanks, looks good now! Best regards, Goetz. > -----Original Message----- > From: Thomas St?fe [mailto:thomas.stuefe at gmail.com] > Sent: Donnerstag, 7. September 2017 15:47 > To: Lindenmaier, Goetz > Cc: ppc-aix-port-dev at openjdk.java.net; HotSpot Open Source Developers > > Subject: Re: RFR(xxs): 8187230: [aix] Leave OS guard page size at default for > non-java threads instead of explicitly setting it > > Hi Goetz, > > thanks for the review! > > Corrected webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8187230- > aix-leave-os-guard-page-size-at-default-for-non-java- > threads/webrev.01/webrev/ > > Thanks, Thomas > > > On Thu, Sep 7, 2017 at 2:20 PM, Lindenmaier, Goetz > > > wrote: > > > Hi Thomas, > > looks good except for that you missed setting the guard pages size to > zero > for compiler threads. Compiler threads are Java Threads and thus get > our guard pages. > > Best regards, > Goetz. > > > > -----Original Message----- > > From: hotspot-dev [mailto:hotspot-dev- > bounces at openjdk.java.net bounces at openjdk.java.net> ] On > > Behalf Of Thomas St?fe > > Sent: Donnerstag, 7. 
September 2017 12:02 > > To: ppc-aix-port-dev at openjdk.java.net dev at openjdk.java.net> > > Cc: HotSpot Open Source Developers dev at openjdk.java.net > > > Subject: RFR(xxs): 8187230: [aix] Leave OS guard page size at > default for non- > > java threads instead of explicitly setting it > > > > Hi all, > > > > may I please have a review for this small change: > > > > Bug: > > https://bugs.openjdk.java.net/browse/JDK-8187230 > > > > > Webrev: > > http://cr.openjdk.java.net/~stuefe/webrevs/8187230-aix- > > > leave-os-guard-page-size-at-default-for-non-java- > > threads/webrev.00/webrev/ > > > > The change is very subtle. > > > > Before, we would set the OS guard page size for every thread - for > java > > threads disable them, for non-java threads we'd set them to 4K. > > > > Now, we still disable them for java threads but leave them at the > OS > > default size for non-java threads. > > > > The really important part is the disabling of OS guard pages for java > > threads, where we have a VM guard pages in place and do not > want to spend > > more memory on OS guards. We do not really care for the exact > size of the > > OS guard pages for non-java threads, and therefore should not set > it - we > > should leave the size in place the OS deems sufficient. That also > spares us > > the complexity of handling the thread stack page size, which on AIX > may be > > different from os::vm_page_size(). > > > > Thank you and Kind Regards, Thomas > > From chris.plummer at oracle.com Thu Sep 7 21:07:05 2017 From: chris.plummer at oracle.com (Chris Plummer) Date: Thu, 7 Sep 2017 14:07:05 -0700 Subject: RFR(xxs): 8187230: [aix] Leave OS guard page size at default for non-java threads instead of explicitly setting it In-Reply-To: References: Message-ID: Hi Thomas, Is there a reason this shouldn't also be done for linux? 
thanks, Chris On 9/7/17 3:02 AM, Thomas Stüfe wrote: > Hi all, > > may I please have a review for this small change: > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8187230 > > Webrev: > http://cr.openjdk.java.net/~stuefe/webrevs/8187230-aix- > leave-os-guard-page-size-at-default-for-non-java-threads/webrev.00/webrev/ > > The change is very subtle. > > Before, we would set the OS guard page size for every thread - for java > threads disable them, for non-java threads we'd set them to 4K. > > Now, we still disable them for java threads but leave them at the OS > default size for non-java threads. > > The really important part is the disabling of OS guard pages for java > threads, where we have a VM guard pages in place and do not want to spend > more memory on OS guards. We do not really care for the exact size of the > OS guard pages for non-java threads, and therefore should not set it - we > should leave the size in place the OS deems sufficient. That also spares us > the complexity of handling the thread stack page size, which on AIX may be > different from os::vm_page_size(). > > Thank you and Kind Regards, Thomas From david.holmes at oracle.com Fri Sep 8 04:37:43 2017 From: david.holmes at oracle.com (David Holmes) Date: Fri, 8 Sep 2017 14:37:43 +1000 Subject: RFR(xxs): 8187230: [aix] Leave OS guard page size at default for non-java threads instead of explicitly setting it In-Reply-To: References: Message-ID: <45615e9a-b332-132d-05b7-082422c68bd8@oracle.com> Hi Chris, On 8/09/2017 7:07 AM, Chris Plummer wrote: > Hi Thomas, > > Is there a reason this shouldn't also be done for linux? It probably could given "The default guard size is the same as the system page size." so the end result would be the same. But then perhaps this whole default_guard_size logic should disappear, if all we ever need to do is disable guards for JavaThreads?
But this seems out of scope for what Thomas wanted to fix - unless he wants to extend the scope of course ;-) Cheers, David > thanks, > > Chris > > On 9/7/17 3:02 AM, Thomas St?fe wrote: >> Hi all, >> >> may I please have a review for this small change: >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8187230 >> >> Webrev: >> http://cr.openjdk.java.net/~stuefe/webrevs/8187230-aix- >> leave-os-guard-page-size-at-default-for-non-java-threads/webrev.00/webrev/ >> >> >> The change is very subtle. >> >> Before, we would set the OS guard page size for every thread - for java >> threads disable them, for non-java threads we'd set them to 4K. >> >> Now, we still disable them for java threads but leave them at the OS >> default size for non-java threads. >> >> The really important part is the disabling of OS guard pages for java >> threads, where we have a VM guard pages in place and do not want to spend >> more memory on OS guards. We do not really care for the exact size of the >> OS guard pages for non-java threads, and therefore should not set it - we >> should leave the size in place the OS deems sufficient. That also >> spares us >> the complexity of handling the thread stack page size, which on AIX >> may be >> different from os::vm_page_size(). 
>> >> Thank you and Kind Regards, Thomas > > > From aph at redhat.com Fri Sep 8 07:13:47 2017 From: aph at redhat.com (Andrew Haley) Date: Fri, 8 Sep 2017 08:13:47 +0100 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59B0026F.3040406@oracle.com> References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <6f623116-8213-1523-ba37-fb4ce17c4afa@oracle.com> <59AD21D6.8040305@oracle.com> <59AD2A64.3070507@oracle.com> <59AD55AC.4030105@oracle.com> <3a6fbae3-cddb-6ac0-890d-da4b33308b5e@redhat.com> <59AE5AA4.1070803@oracle.com> <59B0026F.3040406@oracle.com> Message-ID: <8730f7ea-1612-60a6-3826-0e0324858f72@redhat.com> On 06/09/17 15:13, Erik Österlund wrote: > Hi Andrew, > > On 2017-09-06 15:20, Andrew Haley wrote: >> On 05/09/17 09:04, Erik Österlund wrote: >> >>> For example, our atomics typically conservatively guarantees >>> bidirectional full fencing, while theirs does not. >> Firstly, we can insert whatever fences we want, using intrinsics. We >> don't need assembly language to do that. > > Since I thought we already had (and finished) that discussion, and it is > no longer relevant to the current proposal of removing inc/dec > specializations, I hope you are okay with me preferring not to re-open > that discussion in this RFE. Another day, perhaps? Well, let us make a deal: if you don't say something I disagree with, I promise not to disagree. :-) > >> Secondly, I don't see bidirectional full fencing in x86 atomics, and I >> don't think we really want bidirectional full fencing anyway. > > That is because an atomic x86 locked instruction is observably > equivalent to having bidirectional fencing surrounding the access due to > the stronger memory model of the machine. TSO, as implemented by x86, is very strong, but is it really equivalent to bidirectional fencing? -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From goetz.lindenmaier at sap.com Fri Sep 8 07:51:27 2017 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 8 Sep 2017 07:51:27 +0000 Subject: RFR(xxs): 8187230: [aix] Leave OS guard page size at default for non-java threads instead of explicitly setting it In-Reply-To: References: Message-ID: <368f252c8d5440e785e1ee341f4a918e@sap.com> Hi Chris, on linux the pthread implementation is a bit strange, or buggy. It takes the OS guard pages out of the stack size specified. We need to set it so we can predict the additional space that must be allocated for the stack. See also the comment in os_linux.cpp, create_thread(). Best regards, Goetz. > -----Original Message----- > From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-bounces at openjdk.java.net] > On Behalf Of Chris Plummer > Sent: Thursday, September 07, 2017 11:07 PM > To: Thomas St?fe ; ppc-aix-port- > dev at openjdk.java.net > Cc: HotSpot Open Source Developers > Subject: Re: RFR(xxs): 8187230: [aix] Leave OS guard page size at default for > non-java threads instead of explicitly setting it > > Hi Thomas, > > Is there a reason this shouldn't also be done for linux? > > thanks, > > Chris > > On 9/7/17 3:02 AM, Thomas St?fe wrote: > > Hi all, > > > > may I please have a review for this small change: > > > > Bug: > > https://bugs.openjdk.java.net/browse/JDK-8187230 > > > > Webrev: > > http://cr.openjdk.java.net/~stuefe/webrevs/8187230-aix- > > leave-os-guard-page-size-at-default-for-non-java- > threads/webrev.00/webrev/ > > > > The change is very subtle. > > > > Before, we would set the OS guard page size for every thread - for java > > threads disable them, for non-java threads we'd set them to 4K. > > > > Now, we still disable them for java threads but leave them at the OS > > default size for non-java threads. 
> > > > The really important part is the disabling of OS guard pages for java > > threads, where we have a VM guard pages in place and do not want to > spend > > more memory on OS guards. We do not really care for the exact size of the > > OS guard pages for non-java threads, and therefore should not set it - we > > should leave the size in place the OS deems sufficient. That also spares us > > the complexity of handling the thread stack page size, which on AIX may be > > different from os::vm_page_size(). > > > > Thank you and Kind Regards, Thomas > > From erik.osterlund at oracle.com Fri Sep 8 08:13:21 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Fri, 8 Sep 2017 10:13:21 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <8730f7ea-1612-60a6-3826-0e0324858f72@redhat.com> References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <6f623116-8213-1523-ba37-fb4ce17c4afa@oracle.com> <59AD21D6.8040305@oracle.com> <59AD2A64.3070507@oracle.com> <59AD55AC.4030105@oracle.com> <3a6fbae3-cddb-6ac0-890d-da4b33308b5e@redhat.com> <59AE5AA4.1070803@oracle.com> <59B0026F.3040406@oracle.com> <8730f7ea-1612-60a6-3826-0e0324858f72@redhat.com> Message-ID: <59B25121.8030308@oracle.com> Hi Andrew, On 2017-09-08 09:13, Andrew Haley wrote: > On 06/09/17 15:13, Erik ?sterlund wrote: >> Hi Andrew, >> >> On 2017-09-06 15:20, Andrew Haley wrote: >>> On 05/09/17 09:04, Erik ?sterlund wrote: >>> >>>> For example, our atomics typically conservatively guarantees >>>> bidirectional full fencing, while theirs does not. >>> Firstly, we can insert whatever fences we want, using intrinsics. We >>> don't need assembly language to do that. >> Since I thought we already had (and finished) that discussion, and it is >> no longer relevant to the current proposal of removing inc/dec >> specializations, I hope you are okay with me preferring not to re-open >> that discussion in this RFE. Another day, perhaps? 
> Well, let us make a deal: if you don't say something I disagree with, I promise > not to disagree. :-) We have a deal! :) >>> Secondly, I don't see bidirectional full fencing in x86 atomics, and I >>> don't think we really want bidirectional full fencing anyway. >> That is because an atomic x86 locked instruction is observably >> equivalent to having bidirectional fencing surrounding the access due to >> the stronger memory model of the machine. > TSO, as implemented by x86, is very strong, but is it really equivalent to > bidirectional fencing? Yup. Thanks, /Erik From thomas.stuefe at gmail.com Fri Sep 8 08:48:16 2017 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Fri, 8 Sep 2017 10:48:16 +0200 Subject: RFR(xxs): 8187230: [aix] Leave OS guard page size at default for non-java threads instead of explicitly setting it In-Reply-To: <368f252c8d5440e785e1ee341f4a918e@sap.com> References: <368f252c8d5440e785e1ee341f4a918e@sap.com> Message-ID: Hi Guys, On Fri, Sep 8, 2017 at 9:51 AM, Lindenmaier, Goetz < goetz.lindenmaier at sap.com> wrote: > Hi Chris, > > on linux the pthread implementation is a bit strange, or buggy. > It takes the OS guard pages out of the stack size specified. > We need to set it so we can predict the additional space > that must be allocated for the stack. > > See also the comment in os_linux.cpp, create_thread(). > Goetz, I know we talked about this off list yesterday, but now I am not sure this is actually needed. Yes, to correctly calculate the stack size, we need to know the OS guard page size, but we do not need to set it, we just need to know it. So, for non-java threads (java threads get the OS guard set to zero), it would probably be sufficient to: - pthread_attr_init() (sets default thread attribute values to the attribute structure) and then - pthread_attr_getguardsize() to read the guard size from that structure. That way we leave the OS guard page at the size glibc deems best. I think that is a better option. 
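The two pthread calls named above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in os_linux.cpp or the webrev: the helper names query_default_guard_size() and disable_os_guard() are made up for this sketch, and only the POSIX calls pthread_attr_init(), pthread_attr_getguardsize(), pthread_attr_setguardsize() and pthread_attr_destroy() are real.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not HotSpot code): reports the guard size that a
// freshly initialized attribute object carries, i.e. the libc default.
// Returns false if any pthread call fails.
bool query_default_guard_size(size_t* out_size) {
  assert(out_size != NULL);
  pthread_attr_t attr;
  if (pthread_attr_init(&attr) != 0) {
    return false;
  }
  size_t guard = 0;
  const int rc = pthread_attr_getguardsize(&attr, &guard);
  pthread_attr_destroy(&attr);
  if (rc != 0) {
    return false;
  }
  *out_size = guard;
  return true;
}

// Hypothetical helper for the Java-thread case described in the thread:
// request no OS guard area, since the VM installs its own guard pages.
bool disable_os_guard(pthread_attr_t* attr) {
  assert(attr != NULL);
  return pthread_attr_setguardsize(attr, 0) == 0;
}
```

With helpers like these, a non-Java thread would only read the default (to account for it when sizing the stack) and never write it, while a Java thread would still have its OS guard set to zero explicitly.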
Consider a situation where the glibc changes the size of the OS guard pages, for whatever reason - we probably should follow suit. See e.g. this security issue - admittedly only loosely related, since the fix for this issue seemed to be a fix to stack banging, not changing the OS guard size: https://access.redhat.com/security/vulnerabilities/stackguard So, in short, I think we could change this for Linux too. If you guys agree, I'll add this to the patch. Since I am on vacation and the depot is closed, it may take some time. Kind Regards, Thomas > > Best regards, > Goetz. > > > -----Original Message----- > > From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-bounces at openjdk.java.net > ] > > On Behalf Of Chris Plummer > > Sent: Thursday, September 07, 2017 11:07 PM > > To: Thomas St?fe ; ppc-aix-port- > > dev at openjdk.java.net > > Cc: HotSpot Open Source Developers > > Subject: Re: RFR(xxs): 8187230: [aix] Leave OS guard page size at > default for > > non-java threads instead of explicitly setting it > > > > Hi Thomas, > > > > Is there a reason this shouldn't also be done for linux? > > > > thanks, > > > > Chris > > > > On 9/7/17 3:02 AM, Thomas St?fe wrote: > > > Hi all, > > > > > > may I please have a review for this small change: > > > > > > Bug: > > > https://bugs.openjdk.java.net/browse/JDK-8187230 > > > > > > Webrev: > > > http://cr.openjdk.java.net/~stuefe/webrevs/8187230-aix- > > > leave-os-guard-page-size-at-default-for-non-java- > > threads/webrev.00/webrev/ > > > > > > The change is very subtle. > > > > > > Before, we would set the OS guard page size for every thread - for java > > > threads disable them, for non-java threads we'd set them to 4K. > > > > > > Now, we still disable them for java threads but leave them at the OS > > > default size for non-java threads. 
> > > > > > The really important part is the disabling of OS guard pages for java > > > threads, where we have a VM guard pages in place and do not want to > > spend > > > more memory on OS guards. We do not really care for the exact size of > the > > > OS guard pages for non-java threads, and therefore should not set it - > we > > > should leave the size in place the OS deems sufficient. That also > spares us > > > the complexity of handling the thread stack page size, which on AIX > may be > > > different from os::vm_page_size(). > > > > > > Thank you and Kind Regards, Thomas > > > > > > From stuart.monteith at linaro.org Fri Sep 8 15:53:40 2017 From: stuart.monteith at linaro.org (Stuart Monteith) Date: Fri, 8 Sep 2017 16:53:40 +0100 Subject: RFR(XL/M) : 8178788: wrap JCStress test suite as jtreg tests In-Reply-To: <342c3748-616f-8a7d-74f4-3ce929b1e0dc@redhat.com> References: <9A2C94EA-89A3-4C75-9D3C-51E058BD8A1D@oracle.com> <342c3748-616f-8a7d-74f4-3ce929b1e0dc@redhat.com> Message-ID: Hello, I've spent some time on this, and I have to admit that I'm stumped. I get exactly the same errors on x86 on jdk10/hs and jdk10/jdk10 with a recent build of JTReg and JT_HOME set appropriately. Are there any pointers on how this is supposed to be run? Thanks, Stuart On 25 April 2017 at 11:47, Aleksey Shipilev wrote: > On 04/19/2017 12:12 AM, Igor Ignatyev wrote: > > http://cr.openjdk.java.net/~iignatyev//8178788/webrev.00/index.html > >> 69903 lines changed: 69903 ins; 0 del; 0 mod; > > (69524 lines are generated) > > > > Hi all, > > > > could you please review this patch which adds a jtreg test wrapper for > > jcstress test suite and jtreg tests which run jcstress tests thru this > > wrapper? > > > > webrev: http://cr.openjdk.java.net/~iignatyev//8178788/webrev.00/ > index.html > > JBS: https://bugs.openjdk.java.net/browse/JDK-8178788 testing: > > TL;DR: This patch introduces more problems than it solves. Just run the > jcstress > tests-all JAR against the tested runtime.
> > Wrapping jcstress tests with jtreg defies the purpose of jcstress harness > -- > that is, running lots of tests as fast as it possibly could without > affecting > testing quality. For example, by cleverly reusing VMs between the tests, > using > Whitebox to deoptimize without restarting the VMs, etc. It really wastes > CPU > time to run each test in isolation. > > Also, it does not "automatically" work, which defies "easy to run" goal: > > Caused by: java.io.FileNotFoundException: Couldn't automatically resolve > dependency for jcstress-tests-all , revision 0.3 > Please specify the location using jdk.test.lib.artifacts. > jcstress-tests-all > at > jdk.test.lib.artifacts.DefaultArtifactManager.resolve( > DefaultArtifactManager.java:37) > at jdk.test.lib.artifacts.ArtifactResolver.resolve( > ArtifactResolver.java:54) > at applications.jcstress.JcstressRunner.pathToArtifact( > JcstressRunner.java:53) > ... 8 more > > Okay, brilliant! How do I configure this, if I run "make test"? > > CONF=linux-x86_64-normal-server-release LOG=info make test > TEST="hotspot_all" > > > -Aleksey > > From thomas.stuefe at gmail.com Sat Sep 9 07:57:47 2017 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Sat, 9 Sep 2017 09:57:47 +0200 Subject: Gtests, do we care about leaking memory on asserts? Message-ID: Hi all, I am writing a gtest which requires memory to be allocated. What is the policy about freeing it? Do we care that we leak that memory on assert? So, should I care to wrap an RAII object around it or similar? Took a quick look at the current tests, there are a number of tests which would leak C-Heap on assert, e.g.: memory/test_metachunk.cpp memory/test_guardedMemory.cpp gc/g1/test_freeRegionList.cpp .. So, I guess this is fine?
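For reference, the RAII option mentioned above could look roughly like the sketch below. It is a minimal illustration, not existing HotSpot or gtest code: the class name ScopedCHeapBlock and the function fill_and_check() are hypothetical, and plain malloc/free stand in for whatever C-heap allocator a real test would use. It relies on the fact that a fatal gtest ASSERT_* failure returns from the current function, so destructors of locals still run.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical RAII holder for a malloc'ed block (illustrative only).
class ScopedCHeapBlock {
 public:
  explicit ScopedCHeapBlock(size_t size) : _p(std::malloc(size)) {}
  ~ScopedCHeapBlock() { std::free(_p); }  // free(NULL) is a no-op
  void* get() const { return _p; }

 private:
  // Declared but not defined: copying would lead to a double free.
  ScopedCHeapBlock(const ScopedCHeapBlock&);
  ScopedCHeapBlock& operator=(const ScopedCHeapBlock&);
  void* _p;
};

// Mimics a test body. The early return stands in for a failing ASSERT_*;
// the block is released on every path out of the function.
// Expects size > 0.
bool fill_and_check(size_t size) {
  ScopedCHeapBlock block(size);
  if (block.get() == NULL) {
    return false;
  }
  std::memset(block.get(), 0xAB, size);
  const unsigned char* bytes = static_cast<const unsigned char*>(block.get());
  return bytes[0] == 0xAB && bytes[size - 1] == 0xAB;
}
```

Whether this is worth the extra code in short-lived test processes is exactly the policy question being asked; the sketch only shows that the cost is small.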
Thank you, Thomas From david.holmes at oracle.com Mon Sep 11 07:50:57 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 11 Sep 2017 17:50:57 +1000 Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults In-Reply-To: References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com> <4d4fe028-ea6a-4f77-ab69-5c2bc752e1f5@oracle.com> <47bc0a90-ed6a-220a-c3d1-b4df2d8bbc74@oracle.com> <9c53f889-e58e-33ac-3c05-874779b469d6@oracle.com> <45619e1a-9eb0-a540-193b-5187da3bf6bc@oracle.com> Message-ID: <66e4af43-c0e2-6d64-b69f-35166150ffa2@oracle.com> Hi Rohit, Updated webrev at: http://cr.openjdk.java.net/~dholmes/8187219/webrev.v2/ On 7/09/2017 3:24 PM, Rohit Arul Raj wrote: > Hello Vladimir, David, > >> >>> >>> You added: >>> >>> 526 if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>> 527 result |= CPU_HT; >>> >>> and I'm wondering of there would be any case where this would not be >>> covered by the earlier: >>> >>> 448 if (threads_per_core() > 1) >>> 449 result |= CPU_HT; >>> >>> ? >>> --- >> >> >> Valid question. >> > > Thanks for your review and comments. > I have updated the patch to calculate threads per core by using the > CPUID bit: CPUID_Fn8000001E_EBX [8:15]. This detail is getting beyond my knowledge I'm afraid. I have two queries: 1. ExtCpuid1EEx Should this be ExtCpuid1EEbx? (I see the naming here is somewhat inconsistent - and potentially confusing: I would have preferred to see things like ExtCpuid_1E_Ebx, to make it clear.) 2. You fixed the calculation of threads_per_core by reading it directly, but I have to wonder which part of the old calculation: 695 result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / 696 cores_per_cpu(); was actually going wrong in the case of the 17h? Thanks, David ----- > Reference: > https://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf [Pg 82] > > CPUID_Fn8000001E_EBX [Core Identifiers] (CoreId) > 15:8 ThreadsPerCore: threads per core. Read-only. Reset: XXh. 
> The number of threads per core is ThreadsPerCore+1. > > diff --git a/src/cpu/x86/vm/vm_version_x86.cpp > b/src/cpu/x86/vm/vm_version_x86.cpp > --- a/src/cpu/x86/vm/vm_version_x86.cpp > +++ b/src/cpu/x86/vm/vm_version_x86.cpp > @@ -70,7 +70,7 @@ > bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2); > > Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4; > - Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, > done, wrapup; > + Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, > ext_cpuid8, done, wrapup; > Label legacy_setup, save_restore_except, legacy_save_restore, > start_simd_check; > > StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub"); > @@ -272,9 +272,23 @@ > __ jccb(Assembler::belowEqual, ext_cpuid5); > __ cmpl(rax, 0x80000007); // Is cpuid(0x80000008) supported? > __ jccb(Assembler::belowEqual, ext_cpuid7); > + __ cmpl(rax, 0x80000008); // Is cpuid(0x8000001E) supported? > + __ jccb(Assembler::belowEqual, ext_cpuid8); > + // > + // Extended cpuid(0x8000001E) > + // > + __ movl(rax, 0x8000001E); > + __ cpuid(); > + __ lea(rsi, Address(rbp, in_bytes(VM_Version::ext_cpuid1E_offset()))); > + __ movl(Address(rsi, 0), rax); > + __ movl(Address(rsi, 4), rbx); > + __ movl(Address(rsi, 8), rcx); > + __ movl(Address(rsi,12), rdx); > + > // > // Extended cpuid(0x80000008) > // > + __ bind(ext_cpuid8); > __ movl(rax, 0x80000008); > __ cpuid(); > __ lea(rsi, Address(rbp, in_bytes(VM_Version::ext_cpuid8_offset()))); > @@ -1109,11 +1123,27 @@ > } > > #ifdef COMPILER2 > - if (MaxVectorSize > 16) { > - // Limit vectors size to 16 bytes on current AMD cpus. > + if (cpu_family() < 0x17 && MaxVectorSize > 16) { > + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
> FLAG_SET_DEFAULT(MaxVectorSize, 16); > } > #endif // COMPILER2 > + > + // Some defaults for AMD family 17h > + if ( cpu_family() == 0x17 ) { > + // On family 17h processors use XMM and UnalignedLoadStores for > Array Copy > + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { > + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); > + } > + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { > + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); > + } > +#ifdef COMPILER2 > + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { > + FLAG_SET_DEFAULT(UseFPUForSpilling, true); > + } > +#endif > + } > } > > if( is_intel() ) { // Intel cpus specific settings > diff --git a/src/cpu/x86/vm/vm_version_x86.hpp > b/src/cpu/x86/vm/vm_version_x86.hpp > --- a/src/cpu/x86/vm/vm_version_x86.hpp > +++ b/src/cpu/x86/vm/vm_version_x86.hpp > @@ -228,6 +228,15 @@ > } bits; > }; > > + union ExtCpuid1EEx { > + uint32_t value; > + struct { > + uint32_t : 8, > + threads_per_core : 8, > + : 16; > + } bits; > + }; > + > union XemXcr0Eax { > uint32_t value; > struct { > @@ -398,6 +407,12 @@ > ExtCpuid8Ecx ext_cpuid8_ecx; > uint32_t ext_cpuid8_edx; // reserved > > + // cpuid function 0x8000001E // AMD 17h > + uint32_t ext_cpuid1E_eax; > + ExtCpuid1EEx ext_cpuid1E_ebx; // threads per core (AMD17h) > + uint32_t ext_cpuid1E_ecx; > + uint32_t ext_cpuid1E_edx; // unused currently > + > // extended control register XCR0 (the XFEATURE_ENABLED_MASK register) > XemXcr0Eax xem_xcr0_eax; > uint32_t xem_xcr0_edx; // reserved > @@ -505,6 +520,14 @@ > result |= CPU_CLMUL; > if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) > result |= CPU_RTM; > + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) > + result |= CPU_ADX; > + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) > + result |= CPU_BMI2; > + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) > + result |= CPU_SHA; > + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) > + result |= CPU_FMA; > > // AMD features. 
> if (is_amd()) { > @@ -518,16 +541,8 @@ > } > // Intel features. > if(is_intel()) { > - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) > - result |= CPU_ADX; > - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) > - result |= CPU_BMI2; > - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) > - result |= CPU_SHA; > if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) > result |= CPU_LZCNT; > - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) > - result |= CPU_FMA; > // for Intel, ecx.bits.misalignsse bit (bit 8) indicates > support for prefetchw > if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { > result |= CPU_3DNOW_PREFETCH; > @@ -590,6 +605,7 @@ > static ByteSize ext_cpuid5_offset() { return > byte_offset_of(CpuidInfo, ext_cpuid5_eax); } > static ByteSize ext_cpuid7_offset() { return > byte_offset_of(CpuidInfo, ext_cpuid7_eax); } > static ByteSize ext_cpuid8_offset() { return > byte_offset_of(CpuidInfo, ext_cpuid8_eax); } > + static ByteSize ext_cpuid1E_offset() { return > byte_offset_of(CpuidInfo, ext_cpuid1E_eax); } > static ByteSize tpl_cpuidB0_offset() { return > byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } > static ByteSize tpl_cpuidB1_offset() { return > byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } > static ByteSize tpl_cpuidB2_offset() { return > byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } > @@ -673,8 +689,11 @@ > if (is_intel() && supports_processor_topology()) { > result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; > } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { > - result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / > - cores_per_cpu(); > + if (cpu_family() >= 0x17) > + result = _cpuid_info.ext_cpuid1E_ebx.bits.threads_per_core + 1; > + else > + result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / > + cores_per_cpu(); > } > return (result == 0 ? 1 : result); > } > > I have attached the patch for review. > Please let me know your comments. 
> > Thanks, > Rohit > >> Thanks, >> Vladimir >> >> >>> >>> src/cpu/x86/vm/vm_version_x86.cpp >>> >>> No comments on AMD specific changes. >>> >>> Thanks, >>> David >>> ----- >>> >>> On 5/09/2017 3:43 PM, David Holmes wrote: >>>> >>>> On 5/09/2017 3:29 PM, Rohit Arul Raj wrote: >>>>> >>>>> Hello David, >>>>> >>>>> On Tue, Sep 5, 2017 at 10:31 AM, David Holmes >>>>> wrote: >>>>>> >>>>>> Hi Rohit, >>>>>> >>>>>> I was unable to apply your patch to latest jdk10/hs/hotspot repo. >>>>>> >>>>> >>>>> I checked out the latest jdk10/hs/hotspot [parent: 13548:1a9c2e07a826] >>>>> and was able to apply the patch [epyc-amd17h-defaults-3Sept.patch] >>>>> without any issues. >>>>> Can you share the error message that you are getting? >>>> >>>> >>>> I was getting this: >>>> >>>> applying hotspot.patch >>>> patching file src/cpu/x86/vm/vm_version_x86.cpp >>>> Hunk #1 FAILED at 1108 >>>> 1 out of 1 hunks FAILED -- saving rejects to file >>>> src/cpu/x86/vm/vm_version_x86.cpp.rej >>>> patching file src/cpu/x86/vm/vm_version_x86.hpp >>>> Hunk #2 FAILED at 522 >>>> 1 out of 2 hunks FAILED -- saving rejects to file >>>> src/cpu/x86/vm/vm_version_x86.hpp.rej >>>> abort: patch failed to apply >>>> >>>> but I started again and this time it applied fine, so not sure what was >>>> going on there. >>>> >>>> Cheers, >>>> David >>>> >>>>> Regards, >>>>> Rohit >>>>> >>>>> >>>>>> >>>>>> >>>>>> On 4/09/2017 2:42 AM, Rohit Arul Raj wrote: >>>>>>> >>>>>>> >>>>>>> Hello Vladimir, >>>>>>> >>>>>>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> Hi Rohit, >>>>>>>> >>>>>>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Hello Vladimir, >>>>>>>>> >>>>>>>>>> Changes look good. Only question I have is about MaxVectorSize. 
It >>>>>>>>>> is >>>>>>>>>> set >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> 16 only in presence of AVX: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945 >>>>>>>>>> >>>>>>>>>> Does that code works for AMD 17h too? >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks for pointing that out. Yes, the code works fine for AMD 17h. >>>>>>>>> So >>>>>>>>> I have removed the surplus check for MaxVectorSize from my patch. I >>>>>>>>> have updated, re-tested and attached the patch. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Which check you removed? >>>>>>>> >>>>>>> >>>>>>> My older patch had the below mentioned check which was required on >>>>>>> JDK9 where the default MaxVectorSize was 64. It has been handled >>>>>>> better in openJDK10. So this check is not required anymore. >>>>>>> >>>>>>> + // Some defaults for AMD family 17h >>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>> ... >>>>>>> ... >>>>>>> + if (MaxVectorSize > 32) { >>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>> + } >>>>>>> .. >>>>>>> .. >>>>>>> + } >>>>>>> >>>>>>>>> >>>>>>>>> I have one query regarding the setting of UseSHA flag: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821 >>>>>>>>> >>>>>>>>> AMD 17h has support for SHA. >>>>>>>>> AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets >>>>>>>>> enabled for it based on the availability of BMI2 and AVX2. Is there >>>>>>>>> an >>>>>>>>> underlying reason for this? I have handled this in the patch but >>>>>>>>> just >>>>>>>>> wanted to confirm. 
>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> It was done with next changes which use only AVX2 and BMI2 >>>>>>>> instructions >>>>>>>> to >>>>>>>> calculate SHA-256: >>>>>>>> >>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974 >>>>>>>> >>>>>>>> I don't know if AMD 15h supports these instructions and can execute >>>>>>>> that >>>>>>>> code. You need to test it. >>>>>>>> >>>>>>> >>>>>>> Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions, >>>>>>> it should work. >>>>>>> Confirmed by running following sanity tests: >>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java >>>>>>> >>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java >>>>>>> >>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java >>>>>>> >>>>>>> So I have removed those SHA checks from my patch too. >>>>>>> >>>>>>> Please find attached updated, re-tested patch. >>>>>>> >>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> @@ -1109,11 +1109,27 @@ >>>>>>> } >>>>>>> >>>>>>> #ifdef COMPILER2 >>>>>>> - if (MaxVectorSize > 16) { >>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>> } >>>>>>> #endif // COMPILER2 >>>>>>> + >>>>>>> + // Some defaults for AMD family 17h >>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>>>>> Array Copy >>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>> + } >>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>> { >>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>> + } >>>>>>> +#ifdef COMPILER2 >>>>>>> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>> + } >>>>>>> +#endif >>>>>>> + } >>>>>>> } >>>>>>> >>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> @@ -505,6 +505,14 @@ >>>>>>> result |= CPU_CLMUL; >>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>> result |= CPU_RTM; >>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>> + result |= CPU_ADX; >>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>> + result |= CPU_BMI2; >>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>> + result |= CPU_SHA; >>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>> + result |= CPU_FMA; >>>>>>> >>>>>>> // AMD features. >>>>>>> if (is_amd()) { >>>>>>> @@ -515,19 +523,13 @@ >>>>>>> result |= CPU_LZCNT; >>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>> result |= CPU_SSE4A; >>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>> + result |= CPU_HT; >>>>>>> } >>>>>>> // Intel features. 
>>>>>>> if(is_intel()) { >>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>> - result |= CPU_ADX; >>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>> - result |= CPU_BMI2; >>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>> - result |= CPU_SHA; >>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>> result |= CPU_LZCNT; >>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>> - result |= CPU_FMA; >>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>>>> support for prefetchw >>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>> >>>>>>> Please let me know your comments. >>>>>>> >>>>>>> Thanks for your time. >>>>>>> Rohit >>>>>>> >>>>>>>>> >>>>>>>>> Thanks for taking time to review the code. >>>>>>>>> >>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>> } >>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>> } >>>>>>>>> + if (supports_sha()) { >>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>> + } >>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics >>>>>>>>> || >>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>> + warning("SHA instructions are not available on this CPU"); >>>>>>>>> + } >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>> + } >>>>>>>>> >>>>>>>>> // some defaults for AMD family 15h >>>>>>>>> if ( 
cpu_family() == 0x15 ) { >>>>>>>>> @@ -1109,11 +1125,40 @@ >>>>>>>>> } >>>>>>>>> >>>>>>>>> #ifdef COMPILER2 >>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>> } >>>>>>>>> #endif // COMPILER2 >>>>>>>>> + >>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>>> for >>>>>>>>> Array Copy >>>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>>> + } >>>>>>>>> + if (supports_sse2() && >>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>>> + } >>>>>>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>> { >>>>>>>>> + FLAG_SET_DEFAULT(UseBMI2Instructions, true); >>>>>>>>> + } >>>>>>>>> + if (UseSHA) { >>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>>>> functions not available on this CPU."); >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> +#ifdef COMPILER2 >>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> +#endif >>>>>>>>> + } >>>>>>>>> } >>>>>>>>> >>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> +++ 
b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>> result |= CPU_CLMUL; >>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>> result |= CPU_RTM; >>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>> + result |= CPU_ADX; >>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>> + result |= CPU_BMI2; >>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>> + result |= CPU_SHA; >>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>> + result |= CPU_FMA; >>>>>>>>> >>>>>>>>> // AMD features. >>>>>>>>> if (is_amd()) { >>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>> result |= CPU_LZCNT; >>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>> result |= CPU_SSE4A; >>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>> + result |= CPU_HT; >>>>>>>>> } >>>>>>>>> // Intel features. >>>>>>>>> if(is_intel()) { >>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>> - result |= CPU_ADX; >>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>> - result |= CPU_BMI2; >>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>> - result |= CPU_SHA; >>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>>> result |= CPU_LZCNT; >>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>> - result |= CPU_FMA; >>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>>>>>> support for prefetchw >>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>> >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Rohit >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> 
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>> >>>>>>>>>>>>> I think the patch needs updating for jdk10 as I already see a >>>>>>>>>>>>> lot of >>>>>>>>>>>>> logic >>>>>>>>>>>>> around UseSHA in vm_version_x86.cpp. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> David >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks David, I will update the patch wrt JDK10 source base, test >>>>>>>>>>>> and >>>>>>>>>>>> resubmit for review. >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Rohit >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi All, >>>>>>>>>>> >>>>>>>>>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>>>>>>>>> 13519:71337910df60), did regression testing using jtreg ($make >>>>>>>>>>> default) and didnt find any regressions. >>>>>>>>>>> >>>>>>>>>>> Can anyone please volunteer to review this patch which sets >>>>>>>>>>> flag/ISA >>>>>>>>>>> defaults for newer AMD 17h (EPYC) processor? >>>>>>>>>>> >>>>>>>>>>> ************************* Patch **************************** >>>>>>>>>>> >>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>>>> } >>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>> } >>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>> + } >>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics >>>>>>>>>>> || >>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>>> CPU"); >>>>>>>>>>> + } >>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>> + 
FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>> + } >>>>>>>>>>> >>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>> @@ -1109,11 +1125,43 @@ >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>> } >>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>> + >>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>>>>> for >>>>>>>>>>> Array Copy >>>>>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>> { >>>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>>> + } >>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>> { >>>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>>> + } >>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>>> + } >>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>> + } >>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>> + } >>>>>>>>>>> + } >>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>> + if 
(FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>> + } >>>>>>>>>>> + } >>>>>>>>>>> +#endif >>>>>>>>>>> + } >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>> result |= CPU_RTM; >>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>> >>>>>>>>>>> // AMD features. >>>>>>>>>>> if (is_amd()) { >>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>> } >>>>>>>>>>> // Intel features. 
>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>>>> indicates >>>>>>>>>>> support for prefetchw >>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>> >>>>>>>>>>> ************************************************************** >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Rohit >>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I would like an volunteer to review this patch (openJDK9) >>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>> sets >>>>>>>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help >>>>>>>>>>>>>>>> us >>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>> the commit process. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Unfortunately patches can not be accepted from systems outside >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> OpenJDK >>>>>>>>>>>>>>> infrastructure and ... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I have also attached the patch (hg diff -g) for reference. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ... unfortunately patches tend to get stripped by the mail >>>>>>>>>>>>>>> servers. >>>>>>>>>>>>>>> If >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> patch is small please include it inline. Otherwise you will >>>>>>>>>>>>>>> need >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> find >>>>>>>>>>>>>>> an >>>>>>>>>>>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 3) I have done regression testing using jtreg ($make default) >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> didnt find any regressions. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sounds good, but until I see the patch it is hard to comment >>>>>>>>>>>>>>> on >>>>>>>>>>>>>>> testing >>>>>>>>>>>>>>> requirements. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> David >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks David, >>>>>>>>>>>>>> Yes, it's a small patch. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>>>>>>>>> } >>>>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>>>> } >>>>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>>> || >>>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>>>>>> CPU"); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> >>>>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>> } >>>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>>> + >>>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>> for >>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto >>>>>>>>>>>>>> hash >>>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp 
>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>> @@ -513,6 +513,16 @@ >>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>>>> } >>>>>>>>>>>>>> // Intel features. >>>>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>>>> >>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >> From rohitarulraj at gmail.com Mon Sep 11 09:08:48 2017 From: rohitarulraj at gmail.com (Rohit Arul Raj) Date: Mon, 11 Sep 2017 14:38:48 +0530 Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults In-Reply-To: <66e4af43-c0e2-6d64-b69f-35166150ffa2@oracle.com> References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com> <4d4fe028-ea6a-4f77-ab69-5c2bc752e1f5@oracle.com> <47bc0a90-ed6a-220a-c3d1-b4df2d8bbc74@oracle.com> <9c53f889-e58e-33ac-3c05-874779b469d6@oracle.com> <45619e1a-9eb0-a540-193b-5187da3bf6bc@oracle.com> <66e4af43-c0e2-6d64-b69f-35166150ffa2@oracle.com> Message-ID: Hello David, Thanks for the review. > > This detail is getting beyond my knowledge I'm afraid. I have two queries: > > 1. ExtCpuid1EEx > > Should this be ExtCpuid1EEbx? (I see the naming here is somewhat > inconsistent - and potentially confusing: I would have preferred to see > things like ExtCpuid_1E_Ebx, to make it clear.) Yes, I can change it accordingly. > 2. 
You fixed the calculation of threads_per_core by reading it directly, but
> I have to wonder which part of the old calculation:
>
> 695 result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu /
> 696 cores_per_cpu();
>
> was actually going wrong in the case of the 17h?

Currently in OpenJDK, we get the number of threads per core from the following CPUID bits:

threads_per_core = CPUID_Fn00000001_EBX [16:23] / CPUID_Fn80000008_ECX [0:7] (cores_per_cpu)

It used to work with AMD15h since:
CPUID Fn0000_0001_EBX [16:23] : gives the number of logical cores.
CPUID Fn8000_0008_ECX [0:7] : gives the number of physical cores.
http://support.amd.com/TechDocs/55072_AMD_Family_15h_Models_70h-7Fh_BKDG.pdf
[Pg 54: 2.4.11.1 Multi-Core Support]

For AMD17h EPYC, using the same CPUID fields, we get threads per core as 1 since:
CPUID Fn0000_0001_EBX [16:23] : gives the number of logical cores.
CPUID Fn8000_0008_ECX [0:7] : gives the number of threads in the package.
https://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf
[Pg 74]

Hence the change was required for the 17h family of processors.

I will update the changes and submit the patch again.

Thanks,
Rohit

> Thanks,
> David
> -----
>
>
>> Reference:
>> https://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf
>> [Pg 82]
>>
>> CPUID_Fn8000001E_EBX [Core Identifiers] (CoreId)
>> 15:8 ThreadsPerCore: threads per core. Read-only. Reset: XXh.
>> The number of threads per core is ThreadsPerCore+1.
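For illustration only, the two calculations discussed above can be sketched as standalone C++. This is not part of the patch: the helper names (extract_bits, legacy_threads_per_core, amd17h_threads_per_core) are hypothetical, and the inputs are assumed to be values already read from the CPUID registers named in the thread.

```cpp
#include <cstdint>

// Hypothetical helper: returns bits [hi:lo] of a 32-bit register value.
inline uint32_t extract_bits(uint32_t reg, unsigned hi, unsigned lo) {
  const uint64_t mask = (uint64_t{1} << (hi - lo + 1)) - 1;
  return static_cast<uint32_t>((reg >> lo) & mask);
}

// Older scheme described in the thread: logical processor count divided by
// the core count. Falls back to 1 when the inputs give no usable ratio,
// mirroring the "result == 0 ? 1 : result" guard in the quoted code.
inline uint32_t legacy_threads_per_core(uint32_t logical_count,
                                        uint32_t core_count) {
  if (core_count == 0) return 1;
  const uint32_t result = logical_count / core_count;
  return result == 0 ? 1 : result;
}

// 17h scheme: CPUID Fn8000_001E EBX[15:8] holds ThreadsPerCore, and the
// actual number of threads per core is that field plus one.
inline uint32_t amd17h_threads_per_core(uint32_t fn8000001e_ebx) {
  return extract_bits(fn8000001e_ebx, 15, 8) + 1;
}
```

With assumed example values, an EBX value of 0x00000100 (ThreadsPerCore field = 1) gives 2 threads per core from the new read, whereas the old ratio gives 1 whenever the divisor already counts threads rather than cores, e.g. legacy_threads_per_core(16, 16).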
>> >> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >> b/src/cpu/x86/vm/vm_version_x86.cpp >> --- a/src/cpu/x86/vm/vm_version_x86.cpp >> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >> @@ -70,7 +70,7 @@ >> bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2); >> >> Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4; >> - Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >> done, wrapup; >> + Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >> ext_cpuid8, done, wrapup; >> Label legacy_setup, save_restore_except, legacy_save_restore, >> start_simd_check; >> >> StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub"); >> @@ -272,9 +272,23 @@ >> __ jccb(Assembler::belowEqual, ext_cpuid5); >> __ cmpl(rax, 0x80000007); // Is cpuid(0x80000008) supported? >> __ jccb(Assembler::belowEqual, ext_cpuid7); >> + __ cmpl(rax, 0x80000008); // Is cpuid(0x8000001E) supported? >> + __ jccb(Assembler::belowEqual, ext_cpuid8); >> + // >> + // Extended cpuid(0x8000001E) >> + // >> + __ movl(rax, 0x8000001E); >> + __ cpuid(); >> + __ lea(rsi, Address(rbp, >> in_bytes(VM_Version::ext_cpuid1E_offset()))); >> + __ movl(Address(rsi, 0), rax); >> + __ movl(Address(rsi, 4), rbx); >> + __ movl(Address(rsi, 8), rcx); >> + __ movl(Address(rsi,12), rdx); >> + >> // >> // Extended cpuid(0x80000008) >> // >> + __ bind(ext_cpuid8); >> __ movl(rax, 0x80000008); >> __ cpuid(); >> __ lea(rsi, Address(rbp, >> in_bytes(VM_Version::ext_cpuid8_offset()))); >> @@ -1109,11 +1123,27 @@ >> } >> >> #ifdef COMPILER2 >> - if (MaxVectorSize > 16) { >> - // Limit vectors size to 16 bytes on current AMD cpus. >> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >> } >> #endif // COMPILER2 >> + >> + // Some defaults for AMD family 17h >> + if ( cpu_family() == 0x17 ) { >> + // On family 17h processors use XMM and UnalignedLoadStores for >> Array Copy >> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >> + } >> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >> + } >> +#ifdef COMPILER2 >> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >> + } >> +#endif >> + } >> } >> >> if( is_intel() ) { // Intel cpus specific settings >> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >> b/src/cpu/x86/vm/vm_version_x86.hpp >> --- a/src/cpu/x86/vm/vm_version_x86.hpp >> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >> @@ -228,6 +228,15 @@ >> } bits; >> }; >> >> + union ExtCpuid1EEx { >> + uint32_t value; >> + struct { >> + uint32_t : 8, >> + threads_per_core : 8, >> + : 16; >> + } bits; >> + }; >> + >> union XemXcr0Eax { >> uint32_t value; >> struct { >> @@ -398,6 +407,12 @@ >> ExtCpuid8Ecx ext_cpuid8_ecx; >> uint32_t ext_cpuid8_edx; // reserved >> >> + // cpuid function 0x8000001E // AMD 17h >> + uint32_t ext_cpuid1E_eax; >> + ExtCpuid1EEx ext_cpuid1E_ebx; // threads per core (AMD17h) >> + uint32_t ext_cpuid1E_ecx; >> + uint32_t ext_cpuid1E_edx; // unused currently >> + >> // extended control register XCR0 (the XFEATURE_ENABLED_MASK >> register) >> XemXcr0Eax xem_xcr0_eax; >> uint32_t xem_xcr0_edx; // reserved >> @@ -505,6 +520,14 @@ >> result |= CPU_CLMUL; >> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >> result |= CPU_RTM; >> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >> + result |= CPU_ADX; >> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >> + result |= CPU_BMI2; >> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >> + result |= CPU_SHA; >> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >> + 
result |= CPU_FMA; >> >> // AMD features. >> if (is_amd()) { >> @@ -518,16 +541,8 @@ >> } >> // Intel features. >> if(is_intel()) { >> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >> - result |= CPU_ADX; >> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >> - result |= CPU_BMI2; >> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >> - result |= CPU_SHA; >> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >> result |= CPU_LZCNT; >> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >> - result |= CPU_FMA; >> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >> support for prefetchw >> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >> result |= CPU_3DNOW_PREFETCH; >> @@ -590,6 +605,7 @@ >> static ByteSize ext_cpuid5_offset() { return >> byte_offset_of(CpuidInfo, ext_cpuid5_eax); } >> static ByteSize ext_cpuid7_offset() { return >> byte_offset_of(CpuidInfo, ext_cpuid7_eax); } >> static ByteSize ext_cpuid8_offset() { return >> byte_offset_of(CpuidInfo, ext_cpuid8_eax); } >> + static ByteSize ext_cpuid1E_offset() { return >> byte_offset_of(CpuidInfo, ext_cpuid1E_eax); } >> static ByteSize tpl_cpuidB0_offset() { return >> byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } >> static ByteSize tpl_cpuidB1_offset() { return >> byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } >> static ByteSize tpl_cpuidB2_offset() { return >> byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } >> @@ -673,8 +689,11 @@ >> if (is_intel() && supports_processor_topology()) { >> result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; >> } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { >> - result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >> - cores_per_cpu(); >> + if (cpu_family() >= 0x17) >> + result = _cpuid_info.ext_cpuid1E_ebx.bits.threads_per_core + 1; >> + else >> + result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >> + cores_per_cpu(); >> } >> return (result == 0 ? 1 : result); >> } >> >> I have attached the patch for review. >> Please let me know your comments. 
>> >> Thanks, >> Rohit >> >>> Thanks, >>> Vladimir >>> >>> >>>> >>>> src/cpu/x86/vm/vm_version_x86.cpp >>>> >>>> No comments on AMD specific changes. >>>> >>>> Thanks, >>>> David >>>> ----- >>>> >>>> On 5/09/2017 3:43 PM, David Holmes wrote: >>>>> >>>>> >>>>> On 5/09/2017 3:29 PM, Rohit Arul Raj wrote: >>>>>> >>>>>> >>>>>> Hello David, >>>>>> >>>>>> On Tue, Sep 5, 2017 at 10:31 AM, David Holmes >>>>>> >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> Hi Rohit, >>>>>>> >>>>>>> I was unable to apply your patch to latest jdk10/hs/hotspot repo. >>>>>>> >>>>>> >>>>>> I checked out the latest jdk10/hs/hotspot [parent: 13548:1a9c2e07a826] >>>>>> and was able to apply the patch [epyc-amd17h-defaults-3Sept.patch] >>>>>> without any issues. >>>>>> Can you share the error message that you are getting? >>>>> >>>>> >>>>> >>>>> I was getting this: >>>>> >>>>> applying hotspot.patch >>>>> patching file src/cpu/x86/vm/vm_version_x86.cpp >>>>> Hunk #1 FAILED at 1108 >>>>> 1 out of 1 hunks FAILED -- saving rejects to file >>>>> src/cpu/x86/vm/vm_version_x86.cpp.rej >>>>> patching file src/cpu/x86/vm/vm_version_x86.hpp >>>>> Hunk #2 FAILED at 522 >>>>> 1 out of 2 hunks FAILED -- saving rejects to file >>>>> src/cpu/x86/vm/vm_version_x86.hpp.rej >>>>> abort: patch failed to apply >>>>> >>>>> but I started again and this time it applied fine, so not sure what was >>>>> going on there. >>>>> >>>>> Cheers, >>>>> David >>>>> >>>>>> Regards, >>>>>> Rohit >>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> On 4/09/2017 2:42 AM, Rohit Arul Raj wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Hello Vladimir, >>>>>>>> >>>>>>>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Hi Rohit, >>>>>>>>> >>>>>>>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hello Vladimir, >>>>>>>>>> >>>>>>>>>>> Changes look good. Only question I have is about MaxVectorSize. 
>>>>>>>>>>> It >>>>>>>>>>> is >>>>>>>>>>> set >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> 16 only in presence of AVX: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945 >>>>>>>>>>> >>>>>>>>>>> Does that code works for AMD 17h too? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks for pointing that out. Yes, the code works fine for AMD >>>>>>>>>> 17h. >>>>>>>>>> So >>>>>>>>>> I have removed the surplus check for MaxVectorSize from my patch. >>>>>>>>>> I >>>>>>>>>> have updated, re-tested and attached the patch. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Which check you removed? >>>>>>>>> >>>>>>>> >>>>>>>> My older patch had the below mentioned check which was required on >>>>>>>> JDK9 where the default MaxVectorSize was 64. It has been handled >>>>>>>> better in openJDK10. So this check is not required anymore. >>>>>>>> >>>>>>>> + // Some defaults for AMD family 17h >>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>> ... >>>>>>>> ... >>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>> + } >>>>>>>> .. >>>>>>>> .. >>>>>>>> + } >>>>>>>> >>>>>>>>>> >>>>>>>>>> I have one query regarding the setting of UseSHA flag: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821 >>>>>>>>>> >>>>>>>>>> AMD 17h has support for SHA. >>>>>>>>>> AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets >>>>>>>>>> enabled for it based on the availability of BMI2 and AVX2. Is >>>>>>>>>> there >>>>>>>>>> an >>>>>>>>>> underlying reason for this? I have handled this in the patch but >>>>>>>>>> just >>>>>>>>>> wanted to confirm. 
>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> It was done with next changes which use only AVX2 and BMI2 >>>>>>>>> instructions >>>>>>>>> to >>>>>>>>> calculate SHA-256: >>>>>>>>> >>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974 >>>>>>>>> >>>>>>>>> I don't know if AMD 15h supports these instructions and can execute >>>>>>>>> that >>>>>>>>> code. You need to test it. >>>>>>>>> >>>>>>>> >>>>>>>> Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions, >>>>>>>> it should work. >>>>>>>> Confirmed by running following sanity tests: >>>>>>>> >>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java >>>>>>>> >>>>>>>> >>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java >>>>>>>> >>>>>>>> >>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java >>>>>>>> >>>>>>>> So I have removed those SHA checks from my patch too. >>>>>>>> >>>>>>>> Please find attached updated, re-tested patch. >>>>>>>> >>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> @@ -1109,11 +1109,27 @@ >>>>>>>> } >>>>>>>> >>>>>>>> #ifdef COMPILER2 >>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>> } >>>>>>>> #endif // COMPILER2 >>>>>>>> + >>>>>>>> + // Some defaults for AMD family 17h >>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>> for >>>>>>>> Array Copy >>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>> + } >>>>>>>> + if (supports_sse2() && >>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>> { >>>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>> + } >>>>>>>> +#ifdef COMPILER2 >>>>>>>> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) >>>>>>>> { >>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>> + } >>>>>>>> +#endif >>>>>>>> + } >>>>>>>> } >>>>>>>> >>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>> result |= CPU_CLMUL; >>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>> result |= CPU_RTM; >>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>> + result |= CPU_ADX; >>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>> + result |= CPU_BMI2; >>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>> + result |= CPU_SHA; >>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>> + result |= CPU_FMA; >>>>>>>> >>>>>>>> // AMD features. >>>>>>>> if (is_amd()) { >>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>> result |= CPU_LZCNT; >>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>> result |= CPU_SSE4A; >>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>> + result |= CPU_HT; >>>>>>>> } >>>>>>>> // Intel features. 
>>>>>>>> if(is_intel()) { >>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>> - result |= CPU_ADX; >>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>> - result |= CPU_BMI2; >>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>> - result |= CPU_SHA; >>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>> result |= CPU_LZCNT; >>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>> - result |= CPU_FMA; >>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>>>>> support for prefetchw >>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>> >>>>>>>> Please let me know your comments. >>>>>>>> >>>>>>>> Thanks for your time. >>>>>>>> Rohit >>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks for taking time to review the code. >>>>>>>>>> >>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>>> } >>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>> } >>>>>>>>>> + if (supports_sha()) { >>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>> + } >>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics >>>>>>>>>> || >>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>> CPU"); >>>>>>>>>> + } >>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>> + } >>>>>>>>>> 
>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>> @@ -1109,11 +1125,40 @@ >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>> } >>>>>>>>>> #endif // COMPILER2 >>>>>>>>>> + >>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>>>> for >>>>>>>>>> Array Copy >>>>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>> { >>>>>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>>>> + } >>>>>>>>>> + if (supports_sse2() && >>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>>>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>>>> + } >>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>> { >>>>>>>>>> + FLAG_SET_DEFAULT(UseBMI2Instructions, true); >>>>>>>>>> + } >>>>>>>>>> + if (UseSHA) { >>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>>>>> functions not available on this CPU."); >>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>> + } >>>>>>>>>> + } >>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>> + } >>>>>>>>>> + } >>>>>>>>>> +#endif >>>>>>>>>> + } >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp 
>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>> result |= CPU_RTM; >>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>> >>>>>>>>>> // AMD features. >>>>>>>>>> if (is_amd()) { >>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>> + result |= CPU_HT; >>>>>>>>>> } >>>>>>>>>> // Intel features. 
>>>>>>>>>> if(is_intel()) { >>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>>> indicates >>>>>>>>>> support for prefetchw >>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Rohit >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think the patch needs updating for jdk10 as I already see a >>>>>>>>>>>>>> lot of >>>>>>>>>>>>>> logic >>>>>>>>>>>>>> around UseSHA in vm_version_x86.cpp. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> David >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks David, I will update the patch wrt JDK10 source base, >>>>>>>>>>>>> test >>>>>>>>>>>>> and >>>>>>>>>>>>> resubmit for review. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> Rohit >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Hi All, >>>>>>>>>>>> >>>>>>>>>>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>>>>>>>>>> 13519:71337910df60), did regression testing using jtreg ($make >>>>>>>>>>>> default) and didnt find any regressions. >>>>>>>>>>>> >>>>>>>>>>>> Can anyone please volunteer to review this patch which sets >>>>>>>>>>>> flag/ISA >>>>>>>>>>>> defaults for newer AMD 17h (EPYC) processor? >>>>>>>>>>>> >>>>>>>>>>>> ************************* Patch **************************** >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>>>>> } >>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>> } >>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>> + } >>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>> || >>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>>>> CPU"); >>>>>>>>>>>> + } >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>> + } >>>>>>>>>>>> >>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>> @@ -1109,11 +1125,43 @@ >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>> - if 
(MaxVectorSize > 16) { >>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>> } >>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>> + >>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>> for >>>>>>>>>>>> Array Copy >>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>> { >>>>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>>>> + } >>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>> { >>>>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>>>> + } >>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>>>> + } >>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>> + } >>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto >>>>>>>>>>>> hash >>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>> + } >>>>>>>>>>>> + } >>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>> + } >>>>>>>>>>>> + } >>>>>>>>>>>> +#endif >>>>>>>>>>>> + } >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp 
>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>>> result |= CPU_RTM; >>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>> >>>>>>>>>>>> // AMD features. >>>>>>>>>>>> if (is_amd()) { >>>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>> } >>>>>>>>>>>> // Intel features. 
>>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>>>>> indicates >>>>>>>>>>>> support for prefetchw >>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != >>>>>>>>>>>> 0) { >>>>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>>> >>>>>>>>>>>> ************************************************************** >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Rohit >>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I would like an volunteer to review this patch (openJDK9) >>>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>>> sets >>>>>>>>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and >>>>>>>>>>>>>>>>> help >>>>>>>>>>>>>>>>> us >>>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>> the commit process. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Unfortunately patches can not be accepted from systems >>>>>>>>>>>>>>>> outside >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> OpenJDK >>>>>>>>>>>>>>>> infrastructure and ... >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I have also attached the patch (hg diff -g) for reference. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ... unfortunately patches tend to get stripped by the mail >>>>>>>>>>>>>>>> servers. >>>>>>>>>>>>>>>> If >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> patch is small please include it inline. Otherwise you will >>>>>>>>>>>>>>>> need >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> find >>>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>> OpenJDK Author who can host it for you on >>>>>>>>>>>>>>>> cr.openjdk.java.net. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 3) I have done regression testing using jtreg ($make >>>>>>>>>>>>>>>>> default) >>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>> didnt find any regressions. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Sounds good, but until I see the patch it is hard to comment >>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>> testing >>>>>>>>>>>>>>>> requirements. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks David, >>>>>>>>>>>>>>> Yes, it's a small patch. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>>>> || >>>>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>>>>>>> CPU"); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>>> for >>>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto >>>>>>>>>>>>>>> hash >>>>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp 
>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>> @@ -513,6 +513,16 @@ >>>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> // Intel features. >>>>>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>> >>> > From rohitarulraj at gmail.com Tue Sep 12 04:52:46 2017 From: rohitarulraj at gmail.com (Rohit Arul Raj) Date: Tue, 12 Sep 2017 10:22:46 +0530 Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults In-Reply-To: References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com> <4d4fe028-ea6a-4f77-ab69-5c2bc752e1f5@oracle.com> <47bc0a90-ed6a-220a-c3d1-b4df2d8bbc74@oracle.com> <9c53f889-e58e-33ac-3c05-874779b469d6@oracle.com> <45619e1a-9eb0-a540-193b-5187da3bf6bc@oracle.com> <66e4af43-c0e2-6d64-b69f-35166150ffa2@oracle.com> Message-ID: Hello David, >> >> >> 1. ExtCpuid1EEx >> >> Should this be ExtCpuid1EEbx? (I see the naming here is somewhat >> inconsistent - and potentially confusing: I would have preferred to see >> things like ExtCpuid_1E_Ebx, to make it clear.) > > Yes, I can change it accordingly. > I have attached the updated, re-tested patch as per your comments above. 
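For anyone following the threads-per-core change, here is a minimal standalone sketch (not part of the patch) of how the ThreadsPerCore field of CPUID_Fn8000001E_EBX can be decoded with an explicit shift and mask, which is what the ExtCpuid_1E_Ebx bit-field in the diff is meant to express. The helper name threads_per_core_from_ebx and the sample register values are hypothetical and exist only for illustration.

```cpp
#include <cstdint>

// Hypothetical helper, not a HotSpot function.
// Decodes ThreadsPerCore (bits 15:8) of CPUID Fn8000001E EBX.
// Per the AMD family 17h PPR quoted in this thread, the number of
// threads per core is ThreadsPerCore + 1.
constexpr uint32_t threads_per_core_from_ebx(uint32_t ebx) {
  return ((ebx >> 8) & 0xFFu) + 1u;
}

// Illustrative (made-up) register values:
// field == 1 means two threads per core (SMT enabled).
static_assert(threads_per_core_from_ebx(0x00000100u) == 2u, "SMT on");
// field == 0 means one thread per core; bits 7:0 are ignored.
static_assert(threads_per_core_from_ebx(0x00000007u) == 1u, "SMT off");
```

The "+ 1" matters: reading the raw field alone would report one thread per core on an SMT-enabled part, which is why the patch adds 1 in the cpu_family() >= 0x17 branch.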
diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp --- a/src/cpu/x86/vm/vm_version_x86.cpp +++ b/src/cpu/x86/vm/vm_version_x86.cpp @@ -70,7 +70,7 @@ bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2); Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4; - Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, done, wrapup; + Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, ext_cpuid8, done, wrapup; Label legacy_setup, save_restore_except, legacy_save_restore, start_simd_check; StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub"); @@ -272,9 +272,23 @@ __ jccb(Assembler::belowEqual, ext_cpuid5); __ cmpl(rax, 0x80000007); // Is cpuid(0x80000008) supported? __ jccb(Assembler::belowEqual, ext_cpuid7); + __ cmpl(rax, 0x80000008); // Is cpuid(0x8000001E) supported? + __ jccb(Assembler::belowEqual, ext_cpuid8); + // + // Extended cpuid(0x8000001E) + // + __ movl(rax, 0x8000001E); + __ cpuid(); + __ lea(rsi, Address(rbp, in_bytes(VM_Version::ext_cpuid_1E_offset()))); + __ movl(Address(rsi, 0), rax); + __ movl(Address(rsi, 4), rbx); + __ movl(Address(rsi, 8), rcx); + __ movl(Address(rsi,12), rdx); + // // Extended cpuid(0x80000008) // + __ bind(ext_cpuid8); __ movl(rax, 0x80000008); __ cpuid(); __ lea(rsi, Address(rbp, in_bytes(VM_Version::ext_cpuid8_offset()))); @@ -1109,11 +1123,27 @@ } #ifdef COMPILER2 - if (MaxVectorSize > 16) { - // Limit vectors size to 16 bytes on current AMD cpus. + if (cpu_family() < 0x17 && MaxVectorSize > 16) { + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
FLAG_SET_DEFAULT(MaxVectorSize, 16); } #endif // COMPILER2 + + // Some defaults for AMD family 17h + if ( cpu_family() == 0x17 ) { + // On family 17h processors use XMM and UnalignedLoadStores for Array Copy + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); + } + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); + } +#ifdef COMPILER2 + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { + FLAG_SET_DEFAULT(UseFPUForSpilling, true); + } +#endif + } } if( is_intel() ) { // Intel cpus specific settings diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp --- a/src/cpu/x86/vm/vm_version_x86.hpp +++ b/src/cpu/x86/vm/vm_version_x86.hpp @@ -228,6 +228,15 @@ } bits; }; + union ExtCpuid_1E_Ebx { + uint32_t value; + struct { + uint32_t : 8, + threads_per_core : 8, + : 16; + } bits; + }; + union XemXcr0Eax { uint32_t value; struct { @@ -398,6 +407,12 @@ ExtCpuid8Ecx ext_cpuid8_ecx; uint32_t ext_cpuid8_edx; // reserved + // cpuid function 0x8000001E // AMD 17h + uint32_t ext_cpuid_1E_eax; + ExtCpuid_1E_Ebx ext_cpuid_1E_ebx; // threads per core (AMD17h) + uint32_t ext_cpuid_1E_ecx; + uint32_t ext_cpuid_1E_edx; // unused currently + // extended control register XCR0 (the XFEATURE_ENABLED_MASK register) XemXcr0Eax xem_xcr0_eax; uint32_t xem_xcr0_edx; // reserved @@ -505,6 +520,14 @@ result |= CPU_CLMUL; if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) result |= CPU_RTM; + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) + result |= CPU_ADX; + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) + result |= CPU_BMI2; + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) + result |= CPU_SHA; + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) + result |= CPU_FMA; // AMD features. if (is_amd()) { @@ -518,16 +541,8 @@ } // Intel features. 
if(is_intel()) { - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) - result |= CPU_ADX; - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) - result |= CPU_BMI2; - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) - result |= CPU_SHA; if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) result |= CPU_LZCNT; - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) - result |= CPU_FMA; // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { result |= CPU_3DNOW_PREFETCH; @@ -590,6 +605,7 @@ static ByteSize ext_cpuid5_offset() { return byte_offset_of(CpuidInfo, ext_cpuid5_eax); } static ByteSize ext_cpuid7_offset() { return byte_offset_of(CpuidInfo, ext_cpuid7_eax); } static ByteSize ext_cpuid8_offset() { return byte_offset_of(CpuidInfo, ext_cpuid8_eax); } + static ByteSize ext_cpuid_1E_offset() { return byte_offset_of(CpuidInfo, ext_cpuid_1E_eax); } static ByteSize tpl_cpuidB0_offset() { return byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } static ByteSize tpl_cpuidB1_offset() { return byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } static ByteSize tpl_cpuidB2_offset() { return byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } @@ -673,8 +689,11 @@ if (is_intel() && supports_processor_topology()) { result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { - result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / - cores_per_cpu(); + if (cpu_family() >= 0x17) + result = _cpuid_info.ext_cpuid_1E_ebx.bits.threads_per_core + 1; + else + result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / + cores_per_cpu(); } return (result == 0 ? 1 : result); } Please let me know your comments Thanks for your time. Regards, Rohit >> Thanks, >> David >> ----- >> >> >>> Reference: >>> https://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf >>> [Pg 82] >>> >>> CPUID_Fn8000001E_EBX [Core Identifiers] (CoreId) >>> 15:8 ThreadsPerCore: threads per core. Read-only. 
Reset: XXh. >>> The number of threads per core is ThreadsPerCore+1. >>> >>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>> b/src/cpu/x86/vm/vm_version_x86.cpp >>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>> @@ -70,7 +70,7 @@ >>> bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2); >>> >>> Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4; >>> - Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>> done, wrapup; >>> + Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>> ext_cpuid8, done, wrapup; >>> Label legacy_setup, save_restore_except, legacy_save_restore, >>> start_simd_check; >>> >>> StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub"); >>> @@ -272,9 +272,23 @@ >>> __ jccb(Assembler::belowEqual, ext_cpuid5); >>> __ cmpl(rax, 0x80000007); // Is cpuid(0x80000008) supported? >>> __ jccb(Assembler::belowEqual, ext_cpuid7); >>> + __ cmpl(rax, 0x80000008); // Is cpuid(0x8000001E) supported? >>> + __ jccb(Assembler::belowEqual, ext_cpuid8); >>> + // >>> + // Extended cpuid(0x8000001E) >>> + // >>> + __ movl(rax, 0x8000001E); >>> + __ cpuid(); >>> + __ lea(rsi, Address(rbp, >>> in_bytes(VM_Version::ext_cpuid1E_offset()))); >>> + __ movl(Address(rsi, 0), rax); >>> + __ movl(Address(rsi, 4), rbx); >>> + __ movl(Address(rsi, 8), rcx); >>> + __ movl(Address(rsi,12), rdx); >>> + >>> // >>> // Extended cpuid(0x80000008) >>> // >>> + __ bind(ext_cpuid8); >>> __ movl(rax, 0x80000008); >>> __ cpuid(); >>> __ lea(rsi, Address(rbp, >>> in_bytes(VM_Version::ext_cpuid8_offset()))); >>> @@ -1109,11 +1123,27 @@ >>> } >>> >>> #ifdef COMPILER2 >>> - if (MaxVectorSize > 16) { >>> - // Limit vectors size to 16 bytes on current AMD cpus. >>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>> } >>> #endif // COMPILER2 >>> + >>> + // Some defaults for AMD family 17h >>> + if ( cpu_family() == 0x17 ) { >>> + // On family 17h processors use XMM and UnalignedLoadStores for >>> Array Copy >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>> + } >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>> + } >>> +#ifdef COMPILER2 >>> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>> + } >>> +#endif >>> + } >>> } >>> >>> if( is_intel() ) { // Intel cpus specific settings >>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>> b/src/cpu/x86/vm/vm_version_x86.hpp >>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>> @@ -228,6 +228,15 @@ >>> } bits; >>> }; >>> >>> + union ExtCpuid1EEx { >>> + uint32_t value; >>> + struct { >>> + uint32_t : 8, >>> + threads_per_core : 8, >>> + : 16; >>> + } bits; >>> + }; >>> + >>> union XemXcr0Eax { >>> uint32_t value; >>> struct { >>> @@ -398,6 +407,12 @@ >>> ExtCpuid8Ecx ext_cpuid8_ecx; >>> uint32_t ext_cpuid8_edx; // reserved >>> >>> + // cpuid function 0x8000001E // AMD 17h >>> + uint32_t ext_cpuid1E_eax; >>> + ExtCpuid1EEx ext_cpuid1E_ebx; // threads per core (AMD17h) >>> + uint32_t ext_cpuid1E_ecx; >>> + uint32_t ext_cpuid1E_edx; // unused currently >>> + >>> // extended control register XCR0 (the XFEATURE_ENABLED_MASK >>> register) >>> XemXcr0Eax xem_xcr0_eax; >>> uint32_t xem_xcr0_edx; // reserved >>> @@ -505,6 +520,14 @@ >>> result |= CPU_CLMUL; >>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>> result |= CPU_RTM; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>> + result |= CPU_ADX; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>> + result |= CPU_BMI2; >>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>> + result |= 
CPU_SHA; >>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>> + result |= CPU_FMA; >>> >>> // AMD features. >>> if (is_amd()) { >>> @@ -518,16 +541,8 @@ >>> } >>> // Intel features. >>> if(is_intel()) { >>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>> - result |= CPU_ADX; >>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>> - result |= CPU_BMI2; >>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>> - result |= CPU_SHA; >>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>> result |= CPU_LZCNT; >>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>> - result |= CPU_FMA; >>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>> support for prefetchw >>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>> result |= CPU_3DNOW_PREFETCH; >>> @@ -590,6 +605,7 @@ >>> static ByteSize ext_cpuid5_offset() { return >>> byte_offset_of(CpuidInfo, ext_cpuid5_eax); } >>> static ByteSize ext_cpuid7_offset() { return >>> byte_offset_of(CpuidInfo, ext_cpuid7_eax); } >>> static ByteSize ext_cpuid8_offset() { return >>> byte_offset_of(CpuidInfo, ext_cpuid8_eax); } >>> + static ByteSize ext_cpuid1E_offset() { return >>> byte_offset_of(CpuidInfo, ext_cpuid1E_eax); } >>> static ByteSize tpl_cpuidB0_offset() { return >>> byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } >>> static ByteSize tpl_cpuidB1_offset() { return >>> byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } >>> static ByteSize tpl_cpuidB2_offset() { return >>> byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } >>> @@ -673,8 +689,11 @@ >>> if (is_intel() && supports_processor_topology()) { >>> result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; >>> } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { >>> - result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>> - cores_per_cpu(); >>> + if (cpu_family() >= 0x17) >>> + result = _cpuid_info.ext_cpuid1E_ebx.bits.threads_per_core + 1; >>> + else >>> + result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>> + cores_per_cpu(); >>> } >>> 
return (result == 0 ? 1 : result);
>>> }
>>>
>>> I have attached the patch for review.
>>> Please let me know your comments.
>>>
>>> Thanks,
>>> Rohit
>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>>> src/cpu/x86/vm/vm_version_x86.cpp
>>>>>
>>>>> No comments on AMD specific changes.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>> On 5/09/2017 3:43 PM, David Holmes wrote:
>>>>>> On 5/09/2017 3:29 PM, Rohit Arul Raj wrote:
>>>>>>> Hello David,
>>>>>>>
>>>>>>> On Tue, Sep 5, 2017 at 10:31 AM, David Holmes wrote:
>>>>>>>> Hi Rohit,
>>>>>>>>
>>>>>>>> I was unable to apply your patch to latest jdk10/hs/hotspot repo.
>>>>>>>
>>>>>>> I checked out the latest jdk10/hs/hotspot [parent: 13548:1a9c2e07a826]
>>>>>>> and was able to apply the patch [epyc-amd17h-defaults-3Sept.patch]
>>>>>>> without any issues.
>>>>>>> Can you share the error message that you are getting?
>>>>>>
>>>>>> I was getting this:
>>>>>>
>>>>>> applying hotspot.patch
>>>>>> patching file src/cpu/x86/vm/vm_version_x86.cpp
>>>>>> Hunk #1 FAILED at 1108
>>>>>> 1 out of 1 hunks FAILED -- saving rejects to file
>>>>>> src/cpu/x86/vm/vm_version_x86.cpp.rej
>>>>>> patching file src/cpu/x86/vm/vm_version_x86.hpp
>>>>>> Hunk #2 FAILED at 522
>>>>>> 1 out of 2 hunks FAILED -- saving rejects to file
>>>>>> src/cpu/x86/vm/vm_version_x86.hpp.rej
>>>>>> abort: patch failed to apply
>>>>>>
>>>>>> but I started again and this time it applied fine, so not sure what was
>>>>>> going on there.
>>>>>>
>>>>>> Cheers,
>>>>>> David
>>>>>>
>>>>>>> Regards,
>>>>>>> Rohit
>>>>>>>
>>>>>>>> On 4/09/2017 2:42 AM, Rohit Arul Raj wrote:
>>>>>>>>> Hello Vladimir,
>>>>>>>>>
>>>>>>>>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov wrote:
>>>>>>>>>> Hi Rohit,
>>>>>>>>>>
>>>>>>>>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>>>>>>>>>> Hello Vladimir,
>>>>>>>>>>>
>>>>>>>>>>>> Changes look good. Only question I have is about MaxVectorSize.
>>>>>>>>>>>> It is set > 16 only in presence of AVX:
>>>>>>>>>>>>
>>>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>>>>>>>>>>>
>>>>>>>>>>>> Does that code work for AMD 17h too?
>>>>>>>>>>>
>>>>>>>>>>> Thanks for pointing that out. Yes, the code works fine for AMD 17h.
>>>>>>>>>>> So I have removed the surplus check for MaxVectorSize from my patch.
>>>>>>>>>>> I have updated, re-tested and attached the patch.
>>>>>>>>>>
>>>>>>>>>> Which check did you remove?
>>>>>>>>>
>>>>>>>>> My older patch had the check shown below, which was required on
>>>>>>>>> JDK9 where the default MaxVectorSize was 64. It has been handled
>>>>>>>>> better in openJDK10. So this check is not required anymore.
>>>>>>>>>
>>>>>>>>> + // Some defaults for AMD family 17h
>>>>>>>>> + if ( cpu_family() == 0x17 ) {
>>>>>>>>> ...
>>>>>>>>> ...
>>>>>>>>> + if (MaxVectorSize > 32) {
>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>>>>>> + }
>>>>>>>>> ..
>>>>>>>>> ..
>>>>>>>>> + }
>>>>>>>>>
>>>>>>>>>>> I have one query regarding the setting of UseSHA flag:
>>>>>>>>>>>
>>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821
>>>>>>>>>>>
>>>>>>>>>>> AMD 17h has support for SHA.
>>>>>>>>>>> AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets
>>>>>>>>>>> enabled for it based on the availability of BMI2 and AVX2. Is there an
>>>>>>>>>>> underlying reason for this? I have handled this in the patch but just
>>>>>>>>>>> wanted to confirm.
>>>>>>>>>>
>>>>>>>>>> It was done with the following change, which uses only AVX2 and BMI2
>>>>>>>>>> instructions to calculate SHA-256:
>>>>>>>>>>
>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974
>>>>>>>>>>
>>>>>>>>>> I don't know if AMD 15h supports these instructions and can execute that
>>>>>>>>>> code. You need to test it.
>>>>>>>>>
>>>>>>>>> Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions,
>>>>>>>>> it should work.
>>>>>>>>> Confirmed by running the following sanity tests:
>>>>>>>>>
>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java
>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java
>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java
>>>>>>>>>
>>>>>>>>> So I have removed those SHA checks from my patch too.
>>>>>>>>>
>>>>>>>>> Please find attached the updated, re-tested patch.
>>>>>>>>> >>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> @@ -1109,11 +1109,27 @@ >>>>>>>>> } >>>>>>>>> >>>>>>>>> #ifdef COMPILER2 >>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>> } >>>>>>>>> #endif // COMPILER2 >>>>>>>>> + >>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>>> for >>>>>>>>> Array Copy >>>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>>> + } >>>>>>>>> + if (supports_sse2() && >>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>> { >>>>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>>> + } >>>>>>>>> +#ifdef COMPILER2 >>>>>>>>> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) >>>>>>>>> { >>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>> + } >>>>>>>>> +#endif >>>>>>>>> + } >>>>>>>>> } >>>>>>>>> >>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>> result |= CPU_CLMUL; >>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>> result |= CPU_RTM; >>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>> + result |= CPU_ADX; >>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>> + result |= CPU_BMI2; >>>>>>>>> + if 
(_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>> + result |= CPU_SHA; >>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>> + result |= CPU_FMA; >>>>>>>>> >>>>>>>>> // AMD features. >>>>>>>>> if (is_amd()) { >>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>> result |= CPU_LZCNT; >>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>> result |= CPU_SSE4A; >>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>> + result |= CPU_HT; >>>>>>>>> } >>>>>>>>> // Intel features. >>>>>>>>> if(is_intel()) { >>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>> - result |= CPU_ADX; >>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>> - result |= CPU_BMI2; >>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>> - result |= CPU_SHA; >>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>>> result |= CPU_LZCNT; >>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>> - result |= CPU_FMA; >>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>>>>>> support for prefetchw >>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>> >>>>>>>>> Please let me know your comments. >>>>>>>>> >>>>>>>>> Thanks for your time. >>>>>>>>> Rohit >>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks for taking time to review the code. 
>>>>>>>>>>> >>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>>>> } >>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>> } >>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>> + } >>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics >>>>>>>>>>> || >>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>>> CPU"); >>>>>>>>>>> + } >>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>> + } >>>>>>>>>>> >>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>> @@ -1109,11 +1125,40 @@ >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>> } >>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>> + >>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>>>>> for >>>>>>>>>>> Array Copy >>>>>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>> { >>>>>>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>>>>> + } >>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>>>>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>>>>> + } >>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>> { >>>>>>>>>>> + FLAG_SET_DEFAULT(UseBMI2Instructions, true); >>>>>>>>>>> + } >>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>> + } >>>>>>>>>>> + } >>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>> + } >>>>>>>>>>> + } >>>>>>>>>>> +#endif >>>>>>>>>>> + } >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>> result |= CPU_RTM; >>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) 
>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>> >>>>>>>>>>> // AMD features. >>>>>>>>>>> if (is_amd()) { >>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>> } >>>>>>>>>>> // Intel features. >>>>>>>>>>> if(is_intel()) { >>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>>>> indicates >>>>>>>>>>> support for prefetchw >>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Rohit >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Rohit, 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I think the patch needs updating for jdk10 as I already see a >>>>>>>>>>>>>>> lot of >>>>>>>>>>>>>>> logic >>>>>>>>>>>>>>> around UseSHA in vm_version_x86.cpp. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> David >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks David, I will update the patch wrt JDK10 source base, >>>>>>>>>>>>>> test >>>>>>>>>>>>>> and >>>>>>>>>>>>>> resubmit for review. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Hi All, >>>>>>>>>>>>> >>>>>>>>>>>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>>>>>>>>>>> 13519:71337910df60), did regression testing using jtreg ($make >>>>>>>>>>>>> default) and didnt find any regressions. >>>>>>>>>>>>> >>>>>>>>>>>>> Can anyone please volunteer to review this patch which sets >>>>>>>>>>>>> flag/ISA >>>>>>>>>>>>> defaults for newer AMD 17h (EPYC) processor? >>>>>>>>>>>>> >>>>>>>>>>>>> ************************* Patch **************************** >>>>>>>>>>>>> >>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>>>>>> } >>>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>>> } >>>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>> || >>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>>>>> CPU"); >>>>>>>>>>>>> 
+ } >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>> + } >>>>>>>>>>>>> >>>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>> @@ -1109,11 +1125,43 @@ >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>> } >>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>> + >>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>> for >>>>>>>>>>>>> Array Copy >>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>> { >>>>>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>>>>> + } >>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>> { >>>>>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>>>>> + } >>>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>>>>> + } >>>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto >>>>>>>>>>>>> hash >>>>>>>>>>>>> functions not available on 
this CPU."); >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + } >>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + } >>>>>>>>>>>>> +#endif >>>>>>>>>>>>> + } >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>>>> result |= CPU_RTM; >>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>>> >>>>>>>>>>>>> // AMD features. >>>>>>>>>>>>> if (is_amd()) { >>>>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>>> } >>>>>>>>>>>>> // Intel features. 
>>>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>>>>>> indicates >>>>>>>>>>>>> support for prefetchw >>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != >>>>>>>>>>>>> 0) { >>>>>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>>>> >>>>>>>>>>>>> ************************************************************** >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Rohit >>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I would like an volunteer to review this patch (openJDK9) >>>>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>>>> sets >>>>>>>>>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and >>>>>>>>>>>>>>>>>> help >>>>>>>>>>>>>>>>>> us >>>>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>> the commit process. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Unfortunately patches can not be accepted from systems >>>>>>>>>>>>>>>>> outside >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> OpenJDK >>>>>>>>>>>>>>>>> infrastructure and ... >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I have also attached the patch (hg diff -g) for reference. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ... unfortunately patches tend to get stripped by the mail >>>>>>>>>>>>>>>>> servers. >>>>>>>>>>>>>>>>> If >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> patch is small please include it inline. Otherwise you will >>>>>>>>>>>>>>>>> need >>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>> find >>>>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>>> OpenJDK Author who can host it for you on >>>>>>>>>>>>>>>>> cr.openjdk.java.net. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 3) I have done regression testing using jtreg ($make >>>>>>>>>>>>>>>>>> default) >>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>> didnt find any regressions. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Sounds good, but until I see the patch it is hard to comment >>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>> testing >>>>>>>>>>>>>>>>> requirements. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks David, >>>>>>>>>>>>>>>> Yes, it's a small patch. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>>>>> || >>>>>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>>>>>>>> CPU"); >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. 
>>>>>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto >>>>>>>>>>>>>>>> hash >>>>>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>> @@ -513,6 +513,16 @@ >>>>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> // Intel features. >>>>>>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>> >> From glaubitz at physik.fu-berlin.de Tue Sep 12 19:58:52 2017 From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz) Date: Tue, 12 Sep 2017 21:58:52 +0200 Subject: [RFR]: 8187227: __m68k_cmpxchg() is not being used correctly In-Reply-To: References: Message-ID: <86379c30-80e6-04ab-a270-be034657f7ea@physik.fu-berlin.de> Hi David! Sorry for the late reply! On 09/06/2017 04:11 AM, David Holmes wrote: > Not really a review but I was curious about this ... Ok :). Looks like a review though. > On 5/09/2017 8:00 PM, John Paul Adrian Glaubitz wrote: > I am surprised this even works at all. So trying to follow the logic if initially "prev == oldval" then the cas actually succeeds, but the loop logic thinks it > failed and so retries. 
It re-reads the current value into prev, which no longer equals oldval, so the loop terminates and it returns "prev" which may actually
> be the value that was updated by the successful cas; or it could be a different value if some other thread has since also performed a successful cas. So this
> function would always report failure, even on success (except in ABA situation)! I can't see how anything would work in that case ??

I have to admit, compare-and-swap always gets me confused, so I have to
carefully re-check the logic. It goes like this:

1) store value of memory into prev
2) check if it's not equal to oldval (which is false at this point),
   thus continue and don't return
3) perform exchange, newprev contains oldval on success
4) if prev and newprev are identical, we actually made a successful
   exchange, hence return oldval
5) if newprev != prev, the exchange failed and we have whatever was
   in the memory location
6) write memory location contents into prev
7) if prev is not oldval, it means there was no exchange, hence
   return what was in the memory location

Does that make any sense?

> BTW m68k_compare_and_swap does not need to have a loop at all, it only has to
> do the cas and return the correct value. A loop would only be needed if the
> low-level cas can fail spuriously - which does not seem to be the case.

Good point. I will ask Andreas Schwab, who is the Linux m68k expert, for
his opinion.

Adrian

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz at debian.org
`.
`'   Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

From david.holmes at oracle.com Wed Sep 13 03:34:02 2017
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 13 Sep 2017 13:34:02 +1000
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To:
References: <4d4fe028-ea6a-4f77-ab69-5c2bc752e1f5@oracle.com> <47bc0a90-ed6a-220a-c3d1-b4df2d8bbc74@oracle.com> <9c53f889-e58e-33ac-3c05-874779b469d6@oracle.com> <45619e1a-9eb0-a540-193b-5187da3bf6bc@oracle.com> <66e4af43-c0e2-6d64-b69f-35166150ffa2@oracle.com>
Message-ID: <8821f457-c060-48bc-d874-f8d114fa0f48@oracle.com>

Hi Rohit,

On 12/09/2017 2:52 PM, Rohit Arul Raj wrote:
> Hello David,
>
>>>
>>>
>>> 1. ExtCpuid1EEx
>>>
>>> Should this be ExtCpuid1EEbx? (I see the naming here is somewhat
>>> inconsistent - and potentially confusing: I would have preferred to see
>>> things like ExtCpuid_1E_Ebx, to make it clear.)
>>
>> Yes, I can change it accordingly.

The parenthetical comment was wishful thinking - that the underscores had
been used for all of these names in this code. The change I was suggesting
was just use of ExtCpuid1EEbx.

I don't need to see further changes. This can be finalized when Vladimir
pushes the change once the repo is open again.

Thanks,
David

>
> I have attached the updated, re-tested patch as per your comments above.
>
> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
> b/src/cpu/x86/vm/vm_version_x86.cpp
> --- a/src/cpu/x86/vm/vm_version_x86.cpp
> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
> @@ -70,7 +70,7 @@
>      bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2);
>
>      Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4;
> -    Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7,
> done, wrapup;
> +    Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7,
> ext_cpuid8, done, wrapup;
>      Label legacy_setup, save_restore_except, legacy_save_restore,
> start_simd_check;
>
>      StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub");
> @@ -272,9 +272,23 @@
>      __ jccb(Assembler::belowEqual, ext_cpuid5);
>      __ cmpl(rax, 0x80000007); // Is cpuid(0x80000008) supported?
>      __ jccb(Assembler::belowEqual, ext_cpuid7);
> +    __ cmpl(rax, 0x80000008); // Is cpuid(0x8000001E) supported?
> +    __ jccb(Assembler::belowEqual, ext_cpuid8);
> +    //
> +    // Extended cpuid(0x8000001E)
> +    //
> +    __ movl(rax, 0x8000001E);
> +    __ cpuid();
> +    __ lea(rsi, Address(rbp, in_bytes(VM_Version::ext_cpuid_1E_offset())));
> +    __ movl(Address(rsi, 0), rax);
> +    __ movl(Address(rsi, 4), rbx);
> +    __ movl(Address(rsi, 8), rcx);
> +    __ movl(Address(rsi,12), rdx);
> +
>      //
>      // Extended cpuid(0x80000008)
>      //
> +    __ bind(ext_cpuid8);
>      __ movl(rax, 0x80000008);
>      __ cpuid();
>      __ lea(rsi, Address(rbp, in_bytes(VM_Version::ext_cpuid8_offset())));
> @@ -1109,11 +1123,27 @@
>        }
>
>  #ifdef COMPILER2
> -      if (MaxVectorSize > 16) {
> -        // Limit vectors size to 16 bytes on current AMD cpus.
> +      if (cpu_family() < 0x17 && MaxVectorSize > 16) {
> +        // Limit vectors size to 16 bytes on AMD cpus < 17h.
>          FLAG_SET_DEFAULT(MaxVectorSize, 16);
>        }
>  #endif // COMPILER2
> +
> +      // Some defaults for AMD family 17h
> +      if ( cpu_family() == 0x17 ) {
> +        // On family 17h processors use XMM and UnalignedLoadStores for
> Array Copy
> +        if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
> +          FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
> +        }
> +        if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
> +          FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
> +        }
> +#ifdef COMPILER2
> +        if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) {
> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
> +        }
> +#endif
> +      }
>      }
>
>      if( is_intel() ) { // Intel cpus specific settings
> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
> b/src/cpu/x86/vm/vm_version_x86.hpp
> --- a/src/cpu/x86/vm/vm_version_x86.hpp
> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
> @@ -228,6 +228,15 @@
>      } bits;
>    };
>
> +  union ExtCpuid_1E_Ebx {
> +    uint32_t value;
> +    struct {
> +      uint32_t                  : 8,
> +               threads_per_core : 8,
> +                                : 16;
> +    } bits;
> +  };
> +
>    union XemXcr0Eax {
>      uint32_t value;
>      struct {
> @@ -398,6 +407,12 @@
>      ExtCpuid8Ecx ext_cpuid8_ecx;
>      uint32_t ext_cpuid8_edx; // reserved
>
> +    // cpuid function 0x8000001E // AMD 17h
> +    uint32_t ext_cpuid_1E_eax;
> +    ExtCpuid_1E_Ebx ext_cpuid_1E_ebx; // threads per core (AMD17h)
> +    uint32_t ext_cpuid_1E_ecx;
> +    uint32_t ext_cpuid_1E_edx; // unused currently
> +
>      // extended control register XCR0 (the XFEATURE_ENABLED_MASK register)
>      XemXcr0Eax xem_xcr0_eax;
>      uint32_t xem_xcr0_edx; // reserved
> @@ -505,6 +520,14 @@
>        result |= CPU_CLMUL;
>      if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>        result |= CPU_RTM;
> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
> +      result |= CPU_ADX;
> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
> +      result |= CPU_BMI2;
> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
> +      result |= CPU_SHA;
> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
> +      result |= CPU_FMA;
>
>      // AMD features.
>      if (is_amd()) {
> @@ -518,16 +541,8 @@
>      }
>      // Intel features.
>      if(is_intel()) {
> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
> -        result |= CPU_ADX;
> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
> -        result |= CPU_BMI2;
> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
> -        result |= CPU_SHA;
>        if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>          result |= CPU_LZCNT;
> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
> -        result |= CPU_FMA;
>        // for Intel, ecx.bits.misalignsse bit (bit 8) indicates
> support for prefetchw
>        if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>          result |= CPU_3DNOW_PREFETCH;
> @@ -590,6 +605,7 @@
>    static ByteSize ext_cpuid5_offset() { return
> byte_offset_of(CpuidInfo, ext_cpuid5_eax); }
>    static ByteSize ext_cpuid7_offset() { return
> byte_offset_of(CpuidInfo, ext_cpuid7_eax); }
>    static ByteSize ext_cpuid8_offset() { return
> byte_offset_of(CpuidInfo, ext_cpuid8_eax); }
> +  static ByteSize ext_cpuid_1E_offset() { return
> byte_offset_of(CpuidInfo, ext_cpuid_1E_eax); }
>    static ByteSize tpl_cpuidB0_offset() { return
> byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); }
>    static ByteSize tpl_cpuidB1_offset() { return
> byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); }
>    static ByteSize tpl_cpuidB2_offset() { return
> byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); }
> @@ -673,8 +689,11 @@
>      if (is_intel() && supports_processor_topology()) {
>        result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus;
>      } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) {
> -      result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu /
> -               cores_per_cpu();
> +      if (cpu_family() >= 0x17)
> +        result = _cpuid_info.ext_cpuid_1E_ebx.bits.threads_per_core + 1;
> +      else
> +        result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu /
> +                 cores_per_cpu();
>      }
>      return (result == 0 ? 1 : result);
>    }
>
>
> Please let me know your comments
>
> Thanks for your time.
> > Regards, > Rohit > > >>> Thanks, >>> David >>> ----- >>> >>> >>>> Reference: >>>> https://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf >>>> [Pg 82] >>>> >>>> CPUID_Fn8000001E_EBX [Core Identifiers] (CoreId) >>>> 15:8 ThreadsPerCore: threads per core. Read-only. Reset: XXh. >>>> The number of threads per core is ThreadsPerCore+1. >>>> >>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>> @@ -70,7 +70,7 @@ >>>> bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2); >>>> >>>> Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4; >>>> - Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>>> done, wrapup; >>>> + Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>>> ext_cpuid8, done, wrapup; >>>> Label legacy_setup, save_restore_except, legacy_save_restore, >>>> start_simd_check; >>>> >>>> StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub"); >>>> @@ -272,9 +272,23 @@ >>>> __ jccb(Assembler::belowEqual, ext_cpuid5); >>>> __ cmpl(rax, 0x80000007); // Is cpuid(0x80000008) supported? >>>> __ jccb(Assembler::belowEqual, ext_cpuid7); >>>> + __ cmpl(rax, 0x80000008); // Is cpuid(0x8000001E) supported? 
>>>> + __ jccb(Assembler::belowEqual, ext_cpuid8); >>>> + // >>>> + // Extended cpuid(0x8000001E) >>>> + // >>>> + __ movl(rax, 0x8000001E); >>>> + __ cpuid(); >>>> + __ lea(rsi, Address(rbp, >>>> in_bytes(VM_Version::ext_cpuid1E_offset()))); >>>> + __ movl(Address(rsi, 0), rax); >>>> + __ movl(Address(rsi, 4), rbx); >>>> + __ movl(Address(rsi, 8), rcx); >>>> + __ movl(Address(rsi,12), rdx); >>>> + >>>> // >>>> // Extended cpuid(0x80000008) >>>> // >>>> + __ bind(ext_cpuid8); >>>> __ movl(rax, 0x80000008); >>>> __ cpuid(); >>>> __ lea(rsi, Address(rbp, >>>> in_bytes(VM_Version::ext_cpuid8_offset()))); >>>> @@ -1109,11 +1123,27 @@ >>>> } >>>> >>>> #ifdef COMPILER2 >>>> - if (MaxVectorSize > 16) { >>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>> } >>>> #endif // COMPILER2 >>>> + >>>> + // Some defaults for AMD family 17h >>>> + if ( cpu_family() == 0x17 ) { >>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>> Array Copy >>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>> + } >>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>> + } >>>> +#ifdef COMPILER2 >>>> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>> + } >>>> +#endif >>>> + } >>>> } >>>> >>>> if( is_intel() ) { // Intel cpus specific settings >>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>> @@ -228,6 +228,15 @@ >>>> } bits; >>>> }; >>>> >>>> + union ExtCpuid1EEx { >>>> + uint32_t value; >>>> + struct { >>>> + uint32_t : 8, >>>> + threads_per_core : 8, >>>> + : 
16; >>>> + } bits; >>>> + }; >>>> + >>>> union XemXcr0Eax { >>>> uint32_t value; >>>> struct { >>>> @@ -398,6 +407,12 @@ >>>> ExtCpuid8Ecx ext_cpuid8_ecx; >>>> uint32_t ext_cpuid8_edx; // reserved >>>> >>>> + // cpuid function 0x8000001E // AMD 17h >>>> + uint32_t ext_cpuid1E_eax; >>>> + ExtCpuid1EEx ext_cpuid1E_ebx; // threads per core (AMD17h) >>>> + uint32_t ext_cpuid1E_ecx; >>>> + uint32_t ext_cpuid1E_edx; // unused currently >>>> + >>>> // extended control register XCR0 (the XFEATURE_ENABLED_MASK >>>> register) >>>> XemXcr0Eax xem_xcr0_eax; >>>> uint32_t xem_xcr0_edx; // reserved >>>> @@ -505,6 +520,14 @@ >>>> result |= CPU_CLMUL; >>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>> result |= CPU_RTM; >>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>> + result |= CPU_ADX; >>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>> + result |= CPU_BMI2; >>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>> + result |= CPU_SHA; >>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>> + result |= CPU_FMA; >>>> >>>> // AMD features. >>>> if (is_amd()) { >>>> @@ -518,16 +541,8 @@ >>>> } >>>> // Intel features. 
>>>> if(is_intel()) { >>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>> - result |= CPU_ADX; >>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>> - result |= CPU_BMI2; >>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>> - result |= CPU_SHA; >>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>> result |= CPU_LZCNT; >>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>> - result |= CPU_FMA; >>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>> support for prefetchw >>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>> result |= CPU_3DNOW_PREFETCH; >>>> @@ -590,6 +605,7 @@ >>>> static ByteSize ext_cpuid5_offset() { return >>>> byte_offset_of(CpuidInfo, ext_cpuid5_eax); } >>>> static ByteSize ext_cpuid7_offset() { return >>>> byte_offset_of(CpuidInfo, ext_cpuid7_eax); } >>>> static ByteSize ext_cpuid8_offset() { return >>>> byte_offset_of(CpuidInfo, ext_cpuid8_eax); } >>>> + static ByteSize ext_cpuid1E_offset() { return >>>> byte_offset_of(CpuidInfo, ext_cpuid1E_eax); } >>>> static ByteSize tpl_cpuidB0_offset() { return >>>> byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } >>>> static ByteSize tpl_cpuidB1_offset() { return >>>> byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } >>>> static ByteSize tpl_cpuidB2_offset() { return >>>> byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } >>>> @@ -673,8 +689,11 @@ >>>> if (is_intel() && supports_processor_topology()) { >>>> result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; >>>> } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { >>>> - result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>> - cores_per_cpu(); >>>> + if (cpu_family() >= 0x17) >>>> + result = _cpuid_info.ext_cpuid1E_ebx.bits.threads_per_core + 1; >>>> + else >>>> + result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>> + cores_per_cpu(); >>>> } >>>> return (result == 0 ? 1 : result); >>>> } >>>> >>>> I have attached the patch for review. >>>> Please let me know your comments. 
>>>> >>>> Thanks, >>>> Rohit >>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> >>>>>> >>>>>> src/cpu/x86/vm/vm_version_x86.cpp >>>>>> >>>>>> No comments on AMD specific changes. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> ----- >>>>>> >>>>>> On 5/09/2017 3:43 PM, David Holmes wrote: >>>>>>> >>>>>>> >>>>>>> On 5/09/2017 3:29 PM, Rohit Arul Raj wrote: >>>>>>>> >>>>>>>> >>>>>>>> Hello David, >>>>>>>> >>>>>>>> On Tue, Sep 5, 2017 at 10:31 AM, David Holmes >>>>>>>> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> Hi Rohit, >>>>>>>>> >>>>>>>>> I was unable to apply your patch to latest jdk10/hs/hotspot repo. >>>>>>>>> >>>>>>>> >>>>>>>> I checked out the latest jdk10/hs/hotspot [parent: 13548:1a9c2e07a826] >>>>>>>> and was able to apply the patch [epyc-amd17h-defaults-3Sept.patch] >>>>>>>> without any issues. >>>>>>>> Can you share the error message that you are getting? >>>>>>> >>>>>>> >>>>>>> >>>>>>> I was getting this: >>>>>>> >>>>>>> applying hotspot.patch >>>>>>> patching file src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> Hunk #1 FAILED at 1108 >>>>>>> 1 out of 1 hunks FAILED -- saving rejects to file >>>>>>> src/cpu/x86/vm/vm_version_x86.cpp.rej >>>>>>> patching file src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> Hunk #2 FAILED at 522 >>>>>>> 1 out of 2 hunks FAILED -- saving rejects to file >>>>>>> src/cpu/x86/vm/vm_version_x86.hpp.rej >>>>>>> abort: patch failed to apply >>>>>>> >>>>>>> but I started again and this time it applied fine, so not sure what was >>>>>>> going on there. 
>>>>>>> >>>>>>> Cheers, >>>>>>> David >>>>>>> >>>>>>>> Regards, >>>>>>>> Rohit >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 4/09/2017 2:42 AM, Rohit Arul Raj wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hello Vladimir, >>>>>>>>>> >>>>>>>>>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi Rohit, >>>>>>>>>>> >>>>>>>>>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Hello Vladimir, >>>>>>>>>>>> >>>>>>>>>>>>> Changes look good. Only question I have is about MaxVectorSize. >>>>>>>>>>>>> It >>>>>>>>>>>>> is >>>>>>>>>>>>> set >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> 16 only in presence of AVX: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945 >>>>>>>>>>>>> >>>>>>>>>>>>> Does that code works for AMD 17h too? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks for pointing that out. Yes, the code works fine for AMD >>>>>>>>>>>> 17h. >>>>>>>>>>>> So >>>>>>>>>>>> I have removed the surplus check for MaxVectorSize from my patch. >>>>>>>>>>>> I >>>>>>>>>>>> have updated, re-tested and attached the patch. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Which check you removed? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> My older patch had the below mentioned check which was required on >>>>>>>>>> JDK9 where the default MaxVectorSize was 64. It has been handled >>>>>>>>>> better in openJDK10. So this check is not required anymore. >>>>>>>>>> >>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>> ... >>>>>>>>>> ... >>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>> + } >>>>>>>>>> .. >>>>>>>>>> .. 
>>>>>>>>>> + } >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I have one query regarding the setting of UseSHA flag: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821 >>>>>>>>>>>> >>>>>>>>>>>> AMD 17h has support for SHA. >>>>>>>>>>>> AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets >>>>>>>>>>>> enabled for it based on the availability of BMI2 and AVX2. Is >>>>>>>>>>>> there >>>>>>>>>>>> an >>>>>>>>>>>> underlying reason for this? I have handled this in the patch but >>>>>>>>>>>> just >>>>>>>>>>>> wanted to confirm. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> It was done with next changes which use only AVX2 and BMI2 >>>>>>>>>>> instructions >>>>>>>>>>> to >>>>>>>>>>> calculate SHA-256: >>>>>>>>>>> >>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974 >>>>>>>>>>> >>>>>>>>>>> I don't know if AMD 15h supports these instructions and can execute >>>>>>>>>>> that >>>>>>>>>>> code. You need to test it. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions, >>>>>>>>>> it should work. >>>>>>>>>> Confirmed by running following sanity tests: >>>>>>>>>> >>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java >>>>>>>>>> >>>>>>>>>> So I have removed those SHA checks from my patch too. >>>>>>>>>> >>>>>>>>>> Please find attached updated, re-tested patch. 
>>>>>>>>>> >>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> @@ -1109,11 +1109,27 @@ >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>> } >>>>>>>>>> #endif // COMPILER2 >>>>>>>>>> + >>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>>>> for >>>>>>>>>> Array Copy >>>>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>>>> + } >>>>>>>>>> + if (supports_sse2() && >>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>> { >>>>>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>>>> + } >>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) >>>>>>>>>> { >>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>> + } >>>>>>>>>> +#endif >>>>>>>>>> + } >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>> result |= CPU_RTM; >>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>> + 
result |= CPU_BMI2; >>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>> >>>>>>>>>> // AMD features. >>>>>>>>>> if (is_amd()) { >>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>> + result |= CPU_HT; >>>>>>>>>> } >>>>>>>>>> // Intel features. >>>>>>>>>> if(is_intel()) { >>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>>>>>>> support for prefetchw >>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>> >>>>>>>>>> Please let me know your comments. >>>>>>>>>> >>>>>>>>>> Thanks for your time. >>>>>>>>>> Rohit >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks for taking time to review the code. 
>>>>>>>>>>>> >>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>>>>> } >>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>> } >>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>> + } >>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics >>>>>>>>>>>> || >>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>>>> CPU"); >>>>>>>>>>>> + } >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>> + } >>>>>>>>>>>> >>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>> @@ -1109,11 +1125,40 @@ >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>> } >>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>> + >>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>>>>>> for >>>>>>>>>>>> Array Copy >>>>>>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>> { >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>>>>>> + } >>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>>>>>> + } >>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>>> { >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseBMI2Instructions, true); >>>>>>>>>>>> + } >>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>> + } >>>>>>>>>>>> + } >>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>> + } >>>>>>>>>>>> + } >>>>>>>>>>>> +#endif >>>>>>>>>>>> + } >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>>> result |= CPU_RTM; >>>>>>>>>>>> 
+ if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>> >>>>>>>>>>>> // AMD features. >>>>>>>>>>>> if (is_amd()) { >>>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>> } >>>>>>>>>>>> // Intel features. >>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>>>>> indicates >>>>>>>>>>>> support for prefetchw >>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Rohit >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> wrote: 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I think the patch needs updating for jdk10 as I already see a >>>>>>>>>>>>>>>> lot of >>>>>>>>>>>>>>>> logic >>>>>>>>>>>>>>>> around UseSHA in vm_version_x86.cpp. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks David, I will update the patch wrt JDK10 source base, >>>>>>>>>>>>>>> test >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> resubmit for review. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>>>>>>>>>>>> 13519:71337910df60), did regression testing using jtreg ($make >>>>>>>>>>>>>> default) and didnt find any regressions. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Can anyone please volunteer to review this patch which sets >>>>>>>>>>>>>> flag/ISA >>>>>>>>>>>>>> defaults for newer AMD 17h (EPYC) processor? 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> ************************* Patch **************************** >>>>>>>>>>>>>> >>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>>>>>>> } >>>>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>>>> } >>>>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>>> || >>>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>>>>>> CPU"); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> >>>>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>>> @@ -1109,11 +1125,43 @@ >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>> } >>>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>>> + >>>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>> for >>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto >>>>>>>>>>>>>> hash >>>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>> +++ 
b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>>>>> result |= CPU_RTM; >>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>>>> >>>>>>>>>>>>>> // AMD features. >>>>>>>>>>>>>> if (is_amd()) { >>>>>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>>>> } >>>>>>>>>>>>>> // Intel features. >>>>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>>>>>>> indicates >>>>>>>>>>>>>> support for prefetchw >>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != >>>>>>>>>>>>>> 0) { >>>>>>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>>>>> >>>>>>>>>>>>>> ************************************************************** >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 1/09/2017 
1:11 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I would like an volunteer to review this patch (openJDK9) >>>>>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>>>>> sets >>>>>>>>>>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and >>>>>>>>>>>>>>>>>>> help >>>>>>>>>>>>>>>>>>> us >>>>>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>>> the commit process. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Unfortunately patches can not be accepted from systems >>>>>>>>>>>>>>>>>> outside >>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> OpenJDK >>>>>>>>>>>>>>>>>> infrastructure and ... >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I have also attached the patch (hg diff -g) for reference. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ... unfortunately patches tend to get stripped by the mail >>>>>>>>>>>>>>>>>> servers. >>>>>>>>>>>>>>>>>> If >>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> patch is small please include it inline. 
Otherwise you will >>>>>>>>>>>>>>>>>> need >>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>> find >>>>>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>>>> OpenJDK Author who can host it for you on >>>>>>>>>>>>>>>>>> cr.openjdk.java.net. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 3) I have done regression testing using jtreg ($make >>>>>>>>>>>>>>>>>>> default) >>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>> didnt find any regressions. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Sounds good, but until I see the patch it is hard to comment >>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>> testing >>>>>>>>>>>>>>>>>> requirements. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks David, >>>>>>>>>>>>>>>>> Yes, it's a small patch. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>>>>>> || >>>>>>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>>>>>>>>> CPU"); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto >>>>>>>>>>>>>>>>> hash >>>>>>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>>>>>>> diff --git 
a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>>>>>>>>> @@ -513,6 +513,16 @@
>>>>>>>>>>>>>>>>>          result |= CPU_LZCNT;
>>>>>>>>>>>>>>>>>        if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>>>>>>>>>>>>          result |= CPU_SSE4A;
>>>>>>>>>>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>>>>>>>>>>>> +        result |= CPU_BMI2;
>>>>>>>>>>>>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>>>>>>>>>>>> +        result |= CPU_HT;
>>>>>>>>>>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>>>>>>>>>>>> +        result |= CPU_ADX;
>>>>>>>>>>>>>>>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>>>>>>>>>>>> +        result |= CPU_SHA;
>>>>>>>>>>>>>>>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>>>>>>>>>>>> +        result |= CPU_FMA;
>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>      // Intel features.
>>>>>>>>>>>>>>>>>      if(is_intel()) {
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Rohit
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>
>>>

From kim.barrett at oracle.com Tue Sep 12 13:08:53 2017
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 12 Sep 2017 15:08:53 +0200
Subject: RFR: 8186476: Generalize Atomic::add with templates
In-Reply-To: <44a6d9bd-24b6-c8cb-973d-ff40775381c9@redhat.com>
References: <61346762-D2F9-4B13-934F-280606E3E0F7@oracle.com> <44a6d9bd-24b6-c8cb-973d-ff40775381c9@redhat.com>
Message-ID: <2C219970-24D2-4C01-81ED-F9201398E065@oracle.com>

Catching up from vacation.

> On Aug 22, 2017, at 10:52 AM, Andrew Haley wrote:
>
> On 20/08/17 07:16, Kim Barrett wrote:
>> - atomic_linux_sparc.hpp
>>
>> Neither add variant has "cc" in the clobbers list, even though the
>> condition codes are modified. That seems like a pre-existing bug.
>
> Some targets always clobber the flags. I can't remember whether SPARC
> is one of them.

I've no idea, but that seems peculiar since SPARC has various
instruction pairs for affecting or not the condition codes.

>> Uses hard-coded machine registers and assumes the inline-asm
>> processing assigns the values to the corresponding machine registers,
>> even though the given constraints are just generic register requests.
>> I don't understand how this can work.
>
> Looks right to me.

Argh! You are correct, and I misread it.

>   __asm__ volatile(
>     "1: \n\t"
>     " ld [%2], %%o2\n\t"
>     " add %1, %%o2, %%o3\n\t"
>     " cas [%2], %%o2, %%o3\n\t"
>     " cmp %%o2, %%o3\n\t"
>     " bne 1b\n\t"
>     " nop\n\t"
>     " add %1, %%o2, %0\n\t"
>     : "=r" (rv)
>     : "r" (add_value), "r" (dest)
>     : "memory", "o2", "o3");
>   return rv;
> }
>
> o2 and o3 are temps.
>
> --
> Andrew Haley
> Java Platform Lead Engineer
> Red Hat UK Ltd.
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From felix.yang at linaro.org Wed Sep 13 13:10:36 2017
From: felix.yang at linaro.org (Felix Yang)
Date: Wed, 13 Sep 2017 21:10:36 +0800
Subject: Question regarding "native-compiled frame"
In-Reply-To:
References:
Message-ID:

On 7 September 2017 at 19:04, Andrew Dinn wrote:

> On 07/09/17 11:35, Felix Yang wrote:
> > Thanks for the reply.
> > Then when will the last return statement of frame::sender get a chance
> > to be executed?
> > As I see it, when the JVM does something in safepoint state and needs to
> > traverse the Java thread stack, we never calculate the sender of a
> > native-compiled frame.
>
> It is possible for Java to call out into the JVM and then for the JVM to
> call back into Java. For example, when a class is loaded the JVM calls
> into Java to run the class initializer. This re-entry may happen
> multiple times.
>
> In that case a stack walk under the re-entry may find a Java start frame
> and its parent frame will be the native frame where Java entered the VM.
>

Yes, that's the frame structure.

> Note that the native frame will always be returned by the call to
> sender_for_entry_frame(map). That method skips all the C frames between
> the Java entry frame and the native frame which exited Java.
>

True. And this is handled by the three IF statements of the
frame::sender function. It seems to me that the last return statement
is not involved in the stack walking process, is it?

Thanks for your help,
Felix

From vladimir.kozlov at oracle.com Wed Sep 13 16:50:52 2017
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 13 Sep 2017 09:50:52 -0700
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To:
References: <4d4fe028-ea6a-4f77-ab69-5c2bc752e1f5@oracle.com> <47bc0a90-ed6a-220a-c3d1-b4df2d8bbc74@oracle.com> <9c53f889-e58e-33ac-3c05-874779b469d6@oracle.com> <45619e1a-9eb0-a540-193b-5187da3bf6bc@oracle.com> <66e4af43-c0e2-6d64-b69f-35166150ffa2@oracle.com>
Message-ID: <11af0f62-ba6b-d533-d23c-750d2ca012c7@oracle.com>

Hi Rohit,

The CPUID check for 0x8000001E should be explicit. Otherwise the code
will be executed for all values above 0x80000008.

+    __ cmpl(rax, 0x80000008);     // Is cpuid(0x80000009 and above) supported?
+    __ jccb(Assembler::belowEqual, ext_cpuid8);
+    __ cmpl(rax, 0x8000001E);     // Is cpuid(0x8000001E) supported?
+    __ jccb(Assembler::notEqual, ext_cpuid8);
+    //
+    // Extended cpuid(0x8000001E)
+    //

Use {} in the next condition:

> +      if (cpu_family() >= 0x17) {
> +        result = _cpuid_info.ext_cpuid_1E_ebx.bits.threads_per_core + 1;
> +      } else {
> +        result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu /
> +                 cores_per_cpu();
> +      }

Thanks,
Vladimir

On 9/11/17 9:52 PM, Rohit Arul Raj wrote:
> Hello David,
>
>>>
>>>
>>> 1. ExtCpuid1EEx
>>>
>>> Should this be ExtCpuid1EEbx? (I see the naming here is somewhat
>>> inconsistent - and potentially confusing: I would have preferred to see
>>> things like ExtCpuid_1E_Ebx, to make it clear.)
>>
>> Yes, I can change it accordingly.
>>
>
> I have attached the updated, re-tested patch as per your comments above.
>
> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
> --- a/src/cpu/x86/vm/vm_version_x86.cpp
> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
> @@ -70,7 +70,7 @@
>      bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2);
>
>      Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4;
> -    Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, done, wrapup;
> +    Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, ext_cpuid8, done, wrapup;
>      Label legacy_setup, save_restore_except, legacy_save_restore, start_simd_check;
>
>      StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub");
> @@ -272,9 +272,23 @@
>      __ jccb(Assembler::belowEqual, ext_cpuid5);
>      __ cmpl(rax, 0x80000007);     // Is cpuid(0x80000008) supported?
>      __ jccb(Assembler::belowEqual, ext_cpuid7);
> +    __ cmpl(rax, 0x80000008);     // Is cpuid(0x8000001E) supported?
> +    __ jccb(Assembler::belowEqual, ext_cpuid8);
> +    //
> +    // Extended cpuid(0x8000001E)
> +    //
> +    __ movl(rax, 0x8000001E);
> +    __ cpuid();
> +    __ lea(rsi, Address(rbp, in_bytes(VM_Version::ext_cpuid_1E_offset())));
> +    __ movl(Address(rsi, 0), rax);
> +    __ movl(Address(rsi, 4), rbx);
> +    __ movl(Address(rsi, 8), rcx);
> +    __ movl(Address(rsi,12), rdx);
> +
>      //
>      // Extended cpuid(0x80000008)
>      //
> +    __ bind(ext_cpuid8);
>      __ movl(rax, 0x80000008);
>      __ cpuid();
>      __ lea(rsi, Address(rbp, in_bytes(VM_Version::ext_cpuid8_offset())));
> @@ -1109,11 +1123,27 @@
>      }
>
>  #ifdef COMPILER2
> -    if (MaxVectorSize > 16) {
> -      // Limit vectors size to 16 bytes on current AMD cpus.
> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>        FLAG_SET_DEFAULT(MaxVectorSize, 16);
>      }
>  #endif // COMPILER2
> +
> +    // Some defaults for AMD family 17h
> +    if ( cpu_family() == 0x17 ) {
> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
> +      }
> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
> +      }
> +#ifdef COMPILER2
> +      if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) {
> +        FLAG_SET_DEFAULT(UseFPUForSpilling, true);
> +      }
> +#endif
> +    }
>    }
>
>    if( is_intel() ) { // Intel cpus specific settings
> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
> --- a/src/cpu/x86/vm/vm_version_x86.hpp
> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
> @@ -228,6 +228,15 @@
>      } bits;
>    };
>
> +  union ExtCpuid_1E_Ebx {
> +    uint32_t value;
> +    struct {
> +      uint32_t                  : 8,
> +               threads_per_core : 8,
> +                                : 16;
> +    } bits;
> +  };
> +
>    union XemXcr0Eax {
>      uint32_t value;
>      struct {
> @@ -398,6 +407,12 @@
>      ExtCpuid8Ecx ext_cpuid8_ecx;
>      uint32_t     ext_cpuid8_edx; // reserved
>
> +    // cpuid function 0x8000001E // AMD 17h
> +    uint32_t        ext_cpuid_1E_eax;
> +    ExtCpuid_1E_Ebx ext_cpuid_1E_ebx; // threads per core (AMD17h)
> +    uint32_t        ext_cpuid_1E_ecx;
> +    uint32_t        ext_cpuid_1E_edx; // unused currently
> +
>      // extended control register XCR0 (the XFEATURE_ENABLED_MASK register)
>      XemXcr0Eax   xem_xcr0_eax;
>      uint32_t     xem_xcr0_edx; // reserved
> @@ -505,6 +520,14 @@
>        result |= CPU_CLMUL;
>      if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>        result |= CPU_RTM;
> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
> +      result |= CPU_ADX;
> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
> +      result |= CPU_BMI2;
> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
> +      result |= CPU_SHA;
> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
> +      result |= CPU_FMA;
>
>      // AMD features.
>      if (is_amd()) {
> @@ -518,16 +541,8 @@
>      }
>      // Intel features.
>      if(is_intel()) {
> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
> -        result |= CPU_ADX;
> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
> -        result |= CPU_BMI2;
> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
> -        result |= CPU_SHA;
>        if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>          result |= CPU_LZCNT;
> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
> -        result |= CPU_FMA;
>        // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
>        if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>          result |= CPU_3DNOW_PREFETCH;
> @@ -590,6 +605,7 @@
>    static ByteSize ext_cpuid5_offset() { return byte_offset_of(CpuidInfo, ext_cpuid5_eax); }
>    static ByteSize ext_cpuid7_offset() { return byte_offset_of(CpuidInfo, ext_cpuid7_eax); }
>    static ByteSize ext_cpuid8_offset() { return byte_offset_of(CpuidInfo, ext_cpuid8_eax); }
> +  static ByteSize ext_cpuid_1E_offset() { return byte_offset_of(CpuidInfo, ext_cpuid_1E_eax); }
>    static ByteSize tpl_cpuidB0_offset() { return byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); }
>    static ByteSize tpl_cpuidB1_offset() { return byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); }
>    static ByteSize tpl_cpuidB2_offset() { return byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); }
> @@ -673,8 +689,11 @@
>      if (is_intel() && supports_processor_topology()) {
>        result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus;
>      } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) {
> -      result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu /
> -               cores_per_cpu();
> +      if (cpu_family() >= 0x17)
> +        result = _cpuid_info.ext_cpuid_1E_ebx.bits.threads_per_core + 1;
> +      else
> +        result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu /
> +                 cores_per_cpu();
>      }
>      return (result == 0 ? 1 : result);
>    }
>
>
> Please let me know your comments
>
> Thanks for your time.
>
> Regards,
> Rohit
>
>
>>> Thanks,
>>> David
>>> -----
>>>
>>>
>>>> Reference:
>>>> https://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf
>>>> [Pg 82]
>>>>
>>>> CPUID_Fn8000001E_EBX [Core Identifiers] (CoreId)
>>>> 15:8 ThreadsPerCore: threads per core. Read-only. Reset: XXh.
>>>> The number of threads per core is ThreadsPerCore+1.
>>>>
>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>> @@ -70,7 +70,7 @@
>>>>      bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2);
>>>>
>>>>      Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4;
>>>> -    Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, done, wrapup;
>>>> +    Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, ext_cpuid8, done, wrapup;
>>>>      Label legacy_setup, save_restore_except, legacy_save_restore, start_simd_check;
>>>>
>>>>      StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub");
>>>> @@ -272,9 +272,23 @@
>>>>      __ jccb(Assembler::belowEqual, ext_cpuid5);
>>>>      __ cmpl(rax, 0x80000007);     // Is cpuid(0x80000008) supported?
>>>>      __ jccb(Assembler::belowEqual, ext_cpuid7);
>>>> +    __ cmpl(rax, 0x80000008);     // Is cpuid(0x8000001E) supported?
>>>> + __ jccb(Assembler::belowEqual, ext_cpuid8); >>>> + // >>>> + // Extended cpuid(0x8000001E) >>>> + // >>>> + __ movl(rax, 0x8000001E); >>>> + __ cpuid(); >>>> + __ lea(rsi, Address(rbp, >>>> in_bytes(VM_Version::ext_cpuid1E_offset()))); >>>> + __ movl(Address(rsi, 0), rax); >>>> + __ movl(Address(rsi, 4), rbx); >>>> + __ movl(Address(rsi, 8), rcx); >>>> + __ movl(Address(rsi,12), rdx); >>>> + >>>> // >>>> // Extended cpuid(0x80000008) >>>> // >>>> + __ bind(ext_cpuid8); >>>> __ movl(rax, 0x80000008); >>>> __ cpuid(); >>>> __ lea(rsi, Address(rbp, >>>> in_bytes(VM_Version::ext_cpuid8_offset()))); >>>> @@ -1109,11 +1123,27 @@ >>>> } >>>> >>>> #ifdef COMPILER2 >>>> - if (MaxVectorSize > 16) { >>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>> } >>>> #endif // COMPILER2 >>>> + >>>> + // Some defaults for AMD family 17h >>>> + if ( cpu_family() == 0x17 ) { >>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>> Array Copy >>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>> + } >>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>> + } >>>> +#ifdef COMPILER2 >>>> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>> + } >>>> +#endif >>>> + } >>>> } >>>> >>>> if( is_intel() ) { // Intel cpus specific settings >>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>> @@ -228,6 +228,15 @@ >>>> } bits; >>>> }; >>>> >>>> + union ExtCpuid1EEx { >>>> + uint32_t value; >>>> + struct { >>>> + uint32_t : 8, >>>> + threads_per_core : 8, >>>> + : 
16; >>>> + } bits; >>>> + }; >>>> + >>>> union XemXcr0Eax { >>>> uint32_t value; >>>> struct { >>>> @@ -398,6 +407,12 @@ >>>> ExtCpuid8Ecx ext_cpuid8_ecx; >>>> uint32_t ext_cpuid8_edx; // reserved >>>> >>>> + // cpuid function 0x8000001E // AMD 17h >>>> + uint32_t ext_cpuid1E_eax; >>>> + ExtCpuid1EEx ext_cpuid1E_ebx; // threads per core (AMD17h) >>>> + uint32_t ext_cpuid1E_ecx; >>>> + uint32_t ext_cpuid1E_edx; // unused currently >>>> + >>>> // extended control register XCR0 (the XFEATURE_ENABLED_MASK >>>> register) >>>> XemXcr0Eax xem_xcr0_eax; >>>> uint32_t xem_xcr0_edx; // reserved >>>> @@ -505,6 +520,14 @@ >>>> result |= CPU_CLMUL; >>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>> result |= CPU_RTM; >>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>> + result |= CPU_ADX; >>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>> + result |= CPU_BMI2; >>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>> + result |= CPU_SHA; >>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>> + result |= CPU_FMA; >>>> >>>> // AMD features. >>>> if (is_amd()) { >>>> @@ -518,16 +541,8 @@ >>>> } >>>> // Intel features. 
>>>> if(is_intel()) { >>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>> - result |= CPU_ADX; >>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>> - result |= CPU_BMI2; >>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>> - result |= CPU_SHA; >>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>> result |= CPU_LZCNT; >>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>> - result |= CPU_FMA; >>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>> support for prefetchw >>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>> result |= CPU_3DNOW_PREFETCH; >>>> @@ -590,6 +605,7 @@ >>>> static ByteSize ext_cpuid5_offset() { return >>>> byte_offset_of(CpuidInfo, ext_cpuid5_eax); } >>>> static ByteSize ext_cpuid7_offset() { return >>>> byte_offset_of(CpuidInfo, ext_cpuid7_eax); } >>>> static ByteSize ext_cpuid8_offset() { return >>>> byte_offset_of(CpuidInfo, ext_cpuid8_eax); } >>>> + static ByteSize ext_cpuid1E_offset() { return >>>> byte_offset_of(CpuidInfo, ext_cpuid1E_eax); } >>>> static ByteSize tpl_cpuidB0_offset() { return >>>> byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } >>>> static ByteSize tpl_cpuidB1_offset() { return >>>> byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } >>>> static ByteSize tpl_cpuidB2_offset() { return >>>> byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } >>>> @@ -673,8 +689,11 @@ >>>> if (is_intel() && supports_processor_topology()) { >>>> result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; >>>> } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { >>>> - result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>> - cores_per_cpu(); >>>> + if (cpu_family() >= 0x17) >>>> + result = _cpuid_info.ext_cpuid1E_ebx.bits.threads_per_core + 1; >>>> + else >>>> + result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>> + cores_per_cpu(); >>>> } >>>> return (result == 0 ? 1 : result); >>>> } >>>> >>>> I have attached the patch for review. >>>> Please let me know your comments. 
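For reference, the ThreadsPerCore decoding that the final hunk above relies on can be sketched in isolation. This is only a minimal sketch and not HotSpot code: the function name threads_per_core_from_ebx is hypothetical, and it assumes the caller has already read the raw EBX value returned by CPUID function 0x8000001E on an AMD family 17h processor. As the quoted PPR excerpt states, bits 15:8 hold ThreadsPerCore, and the number of threads per core is that field plus one.

```cpp
#include <cstdint>

// Hypothetical helper, not part of the patch: decode the thread count from a
// raw CPUID Fn8000_001E EBX value. Bits 15:8 hold ThreadsPerCore, which is
// the number of threads per core minus one.
constexpr uint32_t threads_per_core_from_ebx(uint32_t ebx) {
  return ((ebx >> 8) & 0xFFu) + 1u;
}
```

The bit-field union ExtCpuid1EEx in the patch expresses the same layout declaratively (8 unnamed bits, then the 8-bit threads_per_core field); the explicit shift and mask here is just an alternative spelling that does not depend on compiler bit-field ordering.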
>>>> >>>> Thanks, >>>> Rohit >>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> >>>>>> >>>>>> src/cpu/x86/vm/vm_version_x86.cpp >>>>>> >>>>>> No comments on AMD specific changes. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> ----- >>>>>> >>>>>> On 5/09/2017 3:43 PM, David Holmes wrote: >>>>>>> >>>>>>> >>>>>>> On 5/09/2017 3:29 PM, Rohit Arul Raj wrote: >>>>>>>> >>>>>>>> >>>>>>>> Hello David, >>>>>>>> >>>>>>>> On Tue, Sep 5, 2017 at 10:31 AM, David Holmes >>>>>>>> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> Hi Rohit, >>>>>>>>> >>>>>>>>> I was unable to apply your patch to latest jdk10/hs/hotspot repo. >>>>>>>>> >>>>>>>> >>>>>>>> I checked out the latest jdk10/hs/hotspot [parent: 13548:1a9c2e07a826] >>>>>>>> and was able to apply the patch [epyc-amd17h-defaults-3Sept.patch] >>>>>>>> without any issues. >>>>>>>> Can you share the error message that you are getting? >>>>>>> >>>>>>> >>>>>>> >>>>>>> I was getting this: >>>>>>> >>>>>>> applying hotspot.patch >>>>>>> patching file src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> Hunk #1 FAILED at 1108 >>>>>>> 1 out of 1 hunks FAILED -- saving rejects to file >>>>>>> src/cpu/x86/vm/vm_version_x86.cpp.rej >>>>>>> patching file src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> Hunk #2 FAILED at 522 >>>>>>> 1 out of 2 hunks FAILED -- saving rejects to file >>>>>>> src/cpu/x86/vm/vm_version_x86.hpp.rej >>>>>>> abort: patch failed to apply >>>>>>> >>>>>>> but I started again and this time it applied fine, so not sure what was >>>>>>> going on there. 
>>>>>>> >>>>>>> Cheers, >>>>>>> David >>>>>>> >>>>>>>> Regards, >>>>>>>> Rohit >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 4/09/2017 2:42 AM, Rohit Arul Raj wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hello Vladimir, >>>>>>>>>> >>>>>>>>>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi Rohit, >>>>>>>>>>> >>>>>>>>>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Hello Vladimir, >>>>>>>>>>>> >>>>>>>>>>>>> Changes look good. Only question I have is about MaxVectorSize. >>>>>>>>>>>>> It >>>>>>>>>>>>> is >>>>>>>>>>>>> set >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> 16 only in presence of AVX: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945 >>>>>>>>>>>>> >>>>>>>>>>>>> Does that code works for AMD 17h too? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks for pointing that out. Yes, the code works fine for AMD >>>>>>>>>>>> 17h. >>>>>>>>>>>> So >>>>>>>>>>>> I have removed the surplus check for MaxVectorSize from my patch. >>>>>>>>>>>> I >>>>>>>>>>>> have updated, re-tested and attached the patch. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Which check you removed? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> My older patch had the below mentioned check which was required on >>>>>>>>>> JDK9 where the default MaxVectorSize was 64. It has been handled >>>>>>>>>> better in openJDK10. So this check is not required anymore. >>>>>>>>>> >>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>> ... >>>>>>>>>> ... >>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>> + } >>>>>>>>>> .. >>>>>>>>>> .. 
>>>>>>>>>> + } >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I have one query regarding the setting of UseSHA flag: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821 >>>>>>>>>>>> >>>>>>>>>>>> AMD 17h has support for SHA. >>>>>>>>>>>> AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets >>>>>>>>>>>> enabled for it based on the availability of BMI2 and AVX2. Is >>>>>>>>>>>> there >>>>>>>>>>>> an >>>>>>>>>>>> underlying reason for this? I have handled this in the patch but >>>>>>>>>>>> just >>>>>>>>>>>> wanted to confirm. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> It was done with next changes which use only AVX2 and BMI2 >>>>>>>>>>> instructions >>>>>>>>>>> to >>>>>>>>>>> calculate SHA-256: >>>>>>>>>>> >>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974 >>>>>>>>>>> >>>>>>>>>>> I don't know if AMD 15h supports these instructions and can execute >>>>>>>>>>> that >>>>>>>>>>> code. You need to test it. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions, >>>>>>>>>> it should work. >>>>>>>>>> Confirmed by running following sanity tests: >>>>>>>>>> >>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java >>>>>>>>>> >>>>>>>>>> So I have removed those SHA checks from my patch too. >>>>>>>>>> >>>>>>>>>> Please find attached updated, re-tested patch. 
>>>>>>>>>> >>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> @@ -1109,11 +1109,27 @@ >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>> } >>>>>>>>>> #endif // COMPILER2 >>>>>>>>>> + >>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>>>> for >>>>>>>>>> Array Copy >>>>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>>>> + } >>>>>>>>>> + if (supports_sse2() && >>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>> { >>>>>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>>>> + } >>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) >>>>>>>>>> { >>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>> + } >>>>>>>>>> +#endif >>>>>>>>>> + } >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>> result |= CPU_RTM; >>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>> + 
result |= CPU_BMI2; >>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>> >>>>>>>>>> // AMD features. >>>>>>>>>> if (is_amd()) { >>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>> + result |= CPU_HT; >>>>>>>>>> } >>>>>>>>>> // Intel features. >>>>>>>>>> if(is_intel()) { >>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>>>>>>> support for prefetchw >>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>> >>>>>>>>>> Please let me know your comments. >>>>>>>>>> >>>>>>>>>> Thanks for your time. >>>>>>>>>> Rohit >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks for taking time to review the code. 
>>>>>>>>>>>> >>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>>>>> } >>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>> } >>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>> + } >>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics >>>>>>>>>>>> || >>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>>>> CPU"); >>>>>>>>>>>> + } >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>> + } >>>>>>>>>>>> >>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>> @@ -1109,11 +1125,40 @@ >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>> } >>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>> + >>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>>>>>> for >>>>>>>>>>>> Array Copy >>>>>>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>> { >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>>>>>> + } >>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>>>>>> + } >>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>>> { >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseBMI2Instructions, true); >>>>>>>>>>>> + } >>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>> + } >>>>>>>>>>>> + } >>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>> + } >>>>>>>>>>>> + } >>>>>>>>>>>> +#endif >>>>>>>>>>>> + } >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>>> result |= CPU_RTM; >>>>>>>>>>>> 
+ if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>> >>>>>>>>>>>> // AMD features. >>>>>>>>>>>> if (is_amd()) { >>>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>> } >>>>>>>>>>>> // Intel features. >>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>>>>> indicates >>>>>>>>>>>> support for prefetchw >>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Rohit >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> wrote: 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I think the patch needs updating for jdk10 as I already see a >>>>>>>>>>>>>>>> lot of >>>>>>>>>>>>>>>> logic >>>>>>>>>>>>>>>> around UseSHA in vm_version_x86.cpp. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks David, I will update the patch wrt JDK10 source base, >>>>>>>>>>>>>>> test >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> resubmit for review. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>>>>>>>>>>>> 13519:71337910df60), did regression testing using jtreg ($make >>>>>>>>>>>>>> default) and didnt find any regressions. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Can anyone please volunteer to review this patch which sets >>>>>>>>>>>>>> flag/ISA >>>>>>>>>>>>>> defaults for newer AMD 17h (EPYC) processor? 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> ************************* Patch **************************** >>>>>>>>>>>>>> >>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>>>>>>> } >>>>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>>>> } >>>>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>>> || >>>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>>>>>> CPU"); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> >>>>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>>> @@ -1109,11 +1125,43 @@ >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>> } >>>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>>> + >>>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>> for >>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto >>>>>>>>>>>>>> hash >>>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>> +++ 
b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>>>>> result |= CPU_RTM; >>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>>>> >>>>>>>>>>>>>> // AMD features. >>>>>>>>>>>>>> if (is_amd()) { >>>>>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>>>> } >>>>>>>>>>>>>> // Intel features. >>>>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>>>>>>> indicates >>>>>>>>>>>>>> support for prefetchw >>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != >>>>>>>>>>>>>> 0) { >>>>>>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>>>>> >>>>>>>>>>>>>> ************************************************************** >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 1/09/2017 
1:11 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I would like an volunteer to review this patch (openJDK9) >>>>>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>>>>> sets >>>>>>>>>>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and >>>>>>>>>>>>>>>>>>> help >>>>>>>>>>>>>>>>>>> us >>>>>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>>> the commit process. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Unfortunately patches can not be accepted from systems >>>>>>>>>>>>>>>>>> outside >>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> OpenJDK >>>>>>>>>>>>>>>>>> infrastructure and ... >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I have also attached the patch (hg diff -g) for reference. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ... unfortunately patches tend to get stripped by the mail >>>>>>>>>>>>>>>>>> servers. >>>>>>>>>>>>>>>>>> If >>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> patch is small please include it inline. 
Otherwise you will >>>>>>>>>>>>>>>>>> need >>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>> find >>>>>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>>>> OpenJDK Author who can host it for you on >>>>>>>>>>>>>>>>>> cr.openjdk.java.net. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 3) I have done regression testing using jtreg ($make >>>>>>>>>>>>>>>>>>> default) >>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>> didnt find any regressions. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Sounds good, but until I see the patch it is hard to comment >>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>> testing >>>>>>>>>>>>>>>>>> requirements. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks David, >>>>>>>>>>>>>>>>> Yes, it's a small patch. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>>>>>> || >>>>>>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>>>>>>>>> CPU"); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto >>>>>>>>>>>>>>>>> hash >>>>>>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>>>>>>> diff --git 
a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>>>>>>>>> @@ -513,6 +513,16 @@
>>>>>>>>>>>>>>>>>          result |= CPU_LZCNT;
>>>>>>>>>>>>>>>>>        if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>>>>>>>>>>>>          result |= CPU_SSE4A;
>>>>>>>>>>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>>>>>>>>>>>> +        result |= CPU_BMI2;
>>>>>>>>>>>>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>>>>>>>>>>>> +        result |= CPU_HT;
>>>>>>>>>>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>>>>>>>>>>>> +        result |= CPU_ADX;
>>>>>>>>>>>>>>>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>>>>>>>>>>>> +        result |= CPU_SHA;
>>>>>>>>>>>>>>>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>>>>>>>>>>>> +        result |= CPU_FMA;
>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>      // Intel features.
>>>>>>>>>>>>>>>>>      if(is_intel()) {
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Rohit

From stuart.monteith at linaro.org Wed Sep 13 18:36:29 2017
From: stuart.monteith at linaro.org (Stuart Monteith)
Date: Wed, 13 Sep 2017 19:36:29 +0100
Subject: RFR(XL/M) : 8178788: wrap JCStress test suite as jtreg tests
In-Reply-To:
References: <9A2C94EA-89A3-4C75-9D3C-51E058BD8A1D@oracle.com> <342c3748-616f-8a7d-74f4-3ce929b1e0dc@redhat.com>
Message-ID:

For the record, if I put my build of jcstress.jar in $HOME, the following
allows the jcstress tests to run:

make test TEST="hotspot_all" EXTRA_JTREG_OPTIONS="-Djdk.test.lib.artifacts.jcstress-tests-all=$HOME/jcstress.jar"

From "make help" and the makefiles themselves, I had expected the following to
work:

make TEST="hotspot_all" JTREG="VM_OPTIONS=-Djdk.test.lib.artifacts.jcstress-tests-all=$HOME/jcstress.jar" test

but it does not - the JTREG parameter is apparently ignored.
This is unfortunate as there is a warning as it is a non-control variable.

Am I wrong in thinking that this was written for testing internally within
Oracle? I cannot find an instance of "com.oracle.jib.api.JibServiceFactory"
in the OpenJDK project or elsewhere.

BR,
Stuart

On 8 September 2017 at 16:53, Stuart Monteith wrote:

> Hello,
> I've spent some time on this, and I have to admit that I'm stumped. I
> get exactly the same errors on x86 on jdk10/hs and jdk10/jdk10 with a recent
> build of JTReg and JT_HOME set appropriately.
>
> Are there any pointers on how this is supposed to be run?
>
> Thanks,
> Stuart
>
> On 25 April 2017 at 11:47, Aleksey Shipilev wrote:
>
>> On 04/19/2017 12:12 AM, Igor Ignatyev wrote:
>> > http://cr.openjdk.java.net/~iignatyev//8178788/webrev.00/index.html
>> >> 69903 lines changed: 69903 ins; 0 del; 0 mod;
>> > (69524 lines are generated)
>> >
>> > Hi all,
>> >
>> > could you please review this patch which adds a jtreg test wrapper for
>> > jcstress test suite and jtreg tests which run jcstress tests thru this
>> > wrapper?
>> >
>> > webrev: http://cr.openjdk.java.net/~iignatyev//8178788/webrev.00/index.html
>> > JBS: https://bugs.openjdk.java.net/browse/JDK-8178788 testing:
>>
>> TL;DR: This patch introduces more problems than it solves. Just run the
>> jcstress tests-all JAR against the tested runtime.
>>
>> Wrapping jcstress tests with jtreg defies the purpose of jcstress harness --
>> that is, running lots of tests as fast as it possibly could without affecting
>> testing quality. For example, by cleverly reusing VMs between the tests, using
>> Whitebox to deoptimize without restarting the VMs, etc. It really wastes CPU
>> time to run each test in isolation.
>>
>> Also, it does not "automatically" work, which defies "easy to run" goal:
>>
>> Caused by: java.io.FileNotFoundException: Couldn't automatically resolve
>> dependency for jcstress-tests-all , revision 0.3
>> Please specify the location using jdk.test.lib.artifacts.jcstress-tests-all
>>     at jdk.test.lib.artifacts.DefaultArtifactManager.resolve(DefaultArtifactManager.java:37)
>>     at jdk.test.lib.artifacts.ArtifactResolver.resolve(ArtifactResolver.java:54)
>>     at applications.jcstress.JcstressRunner.pathToArtifact(JcstressRunner.java:53)
>>     ... 8 more
>>
>> Okay, brilliant! How do I configure this, if I run "make test"?
>>
>> CONF=linux-x86_64-normal-server-release LOG=info make test TEST="hotspot_all"
>>
>> -Aleksey

From magnus.ihse.bursie at oracle.com Thu Sep 14 07:40:53 2017
From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie)
Date: Thu, 14 Sep 2017 09:40:53 +0200
Subject: RFR(XL/M) : 8178788: wrap JCStress test suite as jtreg tests
In-Reply-To:
References: <9A2C94EA-89A3-4C75-9D3C-51E058BD8A1D@oracle.com> <342c3748-616f-8a7d-74f4-3ce929b1e0dc@redhat.com>
Message-ID: <613877ee-1c56-3f85-138b-5e6ee320a08a@oracle.com>

Stuart,

On 2017-09-13 20:36, Stuart Monteith wrote:
> For the record, if I put my build of jcstress.jar in $HOME, the
> following allows the jcstress tests to run:
>
> make test TEST="hotspot_all" EXTRA_JTREG_OPTIONS="-Djdk.test.lib.artifacts.jcstress-tests-all=$HOME/jcstress.jar"
>
> From "make help" and the makefiles themselves, I had expected the
> following to work:
> make TEST="hotspot_all" JTREG="VM_OPTIONS=-Djdk.test.lib.artifacts.jcstress-tests-all=$HOME/jcstress.jar" test
>
> but it does not - the JTREG parameter is apparently ignored. This is
> unfortunate as there is a warning as it is a non-control variable.

You are using the "test" target, but the JTREG option is only available for
the new "run-test" target.
Using "test" will invoke the old testing framework, which is about to be
replaced by a more modern and integrated one. In the long term, "test" will
invoke the new framework, but during a transition period, "run-test" needs to
be used.

For what it's worth, I tried your command line (but without the patch applied)
and verified using LOG=cmdlines that the -D option was indeed passed to jtreg.

/Magnus

>
> Am I wrong in thinking that this was written for testing internally
> within Oracle? I cannot find an instance of
> "com.oracle.jib.api.JibServiceFactory" in the OpenJDK project or
> elsewhere.
>
> BR,
> Stuart
>
> On 8 September 2017 at 16:53, Stuart Monteith wrote:
>
> Hello,
> I've spent some time on this, and I have to admit that I'm
> stumped. I get exactly the same errors on x86 on jdk10/hs and
> jdk10/jdk10 with a recent build of JTReg and JT_HOME set appropriately.
>
> Are there any pointers on how this is supposed to be run?
>
> Thanks,
> Stuart
>
> On 25 April 2017 at 11:47, Aleksey Shipilev wrote:
>
> On 04/19/2017 12:12 AM, Igor Ignatyev wrote:
> > http://cr.openjdk.java.net/~iignatyev//8178788/webrev.00/index.html
> >> 69903 lines changed: 69903 ins; 0 del; 0 mod;
> > (69524 lines are generated)
> >
> > Hi all,
> >
> > could you please review this patch which adds a jtreg test wrapper for
> > jcstress test suite and jtreg tests which run jcstress tests thru this
> > wrapper?
> >
> > webrev: http://cr.openjdk.java.net/~iignatyev//8178788/webrev.00/index.html
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8178788 testing:
>
> TL;DR: This patch introduces more problems than it solves.
> Just run the jcstress tests-all JAR against the tested runtime.
>
> Wrapping jcstress tests with jtreg defies the purpose of jcstress harness --
> that is, running lots of tests as fast as it possibly could without affecting
> testing quality.
> For example, by cleverly reusing VMs between the tests, using
> Whitebox to deoptimize without restarting the VMs, etc. It really wastes CPU
> time to run each test in isolation.
>
> Also, it does not "automatically" work, which defies "easy to run" goal:
>
> Caused by: java.io.FileNotFoundException: Couldn't automatically resolve
> dependency for jcstress-tests-all , revision 0.3
> Please specify the location using jdk.test.lib.artifacts.jcstress-tests-all
>     at jdk.test.lib.artifacts.DefaultArtifactManager.resolve(DefaultArtifactManager.java:37)
>     at jdk.test.lib.artifacts.ArtifactResolver.resolve(ArtifactResolver.java:54)
>     at applications.jcstress.JcstressRunner.pathToArtifact(JcstressRunner.java:53)
>     ... 8 more
>
> Okay, brilliant! How do I configure this, if I run "make test"?
>
> CONF=linux-x86_64-normal-server-release LOG=info make test TEST="hotspot_all"
>
> -Aleksey

From stuart.monteith at linaro.org Thu Sep 14 10:26:53 2017
From: stuart.monteith at linaro.org (Stuart Monteith)
Date: Thu, 14 Sep 2017 11:26:53 +0100
Subject: RFR(XL/M) : 8178788: wrap JCStress test suite as jtreg tests
In-Reply-To: <613877ee-1c56-3f85-138b-5e6ee320a08a@oracle.com>
References: <9A2C94EA-89A3-4C75-9D3C-51E058BD8A1D@oracle.com> <342c3748-616f-8a7d-74f4-3ce929b1e0dc@redhat.com> <613877ee-1c56-3f85-138b-5e6ee320a08a@oracle.com>
Message-ID:

Thank you Magnus, that's useful. My working command line is:

make run-test TEST="hotspot_all" JTREG="VM_OPTIONS=-Djdk.test.lib.artifacts.jcstress-tests-all=$HOME/jcstress.jar"

but it takes an excessively long time to run. Are there plans for a means to
cleanly disable the JCStress JTreg tests so we can efficiently run them with
the JCStress harness?

Thanks,
Stuart

On 14 September 2017 at 08:40, Magnus Ihse Bursie <magnus.ihse.bursie at oracle.com> wrote:

> Stuart,
>
> On 2017-09-13 20:36, Stuart Monteith wrote:
>
> For the record, if I put my build of jcstress.jar in $HOME, the following
> allows the jcstress tests to run:
>
> make test TEST="hotspot_all" EXTRA_JTREG_OPTIONS="-Djdk.test.lib.artifacts.jcstress-tests-all=$HOME/jcstress.jar"
>
> From "make help" and the makefiles themselves, I had expected the following
> to work:
> make TEST="hotspot_all" JTREG="VM_OPTIONS=-Djdk.test.lib.artifacts.jcstress-tests-all=$HOME/jcstress.jar" test
>
> but it does not - the JTREG parameter is apparently ignored. This is
> unfortunate as there is a warning as it is a non-control variable.
>
>
> You are using the "test" target, but the JTREG option is only available
> for the new "run-test" target. Using "test" will invoke the old testing
> framework, which is about to be replaced by a more modern and integrated
> one. In the long term, "test" will invoke the new framework, but during a
> transition period, "run-test" needs to be used.
>
> For what it's worth, I tried your command line (but without the patch
> applied) and verified using LOG=cmdlines that the -D option was indeed
> passed to jtreg.
>
> /Magnus
>
>
> Am I wrong in thinking that this was written for testing internally within
> Oracle? I cannot find an instance of "com.oracle.jib.api.JibServiceFactory"
> in the OpenJDK project or elsewhere.
>
> BR,
> Stuart
>
> On 8 September 2017 at 16:53, Stuart Monteith wrote:
>
>> Hello,
>> I've spent some time on this, and I have to admit that I'm stumped. I
>> get exactly the same errors on x86 on jdk10/hs and jdk10/jdk10 with a recent
>> build of JTReg and JT_HOME set appropriately.
>>
>> Are there any pointers on how this is supposed to be run?
>>
>> Thanks,
>> Stuart
>>
>> On 25 April 2017 at 11:47, Aleksey Shipilev wrote:
>>
>>> On 04/19/2017 12:12 AM, Igor Ignatyev wrote:
>>> > http://cr.openjdk.java.net/~iignatyev//8178788/webrev.00/index.html
>>> >> 69903 lines changed: 69903 ins; 0 del; 0 mod;
>>> > (69524 lines are generated)
>>> >
>>> > Hi all,
>>> >
>>> > could you please review this patch which adds a jtreg test wrapper for
>>> > jcstress test suite and jtreg tests which run jcstress tests thru this
>>> > wrapper?
>>> >
>>> > webrev: http://cr.openjdk.java.net/~iignatyev//8178788/webrev.00/index.html
>>> > JBS: https://bugs.openjdk.java.net/browse/JDK-8178788 testing:
>>>
>>> TL;DR: This patch introduces more problems than it solves. Just run the
>>> jcstress tests-all JAR against the tested runtime.
>>>
>>> Wrapping jcstress tests with jtreg defies the purpose of jcstress harness --
>>> that is, running lots of tests as fast as it possibly could without affecting
>>> testing quality. For example, by cleverly reusing VMs between the tests, using
>>> Whitebox to deoptimize without restarting the VMs, etc. It really wastes CPU
>>> time to run each test in isolation.
>>>
>>> Also, it does not "automatically" work, which defies "easy to run" goal:
>>>
>>> Caused by: java.io.FileNotFoundException: Couldn't automatically resolve
>>> dependency for jcstress-tests-all , revision 0.3
>>> Please specify the location using jdk.test.lib.artifacts.jcstress-tests-all
>>>     at jdk.test.lib.artifacts.DefaultArtifactManager.resolve(DefaultArtifactManager.java:37)
>>>     at jdk.test.lib.artifacts.ArtifactResolver.resolve(ArtifactResolver.java:54)
>>>     at applications.jcstress.JcstressRunner.pathToArtifact(JcstressRunner.java:53)
>>>     ... 8 more
>>>
>>> Okay, brilliant! How do I configure this, if I run "make test"?
>>>
>>> CONF=linux-x86_64-normal-server-release LOG=info make test TEST="hotspot_all"
>>>
>>> -Aleksey

From rohitarulraj at gmail.com Thu Sep 14 18:40:52 2017
From: rohitarulraj at gmail.com (Rohit Arul Raj)
Date: Fri, 15 Sep 2017 00:10:52 +0530
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To: <11af0f62-ba6b-d533-d23c-750d2ca012c7@oracle.com>
References: <4d4fe028-ea6a-4f77-ab69-5c2bc752e1f5@oracle.com> <47bc0a90-ed6a-220a-c3d1-b4df2d8bbc74@oracle.com> <9c53f889-e58e-33ac-3c05-874779b469d6@oracle.com> <45619e1a-9eb0-a540-193b-5187da3bf6bc@oracle.com> <66e4af43-c0e2-6d64-b69f-35166150ffa2@oracle.com> <11af0f62-ba6b-d533-d23c-750d2ca012c7@oracle.com>
Message-ID:

Hello Vladimir,

> CPUID check for 0x8000001E should be explicit. Otherwise the code will be
> executed for all above 0x80000008.
>
> +    __ cmpl(rax, 0x80000008); // Is cpuid(0x80000009 and above) supported?
> +    __ jccb(Assembler::belowEqual, ext_cpuid8);
> +    __ cmpl(rax, 0x8000001E); // Is cpuid(0x8000001E) supported?
> +    __ jccb(Assembler::notEqual, ext_cpuid8);
> +    //
> +    // Extended cpuid(0x8000001E)
> +    //
>

AMD17h has CPUID 0x8000001F too, so rax will not equal 0x8000001E there and
the notEqual jump would skip the cpuid(0x8000001E) code. I have modified the
last statement as below:

+    __ cmpl(rax, 0x80000008); // Is cpuid(0x80000009 and above) supported?
+    __ jccb(Assembler::belowEqual, ext_cpuid8);
+    __ cmpl(rax, 0x8000001E); // Is cpuid(0x8000001E) supported?
+    __ jccb(Assembler::below, ext_cpuid8);

Is this OK?

After making the above changes, I got a "Short forward jump exceeds 8-bit
offset" error while building OpenJDK.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (macroAssembler_x86.hpp:116), pid=7786, tid=7787
#  guarantee(this->is8bit(imm8)) failed: Short forward jump exceeds 8-bit offset

So I have replaced the short jump with a near jump while checking for CPUID
0x80000005.

     __ cmpl(rax, 0x80000000); // Is cpuid(0x80000001) supported?
     __ jcc(Assembler::belowEqual, done);
     __ cmpl(rax, 0x80000004); // Is cpuid(0x80000005) supported?
-    __ jccb(Assembler::belowEqual, ext_cpuid1);
+    __ jcc(Assembler::belowEqual, ext_cpuid1);

I have attached the updated, re-tested patch.

diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
--- a/src/cpu/x86/vm/vm_version_x86.cpp
+++ b/src/cpu/x86/vm/vm_version_x86.cpp
@@ -70,7 +70,7 @@
     bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2);

     Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4;
-    Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, done, wrapup;
+    Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, ext_cpuid8, done, wrapup;
     Label legacy_setup, save_restore_except, legacy_save_restore, start_simd_check;

     StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub");
@@ -267,14 +267,30 @@
     __ cmpl(rax, 0x80000000); // Is cpuid(0x80000001) supported?
     __ jcc(Assembler::belowEqual, done);
     __ cmpl(rax, 0x80000004); // Is cpuid(0x80000005) supported?
-    __ jccb(Assembler::belowEqual, ext_cpuid1);
+    __ jcc(Assembler::belowEqual, ext_cpuid1);
     __ cmpl(rax, 0x80000006); // Is cpuid(0x80000007) supported?
     __ jccb(Assembler::belowEqual, ext_cpuid5);
     __ cmpl(rax, 0x80000007); // Is cpuid(0x80000008) supported?
     __ jccb(Assembler::belowEqual, ext_cpuid7);
+    __ cmpl(rax, 0x80000008); // Is cpuid(0x80000009 and above) supported?
+    __ jccb(Assembler::belowEqual, ext_cpuid8);
+    __ cmpl(rax, 0x8000001E); // Is cpuid(0x8000001E) supported?
+    __ jccb(Assembler::below, ext_cpuid8);
+    //
+    // Extended cpuid(0x8000001E)
+    //
+    __ movl(rax, 0x8000001E);
+    __ cpuid();
+    __ lea(rsi, Address(rbp, in_bytes(VM_Version::ext_cpuid1E_offset())));
+    __ movl(Address(rsi, 0), rax);
+    __ movl(Address(rsi, 4), rbx);
+    __ movl(Address(rsi, 8), rcx);
+    __ movl(Address(rsi,12), rdx);
+
     //
     // Extended cpuid(0x80000008)
     //
+    __ bind(ext_cpuid8);
     __ movl(rax, 0x80000008);
     __ cpuid();
     __ lea(rsi, Address(rbp, in_bytes(VM_Version::ext_cpuid8_offset())));
@@ -1109,11 +1125,27 @@
     }

 #ifdef COMPILER2
-    if (MaxVectorSize > 16) {
-      // Limit vectors size to 16 bytes on current AMD cpus.
+    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
+      // Limit vectors size to 16 bytes on AMD cpus < 17h.
       FLAG_SET_DEFAULT(MaxVectorSize, 16);
     }
 #endif // COMPILER2
+
+    // Some defaults for AMD family 17h
+    if ( cpu_family() == 0x17 ) {
+      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
+      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
+        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
+      }
+      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
+        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
+      }
+#ifdef COMPILER2
+      if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) {
+        FLAG_SET_DEFAULT(UseFPUForSpilling, true);
+      }
+#endif
+    }
   }

   if( is_intel() ) { // Intel cpus specific settings
diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
--- a/src/cpu/x86/vm/vm_version_x86.hpp
+++ b/src/cpu/x86/vm/vm_version_x86.hpp
@@ -228,6 +228,15 @@
     } bits;
   };

+  union ExtCpuid1EEbx {
+    uint32_t value;
+    struct {
+      uint32_t                  : 8,
+               threads_per_core : 8,
+                                : 16;
+    } bits;
+  };
+
   union XemXcr0Eax {
     uint32_t value;
     struct {
@@ -398,6 +407,12 @@
     ExtCpuid8Ecx ext_cpuid8_ecx;
     uint32_t     ext_cpuid8_edx; // reserved

+    // cpuid function 0x8000001E // AMD 17h
+    uint32_t      ext_cpuid1E_eax;
+    ExtCpuid1EEbx ext_cpuid1E_ebx; // threads per core (AMD17h)
+    uint32_t      ext_cpuid1E_ecx;
+    uint32_t      ext_cpuid1E_edx; // unused currently
+
     // extended control register XCR0 (the XFEATURE_ENABLED_MASK register)
     XemXcr0Eax   xem_xcr0_eax;
     uint32_t     xem_xcr0_edx; // reserved
@@ -505,6 +520,14 @@
       result |= CPU_CLMUL;
     if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
       result |= CPU_RTM;
+    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
+      result |= CPU_ADX;
+    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
+      result |= CPU_BMI2;
+    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
+      result |= CPU_SHA;
+    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
+      result |= CPU_FMA;

     // AMD features.
     if (is_amd()) {
@@ -518,16 +541,8 @@
     }
     // Intel features.
     if(is_intel()) {
-      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
-        result |= CPU_ADX;
-      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
-        result |= CPU_BMI2;
-      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
-        result |= CPU_SHA;
       if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
         result |= CPU_LZCNT;
-      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
-        result |= CPU_FMA;
       // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
       if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
         result |= CPU_3DNOW_PREFETCH;
@@ -590,6 +605,7 @@
   static ByteSize ext_cpuid5_offset() { return byte_offset_of(CpuidInfo, ext_cpuid5_eax); }
   static ByteSize ext_cpuid7_offset() { return byte_offset_of(CpuidInfo, ext_cpuid7_eax); }
   static ByteSize ext_cpuid8_offset() { return byte_offset_of(CpuidInfo, ext_cpuid8_eax); }
+  static ByteSize ext_cpuid1E_offset() { return byte_offset_of(CpuidInfo, ext_cpuid1E_eax); }
   static ByteSize tpl_cpuidB0_offset() { return byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); }
   static ByteSize tpl_cpuidB1_offset() { return byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); }
   static ByteSize tpl_cpuidB2_offset() { return byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); }
@@ -673,8 +689,12 @@
     if (is_intel() && supports_processor_topology()) {
       result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus;
     } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) {
-      result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu /
-               cores_per_cpu();
+      if (cpu_family() >= 0x17) {
+        result = _cpuid_info.ext_cpuid1E_ebx.bits.threads_per_core + 1;
+      } else {
+        result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu /
+                 cores_per_cpu();
+      }
     }
     return (result == 0 ? 1 : result);
   }

Please let me know your comments.
Thanks for your review.

Regards,
Rohit

>
> On 9/11/17 9:52 PM, Rohit Arul Raj wrote:
>>
>> Hello David,
>>
>>>> 1. ExtCpuid1EEx
>>>>
>>>> Should this be ExtCpuid1EEbx? (I see the naming here is somewhat
>>>> inconsistent - and potentially confusing: I would have preferred to see
>>>> things like ExtCpuid_1E_Ebx, to make it clear.)
>>>
>>> Yes, I can change it accordingly.
>>>
>>
>> I have attached the updated, re-tested patch as per your comments above.
>>
>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>> @@ -70,7 +70,7 @@
>>      bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2);
>>
>>      Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4;
>> -    Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, done, wrapup;
>> +    Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, ext_cpuid8, done, wrapup;
>>      Label legacy_setup, save_restore_except, legacy_save_restore, start_simd_check;
>>
>>      StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub");
>> @@ -272,9 +272,23 @@
>>      __ jccb(Assembler::belowEqual, ext_cpuid5);
>>      __ cmpl(rax, 0x80000007); // Is cpuid(0x80000008) supported?
>>      __ jccb(Assembler::belowEqual, ext_cpuid7);
>> +    __ cmpl(rax, 0x80000008); // Is cpuid(0x8000001E) supported?
>> + __ jccb(Assembler::belowEqual, ext_cpuid8); >> + // >> + // Extended cpuid(0x8000001E) >> + // >> + __ movl(rax, 0x8000001E); >> + __ cpuid(); >> + __ lea(rsi, Address(rbp, >> in_bytes(VM_Version::ext_cpuid_1E_offset()))); >> + __ movl(Address(rsi, 0), rax); >> + __ movl(Address(rsi, 4), rbx); >> + __ movl(Address(rsi, 8), rcx); >> + __ movl(Address(rsi,12), rdx); >> + >> // >> // Extended cpuid(0x80000008) >> // >> + __ bind(ext_cpuid8); >> __ movl(rax, 0x80000008); >> __ cpuid(); >> __ lea(rsi, Address(rbp, >> in_bytes(VM_Version::ext_cpuid8_offset()))); >> @@ -1109,11 +1123,27 @@ >> } >> >> #ifdef COMPILER2 >> - if (MaxVectorSize > 16) { >> - // Limit vectors size to 16 bytes on current AMD cpus. >> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >> FLAG_SET_DEFAULT(MaxVectorSize, 16); >> } >> #endif // COMPILER2 >> + >> + // Some defaults for AMD family 17h >> + if ( cpu_family() == 0x17 ) { >> + // On family 17h processors use XMM and UnalignedLoadStores for >> Array Copy >> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >> + } >> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >> + } >> +#ifdef COMPILER2 >> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >> + } >> +#endif >> + } >> } >> >> if( is_intel() ) { // Intel cpus specific settings >> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >> b/src/cpu/x86/vm/vm_version_x86.hpp >> --- a/src/cpu/x86/vm/vm_version_x86.hpp >> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >> @@ -228,6 +228,15 @@ >> } bits; >> }; >> >> + union ExtCpuid_1E_Ebx { >> + uint32_t value; >> + struct { >> + uint32_t : 8, >> + threads_per_core : 8, >> + : 16; >> + } bits; >> + }; >> + >> union XemXcr0Eax { >> uint32_t value; >> struct { >> @@ -398,6 +407,12 @@ >> ExtCpuid8Ecx 
ext_cpuid8_ecx; >> uint32_t ext_cpuid8_edx; // reserved >> >> + // cpuid function 0x8000001E // AMD 17h >> + uint32_t ext_cpuid_1E_eax; >> + ExtCpuid_1E_Ebx ext_cpuid_1E_ebx; // threads per core (AMD17h) >> + uint32_t ext_cpuid_1E_ecx; >> + uint32_t ext_cpuid_1E_edx; // unused currently >> + >> // extended control register XCR0 (the XFEATURE_ENABLED_MASK >> register) >> XemXcr0Eax xem_xcr0_eax; >> uint32_t xem_xcr0_edx; // reserved >> @@ -505,6 +520,14 @@ >> result |= CPU_CLMUL; >> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >> result |= CPU_RTM; >> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >> + result |= CPU_ADX; >> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >> + result |= CPU_BMI2; >> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >> + result |= CPU_SHA; >> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >> + result |= CPU_FMA; >> >> // AMD features. >> if (is_amd()) { >> @@ -518,16 +541,8 @@ >> } >> // Intel features. >> if(is_intel()) { >> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >> - result |= CPU_ADX; >> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >> - result |= CPU_BMI2; >> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >> - result |= CPU_SHA; >> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >> result |= CPU_LZCNT; >> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >> - result |= CPU_FMA; >> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >> support for prefetchw >> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >> result |= CPU_3DNOW_PREFETCH; >> @@ -590,6 +605,7 @@ >> static ByteSize ext_cpuid5_offset() { return >> byte_offset_of(CpuidInfo, ext_cpuid5_eax); } >> static ByteSize ext_cpuid7_offset() { return >> byte_offset_of(CpuidInfo, ext_cpuid7_eax); } >> static ByteSize ext_cpuid8_offset() { return >> byte_offset_of(CpuidInfo, ext_cpuid8_eax); } >> + static ByteSize ext_cpuid_1E_offset() { return >> byte_offset_of(CpuidInfo, ext_cpuid_1E_eax); } >> static ByteSize tpl_cpuidB0_offset() { return >> 
byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } >> static ByteSize tpl_cpuidB1_offset() { return >> byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } >> static ByteSize tpl_cpuidB2_offset() { return >> byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } >> @@ -673,8 +689,11 @@ >> if (is_intel() && supports_processor_topology()) { >> result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; >> } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { >> - result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >> - cores_per_cpu(); >> + if (cpu_family() >= 0x17) >> + result = _cpuid_info.ext_cpuid_1E_ebx.bits.threads_per_core + 1; >> + else >> + result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >> + cores_per_cpu(); >> } >> return (result == 0 ? 1 : result); >> } >> >> >> Please let me know your comments >> >> Thanks for your time. >> >> Regards, >> Rohit >> >> >>>> Thanks, >>>> David >>>> ----- >>>> >>>> >>>>> Reference: >>>>> >>>>> https://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf >>>>> [Pg 82] >>>>> >>>>> CPUID_Fn8000001E_EBX [Core Identifiers] (CoreId) >>>>> 15:8 ThreadsPerCore: threads per core. Read-only. Reset: XXh. >>>>> The number of threads per core is ThreadsPerCore+1. 
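
As a rough illustration of the CPUID_Fn8000001E_EBX layout quoted above (this is not code from the patch; the function name `threads_per_core_from_ebx` is hypothetical), the ThreadsPerCore field can be decoded from a raw EBX value with a shift and a mask. The sketch assumes the caller has already obtained EBX from cpuid(0x8000001E) by some other means.

```cpp
#include <cstdint>
#include <cassert>

// Hypothetical helper, for illustration only: decode the number of threads
// per core from a raw CPUID Fn8000_001E EBX value.
// Per the quoted reference, bits 15:8 hold ThreadsPerCore, and the actual
// thread count is ThreadsPerCore + 1.
static uint32_t threads_per_core_from_ebx(uint32_t ebx) {
  uint32_t field = (ebx >> 8) & 0xFFu;  // extract bits 15:8
  return field + 1;                     // field stores (threads per core - 1)
}
```

For example, an EBX value of 0x00000100 has ThreadsPerCore = 1, which decodes to 2 threads per core; this corresponds to the `threads_per_core + 1` expression in the patch.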
>>>>> >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> @@ -70,7 +70,7 @@ >>>>> bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2); >>>>> >>>>> Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4; >>>>> - Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>>>> done, wrapup; >>>>> + Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>>>> ext_cpuid8, done, wrapup; >>>>> Label legacy_setup, save_restore_except, legacy_save_restore, >>>>> start_simd_check; >>>>> >>>>> StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub"); >>>>> @@ -272,9 +272,23 @@ >>>>> __ jccb(Assembler::belowEqual, ext_cpuid5); >>>>> __ cmpl(rax, 0x80000007); // Is cpuid(0x80000008) supported? >>>>> __ jccb(Assembler::belowEqual, ext_cpuid7); >>>>> + __ cmpl(rax, 0x80000008); // Is cpuid(0x8000001E) supported? >>>>> + __ jccb(Assembler::belowEqual, ext_cpuid8); >>>>> + // >>>>> + // Extended cpuid(0x8000001E) >>>>> + // >>>>> + __ movl(rax, 0x8000001E); >>>>> + __ cpuid(); >>>>> + __ lea(rsi, Address(rbp, >>>>> in_bytes(VM_Version::ext_cpuid1E_offset()))); >>>>> + __ movl(Address(rsi, 0), rax); >>>>> + __ movl(Address(rsi, 4), rbx); >>>>> + __ movl(Address(rsi, 8), rcx); >>>>> + __ movl(Address(rsi,12), rdx); >>>>> + >>>>> // >>>>> // Extended cpuid(0x80000008) >>>>> // >>>>> + __ bind(ext_cpuid8); >>>>> __ movl(rax, 0x80000008); >>>>> __ cpuid(); >>>>> __ lea(rsi, Address(rbp, >>>>> in_bytes(VM_Version::ext_cpuid8_offset()))); >>>>> @@ -1109,11 +1123,27 @@ >>>>> } >>>>> >>>>> #ifdef COMPILER2 >>>>> - if (MaxVectorSize > 16) { >>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>> } >>>>> #endif // COMPILER2 >>>>> + >>>>> + // Some defaults for AMD family 17h >>>>> + if ( cpu_family() == 0x17 ) { >>>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>>> Array Copy >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>> + } >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>> { >>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>> + } >>>>> +#ifdef COMPILER2 >>>>> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>> + } >>>>> +#endif >>>>> + } >>>>> } >>>>> >>>>> if( is_intel() ) { // Intel cpus specific settings >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> @@ -228,6 +228,15 @@ >>>>> } bits; >>>>> }; >>>>> >>>>> + union ExtCpuid1EEx { >>>>> + uint32_t value; >>>>> + struct { >>>>> + uint32_t : 8, >>>>> + threads_per_core : 8, >>>>> + : 16; >>>>> + } bits; >>>>> + }; >>>>> + >>>>> union XemXcr0Eax { >>>>> uint32_t value; >>>>> struct { >>>>> @@ -398,6 +407,12 @@ >>>>> ExtCpuid8Ecx ext_cpuid8_ecx; >>>>> uint32_t ext_cpuid8_edx; // reserved >>>>> >>>>> + // cpuid function 0x8000001E // AMD 17h >>>>> + uint32_t ext_cpuid1E_eax; >>>>> + ExtCpuid1EEx ext_cpuid1E_ebx; // threads per core (AMD17h) >>>>> + uint32_t ext_cpuid1E_ecx; >>>>> + uint32_t ext_cpuid1E_edx; // unused currently >>>>> + >>>>> // extended control register XCR0 (the XFEATURE_ENABLED_MASK >>>>> register) >>>>> XemXcr0Eax xem_xcr0_eax; >>>>> uint32_t xem_xcr0_edx; // reserved >>>>> @@ -505,6 +520,14 @@ >>>>> result |= CPU_CLMUL; >>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>> result |= CPU_RTM; >>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>> + result |= CPU_ADX; >>>>> + 
if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>> + result |= CPU_BMI2; >>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>> + result |= CPU_SHA; >>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>> + result |= CPU_FMA; >>>>> >>>>> // AMD features. >>>>> if (is_amd()) { >>>>> @@ -518,16 +541,8 @@ >>>>> } >>>>> // Intel features. >>>>> if(is_intel()) { >>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>> - result |= CPU_ADX; >>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>> - result |= CPU_BMI2; >>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>> - result |= CPU_SHA; >>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>> result |= CPU_LZCNT; >>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>> - result |= CPU_FMA; >>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>> support for prefetchw >>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>> result |= CPU_3DNOW_PREFETCH; >>>>> @@ -590,6 +605,7 @@ >>>>> static ByteSize ext_cpuid5_offset() { return >>>>> byte_offset_of(CpuidInfo, ext_cpuid5_eax); } >>>>> static ByteSize ext_cpuid7_offset() { return >>>>> byte_offset_of(CpuidInfo, ext_cpuid7_eax); } >>>>> static ByteSize ext_cpuid8_offset() { return >>>>> byte_offset_of(CpuidInfo, ext_cpuid8_eax); } >>>>> + static ByteSize ext_cpuid1E_offset() { return >>>>> byte_offset_of(CpuidInfo, ext_cpuid1E_eax); } >>>>> static ByteSize tpl_cpuidB0_offset() { return >>>>> byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } >>>>> static ByteSize tpl_cpuidB1_offset() { return >>>>> byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } >>>>> static ByteSize tpl_cpuidB2_offset() { return >>>>> byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } >>>>> @@ -673,8 +689,11 @@ >>>>> if (is_intel() && supports_processor_topology()) { >>>>> result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; >>>>> } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { >>>>> - result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>>> - 
cores_per_cpu(); >>>>> + if (cpu_family() >= 0x17) >>>>> + result = _cpuid_info.ext_cpuid1E_ebx.bits.threads_per_core + >>>>> 1; >>>>> + else >>>>> + result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>>> + cores_per_cpu(); >>>>> } >>>>> return (result == 0 ? 1 : result); >>>>> } >>>>> >>>>> I have attached the patch for review. >>>>> Please let me know your comments. >>>>> >>>>> Thanks, >>>>> Rohit >>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> >>>>>>> >>>>>>> src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> >>>>>>> No comments on AMD specific changes. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>> On 5/09/2017 3:43 PM, David Holmes wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 5/09/2017 3:29 PM, Rohit Arul Raj wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Hello David, >>>>>>>>> >>>>>>>>> On Tue, Sep 5, 2017 at 10:31 AM, David Holmes >>>>>>>>> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi Rohit, >>>>>>>>>> >>>>>>>>>> I was unable to apply your patch to latest jdk10/hs/hotspot repo. >>>>>>>>>> >>>>>>>>> >>>>>>>>> I checked out the latest jdk10/hs/hotspot [parent: >>>>>>>>> 13548:1a9c2e07a826] >>>>>>>>> and was able to apply the patch [epyc-amd17h-defaults-3Sept.patch] >>>>>>>>> without any issues. >>>>>>>>> Can you share the error message that you are getting? >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I was getting this: >>>>>>>> >>>>>>>> applying hotspot.patch >>>>>>>> patching file src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> Hunk #1 FAILED at 1108 >>>>>>>> 1 out of 1 hunks FAILED -- saving rejects to file >>>>>>>> src/cpu/x86/vm/vm_version_x86.cpp.rej >>>>>>>> patching file src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> Hunk #2 FAILED at 522 >>>>>>>> 1 out of 2 hunks FAILED -- saving rejects to file >>>>>>>> src/cpu/x86/vm/vm_version_x86.hpp.rej >>>>>>>> abort: patch failed to apply >>>>>>>> >>>>>>>> but I started again and this time it applied fine, so not sure what >>>>>>>> was >>>>>>>> going on there. 
>>>>>>>> >>>>>>>> Cheers, >>>>>>>> David >>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Rohit >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 4/09/2017 2:42 AM, Rohit Arul Raj wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hello Vladimir, >>>>>>>>>>> >>>>>>>>>>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>> >>>>>>>>>>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Hello Vladimir, >>>>>>>>>>>>> >>>>>>>>>>>>>> Changes look good. Only question I have is about >>>>>>>>>>>>>> MaxVectorSize. >>>>>>>>>>>>>> It >>>>>>>>>>>>>> is >>>>>>>>>>>>>> set >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> 16 only in presence of AVX: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945 >>>>>>>>>>>>>> >>>>>>>>>>>>>> Does that code works for AMD 17h too? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for pointing that out. Yes, the code works fine for AMD >>>>>>>>>>>>> 17h. >>>>>>>>>>>>> So >>>>>>>>>>>>> I have removed the surplus check for MaxVectorSize from my >>>>>>>>>>>>> patch. >>>>>>>>>>>>> I >>>>>>>>>>>>> have updated, re-tested and attached the patch. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Which check you removed? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> My older patch had the below mentioned check which was required >>>>>>>>>>> on >>>>>>>>>>> JDK9 where the default MaxVectorSize was 64. It has been handled >>>>>>>>>>> better in openJDK10. So this check is not required anymore. 
>>>>>>>>>>> >>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>> ... >>>>>>>>>>> ... >>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>> + } >>>>>>>>>>> .. >>>>>>>>>>> .. >>>>>>>>>>> + } >>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I have one query regarding the setting of UseSHA flag: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821 >>>>>>>>>>>>> >>>>>>>>>>>>> AMD 17h has support for SHA. >>>>>>>>>>>>> AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets >>>>>>>>>>>>> enabled for it based on the availability of BMI2 and AVX2. Is >>>>>>>>>>>>> there >>>>>>>>>>>>> an >>>>>>>>>>>>> underlying reason for this? I have handled this in the patch >>>>>>>>>>>>> but >>>>>>>>>>>>> just >>>>>>>>>>>>> wanted to confirm. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> It was done with next changes which use only AVX2 and BMI2 >>>>>>>>>>>> instructions >>>>>>>>>>>> to >>>>>>>>>>>> calculate SHA-256: >>>>>>>>>>>> >>>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974 >>>>>>>>>>>> >>>>>>>>>>>> I don't know if AMD 15h supports these instructions and can >>>>>>>>>>>> execute >>>>>>>>>>>> that >>>>>>>>>>>> code. You need to test it. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Ok, got it. Since AMD15h has support for AVX2 and BMI2 >>>>>>>>>>> instructions, >>>>>>>>>>> it should work. 
>>>>>>>>>>> Confirmed by running following sanity tests: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java >>>>>>>>>>> >>>>>>>>>>> So I have removed those SHA checks from my patch too. >>>>>>>>>>> >>>>>>>>>>> Please find attached updated, re-tested patch. >>>>>>>>>>> >>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>> @@ -1109,11 +1109,27 @@ >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>> } >>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>> + >>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>> for >>>>>>>>>>> Array Copy >>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>>>>> + } >>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>> { >>>>>>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>>>>> + } >>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>> + if (supports_sse4_2() && >>>>>>>>>>> FLAG_IS_DEFAULT(UseFPUForSpilling)) >>>>>>>>>>> { >>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>> + } >>>>>>>>>>> +#endif >>>>>>>>>>> + } >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>> result |= CPU_RTM; >>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>> >>>>>>>>>>> // AMD features. 
>>>>>>>>>>> if (is_amd()) { >>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>> } >>>>>>>>>>> // Intel features. >>>>>>>>>>> if(is_intel()) { >>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>>>> indicates >>>>>>>>>>> support for prefetchw >>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) >>>>>>>>>>> { >>>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>> >>>>>>>>>>> Please let me know your comments. >>>>>>>>>>> >>>>>>>>>>> Thanks for your time. >>>>>>>>>>> Rohit >>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for taking time to review the code. 
>>>>>>>>>>>>> >>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>>>>>> } >>>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>>> } >>>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>> || >>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>>>>> CPU"); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>> + } >>>>>>>>>>>>> >>>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>> @@ -1109,11 +1125,40 @@ >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>> } >>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>> + >>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>> for >>>>>>>>>>>>> Array Copy >>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>> { >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>>>> { >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseBMI2Instructions, true); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto >>>>>>>>>>>>> hash >>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + } >>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + } >>>>>>>>>>>>> +#endif >>>>>>>>>>>>> + } >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>>>>> 
if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>>>> result |= CPU_RTM; >>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>>> >>>>>>>>>>>>> // AMD features. >>>>>>>>>>>>> if (is_amd()) { >>>>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>>> } >>>>>>>>>>>>> // Intel features. >>>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != >>>>>>>>>>>>> 0) >>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>>>>>> indicates >>>>>>>>>>>>> support for prefetchw >>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != >>>>>>>>>>>>> 0) { >>>>>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> Rohit >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 
wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I think the patch needs updating for jdk10 as I already see >>>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>>> lot of >>>>>>>>>>>>>>>>> logic >>>>>>>>>>>>>>>>> around UseSHA in vm_version_x86.cpp. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks David, I will update the patch wrt JDK10 source base, >>>>>>>>>>>>>>>> test >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> resubmit for review. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>>>>>>>>>>>>> 13519:71337910df60), did regression testing using jtreg >>>>>>>>>>>>>>> ($make >>>>>>>>>>>>>>> default) and didnt find any regressions. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Can anyone please volunteer to review this patch which sets >>>>>>>>>>>>>>> flag/ISA >>>>>>>>>>>>>>> defaults for newer AMD 17h (EPYC) processor? 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ************************* Patch **************************** >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>>>> || >>>>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>>>>>>> CPU"); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>>>> @@ -1109,11 +1125,43 @@ >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>>> for >>>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>>>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto >>>>>>>>>>>>>>> hash >>>>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>> --- 
a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>>>>>> result |= CPU_RTM; >>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> // AMD features. >>>>>>>>>>>>>>> if (is_amd()) { >>>>>>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> // Intel features. 
>>>>>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel >>>>>>>>>>>>>>> != 0) >>>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>>>>>>>> indicates >>>>>>>>>>>>>>> support for prefetchw >>>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse >>>>>>>>>>>>>>> != >>>>>>>>>>>>>>> 0) { >>>>>>>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ************************************************************** >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I would like an volunteer to review this patch >>>>>>>>>>>>>>>>>>>> (openJDK9) >>>>>>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>>>>>> sets >>>>>>>>>>>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor 
and >>>>>>>>>>>>>>>>>>>> help >>>>>>>>>>>>>>>>>>>> us >>>>>>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>>>> the commit process. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Unfortunately patches can not be accepted from systems >>>>>>>>>>>>>>>>>>> outside >>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>> OpenJDK >>>>>>>>>>>>>>>>>>> infrastructure and ... >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I have also attached the patch (hg diff -g) for >>>>>>>>>>>>>>>>>>>> reference. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ... unfortunately patches tend to get stripped by the >>>>>>>>>>>>>>>>>>> mail >>>>>>>>>>>>>>>>>>> servers. >>>>>>>>>>>>>>>>>>> If >>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>> patch is small please include it inline. Otherwise you >>>>>>>>>>>>>>>>>>> will >>>>>>>>>>>>>>>>>>> need >>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>> find >>>>>>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>>>>> OpenJDK Author who can host it for you on >>>>>>>>>>>>>>>>>>> cr.openjdk.java.net. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 3) I have done regression testing using jtreg ($make >>>>>>>>>>>>>>>>>>>> default) >>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>> didnt find any regressions. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Sounds good, but until I see the patch it is hard to >>>>>>>>>>>>>>>>>>> comment >>>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>>> testing >>>>>>>>>>>>>>>>>>> requirements. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks David, >>>>>>>>>>>>>>>>>> Yes, it's a small patch. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>>>>>>> || >>>>>>>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>>> + warning("SHA instructions are not available on >>>>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>>> CPU"); >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>>>>>>> + 
FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD >>>>>>>>>>>>>>>>>> cpus. >>>>>>>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < >>>>>>>>>>>>>>>>>> 17h. >>>>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>>> + warning("Intrinsics for 
SHA-384 and SHA-512 >>>>>>>>>>>>>>>>>> crypto >>>>>>>>>>>>>>>>>> hash >>>>>>>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific >>>>>>>>>>>>>>>>>> settings >>>>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>> @@ -513,6 +513,16 @@ >>>>>>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != >>>>>>>>>>>>>>>>>> 0) >>>>>>>>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> // Intel features. 
>>>>>>>>>>>>>>>>>> if(is_intel()) {
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>> Rohit
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>
>>>
>

From erik.helin at oracle.com Fri Sep 15 09:24:46 2017
From: erik.helin at oracle.com (Erik Helin)
Date: Fri, 15 Sep 2017 11:24:46 +0200
Subject: RFR: 8187570: Comparison between pointer and char in MethodMatcher::canonicalize
Message-ID: <7e92ea84-7f81-dc09-0710-c84f218b7495@oracle.com>

Hi all,

when I compiled with gcc 7.1.1 it warned me about the following code:

bool MethodMatcher::canonicalize(char * line, const char *& error_msg) {
  char* colon = strstr(line, "::");
  bool have_colon = (colon != NULL);
  if (have_colon) {
    // Don't allow multiple '::'
    if (colon + 2 != '\0') {

The problem is that colon is a pointer, so colon + 2 is a pointer, and then colon + 2 is compared to '\0', which is a char :/

Anyways, I think Yasumasa also spotted this issue a while back, but I couldn't find a patch for it, so I quickly whipped one up:

http://cr.openjdk.java.net/~ehelin/8187570/00/

--- old/src/hotspot/share/compiler/methodMatcher.cpp 2017-09-15 10:43:49.430656504 +0200
+++ new/src/hotspot/share/compiler/methodMatcher.cpp 2017-09-15 10:43:49.102654877 +0200
@@ -96,7 +96,7 @@
   bool have_colon = (colon != NULL);
   if (have_colon) {
     // Don't allow multiple '::'
-    if (colon + 2 != '\0') {
+    if (colon[2] != '\0') {
       if (strstr(colon+2, "::")) {
         error_msg = "Method pattern only allows one '::' allowed";
         return false;

I was a little bit afraid of what would happen if line (the parameter to MethodMatcher::canonicalize) isn't properly null terminated, so I checked the callers, and it seems like all callers properly null terminate the `line` argument.

Yasumasa, you want to be author and/or contributor of this patch?

Thanks,
Erik

PS.
First webrev and patch created with a consolidated forest, seems to be working fine :) From yasuenag at gmail.com Fri Sep 15 12:34:24 2017 From: yasuenag at gmail.com (Yasumasa Suenaga) Date: Fri, 15 Sep 2017 21:34:24 +0900 Subject: RFR: 8187570: Comparison between pointer and char in MethodMatcher::canonicalize In-Reply-To: <7e92ea84-7f81-dc09-0710-c84f218b7495@oracle.com> References: <7e92ea84-7f81-dc09-0710-c84f218b7495@oracle.com> Message-ID: Hi Erik, > Anyways, I think Yasumasa also spotted this issue a while back, but I couldn't find a patch for it I've pasted a patch in [1]. But I did not create webrev because Kim told me this issue was reported in JDK-8181503. Currently, JDK-8181503 seems to be in progress. > Yasumasa, you want to be author and/or contributor of this patch? Please push this patch as your changeset :-) because I cannot access JPRT. I'm thumbs up this change as a reviewer (ysuenaga). Thanks, Yasumasa On 2017/09/15 18:24, Erik Helin wrote: > Hi all, > > when I compiled with gcc 7.1.1 it warned me about the following code: > > bool MethodMatcher::canonicalize(char * line, const char *& error_msg) { > ? char* colon = strstr(line, "::"); > ? bool have_colon = (colon != NULL); > ? if (have_colon) { > ??? // Don't allow multiple '::' > ??? if (colon + 2 != '\0') { > > The problem is that colon is a pointer, so colon + 2 is a pointer, and then colon + 2 is compared '\0', which is a char :/ > > Anyways, I think Yasumasa also spotted this issue a while back, but I couldn't find a patch for it, so I quickly whipped one up: > > http://cr.openjdk.java.net/~ehelin/8187570/00/ > > --- old/src/hotspot/share/compiler/methodMatcher.cpp??? 2017-09-15 10:43:49.430656504 +0200 > +++ new/src/hotspot/share/compiler/methodMatcher.cpp??? 2017-09-15 10:43:49.102654877 +0200 > @@ -96,7 +96,7 @@ > ?? bool have_colon = (colon != NULL); > ?? if (have_colon) { > ???? // Don't allow multiple '::' > -??? if (colon + 2 != '\0') { > +??? if (colon[2] != '\0') { > ?????? 
if (strstr(colon+2, "::")) { > ???????? error_msg = "Method pattern only allows one '::' allowed"; > ???????? return false; > > I was a little bit afraid of what would happen if line (the parameter to MethodMatcher::canonicalize) isn't properly null terminated, so I checked the callers, and it seems like all callers property null terminate the `line` argument. > > Yasumasa, you want to be author and/or contributor of this patch? > > Thanks, > Erik > > PS. First webrev and patch created with a consolidated forest, seems to be working fine :) From yasuenag at gmail.com Fri Sep 15 12:36:13 2017 From: yasuenag at gmail.com (Yasumasa Suenaga) Date: Fri, 15 Sep 2017 21:36:13 +0900 Subject: RFR: 8187570: Comparison between pointer and char in MethodMatcher::canonicalize In-Reply-To: References: <7e92ea84-7f81-dc09-0710-c84f218b7495@oracle.com> Message-ID: <84e6f6c9-b620-ee84-1917-370bf5a75ded@gmail.com> Sorry, I forgot to paste links: [1] http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-July/027431.html email from Kim: http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-July/027433.html On 2017/09/15 21:31, Yasumasa Suenaga wrote: > Hi Erik, > > >> Anyways, I think Yasumasa also spotted this issue a while back, but I couldn't find a patch for it > > I've pasted a patch in [1]. But I did not create webrev because Kim told me this issue was reported in JDK-8181503. > Currently, JDK-8181503 seems to be in progress. > >> Yasumasa, you want to be author and/or contributor of this patch? > > Please push this patch as your changeset :-) >> >> Thanks, >> Erik >> >> PS. First webrev and patch created with a consolidated forest, seems to be working fine :) > > On 2017/09/15 18:24, Erik Helin wrote: >> Hi all, >> >> when I compiled with gcc 7.1.1 it warned me about the following code: >> >> bool MethodMatcher::canonicalize(char * line, const char *& error_msg) { >> ?? char* colon = strstr(line, "::"); >> ?? bool have_colon = (colon != NULL); >> ?? 
if (have_colon) {
>>      // Don't allow multiple '::'
>>      if (colon + 2 != '\0') {
>>
>> The problem is that colon is a pointer, so colon + 2 is a pointer, and then colon + 2 is compared with '\0', which is a char :/
>>
>> Anyways, I think Yasumasa also spotted this issue a while back, but I couldn't find a patch for it, so I quickly whipped one up:
>>
>> http://cr.openjdk.java.net/~ehelin/8187570/00/
>>
>> --- old/src/hotspot/share/compiler/methodMatcher.cpp    2017-09-15 10:43:49.430656504 +0200
>> +++ new/src/hotspot/share/compiler/methodMatcher.cpp    2017-09-15 10:43:49.102654877 +0200
>> @@ -96,7 +96,7 @@
>>     bool have_colon = (colon != NULL);
>>     if (have_colon) {
>>       // Don't allow multiple '::'
>> -    if (colon + 2 != '\0') {
>> +    if (colon[2] != '\0') {
>>         if (strstr(colon+2, "::")) {
>>           error_msg = "Method pattern only allows one '::' allowed";
>>           return false;
>>
>> I was a little bit afraid of what would happen if line (the parameter to MethodMatcher::canonicalize) isn't properly null terminated, so I checked the callers, and it seems like all callers properly null terminate the `line` argument.
>>
>> Yasumasa, you want to be author and/or contributor of this patch?
>>
>> Thanks,
>> Erik
>>
>> PS. First webrev and patch created with a consolidated forest, seems to be working fine :)

From erik.helin at oracle.com Fri Sep 15 13:04:49 2017
From: erik.helin at oracle.com (Erik Helin)
Date: Fri, 15 Sep 2017 15:04:49 +0200
Subject: RFR: 8187578: BitMap::reallocate should check if old_map is NULL
Message-ID:

Hi all,

I'm still trying to compile with gcc 7.1.1 and run into another small
issue. BitMap::reallocate calls Copy::disjoint_words and there is a
case in Copy::disjoint_words that might result in call to memcpy. The
problem is that BitMap::reallocate does not check that the "from"
argument to Copy::disjoint_words differs from NULL, and a call to
memcpy with a NULL argument is undefined behavior.
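
The guard described above can be sketched in isolation. This is only a minimal model under stated assumptions, not HotSpot code: `word_t` and `reallocate_words` are hypothetical names, plain `malloc`/`free` stand in for the bitmap allocator, and a NULL `old_map` is assumed to always come with an old size of 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a bitmap word; not the HotSpot bm_word_t.
typedef unsigned long word_t;

// Realloc-like helper: old_map may be NULL (assumed to imply old_size == 0).
// Returns a new array of new_size words, or NULL when new_size is 0.
word_t* reallocate_words(word_t* old_map, size_t old_size, size_t new_size) {
  word_t* map = NULL;
  if (new_size > 0) {
    map = static_cast<word_t*>(std::malloc(new_size * sizeof(word_t)));
    assert(map != NULL);
    // Only copy when there is an old array: passing NULL to memcpy is
    // undefined behavior even when the byte count is zero.
    if (old_map != NULL) {
      size_t words_to_copy = old_size < new_size ? old_size : new_size;
      std::memcpy(map, old_map, words_to_copy * sizeof(word_t));
    }
    // Clear the newly added tail so no uninitialized words are exposed.
    if (new_size > old_size) {
      std::memset(map + old_size, 0, (new_size - old_size) * sizeof(word_t));
    }
  }
  std::free(old_map);  // free(NULL) is a no-op
  return map;
}
```

With this shape, a first allocation is simply `reallocate_words(NULL, 0, n)`, which never reaches the copy and returns cleared memory.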
Webrev:
http://cr.openjdk.java.net/~ehelin/8187578/00/

Patch:
--- old/src/hotspot/share/utilities/bitMap.cpp 2017-09-15 14:47:21.471113699 +0200
+++ new/src/hotspot/share/utilities/bitMap.cpp 2017-09-15 14:47:21.179112252 +0200
@@ -81,8 +81,10 @@
   if (new_size_in_words > 0) {
     map = allocator.allocate(new_size_in_words);

-    Copy::disjoint_words((HeapWord*)old_map, (HeapWord*) map,
-                         MIN2(old_size_in_words, new_size_in_words));
+    if (old_map != NULL) {
+      Copy::disjoint_words((HeapWord*)old_map, (HeapWord*) map,
+                           MIN2(old_size_in_words, new_size_in_words));
+    }

     if (new_size_in_words > old_size_in_words) {
       clear_range_of_words(map, old_size_in_words, new_size_in_words);

Thanks,
Erik

From erik.osterlund at oracle.com Fri Sep 15 13:55:15 2017
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Fri, 15 Sep 2017 15:55:15 +0200
Subject: RFR: 8187578: BitMap::reallocate should check if old_map is NULL
In-Reply-To:
References:
Message-ID: <59BBDBC3.6000305@oracle.com>

Looks good.

/Erik

On 2017-09-15 15:04, Erik Helin wrote:
> Hi all,
>
> I'm still trying to compile with gcc 7.1.1 and run into another small
> issue. BitMap::reallocate calls Copy::disjoint_words and there is a
> case in Copy::disjoint_words that might result in call to memcpy. The
> problem is that BitMap::reallocate does not check that the "from"
> argument to Copy::disjoint_words differs from NULL, and a call to
> memcpy with a NULL argument is undefined behavior.
> > Webrev: > http://cr.openjdk.java.net/~ehelin/8187578/00/ > > Patch: > --- old/src/hotspot/share/utilities/bitMap.cpp 2017-09-15 > 14:47:21.471113699 +0200 > +++ new/src/hotspot/share/utilities/bitMap.cpp 2017-09-15 > 14:47:21.179112252 +0200 > @@ -81,8 +81,10 @@ > if (new_size_in_words > 0) { > map = allocator.allocate(new_size_in_words); > > - Copy::disjoint_words((HeapWord*)old_map, (HeapWord*) map, > - MIN2(old_size_in_words, new_size_in_words)); > + if (old_map != NULL) { > + Copy::disjoint_words((HeapWord*)old_map, (HeapWord*) map, > + MIN2(old_size_in_words, new_size_in_words)); > + } > > if (new_size_in_words > old_size_in_words) { > clear_range_of_words(map, old_size_in_words, new_size_in_words); > > Thanks, > Erik From erik.osterlund at oracle.com Fri Sep 15 13:56:38 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Fri, 15 Sep 2017 15:56:38 +0200 Subject: RFR: 8187570: Comparison between pointer and char in MethodMatcher::canonicalize In-Reply-To: <7e92ea84-7f81-dc09-0710-c84f218b7495@oracle.com> References: <7e92ea84-7f81-dc09-0710-c84f218b7495@oracle.com> Message-ID: <59BBDC16.80804@oracle.com> Looks good. 
/Erik On 2017-09-15 11:24, Erik Helin wrote: > Hi all, > > when I compiled with gcc 7.1.1 it warned me about the following code: > > bool MethodMatcher::canonicalize(char * line, const char *& error_msg) { > char* colon = strstr(line, "::"); > bool have_colon = (colon != NULL); > if (have_colon) { > // Don't allow multiple '::' > if (colon + 2 != '\0') { > > The problem is that colon is a pointer, so colon + 2 is a pointer, and > then colon + 2 is compared '\0', which is a char :/ > > Anyways, I think Yasumasa also spotted this issue a while back, but I > couldn't find a patch for it, so I quickly whipped one up: > > http://cr.openjdk.java.net/~ehelin/8187570/00/ > > --- old/src/hotspot/share/compiler/methodMatcher.cpp 2017-09-15 > 10:43:49.430656504 +0200 > +++ new/src/hotspot/share/compiler/methodMatcher.cpp 2017-09-15 > 10:43:49.102654877 +0200 > @@ -96,7 +96,7 @@ > bool have_colon = (colon != NULL); > if (have_colon) { > // Don't allow multiple '::' > - if (colon + 2 != '\0') { > + if (colon[2] != '\0') { > if (strstr(colon+2, "::")) { > error_msg = "Method pattern only allows one '::' allowed"; > return false; > > I was a little bit afraid of what would happen if line (the parameter > to MethodMatcher::canonicalize) isn't properly null terminated, so I > checked the callers, and it seems like all callers property null > terminate the `line` argument. > > Yasumasa, you want to be author and/or contributor of this patch? > > Thanks, > Erik > > PS. First webrev and patch created with a consolidated forest, seems > to be working fine :) From stefan.karlsson at oracle.com Fri Sep 15 15:12:16 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 15 Sep 2017 17:12:16 +0200 Subject: RFR: 8187578: BitMap::reallocate should check if old_map is NULL In-Reply-To: References: Message-ID: <97eda24e-7225-7fc3-1312-6648d83104cf@oracle.com> Looks good. 
StefanK On 2017-09-15 15:04, Erik Helin wrote: > Hi all, > > I'm still trying to compile with gcc 7.1.1 and run into another small > issue. BitMap::reallocate calls Copy::disjoint_words and there is a > case in Copy::disjoint_words that might result in call to memcpy. The > problem is that BitMap::reallocate does not check that the "from" > argument to Copy::disjoint_words differs from NULL, and a call to > memcpy with a NULL argument is undefined behavior. > > Webrev: > http://cr.openjdk.java.net/~ehelin/8187578/00/ > > Patch: > --- old/src/hotspot/share/utilities/bitMap.cpp??? 2017-09-15 > 14:47:21.471113699 +0200 > +++ new/src/hotspot/share/utilities/bitMap.cpp??? 2017-09-15 > 14:47:21.179112252 +0200 > @@ -81,8 +81,10 @@ > ?? if (new_size_in_words > 0) { > ???? map = allocator.allocate(new_size_in_words); > > -??? Copy::disjoint_words((HeapWord*)old_map, (HeapWord*) map, > -???????????????????????? MIN2(old_size_in_words, new_size_in_words)); > +??? if (old_map != NULL) { > +????? Copy::disjoint_words((HeapWord*)old_map, (HeapWord*) map, > +?????????????????????????? MIN2(old_size_in_words, new_size_in_words)); > +??? } > > ???? if (new_size_in_words > old_size_in_words) { > ?????? clear_range_of_words(map, old_size_in_words, new_size_in_words); > > Thanks, > Erik From chris.plummer at oracle.com Fri Sep 15 18:15:18 2017 From: chris.plummer at oracle.com (Chris Plummer) Date: Fri, 15 Sep 2017 11:15:18 -0700 Subject: Question regarding "native-compiled frame" In-Reply-To: References: Message-ID: <6cc77474-c4eb-f1a7-9412-04addfed2811@oracle.com> On 9/13/17 6:10 AM, Felix Yang wrote: > On 7 September 2017 at 19:04, Andrew Dinn wrote: > >> On 07/09/17 11:35, Felix Yang wrote: >>> Thanks for the reply. >>> Then when will the last return statement of frame::sender got a chance >> to >>> be executed? >>> As I see it, when JVM does something in safepoint state and need to >>> traverse Java thread stack, we never calculate the sender of a >>> native-compiled frame. 
>> It is possible for Java to call out into the JVM and then for the JVM to
>> call back into Java. For example, when a class is loaded the JVM calls
>> into Java to run the class initializer. This re-entry may happen
>> multiple times.
>>
>> In that case a stack walk under the re-entry may find a Java start frame
>> and its parent frame will be the native frame where Java entered the VM.
>>
> Yes, that's the frame structure.
>
>
>> Note that the native frame will always be returned by the call to
>> sender_for_entry_frame(map). That method skips all the C frames between
>> the Java entry frame and the native frame which exited Java.
>>
> True. And this is handled by the three IF statements of frame::sender
> function.
> It seems to me that the last return statement is not involved in the stack
> walking process, is it?

Hi Felix,

Are you talking about the following:

  // Must be native-compiled frame, i.e. the marshaling code for native
  // methods that exists in the core system.
  return frame(sender_sp(), link(), sender_pc());

If so, the Oracle arm port used to have this same code. However, 2-3
years ago I was doing quite a few fixes to arm stack/frame walking, and
suspected this code was never executed. In 9 I changed to:

  assert(false, "should not be called for a C frame");
  return frame();

We've never hit this assert in all our testing since. Possibly this
change could also be made to x86. However, the details of why I thought
this change was ok have escaped me.

cheers,

Chris

>
> Thanks for your help,
> Felix

From glaubitz at physik.fu-berlin.de Fri Sep 15 21:15:57 2017
From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz)
Date: Fri, 15 Sep 2017 23:15:57 +0200
Subject: [RFR]: 8187590: Zero runtime can lock-up on linux-alpha
Message-ID: <82f35edb-ec8a-180c-3bdc-81fc34ef647c@physik.fu-berlin.de>

Hi!

Please review this change [1] which fixes random lockups of the Zero runtime on linux-alpha.
This was discovered when building OpenJDK on the Debian automatic package builders for Alpha, particularly on SMP machines. It was observed that the issue could be fixed by installing a uni-processor kernel. After some testing, we discovered that the proper fix is to use __sync_synchronize() even for light memory barriers. Thanks, Adrian > [1] http://cr.openjdk.java.net/~glaubitz/8187590/webrev.02/ -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaubitz at debian.org `. `' Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 From felix.yang at linaro.org Sat Sep 16 09:43:42 2017 From: felix.yang at linaro.org (Felix Yang) Date: Sat, 16 Sep 2017 17:43:42 +0800 Subject: Question regarding "native-compiled frame" In-Reply-To: <6cc77474-c4eb-f1a7-9412-04addfed2811@oracle.com> References: <6cc77474-c4eb-f1a7-9412-04addfed2811@oracle.com> Message-ID: On 16 September 2017 at 02:15, Chris Plummer wrote: > On 9/13/17 6:10 AM, Felix Yang wrote: > >> On 7 September 2017 at 19:04, Andrew Dinn wrote: >> >> On 07/09/17 11:35, Felix Yang wrote: >>> >>>> Thanks for the reply. >>>> Then when will the last return statement of frame::sender got a >>>> chance >>>> >>> to >>> >>>> be executed? >>>> As I see it, when JVM does something in safepoint state and need to >>>> traverse Java thread stack, we never calculate the sender of a >>>> native-compiled frame. >>>> >>> It is possible for Java to call out into the JVM and then for the JVM to >>> call back into Java. For example, when a class is loaded the JVM calls >>> into Java to run the class initializer. This re-entry may happen >>> multiple times. >>> >>> In that case a stack walk under the re-entry may find a Java start fame >>> and it's parent frame will be the native frame where Java entered the VM. >>> >>> Yes, that's the frame structure. >> >> >> Note that the native frame will always be returned by the call to >>> sender_for_entry_frame(map). 
That method skips all the C frames between >>> the Java entry frame and the native frame which exited Java. >>> >>> True. And this is handled by the three IF statements of frame::sender >> function. >> It seems to me that the last return statement is not involved in the stack >> walking process, isn't it? >> > Hi Felix, > > Are you talking about the following: > > // Must be native-compiled frame, i.e. the marshaling code for native > // methods that exists in the core system. > return frame(sender_sp(), link(), sender_pc()); > Yes, I want to know which kind of frame we are handling here. > > If so, the Oracle arm port used to have this same code. However, 2-3 years > ago I was doing quite a few fixes to arm stack/frame walking, and suspected > this code was never executed. In 9 I changed to: > > assert(false, "should not be called for a C frame"); > return frame(); > > We've never hit this assert in all our testing since. Possibly this change > could also be made to x86. However, the details of why I thought this > change was ok has escaped me. Thanks for this helpful information. I performed JTreg and specjbb test and I find this code never hit on x86 platform. So I also think of this return statement for x86 as dead code. Don't know why the code is there. Thanks for your help, Felix From david.holmes at oracle.com Sun Sep 17 21:31:40 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 18 Sep 2017 07:31:40 +1000 Subject: RFR: 8187578: BitMap::reallocate should check if old_map is NULL In-Reply-To: References: Message-ID: <852974e7-dc0b-04fa-a01b-c8e98d54039c@oracle.com> Hi Erik, On 15/09/2017 11:04 PM, Erik Helin wrote: > Hi all, > > I'm still trying to compile with gcc 7.1.1 and run into another small > issue. BitMap::reallocate calls Copy::disjoint_words and there is a case > in Copy::disjoint_words that might result in call to memcpy. 
The problem > is that BitMap::reallocate does not check that the "from" argument to > Copy::disjoint_words differs from NULL, and a call to memcpy with a NULL > argument is undefined behavior. Shouldn't this whole function be a no-op if old_map is NULL? Wouldn't calling it with NULL be a programming error that we should be checking via an assert? Thanks, David > Webrev: > http://cr.openjdk.java.net/~ehelin/8187578/00/ > > Patch: > --- old/src/hotspot/share/utilities/bitMap.cpp??? 2017-09-15 > 14:47:21.471113699 +0200 > +++ new/src/hotspot/share/utilities/bitMap.cpp??? 2017-09-15 > 14:47:21.179112252 +0200 > @@ -81,8 +81,10 @@ > ?? if (new_size_in_words > 0) { > ???? map = allocator.allocate(new_size_in_words); > > -??? Copy::disjoint_words((HeapWord*)old_map, (HeapWord*) map, > -???????????????????????? MIN2(old_size_in_words, new_size_in_words)); > +??? if (old_map != NULL) { > +????? Copy::disjoint_words((HeapWord*)old_map, (HeapWord*) map, > +?????????????????????????? MIN2(old_size_in_words, new_size_in_words)); > +??? } > > ???? if (new_size_in_words > old_size_in_words) { > ?????? clear_range_of_words(map, old_size_in_words, new_size_in_words); > > Thanks, > Erik From aph at redhat.com Mon Sep 18 08:24:59 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 18 Sep 2017 09:24:59 +0100 Subject: Question regarding "native-compiled frame" In-Reply-To: References: <6cc77474-c4eb-f1a7-9412-04addfed2811@oracle.com> Message-ID: On 16/09/17 10:43, Felix Yang wrote: > Thanks for this helpful information. > I performed JTreg and specjbb test and I find this code never hit on x86 > platform. > So I also think of this return statement for x86 as dead code. Don't know > why the code is there. We'll probably never know. We should take the x86 patch and apply it to AArch64 for JDK 10. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From erik.osterlund at oracle.com Mon Sep 18 08:43:35 2017
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 18 Sep 2017 10:43:35 +0200
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <59AE775E.1070503@oracle.com>
References: <59A804F8.9000501@oracle.com>
 <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com>
 <59A92896.9010604@oracle.com>
 <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com>
 <59A93B53.9010505@oracle.com>
 <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com>
 <59ACFD76.3000606@oracle.com>
 <8af30028-5ab1-5623-21c7-f6e2ce2ab8cb@oracle.com>
 <59AE775E.1070503@oracle.com>
Message-ID: <59BF8737.4070804@oracle.com>

Hi,

After some off-list discussions I have made a new version with the
following improvements:

1) Added some comments describing the constraints on the types passed
in to inc/dec (integral or pointer, and pointers are scaled).
2) Removed inc_ptr/dec_ptr and all uses of them. None of these actually
used pointers, only pointer-sized integers. So I thought removing these
overloads and the unnecessary confusion caused by them would make this
change easier to review.
3) Renamed the typedef in the body representing the addend to be called
I instead of T to be consistent with the convention Kim introduced.
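
The type constraint in point 1 (integral or pointer, with pointers scaled) can be illustrated with a small sketch. It uses standard `std::atomic` as a stand-in rather than the HotSpot `Atomic` class, and `atomic_inc`/`atomic_dec` are hypothetical names chosen only for this example.

```cpp
#include <atomic>
#include <cstddef>
#include <type_traits>

// Hypothetical helpers, not the HotSpot API.
// Integral destinations change by 1; pointer destinations move by one
// element, i.e. by sizeof(*ptr) bytes, the same as ++ptr / --ptr.
template <typename T>
void atomic_inc(std::atomic<T>* dest) {
  static_assert(std::is_integral<T>::value || std::is_pointer<T>::value,
                "inc is only defined for integral and pointer types");
  dest->fetch_add(1);
}

template <typename T>
void atomic_dec(std::atomic<T>* dest) {
  static_assert(std::is_integral<T>::value || std::is_pointer<T>::value,
                "dec is only defined for integral and pointer types");
  dest->fetch_sub(1);
}
```

For a `std::atomic<long*>`, one call to `atomic_inc` advances the stored pointer to the next `long`, not to the next byte, which is the scaling the comments in the new version are meant to document.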
Full webrev: http://cr.openjdk.java.net/~eosterlund/8186838/webrev.02/ Incremental webrev: http://cr.openjdk.java.net/~eosterlund/8186838/webrev.01_02/ Thanks, /Erik On 2017-09-05 12:07, Erik ?sterlund wrote: > Hi David, > > On 2017-09-04 23:59, David Holmes wrote: >> Hi Erik, >> >> On 4/09/2017 5:15 PM, Erik ?sterlund wrote: >>> Hi David, >>> >>> On 2017-09-04 03:24, David Holmes wrote: >>>> Hi Erik, >>>> >>>> On 1/09/2017 8:49 PM, Erik ?sterlund wrote: >>>>> Hi David, >>>>> >>>>> The shared structure for all operations is the following: >>>>> >>>>> An Atomic::something call creates a SomethingImpl function object >>>>> that performs some basic type checking and then forwards the call >>>>> straight to a PlatformSomething function object. This >>>>> PlatformSomething object could decide to do anything. But to make >>>>> life easier, it may inherit from a shared SomethingHelper function >>>>> object with CRTP that calls back into the PlatformSomething >>>>> function object to emit inline assembly. >>>> >>>> Right, but! Lets look at some details. >>>> >>>> Atomic::add >>>> AddImpl >>>> PlatformAdd >>>> FetchAndAdd >>>> AddAndFetch >>>> add_using_helper >>>> >>>> Atomic::cmpxchg >>>> CmpxchgImpl >>>> PlatformCmpxchg >>>> cmpxchg_using_helper >>>> >>>> Atomic::inc >>>> IncImpl >>>> PlatformInc >>>> IncUsingConstant >>>> >>>> Why is it that the simplest operation (inc/dec) has the most >>>> complex platform template definition? Why do we need Adjustment? >>>> You previously said "Adjustment represents the increment/decrement >>>> value as an IntegralConstant - your template friend for passing >>>> around a constant with both a specified type and value in >>>> templates". But add passes around values and doesn't need this. >>>> Further inc/dec don't need to pass anything around anywhere - inc >>>> adds 1, dec subtracts 1! 
This "1" does not need to appear anywhere >>>> in the API or get passed across layers - the only place this "1" >>>> becomes evident is in the actual platform asm that does the logic >>>> of "add 1" or "subtract 1". >>>> >>>> My understanding from previous discussions is that much of the >>>> template machinations was to deal with type management for "dest" >>>> and the values being passed around. But here, for inc/dec there are >>>> no values being passed so we don't have to make "dest" >>>> type-compatible with any value. >>> >>> Dealing with different types being passed in is one part of the >>> problem - a problem that almost all operations seems to have. But >>> Atomic::add and inc/dec have more problems to deal with. >>> >>> The Atomic::add operation has two more problems that cmpxchg does >>> not have. >>> 1) It needs to scale pointer arithmetic. So if you have a P* and you >>> add it by 2, then you really add the underlying value by 2 * >>> sizeof(P), and the scaled addend needs to be of the right type - the >>> type of the destination for integral types and ptrdiff_t for >>> pointers. This is similar semantics to ++pointer. >> >> I'll address this below - but yes I overlooked this aspect. >> >>> 2) It connects backends with different semantics - either >>> fetch_and_add or add_and_fetch to a common public interface with >>> add_and_fetch semantics. >> >> Not at all clear why this has to manifest in the upper/middle layers >> instead of being handled by the actual lowest-layer ?? > > It could have been addressed in the lowest layer indeed. I suppose Kim > found it nicer to do that on a higher level while you find it nicer to > do it on a lower level. I have no opinion here. > >> >>> This is the reason that Atomic::add might appear more complicated >>> than Atomic::cmpxchg. Because Atomic::cmpxchg only had the different >>> type problems to deal with - no pointer arithmetics. 
>>> >>> The reason why Atomic::inc/dec looks more complicated than >>> Atomic::add is that it needs to preserve the pointer arithmetic as >>> constants rather than values, because the scaled addend is embedded >>> in the inline assembly as immediate values. Therefore it passes >>> around an IntegralConstant that embeds both the type and size of the >>> addend. And it is not just 1/-1. For integral destinations the >>> constant used is 1/-1 of the type stored at the destination. For >>> pointers the constant is ptrdiff_t with a value representing the >>> size of the element pointed to. >> >> This is insanely complicated (I think that counts as 'accidental >> complexity' per Andrew's comment ;-) ). Pointer arithmetic is a >> basic/fundamental part of C/C++, yet this template stuff has to jump >> through multiple inverted hoops to do something the language "just >> does"! All this complexity to manage a conversion addend -> addend * >> sizeof(*dest) ?? > > Okay. > >> And the fact that inc/dec are simpler than add, yet result in far >> more complicated templates because the simpler addend is a constant, >> is just as unfathomable to me! > > My latest proposal is to nuke the Atomic::inc/dec specializations and > make it call Atomic::add. Any objections on that? It is arguably > simpler, and then we can leave the complexity discussion behind. > >>> Having said that - I am not opposed to simply removing the >>> specializations of inc/dec if we are scared of the complexity of >>> passing this constant to the platform layer. After running a bunch >>> of benchmarks over the weekend, it showed no significant regressions >>> after removal. Now of course that might not tell the full story - it >>> could have missed that some critical operation in the JVM takes >>> longer. But I would be very surprised if that was the case. >> >> I can imagine we use an "add immediate" form for inc/dec of 1, do we >> actually use that for other values? 
I would expect inc_ptr/dec_ptr to >> always translate to add_ptr, with no special case for when ptr is >> char* and so we only add/sub 1. ?? > > Yes we currently only inc/sub by 1. > > Thanks, > /Erik > >> Thanks, >> David >> >>> Thanks, >>> /Erik >>> >>>> >>>> Cheers, >>>> David >>>> ----- >>>> >>>>> Hope this explanation helps understanding the intended structure >>>>> of this work. >>>>> >>>>> Thanks, >>>>> /Erik >>>>> >>>>> On 2017-09-01 12:34, David Holmes wrote: >>>>>> Hi Erik, >>>>>> >>>>>> I just wanted to add that I would expect the cmpxchg, add and >>>>>> inc, Atomic API's to all require similar basic structure for >>>>>> manipulating types/values etc, yet all three seem to have quite >>>>>> different structures that I find very confusing. I'm still at a >>>>>> loss to fathom the CRTP and the hoops we seemingly have to jump >>>>>> through just to add or subtract 1!!! >>>>>> >>>>>> Cheers, >>>>>> David >>>>>> >>>>>> On 1/09/2017 7:29 PM, Erik ?sterlund wrote: >>>>>>> Hi David, >>>>>>> >>>>>>> On 2017-09-01 02:49, David Holmes wrote: >>>>>>>> Hi Erik, >>>>>>>> >>>>>>>> Sorry but this one is really losing me. >>>>>>>> >>>>>>>> What is the role of Adjustment ?? >>>>>>> >>>>>>> Adjustment represents the increment/decrement value as an >>>>>>> IntegralConstant - your template friend for passing around a >>>>>>> constant with both a specified type and value in templates. The >>>>>>> type of the increment/decrement is the type of the destination >>>>>>> when the destination is an integral type, otherwise if it is a >>>>>>> pointer type, the increment/decrement type is ptrdiff_t. >>>>>>> >>>>>>>> How are inc/dec anything but "using constant" ?? >>>>>>> >>>>>>> I was also a bit torn on that name (I assume you are referring >>>>>>> to IncUsingConstant/DecUsingConstant). It was hard to find a >>>>>>> name that depicted what this platform helper does. 
I considered >>>>>>> calling the helper something with immediate in the name because >>>>>>> it is really used to embed the constant as immediate values in >>>>>>> inline assembly today. But then again that seemed too specific, >>>>>>> as it is not completely obvious platform specializations will >>>>>>> use it in that way. One might just want to specialize this to >>>>>>> send it into some compiler Atomic::inc intrinsic for example. Do >>>>>>> you have any other preferred names? Here are a few possible >>>>>>> names for IncUsingConstant: >>>>>>> >>>>>>> IncUsingScaledConstant >>>>>>> IncUsingAdjustedConstant >>>>>>> IncUsingPlatformHelper >>>>>>> >>>>>>> Any favourites? >>>>>>> >>>>>>>> Why do we special case jshort?? >>>>>>> >>>>>>> To be consistent with the special case of Atomic::add on jshort. >>>>>>> Do you want it removed? >>>>>>> >>>>>>>> This is indecipherable to normal people ;-) >>>>>>>> >>>>>>>> This()->template inc(dest); >>>>>>>> >>>>>>>> For something as trivial as adding or subtracting 1 the >>>>>>>> template machinations here are just mind boggling! >>>>>>> >>>>>>> This uses the CRTP (Curiously Recurring Template Pattern) C++ >>>>>>> idiom. The idea is to devirtualize a virtual call by passing in >>>>>>> the derived type as a template parameter to a base class, and >>>>>>> then let the base class static_cast to the derived class to >>>>>>> devirtualize the call. I hope this explanation sheds some light >>>>>>> on what is going on. The same CRTP idiom was used in the >>>>>>> Atomic::add implementation in a similar fashion. >>>>>>> >>>>>>> I will add some comments describing this in the next round after >>>>>>> Coleen replies. >>>>>>> >>>>>>> Thanks for looking at this. 
>>>>>>> >>>>>>> /Erik >>>>>>> >>>>>>>> >>>>>>>> Cheers, >>>>>>>> David >>>>>>>> >>>>>>>> On 31/08/2017 10:45 PM, Erik ?sterlund wrote: >>>>>>>>> Hi everyone, >>>>>>>>> >>>>>>>>> Bug ID: >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8186838 >>>>>>>>> >>>>>>>>> Webrev: >>>>>>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >>>>>>>>> >>>>>>>>> The time has come for the next step in generalizing Atomic >>>>>>>>> with templates. Today I will focus on Atomic::inc/dec. >>>>>>>>> >>>>>>>>> I have tried to mimic the new Kim style that seems to have >>>>>>>>> been universally accepted. Like Atomic::add and >>>>>>>>> Atomic::cmpxchg, the structure looks like this: >>>>>>>>> >>>>>>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function >>>>>>>>> object that performs some basic type checks. >>>>>>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that >>>>>>>>> can define the operation arbitrarily for a given platform. The >>>>>>>>> default implementation if not specialized for a platform is to >>>>>>>>> call Atomic::add. So only platforms that want to do something >>>>>>>>> different than that as an optimization have to provide a >>>>>>>>> specialization. >>>>>>>>> Layer 3) Platforms that decide to specialize >>>>>>>>> PlatformInc/PlatformDec to be more optimized may inherit from >>>>>>>>> a helper class IncUsingConstant/DecUsingConstant. This helper >>>>>>>>> helps performing the necessary computation what the >>>>>>>>> increment/decrement should be after pointer scaling using >>>>>>>>> CRTP. The PlatformInc/PlatformDec operation then only needs to >>>>>>>>> define an inc/dec member function, and will then get all the >>>>>>>>> context information necessary to generate a more optimized >>>>>>>>> implementation. Easy peasy. 
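
For readers following the three layers quoted above, the CRTP step in layer 3 can be sketched roughly as follows. This is a simplified, non-atomic model and not the code in the webrev: `Constant`, `IncHelper`, `add_raw` and this toy `PlatformInc` are hypothetical names, and a real platform specialization would presumably emit an atomic instruction with the constant as an immediate operand.

```cpp
#include <cstddef>

// Hypothetical stand-in for IntegralConstant: a type carrying a typed constant.
template <typename T, T v>
struct Constant {
  static const T value = v;
};

// Hypothetical CRTP helper. It computes the adjustment as a compile-time
// constant and calls back into Derived without any virtual dispatch.
template <typename Derived>
struct IncHelper {
  // Integral destination: the adjustment is 1, typed like the destination.
  template <typename I>
  void operator()(I volatile* dest) const {
    typedef Constant<I, 1> Adjustment;
    static_cast<const Derived*>(this)->template inc<I, Adjustment>(dest);
  }

  // Pointer destination: the adjustment is sizeof(P) as a ptrdiff_t,
  // so the pointer advances by one element, like ++ptr.
  template <typename P>
  void operator()(P* volatile* dest) const {
    typedef Constant<std::ptrdiff_t,
                     static_cast<std::ptrdiff_t>(sizeof(P))> Adjustment;
    static_cast<const Derived*>(this)->template inc<P*, Adjustment>(dest);
  }
};

// Toy "platform" layer. NOT atomic: a plain read-modify-write is used so
// the example stays portable; only the dispatch structure is the point.
struct PlatformInc : IncHelper<PlatformInc> {
  template <typename T, typename Adjustment>
  void inc(T volatile* dest) const {
    *dest = add_raw(*dest, Adjustment::value);
  }

 private:
  // Integral case: ordinary addition.
  template <typename I>
  static I add_raw(I value, I adjustment) {
    return value + adjustment;
  }
  // Pointer case: the adjustment is already scaled, so add it in bytes.
  template <typename P>
  static P* add_raw(P* value, std::ptrdiff_t adjustment) {
    return reinterpret_cast<P*>(reinterpret_cast<char*>(value) + adjustment);
  }
};
```

The `->template inc<...>` spelling is needed because `inc` is a member template of a type that depends on `Derived`; without the `template` keyword the `<` would be parsed as a less-than operator.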
>>>>>>>>> >>>>>>>>> It is worth noticing that the generalized Atomic::dec >>>>>>>>> operation assumes a two's complement integer machine and >>>>>>>>> potentially sends the unary negative of a potentially unsigned >>>>>>>>> type to Atomic::add. I have the following comments about this: >>>>>>>>> 1) We already assume in other code that two's complement >>>>>>>>> integers must be present. >>>>>>>>> 2) A machine that does not have two's complement integers may >>>>>>>>> still simply provide a specialization that solves the problem >>>>>>>>> in a different way. >>>>>>>>> 3) The alternative that does not make assumptions about that >>>>>>>>> would use the good old IntegerTypes::cast_to_signed >>>>>>>>> metaprogramming stuff, and I seem to recall we thought that >>>>>>>>> was a bit too involved and complicated. >>>>>>>>> This is the reason why I have chosen to use unary minus on the >>>>>>>>> potentially unsigned type in the shared helper code that sends >>>>>>>>> the decrement as an addend to Atomic::add. >>>>>>>>> >>>>>>>>> It would also be nice if somebody with access to PPC and s390 >>>>>>>>> machines could try out the relevant changes there so I do not >>>>>>>>> accidentally break those platforms. I have blind-coded the >>>>>>>>> addition of the immediate values passed in to the inline >>>>>>>>> assembly in a way that I think looks like it should work. 
>>>>>>>>> >>>>>>>>> Testing: >>>>>>>>> RBT hs-tier3, JPRT --testset hotspot >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> /Erik >>>>>>> >>>>> >>> > From stefan.karlsson at oracle.com Mon Sep 18 08:48:11 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Mon, 18 Sep 2017 01:48:11 -0700 (PDT) Subject: RFR: 8187578: BitMap::reallocate should check if old_map is NULL In-Reply-To: <852974e7-dc0b-04fa-a01b-c8e98d54039c@oracle.com> References: <852974e7-dc0b-04fa-a01b-c8e98d54039c@oracle.com> Message-ID: Hi David, On 2017-09-17 23:31, David Holmes wrote: > Hi Erik, > > On 15/09/2017 11:04 PM, Erik Helin wrote: >> Hi all, >> >> I'm still trying to compile with gcc 7.1.1 and run into another small >> issue. BitMap::reallocate calls Copy::disjoint_words and there is a >> case in Copy::disjoint_words that might result in a call to memcpy. The >> problem is that BitMap::reallocate does not check that the "from" >> argument to Copy::disjoint_words differs from NULL, and a call to >> memcpy with a NULL argument is undefined behavior. > > Shouldn't this whole function be a no-op if old_map is NULL? Wouldn't > calling it with NULL be a programming error that we should be checking > via an assert? When this function was written the intent was to give it semantics similar to those of realloc. Hence, old_map == NULL is supposed to be a valid input to reallocate. This is explicitly used in the following function: template <class Allocator> bm_word_t* BitMap::allocate(const Allocator& allocator, idx_t size_in_bits, bool clear) { // Reuse reallocate to ensure that the new memory is cleared.
return reallocate(allocator, NULL, 0, size_in_bits, clear); } Thanks, StefanK > > Thanks, > David > >> Webrev: >> http://cr.openjdk.java.net/~ehelin/8187578/00/ >> >> Patch: >> --- old/src/hotspot/share/utilities/bitMap.cpp 2017-09-15 >> 14:47:21.471113699 +0200 >> +++ new/src/hotspot/share/utilities/bitMap.cpp 2017-09-15 >> 14:47:21.179112252 +0200 >> @@ -81,8 +81,10 @@ >> if (new_size_in_words > 0) { >> map = allocator.allocate(new_size_in_words); >> >> - Copy::disjoint_words((HeapWord*)old_map, (HeapWord*) map, >> - MIN2(old_size_in_words, new_size_in_words)); >> + if (old_map != NULL) { >> + Copy::disjoint_words((HeapWord*)old_map, (HeapWord*) map, >> + MIN2(old_size_in_words, new_size_in_words)); >> + } >> >> if (new_size_in_words > old_size_in_words) { >> clear_range_of_words(map, old_size_in_words, new_size_in_words); >> >> Thanks, >> Erik From kim.barrett at oracle.com Mon Sep 18 11:29:13 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 18 Sep 2017 07:29:13 -0400 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59BF8737.4070804@oracle.com> References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> <59A93B53.9010505@oracle.com> <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com> <59ACFD76.3000606@oracle.com> <8af30028-5ab1-5623-21c7-f6e2ce2ab8cb@oracle.com> <59AE775E.1070503@oracle.com> <59BF8737.4070804@oracle.com> Message-ID: > On Sep 18, 2017, at 4:43 AM, Erik Österlund wrote: > > Hi, > > After some off-list discussions I have made a new version with the following improvements: > > 1) Added some comments describing the constraints on the types passed in to inc/dec (integral or pointer, and pointers are scaled). > 2) Removed inc_ptr/dec_ptr and all uses of it. None of these actually used pointers, only pointer sized integers.
So I thought removing these overloads and the unnecessary confusion caused by them would make it easier to review this change. > 3) Renamed the typedef in the body representing the addend to be called I instead of T to be consistent with the convention Kim introduced. > > Full webrev: > http://cr.openjdk.java.net/~eosterlund/8186838/webrev.02/ > > Incremental webrev: > http://cr.openjdk.java.net/~eosterlund/8186838/webrev.01_02/ In the descriptions of inc and dec: - "inc*()" => "inc()" and "dec*()" => "dec()", as the _ptr variants are now gone. - "size of the type of the pointer" might be more clear as "size of the pointed-to type", or perhaps "the pointee type". Otherwise, looks good. I don't need another webrev for the above comment changes. From erik.osterlund at oracle.com Mon Sep 18 11:34:42 2017 From: erik.osterlund at oracle.com (Erik Österlund) Date: Mon, 18 Sep 2017 13:34:42 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> <59A93B53.9010505@oracle.com> <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com> <59ACFD76.3000606@oracle.com> <8af30028-5ab1-5623-21c7-f6e2ce2ab8cb@oracle.com> <59AE775E.1070503@oracle.com> <59BF8737.4070804@oracle.com> Message-ID: <59BFAF52.1090801@oracle.com> Hi Kim, Thanks for the review. /Erik On 2017-09-18 13:29, Kim Barrett wrote: >> On Sep 18, 2017, at 4:43 AM, Erik Österlund wrote: >> >> Hi, >> >> After some off-list discussions I have made a new version with the following improvements: >> >> 1) Added some comments describing the constraints on the types passed in to inc/dec (integral or pointer, and pointers are scaled). >> 2) Removed inc_ptr/dec_ptr and all uses of it. None of these actually used pointers, only pointer sized integers.
So I thought removing these overloads and the unnecessary confusion caused by them would make it easier to review this change. >> 3) Renamed the typedef in the body representing the addend to be called I instead of T to be consistent with the convention Kim introduced. >> >> Full webrev: >> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.02/ >> >> Incremental webrev: >> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.01_02/ > In the descriptions of inc and dec: > > - "inc*()" => "inc()" and "dec*()" => "dec()", as the _ptr variants are now gone. > > - "size of the type of the pointer" might be more clear as "size of the pointed-to type", or perhaps "the pointee type". > > Otherwise, looks good. I don't need another webrev for the above comment changes. > From felix.yang at linaro.org Mon Sep 18 14:01:09 2017 From: felix.yang at linaro.org (Felix Yang) Date: Mon, 18 Sep 2017 22:01:09 +0800 Subject: Question regarding "native-compiled frame" In-Reply-To: References: <6cc77474-c4eb-f1a7-9412-04addfed2811@oracle.com> Message-ID: Yes, the AArch64 port is doing the same thing as the x86 port. I noticed this issue when I was trying to analyze a recent JVM crash which was triggered on the x86-64 platform with an official Oracle jdk8u121 release. It turns out that this bug happens very rarely (triggered only once for my Java workload and I am unable to reproduce it). And I think it is the same as: https://bugs.openjdk.java.net/browse/JDK-8146224 which is still pending there. Looking at the assembly instructions in the hs error log file, I find the SIGSEGV is triggered by the last return statement of the frame::sender function. For now, I have disabled the FindDeadLocks VM_Operation in my Java application to lower the likelihood of hitting the bug. But I am afraid that it is something of a ticking bomb for us.
Thanks for your help, Felix On 18 September 2017 at 16:24, Andrew Haley wrote: > On 16/09/17 10:43, Felix Yang wrote: > > Thanks for this helpful information. > > I performed JTreg and specjbb test and I find this code never hit on x86 > > platform. > > So I also think of this return statement for x86 as dead code. Don't know > > why the code is there. > > We'll probably never know. We should take the x86 patch and apply it to > AArch64 > for JDK 10. > > -- > Andrew Haley > Java Platform Lead Engineer > Red Hat UK Ltd. > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > From aph at redhat.com Mon Sep 18 14:05:29 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 18 Sep 2017 15:05:29 +0100 Subject: Question regarding "native-compiled frame" In-Reply-To: References: <6cc77474-c4eb-f1a7-9412-04addfed2811@oracle.com> Message-ID: <4e56f7b2-3d3c-c0b3-b101-1b27e5108502@redhat.com> On 18/09/17 15:01, Felix Yang wrote: > Currently, I disabled the FindDeadLocks VM_Operation in my Java application > trying to lower the possibility of hitting the bug. But I am afraid that it > is something of a ticking bomb for us. It's clearly stack corruption of some kind. It might even be a bug in the host C++ compiler. We'll never know until we can reproduce it. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rwestrel at redhat.com Mon Sep 18 15:29:09 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 18 Sep 2017 17:29:09 +0200 Subject: JDK10/RFR(XXS): 8011352: C1: TraceCodeBlobStacks crashes fastdebug solaris sparc In-Reply-To: <6821786d-3c0b-9219-0975-5d7c4e4e8e79@oracle.com> References: <6821786d-3c0b-9219-0975-5d7c4e4e8e79@oracle.com> Message-ID: > Patch below: > > -----8<----- > > --- old/src/cpu/sparc/vm/frame_sparc.cpp Wed Mar 22 16:47:13 2017 > +++ new/src/cpu/sparc/vm/frame_sparc.cpp Wed Mar 22 16:47:12 2017 > @@ -123,8 +123,8 @@ > reg = regname->as_Register(); > } > if (reg->is_out()) { > - assert(_younger_window != NULL, "Younger window should be available"); > - return second_word + (address)&_younger_window[reg->after_save()->sp_offset_in_saved_window()]; > + return _younger_window == NULL ? NULL : > + second_word + (address)&_younger_window[reg->after_save()->sp_offset_in_saved_window()]; > } > if (reg->is_local() || reg->is_in()) { > assert(_window != NULL, "Window should be available"); That looks reasonable to me. Roland. From aph at redhat.com Mon Sep 18 15:38:08 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 18 Sep 2017 16:38:08 +0100 Subject: [RFR]: 8187590: Zero runtime can lock-up on linux-alpha In-Reply-To: <82f35edb-ec8a-180c-3bdc-81fc34ef647c@physik.fu-berlin.de> References: <82f35edb-ec8a-180c-3bdc-81fc34ef647c@physik.fu-berlin.de> Message-ID: <6d7f27a2-60d0-9cc9-9f8e-e21e2877b476@redhat.com> On 15/09/17 22:15, John Paul Adrian Glaubitz wrote: > Please review this change [1] which fixes random lockups of the Zero runtime > on linux-alpha. This was discovered when building OpenJDK on the Debian > automatic package builders for Alpha, particularly on SMP machines. > > It was observed that the issue could be fixed by installing a uni-processor > kernel. After some testing, we discovered that the proper fix is to use > __sync_synchronize() even for light memory barriers. 
That looks right, thanks. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From zgu at redhat.com Mon Sep 18 19:17:23 2017 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 18 Sep 2017 15:17:23 -0400 Subject: RFR(XXS) 8187629: NMT: Memory miscounting in compiler (C2) Message-ID: <8987f8cd-9eee-bb86-13a6-3623ab093668@redhat.com> Compiler (C2) uses ResourceArea instead of Arena in some circumstances, so it can take advantage of ResourceMark. However, ResourceArea is tagged as mtThread, which results in that memory being miscounted by NMT. Bug: https://bugs.openjdk.java.net/browse/JDK-8187629 Webrev: http://cr.openjdk.java.net/~zgu/8187629/webrev.00/ Test: hotspot_tier1 (fastdebug and release) on Linux x64 Thanks, -Zhengyu From adinn at redhat.com Tue Sep 19 08:09:33 2017 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 19 Sep 2017 09:09:33 +0100 Subject: RFR(XXS) 8187629: NMT: Memory miscounting in compiler (C2) In-Reply-To: <8987f8cd-9eee-bb86-13a6-3623ab093668@redhat.com> References: <8987f8cd-9eee-bb86-13a6-3623ab093668@redhat.com> Message-ID: <57037ca9-2e2d-4604-98ee-6d1e654e8573@redhat.com> On 18/09/17 20:17, Zhengyu Gu wrote: > Compiler (C2) uses ResourceArea instead of Arena in some circumstances, > so it can take advantage of ResourceMark. However, ResourceArea is > tagged as mtThread, which results in that memory being miscounted by NMT. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8187629 > Webrev: http://cr.openjdk.java.net/~zgu/8187629/webrev.00/ > > > Test: > > hotspot_tier1 (fastdebug and release) on Linux x64 Changes look good to me. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No.
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From david.holmes at oracle.com Tue Sep 19 10:46:08 2017 From: david.holmes at oracle.com (David Holmes) Date: Tue, 19 Sep 2017 20:46:08 +1000 Subject: OpenJDK OOM issue - In-Reply-To: References: Message-ID: <8f7f43fd-fbee-fd54-21ab-d43e24e8a694@oracle.com> Hi Tim, On 19/09/2017 6:50 PM, Yu, Tim (NSB - CN/Chengdu) wrote: > Hi OpenJDK dev group This is better discussed on hotspot-dev, so redirecting there. > We hit an issue where the VM failed to initialize. The error log is as below. We checked both memory usage and thread number. They do not hit the limit. So could you please help to confirm why the "java.lang.OutOfMemoryError: unable to create new native thread" error occurs? Many thanks. Unfortunately there is no way to tell. As you indicate there appears to be enough memory, and there appear to be enough threads/processes available for creation. Does this happen regularly or was this a one-off failure? You really need to see the exact state of the machine at the time this happened. David > " > on Sep 18 11:05:04 EEST 2017 2 or first INFO log missing: Error occurred during initialization of VM > java.lang.OutOfMemoryError: unable to create new native thread > Error occurred during initialization of VM > java.lang.OutOfMemoryError: unable to create new native thread > > 1. Memory Usage > MemFree: 898332 kB > From the core file below, generated during the OOM, it can be seen that about 900M of physical memory was available at that time.
> > > 2 Thread number > > sh-4.1$ ps -eLf|wc -l > 5326 > > > sh-4.1$ ulimit -a > core file size (blocks, -c) 0 > data seg size (kbytes, -d) unlimited > scheduling priority (-e) 0 > file size (blocks, -f) unlimited > pending signals (-i) 43497 > max locked memory (kbytes, -l) 64 > max memory size (kbytes, -m) unlimited > open files (-n) 1024 > pipe size (512 bytes, -p) 8 > POSIX message queues (bytes, -q) 819200 > real-time priority (-r) 0 > stack size (kbytes, -s) 10240 > cpu time (seconds, -t) unlimited > max user processes (-u) 43497 > virtual memory (kbytes, -v) unlimited > file locks (-x) unlimited > > Br, > Tim > > > > From zgu at redhat.com Tue Sep 19 12:06:21 2017 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 19 Sep 2017 08:06:21 -0400 Subject: RFR(XXS) 8187629: NMT: Memory miscounting in compiler (C2) In-Reply-To: <57037ca9-2e2d-4604-98ee-6d1e654e8573@redhat.com> References: <8987f8cd-9eee-bb86-13a6-3623ab093668@redhat.com> <57037ca9-2e2d-4604-98ee-6d1e654e8573@redhat.com> Message-ID: <4f1b1ff9-498c-dd55-f85f-f54bbe3e4ec3@redhat.com> Thanks for the quick review, Andrew. -Zhengyu On 09/19/2017 04:09 AM, Andrew Dinn wrote: > On 18/09/17 20:17, Zhengyu Gu wrote: >> Compiler (C2) uses ResourceArea instead of Arena in some circumstances, >> so it can take advantage of ResourceMark. However, ResourceArea is >> tagged as mtThread, that results those memory is miscounted by NMT >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8187629 >> Webrev: http://cr.openjdk.java.net/~zgu/8187629/webrev.00/ >> >> >> Test: >> >> hotspot_tier1 (fastdebug and release) on Linux x64 > Changes look good to me. > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 
03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander > From erik.helin at oracle.com Tue Sep 19 12:42:49 2017 From: erik.helin at oracle.com (Erik Helin) Date: Tue, 19 Sep 2017 14:42:49 +0200 Subject: RFR: 8187667: Disable deprecation warning for readdir_r Message-ID: <5081712c-9c62-5b6a-2e43-9b8d6e3ca64a@oracle.com> Hi all, I'm continuing to run into some small problems when compiling HotSpot with a more recent toolchain. It seems like readdir_r [0] has been deprecated beginning with glibc 2.24 [1]. In HotSpot, we use readdir_r for os::readdir on Linux (defined in os_linux.inline.hpp). Since readdir_r most likely will stay around for a long time in glibc (even though in deprecated form), I figured it was best to just silence the deprecation warning from gcc. If readdir_r finally is removed one day, then we might have to look up the appropriate readdir function using dlopen, dlsym etc. Patch: http://cr.openjdk.java.net/~ehelin/8187667/00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8187667 Testing: - Compiles with: - gcc 7.1.1 and glibc 2.25 on Fedora 26 - gcc 4.9.2 and glibc 2.12 on OEL 6.4 - JPRT Thanks, Erik [0]: http://pubs.opengroup.org/onlinepubs/009695399/functions/readdir.html [1]: https://sourceware.org/bugzilla/show_bug.cgi?id=19056 From poonam.bajaj at oracle.com Tue Sep 19 13:08:57 2017 From: poonam.bajaj at oracle.com (Poonam Parhar) Date: Tue, 19 Sep 2017 06:08:57 -0700 (PDT) Subject: OpenJDK OOM issue - In-Reply-To: <8f7f43fd-fbee-fd54-21ab-d43e24e8a694@oracle.com> References: <8f7f43fd-fbee-fd54-21ab-d43e24e8a694@oracle.com> Message-ID: Hello Tim, With CompressedOops enabled (which is enabled by default with a 64-bit JVM), the Java heap may get placed in the lower virtual address space, leaving very little space for the native heap allocations, and that can cause these kinds of failures even when there is a lot of native memory available.
Please read details here: https://blogs.oracle.com/poonam/running-on-a-64bit-platform-and-still-running-out-of-memory You can check the output collected with -XX:+PrintGCDetails that would show where your Java Heap is being based at. Thanks, Poonam > -----Original Message----- > From: David Holmes > Sent: Tuesday, September 19, 2017 3:46 AM > To: Yu, Tim (NSB - CN/Chengdu) > Cc: jdk8-dev at openjdk.java.net; jdk8u-dev at openjdk.java.net; hotspot-dev > developers; Shen, David (NSB - CN/Chengdu) > Subject: Re: OpenJDK OOM issue - > > Hi Tim, > > On 19/09/2017 6:50 PM, Yu, Tim (NSB - CN/Chengdu) wrote: > > Hi OpenJDK dev group > > This is better discussed on hotspot-dev, so redirecting there. > > > We meet one issue that the VM failed to initialize. The error log is > as below. We checked both memory usage and thread number. They do not > hit the limit. So could you please help to confirm why > "java.lang.OutOfMemoryError: unable to create new native thread" error > occurs? Many thanks. > > Unfortunately there is no way to tell. As you indicate there appears to > be enough memory, and there appear to be enough threads/processes > available for creation. > > Does this happen regularly or was this a one-of failure? You really > need to see the exact state of the machine at the time this happened. > > David > > > " > > on Sep 18 11:05:04 EEST 2017 2 or first INFO log missing: Error > > occurred during initialization of VM > > java.lang.OutOfMemoryError: unable to create new native thread Error > > occurred during initialization of VM > > java.lang.OutOfMemoryError: unable to create new native thread > > > > 1. Memory Usage > > MemFree: 898332 kB > > From below core file generated during OMM, it can be seen about 900M > physical memory available during that time. 
> > > > > > 2 Thread number > > > > sh-4.1$ ps -eLf|wc -l > > 5326 > > > > > > sh-4.1$ ulimit -a > > core file size (blocks, -c) 0 > > data seg size (kbytes, -d) unlimited > > scheduling priority (-e) 0 > > file size (blocks, -f) unlimited > > pending signals (-i) 43497 > > max locked memory (kbytes, -l) 64 > > max memory size (kbytes, -m) unlimited > > open files (-n) 1024 > > pipe size (512 bytes, -p) 8 > > POSIX message queues (bytes, -q) 819200 > > real-time priority (-r) 0 > > stack size (kbytes, -s) 10240 > > cpu time (seconds, -t) unlimited > > max user processes (-u) 43497 > > virtual memory (kbytes, -v) unlimited > > file locks (-x) unlimited > > > > Br, > > Tim > > > > > > > > From erik.helin at oracle.com Tue Sep 19 13:37:59 2017 From: erik.helin at oracle.com (Erik Helin) Date: Tue, 19 Sep 2017 15:37:59 +0200 Subject: RFR: 8187676: Disable harmless uninitialized warnings for two files Message-ID: <2031be3e-2623-dde1-fff2-2d6cd6e41de9@oracle.com> Hi all, with gcc 7.1.1 from Fedora 26 on x86-64 there are warnings about the potential usage of maybe uninitialized memory in src/hotspot/cpu/x86/assembler_x86.cpp and in src/hotspot/cpu/x86/interp_masm_x86.cpp. The problems arises from the class RelocationHolder in src/hotspot/share/code/relocInfo.hpp which has the private fields: enum { _relocbuf_size = 5 }; void* _relocbuf[ _relocbuf_size ]; and the default constructor for RelocationHolder does not initialize the elements of _relocbuf. I _think_ this is an optimization, RelocationHolder is used *a lot* and setting the elements of RelocationHolder::_relocbuf to NULL (or some other value) in the default constructor might result in a performance penalty. 
Have a look in build/linux-x86_64-normal-server-fastdebug/hotspot/variant-server/gensrc/adfiles and you will see that RelocationHolder is used all over the place :) AFAICS all users of RelocationHolder::_relocbuf take care to not use uninitialized memory, which means that this warning is wrong, so I suggest we disable the warning -Wmaybe-uninitialized for src/hotspot/cpu/x86/assembler_x86.cpp. The problem continues because the class Address in src/hotspot/cpu/x86/assembler_x86.hpp has a private field, `RelocationHolder _rspec;` and the default constructor for Address does not initialize _rspec._relocbuf (most likely for performance reasons). The class Address also has a default copy constructor, which will copy all the elements of _rspec._relocbuf, which will result in a read of uninitialized memory. However, this is a benign usage of uninitialized memory, since we take no action based on the content of the uninitialized memory (it is just copied byte for byte). So, in this case too, I suggest we disable the warning -Wuninitialized for src/hotspot/cpu/x86/assembler_x86.hpp. What do you think? 
Patch: http://cr.openjdk.java.net/~ehelin/8187676/00/ --- old/make/hotspot/lib/JvmOverrideFiles.gmk 2017-09-19 15:11:45.036108983 +0200 +++ new/make/hotspot/lib/JvmOverrideFiles.gmk 2017-09-19 15:11:44.692107277 +0200 @@ -32,6 +32,8 @@ ifeq ($(TOOLCHAIN_TYPE), gcc) BUILD_LIBJVM_vmStructs.cpp_CXXFLAGS := -fno-var-tracking-assignments -O0 BUILD_LIBJVM_jvmciCompilerToVM.cpp_CXXFLAGS := -fno-var-tracking-assignments + BUILD_LIBJVM_assembler_x86.cpp_CXXFLAGS := -Wno-maybe-uninitialized + BUILD_LIBJVM_interp_masm_x86.cpp_CXXFLAGS := -Wno-uninitialized endif ifeq ($(OPENJDK_TARGET_OS), linux) Issue: https://bugs.openjdk.java.net/browse/JDK-8187676 Testing: - Compiles with: - gcc 7.1.1 and glibc 2.25 on Fedora 26 - gcc 4.9.2 and glibc 2.12 on OEL 6.4 - JPRT Thanks, Erik From kim.barrett at oracle.com Tue Sep 19 13:55:59 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 19 Sep 2017 09:55:59 -0400 Subject: RFR: 8187676: Disable harmless uninitialized warnings for two files In-Reply-To: <2031be3e-2623-dde1-fff2-2d6cd6e41de9@oracle.com> References: <2031be3e-2623-dde1-fff2-2d6cd6e41de9@oracle.com> Message-ID: <0C253EA0-47D4-492E-B453-50E187D3EB82@oracle.com> > On Sep 19, 2017, at 9:37 AM, Erik Helin wrote: > > Hi all, > > with gcc 7.1.1 from Fedora 26 on x86-64 there are warnings about the potential usage of maybe uninitialized memory in src/hotspot/cpu/x86/assembler_x86.cpp and in src/hotspot/cpu/x86/interp_masm_x86.cpp. > > The problems arises from the class RelocationHolder in src/hotspot/share/code/relocInfo.hpp which has the private fields: > enum { _relocbuf_size = 5 }; > void* _relocbuf[ _relocbuf_size ]; > > and the default constructor for RelocationHolder does not initialize the elements of _relocbuf. I _think_ this is an optimization, RelocationHolder is used *a lot* and setting the elements of RelocationHolder::_relocbuf to NULL (or some other value) in the default constructor might result in a performance penalty. 
Have a look in build/linux-x86_64-normal-server-fastdebug/hotspot/variant-server/gensrc/adfiles and you will see that RelocationHolder is used all over the place :) > > AFAICS all users of RelocationHolder::_relocbuf take care to not use uninitialized memory, which means that this warning is wrong, so I suggest we disable the warning -Wmaybe-uninitialized for src/hotspot/cpu/x86/assembler_x86.cpp. Rahul Raghavan is working on a fix for JDK-8160404 that is likely relevant. I have a patch from him for preliminary review. From erik.helin at oracle.com Tue Sep 19 14:46:44 2017 From: erik.helin at oracle.com (Erik Helin) Date: Tue, 19 Sep 2017 16:46:44 +0200 Subject: RFR: 8187676: Disable harmless uninitialized warnings for two files In-Reply-To: <0C253EA0-47D4-492E-B453-50E187D3EB82@oracle.com> References: <2031be3e-2623-dde1-fff2-2d6cd6e41de9@oracle.com> <0C253EA0-47D4-492E-B453-50E187D3EB82@oracle.com> Message-ID: On 09/19/2017 03:55 PM, Kim Barrett wrote: >> On Sep 19, 2017, at 9:37 AM, Erik Helin wrote: >> >> Hi all, >> >> with gcc 7.1.1 from Fedora 26 on x86-64 there are warnings about the potential usage of maybe uninitialized memory in src/hotspot/cpu/x86/assembler_x86.cpp and in src/hotspot/cpu/x86/interp_masm_x86.cpp. >> >> The problems arises from the class RelocationHolder in src/hotspot/share/code/relocInfo.hpp which has the private fields: >> enum { _relocbuf_size = 5 }; >> void* _relocbuf[ _relocbuf_size ]; >> >> and the default constructor for RelocationHolder does not initialize the elements of _relocbuf. I _think_ this is an optimization, RelocationHolder is used *a lot* and setting the elements of RelocationHolder::_relocbuf to NULL (or some other value) in the default constructor might result in a performance penalty. 
Have a look in build/linux-x86_64-normal-server-fastdebug/hotspot/variant-server/gensrc/adfiles and you will see that RelocationHolder is used all over the place :) >> >> AFAICS all users of RelocationHolder::_relocbuf take care to not use uninitialized memory, which means that this warning is wrong, so I suggest we disable the warning -Wmaybe-uninitialized for src/hotspot/cpu/x86/assembler_x86.cpp. > > Rahul Raghavan is working on a fix for JDK-8160404 that is likely relevant. > I have a patch from him for preliminary review. I had a look at https://bugs.openjdk.java.net/browse/JDK-8160404 and I'm not sure that the patch ensures that the elements of _relocbuf are initialized (could very well be, I'm not particularly familiar with this code). Since JDK-8160404 seems to be under development, maybe we should take in this patch that disables the warnings and then the patch for JDK-8160404 can enable the warnings again (if possible)? Thanks, Erik From rahul.v.raghavan at oracle.com Tue Sep 19 14:59:50 2017 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Tue, 19 Sep 2017 20:29:50 +0530 Subject: RFR: 8187676: Disable harmless uninitialized warnings for two files In-Reply-To: <2031be3e-2623-dde1-fff2-2d6cd6e41de9@oracle.com> References: <2031be3e-2623-dde1-fff2-2d6cd6e41de9@oracle.com> Message-ID: <7512e87d-4e28-27a1-5e10-5cdfa794cdf4@oracle.com> Hi Erik, Please note that this 8187676 seems to be related to 8160404. https://bugs.openjdk.java.net/browse/JDK-8160404 (RelocationHolder constructors have bugs) As per the latest notes comments added for 8160404-jbs, I will submit webrev/RFR soon and will request help confirm similar issues with latest gcc7 gets solved. Thanks, Rahul On Tuesday 19 September 2017 07:07 PM, Erik Helin wrote: > Hi all, > > with gcc 7.1.1 from Fedora 26 on x86-64 there are warnings about the > potential usage of maybe uninitialized memory in > src/hotspot/cpu/x86/assembler_x86.cpp and in > src/hotspot/cpu/x86/interp_masm_x86.cpp. 
> > The problems arises from the class RelocationHolder in > src/hotspot/share/code/relocInfo.hpp which has the private fields: > enum { _relocbuf_size = 5 }; > void* _relocbuf[ _relocbuf_size ]; > > and the default constructor for RelocationHolder does not initialize the > elements of _relocbuf. I _think_ this is an optimization, > RelocationHolder is used *a lot* and setting the elements of > RelocationHolder::_relocbuf to NULL (or some other value) in the default > constructor might result in a performance penalty. Have a look in > build/linux-x86_64-normal-server-fastdebug/hotspot/variant-server/gensrc/adfiles > and you will see that RelocationHolder is used all over the place :) > > AFAICS all users of RelocationHolder::_relocbuf take care to not use > uninitialized memory, which means that this warning is wrong, so I > suggest we disable the warning -Wmaybe-uninitialized for > src/hotspot/cpu/x86/assembler_x86.cpp. > > The problem continues because the class Address in > src/hotspot/cpu/x86/assembler_x86.hpp has a private field, > `RelocationHolder _rspec;` and the default constructor for Address does > not initialize _rspec._relocbuf (most likely for performance reasons). > The class Address also has a default copy constructor, which will copy > all the elements of _rspec._relocbuf, which will result in a read of > uninitialized memory. However, this is a benign usage of uninitialized > memory, since we take no action based on the content of the > uninitialized memory (it is just copied byte for byte). > > So, in this case too, I suggest we disable the warning -Wuninitialized > for src/hotspot/cpu/x86/assembler_x86.hpp. > > What do you think? 
> > Patch: > http://cr.openjdk.java.net/~ehelin/8187676/00/ > > --- old/make/hotspot/lib/JvmOverrideFiles.gmk 2017-09-19 > 15:11:45.036108983 +0200 > +++ new/make/hotspot/lib/JvmOverrideFiles.gmk 2017-09-19 > 15:11:44.692107277 +0200 > @@ -32,6 +32,8 @@ > ifeq ($(TOOLCHAIN_TYPE), gcc) > BUILD_LIBJVM_vmStructs.cpp_CXXFLAGS := -fno-var-tracking-assignments > -O0 > BUILD_LIBJVM_jvmciCompilerToVM.cpp_CXXFLAGS := > -fno-var-tracking-assignments > + BUILD_LIBJVM_assembler_x86.cpp_CXXFLAGS := -Wno-maybe-uninitialized > + BUILD_LIBJVM_interp_masm_x86.cpp_CXXFLAGS := -Wno-uninitialized > endif > > ifeq ($(OPENJDK_TARGET_OS), linux) > > Issue: > https://bugs.openjdk.java.net/browse/JDK-8187676 > > Testing: > - Compiles with: > - gcc 7.1.1 and glibc 2.25 on Fedora 26 > - gcc 4.9.2 and glibc 2.12 on OEL 6.4 > - JPRT > > Thanks, > Erik From chris.plummer at oracle.com Tue Sep 19 15:27:37 2017 From: chris.plummer at oracle.com (Chris Plummer) Date: Tue, 19 Sep 2017 08:27:37 -0700 Subject: RFR(XXS) 8187629: NMT: Memory miscounting in compiler (C2) In-Reply-To: <8987f8cd-9eee-bb86-13a6-3623ab093668@redhat.com> References: <8987f8cd-9eee-bb86-13a6-3623ab093668@redhat.com> Message-ID: Looks good. Seems to follow a pattern used elsewhere. thanks, Chris On 9/18/17 12:17 PM, Zhengyu Gu wrote: > Compiler (C2) uses ResourceArea instead of Arena in some > circumstances, so it can take advantage of ResourceMark. However, > ResourceArea is tagged as mtThread, that results those memory is > miscounted by NMT > > Bug: https://bugs.openjdk.java.net/browse/JDK-8187629 > Webrev: http://cr.openjdk.java.net/~zgu/8187629/webrev.00/ > > > Test: > > ? 
hotspot_tier1 (fastdebug and release) on Linux x64 > > > Thanks, > > -Zhengyu From zgu at redhat.com Tue Sep 19 15:29:57 2017 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 19 Sep 2017 11:29:57 -0400 Subject: RFR(XXS) 8187629: NMT: Memory miscounting in compiler (C2) In-Reply-To: References: <8987f8cd-9eee-bb86-13a6-3623ab093668@redhat.com> Message-ID: <12e6f4e9-6925-6740-fa46-bd9d413381bb@redhat.com> Thanks for the review, Chris. -Zhengyu On 09/19/2017 11:27 AM, Chris Plummer wrote: > Looks good. Seems to follow a pattern used elsewhere. > > thanks, > > Chris > > On 9/18/17 12:17 PM, Zhengyu Gu wrote: >> Compiler (C2) uses ResourceArea instead of Arena in some >> circumstances, so it can take advantage of ResourceMark. However, >> ResourceArea is tagged as mtThread, that results those memory is >> miscounted by NMT >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8187629 >> Webrev: http://cr.openjdk.java.net/~zgu/8187629/webrev.00/ >> >> >> Test: >> >> hotspot_tier1 (fastdebug and release) on Linux x64 >> >> >> Thanks, >> >> -Zhengyu > > > From vladimir.kozlov at oracle.com Tue Sep 19 16:14:40 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 19 Sep 2017 09:14:40 -0700 Subject: RFR(XXS) 8187629: NMT: Memory miscounting in compiler (C2) In-Reply-To: <8987f8cd-9eee-bb86-13a6-3623ab093668@redhat.com> References: <8987f8cd-9eee-bb86-13a6-3623ab093668@redhat.com> Message-ID: <083de739-42df-ff85-22b5-a456c6134dd2@oracle.com> Compilers also use a lot thread local ResourceArea - Thread::current()->resource_area() and NEW_RESOURCE_ARRAY() macro. But thread local area is defined as mtThread: http://hg.openjdk.java.net/jdk10/hs/hotspot/file/5ab7a67bc155/src/share/vm/runtime/thread.cpp#l218 Vladimir On 9/18/17 12:17 PM, Zhengyu Gu wrote: > Compiler (C2) uses ResourceArea instead of Arena in some circumstances, > so it can take advantage of ResourceMark. 
However, ResourceArea is > tagged as mtThread, that results those memory is miscounted by NMT > > Bug: https://bugs.openjdk.java.net/browse/JDK-8187629 > Webrev: http://cr.openjdk.java.net/~zgu/8187629/webrev.00/ > > > Test: > > hotspot_tier1 (fastdebug and release) on Linux x64 > > > Thanks, > > -Zhengyu From vladimir.kozlov at oracle.com Tue Sep 19 16:25:20 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 19 Sep 2017 09:25:20 -0700 Subject: RFR: 8187676: Disable harmless uninitialized warnings for two files In-Reply-To: <7512e87d-4e28-27a1-5e10-5cdfa794cdf4@oracle.com> References: <2031be3e-2623-dde1-fff2-2d6cd6e41de9@oracle.com> <7512e87d-4e28-27a1-5e10-5cdfa794cdf4@oracle.com> Message-ID: <4f5b0427-54bf-2b85-0a94-bb41049d2676@oracle.com> I would prefer to have general solution Rahul is working on because code is general - not only x86 is affected. Thanks, Vladimir On 9/19/17 7:59 AM, Rahul Raghavan wrote: > Hi Erik, > > Please note that this 8187676 seems to be related to 8160404. > https://bugs.openjdk.java.net/browse/JDK-8160404 > (RelocationHolder constructors have bugs) > > As per the latest notes comments added for 8160404-jbs, I will submit > webrev/RFR soon and will request help confirm similar issues with latest > gcc7 gets solved. > > Thanks, > Rahul > > On Tuesday 19 September 2017 07:07 PM, Erik Helin wrote: >> Hi all, >> >> with gcc 7.1.1 from Fedora 26 on x86-64 there are warnings about the >> potential usage of maybe uninitialized memory in >> src/hotspot/cpu/x86/assembler_x86.cpp and in >> src/hotspot/cpu/x86/interp_masm_x86.cpp. >> >> The problems arises from the class RelocationHolder in >> src/hotspot/share/code/relocInfo.hpp which has the private fields: >> enum { _relocbuf_size = 5 }; >> void* _relocbuf[ _relocbuf_size ]; >> >> and the default constructor for RelocationHolder does not initialize >> the elements of _relocbuf. 
I _think_ this is an optimization, >> RelocationHolder is used *a lot* and setting the elements of >> RelocationHolder::_relocbuf to NULL (or some other value) in the >> default constructor might result in a performance penalty. Have a look >> in >> build/linux-x86_64-normal-server-fastdebug/hotspot/variant-server/gensrc/adfiles >> and you will see that RelocationHolder is used all over the place :) >> >> AFAICS all users of RelocationHolder::_relocbuf take care to not use >> uninitialized memory, which means that this warning is wrong, so I >> suggest we disable the warning -Wmaybe-uninitialized for >> src/hotspot/cpu/x86/assembler_x86.cpp. >> >> The problem continues because the class Address in >> src/hotspot/cpu/x86/assembler_x86.hpp has a private field, >> `RelocationHolder _rspec;` and the default constructor for Address >> does not initialize _rspec._relocbuf (most likely for performance >> reasons). The class Address also has a default copy constructor, which >> will copy all the elements of _rspec._relocbuf, which will result in a >> read of uninitialized memory. However, this is a benign usage of >> uninitialized memory, since we take no action based on the content of >> the uninitialized memory (it is just copied byte for byte). >> >> So, in this case too, I suggest we disable the warning -Wuninitialized >> for src/hotspot/cpu/x86/assembler_x86.hpp. >> >> What do you think? >> >> Patch: >> http://cr.openjdk.java.net/~ehelin/8187676/00/ >> >> --- old/make/hotspot/lib/JvmOverrideFiles.gmk 2017-09-19 >> 15:11:45.036108983 +0200 >> +++ new/make/hotspot/lib/JvmOverrideFiles.gmk 2017-09-19 >> 15:11:44.692107277 +0200 >> @@ -32,6 +32,8 @@ >> ifeq ($(TOOLCHAIN_TYPE), gcc) >> BUILD_LIBJVM_vmStructs.cpp_CXXFLAGS := >> -fno-var-tracking-assignments -O0 >> BUILD_LIBJVM_jvmciCompilerToVM.cpp_CXXFLAGS := >> -fno-var-tracking-assignments >> + BUILD_LIBJVM_assembler_x86.cpp_CXXFLAGS := -Wno-maybe-uninitialized >> + 
BUILD_LIBJVM_interp_masm_x86.cpp_CXXFLAGS := -Wno-uninitialized >> endif >> >> ifeq ($(OPENJDK_TARGET_OS), linux) >> >> Issue: >> https://bugs.openjdk.java.net/browse/JDK-8187676 >> >> Testing: >> - Compiles with: >> - gcc 7.1.1 and glibc 2.25 on Fedora 26 >> - gcc 4.9.2 and glibc 2.12 on OEL 6.4 >> - JPRT >> >> Thanks, >> Erik From adinn at redhat.com Tue Sep 19 16:28:47 2017 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 19 Sep 2017 17:28:47 +0100 Subject: RFR(XXS) 8187629: NMT: Memory miscounting in compiler (C2) In-Reply-To: <083de739-42df-ff85-22b5-a456c6134dd2@oracle.com> References: <8987f8cd-9eee-bb86-13a6-3623ab093668@redhat.com> <083de739-42df-ff85-22b5-a456c6134dd2@oracle.com> Message-ID: <11c90c3d-a9a6-1fc1-8cf0-d27698d99bcc@redhat.com> On 19/09/17 17:14, Vladimir Kozlov wrote: > Compilers also use a lot thread local ResourceArea - > Thread::current()->resource_area() and NEW_RESOURCE_ARRAY() macro. > But thread local area is defined as mtThread: > > http://hg.openjdk.java.net/jdk10/hs/hotspot/file/5ab7a67bc155/src/share/vm/runtime/thread.cpp#l218 True, although that's arguably part of the fixed cost of creating a thread. By contrast, the changes Zhengyu has made are for data areas specifically allocated during a compilation to allow that compile to proceed. I don't know for sure which way to sway on that first count. The second one definitely needs debiting against the compiler account. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From zgu at redhat.com Tue Sep 19 16:36:17 2017 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 19 Sep 2017 12:36:17 -0400 Subject: RFR(XXS) 8187629: NMT: Memory miscounting in compiler (C2) In-Reply-To: <083de739-42df-ff85-22b5-a456c6134dd2@oracle.com> References: <8987f8cd-9eee-bb86-13a6-3623ab093668@redhat.com> <083de739-42df-ff85-22b5-a456c6134dd2@oracle.com> Message-ID: <273c1e58-2405-e3e0-1d45-56d626dbd974@redhat.com> Hi Vladimir, On 09/19/2017 12:14 PM, Vladimir Kozlov wrote: > Compilers also use a lot thread local ResourceArea - > Thread::current()->resource_area() and NEW_RESOURCE_ARRAY() macro. > But thread local area is defined as mtThread: > > http://hg.openjdk.java.net/jdk10/hs/hotspot/file/5ab7a67bc155/src/share/vm/runtime/thread.cpp#l218 Thread's resource area is general purpose per-thread storage and used by many subsystems. Unfortunately, NMT has no way to distinguish the users at this moment, categorizing as mThread is sort of placeholder. I am welcome to any suggestions Thanks, -Zhengyu > > > Vladimir > > On 9/18/17 12:17 PM, Zhengyu Gu wrote: >> Compiler (C2) uses ResourceArea instead of Arena in some >> circumstances, so it can take advantage of ResourceMark. 
However, >> ResourceArea is tagged as mtThread, that results those memory is >> miscounted by NMT >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8187629 >> Webrev: http://cr.openjdk.java.net/~zgu/8187629/webrev.00/ >> >> >> Test: >> >> hotspot_tier1 (fastdebug and release) on Linux x64 >> >> >> Thanks, >> >> -Zhengyu From vladimir.kozlov at oracle.com Tue Sep 19 16:37:16 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 19 Sep 2017 09:37:16 -0700 Subject: RFR(XXS) 8187629: NMT: Memory miscounting in compiler (C2) In-Reply-To: <11c90c3d-a9a6-1fc1-8cf0-d27698d99bcc@redhat.com> References: <8987f8cd-9eee-bb86-13a6-3623ab093668@redhat.com> <083de739-42df-ff85-22b5-a456c6134dd2@oracle.com> <11c90c3d-a9a6-1fc1-8cf0-d27698d99bcc@redhat.com> Message-ID: <2fdbf27f-0eaa-2bf9-c02c-b3a918316ed2@oracle.com> On 9/19/17 9:28 AM, Andrew Dinn wrote: > On 19/09/17 17:14, Vladimir Kozlov wrote: >> Compilers also use a lot thread local ResourceArea - >> Thread::current()->resource_area() and NEW_RESOURCE_ARRAY() macro. >> But thread local area is defined as mtThread: >> >> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/5ab7a67bc155/src/share/vm/runtime/thread.cpp#l218 > True, although that's arguably part of the fixed cost of creating a thread. > > By contrast, the changes Zhengyu has made are for data areas > specifically allocated during a compilation to allow that compile to > proceed. I totally agree with fix. But I think we should do more to account memory used by compilers. Thread local area is huge part of it. Thanks, Vladimir > > I don't know for sure which way to sway on that first count. The second > one definitely needs debiting against the compiler account. > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 
03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander > From vladimir.kozlov at oracle.com Tue Sep 19 16:52:56 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 19 Sep 2017 09:52:56 -0700 Subject: RFR(XXS) 8187629: NMT: Memory miscounting in compiler (C2) In-Reply-To: <273c1e58-2405-e3e0-1d45-56d626dbd974@redhat.com> References: <8987f8cd-9eee-bb86-13a6-3623ab093668@redhat.com> <083de739-42df-ff85-22b5-a456c6134dd2@oracle.com> <273c1e58-2405-e3e0-1d45-56d626dbd974@redhat.com> Message-ID: On 9/19/17 9:36 AM, Zhengyu Gu wrote: > Hi Vladimir, > > > On 09/19/2017 12:14 PM, Vladimir Kozlov wrote: >> Compilers also use a lot thread local ResourceArea - >> Thread::current()->resource_area() and NEW_RESOURCE_ARRAY() macro. >> But thread local area is defined as mtThread: >> >> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/5ab7a67bc155/src/share/vm/runtime/thread.cpp#l218 > > > Thread's resource area is general purpose per-thread storage and used by > many subsystems. Unfortunately, NMT has no way to distinguish the users > at this moment, categorizing as mThread is sort of placeholder. > > I am welcome to any suggestions Thread() is called from CompilerThread() and we can pass a parameter to indicate user. Or add a virtual method to Thread class to check type of thread. In the past we had compiler local changes to get information how much memory was used during compilation but it was never get pushed. We accessed Arena::_bytes_allocated for that. Thanks, Vladimir > > Thanks, > > -Zhengyu > > >> >> >> Vladimir >> >> On 9/18/17 12:17 PM, Zhengyu Gu wrote: >>> Compiler (C2) uses ResourceArea instead of Arena in some >>> circumstances, so it can take advantage of ResourceMark. However, >>> ResourceArea is tagged as mtThread, that results those memory is >>> miscounted by NMT >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8187629 >>> Webrev: http://cr.openjdk.java.net/~zgu/8187629/webrev.00/ >>> >>> >>> Test: >>> >>> ?? 
hotspot_tier1 (fastdebug and release) on Linux x64 >>> >>> >>> Thanks, >>> >>> -Zhengyu From zgu at redhat.com Tue Sep 19 17:19:21 2017 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 19 Sep 2017 13:19:21 -0400 Subject: RFR(XXS) 8187629: NMT: Memory miscounting in compiler (C2) In-Reply-To: References: <8987f8cd-9eee-bb86-13a6-3623ab093668@redhat.com> <083de739-42df-ff85-22b5-a456c6134dd2@oracle.com> <273c1e58-2405-e3e0-1d45-56d626dbd974@redhat.com> Message-ID: <43860830-c766-8238-2594-400566b02bc1@redhat.com> Hi Vladimir, I filed a bug to track this issue. https://bugs.openjdk.java.net/browse/JDK-8187685 Thanks, -Zhengyu On 09/19/2017 12:52 PM, Vladimir Kozlov wrote: > On 9/19/17 9:36 AM, Zhengyu Gu wrote: >> Hi Vladimir, >> >> >> On 09/19/2017 12:14 PM, Vladimir Kozlov wrote: >>> Compilers also use a lot thread local ResourceArea - >>> Thread::current()->resource_area() and NEW_RESOURCE_ARRAY() macro. >>> But thread local area is defined as mtThread: >>> >>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/5ab7a67bc155/src/share/vm/runtime/thread.cpp#l218 >> >> >> >> Thread's resource area is general purpose per-thread storage and used >> by many subsystems. Unfortunately, NMT has no way to distinguish the >> users at this moment, categorizing as mThread is sort of placeholder. >> >> I am welcome to any suggestions > > Thread() is called from CompilerThread() and we can pass a parameter to > indicate user. Or add a virtual method to Thread class to check type of > thread. > > In the past we had compiler local changes to get information how much > memory was used during compilation but it was never get pushed. We > accessed Arena::_bytes_allocated for that. > > Thanks, > Vladimir > >> >> Thanks, >> >> -Zhengyu >> >> >>> >>> >>> Vladimir >>> >>> On 9/18/17 12:17 PM, Zhengyu Gu wrote: >>>> Compiler (C2) uses ResourceArea instead of Arena in some >>>> circumstances, so it can take advantage of ResourceMark. 
However, >>>> ResourceArea is tagged as mtThread, that results those memory is >>>> miscounted by NMT >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8187629 >>>> Webrev: http://cr.openjdk.java.net/~zgu/8187629/webrev.00/ >>>> >>>> >>>> Test: >>>> >>>> hotspot_tier1 (fastdebug and release) on Linux x64 >>>> >>>> >>>> Thanks, >>>> >>>> -Zhengyu From coleen.phillimore at oracle.com Tue Sep 19 17:55:14 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 19 Sep 2017 13:55:14 -0400 Subject: CFV: New hotspot Group Member: Markus Gronlund Message-ID: I hereby nominate Markus Gronlund (OpenJDK user name: mgronlun) to Membership in the hotspot Group. Markus has been working on the hotspot project for over 5 years and is a Reviewer in the JDK 9 Project with 51 changes.?? He is an expert in the area of event based tracing of Java programs. Votes are due by Tuesday, October 3, 2017. Only current Members of the hotspot Group [1] are eligible to vote on this nomination. Votes must be cast in the open by replying to this mailing list. For Lazy Consensus voting instructions, see [2]. Coleen [1]http://openjdk.java.net/census#hotspot [2]http://openjdk.java.net/groups/#member-vote From vladimir.kozlov at oracle.com Tue Sep 19 17:56:57 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 19 Sep 2017 10:56:57 -0700 Subject: RFR(XXS) 8187629: NMT: Memory miscounting in compiler (C2) In-Reply-To: <43860830-c766-8238-2594-400566b02bc1@redhat.com> References: <8987f8cd-9eee-bb86-13a6-3623ab093668@redhat.com> <083de739-42df-ff85-22b5-a456c6134dd2@oracle.com> <273c1e58-2405-e3e0-1d45-56d626dbd974@redhat.com> <43860830-c766-8238-2594-400566b02bc1@redhat.com> Message-ID: <47f91523-8f92-5cfc-bea5-8ce2789219d9@oracle.com> On 9/19/17 10:19 AM, Zhengyu Gu wrote: > Hi Vladimir, > > I filed a bug to track this issue. > > https://bugs.openjdk.java.net/browse/JDK-8187685 Yes, lets do it separately. Your current fix is good. 
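The suggestion made earlier in this thread, passing a parameter from the compiler thread's constructor so that the per-thread resource area is charged to the right category, might look roughly like the sketch below. Every type and name here (`MemTag`, `Tracker`, `TaggedArea`, `ThreadSketch`, `CompilerThreadSketch`) is hypothetical and only loosely modelled on the mtThread/compiler distinction discussed above; it is not the NMT implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical memory categories.
enum MemTag { kTagThread = 0, kTagCompiler = 1, kTagCount = 2 };

// Hypothetical per-category byte counter. A real tracker would have to be
// thread safe; this sketch is single threaded.
class Tracker {
  std::size_t bytes_[kTagCount];
 public:
  Tracker() { for (int i = 0; i < kTagCount; ++i) bytes_[i] = 0; }
  void record(MemTag tag, std::size_t n) { bytes_[tag] += n; }
  std::size_t bytes(MemTag tag) const { return bytes_[tag]; }
};

// Hypothetical resource area that charges allocations to the tag it was
// created with. Only the accounting is modelled, not the allocation.
class TaggedArea {
  Tracker* tracker_;
  MemTag tag_;
 public:
  TaggedArea(Tracker* t, MemTag tag) : tracker_(t), tag_(tag) {}
  void allocate(std::size_t n) { tracker_->record(tag_, n); }
};

// Base thread: the constructor parameter names the user of the area.
class ThreadSketch {
  TaggedArea area_;
 public:
  explicit ThreadSketch(Tracker* t, MemTag tag = kTagThread) : area_(t, tag) {}
  TaggedArea& resource_area() { return area_; }
};

// Compiler thread: passes its own category up to the base constructor.
class CompilerThreadSketch : public ThreadSketch {
 public:
  explicit CompilerThreadSketch(Tracker* t) : ThreadSketch(t, kTagCompiler) {}
};

// Allocates from a plain thread and a compiler thread, then reports the
// bytes charged to the compiler category.
std::size_t demo_compiler_bytes() {
  Tracker tr;
  ThreadSketch plain(&tr);
  CompilerThreadSketch comp(&tr);
  plain.resource_area().allocate(64);
  comp.resource_area().allocate(256);
  assert(tr.bytes(kTagThread) == 64);
  return tr.bytes(kTagCompiler);
}
```

The other option mentioned, a virtual method on the thread class that reports its kind, would move the decision from construction time to query time; the accounting side would stay the same.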
Thanks, Vladimir > > Thanks, > > -Zhengyu > > On 09/19/2017 12:52 PM, Vladimir Kozlov wrote: >> On 9/19/17 9:36 AM, Zhengyu Gu wrote: >>> Hi Vladimir, >>> >>> >>> On 09/19/2017 12:14 PM, Vladimir Kozlov wrote: >>>> Compilers also use a lot thread local ResourceArea - >>>> Thread::current()->resource_area() and NEW_RESOURCE_ARRAY() macro. >>>> But thread local area is defined as mtThread: >>>> >>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/5ab7a67bc155/src/share/vm/runtime/thread.cpp#l218 >>> >>> >>> >>> >>> Thread's resource area is general purpose per-thread storage and used >>> by many subsystems. Unfortunately, NMT has no way to distinguish the >>> users at this moment, categorizing as mThread is sort of placeholder. >>> >>> I am welcome to any suggestions >> >> Thread() is called from CompilerThread() and we can pass a parameter >> to indicate user. Or add a virtual method to Thread class to check >> type of thread. >> >> In the past we had compiler local changes to get information how much >> memory was used during compilation but it was never get pushed. We >> accessed Arena::_bytes_allocated for that. >> >> Thanks, >> Vladimir >> >>> >>> Thanks, >>> >>> -Zhengyu >>> >>> >>>> >>>> >>>> Vladimir >>>> >>>> On 9/18/17 12:17 PM, Zhengyu Gu wrote: >>>>> Compiler (C2) uses ResourceArea instead of Arena in some >>>>> circumstances, so it can take advantage of ResourceMark. However, >>>>> ResourceArea is tagged as mtThread, that results those memory is >>>>> miscounted by NMT >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8187629 >>>>> Webrev: http://cr.openjdk.java.net/~zgu/8187629/webrev.00/ >>>>> >>>>> >>>>> Test: >>>>> >>>>> ?? 
hotspot_tier1 (fastdebug and release) on Linux x64 >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> -Zhengyu From vladimir.kozlov at oracle.com Tue Sep 19 17:57:34 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 19 Sep 2017 10:57:34 -0700 Subject: CFV: New hotspot Group Member: Markus Gronlund In-Reply-To: References: Message-ID: Vote: yes On 9/19/17 10:55 AM, coleen.phillimore at oracle.com wrote: > I hereby nominate Markus Gronlund (OpenJDK user name: mgronlun) to > Membership in the hotspot Group. > > Markus has been working on the hotspot project for over 5 years and is a > Reviewer in the JDK 9 Project with 51 changes. He is an expert in the > area of event based tracing of Java programs. > > Votes are due by Tuesday, October 3, 2017. > > Only current Members of the hotspot Group [1] are eligible to vote on > this nomination. Votes must be cast in the open by replying to this > mailing list. > > For Lazy Consensus voting instructions, see [2]. > > Coleen > > [1]http://openjdk.java.net/census#hotspot > [2]http://openjdk.java.net/groups/#member-vote > > > From john.r.rose at oracle.com Tue Sep 19 17:59:19 2017 From: john.r.rose at oracle.com (John Rose) Date: Tue, 19 Sep 2017 10:59:19 -0700 Subject: CFV: New hotspot Group Member: Markus Gronlund In-Reply-To: References: Message-ID: <2B6A0A0D-FF90-49BF-9BC9-29E68FD19F11@oracle.com> Vote: yes From jesper.wilhelmsson at oracle.com Tue Sep 19 17:59:52 2017 From: jesper.wilhelmsson at oracle.com (jesper.wilhelmsson at oracle.com) Date: Tue, 19 Sep 2017 19:59:52 +0200 Subject: CFV: New hotspot Group Member: Markus Gronlund In-Reply-To: References: Message-ID: <18120822-07A2-4065-BE37-C60217549737@oracle.com> Vote: Yes. /Jesper > On 19 Sep 2017, at 19:55, coleen.phillimore at oracle.com wrote: > > I hereby nominate Markus Gronlund (OpenJDK user name: mgronlun) to Membership in the hotspot Group. 
> > Markus has been working on the hotspot project for over 5 years and is a Reviewer in the JDK 9 Project with 51 changes. He is an expert in the area of event based tracing of Java programs. > > Votes are due by Tuesday, October 3, 2017. > > Only current Members of the hotspot Group [1] are eligible to vote on this nomination. Votes must be cast in the open by replying to this mailing list. > > For Lazy Consensus voting instructions, see [2]. > > Coleen > > [1]http://openjdk.java.net/census#hotspot > [2]http://openjdk.java.net/groups/#member-vote > > > From coleen.phillimore at oracle.com Tue Sep 19 18:05:46 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 19 Sep 2017 14:05:46 -0400 Subject: CFV: New hotspot Group Member: Markus Gronlund In-Reply-To: References: Message-ID: <16b10ecf-f468-32e0-c64f-b1da974f149a@oracle.com> Vote: yes On 9/19/17 1:55 PM, coleen.phillimore at oracle.com wrote: > I hereby nominate Markus Gronlund (OpenJDK user name: mgronlun) to > Membership in the hotspot Group. > > Markus has been working on the hotspot project for over 5 years and is > a Reviewer in the JDK 9 Project with 51 changes.?? He is an expert in > the area of event based tracing of Java programs. > > Votes are due by Tuesday, October 3, 2017. > > Only current Members of the hotspot Group [1] are eligible to vote on > this nomination. Votes must be cast in the open by replying to this > mailing list. > > For Lazy Consensus voting instructions, see [2]. 
> > Coleen > > [1]http://openjdk.java.net/census#hotspot > [2]http://openjdk.java.net/groups/#member-vote > > > From ChrisPhi at LGonQn.Org Tue Sep 19 18:08:11 2017 From: ChrisPhi at LGonQn.Org (Chris Phillips) Date: Tue, 19 Sep 2017 14:08:11 -0400 Subject: CFV: New hotspot Group Member: Markus Gronlund In-Reply-To: References: Message-ID: Vote: Yes Chris On 19/09/17 01:55 PM, coleen.phillimore at oracle.com wrote: > I hereby nominate Markus Gronlund (OpenJDK user name: mgronlun) to > Membership in the hotspot Group. > > Markus has been working on the hotspot project for over 5 years and is a > Reviewer in the JDK 9 Project with 51 changes. He is an expert in the > area of event based tracing of Java programs. > > Votes are due by Tuesday, October 3, 2017. > > Only current Members of the hotspot Group [1] are eligible to vote on > this nomination. Votes must be cast in the open by replying to this > mailing list. > > For Lazy Consensus voting instructions, see [2]. > > Coleen > > [1]http://openjdk.java.net/census#hotspot > [2]http://openjdk.java.net/groups/#member-vote > > > > > > From zgu at redhat.com Tue Sep 19 18:15:04 2017 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 19 Sep 2017 14:15:04 -0400 Subject: RFR(XXS) 8187629: NMT: Memory miscounting in compiler (C2) In-Reply-To: <47f91523-8f92-5cfc-bea5-8ce2789219d9@oracle.com> References: <8987f8cd-9eee-bb86-13a6-3623ab093668@redhat.com> <083de739-42df-ff85-22b5-a456c6134dd2@oracle.com> <273c1e58-2405-e3e0-1d45-56d626dbd974@redhat.com> <43860830-c766-8238-2594-400566b02bc1@redhat.com> <47f91523-8f92-5cfc-bea5-8ce2789219d9@oracle.com> Message-ID: <622a981f-aedc-cb37-9f1f-910ed7719079@redhat.com> Thanks for the review, Vladimir. -Zhengyu On 09/19/2017 01:56 PM, Vladimir Kozlov wrote: > On 9/19/17 10:19 AM, Zhengyu Gu wrote: >> Hi Vladimir, >> >> I filed a bug to track this issue. >> >> https://bugs.openjdk.java.net/browse/JDK-8187685 > > Yes, lets do it separately. > Your current fix is good. 
> > Thanks, > Vladimir > >> >> Thanks, >> >> -Zhengyu >> >> On 09/19/2017 12:52 PM, Vladimir Kozlov wrote: >>> On 9/19/17 9:36 AM, Zhengyu Gu wrote: >>>> Hi Vladimir, >>>> >>>> >>>> On 09/19/2017 12:14 PM, Vladimir Kozlov wrote: >>>>> Compilers also use a lot thread local ResourceArea - >>>>> Thread::current()->resource_area() and NEW_RESOURCE_ARRAY() macro. >>>>> But thread local area is defined as mtThread: >>>>> >>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/5ab7a67bc155/src/share/vm/runtime/thread.cpp#l218 >>>> >>>> >>>> >>>> >>>> >>>> Thread's resource area is general purpose per-thread storage and >>>> used by many subsystems. Unfortunately, NMT has no way to >>>> distinguish the users at this moment, categorizing as mThread is >>>> sort of placeholder. >>>> >>>> I am welcome to any suggestions >>> >>> Thread() is called from CompilerThread() and we can pass a parameter >>> to indicate user. Or add a virtual method to Thread class to check >>> type of thread. >>> >>> In the past we had compiler local changes to get information how much >>> memory was used during compilation but it was never get pushed. We >>> accessed Arena::_bytes_allocated for that. >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> Thanks, >>>> >>>> -Zhengyu >>>> >>>> >>>>> >>>>> >>>>> Vladimir >>>>> >>>>> On 9/18/17 12:17 PM, Zhengyu Gu wrote: >>>>>> Compiler (C2) uses ResourceArea instead of Arena in some >>>>>> circumstances, so it can take advantage of ResourceMark. However, >>>>>> ResourceArea is tagged as mtThread, that results those memory is >>>>>> miscounted by NMT >>>>>> >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8187629 >>>>>> Webrev: http://cr.openjdk.java.net/~zgu/8187629/webrev.00/ >>>>>> >>>>>> >>>>>> Test: >>>>>> >>>>>> hotspot_tier1 (fastdebug and release) on Linux x64 >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> -Zhengyu From daniel.daugherty at oracle.com Tue Sep 19 20:05:32 2017 From: daniel.daugherty at oracle.com (Daniel D. 
Daugherty) Date: Tue, 19 Sep 2017 14:05:32 -0600 Subject: CFV: New hotspot Group Member: Markus Gronlund In-Reply-To: References: Message-ID: <32ac5be0-05dc-12de-36a2-b58633ee6c14@oracle.com> Vote: yes Dan On 9/19/17 11:55 AM, coleen.phillimore at oracle.com wrote: > I hereby nominate Markus Gronlund (OpenJDK user name: mgronlun) to > Membership in the hotspot Group. > > Markus has been working on the hotspot project for over 5 years and is > a Reviewer in the JDK 9 Project with 51 changes. He is an expert in > the area of event based tracing of Java programs. > > Votes are due by Tuesday, October 3, 2017. > > Only current Members of the hotspot Group [1] are eligible to vote on > this nomination. Votes must be cast in the open by replying to this > mailing list. > > For Lazy Consensus voting instructions, see [2]. > > Coleen > > [1]http://openjdk.java.net/census#hotspot > [2]http://openjdk.java.net/groups/#member-vote > > > > From david.holmes at oracle.com Tue Sep 19 21:05:51 2017 From: david.holmes at oracle.com (David Holmes) Date: Wed, 20 Sep 2017 07:05:51 +1000 Subject: RFR: 8187667: Disable deprecation warning for readdir_r In-Reply-To: <5081712c-9c62-5b6a-2e43-9b8d6e3ca64a@oracle.com> References: <5081712c-9c62-5b6a-2e43-9b8d6e3ca64a@oracle.com> Message-ID: <0bc6d706-18f1-eddc-5cf1-ccadd52b12d7@oracle.com> Hi Erik, Reviewed! On 19/09/2017 10:42 PM, Erik Helin wrote: > Hi all, > > I'm continuing to run into some small problems when compiling HotSpot > with a more recent toolchain. It seems like readdir_r [0] has been > deprecated beginning with glibc 2.24 [1]. In HotSpot, we use readdir_r > for os::readdir on Linux (defined in os_linux.inline.hpp). Since > readdir_r most likely will stay around for a long time in glibc (even > though in deprecated form), I figured it was best to just silence the > deprecation warning from gcc. 
If readdir_r finally is removed one day, > then we might have to look up the appropriate readdir function using > dlopen, dlsym etc. I find it very odd that they have deprecated the thread-safe variant of this function, and recommend use of the basic readdir. I can only assume they have made readdir itself thread-safe, but given the POSIX spec does not require that, no one can take advantage without locking into knowing which glibc version they are running on! That seems an awful mess for programmers. It is a good idea to just keep using it. > Patch: > http://cr.openjdk.java.net/~ehelin/8187667/00/ > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8187667 > > Testing: > - Compiles with: > - gcc 7.1.1 and glibc 2.25 on Fedora 26 > - gcc 4.9.2 and glibc 2.12 on OEL 6.4 > - JPRT The change will cause warnings on gcc < 4.6, so this reinforces the need to switch to our minimum gcc version ... which I forget :) Thanks, David > Thanks, > Erik > > [0]: http://pubs.opengroup.org/onlinepubs/009695399/functions/readdir.html > [1]: https://sourceware.org/bugzilla/show_bug.cgi?id=19056 From david.holmes at oracle.com Tue Sep 19 21:11:08 2017 From: david.holmes at oracle.com (David Holmes) Date: Wed, 20 Sep 2017 07:11:08 +1000 Subject: RFR: 8187667: Disable deprecation warning for readdir_r In-Reply-To: <0bc6d706-18f1-eddc-5cf1-ccadd52b12d7@oracle.com> References: <5081712c-9c62-5b6a-2e43-9b8d6e3ca64a@oracle.com> <0bc6d706-18f1-eddc-5cf1-ccadd52b12d7@oracle.com> Message-ID: <87f7ef4a-3f10-475f-59ed-34c945879bef@oracle.com> FYI background reading: https://www.gnu.org/software/libc/manual/html_node/Reading_002fClosing-Directory.html#Reading_002fClosing-Directory Seems the deprecation has merit. David On 20/09/2017 7:05 AM, David Holmes wrote: > Hi Erik, > > Reviewed! > > On 19/09/2017 10:42 PM, Erik Helin wrote: >> Hi all, >> >> I'm continuing to run into some small problems when compiling HotSpot >> with a more recent toolchain. 
It seems like readdir_r [0] has been >> deprecated beginning with glibc 2.24 [1]. In HotSpot, we use readdir_r >> for os::readdir on Linux (defined in os_linux.inline.hpp). Since >> readdir_r most likely will stay around for a long time in glibc (even >> though in deprecated form), I figured it was best to just silence the >> deprecation warning from gcc. If readdir_r finally is removed one day, >> then we might have to look up the appropriate readdir function using >> dlopen, dlsym etc. > > I find it very odd that they have deprecated the thread-safe variant of > this function, and recommend use of the basic readdir. I can only assume > they have made readdir itself thread-safe, but given the POSIX spec does > not require that, no one can take advantage without locking into knowing > which glibc version they are running on! That seems an awful mess for > programmers. > > It is a good idea to just keep using it. > >> Patch: >> http://cr.openjdk.java.net/~ehelin/8187667/00/ >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8187667 >> >> Testing: >> - Compiles with: >> - gcc 7.1.1 and glibc 2.25 on Fedora 26 >> - gcc 4.9.2 and glibc 2.12 on OEL 6.4 >> - JPRT > > The change will cause warnings on gcc < 4.6, so this reinforces the need > to switch to our minimum gcc version ... which I forget :) > > Thanks, > David > > >> Thanks, >> Erik >> >> [0]: >> http://pubs.opengroup.org/onlinepubs/009695399/functions/readdir.html >> [1]: https://sourceware.org/bugzilla/show_bug.cgi?id=19056 From david.holmes at oracle.com Tue Sep 19 21:18:26 2017 From: david.holmes at oracle.com (David Holmes) Date: Wed, 20 Sep 2017 07:18:26 +1000 Subject: CFV: New hotspot Group Member: Markus Gronlund In-Reply-To: References: Message-ID: Vote: yes David On 20/09/2017 3:55 AM, coleen.phillimore at oracle.com wrote: > I hereby nominate Markus Gronlund (OpenJDK user name: mgronlun) to > Membership in the hotspot Group. 
> > Markus has been working on the hotspot project for over 5 years and is a > Reviewer in the JDK 9 Project with 51 changes. He is an expert in the > area of event based tracing of Java programs. > > Votes are due by Tuesday, October 3, 2017. > > Only current Members of the hotspot Group [1] are eligible to vote on > this nomination. Votes must be cast in the open by replying to this > mailing list. > > For Lazy Consensus voting instructions, see [2]. > > Coleen > > [1]http://openjdk.java.net/census#hotspot > [2]http://openjdk.java.net/groups/#member-vote > > > From coleen.phillimore at oracle.com Tue Sep 19 23:19:55 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 19 Sep 2017 19:19:55 -0400 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59BF8737.4070804@oracle.com> References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> <59A93B53.9010505@oracle.com> <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com> <59ACFD76.3000606@oracle.com> <8af30028-5ab1-5623-21c7-f6e2ce2ab8cb@oracle.com> <59AE775E.1070503@oracle.com> <59BF8737.4070804@oracle.com> Message-ID: <6778adf3-c943-6789-6b10-5917a94f74a9@oracle.com> Erik, This looks like a very nice cleanup! Thanks, Coleen On 9/18/17 4:43 AM, Erik Österlund wrote: > Hi, > > After some off-list discussions I have made a new version with the > following improvements: > > 1) Added some comments describing the constraints on the types passed > in to inc/dec (integral or pointer, and pointers are scaled). > 2) Removed inc_ptr/dec_ptr and all uses of it. None of these actually > used pointers, only pointer sized integers. So I thought removing > these overloads and the unnecessary confusion caused by them would > make it easier to review this change. 
> 3) Renamed the typedef in the body representing the addend to be > called I instead of T to be consistent with the convention Kim > introduced. > > Full webrev: > http://cr.openjdk.java.net/~eosterlund/8186838/webrev.02/ > > Incremental webrev: > http://cr.openjdk.java.net/~eosterlund/8186838/webrev.01_02/ > > Thanks, > /Erik > > On 2017-09-05 12:07, Erik Österlund wrote: >> Hi David, >> >> On 2017-09-04 23:59, David Holmes wrote: >>> Hi Erik, >>> >>> On 4/09/2017 5:15 PM, Erik Österlund wrote: >>>> Hi David, >>>> >>>> On 2017-09-04 03:24, David Holmes wrote: >>>>> Hi Erik, >>>>> >>>>> On 1/09/2017 8:49 PM, Erik Österlund wrote: >>>>>> Hi David, >>>>>> >>>>>> The shared structure for all operations is the following: >>>>>> >>>>>> An Atomic::something call creates a SomethingImpl function object >>>>>> that performs some basic type checking and then forwards the call >>>>>> straight to a PlatformSomething function object. This >>>>>> PlatformSomething object could decide to do anything. But to make >>>>>> life easier, it may inherit from a shared SomethingHelper >>>>>> function object with CRTP that calls back into the >>>>>> PlatformSomething function object to emit inline assembly. >>>>> >>>>> Right, but! Let's look at some details. >>>>> >>>>> Atomic::add >>>>>   AddImpl >>>>>     PlatformAdd >>>>>       FetchAndAdd >>>>>       AddAndFetch >>>>>       add_using_helper >>>>> >>>>> Atomic::cmpxchg >>>>>   CmpxchgImpl >>>>>     PlatformCmpxchg >>>>>       cmpxchg_using_helper >>>>> >>>>> Atomic::inc >>>>>   IncImpl >>>>>     PlatformInc >>>>>       IncUsingConstant >>>>> >>>>> Why is it that the simplest operation (inc/dec) has the most >>>>> complex platform template definition? Why do we need Adjustment? >>>>> You previously said "Adjustment represents the increment/decrement >>>>> value as an IntegralConstant - your template friend for passing >>>>> around a constant with both a specified type and value in >>>>> templates".
But add passes around values and doesn't need this. >>>>> Further inc/dec don't need to pass anything around anywhere - inc >>>>> adds 1, dec subtracts 1! This "1" does not need to appear anywhere >>>>> in the API or get passed across layers - the only place this "1" >>>>> becomes evident is in the actual platform asm that does the logic >>>>> of "add 1" or "subtract 1". >>>>> >>>>> My understanding from previous discussions is that much of the >>>>> template machinations was to deal with type management for "dest" >>>>> and the values being passed around. But here, for inc/dec there >>>>> are no values being passed so we don't have to make "dest" >>>>> type-compatible with any value. >>>> >>>> Dealing with different types being passed in is one part of the >>>> problem - a problem that almost all operations seems to have. But >>>> Atomic::add and inc/dec have more problems to deal with. >>>> >>>> The Atomic::add operation has two more problems that cmpxchg does >>>> not have. >>>> 1) It needs to scale pointer arithmetic. So if you have a P* and >>>> you add it by 2, then you really add the underlying value by 2 * >>>> sizeof(P), and the scaled addend needs to be of the right type - >>>> the type of the destination for integral types and ptrdiff_t for >>>> pointers. This is similar semantics to ++pointer. >>> >>> I'll address this below - but yes I overlooked this aspect. >>> >>>> 2) It connects backends with different semantics - either >>>> fetch_and_add or add_and_fetch to a common public interface with >>>> add_and_fetch semantics. >>> >>> Not at all clear why this has to manifest in the upper/middle layers >>> instead of being handled by the actual lowest-layer ?? >> >> It could have been addressed in the lowest layer indeed. I suppose >> Kim found it nicer to do that on a higher level while you find it >> nicer to do it on a lower level. I have no opinion here. 
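The point about connecting fetch_and_add backends to an add_and_fetch public interface can be sketched roughly as below. This is only an illustration under stated assumptions: it uses std::atomic rather than HotSpot's volatile destinations and inline assembly, and the names add_and_fetch_sketch and add_and_fetch_demo are hypothetical, not part of the webrev.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical adapter (not HotSpot code): the backend primitive returns the
// OLD value (fetch_and_add), while the public interface is expected to return
// the NEW value (add_and_fetch). The adapter re-applies the addend.
template <typename I>
I add_and_fetch_sketch(std::atomic<I>* dest, I add_value) {
  I old_value = dest->fetch_add(add_value);      // fetch_and_add semantics
  return static_cast<I>(old_value + add_value);  // add_and_fetch semantics
}

// Small usage example for the adapter above.
inline void add_and_fetch_demo() {
  std::atomic<int> counter(40);
  int updated = add_and_fetch_sketch(&counter, 2);
  assert(updated == counter.load());
}
```

Whether this adaptation lives in a shared layer or in each platform file is exactly the design question raised in the exchange above; the sketch takes no side on it.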
>> >>> >>>> This is the reason that Atomic::add might appear more complicated >>>> than Atomic::cmpxchg. Because Atomic::cmpxchg only had the >>>> different type problems to deal with - no pointer arithmetics. >>>> >>>> The reason why Atomic::inc/dec looks more complicated than >>>> Atomic::add is that it needs to preserve the pointer arithmetic as >>>> constants rather than values, because the scaled addend is embedded >>>> in the inline assembly as immediate values. Therefore it passes >>>> around an IntegralConstant that embeds both the type and size of >>>> the addend. And it is not just 1/-1. For integral destinations the >>>> constant used is 1/-1 of the type stored at the destination. For >>>> pointers the constant is ptrdiff_t with a value representing the >>>> size of the element pointed to. >>> >>> This is insanely complicated (I think that counts as 'accidental >>> complexity' per Andrew's comment ;-) ). Pointer arithmetic is a >>> basic/fundamental part of C/C++, yet this template stuff has to jump >>> through multiple inverted hoops to do something the language "just >>> does"! All this complexity to manage a conversion addend -> addend * >>> sizeof(*dest) ?? >> >> Okay. >> >>> And the fact that inc/dec are simpler than add, yet result in far >>> more complicated templates because the simpler addend is a constant, >>> is just as unfathomable to me! >> >> My latest proposal is to nuke the Atomic::inc/dec specializations and >> make it call Atomic::add. Any objections on that? It is arguably >> simpler, and then we can leave the complexity discussion behind. >> >>>> Having said that - I am not opposed to simply removing the >>>> specializations of inc/dec if we are scared of the complexity of >>>> passing this constant to the platform layer. After running a bunch >>>> of benchmarks over the weekend, it showed no significant >>>> regressions after removal. 
Now of course that might not tell the >>>> full story - it could have missed that some critical operation in >>>> the JVM takes longer. But I would be very surprised if that was the >>>> case. >>> >>> I can imagine we use an "add immediate" form for inc/dec of 1, do we >>> actually use that for other values? I would expect inc_ptr/dec_ptr >>> to always translate to add_ptr, with no special case for when ptr is >>> char* and so we only add/sub 1. ?? >> >> Yes we currently only inc/sub by 1. >> >> Thanks, >> /Erik >> >>> Thanks, >>> David >>> >>>> Thanks, >>>> /Erik >>>> >>>>> >>>>> Cheers, >>>>> David >>>>> ----- >>>>> >>>>>> Hope this explanation helps understanding the intended structure >>>>>> of this work. >>>>>> >>>>>> Thanks, >>>>>> /Erik >>>>>> >>>>>> On 2017-09-01 12:34, David Holmes wrote: >>>>>>> Hi Erik, >>>>>>> >>>>>>> I just wanted to add that I would expect the cmpxchg, add and >>>>>>> inc, Atomic API's to all require similar basic structure for >>>>>>> manipulating types/values etc, yet all three seem to have quite >>>>>>> different structures that I find very confusing. I'm still at a >>>>>>> loss to fathom the CRTP and the hoops we seemingly have to jump >>>>>>> through just to add or subtract 1!!! >>>>>>> >>>>>>> Cheers, >>>>>>> David >>>>>>> >>>>>>> On 1/09/2017 7:29 PM, Erik Österlund wrote: >>>>>>>> Hi David, >>>>>>>> >>>>>>>> On 2017-09-01 02:49, David Holmes wrote: >>>>>>>>> Hi Erik, >>>>>>>>> >>>>>>>>> Sorry but this one is really losing me. >>>>>>>>> >>>>>>>>> What is the role of Adjustment ?? >>>>>>>> >>>>>>>> Adjustment represents the increment/decrement value as an >>>>>>>> IntegralConstant - your template friend for passing around a >>>>>>>> constant with both a specified type and value in templates. The >>>>>>>> type of the increment/decrement is the type of the destination >>>>>>>> when the destination is an integral type, otherwise if it is a >>>>>>>> pointer type, the increment/decrement type is ptrdiff_t.
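As a rough illustration of the Adjustment rule just described (destination type with value 1 for integral destinations, ptrdiff_t with the pointee size for pointer destinations), here is a minimal sketch. It assumes std::integral_constant from the standard library as a stand-in for HotSpot's own IntegralConstant, and the trait name IncAdjustmentSketch is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical trait (not from the webrev): selects the type and value of the
// compile-time constant that an increment would add to the stored bits.
template <typename T, typename Enable = void>
struct IncAdjustmentSketch;

// Integral destination: the constant has the destination's type, value 1.
template <typename I>
struct IncAdjustmentSketch<
    I, typename std::enable_if<std::is_integral<I>::value>::type>
    : std::integral_constant<I, 1> {};

// Pointer destination: the constant is a ptrdiff_t holding the element size,
// i.e. the already-scaled byte adjustment.
template <typename P>
struct IncAdjustmentSketch<P*, void>
    : std::integral_constant<std::ptrdiff_t,
                             static_cast<std::ptrdiff_t>(sizeof(P))> {};

// Small usage example for the trait above.
inline void inc_adjustment_demo() {
  assert(IncAdjustmentSketch<int>::value == 1);
  assert(IncAdjustmentSketch<double*>::value ==
         static_cast<std::ptrdiff_t>(sizeof(double)));
}
```

Because both the type and the value are encoded in a type, a platform layer could in principle forward the value to an inline-assembly immediate operand, which is the motivation given in the thread.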
>>>>>>>> >>>>>>>>> How are inc/dec anything but "using constant" ?? >>>>>>>> >>>>>>>> I was also a bit torn on that name (I assume you are referring >>>>>>>> to IncUsingConstant/DecUsingConstant). It was hard to find a >>>>>>>> name that depicted what this platform helper does. I considered >>>>>>>> calling the helper something with immediate in the name because >>>>>>>> it is really used to embed the constant as immediate values in >>>>>>>> inline assembly today. But then again that seemed too specific, >>>>>>>> as it is not completely obvious platform specializations will >>>>>>>> use it in that way. One might just want to specialize this to >>>>>>>> send it into some compiler Atomic::inc intrinsic for example. >>>>>>>> Do you have any other preferred names? Here are a few possible >>>>>>>> names for IncUsingConstant: >>>>>>>> >>>>>>>> IncUsingScaledConstant >>>>>>>> IncUsingAdjustedConstant >>>>>>>> IncUsingPlatformHelper >>>>>>>> >>>>>>>> Any favourites? >>>>>>>> >>>>>>>>> Why do we special case jshort?? >>>>>>>> >>>>>>>> To be consistent with the special case of Atomic::add on >>>>>>>> jshort. Do you want it removed? >>>>>>>> >>>>>>>>> This is indecipherable to normal people ;-) >>>>>>>>> >>>>>>>>> This()->template inc(dest); >>>>>>>>> >>>>>>>>> For something as trivial as adding or subtracting 1 the >>>>>>>>> template machinations here are just mind boggling! >>>>>>>> >>>>>>>> This uses the CRTP (Curiously Recurring Template Pattern) C++ >>>>>>>> idiom. The idea is to devirtualize a virtual call by passing in >>>>>>>> the derived type as a template parameter to a base class, and >>>>>>>> then let the base class static_cast to the derived class to >>>>>>>> devirtualize the call. I hope this explanation sheds some light >>>>>>>> on what is going on. The same CRTP idiom was used in the >>>>>>>> Atomic::add implementation in a similar fashion. >>>>>>>> >>>>>>>> I will add some comments describing this in the next round >>>>>>>> after Coleen replies.
>>>>>>>> >>>>>>>> Thanks for looking at this. >>>>>>>> >>>>>>>> /Erik >>>>>>>> >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> David >>>>>>>>> >>>>>>>>> On 31/08/2017 10:45 PM, Erik Österlund wrote: >>>>>>>>>> Hi everyone, >>>>>>>>>> >>>>>>>>>> Bug ID: >>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8186838 >>>>>>>>>> >>>>>>>>>> Webrev: >>>>>>>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >>>>>>>>>> >>>>>>>>>> The time has come for the next step in generalizing Atomic >>>>>>>>>> with templates. Today I will focus on Atomic::inc/dec. >>>>>>>>>> >>>>>>>>>> I have tried to mimic the new Kim style that seems to have >>>>>>>>>> been universally accepted. Like Atomic::add and >>>>>>>>>> Atomic::cmpxchg, the structure looks like this: >>>>>>>>>> >>>>>>>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() >>>>>>>>>> function object that performs some basic type checks. >>>>>>>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that >>>>>>>>>> can define the operation arbitrarily for a given platform. >>>>>>>>>> The default implementation if not specialized for a platform >>>>>>>>>> is to call Atomic::add. So only platforms that want to do >>>>>>>>>> something different than that as an optimization have to >>>>>>>>>> provide a specialization. >>>>>>>>>> Layer 3) Platforms that decide to specialize >>>>>>>>>> PlatformInc/PlatformDec to be more optimized may inherit from >>>>>>>>>> a helper class IncUsingConstant/DecUsingConstant. This helper >>>>>>>>>> helps performing the necessary computation what the >>>>>>>>>> increment/decrement should be after pointer scaling using >>>>>>>>>> CRTP. The PlatformInc/PlatformDec operation then only needs >>>>>>>>>> to define an inc/dec member function, and will then get all >>>>>>>>>> the context information necessary to generate a more >>>>>>>>>> optimized implementation. Easy peasy.
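The layering and the CRTP call-back described in this message can be sketched as follows. This is a simplified illustration, not the code from the webrev: it assumes std::atomic destinations and integral types only, replaces inline assembly with fetch_add, and all names ending in Sketch or demo are hypothetical. It also shows where a call of the form "->template inc<...>(dest)" ends up.

```cpp
#include <atomic>
#include <cassert>
#include <type_traits>

// Layer 3 helper (CRTP base): computes the constant adjustment and calls back
// into the derived platform class without any virtual dispatch.
template <typename Derived>
struct IncUsingConstantSketch {
  template <typename I>
  void operator()(std::atomic<I>* dest) const {
    static_assert(std::is_integral<I>::value, "sketch covers integral types only");
    typedef std::integral_constant<I, 1> Adjustment;
    // "template" tells the compiler that inc is a member template of a
    // dependent type, so the following "<" starts a template argument list.
    static_cast<const Derived*>(this)->template inc<I, Adjustment>(dest);
  }
};

// Layer 2 stand-in for a platform specialization: it only has to define inc.
struct PlatformIncSketch : IncUsingConstantSketch<PlatformIncSketch> {
  template <typename I, typename Adjustment>
  void inc(std::atomic<I>* dest) const {
    // A real port might pass Adjustment::value as an immediate to inline asm.
    dest->fetch_add(Adjustment::value);
  }
};

// Layer 1 stand-in: the public entry point forwards to the platform object.
template <typename I>
void atomic_inc_sketch(std::atomic<I>* dest) {
  PlatformIncSketch()(dest);
}

// Small usage example for the entry point above.
inline void atomic_inc_demo() {
  std::atomic<int> hits(0);
  atomic_inc_sketch(&hits);
  assert(hits.load() == 1);
}
```

The static_cast in the base class is what the thread refers to as devirtualizing the call: the target of inc is known at compile time, so it can be inlined.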
>>>>>>>>>> >>>>>>>>>> It is worth noticing that the generalized Atomic::dec >>>>>>>>>> operation assumes a two's complement integer machine and >>>>>>>>>> potentially sends the unary negative of a potentially >>>>>>>>>> unsigned type to Atomic::add. I have the following comments >>>>>>>>>> about this: >>>>>>>>>> 1) We already assume in other code that two's complement >>>>>>>>>> integers must be present. >>>>>>>>>> 2) A machine that does not have two's complement integers may >>>>>>>>>> still simply provide a specialization that solves the problem >>>>>>>>>> in a different way. >>>>>>>>>> 3) The alternative that does not make assumptions about that >>>>>>>>>> would use the good old IntegerTypes::cast_to_signed >>>>>>>>>> metaprogramming stuff, and I seem to recall we thought that >>>>>>>>>> was a bit too involved and complicated. >>>>>>>>>> This is the reason why I have chosen to use unary minus on >>>>>>>>>> the potentially unsigned type in the shared helper code that >>>>>>>>>> sends the decrement as an addend to Atomic::add. >>>>>>>>>> >>>>>>>>>> It would also be nice if somebody with access to PPC and s390 >>>>>>>>>> machines could try out the relevant changes there so I do not >>>>>>>>>> accidentally break those platforms. I have blind-coded the >>>>>>>>>> addition of the immediate values passed in to the inline >>>>>>>>>> assembly in a way that I think looks like it should work. 
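The idea of sending the decrement to an add operation as a negated addend, mentioned earlier in this message, can be sketched as below. It is an illustration only, assuming std::atomic in place of Atomic::add; the name dec_via_add_sketch is hypothetical. For unsigned types the wraparound used here is the modular arithmetic that C++ defines for unsigned integers.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper (not HotSpot code): implements "decrement" by adding the
// unary negative of 1 in the destination's own type.
template <typename I>
void dec_via_add_sketch(std::atomic<I>* dest) {
  // For unsigned I this is the maximum value of I; adding it wraps around and
  // leaves the stored value one lower. For signed I it is simply -1.
  const I minus_one = static_cast<I>(-I(1));
  dest->fetch_add(minus_one);
}

// Small usage example for the helper above.
inline void dec_via_add_demo() {
  std::atomic<unsigned int> refs(5u);
  dec_via_add_sketch(&refs);
  assert(refs.load() == 4u);
}
```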
>>>>>>>>>> >>>>>>>>>> Testing: >>>>>>>>>> RBT hs-tier3, JPRT --testset hotspot >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> /Erik >>>>>>>> >>>>>> >>>> >> > From david.holmes at oracle.com Wed Sep 20 00:05:43 2017 From: david.holmes at oracle.com (David Holmes) Date: Wed, 20 Sep 2017 10:05:43 +1000 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59BF8737.4070804@oracle.com> References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> <59A93B53.9010505@oracle.com> <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com> <59ACFD76.3000606@oracle.com> <8af30028-5ab1-5623-21c7-f6e2ce2ab8cb@oracle.com> <59AE775E.1070503@oracle.com> <59BF8737.4070804@oracle.com> Message-ID: <62f0ce2a-3e50-9764-3dae-48d704d4c1a2@oracle.com> Hi Erik, It's a Thumbs Up from me! On 18/09/2017 6:43 PM, Erik Österlund wrote: > Hi, > > After some off-list discussions I have made a new version with the > following improvements: > > 1) Added some comments describing the constraints on the types passed in > to inc/dec (integral or pointer, and pointers are scaled). Looks fine. Though from previous discussions isn't it the case that we never actually do any of those ptr inc/dec operations? > 2) Removed inc_ptr/dec_ptr and all uses of it. None of these actually > used pointers, only pointer sized integers. So I thought removing these > overloads and the unnecessary confusion caused by them would make it > easier to review this change. Yes! Thank you. This was extremely confusing and misleading. I really should try and dig into the history of this ... but don't have time :( > 3) Renamed the typedef in the body representing the addend to be called > I instead of T to be consistent with the convention Kim introduced. Ok.
Thanks, David From erik.osterlund at oracle.com Wed Sep 20 07:10:59 2017 From: erik.osterlund at oracle.com (Erik Österlund) Date: Wed, 20 Sep 2017 09:10:59 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <6778adf3-c943-6789-6b10-5917a94f74a9@oracle.com> References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> <59A93B53.9010505@oracle.com> <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com> <59ACFD76.3000606@oracle.com> <8af30028-5ab1-5623-21c7-f6e2ce2ab8cb@oracle.com> <59AE775E.1070503@oracle.com> <59BF8737.4070804@oracle.com> <6778adf3-c943-6789-6b10-5917a94f74a9@oracle.com> Message-ID: Hi Coleen, Thanks for the review. /Erik > On 20 Sep 2017, at 01:19, coleen.phillimore at oracle.com wrote: > > > Erik, This looks like a very nice cleanup! > Thanks, > Coleen > > On 9/18/17 4:43 AM, Erik Österlund wrote: >> Hi, >> >> After some off-list discussions I have made a new version with the following improvements: >> >> 1) Added some comments describing the constraints on the types passed in to inc/dec (integral or pointer, and pointers are scaled). >> 2) Removed inc_ptr/dec_ptr and all uses of it. None of these actually used pointers, only pointer sized integers. So I thought removing these overloads and the unnecessary confusion caused by them would make it easier to review this change. >> 3) Renamed the typedef in the body representing the addend to be called I instead of T to be consistent with the convention Kim introduced.
>> >> Full webrev: >> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.02/ >> >> Incremental webrev: >> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.01_02/ >> >> Thanks, >> /Erik >> >> On 2017-09-05 12:07, Erik Österlund wrote: >>> Hi David, >>> >>> On 2017-09-04 23:59, David Holmes wrote: >>>> Hi Erik, >>>> >>>> On 4/09/2017 5:15 PM, Erik Österlund wrote: >>>>> Hi David, >>>>> >>>>> On 2017-09-04 03:24, David Holmes wrote: >>>>>> Hi Erik, >>>>>> >>>>>> On 1/09/2017 8:49 PM, Erik Österlund wrote: >>>>>>> Hi David, >>>>>>> >>>>>>> The shared structure for all operations is the following: >>>>>>> >>>>>>> An Atomic::something call creates a SomethingImpl function object that performs some basic type checking and then forwards the call straight to a PlatformSomething function object. This PlatformSomething object could decide to do anything. But to make life easier, it may inherit from a shared SomethingHelper function object with CRTP that calls back into the PlatformSomething function object to emit inline assembly. >>>>>> >>>>>> Right, but! Let's look at some details. >>>>>> >>>>>> Atomic::add >>>>>> AddImpl >>>>>> PlatformAdd >>>>>> FetchAndAdd >>>>>> AddAndFetch >>>>>> add_using_helper >>>>>> >>>>>> Atomic::cmpxchg >>>>>> CmpxchgImpl >>>>>> PlatformCmpxchg >>>>>> cmpxchg_using_helper >>>>>> >>>>>> Atomic::inc >>>>>> IncImpl >>>>>> PlatformInc >>>>>> IncUsingConstant >>>>>> >>>>>> Why is it that the simplest operation (inc/dec) has the most complex platform template definition? Why do we need Adjustment? You previously said "Adjustment represents the increment/decrement value as an IntegralConstant - your template friend for passing around a constant with both a specified type and value in templates". But add passes around values and doesn't need this. Further inc/dec don't need to pass anything around anywhere - inc adds 1, dec subtracts 1!
This "1" does not need to appear anywhere in the API or get passed across layers - the only place this "1" becomes evident is in the actual platform asm that does the logic of "add 1" or "subtract 1". >>>>>> >>>>>> My understanding from previous discussions is that much of the template machinations was to deal with type management for "dest" and the values being passed around. But here, for inc/dec there are no values being passed so we don't have to make "dest" type-compatible with any value. >>>>> >>>>> Dealing with different types being passed in is one part of the problem - a problem that almost all operations seems to have. But Atomic::add and inc/dec have more problems to deal with. >>>>> >>>>> The Atomic::add operation has two more problems that cmpxchg does not have. >>>>> 1) It needs to scale pointer arithmetic. So if you have a P* and you add it by 2, then you really add the underlying value by 2 * sizeof(P), and the scaled addend needs to be of the right type - the type of the destination for integral types and ptrdiff_t for pointers. This is similar semantics to ++pointer. >>>> >>>> I'll address this below - but yes I overlooked this aspect. >>>> >>>>> 2) It connects backends with different semantics - either fetch_and_add or add_and_fetch to a common public interface with add_and_fetch semantics. >>>> >>>> Not at all clear why this has to manifest in the upper/middle layers instead of being handled by the actual lowest-layer ?? >>> >>> It could have been addressed in the lowest layer indeed. I suppose Kim found it nicer to do that on a higher level while you find it nicer to do it on a lower level. I have no opinion here. >>> >>>> >>>>> This is the reason that Atomic::add might appear more complicated than Atomic::cmpxchg. Because Atomic::cmpxchg only had the different type problems to deal with - no pointer arithmetics. 
>>>>> >>>>> The reason why Atomic::inc/dec looks more complicated than Atomic::add is that it needs to preserve the pointer arithmetic as constants rather than values, because the scaled addend is embedded in the inline assembly as immediate values. Therefore it passes around an IntegralConstant that embeds both the type and size of the addend. And it is not just 1/-1. For integral destinations the constant used is 1/-1 of the type stored at the destination. For pointers the constant is ptrdiff_t with a value representing the size of the element pointed to. >>>> >>>> This is insanely complicated (I think that counts as 'accidental complexity' per Andrew's comment ;-) ). Pointer arithmetic is a basic/fundamental part of C/C++, yet this template stuff has to jump through multiple inverted hoops to do something the language "just does"! All this complexity to manage a conversion addend -> addend * sizeof(*dest) ?? >>> >>> Okay. >>> >>>> And the fact that inc/dec are simpler than add, yet result in far more complicated templates because the simpler addend is a constant, is just as unfathomable to me! >>> >>> My latest proposal is to nuke the Atomic::inc/dec specializations and make it call Atomic::add. Any objections on that? It is arguably simpler, and then we can leave the complexity discussion behind. >>> >>>>> Having said that - I am not opposed to simply removing the specializations of inc/dec if we are scared of the complexity of passing this constant to the platform layer. After running a bunch of benchmarks over the weekend, it showed no significant regressions after removal. Now of course that might not tell the full story - it could have missed that some critical operation in the JVM takes longer. But I would be very surprised if that was the case. >>>> >>>> I can imagine we use an "add immediate" form for inc/dec of 1, do we actually use that for other values? 
I would expect inc_ptr/dec_ptr to always translate to add_ptr, with no special case for when ptr is char* and so we only add/sub 1. ??
>>>
>>> Yes we currently only inc/sub by 1.
>>>
>>> Thanks,
>>> /Erik
>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> Thanks,
>>>>> /Erik
>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> Hope this explanation helps understanding the intended structure of this work.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> /Erik
>>>>>>>
>>>>>>> On 2017-09-01 12:34, David Holmes wrote:
>>>>>>>> Hi Erik,
>>>>>>>>
>>>>>>>> I just wanted to add that I would expect the cmpxchg, add and inc, Atomic API's to all require similar basic structure for manipulating types/values etc, yet all three seem to have quite different structures that I find very confusing. I'm still at a loss to fathom the CRTP and the hoops we seemingly have to jump through just to add or subtract 1!!!
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> David
>>>>>>>>
>>>>>>>> On 1/09/2017 7:29 PM, Erik Österlund wrote:
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>> On 2017-09-01 02:49, David Holmes wrote:
>>>>>>>>>> Hi Erik,
>>>>>>>>>>
>>>>>>>>>> Sorry but this one is really losing me.
>>>>>>>>>>
>>>>>>>>>> What is the role of Adjustment ??
>>>>>>>>>
>>>>>>>>> Adjustment represents the increment/decrement value as an IntegralConstant - your template friend for passing around a constant with both a specified type and value in templates. The type of the increment/decrement is the type of the destination when the destination is an integral type, otherwise if it is a pointer type, the increment/decrement type is ptrdiff_t.
>>>>>>>>>
>>>>>>>>>> How are inc/dec anything but "using constant" ??
>>>>>>>>>
>>>>>>>>> I was also a bit torn on that name (I assume you are referring to IncUsingConstant/DecUsingConstant). It was hard to find a name that depicted what this platform helper does.
I considered calling the helper something with immediate in the name because it is really used to embed the constant as immediate values in inline assembly today. But then again that seemed too specific, as it is not completely obvious platform specializations will use it in that way. One might just want to specialize this to send it into some compiler Atomic::inc intrinsic for example. Do you have any other preferred names? Here are a few possible names for IncUsingConstant: >>>>>>>>> >>>>>>>>> IncUsingScaledConstant >>>>>>>>> IncUsingAdjustedConstant >>>>>>>>> IncUsingPlatformHelper >>>>>>>>> >>>>>>>>> Any favourites? >>>>>>>>> >>>>>>>>>> Why do we special case jshort?? >>>>>>>>> >>>>>>>>> To be consistent with the special case of Atomic::add on jshort. Do you want it removed? >>>>>>>>> >>>>>>>>>> This is indecipherable to normal people ;-) >>>>>>>>>> >>>>>>>>>> This()->template inc(dest); >>>>>>>>>> >>>>>>>>>> For something as trivial as adding or subtracting 1 the template machinations here are just mind boggling! >>>>>>>>> >>>>>>>>> This uses the CRTP (Curiously Recurring Template Pattern) C++ idiom. The idea is to devirtualize a virtual call by passing in the derived type as a template parameter to a base class, and then let the base class static_cast to the derived class to devirtualize the call. I hope this explanation sheds some light on what is going on. The same CRTP idiom was used in the Atomic::add implementation in a similar fashion. >>>>>>>>> >>>>>>>>> I will add some comments describing this in the next round after Coleen replies. >>>>>>>>> >>>>>>>>> Thanks for looking at this. 
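For readers puzzling over the This()->template inc(dest) line discussed above, a minimal standalone CRTP sketch may help. It is not the webrev code: IncHelperSketch and PlatformIncSketch are hypothetical names, std::atomic stands in for the inline assembly, and the adjustment is passed as a plain int template argument rather than an IntegralConstant.

```cpp
#include <atomic>
#include <cassert>

// Shared helper (hypothetical name). It knows nothing about the platform,
// but calls back into Derived without any virtual dispatch.
template <typename Derived>
struct IncHelperSketch {
  template <typename T>
  void operator()(std::atomic<T>* dest) const {
    // The static_cast to the derived type is resolved at compile time.
    // The "template" keyword is required because inc is a member template
    // of a type that depends on the template parameter Derived.
    static_cast<const Derived*>(this)->template inc<T, 1>(dest);
  }
};

// "Platform" side (hypothetical): it only has to supply inc(), and it
// receives the adjustment as a compile-time constant.
struct PlatformIncSketch : IncHelperSketch<PlatformIncSketch> {
  template <typename T, int Adjustment>
  void inc(std::atomic<T>* dest) const {
    dest->fetch_add(static_cast<T>(Adjustment));
  }
};
```

Calling a PlatformIncSketch object on a std::atomic<int> goes through the base-class operator(), which forwards to the derived inc with the constant 1 baked in at compile time.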
>>>>>>>>>
>>>>>>>>> /Erik
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>> On 31/08/2017 10:45 PM, Erik Österlund wrote:
>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>
>>>>>>>>>>> Bug ID:
>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8186838
>>>>>>>>>>>
>>>>>>>>>>> Webrev:
>>>>>>>>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>>>>>>>>>>>
>>>>>>>>>>> The time has come for the next step in generalizing Atomic with templates. Today I will focus on Atomic::inc/dec.
>>>>>>>>>>>
>>>>>>>>>>> I have tried to mimic the new Kim style that seems to have been universally accepted. Like Atomic::add and Atomic::cmpxchg, the structure looks like this:
>>>>>>>>>>>
>>>>>>>>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object that performs some basic type checks.
>>>>>>>>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can define the operation arbitrarily for a given platform. The default implementation if not specialized for a platform is to call Atomic::add. So only platforms that want to do something different than that as an optimization have to provide a specialization.
>>>>>>>>>>> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec to be more optimized may inherit from a helper class IncUsingConstant/DecUsingConstant. This helper helps performing the necessary computation what the increment/decrement should be after pointer scaling using CRTP. The PlatformInc/PlatformDec operation then only needs to define an inc/dec member function, and will then get all the context information necessary to generate a more optimized implementation. Easy peasy.
>>>>>>>>>>>
>>>>>>>>>>> It is worth noticing that the generalized Atomic::dec operation assumes a two's complement integer machine and potentially sends the unary negative of a potentially unsigned type to Atomic::add.
I have the following comments about this: >>>>>>>>>>> 1) We already assume in other code that two's complement integers must be present. >>>>>>>>>>> 2) A machine that does not have two's complement integers may still simply provide a specialization that solves the problem in a different way. >>>>>>>>>>> 3) The alternative that does not make assumptions about that would use the good old IntegerTypes::cast_to_signed metaprogramming stuff, and I seem to recall we thought that was a bit too involved and complicated. >>>>>>>>>>> This is the reason why I have chosen to use unary minus on the potentially unsigned type in the shared helper code that sends the decrement as an addend to Atomic::add. >>>>>>>>>>> >>>>>>>>>>> It would also be nice if somebody with access to PPC and s390 machines could try out the relevant changes there so I do not accidentally break those platforms. I have blind-coded the addition of the immediate values passed in to the inline assembly in a way that I think looks like it should work. 
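The unary-minus approach to Atomic::dec described above can be tried out in isolation. The sketch below is an assumption-laden illustration, not the shared helper from the webrev: dec_via_add is a hypothetical name and std::atomic::fetch_add stands in for Atomic::add. For unsigned T the C++ standard already defines the wrap-around (arithmetic modulo 2^N), so the two's complement assumption mainly matters where the same bit pattern is later treated as a signed value.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical generic decrement expressed as "add minus one".
template <typename T>
T dec_via_add(std::atomic<T>& dest) {
  // For unsigned T, negating 1 and converting back to T gives the maximum
  // value of T; adding that is the same as subtracting 1 modulo 2^N.
  const T minus_one = static_cast<T>(-static_cast<T>(1));
  // fetch_add returns the old value; adding the addend again yields
  // add_and_fetch semantics (the new value).
  return static_cast<T>(dest.fetch_add(minus_one) + minus_one);
}
```

The same template covers signed and unsigned destinations, which is the property the generalized dec relies on when it forwards the negated constant as an addend.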
>>>>>>>>>>>
>>>>>>>>>>> Testing:
>>>>>>>>>>> RBT hs-tier3, JPRT --testset hotspot
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> /Erik
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>>
>

From erik.osterlund at oracle.com Wed Sep 20 07:15:17 2017
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Wed, 20 Sep 2017 09:15:17 +0200
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <62f0ce2a-3e50-9764-3dae-48d704d4c1a2@oracle.com>
References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> <59A93B53.9010505@oracle.com> <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com> <59ACFD76.3000606@oracle.com> <8af30028-5ab1-5623-21c7-f6e2ce2ab8cb@oracle.com> <59AE775E.1070503@oracle.com> <59BF8737.4070804@oracle.com> <62f0ce2a-3e50-9764-3dae-48d704d4c1a2@oracle.com>
Message-ID:

Hi David,

> On 20 Sep 2017, at 02:05, David Holmes wrote:
>
> Hi Erik,
>
> It's a Thumbs Up from me!

Glad to hear it.

> On 18/09/2017 6:43 PM, Erik Österlund wrote:
>> Hi,
>> After some off-list discussions I have made a new version with the following improvements:
>> 1) Added some comments describing the constraints on the types passed in to inc/dec (integral or pointer, and pointers are scaled).
>
> Looks fine. Though from previous discussions isn't it the case that we never actually do any of those ptr inc/dec operations?

At the moment we do not do it - true, but perhaps tomorrow somebody might, and then I don't want the user of the API to get surprised.

>
>> 2) Removed inc_ptr/dec_ptr and all uses of it. None of these actually used pointers, only pointer sized integers. So I thought removing these overloads and the unnecessary confusion caused by them would make it easier to review this change.
>
> Yes! Thank you. This was extremely confusing and misleading. I really should try and dig into the history of this ...
but don't have time :( :)
>
>> 3) Renamed the typedef in the body representing the addend to be called I instead of T to be consistent with the convention Kim introduced.
>
> Ok.

Thanks for the review.

/Erik

> Thanks,
> David
> -----
>
>> Full webrev:
>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.02/
>> Incremental webrev:
>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.01_02/
>> Thanks,
>> /Erik

From tobias.hartmann at oracle.com Wed Sep 20 07:36:58 2017
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 20 Sep 2017 09:36:58 +0200
Subject: CFV: New hotspot Group Member: Markus Gronlund
In-Reply-To:
References:
Message-ID: <041d766f-5c69-d538-b5f9-e0b0f26578e4@oracle.com>

Vote: yes

On 19.09.2017 19:55, coleen.phillimore at oracle.com wrote:
> I hereby nominate Markus Gronlund (OpenJDK user name: mgronlun) to Membership in the hotspot Group.
>
> Markus has been working on the hotspot project for over 5 years and is a Reviewer in the JDK 9 Project with 51
> changes. He is an expert in the area of event based tracing of Java programs.
>
> Votes are due by Tuesday, October 3, 2017.
>
> Only current Members of the hotspot Group [1] are eligible to vote on this nomination.
Votes must be cast in the open by > replying to this mailing list. > > For Lazy Consensus voting instructions, see [2]. > > Coleen > > [1]http://openjdk.java.net/census#hotspot > [2]http://openjdk.java.net/groups/#member-vote > > > From volker.simonis at gmail.com Wed Sep 20 07:58:21 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 20 Sep 2017 09:58:21 +0200 Subject: CFV: New hotspot Group Member: Markus Gronlund In-Reply-To: References: Message-ID: Vote: yes On Tue, Sep 19, 2017 at 7:55 PM, wrote: > I hereby nominate Markus Gronlund (OpenJDK user name: mgronlun) to > Membership in the hotspot Group. > > Markus has been working on the hotspot project for over 5 years and is a > Reviewer in the JDK 9 Project with 51 changes. He is an expert in the area > of event based tracing of Java programs. > > Votes are due by Tuesday, October 3, 2017. > > Only current Members of the hotspot Group [1] are eligible to vote on this > nomination. Votes must be cast in the open by replying to this mailing list. > > For Lazy Consensus voting instructions, see [2]. > > Coleen > > [1]http://openjdk.java.net/census#hotspot > [2]http://openjdk.java.net/groups/#member-vote > > > From erik.helin at oracle.com Wed Sep 20 09:39:57 2017 From: erik.helin at oracle.com (Erik Helin) Date: Wed, 20 Sep 2017 11:39:57 +0200 Subject: RFR: 8187667: Disable deprecation warning for readdir_r In-Reply-To: <0bc6d706-18f1-eddc-5cf1-ccadd52b12d7@oracle.com> References: <5081712c-9c62-5b6a-2e43-9b8d6e3ca64a@oracle.com> <0bc6d706-18f1-eddc-5cf1-ccadd52b12d7@oracle.com> Message-ID: <6c709bb7-5e93-204b-3065-b8a4bf32843b@oracle.com> On 09/19/2017 11:05 PM, David Holmes wrote: > Hi Erik, > > Reviewed! Thanks David for reviewing! Erik > On 19/09/2017 10:42 PM, Erik Helin wrote: >> Hi all, >> >> I'm continuing to run into some small problems when compiling HotSpot >> with a more recent toolchain. It seems like readdir_r [0] has been >> deprecated beginning with glibc 2.24 [1]. 
In HotSpot, we use readdir_r
>> for os::readdir on Linux (defined in os_linux.inline.hpp). Since
>> readdir_r most likely will stay around for a long time in glibc (even
>> though in deprecated form), I figured it was best to just silence the
>> deprecation warning from gcc. If readdir_r finally is removed one day,
>> then we might have to look up the appropriate readdir function using
>> dlopen, dlsym etc.
>
> I find it very odd that they have deprecated the thread-safe variant of
> this function, and recommend use of the basic readdir. I can only assume
> they have made readdir itself thread-safe, but given the POSIX spec does
> not require that, no one can take advantage without locking into knowing
> which glibc version they are running on! That seems an awful mess for
> programmers.
>
> It is a good idea to just keep using it.
>
>> Patch:
>> http://cr.openjdk.java.net/~ehelin/8187667/00/
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8187667
>>
>> Testing:
>> - Compiles with:
>>    - gcc 7.1.1 and glibc 2.25 on Fedora 26
>>    - gcc 4.9.2 and glibc 2.12 on OEL 6.4
>> - JPRT
>
> The change will cause warnings on gcc < 4.6, so this reinforces the need
> to switch to our minimum gcc version ... which I forget :)
>
> Thanks,
> David
>
>
>> Thanks,
>> Erik
>>
>> [0]:
>> http://pubs.opengroup.org/onlinepubs/009695399/functions/readdir.html
>> [1]: https://sourceware.org/bugzilla/show_bug.cgi?id=19056

From stefan.karlsson at oracle.com Wed Sep 20 11:37:35 2017
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Wed, 20 Sep 2017 13:37:35 +0200
Subject: CFV: New hotspot Group Member: Markus Gronlund
In-Reply-To:
References:
Message-ID:

Vote: yes

On 2017-09-19 19:55, coleen.phillimore at oracle.com wrote:
> I hereby nominate Markus Gronlund (OpenJDK user name: mgronlun) to
> Membership in the hotspot Group.
>
> Markus has been working on the hotspot project for over 5 years and is a
> Reviewer in the JDK 9 Project with 51 changes.
He is an expert in the > area of event based tracing of Java programs. > > Votes are due by Tuesday, October 3, 2017. > > Only current Members of the hotspot Group [1] are eligible to vote on > this nomination. Votes must be cast in the open by replying to this > mailing list. > > For Lazy Consensus voting instructions, see [2]. > > Coleen > > [1]http://openjdk.java.net/census#hotspot > [2]http://openjdk.java.net/groups/#member-vote > > > From david.holmes at oracle.com Wed Sep 20 11:43:02 2017 From: david.holmes at oracle.com (David Holmes) Date: Wed, 20 Sep 2017 21:43:02 +1000 Subject: OpenJDK OOM issue - In-Reply-To: References: <99906d91-dea4-dd5f-2d35-f50f9f5264de@redhat.com> Message-ID: Tim, Please note attachments get stripped from the mailing lists. All - please drop the jdk8-dev and jdk8u-dev mailing lists from this and leave it just on hotspot-dev. I've tried to bcc those lists. Thank you. David On 20/09/2017 6:44 PM, Yu, Tim (NSB - CN/Chengdu) wrote: > Hi All > > Thank you all for the quick response. > The environment information is listed as below, could you please help to further check? > > 1. What OS is this? > # cat /etc/redhat-release > Red Hat Enterprise Linux Server release 6.9 (Santiago) > # uname -a > Linux cloudyvm16 2.6.32-696.6.3.el6.x86_64 #1 SMP Fri Jun 30 13:24:18 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux > > 2.GC log is listed as below. The heap information cannot be printed out in gc-2017_09_20-09_21_15.log when OOM happens. In gc-2017_09_20-09_21_17.log, you can see the heap begins with 0x0000000787380000 and it should be not the first 4G virtual memory address. 
> -rw-r--r-- 1 19477 Sep 20 09:21 hs_err_pid12678.log > -rw-r--r-- 1 570 Sep 20 09:21 gc-2017_09_20-09_21_15.log > -rw-r--r-- 1 17741 Sep 20 09:21 hs_err_pid12706.log > -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_17.log > -rw-r--r-- 1 1722 Sep 20 09:21 gc-2017_09_20-09_21_18.log > -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_19.log > -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_20.log > > 3. This issue happens occasionally but frequently. We periodically launch a JAVA program to use JMX to monitor service status of another JAVA service. > > Br, > Tim > > > > > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Tuesday, September 19, 2017 9:13 PM > To: Yu, Tim (NSB - CN/Chengdu) ; jdk8-dev at openjdk.java.net; jdk8u-dev at openjdk.java.net > Cc: Shen, David (NSB - CN/Chengdu) > Subject: Re: OpenJDK OOM issue - > > On 19/09/17 09:50, Yu, Tim (NSB - CN/Chengdu) wrote: >> Hi OpenJDK dev group >> >> We meet one issue that the VM failed to initialize. The error log is as below. We checked both memory usage and thread number. They do not hit the limit. So could you please help to confirm why "java.lang.OutOfMemoryError: unable to create new native thread" error occurs? Many thanks. > > What OS is this? > From karen.kinnear at oracle.com Wed Sep 20 12:10:59 2017 From: karen.kinnear at oracle.com (Karen Kinnear) Date: Wed, 20 Sep 2017 08:10:59 -0400 Subject: CFV: New hotspot Group Member: Markus Gronlund In-Reply-To: References: Message-ID: vote: yes Karen > On Sep 19, 2017, at 1:55 PM, coleen.phillimore at oracle.com wrote: > > I hereby nominate Markus Gronlund (OpenJDK user name: mgronlun) to Membership in the hotspot Group. > > Markus has been working on the hotspot project for over 5 years and is a Reviewer in the JDK 9 Project with 51 changes. He is an expert in the area of event based tracing of Java programs. > > Votes are due by Tuesday, October 3, 2017. 
>
> Only current Members of the hotspot Group [1] are eligible to vote on this nomination. Votes must be cast in the open by replying to this mailing list.
>
> For Lazy Consensus voting instructions, see [2].
>
> Coleen
>
> [1]http://openjdk.java.net/census#hotspot
> [2]http://openjdk.java.net/groups/#member-vote
>
>
>

From poonam.bajaj at oracle.com Wed Sep 20 16:15:35 2017
From: poonam.bajaj at oracle.com (Poonam Parhar)
Date: Wed, 20 Sep 2017 09:15:35 -0700 (PDT)
Subject: OpenJDK OOM issue -
In-Reply-To:
References: <99906d91-dea4-dd5f-2d35-f50f9f5264de@redhat.com>
Message-ID: <27ce9fc2-2e76-4391-a38c-8d068f1a6dbd@default>

Hello Tim,

From the hs_err_pid12678.log file, the java heap is based at 0x715a00000 which is 28gb, so there should be plenty of space available for the native heap.

Memory map:
...
00600000-00601000 rw-p 00000000 fc:01 17950 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131-0.b11.el6_9.x86_64/jre/bin/java
019bc000-019dd000 rw-p 00000000 00:00 0 [heap]
715a00000-71cd00000 rw-p 00000000 00:00 0
71cd00000-787380000 ---p 00000000 00:00 0
...

Heap
 PSYoungGen total 51200K, used 13258K [0x0000000787380000, 0x000000078ac80000, 0x00000007c0000000)
  eden space 44032K, 30% used [0x0000000787380000,0x0000000788072ad0,0x0000000789e80000)
  from space 7168K, 0% used [0x000000078a580000,0x000000078a580000,0x000000078ac80000)
  to space 7168K, 0% used [0x0000000789e80000,0x0000000789e80000,0x000000078a580000)
 ParOldGen total 117760K, used 0K [0x0000000715a00000, 0x000000071cd00000, 0x0000000787380000)
  object space 117760K, 0% used [0x0000000715a00000,0x0000000715a00000,0x000000071cd00000)
 Metaspace used 10485K, capacity 10722K, committed 11008K, reserved 1058816K
  class space used 1125K, capacity 1227K, committed 1280K, reserved 1048576K

To narrow down the issue, would it be possible for you to test with -XX:-UseCompressedOops?
Thanks, Poonam > -----Original Message----- > From: David Holmes > Sent: Wednesday, September 20, 2017 4:43 AM > To: Yu, Tim (NSB - CN/Chengdu); Andrew Haley; Poonam Parhar; hotspot- > dev developers > Cc: Shen, David (NSB - CN/Chengdu) > Subject: Re: OpenJDK OOM issue - > > Tim, > > Please note attachments get stripped from the mailing lists. > > All - please drop the jdk8-dev and jdk8u-dev mailing lists from this > and leave it just on hotspot-dev. I've tried to bcc those lists. > > Thank you. > > David > > On 20/09/2017 6:44 PM, Yu, Tim (NSB - CN/Chengdu) wrote: > > Hi All > > > > Thank you all for the quick response. > > The environment information is listed as below, could you please help > to further check? > > > > 1. What OS is this? > > # cat /etc/redhat-release > > Red Hat Enterprise Linux Server release 6.9 (Santiago) # uname -a > > Linux cloudyvm16 2.6.32-696.6.3.el6.x86_64 #1 SMP Fri Jun 30 13:24:18 > > EDT 2017 x86_64 x86_64 x86_64 GNU/Linux > > > > 2.GC log is listed as below. The heap information cannot be printed > out in gc-2017_09_20-09_21_15.log when OOM happens. In gc-2017_09_20- > 09_21_17.log, you can see the heap begins with 0x0000000787380000 and > it should be not the first 4G virtual memory address. > > -rw-r--r-- 1 19477 Sep 20 09:21 hs_err_pid12678.log > > -rw-r--r-- 1 570 Sep 20 09:21 gc-2017_09_20-09_21_15.log > > -rw-r--r-- 1 17741 Sep 20 09:21 hs_err_pid12706.log > > -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_17.log > > -rw-r--r-- 1 1722 Sep 20 09:21 gc-2017_09_20-09_21_18.log > > -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_19.log > > -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_20.log > > > > 3. This issue happens occasionally but frequently. We periodically > launch a JAVA program to use JMX to monitor service status of another > JAVA service. 
> > > > Br, > > Tim > > > > > > > > > > -----Original Message----- > > From: Andrew Haley [mailto:aph at redhat.com] > > Sent: Tuesday, September 19, 2017 9:13 PM > > To: Yu, Tim (NSB - CN/Chengdu) ; > > jdk8-dev at openjdk.java.net; jdk8u-dev at openjdk.java.net > > Cc: Shen, David (NSB - CN/Chengdu) > > Subject: Re: OpenJDK OOM issue - > > > > On 19/09/17 09:50, Yu, Tim (NSB - CN/Chengdu) wrote: > >> Hi OpenJDK dev group > >> > >> We meet one issue that the VM failed to initialize. The error log is > as below. We checked both memory usage and thread number. They do not > hit the limit. So could you please help to confirm why > "java.lang.OutOfMemoryError: unable to create new native thread" error > occurs? Many thanks. > > > > What OS is this? > > From vladimir.kozlov at oracle.com Wed Sep 20 16:30:10 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 20 Sep 2017 09:30:10 -0700 Subject: OpenJDK OOM issue - In-Reply-To: <27ce9fc2-2e76-4391-a38c-8d068f1a6dbd@default> References: <99906d91-dea4-dd5f-2d35-f50f9f5264de@redhat.com> <27ce9fc2-2e76-4391-a38c-8d068f1a6dbd@default> Message-ID: <4b587954-1a90-e04d-9766-8f22f803e57e@oracle.com> On Linux we should not have the low-address Java heap problem. Small swap space? Also, free memory is less than 1 GB ("MemFree: 898332 kB"). Also, 5326 processes is a lot. Overloaded system? Poonam, do we have a bug for this? Can you attach the hs_err file to it? Thanks, Vladimir On 9/20/17 9:15 AM, Poonam Parhar wrote: > Hello Tim, > > From the hs_err_pid12678.log file, the java heap is based at 0x715a00000 which is 28gb, so there should be plenty of space available for the native heap. > > Memory map: > ... > 00600000-00601000 rw-p 00000000 fc:01 17950 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131-0.b11.el6_9.x86_64/jre/bin/java > 019bc000-019dd000 rw-p 00000000 00:00 0 [heap] > 715a00000-71cd00000 rw-p 00000000 00:00 0 > 71cd00000-787380000 ---p 00000000 00:00 0 > ...
> > Heap > PSYoungGen total 51200K, used 13258K [0x0000000787380000, 0x000000078ac80000, 0x00000007c0000000) > eden space 44032K, 30% used [0x0000000787380000,0x0000000788072ad0,0x0000000789e80000) > from space 7168K, 0% used [0x000000078a580000,0x000000078a580000,0x000000078ac80000) > to space 7168K, 0% used [0x0000000789e80000,0x0000000789e80000,0x000000078a580000) > ParOldGen total 117760K, used 0K [0x0000000715a00000, 0x000000071cd00000, 0x0000000787380000) > object space 117760K, 0% used [0x0000000715a00000,0x0000000715a00000,0x000000071cd00000) > Metaspace used 10485K, capacity 10722K, committed 11008K, reserved 1058816K > class space used 1125K, capacity 1227K, committed 1280K, reserved 1048576K > > > To narrow down the issue, would it be possible for you to test with -XX:-UseCompressedOops? > > Thanks, > Poonam > >> -----Original Message----- >> From: David Holmes >> Sent: Wednesday, September 20, 2017 4:43 AM >> To: Yu, Tim (NSB - CN/Chengdu); Andrew Haley; Poonam Parhar; hotspot- >> dev developers >> Cc: Shen, David (NSB - CN/Chengdu) >> Subject: Re: OpenJDK OOM issue - >> >> Tim, >> >> Please note attachments get stripped from the mailing lists. >> >> All - please drop the jdk8-dev and jdk8u-dev mailing lists from this >> and leave it just on hotspot-dev. I've tried to bcc those lists. >> >> Thank you. >> >> David >> >> On 20/09/2017 6:44 PM, Yu, Tim (NSB - CN/Chengdu) wrote: >>> Hi All >>> >>> Thank you all for the quick response. >>> The environment information is listed as below, could you please help >> to further check? >>> >>> 1. What OS is this? >>> # cat /etc/redhat-release >>> Red Hat Enterprise Linux Server release 6.9 (Santiago) # uname -a >>> Linux cloudyvm16 2.6.32-696.6.3.el6.x86_64 #1 SMP Fri Jun 30 13:24:18 >>> EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >>> >>> 2.GC log is listed as below. The heap information cannot be printed >> out in gc-2017_09_20-09_21_15.log when OOM happens. 
In gc-2017_09_20- >> 09_21_17.log, you can see the heap begins with 0x0000000787380000 and >> it should be not the first 4G virtual memory address. >>> -rw-r--r-- 1 19477 Sep 20 09:21 hs_err_pid12678.log >>> -rw-r--r-- 1 570 Sep 20 09:21 gc-2017_09_20-09_21_15.log >>> -rw-r--r-- 1 17741 Sep 20 09:21 hs_err_pid12706.log >>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_17.log >>> -rw-r--r-- 1 1722 Sep 20 09:21 gc-2017_09_20-09_21_18.log >>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_19.log >>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_20.log >>> >>> 3. This issue happens occasionally but frequently. We periodically >> launch a JAVA program to use JMX to monitor service status of another >> JAVA service. >>> >>> Br, >>> Tim >>> >>> >>> >>> >>> -----Original Message----- >>> From: Andrew Haley [mailto:aph at redhat.com] >>> Sent: Tuesday, September 19, 2017 9:13 PM >>> To: Yu, Tim (NSB - CN/Chengdu) ; >>> jdk8-dev at openjdk.java.net; jdk8u-dev at openjdk.java.net >>> Cc: Shen, David (NSB - CN/Chengdu) >>> Subject: Re: OpenJDK OOM issue - >>> >>> On 19/09/17 09:50, Yu, Tim (NSB - CN/Chengdu) wrote: >>>> Hi OpenJDK dev group >>>> >>>> We meet one issue that the VM failed to initialize. The error log is >> as below. We checked both memory usage and thread number. They do not >> hit the limit. So could you please help to confirm why >> "java.lang.OutOfMemoryError: unable to create new native thread" error >> occurs? Many thanks. >>> >>> What OS is this? 
>>> From vladimir.kozlov at oracle.com Wed Sep 20 16:37:07 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 20 Sep 2017 09:37:07 -0700 Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults In-Reply-To: References: <4d4fe028-ea6a-4f77-ab69-5c2bc752e1f5@oracle.com> <47bc0a90-ed6a-220a-c3d1-b4df2d8bbc74@oracle.com> <9c53f889-e58e-33ac-3c05-874779b469d6@oracle.com> <45619e1a-9eb0-a540-193b-5187da3bf6bc@oracle.com> <66e4af43-c0e2-6d64-b69f-35166150ffa2@oracle.com> <11af0f62-ba6b-d533-d23c-750d2ca012c7@oracle.com> Message-ID: On 9/14/17 11:40 AM, Rohit Arul Raj wrote: > Hello Vladimir, > >> CPUID check for 0x8000001E should be explicit. Otherwise the code will be >> executed for all above 0x80000008. >> >> + __ cmpl(rax, 0x80000008); // Is cpuid(0x80000009 and above) >> supported? >> + __ jccb(Assembler::belowEqual, ext_cpuid8); >> + __ cmpl(rax, 0x8000001E); // Is cpuid(0x8000001E) supported? >> + __ jccb(Assembler::notEqual, ext_cpuid8); >> + // >> + // Extended cpuid(0x8000001E) >> + // >> > > AMD17h has CPUID 0x8000001F too, so the notEqual condition will be false. You are right. > I have modified the last statement as below: > > + __ cmpl(rax, 0x80000008); // Is cpuid(0x80000009 and above) supported? > + __ jccb(Assembler::belowEqual, ext_cpuid8); > + __ cmpl(rax, 0x8000001E); // Is cpuid(0x8000001E) supported? > + __ jccb(Assembler::below, ext_cpuid8); > > Is this OK? Yes. > > After updating the above changes, I got "Short forward jump exceeds > 8-bit offset" error while building openJDK. > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (macroAssembler_x86.hpp:116), pid=7786, tid=7787 > # guarantee(this->is8bit(imm8)) failed: Short forward jump exceeds 8-bit offset > > So I have replaced the short jump with near jump while checking for > CPUID 0x80000005. > > __ cmpl(rax, 0x80000000); // Is cpuid(0x80000001) supported? 
> __ jcc(Assembler::belowEqual, done); > __ cmpl(rax, 0x80000004); // Is cpuid(0x80000005) supported? > - __ jccb(Assembler::belowEqual, ext_cpuid1); > + __ jcc(Assembler::belowEqual, ext_cpuid1); Good. You may need to increase size of the buffer too (to be safe) to 1100: static const int stub_size = 1000; Thanks, Vladimir > > I have attached the updated, re-tested patch. > > diff --git a/src/cpu/x86/vm/vm_version_x86.cpp > b/src/cpu/x86/vm/vm_version_x86.cpp > --- a/src/cpu/x86/vm/vm_version_x86.cpp > +++ b/src/cpu/x86/vm/vm_version_x86.cpp > @@ -70,7 +70,7 @@ > bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2); > > Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4; > - Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, > done, wrapup; > + Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, > ext_cpuid8, done, wrapup; > Label legacy_setup, save_restore_except, legacy_save_restore, > start_simd_check; > > StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub"); > @@ -267,14 +267,30 @@ > __ cmpl(rax, 0x80000000); // Is cpuid(0x80000001) supported? > __ jcc(Assembler::belowEqual, done); > __ cmpl(rax, 0x80000004); // Is cpuid(0x80000005) supported? > - __ jccb(Assembler::belowEqual, ext_cpuid1); > + __ jcc(Assembler::belowEqual, ext_cpuid1); > __ cmpl(rax, 0x80000006); // Is cpuid(0x80000007) supported? > __ jccb(Assembler::belowEqual, ext_cpuid5); > __ cmpl(rax, 0x80000007); // Is cpuid(0x80000008) supported? > __ jccb(Assembler::belowEqual, ext_cpuid7); > + __ cmpl(rax, 0x80000008); // Is cpuid(0x80000009 and above) supported? > + __ jccb(Assembler::belowEqual, ext_cpuid8); > + __ cmpl(rax, 0x8000001E); // Is cpuid(0x8000001E) supported? 
> + __ jccb(Assembler::below, ext_cpuid8); > + // > + // Extended cpuid(0x8000001E) > + // > + __ movl(rax, 0x8000001E); > + __ cpuid(); > + __ lea(rsi, Address(rbp, in_bytes(VM_Version::ext_cpuid1E_offset()))); > + __ movl(Address(rsi, 0), rax); > + __ movl(Address(rsi, 4), rbx); > + __ movl(Address(rsi, 8), rcx); > + __ movl(Address(rsi,12), rdx); > + > // > // Extended cpuid(0x80000008) > // > + __ bind(ext_cpuid8); > __ movl(rax, 0x80000008); > __ cpuid(); > __ lea(rsi, Address(rbp, in_bytes(VM_Version::ext_cpuid8_offset()))); > @@ -1109,11 +1125,27 @@ > } > > #ifdef COMPILER2 > - if (MaxVectorSize > 16) { > - // Limit vectors size to 16 bytes on current AMD cpus. > + if (cpu_family() < 0x17 && MaxVectorSize > 16) { > + // Limit vectors size to 16 bytes on AMD cpus < 17h. > FLAG_SET_DEFAULT(MaxVectorSize, 16); > } > #endif // COMPILER2 > + > + // Some defaults for AMD family 17h > + if ( cpu_family() == 0x17 ) { > + // On family 17h processors use XMM and UnalignedLoadStores for > Array Copy > + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { > + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); > + } > + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { > + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); > + } > +#ifdef COMPILER2 > + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { > + FLAG_SET_DEFAULT(UseFPUForSpilling, true); > + } > +#endif > + } > } > > if( is_intel() ) { // Intel cpus specific settings > diff --git a/src/cpu/x86/vm/vm_version_x86.hpp > b/src/cpu/x86/vm/vm_version_x86.hpp > --- a/src/cpu/x86/vm/vm_version_x86.hpp > +++ b/src/cpu/x86/vm/vm_version_x86.hpp > @@ -228,6 +228,15 @@ > } bits; > }; > > + union ExtCpuid1EEbx { > + uint32_t value; > + struct { > + uint32_t : 8, > + threads_per_core : 8, > + : 16; > + } bits; > + }; > + > union XemXcr0Eax { > uint32_t value; > struct { > @@ -398,6 +407,12 @@ > ExtCpuid8Ecx ext_cpuid8_ecx; > uint32_t ext_cpuid8_edx; // reserved > > + // cpuid function 0x8000001E 
// AMD 17h > + uint32_t ext_cpuid1E_eax; > + ExtCpuid1EEbx ext_cpuid1E_ebx; // threads per core (AMD17h) > + uint32_t ext_cpuid1E_ecx; > + uint32_t ext_cpuid1E_edx; // unused currently > + > // extended control register XCR0 (the XFEATURE_ENABLED_MASK register) > XemXcr0Eax xem_xcr0_eax; > uint32_t xem_xcr0_edx; // reserved > @@ -505,6 +520,14 @@ > result |= CPU_CLMUL; > if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) > result |= CPU_RTM; > + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) > + result |= CPU_ADX; > + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) > + result |= CPU_BMI2; > + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) > + result |= CPU_SHA; > + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) > + result |= CPU_FMA; > > // AMD features. > if (is_amd()) { > @@ -518,16 +541,8 @@ > } > // Intel features. > if(is_intel()) { > - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) > - result |= CPU_ADX; > - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) > - result |= CPU_BMI2; > - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) > - result |= CPU_SHA; > if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) > result |= CPU_LZCNT; > - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) > - result |= CPU_FMA; > // for Intel, ecx.bits.misalignsse bit (bit 8) indicates > support for prefetchw > if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { > result |= CPU_3DNOW_PREFETCH; > @@ -590,6 +605,7 @@ > static ByteSize ext_cpuid5_offset() { return > byte_offset_of(CpuidInfo, ext_cpuid5_eax); } > static ByteSize ext_cpuid7_offset() { return > byte_offset_of(CpuidInfo, ext_cpuid7_eax); } > static ByteSize ext_cpuid8_offset() { return > byte_offset_of(CpuidInfo, ext_cpuid8_eax); } > + static ByteSize ext_cpuid1E_offset() { return > byte_offset_of(CpuidInfo, ext_cpuid1E_eax); } > static ByteSize tpl_cpuidB0_offset() { return > byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } > static ByteSize tpl_cpuidB1_offset() { return > byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } > static ByteSize 
tpl_cpuidB2_offset() { return > byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } > @@ -673,8 +689,12 @@ > if (is_intel() && supports_processor_topology()) { > result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; > } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { > - result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / > - cores_per_cpu(); > + if (cpu_family() >= 0x17) { > + result = _cpuid_info.ext_cpuid1E_ebx.bits.threads_per_core + 1; > + } else { > + result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / > + cores_per_cpu(); > + } > } > return (result == 0 ? 1 : result); > } > > Please let me know your comments. > Thanks for your review. > > Regards, > Rohit > >> >> >> On 9/11/17 9:52 PM, Rohit Arul Raj wrote: >>> >>> Hello David, >>> >>>>> >>>>> >>>>> 1. ExtCpuid1EEx >>>>> >>>>> Should this be ExtCpuid1EEbx? (I see the naming here is somewhat >>>>> inconsistent - and potentially confusing: I would have preferred to see >>>>> things like ExtCpuid_1E_Ebx, to make it clear.) >>>> >>>> >>>> Yes, I can change it accordingly. >>>> >>> >>> I have attached the updated, re-tested patch as per your comments above. >>> >>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>> b/src/cpu/x86/vm/vm_version_x86.cpp >>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>> @@ -70,7 +70,7 @@ >>> bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2); >>> >>> Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4; >>> - Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>> done, wrapup; >>> + Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>> ext_cpuid8, done, wrapup; >>> Label legacy_setup, save_restore_except, legacy_save_restore, >>> start_simd_check; >>> >>> StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub"); >>> @@ -272,9 +272,23 @@ >>> __ jccb(Assembler::belowEqual, ext_cpuid5); >>> __ cmpl(rax, 0x80000007); // Is cpuid(0x80000008) supported? 
>>> __ jccb(Assembler::belowEqual, ext_cpuid7); >>> + __ cmpl(rax, 0x80000008); // Is cpuid(0x8000001E) supported? >>> + __ jccb(Assembler::belowEqual, ext_cpuid8); >>> + // >>> + // Extended cpuid(0x8000001E) >>> + // >>> + __ movl(rax, 0x8000001E); >>> + __ cpuid(); >>> + __ lea(rsi, Address(rbp, >>> in_bytes(VM_Version::ext_cpuid_1E_offset()))); >>> + __ movl(Address(rsi, 0), rax); >>> + __ movl(Address(rsi, 4), rbx); >>> + __ movl(Address(rsi, 8), rcx); >>> + __ movl(Address(rsi,12), rdx); >>> + >>> // >>> // Extended cpuid(0x80000008) >>> // >>> + __ bind(ext_cpuid8); >>> __ movl(rax, 0x80000008); >>> __ cpuid(); >>> __ lea(rsi, Address(rbp, >>> in_bytes(VM_Version::ext_cpuid8_offset()))); >>> @@ -1109,11 +1123,27 @@ >>> } >>> >>> #ifdef COMPILER2 >>> - if (MaxVectorSize > 16) { >>> - // Limit vectors size to 16 bytes on current AMD cpus. >>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>> } >>> #endif // COMPILER2 >>> + >>> + // Some defaults for AMD family 17h >>> + if ( cpu_family() == 0x17 ) { >>> + // On family 17h processors use XMM and UnalignedLoadStores for >>> Array Copy >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>> + } >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>> + } >>> +#ifdef COMPILER2 >>> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>> + } >>> +#endif >>> + } >>> } >>> >>> if( is_intel() ) { // Intel cpus specific settings >>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>> b/src/cpu/x86/vm/vm_version_x86.hpp >>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>> @@ -228,6 +228,15 @@ >>> } bits; >>> }; >>> >>> + union ExtCpuid_1E_Ebx { >>> + uint32_t value; >>> + struct { 
>>> + uint32_t : 8, >>> + threads_per_core : 8, >>> + : 16; >>> + } bits; >>> + }; >>> + >>> union XemXcr0Eax { >>> uint32_t value; >>> struct { >>> @@ -398,6 +407,12 @@ >>> ExtCpuid8Ecx ext_cpuid8_ecx; >>> uint32_t ext_cpuid8_edx; // reserved >>> >>> + // cpuid function 0x8000001E // AMD 17h >>> + uint32_t ext_cpuid_1E_eax; >>> + ExtCpuid_1E_Ebx ext_cpuid_1E_ebx; // threads per core (AMD17h) >>> + uint32_t ext_cpuid_1E_ecx; >>> + uint32_t ext_cpuid_1E_edx; // unused currently >>> + >>> // extended control register XCR0 (the XFEATURE_ENABLED_MASK >>> register) >>> XemXcr0Eax xem_xcr0_eax; >>> uint32_t xem_xcr0_edx; // reserved >>> @@ -505,6 +520,14 @@ >>> result |= CPU_CLMUL; >>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>> result |= CPU_RTM; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>> + result |= CPU_ADX; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>> + result |= CPU_BMI2; >>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>> + result |= CPU_SHA; >>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>> + result |= CPU_FMA; >>> >>> // AMD features. >>> if (is_amd()) { >>> @@ -518,16 +541,8 @@ >>> } >>> // Intel features. 
>>> if(is_intel()) { >>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>> - result |= CPU_ADX; >>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>> - result |= CPU_BMI2; >>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>> - result |= CPU_SHA; >>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>> result |= CPU_LZCNT; >>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>> - result |= CPU_FMA; >>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>> support for prefetchw >>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>> result |= CPU_3DNOW_PREFETCH; >>> @@ -590,6 +605,7 @@ >>> static ByteSize ext_cpuid5_offset() { return >>> byte_offset_of(CpuidInfo, ext_cpuid5_eax); } >>> static ByteSize ext_cpuid7_offset() { return >>> byte_offset_of(CpuidInfo, ext_cpuid7_eax); } >>> static ByteSize ext_cpuid8_offset() { return >>> byte_offset_of(CpuidInfo, ext_cpuid8_eax); } >>> + static ByteSize ext_cpuid_1E_offset() { return >>> byte_offset_of(CpuidInfo, ext_cpuid_1E_eax); } >>> static ByteSize tpl_cpuidB0_offset() { return >>> byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } >>> static ByteSize tpl_cpuidB1_offset() { return >>> byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } >>> static ByteSize tpl_cpuidB2_offset() { return >>> byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } >>> @@ -673,8 +689,11 @@ >>> if (is_intel() && supports_processor_topology()) { >>> result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; >>> } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { >>> - result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>> - cores_per_cpu(); >>> + if (cpu_family() >= 0x17) >>> + result = _cpuid_info.ext_cpuid_1E_ebx.bits.threads_per_core + 1; >>> + else >>> + result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>> + cores_per_cpu(); >>> } >>> return (result == 0 ? 1 : result); >>> } >>> >>> >>> Please let me know your comments >>> >>> Thanks for your time. 
>>> >>> Regards, >>> Rohit >>> >>> >>>>> Thanks, >>>>> David >>>>> ----- >>>>> >>>>> >>>>>> Reference: >>>>>> >>>>>> https://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf >>>>>> [Pg 82] >>>>>> >>>>>> CPUID_Fn8000001E_EBX [Core Identifiers] (CoreId) >>>>>> 15:8 ThreadsPerCore: threads per core. Read-only. Reset: XXh. >>>>>> The number of threads per core is ThreadsPerCore+1. >>>>>> >>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> @@ -70,7 +70,7 @@ >>>>>> bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2); >>>>>> >>>>>> Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4; >>>>>> - Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>>>>> done, wrapup; >>>>>> + Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>>>>> ext_cpuid8, done, wrapup; >>>>>> Label legacy_setup, save_restore_except, legacy_save_restore, >>>>>> start_simd_check; >>>>>> >>>>>> StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub"); >>>>>> @@ -272,9 +272,23 @@ >>>>>> __ jccb(Assembler::belowEqual, ext_cpuid5); >>>>>> __ cmpl(rax, 0x80000007); // Is cpuid(0x80000008) supported? >>>>>> __ jccb(Assembler::belowEqual, ext_cpuid7); >>>>>> + __ cmpl(rax, 0x80000008); // Is cpuid(0x8000001E) supported? 
>>>>>> + __ jccb(Assembler::belowEqual, ext_cpuid8); >>>>>> + // >>>>>> + // Extended cpuid(0x8000001E) >>>>>> + // >>>>>> + __ movl(rax, 0x8000001E); >>>>>> + __ cpuid(); >>>>>> + __ lea(rsi, Address(rbp, >>>>>> in_bytes(VM_Version::ext_cpuid1E_offset()))); >>>>>> + __ movl(Address(rsi, 0), rax); >>>>>> + __ movl(Address(rsi, 4), rbx); >>>>>> + __ movl(Address(rsi, 8), rcx); >>>>>> + __ movl(Address(rsi,12), rdx); >>>>>> + >>>>>> // >>>>>> // Extended cpuid(0x80000008) >>>>>> // >>>>>> + __ bind(ext_cpuid8); >>>>>> __ movl(rax, 0x80000008); >>>>>> __ cpuid(); >>>>>> __ lea(rsi, Address(rbp, >>>>>> in_bytes(VM_Version::ext_cpuid8_offset()))); >>>>>> @@ -1109,11 +1123,27 @@ >>>>>> } >>>>>> >>>>>> #ifdef COMPILER2 >>>>>> - if (MaxVectorSize > 16) { >>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>> } >>>>>> #endif // COMPILER2 >>>>>> + >>>>>> + // Some defaults for AMD family 17h >>>>>> + if ( cpu_family() == 0x17 ) { >>>>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>>>> Array Copy >>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>> + } >>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>> { >>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>> + } >>>>>> +#ifdef COMPILER2 >>>>>> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>> + } >>>>>> +#endif >>>>>> + } >>>>>> } >>>>>> >>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> @@ -228,6 +228,15 @@ >>>>>> } bits; >>>>>> }; 
>>>>>> >>>>>> + union ExtCpuid1EEx { >>>>>> + uint32_t value; >>>>>> + struct { >>>>>> + uint32_t : 8, >>>>>> + threads_per_core : 8, >>>>>> + : 16; >>>>>> + } bits; >>>>>> + }; >>>>>> + >>>>>> union XemXcr0Eax { >>>>>> uint32_t value; >>>>>> struct { >>>>>> @@ -398,6 +407,12 @@ >>>>>> ExtCpuid8Ecx ext_cpuid8_ecx; >>>>>> uint32_t ext_cpuid8_edx; // reserved >>>>>> >>>>>> + // cpuid function 0x8000001E // AMD 17h >>>>>> + uint32_t ext_cpuid1E_eax; >>>>>> + ExtCpuid1EEx ext_cpuid1E_ebx; // threads per core (AMD17h) >>>>>> + uint32_t ext_cpuid1E_ecx; >>>>>> + uint32_t ext_cpuid1E_edx; // unused currently >>>>>> + >>>>>> // extended control register XCR0 (the XFEATURE_ENABLED_MASK >>>>>> register) >>>>>> XemXcr0Eax xem_xcr0_eax; >>>>>> uint32_t xem_xcr0_edx; // reserved >>>>>> @@ -505,6 +520,14 @@ >>>>>> result |= CPU_CLMUL; >>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>> result |= CPU_RTM; >>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>> + result |= CPU_ADX; >>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>> + result |= CPU_BMI2; >>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>> + result |= CPU_SHA; >>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>> + result |= CPU_FMA; >>>>>> >>>>>> // AMD features. >>>>>> if (is_amd()) { >>>>>> @@ -518,16 +541,8 @@ >>>>>> } >>>>>> // Intel features. 
>>>>>> if(is_intel()) { >>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>> - result |= CPU_ADX; >>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>> - result |= CPU_BMI2; >>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>> - result |= CPU_SHA; >>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>> result |= CPU_LZCNT; >>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>> - result |= CPU_FMA; >>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>>> support for prefetchw >>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>> @@ -590,6 +605,7 @@ >>>>>> static ByteSize ext_cpuid5_offset() { return >>>>>> byte_offset_of(CpuidInfo, ext_cpuid5_eax); } >>>>>> static ByteSize ext_cpuid7_offset() { return >>>>>> byte_offset_of(CpuidInfo, ext_cpuid7_eax); } >>>>>> static ByteSize ext_cpuid8_offset() { return >>>>>> byte_offset_of(CpuidInfo, ext_cpuid8_eax); } >>>>>> + static ByteSize ext_cpuid1E_offset() { return >>>>>> byte_offset_of(CpuidInfo, ext_cpuid1E_eax); } >>>>>> static ByteSize tpl_cpuidB0_offset() { return >>>>>> byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } >>>>>> static ByteSize tpl_cpuidB1_offset() { return >>>>>> byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } >>>>>> static ByteSize tpl_cpuidB2_offset() { return >>>>>> byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } >>>>>> @@ -673,8 +689,11 @@ >>>>>> if (is_intel() && supports_processor_topology()) { >>>>>> result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; >>>>>> } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { >>>>>> - result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>>>> - cores_per_cpu(); >>>>>> + if (cpu_family() >= 0x17) >>>>>> + result = _cpuid_info.ext_cpuid1E_ebx.bits.threads_per_core + >>>>>> 1; >>>>>> + else >>>>>> + result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>>>> + cores_per_cpu(); >>>>>> } >>>>>> return (result == 0 ? 
1 : result); >>>>>> } >>>>>> >>>>>> I have attached the patch for review. >>>>>> Please let me know your comments. >>>>>> >>>>>> Thanks, >>>>>> Rohit >>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> >>>>>>>> No comments on AMD specific changes. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>> On 5/09/2017 3:43 PM, David Holmes wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 5/09/2017 3:29 PM, Rohit Arul Raj wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hello David, >>>>>>>>>> >>>>>>>>>> On Tue, Sep 5, 2017 at 10:31 AM, David Holmes >>>>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi Rohit, >>>>>>>>>>> >>>>>>>>>>> I was unable to apply your patch to latest jdk10/hs/hotspot repo. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I checked out the latest jdk10/hs/hotspot [parent: >>>>>>>>>> 13548:1a9c2e07a826] >>>>>>>>>> and was able to apply the patch [epyc-amd17h-defaults-3Sept.patch] >>>>>>>>>> without any issues. >>>>>>>>>> Can you share the error message that you are getting? >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> I was getting this: >>>>>>>>> >>>>>>>>> applying hotspot.patch >>>>>>>>> patching file src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> Hunk #1 FAILED at 1108 >>>>>>>>> 1 out of 1 hunks FAILED -- saving rejects to file >>>>>>>>> src/cpu/x86/vm/vm_version_x86.cpp.rej >>>>>>>>> patching file src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> Hunk #2 FAILED at 522 >>>>>>>>> 1 out of 2 hunks FAILED -- saving rejects to file >>>>>>>>> src/cpu/x86/vm/vm_version_x86.hpp.rej >>>>>>>>> abort: patch failed to apply >>>>>>>>> >>>>>>>>> but I started again and this time it applied fine, so not sure what >>>>>>>>> was >>>>>>>>> going on there. 
>>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> David >>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Rohit >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 4/09/2017 2:42 AM, Rohit Arul Raj wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Hello Vladimir, >>>>>>>>>>>> >>>>>>>>>>>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>> >>>>>>>>>>>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hello Vladimir, >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Changes look good. Only question I have is about >>>>>>>>>>>>>>> MaxVectorSize. >>>>>>>>>>>>>>> It >>>>>>>>>>>>>>> is >>>>>>>>>>>>>>> set >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 16 only in presence of AVX: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Does that code works for AMD 17h too? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for pointing that out. Yes, the code works fine for AMD >>>>>>>>>>>>>> 17h. >>>>>>>>>>>>>> So >>>>>>>>>>>>>> I have removed the surplus check for MaxVectorSize from my >>>>>>>>>>>>>> patch. >>>>>>>>>>>>>> I >>>>>>>>>>>>>> have updated, re-tested and attached the patch. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Which check you removed? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> My older patch had the below mentioned check which was required >>>>>>>>>>>> on >>>>>>>>>>>> JDK9 where the default MaxVectorSize was 64. It has been handled >>>>>>>>>>>> better in openJDK10. So this check is not required anymore. 
>>>>>>>>>>>> >>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>> ... >>>>>>>>>>>> ... >>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>> + } >>>>>>>>>>>> .. >>>>>>>>>>>> .. >>>>>>>>>>>> + } >>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I have one query regarding the setting of UseSHA flag: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821 >>>>>>>>>>>>>> >>>>>>>>>>>>>> AMD 17h has support for SHA. >>>>>>>>>>>>>> AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets >>>>>>>>>>>>>> enabled for it based on the availability of BMI2 and AVX2. Is >>>>>>>>>>>>>> there >>>>>>>>>>>>>> an >>>>>>>>>>>>>> underlying reason for this? I have handled this in the patch >>>>>>>>>>>>>> but >>>>>>>>>>>>>> just >>>>>>>>>>>>>> wanted to confirm. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> It was done with next changes which use only AVX2 and BMI2 >>>>>>>>>>>>> instructions >>>>>>>>>>>>> to >>>>>>>>>>>>> calculate SHA-256: >>>>>>>>>>>>> >>>>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974 >>>>>>>>>>>>> >>>>>>>>>>>>> I don't know if AMD 15h supports these instructions and can >>>>>>>>>>>>> execute >>>>>>>>>>>>> that >>>>>>>>>>>>> code. You need to test it. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Ok, got it. Since AMD15h has support for AVX2 and BMI2 >>>>>>>>>>>> instructions, >>>>>>>>>>>> it should work. 
>>>>>>>>>>>> Confirmed by running following sanity tests: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java >>>>>>>>>>>> >>>>>>>>>>>> So I have removed those SHA checks from my patch too. >>>>>>>>>>>> >>>>>>>>>>>> Please find attached updated, re-tested patch. >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> @@ -1109,11 +1109,27 @@ >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>> } >>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>> + >>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>> for >>>>>>>>>>>> Array Copy >>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>>>>>> + } >>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>> { >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>>>>>> + } >>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>> + if (supports_sse4_2() && >>>>>>>>>>>> FLAG_IS_DEFAULT(UseFPUForSpilling)) >>>>>>>>>>>> { >>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>> + } >>>>>>>>>>>> +#endif >>>>>>>>>>>> + } >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>>> result |= CPU_RTM; >>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>> >>>>>>>>>>>> // AMD features. 
>>>>>>>>>>>> if (is_amd()) { >>>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>> } >>>>>>>>>>>> // Intel features. >>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>>>>> indicates >>>>>>>>>>>> support for prefetchw >>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) >>>>>>>>>>>> { >>>>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>>> >>>>>>>>>>>> Please let me know your comments. >>>>>>>>>>>> >>>>>>>>>>>> Thanks for your time. >>>>>>>>>>>> Rohit >>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for taking time to review the code. 
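The family 17h blocks in the patches quoted in this thread share one pattern: a CPU-specific default is applied only while the user has left the flag untouched. The following is a minimal sketch of that pattern under a simplified flag model. `Flag`, `flag_is_default`, `flag_set_default`, `apply_family_17h_defaults` and `demo` are hypothetical stand-ins for illustration, not HotSpot's real macros or types.

```cpp
#include <cassert>

// Hypothetical model of a VM flag: a value plus a record of who set it.
// This is NOT HotSpot's flag machinery, only an illustration of the idea.
struct Flag {
  bool value;
  bool set_by_user;  // true when the value came from the command line
};

// Models the role of FLAG_IS_DEFAULT: true while the user has not set the flag.
inline bool flag_is_default(const Flag& f) { return !f.set_by_user; }

// Models the role of FLAG_SET_DEFAULT: changes the value, keeps the origin.
inline void flag_set_default(Flag& f, bool v) { f.value = v; }

// Models one check from the family 17h block: enable XMM array copy when
// SSE2 is present and the user did not choose a value.
inline void apply_family_17h_defaults(int cpu_family, bool has_sse2,
                                      Flag& use_xmm_for_array_copy) {
  if (cpu_family == 0x17 && has_sse2 &&
      flag_is_default(use_xmm_for_array_copy)) {
    flag_set_default(use_xmm_for_array_copy, true);
  }
}

// Self-check of three cases: untouched flag on 17h, user override, other family.
inline void demo() {
  Flag untouched{false, false};
  apply_family_17h_defaults(0x17, true, untouched);
  assert(untouched.value && flag_is_default(untouched));

  Flag user_off{false, true};  // e.g. the user passed -XX:-UseXMMForArrayCopy
  apply_family_17h_defaults(0x17, true, user_off);
  assert(!user_off.value);

  Flag family_15h{false, false};
  apply_family_17h_defaults(0x15, true, family_15h);
  assert(!family_15h.value);
}
```

The earlier versions of the patch in this thread assign the flags directly (`UseXMMForArrayCopy = true;`), while the later versions use `FLAG_SET_DEFAULT(...)`. In both forms the surrounding `FLAG_IS_DEFAULT` check is what keeps an explicit command-line choice from being overridden.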
>>>>>>>>>>>>>> >>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>>>>>>> } >>>>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>>>> } >>>>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>>> || >>>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>>>>>> CPU"); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> >>>>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>>> @@ -1109,11 +1125,40 @@ >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>> } >>>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>>> + >>>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>> for >>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseBMI2Instructions, true); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto >>>>>>>>>>>>>> hash >>>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>> @@ -505,6 +505,14 
@@ >>>>>>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>>>>> result |= CPU_RTM; >>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>>>> >>>>>>>>>>>>>> // AMD features. >>>>>>>>>>>>>> if (is_amd()) { >>>>>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>>>> } >>>>>>>>>>>>>> // Intel features. >>>>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != >>>>>>>>>>>>>> 0) >>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>>>>>>> indicates >>>>>>>>>>>>>> support for prefetchw >>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != >>>>>>>>>>>>>> 0) { >>>>>>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I think the patch needs updating for jdk10 as I already see >>>>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>>>> lot of >>>>>>>>>>>>>>>>>> logic >>>>>>>>>>>>>>>>>> around UseSHA in vm_version_x86.cpp. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks David, I will update the patch wrt JDK10 source base, >>>>>>>>>>>>>>>>> test >>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>> resubmit for review. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>>>>>>>>>>>>>> 13519:71337910df60), did regression testing using jtreg >>>>>>>>>>>>>>>> ($make >>>>>>>>>>>>>>>> default) and didnt find any regressions. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Can anyone please volunteer to review this patch which sets >>>>>>>>>>>>>>>> flag/ISA >>>>>>>>>>>>>>>> defaults for newer AMD 17h (EPYC) processor? 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ************************* Patch **************************** >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>>>>> || >>>>>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>>>>>>>> CPU"); >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>>>>> @@ -1109,11 +1125,43 @@ >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>>>>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto >>>>>>>>>>>>>>>> hash >>>>>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>> 
b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>>>>>>> result |= CPU_RTM; >>>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> // AMD features. >>>>>>>>>>>>>>>> if (is_amd()) { >>>>>>>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> // Intel features. 
>>>>>>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel >>>>>>>>>>>>>>>> != 0) >>>>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>>>>>>>>> indicates >>>>>>>>>>>>>>>> support for prefetchw >>>>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse >>>>>>>>>>>>>>>> != >>>>>>>>>>>>>>>> 0) { >>>>>>>>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ************************************************************** >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I would like an volunteer to review this patch >>>>>>>>>>>>>>>>>>>>> (openJDK9) >>>>>>>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>>>>>>> sets 
>>>>>>>>>>>>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and >>>>>>>>>>>>>>>>>>>>> help >>>>>>>>>>>>>>>>>>>>> us >>>>>>>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>>>>> the commit process. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Unfortunately patches can not be accepted from systems >>>>>>>>>>>>>>>>>>>> outside >>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>> OpenJDK >>>>>>>>>>>>>>>>>>>> infrastructure and ... >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I have also attached the patch (hg diff -g) for >>>>>>>>>>>>>>>>>>>>> reference. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ... unfortunately patches tend to get stripped by the >>>>>>>>>>>>>>>>>>>> mail >>>>>>>>>>>>>>>>>>>> servers. >>>>>>>>>>>>>>>>>>>> If >>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>> patch is small please include it inline. Otherwise you >>>>>>>>>>>>>>>>>>>> will >>>>>>>>>>>>>>>>>>>> need >>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>> find >>>>>>>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>>>>>> OpenJDK Author who can host it for you on >>>>>>>>>>>>>>>>>>>> cr.openjdk.java.net. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 3) I have done regression testing using jtreg ($make >>>>>>>>>>>>>>>>>>>>> default) >>>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>> didnt find any regressions. 
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Sounds good, but until I see the patch it is hard to >>>>>>>>>>>>>>>>>>>> comment >>>>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>>>> testing >>>>>>>>>>>>>>>>>>>> requirements. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks David, >>>>>>>>>>>>>>>>>>> Yes, it's a small patch. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>>>>>>>> || >>>>>>>>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>>>> + warning("SHA instructions are not available on >>>>>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>>>> CPU"); >>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>>>>>>>> + 
FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>>>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD >>>>>>>>>>>>>>>>>>> cpus. >>>>>>>>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < >>>>>>>>>>>>>>>>>>> 17h. >>>>>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, 
false); >>>>>>>>>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 >>>>>>>>>>>>>>>>>>> crypto >>>>>>>>>>>>>>>>>>> hash >>>>>>>>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific >>>>>>>>>>>>>>>>>>> settings >>>>>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>>> @@ -513,6 +513,16 @@ >>>>>>>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != >>>>>>>>>>>>>>>>>>> 0) >>>>>>>>>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> // Intel features. 
>>>>>>>>>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>> >>>> > From poonam.bajaj at oracle.com Wed Sep 20 16:37:24 2017 From: poonam.bajaj at oracle.com (Poonam Parhar) Date: Wed, 20 Sep 2017 09:37:24 -0700 (PDT) Subject: OpenJDK OOM issue - In-Reply-To: <4b587954-1a90-e04d-9766-8f22f803e57e@oracle.com> References: <99906d91-dea4-dd5f-2d35-f50f9f5264de@redhat.com> <27ce9fc2-2e76-4391-a38c-8d068f1a6dbd@default> <4b587954-1a90-e04d-9766-8f22f803e57e@oracle.com> Message-ID: <7368e002-9966-4813-b144-350690cc0566@default> Hi Vladimir, > -----Original Message----- > From: Vladimir Kozlov > Sent: Wednesday, September 20, 2017 9:30 AM > To: Poonam Parhar; David Holmes; Yu, Tim (NSB - CN/Chengdu); Andrew > Haley; hotspot-dev developers > Cc: Shen, David (NSB - CN/Chengdu) > Subject: Re: OpenJDK OOM issue - > > On Linux we should not have java heap's low address memory problem. > > Small swap space? Also memory left is less than 1Gb ("MemFree: 898332 > kB"). > > Also 5326 processes it a lot. Overloaded system? > > Poonam, do we have bug for this? Can you attached hs_err file to it. > No, there is no bug for this. Thanks, Poonam > Thanks, > Vladimir > > On 9/20/17 9:15 AM, Poonam Parhar wrote: > > Hello Tim, > > > > From the hs_err_pid12678.log file, the java heap is based at > 0x715a00000 which is 28gb, so there should be plenty of space available > for the native heap. > > > > Memory map: > > ... > > 00600000-00601000 rw-p 00000000 fc:01 17950 > /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131- > 0.b11.el6_9.x86_64/jre/bin/java > > 019bc000-019dd000 rw-p 00000000 00:00 0 > [heap] > > 715a00000-71cd00000 rw-p 00000000 00:00 0 > > 71cd00000-787380000 ---p 00000000 00:00 0 ... 
> > > > Heap > > PSYoungGen total 51200K, used 13258K [0x0000000787380000, > 0x000000078ac80000, 0x00000007c0000000) > > eden space 44032K, 30% used > [0x0000000787380000,0x0000000788072ad0,0x0000000789e80000) > > from space 7168K, 0% used > [0x000000078a580000,0x000000078a580000,0x000000078ac80000) > > to space 7168K, 0% used > [0x0000000789e80000,0x0000000789e80000,0x000000078a580000) > > ParOldGen total 117760K, used 0K [0x0000000715a00000, > 0x000000071cd00000, 0x0000000787380000) > > object space 117760K, 0% used > [0x0000000715a00000,0x0000000715a00000,0x000000071cd00000) > > Metaspace used 10485K, capacity 10722K, committed 11008K, > reserved 1058816K > > class space used 1125K, capacity 1227K, committed 1280K, > reserved 1048576K > > > > > > To narrow down the issue, would it be possible for you to test with - > XX:-UseCompressedOops? > > > > Thanks, > > Poonam > > > >> -----Original Message----- > >> From: David Holmes > >> Sent: Wednesday, September 20, 2017 4:43 AM > >> To: Yu, Tim (NSB - CN/Chengdu); Andrew Haley; Poonam Parhar; > hotspot- > >> dev developers > >> Cc: Shen, David (NSB - CN/Chengdu) > >> Subject: Re: OpenJDK OOM issue - > >> > >> Tim, > >> > >> Please note attachments get stripped from the mailing lists. > >> > >> All - please drop the jdk8-dev and jdk8u-dev mailing lists from this > >> and leave it just on hotspot-dev. I've tried to bcc those lists. > >> > >> Thank you. > >> > >> David > >> > >> On 20/09/2017 6:44 PM, Yu, Tim (NSB - CN/Chengdu) wrote: > >>> Hi All > >>> > >>> Thank you all for the quick response. > >>> The environment information is listed as below, could you please > >>> help > >> to further check? > >>> > >>> 1. What OS is this? > >>> # cat /etc/redhat-release > >>> Red Hat Enterprise Linux Server release 6.9 (Santiago) # uname -a > >>> Linux cloudyvm16 2.6.32-696.6.3.el6.x86_64 #1 SMP Fri Jun 30 > >>> 13:24:18 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux > >>> > >>> 2.GC log is listed as below. 
The heap information cannot be printed > >> out in gc-2017_09_20-09_21_15.log when OOM happens. In gc- > 2017_09_20- > >> 09_21_17.log, you can see the heap begins with 0x0000000787380000 > and > >> it should be not the first 4G virtual memory address. > >>> -rw-r--r-- 1 19477 Sep 20 09:21 hs_err_pid12678.log > >>> -rw-r--r-- 1 570 Sep 20 09:21 gc-2017_09_20-09_21_15.log > >>> -rw-r--r-- 1 17741 Sep 20 09:21 hs_err_pid12706.log > >>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_17.log > >>> -rw-r--r-- 1 1722 Sep 20 09:21 gc-2017_09_20-09_21_18.log > >>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_19.log > >>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_20.log > >>> > >>> 3. This issue happens occasionally but frequently. We periodically > >> launch a JAVA program to use JMX to monitor service status of > another > >> JAVA service. > >>> > >>> Br, > >>> Tim > >>> > >>> > >>> > >>> > >>> -----Original Message----- > >>> From: Andrew Haley [mailto:aph at redhat.com] > >>> Sent: Tuesday, September 19, 2017 9:13 PM > >>> To: Yu, Tim (NSB - CN/Chengdu) ; > >>> jdk8-dev at openjdk.java.net; jdk8u-dev at openjdk.java.net > >>> Cc: Shen, David (NSB - CN/Chengdu) > >>> Subject: Re: OpenJDK OOM issue - > >>> > >>> On 19/09/17 09:50, Yu, Tim (NSB - CN/Chengdu) wrote: > >>>> Hi OpenJDK dev group > >>>> > >>>> We meet one issue that the VM failed to initialize. The error log > >>>> is > >> as below. We checked both memory usage and thread number. They do > not > >> hit the limit. So could you please help to confirm why > >> "java.lang.OutOfMemoryError: unable to create new native thread" > >> error occurs? Many thanks. > >>> > >>> What OS is this? 
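The address arithmetic discussed above (a heap based at 0x715a00000 being at "28gb", and the young generation at 0x0000000787380000 not lying in the first 4G) can be checked with a few lines of C++. This is a minimal sketch: the addresses are copied from the quoted hs_err and GC log excerpts, while the constant and function names (`kHeapBase`, `kYoungStart`, `kHeapEnd`, `whole_gib`, `below_4gib`, `check_excerpt`) are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMiB = 1ULL << 20;
constexpr std::uint64_t kGiB = 1ULL << 30;

// Addresses taken from the quoted heap printout.
constexpr std::uint64_t kHeapBase   = 0x715a00000ULL;  // ParOldGen start
constexpr std::uint64_t kYoungStart = 0x787380000ULL;  // PSYoungGen start
constexpr std::uint64_t kHeapEnd    = 0x7c0000000ULL;  // PSYoungGen reserved end

// Number of whole GiB that lie below the given address.
constexpr std::uint64_t whole_gib(std::uint64_t addr) { return addr / kGiB; }

// True when the address lies within the first 4 GiB of the address space.
constexpr bool below_4gib(std::uint64_t addr) { return addr < 4 * kGiB; }

// Compile-time checks of the figures mentioned in the thread.
static_assert(whole_gib(kHeapBase) == 28, "heap base is just above 28 GiB");
static_assert(whole_gib(kYoungStart) == 30, "young gen starts above 30 GiB");
static_assert(whole_gib(kHeapEnd) == 31, "reserved heap ends at 31 GiB");
static_assert(!below_4gib(kHeapBase), "heap is not in the first 4 GiB");
static_assert((kHeapEnd - kHeapBase) / kMiB == 2726,
              "reserved heap range spans 2726 MiB");

// Runtime variant of the same checks, callable from a test or from main().
inline void check_excerpt() {
  assert(whole_gib(kHeapBase) == 28);
  assert(!below_4gib(kYoungStart));
}
```

The checks pass at compile time, which agrees with the observation in the thread that the Java heap sits well above the low address range.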
From tim.yu at nokia-sbell.com  Wed Sep 20 08:44:23 2017
From: tim.yu at nokia-sbell.com (Yu, Tim (NSB - CN/Chengdu))
Date: Wed, 20 Sep 2017 08:44:23 +0000
Subject: OpenJDK OOM issue -
In-Reply-To: <99906d91-dea4-dd5f-2d35-f50f9f5264de@redhat.com>
References: <99906d91-dea4-dd5f-2d35-f50f9f5264de@redhat.com>
Message-ID:

Hi All

Thank you all for the quick response.
The environment information is listed below; could you please help check further?

1. What OS is this?
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.9 (Santiago)
# uname -a
Linux cloudyvm16 2.6.32-696.6.3.el6.x86_64 #1 SMP Fri Jun 30 13:24:18 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

2. The GC logs are listed below. The heap information cannot be printed out in gc-2017_09_20-09_21_15.log when the OOM happens. In gc-2017_09_20-09_21_17.log, you can see that the heap begins at 0x0000000787380000, so it should not be within the first 4G of the virtual address space.
-rw-r--r-- 1 19477 Sep 20 09:21 hs_err_pid12678.log
-rw-r--r-- 1 570 Sep 20 09:21 gc-2017_09_20-09_21_15.log
-rw-r--r-- 1 17741 Sep 20 09:21 hs_err_pid12706.log
-rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_17.log
-rw-r--r-- 1 1722 Sep 20 09:21 gc-2017_09_20-09_21_18.log
-rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_19.log
-rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_20.log

3. This issue happens intermittently but frequently. We periodically launch a Java program that uses JMX to monitor the service status of another Java service.

Br,
Tim

-----Original Message-----
From: Andrew Haley [mailto:aph at redhat.com]
Sent: Tuesday, September 19, 2017 9:13 PM
To: Yu, Tim (NSB - CN/Chengdu); jdk8-dev at openjdk.java.net; jdk8u-dev at openjdk.java.net
Cc: Shen, David (NSB - CN/Chengdu)
Subject: Re: OpenJDK OOM issue -

On 19/09/17 09:50, Yu, Tim (NSB - CN/Chengdu) wrote:
> Hi OpenJDK dev group
>
> We meet one issue that the VM failed to initialize. The error log is as below. We checked both memory usage and thread number.
They do not hit the limit. So could you please help to confirm why "java.lang.OutOfMemoryError: unable to create new native thread" error occurs? Many thanks.

What OS is this?

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From mikael.vidstedt at oracle.com  Thu Sep 21 00:17:02 2017
From: mikael.vidstedt at oracle.com (Mikael Vidstedt)
Date: Wed, 20 Sep 2017 17:17:02 -0700
Subject: Repo consolidation update: jdk10/master, jdk10/client, and jdk10/hotspot open for pushes
In-Reply-To: <4de3c93d-5091-f0e8-9bc3-54db95965d6f@oracle.com>
References: <2b5ce3ba-5f8f-d884-64fe-ee144763a27e@oracle.com>
 <6c6d93cd-eb53-f743-4322-279abf96b2c1@oracle.com>
 <4b214f54-88c9-55e7-c5e0-bdf6ca352686@oracle.com>
 <4de3c93d-5091-f0e8-9bc3-54db95965d6f@oracle.com>
Message-ID:

All,

TL;DR: jdk10/hs remains closed for now.

As Joe mentions, the consolidated jdk10/hs repo has been created, but please hold off on pushing anything for now. We'd like to have some time to do a full test run and verify that we start off with the new repo in a known good state. Hopefully we'll have the necessary test result data within the next day or two, and if everything looks good the repos will be opened.

Thanks,
Mikael

> On Sep 20, 2017, at 3:12 PM, joe darcy wrote:
>
> Hello,
>
> The JDK 10 master line of development at
>
> http://hg.openjdk.java.net/jdk10/master/
>
> and the two integration lines of development at
>
> http://hg.openjdk.java.net/jdk10/client/
> http://hg.openjdk.java.net/jdk10/hs/
>
> are now open for pushes. All three are consolidated and are currently equivalent to JDK 10 b24.
>
> Consolidated versions of other JDK 10-derived lines of development, such as for Project Amber, Project Valhalla, and the sandbox, will be created in the coming days.
>
> If you have clones of any of the old non-consolidated JDK 10 forests with outstanding work to bring over, you can:
>
> * Synchronize the old forest with jdk10/jdk10.
The jdk10/jdk10 forest is an archive of the non-consolidated master forest at build 23. Before being closed to pushes, both the jdk10/client and jdk10/hs forests were synchronized with jdk10/jdk10.
>
> * Extract the patch of the in-progress work.
>
> * Run the patch through the patch conversion script, bin/unshuffle_patch.sh in the consolidated repository.
>
> * Apply the converted patch to a clone of a consolidated repo and resolve any conflicts, etc.
>
> Thank you to the many people who worked on the consolidation project!
>
> Cheers,
>
> -Joe
>

From erik.helin at oracle.com  Thu Sep 21 09:02:06 2017
From: erik.helin at oracle.com (Erik Helin)
Date: Thu, 21 Sep 2017 11:02:06 +0200
Subject: RFR: 8187676: Disable harmless uninitialized warnings for two files
In-Reply-To: <4f5b0427-54bf-2b85-0a94-bb41049d2676@oracle.com>
References: <2031be3e-2623-dde1-fff2-2d6cd6e41de9@oracle.com>
 <7512e87d-4e28-27a1-5e10-5cdfa794cdf4@oracle.com>
 <4f5b0427-54bf-2b85-0a94-bb41049d2676@oracle.com>
Message-ID: <1a8dd6cc-8cf2-bb0e-af09-ea53324c85e3@oracle.com>

OK, let's wait for Rahul's patches. Rahul, when you post your patches, CC me and I can check whether gcc 7.1.1 still complains :)

Thanks,
Erik

On 09/19/2017 06:25 PM, Vladimir Kozlov wrote:
> I would prefer to have general solution Rahul is working on because code
> is general - not only x86 is affected.
>
> Thanks,
> Vladimir
>
> On 9/19/17 7:59 AM, Rahul Raghavan wrote:
>> Hi Erik,
>>
>> Please note that this 8187676 seems to be related to 8160404.
>>     https://bugs.openjdk.java.net/browse/JDK-8160404
>>     (RelocationHolder constructors have bugs)
>>
>> As per the latest notes comments added for 8160404-jbs, I will submit
>> webrev/RFR soon and will request help confirm similar issues with
>> latest gcc7 gets solved.
>>
>> Thanks,
>> Rahul
>>
>> On Tuesday 19 September 2017 07:07 PM, Erik Helin wrote:
>>> Hi all,
>>>
>>> with gcc 7.1.1 from Fedora 26 on x86-64 there are warnings about the
>>> potential usage of maybe uninitialized memory in
>>> src/hotspot/cpu/x86/assembler_x86.cpp and in
>>> src/hotspot/cpu/x86/interp_masm_x86.cpp.
>>>
>>> The problem arises from the class RelocationHolder in
>>> src/hotspot/share/code/relocInfo.hpp which has the private fields:
>>>    enum { _relocbuf_size = 5 };
>>>    void* _relocbuf[ _relocbuf_size ];
>>>
>>> and the default constructor for RelocationHolder does not initialize
>>> the elements of _relocbuf. I _think_ this is an optimization,
>>> RelocationHolder is used *a lot* and setting the elements of
>>> RelocationHolder::_relocbuf to NULL (or some other value) in the
>>> default constructor might result in a performance penalty. Have a
>>> look in
>>> build/linux-x86_64-normal-server-fastdebug/hotspot/variant-server/gensrc/adfiles
>>> and you will see that RelocationHolder is used all over the place :)
>>>
>>> AFAICS all users of RelocationHolder::_relocbuf take care to not use
>>> uninitialized memory, which means that this warning is wrong, so I
>>> suggest we disable the warning -Wmaybe-uninitialized for
>>> src/hotspot/cpu/x86/assembler_x86.cpp.
>>>
>>> The problem continues because the class Address in
>>> src/hotspot/cpu/x86/assembler_x86.hpp has a private field,
>>> `RelocationHolder _rspec;` and the default constructor for Address
>>> does not initialize _rspec._relocbuf (most likely for performance
>>> reasons). The class Address also has a default copy constructor,
>>> which will copy all the elements of _rspec._relocbuf, which will
>>> result in a read of uninitialized memory. However, this is a benign
>>> usage of uninitialized memory, since we take no action based on the
>>> content of the uninitialized memory (it is just copied byte for byte).
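A minimal C++ sketch of the pattern described above, for illustration only: a holder whose default constructor leaves its buffer uninitialized for speed, a counter that tells readers which slots are valid, and a copy that moves the raw bytes without inspecting them. The class `Holder` and all of its members are hypothetical stand-ins, not the actual RelocationHolder code, and the explicit `memcpy` is an assumption made here to keep the byte-for-byte copy visible.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the pattern discussed in the mail; this is not
// HotSpot's RelocationHolder and every name here is made up for illustration.
class Holder {
 public:
  enum { kBufSize = 5 };

  // Cheap default constructor: records "empty" but never touches _buf.
  Holder() : _used(0) {}

  // Copies the raw bytes of _buf, initialized or not, without looking at them.
  Holder(const Holder& other) : _used(other._used) {
    std::memcpy(_buf, other._buf, sizeof(_buf));
  }

  // Stores a pointer in the next free slot.
  void push(void* p) {
    assert(_used < kBufSize);
    _buf[_used++] = p;
  }

  int used() const { return _used; }

  // Readers only ever look at slots that were actually written.
  void* at(int i) const { return (i >= 0 && i < _used) ? _buf[i] : nullptr; }

 private:
  int   _used;
  void* _buf[kBufSize];  // deliberately left uninitialized by Holder()
};
```

Copying an empty `Holder` moves bytes that were never written, but no decision depends on them, because `at()` consults `_used` first. Depending on compiler version and flags, gcc may still report a maybe-uninitialized warning for such a copy, which is the kind of diagnostic the mail proposes to disable per file.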
>>>
>>> So, in this case too, I suggest we disable the warning
>>> -Wuninitialized for src/hotspot/cpu/x86/assembler_x86.hpp.
>>>
>>> What do you think?
>>>
>>> Patch:
>>> http://cr.openjdk.java.net/~ehelin/8187676/00/
>>>
>>> --- old/make/hotspot/lib/JvmOverrideFiles.gmk    2017-09-19 15:11:45.036108983 +0200
>>> +++ new/make/hotspot/lib/JvmOverrideFiles.gmk    2017-09-19 15:11:44.692107277 +0200
>>> @@ -32,6 +32,8 @@
>>>   ifeq ($(TOOLCHAIN_TYPE), gcc)
>>>     BUILD_LIBJVM_vmStructs.cpp_CXXFLAGS := -fno-var-tracking-assignments -O0
>>>     BUILD_LIBJVM_jvmciCompilerToVM.cpp_CXXFLAGS := -fno-var-tracking-assignments
>>> +  BUILD_LIBJVM_assembler_x86.cpp_CXXFLAGS := -Wno-maybe-uninitialized
>>> +  BUILD_LIBJVM_interp_masm_x86.cpp_CXXFLAGS := -Wno-uninitialized
>>>   endif
>>>
>>>   ifeq ($(OPENJDK_TARGET_OS), linux)
>>>
>>> Issue:
>>> https://bugs.openjdk.java.net/browse/JDK-8187676
>>>
>>> Testing:
>>> - Compiles with:
>>>    - gcc 7.1.1 and glibc 2.25 on Fedora 26
>>>    - gcc 4.9.2 and glibc 2.12 on OEL 6.4
>>> - JPRT
>>>
>>> Thanks,
>>> Erik

From stefan.johansson at oracle.com  Thu Sep 21 09:12:36 2017
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Thu, 21 Sep 2017 11:12:36 +0200
Subject: RFR: 8187667: Disable deprecation warning for readdir_r
In-Reply-To: <5081712c-9c62-5b6a-2e43-9b8d6e3ca64a@oracle.com>
References: <5081712c-9c62-5b6a-2e43-9b8d6e3ca64a@oracle.com>
Message-ID:

On 2017-09-19 14:42, Erik Helin wrote:
> Hi all,
>
> I'm continuing to run into some small problems when compiling HotSpot
> with a more recent toolchain. It seems like readdir_r [0] has been
> deprecated beginning with glibc 2.24 [1]. In HotSpot, we use readdir_r
> for os::readdir on Linux (defined in os_linux.inline.hpp). Since
> readdir_r most likely will stay around for a long time in glibc (even
> though in deprecated form), I figured it was best to just silence the
> deprecation warning from gcc.
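One way to silence a deprecation warning for a single call is sketched below. It is an assumption about the general approach, not a copy of the patch under review: the call is wrapped in GCC diagnostic pragmas so that `-Wdeprecated-declarations` is ignored only in that region. The helper name `read_first_entry` is hypothetical.

```cpp
#include <cassert>
#include <dirent.h>
#include <string>

// Suppress the deprecation diagnostic only for the code between push and pop.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"

// Hypothetical helper: reads the first entry of a directory with readdir_r.
// Returns true and stores the entry name on success.
inline bool read_first_entry(const char* path, std::string* name_out) {
  DIR* dir = opendir(path);
  if (dir == nullptr) {
    return false;
  }
  // A plain struct dirent is enough for this sketch; production code may
  // need a larger buffer for file systems with very long names.
  struct dirent entry;
  struct dirent* result = nullptr;
  int rc = readdir_r(dir, &entry, &result);
  bool ok = (rc == 0 && result != nullptr);
  if (ok) {
    *name_out = result->d_name;
  }
  closedir(dir);
  return ok;
}

#pragma GCC diagnostic pop
```

Scoping the suppression with push and pop keeps the warning active for the rest of the translation unit, so other uses of deprecated functions are still reported.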
If readdir_r finally is removed one day,
> then we might have to look up the appropriate readdir function using
> dlopen, dlsym etc.
>
> Patch:
> http://cr.openjdk.java.net/~ehelin/8187667/00/
>
Looks good,
StefanJ

> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8187667
>
> Testing:
> - Compiles with:
>   - gcc 7.1.1 and glibc 2.25 on Fedora 26
>   - gcc 4.9.2 and glibc 2.12 on OEL 6.4
> - JPRT
>
> Thanks,
> Erik
>
> [0]:
> http://pubs.opengroup.org/onlinepubs/009695399/functions/readdir.html
> [1]: https://sourceware.org/bugzilla/show_bug.cgi?id=19056

From tim.yu at nokia-sbell.com  Thu Sep 21 03:22:56 2017
From: tim.yu at nokia-sbell.com (Yu, Tim (NSB - CN/Chengdu))
Date: Thu, 21 Sep 2017 03:22:56 +0000
Subject: OpenJDK OOM issue -
In-Reply-To: <7368e002-9966-4813-b144-350690cc0566@default>
References: <99906d91-dea4-dd5f-2d35-f50f9f5264de@redhat.com>
 <27ce9fc2-2e76-4391-a38c-8d068f1a6dbd@default>
 <4b587954-1a90-e04d-9766-8f22f803e57e@oracle.com>
 <7368e002-9966-4813-b144-350690cc0566@default>
Message-ID:

Hi Poonam & Vladimir

After adding the "-XX:-UseCompressedOops" flag, the OOM still happens. The corresponding GC log is below, and no heap information is printed. So, what's the next step?
Please help on this and many thanks :) OpenJDK 64-Bit Server VM (25.131-b11) for linux-amd64 JRE (1.8.0_131-b11), built on Apr 13 2017 17:56:19 by "mockbuild" with gcc 4.4.7 20120313 (Red Hat 4.4.7-18) Memory: 4k page, physical 11163792k(551024k free), swap 16777212k(16722204k free) CommandLine flags: -XX:InitialHeapSize=178620672 -XX:MaxHeapSize=2857930752 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:-UseCompressedOops -XX:+UseParallelGC Br, Tim -----Original Message----- From: Poonam Parhar [mailto:poonam.bajaj at oracle.com] Sent: Thursday, September 21, 2017 12:37 AM To: Vladimir Kozlov ; David Holmes ; Yu, Tim (NSB - CN/Chengdu) ; Andrew Haley ; hotspot-dev developers Cc: Shen, David (NSB - CN/Chengdu) Subject: RE: OpenJDK OOM issue - Hi Vladimir, > -----Original Message----- > From: Vladimir Kozlov > Sent: Wednesday, September 20, 2017 9:30 AM > To: Poonam Parhar; David Holmes; Yu, Tim (NSB - CN/Chengdu); Andrew > Haley; hotspot-dev developers > Cc: Shen, David (NSB - CN/Chengdu) > Subject: Re: OpenJDK OOM issue - > > On Linux we should not have java heap's low address memory problem. > > Small swap space? Also memory left is less than 1Gb ("MemFree: 898332 > kB"). > > Also 5326 processes it a lot. Overloaded system? > > Poonam, do we have bug for this? Can you attached hs_err file to it. > No, there is no bug for this. Thanks, Poonam > Thanks, > Vladimir > > On 9/20/17 9:15 AM, Poonam Parhar wrote: > > Hello Tim, > > > > From the hs_err_pid12678.log file, the java heap is based at > 0x715a00000 which is 28gb, so there should be plenty of space available > for the native heap. > > > > Memory map: > > ... > > 00600000-00601000 rw-p 00000000 fc:01 17950 > /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131- > 0.b11.el6_9.x86_64/jre/bin/java > > 019bc000-019dd000 rw-p 00000000 00:00 0 > [heap] > > 715a00000-71cd00000 rw-p 00000000 00:00 0 > > 71cd00000-787380000 ---p 00000000 00:00 0 ... 
> >>> From zgu at redhat.com Thu Sep 21 14:07:36 2017 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 21 Sep 2017 10:07:36 -0400 Subject: OpenJDK OOM issue - In-Reply-To: References: <99906d91-dea4-dd5f-2d35-f50f9f5264de@redhat.com> <27ce9fc2-2e76-4391-a38c-8d068f1a6dbd@default> <4b587954-1a90-e04d-9766-8f22f803e57e@oracle.com> <7368e002-9966-4813-b144-350690cc0566@default> Message-ID: Hi Tim, Try to run with -XX:NativeMemoryTracking=summary , this should give you some hints on native memory side. In your case, not be able to create native thread, more likely to be on native side than heap. Thanks, -Zhengyu On 09/20/2017 11:22 PM, Yu, Tim (NSB - CN/Chengdu) wrote: > Hi Poonam & Vladimir > > After add " -XX:-UseCompressedOops" flag, the OMM still happens. The corresponding GC log is as below and no heap is printed out. So, what's the next step to do? Please help on this and many thanks :) > > OpenJDK 64-Bit Server VM (25.131-b11) for linux-amd64 JRE (1.8.0_131-b11), built on Apr 13 2017 17:56:19 by "mockbuild" with gcc 4.4.7 20120313 (Red Hat 4.4.7-18) > Memory: 4k page, physical 11163792k(551024k free), swap 16777212k(16722204k free) > CommandLine flags: -XX:InitialHeapSize=178620672 -XX:MaxHeapSize=2857930752 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:-UseCompressedOops -XX:+UseParallelGC > > Br, > Tim > > -----Original Message----- > From: Poonam Parhar [mailto:poonam.bajaj at oracle.com] > Sent: Thursday, September 21, 2017 12:37 AM > To: Vladimir Kozlov ; David Holmes ; Yu, Tim (NSB - CN/Chengdu) ; Andrew Haley ; hotspot-dev developers > Cc: Shen, David (NSB - CN/Chengdu) > Subject: RE: OpenJDK OOM issue - > > Hi Vladimir, > >> -----Original Message----- >> From: Vladimir Kozlov >> Sent: Wednesday, September 20, 2017 9:30 AM >> To: Poonam Parhar; David Holmes; Yu, Tim (NSB - CN/Chengdu); Andrew >> Haley; hotspot-dev developers >> Cc: Shen, David (NSB - CN/Chengdu) >> Subject: Re: OpenJDK OOM issue - >> >> On Linux we 
should not have java heap's low address memory problem. >> >> Small swap space? Also memory left is less than 1Gb ("MemFree: 898332 >> kB"). >> >> Also 5326 processes it a lot. Overloaded system? >> >> Poonam, do we have bug for this? Can you attached hs_err file to it. >> > > No, there is no bug for this. > > Thanks, > Poonam > >> Thanks, >> Vladimir >> >> On 9/20/17 9:15 AM, Poonam Parhar wrote: >>> Hello Tim, >>> >>> From the hs_err_pid12678.log file, the java heap is based at >> 0x715a00000 which is 28gb, so there should be plenty of space available >> for the native heap. >>> >>> Memory map: >>> ... >>> 00600000-00601000 rw-p 00000000 fc:01 17950 >> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131- >> 0.b11.el6_9.x86_64/jre/bin/java >>> 019bc000-019dd000 rw-p 00000000 00:00 0 >> [heap] >>> 715a00000-71cd00000 rw-p 00000000 00:00 0 >>> 71cd00000-787380000 ---p 00000000 00:00 0 ... >>> >>> Heap >>> PSYoungGen total 51200K, used 13258K [0x0000000787380000, >> 0x000000078ac80000, 0x00000007c0000000) >>> eden space 44032K, 30% used >> [0x0000000787380000,0x0000000788072ad0,0x0000000789e80000) >>> from space 7168K, 0% used >> [0x000000078a580000,0x000000078a580000,0x000000078ac80000) >>> to space 7168K, 0% used >> [0x0000000789e80000,0x0000000789e80000,0x000000078a580000) >>> ParOldGen total 117760K, used 0K [0x0000000715a00000, >> 0x000000071cd00000, 0x0000000787380000) >>> object space 117760K, 0% used >> [0x0000000715a00000,0x0000000715a00000,0x000000071cd00000) >>> Metaspace used 10485K, capacity 10722K, committed 11008K, >> reserved 1058816K >>> class space used 1125K, capacity 1227K, committed 1280K, >> reserved 1048576K >>> >>> >>> To narrow down the issue, would it be possible for you to test with - >> XX:-UseCompressedOops? 
>>> >>> Thanks, >>> Poonam >>> >>>> -----Original Message----- >>>> From: David Holmes >>>> Sent: Wednesday, September 20, 2017 4:43 AM >>>> To: Yu, Tim (NSB - CN/Chengdu); Andrew Haley; Poonam Parhar; >> hotspot- >>>> dev developers >>>> Cc: Shen, David (NSB - CN/Chengdu) >>>> Subject: Re: OpenJDK OOM issue - >>>> >>>> Tim, >>>> >>>> Please note attachments get stripped from the mailing lists. >>>> >>>> All - please drop the jdk8-dev and jdk8u-dev mailing lists from this >>>> and leave it just on hotspot-dev. I've tried to bcc those lists. >>>> >>>> Thank you. >>>> >>>> David >>>> >>>> On 20/09/2017 6:44 PM, Yu, Tim (NSB - CN/Chengdu) wrote: >>>>> Hi All >>>>> >>>>> Thank you all for the quick response. >>>>> The environment information is listed as below, could you please >>>>> help >>>> to further check? >>>>> >>>>> 1. What OS is this? >>>>> # cat /etc/redhat-release >>>>> Red Hat Enterprise Linux Server release 6.9 (Santiago) # uname -a >>>>> Linux cloudyvm16 2.6.32-696.6.3.el6.x86_64 #1 SMP Fri Jun 30 >>>>> 13:24:18 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >>>>> >>>>> 2.GC log is listed as below. The heap information cannot be printed >>>> out in gc-2017_09_20-09_21_15.log when OOM happens. In gc- >> 2017_09_20- >>>> 09_21_17.log, you can see the heap begins with 0x0000000787380000 >> and >>>> it should be not the first 4G virtual memory address. >>>>> -rw-r--r-- 1 19477 Sep 20 09:21 hs_err_pid12678.log >>>>> -rw-r--r-- 1 570 Sep 20 09:21 gc-2017_09_20-09_21_15.log >>>>> -rw-r--r-- 1 17741 Sep 20 09:21 hs_err_pid12706.log >>>>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_17.log >>>>> -rw-r--r-- 1 1722 Sep 20 09:21 gc-2017_09_20-09_21_18.log >>>>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_19.log >>>>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_20.log >>>>> >>>>> 3. This issue happens occasionally but frequently. We periodically >>>> launch a JAVA program to use JMX to monitor service status of >> another >>>> JAVA service. 
>>>>> >>>>> Br, >>>>> Tim >>>>> >>>>> >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Andrew Haley [mailto:aph at redhat.com] >>>>> Sent: Tuesday, September 19, 2017 9:13 PM >>>>> To: Yu, Tim (NSB - CN/Chengdu) ; >>>>> jdk8-dev at openjdk.java.net; jdk8u-dev at openjdk.java.net >>>>> Cc: Shen, David (NSB - CN/Chengdu) >>>>> Subject: Re: OpenJDK OOM issue - >>>>> >>>>> On 19/09/17 09:50, Yu, Tim (NSB - CN/Chengdu) wrote: >>>>>> Hi OpenJDK dev group >>>>>> >>>>>> We meet one issue that the VM failed to initialize. The error log >>>>>> is >>>> as below. We checked both memory usage and thread number. They do >> not >>>> hit the limit. So could you please help to confirm why >>>> "java.lang.OutOfMemoryError: unable to create new native thread" >>>> error occurs? Many thanks. >>>>> >>>>> What OS is this? >>>>> From kirk.pepperdine at gmail.com Thu Sep 21 14:30:13 2017 From: kirk.pepperdine at gmail.com (Kirk Pepperdine) Date: Thu, 21 Sep 2017 16:30:13 +0200 Subject: OpenJDK OOM issue - In-Reply-To: References: <99906d91-dea4-dd5f-2d35-f50f9f5264de@redhat.com> <27ce9fc2-2e76-4391-a38c-8d068f1a6dbd@default> <4b587954-1a90-e04d-9766-8f22f803e57e@oracle.com> <7368e002-9966-4813-b144-350690cc0566@default> Message-ID: <168E199C-EC63-49F1-97D3-2FA7D5177E3A@gmail.com> Have you tried running pmap? Kind regards, Kirk > On Sep 21, 2017, at 4:07 PM, Zhengyu Gu wrote: > > Hi Tim, > > Try to run with -XX:NativeMemoryTracking=summary , this should give you some hints on native memory side. > > In your case, not be able to create native thread, more likely to be on native side than heap. > > Thanks, > > -Zhengyu > > On 09/20/2017 11:22 PM, Yu, Tim (NSB - CN/Chengdu) wrote: >> Hi Poonam & Vladimir >> After add " -XX:-UseCompressedOops" flag, the OMM still happens. The corresponding GC log is as below and no heap is printed out. So, what's the next step to do? 
Please help on this and many thanks :) >> OpenJDK 64-Bit Server VM (25.131-b11) for linux-amd64 JRE (1.8.0_131-b11), built on Apr 13 2017 17:56:19 by "mockbuild" with gcc 4.4.7 20120313 (Red Hat 4.4.7-18) >> Memory: 4k page, physical 11163792k(551024k free), swap 16777212k(16722204k free) >> CommandLine flags: -XX:InitialHeapSize=178620672 -XX:MaxHeapSize=2857930752 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:-UseCompressedOops -XX:+UseParallelGC >> Br, >> Tim >> -----Original Message----- >> From: Poonam Parhar [mailto:poonam.bajaj at oracle.com] >> Sent: Thursday, September 21, 2017 12:37 AM >> To: Vladimir Kozlov ; David Holmes ; Yu, Tim (NSB - CN/Chengdu) ; Andrew Haley ; hotspot-dev developers >> Cc: Shen, David (NSB - CN/Chengdu) >> Subject: RE: OpenJDK OOM issue - >> Hi Vladimir, >>> -----Original Message----- >>> From: Vladimir Kozlov >>> Sent: Wednesday, September 20, 2017 9:30 AM >>> To: Poonam Parhar; David Holmes; Yu, Tim (NSB - CN/Chengdu); Andrew >>> Haley; hotspot-dev developers >>> Cc: Shen, David (NSB - CN/Chengdu) >>> Subject: Re: OpenJDK OOM issue - >>> >>> On Linux we should not have java heap's low address memory problem. >>> >>> Small swap space? Also memory left is less than 1Gb ("MemFree: 898332 >>> kB"). >>> >>> Also 5326 processes it a lot. Overloaded system? >>> >>> Poonam, do we have bug for this? Can you attached hs_err file to it. >>> >> No, there is no bug for this. >> Thanks, >> Poonam >>> Thanks, >>> Vladimir >>> >>> On 9/20/17 9:15 AM, Poonam Parhar wrote: >>>> Hello Tim, >>>> >>>> From the hs_err_pid12678.log file, the java heap is based at >>> 0x715a00000 which is 28gb, so there should be plenty of space available >>> for the native heap. >>>> >>>> Memory map: >>>> ... 
>>>> 00600000-00601000 rw-p 00000000 fc:01 17950 >>> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131- >>> 0.b11.el6_9.x86_64/jre/bin/java >>>> 019bc000-019dd000 rw-p 00000000 00:00 0 >>> [heap] >>>> 715a00000-71cd00000 rw-p 00000000 00:00 0 >>>> 71cd00000-787380000 ---p 00000000 00:00 0 ... >>>> >>>> Heap >>>> PSYoungGen total 51200K, used 13258K [0x0000000787380000, >>> 0x000000078ac80000, 0x00000007c0000000) >>>> eden space 44032K, 30% used >>> [0x0000000787380000,0x0000000788072ad0,0x0000000789e80000) >>>> from space 7168K, 0% used >>> [0x000000078a580000,0x000000078a580000,0x000000078ac80000) >>>> to space 7168K, 0% used >>> [0x0000000789e80000,0x0000000789e80000,0x000000078a580000) >>>> ParOldGen total 117760K, used 0K [0x0000000715a00000, >>> 0x000000071cd00000, 0x0000000787380000) >>>> object space 117760K, 0% used >>> [0x0000000715a00000,0x0000000715a00000,0x000000071cd00000) >>>> Metaspace used 10485K, capacity 10722K, committed 11008K, >>> reserved 1058816K >>>> class space used 1125K, capacity 1227K, committed 1280K, >>> reserved 1048576K >>>> >>>> >>>> To narrow down the issue, would it be possible for you to test with - >>> XX:-UseCompressedOops? >>>> >>>> Thanks, >>>> Poonam >>>> >>>>> -----Original Message----- >>>>> From: David Holmes >>>>> Sent: Wednesday, September 20, 2017 4:43 AM >>>>> To: Yu, Tim (NSB - CN/Chengdu); Andrew Haley; Poonam Parhar; >>> hotspot- >>>>> dev developers >>>>> Cc: Shen, David (NSB - CN/Chengdu) >>>>> Subject: Re: OpenJDK OOM issue - >>>>> >>>>> Tim, >>>>> >>>>> Please note attachments get stripped from the mailing lists. >>>>> >>>>> All - please drop the jdk8-dev and jdk8u-dev mailing lists from this >>>>> and leave it just on hotspot-dev. I've tried to bcc those lists. >>>>> >>>>> Thank you. >>>>> >>>>> David >>>>> >>>>> On 20/09/2017 6:44 PM, Yu, Tim (NSB - CN/Chengdu) wrote: >>>>>> Hi All >>>>>> >>>>>> Thank you all for the quick response. 
>>>>>> The environment information is listed as below, could you please >>>>>> help >>>>> to further check? >>>>>> >>>>>> 1. What OS is this? >>>>>> # cat /etc/redhat-release >>>>>> Red Hat Enterprise Linux Server release 6.9 (Santiago) # uname -a >>>>>> Linux cloudyvm16 2.6.32-696.6.3.el6.x86_64 #1 SMP Fri Jun 30 >>>>>> 13:24:18 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >>>>>> >>>>>> 2.GC log is listed as below. The heap information cannot be printed >>>>> out in gc-2017_09_20-09_21_15.log when OOM happens. In gc- >>> 2017_09_20- >>>>> 09_21_17.log, you can see the heap begins with 0x0000000787380000 >>> and >>>>> it should be not the first 4G virtual memory address. >>>>>> -rw-r--r-- 1 19477 Sep 20 09:21 hs_err_pid12678.log >>>>>> -rw-r--r-- 1 570 Sep 20 09:21 gc-2017_09_20-09_21_15.log >>>>>> -rw-r--r-- 1 17741 Sep 20 09:21 hs_err_pid12706.log >>>>>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_17.log >>>>>> -rw-r--r-- 1 1722 Sep 20 09:21 gc-2017_09_20-09_21_18.log >>>>>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_19.log >>>>>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_20.log >>>>>> >>>>>> 3. This issue happens occasionally but frequently. We periodically >>>>> launch a JAVA program to use JMX to monitor service status of >>> another >>>>> JAVA service. >>>>>> >>>>>> Br, >>>>>> Tim >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: Andrew Haley [mailto:aph at redhat.com] >>>>>> Sent: Tuesday, September 19, 2017 9:13 PM >>>>>> To: Yu, Tim (NSB - CN/Chengdu) ; >>>>>> jdk8-dev at openjdk.java.net; jdk8u-dev at openjdk.java.net >>>>>> Cc: Shen, David (NSB - CN/Chengdu) >>>>>> Subject: Re: OpenJDK OOM issue - >>>>>> >>>>>> On 19/09/17 09:50, Yu, Tim (NSB - CN/Chengdu) wrote: >>>>>>> Hi OpenJDK dev group >>>>>>> >>>>>>> We meet one issue that the VM failed to initialize. The error log >>>>>>> is >>>>> as below. We checked both memory usage and thread number. They do >>> not >>>>> hit the limit. 
So could you please help to confirm why >>>>> "java.lang.OutOfMemoryError: unable to create new native thread" >>>>> error occurs? Many thanks. >>>>>> >>>>>> What OS is this? >>>>>> From vladimir.kozlov at oracle.com Thu Sep 21 15:41:04 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 21 Sep 2017 08:41:04 -0700 Subject: OpenJDK OOM issue - In-Reply-To: References: <99906d91-dea4-dd5f-2d35-f50f9f5264de@redhat.com> <27ce9fc2-2e76-4391-a38c-8d068f1a6dbd@default> <4b587954-1a90-e04d-9766-8f22f803e57e@oracle.com> <7368e002-9966-4813-b144-350690cc0566@default> Message-ID: <9b332f0d-de95-b630-8f0e-4f3e4139dfaa@oracle.com> Okay, swap size is fine. No problems with flags too. Are you using some kind of container? Memory size is strange (not power of 2) - 11163792k. And almost all of it is used. Can you stop other processes which use memory? May be something related to available large pages? How many physical threads/cores system has? Also look on what is loaded on system. Almost all of You can try to reduce memory used by JVM: - reduce metaspace/class space - you may not need 1Gb for that, - reduce number of GC and JIT compiler threads (to reduce memory for stacks and thread's stuctures), - try to switch off Tiered compilation - it will reduce codecache size and number of JIT compiler threads. Regards, Vladimir On 9/20/17 8:22 PM, Yu, Tim (NSB - CN/Chengdu) wrote: > Hi Poonam & Vladimir > > After add " -XX:-UseCompressedOops" flag, the OMM still happens. The corresponding GC log is as below and no heap is printed out. So, what's the next step to do? 
> Please help on this and many thanks :)
>
> OpenJDK 64-Bit Server VM (25.131-b11) for linux-amd64 JRE (1.8.0_131-b11), built on Apr 13 2017 17:56:19 by "mockbuild" with gcc 4.4.7 20120313 (Red Hat 4.4.7-18)
> Memory: 4k page, physical 11163792k(551024k free), swap 16777212k(16722204k free)
> CommandLine flags: -XX:InitialHeapSize=178620672 -XX:MaxHeapSize=2857930752 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:-UseCompressedOops -XX:+UseParallelGC
>
> Br,
> Tim
>
> -----Original Message-----
> From: Poonam Parhar [mailto:poonam.bajaj at oracle.com]
> Sent: Thursday, September 21, 2017 12:37 AM
> To: Vladimir Kozlov; David Holmes; Yu, Tim (NSB - CN/Chengdu); Andrew Haley; hotspot-dev developers
> Cc: Shen, David (NSB - CN/Chengdu)
> Subject: RE: OpenJDK OOM issue -
>
> Hi Vladimir,
>
>> -----Original Message-----
>> From: Vladimir Kozlov
>> Sent: Wednesday, September 20, 2017 9:30 AM
>> To: Poonam Parhar; David Holmes; Yu, Tim (NSB - CN/Chengdu); Andrew Haley; hotspot-dev developers
>> Cc: Shen, David (NSB - CN/Chengdu)
>> Subject: Re: OpenJDK OOM issue -
>>
>> On Linux we should not have java heap's low address memory problem.
>>
>> Small swap space? Also memory left is less than 1Gb ("MemFree: 898332 kB").
>>
>> Also 5326 processes is a lot. Overloaded system?
>>
>> Poonam, do we have a bug for this? Can you attach the hs_err file to it?
>>
>
> No, there is no bug for this.
>
> Thanks,
> Poonam
>
>> Thanks,
>> Vladimir
>>
>> On 9/20/17 9:15 AM, Poonam Parhar wrote:
>>> Hello Tim,
>>>
>>> From the hs_err_pid12678.log file, the java heap is based at 0x715a00000 which is 28gb, so there should be plenty of space available for the native heap.
>>>
>>> Memory map:
>>> ...
>>> 00600000-00601000 rw-p 00000000 fc:01 17950 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131-0.b11.el6_9.x86_64/jre/bin/java
>>> 019bc000-019dd000 rw-p 00000000 00:00 0 [heap]
>>> 715a00000-71cd00000 rw-p 00000000 00:00 0
>>> 71cd00000-787380000 ---p 00000000 00:00 0
>>> ...
>>>
>>> Heap
>>>  PSYoungGen total 51200K, used 13258K [0x0000000787380000, 0x000000078ac80000, 0x00000007c0000000)
>>>   eden space 44032K, 30% used [0x0000000787380000,0x0000000788072ad0,0x0000000789e80000)
>>>   from space 7168K, 0% used [0x000000078a580000,0x000000078a580000,0x000000078ac80000)
>>>   to space 7168K, 0% used [0x0000000789e80000,0x0000000789e80000,0x000000078a580000)
>>>  ParOldGen total 117760K, used 0K [0x0000000715a00000, 0x000000071cd00000, 0x0000000787380000)
>>>   object space 117760K, 0% used [0x0000000715a00000,0x0000000715a00000,0x000000071cd00000)
>>>  Metaspace used 10485K, capacity 10722K, committed 11008K, reserved 1058816K
>>>   class space used 1125K, capacity 1227K, committed 1280K, reserved 1048576K
>>>
>>>
>>> To narrow down the issue, would it be possible for you to test with -XX:-UseCompressedOops?
>>>
>>> Thanks,
>>> Poonam
>>>
>>>> -----Original Message-----
>>>> From: David Holmes
>>>> Sent: Wednesday, September 20, 2017 4:43 AM
>>>> To: Yu, Tim (NSB - CN/Chengdu); Andrew Haley; Poonam Parhar; hotspot-dev developers
>>>> Cc: Shen, David (NSB - CN/Chengdu)
>>>> Subject: Re: OpenJDK OOM issue -
>>>>
>>>> Tim,
>>>>
>>>> Please note attachments get stripped from the mailing lists.
>>>>
>>>> All - please drop the jdk8-dev and jdk8u-dev mailing lists from this and leave it just on hotspot-dev. I've tried to bcc those lists.
>>>>
>>>> Thank you.
>>>>
>>>> David
>>>>
>>>> On 20/09/2017 6:44 PM, Yu, Tim (NSB - CN/Chengdu) wrote:
>>>>> Hi All
>>>>>
>>>>> Thank you all for the quick response.
>>>>> The environment information is listed as below, could you please help to further check?
>>>>>
>>>>> 1. What OS is this?
>>>>> # cat /etc/redhat-release
>>>>> Red Hat Enterprise Linux Server release 6.9 (Santiago)
>>>>> # uname -a
>>>>> Linux cloudyvm16 2.6.32-696.6.3.el6.x86_64 #1 SMP Fri Jun 30 13:24:18 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
>>>>>
>>>>> 2. GC log is listed as below. The heap information cannot be printed out in gc-2017_09_20-09_21_15.log when OOM happens. In gc-2017_09_20-09_21_17.log, you can see the heap begins with 0x0000000787380000 and it should not be in the first 4G of virtual memory addresses.
>>>>> -rw-r--r-- 1 19477 Sep 20 09:21 hs_err_pid12678.log
>>>>> -rw-r--r-- 1 570 Sep 20 09:21 gc-2017_09_20-09_21_15.log
>>>>> -rw-r--r-- 1 17741 Sep 20 09:21 hs_err_pid12706.log
>>>>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_17.log
>>>>> -rw-r--r-- 1 1722 Sep 20 09:21 gc-2017_09_20-09_21_18.log
>>>>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_19.log
>>>>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_20.log
>>>>>
>>>>> 3. This issue happens occasionally but frequently. We periodically launch a JAVA program to use JMX to monitor service status of another JAVA service.
>>>>>
>>>>> Br,
>>>>> Tim
>>>>>
>>>>> -----Original Message-----
>>>>> From: Andrew Haley [mailto:aph at redhat.com]
>>>>> Sent: Tuesday, September 19, 2017 9:13 PM
>>>>> To: Yu, Tim (NSB - CN/Chengdu); jdk8-dev at openjdk.java.net; jdk8u-dev at openjdk.java.net
>>>>> Cc: Shen, David (NSB - CN/Chengdu)
>>>>> Subject: Re: OpenJDK OOM issue -
>>>>>
>>>>> On 19/09/17 09:50, Yu, Tim (NSB - CN/Chengdu) wrote:
>>>>>> Hi OpenJDK dev group
>>>>>>
>>>>>> We meet one issue that the VM failed to initialize. The error log is as below. We checked both memory usage and thread number. They do not hit the limit. So could you please help to confirm why "java.lang.OutOfMemoryError: unable to create new native thread" error occurs? Many thanks.
>>>>>
>>>>> What OS is this?
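The address arithmetic discussed in this thread (a heap based at 0x715a00000 being "28gb", and a young generation starting at 0x0000000787380000 lying outside the first 4G) can be checked with a small C++ sketch. The helper names `gib_below`, `above_low_4g` and `reserved_mib` are hypothetical and exist only for this illustration; they are not HotSpot functions. The addresses are copied from the quoted hs_err output.

```cpp
#include <cstdint>

// Hypothetical helper: whole GiB below a virtual address (rounds down).
constexpr std::uint64_t gib_below(std::uint64_t address) {
  return address / (std::uint64_t{1} << 30);
}

// Hypothetical helper: true if the address is at or above the 4 GiB mark.
constexpr bool above_low_4g(std::uint64_t address) {
  return address >= (std::uint64_t{4} << 30);
}

// Hypothetical helper: size of the range [base, end) in whole MiB.
constexpr std::uint64_t reserved_mib(std::uint64_t base, std::uint64_t end) {
  return (end - base) >> 20;
}

// Addresses taken from the heap summary and memory map quoted in this thread.
constexpr std::uint64_t kOldGenBase   = 0x0000000715a00000ULL;  // ParOldGen start
constexpr std::uint64_t kYoungGenBase = 0x0000000787380000ULL;  // PSYoungGen start
constexpr std::uint64_t kHeapEnd      = 0x00000007c0000000ULL;  // PSYoungGen reserved end
constexpr std::uint64_t kNativeHeap   = 0x00000000019bc000ULL;  // "[heap]" mapping start

static_assert(gib_below(kOldGenBase) == 28, "Java heap base is about 28 GiB up");
static_assert(gib_below(kYoungGenBase) == 30, "young gen starts about 30 GiB up");
static_assert(above_low_4g(kYoungGenBase), "young gen is outside the low 4 GiB");
static_assert(!above_low_4g(kNativeHeap), "native [heap] mapping is in the low 4 GiB");
static_assert(reserved_mib(kOldGenBase, kHeapEnd) == 2726, "reserved Java heap in MiB");
```

The checks run at compile time, so the file only needs to compile (C++11 or later) for the arithmetic to be confirmed. The 2726 MiB figure is simply the distance between the two quoted addresses; how it relates to the `-XX:MaxHeapSize` value in the log is not stated in the thread.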
>>>>>
From rohitarulraj at gmail.com Fri Sep 22 07:41:24 2017
From: rohitarulraj at gmail.com (Rohit Arul Raj)
Date: Fri, 22 Sep 2017 13:11:24 +0530
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To:
References: <4d4fe028-ea6a-4f77-ab69-5c2bc752e1f5@oracle.com> <47bc0a90-ed6a-220a-c3d1-b4df2d8bbc74@oracle.com> <9c53f889-e58e-33ac-3c05-874779b469d6@oracle.com> <45619e1a-9eb0-a540-193b-5187da3bf6bc@oracle.com> <66e4af43-c0e2-6d64-b69f-35166150ffa2@oracle.com> <11af0f62-ba6b-d533-d23c-750d2ca012c7@oracle.com>
Message-ID:

Thanks Vladimir,

On Wed, Sep 20, 2017 at 10:07 PM, Vladimir Kozlov wrote:
>>      __ cmpl(rax, 0x80000000); // Is cpuid(0x80000001) supported?
>>      __ jcc(Assembler::belowEqual, done);
>>      __ cmpl(rax, 0x80000004); // Is cpuid(0x80000005) supported?
>> -    __ jccb(Assembler::belowEqual, ext_cpuid1);
>> +    __ jcc(Assembler::belowEqual, ext_cpuid1);
>
>
> Good. You may need to increase size of the buffer too (to be safe) to 1100:
>
> static const int stub_size = 1000;
>

Please find the updated patch after the requested change.

diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
--- a/src/cpu/x86/vm/vm_version_x86.cpp
+++ b/src/cpu/x86/vm/vm_version_x86.cpp
@@ -46,7 +46,7 @@
 address VM_Version::_cpuinfo_cont_addr = 0;

 static BufferBlob* stub_blob;
-static const int stub_size = 1000;
+static const int stub_size = 1100;

 extern "C" {
   typedef void (*get_cpu_info_stub_t)(void*);
@@ -70,7 +70,7 @@
     bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2);

     Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4;
-    Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, done, wrapup;
+    Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, ext_cpuid8, done, wrapup;
     Label legacy_setup, save_restore_except, legacy_save_restore, start_simd_check;

     StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub");
@@ -267,14 +267,30 @@
     __ cmpl(rax, 0x80000000); // Is cpuid(0x80000001) supported?
     __ jcc(Assembler::belowEqual, done);
     __ cmpl(rax, 0x80000004); // Is cpuid(0x80000005) supported?
-    __ jccb(Assembler::belowEqual, ext_cpuid1);
+    __ jcc(Assembler::belowEqual, ext_cpuid1);
     __ cmpl(rax, 0x80000006); // Is cpuid(0x80000007) supported?
     __ jccb(Assembler::belowEqual, ext_cpuid5);
     __ cmpl(rax, 0x80000007); // Is cpuid(0x80000008) supported?
     __ jccb(Assembler::belowEqual, ext_cpuid7);
+    __ cmpl(rax, 0x80000008); // Is cpuid(0x80000009 and above) supported?
+    __ jccb(Assembler::belowEqual, ext_cpuid8);
+    __ cmpl(rax, 0x8000001E); // Is cpuid(0x8000001E) supported?
+    __ jccb(Assembler::below, ext_cpuid8);
+    //
+    // Extended cpuid(0x8000001E)
+    //
+    __ movl(rax, 0x8000001E);
+    __ cpuid();
+    __ lea(rsi, Address(rbp, in_bytes(VM_Version::ext_cpuid1E_offset())));
+    __ movl(Address(rsi, 0), rax);
+    __ movl(Address(rsi, 4), rbx);
+    __ movl(Address(rsi, 8), rcx);
+    __ movl(Address(rsi,12), rdx);
+
     //
     // Extended cpuid(0x80000008)
     //
+    __ bind(ext_cpuid8);
     __ movl(rax, 0x80000008);
     __ cpuid();
     __ lea(rsi, Address(rbp, in_bytes(VM_Version::ext_cpuid8_offset())));
@@ -1109,11 +1125,27 @@
     }

 #ifdef COMPILER2
-    if (MaxVectorSize > 16) {
-      // Limit vectors size to 16 bytes on current AMD cpus.
+    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
+      // Limit vectors size to 16 bytes on AMD cpus < 17h.
       FLAG_SET_DEFAULT(MaxVectorSize, 16);
     }
 #endif // COMPILER2
+
+    // Some defaults for AMD family 17h
+    if ( cpu_family() == 0x17 ) {
+      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
+      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
+        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
+      }
+      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
+        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
+      }
+#ifdef COMPILER2
+      if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) {
+        FLAG_SET_DEFAULT(UseFPUForSpilling, true);
+      }
+#endif
+    }
   }

   if( is_intel() ) { // Intel cpus specific settings
diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
--- a/src/cpu/x86/vm/vm_version_x86.hpp
+++ b/src/cpu/x86/vm/vm_version_x86.hpp
@@ -228,6 +228,15 @@
     } bits;
   };

+  union ExtCpuid1EEbx {
+    uint32_t value;
+    struct {
+      uint32_t : 8,
+               threads_per_core : 8,
+               : 16;
+    } bits;
+  };
+
   union XemXcr0Eax {
     uint32_t value;
     struct {
@@ -398,6 +407,12 @@
     ExtCpuid8Ecx ext_cpuid8_ecx;
     uint32_t ext_cpuid8_edx; // reserved

+    // cpuid function 0x8000001E // AMD 17h
+    uint32_t ext_cpuid1E_eax;
+    ExtCpuid1EEbx ext_cpuid1E_ebx; // threads per core (AMD17h)
+    uint32_t ext_cpuid1E_ecx;
+    uint32_t ext_cpuid1E_edx; // unused currently
+
     // extended control register XCR0 (the XFEATURE_ENABLED_MASK register)
     XemXcr0Eax xem_xcr0_eax;
     uint32_t xem_xcr0_edx; // reserved
@@ -505,6 +520,14 @@
       result |= CPU_CLMUL;
     if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
       result |= CPU_RTM;
+    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
+      result |= CPU_ADX;
+    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
+      result |= CPU_BMI2;
+    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
+      result |= CPU_SHA;
+    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
+      result |= CPU_FMA;

     // AMD features.
     if (is_amd()) {
@@ -518,16 +541,8 @@
     }
     // Intel features.
     if(is_intel()) {
-      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
-        result |= CPU_ADX;
-      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
-        result |= CPU_BMI2;
-      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
-        result |= CPU_SHA;
       if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
         result |= CPU_LZCNT;
-      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
-        result |= CPU_FMA;
       // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
       if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
         result |= CPU_3DNOW_PREFETCH;
@@ -590,6 +605,7 @@
   static ByteSize ext_cpuid5_offset() { return byte_offset_of(CpuidInfo, ext_cpuid5_eax); }
   static ByteSize ext_cpuid7_offset() { return byte_offset_of(CpuidInfo, ext_cpuid7_eax); }
   static ByteSize ext_cpuid8_offset() { return byte_offset_of(CpuidInfo, ext_cpuid8_eax); }
+  static ByteSize ext_cpuid1E_offset() { return byte_offset_of(CpuidInfo, ext_cpuid1E_eax); }
   static ByteSize tpl_cpuidB0_offset() { return byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); }
   static ByteSize tpl_cpuidB1_offset() { return byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); }
   static ByteSize tpl_cpuidB2_offset() { return byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); }
@@ -673,8 +689,12 @@
     if (is_intel() && supports_processor_topology()) {
       result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus;
     } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) {
-      result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu /
-               cores_per_cpu();
+      if (cpu_family() >= 0x17) {
+        result = _cpuid_info.ext_cpuid1E_ebx.bits.threads_per_core + 1;
+      } else {
+        result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu /
+                 cores_per_cpu();
+      }
     }
     return (result == 0 ? 1 : result);
   }

Regards,
Rohit

>>
>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>> @@ -70,7 +70,7 @@
>>      bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2);
>>
>>      Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4;
>> -    Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, done, wrapup;
>> +    Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, ext_cpuid8, done, wrapup;
>>      Label legacy_setup, save_restore_except, legacy_save_restore, start_simd_check;
>>
>>      StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub");
>> @@ -267,14 +267,30 @@
>>      __ cmpl(rax, 0x80000000); // Is cpuid(0x80000001) supported?
>>      __ jcc(Assembler::belowEqual, done);
>>      __ cmpl(rax, 0x80000004); // Is cpuid(0x80000005) supported?
>> -    __ jccb(Assembler::belowEqual, ext_cpuid1);
>> +    __ jcc(Assembler::belowEqual, ext_cpuid1);
>>      __ cmpl(rax, 0x80000006); // Is cpuid(0x80000007) supported?
>>      __ jccb(Assembler::belowEqual, ext_cpuid5);
>>      __ cmpl(rax, 0x80000007); // Is cpuid(0x80000008) supported?
>>      __ jccb(Assembler::belowEqual, ext_cpuid7);
>> +    __ cmpl(rax, 0x80000008); // Is cpuid(0x80000009 and above) supported?
>> +    __ jccb(Assembler::belowEqual, ext_cpuid8);
>> +    __ cmpl(rax, 0x8000001E); // Is cpuid(0x8000001E) supported?
>> + __ jccb(Assembler::below, ext_cpuid8); >> + // >> + // Extended cpuid(0x8000001E) >> + // >> + __ movl(rax, 0x8000001E); >> + __ cpuid(); >> + __ lea(rsi, Address(rbp, >> in_bytes(VM_Version::ext_cpuid1E_offset()))); >> + __ movl(Address(rsi, 0), rax); >> + __ movl(Address(rsi, 4), rbx); >> + __ movl(Address(rsi, 8), rcx); >> + __ movl(Address(rsi,12), rdx); >> + >> // >> // Extended cpuid(0x80000008) >> // >> + __ bind(ext_cpuid8); >> __ movl(rax, 0x80000008); >> __ cpuid(); >> __ lea(rsi, Address(rbp, >> in_bytes(VM_Version::ext_cpuid8_offset()))); >> @@ -1109,11 +1125,27 @@ >> } >> >> #ifdef COMPILER2 >> - if (MaxVectorSize > 16) { >> - // Limit vectors size to 16 bytes on current AMD cpus. >> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >> FLAG_SET_DEFAULT(MaxVectorSize, 16); >> } >> #endif // COMPILER2 >> + >> + // Some defaults for AMD family 17h >> + if ( cpu_family() == 0x17 ) { >> + // On family 17h processors use XMM and UnalignedLoadStores for >> Array Copy >> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >> + } >> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >> + } >> +#ifdef COMPILER2 >> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >> + } >> +#endif >> + } >> } >> >> if( is_intel() ) { // Intel cpus specific settings >> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >> b/src/cpu/x86/vm/vm_version_x86.hpp >> --- a/src/cpu/x86/vm/vm_version_x86.hpp >> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >> @@ -228,6 +228,15 @@ >> } bits; >> }; >> >> + union ExtCpuid1EEbx { >> + uint32_t value; >> + struct { >> + uint32_t : 8, >> + threads_per_core : 8, >> + : 16; >> + } bits; >> + }; >> + >> union XemXcr0Eax { >> uint32_t value; >> struct { >> @@ -398,6 +407,12 @@ >> ExtCpuid8Ecx 
ext_cpuid8_ecx; >> uint32_t ext_cpuid8_edx; // reserved >> >> + // cpuid function 0x8000001E // AMD 17h >> + uint32_t ext_cpuid1E_eax; >> + ExtCpuid1EEbx ext_cpuid1E_ebx; // threads per core (AMD17h) >> + uint32_t ext_cpuid1E_ecx; >> + uint32_t ext_cpuid1E_edx; // unused currently >> + >> // extended control register XCR0 (the XFEATURE_ENABLED_MASK >> register) >> XemXcr0Eax xem_xcr0_eax; >> uint32_t xem_xcr0_edx; // reserved >> @@ -505,6 +520,14 @@ >> result |= CPU_CLMUL; >> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >> result |= CPU_RTM; >> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >> + result |= CPU_ADX; >> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >> + result |= CPU_BMI2; >> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >> + result |= CPU_SHA; >> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >> + result |= CPU_FMA; >> >> // AMD features. >> if (is_amd()) { >> @@ -518,16 +541,8 @@ >> } >> // Intel features. >> if(is_intel()) { >> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >> - result |= CPU_ADX; >> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >> - result |= CPU_BMI2; >> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >> - result |= CPU_SHA; >> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >> result |= CPU_LZCNT; >> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >> - result |= CPU_FMA; >> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >> support for prefetchw >> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >> result |= CPU_3DNOW_PREFETCH; >> @@ -590,6 +605,7 @@ >> static ByteSize ext_cpuid5_offset() { return >> byte_offset_of(CpuidInfo, ext_cpuid5_eax); } >> static ByteSize ext_cpuid7_offset() { return >> byte_offset_of(CpuidInfo, ext_cpuid7_eax); } >> static ByteSize ext_cpuid8_offset() { return >> byte_offset_of(CpuidInfo, ext_cpuid8_eax); } >> + static ByteSize ext_cpuid1E_offset() { return >> byte_offset_of(CpuidInfo, ext_cpuid1E_eax); } >> static ByteSize tpl_cpuidB0_offset() { return >> 
byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } >> static ByteSize tpl_cpuidB1_offset() { return >> byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } >> static ByteSize tpl_cpuidB2_offset() { return >> byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } >> @@ -673,8 +689,12 @@ >> if (is_intel() && supports_processor_topology()) { >> result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; >> } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { >> - result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >> - cores_per_cpu(); >> + if (cpu_family() >= 0x17) { >> + result = _cpuid_info.ext_cpuid1E_ebx.bits.threads_per_core + 1; >> + } else { >> + result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >> + cores_per_cpu(); >> + } >> } >> return (result == 0 ? 1 : result); >> } >> >> Please let me know your comments. >> Thanks for your review. >> >> Regards, >> Rohit >> >>> >>> >>> On 9/11/17 9:52 PM, Rohit Arul Raj wrote: >>>> >>>> >>>> Hello David, >>>> >>>>>> >>>>>> >>>>>> 1. ExtCpuid1EEx >>>>>> >>>>>> Should this be ExtCpuid1EEbx? (I see the naming here is somewhat >>>>>> inconsistent - and potentially confusing: I would have preferred to >>>>>> see >>>>>> things like ExtCpuid_1E_Ebx, to make it clear.) >>>>> >>>>> >>>>> >>>>> Yes, I can change it accordingly. >>>>> >>>> >>>> I have attached the updated, re-tested patch as per your comments above. 
>>>> >>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>> @@ -70,7 +70,7 @@ >>>> bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2); >>>> >>>> Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4; >>>> - Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>>> done, wrapup; >>>> + Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>>> ext_cpuid8, done, wrapup; >>>> Label legacy_setup, save_restore_except, legacy_save_restore, >>>> start_simd_check; >>>> >>>> StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub"); >>>> @@ -272,9 +272,23 @@ >>>> __ jccb(Assembler::belowEqual, ext_cpuid5); >>>> __ cmpl(rax, 0x80000007); // Is cpuid(0x80000008) supported? >>>> __ jccb(Assembler::belowEqual, ext_cpuid7); >>>> + __ cmpl(rax, 0x80000008); // Is cpuid(0x8000001E) supported? >>>> + __ jccb(Assembler::belowEqual, ext_cpuid8); >>>> + // >>>> + // Extended cpuid(0x8000001E) >>>> + // >>>> + __ movl(rax, 0x8000001E); >>>> + __ cpuid(); >>>> + __ lea(rsi, Address(rbp, >>>> in_bytes(VM_Version::ext_cpuid_1E_offset()))); >>>> + __ movl(Address(rsi, 0), rax); >>>> + __ movl(Address(rsi, 4), rbx); >>>> + __ movl(Address(rsi, 8), rcx); >>>> + __ movl(Address(rsi,12), rdx); >>>> + >>>> // >>>> // Extended cpuid(0x80000008) >>>> // >>>> + __ bind(ext_cpuid8); >>>> __ movl(rax, 0x80000008); >>>> __ cpuid(); >>>> __ lea(rsi, Address(rbp, >>>> in_bytes(VM_Version::ext_cpuid8_offset()))); >>>> @@ -1109,11 +1123,27 @@ >>>> } >>>> >>>> #ifdef COMPILER2 >>>> - if (MaxVectorSize > 16) { >>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>> } >>>> #endif // COMPILER2 >>>> + >>>> + // Some defaults for AMD family 17h >>>> + if ( cpu_family() == 0x17 ) { >>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>> Array Copy >>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>> + } >>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>> + } >>>> +#ifdef COMPILER2 >>>> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>> + } >>>> +#endif >>>> + } >>>> } >>>> >>>> if( is_intel() ) { // Intel cpus specific settings >>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>> @@ -228,6 +228,15 @@ >>>> } bits; >>>> }; >>>> >>>> + union ExtCpuid_1E_Ebx { >>>> + uint32_t value; >>>> + struct { >>>> + uint32_t : 8, >>>> + threads_per_core : 8, >>>> + : 16; >>>> + } bits; >>>> + }; >>>> + >>>> union XemXcr0Eax { >>>> uint32_t value; >>>> struct { >>>> @@ -398,6 +407,12 @@ >>>> ExtCpuid8Ecx ext_cpuid8_ecx; >>>> uint32_t ext_cpuid8_edx; // reserved >>>> >>>> + // cpuid function 0x8000001E // AMD 17h >>>> + uint32_t ext_cpuid_1E_eax; >>>> + ExtCpuid_1E_Ebx ext_cpuid_1E_ebx; // threads per core (AMD17h) >>>> + uint32_t ext_cpuid_1E_ecx; >>>> + uint32_t ext_cpuid_1E_edx; // unused currently >>>> + >>>> // extended control register XCR0 (the XFEATURE_ENABLED_MASK >>>> register) >>>> XemXcr0Eax xem_xcr0_eax; >>>> uint32_t xem_xcr0_edx; // reserved >>>> @@ -505,6 +520,14 @@ >>>> result |= CPU_CLMUL; >>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>> result |= CPU_RTM; >>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>> + result |= CPU_ADX; >>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>> + result |= 
CPU_BMI2; >>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>> + result |= CPU_SHA; >>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>> + result |= CPU_FMA; >>>> >>>> // AMD features. >>>> if (is_amd()) { >>>> @@ -518,16 +541,8 @@ >>>> } >>>> // Intel features. >>>> if(is_intel()) { >>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>> - result |= CPU_ADX; >>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>> - result |= CPU_BMI2; >>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>> - result |= CPU_SHA; >>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>> result |= CPU_LZCNT; >>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>> - result |= CPU_FMA; >>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>> support for prefetchw >>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>> result |= CPU_3DNOW_PREFETCH; >>>> @@ -590,6 +605,7 @@ >>>> static ByteSize ext_cpuid5_offset() { return >>>> byte_offset_of(CpuidInfo, ext_cpuid5_eax); } >>>> static ByteSize ext_cpuid7_offset() { return >>>> byte_offset_of(CpuidInfo, ext_cpuid7_eax); } >>>> static ByteSize ext_cpuid8_offset() { return >>>> byte_offset_of(CpuidInfo, ext_cpuid8_eax); } >>>> + static ByteSize ext_cpuid_1E_offset() { return >>>> byte_offset_of(CpuidInfo, ext_cpuid_1E_eax); } >>>> static ByteSize tpl_cpuidB0_offset() { return >>>> byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } >>>> static ByteSize tpl_cpuidB1_offset() { return >>>> byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } >>>> static ByteSize tpl_cpuidB2_offset() { return >>>> byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } >>>> @@ -673,8 +689,11 @@ >>>> if (is_intel() && supports_processor_topology()) { >>>> result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; >>>> } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { >>>> - result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>> - cores_per_cpu(); >>>> + if (cpu_family() >= 0x17) >>>> + result = 
_cpuid_info.ext_cpuid_1E_ebx.bits.threads_per_core + >>>> 1; >>>> + else >>>> + result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>> + cores_per_cpu(); >>>> } >>>> return (result == 0 ? 1 : result); >>>> } >>>> >>>> >>>> Please let me know your comments >>>> >>>> Thanks for your time. >>>> >>>> Regards, >>>> Rohit >>>> >>>> >>>>>> Thanks, >>>>>> David >>>>>> ----- >>>>>> >>>>>> >>>>>>> Reference: >>>>>>> >>>>>>> >>>>>>> https://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf >>>>>>> [Pg 82] >>>>>>> >>>>>>> CPUID_Fn8000001E_EBX [Core Identifiers] (CoreId) >>>>>>> 15:8 ThreadsPerCore: threads per core. Read-only. Reset: >>>>>>> XXh. >>>>>>> The number of threads per core is ThreadsPerCore+1. >>>>>>> >>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> @@ -70,7 +70,7 @@ >>>>>>> bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2); >>>>>>> >>>>>>> Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4; >>>>>>> - Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>>>>>> done, wrapup; >>>>>>> + Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>>>>>> ext_cpuid8, done, wrapup; >>>>>>> Label legacy_setup, save_restore_except, legacy_save_restore, >>>>>>> start_simd_check; >>>>>>> >>>>>>> StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub"); >>>>>>> @@ -272,9 +272,23 @@ >>>>>>> __ jccb(Assembler::belowEqual, ext_cpuid5); >>>>>>> __ cmpl(rax, 0x80000007); // Is cpuid(0x80000008) >>>>>>> supported? >>>>>>> __ jccb(Assembler::belowEqual, ext_cpuid7); >>>>>>> + __ cmpl(rax, 0x80000008); // Is cpuid(0x8000001E) supported? 
>>>>>>> + __ jccb(Assembler::belowEqual, ext_cpuid8); >>>>>>> + // >>>>>>> + // Extended cpuid(0x8000001E) >>>>>>> + // >>>>>>> + __ movl(rax, 0x8000001E); >>>>>>> + __ cpuid(); >>>>>>> + __ lea(rsi, Address(rbp, >>>>>>> in_bytes(VM_Version::ext_cpuid1E_offset()))); >>>>>>> + __ movl(Address(rsi, 0), rax); >>>>>>> + __ movl(Address(rsi, 4), rbx); >>>>>>> + __ movl(Address(rsi, 8), rcx); >>>>>>> + __ movl(Address(rsi,12), rdx); >>>>>>> + >>>>>>> // >>>>>>> // Extended cpuid(0x80000008) >>>>>>> // >>>>>>> + __ bind(ext_cpuid8); >>>>>>> __ movl(rax, 0x80000008); >>>>>>> __ cpuid(); >>>>>>> __ lea(rsi, Address(rbp, >>>>>>> in_bytes(VM_Version::ext_cpuid8_offset()))); >>>>>>> @@ -1109,11 +1123,27 @@ >>>>>>> } >>>>>>> >>>>>>> #ifdef COMPILER2 >>>>>>> - if (MaxVectorSize > 16) { >>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>> } >>>>>>> #endif // COMPILER2 >>>>>>> + >>>>>>> + // Some defaults for AMD family 17h >>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>> for >>>>>>> Array Copy >>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>> + } >>>>>>> + if (supports_sse2() && >>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>> { >>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>> + } >>>>>>> +#ifdef COMPILER2 >>>>>>> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>> + } >>>>>>> +#endif >>>>>>> + } >>>>>>> } >>>>>>> >>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> +++ 
b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> @@ -228,6 +228,15 @@ >>>>>>> } bits; >>>>>>> }; >>>>>>> >>>>>>> + union ExtCpuid1EEx { >>>>>>> + uint32_t value; >>>>>>> + struct { >>>>>>> + uint32_t : 8, >>>>>>> + threads_per_core : 8, >>>>>>> + : 16; >>>>>>> + } bits; >>>>>>> + }; >>>>>>> + >>>>>>> union XemXcr0Eax { >>>>>>> uint32_t value; >>>>>>> struct { >>>>>>> @@ -398,6 +407,12 @@ >>>>>>> ExtCpuid8Ecx ext_cpuid8_ecx; >>>>>>> uint32_t ext_cpuid8_edx; // reserved >>>>>>> >>>>>>> + // cpuid function 0x8000001E // AMD 17h >>>>>>> + uint32_t ext_cpuid1E_eax; >>>>>>> + ExtCpuid1EEx ext_cpuid1E_ebx; // threads per core (AMD17h) >>>>>>> + uint32_t ext_cpuid1E_ecx; >>>>>>> + uint32_t ext_cpuid1E_edx; // unused currently >>>>>>> + >>>>>>> // extended control register XCR0 (the XFEATURE_ENABLED_MASK >>>>>>> register) >>>>>>> XemXcr0Eax xem_xcr0_eax; >>>>>>> uint32_t xem_xcr0_edx; // reserved >>>>>>> @@ -505,6 +520,14 @@ >>>>>>> result |= CPU_CLMUL; >>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>> result |= CPU_RTM; >>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>> + result |= CPU_ADX; >>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>> + result |= CPU_BMI2; >>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>> + result |= CPU_SHA; >>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>> + result |= CPU_FMA; >>>>>>> >>>>>>> // AMD features. >>>>>>> if (is_amd()) { >>>>>>> @@ -518,16 +541,8 @@ >>>>>>> } >>>>>>> // Intel features. 
>>>>>>> if(is_intel()) { >>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>> - result |= CPU_ADX; >>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>> - result |= CPU_BMI2; >>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>> - result |= CPU_SHA; >>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>> result |= CPU_LZCNT; >>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>> - result |= CPU_FMA; >>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>>>> support for prefetchw >>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>> @@ -590,6 +605,7 @@ >>>>>>> static ByteSize ext_cpuid5_offset() { return >>>>>>> byte_offset_of(CpuidInfo, ext_cpuid5_eax); } >>>>>>> static ByteSize ext_cpuid7_offset() { return >>>>>>> byte_offset_of(CpuidInfo, ext_cpuid7_eax); } >>>>>>> static ByteSize ext_cpuid8_offset() { return >>>>>>> byte_offset_of(CpuidInfo, ext_cpuid8_eax); } >>>>>>> + static ByteSize ext_cpuid1E_offset() { return >>>>>>> byte_offset_of(CpuidInfo, ext_cpuid1E_eax); } >>>>>>> static ByteSize tpl_cpuidB0_offset() { return >>>>>>> byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } >>>>>>> static ByteSize tpl_cpuidB1_offset() { return >>>>>>> byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } >>>>>>> static ByteSize tpl_cpuidB2_offset() { return >>>>>>> byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } >>>>>>> @@ -673,8 +689,11 @@ >>>>>>> if (is_intel() && supports_processor_topology()) { >>>>>>> result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; >>>>>>> } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { >>>>>>> - result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>>>>> - cores_per_cpu(); >>>>>>> + if (cpu_family() >= 0x17) >>>>>>> + result = _cpuid_info.ext_cpuid1E_ebx.bits.threads_per_core + >>>>>>> 1; >>>>>>> + else >>>>>>> + result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>>>>> + cores_per_cpu(); >>>>>>> } >>>>>>> return 
(result == 0 ? 1 : result); >>>>>>> } >>>>>>> >>>>>>> I have attached the patch for review. >>>>>>> Please let me know your comments. >>>>>>> >>>>>>> Thanks, >>>>>>> Rohit >>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> >>>>>>>>> No comments on AMD specific changes. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> David >>>>>>>>> ----- >>>>>>>>> >>>>>>>>> On 5/09/2017 3:43 PM, David Holmes wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 5/09/2017 3:29 PM, Rohit Arul Raj wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hello David, >>>>>>>>>>> >>>>>>>>>>> On Tue, Sep 5, 2017 at 10:31 AM, David Holmes >>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>> >>>>>>>>>>>> I was unable to apply your patch to latest jdk10/hs/hotspot >>>>>>>>>>>> repo. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I checked out the latest jdk10/hs/hotspot [parent: >>>>>>>>>>> 13548:1a9c2e07a826] >>>>>>>>>>> and was able to apply the patch >>>>>>>>>>> [epyc-amd17h-defaults-3Sept.patch] >>>>>>>>>>> without any issues. >>>>>>>>>>> Can you share the error message that you are getting? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I was getting this: >>>>>>>>>> >>>>>>>>>> applying hotspot.patch >>>>>>>>>> patching file src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> Hunk #1 FAILED at 1108 >>>>>>>>>> 1 out of 1 hunks FAILED -- saving rejects to file >>>>>>>>>> src/cpu/x86/vm/vm_version_x86.cpp.rej >>>>>>>>>> patching file src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>> Hunk #2 FAILED at 522 >>>>>>>>>> 1 out of 2 hunks FAILED -- saving rejects to file >>>>>>>>>> src/cpu/x86/vm/vm_version_x86.hpp.rej >>>>>>>>>> abort: patch failed to apply >>>>>>>>>> >>>>>>>>>> but I started again and this time it applied fine, so not sure >>>>>>>>>> what >>>>>>>>>> was >>>>>>>>>> going on there. 
>>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> David >>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Rohit >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 4/09/2017 2:42 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Hello Vladimir, >>>>>>>>>>>>> >>>>>>>>>>>>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hello Vladimir, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Changes look good. Only question I have is about >>>>>>>>>>>>>>>> MaxVectorSize. >>>>>>>>>>>>>>>> It >>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>> set >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 16 only in presence of AVX: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Does that code works for AMD 17h too? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks for pointing that out. Yes, the code works fine for >>>>>>>>>>>>>>> AMD >>>>>>>>>>>>>>> 17h. >>>>>>>>>>>>>>> So >>>>>>>>>>>>>>> I have removed the surplus check for MaxVectorSize from my >>>>>>>>>>>>>>> patch. >>>>>>>>>>>>>>> I >>>>>>>>>>>>>>> have updated, re-tested and attached the patch. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Which check you removed? 
>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> My older patch had the below mentioned check which was required >>>>>>>>>>>>> on >>>>>>>>>>>>> JDK9 where the default MaxVectorSize was 64. It has been >>>>>>>>>>>>> handled >>>>>>>>>>>>> better in openJDK10. So this check is not required anymore. >>>>>>>>>>>>> >>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>> ... >>>>>>>>>>>>> ... >>>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>> + } >>>>>>>>>>>>> .. >>>>>>>>>>>>> .. >>>>>>>>>>>>> + } >>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have one query regarding the setting of UseSHA flag: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> AMD 17h has support for SHA. >>>>>>>>>>>>>>> AMD 15h doesn't have support for SHA. Still "UseSHA" flag >>>>>>>>>>>>>>> gets >>>>>>>>>>>>>>> enabled for it based on the availability of BMI2 and AVX2. Is >>>>>>>>>>>>>>> there >>>>>>>>>>>>>>> an >>>>>>>>>>>>>>> underlying reason for this? I have handled this in the patch >>>>>>>>>>>>>>> but >>>>>>>>>>>>>>> just >>>>>>>>>>>>>>> wanted to confirm. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> It was done with next changes which use only AVX2 and BMI2 >>>>>>>>>>>>>> instructions >>>>>>>>>>>>>> to >>>>>>>>>>>>>> calculate SHA-256: >>>>>>>>>>>>>> >>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974 >>>>>>>>>>>>>> >>>>>>>>>>>>>> I don't know if AMD 15h supports these instructions and can >>>>>>>>>>>>>> execute >>>>>>>>>>>>>> that >>>>>>>>>>>>>> code. You need to test it. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Ok, got it. 
Since AMD15h has support for AVX2 and BMI2 >>>>>>>>>>>>> instructions, >>>>>>>>>>>>> it should work. >>>>>>>>>>>>> Confirmed by running following sanity tests: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java >>>>>>>>>>>>> >>>>>>>>>>>>> So I have removed those SHA checks from my patch too. >>>>>>>>>>>>> >>>>>>>>>>>>> Please find attached updated, re-tested patch. >>>>>>>>>>>>> >>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>> @@ -1109,11 +1109,27 @@ >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>> } >>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>> + >>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>> for >>>>>>>>>>>>> Array Copy >>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>>>>>>> + } >>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>> { >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>>>>>>> + } >>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>> + if (supports_sse4_2() && >>>>>>>>>>>>> FLAG_IS_DEFAULT(UseFPUForSpilling)) >>>>>>>>>>>>> { >>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>> + } >>>>>>>>>>>>> +#endif >>>>>>>>>>>>> + } >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>>>> result |= CPU_RTM; >>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>>> >>>>>>>>>>>>> // AMD features. 
>>>>>>>>>>>>> if (is_amd()) { >>>>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>>> } >>>>>>>>>>>>> // Intel features. >>>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != >>>>>>>>>>>>> 0) >>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>>>>>> indicates >>>>>>>>>>>>> support for prefetchw >>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != >>>>>>>>>>>>> 0) >>>>>>>>>>>>> { >>>>>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>>>> >>>>>>>>>>>>> Please let me know your comments. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for your time. >>>>>>>>>>>>> Rohit >>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks for taking time to review the code. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>>>> || >>>>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>>>>>>> CPU"); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>>>> @@ -1109,11 +1125,40 @@ >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>>> for >>>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseBMI2Instructions, true); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto >>>>>>>>>>>>>>> hash >>>>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>> +++ 
b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>>>>>> result |= CPU_RTM; >>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> // AMD features. >>>>>>>>>>>>>>> if (is_amd()) { >>>>>>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> // Intel features. 
>>>>>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel >>>>>>>>>>>>>>> != >>>>>>>>>>>>>>> 0) >>>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>>>>>>>> indicates >>>>>>>>>>>>>>> support for prefetchw >>>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse >>>>>>>>>>>>>>> != >>>>>>>>>>>>>>> 0) { >>>>>>>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I think the patch needs updating for jdk10 as I already >>>>>>>>>>>>>>>>>>> see >>>>>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>>>>> lot of >>>>>>>>>>>>>>>>>>> logic >>>>>>>>>>>>>>>>>>> around UseSHA in vm_version_x86.cpp. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks David, I will update the patch wrt JDK10 source >>>>>>>>>>>>>>>>>> base, >>>>>>>>>>>>>>>>>> test >>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>> resubmit for review. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>>>>>>>>>>>>>>> 13519:71337910df60), did regression testing using jtreg >>>>>>>>>>>>>>>>> ($make >>>>>>>>>>>>>>>>> default) and didnt find any regressions. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Can anyone please volunteer to review this patch which >>>>>>>>>>>>>>>>> sets >>>>>>>>>>>>>>>>> flag/ISA >>>>>>>>>>>>>>>>> defaults for newer AMD 17h (EPYC) processor? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ************************* Patch >>>>>>>>>>>>>>>>> **************************** >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>>>>>> || >>>>>>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>> + warning("SHA 
instructions are not available on >>>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>> CPU"); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>>>>>> @@ -1109,11 +1125,43 @@ >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD >>>>>>>>>>>>>>>>> cpus. >>>>>>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>>>>>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + if (UseSHA) { 
>>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 >>>>>>>>>>>>>>>>> crypto >>>>>>>>>>>>>>>>> hash >>>>>>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific >>>>>>>>>>>>>>>>> settings >>>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>>>>>>>> result |= CPU_RTM; >>>>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> // AMD features. 
>>>>>>>>>>>>>>>>> if (is_amd()) { >>>>>>>>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != >>>>>>>>>>>>>>>>> 0) >>>>>>>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> // Intel features. >>>>>>>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel >>>>>>>>>>>>>>>>> != 0) >>>>>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit >>>>>>>>>>>>>>>>> 8) >>>>>>>>>>>>>>>>> indicates >>>>>>>>>>>>>>>>> support for prefetchw >>>>>>>>>>>>>>>>> if >>>>>>>>>>>>>>>>> (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse >>>>>>>>>>>>>>>>> != >>>>>>>>>>>>>>>>> 0) { >>>>>>>>>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ************************************************************** >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I would like an volunteer to review this patch >>>>>>>>>>>>>>>>>>>>>> (openJDK9) >>>>>>>>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>>>>>>>> sets >>>>>>>>>>>>>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor >>>>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>> help >>>>>>>>>>>>>>>>>>>>>> us >>>>>>>>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>>>>>> the commit process. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Unfortunately patches can not be accepted from systems >>>>>>>>>>>>>>>>>>>>> outside >>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>> OpenJDK >>>>>>>>>>>>>>>>>>>>> infrastructure and ... >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I have also attached the patch (hg diff -g) for >>>>>>>>>>>>>>>>>>>>>> reference. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> ... 
unfortunately patches tend to get stripped by the >>>>>>>>>>>>>>>>>>>>> mail >>>>>>>>>>>>>>>>>>>>> servers. >>>>>>>>>>>>>>>>>>>>> If >>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>> patch is small please include it inline. Otherwise you >>>>>>>>>>>>>>>>>>>>> will >>>>>>>>>>>>>>>>>>>>> need >>>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>> find >>>>>>>>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>>>>>>> OpenJDK Author who can host it for you on >>>>>>>>>>>>>>>>>>>>> cr.openjdk.java.net. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 3) I have done regression testing using jtreg ($make >>>>>>>>>>>>>>>>>>>>>> default) >>>>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>> didnt find any regressions. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Sounds good, but until I see the patch it is hard to >>>>>>>>>>>>>>>>>>>>> comment >>>>>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>>>>> testing >>>>>>>>>>>>>>>>>>>>> requirements. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks David, >>>>>>>>>>>>>>>>>>>> Yes, it's a small patch. 
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, >>>>>>>>>>>>>>>>>>>> false); >>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>>>>>>>>> || >>>>>>>>>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>>>>> + warning("SHA instructions are not available on >>>>>>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>>>>> CPU"); >>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>>>>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD >>>>>>>>>>>>>>>>>>>> cpus. 
>>>>>>>>>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < >>>>>>>>>>>>>>>>>>>> 17h. >>>>>>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 >>>>>>>>>>>>>>>>>>>> crypto >>>>>>>>>>>>>>>>>>>> hash >>>>>>>>>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>>>>>>>>>> + if 
(FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific >>>>>>>>>>>>>>>>>>>> settings >>>>>>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>>>> @@ -513,6 +513,16 @@ >>>>>>>>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a >>>>>>>>>>>>>>>>>>>> != >>>>>>>>>>>>>>>>>>>> 0) >>>>>>>>>>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>> // Intel features. 
>>>>>>>>>>>>>>>>>>>> if(is_intel()) {
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>> Rohit
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>
>>>>>
>

From bob.vandette at oracle.com Fri Sep 22 14:27:11 2017
From: bob.vandette at oracle.com (Bob Vandette)
Date: Fri, 22 Sep 2017 10:27:11 -0400
Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage
Message-ID: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com>

Please review these changes that improve docker container detection and the automatic configuration of the number of active CPUs and of total and free memory, based on the container's resource limitation settings and metric data files.

http://cr.openjdk.java.net/~bobv/8146115/webrev.00/

These changes are enabled with -XX:+UseContainerSupport.

You can enable logging for this support via -Xlog:os+container=trace.

Since the dynamic selection of CPUs based on cpusets, quotas and shares may not satisfy every user's needs, I've added an additional flag to allow the number of CPUs to be overridden. This flag is named -XX:ActiveProcessorCount=xx.

Bob.

From coleen.phillimore at oracle.com Fri Sep 22 20:33:44 2017
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Fri, 22 Sep 2017 16:33:44 -0400
Subject: RFR (S) 8081323: ConstantPool::_resolved_references is missing in heap dump
In-Reply-To:
References: <915c3300-2528-2b85-2492-0b54a783c622@oracle.com>
Message-ID:

Harold pointed out privately that having ConstantPool::resolved_references() sometimes return NULL forces the callers to check for null for safety, so I added a resolved_references_or_null() call for the heap dumper to call instead.

Please review this new version, also small:

open webrev at http://cr.openjdk.java.net/~coleenp/8081323.02/webrev
bug link https://bugs.openjdk.java.net/browse/JDK-8081323

Reran tier1 and heapdump tests.
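
For illustration only (this is not code from the webrev): the split between a strict accessor and an "_or_null" accessor described above might be sketched roughly as follows. PoolSketch, RefArray, set_refs() and dumpable_entries() are hypothetical stand-ins, not HotSpot types or functions, and whether the real resolved_references() asserts on NULL is an assumption made here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a resolved-references array; not a HotSpot type.
typedef std::vector<std::string> RefArray;

// Hypothetical stand-in for a constant pool holding such an array.
class PoolSketch {
  RefArray* _refs;  // may legitimately be NULL before the array is set up
 public:
  PoolSketch() : _refs(NULL) {}
  void set_refs(RefArray* r) { _refs = r; }

  // Strict accessor: ordinary callers rely on a non-null result
  // (the assert is an assumption of this sketch).
  RefArray* resolved_references() const {
    assert(_refs != NULL && "resolved references not set up yet");
    return _refs;
  }

  // Tolerant accessor: for callers, such as a heap dumper, that must cope
  // with a pool whose array does not exist yet.
  RefArray* resolved_references_or_null() const { return _refs; }
};

// Number of entries a dumper would visit; pools without an array yield 0.
size_t dumpable_entries(const PoolSketch& pool) {
  RefArray* refs = pool.resolved_references_or_null();
  return refs == NULL ? 0 : refs->size();
}
```

With this shape, only the one caller that can meet an unset array pays for a null check, while every other caller keeps using the strict accessor unchanged.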
Thanks,
Coleen

On 9/5/17 12:50 PM, coleen.phillimore at oracle.com wrote:
>
> Thank you, Serguei!
> Coleen
>
> On 9/1/17 6:48 PM, serguei.spitsyn at oracle.com wrote:
>> Hi Coleen,
>>
>> The fix looks good.
>>
>> Thanks,
>> Serguei
>>
>>
>> On 8/31/17 09:02, coleen.phillimore at oracle.com wrote:
>>> Summary: Add resolved_references and init_lock as hidden static
>>> field in class so root is found.
>>>
>>> Tested manually with YourKit. See bug for images. Also ran
>>> serviceability tests.
>>>
>>> open webrev at http://cr.openjdk.java.net/~coleenp/8081323.01/webrev
>>> bug link https://bugs.openjdk.java.net/browse/JDK-8081323
>>>
>>> Thanks,
>>> Coleen
>>>
>>
>

From harold.seigel at oracle.com Fri Sep 22 20:36:19 2017
From: harold.seigel at oracle.com (harold seigel)
Date: Fri, 22 Sep 2017 16:36:19 -0400
Subject: RFR (S) 8081323: ConstantPool::_resolved_references is missing in heap dump
In-Reply-To:
References: <915c3300-2528-2b85-2492-0b54a783c622@oracle.com>
Message-ID: <87da3a46-41dd-1d45-b18d-99cbd45ec4e7@oracle.com>

Hi Coleen,

It looks good. Thanks for fixing it.

Harold

On 9/22/2017 4:33 PM, coleen.phillimore at oracle.com wrote:
>
> Harold pointed out privately that having
> ConstantPool::resolved_references() sometimes return NULL forces the
> callers to check for null for safety, so I added a
> resolved_references_or_null() call for the heap dumper to call instead.
>
> Please review this new version, also small:
>
> open webrev at http://cr.openjdk.java.net/~coleenp/8081323.02/webrev
> bug link https://bugs.openjdk.java.net/browse/JDK-8081323
>
> Reran tier1 and heapdump tests.
>
> Thanks,
> Coleen
>
> On 9/5/17 12:50 PM, coleen.phillimore at oracle.com wrote:
>>
>> Thank you, Serguei!
>> Coleen
>>
>> On 9/1/17 6:48 PM, serguei.spitsyn at oracle.com wrote:
>>> Hi Coleen,
>>>
>>> The fix looks good.
>>> >>> Thanks, >>> Serguei >>> >>> >>> On 8/31/17 09:02, coleen.phillimore at oracle.com wrote: >>>> Summary: Add resolved_references and init_lock as hidden static >>>> field in class so root is found. >>>> >>>> Tested manually with YourKit.? See bug for images.?? Also ran >>>> serviceability tests. >>>> >>>> open webrev at http://cr.openjdk.java.net/~coleenp/8081323.01/webrev >>>> bug link https://bugs.openjdk.java.net/browse/JDK-8081323 >>>> >>>> Thanks, >>>> Coleen >>>> >>> >> > From coleen.phillimore at oracle.com Fri Sep 22 20:56:13 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 22 Sep 2017 16:56:13 -0400 Subject: RFR (S) 8081323: ConstantPool::_resolved_references is missing in heap dump In-Reply-To: <87da3a46-41dd-1d45-b18d-99cbd45ec4e7@oracle.com> References: <915c3300-2528-2b85-2492-0b54a783c622@oracle.com> <87da3a46-41dd-1d45-b18d-99cbd45ec4e7@oracle.com> Message-ID: Thanks, Harold! Coleen On 9/22/17 4:36 PM, harold seigel wrote: > Hi Coleen, > > It looks good.? Thanks for fixing it. > > Harold > > > On 9/22/2017 4:33 PM, coleen.phillimore at oracle.com wrote: >> >> Harold pointed out privately that having >> ConstantPool::resolved_references() sometimes return NULL forces the >> callers to check for null for safety, so I added a >> resolved_references_or_null() call for the heap dumper to call instead. >> >> Please review this new version, also small: >> >> open webrev at http://cr.openjdk.java.net/~coleenp/8081323.02/webrev >> bug link https://bugs.openjdk.java.net/browse/JDK-8081323 >> >> Reran tier1 and heapdump tests. >> >> Thanks, >> Coleen >> >> On 9/5/17 12:50 PM, coleen.phillimore at oracle.com wrote: >>> >>> Thank you, Serguei! >>> Coleen >>> >>> On 9/1/17 6:48 PM, serguei.spitsyn at oracle.com wrote: >>>> Hi Coleen, >>>> >>>> The fix looks good. 
>>>> >>>> Thanks, >>>> Serguei >>>> >>>> >>>> On 8/31/17 09:02, coleen.phillimore at oracle.com wrote: >>>>> Summary: Add resolved_references and init_lock as hidden static >>>>> field in class so root is found. >>>>> >>>>> Tested manually with YourKit.? See bug for images.?? Also ran >>>>> serviceability tests. >>>>> >>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8081323.01/webrev >>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8081323 >>>>> >>>>> Thanks, >>>>> Coleen >>>>> >>>> >>> >> > From jiangli.zhou at oracle.com Fri Sep 22 22:33:48 2017 From: jiangli.zhou at oracle.com (Jiangli Zhou) Date: Fri, 22 Sep 2017 15:33:48 -0700 Subject: RFR (S) 8081323: ConstantPool::_resolved_references is missing in heap dump In-Reply-To: References: <915c3300-2528-2b85-2492-0b54a783c622@oracle.com> Message-ID: <7029FEDD-16DA-40E2-8FDB-97146C2D5D16@oracle.com> Hi Coleen, This looks good. Thanks, Jiangli > On Sep 22, 2017, at 1:33 PM, coleen.phillimore at oracle.com wrote: > > > Harold pointed out privately that having ConstantPool::resolved_references() sometimes return NULL forces the callers to check for null for safety, so I added a resolved_references_or_null() call for the heap dumper to call instead. > > Please review this new version, also small: > > open webrev at http://cr.openjdk.java.net/~coleenp/8081323.02/webrev > bug link https://bugs.openjdk.java.net/browse/JDK-8081323 > > Reran tier1 and heapdump tests. > > Thanks, > Coleen > > On 9/5/17 12:50 PM, coleen.phillimore at oracle.com wrote: >> >> Thank you, Serguei! >> Coleen >> >> On 9/1/17 6:48 PM, serguei.spitsyn at oracle.com wrote: >>> Hi Coleen, >>> >>> The fix looks good. >>> >>> Thanks, >>> Serguei >>> >>> >>> On 8/31/17 09:02, coleen.phillimore at oracle.com wrote: >>>> Summary: Add resolved_references and init_lock as hidden static field in class so root is found. >>>> >>>> Tested manually with YourKit. See bug for images. Also ran serviceability tests. 
>>>> >>>> open webrev at http://cr.openjdk.java.net/~coleenp/8081323.01/webrev >>>> bug link https://bugs.openjdk.java.net/browse/JDK-8081323 >>>> >>>> Thanks, >>>> Coleen >>>> >>> >> > From coleen.phillimore at oracle.com Fri Sep 22 22:36:41 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 22 Sep 2017 18:36:41 -0400 Subject: RFR (S) 8081323: ConstantPool::_resolved_references is missing in heap dump In-Reply-To: <7029FEDD-16DA-40E2-8FDB-97146C2D5D16@oracle.com> References: <915c3300-2528-2b85-2492-0b54a783c622@oracle.com> <7029FEDD-16DA-40E2-8FDB-97146C2D5D16@oracle.com> Message-ID: <77d4638a-0251-2795-8bd2-ebd7c9d64283@oracle.com> Thank you, Jiangli! Coleen On 9/22/17 6:33 PM, Jiangli Zhou wrote: > Hi Coleen, > > This looks good. > > Thanks, > Jiangli > >> On Sep 22, 2017, at 1:33 PM, coleen.phillimore at oracle.com wrote: >> >> >> Harold pointed out privately that having ConstantPool::resolved_references() sometimes return NULL forces the callers to check for null for safety, so I added a resolved_references_or_null() call for the heap dumper to call instead. >> >> Please review this new version, also small: >> >> open webrev at http://cr.openjdk.java.net/~coleenp/8081323.02/webrev >> bug link https://bugs.openjdk.java.net/browse/JDK-8081323 >> >> Reran tier1 and heapdump tests. >> >> Thanks, >> Coleen >> >> On 9/5/17 12:50 PM, coleen.phillimore at oracle.com wrote: >>> Thank you, Serguei! >>> Coleen >>> >>> On 9/1/17 6:48 PM, serguei.spitsyn at oracle.com wrote: >>>> Hi Coleen, >>>> >>>> The fix looks good. >>>> >>>> Thanks, >>>> Serguei >>>> >>>> >>>> On 8/31/17 09:02, coleen.phillimore at oracle.com wrote: >>>>> Summary: Add resolved_references and init_lock as hidden static field in class so root is found. >>>>> >>>>> Tested manually with YourKit. See bug for images. Also ran serviceability tests. 
>>>>> >>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8081323.01/webrev >>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8081323 >>>>> >>>>> Thanks, >>>>> Coleen >>>>> From david.holmes at oracle.com Mon Sep 25 07:29:13 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 25 Sep 2017 17:29:13 +1000 Subject: OpenJDK OOM issue - In-Reply-To: References: <99906d91-dea4-dd5f-2d35-f50f9f5264de@redhat.com> <27ce9fc2-2e76-4391-a38c-8d068f1a6dbd@default> <4b587954-1a90-e04d-9766-8f22f803e57e@oracle.com> <7368e002-9966-4813-b144-350690cc0566@default> <168E199C-EC63-49F1-97D3-2FA7D5177E3A@gmail.com> Message-ID: <91fe39b8-3207-2e4d-493f-d9b8fe963a7c@oracle.com> Hi Tim, You previously showed ulimit output that indicated a large user process/threads limit, but the hs-err log shows: rlimit: STACK 10240k, CORE 0k, NPROC 1024, NOFILE 4096, AS infinity With NPROC = 1024 you very likely hit the maximum user processes/threads limit. Cheers, David On 25/09/2017 4:54 PM, Yu, Tim (NSB - CN/Chengdu) wrote: > Hi Kirk & Zhengyu > > Thanks for your reply. This issue does not re-occur from Sep 22, but I will keep monitoring. > Below is the NMT information for our JVM during normal situation. As the JAVA program is triggered periodically with one minute interval and the OOM happens during JVM initialization, could you please just reference it? > > For hs_err_pid10907.log, it can be seen clearly that lots of memory is available, but the OOM occurs due to "unable to create new native thread". It's quite strange to me and could you please help to explain what's the possible reasons? Many thanks. 
> Memory: 4k page, physical 11163792k(274656k free), swap 16777212k(16756468k free) > > Native Memory Tracking: > > Total: reserved=4243930KB, committed=325642KB > - Java Heap (reserved=2791424KB, committed=176128KB) > (mmap: reserved=2791424KB, committed=176128KB) > > - Class (reserved=1064215KB, committed=16791KB) > (classes #1800) > (malloc=5399KB #1210) > (mmap: reserved=1058816KB, committed=11392KB) > > - Thread (reserved=18582KB, committed=18582KB) > (thread #18) > (stack: reserved=18504KB, committed=18504KB) > (malloc=57KB #96) > (arena=21KB #36) > > - Code (reserved=249870KB, committed=2806KB) > (malloc=270KB #919) > (mmap: reserved=249600KB, committed=2536KB) > > - GC (reserved=107764KB, committed=99260KB) > (malloc=5772KB #132) > (mmap: reserved=101992KB, committed=93488KB) > > - Compiler (reserved=134KB, committed=134KB) > (malloc=3KB #48) > (arena=131KB #3) > > - Internal (reserved=5573KB, committed=5573KB) > (malloc=5541KB #3125) > (mmap: reserved=32KB, committed=32KB) > > - Symbol (reserved=3364KB, committed=3364KB) > (malloc=1853KB #4257) > (arena=1511KB #1) > > - Native Memory Tracking (reserved=160KB, committed=160KB) > (malloc=4KB #45) > (tracking overhead=156KB) > > - Arena Chunk (reserved=2844KB, committed=2844KB) > (malloc=2844KB) > > > Br, > Tim > > > -----Original Message----- > From: Kirk Pepperdine [mailto:kirk.pepperdine at gmail.com] > Sent: Thursday, September 21, 2017 10:30 PM > To: Zhengyu Gu > Cc: Yu, Tim (NSB - CN/Chengdu) ; Poonam Parhar ; Vladimir Kozlov ; david.holmes at oracle.com Holmes ; Andrew Haley ; Ray Hindman ; Shen, David (NSB - CN/Chengdu) > Subject: Re: OpenJDK OOM issue - > > Have you tried running pmap? > > Kind regards, > Kirk > >> On Sep 21, 2017, at 4:07 PM, Zhengyu Gu wrote: >> >> Hi Tim, >> >> Try to run with -XX:NativeMemoryTracking=summary , this should give you some hints on native memory side. >> >> In your case, not be able to create native thread, more likely to be on native side than heap. 
>> >> Thanks, >> >> -Zhengyu >> >> On 09/20/2017 11:22 PM, Yu, Tim (NSB - CN/Chengdu) wrote: >>> Hi Poonam & Vladimir >>> After add " -XX:-UseCompressedOops" flag, the OMM still happens. The corresponding GC log is as below and no heap is printed out. So, what's the next step to do? Please help on this and many thanks :) >>> OpenJDK 64-Bit Server VM (25.131-b11) for linux-amd64 JRE (1.8.0_131-b11), built on Apr 13 2017 17:56:19 by "mockbuild" with gcc 4.4.7 20120313 (Red Hat 4.4.7-18) >>> Memory: 4k page, physical 11163792k(551024k free), swap 16777212k(16722204k free) >>> CommandLine flags: -XX:InitialHeapSize=178620672 -XX:MaxHeapSize=2857930752 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:-UseCompressedOops -XX:+UseParallelGC >>> Br, >>> Tim >>> -----Original Message----- >>> From: Poonam Parhar [mailto:poonam.bajaj at oracle.com] >>> Sent: Thursday, September 21, 2017 12:37 AM >>> To: Vladimir Kozlov ; David Holmes ; Yu, Tim (NSB - CN/Chengdu) ; Andrew Haley ; hotspot-dev developers >>> Cc: Shen, David (NSB - CN/Chengdu) >>> Subject: RE: OpenJDK OOM issue - >>> Hi Vladimir, >>>> -----Original Message----- >>>> From: Vladimir Kozlov >>>> Sent: Wednesday, September 20, 2017 9:30 AM >>>> To: Poonam Parhar; David Holmes; Yu, Tim (NSB - CN/Chengdu); Andrew >>>> Haley; hotspot-dev developers >>>> Cc: Shen, David (NSB - CN/Chengdu) >>>> Subject: Re: OpenJDK OOM issue - >>>> >>>> On Linux we should not have java heap's low address memory problem. >>>> >>>> Small swap space? Also memory left is less than 1Gb ("MemFree: 898332 >>>> kB"). >>>> >>>> Also 5326 processes it a lot. Overloaded system? >>>> >>>> Poonam, do we have bug for this? Can you attached hs_err file to it. >>>> >>> No, there is no bug for this. 
>>> Thanks, >>> Poonam >>>> Thanks, >>>> Vladimir >>>> >>>> On 9/20/17 9:15 AM, Poonam Parhar wrote: >>>>> Hello Tim, >>>>> >>>>> From the hs_err_pid12678.log file, the java heap is based at >>>> 0x715a00000 which is 28gb, so there should be plenty of space available >>>> for the native heap. >>>>> >>>>> Memory map: >>>>> ... >>>>> 00600000-00601000 rw-p 00000000 fc:01 17950 >>>> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131- >>>> 0.b11.el6_9.x86_64/jre/bin/java >>>>> 019bc000-019dd000 rw-p 00000000 00:00 0 >>>> [heap] >>>>> 715a00000-71cd00000 rw-p 00000000 00:00 0 >>>>> 71cd00000-787380000 ---p 00000000 00:00 0 ... >>>>> >>>>> Heap >>>>> PSYoungGen total 51200K, used 13258K [0x0000000787380000, >>>> 0x000000078ac80000, 0x00000007c0000000) >>>>> eden space 44032K, 30% used >>>> [0x0000000787380000,0x0000000788072ad0,0x0000000789e80000) >>>>> from space 7168K, 0% used >>>> [0x000000078a580000,0x000000078a580000,0x000000078ac80000) >>>>> to space 7168K, 0% used >>>> [0x0000000789e80000,0x0000000789e80000,0x000000078a580000) >>>>> ParOldGen total 117760K, used 0K [0x0000000715a00000, >>>> 0x000000071cd00000, 0x0000000787380000) >>>>> object space 117760K, 0% used >>>> [0x0000000715a00000,0x0000000715a00000,0x000000071cd00000) >>>>> Metaspace used 10485K, capacity 10722K, committed 11008K, >>>> reserved 1058816K >>>>> class space used 1125K, capacity 1227K, committed 1280K, >>>> reserved 1048576K >>>>> >>>>> >>>>> To narrow down the issue, would it be possible for you to test with - >>>> XX:-UseCompressedOops? >>>>> >>>>> Thanks, >>>>> Poonam >>>>> >>>>>> -----Original Message----- >>>>>> From: David Holmes >>>>>> Sent: Wednesday, September 20, 2017 4:43 AM >>>>>> To: Yu, Tim (NSB - CN/Chengdu); Andrew Haley; Poonam Parhar; >>>> hotspot- >>>>>> dev developers >>>>>> Cc: Shen, David (NSB - CN/Chengdu) >>>>>> Subject: Re: OpenJDK OOM issue - >>>>>> >>>>>> Tim, >>>>>> >>>>>> Please note attachments get stripped from the mailing lists. 
>>>>>> >>>>>> All - please drop the jdk8-dev and jdk8u-dev mailing lists from this >>>>>> and leave it just on hotspot-dev. I've tried to bcc those lists. >>>>>> >>>>>> Thank you. >>>>>> >>>>>> David >>>>>> >>>>>> On 20/09/2017 6:44 PM, Yu, Tim (NSB - CN/Chengdu) wrote: >>>>>>> Hi All >>>>>>> >>>>>>> Thank you all for the quick response. >>>>>>> The environment information is listed as below, could you please >>>>>>> help >>>>>> to further check? >>>>>>> >>>>>>> 1. What OS is this? >>>>>>> # cat /etc/redhat-release >>>>>>> Red Hat Enterprise Linux Server release 6.9 (Santiago) # uname -a >>>>>>> Linux cloudyvm16 2.6.32-696.6.3.el6.x86_64 #1 SMP Fri Jun 30 >>>>>>> 13:24:18 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >>>>>>> >>>>>>> 2.GC log is listed as below. The heap information cannot be printed >>>>>> out in gc-2017_09_20-09_21_15.log when OOM happens. In gc- >>>> 2017_09_20- >>>>>> 09_21_17.log, you can see the heap begins with 0x0000000787380000 >>>> and >>>>>> it should be not the first 4G virtual memory address. >>>>>>> -rw-r--r-- 1 19477 Sep 20 09:21 hs_err_pid12678.log >>>>>>> -rw-r--r-- 1 570 Sep 20 09:21 gc-2017_09_20-09_21_15.log >>>>>>> -rw-r--r-- 1 17741 Sep 20 09:21 hs_err_pid12706.log >>>>>>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_17.log >>>>>>> -rw-r--r-- 1 1722 Sep 20 09:21 gc-2017_09_20-09_21_18.log >>>>>>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_19.log >>>>>>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_20.log >>>>>>> >>>>>>> 3. This issue happens occasionally but frequently. We periodically >>>>>> launch a JAVA program to use JMX to monitor service status of >>>> another >>>>>> JAVA service. 
>>>>>>> >>>>>>> Br, >>>>>>> Tim >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Andrew Haley [mailto:aph at redhat.com] >>>>>>> Sent: Tuesday, September 19, 2017 9:13 PM >>>>>>> To: Yu, Tim (NSB - CN/Chengdu) ; >>>>>>> jdk8-dev at openjdk.java.net; jdk8u-dev at openjdk.java.net >>>>>>> Cc: Shen, David (NSB - CN/Chengdu) >>>>>>> Subject: Re: OpenJDK OOM issue - >>>>>>> >>>>>>> On 19/09/17 09:50, Yu, Tim (NSB - CN/Chengdu) wrote: >>>>>>>> Hi OpenJDK dev group >>>>>>>> >>>>>>>> We meet one issue that the VM failed to initialize. The error log >>>>>>>> is >>>>>> as below. We checked both memory usage and thread number. They do >>>> not >>>>>> hit the limit. So could you please help to confirm why >>>>>> "java.lang.OutOfMemoryError: unable to create new native thread" >>>>>> error occurs? Many thanks. >>>>>>> >>>>>>> What OS is this? >>>>>>> From rohitarulraj at gmail.com Mon Sep 25 11:03:19 2017 From: rohitarulraj at gmail.com (Rohit Arul Raj) Date: Mon, 25 Sep 2017 16:33:19 +0530 Subject: Issues with JDK 9 crashing itself and the operating system In-Reply-To: References: Message-ID: Hello Jeronimo, Thanks for the detailed report. We were able to reproduce the issue on our machine. We will analyze this further and get back to you. Regards, Rohit On Sat, Sep 23, 2017 at 4:46 PM, Jeronimo Backes wrote: > Hello, my name is Jeronimo and I'm the author of the univocity-parsers > library (https://github.com/uniVocity/univocity-parsers) and I'm writing to > you by recommendation of Erik Duveblad. 
> > Basically, I recently installed the JDK 9 distributed by Oracle on my > development computer and when I try to build my project (with a simple `mvn > clean install` command) the JVM crashes with: > > > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x00007f18b96c52f0, pid=3865, tid=3904 > # > # JRE version: Java(TM) SE Runtime Environment (9.0+181) (build 9+181) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (9+181, mixed mode, tiered, > compressed oops, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x9292f0] > JVMCIGlobals::check_jvmci_flags_are_consistent()+0x120 > # > # Core dump will be written. Default location: Core dumps may be processed > with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %e" (or dumping to > /home/jbax/dev/repository/univocity-parsers/core.3865) > # > # An error report file with more information is saved as: > # /home/jbax/dev/repository/univocity-parsers/hs_err_pid3865.log > # > # Compiler replay data is saved as: > # /home/jbax/dev/repository/univocity-parsers/replay_pid3865.log > # > # If you would like to submit a bug report, please visit: > # http://bugreport.java.com/bugreport/crash.jsp > # > > > The hs_err files generated are available here > https://github.com/uniVocity/univocity-parsers/files/1326484/jdk_9_crash2.zip. > This zip also contains the pom.xml file I used. The build succeeded 4 times > before the JVM crashed. > > Yesterday I had the crash happen 100% of the time, but the CPU was > overclocked to 3.6Ghz (never had any issue with it though) and saved the > error file here: > https://github.com/uniVocity/univocity-parsers/files/1324326/jdk_9_crash.zip. > I created an issue on github to investigate this: > https://github.com/uniVocity/univocity-parsers/issues/189. 
There Erik > mentioned that: > > "Looking at the hs_err file, the stack trace is "wrong", a C2 Compiler > Thread can't call JVMCIGlobals::check_jvmci_flags_are_consistent (and the > value of the register RIP does not correspond to any instruction in the > compiled version of that function). This makes me suspect that something > could be wrong with your CPU, the CPU should not have jumped to this memory > location." > > Things still fail with stock hardware settings. More details about my > environment : > > OS, Maven and Java versions: > > [jbax at linux-pc ~]$ mvn -version > Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; > 2014-12-15T03:59:23+10:30) > Maven home: /home/jbax/dev/apache-maven > Java version: 9, vendor: Oracle Corporation > Java home: /home/jbax/dev/jdk9 > Default locale: en_AU, platform encoding: UTF-8 > OS name: "linux", version: "4.12.13-1-manjaro", arch: "amd64", family: > "unix" > [jbax at linux-pc ~]$ > > Hardware: > [jbax at linux-pc univocity-parsers]$ lscpu > Architecture: x86_64 > CPU op-mode(s): 32-bit, 64-bit > Byte Order: Little Endian > CPU(s): 16 > On-line CPU(s) list: 0-15 > Thread(s) per core: 2 > Core(s) per socket: 8 > Socket(s): 1 > NUMA node(s): 1 > Vendor ID: AuthenticAMD > CPU family: 23 > Model: 1 > Model name: AMD Ryzen 7 1700 Eight-Core Processor > Stepping: 1 > CPU MHz: 1550.000 > CPU max MHz: 3000.0000 > CPU min MHz: 1550.0000 > BogoMIPS: 6001.43 > Virtualization: AMD-V > L1d cache: 32K > L1i cache: 64K > L2 cache: 512K > L3 cache: 8192K > NUMA node0 CPU(s): 0-15 > Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge > mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt > pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid > aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt > aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm > sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core > 
perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep > bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero > irperf arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid > decodeassists pausefilter pfthreshold avic overflow_recov succor smca > > On an unrelated note, I use an old java application that crashes the entire > OS for me when Java 9 is used: http://www.jinchess.com/download > > It's just a matter of downloading, unpacking and trying to start it with > jin-2.14.1/jin > > The OS crashes and I have to hard-reset the computer. It works just fine if > revert back to Java 6, 7 or 8. > > I thought you'd might want to investigate what is going on. Let me know if > you need more information. > > Best regards, > > Jeronimo. > > > > > -- > the uniVocity team > www.univocity.com From zgu at redhat.com Mon Sep 25 16:00:12 2017 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 25 Sep 2017 12:00:12 -0400 Subject: RFR(XXS) 8187629: NMT: Memory miscounting in compiler (C2) In-Reply-To: References: <8987f8cd-9eee-bb86-13a6-3623ab093668@redhat.com> Message-ID: <376596c4-4a2a-6415-b9c6-1fe7a64d79b2@redhat.com> Hi Chris, Could you please sponsor this change? Attached patch has been rebased to new repo. Thank you in advance! -Zhengyu On 09/19/2017 11:27 AM, Chris Plummer wrote: > Looks good. Seems to follow a pattern used elsewhere. > > thanks, > > Chris > > On 9/18/17 12:17 PM, Zhengyu Gu wrote: >> Compiler (C2) uses ResourceArea instead of Arena in some >> circumstances, so it can take advantage of ResourceMark. However, >> ResourceArea is tagged as mtThread, that results those memory is >> miscounted by NMT >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8187629 >> Webrev: http://cr.openjdk.java.net/~zgu/8187629/webrev.00/ >> >> >> Test: >> >> hotspot_tier1 (fastdebug and release) on Linux x64 >> >> >> Thanks, >> >> -Zhengyu > > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 8187629.patch
Type: text/x-patch
Size: 3217 bytes
Desc: not available
URL:

From chris.plummer at oracle.com Mon Sep 25 16:56:44 2017
From: chris.plummer at oracle.com (Chris Plummer)
Date: Mon, 25 Sep 2017 09:56:44 -0700
Subject: RFR(XXS) 8187629: NMT: Memory miscounting in compiler (C2)
In-Reply-To: <376596c4-4a2a-6415-b9c6-1fe7a64d79b2@redhat.com>
References: <8987f8cd-9eee-bb86-13a6-3623ab093668@redhat.com> <376596c4-4a2a-6415-b9c6-1fe7a64d79b2@redhat.com>
Message-ID: <100ed1ee-82a9-909f-f79d-5d590dac4d8a@oracle.com>

Yes, I'll push it today.

thanks,

Chris

On 9/25/17 9:00 AM, Zhengyu Gu wrote:
> Hi Chris,
>
> Could you please sponsor this change?
>
> Attached patch has been rebased to new repo.
>
> Thank you in advance!
>
> -Zhengyu
>
> On 09/19/2017 11:27 AM, Chris Plummer wrote:
>> Looks good. Seems to follow a pattern used elsewhere.
>>
>> thanks,
>>
>> Chris
>>
>> On 9/18/17 12:17 PM, Zhengyu Gu wrote:
>>> Compiler (C2) uses ResourceArea instead of Arena in some
>>> circumstances, so it can take advantage of ResourceMark. However,
>>> ResourceArea is tagged as mtThread, that results those memory is
>>> miscounted by NMT
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8187629
>>> Webrev: http://cr.openjdk.java.net/~zgu/8187629/webrev.00/
>>>
>>>
>>> Test:
>>>
>>> hotspot_tier1 (fastdebug and release) on Linux x64
>>>
>>>
>>> Thanks,
>>>
>>> -Zhengyu
>>
>>
>>

From zgu at redhat.com Mon Sep 25 16:58:49 2017
From: zgu at redhat.com (Zhengyu Gu)
Date: Mon, 25 Sep 2017 12:58:49 -0400
Subject: RFR(XXS) 8187629: NMT: Memory miscounting in compiler (C2)
In-Reply-To: <100ed1ee-82a9-909f-f79d-5d590dac4d8a@oracle.com>
References: <8987f8cd-9eee-bb86-13a6-3623ab093668@redhat.com> <376596c4-4a2a-6415-b9c6-1fe7a64d79b2@redhat.com> <100ed1ee-82a9-909f-f79d-5d590dac4d8a@oracle.com>
Message-ID:

Thanks a lot, Chris.

-Zhengyu

On 09/25/2017 12:56 PM, Chris Plummer wrote:
> Yes, I'll push it today.
> > thanks, > > Chris > > On 9/25/17 9:00 AM, Zhengyu Gu wrote: >> Hi Chris, >> >> Could you please sponsor this change? >> >> Attached patch has been rebased to new repo. >> >> Thank you in advance! >> >> -Zhengyu >> >> On 09/19/2017 11:27 AM, Chris Plummer wrote: >>> Looks good. Seems to follow a pattern used elsewhere. >>> >>> thanks, >>> >>> Chris >>> >>> On 9/18/17 12:17 PM, Zhengyu Gu wrote: >>>> Compiler (C2) uses ResourceArea instead of Arena in some >>>> circumstances, so it can take advantage of ResourceMark. However, >>>> ResourceArea is tagged as mtThread, that results those memory is >>>> miscounted by NMT >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8187629 >>>> Webrev: http://cr.openjdk.java.net/~zgu/8187629/webrev.00/ >>>> >>>> >>>> Test: >>>> >>>> hotspot_tier1 (fastdebug and release) on Linux x64 >>>> >>>> >>>> Thanks, >>>> >>>> -Zhengyu >>> >>> >>> > > From chris.plummer at oracle.com Mon Sep 25 17:01:27 2017 From: chris.plummer at oracle.com (Chris Plummer) Date: Mon, 25 Sep 2017 10:01:27 -0700 Subject: RFR(XXS) 8187629: NMT: Memory miscounting in compiler (C2) In-Reply-To: References: <8987f8cd-9eee-bb86-13a6-3623ab093668@redhat.com> <376596c4-4a2a-6415-b9c6-1fe7a64d79b2@redhat.com> <100ed1ee-82a9-909f-f79d-5d590dac4d8a@oracle.com> Message-ID: Well, looks like hs is still closed. I'll get it all ready to push once it's opened. Chris On 9/25/17 9:58 AM, Zhengyu Gu wrote: > Thanks a lot, Chris. > > -Zhengyu > > On 09/25/2017 12:56 PM, Chris Plummer wrote: >> Yes, I'll push it today. >> >> thanks, >> >> Chris >> >> On 9/25/17 9:00 AM, Zhengyu Gu wrote: >>> Hi Chris, >>> >>> Could you please sponsor this change? >>> >>> Attached patch has been rebased to new repo. >>> >>> Thank you in advance! >>> >>> -Zhengyu >>> >>> On 09/19/2017 11:27 AM, Chris Plummer wrote: >>>> Looks good. Seems to follow a pattern used elsewhere. 
>>>> >>>> thanks, >>>> >>>> Chris >>>> >>>> On 9/18/17 12:17 PM, Zhengyu Gu wrote: >>>>> Compiler (C2) uses ResourceArea instead of Arena in some >>>>> circumstances, so it can take advantage of ResourceMark. However, >>>>> ResourceArea is tagged as mtThread, that results those memory is >>>>> miscounted by NMT >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8187629 >>>>> Webrev: http://cr.openjdk.java.net/~zgu/8187629/webrev.00/ >>>>> >>>>> >>>>> Test: >>>>> >>>>> ? hotspot_tier1 (fastdebug and release) on Linux x64 >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> -Zhengyu >>>> >>>> >>>> >> >> From jbax at univocity.com Sat Sep 23 11:16:12 2017 From: jbax at univocity.com (Jeronimo Backes) Date: Sat, 23 Sep 2017 20:46:12 +0930 Subject: Issues with JDK 9 crashing itself and the operating system Message-ID: Hello, my name is Jeronimo and I'm the author of the univocity-parsers library (https://github.com/uniVocity/univocity-parsers) and I'm writing to you by recommendation of Erik Duveblad. Basically, I recently installed the JDK 9 distributed by Oracle on my development computer and when I try to build my project (with a simple `mvn clean install` command) the JVM crashes with: # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x00007f18b96c52f0, pid=3865, tid=3904 # # JRE version: Java(TM) SE Runtime Environment (9.0+181) (build 9+181) # Java VM: Java HotSpot(TM) 64-Bit Server VM (9+181, mixed mode, tiered, compressed oops, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x9292f0] JVMCIGlobals::check_jvmci_flags_are_consistent()+0x120 # # Core dump will be written. 
Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %e" (or dumping to /home/jbax/dev/repository/univocity-parsers/core.3865)
#
# An error report file with more information is saved as:
# /home/jbax/dev/repository/univocity-parsers/hs_err_pid3865.log
#
# Compiler replay data is saved as:
# /home/jbax/dev/repository/univocity-parsers/replay_pid3865.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#

The hs_err files generated are available here:
https://github.com/uniVocity/univocity-parsers/files/1326484/jdk_9_crash2.zip
This zip also contains the pom.xml file I used. *The build succeeded 4 times before the JVM crashed*.

Yesterday I had the crash happen 100% of the time, but the CPU was overclocked to 3.6 GHz (I never had any issue with it, though), and I saved the error file here:
https://github.com/uniVocity/univocity-parsers/files/1324326/jdk_9_crash.zip

I created an issue on GitHub to investigate this: https://github.com/uniVocity/univocity-parsers/issues/189. There Erik mentioned that:

*"Looking at the hs_err file, the stack trace is "wrong", a C2 Compiler Thread can't call JVMCIGlobals::check_jvmci_flags_are_consistent (and the value of the register RIP does not correspond to any instruction in the compiled version of that function). This makes me suspect that something could be wrong with your CPU, the CPU should not have jumped to this memory location."*

Things still fail with stock hardware settings.
More details about my environment:

*OS, Maven and Java versions:*

[jbax at linux-pc ~]$ mvn -version
Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; 2014-12-15T03:59:23+10:30)
Maven home: /home/jbax/dev/apache-maven
Java version: 9, vendor: Oracle Corporation
Java home: /home/jbax/dev/jdk9
Default locale: en_AU, platform encoding: UTF-8
OS name: "linux", version: "4.12.13-1-manjaro", arch: "amd64", family: "unix"
[jbax at linux-pc ~]$

*Hardware:*

[jbax at linux-pc univocity-parsers]$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               1
Model name:          AMD Ryzen 7 1700 Eight-Core Processor
Stepping:            1
CPU MHz:             1550.000
CPU max MHz:         3000.0000
CPU min MHz:         1550.0000
BogoMIPS:            6001.43
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           64K
L2 cache:            512K
L3 cache:            8192K
NUMA node0 CPU(s):   0-15
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic overflow_recov succor smca

*On an unrelated note*, I use an old java application that crashes the entire OS for me when Java 9 is used: http://www.jinchess.com/download

It's just a matter of downloading, unpacking and trying to start it with jin-2.14.1/jin

The OS crashes and I have to hard-reset the
computer. It works just fine if I revert to Java 6, 7 or 8. I thought you might want to investigate what is going on.

Let me know if you need more information.

Best regards,
Jeronimo.

--
the uniVocity team
www.univocity.com

From tim.yu at nokia-sbell.com Mon Sep 25 06:54:39 2017
From: tim.yu at nokia-sbell.com (Yu, Tim (NSB - CN/Chengdu))
Date: Mon, 25 Sep 2017 06:54:39 +0000
Subject: OpenJDK OOM issue -
In-Reply-To: <168E199C-EC63-49F1-97D3-2FA7D5177E3A@gmail.com>
References: <99906d91-dea4-dd5f-2d35-f50f9f5264de@redhat.com> <27ce9fc2-2e76-4391-a38c-8d068f1a6dbd@default> <4b587954-1a90-e04d-9766-8f22f803e57e@oracle.com> <7368e002-9966-4813-b144-350690cc0566@default> <168E199C-EC63-49F1-97D3-2FA7D5177E3A@gmail.com>
Message-ID:

Hi Kirk & Zhengyu

Thanks for your reply. This issue has not recurred since Sep 22, but I will keep monitoring.
Below is the NMT information for our JVM in a normal situation. As the Java program is triggered periodically at one-minute intervals and the OOM happens during JVM initialization, could you please use it just for reference?

In hs_err_pid10907.log, it can be seen clearly that plenty of memory is available, but the OOM occurs due to "unable to create new native thread". This is quite strange to me; could you please help explain the possible reasons? Many thanks.
Memory: 4k page, physical 11163792k(274656k free), swap 16777212k(16756468k free)

Native Memory Tracking:

Total: reserved=4243930KB, committed=325642KB
-                 Java Heap (reserved=2791424KB, committed=176128KB)
                            (mmap: reserved=2791424KB, committed=176128KB)

-                     Class (reserved=1064215KB, committed=16791KB)
                            (classes #1800)
                            (malloc=5399KB #1210)
                            (mmap: reserved=1058816KB, committed=11392KB)

-                    Thread (reserved=18582KB, committed=18582KB)
                            (thread #18)
                            (stack: reserved=18504KB, committed=18504KB)
                            (malloc=57KB #96)
                            (arena=21KB #36)

-                      Code (reserved=249870KB, committed=2806KB)
                            (malloc=270KB #919)
                            (mmap: reserved=249600KB, committed=2536KB)

-                        GC (reserved=107764KB, committed=99260KB)
                            (malloc=5772KB #132)
                            (mmap: reserved=101992KB, committed=93488KB)

-                  Compiler (reserved=134KB, committed=134KB)
                            (malloc=3KB #48)
                            (arena=131KB #3)

-                  Internal (reserved=5573KB, committed=5573KB)
                            (malloc=5541KB #3125)
                            (mmap: reserved=32KB, committed=32KB)

-                    Symbol (reserved=3364KB, committed=3364KB)
                            (malloc=1853KB #4257)
                            (arena=1511KB #1)

-    Native Memory Tracking (reserved=160KB, committed=160KB)
                            (malloc=4KB #45)
                            (tracking overhead=156KB)

-               Arena Chunk (reserved=2844KB, committed=2844KB)
                            (malloc=2844KB)

Br,
Tim

-----Original Message-----
From: Kirk Pepperdine [mailto:kirk.pepperdine at gmail.com]
Sent: Thursday, September 21, 2017 10:30 PM
To: Zhengyu Gu
Cc: Yu, Tim (NSB - CN/Chengdu) ; Poonam Parhar ; Vladimir Kozlov ; david.holmes at oracle.com Holmes ; Andrew Haley ; Ray Hindman ; Shen, David (NSB - CN/Chengdu)
Subject: Re: OpenJDK OOM issue -

Have you tried running pmap?

Kind regards,
Kirk

> On Sep 21, 2017, at 4:07 PM, Zhengyu Gu wrote:
>
> Hi Tim,
>
> Try to run with -XX:NativeMemoryTracking=summary , this should give you some hints on native memory side.
>
> In your case, not be able to create native thread, more likely to be on native side than heap.
> > Thanks, > > -Zhengyu > > On 09/20/2017 11:22 PM, Yu, Tim (NSB - CN/Chengdu) wrote: >> Hi Poonam & Vladimir >> After add " -XX:-UseCompressedOops" flag, the OMM still happens. The corresponding GC log is as below and no heap is printed out. So, what's the next step to do? Please help on this and many thanks :) >> OpenJDK 64-Bit Server VM (25.131-b11) for linux-amd64 JRE (1.8.0_131-b11), built on Apr 13 2017 17:56:19 by "mockbuild" with gcc 4.4.7 20120313 (Red Hat 4.4.7-18) >> Memory: 4k page, physical 11163792k(551024k free), swap 16777212k(16722204k free) >> CommandLine flags: -XX:InitialHeapSize=178620672 -XX:MaxHeapSize=2857930752 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:-UseCompressedOops -XX:+UseParallelGC >> Br, >> Tim >> -----Original Message----- >> From: Poonam Parhar [mailto:poonam.bajaj at oracle.com] >> Sent: Thursday, September 21, 2017 12:37 AM >> To: Vladimir Kozlov ; David Holmes ; Yu, Tim (NSB - CN/Chengdu) ; Andrew Haley ; hotspot-dev developers >> Cc: Shen, David (NSB - CN/Chengdu) >> Subject: RE: OpenJDK OOM issue - >> Hi Vladimir, >>> -----Original Message----- >>> From: Vladimir Kozlov >>> Sent: Wednesday, September 20, 2017 9:30 AM >>> To: Poonam Parhar; David Holmes; Yu, Tim (NSB - CN/Chengdu); Andrew >>> Haley; hotspot-dev developers >>> Cc: Shen, David (NSB - CN/Chengdu) >>> Subject: Re: OpenJDK OOM issue - >>> >>> On Linux we should not have java heap's low address memory problem. >>> >>> Small swap space? Also memory left is less than 1Gb ("MemFree: 898332 >>> kB"). >>> >>> Also 5326 processes it a lot. Overloaded system? >>> >>> Poonam, do we have bug for this? Can you attached hs_err file to it. >>> >> No, there is no bug for this. 
>> Thanks, >> Poonam >>> Thanks, >>> Vladimir >>> >>> On 9/20/17 9:15 AM, Poonam Parhar wrote: >>>> Hello Tim, >>>> >>>> From the hs_err_pid12678.log file, the java heap is based at >>> 0x715a00000 which is 28gb, so there should be plenty of space available >>> for the native heap. >>>> >>>> Memory map: >>>> ... >>>> 00600000-00601000 rw-p 00000000 fc:01 17950 >>> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131- >>> 0.b11.el6_9.x86_64/jre/bin/java >>>> 019bc000-019dd000 rw-p 00000000 00:00 0 >>> [heap] >>>> 715a00000-71cd00000 rw-p 00000000 00:00 0 >>>> 71cd00000-787380000 ---p 00000000 00:00 0 ... >>>> >>>> Heap >>>> PSYoungGen total 51200K, used 13258K [0x0000000787380000, >>> 0x000000078ac80000, 0x00000007c0000000) >>>> eden space 44032K, 30% used >>> [0x0000000787380000,0x0000000788072ad0,0x0000000789e80000) >>>> from space 7168K, 0% used >>> [0x000000078a580000,0x000000078a580000,0x000000078ac80000) >>>> to space 7168K, 0% used >>> [0x0000000789e80000,0x0000000789e80000,0x000000078a580000) >>>> ParOldGen total 117760K, used 0K [0x0000000715a00000, >>> 0x000000071cd00000, 0x0000000787380000) >>>> object space 117760K, 0% used >>> [0x0000000715a00000,0x0000000715a00000,0x000000071cd00000) >>>> Metaspace used 10485K, capacity 10722K, committed 11008K, >>> reserved 1058816K >>>> class space used 1125K, capacity 1227K, committed 1280K, >>> reserved 1048576K >>>> >>>> >>>> To narrow down the issue, would it be possible for you to test with - >>> XX:-UseCompressedOops? >>>> >>>> Thanks, >>>> Poonam >>>> >>>>> -----Original Message----- >>>>> From: David Holmes >>>>> Sent: Wednesday, September 20, 2017 4:43 AM >>>>> To: Yu, Tim (NSB - CN/Chengdu); Andrew Haley; Poonam Parhar; >>> hotspot- >>>>> dev developers >>>>> Cc: Shen, David (NSB - CN/Chengdu) >>>>> Subject: Re: OpenJDK OOM issue - >>>>> >>>>> Tim, >>>>> >>>>> Please note attachments get stripped from the mailing lists. 
>>>>> >>>>> All - please drop the jdk8-dev and jdk8u-dev mailing lists from this >>>>> and leave it just on hotspot-dev. I've tried to bcc those lists. >>>>> >>>>> Thank you. >>>>> >>>>> David >>>>> >>>>> On 20/09/2017 6:44 PM, Yu, Tim (NSB - CN/Chengdu) wrote: >>>>>> Hi All >>>>>> >>>>>> Thank you all for the quick response. >>>>>> The environment information is listed as below, could you please >>>>>> help >>>>> to further check? >>>>>> >>>>>> 1. What OS is this? >>>>>> # cat /etc/redhat-release >>>>>> Red Hat Enterprise Linux Server release 6.9 (Santiago) # uname -a >>>>>> Linux cloudyvm16 2.6.32-696.6.3.el6.x86_64 #1 SMP Fri Jun 30 >>>>>> 13:24:18 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >>>>>> >>>>>> 2.GC log is listed as below. The heap information cannot be printed >>>>> out in gc-2017_09_20-09_21_15.log when OOM happens. In gc- >>> 2017_09_20- >>>>> 09_21_17.log, you can see the heap begins with 0x0000000787380000 >>> and >>>>> it should be not the first 4G virtual memory address. >>>>>> -rw-r--r-- 1 19477 Sep 20 09:21 hs_err_pid12678.log >>>>>> -rw-r--r-- 1 570 Sep 20 09:21 gc-2017_09_20-09_21_15.log >>>>>> -rw-r--r-- 1 17741 Sep 20 09:21 hs_err_pid12706.log >>>>>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_17.log >>>>>> -rw-r--r-- 1 1722 Sep 20 09:21 gc-2017_09_20-09_21_18.log >>>>>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_19.log >>>>>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_20.log >>>>>> >>>>>> 3. This issue happens occasionally but frequently. We periodically >>>>> launch a JAVA program to use JMX to monitor service status of >>> another >>>>> JAVA service. 
>>>>>> >>>>>> Br, >>>>>> Tim >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: Andrew Haley [mailto:aph at redhat.com] >>>>>> Sent: Tuesday, September 19, 2017 9:13 PM >>>>>> To: Yu, Tim (NSB - CN/Chengdu) ; >>>>>> jdk8-dev at openjdk.java.net; jdk8u-dev at openjdk.java.net >>>>>> Cc: Shen, David (NSB - CN/Chengdu) >>>>>> Subject: Re: OpenJDK OOM issue - >>>>>> >>>>>> On 19/09/17 09:50, Yu, Tim (NSB - CN/Chengdu) wrote: >>>>>>> Hi OpenJDK dev group >>>>>>> >>>>>>> We meet one issue that the VM failed to initialize. The error log >>>>>>> is >>>>> as below. We checked both memory usage and thread number. They do >>> not >>>>> hit the limit. So could you please help to confirm why >>>>> "java.lang.OutOfMemoryError: unable to create new native thread" >>>>> error occurs? Many thanks. >>>>>> >>>>>> What OS is this? >>>>>>

From tim.yu at nokia-sbell.com Mon Sep 25 09:04:25 2017
From: tim.yu at nokia-sbell.com (Yu, Tim (NSB - CN/Chengdu))
Date: Mon, 25 Sep 2017 09:04:25 +0000
Subject: OpenJDK OOM issue -
In-Reply-To: <91fe39b8-3207-2e4d-493f-d9b8fe963a7c@oracle.com>
References: <99906d91-dea4-dd5f-2d35-f50f9f5264de@redhat.com> <27ce9fc2-2e76-4391-a38c-8d068f1a6dbd@default> <4b587954-1a90-e04d-9766-8f22f803e57e@oracle.com> <7368e002-9966-4813-b144-350690cc0566@default> <168E199C-EC63-49F1-97D3-2FA7D5177E3A@gmail.com> <91fe39b8-3207-2e4d-493f-d9b8fe963a7c@oracle.com>
Message-ID:

Hi David

Thanks for your reply. I checked based on your comments. My findings are below: I have verified that more than 2000 processes/threads can be created on the affected system by user "esbadmin". (The Java program in question is executed by user "esbadmin".) Could you please help explain why the NPROC number in the hs_err file is 1024? Many thanks.

1. Check the nproc number with the command below; it can be seen that the max user processes number is 43497.
# su esbadmin
sh-4.1$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 43497
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 43497
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

2. Based on https://blog.dbi-services.com/linux-how-to-monitor-the-nproc-limit-1/, I compiled testnpron.c and launch 1500 more processes on the issued system with user "esbadmin".
# sudo -u esbadmin ./testnproc

3. Monitor the nproc number
//before testnproc triggered
[root at cloudyvm16 ~]# ps h -Led -o user | sort | uniq -c | sort -n|grep esbadmin
915 esbadmin
//after testnproc triggered and you can see 2419 process/thread is created with user "esbadmin"
[root at cloudyvm16 ~]# ps h -Led -o user | sort | uniq -c | sort -n|grep esbadmin
2419 esbadmin

-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com]
Sent: Monday, September 25, 2017 3:29 PM
To: Yu, Tim (NSB - CN/Chengdu) ; Kirk Pepperdine ; Zhengyu Gu
Cc: Poonam Parhar ; Vladimir Kozlov ; Andrew Haley ; Ray Hindman ; Shen, David (NSB - CN/Chengdu)
Subject: Re: OpenJDK OOM issue -

Hi Tim,

You previously showed ulimit output that indicated a large user process/threads limit, but the hs-err log shows:

rlimit: STACK 10240k, CORE 0k, NPROC 1024, NOFILE 4096, AS infinity

With NPROC = 1024 you very likely hit the maximum user processes/threads limit.

Cheers,
David

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: testnproc.c
URL:

From tim.yu at nokia-sbell.com Mon Sep 25 09:33:42 2017
From: tim.yu at nokia-sbell.com (Yu, Tim (NSB - CN/Chengdu))
Date: Mon, 25 Sep 2017 09:33:42 +0000
Subject: OpenJDK OOM issue -
In-Reply-To:
References: <99906d91-dea4-dd5f-2d35-f50f9f5264de@redhat.com> <27ce9fc2-2e76-4391-a38c-8d068f1a6dbd@default> <4b587954-1a90-e04d-9766-8f22f803e57e@oracle.com> <7368e002-9966-4813-b144-350690cc0566@default> <168E199C-EC63-49F1-97D3-2FA7D5177E3A@gmail.com> <91fe39b8-3207-2e4d-493f-d9b8fe963a7c@oracle.com>
Message-ID:

Hi David

After add log, I see NPROC is 1024 during script running. I will check the actual process number of "esbadmin" later when issue re-occurs. Thanks.
Please ignore previous mail :) Mon Sep 25 12:24:02 EEST 2017 max user processes (-u) 1024 Br, Tim -----Original Message----- From: Yu, Tim (NSB - CN/Chengdu) Sent: Monday, September 25, 2017 5:04 PM To: 'David Holmes' ; Kirk Pepperdine ; Zhengyu Gu Cc: Poonam Parhar ; Vladimir Kozlov ; Andrew Haley ; Ray Hindman ; Shen, David (NSB - CN/Chengdu) Subject: RE: OpenJDK OOM issue - Hi David Thanks for your reply. I checked with your comments. Finding is as below, I have verified at least more than 2000 thousand process/thread can be created on the issue system with user "esbamin".( The issued JAVA program is executed by user "esbadmin") Could you please help to explain why NPROC number in hs_err file is 1024? Many thanks. 1. check nproc number with below command, it can be seen the max user process number is 43497. # su esbadmin sh-4.1$ ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 43497 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 10240 cpu time (seconds, -t) unlimited max user processes (-u) 43497 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited 2. Based on https://blog.dbi-services.com/linux-how-to-monitor-the-nproc-limit-1/, I compiled testnpron.c and launch 1500 more processes on the issued system with user "esbadmin". # sudo -u esbadmin ./testnproc 3. 
Monitor the nproc number //before testnproc triggered [root at cloudyvm16 ~]# ps h -Led -o user | sort | uniq -c | sort -n|grep esbadmin 915 esbadmin //after testnproc triggered and you can see 2419 process/thread is created with user "esbadmin" [root at cloudyvm16 ~]# ps h -Led -o user | sort | uniq -c | sort -n|grep esbadmin 2419 esbadmin -----Original Message----- From: David Holmes [mailto:david.holmes at oracle.com] Sent: Monday, September 25, 2017 3:29 PM To: Yu, Tim (NSB - CN/Chengdu) ; Kirk Pepperdine ; Zhengyu Gu Cc: Poonam Parhar ; Vladimir Kozlov ; Andrew Haley ; Ray Hindman ; Shen, David (NSB - CN/Chengdu) Subject: Re: OpenJDK OOM issue - Hi Tim, You previously showed ulimit output that indicated a large user process/threads limit, but the hs-err log shows: rlimit: STACK 10240k, CORE 0k, NPROC 1024, NOFILE 4096, AS infinity With NPROC = 1024 you very likely hit the maximum user processes/threads limit. Cheers, David On 25/09/2017 4:54 PM, Yu, Tim (NSB - CN/Chengdu) wrote: > Hi Kirk & Zhengyu > > Thanks for your reply. This issue does not re-occur from Sep 22, but I will keep monitoring. > Below is the NMT information for our JVM during normal situation. As the JAVA program is triggered periodically with one minute interval and the OOM happens during JVM initialization, could you please just reference it? > > For hs_err_pid10907.log, it can be seen clearly that lots of memory is available, but the OOM occurs due to "unable to create new native thread". It's quite strange to me and could you please help to explain what's the possible reasons? Many thanks. 
> Memory: 4k page, physical 11163792k(274656k free), swap 16777212k(16756468k free) > > Native Memory Tracking: > > Total: reserved=4243930KB, committed=325642KB > - Java Heap (reserved=2791424KB, committed=176128KB) > (mmap: reserved=2791424KB, committed=176128KB) > > - Class (reserved=1064215KB, committed=16791KB) > (classes #1800) > (malloc=5399KB #1210) > (mmap: reserved=1058816KB, committed=11392KB) > > - Thread (reserved=18582KB, committed=18582KB) > (thread #18) > (stack: reserved=18504KB, committed=18504KB) > (malloc=57KB #96) > (arena=21KB #36) > > - Code (reserved=249870KB, committed=2806KB) > (malloc=270KB #919) > (mmap: reserved=249600KB, committed=2536KB) > > - GC (reserved=107764KB, committed=99260KB) > (malloc=5772KB #132) > (mmap: reserved=101992KB, committed=93488KB) > > - Compiler (reserved=134KB, committed=134KB) > (malloc=3KB #48) > (arena=131KB #3) > > - Internal (reserved=5573KB, committed=5573KB) > (malloc=5541KB #3125) > (mmap: reserved=32KB, committed=32KB) > > - Symbol (reserved=3364KB, committed=3364KB) > (malloc=1853KB #4257) > (arena=1511KB #1) > > - Native Memory Tracking (reserved=160KB, committed=160KB) > (malloc=4KB #45) > (tracking overhead=156KB) > > - Arena Chunk (reserved=2844KB, committed=2844KB) > (malloc=2844KB) > > > Br, > Tim > > > -----Original Message----- > From: Kirk Pepperdine [mailto:kirk.pepperdine at gmail.com] > Sent: Thursday, September 21, 2017 10:30 PM > To: Zhengyu Gu > Cc: Yu, Tim (NSB - CN/Chengdu) ; Poonam Parhar ; Vladimir Kozlov ; david.holmes at oracle.com Holmes ; Andrew Haley ; Ray Hindman ; Shen, David (NSB - CN/Chengdu) > Subject: Re: OpenJDK OOM issue - > > Have you tried running pmap? > > Kind regards, > Kirk > >> On Sep 21, 2017, at 4:07 PM, Zhengyu Gu wrote: >> >> Hi Tim, >> >> Try to run with -XX:NativeMemoryTracking=summary , this should give you some hints on native memory side. >> >> In your case, not be able to create native thread, more likely to be on native side than heap. 
>> >> Thanks, >> >> -Zhengyu >> >> On 09/20/2017 11:22 PM, Yu, Tim (NSB - CN/Chengdu) wrote: >>> Hi Poonam & Vladimir >>> After add " -XX:-UseCompressedOops" flag, the OMM still happens. The corresponding GC log is as below and no heap is printed out. So, what's the next step to do? Please help on this and many thanks :) >>> OpenJDK 64-Bit Server VM (25.131-b11) for linux-amd64 JRE (1.8.0_131-b11), built on Apr 13 2017 17:56:19 by "mockbuild" with gcc 4.4.7 20120313 (Red Hat 4.4.7-18) >>> Memory: 4k page, physical 11163792k(551024k free), swap 16777212k(16722204k free) >>> CommandLine flags: -XX:InitialHeapSize=178620672 -XX:MaxHeapSize=2857930752 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:-UseCompressedOops -XX:+UseParallelGC >>> Br, >>> Tim >>> -----Original Message----- >>> From: Poonam Parhar [mailto:poonam.bajaj at oracle.com] >>> Sent: Thursday, September 21, 2017 12:37 AM >>> To: Vladimir Kozlov ; David Holmes ; Yu, Tim (NSB - CN/Chengdu) ; Andrew Haley ; hotspot-dev developers >>> Cc: Shen, David (NSB - CN/Chengdu) >>> Subject: RE: OpenJDK OOM issue - >>> Hi Vladimir, >>>> -----Original Message----- >>>> From: Vladimir Kozlov >>>> Sent: Wednesday, September 20, 2017 9:30 AM >>>> To: Poonam Parhar; David Holmes; Yu, Tim (NSB - CN/Chengdu); Andrew >>>> Haley; hotspot-dev developers >>>> Cc: Shen, David (NSB - CN/Chengdu) >>>> Subject: Re: OpenJDK OOM issue - >>>> >>>> On Linux we should not have java heap's low address memory problem. >>>> >>>> Small swap space? Also memory left is less than 1Gb ("MemFree: 898332 >>>> kB"). >>>> >>>> Also 5326 processes it a lot. Overloaded system? >>>> >>>> Poonam, do we have bug for this? Can you attached hs_err file to it. >>>> >>> No, there is no bug for this. 
>>> Thanks, >>> Poonam >>>> Thanks, >>>> Vladimir >>>> >>>> On 9/20/17 9:15 AM, Poonam Parhar wrote: >>>>> Hello Tim, >>>>> >>>>> From the hs_err_pid12678.log file, the java heap is based at >>>> 0x715a00000 which is 28gb, so there should be plenty of space available >>>> for the native heap. >>>>> >>>>> Memory map: >>>>> ... >>>>> 00600000-00601000 rw-p 00000000 fc:01 17950 >>>> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131- >>>> 0.b11.el6_9.x86_64/jre/bin/java >>>>> 019bc000-019dd000 rw-p 00000000 00:00 0 >>>> [heap] >>>>> 715a00000-71cd00000 rw-p 00000000 00:00 0 >>>>> 71cd00000-787380000 ---p 00000000 00:00 0 ... >>>>> >>>>> Heap >>>>> PSYoungGen total 51200K, used 13258K [0x0000000787380000, >>>> 0x000000078ac80000, 0x00000007c0000000) >>>>> eden space 44032K, 30% used >>>> [0x0000000787380000,0x0000000788072ad0,0x0000000789e80000) >>>>> from space 7168K, 0% used >>>> [0x000000078a580000,0x000000078a580000,0x000000078ac80000) >>>>> to space 7168K, 0% used >>>> [0x0000000789e80000,0x0000000789e80000,0x000000078a580000) >>>>> ParOldGen total 117760K, used 0K [0x0000000715a00000, >>>> 0x000000071cd00000, 0x0000000787380000) >>>>> object space 117760K, 0% used >>>> [0x0000000715a00000,0x0000000715a00000,0x000000071cd00000) >>>>> Metaspace used 10485K, capacity 10722K, committed 11008K, >>>> reserved 1058816K >>>>> class space used 1125K, capacity 1227K, committed 1280K, >>>> reserved 1048576K >>>>> >>>>> >>>>> To narrow down the issue, would it be possible for you to test with - >>>> XX:-UseCompressedOops? >>>>> >>>>> Thanks, >>>>> Poonam >>>>> >>>>>> -----Original Message----- >>>>>> From: David Holmes >>>>>> Sent: Wednesday, September 20, 2017 4:43 AM >>>>>> To: Yu, Tim (NSB - CN/Chengdu); Andrew Haley; Poonam Parhar; >>>> hotspot- >>>>>> dev developers >>>>>> Cc: Shen, David (NSB - CN/Chengdu) >>>>>> Subject: Re: OpenJDK OOM issue - >>>>>> >>>>>> Tim, >>>>>> >>>>>> Please note attachments get stripped from the mailing lists. 
>>>>>> >>>>>> All - please drop the jdk8-dev and jdk8u-dev mailing lists from this >>>>>> and leave it just on hotspot-dev. I've tried to bcc those lists. >>>>>> >>>>>> Thank you. >>>>>> >>>>>> David >>>>>> >>>>>> On 20/09/2017 6:44 PM, Yu, Tim (NSB - CN/Chengdu) wrote: >>>>>>> Hi All >>>>>>> >>>>>>> Thank you all for the quick response. >>>>>>> The environment information is listed as below, could you please >>>>>>> help >>>>>> to further check? >>>>>>> >>>>>>> 1. What OS is this? >>>>>>> # cat /etc/redhat-release >>>>>>> Red Hat Enterprise Linux Server release 6.9 (Santiago) # uname -a >>>>>>> Linux cloudyvm16 2.6.32-696.6.3.el6.x86_64 #1 SMP Fri Jun 30 >>>>>>> 13:24:18 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >>>>>>> >>>>>>> 2.GC log is listed as below. The heap information cannot be printed >>>>>> out in gc-2017_09_20-09_21_15.log when OOM happens. In gc- >>>> 2017_09_20- >>>>>> 09_21_17.log, you can see the heap begins with 0x0000000787380000 >>>> and >>>>>> it should be not the first 4G virtual memory address. >>>>>>> -rw-r--r-- 1 19477 Sep 20 09:21 hs_err_pid12678.log >>>>>>> -rw-r--r-- 1 570 Sep 20 09:21 gc-2017_09_20-09_21_15.log >>>>>>> -rw-r--r-- 1 17741 Sep 20 09:21 hs_err_pid12706.log >>>>>>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_17.log >>>>>>> -rw-r--r-- 1 1722 Sep 20 09:21 gc-2017_09_20-09_21_18.log >>>>>>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_19.log >>>>>>> -rw-r--r-- 1 1297 Sep 20 09:21 gc-2017_09_20-09_21_20.log >>>>>>> >>>>>>> 3. This issue happens occasionally but frequently. We periodically >>>>>> launch a JAVA program to use JMX to monitor service status of >>>> another >>>>>> JAVA service. 
>>>>>>> >>>>>>> Br, >>>>>>> Tim >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Andrew Haley [mailto:aph at redhat.com] >>>>>>> Sent: Tuesday, September 19, 2017 9:13 PM >>>>>>> To: Yu, Tim (NSB - CN/Chengdu) ; >>>>>>> jdk8-dev at openjdk.java.net; jdk8u-dev at openjdk.java.net >>>>>>> Cc: Shen, David (NSB - CN/Chengdu) >>>>>>> Subject: Re: OpenJDK OOM issue - >>>>>>> >>>>>>> On 19/09/17 09:50, Yu, Tim (NSB - CN/Chengdu) wrote: >>>>>>>> Hi OpenJDK dev group >>>>>>>> >>>>>>>> We meet one issue that the VM failed to initialize. The error log >>>>>>>> is >>>>>> as below. We checked both memory usage and thread number. They do >>>> not >>>>>> hit the limit. So could you please help to confirm why >>>>>> "java.lang.OutOfMemoryError: unable to create new native thread" >>>>>> error occurs? Many thanks. >>>>>>> >>>>>>> What OS is this? >>>>>>>
From hohensee at amazon.com Mon Sep 25 21:34:42 2017
From: hohensee at amazon.com (Hohensee, Paul)
Date: Mon, 25 Sep 2017 21:34:42 +0000
Subject: 8146115 - Improve docker container detection and resource configuration usage
In-Reply-To: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com>
References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com>
Message-ID:

Hi Bob,

Looks functionally ok, but I'm not an expert on that, so most of this review is sw engineering comments.

In os_linux.cpp:

Is ActiveProcessorCount used to constrain the container? The only use I can see is in active_processor_count(), which just returns the value specified on the command line. Seems to be a nop otherwise.

In available_memory(), is the cast to julong in "avail_mem = (julong)(mem_limit - mem_usage);" needed? Maybe to get rid of a compiler warning? Otherwise it's strange.

In print_container_info(), "Running in a container: " is common to both if and else. You could replace the first call to st->print() with
st->print("container: "); if (!OSContainer::is_containerized()) st->print("not "); st->print("running in a container.")
I'd remove the "OSContainer::" prefix string, it's a bit verbose, plus you probably want an st->cr() at the end of the method.
A jlong is a long on some platforms and a long long on others, so I'd replace "0L" with just "0", since "0" will get widened properly.
Minor formatting nits: "else" goes on the same line as the "}" closing the "then"; if an "else" clause isn't a block, then it usually goes on the same line as the "else".

In osContainer.hpp/cpp:

If is_containerized() is frequently called, you might want to inline it. Also, init() is called only from create_java_vm(), so it's not multi-thread safe, which means that checking _is_initialized and calling init() if it's not set is confusing. I'd remove _is_initialized.

Getters in Hotspot don't have "get_" prefixes (at least, they never used to!), so replace "get_container_type" with "container_type" and "pd_get_container_type" with "pd_container_type".

In osContainer_linux.hpp/cpp:

In the CgroupSubsystem, you may want to use the os:: versions of strdup and free.
MAXBUF is used as the maximum length of various file paths. Use MAXPATHLEN+1 instead?
Change get_subsystem_path() to subsystem_path()?

Is it really necessary to have a separate GEN_CONTAINER_GET_INFO_STR macro? You could just pass NULL into GEN_CONTAINER_GET_INFO and have it check for scan_fmt == NULL.

Minor formatting nits: "else" goes on the same line as the "}" closing the "then"; "{" s/b at the end of the line defining cpuset_cpus_to_count().

The GET_CONTAINER_INFO macro should bracket the code with a block: {}.

Manifest constant 9223372036854771712 should be "static const julong UNLIMITED_MEM = 0x7ffffffffffff000;" or "#define UNLIMITED_MEM 0x7ffffffffffff000".
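
The manifest-constant suggestion can be sketched in isolation as follows. This is a minimal standalone sketch, not the webrev code: uint64_t stands in for HotSpot's julong, and memory_limit_or_unlimited() with its -1 result is a hypothetical helper introduced only for illustration. The static_assert confirms that the hex spelling and the decimal literal quoted in the review are the same value.

```cpp
#include <cstdint>

// Named constant replacing the bare decimal literal, as suggested in the
// review. uint64_t is a stand-in for HotSpot's julong (assumption).
static const uint64_t UNLIMITED_MEM = 0x7ffffffffffff000ULL;

// The hex form and the decimal literal from the webrev denote the same value.
static_assert(UNLIMITED_MEM == 9223372036854771712ULL,
              "hex and decimal spellings must agree");

// Hypothetical helper: report -1 when the raw value equals the "unlimited"
// marker, otherwise pass the limit through unchanged.
int64_t memory_limit_or_unlimited(uint64_t raw_limit) {
  if (raw_limit == UNLIMITED_MEM) {
    return -1;
  }
  return static_cast<int64_t>(raw_limit);
}
```

Defining the value once and comparing against the name keeps the comparison site readable and avoids repeating a 19-digit literal.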
Thanks, Paul On 9/22/17, 7:28 AM, "hotspot-dev on behalf of Bob Vandette" wrote: Please review these changes that improve on docker container detection and the automatic configuration of the number of active CPUs and total and free memory based on the containers resource limitation settings and metric data files. http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ These changes are enabled with -XX:+UseContainerSupport. You can enable logging for this support via -Xlog:os+container=trace. Since the dynamic selection of CPUs based on cpusets, quotas and shares may not satisfy every users needs, I?ve added an additional flag to allow the number of CPUs to be overridden. This flag is named -XX:ActiveProcessorCount=xx. Bob. From serguei.spitsyn at oracle.com Tue Sep 26 00:20:28 2017 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 25 Sep 2017 17:20:28 -0700 Subject: RFR (S) 8081323: ConstantPool::_resolved_references is missing in heap dump In-Reply-To: References: <915c3300-2528-2b85-2492-0b54a783c622@oracle.com> Message-ID: <444d29d5-3b58-5aff-5f93-77b79cbf7c50@oracle.com> Hi Coleen, It looks good. Thanks, Serguei On 9/22/17 13:33, coleen.phillimore at oracle.com wrote: > > Harold pointed out privately that having > ConstantPool::resolved_references() sometimes return NULL forces the > callers to check for null for safety, so I added a > resolved_references_or_null() call for the heap dumper to call instead. > > Please review this new version, also small: > > open webrev at http://cr.openjdk.java.net/~coleenp/8081323.02/webrev > bug link https://bugs.openjdk.java.net/browse/JDK-8081323 > > Reran tier1 and heapdump tests. > > Thanks, > Coleen > > On 9/5/17 12:50 PM, coleen.phillimore at oracle.com wrote: >> >> Thank you, Serguei! >> Coleen >> >> On 9/1/17 6:48 PM, serguei.spitsyn at oracle.com wrote: >>> Hi Coleen, >>> >>> The fix looks good. 
>>> >>> Thanks, >>> Serguei >>> >>> >>> On 8/31/17 09:02, coleen.phillimore at oracle.com wrote: >>>> Summary: Add resolved_references and init_lock as hidden static >>>> field in class so root is found. >>>> >>>> Tested manually with YourKit.? See bug for images.?? Also ran >>>> serviceability tests. >>>> >>>> open webrev at http://cr.openjdk.java.net/~coleenp/8081323.01/webrev >>>> bug link https://bugs.openjdk.java.net/browse/JDK-8081323 >>>> >>>> Thanks, >>>> Coleen >>>> >>> >> > From jesper.wilhelmsson at oracle.com Tue Sep 26 01:17:50 2017 From: jesper.wilhelmsson at oracle.com (jesper.wilhelmsson at oracle.com) Date: Tue, 26 Sep 2017 03:17:50 +0200 Subject: jdk10/hs is open for pushes Message-ID: <8C3AE377-ED24-4743-8FE3-EE0D803CECA7@oracle.com> Hi, The repo consolidation is now done and the hs repo is again open for pushes. Please note that in order to update your repository or push changes you will need to clone a new copy of jdk10/hs and manually move your patches from your old copy. Thanks, /Jesper From coleen.phillimore at oracle.com Tue Sep 26 03:31:52 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 25 Sep 2017 23:31:52 -0400 Subject: RFR (S) 8081323: ConstantPool::_resolved_references is missing in heap dump In-Reply-To: <444d29d5-3b58-5aff-5f93-77b79cbf7c50@oracle.com> References: <915c3300-2528-2b85-2492-0b54a783c622@oracle.com> <444d29d5-3b58-5aff-5f93-77b79cbf7c50@oracle.com> Message-ID: <197d28d8-4a8a-b71b-da1a-f817e0fda2d0@oracle.com> Thanks for the re-review! Coleen On 9/25/17 8:20 PM, serguei.spitsyn at oracle.com wrote: > Hi Coleen, > > It looks good. > > Thanks, > Serguei > > > On 9/22/17 13:33, coleen.phillimore at oracle.com wrote: >> >> Harold pointed out privately that having >> ConstantPool::resolved_references() sometimes return NULL forces the >> callers to check for null for safety, so I added a >> resolved_references_or_null() call for the heap dumper to call instead. 
>> >> Please review this new version, also small: >> >> open webrev at http://cr.openjdk.java.net/~coleenp/8081323.02/webrev >> bug link https://bugs.openjdk.java.net/browse/JDK-8081323 >> >> Reran tier1 and heapdump tests. >> >> Thanks, >> Coleen >> >> On 9/5/17 12:50 PM, coleen.phillimore at oracle.com wrote: >>> >>> Thank you, Serguei! >>> Coleen >>> >>> On 9/1/17 6:48 PM, serguei.spitsyn at oracle.com wrote: >>>> Hi Coleen, >>>> >>>> The fix looks good. >>>> >>>> Thanks, >>>> Serguei >>>> >>>> >>>> On 8/31/17 09:02, coleen.phillimore at oracle.com wrote: >>>>> Summary: Add resolved_references and init_lock as hidden static >>>>> field in class so root is found. >>>>> >>>>> Tested manually with YourKit.? See bug for images.?? Also ran >>>>> serviceability tests. >>>>> >>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8081323.01/webrev >>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8081323 >>>>> >>>>> Thanks, >>>>> Coleen >>>>> >>>> >>> >> > From david.holmes at oracle.com Tue Sep 26 05:19:13 2017 From: david.holmes at oracle.com (David Holmes) Date: Tue, 26 Sep 2017 15:19:13 +1000 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> Message-ID: <2d9dd746-63e1-cade-28f9-5ca1ae1c253e@oracle.com> Hi Bob, I think there are some high-level decisions to be made regarding Linux only or seemingly shared support for containers - see below - but overall I don't have any major issues. I can't comment on the details of the actual low-level cgroup queries etc. Comments below ... Bob writes: > Please review these changes that improve on docker container detection and the > automatic configuration of the number of active CPUs and total and free memory > based on the containers resource limitation settings and metric data files. 
> > http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ > > These changes are enabled with -XX:+UseContainerSupport. If this is all confined to Linux only then this should be a linux-only flag and all the changes should be confined to linux code. No shared osContainer API is needed as it can be defined as a nested class of os::Linux, and will only be called from os_linux.cpp. > > You can enable logging for this support via -Xlog:os+container=trace. I'm not sure that is the right tagging for all situations. For example in os::Linux::available_memory you might log the actual amount for os+memory, regardless of whether containerized on not. You can of course also have container specific logging in the same function. > Since the dynamic selection of CPUs based on cpusets, quotas and shares > may not satisfy every users needs, I?ve added an additional flag to allow the > number of CPUs to be overridden. This flag is named -XX:ActiveProcessorCount=xx. I would suggest that ActiveProcessorCount be constrained to being >1 - this is in line with our plans to get rid of AssumeMP/os::is_MP and always build in MP support. Otherwise a count of 1 today won't behave the same as a count of 1 in the future. Also you have defined this globally but only accounted for it on Linux. I think it makes sense to support this flag on all platforms (a generalization of AssumeMP). Otherwise it needs to be defined as a Linux-only flag in the pd_globals.hpp file --- src/os/linux/vm/os_linux.cpp In os::Linux::available_memory you might want more container specific logging to track why you don't use container info even if containerized. It might help expose a misconfigured container. In os::Linux::print_container_info I'd probably just print nothing if not containerized. General style issue: we're not constrained to only declare local variables at the start of a block. You should be able to declare the variable at, or close to, the point of initialization. 
Style issue: 2121 if (i < 0) st->print("OSContainer::active_processor_count() failed"); 2122 else and elsewhere. Please move the st->print to its own line. Others may argue for always using blocks ({}) in if/else. 2128 st->print("OSContainer::memory_limit_in_bytes: %ld", j); And elsewhere: %ld should be JLONG_FORMAT when printing a jlong. 5024 // User has overridden the number of active processors 5025 if (!FLAG_IS_DEFAULT(ActiveProcessorCount)) { 5026 log_trace(os)("active_processor_count: " 5027 "active processor count set by user : %d", 5028 (int)ActiveProcessorCount); 5029 return ActiveProcessorCount; 5030 } We don't normally check flags in runtime code like this - this will be executed on every call, and you will see that logging each time. This should be handled during initialization (os::Posix::init()? - if applying this flag globally) - with logging occurring once. The above should just reduce to: if (ActiveProcessorCount > 0) { return ActiveProcessorCount; // explicit user control of number of cpus } Even then I do get concerned about having to always check for the least common cases before the most common one. :( --- The osContainer_.hpp files seem to be unnecessary as they are all empty. --- osContainer.hpp: The pd_* methods should be private. You've exposed all of the potential cpu container functions as-if they might be used directly, rather than just internally when calculating available-processors. That's okay but you then need to document all the possible return values - in particular the distinction between -2, -1, 0 and > 0. --- osContainer.cpp 41 bool OSContainer::is_containerized() { 42 if (!_is_initialized) OSContainer::init(); 43 return _is_containerized; 44 } As OSContainer::init() is called at VM initialization the above should simply assert that _is_initialized is true - else it means your call to ::init is in the wrong place and serves no purpose. That allows is_containerized to be trivially inlined and placed in the .hpp file. 
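
The assert-instead-of-lazy-init suggestion for is_containerized() could look roughly like the sketch below. It is a simplified stand-in for the OSContainer class in the webrev, not the actual code: the real init() takes no arguments and performs the detection itself, whereas here the detection result is passed in as a parameter purely so the sketch is self-contained.

```cpp
#include <cassert>

// Simplified stand-in for the OSContainer class under review.
class OSContainer {
 private:
  static bool _is_initialized;
  static bool _is_containerized;

 public:
  // Called once during VM startup. The 'detected' parameter is an
  // assumption of this sketch; the real init() does its own detection.
  static void init(bool detected) {
    _is_containerized = detected;
    _is_initialized = true;
  }

  // No lazy call to init(): being called before init() is a bug, so the
  // getter only asserts, which makes it trivial to inline in the header.
  static inline bool is_containerized() {
    assert(_is_initialized && "OSContainer::init() must be called first");
    return _is_containerized;
  }
};

bool OSContainer::_is_initialized = false;
bool OSContainer::_is_containerized = false;
```

In HotSpot itself the check would presumably use the VM's own assert macro rather than the one from <cassert>.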
--- osContainer_linux.cpp As Paul commented you should check whether you should be using os::strdup, os::malloc, os::free etc to interact with NMT correctly - and also respond to OOM appropriately. 34 class CgroupSubsystem: CHeapObj { You defined this class as CHeapObj and added a destructor to free a few things, but I can't see where the instances of this class will themselves ever be freed. 62 void set_subsystem_path(char *cgroup_path) { If this takes a "const char*" will it save you from casting string literals to "char*" elsewhere? 170 GEN_CONTAINER_GET_INFO(jlong, jlong, "%ld") %ld should be JLONG_FORMAT 417 if (memlimit == 9223372036854771712) Large constants should be defined using the CONST64 macro. And they should only be defined once as Paul suggested. 485 * Algorythm: Typo: -> Algorithm 509 cpus = OSContainer::cpu_cpuset_cpus(); 516 log_error(os,container)("Error getting cpuset_cpucount"); That seems to be the wrong error message for the function called. 522 share_count = ceilf((float)share / 1024.0f); The cast to float is not needed as you are dividing by a float. 509 cpus = OSContainer::cpu_cpuset_cpus(); 510 if (cpus != (char *)CONTAINER_ERROR) { This is invalid code - you can't cast -2 to char*. The only way to report an error from a char* function is an actual error message, or else NULL. 509 cpus = OSContainer::cpu_cpuset_cpus(); As you are inside a pd_* implementation, I don't think you want or need to call the public versions of the other OSContainer functions - you can just call the local pd_ variants directly. Nothing written at the OSContainer level should be allowed to affect what you calculate here inside your implementation. 555 * Return the number of miliseconds per period Typo: -> milliseconds 525 else share_count = cpu_count; Style (various locations): separate line and possibly {} Thanks, David ----- > > Bob. 
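
As a small illustration of the comment on line 522 above (the cast to float is redundant when dividing by a float constant), the share-to-CPU rounding can be written as below. This is a sketch only: share_to_cpu_count() is a hypothetical name, and the only thing taken from the webrev is the division by 1024 with rounding up.

```cpp
#include <cmath>

// Hypothetical helper: convert a cpu shares value into a whole number of
// CPUs, rounding up. Dividing an int by a float constant already promotes
// the int to float, so no explicit (float) cast is needed.
int share_to_cpu_count(int share) {
  return static_cast<int>(std::ceil(share / 1024.0f));
}
```

For example, a share value of 2048 yields 2, and any value between 1 and 1024 yields 1.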
From david.holmes at oracle.com Tue Sep 26 09:01:01 2017
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 26 Sep 2017 19:01:01 +1000
Subject: [RFR]: 8187590: Zero runtime can lock-up on linux-alpha
In-Reply-To: <6d7f27a2-60d0-9cc9-9f8e-e21e2877b476@redhat.com>
References: <6d7f27a2-60d0-9cc9-9f8e-e21e2877b476@redhat.com>
Message-ID: <60818bc8-70c9-acaa-4cba-5fade4995154@oracle.com>

I'll add my Review and raise you a Sponsor. :)

Cheers,
David

> Andrew Haley aph at redhat.com
> Mon Sep 18 15:38:08 UTC 2017
> On 15/09/17 22:15, John Paul Adrian Glaubitz wrote:
>> Please review this change [1] which fixes random lockups of the Zero runtime
>> on linux-alpha. This was discovered when building OpenJDK on the Debian
>> automatic package builders for Alpha, particularly on SMP machines.
>>
>> It was observed that the issue could be fixed by installing a uni-processor
>> kernel. After some testing, we discovered that the proper fix is to use
>> __sync_synchronize() even for light memory barriers.
>
> That looks right, thanks.
>
> --
> Andrew Haley
> Java Platform Lead Engineer
> Red Hat UK Ltd.
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From bob.vandette at oracle.com Wed Sep 27 15:45:51 2017
From: bob.vandette at oracle.com (Bob Vandette)
Date: Wed, 27 Sep 2017 11:45:51 -0400
Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage
In-Reply-To: <2d9dd746-63e1-cade-28f9-5ca1ae1c253e@oracle.com>
References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <2d9dd746-63e1-cade-28f9-5ca1ae1c253e@oracle.com>
Message-ID: <200F07CB-35DA-492B-B78D-9EC033EE0431@oracle.com>

David,

Thank you for taking the time and providing a detailed review of these changes.

Where I haven't responded, I'll update the implementation based on your comments.

> On Sep 26, 2017, at 1:19 AM, David Holmes wrote:
>
> Hi Bob,
>
> I think there are some high-level decisions to be made regarding Linux only or seemingly shared support for containers - see below - but overall I don't have any major issues. I can't comment on the details of the actual low-level cgroup queries etc.
>
> Comments below ...
>
> Bob writes:
>> Please review these changes that improve on docker container detection and the
>> automatic configuration of the number of active CPUs and total and free memory
>> based on the containers resource limitation settings and metric data files.
>> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/
>> These changes are enabled with -XX:+UseContainerSupport.
>
> If this is all confined to Linux only then this should be a linux-only flag and all the changes should be confined to linux code. No shared osContainer API is needed as it can be defined as a nested class of os::Linux, and will only be called from os_linux.cpp.

I received feedback on my other Container work where I was asked to make sure it was possible to support other container technologies. The addition of the shared osContainer API is to prepare for this and recognize that this will eventually be supported on other platforms.

>
>> You can enable logging for this support via -Xlog:os+container=trace.
>
> I'm not sure that is the right tagging for all situations. For example in os::Linux::available_memory you might log the actual amount for os+memory, regardless of whether containerized on not. You can of course also have container specific logging in the same function.
>
>> Since the dynamic selection of CPUs based on cpusets, quotas and shares
>> may not satisfy every users needs, I've added an additional flag to allow the
>> number of CPUs to be overridden. This flag is named -XX:ActiveProcessorCount=xx.
>
> I would suggest that ActiveProcessorCount be constrained to being >1 - this is in line with our plans to get rid of AssumeMP/os::is_MP and always build in MP support. Otherwise a count of 1 today won't behave the same as a count of 1 in the future.

What if I return true for is_MP any time ActiveProcessorCount is set? I'd like to provide the ability to specify a single processor.

>
> Also you have defined this globally but only accounted for it on Linux. I think it makes sense to support this flag on all platforms (a generalization of AssumeMP). Otherwise it needs to be defined as a Linux-only flag in the pd_globals.hpp file

Good idea.

>
> ---
>
> src/os/linux/vm/os_linux.cpp
>
> In os::Linux::available_memory you might want more container specific logging to track why you don't use container info even if containerized. It might help expose a misconfigured container.
>
> In os::Linux::print_container_info I'd probably just print nothing if not containerized.
>
> General style issue: we're not constrained to only declare local variables at the start of a block. You should be able to declare the variable at, or close to, the point of initialization.
>
> Style issue:
>
> 2121 if (i < 0) st->print("OSContainer::active_processor_count() failed");
> 2122 else
>
> and elsewhere. Please move the st->print to its own line. Others may argue for always using blocks ({}) in if/else.

There doesn't seem to be consistency on this issue.

>
> 2128 st->print("OSContainer::memory_limit_in_bytes: %ld", j);
>
> And elsewhere: %ld should be JLONG_FORMAT when printing a jlong.

I found a few places already where I needed to add this. I'll look for all of them.
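
The portability problem behind the %ld comment can be shown outside HotSpot with the standard <cinttypes> macros. The sketch below is an illustration under stated assumptions: JLONG_FORMAT_SKETCH is a hypothetical stand-in for HotSpot's JLONG_FORMAT, int64_t stands in for jlong, and format_memory_limit() is an invented helper.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for HotSpot's JLONG_FORMAT: a format macro that
// matches a 64-bit signed integer whether it is a long or a long long.
#define JLONG_FORMAT_SKETCH "%" PRId64

// Format a 64-bit limit without assuming that the type is 'long',
// which is what a hard-coded "%ld" would do.
std::string format_memory_limit(int64_t limit) {
  char buf[64];
  std::snprintf(buf, sizeof(buf),
                "memory_limit_in_bytes: " JLONG_FORMAT_SKETCH, limit);
  return std::string(buf);
}
```

Because the format macro is concatenated into the string literal at compile time, the compiler can still check the argument types against the format.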
>
> 5024   // User has overridden the number of active processors
> 5025   if (!FLAG_IS_DEFAULT(ActiveProcessorCount)) {
> 5026     log_trace(os)("active_processor_count: "
> 5027                   "active processor count set by user : %d",
> 5028                   (int)ActiveProcessorCount);
> 5029     return ActiveProcessorCount;
> 5030   }
>
> We don't normally check flags in runtime code like this - this will be executed on every call, and you will see that logging each time. This should be handled during initialization (os::Posix::init()? - if applying this flag globally) - with logging occurring once. The above should just reduce to:
>
> if (ActiveProcessorCount > 0) {
>   return ActiveProcessorCount; // explicit user control of number of cpus
> }
>
> Even then I do get concerned about having to always check for the least common cases before the most common one. :(

This is not in a highly used function so it should be ok.

>
> ---
>
> The osContainer_.hpp files seem to be unnecessary as they are all empty.

I'll remove them. I wasn't sure if there was a convention to move more of osContainer_linux.cpp -> osContainer_linux.hpp.

For example: class CgroupSubsystem

>
> ---
>
> osContainer.hpp:
>
> The pd_* methods should be private.
>
> You've exposed all of the potential cpu container functions as-if they might be used directly, rather than just internally when calculating available-processors. That's okay but you then need to document all the possible return values - in particular the distinction between -2, -1, 0 and > 0.
>
> ---
>
> osContainer.cpp
>
> 41 bool OSContainer::is_containerized() {
> 42   if (!_is_initialized) OSContainer::init();
> 43   return _is_containerized;
> 44 }
>
> As OSContainer::init() is called at VM initialization the above should simply assert that _is_initialized is true - else it means your call to ::init is in the wrong place and serves no purpose. That allows is_containerized to be trivially inlined and placed in the .hpp file.
>
> ---
>
> osContainer_linux.cpp
>
> As Paul commented you should check whether you should be using os::strdup, os::malloc, os::free etc to interact with NMT correctly - and also respond to OOM appropriately.
>
> 34 class CgroupSubsystem: CHeapObj {
>
> You defined this class as CHeapObj and added a destructor to free a few things, but I can't see where the instances of this class will themselves ever be freed

What's the latest thinking on freeing CHeap Objects on termination? Is it really worth wasting cpu cycles when our process is about to terminate? If not, I'll just remove the destructors.

>
> 62 void set_subsystem_path(char *cgroup_path) {
>
> If this takes a "const char*" will it save you from casting string literals to "char*" elsewhere?

I tried several different ways of declaring the container accessor functions and always ended up with warnings due to scanf not being able to validate arguments since the format string didn't end up being a string literal. I originally was using templates and then ended up with the macros. I tried several different casts but couldn't resolve the problem.

>
> 170 GEN_CONTAINER_GET_INFO(jlong, jlong, "%ld")
>
> %ld should be JLONG_FORMAT
>
> 417 if (memlimit == 9223372036854771712)
>
> Large constants should be defined using the CONST64 macro. And they should only be defined once as Paul suggested.
>
> 485 * Algorythm:
>
> Typo: -> Algorithm
>
> 509 cpus = OSContainer::cpu_cpuset_cpus();
> 516 log_error(os,container)("Error getting cpuset_cpucount");
>
> That seems to be the wrong error message for the function called.
>
> 522 share_count = ceilf((float)share / 1024.0f);
>
> The cast to float is not needed as you are dividing by a float.
>
> 509 cpus = OSContainer::cpu_cpuset_cpus();
> 510 if (cpus != (char *)CONTAINER_ERROR) {
>
> This is invalid code - you can't cast -2 to char*. The only way to report an error from a char* function is an actual error message, or else NULL.

I was trying to make error reporting consistent for all types. I'll change the error for strings to NULL.

>
> 509 cpus = OSContainer::cpu_cpuset_cpus();
>
> As you are inside a pd_* implementation, I don't think you want or need to call the public versions of the other OSContainer functions - you can just call the local pd_ variants directly. Nothing written at the OSContainer level should be allowed to affect what you calculate here inside your implementation.
>
> 555 * Return the number of miliseconds per period
>
> Typo: -> milliseconds
>
> 525 else share_count = cpu_count;
>
> Style (various locations): separate line and possibly {}
>
> Thanks,
> David
> -----

Thanks,
Bob.

>
>> Bob.
>

From bob.vandette at oracle.com Wed Sep 27 16:09:07 2017
From: bob.vandette at oracle.com (Bob Vandette)
Date: Wed, 27 Sep 2017 12:09:07 -0400
Subject: 8146115 - Improve docker container detection and resource configuration usage
In-Reply-To:
References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com>
Message-ID:

Thanks for the review Paul. I'll take care of all of your suggested changes where I haven't commented below.

> On Sep 25, 2017, at 5:34 PM, Hohensee, Paul wrote:
>
> Hi Bob,
>
> Looks functionally ok, but I'm not an expert on that, so most of this review is sw engineering comments.
>
> In os_linux.cpp:
>
> Is ActiveProcessorCount used to constrain the container? The only use I can see is in active_processor_count(), which just returns the value specified on the command line. Seems to be a nop otherwise.

Yes, it returns ActiveProcessorCount in the os::active_processor_count() function rather than trying to determine the number of active processors by other mechanisms. This is not a noop.

>
> In available_memory(), is the cast to julong in "avail_mem = (julong)(mem_limit - mem_usage);" needed? Maybe to get rid of a compiler warning? Otherwise it's strange.

Yes, compiler warning.

>
> In print_container_info(), "Running in a container: " is common to both if and else.
You could replace the first call to st->print() with
> st->print("container: "); if (!OSContainer::is_containerized()) st->print("not "); st->print("running in a container.")
> I'd remove the "OSContainer::" prefix string, it's a bit verbose, plus you probably want an st->cr() at the end of the method.
> A jlong is a long on some platforms and a long long on others, so I'd replace "0L" with just "0", since "0" will get widened properly.
> Minor formatting nits: "else" goes on the same line as the "}" closing the "then"; if an "else" clause isn't a block, then it usually goes on the same line as the "else".
>
> In osContainer.hpp/cpp:
>
> If is_containerized() is frequently called, you might want to inline it. Also, init() is called only from create_java_vm(), so it's not multi-thread safe, which means that checking _is_initialized and calling init() if it's not set is confusing. I'd remove _is_initialized.
>
> Getters in Hotspot don't have "get_" prefixes (at least, they never used to!), so replace "get_container_type" with "container_type" and "pd_get_container_type" with "pd_container_type".

There are many examples of get_xxx in hotspot now.

>
> In osContainer_linux.hpp/cpp:
>
> In the CgroupSubsystem, you may want to use the os:: versions of strdup and free.
> MAXBUF is used as the maximum length of various file paths. Use MAXPATHLEN+1 instead?
> Change get_subsystem_path() to subsystem_path()?
>
> Is it really necessary to have a separate GEN_CONTAINER_GET_INFO_STR macro? You could just pass NULL into GEN_CONTAINER_GET_INFO and have it check for scan_fmt == NULL.

I was trying to avoid additional runtime testing since I already have a lot of required error checking to do.

>
> Minor formatting nits: "else" goes on the same line as the "}" closing the "then"; "{" s/b at the end of the line defining cpuset_cpus_to_count().
>
> The GET_CONTAINER_INFO macro should bracket the code with a block: {}.
>
> Manifest constant 9223372036854771712 should be "static const julong UNLIMITED_MEM = 0x7ffffffffffff000;" or "#define UNLIMITED_MEM 0x7ffffffffffff000".
>

Thanks,
Bob.

> Thanks,
>
> Paul
>
> On 9/22/17, 7:28 AM, "hotspot-dev on behalf of Bob Vandette" wrote:
>
> Please review these changes that improve on docker container detection and the
> automatic configuration of the number of active CPUs and total and free memory
> based on the containers resource limitation settings and metric data files.
>
> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/
>
> These changes are enabled with -XX:+UseContainerSupport.
>
> You can enable logging for this support via -Xlog:os+container=trace.
>
> Since the dynamic selection of CPUs based on cpusets, quotas and shares
> may not satisfy every users needs, I've added an additional flag to allow the
> number of CPUs to be overridden. This flag is named -XX:ActiveProcessorCount=xx.
>
>
> Bob.
>
>
>
>
>

From glaubitz at physik.fu-berlin.de Wed Sep 27 16:40:38 2017
From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz)
Date: Wed, 27 Sep 2017 18:40:38 +0200
Subject: [RFR]: 8186578: Zero fails to build on linux-sparc due to sparc-specific code
Message-ID: <1655822b-bebb-44bb-a45e-56b115fe8490@physik.fu-berlin.de>

Hello!

Zero currently fails to build on linux-sparc since there are two instances where SPARC-specific defines conflict with the Zero code.

One instance is src/hotspot/share/compiler/oopMap.cpp where a header is unconditionally included for SPARC which is not available for Zero. It turns out that this header, vmreg_sparc.inline.hpp, does actually not need to be included here.
So, easiest part is just to remove the inclusion of this header:

@@ -38,13 +38,10 @@
 #include "c1/c1_Defs.hpp"
 #endif
 #ifdef COMPILER2
 #include "opto/optoreg.hpp"
 #endif
-#ifdef SPARC
-#include "vmreg_sparc.inline.hpp"
-#endif

The second instance is src/hotspot/cpu/sparc/memset_with_concurrent_readers_sparc.cpp which is not included in the build for Zero. However, we always need to include it because of the particular implementation of memset() on SPARC which does not work well with concurrent readers which is why memset_with_concurrent_readers.hpp looks like this:

#ifdef SPARC

// SPARC requires special handling. See SPARC-specific definition.

#else
// All others just use memset.

inline void memset_with_concurrent_readers(void* to, int value, size_t size) {
  ::memset(to, value, size);
}

#endif // End of target dispatch.

The webrev to fix the Zero build on SPARC includes both changes and can be found in [1].

Thanks,
Adrian

> [1] http://cr.openjdk.java.net/~glaubitz/8186578/webrev.02/

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz at debian.org
`. `'   Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

From dmitry.chuyko at bell-sw.com Wed Sep 27 17:04:55 2017
From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko)
Date: Wed, 27 Sep 2017 20:04:55 +0300
Subject: [10] RFR: 8186671 - AARCH64: Use `yield` instruction in SpinPause on linux-aarch64
In-Reply-To:
References:
Message-ID:

Hello,

Re-sending this to hotspot-dev on the advice of Andrew; the patch is updated for the consolidated repo.

rfe: https://bugs.openjdk.java.net/browse/JDK-8186671
webrev: http://cr.openjdk.java.net/~dchuyko/8186671/webrev.01/
original thread: http://mail.openjdk.java.net/pipermail/aarch64-port-dev/2017-August/004870.html

The function was moved to a platform .S file and is now implemented with the yield instruction.
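
For readers without the webrev at hand, the idea can be approximated in C++ with inline assembly. This is only a sketch under stated assumptions: the actual change implements SpinPause in a platform .S file, spin_pause_sketch() is a hypothetical name, the GCC/Clang `__asm__` syntax is assumed, and the 1/0 return convention (1 when a hint was issued, 0 when the function is a no-op) is an assumption of this sketch rather than something stated in the thread.

```cpp
// Hypothetical C++ rendering of a yield-based spin pause.
// The real patch provides this as assembly in a platform .S file.
extern "C" int spin_pause_sketch() {
#if defined(__aarch64__)
  // Hint to the core that this thread is spinning and could be deprioritized.
  __asm__ volatile("yield");
  return 1;  // assumed convention: a pause hint was issued
#else
  return 0;  // assumed convention: no-op on other targets in this sketch
#endif
}
```

A caller would typically invoke this inside a bounded spin loop between attempts to acquire a lock.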
-Dmitry

-------- Forwarded Message --------
Subject: Re: [aarch64-port-dev ] RFR: 8186671: Use `yield` instruction in SpinPause on linux-aarch64
Date: Sat, 2 Sep 2017 09:10:00 +0100
From: Andrew Haley
To: Dmitry Chuyko, aarch64-port-dev at openjdk.java.net

On 01/09/17 17:26, Dmitry Chuyko wrote:
> There were no objections to this part (extern). I need sponsorship to
> push the change.

I can do it, but it really needs to be sent to hotspot-dev.

> It would be interesting to discuss the other (intrinsic) part a bit more
> at fireside chat.

OK, but without any actual implementations we can test it'll be a very
short discussion.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From wenlei.xie at gmail.com Wed Sep 27 18:03:48 2017
From: wenlei.xie at gmail.com (Wenlei Xie)
Date: Wed, 27 Sep 2017 11:03:48 -0700
Subject: Questions about negative loaded classes and Lambda Form Compilation
Message-ID:

Hi,

We have recently seen some weird JVM behavior in our production cluster. We are running JDK 1.8.0_131.

1. On more than half of the machines (200 out of 400 machines), we see the JMX counter report a negative LoadedClassCount (see attached jmxcounter.png). After some further digging, we noticed that UnloadedClassCount is larger than TotalLoadedClassCount, and LoadedClassCount (-695,710) = TotalLoadedClassCount - UnloadedClassCount. PerfCounter reports the same numbers; here is the result on the same machine:

$ jcmd 307 PerfCounter.print | grep -i class | grep -i java.cls
java.cls.loadedClasses=192004392
java.cls.sharedLoadedClasses=0
java.cls.sharedUnloadedClasses=0
java.cls.unloadedClasses=192700102

2. On the same cluster, we also see over half of the machines repeatedly experiencing full GCs because Metaspace is full. We took a jstack dump every minute for 30 minutes and saw many threads trying to compile the exact same lambda form throughout the 30-minute period. Here is an example stack trace from one machine.
The LambdaForm that triggers the compilation on that machine is always LambdaForm$MH/170067652. Once it's compiled, it should use the newly compiled lambda form. We don't know why it's still trying to compile the same lambda form again and again. Could it be because the compiled lambda form somehow failed to load? This might relate to the negative number of loaded classes.

"20170926_232912_39740_3vuuu.1.79-4-76640" #76640 prio=5 os_prio=0 tid=0x00007f908006dbd0 nid=0x150a6 runnable [0x00007f8bddb1b000]
   java.lang.Thread.State: RUNNABLE
        at sun.misc.Unsafe.defineAnonymousClass(Native Method)
        at java.lang.invoke.InvokerBytecodeGenerator.loadAndInitializeInvokerClass(InvokerBytecodeGenerator.java:284)
        at java.lang.invoke.InvokerBytecodeGenerator.loadMethod(InvokerBytecodeGenerator.java:276)
        at java.lang.invoke.InvokerBytecodeGenerator.generateCustomizedCode(InvokerBytecodeGenerator.java:618)
        at java.lang.invoke.LambdaForm.compileToBytecode(LambdaForm.java:654)
        at java.lang.invoke.LambdaForm.prepare(LambdaForm.java:635)
        at java.lang.invoke.MethodHandle.updateForm(MethodHandle.java:1432)
        at java.lang.invoke.MethodHandle.customize(MethodHandle.java:1442)
        at java.lang.invoke.Invokers.maybeCustomize(Invokers.java:407)
        at java.lang.invoke.Invokers.checkCustomized(Invokers.java:398)
        at java.lang.invoke.LambdaForm$MH/170067652.invokeExact_MT(LambdaForm$MH)
        at com.facebook.presto.operator.aggregation.MinMaxHelper.combineStateWithState(MinMaxHelper.java:141)
        at com.facebook.presto.operator.aggregation.MaxAggregationFunction.combine(MaxAggregationFunction.java:108)
        at java.lang.invoke.LambdaForm$DMH/1607453282.invokeStatic_L3_V(LambdaForm$DMH)
        at java.lang.invoke.LambdaForm$BMH/1118134445.reinvoke(LambdaForm$BMH)
        at java.lang.invoke.LambdaForm$MH/1971758264.linkToTargetMethod(LambdaForm$MH)
        at com.facebook.presto.$gen.IntegerIntegerMaxGroupedAccumulator_3439.addIntermediate(Unknown Source)
        at com.facebook.presto.operator.aggregation.builder.InMemoryHashAggregationBuilder$Aggregator.processPage(InMemoryHashAggregationBuilder.java:367)
        at com.facebook.presto.operator.aggregation.builder.InMemoryHashAggregationBuilder.processPage(InMemoryHashAggregationBuilder.java:138)
        at com.facebook.presto.operator.HashAggregationOperator.addInput(HashAggregationOperator.java:400)
        at com.facebook.presto.operator.Driver.processInternal(Driver.java:343)
        at com.facebook.presto.operator.Driver.lambda$processFor$6(Driver.java:241)
        at com.facebook.presto.operator.Driver$$Lambda$765/442308692.get(Unknown Source)
        at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:614)
        at com.facebook.presto.operator.Driver.processFor(Driver.java:235)
        at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:622)
        at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
        at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:485)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
...

Both issues go away after we restart the JVM, and the same query then no longer triggers the LambdaForm compilation issue, so it looks like the JVM enters some weird state.

Are there any thoughts on what could trigger these issues? And are there any suggestions about how to investigate further the next time we see the VM in this state?

Thank you.
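
[Editorial illustration, not part of the original mail.] The relationship between the counters reported in this message can be checked with a few lines of arithmetic. The sketch below is C++ written only for this check; the function name `loaded_class_count` is made up, and it makes no claim about how the JVM computes the value internally. It plugs in the two PerfCounter values from the jcmd output.

```cpp
#include <cstdint>

// Hypothetical helper: currently loaded = total ever loaded - total unloaded.
// Signed 64-bit arithmetic, so the result can go negative if the unloaded
// counter ever exceeds the loaded counter.
static int64_t loaded_class_count(int64_t total_loaded, int64_t unloaded) {
  return total_loaded - unloaded;
}

// Values taken from the PerfCounter output in the message.
static const int64_t kLoadedClasses   = 192004392;  // java.cls.loadedClasses
static const int64_t kUnloadedClasses = 192700102;  // java.cls.unloadedClasses
```

With these inputs `loaded_class_count(kLoadedClasses, kUnloadedClasses)` evaluates to -695710, which matches the reported LoadedClassCount, so the negative value is consistent with the two raw counters rather than being a display artifact.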
--
Best Regards,
Wenlei Xie

Email: wenlei.xie at gmail.com

From kim.barrett at oracle.com Wed Sep 27 18:51:20 2017
From: kim.barrett at oracle.com (Kim Barrett)
Date: Wed, 27 Sep 2017 14:51:20 -0400
Subject: [RFR]: 8186578: Zero fails to build on linux-sparc due to sparc-specific code
In-Reply-To: <1655822b-bebb-44bb-a45e-56b115fe8490@physik.fu-berlin.de>
References: <1655822b-bebb-44bb-a45e-56b115fe8490@physik.fu-berlin.de>
Message-ID: <474BDE59-253E-4C34-8D40-626EADBF84FC@oracle.com>

> On Sep 27, 2017, at 12:40 PM, John Paul Adrian Glaubitz wrote:
>
> Hello!
>
> Zero currently fails to build on linux-sparc since there are two instances where
> SPARC-specific defines conflict with the Zero code.
>
> [...]
> The webrev to fix the Zero build on SPARC includes both changes and can be found
> in [1].
>
> Thanks,
> Adrian
>
>> [1] http://cr.openjdk.java.net/~glaubitz/8186578/webrev.02/
>
> --
>  .''`.  John Paul Adrian Glaubitz
> : :' :  Debian Developer - glaubitz at debian.org
> `. `'   Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de
>   `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

Looks good.

Please update the copyright in oopMap.cpp.

From magnus.ihse.bursie at oracle.com Wed Sep 27 19:47:39 2017
From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie)
Date: Wed, 27 Sep 2017 21:47:39 +0200
Subject: [RFR]: 8186578: Zero fails to build on linux-sparc due to sparc-specific code
In-Reply-To: <1655822b-bebb-44bb-a45e-56b115fe8490@physik.fu-berlin.de>
References: <1655822b-bebb-44bb-a45e-56b115fe8490@physik.fu-berlin.de>
Message-ID: <1c0bea33-bcc5-ce78-874b-e3d277de5dae@oracle.com>

Build change looks good.

/Magnus

On 2017-09-27 18:40, John Paul Adrian Glaubitz wrote:
> Hello!
>
> Zero currently fails to build on linux-sparc since there are two instances where
> SPARC-specific defines conflict with the Zero code.
>
> One instance is src/hotspot/share/compiler/oopMap.cpp where a header is unconditionally
> included for SPARC which is not available for Zero. It turns out that this header,
> vmreg_sparc.inline.hpp, does actually not need to be included here. So, easiest
> part is just to remove the inclusion of this header:
>
> @@ -38,13 +38,10 @@
>  #include "c1/c1_Defs.hpp"
>  #endif
>  #ifdef COMPILER2
>  #include "opto/optoreg.hpp"
>  #endif
> -#ifdef SPARC
> -#include "vmreg_sparc.inline.hpp"
> -#endif
>
> The second instance is src/hotspot/cpu/sparc/memset_with_concurrent_readers_sparc.cpp
> which is not included in the build for Zero. However, we always need to include it
> because of the particular implementation of memset() on SPARC which does not work
> well with concurrent readers which is why memset_with_concurrent_readers.hpp looks
> like this:
>
> #ifdef SPARC
>
> // SPARC requires special handling.  See SPARC-specific definition.
>
> #else
> // All others just use memset.
>
> inline void memset_with_concurrent_readers(void* to, int value, size_t size) {
>   ::memset(to, value, size);
> }
>
> #endif // End of target dispatch.
>
> The webrev to fix the Zero build on SPARC includes both changes and can be found
> in [1].
>
> Thanks,
> Adrian
>
>> [1] http://cr.openjdk.java.net/~glaubitz/8186578/webrev.02/
>

From coleen.phillimore at oracle.com Wed Sep 27 19:57:01 2017
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Wed, 27 Sep 2017 15:57:01 -0400
Subject: [RFR]: 8186578: Zero fails to build on linux-sparc due to sparc-specific code
In-Reply-To: <1c0bea33-bcc5-ce78-874b-e3d277de5dae@oracle.com>
References: <1655822b-bebb-44bb-a45e-56b115fe8490@physik.fu-berlin.de> <1c0bea33-bcc5-ce78-874b-e3d277de5dae@oracle.com>
Message-ID: <752166eb-4304-1f2d-da12-4dc140443601@oracle.com>

This looks good. I can sponsor this for you when you update the copyright and patch in the webrev, with the 3 reviewers.
Thanks,
Coleen

On 9/27/17 3:47 PM, Magnus Ihse Bursie wrote:
> Build change looks good.
>
> /Magnus
>
> On 2017-09-27 18:40, John Paul Adrian Glaubitz wrote:
>> Hello!
>>
>> Zero currently fails to build on linux-sparc since there are two instances where
>> SPARC-specific defines conflict with the Zero code.
>>
>> One instance is src/hotspot/share/compiler/oopMap.cpp where a header is unconditionally
>> included for SPARC which is not available for Zero. It turns out that this header,
>> vmreg_sparc.inline.hpp, does actually not need to be included here. So, easiest
>> part is just to remove the inclusion of this header:
>>
>> @@ -38,13 +38,10 @@
>>  #include "c1/c1_Defs.hpp"
>>  #endif
>>  #ifdef COMPILER2
>>  #include "opto/optoreg.hpp"
>>  #endif
>> -#ifdef SPARC
>> -#include "vmreg_sparc.inline.hpp"
>> -#endif
>>
>> The second instance is src/hotspot/cpu/sparc/memset_with_concurrent_readers_sparc.cpp
>> which is not included in the build for Zero. However, we always need to include it
>> because of the particular implementation of memset() on SPARC which does not work
>> well with concurrent readers which is why memset_with_concurrent_readers.hpp looks
>> like this:
>>
>> #ifdef SPARC
>>
>> // SPARC requires special handling.  See SPARC-specific definition.
>>
>> #else
>> // All others just use memset.
>>
>> inline void memset_with_concurrent_readers(void* to, int value, size_t size) {
>>   ::memset(to, value, size);
>> }
>>
>> #endif // End of target dispatch.
>>
>> The webrev to fix the Zero build on SPARC includes both changes and can be found
>> in [1].
>>
>> Thanks,
>> Adrian
>>
>>> [1] http://cr.openjdk.java.net/~glaubitz/8186578/webrev.02/
>>
>

From glaubitz at physik.fu-berlin.de Wed Sep 27 19:57:53 2017
From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz)
Date: Wed, 27 Sep 2017 21:57:53 +0200
Subject: [RFR]: 8186578: Zero fails to build on linux-sparc due to sparc-specific code
In-Reply-To: <474BDE59-253E-4C34-8D40-626EADBF84FC@oracle.com>
References: <1655822b-bebb-44bb-a45e-56b115fe8490@physik.fu-berlin.de> <474BDE59-253E-4C34-8D40-626EADBF84FC@oracle.com>
Message-ID: <32e954b6-b544-43e4-b565-01a787371a7e@physik.fu-berlin.de>

Hi Kim!

On 09/27/2017 08:51 PM, Kim Barrett wrote:
> Looks good.
>
> Please update the copyright in oopMap.cpp.

Done [1].

Adrian

> [1] http://cr.openjdk.java.net/~glaubitz/8186578/webrev.03/

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz at debian.org
`. `'   Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

From hohensee at amazon.com Wed Sep 27 19:59:29 2017
From: hohensee at amazon.com (Hohensee, Paul)
Date: Wed, 27 Sep 2017 19:59:29 +0000
Subject: 8146115 - Improve docker container detection and resource configuration usage
In-Reply-To:
References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com>
Message-ID:

Hadn't known about the existence of "get_" getter names. That means there's a mix currently, which is confusing, but a matter for another RFE or a change to the style guide. See

https://wiki.openjdk.java.net/display/HotSpot/StyleGuide

which says

    Getter accessor names are noun phrases, with no "get_" noise word. Boolean getters can also begin with "is_" or "has_".

Re GEN_CONTAINER_GET_INFO_STR, on Intel processors the compare-and-branch generated for scan_fmt == NULL is a single cycle and probably similarly cheap on other platforms. Not a lot of overhead in exchange for de-duplication.

Thanks,

Paul

On 9/27/17, 9:10 AM, "Bob Vandette" wrote:

Thanks for the review Paul.
I'll take care of all of your suggested changes where I haven't commented below.

> On Sep 25, 2017, at 5:34 PM, Hohensee, Paul wrote:
>
> Hi Bob,
>
> Looks functionally ok, but I'm not an expert on that, so most of this review is sw engineering comments.
>
> In os_linux.cpp:
>
> Is ActiveProcessorCount used to constrain the container? The only use I can see is in active_processor_count(), which just returns the value specified on the command line. Seems to be a nop otherwise.

Yes, it returns ActiveProcessorCount in the os::active_processor_count() function rather than trying to determine the number of active processors by other mechanisms. This is not a noop.

>
> In available_memory(), is the cast to julong in "avail_mem = (julong)(mem_limit - mem_usage);" needed? Maybe to get rid of a compiler warning? Otherwise it's strange.

Yes, compiler warning.

>
> In print_container_info(), "Running in a container: " is common to both if and else. You could replace the first call to st->print() with
> st->print("container: "); if (!OSContainer::is_containerized()) st->print("not "); st->print("running in a container.")
> I'd remove the "OSContainer::" prefix string, it's a bit verbose, plus you probably want an st->cr() at the end of the method.
> A jlong is a long on some platforms and a long long on others, so I'd replace "0L" with just "0", since "0" will get widened properly.
> Minor formatting nits: "else" goes on the same line as the "}" closing the "then"; if an "else" clause isn't a block, then it usually goes on the same line as the "else".
>
> In osContainer.hpp/cpp:
>
> If is_containerized() is frequently called, you might want to inline it. Also, init() is called only from create_java_vm(), so it's not multi-thread safe, which means that checking _is_initialized and calling init() if it's not set is confusing. I'd remove _is_initialized.
>
> Getters in Hotspot don't have "get_" prefixes (at least, they never used to!), so replace "get_container_type" with "container_type" and "pd_get_container_type" with "pd_container_type".

There are many examples of get_xxx in hotspot now.

>
> In osContainer_linux.hpp/cpp:
>
> In the CgroupSubsystem, you may want to use the os:: versions of strdup and free.
> MAXBUF is used as the maximum length of various file paths. Use MAXPATHLEN+1 instead?
> Change get_subsystem_path() to subsystem_path()?
>
> Is it really necessary to have a separate GEN_CONTAINER_GET_INFO_STR macro? You could just pass NULL into GEN_CONTAINER_GET_INFO and have it check for scan_fmt == NULL.

I was trying to avoid additional runtime testing since I already have a lot of required error checking to do.

>
> Minor formatting nits: "else" goes on the same line as the "}" closing the "then"; "{" s/b at the end of the line defining cpuset_cpus_to_count().
>
> The GET_CONTAINER_INFO macro should bracket the code with a block: {}.
>
> Manifest constant 9223372036854771712 should be "static const julong UNLIMITED_MEM = 0x7ffffffffffff000;" or "#define UNLIMITED_MEM 0x7ffffffffffff000".
>

Thanks,
Bob.

> Thanks,
>
> Paul
>
> On 9/22/17, 7:28 AM, "hotspot-dev on behalf of Bob Vandette" wrote:
>
> Please review these changes that improve on docker container detection and the
> automatic configuration of the number of active CPUs and total and free memory
> based on the containers resource limitation settings and metric data files.
>
> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/
>
> These changes are enabled with -XX:+UseContainerSupport.
>
> You can enable logging for this support via -Xlog:os+container=trace.
>
> Since the dynamic selection of CPUs based on cpusets, quotas and shares
> may not satisfy every user's needs, I've added an additional flag to allow the
> number of CPUs to be overridden. This flag is named -XX:ActiveProcessorCount=xx.
>
>
> Bob.
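
[Editorial illustration, not part of the original mail.] The manifest-constant suggestion quoted above can be checked directly: the decimal literal and the proposed hex constant are the same value, 2^63 - 4096. The sketch below is a standalone C++ approximation; `julong` is modeled as `uint64_t` (an assumption about HotSpot's typedef), and `is_unlimited` is a hypothetical helper, not a function from the webrev.

```cpp
#include <cstdint>

// Assumption: HotSpot's julong is an unsigned 64-bit integer.
typedef uint64_t julong;

// Named constant in place of the bare decimal literal, as the review suggests.
static const julong UNLIMITED_MEM = 0x7ffffffffffff000ULL;

// The hex form and the decimal literal from the review are the same number.
static_assert(0x7ffffffffffff000ULL == 9223372036854771712ULL,
              "hex and decimal spellings must agree");

// Hypothetical helper: treat values at or above the sentinel as "no limit".
static bool is_unlimited(julong mem_limit) {
  return mem_limit >= UNLIMITED_MEM;
}
```

A named constant also gives the comparison a single place to live, so a later change to how "unlimited" is detected touches one line rather than every call site.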
>
>
>
>
>

From john.r.rose at oracle.com Wed Sep 27 20:05:09 2017
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 27 Sep 2017 13:05:09 -0700
Subject: 8146115 - Improve docker container detection and resource configuration usage
In-Reply-To:
References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com>
Message-ID: <7B2F30D0-C8DF-43CA-BC65-3E91EC136294@oracle.com>

"There are counter examples to the prevailing style" does not lead to "therefore I can diverge from the prevailing style whenever I want." You need another reason, such as "I'm modifying an API where an alternative style is the local norm" or "these identifiers gain special value from their special style."

- John

On Sep 27, 2017, at 12:59 PM, Hohensee, Paul wrote:
>> Getters in Hotspot don't have "get_" prefixes (at least, they never used to!), so replace "get_container_type" with "container_type" and "pd_get_container_type" with "pd_container_type".
>
> There are many examples of get_xxx in hotspot now.

From glaubitz at physik.fu-berlin.de Wed Sep 27 20:10:30 2017
From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz)
Date: Wed, 27 Sep 2017 22:10:30 +0200
Subject: [RFR]: 8186578: Zero fails to build on linux-sparc due to sparc-specific code
In-Reply-To: <752166eb-4304-1f2d-da12-4dc140443601@oracle.com>
References: <1655822b-bebb-44bb-a45e-56b115fe8490@physik.fu-berlin.de> <1c0bea33-bcc5-ce78-874b-e3d277de5dae@oracle.com> <752166eb-4304-1f2d-da12-4dc140443601@oracle.com>
Message-ID:

On 09/27/2017 09:57 PM, coleen.phillimore at oracle.com wrote:
> This looks good. I can sponsor this for you when you update the copyright and patch in the webrev, with the 3 reviewers.

Just a second, please. $(HOTSPOT_TOP_DIR) seems to be wrong.

Please don't merge yet. I need to perform another test run.

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz at debian.org
`. `'   Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

From glaubitz at physik.fu-berlin.de Wed Sep 27 20:11:53 2017
From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz)
Date: Wed, 27 Sep 2017 22:11:53 +0200
Subject: [RFR]: 8186578: Zero fails to build on linux-sparc due to sparc-specific code
In-Reply-To: <32e954b6-b544-43e4-b565-01a787371a7e@physik.fu-berlin.de>
References: <1655822b-bebb-44bb-a45e-56b115fe8490@physik.fu-berlin.de> <474BDE59-253E-4C34-8D40-626EADBF84FC@oracle.com> <32e954b6-b544-43e4-b565-01a787371a7e@physik.fu-berlin.de>
Message-ID:

On 09/27/2017 09:57 PM, John Paul Adrian Glaubitz wrote:
> On 09/27/2017 08:51 PM, Kim Barrett wrote:
>> Looks good.
>>
>> Please update the copyright in oopMap.cpp.
>
> Done [1].

Just a second, please. I just ran into build errors.

Please don't merge as is.

Adrian

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz at debian.org
`. `'   Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

From bob.vandette at oracle.com Wed Sep 27 20:22:20 2017
From: bob.vandette at oracle.com (Bob Vandette)
Date: Wed, 27 Sep 2017 16:22:20 -0400
Subject: 8146115 - Improve docker container detection and resource configuration usage
In-Reply-To: <7B2F30D0-C8DF-43CA-BC65-3E91EC136294@oracle.com>
References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7B2F30D0-C8DF-43CA-BC65-3E91EC136294@oracle.com>
Message-ID: <858C4BE6-3469-4B4F-9470-8770A6DF0DB8@oracle.com>

I don't have a problem with following the approved style. Whenever I'm adding new code in a large body of code I usually look around and try to adapt to the norm assuming that others that came before me were properly reviewed for style conformance. I assumed that the style guide had been updated but didn't check.

I'll just point out that there are many many offenders that have been in the source tree a looong time.
Since I am more or less extending the os class, I followed os.hpp's lead.

os.hpp:  static const char* get_temp_directory();
os.hpp:  static const char* get_current_directory(char *buf, size_t buflen);
os.hpp:  static int get_loaded_modules_info(LoadedModulesCallbackFunc callback, void *param);
os.hpp:  static void* get_default_process_handle();
os.hpp:  static bool get_host_name(char* buf, size_t buflen);
os.hpp:  static int get_last_error();
os.hpp:  static frame get_sender_for_C_frame(frame *fr);
os.hpp:  static int get_signal_number(const char* signal_name);
os.hpp:  static int get_native_stack(address* stack, int size, int toSkip = 0);

Here are a few more examples:

thread.hpp:  const char* get_thread_name() const;
thread.hpp:  const char* get_thread_name_string(char* buf = NULL, int buflen = 0) const;
thread.hpp:  const char* get_threadgroup_name() const;
thread.hpp:  const char* get_parent_name() const;
sharedRuntime.hpp:  static address get_handle_wrong_method_stub() {
sharedRuntime.hpp:  static address get_handle_wrong_method_abstract_stub() {
sharedRuntime.hpp:  static address get_resolve_opt_virtual_call_stub() {
sharedRuntime.hpp:  static address get_resolve_virtual_call_stub() {
threadService.hpp:  static jlong get_total_thread_count() { return _total_threads_count->get_value(); }
threadService.hpp:  static jlong get_peak_thread_count() { return _peak_threads_count->get_value(); }
threadService.hpp:  static jlong get_live_thread_count() { return _live_threads_count->get_value() - _exiting_threads_count; }
threadService.hpp:  static jlong get_daemon_thread_count() { return _daemon_threads_count->get_value() - _exiting_daemon_threads_count; }
memoryService.hpp:  static MemoryPool* get_memory_pool(instanceHandle pool);
memoryService.hpp:  static MemoryManager* get_memory_manager(instanceHandle mgr);
memoryService.hpp:  static MemoryPool* get_memory_pool(int index) {
memoryService.hpp:  static MemoryManager* get_memory_manager(int index) {
memoryService.hpp:  static bool get_verbose() { return log_is_enabled(Info, gc); }
memoryService.hpp:  static const GCMemoryManager* get_minor_gc_manager() {
memoryService.hpp:  static const GCMemoryManager* get_major_gc_manager() {
memTracker.cpp:  return MallocTracker::get_base(memblock);
memTracker.hpp:  static inline Tracker get_virtual_memory_uncommit_tracker() { return Tracker(); }
memTracker.hpp:  static inline Tracker get_virtual_memory_release_tracker() { return Tracker(); }
memTracker.hpp:  return MallocTracker::get_header_size(memblock);
memTracker.hpp:  static inline Tracker get_virtual_memory_uncommit_tracker() {
memTracker.hpp:  static inline Tracker get_virtual_memory_release_tracker() {
memTracker.hpp:  static inline MemBaseline& get_baseline() {

I'll update my changes,
Bob.

> On Sep 27, 2017, at 4:05 PM, John Rose wrote:
>
> "There are counter examples to the prevailing style" does not lead to "therefore I can diverge from the prevailing style whenever I want." You need another reason, such as "I'm modifying an API where an alternative style is the local norm" or "these identifiers gain special value from their special style."
>
> - John
>
> On Sep 27, 2017, at 12:59 PM, Hohensee, Paul wrote:
>
>>> Getters in Hotspot don't have "get_" prefixes (at least, they never used to!), so replace "get_container_type" with "container_type" and "pd_get_container_type" with "pd_container_type".
>>
>> There are many examples of get_xxx in hotspot now.
>

From coleen.phillimore at oracle.com Wed Sep 27 20:23:16 2017
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Wed, 27 Sep 2017 16:23:16 -0400
Subject: [RFR]: 8186578: Zero fails to build on linux-sparc due to sparc-specific code
In-Reply-To:
References: <1655822b-bebb-44bb-a45e-56b115fe8490@physik.fu-berlin.de> <474BDE59-253E-4C34-8D40-626EADBF84FC@oracle.com> <32e954b6-b544-43e4-b565-01a787371a7e@physik.fu-berlin.de>
Message-ID: <8b8c9ea0-22fe-fbb8-c560-0da42c0dc67f@oracle.com>

ok.
On 9/27/17 4:11 PM, John Paul Adrian Glaubitz wrote:
> On 09/27/2017 09:57 PM, John Paul Adrian Glaubitz wrote:
>> On 09/27/2017 08:51 PM, Kim Barrett wrote:
>>> Looks good.
>>>
>>> Please update the copyright in oopMap.cpp.
>>
>> Done [1].
>
> Just a second, please. I just ran into build errors.
>
> Please don't merge as is.
>
> Adrian
>

From magnus.ihse.bursie at oracle.com Wed Sep 27 20:30:55 2017
From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie)
Date: Wed, 27 Sep 2017 22:30:55 +0200
Subject: [RFR]: 8186578: Zero fails to build on linux-sparc due to sparc-specific code
In-Reply-To:
References: <1655822b-bebb-44bb-a45e-56b115fe8490@physik.fu-berlin.de> <1c0bea33-bcc5-ce78-874b-e3d277de5dae@oracle.com> <752166eb-4304-1f2d-da12-4dc140443601@oracle.com>
Message-ID: <69861ddf-e2e1-0fe6-7b61-20fd5171d5cb@oracle.com>

On 2017-09-27 22:10, John Paul Adrian Glaubitz wrote:
> On 09/27/2017 09:57 PM, coleen.phillimore at oracle.com wrote:
>> This looks good. I can sponsor this for you when you update the copyright
>> and patch in the webrev, with the 3 reviewers.
> Just a second, please. $(HOTSPOT_TOP_DIR) seems to be wrong.
>
> Please don't merge yet. I need to perform another test run.

You are correct, $(HOTSPOT_TOPDIR) does not exist anymore (since it does not make sense in the new source layout).

Replace it with $(TOPDIR)/src/hotspot.

I am sorry I missed it at the review. :(

/Magnus

From glaubitz at physik.fu-berlin.de Wed Sep 27 20:32:35 2017
From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz)
Date: Wed, 27 Sep 2017 22:32:35 +0200
Subject: [RFR]: 8186578: Zero fails to build on linux-sparc due to sparc-specific code
In-Reply-To: <69861ddf-e2e1-0fe6-7b61-20fd5171d5cb@oracle.com>
References: <1655822b-bebb-44bb-a45e-56b115fe8490@physik.fu-berlin.de> <1c0bea33-bcc5-ce78-874b-e3d277de5dae@oracle.com> <752166eb-4304-1f2d-da12-4dc140443601@oracle.com> <69861ddf-e2e1-0fe6-7b61-20fd5171d5cb@oracle.com>
Message-ID: <007bcfa5-6161-fc03-761d-2bacd5fdaeb3@physik.fu-berlin.de>

On 09/27/2017 10:30 PM, Magnus Ihse Bursie wrote:
> You are correct, $(HOTSPOT_TOPDIR) does not exist anymore (since it does not make sense in the new source layout).
>
> Replace it with $(TOPDIR)/src/hotspot.

Ah, that's what I was looking for :). Thank you!

> I am sorry I missed it at the review. :(

No worries. I should have tested more thoroughly before sending
in the updated RFR :).

Will report back in a second.

Adrian

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz at debian.org
`. `'   Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

From glaubitz at physik.fu-berlin.de Wed Sep 27 21:08:47 2017
From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz)
Date: Wed, 27 Sep 2017 23:08:47 +0200
Subject: [RFR]: 8186578: Zero fails to build on linux-sparc due to sparc-specific code
In-Reply-To: <007bcfa5-6161-fc03-761d-2bacd5fdaeb3@physik.fu-berlin.de>
References: <1655822b-bebb-44bb-a45e-56b115fe8490@physik.fu-berlin.de> <1c0bea33-bcc5-ce78-874b-e3d277de5dae@oracle.com> <752166eb-4304-1f2d-da12-4dc140443601@oracle.com> <69861ddf-e2e1-0fe6-7b61-20fd5171d5cb@oracle.com> <007bcfa5-6161-fc03-761d-2bacd5fdaeb3@physik.fu-berlin.de>
Message-ID:

On 09/27/2017 10:32 PM, John Paul Adrian Glaubitz wrote:
>> I am sorry I missed it at the review. :(
> No worries.
I should have tested more thoroughly before sending > in the updated RFR :). > > Will report back in a second. Ok, fixed and verified. Please pull from [1]. Adrian > [1] http://cr.openjdk.java.net/~glaubitz/8186578/webrev.04/ -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaubitz at debian.org `. `' Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 From david.holmes at oracle.com Wed Sep 27 21:50:09 2017 From: david.holmes at oracle.com (David Holmes) Date: Thu, 28 Sep 2017 07:50:09 +1000 Subject: [RFR]: 8186578: Zero fails to build on linux-sparc due to sparc-specific code In-Reply-To: References: <1655822b-bebb-44bb-a45e-56b115fe8490@physik.fu-berlin.de> <1c0bea33-bcc5-ce78-874b-e3d277de5dae@oracle.com> <752166eb-4304-1f2d-da12-4dc140443601@oracle.com> <69861ddf-e2e1-0fe6-7b61-20fd5171d5cb@oracle.com> <007bcfa5-6161-fc03-761d-2bacd5fdaeb3@physik.fu-berlin.de> Message-ID: Adrian, Can you please create the final changeset including the "Reviewed-by: ..." line. That way your sponsor can hg import direct from the webrev link to the patch. Thanks, David PS. Looks good to me too. On 28/09/2017 7:08 AM, John Paul Adrian Glaubitz wrote: > On 09/27/2017 10:32 PM, John Paul Adrian Glaubitz wrote: >>> I am sorry I missed it at the review. :( >> No worries. I should have tested more thoroughly before sending >> in the updated RFR :). >> >> Will report back in a second. > > Ok, fixed and verified. Please pull from [1]. 
>
> Adrian
>
>> [1] http://cr.openjdk.java.net/~glaubitz/8186578/webrev.04/
>

From glaubitz at physik.fu-berlin.de Wed Sep 27 21:54:18 2017
From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz)
Date: Wed, 27 Sep 2017 23:54:18 +0200
Subject: [RFR]: 8186578: Zero fails to build on linux-sparc due to sparc-specific code
In-Reply-To:
References: <1655822b-bebb-44bb-a45e-56b115fe8490@physik.fu-berlin.de> <1c0bea33-bcc5-ce78-874b-e3d277de5dae@oracle.com> <752166eb-4304-1f2d-da12-4dc140443601@oracle.com> <69861ddf-e2e1-0fe6-7b61-20fd5171d5cb@oracle.com> <007bcfa5-6161-fc03-761d-2bacd5fdaeb3@physik.fu-berlin.de>
Message-ID: <4ff8d940-3b73-a9d0-285d-628913b6b3c2@physik.fu-berlin.de>

Hi David!

On 09/27/2017 11:50 PM, David Holmes wrote:
> Can you please create the final changeset including the "Reviewed-by: ..." line.
> That way your sponsor can hg import direct from the webrev link to the patch.

I'm not sure how to do that. Does webrev/hg have an option to add this tag?

Adrian

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz at debian.org
`. `'   Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

From hohensee at amazon.com Wed Sep 27 21:54:22 2017
From: hohensee at amazon.com (Hohensee, Paul)
Date: Wed, 27 Sep 2017 21:54:22 +0000
Subject: 8146115 - Improve docker container detection and resource configuration usage
In-Reply-To: <858C4BE6-3469-4B4F-9470-8770A6DF0DB8@oracle.com>
References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7B2F30D0-C8DF-43CA-BC65-3E91EC136294@oracle.com> <858C4BE6-3469-4B4F-9470-8770A6DF0DB8@oracle.com>
Message-ID:

I've filed https://bugs.openjdk.java.net/browse/JDK-8188068 to fix the getter method name problem and assigned it to myself.

Thanks,

Paul

On 9/27/17, 1:22 PM, "Bob Vandette" wrote:

I don't have a problem with following the approved style.
Whenever I'm adding new code in a large body of code I usually look around and try to adapt to the norm assuming that others that came before me were properly reviewed for style conformance. I assumed that the style guide had been updated but didn't check.

I'll just point out that there are many many offenders that have been in the source tree a looong time. Since I am more or less extending the os class, I followed os.hpp's lead.

os.hpp: static const char* get_temp_directory();
os.hpp: static const char* get_current_directory(char *buf, size_t buflen);
os.hpp: static int get_loaded_modules_info(LoadedModulesCallbackFunc callback, void *param);
os.hpp: static void* get_default_process_handle();
os.hpp: static bool get_host_name(char* buf, size_t buflen);
os.hpp: static int get_last_error();
os.hpp: static frame get_sender_for_C_frame(frame *fr);
os.hpp: static int get_signal_number(const char* signal_name);
os.hpp: static int get_native_stack(address* stack, int size, int toSkip = 0);

Here are a few more examples:

thread.hpp: const char* get_thread_name() const;
thread.hpp: const char* get_thread_name_string(char* buf = NULL, int buflen = 0) const;
thread.hpp: const char* get_threadgroup_name() const;
thread.hpp: const char* get_parent_name() const;
sharedRuntime.hpp: static address get_handle_wrong_method_stub() {
sharedRuntime.hpp: static address get_handle_wrong_method_abstract_stub() {
sharedRuntime.hpp: static address get_resolve_opt_virtual_call_stub() {
sharedRuntime.hpp: static address get_resolve_virtual_call_stub() {
threadService.hpp: static jlong get_total_thread_count() { return _total_threads_count->get_value(); }
threadService.hpp: static jlong get_peak_thread_count() { return _peak_threads_count->get_value(); }
threadService.hpp: static jlong get_live_thread_count() { return _live_threads_count->get_value() - _exiting_threads_count; }
threadService.hpp: static jlong get_daemon_thread_count() { return _daemon_threads_count->get_value() - _exiting_daemon_threads_count; }
memoryService.hpp: static MemoryPool* get_memory_pool(instanceHandle pool);
memoryService.hpp: static MemoryManager* get_memory_manager(instanceHandle mgr);
memoryService.hpp: static MemoryPool* get_memory_pool(int index) {
memoryService.hpp: static MemoryManager* get_memory_manager(int index) {
memoryService.hpp: static bool get_verbose() { return log_is_enabled(Info, gc); }
memoryService.hpp: static const GCMemoryManager* get_minor_gc_manager() {
memoryService.hpp: static const GCMemoryManager* get_major_gc_manager() {
memTracker.cpp: return MallocTracker::get_base(memblock);
memTracker.hpp: static inline Tracker get_virtual_memory_uncommit_tracker() { return Tracker(); }
memTracker.hpp: static inline Tracker get_virtual_memory_release_tracker() { return Tracker(); }
memTracker.hpp: return MallocTracker::get_header_size(memblock);
memTracker.hpp: static inline Tracker get_virtual_memory_uncommit_tracker() {
memTracker.hpp: static inline Tracker get_virtual_memory_release_tracker() {
memTracker.hpp: static inline MemBaseline& get_baseline() {

I'll update my changes,
Bob.

> On Sep 27, 2017, at 4:05 PM, John Rose wrote:
>
> "There are counter examples to the prevailing style" does not lead to "therefore I can diverge from the prevailing style whenever I want." You need another reason, such as "I'm modifying an API where an alternative style is the local norm" or "these identifiers gain special value from their special style."
>
> -- John
>
> On Sep 27, 2017, at 12:59 PM, Hohensee, Paul wrote:
>
>>> Getters in Hotspot don't have "get_" prefixes (at least, they never used to!), so replace "get_container_type" with "container_type" and "pd_get_container_type" with "pd_container_type".
>>
>> There are many examples of get_xxx in hotspot now.
>

From john.r.rose at oracle.com Wed Sep 27 23:35:19 2017
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 27 Sep 2017 16:35:19 -0700
Subject: 8146115 - Improve docker container detection and resource configuration usage
In-Reply-To: <858C4BE6-3469-4B4F-9470-8770A6DF0DB8@oracle.com>
References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7B2F30D0-C8DF-43CA-BC65-3E91EC136294@oracle.com> <858C4BE6-3469-4B4F-9470-8770A6DF0DB8@oracle.com>
Message-ID: <3BF572D0-1168-4C0E-B202-2A535C361CFB@oracle.com>

On Sep 27, 2017, at 1:22 PM, Bob Vandette wrote:

> I don't have a problem with following the approved style.

Of course! For the record, I certainly didn't intend to imply any such problems. My previous note was perhaps short enough to carry an abrupt tone. I think I wrote it on my phone. Sorry. :-)

I'm going to take this opportunity to expand on my reasoning, for the record. I'll also address your counterexamples.

So, I wrote the style guide for Hotspot to help preserve some pleasing (at least, useful) regularities in the style of the code base. The guide is successful (at least, partly) if it gives a clear nudge towards a reasonable norm we can all share.

Under the assumption that judgment and taste are important in our creative activity of software engineering, the style guide states clearly that there can always be reasons to depart from the rules. That makes it relatively weak as a process component, and that is intentional. It is guidance not mandate, and as such dependent on everyone's judgment and taste.

> Since I am more or less extending the os class, I followed os.hpp's lead.

This argument ("I'm modifying an API where an alternative style is the local norm.") is warmly supported by the style guide. See:

https://wiki.openjdk.java.net/display/HotSpot/StyleGuide#StyleGuide-Counterexamples

Some details: os.hpp has 900 LOC and a crude grep shows about 40 nullary non-boolean access methods plus a non-grepped number of non-nullary access methods.

$ grep ' static .*( *)' os.hpp | grep -v 'static *void *[^*]' | grep -v 'static *bool'

Of these nine have "get_" pattern in their names. So it's a mixed bag. Choosing how to add new names is not an exact science, but I'd say there is a "tilt" here towards avoiding "get_", but not a decisive one.

Is there a local convention or just blind bit-evolution? I think it's probably just evolution. But I won't be upset if someone points out a reason for some of those "get_" names.

Also, Paul's bug for un-mixing some of the various bags is good as a low-frequency cleanup activity. When he calls for review, folks can take a look and give reasons why local conventions are important and should not be "cleaned up". If we want to agree to leave things as they are, for some durable reason, we can capture an explanatory comment and move on. We might even decide, "please don't ever change this bit because it will cause a maintenance burden", in which case that too should be recorded clearly.

I hope this helps.

-- John

From david.holmes at oracle.com Thu Sep 28 00:44:35 2017
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 28 Sep 2017 10:44:35 +1000
Subject: [RFR]: 8186578: Zero fails to build on linux-sparc due to sparc-specific code
In-Reply-To: <4ff8d940-3b73-a9d0-285d-628913b6b3c2@physik.fu-berlin.de>
References: <1655822b-bebb-44bb-a45e-56b115fe8490@physik.fu-berlin.de> <1c0bea33-bcc5-ce78-874b-e3d277de5dae@oracle.com> <752166eb-4304-1f2d-da12-4dc140443601@oracle.com> <69861ddf-e2e1-0fe6-7b61-20fd5171d5cb@oracle.com> <007bcfa5-6161-fc03-761d-2bacd5fdaeb3@physik.fu-berlin.de> <4ff8d940-3b73-a9d0-285d-628913b6b3c2@physik.fu-berlin.de>
Message-ID: <3e526726-d9b3-0e59-9040-6fbe1a830a25@oracle.com>

On 28/09/2017 7:54 AM, John Paul Adrian Glaubitz wrote:
> Hi David!
>
> On 09/27/2017 11:50 PM, David Holmes wrote:
>> Can you please create the final changeset including the "Reviewed-by:
>> ..." line.
>> That way your sponsor can hg import direct from the webrev link to the
>> patch.
>
> I'm not sure how I do that?

It is part of the commit message when you do "hg commit".

http://openjdk.java.net/guide/producingChangeset.html

Thanks,
David
-----

> Does webrev/hg have an option to add this tag?
>
> Adrian
>

From david.holmes at oracle.com Thu Sep 28 01:20:58 2017
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 28 Sep 2017 11:20:58 +1000
Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage
In-Reply-To: <200F07CB-35DA-492B-B78D-9EC033EE0431@oracle.com>
References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <2d9dd746-63e1-cade-28f9-5ca1ae1c253e@oracle.com> <200F07CB-35DA-492B-B78D-9EC033EE0431@oracle.com>
Message-ID: <833ba1a5-49fc-bb24-ff99-994011af52aa@oracle.com>

Hi Bob,

On 28/09/2017 1:45 AM, Bob Vandette wrote:
> David, Thank you for taking the time and providing a detailed review of
> these changes.
>
> Where I haven't responded, I'll update the implementation based on your
> comments.

Okay. I've trimmed below to only leave things I have follow up on.

>> If this is all confined to Linux only then this should be a linux-only
>> flag and all the changes should be confined to linux code. No shared
>> osContainer API is needed as it can be defined as a nested class of
>> os::Linux, and will only be called from os_linux.cpp.
>
> I received feedback on my other Container work where I was asked to
> make sure it was possible to support other container technologies.
> The addition of the shared osContainer API is to prepare for this and
> recognize that this will eventually be supported other platforms.

The problem is that the proposed osContainer API is totally cgroup centric. That API might not make sense for a different container technology. Even if Docker is used on different platforms, does it use cgroups on those other platforms?
Until we have at least two examples we want to support we don't know how to formulate a generic API. So in my opinion we should initially keep this Linux specific as a proof-of-concept for future more general container support.

>>> Since the dynamic selection of CPUs based on cpusets, quotas and shares
>>> may not satisfy every users needs, I've added an additional flag to
>>> allow the
>>> number of CPUs to be overridden. This flag is named
>>> -XX:ActiveProcessorCount=xx.
>>
>> I would suggest that ActiveProcessorCount be constrained to being >1 -
>> this is in line with our plans to get rid of AssumeMP/os::is_MP and
>> always build in MP support. Otherwise a count of 1 today won't behave
>> the same as a count of 1 in the future.
> What if I return true for is_MP anytime ActiveProcessorCount is set.
> I'd like to provide the ability of specifying a single processor.

If I make the AssumeMP change for 18.3 as planned then this won't be an issue. I'd better get onto that :)

>>
>> Also you have defined this globally but only accounted for it on
>> Linux. I think it makes sense to support this flag on all platforms (a
>> generalization of AssumeMP). Otherwise it needs to be defined as a
>> Linux-only flag in the pd_globals.hpp file
> Good idea.

You could even factor this out as a separate issue/task independent of the container work.

>> Style issue:
>>
>> 2121     if (i < 0) st->print("OSContainer::active_processor_count()
>> failed");
>> 2122     else
>>
>> and elsewhere. Please move the st->print to its own line. Others may
>> argue for always using blocks ({}) in if/else.
>
> There doesn't seem to be consistency on this issue.

No there's no consistency :( And this isn't in the hotspot style guide AFAICS.
But I'm sure it's in some other coding guidelines ;-)

>> 5024   // User has overridden the number of active processors
>> 5025   if (!FLAG_IS_DEFAULT(ActiveProcessorCount)) {
>> 5026     log_trace(os)("active_processor_count: "
>> 5027                   "active processor count set by user : %d",
>> 5028                   (int)ActiveProcessorCount);
>> 5029     return ActiveProcessorCount;
>> 5030   }
>>
>> We don't normally check flags in runtime code like this - this will be
>> executed on every call, and you will see that logging each time. This
>> should be handled during initialization (os::Posix::init()? - if
>> applying this flag globally) - with logging occurring once. The above
>> should just reduce to:
>>
>> if (ActiveProcessorCount > 0) {
>>  return ActiveProcessorCount; // explicit user control of number of cpus
>> }
>>
>> Even then I do get concerned about having to always check for the
>> least common cases before the most common one. :(
>
> This is not in a highly used function so it should be ok.

I really don't like seeing the FLAG_IS_DEFAULT in there - and you need to move the logging anyway.

>>
>> The osContainer_.hpp files seem to be unnecessary as they are all
>> empty.
>
> I'll remove them. I wasn't sure if there was a convention to move more
> of osContainer_linux.cpp -> osContainer_linux.hpp.
>
> For example: class CgroupSubsystem

The header is only needed to expose an API for other code to use. Locally defined classes can be kept in the .cpp file.

>> 34 class CgroupSubsystem: CHeapObj {
>>
>> You defined this class as CHeapObj and added a destructor to free a
>> few things, but I can't see where the instances of this class will
>> themselves ever be freed
>
> What's the latest thinking on freeing CHeap Objects on termination? Is
> it really worth wasting cpu cycles when our
> process is about to terminate? If not, I'll just remove the destructors.

Philosophically I prefer new APIs to play nice with the invocation API, even if existing API's don't play nice. But that's just me.

>>
>> 62     void set_subsystem_path(char *cgroup_path) {
>>
>> If this takes a "const char*" will it save you from casting string
>> literals to "char*" elsewhere?
>
> I tried several different ways of declaring the container accessor
> functions and
> always ended up with warnings due to scanf not being able to validate
> arguments
> since the format string didn't end up being a string literal. I
> originally was using templates
> and then ended up with the macros. I tried several different casts but
> could resolve the problem.

Sounds like something Kim Barrett should take a look at :)

Thanks,
David

From david.holmes at oracle.com Thu Sep 28 06:01:43 2017
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 28 Sep 2017 16:01:43 +1000
Subject: RFR: 8185062: Set AssumeMP to true and deprecate the flag
Message-ID: <4930a917-04d5-c10c-55c1-c012a5d26dd3@oracle.com>

Bug: https://bugs.openjdk.java.net/browse/JDK-8185062
Webrev: http://cr.openjdk.java.net/~dholmes/8185062/webrev/

Following on from the discussion here:

http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-August/027720.html

and given the new 6 monthly release cadence, it has been decided to switch AssumeMP to true and deprecate it in 18.3, with a view to obsoleting it and removing all non-MP related code in 18.9.

The SPARC specific setting of AssumeMP is no longer needed, and os::is_MP is micro-optimized by checking AssumeMP first.

CSR request: https://bugs.openjdk.java.net/browse/JDK-8188079

Can I please get a reviewer for the CSR request (edit it and add your OpenJDK user name to the "Reviewed by" box).

I will push once the CSR request is approved.

Thanks,
David

From shade at redhat.com Thu Sep 28 06:11:34 2017
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 28 Sep 2017 08:11:34 +0200
Subject: RFR: 8185062: Set AssumeMP to true and deprecate the flag
In-Reply-To: <4930a917-04d5-c10c-55c1-c012a5d26dd3@oracle.com>
References: <4930a917-04d5-c10c-55c1-c012a5d26dd3@oracle.com>
Message-ID: <83e9a2d9-d0ab-5fcb-4d57-73016a41a4af@redhat.com>

On 09/28/2017 08:01 AM, David Holmes wrote:
> Bug: https://bugs.openjdk.java.net/browse/JDK-8185062
> Webrev: http://cr.openjdk.java.net/~dholmes/8185062/webrev/

Looks good.

*) This change is micro-optimization, right? Double-space before AssumeMP:

216 return AssumeMP || (_processor_count != 1);

> CSR request: https://bugs.openjdk.java.net/browse/JDK-8188079
>
> Can I please get a reviewer for the CSR request (edit it and add your OpenJDK user name to the
> "Reviewed by" box).

Who is allowed to do this?

-Aleksey

From david.holmes at oracle.com Thu Sep 28 06:19:19 2017
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 28 Sep 2017 16:19:19 +1000
Subject: RFR: 8185062: Set AssumeMP to true and deprecate the flag
In-Reply-To: <83e9a2d9-d0ab-5fcb-4d57-73016a41a4af@redhat.com>
References: <4930a917-04d5-c10c-55c1-c012a5d26dd3@oracle.com> <83e9a2d9-d0ab-5fcb-4d57-73016a41a4af@redhat.com>
Message-ID: <3e3568b2-6ad6-a780-b530-1efc8708fb5f@oracle.com>

Hi Aleksey,

On 28/09/2017 4:11 PM, Aleksey Shipilev wrote:
> On 09/28/2017 08:01 AM, David Holmes wrote:
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8185062
>> Webrev: http://cr.openjdk.java.net/~dholmes/8185062/webrev/
>
> Looks good.

Thanks for taking a look so quickly! :)

> *) This change is micro-optimization, right? Double-space before AssumeMP:
>
> 216 return AssumeMP || (_processor_count != 1);

:) Congratulations you passed the reviewer test. Fixed.
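
The one-line check quoted above can be sketched in isolation as follows. This is only a minimal illustration, assuming a plain bool in place of the AssumeMP VM flag and a plain int in place of the cached processor count; FakeOs and processor_count are hypothetical stand-ins and not the actual os.hpp declarations.

```cpp
#include <cassert>

// Hypothetical stand-in for the AssumeMP VM flag (true by default after 8185062).
static bool AssumeMP = true;

// Hypothetical stand-in for the class that caches the CPU count.
struct FakeOs {
  static int processor_count;

  // With AssumeMP tested first, || short-circuits: the count is only
  // consulted when the flag has been switched off.
  static bool is_MP() {
    return AssumeMP || (processor_count != 1);
  }
};

int FakeOs::processor_count = 1;
```

With the flag on, is_MP() is true even on a single-CPU machine; with the flag off, the answer falls back to the processor count.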

>> CSR request: https://bugs.openjdk.java.net/browse/JDK-8188079
>>
>> Can I please get a reviewer for the CSR request (edit it and add your OpenJDK user name to the
>> "Reviewed by" box).
>
> Who is allowed to do this?

https://wiki.openjdk.java.net/display/csr/CSR+FAQs

Q: Who should be a reviewer on a CSR proposal?
A: One or more engineers with expertise in the areas impacted by the proposed change should review the CSR request and be listed as a reviewer before the proposal is reviewed by the CSR membership. (These engineers may or may not be Reviewers on the corresponding JDK project.) It is appropriate to ask a CSR member to review a request in a area where he or she has expertise, but it is not necessary for a CSR member to review a request before the CSR body considers it. To encourage wider reviews, it is preferable if the CSR chair is not the only reviewer of a CSR request. The CSR may request a proposal be reviewed by additional engineers before further considering the request.

---

And of course you must have an OpenJDK username.

Thanks,
David
-----

> -Aleksey
>

From erik.osterlund at oracle.com Thu Sep 28 11:56:22 2017
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Thu, 28 Sep 2017 13:56:22 +0200
Subject: RFR (M): 8187977: Generalize Atomic::xchg to use templates
Message-ID: <1506599782.27149.154.camel@oracle.com>

Hi all,

The time has come to generalize more atomics.

I have modelled Atomic::xchg to systematically do what Atomic::cmpxchg did but for xchg.
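
As a rough sketch of the layered shape being described, assuming it mirrors the Atomic::cmpxchg layering: a front-end function template forwards to a function object selected by operand size, with the call signature T operator()(T, T volatile*) const. The names PlatformXchgSketch and xchg_sketch are hypothetical, and the GCC/Clang builtin __atomic_exchange_n stands in for the per-platform code; the actual webrev may differ in every detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Platform layer (hypothetical): one function object per operand size.
// A real port would specialize this per size with platform-specific code.
template<std::size_t byte_size>
struct PlatformXchgSketch {
  template<typename T>
  T operator()(T exchange_value, T volatile* dest) const {
    static_assert(sizeof(T) == byte_size, "size mismatch");
    // Assumption: compiler builtin used in place of hand-written assembly.
    return __atomic_exchange_n(dest, exchange_value, __ATOMIC_SEQ_CST);
  }
};

// Front end (hypothetical): dispatches on sizeof(T) and returns the old value.
template<typename T>
inline T xchg_sketch(T exchange_value, T volatile* dest) {
  return PlatformXchgSketch<sizeof(T)>()(exchange_value, dest);
}
```

A caller passes the new value and a volatile destination of the same type, and gets the previous contents back.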

Bug:
https://bugs.openjdk.java.net/browse/JDK-8187977

Webrev:
http://cr.openjdk.java.net/~eosterlund/8187977/webrev.00/

Testing: mach5 hs-tier3 and JPRT

Thanks,
/Erik

From vladimir.kozlov at oracle.com Thu Sep 28 15:31:33 2017
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 28 Sep 2017 08:31:33 -0700
Subject: RFR: 8185062: Set AssumeMP to true and deprecate the flag
In-Reply-To: <4930a917-04d5-c10c-55c1-c012a5d26dd3@oracle.com>
References: <4930a917-04d5-c10c-55c1-c012a5d26dd3@oracle.com>
Message-ID:

On 9/27/17 11:01 PM, David Holmes wrote:
> Bug: https://bugs.openjdk.java.net/browse/JDK-8185062
> Webrev: http://cr.openjdk.java.net/~dholmes/8185062/webrev/

What are rules for flags values in the test? I see some are not matching values in globals.hpp.

>
> Following on from the discussion here:
>
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-August/027720.html
>
> and given the new 6 monthly release cadence, it has been decided to switch AssumeMP to true and deprecate it in 18.3,
> with a view to obsoleting it and removing all non-MP related code in 18.9.
>
> The SPARC specific setting of AssumeMP is no longer needed, and os::is_MP is micro-optimized by checking AssumeMP first.
>
> CSR request: https://bugs.openjdk.java.net/browse/JDK-8188079
>
> Can I please get a reviewer for the CSR request (edit it and add your OpenJDK user name to the "Reviewed by" box).

Reviewed.

Thanks,
Vladimir

>
> I will push once the CSR request is approved.
>
> Thanks,
> David

From daniel.daugherty at oracle.com Thu Sep 28 15:39:47 2017
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 28 Sep 2017 09:39:47 -0600
Subject: RFR: 8185062: Set AssumeMP to true and deprecate the flag
In-Reply-To: <4930a917-04d5-c10c-55c1-c012a5d26dd3@oracle.com>
References: <4930a917-04d5-c10c-55c1-c012a5d26dd3@oracle.com>
Message-ID: <13517856-8cf3-f6d7-ed9b-17f9a2c07090@oracle.com>

On 9/28/17 12:01 AM, David Holmes wrote:
> Bug: https://bugs.openjdk.java.net/browse/JDK-8185062
> Webrev: http://cr.openjdk.java.net/~dholmes/8185062/webrev/

src/hotspot/share/runtime/arguments.cpp
    No comments.

src/hotspot/share/runtime/globals.hpp
    Can you delete the extra blanks before the backslash?
    If not, no worries.

src/hotspot/share/runtime/os.hpp
    No comments.

test/hotspot/jtreg/runtime/CommandLine/VMDeprecatedOptions.java
    No comments.

Thumbs up.

> Following on from the discussion here:
>
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-August/027720.html
>
>
> and given the new 6 monthly release cadence, it has been decided to
> switch AssumeMP to true and deprecate it in 18.3, with a view to
> obsoleting it and removing all non-MP related code in 18.9.
>
> The SPARC specific setting of AssumeMP is no longer needed, and
> os::is_MP is micro-optimized by checking AssumeMP first.
>
> CSR request: https://bugs.openjdk.java.net/browse/JDK-8188079
>
> Can I please get a reviewer for the CSR request (edit it and add your
> OpenJDK user name to the "Reviewed by" box).

Also reviewed the CSR.

Dan

>
> I will push once the CSR request is approved.
>
> Thanks,
> David

From mbrandy at linux.vnet.ibm.com Thu Sep 28 17:53:23 2017
From: mbrandy at linux.vnet.ibm.com (Matthew Brandyberry)
Date: Thu, 28 Sep 2017 12:53:23 -0500
Subject: [8u] RFR (M) 8181809 PPC64: Leverage mtfprd/mffprd on POWER8
Message-ID:

Hi,

Please review this backport of 8181809 for jdk8u. It applies cleanly to jdk8u except for the lack of C1 support on PPC in 8u -- thus those changes are omitted here.

This is a PPC-specific hotspot optimization that leverages the mtfprd/mffprd instructions for movement between general purpose and floating point registers (rather than through memory). It yields a ~35% improvement measured via a microbenchmark.

webrev :http://cr.openjdk.java.net/~mbrandy/8181809/jdk8u/v1
bug :https://bugs.openjdk.java.net/browse/JDK-8181809
review thread:http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-June/027226.html

Thank you.
-Matt

From kim.barrett at oracle.com Thu Sep 28 19:02:34 2017
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 28 Sep 2017 15:02:34 -0400
Subject: RFR (M): 8187977: Generalize Atomic::xchg to use templates
In-Reply-To: <1506599782.27149.154.camel@oracle.com>
References: <1506599782.27149.154.camel@oracle.com>
Message-ID: <6436642A-1806-429F-81CD-06C96E78EECD@oracle.com>

> On Sep 28, 2017, at 7:56 AM, Erik Österlund wrote:
>
> Hi all,
>
> The time has come to generalize more atomics.
>
> I have modelled Atomic::xchg to systematically do what Atomic::cmpxchg
> did but for xchg.
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8187977
>
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8187977/webrev.00/
>
> Testing: mach5 hs-tier3 and JPRT
>
> Thanks,
> /Erik

==============================================================================
src/hotspot/share/runtime/atomic.hpp
312 // A default definition is not provided, so specializations must be
313 // provided for:
314 // T operator()(T, T volatile*) const

That description is not correct. The default definition of PlatformXchg is at line 410. And it makes no sense to talk about specializing a function for a class that doesn't exist. This should instead be using similar wording to that in the description of PlatformCmpxchg.

==============================================================================
src/hotspot/share/runtime/atomic.hpp

There are a lot of similarities between the handling of the exchange_value by cmpxchg and xchg, and I expect Atomic::store and OrderAccess::release_store and variants to also be similar. (Actually, xchg and store may look *very* similar.) It would be nice to think about whether there's some refactoring that could be used to reduce the amount of code involved. I don't have a specific suggestion yet though. For now, this is just something to think about.

==============================================================================
src/hotspot/share/compiler/compileBroker.hpp
335 Atomic::xchg(jint(shutdown_compilation), &_should_compile_new_jobs);

[Added jint conversion.]

Pre-existing: Why is this an Atomic::xchg at all, since the old value isn't used. Seems like it could be a release_store.

There also seems to be some data typing problems around _should_compile_new_jobs. Shouldn't that variable be a CompilerActivity? That wouldn't have worked previously, and we're probably still not ready for it if the xchg gets changed to a release_store, but eventually... so there probably ought to be a bug for it.

==============================================================================
src/hotspot/os_cpu/solaris_x86/atomic_solaris_x86.hpp
146 template<>
147 template
148 inline T Atomic::PlatformXchg<8>::operator()(T exchange_value,

Please move this definition up near the PlatformXchg<4> definition, e.g. around line 96.

==============================================================================
src/hotspot/os_cpu/linux_arm/atomic_linux_arm.hpp

I would prefer the two PlatformXchg specializations be adjacent. If the 4byte specialization was after PlatformAdd<8>, reviewing would have been easier, and the different Add would be adjacent and the different Xchg would be adjacent. The only cost is an extra #ifdef AARCH64 block around the 8byte Xchg.

==============================================================================
src/hotspot/os_cpu/bsd_zero/atomic_bsd_zero.hpp
216 return xchg_using_helper(arm_lock_test_and_set, exchange_value, dest);
219 return xchg_using_helper(m68k_lock_test_and_set, exchange_value, dest);

arm/m68k_lock_test_and_set expect their arguments in the other order, and need to be updated. There was a similar change for cmpxchg.

Same problem in atomic_linux_zero.hpp.

==============================================================================

From coleen.phillimore at oracle.com Thu Sep 28 21:36:23 2017
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Thu, 28 Sep 2017 17:36:23 -0400
Subject: RFR (L) 8186777: Make Klass::_java_mirror an OopHandle
In-Reply-To: <9adb92ce-fb2d-11df-01dc-722e482a4d40@oracle.com>
References: <9adb92ce-fb2d-11df-01dc-722e482a4d40@oracle.com>
Message-ID: <383dcc42-47ea-3d3e-5565-15f8950c35ae@oracle.com>

Thank you to Stefan Karlsson offlist for pointing out that the previous .01 version of this webrev breaks CMS in that it doesn't remember ClassLoaderData::_handles that are changed and added while concurrent marking is in progress. I've fixed this bug to move the Klass::_modified_oops and _accumulated_modified_oops to the ClassLoaderData and use these fields in the CMS remarking phase to catch any new handles that are added. This also fixes this bug https://bugs.openjdk.java.net/browse/JDK-8173988 .

In addition, the previous version of this change removed an optimization during young collection, which showed some uncertain performance regression in young pause times, so I added this optimization back to not walk ClassLoaderData during young collections if all the oops are old. The performance results of SPECjbb2015 now are slightly better, but not significantly.

This latest patch has been tested on tier1-5 on linux x64 and windows x64 in mach5 test harness.

http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/

Can I get at least 3 reviewers? One from each of the compiler, gc, and runtime group at least since there are changes to all 3.

Thanks!
Coleen

On 9/6/17 12:04 PM, coleen.phillimore at oracle.com wrote:
> Summary: Add indirection for fetching mirror so that GC doesn't have
> to follow CLD::_klasses
>
> Thank you to Tom Rodriguez for Graal changes and Rickard for the C2
> changes.
>
> Ran nightly tests through Mach5 and RBT. Early performance testing
> showed good performance improvement in GC class loader data processing
> time, but nmethod processing time continues to dominate. Also
> performance testing showed no throughput regression. I'm rerunning
> both of these performance testing and will post the numbers.
>
> bug link https://bugs.openjdk.java.net/browse/JDK-8186777
> open webrev at http://cr.openjdk.java.net/~coleenp/8186777.01/webrev
>
> Thanks,
> Coleen

From igor.ignatyev at oracle.com Thu Sep 28 21:41:04 2017
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Thu, 28 Sep 2017 14:41:04 -0700
Subject: RFR(XS) : 8188117 : jdk/test/lib/FileInstaller doesn't work for directories
Message-ID:

http://cr.openjdk.java.net/~iignatyev//8188117/webrev.00/index.html
> 5 lines changed: 2 ins; 0 del; 3 mod;

Hi all,

could you please review this tiny fix for testlibrary class?

FileInstaller uses Path::relativize incorrectly, it calls this method on a child passing (grand*)parent as an argument, but Path::relativize constructs a relative path for an argument against a receiver.
the patch also adds diagnostic output to help w/ failure analyses.

webrev: http://cr.openjdk.java.net/~iignatyev//8188117/webrev.00/index.html
JBS: https://bugs.openjdk.java.net/browse/JDK-8188117

Thanks,
-- Igor

From mikhailo.seledtsov at oracle.com Thu Sep 28 22:14:01 2017
From: mikhailo.seledtsov at oracle.com (Mikhailo Seledtsov)
Date: Thu, 28 Sep 2017 15:14:01 -0700
Subject: RFR(XS) : 8188117 : jdk/test/lib/FileInstaller doesn't work for directories
In-Reply-To:
References:
Message-ID: <59CD7429.5020104@oracle.com>

Looks good,

Misha

On 9/28/17, 2:41 PM, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev//8188117/webrev.00/index.html
>> 5 lines changed: 2 ins; 0 del; 3 mod;
> Hi all,
>
> could you please review this tiny fix for testlibrary class?
>
> FileInstaller uses Path::relativize incorrectly, it calls this method on a child passing (grand*)parent as an argument, but Path::relativize constructs a relative path for an argument against a receiver.
> the patch also adds diagnostic output to help w/ failure analyses.
>
> webrev: http://cr.openjdk.java.net/~iignatyev//8188117/webrev.00/index.html
> JBS: https://bugs.openjdk.java.net/browse/JDK-8188117
>
> Thanks,
> -- Igor

From david.holmes at oracle.com Thu Sep 28 22:25:52 2017
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 29 Sep 2017 08:25:52 +1000
Subject: RFR: 8185062: Set AssumeMP to true and deprecate the flag
In-Reply-To:
References: <4930a917-04d5-c10c-55c1-c012a5d26dd3@oracle.com>
Message-ID: <3fb00451-efe1-56d8-bebe-c1189e7bfa22@oracle.com>

Hi Vladimir,

On 29/09/2017 1:31 AM, Vladimir Kozlov wrote:
> On 9/27/17 11:01 PM, David Holmes wrote:
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8185062
>> Webrev: http://cr.openjdk.java.net/~dholmes/8185062/webrev/
>
> What are rules for flags values in the test?
> I see some are not matching values in globals.hpp.

No rules. The test simply verifies you get the deprecation warning. The value doesn't matter as long as it is valid.

>>
>> Following on from the discussion here:
>>
>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-August/027720.html
>>
>>
>> and given the new 6 monthly release cadence, it has been decided to
>> switch AssumeMP to true and deprecate it in 18.3, with a view to
>> obsoleting it and removing all non-MP related code in 18.9.
>>
>> The SPARC specific setting of AssumeMP is no longer needed, and
>> os::is_MP is micro-optimized by checking AssumeMP first.
>>
>> CSR request: https://bugs.openjdk.java.net/browse/JDK-8188079
>>
>> Can I please get a reviewer for the CSR request (edit it and add your
>> OpenJDK user name to the "Reviewed by" box).
>
> Reviewed.

Thanks!

David

> Thanks,
> Vladimir
>
>>
>> I will push once the CSR request is approved.
>>
>> Thanks,
>> David

From david.holmes at oracle.com Thu Sep 28 22:26:32 2017
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 29 Sep 2017 08:26:32 +1000
Subject: RFR: 8185062: Set AssumeMP to true and deprecate the flag
In-Reply-To: <13517856-8cf3-f6d7-ed9b-17f9a2c07090@oracle.com>
References: <4930a917-04d5-c10c-55c1-c012a5d26dd3@oracle.com> <13517856-8cf3-f6d7-ed9b-17f9a2c07090@oracle.com>
Message-ID: <3ce5da9d-cf40-e425-10bc-3d90373c6a00@oracle.com>

Hi Dan,

On 29/09/2017 1:39 AM, Daniel D. Daugherty wrote:
> On 9/28/17 12:01 AM, David Holmes wrote:
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8185062
>> Webrev: http://cr.openjdk.java.net/~dholmes/8185062/webrev/
>
> src/hotspot/share/runtime/arguments.cpp
>     No comments.
>
> src/hotspot/share/runtime/globals.hpp
>     Can you delete the extra blanks before the backslash?
>     If not, no worries.

Will see if I can.

> src/hotspot/share/runtime/os.hpp
>     No comments.
>
> test/hotspot/jtreg/runtime/CommandLine/VMDeprecatedOptions.java
>     No comments.
>
> Thumbs up.
>
>
>> Following on from the discussion here:
>>
>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-August/027720.html
>>
>>
>> and given the new 6 monthly release cadence, it has been decided to
>> switch AssumeMP to true and deprecate it in 18.3, with a view to
>> obsoleting it and removing all non-MP related code in 18.9.
>>
>> The SPARC specific setting of AssumeMP is no longer needed, and
>> os::is_MP is micro-optimized by checking AssumeMP first.
>>
>> CSR request: https://bugs.openjdk.java.net/browse/JDK-8188079
>>
>> Can I please get a reviewer for the CSR request (edit it and add your
>> OpenJDK user name to the "Reviewed by" box).
>
> Also reviewed the CSR.

Many thanks!

David

> Dan
>
>>
>> I will push once the CSR request is approved.
>>
>> Thanks,
>> David
>

From serguei.spitsyn at oracle.com Thu Sep 28 22:43:16 2017
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Thu, 28 Sep 2017 15:43:16 -0700
Subject: RFR(XS) : 8188117 : jdk/test/lib/FileInstaller doesn't work for directories
In-Reply-To:
References:
Message-ID: <73a65134-f215-be53-f900-f0322435ccbb@oracle.com>

Looks good.

Thanks,
Serguei

On 9/28/17 14:41, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev//8188117/webrev.00/index.html
>> 5 lines changed: 2 ins; 0 del; 3 mod;
> Hi all,
>
> could you please review this tiny fix for testlibrary class?
>
> FileInstaller uses Path::relativize incorrectly, it calls this method on a child passing (grand*)parent as an argument, but Path::relativize constructs a relative path for an argument against a receiver.
> the patch also adds diagnostic output to help w/ failure analyses.
> > webrev: http://cr.openjdk.java.net/~iignatyev//8188117/webrev.00/index.html > JBS: https://bugs.openjdk.java.net/browse/JDK-8188117 > > Thanks, > -- Igor From igor.ignatyev at oracle.com Thu Sep 28 23:29:35 2017 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 28 Sep 2017 16:29:35 -0700 Subject: RFR(XS) : 8188117 : jdk/test/lib/FileInstaller doesn't work for directories In-Reply-To: <73a65134-f215-be53-f900-f0322435ccbb@oracle.com> References: <73a65134-f215-be53-f900-f0322435ccbb@oracle.com> Message-ID: Serguei, Misha, thank you for your review. Cheers, -- Igor > On Sep 28, 2017, at 3:14 PM, Mikhailo Seledtsov wrote: > > Looks good, > > Misha > On Sep 28, 2017, at 3:43 PM, serguei.spitsyn at oracle.com wrote: > > Looks good. > > Thanks, > Serguei > > > On 9/28/17 14:41, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8188117/webrev.00/index.html >>> 5 lines changed: 2 ins; 0 del; 3 mod; >> Hi all, >> >> could you please review this tiny fix for testlibrary class? >> >> FileInstaller uses Path::relativize incorrectly, it calls this method on a child passing (grand*)parent as an argument, but Path::relativize constructs a relative path for an argument against an receiver. >> the patch also adds diagnostic output to help w/ failure analyses. >> >> webrev: http://cr.openjdk.java.net/~iignatyev//8188117/webrev.00/index.html >> JBS: https://bugs.openjdk.java.net/browse/JDK-8188117 >> >> Thanks, >> -- Igor > From OGATAK at jp.ibm.com Fri Sep 29 06:41:41 2017 From: OGATAK at jp.ibm.com (Kazunori Ogata) Date: Fri, 29 Sep 2017 15:41:41 +0900 Subject: RFR: 8188131: [PPC] Increase inlining thresholds to the same as other platforms Message-ID: Hi all, Please review a change for JDK-8188131. Bug report: https://bugs.openjdk.java.net/browse/JDK-8188131 Webrev: http://cr.openjdk.java.net/~horii/8188131/webrev.00/ This change increases the default values of FreqInlineSize and InlineSmallCode in ppc64 to 325 and 2500, respectively. 
These values are the same as aarch64. The performance of TPC-DS Q96 was improved by about 6% with this change. Regards, Ogata From HORIE at jp.ibm.com Fri Sep 29 09:37:53 2017 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Fri, 29 Sep 2017 18:37:53 +0900 Subject: RFR(M):8188139:PPC64: Superword Level Parallelization with VSX Message-ID: Dear all, Would you please review the following change? Bug: https://bugs.openjdk.java.net/browse/JDK-8188139 Webrev: http://cr.openjdk.java.net/~mhorie/8188139/webrev.00/ This change introduces to use VSX for Superword Level Parallelization, concretely VSX instructions are emitted for Replicate[BSIFDL] nodes in ppc.ad. Since I am not familiar with the hotspot's register allocation and the TOC use in POWER, I would be very grateful to have any comments to improve the change. In addition, the change includes some minor fixes in assembler_ppc.inline.hpp. I think there are some instructions that should have 1u in higher bits. I used the attached micro benchmark. (See attached file: ArraysFillTest.java) Best regards, -- Michihiro, IBM Research - Tokyo From stefan.karlsson at oracle.com Fri Sep 29 10:41:34 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 29 Sep 2017 12:41:34 +0200 Subject: RFR (L) 8186777: Make Klass::_java_mirror an OopHandle In-Reply-To: <383dcc42-47ea-3d3e-5565-15f8950c35ae@oracle.com> References: <9adb92ce-fb2d-11df-01dc-722e482a4d40@oracle.com> <383dcc42-47ea-3d3e-5565-15f8950c35ae@oracle.com> Message-ID: <1498efad-e443-5875-cc20-b0d0c926e883@oracle.com> Hi Coleen, I started looking at this, but will need a second round before I've fully reviewed the GC parts. Here are some nits that would be nice to get cleaned up. ========== http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/classfile/classLoaderData.cpp.frames.html 788 record_modified_oops(); // necessary? This could be removed. Only G1 cares about deleted "weak" references. 
Or we can wait until Erik?'s GC Barrier Interface is in place and remove it then. ---------- #ifdef CLD_DUMP_KLASSES if (Verbose) { Klass* k = _klasses; while (k != NULL) { - out->print_cr("klass " PTR_FORMAT ", %s, CT: %d, MUT: %d", k, k->name()->as_C_string(), - k->has_modified_oops(), k->has_accumulated_modified_oops()); + out->print_cr("klass " PTR_FORMAT ", %s", k, k->name()->as_C_string()); assert(k != k->next_link(), "no loops!"); k = k->next_link(); } } #endif // CLD_DUMP_KLASSES Pre-existing: I don't think this will compile if you turn on CLD_DUMP_KLASSES. k must be p2i(k). ========== http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/classfile/classLoaderData.hpp.udiff.html + // Remembered sets support for the oops in the class loader data. + jbyte _modified_oops; // Card Table Equivalent (YC/CMS support) + jbyte _accumulated_modified_oops; // Mod Union Equivalent (CMS support) We should create a follow-up bug to change these jbytes to bools. ========== http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/g1/g1HeapVerifier.cpp.frames.html Spurious addition: + G1CollectedHeap* _g1h; ========== http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/g1/g1OopClosures.hpp.udiff.html Spurious addition?: + G1CollectedHeap* g1() { return _g1; } ========== http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/parallel/psScavenge.inline.hpp.patch PSPromotionManager* _pm; - // Used to redirty a scanned klass if it has oops + // Used to redirty a scanned cld if it has oops // pointing to the young generation after being scanned. - Klass* _scanned_klass; + ClassLoaderData* _scanned_cld; Indentation. ========== http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/parallel/psTasks.cpp.frames.html 80 case class_loader_data: 81 { 82 PSScavengeCLDClosure ps(pm); 83 ClassLoaderDataGraph::cld_do(&ps); 84 } Would you mind changing the name ps to cld_closure? 
========== http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/shared/genOopClosures.hpp.patch + OopsInClassLoaderDataOrGenClosure* _scavenge_closure; // true if the the modified oops state should be saved. bool _accumulate_modified_oops; Indentation. ---------- + void do_cld(ClassLoaderData* k); Rename k? Thanks, StefanK On 2017-09-28 23:36, coleen.phillimore at oracle.com wrote: > > Thank you to Stefan Karlsson offlist for pointing out that the previous > .01 version of this webrev breaks CMS in that it doesn't remember > ClassLoaderData::_handles that are changed and added while concurrent > marking is in progress. I've fixed this bug to move the > Klass::_modified_oops and _accumulated_modified_oops to the > ClassLoaderData and use these fields in the CMS remarking phase to catch > any new handles that are added. This also fixes this bug > https://bugs.openjdk.java.net/browse/JDK-8173988 . > > In addition, the previous version of this change removed an optimization > during young collection, which showed some uncertain performance > regression in young pause times, so I added this optimization back to > not walk ClassLoaderData during young collections if all the oops are > old. The performance results of SPECjbb2015 now are slightly better, > but not significantly. > > This latest patch has been tested on tier1-5 on linux x64 and windows > x64 in mach5 test harness. > > http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/ > > Can I get at least 3 reviewers? One from each of the compiler, gc, and > runtime group at least since there are changes to all 3. > > Thanks! > Coleen > > > On 9/6/17 12:04 PM, coleen.phillimore at oracle.com wrote: >> Summary: Add indirection for fetching mirror so that GC doesn't have >> to follow CLD::_klasses >> >> Thank you to Tom Rodriguez for Graal changes and Rickard for the C2 >> changes. >> >> Ran nightly tests through Mach5 and RBT.
Early performance testing >> showed good performance improvement in GC class loader data processing >> time, but nmethod processing time continues to dominate. Also >> performance testing showed no throughput regression. I'm rerunning >> both of these performance testing and will post the numbers. >> >> bug link https://bugs.openjdk.java.net/browse/JDK-8186777 >> open webrev at http://cr.openjdk.java.net/~coleenp/8186777.01/webrev >> >> Thanks, >> Coleen From erik.osterlund at oracle.com Fri Sep 29 10:45:25 2017 From: erik.osterlund at oracle.com (Erik Österlund) Date: Fri, 29 Sep 2017 12:45:25 +0200 Subject: RFR (M): 8187977: Generalize Atomic::xchg to use templates In-Reply-To: <6436642A-1806-429F-81CD-06C96E78EECD@oracle.com> References: <1506599782.27149.154.camel@oracle.com> <6436642A-1806-429F-81CD-06C96E78EECD@oracle.com> Message-ID: <59CE2445.2010303@oracle.com> Hi Kim, Thanks for looking at this. Incremental webrev: http://cr.openjdk.java.net/~eosterlund/8187977/webrev.00_01/ Full webrev: http://cr.openjdk.java.net/~eosterlund/8187977/webrev.01/ On 2017-09-28 21:02, Kim Barrett wrote: >> On Sep 28, 2017, at 7:56 AM, Erik Österlund wrote: >> >> Hi all, >> >> The time has come to generalize more atomics. >> >> I have modelled Atomic::xchg to systematically do what Atomic::cmpxchg >> did but for xchg. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8187977 >> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8187977/webrev.00/ >> >> Testing: mach5 hs-tier3 and JPRT >> >> Thanks, >> /Erik > ============================================================================== > src/hotspot/share/runtime/atomic.hpp > 312 // A default definition is not provided, so specializations must be > 313 // provided for: > 314 // T operator()(T, T volatile*) const > > That description is not correct. The default definition of > PlatformXchg is at line 410. And it makes no sense to talk about > specializing a function for a class that doesn't exist.
This should > instead be using similar wording to that in the description of > PlatformCmpxchg. Fixed. > ============================================================================== > src/hotspot/share/runtime/atomic.hpp > > There are a lot of similarities between the handling of the > exchange_value by cmpxchg and xchg, and I expect Atomic::store and > OrderAccess::release_store and variants to also be similar. (Actually, > xchg and store may look *very* similar.) It would be nice to think > about whether there's some refactoring that could be used to reduce > the amount of code involved. I don't have a specific suggestion yet > though. For now, this is just something to think about. I would not like to mix xchg and load/store implementations in Atomic. The reason for that is that I would rather share the implementation between Atomic::load/store and OrderAccess::*load/store*, because they are even more similar. I have that already in my patch queue. > ============================================================================== > src/hotspot/share/compiler/compileBroker.hpp > 335 Atomic::xchg(jint(shutdown_compilation), &_should_compile_new_jobs); > [Added jint conversion.] > > Pre-existing: > > Why is this an Atomic::xchg at all, since the old value isn't used. > Seems like it could be a release_store. > > There also seems to be some data typing problems around > _should_compile_new_jobs. Shouldn't that variable be a > CompilerActivity? That wouldn't have worked previously, and we're > probably still not ready for it if the xchg gets changed to a > release_store, but eventually... so there probably ought to be a bug > for it. I will file a bug for that to be reconsidered later on. 
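On the compileBroker.hpp point above, the difference between an exchange whose result is discarded and a plain release store can be sketched as follows. This is only an illustration: it uses `std::atomic` as a stand-in rather than HotSpot's own `Atomic` and `OrderAccess` classes, and the variable and function names are hypothetical.

```cpp
#include <atomic>

// Hypothetical stand-in for a flag such as _should_compile_new_jobs.
static std::atomic<int> should_compile_flag(0);

// Exchange style: the previous value is fetched and then thrown away.
inline void set_flag_with_exchange(int new_value) {
  (void)should_compile_flag.exchange(new_value);
}

// Release-store style: publishes the new value without reading the old one.
inline void set_flag_with_release_store(int new_value) {
  should_compile_flag.store(new_value, std::memory_order_release);
}

// Reader side, paired with the release store above.
inline int read_flag() {
  return should_compile_flag.load(std::memory_order_acquire);
}
```

Both setters leave the same value in the flag; the exchange additionally performs a read-modify-write, which is only needed when the caller actually uses the old value.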
> ============================================================================== > src/hotspot/os_cpu/solaris_x86/atomic_solaris_x86.hpp > 146 template<> > 147 template > 148 inline T Atomic::PlatformXchg<8>::operator()(T exchange_value, > > Please move this definition up near the PlatformXchg<4> definition, > e.g. around line 96. Fixed. > > ============================================================================== > src/hotspot/os_cpu/linux_arm/atomic_linux_arm.hpp > > I would prefer the two PlatformXchg specializations be adjacent. If > the 4byte specialization was after PlatformAdd<8>, reviewing would > have been easier, and the different Add would be adjacent and the > different Xchg would be adjacent. The only cost is an extra #ifdef > AARCH64 block around the 8byte Xchg. Fixed. > ============================================================================== > src/hotspot/os_cpu/bsd_zero/atomic_bsd_zero.hpp > 216 return xchg_using_helper(arm_lock_test_and_set, exchange_value, dest); > 219 return xchg_using_helper(m68k_lock_test_and_set, exchange_value, dest); > > arm/m68k_lock_test_and_set expect their arguments in the other order, > and need to be updated. There was a similar change for cmpxchg. > > Same problem in atomic_linux_zero.hpp. Fixed. Thanks for the review. /Erik From coleen.phillimore at oracle.com Fri Sep 29 13:00:31 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 29 Sep 2017 09:00:31 -0400 Subject: RFR (L) 8186777: Make Klass::_java_mirror an OopHandle In-Reply-To: <1498efad-e443-5875-cc20-b0d0c926e883@oracle.com> References: <9adb92ce-fb2d-11df-01dc-722e482a4d40@oracle.com> <383dcc42-47ea-3d3e-5565-15f8950c35ae@oracle.com> <1498efad-e443-5875-cc20-b0d0c926e883@oracle.com> Message-ID: <21ea164a-e2f8-e365-69fc-17f4cce7d1d0@oracle.com> On 9/29/17 6:41 AM, Stefan Karlsson wrote: > Hi Coleen, > > I started looking at this, but will need a second round before I've > fully reviewed the GC parts. 
> > Here are some nits that would be nice to get cleaned up. > > ========== > http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/classfile/classLoaderData.cpp.frames.html > > > 788 record_modified_oops(); // necessary? > > This could be removed. Only G1 cares about deleted "weak" references. > > Or we can wait until ErikÖ's GC Barrier Interface is in place and > remove it then. I'll remove it now. I didn't know whether deleting a handle required another CLD walk in young collections, but I think you're saying it does not. > > ---------- > > #ifdef CLD_DUMP_KLASSES > if (Verbose) { > Klass* k = _klasses; > while (k != NULL) { > - out->print_cr("klass " PTR_FORMAT ", %s, CT: %d, MUT: %d", k, > k->name()->as_C_string(), > - k->has_modified_oops(), k->has_accumulated_modified_oops()); > + out->print_cr("klass " PTR_FORMAT ", %s", k, > k->name()->as_C_string()); > assert(k != k->next_link(), "no loops!"); > k = k->next_link(); > } > } > #endif // CLD_DUMP_KLASSES > > Pre-existing: I don't think this will compile if you turn on > CLD_DUMP_KLASSES. k must be p2i(k). Fixed. > > ========== > http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/classfile/classLoaderData.hpp.udiff.html > > > + // Remembered sets support for the oops in the class loader data. > + jbyte _modified_oops; // Card Table Equivalent (YC/CMS > support) > + jbyte _accumulated_modified_oops; // Mod Union Equivalent (CMS > support) > > We should create a follow-up bug to change these jbytes to bools. I agree. I could do that here, in this change and retest. I noticed that I didn't initialize _accumulate_modified_oops in the CLD constructor. > > ========== > http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/g1/g1HeapVerifier.cpp.frames.html > > > Spurious addition: > + G1CollectedHeap* _g1h; Yes, I don't need that field in VerifyCLDClosure. Removed.
> > ========== > http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/g1/g1OopClosures.hpp.udiff.html > > > Spurious addition?: > + G1CollectedHeap* g1() { return _g1; } Removed, that was leftover from a previous version. Thanks for noticing it. > > ========== > http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/parallel/psScavenge.inline.hpp.patch > > > PSPromotionManager* _pm; > - // Used to redirty a scanned klass if it has oops > + // Used to redirty a scanned cld if it has oops > // pointing to the young generation after being scanned. > - Klass* _scanned_klass; > + ClassLoaderData* _scanned_cld; > > Indentation. Fixed. > > ========== > http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/parallel/psTasks.cpp.frames.html > > > 80 case class_loader_data: > 81 { > 82 PSScavengeCLDClosure ps(pm); > 83 ClassLoaderDataGraph::cld_do(&ps); > 84 } > > Would you mind changing the name ps to cld_closure? Not at all. Fixed. > > ========== > http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/shared/genOopClosures.hpp.patch > > > + OopsInClassLoaderDataOrGenClosure* _scavenge_closure; > // true if the the modified oops state should be saved. > bool _accumulate_modified_oops; > > Indentation. I moved it to align with _scavenge closure, although I think the coding standard algorithm is to align if there are no intervening comments, and not otherwise. But it looks a bit better aligned. > > ---------- > + void do_cld(ClassLoaderData* k); > > Rename k? Fixed. Thanks for reading through this. I tried to make it completely nit free but I missed a k. argh.
:) Coleen > > Thanks, > StefanK > > On 2017-09-28 23:36, coleen.phillimore at oracle.com wrote: >> >> Thank you to Stefan Karlsson offlist for pointing out that the >> previous .01 version of this webrev breaks CMS in that it doesn't >> remember ClassLoaderData::_handles that are changed and added while >> concurrent marking is in progress. I've fixed this bug to move the >> Klass::_modified_oops and _accumulated_modified_oops to the >> ClassLoaderData and use these fields in the CMS remarking phase to >> catch any new handles that are added. This also fixes this bug >> https://bugs.openjdk.java.net/browse/JDK-8173988 . >> >> In addition, the previous version of this change removed an >> optimization during young collection, which showed some uncertain >> performance regression in young pause times, so I added this >> optimization back to not walk ClassLoaderData during young >> collections if all the oops are old. The performance results of >> SPECjbb2015 now are slightly better, but not significantly. >> >> This latest patch has been tested on tier1-5 on linux x64 and windows >> x64 in mach5 test harness. >> >> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/ >> >> Can I get at least 3 reviewers? One from each of the compiler, gc, >> and runtime group at least since there are changes to all 3. >> >> Thanks! >> Coleen >> >> >> On 9/6/17 12:04 PM, coleen.phillimore at oracle.com wrote: >>> Summary: Add indirection for fetching mirror so that GC doesn't have >>> to follow CLD::_klasses >>> >>> Thank you to Tom Rodriguez for Graal changes and Rickard for the C2 >>> changes. >>> >>> Ran nightly tests through Mach5 and RBT. Early performance testing >>> showed good performance improvement in GC class loader data >>> processing time, but nmethod processing time continues to dominate. >>> Also performance testing showed no throughput regression. I'm >>> rerunning both of these performance testing and will post the numbers.
>>> >>> bug link https://bugs.openjdk.java.net/browse/JDK-8186777 >>> open webrev at http://cr.openjdk.java.net/~coleenp/8186777.01/webrev >>> >>> Thanks, >>> Coleen From patric.hedlin at oracle.com Fri Sep 29 13:08:15 2017 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Fri, 29 Sep 2017 15:08:15 +0200 Subject: JDK10/RFR(M): 8172232: SPARC ISA/CPU feature detection is broken/insufficient (on Linux). Message-ID: <7d5e1ebb-7de8-66f1-a1f0-db465bcad4ab@oracle.com> Dear all, I would like to ask for help to review the following change/update: Issue: https://bugs.openjdk.java.net/browse/JDK-8172232 Webrev: http://cr.openjdk.java.net/~phedlin/tr8172232/ 8172232: SPARC ISA/CPU feature detection is broken/insufficient (on Linux). Subsumes (duplicate) JDK-8186579: VM_Version::platform_features() needs update on linux-sparc. Caveat: This update will introduce some redundancies into the code base, features and definitions currently not used, addressed by subsequent bug or feature updates/patches. Fujitsu HW is treated very conservatively. Testing: JDK9/JDK10 local jtreg/hotspot Thanks to Adrian for additional test (and review) support. Tested-By: John Paul Adrian Glaubitz Best regards, Patric From kim.barrett at oracle.com Fri Sep 29 16:24:02 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 29 Sep 2017 12:24:02 -0400 Subject: RFR (M): 8187977: Generalize Atomic::xchg to use templates In-Reply-To: <59CE2445.2010303@oracle.com> References: <1506599782.27149.154.camel@oracle.com> <6436642A-1806-429F-81CD-06C96E78EECD@oracle.com> <59CE2445.2010303@oracle.com> Message-ID: <0D84DC64-766A-4669-9DEA-2D0E71ECE9A4@oracle.com> > On Sep 29, 2017, at 6:45 AM, Erik ?sterlund wrote: > On 2017-09-28 21:02, Kim Barrett wrote: >> src/hotspot/share/runtime/atomic.hpp >> >> There are a lot of similarities between the handling of the >> exchange_value by cmpxchg and xchg, and I expect Atomic::store and >> OrderAccess::release_store and variants to also be similar. 
(Actually, >> xchg and store may look *very* similar.) It would be nice to think >> about whether there's some refactoring that could be used to reduce >> the amount of code involved. I don't have a specific suggestion yet >> though. For now, this is just something to think about. > > I would not like to mix xchg and load/store implementations in Atomic. The reason for that is that I would rather share the implementation between Atomic::load/store and OrderAccess::*load/store*, because they are even more similar. I have that already in my patch queue. I'm specifically looking at the canonicalization of the value_to_store and the destination. I think *all* of those operations ought to handle this the same way. And the present approach to that is somewhat verbose and duplicated per operation. It seems to me that we should be able to do better. Something like a canonicalization function and a type trait for the canonicalized destination type. I'll see if I can come up with something more concrete to suggest. From kim.barrett at oracle.com Fri Sep 29 18:32:30 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 29 Sep 2017 14:32:30 -0400 Subject: RFR (M): 8187977: Generalize Atomic::xchg to use templates In-Reply-To: <59CE2445.2010303@oracle.com> References: <1506599782.27149.154.camel@oracle.com> <6436642A-1806-429F-81CD-06C96E78EECD@oracle.com> <59CE2445.2010303@oracle.com> Message-ID: <24862C9D-1CFC-484C-93F2-D34C12365A89@oracle.com> > On Sep 29, 2017, at 6:45 AM, Erik Österlund wrote: > > Hi Kim, > > Thanks for looking at this. > > Incremental webrev: > http://cr.openjdk.java.net/~eosterlund/8187977/webrev.00_01/ > > Full webrev: > http://cr.openjdk.java.net/~eosterlund/8187977/webrev.01/ Looks good. I'll get back to you if I come up with any ideas for improving the canonicalization of store_value and dest.
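The "canonicalization function plus type trait" idea raised in this thread could look roughly like the sketch below. It is an illustration only and not code from the webrev: `CanonicalInt` and `Canonical` are hypothetical names, the `decay`/`recover` naming merely echoes the Translator snippet quoted elsewhere in the thread, and `memcpy` is assumed as the way to copy the bits.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical trait: selects an unsigned integer type by size in bytes.
template<std::size_t N> struct CanonicalInt;
template<> struct CanonicalInt<4> { typedef std::uint32_t type; };
template<> struct CanonicalInt<8> { typedef std::uint64_t type; };

// Hypothetical canonicalizer: maps a value of type T to a same-sized
// integer and back, so one integer-based operation can serve many types.
template<typename T>
struct Canonical {
  // The canonical type; a destination pointer would use "type volatile*".
  typedef typename CanonicalInt<sizeof(T)>::type type;

  // Copy the bits of the value into the canonical integer type.
  static type decay(T value) {
    type bits;
    std::memcpy(&bits, &value, sizeof(T));
    return bits;
  }

  // Copy the bits back into the original type.
  static T recover(type bits) {
    T value;
    std::memcpy(&value, &bits, sizeof(T));
    return value;
  }
};
```

With a helper of this shape, each atomic operation would decay its argument once, call the integer implementation, and recover the result, instead of repeating the conversion logic per operation.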
From vladimir.kozlov at oracle.com Fri Sep 29 18:56:28 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 29 Sep 2017 11:56:28 -0700 Subject: JDK10/RFR(M): 8172232: SPARC ISA/CPU feature detection is broken/insufficient (on Linux). In-Reply-To: <7d5e1ebb-7de8-66f1-a1f0-db465bcad4ab@oracle.com> References: <7d5e1ebb-7de8-66f1-a1f0-db465bcad4ab@oracle.com> Message-ID: <9f2896ca-65dc-557f-793c-4235499cc340@oracle.com> In general it is fine. Few notes. You use ifdef DEBUG_SPARC_CAPS which is undefed at the beginning. Is it set by gcc by default? Coding style for methods definitions - open parenthesis should be on the same line: + bool match(const char* s) const + { Thanks, Vladimir On 9/29/17 6:08 AM, Patric Hedlin wrote: > Dear all, > > I would like to ask for help to review the following change/update: > > Issue: https://bugs.openjdk.java.net/browse/JDK-8172232 > > Webrev: http://cr.openjdk.java.net/~phedlin/tr8172232/ > > > 8172232: SPARC ISA/CPU feature detection is broken/insufficient (on Linux). > > Subsumes (duplicate) JDK-8186579: VM_Version::platform_features() needs update on linux-sparc. > > > Caveat: > > This update will introduce some redundancies into the code base, features and definitions > currently not used, addressed by subsequent bug or feature updates/patches. Fujitsu HW is > treated very conservatively. > > > Testing: > > JDK9/JDK10 local jtreg/hotspot > > > Thanks to Adrian for additional test (and review) support. > > Tested-By: John Paul Adrian Glaubitz > > > Best regards, > Patric > From vladimir.kozlov at oracle.com Fri Sep 29 19:00:59 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 29 Sep 2017 12:00:59 -0700 Subject: RFR(M):8188139:PPC64: Superword Level Parallelization with VSX In-Reply-To: References: Message-ID: I looked on shared code. One comment - change in opto/type.cpp affects s390.
Vladimir On 9/29/17 2:37 AM, Michihiro Horie wrote: > Dear all, > > Would you please review the following change? > Bug: https://bugs.openjdk.java.net/browse/JDK-8188139 > Webrev: http://cr.openjdk.java.net/~mhorie/8188139/webrev.00/ > > This change introduces to use VSX for Superword Level Parallelization, concretely VSX instructions are emitted for > Replicate[BSIFDL] nodes in ppc.ad. > Since I am not familiar with the hotspot's register allocation and the TOC use in POWER, I would be very grateful to > have any comments to improve the change. > > In addition, the change includes some minor fixes in assembler_ppc.inline.hpp. I think there are some instructions that > should have 1u in higher bits. > > > I used the attached micro benchmark. > /(See attached file: ArraysFillTest.java)/ > > Best regards, > -- > Michihiro, > IBM Research - Tokyo > From erik.osterlund at oracle.com Fri Sep 29 19:16:32 2017 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Fri, 29 Sep 2017 21:16:32 +0200 Subject: RFR (M): 8187977: Generalize Atomic::xchg to use templates In-Reply-To: <24862C9D-1CFC-484C-93F2-D34C12365A89@oracle.com> References: <1506599782.27149.154.camel@oracle.com> <6436642A-1806-429F-81CD-06C96E78EECD@oracle.com> <59CE2445.2010303@oracle.com> <24862C9D-1CFC-484C-93F2-D34C12365A89@oracle.com> Message-ID: Hi Kim, Thanks for the review. /Erik On 29 Sep 2017, at 20:32, Kim Barrett wrote: >> On Sep 29, 2017, at 6:45 AM, Erik ?sterlund wrote: >> >> Hi Kim, >> >> Thanks for looking at this. >> >> Incremental webrev: >> http://cr.openjdk.java.net/~eosterlund/8187977/webrev.00_01/ >> >> Full webrev: >> http://cr.openjdk.java.net/~eosterlund/8187977/webrev.01/ > > Looks good. > > I?ll get back to you if I come up with any ideas for improving the canonicalization of store_value and dest. 
> From coleen.phillimore at oracle.com Fri Sep 29 19:17:08 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 29 Sep 2017 15:17:08 -0400 Subject: RFR (M): 8187977: Generalize Atomic::xchg to use templates In-Reply-To: <59CE2445.2010303@oracle.com> References: <1506599782.27149.154.camel@oracle.com> <6436642A-1806-429F-81CD-06C96E78EECD@oracle.com> <59CE2445.2010303@oracle.com> Message-ID: <7fa929c1-108a-6aec-f44c-0fcfc0b49aad@oracle.com> Erik, This change looks good to me. Do we want to deprecate xchg_ptr like cmpxchg_ptr? Is there an RFE filed for that? On 9/29/17 6:45 AM, Erik Österlund wrote: > Hi Kim, > > Thanks for looking at this. > > Incremental webrev: > http://cr.openjdk.java.net/~eosterlund/8187977/webrev.00_01/ > > Full webrev: > http://cr.openjdk.java.net/~eosterlund/8187977/webrev.01/ > > On 2017-09-28 21:02, Kim Barrett wrote: >>> On Sep 28, 2017, at 7:56 AM, Erik Österlund >>> wrote: >>> >>> Hi all, >>> >>> The time has come to generalize more atomics. >>> >>> I have modelled Atomic::xchg to systematically do what Atomic::cmpxchg >>> did but for xchg. >>> >>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8187977 >>> >>> Webrev: >>> http://cr.openjdk.java.net/~eosterlund/8187977/webrev.00/ >>> >>> Testing: mach5 hs-tier3 and JPRT >>> >>> Thanks, >>> /Erik >> ============================================================================== >> >> src/hotspot/share/runtime/atomic.hpp >> 312 // A default definition is not provided, so specializations >> must be >> 313 // provided for: >> 314 // T operator()(T, T volatile*) const >> >> That description is not correct. The default definition of >> PlatformXchg is at line 410. And it makes no sense to talk about >> specializing a function for a class that doesn't exist. This should >> instead be using similar wording to that in the description of >> PlatformCmpxchg. > > Fixed.
> >> ============================================================================== >> >> src/hotspot/share/runtime/atomic.hpp >> >> There are a lot of similarities between the handling of the >> exchange_value by cmpxchg and xchg, and I expect Atomic::store and >> OrderAccess::release_store and variants to also be similar. (Actually, >> xchg and store may look *very* similar.) It would be nice to think >> about whether there's some refactoring that could be used to reduce >> the amount of code involved. I don't have a specific suggestion yet >> though. For now, this is just something to think about. > > I would not like to mix xchg and load/store implementations in Atomic. > The reason for that is that I would rather share the implementation > between Atomic::load/store and OrderAccess::*load/store*, because they > are even more similar. I have that already in my patch queue. > I think I would have to see understand what the duplication is before I want to see more lines of metaprogramming tricks. Do you mean the translate recover/decay here? 697 template 698 struct Atomic::XchgImpl< 699 T, T, 700 typename EnableIf::value>::type> 701 VALUE_OBJ_CLASS_SPEC 702 { 703 T operator()(T exchange_value, T volatile* dest) const { 704 typedef PrimitiveConversions::Translate Translator; 705 typedef typename Translator::Decayed Decayed; 706 STATIC_ASSERT(sizeof(T) == sizeof(Decayed)); 707 return Translator::recover( 708 xchg(Translator::decay(exchange_value), 709 reinterpret_cast(dest))); 710 } I can imagine this can only have more levels of logical indirection if generalized more. I have to admit that I hate the VALUE_OBJ_CLASS_SPEC in these classes. There's already enough decoder ring work for each line without this distracting (mostly useless) macro here. Originally, this meant that this class is embedded in another. It doesn't really help the code in any way here.
Global operator new already has an assert in the very unlikely case that somebody did an Atomic::XchgImpl* x = new Atomic::XchgImpl(); // did I write that correctly? >> ============================================================================== >> >> src/hotspot/share/compiler/compileBroker.hpp >> 335 Atomic::xchg(jint(shutdown_compilation), >> &_should_compile_new_jobs); >> [Added jint conversion.] >> >> Pre-existing: >> >> Why is this an Atomic::xchg at all, since the old value isn't used. >> Seems like it could be a release_store. >> >> There also seems to be some data typing problems around >> _should_compile_new_jobs. Shouldn't that variable be a >> CompilerActivity? That wouldn't have worked previously, and we're >> probably still not ready for it if the xchg gets changed to a >> release_store, but eventually... so there probably ought to be a bug >> for it. > > I will file a bug for that to be reconsidered later on. > I would leave this to the compiler group to look at. Thanks, Coleen >> ============================================================================== >> >> src/hotspot/os_cpu/solaris_x86/atomic_solaris_x86.hpp >> 146 template<> >> 147 template >> 148 inline T Atomic::PlatformXchg<8>::operator()(T exchange_value, >> >> Please move this definition up near the PlatformXchg<4> definition, >> e.g. around line 96. > > Fixed. > >> >> ============================================================================== >> >> src/hotspot/os_cpu/linux_arm/atomic_linux_arm.hpp >> >> I would prefer the two PlatformXchg specializations be adjacent. If >> the 4byte specialization was after PlatformAdd<8>, reviewing would >> have been easier, and the different Add would be adjacent and the >> different Xchg would be adjacent. The only cost is an extra #ifdef >> AARCH64 block around the 8byte Xchg. > > Fixed.
> >> ============================================================================== >> >> src/hotspot/os_cpu/bsd_zero/atomic_bsd_zero.hpp >> ? 216?? return xchg_using_helper(arm_lock_test_and_set, >> exchange_value, dest); >> ? 219?? return xchg_using_helper(m68k_lock_test_and_set, >> exchange_value, dest); >> >> arm/m68k_lock_test_and_set expect their arguments in the other order, >> and need to be updated.? There was a similar change for cmpxchg. >> >> Same problem in atomic_linux_zero.hpp. > > Fixed. > > Thanks for the review. > > /Erik From gromero at linux.vnet.ibm.com Fri Sep 29 19:19:51 2017 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Fri, 29 Sep 2017 16:19:51 -0300 Subject: RFR(M):8188139:PPC64: Superword Level Parallelization with VSX In-Reply-To: References: Message-ID: <59CE9CD7.1070407@linux.vnet.ibm.com> Hi Vladimir, On 29-09-2017 16:00, Vladimir Kozlov wrote: > I looked on shared code. One comment - change in opto/type.cpp affects s390. Do you strongly oppose to split PPC64 and s390 like? 

diff -r e93ed1a09240 src/share/vm/opto/type.cpp
--- a/src/share/vm/opto/type.cpp Tue Aug 08 22:57:34 2017 +0000
+++ b/src/share/vm/opto/type.cpp Fri Sep 29 15:17:17 2017 -0400
@@ -67,7 +67,13 @@
   { Bad, T_ILLEGAL, "vectorx:", false, 0, relocInfo::none }, // VectorX
   { Bad, T_ILLEGAL, "vectory:", false, 0, relocInfo::none }, // VectorY
   { Bad, T_ILLEGAL, "vectorz:", false, 0, relocInfo::none }, // VectorZ
-#elif defined(PPC64) || defined(S390)
+#elif defined(PPC64)
+  { Bad, T_ILLEGAL, "vectors:", false, 0, relocInfo::none }, // VectorS
+  { Bad, T_ILLEGAL, "vectord:", false, Op_RegL, relocInfo::none }, // VectorD
+  { Bad, T_ILLEGAL, "vectorx:", false, Op_VecX, relocInfo::none }, // VectorX
+  { Bad, T_ILLEGAL, "vectory:", false, 0, relocInfo::none }, // VectorY
+  { Bad, T_ILLEGAL, "vectorz:", false, 0, relocInfo::none }, // VectorZ
+#elif defined(S390)
   { Bad, T_ILLEGAL, "vectors:", false, 0, relocInfo::none }, // VectorS
   { Bad, T_ILLEGAL, "vectord:", false, Op_RegL, relocInfo::none }, // VectorD
   { Bad, T_ILLEGAL, "vectorx:", false, 0, relocInfo::none }, // VectorX

Kind regards,
Gustavo

> Vladimir
>
> On 9/29/17 2:37 AM, Michihiro Horie wrote:
>> Dear all,
>>
>> Would you please review the following change?
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8188139
>> Webrev: http://cr.openjdk.java.net/~mhorie/8188139/webrev.00/
>>
>> This change introduces to use VSX for Superword Level Parallelization,
>> concretely VSX instructions are emitted for Replicate[BSIFDL] nodes in ppc.ad.
>> Since I am not familiar with the hotspot's register allocation and the TOC
>> use in POWER, I would be very grateful to have any comments to improve the
>> change.
>>
>> In addition, the change includes some minor fixes in
>> assembler_ppc.inline.hpp. I think there are some instructions that should
>> have 1u in higher bits.
>>
>> I used the attached micro benchmark.
>> /(See attached file: ArraysFillTest.java)/
>>
>> Best regards,
>> --
>> Michihiro,
>> IBM Research - Tokyo
>>
>

From mbrandy at linux.vnet.ibm.com Fri Sep 29 21:00:13 2017
From: mbrandy at linux.vnet.ibm.com (Matthew Brandyberry)
Date: Fri, 29 Sep 2017 16:00:13 -0500
Subject: RFR(M) 8188165: PPC64: Optimize Unsafe.copyMemory and arraycopy
Message-ID:

This is specific to PPC64LE only.

The emphasis in the proposed code is on minimizing branches. Thus, this
code makes no attempt to avoid misaligned accesses and each block is
designed to copy as many elements as possible.

As one data point, this yields as much as a 13x improvement in
jbyte_disjoint_arraycopy for certain misaligned scenarios.

Bug: https://bugs.openjdk.java.net/browse/JDK-8188165
Webrev: http://cr.openjdk.java.net/~mbrandy/8188165/jdk10/v1/

Thanks,
-Matt

From vladimir.kozlov at oracle.com Fri Sep 29 21:44:10 2017
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 29 Sep 2017 14:44:10 -0700
Subject: RFR(M):8188139:PPC64: Superword Level Parallelization with VSX
In-Reply-To: <59CE9CD7.1070407@linux.vnet.ibm.com>
References: <59CE9CD7.1070407@linux.vnet.ibm.com>
Message-ID:

I am fine with it for these changes.

Thanks,
Vladimir

On 9/29/17 12:19 PM, Gustavo Romero wrote:
> Hi Vladimir,
>
> On 29-09-2017 16:00, Vladimir Kozlov wrote:
>> I looked on shared code. One comment - change in opto/type.cpp affects s390.
>
> Do you strongly oppose to split PPC64 and s390 like?
>
> diff -r e93ed1a09240 src/share/vm/opto/type.cpp
> --- a/src/share/vm/opto/type.cpp Tue Aug 08 22:57:34 2017 +0000
> +++ b/src/share/vm/opto/type.cpp Fri Sep 29 15:17:17 2017 -0400
> @@ -67,7 +67,13 @@
>    { Bad, T_ILLEGAL, "vectorx:", false, 0, relocInfo::none }, // VectorX
>    { Bad, T_ILLEGAL, "vectory:", false, 0, relocInfo::none }, // VectorY
>    { Bad, T_ILLEGAL, "vectorz:", false, 0, relocInfo::none }, // VectorZ
> -#elif defined(PPC64) || defined(S390)
> +#elif defined(PPC64)
> +  { Bad, T_ILLEGAL, "vectors:", false, 0, relocInfo::none }, // VectorS
> +  { Bad, T_ILLEGAL, "vectord:", false, Op_RegL, relocInfo::none }, // VectorD
> +  { Bad, T_ILLEGAL, "vectorx:", false, Op_VecX, relocInfo::none }, // VectorX
> +  { Bad, T_ILLEGAL, "vectory:", false, 0, relocInfo::none }, // VectorY
> +  { Bad, T_ILLEGAL, "vectorz:", false, 0, relocInfo::none }, // VectorZ
> +#elif defined(S390)
>    { Bad, T_ILLEGAL, "vectors:", false, 0, relocInfo::none }, // VectorS
>    { Bad, T_ILLEGAL, "vectord:", false, Op_RegL, relocInfo::none }, // VectorD
>    { Bad, T_ILLEGAL, "vectorx:", false, 0, relocInfo::none }, // VectorX
>
> Kind regards,
> Gustavo
>
>> Vladimir
>>
>> On 9/29/17 2:37 AM, Michihiro Horie wrote:
>>> Dear all,
>>>
>>> Would you please review the following change?
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8188139
>>> Webrev: http://cr.openjdk.java.net/~mhorie/8188139/webrev.00/
>>>
>>> This change introduces to use VSX for Superword Level Parallelization,
>>> concretely VSX instructions are emitted for Replicate[BSIFDL] nodes in ppc.ad.
>>> Since I am not familiar with the hotspot's register allocation and the TOC
>>> use in POWER, I would be very grateful to have any comments to improve the
>>> change.
>>>
>>> In addition, the change includes some minor fixes in
>>> assembler_ppc.inline.hpp. I think there are some instructions that should
>>> have 1u in higher bits.
>>>
>>> I used the attached micro benchmark.
>>> /(See attached file: ArraysFillTest.java)/
>>>
>>> Best regards,
>>> --
>>> Michihiro,
>>> IBM Research - Tokyo
>>>
>>
>

From vladimir.kozlov at oracle.com Fri Sep 29 22:50:46 2017
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 29 Sep 2017 15:50:46 -0700
Subject: RFR(M): 8166317: InterpreterCodeSize should be computed
In-Reply-To:
References: <50cda0ab-f403-372a-ce51-1a27d8821448@oracle.com>
 <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com>
 <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com>
 <7fee08f1-8304-3026-19e9-844e618e98ea@oracle.com>
Message-ID: <2bb4136a-8c0e-ac4c-0c03-af38ff79ab40@oracle.com>

I hit build failure when tried to push changes:

src/hotspot/share/code/codeBlob.hpp(162) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
src/hotspot/share/code/codeBlob.hpp(163) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data

I am going to fix it by casting (int):

+  void adjust_size(size_t used) {
+    _size = (int)used;
+    _data_offset = (int)used;
+    _code_end = (address)this + used;
+    _data_end = (address)this + used;
+  }

Note, CodeCache size can't be more than 2Gb (max_int) so such casting is fine.

Vladimir

On 9/6/17 6:20 AM, Volker Simonis wrote:
> On Tue, Sep 5, 2017 at 9:36 PM, wrote:
>>
>> I was going to make the same comment about the friend declaration in v1, so
>> v2 looks better to me. Looks good. Thank you for finding a solution to
>> this problem that we've had for a long time. I will sponsor this (remind me
>> if I forget after the 18th).
>>
>
> Thanks Coleen! I've updated
>
> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/
>
> in-place and added you as a second reviewer.
>
> Regards,
> Volker
>
>> thanks,
>> Coleen
>>
>> On 9/5/17 1:17 PM, Vladimir Kozlov wrote:
>>>
>>> On 9/5/17 9:49 AM, Volker Simonis wrote:
>>>>
>>>> On Fri, Sep 1, 2017 at 6:16 PM, Vladimir Kozlov wrote:
>>>>>
>>>>> May be add new CodeBlob's method to adjust sizes instead of directly
>>>>> setting them in CodeCache::free_unused_tail(). Then you would not need
>>>>> friend class CodeCache in CodeBlob.
>>>>>
>>>>
>>>> Changed as suggested (I didn't like the friend declaration as well :)
>>>>
>>>>> Also I think adjustment to header_size should be done in
>>>>> CodeCache::free_unused_tail() to limit scope of code who knows about
>>>>> blob layout.
>>>>>
>>>>
>>>> Yes, that's much cleaner. Please find the updated webrev here:
>>>>
>>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/
>>>
>>> Good.
>>>
>>>>
>>>> I've also found another "day 1" problem in StubQueue::next():
>>>>
>>>>    Stub* next(Stub* s) const { int i = index_of(s) + stub_size(s);
>>>> -                               if (i == _buffer_limit) i = 0;
>>>> +                               // Only wrap around in the non-contiguous case (see stubs.cpp)
>>>> +                               if (i == _buffer_limit && _queue_end < _buffer_limit) i = 0;
>>>>                                 return (i == _queue_end) ? NULL : stub_at(i);
>>>>                               }
>>>>
>>>> The problem was that the method was not prepared to handle the case
>>>> where _buffer_limit == _queue_end == _buffer_size which led to an
>>>> infinite recursion when iterating over a StubQueue with
>>>> StubQueue::next() until next() returns NULL (as this was for example
>>>> done with -XX:+PrintInterpreter). But with the new, trimmed CodeBlob
>>>> we run into exactly this situation.
>>>
>>> Okay.
>>>
>>>>
>>>> While doing this last fix I also noticed that "StubQueue::stubs_do()",
>>>> "StubQueue::queues_do()" and "StubQueue::register_queue()" don't seem
>>>> to be used anywhere in the open code base (please correct me if I'm
>>>> wrong). What do you think, maybe we should remove this code in a
>>>> follow up change if it is really not needed?
>>>
>>> register_queue() is used in constructor. Other 2 you can remove.
>>> stub_code_begin() and stub_code_end() are not used too -remove.
>>> I thought we run on linux with flag which warn about unused code.
>>>
>>>>
>>>> Finally, could you please run the new version through JPRT and sponsor
>>>> it once jdk10/hs will be opened again?
>>>
>>> Will do when jdk10 "consolidation" is finished. Please, remind me later if
>>> I forget.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> Thanks,
>>>> Volker
>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 9/1/17 8:46 AM, Volker Simonis wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've decided to split the fix for the 'CodeHeap::contains_blob()'
>>>>>> problem into its own issue "8187091: ReturnBlobToWrongHeapTest fails
>>>>>> because of problems in CodeHeap::contains_blob()"
>>>>>> (https://bugs.openjdk.java.net/browse/JDK-8187091) and started a new
>>>>>> review thread for discussing it at:
>>>>>>
>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028206.html
>>>>>>
>>>>>> So please lets keep this thread for discussing the interpreter code
>>>>>> size issue only. I've prepared a new version of the webrev which is
>>>>>> the same as the first one with the only difference that the change to
>>>>>> 'CodeHeap::contains_blob()' has been removed:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v1/
>>>>>>
>>>>>> Thanks,
>>>>>> Volker
>>>>>>
>>>>>> On Thu, Aug 31, 2017 at 6:35 PM, Volker Simonis wrote:
>>>>>>>
>>>>>>> On Thu, Aug 31, 2017 at 6:05 PM, Vladimir Kozlov wrote:
>>>>>>>>
>>>>>>>> Very good change. Thank you, Volker.
>>>>>>>>
>>>>>>>> About contains_blob(). The problem is that AOTCompiledMethod
>>>>>>>> allocated in CHeap and not in aot code section (which is RO):
>>>>>>>>
>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124
>>>>>>>>
>>>>>>>> It is allocated in CHeap after AOT library is loaded. Its code_begin()
>>>>>>>> points to AOT code section but AOTCompiledMethod* points outside it (to
>>>>>>>> normal malloced space) so you can't use (char*)blob address.
>>>>>>>>
>>>>>>>
>>>>>>> Thanks for the explanation - now I got it.
>>>>>>>
>>>>>>>> There are 2 ways to fix it, I think.
>>>>>>>> One is to add new field to CodeBlobLayout and set it to blob* address for
>>>>>>>> normal CodeCache blobs and to code_begin for AOT code.
>>>>>>>> Second is to use contains(blob->code_end() - 1) assuming that AOT code is
>>>>>>>> never zero.
>>>>>>>>
>>>>>>>
>>>>>>> I'll give it a try tomorrow and will send out a new webrev.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Volker
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>> On 8/31/17 5:43 AM, Volker Simonis wrote:
>>>>>>>>>
>>>>>>>>> On Thu, Aug 31, 2017 at 12:14 PM, Claes Redestad wrote:
>>>>>>>>>>
>>>>>>>>>> On 2017-08-31 08:54, Volker Simonis wrote:
>>>>>>>>>>>
>>>>>>>>>>> While working on this, I found another problem which is related to the
>>>>>>>>>>> fix of JDK-8183573 and leads to crashes when executing the JTreg test
>>>>>>>>>>> compiler/codecache/stress/ReturnBlobToWrongHeapTest.java.
>>>>>>>>>>>
>>>>>>>>>>> The problem is that JDK-8183573 replaced
>>>>>>>>>>>
>>>>>>>>>>>   virtual bool contains_blob(const CodeBlob* blob) const {
>>>>>>>>>>>     return low_boundary() <= (char*) blob && (char*) blob < high(); }
>>>>>>>>>>>
>>>>>>>>>>> by:
>>>>>>>>>>>
>>>>>>>>>>>   bool contains_blob(const CodeBlob* blob) const {
>>>>>>>>>>>     return contains(blob->code_begin()); }
>>>>>>>>>>>
>>>>>>>>>>> But that may be wrong in the corner case where the size of the
>>>>>>>>>>> CodeBlob's payload is zero (i.e. the CodeBlob consists only of the
>>>>>>>>>>> 'header' - i.e. the C++ object itself) because in that case
>>>>>>>>>>> CodeBlob::code_begin() points right behind the CodeBlob's header which
>>>>>>>>>>> is a memory location which doesn't belong to the CodeBlob anymore.
>>>>>>>>>>
>>>>>>>>>> I recall this change was somehow necessary to allow merging
>>>>>>>>>> AOTCodeHeap::contains_blob and CodeHeap::contains_blob into
>>>>>>>>>> one devirtualized method, so you need to ensure all AOT tests
>>>>>>>>>> pass with this change (on linux-x64).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> All of hotspot/test/aot and hotspot/test/jvmci executed and passed
>>>>>>>>> successfully. Are there any other tests I should check?
>>>>>>>>>
>>>>>>>>> That said, it is a little hard to follow the stages of your change. It
>>>>>>>>> seems like
>>>>>>>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.00/
>>>>>>>>> was reviewed [1] but then finally the slightly changed version from
>>>>>>>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.01/ was
>>>>>>>>> checked in and linked to the bug report.
>>>>>>>>>
>>>>>>>>> The first, reviewed version of the change still had a correct version
>>>>>>>>> of 'CodeHeap::contains_blob(const CodeBlob* blob)' while the second,
>>>>>>>>> checked in version has the faulty version of that method.
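
The zero-payload corner case discussed in the quoted text can be reproduced outside HotSpot. What follows is only a minimal sketch under stated assumptions, not the real CodeBlob or CodeHeap code: Blob, Heap, CheckResult and check_zero_payload_blob_at_heap_end() are hypothetical stand-ins, and the header-only blob is assumed to occupy the last header-sized slot of its heap, so that the address right behind the header is the heap's exclusive upper bound.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for a code blob: a header optionally followed by payload.
struct Blob {
  std::size_t payload_size;   // number of payload bytes after the header

  // First byte after the header, analogous in spirit to CodeBlob::code_begin().
  const char* code_begin() const {
    return reinterpret_cast<const char*>(this) + sizeof(Blob);
  }
};

// Hypothetical stand-in for a code heap covering the range [low, high).
struct Heap {
  const char* low;    // inclusive lower bound
  const char* high;   // exclusive upper bound

  bool contains(const void* p) const {
    const char* c = static_cast<const char*>(p);
    return low <= c && c < high;
  }

  // Checks the address of the blob header itself.
  bool contains_blob_by_header(const Blob* blob) const {
    return contains(blob);
  }

  // Checks the address right behind the header instead.
  bool contains_blob_by_code_begin(const Blob* blob) const {
    return contains(blob->code_begin());
  }
};

struct CheckResult {
  bool by_header;
  bool by_code_begin;
};

// Places a header-only blob in the last slot of a small heap and reports
// what each inclusion check says about it.
inline CheckResult check_zero_payload_blob_at_heap_end() {
  alignas(Blob) static char storage[4 * sizeof(Blob)];
  Heap heap = { storage, storage + sizeof(storage) };
  Blob* last = new (storage + 3 * sizeof(Blob)) Blob{0};
  assert(last->payload_size == 0);
  CheckResult result = { heap.contains_blob_by_header(last),
                         heap.contains_blob_by_code_begin(last) };
  return result;
}
```

Under these assumptions the header-based check accepts the blob while the code_begin()-based check rejects it, because code_begin() equals the heap's upper bound. Whether the same layout can occur in a real CodeHeap depends on details the sketch does not model.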
>>>>>>>>>
>>>>>>>>> I don't know why you finally did that change to 'contains_blob()' but
>>>>>>>>> I don't see any reason why we shouldn't be able to directly use the
>>>>>>>>> blob's address for inclusion checking. From what I understand, it
>>>>>>>>> should ALWAYS be contained in the corresponding CodeHeap so no reason
>>>>>>>>> to mess with 'CodeBlob::code_begin()'.
>>>>>>>>>
>>>>>>>>> Please let me know if I'm missing something.
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-July/026624.html
>>>>>>>>>
>>>>>>>>>> I can't help to wonder if we'd not be better served by disallowing
>>>>>>>>>> zero-sized payloads. Is this something that can ever actually
>>>>>>>>>> happen except by abuse of the white box API?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The corresponding test (ReturnBlobToWrongHeapTest.java) specifically
>>>>>>>>> wants to allocate "segment sized" blocks which is most easily achieved
>>>>>>>>> by allocating zero-sized CodeBlobs. And I think there's nothing wrong
>>>>>>>>> about it if we handle the inclusion tests correctly.
>>>>>>>>>
>>>>>>>>> Thank you and best regards,
>>>>>>>>> Volker
>>>>>>>>>
>>>>>>>>>> /Claes
>>
>>

From volker.simonis at gmail.com Sat Sep 30 04:28:52 2017
From: volker.simonis at gmail.com (Volker Simonis)
Date: Sat, 30 Sep 2017 04:28:52 +0000
Subject: RFR(M): 8166317: InterpreterCodeSize should be computed
In-Reply-To: <2bb4136a-8c0e-ac4c-0c03-af38ff79ab40@oracle.com>
References: <50cda0ab-f403-372a-ce51-1a27d8821448@oracle.com>
 <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com>
 <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com>
 <7fee08f1-8304-3026-19e9-844e618e98ea@oracle.com>
 <2bb4136a-8c0e-ac4c-0c03-af38ff79ab40@oracle.com>
Message-ID:

Hi Vladimir,

thanks a lot for remembering these changes!

Regards,
Volker

On Fri, 29 Sep 2017 at 15:47, Vladimir Kozlov wrote:

> I hit build failure when tried to push changes:
>
> src/hotspot/share/code/codeBlob.hpp(162) : warning C4267: '=' : conversion
> from 'size_t' to 'int', possible loss of data
> src/hotspot/share/code/codeBlob.hpp(163) : warning C4267: '=' : conversion
> from 'size_t' to 'int', possible loss of data
>
> I am going to fix it by casting (int):
>
> +  void adjust_size(size_t used) {
> +    _size = (int)used;
> +    _data_offset = (int)used;
> +    _code_end = (address)this + used;
> +    _data_end = (address)this + used;
> +  }
>
> Note, CodeCache size can't be more than 2Gb (max_int) so such casting is fine.
>
> Vladimir

From erik.osterlund at oracle.com Sat Sep 30 21:02:20 2017
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Sat, 30 Sep 2017 23:02:20 +0200
Subject: RFR (M): 8187977: Generalize Atomic::xchg to use templates
In-Reply-To: <7fa929c1-108a-6aec-f44c-0fcfc0b49aad@oracle.com>
References: <1506599782.27149.154.camel@oracle.com>
 <6436642A-1806-429F-81CD-06C96E78EECD@oracle.com>
 <59CE2445.2010303@oracle.com>
 <7fa929c1-108a-6aec-f44c-0fcfc0b49aad@oracle.com>
Message-ID: <1506805340.27149.261.camel@oracle.com>

Hi Coleen,

Thanks for the review. I will file an RFE for removing the _ptr
variants of Atomic.

/Erik

On Fri, 2017-09-29 at 15:17 -0400, coleen.phillimore at oracle.com wrote:
> Erik, This change looks good to me.
>
> Do we want to deprecate xchg_ptr like cmpxchg_ptr? Is there an RFE
> filed for that?
>
> On 9/29/17 6:45 AM, Erik Österlund wrote:
> >
> > Hi Kim,
> >
> > Thanks for looking at this.
> > > > Incremental webrev: > > http://cr.openjdk.java.net/~eosterlund/8187977/webrev.00_01/ > > > > Full webrev: > > http://cr.openjdk.java.net/~eosterlund/8187977/webrev.01/ > > > > On 2017-09-28 21:02, Kim Barrett wrote: > > > > > > > > > > > On Sep 28, 2017, at 7:56 AM, Erik ?sterlund? > > > > wrote: > > > > > > > > Hi all, > > > > > > > > The time has come to generalize more atomics. > > > > > > > > I have modelled Atomic::xchg to systematically do what > > > > Atomic::cmpxchg > > > > did but for xchg. > > > > > > > > Bug: > > > > https://bugs.openjdk.java.net/browse/JDK-8187977 > > > > > > > > Webrev: > > > > http://cr.openjdk.java.net/~eosterlund/8187977/webrev.00/ > > > > > > > > Testing: mach5 hs-tier3 and JPRT > > > > > > > > Thanks, > > > > /Erik > > > ================================================================= > > > =============? > > > > > > src/hotspot/share/runtime/atomic.hpp > > > ? 312?? // A default definition is not provided, so > > > specializations? > > > must be > > > ? 313?? // provided for: > > > ? 314?? //?? T operator()(T, T volatile*) const > > > > > > That description is not correct. The default definition of > > > PlatformXchg is at line 410. And it makes no sense to talk about > > > specializing a function for a class that doesn't exist. This > > > should > > > instead be using similar wording to that in the description of > > > PlatformCmpxchg. > > Fixed. > > > > > > > > ================================================================= > > > =============? > > > > > > src/hotspot/share/runtime/atomic.hpp > > > > > > There are a lot of similarities between the handling of the > > > exchange_value by cmpxchg and xchg, and I expect Atomic::store > > > and > > > OrderAccess::release_store and variants to also be similar. > > > (Actually, > > > xchg and store may look *very* similar.) It would be nice to > > > think > > > about whether there's some refactoring that could be used to > > > reduce > > > the amount of code involved. 
I don't have a specific suggestion > > > yet > > > though.? For now, this is just something to think about. > > I would not like to mix xchg and load/store implementations in > > Atomic.? > > The reason for that is that I would rather share the > > implementation? > > between Atomic::load/store and OrderAccess::*load/store*, because > > they? > > are even more similar. I have that already in my patch queue. > > > I think I would have to see understand what the duplication is before > I? > want to see more lines of metaprogramming tricks. > > Do you mean the translate recover/decay here? > > 697 template > ? 698 struct Atomic::XchgImpl< > ? 699???T, T, > ? 700???typename > EnableIf::value>::type> > ? 701???VALUE_OBJ_CLASS_SPEC > ? 702 { > ??703???T operator()(T exchange_value, T volatile* dest) const { > ? 704?????typedef PrimitiveConversions::Translate Translator; > ? 705?????typedef typename Translator::Decayed Decayed; > ? 706?????STATIC_ASSERT(sizeof(T) == sizeof(Decayed)); > ? 707?????return Translator::recover( > ? 708???????xchg(Translator::decay(exchange_value), > ? 709????????????reinterpret_cast(dest))); > ? 710???} > > > I can imagine this can only have more levels of logical indirection > if? > generalized more. > > I have to admit that I hate the VALUE_OBJ_CLASS_SPEC in these > classes.??? > There's already enough decoder ring work for each line without this? > distracting (mostly useless) macro here.?? Originally, this meant > that? > this class is embedded in another.? It doesn't really help the code > in? > any way here.?? Global operator new already has an assert in the > very? > unlikely case that somebody did a > > Atomic::XchgImpl* x = new Atomic::XchgImpl();?? > //? > did I write that correctly? > > > > > > > > > ================================================================= > > > =============? > > > > > > src/hotspot/share/compiler/compileBroker.hpp > > > ? 335???? Atomic::xchg(jint(shutdown_compilation),? 
> > >           &_should_compile_new_jobs);
> > > [Added jint conversion.]
> > >
> > > Pre-existing:
> > >
> > > Why is this an Atomic::xchg at all, since the old value isn't used.
> > > Seems like it could be a release_store.
> > >
> > > There also seems to be some data typing problems around
> > > _should_compile_new_jobs.  Shouldn't that variable be a
> > > CompilerActivity?  That wouldn't have worked previously, and we're
> > > probably still not ready for it if the xchg gets changed to a
> > > release_store, but eventually... so there probably ought to be a bug
> > > for it.
> > I will file a bug for that to be reconsidered later on.
> >
> I would leave this to the compiler group to look at.
>
> Thanks,
> Coleen
>
> >
> > >
> > >
> > > ==============================================================================
> > >
> > > src/hotspot/os_cpu/solaris_x86/atomic_solaris_x86.hpp
> > >   146 template<>
> > >   147 template<typename T>
> > >   148 inline T Atomic::PlatformXchg<8>::operator()(T exchange_value,
> > >
> > > Please move this definition up near the PlatformXchg<4> definition,
> > > e.g. around line 96.
> > Fixed.
> >
> > >
> > >
> > > ==============================================================================
> > >
> > > src/hotspot/os_cpu/linux_arm/atomic_linux_arm.hpp
> > >
> > > I would prefer the two PlatformXchg specializations be adjacent.  If
> > > the 4byte specialization was after PlatformAdd<8>, reviewing would
> > > have been easier, and the different Add would be adjacent and the
> > > different Xchg would be adjacent.  The only cost is an extra #ifdef
> > > AARCH64 block around the 8byte Xchg.
> > Fixed.
> >
> > >
> > > ==============================================================================
> > >
> > > src/hotspot/os_cpu/bsd_zero/atomic_bsd_zero.hpp
> > >   216   return xchg_using_helper(arm_lock_test_and_set, exchange_value, dest);
> > >   219   return xchg_using_helper(m68k_lock_test_and_set, exchange_value, dest);
> > >
> > > arm/m68k_lock_test_and_set expect their arguments in the other order,
> > > and need to be updated.  There was a similar change for cmpxchg.
> > >
> > > Same problem in atomic_linux_zero.hpp.
> > Fixed.
> >
> > Thanks for the review.
> >
> > /Erik
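
The argument-order point raised for the zero ports can be illustrated with a small sketch. This is not the HotSpot code: lock_test_and_set_sketch and xchg_using_helper_sketch are hypothetical stand-ins, and the GCC/Clang __atomic_exchange_n builtin is assumed only to make the example self-contained. The idea is that a generic wrapper fixes the helper signature as (exchange_value, dest), so a helper still declared as (dest, exchange_value) no longer matches and has to be updated.

```cpp
#include <cassert>

// Hypothetical stand-in for a port helper such as arm_lock_test_and_set,
// already updated to take (new value, destination). The GCC/Clang builtin
// __atomic_exchange_n is an assumption made purely for illustration.
static inline int lock_test_and_set_sketch(int newval, int volatile* ptr) {
  return __atomic_exchange_n(ptr, newval, __ATOMIC_SEQ_CST);
}

// Simplified model of a generic wrapper: the helper's signature is fixed as
// fn(exchange_value, dest), so a helper declared as fn(dest, exchange_value)
// would fail to compile when passed here.
template<typename T>
inline T xchg_using_helper_sketch(T (*fn)(T, T volatile*),
                                  T exchange_value,
                                  T volatile* dest) {
  return fn(exchange_value, dest);
}

// Small usage check: the call returns the old value and stores the new one.
inline int exchange_demo() {
  int volatile cell = 7;
  int old_value =
      xchg_using_helper_sketch<int>(lock_test_and_set_sketch, 42, &cell);
  assert(cell == 42);
  return old_value;
}
```

With an int value and an int pointer the mismatch is caught at compile time; if both parameters had the same type, a swapped order would compile silently, which is one reason to keep the helper and wrapper signatures in step.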