From coleen.phillimore at oracle.com Fri Sep 1 00:03:31 2017
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Thu, 31 Aug 2017 20:03:31 -0400
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <59A804F8.9000501@oracle.com>
References: <59A804F8.9000501@oracle.com>
Message-ID: <331bf921-243a-9c28-eb0f-c945bdc11384@oracle.com>

Hi, I'm trying to parse the templates to review this but maybe it's
convention but decoding these with parameters that are single capital
letters make reading the template very difficult. There are already a
lot of non-alphanumeric characters. When the letter is T, that is
expected by convention, but D or especially I makes it really hard.
Can these be normalized to all use T when there is only one template
parameter? It'll be clear that T* is a pointer and T is an integer
without having it be P.

+template
+struct Atomic::IncImpl::value>::type> VALUE_OBJ_CLASS_SPEC {
+  void operator()(I volatile* dest) const {
+    typedef IntegralConstant Adjustment;
+    typedef PlatformInc PlatformOp;
+    PlatformOp()(dest);
+  }
+};

This one isn't as difficult, because it's short, but it would be
faster to understand with T.

+template
+struct Atomic::IncImpl::value>::type> VALUE_OBJ_CLASS_SPEC {
+  void operator()(T volatile* dest) const {
+    typedef IntegralConstant Adjustment;
+    typedef PlatformInc PlatformOp;
+    PlatformOp()(dest);
+  }
+};

+template<>
+struct Atomic::IncImpl VALUE_OBJ_CLASS_SPEC {
+  void operator()(jshort volatile* dest) const {
+    add(jshort(1), dest);
+  }
+};

Did I already ask if this could be changed to u2 rather than jshort?
Or is that the follow-on RFE?

+// Helper for platforms wanting a constant adjustment.
+template
+struct Atomic::IncUsingConstant VALUE_OBJ_CLASS_SPEC {
+  typedef PlatformInc Derived;

I can't find the caller of this. Is it really a lot faster than
having the platform independent add(1, T) / add(-1, T) to make all
this code worth having? How is this called? I couldn't parse the
trick. Atomic::inc() is always a "constant adjustment" so I'm
confused about what the comment means and what motivates all the asm
code. Do these platform implementations exist because they don't
have twos complement for integer representation? really?

Also, the function name This() is really disturbing and distracting.
Can it be called some verb() representing what it does?
cast_to_derived()?

+  template
+  void operator()(I volatile* dest) const {
+    This()->template inc(dest);
+  }

I didn't know you could put "template" there. What does this call?

Rather than I for integer case, and P for pointer case, can you add a
one line comment above this like:
// Helper for integer types
and
// Helper for pointer types

Small local comments would be really helpful for many of these
functions. Just to get more english words in there... Since Kim's
on vacation can you help me understand this code and add comments so I
remember the reasons for some of this?

Thanks!
Coleen

On 8/31/17 8:45 AM, Erik Österlund wrote:
> Hi everyone,
>
> Bug ID:
> https://bugs.openjdk.java.net/browse/JDK-8186838
>
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>
> The time has come for the next step in generalizing Atomic with
> templates. Today I will focus on Atomic::inc/dec.
>
> I have tried to mimic the new Kim style that seems to have been
> universally accepted. Like Atomic::add and Atomic::cmpxchg, the
> structure looks like this:
>
> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object
> that performs some basic type checks.
> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can define
> the operation arbitrarily for a given platform. The default
> implementation if not specialized for a platform is to call
> Atomic::add. So only platforms that want to do something different
> than that as an optimization have to provide a specialization.
> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec
> to be more optimized may inherit from a helper class
> IncUsingConstant/DecUsingConstant. This helper helps performing the
> necessary computation what the increment/decrement should be after
> pointer scaling using CRTP. The PlatformInc/PlatformDec operation then
> only needs to define an inc/dec member function, and will then get all
> the context information necessary to generate a more optimized
> implementation. Easy peasy.
>
> It is worth noticing that the generalized Atomic::dec operation
> assumes a two's complement integer machine and potentially sends the
> unary negative of a potentially unsigned type to Atomic::add. I have
> the following comments about this:
> 1) We already assume in other code that two's complement integers must
> be present.
> 2) A machine that does not have two's complement integers may still
> simply provide a specialization that solves the problem in a different
> way.
> 3) The alternative that does not make assumptions about that would use
> the good old IntegerTypes::cast_to_signed metaprogramming stuff, and I
> seem to recall we thought that was a bit too involved and complicated.
> This is the reason why I have chosen to use unary minus on the
> potentially unsigned type in the shared helper code that sends the
> decrement as an addend to Atomic::add.
>
> It would also be nice if somebody with access to PPC and s390 machines
> could try out the relevant changes there so I do not accidentally
> break those platforms. I have blind-coded the addition of the
> immediate values passed in to the inline assembly in a way that I
> think looks like it should work.
>
> Testing:
> RBT hs-tier3, JPRT --testset hotspot
>
> Thanks,
> /Erik

From david.holmes at oracle.com Fri Sep 1 00:49:43 2017
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 1 Sep 2017 10:49:43 +1000
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <59A804F8.9000501@oracle.com>
References: <59A804F8.9000501@oracle.com>
Message-ID: <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com>

Hi Erik,

Sorry but this one is really losing me.

What is the role of Adjustment ??

How are inc/dec anything but "using constant" ??

Why do we special case jshort??

This is indecipherable to normal people ;-)

  This()->template inc(dest);

For something as trivial as adding or subtracting 1 the template
machinations here are just mind boggling!

Cheers,
David

From rohitarulraj at gmail.com Fri Sep 1 04:57:34 2017
From: rohitarulraj at gmail.com (Rohit Arul Raj)
Date: Fri, 1 Sep 2017 10:27:34 +0530
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To:
References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com>

Message-ID:

On Fri, Sep 1, 2017 at 3:01 AM, David Holmes wrote:
> Hi Rohit,
>
> I think the patch needs updating for jdk10 as I already see a lot of logic
> around UseSHA in vm_version_x86.cpp.
>
> Thanks,
> David
>

Thanks David, I will update the patch wrt JDK10 source base, test and
resubmit for review.

Regards,
Rohit

>
> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>
>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes wrote:
>>>
>>> Hi Rohit,
>>>
>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>
>>>> I would like an volunteer to review this patch (openJDK9) which sets
>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with
>>>> the commit process.
>>>>
>>>> Webrev:
>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>
>>> Unfortunately patches can not be accepted from systems outside the
>>> OpenJDK infrastructure and ...
>>>
>>>> I have also attached the patch (hg diff -g) for reference.
>>>
>>> ... unfortunately patches tend to get stripped by the mail servers. If
>>> the patch is small please include it inline. Otherwise you will need to
>>> find an OpenJDK Author who can host it for you on cr.openjdk.java.net.
>>>
>>>> 3) I have done regression testing using jtreg ($make default) and
>>>> didnt find any regressions.
>>>
>>> Sounds good, but until I see the patch it is hard to comment on testing
>>> requirements.
>>>
>>> Thanks,
>>> David
>>
>> Thanks David,
>> Yes, it's a small patch.
>>
>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>> b/src/cpu/x86/vm/vm_version_x86.cpp
>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>> @@ -1051,6 +1051,22 @@
>>        }
>>        FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>      }
>> +    if (supports_sha()) {
>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>> +        FLAG_SET_DEFAULT(UseSHA, true);
>> +      }
>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics ||
>> UseSHA512Intrinsics) {
>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>> +        warning("SHA instructions are not available on this CPU");
>> +      }
>> +      FLAG_SET_DEFAULT(UseSHA, false);
>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>> +    }
>>
>>      // some defaults for AMD family 15h
>>      if ( cpu_family() == 0x15 ) {
>> @@ -1072,11 +1088,43 @@
>>      }
>>
>>  #ifdef COMPILER2
>> -    if (MaxVectorSize > 16) {
>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>        FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>      }
>>  #endif // COMPILER2
>> +
>> +    // Some defaults for AMD family 17h
>> +    if ( cpu_family() == 0x17 ) {
>> +      // On family 17h processors use XMM and UnalignedLoadStores for
>> Array Copy
>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>> +        UseXMMForArrayCopy = true;
>> +      }
>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>> +        UseUnalignedLoadStores = true;
>> +      }
>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>> +        UseBMI2Instructions = true;
>> +      }
>> +      if (MaxVectorSize > 32) {
>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>> +      }
>> +      if (UseSHA) {
>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>> +        } else if (UseSHA512Intrinsics) {
>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>> functions not available on this CPU.");
>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>> +        }
>> +      }
>> +#ifdef COMPILER2
>> +      if (supports_sse4_2()) {
>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>> +        }
>> +      }
>> +#endif
>> +    }
>>    }
>>
>>    if( is_intel() ) { // Intel cpus specific settings
>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>> b/src/cpu/x86/vm/vm_version_x86.hpp
>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>> @@ -513,6 +513,16 @@
>>          result |= CPU_LZCNT;
>>        if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>          result |= CPU_SSE4A;
>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>> +        result |= CPU_BMI2;
>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>> +        result |= CPU_HT;
>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>> +        result |= CPU_ADX;
>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>> +        result |= CPU_SHA;
>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>> +        result |= CPU_FMA;
>>      }
>>      // Intel features.
>>      if(is_intel()) {
>>
>> Regards,
>> Rohit
>>
>

From rohitarulraj at gmail.com Fri Sep 1 05:14:44 2017
From: rohitarulraj at gmail.com (Rohit Arul Raj)
Date: Fri, 1 Sep 2017 10:44:44 +0530
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To: <5f8def30-1554-29a5-dde7-62b9940d0161@oracle.com>
References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com>

<5f8def30-1554-29a5-dde7-62b9940d0161@oracle.com>
Message-ID:

Hello Vladimir,

> But it also mean that AMD will have to do Java testing for this new platform
> and be responsible for it.

Can you please elaborate on this a little more?
What all Java test suites would you like us to test from our end?

> In a future we may forward this CPU related problems to you to analyze and
> fix.

Sure, looking forward to it.

Regards,
Rohit

From vladimir.kozlov at oracle.com Fri Sep 1 07:19:18 2017
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 1 Sep 2017 00:19:18 -0700
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To:
References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com>

<5f8def30-1554-29a5-dde7-62b9940d0161@oracle.com>
Message-ID: <95adc793-0faf-9379-ae56-8dae1d5c498f@oracle.com>

On 8/31/17 10:14 PM, Rohit Arul Raj wrote:
> Hello Vladimir,
>
>> But it also mean that AMD will have to do Java testing for this new platform
>> and be responsible for it.
>
> Can you please elaborate on this a little more?
> What all Java test suites would you like us to test from our end?

First, I am talking only about testing on your platform. In this case
it is AMD 17h.

You need to build and use fastdebug JVM for testing:

configure --with-debug-level=fastdebug

You need to make sure to run hotspot and jdk jtreg tests. At least next
set of tests:

make test JOBS=1 TEST_JOBS=1 TEST="hotspot_compiler hotspot_gc hotspot_runtime hotspot_serviceability hotspot_misc jdk_util jdk_lang"

It will take time. You can try to increase JOBS=1 and TEST_JOBS=1
numbers to run tests in parallel but depending on memory and swap sizes
it may not work.

In addition to that would be nice if you track performance changes with
specjvm2008 and specjbb2015 on your cpu to avoid regression when you
apply new changes or pull changes from OpenJDK.

If you have questions, please ask.

>
>> In a future we may forward this CPU related problems to you to analyze and
>> fix.
>
> Sure, looking forward to it.

Best regards,
Vladimir

From erik.osterlund at oracle.com Fri Sep 1 08:40:06 2017
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Fri, 1 Sep 2017 10:40:06 +0200
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <331bf921-243a-9c28-eb0f-c945bdc11384@oracle.com>
References: <59A804F8.9000501@oracle.com> <331bf921-243a-9c28-eb0f-c945bdc11384@oracle.com>
Message-ID: <59A91CE6.2080206@oracle.com>

Hi Coleen,

Thank you for taking your time to review this.

On 2017-09-01 02:03, coleen.phillimore at oracle.com wrote:
>
> Hi, I'm trying to parse the templates to review this but maybe it's
> convention but decoding these with parameters that are single capital
> letters make reading the template very difficult. There are already a
> lot of non-alphanumeric characters. When the letter is T, that is
> expected by convention, but D or especially I makes it really hard.
> Can these be normalized to all use T when there is only one template
> parameter? It'll be clear that T* is a pointer and T is an integer
> without having it be P.

I apologize the names of the template parameters are hard to
understand. For what it's worth, I am only consistently applying Kim's
conventions here. It seemed like a bad idea to violate conventions
already set up - that would arguably be more confusing.

The convention from earlier work by Kim is:
D: Type of destination
I: Operand type that has to be an integral type
P: Operand type that is a pointer element type
T: Generic operand type, may be integral or pointer type

Personally, I do not mind this convention. It is more specific and
annotates things we know about the type into the name of the type.
Do you want me to:

1) Keep the convention, now that I have explained what the convention
is and why it is your friend
2) Break the convention for this change only making the naming
inconsistent
3) Change the convention throughout consistently, including all earlier
work from Kim

>
> +template
> +struct Atomic::IncImpl EnableIf::value>::type> VALUE_OBJ_CLASS_SPEC {
> +  void operator()(I volatile* dest) const {
> +    typedef IntegralConstant Adjustment;
> +    typedef PlatformInc PlatformOp;
> +    PlatformOp()(dest);
> +  }
> +};
>
> This one isn't as difficult, because it's short, but it would be
> faster to understand with T.
>
> +template
> +struct Atomic::IncImpl EnableIf::value>::type> VALUE_OBJ_CLASS_SPEC {
> +  void operator()(T volatile* dest) const {
> +    typedef IntegralConstant Adjustment;
> +    typedef PlatformInc PlatformOp;
> +    PlatformOp()(dest);
> +  }
> +};
>
> +template<>
> +struct Atomic::IncImpl VALUE_OBJ_CLASS_SPEC {
> +  void operator()(jshort volatile* dest) const {
> +    add(jshort(1), dest);
> +  }
> +};
>
>
> Did I already ask if this could be changed to u2 rather than jshort?
> Or is that the follow-on RFE?

That is a follow-on RFE.

> +// Helper for platforms wanting a constant adjustment.
> +template
> +struct Atomic::IncUsingConstant VALUE_OBJ_CLASS_SPEC {
> +  typedef PlatformInc Derived;
>
>
> I can't find the caller of this. Is it really a lot faster than
> having the platform independent add(1, T) / add(-1, T) to make all
> this code worth having? How is this called? I couldn't parse the
> trick. Atomic::inc() is always a "constant adjustment" so I'm
> confused about what the comment means and what motivates all the asm
> code. Do these platform implementations exist because they don't
> have twos complement for integer representation? really?

This is used by some x86, PPC and s390 platforms. Personally I question
its usefulness for x86. I believe it might be one of those things where
we ran some benchmarks a decade ago and concluded that it was slightly
faster to have a slimmed path for Atomic::inc rather than reusing
Atomic::add.

I did not initially want to bring this up as it seems like none of my
business, but now that the question has been asked about differences, I
could not help but notice the advertised "leading sync" convention of
Atomic::inc on PPC is not respected. That is, there is no "sync" fence
before the atomic increment, as required by the specified semantics.
There is not even a leading "lwsync". The corresponding Atomic::add
operation though, does have leading lwsync (unlike Atomic::inc). Now
this should arguably be reinforced to sync rather than lwsync to respect
the advertised semantics of both Atomic::add and Atomic::inc on PPC.
Hopefully that statement will not turn into a long unrelated mailing
thread...

Conclusively though, there is definitely a substantial difference in the
fencing comparing the PPC implementation of Atomic::inc to Atomic::add.
Whether either one of them conforms to intended semantics or not is a
different matter - one that I was hoping not to have to deal with in
this RFE as I am merely templateifying what was already there, without
judging the existing specializations. And it is my observation that as
the code looks now, we would incur a bunch of more fencing compared to
what the code does today on PPC.

> Also, the function name This() is really disturbing and distracting.
> Can it be called some verb() representing what it does?
> cast_to_derived()?
>
> +  template
> +  void operator()(I volatile* dest) const {
> +    This()->template inc(dest);
> +  }
>

Yes, I will change the name accordingly as you suggest.

> I didn't know you could put "template" there.
It is required to put the template keyword before the member function name when calling a template member function with explicit template parameters (as opposed to implicitly inferred template parameters) on a template type. > What does this call? This calls the platform-defined intrinsic that is defined in the platform files - the one that contains the inline assembly. > Rather than I for integer case, and P for pointer case, can you add a > one line comment above this like: > // Helper for integer types > and > // Helper for pointer types Or perhaps we could do both? Nevertheless, I will add these comments. But as per the discussion above, I would be happy if we could keep the convention that Kim has already set up for the template type names. > Small local comments would be really helpful for many of these > functions. Just to get more english words in there... Since Kim's > on vacation can you help me understand this code and add comments so I > remember the reasons for some of this? Sure - I will decorate the code with some comments to help understanding. I will send an updated webrev when I get your reply regarding the typename naming convention verdict. Thanks for the review! /Erik > > Thanks! > Coleen > > > On 8/31/17 8:45 AM, Erik ?sterlund wrote: >> Hi everyone, >> >> Bug ID: >> https://bugs.openjdk.java.net/browse/JDK-8186838 >> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >> >> The time has come for the next step in generalizing Atomic with >> templates. Today I will focus on Atomic::inc/dec. >> >> I have tried to mimic the new Kim style that seems to have been >> universally accepted. Like Atomic::add and Atomic::cmpxchg, the >> structure looks like this: >> >> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object >> that performs some basic type checks. >> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can >> define the operation arbitrarily for a given platform. 
The default >> implementation if not specialized for a platform is to call >> Atomic::add. So only platforms that want to do something different >> than that as an optimization have to provide a specialization. >> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec >> to be more optimized may inherit from a helper class >> IncUsingConstant/DecUsingConstant. This helper helps performing the >> necessary computation what the increment/decrement should be after >> pointer scaling using CRTP. The PlatformInc/PlatformDec operation >> then only needs to define an inc/dec member function, and will then >> get all the context information necessary to generate a more >> optimized implementation. Easy peasy. >> >> It is worth noticing that the generalized Atomic::dec operation >> assumes a two's complement integer machine and potentially sends the >> unary negative of a potentially unsigned type to Atomic::add. I have >> the following comments about this: >> 1) We already assume in other code that two's complement integers >> must be present. >> 2) A machine that does not have two's complement integers may still >> simply provide a specialization that solves the problem in a >> different way. >> 3) The alternative that does not make assumptions about that would >> use the good old IntegerTypes::cast_to_signed metaprogramming stuff, >> and I seem to recall we thought that was a bit too involved and >> complicated. >> This is the reason why I have chosen to use unary minus on the >> potentially unsigned type in the shared helper code that sends the >> decrement as an addend to Atomic::add. >> >> It would also be nice if somebody with access to PPC and s390 >> machines could try out the relevant changes there so I do not >> accidentally break those platforms. I have blind-coded the addition >> of the immediate values passed in to the inline assembly in a way >> that I think looks like it should work. 
>>
>> Testing:
>> RBT hs-tier3, JPRT --testset hotspot
>>
>> Thanks,
>> /Erik
>

From erik.osterlund at oracle.com Fri Sep 1 09:29:58 2017
From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=)
Date: Fri, 1 Sep 2017 11:29:58 +0200
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com>
References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com>
Message-ID: <59A92896.9010604@oracle.com>

Hi David,

On 2017-09-01 02:49, David Holmes wrote:
> Hi Erik,
>
> Sorry but this one is really losing me.
>
> What is the role of Adjustment ??

Adjustment represents the increment/decrement value as an IntegralConstant - your template friend for passing around a constant with both a specified type and value in templates. The type of the increment/decrement is the type of the destination when the destination is an integral type, otherwise if it is a pointer type, the increment/decrement type is ptrdiff_t.

> How are inc/dec anything but "using constant" ??

I was also a bit torn on that name (I assume you are referring to IncUsingConstant/DecUsingConstant). It was hard to find a name that depicted what this platform helper does. I considered calling the helper something with immediate in the name because it is really used to embed the constant as immediate values in inline assembly today. But then again that seemed too specific, as it is not completely obvious platform specializations will use it in that way. One might just want to specialize this to send it into some compiler Atomic::inc intrinsic for example. Do you have any other preferred names? Here are a few possible names for IncUsingConstant:

IncUsingScaledConstant
IncUsingAdjustedConstant
IncUsingPlatformHelper

Any favourites?

> Why do we special case jshort??

To be consistent with the special case of Atomic::add on jshort. Do you want it removed?
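
For readers following along, the role of Adjustment described above can be sketched roughly as follows. This is only an illustration and not the code from the webrev: AdjustmentFor and plain_inc are hypothetical names, std::integral_constant stands in for HotSpot's IntegralConstant, and the values 1 and sizeof(P) are an assumption based on the "pointer scaling" mentioned in the RFR. The increment here is deliberately non-atomic; it only shows how a type and a value travel together as one template argument.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper: maps a destination type to its "Adjustment",
// a compile-time constant carrying both a type and a value.
template<typename T, typename Enable = void>
struct AdjustmentFor;  // primary template intentionally left undefined

// Integral destination: the adjustment has the destination's own type, value 1.
template<typename I>
struct AdjustmentFor<I, typename std::enable_if<std::is_integral<I>::value>::type> {
  typedef std::integral_constant<I, I(1)> type;
};

// Pointer destination: the adjustment is a ptrdiff_t scaled by the pointee size
// (assumed scaling; the thread only says "pointer scaling").
template<typename P>
struct AdjustmentFor<P*, void> {
  typedef std::integral_constant<std::ptrdiff_t, std::ptrdiff_t(sizeof(P))> type;
};

// Non-atomic demonstration for integral destinations.
template<typename I>
typename std::enable_if<std::is_integral<I>::value>::type
plain_inc(I* dest) {
  *dest = I(*dest + AdjustmentFor<I>::type::value);
}

// Non-atomic demonstration for pointer destinations: add the scaled
// byte adjustment to the raw address.
template<typename P>
void plain_inc(P** dest) {
  char* raw = reinterpret_cast<char*>(*dest);
  *dest = reinterpret_cast<P*>(raw + AdjustmentFor<P*>::type::value);
}
```

Because the adjustment is a type rather than a run-time argument, a platform specialization could in principle hand its value to inline assembly as an immediate operand, which is the use case mentioned for IncUsingConstant.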
> This is indecipherable to normal people ;-) > > This()->template inc(dest); > > For something as trivial as adding or subtracting 1 the template > machinations here are just mind boggling! This uses the CRTP (Curiously Recurring Template Pattern) C++ idiom. The idea is to devirtualize a virtual call by passing in the derived type as a template parameter to a base class, and then let the base class static_cast to the derived class to devirtualize the call. I hope this explanation sheds some light on what is going on. The same CRTP idiom was used in the Atomic::add implementation in a similar fashion. I will add some comments describing this in the next round after Coleen replies. Thanks for looking at this. /Erik > > Cheers, > David > > On 31/08/2017 10:45 PM, Erik ?sterlund wrote: >> Hi everyone, >> >> Bug ID: >> https://bugs.openjdk.java.net/browse/JDK-8186838 >> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >> >> The time has come for the next step in generalizing Atomic with >> templates. Today I will focus on Atomic::inc/dec. >> >> I have tried to mimic the new Kim style that seems to have been >> universally accepted. Like Atomic::add and Atomic::cmpxchg, the >> structure looks like this: >> >> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object >> that performs some basic type checks. >> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can >> define the operation arbitrarily for a given platform. The default >> implementation if not specialized for a platform is to call >> Atomic::add. So only platforms that want to do something different >> than that as an optimization have to provide a specialization. >> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec >> to be more optimized may inherit from a helper class >> IncUsingConstant/DecUsingConstant. This helper helps performing the >> necessary computation what the increment/decrement should be after >> pointer scaling using CRTP. 
The PlatformInc/PlatformDec operation >> then only needs to define an inc/dec member function, and will then >> get all the context information necessary to generate a more >> optimized implementation. Easy peasy. >> >> It is worth noticing that the generalized Atomic::dec operation >> assumes a two's complement integer machine and potentially sends the >> unary negative of a potentially unsigned type to Atomic::add. I have >> the following comments about this: >> 1) We already assume in other code that two's complement integers >> must be present. >> 2) A machine that does not have two's complement integers may still >> simply provide a specialization that solves the problem in a >> different way. >> 3) The alternative that does not make assumptions about that would >> use the good old IntegerTypes::cast_to_signed metaprogramming stuff, >> and I seem to recall we thought that was a bit too involved and >> complicated. >> This is the reason why I have chosen to use unary minus on the >> potentially unsigned type in the shared helper code that sends the >> decrement as an addend to Atomic::add. >> >> It would also be nice if somebody with access to PPC and s390 >> machines could try out the relevant changes there so I do not >> accidentally break those platforms. I have blind-coded the addition >> of the immediate values passed in to the inline assembly in a way >> that I think looks like it should work. >> >> Testing: >> RBT hs-tier3, JPRT --testset hotspot >> >> Thanks, >> /Erik From rohitarulraj at gmail.com Fri Sep 1 09:34:52 2017 From: rohitarulraj at gmail.com (Rohit Arul Raj) Date: Fri, 1 Sep 2017 15:04:52 +0530 Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults In-Reply-To: <95adc793-0faf-9379-ae56-8dae1d5c498f@oracle.com> References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com>

<5f8def30-1554-29a5-dde7-62b9940d0161@oracle.com> <95adc793-0faf-9379-ae56-8dae1d5c498f@oracle.com> Message-ID: On Fri, Sep 1, 2017 at 12:49 PM, Vladimir Kozlov wrote: > On 8/31/17 10:14 PM, Rohit Arul Raj wrote: >> >> Hello Vladimir, >> >>> But it also mean that AMD will have to do Java testing for this new >>> platform >>> and be responsible for it. >> >> >> Can you please elaborate on this a little more? >> What all Java test suites would you like us to test from our end? > > > First, I am talking only about testing on your platform. In this case it is > AMD 17h. > > You need to build and use fastdebug JVM for testing: configure > --with-debug-level=fastdebug > > You need to make sure to run hotspot and jdk jtreg tests. At least next set > of tests: > > make test JOBS=1 TEST_JOBS=1 TEST="hotspot_compiler hotspot_gc > hotspot_runtime hotspot_serviceability hotspot_misc jdk_util jdk_lang" > > It will take time. You can try to increase JOBS=1 and TEST_JOBS=1 numbers to > run tests in parallel but depending on memory and swap sizes it may not > work. Yes, We will do that. > In addition to that would be nice if you track performance changes with > specjvm2008 and specjbb2015 on your cpu to avoid regression when you apply > new changes or pull changes from OpenJDK. We do run SPECjbb2015. Regarding SPECjvm2008, we tried the base run but the results are pretty inconsistent. The base throughput varies from run to run (~30%). This is the command we use to generate the numbers (startup.compiler.sunflow & compiler.sunflow have been disabled). Is there any benchmark option we may be missing? 
java -jar SPECjvm2008.jar startup.helloworld startup.compiler.compiler startup.compress startup.crypto.aes startup.crypto.rsa startup.crypto.signverify startup.mpegaudio startup.scimark.fft startup.scimark.lu startup.scimark.monte_carlo startup.scimark.sor startup.scimark.sparse startup.serial startup.sunflow startup.xml.transform startup.xml.validation compiler.compiler compress crypto.aes crypto.rsa crypto.signverify derby mpegaudio scimark.fft.large scimark.lu.large scimark.sor.large scimark.sparse.large scimark.fft.small scimark.lu.small scimark.sor.small scimark.sparse.small scimark.monte_carlo serial sunflow xml.transform xml.validation Regards, Rohit > If you have questions, please ask. > >> >>> In a future we may forward this CPU related problems to you to analyze >>> and >>> fix. >> >> >> Sure, looking forward to it. > > > Best regards, > Vladimir > > >> >> Regards, >> Rohit >> >>> Regards, >>> Vladimir >>> >>> >>> On 8/31/17 2:31 PM, David Holmes wrote: >>>> >>>> >>>> Hi Rohit, >>>> >>>> I think the patch needs updating for jdk10 as I already see a lot of >>>> logic >>>> around UseSHA in vm_version_x86.cpp. >>>> >>>> Thanks, >>>> David >>>> >>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>> >>>>> >>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>> wrote: >>>>>> >>>>>> >>>>>> Hi Rohit, >>>>>> >>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> I would like an volunteer to review this patch (openJDK9) which sets >>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with >>>>>>> the commit process. >>>>>>> >>>>>>> Webrev: >>>>>>> >>>>>>> >>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Unfortunately patches can not be accepted from systems outside the >>>>>> OpenJDK >>>>>> infrastructure and ... >>>>>> >>>>>>> I have also attached the patch (hg diff -g) for reference. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> ... 
unfortunately patches tend to get stripped by the mail servers. If >>>>>> the >>>>>> patch is small please include it inline. Otherwise you will need to >>>>>> find >>>>>> an >>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net. >>>>>> >>>>> >>>>>>> 3) I have done regression testing using jtreg ($make default) and >>>>>>> didnt find any regressions. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Sounds good, but until I see the patch it is hard to comment on >>>>>> testing >>>>>> requirements. >>>>>> >>>>>> Thanks, >>>>>> David >>>>> >>>>> >>>>> >>>>> Thanks David, >>>>> Yes, it's a small patch. >>>>> >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> @@ -1051,6 +1051,22 @@ >>>>> } >>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>> } >>>>> + if (supports_sha()) { >>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>> + } >>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>>> UseSHA512Intrinsics) { >>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>> + warning("SHA instructions are not available on this CPU"); >>>>> + } >>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } >>>>> >>>>> // some defaults for AMD family 15h >>>>> if ( cpu_family() == 0x15 ) { >>>>> @@ -1072,11 +1088,43 @@ >>>>> } >>>>> >>>>> #ifdef COMPILER2 >>>>> - if (MaxVectorSize > 16) { >>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>> } >>>>> #endif // COMPILER2 >>>>> + >>>>> + // Some defaults for AMD family 17h >>>>> + if ( cpu_family() == 0x17 ) { >>>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>>> Array Copy >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>> + UseXMMForArrayCopy = true; >>>>> + } >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>> { >>>>> + UseUnalignedLoadStores = true; >>>>> + } >>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>> + UseBMI2Instructions = true; >>>>> + } >>>>> + if (MaxVectorSize > 32) { >>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>> + } >>>>> + if (UseSHA) { >>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } else if (UseSHA512Intrinsics) { >>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>> functions not available on this CPU."); >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } >>>>> + } >>>>> +#ifdef COMPILER2 >>>>> + if (supports_sse4_2()) { >>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>> + } >>>>> + } >>>>> +#endif >>>>> + } >>>>> } >>>>> >>>>> if( is_intel() ) { // Intel cpus specific settings >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> @@ -513,6 +513,16 @@ >>>>> result |= CPU_LZCNT; >>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>> result |= CPU_SSE4A; >>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>> + result |= CPU_BMI2; >>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>> + result |= CPU_HT; >>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>> + result |= CPU_ADX; >>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>> + result |= CPU_SHA; >>>>> + if 
(_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>> + result |= CPU_FMA;
>>>>> }
>>>>> // Intel features.
>>>>> if(is_intel()) {
>>>>>
>>>>> Regards,
>>>>> Rohit
>>>>>
>>>
>

From glaubitz at physik.fu-berlin.de Fri Sep 1 09:35:21 2017
From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz)
Date: Fri, 1 Sep 2017 11:35:21 +0200
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <59A804F8.9000501@oracle.com>
References: <59A804F8.9000501@oracle.com>
Message-ID: <602b39a1-85e3-0e34-ff0c-c9076885c206@physik.fu-berlin.de>

On 08/31/2017 02:45 PM, Erik Österlund wrote:
> It would also be nice if somebody with access to PPC and s390 machines
> could try out the relevant changes there so I do not accidentally break
> those platforms.

And linux-zero and linux-sparc, of course :). I will test that.

Adrian

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz at debian.org
`. `'   Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

From david.holmes at oracle.com Fri Sep 1 10:34:22 2017
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 1 Sep 2017 20:34:22 +1000
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <59A92896.9010604@oracle.com>
References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com>
Message-ID: <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com>

Hi Erik,

I just wanted to add that I would expect the cmpxchg, add and inc Atomic APIs to all require a similar basic structure for manipulating types/values etc, yet all three seem to have quite different structures that I find very confusing. I'm still at a loss to fathom the CRTP and the hoops we seemingly have to jump through just to add or subtract 1!!!
Cheers, David On 1/09/2017 7:29 PM, Erik ?sterlund wrote: > Hi David, > > On 2017-09-01 02:49, David Holmes wrote: >> Hi Erik, >> >> Sorry but this one is really losing me. >> >> What is the role of Adjustment ?? > > Adjustment represents the increment/decrement value as an > IntegralConstant - your template friend for passing around a constant > with both a specified type and value in templates. The type of the > increment/decrement is the type of the destination when the destination > is an integral type, otherwise if it is a pointer type, the > increment/decrement type is ptrdiff_t. > >> How are inc/dec anything but "using constant" ?? > > I was also a bit torn on that name (I assume you are referring to > IncUsingConstant/DecUsingConstant). It was hard to find a name that > depicted what this platform helper does. I considered calling the helper > something with immediate in the name because it is really used to embed > the constant as immediate values in inline assembly today. But then > again that seemed too specific, as it is not completely obvious platform > specializations will use it in that way. One might just want to > specialize this to send it into some compiler Atomic::inc intrinsic for > example. Do you have any other preferred names? Here are a few possible > names for IncUsingConstant: > > IncUsingScaledConstant > IncUsingAdjustedConstant > IncUsingPlatformHelper > > Any favourites? > >> Why do we special case jshort?? > > To be consistent with the special case of Atomic::add on jshort. Do you > want it removed? > >> This is indecipherable to normal people ;-) >> >> ?This()->template inc(dest); >> >> For something as trivial as adding or subtracting 1 the template >> machinations here are just mind boggling! > > This uses the CRTP (Curiously Recurring Template Pattern) C++ idiom. 
The > idea is to devirtualize a virtual call by passing in the derived type as > a template parameter to a base class, and then let the base class > static_cast to the derived class to devirtualize the call. I hope this > explanation sheds some light on what is going on. The same CRTP idiom > was used in the Atomic::add implementation in a similar fashion. > > I will add some comments describing this in the next round after Coleen > replies. > > Thanks for looking at this. > > /Erik > >> >> Cheers, >> David >> >> On 31/08/2017 10:45 PM, Erik ?sterlund wrote: >>> Hi everyone, >>> >>> Bug ID: >>> https://bugs.openjdk.java.net/browse/JDK-8186838 >>> >>> Webrev: >>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >>> >>> The time has come for the next step in generalizing Atomic with >>> templates. Today I will focus on Atomic::inc/dec. >>> >>> I have tried to mimic the new Kim style that seems to have been >>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the >>> structure looks like this: >>> >>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object >>> that performs some basic type checks. >>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can >>> define the operation arbitrarily for a given platform. The default >>> implementation if not specialized for a platform is to call >>> Atomic::add. So only platforms that want to do something different >>> than that as an optimization have to provide a specialization. >>> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec >>> to be more optimized may inherit from a helper class >>> IncUsingConstant/DecUsingConstant. This helper helps performing the >>> necessary computation what the increment/decrement should be after >>> pointer scaling using CRTP. The PlatformInc/PlatformDec operation >>> then only needs to define an inc/dec member function, and will then >>> get all the context information necessary to generate a more >>> optimized implementation. 
Easy peasy. >>> >>> It is worth noticing that the generalized Atomic::dec operation >>> assumes a two's complement integer machine and potentially sends the >>> unary negative of a potentially unsigned type to Atomic::add. I have >>> the following comments about this: >>> 1) We already assume in other code that two's complement integers >>> must be present. >>> 2) A machine that does not have two's complement integers may still >>> simply provide a specialization that solves the problem in a >>> different way. >>> 3) The alternative that does not make assumptions about that would >>> use the good old IntegerTypes::cast_to_signed metaprogramming stuff, >>> and I seem to recall we thought that was a bit too involved and >>> complicated. >>> This is the reason why I have chosen to use unary minus on the >>> potentially unsigned type in the shared helper code that sends the >>> decrement as an addend to Atomic::add. >>> >>> It would also be nice if somebody with access to PPC and s390 >>> machines could try out the relevant changes there so I do not >>> accidentally break those platforms. I have blind-coded the addition >>> of the immediate values passed in to the inline assembly in a way >>> that I think looks like it should work. >>> >>> Testing: >>> RBT hs-tier3, JPRT --testset hotspot >>> >>> Thanks, >>> /Erik > From neugens at redhat.com Fri Sep 1 10:43:41 2017 From: neugens at redhat.com (Mario Torre) Date: Fri, 1 Sep 2017 12:43:41 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59A92896.9010604@oracle.com> References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> Message-ID: On Fri, Sep 1, 2017 at 11:29 AM, Erik ?sterlund wrote: > Hi David, > > On 2017-09-01 02:49, David Holmes wrote: >> >> Hi Erik, >> >> Sorry but this one is really losing me. >> >> What is the role of Adjustment ?? 
> > > Adjustment represents the increment/decrement value as an IntegralConstant - > your template friend for passing around a constant with both a specified > type and value in templates. The type of the increment/decrement is the type > of the destination when the destination is an integral type, otherwise if it > is a pointer type, the increment/decrement type is ptrdiff_t. > >> How are inc/dec anything but "using constant" ?? > > > I was also a bit torn on that name (I assume you are referring to > IncUsingConstant/DecUsingConstant). It was hard to find a name that depicted > what this platform helper does. I considered calling the helper something > with immediate in the name because it is really used to embed the constant > as immediate values in inline assembly today. But then again that seemed too > specific, as it is not completely obvious platform specializations will use > it in that way. One might just want to specialize this to send it into some > compiler Atomic::inc intrinsic for example. Do you have any other preferred > names? Here are a few possible names for IncUsingConstant: > > IncUsingScaledConstant > IncUsingAdjustedConstant > IncUsingPlatformHelper > > Any favourites? > >> Why do we special case jshort?? > > > To be consistent with the special case of Atomic::add on jshort. Do you want > it removed? > >> This is indecipherable to normal people ;-) >> >> This()->template inc(dest); >> >> For something as trivial as adding or subtracting 1 the template >> machinations here are just mind boggling! > > > This uses the CRTP (Curiously Recurring Template Pattern) C++ idiom. The > idea is to devirtualize a virtual call by passing in the derived type as a > template parameter to a base class, and then let the base class static_cast > to the derived class to devirtualize the call. I hope this explanation sheds > some light on what is going on. The same CRTP idiom was used in the > Atomic::add implementation in a similar fashion. 
> > I will add some comments describing this in the next round after Coleen > replies. > Isn't that a lot more slower than the current inline? BTW, I think I see what those magic constants are (4, 8... rings a bell ;), but I think a define here could make things more readable. Cheers, Mario From erik.osterlund at oracle.com Fri Sep 1 10:49:55 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Fri, 1 Sep 2017 12:49:55 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> Message-ID: <59A93B53.9010505@oracle.com> Hi David, The shared structure for all operations is the following: An Atomic::something call creates a SomethingImpl function object that performs some basic type checking and then forwards the call straight to a PlatformSomething function object. This PlatformSomething object could decide to do anything. But to make life easier, it may inherit from a shared SomethingHelper function object with CRTP that calls back into the PlatformSomething function object to emit inline assembly. Hope this explanation helps understanding the intended structure of this work. Thanks, /Erik On 2017-09-01 12:34, David Holmes wrote: > Hi Erik, > > I just wanted to add that I would expect the cmpxchg, add and inc, > Atomic API's to all require similar basic structure for manipulating > types/values etc, yet all three seem to have quite different > structures that I find very confusing. I'm still at a loss to fathom > the CRTP and the hoops we seemingly have to jump through just to add > or subtract 1!!! > > Cheers, > David > > On 1/09/2017 7:29 PM, Erik ?sterlund wrote: >> Hi David, >> >> On 2017-09-01 02:49, David Holmes wrote: >>> Hi Erik, >>> >>> Sorry but this one is really losing me. 
>>> >>> What is the role of Adjustment ?? >> >> Adjustment represents the increment/decrement value as an >> IntegralConstant - your template friend for passing around a constant >> with both a specified type and value in templates. The type of the >> increment/decrement is the type of the destination when the >> destination is an integral type, otherwise if it is a pointer type, >> the increment/decrement type is ptrdiff_t. >> >>> How are inc/dec anything but "using constant" ?? >> >> I was also a bit torn on that name (I assume you are referring to >> IncUsingConstant/DecUsingConstant). It was hard to find a name that >> depicted what this platform helper does. I considered calling the >> helper something with immediate in the name because it is really used >> to embed the constant as immediate values in inline assembly today. >> But then again that seemed too specific, as it is not completely >> obvious platform specializations will use it in that way. One might >> just want to specialize this to send it into some compiler >> Atomic::inc intrinsic for example. Do you have any other preferred >> names? Here are a few possible names for IncUsingConstant: >> >> IncUsingScaledConstant >> IncUsingAdjustedConstant >> IncUsingPlatformHelper >> >> Any favourites? >> >>> Why do we special case jshort?? >> >> To be consistent with the special case of Atomic::add on jshort. Do >> you want it removed? >> >>> This is indecipherable to normal people ;-) >>> >>> This()->template inc(dest); >>> >>> For something as trivial as adding or subtracting 1 the template >>> machinations here are just mind boggling! >> >> This uses the CRTP (Curiously Recurring Template Pattern) C++ idiom. >> The idea is to devirtualize a virtual call by passing in the derived >> type as a template parameter to a base class, and then let the base >> class static_cast to the derived class to devirtualize the call. I >> hope this explanation sheds some light on what is going on. 
The same >> CRTP idiom was used in the Atomic::add implementation in a similar >> fashion. >> >> I will add some comments describing this in the next round after >> Coleen replies. >> >> Thanks for looking at this. >> >> /Erik >> >>> >>> Cheers, >>> David >>> >>> On 31/08/2017 10:45 PM, Erik ?sterlund wrote: >>>> Hi everyone, >>>> >>>> Bug ID: >>>> https://bugs.openjdk.java.net/browse/JDK-8186838 >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >>>> >>>> The time has come for the next step in generalizing Atomic with >>>> templates. Today I will focus on Atomic::inc/dec. >>>> >>>> I have tried to mimic the new Kim style that seems to have been >>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the >>>> structure looks like this: >>>> >>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function >>>> object that performs some basic type checks. >>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can >>>> define the operation arbitrarily for a given platform. The default >>>> implementation if not specialized for a platform is to call >>>> Atomic::add. So only platforms that want to do something different >>>> than that as an optimization have to provide a specialization. >>>> Layer 3) Platforms that decide to specialize >>>> PlatformInc/PlatformDec to be more optimized may inherit from a >>>> helper class IncUsingConstant/DecUsingConstant. This helper helps >>>> performing the necessary computation what the increment/decrement >>>> should be after pointer scaling using CRTP. The >>>> PlatformInc/PlatformDec operation then only needs to define an >>>> inc/dec member function, and will then get all the context >>>> information necessary to generate a more optimized implementation. >>>> Easy peasy. 
>>>> >>>> It is worth noticing that the generalized Atomic::dec operation >>>> assumes a two's complement integer machine and potentially sends >>>> the unary negative of a potentially unsigned type to Atomic::add. I >>>> have the following comments about this: >>>> 1) We already assume in other code that two's complement integers >>>> must be present. >>>> 2) A machine that does not have two's complement integers may still >>>> simply provide a specialization that solves the problem in a >>>> different way. >>>> 3) The alternative that does not make assumptions about that would >>>> use the good old IntegerTypes::cast_to_signed metaprogramming >>>> stuff, and I seem to recall we thought that was a bit too involved >>>> and complicated. >>>> This is the reason why I have chosen to use unary minus on the >>>> potentially unsigned type in the shared helper code that sends the >>>> decrement as an addend to Atomic::add. >>>> >>>> It would also be nice if somebody with access to PPC and s390 >>>> machines could try out the relevant changes there so I do not >>>> accidentally break those platforms. I have blind-coded the addition >>>> of the immediate values passed in to the inline assembly in a way >>>> that I think looks like it should work. >>>> >>>> Testing: >>>> RBT hs-tier3, JPRT --testset hotspot >>>> >>>> Thanks, >>>> /Erik >> From erik.osterlund at oracle.com Fri Sep 1 11:42:51 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Fri, 1 Sep 2017 13:42:51 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> Message-ID: <59A947BB.3040506@oracle.com> Hi Mario, On 2017-09-01 12:43, Mario Torre wrote: > On Fri, Sep 1, 2017 at 11:29 AM, Erik ?sterlund > wrote: >> Hi David, >> >> On 2017-09-01 02:49, David Holmes wrote: >>> Hi Erik, >>> >>> Sorry but this one is really losing me. 
>>> >>> What is the role of Adjustment? >> >> Adjustment represents the increment/decrement value as an IntegralConstant - >> your template friend for passing around a constant with both a specified >> type and value in templates. The type of the increment/decrement is the type >> of the destination when the destination is an integral type, otherwise if it >> is a pointer type, the increment/decrement type is ptrdiff_t. >> >>> How are inc/dec anything but "using constant"? >> >> I was also a bit torn on that name (I assume you are referring to >> IncUsingConstant/DecUsingConstant). It was hard to find a name that depicted >> what this platform helper does. I considered calling the helper something >> with immediate in the name because it is really used to embed the constant >> as immediate values in inline assembly today. But then again that seemed too >> specific, as it is not completely obvious platform specializations will use >> it in that way. One might just want to specialize this to send it into some >> compiler Atomic::inc intrinsic for example. Do you have any other preferred >> names? Here are a few possible names for IncUsingConstant: >> >> IncUsingScaledConstant >> IncUsingAdjustedConstant >> IncUsingPlatformHelper >> >> Any favourites? >> >>> Why do we special case jshort? >> >> To be consistent with the special case of Atomic::add on jshort. Do you want >> it removed? >> >>> This is indecipherable to normal people ;-) >>> >>> This()->template inc(dest); >>> >>> For something as trivial as adding or subtracting 1 the template >>> machinations here are just mind boggling! >> >> This uses the CRTP (Curiously Recurring Template Pattern) C++ idiom. The >> idea is to devirtualize a virtual call by passing in the derived type as a >> template parameter to a base class, and then let the base class static_cast >> to the derived class to devirtualize the call. I hope this explanation sheds >> some light on what is going on.
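To illustrate the CRTP explanation just given, here is a minimal sketch that puts a virtual call next to its CRTP equivalent. It is only an illustration under stated assumptions: the names VirtualCounterOp, VirtualInc, CrtpCounterOp, CrtpInc and run_both are hypothetical and do not appear in the webrev, and a plain non-atomic int stands in for the atomic destination.

```cpp
#include <cassert>

// Variant 1: runtime polymorphism. run() reaches apply() through the vtable.
struct VirtualCounterOp {
  virtual ~VirtualCounterOp() {}
  virtual void apply(int* dest) const = 0;
  void run(int* dest) const { apply(dest); }
};

struct VirtualInc : VirtualCounterOp {
  void apply(int* dest) const override { *dest += 1; }
};

// Variant 2: CRTP. The base class takes the derived type as a template
// parameter and static_casts to it, so the callee is known at compile time
// and no virtual dispatch is involved.
template<typename Derived>
struct CrtpCounterOp {
  void run(int* dest) const {
    static_cast<const Derived*>(this)->apply(dest);
  }
};

struct CrtpInc : CrtpCounterOp<CrtpInc> {
  void apply(int* dest) const { *dest += 1; }
};

// Hypothetical driver: both variants compute the same result.
int run_both() {
  int a = 0;
  int b = 0;
  VirtualInc v;
  v.run(&a);
  CrtpInc c;
  c.run(&b);
  return a + b;
}
```

In the CRTP variant the compiler sees the concrete apply() at the call site, which is what makes inlining all the way down straightforward.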
The same CRTP idiom was used in the >> Atomic::add implementation in a similar fashion. >> >> I will add some comments describing this in the next round after Coleen >> replies. >> > Isn't that a lot more slower than the current inline? What makes you think so? Everything is inlined all the way to the underlying platform layer. Achieving that is the very reason why CRTP is used instead of virtual calls. > BTW, I think I see what those magic constants are (4, 8... rings a > bell ;), but I think a define here could make things more readable. Sorry, I am not sure I am following what you mean here. Thanks, /Erik > Cheers, > Mario From jesper.wilhelmsson at oracle.com Fri Sep 1 11:54:26 2017 From: jesper.wilhelmsson at oracle.com (jesper.wilhelmsson at oracle.com) Date: Fri, 1 Sep 2017 13:54:26 +0200 Subject: Hotspot repository jdk10/hs closes today In-Reply-To: <2081B825-B14B-4846-A1BA-294B4ECC1B5A@oracle.com> References: <2081B825-B14B-4846-A1BA-294B4ECC1B5A@oracle.com> Message-ID: <26D1BC95-12A8-4FFC-BF52-661B338233B4@oracle.com> Hi, Just a reminder that this is happening today at 2 pm PT. The repository will be made read only for approx two weeks. Thanks, /Jesper > On 29 Aug 2017, at 19:08, jesper.wilhelmsson at oracle.com wrote: > > Hi, > > The repository consolidation is approaching and to prepare for that we need to push all new changes from jdk10/hs to jdk10/jdk10. Once that push is done the hotspot repository jdk10/hs will be closed for all pushes until the consolidation is done. > > The current plan is to integrate 10/hs to 10/10 on Friday/Saturday. The snapshot will be taken at 2pm PST. Pushes not completed before 2pm will be killed and rejected. > > To increase the likelihood of this proceeding smoothly, please act quickly if a bug is filed due to any change you are pushing this week. > > The repo consolidation will likely take at least two weeks. The preliminary date for opening 10/hs is September 18. 
This is subject to change depending on the duration of the consolidation effort. > > Thanks, > /Jesper > From shade at redhat.com Fri Sep 1 12:00:29 2017 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 1 Sep 2017 14:00:29 +0200 Subject: Hotspot repository jdk10/hs closes today In-Reply-To: <26D1BC95-12A8-4FFC-BF52-661B338233B4@oracle.com> References: <2081B825-B14B-4846-A1BA-294B4ECC1B5A@oracle.com> <26D1BC95-12A8-4FFC-BF52-661B338233B4@oracle.com> Message-ID: <7e46d1d2-e058-cd70-2d2b-73437806b7c3@redhat.com> Hi, On 09/01/2017 01:54 PM, jesper.wilhelmsson at oracle.com wrote: > Just a reminder that this is happening today at 2 pm PT. The repository will be made read only > for approx two weeks. Auxiliary question: does that mean jdk10/hs is "stable" now? I.e. no pending integrations, integration blockers, etc? We are preparing the derived shenandoah/jdk10 forest for consolidation too, and want to pull latest stable jdk10/hs to shenandoah/jdk10 to test in the interim two weeks of consolidation. Thanks, -Aleksey From neugens at redhat.com Fri Sep 1 12:20:37 2017 From: neugens at redhat.com (Mario Torre) Date: Fri, 1 Sep 2017 14:20:37 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59A947BB.3040506@oracle.com> References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <59A947BB.3040506@oracle.com> Message-ID: On Fri, Sep 1, 2017 at 1:42 PM, Erik Österlund wrote: >> Isn't that a lot more slower than the current inline? > > > What makes you think so? Everything is inlined all the way to the underlying > platform layer. Achieving that is the very reason why CRTP is used instead > of virtual calls. I'm not familiar with the CRTP so that's probably what confuses me, but I assume the templates are inlined, but the actual function calls aren't, are they?
I understand that inline is just a suggestion and with more aggressive optimisation the compiler will probably inline those anyway, but have you done some measurement to see what's the cost of all those templates? >> BTW, I think I see what those magic constants are (4, 8... rings a >> bell ;), but I think a define here could make things more readable. > > > Sorry, I am not sure I am following what you mean here. I mean this: +template +struct Atomic::PlatformInc<4, Adjustment>: Atomic::IncUsingConstant<4, Adjustment> { I need to look at atomic.hpp to find out that this 4 is a sizeof. I would rather make that more explicit, also hard coding numbers is error prone, since you are refactoring this code anyway, I think this is a nice touch that makes things a bit easier, especially given that those templates are quite cryptic to the untrained. Cheers, Mario From coleen.phillimore at oracle.com Fri Sep 1 12:51:55 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 1 Sep 2017 08:51:55 -0400 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59A91CE6.2080206@oracle.com> References: <59A804F8.9000501@oracle.com> <331bf921-243a-9c28-eb0f-c945bdc11384@oracle.com> <59A91CE6.2080206@oracle.com> Message-ID: <516f938c-3ed5-2d95-2a3b-418ad2d2a149@oracle.com> On 9/1/17 4:40 AM, Erik Österlund wrote: > Hi Coleen, > > Thank you for taking your time to review this. > > On 2017-09-01 02:03, coleen.phillimore at oracle.com wrote: >> >> Hi, I'm trying to parse the templates to review this but maybe it's >> convention but decoding these with parameters that are single capital >> letters make reading the template very difficult. There are already >> a lot of non-alphanumeric characters. When the letter is T, that is >> expected by convention, but D or especially I makes it really hard. >> Can these be normalized to all use T when there is only one template >> parameter?
It'll be clear that T* is a pointer and T is an integer >> without having it be P. > > I apologize the names of the template parameters are hard to > understand. For what it's worth, I am only consistently applying Kim's > conventions here. It seemed like a bad idea to violate conventions > already set up - that would arguably be more confusing. > > The convention from earlier work by Kim is: > D: Type of destination > I: Operand type that has to be an integral type > P: Operand type that is a pointer element type > T: Generic operand type, may be integral or pointer type > > Personally, I do not mind this convention. It is more specific and > annotates things we know about the type into the name of the type. > > Do you want me to: > > 1) Keep the convention, now that I have explained what the convention > is and why it is your friend It is not my friend. It's not helpful. I have to go through multiple non-alphabetic characters looking for the letter I or the letter P to mentally make the substitution of the template type. > 2) Break the convention for this change only making the naming > inconsistent Break it for this changeset and we'll fix it later for the earlier work from Kim. I don't remember P and I in Kim's changeset but realized while looking at your changeset, this was one thing that makes these templates slower and more difficult to read. In the case of cmpxchg templates with a source, destination and original values, it was necessary to have more than T be the template type, although unsatisfying, because it turned out that the types couldn't be the same.
> 3) Change the convention throughout consistently, including all > earlier work from Kim > >> >> +template >> +struct Atomic::IncImpl> EnableIf::value>::type> VALUE_OBJ_CLASS_SPEC { >> + void operator()(I volatile* dest) const { >> + typedef IntegralConstant Adjustment; >> + typedef PlatformInc PlatformOp; >> + PlatformOp()(dest); >> + } >> +}; >> >> This one isn't as difficult, because it's short, but it would be >> faster to understand with T. >> >> +template >> +struct Atomic::IncImpl> EnableIf::value>::type> VALUE_OBJ_CLASS_SPEC { >> + void operator()(T volatile* dest) const { >> + typedef IntegralConstant Adjustment; >> + typedef PlatformInc PlatformOp; >> + PlatformOp()(dest); >> + } >> +}; >> >> +template<> >> +struct Atomic::IncImpl VALUE_OBJ_CLASS_SPEC { >> + void operator()(jshort volatile* dest) const { >> + add(jshort(1), dest); >> + } >> +}; >> >> >> Did I already ask if this could be changed to u2 rather than jshort? >> Or is that the follow-on RFE? > > That is a follow-on RFE. Good. I think that's the one that I assigned to myself. > >> +// Helper for platforms wanting a constant adjustment. >> +template >> +struct Atomic::IncUsingConstant VALUE_OBJ_CLASS_SPEC { >> + typedef PlatformInc Derived; >> >> >> I can't find the caller of this. Is it really a lot faster than >> having the platform independent add(1, T) / add(-1, T) to make all >> this code worth having? How is this called? I couldn't parse the >> trick. Atomic::inc() is always a "constant adjustment" so I'm >> confused about what the comment means and what motivates all the asm >> code. Do these platform implementations exist because they don't >> have twos complement for integer representation? really? > > This is used by some x86, PPC and s390 platforms. Personally I > question its usefulness for x86.
I believe it might be one of those > things where we ran some benchmarks a decade ago and concluded that it > was slightly faster to have a slimmed path for Atomic::inc rather than > reusing Atomic::add. Yes, there are a lot of optimizations that we slog along in the code base because they might have either theoretically or measurably made some difference in something we don't have anymore. > > I did not initially want to bring this up as it seems like none of my > business, but now that the question has been asked about differences, > I could not help but notice the advertised "leading sync" convention > of Atomic::inc on PPC is not respected. That is, there is no "sync" > fence before the atomic increment, as required by the specified > semantics. There is not even a leading "lwsync". The corresponding > Atomic::add operation though, does have leading lwsync (unlike > Atomic::inc). Now this should arguably be reinforced to sync rather > than lwsync to respect the advertised semantics of both Atomic::add > and Atomic::inc on PPC. Hopefully that statement will not turn into a > long unrelated mailing thread... Could you file a bug with this observation? > > Conclusively though, there is definitely a substantial difference in > the fencing comparing the PPC implementation of Atomic::inc to > Atomic::add. Whether either one of them conforms to intended semantics > or not is a different matter - one that I was hoping not to have to > deal with in this RFE as I am merely templateifying what was already > there, without judging the existing specializations. And it is my > observation that as the code looks now, we would incur a bunch of more > fencing compared to what the code does today on PPC. > Completely understand. How are these called exactly though? I couldn't figure it out. >> Also, the function name This() is really disturbing and distracting. >> Can it be called some verb() representing what it does? >> cast_to_derived()?
>> >> + template >> + void operator()(I volatile* dest) const { >> + This()->template inc(dest); >> + } >> > > Yes, I will change the name accordingly as you suggest. > >> I didn't know you could put "template" there. > > It is required to put the template keyword before the member function > name when calling a template member function with explicit template > parameters (as opposed to implicitly inferred template parameters) on > a template type. I thought you could just say inc() in the call, but my C++ template vocabulary is minimal. > >> What does this call? > > This calls the platform-defined intrinsic that is defined in the > platform files - the one that contains the inline assembly. How? I don't see how... :( > >> Rather than I for integer case, and P for pointer case, can you add a >> one line comment above this like: >> // Helper for integer types >> and >> // Helper for pointer types > > Or perhaps we could do both? Nevertheless, I will add these comments. > But as per the discussion above, I would be happy if we could keep the > convention that Kim has already set up for the template type names. > >> Small local comments would be really helpful for many of these >> functions. Just to get more english words in there... Since Kim's >> on vacation can you help me understand this code and add comments so >> I remember the reasons for some of this? > > Sure - I will decorate the code with some comments to help > understanding. I will send an updated webrev when I get your reply > regarding the typename naming convention verdict. That's my opinion anyway. David might have the opposite opinion. Thanks, Coleen > > Thanks for the review! > > /Erik > >> >> Thanks!
>> Coleen >> >> >> On 8/31/17 8:45 AM, Erik Österlund wrote: >>> Hi everyone, >>> >>> Bug ID: >>> https://bugs.openjdk.java.net/browse/JDK-8186838 >>> >>> Webrev: >>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >>> >>> The time has come for the next step in generalizing Atomic with >>> templates. Today I will focus on Atomic::inc/dec. >>> >>> I have tried to mimic the new Kim style that seems to have been >>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the >>> structure looks like this: >>> >>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function >>> object that performs some basic type checks. >>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can >>> define the operation arbitrarily for a given platform. The default >>> implementation if not specialized for a platform is to call >>> Atomic::add. So only platforms that want to do something different >>> than that as an optimization have to provide a specialization. >>> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec >>> to be more optimized may inherit from a helper class >>> IncUsingConstant/DecUsingConstant. This helper helps performing the >>> necessary computation what the increment/decrement should be after >>> pointer scaling using CRTP. The PlatformInc/PlatformDec operation >>> then only needs to define an inc/dec member function, and will then >>> get all the context information necessary to generate a more >>> optimized implementation. Easy peasy. >>> >>> It is worth noticing that the generalized Atomic::dec operation >>> assumes a two's complement integer machine and potentially sends the >>> unary negative of a potentially unsigned type to Atomic::add. I have >>> the following comments about this: >>> 1) We already assume in other code that two's complement integers >>> must be present.
>>> 2) A machine that does not have two's complement integers may still >>> simply provide a specialization that solves the problem in a >>> different way. >>> 3) The alternative that does not make assumptions about that would >>> use the good old IntegerTypes::cast_to_signed metaprogramming stuff, >>> and I seem to recall we thought that was a bit too involved and >>> complicated. >>> This is the reason why I have chosen to use unary minus on the >>> potentially unsigned type in the shared helper code that sends the >>> decrement as an addend to Atomic::add. >>> >>> It would also be nice if somebody with access to PPC and s390 >>> machines could try out the relevant changes there so I do not >>> accidentally break those platforms. I have blind-coded the addition >>> of the immediate values passed in to the inline assembly in a way >>> that I think looks like it should work. >>> >>> Testing: >>> RBT hs-tier3, JPRT --testset hotspot >>> >>> Thanks, >>> /Erik >> > From erik.osterlund at oracle.com Fri Sep 1 13:31:24 2017 From: erik.osterlund at oracle.com (Erik Österlund) Date: Fri, 1 Sep 2017 15:31:24 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <516f938c-3ed5-2d95-2a3b-418ad2d2a149@oracle.com> References: <59A804F8.9000501@oracle.com> <331bf921-243a-9c28-eb0f-c945bdc11384@oracle.com> <59A91CE6.2080206@oracle.com> <516f938c-3ed5-2d95-2a3b-418ad2d2a149@oracle.com> Message-ID: <59A9612C.40900@oracle.com> Hi Coleen, On 2017-09-01 14:51, coleen.phillimore at oracle.com wrote: > > > On 9/1/17 4:40 AM, Erik Österlund wrote: >> Hi Coleen, >> >> Thank you for taking your time to review this. >> >> On 2017-09-01 02:03, coleen.phillimore at oracle.com wrote: >>> >>> Hi, I'm trying to parse the templates to review this but maybe it's >>> convention but decoding these with parameters that are single >>> capital letters make reading the template very difficult. There are >>> already a lot of non-alphanumeric characters.
When the letter is >>> T, that is expected by convention, but D or especially I makes it >>> really hard. Can these be normalized to all use T when there is >>> only one template parameter? It'll be clear that T* is a pointer >>> and T is an integer without having it be P. >> >> I apologize the names of the template parameters are hard to >> understand. For what it's worth, I am only consistently applying >> Kim's conventions here. It seemed like a bad idea to violate >> conventions already set up - that would arguably be more confusing. >> >> The convention from earlier work by Kim is: >> D: Type of destination >> I: Operand type that has to be an integral type >> P: Operand type that is a pointer element type >> T: Generic operand type, may be integral or pointer type >> >> Personally, I do not mind this convention. It is more specific and >> annotates things we know about the type into the name of the type. >> >> Do you want me to: >> >> 1) Keep the convention, now that I have explained what the convention >> is and why it is your friend > > It is not my friend. It's not helpful. I have to go through > multiple non-alphabetic characters looking for the letter I or the > letter P to mentally make the substitution of the template type. Okay. I understand now that the pre-existing naming convention of types named I and P differentiating integral types from pointer types is not helpful to you. And if I understand you correctly, you would like to introduce a new naming convention that you find more helpful that uses the more general type name T instead, regardless if it refers to an integral type or a pointer type, and save the exercise of figuring out whether it is intentionally constrained to be a pointer type or an integral type to the reader by going to the declaration, and there reading some kind of comment describing such properties in text instead? Do we have a consensus that this new convention is indeed more desirable? 
> >> 2) Break the convention for this change only making the naming >> inconsistent > > Break it for this changeset and we'll fix it later for the earlier > work from Kim. I don't remember P and I in Kim's changeset but > realized while looking at your changeset, this was one thing that > makes these templates slower and more difficult to read. Okay. > In the case of cmpxchg templates with a source, destination and > original values, it was necessary to have more than T be the template > type, although unsatisfying, because it turned out that the types > couldn't be the same. Okay. > >> 3) Change the convention throughout consistently, including all >> earlier work from Kim >> >>> >>> +template >>> +struct Atomic::IncImpl>> EnableIf::value>::type> VALUE_OBJ_CLASS_SPEC { >>> + void operator()(I volatile* dest) const { >>> + typedef IntegralConstant Adjustment; >>> + typedef PlatformInc PlatformOp; >>> + PlatformOp()(dest); >>> + } >>> +}; >>> >>> This one isn't as difficult, because it's short, but it would be >>> faster to understand with T. >>> >>> +template >>> +struct Atomic::IncImpl>> EnableIf::value>::type> VALUE_OBJ_CLASS_SPEC { >>> + void operator()(T volatile* dest) const { >>> + typedef IntegralConstant Adjustment; >>> + typedef PlatformInc PlatformOp; >>> + PlatformOp()(dest); >>> + } >>> +}; >>> >>> +template<> >>> +struct Atomic::IncImpl VALUE_OBJ_CLASS_SPEC { >>> + void operator()(jshort volatile* dest) const { >>> + add(jshort(1), dest); >>> + } >>> +}; >>> >>> >>> Did I already ask if this could be changed to u2 rather than >>> jshort? Or is that the follow-on RFE? >> >> That is a follow-on RFE. > > Good. I think that's the one that I assigned to myself. Yes, you are right. >> >>> +// Helper for platforms wanting a constant adjustment. >>> +template >>> +struct Atomic::IncUsingConstant VALUE_OBJ_CLASS_SPEC { >>> + typedef PlatformInc Derived; >>> >>> >>> I can't find the caller of this. 
Is it really a lot faster than >>> having the platform independent add(1, T) / add(-1, T) to make all >>> this code worth having? How is this called? I couldn't parse the >>> trick. Atomic::inc() is always a "constant adjustment" so I'm >>> confused about what the comment means and what motivates all the asm >>> code. Do these platform implementations exist because they don't >>> have twos complement for integer representation? really? >> >> This is used by some x86, PPC and s390 platforms. Personally I >> question its usefulness for x86. I believe it might be one of those >> things where we ran some benchmarks a decade ago and concluded that it >> was slightly faster to have a slimmed path for Atomic::inc rather >> than reusing Atomic::add. > > Yes, there are a lot of optimizations that we slog along in the code > base because they might have either theoretically or measurably made > some difference in something we don't have anymore. I noticed. :) > >> >> I did not initially want to bring this up as it seems like none of my >> business, but now that the question has been asked about differences, >> I could not help but notice the advertised "leading sync" convention >> of Atomic::inc on PPC is not respected. That is, there is no "sync" >> fence before the atomic increment, as required by the specified >> semantics. There is not even a leading "lwsync". The corresponding >> Atomic::add operation though, does have leading lwsync (unlike >> Atomic::inc). Now this should arguably be reinforced to sync rather >> than lwsync to respect the advertised semantics of both Atomic::add >> and Atomic::inc on PPC. Hopefully that statement will not turn into a >> long unrelated mailing thread... > > Could you file a bug with this observation? Sure. >> >> Conclusively though, there is definitely a substantial difference in >> the fencing comparing the PPC implementation of Atomic::inc to >> Atomic::add.
Whether either one of them conforms to intended >> semantics or not is a different matter - one that I was hoping not to >> have to deal with in this RFE as I am merely templateifying what was >> already there, without judging the existing specializations. And it >> is my observation that as the code looks now, we would incur a bunch >> of more fencing compared to what the code does today on PPC. >> > > Completely understand. How are these called exactly though? I > couldn't figure it out. They are called like this: IncImpl::operator() calls PlatformInc::operator(), which has its class partially specialized by the platform (e.g. atomic_linux_ppc.hpp). Its operator() is defined by the super class helper, IncUsingConstant::operator(), that scales the addend accordingly and subsequently calls the PlatformInc::inc function that is defined in the PPC-specific atomic header and performs some suitable inline assembly for the operation. > >>> Also, the function name This() is really disturbing and >>> distracting. Can it be called some verb() representing what it >>> does? cast_to_derived()? >>> >>> + template >>> + void operator()(I volatile* dest) const { >>> + This()->template inc(dest); >>> + } >>> >> >> Yes, I will change the name accordingly as you suggest. >> >>> I didn't know you could put "template" there. >> >> It is required to put the template keyword before the member function >> name when calling a template member function with explicit template >> parameters (as opposed to implicitly inferred template parameters) on >> a template type. > > I thought you could just say inc() in the call, but my C++ > template vocabulary is minimal. >> >>> What does this call? >> >> This calls the platform-defined intrinsic that is defined in the >> platform files - the one that contains the inline assembly. > > How? I don't see how... :( Hopefully I already explained this above.
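To make the call chain described above more concrete, here is a small self-contained sketch of how the layers could fit together. It is not the code from the webrev: the class shapes are simplified, the template argument lists are assumptions, atomic_inc and example_increment_twice are hypothetical names, and std::atomic::fetch_add stands in for the platform's inline assembly. It also shows why the "template" keyword is needed in the call: inside the helper, the derived class is a dependent type, so the parser has to be told that inc names a member template.

```cpp
#include <atomic>
#include <cassert>

// Simplified stand-in for IntegralConstant: a type that carries a
// compile-time value (here, the adjustment to apply).
template<typename T, T v>
struct IntegralConstant {
  static constexpr T value = v;
};

// Simplified stand-in for the IncUsingConstant helper (layer 3).
// Derived is the platform class that inherits from this helper (CRTP).
template<typename Derived, typename Adjustment>
struct IncUsingConstant {
  // Static downcast to the platform class; resolved at compile time.
  const Derived* cast_to_derived() const {
    return static_cast<const Derived*>(this);
  }

  template<typename I>
  void operator()(std::atomic<I>* dest) const {
    // Derived depends on a template parameter, so "template" is required
    // before inc when explicit template arguments are passed.
    cast_to_derived()->template inc<I, Adjustment>(dest);
  }
};

// Simplified stand-in for a platform specialization of PlatformInc (layer 2).
template<typename Adjustment>
struct PlatformInc : IncUsingConstant<PlatformInc<Adjustment>, Adjustment> {
  template<typename I, typename Adj>
  void inc(std::atomic<I>* dest) const {
    // A real platform would emit inline assembly here, with Adj::value
    // available as a compile-time immediate. This sketch uses fetch_add.
    dest->fetch_add(Adj::value);
  }
};

// Simplified stand-in for Atomic::inc / IncImpl on integral types (layer 1).
template<typename I>
void atomic_inc(std::atomic<I>* dest) {
  typedef IntegralConstant<I, I(1)> Adjustment;
  PlatformInc<Adjustment>()(dest);
}

// Hypothetical usage: two increments starting from 40.
int example_increment_twice() {
  std::atomic<int> counter(40);
  atomic_inc(&counter);
  atomic_inc(&counter);
  return counter.load();
}
```

Every call in this chain is a non-virtual call on a concrete type, so a compiler is free to inline the whole path down to the final fetch_add.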
>> >>> Rather than I for integer case, and P for pointer case, can you add >>> a one line comment above this like: >>> // Helper for integer types >>> and >>> // Helper for pointer types >> >> Or perhaps we could do both? Nevertheless, I will add these comments. >> But as per the discussion above, I would be happy if we could keep >> the convention that Kim has already set up for the template type names. >> >>> Small local comments would be really helpful for many of these >>> functions. Just to get more english words in there... Since Kim's >>> on vacation can you help me understand this code and add comments so >>> I remember the reasons for some of this? >> >> Sure - I will decorate the code with some comments to help >> understanding. I will send an updated webrev when I get your reply >> regarding the typename naming convention verdict. > > That's my opinion anyway. David might have the opposite opinion. David? I am curious if you have the same opinion. If you both want to replace the template names I and P with T, then I am happy to do that. Thanks for the review. /Erik > Thanks, > Coleen > >> >> Thanks for the review! >> >> /Erik >> >>> >>> Thanks! >>> Coleen >>> >>> >>> On 8/31/17 8:45 AM, Erik Österlund wrote: >>>> Hi everyone, >>>> >>>> Bug ID: >>>> https://bugs.openjdk.java.net/browse/JDK-8186838 >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >>>> >>>> The time has come for the next step in generalizing Atomic with >>>> templates. Today I will focus on Atomic::inc/dec. >>>> >>>> I have tried to mimic the new Kim style that seems to have been >>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the >>>> structure looks like this: >>>> >>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function >>>> object that performs some basic type checks. >>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can >>>> define the operation arbitrarily for a given platform.
The default >>>> implementation if not specialized for a platform is to call >>>> Atomic::add. So only platforms that want to do something different >>>> than that as an optimization have to provide a specialization. >>>> Layer 3) Platforms that decide to specialize >>>> PlatformInc/PlatformDec to be more optimized may inherit from a >>>> helper class IncUsingConstant/DecUsingConstant. This helper helps >>>> performing the necessary computation what the increment/decrement >>>> should be after pointer scaling using CRTP. The >>>> PlatformInc/PlatformDec operation then only needs to define an >>>> inc/dec member function, and will then get all the context >>>> information necessary to generate a more optimized implementation. >>>> Easy peasy. >>>> >>>> It is worth noticing that the generalized Atomic::dec operation >>>> assumes a two's complement integer machine and potentially sends >>>> the unary negative of a potentially unsigned type to Atomic::add. I >>>> have the following comments about this: >>>> 1) We already assume in other code that two's complement integers >>>> must be present. >>>> 2) A machine that does not have two's complement integers may still >>>> simply provide a specialization that solves the problem in a >>>> different way. >>>> 3) The alternative that does not make assumptions about that would >>>> use the good old IntegerTypes::cast_to_signed metaprogramming >>>> stuff, and I seem to recall we thought that was a bit too involved >>>> and complicated. >>>> This is the reason why I have chosen to use unary minus on the >>>> potentially unsigned type in the shared helper code that sends the >>>> decrement as an addend to Atomic::add. >>>> >>>> It would also be nice if somebody with access to PPC and s390 >>>> machines could try out the relevant changes there so I do not >>>> accidentally break those platforms. 
I have blind-coded the addition >>>> of the immediate values passed in to the inline assembly in a way >>>> that I think looks like it should work. >>>> >>>> Testing: >>>> RBT hs-tier3, JPRT --testset hotspot >>>> >>>> Thanks, >>>> /Erik >>> >> > From erik.osterlund at oracle.com Fri Sep 1 13:31:43 2017 From: erik.osterlund at oracle.com (Erik Österlund) Date: Fri, 1 Sep 2017 15:31:43 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <602b39a1-85e3-0e34-ff0c-c9076885c206@physik.fu-berlin.de> References: <59A804F8.9000501@oracle.com> <602b39a1-85e3-0e34-ff0c-c9076885c206@physik.fu-berlin.de> Message-ID: <59A9613F.3080902@oracle.com> Hi Adrian, Thank you for trying this for me. /Erik On 2017-09-01 11:35, John Paul Adrian Glaubitz wrote: > On 08/31/2017 02:45 PM, Erik Österlund wrote: >> It would also be nice if somebody with access to PPC and s390 machines >> could try out the relevant changes there so I do not accidentally break >> those platforms. > > And linux-zero and linux-sparc, of course :). I will test that. > > Adrian > From aph at redhat.com Fri Sep 1 13:41:01 2017 From: aph at redhat.com (Andrew Haley) Date: Fri, 1 Sep 2017 14:41:01 +0100 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59A804F8.9000501@oracle.com> References: <59A804F8.9000501@oracle.com> Message-ID: On 31/08/17 13:45, Erik Österlund wrote: > Hi everyone, > > Bug ID: > https://bugs.openjdk.java.net/browse/JDK-8186838 > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ > > The time has come for the next step in generalizing Atomic with > templates. Today I will focus on Atomic::inc/dec. > > I have tried to mimic the new Kim style that seems to have been > universally accepted. Like Atomic::add and Atomic::cmpxchg, the > structure looks like this: > > Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object > that performs some basic type checks.
> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can define > the operation arbitrarily for a given platform. The default > implementation if not specialized for a platform is to call Atomic::add. > So only platforms that want to do something different than that as an > optimization have to provide a specialization. > Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec to > be more optimized may inherit from a helper class > IncUsingConstant/DecUsingConstant. This helper helps performing the > necessary computation what the increment/decrement should be after > pointer scaling using CRTP. The PlatformInc/PlatformDec operation then > only needs to define an inc/dec member function, and will then get all > the context information necessary to generate a more optimized > implementation. Easy peasy. I wanted to say something nice, but I honestly can't. I am dismayed. I hoped that inc/dec would turn out to be much simpler than the cmpxchg functions: I think they should, because they don't have to deal with the complexity of potentially three different types. Instead we have, again, a large and complex patch. Even on AArch64, which should be the simplest case because Atomic::inc can be defined as template inc(T1 *dest) { return __sync_add_and_fetch(dest, 1); } or something similar, we have Atomic::inc Atomic::IncImpl::operator() Atomic::PlatformInc<4ul, IntegralConstant >::operator() Atomic::add Atomic::AddImpl::operator() Atomic::AddAndFetch >::operator() Atomic::PlatformAdd<4ul>::add_and_fetch __sync_add_and_fetch I quite understand that it isn't so easy on some systems, and they need a generic form that explodes into four different calls, one for each size of integer. I completely accept that it will be more complex for everything else. But is it necessary to have so much code for something so simple? This is a 1400 line patch. 
Granted, much of it is simply moving stuff around, but despite the potential of template code to simplify the implementation we have a more complex solution than we had before. I ask you, is this the simplest solution that you believe is possible? -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From erik.osterlund at oracle.com Fri Sep 1 14:15:57 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Fri, 1 Sep 2017 16:15:57 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: References: <59A804F8.9000501@oracle.com> Message-ID: <59A96B9D.6070002@oracle.com> Hi Andrew, On 2017-09-01 15:41, Andrew Haley wrote: > On 31/08/17 13:45, Erik ?sterlund wrote: >> Hi everyone, >> >> Bug ID: >> https://bugs.openjdk.java.net/browse/JDK-8186838 >> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >> >> The time has come for the next step in generalizing Atomic with >> templates. Today I will focus on Atomic::inc/dec. >> >> I have tried to mimic the new Kim style that seems to have been >> universally accepted. Like Atomic::add and Atomic::cmpxchg, the >> structure looks like this: >> >> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object >> that performs some basic type checks. >> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can define >> the operation arbitrarily for a given platform. The default >> implementation if not specialized for a platform is to call Atomic::add. >> So only platforms that want to do something different than that as an >> optimization have to provide a specialization. >> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec to >> be more optimized may inherit from a helper class >> IncUsingConstant/DecUsingConstant. This helper helps performing the >> necessary computation what the increment/decrement should be after >> pointer scaling using CRTP. 
The PlatformInc/PlatformDec operation then >> only needs to define an inc/dec member function, and will then get all >> the context information necessary to generate a more optimized >> implementation. Easy peasy. > I wanted to say something nice, but I honestly can't. I am dismayed. Okay. > I hoped that inc/dec would turn out to be much simpler than the > cmpxchg functions: I think they should, because they don't have to > deal with the complexity of potentially three different types. > Instead we have, again, a large and complex patch. > > Even on AArch64, which should be the simplest case because Atomic::inc > can be defined as > > template > inc(T1 *dest) { > return __sync_add_and_fetch(dest, 1); > } AArch64 is indeed the simplest case. It does not have a specialization in my patch. It simply expresses Atomic::inc in terms of Atomic::add. > or something similar, we have > > Atomic::inc > Atomic::IncImpl::operator() > Atomic::PlatformInc<4ul, IntegralConstant >::operator() > Atomic::add > Atomic::AddImpl::operator() > Atomic::AddAndFetch >::operator() > Atomic::PlatformAdd<4ul>::add_and_fetch > __sync_add_and_fetch > > I quite understand that it isn't so easy on some systems, and they > need a generic form that explodes into four different calls, one for > each size of integer. I completely accept that it will be more > complex for everything else. But is it necessary to have so much code > for something so simple? This is a 1400 line patch. Granted, much of > it is simply moving stuff around, but despite the potential of > template code to simplify the implementation we have a more complex > solution than we had before. > > I ask you, is this the simplest solution that you believe is possible? It is not the simplest solution I can think of. The simplest solution I can think of is to remove all specialized versions of Atomic::inc/dec and just have it call Atomic::add directly. That would remove the optimizations we have today, for whatever reason we have them. 
It would lead to slightly more conservative fencing on PPC/S390, and would lead to slightly less optimal machine encoding on x86 (without immediate values in the instructions). But it would be simpler for sure. I did not put any judgement into whether our existing optimizations are worthwhile or not. But if you want to prioritize simplicity, removing those optimizations is one possible solution. Would you prefer that? Thanks, /Erik From coleen.phillimore at oracle.com Fri Sep 1 14:42:48 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 1 Sep 2017 10:42:48 -0400 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59A96B9D.6070002@oracle.com> References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> Message-ID: <87f3f12e-26d8-9806-6821-3fb5783bf832@oracle.com> On 9/1/17 10:15 AM, Erik ?sterlund wrote: > Hi Andrew, > > On 2017-09-01 15:41, Andrew Haley wrote: >> On 31/08/17 13:45, Erik ?sterlund wrote: >>> Hi everyone, >>> >>> Bug ID: >>> https://bugs.openjdk.java.net/browse/JDK-8186838 >>> >>> Webrev: >>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >>> >>> The time has come for the next step in generalizing Atomic with >>> templates. Today I will focus on Atomic::inc/dec. >>> >>> I have tried to mimic the new Kim style that seems to have been >>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the >>> structure looks like this: >>> >>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object >>> that performs some basic type checks. >>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can define >>> the operation arbitrarily for a given platform. The default >>> implementation if not specialized for a platform is to call >>> Atomic::add. >>> So only platforms that want to do something different than that as an >>> optimization have to provide a specialization. 
>>> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec to
>>> be more optimized may inherit from a helper class
>>> IncUsingConstant/DecUsingConstant. This helper helps performing the
>>> necessary computation what the increment/decrement should be after
>>> pointer scaling using CRTP. The PlatformInc/PlatformDec operation then
>>> only needs to define an inc/dec member function, and will then get all
>>> the context information necessary to generate a more optimized
>>> implementation. Easy peasy.
>> I wanted to say something nice, but I honestly can't. I am dismayed.
>
> Okay.
>
>> I hoped that inc/dec would turn out to be much simpler than the
>> cmpxchg functions: I think they should, because they don't have to
>> deal with the complexity of potentially three different types.
>> Instead we have, again, a large and complex patch.
>>
>> Even on AArch64, which should be the simplest case because Atomic::inc
>> can be defined as
>>
>> template
>> inc(T1 *dest) {
>>   return __sync_add_and_fetch(dest, 1);
>> }
>
> AArch64 is indeed the simplest case. It does not have a specialization
> in my patch. It simply expresses Atomic::inc in terms of Atomic::add.
>
>> or something similar, we have
>>
>> Atomic::inc
>> Atomic::IncImpl::operator()
>> Atomic::PlatformInc<4ul, IntegralConstant >::operator()
>> Atomic::add
>> Atomic::AddImpl::operator()
>> Atomic::AddAndFetch >::operator()
>> Atomic::PlatformAdd<4ul>::add_and_fetch
>> __sync_add_and_fetch
>>
>> I quite understand that it isn't so easy on some systems, and they
>> need a generic form that explodes into four different calls, one for
>> each size of integer. I completely accept that it will be more
>> complex for everything else. But is it necessary to have so much code
>> for something so simple? This is a 1400 line patch.
Granted, much of >> it is simply moving stuff around, but despite the potential of >> template code to simplify the implementation we have a more complex >> solution than we had before. >> >> I ask you, is this the simplest solution that you believe is possible? > > It is not the simplest solution I can think of. The simplest solution > I can think of is to remove all specialized versions of > Atomic::inc/dec and just have it call Atomic::add directly. That would > remove the optimizations we have today, for whatever reason we have > them. It would lead to slightly more conservative fencing on PPC/S390, > and would lead to slightly less optimal machine encoding on x86 > (without immediate values in the instructions). But it would be > simpler for sure. I did not put any judgement into whether our > existing optimizations are worthwhile or not. But if you want to > prioritize simplicity, removing those optimizations is one possible > solution. Would you prefer that? I wonder if you could remove the linux x86 asm code for inc/dec, recode it to use add, and do a dev submit run against your patch? While we're discussing this. thanks, Coleen > > Thanks, > /Erik From rohitarulraj at gmail.com Fri Sep 1 15:04:04 2017 From: rohitarulraj at gmail.com (Rohit Arul Raj) Date: Fri, 1 Sep 2017 20:34:04 +0530 Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults In-Reply-To: References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com>

Message-ID: On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj wrote: > On Fri, Sep 1, 2017 at 3:01 AM, David Holmes wrote: >> Hi Rohit, >> >> I think the patch needs updating for jdk10 as I already see a lot of logic >> around UseSHA in vm_version_x86.cpp. >> >> Thanks, >> David >> > > Thanks David, I will update the patch wrt JDK10 source base, test and > resubmit for review. > > Regards, > Rohit > Hi All, I have updated the patch wrt openjdk10/hotspot (parent: 13519:71337910df60), did regression testing using jtreg ($make default) and didnt find any regressions. Can anyone please volunteer to review this patch which sets flag/ISA defaults for newer AMD 17h (EPYC) processor? ************************* Patch **************************** diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp --- a/src/cpu/x86/vm/vm_version_x86.cpp +++ b/src/cpu/x86/vm/vm_version_x86.cpp @@ -1088,6 +1088,22 @@ } FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); } + if (supports_sha()) { + if (FLAG_IS_DEFAULT(UseSHA)) { + FLAG_SET_DEFAULT(UseSHA, true); + } + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) { + if (!FLAG_IS_DEFAULT(UseSHA) || + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { + warning("SHA instructions are not available on this CPU"); + } + FLAG_SET_DEFAULT(UseSHA, false); + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); + } // some defaults for AMD family 15h if ( cpu_family() == 0x15 ) { @@ -1109,11 +1125,43 @@ } #ifdef COMPILER2 - if (MaxVectorSize > 16) { - // Limit vectors size to 16 bytes on current AMD cpus. + if (cpu_family() < 0x17 && MaxVectorSize > 16) { + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
FLAG_SET_DEFAULT(MaxVectorSize, 16); } #endif // COMPILER2 + + // Some defaults for AMD family 17h + if ( cpu_family() == 0x17 ) { + // On family 17h processors use XMM and UnalignedLoadStores for Array Copy + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { + UseXMMForArrayCopy = true; + } + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { + UseUnalignedLoadStores = true; + } + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { + UseBMI2Instructions = true; + } + if (MaxVectorSize > 32) { + FLAG_SET_DEFAULT(MaxVectorSize, 32); + } + if (UseSHA) { + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); + } else if (UseSHA512Intrinsics) { + warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU."); + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); + } + } +#ifdef COMPILER2 + if (supports_sse4_2()) { + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { + FLAG_SET_DEFAULT(UseFPUForSpilling, true); + } + } +#endif + } } if( is_intel() ) { // Intel cpus specific settings diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp --- a/src/cpu/x86/vm/vm_version_x86.hpp +++ b/src/cpu/x86/vm/vm_version_x86.hpp @@ -505,6 +505,14 @@ result |= CPU_CLMUL; if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) result |= CPU_RTM; + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) + result |= CPU_ADX; + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) + result |= CPU_BMI2; + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) + result |= CPU_SHA; + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) + result |= CPU_FMA; // AMD features. if (is_amd()) { @@ -515,19 +523,13 @@ result |= CPU_LZCNT; if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) result |= CPU_SSE4A; + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) + result |= CPU_HT; } // Intel features. 
if(is_intel()) { - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) - result |= CPU_ADX; - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) - result |= CPU_BMI2; - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) - result |= CPU_SHA; if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) result |= CPU_LZCNT; - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) - result |= CPU_FMA; // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { result |= CPU_3DNOW_PREFETCH; ************************************************************** Thanks, Rohit >> >> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>> >>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>> wrote: >>>> >>>> Hi Rohit, >>>> >>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>> >>>>> >>>>> I would like an volunteer to review this patch (openJDK9) which sets >>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with >>>>> the commit process. >>>>> >>>>> Webrev: >>>>> >>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>> >>>> >>>> >>>> Unfortunately patches can not be accepted from systems outside the >>>> OpenJDK >>>> infrastructure and ... >>>> >>>>> I have also attached the patch (hg diff -g) for reference. >>>> >>>> >>>> >>>> ... unfortunately patches tend to get stripped by the mail servers. If >>>> the >>>> patch is small please include it inline. Otherwise you will need to find >>>> an >>>> OpenJDK Author who can host it for you on cr.openjdk.java.net. >>>> >>> >>>>> 3) I have done regression testing using jtreg ($make default) and >>>>> didnt find any regressions. >>>> >>>> >>>> >>>> Sounds good, but until I see the patch it is hard to comment on testing >>>> requirements. >>>> >>>> Thanks, >>>> David >>> >>> >>> Thanks David, >>> Yes, it's a small patch. 
>>> >>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>> b/src/cpu/x86/vm/vm_version_x86.cpp >>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>> @@ -1051,6 +1051,22 @@ >>> } >>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>> } >>> + if (supports_sha()) { >>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>> + FLAG_SET_DEFAULT(UseSHA, true); >>> + } >>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>> UseSHA512Intrinsics) { >>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>> + warning("SHA instructions are not available on this CPU"); >>> + } >>> + FLAG_SET_DEFAULT(UseSHA, false); >>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>> + } >>> >>> // some defaults for AMD family 15h >>> if ( cpu_family() == 0x15 ) { >>> @@ -1072,11 +1088,43 @@ >>> } >>> >>> #ifdef COMPILER2 >>> - if (MaxVectorSize > 16) { >>> - // Limit vectors size to 16 bytes on current AMD cpus. >>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>> } >>> #endif // COMPILER2 >>> + >>> + // Some defaults for AMD family 17h >>> + if ( cpu_family() == 0x17 ) { >>> + // On family 17h processors use XMM and UnalignedLoadStores for >>> Array Copy >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>> + UseXMMForArrayCopy = true; >>> + } >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>> + UseUnalignedLoadStores = true; >>> + } >>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>> + UseBMI2Instructions = true; >>> + } >>> + if (MaxVectorSize > 32) { >>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>> + } >>> + if (UseSHA) { >>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>> + } else if (UseSHA512Intrinsics) { >>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>> functions not available on this CPU."); >>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>> + } >>> + } >>> +#ifdef COMPILER2 >>> + if (supports_sse4_2()) { >>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>> + } >>> + } >>> +#endif >>> + } >>> } >>> >>> if( is_intel() ) { // Intel cpus specific settings >>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>> b/src/cpu/x86/vm/vm_version_x86.hpp >>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>> @@ -513,6 +513,16 @@ >>> result |= CPU_LZCNT; >>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>> result |= CPU_SSE4A; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>> + result |= CPU_BMI2; >>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>> + result |= CPU_HT; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>> + result |= CPU_ADX; >>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>> + result |= CPU_SHA; >>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>> + result |= CPU_FMA; >>> } >>> // Intel features. 
>>> if(is_intel()) { >>> >>> Regards, >>> Rohit >>> >> From erik.osterlund at oracle.com Fri Sep 1 15:23:49 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Fri, 1 Sep 2017 17:23:49 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <87f3f12e-26d8-9806-6821-3fb5783bf832@oracle.com> References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <87f3f12e-26d8-9806-6821-3fb5783bf832@oracle.com> Message-ID: <59A97B85.8@oracle.com> Hi Coleen, On 2017-09-01 16:42, coleen.phillimore at oracle.com wrote: > > > On 9/1/17 10:15 AM, Erik ?sterlund wrote: >> Hi Andrew, >> >> On 2017-09-01 15:41, Andrew Haley wrote: >>> On 31/08/17 13:45, Erik ?sterlund wrote: >>>> Hi everyone, >>>> >>>> Bug ID: >>>> https://bugs.openjdk.java.net/browse/JDK-8186838 >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >>>> >>>> The time has come for the next step in generalizing Atomic with >>>> templates. Today I will focus on Atomic::inc/dec. >>>> >>>> I have tried to mimic the new Kim style that seems to have been >>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the >>>> structure looks like this: >>>> >>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object >>>> that performs some basic type checks. >>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can define >>>> the operation arbitrarily for a given platform. The default >>>> implementation if not specialized for a platform is to call >>>> Atomic::add. >>>> So only platforms that want to do something different than that as an >>>> optimization have to provide a specialization. >>>> Layer 3) Platforms that decide to specialize >>>> PlatformInc/PlatformDec to >>>> be more optimized may inherit from a helper class >>>> IncUsingConstant/DecUsingConstant. This helper helps performing the >>>> necessary computation what the increment/decrement should be after >>>> pointer scaling using CRTP. 
The PlatformInc/PlatformDec operation then >>>> only needs to define an inc/dec member function, and will then get all >>>> the context information necessary to generate a more optimized >>>> implementation. Easy peasy. >>> I wanted to say something nice, but I honestly can't. I am dismayed. >> >> Okay. >> >>> I hoped that inc/dec would turn out to be much simpler than the >>> cmpxchg functions: I think they should, because they don't have to >>> deal with the complexity of potentially three different types. >>> Instead we have, again, a large and complex patch. >>> >>> Even on AArch64, which should be the simplest case because Atomic::inc >>> can be defined as >>> >>> template >>> inc(T1 *dest) { >>> return __sync_add_and_fetch(dest, 1); >>> } >> >> AArch64 is indeed the simplest case. It does not have a >> specialization in my patch. It simply expresses Atomic::inc in terms >> of Atomic::add. >> >>> or something similar, we have >>> >>> Atomic::inc >>> Atomic::IncImpl::operator() >>> Atomic::PlatformInc<4ul, IntegralConstant >::operator() >>> Atomic::add >>> Atomic::AddImpl::operator() >>> Atomic::AddAndFetch >::operator() >>> Atomic::PlatformAdd<4ul>::add_and_fetch >>> __sync_add_and_fetch >>> >>> I quite understand that it isn't so easy on some systems, and they >>> need a generic form that explodes into four different calls, one for >>> each size of integer. I completely accept that it will be more >>> complex for everything else. But is it necessary to have so much code >>> for something so simple? This is a 1400 line patch. Granted, much of >>> it is simply moving stuff around, but despite the potential of >>> template code to simplify the implementation we have a more complex >>> solution than we had before. >>> >>> I ask you, is this the simplest solution that you believe is possible? >> >> It is not the simplest solution I can think of. 
The simplest solution >> I can think of is to remove all specialized versions of >> Atomic::inc/dec and just have it call Atomic::add directly. That >> would remove the optimizations we have today, for whatever reason we >> have them. It would lead to slightly more conservative fencing on >> PPC/S390, and would lead to slightly less optimal machine encoding on >> x86 (without immediate values in the instructions). But it would be >> simpler for sure. I did not put any judgement into whether our >> existing optimizations are worthwhile or not. But if you want to >> prioritize simplicity, removing those optimizations is one possible >> solution. Would you prefer that? > > I wonder if you could remove the linux x86 asm code for inc/dec, > recode it to use add, and do a dev submit run against your patch? > While we're discussing this. Okay, I will try that. /Erik > thanks, > Coleen > >> >> Thanks, >> /Erik > From jesper.wilhelmsson at oracle.com Fri Sep 1 15:42:03 2017 From: jesper.wilhelmsson at oracle.com (Jesper Wilhelmsson) Date: Fri, 1 Sep 2017 17:42:03 +0200 Subject: Hotspot repository jdk10/hs closes today In-Reply-To: <7e46d1d2-e058-cd70-2d2b-73437806b7c3@redhat.com> References: <2081B825-B14B-4846-A1BA-294B4ECC1B5A@oracle.com> <26D1BC95-12A8-4FFC-BF52-661B338233B4@oracle.com> <7e46d1d2-e058-cd70-2d2b-73437806b7c3@redhat.com> Message-ID: <3DE5D3A5-0E5E-413D-9191-9C6B7132D22F@oracle.com> Jdk10/hs has been fairly stable lately. There are no open integration blockers right now. I'll send out a status update tomorrow when I have looked at the results of the Friday nightly. /Jesper > 1 sep. 2017 kl. 14:00 skrev Aleksey Shipilev : > > Hi, > >> On 09/01/2017 01:54 PM, jesper.wilhelmsson at oracle.com wrote: >> Just a reminder that this is happening today at 2 pm PT. The repository will be made read only >> for approx two weeks. > Auxiliary question: does that mean jdk10/hs is "stable" now? I.e. no pending integrations, > integration blockers, etc? 
We are preparing the derived shenandoah/jdk10 forest for consolidation
> too, and want to pull latest stable jdk10/hs to shenandoah/jdk10 to test in the interim two weeks of
> consolidation.
>
> Thanks,
> -Aleksey
>

From volker.simonis at gmail.com Fri Sep 1 15:42:53 2017
From: volker.simonis at gmail.com (Volker Simonis)
Date: Fri, 1 Sep 2017 17:42:53 +0200
Subject: RFR(S): 8187091: ReturnBlobToWrongHeapTest fails because of problems in CodeHeap::contains_blob()
Message-ID:

Hi,

can I please have a review and sponsor for the following small fix:

http://cr.openjdk.java.net/~simonis/webrevs/2017/8187091/
https://bugs.openjdk.java.net/browse/JDK-8187091

We see failures in test/compiler/codecache/stress/ReturnBlobToWrongHeapTest.java which are caused by problems in CodeHeap::contains_blob() for corner cases with CodeBlobs of zero size:

# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (heap.cpp:248), pid=27586, tid=27587
# guarantee((char*) b >= _memory.low_boundary() && (char*) b < _memory.high()) failed: The block to be deallocated 0x00007fffe6666f80 is not within the heap starting with 0x00007fffe6667000 and ending with 0x00007fffe6ba000

The problem is that JDK-8183573 replaced

  virtual bool contains_blob(const CodeBlob* blob) const { return low_boundary() <= (char*) blob && (char*) blob < high(); }

by:

  bool contains_blob(const CodeBlob* blob) const { return contains(blob->code_begin()); }

But that may be wrong in the corner case where the size of the CodeBlob's payload is zero (i.e. the CodeBlob consists only of the 'header' - i.e. the C++ object itself), because in that case CodeBlob::code_begin() points right behind the CodeBlob's header, which is a memory location that doesn't belong to the CodeBlob anymore.

This exact corner case is exercised by ReturnBlobToWrongHeapTest, which allocates CodeBlobs of size zero (i.e. zero 'payload') with the help of sun.hotspot.WhiteBox.allocateCodeBlob() until the CodeCache fills up.
The test first fills the 'non-profiled nmethods' CodeHeap. If the 'non-profiled nmethods' CodeHeap is full, the VM automatically tries to allocate from the 'profiled nmethods' CodeHeap until that fills up as well. But in the CodeCache the 'profiled nmethods' CodeHeap is located right before the 'non-profiled nmethods' CodeHeap. So if the last CodeBlob allocated from the 'profiled nmethods' CodeHeap has a payload size of zero and uses all of the CodeHeap's remaining space, we end up with a CodeBlob whose code_begin() address points right behind the actual CodeHeap (i.e. it points right at the beginning of the adjacent 'non-profiled nmethods' CodeHeap). This causes the above guarantee to fire when we try to free the last allocated CodeBlob (with sun.hotspot.WhiteBox.freeCodeBlob()).

In a previous mail thread (http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-August/028175.html) Vladimir explained why JDK-8183573 was done:

> About contains_blob(). The problem is that AOTCompiledMethod allocated in CHeap and not in aot code section (which is RO):
>
> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124
>
> It is allocated in CHeap after AOT library is loaded. Its code_begin() points to AOT code section but AOTCompiledMethod*
> points outside it (to normal malloced space) so you can't use (char*)blob address.

and proposed these two fixes:

> There are 2 ways to fix it, I think.
> One is to add new field to CodeBlobLayout and set it to blob* address for normal CodeCache blobs and to code_begin for
> AOT code.
> Second is to use contains(blob->code_end() - 1) assuming that AOT code is never zero.

I came up with a slightly different solution - just use 'CodeHeap::code_blob_type()' to decide whether to use 'blob->code_begin()' (for the AOT case) or '(void*)blob' (for all other blobs) as input for the call to 'CodeHeap::contains()'. It's simple and still much cheaper than a virtual call. What do you think?
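
The corner case described above can be illustrated with a toy model. This is only a minimal sketch under stated assumptions: `ToyHeap`, `ToyBlob`, `make_last_zero_payload_blob` and the two `contains_blob_*` helpers are hypothetical stand-ins invented for illustration, not HotSpot's CodeHeap/CodeBlob code, and the `is_aot` flag merely models the idea of selecting the probe address by heap type.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a code heap: a half-open address range [low, high).
struct ToyHeap {
  const char* low;
  const char* high;
  bool is_aot;  // assumption: models the blob-type distinction discussed above

  bool contains(const void* p) const {
    const char* c = static_cast<const char*>(p);
    return low <= c && c < high;
  }
};

// Hypothetical stand-in for a blob: a header immediately followed by its payload.
struct ToyBlob {
  const char* header;        // address of the blob object itself
  std::size_t header_size;
  std::size_t payload_size;

  const char* code_begin() const { return header + header_size; }
};

// Places a header-only blob so that it uses up the last bytes of the heap.
ToyBlob make_last_zero_payload_blob(const ToyHeap& heap, std::size_t header_size) {
  assert(header_size <= static_cast<std::size_t>(heap.high - heap.low));
  return ToyBlob{heap.high - header_size, header_size, 0};
}

// Probes with code_begin(): for the blob above this address equals heap.high,
// which is outside the half-open range.
bool contains_blob_by_code_begin(const ToyHeap& heap, const ToyBlob& blob) {
  return heap.contains(blob.code_begin());
}

// Probes with code_begin() only for AOT heaps and with the header address otherwise.
bool contains_blob_by_type(const ToyHeap& heap, const ToyBlob& blob) {
  const void* probe = heap.is_aot ? static_cast<const void*>(blob.code_begin())
                                  : static_cast<const void*>(blob.header);
  return heap.contains(probe);
}
```

With two adjacent toy heaps carved out of one buffer, the header-only blob at the end of the first heap is rejected by the code_begin() probe (and would even be attributed to the second heap), while the type-based probe finds it in the heap it was allocated from.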
I've also updated the documentation of the CodeBlob class hierarchy in codeBlob.hpp. Please let me know if I've missed something.

Thank you and best regards,
Volker

From volker.simonis at gmail.com Fri Sep 1 15:46:32 2017
From: volker.simonis at gmail.com (Volker Simonis)
Date: Fri, 1 Sep 2017 17:46:32 +0200
Subject: RFR(M): 8166317: InterpreterCodeSize should be computed
In-Reply-To:
References: <50cda0ab-f403-372a-ce51-1a27d8821448@oracle.com>

Message-ID:

Hi,

I've decided to split the fix for the 'CodeHeap::contains_blob()' problem into its own issue "8187091: ReturnBlobToWrongHeapTest fails because of problems in CodeHeap::contains_blob()" (https://bugs.openjdk.java.net/browse/JDK-8187091) and started a new review thread for discussing it at:

http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028206.html

So please let's keep this thread for discussing the interpreter code size issue only. I've prepared a new version of the webrev which is the same as the first one, with the only difference that the change to 'CodeHeap::contains_blob()' has been removed:

http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v1/

Thanks,
Volker

On Thu, Aug 31, 2017 at 6:35 PM, Volker Simonis wrote:
> On Thu, Aug 31, 2017 at 6:05 PM, Vladimir Kozlov
> wrote:
>> Very good change. Thank you, Volker.
>>
>> About contains_blob(). The problem is that AOTCompiledMethod allocated in
>> CHeap and not in aot code section (which is RO):
>>
>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124
>>
>> It is allocated in CHeap after AOT library is loaded. Its code_begin()
>> points to AOT code section but AOTCompiledMethod* points outside it (to
>> normal malloced space) so you can't use (char*)blob address.
>>
>
> Thanks for the explanation - now I got it.
>
>> There are 2 ways to fix it, I think.
>> One is to add new field to CodeBlobLayout and set it to blob* address for
>> normal CodeCache blobs and to code_begin for AOT code.
>> Second is to use contains(blob->code_end() - 1) assuming that AOT code is
>> never zero.
>>
>
> I'll give it a try tomorrow and will send out a new webrev.
> > Regards, > Volker > >> Thanks, >> Vladimir >> >> >> On 8/31/17 5:43 AM, Volker Simonis wrote: >>> >>> On Thu, Aug 31, 2017 at 12:14 PM, Claes Redestad >>> wrote: >>>> >>>> >>>> >>>> On 2017-08-31 08:54, Volker Simonis wrote: >>>>> >>>>> >>>>> While working on this, I found another problem which is related to the >>>>> fix of JDK-8183573 and leads to crashes when executing the JTreg test >>>>> compiler/codecache/stress/ReturnBlobToWrongHeapTest.java. >>>>> >>>>> The problem is that JDK-8183573 replaced >>>>> >>>>> virtual bool contains_blob(const CodeBlob* blob) const { return >>>>> low_boundary() <= (char*) blob && (char*) blob < high(); } >>>>> >>>>> by: >>>>> >>>>> bool contains_blob(const CodeBlob* blob) const { return >>>>> contains(blob->code_begin()); } >>>>> >>>>> But that my be wrong in the corner case where the size of the >>>>> CodeBlob's payload is zero (i.e. the CodeBlob consists only of the >>>>> 'header' - i.e. the C++ object itself) because in that case >>>>> CodeBlob::code_begin() points right behind the CodeBlob's header which >>>>> is a memory location which doesn't belong to the CodeBlob anymore. >>>> >>>> >>>> >>>> I recall this change was somehow necessary to allow merging >>>> AOTCodeHeap::contains_blob and CodeHead::contains_blob into >>>> one devirtualized method, so you need to ensure all AOT tests >>>> pass with this change (on linux-x64). >>>> >>> >>> All of hotspot/test/aot and hotspot/test/jvmci executed and passed >>> successful. Are there any other tests I should check? >>> >>> That said, it is a little hard to follow the stages of your change. It >>> seems like >>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.00/ >>> was reviewed [1] but then finally the slightly changed version from >>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.01/ was >>> checked in and linked to the bug report. 
>>>
>>> The first, reviewed version of the change still had a correct version
>>> of 'CodeHeap::contains_blob(const CodeBlob* blob)' while the second,
>>> checked in version has the faulty version of that method.
>>>
>>> I don't know why you finally did that change to 'contains_blob()' but
>>> I don't see any reason why we shouldn't be able to directly use the
>>> blob's address for inclusion checking. From what I understand, it
>>> should ALWAYS be contained in the corresponding CodeHeap so no reason
>>> to mess with 'CodeBlob::code_begin()'.
>>>
>>> Please let me know if I'm missing something.
>>>
>>> [1]
>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-July/026624.html
>>>
>>>> I can't help but wonder if we'd not be better served by disallowing
>>>> zero-sized payloads. Is this something that can ever actually
>>>> happen except by abuse of the white box API?
>>>>
>>>
>>> The corresponding test (ReturnBlobToWrongHeapTest.java) specifically
>>> wants to allocate "segment sized" blocks which is most easily achieved
>>> by allocating zero-sized CodeBlobs. And I think there's nothing wrong
>>> about it if we handle the inclusion tests correctly.
>>>
>>> Thank you and best regards,
>>> Volker
>>>
>>>> /Claes

From coleen.phillimore at oracle.com Fri Sep 1 15:52:01 2017
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Fri, 1 Sep 2017 11:52:01 -0400
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <59A97B85.8@oracle.com>
References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <87f3f12e-26d8-9806-6821-3fb5783bf832@oracle.com> <59A97B85.8@oracle.com>
Message-ID: <2914a34f-845a-3dd1-0407-c42dbac04b19@oracle.com>

The only Atomic::inc* that I found in product code that wasn't printing statistics or exception cases was mostly in G1 and one interesting case in objectMonitor and safepointing, where a lot of other CAS operations already have been done.
I'm willing to bet this platform-specific optimization has no value. I would vote for removal, pending examination of these places.

share/vm/gc/g1/dirtyCardQueue.cpp

  if (result) {
    assert_fully_consumed(node, buffer_size());
    Atomic::inc(&_processed_buffers_mut);
  }
  ...
      Atomic::inc(&_processed_buffers_rs_thread);

share/vm/gc/g1/heapRegionRemSet.cpp

          Atomic::inc(&_occupied);
  Atomic::inc(&_n_coarsenings);

share/vm/runtime/objectMonitor.cpp

ObjectMonitor::enter()

  // Prevent deflation at STW-time. See deflate_idle_monitors() and is_busy().
  // Ensure the object-monitor relationship remains stable while there's contention.
  Atomic::inc(&_count);

share/vm/runtime/safepoint.cpp

      if (is_synchronizing()) {
         Atomic::inc (&TryingToBlock) ;
      }

share/vm/code/nmethod.cpp
nmethodLocker

  Atomic::inc(&nm->_lock_count);

Coleen

On 9/1/17 11:23 AM, Erik Österlund wrote:
> Hi Coleen,
>
> On 2017-09-01 16:42, coleen.phillimore at oracle.com wrote:
>>
>>
>> On 9/1/17 10:15 AM, Erik Österlund wrote:
>>> Hi Andrew,
>>>
>>> On 2017-09-01 15:41, Andrew Haley wrote:
>>>> On 31/08/17 13:45, Erik Österlund wrote:
>>>>> Hi everyone,
>>>>>
>>>>> Bug ID:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8186838
>>>>>
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>>>>>
>>>>> The time has come for the next step in generalizing Atomic with
>>>>> templates. Today I will focus on Atomic::inc/dec.
>>>>>
>>>>> I have tried to mimic the new Kim style that seems to have been
>>>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the
>>>>> structure looks like this:
>>>>>
>>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object
>>>>> that performs some basic type checks.
>>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can
>>>>> define
>>>>> the operation arbitrarily for a given platform.
The default
>>>>> implementation if not specialized for a platform is to call
>>>>> Atomic::add.
>>>>> So only platforms that want to do something different than that as an
>>>>> optimization have to provide a specialization.
>>>>> Layer 3) Platforms that decide to specialize
>>>>> PlatformInc/PlatformDec to
>>>>> be more optimized may inherit from a helper class
>>>>> IncUsingConstant/DecUsingConstant. This helper helps performing the
>>>>> necessary computation what the increment/decrement should be after
>>>>> pointer scaling using CRTP. The PlatformInc/PlatformDec operation
>>>>> then
>>>>> only needs to define an inc/dec member function, and will then get
>>>>> all
>>>>> the context information necessary to generate a more optimized
>>>>> implementation. Easy peasy.
>>>> I wanted to say something nice, but I honestly can't. I am dismayed.
>>>
>>> Okay.
>>>
>>>> I hoped that inc/dec would turn out to be much simpler than the
>>>> cmpxchg functions: I think they should, because they don't have to
>>>> deal with the complexity of potentially three different types.
>>>> Instead we have, again, a large and complex patch.
>>>>
>>>> Even on AArch64, which should be the simplest case because Atomic::inc
>>>> can be defined as
>>>>
>>>> template
>>>> inc(T1 *dest) {
>>>>   return __sync_add_and_fetch(dest, 1);
>>>> }
>>>
>>> AArch64 is indeed the simplest case. It does not have a
>>> specialization in my patch. It simply expresses Atomic::inc in terms
>>> of Atomic::add.
>>>
>>>> or something similar, we have
>>>>
>>>> Atomic::inc
>>>> Atomic::IncImpl::operator()
>>>> Atomic::PlatformInc<4ul, IntegralConstant >::operator()
>>>> Atomic::add
>>>> Atomic::AddImpl::operator()
>>>> Atomic::AddAndFetch >::operator()
>>>> Atomic::PlatformAdd<4ul>::add_and_fetch
>>>> __sync_add_and_fetch
>>>>
>>>> I quite understand that it isn't so easy on some systems, and they
>>>> need a generic form that explodes into four different calls, one for
>>>> each size of integer.
I completely accept that it will be more
>>>> complex for everything else. But is it necessary to have so much code
>>>> for something so simple? This is a 1400 line patch. Granted, much of
>>>> it is simply moving stuff around, but despite the potential of
>>>> template code to simplify the implementation we have a more complex
>>>> solution than we had before.
>>>>
>>>> I ask you, is this the simplest solution that you believe is possible?
>>>
>>> It is not the simplest solution I can think of. The simplest
>>> solution I can think of is to remove all specialized versions of
>>> Atomic::inc/dec and just have it call Atomic::add directly. That
>>> would remove the optimizations we have today, for whatever reason we
>>> have them. It would lead to slightly more conservative fencing on
>>> PPC/S390, and would lead to slightly less optimal machine encoding
>>> on x86 (without immediate values in the instructions). But it would
>>> be simpler for sure. I did not put any judgement into whether our
>>> existing optimizations are worthwhile or not. But if you want to
>>> prioritize simplicity, removing those optimizations is one possible
>>> solution. Would you prefer that?
>>
>> I wonder if you could remove the linux x86 asm code for inc/dec,
>> recode it to use add, and do a dev submit run against your patch?
>> While we're discussing this.
>
> Okay, I will try that.
>
> /Erik
>
>> thanks,
>> Coleen
>>
>>>
>>> Thanks,
>>> /Erik
>>
>

From vladimir.kozlov at oracle.com Fri Sep 1 16:00:42 2017
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 1 Sep 2017 09:00:42 -0700
Subject: RFR(S): 8187091: ReturnBlobToWrongHeapTest fails because of problems in CodeHeap::contains_blob()
In-Reply-To:
References:
Message-ID:

Checking the type is an emulation of a virtual call ;-)
But I agree that it is the simplest solution - a one-line change (excluding the comment - the comment is good, BTW).
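
For readers following the thread, the type check under discussion can be sketched roughly as below. This is a minimal standalone model and not HotSpot code: `Blob`, `Heap`, `BlobType`, their fields, and the `memory` buffer are hypothetical stand-ins for CodeBlob, CodeHeap and CodeBlobType, and the layout is assumed to be a header immediately followed by the payload.

```cpp
// Hypothetical stand-ins for illustration only; not HotSpot's real classes.
enum class BlobType { Regular, AOT };

struct Blob {
  const char* header;      // address of the blob object itself
  const char* code_begin;  // first byte after the header
};

struct Heap {
  const char* low;   // inclusive lower bound of the heap
  const char* high;  // exclusive upper bound of the heap
  BlobType    type;

  bool contains(const void* p) const {
    const char* c = static_cast<const char*>(p);
    return low <= c && c < high;
  }

  // Variant that always tests code_begin.
  bool contains_blob_by_code_begin(const Blob& b) const {
    return contains(b.code_begin);
  }

  // Variant that picks the tested address by heap type:
  // code_begin for AOT heaps, the blob's own address otherwise.
  bool contains_blob(const Blob& b) const {
    const char* start = (type == BlobType::AOT) ? b.code_begin : b.header;
    return contains(start);
  }
};

// One buffer backs both model heaps so that all pointer comparisons
// stay within a single array.
static char memory[256];
```

In this model, a zero-payload blob whose header fills the last bytes of a regular heap has `code_begin == high`, so the first variant reports it as outside the heap while the second still finds it. For an AOT heap the header lives elsewhere, so only `code_begin` can be tested.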
You can also add an AOT_ONLY() guard around the AOT-specific code:

const void* start = AOT_ONLY( (code_blob_type() == CodeBlobType::AOT) ? blob->code_begin() : ) (void*)blob;

because we do have builds without AOT.

Thanks,
Vladimir

On 9/1/17 8:42 AM, Volker Simonis wrote:
> Hi,
>
> can I please have a review and sponsor for the following small fix:
>
> http://cr.openjdk.java.net/~simonis/webrevs/2017/8187091/
> https://bugs.openjdk.java.net/browse/JDK-8187091
>
> We see failures in
> test/compiler/codecache/stress/ReturnBlobToWrongHeapTest.java which
> are caused by problems in CodeHeap::contains_blob() for corner cases
> with CodeBlobs of zero size:
>
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # Internal Error (heap.cpp:248), pid=27586, tid=27587
> # guarantee((char*) b >= _memory.low_boundary() && (char*) b <
> _memory.high()) failed: The block to be deallocated 0x00007fffe6666f80
> is not within the heap starting with 0x00007fffe6667000 and ending
> with 0x00007fffe6ba000
>
> The problem is that JDK-8183573 replaced
>
> virtual bool contains_blob(const CodeBlob* blob) const { return
> low_boundary() <= (char*) blob && (char*) blob < high(); }
>
> by:
>
> bool contains_blob(const CodeBlob* blob) const { return
> contains(blob->code_begin()); }
>
> But that may be wrong in the corner case where the size of the
> CodeBlob's payload is zero (i.e. the CodeBlob consists only of the
> 'header' - i.e. the C++ object itself) because in that case
> CodeBlob::code_begin() points right behind the CodeBlob's header which
> is a memory location which doesn't belong to the CodeBlob anymore.
>
> This exact corner case is exercised by ReturnBlobToWrongHeapTest which
> allocates CodeBlobs of size zero (i.e. zero 'payload') with the help
> of sun.hotspot.WhiteBox.allocateCodeBlob() until the CodeCache fills
> up. The test first fills the 'non-profiled nmethods' CodeHeap.
If the > 'non-profiled nmethods' CodeHeap is full, the VM automatically tries > to allocate from the 'profiled nmethods' CodeHeap until that fills up > as well. But in the CodeCache the 'profiled nmethods' CodeHeap is > located right before the non-profiled nmethods' CodeHeap. So if the > last CodeBlob allocated from the 'profiled nmethods' CodeHeap has a > payload size of zero and uses all the CodeHeaps remaining size, we > will end up with a CodeBlob whose code_begin() address will point > right behind the actual CodeHeap (i.e. it will point right at the > beginning of the adjacent, 'non-profiled nmethods' CodeHeap). This > will result in the above guarantee to fire, when we will try to free > the last allocated CodeBlob (with > sun.hotspot.WhiteBox.freeCodeBlob()). > > In a previous mail thread > (http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-August/028175.html) > Vladimir explained why JDK-8183573 was done: > >> About contains_blob(). The problem is that AOTCompiledMethod allocated in CHeap and not in aot code section (which is RO): >> >> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124 >> >> It is allocated in CHeap after AOT library is loaded. Its code_begin() points to AOT code section but AOTCompiledMethod* >> points outside it (to normal malloced space) so you can't use (char*)blob address. > > and proposed these two fixes: > >> There are 2 ways to fix it, I think. >> One is to add new field to CodeBlobLayout and set it to blob* address for normal CodeCache blobs and to code_begin for >> AOT code. >> Second is to use contains(blob->code_end() - 1) assuming that AOT code is never zero. > > I came up with a slightly different solution - just use > 'CodeHeap::code_blob_type()' whether to use 'blob->code_begin()' (for > the AOT case) or '(void*)blob' (for all other blobs) as input for the > call to 'CodeHeap::contain()'. It's simple and still much cheaper than > a virtual call. 
What do you think? > > I've also updated the documentation of the CodeBlob class hierarchy in > codeBlob.hpp. Please let me know if I've missed something. > > Thank you and best regards, > Volker > From erik.osterlund at oracle.com Fri Sep 1 16:10:38 2017 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Fri, 1 Sep 2017 18:10:38 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <2914a34f-845a-3dd1-0407-c42dbac04b19@oracle.com> References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <87f3f12e-26d8-9806-6821-3fb5783bf832@oracle.com> <59A97B85.8@oracle.com> <2914a34f-845a-3dd1-0407-c42dbac04b19@oracle.com> Message-ID: <5F1D016E-C92B-4C26-8A3C-D4BF59033751@oracle.com> Hi Coleen, I tend to agree. I would happily nuke this optimization in the name of simplicity. Thanks, /Erik > On 1 Sep 2017, at 17:52, coleen.phillimore at oracle.com wrote: > > > The only Atomic::inc* that I found in product code that wasn't printing statistics or exception cases was mostly in G1 and one interesting case in objectMonitor and safepointing, where a lot of other CAS operations already have been done. I'm willing to bet this platform specific optimization has no value. I would vote removal, pending examination of these places. > > share/vm/gc/g1/dirtyCardQueue.cpp > > if (result) { > assert_fully_consumed(node, buffer_size()); > Atomic::inc(&_processed_buffers_mut); > } > ... > Atomic::inc(&_processed_buffers_rs_thread); > > share/vm/gc/g1/heapRegionRemSet.cpp > > Atomic::inc(&_occupied); > Atomic::inc(&_n_coarsenings); > > share/vm/runtime/objectMonitor.cpp > > ObjectMonitor::enter() > > // Prevent deflation at STW-time. See deflate_idle_monitors() and is_busy(). > // Ensure the object-monitor relationship remains stable while there's contention. 
> Atomic::inc(&_count); > > share/vm/runtime/safepoint.cpp > > if (is_synchronizing()) { > Atomic::inc (&TryingToBlock) ; > } > > share/vm/code/nmethod.cpp > nmethodLocker > > Atomic::inc(&nm->_lock_count); > > > Coleen > > >> On 9/1/17 11:23 AM, Erik ?sterlund wrote: >> Hi Coleen, >> >>> On 2017-09-01 16:42, coleen.phillimore at oracle.com wrote: >>> >>> >>>> On 9/1/17 10:15 AM, Erik ?sterlund wrote: >>>> Hi Andrew, >>>> >>>>> On 2017-09-01 15:41, Andrew Haley wrote: >>>>>> On 31/08/17 13:45, Erik ?sterlund wrote: >>>>>> Hi everyone, >>>>>> >>>>>> Bug ID: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8186838 >>>>>> >>>>>> Webrev: >>>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >>>>>> >>>>>> The time has come for the next step in generalizing Atomic with >>>>>> templates. Today I will focus on Atomic::inc/dec. >>>>>> >>>>>> I have tried to mimic the new Kim style that seems to have been >>>>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the >>>>>> structure looks like this: >>>>>> >>>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object >>>>>> that performs some basic type checks. >>>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can define >>>>>> the operation arbitrarily for a given platform. The default >>>>>> implementation if not specialized for a platform is to call Atomic::add. >>>>>> So only platforms that want to do something different than that as an >>>>>> optimization have to provide a specialization. >>>>>> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec to >>>>>> be more optimized may inherit from a helper class >>>>>> IncUsingConstant/DecUsingConstant. This helper helps performing the >>>>>> necessary computation what the increment/decrement should be after >>>>>> pointer scaling using CRTP. 
The PlatformInc/PlatformDec operation then >>>>>> only needs to define an inc/dec member function, and will then get all >>>>>> the context information necessary to generate a more optimized >>>>>> implementation. Easy peasy. >>>>> I wanted to say something nice, but I honestly can't. I am dismayed. >>>> >>>> Okay. >>>> >>>>> I hoped that inc/dec would turn out to be much simpler than the >>>>> cmpxchg functions: I think they should, because they don't have to >>>>> deal with the complexity of potentially three different types. >>>>> Instead we have, again, a large and complex patch. >>>>> >>>>> Even on AArch64, which should be the simplest case because Atomic::inc >>>>> can be defined as >>>>> >>>>> template >>>>> inc(T1 *dest) { >>>>> return __sync_add_and_fetch(dest, 1); >>>>> } >>>> >>>> AArch64 is indeed the simplest case. It does not have a specialization in my patch. It simply expresses Atomic::inc in terms of Atomic::add. >>>> >>>>> or something similar, we have >>>>> >>>>> Atomic::inc >>>>> Atomic::IncImpl::operator() >>>>> Atomic::PlatformInc<4ul, IntegralConstant >::operator() >>>>> Atomic::add >>>>> Atomic::AddImpl::operator() >>>>> Atomic::AddAndFetch >::operator() >>>>> Atomic::PlatformAdd<4ul>::add_and_fetch >>>>> __sync_add_and_fetch >>>>> >>>>> I quite understand that it isn't so easy on some systems, and they >>>>> need a generic form that explodes into four different calls, one for >>>>> each size of integer. I completely accept that it will be more >>>>> complex for everything else. But is it necessary to have so much code >>>>> for something so simple? This is a 1400 line patch. Granted, much of >>>>> it is simply moving stuff around, but despite the potential of >>>>> template code to simplify the implementation we have a more complex >>>>> solution than we had before. >>>>> >>>>> I ask you, is this the simplest solution that you believe is possible? >>>> >>>> It is not the simplest solution I can think of. 
The simplest solution I can think of is to remove all specialized versions of Atomic::inc/dec and just have it call Atomic::add directly. That would remove the optimizations we have today, for whatever reason we have them. It would lead to slightly more conservative fencing on PPC/S390, and would lead to slightly less optimal machine encoding on x86 (without immediate values in the instructions). But it would be simpler for sure. I did not put any judgement into whether our existing optimizations are worthwhile or not. But if you want to prioritize simplicity, removing those optimizations is one possible solution. Would you prefer that? >>> >>> I wonder if you could remove the linux x86 asm code for inc/dec, recode it to use add, and do a dev submit run against your patch? While we're discussing this. >> >> Okay, I will try that. >> >> /Erik >> >>> thanks, >>> Coleen >>> >>>> >>>> Thanks, >>>> /Erik >>> >> > From vladimir.kozlov at oracle.com Fri Sep 1 16:16:28 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 1 Sep 2017 09:16:28 -0700 Subject: RFR(M): 8166317: InterpreterCodeSize should be computed In-Reply-To: References: <50cda0ab-f403-372a-ce51-1a27d8821448@oracle.com>

Message-ID: <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com>

Maybe add a new CodeBlob method to adjust the sizes instead of setting them directly in CodeCache::free_unused_tail(). Then you would not need 'friend class CodeCache' in CodeBlob.

Also, I think the adjustment to header_size should be done in CodeCache::free_unused_tail() to limit the scope of the code that knows about the blob layout.

Thanks,
Vladimir

On 9/1/17 8:46 AM, Volker Simonis wrote:
> Hi,
>
> I've decided to split the fix for the 'CodeHeap::contains_blob()'
> problem into its own issue "8187091: ReturnBlobToWrongHeapTest fails
> because of problems in CodeHeap::contains_blob()"
> (https://bugs.openjdk.java.net/browse/JDK-8187091) and started a new
> review thread for discussing it at:
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028206.html
>
> So please let's keep this thread for discussing the interpreter code
> size issue only. I've prepared a new version of the webrev which is
> the same as the first one with the only difference that the change to
> 'CodeHeap::contains_blob()' has been removed:
>
> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v1/
>
> Thanks,
> Volker
>
>
> On Thu, Aug 31, 2017 at 6:35 PM, Volker Simonis
> wrote:
>> On Thu, Aug 31, 2017 at 6:05 PM, Vladimir Kozlov
>> wrote:
>>> Very good change. Thank you, Volker.
>>>
>>> About contains_blob(). The problem is that AOTCompiledMethod allocated in
>>> CHeap and not in aot code section (which is RO):
>>>
>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124
>>>
>>> It is allocated in CHeap after AOT library is loaded. Its code_begin()
>>> points to AOT code section but AOTCompiledMethod* points outside it (to
>>> normal malloced space) so you can't use (char*)blob address.
>>>
>>
>> Thanks for the explanation - now I got it.
>>
>>> There are 2 ways to fix it, I think.
>>> One is to add new field to CodeBlobLayout and set it to blob* address for >>> normal CodeCache blobs and to code_begin for AOT code. >>> Second is to use contains(blob->code_end() - 1) assuming that AOT code is >>> never zero. >>> >> >> I'll give it a try tomorrow and will send out a new webrev. >> >> Regards, >> Volker >> >>> Thanks, >>> Vladimir >>> >>> >>> On 8/31/17 5:43 AM, Volker Simonis wrote: >>>> >>>> On Thu, Aug 31, 2017 at 12:14 PM, Claes Redestad >>>> wrote: >>>>> >>>>> >>>>> >>>>> On 2017-08-31 08:54, Volker Simonis wrote: >>>>>> >>>>>> >>>>>> While working on this, I found another problem which is related to the >>>>>> fix of JDK-8183573 and leads to crashes when executing the JTreg test >>>>>> compiler/codecache/stress/ReturnBlobToWrongHeapTest.java. >>>>>> >>>>>> The problem is that JDK-8183573 replaced >>>>>> >>>>>> virtual bool contains_blob(const CodeBlob* blob) const { return >>>>>> low_boundary() <= (char*) blob && (char*) blob < high(); } >>>>>> >>>>>> by: >>>>>> >>>>>> bool contains_blob(const CodeBlob* blob) const { return >>>>>> contains(blob->code_begin()); } >>>>>> >>>>>> But that my be wrong in the corner case where the size of the >>>>>> CodeBlob's payload is zero (i.e. the CodeBlob consists only of the >>>>>> 'header' - i.e. the C++ object itself) because in that case >>>>>> CodeBlob::code_begin() points right behind the CodeBlob's header which >>>>>> is a memory location which doesn't belong to the CodeBlob anymore. >>>>> >>>>> >>>>> >>>>> I recall this change was somehow necessary to allow merging >>>>> AOTCodeHeap::contains_blob and CodeHead::contains_blob into >>>>> one devirtualized method, so you need to ensure all AOT tests >>>>> pass with this change (on linux-x64). >>>>> >>>> >>>> All of hotspot/test/aot and hotspot/test/jvmci executed and passed >>>> successful. Are there any other tests I should check? >>>> >>>> That said, it is a little hard to follow the stages of your change. 
It >>>> seems like >>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.00/ >>>> was reviewed [1] but then finally the slightly changed version from >>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.01/ was >>>> checked in and linked to the bug report. >>>> >>>> The first, reviewed version of the change still had a correct version >>>> of 'CodeHeap::contains_blob(const CodeBlob* blob)' while the second, >>>> checked in version has the faulty version of that method. >>>> >>>> I don't know why you finally did that change to 'contains_blob()' but >>>> I don't see any reason why we shouldn't be able to directly use the >>>> blob's address for inclusion checking. From what I understand, it >>>> should ALWAYS be contained in the corresponding CodeHeap so no reason >>>> to mess with 'CodeBlob::code_begin()'. >>>> >>>> Please let me know if I'm missing something. >>>> >>>> [1] >>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-July/026624.html >>>> >>>>> I can't help to wonder if we'd not be better served by disallowing >>>>> zero-sized payloads. Is this something that can ever actually >>>>> happen except by abuse of the white box API? >>>>> >>>> >>>> The corresponding test (ReturnBlobToWrongHeapTest.java) specifically >>>> wants to allocate "segment sized" blocks which is most easily achieved >>>> by allocation zero-sized CodeBlobs. And I think there's nothing wrong >>>> about it if we handle the inclusion tests correctly. >>>> >>>> Thank you and best regards, >>>> Volker >>>> >>>>> /Claes From vladimir.kozlov at oracle.com Fri Sep 1 16:43:54 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 1 Sep 2017 09:43:54 -0700 Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults In-Reply-To: References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com>

Message-ID:

Hi Rohit,

The changes look good. The only question I have is about MaxVectorSize. It is set > 16 only in the presence of AVX:

http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945

Does that code work for AMD 17h too?

Thanks,
Vladimir

On 9/1/17 8:04 AM, Rohit Arul Raj wrote:
> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj wrote:
>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes wrote:
>>> Hi Rohit,
>>>
>>> I think the patch needs updating for jdk10 as I already see a lot of logic
>>> around UseSHA in vm_version_x86.cpp.
>>>
>>> Thanks,
>>> David
>>>
>>
>> Thanks David, I will update the patch wrt JDK10 source base, test and
>> resubmit for review.
>>
>> Regards,
>> Rohit
>>
>
> Hi All,
>
> I have updated the patch wrt openjdk10/hotspot (parent:
> 13519:71337910df60), did regression testing using jtreg ($make
> default) and didn't find any regressions.
>
> Can anyone please volunteer to review this patch which sets flag/ISA
> defaults for newer AMD 17h (EPYC) processor?
> > ************************* Patch **************************** > > diff --git a/src/cpu/x86/vm/vm_version_x86.cpp > b/src/cpu/x86/vm/vm_version_x86.cpp > --- a/src/cpu/x86/vm/vm_version_x86.cpp > +++ b/src/cpu/x86/vm/vm_version_x86.cpp > @@ -1088,6 +1088,22 @@ > } > FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); > } > + if (supports_sha()) { > + if (FLAG_IS_DEFAULT(UseSHA)) { > + FLAG_SET_DEFAULT(UseSHA, true); > + } > + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || > UseSHA512Intrinsics) { > + if (!FLAG_IS_DEFAULT(UseSHA) || > + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || > + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || > + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { > + warning("SHA instructions are not available on this CPU"); > + } > + FLAG_SET_DEFAULT(UseSHA, false); > + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); > + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); > + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); > + } > > // some defaults for AMD family 15h > if ( cpu_family() == 0x15 ) { > @@ -1109,11 +1125,43 @@ > } > > #ifdef COMPILER2 > - if (MaxVectorSize > 16) { > - // Limit vectors size to 16 bytes on current AMD cpus. > + if (cpu_family() < 0x17 && MaxVectorSize > 16) { > + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
> FLAG_SET_DEFAULT(MaxVectorSize, 16); > } > #endif // COMPILER2 > + > + // Some defaults for AMD family 17h > + if ( cpu_family() == 0x17 ) { > + // On family 17h processors use XMM and UnalignedLoadStores for > Array Copy > + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { > + UseXMMForArrayCopy = true; > + } > + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { > + UseUnalignedLoadStores = true; > + } > + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { > + UseBMI2Instructions = true; > + } > + if (MaxVectorSize > 32) { > + FLAG_SET_DEFAULT(MaxVectorSize, 32); > + } > + if (UseSHA) { > + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { > + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); > + } else if (UseSHA512Intrinsics) { > + warning("Intrinsics for SHA-384 and SHA-512 crypto hash > functions not available on this CPU."); > + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); > + } > + } > +#ifdef COMPILER2 > + if (supports_sse4_2()) { > + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { > + FLAG_SET_DEFAULT(UseFPUForSpilling, true); > + } > + } > +#endif > + } > } > > if( is_intel() ) { // Intel cpus specific settings > diff --git a/src/cpu/x86/vm/vm_version_x86.hpp > b/src/cpu/x86/vm/vm_version_x86.hpp > --- a/src/cpu/x86/vm/vm_version_x86.hpp > +++ b/src/cpu/x86/vm/vm_version_x86.hpp > @@ -505,6 +505,14 @@ > result |= CPU_CLMUL; > if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) > result |= CPU_RTM; > + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) > + result |= CPU_ADX; > + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) > + result |= CPU_BMI2; > + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) > + result |= CPU_SHA; > + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) > + result |= CPU_FMA; > > // AMD features. > if (is_amd()) { > @@ -515,19 +523,13 @@ > result |= CPU_LZCNT; > if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) > result |= CPU_SSE4A; > + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) > + result |= CPU_HT; > } > // Intel features. 
> if(is_intel()) { > - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) > - result |= CPU_ADX; > - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) > - result |= CPU_BMI2; > - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) > - result |= CPU_SHA; > if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) > result |= CPU_LZCNT; > - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) > - result |= CPU_FMA; > // for Intel, ecx.bits.misalignsse bit (bit 8) indicates > support for prefetchw > if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { > result |= CPU_3DNOW_PREFETCH; > > ************************************************************** > > Thanks, > Rohit > >>> >>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>> >>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>> wrote: >>>>> >>>>> Hi Rohit, >>>>> >>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>> >>>>>> >>>>>> I would like an volunteer to review this patch (openJDK9) which sets >>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with >>>>>> the commit process. >>>>>> >>>>>> Webrev: >>>>>> >>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>> >>>>> >>>>> >>>>> Unfortunately patches can not be accepted from systems outside the >>>>> OpenJDK >>>>> infrastructure and ... >>>>> >>>>>> I have also attached the patch (hg diff -g) for reference. >>>>> >>>>> >>>>> >>>>> ... unfortunately patches tend to get stripped by the mail servers. If >>>>> the >>>>> patch is small please include it inline. Otherwise you will need to find >>>>> an >>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net. >>>>> >>>> >>>>>> 3) I have done regression testing using jtreg ($make default) and >>>>>> didnt find any regressions. >>>>> >>>>> >>>>> >>>>> Sounds good, but until I see the patch it is hard to comment on testing >>>>> requirements. >>>>> >>>>> Thanks, >>>>> David >>>> >>>> >>>> Thanks David, >>>> Yes, it's a small patch. 
>>>> >>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>> @@ -1051,6 +1051,22 @@ >>>> } >>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>> } >>>> + if (supports_sha()) { >>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>> + } >>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>> UseSHA512Intrinsics) { >>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>> + warning("SHA instructions are not available on this CPU"); >>>> + } >>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>> + } >>>> >>>> // some defaults for AMD family 15h >>>> if ( cpu_family() == 0x15 ) { >>>> @@ -1072,11 +1088,43 @@ >>>> } >>>> >>>> #ifdef COMPILER2 >>>> - if (MaxVectorSize > 16) { >>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>> } >>>> #endif // COMPILER2 >>>> + >>>> + // Some defaults for AMD family 17h >>>> + if ( cpu_family() == 0x17 ) { >>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>> Array Copy >>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>> + UseXMMForArrayCopy = true; >>>> + } >>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>> + UseUnalignedLoadStores = true; >>>> + } >>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>> + UseBMI2Instructions = true; >>>> + } >>>> + if (MaxVectorSize > 32) { >>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>> + } >>>> + if (UseSHA) { >>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>> + } else if (UseSHA512Intrinsics) { >>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>> functions not available on this CPU."); >>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>> + } >>>> + } >>>> +#ifdef COMPILER2 >>>> + if (supports_sse4_2()) { >>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>> + } >>>> + } >>>> +#endif >>>> + } >>>> } >>>> >>>> if( is_intel() ) { // Intel cpus specific settings >>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>> @@ -513,6 +513,16 @@ >>>> result |= CPU_LZCNT; >>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>> result |= CPU_SSE4A; >>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>> + result |= CPU_BMI2; >>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>> + result |= CPU_HT; >>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>> + result |= CPU_ADX; >>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>> + result |= CPU_SHA; >>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>> + result |= CPU_FMA; >>>> } 
>>>> // Intel features.
>>>> if(is_intel()) {
>>>>
>>>> Regards,
>>>> Rohit
>>>>
>>>

From david.holmes at oracle.com Fri Sep 1 21:51:17 2017
From: david.holmes at oracle.com (David Holmes)
Date: Sat, 2 Sep 2017 07:51:17 +1000
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <59A9612C.40900@oracle.com>
References: <59A804F8.9000501@oracle.com> <331bf921-243a-9c28-eb0f-c945bdc11384@oracle.com> <59A91CE6.2080206@oracle.com> <516f938c-3ed5-2d95-2a3b-418ad2d2a149@oracle.com> <59A9612C.40900@oracle.com>
Message-ID: <20fe4ff5-5ea6-449a-09a8-3859db022477@oracle.com>

> David? I am curious if you have the same opinion. If you both want to replace the template names I and P with T, then I am happy to do that.

I don't mind the P, I convention, but probably would not miss it either. So I'm on the fence.

David
-----

On 1/09/2017 11:31 PM, Erik Österlund wrote:
> Hi Coleen,
>
> On 2017-09-01 14:51, coleen.phillimore at oracle.com wrote:
>>
>>
>> On 9/1/17 4:40 AM, Erik Österlund wrote:
>>> Hi Coleen,
>>>
>>> Thank you for taking your time to review this.
>>>
>>> On 2017-09-01 02:03, coleen.phillimore at oracle.com wrote:
>>>>
>>>> Hi, I'm trying to parse the templates to review this but maybe it's
>>>> convention but decoding these with parameters that are single
>>>> capital letters make reading the template very difficult. There are
>>>> already a lot of non-alphanumeric characters. When the letter is
>>>> T, that is expected by convention, but D or especially I makes it
>>>> really hard. Can these be normalized to all use T when there is
>>>> only one template parameter? It'll be clear that T* is a pointer
>>>> and T is an integer without having it be P.
>>>
>>> I apologize the names of the template parameters are hard to
>>> understand. For what it's worth, I am only consistently applying
>>> Kim's conventions here. It seemed like a bad idea to violate
>>> conventions already set up - that would arguably be more confusing.
>>> >>> The convention from earlier work by Kim is: >>> D: Type of destination >>> I: Operand type that has to be an integral type >>> P: Operand type that is a pointer element type >>> T: Generic operand type, may be integral or pointer type >>> >>> Personally, I do not mind this convention. It is more specific and >>> annotates things we know about the type into the name of the type. >>> >>> Do you want me to: >>> >>> 1) Keep the convention, now that I have explained what the convention >>> is and why it is your friend >> >> It is not my friend.? It's not helpful.?? I have to go through >> multiple non-alphabetic characters looking for the letter I or the >> letter P to mentally make the substitution of the template type. > > Okay. I understand now that the pre-existing naming convention of types > named I and P differentiating integral types from pointer types is not > helpful to you. And if I understand you correctly, you would like to > introduce a new naming convention that you find more helpful that uses > the more general type name T instead, regardless if it refers to an > integral type or a pointer type, and save the exercise of figuring out > whether it is intentionally constrained to be a pointer type or an > integral type to the reader by going to the declaration, and there > reading some kind of comment describing such properties in text instead? > > Do we have a consensus that this new convention is indeed more desirable? > >> >>> 2) Break the convention for this change only making the naming >>> inconsistent >> >> Break it for this changeset and we'll fix it later for the earlier >> work from Kim.? I don't remember P and I in Kim's changeset but >> realized while looking at your changeset, this was one thing that >> makes these templates slower and more difficult to read. > > Okay. 
> >> In the case of cmpxchg templates with a source, destination and >> original values, it was necessary to have more than T be the template >> type, although unsatisfying, because it turned out that the types >> couldn't be the same. > > Okay. > >> >>> 3) Change the convention throughout consistently, including all >>> earlier work from Kim >>> >>>> >>>> +template >>>> +struct Atomic::IncImpl>>> EnableIf::value>::type> VALUE_OBJ_CLASS_SPEC { >>>> + void operator()(I volatile* dest) const { >>>> + typedef IntegralConstant Adjustment; >>>> + typedef PlatformInc PlatformOp; >>>> + PlatformOp()(dest); >>>> + } >>>> +}; >>>> >>>> This one isn't as difficult, because it's short, but it would be >>>> faster to understand with T. >>>> >>>> +template >>>> +struct Atomic::IncImpl>>> EnableIf::value>::type> VALUE_OBJ_CLASS_SPEC { >>>> + void operator()(T volatile* dest) const { >>>> + typedef IntegralConstant Adjustment; >>>> + typedef PlatformInc PlatformOp; >>>> + PlatformOp()(dest); >>>> + } >>>> +}; >>>> >>>> +template<> >>>> +struct Atomic::IncImpl VALUE_OBJ_CLASS_SPEC { >>>> + void operator()(jshort volatile* dest) const { >>>> + add(jshort(1), dest); >>>> + } >>>> +}; >>>> >>>> >>>> Did I already ask if this could be changed to u2 rather than >>>> jshort?? Or is that the follow-on RFE? >>> >>> That is a follow-on RFE. >> >> Good.? I think that's the one that I assigned to myself. > > Yes, you are right. > >>> >>>> +// Helper for platforms wanting a constant adjustment. >>>> +template >>>> +struct Atomic::IncUsingConstant VALUE_OBJ_CLASS_SPEC { >>>> + typedef PlatformInc Derived; >>>> >>>> >>>> I can't find the caller of this.? Is it really a lot faster than >>>> having the platform independent add(1, T) / add(-1, T) to make all >>>> this code worth having?? How is this called?? I couldn't parse the >>>> trick.? Atomic::inc() is always a "constant adjustment" so I'm >>>> confused about what the comment means and what motivates all the asm >>>> code.?? 
Do these platform implementations exist because they don't
>>>> have two's complement for integer representation? really?
>>>
>>> This is used by some x86, PPC and s390 platforms. Personally I
>>> question its usefulness for x86. I believe it might be one of those
>>> things where we ran some benchmarks a decade ago and concluded that it
>>> was slightly faster to have a slimmed path for Atomic::inc rather
>>> than reusing Atomic::add.
>>
>> Yes, there are a lot of optimizations that we slog along in the code
>> base because they might have either theoretically or measurably made
>> some difference in something we don't have anymore.
>
> I noticed. :)
>
>>
>>>
>>> I did not initially want to bring this up as it seems like none of my
>>> business, but now that the question has been asked about differences,
>>> I could not help but notice the advertised "leading sync" convention
>>> of Atomic::inc on PPC is not respected. That is, there is no "sync"
>>> fence before the atomic increment, as required by the specified
>>> semantics. There is not even a leading "lwsync". The corresponding
>>> Atomic::add operation though, does have leading lwsync (unlike
>>> Atomic::inc). Now this should arguably be reinforced to sync rather
>>> than lwsync to respect the advertised semantics of both Atomic::add
>>> and Atomic::inc on PPC. Hopefully that statement will not turn into a
>>> long unrelated mailing thread...
>>
>> Could you file a bug with this observation?
>
> Sure.
>
>>>
>>> Conclusively though, there is definitely a substantial difference in
>>> the fencing comparing the PPC implementation of Atomic::inc to
>>> Atomic::add. Whether either one of them conforms to intended
>>> semantics or not is a different matter - one that I was hoping not to
>>> have to deal with in this RFE as I am merely templateifying what was
>>> already there, without judging the existing specializations.
And it
>>> is my observation that as the code looks now, we would incur a bunch
>>> more fencing compared to what the code does today on PPC.
>>>
>>
>> Completely understand. How are these called exactly though? I
>> couldn't figure it out.
>
> They are called like this:
> IncImpl::operator() calls PlatformInc::operator(), which has its class
> partially specialized by the platform (e.g. atomic_linux_ppc.hpp). Its
> operator() is defined by the super class helper,
> IncUsingConstant::operator(), that scales the addend accordingly and
> subsequently calls the PlatformInc::inc function that is defined in the
> PPC-specific atomic header and performs some suitable inline assembly
> for the operation.
>
>>
>>>> Also, the function name This() is really disturbing and
>>>> distracting. Can it be called some verb() representing what it
>>>> does? cast_to_derived()?
>>>>
>>>> + template
>>>> + void operator()(I volatile* dest) const {
>>>> + This()->template inc(dest);
>>>> + }
>>>>
>>>
>>> Yes, I will change the name accordingly as you suggest.
>>>
>>>> I didn't know you could put "template" there.
>>>
>>> It is required to put the template keyword before the member function
>>> name when calling a template member function with explicit template
>>> parameters (as opposed to implicitly inferred template parameters) on
>>> a template type.
>>
>> I thought you could just say inc() in the call, but my C++
>> template vocabulary is minimal.
>>>
>>>> What does this call?
>>>
>>> This calls the platform-defined intrinsic that is defined in the
>>> platform files - the one that contains the inline assembly.
>>
>> How? I don't see how... :(
>
> Hopefully I already explained this above.
>
>>>
>>>> Rather than I for integer case, and P for pointer case, can you add
>>>> a one line comment above this like:
>>>> // Helper for integer types
>>>> and
>>>> // Helper for pointer types
>>>
>>> Or perhaps we could do both? Nevertheless, I will add these comments.
>>> But as per the discussion above, I would be happy if we could keep >>> the convention that Kim has already set up for the template type names. >>> >>>> Small local comments would be really helpful for many of these >>>> functions.?? Just to get more english words in there...? Since Kim's >>>> on vacation can you help me understand this code and add comments so >>>> I remember the reasons for some of this? >>> >>> Sure - I will decorate the code with some comments to help >>> understanding. I will send an updated webrev when I get your reply >>> regarding the typename naming convention verdict. >> >> That's my opinion anyway.?? David might have the opposite opinion. > > David? I am curious if you have the same opinion. If you both want to > replace the template names I and P with T, then I am happy to do that. > > Thanks for the review. > > /Erik > >> Thanks, >> Coleen >> >>> >>> Thanks for the review! >>> >>> /Erik >>> >>>> >>>> Thanks! >>>> Coleen >>>> >>>> >>>> On 8/31/17 8:45 AM, Erik ?sterlund wrote: >>>>> Hi everyone, >>>>> >>>>> Bug ID: >>>>> https://bugs.openjdk.java.net/browse/JDK-8186838 >>>>> >>>>> Webrev: >>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >>>>> >>>>> The time has come for the next step in generalizing Atomic with >>>>> templates. Today I will focus on Atomic::inc/dec. >>>>> >>>>> I have tried to mimic the new Kim style that seems to have been >>>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the >>>>> structure looks like this: >>>>> >>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function >>>>> object that performs some basic type checks. >>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can >>>>> define the operation arbitrarily for a given platform. The default >>>>> implementation if not specialized for a platform is to call >>>>> Atomic::add. So only platforms that want to do something different >>>>> than that as an optimization have to provide a specialization. 
>>>>> Layer 3) Platforms that decide to specialize >>>>> PlatformInc/PlatformDec to be more optimized may inherit from a >>>>> helper class IncUsingConstant/DecUsingConstant. This helper helps >>>>> performing the necessary computation what the increment/decrement >>>>> should be after pointer scaling using CRTP. The >>>>> PlatformInc/PlatformDec operation then only needs to define an >>>>> inc/dec member function, and will then get all the context >>>>> information necessary to generate a more optimized implementation. >>>>> Easy peasy. >>>>> >>>>> It is worth noticing that the generalized Atomic::dec operation >>>>> assumes a two's complement integer machine and potentially sends >>>>> the unary negative of a potentially unsigned type to Atomic::add. I >>>>> have the following comments about this: >>>>> 1) We already assume in other code that two's complement integers >>>>> must be present. >>>>> 2) A machine that does not have two's complement integers may still >>>>> simply provide a specialization that solves the problem in a >>>>> different way. >>>>> 3) The alternative that does not make assumptions about that would >>>>> use the good old IntegerTypes::cast_to_signed metaprogramming >>>>> stuff, and I seem to recall we thought that was a bit too involved >>>>> and complicated. >>>>> This is the reason why I have chosen to use unary minus on the >>>>> potentially unsigned type in the shared helper code that sends the >>>>> decrement as an addend to Atomic::add. >>>>> >>>>> It would also be nice if somebody with access to PPC and s390 >>>>> machines could try out the relevant changes there so I do not >>>>> accidentally break those platforms. I have blind-coded the addition >>>>> of the immediate values passed in to the inline assembly in a way >>>>> that I think looks like it should work. 
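As a rough illustration of the three layers and of the unary-minus decrement described above, here is a minimal, self-contained C++ sketch. It is not the code from the webrev. The namespace `sketch`, the use of `std::atomic` in place of HotSpot's volatile destinations, the hard-coded adjustment of 1, the helper `run_example`, and the pretend 4-byte "platform" specialization are all assumptions made for illustration, and pointer scaling is left out entirely.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

namespace sketch {  // hypothetical namespace; not HotSpot's Atomic class

// Stand-in for the shared Atomic::add (assumption: std::atomic replaces
// HotSpot's volatile destinations and platform-specific fences).
template <typename T>
inline void add(T value, std::atomic<T>* dest) {
  dest->fetch_add(value);
}

// Layer 2, default: a platform that specializes nothing gets inc == add(1).
template <std::size_t byte_size>
struct PlatformInc {
  template <typename T>
  void operator()(std::atomic<T>* dest) const {
    add(T(1), dest);
  }
};

// Layer 3: CRTP helper that fixes the constant adjustment and forwards it
// to Derived::inc as a compile-time template argument.
template <typename Derived>
struct IncUsingConstant {
  template <typename T>
  void operator()(std::atomic<T>* dest) const {
    const Derived* self = static_cast<const Derived*>(this);
    // "template" is needed here: inc is a member template called with
    // explicit template arguments through a dependent type.
    self->template inc<T, 1>(dest);
  }
};

// Pretend platform specialization for 4-byte operands. A real port would
// place inline assembly with an immediate operand in inc().
template <>
struct PlatformInc<4> : IncUsingConstant<PlatformInc<4> > {
  template <typename T, int adjustment>
  void inc(std::atomic<T>* dest) const {
    dest->fetch_add(T(adjustment));
  }
};

// Layer 1: the public entry point selects the platform operation by size.
template <typename T>
inline void inc(std::atomic<T>* dest) {
  PlatformInc<sizeof(T)>()(dest);
}

// Decrement expressed as an addition of the negated constant. For an
// unsigned T the negation wraps modulo 2^N, so the sum ends up one lower.
template <typename T>
inline void dec(std::atomic<T>* dest) {
  add(T(-T(1)), dest);
}

// Small self-check exercising the specialized and the default path.
inline bool run_example() {
  std::atomic<unsigned int> narrow(41u);
  std::atomic<unsigned long long> wide(0u);
  inc(&narrow);  // takes the 4-byte path where unsigned int is 4 bytes
  dec(&wide);    // wraps around to the maximum value
  inc(&wide);    // and back to zero
  return narrow.load() == 42u && wide.load() == 0u;
}

}  // namespace sketch
```

In this sketch only the 4-byte case pays for the extra CRTP hop; every other operand size falls through to the default that reuses `add`, which mirrors the trade-off between a specialized path and the simpler shared one.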
>>>>> >>>>> Testing: >>>>> RBT hs-tier3, JPRT --testset hotspot >>>>> >>>>> Thanks, >>>>> /Erik >>>> >>> >> > From david.holmes at oracle.com Fri Sep 1 21:57:14 2017 From: david.holmes at oracle.com (David Holmes) Date: Sat, 2 Sep 2017 07:57:14 +1000 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59A96B9D.6070002@oracle.com> References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> Message-ID: On 2/09/2017 12:15 AM, Erik ?sterlund wrote: > Hi Andrew, > > On 2017-09-01 15:41, Andrew Haley wrote: >> On 31/08/17 13:45, Erik ?sterlund wrote: >>> Hi everyone, >>> >>> Bug ID: >>> https://bugs.openjdk.java.net/browse/JDK-8186838 >>> >>> Webrev: >>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/ >>> >>> The time has come for the next step in generalizing Atomic with >>> templates. Today I will focus on Atomic::inc/dec. >>> >>> I have tried to mimic the new Kim style that seems to have been >>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the >>> structure looks like this: >>> >>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object >>> that performs some basic type checks. >>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can define >>> the operation arbitrarily for a given platform. The default >>> implementation if not specialized for a platform is to call Atomic::add. >>> So only platforms that want to do something different than that as an >>> optimization have to provide a specialization. >>> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec to >>> be more optimized may inherit from a helper class >>> IncUsingConstant/DecUsingConstant. This helper helps performing the >>> necessary computation what the increment/decrement should be after >>> pointer scaling using CRTP. 
The PlatformInc/PlatformDec operation then >>> only needs to define an inc/dec member function, and will then get all >>> the context information necessary to generate a more optimized >>> implementation. Easy peasy. >> I wanted to say something nice, but I honestly can't.? I am dismayed. > > Okay. > >> I hoped that inc/dec would turn out to be much simpler than the >> cmpxchg functions: I think they should, because they don't have to >> deal with the complexity of potentially three different types. >> Instead we have, again, a large and complex patch. >> >> Even on AArch64, which should be the simplest case because Atomic::inc >> can be defined as >> >> template >> inc(T1 *dest) { >> ?? return __sync_add_and_fetch(dest, 1); >> } > > AArch64 is indeed the simplest case. It does not have a specialization > in my patch. It simply expresses Atomic::inc in terms of Atomic::add. > >> or something similar, we have >> >> Atomic::inc >> Atomic::IncImpl::operator() >> Atomic::PlatformInc<4ul, IntegralConstant >::operator() >> Atomic::add >> Atomic::AddImpl::operator() >> Atomic::AddAndFetch >::operator() >> Atomic::PlatformAdd<4ul>::add_and_fetch >> __sync_add_and_fetch >> >> I quite understand that it isn't so easy on some systems, and they >> need a generic form that explodes into four different calls, one for >> each size of integer.? I completely accept that it will be more >> complex for everything else.? But is it necessary to have so much code >> for something so simple?? This is a 1400 line patch.? Granted, much of >> it is simply moving stuff around, but despite the potential of >> template code to simplify the implementation we have a more complex >> solution than we had before. >> >> I ask you, is this the simplest solution that you believe is possible? > > It is not the simplest solution I can think of. The simplest solution I > can think of is to remove all specialized versions of Atomic::inc/dec > and just have it call Atomic::add directly. 
That would remove the

I don't think this is the source of complexity that screams for simplification. It is all the template stuff.

I can't get right into this right now (it's Saturday) but I still don't see why we have such seemingly different things in cmpxchg, add and inc/dec when the basic jobs at each level are the same. Maybe it is just use of different names that is confusing me.

David
-----

> optimizations we have today, for whatever reason we have them. It would
> lead to slightly more conservative fencing on PPC/S390, and would lead
> to slightly less optimal machine encoding on x86 (without immediate
> values in the instructions). But it would be simpler for sure. I did not
> put any judgement into whether our existing optimizations are worthwhile
> or not. But if you want to prioritize simplicity, removing those
> optimizations is one possible solution. Would you prefer that?
>
> Thanks,
> /Erik

From serguei.spitsyn at oracle.com Fri Sep 1 22:48:15 2017
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Fri, 1 Sep 2017 15:48:15 -0700
Subject: RFR (S) 8081323: ConstantPool::_resolved_references is missing in heap dump
In-Reply-To: <915c3300-2528-2b85-2492-0b54a783c622@oracle.com>
References: <915c3300-2528-2b85-2492-0b54a783c622@oracle.com>
Message-ID:

Hi Coleen,

The fix looks good.

Thanks,
Serguei

On 8/31/17 09:02, coleen.phillimore at oracle.com wrote:
> Summary: Add resolved_references and init_lock as hidden static field
> in class so root is found.
>
> Tested manually with YourKit. See bug for images. Also ran
> serviceability tests.
> > open webrev at http://cr.openjdk.java.net/~coleenp/8081323.01/webrev > bug link https://bugs.openjdk.java.net/browse/JDK-8081323 > > Thanks, > Coleen > From rohitarulraj at gmail.com Sat Sep 2 08:16:33 2017 From: rohitarulraj at gmail.com (Rohit Arul Raj) Date: Sat, 2 Sep 2017 13:46:33 +0530 Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults In-Reply-To: References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com>

Message-ID: Hello Vladimir, > Changes look good. Only question I have is about MaxVectorSize. It is set > > 16 only in presence of AVX: > > http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945 > > Does that code works for AMD 17h too? Thanks for pointing that out. Yes, the code works fine for AMD 17h. So I have removed the surplus check for MaxVectorSize from my patch. I have updated, re-tested and attached the patch. I have one query regarding the setting of UseSHA flag: http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821 AMD 17h has support for SHA. AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets enabled for it based on the availability of BMI2 and AVX2. Is there an underlying reason for this? I have handled this in the patch but just wanted to confirm. Thanks for taking time to review the code. diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp --- a/src/cpu/x86/vm/vm_version_x86.cpp +++ b/src/cpu/x86/vm/vm_version_x86.cpp @@ -1088,6 +1088,22 @@ } FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); } + if (supports_sha()) { + if (FLAG_IS_DEFAULT(UseSHA)) { + FLAG_SET_DEFAULT(UseSHA, true); + } + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) { + if (!FLAG_IS_DEFAULT(UseSHA) || + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { + warning("SHA instructions are not available on this CPU"); + } + FLAG_SET_DEFAULT(UseSHA, false); + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); + } // some defaults for AMD family 15h if ( cpu_family() == 0x15 ) { @@ -1109,11 +1125,40 @@ } #ifdef COMPILER2 - if (MaxVectorSize > 16) { - // Limit vectors size to 16 bytes on current AMD cpus. 
+ if (cpu_family() < 0x17 && MaxVectorSize > 16) { + // Limit vectors size to 16 bytes on AMD cpus < 17h. FLAG_SET_DEFAULT(MaxVectorSize, 16); } #endif // COMPILER2 + + // Some defaults for AMD family 17h + if ( cpu_family() == 0x17 ) { + // On family 17h processors use XMM and UnalignedLoadStores for Array Copy + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); + } + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); + } + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { + FLAG_SET_DEFAULT(UseBMI2Instructions, true); + } + if (UseSHA) { + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); + } else if (UseSHA512Intrinsics) { + warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU."); + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); + } + } +#ifdef COMPILER2 + if (supports_sse4_2()) { + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { + FLAG_SET_DEFAULT(UseFPUForSpilling, true); + } + } +#endif + } } if( is_intel() ) { // Intel cpus specific settings diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp --- a/src/cpu/x86/vm/vm_version_x86.hpp +++ b/src/cpu/x86/vm/vm_version_x86.hpp @@ -505,6 +505,14 @@ result |= CPU_CLMUL; if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) result |= CPU_RTM; + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) + result |= CPU_ADX; + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) + result |= CPU_BMI2; + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) + result |= CPU_SHA; + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) + result |= CPU_FMA; // AMD features. if (is_amd()) { @@ -515,19 +523,13 @@ result |= CPU_LZCNT; if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) result |= CPU_SSE4A; + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) + result |= CPU_HT; } // Intel features. 
if(is_intel()) { - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) - result |= CPU_ADX; - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) - result |= CPU_BMI2; - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) - result |= CPU_SHA; if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) result |= CPU_LZCNT; - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) - result |= CPU_FMA; // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { result |= CPU_3DNOW_PREFETCH; Regards, Rohit > On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >> >> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >> wrote: >>> >>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>> wrote: >>>> >>>> Hi Rohit, >>>> >>>> I think the patch needs updating for jdk10 as I already see a lot of >>>> logic >>>> around UseSHA in vm_version_x86.cpp. >>>> >>>> Thanks, >>>> David >>>> >>> >>> Thanks David, I will update the patch wrt JDK10 source base, test and >>> resubmit for review. >>> >>> Regards, >>> Rohit >>> >> >> Hi All, >> >> I have updated the patch wrt openjdk10/hotspot (parent: >> 13519:71337910df60), did regression testing using jtreg ($make >> default) and didnt find any regressions. >> >> Can anyone please volunteer to review this patch which sets flag/ISA >> defaults for newer AMD 17h (EPYC) processor? 
>> >> ************************* Patch **************************** >> >> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >> b/src/cpu/x86/vm/vm_version_x86.cpp >> --- a/src/cpu/x86/vm/vm_version_x86.cpp >> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >> @@ -1088,6 +1088,22 @@ >> } >> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >> } >> + if (supports_sha()) { >> + if (FLAG_IS_DEFAULT(UseSHA)) { >> + FLAG_SET_DEFAULT(UseSHA, true); >> + } >> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >> UseSHA512Intrinsics) { >> + if (!FLAG_IS_DEFAULT(UseSHA) || >> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >> + warning("SHA instructions are not available on this CPU"); >> + } >> + FLAG_SET_DEFAULT(UseSHA, false); >> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >> + } >> >> // some defaults for AMD family 15h >> if ( cpu_family() == 0x15 ) { >> @@ -1109,11 +1125,43 @@ >> } >> >> #ifdef COMPILER2 >> - if (MaxVectorSize > 16) { >> - // Limit vectors size to 16 bytes on current AMD cpus. >> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >> } >> #endif // COMPILER2 >> + >> + // Some defaults for AMD family 17h >> + if ( cpu_family() == 0x17 ) { >> + // On family 17h processors use XMM and UnalignedLoadStores for >> Array Copy >> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >> + UseXMMForArrayCopy = true; >> + } >> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >> + UseUnalignedLoadStores = true; >> + } >> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >> + UseBMI2Instructions = true; >> + } >> + if (MaxVectorSize > 32) { >> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >> + } >> + if (UseSHA) { >> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >> + } else if (UseSHA512Intrinsics) { >> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >> functions not available on this CPU."); >> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >> + } >> + } >> +#ifdef COMPILER2 >> + if (supports_sse4_2()) { >> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >> + } >> + } >> +#endif >> + } >> } >> >> if( is_intel() ) { // Intel cpus specific settings >> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >> b/src/cpu/x86/vm/vm_version_x86.hpp >> --- a/src/cpu/x86/vm/vm_version_x86.hpp >> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >> @@ -505,6 +505,14 @@ >> result |= CPU_CLMUL; >> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >> result |= CPU_RTM; >> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >> + result |= CPU_ADX; >> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >> + result |= CPU_BMI2; >> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >> + result |= CPU_SHA; >> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >> + result |= CPU_FMA; >> >> // AMD features. 
>> if (is_amd()) { >> @@ -515,19 +523,13 @@ >> result |= CPU_LZCNT; >> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >> result |= CPU_SSE4A; >> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >> + result |= CPU_HT; >> } >> // Intel features. >> if(is_intel()) { >> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >> - result |= CPU_ADX; >> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >> - result |= CPU_BMI2; >> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >> - result |= CPU_SHA; >> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >> result |= CPU_LZCNT; >> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >> - result |= CPU_FMA; >> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >> support for prefetchw >> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >> result |= CPU_3DNOW_PREFETCH; >> >> ************************************************************** >> >> Thanks, >> Rohit >> >>>> >>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>> >>>>> >>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>> wrote: >>>>>> >>>>>> >>>>>> Hi Rohit, >>>>>> >>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> I would like an volunteer to review this patch (openJDK9) which sets >>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with >>>>>>> the commit process. >>>>>>> >>>>>>> Webrev: >>>>>>> >>>>>>> >>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Unfortunately patches can not be accepted from systems outside the >>>>>> OpenJDK >>>>>> infrastructure and ... >>>>>> >>>>>>> I have also attached the patch (hg diff -g) for reference. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> ... unfortunately patches tend to get stripped by the mail servers. If >>>>>> the >>>>>> patch is small please include it inline. Otherwise you will need to >>>>>> find >>>>>> an >>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net. 
>>>>>> >>>>> >>>>>>> 3) I have done regression testing using jtreg ($make default) and >>>>>>> didnt find any regressions. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Sounds good, but until I see the patch it is hard to comment on >>>>>> testing >>>>>> requirements. >>>>>> >>>>>> Thanks, >>>>>> David >>>>> >>>>> >>>>> >>>>> Thanks David, >>>>> Yes, it's a small patch. >>>>> >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> @@ -1051,6 +1051,22 @@ >>>>> } >>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>> } >>>>> + if (supports_sha()) { >>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>> + } >>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>>> UseSHA512Intrinsics) { >>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>> + warning("SHA instructions are not available on this CPU"); >>>>> + } >>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } >>>>> >>>>> // some defaults for AMD family 15h >>>>> if ( cpu_family() == 0x15 ) { >>>>> @@ -1072,11 +1088,43 @@ >>>>> } >>>>> >>>>> #ifdef COMPILER2 >>>>> - if (MaxVectorSize > 16) { >>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>> } >>>>> #endif // COMPILER2 >>>>> + >>>>> + // Some defaults for AMD family 17h >>>>> + if ( cpu_family() == 0x17 ) { >>>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>>> Array Copy >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>> + UseXMMForArrayCopy = true; >>>>> + } >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>> { >>>>> + UseUnalignedLoadStores = true; >>>>> + } >>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>> + UseBMI2Instructions = true; >>>>> + } >>>>> + if (MaxVectorSize > 32) { >>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>> + } >>>>> + if (UseSHA) { >>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } else if (UseSHA512Intrinsics) { >>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>> functions not available on this CPU."); >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } >>>>> + } >>>>> +#ifdef COMPILER2 >>>>> + if (supports_sse4_2()) { >>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>> + } >>>>> + } >>>>> +#endif >>>>> + } >>>>> } >>>>> >>>>> if( is_intel() ) { // Intel cpus specific settings >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> @@ -513,6 +513,16 @@ >>>>> result |= CPU_LZCNT; >>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>> result |= CPU_SSE4A; >>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>> + result |= CPU_BMI2; >>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>> + result |= CPU_HT; >>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>> + result |= CPU_ADX; >>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>> + result |= CPU_SHA; >>>>> + if 
(_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>> + result |= CPU_FMA;
>>>>> }
>>>>> // Intel features.
>>>>> if(is_intel()) {
>>>>>
>>>>> Regards,
>>>>> Rohit
>>>>>
>>>>
>

From aph at redhat.com Sat Sep 2 08:31:46 2017
From: aph at redhat.com (Andrew Haley)
Date: Sat, 2 Sep 2017 09:31:46 +0100
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <59A96B9D.6070002@oracle.com>
References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com>
Message-ID:

On 01/09/17 15:15, Erik Österlund wrote:
> It is not the simplest solution I can think of. The simplest solution I
> can think of is to remove all specialized versions of Atomic::inc/dec
> and just have it call Atomic::add directly. That would remove the
> optimizations we have today, for whatever reason we have them. It would
> lead to slightly more conservative fencing on PPC/S390,

I see. Can you say what instructions would be different?

> and would lead to slightly less optimal machine encoding on x86
> (without immediate values in the instructions). But it would be
> simpler for sure. I did not put any judgement into whether our
> existing optimizations are worthwhile or not. But if you want to
> prioritize simplicity, removing those optimizations is one possible
> solution. Would you prefer that?

Is this really about optimization? If we cared about getting this stuff as optimized as possible we'd use intrinsics on GCC/x86 targets. These have been supported for a long time.

But it seems we're determined to preserve the legacy assembly-language implementations and use them everywhere, even where they are not necessary.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From jesper.wilhelmsson at oracle.com Sat Sep 2 10:15:30 2017
From: jesper.wilhelmsson at oracle.com (jesper.wilhelmsson at oracle.com)
Date: Sat, 2 Sep 2017 12:15:30 +0200
Subject: jdk10/hs integration status
Message-ID: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com>

Hi,

After going through the results of our nightlies, it seems we are in fairly good shape for integration. There was one issue with a typo in a recent fix that caused some failures; this issue was resolved yesterday just after the nightly snapshot was taken.

There is currently one issue that I didn't recognise, and at the moment it is marked as an integration blocker:

JDK-8187124
TestInterpreterMethodEntries.java: Unable to create shared archive file

This could as well be a problem with the test execution, in which case it is not a blocker, but someone needs to look into the details here.

There are four test failures that look slightly different:
tools/jar/modularJar/Basic.java
tools/jar/multiRelease/ApiValidatorTest.java
tools/jar/multiRelease/Basic.java
tools/launcher/InfoStreams.java

These four tests fail because they get a warning on stderr:
Option MaxRAMFraction was deprecated in version 10.0 and will likely be removed in a future release.
I do not consider this a blocker for integration; bug filed: JDK-8187125

JDK10/hs now has restricted write access. Basically it is locked, but in order to fix any urgent issues that might pop up over the next couple of days these people have write access: Vladimir Kozlov, Dan Daugherty, Stefan Karlsson, and myself.
/Jesper

From david.holmes at oracle.com Sat Sep 2 11:03:03 2017
From: david.holmes at oracle.com (David Holmes)
Date: Sat, 2 Sep 2017 21:03:03 +1000
Subject: jdk10/hs integration status
In-Reply-To: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com>
References: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com>
Message-ID: <9657c73c-ded2-c12e-587f-fa18ceda63a7@oracle.com>

Hi Jesper,

On 2/09/2017 8:15 PM, jesper.wilhelmsson at oracle.com wrote:
> Hi,
>
> After going through the results of our nightlies it seems we are in fairly good shape for integration. There was one issue with a typo in a recent fix that caused some failures, this issue was resolved yesterday just after the nightly snapshot was taken.

The JPRT job that was used for the nightly testing was not valid. The repos were out of sync due to a re-run with an intervening integration job. The MaxRAMFraction failures in the tools tests below were caused by that.

David
-----

>
> There is currently one issue that I didn't recognise and at the moment it is marked as an integration blocker:
>
> JDK-8187124
> TestInterpreterMethodEntries.java: Unable to create shared archive file
>
> This could as well be a problem with the test execution in which case it is not a blocker, but someone needs to look into the details here.
>
>
> There are four test failures that looks slightly different:
> tools/jar/modularJar/Basic.java
> tools/jar/multiRelease/ApiValidatorTest.java
> tools/jar/multiRelease/Basic.java
> tools/launcher/InfoStreams.java
>
> These four tests fails because they get a warning on stderr:
> Option MaxRAMFraction was deprecated in version 10.0 and will likely be removed in a future release.
> I do not consider this a blocker for integration, bug filed: JDK-8187125
>
>
> JDK10/hs now has restricted write access.
Basically it is locked but in order to fix any urgent issues that might pop up over the next couple of days these people have write access: Vladimir Kozlov, Dan Daugherty, Stefan Karlsson, and myself.
>
> /Jesper
>

From jesper.wilhelmsson at oracle.com Sat Sep 2 12:30:45 2017
From: jesper.wilhelmsson at oracle.com (jesper.wilhelmsson at oracle.com)
Date: Sat, 2 Sep 2017 14:30:45 +0200
Subject: jdk10/hs integration status
In-Reply-To: <9657c73c-ded2-c12e-587f-fa18ceda63a7@oracle.com>
References: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com> <9657c73c-ded2-c12e-587f-fa18ceda63a7@oracle.com>
Message-ID: <8B87D638-C2CC-42CB-8C94-8DE3F4F6AC81@oracle.com>

> On 2 Sep 2017, at 13:03, David Holmes wrote:
>
> Hi Jesper,
>
> On 2/09/2017 8:15 PM, jesper.wilhelmsson at oracle.com wrote:
>> Hi,
>> After going through the results of our nightlies it seems we are in fairly good shape for integration. There was one issue with a typo in a recent fix that caused some failures, this issue was resolved yesterday just after the nightly snapshot was taken.
>
> The JPRT job that was used for the nightly testing was not valid. The repos were out of sync due to a re-run with an intervening integration job. The MaxRAMFraction failures in the tools tests below were caused by that.

Sigh... I thought we didn't use rerun for integration jobs.

Thanks for the heads-up David! I'll start a new nightly now to get a trustworthy result.

/Jesper

>
> David
> -----
>
>> There is currently one issue that I didn't recognise and at the moment it is marked as an integration blocker:
>> JDK-8187124
>> TestInterpreterMethodEntries.java: Unable to create shared archive file
>> This could as well be a problem with the test execution in which case it is not a blocker, but someone needs to look into the details here.
>> There are four test failures that looks slightly different:
>> tools/jar/modularJar/Basic.java
>> tools/jar/multiRelease/ApiValidatorTest.java
>> tools/jar/multiRelease/Basic.java
>> tools/launcher/InfoStreams.java
>> These four tests fails because they get a warning on stderr:
>> Option MaxRAMFraction was deprecated in version 10.0 and will likely be removed in a future release.
>> I do not consider this a blocker for integration, bug filed: JDK-8187125
>> JDK10/hs now has restricted write access. Basically it is locked but in order to fix any urgent issues that might pop up over the next couple of days these people have write access: Vladimir Kozlov, Dan Daugherty, Stefan Karlsson, and myself.
>> /Jesper

From daniel.daugherty at oracle.com Sat Sep 2 14:49:36 2017
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Sat, 2 Sep 2017 08:49:36 -0600
Subject: jdk10/hs integration status
In-Reply-To: <8B87D638-C2CC-42CB-8C94-8DE3F4F6AC81@oracle.com>
References: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com> <9657c73c-ded2-c12e-587f-fa18ceda63a7@oracle.com> <8B87D638-C2CC-42CB-8C94-8DE3F4F6AC81@oracle.com>
Message-ID:

We're not completely out of the woods. These tests:

tools/jar/modularJar/Basic.java
tools/jar/multiRelease/ApiValidatorTest.java
tools/jar/multiRelease/Basic.java
tools/launcher/InfoStreams.java

still failed in the 2017-09-01 JDK10-hs nightly with:

java.lang.AssertionError: Unknown value Java HotSpot(TM) 64-Bit Server VM warning: Option MaxRAMFraction was deprecated in version 10.0 and will likely be removed in a future release.

Dan

On 9/2/17 6:30 AM, jesper.wilhelmsson at oracle.com wrote:
>> On 2 Sep 2017, at 13:03, David Holmes wrote:
>>
>> Hi Jesper,
>>
>> On 2/09/2017 8:15 PM, jesper.wilhelmsson at oracle.com wrote:
>>> Hi,
>>> After going through the results of our nightlies it seems we are in fairly good shape for integration.
There was one issue with a typo in a recent fix that caused some failures, this issue was resolved yesterday just after the nightly snapshot was taken.
>> The JPRT job that was used for the nightly testing was not valid. The repos were out of sync due to a re-run with an intervening integration job. The MaxRAMFraction failures in the tools tests below were caused by that.
> Sigh... I thought we didn't use rerun for integration jobs.
>
> Thanks for the heads-up David! I'll start a new nightly now to get a trustworthy result.
>
> /Jesper
>
>
>> David
>> -----
>>
>>> There is currently one issue that I didn't recognise and at the moment it is marked as an integration blocker:
>>> JDK-8187124
>>> TestInterpreterMethodEntries.java: Unable to create shared archive file
>>> This could as well be a problem with the test execution in which case it is not a blocker, but someone needs to look into the details here.
>>> There are four test failures that looks slightly different:
>>> tools/jar/modularJar/Basic.java
>>> tools/jar/multiRelease/ApiValidatorTest.java
>>> tools/jar/multiRelease/Basic.java
>>> tools/launcher/InfoStreams.java
>>> These four tests fails because they get a warning on stderr:
>>> Option MaxRAMFraction was deprecated in version 10.0 and will likely be removed in a future release.
>>> I do not consider this a blocker for integration, bug filed: JDK-8187125
>>> JDK10/hs now has restricted write access. Basically it is locked but in order to fix any urgent issues that might pop up over the next couple of days these people have write access: Vladimir Kozlov, Dan Daugherty, Stefan Karlsson, and myself.
>>> /Jesper
>

From vladimir.kozlov at oracle.com Sat Sep 2 17:55:31 2017
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Sat, 2 Sep 2017 10:55:31 -0700
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To:
References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com>

Message-ID:

Hi Rohit,

On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
> Hello Vladimir,
>
>> Changes look good. Only question I have is about MaxVectorSize. It is set > 16 only in presence of AVX:
>>
>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>
>> Does that code works for AMD 17h too?
>
> Thanks for pointing that out. Yes, the code works fine for AMD 17h. So
> I have removed the surplus check for MaxVectorSize from my patch. I
> have updated, re-tested and attached the patch.

Which check did you remove?

>
> I have one query regarding the setting of UseSHA flag:
> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821
>
> AMD 17h has support for SHA.
> AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets
> enabled for it based on the availability of BMI2 and AVX2. Is there an
> underlying reason for this? I have handled this in the patch but just
> wanted to confirm.

It was done with the following changes, which use only AVX2 and BMI2 instructions to calculate SHA-256:

http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974

I don't know if AMD 15h supports these instructions and can execute that code. You need to test it.

Maybe you should move your new UseSHA-related code to line 821 to set UseSHA for AMD. Then you don't need to overwrite the UseSHA*Intrinsics flags, which are set after that line.

Regards,
Vladimir

>
> Thanks for taking time to review the code.
> > diff --git a/src/cpu/x86/vm/vm_version_x86.cpp > b/src/cpu/x86/vm/vm_version_x86.cpp > --- a/src/cpu/x86/vm/vm_version_x86.cpp > +++ b/src/cpu/x86/vm/vm_version_x86.cpp > @@ -1088,6 +1088,22 @@ > } > FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); > } > + if (supports_sha()) { > + if (FLAG_IS_DEFAULT(UseSHA)) { > + FLAG_SET_DEFAULT(UseSHA, true); > + } > + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || > UseSHA512Intrinsics) { > + if (!FLAG_IS_DEFAULT(UseSHA) || > + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || > + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || > + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { > + warning("SHA instructions are not available on this CPU"); > + } > + FLAG_SET_DEFAULT(UseSHA, false); > + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); > + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); > + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); > + } > > // some defaults for AMD family 15h > if ( cpu_family() == 0x15 ) { > @@ -1109,11 +1125,40 @@ > } > > #ifdef COMPILER2 > - if (MaxVectorSize > 16) { > - // Limit vectors size to 16 bytes on current AMD cpus. > + if (cpu_family() < 0x17 && MaxVectorSize > 16) { > + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
> FLAG_SET_DEFAULT(MaxVectorSize, 16); > } > #endif // COMPILER2 > + > + // Some defaults for AMD family 17h > + if ( cpu_family() == 0x17 ) { > + // On family 17h processors use XMM and UnalignedLoadStores for > Array Copy > + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { > + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); > + } > + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { > + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); > + } > + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { > + FLAG_SET_DEFAULT(UseBMI2Instructions, true); > + } > + if (UseSHA) { > + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { > + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); > + } else if (UseSHA512Intrinsics) { > + warning("Intrinsics for SHA-384 and SHA-512 crypto hash > functions not available on this CPU."); > + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); > + } > + } > +#ifdef COMPILER2 > + if (supports_sse4_2()) { > + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { > + FLAG_SET_DEFAULT(UseFPUForSpilling, true); > + } > + } > +#endif > + } > } > > if( is_intel() ) { // Intel cpus specific settings > diff --git a/src/cpu/x86/vm/vm_version_x86.hpp > b/src/cpu/x86/vm/vm_version_x86.hpp > --- a/src/cpu/x86/vm/vm_version_x86.hpp > +++ b/src/cpu/x86/vm/vm_version_x86.hpp > @@ -505,6 +505,14 @@ > result |= CPU_CLMUL; > if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) > result |= CPU_RTM; > + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) > + result |= CPU_ADX; > + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) > + result |= CPU_BMI2; > + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) > + result |= CPU_SHA; > + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) > + result |= CPU_FMA; > > // AMD features. > if (is_amd()) { > @@ -515,19 +523,13 @@ > result |= CPU_LZCNT; > if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) > result |= CPU_SSE4A; > + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) > + result |= CPU_HT; > } > // Intel features. 
> if(is_intel()) { > - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) > - result |= CPU_ADX; > - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) > - result |= CPU_BMI2; > - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) > - result |= CPU_SHA; > if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) > result |= CPU_LZCNT; > - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) > - result |= CPU_FMA; > // for Intel, ecx.bits.misalignsse bit (bit 8) indicates > support for prefetchw > if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { > result |= CPU_3DNOW_PREFETCH; > > > Regards, > Rohit > > > >> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>> >>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>> wrote: >>>> >>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>> wrote: >>>>> >>>>> Hi Rohit, >>>>> >>>>> I think the patch needs updating for jdk10 as I already see a lot of >>>>> logic >>>>> around UseSHA in vm_version_x86.cpp. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>> >>>> Thanks David, I will update the patch wrt JDK10 source base, test and >>>> resubmit for review. >>>> >>>> Regards, >>>> Rohit >>>> >>> >>> Hi All, >>> >>> I have updated the patch wrt openjdk10/hotspot (parent: >>> 13519:71337910df60), did regression testing using jtreg ($make >>> default) and didnt find any regressions. >>> >>> Can anyone please volunteer to review this patch which sets flag/ISA >>> defaults for newer AMD 17h (EPYC) processor? 
>>> >>> ************************* Patch **************************** >>> >>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>> b/src/cpu/x86/vm/vm_version_x86.cpp >>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>> @@ -1088,6 +1088,22 @@ >>> } >>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>> } >>> + if (supports_sha()) { >>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>> + FLAG_SET_DEFAULT(UseSHA, true); >>> + } >>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>> UseSHA512Intrinsics) { >>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>> + warning("SHA instructions are not available on this CPU"); >>> + } >>> + FLAG_SET_DEFAULT(UseSHA, false); >>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>> + } >>> >>> // some defaults for AMD family 15h >>> if ( cpu_family() == 0x15 ) { >>> @@ -1109,11 +1125,43 @@ >>> } >>> >>> #ifdef COMPILER2 >>> - if (MaxVectorSize > 16) { >>> - // Limit vectors size to 16 bytes on current AMD cpus. >>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>> } >>> #endif // COMPILER2 >>> + >>> + // Some defaults for AMD family 17h >>> + if ( cpu_family() == 0x17 ) { >>> + // On family 17h processors use XMM and UnalignedLoadStores for >>> Array Copy >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>> + UseXMMForArrayCopy = true; >>> + } >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>> + UseUnalignedLoadStores = true; >>> + } >>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>> + UseBMI2Instructions = true; >>> + } >>> + if (MaxVectorSize > 32) { >>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>> + } >>> + if (UseSHA) { >>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>> + } else if (UseSHA512Intrinsics) { >>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>> functions not available on this CPU."); >>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>> + } >>> + } >>> +#ifdef COMPILER2 >>> + if (supports_sse4_2()) { >>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>> + } >>> + } >>> +#endif >>> + } >>> } >>> >>> if( is_intel() ) { // Intel cpus specific settings >>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>> b/src/cpu/x86/vm/vm_version_x86.hpp >>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>> @@ -505,6 +505,14 @@ >>> result |= CPU_CLMUL; >>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>> result |= CPU_RTM; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>> + result |= CPU_ADX; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>> + result |= CPU_BMI2; >>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>> + result |= CPU_SHA; >>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>> + result |= CPU_FMA; >>> >>> // AMD features. 
>>> if (is_amd()) { >>> @@ -515,19 +523,13 @@ >>> result |= CPU_LZCNT; >>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>> result |= CPU_SSE4A; >>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>> + result |= CPU_HT; >>> } >>> // Intel features. >>> if(is_intel()) { >>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>> - result |= CPU_ADX; >>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>> - result |= CPU_BMI2; >>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>> - result |= CPU_SHA; >>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>> result |= CPU_LZCNT; >>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>> - result |= CPU_FMA; >>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>> support for prefetchw >>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>> result |= CPU_3DNOW_PREFETCH; >>> >>> ************************************************************** >>> >>> Thanks, >>> Rohit >>> >>>>> >>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>> >>>>>> >>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> Hi Rohit, >>>>>>> >>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I would like an volunteer to review this patch (openJDK9) which sets >>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with >>>>>>>> the commit process. >>>>>>>> >>>>>>>> Webrev: >>>>>>>> >>>>>>>> >>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Unfortunately patches can not be accepted from systems outside the >>>>>>> OpenJDK >>>>>>> infrastructure and ... >>>>>>> >>>>>>>> I have also attached the patch (hg diff -g) for reference. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> ... unfortunately patches tend to get stripped by the mail servers. If >>>>>>> the >>>>>>> patch is small please include it inline. 
Otherwise you will need to >>>>>>> find >>>>>>> an >>>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net. >>>>>>> >>>>>> >>>>>>>> 3) I have done regression testing using jtreg ($make default) and >>>>>>>> didnt find any regressions. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Sounds good, but until I see the patch it is hard to comment on >>>>>>> testing >>>>>>> requirements. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>> >>>>>> >>>>>> >>>>>> Thanks David, >>>>>> Yes, it's a small patch. >>>>>> >>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> @@ -1051,6 +1051,22 @@ >>>>>> } >>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>> } >>>>>> + if (supports_sha()) { >>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>> + } >>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>>>> UseSHA512Intrinsics) { >>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>> + warning("SHA instructions are not available on this CPU"); >>>>>> + } >>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>> + } >>>>>> >>>>>> // some defaults for AMD family 15h >>>>>> if ( cpu_family() == 0x15 ) { >>>>>> @@ -1072,11 +1088,43 @@ >>>>>> } >>>>>> >>>>>> #ifdef COMPILER2 >>>>>> - if (MaxVectorSize > 16) { >>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>> } >>>>>> #endif // COMPILER2 >>>>>> + >>>>>> + // Some defaults for AMD family 17h >>>>>> + if ( cpu_family() == 0x17 ) { >>>>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>>>> Array Copy >>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>> + UseXMMForArrayCopy = true; >>>>>> + } >>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>> { >>>>>> + UseUnalignedLoadStores = true; >>>>>> + } >>>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>> + UseBMI2Instructions = true; >>>>>> + } >>>>>> + if (MaxVectorSize > 32) { >>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>> + } >>>>>> + if (UseSHA) { >>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>> + } else if (UseSHA512Intrinsics) { >>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>> functions not available on this CPU."); >>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>> + } >>>>>> + } >>>>>> +#ifdef COMPILER2 >>>>>> + if (supports_sse4_2()) { >>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>> + } >>>>>> + } >>>>>> +#endif >>>>>> + } >>>>>> } >>>>>> >>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> @@ -513,6 +513,16 @@ >>>>>> result |= CPU_LZCNT; >>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>> result |= CPU_SSE4A; >>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>> + result |= CPU_BMI2; >>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>> + result |= CPU_HT; >>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>> + result |= CPU_ADX; >>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha 
!= 0)
>>>>>> + result |= CPU_SHA;
>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>> + result |= CPU_FMA;
>>>>>> }
>>>>>> // Intel features.
>>>>>> if(is_intel()) {
>>>>>>
>>>>>> Regards,
>>>>>> Rohit
>>>>>>
>>>>>
>>

From ioi.lam at oracle.com Sun Sep 3 00:05:55 2017
From: ioi.lam at oracle.com (Ioi Lam)
Date: Sat, 2 Sep 2017 17:05:55 -0700
Subject: jdk10/hs integration status
In-Reply-To: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com>
References: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com>
Message-ID: <6962b514-502c-9f56-b87f-ff68a1911847@oracle.com>

On 9/2/17 3:15 AM, jesper.wilhelmsson at oracle.com wrote:
> Hi,
>
> After going through the results of our nightlies it seems we are in fairly good shape for integration. There was one issue with a typo in a recent fix that caused some failures, this issue was resolved yesterday just after the nightly snapshot was taken.
>
>
> There is currently one issue that I didn't recognise and at the moment it is marked as an integration blocker:
>
> JDK-8187124
> TestInterpreterMethodEntries.java: Unable to create shared archive file
>
> This could as well be a problem with the test execution in which case it is not a blocker, but someone needs to look into the details here.

JDK-8187124 is not a new regression and it's a test bug, so I've removed the integration_blocker label.

Thanks
- Ioi

>
> There are four test failures that looks slightly different:
> tools/jar/modularJar/Basic.java
> tools/jar/multiRelease/ApiValidatorTest.java
> tools/jar/multiRelease/Basic.java
> tools/launcher/InfoStreams.java
>
> These four tests fails because they get a warning on stderr:
> Option MaxRAMFraction was deprecated in version 10.0 and will likely be removed in a future release.
> I do not consider this a blocker for integration, bug filed: JDK-8187125
>
>
> JDK10/hs now has restricted write access.
Basically it is locked but in order to fix any urgent issues that might pop up over the next couple of days these people have write access: Vladimir Kozlov, Dan Daugherty, Stefan Karlsson, and myself.
>
> /Jesper
>

From david.holmes at oracle.com Sun Sep 3 04:40:32 2017
From: david.holmes at oracle.com (David Holmes)
Date: Sun, 3 Sep 2017 14:40:32 +1000
Subject: jdk10/hs integration status
In-Reply-To:
References: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com> <9657c73c-ded2-c12e-587f-fa18ceda63a7@oracle.com> <8B87D638-C2CC-42CB-8C94-8DE3F4F6AC81@oracle.com>
Message-ID:

On 3/09/2017 12:49 AM, Daniel D. Daugherty wrote:
> We're not completely out of the woods. These tests:
>
> tools/jar/modularJar/Basic.java
> tools/jar/multiRelease/ApiValidatorTest.java
> tools/jar/multiRelease/Basic.java
> tools/launcher/InfoStreams.java
>
> still failed in the 2017-09-01 JDK10-hs nightly with:
>
> java.lang.AssertionError: Unknown value Java HotSpot(TM) 64-Bit Server
> VM warning: Option MaxRAMFraction was deprecated in version 10.0 and
> will likely be removed in a future release.

Isn't that the nightly we're talking about Dan? Those tests only fail if the closed repo has not got Bob's changes that switch from MaxRAMFraction to MaxRAMPercentage.

David

> Dan
>
>
> On 9/2/17 6:30 AM, jesper.wilhelmsson at oracle.com wrote:
>>> On 2 Sep 2017, at 13:03, David Holmes wrote:
>>>
>>> Hi Jesper,
>>>
>>> On 2/09/2017 8:15 PM, jesper.wilhelmsson at oracle.com wrote:
>>>> Hi,
>>>> After going through the results of our nightlies it seems we are in
>>>> fairly good shape for integration. There was one issue with a typo
>>>> in a recent fix that caused some failures, this issue was resolved
>>>> yesterday just after the nightly snapshot was taken.
>>> The JPRT job that was used for the nightly testing was not valid. The
>>> repos were out of sync due to a re-run with an intervening
>>> integration job. The MaxRAMFraction failures in the tools tests below
>>> were caused by that.
>> Sigh... I thought we didn't use rerun for integration jobs.
>>
>> Thanks for the heads-up David! I'll start a new nightly now to get a
>> trustworthy result.
>>
>> /Jesper
>>
>>
>>> David
>>> -----
>>>
>>>> There is currently one issue that I didn't recognise and at the
>>>> moment it is marked as an integration blocker:
>>>> JDK-8187124
>>>> TestInterpreterMethodEntries.java: Unable to create shared archive
>>>> file
>>>> This could as well be a problem with the test execution in which
>>>> case it is not a blocker, but someone needs to look into the details
>>>> here.
>>>> There are four test failures that looks slightly different:
>>>> tools/jar/modularJar/Basic.java
>>>> tools/jar/multiRelease/ApiValidatorTest.java
>>>> tools/jar/multiRelease/Basic.java
>>>> tools/launcher/InfoStreams.java
>>>> These four tests fails because they get a warning on stderr:
>>>> Option MaxRAMFraction was deprecated in version 10.0 and will likely
>>>> be removed in a future release.
>>>> I do not consider this a blocker for integration, bug filed:
>>>> JDK-8187125
>>>> JDK10/hs now has restricted write access. Basically it is locked but
>>>> in order to fix any urgent issues that might pop up over the next
>>>> couple of days these people have write access: Vladimir Kozlov, Dan
>>>> Daugherty, Stefan Karlsson, and myself.
>>>> /Jesper
>>
>

From daniel.daugherty at oracle.com Sun Sep 3 04:52:38 2017
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Sat, 2 Sep 2017 22:52:38 -0600
Subject: jdk10/hs integration status
In-Reply-To:
References: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com> <9657c73c-ded2-c12e-587f-fa18ceda63a7@oracle.com> <8B87D638-C2CC-42CB-8C94-8DE3F4F6AC81@oracle.com>

Message-ID: <232afa04-3bf5-1ea5-3462-6b0920b8f0d9@oracle.com>

On 9/2/17 10:40 PM, David Holmes wrote:
> On 3/09/2017 12:49 AM, Daniel D. Daugherty wrote:
>> We're not completely out of the woods. These tests:
>>
>> tools/jar/modularJar/Basic.java
>> tools/jar/multiRelease/ApiValidatorTest.java
>> tools/jar/multiRelease/Basic.java
>> tools/launcher/InfoStreams.java
>>
>> still failed in the 2017-09-01 JDK10-hs nightly with:
>>
>> java.lang.AssertionError: Unknown value Java HotSpot(TM) 64-Bit
>> Server VM warning: Option MaxRAMFraction was deprecated in version
>> 10.0 and will likely be removed in a future release.
>
> Isn't that the nightly we're talking about Dan?

No. The failures that were originally discussed were in the 2017-08-31 JDK10-hs nightly and were mostly caused by Calvin's rerun JPRT job.

I found a couple of places in the current JDK10-hs repo where MaxRAMFraction is still used, but none that explain the above four test failures. My conclusion is that they are picking up the MaxRAMFraction from whatever mechanism is being used to launch those tests...

These two kitchensink config files still use MaxRAMFraction:

./hotspot/test/closed/applications/kitchensink/kitchensink.default.properties:test.jvm.args=-XX:MaxRAMFraction=2 -XX:+CrashOnOutOfMemoryError -Djava.net.preferIPv6Addresses=false -XX:-PrintVMOptions -XX:+DisplayVMOutputToStderr -XX:+UsePerfData -Xlog:gc*:gc.log -XX:+DisableExplicitGC -XX:+PrintFlagsFinal -XX:+StartAttachListener -XX:+UnlockCommercialFeatures -XX:NativeMemoryTracking=detail -XX:+ResourceManagement -XX:+FlightRecorder
./hotspot/test/closed/applications/kitchensink/kitchensink.default.properties:original.jvm.args=-XX:MaxRAMFraction=8 -Djava.net.preferIPv6Addresses=false

Dan

> Those tests only fail if the closed repo has not got Bob's changes
> that switch from MaxRAMFraction to MaxRAMPercentage.
>
> David
>
>> Dan
>>
>>
>> On 9/2/17 6:30 AM, jesper.wilhelmsson at oracle.com wrote:
>>>> On 2 Sep 2017, at 13:03, David Holmes wrote:
>>>>
>>>> Hi Jesper,
>>>>
>>>> On 2/09/2017 8:15 PM, jesper.wilhelmsson at oracle.com wrote:
>>>>> Hi,
>>>>> After going through the results of our nightlies it seems we are
>>>>> in fairly good shape for integration. There was one issue with a
>>>>> typo in a recent fix that caused some failures, this issue was
>>>>> resolved yesterday just after the nightly snapshot was taken.
>>>> The JPRT job that was used for the nightly testing was not valid.
>>>> The repos were out of sync due to a re-run with an intervening
>>>> integration job. The MaxRAMFraction failures in the tools tests
>>>> below were caused by that.
>>> Sigh... I thought we didn't use rerun for integration jobs.
>>>
>>> Thanks for the heads-up David! I'll start a new nightly now to get
>>> a trustworthy result.
>>>
>>> /Jesper
>>>
>>>
>>>> David
>>>> -----
>>>>
>>>>> There is currently one issue that I didn't recognise and at the
>>>>> moment it is marked as an integration blocker:
>>>>> JDK-8187124
>>>>> TestInterpreterMethodEntries.java: Unable to create shared archive
>>>>> file
>>>>> This could as well be a problem with the test execution in which
>>>>> case it is not a blocker, but someone needs to look into the
>>>>> details here.
>>>>> There are four test failures that looks slightly different:
>>>>> tools/jar/modularJar/Basic.java
>>>>> tools/jar/multiRelease/ApiValidatorTest.java
>>>>> tools/jar/multiRelease/Basic.java
>>>>> tools/launcher/InfoStreams.java
>>>>> These four tests fails because they get a warning on stderr:
>>>>> Option MaxRAMFraction was deprecated in version 10.0 and will
>>>>> likely be removed in a future release.
>>>>> I do not consider this a blocker for integration, bug filed:
>>>>> JDK-8187125
>>>>> JDK10/hs now has restricted write access. Basically it is locked
>>>>> but in order to fix any urgent issues that might pop up over the
>>>>> next couple of days these people have write access: Vladimir
>>>>> Kozlov, Dan Daugherty, Stefan Karlsson, and myself.
>>>>> /Jesper
>>>
>>
From daniel.daugherty at oracle.com Sun Sep 3 04:56:52 2017
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Sat, 2 Sep 2017 22:56:52 -0600
Subject: jdk10/hs integration status
In-Reply-To: <232afa04-3bf5-1ea5-3462-6b0920b8f0d9@oracle.com>
References: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com> <9657c73c-ded2-c12e-587f-fa18ceda63a7@oracle.com> <8B87D638-C2CC-42CB-8C94-8DE3F4F6AC81@oracle.com>

<232afa04-3bf5-1ea5-3462-6b0920b8f0d9@oracle.com>
Message-ID:

The test suite execution directory has a jtreg.sh script:

'time' ''/bin/bash'' '/export/home/aginfra/CommonData/jtreg_dir/bin/jtreg' '-testjdk:"/export/home/aginfra/CommonData/TEST_JAVA_HOME"' '-dir:/export/home/aginfra/CommonData/j2se_jdk/jdk//test' '-w:/export/home/aginfra/sandbox/results/workDir' '-r:/export/home/aginfra/sandbox/results/report' '-retain:fail,error' '-status:notRun,error,fail' '-ignore:quiet' '-a' '-javacoptions:' '-javaoption:-Xmixed' '-javaoption:-server' '-javaoption:-XX:MaxRAMPercentage=12.5' ''-k:!ignore'' '-timeout:16' '-verbose:summary' '-nativepath:/export/home/aginfra/sandbox/JTREG_NATIVEPATH_LIBRARY_PREPARED' '-thd:/export/home/aginfra/CommonData/JTREG_EFH_HOME/jtregFailureHandler.jar' '-th:jdk.test.failurehandler.jtreg.GatherProcessInfoTimeoutHandler' '-od:/export/home/aginfra/CommonData/JTREG_EFH_HOME/jtregFailureHandler.jar' '-o:jdk.test.failurehandler.jtreg.GatherDiagnosticInfoObserver' '-J-Djava.library.path='/export/home/aginfra/CommonData/JTREG_EFH_HOME'' '-othervm' '-conc:3' '-vmoptions:'-XX:MaxRAMFraction=6'' '-exclude:/export/home/aginfra/sandbox/results/exclude1.jtx' '-exclude:/export/home/aginfra/sandbox/results/exclude2.jtx' 'tools'

so whatever Aurora/RBT used to setup this job added: -XX:MaxRAMFraction=6

Dan

From daniel.daugherty at oracle.com Sun Sep 3 05:01:19 2017
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Sat, 2 Sep 2017 23:01:19 -0600
Subject: jdk10/hs integration status
In-Reply-To:
References: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com> <9657c73c-ded2-c12e-587f-fa18ceda63a7@oracle.com> <8B87D638-C2CC-42CB-8C94-8DE3F4F6AC81@oracle.com>

<232afa04-3bf5-1ea5-3462-6b0920b8f0d9@oracle.com>
Message-ID: <93becaf6-6dd5-1471-0200-7a73c8c82156@oracle.com>

Based on the Aurora job log for this testsuite run, it looks to me
like UTE is being used to execute JTREG which executes these tests.

Dan

From rohitarulraj at gmail.com Sun Sep 3 16:42:42 2017
From: rohitarulraj at gmail.com (Rohit Arul Raj)
Date: Sun, 3 Sep 2017 22:12:42 +0530
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To:
References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com>

Message-ID:

Hello Vladimir,

On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov wrote:
> Hi Rohit,
>
> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>
>> Hello Vladimir,
>>
>>> Changes look good. Only question I have is about MaxVectorSize. It is
>>> set > 16 only in presence of AVX:
>>>
>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>>
>>> Does that code works for AMD 17h too?
>>
>> Thanks for pointing that out. Yes, the code works fine for AMD 17h. So
>> I have removed the surplus check for MaxVectorSize from my patch. I
>> have updated, re-tested and attached the patch.
>
> Which check you removed?
>

My older patch had the below mentioned check which was required on
JDK9 where the default MaxVectorSize was 64. It has been handled better
in openJDK10. So this check is not required anymore.

+    // Some defaults for AMD family 17h
+    if ( cpu_family() == 0x17 ) {
...
...
+      if (MaxVectorSize > 32) {
+        FLAG_SET_DEFAULT(MaxVectorSize, 32);
+      }
..
..
+    }

>> I have one query regarding the setting of UseSHA flag:
>>
>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821
>>
>> AMD 17h has support for SHA.
>> AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets
>> enabled for it based on the availability of BMI2 and AVX2. Is there an
>> underlying reason for this? I have handled this in the patch but just
>> wanted to confirm.
>
> It was done with next changes which use only AVX2 and BMI2 instructions to
> calculate SHA-256:
>
> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974
>
> I don't know if AMD 15h supports these instructions and can execute that
> code. You need to test it.
>

Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions,
it should work.
Confirmed by running following sanity tests:
./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java
./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java
./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java

So I have removed those SHA checks from my patch too.

Please find attached updated, re-tested patch.

diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
--- a/src/cpu/x86/vm/vm_version_x86.cpp
+++ b/src/cpu/x86/vm/vm_version_x86.cpp
@@ -1109,11 +1109,27 @@
     }
 
 #ifdef COMPILER2
-    if (MaxVectorSize > 16) {
-      // Limit vectors size to 16 bytes on current AMD cpus.
+    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
+      // Limit vectors size to 16 bytes on AMD cpus < 17h.
       FLAG_SET_DEFAULT(MaxVectorSize, 16);
     }
 #endif // COMPILER2
+
+    // Some defaults for AMD family 17h
+    if ( cpu_family() == 0x17 ) {
+      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
+      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
+        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
+      }
+      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
+        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
+      }
+#ifdef COMPILER2
+      if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) {
+        FLAG_SET_DEFAULT(UseFPUForSpilling, true);
+      }
+#endif
+    }
   }
 
   if( is_intel() ) { // Intel cpus specific settings
diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
--- a/src/cpu/x86/vm/vm_version_x86.hpp
+++ b/src/cpu/x86/vm/vm_version_x86.hpp
@@ -505,6 +505,14 @@
       result |= CPU_CLMUL;
     if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
       result |= CPU_RTM;
+    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
+      result |= CPU_ADX;
+    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
+      result |= CPU_BMI2;
+    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
+      result |= CPU_SHA;
+    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
+      result |= CPU_FMA;
 
     // AMD features.
     if (is_amd()) {
@@ -515,19 +523,13 @@
         result |= CPU_LZCNT;
       if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
         result |= CPU_SSE4A;
+      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
+        result |= CPU_HT;
     }
     // Intel features.
     if(is_intel()) {
-      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
-        result |= CPU_ADX;
-      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
-        result |= CPU_BMI2;
-      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
-        result |= CPU_SHA;
       if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
         result |= CPU_LZCNT;
-      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
-        result |= CPU_FMA;
       // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
       if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
         result |= CPU_3DNOW_PREFETCH;

Please let me know your comments.

Thanks for your time.
Rohit

>> >> Thanks for taking time to review the code. >> >> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >> b/src/cpu/x86/vm/vm_version_x86.cpp >> --- a/src/cpu/x86/vm/vm_version_x86.cpp >> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >> @@ -1088,6 +1088,22 @@ >> } >> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >> } >> + if (supports_sha()) { >> + if (FLAG_IS_DEFAULT(UseSHA)) { >> + FLAG_SET_DEFAULT(UseSHA, true); >> + } >> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >> UseSHA512Intrinsics) { >> + if (!FLAG_IS_DEFAULT(UseSHA) || >> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >> + warning("SHA instructions are not available on this CPU"); >> + } >> + FLAG_SET_DEFAULT(UseSHA, false); >> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >> + } >> >> // some defaults for AMD family 15h >> if ( cpu_family() == 0x15 ) { >> @@ -1109,11 +1125,40 @@ >> } >> >> #ifdef COMPILER2 >> - if (MaxVectorSize > 16) { >> - // Limit vectors size to 16 bytes on current AMD cpus.
>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >> FLAG_SET_DEFAULT(MaxVectorSize, 16); >> } >> #endif // COMPILER2 >> + >> + // Some defaults for AMD family 17h >> + if ( cpu_family() == 0x17 ) { >> + // On family 17h processors use XMM and UnalignedLoadStores for >> Array Copy >> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >> + } >> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >> + } >> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >> + FLAG_SET_DEFAULT(UseBMI2Instructions, true); >> + } >> + if (UseSHA) { >> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >> + } else if (UseSHA512Intrinsics) { >> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >> functions not available on this CPU."); >> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >> + } >> + } >> +#ifdef COMPILER2 >> + if (supports_sse4_2()) { >> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >> + } >> + } >> +#endif >> + } >> } >> >> if( is_intel() ) { // Intel cpus specific settings >> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >> b/src/cpu/x86/vm/vm_version_x86.hpp >> --- a/src/cpu/x86/vm/vm_version_x86.hpp >> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >> @@ -505,6 +505,14 @@ >> result |= CPU_CLMUL; >> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >> result |= CPU_RTM; >> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >> + result |= CPU_ADX; >> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >> + result |= CPU_BMI2; >> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >> + result |= CPU_SHA; >> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >> + result |= CPU_FMA; >> >> // AMD features. 
>> if (is_amd()) { >> @@ -515,19 +523,13 @@ >> result |= CPU_LZCNT; >> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >> result |= CPU_SSE4A; >> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >> + result |= CPU_HT; >> } >> // Intel features. >> if(is_intel()) { >> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >> - result |= CPU_ADX; >> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >> - result |= CPU_BMI2; >> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >> - result |= CPU_SHA; >> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >> result |= CPU_LZCNT; >> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >> - result |= CPU_FMA; >> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >> support for prefetchw >> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >> result |= CPU_3DNOW_PREFETCH; >> >> >> Regards, >> Rohit >> >> >> >>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>> >>>> >>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>> wrote: >>>>> >>>>> >>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>> wrote: >>>>>> >>>>>> >>>>>> Hi Rohit, >>>>>> >>>>>> I think the patch needs updating for jdk10 as I already see a lot of >>>>>> logic >>>>>> around UseSHA in vm_version_x86.cpp. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>> >>>>> Thanks David, I will update the patch wrt JDK10 source base, test and >>>>> resubmit for review. >>>>> >>>>> Regards, >>>>> Rohit >>>>> >>>> >>>> Hi All, >>>> >>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>> 13519:71337910df60), did regression testing using jtreg ($make >>>> default) and didnt find any regressions. >>>> >>>> Can anyone please volunteer to review this patch which sets flag/ISA >>>> defaults for newer AMD 17h (EPYC) processor? 
>>>> >>>> ************************* Patch **************************** >>>> >>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>> @@ -1088,6 +1088,22 @@ >>>> } >>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>> } >>>> + if (supports_sha()) { >>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>> + } >>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>> UseSHA512Intrinsics) { >>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>> + warning("SHA instructions are not available on this CPU"); >>>> + } >>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>> + } >>>> >>>> // some defaults for AMD family 15h >>>> if ( cpu_family() == 0x15 ) { >>>> @@ -1109,11 +1125,43 @@ >>>> } >>>> >>>> #ifdef COMPILER2 >>>> - if (MaxVectorSize > 16) { >>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>> } >>>> #endif // COMPILER2 >>>> + >>>> + // Some defaults for AMD family 17h >>>> + if ( cpu_family() == 0x17 ) { >>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>> Array Copy >>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>> + UseXMMForArrayCopy = true; >>>> + } >>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>> + UseUnalignedLoadStores = true; >>>> + } >>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>> + UseBMI2Instructions = true; >>>> + } >>>> + if (MaxVectorSize > 32) { >>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>> + } >>>> + if (UseSHA) { >>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>> + } else if (UseSHA512Intrinsics) { >>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>> functions not available on this CPU."); >>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>> + } >>>> + } >>>> +#ifdef COMPILER2 >>>> + if (supports_sse4_2()) { >>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>> + } >>>> + } >>>> +#endif >>>> + } >>>> } >>>> >>>> if( is_intel() ) { // Intel cpus specific settings >>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>> @@ -505,6 +505,14 @@ >>>> result |= CPU_CLMUL; >>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>> result |= CPU_RTM; >>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>> + result |= CPU_ADX; >>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>> + result |= CPU_BMI2; >>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>> + result |= CPU_SHA; >>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>> + result |= CPU_FMA; >>>> >>>> // AMD features. 
>>>> if (is_amd()) { >>>> @@ -515,19 +523,13 @@ >>>> result |= CPU_LZCNT; >>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>> result |= CPU_SSE4A; >>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>> + result |= CPU_HT; >>>> } >>>> // Intel features. >>>> if(is_intel()) { >>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>> - result |= CPU_ADX; >>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>> - result |= CPU_BMI2; >>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>> - result |= CPU_SHA; >>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>> result |= CPU_LZCNT; >>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>> - result |= CPU_FMA; >>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>> support for prefetchw >>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>> result |= CPU_3DNOW_PREFETCH; >>>> >>>> ************************************************************** >>>> >>>> Thanks, >>>> Rohit >>>> >>>>>> >>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>> >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Hi Rohit, >>>>>>>> >>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> I would like an volunteer to review this patch (openJDK9) which >>>>>>>>> sets >>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us >>>>>>>>> with >>>>>>>>> the commit process. >>>>>>>>> >>>>>>>>> Webrev: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Unfortunately patches can not be accepted from systems outside the >>>>>>>> OpenJDK >>>>>>>> infrastructure and ... >>>>>>>> >>>>>>>>> I have also attached the patch (hg diff -g) for reference. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> ... 
unfortunately patches tend to get stripped by the mail servers. >>>>>>>> If >>>>>>>> the >>>>>>>> patch is small please include it inline. Otherwise you will need to >>>>>>>> find >>>>>>>> an >>>>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net. >>>>>>>> >>>>>>> >>>>>>>>> 3) I have done regression testing using jtreg ($make default) and >>>>>>>>> didnt find any regressions. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Sounds good, but until I see the patch it is hard to comment on >>>>>>>> testing >>>>>>>> requirements. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Thanks David, >>>>>>> Yes, it's a small patch. >>>>>>> >>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>> } >>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>> } >>>>>>> + if (supports_sha()) { >>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>> + } >>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>>>>> UseSHA512Intrinsics) { >>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>> + warning("SHA instructions are not available on this CPU"); >>>>>>> + } >>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>> + } >>>>>>> >>>>>>> // some defaults for AMD family 15h >>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>> } >>>>>>> >>>>>>> #ifdef COMPILER2 >>>>>>> - if (MaxVectorSize > 16) { >>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. 
>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>> } >>>>>>> #endif // COMPILER2 >>>>>>> + >>>>>>> + // Some defaults for AMD family 17h >>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>> for >>>>>>> Array Copy >>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>> + UseXMMForArrayCopy = true; >>>>>>> + } >>>>>>> + if (supports_sse2() && >>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>> { >>>>>>> + UseUnalignedLoadStores = true; >>>>>>> + } >>>>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>>> + UseBMI2Instructions = true; >>>>>>> + } >>>>>>> + if (MaxVectorSize > 32) { >>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>> + } >>>>>>> + if (UseSHA) { >>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>> functions not available on this CPU."); >>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>> + } >>>>>>> + } >>>>>>> +#ifdef COMPILER2 >>>>>>> + if (supports_sse4_2()) { >>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>> + } >>>>>>> + } >>>>>>> +#endif >>>>>>> + } >>>>>>> } >>>>>>> >>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>> @@ -513,6 +513,16 @@ >>>>>>> result |= CPU_LZCNT; >>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>> result |= CPU_SSE4A; >>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>> + result |= CPU_BMI2; >>>>>>> + 
if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>> + result |= CPU_HT; >>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>> + result |= CPU_ADX; >>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>> + result |= CPU_SHA; >>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>> + result |= CPU_FMA; >>>>>>> } >>>>>>> // Intel features. >>>>>>> if(is_intel()) { >>>>>>> >>>>>>> Regards, >>>>>>> Rohit >>>>>>> >>>>>> >>> > From jesper.wilhelmsson at oracle.com Sun Sep 3 20:02:14 2017 From: jesper.wilhelmsson at oracle.com (jesper.wilhelmsson at oracle.com) Date: Sun, 3 Sep 2017 22:02:14 +0200 Subject: jdk10/hs integration status In-Reply-To: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com> References: <29BFCC6C-67A3-4E57-A114-6FDA8A877374@oracle.com> Message-ID: <47E6B89B-D08C-46AB-B8ED-B2FE02F5D8CC@oracle.com> Hi, JDK-8187124 is no longer considered a blocker. Thanks to everyone involved in the investigation! The integration is now completed. jdk10/hs will now remain closed until the repo consolidation is done. Thanks, /Jesper > On 2 Sep 2017, at 12:15, jesper.wilhelmsson at oracle.com wrote: > > Hi, > > After going through the results of our nightlies it seems we are in fairly good shape for integration. There was one issue with a typo in a recent fix that caused some failures, this issue was resolved yesterday just after the nightly snapshot was taken. > > > There is currently one issue that I didn't recognise and at the moment it is marked as an integration blocker: > > JDK-8187124 > TestInterpreterMethodEntries.java: Unable to create shared archive file > > This could as well be a problem with the test execution in which case it is not a blocker, but someone needs to look into the details here. 
>
>
> There are four test failures that look slightly different:
> tools/jar/modularJar/Basic.java
> tools/jar/multiRelease/ApiValidatorTest.java
> tools/jar/multiRelease/Basic.java
> tools/launcher/InfoStreams.java
>
> These four tests fail because they get a warning on stderr:
> Option MaxRAMFraction was deprecated in version 10.0 and will likely be removed in a future release.
> I do not consider this a blocker for integration, bug filed: JDK-8187125
>
>
> JDK10/hs now has restricted write access. Basically it is locked but in order to fix any urgent issues that might pop up over the next couple of days these people have write access: Vladimir Kozlov, Dan Daugherty, Stefan Karlsson, and myself.
>
> /Jesper
>

From david.holmes at oracle.com Mon Sep 4 01:24:32 2017
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 4 Sep 2017 11:24:32 +1000
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To: <59A93B53.9010505@oracle.com>
References: <59A804F8.9000501@oracle.com>
 <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com>
 <59A92896.9010604@oracle.com>
 <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com>
 <59A93B53.9010505@oracle.com>
Message-ID: <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com>

Hi Erik,

On 1/09/2017 8:49 PM, Erik Österlund wrote:
> Hi David,
>
> The shared structure for all operations is the following:
>
> An Atomic::something call creates a SomethingImpl function object that
> performs some basic type checking and then forwards the call straight to
> a PlatformSomething function object. This PlatformSomething object could
> decide to do anything. But to make life easier, it may inherit from a
> shared SomethingHelper function object with CRTP that calls back into
> the PlatformSomething function object to emit inline assembly.

Right, but! Let's look at some details.

Atomic::add
    AddImpl
    PlatformAdd
    FetchAndAdd
    AddAndFetch
    add_using_helper

Atomic::cmpxchg
    CmpxchgImpl
    PlatformCmpxchg
    cmpxchg_using_helper

Atomic::inc
    IncImpl
    PlatformInc
    IncUsingConstant

Why is it that the simplest operation (inc/dec) has the most complex
platform template definition? Why do we need Adjustment? You previously
said "Adjustment represents the increment/decrement value as an
IntegralConstant - your template friend for passing around a constant
with both a specified type and value in templates". But add passes
around values and doesn't need this. Further inc/dec don't need to pass
anything around anywhere - inc adds 1, dec subtracts 1! This "1" does
not need to appear anywhere in the API or get passed across layers - the
only place this "1" becomes evident is in the actual platform asm that
does the logic of "add 1" or "subtract 1".

My understanding from previous discussions is that much of the template
machinations were to deal with type management for "dest" and the values
being passed around. But here, for inc/dec there are no values being
passed so we don't have to make "dest" type-compatible with any value.

Cheers,
David
-----

> Hope this explanation helps in understanding the intended structure of
> this work.
>
> Thanks,
> /Erik
>
> On 2017-09-01 12:34, David Holmes wrote:
>> Hi Erik,
>>
>> I just wanted to add that I would expect the cmpxchg, add and inc,
>> Atomic APIs to all require similar basic structure for manipulating
>> types/values etc, yet all three seem to have quite different
>> structures that I find very confusing. I'm still at a loss to fathom
>> the CRTP and the hoops we seemingly have to jump through just to add
>> or subtract 1!!!
>>
>> Cheers,
>> David
>>
>> On 1/09/2017 7:29 PM, Erik Österlund wrote:
>>> Hi David,
>>>
>>> On 2017-09-01 02:49, David Holmes wrote:
>>>> Hi Erik,
>>>>
>>>> Sorry but this one is really losing me.
>>>>
>>>> What is the role of Adjustment ??
>>>
>>> Adjustment represents the increment/decrement value as an
>>> IntegralConstant - your template friend for passing around a constant
>>> with both a specified type and value in templates. The type of the
>>> increment/decrement is the type of the destination when the
>>> destination is an integral type, otherwise if it is a pointer type,
>>> the increment/decrement type is ptrdiff_t.
>>>
>>>> How are inc/dec anything but "using constant" ??
>>>
>>> I was also a bit torn on that name (I assume you are referring to
>>> IncUsingConstant/DecUsingConstant). It was hard to find a name that
>>> depicted what this platform helper does. I considered calling the
>>> helper something with immediate in the name because it is really used
>>> to embed the constant as immediate values in inline assembly today.
>>> But then again that seemed too specific, as it is not completely
>>> obvious platform specializations will use it in that way. One might
>>> just want to specialize this to send it into some compiler
>>> Atomic::inc intrinsic for example. Do you have any other preferred
>>> names? Here are a few possible names for IncUsingConstant:
>>>
>>> IncUsingScaledConstant
>>> IncUsingAdjustedConstant
>>> IncUsingPlatformHelper
>>>
>>> Any favourites?
>>>
>>>> Why do we special case jshort??
>>>
>>> To be consistent with the special case of Atomic::add on jshort. Do
>>> you want it removed?
>>>
>>>> This is indecipherable to normal people ;-)
>>>>
>>>> This()->template inc(dest);
>>>>
>>>> For something as trivial as adding or subtracting 1 the template
>>>> machinations here are just mind boggling!
>>>
>>> This uses the CRTP (Curiously Recurring Template Pattern) C++ idiom.
>>> The idea is to devirtualize a virtual call by passing in the derived
>>> type as a template parameter to a base class, and then let the base
>>> class static_cast to the derived class to devirtualize the call. I
>>> hope this explanation sheds some light on what is going on.
The same
>>> CRTP idiom was used in the Atomic::add implementation in a similar
>>> fashion.
>>>
>>> I will add some comments describing this in the next round after
>>> Coleen replies.
>>>
>>> Thanks for looking at this.
>>>
>>> /Erik
>>>
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>> On 31/08/2017 10:45 PM, Erik Österlund wrote:
>>>>> Hi everyone,
>>>>>
>>>>> Bug ID:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8186838
>>>>>
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>>>>>
>>>>> The time has come for the next step in generalizing Atomic with
>>>>> templates. Today I will focus on Atomic::inc/dec.
>>>>>
>>>>> I have tried to mimic the new Kim style that seems to have been
>>>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the
>>>>> structure looks like this:
>>>>>
>>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function
>>>>> object that performs some basic type checks.
>>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can
>>>>> define the operation arbitrarily for a given platform. The default
>>>>> implementation if not specialized for a platform is to call
>>>>> Atomic::add. So only platforms that want to do something different
>>>>> than that as an optimization have to provide a specialization.
>>>>> Layer 3) Platforms that decide to specialize
>>>>> PlatformInc/PlatformDec to be more optimized may inherit from a
>>>>> helper class IncUsingConstant/DecUsingConstant. This helper uses
>>>>> CRTP to compute what the increment/decrement should be after
>>>>> pointer scaling. The
>>>>> PlatformInc/PlatformDec operation then only needs to define an
>>>>> inc/dec member function, and will then get all the context
>>>>> information necessary to generate a more optimized implementation.
>>>>> Easy peasy.
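
For readers following the thread, the three layers described above can be
sketched as a small stand-alone C++ example. This is only an illustration
under stated assumptions, not the code from the webrev: every name ending
in "Sketch" is hypothetical, std::atomic stands in for the platform's
inline assembly, and only integral types are handled (the pointer-scaling
case is left out).

```cpp
#include <atomic>
#include <type_traits>

// Layer 3 (hypothetical helper): a CRTP base class. It supplies the
// constant adjustment as a template argument and calls back into the
// derived platform class through a static_cast, so no virtual call is
// involved.
template<typename Derived>
struct IncUsingConstantSketch {
  template<typename I>
  void operator()(std::atomic<I>* dest) const {
    const Derived* self = static_cast<const Derived*>(this);
    // "template" is required because inc is a member template of a
    // type that depends on the template parameter Derived.
    self->template inc<I, 1>(dest);
  }
};

// Layer 2 (hypothetical platform specialization): it only has to define
// inc(); the adjustment arrives as a compile-time constant.
struct PlatformIncSketch : IncUsingConstantSketch<PlatformIncSketch> {
  template<typename I, int adjustment>
  void inc(std::atomic<I>* dest) const {
    // A real platform might emit inline assembly with an immediate here;
    // this sketch simply uses fetch_add.
    dest->fetch_add(static_cast<I>(adjustment));
  }
};

// Layer 1 (hypothetical front end): basic type check, then forward to
// the platform function object.
template<typename I>
void atomic_inc_sketch(std::atomic<I>* dest) {
  static_assert(std::is_integral<I>::value,
                "this sketch handles integral types only");
  PlatformIncSketch()(dest);
}
```

Calling atomic_inc_sketch(&counter) on a std::atomic<int> increments it by
one. The static_cast in the helper corresponds to the step that, per Erik's
description, This() performs in the patch.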
>>>>>
>>>>> It is worth noticing that the generalized Atomic::dec operation
>>>>> assumes a two's complement integer machine and potentially sends
>>>>> the unary negative of a potentially unsigned type to Atomic::add. I
>>>>> have the following comments about this:
>>>>> 1) We already assume in other code that two's complement integers
>>>>> must be present.
>>>>> 2) A machine that does not have two's complement integers may still
>>>>> simply provide a specialization that solves the problem in a
>>>>> different way.
>>>>> 3) The alternative that does not make assumptions about that would
>>>>> use the good old IntegerTypes::cast_to_signed metaprogramming
>>>>> stuff, and I seem to recall we thought that was a bit too involved
>>>>> and complicated.
>>>>> This is the reason why I have chosen to use unary minus on the
>>>>> potentially unsigned type in the shared helper code that sends the
>>>>> decrement as an addend to Atomic::add.
>>>>>
>>>>> It would also be nice if somebody with access to PPC and s390
>>>>> machines could try out the relevant changes there so I do not
>>>>> accidentally break those platforms. I have blind-coded the addition
>>>>> of the immediate values passed in to the inline assembly in a way
>>>>> that I think looks like it should work.
>>>>>
>>>>> Testing:
>>>>> RBT hs-tier3, JPRT --testset hotspot
>>>>>
>>>>> Thanks,
>>>>> /Erik
>>>
>

From vladimir.kozlov at oracle.com Mon Sep 4 02:39:15 2017
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Sun, 3 Sep 2017 19:39:15 -0700
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To:
References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com>

Message-ID: <998a7014-4199-26b0-a8f5-20441f4d3f04@oracle.com>

Looks good.

Currently the jdk10 repository is undergoing a "consolidation" update. It may
take 2 weeks. You will need to wait until we can push your changes.

Regards,
Vladimir

On 9/3/17 9:42 AM, Rohit Arul Raj wrote:
> Hello Vladimir,
>
> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov
> wrote:
>> Hi Rohit,
>>
>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>>
>>> Hello Vladimir,
>>>
>>>> Changes look good. Only question I have is about MaxVectorSize. It is set
>>>> > 16 only in presence of AVX:
>>>>
>>>>
>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>>>
>>>> Does that code work for AMD 17h too?
>>>
>>>
>>> Thanks for pointing that out. Yes, the code works fine for AMD 17h. So
>>> I have removed the surplus check for MaxVectorSize from my patch. I
>>> have updated, re-tested and attached the patch.
>>
>>
>> Which check did you remove?
>>
>
> My older patch had the below mentioned check which was required on
> JDK9 where the default MaxVectorSize was 64. It has been handled
> better in openJDK10. So this check is not required anymore.
>
> +    // Some defaults for AMD family 17h
> +    if ( cpu_family() == 0x17 ) {
> ...
> ...
> +      if (MaxVectorSize > 32) {
> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
> +      }
> ..
> ..
> +    }
>
>>>
>>> I have one query regarding the setting of UseSHA flag:
>>>
>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821
>>>
>>> AMD 17h has support for SHA.
>>> AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets
>>> enabled for it based on the availability of BMI2 and AVX2. Is there an
>>> underlying reason for this? I have handled this in the patch but just
>>> wanted to confirm.
>> >> >> It was done with next changes which use only AVX2 and BMI2 instructions to >> calculate SHA-256: >> >> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974 >> >> I don't know if AMD 15h supports these instructions and can execute that >> code. You need to test it. >> > > Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions, > it should work. > Confirmed by running following sanity tests: > ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java > ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java > ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java > > So I have removed those SHA checks from my patch too. > > Please find attached updated, re-tested patch. > > diff --git a/src/cpu/x86/vm/vm_version_x86.cpp > b/src/cpu/x86/vm/vm_version_x86.cpp > --- a/src/cpu/x86/vm/vm_version_x86.cpp > +++ b/src/cpu/x86/vm/vm_version_x86.cpp > @@ -1109,11 +1109,27 @@ > } > > #ifdef COMPILER2 > - if (MaxVectorSize > 16) { > - // Limit vectors size to 16 bytes on current AMD cpus. > + if (cpu_family() < 0x17 && MaxVectorSize > 16) { > + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
> FLAG_SET_DEFAULT(MaxVectorSize, 16); > } > #endif // COMPILER2 > + > + // Some defaults for AMD family 17h > + if ( cpu_family() == 0x17 ) { > + // On family 17h processors use XMM and UnalignedLoadStores for > Array Copy > + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { > + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); > + } > + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { > + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); > + } > +#ifdef COMPILER2 > + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { > + FLAG_SET_DEFAULT(UseFPUForSpilling, true); > + } > +#endif > + } > } > > if( is_intel() ) { // Intel cpus specific settings > diff --git a/src/cpu/x86/vm/vm_version_x86.hpp > b/src/cpu/x86/vm/vm_version_x86.hpp > --- a/src/cpu/x86/vm/vm_version_x86.hpp > +++ b/src/cpu/x86/vm/vm_version_x86.hpp > @@ -505,6 +505,14 @@ > result |= CPU_CLMUL; > if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) > result |= CPU_RTM; > + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) > + result |= CPU_ADX; > + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) > + result |= CPU_BMI2; > + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) > + result |= CPU_SHA; > + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) > + result |= CPU_FMA; > > // AMD features. > if (is_amd()) { > @@ -515,19 +523,13 @@ > result |= CPU_LZCNT; > if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) > result |= CPU_SSE4A; > + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) > + result |= CPU_HT; > } > // Intel features. 
> if(is_intel()) { > - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) > - result |= CPU_ADX; > - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) > - result |= CPU_BMI2; > - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) > - result |= CPU_SHA; > if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) > result |= CPU_LZCNT; > - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) > - result |= CPU_FMA; > // for Intel, ecx.bits.misalignsse bit (bit 8) indicates > support for prefetchw > if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { > result |= CPU_3DNOW_PREFETCH; > > Please let me know your comments. > > Thanks for your time. > Rohit > >>> >>> Thanks for taking time to review the code. >>> >>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>> b/src/cpu/x86/vm/vm_version_x86.cpp >>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>> @@ -1088,6 +1088,22 @@ >>> } >>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>> } >>> + if (supports_sha()) { >>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>> + FLAG_SET_DEFAULT(UseSHA, true); >>> + } >>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>> UseSHA512Intrinsics) { >>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>> + warning("SHA instructions are not available on this CPU"); >>> + } >>> + FLAG_SET_DEFAULT(UseSHA, false); >>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>> + } >>> >>> // some defaults for AMD family 15h >>> if ( cpu_family() == 0x15 ) { >>> @@ -1109,11 +1125,40 @@ >>> } >>> >>> #ifdef COMPILER2 >>> - if (MaxVectorSize > 16) { >>> - // Limit vectors size to 16 bytes on current AMD cpus. >>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>> } >>> #endif // COMPILER2 >>> + >>> + // Some defaults for AMD family 17h >>> + if ( cpu_family() == 0x17 ) { >>> + // On family 17h processors use XMM and UnalignedLoadStores for >>> Array Copy >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>> + } >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>> + } >>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>> + FLAG_SET_DEFAULT(UseBMI2Instructions, true); >>> + } >>> + if (UseSHA) { >>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>> + } else if (UseSHA512Intrinsics) { >>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>> functions not available on this CPU."); >>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>> + } >>> + } >>> +#ifdef COMPILER2 >>> + if (supports_sse4_2()) { >>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>> + } >>> + } >>> +#endif >>> + } >>> } >>> >>> if( is_intel() ) { // Intel cpus specific settings >>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>> b/src/cpu/x86/vm/vm_version_x86.hpp >>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>> @@ -505,6 +505,14 @@ >>> result |= CPU_CLMUL; >>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>> result |= CPU_RTM; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>> + result |= CPU_ADX; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>> + result |= CPU_BMI2; >>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>> + result |= CPU_SHA; >>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>> + result |= CPU_FMA; >>> >>> // AMD features. 
>>> if (is_amd()) { >>> @@ -515,19 +523,13 @@ >>> result |= CPU_LZCNT; >>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>> result |= CPU_SSE4A; >>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>> + result |= CPU_HT; >>> } >>> // Intel features. >>> if(is_intel()) { >>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>> - result |= CPU_ADX; >>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>> - result |= CPU_BMI2; >>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>> - result |= CPU_SHA; >>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>> result |= CPU_LZCNT; >>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>> - result |= CPU_FMA; >>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>> support for prefetchw >>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>> result |= CPU_3DNOW_PREFETCH; >>> >>> >>> Regards, >>> Rohit >>> >>> >>> >>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>>> >>>>> >>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>>> wrote: >>>>>> >>>>>> >>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> Hi Rohit, >>>>>>> >>>>>>> I think the patch needs updating for jdk10 as I already see a lot of >>>>>>> logic >>>>>>> around UseSHA in vm_version_x86.cpp. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>> >>>>>> Thanks David, I will update the patch wrt JDK10 source base, test and >>>>>> resubmit for review. >>>>>> >>>>>> Regards, >>>>>> Rohit >>>>>> >>>>> >>>>> Hi All, >>>>> >>>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>>> 13519:71337910df60), did regression testing using jtreg ($make >>>>> default) and didnt find any regressions. >>>>> >>>>> Can anyone please volunteer to review this patch which sets flag/ISA >>>>> defaults for newer AMD 17h (EPYC) processor? 
>>>>> >>>>> ************************* Patch **************************** >>>>> >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> @@ -1088,6 +1088,22 @@ >>>>> } >>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>> } >>>>> + if (supports_sha()) { >>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>> + } >>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>>> UseSHA512Intrinsics) { >>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>> + warning("SHA instructions are not available on this CPU"); >>>>> + } >>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } >>>>> >>>>> // some defaults for AMD family 15h >>>>> if ( cpu_family() == 0x15 ) { >>>>> @@ -1109,11 +1125,43 @@ >>>>> } >>>>> >>>>> #ifdef COMPILER2 >>>>> - if (MaxVectorSize > 16) { >>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>> } >>>>> #endif // COMPILER2 >>>>> + >>>>> + // Some defaults for AMD family 17h >>>>> + if ( cpu_family() == 0x17 ) { >>>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>>> Array Copy >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>> + UseXMMForArrayCopy = true; >>>>> + } >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>>> + UseUnalignedLoadStores = true; >>>>> + } >>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>> + UseBMI2Instructions = true; >>>>> + } >>>>> + if (MaxVectorSize > 32) { >>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>> + } >>>>> + if (UseSHA) { >>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } else if (UseSHA512Intrinsics) { >>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>> functions not available on this CPU."); >>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>> + } >>>>> + } >>>>> +#ifdef COMPILER2 >>>>> + if (supports_sse4_2()) { >>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>> + } >>>>> + } >>>>> +#endif >>>>> + } >>>>> } >>>>> >>>>> if( is_intel() ) { // Intel cpus specific settings >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> @@ -505,6 +505,14 @@ >>>>> result |= CPU_CLMUL; >>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>> result |= CPU_RTM; >>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>> + result |= CPU_ADX; >>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>> + result |= CPU_BMI2; >>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>> + result |= CPU_SHA; >>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>> + result |= CPU_FMA; >>>>> >>>>> // AMD features. 
>>>>> if (is_amd()) { >>>>> @@ -515,19 +523,13 @@ >>>>> result |= CPU_LZCNT; >>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>> result |= CPU_SSE4A; >>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>> + result |= CPU_HT; >>>>> } >>>>> // Intel features. >>>>> if(is_intel()) { >>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>> - result |= CPU_ADX; >>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>> - result |= CPU_BMI2; >>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>> - result |= CPU_SHA; >>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>> result |= CPU_LZCNT; >>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>> - result |= CPU_FMA; >>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>> support for prefetchw >>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>> result |= CPU_3DNOW_PREFETCH; >>>>> >>>>> ************************************************************** >>>>> >>>>> Thanks, >>>>> Rohit >>>>> >>>>>>> >>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>>> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Hi Rohit, >>>>>>>>> >>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I would like an volunteer to review this patch (openJDK9) which >>>>>>>>>> sets >>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us >>>>>>>>>> with >>>>>>>>>> the commit process. >>>>>>>>>> >>>>>>>>>> Webrev: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Unfortunately patches can not be accepted from systems outside the >>>>>>>>> OpenJDK >>>>>>>>> infrastructure and ... >>>>>>>>> >>>>>>>>>> I have also attached the patch (hg diff -g) for reference. 
>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> ... unfortunately patches tend to get stripped by the mail servers. >>>>>>>>> If >>>>>>>>> the >>>>>>>>> patch is small please include it inline. Otherwise you will need to >>>>>>>>> find >>>>>>>>> an >>>>>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net. >>>>>>>>> >>>>>>>> >>>>>>>>>> 3) I have done regression testing using jtreg ($make default) and >>>>>>>>>> didnt find any regressions. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Sounds good, but until I see the patch it is hard to comment on >>>>>>>>> testing >>>>>>>>> requirements. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> David >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Thanks David, >>>>>>>> Yes, it's a small patch. >>>>>>>> >>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>>> } >>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>> } >>>>>>>> + if (supports_sha()) { >>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>> + } >>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>>>>>> UseSHA512Intrinsics) { >>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>> + warning("SHA instructions are not available on this CPU"); >>>>>>>> + } >>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>> + } >>>>>>>> >>>>>>>> // some defaults for AMD family 15h >>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>>> } >>>>>>>> >>>>>>>> #ifdef 
COMPILER2 >>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>> } >>>>>>>> #endif // COMPILER2 >>>>>>>> + >>>>>>>> + // Some defaults for AMD family 17h >>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>> for >>>>>>>> Array Copy >>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>> + } >>>>>>>> + if (supports_sse2() && >>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>> { >>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>> + } >>>>>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>>>> + UseBMI2Instructions = true; >>>>>>>> + } >>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>> + } >>>>>>>> + if (UseSHA) { >>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>>> functions not available on this CPU."); >>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>> + } >>>>>>>> + } >>>>>>>> +#ifdef COMPILER2 >>>>>>>> + if (supports_sse4_2()) { >>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>> + } >>>>>>>> + } >>>>>>>> +#endif >>>>>>>> + } >>>>>>>> } >>>>>>>> >>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> @@ -513,6 +513,16 @@ >>>>>>>> result |= CPU_LZCNT; >>>>>>>> if 
(_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>> result |= CPU_SSE4A; >>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>> + result |= CPU_BMI2; >>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>> + result |= CPU_HT; >>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>> + result |= CPU_ADX; >>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>> + result |= CPU_SHA; >>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>> + result |= CPU_FMA; >>>>>>>> } >>>>>>>> // Intel features. >>>>>>>> if(is_intel()) { >>>>>>>> >>>>>>>> Regards, >>>>>>>> Rohit >>>>>>>> >>>>>>> >>>> >> > From rohitarulraj at gmail.com Mon Sep 4 03:59:09 2017 From: rohitarulraj at gmail.com (Rohit Arul Raj) Date: Mon, 4 Sep 2017 09:29:09 +0530 Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults In-Reply-To: <998a7014-4199-26b0-a8f5-20441f4d3f04@oracle.com> References: <9016267b-3a6e-b684-4263-2c95b2f630ff@oracle.com>

 <998a7014-4199-26b0-a8f5-20441f4d3f04@oracle.com>
Message-ID:

On Mon, Sep 4, 2017 at 8:09 AM, Vladimir Kozlov wrote:
> Looks good.
>
> Currently the jdk10 repository is undergoing a "consolidation" update. It may
> take 2 weeks. You will need to wait until we can push your changes.
>

Sure Vladimir, Thanks for the support.

Regards,
Rohit

>
> On 9/3/17 9:42 AM, Rohit Arul Raj wrote:
>>
>> Hello Vladimir,
>>
>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov
>> wrote:
>>>
>>> Hi Rohit,
>>>
>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>>>
>>>>
>>>> Hello Vladimir,
>>>>
>>>>> Changes look good. Only question I have is about MaxVectorSize. It is
>>>>> set > 16 only in presence of AVX:
>>>>>
>>>>>
>>>>>
>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>>>>
>>>>> Does that code work for AMD 17h too?
>>>>
>>>>
>>>>
>>>> Thanks for pointing that out. Yes, the code works fine for AMD 17h. So
>>>> I have removed the surplus check for MaxVectorSize from my patch. I
>>>> have updated, re-tested and attached the patch.
>>>
>>>
>>>
>>> Which check did you remove?
>>>
>>
>> My older patch had the below mentioned check which was required on
>> JDK9 where the default MaxVectorSize was 64. It has been handled
>> better in openJDK10. So this check is not required anymore.
>>
>> +    // Some defaults for AMD family 17h
>> +    if ( cpu_family() == 0x17 ) {
>> ...
>> ...
>> +      if (MaxVectorSize > 32) {
>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>> +      }
>> ..
>> ..
>> +    }
>>
>>>>
>>>> I have one query regarding the setting of UseSHA flag:
>>>>
>>>>
>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821
>>>>
>>>> AMD 17h has support for SHA.
>>>> AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets
>>>> enabled for it based on the availability of BMI2 and AVX2. Is there an
>>>> underlying reason for this? I have handled this in the patch but just
>>>> wanted to confirm.
>>> >>> >>> >>> It was done with next changes which use only AVX2 and BMI2 instructions >>> to >>> calculate SHA-256: >>> >>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974 >>> >>> I don't know if AMD 15h supports these instructions and can execute that >>> code. You need to test it. >>> >> >> Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions, >> it should work. >> Confirmed by running following sanity tests: >> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java >> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java >> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java >> >> So I have removed those SHA checks from my patch too. >> >> Please find attached updated, re-tested patch. >> >> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >> b/src/cpu/x86/vm/vm_version_x86.cpp >> --- a/src/cpu/x86/vm/vm_version_x86.cpp >> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >> @@ -1109,11 +1109,27 @@ >> } >> >> #ifdef COMPILER2 >> - if (MaxVectorSize > 16) { >> - // Limit vectors size to 16 bytes on current AMD cpus. >> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >> } >> #endif // COMPILER2 >> + >> + // Some defaults for AMD family 17h >> + if ( cpu_family() == 0x17 ) { >> + // On family 17h processors use XMM and UnalignedLoadStores for >> Array Copy >> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >> + } >> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >> + } >> +#ifdef COMPILER2 >> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >> + } >> +#endif >> + } >> } >> >> if( is_intel() ) { // Intel cpus specific settings >> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >> b/src/cpu/x86/vm/vm_version_x86.hpp >> --- a/src/cpu/x86/vm/vm_version_x86.hpp >> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >> @@ -505,6 +505,14 @@ >> result |= CPU_CLMUL; >> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >> result |= CPU_RTM; >> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >> + result |= CPU_ADX; >> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >> + result |= CPU_BMI2; >> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >> + result |= CPU_SHA; >> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >> + result |= CPU_FMA; >> >> // AMD features. >> if (is_amd()) { >> @@ -515,19 +523,13 @@ >> result |= CPU_LZCNT; >> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >> result |= CPU_SSE4A; >> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >> + result |= CPU_HT; >> } >> // Intel features. 
>> if(is_intel()) { >> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >> - result |= CPU_ADX; >> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >> - result |= CPU_BMI2; >> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >> - result |= CPU_SHA; >> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >> result |= CPU_LZCNT; >> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >> - result |= CPU_FMA; >> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >> support for prefetchw >> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >> result |= CPU_3DNOW_PREFETCH; >> >> Please let me know your comments. >> >> Thanks for your time. >> Rohit >> >>>> >>>> Thanks for taking time to review the code. >>>> >>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>> @@ -1088,6 +1088,22 @@ >>>> } >>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>> } >>>> + if (supports_sha()) { >>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>> + } >>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>> UseSHA512Intrinsics) { >>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>> + warning("SHA instructions are not available on this CPU"); >>>> + } >>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>> + } >>>> >>>> // some defaults for AMD family 15h >>>> if ( cpu_family() == 0x15 ) { >>>> @@ -1109,11 +1125,40 @@ >>>> } >>>> >>>> #ifdef COMPILER2 >>>> - if (MaxVectorSize > 16) { >>>> - // Limit vectors size to 16 bytes on current AMD cpus. 
>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>> } >>>> #endif // COMPILER2 >>>> + >>>> + // Some defaults for AMD family 17h >>>> + if ( cpu_family() == 0x17 ) { >>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>> Array Copy >>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>> + } >>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>> + } >>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>> + FLAG_SET_DEFAULT(UseBMI2Instructions, true); >>>> + } >>>> + if (UseSHA) { >>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>> + } else if (UseSHA512Intrinsics) { >>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>> functions not available on this CPU."); >>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>> + } >>>> + } >>>> +#ifdef COMPILER2 >>>> + if (supports_sse4_2()) { >>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>> + } >>>> + } >>>> +#endif >>>> + } >>>> } >>>> >>>> if( is_intel() ) { // Intel cpus specific settings >>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>> @@ -505,6 +505,14 @@ >>>> result |= CPU_CLMUL; >>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>> result |= CPU_RTM; >>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>> + result |= CPU_ADX; >>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>> + result |= CPU_BMI2; >>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>> + result |= CPU_SHA; >>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>> + result |= CPU_FMA; >>>> 
>>>> // AMD features. >>>> if (is_amd()) { >>>> @@ -515,19 +523,13 @@ >>>> result |= CPU_LZCNT; >>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>> result |= CPU_SSE4A; >>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>> + result |= CPU_HT; >>>> } >>>> // Intel features. >>>> if(is_intel()) { >>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>> - result |= CPU_ADX; >>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>> - result |= CPU_BMI2; >>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>> - result |= CPU_SHA; >>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>> result |= CPU_LZCNT; >>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>> - result |= CPU_FMA; >>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>> support for prefetchw >>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>> result |= CPU_3DNOW_PREFETCH; >>>> >>>> >>>> Regards, >>>> Rohit >>>> >>>> >>>> >>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>>>> >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>>>> >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Hi Rohit, >>>>>>>> >>>>>>>> I think the patch needs updating for jdk10 as I already see a lot of >>>>>>>> logic >>>>>>>> around UseSHA in vm_version_x86.cpp. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> >>>>>>> >>>>>>> Thanks David, I will update the patch wrt JDK10 source base, test and >>>>>>> resubmit for review. >>>>>>> >>>>>>> Regards, >>>>>>> Rohit >>>>>>> >>>>>> >>>>>> Hi All, >>>>>> >>>>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>>>> 13519:71337910df60), did regression testing using jtreg ($make >>>>>> default) and didnt find any regressions. >>>>>> >>>>>> Can anyone please volunteer to review this patch which sets flag/ISA >>>>>> defaults for newer AMD 17h (EPYC) processor? 
>>>>>> >>>>>> ************************* Patch **************************** >>>>>> >>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> @@ -1088,6 +1088,22 @@ >>>>>> } >>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>> } >>>>>> + if (supports_sha()) { >>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>> + } >>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || >>>>>> UseSHA512Intrinsics) { >>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>> + warning("SHA instructions are not available on this CPU"); >>>>>> + } >>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>> + } >>>>>> >>>>>> // some defaults for AMD family 15h >>>>>> if ( cpu_family() == 0x15 ) { >>>>>> @@ -1109,11 +1125,43 @@ >>>>>> } >>>>>> >>>>>> #ifdef COMPILER2 >>>>>> - if (MaxVectorSize > 16) { >>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>> } >>>>>> #endif // COMPILER2 >>>>>> + >>>>>> + // Some defaults for AMD family 17h >>>>>> + if ( cpu_family() == 0x17 ) { >>>>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>>>> Array Copy >>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>> + UseXMMForArrayCopy = true; >>>>>> + } >>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>> { >>>>>> + UseUnalignedLoadStores = true; >>>>>> + } >>>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>> + UseBMI2Instructions = true; >>>>>> + } >>>>>> + if (MaxVectorSize > 32) { >>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>> + } >>>>>> + if (UseSHA) { >>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>> + } else if (UseSHA512Intrinsics) { >>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>> functions not available on this CPU."); >>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>> + } >>>>>> + } >>>>>> +#ifdef COMPILER2 >>>>>> + if (supports_sse4_2()) { >>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>> + } >>>>>> + } >>>>>> +#endif >>>>>> + } >>>>>> } >>>>>> >>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> @@ -505,6 +505,14 @@ >>>>>> result |= CPU_CLMUL; >>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>> result |= CPU_RTM; >>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>> + result |= CPU_ADX; >>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>> + result |= CPU_BMI2; >>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>> + result |= CPU_SHA; >>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma 
!= 0) >>>>>> + result |= CPU_FMA; >>>>>> >>>>>> // AMD features. >>>>>> if (is_amd()) { >>>>>> @@ -515,19 +523,13 @@ >>>>>> result |= CPU_LZCNT; >>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>> result |= CPU_SSE4A; >>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>> + result |= CPU_HT; >>>>>> } >>>>>> // Intel features. >>>>>> if(is_intel()) { >>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>> - result |= CPU_ADX; >>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>> - result |= CPU_BMI2; >>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>> - result |= CPU_SHA; >>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>> result |= CPU_LZCNT; >>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>> - result |= CPU_FMA; >>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>>> support for prefetchw >>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>> >>>>>> ************************************************************** >>>>>> >>>>>> Thanks, >>>>>> Rohit >>>>>> >>>>>>>> >>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>>>> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi Rohit, >>>>>>>>>> >>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I would like an volunteer to review this patch (openJDK9) which >>>>>>>>>>> sets >>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us >>>>>>>>>>> with >>>>>>>>>>> the commit process. 
>>>>>>>>>>> >>>>>>>>>>> Webrev: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Unfortunately patches can not be accepted from systems outside the >>>>>>>>>> OpenJDK >>>>>>>>>> infrastructure and ... >>>>>>>>>> >>>>>>>>>>> I have also attached the patch (hg diff -g) for reference. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ... unfortunately patches tend to get stripped by the mail >>>>>>>>>> servers. >>>>>>>>>> If >>>>>>>>>> the >>>>>>>>>> patch is small please include it inline. Otherwise you will need >>>>>>>>>> to >>>>>>>>>> find >>>>>>>>>> an >>>>>>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net. >>>>>>>>>> >>>>>>>>> >>>>>>>>>>> 3) I have done regression testing using jtreg ($make default) and >>>>>>>>>>> didnt find any regressions. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Sounds good, but until I see the patch it is hard to comment on >>>>>>>>>> testing >>>>>>>>>> requirements. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> David >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks David, >>>>>>>>> Yes, it's a small patch. 
>>>>>>>>> >>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>>>> } >>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>> } >>>>>>>>> + if (supports_sha()) { >>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>> + } >>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics >>>>>>>>> || >>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>> + warning("SHA instructions are not available on this CPU"); >>>>>>>>> + } >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>> + } >>>>>>>>> >>>>>>>>> // some defaults for AMD family 15h >>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>>>> } >>>>>>>>> >>>>>>>>> #ifdef COMPILER2 >>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>> } >>>>>>>>> #endif // COMPILER2 >>>>>>>>> + >>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>>> for >>>>>>>>> Array Copy >>>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>> { >>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>> + } >>>>>>>>> + if (supports_sse2() && >>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>> { >>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>> + } >>>>>>>>> + if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>> { >>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>> + } >>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>> + } >>>>>>>>> + if (UseSHA) { >>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto hash >>>>>>>>> functions not available on this CPU."); >>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> +#ifdef COMPILER2 >>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> +#endif >>>>>>>>> + } >>>>>>>>> } >>>>>>>>> >>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> @@ -513,6 +513,16 @@ >>>>>>>>> result |= CPU_LZCNT; >>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>> result |= CPU_SSE4A; >>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>> + result |= CPU_BMI2; >>>>>>>>> + 
if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>> + result |= CPU_HT; >>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>> + result |= CPU_ADX; >>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>> + result |= CPU_SHA; >>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>> + result |= CPU_FMA; >>>>>>>>> } >>>>>>>>> // Intel features. >>>>>>>>> if(is_intel()) { >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Rohit >>>>>>>>> >>>>>>>> >>>>> >>> >> > From erik.osterlund at oracle.com Mon Sep 4 07:15:02 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Mon, 4 Sep 2017 09:15:02 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com> References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> <59A93B53.9010505@oracle.com> <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com> Message-ID: <59ACFD76.3000606@oracle.com> Hi David, On 2017-09-04 03:24, David Holmes wrote: > Hi Erik, > > On 1/09/2017 8:49 PM, Erik ?sterlund wrote: >> Hi David, >> >> The shared structure for all operations is the following: >> >> An Atomic::something call creates a SomethingImpl function object >> that performs some basic type checking and then forwards the call >> straight to a PlatformSomething function object. This >> PlatformSomething object could decide to do anything. But to make >> life easier, it may inherit from a shared SomethingHelper function >> object with CRTP that calls back into the PlatformSomething function >> object to emit inline assembly. > > Right, but! Lets look at some details. 
>
> Atomic::add
>   AddImpl
>   PlatformAdd
>   FetchAndAdd
>   AddAndFetch
>   add_using_helper
>
> Atomic::cmpxchg
>   CmpxchgImpl
>   PlatformCmpxchg
>   cmpxchg_using_helper
>
> Atomic::inc
>   IncImpl
>   PlatformInc
>   IncUsingConstant
>
> Why is it that the simplest operation (inc/dec) has the most complex
> platform template definition? Why do we need Adjustment? You
> previously said "Adjustment represents the increment/decrement value
> as an IntegralConstant - your template friend for passing around a
> constant with both a specified type and value in templates". But add
> passes around values and doesn't need this. Further inc/dec don't need
> to pass anything around anywhere - inc adds 1, dec subtracts 1! This
> "1" does not need to appear anywhere in the API or get passed across
> layers - the only place this "1" becomes evident is in the actual
> platform asm that does the logic of "add 1" or "subtract 1".
>
> My understanding from previous discussions is that much of the
> template machinations was to deal with type management for "dest" and
> the values being passed around. But here, for inc/dec there are no
> values being passed so we don't have to make "dest" type-compatible
> with any value.

Dealing with different types being passed in is one part of the
problem, and it is a problem that almost all operations seem to have.
But Atomic::add and inc/dec have more problems to deal with.

The Atomic::add operation has two more problems that cmpxchg does not
have.

1) It needs to scale pointer arithmetic. So if you have a P* and you
   add 2 to it, then you really add 2 * sizeof(P) to the underlying
   value, and the scaled addend needs to be of the right type - the
   type of the destination for integral types and ptrdiff_t for
   pointers. These are similar semantics to ++pointer.
2) It connects backends with different semantics - either
   fetch_and_add or add_and_fetch - to a common public interface with
   add_and_fetch semantics.
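
A minimal sketch of the two points above, using std::atomic as a stand-in for the HotSpot backends. The names scaled_addend_sketch and add_and_fetch_sketch are hypothetical and do not appear in the webrev; this only illustrates the semantics, not the actual implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Point 1: adding n elements to a P* moves the address by n * sizeof(P)
// bytes. The scaled addend is typed as ptrdiff_t, as described above.
template <typename P>
std::ptrdiff_t scaled_addend_sketch(std::ptrdiff_t n) {
  return n * static_cast<std::ptrdiff_t>(sizeof(P));
}

// Point 2: exposing add_and_fetch semantics on top of a backend that only
// offers fetch_and_add. The backend returns the OLD value, so the addend
// is applied once more to obtain the NEW value.
// (Assumes old_value + addend does not overflow a signed type.)
template <typename T>
T add_and_fetch_sketch(std::atomic<T>* dest, T addend) {
  T old_value = dest->fetch_add(addend);      // fetch_and_add semantics
  return static_cast<T>(old_value + addend);  // add_and_fetch semantics
}
```

A backend that natively has add_and_fetch semantics would need no such adaptation, which is presumably why the two helper flavours exist side by side.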

This is the reason that Atomic::add might appear more complicated than
Atomic::cmpxchg: Atomic::cmpxchg only had the differing-type problems
to deal with, and no pointer arithmetic.

The reason why Atomic::inc/dec looks more complicated than Atomic::add
is that it needs to preserve the pointer arithmetic as constants rather
than values, because the scaled addend is embedded in the inline
assembly as immediate values. Therefore it passes around an
IntegralConstant that embeds both the type and value of the addend. And
it is not just 1/-1. For integral destinations the constant used is
1/-1 of the type stored at the destination. For pointers the constant
is a ptrdiff_t with a value representing the size of the element
pointed to.

Having said that - I am not opposed to simply removing the
specializations of inc/dec if we are scared of the complexity of
passing this constant to the platform layer. I ran a bunch of
benchmarks over the weekend, and they showed no significant regressions
after removal. Now of course that might not tell the full story - the
benchmarks could have missed that some critical operation in the JVM
takes longer. But I would be very surprised if that was the case.

Thanks,
/Erik

>
> Cheers,
> David
> -----
>
>> Hope this explanation helps understanding the intended structure of
>> this work.
>>
>> Thanks,
>> /Erik
>>
>> On 2017-09-01 12:34, David Holmes wrote:
>>> Hi Erik,
>>>
>>> I just wanted to add that I would expect the cmpxchg, add and inc,
>>> Atomic API's to all require similar basic structure for manipulating
>>> types/values etc, yet all three seem to have quite different
>>> structures that I find very confusing. I'm still at a loss to fathom
>>> the CRTP and the hoops we seemingly have to jump through just to add
>>> or subtract 1!!!
>>>
>>> Cheers,
>>> David
>>>
>>> On 1/09/2017 7:29 PM, Erik Österlund wrote:
>>>> Hi David,
>>>>
>>>> On 2017-09-01 02:49, David Holmes wrote:
>>>>> Hi Erik,
>>>>>
>>>>> Sorry but this one is really losing me.
>>>>>
>>>>> What is the role of Adjustment ??
>>>>
>>>> Adjustment represents the increment/decrement value as an
>>>> IntegralConstant - your template friend for passing around a
>>>> constant with both a specified type and value in templates. The
>>>> type of the increment/decrement is the type of the destination when
>>>> the destination is an integral type, otherwise if it is a pointer
>>>> type, the increment/decrement type is ptrdiff_t.
>>>>
>>>>> How are inc/dec anything but "using constant" ??
>>>>
>>>> I was also a bit torn on that name (I assume you are referring to
>>>> IncUsingConstant/DecUsingConstant). It was hard to find a name that
>>>> depicted what this platform helper does. I considered calling the
>>>> helper something with immediate in the name because it is really
>>>> used to embed the constant as immediate values in inline assembly
>>>> today. But then again that seemed too specific, as it is not
>>>> completely obvious platform specializations will use it in that
>>>> way. One might just want to specialize this to send it into some
>>>> compiler Atomic::inc intrinsic for example. Do you have any other
>>>> preferred names? Here are a few possible names for IncUsingConstant:
>>>>
>>>> IncUsingScaledConstant
>>>> IncUsingAdjustedConstant
>>>> IncUsingPlatformHelper
>>>>
>>>> Any favourites?
>>>>
>>>>> Why do we special case jshort??
>>>>
>>>> To be consistent with the special case of Atomic::add on jshort. Do
>>>> you want it removed?
>>>>
>>>>> This is indecipherable to normal people ;-)
>>>>>
>>>>> This()->template inc(dest);
>>>>>
>>>>> For something as trivial as adding or subtracting 1 the template
>>>>> machinations here are just mind boggling!
>>>>
>>>> This uses the CRTP (Curiously Recurring Template Pattern) C++
>>>> idiom.
>>>> The idea is to devirtualize a virtual call by passing in the
>>>> derived type as a template parameter to a base class, and then let
>>>> the base class static_cast to the derived class to devirtualize the
>>>> call. I hope this explanation sheds some light on what is going on.
>>>> The same CRTP idiom was used in the Atomic::add implementation in a
>>>> similar fashion.
>>>>
>>>> I will add some comments describing this in the next round after
>>>> Coleen replies.
>>>>
>>>> Thanks for looking at this.
>>>>
>>>> /Erik
>>>>
>>>>>
>>>>> Cheers,
>>>>> David
>>>>>
>>>>> On 31/08/2017 10:45 PM, Erik Österlund wrote:
>>>>>> Hi everyone,
>>>>>>
>>>>>> Bug ID:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8186838
>>>>>>
>>>>>> Webrev:
>>>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>>>>>>
>>>>>> The time has come for the next step in generalizing Atomic with
>>>>>> templates. Today I will focus on Atomic::inc/dec.
>>>>>>
>>>>>> I have tried to mimic the new Kim style that seems to have been
>>>>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the
>>>>>> structure looks like this:
>>>>>>
>>>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function
>>>>>> object that performs some basic type checks.
>>>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can
>>>>>> define the operation arbitrarily for a given platform. The
>>>>>> default implementation if not specialized for a platform is to
>>>>>> call Atomic::add. So only platforms that want to do something
>>>>>> different than that as an optimization have to provide a
>>>>>> specialization.
>>>>>> Layer 3) Platforms that decide to specialize
>>>>>> PlatformInc/PlatformDec to be more optimized may inherit from a
>>>>>> helper class IncUsingConstant/DecUsingConstant. This helper helps
>>>>>> performing the necessary computation what the increment/decrement
>>>>>> should be after pointer scaling using CRTP.
>>>>>> The PlatformInc/PlatformDec operation then only needs to define an
>>>>>> inc/dec member function, and will then get all the context
>>>>>> information necessary to generate a more optimized
>>>>>> implementation. Easy peasy.
>>>>>>
>>>>>> It is worth noticing that the generalized Atomic::dec operation
>>>>>> assumes a two's complement integer machine and potentially sends
>>>>>> the unary negative of a potentially unsigned type to Atomic::add.
>>>>>> I have the following comments about this:
>>>>>> 1) We already assume in other code that two's complement integers
>>>>>> must be present.
>>>>>> 2) A machine that does not have two's complement integers may
>>>>>> still simply provide a specialization that solves the problem in
>>>>>> a different way.
>>>>>> 3) The alternative that does not make assumptions about that
>>>>>> would use the good old IntegerTypes::cast_to_signed
>>>>>> metaprogramming stuff, and I seem to recall we thought that was a
>>>>>> bit too involved and complicated.
>>>>>> This is the reason why I have chosen to use unary minus on the
>>>>>> potentially unsigned type in the shared helper code that sends
>>>>>> the decrement as an addend to Atomic::add.
>>>>>>
>>>>>> It would also be nice if somebody with access to PPC and s390
>>>>>> machines could try out the relevant changes there so I do not
>>>>>> accidentally break those platforms. I have blind-coded the
>>>>>> addition of the immediate values passed in to the inline assembly
>>>>>> in a way that I think looks like it should work.
>>>>>>
>>>>>> Testing:
>>>>>> RBT hs-tier3, JPRT --testset hotspot
>>>>>>
>>>>>> Thanks,
>>>>>> /Erik
>>>>
>>

From erik.osterlund at oracle.com Mon Sep 4 08:14:48 2017
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Mon, 4 Sep 2017 10:14:48 +0200
Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates
In-Reply-To:
References: <59A804F8.9000501@oracle.com>
 <59A96B9D.6070002@oracle.com>
Message-ID: <59AD0B78.4000707@oracle.com>

Hi Andrew,

On 2017-09-02 10:31, Andrew Haley wrote:
> On 01/09/17 15:15, Erik Österlund wrote:
>> It is not the simplest solution I can think of. The simplest solution I
>> can think of is to remove all specialized versions of Atomic::inc/dec
>> and just have it call Atomic::add directly. That would remove the
>> optimizations we have today, for whatever reason we have them. It would
>> lead to slightly more conservative fencing on PPC/S390,
> I see. Can you say what instructions would be different?

Sure. Specializations exist on x86, PPC and S390. Removing these
specializations would have the following consequences:

-------------------------------------------------------------------
On x86 Atomic::inc of 4 byte sized types:

lock addl $immediateAddend,(rDest)

becomes

lock xaddl rAddend,(rDest) # stores the value that was there back in
                           # rAddend upon completion

So the inc optimization currently makes sure the addend can be encoded
as an immediate value in the code stream, and exploits that we do not
need to see the returned value. Therefore a lock addl is good enough
for those purposes and does not require the use of an extra register.
But it is not obvious that this slimmer encoding makes any significant
difference on a modern machine. In the contended case it arguably will
not matter. Similar arguments apply for 8 byte sized types and the
Atomic::dec variants.
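
The immediate-operand point above depends on the addend reaching the platform layer as a compile-time constant. A minimal sketch of that idea follows, combining a constant-carrying type with the CRTP callback. All names ending in Sketch are hypothetical stand-ins, not the webrev's code; std::atomic::fetch_add is only a placeholder for the inline assembly, and the platform layer here handles integral destinations only.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for an IntegralConstant: carries a type and a
// compile-time value.
template <typename T, T v>
struct ConstantSketch {
  typedef T type;
  static constexpr T value = v;
};

// Adjustment for integral destinations: 1, typed like the destination.
template <typename T>
struct IncAdjustmentSketch {
  typedef ConstantSketch<T, static_cast<T>(1)> type;
};

// Adjustment for pointer destinations: the element size as a ptrdiff_t.
template <typename P>
struct IncAdjustmentSketch<P*> {
  typedef ConstantSketch<std::ptrdiff_t,
                         static_cast<std::ptrdiff_t>(sizeof(P))> type;
};

// CRTP helper: works out the adjustment, then calls back into Derived
// through a static_cast instead of a virtual call.
template <typename Derived>
struct IncUsingConstantSketch {
  template <typename T>
  void operator()(std::atomic<T>* dest) const {
    typedef typename IncAdjustmentSketch<T>::type Adjustment;
    static_cast<const Derived*>(this)->template inc<T, Adjustment>(dest);
  }
};

// "Platform" layer: sees only a destination and a compile-time addend.
// Real code would emit inline assembly with an immediate operand here.
struct PlatformIncSketch : IncUsingConstantSketch<PlatformIncSketch> {
  template <typename T, typename Adjustment>
  void inc(std::atomic<T>* dest) const {
    static_assert(std::is_integral<T>::value, "sketch handles integers only");
    dest->fetch_add(Adjustment::value);
  }
};
```

Because Adjustment::value is a constant expression, a platform specialization is free to use it wherever an immediate is required, which a run-time addend would not allow.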

-------------------------------------------------------------------
On PPC Atomic::inc/dec and Atomic::add have the following differences:

Atomic::inc/dec uses addic between the LL and SC instructions with an
immediate value for adding, whereas Atomic::add uses the add
instruction with an extra register.
Atomic::add has a leading lwsync fence and Atomic::inc/dec has no
leading fence.
Atomic::add has a trailing isync fence and Atomic::inc/dec has no
trailing fence.

So the current implementation of Atomic::add uses heavier fencing than
Atomic::inc/dec. I can imagine that does matter for performance today.
However, the documented semantics of Atomic::inc/dec requires a leading
sync fence - so they are both arguably too weak and should have
stronger fencing than they do today. And I would argue that if both
conformed to the fencing required by our public API, then the
difference would probably be small. If dodging those fences on PPC is
crucial for performance, then I believe the right fix is to introduce
relaxed atomics.

-------------------------------------------------------------------
On S390 Atomic::inc/dec and Atomic::add look almost identical. But I
spotted the following tiny differences:

Atomic::inc on 4-byte sized types loads the increment with LGHI,
whereas Atomic::add loads it with LGFR.
Similarly, Atomic::inc calculates the new value with AGHI and
Atomic::add calculates the new value with AR.

I am not too familiar with S390, but if I get this right then
Atomic::add uses a fetch_and_add instruction, and then adds the addend
to the fetched value in the assembly to conform to add_and_fetch
semantics. Atomic::inc also uses a fetch_and_add instruction and seems
to also calculate the add_and_fetch result value, without returning it
or in any other way using it.
If the native fetch_and_add instruction is not available, it resorts to using a load-link add CAS loop - and they look identical except for using an immediate value for Atomic::inc. The same applies for Atomic::dec and 8 byte sized types. Either way, the differences between add and inc/dec currently seem to be mostly related to using immediate values vs a register, if I get it right. And I would be surprised if that makes a huge difference. ------------------------------------------------------------------- All in all, I would not be unhappy about dropping Atomic::inc specializations in the name of simplicity, and potentially introducing relaxed atomics instead for the platforms that rely on fence elision, should that be required. Thanks, /Erik > >> and would lead to slightly less optimal machine encoding on x86 >> (without immediate values in the instructions). But it would be >> simpler for sure. I did not put any judgement into whether our >> existing optimizations are worthwhile or not. But if you want to >> prioritize simplicity, removing those optimizations is one possible >> solution. Would you prefer that? > Is this really about optimization? If we cared about getting this > stuff as optimized as possible we'd use intrinsics on GCC/x86 targets. > These have been supported for a long time. But it seems we're > determined to preserve the legacy assembly-language implementations > and use them everywhere, even where they are not necessary.
> From aph at redhat.com Mon Sep 4 09:21:01 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 4 Sep 2017 10:21:01 +0100 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59ACFD76.3000606@oracle.com> References: <59A804F8.9000501@oracle.com> <2789fbb4-efd3-a299-9ba3-a6b8100a9b06@oracle.com> <59A92896.9010604@oracle.com> <9d935d76-7e02-f973-1879-9c8e9a0ef8e7@oracle.com> <59A93B53.9010505@oracle.com> <2f1d98e3-72b5-7d34-8e7a-b8517583022b@oracle.com> <59ACFD76.3000606@oracle.com> Message-ID: <227233be-69a3-44f6-ea74-c86ed66aa44e@redhat.com> On 04/09/17 08:15, Erik Österlund wrote: > Having said that - I am not opposed to simply removing the > specializations of inc/dec if we are scared of the complexity of > passing this constant to the platform layer. It isn't exactly about fear, but of course we should be cautious about adding complexity. Simplicity is prerequisite for reliability. [One of Dijkstra's pithiest comments.] > After running a bunch of benchmarks over the weekend, it showed no > significant regressions after removal. Now of course that might not > tell the full story - it could have missed that some critical > operation in the JVM takes longer. But I would be very surprised if > that was the case. Good. So would I. Fred Brooks distinguishes between two types of complexity: accidental and essential. Essential complexity is determined by the problem to be solved, and nothing can remove it. Accidental complexity is caused by the implementation: programming language, use of assembly code, and so on. In this case, the idea of atomically incrementing a variable is extremely simple. It's barely even worthy of the name "algorithm". I believe that almost all of the complexity of a solution is accidental: it's mostly caused by C++, the C++ compilers we use, and the internal conventions of HotSpot. The question in my mind is: how much of the accidental complexity can we remove?
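For reference, the simplest form discussed in this thread, with inc/dec carrying no platform specializations and simply forwarding to add, could look roughly like the sketch below. SimpleAtomic is a hypothetical stand-in, not the HotSpot Atomic class, and the use of a GCC/Clang builtin with seq_cst ordering for add is an assumption made only to keep the example self-contained. It covers integer types only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for illustration; not the HotSpot Atomic class.
struct SimpleAtomic {
  // add_and_fetch semantics: returns the updated value.
  // Assumption: implemented with a GCC/Clang builtin for brevity.
  template <typename T>
  static T add(T addend, volatile T* dest) {
    assert(dest != nullptr);
    return __atomic_add_fetch(dest, addend, __ATOMIC_SEQ_CST);
  }

  // inc and dec have no platform-specific code of their own:
  // they forward to add with a constant addend and drop the result.
  template <typename T>
  static void inc(volatile T* dest) { (void)add(T(1), dest); }

  template <typename T>
  static void dec(volatile T* dest) { (void)add(T(-1), dest); }
};
```

With this shape, any fencing or encoding decision lives in add alone, which is the trade-off weighed above: less code, at the possible cost of the immediate-operand and fence-elision optimizations.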
-- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Mon Sep 4 09:24:14 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 4 Sep 2017 10:24:14 +0100 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59AD0B78.4000707@oracle.com> References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <59AD0B78.4000707@oracle.com> Message-ID: <8482b0b5-8791-9495-7d3c-d9155bb32518@redhat.com> On 04/09/17 09:14, Erik Österlund wrote: > On PPC Atomic::inc/dec and Atomic::add have the following differences: > > Atomic::inc/dec uses addic between the LL and SC instructions with an > immediate value for adding, whereas Atomic::add uses the add instruction > with an extra register. > Atomic::add has a leading lwsync fence and Atomic::inc/dec has no > leading fence. > Atomic::add has a trailing isync fence and Atomic::inc/dec has no > trailing fence. One of those must be a bug. Either one of them is unnecessary or both are necessary. > So the current implementation of Atomic::add uses heavier fencing than > Atomic::inc/dec. I can imagine that does matter for performance today. > However, the documented semantics of Atomic::inc/dec requires a leading > sync fence - so they are both arguably too weak and should have stronger > fencing than they do today. And I would argue that if both conformed to > the fencing required by our public API, then the difference would > probably be small. Right. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From robbin.ehn at oracle.com Mon Sep 4 09:34:46 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Mon, 4 Sep 2017 11:34:46 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> Message-ID: <6f623116-8213-1523-ba37-fb4ce17c4afa@oracle.com> Hi, On 09/02/2017 10:31 AM, Andrew Haley wrote: > On 01/09/17 15:15, Erik Österlund wrote: >> It is not the simplest solution I can think of. The simplest solution I >> can think of is to remove all specialized versions of Atomic::inc/dec >> and just have it call Atomic::add directly. That would remove the >> optimizations we have today, for whatever reason we have them. It would >> lead to slightly more conservative fencing on PPC/S390, > > I see. Can you say what instructions would be different? > >> and would lead to slightly less optimal machine encoding on x86 >> (without immediate values in the instructions). But it would be >> simpler for sure. I did not put any judgement into whether our >> existing optimizations are worthwhile or not. But if you want to >> prioritize simplicity, removing those optimizations is one possible >> solution. Would you prefer that? > > Is this really about optimization? If we cared about getting this > stuff as optimized as possible we'd use intrinsics on GCC/x86 targets. > These have been supported for a long time. But it seems we're > determined to preserve the legacy assembly-language implementations > and use them everywhere, even where they are not necessary. > Why not use gcc/clang intrinsics on all platforms where we use gcc/clang? (not just gcc/x86) For "__atomic_fetch_add (&value, inc, __ATOMIC_RELAXED);" gcc seems to generate "lock addl" on x86 and ldxr,stxr on armv8; with acq_rel it generates ldaxr,stlxr, which is what I would expect. And thus we can remove a lot of code!
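The intrinsic mentioned above takes the memory order as an ordinary argument, so the relaxed and acq_rel forms differ only in that constant. A minimal sketch, assuming GCC or Clang; the wrapper names are hypothetical, and the generated instructions are whatever the compiler selects for the target:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper: relaxed fetch-and-add, returns the previous value.
// No ordering is requested beyond atomicity of the update itself.
inline int32_t fetch_add_relaxed(int32_t* value, int32_t inc) {
  assert(value != nullptr);
  return __atomic_fetch_add(value, inc, __ATOMIC_RELAXED);
}

// Hypothetical wrapper: the same operation with acquire-release ordering.
inline int32_t fetch_add_acq_rel(int32_t* value, int32_t inc) {
  assert(value != nullptr);
  return __atomic_fetch_add(value, inc, __ATOMIC_ACQ_REL);
}
```

Compiling both for x86 and for armv8 and comparing the assembly is one way to check the instruction sequences quoted above for a given compiler version.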
(if we should have the relaxed version in API is another question) /Robbin From erik.osterlund at oracle.com Mon Sep 4 09:50:14 2017 From: erik.osterlund at oracle.com (Erik Österlund) Date: Mon, 4 Sep 2017 11:50:14 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <6f623116-8213-1523-ba37-fb4ce17c4afa@oracle.com> References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <6f623116-8213-1523-ba37-fb4ce17c4afa@oracle.com> Message-ID: <59AD21D6.8040305@oracle.com> Hi Robbin, I agree that on x86, there isn't a whole lot of other things the compiler could do with the intrinsics than what we want it to do due to the relatively strong memory model of the machine. So this might be a possible simplification on x86 gcc/clang targets (but still not all x86 targets). As for PPC and ARMv7 though, that is not true any longer. For example, our conservative memory model is more conservative than seq_cst semantics. E.g. it also has "leading sync" semantics always guaranteed, which is exploited in our code base and would be broken if translated simply as seq_cst. Also, since the fencing from the C++ compiler must be compliant with what our code generation does, they could end up being incompatible due to choice of different fencing conventions. Intrinsic provided operations may or may not have leading sync semantics. We can hope for it, but we should never rely on it. Thanks, /Erik On 2017-09-04 11:34, Robbin Ehn wrote: > Hi, > > On 09/02/2017 10:31 AM, Andrew Haley wrote: >> On 01/09/17 15:15, Erik Österlund wrote: >>> It is not the simplest solution I can think of. The simplest solution I >>> can think of is to remove all specialized versions of Atomic::inc/dec >>> and just have it call Atomic::add directly. That would remove the >>> optimizations we have today, for whatever reason we have them. It would >>> lead to slightly more conservative fencing on PPC/S390, >> >> I see.
Can you say what instructions would be different? >> >>> and would lead to slightly less optimal machine encoding on x86 >>> (without immediate values in the instructions). But it would be >>> simpler for sure. I did not put any judgement into whether our >>> existing optimizations are worthwhile or not. But if you want to >>> prioritize simplicity, removing those optimizations is one possible >>> solution. Would you prefer that? >> >> Is this really about optimization? If we cared about getting this >> stuff as optimized as possible we'd use intrinsics on GCC/x86 targets. >> These have been supported for a long time. But it seems we're >> determined to preserve the legacy assembly-language implementations >> and use them everywhere, even where they are not necessary. >> > > Why not use gcc/clang intrinsic on for all platforms we use gcc/clang? > (not just gcc/x86) > For "__atomic_fetch_add (&value, inc, __ATOMIC_RELAXED);" > gcc seem to generate "lock addl" on x86 and armv8 ldxr,stxr, with > acq_rel ldaxr,stlxr, which is what I would expect. > > And thus we can remove a lot of code! > > (if we should have the relaxed version in API is another question) > > /Robbin From aph at redhat.com Mon Sep 4 10:05:53 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 4 Sep 2017 11:05:53 +0100 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59AD21D6.8040305@oracle.com> References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <6f623116-8213-1523-ba37-fb4ce17c4afa@oracle.com> <59AD21D6.8040305@oracle.com> Message-ID: On 04/09/17 10:50, Erik Österlund wrote: > As for PPC and ARMv7 though, that is not true any longer. For > example, our conservative memory model is more conservative than > seq_cst semantics. E.g. it also has "leading sync" semantics always > guaranteed, which is exploited in our code base and would be broken > if translated simply as seq_cst.
Also, since the fencing from the > C++ compiler must be compliant with what our code generation does, > they could end up being incompatible due to choice of different > fencing conventions. Intrinsic provided operations may or may not > have leading sync semantics. We can hope for it, but we should never > rely on it. We can use intrinsics to get any fencing we want. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From robbin.ehn at oracle.com Mon Sep 4 10:18:04 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Mon, 4 Sep 2017 12:18:04 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <6f623116-8213-1523-ba37-fb4ce17c4afa@oracle.com> <59AD21D6.8040305@oracle.com> Message-ID: <4cdc378a-960f-6ffd-96cf-23e932da0dda@oracle.com> On 09/04/2017 12:05 PM, Andrew Haley wrote: > On 04/09/17 10:50, Erik Österlund wrote: > >> As for PPC and ARMv7 though, that is not true any longer. For >> example, our conservative memory model is more conservative than >> seq_cst semantics. E.g. it also has "leading sync" semantics always >> guaranteed, which is exploited in our code base and would be broken >> if translated simply as seq_cst. Also, since the fencing from the >> C++ compiler must be compliant with what our code generation does, >> they could end up being incompatible due to choice of different >> fencing conventions. Intrinsic provided operations may or may not >> have leading sync semantics. We can hope for it, but we should never >> rely on it. > > We can use intrinsics to get any fencing we want. > +1, was just writing the same thing.
/Robbin From erik.osterlund at oracle.com Mon Sep 4 10:26:44 2017 From: erik.osterlund at oracle.com (Erik Österlund) Date: Mon, 4 Sep 2017 12:26:44 +0200 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <6f623116-8213-1523-ba37-fb4ce17c4afa@oracle.com> <59AD21D6.8040305@oracle.com> Message-ID: <59AD2A64.3070507@oracle.com> Hi Andrew, On 2017-09-04 12:05, Andrew Haley wrote: > On 04/09/17 10:50, Erik Österlund wrote: > >> As for PPC and ARMv7 though, that is not true any longer. For >> example, our conservative memory model is more conservative than >> seq_cst semantics. E.g. it also has "leading sync" semantics always >> guaranteed, which is exploited in our code base and would be broken >> if translated simply as seq_cst. Also, since the fencing from the >> C++ compiler must be compliant with what our code generation does, >> they could end up being incompatible due to choice of different >> fencing conventions. Intrinsic provided operations may or may not >> have leading sync semantics. We can hope for it, but we should never >> rely on it. > We can use intrinsics to get any fencing we want. 1) I want evidence for this claim. Can you get leading and trailing dmb sy (rather than dmb ish) for atomic operations on ARMv7? 2) Even if you could and the compiler happens to generate that - we can not rely on it because there is no contract with the compiler about which fence instructions it elects to use. The only contract the compiler needs to abide by is how atomic C++ operations interact with other C++ operations. And we do not want the underlying fencing to silently change when performing compiler upgrades.
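To make the disagreement concrete: at the C++ level, a leading and a trailing full fence around an atomic add can be requested with the GCC/Clang builtins as sketched below. This is only an illustration under that assumption; the helper name is hypothetical, it is not what HotSpot does, and it does not settle point 2, since which machine fence the compiler emits for each builtin (for example dmb ish versus dmb sy) is exactly the part the sketch cannot pin down.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a relaxed atomic add bracketed by explicit
// sequentially consistent fences, approximating "fence; op; fence".
// Returns the updated value (add_and_fetch semantics).
inline int32_t add_conservative(int32_t addend, volatile int32_t* dest) {
  assert(dest != nullptr);
  __atomic_thread_fence(__ATOMIC_SEQ_CST);  // leading full fence
  int32_t result = __atomic_add_fetch(dest, addend, __ATOMIC_RELAXED);
  __atomic_thread_fence(__ATOMIC_SEQ_CST);  // trailing full fence
  return result;
}
```

Inspecting the generated assembly per target and compiler version would be needed to see whether the emitted fences match the convention used by the JIT-generated code.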
Thanks, /Erik From magnus.ihse.bursie at oracle.com Mon Sep 4 10:30:20 2017 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Mon, 4 Sep 2017 12:30:20 +0200 Subject: [RFR]: 8186578: Zero fails to build on linux-sparc due to sparc-specific code In-Reply-To: References: <088045c0-efcc-ab7a-a088-e80579a3c12b@physik.fu-berlin.de> <14f28ddc-6929-dd0d-a77a-1a463c47d40b@oracle.com> <79503f9c-bc57-725e-b8f1-40cb522b9218@physik.fu-berlin.de> <90649da7-48e5-22b1-3118-5861c6bb0e24@physik.fu-berlin.de> <3cb65ceb-c575-e446-bb66-a50c4b02684a@physik.fu-berlin.de> Message-ID: <12eb6779-8b25-89b5-b3c0-ea30828979fd@oracle.com> On 2017-08-24 18:19, Thomas Stüfe wrote: > On Thu, Aug 24, 2017 at 3:51 PM, John Paul Adrian Glaubitz < > glaubitz at physik.fu-berlin.de> wrote: > >> On 08/24/2017 03:22 PM, John Paul Adrian Glaubitz wrote: >> >>> Do the gtests (especially test_memset_with_concurrent_readers.cpp) run >>>> through with your patch? >>>> >>> I will run the testsuite in a second and report back. >>> >> Ok. I have to admit I don't understand how to run the testsuite out of the >> build tree. It mentions jtreg which I have installed: >> >> glaubitz at deb4g:~$ jtreg -version >> jtreg, version 4.2 src b07 >> Installed in /usr/share/java/jtreg.jar >> Running on platform version 1.8.0_144 from /usr/lib/jvm/java-8-openjdk-sp >> arc64/jre. >> Built with 1.8.0_131 on Tue, 20 Jun 2017 10:54:14 +0200. >> Copyright (c) 1999, 2016, Oracle and/or its affiliates. All rights >> reserved. >> Use is subject to license terms. >> glaubitz at deb4g:~$ >> >> But the configure script complains about jtreg missing: >> >> checking if jtreg failure handler should be built... configure: error: >> Cannot enable jtreg failure handler without jtreg. >> configure exiting with result code 1 >> glaubitz at deb4g:~/openjdk/hs$ >> >> I also don't fully understand how the testsuite is run as mentioned in >> [1].
It >> talks about jtreg and then about jtreg harness which doesn't have clear >> build >> instructions [2]. >> >> Adrian >> > Sorry, I should have been more specific. The gtests have nothing to do with > the jtreg suite, they are a set of native tests using google test. > > Just execute (from your build directory): > ./hotspot/variant-server/libjvm/gtest/gtestLauncher -jdk:./images/jdk > > There is also a way to execute them from the make, but I do not know how. For the record: "make run-test-gtest" or "make run-test TEST=gtest" The latter form also allows for a test selection, like this: "make run-test TEST=gtest:LogDecorations". See common/doc/testing.md for more information. /Magnus > > Best Regards, Thomas > > >> [1] http://download.java.net/openjdk/testresults/8/docs/howtoruntests.html >>> [2] http://openjdk.java.net/jtreg/build.html >>> >> -- >> .''`. John Paul Adrian Glaubitz >> : :' : Debian Developer - glaubitz at debian.org >> `. `' Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de >> `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 >> From aph at redhat.com Mon Sep 4 10:41:38 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 4 Sep 2017 11:41:38 +0100 Subject: RFR: 8186838: Generalize Atomic::inc/dec with templates In-Reply-To: <59AD2A64.3070507@oracle.com> References: <59A804F8.9000501@oracle.com> <59A96B9D.6070002@oracle.com> <6f623116-8213-1523-ba37-fb4ce17c4afa@oracle.com> <59AD21D6.8040305@oracle.com> <59AD2A64.3070507@oracle.com> Message-ID: On 04/09/17 11:26, Erik Österlund wrote: > 1) I want evidence for this claim. Can you get leading and trailing dmb > sy (rather than dmb ish) for atomic operations on ARMv7? I hope not. There is no reason for us to want such a thing in HotSpot. But even if we did want such a thing, we could drop down to asm: the point is the usual cases, not weird corner cases.
> 2) Even if you could and the compiler happens to generate that - we can > not rely on it because there is no contract to the compiler what fence > instructions it elects to use. The only contract the compiler needs to > abide to is how atomic C++ operations interact with other C++ > operations. And we do not want the underlying fencing to silently change > when performing compiler upgrades. There is no way that GCC writers would break ABI compatibility in such a fundamental way. There would be a firestorm. I know this because even if no-one else started the fire, I would. I am a GCC author. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From glaubitz at physik.fu-berlin.de Mon Sep 4 11:18:30 2017 From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz) Date: Mon, 4 Sep 2017 13:18:30 +0200 Subject: How to suppress verbosity when settting _JAVA_OPTIONS? Message-ID: <4f21fbf9-9802-6f39-e90c-fda58c09bf72@physik.fu-berlin.de> Hello! I'm currently testing Zero builds on Linux Alpha, in my particular case on QEMU in a Debian unstable alpha chroot, using OpenJDK 8 for bootstrapping. For some reason, OpenJDK 8 from Debian's openjdk8 assumes a heap size which is too small and refuses to start: (sid-alpha-sbuild)root at nofan:/# java -version Error occurred during initialization of VM Too small initial heap (sid-alpha-sbuild)root at nofan:/# This can be fixed by overriding the heap settings with _JAVA_OPTIONS: (sid-alpha-sbuild)root at nofan:/# export _JAVA_OPTIONS="-Xmx1024m -Xms256m" (sid-alpha-sbuild)root at nofan:/# java -version Picked up _JAVA_OPTIONS: -Xmx1024m -Xms256m openjdk version "1.8.0_141" OpenJDK Runtime Environment (build 1.8.0_141-8u141-b15-3-b15) OpenJDK 64-Bit Zero VM (build 25.141-b15, interpreted mode) (sid-alpha-sbuild)root at nofan:/# As you can see, this has the side effect that the JVM becomes very chatty about the fact that _JAVA_OPTIONS were set. 
While this doesn't seem to be a problem at first sight, it becomes a problem when trying to run configure for JDK10 which will fail because of the unexpected output when trying to determine the version of the boot JDK: configure: Found potential Boot JDK using configure arguments configure: Potential Boot JDK found at /usr/lib/jvm/java-8-openjdk-alpha/ is incorrect JDK version (Picked up _JAVA_OPTIONS: -Xmx1024m -Xms256m); ignoring configure: (Your Boot JDK must be version 8 or 9) configure: error: The path given by --with-boot-jdk does not contain a valid Boot JDK configure exiting with result code 1 Is there any way to silence the JVM regarding "_JAVA_OPTIONS"? If no, we should probably patch the JVM to do that by default. Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaubitz at debian.org `. `' Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 From aph at redhat.com Mon Sep 4 11:36:55 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 4 Sep 2017 12:36:55 +0100 Subject: How to suppress verbosity when settting _JAVA_OPTIONS? In-Reply-To: <4f21fbf9-9802-6f39-e90c-fda58c09bf72@physik.fu-berlin.de> References: <4f21fbf9-9802-6f39-e90c-fda58c09bf72@physik.fu-berlin.de> Message-ID: On 04/09/17 12:18, John Paul Adrian Glaubitz wrote: > Hello! > > I'm currently testing Zero builds on Linux Alpha, in my particular case on > QEMU in a Debian unstable alpha chroot, using OpenJDK 8 for bootstrapping. > > For some reason, OpenJDK 8 from Debian's openjdk8 assumes a heap size which > is too small and refuses to start: > > (sid-alpha-sbuild)root at nofan:/# java -version > Error occurred during initialization of VM > Too small initial heap > (sid-alpha-sbuild)root at nofan:/# > > This can be fixed by overriding the heap settings with _JAVA_OPTIONS: We should probably just fix the bug. I recently did something very similar for another target, but I can't find it. 
:-) -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From Alan.Bateman at oracle.com Mon Sep 4 11:37:15 2017 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Mon, 4 Sep 2017 12:37:15 +0100 Subject: How to suppress verbosity when settting _JAVA_OPTIONS? In-Reply-To: <4f21fbf9-9802-6f39-e90c-fda58c09bf72@physik.fu-berlin.de> References: <4f21fbf9-9802-6f39-e90c-fda58c09bf72@physik.fu-berlin.de> Message-ID: On 04/09/2017 12:18, John Paul Adrian Glaubitz wrote: > : > > Is there any way to silence the JVM regarding "_JAVA_OPTIONS"? If no, > we should probably patch the JVM to do that by default. The undocumented/unsupported _JAVA_OPTIONS option is highly problematic. One of its flaws is that it appends rather than prepends so it potentially overrides options that you specify on the command lines. So the output message is deliberate, it would be too confusing to have VM options magically overridden. For the issue you are running into then I assume the probe in the build can be updated to ignore the message. -Alan From glaubitz at physik.fu-berlin.de Mon Sep 4 11:53:14 2017 From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz) Date: Mon, 4 Sep 2017 13:53:14 +0200 Subject: How to suppress verbosity when settting _JAVA_OPTIONS? In-Reply-To: References: <4f21fbf9-9802-6f39-e90c-fda58c09bf72@physik.fu-berlin.de> Message-ID: <7b3b22cf-0419-9f43-b74f-1ba628a9f500@physik.fu-berlin.de> On 09/04/2017 01:36 PM, Andrew Haley wrote: >> This can be fixed by overriding the heap settings with _JAVA_OPTIONS: > > We should probably just fix the bug. I recently did something very similar > for another target, but I can't find it. :-) Oh, I agree. I just wasn't sure where the default heap settings come from. Can you point me to the place in the sources? Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaubitz at debian.org `. 
`' Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 From magnus.ihse.bursie at oracle.com Mon Sep 4 12:15:41 2017 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Mon, 4 Sep 2017 14:15:41 +0200 Subject: How to suppress verbosity when settting _JAVA_OPTIONS? In-Reply-To: <4f21fbf9-9802-6f39-e90c-fda58c09bf72@physik.fu-berlin.de> References: <4f21fbf9-9802-6f39-e90c-fda58c09bf72@physik.fu-berlin.de> Message-ID: On 2017-09-04 13:18, John Paul Adrian Glaubitz wrote: > Hello! > > I'm currently testing Zero builds on Linux Alpha, in my particular > case on > QEMU in a Debian unstable alpha chroot, using OpenJDK 8 for > bootstrapping. > > For some reason, OpenJDK 8 from Debian's openjdk8 assumes a heap size > which > is too small and refuses to start: > > (sid-alpha-sbuild)root at nofan:/# java -version > Error occurred during initialization of VM > Too small initial heap > (sid-alpha-sbuild)root at nofan:/# > > This can be fixed by overriding the heap settings with _JAVA_OPTIONS: > > (sid-alpha-sbuild)root at nofan:/# export _JAVA_OPTIONS="-Xmx1024m -Xms256m" > (sid-alpha-sbuild)root at nofan:/# java -version > Picked up _JAVA_OPTIONS: -Xmx1024m -Xms256m > openjdk version "1.8.0_141" > OpenJDK Runtime Environment (build 1.8.0_141-8u141-b15-3-b15) > OpenJDK 64-Bit Zero VM (build 25.141-b15, interpreted mode) > (sid-alpha-sbuild)root at nofan:/# > > As you can see, this has the side effect that the JVM becomes very > chatty about the fact that _JAVA_OPTIONS were set. 
> > While this doesn't seem to be a problem at first sight, it becomes > a problem when trying to run configure for JDK10 which will fail > because of the unexpected output when trying to determine the version > of the boot JDK: > > configure: Found potential Boot JDK using configure arguments > configure: Potential Boot JDK found at > /usr/lib/jvm/java-8-openjdk-alpha/ is incorrect JDK version (Picked up > _JAVA_OPTIONS: -Xmx1024m -Xms256m); ignoring > configure: (Your Boot JDK must be version 8 or 9) > configure: error: The path given by --with-boot-jdk does not contain a > valid Boot JDK > configure exiting with result code 1 > > Is there any way to silence the JVM regarding "_JAVA_OPTIONS"? If no, > we should probably patch the JVM to do that by default. Ouch! Lots of small, idiotic issues. For the build identification part: are both the _JAVA_OPTIONS and the version outputted to stdout? Or can you separate them by separating stdout/stderr? Otherwise, this patch would solve the issue in your case. I'm not sure how it would affect all other java instances we try to detect, so I'm a bit reluctant to take it in. diff --git a/common/autoconf/boot-jdk.m4 b/common/autoconf/boot-jdk.m4 --- a/common/autoconf/boot-jdk.m4 +++ b/common/autoconf/boot-jdk.m4 @@ -74,7 +74,7 @@ BOOT_JDK_FOUND=no else # Oh, this is looking good! We probably have found a proper JDK. Is it the correct version? - BOOT_JDK_VERSION=`"$BOOT_JDK/bin/java" -version 2>&1 | $HEAD -n 1` + BOOT_JDK_VERSION=`"$BOOT_JDK/bin/java" -version 2>&1 | $GREP version | $HEAD -n 1` # Extra M4 quote needed to protect [] in grep expression. [FOUND_CORRECT_VERSION=`$ECHO $BOOT_JDK_VERSION | $EGREP '\"9([\.+-].*)?\"|(1\.[89]\.)'`] But the main problem here seems to be the Debian openjdk8 instance that crashes on "java -version". Seems like a good and simple test to add to your test matrix. 
;-) /Magnus > > Adrian > From thomas.stuefe at gmail.com Mon Sep 4 12:36:29 2017 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Mon, 4 Sep 2017 14:36:29 +0200 Subject: [RFR]: 8186578: Zero fails to build on linux-sparc due to sparc-specific code In-Reply-To: <12eb6779-8b25-89b5-b3c0-ea30828979fd@oracle.com> References: <088045c0-efcc-ab7a-a088-e80579a3c12b@physik.fu-berlin.de> <14f28ddc-6929-dd0d-a77a-1a463c47d40b@oracle.com> <79503f9c-bc57-725e-b8f1-40cb522b9218@physik.fu-berlin.de> <90649da7-48e5-22b1-3118-5861c6bb0e24@physik.fu-berlin.de> <3cb65ceb-c575-e446-bb66-a50c4b02684a@physik.fu-berlin.de> <12eb6779-8b25-89b5-b3c0-ea30828979fd@oracle.com> Message-ID: On Mon, Sep 4, 2017 at 12:30 PM, Magnus Ihse Bursie < magnus.ihse.bursie at oracle.com> wrote: > > On 2017-08-24 18:19, Thomas Stüfe wrote: > >> On Thu, Aug 24, 2017 at 3:51 PM, John Paul Adrian Glaubitz < >> glaubitz at physik.fu-berlin.de> wrote: >> >> On 08/24/2017 03:22 PM, John Paul Adrian Glaubitz wrote: >>> >>> Do the gtests (especially test_memset_with_concurrent_readers.cpp) run >>>> >>>>> through with your patch? >>>>> >>>>> I will run the testsuite in a second and report back. >>>> >>>> Ok. I have to admit I don't understand how to run the testsuite out of >>> the >>> build tree. It mentions jtreg which I have installed: >>> >>> glaubitz at deb4g:~$ jtreg -version >>> jtreg, version 4.2 src b07 >>> Installed in /usr/share/java/jtreg.jar >>> Running on platform version 1.8.0_144 from /usr/lib/jvm/java-8-openjdk-sp >>> arc64/jre. >>> Built with 1.8.0_131 on Tue, 20 Jun 2017 10:54:14 +0200. >>> Copyright (c) 1999, 2016, Oracle and/or its affiliates. All rights >>> reserved. >>> Use is subject to license terms. >>> glaubitz at deb4g:~$ >>> >>> But the configure script complains about jtreg missing: >>> >>> checking if jtreg failure handler should be built... configure: error: >>> Cannot enable jtreg failure handler without jtreg.
>>> configure exiting with result code 1 >>> glaubitz at deb4g:~/openjdk/hs$ >>> >>> I also don't fully understand how the testsuite is run as mentioned in >>> [1]. It >>> talks about jtreg and then about jtreg harness which doesn't have clear >>> build >>> instructions [2]. >>> >>> Adrian >>> >>> Sorry, I should have been more specific. The gtests have nothing to do >> with >> the jtreg suite, they are a set of native tests using google test. >> >> Just execute (from your build directory): >> ./hotspot/variant-server/libjvm/gtest/gtestLauncher -jdk:./images/jdk >> >> There is also a way to execute them from the make, but I do not know how. >> > For the record: > > "make run-test-gtest" > or > "make run-test TEST=gtest" > > The latter form also allows for a test selection, like this: "make > run-test TEST=gtest:LogDecorations". > > See common/doc/testing.md for more information. > > /Magnus > > Thank you Magnus! I usually prefer running the test directly, because I might have to fire up the debugger and debug them, and this is difficult if the test is a sub process of make. But yes, this is easier if one expects no errors. ..Thomas > > >> Best Regards, Thomas >> >> >> [1] http://download.java.net/openjdk/testresults/8/docs/howtorun >>> tests.html >>> >>>> [2] http://openjdk.java.net/jtreg/build.html >>>> >>>> -- >>> .''`. John Paul Adrian Glaubitz >>> : :' : Debian Developer - glaubitz at debian.org >>> `. `' Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de >>> `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 >>> >>> > From aph at redhat.com Mon Sep 4 12:36:44 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 4 Sep 2017 13:36:44 +0100 Subject: How to suppress verbosity when settting _JAVA_OPTIONS? In-Reply-To: <7b3b22cf-0419-9f43-b74f-1ba628a9f500@physik.fu-berlin.de> References: <4f21fbf9-9802-6f39-e90c-fda58c09bf72@physik.fu-berlin.de>