From martinrb at google.com Sun Aug 5 15:30:27 2018
From: martinrb at google.com (Martin Buchholz)
Date: Sun, 5 Aug 2018 08:30:27 -0700
Subject: Using C++11+ in hotspot
In-Reply-To: <6D166068-9FA9-4B0B-A157-0CB109753F4C@oracle.com>
References: <6D166068-9FA9-4B0B-A157-0CB109753F4C@oracle.com>
Message-ID:

On Fri, Aug 3, 2018 at 3:14 PM, Mikael Vidstedt wrote:

> Martin Buchholz suggested the topic and Mikael signed up to lead the
> session. Martin gave an introduction. He had observed some issues recently
> ("impossible null pointer exceptions") which after investigation turned out
> to be caused by a toolchain upgrade and in turn revealed the fact that some
> code in hotspot requires atomicity but does not make this requirement very
> explicit and in the end assumes that the C++ compiler will produce suitable
> code to guarantee atomicity. Martin also observed that (on linux) hotspot
> is compiled targeting the c++98 standard, which is old enough to not even
> mention the concept of threads. Mikael also added that for extra fun the
> story is different on different platforms and toolchains.

I was surprised to hear that IBM AIX xlc compilers might not support C++11 -
it's not the IBM way. But from reading the tea leaves at
https://www-01.ibm.com/support/docview.wss?uid=swg27007322&aid=1
I concluded that IBM is a Linux company now! Even for IBM, AIX is a niche
legacy platform and they just couldn't keep up with the evolution of C++
(who can blame them?). Meanwhile gcc is available for AIX and can/should be
used to build openjdk. Has anyone tried?

Here's one example of code that actually did go wrong with Google's latest
internal toolchain, because the copy was not in fact word-atomic. Thanks to
whoever added the comment long ago.

  static inline void copy_table(address* from, address* to, int size) {
    // Copy non-overlapping tables. The copy has to occur word wise for MT safety.
    while (size-- > 0) *to++ = *from++;
  }

Recommendation: target C++11 for jdk12; use gcc to build openjdk on AIX.

From john.r.rose at oracle.com Mon Aug 6 18:12:55 2018
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 6 Aug 2018 11:12:55 -0700
Subject: Using C++11+ in hotspot
In-Reply-To:
References: <6D166068-9FA9-4B0B-A157-0CB109753F4C@oracle.com>
Message-ID: <3959BC8E-F755-48B4-BE05-9BA13BC3E575@oracle.com>

On Aug 5, 2018, at 8:30 AM, Martin Buchholz wrote:
>
> Thanks to whoever added the comment long ago.

FTR I think it was Steffen Grarup. We were just learning about MT safety
at the time. The copy conjoint/disjoint APIs were not yet in existence.
I think they came around 2003, and Paul Hohensee's name is all over the
SCCS history there.

s 00008/00002/00762
d D 1.147 99/02/17 10:14:36 steffen 235 233
...
I 235
  static inline void copy_table(address* from, address* to, int size) {
    // Copy non-overlapping tables. The copy has to occur word wise for MT safety.
    while (size-- > 0) *to++ = *from++;
  }

Today, that loop should be recoded to use copy, and copy in turn needs to
do whatever magic is required to force word-atomic access on non-atomic data.

- John

From david.holmes at oracle.com Tue Aug 7 00:45:57 2018
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 7 Aug 2018 10:45:57 +1000
Subject: Using C++11+ in hotspot
In-Reply-To:
References: <6D166068-9FA9-4B0B-A157-0CB109753F4C@oracle.com>

Message-ID:

On 6/08/2018 6:16 PM, Andrew Haley wrote:
> On 08/05/2018 04:30 PM, Martin Buchholz wrote:
>> Here's one example of code that actually did go wrong with Google's latest
>> internal toolchain, because the copy was not in fact word-atomic. Thanks
>> to whoever added the comment long ago.
>>
>>   static inline void copy_table(address* from, address* to, int size) {
>>     // Copy non-overlapping tables. The copy has to occur word wise for MT safety.
>>     while (size-- > 0) *to++ = *from++;
>>   }
>>
>> Recommendation: target C++11 for jdk12;
>
> I don't think that helps. There's no legal way AFAICS to force an atomic
> access to non-atomic types in C++11.

I would agree. We implicitly rely on compilers doing the obvious/natural
thing as long as the variables are suitably aligned. We're outside the
language here with regards to "atomic access".

David

From martinrb at google.com Tue Aug 7 04:54:11 2018
From: martinrb at google.com (Martin Buchholz)
Date: Mon, 6 Aug 2018 21:54:11 -0700
Subject: Using C++11+ in hotspot
In-Reply-To:
References: <6D166068-9FA9-4B0B-A157-0CB109753F4C@oracle.com>

Message-ID:

On Mon, Aug 6, 2018 at 1:16 AM, Andrew Haley wrote:

> On 08/05/2018 04:30 PM, Martin Buchholz wrote:
> > Here's one example of code that actually did go wrong with Google's latest
> > internal toolchain, because the copy was not in fact word-atomic. Thanks
> > to whoever added the comment long ago.
> >
> >   static inline void copy_table(address* from, address* to, int size) {
> >     // Copy non-overlapping tables. The copy has to occur word wise for MT safety.
> >     while (size-- > 0) *to++ = *from++;
> >   }
> >
> > Recommendation: target C++11 for jdk12;
>
> I don't think that helps. There's no legal way AFAICS to force an atomic
> access to non-atomic types in C++11.

Ohh... perhaps that's the idea behind atomic_ref
https://en.cppreference.com/w/cpp/atomic/atomic_ref
We only have to wait one more decade for that to become available.

I don't know what is actually being copied here, but can't the underlying
type be atomic ?

From john.r.rose at oracle.com Tue Aug 7 05:48:46 2018
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 6 Aug 2018 22:48:46 -0700
Subject: Using C++11+ in hotspot
In-Reply-To:
References: <6D166068-9FA9-4B0B-A157-0CB109753F4C@oracle.com>

Message-ID:

On Aug 6, 2018, at 9:54 PM, Martin Buchholz wrote:
>
> I don't know what is actually being copied here, but can't the underlying
> type be atomic ?

Yes, if we are allowed to cast some random sequence of metadata
words to atomic[]. If that's the magic incantation to get to the
hardware's atomicity primitives, OK. I suspect a more direct
technique may be needed, such as assembly code stubs.

- John

From martinrb at google.com Tue Aug 7 06:26:48 2018
From: martinrb at google.com (Martin Buchholz)
Date: Mon, 6 Aug 2018 23:26:48 -0700
Subject: Using C++11+ in hotspot
In-Reply-To: <3959BC8E-F755-48B4-BE05-9BA13BC3E575@oracle.com>
References: <6D166068-9FA9-4B0B-A157-0CB109753F4C@oracle.com>
 <3959BC8E-F755-48B4-BE05-9BA13BC3E575@oracle.com>
Message-ID:

On Mon, Aug 6, 2018 at 11:12 AM, John Rose wrote:

> On Aug 5, 2018, at 8:30 AM, Martin Buchholz wrote:
>
> > Thanks to whoever added the comment long ago.
>
> FTR I think it was Steffen Grarup. We were just learning about MT safety
> at the time. The copy conjoint/disjoint APIs were not yet in existence.
> I think they came around 2003, and Paul Hohensee's name is all over the
> SCCS history there.
>
> s 00008/00002/00762
> d D 1.147 99/02/17 10:14:36 steffen 235 233
> ...
> I 235
>   static inline void copy_table(address* from, address* to, int size) {
>     // Copy non-overlapping tables. The copy has to occur word wise for MT safety.
>     while (size-- > 0) *to++ = *from++;
>   }
>
> Today, that loop should be recoded to use copy, and copy in turn needs to
> do whatever magic is required to force word-atomic access on non-atomic
> data.

That loop copies address*, while pd_disjoint_words_atomic copies HeapWord,
so these are not compatible out of the box.

We could have atomic relaxed copies like below. Using compiler builtins also
avoids the problem of the underlying type not being declared atomic<>, and is
ISA-independent.
OTOH maybe we always want that loop compiled to REP MOVSQ on x64.

  template <typename T>
  static ALWAYSINLINE void copy_atomic_relaxed(const T* from, T* to) {
    T val;
    __atomic_load(from, &val, __ATOMIC_RELAXED);
    __atomic_store(to, &val, __ATOMIC_RELAXED);
  }

  static void pd_disjoint_words_atomic(const HeapWord* from, HeapWord* to, size_t count) {
  #ifdef AMD64
    switch (count) {
    case 8:  copy_atomic_relaxed(from + 7, to + 7);
    case 7:  copy_atomic_relaxed(from + 6, to + 6);
    case 6:  copy_atomic_relaxed(from + 5, to + 5);
    case 5:  copy_atomic_relaxed(from + 4, to + 4);
    case 4:  copy_atomic_relaxed(from + 3, to + 3);
    case 3:  copy_atomic_relaxed(from + 2, to + 2);
    case 2:  copy_atomic_relaxed(from + 1, to + 1);
    case 1:  copy_atomic_relaxed(from + 0, to + 0);
    case 0:  break;
    default:
      while (count-- > 0) {
        copy_atomic_relaxed(from++, to++);
      }
      break;
    }

From mikael.vidstedt at oracle.com Tue Aug 7 18:36:40 2018
From: mikael.vidstedt at oracle.com (Mikael Vidstedt)
Date: Tue, 7 Aug 2018 11:36:40 -0700
Subject: Using C++11+ in hotspot
In-Reply-To:
References: <6D166068-9FA9-4B0B-A157-0CB109753F4C@oracle.com>
 <3959BC8E-F755-48B4-BE05-9BA13BC3E575@oracle.com>
Message-ID:

In utilities/copy.[ch]pp there's Copy::conjoint_copy and its close friends,
which do support different element sizes, and which promise to not tear the
words/elements (if the underlying implementation doesn't do the right thing
it needs to be fixed). It doesn't currently allow for configuring/customizing
memory ordering requirements though, and if "extreme" performance is required
there may well be some additional specialization needed as well.

Cheers,
Mikael
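The relaxed atomic copy proposed earlier in the thread can be sketched as a small self-contained unit. This is only an illustration under stated assumptions, not HotSpot code: it assumes GCC or Clang (for the `__atomic_load` / `__atomic_store` builtins), it drops the HotSpot-specific `ALWAYSINLINE` and `HeapWord`, and the loop wrapper `copy_disjoint_atomic` is a hypothetical name introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Copy one element using a relaxed atomic load and a relaxed atomic store.
// Relies on the GCC/Clang __atomic builtins; T does not have to be declared
// std::atomic<T>. The static_asserts are an added assumption of this sketch:
// the element must be trivially copyable and no larger than a machine word,
// so that the builtins can be lock-free on common targets.
template <typename T>
inline void copy_atomic_relaxed(const T* from, T* to) {
  static_assert(std::is_trivially_copyable<T>::value,
                "element type must be trivially copyable");
  static_assert(sizeof(T) <= sizeof(void*),
                "element type must fit in a machine word");
  T val;
  __atomic_load(from, &val, __ATOMIC_RELAXED);
  __atomic_store(to, &val, __ATOMIC_RELAXED);
}

// Hypothetical wrapper (name invented for this sketch): copy `count`
// elements between non-overlapping arrays, one untorn element at a time.
template <typename T>
inline void copy_disjoint_atomic(const T* from, T* to, std::size_t count) {
  assert(count == 0 || (from != nullptr && to != nullptr));
  for (std::size_t i = 0; i < count; ++i) {
    copy_atomic_relaxed(from + i, to + i);
  }
}
```

Each element is loaded and stored as a unit, but the copy as a whole is not atomic, and relaxed ordering gives no ordering guarantees between elements; that matches the "word wise for MT safety" requirement in the copy_table comment and nothing stronger.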
From erik.osterlund at oracle.com Wed Aug 8 06:51:06 2018
From: erik.osterlund at oracle.com (Erik Osterlund)
Date: Wed, 8 Aug 2018 08:51:06 +0200
Subject: Using C++11+ in hotspot
In-Reply-To: <347ce2b1-550b-112c-6a76-3c1657f48a7b@redhat.com>
References: <6D166068-9FA9-4B0B-A157-0CB109753F4C@oracle.com>

 <347ce2b1-550b-112c-6a76-3c1657f48a7b@redhat.com>
Message-ID:

Hi Andrew,

This question is very important and deserves highlighting. I have been
thinking for some time that there ought to be a document describing
"allowed C++ sins in HotSpot". And I would put aliasing rules at the top of
said document.

Today we rely on compilers being tamed not to be tempted to exploit aliasing
rules (e.g. -fno-strict-aliasing). The reliance on this in our code base goes
so deep that we will arguably never be able to stop relying on it. And why
would we want to, more than to find more interesting ways of tormenting
ourselves? This is something we should embrace.

By embracing this and putting it in a document that this is allowed, we would
have the following benefits:

1) All recurring discussions whether we should or should not care about
   aliasing would come to quick ends, and decisions would not have to be
   taken (inconsistently) over and over again on a case by case basis.
2) Porters would know what requirements HotSpot has on compiler taming to
   safely run HotSpot. If they can not tame the compiler to ignore aliasing
   rules, then they can not use HotSpot.
3) By embracing aliasing violations as an allowed C++ sin, time will be saved
   for everyone involved, not having to invent complicated solutions
   circumventing it. Spending time honoring these rules seems like a waste of
   time and resources unless we fully commit to removing all such behaviour
   and hence can flip the compiler switches and hence remove our reliance on
   this. And we will never be able to do that. Nor should we if we could.
4) It seems like the problem being discussed in this thread before I hijacked
   it would have simple solutions.

So basically, my answer to your question is: no we do not and should not
care. And that message ought to be documented somewhere to remove all
uncertainty and inconsistency around that recurring question.

Thanks,
/Erik

> On 7 Aug 2018, at 16:00, Andrew Haley wrote:
>
>> On 08/07/2018 06:48 AM, John Rose wrote:
>>> On Aug 6, 2018, at 9:54 PM, Martin Buchholz wrote:
>>>
>>> I don't know what is actually being copied here, but can't the underlying
>>> type be atomic ?
>>
>> Yes, if we are allowed to cast some random sequence of metadata
>> words to atomic[].
>
> We're not. Well, we sort-of are because we use -fno-strict-aliasing, but that's
> not standard C++11. Do we care? :-)
>
> GCC builtins do what we need when we're using GCC, but then we don't need C++11.
>
> --
> Andrew Haley
> Java Platform Lead Engineer
> Red Hat UK Ltd.
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From martinrb at google.com Wed Aug 8 19:51:39 2018
From: martinrb at google.com (Martin Buchholz)
Date: Wed, 8 Aug 2018 12:51:39 -0700
Subject: Using C++11+ in hotspot
In-Reply-To:
References: <6D166068-9FA9-4B0B-A157-0CB109753F4C@oracle.com>

 <347ce2b1-550b-112c-6a76-3c1657f48a7b@redhat.com>
Message-ID:

I agree getting to a hotspot without C++ undefined behavior is very hard.

Organizations like Google like to have control over all their software,
including the toolchain, and want both high performance and standards
compliance, enforced via tools like ubsan
https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html.
Full employment for engineers like me who try to bridge the 2 worlds.

I'd like to see hotspot stop using the -fno-strict-aliasing crutch,
replacing it with union and/or may_alias, but this is a serious investment.

Here are my type-punning notes:

#----------------------------------------------------------------
# -fstrict-aliasing, union, memcpy, C99, C++.
#----------------------------------------------------------------
https://blog.regehr.org/archives/959
http://dbp-consulting.com/tutorials/StrictAliasing.html

C99 allows type punning via members of a union, and all known C++
compilers allow it, but strictly speaking it is not permitted by the C++
standard.

  85. If the member used to access the contents of a union object is not
  the same as the member last used to store a value in the object, the
  appropriate part of the object representation of the value is
  reinterpreted as an object representation in the new type as described
  in 6.2.6 (a process sometimes called type punning). This might be a trap
  representation.

Doing type punning via memcpy really goes against low-level programmer
instinct - we really have to trust the compiler to optimize away the
memcpy library call!

gcc (what about clang?) has the may_alias attribute
https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html
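The two punning techniques named in the notes above (memcpy and may_alias) can be sketched side by side. This is an illustrative sketch, not code from HotSpot: the helper names `float_bits`, `float_from_bits`, `float_bits_may_alias` and the typedef `aliasing_u32` are hypothetical, the may_alias variant assumes GCC or Clang, and the bit patterns mentioned afterwards assume `float` is IEEE 754 binary32.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Standard-conforming punning: copy the object representation with memcpy.
// Optimizing compilers typically turn this fixed-size memcpy into a plain
// register move rather than a library call.
inline std::uint32_t float_bits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

// The reverse direction, same technique.
inline float float_from_bits(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// GCC/Clang extension: a type marked may_alias is treated like a character
// type for aliasing purposes, so reading a float through it is allowed by
// these compilers even with -fstrict-aliasing.
typedef std::uint32_t __attribute__((__may_alias__)) aliasing_u32;

inline std::uint32_t float_bits_may_alias(float f) {
  return *reinterpret_cast<aliasing_u32*>(&f);
}
```

Under the IEEE 754 assumption, `float_bits(1.0f)` yields `0x3F800000`, and both variants return the same bits for the same input. The memcpy form is portable C++; the may_alias form is closer to existing cast-heavy code but ties the source to compilers that implement the attribute.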
From martinrb at google.com Wed Aug 8 20:05:01 2018
From: martinrb at google.com (Martin Buchholz)
Date: Wed, 8 Aug 2018 13:05:01 -0700
Subject: Using C++11+ in hotspot
In-Reply-To:
References: <6D166068-9FA9-4B0B-A157-0CB109753F4C@oracle.com>

<347ce2b1-550b-112c-6a76-3c1657f48a7b@redhat.com>

Message-ID:

On Wed, Aug 8, 2018 at 7:02 AM, Andrew Haley wrote:

> On 08/08/2018 07:51 AM, Erik Osterlund wrote:
>
> > So basically, my answer to your question is: no we do not and should
> > not care. And that message ought to be documented somewhere to
> > remove all uncertainty and inconsistency around that recurring
> > question.
>
> That sounds sensible. I guess that if we use -fno-strict-aliasing
> then we can cast *T to *atomic. I can ask on gcc@ to be sure.

A difficulty might arise if the representation of atomic is different from
T, as might happen if the arch has no atomic instructions for a type of
that size and so a lock must be allocated somewhere. I don't know how
gcc's atomic builtins deal with that problem.

From john.r.rose at oracle.com Wed Aug 8 21:16:07 2018
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 8 Aug 2018 14:16:07 -0700
Subject: Using C++11+ in hotspot
In-Reply-To:
References: <6D166068-9FA9-4B0B-A157-0CB109753F4C@oracle.com>

<347ce2b1-550b-112c-6a76-3c1657f48a7b@redhat.com>

Message-ID:

On Aug 8, 2018, at 1:05 PM, Martin Buchholz wrote:
>
>> That sounds sensible. I guess that if we use -fno-strict-aliasing
>> then we can cast *T to *atomic. I can ask on gcc@ to be sure.
>
> A difficulty might arise if the representation of atomic is different from
> T, as might happen if the arch has no atomic instructions for a type of
> that size and so a lock must be allocated somewhere. I don't know how
> gcc's atomic builtins deal with that problem.

A CPU/memory architecture which requires STM for atomic storage of
machine words would be a prime example of such a platform. It's also a
prime example of a platform which HotSpot would not be portable to
without deep refactoring along the lines of the access API but for all
data. I think the two properties would be correlated, in practice.

HotSpot makes pervasive assumptions that machine word data is routinely
atomic (non-tearable, with valid race-winners in all cases), and it will
be hard to break it of those assumptions in all cases. We've started to
do that with things like the atomic, copy, and access APIs, but there's
lots more to do, if we need to go down that road.

- John

From martinrb at google.com Thu Aug 9 02:24:15 2018
From: martinrb at google.com (Martin Buchholz)
Date: Wed, 8 Aug 2018 19:24:15 -0700
Subject: Using C++11+ in hotspot
In-Reply-To: <3959BC8E-F755-48B4-BE05-9BA13BC3E575@oracle.com>
References: <6D166068-9FA9-4B0B-A157-0CB109753F4C@oracle.com>
 <3959BC8E-F755-48B4-BE05-9BA13BC3E575@oracle.com>
Message-ID:

On Mon, Aug 6, 2018 at 11:12 AM, John Rose wrote:

> On Aug 5, 2018, at 8:30 AM, Martin Buchholz wrote:
>
>   static inline void copy_table(address* from, address* to, int size) {
>     // Copy non-overlapping tables. The copy has to occur word wise for MT safety.
>     while (size-- > 0) *to++ = *from++;
>   }
>
> Today, that loop should be recoded to use copy, and copy in turn needs to
> do whatever magic is required to force word-atomic access on non-atomic
> data.

I now see the many variants of copy in share/utilities/copy.hpp but there is
none that makes copies of the type "address". Maybe you could make an atomic
copy template that takes any type T with sizeof(T) <= 8 ?

---

At the type system level, HeapWord is a struct, so C++ will not be happy with
our other traditional trick of reading through a pointer to a volatile
HeapWord to "force" atomicity.

From martinrb at google.com Sat Aug 11 00:41:27 2018
From: martinrb at google.com (Martin Buchholz)
Date: Fri, 10 Aug 2018 17:41:27 -0700
Subject: Using C++11+ in hotspot
In-Reply-To:
References: <6D166068-9FA9-4B0B-A157-0CB109753F4C@oracle.com>

<347ce2b1-550b-112c-6a76-3c1657f48a7b@redhat.com>

Message-ID:

OK, it looks like -fno-strict-aliasing is here to stay. Casting freely
between pointers to different types is pervasive in the source code, and
there's insufficient discipline in the culture to try to fix it. And I'm
not volunteering.

  static void pd_arrayof_conjoint_jlongs(const HeapWord* from, HeapWord* to, size_t count) {
  #ifdef AMD64
    _Copy_arrayof_conjoint_jlongs(from, to, count);
  #else
    pd_conjoint_jlongs_atomic((const jlong*)from, (jlong*)to, count);
  #endif // AMD64
  }

  static void pd_arrayof_conjoint_oops(const HeapWord* from, HeapWord* to, size_t count) {
  #ifdef AMD64
    assert(BytesPerLong == BytesPerOop, "jlongs and oops must be the same size");
    _Copy_arrayof_conjoint_jlongs(from, to, count);
  #else
    pd_conjoint_oops_atomic((const oop*)from, (oop*)to, count);
  #endif // AMD64
  }

On Fri, Aug 10, 2018 at 5:59 AM, Andrew Haley wrote:

> On 08/08/2018 08:51 PM, Martin Buchholz wrote:
> > I'd like to see hotspot stop using the -fno-strict-aliasing crutch,
> > replacing it with union and/or may_alias, but this is a serious
> > investment.
>
> It's a serious investment, and we'd risk breaking stuff. At least the
> entire heap would have to be an array with elements which are a union of
> all of the possible types, and there'd be a byte array overlaid on top of
> that. It would not be pretty.

From gromero at linux.vnet.ibm.com Mon Aug 27 15:55:50 2018
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 27 Aug 2018 12:55:50 -0300
Subject: JEP 163: Enable NUMA Mode by Default When Appropriate
In-Reply-To:
References:
Message-ID:

Hi Swati,

On 08/27/2018 09:10 AM, Swati Sharma wrote:
> I was going through this JEP 163: Enable NUMA Mode by Default When
> Appropriate (http://openjdk.java.net/jeps/163).
>
> Can we make UseNUMA flag by default ON after detecting that if process
> runs on multiple NUMA nodes. So after that user don't need to specify the
> UseNUMA flag and JVM will auto detect and switch ON the flag.

The last time I tested SPECjvm with +UseNUMA I recall a slightly better
performance on PPC64 and no regressions.

However, I recall that the IBM performance team reported at least one
regression when using +UseNUMA for an Apache Hadoop YARN workload. They also
reported that they were able to get better results when manually tuning the
JVM using a numactl approach, for instance. On that occasion they recorded
~24% of NUMA misses when using +UseNUMA whilst only ~4% when using their
numactl approach. That was on OpenJDK 8, but I understand based on changes
from 8 to 12 that it didn't change much.

So I'm not convinced that the current +UseNUMA will help in the majority of
cases and won't badly hurt some workload that is important for us.

That JEP was created a long time ago (2012) and was updated in 2016, so
_maybe_ the community lost interest in it as the OS NUMA balancers evolved
over the years, helping most workloads / scenarios.

So, to sum up, at least from a PPC64 perspective, I vouch for not enabling
-XX:+UseNUMA by default at the moment without further investigation.

I also see the JEP as "draft" so I CC:ed Jesper in case he has any insights
about the history and the current / future state of that JEP.

Anyway, others might want to comment on that, so let's wait for further
comments.

> Note : If we can then I can provide the implementation for this JEP.

BTW, https://bugs.openjdk.java.net/browse/JDK-7179517 mentions a patch
contributed by Eric, probably the one discussed in this thread:
http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-July/004691.html
but it looks like that change never got pushed to 8.

Best regards,
Gustavo

From martin.doerr at sap.com Tue Aug 28 17:34:44 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 28 Aug 2018 17:34:44 +0000
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To:
References:

Message-ID: <346da54af45243c4bdaf475f118a450d@sap.com>

Hi Michihiro,

thank you for implementing it. I have just taken a first look at your
webrev.01.

It looks basically good. Only the Power version check seems to be incorrect.
VM_Version::has_popcntb() checks for Power5.
I believe most instructions are available with Power7.
Some (vsubudm, ..., vmuluwm, vpopcntw) were introduced with Power8?
We should check this carefully.

Also, the indentation in register_ppc.hpp could be improved.

Thanks and best regards,
Martin


-----Original Message-----
From: Gustavo Romero
Sent: Thursday, 26 July 2018 16:02
To: Michihiro Horie
Cc: Lindenmaier, Goetz; hotspot-dev at openjdk.java.net; Doerr, Martin; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker
Subject: Re: RFR: 8208171: PPC64: Enrich SLP support

Hi Michi,

On 07/26/2018 01:43 AM, Michihiro Horie wrote:
> I updated webrev:
> http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/

Thanks for providing an updated webrev and for fixing indentation and function
order in assembler_ppc.inline.hpp as well. I have no further comments :)

Best Regards,
Gustavo

> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo
>
> From: Gustavo Romero
> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
> Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin"
> Date: 2018/07/25 23:05
> Subject: Re: RFR: 8208171: PPC64: Enrich SLP support
>
> Hi Michi,
>
> On 07/25/2018 02:43 AM, Michihiro Horie wrote:
> > Dear all,
> >
> > Would you review the following change?
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171
> > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00
> >
> > This change adds support for vectorized arithmetic calculation with SLP.
> >
> > The to_vr function is added to convert VSR to VR. Currently, vecX is
> > associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad,
> > which are exactly overlapped with VRs. Instruction APIs receiving VRs use
> > the to_vr via vecX. Another thing is the change in sqrtF_reg to enable the
> > matching with SqrtVF. I think the change in sqrtF_reg would be fine due to
> > the ConvD2FNode::Value in convertnode.cpp.
>
> Looks good. Just a few comments:
>
> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of vmaddfp in
>   order to avoid the splat?
>
> - Although all instructions added by your change were introduced in ISA 2.06,
>   so POWER7 and above are OK, as I see probes for PowerArchitecturePPC64=6|5 in
>   vm_version_ppc.cpp (line 64), I'm wondering if there is any control point to
>   guarantee that these instructions won't be emitted on a CPU that does not
>   support them.
>
> - I think that in general strings in format %{} are in upper case. For instance,
>   this is the current output on optoassembly for vmul4F:
>
>   2941835 5b4     ADDI    R24, R24, #64
>   2941836 5b8     vmaddfp  VSR32,VSR32,VSR36      ! mul packed4F
>   2941837 5c0     STXVD2X     [R17], VSR32        // store 16-byte Vector
>
>   I think it would be better to be in upper case instead. I also think that if
>   the node match emits more than one instruction all instructions must be listed
>   in format %{}, since it's meant for detailed debugging. Finally I think it
>   would be better to replace \t! by \t// in that string (unless I'm missing any
>   special meaning for that char). So for vmul4F it would be something like:
>
>   2941835 5b4     ADDI    R24, R24, #64
>                   VSPLTISW VSR34, 0                 // Splat 0 imm in VSR34
>   2941836 5b8     VMADDFP VSR32,VSR32,VSR36,VSR34   // Mul packed4F
>   2941837 5c0     STXVD2X [R17], VSR32              // store 16-byte Vector
>
> But feel free to change anything just after you get additional reviews :)
>
> > I confirmed this change with JTREG. In addition, I used attached micro
> > benchmarks.
> > /(See attached file: slp_microbench.zip)/
>
> Thanks for sharing it.
> Btw, another option to host it would be in the CR
> server, in http://cr.openjdk.java.net/~mhorie/8208171
>
> Best regards,
> Gustavo
>
> > Best regards,
> > --
> > Michihiro,
> > IBM Research - Tokyo

From HORIE at jp.ibm.com Fri Aug 31 13:16:02 2018
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Fri, 31 Aug 2018 22:16:02 +0900
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To: <346da54af45243c4bdaf475f118a450d@sap.com>
References:

<346da54af45243c4bdaf475f118a450d@sap.com> Message-ID: Hi Martin, Thank you so much for giving comments! I fixed version checks and indentation. New webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.02/ Best regards, -- Michihiro, IBM Research - Tokyo From: "Doerr, Martin" To: Gustavo Romero , Michihiro Horie Cc: "Lindenmaier, Goetz" , "hotspot-dev at openjdk.java.net" , "ppc-aix-port-dev at openjdk.java.net" , "Simonis, Volker" Date: 2018/08/29 02:35 Subject: RE: RFR: 8208171: PPC64: Enrich SLP support Hi Michihiro, thank you for implementing it. I have just taken a first look at your webrev.01. It looks basically good. Only the Power version check seems to be incorrect. VM_Version::has_popcntb() checks for Power5. I believe most instructions are available with Power7. Some ones (vsubudm, ..., vmmuluwm, vpopcntw) were introduced with Power8? We should check this carefully. Also, indentation in register_ppc.hpp could get improved. Thanks and best regard, Martin -----Original Message----- From: Gustavo Romero Sent: Donnerstag, 26. Juli 2018 16:02 To: Michihiro Horie Cc: Lindenmaier, Goetz ; hotspot-dev at openjdk.java.net; Doerr, Martin ; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker Subject: Re: RFR: 8208171: PPC64: Enrich SLP support Hi Michi, On 07/26/2018 01:43 AM, Michihiro Horie wrote: > I updated webrev: > http://cr.openjdk.java.net/~mhorie/8208171/webrev.01/ Thanks for providing an updated webrev and for fixing indentation and function order in assembler_ppc.inline.hpp as well. 
I have no further comments :) Best Regards, Gustavo > > Best regards, > -- > Michihiro, > IBM Research - Tokyo > > Inactive hide details for Gustavo Romero ---2018/07/25 23:05:32---Hi Michi, On 07/25/2018 02:43 AM, Michihiro Horie wrote:Gustavo Romero ---2018/07/25 23:05:32---Hi Michi, On 07/25/2018 02:43 AM, Michihiro Horie wrote: > > From: Gustavo Romero > To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net > Cc: goetz.lindenmaier at sap.com, volker.simonis at sap.com, "Doerr, Martin" > Date: 2018/07/25 23:05 > Subject: Re: RFR: 8208171: PPC64: Enrich SLP support > > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > > > > Hi Michi, > > On 07/25/2018 02:43 AM, Michihiro Horie wrote: > > Dear all, > > > > Would you review the following change? > > Bug: https://bugs.openjdk.java.net/browse/JDK-8208171 > > Webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.00 > > > > This change adds support for vectorized arithmetic calculation with SLP. 
> >
> > The to_vr function is added to convert VSR to VR. Currently, vecX is associated with a VSR class vs_reg that only defines VSR32-51 in ppc.ad, which exactly overlap the VRs. Instruction APIs receiving VRs use to_vr via vecX. Another thing is the change in sqrtF_reg to enable the matching with SqrtVF. I think the change in sqrtF_reg would be fine due to the ConvD2FNode::Value in convertnode.cpp.
>
> Looks good. Just a few comments:
>
> - In vmul4F_reg() would it be reasonable to use xvmulsp instead of vmaddfp in
>   order to avoid the splat?
>
> - Although all instructions added by your change were introduced in ISA 2.06,
>   so POWER7 and above are OK, as I see probes for PowerArchitecturePPC64=6|5 in
>   vm_version_ppc.cpp (line 64), I'm wondering if there is any control point to
>   guarantee that these instructions won't be emitted on a CPU that does not
>   support them.
>
> - I think that in general strings in format %{} are in upper case. For instance,
>   this is the current output on optoassembly for vmul4F:
>
> 2941835 5b4 ADDI R24, R24, #64
> 2941836 5b8 vmaddfp VSR32,VSR32,VSR36 ! mul packed4F
> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector
>
>   I think it would be better to be in upper case instead. I also think that if
>   the node match emits more than one instruction all instructions must be listed
>   in format %{}, since it's meant for detailed debugging. Finally I think it
>   would be better to replace \t! by \t// in that string (unless I'm missing any
>   special meaning for that char). So for vmul4F it would be something like:
>
> 2941835 5b4 ADDI R24, R24, #64
> VSPLTISW VSR34, 0 // Splat 0 imm in VSR34
> 2941836 5b8 VMADDFP VSR32,VSR32,VSR36,VSR34 // Mul packed4F
> 2941837 5c0 STXVD2X [R17], VSR32 // store 16-byte Vector
>
>
> But feel free to change anything just after you get additional reviews :)
>
> >
> > I confirmed this change with JTREG. In addition, I used attached micro benchmarks.
> > /(See attached file: slp_microbench.zip)/
>
> Thanks for sharing it.
> Btw, another option to host it would be in the CR
> server, in http://cr.openjdk.java.net/~mhorie/8208171
>
>
> Best regards,
> Gustavo
>
> >
> > Best regards,
> > --
> > Michihiro,
> > IBM Research - Tokyo
> >
> >

From martin.doerr at sap.com Fri Aug 31 15:28:24 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Fri, 31 Aug 2018 15:28:24 +0000
Subject: RFR: 8208171: PPC64: Enrich SLP support
In-Reply-To:
References:

<346da54af45243c4bdaf475f118a450d@sap.com> Message-ID: <1390fbbb7b6147ce8570ab977a670f05@sap.com>

Hi Michihiro,

thanks for the update. Looks correct now.
I can also sponsor this change.

Does anybody else want to review it?

Best regards,
Martin

From: Michihiro Horie
Sent: Friday, 31 August 2018 15:16
To: Doerr, Martin
Cc: Lindenmaier, Goetz; Gustavo Romero; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker
Subject: RE: RFR: 8208171: PPC64: Enrich SLP support

Hi Martin,

Thank you so much for your comments!
I fixed the version checks and indentation.

New webrev: http://cr.openjdk.java.net/~mhorie/8208171/webrev.02/

Best regards,
--
Michihiro,
IBM Research - Tokyo
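
The version-check concern raised in this thread (has_popcntb() only implies Power5, while the new vector instructions were said to need Power7 or possibly Power8) can be illustrated with a minimal sketch. This is not HotSpot code: PowerLevel, VecOp, min_level and can_emit are hypothetical names invented for the example, and the per-instruction levels simply follow the review comments, including the one Martin marked with a question mark.

```cpp
#include <cassert>

// Hypothetical model of the CPU generation; NOT HotSpot's VM_Version API.
enum class PowerLevel { Power5 = 5, Power6 = 6, Power7 = 7, Power8 = 8 };

// A few of the instructions named in the review (illustrative subset).
enum class VecOp { xvmulsp, vsubudm, vpopcntw };

// Minimum CPU level per instruction. The mapping is an assumption taken from
// the review comments: Power8 for vsubudm/vpopcntw, Power7 for xvmulsp.
inline PowerLevel min_level(VecOp op) {
  switch (op) {
    case VecOp::vsubudm:
    case VecOp::vpopcntw:
      return PowerLevel::Power8;
    case VecOp::xvmulsp:
      return PowerLevel::Power7;
  }
  return PowerLevel::Power8;  // conservative fallback for unknown values
}

// An instruction may be emitted only if the CPU is at least the required level.
inline bool can_emit(VecOp op, PowerLevel cpu) {
  return static_cast<int>(cpu) >= static_cast<int>(min_level(op));
}
```

With this shape, a guard that only establishes Power5 never enables any of the listed instructions, and a Power7 guard enables xvmulsp but not the ones assumed to need Power8. Keeping the requirement next to each instruction, rather than reusing one unrelated feature probe, is the design point the review was asking for.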