On Fri, Aug 3, 2018 at 3:14 PM, Mikael Vidstedt <mikael.vidstedt@oracle.com> wrote:
Martin Buchholz suggested the topic and Mikael signed up to lead the session. Martin gave an introduction. He had observed some issues recently (“impossible null pointer exceptions”) which after investigation turned out to be caused by a toolchain upgrade and in turn revealed the fact that some code in hotspot requires atomicity but does not make this requirement very explicit and in the end assumes that the C++ compiler will produce suitable code to guarantee atomicity. Martin also observed that (on linux) hotspot is compiled targeting the c++98 standard, which is old enough to not even mention the concept of threads. Mikael also added that for extra fun the story is different on different platforms and toolchains.
I was surprised to hear that IBM AIX xlc compilers might not support C++11 - it's not the IBM way. But from reading the tea leaves at https://www-01.ibm.com/support/docview.wss?uid=swg27007322&aid=1 I concluded that IBM is a Linux company now! Even for IBM, AIX is a niche legacy platform and they just couldn't keep up with the evolution of C++ (who can blame them?). Meanwhile gcc is available for AIX and can/should be used to build openjdk. Has anyone tried? Here's one example of code that actually did go wrong with Google's latest internal toolchain, because the copy was not in fact word-atomic. Thanks to whoever added the comment long ago. static inline void copy_table(address* from, address* to, int size) { // Copy non-overlapping tables. The copy has to occur word wise for MT safety. while (size-- > 0) *to++ = *from++; } Recommendation: target C++11 for jdk12; use gcc to build openjdk on AIX.
On Aug 5, 2018, at 8:30 AM, Martin Buchholz <martinrb@google.com> wrote:
Thanks to whoever added the comment long ago.
FTR I think it was Steffen Grarup. We were just learning about MT safety at the time. The copy conjoint/disjoint APIs were not yet in existence. I think they came around 2003, and Paul Hohensee's name is all over the SCCS history there. s 00008/00002/00762 d D 1.147 99/02/17 10:14:36 steffen 235 233 … I 235 static inline void copy_table(address* from, address* to, int size) { // Copy non-overlapping tables. The copy has to occur word wise for MT safety. while (size-- > 0) *to++ = *from++; } Today, that loop should be recoded to use copy, and copy in turn needs to do whatever magic is required to force word-atomic access on non-atomic data. — John
On Mon, Aug 6, 2018 at 11:12 AM, John Rose <john.r.rose@oracle.com> wrote:
On Aug 5, 2018, at 8:30 AM, Martin Buchholz <martinrb@google.com> wrote:
Thanks to whoever added the comment long ago.
FTR I think it was Steffen Grarup. We were just learning about MT safety at the time. The copy conjoint/disjoint APIs were not yet in existence. I think they came around 2003, and Paul Hohensee's name is all over the SCCS history there.
s 00008/00002/00762 d D 1.147 99/02/17 10:14:36 steffen 235 233 … I 235 static inline void copy_table(address* from, address* to, int size) { // Copy non-overlapping tables. The copy has to occur word wise for MT safety. while (size-- > 0) *to++ = *from++; }
Today, that loop should be recoded to use copy, and copy in turn needs to do whatever magic is required to force word-atomic access on non-atomic data.
That loop copies address*, while pd_disjoint_words_atomic copies HeapWord, so these are not compatible out of the box. We could have atomic relaxed copies like below. Using compiler builtins also avoids the problem of the underlying type not being declared atomic<>, and is ISA-independent. OTOH maybe we always want that loop compiled to REP MOVSQ on x64. template <typename T> static ALWAYSINLINE void copy_atomic_relaxed(const T* from, T* to) { T val; __atomic_load(from, &val, __ATOMIC_RELAXED); __atomic_store(to, &val, __ATOMIC_RELAXED); } static void pd_disjoint_words_atomic(const HeapWord* from, HeapWord* to, size_t count) { #ifdef AMD64 switch (count) { case 8: copy_atomic_relaxed(from + 7, to + 7); case 7: copy_atomic_relaxed(from + 6, to + 6); case 6: copy_atomic_relaxed(from + 5, to + 5); case 5: copy_atomic_relaxed(from + 4, to + 4); case 4: copy_atomic_relaxed(from + 3, to + 3); case 3: copy_atomic_relaxed(from + 2, to + 2); case 2: copy_atomic_relaxed(from + 1, to + 1); case 1: copy_atomic_relaxed(from + 0, to + 0); case 0: break; default: while (count-- > 0) { copy_atomic_relaxed(from++, to++); } break; }
In utilities/copy.[ch]pp there’s Copy::conjoint_copy and its close friends which does support different element sizes, and which promises to not tear the words/elements (if the underlying implementation doesn’t do the right thing it needs to be fixed). It doesn’t currently allow for configuring/customizing memory ordering requirements though, and If “extreme” performance is required there may well be some additional specialization needed as well. Cheers, Mikael
On Aug 6, 2018, at 11:26 PM, Martin Buchholz <martinrb@google.com> wrote:
On Mon, Aug 6, 2018 at 11:12 AM, John Rose <john.r.rose@oracle.com <mailto:john.r.rose@oracle.com>> wrote: On Aug 5, 2018, at 8:30 AM, Martin Buchholz <martinrb@google.com <mailto:martinrb@google.com>> wrote:
Thanks to whoever added the comment long ago.
FTR I think it was Steffen Grarup. We were just learning about MT safety at the time. The copy conjoint/disjoint APIs were not yet in existence. I think they came around 2003, and Paul Hohensee's name is all over the SCCS history there.
s 00008/00002/00762 d D 1.147 99/02/17 10:14:36 steffen 235 233 … I 235 static inline void copy_table(address* from, address* to, int size) { // Copy non-overlapping tables. The copy has to occur word wise for MT safety. while (size-- > 0) *to++ = *from++; }
Today, that loop should be recoded to use copy, and copy in turn needs to do whatever magic is required to force word-atomic access on non-atomic data.
That loop copies address*, while pd_disjoint_words_atomic copies HeapWord, so these are not compatible out of the box.
We could have atomic relaxed copies like below. Using compiler builtins also avoids the problem of the underlying type not being declared atomic<>, and is ISA-independent. OTOH maybe we always want that loop compiled to REP MOVSQ on x64.
template <typename T> static ALWAYSINLINE void copy_atomic_relaxed(const T* from, T* to) { T val; __atomic_load(from, &val, __ATOMIC_RELAXED); __atomic_store(to, &val, __ATOMIC_RELAXED); }
static void pd_disjoint_words_atomic(const HeapWord* from, HeapWord* to, size_t count) { #ifdef AMD64 switch (count) { case 8: copy_atomic_relaxed(from + 7, to + 7); case 7: copy_atomic_relaxed(from + 6, to + 6); case 6: copy_atomic_relaxed(from + 5, to + 5); case 5: copy_atomic_relaxed(from + 4, to + 4); case 4: copy_atomic_relaxed(from + 3, to + 3); case 3: copy_atomic_relaxed(from + 2, to + 2); case 2: copy_atomic_relaxed(from + 1, to + 1); case 1: copy_atomic_relaxed(from + 0, to + 0); case 0: break; default: while (count-- > 0) { copy_atomic_relaxed(from++, to++); } break; }
On Mon, Aug 6, 2018 at 11:12 AM, John Rose <john.r.rose@oracle.com> wrote:
On Aug 5, 2018, at 8:30 AM, Martin Buchholz <martinrb@google.com> wrote:
static inline void copy_table(address* from, address* to, int size) { // Copy non-overlapping tables. The copy has to occur word wise for MT safety. while (size-- > 0) *to++ = *from++; }
Today, that loop should be recoded to use copy, and copy in turn needs to do whatever magic is required to force word-atomic access on non-atomic data.
I now see the many variants of copy in share/utilities/copy.hpp but there is none that makes copies of the type "address". Maybe you could make an atomic copy template that takes any type T with sizeof(T) <= 8 ? --- At the type system level, HeapWord is a struct, so C++ will not be happy with our other traditional trick of reading a pointer to a volatile HeapWord to "force" atomicity.
participants (3)
-
John Rose
-
Martin Buchholz
-
Mikael Vidstedt