From aleksey.shipilev at oracle.com Thu Jun 16 20:28:05 2016
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Thu, 16 Jun 2016 23:28:05 +0300
Subject: [jmm-dev] Specifying VarHandle acquire/release without ill effects
Message-ID: <57630BD5.60004@oracle.com>

Hi,

Our current VarHandle.{getAcquire|setRelease} are oddly specified:

    /**
     * Returns the value of a variable, and ensures that subsequent
     * loads and stores are not reordered before this access.
     * ...
     */
    Object getAcquire(Object... args);

    /**
     * Sets the value of a variable to the {@code newValue}, and ensures
     * that prior loads and stores are not reordered after this access.
     * ...
     */
    void setRelease(Object... args);

The clause about reorderings trips users into believing that
getAcquire and setRelease have fence-like semantics. But in reality,
they are weaker than fences, because optimizers are not obliged to
introduce barriers around the operation to get the needed semantics.
This opens up optimization opportunities that we would not like to
miss by over-specifying VarHandles.

For example, we do not specify volatile semantics in terms of
reorderings, because volatile accesses and associated barriers should be
completely removable here:

    import org.openjdk.jcstress.annotations.*;
    import org.openjdk.jcstress.infra.results.IntResult2;

    @JCStressTest
    @Outcome(id = "1, 0", expect = Expect.ACCEPTABLE,
             desc = "No happens-before here.")
    @Outcome(expect = Expect.ACCEPTABLE,
             desc = "All other cases are acceptable.")
    @State
    public class VolatilesAreNotBarriers {

        static class Holder {
            volatile int GREAT_BARRIER_REEF;
        }

        int x;
        int y;

        @Actor
        public void actor1() {
            // h is thread-local: no other thread can observe
            // these volatile accesses
            Holder h = new Holder();
            x = 1;
            h.GREAT_BARRIER_REEF = h.GREAT_BARRIER_REEF;
            y = 1;
        }

        @Actor
        public void actor2(IntResult2 r) {
            // a different, equally thread-local Holder
            Holder h = new Holder();
            r.r1 = y;
            h.GREAT_BARRIER_REEF = h.GREAT_BARRIER_REEF;
            r.r2 = x;
        }
    }

(A similar example can be built with setRelease/getAcquire)

Current C2 does leave barriers behind, but that seems to be an
implementation inefficiency: while it purges both Holder instances and
volatile ops, it loses the association between the actual store and the
relevant barrier shortly after parsing. My duct-taped Graal runs show
that Graal seems to eliminate both instances and associated barriers on
x86 (disassembly shows no barriers, and performance is 10x faster in
actor methods). We certainly would not like to throw compilers under the
bus and say this is forbidden.

Therefore, I wonder if this is a better getAcquire specification
(setRelease is symmetric to that):

    /**
     * Returns the value of a variable, with
     * memory semantics similar to {@code volatile} variable load,
     * but without total ordering.
     *
     * Reads with this access mode are access atomic.
     *
     * Previous writes to the same variable synchronize-with (and
     * therefore happen-before) reads with this access mode, if writes
     * are performed with
     * {@code setVolatile}, {@code setRelease},
     * {@code compareAndExchangeVolatile},
     * {@code compareAndExchangeRelease},
     * {@code compareAndSet},
     * {@code weakCompareAndSetVolatile} or
     * {@code weakCompareAndSetRelease} access modes.
     *
     * Reads with this access mode are not part of total
     * synchronization order. This makes {@code getAcquire} access
     * mode weaker than {@code getVolatile} access mode.
     *
     * ...
     */
    Object getAcquire(Object... args);

Or is there some hazard I am not seeing with a spec like that? Apart from
a creepy disconnect with acq/rel participating in SW, but not in SO,
while SW is spec-ed as the suborder of SO.

Thanks,
-Aleksey

From aph at redhat.com Fri Jun 17 08:08:50 2016
From: aph at redhat.com (Andrew Haley)
Date: Fri, 17 Jun 2016 09:08:50 +0100
Subject: [jmm-dev] Specifying VarHandle acquire/release without ill effects
In-Reply-To: <57630BD5.60004@oracle.com>
References: <57630BD5.60004@oracle.com>
Message-ID: <5763B012.9090500@redhat.com>

On 16/06/16 21:28, Aleksey Shipilev wrote:
> The clause about reorderings trips users into believing that
> getAcquire and setRelease have fence-like semantics. But in reality,
> they are weaker than fences, because optimizers are not obliged to
> introduce barriers around the operation to get the needed semantics.

I've read this several times now and I still do not understand it.
Release and acquire are well-defined terms in the industry, and I
can't think of any great reason to add more words like yours, which
seem to make things even more confusing than they were. The wording
which is there seems to me to be accurate, at least, as a description
of release and acquire.

Andrew.

From aleksey.shipilev at oracle.com Fri Jun 17 08:34:27 2016
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Fri, 17 Jun 2016 11:34:27 +0300
Subject: [jmm-dev] Specifying VarHandle acquire/release without ill effects
In-Reply-To: <5763B012.9090500@redhat.com>
References: <57630BD5.60004@oracle.com> <5763B012.9090500@redhat.com>
Message-ID: <5763B613.4060801@oracle.com>

On 06/17/2016 11:08 AM, Andrew Haley wrote:
> On 16/06/16 21:28, Aleksey Shipilev wrote:
>
>> The clause about reorderings trips users into believing that
>> getAcquire and setRelease have fence-like semantics. But in reality,
>> they are weaker than fences, because optimizers are not obliged to
>> introduce barriers around the operation to get the needed semantics.
>
> I've read this several times now and I still do not understand it.
> Release and acquire are well-defined terms in the industry,

I dare anyone to produce the consistent industry definition of "acquire"
and "release" :)

> and I can't think of any great reason to add more words like yours,
> which seem to make things even more confusing than they were. The
> wording which is there seems to me to be accurate, at least, as a
> description of release and acquire.

That might be true if you take the *hardware* definition of
acquire/releases, where prohibiting reorderings is surely a plausible
semantics. (There is no telling if CPU vendors will some day describe
their hardware memory models with abstract rules, like high-level
languages do)

But the trouble is with allowing software optimizations, as the volatile
example shows. A runtime should be able to elide barriers around
volatiles, if this elision does not violate JMM. In fact, it is
trivially doable when optimizers realize they are dealing with
thread-local volatile ops (e.g. on a non-escaped object, or in
constructor). This is what my original example shows.

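A minimal Java sketch of the thread-local case described above. The names
ThreadLocalVolatile, Holder, withVolatile and afterElision are hypothetical,
and afterElision is a hand-written stand-in for what an optimizer would be
permitted to produce, not the output of any particular compiler:

```java
// Sketch only: illustrates why barriers around a volatile access on a
// non-escaping object can be dropped without changing observable behavior.
final class ThreadLocalVolatile {

    static final class Holder {
        volatile int v;
    }

    // As written: a volatile store and a volatile load on an object that
    // never escapes the method, so no other thread can ever observe it.
    static int withVolatile(int input) {
        Holder h = new Holder();
        h.v = input;
        return h.v + 1;
    }

    // Hand-written equivalent with the Holder and its volatile accesses
    // (and therefore any associated barriers) removed.
    static int afterElision(int input) {
        return input + 1;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println(withVolatile(i) == afterElision(i));
        }
    }
}
```

Both methods compute the same result for every input; the only difference
is whether ordering is imposed on surrounding accesses, which the JMM does
not require when no other thread can synchronize on the Holder.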
Now, VarHandle's acquire/release can be seen as further _relaxations_ of
volatile semantics, when you give up sequential consistency for better
performance. Therefore, the barriers around acquire/releases should also
be elidable. Therefore, specifying VarHandle's acquire/release with
reorderings is odd.

You can certainly specify that acq/rel inhibit reordering, but this way
we face an even weirder inconsistency. It means that volatiles are *more
optimizable* than acq/rel, which defeats the purpose of having acq/rel
to begin with!

HTHS,
-Aleksey

From nitsanw at yahoo.com Fri Jun 17 13:02:18 2016
From: nitsanw at yahoo.com (Nitsan Wakart)
Date: Fri, 17 Jun 2016 13:02:18 +0000 (UTC)
Subject: [jmm-dev] Specifying VarHandle acquire/release without ill effects
In-Reply-To: <5763B613.4060801@oracle.com>
References: <57630BD5.60004@oracle.com> <5763B012.9090500@redhat.com> <5763B613.4060801@oracle.com>
Message-ID: <1537350078.233169.1466168538391.JavaMail.yahoo@mail.yahoo.com>

I think the effect you seek can be achieved by expanding on the acq/rel
relationship. E.g.:

    /**
     * Returns the value of a variable, and ensures that subsequent
     * loads and stores are not **VISIBLY** reordered before this
     * access **when matched with a release of same variable**.
     * ...
     */
    Object getAcquire(Object... args);

    /**
     * Sets the value of a variable to the {@code newValue}, and ensures
     * that prior loads and stores are not **VISIBLY** reordered after this
     * access **when matched with an acquire of same variable**.
     * ...
     */
    void setRelease(Object... args);

The wording is lame, but the intent is to make the guarantee depend on
the relationship between acq and rel. If the relationship is void, so is
the guarantee. This makes the optimization of TL (thread-local) variables,
or of any unmatched acq/rel if provable, valid.

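To illustrate what a matched pair looks like in practice, here is a small
Java sketch of message passing through a VarHandle. The class
MessagePassing and its members are hypothetical names chosen for this
illustration; the guarantee on `data` applies only because the getAcquire
reads the value written by the setRelease on the same variable:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Sketch only: one writer publishes plain data with setRelease,
// one reader consumes it after a matching getAcquire.
final class MessagePassing {
    int data;        // plain field, written before the release store
    boolean ready;   // accessed only through READY

    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(MessagePassing.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void writer() {
        data = 42;
        READY.setRelease(this, true);   // release store
    }

    boolean isReady() {
        return (boolean) READY.getAcquire(this);   // acquire load
    }

    int reader() {
        // Once the acquire load observes true, the earlier plain
        // write to data is visible to this thread.
        while (!isReady()) {
            Thread.onSpinWait();
        }
        return data;
    }

    static int runOnce() {
        MessagePassing mp = new MessagePassing();
        Thread w = new Thread(mp::writer);
        w.start();
        int seen = mp.reader();
        try {
            w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }
}
```

If no thread ever performed the acquire (or the object never escaped), the
release store would have nobody to order against, which is the situation
in which dropping its barrier is harmless.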
From martinrb at google.com Sat Jun 18 00:40:18 2016
From: martinrb at google.com (Martin Buchholz)
Date: Fri, 17 Jun 2016 17:40:18 -0700
Subject: [jmm-dev] Specifying VarHandle acquire/release without ill effects
In-Reply-To: <57630BD5.60004@oracle.com>
References: <57630BD5.60004@oracle.com>
Message-ID:

I looked again at the C++ standard, and it tries hard to specify
everything abstractly via the happens-before order and the total
synchronization order on volatiles. Which they probably more or less
copied from Java!

It should not be too hard to amend the Java memory model to incorporate
the concepts of "acquire operation" and "release operation" back from
C++. We already mostly have them! We simply need to introduce acquire
operations and release operations that are not reads/writes of
volatiles. The library API should then simply say things like "this is
an acquire operation" and link to the relevant section of the JLS.
(except maybe for fences - those are special and isolated)

We don't know how to fix some aspects of the memory model, but bringing
acquire/release into the existing model (without talking about
reorderings) seems very achievable.

From dl at cs.oswego.edu Sat Jun 18 11:50:01 2016
From: dl at cs.oswego.edu (Doug Lea)
Date: Sat, 18 Jun 2016 07:50:01 -0400
Subject: [jmm-dev] Specifying VarHandle acquire/release without ill effects
In-Reply-To: <5763B012.9090500@redhat.com>
References: <57630BD5.60004@oracle.com> <5763B012.9090500@redhat.com>
Message-ID: <57653569.40500@cs.oswego.edu>

On 06/17/2016 04:08 AM, Andrew Haley wrote:
> On 16/06/16 21:28, Aleksey Shipilev wrote:
>
>> The clause about reorderings trips users into believing that
>> getAcquire and setRelease have fence-like semantics.

You are right that we should clarify.

> I've read this several times now and I still do not understand it.
> Release and acquire are well-defined terms in the industry, and I
> can't think of any great reason to add more words like yours,

But maybe some other words. Backing up:

The rationale for the JMM JEP (http://openjdk.java.net/jeps/188) was to
provide rigorous specs for the C/C++11 analogs introduced in JDK 9
VarHandles (http://openjdk.java.net/jeps/193; see current API at
http://download.java.net/java/jdk9/docs/api/java/lang/invoke/VarHandle.html).

But we then discovered that doing this encountered some unsolved
research problems. Mainly that the current C/C++ model relies on rules
saying that programs with data races have no semantics, and does not
prohibit out-of-thin-air reads. And the existing JMM causality rules
that might help address this are known to be broken. Plus there are a
few minor-mismatch snags that do not seem hard to address, including
that plain Java variables are a little stronger than plain C/C++
variables (for example references cannot be word-torn), but weaker than
atomics in relaxed mode (they are not required to be coherent).

The hope (last year) was that progress would be made on these fronts,
but for the JDK 9 time frame, in the absence of a revised base memory
model, we should use careful English specs that convey intent that
would someday be backed up by a revised memory model.

There has been some progress on the underlying issues. See recent work
by Peter Sewell's group (http://www.cl.cam.ac.uk/~pes20/) on OOTA and
operational semantics, Alan Jeffrey (http://asaj.org/papers/lics16.pdf),
Jade Alglave (http://www0.cs.ucl.ac.uk/staff/J.Alglave/), Viktor
Vafeiadis (http://www.mpi-sws.org/~viktor/), and others. So it is worth
considering whether we can again move forward on a major JMM revision.
Comments would be welcome.

But in the meantime, we should do something to improve wordings that
Aleksey points out could mislead users and implementors in cases
where acquire or release access is optimized away or weakened
because it is thread-local, and so on. Minimally, we could add a
disclaimer that enables at least those transformations and optimizations
applying to volatiles. Maybe just:

  The ordering constraints imposed by getAcquire and setRelease are weaker
  than, and subsumed by, those of getVolatile and setVolatile.

-Doug

From martinrb at google.com Mon Jun 20 17:48:31 2016
From: martinrb at google.com (Martin Buchholz)
Date: Mon, 20 Jun 2016 10:48:31 -0700
Subject: [jmm-dev] Specifying VarHandle acquire/release without ill effects
In-Reply-To: <57653569.40500@cs.oswego.edu>
References: <57630BD5.60004@oracle.com> <5763B012.9090500@redhat.com> <57653569.40500@cs.oswego.edu>
Message-ID:

On Sat, Jun 18, 2016 at 4:50 AM, Doug Lea

wrote: > > But we then discovered that doing this encountered some unsolved > research problems. Mainly that the current C/C++ model relies on rules > saying that programs with data races have no semantics ... Plus there are > a few > minor-mismatch snags that do not seem hard to address, including that > plain Java variables are a little stronger than plain C/CC variables > (for example references cannot be word-torn), but weaker than atomics > in relaxed mode (they are not required to be coherent). > My mental model is to map plain old java variables to C++ relaxed atomics, except for long and double, which could perhaps be mapped to an array of atomic bytes, technically eliminating all data races. Both C++ and Java try to prevent OOTA results, and both succeed in practice, if not in theory. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3710.html From martinrb at google.com Mon Jun 20 18:14:06 2016 From: martinrb at google.com (Martin Buchholz) Date: Mon, 20 Jun 2016 11:14:06 -0700 Subject: [jmm-dev] Specifying VarHandle acquire/release without ill effects In-Reply-To: <57653569.40500@cs.oswego.edu> References: <57630BD5.60004@oracle.com> <5763B012.9090500@redhat.com> <57653569.40500@cs.oswego.edu> Message-ID: With the proliferation of weakly ordered operations, our story on happens-before and sequential consistency seems to have changed a little. Our discussion on memory operations http://download.java.net/java/jdk9/docs/api/java/util/concurrent/package-summary.html#MemoryVisibility only discusses the happens-before relationship (a partial order), but the memory operations being discussed (e.g. enqueueing and dequeueing from a concurrent queue) actually satisfy a stronger condition - they are sequentially consistent and form part of the global total order of all such operations. It would not be OK to have general purpose data structure update and read operations to have only release/acquire memory order. 
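As a concrete instance of the happens-before guarantee that the
java.util.concurrent package documentation describes for concurrent
collections, here is a minimal Java sketch. The class QueueHandoff and its
members are hypothetical names; the example only demonstrates the
documented visibility guarantee (actions before the enqueue happen-before
actions after the dequeue), not the stronger sequential-consistency
property discussed above:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch only: a plain field written before an enqueue is visible
// to the thread that dequeues the object.
final class QueueHandoff {

    static final class Box {
        int payload;   // plain, non-volatile field
    }

    static int handOff(int value) {
        BlockingQueue<Box> queue = new ArrayBlockingQueue<>(1);
        Thread producer = new Thread(() -> {
            Box b = new Box();
            b.payload = value;   // plain write before the enqueue
            queue.add(b);        // enqueue publishes the Box
        });
        producer.start();
        try {
            Box b = queue.take();   // dequeue in the calling thread
            producer.join();
            return b.payload;       // guaranteed to see the producer's write
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```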
I'm hoping that our definition of "happens-before" will be as close as possible to the C++ definition, for the sanity of the software industry. On Sat, Jun 18, 2016 at 4:50 AM, Doug Lea
wrote:

>
> But in the meantime, we should do something to improve wordings that
> Aleksey points out could mislead users and implementors in cases
> where acquire or release access is optimized away or weakened
> because it is thread-local, and so on. Minimally, we could add a
> disclaimer that enables at least those transformations and optimizations
> applying to volatiles. Maybe just:
>
> The ordering constraints imposed by getAcquire and setRelease are weaker
> than, and subsumed by those of getVolatile and setVolatile.

From boehm at acm.org Fri Jun 24 04:45:47 2016
From: boehm at acm.org (Hans Boehm)
Date: Thu, 23 Jun 2016 21:45:47 -0700
Subject: [jmm-dev] Specifying VarHandle acquire/release without ill effects
In-Reply-To:
References: <57630BD5.60004@oracle.com> <5763B012.9090500@redhat.com> <57653569.40500@cs.oswego.edu>
Message-ID:

There is another important difference between C++ relaxed atomics and Java
ordinary variables. C++ relaxed atomics guarantee that accesses to a
single variable, independent of all other accesses, look sequentially
consistent. I.e. we promise "cache coherence". Java does not for ordinary
variables. And that's pretty critical for compiler optimization on the Java
side. (It also affects fencing on one or two hardware platforms.)

On Mon, Jun 20, 2016 at 10:48 AM, Martin Buchholz wrote:

> On Sat, Jun 18, 2016 at 4:50 AM, Doug Lea
wrote: > > > > > But we then discovered that doing this encountered some unsolved > > research problems. Mainly that the current C/C++ model relies on rules > > saying that programs with data races have no semantics ... Plus there are > > a few > > minor-mismatch snags that do not seem hard to address, including that > > plain Java variables are a little stronger than plain C/CC variables > > (for example references cannot be word-torn), but weaker than atomics > > in relaxed mode (they are not required to be coherent). > > > > My mental model is to map plain old java variables to C++ relaxed atomics, > except for long and double, which could perhaps be mapped to an array of > atomic bytes, technically eliminating all data races. Both C++ and Java > try to prevent OOTA results, and both succeed in practice, if not in > theory. > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3710.html > From martinrb at google.com Sat Jun 25 00:21:50 2016 From: martinrb at google.com (Martin Buchholz) Date: Fri, 24 Jun 2016 17:21:50 -0700 Subject: [jmm-dev] Specifying VarHandle acquire/release without ill effects In-Reply-To: References: <57630BD5.60004@oracle.com> <5763B012.9090500@redhat.com> <57653569.40500@cs.oswego.edu> Message-ID: On Thu, Jun 23, 2016 at 9:45 PM, Hans Boehm wrote: > There is another important difference between C++ relaxed atomics and Java > ordinary variables. C++ relaxed atomics guarantee that accesses to a > single variable, independent of all other accesses, look sequentually > consistent. I.e. we promise "cache coherence". Java does not for ordinary > variables. And that's pretty critical for compiler optimization on the Java > side. (It also affects fencing on one or two hardware platforms.) > Single variable cache coherence in C++ is unexpected and seems not in the spirit of C++. I'm happy to keep that feature out of Java. But I'm still a fan of how acquire/release and fences are defined in C++ (inspired by Java!) 
and hoping for some reciprocal inspiration. > > On Mon, Jun 20, 2016 at 10:48 AM, Martin Buchholz > wrote: > >> On Sat, Jun 18, 2016 at 4:50 AM, Doug Lea
wrote:
>>
>> >
>> > But we then discovered that doing this encountered some unsolved
>> > research problems. Mainly that the current C/C++ model relies on rules
>> > saying that programs with data races have no semantics ... Plus there
>> > are a few
>> > minor-mismatch snags that do not seem hard to address, including that
>> > plain Java variables are a little stronger than plain C/C++ variables
>> > (for example references cannot be word-torn), but weaker than atomics
>> > in relaxed mode (they are not required to be coherent).
>> >
>>
>> My mental model is to map plain old java variables to C++ relaxed atomics,
>> except for long and double, which could perhaps be mapped to an array of
>> atomic bytes, technically eliminating all data races. Both C++ and Java
>> try to prevent OOTA results, and both succeed in practice, if not in
>> theory.
>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3710.html
>>
>
>

From sanjoy at playingwithpointers.com Sat Jun 25 04:34:14 2016
From: sanjoy at playingwithpointers.com (Sanjoy Das)
Date: Fri, 24 Jun 2016 21:34:14 -0700
Subject: [jmm-dev] Optimizing external actions in the JMM
Message-ID:

Hi,

I'm stuck trying to model a certain kind of execution in the JMM, and
was wondering if people here can help.

Let's say we have a program like this:

  x == y == 0 initially

  ThreadA:
    r1 = x
    r2 = f(r1)
    y = r2

  ThreadB:
    r3 = y
    x = r3

where f is some external action that the compiler understands. If the
compiler knows `f` always returns 42 and has no other effect, can it
optimize ThreadA to

  ThreadA:
    r1 = x
    r2 = 42 // f(r1)
    y = r2

and then to

  ThreadA:
    r2 = 42 // f(r1)
    y = r2
    r1 = x

thereby introducing an OOTA-like value of 42 into the system?

Given how the JMM is phrased, in cases like say, f(x) == (x - x) + 42,
(i.e. `f` is _not_ an external action) I'd proceed to justify
r1 = r2 = r3 = 42 by:

  1. Committing the initial writes of 0 to x and y in E0
  2. Adding the load of r1 as 0 to E1 (but not committing it) and
     committing the write of 42 to y (justified since f(0) = 42) in E1
  3. Nothing in E2
  4. Now the load from y == r3 can see the write committed to y in E1,
     so commit the read r3 and the write of 42 to x in E3
  5. Nothing in E4
  6. In E5, finally commit the read from x == r1, which can see 42 now

However, if `f` is a side-effecting operation, then I don't see how I
can reason about `f(0)` in step 2 inside the memory model. As far as
the JMM goes, `f` is a black box, and the only thing known about it
given the trace (with the OOTA-like behavior) is that "f(42) == 42".

This is fine for the vast majority of external actions that are black
boxes to the compiler also. But `f` could be a routine in (say)
DirectByteBuffer that touches native memory under the hood; and the
compiler could very well be in a position to fold loads from native
memory (that aren't modeled in the JMM AFAICT, and thus appear as
"external actions").

Another potentially problematic definition for `f(x)` is:

  f(x) =
    // JNI_GetEnv is the "external action" here but the JIT
    // knows that no loaded class calls `setenv`, so it is safe to assume
    // that JNI_GetEnv is invariant for an invariant argument.
    String EnvVar = "VAR_" + Integer.toString(x);
    if (JNI_GetEnv(EnvVar) == JNI_GetEnv(EnvVar))
      return 42;
    return 90;

It is perhaps not too unreasonable to teach the compiler to fold `f`
above to "return 42", and then optimize ThreadA to enable
"r1 = r2 = r3 = 42". But justifying this in the JMM is difficult -- I'll
end up having to justify that f(0) is 42, and that is difficult to do
just by considering reads, writes and results of external actions
already present in the trace.

It seems to me that this isn't just a problem with the current JMM, but
is a general issue with a JMM that is phrased as a predicate on a
program trace -- a predicative memory model prevents the compiler from
optimizing external actions as first-class entities.
Is that accurate, or am I off base? Thank you for your time! -- Sanjoy From boehm at acm.org Sat Jun 25 07:42:55 2016 From: boehm at acm.org (Hans Boehm) Date: Sat, 25 Jun 2016 00:42:55 -0700 Subject: [jmm-dev] Specifying VarHandle acquire/release without ill effects In-Reply-To: References: <57630BD5.60004@oracle.com> <5763B012.9090500@redhat.com> <57653569.40500@cs.oswego.edu> Message-ID: I disagree somewhat. If you expect a variable will be used in a racy way, then there's a fairly strong argument that programmers expect single variable cache coherence, and you should try to provide it. Especially if it doesn't require special hardware instructions, which it doesn't on most architectures. Programmers just don't expect a counter that's only modified by getting incremented by a single thread to decrease. The situation is different if the variable is expected to be used in a race-free way, where the cache coherence guarantee buys you nothing at some performance cost. I think this is one of several reasons not to conflate race-free accesses with unordered atomic accesses. On Fri, Jun 24, 2016 at 5:21 PM, Martin Buchholz wrote: > > > On Thu, Jun 23, 2016 at 9:45 PM, Hans Boehm wrote: > >> There is another important difference between C++ relaxed atomics and >> Java ordinary variables. C++ relaxed atomics guarantee that accesses to a >> single variable, independent of all other accesses, look sequentually >> consistent. I.e. we promise "cache coherence". Java does not for ordinary >> variables. And that's pretty critical for compiler optimization on the Java >> side. (It also affects fencing on one or two hardware platforms.) >> > > Single variable cache coherence in C++ is unexpected and seems not in the > spirit of C++. I'm happy to keep that feature out of Java. > > But I'm still a fan of how acquire/release and fences are defined in C++ > (inspired by Java!) and hoping for some reciprocal inspiration. 
> > > >> >> On Mon, Jun 20, 2016 at 10:48 AM, Martin Buchholz >> wrote: >> >>> On Sat, Jun 18, 2016 at 4:50 AM, Doug Lea
wrote: >>> >>> > >>> > But we then discovered that doing this encountered some unsolved >>> > research problems. Mainly that the current C/C++ model relies on rules >>> > saying that programs with data races have no semantics ... Plus there >>> are >>> > a few >>> > minor-mismatch snags that do not seem hard to address, including that >>> > plain Java variables are a little stronger than plain C/CC variables >>> > (for example references cannot be word-torn), but weaker than atomics >>> > in relaxed mode (they are not required to be coherent). >>> > >>> >>> My mental model is to map plain old java variables to C++ relaxed >>> atomics, >>> except for long and double, which could perhaps be mapped to an array of >>> atomic bytes, technically eliminating all data races. Both C++ and Java >>> try to prevent OOTA results, and both succeed in practice, if not in >>> theory. >>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3710.html >>> >> >> > From boehm at acm.org Sat Jun 25 07:48:06 2016 From: boehm at acm.org (Hans Boehm) Date: Sat, 25 Jun 2016 00:48:06 -0700 Subject: [jmm-dev] Specifying VarHandle acquire/release without ill effects In-Reply-To: <5763B613.4060801@oracle.com> References: <57630BD5.60004@oracle.com> <5763B012.9090500@redhat.com> <5763B613.4060801@oracle.com> Message-ID: I think C++'s definitions of acquire/release, as refined by the work by Mark Batty et al, are in pretty good shape. But I strongly suspect they are incompatible with Java's notion of a total synchronization order. That order is not very meaningful for acquire/release operations. Specifying acquire/release correctly in a Java context does require some work. On Fri, Jun 17, 2016 at 1:34 AM, Aleksey Shipilev < aleksey.shipilev at oracle.com> wrote: > On 06/17/2016 11:08 AM, Andrew Haley wrote: > > On 16/06/16 21:28, Aleksey Shipilev wrote: > > > >> The clause about reorderings trips users into believing that > >> getAcquire and putRelease have fence-like semantics. 
But in reality, > >> they are weaker than fences, because optimizers are not obliged to > >> introduce barriers around the operation to get the needed semantics. > > > > I've read this several times now and I still do not understand it. > > Release and acquire are well-defined terms in the industry, > > I dare anyone to produce the consistent industry definition of "acquire" > and "release" :) > > > and I can't think of any great reason to add more words like yours, > > which seem to make things even more confusing than they were. The > > wording which is there seems to me to be accurate, at least, as a > > description of release and acquire. > > That might be true if you take the *hardware* definition of > acquire/releases, where prohibiting reorderings is surely a plausible > semantics. (There is no telling if CPU vendors will some day describe > their hardware memory models with abstract rules, like high-level > languages do) > > But the trouble is with allowing software optimizations, as volatile > example shows. A runtime should be able to elide barriers around > volatiles, if this elision does not violate JMM. In fact, it is > trivially doable when optimizers realize they are dealing with > thread-local volatile ops (e.g. on a non-escaped object, or in > constructor). This is what my original example shows. > > Now, VarHandle's acquire/release can be seen as further _relaxations_ of > volatile semantics, when you give up sequential consistency for better > performance. Therefore, the barriers around acquire/releases should also > be elide-able. Therefore, specifying VarHandle's acquire/release with > reoderings is odd. > > You can certainly specify that acq/rel inhibit reordering, but this way > we face even a weirder inconsistency. It means that volatiles are *more > optimizeable* than acq/rel, which defeats the purpose of having acq/rel > to begin with! 
> > HTHS, > -Aleksey > > From martinrb at google.com Mon Jun 27 23:39:42 2016 From: martinrb at google.com (Martin Buchholz) Date: Mon, 27 Jun 2016 16:39:42 -0700 Subject: [jmm-dev] Specifying VarHandle acquire/release without ill effects In-Reply-To: References: <57630BD5.60004@oracle.com> <5763B012.9090500@redhat.com> <57653569.40500@cs.oswego.edu> Message-ID: In Java even plain variables default to atomic, so the cost of guaranteeing special properties for atomics would be much higher than in C++ where atomic<> will be rare. Would it mean that plain fields can no longer be cached in registers? It is surprising to discover that relaxed atomics provide a kind of sequential consistency, but restricted to a single memory location. It may be the right decision for C++. On Sat, Jun 25, 2016 at 12:42 AM, Hans Boehm wrote: > I disagree somewhat. If you expect a variable will be used in a racy way, > then there's a fairly strong argument that programmers expect single > variable cache coherence, and you should try to provide it. Especially if > it doesn't require special hardware instructions, which it doesn't on most > architectures. Programmers just don't expect a counter that's only > modified by getting incremented by a single thread to decrease. > > The situation is different if the variable is expected to be used in a > race-free way, where the cache coherence guarantee buys you nothing at some > performance cost. I think this is one of several reasons not to conflate > race-free accesses with unordered atomic accesses. > > On Fri, Jun 24, 2016 at 5:21 PM, Martin Buchholz > wrote: > >> >> >> On Thu, Jun 23, 2016 at 9:45 PM, Hans Boehm wrote: >> >>> There is another important difference between C++ relaxed atomics and >>> Java ordinary variables. C++ relaxed atomics guarantee that accesses to a >>> single variable, independent of all other accesses, look sequentually >>> consistent. I.e. we promise "cache coherence". Java does not for ordinary >>> variables. 
And that's pretty critical for compiler optimization on the Java >>> side. (It also affects fencing on one or two hardware platforms.) >>> >> >> Single variable cache coherence in C++ is unexpected and seems not in the >> spirit of C++. I'm happy to keep that feature out of Java. >> >> But I'm still a fan of how acquire/release and fences are defined in C++ >> (inspired by Java!) and hoping for some reciprocal inspiration. >> >> >> >>> >>> On Mon, Jun 20, 2016 at 10:48 AM, Martin Buchholz >>> wrote: >>> >>>> On Sat, Jun 18, 2016 at 4:50 AM, Doug Lea
wrote: >>>> >>>> > >>>> > But we then discovered that doing this encountered some unsolved >>>> > research problems. Mainly that the current C/C++ model relies on rules >>>> > saying that programs with data races have no semantics ... Plus there >>>> are >>>> > a few >>>> > minor-mismatch snags that do not seem hard to address, including that >>>> > plain Java variables are a little stronger than plain C/CC variables >>>> > (for example references cannot be word-torn), but weaker than atomics >>>> > in relaxed mode (they are not required to be coherent). >>>> > >>>> >>>> My mental model is to map plain old java variables to C++ relaxed >>>> atomics, >>>> except for long and double, which could perhaps be mapped to an array of >>>> atomic bytes, technically eliminating all data races. Both C++ and Java >>>> try to prevent OOTA results, and both succeed in practice, if not in >>>> theory. >>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3710.html >>>> >>> >>> >> > From martinrb at google.com Mon Jun 27 23:50:39 2016 From: martinrb at google.com (Martin Buchholz) Date: Mon, 27 Jun 2016 16:50:39 -0700 Subject: [jmm-dev] Specifying VarHandle acquire/release without ill effects In-Reply-To: References: <57630BD5.60004@oracle.com> <5763B012.9090500@redhat.com> <5763B613.4060801@oracle.com> Message-ID: (I'm not qualified to revise the memory model, but I can handwave ...) On Sat, Jun 25, 2016 at 12:48 AM, Hans Boehm wrote: > I think C++'s definitions of acquire/release, as refined by the work by > Mark Batty et al, are in pretty good shape. But I strongly suspect they > are incompatible with Java's notion of a total synchronization order. 
https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.4.6
Current memory model has:

- synchronization order, which is a total order over all synchronization
  actions
- synchronizes-with, a partial order over synchronization actions
- happens-before, a partial order over actions

and (hand-waving) "all we need to do" is allow synchronizes-with to also
apply to actions that are not synchronization actions (acquires and
releases)

From boehm at acm.org Tue Jun 28 00:20:20 2016
From: boehm at acm.org (Hans Boehm)
Date: Mon, 27 Jun 2016 17:20:20 -0700
Subject: [jmm-dev] Specifying VarHandle acquire/release without ill effects
In-Reply-To:
References: <57630BD5.60004@oracle.com> <5763B012.9090500@redhat.com> <5763B613.4060801@oracle.com>
Message-ID:

I would probably define synchronization actions to include acquire and
release operations, and rename synchronization order to something else (SC
order?), and ensure that it includes only SC operations. That's what C++
currently does, effectively.

On Mon, Jun 27, 2016 at 4:50 PM, Martin Buchholz wrote:

> (I'm not qualified to revise the memory model, but I can handwave ...)
>
> On Sat, Jun 25, 2016 at 12:48 AM, Hans Boehm wrote:
>
>> I think C++'s definitions of acquire/release, as refined by the work by
>> Mark Batty et al, are in pretty good shape. But I strongly suspect they
>> are incompatible with Java's notion of a total synchronization order.
>
>
> https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.4.6
> Current memory model has:
>
> - synchronization order, which is a total order over all synchronization
> actions
> - synchronizes-with, a partial order over synchronization actions
> - happens-before, a partial order over actions
>
> and (hand-waving) "all we need to do" is allow synchronizes-with to also
> apply to actions that are not synchronization actions (acquires and
> releases)
>
>

From martinrb at google.com Tue Jun 28 01:05:40 2016
From: martinrb at google.com (Martin Buchholz)
Date: Mon, 27 Jun 2016 18:05:40 -0700
Subject: [jmm-dev] Specifying VarHandle acquire/release without ill effects
In-Reply-To:
References: <57630BD5.60004@oracle.com> <5763B012.9090500@redhat.com> <5763B613.4060801@oracle.com>
Message-ID:

On Mon, Jun 27, 2016 at 5:20 PM, Hans Boehm wrote:

> I would probably define synchronization actions to include acquire and
> release operations, and rename synchronization order to something else (SC
> order?), and ensure that it includes only SC operations. That's what C++
> currently does, effectively.
>

Reading the C++ standard, I see:

- they use "synchronization operations" instead of "synchronization
  actions". We should adopt the former, incorporating the release/acquire
  operations as they do.
- they don't have a real name for "synchronization order"; they just give
  it a temporary name
  """There shall be a single total order S on all memory_order_seq_cst
  operations"""
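
To make the role of the single total order concrete, here is a Java sketch
of the classic store-buffering litmus test using VarHandle volatile-mode
accesses. The class StoreBuffering and the method sawBothZero are
hypothetical names for this illustration. Because setVolatile and
getVolatile take part in the total order, the outcome in which both
threads read 0 is forbidden:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Sketch only: each thread stores to one variable, then loads the other.
final class StoreBuffering {
    int x;
    int y;

    static final VarHandle X;
    static final VarHandle Y;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            X = l.findVarHandle(StoreBuffering.class, "x", int.class);
            Y = l.findVarHandle(StoreBuffering.class, "y", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Returns true if any round observed r0 == 0 and r1 == 0.
    static boolean sawBothZero(int rounds) {
        for (int i = 0; i < rounds; i++) {
            StoreBuffering s = new StoreBuffering();
            int[] r = new int[2];
            Thread a = new Thread(() -> {
                X.setVolatile(s, 1);
                r[0] = (int) Y.getVolatile(s);
            });
            Thread b = new Thread(() -> {
                Y.setVolatile(s, 1);
                r[1] = (int) X.getVolatile(s);
            });
            a.start();
            b.start();
            try {
                a.join();   // join makes the writes to r visible here
                b.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (r[0] == 0 && r[1] == 0) {
                return true;
            }
        }
        return false;
    }
}
```

If the four accesses were changed to setRelease and getAcquire, nothing
would place them in a common total order, so the both-zero outcome would
be permitted; whether it is actually observed depends on the hardware and
the compiler.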