From dl at cs.oswego.edu Fri Feb 7 09:45:25 2014 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 07 Feb 2014 12:45:25 -0500 Subject: [jmm-dev] Now playing on the OpenJDK jmm-dev list Message-ID: <52F51BB5.6010100@cs.oswego.edu> Here's the third, and I hope last, Intro post for this effort. The goal of the OpenJDK Memory Model Update project (http://openjdk.java.net/projects/jmm/) is to provide an updated Java Memory Model, as described in JEP188 (http://openjdk.java.net/jeps/188). Probably, most results will be posted on the OpenJDK Wiki (https://wiki.openjdk.java.net/display/jmm/Main), for use in updating the JLS and as a reference for other JEPs producing associated software. This mailing list is intended for developing an updated JMM, not for usage questions etc. about the JMM (for which concurrency-interest at cs.oswego.edu and/or javamemorymodel-discussion at cs.umd.edu may be more appropriate). We welcome participation by concurrency experts in formal specification, hardware and software engineering, and software development tools. Even though I'm listed as project lead, I hope never to have any special role in development processes except for cat-herding efforts to help direct attention to issues. The idea of proposing JEP188 started with some informal exchanges that grew to an unmanageable CC list upon proposal, so shifted to a temporary mailing list while we waited for approval. There has already been a fair amount of traffic (summarized below). Those on the pre-jmm9 list should be auto-subscribed and receiving this as the first jmm-dev post. To view pre-jmm9 mailing list exchanges, see the mailman archives (http://cs.oswego.edu/pipermail/pre-jmm9/), and for previous CCed exchanges, see the gzipped mbox archive (http://gee.cs.oswego.edu/dl/papers/preprejmm9.mbox.gz). Here's a brief summary of in-progress efforts (updated from 3 weeks ago). Many of them are only loosely coupled with each other, so I've been encouraging people to explore them concurrently. 1. 
Objectives. These are not yet written all in one place, but I think we are heading to rough agreement about core issues: safety (including disallowing OOTA (out-of-thin-air) reads), security (including information flow via indirection), global properties (including SC (sequential consistency) for DRF (data race free) programs solely using locks, and sometimes other cases), and expressiveness (enabling finer-grained ordering control required to implement known shared-memory concurrent algorithms). 2. Formalisms. Several people/groups are contemplating different approaches. This is still in early stages, but I'm optimistic about prospects for something really good to emerge. 3. Mappings. How do models translate into compiler and processor rules or actions? Or to JLS specs? 4. Experimentation, including: (1) Do compilers (mainly, optimizers) and processors do what we think/hope? (2) What are performance impacts of proposed mappings? 5. Initialization. Pending lots of details and checks, we might have settled on a simpler path for this that amounts to ensuring release fences at the ends of constructors in a way that introduces no (or at most little) additional performance impact. 6. Expression of ordering constraints. There seems to be no substantive disagreement with the idea of supplying C/C++11-like methods offering manual ordering control via a combination of enhanced l-value operations (".volatile") and fence methods. Many details needed. 7. Implementation guidance. We have already seen cases where exploring alternatives has led to some possible improvements in JVMs. Probably much more to come, sometimes in the form of contributing patches. 8. User/usage validation. Do the results of this effort help developers? We have a lot of known usages and complaints built up over the years to draw on before needing to invite more. 9. Consequences. Not really started yet: Can we propose tools, annotations, test suites, etc.? Also user guidance docs. 
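Item 6's "fence methods" can be made a little more concrete with a sketch. No such Java API existed at the time of this post; the sketch below borrows the static fence methods that later shipped on java.lang.invoke.VarHandle (JDK 9) as a plausible analogue of what was being contemplated, so treat it as an illustration of the idea, not the API under design. The class and field names are invented.

```java
import java.lang.invoke.VarHandle;

public class FencePublish {
    static int data;
    static boolean ready;

    // Writer: a release fence keeps the store to `data` from being
    // reordered after the store to `ready`.
    static void publish() {
        data = 42;
        VarHandle.releaseFence();
        ready = true;
    }

    // Reader: an acquire fence after observing `ready` keeps the
    // subsequent read of `data` from being reordered before it.
    static int consume() {
        if (ready) {
            VarHandle.acquireFence();
            return data;
        }
        return -1;
    }

    public static void main(String[] args) {
        publish();
        System.out.println(consume());  // single-threaded demo
    }
}
```

In a real two-thread run the fences are what make the 42 in `data` visible whenever `ready` is seen as true; run single-threaded, as here, the program simply prints 42.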
(Aside: The Android cross-language "SMP guide" might be a good model for some audiences (http://developer.android.com/training/articles/smp.html). -Doug From hansboehm at yahoo.com Fri Feb 7 23:11:52 2014 From: hansboehm at yahoo.com (Hans Boehm) Date: Fri, 7 Feb 2014 23:11:52 -0800 (PST) Subject: [jmm-dev] [pre-jmm9] Expressing ordering constraints In-Reply-To: <20140207184626.GU4250@linux.vnet.ibm.com> Message-ID: <1391843512.43684.YahooMailBasic@web122205.mail.ne1.yahoo.com> I'm not sure where the proposal to add br;isync came from. I think BMM would require that, but that seems more drastic than what I've seen proposed here. Even the N3710 ld->st ordering proposal is believed to require at most the branch without the isync. (And, if adopted, I would expect that branch to be replaced by another instruction that doesn't consume branch prediction slots in a few years.) Hans -------------------------------------------- On Fri, 2/7/14, Paul E. McKenney wrote: Subject: Re: [pre-jmm9] Expressing ordering constraints To: "Doug Lea"
Cc: "pre-jmm9 at cs.oswego.edu" , "Boehm, Hans" Date: Friday, February 7, 2014, 10:46 AM On Sat, Feb 01, 2014 at 10:16:34AM -0500, Doug Lea wrote: > On 01/30/2014 01:30 AM, Boehm, Hans wrote: > > >You mean that you would prefer intentionally racy but unordered accesses > >have the same semantics as data not-intended-to-be-racy accesses? ... The > >feeling on the C++ committee, particularly on Paul's part, IIRC, was that we > >did want coherence for the intentionally racy accesses, because its absence > >was just too horrible to deal with. > > This might be one of my rare disagreements with Paul. In this case, > simplicity of rules seems worth the extra agony for people who can > figure out how to cope if they need. Without distinguishing these, > the parts of the JMM that most programmers would ever need to deal > with might look something like the following. It's not quite > as simple as, say, BMM, but seems to be on the right track: It is not just me. The possibility of adding compare-branch-isb/isync to C11 relaxed loads came up on the Linux kernel mailing list today, and the reaction of one prominent maintainer was, and I quote, "sounds like someone took a big toke from the bong again." One of the ARM64 maintainers took a somewhat less colorful but equally negative position, questioning why additional otherwise-unnecessary instructions were being contemplated to solve what he termed a compiler problem. The Linux-kernel discussion was of course C11 rather than Java, but nevertheless, just saying... 
From dl at cs.oswego.edu Sat Feb 8 06:17:44 2014 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 08 Feb 2014 09:17:44 -0500 Subject: [jmm-dev] [pre-jmm9] Expressing ordering constraints In-Reply-To: <1391843512.43684.YahooMailBasic@web122205.mail.ne1.yahoo.com> References: <1391843512.43684.YahooMailBasic@web122205.mail.ne1.yahoo.com> Message-ID: <52F63C88.7040800@cs.oswego.edu> On 02/08/2014 02:11 AM, Hans Boehm wrote: > I'm not sure where the proposal to add br;isync came from. Me neither. Backing up ... Because it would be vastly nicer in Java to equate semantics for ordinary accesses to non-volatiles and relaxed accesses to volatiles, I've been trying to further diagnose the basis for the C/C++11 distinction and subsequent issues, to see if we can avoid them. Hans's N3710 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3710.html) includes some discussions, but I'm still missing some context because this was introduced after I stopped paying close attention to C/C++11 MM development. As Hans mentioned, in C/C++ atomics are required to be coherent even if accessed in relaxed mode. Coherence requires, among other things, that you don't see values "running backwards in time", which might otherwise occur in reorderings such as { r1 = x; r2 = x; }. When you mix this with current C/C++11 OOTA loopholes you encounter inexplicable anomalies. (See examples in N3710 and others discovered by Brian Demsky.) Ignoring the OOTA issue (which I suspect we address in some other way), the decision about requiring coherence regardless of access mode seems suspicious. Is there a killer example of an otherwise unprogrammable algorithm? An otherwise unprovable property? If not, my current sense is "Shrug; if you want ordering, ask for it". At most we could support more ordering control methods. (It looks like we will already be adding something like dependentStoreFence(ref) to those in C/C++, and there's no reason not to contemplate others.) 
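Doug's { r1 = x; r2 = x; } example can be written out as a small sketch of what per-variable coherence forbids. This is a single-threaded illustration, not a litmus test that can actually exhibit the reordering; the class and field names are invented.

```java
public class Coherence {
    static int x;  // plain field; imagine a racing writer storing x = 1

    // Two back-to-back reads of the same variable. Per-variable coherence
    // says the second read may not observe an OLDER value than the first:
    // under a race with a writer, seeing (r1 == 1, r2 == 0) would be time
    // "running backwards". A compiler that reorders the two loads -- legal
    // for plain/relaxed accesses without a coherence rule -- could produce
    // exactly that outcome.
    static int[] readTwice() {
        int r1 = x;
        int r2 = x;
        return new int[] { r1, r2 };
    }

    public static void main(String[] args) {
        x = 1;  // stand-in for the racing writer's store
        int[] r = readTwice();
        System.out.println(r[0] + "," + r[1]);
    }
}
```

Run sequentially, as here, both reads of course see the same value; the question in the thread is whether the model should rule out the backwards outcome even for racy relaxed accesses.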
-Doug From dl at cs.oswego.edu Sun Feb 9 07:31:09 2014 From: dl at cs.oswego.edu (Doug Lea) Date: Sun, 09 Feb 2014 10:31:09 -0500 Subject: [jmm-dev] final reads/writes Message-ID: <52F79F3D.5050802@cs.oswego.edu> Continuing my quest to introduce issues early and often... Assuming that we go ahead with the idea of ensuring a release fence upon construction (normally free, because piggybacked with those required anyway for object headers), rather than only in the presence of final fields, do we need to say anything special about final fields? I can't quite rule it out. Thoughts welcome. Background: Optimizers like to remove unnecessary reads (and computations based on them). It seems that any plausible memory model will allow cases based on the idea that if you can identify a readsFrom source for a value, and you've already read it, then no additional ordering constraints could force you to re-read, so don't. In a more ideal world, "final" would allow a more aggressive version: If you've (implicitly) identified ANY readsFrom source, that's good enough, because there is only one. Unfortunately, "final" doesn't strictly mean this in JVMs -- there are cheats sometimes allowing further updates to final variables. And in practice JVMs are conservative enough to allow those cheats to work, despite some wording in the JSR133 JMM allowing them not to work. Additionally, JDK8 hotspot introduced the @Stable annotation that in essence says: if the value is nonnull, then it is the final written value. And similar cases arise in which there may be bookkeeping to track "Freeze after writing" status (https://www.cs.indiana.edu/cgi-bin/techreports/TRNNN.cgi?trnum=TR710), and a possible JDK9 proposal for explicitly "frozen" arrays. The question at hand is: Does a memory model itself need to say anything explicitly about any of these? 
-Doug From ludwig.mark at siemens.com Sun Feb 9 07:33:37 2014 From: ludwig.mark at siemens.com (Ludwig, Mark) Date: Sun, 9 Feb 2014 15:33:37 +0000 Subject: [jmm-dev] Atomic references Message-ID: Greetings, This is probably in the category of "picking a nit," but I have yet to find any statement that shared Java variables that are object references are atomic. (An example of a shared variable is a class static declaration.) That is, I have the impression that a shared variable that is a reference to an object (these days, a 4- or 8-byte pointer at the hardware level for 32- or 64-bit architecture, respectively) is naturally atomic, that if I have two or more asynchronous Java threads assigning an object reference to a shared variable, or assigning null to a shared variable, that any other thread reading that reference will consistently read either null or one of the object references assigned along the way. We believe this is true because of the need for such atomicity within the hardware, so it naturally provides this in the machine instructions that store and retrieve addresses (between registers and main memory). This assumption might only hold for properly-aligned values in main memory, but we assume that Java provides this at the machine level, naturally, too. ("Properly aligned" means that any address referring to a 4-byte address in memory has zeroes in the last two bits, and an 8-byte reference has zeroes in the last three bits.) To be clear, I am /not/ talking about an 8-byte /anything/ on a 32-bit architecture. I am also aware that, without synchronization, there is a timing window among the threads about what they read (that any thread may read out-of-date data for an indeterminate period of time according to the hardware caching architecture). My point is not about timing, per se, but about self-consistency among the bytes comprising a reference in a shared variable. 
We use this assumption heavily in a server application, and have yet to ever hear of any trouble or concern around this, but cannot find anything specifying this behavior. While all you distinguished folks are updating the JMM, I thought you might cover this. OTOH, if it /is/ specified, I'd appreciate a pointer to the language specifying it. For background - and in case I haven't used terms above that precisely mean what I intend: We use this technique for letting threads allocate a Singleton, with an idempotent construction sequence, that is accessed at very high frequency, without using any synchronization. (Each thread looks for null and if it is, constructs the Singleton and assigns the reference.) This makes sense to us when the code to construct the Singleton is cheap enough, and we have strictly limited the Singleton to final fields in order to use the existing JMM guarantee that when construction finishes, it's safe to let any thread pick up a reference to the object and use it asynchronously. I use the label "very high frequency" when accesses to the Singleton occur within each thread perhaps thousands of times per second on a fast-enough machine. We believe it's cheaper to let every thread (at the worst case) construct the Singleton, and let the garbage collector take care of cleaning up the duplicates (if any), than it would be to synchronize around the reference for the life of the application. The server application runs for indeterminate periods of time ... easily months, depending on scheduled down-time. It creates threads for client actions. (We write business software in use by an unknown number of customers.) At large customer sites, easily millions of threads go through this code while the application is running. Such sites have large processor complexes. We know one customer has 64 processors in a large server machine. 
The fact that threads are created to service client requests also means that the number of competing threads during application start-up is limited to a fairly small number. I would be surprised if any customer could manage to get even ten (10) client actions running concurrently that might all construct the Singleton. At the peak of the working day, once there are thousands of users sending requests to the server, it's reasonable to expect that every processor is reading the reference to the Singleton perhaps thousands of times per second. (We know from scalability testing that synchronizing around such a reference imposes a noticeable bottleneck.) Thanks! Mark Ludwig Lifecycle Coll Product Lifecycle Management Siemens Industry Sector Siemens Product Lifecycle Management Software Inc. 5939 Rice Creek Parkway Shoreview, MN 55126 United States Tel. :+1 (651) 855-6140 Fax :+1 (651) 855-6280 ludwig.mark at siemens.com www.siemens.com/plm From dl at cs.oswego.edu Sun Feb 9 11:57:36 2014 From: dl at cs.oswego.edu (Doug Lea) Date: Sun, 09 Feb 2014 14:57:36 -0500 Subject: [jmm-dev] Atomic references In-Reply-To: References: Message-ID: <52F7DDB0.8040503@cs.oswego.edu> On 02/09/2014 10:33 AM, Ludwig, Mark wrote: > This is probably in the category of "picking a nit," but I have yet to find > any statement that shared Java variables that are object references are > atomic. Yes, they must be. The statement is hiding in JLS sec 17.7 "Writes to and reads of references are always atomic, regardless of whether they are implemented as 32-bit or 64-bit values." http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.7 (Curiously, there is no direct statement that ints, shorts, chars, or bytes are atomic. This probably needs fixing.) 
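The idiom Mark describes, together with the JLS 17.7 reference-atomicity guarantee Doug cites, can be sketched roughly as follows. All names are invented; this is an illustration of the racy single-check pattern under the stated assumptions (idempotent, cheap construction; all-final fields), not production code.

```java
public class Registry {
    // All fields final, so JSR-133 final-field semantics guarantee that any
    // thread reading a non-null `instance` sees a fully constructed object.
    private final String name;
    private final int limit;

    private Registry(String name, int limit) {
        this.name = name;
        this.limit = limit;
    }

    // Plain (non-volatile) shared reference. Reads and writes of references
    // are atomic (JLS 17.7), so a racing reader sees null or a complete
    // reference -- never a torn one. Because construction is idempotent,
    // two threads may both build an instance; the loser's copy simply
    // becomes garbage, avoiding any synchronization on the hot read path.
    private static Registry instance;

    static Registry getInstance() {
        Registry r = instance;              // racy read, but atomic
        if (r == null) {
            r = new Registry("default", 100);
            instance = r;                   // racy publish; benign here
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(getInstance().limit);
    }
}
```

Note the pattern is only safe under exactly the conditions Mark lists: drop the final fields or make construction non-idempotent, and the race stops being benign.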
-Doug From jeremymanson at google.com Sun Feb 9 16:34:32 2014 From: jeremymanson at google.com (Jeremy Manson) Date: Sun, 9 Feb 2014 16:34:32 -0800 Subject: [jmm-dev] final reads/writes In-Reply-To: <52F79F3D.5050802@cs.oswego.edu> References: <52F79F3D.5050802@cs.oswego.edu> Message-ID: So, to reiterate my previous postings (too): I used to believe that a memory model should provide extra latitude for optimizations in these cases, but now I'm perfectly happy with not saying anything. We do need to talk about the safety guarantees, and anything that VM developers can do given those safety guarantees is fine. My mind was changed when we tried hoisting final loads in Hotspot, and got a percent or so speedup, but threw some code into infinite loops. I think it is just too confusing for users if they change the value of a field, but subsequent loads - regular, same-thread loads - don't see the changed value. Compilers should only do a loop invariant hoist if they can prove that the value being hoisted is loop invariant. Now, if you can prove that the value doesn't change, then you can certainly do the optimization. But in that case, the final annotation is a hint, not a proof. I'm not sure that the JMM needs to say anything about that. Jeremy On Sun, Feb 9, 2014 at 7:31 AM, Doug Lea
wrote: > Continuing my quest to introduce issues early and often... > > Assuming that we go ahead with the idea of ensuring a release fence > upon construction (normally free, because piggybacked with those > required anyway for object headers), rather than only in the presence > of final fields, do we need to say anything special about final fields? > > I can't quite rule it out. Thoughts welcome. > > Background: Optimizers like to remove unnecessary reads > (and computations based on them). It seems that any plausible > memory model will allow cases based on the idea that if you can > identify a readsFrom source for a value, and you've already > read it, then no additional ordering constraints could > force you to re-read, so don't. > > In a more ideal world, "final" would allow a more aggressive > version: If you've (implicitly) identified ANY readsFrom source, > that's good enough, because there is only one. Unfortunately, "final" > doesn't strictly mean this in JVMs -- there are cheats > sometimes allowing further updates to final variables. And in > practice JVMs are conservative enough to allow those cheats > to work, despite some wording in the JSR133 JMM allowing them > not to work. > > Additionally, JDK8 hotspot introduced the @Stable annotation > that in essence says: if the value is nonnull, then it is the final > written value. And similar cases arise in which there may be > bookkeeping to track "Freeze after writing" status > (https://www.cs.indiana.edu/cgi-bin/techreports/TRNNN.cgi?trnum=TR710), > and a possible JDK9 proposal for explicitly "frozen" arrays. > > The question at hand is: Does a memory model itself need to say > anything explicitly about any of these? 
> > -Doug > > From aleksey.shipilev at oracle.com Mon Feb 10 11:18:33 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 10 Feb 2014 23:18:33 +0400 Subject: [jmm-dev] Enforcing access atomicity (benchmarks) Message-ID: <52F92609.5070407@oracle.com> Hi there, Here you go, the early estimates for enforcing access atomicity: http://shipilev.net/blog/2014/all-accesses-are-atomic/ Go straight to "Conclusion" for TL;DR summary. In short, in 2014, most platforms are already able to pull off 64-bit accesses, and hence it seems redundant to keep the 64-bit exception in the spec. Thanks, -Aleksey. From dl at cs.oswego.edu Fri Feb 14 05:49:36 2014 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 14 Feb 2014 08:49:36 -0500 Subject: [jmm-dev] Enforcing access atomicity (benchmarks) In-Reply-To: <52F92609.5070407@oracle.com> References: <52F92609.5070407@oracle.com> Message-ID: <52FE1EF0.30006@cs.oswego.edu> On 02/10/2014 02:18 PM, Aleksey Shipilev wrote: > Here you go, the early estimates for enforcing access atomicity: > http://shipilev.net/blog/2014/all-accesses-are-atomic/ > > Go straight to "Conclusion" for TL;DR summary. In short, in 2014, most > platforms are already able to pull off 64-bit accesses Thanks! 
This prodded me to further investigate a few issues: Are there ANY platforms that do or could otherwise support JVMs and for which there is no reasonable way to conform? The answer depends on how far you want to stretch "reasonable" (worst case, JVMs could insert locks), but some 32-bit versions of PPC and MIPS seem problematic. Also, it might be the case that floating-point (double) on ARM (even ARMv8) requires special handling. The answer also depends on what you mean by "JVM". Java "ME" (Micro Edition) specs have not kept pace with the "SE" specs that we've implicitly been targeting. Most but probably not all problematic cases are only relevant for ME anyway. Backing up, the main reason for contemplating this is spec simplification. Getting rid of non-obvious rules and special cases one by one may eventually result in a model/spec that overcomes the "you are not expected to understand this" reputation of the JMM among developers. An argument for not simplifying is that programs shouldn't have any races where non-atomicity would be observable anyway. It's a pretty good argument, although not very convincing to some developers writing code for monitoring and profiling, as well as some numerical heuristics. They often could care less about race-freedom so long as they arrive at empirically acceptable approximations of reality. And in any case, the presence of potential non-atomicity causing reads of a long or double to rarely take crazy/wild values only on uncommon platforms is not a very nice way to alert people of problems. Another argument for not simplifying is that (as Brian mentioned) we expect JDK9 to support wider value types of some sort; surely including those for which no processor guarantees atomicity. So there will always be atomicity disclaimers of some kind somewhere. Across these concerns, it seems that resolving this issue is mostly a policy decision. I welcome any more compelling arguments on either side than I listed above. 
Without them, this might not become settled until (much) later when canvassing broader community input. -Doug From dl at cs.oswego.edu Sun Feb 16 15:00:30 2014 From: dl at cs.oswego.edu (Doug Lea) Date: Sun, 16 Feb 2014 18:00:30 -0500 Subject: [jmm-dev] stores Message-ID: <5301430E.9010009@cs.oswego.edu> Memory models can generate a fair amount of excitement. See Linus Torvalds's post on the linux kernel list: https://lkml.org/lkml/2014/2/14/492 and follow-ups with Paul McKenney. (Condolences!) I don't think this introduces anything new with respect to JMM9 discussions so far though. In general, speculative stores and out-of-thin-air reads break basic safety properties. Although there still might be some related open cases about inserted stores, including "redundant" ones. As in: if (x != 0) x = 0; ==> x = 0; ? -Doug From david.holmes at oracle.com Sun Feb 16 16:47:06 2014 From: david.holmes at oracle.com (David Holmes) Date: Mon, 17 Feb 2014 10:47:06 +1000 Subject: [jmm-dev] Enforcing access atomicity (benchmarks) In-Reply-To: <52FE1EF0.30006@cs.oswego.edu> References: <52F92609.5070407@oracle.com> <52FE1EF0.30006@cs.oswego.edu> Message-ID: <53015C0A.204@oracle.com> On 14/02/2014 11:49 PM, Doug Lea wrote: > On 02/10/2014 02:18 PM, Aleksey Shipilev wrote: >> Here you go, the early estimates for enforcing access atomicity: >> http://shipilev.net/blog/2014/all-accesses-are-atomic/ >> >> Go straight to "Conclusion" for TL;DR summary. In short, in 2014, most >> platforms are already able to pull off 64-bit accesses > > Thanks! This prodded me to further investigate a few issues: > > Are there ANY platforms that do or could otherwise support JVMs > and for which there is no reasonable way to conform? > > The answer depends on how far you want to stretch "reasonable" > (worst case, JVMs could insert locks), but some 32bit versions > of PPC and MIPS seem problematic. Also, it might be the case that > floating-point (double) on ARM (even ARMv8) requires special handling. 
Not just double but also float. The ARMv8 spec not only doesn't provide single-copy atomicity for 64-bit FP values but it even rolls back the 32-bit guarantees, for FP, to only providing byte-level single-copy atomicity! So in theory a float/double can be loaded/stored one byte at a time! Perhaps our ARM folk could comment on this as we've been used to getting 32-bit atomic accesses on 32-bit platforms for an awful long time now. > The answer also depends on what you mean by "JVM". > Java "ME" (M for Mobile) specs have not kept pace with > the "SE" specs that we've implicitly been targeting. > Most but probably not all problematic cases are only > relevant for ME anyway. SE Embedded is impacted by this. > Backing up, the main reason for contemplating this is spec > simplification. Getting rid of non-obvious rules and special > cases one by one may eventually result in a model/spec that > overcomes the "you are not expected to understand this" > reputation of the JMM among developers. I think the non-atomic treatment of long/double unless volatile is so isolated in the memory-model, and so stand-alone and simple, that removing it would be imperceptible in the overall complexity of the JMM. David ------ > An argument for not simplifying is that programs shouldn't > have any races where non-atomicity would be observable anyway. > It's a pretty good argument, although not very convincing > to some developers writing code for monitoring and profiling, > as well as some numerical heuristics. They often could care less > about race-freedom so long as they arrive at empirically > acceptable approximations of reality. And in any case, > the presence of potential non-atomicity causing reads of a > long or double to rarely take crazy/wild values only > on uncommon platforms is not a very nice way to alert people > of problems. 
> > Another argument for not simplifying is that (as Brian mentioned) > we expect JDK9 to support wider value types of some sort; > surely including those for which no processor guarantees > atomicity. So there will always be atomicity disclaimers of > some kind somewhere. > > Across these concerns, it seems that resolving this issue is > mostly a policy decision. I welcome any more compelling > arguments on either side than I listed above. Without > them, this might not become settled until (much) later when > canvassing broader community input. > > -Doug > From aleksey.shipilev at oracle.com Mon Feb 17 00:56:56 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 17 Feb 2014 12:56:56 +0400 Subject: [jmm-dev] Enforcing access atomicity (benchmarks) In-Reply-To: <52FE1EF0.30006@cs.oswego.edu> References: <52F92609.5070407@oracle.com> <52FE1EF0.30006@cs.oswego.edu> Message-ID: <5301CED8.6040706@oracle.com> On 02/14/2014 05:49 PM, Doug Lea wrote: > On 02/10/2014 02:18 PM, Aleksey Shipilev wrote: >> Here you go, the early estimates for enforcing access atomicity: >> http://shipilev.net/blog/2014/all-accesses-are-atomic/ >> >> Go straight to "Conclusion" for TL;DR summary. In short, in 2014, most >> platforms are already able to pull off 64-bit accesses > > Are there ANY platforms that do or could otherwise support JVMs > and for which there is no reasonable way to conform? > > The answer depends on how far you want to stretch "reasonable" > (worst case, JVMs could insert locks), but some 32bit versions > of PPC and MIPS seem problematic. Also, it might be the case that > floating-point (double) on ARM (even ARMv8) requires special handling. I would *really* like feedback from the ARM folks on this, because the performance experiments need properly functional and correct access code for all the platforms. While we haven't found empirical evidence that the code we have now is broken, that might just be luck. 
> An argument for not simplifying is that programs shouldn't > have any races where non-atomicity would be observable anyway. > It's a pretty good argument, although not very convincing > to some developers writing code for monitoring and profiling, > as well as some numerical heuristics. They often could care less > about race-freedom so long as they arrive at empirically > acceptable approximations of reality. And in any case, > the presence of potential non-atomicity causing reads of a > long or double to rarely take crazy/wild values only > on uncommon platforms is not a very nice way to alert people > of problems. +1. Having written some sophisticated high-performance code in Java, I am excited about access atomicity guarantees when dealing with eventually-consistent code. I would be more relaxed about the long/double exception for ordinary loads/stores as long as there is a way to achieve *only* access atomicity, without burdening myself with the memory semantics around volatiles. That's one of the things my post was trying to showcase: the add-on of volatile semantics significantly increases the costs compared to "just" the atomic access. > Another argument for not simplifying is that (as Brian mentioned) > we expect JDK9 to support wider value types of some sort; > surely including those for which no processor guarantees > atomicity. So there will always be atomicity disclaimers of > some kind somewhere. I think there is a slight bias in the way we ask the question. We call it "drop the access atomicity exception", while we really should discuss "requiring access atomicity for 64-bit types as well". The argument about value types surely fits the former discussion, because why drop the exception if we are about to reintroduce it? The latter is more interesting: where do we draw the line between which accesses are atomic and which are not? -Aleksey. 
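For readers unfamiliar with the 64-bit exception being debated in this subthread: JLS 17.7 permits a non-volatile long or double write to be treated as two separate 32-bit writes. The sketch below simulates, with plain arithmetic, the kind of never-written "crazy/wild" value a racing reader could then observe; it does not itself perform a racy access, and the helper name is invented.

```java
public class Tearing {
    // Simulate the non-atomic treatment JLS 17.7 permits for non-volatile
    // long/double: a 64-bit write may be split into two 32-bit halves, so
    // a racing reader can observe one new half combined with one old half.
    static long tornRead(long oldVal, long newVal) {
        long hiNew = newVal & 0xFFFFFFFF00000000L; // new upper half already written
        long loOld = oldVal & 0x00000000FFFFFFFFL; // old lower half still visible
        return hiNew | loOld;
    }

    public static void main(String[] args) {
        long before = 0L;
        long after  = -1L;  // all 64 bits set
        // The torn result is a value that was never written by any thread:
        System.out.println(Long.toHexString(tornRead(before, after)));
    }
}
```

The printed value, ffffffff00000000, is neither 0 nor -1: exactly the out-of-thin-air-looking garbage that the exception allows and that dropping it would rule out.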
From dl at cs.oswego.edu Mon Feb 17 06:00:51 2014 From: dl at cs.oswego.edu (Doug Lea) Date: Mon, 17 Feb 2014 09:00:51 -0500 Subject: [jmm-dev] Enforcing access atomicity (benchmarks) In-Reply-To: <53015C0A.204@oracle.com> References: <52F92609.5070407@oracle.com> <52FE1EF0.30006@cs.oswego.edu> <53015C0A.204@oracle.com> Message-ID: <53021613.6020409@cs.oswego.edu> On 02/16/2014 07:47 PM, David Holmes wrote: > > I think the non-atomic treatment of long/double unless volatile is so isolated > in the memory-model, and so stand-alone and simple, that removing it would be > imperceptible in the overall complexity of the JMM. > True in the JSR133 JMM, but for JMM9, I think we'd like to equate non-volatile access with relaxed mode of volatile. To preserve this, we need more modes. Maybe we do anyway. Stealing a term from clang (http://llvm.org/docs/Atomics.html), we could use "monotonic", which combines atomicity with not allowing time to run backwards. Here's how this might fit into the One Page Memory Model (which includes a big cheat for now) ... 1. A program consists of one or more .class files containing bytecodes, typically translated from a source language. A program starts by accessing an object constructed in accord with a given .class file. 2. Any read (i.e., a get* bytecode) returns a value written (i.e., a put* bytecode) by some thread, as constrained by rules below, or in the absence of well-founded constraints, zero (0/0.0/false/null). All accesses are guaranteed atomic except for those of long or double variables that are either non-volatile or accessed in Relaxed mode. 3. The order of (get* put*) bytecodes accessing ordinary variables or access invocations in Relaxed mode for volatile variables imposes no ordering constraints on execution except for the following: a. Indirect read ordering is always preserved; i.e., any such read is equivalent to getDependent(). b. 
Field assignments within constructors are always ordered before subsequent program-order assignments of references to the constructed objects; i.e., each such store is equivalent to field.setDependent(constructedObject). c. Data-race-free programs using monitor locks are sequentially consistent. [The big cheat for now.] 4. Explicit ordering control is available using volatiles, fence methods, and .volatile expressions, as follows, illustrated using the .volatile form: * Monotonic mode. v.getMonotonic() and v.setMonotonic(x): given a: v.getMonotonic() and b: v.getMonotonic(), with a before b in bytecode order, it is not the case that b is ordered before a in any execution. [Etc.] * Acquire/Release mode. v.getAcquire() and v.setRelease(x) [explain...] * Indirection dependent mode. v.getDependent(ref) and v.setDependent(x, ref) [explain as scoped acquire/release...] * Sequential mode. v.getSequential() and v.setSequential(x) [explain...] 5. Other misc: Thread.start etc. From Peter.Sewell at cl.cam.ac.uk Mon Feb 17 08:45:59 2014 From: Peter.Sewell at cl.cam.ac.uk (Peter Sewell) Date: Mon, 17 Feb 2014 16:45:59 +0000 Subject: [jmm-dev] thin-air summary Message-ID: Dear all, Mark Batty and I have written a short note trying to summarise the thin-air problem as crisply as we can: http://www.cl.cam.ac.uk/~pes20/cpp/notes42.html Comments welcome, of course. We've also been thinking here about possible approaches; hopefully we'll have another note about that in a few days. 
Peter From david.holmes at oracle.com Mon Feb 17 20:14:12 2014 From: david.holmes at oracle.com (David Holmes) Date: Tue, 18 Feb 2014 14:14:12 +1000 Subject: [jmm-dev] thin-air summary In-Reply-To: References: Message-ID: <5302DE14.8080709@oracle.com> Hi Peter, On 18/02/2014 2:45 AM, Peter Sewell wrote: > Dear all, > > Mark Batty and I have written a short note trying to summarise the > thin-air problem as crisply as we can: > > http://www.cl.cam.ac.uk/~pes20/cpp/notes42.html > > Comments welcome, of course. We've also been thinking here about > possible approaches; hopefully we'll have another note about that in a > few days. I'm a lay-person when it comes to the formalities of all this but given: 4 Example LB+ctrl+data+ctrl-double (language must allow) r1 = x; // reads 42 if (r1 == 42) y = r1; --------------------------- r2 = y; // reads 42 if (r2 == 42) x = 42; else x = 42; the compiler optimization would elide the conditional and simply do the store, so this then reduces to: r1 = x; // reads 42 if (r1 == 42) y = r1; --------------------------- r2 = y; // reads 42 x = 42; which, as stated, now matches "3 Example LB+ctrl+data+po". But in that case I don't understand how you can say that for "5 Example LB+ctrl+data+ctrl-single" "the candidate execution that we want to forbid here is identical to the execution of the previous example that we have to allow" - as 5 has a conditional and 4 no longer does, hence they are no longer the same? Further/similarly, it would seem based on these examples (and I realize that there may well be other examples that show otherwise) that the straw-man of prohibiting the (rf+dep) cycle would hold if you first reduced the code to its "minimal" form, i.e. once 4 is reduced to 3 there is no cycle. Of course I may have just shifted the problem into one of being able to define what a "minimal" form is. 
:) David ----- > Peter > From Peter.Sewell at cl.cam.ac.uk Tue Feb 18 00:32:51 2014 From: Peter.Sewell at cl.cam.ac.uk (Peter Sewell) Date: Tue, 18 Feb 2014 08:32:51 +0000 Subject: [jmm-dev] thin-air summary In-Reply-To: <5302DE14.8080709@oracle.com> References: <5302DE14.8080709@oracle.com> Message-ID: On 18 February 2014 04:14, David Holmes wrote: > Hi Peter, > > > On 18/02/2014 2:45 AM, Peter Sewell wrote: >> >> Dear all, >> >> Mark Batty and I have written a short note trying to summarise the >> thin-air problem as crisply as we can: >> >> http://www.cl.cam.ac.uk/~pes20/cpp/notes42.html >> >> Comments welcome, of course. We've also been thinking here about >> possible approaches; hopefully we'll have another note about that in a >> few days. > > > I'm a lay-person when it comes to the formalities of all this but given: > > 4 Example LB+ctrl+data+ctrl-double (language must allow) > > r1 = x; // reads 42 > if (r1 == 42) > y = r1; > --------------------------- > r2 = y; // reads 42 > if (r2 == 42) > x = 42; > else > x = 42; > > the compiler optimization would elide the conditional and simply do the > store, so this then reduces to: > > r1 = x; // reads 42 > if (r1 == 42) > y = r1; > --------------------------- > r2 = y; // reads 42 > x = 42; > > which, as stated, now matches "3 Example LB+ctrl+data+po". But in that case > I don't understand how you can say that for "5 Example > LB+ctrl+data+ctrl-single" "the candidate execution that we want to forbid > here is identical to the execution of the previous example that we have to > allow" - as 5 has a conditional and 4 no longer does, hence they are no > longer the same ? What we're showing here is that "shows that thin-air executions cannot be forbidden by any per-candidate-execution condition **using the C/C++11 notion of candidate executions**". That notion (summarised in Section 2.5 of the Batty et al. 
POPL11 paper ) is not a trace of machine instructions, but instead a set of memory access events together with various relations over them. Branches don't appear explicitly, and those relations don't include control dependency. > Further/similarly, it would seem based on these examples (and I realize that > there may well be other examples that show otherwise) that the straw-man of > prohibiting the (rf+dep) cycle would hold if you first reduced the code to > its "minimal" form ie once 4 is reduced to 3 there is no cycle. > > Of course I may have just shifted the problem into one of being able to > define what a "minimal" form is. :) indeed... Peter From dl at cs.oswego.edu Wed Feb 19 05:54:00 2014 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 19 Feb 2014 08:54:00 -0500 Subject: [jmm-dev] Enhanced Volatiles Message-ID: <5304B778.3040104@cs.oswego.edu> Just as an FYI, I submitted the JEP pasted below, that includes a few small updates reflecting feedback on openjdk lists. ... Title: Enhanced Volatiles Author: Doug Lea Organization: SUNY Oswego Created: 2014/01/06 Type: Feature State: Draft Exposure: Open Component: core/libs core/lang vm/rt Scope: JDK Discussion: core-libs-dev at openjdk.java.net compiler-dev at openjdk.java.net hotspot-dev at openjdk.java.net Start: 2014/Q1 Depends: JEP-188 Effort: M Duration: L Template: 1.0 Reviewed-by: Endorsed-by: Brian Goetz Summary ------- This JEP results in a means for programmers to invoke the equivalents of java.util.concurrent.atomic methods on object fields. Motivation ---------- As concurrent and parallel programming in Java continue to expand, programmers are increasingly frustrated by not being able to use Java constructions for arranging atomic or ordered operations for the fields of individual classes; for example atomically incrementing a "count" field. 
Until now the only ways to achieve these effects were to use a stand-alone AtomicInteger (adding both space overhead and additional concurrency issues to manage indirection) or, in some situations, to use atomic FieldUpdaters (often encountering more overhead than the operation itself), or to use JVM Unsafe intrinsics. Because intrinsics are preferable on performance grounds, their use has been increasingly common, to the detriment of safety and portability. Without this JEP, these problems are expected to become worse as atomic APIs expand to cover additional access consistency policies (aligned with the recent C++11 memory model) as part of Java Memory Model revisions. Description ----------- The target solution requires a syntax enhancement, a few library enhancements, and compiler support. We model the extended operations on volatile integers via an interface VolatileInt, that also captures the functionality of AtomicInteger (which will also be updated to reflect Java Memory Model revisions as part of this JEP). A tentative version is below. Similar interfaces are needed for other primitive and reference types. We then enable access to corresponding methods for fields using the ".volatile" prefix. For example: class Usage { volatile int count; int incrementCount() { return count.volatile.incrementAndGet(); } } The ".volatile" syntax is slightly unusual, but we are confident that it is syntactically unambiguous and semantically specifiable. New syntax is required to avoid ambiguities with existing usages, especially for volatile references -- invocations of methods on the reference versus the referent would be indistinguishable. The ".volatile" prefix introduces a scope for operations on these "L-values", not their retrieved contents. However, just using the prefix itself without a method invocation (as in "count.volatile;") would be meaningless and illegal. We also expect to allow volatile operations on array elements in addition to fields. 
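For comparison with the proposed `count.volatile.incrementAndGet()` form, here is the Usage example written with today's (real) java.util.concurrent.atomic.AtomicIntegerFieldUpdater API that the JEP text mentions -- a sketch of the idiom the ".volatile" syntax would replace, including the reflective setup and per-call receiver checks that account for its overhead.

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

// The current field-updater idiom the proposed ".volatile" syntax would
// replace. The updater is created reflectively once per class; the
// target field must be a volatile int accessible to the caller, and
// every call pays a dynamic receiver/access check -- part of the
// overhead the JEP text mentions.
public class Usage {
    volatile int count;

    private static final AtomicIntegerFieldUpdater<Usage> COUNT =
        AtomicIntegerFieldUpdater.newUpdater(Usage.class, "count");

    // Under the JEP this would be: return count.volatile.incrementAndGet();
    public int incrementCount() {
        return COUNT.incrementAndGet(this);
    }
}
```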
Enforcement of semantic restrictions (for example attempted usages for "final" fields) will require compiler support. The main task is to translate these calls into corresponding JVM intrinsics. The most likely option is for the source compiler to use method handles. This and other techniques are known to suffice, but are subject to further exploration. Minor enhancements to intrinsics and a few additional JDK library methods may also be needed. Here is a tentative VolatileInt interface. Those for other types are similar. The final released versions will surely differ, subject to the results of JEP-188. interface VolatileInt { int get(); int getRelaxed(); int getAcquire(); int getSequential(); void set(int x); void setRelaxed(int x); void setRelease(int x); void setSequential(int x); int getAndSet(int x); boolean compareAndSet(int e, int x); boolean compareAndSetAcquire(int e, int x); boolean compareAndSetRelease(int e, int x); boolean weakCompareAndSet(int e, int x); boolean weakCompareAndSetAcquire(int e, int x); boolean weakCompareAndSetRelease(int e, int x); int getAndAdd(int x); int addAndGet(int x); int getAndIncrement(); int incrementAndGet(); int getAndDecrement(); int decrementAndGet(); } This proposal focuses on the control of atomicity and ordering for single variables. We expect the resulting specifications to be amenable for extension in natural ways for additional primitive-like value types, if they are ever defined for Java. However, it is not a general-purpose transaction mechanism for controlling accesses and updates to multiple variables. Alternative forms for expressing and implementing such constructions may be explored in the course of this JEP, and may be the subject of further JEPs. Alternatives ------------ We considered instead introducing new forms of "value type" that support volatile operations. However, this would be inconsistent with properties of other types, and would also require more effort for programmers to use. 
We also considered expanding reliance on java.util.concurrent.atomic FieldUpdaters, but their dynamic overhead and usage limitations make them unsuitable. Several other alternatives (including those based on field references) have been raised and dismissed as unworkable on syntactic, efficiency, and/or usability grounds over the many years that these issues have been discussed. Risks and Assumptions --------------------- We are confident of feasibility. However, we expect that it will require more experimentation to arrive at compilation techniques that result in efficient enough implementation for routine use in the performance-critical contexts where these constructs are most often needed. The use of method handles may be impacted by and may impact JVM method handle support. Impact ------ A large number of usages in java.util.concurrent (and a few elsewhere in JDK) could be simplified and updated to use this support. From ajeffrey at bell-labs.com Wed Feb 19 12:07:02 2014 From: ajeffrey at bell-labs.com (Alan Jeffrey) Date: Wed, 19 Feb 2014 14:07:02 -0600 Subject: [jmm-dev] LTL specification of relaxed memory Message-ID: <53050EE6.3050507@bell-labs.com> Hi everyone, I've been messing around with proving the DRF theorem for a Mazurkiewicz trace model of relaxed memory. I'm pretty close to convincing the Agda theorem prover that this is true; most of the time has been spent coming up with good definitions. I think the definitions are in a state worth sharing... Recall that a Mazurkiewicz trace model consists of an alphabet Sigma, with a binary "independence" relation I. This induces an equivalence ~ on Sigma^* given as the smallest congruence containing: ab ~ ba (for any (a,b) in I) We can define a variant of past time Linear Temporal Logic, whose semantics is given as subsets of Sigma^*. 
The usual operators of LTL are:

  epsilon not in (prev phi)
  sa in (prev phi) whenever s in phi
  epsilon in (wprev phi)
  sa in (wprev phi) whenever s in phi
  (always phi) = (phi and wprev(always phi))
  (sometime phi) = (phi or prev(sometime phi))
  (phi since psi) = (psi or (phi and prev(phi since psi)))

The interesting new operator is a permutation operator:

  s in (permute phi) whenever t in phi for some s ~ t

From permute we can define a "previous state up to permutation" as:

  (pprev phi) = exists(a) (a and permute((not a) since (a and prev(phi))))

Unpacking this: sa in (pprev phi) whenever sa ~ t a u for some t and u, where a does not occur in u and t is in phi. LTL can be used to specify the relaxed memory model we're interested in (I think). Making use of two new binary relations on Sigma:

  C thought of as "read-write conflict"
  J thought of as "read-write justification"

the canonical example being:

  (W x=v, W x=w) in C
  (R x=v, W x=w) in C
  (W x=v, R x=w) in C
  (W x=v, R x=v) in J

The LTL spec for sequential consistency is:

  start = wprev false
  justified(a) = start or (not(C(a)) since J(a))
  sconsistent = always forall(a) (a implies prev(justified(a)))

Unpacking this...
* start is only true on the empty trace epsilon.
* justified(a) is true if either a is the initial action or we can find a past action b which justifies a, and there is no action c between a and b in conflict with a.
* sconsistent is true if every action is preceded by a justifier.

After all this, the LTL spec for relaxed consistency is:

  rconsistent = always forall(a) (a implies pprev(justified(a)))

that is, the only difference between sequential consistency and relaxed consistency is whether we use "prev" (previous state) or "pprev" (previous state up to permutation). 
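As a sanity check, the sconsistent condition can be mechanised for straight-line read/write traces. The sketch below uses an ad-hoc Java encoding and one possible reading of the relations above (a read conflicts only with writes of a different value to the same variable, and only reads require justification); it rejects the canonical relaxed-memory trace s while accepting an equivalent permutation of it.

```java
import java.util.List;

// A sketch (invented encoding, one reading of C and J) mechanising the
// sconsistent condition for read/write actions: every read must be
// preceded by a justifying write of the same value to the same
// variable, with no conflicting write to that variable in between --
// the backwards reading of (not(C(a)) since J(a)).
public class SConsistent {
    record Act(int thread, char kind, String loc, int val) {} // kind: 'R' or 'W'

    // One reading of C: a read conflicts with writes of a different
    // value to the same variable.
    static boolean conflicts(Act read, Act c) {
        return c.kind() == 'W' && c.loc().equals(read.loc()) && c.val() != read.val();
    }

    // J: a write justifies a read of the same variable and value.
    static boolean justifies(Act w, Act read) {
        return w.kind() == 'W' && w.loc().equals(read.loc()) && w.val() == read.val();
    }

    public static boolean sconsistent(List<Act> trace) {
        for (int i = 0; i < trace.size(); i++) {
            Act a = trace.get(i);
            if (a.kind() != 'R') continue;
            boolean justified = false;
            for (int j = i - 1; j >= 0; j--) {   // scan backwards from the read
                Act b = trace.get(j);
                if (justifies(b, a)) { justified = true; break; }
                if (conflicts(a, b)) break;      // conflict found before any justifier
            }
            if (!justified) return false;
        }
        return true;
    }

    // The canonical relaxed-memory trace s, and a permutation of it
    // (equivalent under independence) that is sequentially consistent.
    public static final List<Act> s = List.of(
        new Act(1, 'W', "x", 0), new Act(1, 'W', "x", 1), new Act(1, 'W', "y", 1),
        new Act(2, 'R', "y", 1), new Act(2, 'R', "x", 0));
    public static final List<Act> sPermuted = List.of(
        new Act(1, 'W', "x", 0), new Act(2, 'R', "x", 0), new Act(1, 'W', "x", 1),
        new Act(1, 'W', "y", 1), new Act(2, 'R', "y", 1));
}
```

On s, the read (2: R x=0) finds the conflicting (1: W x=1) before its justifier (1: W x=0), so the check fails; on the permuted trace every read is immediately justified.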
For example, the canonical trace for relaxed memory is: s = (1: W x=0) (1: W x=1) (1: W y=1) (2: R y=1) (2: R x=0) We can double-check that s not in sconsistent (since the justifier for the action (2: R x=0) is (1: W x=0) but there is an intervening action (1: W x=1) which is in conflict with (2: R x=0)). On the other hand s is in rconsistent, since we have: s ~ (1: W x=0) (2: R x=0) (1: W x=1) (1: W y=1) (2: R y=1) and so (1: W x=0) can act as the justifier up to permutation. Just to finish off the problem spec, we can define the DRF property as: datarace = sometime exists(a) (a and pprev(C(a))) drf = sconsistent implies not datarace so the problem is to find conditions on P such that if [P implies drf] and [P implies rconsistent] then [P implies sconsistent]. Alan. From dl at cs.oswego.edu Wed Feb 19 16:56:35 2014 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 19 Feb 2014 19:56:35 -0500 Subject: [jmm-dev] LTL specification of relaxed memory In-Reply-To: <53050EE6.3050507@bell-labs.com> References: <53050EE6.3050507@bell-labs.com> Message-ID: <530552C3.5010008@cs.oswego.edu> On 02/19/2014 03:07 PM, Alan Jeffrey wrote: > datarace = sometime exists(a) (a and pprev(C(a))) > drf = sconsistent implies not datarace > > so the problem is to find conditions on P such that if [P implies drf] and [P > implies rconsistent] then [P implies sconsistent]. > Where, as a first step, the conditions amount to some representation of lock-based access? -Doug From ajeffrey at bell-labs.com Thu Feb 20 07:34:40 2014 From: ajeffrey at bell-labs.com (Alan Jeffrey) Date: Thu, 20 Feb 2014 09:34:40 -0600 Subject: [jmm-dev] LTL specification of relaxed memory In-Reply-To: <530552C3.5010008@cs.oswego.edu> References: <53050EE6.3050507@bell-labs.com> <530552C3.5010008@cs.oswego.edu> Message-ID: <53062090.9020505@bell-labs.com> Locking, volatiles, etc. should be treated by the independence and justification relations. 
For example a simple model of locks would be something like: (m: (un)lock p) I (n: (un)lock q) when m != n and p != q (m: (un)lock p) I (n: R/W x=v) when m != n (m: lock p) J (n: unlock p) (m: unlock p) J (n: lock p) (init) J (n: lock p) I'm hoping that there's a separation of concerns here, where the DRF theorem can be proved for any suitable I, J and C, and that a variety of memory models can be investigated by varying I, J and C. A. On 02/19/2014 06:56 PM, Doug Lea wrote: > On 02/19/2014 03:07 PM, Alan Jeffrey wrote: > >> datarace = sometime exists(a) (a and pprev(C(a))) >> drf = sconsistent implies not datarace >> >> so the problem is find conditions on P such that if [P implies drf] >> and [P >> implies rconsistent] then [P implies sconsistent]. >> > > Where, as a first step, the conditions amount to some representation > of lock-based access? > > -Doug > > From luc.maranget at inria.fr Thu Feb 20 08:50:00 2014 From: luc.maranget at inria.fr (Luc Maranget) Date: Thu, 20 Feb 2014 17:50:00 +0100 Subject: [jmm-dev] thin-air summary In-Reply-To: References: Message-ID: <20140220165000.GA29036@yquem.inria.fr> Dear all, We have extended our litmus testing infrastructure so as to handle C11 (small) programs. 
I have just run Mark and Peter's examples on one ARM system (DragonBoard, running some old Android) with an experimental gcc cross-compiler with -O2 (arm-linux-gnueabi-gcc (GCC) 4.9.0 20140213 (experimental)). We observe exactly the results predicted by Peter in his note on the first five examples (we cannot handle the sixth example yet):

                         |Kind  | APQ8060
-------------------------|------|--------------
LB                       |Allow | Ok, 332/100M
LB+datas                 |Forbid| Ok, 0/100M
LB+ctrl+data+po          |Allow | Ok, 2/100M
LB+ctrl+data+ctrl-double |Allow | Ok, 6/100M
LB+ctrl+data+ctrl-single |Forbid| Ok, 0/100M

--Luc > Dear all, > > Mark Batty and I have written a short note trying to summarise the > thin-air problem as crisply as we can: > > http://www.cl.cam.ac.uk/~pes20/cpp/notes42.html > > Comments welcome, of course. We've also been thinking here about > possible approaches; hopefully we'll have another note about that in a > few days. > > Peter -- Luc From ajeffrey at bell-labs.com Thu Feb 20 15:17:33 2014 From: ajeffrey at bell-labs.com (Alan Jeffrey) Date: Thu, 20 Feb 2014 17:17:33 -0600 Subject: [jmm-dev] LTL specification of relaxed memory In-Reply-To: <53062090.9020505@bell-labs.com> References: <53050EE6.3050507@bell-labs.com> <530552C3.5010008@cs.oswego.edu> <53062090.9020505@bell-labs.com> Message-ID: <53068D0D.1040401@bell-labs.com> As promised, I now have a proof of the DRF theorem using the LTL formulation of DRF and relaxed consistency. The proof has gone through the Agda proof checker. I needed an auxiliary definition, of "compatible action". Define: b is compatible with a whenever (a I c) implies (b I c) for any c and (a C c) implies (b C c) for any c Note that if b is compatible with a and sa has a data race, then sb has a data race. 
The requirements on I, J and C are pretty tame: * I is symmetric and irreflexive * C is symmetric * if b in J(a) and c in C(a) then c in C(b) The result is that for any set of traces S which satisfies the following conditions: * S is prefix-closed (that is if sa in S then s in S) * S is justification-enabled (that is if sa in S then sb in S for some b compatible with a and b is justified by s) * S is DRF (that is any sequentially consistent s has no data race) * S is relaxed consistent we have: * S is sequentially consistent Note that there is no notion of commitment or having to use multiple executions for justification. The next step is to check this definition against the torture test... A. On 02/20/2014 09:34 AM, Alan Jeffrey wrote: > Locking, volatiles, etc. should be treated by the independence and > justification relations. For example a simple model of locks would be > something like: > > (m: (un)lock p) I (n: (un)lock q) when m != n and p != q > (m: (un)lock p) I (n: R/W x=v) when m != n > (m: lock p) J (n: unlock p) > (m: unlock p) J (n: lock p) > (init) J (n: lock p) > > I'm hoping that there's a separation of concerns here, where the DRF > theorem can be proved for any suitable I, J and C, and that a variety of > memory models can be investigated by varying I, J and C. > > A. > > On 02/19/2014 06:56 PM, Doug Lea wrote: >> On 02/19/2014 03:07 PM, Alan Jeffrey wrote: >> >>> datarace = sometime exists(a) (a and pprev(C(a))) >>> drf = sconsistent implies not datarace >>> >>> so the problem is find conditions on P such that if [P implies drf] >>> and [P >>> implies rconsistent] then [P implies sconsistent]. >>> >> >> Where, as a first step, the conditions amount to some representation >> of lock-based access? 
>> >> -Doug >> >> From dl at cs.oswego.edu Sat Feb 22 07:59:21 2014 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 22 Feb 2014 10:59:21 -0500 Subject: [jmm-dev] Sequential Consistency Message-ID: <5308C959.80502@cs.oswego.edu> Another in the continuing series of issues to contemplate: There's a tension between those who believe that all "correct" programs are provably sequentially consistent versus those who consider sequential consistency as a goal only of lock-based programs; not necessarily of those using lock-free techniques and/or are components of distributed systems. (see for example Herlihy & Shavit's "The Art of Multiprocessor Programming" http://store.elsevier.com/The-Art-of-Multiprocessor-Programming/Maurice-Herlihy/isbn-9780080569581/) No one disagrees about the need for a memory model guaranteeing that DRF lock-based programs are sequentially consistent. Other cases may be less clear cut. For the most famous example: Can a program using non-lock-based techniques (for example, using Java volatile loads/stores) be "correct" if it fails some variant of the IRIW test? Is IRIW conformance an unnecessary action-at-a-distance by-product of SC, or does it play some intrinsically useful role in assuring correctness? IRIW is not the only example of a case in which SC imposes conditions that some programmers in some contexts seem not to care about. But it is most famous because it so clearly impacts the nature and cost of mappings (for various modes of load, store, and CAS) on some existing processors as well as potential mappings on future processors. I won't yet try to summarize different positions and rationales, but for now just invite further discussion. -Doug PS: As a reminder, here's IRIW. 
Given global x, y: Thread 1: x = 1; Thread 2: y = 1; Thread 3: r1 = x; r2 = y; // sees r1 == 1, r2 == 0 Thread 4: r3 = y; r4 = x; // sees r3 == 1, r4 == 0 From jeremymanson at google.com Sat Feb 22 11:58:01 2014 From: jeremymanson at google.com (Jeremy Manson) Date: Sat, 22 Feb 2014 11:58:01 -0800 Subject: [jmm-dev] Sequential Consistency In-Reply-To: <5308C959.80502@cs.oswego.edu> References: <5308C959.80502@cs.oswego.edu> Message-ID: On Sat, Feb 22, 2014 at 7:59 AM, Doug Lea
wrote: > > Another in the continuing series of issues to contemplate: > > There's a tension between those who believe that all "correct" > programs are provably sequentially consistent versus those who > consider sequential consistency as a goal only of lock-based programs; > not necessarily of those using lock-free techniques and/or are > components of distributed systems. (see for example Herlihy & Shavit's > "The Art of Multiprocessor Programming" > http://store.elsevier.com/The-Art-of-Multiprocessor- > Programming/Maurice-Herlihy/isbn-9780080569581/) > > Who falls into the first category? A "correct" program is one where the behavior matches the spec, and if that can be done with non-SC behavior (which it often can), then the conversation is over. I think the major limiting factor for volatiles and atomics supporting SC (which is how I read what you are asking) is whether it can be done reasonably (i.e., with acceptable performance) on the target platforms. If it can, then for everyone's sanity (and in keeping with the desire for Java to have somewhat accessible semantics for stuff like this), it makes sense to specify them as being SC. If it can't, then (IMO) the IRIW-alike idioms are few and far between enough that it makes no sense to try to decrease everyone's performance to support SC for them. Jeremy From john.r.rose at oracle.com Sat Feb 22 14:55:11 2014 From: john.r.rose at oracle.com (John Rose) Date: Sat, 22 Feb 2014 14:55:11 -0800 Subject: [jmm-dev] Sequential Consistency In-Reply-To: <5308C959.80502@cs.oswego.edu> References: <5308C959.80502@cs.oswego.edu> Message-ID: <326E34A2-0606-4100-BA43-42A3DAF4700E@oracle.com> On Feb 22, 2014, at 7:59 AM, Doug Lea
wrote: > IRIW conformance an unnecessary action-at-a-distance > by-product of SC, or does it play some intrinsically useful role in > assuring correctness It sounds like you are asking someone to speak up for the usefulness of SC when the two bits of test state (global x, y) are at an arbitrarily large "distance". Though I am not well-read on this stuff, I'll venture related questions that seem relevant to the JMM and that appealing idea of distance. If x and y are related because they represent something coherent, SC could act as a fail-safe after everybody loses track of their relation. It would be best not to lose track, though. What are the cases where there is a sufficiently small "distance" that programmers would want SC? One example would be an actor (variables used only by one thread). Another would be an object under a mutex. Or the two variables ("globals") are not under a mutex but are in the same object (cache line?) being racily used. Or the variables are related more tenuously but all the threads agree in a single safely published access path to both. Is there an idea of "distance" or locality for the JMM that would be useful to programmers? Would it be useful to provide programmers SC within a singly-rooted connected subgraph of Java heap nodes? Can we define such subgraphs in a way that is not sensitive to mutations in the connecting references? Can we use ideas of reference variables which are immutable (time invariant) or have some monotonicity (set-once)? Or does the research demonstrate IRIW-like anomalies for all sorts of "distances"? -- 
John From aleksey.shipilev at oracle.com Sun Feb 23 01:06:42 2014 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Sun, 23 Feb 2014 13:06:42 +0400 Subject: [jmm-dev] Sequential Consistency In-Reply-To: <5308C959.80502@cs.oswego.edu> References: <5308C959.80502@cs.oswego.edu> Message-ID: <5309BA22.9090900@oracle.com> On 02/22/2014 07:59 PM, Doug Lea wrote: > Other cases may be less clear cut. For the most famous example: Can a > program using non-lock-based techniques (for example, using Java > volatile loads/stores) be "correct" if it fails some variant of the IRIW > test? Is IRIW conformance an unnecessary action-at-a-distance > by-product of SC, or does it play some intrinsically useful role in > assuring correctness? IMO, we are on thin ice here. The absence of counter-examples showing how non-SC behaviors for IRIW-like constructions demolish correctness at larger scale does not mean we won't find such a case after the spec solidifies. In other words, absence of evidence is not evidence of absence. I, for one, would not like to wake up to another double-checked-locking-like calamity because we allowed a particular sneaky behavior in the name of performance. And yes, being the performance guy, I still think strong correctness wins over performance ten times over. The relaxations are welcome, but only in a few very constrained places, where you are able to relatively easily fix/rewrite the bad usages or even provide stronger ad-hoc semantics. In other words, the things you allow in a library (e.g. Linux RCU) are not the things you want to burn into a language spec. > IRIW is not the only example of a case in which SC imposes conditions > that some programmers in some contexts seem not to care about. But > it is most famous because it so clearly impacts the nature and cost of > mappings (for various modes of load, store, and CAS) on some existing > processors as well as potential mappings on future processors. 
Being the language guy, I think the hardware not being able to provide the sane SC primitives should pay up the costs. The hardware which makes it relatively easy to implement the non-tricky language memory model should be in the sweet spot. -Aleksey. From dl at cs.oswego.edu Sun Feb 23 05:59:41 2014 From: dl at cs.oswego.edu (Doug Lea) Date: Sun, 23 Feb 2014 08:59:41 -0500 Subject: [jmm-dev] Sequential Consistency In-Reply-To: <5308C959.80502@cs.oswego.edu> References: <5308C959.80502@cs.oswego.edu> Message-ID: <5309FECD.3000502@cs.oswego.edu> On 02/22/2014 10:59 AM, Doug Lea wrote: > I won't yet try to summarize different positions and rationales, > but for now just invite further discussion. That was too cowardly. Here's a shot at summarizing some of the historical context. > PS: As a reminder, here's IRIW. Given global x, y: > Thread 1: x = 1; > Thread 2: y = 1; > Thread 3: r1 = x; r2 = y; // sees r1 == 1, r2 == 0 > Thread 4: r3 = y; r4 = x; // sees r3 == 1, r4 == 0 (This outcome is not allowed by SC.) The IRIW example is a fun one in part because it is not especially intuitive. Some people do not at first think that it is a result forced by SC. I occasionally present this in courses, and most students' first reaction is that you should use a common lock in all threads if you want to ensure agreement about order of x and y here. The fact that you don't need to strikes some (but by no means all) people as a magical/spooky property of SC. This example (and variants of it) was also among those first driving research into more efficient distributed multicast protocols in the late 80's/early 90's (when I first encountered consistency policies and protocols). Maintaining this property of SC is much more expensive in a distributed setting than other consistency policies that are sufficient to implement most distributed algorithms. 
SC normally requires blocking on O(#hosts) round-trips per message in the absence of failure, and heavy (and fallible) failure-recovery mechanics. Other policies, including "causal broadcast" (guaranteeing only transitivity of read-write happens-before in producer-consumer chains) usually don't need to wait out all the round-trips (but still require buffering). While the situation is a little better for multiprocessor/multicore designers, it is not surprising that they occasionally propose (as did AMD and then Intel five years or so ago) schemes that are by default weaker (but still with full-SC modes). Arguments for not giving in to the whinings of implementors include those claiming that uniform SC requirements enable better tools, simpler proofs of correctness, more understandable models, and the reduction of counterintuitive orderings. And that no single "natural" property has emerged to replace it, despite a fair amount of trying. -Doug From boehm at acm.org Sun Feb 23 22:52:42 2014 From: boehm at acm.org (Hans Boehm) Date: Sun, 23 Feb 2014 22:52:42 -0800 Subject: [jmm-dev] Sequential Consistency In-Reply-To: <5309FECD.3000502@cs.oswego.edu> References: <5308C959.80502@cs.oswego.edu> <5309FECD.3000502@cs.oswego.edu> Message-ID: I think it's that last comment here that needs to be emphasized: We don't really have a viable candidate property to replace SC, that's anywhere near as easy to reason about and provides significant performance advantages. Several people, including Doug, looked hard for such things when we were talking about C++. As far as I can tell, everyone intuitively wants to reason about thread behavior in terms of interleaving thread actions, possibly after allowing some reorderings within threads. IRIW seems inherently incompatible with that, which might be a partial explanation of why it's difficult to reason directly with consistency properties that allow it. Hans On Sun, Feb 23, 2014 at 5:59 AM, Doug Lea
wrote: > On 02/22/2014 10:59 AM, Doug Lea wrote: > >> I won't yet try to summarize different positions and rationales, >> but for now just invite further discussion. >> > > That was too cowardly. Here's a shot at summarizing some of the > historical context. > > PS: As a reminder, here's IRIW. Given global x, y: >> Thread 1: x = 1; >> Thread 2: y = 1; >> Thread 3: r1 = x; r2 = y; // sees r1 == 1, r2 == 0 >> Thread 4: r3 = y; r4 = x; // sees r3 == 1, r4 == 0 >> > > (This outcome is not allowed by SC.) > > The IRIW example is a fun one in part because it is not especially > intuitive. Some people do not at first think that it is a result > forced by SC. I occasionally present this in courses, and most > students' first reaction is that you should use a common lock in all > threads if you want to ensure agreement about order of x and y > here. The fact that you don't need to strikes some (but by no means > all) people as a magical/spooky property of SC. > > This example (and variants of it) was also among those first driving > research into more efficient distributed multicast protocols in the > late 80's/early 90's (when I first encountered consistency policies > and protocols). Maintaining this property of SC is much more > expensive in a distributed setting than other consistency policies > that are sufficient to implement most distributed algorithms. SC > normally requires blocking on O(#hosts) round-trips per message in the > absence of failure, and heavy (and fallible) failure-recovery > mechanics. Other policies, including "causal broadcast" (guaranteeing > only transitivity of read-write happens-before in producer-consumer > chains) usually don't need to wait out all the round-trips (but still > require buffering). 
While the situation is a little better for > multiprocessor/multicore designers, it is not surprising that they > occasionally propose (as did AMD and then Intel five years or so ago) > schemes that are by default weaker (but still with full-SC modes). > > Arguments for not giving in to the whinings of implementors include > those claiming that uniform SC requirements enable better tools, > simpler proofs of correctness, more understandable models, and the > reduction of counterintuitive orderings. And that no single "natural" > property has emerged to replace it, despite a fair amount of trying. > > -Doug > > From dl at cs.oswego.edu Mon Feb 24 05:00:08 2014 From: dl at cs.oswego.edu (Doug Lea) Date: Mon, 24 Feb 2014 08:00:08 -0500 Subject: [jmm-dev] Sequential Consistency In-Reply-To: References: <5308C959.80502@cs.oswego.edu> <5309FECD.3000502@cs.oswego.edu> Message-ID: <530B4258.4030308@cs.oswego.edu> On 02/24/2014 01:52 AM, Hans Boehm wrote: > I think it's that last comment here that needs to be emphasized: We don't really > have a viable candidate property to replace SC, that's anywhere near as easy to > reason about and provides significant performance advantages. Several people, > including Doug, looked hard for such things when we were talking about C++. > Yes (plus similar explorations for X10, and distributed consistency). We are pretty sure that there is no good substitute for requiring SC for lock-based programs. I think the main issue at hand is how far SC applies. We cannot require SC for all uses of mode-based/fenced/volatile accesses, because some sets of usages clearly are not SC. The audience of people using them seem happy to rely only on specs of ordering constraints. So it may suffice to just leave it at that. Although people do need to know which usages are emergently SC, so that they can for example build locks, which may require some special care in specification. 
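[Editorial sketch, not part of the original thread.] The remark above about building locks out of modeful/fenced accesses has a standard litmus-test face: Dekker-style mutual exclusion depends on store-to-load ordering, i.e. on forbidding the store-buffering (SB) outcome r1 == r2 == 0, which SC forbids but weaker orderings permit. A toy enumerator (the name `sb_outcomes_under_sc` and the encoding are illustrative, not from the thread) that walks every SC interleaving and collects the outcomes:

```python
from itertools import permutations

def sb_outcomes_under_sc():
    """Enumerate all SC interleavings of the store-buffering litmus:
       T1: x = 1; r1 = y      T2: y = 1; r2 = x   (x = y = 0 initially)
    and return the set of (r1, r2) outcomes."""
    progs = {1: [("W", "x"), ("R", "y")],
             2: [("W", "y"), ("R", "x")]}
    outcomes = set()
    # Each distinct permutation of thread tags is one SC total order;
    # each thread's two actions are consumed in program order.
    for order in set(permutations([1, 1, 2, 2])):
        mem = {"x": 0, "y": 0}
        regs, idx = {}, {1: 0, 2: 0}
        for t in order:
            kind, var = progs[t][idx[t]]
            idx[t] += 1
            if kind == "W":
                mem[var] = 1
            else:
                regs[t] = mem[var]  # read the current value of var
        outcomes.add((regs[1], regs[2]))
    return outcomes
```

Under SC the result is {(0, 1), (1, 0), (1, 1)}: the (0, 0) outcome, both threads slipping past each other into a Dekker critical section, never appears. Hardware store buffers do allow (0, 0) for plain stores and loads, which is exactly why lock implementations need SC-strength (or explicitly fenced) accesses at this point.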
This is just a slightly different perspective on similar issues and decisions in C/C++11. Among the differences is that we have "legacy" mode-less default volatile load/store, for which it is not clear that requiring uniform SC guarantees (versus only for get/set Sequential) would be doing anyone a favor. And not clear that it wouldn't. While I'm at it... > On Sun, Feb 23, 2014 at 5:59 AM, Doug Lea
> wrote: > > The IRIW example is a fun one in part because it is not especially > intuitive. Some people do not at first think that it is a result > forced by SC. I occasionally present this in courses, and most > students' first reaction is that you should use a common lock in all > threads if you want to ensure agreement about order of x and y > here. The fact that you don't need to strikes some (but by no means > all) people as a magical/spooky property of SC. A caveat: When I've done this in courses, there's usually some student who tries to exploit this to avoid locks/sync in some programming project. But never correctly -- the example does not seem to generalize in any useful way. In fact, I have never seen a program where SC-IRIW matters, so arguably, most people are better off not even knowing about it :-) -Doug From ajeffrey at bell-labs.com Mon Feb 24 09:19:49 2014 From: ajeffrey at bell-labs.com (Alan Jeffrey) Date: Mon, 24 Feb 2014 11:19:49 -0600 Subject: [jmm-dev] Sequential Consistency In-Reply-To: References: <5308C959.80502@cs.oswego.edu> <5309FECD.3000502@cs.oswego.edu> Message-ID: <530B7F35.2070502@bell-labs.com> The LTL formulation of relaxed consistency does validate IRIW. The interesting trace is:

(1: W x=1) (2: W y=1) (3: R x=1) (3: R y=0) (4: R y=1) (4: R x=0)

The reason why this trace is relaxed consistent is that each action can be justified by a different permutation of the actions before it. In particular, the action (4: R x=0) can be justified by the permutation:

(2: W y=1) (3: R x=1) (3: R y=0) (4: R y=1) (4: R x=0) (1: W x=1)

and the action (3: R y=0) can be justified by the permutation:

(1: W x=1) (3: R x=1) (3: R y=0) (2: W y=1)

So there are models based on interleaved actions and reorderings that validate IRIW, but crucially different reorderings are used to justify different read actions. I'm not going to try to claim that LTL with permutations is as easy to reason about as SC though! A. 
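[Editorial sketch, not part of the original thread.] The justification condition in Alan's message can be made concrete with a small checker. This sketches only the per-read condition — a read is justified by a permutation of actions if, scanning the permutation, the most recent prior write to the read's variable (or the initial value 0 when there is none) matches the value read — and is not the full LTL formulation; the `justifies` helper and the tuple encoding are illustrative inventions:

```python
def justifies(perm, read):
    """True if `read` -- a tuple (thread, kind, var, val) with kind "R" --
    sees, in the total order `perm`, the latest prior write to its
    variable (initial value 0 if no write precedes it)."""
    _, kind, var, val = read
    assert kind == "R"
    latest = 0  # every variable starts at 0
    for act in perm:
        if act == read:
            return latest == val
        if act[1] == "W" and act[2] == var:
            latest = act[3]
    return False  # `read` does not occur in `perm`

# The six IRIW actions from the trace in the message above.
Wx  = (1, "W", "x", 1); Wy  = (2, "W", "y", 1)
R3x = (3, "R", "x", 1); R3y = (3, "R", "y", 0)
R4y = (4, "R", "y", 1); R4x = (4, "R", "x", 0)
```

Checking the two permutations cited in the message: moving (1: W x=1) to the end justifies (4: R x=0), and moving (2: W y=1) after thread 3's reads justifies (3: R y=0) — while no single order does both, which is the "crucially different reorderings" point:

```python
assert justifies([Wy, R3x, R3y, R4y, R4x, Wx], R4x)
assert justifies([Wx, R3x, R3y, Wy], R3y)
assert not justifies([Wx, Wy, R3x, R3y, R4y, R4x], R4x)
```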
On 02/24/2014 12:52 AM, Hans Boehm wrote: > I think it's that last comment here that needs to be emphasized: We don't > really have a viable candidate property to replace SC, that's anywhere near > as easy to reason about and provides significant performance advantages. > Several people, including Doug, looked hard for such things when we were > talking about C++. > > As far as I can tell, everyone intuitively wants to reason about thread > behavior in terms of interleaving thread actions, possibly after allowing > some reorderings within threads. IRIW seems inherently incompatible with > that, which might be a partial explanation of why it's difficult to reason > directly with consistency properties that allow it. > > Hans > > > On Sun, Feb 23, 2014 at 5:59 AM, Doug Lea
wrote: > >> On 02/22/2014 10:59 AM, Doug Lea wrote: >> >>> I won't yet try to summarize different positions and rationales, >>> but for now just invite further discussion. >>> >> >> That was too cowardly. Here's a shot at summarizing some of the >> historical context. >> >> PS: As a reminder, here's IRIW. Given global x, y: >>> Thread 1: x = 1; >>> Thread 2: y = 1; >>> Thread 3: r1 = x; r2 = y; // sees r1 == 1, r2 == 0 >>> Thread 4: r3 = y; r4 = x; // sees r3 == 1, r4 == 0 >>> >> >> (This outcome is not allowed by SC.) >> >> The IRIW example is a fun one in part because it is not especially >> intuitive. Some people do not at first think that it is a result >> forced by SC. I occasionally present this in courses, and most >> students' first reaction is that you should use a common lock in all >> threads if you want to ensure agreement about order of x and y >> here. The fact that you don't need to strikes some (but by no means >> all) people as a magical/spooky property of SC. >> >> This example (and variants of it) was also among those first driving >> research into more efficient distributed multicast protocols in the >> late 80's/early 90's (when I first encountered consistency policies >> and protocols). Maintaining this property of SC is much more >> expensive in a distributed setting than other consistency policies >> that are sufficient to implement most distributed algorithms. SC >> normally requires blocking on O(#hosts) round-trips per message in the >> absence of failure, and heavy (and fallible) failure-recovery >> mechanics. Other policies, including "causal broadcast" (guaranteeing >> only transitivity of read-write happens-before in producer-consumer >> chains) usually don't need to wait out all the round-trips (but still >> require buffering). 
While the situation is a little better for >> multiprocessor/multicore designers, it is not surprising that they >> occasionally propose (as did AMD and then Intel five years or so ago) >> schemes that are by default weaker (but still with full-SC modes). >> >> Arguments for not giving in to the whinings of implementors include >> those claiming that uniform SC requirements enable better tools, >> simpler proofs of correctness, more understandable models, and the >> reduction of counterintuitive orderings. And that no single "natural" >> property has emerged to replace it, despite a fair amount of trying. >> >> -Doug >> >> From paulmck at linux.vnet.ibm.com Mon Feb 24 14:44:25 2014 From: paulmck at linux.vnet.ibm.com (Paul E. McKenney) Date: Mon, 24 Feb 2014 14:44:25 -0800 Subject: [jmm-dev] Sequential Consistency In-Reply-To: References: <5308C959.80502@cs.oswego.edu> Message-ID: <20140224224425.GD8264@linux.vnet.ibm.com> On Sat, Feb 22, 2014 at 11:58:01AM -0800, Jeremy Manson wrote: > On Sat, Feb 22, 2014 at 7:59 AM, Doug Lea
wrote: > > > > > Another in the continuing series of issues to contemplate: > > > > There's a tension between those who believe that all "correct" > > programs are provably sequentially consistent versus those who > > consider sequential consistency as a goal only of lock-based programs; > > not necessarily of those using lock-free techniques and/or are > > components of distributed systems. (see for example Herlihy & Shavit's > > "The Art of Multiprocessor Programming" > > http://store.elsevier.com/The-Art-of-Multiprocessor- > > Programming/Maurice-Herlihy/isbn-9780080569581/) > > > > > Who falls into the first category? A "correct" program is one where the > behavior matches the spec, and if that can be done with non-SC behavior > (which it often can), then the conversation is over. Hear, hear! ;-) > I think the major limiting factor for volatiles and atomics supporting SC > (which is how I read what you are asking) is whether it can be done > reasonably (i.e., with acceptable performance) on the target platforms. If > it can, then for everyone's sanity (and in keeping with the desire for Java > to have somewhat accessible semantics for stuff like this), it makes sense > to specify them as being SC. If it can't, then (IMO) the IRIW-alike idioms > are few and far between enough that it makes no sense to try to decrease > everyone's performance to support SC for them. This is my experience as well -- I have seen very few actual algorithms that relied on SC. Thanx, Paul From paulmck at linux.vnet.ibm.com Mon Feb 24 14:54:21 2014 From: paulmck at linux.vnet.ibm.com (Paul E. 
McKenney) Date: Mon, 24 Feb 2014 14:54:21 -0800 Subject: [jmm-dev] Sequential Consistency In-Reply-To: <5309BA22.9090900@oracle.com> References: <5308C959.80502@cs.oswego.edu> <5309BA22.9090900@oracle.com> Message-ID: <20140224225421.GE8264@linux.vnet.ibm.com> On Sun, Feb 23, 2014 at 01:06:42PM +0400, Aleksey Shipilev wrote: > On 02/22/2014 07:59 PM, Doug Lea wrote: > > Other cases may be less clear cut. For the most famous example: Can a > > program using non-lock-based techniques (for example, using Java > > volatile loads/stores) be "correct" if it fails some variant of the IRIW > > test? Is IRIW conformance an unnecessary action-at-a-distance > > by-product of SC, or does it play some intrinsically useful role in > > assuring correctness? > > IMO, we are on thin ice here. The absence of counter-examples of how > non-SC behaviors for IRIW-like constructions demolish correctness at a > larger scale does not mean we wouldn't find a case where it breaks > badly in the future, when the spec solidifies. In other words, absence of > evidence is not evidence of absence. > > I, for one, would not like to wake up to another > double-checked-locking-like calamity because we allowed a particular > sneaky behavior in the name of performance. And yes, being the > performance guy, I still think strong correctness wins over performance > ten times over. > > The relaxations are welcome, but only in a few very constrained places, > where you are able to relatively easily fix/rewrite the bad usages or even > provide stronger ad-hoc semantics. In other words, the things you allow > in a library (e.g. Linux RCU) are not the things you want to burn into a > language spec. Hmmm... On the one hand, use of SC is no substitute for carefully designed APIs that are easy to use. Some of my ugliest bugs in my Linux-kernel work would not be helped by SC -- they involved very conservative fully locked code. 
On the other hand, if you are using non-SC primitives, then you had better have a really carefully designed heavily stress-tested API. A proof of correctness wouldn't hurt either. ;-) > > IRIW is not the only example of a case in which SC imposes conditions > > that some programmers in some contexts seem not to care about. But > > it is most famous because it so clearly impacts the nature and cost of > > mappings (for various modes of load, store, and CAS) on some existing > > processors as well as potential mappings on future processors. > > Being the language guy, I think the hardware not being able to provide > the sane SC primitives should pay up the costs. The hardware which makes > it relatively easy to implement the non-tricky language memory model > should be in the sweet spot. All hardware I know of has a non-trivial penalty for its SC primitives, so there is a place for non-SC algorithms. Thanx, Paul From paulmck at linux.vnet.ibm.com Mon Feb 24 14:58:30 2014 From: paulmck at linux.vnet.ibm.com (Paul E. McKenney) Date: Mon, 24 Feb 2014 14:58:30 -0800 Subject: [jmm-dev] Sequential Consistency In-Reply-To: <530B4258.4030308@cs.oswego.edu> References: <5308C959.80502@cs.oswego.edu> <5309FECD.3000502@cs.oswego.edu> <530B4258.4030308@cs.oswego.edu> Message-ID: <20140224225830.GF8264@linux.vnet.ibm.com> On Mon, Feb 24, 2014 at 08:00:08AM -0500, Doug Lea wrote: > On 02/24/2014 01:52 AM, Hans Boehm wrote: > >I think it's that last comment here that needs to be emphasized: We don't really > >have a viable candidate property to replace SC, that's anywhere near as easy to > >reason about and provides significant performance advantages. Several people, > >including Doug, looked hard for such things when we were talking about C++. > > Yes (plus similar explorations for X10, and distributed consistency). > We are pretty sure that there is no good substitute for requiring > SC for lock-based programs. I think the main issue at hand is > how far SC applies. 
We cannot require SC for all uses of > mode-based/fenced/volatile accesses, because some sets of > usages clearly are not SC. The audience of people using them > seem happy to rely only on specs of ordering constraints. > So it may suffice to just leave it at that. Although people > do need to know which usages are emergently SC, so that they > can for example build locks, which may require some special > care in specification. > > This is just a slightly different perspective on similar issues > and decisions in C/C++11. Among the differences is that we > have "legacy" mode-less default volatile load/store, for > which it is not clear that requiring uniform SC guarantees > (versus only for get/set Sequential) would be doing anyone > a favor. And not clear that it wouldn't. Even I have come to grudgingly accept that SC is a reasonable default. But I definitely would not want to give up weaker modes. Something about needing my code to perform and scale well. ;-) > While I'm at it... > > >On Sun, Feb 23, 2014 at 5:59 AM, Doug Lea
>> wrote: > > > > The IRIW example is a fun one in part because it is not especially > > intuitive. Some people do not at first think that it is a result > > forced by SC. I occasionally present this in courses, and most > > students' first reaction is that you should use a common lock in all > > threads if you want to ensure agreement about order of x and y > > here. The fact that you don't need to strikes some (but by no means > > all) people as a magical/spooky property of SC. > > A caveat: When I've done this in courses, there's usually > some student who tries to exploit this to avoid locks/sync > in some programming project. But never correctly -- the example > does not seem to generalize in any useful way. In fact, I have > never seen a program where SC-IRIW matters, so arguably, > most people are better off not even knowing about it :-) I heard a rumor that some work-stealing task scheduler relied on SC-IRIW, but never have been able to track it down. Even if someone does track it down, I would argue that it is the exception that proves the rule. ;-) Thanx, Paul From paulmck at linux.vnet.ibm.com Mon Feb 24 16:20:20 2014 From: paulmck at linux.vnet.ibm.com (Paul E. McKenney) Date: Mon, 24 Feb 2014 16:20:20 -0800 Subject: [jmm-dev] stores In-Reply-To: <5301430E.9010009@cs.oswego.edu> References: <5301430E.9010009@cs.oswego.edu> Message-ID: <20140225002020.GI8264@linux.vnet.ibm.com> On Sun, Feb 16, 2014 at 06:00:30PM -0500, Doug Lea wrote: > > Memory models can generate a fair amount of excitement. > See Linus Torvalds's post on the linux kernel list: > https://lkml.org/lkml/2014/2/14/492 > and follow-ups with Paul McKenney. (Condolences!) Heh! The fun continues... > I don't think this introduces anything new with respect > to JMM9 discussions so far though. In general, speculative > stores and out-of-thin-air reads break basic safety properties. > Although there still might be some related open cases > about inserted stores, including "redundant" ones. 
As in: if (x != 0) x = 0; ==> x = 0; ? One possibly interesting thing from later in the LKML discussion, though I am not sure that it maps into the Java final-field model -- you guys can be the judge of that. I will present it in C just for definiteness.

T1: p = &nondefault_gp;
    p->a = 42;
    atomic_store_release(&gp, p);

T2: p = atomic_load_explicit(&gp, memory_order_consume);
    if (p != &default_gp) {
        do_something_with(p);
        return;
    }
    r1 = p->a; /* At this point, the compiler knows p == &default_gp. */

If this particular execution has only &default_gp and &nondefault_gp as values for gp, are we guaranteed that r1==42? It would be given the current wording in the C11 and C++11 standards. Assuming that this example even makes sense in the context of Java final fields... Thanx, Paul From dl at cs.oswego.edu Tue Feb 25 04:26:45 2014 From: dl at cs.oswego.edu (Doug Lea) Date: Tue, 25 Feb 2014 07:26:45 -0500 Subject: [jmm-dev] Sequential Consistency In-Reply-To: <20140224224425.GD8264@linux.vnet.ibm.com> References: <5308C959.80502@cs.oswego.edu> <20140224224425.GD8264@linux.vnet.ibm.com> Message-ID: <530C8C05.5070008@cs.oswego.edu> On 02/24/2014 05:44 PM, Paul E. McKenney wrote: > This is my experience as well -- I have seen very few actual algorithms > that relied on SC. This seems to be the attitude of almost all developers of non-lock-based algorithms: Explicit ordering constraints are critical, but program-wide SC is not. Which is nearly opposite to almost every developer's view of lock-based programs: any ordering is OK so long as SC is maintained. One place these different views meet up is when creating locks out of non-blocking primitives. So there must be guaranteed ways of achieving SC using modeful/fenced accesses. Beyond that, the problem seems underconstrained. I'm not sure that litmus-test-style examples will suffice to provide an answer. 
When you are not dealing with locks, it seems that for every odd consequence of some non-SC rule, you can find an equally odd one for an SC-based rule. For example, Ali Sezgin (who is on this list) has written up some especially bizarre sequentially consistent examples in: Sezgin, Ali, and Ganesh Gopalakrishnan. "On the definition of sequential consistency." Information processing letters 2005. http://www.cs.utah.edu/formal_verification/publications/june2013update/dblp/2005/2/j23.pdf -Doug From paulmck at linux.vnet.ibm.com Tue Feb 25 10:53:08 2014 From: paulmck at linux.vnet.ibm.com (Paul E. McKenney) Date: Tue, 25 Feb 2014 10:53:08 -0800 Subject: [jmm-dev] Sequential Consistency In-Reply-To: <530C8C05.5070008@cs.oswego.edu> References: <5308C959.80502@cs.oswego.edu> <20140224224425.GD8264@linux.vnet.ibm.com> <530C8C05.5070008@cs.oswego.edu> Message-ID: <20140225185307.GV8264@linux.vnet.ibm.com> On Tue, Feb 25, 2014 at 07:26:45AM -0500, Doug Lea wrote: > On 02/24/2014 05:44 PM, Paul E. McKenney wrote: > >This is my experience as well -- I have seen very few actual algorithms > >that relied on SC. > > This seems to be the attitude of almost all developers of > non-lock-based algorithms: Explicit ordering constraints are > critical, but program-wide SC is not. Which is nearly opposite > to almost every developer's view of lock-based programs: any ordering > is OK so long as SC is maintained. I am quite capable of maintaining both viewpoints internally. If I am using locks, I want the benefits of locking. When I am not using locks, I don't want to be forced to wear the locking straightjacket. ;-) > One place these different views meet up is when creating locks > out of non-blocking primitives. So there must be guaranteed > ways of achieving SC using modeful/fenced accesses. Yep. > Beyond that, the problem seems underconstrained. > > I'm not sure that litmus-test-style examples will suffice > to provide an answer. 
When you are not dealing with locks, > it seems that for every odd consequence of some non-SC rule, > you can find an equally odd one for an SC-based rule. > For example, Ali Sezgin (who is on this list) has written up some > especially bizarre sequentially consistent examples in: > Sezgin, Ali, and Ganesh Gopalakrishnan. "On the definition of > sequential consistency." Information processing letters 2005. > http://www.cs.utah.edu/formal_verification/publications/june2013update/dblp/2005/2/j23.pdf I had not seen this one before! Classic!!! ;-) Thanx, Paul
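[Editorial sketch, not part of the original thread.] The one claim threaded through all of these messages — that SC forbids the IRIW outcome r1==1, r2==0, r3==1, r4==0 — is small enough to verify by brute force: enumerate every interleaving of the six IRIW actions that respects per-thread program order and collect the outcomes. The function name `iriw_outcomes_under_sc` and the encoding are illustrative:

```python
from itertools import permutations

def iriw_outcomes_under_sc():
    """Enumerate every SC interleaving of IRIW (x = y = 0 initially):
       T1: x = 1            T2: y = 1
       T3: r1 = x; r2 = y   T4: r3 = y; r4 = x
    and return the set of (r1, r2, r3, r4) outcomes."""
    progs = {1: [("W", "x", None)],
             2: [("W", "y", None)],
             3: [("R", "x", "r1"), ("R", "y", "r2")],
             4: [("R", "y", "r3"), ("R", "x", "r4")]}
    outcomes = set()
    # Each distinct permutation of thread tags is one SC total order;
    # per-thread program order is preserved by consuming actions in sequence.
    for order in set(permutations([1, 2, 3, 3, 4, 4])):
        mem = {"x": 0, "y": 0}
        regs = {}
        idx = {t: 0 for t in progs}
        for t in order:
            kind, var, reg = progs[t][idx[t]]
            idx[t] += 1
            if kind == "W":
                mem[var] = 1
            else:
                regs[reg] = mem[var]  # read the current value of var
        outcomes.add((regs["r1"], regs["r2"], regs["r3"], regs["r4"]))
    return outcomes
```

Of the sixteen conceivable result tuples, exactly one — (1, 0, 1, 0), the two readers disagreeing about the order of the writes — never appears under SC; everything else, including (1, 1, 1, 1) and (0, 0, 0, 0), does.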