From nitsanw at yahoo.com Fri Jul 8 11:08:01 2016 From: nitsanw at yahoo.com (Nitsan Wakart) Date: Fri, 8 Jul 2016 11:08:01 +0000 (UTC) Subject: [jmm-dev] Optimizing external actions in the JMM In-Reply-To: References: Message-ID: <159972590.772714.1467976081443.JavaMail.yahoo@mail.yahoo.com> Given: "... where f is some external action that the compiler understands. If the compiler knows `f` always returns 42 and has no other effect, can it optimize ThreadA to ... thereby introducing a OOTA-like value of 42 into the system?" Why is this OOTA? The thing is you define: "external action that... always returns 42 and has no other effect" Which according to: "An external action is an action that may be observable outside of an execution, and has a result based on an environment external to the execution." (from https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.2) is not an external action. To make a function an "external action" we need to satisfy: 1. "may be observable outside of an execution" AND 2. "has a result based on an environment external to the execution" The concerns you raise around off-heap memory handling boil down to: - Unsafe.put(long address,*)/Unsafe.put(null, long address,*): don't fullfil 2 - Unsafe.get*(long address)/Unsafe.get*(null, long address): don't fullfil 1 The volatile accesses to offheap are not covered by JMM AFAIK, but are relied upon by many to mean the same as their heap counter parts. From sanjoy at playingwithpointers.com Tue Jul 12 07:35:47 2016 From: sanjoy at playingwithpointers.com (Sanjoy Das) Date: Tue, 12 Jul 2016 00:35:47 -0700 Subject: [jmm-dev] Optimizing external actions in the JMM In-Reply-To: <159972590.772714.1467976081443.JavaMail.yahoo@mail.yahoo.com> References: <159972590.772714.1467976081443.JavaMail.yahoo@mail.yahoo.com> Message-ID: <57849DD3.6060006@playingwithpointers.com> Hi Nitsan, Thank you for replying! Nitsan Wakart wrote: > Given: > "... 
> where f is some external action that the compiler understands. If the > compiler knows `f` always returns 42 and has no other effect, can it > optimize ThreadA to > ... > thereby introducing a OOTA-like value of 42 into the system?" > Why is this OOTA? It isn't OOTA intuitively, and I'm trying to justify its non-OOTA-ness by the JMM rules. > The thing is you define: > "external action that... always returns 42 and has no other effect" I didn't mean to say that it "always returns 42 and has no other effect" by the spec, but that the compiler knows it "always returns 42 and has no other effect" by some external knowledge it has about `f` (and perhaps the environment). For instance, say f(x) was "return Unix_open("/home/foo/" + x)", and the JIT knew that since the process is running under user "bar", the call to open would always return -1 and not have any other external effect. Would it be okay then to introduce the OOTA-like value -1 by replacing the call to Unix_open with -1? I'd like to say yes, but I can't justify -1 for the same reason as we couldn't justify 42 earlier -- there isn't enough information in the trace to infer f(0) is -1 -- the trace will only state that f(-1) is -1. > Which according to: > "An external action is an action that may be observable outside of an execution, > and has a result based on an environment external to the execution." > (from https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.2) > is not an external action. > > To make a function an "external action" we need to satisfy: > 1. "may be observable outside of an execution" AND > 2. 
"has a result based on an environment external to the execution" > > The concerns you raise around off-heap memory handling boil down to: > - Unsafe.put(long address,*)/Unsafe.put(null, long address,*): don't fullfil 2 > - Unsafe.get*(long address)/Unsafe.get*(null, long address): don't fullfil 1 Under this interpretation the Unsafe.get*() intrinsics have a corner case -- if they're being called on a mmapped file (or uninitialized memory, even) then the value returned by them is not specified by the memory model (to be precise: whether a certain value returned by the call is correct is not decidable in the JMM). This hints that they need to be modeled as external actions; moreover, I think that (1) really means "*may* be observable" and not "*has to be* observable". -- Sanjoy > The volatile accesses to offheap are not covered by JMM AFAIK, but are relied > upon by many to mean the same as their heap counter parts. From nitsanw at yahoo.com Tue Jul 12 08:06:11 2016 From: nitsanw at yahoo.com (Nitsan Wakart) Date: Tue, 12 Jul 2016 08:06:11 +0000 (UTC) Subject: [jmm-dev] Optimizing external actions in the JMM In-Reply-To: <57849DD3.6060006@playingwithpointers.com> References: <159972590.772714.1467976081443.JavaMail.yahoo@mail.yahoo.com> <57849DD3.6060006@playingwithpointers.com> Message-ID: <1983003876.2333894.1468310771326.JavaMail.yahoo@mail.yahoo.com> >> The thing is you define: >> "external action that... always returns 42 and has no other effect" > I didn't mean to say that it "always returns 42 and has no other > effect" by the spec, but that the compiler knows it "always returns 42 > and has no other effect" by some external knowledge it has about `f` > (and perhaps the environment). If the JIT compiler KNOWS, then it knows, job done. Same way it knows Math.pow is not external. 
>> - Unsafe.get*(long address)/Unsafe.get*(null, long address): don't fullfil 1 > Under this interpretation the Unsafe.get*() intrinsics have a corner > case -- if they're being called on a mmapped file (or uninitialized > memory, even) then the value returned by them is not specified by the > memory model (to be precise: whether a certain value returned by the > call is correct is not decidable in the JMM). This hints that they > need to be modeled as external actions; moreover, I think that (1) > really means "*may* be observable" and not "*has to be* observable". Where behaviour is undefined, it is down to precedent and sensibility... How is a 'read' "observable outside the execution"? Consider the following code: ---- long address = Unsafe.allocate(1024); int i = 1; Unsafe.putInt(address,i); return Unsafe.getInt(address) == i; // might as well return true; ----- From sanjoy at playingwithpointers.com Tue Jul 12 08:34:51 2016 From: sanjoy at playingwithpointers.com (Sanjoy Das) Date: Tue, 12 Jul 2016 01:34:51 -0700 Subject: [jmm-dev] Optimizing external actions in the JMM In-Reply-To: <1983003876.2333894.1468310771326.JavaMail.yahoo@mail.yahoo.com> References: <159972590.772714.1467976081443.JavaMail.yahoo@mail.yahoo.com> <57849DD3.6060006@playingwithpointers.com> <1983003876.2333894.1468310771326.JavaMail.yahoo@mail.yahoo.com> Message-ID: <5784ABAB.2010307@playingwithpointers.com> Hi Nitsan, Nitsan Wakart wrote: >>> The thing is you define: > >>> "external action that... always returns 42 and has no other effect" >> I didn't mean to say that it "always returns 42 and has no other >> effect" by the spec, but that the compiler knows it "always returns 42 >> and has no other effect" by some external knowledge it has about `f` >> (and perhaps the environment). > > > If the JIT compiler KNOWS, then it knows, job done. Same way it knows Math.pow is not external. I'm trying to justify precisely the "job done" bit. 
:) Specifically, can it replace f(x) with 42 even if it knows that given the current environment the "external action" f(x) always returns 42 and has no other effect? Math.pow(a, b) is fundamentally different than f(x) = Unix_open("/home/foo" + x) -- it can be evaluated for any a, b independent of the environment. This isn't true for f(x) as defined: in the previous example the JIT knows that f(x) returns -1 and has no other effect _because_ it knows that the process is being run as user "bar". This information is not present in the execution trace, so we can't "evaluate" f(0) to justify the write of -1 to y. >>> - Unsafe.get*(long address)/Unsafe.get*(null, long address): don't fullfil 1 >> Under this interpretation the Unsafe.get*() intrinsics have a corner >> case -- if they're being called on a mmapped file (or uninitialized >> memory, even) then the value returned by them is not specified by the >> memory model (to be precise: whether a certain value returned by the >> call is correct is not decidable in the JMM). This hints that they >> need to be modeled as external actions; moreover, I think that (1) >> really means "*may* be observable" and not "*has to be* observable". > > > Where behaviour is undefined, it is down to precedent and sensibility... That's fine (especially given that s.m.Unsafe is an internal API), but how do you "plug in" the precedent and sensibility into the rest of the memory model? IOW, in (say) Thread1: addr = mmap_file(); r1 = unsafe.getByte(addr); this.y = r1 this.volatileF = r1 Thread2: r2 = this.volatileF; r3 = this.y when trying to prove things about the r3 + r2 (say) how do you model r1? Given that r1's value cannot be described by the JMM, it seems reasonable to me to give it a sensible value intuitively, but in the JMM model it as an external action that happens to return that sensible value. > How is a 'read' "observable outside the execution"? 
It's usually not. It sounded like you were interpreting "observable outside of execution" as a necessary condition for an action to be a side effect, when I think observability is sufficient but not necessary for an action to be considered an external action. That is likely the reason for saying "may be observable outside of an execution" and not "has to be observable outside of an execution". > Consider the following code: > ---- > long address = Unsafe.allocate(1024); > int i = 1; > Unsafe.putInt(address,i); > return Unsafe.getInt(address) == i; // might as well return true; > ----- From aph at redhat.com Tue Jul 12 10:13:10 2016 From: aph at redhat.com (Andrew Haley) Date: Tue, 12 Jul 2016 11:13:10 +0100 Subject: [jmm-dev] Make load/store of 64-bit long and double atomic In-Reply-To: References: <4E29117D-735B-445E-8C57-F047E5B00712@computer.org> Message-ID: <5784C2B6.8030500@redhat.com> On 12/07/16 03:21, David Holmes wrote: > This is not a hotspot issue but a Java programming language issue. > Hotspot would never provide a flag that changes the Java programming > language semantics. The performance impact of all-accesses-are- > atomic on 32-bit systems is considerable Not necessarily. There are significant performance implications on some 32-bit systems, but by no means all. And such 32-bit systems are getting rarer -- IMVHO. > so as long as we support 32-bit I don't see this happening > (regardless of what may be discussed on jmm-dev). It would be > unconscionable to have different semantics on 32-bit and 64-bit so > that is not an option either. I wonder if a better solution to this might be to make VarHandle.{get,set}Opaque atomic on all primitive types. This gives us a way to get atomic operations on 32-bit machines without the overhead of volatile accesses. Being able to read a 64-bit counter atomically is very useful. C++ says: [ Note: Atomic operations specifying memory_order_relaxed are relaxed with respect to memory ordering. 
Implementations must still guarantee that any given atomic access to a particular atomic object be indivisible with respect to all other atomic accesses to that object. -- end note ] But Java says: Unless stated otherwise in the documentation of a factory method, the access modes get and set (if supported) provide atomic access for reference types and all primitives types, with the exception of long and double on 32-bit platforms. I wonder if this divergence between Java and C++ is deliberate. It seems wrong to me. Andrew. From aleksey.shipilev at oracle.com Tue Jul 12 10:19:20 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Tue, 12 Jul 2016 13:19:20 +0300 Subject: [jmm-dev] Make load/store of 64-bit long and double atomic In-Reply-To: <5784C2B6.8030500@redhat.com> References: <4E29117D-735B-445E-8C57-F047E5B00712@computer.org> <5784C2B6.8030500@redhat.com> Message-ID: <5784C428.2090401@oracle.com> On 07/12/2016 01:13 PM, Andrew Haley wrote: > I wonder if a better solution to this might be to make > VarHandle.{get,set}Opaque atomic on all primitive types. This gives > us a way to get atomic operations on 32-bit machines without the > overhead of volatile accesses. Being able to read a 64-bit counter > atomically is very useful. VarHandle.{get,set}Opaque is single-copy atomic for all primitive types. Pretty much like C++ std::atomic(..., mem_ord_relaxed). 
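A minimal sketch of the opaque access mode on a 64-bit field, assuming a Java 9+ runtime. The class `OpaqueCounter` and its methods `publish`, `read` and `tornReads` are hypothetical names chosen for illustration; only `findVarHandle`, `setOpaque` and `getOpaque` are the real `VarHandle` API. The writer flips the field between all-zero and all-one bit patterns, so a torn read would show up as any other value.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class OpaqueCounter {
    // Plain (non-volatile) long; all accesses below go through the VarHandle.
    private long value;

    private static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(OpaqueCounter.class, "value", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Opaque write: atomic for the whole 64-bit value, no volatile-style ordering.
    public void publish(long v) {
        VALUE.setOpaque(this, v);
    }

    // Opaque read: atomic for the whole 64-bit value.
    public long read() {
        return (long) VALUE.getOpaque(this);
    }

    // Hypothetical helper: counts reads that saw a mix of the two bit patterns.
    public static long tornReads(int iterations) {
        OpaqueCounter c = new OpaqueCounter();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                c.publish((i & 1) == 0 ? 0L : -1L);
            }
        });
        writer.start();
        long torn = 0;
        for (int i = 0; i < iterations; i++) {
            long seen = c.read();
            if (seen != 0L && seen != -1L) {
                torn++;
            }
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return torn;
    }

    public static void main(String[] args) {
        System.out.println("torn reads: " + tornReads(1_000_000));
    }
}
```

On a 64-bit JVM a plain access would not tear either, so the sketch only shows the shape of the API; the difference would matter on a 32-bit platform.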
Thanks, -Aleksey From aph at redhat.com Tue Jul 12 10:22:35 2016 From: aph at redhat.com (Andrew Haley) Date: Tue, 12 Jul 2016 11:22:35 +0100 Subject: [jmm-dev] Make load/store of 64-bit long and double atomic In-Reply-To: <5784C428.2090401@oracle.com> References: <4E29117D-735B-445E-8C57-F047E5B00712@computer.org> <5784C2B6.8030500@redhat.com> <5784C428.2090401@oracle.com> Message-ID: <5784C4EB.8090207@redhat.com> On 12/07/16 11:19, Aleksey Shipilev wrote: > On 07/12/2016 01:13 PM, Andrew Haley wrote: >> I wonder if a better solution to this might be to make >> VarHandle.{get,set}Opaque atomic on all primitive types. This gives >> us a way to get atomic operations on 32-bit machines without the >> overhead of volatile accesses. Being able to read a 64-bit counter >> atomically is very useful. > > VarHandle.{get,set}Opaque is single-copy atomic for all primitive types. > Pretty much like C++ std::atomic(..., mem_ord_relaxed). So what does Unless stated otherwise in the documentation of a factory method, the access modes get and set (if supported) provide atomic access for reference types and all primitives types, with the exception of long and double on 32-bit platforms. refer to? And where does the spec say that VarHandle.{get,set}Opaque is atomic? Andrew. From aleksey.shipilev at oracle.com Tue Jul 12 11:30:51 2016 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Tue, 12 Jul 2016 14:30:51 +0300 Subject: [jmm-dev] Make load/store of 64-bit long and double atomic In-Reply-To: <5784C4EB.8090207@redhat.com> References: <4E29117D-735B-445E-8C57-F047E5B00712@computer.org> <5784C2B6.8030500@redhat.com> <5784C428.2090401@oracle.com> <5784C4EB.8090207@redhat.com> Message-ID: <5784D4EB.60904@oracle.com> On 07/12/2016 01:22 PM, Andrew Haley wrote: > On 12/07/16 11:19, Aleksey Shipilev wrote: >> On 07/12/2016 01:13 PM, Andrew Haley wrote: >>> I wonder if a better solution to this might be to make >>> VarHandle.{get,set}Opaque atomic on all primitive types. 
This gives >>> us a way to get atomic operations on 32-bit machines without the >>> overhead of volatile accesses. Being able to read a 64-bit counter >>> atomically is very useful. >> >> VarHandle.{get,set}Opaque is single-copy atomic for all primitive types. >> Pretty much like C++ std::atomic(..., mem_ord_relaxed). > > So what does > > Unless stated otherwise in the documentation of a factory method, the > access modes get and set (if supported) provide atomic access for > reference types and all primitives types, with the exception of long > and double on 32-bit platforms. > > refer to? That's for VarHandle.{get|set}, not for VarHandle.{get|set}Opaque. Access mode "get" is different from access mode "getOpaque". > And where does the spec say that VarHandle.{get,set}Opaque is > atomic? Nowhere yet. I tried to capture atomicity in Javadoc like this: http://mail.openjdk.java.net/pipermail/jmm-dev/2016-June/000282.html ...but it's not yet there. Thanks, -Aleksey From paul.sandoz at oracle.com Tue Jul 12 12:22:17 2016 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 12 Jul 2016 14:22:17 +0200 Subject: [jmm-dev] Make load/store of 64-bit long and double atomic In-Reply-To: <5784D4EB.60904@oracle.com> References: <4E29117D-735B-445E-8C57-F047E5B00712@computer.org> <5784C2B6.8030500@redhat.com> <5784C428.2090401@oracle.com> <5784C4EB.8090207@redhat.com> <5784D4EB.60904@oracle.com> Message-ID: <664572BB-9859-4019-B722-1E1FB3D8A4D6@oracle.com> > On 12 Jul 2016, at 13:30, Aleksey Shipilev wrote: > > On 07/12/2016 01:22 PM, Andrew Haley wrote: >> On 12/07/16 11:19, Aleksey Shipilev wrote: >>> On 07/12/2016 01:13 PM, Andrew Haley wrote: >>>> I wonder if a better solution to this might be to make >>>> VarHandle.{get,set}Opaque atomic on all primitive types. This gives >>>> us a way to get atomic operations on 32-bit machines without the >>>> overhead of volatile accesses. Being able to read a 64-bit counter >>>> atomically is very useful. 
>>> >>> VarHandle.{get,set}Opaque is single-copy atomic for all primitive types. >>> Pretty much like C++ std::atomic(..., mem_ord_relaxed). >> >> So what does >> >> Unless stated otherwise in the documentation of a factory method, the >> access modes get and set (if supported) provide atomic access for >> reference types and all primitives types, with the exception of long >> and double on 32-bit platforms. >> >> refer to? > > That's for VarHandle.{get|set}, not for VarHandle.{get|set}Opaque. > Access mode "get" is different from access mode "getOpaque". > >> And where does the spec say that VarHandle.{get,set}Opaque is >> atomic? > > Nowhere yet. I tried to capture atomicity in Javadoc like this: > http://mail.openjdk.java.net/pipermail/jmm-dev/2016-June/000282.html > > ...but it's not yet there. > It does state it here: * Read/write access modes (if supported), with the exception of * {@code get} and {@code set}, provide atomic access for * reference types and all primitive types. Before the "unless stated otherwise..." quoted above. As part of the sweep through the specification we should make that clearer. Paul. From dl at cs.oswego.edu Tue Jul 12 12:29:52 2016 From: dl at cs.oswego.edu (Doug Lea) Date: Tue, 12 Jul 2016 08:29:52 -0400 Subject: [jmm-dev] Make load/store of 64-bit long and double atomic In-Reply-To: <5784E0D0.70009@oracle.com> References: <4E29117D-735B-445E-8C57-F047E5B00712@computer.org> <5784E0D0.70009@oracle.com> Message-ID: <5784E2C0.1070408@cs.oswego.edu> On 07/12/2016 08:21 AM, Aleksey Shipilev wrote: > On 07/12/2016 02:50 PM, John Crowley wrote: >>> On Jul 11, 2016, at 10:21 PM, David Holmes >>> wrote: >>> On 7/07/2016 9:29 PM, John Crowley wrote: >>>> Would like to make a suggestion re the JVM and non-atomic >>>> load/store for long and double values since both are 64-bit. >>>> (Sec 17.7 of the JLS version 8 - have not been able to find a JLS >>>> V9 yet). 
Did some searching through JSRs and mailing lists, but >>>> did not see this addressed - please send me a link if it has been >>>> and I just missed it. > > In Hotspot, there is an experimental -XX:+AlwaysAtomicAccesses flag that > turns long/double accesses to be single-copy atomic. Not sure it works > properly in interpreter though. You may build on that. > > The sound counter-argument that I heard against enabling long/double > atomic accesses is the interaction with value types. If we make all > present types access-atomic, and have to retract that back when > larger-than-machine-word value types come in, that would be bad. Since > this long/double spec change is at best Java 10, we better off seeing > how it plays out with value types. > Yes, thanks. That's an accurate synopsis of discussions on the jmm-dev list in 2014. (http://mail.openjdk.java.net/pipermail/jmm-dev/) In the mean time, we do need to make a clean-up pass on VarHandle javadocs/specs, that now include some remnants of previous designs and are missing a few clarifications. -Doug From aph at redhat.com Tue Jul 12 12:31:11 2016 From: aph at redhat.com (Andrew Haley) Date: Tue, 12 Jul 2016 13:31:11 +0100 Subject: [jmm-dev] Make load/store of 64-bit long and double atomic In-Reply-To: <664572BB-9859-4019-B722-1E1FB3D8A4D6@oracle.com> References: <4E29117D-735B-445E-8C57-F047E5B00712@computer.org> <5784C2B6.8030500@redhat.com> <5784C428.2090401@oracle.com> <5784C4EB.8090207@redhat.com> <5784D4EB.60904@oracle.com> <664572BB-9859-4019-B722-1E1FB3D8A4D6@oracle.com> Message-ID: <5784E30F.9010700@redhat.com> On 12/07/16 13:22, Paul Sandoz wrote: > It does state it here: > > * Read/write access modes (if supported), with the exception of > * {@code get} and {@code set}, provide atomic access for > * reference types and all primitive types. > > Before the "unless stated otherwise..." quoted above. > > As part of the sweep through the specification we should make that clearer. 
It's very hard to understand what is going on. compareAndExchange() has stronger ordering semantics than compareAndExchangeRelease() but set() has weaker ordering semantics than setRelease(). We're making a real mess that nobody is going to thank us for. Andrew. From paul.sandoz at oracle.com Tue Jul 12 13:12:45 2016 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 12 Jul 2016 15:12:45 +0200 Subject: [jmm-dev] Make load/store of 64-bit long and double atomic In-Reply-To: <5784E30F.9010700@redhat.com> References: <4E29117D-735B-445E-8C57-F047E5B00712@computer.org> <5784C2B6.8030500@redhat.com> <5784C428.2090401@oracle.com> <5784C4EB.8090207@redhat.com> <5784D4EB.60904@oracle.com> <664572BB-9859-4019-B722-1E1FB3D8A4D6@oracle.com> <5784E30F.9010700@redhat.com> Message-ID: <57190EA9-92B5-4276-BB9B-33C1AA65D133@oracle.com> > On 12 Jul 2016, at 14:31, Andrew Haley wrote: > > On 12/07/16 13:22, Paul Sandoz wrote: >> It does state it here: >> >> * Read/write access modes (if supported), with the exception of >> * {@code get} and {@code set}, provide atomic access for >> * reference types and all primitive types. >> >> Before the "unless stated otherwise..." quoted above. >> >> As part of the sweep through the specification we should make that clearer. > > It's very hard to understand what is going on. compareAndExchange() > has stronger ordering semantics than compareAndExchangeRelease() but > set() has weaker ordering semantics than setRelease(). We're making a > real mess that nobody is going to thank us for. > It's an awkward situation. Doug previously mentioned in an email on core-libs: >> No matter which conventions you choose here, some people will be >> unhappy or confused. The current scheme seems to make the current users >> of both Unsafe and AtomicX least unhappy or confused. http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-July/042249.html. 
You mentioned in a previous email the possibility of doing something similar to C++ atomics and passing in the memory order characteristics as a constant. We did mull that over a little early on; one concern was the performance aspects. It might be possible to pull that off with an enum and implementing the if/else in Java so it constant folds, enabling reuse of existing intrinsics and simplifying the addition of new ones. That would be a significant deviation from the API/implementation/tests at this stage in the 9 release schedule. I suppose it's something we could support later on as a complementary feature (e.g. using the "explicit" suffix in the method names). Paul. From john.r.rose at oracle.com Fri Jul 15 02:09:05 2016 From: john.r.rose at oracle.com (John Rose) Date: Thu, 14 Jul 2016 19:09:05 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS Message-ID: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> I think we are missing an important opportunity by not supporting single-bit RMW operations in VarHandles. In particular, the x86 "bts" (bit-test-and-set), "btr" (bit-test-and-reset) and "btc" (bit-test-and-complement) are sometimes the right way to modify some data structure, when the alternative is a load and "cmpxchg" in a loop. The overall costs are probably the same in the best case, but the loop-based idiom has some danger (relative to the single-instruction idiom) of costs stemming from larger code size. At the JIT level, one can hope that the CAS-based idiom (coded in the current VH API) will be recognized and optimized to a single instruction on x86, but there is a strong risk that this will fail. It's safer to specify the operation explicitly using a separate VH method. The particular use case I have in mind is SeqLocks, specifically the writer-enter operation, which needs to change the lock state to "odd", unless it is already "odd", and let the processor know what happened. 
An "xadd" cannot do this, but a "cmpxchg" or "bts" can, and the "bts" is preferable. (In a rather deep sense, getAndAdd is less powerful than testAndSetBit or getAndBitwiseOr, because op+ is bijective in each argument, while op| is idempotent. This means that you can operate bitwise on a structure in such a way that your operation disappears when the structure is already in some state you are pushing it towards. Of course, you also need a way to "exchange" in the previous value, atomically.) For a parallel discussion among the gcc folk, where they are working on pattern matching of CAS to BTS, see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244 So, here is a specific proposal: https://bugs.openjdk.java.net/browse/JDK-8161444 VarHandles should provide access bitwise atomics In a nutshell, testAndSetBitAcquire behaves as if it were built on top of compareAndExchangeAcquire (which it may on some platforms). The outgoing value parameter is not a value but a bit position within the memory value (zero = LSB, range-checked). On x86 it compiles to "lock;bts" with appropriate fencing. It is a great candidate for building a mutex-enter operation. For symmetry, I'm also proposing testAndClearBitRelease (or should it also be Acquire?) and flipAndGetBit (with Volatile ordering since it is likely to be stand-alone). But it's just symmetry; testAndSetBitAcquire is the important part. What do folks think? Is "lock;bts" useless for some reason? (Note that the lock prefix is interpreted by modern x86's as a cache transaction request, just like xadd, with no external signal.) Or are there no significant single-bit concurrent structures out there? I know of two: SeqLocks and AtomicMarkableReference (if/when the JVM embraces it). More background: The SeqLocks are likely to be important for value types (when they are too large for native hardware atomics, and must be accessed atomically). 
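The writer-enter idea can be approximated with bitwise atomics that do exist on `VarHandle` in Java 9 and later. What follows is a rough sketch under that assumption, not the proposed `testAndSetBitAcquire` itself: the class `SeqLockWord` and its methods are hypothetical, and only `getAndBitwiseOrAcquire`, `getAndAddRelease` and `getAcquire` are real API. It shows the lock-word handling only; a full SeqLock also needs the reader retry protocol and fencing around the protected data.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class SeqLockWord {
    // Sequence word: odd means a writer is inside, even means no writer.
    private int state;

    private static final VarHandle STATE;
    static {
        try {
            STATE = MethodHandles.lookup()
                    .findVarHandle(SeqLockWord.class, "state", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Emulates a test-and-set-bit: atomically ORs in the bit and returns its old value.
    boolean testAndSetBit(int index) {
        if (index < 0 || index >= Integer.SIZE) {
            throw new IllegalArgumentException("bit index out of range: " + index);
        }
        int mask = 1 << index;
        int old = (int) STATE.getAndBitwiseOrAcquire(this, mask);
        return (old & mask) != 0;
    }

    // Writer-enter: make the word odd unless it already is; true if we got in.
    // The OR is idempotent, so a failed attempt leaves the word unchanged.
    public boolean tryWriterEnter() {
        return !testAndSetBit(0);
    }

    // Writer-exit: odd + 1 gives the next even value, which also bumps the version.
    public void writerExit() {
        int previous = (int) STATE.getAndAddRelease(this, 1);
        if ((previous & 1) == 0) {
            throw new IllegalStateException("writerExit without writerEnter");
        }
    }

    public int sequence() {
        return (int) STATE.getAcquire(this);
    }

    public static void main(String[] args) {
        SeqLockWord w = new SeqLockWord();
        System.out.println(w.tryWriterEnter()); // first writer gets in
        System.out.println(w.tryWriterEnter()); // second attempt is refused
        w.writerExit();
        System.out.println(w.sequence());
    }
}
```

Whether a JIT maps the OR with a single-bit mask to "lock;bts" on x86 is not something the sketch can promise; it only expresses the operation without a CAS loop.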
Note that many uncontended value types will still need to be used with SeqLocks, when structure-tearing must be prevented for one reason or another. (Yet more background: Non-tearability can be demanded by a value type's definition. If this were not possible, values could not embody invariants that affect security.) Thanks, -- John P.S. For the record here are the important spec. details: /** * Atomically loads the bit at the specified {@code index} in a variable with * the memory semantics of {@link #getAcquire}; if the bit is clear, * sets it with the memory semantics of {@link #set}; and finally returns * the original bit value as a boolean. * *

The variable may be of any primitive type. * Bits are numbered from zero, which refers to the arithmetically * least-significant bit, to {@code N-1} inclusive, where {@code N} is * the number of bits in the variable. Booleans have exactly one bit, * while other variables have an appropriate multiple of eight bits. * *

The method signature is of the form {@code (CT, int index)boolean}. * *

 The symbolic type descriptor at the call site of {@code testAndSetBitAcquire} * must match the access mode type that is the result of calling * {@code accessModeType(VarHandle.AccessMode.TEST_AND_SET_BIT_ACQUIRE)} on this * VarHandle. * * @implNote The effects of this method are similar to a call to * {@code get} and {@code compareAndExchangeAcquire}, where the new * value is obtained from the old value by setting the specified bit. * The full effect of {@code testAndSetBitAcquire} would be obtained * by retrying the sequence as needed until the bit is either observed * to be set, or updated to be set. More efficient implementations may * be available on some platforms. * * @param args the signature-polymorphic parameter list of the form * {@code (CT, int index)} * , statically represented using varargs. * @return a boolean, the original value of the bit (before any update) * , statically represented using {@code Object}. * @throws UnsupportedOperationException if the access mode is unsupported * for this VarHandle. * @throws WrongMethodTypeException if the access mode type is not * compatible with the caller's symbolic type descriptor. * @throws ClassCastException if the access mode type is compatible with the * caller's symbolic type descriptor, but a reference cast fails. * @throws IllegalArgumentException if the supplied index is not in the range * of zero (inclusive) to the number of bits in the variable (exclusive). * @see #getAcquire(Object...) * @see #set(Object...) * @see #compareAndExchangeAcquire(Object...) */ public final native @MethodHandle.PolymorphicSignature @HotSpotIntrinsicCandidate Object testAndSetBitAcquire(Object... 
args); From david.holmes at oracle.com Fri Jul 15 03:16:25 2016 From: david.holmes at oracle.com (David Holmes) Date: Fri, 15 Jul 2016 13:16:25 +1000 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> Message-ID: <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> Hi John, On 15/07/2016 12:09 PM, John Rose wrote: > I think we are missing an important opportunity by not supporting single-bit RMW operations in VarHandles. > > In particular, the x86 "bts" (bit-test-and-test) "btr" (bit-test-and-reset) and "btc" (bit-test-and-clear) are sometimes the right way to modify some data structure, when the alternative is a load and "cmpxchg" in a loop. The overall costs are probably the same in the best case, but the loop-based idiom has some danger (relative to the single-instruction idiom) of costs stemming from larger code size. Is this readily supported on non-x86? David ----- > At the JIT level, one can hope that the CAS-based idiom (coded in the current VH API) will be recognized and optimized to a single instruction on x86, but there is a strong risk that this will fail. It's safer to specify the operation explicitly using a separate VH method. > > The particular use case I have in mind is SeqLocks, specifically the writer-enter operation, which needs to change the lock state to "odd", unless it is already "odd", and let the processor know what happened. An "xadd" cannot do this, but a "cmpxchg" or "bts" can, and the "bts" is preferable. > > (In a rather deep sense, getAndAdd is less powerful than testAndSetBit or getAndBitwiseOr, > because op+ is bijective in each argument, while op| is idempotent. This means that > you can operate bitwise on a structure in such a way that your operation disappears > when the structure is already in some state you are pushing it towards. 
Of course, > you also need a way to "exchange" in the previous value, atomically.) > > For a parallel discussion among the gcc folk, where they are working on pattern matching > of CAS to BTS, see: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244 > > So, here is a specific proposal: > https://bugs.openjdk.java.net/browse/JDK-8161444 > VarHandles should provide access bitwise atomics > > In a nutshell, testAndSetBitAcquire behaves as if it were built on top of > compareAndExchangeAcquire (which it may on some platforms). > The outgoing value parameter is not a value but a bit position within > the memory value (zero = LSB, range-checked). On x86 it compiles > to "lock;bts" with appropriate fencing. It is a great candidate for > building a mutex-enter operation. > > For symmetry, I'm also proposing testAndClearBitRelease (or should it > also be Acquire?) and flipAndGetBit (with Volatile ordering since it is likely > to be stand-alone). But it's just symmetry; testAndSetBitAcquire is the > important part. > > What do folks think? Is "lock;bts" useless for some reason? (Note that > the lock prefix is interpreted by modern x86's as a cache transaction > request, just like xadd, with no external signal.) Or are there no significant > single-bit concurrent structures out there? I know of two: SeqLocks and > AtomicMarkableReference (if/when the JVM embraces it). > > More background: The SeqLocks are likely to be important for value types > (when they are too large for native hardware atomics, and must be accessed > atomically). Note that many uncontended value types will still need to be used > with SeqLocks, when structure-tearing must be prevented for one reason > or another. > > (Yet more background: Non-tearability can be demanded by a value type's definition. > If this were not possible, values could not embody invariants that affect security.) > > Thanks, > ? John > > P.S. For the record here are the important spec. 
details:
>
> /**
>  * Atomically loads the bit at the specified {@code index} in a variable with
>  * the memory semantics of {@link #getAcquire}; if the bit is clear,
>  * sets it with the memory semantics of {@link #set}; and finally returns
>  * the original bit value as a boolean.
>  *
>  * <p>The variable may be of any primitive type.
>  * Bits are numbered from zero, which refers to the arithmetically
>  * least-significant bit, to {@code N-1} inclusive, where {@code N} is
>  * the number of bits in the variable. Booleans have exactly one bit,
>  * while other variables have an appropriate multiple of eight bits.
>  *
>  * <p>The method signature is of the form {@code (CT, int index)boolean}.
>  *
>  * <p>The symbolic type descriptor at the call site of {@code testAndSetBitAcquire}
>  * must match the access mode type that is the result of calling
>  * {@code accessModeType(VarHandle.AccessMode.TEST_AND_SET_BIT_ACQUIRE)} on this
>  * VarHandle.
>  *
>  * @implNote The effects of this method are similar to a call to
>  * {@code get} and {@code compareAndExchangeAcquire}, where the new
>  * value is obtained from the old value by setting the specified bit.
>  * The full effect of {@code testAndSetBitAcquire} would be obtained
>  * by retrying the sequence as needed until the bit is either observed
>  * to be set, or updated to be set. More efficient implementations may
>  * be available on some platforms.
>  *
>  * @param args the signature-polymorphic parameter list of the form
>  * {@code (CT, int index)}
>  * , statically represented using varargs.
>  * @return a boolean, the original value of the bit (before any update)
>  * , statically represented using {@code Object}.
>  * @throws UnsupportedOperationException if the access mode is unsupported
>  * for this VarHandle.
>  * @throws WrongMethodTypeException if the access mode type is not
>  * compatible with the caller's symbolic type descriptor.
>  * @throws ClassCastException if the access mode type is compatible with the
>  * caller's symbolic type descriptor, but a reference cast fails.
>  * @throws IllegalArgumentException if the supplied index is not in the range
>  * of zero (inclusive) to the number of bits in the variable (exclusive).
>  * @see #getAcquire(Object...)
>  * @see #set(Object...)
>  * @see #compareAndExchangeAcquire(Object...)
>  */
> public final native
> @MethodHandle.PolymorphicSignature
> @HotSpotIntrinsicCandidate
> Object testAndSetBitAcquire(Object...
args); > From aph at redhat.com Fri Jul 15 08:06:45 2016 From: aph at redhat.com (Andrew Haley) Date: Fri, 15 Jul 2016 09:06:45 +0100 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> Message-ID: <57889995.5030100@redhat.com> On 15/07/16 04:16, David Holmes wrote: > On 15/07/2016 12:09 PM, John Rose wrote: >> > I think we are missing an important opportunity by not supporting >> > single-bit RMW operations in VarHandles. >> > >> > In particular, the x86 "bts" (bit-test-and-set), "btr" >> > (bit-test-and-reset) and "btc" (bit-test-and-complement) are sometimes >> > the right way to modify some data structure, when the alternative >> > is a load and "cmpxchg" in a loop. The overall costs are >> > probably the same in the best case, but the loop-based idiom has >> > some danger (relative to the single-instruction idiom) of costs >> > stemming from larger code size. > Is this readily supported on non-x86? On ARMv8, yes. >> > In a nutshell, testAndSetBitAcquire behaves as if it were built >> > on top of compareAndExchangeAcquire (which it may on some >> > platforms). >> > The outgoing value parameter is not a value but a bit position >> > within the memory value (zero = LSB, range-checked). On x86 it >> > compiles to "lock;bts" with appropriate fencing. It is a great >> > candidate for building a mutex-enter operation. It's a huge mistake to insist that only a single bit can be set or cleared. If it just so happens that a "bts" can be used, fine, but to bake such a restriction into the library and VM is wrong. The C++ atomic functions which do this job are ++, --, +=, -=, &=, |= and ^=. In their named forms (fetch_add, fetch_sub, fetch_and, fetch_or, fetch_xor), all of these take a std::memory_order argument. C++ compatibility should be our starting point for such things, IMO. Andrew.
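A minimal sketch of the point Andrew makes above: a full-width atomic OR (the counterpart of C++ `fetch_or`) subsumes a single-bit test-and-set, because the caller can pass a one-bit mask and inspect the returned previous value. It assumes the `getAndBitwiseOrAcquire` and `getAndBitwiseAndRelease` methods as they later appeared on `java.lang.invoke.VarHandle` in Java 9; the class `BitFlags` and its method names are hypothetical and only for illustration.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical holder class: one int word whose bits are set and cleared atomically.
final class BitFlags {
    private static final VarHandle WORD;
    static {
        try {
            WORD = MethodHandles.lookup().findVarHandle(BitFlags.class, "word", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private volatile int word;

    // Sets bit `index` with acquire semantics and returns its previous value.
    boolean testAndSetBit(int index) {
        int mask = maskFor(index);
        int previous = (int) WORD.getAndBitwiseOrAcquire(this, mask);
        return (previous & mask) != 0;
    }

    // Clears bit `index` with release semantics and returns its previous value.
    boolean testAndClearBit(int index) {
        int mask = maskFor(index);
        int previous = (int) WORD.getAndBitwiseAndRelease(this, ~mask);
        return (previous & mask) != 0;
    }

    // Volatile read of the whole word.
    int word() {
        return word;
    }

    // Range-checks the bit position (zero = least-significant bit).
    private static int maskFor(int index) {
        if (index < 0 || index >= Integer.SIZE) {
            throw new IllegalArgumentException("bit index out of range: " + index);
        }
        return 1 << index;
    }
}
```

Whether a JIT turns the one-bit-mask case into "lock;bts" is an implementation matter that this sketch does not depend on.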
From dl at cs.oswego.edu Fri Jul 15 16:27:03 2016 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 15 Jul 2016 12:27:03 -0400 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <57889995.5030100@redhat.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> Message-ID: <57890ED7.8090704@cs.oswego.edu> I agree with Andrew. I now think the three C++ bitwise atomic methods were prematurely triaged out. (Sorry for being the triager.) On 07/15/2016 04:06 AM, Andrew Haley wrote: > > The C++ atomic functions which do this job are ++, -- +=, -= These exist both pre- and post- style in both Java and C++, (getAndIncrement vs incrementAndGet etc), but for bitwise operations ... > &=, |=, ^= ... only the getAndX forms seem useful, with only Volatile and Release orderings. Using the default-volatile RMW convention, this would require 6 methods: getAndOrBits, getAndOrBitsRelease, getAndAndBits, getAndAndBitsRelease, getAndXorBits, getAndXorBitsRelease (the embedded "AndAnd" is a little jarring but probably inevitable.) On X86, it would require some compiler work to transform these into locked-bts etc instructions when applicable, but until they are, the unoptimized forms would be no worse than hand-build CAS loops. On ARMv8.1, these translate into new atomic instructions (at least the "release" forms). Similarly for the upcoming RISC-V specs. -Doug From dl at cs.oswego.edu Fri Jul 15 19:17:30 2016 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 15 Jul 2016 15:17:30 -0400 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <57890ED7.8090704@cs.oswego.edu> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> Message-ID: <578936CA.4070304@cs.oswego.edu> On 07/15/2016 12:27 PM, Doug Lea wrote: > ... 
only the getAndX forms seem useful, with only Volatile > and Release orderings. Using the default-volatile RMW convention, > this would require 6 methods: > John suggests the slightly less weird (and thus better): getAndBitwiseOr, getAndBitwiseAnd, getAndBitwiseXor getAndBitwiseOrRelease, getAndBitwiseAndRelease, getAndBitwiseXorRelease that at least separates the two "And"s. And in the spirit of not making another premature triage proposal, perhaps these should also include Acquire variants: getAndBitwiseOrAcquire, getAndBitwiseAndAcquire, getAndBitwiseXorAcquire The implicitly-volatile versions should be useful without implementation penalty in the Acquire use cases that come to mind, but perhaps there are others. Suggestions welcome. -Doug From john.r.rose at oracle.com Fri Jul 15 19:24:04 2016 From: john.r.rose at oracle.com (John Rose) Date: Fri, 15 Jul 2016 12:24:04 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <578936CA.4070304@cs.oswego.edu> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> Message-ID: On Jul 15, 2016, at 12:17 PM, Doug Lea

wrote: > > On 07/15/2016 12:27 PM, Doug Lea wrote: > >> ... only the getAndX forms seem useful, with only Volatile >> and Release orderings. Using the default-volatile RMW convention, >> this would require 6 methods: >> > > John suggests the slightly less weird (and thus better): > getAndBitwiseOr, getAndBitwiseAnd, getAndBitwiseXor > getAndBitwiseOrRelease, getAndBitwiseAndRelease, getAndBitwiseXorRelease > that at least separates the two "And"s. > > And in the spirit of not making another premature triage proposal, > perhaps these should also include Acquire variants: > getAndBitwiseOrAcquire, getAndBitwiseAndAcquire, getAndBitwiseXorAcquire > > The implicitly-volatile versions should be useful without implementation > penalty in the Acquire use cases that come to mind, but perhaps there are > others. Suggestions welcome. Thanks. I withdraw the single-bit proposals! As you note, there are plausible ways to express bts/btr with full-width bitwise ops. (The bitwise ones are my real preference anyway, except for the unpleasant fact that the most common ISA does not support it. I thought I was being cleverly practical by proposing the single-bit versions.) How should these be aligned with compareAndExchange*? By that I mean the ordering of reads and writes should be documented as no weaker than as if the thing had been implemented in terms of some corresponding CAS loop. (Or is there a better way?) This raises a question about omitting the store. Suppose the operation turns out to be a no-op. This can mean that contention is detected or an idempotent op has already raced to completion. In that case, should the op include the Release constraint or not? Put another way, can a reference implementation include the marked optimization or not:

    int getAndBitwiseOr(Object x, int mask) {
      for (;;) {
        int val0 = get(x);  // getPlain
        int val1 = val0 | mask;
        if (val1 == val0)  return val0;  // ALLOW THIS OPTIMIZATION?
        int witness = compareAndExchangeRelease(x, val0, val1);
        if (witness == val0)  return val0;
      }
    }

The optimization allows getAndBitwiseOr to take a weaker form, which is generally desirable. OTOH, if that is a form known to be useless or problematic, we shouldn't go there. The usefulness would stem from the ability of a thread to detect already-done or contention conditions with the least overhead. A weaker form can be strengthened by adding a fence. A stronger form can be weakened by adding additional polling logic, but that is clumsy and error-prone. -- John From martinrb at google.com Fri Jul 15 21:39:53 2016 From: martinrb at google.com (Martin Buchholz) Date: Fri, 15 Jul 2016 14:39:53 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> Message-ID: On Thu, Jul 14, 2016 at 7:09 PM, John Rose wrote: > > The particular use case I have in mind is SeqLocks, specifically the > writer-enter operation, which needs to change the lock state to "odd", > unless it is already "odd", and let the processor know what happened. An > "xadd" cannot do this, but a "cmpxchg" or "bts" can, and the "bts" is > preferable. > Most synchronizers have more complex state than "locked or unlocked". StampedLock is a read-write lock, so you can only acquire the write lock if not currently read-locked. (Did I miss something?) ReentrantLock is reentrant (!) so needs to store the lock hold count. Perhaps ReentrantLock could benefit if you optimize for non-reentrant acquires, at the cost of doing an extra update for reentrant acquires. (In a rather deep sense, getAndAdd is less powerful than testAndSetBit or > getAndBitwiseOr, > because op+ is bijective in each argument, while op| is idempotent.
This > means that > you can operate bitwise on a structure in such a way that your operation > disappears > when the structure is already in some state you are pushing it towards. > Of course, > you also need a way to "exchange" in the previous value, atomically.) > From dl at cs.oswego.edu Fri Jul 15 23:17:19 2016 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 15 Jul 2016 19:17:19 -0400 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> Message-ID: <57896EFF.2000403@cs.oswego.edu> On 07/15/2016 03:24 PM, John Rose wrote: > How should these be aligned with compareAndExchange*? > By that I mean the ordering of reads and writes should be documented > as no weaker than as if the thing had been implemented in terms of > some corresponding CAS loop. (Or is there a better way?) Right. The specs for all getAndX operations should just amount to "equivalent to CAS loop". > > This raises a question about omitting the store. > Suppose the operation turns out to be a no-op. > This can mean that contention is detected or an idempotent > op has already raced to completion. > > In that case, should the op include the Release constraint or not? As a lock implementation question: If you fail the fast path, then the options include spinning using Thread.onSpinWait, which has fence-like effects anyway. And if the alternative is a no-op and it is expected to be common, then users should guard the atomic with a read to filter out most cases. And if it is a queued lock, then you generally need a full volatile fence anyway to operate on the queue. So, across these and other options, release overhead is not often measurable.
Which seems to argue against complicating the effects specification by allowing the early exit in:
>
> Put another way, can a reference implementation include
> the marked optimization or not:
>
> int getAndBitwiseOr(Object x, int mask) {
>   for (;;) {
>     int val0 = get(x);  // getPlain
>     int val1 = val0 | mask;
>     if (val1 == val0)  return val0;  // ALLOW THIS OPTIMIZATION?
>     int witness = compareAndExchangeRelease(x, val0, val1);
>     if (witness == val0)  return val0;
>   }
> }
>
-Doug From john.r.rose at oracle.com Sat Jul 16 00:50:08 2016 From: john.r.rose at oracle.com (John Rose) Date: Fri, 15 Jul 2016 17:50:08 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> Message-ID: On Jul 15, 2016, at 2:39 PM, Martin Buchholz wrote: > > On Thu, Jul 14, 2016 at 7:09 PM, John Rose > wrote: > > The particular use case I have in mind is SeqLocks, specifically the writer-enter operation, which needs to change the lock state to "odd", unless it is already "odd", and let the processor know what happened. An "xadd" cannot do this, but a "cmpxchg" or "bts" can, and the "bts" is preferable. > > Most synchronizers have more complex state than "locked or unlocked". StampedLock is a read-write lock, so you can only acquire the write lock if not currently read-locked. (Did I miss something?) The bitwise stuff allows you to acquire or release a single independent bit in a lock word (or maybe more than one bit). That bit doesn't have to encode the whole state of the lock; in fact if it did we'd use getAndSet of a boolean. The point is you can build lock state management on top of getAndBitwise* in useful ways, when the first interaction with the lock is to assert a setting of that one state bit, while at the same time querying the values of the other bits. > ReentrantLock is reentrant (!) so needs to store the lock hold count.
Perhaps ReentrantLock could benefit if you optimize for non-reentrant acquires, at the cost of doing an extra update for reentrant acquires. It seems to me that any multi-field concurrent structure (like a StampedLock) could be protected by a single-bit micro-lock built on top of a reserved bit taken from one of the structure's fields. There are often reasons not to do such things, but when the technique is appropriate, the bitwise operators let you lay down the bit inside the same cache line as the rest of the structure. That seems like a win to me. Some day we can persuade the JVM to loosen its grip on the slack bits in pointers, allowing types like AtomicMarkableReference to be implemented in one word. In that case, AMR.attemptMark might use BTS/BTR. ? John From paul.sandoz at oracle.com Mon Jul 18 11:43:47 2016 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Mon, 18 Jul 2016 13:43:47 +0200 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <578936CA.4070304@cs.oswego.edu> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> Message-ID: <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> > On 15 Jul 2016, at 21:17, Doug Lea
wrote: > > On 07/15/2016 12:27 PM, Doug Lea wrote: > >> ... only the getAndX forms seem useful, with only Volatile >> and Release orderings. Using the default-volatile RMW convention, >> this would require 6 methods: >> > > John suggests the slightly less weird (and thus better): > getAndBitwiseOr, getAndBitwiseAnd, getAndBitwiseXor > getAndBitwiseOrRelease, getAndBitwiseAndRelease, getAndBitwiseXorRelease > that at least separates the two "And"s. > > And in the spirit of not making another premature triage proposal, > perhaps these should also include Acquire variants: > getAndBitwiseOrAcquire, getAndBitwiseAndAcquire, getAndBitwiseXorAcquire > > The implicitly-volatile versions should be useful without implementation > penalty in the Acquire use cases that come to mind, but perhaps there are > others. Suggestions welcome. > We can support boolean, byte, char, short, int and long, where boolean defers to byte, and char defers to short. In terms of the Unsafe Java implementations have i got the following correct (it?s the acquire variant i am unsure of)? 

    @ForceInline
    public final int getAndBitwiseOrInt(Object o, long offset, int mask) {
        int current;
        do {
            current = getIntVolatile(o, offset);
        } while (!weakCompareAndSwapIntVolatile(o, offset, current, current | mask));
        return current;
    }

    @ForceInline
    public final int getAndBitwiseOrIntRelease(Object o, long offset, int mask) {
        int current;
        do {
            current = getInt(o, offset);
        } while (!weakCompareAndSwapIntRelease(o, offset, current, current | mask));
        return current;
    }

    @ForceInline
    public final int getAndBitwiseOrIntAcquire(Object o, long offset, int mask) {
        int current;
        do {
            current = getIntAcquire(o, offset);
        } while (!weakCompareAndSwapIntAcquire(o, offset, current, current | mask));
        return current;
    }

As previously indicated, with suitable intrinsics and constant power-of-two masks (and their complements) it should be possible to boil it down to almost single-bit-setting instructions on x86 (more so if the returned value, aka current/witness, is dropped).

--

Separately, i would like to propose a naming scheme:

- for the read or write methods, plain is the default.
- for read-modify-write methods, volatile is the default:
  - rename weakCompareAndSet to weakCompareAndSetPlain
  - rename weakCompareAndSetVolatile to weakCompareAndSet
  - deprecate (not for removal) Atomic*.weakCompareAndSet, add Atomic*.weakCompareAndSetPlain

which leaves the inconsistency of Atomic*.weakCompareAndSetVolatile. Analysis on grepcode shows very little usage of the Atomic*.weakCompareAndSet methods. Paul.
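For readers without access to Unsafe, the same retry-loop shape can be sketched against the public `VarHandle` API as it shipped in Java 9, where `weakCompareAndSet` is the volatile form, in line with the renaming Paul proposes. This is an illustration under that assumption, not the webrev code; the class `CasLoopOr` and its method names are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical demo class: atomic OR written as explicit CAS loops over a VarHandle.
final class CasLoopOr {
    private static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup().findVarHandle(CasLoopOr.class, "value", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private int value;

    // Volatile variant: volatile read, then a weak volatile CAS, retried until it succeeds.
    int orVolatile(int mask) {
        int current;
        do {
            current = (int) VALUE.getVolatile(this);
        } while (!VALUE.weakCompareAndSet(this, current, current | mask));
        return current;
    }

    // Release variant: plain read, then a weak CAS with release semantics.
    int orRelease(int mask) {
        int current;
        do {
            current = (int) VALUE.get(this);
        } while (!VALUE.weakCompareAndSetRelease(this, current, current | mask));
        return current;
    }

    // Acquire variant: acquire read, then a weak CAS with acquire semantics.
    int orAcquire(int mask) {
        int current;
        do {
            current = (int) VALUE.getAcquire(this);
        } while (!VALUE.weakCompareAndSetAcquire(this, current, current | mask));
        return current;
    }

    // The built-in operation that the loops above emulate.
    int orBuiltin(int mask) {
        return (int) VALUE.getAndBitwiseOr(this, mask);
    }
}
```

Each method returns the value observed before the update, so the loops and the built-in can be used interchangeably as far as the returned result is concerned; they may differ in cost and in how a JIT compiles them.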
From aph at redhat.com Mon Jul 18 18:27:18 2016 From: aph at redhat.com (Andrew Haley) Date: Mon, 18 Jul 2016 19:27:18 +0100 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> Message-ID: <578D1F86.7070408@redhat.com> On 18/07/16 12:43, Paul Sandoz wrote: > - for the read or write method, plain is the default. > > - for read-modify-write methods volatile is the default volatile > - rename weakCompareAndSet to weakCompareAndSetPlain Why "plain"? Is this the same as C++ "relaxed"? Andrew. From dl at cs.oswego.edu Mon Jul 18 19:31:29 2016 From: dl at cs.oswego.edu (Doug Lea) Date: Mon, 18 Jul 2016 15:31:29 -0400 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> Message-ID: <578D2E91.5000500@cs.oswego.edu> On 07/18/2016 07:43 AM, Paul Sandoz wrote: > In terms of the Unsafe Java implementations have i got the following correct (it?s the acquire variant i am unsure of)? > These look OK. For ARM/POWER, it's possible to avoid some fences in loops at assembly level, but that's why they are intrinsics. > > Separately, i would like to propose a naming scheme: > > - for the read or write method, plain is the default. 
> > - for read-modify-write methods volatile is the default volatile > - rename weakCompareAndSet to weakCompareAndSetPlain > - rename weakCompareAndSetVolatile to weakCompareAndSet > - deprecate (not for removal) Atomic*.weakCompareAndSet, add Atomic*.weakCompareAndSetPlain > which leaves the inconsistency of Atomic*.weakCompareAndSetVolatile. > Sure. This does seem slightly better. (And I'm content to continue to take the blame for OKing the naming :-) > On 07/18/2016 02:27 PM, Andrew Haley wrote: >> On 18/07/16 12:43, Paul Sandoz wrote: >>> - for the read or write method, plain is the default. >>> >>> - for read-modify-write methods volatile is the default volatile >>> - rename weakCompareAndSet to weakCompareAndSetPlain >> >> Why "plain"? Is this the same as C++ "relaxed"? In this case, yes. But Java-plain is not necessarily always the same as C++ relaxed, so we've been cautious with namings. -Doug From aph at redhat.com Tue Jul 19 07:59:24 2016 From: aph at redhat.com (Andrew Haley) Date: Tue, 19 Jul 2016 08:59:24 +0100 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <578D2E91.5000500@cs.oswego.edu> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> Message-ID: <578DDDDC.8050704@redhat.com> On 18/07/16 20:31, Doug Lea wrote: >> On 07/18/2016 02:27 PM, Andrew Haley wrote: >>> On 18/07/16 12:43, Paul Sandoz wrote: >>>> - for the read or write method, plain is the default. >>>> >>>> - for read-modify-write methods volatile is the default volatile >>>> - rename weakCompareAndSet to weakCompareAndSetPlain >>> >>> Why "plain"? Is this the same as C++ "relaxed"? > > In this case, yes. But Java-plain is not necessarily always the > same as C++ relaxed, so we've been cautious with namings. 
Mmmm, but it's baffling for me, and I've been involved for a long time. We have "Opaque" and now "Plain". What is the difference between them? I haven't seen these terms anywhere else. Is this new terminology? Andrew. From paul.sandoz at oracle.com Tue Jul 19 08:39:14 2016 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 19 Jul 2016 10:39:14 +0200 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <578DDDDC.8050704@redhat.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> Message-ID: <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> > On 19 Jul 2016, at 09:59, Andrew Haley wrote: > > On 18/07/16 20:31, Doug Lea wrote: > >>> On 07/18/2016 02:27 PM, Andrew Haley wrote: >>>> On 18/07/16 12:43, Paul Sandoz wrote: >>>>> - for the read or write method, plain is the default. >>>>> >>>>> - for read-modify-write methods volatile is the default volatile >>>>> - rename weakCompareAndSet to weakCompareAndSetPlain >>>> >>>> Why "plain"? Is this the same as C++ "relaxed"? >> >> In this case, yes. But Java-plain is not necessarily always the >> same as C++ relaxed, so we've been cautious with namings. > > Mmmm, but it's baffling for me, and I've been involved for a long > time. We have "Opaque" and now "Plain". What is the difference > between them? I haven't seen these terms anywhere else. Is this new > terminology? > Plain behaves like non-volatile/non-final field access, e.g. like the getfield/putfield bytecodes. Both plain and opaque have "no assurance of memory ordering effects with respect to other threads", but opaque is stronger in the sense that the compiler is restricted in what optimisations it may perform. In a sense the access is "opaque" to the compiler, e.g.
it cannot elide the access or fold it into a more recent access. A good example is presented in Aleksey's VarHandles slides (#55): http://shipilev.net/talks/jpoint-April2016-varhandles.pdf I am still holding off updating the specifications to clarify, as Doug may be cooking some foundational tweaks which we can build upon. Hth, Paul. From paul.sandoz at oracle.com Tue Jul 19 14:23:26 2016 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 19 Jul 2016 16:23:26 +0200 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <578D2E91.5000500@cs.oswego.edu> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> Message-ID: <1A64B60D-F158-4241-8A75-E1E5BD594C87@oracle.com> > On 18 Jul 2016, at 21:31, Doug Lea
wrote: > > On 07/18/2016 07:43 AM, Paul Sandoz wrote: > >> In terms of the Unsafe Java implementations have i got the following correct (it?s the acquire variant i am unsure of)? >> > > These look OK. For ARM/POWER, it's possible to avoid some fences in loops > at assembly level, but that's why they are intrinsics. Here is an initial (and untested) webrev for those that might be interested: http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8161444-vhs-bitwise-atomics/webrev/ Paul. > >> >> Separately, i would like to propose a naming scheme: >> >> - for the read or write method, plain is the default. >> >> - for read-modify-write methods volatile is the default volatile >> - rename weakCompareAndSet to weakCompareAndSetPlain >> - rename weakCompareAndSetVolatile to weakCompareAndSet >> - deprecate (not for removal) Atomic*.weakCompareAndSet, add Atomic*.weakCompareAndSetPlain >> which leaves the inconsistency of Atomic*.weakCompareAndSetVolatile. >> > > Sure. This does seem slightly better. > (And I'm content to continue to take the blame for OKing the naming :-) > >> On 07/18/2016 02:27 PM, Andrew Haley wrote: >>> On 18/07/16 12:43, Paul Sandoz wrote: >>>> - for the read or write method, plain is the default. >>>> >>>> - for read-modify-write methods volatile is the default volatile >>>> - rename weakCompareAndSet to weakCompareAndSetPlain >>> >>> Why "plain"? Is this the same as C++ "relaxed"? > > In this case, yes. But Java-plain is not necessarily always the > same as C++ relaxed, so we've been cautious with namings. 
> > -Doug > > > > From john.r.rose at oracle.com Tue Jul 19 18:56:36 2016 From: john.r.rose at oracle.com (John Rose) Date: Tue, 19 Jul 2016 11:56:36 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <1A64B60D-F158-4241-8A75-E1E5BD594C87@oracle.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <1A64B60D-F158-4241-8A75-E1E5BD594C87@oracle.com> Message-ID: On Jul 19, 2016, at 7:23 AM, Paul Sandoz wrote: > > Here is an initial (and untested) webrev for those that might be interested: > > http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8161444-vhs-bitwise-atomics/webrev/ > I like this very much. Mainly because it gives us single-bit atomics, but also because, with C++, it leans towards the newer ISAs. ? John From aph at redhat.com Tue Jul 19 20:51:37 2016 From: aph at redhat.com (Andrew Haley) Date: Tue, 19 Jul 2016 21:51:37 +0100 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> Message-ID: <578E92D9.80201@redhat.com> On 19/07/16 09:39, Paul Sandoz wrote: > >> On 19 Jul 2016, at 09:59, Andrew Haley wrote: >> >> On 18/07/16 20:31, Doug Lea wrote: >>>> On 07/18/2016 02:27 PM, Andrew Haley wrote: >>>>> On 18/07/16 12:43, Paul Sandoz wrote: >>>>>> - for the read or write method, plain is the default. 
>>>>>> >>>>>> - for read-modify-write methods volatile is the default volatile >>>>>> - rename weakCompareAndSet to weakCompareAndSetPlain >>>>> >>>>> Why "plain"? Is this the same as C++ "relaxed"? >>> >>> In this case, yes. But Java-plain is not necessarily always the >>> same as C++ relaxed, so we've been cautious with namings. >> >> Mmmm, but it's baffling for me, and I've been involved for a long >> time. We have "Opaque" and now "Plain". What is the difference >> between them? I haven't seen these terms anywhere else. Is this new >> terminology? > > Plain behaves like non-volatile/non-final field access e.g. like > get/putfield byte codes. > > Both plain and opaque have "no assurance of memory ordering effects > with respect to other threads" but opaque is stronger in the sense > that the compiler is restricted in what optimisations it may > perform, in a sense the access is "opaque" to the compiler e.g. it > cannot elide the access or fold it into a more recent access etc. OK, but if the processor can reorder accesses (and satisfy them from local caches) in the absence of fences, why is this a distinction that is worth bothering about? And how on Earth would you make such a distinction in the context of a high-level language specification? > A good example is presented in Aleksey's VarHandles slides #55 > > http://shipilev.net/talks/jpoint-April2016-varhandles.pdf Thanks. > I am still holding off updating the specifications to clarify, as > Doug may have cooking some foundational tweaks from which we can > build upon. I look forward to seeing that. Andrew.
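One hedged illustration of where the plain/opaque distinction described in the quoted text can matter is a polling loop. With a plain read of a non-volatile field, a compiler may hoist the load out of the loop, so the loop might never observe the update; an opaque read has to be performed on each iteration, even though it still gives no ordering guarantees for other variables. The sketch below assumes the `getOpaque`/`setOpaque` access modes of `java.lang.invoke.VarHandle` as shipped in Java 9; the class `OpaqueFlag` and its methods are hypothetical, and this is not the example from the slides.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical demo class: a one-shot flag polled with opaque reads.
final class OpaqueFlag {
    private static final VarHandle DONE;
    static {
        try {
            DONE = MethodHandles.lookup().findVarHandle(OpaqueFlag.class, "done", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private boolean done;   // deliberately not volatile

    // Opaque read: the access itself may not be elided or merged away by the compiler.
    boolean isDone() {
        return (boolean) DONE.getOpaque(this);
    }

    // Opaque write: no ordering with respect to other variables is implied.
    void signal() {
        DONE.setOpaque(this, true);
    }

    // Spins until signal() is observed.
    void awaitDone() {
        while (!isDone()) {
            Thread.onSpinWait();
        }
    }

    // Returns true if a spinning thread observed the flag within the timeout.
    static boolean demo(long timeoutMillis) {
        OpaqueFlag flag = new OpaqueFlag();
        Thread spinner = new Thread(flag::awaitDone);
        spinner.setDaemon(true);   // do not keep the JVM alive if the spin never ends
        spinner.start();
        flag.signal();
        try {
            spinner.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !spinner.isAlive();
    }
}
```

The flag carries no data with it: anything that has to be published together with the flag would still need acquire/release or volatile accesses.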
From martinrb at google.com Wed Jul 20 00:14:34 2016 From: martinrb at google.com (Martin Buchholz) Date: Tue, 19 Jul 2016 17:14:34 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> Message-ID: On Fri, Jul 15, 2016 at 5:50 PM, John Rose wrote: > On Jul 15, 2016, at 2:39 PM, Martin Buchholz wrote: > > > On Thu, Jul 14, 2016 at 7:09 PM, John Rose wrote: > >> >> The particular use case I have in mind is SeqLocks, specifically the >> writer-enter operation, which needs to change the lock state to "odd", >> unless it is already "odd", and let the processor know what happened. An >> "xadd" cannot do this, but a "cmpxchg" or "bts" can, and the "bts" is >> preferable. >> > > Most synchronizers have more complex state than "locked or unlocked". > StampedLock is a read-write lock, so you can only acquire the write lock if > not currently read-locked. (Did I miss something?) > > > The bitwise stuff allows you to acquire or release a single independent bit > in a lock word (or maybe more than one bit). That bit doesn't have to > encode > the whole state of the lock; in fact if it did we'd use getAndSet of a > boolean. > The point is you can build lock state management on top of getAndBitwise* > in useful ways, when if the first interaction with the lock is to assert a > setting > I'm still thinking about where in j.u.c. we would use getAndBitwise*. ... StampedLock ... we have to distinguish readers and writers, so both readers and writers acquire the micro-lock before proceeding on success to do another write to indicate the actual current lock state. We'd better not lose our time slice in between! If an acquirer fails to acquire the micro-lock in an indeterminate state, they probably spin waiting for the micro-lock owner, but for how long? ReentrantLock seems more promising. 
The micro-lock bit unambiguously indicates "exclusively held"; other bits are reentrant hold count bits. On reentrant acquire, have to check thread field: lock.thread == Thread.currentThread(). If we don't acquire reentrantly, then a single getAndSetMicroLock is sufficient to unambiguously acquire the lock. ReentrantLock is reentrant (!) so needs to store the lock hold count. > Perhaps ReentrantLock could benefit if you optimize for non-reentrant > acquires, at the cost of doing an extra update for reentrant acquires. > > > It seems to me that any multi-field concurrent structure (like a > StampedLock) > could be protected by a single-bit micro-lock built on top of a reserved > bit taken > from one of the structure's fields. There are often reasons not to do such > things, but when the technique is appropriate, the bitwise operators let > you > lay down the bit inside the same cache line as the rest of the structure. > That seems like a win to me. > > Some day we can persuade the JVM to loosen its grip on the slack bits > in pointers, allowing types like AtomicMarkableReference to be implemented > in one word. In that case, AMR.attemptMark might use BTS/BTR. > But ... AtomicMarkableReference probably needs to be implemented in the VM, not in pure Java code that uses VarHandles, since pointer bit stealing depends on things like compressed oops? 
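[Editorial sketch, not part of the original exchange.] The single-bit "micro-lock" idea discussed above can be sketched as follows. This assumes the method names `getAndBitwiseOrAcquire` and `getAndBitwiseAndRelease` from the `VarHandle` API in released JDKs (at the time of this thread they were still a proposal); the class `MicroLock`, the field `state` and the constant `LOCK_BIT` are hypothetical, and no waiting or reentrancy policy is attempted.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical example class: one bit of a larger state word acts as a lock.
class MicroLock {
    static final int LOCK_BIT = 1;  // bit 0 means "exclusively held"
    private int state;              // other bits remain free, e.g. for a hold count

    private static final VarHandle STATE;
    static {
        try {
            STATE = MethodHandles.lookup()
                    .findVarHandle(MicroLock.class, "state", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Atomically sets the lock bit and reports whether this call was the
    // one that changed it from 0 to 1 (the job BTS does on x86).
    boolean tryAcquire() {
        int prev = (int) STATE.getAndBitwiseOrAcquire(this, LOCK_BIT);
        return (prev & LOCK_BIT) == 0;
    }

    // Atomically clears the lock bit, leaving all other bits untouched.
    void release() {
        STATE.getAndBitwiseAndRelease(this, ~LOCK_BIT);
    }

    boolean isHeld() {
        return (((int) STATE.getVolatile(this)) & LOCK_BIT) != 0;
    }

    public static void main(String[] args) {
        MicroLock lock = new MicroLock();
        System.out.println("first acquire:  " + lock.tryAcquire());
        System.out.println("second acquire: " + lock.tryAcquire());
        lock.release();
        System.out.println("held after release: " + lock.isHeld());
    }
}
```

A failed `tryAcquire` leaves the word unchanged apart from re-asserting a bit that was already set, which is why no compare-and-swap retry loop is needed for the acquire itself.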
From martinrb at google.com Wed Jul 20 00:25:22 2016 From: martinrb at google.com (Martin Buchholz) Date: Tue, 19 Jul 2016 17:25:22 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <578E92D9.80201@redhat.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> Message-ID: On Tue, Jul 19, 2016 at 1:51 PM, Andrew Haley wrote: > On 19/07/16 09:39, Paul Sandoz wrote: > > Plain behaves like non-volatile/non-final field access e.g. like > > get/putfield byte codes. > We should probably clarify whether we really mean that even word-tearing on longs/doubles is allowed. C++ relaxed atomics are (perhaps!) stronger than "plain" in two senses: truly atomic (!) and single-memory-location-sequentially-consistent. From john.r.rose at oracle.com Wed Jul 20 00:31:57 2016 From: john.r.rose at oracle.com (John Rose) Date: Tue, 19 Jul 2016 17:31:57 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> Message-ID: <60A0B4AA-3521-44D4-941A-8ADEF6AF1B23@oracle.com> On Jul 19, 2016, at 5:14 PM, Martin Buchholz wrote: > > > On Fri, Jul 15, 2016 at 5:50 PM, John Rose > wrote: > On Jul 15, 2016, at 2:39 PM, Martin Buchholz > wrote: >> >> On Thu, Jul 14, 2016 at 7:09 PM, John Rose > wrote: >> >> The particular use case I have in mind is SeqLocks, specifically the writer-enter operation, which needs to change the lock state to "odd", unless it is already "odd", and let the processor know what happened. An "xadd" cannot do this, but a "cmpxchg" or "bts" can, and the "bts" is preferable. 
>> >> Most synchronizers have more complex state than "locked or unlocked". StampedLock is a read-write lock, so you can only acquire the write lock if not currently read-locked. (Did I miss something?) > > The bitwise stuff allows you to acquire or release a single independent bit > in a lock word (or maybe more than one bit). That bit doesn't have to encode > the whole state of the lock; in fact if it did we'd use getAndSet of a boolean. > The point is you can build lock state management on top of getAndBitwise* > in useful ways, when if the first interaction with the lock is to assert a setting > > I'm still thinking about where in j.u.c. we would use getAndBitwise*. > > ... StampedLock ... > > we have to distinguish readers and writers, so both readers and writers acquire the micro-lock before proceeding on success to do another write to indicate the actual current lock state. We'd better not lose our time slice in between! If an acquirer fails to acquire the micro-lock in an indeterminate state, they probably spin waiting for the micro-lock owner, but for how long? Yes, more work is needed to make that operate correctly. I suppose we can reuse an idea from HotSpot and have compact and inflated states for such locks. In a nutshell, it works like this: The compact state needs at a minimum just enough bits to encode semantic lock state, plus distinguish compact from inflation states. The lock would try to stay in the compact state, but inflate if waiter lists need to be dealt with. The inflated state would have an out-of-line control block with waiter queues and every creature comfort. It might be hard to do this on top of the JVM, which likes to use safepoints to pull tricks like deflating cold locks. > ReentrantLock seems more promising. The micro-lock bit unambiguously indicates "exclusively held"; other bits are reentrant hold count bits. On reentrant acquire, have to check thread field: > lock.thread == Thread.currentThread(). 
> If we don't acquire reentrantly, then a single getAndSetMicroLock is sufficient to unambiguously acquire the lock. > > >> ReentrantLock is reentrant (!) so needs to store the lock hold count. Perhaps ReentrantLock could benefit if you optimize for non-reentrant acquires, at the cost of doing an extra update for reentrant acquires. > > > It seems to me that any multi-field concurrent structure (like a StampedLock) > could be protected by a single-bit micro-lock built on top of a reserved bit taken > from one of the structure's fields. There are often reasons not to do such > things, but when the technique is appropriate, the bitwise operators let you > lay down the bit inside the same cache line as the rest of the structure. > That seems like a win to me. > > Some day we can persuade the JVM to loosen its grip on the slack bits > in pointers, allowing types like AtomicMarkableReference to be implemented > in one word. In that case, AMR.attemptMark might use BTS/BTR. > > But ... AtomicMarkableReference probably needs to be implemented in the VM, not in pure Java code that uses VarHandles, since pointer bit stealing depends on things like compressed oops? What I mean by "loosen its grip" is share enough layout information about pointers that Java code can find and use a slack bit in the pointer format. (And if there isn't such a bit, then Java code would have to go away and do something else.) Also, for pointers which are treated this way, the GC would have to mask off the shared bits. ? 
John From john.r.rose at oracle.com Wed Jul 20 00:33:52 2016 From: john.r.rose at oracle.com (John Rose) Date: Tue, 19 Jul 2016 17:33:52 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> Message-ID: On Jul 19, 2016, at 5:25 PM, Martin Buchholz wrote: > > We should probably clarify whether we really mean that even word-tearing on > longs/doubles is allowed. Yuck. This is one of the reasons "Plain" is also "Odd". I long for the day when I can fully appreciate this problem -- in the rear view mirror. -- John From john.r.rose at oracle.com Wed Jul 20 00:44:33 2016 From: john.r.rose at oracle.com (John Rose) Date: Tue, 19 Jul 2016 17:44:33 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <60A0B4AA-3521-44D4-941A-8ADEF6AF1B23@oracle.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <60A0B4AA-3521-44D4-941A-8ADEF6AF1B23@oracle.com> Message-ID: <23F4B458-905D-4F07-85E3-BDF6BE6E374C@oracle.com> On Jul 19, 2016, at 5:31 PM, John Rose wrote: > > What I mean by "loosen its grip" is share enough layout information about pointers that Java code can find and use a slack bit in the pointer format. (And if there isn't such a bit, then Java code would have to go away and do something else.) Also, for pointers which are treated this way, the GC would have to mask off the shared bits. P.S. One more thought on this: We probably need a special marking for pointer variables which have this funny property. 
The JVM can lay them out with 64 bits even when they are compressed, and then inform Java code how many slack bits are available. Some days 32 (compressed oops), some days 3 (all 61 high bits are significant) and some days 8/16/24. In the non-compressed case, the GC will have to mask off the bits which the JVM shares with the Java code. This is related to some work Rickard Backman did in 2012, where a 64-bit pointer variable could also contain non-pointer bits usable for any purpose. In that case, the bits were mutually exclusive with the pointer, and a tag scheme would tell the GC and everybody else what was in the variable. This is a different semantics from "stolen" color bits or flag bits, but has many of the same implementation moves. http://cr.openjdk.java.net/~rbackman/tagged.patch/mlvm.hs.patch In the future, we can probably use value types as a principled way to mark such special variables for special processing by the JVM. (I'm thinking TaggedReference, Contended, WeakReference, etc. Details to be worked out later?) From Paul.Sandoz at oracle.com Wed Jul 20 08:25:29 2016 From: Paul.Sandoz at oracle.com (Paul Sandoz) Date: Wed, 20 Jul 2016 10:25:29 +0200 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> Message-ID: <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> > On 20 Jul 2016, at 02:25, Martin Buchholz > wrote: > > > > On Tue, Jul 19, 2016 at 1:51 PM, Andrew Haley > wrote: > On 19/07/16 09:39, Paul Sandoz wrote: > > Plain behaves like non-volatile/non-final field access e.g. like > > get/putfield byte codes. 
> > We should probably clarify whether we really mean that even word-tearing on longs/doubles is allowed. > Just to be clear you are referring to atomicity rather than word tearing as specified by JLS: https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.6 ? (I have tended to use word tearing interchangeably in the past and it has caused confusion.) > C++ relaxed atomics are (perhaps!) stronger than "plain" in two senses: truly atomic (!) and single-memory-location-sequentially-consistent. Yes, it's the latter that seems harder to apply. Paul. From paul.sandoz at oracle.com Wed Jul 20 08:28:26 2016 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Wed, 20 Jul 2016 10:28:26 +0200 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <1A64B60D-F158-4241-8A75-E1E5BD594C87@oracle.com> Message-ID: <463AF8A3-0387-4F26-872B-E291ACC1C5AE@oracle.com> > On 19 Jul 2016, at 20:56, John Rose wrote: > > On Jul 19, 2016, at 7:23 AM, Paul Sandoz > wrote: >> >> Here is an initial (and untested) webrev for those that might be interested: >> >> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8161444-vhs-bitwise-atomics/webrev/ > > > I like this very much. > > Mainly because it gives us single-bit atomics, but also because, with C++, it leans towards the newer ISAs. > Just to be clear, I added a bunch of methods to Unsafe in the anticipation they will be made intrinsic. -- Perhaps I am asking for trouble bringing this up, but do we require acquire/release variants of getAndAdd and getAndSet? (IMHO we could drop addAndGet). Paul. 
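[Editorial sketch, not part of the original exchange.] For readers wondering what acquire/release flavors of getAndAdd and getAndSet look like at a call site, here is a minimal sketch. It assumes the method names `getAndAddAcquire`, `getAndAddRelease`, `getAndSetAcquire` and `getAndSetRelease` found on `VarHandle` in released JDKs, which may differ from the webrev under discussion; the class `Ticket` and its field are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical example class: a counter exposed in several memory-order flavors.
class Ticket {
    private int next;

    private static final VarHandle NEXT;
    static {
        try {
            NEXT = MethodHandles.lookup()
                    .findVarHandle(Ticket.class, "next", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Default flavor: volatile semantics for both the read and the write.
    int takeVolatile() { return (int) NEXT.getAndAdd(this, 1); }

    // Acquire flavor: the read half has acquire semantics, the write is plain.
    int takeAcquire() { return (int) NEXT.getAndAddAcquire(this, 1); }

    // Release flavor: the write half has release semantics, the read is plain.
    int takeRelease() { return (int) NEXT.getAndAddRelease(this, 1); }

    int swapAcquire(int v) { return (int) NEXT.getAndSetAcquire(this, v); }

    int swapRelease(int v) { return (int) NEXT.getAndSetRelease(this, v); }

    public static void main(String[] args) {
        Ticket t = new Ticket();
        System.out.println(t.takeVolatile() + " " + t.takeAcquire() + " " + t.takeRelease());
    }
}
```

All flavors are atomic read-modify-write operations and return the previous value; they differ only in the ordering they impose on surrounding accesses.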
From dl at cs.oswego.edu Wed Jul 20 10:37:31 2016 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 20 Jul 2016 06:37:31 -0400 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <463AF8A3-0387-4F26-872B-E291ACC1C5AE@oracle.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <1A64B60D-F158-4241-8A75-E1E5BD594C87@oracle.com> <463AF8A3-0387-4F26-872B-E291ACC1C5AE@oracle.com> Message-ID: On 07/20/2016 04:28 AM, Paul Sandoz wrote: > > Perhaps i am asking for trouble bringing this up, but do we require acquire/release variants of getAndAdd and getAndSet? (IMHO we could drop addAndGet). > I had noticed this as well. I agree that we ought to do this for the sake of consistency; adding: getAndAddRelease, getAndAddAcquire, getAndSetRelease, getAndSetAcquire -Doug From dl at cs.oswego.edu Wed Jul 20 12:49:14 2016 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 20 Jul 2016 08:49:14 -0400 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> Message-ID: <43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu> On 07/20/2016 04:25 AM, Paul Sandoz wrote: >> C++ relaxed atomics are (perhaps!) stronger than "plain" in two senses: truly atomic (!) 
and single-memory-location-sequentially-consistent. > > Yes, it's the latter that seems harder to apply. > To illustrate the main consequence (also showing how Java-Plain vs C++-relaxed differences are so small and subtle), in C++-relaxed, compilers cannot perform some forms of common subexpression elimination in the presence of possible aliasing, but for Java-plain (and C++-plain), they can. As in: class Point { int x, y; } void f(Point a, Point b) { int r1 = a.x; int r2 = b.x; int r3 = a.x; // simplify to: int r3 = r1 ? use (r1, r2, r3); } If the accesses were C++-relaxed, then the transformation could not be applied if a and b are the same point because the r3 read might be older than r2 if some other thread wrote between the reads. But C++-plain and Java-Plain both allow this to be done anyway. Intuitively, because the per view (a vs b) reads are "coherent", which is spec'ed as OK even though the per-location rule need not hold. (Mostly unrelatedly, note that if a and b were known to be aliased, then you could apply this transformation if you first simplified the "r2 = b.x" to "r2 = r1".) -Doug From paul.sandoz at oracle.com Wed Jul 20 14:18:02 2016 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Wed, 20 Jul 2016 16:18:02 +0200 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <1A64B60D-F158-4241-8A75-E1E5BD594C87@oracle.com> <463AF8A3-0387-4F26-872B-E291ACC1C5AE@oracle.com> Message-ID: > On 20 Jul 2016, at 12:37, Doug Lea
wrote: > > On 07/20/2016 04:28 AM, Paul Sandoz wrote: > >> >> Perhaps i am asking for trouble bringing this up, but do we require acquire/release variants of getAndAdd and getAndSet? (IMHO we could drop addAndGet). >> > > I had noticed this as well. I agree that we ought to do this for the > sake of consistency; adding: > > getAndAddRelease, getAndAddAcquire, getAndSetRelease, getAndSetAcquire > > Updated: http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8161444-vhs-bitwise-atomics/webrev/ Again the same trick with Unsafe methods is applied in the anticipation they will be made intrinsic later on. On second thoughts it may be better if the currently non-intrinsic Unsafe acquire/release variants defer to the stronger volatile variant that is intrinsic. Any opinions on that? I will defer the removal of addAndGet and the proposed renaming to separate patch. Paul. From john.r.rose at oracle.com Wed Jul 20 16:32:21 2016 From: john.r.rose at oracle.com (John Rose) Date: Wed, 20 Jul 2016 09:32:21 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <1A64B60D-F158-4241-8A75-E1E5BD594C87@oracle.com> <463AF8A3-0387-4F26-872B-E291ACC1C5AE@oracle.com> Message-ID: <0ECAA8CD-1751-4C31-8EB5-DF7ED38A96DF@oracle.com> On Jul 20, 2016, at 7:18 AM, Paul Sandoz wrote: > On second thoughts it may be better if the currently non-intrinsic Unsafe acquire/release variants defer to the stronger volatile variant that is intrinsic. Any opinions on that? I would prefer that the default implementations of the various bitwise ops defer to the same-flavored CAS ops instead of to the volatile bitwise ops. 
Reason: On platforms without rich bitwise ops (x86, SPARC) you lose memory ordering information if you alias to the volatile version. (It's not a strong reason, since those CPUs are TSO.) Platforms with rich bitwise ops are also likely to have rich fences, so again there's no benefit to aliasing to the volatile version. -- John From martinrb at google.com Wed Jul 20 16:59:49 2016 From: martinrb at google.com (Martin Buchholz) Date: Wed, 20 Jul 2016 09:59:49 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> Message-ID: On Wed, Jul 20, 2016 at 1:25 AM, Paul Sandoz wrote: > > > We should probably clarify whether we really mean that even word-tearing > on longs/doubles is allowed. > > Just to be clear you are referring to atomicity rather than word tearing > as specified by JLS: > > https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.6 < > https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.6> > > ? (I have tended to use word tearing interchangeably in the past and it > has caused confusion.) > > YIKES! I just re-read 17.6. Word Tearing and 17.7. Non-atomic Treatment of double and long and now realize I've been using "word tearing" to mean 17.7 instead of 17.6 for many years. I don't have a good word for 17.6, but I want something along the lines of "ghost writes" or "collateral damage". Am I supposed to visualize "tearing" as (sad eye water) tears running out of one byte across neighbor bytes? 
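[Editorial sketch, not part of the original exchange.] To make the JLS 17.6 sense of "word tearing" concrete: two threads that each update their own element of a `byte[]` must not disturb each other's element, even though both bytes normally share a machine word. The following is a minimal sketch of that guarantee; the class `NoWordTearing` and the method `run` are hypothetical names. It does not exercise the separate JLS 17.7 issue (non-atomic long/double).

```java
// Hypothetical example class illustrating the JLS 17.6 guarantee.
class NoWordTearing {

    // Each thread increments only its own byte, 'iterations' times.
    static byte[] run(int iterations) {
        byte[] cells = new byte[2];
        Thread[] workers = new Thread[2];
        for (int i = 0; i < workers.length; i++) {
            final int idx = i;
            workers[i] = new Thread(() -> {
                for (int n = 0; n < iterations; n++) {
                    cells[idx]++;   // plain read-modify-write of this thread's byte only
                }
            });
        }
        for (Thread t : workers) {
            t.start();
        }
        try {
            for (Thread t : workers) {
                t.join();           // join makes the workers' writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return cells;
    }

    public static void main(String[] args) {
        int iterations = 100_000;
        byte[] result = run(iterations);
        byte expected = (byte) iterations;  // the byte counter wraps around
        System.out.println(result[0] == expected && result[1] == expected);
    }
}
```

If an implementation updated a byte by rewriting the whole enclosing word, one thread's store could overwrite the neighbor's latest value; that is the behavior section 17.6 rules out.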
From john.r.rose at oracle.com Wed Jul 20 17:05:48 2016 From: john.r.rose at oracle.com (John Rose) Date: Wed, 20 Jul 2016 10:05:48 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> Message-ID: On Jul 20, 2016, at 9:59 AM, Martin Buchholz wrote: > > On Wed, Jul 20, 2016 at 1:25 AM, Paul Sandoz wrote: > >> >>> We should probably clarify whether we really mean that even word-tearing >> on longs/doubles is allowed. >> >> Just to be clear you are referring to atomicity rather than word tearing >> as specified by JLS: >> >> https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.6 < >> https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.6> >> >> ? (I have tended to use word tearing interchangeably in the past and it >> has caused confusion.) >> >> > YIKES! I just re-read > 17.6. Word Tearing > and > 17.7. Non-atomic Treatment of double and long > > and now realize I've been using "word tearing" to mean 17.7 instead of 17.6 > for many years. I don't have a good word for 17.6, but I want something > along the lines of "ghost writes" or "collateral damage". > > Am I supposed to visualize "tearing" as (sad eye water) tears running out > of one byte across neighbor bytes? I call the 17.7 thing "struct tearing", in the State of the Values 2014. http://cr.openjdk.java.net/~jrose/values/values.html ? 
John From boehm at acm.org Wed Jul 20 18:20:43 2016 From: boehm at acm.org (Hans Boehm) Date: Wed, 20 Jul 2016 11:20:43 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> Message-ID: You are not alone. I have the suspicion that "word tearing" used to mean 17.7 before the 2005 JLS revision. But the JLS usage seems to have won, for better or worse, at least in Java circles. On Wed, Jul 20, 2016 at 9:59 AM, Martin Buchholz wrote: > On Wed, Jul 20, 2016 at 1:25 AM, Paul Sandoz > wrote: > > > > > > We should probably clarify whether we really mean that even > word-tearing > > on longs/doubles is allowed. > > > > Just to be clear you are referring to atomicity rather than word tearing > > as specified by JLS: > > > > https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.6 < > > https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.6> > > > > ? (I have tended to use word tearing interchangeably in the past and it > > has caused confusion.) > > > > > YIKES! I just re-read > 17.6. Word Tearing > and > 17.7. Non-atomic Treatment of double and long > > and now realize I've been using "word tearing" to mean 17.7 instead of 17.6 > for many years. I don't have a good word for 17.6, but I want something > along the lines of "ghost writes" or "collateral damage". > > Am I supposed to visualize "tearing" as (sad eye water) tears running out > of one byte across neighbor bytes? 
> From boehm at acm.org Wed Jul 20 18:42:30 2016 From: boehm at acm.org (Hans Boehm) Date: Wed, 20 Jul 2016 11:42:30 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> Message-ID: On Wed, Jul 20, 2016 at 1:25 AM, Paul Sandoz wrote: > > > > On 20 Jul 2016, at 02:25, Martin Buchholz > wrote: > > > C++ relaxed atomics are (perhaps!) stronger than "plain" in two senses: truly atomic (!) and single-memory-location-sequentially-consistent. > > Yes, it's the latter that seems harder to apply. > I'm not sure whether it's "harder to apply" or "less consciously assumed", i.e. generally implicitly assumed, but without the programmer's awareness. At least in my experience, memory_order_relaxed tends to be surprisingly commonly used for what one might call "single word data structures": An individual word that describes some aspect of the state independent of other data structures. I suspect a lot of such code is not prepared to see such data flip-flop back and forth repeatedly as the result of a single update. If a counter is only ever incremented by a single thread, programmers don't expect it to decrease. At a minimum, it's much easier to reason about such code if you don't have to consider this possibility. All hardware vendors either provide the property by default (errata aside), or provide a relatively cheap mechanism that adds it (only Itanium that I know of). 
I believe the property is worth its (compiler only on the most commonly used hardware) performance cost where you have some reason to believe that the data is concurrently accessed. It's pretty clearly undesirable for plain non-racing accesses, since it does interfere with compiler optimization. I would put it in a similar category to long/double/struct atomicity. From dl at cs.oswego.edu Wed Jul 20 19:16:26 2016 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 20 Jul 2016 15:16:26 -0400 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> <43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu> Message-ID: <007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu> Replying to Hans by replying to myself :-) On 07/20/2016 08:49 AM, Doug Lea wrote: > in C++-relaxed, compilers cannot perform > some forms of common subexpression elimination in the presence of possible > aliasing, but for Java-plain (and C++-plain), they can. As in: > > class Point { int x, y; } > > void f(Point a, Point b) { > int r1 = a.x; > int r2 = b.x; > int r3 = a.x; // simplify to: int r3 = r1 ? > use (r1, r2, r3); > } Or, in pseudo-VarHandle style using "getM" (for varying Ms): static VarHandle PX = MethodHandles.lookup().findVarHandle(Point.class, "x", int.class); void f(Point a, Point b) { int r1 = PX.getM(a); int r2 = PX.getM(b); int r3 = PX.getM(a); // * use (r1, r2, r3); } Can you simplify (*) to "r3 = r1" ? It depends on M: * Java-Plain and C++-Plain: yes. * Java Opaque: no. 
* C++-Relaxed: only if a != b. * (And, for the record, other modes: no) This is one reason "opaque" mode is needed. Neither Plain nor Opaque exactly match C++ Relaxed atomics, but together you can express everything (and probably more). You can create similar but more contrived-looking examples for read-after-write and write-after-write. And also for write-after-read, but that one may interact with out-of-thin-air and related issues. (Which if we had a good enough solution for, or even knew how to encapsulate, fleshing out formal/formalizable specs on the above should not be hard. People do continue to work on this, so there is still hope.) -Doug From martinrb at google.com Wed Jul 20 21:11:37 2016 From: martinrb at google.com (Martin Buchholz) Date: Wed, 20 Jul 2016 14:11:37 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> Message-ID: 17.6 should be called "word bleeding" 17.7 should be called "long fission" (fission breaks up your atoms!) On Wed, Jul 20, 2016 at 11:20 AM, Hans Boehm wrote: > You are not alone. I have the suspicion that "word tearing" used to mean > 17.7 before the 2005 JLS revision. But the JLS usage seems to have won, > for better or worse, at least in Java circles. > > On Wed, Jul 20, 2016 at 9:59 AM, Martin Buchholz > wrote: > >> On Wed, Jul 20, 2016 at 1:25 AM, Paul Sandoz >> wrote: >> >> > >> > > We should probably clarify whether we really mean that even >> word-tearing >> > on longs/doubles is allowed. 
>> > >> > Just to be clear you are referring to atomicity rather than word tearing >> > as specified by JLS: >> > >> > https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.6 >> < >> > https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.6> >> > >> > ? (I have tended to use word tearing interchangeably in the past and it >> > has caused confusion.) >> > >> > >> YIKES! I just re-read >> 17.6. Word Tearing >> and >> 17.7. Non-atomic Treatment of double and long >> >> and now realize I've been using "word tearing" to mean 17.7 instead of >> 17.6 >> for many years. I don't have a good word for 17.6, but I want something >> along the lines of "ghost writes" or "collateral damage". >> >> Am I supposed to visualize "tearing" as (sad eye water) tears running out >> of one byte across neighbor bytes? >> > > From boehm at acm.org Wed Jul 20 23:17:47 2016 From: boehm at acm.org (Hans Boehm) Date: Wed, 20 Jul 2016 16:17:47 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> <43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu> <007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu> Message-ID: On Wed, Jul 20, 2016 at 12:16 PM, Doug Lea
wrote: > > Replying to Hans by replying to myself :-) > > On 07/20/2016 08:49 AM, Doug Lea wrote: > >> in C++-relaxed, compilers cannot perform >> some forms of common subexpression elimination in the presence of >> possible >> aliasing, but for Java-plain (and C++-plain), they can. As in: >> >> class Point { int x, y; } >> >> void f(Point a, Point b) { >> int r1 = a.x; >> int r2 = b.x; >> int r3 = a.x; // simplify to: int r3 = r1 ? >> use (r1, r2, r3); >> } >> > > Or, in pseudo-VarHandle style using "getM" (for varying Ms): > > static VarHandle PX = MethodHandles.lookup().findVarHandle(Point.class, > "x", int.class); > > void f(Point a, Point b) { > int r1 = PX.getM(a); > int r2 = PX.getM(b); > int r3 = PX.getM(a); // * > use (r1, r2, r3); > } > > Can you simplify (*) to "r3 = r1" ? It depends on M: > * Java-Plain and C++-Plain: yes. > * Java Opaque: no. > Does Opaque imply cache coherence? In the Opaque case, is r3 guaranteed to see a store that is no earlier than the one seen by r2? Or are we still only talking compiler optimizations? > * C++-Relaxed: only if a != b. * (And, for the record, other modes: no) > What if the compiler knows that a==b? Can all the get()s be merged, even for Opaque? > This is one reason "opaque" mode is needed. Neither Plain nor Opaque > exactly > match C++ Relaxed atomics, but together you can express everything (and > probably more). > > You can create similar but more contrived-looking examples for > read-after-write and write-after-write. And also for write-after-read, > but that one may interact with out-of-thin-air and related issues. > (Which if we had a good enough solution for, or even knew how to > encapsulate, fleshing out formal/formalizable specs on the above should > not be hard. People do continue to work on this, so there is still hope.) 
> > -Doug > > From david.holmes at oracle.com Thu Jul 21 05:16:52 2016 From: david.holmes at oracle.com (David Holmes) Date: Thu, 21 Jul 2016 15:16:52 +1000 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> Message-ID: On 21/07/2016 4:20 AM, Hans Boehm wrote: > You are not alone. I have the suspicion that "word tearing" used to mean > 17.7 before the 2005 JLS revision. But the JLS usage seems to have won, > for better or worse, at least in Java circles. No not at all. word-tearing has "always" concerned the inability to perform sub-word atomic accesses - ie the subword has to be torn out of the word. Here's a 2001 reference which was part of the discussion that led to the JLS update :) http://www.cs.umd.edu/~pugh/java/memoryModel/archive/0967.html Cheers, David > > On Wed, Jul 20, 2016 at 9:59 AM, Martin Buchholz > wrote: > >> On Wed, Jul 20, 2016 at 1:25 AM, Paul Sandoz >> wrote: >> >>> >>>> We should probably clarify whether we really mean that even >> word-tearing >>> on longs/doubles is allowed. >>> >>> Just to be clear you are referring to atomicity rather than word tearing >>> as specified by JLS: >>> >>> https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.6 < >>> https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.6> >>> >>> ? (I have tended to use word tearing interchangeably in the past and it >>> has caused confusion.) >>> >>> >> YIKES! I just re-read >> 17.6. Word Tearing >> and >> 17.7. 
Non-atomic Treatment of double and long >> >> and now realize I've been using "word tearing" to mean 17.7 instead of 17.6 >> for many years. I don't have a good word for 17.6, but I want something >> along the lines of "ghost writes" or "collateral damage". >> >> Am I supposed to visualize "tearing" as (sad eye water) tears running out >> of one byte across neighbor bytes? >> From aph at redhat.com Thu Jul 21 05:45:40 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 21 Jul 2016 06:45:40 +0100 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> Message-ID: <57906184.3070008@redhat.com> On 20/07/16 01:25, Martin Buchholz wrote: > We should probably clarify whether we really mean that even word-tearing on > longs/doubles is allowed. I surely hope that the answer to that is "no"! > C++ relaxed atomics are (perhaps!) stronger than "plain" in two senses: > truly atomic (!) and single-memory-location-sequentially-consistent. Earlier in the development of this respin of the JMM, I remember someone (Doug?) saying that compatibility with C++ was an important consideration, We seem to be drifting away from that, for no good reason that I understand. Andrew. 
From aph at redhat.com Thu Jul 21 05:53:48 2016 From: aph at redhat.com (Andrew Haley) Date: Thu, 21 Jul 2016 06:53:48 +0100 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <57906184.3070008@redhat.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <57906184.3070008@redhat.com> Message-ID: <5790636C.1050804@redhat.com> On 21/07/16 06:45, Andrew Haley wrote: > Earlier in the development of this respin of the JMM, I remember > someone (Doug?) saying that compatibility with C++ was an important > consideration, We seem to be drifting away from that, for no good > reason that I understand. I withdraw this comment in the light of later replies to this thread. Andrew. From dl at cs.oswego.edu Thu Jul 21 12:53:07 2016 From: dl at cs.oswego.edu (Doug Lea) Date: Thu, 21 Jul 2016 08:53:07 -0400 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> <43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu> <007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu> Message-ID: <2bdaf6ba-7b14-3729-86ea-8a76d6aadf4f@cs.oswego.edu> On 07/20/2016 07:17 PM, Hans Boehm wrote: > Can you simplify (*) to "r3 = r1" ? 
It depends on M: > * Java-Plain and C++-Plain: yes. > * Java Opaque: no. > > Does Opaque imply cache coherence? Here, yes. > * C++-Relaxed: only if a != b. > > * (And, for the record, other modes: no) > > What if the compiler knows that a==b? Can all the get()s be merged, even for Opaque? Not in general unless thread-private (unescaped). I hope to write up a summary of progress and open issues soon, but in the meantime, here is the extremely telegraphic version of my current thoughts: Start with coherence, characterized in the same way as the current C++17 draft (http://www.open-std.org/jtc1/sc22/wg21/). Like C++17, use sc-per-loc as basis, here for opaque, but (unlike C++ relaxed) constraining read->write reorderings ("prescient" or "promised" writes), probably based on in-progress work by mpi-sws group. Distinguish Plain from Opaque by weakening to sc-per-view (i.e., possibly-aliased access paths). Disable merges for all modes except plain if there exists any possible execution that could detect doing so (for example disabling transforming a spin-loop into an "if"). Account for possible mode weakenings for (unescaped) thread-private variables among other cases. Add RA (release/acquire) and SC ("volatile") rules based on work over the past year or so by Mark Batty, Viktor Vafeiadis, and others (which also seem mostly present in C++17 draft). Also add other fence and final field rules. 
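The spin-loop case mentioned above can be made concrete. The following is only a minimal sketch, assuming the Java 9 VarHandle API; the class OpaqueSpin and its members are hypothetical names chosen for illustration. The point is that every loop iteration performs its own opaque read, so the loop is not supposed to be collapsed into a single test followed by an unconditional infinite loop.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical illustration; class and member names are not from the thread.
class OpaqueSpin {
    private int ready; // non-volatile field, accessed only through READY

    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(OpaqueSpin.class, "ready", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Each iteration issues a fresh opaque read; the reads may not be merged
    // into one read hoisted out of the loop.
    void awaitReady() {
        while ((int) READY.getOpaque(this) == 0) {
            Thread.onSpinWait();
        }
    }

    void signalReady() {
        READY.setOpaque(this, 1);
    }

    // Returns true if the spinning thread saw the write and terminated.
    static boolean demo() {
        OpaqueSpin s = new OpaqueSpin();
        Thread waiter = new Thread(s::awaitReady);
        waiter.setDaemon(true); // do not keep the JVM alive if the spin never ends
        waiter.start();
        s.signalReady();
        try {
            waiter.join(10_000);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return !waiter.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("waiter finished: " + demo());
    }
}
```

With a plain read in place of getOpaque, a compiler would be free to hoist the read, which is exactly the spin-loop-into-"if" transformation described above.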
-Doug From john.r.rose at oracle.com Fri Jul 22 02:57:18 2016 From: john.r.rose at oracle.com (John Rose) Date: Thu, 21 Jul 2016 19:57:18 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> Message-ID: On Jul 20, 2016, at 1:25 AM, Paul Sandoz wrote: >> >> On Tue, Jul 19, 2016 at 1:51 PM, Andrew Haley > wrote: >>> On 19/07/16 09:39, Paul Sandoz wrote: >>> Plain behaves like non-volatile/non-final field access e.g. like >>> get/putfield byte codes. >> >> We should probably clarify whether we really mean that even word-tearing on longs/doubles is allowed. Putting aside the history and esthetics of terms, the big question here is whether to remove the exception for 64 bit types in 17.7 (Non-atomic Treatment of double and long), and mandate that all primitive types are atomic, including non-volatile longs and doubles. Is it time to do that yet, or is there some 32-bit JVM out there that will fall over if it has to do the volatile dance even on non-volatile types? I'm going to guess that there still *ARE* such JVMs out there, but their number is decreasing exponentially over time. Eventually we can do this. Argument to keep things as they are: 64-bit non-atomicity (aka struct tearing) is just a precursor to non-atomicity of 128-bit and larger value types. It's a permanent feature on our landscape, so don't fight it. VarHandles provide an easy-enough way to select either the atomic or the non-atomic accesses for 64-bit things (right?) 
and presumably they will do the same for value types. I see one possibly urgent argument to move away from non-atomicity of longs in the VH API: VH's support atomic operations on arbitrary, unprepared longs. (Before, the JVM got fair warning of atomicity, because the long was tagged as "volatile". Now any long is fair game.) Doesn't that require even plain references to at least look around carefully for a VH, before they just move the two halves non-atomically? After all, for large structs, the STM required for atomics is *incompatible* with the naive component-wise loads and stores. Or, do VH's just refuse to perform atomic operations on non-volatile longs, on 32-bit machines? One way to avoid these corner cases is just chop off the corner, and require all JVMs to treat 64-bit primitives as atomic, always. -- John From dl at cs.oswego.edu Fri Jul 22 11:11:00 2016 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 22 Jul 2016 07:11:00 -0400 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> Message-ID: <71516e79-e0d5-af0f-b6ee-fb1c3a877605@cs.oswego.edu> On 07/21/2016 10:57 PM, John Rose wrote: > Putting aside the history and esthetics of terms, the big question > here is whether to remove the exception for 64 bit types in > 17.7 (Non-atomic Treatment of double and long), and mandate > that all primitive types are atomic, including non-volatile longs > and doubles. 
(This was in the initial issues list for JMM revision, and like every other issue, refuses to go away all by itself :-) > > Is it time to do that yet, or is there some 32-bit JVM out there > that will fall over if it has to do the volatile dance even on > non-volatile types? > > I'm going to guess that there still *ARE* such JVMs out there, > but their number is decreasing exponentially over time. > Eventually we can do this. > > Argument to keep things as they are: 64-bit non-atomicity > (aka struct tearing) is just a precursor to non-atomicity of 128-bit > and larger value types. It's a permanent feature on our landscape, > so don't fight it. Right. I think that this is where we last left this. > > VarHandles provide an easy-enough way to select either the > atomic or the non-atomic accesses for 64-bit things (right?) > and presumably they will do the same for value types. > > I see one possibly urgent argument to move away from > non-atomicity of longs in the VH API: VH's support atomic > operations on arbitrary, unprepared longs. The recommended usage is that (as has always been the case) concurrently accessible fields should be declared as volatile. This provides safe defaults. People can then use VH for other (non-Plain) access methods. If people follow this usage guidance, all is well. Except that this doesn't hold for array elements, which cannot be declared as volatile. Here, usages relying on access atomicity must use only non-Plain access methods. This is not always easy to ensure -- people need to avoid calling other methods that might access elements without VarHandles unless there is no possibility of concurrent access during the call. But people writing intentionally racy code using arrays need to be careful about things like this anyway. 
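One way to follow the array guidance above is to keep the array private and route every element access through a VarHandle. This is a minimal sketch, assuming the Java 9 MethodHandles.arrayElementVarHandle API; the wrapper class OpaqueLongArray is a hypothetical name, not something proposed in this thread.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical wrapper; the name OpaqueLongArray is illustrative only.
final class OpaqueLongArray {
    private static final VarHandle ELEMENTS =
            MethodHandles.arrayElementVarHandle(long[].class);

    // The array is never handed out, so no caller can perform a Plain
    // (possibly non-atomic) access to an element behind our back.
    private final long[] data;

    OpaqueLongArray(int length) {
        data = new long[length];
    }

    long get(int i) {
        return (long) ELEMENTS.getOpaque(data, i);
    }

    void set(int i, long v) {
        ELEMENTS.setOpaque(data, i, v);
    }

    // Atomic read-modify-write; returns the previous value.
    long getAndAdd(int i, long delta) {
        return (long) ELEMENTS.getAndAdd(data, i, delta);
    }

    public static void main(String[] args) {
        OpaqueLongArray a = new OpaqueLongArray(4);
        a.set(2, 0x1234_5678_9ABC_DEF0L);
        a.getAndAdd(2, 1L);
        System.out.println(Long.toHexString(a.get(2)));
    }
}
```

Encapsulating the array is what enforces the "only non-Plain access" rule; passing the raw long[] to other code would reopen the problem described above.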
-Doug From paul.sandoz at oracle.com Fri Jul 22 11:38:01 2016 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Fri, 22 Jul 2016 13:38:01 +0200 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> Message-ID: <42711493-491F-4326-88DF-8BB6CBBE3C39@oracle.com> > On 22 Jul 2016, at 04:57, John Rose wrote: > > On Jul 20, 2016, at 1:25 AM, Paul Sandoz wrote: > >>> >>> On Tue, Jul 19, 2016 at 1:51 PM, Andrew Haley > wrote: >>>> On 19/07/16 09:39, Paul Sandoz wrote: >>>> Plain behaves like non-volatile/non-final field access e.g. like >>>> get/putfield byte codes. >>> >>> We should probably clarify whether we really mean that even word-tearing on longs/doubles is allowed. > > Putting aside the history and esthetics of terms, the big question > here is whether to remove the exception for 64 bit types in > 17.7 (Non-atomic Treatment of double and long), and mandate > that all primitive types are atomic, including non-volatile longs > and doubles. > > Is it time to do that yet, or is there some 32-bit JVM out there > that will fall over if it has to do the volatile dance even on > non-volatile types? > > I'm going to guess that there still *ARE* such JVMs out there, > but their number is decreasing exponentially over time. > Eventually we can do this. > > Argument to keep things as they are: 64-bit non-atomicity > (aka struct tearing) is just a precursor to non-atomicity of 128-bit > and larger value types. It's a permanent feature on our landscape, > so don't fight it. 
> > VarHandles provide an easy-enough way to select either the > atomic or the non-atomic accesses for 64-bit things (right?) Yes, set/get is not guaranteed to be atomic; all other accesses are, under the guidelines Doug mentions in his last email. > and presumably they will do the same for value types. > > I see one possibly urgent argument to move away from > non-atomicity of longs in the VH API: VH's support atomic > operations on arbitrary, unprepared longs. (Before, the > JVM got fair warning of atomicity, because the long > was tagged as "volatile". Now any long is fair game.) > Doesn't that require even plain references to at least > look around carefully for a VH, before they just move > the two halves non-atomically? After all, for large > structs, the STM required for atomics is *incompatible* > with the naive component-wise loads and stores. > > Or, do VH's just refuse to perform atomic operations on > non-volatile longs, on 32-bit machines? > We have some jcstress tests checking atomicity. The concern I have implementation-wise is AtomicLong has this: /** * Records whether the underlying JVM supports lockless * compareAndSwap for longs. While the intrinsic compareAndSwapLong * method works in either case, some constructions should be * handled at Java level to avoid locking user-visible locks. */ static final boolean VM_SUPPORTS_LONG_CAS = VMSupportsCS8(); /** * Returns whether underlying JVM supports lockless CompareAndSet * for longs. Called only once and cached in VM_SUPPORTS_LONG_CAS. */ private static native boolean VMSupportsCS8(); And that field VM_SUPPORTS_LONG_CAS is used only in AtomicLongFieldUpdater. Relevant VarHandle implementations don't currently make this distinction. At the moment I don't fully understand the bit about "locking user-visible locks". However, this seems separate from atomicity. Paul. > One way to avoid these corner cases is just chop off > the corner, and require all JVMs to treat 64-bit primitives > as atomic, always. 
> > -- John From dl at cs.oswego.edu Fri Jul 22 12:27:30 2016 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 22 Jul 2016 08:27:30 -0400 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <42711493-491F-4326-88DF-8BB6CBBE3C39@oracle.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> <42711493-491F-4326-88DF-8BB6CBBE3C39@oracle.com> Message-ID: On 07/22/2016 07:38 AM, Paul Sandoz wrote: > The concern i have implementation-wise is AtomicLong has this: > > /** > * Records whether the underlying JVM supports lockless > * compareAndSwap for longs. While the intrinsic compareAndSwapLong > * method works in either case, some constructions should be > * handled at Java level to avoid locking user-visible locks. > */ > static final boolean VM_SUPPORTS_LONG_CAS = VMSupportsCS8(); > This was initially needed to support Power5. I am not sure if it returns false on any jdk9-supported platforms -- if so, probably only non-OpenJDK "closed" ones. The problem is/was that volatile long reads are implemented differently than AtomicLong.get (i.e., VH getVolatile) and that the lock-based CAS implementation relied on the latter. The internal AtomicLongFieldUpdater.LockedUpdater was used to make sure that the locked versions were always used so long as all accesses used the Updater. People working on closed hotspot ports are invited to help figure out whether this is still necessary. 
-Doug From martinrb at google.com Fri Jul 22 20:29:58 2016 From: martinrb at google.com (Martin Buchholz) Date: Fri, 22 Jul 2016 13:29:58 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> Message-ID: On Wed, Jul 20, 2016 at 10:16 PM, David Holmes wrote: > On 21/07/2016 4:20 AM, Hans Boehm wrote: > >> You are not alone. I have the suspicion that "word tearing" used to mean >> 17.7 before the 2005 JLS revision. But the JLS usage seems to have won, >> for better or worse, at least in Java circles. >> > > No not at all. word-tearing has "always" concerned the inability to > perform sub-word atomic accesses - ie the subword has to be torn out of the > word. > > Here's a 2001 reference which was part of the discussion that led to the > JLS update :) > > http://www.cs.umd.edu/~pugh/java/memoryModel/archive/0967.html Thanks for the history lesson. "word tearing" still seems unintuitive to me - it's the INability to tear up a word into sub-words that's the problem. That is, "word tearing" is not the problem, it's the solution we can't use! But my own "word bleeding" is also not that great, and unlikely to catch on. 
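Since the two JLS sections keep getting conflated in this thread, here is a small sketch of what each one is about. It is an illustration under stated assumptions, not text from the JLS: the class TearingTerms and its method names are hypothetical. The first method exercises the 17.6 guarantee (writes to adjacent bytes must not clobber each other); the second is a checker for the 17.7 effect, which a conforming JVM is permitted, but not required, to exhibit for a plain long.

```java
// Hypothetical demo; names are illustrative and not taken from the JLS.
class TearingTerms {
    // JLS 17.6 (word tearing): two threads updating adjacent array elements
    // must not interfere, so both writes are visible after the joins.
    static byte[] adjacentByteWrites() {
        byte[] bytes = new byte[2];
        Thread t0 = new Thread(() -> bytes[0] = 1);
        Thread t1 = new Thread(() -> bytes[1] = 2);
        t0.start();
        t1.start();
        try {
            t0.join();
            t1.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return bytes;
    }

    // JLS 17.7 (non-atomic long/double): a plain long may be written as two
    // 32-bit halves. If writers only ever store values whose halves are equal
    // (such as 0L and -1L), a reader seeing unequal halves has observed a mix
    // of two different writes.
    static boolean mixesTwoWrites(long observed) {
        return (int) (observed >>> 32) != (int) observed;
    }

    public static void main(String[] args) {
        byte[] b = adjacentByteWrites();
        System.out.println(b[0] + "," + b[1]);
        System.out.println(mixesTwoWrites(0xFFFFFFFF00000000L));
    }
}
```

In the 17.6 case the "word" is the machine word containing both bytes; in the 17.7 case the thing split in two is the datum itself, which is the distinction argued over above.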
From martinrb at google.com Fri Jul 22 20:50:57 2016 From: martinrb at google.com (Martin Buchholz) Date: Fri, 22 Jul 2016 13:50:57 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <71516e79-e0d5-af0f-b6ee-fb1c3a877605@cs.oswego.edu> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> <71516e79-e0d5-af0f-b6ee-fb1c3a877605@cs.oswego.edu> Message-ID: On Fri, Jul 22, 2016 at 4:11 AM, Doug Lea
wrote: > > The recommended usage is that (as has always been the case), concurrently > accessible fields should be declared as volatile. This provides safe > defaults. > People can then use VH for other (non-Plain) access methods. > If people follow this usage guidance, all is well. Except that this doesn't > hold for array elements, that cannot be declared as volatile. > Here, usages relying on access atomicity must use only non-Plain access > methods. This is not always easy to ensure -- people need to avoid > calling other methods that might access elements without VarHandles > unless there is no possibility of concurrent access during call. > But people writing intentionally racy code using arrays need to be careful > about things like this anyway. Is it reasonable to add syntax for volatile array elements? There's obvious confusion between the reference and the elements, e.g. we already have volatile int[] volatile_array_reference; It could go like C declarations. Then we would get (volatile int)[] array_with_volatile_elements; volatile (volatile int)[] volatile_array_reference_with_volatile_elements; Yeah, they'll hate us forever for adding that! From boehm at acm.org Fri Jul 22 21:51:03 2016 From: boehm at acm.org (Hans Boehm) Date: Fri, 22 Jul 2016 14:51:03 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> Message-ID: "Word-tearing" was definitely and consistently used that way in the JSR 133 discussions. That's where the JLS terminology came from. 
But people I've talked to who didn't participate in that effort generally seemed to share Martin's opinion. On Fri, Jul 22, 2016 at 1:29 PM, Martin Buchholz wrote: > > > On Wed, Jul 20, 2016 at 10:16 PM, David Holmes > wrote: > >> On 21/07/2016 4:20 AM, Hans Boehm wrote: >> >>> You are not alone. I have the suspicion that "word tearing" used to mean >>> 17.7 before the 2005 JLS revision. But the JLS usage seems to have won, >>> for better or worse, at least in Java circles. >>> >> >> No not at all. word-tearing has "always" concerned the inability to >> perform sub-word atomic accesses - ie the subword has to be torn out of the >> word. >> >> Here's a 2001 reference which was part of the discussion that led to the >> JLS update :) >> >> http://www.cs.umd.edu/~pugh/java/memoryModel/archive/0967.html > > > Thanks for the history lesson. "word tearing" still seems unintuitive to > me - it's the INability to tear up a word into sub-words that's the > problem. That is, "word tearing" is not the problem, it's the solution we > can't use! But my own "word bleeding" is also not that great, and unlikely > to catch on. > From boehm at acm.org Fri Jul 22 22:07:03 2016 From: boehm at acm.org (Hans Boehm) Date: Fri, 22 Jul 2016 15:07:03 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <71516e79-e0d5-af0f-b6ee-fb1c3a877605@cs.oswego.edu> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> <71516e79-e0d5-af0f-b6ee-fb1c3a877605@cs.oswego.edu> Message-ID: On Fri, Jul 22, 2016 at 4:11 AM, Doug Lea
wrote: > > On 07/21/2016 10:57 PM, John Rose wrote: > >> Putting aside the history and esthetics of terms, the big question >> here is whether to remove the exception for 64 bit types in >> 17.7 (Non-atomic Treatment of double and long), and mandate >> that all primitive types are atomic, including non-volatile longs >> and doubles. > > > (This was in the initial issues list for JMM revision, and like > every other issue, refuses to go away all by itself :-) > >> >> Is it time to do that yet, or is there some 32-bit JVM out there >> that will fall over if it has to do the volatile dance even on >> non-volatile types? >> >> I'm going to guess that there still *ARE* such JVMs out there, >> but their number is decreasing exponentially over time. >> Eventually we can do this. >> >> Argument to keep things as they are: 64-bit non-atomicity >> (aka struct tearing) is just a precursor to non-atomicity of 128-bit >> and larger value types. It's a permanent feature on our landscape, >> so don't fight it. > > > Right. I think that this where we last left this. The other argument that we missed last time is there are MIPS variants for which atomicity of 64-bit types is expensive. The same applies to 32 bit ARM with the "large physical address extension". I don't think these constitute a large fraction of the interesting devices anymore, but I also suspect there are still way too many of them to be ignored. 
From john.r.rose at oracle.com Fri Jul 22 22:15:34 2016 From: john.r.rose at oracle.com (John Rose) Date: Fri, 22 Jul 2016 15:15:34 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> Message-ID: On Jul 22, 2016, at 2:51 PM, Hans Boehm wrote: > > "Word-tearing" was definitely and consistently used that way in the JSR 133 > discussions. That's where the JLS terminology came from. The word that is torn is the word *containing* the datum of interest. Which is why when people quite naturally assume that "the word" is the datum itself, the term is unintelligible. With "struct tearing" it is more clear that the thing being torn is in fact the datum of interest. Where "torn" covers all cases of "exposes non-semantic memory states in variables". On Jul 22, 2016, at 1:50 PM, Martin Buchholz wrote: > Is it reasonable to add syntax for volatile array elements? It's difficult. I've been thinking about this for a while for the related use case of frozen arrays (array-of-final-T). With frozen arrays you need a store check even for primitive arrays. For volatile arrays you'd need a load check and a store check for all arrays. Since all array references include a range check, a JVM implementor would want to find a clever way of piggy-backing the load and store checks onto the range check, and then having all the checks float together out of loops. Really, an array of volatiles is an oxymoron like a herd of cats or team of individuals. 
In normal arrays there is affinity between neighboring values; with volatiles there is a basic decoupling between neighbors, since each has its own distinct sequence of effects. (I'm speaking of typical uses of arrays; you can use arrays as low-level storage where neighbors have no logical connection with each other. We optimize for the typical case.) All this assumes we make array-of-volatile-ints be a subtype of array-of-int-type. An alternative today is to have array-like container types which are *not* related to the legacy array type. Like a private normal array, accessed only by VH-based atomics. -- John From aph at redhat.com Mon Jul 25 08:54:14 2016 From: aph at redhat.com (Andrew Haley) Date: Mon, 25 Jul 2016 09:54:14 +0100 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> <43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu> <007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu> Message-ID: <5795D3B6.9050407@redhat.com> On 20/07/16 20:16, Doug Lea wrote: > Or, in pseudo-VarHandle style using "getM" (for varying Ms): > > static VarHandle PX = MethodHandles.lookup().findVarHandle(Point.class, "x", > int.class); > > void f(Point a, Point b) { > int r1 = PX.getM(a); > int r2 = PX.getM(b); > int r3 = PX.getM(a); // * > use (r1, r2, r3); > } > > Can you simplify (*) to "r3 = r1" ? It depends on M: > * Java-Plain and C++-Plain: yes. > * Java Opaque: no. > * C++-Relaxed: only if a != b. 
> * (And, for the record, other modes: no) > > This is one reason "opaque" mode is needed. But the processor hardware is allowed to simplify (*) to "r3 = r1" even if Opaque is used. So, again, why does it matter? Andrew. From aph at redhat.com Mon Jul 25 08:54:20 2016 From: aph at redhat.com (Andrew Haley) Date: Mon, 25 Jul 2016 09:54:20 +0100 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <578E92D9.80201@redhat.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> Message-ID: <5795D3BC.4050308@redhat.com> On 19/07/16 21:51, Andrew Haley wrote: > On 19/07/16 09:39, Paul Sandoz wrote: >> >> Both plain and opaque have "no assurance of memory ordering effects >> with respect to other threads" but opaque is stronger in the sense >> that the compiler is restricted in what optimisations it may >> perform, in a sense the access is "opaque" to the compiler e.g. it >> cannot elide the access or fold it into a more recent access etc. > > OK, but if the processor can reorder accesses (and satisfy them from > local caches) in the absence of fences, why is this a distinction that > is worth bothering about? And how on Earth would you make such a > distinction in the context of a high-level language specification? I'm still wondering about this one. I think Doug has said that Opaque accesses are coherent but Plain accesses aren't. I guess there's also non-atomic treatment of long and double. Andrew. 
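For reference, the pseudo-VarHandle example quoted above can be written against the actual Java 9 API by substituting get (Plain) and getOpaque for "getM". This is only a sketch: the class ReadMerging and the constructor on Point are additions for illustration, and the comments restate the yes/no answers given earlier in the thread rather than any normative text.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class Point {
    int x, y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

// Hypothetical holder class; only Point and PX come from the quoted example.
class ReadMerging {
    static final VarHandle PX;
    static {
        try {
            PX = MethodHandles.lookup().findVarHandle(Point.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // M = Plain: per the list above, the third read may be simplified to r1.
    static int[] plainReads(Point a, Point b) {
        int r1 = (int) PX.get(a);
        int r2 = (int) PX.get(b);
        int r3 = (int) PX.get(a); // (*)
        return new int[] { r1, r2, r3 };
    }

    // M = Opaque: per the list above, the compiler should not merge (*) with
    // the first read, even though a and b may alias.
    static int[] opaqueReads(Point a, Point b) {
        int r1 = (int) PX.getOpaque(a);
        int r2 = (int) PX.getOpaque(b);
        int r3 = (int) PX.getOpaque(a); // (*)
        return new int[] { r1, r2, r3 };
    }

    public static void main(String[] args) {
        Point p = new Point(7, 0);
        System.out.println(java.util.Arrays.toString(plainReads(p, p)));
        System.out.println(java.util.Arrays.toString(opaqueReads(p, p)));
    }
}
```

Single-threaded, the two methods are indistinguishable; the difference only shows up when another thread writes x between the reads, which is why the question is about permitted transformations rather than observable results in a test like this.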
From dl at cs.oswego.edu Mon Jul 25 12:35:29 2016 From: dl at cs.oswego.edu (Doug Lea) Date: Mon, 25 Jul 2016 08:35:29 -0400 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <5795D3BC.4050308@redhat.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <5795D3BC.4050308@redhat.com> Message-ID: On 07/25/2016 04:54 AM, Andrew Haley wrote: > > I'm still wondering about this one. I think Doug has said that > Opaque accesses are coherent but Plain accesses aren't. I guess > there's also non-atomic treatment of long and double. Users familiar with C/C++-11/17 will use Java opaque whenever they would use C++ atomic-relaxed, and the implementation effects should be indistinguishable. Which is not the same as saying the specs can or should be identical (if only because they deal with different languages). Which sometimes forces formal attention to distinctions otherwise not worth bothering about. Reminder of the game plan for VarHandles Javadocs: Initially use wordings that are frustratingly loose but surely not wrong with respect to the range of possible rigorous specs. Improve them when possible. 
-Doug From dl at cs.oswego.edu Mon Jul 25 13:24:37 2016 From: dl at cs.oswego.edu (Doug Lea) Date: Mon, 25 Jul 2016 09:24:37 -0400 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <5795D3B6.9050407@redhat.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> <43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu> <007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu> <5795D3B6.9050407@redhat.com> Message-ID: <3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu> On 07/25/2016 04:54 AM, Andrew Haley wrote: > On 20/07/16 20:16, Doug Lea wrote: >> Or, in pseudo-VarHandle style using "getM" (for varying Ms): >> >> static VarHandle PX = MethodHandles.lookup().findVarHandle(Point.class, "x", >> int.class); >> >> void f(Point a, Point b) { >> int r1 = PX.getM(a); >> int r2 = PX.getM(b); >> int r3 = PX.getM(a); // * >> use (r1, r2, r3); >> } >> >> Can you simplify (*) to "r3 = r1" ? It depends on M: >> * Java-Plain and C++-Plain: yes. >> * Java Opaque: no. >> * C++-Relaxed: only if a != b. >> * (And, for the record, other modes: no) >> >> This is one reason "opaque" mode is needed. > > But the processor hardware is allowed to simplify (*) to "r3 = r1" even > if Opaque is used. So, again, why does it matter? > The existence of one case where it may not matter doesn't mean that it is always OK (consider loops), and so the best practical answer for compilers is "no" here. 
(Arguably, it should similarly be "no" for C++-relaxed, depending in part on whether "coherence" is defined to entail progress properties by the memory system (as cache memory-system specs normally do) especially given the C++17 updates about execution progress.) The issue of merging reads is in many ways symmetric to that of inserting writes. For example when spilling registers, JVMs never store transient garbage-values into possibly-visible home locations of variables, even though there may be cases where they could. Coming up with a formal model and spec that clearly delineates legal transformations hits a lot of "little" issues along these lines. -Doug From aph at redhat.com Mon Jul 25 13:50:36 2016 From: aph at redhat.com (Andrew Haley) Date: Mon, 25 Jul 2016 14:50:36 +0100 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> <43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu> <007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu> <5795D3B6.9050407@redhat.com> <3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu> Message-ID: <5796192C.3020402@redhat.com> On 25/07/16 14:24, Doug Lea wrote: > On 07/25/2016 04:54 AM, Andrew Haley wrote: >> On 20/07/16 20:16, Doug Lea wrote: >>> Or, in pseudo-VarHandle style using "getM" (for varying Ms): >>> >>> static VarHandle PX = MethodHandles.lookup().findVarHandle(Point.class, "x", >>> int.class); >>> >>> void f(Point a, Point b) { >>> int r1 = PX.getM(a); >>> int r2 = PX.getM(b); >>> int r3 = 
PX.getM(a); // * >>> use (r1, r2, r3); >>> } >>> >>> Can you simplify (*) to "r3 = r1" ? It depends on M: >>> * Java-Plain and C++-Plain: yes. >>> * Java Opaque: no. >>> * C++-Relaxed: only if a != b. >>> * (And, for the record, other modes: no) >>> >>> This is one reason "opaque" mode is needed. >> >> But the processor hardware is allowed to simplify (*) to "r3 = r1" even >> if Opaque is used. So, again, why does it matter? > > The existence of one case where it may not matter doesn't mean > that it is always OK (consider loops), and so the best practical > answer for compilers is "no" here. Well, OK, but I'm trying to think of one case where a Java program could tell the difference between the two, and I'm coming up empty. One could argue (and I would argue) that if it's not possible to write such a test case then perhaps such a thing doesn't belong in a language specification. > Coming up with a formal model and spec that clearly delineates legal > transformations hits a lot of "little" issues along these lines. It sure does! Andrew. 
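The quoted "getM" pseudo-code can be made concrete as a minimal sketch, assuming JDK 9 or later. The class name OpaqueReads, the nested Point class, and the two method names are hypothetical; PX mirrors the VarHandle in the quoted example, with M instantiated once as Plain (get) and once as Opaque (getOpaque). Run on one thread, both variants return the same values, which is exactly the difficulty raised above: the difference is in which transformations are permitted, not in anything this program can observe.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.Arrays;

class OpaqueReads {
    // Hypothetical stand-in for the Point class in the quoted example.
    static class Point { int x; }

    static final VarHandle PX;
    static {
        try {
            PX = MethodHandles.lookup().findVarHandle(Point.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // M = Plain: per the thread, (*) may be simplified to "r3 = r1".
    static int[] plainReads(Point a, Point b) {
        int r1 = (int) PX.get(a);
        int r2 = (int) PX.get(b);
        int r3 = (int) PX.get(a); // (*)
        return new int[] { r1, r2, r3 };
    }

    // M = Opaque: per the thread, the compiler should not merge (*) into r1.
    static int[] opaqueReads(Point a, Point b) {
        int r1 = (int) PX.getOpaque(a);
        int r2 = (int) PX.getOpaque(b);
        int r3 = (int) PX.getOpaque(a); // (*)
        return new int[] { r1, r2, r3 };
    }

    public static void main(String[] args) {
        Point a = new Point();
        Point b = new Point();
        a.x = 1;
        b.x = 2;
        System.out.println(Arrays.toString(plainReads(a, b)));
        System.out.println(Arrays.toString(opaqueReads(a, b)));
    }
}
```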
From dl at cs.oswego.edu Mon Jul 25 14:28:59 2016 From: dl at cs.oswego.edu (Doug Lea) Date: Mon, 25 Jul 2016 10:28:59 -0400 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <5796192C.3020402@redhat.com> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> <43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu> <007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu> <5795D3B6.9050407@redhat.com> <3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu> <5796192C.3020402@redhat.com> Message-ID: <1d1b8341-ad74-b60a-18b1-d2cc23069561@cs.oswego.edu> On 07/25/2016 09:50 AM, Andrew Haley wrote: > Well, OK, but I'm trying to think of one case where a Java program > could tell the difference between the two, and I'm coming up empty. Oh, sorry for not including some. Using Point and PX VarHandle for Point.x: 1. Unbounded spin: while (PX.getOpaque(a) == 0) ; Note that programmers would normally use getAcquire or getVolatile here, but the question remains even if they don't. Can this be transformed into a conditional infinite spin? As in: if (PX.getOpaque(a) == 0) for (;;) ; Not if coherence is defined to entail progress. 2. Bounded spin: long i = 1000; while (PX.getOpaque(a) == 0 && --i > 0) ; Can this be optimized into a no-op? What if i = Long.MAX_VALUE? Under coherence, an implementation would have to establish some maximum bound K for merges to decide if/when to do this. In which case the best option is for the spec to say that K must be exactly one (i.e., no merges) for the sake of definitiveness. 
-Doug From boehm at acm.org Mon Jul 25 18:19:05 2016 From: boehm at acm.org (Hans Boehm) Date: Mon, 25 Jul 2016 11:19:05 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <1d1b8341-ad74-b60a-18b1-d2cc23069561@cs.oswego.edu> References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> <43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu> <007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu> <5795D3B6.9050407@redhat.com> <3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu> <5796192C.3020402@redhat.com> <1d1b8341-ad74-b60a-18b1-d2cc23069561@cs.oswego.edu> Message-ID: Just to make sure we're clear here. The differences between Opaque and Plain seem to be: 1. Opaque is cache coherent (i.e. single-variable sequentially consistent), just like memory_order_relaxed in C++. This means that Opaque will generate different instructions on architectures that don't promise cache coherence by default. (Currently probably just Itanium, but hardware architects seem to eventually want to apply similar optimizations to compilers.) 2. Opaque prevents compiler merging of accesses, which probably makes it more like volatile atomic in C++. (WG21/SG1 has been discussing some related restrictions on non-volatile atomics, but they haven't gone anywhere. Certainly C++17 is unlikely to say anything here. From my perspective, C++ "volatile" really seems to be more defined by processor ABIs than the language standard, for the reasons Andrew mentioned. 
Standard-conforming programs usually can't tell conclusively whether the rules are being followed, but low-level systems programs can.) In my mind, (2) is separable from coherence. The intent would be to strengthen (Java) volatile, etc., so they are strictly stronger than Opaque? Currently I don't think there is a guarantee that a bounded spin loop using volatiles can't be collapsed to a no-op. Presumably no reasonable implementations actually do that, however. I have no idea whether there are implementations that merge a pair of volatile loads. On Mon, Jul 25, 2016 at 7:28 AM, Doug Lea
wrote: > On 07/25/2016 09:50 AM, Andrew Haley wrote: > >> Well, OK, but I'm trying to think of one case where a Java program >> could tell the difference between the two, and I'm coming up empty. >> > > Oh, sorry for not including some. Using Point and PX VarHandle for Point.x: > > 1. Unbounded spin: > while (PX.getOpaque(a) == 0) ; > > Note that programmers would normally use getAcquire or getVolatile > here, but the question remains even if they don't. > > Can this be transformed into a conditional infinite spin? As in: > if (PX.getOpaque(a) == 0) for (;;) ; > Not if coherence is defined to entail progress. > > 2. Bounded spin: > long i = 1000; > while (PX.getOpaque(a) == 0 && --i > 0) ; > > Can this be optimized into a no-op? What if i = Long.MAX_VALUE? > Under coherence, an implementation would have to establish some > maximum bound K for merges to decide if/when to do this. > In which case the best option is for the spec to say that K must > be exactly one (i.e., no merges) for the sake of definitiveness. 
> > -Doug > > > > > From dl at cs.oswego.edu Mon Jul 25 19:24:48 2016 From: dl at cs.oswego.edu (Doug Lea) Date: Mon, 25 Jul 2016 15:24:48 -0400 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> <43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu> <007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu> <5795D3B6.9050407@redhat.com> <3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu> <5796192C.3020402@redhat.com> <1d1b8341-ad74-b60a-18b1-d2cc23069561@cs.oswego.edu> Message-ID: On 07/25/2016 02:19 PM, Hans Boehm wrote: > 1. Opaque is cache coherent (i.e. single-variable sequentially consistent), just > like memory_order_relaxed in C++. > > 2. Opaque prevents compiler merging of accesses, > > In my mind, (2) is separable from coherence. This might not be the right venue to discuss whether the new C++17 sec 1.10.4 progress requirements apply to the memory system. I think they must, and that this would be consistent with common formal cache-memory-system specs. In which case you are inevitably led to the no-merge rule, as seen in the examples I posted. And even if this were not done in C++, I don't know any argument for not doing so in Java. No programmer would be happy if their bounded spin loops were allowed to be transformed into no-ops. Why allow something that literally no one wants rather than just hoping that compilers don't happen to do it? 
(Gratuitously editorializing, one would think that in C++, it might also be popular to adopt this interpretation, and eliminate the need to ever integrate C "volatile", or to re-spec consume mode.) -Doug From paulmck at linux.vnet.ibm.com Tue Jul 26 17:09:18 2016 From: paulmck at linux.vnet.ibm.com (Paul E. McKenney) Date: Tue, 26 Jul 2016 10:09:18 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> <43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu> <007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu> <5795D3B6.9050407@redhat.com> <3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu> <5796192C.3020402@redhat.com> <1d1b8341-ad74-b60a-18b1-d2cc23069561@cs.oswego.edu> Message-ID: <20160726170918.GA7094@linux.vnet.ibm.com> On Mon, Jul 25, 2016 at 03:24:48PM -0400, Doug Lea wrote: > On 07/25/2016 02:19 PM, Hans Boehm wrote: > > >1. Opaque is cache coherent (i.e. single-variable sequentially consistent), just > >like memory_order_relaxed in C++. > > > >2. Opaque prevents compiler merging of accesses, > > > >In my mind, (2) is separable from coherence. > > This might not be the right venue to discuss whether the new C++17 sec 1.10.4 > progress requirements apply to the memory system. I think they must, and > that this would be consistent with common formal cache-memory-system specs. > > In which case you are inevitably led to the no-merge rule, as seen in the > examples I posted. > > And even if this were not done in C++, I don't know any argument for > not doing so in Java. No programmer would be happy if their bounded > spin loops were allowed to be transformed into no-ops. Why allow > something that literally no one wants rather than just hoping that > compilers don't happen to do it? 
> > (Gratuitously editorializing, one would think that in C++, > it might also be popular to adopt this interpretation, and > eliminate the need to ever integrate C "volatile", or to > re-spec consume mode.) Yes and no. If I am working on a low-level synchronization primitive, then yes, I really do want the system to do -exactly- what I tell it to, no more, no less. But in higher-level code, I would likely be quite happy for the compiler to fuse accesses, if it could do so without violating the memory model. Thanx, Paul From boehm at acm.org Tue Jul 26 19:26:36 2016 From: boehm at acm.org (Hans Boehm) Date: Tue, 26 Jul 2016 12:26:36 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com> <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com> <57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu> <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com> <578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com> <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com> <578E92D9.80201@redhat.com> <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> <43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu> <007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu> <5795D3B6.9050407@redhat.com> <3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu> <5796192C.3020402@redhat.com> <1d1b8341-ad74-b60a-18b1-d2cc23069561@cs.oswego.edu> Message-ID: I'm not quite sure which document you're referring to for C++. The latest draft (N4604 or N4606) reorganized section 1.10. 1.10.2 discusses forward progress in a lot more detail than before. But I think the only directly relevant statement here is p18, which was there before: "An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time." Recall that "should" (as opposed to e.g. 
"shall") is ISO standardese for a non-binding recommendation. The reason I haven't pushed for something stronger is that I don't think hardware specifications consistently contain the corresponding guarantees, which would put language implementers in a weird position. But that could probably be argued either way. This is now separate from the core memory model in 1.10.1. I think the "no merge" rule is not really formally specifiable, since it's a compiler-only constraint that can't be tested by a conforming program. We could specify a "no infinite merge" rule that handles the unbounded spin case on reasonable hardware. As I'm occasionally reminded by my WG21 colleagues, it's not clear that the extreme cases here are worth spending too much time on, since nobody is going to use an implementation that gets them wrong, no matter what we say. The tricky and more interesting cases are probably something like: l.my_spin_lock(); // Implemented with acquire CAS if (...) { ... l.my_spin_unlock(); // release store } else { ... l.my_spin_unlock(); ... // No synchronization; Known to terminate in bounded time } Can I move the unlock release store out of the conditional to merge the two stores? On Mon, Jul 25, 2016 at 12:24 PM, Doug Lea
wrote: > On 07/25/2016 02:19 PM, Hans Boehm wrote: > > 1. Opaque is cache coherent (i.e. single-variable sequentially >> consistent), just >> like memory_order_relaxed in C++. >> >> 2. Opaque prevents compiler merging of accesses, >> >> In my mind, (2) is separable from coherence. >> > > This might not be the right venue to discuss whether the new C++17 sec > 1.10.4 > progress requirements apply to the memory system. I think they must, and > that this would be consistent with common formal cache-memory-system specs. > > In which case you are inevitably led to the no-merge rule, as seen in the > examples I posted. > > And even if this were not done in C++, I don't know any argument for > not doing so in Java. No programmer would be happy if their bounded > spin loops were allowed to be transformed into no-ops. Why allow > something that literally no one wants rather than just hoping that > compilers don't happen to do it? > > (Gratuitously editorializing, one would think that in C++, > it might also be popular to adopt this interpretation, and > eliminate the need to ever integrate C "volatile", or to > re-spec consume mode.) > > -Doug > > > From dl at cs.oswego.edu Tue Jul 26 20:03:44 2016 From: dl at cs.oswego.edu (Doug Lea) Date: Tue, 26 Jul 2016 16:03:44 -0400 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <20160726170918.GA7094@linux.vnet.ibm.com> References: <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> <43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu> <007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu> <5795D3B6.9050407@redhat.com> <3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu> <5796192C.3020402@redhat.com> <1d1b8341-ad74-b60a-18b1-d2cc23069561@cs.oswego.edu> <20160726170918.GA7094@linux.vnet.ibm.com> Message-ID: <10809e13-82eb-70cd-553a-b4ed4890ce82@cs.oswego.edu> Moving ever further away from the alleged subject line... On 07/26/2016 01:09 PM, Paul E. 
McKenney wrote: > On Mon, Jul 25, 2016 at 03:24:48PM -0400, Doug Lea wrote: >> (Gratuitously editorializing, one would think that in C++, >> it might also be popular to adopt this interpretation, and >> eliminate the need to ever integrate C "volatile", or to >> re-spec consume mode.) > > Yes and no. > > If I am working on a low-level synchronization primitive, then yes, > I really do want the system to do -exactly- what I tell it to, no more, > no less. > > But in higher-level code, I would likely be quite happy for the compiler > to fuse accesses, if it could do so without violating the memory model. > The C++-relaxed spec definitely shows this tension. Sometimes people want it to mean just "plain, but don't tear words". Which is not the same as what you'd otherwise spec as "the cheapest mode for a thread-safe variable respecting coherence". In Java, with the availability of "Plain" accesses even for volatiles, and access-atomicity for references and <=32bit scalars, there is little motivation to compromise for Opaque mode. In which case, the main premise is that when users use non-plain access modes for reads (similarly, but less interestingly writes), they are expressing that they intend to handle all of the possible program traces that might result if two subsequent reads see different values. So implementations cannot be allowed to merge reads in ways that are sure to reduce the number of possible program traces. Again, this is symmetric to the idea that implementations cannot be allowed to add writes (e.g., duplicate them) in ways that are sure to increase the number of possible program traces. It is surely possible to introduce a formalization of traces that rigorously states both constraints. But it is not easy to define an underlying trace model that covers practical execution issues. So in a language spec, it may be preferable to just say no merged reads and no added writes for atomics. Which is what C++ and Java both do now for no-added-writes. 
Or, it may be a better idea to leave the trace-based requirements incompletely formalized, which should have the same practical effect. Or even better (but not soon) agree upon some formalism. -Doug From boehm at acm.org Wed Jul 27 17:55:44 2016 From: boehm at acm.org (Hans Boehm) Date: Wed, 27 Jul 2016 10:55:44 -0700 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: <10809e13-82eb-70cd-553a-b4ed4890ce82@cs.oswego.edu> References: <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> <43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu> <007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu> <5795D3B6.9050407@redhat.com> <3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu> <5796192C.3020402@redhat.com> <1d1b8341-ad74-b60a-18b1-d2cc23069561@cs.oswego.edu> <20160726170918.GA7094@linux.vnet.ibm.com> <10809e13-82eb-70cd-553a-b4ed4890ce82@cs.oswego.edu> Message-ID: Peter Dimov gave a good example in C++ discussions for wanting merging of atomic operations: Reference counting. If you see two reference count increments in a row, you clearly want to merge the underlying fetch_and_add operations. (I say that in spite of the fact that I'm not a fan of explicit reference counting, and am currently spending way too much time debugging reference-counting code. But it seems unavoidable at times, occasionally even in Java, and pervasive in C++.) I don't understand Doug's statement: "So implementations cannot be allowed to merge reads in ways that are sure to reduce the number of possible program traces." We have hardware microarchitectures that do this on a grand scale by transactionally committing a bunch of memory operations in bulk (cf. http://dl.acm.org/citation.cfm?doid=1610252.1610271), so many intermediate states are invisible. In general the rules are that we cannot add traces, but removing possible traces is entirely fine. 
My (failed) proposal to the C++ committee was to restrict software transformations informally to be comparable to the hardware effects we observe anyway. I think that is the strongest property code that deals only with conventional memory (not device registers) can reliably test for. On Tue, Jul 26, 2016 at 1:03 PM, Doug Lea
wrote: > > Moving ever further away from the alleged subject line... > > > On 07/26/2016 01:09 PM, Paul E. McKenney wrote: > >> On Mon, Jul 25, 2016 at 03:24:48PM -0400, Doug Lea wrote: >> >>> (Gratuitously editorializing, one would think that in C++, >>> it might also be popular to adopt this interpretation, and >>> eliminate the need to ever integrate C "volatile", or to >>> re-spec consume mode.) >>> >> >> Yes and no. >> >> If I am working on a low-level synchronization primitive, then yes, >> I really do want the system to do -exactly- what I tell it to, no more, >> no less. >> >> But in higher-level code, I would likely be quite happy for the compiler >> to fuse accesses, if it could do so without violating the memory model. >> >> > The C++-relaxed spec definitely shows this tension. Sometimes people > want it to mean just "plain, but don't tear words". Which is not the > same as what you'd otherwise spec as "the cheapest mode for a > thread-safe variable respecting coherence". In Java, > with the availability of "Plain" accesses even for volatiles, > and access-atomicity for references and <=32bit scalars, > there is little motivation to compromise for Opaque mode. > > In which case, the main premise is that when users use non-plain > access modes for reads (similarly, but less interestingly writes), they > are expressing that they intend to handle all of the possible program > traces that might result if two subsequent reads see different values. > So implementations cannot be allowed to merge reads in ways that are > sure to reduce the number of possible program traces. > > Again, this is symmetric to the idea that implementations cannot be > allowed to add writes (e.g., duplicate them) in ways that are sure to > increase the number of possible program traces. > > It is surely possible to introduce a formalization of traces that > rigorously states both constraints. 
But it is not easy to define an > underlying trace model that covers practical execution issues. So in a > language spec, it may be preferable to just say no merged reads and no > added writes for atomics. Which is what C++ and Java both do now for > no-added-writes. Or, it may be a better idea to leave the trace-based > requirements incompletely formalized, which should have the same > practical effect. Or even better (but not soon) agree upon some formalism. > > -Doug > > From dl at cs.oswego.edu Wed Jul 27 20:21:48 2016 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 27 Jul 2016 16:21:48 -0400 Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS In-Reply-To: References: <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com> <43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu> <007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu> <5795D3B6.9050407@redhat.com> <3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu> <5796192C.3020402@redhat.com> <1d1b8341-ad74-b60a-18b1-d2cc23069561@cs.oswego.edu> <20160726170918.GA7094@linux.vnet.ibm.com> <10809e13-82eb-70cd-553a-b4ed4890ce82@cs.oswego.edu> Message-ID: <2552a774-eecf-83a5-72ee-b077cad9ed02@cs.oswego.edu> I should have known better than to invite debate about C++ relaxed etc specs here. Sorry. These details of how they are spec'ed in C++17 don't seem to and should not matter to us. It's probably best to move that discussion elsewhere. To summarize though, C/C++ effectively has 6 modes: plain, relaxed, consume, acquire/release, seq_cst, and linux pseudo-mode *(volatile*)relaxed. Java (jdk9) effectively has only 4: plain, opaque, acquire/release, volatile. Four appear to be enough -- modulo other language differences, for every C++ construction, there is an applicable Java construction that preserves essential properties and is expected to be implemented in the same way in common use cases. And to make this work out, we incorporate memory system progress properties. About which... 
On 07/26/2016 03:26 PM, Hans Boehm wrote: > [C++17]: "An implementation should ensure that the last value (in > modification order) assigned by an atomic or synchronization operation will > become visible to all other threads in a finite period of time." > > The reason I haven't pushed for something stronger is that I don't think > hardware specifications consistently contain the corresponding guarantees, > which would put language implementers in a weird position. But that could > probably be argued either way. Yes, just to clarify that other way: If a memory system not observing progress guarantees were shipped, the designers would be blamed for insufficiently specifying and testing its properties. To enable testing, "eventually Predicate P" specs are normally phrased as: Every implementation must pick and publish a K such that P always holds within K units (usually clock cycles). The same holds in software, but without such convenient units, sometimes leading to yet more arbitrary constants. > I think the "no merge" rule is not really formally specifiable, since it's a > compiler-only constraint that can't be tested by a conforming program. Testing is not impossible but is less portable. For a fun one, someone could try removing the volatile cast from the linux READ_ONCE macro, build, and check for test suite bugs. On 07/27/2016 01:55 PM, Hans Boehm wrote: > Peter Dimov gave a good example in C++ discussions for wanting merging of > atomic operations: Reference counting. If you see two reference count > increments in a row, you clearly want to merge the underlying fetch_and_add > operations. Is allowing this worth the loss in ability to prevent it in unwanted cases, e.g., when a huge but finite number of these were otherwise postponed in a long-lived but bounded loop? In other words, it is possible to write a combining-based ref-counter if you need one, but not to write a non-combining one when you don't. 
(And again, whatever the answer, Java vs C++ differences in such cases are not too important.) > I don't understand Doug's statement: "So implementations cannot be allowed to > merge reads in ways that are sure to reduce the number of possible program > traces." We have hardware microarchitectures that do this on a grand scale > by transactionally committing a bunch of memory operations in bulk This is getting increasingly far afield, but within (most? all?) definitions of transactions, multiple reads of the same variable are *required* to take the same value (act as if merged), which is not exactly the same as in any access mode. So something special would need to be said anyway about what happens especially for atomics/volatiles. (On the other hand, concurrent code with multiple reads of the same atomic/volatile variable within a method or transaction is already highly suspicious and a priori unlikely to be correct. So I agree with Hans that most of the issues we are discussing cover cases that most programmers should never encounter.) -Doug
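
The remark above that a combining-based reference counter can be written by hand when one is wanted can be illustrated with a minimal sketch, assuming JDK 9 or later. The class RefCount and its methods retain and release are hypothetical names, not an API from the thread or the JDK; only VarHandle.getAndAdd is a real library call. The point of the sketch is that the caller, rather than the compiler, decides when two increments become one fetch-and-add.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical reference counter; starts at 1 for the creating owner.
class RefCount {
    private int count = 1;

    private static final VarHandle COUNT;
    static {
        try {
            COUNT = MethodHandles.lookup()
                    .findVarHandle(RefCount.class, "count", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // One atomic fetch-and-add per call; returns the new count.
    int retain() {
        return (int) COUNT.getAndAdd(this, 1) + 1;
    }

    // Explicitly combined form: n increments issued as a single fetch-and-add.
    int retain(int n) {
        return (int) COUNT.getAndAdd(this, n) + n;
    }

    // Decrements once; returns the new count (0 means no owners remain).
    int release() {
        return (int) COUNT.getAndAdd(this, -1) - 1;
    }
}
```

A caller that knows it needs two references writes retain(2) instead of two retain() calls, so the uncombined form stays available for code that wants each update to be issued separately.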