From nitsanw at yahoo.com  Fri Jul  8 11:08:01 2016
From: nitsanw at yahoo.com (Nitsan Wakart)
Date: Fri, 8 Jul 2016 11:08:01 +0000 (UTC)
Subject: [jmm-dev] Optimizing external actions in the JMM
In-Reply-To: <CAMiUf7dOyNnLDRJTc7tQw1FjMzZ+HjfiaXou9LV4g7qNv4rY-w@mail.gmail.com>
References: <CAMiUf7dOyNnLDRJTc7tQw1FjMzZ+HjfiaXou9LV4g7qNv4rY-w@mail.gmail.com>
Message-ID: <159972590.772714.1467976081443.JavaMail.yahoo@mail.yahoo.com>

Given:
"...
where f is some external action that the compiler understands.  If the
compiler knows `f` always returns 42 and has no other effect, can it
optimize ThreadA to
...
thereby introducing a OOTA-like value of 42 into the system?"
Why is this OOTA?
The thing is you define:
"external action that... always returns 42 and has no other effect"
Which according to:
"An external action is an action that may be observable outside of an execution,
and has a result based on an environment external to the execution."
(from https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.2)
is not an external action.

To make a function an "external action" we need to satisfy:
1. "may be observable outside of an execution" AND
2. "has a result based on an environment external to the execution"

The concerns you raise around off-heap memory handling boil down to:
- Unsafe.put*(long address,*)/Unsafe.put*(null, long address,*): don't fulfil 2
- Unsafe.get*(long address)/Unsafe.get*(null, long address): don't fulfil 1
The volatile accesses to offheap are not covered by JMM AFAIK, but are relied
upon by many to mean the same as their heap counterparts.

From sanjoy at playingwithpointers.com  Tue Jul 12 07:35:47 2016
From: sanjoy at playingwithpointers.com (Sanjoy Das)
Date: Tue, 12 Jul 2016 00:35:47 -0700
Subject: [jmm-dev] Optimizing external actions in the JMM
In-Reply-To: <159972590.772714.1467976081443.JavaMail.yahoo@mail.yahoo.com>
References: <CAMiUf7dOyNnLDRJTc7tQw1FjMzZ+HjfiaXou9LV4g7qNv4rY-w@mail.gmail.com>
	<159972590.772714.1467976081443.JavaMail.yahoo@mail.yahoo.com>
Message-ID: <57849DD3.6060006@playingwithpointers.com>

Hi Nitsan,

Thank you for replying!

Nitsan Wakart wrote:
 > Given:
 > "...
 > where f is some external action that the compiler understands.  If the
 > compiler knows `f` always returns 42 and has no other effect, can it
 > optimize ThreadA to
 > ...
 > thereby introducing a OOTA-like value of 42 into the system?"
 > Why is this OOTA?

It isn't OOTA intuitively, and I'm trying to justify its non-OOTA-ness
by the JMM rules.

 > The thing is you define:
 > "external action that... always returns 42 and has no other effect"

I didn't mean to say that it "always returns 42 and has no other
effect" by the spec, but that the compiler knows it "always returns 42
and has no other effect" by some external knowledge it has about `f`
(and perhaps the environment).

For instance, say f(x) was "return Unix_open("/home/foo/" + x)", and
the JIT knew that since the process is running under user "bar", the
call to open would always return -1 and not have any other external
effect.  Would it be okay then to introduce the OOTA-like value -1 by
replacing the call to Unix_open with -1?  I'd like to say yes, but I
can't justify -1 for the same reason as we couldn't justify 42 earlier
-- there isn't enough information in the trace to infer f(0) is -1 --
the trace will only state that f(-1) is -1.

 > Which according to:
 > "An external action is an action that may be observable outside of an execution,
 > and has a result based on an environment external to the execution."
 > (from https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.2)
 > is not an external action.
 >
 > To make a function an "external action" we need to satisfy:
 > 1. "may be observable outside of an execution" AND
 > 2. "has a result based on an environment external to the execution"
 >
 > The concerns you raise around off-heap memory handling boil down to:
 > - Unsafe.put*(long address,*)/Unsafe.put*(null, long address,*): don't fulfil 2
 > - Unsafe.get*(long address)/Unsafe.get*(null, long address): don't fulfil 1

Under this interpretation the Unsafe.get*() intrinsics have a corner
case -- if they're being called on a mmapped file (or uninitialized
memory, even) then the value returned by them is not specified by the
memory model (to be precise: whether a certain value returned by the
call is correct is not decidable in the JMM).  This hints that they
need to be modeled as external actions; moreover, I think that (1)
really means "*may* be observable" and not "*has to be* observable".

-- Sanjoy

 > The volatile accesses to offheap are not covered by JMM AFAIK, but are relied
 > upon by many to mean the same as their heap counterparts.

From nitsanw at yahoo.com  Tue Jul 12 08:06:11 2016
From: nitsanw at yahoo.com (Nitsan Wakart)
Date: Tue, 12 Jul 2016 08:06:11 +0000 (UTC)
Subject: [jmm-dev] Optimizing external actions in the JMM
In-Reply-To: <57849DD3.6060006@playingwithpointers.com>
References: <CAMiUf7dOyNnLDRJTc7tQw1FjMzZ+HjfiaXou9LV4g7qNv4rY-w@mail.gmail.com>
	<159972590.772714.1467976081443.JavaMail.yahoo@mail.yahoo.com>
	<57849DD3.6060006@playingwithpointers.com>
Message-ID: <1983003876.2333894.1468310771326.JavaMail.yahoo@mail.yahoo.com>

>> The thing is you define:

>> "external action that... always returns 42 and has no other effect"
> I didn't mean to say that it "always returns 42 and has no other
> effect" by the spec, but that the compiler knows it "always returns 42
> and has no other effect" by some external knowledge it has about `f`
> (and perhaps the environment).


If the JIT compiler KNOWS, then it knows, job done. Same way it knows Math.pow is not external. 
>> - Unsafe.get*(long address)/Unsafe.get*(null, long address): don't fulfil 1
> Under this interpretation the Unsafe.get*() intrinsics have a corner
> case -- if they're being called on a mmapped file (or uninitialized
> memory, even) then the value returned by them is not specified by the
> memory model (to be precise: whether a certain value returned by the
> call is correct is not decidable in the JMM).  This hints that they
> need to be modeled as external actions; moreover, I think that (1)
> really means "*may* be observable" and not "*has to be* observable".


Where behaviour is undefined, it is down to precedent and sensibility... How is a 'read' "observable outside the execution"? Consider the following code:
----
long address = Unsafe.allocate(1024);
int i = 1;
Unsafe.putInt(address,i);
return Unsafe.getInt(address) == i; // might as well return true;
-----

From sanjoy at playingwithpointers.com  Tue Jul 12 08:34:51 2016
From: sanjoy at playingwithpointers.com (Sanjoy Das)
Date: Tue, 12 Jul 2016 01:34:51 -0700
Subject: [jmm-dev] Optimizing external actions in the JMM
In-Reply-To: <1983003876.2333894.1468310771326.JavaMail.yahoo@mail.yahoo.com>
References: <CAMiUf7dOyNnLDRJTc7tQw1FjMzZ+HjfiaXou9LV4g7qNv4rY-w@mail.gmail.com>
	<159972590.772714.1467976081443.JavaMail.yahoo@mail.yahoo.com>
	<57849DD3.6060006@playingwithpointers.com>
	<1983003876.2333894.1468310771326.JavaMail.yahoo@mail.yahoo.com>
Message-ID: <5784ABAB.2010307@playingwithpointers.com>

Hi Nitsan,

Nitsan Wakart wrote:
 >>> The thing is you define:
 >
 >>> "external action that... always returns 42 and has no other effect"
 >> I didn't mean to say that it "always returns 42 and has no other
 >> effect" by the spec, but that the compiler knows it "always returns 42
 >> and has no other effect" by some external knowledge it has about `f`
 >> (and perhaps the environment).
 >
 >
 > If the JIT compiler KNOWS, then it knows, job done. Same way it knows Math.pow is not external.

I'm trying to justify precisely the "job done" bit. :) Specifically,
can it replace f(x) with 42 even if it knows that given the current
environment the "external action" f(x) always returns 42 and has no
other effect?

Math.pow(a, b) is fundamentally different from f(x) =
Unix_open("/home/foo/" + x) -- it can be evaluated for any a, b
independent of the environment.  This isn't true for f(x) as defined:
in the previous example the JIT knows that f(x) returns -1 and has no
other effect _because_ it knows that the process is being run as user
"bar".  This information is not present in the execution trace, so we
can't "evaluate" f(0) to justify the write of -1 to y.

 >>> - Unsafe.get*(long address)/Unsafe.get*(null, long address): don't fulfil 1
 >> Under this interpretation the Unsafe.get*() intrinsics have a corner
 >> case -- if they're being called on a mmapped file (or uninitialized
 >> memory, even) then the value returned by them is not specified by the
 >> memory model (to be precise: whether a certain value returned by the
 >> call is correct is not decidable in the JMM).  This hints that they
 >> need to be modeled as external actions; moreover, I think that (1)
 >> really means "*may* be observable" and not "*has to be* observable".
 >
 >
 > Where behaviour is undefined, it is down to precedent and sensibility...

That's fine (especially given that s.m.Unsafe is an internal API), but
how do you "plug in" the precedent and sensibility into the rest of
the memory model?  IOW, in (say)

Thread1:
   addr = mmap_file();
   r1 = unsafe.getByte(addr);
   this.y = r1
   this.volatileF = r1

Thread2:
   r2 = this.volatileF;
   r3 = this.y

when trying to prove things about the r3 + r2 (say) how do you model
r1?  Given that r1's value cannot be described by the JMM, it seems
reasonable to me to give it a sensible value intuitively, but in the
JMM model it as an external action that happens to return that
sensible value.
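
A minimal Java sketch of the two-thread example above.  The names
ExternalReadPublish, externalRead() and UNSET are hypothetical:
externalRead() merely stands in for unsafe.getByte(mmap_file()), i.e.
for a value the JMM cannot predict, and UNSET is a sentinel outside the
byte range so that Thread2 can tell whether it saw the volatile write.
Under those assumptions, whenever r2 is the published value, r3 must
equal it, whatever the external value happened to be.

```java
public class ExternalReadPublish {
    // Sentinel outside the byte range: "volatileF has not been written yet".
    static final int UNSET = Integer.MIN_VALUE;

    int y;                          // plain field
    volatile int volatileF = UNSET; // volatile field used for publication

    // Hypothetical stand-in for unsafe.getByte(mmap_file()): the result
    // depends on the environment, not on anything in the execution.
    static byte externalRead() {
        return (byte) System.nanoTime();
    }

    // Runs Thread1 and Thread2 once and reports whether the outcome is
    // consistent with happens-before: r2 unset, or r3 == r2.
    public static boolean runOnce() {
        ExternalReadPublish s = new ExternalReadPublish();
        int[] seen = new int[2]; // {r2, r3}, read only after join()

        Thread t1 = new Thread(() -> {
            int r1 = externalRead();
            s.y = r1;
            s.volatileF = r1;
        });
        Thread t2 = new Thread(() -> {
            seen[0] = s.volatileF; // r2
            seen[1] = s.y;         // r3
        });

        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        int r2 = seen[0];
        int r3 = seen[1];
        return r2 == UNSET || r3 == r2;
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

The check never needs to know which byte was read; it only relies on
the volatile write/read pair ordering the plain write to y.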

 > How is a 'read' "observable outside the execution"?

It's usually not.  It sounded like you were interpreting "observable
outside of execution" as a necessary condition for an action to be an
external action, whereas I think observability is sufficient but not
necessary for an action to be considered an external action.  That is
likely the reason for saying "may be observable outside of an
execution" and not "has to be observable outside of an execution".

 > Consider the following code:
 > ----
 > long address = Unsafe.allocate(1024);
 > int i = 1;
 > Unsafe.putInt(address,i);
 > return Unsafe.getInt(address) == i; // might as well return true;
 > -----

From aph at redhat.com  Tue Jul 12 10:13:10 2016
From: aph at redhat.com (Andrew Haley)
Date: Tue, 12 Jul 2016 11:13:10 +0100
Subject: [jmm-dev] Make load/store of 64-bit long and double atomic
In-Reply-To: <e9ad3104-0a71-e853-3d86-c1787c964dac@oracle.com>
References: <4E29117D-735B-445E-8C57-F047E5B00712@computer.org>
	<e9ad3104-0a71-e853-3d86-c1787c964dac@oracle.com>
Message-ID: <5784C2B6.8030500@redhat.com>

On 12/07/16 03:21, David Holmes wrote:

> This is not a hotspot issue but a Java programming language issue.
> Hotspot would never provide a flag that changes the Java programming
> language semantics. The performance impact of all-accesses-are-
> atomic on 32-bit systems is considerable

Not necessarily.  There are significant performance implications on
some 32-bit systems, but by no means all.  And such 32-bit systems are
getting rarer -- IMVHO.

> so as long as we support 32-bit I don't see this happening
> (regardless of what may be discussed on jmm-dev). It would be
> unconscionable to have different semantics on 32-bit and 64-bit so
> that is not an option either.

I wonder if a better solution to this might be to make
VarHandle.{get,set}Opaque atomic on all primitive types.  This gives
us a way to get atomic operations on 32-bit machines without the
overhead of volatile accesses.  Being able to read a 64-bit counter
atomically is very useful.

C++ says:

[ Note: Atomic operations specifying memory_order_relaxed are relaxed
with respect to memory ordering.  Implementations must still guarantee
that any given atomic access to a particular atomic object be
indivisible with respect to all other atomic accesses to that
object. -- end note ]

But Java says:

Unless stated otherwise in the documentation of a factory method, the
access modes get and set (if supported) provide atomic access for
reference types and all primitives types, with the exception of long
and double on 32-bit platforms.

I wonder if this divergence between Java and C++ is deliberate.  It
seems wrong to me.

Andrew.

From aleksey.shipilev at oracle.com  Tue Jul 12 10:19:20 2016
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Tue, 12 Jul 2016 13:19:20 +0300
Subject: [jmm-dev] Make load/store of 64-bit long and double atomic
In-Reply-To: <5784C2B6.8030500@redhat.com>
References: <4E29117D-735B-445E-8C57-F047E5B00712@computer.org>
	<e9ad3104-0a71-e853-3d86-c1787c964dac@oracle.com>
	<5784C2B6.8030500@redhat.com>
Message-ID: <5784C428.2090401@oracle.com>

On 07/12/2016 01:13 PM, Andrew Haley wrote:
> I wonder if a better solution to this might be to make
> VarHandle.{get,set}Opaque atomic on all primitive types.  This gives
> us a way to get atomic operations on 32-bit machines without the
> overhead of volatile accesses.  Being able to read a 64-bit counter
> atomically is very useful.

VarHandle.{get,set}Opaque is single-copy atomic for all primitive types.
Pretty much like C++ std::atomic(..., mem_ord_relaxed).
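
A minimal sketch of what that looks like for a 64-bit counter, assuming
the Java 9 VarHandle API.  The class OpaqueCounter and its field are
hypothetical; the atomicity shown is the behaviour described in this
thread for the Opaque access modes, not something the code itself can
prove.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class OpaqueCounter {
    // Plain (non-volatile) 64-bit field.  JLS 17.7 permits plain accesses
    // to such a field to be split into two 32-bit halves.
    private long value;

    private static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(OpaqueCounter.class, "value", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Opaque store: no ordering with respect to other variables, but the
    // 64-bit value is intended to be written as a single unit.
    public void publish(long v) {
        VALUE.setOpaque(this, v);
    }

    // Opaque load: the matching read, again without volatile ordering.
    public long read() {
        return (long) VALUE.getOpaque(this);
    }

    public static void main(String[] args) {
        OpaqueCounter c = new OpaqueCounter();
        c.publish(0x1234_5678_9ABC_DEF0L);
        System.out.println(Long.toHexString(c.read()));
    }
}
```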

Thanks,
-Aleksey


From aph at redhat.com  Tue Jul 12 10:22:35 2016
From: aph at redhat.com (Andrew Haley)
Date: Tue, 12 Jul 2016 11:22:35 +0100
Subject: [jmm-dev] Make load/store of 64-bit long and double atomic
In-Reply-To: <5784C428.2090401@oracle.com>
References: <4E29117D-735B-445E-8C57-F047E5B00712@computer.org>
	<e9ad3104-0a71-e853-3d86-c1787c964dac@oracle.com>
	<5784C2B6.8030500@redhat.com> <5784C428.2090401@oracle.com>
Message-ID: <5784C4EB.8090207@redhat.com>

On 12/07/16 11:19, Aleksey Shipilev wrote:
> On 07/12/2016 01:13 PM, Andrew Haley wrote:
>> I wonder if a better solution to this might be to make
>> VarHandle.{get,set}Opaque atomic on all primitive types.  This gives
>> us a way to get atomic operations on 32-bit machines without the
>> overhead of volatile accesses.  Being able to read a 64-bit counter
>> atomically is very useful.
> 
> VarHandle.{get,set}Opaque is single-copy atomic for all primitive types.
> Pretty much like C++ std::atomic(..., mem_ord_relaxed).

So what does

Unless stated otherwise in the documentation of a factory method, the
access modes get and set (if supported) provide atomic access for
reference types and all primitives types, with the exception of long
and double on 32-bit platforms.

refer to?  And where does the spec say that VarHandle.{get,set}Opaque is
atomic?

Andrew.


From aleksey.shipilev at oracle.com  Tue Jul 12 11:30:51 2016
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Tue, 12 Jul 2016 14:30:51 +0300
Subject: [jmm-dev] Make load/store of 64-bit long and double atomic
In-Reply-To: <5784C4EB.8090207@redhat.com>
References: <4E29117D-735B-445E-8C57-F047E5B00712@computer.org>
	<e9ad3104-0a71-e853-3d86-c1787c964dac@oracle.com>
	<5784C2B6.8030500@redhat.com> <5784C428.2090401@oracle.com>
	<5784C4EB.8090207@redhat.com>
Message-ID: <5784D4EB.60904@oracle.com>

On 07/12/2016 01:22 PM, Andrew Haley wrote:
> On 12/07/16 11:19, Aleksey Shipilev wrote:
>> On 07/12/2016 01:13 PM, Andrew Haley wrote:
>>> I wonder if a better solution to this might be to make
>>> VarHandle.{get,set}Opaque atomic on all primitive types.  This gives
>>> us a way to get atomic operations on 32-bit machines without the
>>> overhead of volatile accesses.  Being able to read a 64-bit counter
>>> atomically is very useful.
>>
>> VarHandle.{get,set}Opaque is single-copy atomic for all primitive types.
>> Pretty much like C++ std::atomic(..., mem_ord_relaxed).
> 
> So what does
> 
> Unless stated otherwise in the documentation of a factory method, the
> access modes get and set (if supported) provide atomic access for
> reference types and all primitives types, with the exception of long
> and double on 32-bit platforms.
> 
> refer to? 

That's for VarHandle.{get|set}, not for VarHandle.{get|set}Opaque.
Access mode "get" is different from access mode "getOpaque".

> And where does the spec say that VarHandle.{get,set}Opaque is
> atomic?

Nowhere yet. I tried to capture atomicity in Javadoc like this:
 http://mail.openjdk.java.net/pipermail/jmm-dev/2016-June/000282.html

...but it's not yet there.

Thanks,
-Aleksey


From paul.sandoz at oracle.com  Tue Jul 12 12:22:17 2016
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Tue, 12 Jul 2016 14:22:17 +0200
Subject: [jmm-dev] Make load/store of 64-bit long and double atomic
In-Reply-To: <5784D4EB.60904@oracle.com>
References: <4E29117D-735B-445E-8C57-F047E5B00712@computer.org>
	<e9ad3104-0a71-e853-3d86-c1787c964dac@oracle.com>
	<5784C2B6.8030500@redhat.com> <5784C428.2090401@oracle.com>
	<5784C4EB.8090207@redhat.com> <5784D4EB.60904@oracle.com>
Message-ID: <664572BB-9859-4019-B722-1E1FB3D8A4D6@oracle.com>


> On 12 Jul 2016, at 13:30, Aleksey Shipilev <aleksey.shipilev at oracle.com> wrote:
> 
> On 07/12/2016 01:22 PM, Andrew Haley wrote:
>> On 12/07/16 11:19, Aleksey Shipilev wrote:
>>> On 07/12/2016 01:13 PM, Andrew Haley wrote:
>>>> I wonder if a better solution to this might be to make
>>>> VarHandle.{get,set}Opaque atomic on all primitive types.  This gives
>>>> us a way to get atomic operations on 32-bit machines without the
>>>> overhead of volatile accesses.  Being able to read a 64-bit counter
>>>> atomically is very useful.
>>> 
>>> VarHandle.{get,set}Opaque is single-copy atomic for all primitive types.
>>> Pretty much like C++ std::atomic(..., mem_ord_relaxed).
>> 
>> So what does
>> 
>> Unless stated otherwise in the documentation of a factory method, the
>> access modes get and set (if supported) provide atomic access for
>> reference types and all primitives types, with the exception of long
>> and double on 32-bit platforms.
>> 
>> refer to?
> 
> That's for VarHandle.{get|set}, not for VarHandle.{get|set}Opaque.
> Access mode "get" is different from access mode "getOpaque".
> 
>> And where does the spec say that VarHandle.{get,set}Opaque is
>> atomic?
> 
> Nowhere yet. I tried to capture atomicity in Javadoc like this:
> http://mail.openjdk.java.net/pipermail/jmm-dev/2016-June/000282.html
> 
> ...but it's not yet there.
> 

It does state it here:

* Read/write access modes (if supported), with the exception of
* {@code get} and {@code set}, provide atomic access for
* reference types and all primitive types.

Before the "unless stated otherwise..." quoted above.

As part of the sweep through the specification we should make that clearer.

Paul.

From dl at cs.oswego.edu  Tue Jul 12 12:29:52 2016
From: dl at cs.oswego.edu (Doug Lea)
Date: Tue, 12 Jul 2016 08:29:52 -0400
Subject: [jmm-dev] Make load/store of 64-bit long and double atomic
In-Reply-To: <5784E0D0.70009@oracle.com>
References: <4E29117D-735B-445E-8C57-F047E5B00712@computer.org>
	<e9ad3104-0a71-e853-3d86-c1787c964dac@oracle.com>
	<DB9CC62F-1107-4BD3-89FA-8D3276B211AD@computer.org>
	<5784E0D0.70009@oracle.com>
Message-ID: <5784E2C0.1070408@cs.oswego.edu>

On 07/12/2016 08:21 AM, Aleksey Shipilev wrote:
> On 07/12/2016 02:50 PM, John Crowley wrote:
>>> On Jul 11, 2016, at 10:21 PM, David Holmes
>>> <david.holmes at oracle.com> wrote:
>>> On 7/07/2016 9:29 PM, John Crowley wrote:
>>>> Would like to make a suggestion re the JVM and non-atomic
>>>> load/store for long and double values since both are 64-bit.
>>>> (Sec 17.7 of the JLS version 8 - have not been able to find a JLS
>>>> V9 yet). Did some searching through JSRs and mailing lists, but
>>>> did not see this addressed - please send me a link if it has been
>>>> and I just missed it.
>
> In Hotspot, there is an experimental -XX:+AlwaysAtomicAccesses flag that
> turns long/double accesses to be single-copy atomic. Not sure it works
> properly in interpreter though. You may build on that.
>
> The sound counter-argument that I heard against enabling long/double
> atomic accesses is the interaction with value types. If we make all
> present types access-atomic, and have to retract that back when
> larger-than-machine-word value types come in, that would be bad. Since
> this long/double spec change is at best Java 10, we are better off seeing
> how it plays out with value types.
>

Yes, thanks. That's an accurate synopsis of discussions on the jmm-dev
list in 2014. (http://mail.openjdk.java.net/pipermail/jmm-dev/)

In the meantime, we do need to make a clean-up pass on VarHandle
javadocs/specs, that now include some remnants of previous designs
and are missing a few clarifications.

-Doug


From aph at redhat.com  Tue Jul 12 12:31:11 2016
From: aph at redhat.com (Andrew Haley)
Date: Tue, 12 Jul 2016 13:31:11 +0100
Subject: [jmm-dev] Make load/store of 64-bit long and double atomic
In-Reply-To: <664572BB-9859-4019-B722-1E1FB3D8A4D6@oracle.com>
References: <4E29117D-735B-445E-8C57-F047E5B00712@computer.org>
	<e9ad3104-0a71-e853-3d86-c1787c964dac@oracle.com>
	<5784C2B6.8030500@redhat.com> <5784C428.2090401@oracle.com>
	<5784C4EB.8090207@redhat.com> <5784D4EB.60904@oracle.com>
	<664572BB-9859-4019-B722-1E1FB3D8A4D6@oracle.com>
Message-ID: <5784E30F.9010700@redhat.com>

On 12/07/16 13:22, Paul Sandoz wrote:
> It does state it here:
> 
> * Read/write access modes (if supported), with the exception of
> * {@code get} and {@code set}, provide atomic access for
> * reference types and all primitive types.
> 
> Before the "unless stated otherwise..." quoted above.
> 
> As part of the sweep through the specification we should make that clearer.

It's very hard to understand what is going on.  compareAndExchange()
has stronger ordering semantics than compareAndExchangeRelease() but
set() has weaker ordering semantics than setRelease().  We're making a
real mess that nobody is going to thank us for.
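
To make the naming asymmetry concrete, here is a small sketch against
the Java 9 VarHandle API.  The class AccessModeNames and its field are
hypothetical.  For stores the unsuffixed name is the weakest (plain)
mode, while for compareAndExchange the unsuffixed name is the strongest
(volatile) mode.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class AccessModeNames {
    private int x;

    private static final VarHandle X;
    static {
        try {
            X = MethodHandles.lookup()
                    .findVarHandle(AccessModeNames.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static int demo() {
        AccessModeNames o = new AccessModeNames();

        // Stores, weakest to strongest: the bare name is the plain store.
        X.set(o, 1);
        X.setOpaque(o, 2);
        X.setRelease(o, 3);
        X.setVolatile(o, 4);

        // Compare-and-exchange: the suffixed forms are the weaker ones,
        // and the bare name has volatile semantics.
        int w1 = (int) X.compareAndExchangeRelease(o, 4, 5);
        int w2 = (int) X.compareAndExchangeAcquire(o, w1 + 1, 6);
        int w3 = (int) X.compareAndExchange(o, w2 + 1, 7);

        // Each call above returned the value it replaced (4, 5, 6),
        // so the field now holds 7 when w3 is 6.
        return w3 == 6 ? (int) X.getVolatile(o) : -1;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```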

Andrew.

From paul.sandoz at oracle.com  Tue Jul 12 13:12:45 2016
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Tue, 12 Jul 2016 15:12:45 +0200
Subject: [jmm-dev] Make load/store of 64-bit long and double atomic
In-Reply-To: <5784E30F.9010700@redhat.com>
References: <4E29117D-735B-445E-8C57-F047E5B00712@computer.org>
	<e9ad3104-0a71-e853-3d86-c1787c964dac@oracle.com>
	<5784C2B6.8030500@redhat.com> <5784C428.2090401@oracle.com>
	<5784C4EB.8090207@redhat.com> <5784D4EB.60904@oracle.com>
	<664572BB-9859-4019-B722-1E1FB3D8A4D6@oracle.com>
	<5784E30F.9010700@redhat.com>
Message-ID: <57190EA9-92B5-4276-BB9B-33C1AA65D133@oracle.com>


> On 12 Jul 2016, at 14:31, Andrew Haley <aph at redhat.com> wrote:
> 
> On 12/07/16 13:22, Paul Sandoz wrote:
>> It does state it here:
>> 
>> * Read/write access modes (if supported), with the exception of
>> * {@code get} and {@code set}, provide atomic access for
>> * reference types and all primitive types.
>> 
>> Before the "unless stated otherwise..." quoted above.
>> 
>> As part of the sweep through the specification we should make that clearer.
> 
> It's very hard to understand what is going on.  compareAndExchange()
> has stronger ordering semantics than compareAndExchangeRelease() but
> set() has weaker ordering semantics than setRelease().  We're making a
> real mess that nobody is going to thank us for.
> 

It's an awkward situation. Doug previously mentioned in an email on core-libs:

>> No matter which conventions you choose here, some people will be
>> unhappy or confused. The current scheme seems to make the current users
>> of both Unsafe and AtomicX least unhappy or confused.

  http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-July/042249.html.


You mentioned in a previous email the possibility of doing something similar to C++ atomics and passing in the memory order characteristics as a constant.

We did mull that over a little early on; one concern was the performance aspects. It might be possible to pull that off with an enum and implementing the if/else in Java so it constant folds, enabling reuse of existing intrinsics and simplifying the addition of new ones. That would be a significant deviation from the API/implementation/tests at this stage in the 9 release schedule. I suppose it's something we could support later on as a complementary feature (e.g. using the "explicit" suffix in the method names).
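
A rough sketch of that idea, layered on the existing access modes. Everything named here (ExplicitOrder, MemoryOrder, setExplicit) is hypothetical and not part of any JDK API; whether the dispatch actually constant folds depends on the JIT seeing a constant order argument at an inlined call site, which this code does not guarantee.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class ExplicitOrder {
    // Hypothetical enum, loosely modelled on C++ std::memory_order.
    public enum MemoryOrder { PLAIN, OPAQUE, RELEASE, VOLATILE }

    private int field;

    private static final VarHandle FIELD;
    static {
        try {
            FIELD = MethodHandles.lookup()
                    .findVarHandle(ExplicitOrder.class, "field", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Hypothetical "explicit" store: an if/else chain written in Java that
    // forwards to the existing per-mode methods (and so to their intrinsics).
    public static void setExplicit(ExplicitOrder target, int v, MemoryOrder order) {
        if (order == MemoryOrder.PLAIN) {
            FIELD.set(target, v);
        } else if (order == MemoryOrder.OPAQUE) {
            FIELD.setOpaque(target, v);
        } else if (order == MemoryOrder.RELEASE) {
            FIELD.setRelease(target, v);
        } else if (order == MemoryOrder.VOLATILE) {
            FIELD.setVolatile(target, v);
        } else {
            throw new IllegalArgumentException("unknown order: " + order);
        }
    }

    // Volatile read-back, used only to inspect the result.
    public static int load(ExplicitOrder target) {
        return (int) FIELD.getVolatile(target);
    }

    public static void main(String[] args) {
        ExplicitOrder o = new ExplicitOrder();
        setExplicit(o, 42, MemoryOrder.RELEASE);
        System.out.println(load(o));
    }
}
```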

Paul.

From john.r.rose at oracle.com  Fri Jul 15 02:09:05 2016
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 14 Jul 2016 19:09:05 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
Message-ID: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>

I think we are missing an important opportunity by not supporting single-bit RMW operations in VarHandles.

In particular, the x86 "bts" (bit-test-and-set), "btr" (bit-test-and-reset) and "btc" (bit-test-and-complement) are sometimes the right way to modify some data structure, when the alternative is a load and "cmpxchg" in a loop.  The overall costs are probably the same in the best case, but the loop-based idiom has some danger (relative to the single-instruction idiom) of costs stemming from larger code size.

At the JIT level, one can hope that the CAS-based idiom (coded in the current VH API) will be recognized and optimized to a single instruction on x86, but there is a strong risk that this will fail.  It's safer to specify the operation explicitly using a separate VH method.

The particular use case I have in mind is SeqLocks, specifically the writer-enter operation, which needs to change the lock state to "odd", unless it is already "odd", and let the processor know what happened.  An "xadd" cannot do this, but a "cmpxchg" or "bts" can, and the "bts" is preferable.

(In a rather deep sense, getAndAdd is less powerful than testAndSetBit or getAndBitwiseOr,
because op+ is bijective in each argument, while op| is idempotent.  This means that
you can operate bitwise on a structure in such a way that your operation disappears
when the structure is already in some state you are pushing it towards.  Of course,
you also need a way to "exchange" in the previous value, atomically.)
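
A small sketch of the writer-enter idea, using only AtomicInteger from
java.util.concurrent.  The class WriterEnter and its methods are
hypothetical, and getAndAccumulate is itself implemented with a retry
loop, so this shows the semantics rather than the single-instruction
form.  Note that getAndAdd(1) could not be used here: applied to an
already-odd word it would flip the state back to even.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class WriterEnter {
    // Tries to move the sequence word to an odd ("writer inside") state.
    // Returns true only if this call changed the state from even to odd.
    public static boolean tryEnter(AtomicInteger seq) {
        // OR-ing in bit 0 is idempotent: an already-odd word is unchanged.
        int previous = seq.getAndAccumulate(1, (current, bit) -> current | bit);
        return (previous & 1) == 0;
    }

    // Leaves the write section: odd -> next even value.
    public static void exit(AtomicInteger seq) {
        seq.incrementAndGet();
    }

    public static void main(String[] args) {
        AtomicInteger seq = new AtomicInteger(4);
        System.out.println(tryEnter(seq) + " " + seq.get()); // first writer gets in
        System.out.println(tryEnter(seq) + " " + seq.get()); // second attempt is a no-op
        exit(seq);
        System.out.println(seq.get());
    }
}
```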

For a parallel discussion among the gcc folk, where they are working on pattern matching
of CAS to BTS, see:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244

So, here is a specific proposal:
  https://bugs.openjdk.java.net/browse/JDK-8161444
  VarHandles should provide access bitwise atomics

In a nutshell, testAndSetBitAcquire behaves as if it were built on top of
compareAndExchangeAcquire (which it may on some platforms).
The outgoing value parameter is not a value but a bit position within
the memory value (zero = LSB, range-checked).  On x86 it compiles
to "lock;bts" with appropriate fencing.  It is a great candidate for
building a mutex-enter operation.

For symmetry, I'm also proposing testAndClearBitRelease (or should it
also be Acquire?) and flipAndGetBit (with Volatile ordering since it is likely
to be stand-alone).  But it's just symmetry; testAndSetBitAcquire is the
important part. 

What do folks think?  Is "lock;bts" useless for some reason?  (Note that
the lock prefix is interpreted by modern x86's as a cache transaction
request, just like xadd, with no external signal.)  Or are there no significant
single-bit concurrent structures out there?  I know of two:  SeqLocks and
AtomicMarkableReference (if/when the JVM embraces it).

More background: The SeqLocks are likely to be important for value types
(when they are too large for native hardware atomics, and must be accessed
atomically).  Note that many uncontended value types will still need to be used
with SeqLocks, when structure-tearing must be prevented for one reason
or another.

(Yet more background:  Non-tearability can be demanded by a value type's definition.
If this were not possible, values could not embody invariants that affect security.)

Thanks,
-- John

P.S.  For the record here are the important spec. details:

/** 
* Atomically loads the bit at the specified {@code index} in a variable with 
* the memory semantics of {@link #getAcquire}; if the bit is clear, 
* sets it with the memory semantics of {@link #set}; and finally returns 
* the original bit value as a boolean. 
* 
* <p>The variable may be of any primitive type. 
* Bits are numbered from zero, which refers to the arithmetically 
* least-significant bit, to {@code N-1} inclusive, where {@code N} is 
* the number of bits in the variable. Booleans have exactly one bit, 
* while other variables have an appropriate multiple of eight bits. 
* 
* <p>The method signature is of the form {@code (CT, int index)boolean}. 
* 
* <p>The symbolic type descriptor at the call site of {@code testAndSetBitAcquire} 
* must match the access mode type that is the result of calling 
* {@code accessModeType(VarHandle.AccessMode.TEST_AND_SET_BIT_ACQUIRE)} on this 
* VarHandle. 
* 
* @implNote The effects of this method are similar to a call to 
* {@code get} and {@code compareAndExchangeAcquire}, where the new 
* value is obtained from the old value by setting the specified bit. 
* The full effect of {@code testAndSetBitAcquire} would be obtained 
* by retrying the sequence as needed until the bit is either observed 
* to be set, or updated to be set. More efficient implementations may 
* be available on some platforms. 
* 
* @param args the signature-polymorphic parameter list of the form 
* {@code (CT, int index)} 
* , statically represented using varargs. 
* @return a boolean, the original value of the bit (before any update) 
* , statically represented using {@code Object}. 
* @throws UnsupportedOperationException if the access mode is unsupported 
* for this VarHandle. 
* @throws WrongMethodTypeException if the access mode type is not 
* compatible with the caller's symbolic type descriptor. 
* @throws ClassCastException if the access mode type is compatible with the 
* caller's symbolic type descriptor, but a reference cast fails. 
* @throws IllegalArgumentException if the supplied index is not in the range 
* of zero (inclusive) to the number of bits in the variable (exclusive). 
* @see #getAcquire(Object...) 
* @see #set(Object...) 
* @see #compareAndExchangeAcquire(Object...) 
*/ 
public final native 
@MethodHandle.PolymorphicSignature 
@HotSpotIntrinsicCandidate 
Object testAndSetBitAcquire(Object... args); 
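
The retry sequence described in the @implNote can be sketched as
follows for an int variable.  This is an emulation under stated
assumptions, not the proposed intrinsic: the class BitOps and its
static helper are hypothetical, and the initial load uses getAcquire
(rather than the plain get mentioned in the note) so that the
"bit already set" path also has acquire semantics.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class BitOps {
    private int word;

    private static final VarHandle WORD;
    static {
        try {
            WORD = MethodHandles.lookup()
                    .findVarHandle(BitOps.class, "word", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Hypothetical emulation of testAndSetBitAcquire for an int variable.
    // Returns the original value of the bit (true if it was already set).
    public static boolean testAndSetBitAcquire(BitOps target, int index) {
        if (index < 0 || index >= Integer.SIZE) {
            throw new IllegalArgumentException("bit index out of range: " + index);
        }
        int mask = 1 << index;
        int old = (int) WORD.getAcquire(target);
        for (;;) {
            if ((old & mask) != 0) {
                return true; // bit observed set: nothing to update
            }
            int witness = (int) WORD.compareAndExchangeAcquire(target, old, old | mask);
            if (witness == old) {
                return false; // we set the bit
            }
            old = witness; // another thread changed the word: retry
        }
    }

    // Volatile read of the whole word, used only to inspect the result.
    public static int current(BitOps target) {
        return (int) WORD.getVolatile(target);
    }

    public static void main(String[] args) {
        BitOps b = new BitOps();
        System.out.println(testAndSetBitAcquire(b, 3));
        System.out.println(testAndSetBitAcquire(b, 3));
        System.out.println(current(b));
    }
}
```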


From david.holmes at oracle.com  Fri Jul 15 03:16:25 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 15 Jul 2016 13:16:25 +1000
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
Message-ID: <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>

Hi John,

On 15/07/2016 12:09 PM, John Rose wrote:
> I think we are missing an important opportunity by not supporting single-bit RMW operations in VarHandles.
>
> In particular, the x86 "bts" (bit-test-and-set), "btr" (bit-test-and-reset) and "btc" (bit-test-and-complement) are sometimes the right way to modify some data structure, when the alternative is a load and "cmpxchg" in a loop.  The overall costs are probably the same in the best case, but the loop-based idiom has some danger (relative to the single-instruction idiom) of costs stemming from larger code size.

Is this readily supported on non-x86?

David
-----

> At the JIT level, one can hope that the CAS-based idiom (coded in the current VH API) will be recognized and optimized to a single instruction on x86, but there is a strong risk that this will fail.  It's safer to specify the operation explicitly using a separate VH method.
>
> The particular use case I have in mind is SeqLocks, specifically the writer-enter operation, which needs to change the lock state to "odd", unless it is already "odd", and let the processor know what happened.  An "xadd" cannot do this, but a "cmpxchg" or "bts" can, and the "bts" is preferable.
>
> (In a rather deep sense, getAndAdd is less powerful than testAndSetBit or getAndBitwiseOr,
> because op+ is bijective in each argument, while op| is idempotent.  This means that
> you can operate bitwise on a structure in such a way that your operation disappears
> when the structure is already in some state you are pushing it towards.  Of course,
> you also need a way to "exchange" in the previous value, atomically.)
>
> For a parallel discussion among the gcc folk, where they are working on pattern matching
> of CAS to BTS, see:
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244
>
> So, here is a specific proposal:
>   https://bugs.openjdk.java.net/browse/JDK-8161444
>   VarHandles should provide access bitwise atomics
>
> In a nutshell, testAndSetBitAcquire behaves as if it were built on top of
> compareAndExchangeAcquire (which it may on some platforms).
> The outgoing value parameter is not a value but a bit position within
> the memory value (zero = LSB, range-checked).  On x86 it compiles
> to "lock;bts" with appropriate fencing.  It is a great candidate for
> building a mutex-enter operation.
>
> For symmetry, I'm also proposing testAndClearBitRelease (or should it
> also be Acquire?) and flipAndGetBit (with Volatile ordering since it is likely
> to be stand-alone).  But it's just symmetry; testAndSetBitAcquire is the
> important part.
>
> What do folks think?  Is "lock;bts" useless for some reason?  (Note that
> the lock prefix is interpreted by modern x86's as a cache transaction
> request, just like xadd, with no external signal.)  Or are there no significant
> single-bit concurrent structures out there?  I know of two:  SeqLocks and
> AtomicMarkableReference (if/when the JVM embraces it).
>
> More background: The SeqLocks are likely to be important for value types
> (when they are too large for native hardware atomics, and must be accessed
> atomically).  Note that many uncontended value types will still need to be used
> with SeqLocks, when structure-tearing must be prevented for one reason
> or another.
>
> (Yet more background:  Non-tearability can be demanded by a value type's definition.
> If this were not possible, values could not embody invariants that affect security.)
>
> Thanks,
> -- John
>
> P.S.  For the record here are the important spec. details:
>
> /**
> * Atomically loads the bit at the specified {@code index} in a variable with
> * the memory semantics of {@link #getAcquire}; if the bit is clear,
> * sets it with the memory semantics of {@link #set}; and finally returns
> * the original bit value as a boolean.
> *
> * <p>The variable may be of any primitive type.
> * Bits are numbered from zero, which refers to the arithmetically
> * least-significant bit, to {@code N-1} inclusive, where {@code N} is
> * the number of bits in the variable. Booleans have exactly one bit,
> * while other variables have an appropriate multiple of eight bits.
> *
> * <p>The method signature is of the form {@code (CT, int index)boolean}.
> *
> * <p>The symbolic type descriptor at the call site of {@code testAndSetBitAcquire}
> * must match the access mode type that is the result of calling
> * {@code accessModeType(VarHandle.AccessMode.TEST_AND_SET_BIT_ACQUIRE)} on this
> * VarHandle.
> *
> * @implNote The effects of this method are similar to a call to
> * {@code get} and {@code compareAndExchangeAcquire}, where the new
> * value is obtained from the old value by setting the specified bit.
> * The full effect of {@code testAndSetBitAcquire} would be obtained
> * by retrying the sequence as needed until the bit is either observed
> * to be set, or updated to be set. More efficient implementations may
> * be available on some platforms.
> *
> * @param args the signature-polymorphic parameter list of the form
> * {@code (CT, int index)}
> * , statically represented using varargs.
> * @return a boolean, the original value of the bit (before any update)
> * , statically represented using {@code Object}.
> * @throws UnsupportedOperationException if the access mode is unsupported
> * for this VarHandle.
> * @throws WrongMethodTypeException if the access mode type is not
> * compatible with the caller's symbolic type descriptor.
> * @throws ClassCastException if the access mode type is compatible with the
> * caller's symbolic type descriptor, but a reference cast fails.
> * @throws IllegalArgumentException if the supplied index is not in the range
> * of zero (inclusive) to the number of bits in the variable (exclusive).
> * @see #getAcquire(Object...)
> * @see #set(Object...)
> * @see #compareAndExchangeAcquire(Object...)
> */
> public final native
> @MethodHandle.PolymorphicSignature
> @HotSpotIntrinsicCandidate
> Object testAndSetBitAcquire(Object... args);
>

From aph at redhat.com  Fri Jul 15 08:06:45 2016
From: aph at redhat.com (Andrew Haley)
Date: Fri, 15 Jul 2016 09:06:45 +0100
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
Message-ID: <57889995.5030100@redhat.com>

On 15/07/16 04:16, David Holmes wrote:
> On 15/07/2016 12:09 PM, John Rose wrote:

>> > I think we are missing an important opportunity by not supporting
>> > single-bit RMW operations in VarHandles.
>> >
>> > In particular, the x86 "bts" (bit-test-and-set), "btr"
>> > (bit-test-and-reset) and "btc" (bit-test-and-complement) are sometimes
>> > the right way to modify some data structure, when the alternative
>> > is a load and "cmpxchg" in a loop.  The overall costs are
>> > probably the same in the best case, but the loop-based idiom has
>> > some danger (relative to the single-instruction idiom) of costs
>> > stemming from larger code size.

> Is this readily supported on non-x86?

On ARMv8, yes.

>> > In a nutshell, testAndSetBitAcquire behaves as if it were built
>> > on top of compareAndExchangeAcquire (which it may on some
>> > platforms).
>> > The outgoing value parameter is not a value but a bit position
>> > within the memory value (zero = LSB, range-checked).  On x86 it
>> > compiles to "lock;bts" with appropriate fencing.  It is a great
>> > candidate for building a mutex-enter operation.

It's a huge mistake to insist that only a single bit can be set or
cleared.  If it just so happens that a "bts" can be used, fine, but to
bake such a restriction into the library and VM is wrong.  The C++
atomic functions which do this job are

++
--
+=
-=
&=
|=
^=

All of these take a std::memory_order argument.  C++ compatibility
should be our starting point for such things, IMO.

Andrew.

From dl at cs.oswego.edu  Fri Jul 15 16:27:03 2016
From: dl at cs.oswego.edu (Doug Lea)
Date: Fri, 15 Jul 2016 12:27:03 -0400
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <57889995.5030100@redhat.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com>
Message-ID: <57890ED7.8090704@cs.oswego.edu>


I agree with Andrew. I now think the three C++ bitwise atomic methods were
prematurely triaged out. (Sorry for being the triager.)

On 07/15/2016 04:06 AM, Andrew Haley wrote:

>
> The C++ atomic functions which do this job are ++, -- +=, -=

These exist both pre- and post- style in both Java and C++,
(getAndIncrement vs incrementAndGet etc), but for bitwise operations ...

> &=, |=, ^=

... only the getAndX forms seem useful, with only Volatile
and Release orderings. Using the default-volatile RMW convention,
this would require 6 methods:

  getAndOrBits, getAndOrBitsRelease,
  getAndAndBits, getAndAndBitsRelease,
  getAndXorBits, getAndXorBitsRelease

(the embedded "AndAnd" is a little jarring but probably inevitable.)

On X86, it would require some compiler work to transform these
into locked-bts etc instructions when applicable, but until they
are, the unoptimized forms would be no worse than hand-built CAS loops.

On ARMv8.1, these translate into new atomic instructions (at least
the "release" forms). Similarly for the upcoming RISC-V specs.
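
For reference, a hand-built CAS loop of the kind mentioned above might look
like the following. This is only a sketch: the class name CasLoopBitwise and
its method names are hypothetical, and it uses AtomicInteger.compareAndSet
(volatile semantics) rather than any proposed VarHandle method.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper class; the names are illustrative, not a JDK API.
final class CasLoopBitwise {
    // Atomically ORs mask into v and returns the previous value.
    static int getAndOrBits(AtomicInteger v, int mask) {
        int current;
        do {
            current = v.get();                                  // volatile read
        } while (!v.compareAndSet(current, current | mask));    // retry if we lost a race
        return current;
    }

    // Atomically ANDs mask into v and returns the previous value.
    static int getAndAndBits(AtomicInteger v, int mask) {
        int current;
        do {
            current = v.get();
        } while (!v.compareAndSet(current, current & mask));
        return current;
    }

    // Atomically XORs mask into v and returns the previous value.
    static int getAndXorBits(AtomicInteger v, int mask) {
        int current;
        do {
            current = v.get();
        } while (!v.compareAndSet(current, current ^ mask));
        return current;
    }

    public static void main(String[] args) {
        AtomicInteger word = new AtomicInteger(0b0101);
        int previous = getAndOrBits(word, 0b0010);
        System.out.println(Integer.toBinaryString(previous) + " -> "
                + Integer.toBinaryString(word.get()));
    }
}
```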

-Doug


From dl at cs.oswego.edu  Fri Jul 15 19:17:30 2016
From: dl at cs.oswego.edu (Doug Lea)
Date: Fri, 15 Jul 2016 15:17:30 -0400
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <57890ED7.8090704@cs.oswego.edu>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
Message-ID: <578936CA.4070304@cs.oswego.edu>

On 07/15/2016 12:27 PM, Doug Lea wrote:

> ... only the getAndX forms seem useful, with only Volatile
> and Release orderings. Using the default-volatile RMW convention,
> this would require 6 methods:
>

John suggests the slightly less weird (and thus better):
   getAndBitwiseOr, getAndBitwiseAnd,  getAndBitwiseXor
   getAndBitwiseOrRelease, getAndBitwiseAndRelease, getAndBitwiseXorRelease
that at least separates the two "And"s.

And in the spirit of not making another premature triage proposal,
perhaps these should also include Acquire variants:
  getAndBitwiseOrAcquire, getAndBitwiseAndAcquire, getAndBitwiseXorAcquire

The implicitly-volatile versions should be useful without implementation
penalty in the Acquire use cases that come to mind, but perhaps there are
others. Suggestions welcome.

-Doug


From john.r.rose at oracle.com  Fri Jul 15 19:24:04 2016
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 15 Jul 2016 12:24:04 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <578936CA.4070304@cs.oswego.edu>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
Message-ID: <AE3D2940-397E-40BF-8DD4-F99D4DE87EFE@oracle.com>

On Jul 15, 2016, at 12:17 PM, Doug Lea <dl at cs.oswego.edu> wrote:
> 
> On 07/15/2016 12:27 PM, Doug Lea wrote:
> 
>> ... only the getAndX forms seem useful, with only Volatile
>> and Release orderings. Using the default-volatile RMW convention,
>> this would require 6 methods:
>> 
> 
> John suggests the slightly less weird (and thus better):
>  getAndBitwiseOr, getAndBitwiseAnd,  getAndBitwiseXor
>  getAndBitwiseOrRelease, getAndBitwiseAndRelease, getAndBitwiseXorRelease
> that at least separates the two "And"s.
> 
> And in the spirit of not making another premature triage proposal,
> perhaps these should also include Acquire variants:
> getAndBitwiseOrAcquire, getAndBitwiseAndAcquire, getAndBitwiseXorAcquire
> 
> The implicitly-volatile versions should be useful without implementation
> penalty in the Acquire use cases that come to mind, but perhaps there are
> others. Suggestions welcome.

Thanks.  I withdraw the single-bit proposals!  As you note, there
are plausible ways to express bts/btr with full-width bitwise ops.

(The bitwise ones are my real preference anyway, except for the
unpleasant fact that the most common ISA does not support them.
I thought I was being cleverly practical by proposing the single-bit
versions.)

How should these be aligned with compareAndExchange*?
By that I mean the ordering of reads and writes should be documented
as no weaker than as if the thing had been implemented in terms of
some corresponding CAS loop.  (Or is there a better way?)

This raises a question about omitting the store.
Suppose the operation turns out to be a no-op.
This can mean that contention is detected or an idempotent
op has already raced to completion.

In that case, should the op include the Release constraint or not?

Put another way, can a reference implementation include
the marked optimization or not:

int getAndBitwiseOr(Object x, int mask) {
  for (;;) {
    int val0 = get(x); // getPlain
    int val1 = val0 | mask;
    if (val1 == val0)  return val0;  // ALLOW THIS OPTIMIZATION?
    int witness = compareAndExchangeRelease(x, val0, val1);
    if (witness == val0)  return val0;
  }
}

The optimization allows getAndBitwiseOr to take a weaker form,
which is generally desirable.  OTOH, if that is a form known to be
useless or problematic, we shouldn't go there.  The usefulness
would stem from the ability of a thread to detect already-done
or contention conditions with the least overhead.  A weaker form
can be strengthened by adding a fence.  A stronger form can be
weakened by adding additional polling logic, but that is clumsy
and error prone.
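
To make the question concrete, here is a runnable sketch of the early-exit
form written against AtomicInteger. It assumes a JDK 9 or later runtime
(getPlain and compareAndExchangeRelease); the class and method names are
hypothetical. The fenced variant shows one possible way a caller could
strengthen the weak form, not a recommendation.

```java
import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the "early exit" variant discussed above.
final class EarlyExitOr {
    // Weak form: when the bits are already set, no store is performed
    // and therefore no release ordering is implied on that path.
    static int getAndBitwiseOrWeak(AtomicInteger x, int mask) {
        for (;;) {
            int val0 = x.getPlain();
            int val1 = val0 | mask;
            if (val1 == val0) return val0;              // early exit, no store
            int witness = x.compareAndExchangeRelease(val0, val1);
            if (witness == val0) return val0;           // CAS succeeded
        }
    }

    // Assumed strengthening: the caller adds an explicit fence so that
    // the no-op path is also ordered.
    static int getAndBitwiseOrFenced(AtomicInteger x, int mask) {
        VarHandle.fullFence();
        return getAndBitwiseOrWeak(x, mask);
    }

    public static void main(String[] args) {
        AtomicInteger x = new AtomicInteger(0b01);
        System.out.println(getAndBitwiseOrWeak(x, 0b10));   // stores
        System.out.println(getAndBitwiseOrWeak(x, 0b10));   // early exit
    }
}
```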

-- John

From martinrb at google.com  Fri Jul 15 21:39:53 2016
From: martinrb at google.com (Martin Buchholz)
Date: Fri, 15 Jul 2016 14:39:53 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
Message-ID: <CA+kOe08nmL=MHBP+i2CC0EopNR7nLLOMOtjYvb2QVkQ-LCCqqw@mail.gmail.com>

On Thu, Jul 14, 2016 at 7:09 PM, John Rose <john.r.rose at oracle.com> wrote:

>
> The particular use case I have in mind is SeqLocks, specifically the
> writer-enter operation, which needs to change the lock state to "odd",
> unless it is already "odd", and let the processor know what happened.  An
> "xadd" cannot do this, but a "cmpxchg" or "bts" can, and the "bts" is
> preferable.
>

Most synchronizers have more complex state than "locked or unlocked".
StampedLock is a read-write lock, so you can only acquire the write lock if
not currently read-locked.  (Did I miss something?) ReentrantLock is
reentrant (!) so needs to store the lock hold count.  Perhaps ReentrantLock
could benefit if you optimize for non-reentrant acquires, at the cost of
doing an extra update for reentrant acquires.

> (In a rather deep sense, getAndAdd is less powerful than testAndSetBit or
> getAndBitwiseOr,
> because op+ is bijective in each argument, while op| is idempotent.  This
> means that
> you can operate bitwise on a structure in such a way that your operation
> disappears
> when the structure is already in some state you are pushing it towards.
> Of course,
> you also need a way to "exchange" in the previous value, atomically.)
>

From dl at cs.oswego.edu  Fri Jul 15 23:17:19 2016
From: dl at cs.oswego.edu (Doug Lea)
Date: Fri, 15 Jul 2016 19:17:19 -0400
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <AE3D2940-397E-40BF-8DD4-F99D4DE87EFE@oracle.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<AE3D2940-397E-40BF-8DD4-F99D4DE87EFE@oracle.com>
Message-ID: <57896EFF.2000403@cs.oswego.edu>

On 07/15/2016 03:24 PM, John Rose wrote:

> How should these be aligned with compareAndExchange*?
> By that I mean the ordering of reads and writes should be documented
> as no weaker than as if the thing had been implemented in terms of
> some corresponding CAS loop.  (Or is there a better way?)

Right. The specs for all getAndX operations should just amount to
"equivalent to CAS loop".

>
> This raises a question about omitting the store.
> Suppose the operation turns out to be a no-op.
> This can mean that contention is detected or an idempotent
> op has already raced to completion.
>
> In that case, should the op include the Release constraint or not?

As a lock implementation question: if you fail the fast path,
then the options include spinning using Thread.onSpinWait,
which has fence-like effects anyway.
And if the alternative is no-op and it is expected to be common,
then users should guard the atomic with a read to filter out
most cases. And if it is a queued lock, then you generally need
a full volatile fence anyway to operate on queue.
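
A sketch of the "guard the atomic with a read" idiom, assuming a JDK 9 or
later runtime where VarHandle provides getAndBitwiseOr. The class
GuardedFlags and its members are hypothetical names used only for
illustration.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical example class; not taken from the JDK.
final class GuardedFlags {
    private volatile int flags;
    private static final VarHandle FLAGS;
    static {
        try {
            FLAGS = MethodHandles.lookup()
                    .findVarHandle(GuardedFlags.class, "flags", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Returns true if this call changed at least one bit.
    boolean trySet(int mask) {
        // Guard: a volatile read filters out the common no-op case
        // before attempting the more expensive atomic RMW.
        if ((flags & mask) == mask) return false;
        int prev = (int) FLAGS.getAndBitwiseOr(this, mask);
        return (prev & mask) != mask;
    }

    int flags() { return flags; }

    public static void main(String[] args) {
        GuardedFlags g = new GuardedFlags();
        System.out.println(g.trySet(0b100));  // sets the bit
        System.out.println(g.trySet(0b100));  // filtered by the guard
    }
}
```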

So, across these and other options, release overhead is
not often measurable. Which seems to argue against complicating
effects specification by allowing the early exit in:

>
> Put another way, can a reference implementation include
> the marked optimization or not:
>
> int getAndBitwiseOr(Object x, int mask) {
>    for (;;) {
>      int val0 = get(x); // getPlain
>      int val1 = val0 | mask;
>      if (val1 == val0)  return val0;  // ALLOW THIS OPTIMIZATION?
>      int witness = compareAndExchangeRelease(x, val0, val1);
>      if (witness == val0)  return val0;
>    }
> }
>

-Doug


From john.r.rose at oracle.com  Sat Jul 16 00:50:08 2016
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 15 Jul 2016 17:50:08 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <CA+kOe08nmL=MHBP+i2CC0EopNR7nLLOMOtjYvb2QVkQ-LCCqqw@mail.gmail.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<CA+kOe08nmL=MHBP+i2CC0EopNR7nLLOMOtjYvb2QVkQ-LCCqqw@mail.gmail.com>
Message-ID: <BC28DE8F-1E18-4EDD-BA49-D0E1B9BC40A4@oracle.com>

On Jul 15, 2016, at 2:39 PM, Martin Buchholz <martinrb at google.com> wrote:
> 
> On Thu, Jul 14, 2016 at 7:09 PM, John Rose <john.r.rose at oracle.com> wrote:
> 
> The particular use case I have in mind is SeqLocks, specifically the writer-enter operation, which needs to change the lock state to "odd", unless it is already "odd", and let the processor know what happened.  An "xadd" cannot do this, but a "cmpxchg" or "bts" can, and the "bts" is preferable.
> 
> Most synchronizers have more complex state than "locked or unlocked".  StampedLock is a read-write lock, so you can only acquire the write lock if not currently read-locked.  (Did I miss something?)

The bitwise stuff allows you to acquire or release a single independent bit
in a lock word (or maybe more than one bit).  That bit doesn't have to encode
the whole state of the lock; in fact if it did we'd use getAndSet of a boolean.
The point is you can build lock state management on top of getAndBitwise*
in useful ways, when the first interaction with the lock is to assert a setting
of that one state bit, while at the same time querying the values of the other bits.
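
As an illustration of asserting one bit while querying the rest, here is a
minimal sketch of a SeqLock-style writer-enter. It assumes a JDK 9 or later
runtime with VarHandle.getAndBitwiseOr; TinySeqLock and its methods are
hypothetical, and the reader side and the protected data are omitted.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical sketch: low bit = "writer active", remaining bits = version.
final class TinySeqLock {
    private volatile int seq;   // even: no writer, odd: writer active
    private static final VarHandle SEQ;
    static {
        try {
            SEQ = MethodHandles.lookup()
                    .findVarHandle(TinySeqLock.class, "seq", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Pushes the state to "odd" unless it is already odd, and reports
    // whether this caller is the one that made the change.
    boolean tryWriterEnter() {
        int prev = (int) SEQ.getAndBitwiseOr(this, 1);
        return (prev & 1) == 0;
    }

    // Moves from the odd (locked) value to the next even value.
    void writerExit() {
        int ignored = (int) SEQ.getAndAdd(this, 1);
    }

    int sequence() { return seq; }

    public static void main(String[] args) {
        TinySeqLock lock = new TinySeqLock();
        System.out.println(lock.tryWriterEnter());  // first writer gets in
        System.out.println(lock.tryWriterEnter());  // second attempt is refused
        lock.writerExit();
        System.out.println(lock.sequence());
    }
}
```

Because OR is idempotent, the failed second attempt leaves the sequence
word unchanged, which is the property an "xadd" cannot provide.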

> ReentrantLock is reentrant (!) so needs to store the lock hold count.  Perhaps ReentrantLock could benefit if you optimize for non-reentrant acquires, at the cost of doing an extra update for reentrant acquires.


It seems to me that any multi-field concurrent structure (like a StampedLock)
could be protected by a single-bit micro-lock built on top of a reserved bit taken
from one of the structure's fields.  There are often reasons not to do such
things, but when the technique is appropriate, the bitwise operators let you
lay down the bit inside the same cache line as the rest of the structure.
That seems like a win to me.

Some day we can persuade the JVM to loosen its grip on the slack bits
in pointers, allowing types like AtomicMarkableReference to be implemented
in one word.  In that case, AMR.attemptMark might use BTS/BTR.

-- John

From paul.sandoz at oracle.com  Mon Jul 18 11:43:47 2016
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Mon, 18 Jul 2016 13:43:47 +0200
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <578936CA.4070304@cs.oswego.edu>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
Message-ID: <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>


> On 15 Jul 2016, at 21:17, Doug Lea <dl at cs.oswego.edu> wrote:
> 
> On 07/15/2016 12:27 PM, Doug Lea wrote:
> 
>> ... only the getAndX forms seem useful, with only Volatile
>> and Release orderings. Using the default-volatile RMW convention,
>> this would require 6 methods:
>> 
> 
> John suggests the slightly less weird (and thus better):
>  getAndBitwiseOr, getAndBitwiseAnd,  getAndBitwiseXor
>  getAndBitwiseOrRelease, getAndBitwiseAndRelease, getAndBitwiseXorRelease
> that at least separates the two "And"s.
> 
> And in the spirit of not making another premature triage proposal,
> perhaps these should also include Acquire variants:
> getAndBitwiseOrAcquire, getAndBitwiseAndAcquire, getAndBitwiseXorAcquire
> 
> The implicitly-volatile versions should be useful without implementation
> penalty in the Acquire use cases that come to mind, but perhaps there are
> others. Suggestions welcome.
> 

We can support boolean, byte, char, short, int and long, where boolean defers to byte, and char defers to short.

In terms of the Unsafe Java implementations, have I got the following correct (it's the acquire variant I am unsure of)?

@ForceInline
public final int getAndBitwiseOrInt(Object o, long offset, int mask) {
    int current;
    do {
        current = getIntVolatile(o, offset);
    } while (!weakCompareAndSwapIntVolatile(o, offset,
                                            current, current | mask));
    return current;
}

@ForceInline
public final int getAndBitwiseOrIntRelease(Object o, long offset, int mask) {
    int current;
    do {
        current = getInt(o, offset);
    } while (!weakCompareAndSwapIntRelease(o, offset,
                                           current, current | mask));
    return current;
}

@ForceInline
public final int getAndBitwiseOrIntAcquire(Object o, long offset, int mask) {
    int current;
    do {
        current = getIntAcquire(o, offset);
    } while (!weakCompareAndSwapIntAcquire(o, offset,
                                           current, current | mask));
    return current;
}


As previously indicated, with suitable intrinsics and constant power of two masks (and complement of) it should be possible to boil it down to almost single bit setting instructions on x86 (more so if the returned value, aka current/witness, is dropped).
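
To show how the single-bit operations can be expressed with power-of-two
masks (and their complements), here is a sketch over a long[] used as a bit
set. It assumes a JDK 9 or later runtime with VarHandle.getAndBitwiseOr and
getAndBitwiseAnd; the class BitOps and its method names are hypothetical.
Whether a given JIT reduces these to single-bit instructions is not
something this sketch demonstrates.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical helper: single-bit test-and-set/clear built on full-width ops.
final class BitOps {
    private static final VarHandle WORDS =
            MethodHandles.arrayElementVarHandle(long[].class);

    // Sets the bit and returns its previous value.
    static boolean testAndSetBit(long[] words, int bit) {
        long mask = 1L << (bit & 63);                 // power-of-two mask
        long prev = (long) WORDS.getAndBitwiseOr(words, bit >>> 6, mask);
        return (prev & mask) != 0;
    }

    // Clears the bit and returns its previous value.
    static boolean testAndClearBit(long[] words, int bit) {
        long mask = 1L << (bit & 63);
        long prev = (long) WORDS.getAndBitwiseAnd(words, bit >>> 6, ~mask); // complement mask
        return (prev & mask) != 0;
    }

    public static void main(String[] args) {
        long[] words = new long[2];
        System.out.println(testAndSetBit(words, 70));    // bit was clear
        System.out.println(testAndSetBit(words, 70));    // bit was already set
        System.out.println(testAndClearBit(words, 70));  // bit was set
    }
}
```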

--

Separately, i would like to propose a naming scheme:

- for the read or write method, plain is the default.

- for read-modify-write methods, volatile is the default
  - rename weakCompareAndSet to weakCompareAndSetPlain
  - rename weakCompareAndSetVolatile to weakCompareAndSet
  - deprecate (not for removal) Atomic*.weakCompareAndSet, add Atomic*.weakCompareAndSetPlain
    which leaves the inconsistency of Atomic*.weakCompareAndSetVolatile.

Analysis on grepcode shows very little usage of the Atomic*.weakCompareAndSet methods.

Paul.

From aph at redhat.com  Mon Jul 18 18:27:18 2016
From: aph at redhat.com (Andrew Haley)
Date: Mon, 18 Jul 2016 19:27:18 +0100
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
Message-ID: <578D1F86.7070408@redhat.com>

On 18/07/16 12:43, Paul Sandoz wrote:
> - for the read or write method, plain is the default.
> 
> - for read-modify-write methods, volatile is the default
>   - rename weakCompareAndSet to weakCompareAndSetPlain

Why "plain"?  Is this the same as C++ "relaxed"?

Andrew.


From dl at cs.oswego.edu  Mon Jul 18 19:31:29 2016
From: dl at cs.oswego.edu (Doug Lea)
Date: Mon, 18 Jul 2016 15:31:29 -0400
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
Message-ID: <578D2E91.5000500@cs.oswego.edu>

On 07/18/2016 07:43 AM, Paul Sandoz wrote:

> In terms of the Unsafe Java implementations have i got the following correct (it's the acquire variant i am unsure of)?
>

These look OK. For ARM/POWER, it's possible to avoid some fences in loops
at assembly level, but that's why they are intrinsics.

>
> Separately, i would like to propose a naming scheme:
>
> - for the read or write method, plain is the default.
>
> - for read-modify-write methods, volatile is the default
>    - rename weakCompareAndSet to weakCompareAndSetPlain
>    - rename weakCompareAndSetVolatile to weakCompareAndSet
>    - deprecate (not for removal) Atomic*.weakCompareAndSet, add Atomic*.weakCompareAndSetPlain
>      which leaves the inconsistency of Atomic*.weakCompareAndSetVolatile.
>

Sure. This does seem slightly better.
(And I'm content to continue to take the blame for OKing the naming :-)

> On 07/18/2016 02:27 PM, Andrew Haley wrote:
>> On 18/07/16 12:43, Paul Sandoz wrote:
>>> - for the read or write method, plain is the default.
>>>
>>> - for read-modify-write methods, volatile is the default
>>>    - rename weakCompareAndSet to weakCompareAndSetPlain
>>
>> Why "plain"?  Is this the same as C++ "relaxed"?

In this case, yes. But Java-plain is not necessarily always the
same as C++ relaxed, so we've been cautious with namings.

-Doug


From aph at redhat.com  Tue Jul 19 07:59:24 2016
From: aph at redhat.com (Andrew Haley)
Date: Tue, 19 Jul 2016 08:59:24 +0100
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <578D2E91.5000500@cs.oswego.edu>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu>
Message-ID: <578DDDDC.8050704@redhat.com>

On 18/07/16 20:31, Doug Lea wrote:

>> On 07/18/2016 02:27 PM, Andrew Haley wrote:
>>> On 18/07/16 12:43, Paul Sandoz wrote:
>>>> - for the read or write method, plain is the default.
>>>>
>>>> - for read-modify-write methods, volatile is the default
>>>>    - rename weakCompareAndSet to weakCompareAndSetPlain
>>>
>>> Why "plain"?  Is this the same as C++ "relaxed"?
> 
> In this case, yes. But Java-plain is not necessarily always the
> same as C++ relaxed, so we've been cautious with namings.

Mmmm, but it's baffling for me, and I've been involved for a long
time.  We have "Opaque" and now "Plain".  What is the difference
between them?  I haven't seen these terms anywhere else.  Is this new
terminology?

Andrew.

From paul.sandoz at oracle.com  Tue Jul 19 08:39:14 2016
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Tue, 19 Jul 2016 10:39:14 +0200
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <578DDDDC.8050704@redhat.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
Message-ID: <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>


> On 19 Jul 2016, at 09:59, Andrew Haley <aph at redhat.com> wrote:
> 
> On 18/07/16 20:31, Doug Lea wrote:
> 
>>> On 07/18/2016 02:27 PM, Andrew Haley wrote:
>>>> On 18/07/16 12:43, Paul Sandoz wrote:
>>>>> - for the read or write method, plain is the default.
>>>>> 
>>>>> - for read-modify-write methods, volatile is the default
>>>>>   - rename weakCompareAndSet to weakCompareAndSetPlain
>>>> 
>>>> Why "plain"?  Is this the same as C++ "relaxed"?
>> 
>> In this case, yes. But Java-plain is not necessarily always the
>> same as C++ relaxed, so we've been cautious with namings.
> 
> Mmmm, but it's baffling for me, and I've been involved for a long
> time.  We have "Opaque" and now "Plain".  What is the difference
> between them?  I haven't seen these terms anywhere else.  Is this new
> terminology?
> 

Plain behaves like non-volatile/non-final field access e.g. like get/putfield byte codes.

Both plain and opaque have "no assurance of memory ordering effects with respect to other threads", but opaque is stronger in the sense that the compiler is restricted in which optimisations it may perform. In a sense the access is "opaque" to the compiler: e.g. it cannot elide the access or fold it into a more recent access.

A good example is presented in Aleksey's VarHandles slides, #55:

  http://shipilev.net/talks/jpoint-April2016-varhandles.pdf
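
A small sketch of the difference, assuming a JDK 9 or later runtime with
VarHandle.getOpaque/setOpaque. The class OpaqueSpin is hypothetical. With a
plain read of the non-volatile field, a compiler would be allowed to hoist
the load out of the loop; the opaque read may not be elided that way, yet it
still imposes no ordering on other variables.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical example of an opaque-mode spin on a non-volatile flag.
final class OpaqueSpin {
    private boolean done;   // deliberately neither volatile nor final
    private static final VarHandle DONE;
    static {
        try {
            DONE = MethodHandles.lookup()
                    .findVarHandle(OpaqueSpin.class, "done", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void markDone() {
        DONE.setOpaque(this, true);
    }

    void waitUntilDone() {
        boolean seen;
        do {
            // Each iteration performs a real read of the field.
            seen = (boolean) DONE.getOpaque(this);
            if (!seen) Thread.onSpinWait();
        } while (!seen);
    }

    // Runs a writer thread and spins until its write is observed.
    static boolean demo() {
        OpaqueSpin s = new OpaqueSpin();
        Thread writer = new Thread(s::markDone);
        writer.start();
        s.waitUntilDone();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```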


I am still holding off updating the specifications to clarify this, as Doug may be cooking up some foundational tweaks that we can build upon.

Hth,
Paul.

From paul.sandoz at oracle.com  Tue Jul 19 14:23:26 2016
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Tue, 19 Jul 2016 16:23:26 +0200
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <578D2E91.5000500@cs.oswego.edu>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu>
Message-ID: <1A64B60D-F158-4241-8A75-E1E5BD594C87@oracle.com>


> On 18 Jul 2016, at 21:31, Doug Lea <dl at cs.oswego.edu> wrote:
> 
> On 07/18/2016 07:43 AM, Paul Sandoz wrote:
> 
>> In terms of the Unsafe Java implementations have i got the following correct (it's the acquire variant i am unsure of)?
>> 
> 
> These look OK. For ARM/POWER, it's possible to avoid some fences in loops
> at assembly level, but that's why they are intrinsics.


Here is an initial (and untested) webrev for those that might be interested:

  http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8161444-vhs-bitwise-atomics/webrev/

Paul.

> 
>> 
>> Separately, i would like to propose a naming scheme:
>> 
>> - for the read or write method, plain is the default.
>> 
>> - for read-modify-write methods, volatile is the default
>>   - rename weakCompareAndSet to weakCompareAndSetPlain
>>   - rename weakCompareAndSetVolatile to weakCompareAndSet
>>   - deprecate (not for removal) Atomic*.weakCompareAndSet, add Atomic*.weakCompareAndSetPlain
>>     which leaves the inconsistency of Atomic*.weakCompareAndSetVolatile.
>> 
> 
> Sure. This does seem slightly better.
> (And I'm content to continue to take the blame for OKing the naming :-)
> 
>> On 07/18/2016 02:27 PM, Andrew Haley wrote:
>>> On 18/07/16 12:43, Paul Sandoz wrote:
>>>> - for the read or write method, plain is the default.
>>>> 
>>>> - for read-modify-write methods, volatile is the default
>>>>   - rename weakCompareAndSet to weakCompareAndSetPlain
>>> 
>>> Why "plain"?  Is this the same as C++ "relaxed"?
> 
> In this case, yes. But Java-plain is not necessarily always the
> same as C++ relaxed, so we've been cautious with namings.
> 
> -Doug
> 
> 
> 
> 


From john.r.rose at oracle.com  Tue Jul 19 18:56:36 2016
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 19 Jul 2016 11:56:36 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <1A64B60D-F158-4241-8A75-E1E5BD594C87@oracle.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu>
	<1A64B60D-F158-4241-8A75-E1E5BD594C87@oracle.com>
Message-ID: <E32B769F-273D-4586-9BC7-97B6A78ADE89@oracle.com>

On Jul 19, 2016, at 7:23 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
> 
> Here is an initial (and untested) webrev for those that might be interested:
> 
>  http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8161444-vhs-bitwise-atomics/webrev/

I like this very much.

Mainly because it gives us single-bit atomics, but also because, with C++, it leans towards the newer ISAs.

-- John

From aph at redhat.com  Tue Jul 19 20:51:37 2016
From: aph at redhat.com (Andrew Haley)
Date: Tue, 19 Jul 2016 21:51:37 +0100
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
Message-ID: <578E92D9.80201@redhat.com>

On 19/07/16 09:39, Paul Sandoz wrote:
> 
>> On 19 Jul 2016, at 09:59, Andrew Haley <aph at redhat.com> wrote:
>>
>> On 18/07/16 20:31, Doug Lea wrote:
>>>> On 07/18/2016 02:27 PM, Andrew Haley wrote:
>>>>> On 18/07/16 12:43, Paul Sandoz wrote:
>>>>>> - for the read or write method, plain is the default.
>>>>>>
>>>>>> - for read-modify-write methods, volatile is the default
>>>>>>   - rename weakCompareAndSet to weakCompareAndSetPlain
>>>>>
>>>>> Why "plain"?  Is this the same as C++ "relaxed"?
>>>
>>> In this case, yes. But Java-plain is not necessarily always the
>>> same as C++ relaxed, so we've been cautious with namings.
>>
>> Mmmm, but it's baffling for me, and I've been involved for a long
>> time.  We have "Opaque" and now "Plain".  What is the difference
>> between them?  I haven't seen these terms anywhere else.  Is this new
>> terminology?
> 
> Plain behaves like non-volatile/non-final field access e.g. like
> get/putfield byte codes.
> 
> Both plain and opaque have "no assurance of memory ordering effects
> with respect to other threads" but opaque is stronger in the sense
> that the compiler is restricted in what optimisations it may
> perform, in a sense the access is "opaque" to the compiler e.g. it
> cannot elide the access or fold it into a more recent access etc.

OK, but if the processor can reorder accesses (and satisfy them from
local caches) in the absence of fences, why is this a distinction that
is worth bothering about?  And how on Earth would you make such a
distinction in the context of a high-level language specification?

> A good example is presented in Aleksey's VarHandles slides #55
> 
>   http://shipilev.net/talks/jpoint-April2016-varhandles.pdf

Thanks.

> I am still holding off updating the specifications to clarify, as
> Doug may be cooking up some foundational tweaks which we can
> build upon.

I look forward to seeing that.

Andrew.
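
A minimal sketch of the plain/opaque distinction Paul describes above,
written against the java.lang.invoke.VarHandle API as it later shipped
in JDK 9.  The class OpaqueFlag and its method names are hypothetical and
exist only for illustration.  The interesting part is the spin loop: per
Paul's description, a compiler may not elide an opaque access or fold it
into another one, while a plain read of the same field carries no such
restriction.  Neither access implies ordering with respect to other
variables.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class OpaqueFlag {
    private boolean done; // deliberately not volatile

    private static final VarHandle DONE;
    static {
        try {
            DONE = MethodHandles.lookup()
                    .findVarHandle(OpaqueFlag.class, "done", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Plain read: behaves like a getfield bytecode on a non-volatile field.
    boolean isDonePlain() { return done; }

    // Opaque read: the access itself must be performed, but it gives no
    // ordering guarantees relative to accesses of other variables.
    boolean isDoneOpaque() { return (boolean) DONE.getOpaque(this); }

    // Opaque write, paired with the opaque read above.
    void markDone() { DONE.setOpaque(this, true); }

    // Spins on the opaque read in a second thread until the flag is set.
    static boolean demo() {
        OpaqueFlag f = new OpaqueFlag();
        Thread waiter = new Thread(() -> {
            while (!f.isDoneOpaque()) {
                Thread.onSpinWait();
            }
        });
        waiter.start();
        f.markDone();
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return f.isDonePlain();
    }
}
```

If the loop used isDonePlain() instead, a JIT compiler would be allowed
to read the field once and spin on the cached value.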

From martinrb at google.com  Wed Jul 20 00:14:34 2016
From: martinrb at google.com (Martin Buchholz)
Date: Tue, 19 Jul 2016 17:14:34 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <BC28DE8F-1E18-4EDD-BA49-D0E1B9BC40A4@oracle.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<CA+kOe08nmL=MHBP+i2CC0EopNR7nLLOMOtjYvb2QVkQ-LCCqqw@mail.gmail.com>
	<BC28DE8F-1E18-4EDD-BA49-D0E1B9BC40A4@oracle.com>
Message-ID: <CA+kOe0-x_Ua2oc8JLXEPz9_yb468tnZN_oEajaRXmL=AgTWDzg@mail.gmail.com>

On Fri, Jul 15, 2016 at 5:50 PM, John Rose <john.r.rose at oracle.com> wrote:

> On Jul 15, 2016, at 2:39 PM, Martin Buchholz <martinrb at google.com> wrote:
>
>
> On Thu, Jul 14, 2016 at 7:09 PM, John Rose <john.r.rose at oracle.com> wrote:
>
>>
>> The particular use case I have in mind is SeqLocks, specifically the
>> writer-enter operation, which needs to change the lock state to "odd",
>> unless it is already "odd", and let the processor know what happened.  An
>> "xadd" cannot do this, but a "cmpxchg" or "bts" can, and the "bts" is
>> preferable.
>>
>
> Most synchronizers have more complex state than "locked or unlocked".
> StampedLock is a read-write lock, so you can only acquire the write lock if
> not currently read-locked.  (Did I miss something?)
>
>
> The bitwise stuff allows you to acquire or release a single independent bit
> in a lock word (or maybe more than one bit).  That bit doesn't have to encode
> the whole state of the lock; in fact if it did we'd use getAndSet of a boolean.
> The point is you can build lock state management on top of getAndBitwise*
> in useful ways, when if the first interaction with the lock is to assert a setting
>

I'm still thinking about where in j.u.c. we would use getAndBitwise*.

... StampedLock ...

we have to distinguish readers and writers, so both readers and writers
acquire the micro-lock before proceeding on success to do another write to
indicate the actual current lock state.  We'd better not lose our time
slice in between!  If an acquirer fails to acquire the micro-lock in an
indeterminate state, they probably spin waiting for the micro-lock owner,
but for how long?

ReentrantLock seems more promising.  The micro-lock bit unambiguously
indicates "exclusively held"; other bits are reentrant hold count bits.  On
reentrant acquire, have to check thread field:
  lock.thread == Thread.currentThread().
If we don't acquire reentrantly, then a single getAndSetMicroLock is
sufficient to unambiguously acquire the lock.


> ReentrantLock is reentrant (!) so needs to store the lock hold count.
> Perhaps ReentrantLock could benefit if you optimize for non-reentrant
> acquires, at the cost of doing an extra update for reentrant acquires.
>
>
> It seems to me that any multi-field concurrent structure (like a StampedLock)
> could be protected by a single-bit micro-lock built on top of a reserved bit taken
> from one of the structure's fields.  There are often reasons not to do such
> things, but when the technique is appropriate, the bitwise operators let you
> lay down the bit inside the same cache line as the rest of the structure.
> That seems like a win to me.
>
> Some day we can persuade the JVM to loosen its grip on the slack bits
> in pointers, allowing types like AtomicMarkableReference to be implemented
> in one word.  In that case, AMR.attemptMark might use BTS/BTR.
>

But ...  AtomicMarkableReference probably needs to be implemented in the
VM, not in pure Java code that uses VarHandles, since pointer bit stealing
depends on things like compressed oops?
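
The ReentrantLock idea above can be sketched roughly as follows.  This is
only an illustration under stated assumptions, not how
java.util.concurrent.locks.ReentrantLock is implemented: the class
MicroReentrantLock, its bit layout (bit 0 = "exclusively held", the
remaining bits = extra reentrant holds) and its method names are all
hypothetical.  It uses VarHandle.getAndBitwiseOr / getAndBitwiseAnd as
they later shipped in JDK 9.  There is no queueing or blocking; tryLock
simply fails under contention.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class MicroReentrantLock {
    private static final int HELD = 1;     // bit 0: exclusively held
    private static final int ONE_HOLD = 2; // bits 1..31: extra reentrant holds

    private volatile int state;
    private Thread owner; // written only by the thread holding the HELD bit

    private static final VarHandle STATE;
    static {
        try {
            STATE = MethodHandles.lookup()
                    .findVarHandle(MicroReentrantLock.class, "state", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    boolean tryLock() {
        Thread me = Thread.currentThread();
        if (owner == me) {
            // Reentrant acquire: pay for an extra update of the count bits.
            int ignored = (int) STATE.getAndAdd(this, ONE_HOLD);
            return true;
        }
        // Non-reentrant acquire: one atomic OR both sets the bit and
        // reports whether it was already set.  A failed attempt leaves
        // the state word unchanged.
        int prev = (int) STATE.getAndBitwiseOr(this, HELD);
        if ((prev & HELD) != 0) {
            return false;
        }
        owner = me;
        return true;
    }

    void unlock() {
        if (owner != Thread.currentThread()) {
            throw new IllegalMonitorStateException();
        }
        if ((state >>> 1) != 0) {
            int ignored = (int) STATE.getAndAdd(this, -ONE_HOLD);
        } else {
            owner = null; // clear before releasing the bit
            int ignored = (int) STATE.getAndBitwiseAnd(this, ~HELD);
        }
    }

    int holdCount() {
        int s = state;
        return (s & HELD) == 0 ? 0 : 1 + (s >>> 1);
    }
}
```

On hardware with a test-and-set-bit instruction the non-reentrant path
could in principle compile down to a single BTS, which is the case the
sketch optimizes for.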

From martinrb at google.com  Wed Jul 20 00:25:22 2016
From: martinrb at google.com (Martin Buchholz)
Date: Tue, 19 Jul 2016 17:25:22 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <578E92D9.80201@redhat.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com>
	<57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
Message-ID: <CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>

On Tue, Jul 19, 2016 at 1:51 PM, Andrew Haley <aph at redhat.com> wrote:

> On 19/07/16 09:39, Paul Sandoz wrote:
> > Plain behaves like non-volatile/non-final field access e.g. like
> > get/putfield byte codes.
>

We should probably clarify whether we really mean that even word-tearing on
longs/doubles is allowed.

C++ relaxed atomics are (perhaps!) stronger than "plain" in two senses:
truly atomic (!) and single-memory-location-sequentially-consistent.

From john.r.rose at oracle.com  Wed Jul 20 00:31:57 2016
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 19 Jul 2016 17:31:57 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <CA+kOe0-x_Ua2oc8JLXEPz9_yb468tnZN_oEajaRXmL=AgTWDzg@mail.gmail.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<CA+kOe08nmL=MHBP+i2CC0EopNR7nLLOMOtjYvb2QVkQ-LCCqqw@mail.gmail.com>
	<BC28DE8F-1E18-4EDD-BA49-D0E1B9BC40A4@oracle.com>
	<CA+kOe0-x_Ua2oc8JLXEPz9_yb468tnZN_oEajaRXmL=AgTWDzg@mail.gmail.com>
Message-ID: <60A0B4AA-3521-44D4-941A-8ADEF6AF1B23@oracle.com>

On Jul 19, 2016, at 5:14 PM, Martin Buchholz <martinrb at google.com> wrote:
> 
> 
> On Fri, Jul 15, 2016 at 5:50 PM, John Rose <john.r.rose at oracle.com> wrote:
> On Jul 15, 2016, at 2:39 PM, Martin Buchholz <martinrb at google.com> wrote:
>> 
>> On Thu, Jul 14, 2016 at 7:09 PM, John Rose <john.r.rose at oracle.com> wrote:
>> 
>> The particular use case I have in mind is SeqLocks, specifically the writer-enter operation, which needs to change the lock state to "odd", unless it is already "odd", and let the processor know what happened.  An "xadd" cannot do this, but a "cmpxchg" or "bts" can, and the "bts" is preferable.
>> 
>> Most synchronizers have more complex state than "locked or unlocked".  StampedLock is a read-write lock, so you can only acquire the write lock if not currently read-locked.  (Did I miss something?)
> 
> The bitwise stuff allows you to acquire or release a single independent bit
> in a lock word (or maybe more than one bit).  That bit doesn't have to encode
> the whole state of the lock; in fact if it did we'd use getAndSet of a boolean.
> The point is you can build lock state management on top of getAndBitwise*
> in useful ways, when if the first interaction with the lock is to assert a setting
> 
> I'm still thinking about where in j.u.c. we would use getAndBitwise*.
> 
> ... StampedLock ... 
> 
> we have to distinguish readers and writers, so both readers and writers acquire the micro-lock before proceeding on success to do another write to indicate the actual current lock state.  We'd better not lose our time slice in between!  If an acquirer fails to acquire the micro-lock in an indeterminate state, they probably spin waiting for the micro-lock owner, but for how long?

Yes, more work is needed to make that operate correctly.  I suppose we can reuse an idea from HotSpot and have compact and inflated states for such locks.  In a nutshell, it works like this:  The compact state needs at a minimum just enough bits to encode semantic lock state, plus distinguish compact from inflation states.  The lock would try to stay in the compact state, but inflate if waiter lists need to be dealt with.  The inflated state would have an out-of-line control block with waiter queues and every creature comfort.  It might be hard to do this on top of the JVM, which likes to use safepoints to pull tricks like deflating cold locks.

> ReentrantLock seems more promising.  The micro-lock bit unambiguously indicates "exclusively held"; other bits are reentrant hold count bits.  On reentrant acquire, have to check thread field: 
>   lock.thread == Thread.currentThread().
> If we don't acquire reentrantly, then a single getAndSetMicroLock is sufficient to unambiguously acquire the lock.
> 
> 
>> ReentrantLock is reentrant (!) so needs to store the lock hold count.  Perhaps ReentrantLock could benefit if you optimize for non-reentrant acquires, at the cost of doing an extra update for reentrant acquires.
> 
> 
> It seems to me that any multi-field concurrent structure (like a StampedLock)
> could be protected by a single-bit micro-lock built on top of a reserved bit taken
> from one of the structure's fields.  There are often reasons not to do such
> things, but when the technique is appropriate, the bitwise operators let you
> lay down the bit inside the same cache line as the rest of the structure.
> That seems like a win to me.
> 
> Some day we can persuade the JVM to loosen its grip on the slack bits
> in pointers, allowing types like AtomicMarkableReference to be implemented
> in one word.  In that case, AMR.attemptMark might use BTS/BTR.
> 
> But ...  AtomicMarkableReference probably needs to be implemented in the VM, not in pure Java code that uses VarHandles, since pointer bit stealing depends on things like compressed oops?

What I mean by "loosen its grip" is share enough layout information about pointers that Java code can find and use a slack bit in the pointer format.  (And if there isn't such a bit, then Java code would have to go away and do something else.)  Also, for pointers which are treated this way, the GC would have to mask off the shared bits.

-- John
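
The SeqLock writer-enter operation quoted at the top of this message can
be sketched as below.  This is a rough illustration only: SeqLockPoint
and its methods are hypothetical names, the protected data is just two
ints, and the reader follows the same validate-after-acquireFence shape
that StampedLock uses.  It relies on VarHandle.getAndBitwiseOr and
VarHandle.acquireFence as they later shipped in JDK 9.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class SeqLockPoint {
    private volatile int seq; // even: stable, odd: a writer is inside
    private int x, y;         // data protected by the sequence word

    private static final VarHandle SEQ;
    static {
        try {
            SEQ = MethodHandles.lookup()
                    .findVarHandle(SeqLockPoint.class, "seq", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Writer-enter: make the sequence odd unless it already is, and
    // report what happened.  An add cannot do this; an OR (or BTS) can.
    boolean tryWriterEnter() {
        int prev = (int) SEQ.getAndBitwiseOr(this, 1);
        return (prev & 1) == 0;
    }

    // Writer-exit: odd -> next even value.
    void writerExit() {
        int ignored = (int) SEQ.getAndAdd(this, 1);
    }

    boolean tryWrite(int nx, int ny) {
        if (!tryWriterEnter()) {
            return false; // another writer is already inside
        }
        x = nx;
        y = ny;
        writerExit();
        return true;
    }

    int[] read() {
        for (;;) {
            int s1 = seq;
            if ((s1 & 1) != 0) { // writer inside, retry
                Thread.onSpinWait();
                continue;
            }
            int rx = x;
            int ry = y;
            VarHandle.acquireFence(); // keep the data reads before the re-check
            if (seq == s1) {
                return new int[] { rx, ry };
            }
        }
    }
}
```

A failed tryWriterEnter leaves the sequence word exactly as it was, so a
losing writer does not disturb readers or the winning writer.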

From john.r.rose at oracle.com  Wed Jul 20 00:33:52 2016
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 19 Jul 2016 17:33:52 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
Message-ID: <E08D3598-6A77-4956-9B04-2DEF2559E841@oracle.com>

On Jul 19, 2016, at 5:25 PM, Martin Buchholz <martinrb at google.com> wrote:
> 
> We should probably clarify whether we really mean that even word-tearing on
> longs/doubles is allowed.

Yuck.  This is one of the reasons "Plain" is also "Odd".
I long for the day when I can fully appreciate this problem... in the rear view mirror.

-- John

From john.r.rose at oracle.com  Wed Jul 20 00:44:33 2016
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 19 Jul 2016 17:44:33 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <60A0B4AA-3521-44D4-941A-8ADEF6AF1B23@oracle.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<CA+kOe08nmL=MHBP+i2CC0EopNR7nLLOMOtjYvb2QVkQ-LCCqqw@mail.gmail.com>
	<BC28DE8F-1E18-4EDD-BA49-D0E1B9BC40A4@oracle.com>
	<CA+kOe0-x_Ua2oc8JLXEPz9_yb468tnZN_oEajaRXmL=AgTWDzg@mail.gmail.com>
	<60A0B4AA-3521-44D4-941A-8ADEF6AF1B23@oracle.com>
Message-ID: <23F4B458-905D-4F07-85E3-BDF6BE6E374C@oracle.com>

On Jul 19, 2016, at 5:31 PM, John Rose <john.r.rose at oracle.com> wrote:
> 
> What I mean by "loosen its grip" is share enough layout information about pointers that Java code can find and use a slack bit in the pointer format.  (And if there isn't such a bit, then Java code would have to go away and do something else.)  Also, for pointers which are treated this way, the GC would have to mask off the shared bits.

P.S.  One more thought on this:  We probably need a special marking for
pointer variables which have this funny property.  The JVM can lay them
out with 64 bits even when they are compressed, and then inform Java
code how many slack bits are available.  Some days 32 (compressed
oops), some days 3 (all 61 high bits are significant) and some days 8/16/24.
In the non-compressed case, the GC will have to mask off the bits which
the JVM shares with the Java code.

This is related to some work Rickard Backman did in 2012, where a 64-bit
pointer variable could also contain non-pointer bits usable for any purpose.
In that case, the bits were mutually exclusive with the pointer, and a tag
scheme would tell the GC and everybody else what was in the variable.
This is a different semantics from "stolen" color bits or flag bits, but has
many of the same implementation moves.

http://cr.openjdk.java.net/~rbackman/tagged.patch/mlvm.hs.patch

In the future, we can probably use value types as a principled way
to mark such special variables for special processing by the JVM.
(I'm thinking TaggedReference<T>, Contended<T>, WeakReference<T>,
etc.  Details to be worked out later?)
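
Since Java code cannot get at pointer bits today, the following sketch
uses a long payload as a stand-in for the pointer, purely to show how a
mark operation in the style of AtomicMarkableReference.attemptMark could
map onto a single bitwise RMW.  Everything here is hypothetical: the class
MarkableWord, the choice of the top bit as the "slack" bit, and the
method names.  No GC cooperation is modelled.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class MarkableWord {
    private static final long MARK = 1L << 63; // assumed slack bit

    private volatile long word; // low 63 bits: payload, top bit: mark

    private static final VarHandle WORD;
    static {
        try {
            WORD = MethodHandles.lookup()
                    .findVarHandle(MarkableWord.class, "word", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    MarkableWord(long payload) {
        this.word = payload & ~MARK;
    }

    // Readers mask off the stolen bit, much as the GC would have to.
    long payload() { return word & ~MARK; }

    boolean isMarked() { return (word & MARK) != 0; }

    // Sets the mark with one atomic OR; true if this call changed it.
    boolean mark() {
        long prev = (long) WORD.getAndBitwiseOr(this, MARK);
        return (prev & MARK) == 0;
    }

    // Clears the mark with one atomic AND; true if this call changed it.
    boolean unmark() {
        long prev = (long) WORD.getAndBitwiseAnd(this, ~MARK);
        return (prev & MARK) != 0;
    }
}
```

The mark and the payload live in the same word, so both can be read
with a single load.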


From Paul.Sandoz at oracle.com  Wed Jul 20 08:25:29 2016
From: Paul.Sandoz at oracle.com (Paul Sandoz)
Date: Wed, 20 Jul 2016 10:25:29 +0200
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
Message-ID: <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>


> On 20 Jul 2016, at 02:25, Martin Buchholz <martinrb at google.com> wrote:
> 
> 
> 
> On Tue, Jul 19, 2016 at 1:51 PM, Andrew Haley <aph at redhat.com> wrote:
> On 19/07/16 09:39, Paul Sandoz wrote:
> > Plain behaves like non-volatile/non-final field access e.g. like
> > get/putfield byte codes.
> 
> We should probably clarify whether we really mean that even word-tearing on longs/doubles is allowed.
> 

Just to be clear you are referring to atomicity rather than word tearing as specified by JLS:

https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.6

? (I have tended to use word tearing interchangeably in the past and it has caused confusion.)


> C++ relaxed atomics are (perhaps!) stronger than "plain" in two senses: truly atomic (!) and single-memory-location-sequentially-consistent.

Yes, it's the latter that seems harder to apply.

Paul.


From paul.sandoz at oracle.com  Wed Jul 20 08:28:26 2016
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Wed, 20 Jul 2016 10:28:26 +0200
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <E32B769F-273D-4586-9BC7-97B6A78ADE89@oracle.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu>
	<1A64B60D-F158-4241-8A75-E1E5BD594C87@oracle.com>
	<E32B769F-273D-4586-9BC7-97B6A78ADE89@oracle.com>
Message-ID: <463AF8A3-0387-4F26-872B-E291ACC1C5AE@oracle.com>


> On 19 Jul 2016, at 20:56, John Rose <john.r.rose at oracle.com> wrote:
> 
> On Jul 19, 2016, at 7:23 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
>> 
>> Here is an initial (and untested) webrev for those that might be interested:
>> 
>>  http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8161444-vhs-bitwise-atomics/webrev/
> 
> I like this very much.
> 
> Mainly because it gives us single-bit atomics, but also because, with C++, it leans towards the newer ISAs.
> 

Just to be clear, I added a bunch of methods to Unsafe in anticipation that they will be made intrinsic.

...

Perhaps I am asking for trouble bringing this up, but do we require acquire/release variants of getAndAdd and getAndSet? (IMHO we could drop addAndGet).

Paul.

From dl at cs.oswego.edu  Wed Jul 20 10:37:31 2016
From: dl at cs.oswego.edu (Doug Lea)
Date: Wed, 20 Jul 2016 06:37:31 -0400
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <463AF8A3-0387-4F26-872B-E291ACC1C5AE@oracle.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu>
	<1A64B60D-F158-4241-8A75-E1E5BD594C87@oracle.com>
	<E32B769F-273D-4586-9BC7-97B6A78ADE89@oracle.com>
	<463AF8A3-0387-4F26-872B-E291ACC1C5AE@oracle.com>
Message-ID: <ec8a881c-0a0d-4695-9205-78118bb6ed1c@cs.oswego.edu>

On 07/20/2016 04:28 AM, Paul Sandoz wrote:

>
> Perhaps I am asking for trouble bringing this up, but do we require acquire/release variants of getAndAdd and getAndSet? (IMHO we could drop addAndGet).
>

I had noticed this as well. I agree that we ought to do this for the
sake of consistency; adding:

   getAndAddRelease, getAndAddAcquire, getAndSetRelease, getAndSetAcquire


-Doug
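
For reference, a small sketch of how the four proposed methods look from
user code, using the names as they later appeared on
java.lang.invoke.VarHandle in JDK 9.  The Mailbox class and its fields
are hypothetical; the pairing of a release-mode post with an
acquire-mode take is one plausible use, not a recommendation.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class Mailbox {
    private Object slot; // current message, or null if empty
    private int posted;  // number of messages posted so far

    private static final VarHandle SLOT;
    private static final VarHandle POSTED;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            SLOT = l.findVarHandle(Mailbox.class, "slot", Object.class);
            POSTED = l.findVarHandle(Mailbox.class, "posted", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Release-mode swap: writes made before post() are ordered before
    // the new message becomes visible.  Returns the displaced message.
    Object post(Object msg) {
        int ignored = (int) POSTED.getAndAddRelease(this, 1);
        return SLOT.getAndSetRelease(this, msg);
    }

    // Acquire-mode swap: reads made after take() are ordered after
    // the message was obtained.  Returns null if the slot was empty.
    Object take() {
        return SLOT.getAndSetAcquire(this, null);
    }

    // Acquire-mode add of zero, used here only as an ordered read.
    int postedCount() {
        return (int) POSTED.getAndAddAcquire(this, 0);
    }
}
```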


From dl at cs.oswego.edu  Wed Jul 20 12:49:14 2016
From: dl at cs.oswego.edu (Doug Lea)
Date: Wed, 20 Jul 2016 08:49:14 -0400
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
Message-ID: <43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu>

On 07/20/2016 04:25 AM, Paul Sandoz wrote:

>> C++ relaxed atomics are (perhaps!) stronger than "plain" in two senses: truly atomic (!) and single-memory-location-sequentially-consistent.
>
> Yes, it's the latter that seems harder to apply.
>

To illustrate the main consequence (also showing how Java-Plain vs C++relaxed
differences are so small and subtle), in C++-relaxed, compilers cannot perform
some forms of  common subexpression elimination in the presence of possible
aliasing, but for Java-plain (and C++-plain), they can.  As in:

class Point { int x, y; }

void f(Point a, Point b) {
   int r1 = a.x;
   int r2 = b.x;
   int r3 = a.x; // simplify to: int r3 = r1 ?
   use (r1, r2, r3);
}

If the accesses were C++-relaxed, then the transformation could not be applied
if a and b are the same point, because r3 would then hold an older value than
r2 if some other thread wrote between the first two reads. But C++-plain and
Java-Plain both allow this to be done anyway. Intuitively, this is because the
reads are "coherent" per view (a vs b), which is spec'ed as OK even though the
per-location rule need not hold.

(Mostly unrelatedly, note that if a and b were known to be aliased, then you
could apply this transformation if you first simplified the "r2 = b.x" to
"r2 = r1".)

-Doug
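
The example above, in compilable form, with the transformed version
spelled out next to it.  The class CseSketch and the method names are
hypothetical.  The third variant uses opaque reads through a VarHandle
(JDK 9 API); going by Paul's earlier description that an opaque access
may not be folded into another access, it is assumed here that a compiler
would have to keep all three loads in that variant.  Single-threaded, all
three variants return the same values, so the difference is only visible
under a concurrent write.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class Point { int x, y; }

class CseSketch {
    private static final VarHandle X;
    static {
        try {
            X = MethodHandles.lookup()
                    .findVarHandle(Point.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // As written: three plain reads, two of them of a.x.
    static int[] f(Point a, Point b) {
        int r1 = a.x;
        int r2 = b.x;
        int r3 = a.x;
        return new int[] { r1, r2, r3 };
    }

    // What a compiler may turn f into for plain accesses: r3 reuses r1.
    // If a == b and another thread writes x between the first two
    // reads, r3 ends up older than r2.
    static int[] fAfterCse(Point a, Point b) {
        int r1 = a.x;
        int r2 = b.x;
        int r3 = r1;
        return new int[] { r1, r2, r3 };
    }

    // Opaque reads: assumed not to be merged, so r3 is a fresh load.
    static int[] fOpaque(Point a, Point b) {
        int r1 = (int) X.getOpaque(a);
        int r2 = (int) X.getOpaque(b);
        int r3 = (int) X.getOpaque(a);
        return new int[] { r1, r2, r3 };
    }
}
```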


From paul.sandoz at oracle.com  Wed Jul 20 14:18:02 2016
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Wed, 20 Jul 2016 16:18:02 +0200
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <ec8a881c-0a0d-4695-9205-78118bb6ed1c@cs.oswego.edu>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu>
	<1A64B60D-F158-4241-8A75-E1E5BD594C87@oracle.com>
	<E32B769F-273D-4586-9BC7-97B6A78ADE89@oracle.com>
	<463AF8A3-0387-4F26-872B-E291ACC1C5AE@oracle.com>
	<ec8a881c-0a0d-4695-9205-78118bb6ed1c@cs.oswego.edu>
Message-ID: <DDD34317-EE4C-4026-8551-DD0113FEC5A0@oracle.com>


> On 20 Jul 2016, at 12:37, Doug Lea <dl at cs.oswego.edu> wrote:
> 
> On 07/20/2016 04:28 AM, Paul Sandoz wrote:
> 
>> 
>> Perhaps I am asking for trouble bringing this up, but do we require acquire/release variants of getAndAdd and getAndSet? (IMHO we could drop addAndGet).
>> 
> 
> I had noticed this as well. I agree that we ought to do this for the
> sake of consistency; adding:
> 
>  getAndAddRelease, getAndAddAcquire, getAndSetRelease, getAndSetAcquire
> 
> 

Updated:

http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8161444-vhs-bitwise-atomics/webrev/

Again the same trick with Unsafe methods is applied in anticipation that they will be made intrinsic later on. On second thoughts it may be better if the currently non-intrinsic Unsafe acquire/release variants defer to the stronger volatile variant that is intrinsic. Any opinions on that?


I will defer the removal of addAndGet and the proposed renaming to a separate patch.

Paul.

From john.r.rose at oracle.com  Wed Jul 20 16:32:21 2016
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 20 Jul 2016 09:32:21 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <DDD34317-EE4C-4026-8551-DD0113FEC5A0@oracle.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu>
	<1A64B60D-F158-4241-8A75-E1E5BD594C87@oracle.com>
	<E32B769F-273D-4586-9BC7-97B6A78ADE89@oracle.com>
	<463AF8A3-0387-4F26-872B-E291ACC1C5AE@oracle.com>
	<ec8a881c-0a0d-4695-9205-78118bb6ed1c@cs.oswego.edu>
	<DDD34317-EE4C-4026-8551-DD0113FEC5A0@oracle.com>
Message-ID: <0ECAA8CD-1751-4C31-8EB5-DF7ED38A96DF@oracle.com>

On Jul 20, 2016, at 7:18 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:

> On second thoughts it may be better if the currently non-intrinsic Unsafe acquire/release variants defer to the stronger volatile variant that is intrinsic. Any opinions on that?

I would prefer that the default implementations of the various bitwise ops defer to the same-flavored CAS ops instead of to the volatile bitwise ops.

Reason:  On platforms without rich bitwise ops (x86, SPARC) you lose memory ordering information if you alias to the volatile version.

(It's not a strong reason, since those CPUs are TSO.)

Platforms with rich bitwise ops are also likely to have rich fences, so again there's no benefit to aliasing to the volatile version.

-- John
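
A sketch of the "defer to the same-flavored CAS op" default suggested
above, written at the VarHandle level rather than inside Unsafe.  The
class BitwiseFallback and its methods are hypothetical; the VarHandle
calls (get, getVolatile, weakCompareAndSetAcquire,
weakCompareAndSetRelease) are the ones that later shipped in JDK 9.  Each
bitwise variant loops on a weak CAS of its own ordering flavor, so the
acquire or release intent is preserved instead of being widened to
volatile.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class BitwiseFallback {
    private int bits;

    private static final VarHandle BITS;
    static {
        try {
            BITS = MethodHandles.lookup()
                    .findVarHandle(BitwiseFallback.class, "bits", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Acquire-flavored OR built from an acquire-flavored weak CAS.
    int getAndBitwiseOrAcquire(int mask) {
        int cur;
        do {
            cur = (int) BITS.get(this); // plain read; the CAS validates it
        } while (!BITS.weakCompareAndSetAcquire(this, cur, cur | mask));
        return cur;
    }

    // Release-flavored OR built from a release-flavored weak CAS.
    int getAndBitwiseOrRelease(int mask) {
        int cur;
        do {
            cur = (int) BITS.get(this);
        } while (!BITS.weakCompareAndSetRelease(this, cur, cur | mask));
        return cur;
    }

    int current() {
        return (int) BITS.getVolatile(this);
    }
}
```

The weak CAS may fail spuriously, which is harmless here because the
loop simply re-reads and retries.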

From martinrb at google.com  Wed Jul 20 16:59:49 2016
From: martinrb at google.com (Martin Buchholz)
Date: Wed, 20 Jul 2016 09:59:49 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com>
	<57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
Message-ID: <CA+kOe09-t1jhQs3MYKeOMov2WRdwsxAyTUxFad+Gdg-5VTkO4g@mail.gmail.com>

On Wed, Jul 20, 2016 at 1:25 AM, Paul Sandoz <Paul.Sandoz at oracle.com> wrote:

>
> > We should probably clarify whether we really mean that even
> > word-tearing on longs/doubles is allowed.
>
> Just to be clear you are referring to atomicity rather than word tearing
> as specified by JLS:
>
> https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.6
>
> ? (I have tended to use word tearing interchangeably in the past and it
> has caused confusion.)
>
>
YIKES!  I just re-read
17.6. Word Tearing
and
17.7. Non-atomic Treatment of double and long

and now realize I've been using "word tearing" to mean 17.7 instead of 17.6
for many years.  I don't have a good word for 17.6, but I want something
along the lines of "ghost writes" or "collateral damage".

Am I supposed to visualize "tearing" as (sad eye water) tears running out
of one byte across neighbor bytes?
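
To make the 17.6 meaning concrete, here is a small sketch in the spirit
of the example in that JLS section: each thread updates only its own
element of a shared byte[], and the "no word tearing" rule says the
updates to neighboring bytes must not interfere.  The class
NoWordTearing and its run method are hypothetical names.  (17.7 is the
separate question of whether a single write to a non-volatile long or
double may be seen as two 32-bit halves.)

```java
class NoWordTearing {
    // Each of `threads` threads increments its own byte `iterations` times.
    static byte[] run(int threads, int iterations) {
        byte[] counters = new byte[threads];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                for (int n = 0; n < iterations; n++) {
                    counters[id]++; // touches only this thread's element
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // join also makes the final values visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counters;
    }
}
```

On an implementation that updated a byte by rewriting the whole
containing word, one thread's increment could overwrite a neighbor's,
which is the "collateral damage" reading of the term.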

From john.r.rose at oracle.com  Wed Jul 20 17:05:48 2016
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 20 Jul 2016 10:05:48 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <CA+kOe09-t1jhQs3MYKeOMov2WRdwsxAyTUxFad+Gdg-5VTkO4g@mail.gmail.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<CA+kOe09-t1jhQs3MYKeOMov2WRdwsxAyTUxFad+Gdg-5VTkO4g@mail.gmail.com>
Message-ID: <F022B3A1-7D90-4DCF-950E-6AA1639E3433@oracle.com>

On Jul 20, 2016, at 9:59 AM, Martin Buchholz <martinrb at google.com> wrote:
> 
> On Wed, Jul 20, 2016 at 1:25 AM, Paul Sandoz <Paul.Sandoz at oracle.com> wrote:
> 
>> 
>>> We should probably clarify whether we really mean that even
>>> word-tearing on longs/doubles is allowed.
>> 
>> Just to be clear you are referring to atomicity rather than word tearing
>> as specified by JLS:
>> 
>> https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.6
>> 
>> ? (I have tended to use word tearing interchangeably in the past and it
>> has caused confusion.)
>> 
>> 
> YIKES!  I just re-read
> 17.6. Word Tearing
> and
> 17.7. Non-atomic Treatment of double and long
> 
> and now realize I've been using "word tearing" to mean 17.7 instead of 17.6
> for many years.  I don't have a good word for 17.6, but I want something
> along the lines of "ghost writes" or "collateral damage".
> 
> Am I supposed to visualize "tearing" as (sad eye water) tears running out
> of one byte across neighbor bytes?

I call the 17.7 thing "struct tearing", in the State of the Values 2014.
http://cr.openjdk.java.net/~jrose/values/values.html

-- John

From boehm at acm.org  Wed Jul 20 18:20:43 2016
From: boehm at acm.org (Hans Boehm)
Date: Wed, 20 Jul 2016 11:20:43 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <CA+kOe09-t1jhQs3MYKeOMov2WRdwsxAyTUxFad+Gdg-5VTkO4g@mail.gmail.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com>
	<57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<CA+kOe09-t1jhQs3MYKeOMov2WRdwsxAyTUxFad+Gdg-5VTkO4g@mail.gmail.com>
Message-ID: <CAPUmR1bi0aYFE8+L1vWk5rjv_0PZaq-x=3CiQN+jq=SFget4Lg@mail.gmail.com>

You are not alone.  I have the suspicion that "word tearing" used to mean
17.7 before the 2005 JLS revision.  But the JLS usage seems to have won,
for better or worse, at least in Java circles.

On Wed, Jul 20, 2016 at 9:59 AM, Martin Buchholz <martinrb at google.com>
wrote:

> On Wed, Jul 20, 2016 at 1:25 AM, Paul Sandoz <Paul.Sandoz at oracle.com>
> wrote:
>
> >
> > > We should probably clarify whether we really mean that even
> > > word-tearing on longs/doubles is allowed.
> >
> > Just to be clear you are referring to atomicity rather than word tearing
> > as specified by JLS:
> >
> > https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.6
> >
> > ? (I have tended to use word tearing interchangeably in the past and it
> > has caused confusion.)
> >
> >
> YIKES!  I just re-read
> 17.6. Word Tearing
> and
> 17.7. Non-atomic Treatment of double and long
>
> and now realize I've been using "word tearing" to mean 17.7 instead of 17.6
> for many years.  I don't have a good word for 17.6, but I want something
> along the lines of "ghost writes" or "collateral damage".
>
> Am I supposed to visualize "tearing" as (sad eye water) tears running out
> of one byte across neighbor bytes?
>

From boehm at acm.org  Wed Jul 20 18:42:30 2016
From: boehm at acm.org (Hans Boehm)
Date: Wed, 20 Jul 2016 11:42:30 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com>
	<57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
Message-ID: <CAPUmR1YVWqWeh2pwub4WBoNPmtsZ66qOFKm=jCYvZwW=Y4Q-9Q@mail.gmail.com>

On Wed, Jul 20, 2016 at 1:25 AM, Paul Sandoz <Paul.Sandoz at oracle.com> wrote:
>
>
> > On 20 Jul 2016, at 02:25, Martin Buchholz <martinrb at google.com <mailto:
martinrb at google.com>> wrote:
>
> > C++ relaxed atomics are (perhaps!) stronger than "plain" in two senses:
truly atomic (!) and single-memory-location-sequentially-consistent.
>
> Yes, it's the latter that seems harder to apply.
>
I'm not sure whether it's "harder to apply" or "less consciously assumed",
i.e. generally implicitly assumed, but without the programmer's awareness.
At least in my experience, memory_order_relaxed tends to be surprisingly
commonly used for what one might call "single word data structures": An
individual word that describes some aspect of the state independent of
other data structures.  I suspect a lot of such code is not prepared to see
such data flip-flop back and forth repeatedly as the result of a single
update.  If a counter is only ever incremented by a single thread,
programmers don't expect it to decrease.  At a minimum, it's much easier to
reason about such code if you don't have to consider this possibility.
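A rough Java analogue of such a "single word data structure" might look like the sketch below. It assumes the opaque access mode gives per-variable coherence (as discussed elsewhere in this thread); the class name MonotonicCounter and its methods are made up for illustration and are not from any library.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class MonotonicCounter {
    private long count; // incremented by exactly one thread

    private static final VarHandle COUNT;
    static {
        try {
            COUNT = MethodHandles.lookup()
                    .findVarHandle(MonotonicCounter.class, "count", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Single-writer increment: an opaque read followed by an opaque write.
    void increment() {
        long next = (long) COUNT.getOpaque(this) + 1;
        COUNT.setOpaque(this, next);
    }

    long read() {
        return (long) COUNT.getOpaque(this);
    }

    // Runs one writer thread and reads concurrently on the calling thread.
    // Returns false if the reader ever observed the counter going down.
    static boolean readerSawMonotonic(int limit) {
        MonotonicCounter c = new MonotonicCounter();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < limit; i++) c.increment();
        });
        writer.start();
        boolean monotonic = true;
        long last = 0;
        while (last < limit) {
            long now = c.read();
            if (now < last) monotonic = false;
            last = Math.max(last, now);
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return monotonic;
    }

    public static void main(String[] args) {
        System.out.println(readerSawMonotonic(100_000));
    }
}
```

With plain reads of count instead of getOpaque, a compiler would be free to reorder or reuse the loads, which is the kind of surprise the paragraph above is describing.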

All hardware vendors either provide the property by default (errata aside),
or provide a relatively cheap mechanism that adds it (only Itanium that I
know of).

I believe the property is worth its (compiler only on the most commonly
used hardware) performance cost where you have some reason to believe that
the data is concurrently accessed. It's pretty clearly undesirable for
plain non-racing accesses, since it does interfere with compiler
optimization.  I would put it in a similar category to long/double/struct
atomicity.

From dl at cs.oswego.edu  Wed Jul 20 19:16:26 2016
From: dl at cs.oswego.edu (Doug Lea)
Date: Wed, 20 Jul 2016 15:16:26 -0400
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu>
Message-ID: <007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu>


Replying to Hans by replying to myself :-)

On 07/20/2016 08:49 AM, Doug Lea wrote:
> in C++-relaxed, compilers cannot perform
> some forms of  common subexpression elimination in the presence of possible
> aliasing, but for Java-plain (and C++-plain), they can.  As in:
>
> class Point { int x, y; }
>
> void f(Point a, Point b) {
>   int r1 = a.x;
>   int r2 = b.x;
>   int r3 = a.x; // simplify to: int r3 = r1 ?
>   use (r1, r2, r3);
> }

Or, in pseudo-VarHandle style using "getM" (for varying Ms):

static VarHandle PX = MethodHandles.lookup().findVarHandle(Point.class, "x", 
int.class);

void f(Point a, Point b) {
   int r1 = PX.getM(a);
   int r2 = PX.getM(b);
   int r3 = PX.getM(a); // *
   use (r1, r2, r3);
}

Can you simplify (*) to "r3 = r1" ? It depends on M:
* Java-Plain and C++-Plain: yes.
* Java Opaque: no.
* C++-Relaxed: only if a != b.
* (And, for the record, other modes: no)

This is one reason "opaque" mode is needed. Neither Plain nor Opaque exactly
match C++ Relaxed atomics, but together you can express everything (and
probably more).
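For readers who want to try this, here is a minimal compilable sketch of the same function using the access-mode methods as they appear in the JDK 9 VarHandle API (get for Plain, getOpaque for Opaque). The class name AccessModes is made up for illustration. The comments about what a compiler may do restate the list above; nothing in the code can observe the difference in a single-threaded run.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class AccessModes {
    static class Point {
        int x, y;
    }

    // VarHandle for Point.x, looked up once.
    static final VarHandle PX;
    static {
        try {
            PX = MethodHandles.lookup().findVarHandle(Point.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Plain mode: per the list above, the third read may be replaced by r1.
    static int sumPlain(Point a, Point b) {
        int r1 = (int) PX.get(a);
        int r2 = (int) PX.get(b);
        int r3 = (int) PX.get(a);
        return r1 + r2 + r3;
    }

    // Opaque mode: per the list above, the third read is not simplified away.
    static int sumOpaque(Point a, Point b) {
        int r1 = (int) PX.getOpaque(a);
        int r2 = (int) PX.getOpaque(b);
        int r3 = (int) PX.getOpaque(a);
        return r1 + r2 + r3;
    }

    public static void main(String[] args) {
        Point p = new Point();
        p.x = 7;
        // Both calls alias a and b on purpose; results agree when nothing races.
        System.out.println(sumPlain(p, p) + " " + sumOpaque(p, p));
    }
}
```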

You can create similar but more contrived-looking examples for
read-after-write and write-after-write. And also for write-after-read,
but that one may interact with out-of-thin-air and related issues.
(Which if we had a good enough solution for, or even knew how to
encapsulate, fleshing out formal/formalizable specs on the above should
not be hard. People do continue to work on this, so there is still hope.)

-Doug


From martinrb at google.com  Wed Jul 20 21:11:37 2016
From: martinrb at google.com (Martin Buchholz)
Date: Wed, 20 Jul 2016 14:11:37 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <CAPUmR1bi0aYFE8+L1vWk5rjv_0PZaq-x=3CiQN+jq=SFget4Lg@mail.gmail.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com>
	<57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<CA+kOe09-t1jhQs3MYKeOMov2WRdwsxAyTUxFad+Gdg-5VTkO4g@mail.gmail.com>
	<CAPUmR1bi0aYFE8+L1vWk5rjv_0PZaq-x=3CiQN+jq=SFget4Lg@mail.gmail.com>
Message-ID: <CA+kOe09A+OpNKV=UDEwCFRJU=-qOtw4=6R1zemmFd4F43TSMtw@mail.gmail.com>

17.6 should be called "word bleeding"
17.7 should be called "long fission"

(fission breaks up your atoms!)

On Wed, Jul 20, 2016 at 11:20 AM, Hans Boehm <boehm at acm.org> wrote:

> You are not alone.  I have the suspicion that "word tearing" used to mean
> 17.7 before the 2005 JLS revision.  But the JLS usage seems to have won,
> for better or worse, at least in Java circles.
>
> On Wed, Jul 20, 2016 at 9:59 AM, Martin Buchholz <martinrb at google.com>
> wrote:
>
>> On Wed, Jul 20, 2016 at 1:25 AM, Paul Sandoz <Paul.Sandoz at oracle.com>
>> wrote:
>>
>> >
>> > > We should probably clarify whether we really mean that even
>> word-tearing
>> > on longs/doubles is allowed.
>> >
>> > Just to be clear you are referring to atomicity rather than word tearing
>> > as specified by JLS:
>> >
>> > https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.6
>> <
>> > https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.6>
>> >
>> > ? (I have tended to use word tearing interchangeably in the past and it
>> > has caused confusion.)
>> >
>> >
>> YIKES!  I just re-read
>> 17.6. Word Tearing
>> and
>> 17.7. Non-atomic Treatment of double and long
>>
>> and now realize I've been using "word tearing" to mean 17.7 instead of
>> 17.6
>> for many years.  I don't have a good word for 17.6, but I want something
>> along the lines of "ghost writes" or "collateral damage".
>>
>> Am I supposed to visualize "tearing" as (sad eye water) tears running out
>> of one byte across neighbor bytes?
>>
>
>

From boehm at acm.org  Wed Jul 20 23:17:47 2016
From: boehm at acm.org (Hans Boehm)
Date: Wed, 20 Jul 2016 16:17:47 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com>
	<57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu>
	<007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu>
Message-ID: <CAPUmR1aH3Jw_q=1R_h9V54BfuwznaneX1jcdWb41xB6aR2T45g@mail.gmail.com>

On Wed, Jul 20, 2016 at 12:16 PM, Doug Lea <dl at cs.oswego.edu> wrote:

>
> Replying to Hans by replying to myself :-)
>
> On 07/20/2016 08:49 AM, Doug Lea wrote:
>
>> in C++-relaxed, compilers cannot perform
>> some forms of  common subexpression elimination in the presence of
>> possible
>> aliasing, but for Java-plain (and C++-plain), they can.  As in:
>>
>> class Point { int x, y; }
>>
>> void f(Point a, Point b) {
>>   int r1 = a.x;
>>   int r2 = b.x;
>>   int r3 = a.x; // simplify to: int r3 = r1 ?
>>   use (r1, r2, r3);
>> }
>>
>
> Or, in pseudo-VarHandle style using "getM" (for varying Ms):
>
> static VarHandle PX = MethodHandles.lookup().findVarHandle(Point.class,
> "x", int.class);
>
> void f(Point a, Point b) {
>   int r1 = PX.getM(a);
>   int r2 = PX.getM(b);
>   int r3 = PX.getM(a); // *
>   use (r1, r2, r3);
> }
>
> Can you simplify (*) to "r3 = r1" ? It depends on M:
> * Java-Plain and C++-Plain: yes.
> * Java Opaque: no.
>
Does Opaque imply cache coherence?  In the Opaque case, is r3 guaranteed to
see a store that is no earlier than the one seen by r2? Or are we still
only talking compiler optimizations?


> * C++-Relaxed: only if a != b.

* (And, for the record, other modes: no)
>
What if the compiler knows that a==b? Can all the get()s be merged, even
for Opaque?


> This is one reason "opaque" mode is needed. Neither Plain nor Opaque
> exactly
> match C++ Relaxed atomics, but together you can express everything (and
> probably more).
>
> You can create similar but more contrived-looking examples for
> read-after-write and write-after-write. And also for write-after-read,
> but that one may interact with out-of-thin-air and related issues.
> (Which if we had a good enough solution for, or even knew how to
> encapsulate, fleshing out formal/formalizable specs on the above should
> not be hard. People do continue to work on this, so there is still hope.)
>
> -Doug
>
>

From david.holmes at oracle.com  Thu Jul 21 05:16:52 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 21 Jul 2016 15:16:52 +1000
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <CAPUmR1bi0aYFE8+L1vWk5rjv_0PZaq-x=3CiQN+jq=SFget4Lg@mail.gmail.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<CA+kOe09-t1jhQs3MYKeOMov2WRdwsxAyTUxFad+Gdg-5VTkO4g@mail.gmail.com>
	<CAPUmR1bi0aYFE8+L1vWk5rjv_0PZaq-x=3CiQN+jq=SFget4Lg@mail.gmail.com>
Message-ID: <c706025f-2700-103a-fc90-fea2d5c23e05@oracle.com>

On 21/07/2016 4:20 AM, Hans Boehm wrote:
> You are not alone.  I have the suspicion that "word tearing" used to mean
> 17.7 before the 2005 JLS revision.  But the JLS usage seems to have won,
> for better or worse, at least in Java circles.

No not at all. word-tearing has "always" concerned the inability to 
perform sub-word atomic accesses - ie the subword has to be torn out of 
the word.

Here's a 2001 reference which was part of the discussion that led to the 
JLS update :)

http://www.cs.umd.edu/~pugh/java/memoryModel/archive/0967.html

Cheers,
David

>
> On Wed, Jul 20, 2016 at 9:59 AM, Martin Buchholz <martinrb at google.com>
> wrote:
>
>> On Wed, Jul 20, 2016 at 1:25 AM, Paul Sandoz <Paul.Sandoz at oracle.com>
>> wrote:
>>
>>>
>>>> We should probably clarify whether we really mean that even
>> word-tearing
>>> on longs/doubles is allowed.
>>>
>>> Just to be clear you are referring to atomicity rather than word tearing
>>> as specified by JLS:
>>>
>>> https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.6 <
>>> https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.6>
>>>
>>> ? (I have tended to use word tearing interchangeably in the past and it
>>> has caused confusion.)
>>>
>>>
>> YIKES!  I just re-read
>> 17.6. Word Tearing
>> and
>> 17.7. Non-atomic Treatment of double and long
>>
>> and now realize I've been using "word tearing" to mean 17.7 instead of 17.6
>> for many years.  I don't have a good word for 17.6, but I want something
>> along the lines of "ghost writes" or "collateral damage".
>>
>> Am I supposed to visualize "tearing" as (sad eye water) tears running out
>> of one byte across neighbor bytes?
>>

From aph at redhat.com  Thu Jul 21 05:45:40 2016
From: aph at redhat.com (Andrew Haley)
Date: Thu, 21 Jul 2016 06:45:40 +0100
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
Message-ID: <57906184.3070008@redhat.com>

On 20/07/16 01:25, Martin Buchholz wrote:
> We should probably clarify whether we really mean that even word-tearing on
> longs/doubles is allowed.

I surely hope that the answer to that is "no"!

> C++ relaxed atomics are (perhaps!) stronger than "plain" in two senses:
> truly atomic (!) and single-memory-location-sequentially-consistent.

Earlier in the development of this respin of the JMM, I remember
someone (Doug?) saying that compatibility with C++ was an important
consideration. We seem to be drifting away from that, for no good
reason that I understand.

Andrew.


From aph at redhat.com  Thu Jul 21 05:53:48 2016
From: aph at redhat.com (Andrew Haley)
Date: Thu, 21 Jul 2016 06:53:48 +0100
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <57906184.3070008@redhat.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<57906184.3070008@redhat.com>
Message-ID: <5790636C.1050804@redhat.com>

On 21/07/16 06:45, Andrew Haley wrote:
> Earlier in the development of this respin of the JMM, I remember
> someone (Doug?) saying that compatibility with C++ was an important
> consideration. We seem to be drifting away from that, for no good
> reason that I understand.

I withdraw this comment in the light of later replies to this thread.

Andrew.


From dl at cs.oswego.edu  Thu Jul 21 12:53:07 2016
From: dl at cs.oswego.edu (Doug Lea)
Date: Thu, 21 Jul 2016 08:53:07 -0400
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <CAPUmR1aH3Jw_q=1R_h9V54BfuwznaneX1jcdWb41xB6aR2T45g@mail.gmail.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu>
	<007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu>
	<CAPUmR1aH3Jw_q=1R_h9V54BfuwznaneX1jcdWb41xB6aR2T45g@mail.gmail.com>
Message-ID: <2bdaf6ba-7b14-3729-86ea-8a76d6aadf4f@cs.oswego.edu>

On 07/20/2016 07:17 PM, Hans Boehm wrote:
>     Can you simplify (*) to "r3 = r1" ? It depends on M:
>     * Java-Plain and C++-Plain: yes.
>     * Java Opaque: no.
>
> Does Opaque imply cache coherence?

Here, yes.

>     * C++-Relaxed: only if a != b.
>
>     * (And, for the record, other modes: no)
>
> What if the compiler knows that a==b? Can all the get()s be merged, even for Opaque?

Not in general unless thread-private (unescaped).

I hope to write up a summary of progress and open issues soon,
but in the mean time, here is the extremely telegraphic version
of my current thoughts:

Start with coherence, characterized in same way as current C++17 draft
(http://www.open-std.org/jtc1/sc22/wg21/). Like C++17, use sc-per-loc
as basis, here for opaque, but (unlike C++ relaxed) constraining
read->write reorderings ("prescient" or "promised" writes), probably
based on in-progress work by mpi-sws group.  Distinguish Plain from
Opaque by weakening to sc-per-view (i.e., possibly-aliased access
paths). Disable merges for all modes except plain if there exists any
possible execution that could detect doing so (for example
disabling transforming a spin-loop into an "if"). Account for possible
mode weakenings for (unescaped) thread-private variables among other
cases. Add RA (release/acquire) and SC ("volatile") rules based on
work over the past year or so by Mark Batty, Viktor Vafeiadis, and
others (which also seem mostly present in C++17 draft). Also add other
fence and final field rules.

-Doug


From john.r.rose at oracle.com  Fri Jul 22 02:57:18 2016
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 21 Jul 2016 19:57:18 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
Message-ID: <C6CAEB69-43C2-417F-94CF-E608935D21E9@oracle.com>

On Jul 20, 2016, at 1:25 AM, Paul Sandoz <Paul.Sandoz at oracle.com> wrote:

>> 
>> On Tue, Jul 19, 2016 at 1:51 PM, Andrew Haley <aph at redhat.com <mailto:aph at redhat.com>> wrote:
>>> On 19/07/16 09:39, Paul Sandoz wrote:
>>> Plain behaves like non-volatile/non-final field access e.g. like
>>> get/putfield byte codes.
>> 
>> We should probably clarify whether we really mean that even word-tearing on longs/doubles is allowed.

Putting aside the history and esthetics of terms, the big question
here is whether to remove the exception for 64 bit types in
17.7 (Non-atomic Treatment of double and long), and mandate
that all primitive types are atomic, including non-volatile longs
and doubles.
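For concreteness, the 17.7 exception allows a 64-bit write to be treated as two separate 32-bit writes. The sketch below is a deterministic, single-threaded simulation of what a reader could then observe between the two halves. The class name, the method name and the "low half first" order are assumptions chosen for illustration; a real JVM makes no promise about the order.

```java
class TearingSketch {
    // Value visible after only the low 32 bits of newValue have been stored
    // over oldValue (hypothetical ordering: low half first).
    static long afterLowHalfOnly(long oldValue, long newValue) {
        long high = oldValue & 0xFFFFFFFF00000000L; // stale upper half
        long low = newValue & 0x00000000FFFFFFFFL;  // fresh lower half
        return high | low;
    }

    public static void main(String[] args) {
        long torn = afterLowHalfOnly(0L, -1L);
        // The torn value is neither the old value (0) nor the new value (-1).
        System.out.println(Long.toHexString(torn));
    }
}
```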

Is it time to do that yet, or is there some 32-bit JVM out there
that will fall over if it has to do the volatile dance even on
non-volatile types?

I'm going to guess that there still *ARE* such JVMs out there,
but their number is decreasing exponentially over time.
Eventually we can do this.

Argument to keep things as they are:  64-bit non-atomicity
(aka struct tearing) is just a precursor to non-atomicity of 128-bit
and larger value types.  It's a permanent feature on our landscape,
so don't fight it.

VarHandles provide an easy-enough way to select either the
atomic or the non-atomic accesses for 64-bit things (right?)
and presumably they will do the same for value types.

I see one possibly urgent argument to move away from
non-atomicity of longs in the VH API:  VH's support atomic
operations on arbitrary, unprepared longs.  (Before, the
JVM got fair warning of atomicity, because the long
was tagged as "volatile".  Now any long is fair game.)
Doesn't that require even plain references to at least
look around carefully for a VH, before they just move
the two halves non-atomically?  After all, for large
structs, the STM required for atomics is *incompatible*
with the naive component-wise loads and stores.

Or, do VH's just refuse to perform atomic operations on
non-volatile longs, on 32-bit machines?

One way to avoid these corner cases is just chop off
the corner, and require all JVMs to treat 64-bit primitives
as atomic, always.

-- John

From dl at cs.oswego.edu  Fri Jul 22 11:11:00 2016
From: dl at cs.oswego.edu (Doug Lea)
Date: Fri, 22 Jul 2016 07:11:00 -0400
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <C6CAEB69-43C2-417F-94CF-E608935D21E9@oracle.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<C6CAEB69-43C2-417F-94CF-E608935D21E9@oracle.com>
Message-ID: <71516e79-e0d5-af0f-b6ee-fb1c3a877605@cs.oswego.edu>

On 07/21/2016 10:57 PM, John Rose wrote:

> Putting aside the history and esthetics of terms, the big question
> here is whether to remove the exception for 64 bit types in
> 17.7 (Non-atomic Treatment of double and long), and mandate
> that all primitive types are atomic, including non-volatile longs
> and doubles.

(This was in the initial issues list for JMM revision, and like
every other issue, refuses to go away all by itself :-)

>
> Is it time to do that yet, or is there some 32-bit JVM out there
> that will fall over if it has to do the volatile dance even on
> non-volatile types?
>
> I'm going to guess that there still *ARE* such JVMs out there,
> but their number is decreasing exponentially over time.
> Eventually we can do this.
>
> Argument to keep things as they are:  64-bit non-atomicity
> (aka struct tearing) is just a precursor to non-atomicity of 128-bit
> and larger value types.  It's a permanent feature on our landscape,
> so don't fight it.

Right. I think that this is where we last left this.

>
> VarHandles provide an easy-enough way to select either the
> atomic or the non-atomic accesses for 64-bit things (right?)
> and presumably they will do the same for value types.
>
> I see one possibly urgent argument to move away from
> non-atomicity of longs in the VH API:  VH's support atomic
> operations on arbitrary, unprepared longs.

The recommended usage is that (as has always been the case), concurrently
accessible fields should be declared as volatile. This provides safe defaults.
People can then use VH for other (non-Plain) access methods.
If people follow this usage guidance, all is well. Except that this doesn't
hold for array elements, which cannot be declared as volatile.
Here, usages relying on access atomicity must use only non-Plain access
methods.  This is not always easy to ensure -- people need to avoid
calling other methods that might access elements without VarHandles
unless there is no possibility of concurrent access during call.
But people writing intentionally racy code using arrays need to be careful
about things like this anyway.
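As a small sketch of the array case, assuming the JDK 9 API: MethodHandles.arrayElementVarHandle gives a VarHandle over long[] elements, and every access below goes through a non-Plain method. The class and method names (LongArrayCells, publish, read, increment) are made up for illustration.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class LongArrayCells {
    // VarHandle over the elements of a long[]; coordinates are (array, index).
    static final VarHandle CELLS = MethodHandles.arrayElementVarHandle(long[].class);

    // Opaque write: never a plain a[i] = v for shared elements.
    static void publish(long[] a, int i, long v) {
        CELLS.setOpaque(a, i, v);
    }

    // Opaque read: never a plain a[i] for shared elements.
    static long read(long[] a, int i) {
        return (long) CELLS.getOpaque(a, i);
    }

    // Atomic read-modify-write; returns the updated value.
    static long increment(long[] a, int i) {
        long previous = (long) CELLS.getAndAdd(a, i, 1L);
        return previous + 1;
    }

    public static void main(String[] args) {
        long[] a = new long[4];
        publish(a, 2, 0x123456789ABCDEF0L);
        System.out.println(Long.toHexString(read(a, 2)));
        System.out.println(increment(a, 0));
    }
}
```

The hazard described above is any other method that touches the same elements with ordinary a[i] reads or writes while they may be accessed concurrently; keeping the array private to a wrapper like this is one way to rule that out.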

-Doug


From paul.sandoz at oracle.com  Fri Jul 22 11:38:01 2016
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Fri, 22 Jul 2016 13:38:01 +0200
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <C6CAEB69-43C2-417F-94CF-E608935D21E9@oracle.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<C6CAEB69-43C2-417F-94CF-E608935D21E9@oracle.com>
Message-ID: <42711493-491F-4326-88DF-8BB6CBBE3C39@oracle.com>


> On 22 Jul 2016, at 04:57, John Rose <john.r.rose at oracle.com> wrote:
> 
> On Jul 20, 2016, at 1:25 AM, Paul Sandoz <Paul.Sandoz at oracle.com> wrote:
> 
>>> 
>>> On Tue, Jul 19, 2016 at 1:51 PM, Andrew Haley <aph at redhat.com <mailto:aph at redhat.com>> wrote:
>>>> On 19/07/16 09:39, Paul Sandoz wrote:
>>>> Plain behaves like non-volatile/non-final field access e.g. like
>>>> get/putfield byte codes.
>>> 
>>> We should probably clarify whether we really mean that even word-tearing on longs/doubles is allowed.
> 
> Putting aside the history and esthetics of terms, the big question
> here is whether to remove the exception for 64 bit types in
> 17.7 (Non-atomic Treatment of double and long), and mandate
> that all primitive types are atomic, including non-volatile longs
> and doubles.
> 
> Is it time to do that yet, or is there some 32-bit JVM out there
> that will fall over if it has to do the volatile dance even on
> non-volatile types?
> 
> I'm going to guess that there still *ARE* such JVMs out there,
> but their number is decreasing exponentially over time.
> Eventually we can do this.
> 
> Argument to keep things as they are:  64-bit non-atomicity
> (aka struct tearing) is just a precursor to non-atomicity of 128-bit
> and larger value types.  It's a permanent feature on our landscape,
> so don't fight it.
> 
> VarHandles provide an easy-enough way to select either the
> atomic or the non-atomic accesses for 64-bit things (right?)

Yes, set/get is not guaranteed to be atomic; all other accesses are, under the guidelines Doug mentions in his last email.


> and presumably they will do the same for value types.
> 
> I see one possibly urgent argument to move away from
> non-atomicity of longs in the VH API:  VH's support atomic
> operations on arbitrary, unprepared longs.  (Before, the
> JVM got fair warning of atomicity, because the long
> was tagged as "volatile".  Now any long is fair game.)
> Doesn't that require even plain references to at least
> look around carefully for a VH, before they just move
> the two halves non-atomically?  After all, for large
> structs, the STM required for atomics is *incompatible*
> with the naive component-wise loads and stores.
> 
> Or, do VH's just refuse to perform atomic operations on
> non-volatile longs, on 32-bit machines?
> 

We have some jcstress tests checking atomicity.

The concern i have implementation-wise is AtomicLong has this:

  /**
   * Records whether the underlying JVM supports lockless
   * compareAndSwap for longs. While the intrinsic compareAndSwapLong
   * method works in either case, some constructions should be
   * handled at Java level to avoid locking user-visible locks.
   */
  static final boolean VM_SUPPORTS_LONG_CAS = VMSupportsCS8();

  /**
   * Returns whether underlying JVM supports lockless CompareAndSet
   * for longs. Called only once and cached in VM_SUPPORTS_LONG_CAS.
   */
  private static native boolean VMSupportsCS8();

And that field VM_SUPPORTS_LONG_CAS is used only in AtomicLongFieldUpdater. Relevant VarHandle implementations don't currently make this distinction. At the moment i don't fully understand the bit about "locking user-visible locks". However, this seems separate from atomicity.

Paul.


> One way to avoid these corner cases is just chop off
> the corner, and require all JVMs to treat 64-bit primitives
> as atomic, always.
> 
> -- John


From dl at cs.oswego.edu  Fri Jul 22 12:27:30 2016
From: dl at cs.oswego.edu (Doug Lea)
Date: Fri, 22 Jul 2016 08:27:30 -0400
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <42711493-491F-4326-88DF-8BB6CBBE3C39@oracle.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<C6CAEB69-43C2-417F-94CF-E608935D21E9@oracle.com>
	<42711493-491F-4326-88DF-8BB6CBBE3C39@oracle.com>
Message-ID: <c4d5a896-1f45-803d-c7d7-f6b7bc1def64@cs.oswego.edu>

On 07/22/2016 07:38 AM, Paul Sandoz wrote:
> The concern i have implementation-wise is AtomicLong has this:
>
>   /**
>    * Records whether the underlying JVM supports lockless
>    * compareAndSwap for longs. While the intrinsic compareAndSwapLong
>    * method works in either case, some constructions should be
>    * handled at Java level to avoid locking user-visible locks.
>    */
>   static final boolean VM_SUPPORTS_LONG_CAS = VMSupportsCS8();
>

This was initially needed to support Power5. I am not sure if
it returns false on any jdk9-supported platforms -- if so,
probably only non-OpenJDK "closed" ones. The problem is/was
that volatile long reads are implemented differently than
AtomicLong.get (i.e., VH getVolatile) and that the lock-based
CAS implementation relied on the latter. The internal
AtomicLongFieldUpdater.LockedUpdater was used to make sure that
the locked versions were always used so long as all accesses
used the Updater. People working on closed hotspot ports are
invited to help figure out whether this is still necessary.

-Doug


From martinrb at google.com  Fri Jul 22 20:29:58 2016
From: martinrb at google.com (Martin Buchholz)
Date: Fri, 22 Jul 2016 13:29:58 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <c706025f-2700-103a-fc90-fea2d5c23e05@oracle.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com>
	<57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<CA+kOe09-t1jhQs3MYKeOMov2WRdwsxAyTUxFad+Gdg-5VTkO4g@mail.gmail.com>
	<CAPUmR1bi0aYFE8+L1vWk5rjv_0PZaq-x=3CiQN+jq=SFget4Lg@mail.gmail.com>
	<c706025f-2700-103a-fc90-fea2d5c23e05@oracle.com>
Message-ID: <CA+kOe0_HK8sTecHiTTz+uOAnb+FwZCD7HHZ2uZKEhjsheVZjyg@mail.gmail.com>

On Wed, Jul 20, 2016 at 10:16 PM, David Holmes <david.holmes at oracle.com>
wrote:

> On 21/07/2016 4:20 AM, Hans Boehm wrote:
>
>> You are not alone.  I have the suspicion that "word tearing" used to mean
>> 17.7 before the 2005 JLS revision.  But the JLS usage seems to have won,
>> for better or worse, at least in Java circles.
>>
>
> No not at all. word-tearing has "always" concerned the inability to
> perform sub-word atomic accesses - ie the subword has to be torn out of the
> word.
>
> Here's a 2001 reference which was part of the discussion that led to the
> JLS update :)
>
> http://www.cs.umd.edu/~pugh/java/memoryModel/archive/0967.html


Thanks for the history lesson.  "word tearing" still seems unintuitive to
me - it's the INability to tear up a word into sub-words that's the
problem.  That is, "word tearing" is not the problem, it's the solution we
can't use!  But my own "word bleeding" is also not that great, and unlikely
to catch on.
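
For readers unfamiliar with the JLS sense of the term, here is a small
sketch of the situation JLS 17.6 addresses: two threads updating adjacent
elements of a byte array. The class AdjacentBytes and its run method are
hypothetical names used only for this illustration.

```java
// Hypothetical demo class; the names are illustrative only.
class AdjacentBytes {
    // Each thread increments only its own element of a shared byte array.
    static byte[] run(int iterations) {
        byte[] cells = new byte[2];
        Thread[] workers = new Thread[2];
        for (int t = 0; t < 2; t++) {
            final int index = t;
            workers[t] = new Thread(() -> {
                for (int n = 0; n < iterations; n++) {
                    cells[index]++;   // read-modify-write of a single byte
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();             // join makes the worker's writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return cells;
    }
}
```

If an implementation updated a byte by rewriting the whole containing
word, one thread's store could overwrite its neighbour's element. JLS 17.6
rules that out, so both elements should end up equal to the iteration
count (modulo byte overflow).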

From martinrb at google.com  Fri Jul 22 20:50:57 2016
From: martinrb at google.com (Martin Buchholz)
Date: Fri, 22 Jul 2016 13:50:57 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <71516e79-e0d5-af0f-b6ee-fb1c3a877605@cs.oswego.edu>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com>
	<57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<C6CAEB69-43C2-417F-94CF-E608935D21E9@oracle.com>
	<71516e79-e0d5-af0f-b6ee-fb1c3a877605@cs.oswego.edu>
Message-ID: <CA+kOe09oB-pwA=7RTKxEjjFYeSkojToi2Uk5HE05iQ-n9LCt1w@mail.gmail.com>

On Fri, Jul 22, 2016 at 4:11 AM, Doug Lea <dl at cs.oswego.edu> wrote:

>
> The recommended usage is that (as has always been the case), concurrently
> accessible fields should be declared as volatile. This provides safe
> defaults.
> People can then use VH for other (non-Plain) access methods.
> If people follow this usage guidance, all is well. Except that this doesn't
> hold for array elements, that cannot be declared as volatile.
> Here, usages relying on access atomicity must use only non-Plain access
> methods.  This is not always easy to ensure -- people need to avoid
> calling other methods that might access elements without VarHandles
> unless there is no possibility of concurrent access during call.
> But people writing intentionally racy code using arrays need to be careful
> about things like this anyway.


Is it reasonable to add syntax for volatile array elements?
There's obvious confusion between the reference and the elements, e.g. we
already have
volatile int[] volatile_array_reference;
It could go like C declarations.  Then we would get
(volatile int)[] array_with_volatile_elements;
volatile (volatile int)[] volatile_array_reference_with_volatile_elements;

Yeah, they'll hate us forever for adding that!

From boehm at acm.org  Fri Jul 22 21:51:03 2016
From: boehm at acm.org (Hans Boehm)
Date: Fri, 22 Jul 2016 14:51:03 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <CA+kOe0_HK8sTecHiTTz+uOAnb+FwZCD7HHZ2uZKEhjsheVZjyg@mail.gmail.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com>
	<57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<CA+kOe09-t1jhQs3MYKeOMov2WRdwsxAyTUxFad+Gdg-5VTkO4g@mail.gmail.com>
	<CAPUmR1bi0aYFE8+L1vWk5rjv_0PZaq-x=3CiQN+jq=SFget4Lg@mail.gmail.com>
	<c706025f-2700-103a-fc90-fea2d5c23e05@oracle.com>
	<CA+kOe0_HK8sTecHiTTz+uOAnb+FwZCD7HHZ2uZKEhjsheVZjyg@mail.gmail.com>
Message-ID: <CAPUmR1YhnqNyYaSmSCMYHJQrFge_Nc71wX6FTZWnFBuKrOK_aw@mail.gmail.com>

"Word-tearing" was definitely and consistently used that way in the JSR 133
discussions.  That's where the JLS terminology came from.  But people I've
talked to who didn't participate in that effort generally seemed to share
Martin's opinion.

On Fri, Jul 22, 2016 at 1:29 PM, Martin Buchholz <martinrb at google.com>
wrote:

>
>
> On Wed, Jul 20, 2016 at 10:16 PM, David Holmes <david.holmes at oracle.com>
> wrote:
>
>> On 21/07/2016 4:20 AM, Hans Boehm wrote:
>>
>>> You are not alone.  I have the suspicion that "word tearing" used to mean
>>> 17.7 before the 2005 JLS revision.  But the JLS usage seems to have won,
>>> for better or worse, at least in Java circles.
>>>
>>
>> No not at all. word-tearing has "always" concerned the inability to
>> perform sub-word atomic accesses - ie the subword has to be torn out of the
>> word.
>>
>> Here's a 2001 reference which was part of the discussion that led to the
>> JLS update :)
>>
>> http://www.cs.umd.edu/~pugh/java/memoryModel/archive/0967.html
>
>
> Thanks for the history lesson.  "word tearing" still seems unintuitive to
> me - it's the INability to tear up a word into sub-words that's the
> problem.  That is, "word tearing" is not the problem, it's the solution we
> can't use!  But my own "word bleeding" is also not that great, and unlikely
> to catch on.
>

From boehm at acm.org  Fri Jul 22 22:07:03 2016
From: boehm at acm.org (Hans Boehm)
Date: Fri, 22 Jul 2016 15:07:03 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <71516e79-e0d5-af0f-b6ee-fb1c3a877605@cs.oswego.edu>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com>
	<57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<C6CAEB69-43C2-417F-94CF-E608935D21E9@oracle.com>
	<71516e79-e0d5-af0f-b6ee-fb1c3a877605@cs.oswego.edu>
Message-ID: <CAPUmR1bfaMx_7fBpA1298G=DvY0_9hA4j5DyOF=0vfWgcmZZRQ@mail.gmail.com>

On Fri, Jul 22, 2016 at 4:11 AM, Doug Lea <dl at cs.oswego.edu> wrote:
>
> On 07/21/2016 10:57 PM, John Rose wrote:
>
>> Putting aside the history and esthetics of terms, the big question
>> here is whether to remove the exception for 64 bit types in
>> 17.7 (Non-atomic Treatment of double and long), and mandate
>> that all primitive types are atomic, including non-volatile longs
>> and doubles.
>
>
> (This was in the initial issues list for JMM revision, and like
> every other issue, refuses to go away all by itself :-)
>
>>
>> Is it time to do that yet, or is there some 32-bit JVM out there
>> that will fall over if it has to do the volatile dance even on
>> non-volatile types?
>>
>> I'm going to guess that there still *ARE* such JVMs out there,
>> but their number is decreasing exponentially over time.
>> Eventually we can do this.
>>
>> Argument to keep things as they are:  64-bit non-atomicity
>> (aka struct tearing) is just a precursor to non-atomicity of 128-bit
>> and larger value types.  It's a permanent feature on our landscape,
>> so don't fight it.
>
>
> Right. I think that this is where we last left this.

The other argument that we missed last time is there are MIPS variants for
which atomicity of 64-bit types is expensive.  The same applies to 32 bit
ARM with the "large physical address extension".  I don't think these
constitute a large fraction of the interesting devices anymore, but I also
suspect there are still way too many of them to be ignored.
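
As long as 17.7 keeps its exception, code that needs an untorn 64-bit
value without declaring the field volatile can follow the guidance quoted
earlier in this thread and use only non-Plain access modes. A minimal
sketch, assuming a hypothetical holder class Stamp (the class and method
names are not from any existing API):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical holder class used only for illustration.
class Stamp {
    // Deliberately not volatile: plain accesses fall under JLS 17.7.
    private long value;

    private static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                .findVarHandle(Stamp.class, "value", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Opaque write: no ordering with respect to other variables is requested.
    void store(long v) {
        VALUE.setOpaque(this, v);
    }

    // Opaque read of the same field.
    long load() {
        return (long) VALUE.getOpaque(this);
    }
}
```

On the platforms mentioned above this is where the cost would show up:
the implementation has to make each such access indivisible even though
the field itself is not volatile.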

From john.r.rose at oracle.com  Fri Jul 22 22:15:34 2016
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 22 Jul 2016 15:15:34 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <CAPUmR1YhnqNyYaSmSCMYHJQrFge_Nc71wX6FTZWnFBuKrOK_aw@mail.gmail.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<CA+kOe09-t1jhQs3MYKeOMov2WRdwsxAyTUxFad+Gdg-5VTkO4g@mail.gmail.com>
	<CAPUmR1bi0aYFE8+L1vWk5rjv_0PZaq-x=3CiQN+jq=SFget4Lg@mail.gmail.com>
	<c706025f-2700-103a-fc90-fea2d5c23e05@oracle.com>
	<CA+kOe0_HK8sTecHiTTz+uOAnb+FwZCD7HHZ2uZKEhjsheVZjyg@mail.gmail.com>
	<CAPUmR1YhnqNyYaSmSCMYHJQrFge_Nc71wX6FTZWnFBuKrOK_aw@mail.gmail.com>
Message-ID: <B8F10805-01B5-4C24-9CD9-768C8C629AE9@oracle.com>

On Jul 22, 2016, at 2:51 PM, Hans Boehm <boehm at acm.org> wrote:
> 
> "Word-tearing" was definitely and consistently used that way in the JSR 133
> discussions.  That's where the JLS terminology came from.

The word that is torn is the word *containing* the datum of interest.
Which is why when people quite naturally assume that "the word"
is the datum itself, the term is unintelligible.

With "struct tearing" it is more clear that the thing being torn is
in fact the datum of interest.  Where "torn" covers all cases of
"exposes non-semantic memory states in variables".

On Jul 22, 2016, at 1:50 PM, Martin Buchholz <martinrb at google.com> wrote:

> Is it reasonable to add syntax for volatile array elements?


It's difficult.  I've been thinking about this for a while for the related
use case of frozen arrays (array-of-final-T).  With frozen arrays you
need a store check even for primitive arrays.  For volatile arrays
you'd need a load check and a store check for all arrays.

Since all array references include a range check, a JVM implementor
would want to find a clever way of piggy-backing the load and
store checks onto the range check, and then having all the checks
float together out of loops.

Really, an array of volatiles is an oxymoron like a herd of cats
or team of individuals.  In normal arrays there is affinity between
neighboring values; with volatiles there is a basic decoupling
between neighbors, since each has its own distinct sequence
of effects.  (I'm speaking of typical uses of arrays; you can use
arrays as low-level storage where neighbors have no logical
connection with each other.  We optimize for the typical case.)

All this assumes we make array-of-volatile-ints be a subtype
of array-of-int-type.  An alternative today is to have array-like
container types which are *not* related to the legacy array type.
Like a private normal array, accessed only by VH-based atomics.
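
A minimal sketch of such a container, assuming a hypothetical class name
VolatileIntCells. It is unrelated to int[] in the type system and reaches
its private backing array only through an array-element VarHandle:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical container; the name and methods are illustrative, not a JDK API.
final class VolatileIntCells {
    private static final VarHandle CELL =
        MethodHandles.arrayElementVarHandle(int[].class);

    // The backing array never escapes, so no plain access can reach its elements.
    private final int[] cells;

    VolatileIntCells(int length) {
        cells = new int[length];
    }

    int length() {
        return cells.length;
    }

    int get(int i) {
        return (int) CELL.getVolatile(cells, i);
    }

    void set(int i, int v) {
        CELL.setVolatile(cells, i, v);
    }

    boolean compareAndSet(int i, int expected, int next) {
        return CELL.compareAndSet(cells, i, expected, next);
    }
}
```

java.util.concurrent.atomic.AtomicIntegerArray already has much the same
shape; the point of the sketch is only that the access mode is fixed by
the container rather than by the array type.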

- John

From aph at redhat.com  Mon Jul 25 08:54:14 2016
From: aph at redhat.com (Andrew Haley)
Date: Mon, 25 Jul 2016 09:54:14 +0100
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu>
	<007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu>
Message-ID: <5795D3B6.9050407@redhat.com>

On 20/07/16 20:16, Doug Lea wrote:
> Or, in pseudo-VarHandle style using "getM" (for varying Ms):
> 
> static VarHandle PX = MethodHandles.lookup().findVarHandle(Point.class, "x", 
> int.class);
> 
> void f(Point a, Point b) {
>    int r1 = PX.getM(a);
>    int r2 = PX.getM(b);
>    int r3 = PX.getM(a); // *
>    use (r1, r2, r3);
> }
> 
> Can you simplify (*) to "r3 = r1" ? It depends on M:
> * Java-Plain and C++-Plain: yes.
> * Java Opaque: no.
> * C++-Relaxed: only if a != b.
> * (And, for the record, other modes: no)
> 
> This is one reason "opaque" mode is needed.

But the processor hardware is allowed to simplify (*) to "r3 = r1" even
if Opaque is used.  So, again, why does it matter?

Andrew.


From aph at redhat.com  Mon Jul 25 08:54:20 2016
From: aph at redhat.com (Andrew Haley)
Date: Mon, 25 Jul 2016 09:54:20 +0100
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <578E92D9.80201@redhat.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
Message-ID: <5795D3BC.4050308@redhat.com>

On 19/07/16 21:51, Andrew Haley wrote:
> On 19/07/16 09:39, Paul Sandoz wrote:
>>
>> Both plain and opaque have "no assurance of memory ordering effects
>> with respect to other threads" but opaque is stronger in the sense
>> that the compiler is restricted in what optimisations it may
>> perform, in a sense the access is "opaque" to the compiler e.g. it
>> cannot elide the access or fold it into a more recent access etc.
> 
> OK, but if the processor can reorder accesses (and satisfy them from
> local caches) in the absence of fences, why is this a distinction that
> is worth bothering about?  And how on Earth would you make such a
> distinction in the context of a high-level language specification?

I'm still wondering about this one.  I think Doug has said that
Opaque accesses are coherent but Plain accesses aren't.  I guess
there's also non-atomic treatment of long and double.

Andrew.


From dl at cs.oswego.edu  Mon Jul 25 12:35:29 2016
From: dl at cs.oswego.edu (Doug Lea)
Date: Mon, 25 Jul 2016 08:35:29 -0400
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <5795D3BC.4050308@redhat.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com> <5795D3BC.4050308@redhat.com>
Message-ID: <d6fe3adb-7cc7-27eb-80cf-cd835c50f7c7@cs.oswego.edu>

On 07/25/2016 04:54 AM, Andrew Haley wrote:
>
> I'm still wondering about this one.  I think Doug has said that
> Opaque accesses are coherent but Plain accesses aren't.  I guess
> there's also non-atomic treatment of long and double.

Users familiar with C/C++-11/17 will use Java opaque whenever
they would use C++ atomic-relaxed, and the implementation effects
should be indistinguishable. Which is not the same as saying the specs
can or should be identical (if only because they deal with
different languages). Which sometimes forces formal attention
to distinctions otherwise not worth bothering about.

Reminder of the game plan for VarHandles Javadocs: Initially
use wordings that are frustratingly loose but surely not wrong
with respect to the range of possible rigorous specs.
Improve them when possible.

-Doug


From dl at cs.oswego.edu  Mon Jul 25 13:24:37 2016
From: dl at cs.oswego.edu (Doug Lea)
Date: Mon, 25 Jul 2016 09:24:37 -0400
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <5795D3B6.9050407@redhat.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu>
	<007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu>
	<5795D3B6.9050407@redhat.com>
Message-ID: <3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu>

On 07/25/2016 04:54 AM, Andrew Haley wrote:
> On 20/07/16 20:16, Doug Lea wrote:
>> Or, in pseudo-VarHandle style using "getM" (for varying Ms):
>>
>> static VarHandle PX = MethodHandles.lookup().findVarHandle(Point.class, "x",
>> int.class);
>>
>> void f(Point a, Point b) {
>>    int r1 = PX.getM(a);
>>    int r2 = PX.getM(b);
>>    int r3 = PX.getM(a); // *
>>    use (r1, r2, r3);
>> }
>>
>> Can you simplify (*) to "r3 = r1" ? It depends on M:
>> * Java-Plain and C++-Plain: yes.
>> * Java Opaque: no.
>> * C++-Relaxed: only if a != b.
>> * (And, for the record, other modes: no)
>>
>> This is one reason "opaque" mode is needed.
>
> But the processor hardware is allowed to simplify (*) to "r3 = r1" even
> if Opaque is used.  So, again, why does it matter?
>

The existence of one case where it may not matter doesn't mean
that it is always OK (consider loops), and so the best practical
answer for compilers is "no" here.
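
For concreteness, the quoted "getM" pseudo-code can be written against
the actual VarHandle API roughly as follows. Point and PX follow the
quoted example; the wrapper class ReadModes and its two methods are
hypothetical names added for this sketch.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Point follows the quoted pseudo-code.
class Point {
    int x;
}

// ReadModes is a hypothetical wrapper added for illustration.
class ReadModes {
    static final VarHandle PX;
    static {
        try {
            PX = MethodHandles.lookup()
                .findVarHandle(Point.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // M = Plain: per the quoted list, a compiler may rewrite r3 as r1.
    static int[] plainReads(Point a, Point b) {
        int r1 = (int) PX.get(a);
        int r2 = (int) PX.get(b);
        int r3 = (int) PX.get(a); // *
        return new int[] { r1, r2, r3 };
    }

    // M = Opaque: per the quoted list, the read at (*) is not merged with r1.
    static int[] opaqueReads(Point a, Point b) {
        int r1 = (int) PX.getOpaque(a);
        int r2 = (int) PX.getOpaque(b);
        int r3 = (int) PX.getOpaque(a); // *
        return new int[] { r1, r2, r3 };
    }
}
```

Run single-threaded, both methods return the same values; any difference
between the modes can only show up when another thread writes x
concurrently.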

(Arguably, it should similarly be "no" for C++-relaxed, depending in
part on whether "coherence" is defined to entail progress properties
by the memory system (as cache memory-system specs normally do)
especially given the C++17 updates about execution progress.)

The issue of merging reads is in many ways symmetric to that of
inserting writes. For example when spilling registers, JVMs
never store transient garbage-values into possibly-visible
home locations of variables, even though there may be cases
where they could.

Coming up with a formal model and spec that clearly delineates legal
transformations hits a lot of "little" issues along these lines.

-Doug


From aph at redhat.com  Mon Jul 25 13:50:36 2016
From: aph at redhat.com (Andrew Haley)
Date: Mon, 25 Jul 2016 14:50:36 +0100
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu>
	<007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu>
	<5795D3B6.9050407@redhat.com>
	<3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu>
Message-ID: <5796192C.3020402@redhat.com>

On 25/07/16 14:24, Doug Lea wrote:
> On 07/25/2016 04:54 AM, Andrew Haley wrote:
>> On 20/07/16 20:16, Doug Lea wrote:
>>> Or, in pseudo-VarHandle style using "getM" (for varying Ms):
>>>
>>> static VarHandle PX = MethodHandles.lookup().findVarHandle(Point.class, "x",
>>> int.class);
>>>
>>> void f(Point a, Point b) {
>>>    int r1 = PX.getM(a);
>>>    int r2 = PX.getM(b);
>>>    int r3 = PX.getM(a); // *
>>>    use (r1, r2, r3);
>>> }
>>>
>>> Can you simplify (*) to "r3 = r1" ? It depends on M:
>>> * Java-Plain and C++-Plain: yes.
>>> * Java Opaque: no.
>>> * C++-Relaxed: only if a != b.
>>> * (And, for the record, other modes: no)
>>>
>>> This is one reason "opaque" mode is needed.
>>
>> But the processor hardware is allowed to simplify (*) to "r3 = r1" even
>> if Opaque is used.  So, again, why does it matter?
> 
> The existence of one case where it may not matter doesn't mean
> that it is always OK (consider loops), and so the best practical
> answer for compilers is "no" here.

Well, OK, but I'm trying to think of one case where a Java program
could tell the difference between the two, and I'm coming up empty.
One could argue (and I would argue) that if it's not possible to write
such a test case then perhaps such a thing doesn't belong in a
language specification.

> Coming up with a formal model and spec that clearly delineates legal
> transformations hits a lot of "little" issues along these lines.

It sure does!

Andrew.

From dl at cs.oswego.edu  Mon Jul 25 14:28:59 2016
From: dl at cs.oswego.edu (Doug Lea)
Date: Mon, 25 Jul 2016 10:28:59 -0400
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <5796192C.3020402@redhat.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu>
	<007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu>
	<5795D3B6.9050407@redhat.com>
	<3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu>
	<5796192C.3020402@redhat.com>
Message-ID: <1d1b8341-ad74-b60a-18b1-d2cc23069561@cs.oswego.edu>

On 07/25/2016 09:50 AM, Andrew Haley wrote:
> Well, OK, but I'm trying to think of one case where a Java program
> could tell the difference between the two, and I'm coming up empty.

Oh, sorry for not including some. Using Point and PX VarHandle for Point.x:

1. Unbounded spin:
   while (PX.getOpaque(a) == 0) ;

Note that programmers would normally use getAcquire or getVolatile
here, but the question remains even if they don't.

Can this be transformed into conditional infinite spin? As in:
   if (PX.getOpaque(a) == 0) for (;;) ;
Not if coherence is defined to entail progress.

2. Bounded spin:
   long i = 1000;
   while (PX.getOpaque(a) == 0 && --i > 0) ;

Can this be optimized into a no-op? What if i = Long.MAX_VALUE?
Under coherence, an implementation would have to establish some
maximum bound K for merges to decide if/when to do this.
In which case the best option is for the spec to say that K must
be exactly one (i.e., no merges) for the sake of definitiveness.
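
A self-contained version of the bounded spin, as a sketch only. The class
BoundedSpin and the methods awaitNonZero and publish are hypothetical
names; the loop is written as a for loop but performs the same reads as
case 2 above.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical demo; the nested Point mirrors the class used in the thread.
class BoundedSpin {
    static class Point {
        int x;
    }

    static final VarHandle PX;
    static {
        try {
            PX = MethodHandles.lookup()
                .findVarHandle(Point.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Performs up to `bound` opaque reads of a.x; true if a nonzero value was seen.
    static boolean awaitNonZero(Point a, long bound) {
        for (long i = bound; i > 0; i--) {
            if ((int) PX.getOpaque(a) != 0) {
                return true;
            }
        }
        return false;
    }

    // Opaque write that a spinning thread is waiting to observe.
    static void publish(Point a, int v) {
        PX.setOpaque(a, v);
    }
}
```

Under the no-merge reading (K = 1), each iteration performs its own read,
so a publish from another thread that lands during the spin can be
noticed before the bound runs out.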

-Doug


From boehm at acm.org  Mon Jul 25 18:19:05 2016
From: boehm at acm.org (Hans Boehm)
Date: Mon, 25 Jul 2016 11:19:05 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <1d1b8341-ad74-b60a-18b1-d2cc23069561@cs.oswego.edu>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com>
	<57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu>
	<007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu>
	<5795D3B6.9050407@redhat.com>
	<3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu>
	<5796192C.3020402@redhat.com>
	<1d1b8341-ad74-b60a-18b1-d2cc23069561@cs.oswego.edu>
Message-ID: <CAPUmR1bsG+w66yzd6hLNk2U9ZOUHMS0yYOUdYiRbX7CAgSSuDw@mail.gmail.com>

Just to make sure we're clear here.  The differences between Opaque and
Plain seem to be:

1. Opaque is cache coherent (i.e. single-variable sequentially consistent),
just like memory_order_relaxed in C++.  This means that Opaque will
generate different instructions on architectures that don't promise cache
coherence by default. (Currently probably just Itanium, but hardware
architects seem to eventually want to apply optimizations similar to those
in compilers.)

2. Opaque prevents compiler merging of accesses, which probably makes it
more like volatile atomic<T> in C++.  (WG21/SG1 has been discussing some
related restrictions on non-volatile atomics, but they haven't gone
anywhere. Certainly C++17 is unlikely to say anything here. From my
perspective, C++ "volatile" really seems to be more defined by processor
ABIs than the language standard, for the reasons Andrew mentioned.
Standard-conforming programs usually can't tell conclusively whether the
rules are being followed, but low-level systems programs can.)

In my mind, (2) is separable from coherence.

The intent would be to strengthen (Java) volatile, etc., so they are
strictly stronger than Opaque? Currently I don't think there is a guarantee
that a bounded spin loop using volatiles can't be collapsed to a no-op.
Presumably no reasonable implementations actually do that, however.  I have
no idea whether there are implementations that merge a pair of volatile
loads.

On Mon, Jul 25, 2016 at 7:28 AM, Doug Lea <dl at cs.oswego.edu> wrote:

> On 07/25/2016 09:50 AM, Andrew Haley wrote:
>
>> Well, OK, but I'm trying to think of one case where a Java program
>> could tell the difference between the two, and I'm coming up empty.
>>
>
> Oh, sorry for not including some. Using Point and PX VarHandle for Point.x:
>
> 1. Unbounded spin:
>   while (PX.getOpaque(a) == 0) ;
>
> Note that programmers would normally use getAcquire or getVolatile
> here, but the question remains even if they don't.
>
> Can this be transformed into conditional infinite spin? As in:
>   if (PX.getOpaque(a) == 0) for (;;) ;
> Not if coherence is defined to entail progress.
>
> 2. Bounded spin:
>   long i = 1000;
>   while (PX.getOpaque(a) == 0 && --i > 0) ;
>
> Can this be optimized into a no-op? What if i = Long.MAX_VALUE?
> Under coherence, an implementation would have to establish some
> maximum bound K for merges to decide if/when to do this.
> In which case the best option is for the spec to say that K must
> be exactly one (i.e., no merges) for the sake of definitiveness.
>
> -Doug
>
>
>
>
>

From dl at cs.oswego.edu  Mon Jul 25 19:24:48 2016
From: dl at cs.oswego.edu (Doug Lea)
Date: Mon, 25 Jul 2016 15:24:48 -0400
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <CAPUmR1bsG+w66yzd6hLNk2U9ZOUHMS0yYOUdYiRbX7CAgSSuDw@mail.gmail.com>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com> <57890ED7.8090704@cs.oswego.edu>
	<578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu>
	<007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu>
	<5795D3B6.9050407@redhat.com>
	<3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu>
	<5796192C.3020402@redhat.com>
	<1d1b8341-ad74-b60a-18b1-d2cc23069561@cs.oswego.edu>
	<CAPUmR1bsG+w66yzd6hLNk2U9ZOUHMS0yYOUdYiRbX7CAgSSuDw@mail.gmail.com>
Message-ID: <dff93d6d-7aeb-7f49-12bc-8bc2760bad37@cs.oswego.edu>

On 07/25/2016 02:19 PM, Hans Boehm wrote:

> 1. Opaque is cache coherent (i.e. single-variable sequentially consistent), just
> like memory_order_relaxed in C++.
>
> 2. Opaque prevents compiler merging of accesses,
>
> In my mind, (2) is separable from coherence.

This might not be the right venue to discuss whether the new C++17 sec 1.10.4
progress requirements apply to the memory system. I think they must, and
that this would be consistent with common formal cache-memory-system specs.

In which case you are inevitably led to the no-merge rule, as seen in the
examples I posted.

And even if this were not done in C++, I don't know any argument for
not doing so in Java. No programmer would be happy if their bounded
spin loops were allowed to be transformed into no-ops. Why allow
something that literally no one wants rather than just hoping that
compilers don't happen to do it?

(Gratuitously editorializing, one would think that in C++,
it might also be popular to adopt this interpretation, and
eliminate the need to ever integrate C "volatile", or to
re-spec consume mode.)

-Doug


From paulmck at linux.vnet.ibm.com  Tue Jul 26 17:09:18 2016
From: paulmck at linux.vnet.ibm.com (Paul E. McKenney)
Date: Tue, 26 Jul 2016 10:09:18 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <dff93d6d-7aeb-7f49-12bc-8bc2760bad37@cs.oswego.edu>
References: <CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu>
	<007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu>
	<5795D3B6.9050407@redhat.com>
	<3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu>
	<5796192C.3020402@redhat.com>
	<1d1b8341-ad74-b60a-18b1-d2cc23069561@cs.oswego.edu>
	<CAPUmR1bsG+w66yzd6hLNk2U9ZOUHMS0yYOUdYiRbX7CAgSSuDw@mail.gmail.com>
	<dff93d6d-7aeb-7f49-12bc-8bc2760bad37@cs.oswego.edu>
Message-ID: <20160726170918.GA7094@linux.vnet.ibm.com>

On Mon, Jul 25, 2016 at 03:24:48PM -0400, Doug Lea wrote:
> On 07/25/2016 02:19 PM, Hans Boehm wrote:
> 
> >1. Opaque is cache coherent (i.e. single-variable sequentially consistent), just
> >like memory_order_relaxed in C++.
> >
> >2. Opaque prevents compiler merging of accesses,
> >
> >In my mind, (2) is separable from coherence.
> 
> This might not be the right venue to discuss whether the new C++17 sec 1.10.4
> progress requirements apply to the memory system. I think they must, and
> that this would be consistent with common formal cache-memory-system specs.
> 
> In which case you are inevitably led to the no-merge rule, as seen in the
> examples I posted.
> 
> And even if this were not done in C++, I don't know any argument for
> not doing so in Java. No programmer would be happy if their bounded
> spin loops were allowed to be transformed into no-ops. Why allow
> something that literally no one wants rather than just hoping that
> compilers don't happen to do it?
> 
> (Gratuitously editorializing, one would think that in C++,
> it might also be popular to adopt this interpretation, and
> eliminate the need to ever integrate C "volatile", or to
> re-spec consume mode.)

Yes and no.

If I am working on a low-level synchronization primitive, then yes,
I really do want the system to do -exactly- what I tell it to, no more,
no less.

But in higher-level code, I would likely be quite happy for the compiler
to fuse accesses, if it could do so without violating the memory model.

							Thanx, Paul


From boehm at acm.org  Tue Jul 26 19:26:36 2016
From: boehm at acm.org (Hans Boehm)
Date: Tue, 26 Jul 2016 12:26:36 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <dff93d6d-7aeb-7f49-12bc-8bc2760bad37@cs.oswego.edu>
References: <834B7738-003E-42CC-B5F5-FDB062BB82C5@oracle.com>
	<1661a37c-9edc-5841-5376-db69cafea4d6@oracle.com>
	<57889995.5030100@redhat.com>
	<57890ED7.8090704@cs.oswego.edu> <578936CA.4070304@cs.oswego.edu>
	<51825156-98D6-4055-980F-AFFBE345C1A6@oracle.com>
	<578D2E91.5000500@cs.oswego.edu> <578DDDDC.8050704@redhat.com>
	<3256C3CC-D79F-4FD5-8EEE-3D1E599BED5C@oracle.com>
	<578E92D9.80201@redhat.com>
	<CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu>
	<007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu>
	<5795D3B6.9050407@redhat.com>
	<3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu>
	<5796192C.3020402@redhat.com>
	<1d1b8341-ad74-b60a-18b1-d2cc23069561@cs.oswego.edu>
	<CAPUmR1bsG+w66yzd6hLNk2U9ZOUHMS0yYOUdYiRbX7CAgSSuDw@mail.gmail.com>
	<dff93d6d-7aeb-7f49-12bc-8bc2760bad37@cs.oswego.edu>
Message-ID: <CAPUmR1ZorqMqGYg3iOPngijwzzr8QDSNto=Ez2GZueXYxLmKcQ@mail.gmail.com>

I'm not quite sure which document you're referring to for C++.  The latest
draft (N4604 or N4606) reorganized section 1.10.

1.10.2 discusses forward progress in a lot more detail than before. But I
think the only directly relevant statement here is p18, which was there
before:

"An implementation should ensure that the last value (in modification
order) assigned by an atomic or synchronization operation will become
visible to all other threads in a finite period of time."

Recall that "should" (as opposed to e.g. "shall") is ISO standardese for a
non-binding recommendation.  The reason I haven't pushed for something
stronger is that I don't think hardware specifications consistently contain
the corresponding guarantees, which would put language implementers in a
weird position. But that could probably be argued either way.

This is now separate from the core memory model in 1.10.1.

I think the "no merge" rule is not really formally specifiable, since it's
a compiler-only constraint that can't be tested by a conforming program.
We could specify a "no infinite merge" rule that handles the unbounded spin
case on reasonable hardware.
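
To make the unbounded-spin case concrete, here is a minimal Java sketch.
It is an illustration only, not code from this thread: the class SpinFlag
and its method names are hypothetical, and it assumes the JDK 9 VarHandle
opaque access methods. Merging every read in the loop into one read would
turn the loop into an infinite one, which is the case a "no infinite
merge" rule would have to exclude.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical illustration; not code from this thread.
class SpinFlag {
    private boolean ready; // accessed only through the READY VarHandle

    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(SpinFlag.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void publish() {
        READY.setOpaque(this, true);
    }

    // Unbounded spin: every iteration issues a fresh opaque read.
    // Merging all of these reads into a single read would let a thread
    // that first observed false spin forever.
    int awaitReady() {
        int spins = 0;
        while (!(boolean) READY.getOpaque(this)) {
            spins++;
            Thread.onSpinWait();
        }
        return spins;
    }

    public static void main(String[] args) throws InterruptedException {
        SpinFlag flag = new SpinFlag();
        Thread writer = new Thread(flag::publish);
        writer.start();
        int spins = flag.awaitReady(); // returns once the write is observed
        writer.join();
        System.out.println("observed ready after " + spins + " spins");
    }
}
```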

As I'm occasionally reminded by my WG21 colleagues, it's not clear that the
extreme cases here are worth spending too much time on, since nobody is
going to use an implementation that gets them wrong, no matter what we say.
The tricky and more interesting cases are probably something like:

l.my_spin_lock();  // Implemented with acquire CAS
if (...) {
   ...
   l.my_spin_unlock();  // release store
} else {
   ...
   l.my_spin_unlock();
   ...  // No synchronization; Known to terminate in bounded time
}

Can I move the unlock release store out of the conditional to merge the two
stores?
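
For readers who prefer Java, the two shapes in question can be sketched
roughly as follows. This is an assumption-laden illustration: the class
SpinLockSketch, its methods, and the placeholder "work" arithmetic are all
hypothetical stand-ins for the elided code above. The sketch takes no
position on whether the transformation is allowed; it only shows that the
merged form keeps the bounded tail work inside the critical section.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical illustration; names and the "work" values are placeholders.
class SpinLockSketch {
    private int state; // 0 = free, 1 = held

    private static final VarHandle STATE;
    static {
        try {
            STATE = MethodHandles.lookup()
                    .findVarHandle(SpinLockSketch.class, "state", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void lock() { // acquire CAS; the weak form may fail spuriously, so retry
        while (!STATE.weakCompareAndSetAcquire(this, 0, 1)) {
            Thread.onSpinWait();
        }
    }

    void unlock() { // release store
        STATE.setRelease(this, 0);
    }

    // Shape from the message: one unlock per branch, followed in the else
    // branch by bounded work that does no synchronization.
    int original(boolean cond) {
        int work = 0;
        lock();
        if (cond) {
            work += 1;
            unlock();
        } else {
            work += 2;
            unlock();
            work += 3; // bounded, unsynchronized tail
        }
        return work;
    }

    // Candidate transformation: a single release store after the
    // conditional. The tail work now runs while the lock is still held.
    int merged(boolean cond) {
        int work = 0;
        lock();
        if (cond) {
            work += 1;
        } else {
            work += 2;
            work += 3;
        }
        unlock();
        return work;
    }

    public static void main(String[] args) {
        SpinLockSketch l = new SpinLockSketch();
        System.out.println(l.original(false) == l.merged(false)); // prints true
    }
}
```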



From dl at cs.oswego.edu  Tue Jul 26 20:03:44 2016
From: dl at cs.oswego.edu (Doug Lea)
Date: Tue, 26 Jul 2016 16:03:44 -0400
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <20160726170918.GA7094@linux.vnet.ibm.com>
References: <CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu>
	<007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu>
	<5795D3B6.9050407@redhat.com>
	<3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu>
	<5796192C.3020402@redhat.com>
	<1d1b8341-ad74-b60a-18b1-d2cc23069561@cs.oswego.edu>
	<CAPUmR1bsG+w66yzd6hLNk2U9ZOUHMS0yYOUdYiRbX7CAgSSuDw@mail.gmail.com>
	<dff93d6d-7aeb-7f49-12bc-8bc2760bad37@cs.oswego.edu>
	<20160726170918.GA7094@linux.vnet.ibm.com>
Message-ID: <10809e13-82eb-70cd-553a-b4ed4890ce82@cs.oswego.edu>


Moving ever further away from the alleged subject line...


On 07/26/2016 01:09 PM, Paul E. McKenney wrote:
> On Mon, Jul 25, 2016 at 03:24:48PM -0400, Doug Lea wrote:
>> (Gratuitously editorializing, one would think that in C++,
>> it might also be popular to adopt this interpretation, and
>> eliminate the need to ever integrate C "volatile", or to
>> re-spec consume mode.)
>
> Yes and no.
>
> If I am working on a low-level synchronization primitive, then yes,
> I really do want the system to do -exactly- what I tell it to, no more,
> no less.
>
> But in higher-level code, I would likely be quite happy for the compiler
> to fuse accesses, if it could do so without violating the memory model.
>

The C++-relaxed spec definitely shows this tension. Sometimes people
want it to mean just "plain, but don't tear words".  Which is not the
same as what you'd otherwise spec as "the cheapest mode for a
thread-safe variable respecting coherence". In Java,
with the availability of "Plain" accesses even for volatiles,
and access-atomicity for references and <=32bit scalars,
there is little motivation to compromise for Opaque mode.

In which case, the main premise is that when users use non-plain
access modes for reads (and similarly, but less interestingly, for writes),
they are expressing that they intend to handle all of the possible program
traces that might result if two subsequent reads see different values.
So implementations cannot be allowed to merge reads in ways that are
sure to reduce the number of possible program traces.

Again, this is symmetric to the idea that implementations cannot be
allowed to add writes (e.g., duplicate them) in ways that are sure to
increase the number of possible program traces.
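
A small Java sketch of the read side of this argument, under the
assumption that JDK 9 VarHandle opaque reads are the access mode in
question. The class TwoReads and its methods are hypothetical names used
only for illustration. With two separate reads, a concurrent writer can
make them differ; after merging, that outcome is no longer possible.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical illustration; not code from this thread.
class TwoReads {
    private int x;

    private static final VarHandle X;
    static {
        try {
            X = MethodHandles.lookup()
                    .findVarHandle(TwoReads.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void write(int v) {
        X.setOpaque(this, v);
    }

    // Two separate opaque reads: a concurrent write may land between
    // them, so "a != b" is one of the possible traces.
    boolean sawChange() {
        int a = (int) X.getOpaque(this);
        int b = (int) X.getOpaque(this);
        return a != b;
    }

    // What a merged version would compute: one read, reused. The
    // "a != b" trace has been removed; this can only return false.
    boolean sawChangeMerged() {
        int a = (int) X.getOpaque(this);
        int b = a;
        return a != b;
    }

    public static void main(String[] args) {
        TwoReads t = new TwoReads();
        t.write(7);
        // Single-threaded, so neither form can observe a change here.
        System.out.println(t.sawChange() + " " + t.sawChangeMerged());
    }
}
```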

It is surely possible to introduce a formalization of traces that
rigorously states both constraints. But it is not easy to define an
underlying trace model that covers practical execution issues. So in a
language spec, it may be preferable to just say no merged reads and no
added writes for atomics. Which is what C++ and Java both do now for
no-added-writes. Or, it may be a better idea to leave the trace-based
requirements incompletely formalized, which should have the same
practical effect. Or even better (but not soon) agree upon some formalism.

-Doug


From boehm at acm.org  Wed Jul 27 17:55:44 2016
From: boehm at acm.org (Hans Boehm)
Date: Wed, 27 Jul 2016 10:55:44 -0700
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <10809e13-82eb-70cd-553a-b4ed4890ce82@cs.oswego.edu>
References: <CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu>
	<007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu>
	<5795D3B6.9050407@redhat.com>
	<3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu>
	<5796192C.3020402@redhat.com>
	<1d1b8341-ad74-b60a-18b1-d2cc23069561@cs.oswego.edu>
	<CAPUmR1bsG+w66yzd6hLNk2U9ZOUHMS0yYOUdYiRbX7CAgSSuDw@mail.gmail.com>
	<dff93d6d-7aeb-7f49-12bc-8bc2760bad37@cs.oswego.edu>
	<20160726170918.GA7094@linux.vnet.ibm.com>
	<10809e13-82eb-70cd-553a-b4ed4890ce82@cs.oswego.edu>
Message-ID: <CAPUmR1Yowdyi25hkSU6JhWmkS7CH458aGzMUn3sXA52ujmLBeQ@mail.gmail.com>

Peter Dimov gave a good example in C++ discussions for wanting merging of
atomic operations: Reference counting. If you see two reference count
increments in a row, you clearly want to merge the underlying fetch_and_add
operations.
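
A minimal Java sketch of that merge, using AtomicInteger as a stand-in for
a reference count (the class RefCountMerge and its method names are
hypothetical, chosen only for this illustration). Both forms leave the
same final count; the merged form simply never exposes the intermediate
value to other threads.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; not code from this thread.
class RefCountMerge {
    // Two increments in a row, as written by the programmer.
    static int twoIncrements(AtomicInteger refs) {
        refs.getAndIncrement();
        refs.getAndIncrement();
        return refs.get();
    }

    // The merged form: one fetch-and-add of 2. Other threads can no
    // longer observe the intermediate count.
    static int mergedIncrement(AtomicInteger refs) {
        refs.getAndAdd(2);
        return refs.get();
    }

    public static void main(String[] args) {
        int a = twoIncrements(new AtomicInteger(1));
        int b = mergedIncrement(new AtomicInteger(1));
        System.out.println(a + " " + b); // prints "3 3"
    }
}
```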

(I say that in spite of the fact that I'm not a fan of explicit reference
counting, and am currently spending way too much time debugging
reference-counting code.  But it seems unavoidable at times, occasionally
even in Java, and pervasive in C++.)

I don't understand Doug's statement: "So implementations cannot be allowed
to merge reads in ways that are sure to reduce the number of possible
program traces."  We have hardware microarchitectures that do this on a
grand scale by transactionally committing a bunch of memory operations in
bulk (cf. http://dl.acm.org/citation.cfm?doid=1610252.1610271), so many
intermediate states are invisible. In general the rules are that we cannot
add traces, but removing possible traces is entirely fine.

My (failed) proposal to the C++ committee was to restrict software
transformations informally to be comparable to the hardware effects we
observe anyway.  I think that is the strongest property that code dealing
only with conventional memory (not device registers) can reliably test for.



From dl at cs.oswego.edu  Wed Jul 27 20:21:48 2016
From: dl at cs.oswego.edu (Doug Lea)
Date: Wed, 27 Jul 2016 16:21:48 -0400
Subject: [jmm-dev] bitwise RMW operators, specifically testAndSetBit/BTS
In-Reply-To: <CAPUmR1Yowdyi25hkSU6JhWmkS7CH458aGzMUn3sXA52ujmLBeQ@mail.gmail.com>
References: <CA+kOe0-+LzcqCTLcJKo-1JJ5ToRc5WBWefxWbSOcuGZFSJS5fw@mail.gmail.com>
	<1675E2B0-A738-44F4-A6CE-288C30CFF155@oracle.com>
	<43171481-98c0-1396-3942-7647bbfe31b6@cs.oswego.edu>
	<007ba403-7c39-49ab-66b1-b3699b493ea8@cs.oswego.edu>
	<5795D3B6.9050407@redhat.com>
	<3722cffd-69ec-5e97-2043-0046b80d82e4@cs.oswego.edu>
	<5796192C.3020402@redhat.com>
	<1d1b8341-ad74-b60a-18b1-d2cc23069561@cs.oswego.edu>
	<CAPUmR1bsG+w66yzd6hLNk2U9ZOUHMS0yYOUdYiRbX7CAgSSuDw@mail.gmail.com>
	<dff93d6d-7aeb-7f49-12bc-8bc2760bad37@cs.oswego.edu>
	<20160726170918.GA7094@linux.vnet.ibm.com>
	<10809e13-82eb-70cd-553a-b4ed4890ce82@cs.oswego.edu>
	<CAPUmR1Yowdyi25hkSU6JhWmkS7CH458aGzMUn3sXA52ujmLBeQ@mail.gmail.com>
Message-ID: <2552a774-eecf-83a5-72ee-b077cad9ed02@cs.oswego.edu>


I should have known better than to invite debate about C++ relaxed etc.
specs here. Sorry. These details of how they are spec'ed in C++17 don't
seem to and should not matter to us. It's probably best to move that
discussion elsewhere.

To summarize though, C/C++ effectively has 6 modes: plain, relaxed,
consume, acquire/release, seq_cst, and Linux pseudo-mode
*(volatile*)relaxed. Java (jdk9) effectively has only 4: plain,
opaque, acquire/release, volatile. Four appear to be enough -- modulo
other language differences, for every C++ construction, there is an
applicable Java construction that preserves essential properties
and is expected to be implemented in the same way in common
use cases. And to make this work out, we incorporate memory
system progress properties.
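
As a rough illustration of the four Java modes, the sketch below touches
one field through each pair of JDK 9 VarHandle access methods. The class
AccessModes and its demo method are hypothetical names; the sketch is
single-threaded and only shows which method corresponds to which mode, not
any ordering guarantee.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical illustration; not code from this thread.
class AccessModes {
    private int v;

    private static final VarHandle V;
    static {
        try {
            V = MethodHandles.lookup()
                    .findVarHandle(AccessModes.class, "v", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    int demo() {
        V.set(this, 1);                    // plain write
        int a = (int) V.get(this);         // plain read
        V.setOpaque(this, a + 1);          // opaque write
        int b = (int) V.getOpaque(this);   // opaque read
        V.setRelease(this, b + 1);         // release write
        int c = (int) V.getAcquire(this);  // acquire read
        V.setVolatile(this, c + 1);        // volatile write
        return (int) V.getVolatile(this);  // volatile read
    }

    public static void main(String[] args) {
        System.out.println(new AccessModes().demo()); // prints 4
    }
}
```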

About which...

On 07/26/2016 03:26 PM, Hans Boehm wrote:

> [C++17]: "An implementation should ensure that the last value (in
> modification order) assigned by an atomic or synchronization operation will
> become visible to all other threads in a finite period of time."
>
> The reason I haven't pushed for something stronger is that I don't think
> hardware specifications consistently contain the corresponding guarantees,
> which would put language implementers in a weird position. But that could
> probably be argued either way.

Yes, just to clarify that other way: If a memory system not observing
progress guarantees were shipped, the designers would be blamed
for insufficiently specifying and testing its properties. To enable testing,
"eventually Predicate P" specs are normally phrased as: Every implementation
must pick and publish a K such that P always holds within K units (usually
clock cycles). The same holds in software, but without such convenient units,
sometimes leading to yet more arbitrary constants.

> I think the "no merge" rule is not really formally specifiable, since it's a
> compiler-only constraint that can't be tested by a conforming program.

Testing is not impossible but is less portable.
For a fun one, someone could try removing the volatile cast from the
Linux READ_ONCE macro, build, and check for test suite bugs.

On 07/27/2016 01:55 PM, Hans Boehm wrote:
> Peter Dimov gave a good example in C++ discussions for wanting merging of
> atomic operations: Reference counting. If you see two reference count
> increments in a row, you clearly want to merge the underlying fetch_and_add
> operations.

Is allowing this worth the loss in ability to prevent it in
unwanted cases, e.g., when a huge but finite number of these were
otherwise postponed in a long-lived but bounded loop?
In other words, it is possible to write a combining-based ref-counter
if you need one, but not to write a non-combining one when you don't.
(And again, whatever the answer, Java vs C++ differences in such cases
are not too important.)
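
One possible shape for a hand-written combining counter, sketched in Java
under stated assumptions: the class BatchedRetains and its methods are
hypothetical, and each instance is assumed to be confined to a single
thread, so the pending field needs no synchronization. The combining is
explicit in the source rather than left to the compiler.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration; not code from this thread.
// Each instance must be used by one thread only.
final class BatchedRetains {
    private final AtomicLong shared; // the shared reference count
    private long pending;            // thread-confined, not yet published

    BatchedRetains(AtomicLong shared) {
        this.shared = shared;
    }

    // Record an increment locally without touching shared memory.
    void retain() {
        pending++;
    }

    // Publish all pending increments with one atomic add.
    void flush() {
        long n = pending;
        if (n != 0) {
            pending = 0;
            shared.getAndAdd(n);
        }
    }

    public static void main(String[] args) {
        AtomicLong count = new AtomicLong();
        BatchedRetains batch = new BatchedRetains(count);
        batch.retain();
        batch.retain();
        batch.flush();
        System.out.println(count.get()); // prints 2
    }
}
```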

> I don't understand Doug's statement: "So implementations cannot be allowed to
> merge reads in ways that are sure to reduce the number of possible program
> traces."  We have hardware microarchitectures that do this on a grand scale
> by transactionally committing a bunch of memory operations in bulk

This is getting increasingly far afield, but within (most? all?) definitions
of transactions, multiple reads of the same variable are *required* to
take the same value (act as if merged), which is not exactly the same
as in any access mode. So something special would need to be said anyway
about what happens especially for atomics/volatiles.

(On the other hand, concurrent code with multiple reads of the same
atomic/volatile variable within a method or transaction is already highly
suspicious and a priori unlikely to be correct. So I agree with Hans that
most of the issues we are discussing cover cases that most programmers
should never encounter.)

-Doug