From aph-open at littlepinkcloud.com Sat Aug 7 09:27:29 2021 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Sat, 7 Aug 2021 10:27:29 +0100 Subject: [jmm-dev] Is the JMM spec progressing? In-Reply-To: References: <27a1640c-830f-d0f8-eb2c-5c6753ce43c0@gmail.com> Message-ID: On 8/6/21 11:32 PM, Hans Boehm wrote: > Probably even more controversially, I think we've realized that > existing compiler technology can compile such racing code in ways > that some of us are not 100% sure should really be allowed. This implies, does it not, that the problem is not formalization as such, but that we don't really understand what the language is supposed to mean? That's always been my problem with OOTA: I'm unsure whether the problem is due to the inadequacy of formal models, in which case the formalists can fix their own problem, or something we all have to pay attention to. > Demonstrably unexecuted code can affect the semantics in ways that > strike me as scary. (See wg21.link/p1217 for a down-to-assembly C++ > version; IIUC, Lochbihler and others earlier came up with some > closely related observations for Java.) Looking again at p1217, it seems to me that enforcing load-store ordering would have severe effects on compilers, at least without new optimization techniques. We hoist loads before loops and sink stores after them. When it all works out, there are no memory accesses in the loop. A load-store barrier in a loop would have the effect of forcing succeeding stores out to memory, and forcing preceding loads to reload from memory. It's not hard to imagine that this would cause an order-of-magnitude performance reduction in common cases. I suppose one could argue that such optimizations would continue to be valid, so only those stores which would have been emitted anyway would be affected. But that's not how compilers work, as far as I know. 
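As an illustrative sketch only (source-level, invented names; not C2's actual IR or output), the hoist/sink transformation described above turns a loop full of memory traffic into register-only code:

```java
// Sketch of "hoist loads before loops, sink stores after them".
// Field and class names are made up for illustration.
class HoistSinkSketch {
    static int counter;        // memory location written in the loop
    static int limit = 1000;   // memory location read in the loop

    // As written: a load of limit and a load+store of counter every iteration.
    static void naive() {
        for (int i = 0; i < limit; i++) {
            counter = counter + 1;
        }
    }

    // What the optimizer effectively produces: loads hoisted above the loop,
    // register arithmetic inside, one store sunk below it. A load-store
    // barrier inside the loop would forbid exactly this shape.
    static void optimized() {
        int n = limit;      // load hoisted before the loop
        int c = counter;    // load hoisted before the loop
        for (int i = 0; i < n; i++) {
            c = c + 1;      // no memory access in the loop body
        }
        counter = c;        // store sunk after the loop
    }

    public static void main(String[] args) {
        counter = 0; naive();     int a = counter;
        counter = 0; optimized(); int b = counter;
        // Same single-threaded result either way; only the number of
        // memory accesses differs.
        System.out.println(a + " " + b);  // prints "1000 1000"
    }
}
```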
In our IR for C2, memory accesses are not pinned in any way, so the only way to make unrelated accesses execute in any particular order is to add a dependency between all loads and stores. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dl at cs.oswego.edu Sat Aug 7 11:43:52 2021 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 7 Aug 2021 07:43:52 -0400 Subject: [jmm-dev] Is the JMM spec progressing? In-Reply-To: References: <27a1640c-830f-d0f8-eb2c-5c6753ce43c0@gmail.com> Message-ID: (Henceforth, let's drop jdk-dev and move this to jmm-dev.) On 8/7/21 5:27 AM, Andrew Haley wrote: > On 8/6/21 11:32 PM, Hans Boehm wrote: > >> Probably even more controversially, I think we've realized that >> existing compiler technology can compile such racing code in ways >> that some of us are not 100% sure should really be allowed. > This implies, does it not, that the problem is not formalization as > such, but that we don't really understand what the language is > supposed to mean? That's always been my problem with OOTA: I'm unsure > whether the problem is due to the inadequacy of formal models, in > which case the formalists can fix their own problem, or something we > all have to pay attention to. Yes. My stance in the less formal account (http://gee.cs.oswego.edu/dl/html/j9mm.html) as well as Shuyang Liu et al's ongoing formalization (see links from http://compilers.cs.ucla.edu/people/) is that the most you want to say about racy Java programs is that they are typesafe. As in: you can't see a String when expecting an int. 
Even this looser constraint is challenging to specify, prove, and extend. But it is a path for Java that might not apply to languages like C that are not guaranteed typesafe anyway, and so enter Undefined Behavior territory (as opposed to possibly-unexpected but still typesafe behavior). -Doug > >> Demonstrably unexecuted code can affect the semantics in ways that >> strike me as scary. (See wg21.link/p1217 for a down-to-assembly C++ >> version; IIUC, Lochbihler and others earlier came up with some >> closely related observations for Java.) > Looking again at p1217, it seems to me that enforcing load-store > ordering would have severe effects on compilers, at least without new > optimization techniques. We hoist loads before loops and sink stores > after them. When it all works out, there are no memory accesses in the > loop. A load-store barrier in a loop would have the effect of forcing > succeeding stores out to memory, and forcing preceding loads to reload > from memory. It's not hard to imagine that this would cause an > order-of-magnitude performance reduction in common cases. > > I suppose one could argue that such optimizations would continue to be > valid, so only those stores which would have been emitted anyway would > be affected. But that's not how compilers work, as far as I know. In > our IR for C2, memory accesses are not pinned in any way, so the only > way to make unrelated accesses execute in any particular order is to > add a dependency between all loads and stores. > From hboehm at google.com Sat Aug 7 22:13:25 2021 From: hboehm at google.com (Hans Boehm) Date: Sat, 7 Aug 2021 15:13:25 -0700 Subject: [jmm-dev] Is the JMM spec progressing? 
In-Reply-To: References: <27a1640c-830f-d0f8-eb2c-5c6753ce43c0@gmail.com> Message-ID: On Sat, Aug 7, 2021 at 2:28 AM Andrew Haley wrote: > On 8/6/21 11:32 PM, Hans Boehm wrote: > > > Probably even more controversially, I think we've realized that > > existing compiler technology can compile such racing code in ways > > that some of us are not 100% sure should really be allowed. > > This implies, does it not, that the problem is not formalization as > such, but that we don't really understand what the language is > supposed to mean? That's always been my problem with OOTA: I'm unsure > whether the problem is due to the inadequacy of formal models, in > which case the formalists can fix their own problem, or something we > all have to pay attention to. > In some sense, I'm not sure either. The p1217 examples bother me in that they seem to violate some global programming rules ("if x is only ever null or refers to an object properly constructed by the same thread, then x should never appear to refer to an incompletely constructed object"). And there seems to be disagreement about whether the currently allowed behavior is "correct". On the other hand, in practice the weirdness doesn't seem to break things. If you ask people advocating the current behavior, the answer will be that it doesn't matter because nobody writes code that way. If you ask people trying to analyze or verify code, they'll probably be unhappy. And I haven't been able to convince myself that you cannot get yourself into these situations just by linking components together, each of which does something perfectly reasonable. And there are very common code patterns (like the standard implementation of reentrant locks used by all Java implementations) that break if you allow general OOTA behavior. Which at least means that you can't currently formally verify such code. The theorem you'd be trying to prove is false with respect to the part of the language spec we know how to formalize. It's a mess. 
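For reference, the canonical OOTA litmus test being discussed can be sketched in Java as follows (an illustrative sketch; class and field names are invented). Each thread only copies a value it read, so an outcome like x == y == 42 would appear "out of thin air"; the JMM's causality rules forbid it here, and no real hardware/compiler combination is known to produce it:

```java
// Classic out-of-thin-air litmus test. x and y are plain (non-volatile)
// fields, so the two threads race on them.
class OotaLitmus {
    static int x, y;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { int r1 = x; y = r1; });
        Thread t2 = new Thread(() -> { int r2 = y; x = r2; });
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Both fields start at 0 and each thread only forwards a value it
        // read, so 0 is the only value that can ever circulate.
        System.out.println(x + " " + y);  // prints "0 0"
    }
}
```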
> > Demonstrably unexecuted code can affect the semantics in ways that > > strike me as scary. (See wg21.link/p1217 for a down-to-assembly C++ > > version; IIUC, Lochbihler and others earlier came up with some > > closely related observations for Java.) > > Looking again at p1217, it seems to me that enforcing load-store > ordering would have severe effects on compilers, at least without new > optimization techniques. We hoist loads before loops and sink stores > after them. When it all works out, there are no memory accesses in the > loop. A load-store barrier in a loop would have the effect of forcing > succeeding stores out to memory, and forcing preceding loads to reload > from memory. It's not hard to imagine that this would cause an > order-of-magnitude performance reduction in common cases. > I think it would be a fairly pervasive change to optimizers. It has also become clear in WG21, the C++ committee, that there is not enough support for requiring this. In that case, Ou and Demsky have a paper saying that the overhead is likely to be on the order of 1% or less. For Java if it were applied everywhere, it would probably be appreciably higher. On the other hand, it's a bit harder than that to come up with examples where the generated x86 code has to be worse. Moving loads earlier in the code, or delaying stores, as you suggest, would still be fine. The only issue is with delaying loads past stores, which seems less common, though it can certainly be beneficial for reducing live ranges, probably some vectorization etc. But it seems unlikely that such a restriction will be applied even to C++ memory_order_relaxed, much less Java ordinary variables. Hans > > I suppose one could argue that such optimizations would continue to be > valid, so only those stores which would have been emitted anyway would > be affected. But that's not how compilers work, as far as I know. 
In > our IR for C2, memory accesses are not pinned in any way, so the only > way to make unrelated accesses execute in any particular order is to > add a dependency between all loads and stores. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > From hboehm at google.com Sat Aug 7 22:24:54 2021 From: hboehm at google.com (Hans Boehm) Date: Sat, 7 Aug 2021 15:24:54 -0700 Subject: [jmm-dev] Is the JMM spec progressing? In-Reply-To: References: <27a1640c-830f-d0f8-eb2c-5c6753ce43c0@gmail.com> Message-ID: On Sat, Aug 7, 2021 at 4:44 AM Doug Lea
wrote: > (Henceforth, let's drop jdk-dev and move this to jmm-dev.) > > On 8/7/21 5:27 AM, Andrew Haley wrote: > > On 8/6/21 11:32 PM, Hans Boehm wrote: > > > >> Probably even more controversially, I think we've realized that > >> existing compiler technology can compile such racing code in ways > >> that some of us are not 100% sure should really be allowed. > > This implies, does it not, that the problem is not formalization as > > such, but that we don't really understand what the language is > > supposed to mean? That's always been my problem with OOTA: I'm unsure > > whether the problem is due to the inadequacy of formal models, in > > which case the formalists can fix their own problem, or something we > > all have to pay attention to. > > Yes. My stance in the less formal account > (http://gee.cs.oswego.edu/dl/html/j9mm.html) as well as Shuyang Liu et > al's ongoing formalization (see links from > http://compilers.cs.ucla.edu/people/) is that the most you want to say > about racy Java programs is that they are typesafe. As in: you can't see > a String when expecting an int. Even this looser constraint is > challenging to specify, prove, and extend. But it is a path for Java > that might not apply to languages like C that are not guaranteed > typesafe anyway, and so enter Undefined Behavior territory (as opposed > to possibly-unexpected but still typesafe behavior). > > -Doug > But this now breaks some common idioms, right? In particular, I think a bunch of code assumes that racing assignments of equivalent primitive values or immutable objects to the same field are OK. If, in 2004, our view of language-based security had been the same as it is now, then I would completely agree that this would have been the right approach. But I think doing it now would require significant user code changes. Which might still be the best way forward ... Hans > > > > >> Demonstrably unexecuted code can affect the semantics in ways that > >> strike me as scary. 
(See wg21.link/p1217 for a down-to-assembly C++ > >> version; IIUC, Lochbihler and others earlier came up with some > >> closely related observations for Java.) > > Looking again at p1217, it seems to me that enforcing load-store > > ordering would have severe effects on compilers, at least without new > > optimization techniques. We hoist loads before loops and sink stores > > after them. When it all works out, there are no memory accesses in the > > loop. A load-store barrier in a loop would have the effect of forcing > > succeeding stores out to memory, and forcing preceding loads to reload > > from memory. It's not hard to imagine that this would cause an > > order-of-magnitude performance reduction in common cases. > > > > I suppose one could argue that such optimizations would continue to be > > valid, so only those stores which would have been emitted anyway would > > be affected. But that's not how compilers work, as far as I know. In > > our IR for C2, memory accesses are not pinned in any way, so the only > > way to make unrelated accesses execute in any particular order is to > > add a dependency between all loads and stores. > > > From jriely at gmail.com Tue Aug 10 23:00:18 2021 From: jriely at gmail.com (James Riely) Date: Tue, 10 Aug 2021 18:00:18 -0500 Subject: [jmm-dev] Is the JMM spec progressing? In-Reply-To: References: <27a1640c-830f-d0f8-eb2c-5c6753ce43c0@gmail.com> Message-ID: Alan Jeffrey and I have been working on how to distinguish genuine thin-air executions from those that are just thin-air-curious. A first draft of this analysis appeared in last year's OOPSLA [1]. We've been pursuing follow-up work with an expanded list of colleagues to support sequential composition and to develop some tooling and proof mechanization. The follow-up work has revealed lots of little bugs in that paper, but the larger points are still valid. The main point is that, as a concept, 'thin-air' is too ill-defined to be useful. 
Instead we should look for compositional reasoning principles. Applying this to the Lochbihler example, we can show that speculative operational models, like the JMM and Promising Semantics, fail to support compositional reasoning for temporal safety properties. In section 6, we provide a simplistic logic for reasoning through (a variant of) the Lochbihler example. In that section, we also analyze the first RFUB example [2]. In this case we use Hoare logic as a window on what it means to have a dependency. In particular, we observe that {true} (r=y; if (r!=42){r=42}; x=r) {x==42} VALID is a valid Hoare triple, arguing that the write to x is independent of the read from y. But that is not true when you remove the 'unexecuted' code: {true} (r=y; x=r) {x==42} NOT VALID Here the write to x clearly depends on the read. We were not aware of the RFUB New Constructor example when we wrote the paper: y=x || r=y; if(r==nullptr){r=new C}; x=r; r->f() With the goal of removing object creation, I believe it is fair to compare this to the following: y=x || r=y; if(r==0){r=randomNonZero()}; x=r; if(r) {z=1} In this case, we do have {true} (r=y; if(r==0){r=randomNonZero()}; x=r; if(r) {z=1}) {z==1} VALID But not {true} (r=y; if(r==0){r=randomNonZero()}; x=r; if(r) {z=1}) {x==42} NOT VALID (for any value you pick) Thus there is a real dependency from y to x in this case, but not from y to z. Best wishes, James [1] https://doi.org/10.1145/3428262 [2] https://wg21.link/p1217r0 [3] https://wg21.link/p1217r2 On Sun, Aug 8, 2021 at 5:59 AM Hans Boehm wrote: > On Sat, Aug 7, 2021 at 2:28 AM Andrew Haley > wrote: > > > On 8/6/21 11:32 PM, Hans Boehm wrote: > > > > > Probably even more controversially, I think we've realized that > > > existing compiler technology can compile such racing code in ways > > > that some of us are not 100% sure should really be allowed. 
> > > > This implies, does it not, that the problem is not formalization as > > such, but that we don't really understand what the language is > > supposed to mean? That's always been my problem with OOTA: I'm unsure > > whether the problem is due to the inadequacy of formal models, in > > which case the formalists can fix their own problem, or something we > > all have to pay attention to. > > > In some sense, I'm not sure either. The p1217 examples bother me in that > they seem to violate some global programming rules ("if x is only ever > null > or refers to an object properly constructed by the same thread, then x > should never appear to refer to an incompletely constructed object"). > And there seems to be disagreement about whether the currently allowed > behavior is "correct". > > On the other hand, in practice the weirdness doesn't seem to break things. > If you ask people advocating the current behavior, the answer will be > that it doesn't matter because nobody writes code that way. If you ask > people trying to analyze or verify code, they'll probably be unhappy. > And I haven't been able to convince myself that you cannot get yourself > into these situations just by linking components together, each of which > does something perfectly reasonable. > > And there are very common code > patterns (like the standard implementation of reentrant locks used > by all Java implementations) that break if you allow general OOTA > behavior. Which at least means that you can't currently formally verify > such > code. The theorem you'd be trying to prove is false with respect to the > part of the language spec we know how to formalize. > > It's a mess. > > > > > Demonstrably unexecuted code can affect the semantics in ways that > > > strike me as scary. (See wg21.link/p1217 for a down-to-assembly C++ > > > version; IIUC, Lochbihler and others earlier came up with some > > > closely related observations for Java.) 
> > > > Looking again at p1217, it seems to me that enforcing load-store > > ordering would have severe effects on compilers, at least without new > > optimization techniques. We hoist loads before loops and sink stores > > after them. When it all works out, there are no memory accesses in the > > loop. A load-store barrier in a loop would have the effect of forcing > > succeeding stores out to memory, and forcing preceding loads to reload > > from memory. It's not hard to imagine that this would cause an > > order-of-magnitude performance reduction in common cases. > > > I think it would be a fairly pervasive change to optimizers. It has also > become clear in WG21, the C++ committee, that there is not enough > support for requiring this. In that case, Ou and Demsky have a paper > saying that the overhead is likely to be on the order of 1% or less. > For Java if it were applied everywhere, it would probably be > appreciably higher. > > On the other hand, it's a bit harder than that to come up with examples > where > the generated x86 code has to be worse. Moving loads earlier in the > code, or delaying stores, as you suggest, would still be fine. The only > issue is with delaying loads past stores, which seems less common, > though it can certainly be beneficial for reducing live ranges, probably > some > vectorization etc. > > But it seems unlikely that such a restriction will be applied even to > C++ memory_order_relaxed, much less Java ordinary variables. > > Hans > > > > > I suppose one could argue that such optimizations would continue to be > > valid, so only those stores which would have been emitted anyway would > > be affected. But that's not how compilers work, as far as I know. In > > our IR for C2, memory accesses are not pinned in any way, so the only > > way to make unrelated accesses execute in any particular order is to > > add a dependency between all loads and stores. 
> > > > -- > > Andrew Haley (he/him) > > Java Platform Lead Engineer > > Red Hat UK Ltd. > > https://keybase.io/andrewhaley > > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > > > -- James Riely jriely at gmail.com jriely at cs.depaul.edu http://fpl.cs.depaul.edu/jriely/
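James's Hoare-triple dependency claims can be checked over a range of inputs with a small sketch (adapted to Java: `if(r)` becomes `r != 0`, and `randomNonZero` is the hypothetical helper from his example):

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch checking the dependency claims in James Riely's examples.
class DependencyCheck {
    // Hypothetical helper from the example: any nonzero int.
    static int randomNonZero() {
        int v;
        do { v = ThreadLocalRandom.current().nextInt(); } while (v == 0);
        return v;
    }

    public static void main(String[] args) {
        // First example: r=y; if (r!=42){r=42}; x=r
        // Postcondition x==42 holds for every input y, so the write to x
        // does not depend on the read of y.
        for (int y = -1000; y <= 1000; y++) {
            int r = y;
            if (r != 42) { r = 42; }
            int x = r;
            if (x != 42) throw new AssertionError("{x==42} should be VALID");
        }

        // Second example: r=y; if (r==0){r=randomNonZero()}; x=r; if (r!=0){z=1}
        // Postcondition z==1 always holds (no dependency from y to z),
        // but x==42 does not (x really depends on y).
        boolean sawXNot42 = false;
        for (int y = -1000; y <= 1000; y++) {
            int r = y;
            if (r == 0) { r = randomNonZero(); }
            int x = r;
            int z = 0;
            if (r != 0) { z = 1; }
            if (z != 1) throw new AssertionError("{z==1} should be VALID");
            if (x != 42) sawXNot42 = true;
        }
        System.out.println("z==1 always; x==42 not guaranteed: " + sawXNot42);
    }
}
```

This mirrors the argument in the message: the triple over z is valid for all inputs, while no fixed postcondition on x survives varying y.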