# [jmm-dev] Is the JMM spec progressing?

James Riely jriely at gmail.com
Tue Aug 10 23:00:18 UTC 2021

```Alan Jeffrey and I have been working on how to distinguish genuine thin-air
executions from those that are just thin-air-curious. A first draft of this
analysis appeared in last year’s OOPSLA [1]. We’ve been pursuing follow-up
work with an expanded list of colleagues to support sequential composition
and to develop some tooling and proof mechanization. The follow-up work has
revealed lots of little bugs in that paper, but the larger points are still
valid.

The main point is that, as a concept, “thin-air” is too ill-defined to be
useful. Instead we should look for compositional reasoning principles.
Applying this to the Lochbilher example, we can show that speculative
operational models, like the JMM and Promising Semantics, fail to support
compositional reasoning for temporal safety properties. In section 6, we
provide a simplistic logic for reasoning through (a variant of) the
Lochbilher example.

In that section, we also analyze the first RFUB example [2]. In this case
we use Hoare logic as a window on what it means to have a dependency. In
particular, we observe that

{true}  (r=y; if (r!=42){r=42}; x=r)  {x==42}    VALID

is a valid Hoare triple, arguing that the write to x is independent of the
read from y. But that is not true when you remove the “unexecuted” code:

{true}  (r=y; x=r)  {x==42}    NOT VALID

Here the write to x clearly depends on the read.

We were not aware of the RFUB New Constructor example when we wrote the
paper:

y=x  ||  r=y; if(r=nullptr){r=new C}; x=r; r->f()

With the goal of removing object creation, I believe it is fair to compare
this to the following:

y=x  ||  r=y; if(r=0){r=randomNonZero()}; x=r; if(r) {z=1}

In this case, we do have

{true}  (r=y; if(r=0){r=randomNonZero()}; x=r; if(r) {z=1})  {z=1}    VALID

But not

{true}  (r=y; if(r=0){r=randomNonZero()}; x=r; if(r) {z=1})  {x=42}
NOT VALID (for any value you pick)

Thus there is a real dependency from y to x in this case, but not from y to
z.

Best wishes,
James

[1] https://doi.org/10.1145/3428262

On Sun, Aug 8, 2021 at 5:59 AM Hans Boehm <hboehm at google.com> wrote:

> On Sat, Aug 7, 2021 at 2:28 AM Andrew Haley <aph-open at littlepinkcloud.com>
> wrote:
>
> > On 8/6/21 11:32 PM, Hans Boehm wrote:
> >
> > > Probably even more controversially, I think we've realized that
> > > existing compiler technology can compile such racing code in ways
> > > that some of us are not 100% sure should really be allowed.
> >
> > This implies, does it not, that the problem is not formalization as
> > such, but that we don't really understand what the language is
> > supposed to mean? That's always been my problem with OOTA: I'm unsure
> > whether the problem is due to the inadequacy of formal models, in
> > which case the formalists can fix their own problem, or something we
> > all have to pay attention to.
> >
> In some sense, I'm not sure either. The p1217 examples bother me in that
> they seem to violate  some global programming rules ("if x is only ever
> null
> or refers to an object properly constructed by the same thread, then x
> should never appear to refer to an incompletely constructed object").
> And there seems to be disagreement about whether the currently allowed
> behavior is "correct".
>
> On the other hand, in practice the weirdness doesn't seem to break things.
> that it doesn't matter because nobody writes code that way. If you ask
> people trying to analyzer or verify code, they'll probably be unhappy.
> And I haven't been able to convince myself that you cannot get yourself
> into these situations just by linking components together, each of which
> does something perfectly reasonable.
>
> And there are very common code
> patterns (like the standard implementation of reentrant locks used
> by all Java implementations) that break if you allow general OOTA
> behavior. Which at least means that you can't currently formally verify
> such
> code. The theorem you'd be trying to prove is false with respect to the
> part of the language spec we know how to formalize.
>
> It's a mess.
>
>
> > > Demonstrably unexecuted code can affect the semantics in ways that
> > > strike me as scary. (See wg21.link/p1217 for a down-to-assembly C++
> > > version; IIUC, Lochbihler and others earlier came up with some
> > > closely related observations for Java.)
> >
> > Looking again at p1217, it seems to me that enforcing load-store
> > ordering would have severe effects on compilers, at least without new
> > optimization techniques. We hoist loads before loops and sink stores
> > after them. When it all works out, there are no memory accesses in the
> > loop. A load-store barrier in a loop would have the effect of forcing
> > succeeding stores out to memory, and forcing preceding loads to reload
> > from memory. It's not hard to imagine that this would cause an
> > order-of-margnitude performance reduction in common cases.
> >
> I think it would be a fairly pervasive change to optimizers. It has also
> become clear in WG21, the C++ committee, that there is not enough
> support for requiring this. In that case, Ou and Demsky have a paper
> saying that the overhead is likely to be on the order of 1% or less.
> For Java if it were applied everywhere, it would probably be
> appreciably higher.
>
> On the other hand, it's a bit harder than that to come up with examples
> where
> the generated x86 code has to be worse. Moving loads earlier in the
> code, or delaying stores, as you suggest, would still be fine. The only
> issue is with delaying loads past stores, which seems less common,
> though it can certainly be beneficial for reducing live ranges, probably
> some
> vectorization etc.
>
> But it seems unlikely that such a restriction will be applied even to
> C++ memory_order_relaxed, much less Java ordinary variables.
>
> Hans
>
> >
> > I suppose one could argue that such optimizations would continue to be
> > valid, so only those stores which would have been emitted anyway would
> > be affected. But that's not how compilers work, as far as I know. In
> > our IR for C2, memory accesses are not pinned in any way, so the only
> > way to make unrelated accesses execute in any particular order is to
> >
> > --
> > Andrew Haley  (he/him)
> > Java Platform Lead Engineer
> > Red Hat UK Ltd. <https://www.redhat.com>
> > https://keybase.io/andrewhaley
> > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
> >
>

--
James Riely  jriely at gmail.com  jriely at cs.depaul.edu
http://fpl.cs.depaul.edu/jriely/
```