[jmm-dev] Another way to punt OOTA

Wed Oct 29 19:54:59 UTC 2014

On Wed, Oct 29, 2014 at 11:23 AM, Paul E. McKenney <paulmck at linux.vnet.ibm.com> wrote:
> ...
> > - Most of us don't care that much about slowing down potentially racing
> > accesses a bit.  The reason the load->store ordering solution seems so
> > dubious for Java is that the compiler doesn't know which accesses might
> > race, and thus we need to treat many accesses conservatively.  ARM is much
> > more concerned about Java than C++ memory_order_relaxed.
>
> Speaking as one of the people who is most definitely not in the "most
> of us" set...  ;-)
>
> Yes, if you are taking a cache miss on each access, you won't notice a
> compare and conditional branch.  But an important subset of well-tuned
> parallel software takes care to promote cache locality, which means
> that you -won't- see very many cache misses.  The extra unnecessary
> conditional branches can place pressure on branch-prediction resources,
> which can result in noticeable performance degradation.
>
> Degrading memory_order_relaxed in this manner will push people back to
> the old workarounds involving volatile and inline assembler, which
> seems quite counterproductive.

Probably less of an issue in Java.  But I agree that if we can find an
acceptable solution that doesn't require this, we should go with it.  I'm
not suggesting we stop work on that.

>
> > Possible long-term approach:
> >
> > - Switch to a much more C++-like (or Java library-like) model in which data
> > races have something like "undefined behavior".  Exactly how to model that
> > is an open question.  Ordinary loads can only see stores that happen-before
> > them, but racing loads trigger "undefined behavior".  "Undefined behavior"
> > should be defined to allow either reporting an error or producing any
> > type-correct answer for the racing load.  Unlike C++, we probably want to be
> > explicit that it disallows a VM crash.
> >
> > - [Probably challenging, others understand the Java constraints better than
> > I] Introduce a mechanism for (nearly) unordered, memory_order_relaxed-like
> > racing loads.  Require current racing accesses to use that mechanism
> > instead.  (Open question: coherence)
> >
> > - Strengthen this memory_order_relaxed analog to guarantee load->store
> > ordering.
>
> Again, I am not a Java expert.  But demanding the unneeded load-to-store
> ordering while dispensing with coherence seems completely backwards.

Agreed.  But that seemed like a somewhat separable issue, so I left it as
an open question.  I think we should add coherence.  That means

>
> > - Implementations may specify that they treat all ordinary accesses as
> > these "memory_order_relaxed" accesses.  Browser-based implementations, or
> > others that require the current security model, would do so.  Whether and
> > how to specify this is open.
> >
> >
> > Transition strategy:
> >
> > - Implementations are not required to break existing code in any way,
> > though the spec would allow them to.  In the near term, we would expect
> > implementations to continue to support (possibly with a suitable flag) the
> > current ill-defined model, along with the new model.  (Which should be
> > easy; the only really new support is for memory_order_relaxed and
> > load->store ordering.)
> >
> > - The fact that data race detection is much easier in the new model may
> > help to inspire people to move to it.
> >
> >
> > Long term potential advantages:
> >
> > - No OOTA issues!
>
> I still believe that we can avoid OOTA without requiring implementations
> to emit unnecessary instructions.

I'm all in favor, so long as the solution doesn't amount to introducing an
even more complicated model that we understand even less well, thus
effectively just kicking the can down the road.  View this as a fallback.
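For reference, the out-of-thin-air question is usually illustrated with the load-buffering litmus test, sketched here in Java with plain int fields (OotaLitmus and runOnce are illustrative names, not from any spec). The only value ever stored is 0, so an outcome such as r1 == r2 == 42 must never be observed. A load->store ordering guarantee rules it out directly, since neither thread's store can become visible before its own preceding load completes. Running it proves nothing about the specification; it just makes the shape of the example concrete.

```java
// Load-buffering ("out of thin air") litmus test with plain int fields.
// Only 0 is ever stored, so an outcome such as r1 == r2 == 42 would be a
// value appearing out of thin air. Load->store ordering forbids it: neither
// thread's store may become visible before its own preceding load completes.
class OotaLitmus {
    static int x, y;    // shared fields, deliberately accessed with races
    static int r1, r2;  // r1 is written only by t1, r2 only by t2

    static int[] runOnce() {
        x = 0;
        y = 0;  // visible to both threads: start() happens-before their actions
        Thread t1 = new Thread(() -> { r1 = x; y = r1; });
        Thread t2 = new Thread(() -> { r2 = y; x = r2; });
        t1.start();
        t2.start();
        try {
            t1.join();  // join() makes each thread's writes visible here
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { r1, r2 };
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            int[] r = runOnce();
            if (r[0] != 0 || r[1] != 0) {
                throw new AssertionError("r1=" + r[0] + ", r2=" + r[1]);
            }
        }
        System.out.println("only r1 == r2 == 0 observed");
    }
}
```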

>
> > - Easier race detection.  A standard conforming JVM should be able to throw
> > a data race exception if it finds one.
>
> You lost me here -- how do needless branches and foregone coherence help
> in detecting data races?

The problem is that Java, unlike C/C++, currently allows unannotated data
races.  Thus detected races are not necessarily bugs, meaning that data
race detectors will unavoidably generate false positives.  This benefit
essentially just derives from introducing a standard way to identify
intentional races that should not be reported.
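To make the race-detection point concrete, here is a minimal sketch, under stated assumptions, of a vector-clock check for a single shared variable in the spirit of the proposal: plain accesses must be ordered by happens-before, and accesses flagged as intentional are exempt, standing in for whatever annotation the relaxed-like mechanism would provide. RaceChecker, DataRaceException, and synchronize() are hypothetical names. A real detector would still check annotated accesses against plain ones; this sketch simply skips them.

```java
// Hypothetical sketch (not part of any JVM): vector-clock data-race check for one
// shared variable. Plain accesses must be ordered by happens-before; accesses
// marked intentional are skipped entirely, standing in for an annotated race.
class RaceChecker {
    static final class DataRaceException extends RuntimeException {
        DataRaceException(String msg) { super(msg); }
    }

    private final int[][] clock;   // clock[t][u]: what thread t knows of thread u
    private final int[] readEpoch; // readEpoch[u]: u's own clock at its last plain read (0 = none)
    private int writer = -1;       // thread of the last plain write (-1 = none)
    private int writeEpoch = 0;    // writer's own clock at that write

    RaceChecker(int threads) {
        clock = new int[threads][threads];
        readEpoch = new int[threads];
        for (int t = 0; t < threads; t++) clock[t][t] = 1;
    }

    // Does the access recorded as (u, epoch) happen-before thread t's current point?
    private boolean happensBefore(int u, int epoch, int t) {
        return u < 0 || epoch <= clock[t][u];
    }

    void read(int t, boolean intentional) {
        if (intentional) return;
        if (!happensBefore(writer, writeEpoch, t))
            throw new DataRaceException("read by T" + t + " races with write by T" + writer);
        readEpoch[t] = clock[t][t];
    }

    void write(int t, boolean intentional) {
        if (intentional) return;
        if (!happensBefore(writer, writeEpoch, t))
            throw new DataRaceException("write by T" + t + " races with write by T" + writer);
        for (int u = 0; u < readEpoch.length; u++)
            if (u != t && readEpoch[u] > clock[t][u])
                throw new DataRaceException("write by T" + t + " races with read by T" + u);
        writer = t;
        writeEpoch = clock[t][t];
    }

    // A release by thread 'from' followed by an acquire in thread 'to'
    // (e.g. unlock/lock of one monitor, or a volatile write/read pair).
    void synchronize(int from, int to) {
        for (int u = 0; u < clock[to].length; u++)
            clock[to][u] = Math.max(clock[to][u], clock[from][u]);
        clock[from][from]++;  // later actions of 'from' are not covered by this edge
    }
}
```

For example, with two threads, write(0, false) followed by read(1, false) throws, while putting synchronize(0, 1) between them, or passing true for the read, does not. Because the only unreported races are the annotated ones, every report is a real bug, which is the benefit being argued for.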

>
> > - Uniform model for basic types and libraries.
>
> Uniform in what way?

Data races are errors in both cases.  Currently Java has (supposedly)
defined semantics for racing accesses to base types like int, and there are
semi-common idioms that rely on those data races.  In contrast, we
effectively treat racing accesses to an ArrayList<Integer> as errors, with no
restriction on their behavior.  I don't think we ever believed this made
much sense.  I think the original intention, at least in the back of my
mind, was that we would eventually fix the ArrayList etc. specifications.
It hasn't happened, and it probably isn't realistically possible.  I'm
proposing that we bite the bullet and say both are errors.
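As one hedged example of those semi-common idioms, the racy single-check idiom used for java.lang.String's cached hash code can be written either with a plain int field, which is legal today only because racing int accesses have defined semantics, or with an explicitly annotated access. The sketch below uses VarHandle opaque mode, a later (Java 9) API, purely as a stand-in for the proposed memory_order_relaxed analog. HashedName and its methods are illustrative, and the hash is computed with Arrays.hashCode rather than String's formula.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.Arrays;

// Racy single-check idiom: the hash is computed lazily and cached in an int
// field that other threads may read concurrently. A thread that sees 0 just
// recomputes; every thread computes the same value, so the race is "benign".
class HashedName {
    private final char[] chars;  // final field: contents safely published

    private int plainHash;   // today: an unannotated data race
    private int opaqueHash;  // annotated: accessed only through OPAQUE_HASH

    private static final VarHandle OPAQUE_HASH;
    static {
        try {
            OPAQUE_HASH = MethodHandles.lookup()
                    .findVarHandle(HashedName.class, "opaqueHash", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    HashedName(String s) { chars = s.toCharArray(); }

    // Relies on Java's current guarantee that racing int accesses are atomic
    // and can only return a value some thread actually wrote.
    int hashPlain() {
        int h = plainHash;
        if (h == 0) {
            h = Arrays.hashCode(chars);
            plainHash = h;
        }
        return h;
    }

    // Same idiom with the race made explicit. Opaque mode is roughly
    // memory_order_relaxed: atomic and coherent per variable, but it imposes
    // no ordering on other memory accesses.
    int hashOpaque() {
        int h = (int) OPAQUE_HASH.getOpaque(this);
        if (h == 0) {
            h = Arrays.hashCode(chars);
            OPAQUE_HASH.setOpaque(this, h);
        }
        return h;
    }
}
```

Under the proposal, only the second form would be a well-defined program; the first would be a data race that a conforming JVM could report.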

>
> > - We can delete the special cases for tearing longs and doubles without
> > unacceptably slowing down some implementations.
>
> The idea here is that tearing becomes legal, correct?  If so, I guess
> I can agree with one out of four!  ;-)

Right, since the issue only arises in erroneous programs.
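For concreteness, the special case in question is JLS 17.7, which lets a write to a non-volatile long or double be treated as two separate 32-bit writes. The sketch below (Tearing, combine, and possibleReads are illustrative names) enumerates what a racing read could then return when a value 'before' is concurrently overwritten with 'after': each 32-bit half independently comes from either write. Under the proposal, only erroneous programs could ever observe the mixed values.

```java
import java.util.Set;
import java.util.TreeSet;

// Enumerates the values a racing read of a non-volatile long may return when
// a write of 'after' races with an existing value 'before', assuming the
// write is split into independent high and low 32-bit halves (JLS 17.7).
class Tearing {
    // High 32 bits from one value, low 32 bits from another.
    static long combine(long highFrom, long lowFrom) {
        return (highFrom & 0xFFFF_FFFF_0000_0000L) | (lowFrom & 0x0000_0000_FFFF_FFFFL);
    }

    static Set<Long> possibleReads(long before, long after) {
        Set<Long> values = new TreeSet<>();
        values.add(combine(before, before)); // == before
        values.add(combine(after, after));   // == after
        values.add(combine(after, before));  // new high half, old low half
        values.add(combine(before, after));  // old high half, new low half
        return values;
    }

    public static void main(String[] args) {
        for (long v : possibleReads(0L, -1L)) {
            System.out.printf("0x%016X%n", v);
        }
    }
}
```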

Hans