RFR(M): 8080289: Intermediate writes in a loop not eliminated by optimizer

Thu Jun 18 00:31:51 UTC 2015

Forgot to mention - it'd be nice if EA were a bit smarter in C2, e.g.
flow-sensitive like Graal.  Is the plan to leave it alone in C2 and wait for
Graal to mature?
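
To illustrate the distinction, here is a hypothetical sketch (the class and
field names are invented for illustration and do not come from the bug).  The
allocation escapes only on a rarely taken branch.  A flow-insensitive escape
analysis would be expected to treat p as escaping for the whole method, while
a flow-sensitive one could keep p in registers on the common path and
materialize it only inside the branch:

```java
import java.util.ArrayList;
import java.util.List;

final class EscapeSketch {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Hypothetical global sink; storing into it makes the object escape.
    static final List<Point> LEAKED = new ArrayList<>();

    static int lengthSquared(int x, int y, boolean rare) {
        Point p = new Point(x, y);
        if (rare) {
            LEAKED.add(p);   // p escapes only on this branch
        }
        // On the common path p never leaves the method, so the allocation
        // is a candidate for scalar replacement there.
        return p.x * p.x + p.y * p.y;
    }
}
```

Whether a given JIT actually removes the allocation here depends on inlining
and profiling, so treat this as a shape to reason about, not a benchmark.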

sent from my phone
On Jun 17, 2015 8:28 PM, "Vitaly Davidovich" <vitalyd at gmail.com> wrote:

> So I'm not sure how many cases will arise where scheduling stores is
> beneficial (on modern cpus) apart from removing redundant ones.  The
> compiler would need some seriously detailed machine model, I think, to
> reason about this intelligently.  Removing redundant ones (or moving loop
> invariant ones out of loops, like Roland is trying here) seems more
> tractable and beneficial? Are there cases beyond this where it would be
> profitable? Perhaps scheduling writes to addresses likely to be on same
> cacheline maybe ...
>
> As for removing StoreStore barriers, it seems like that's practically
> feasible with java's semantics only when EA kicks in; I'm having a hard
> time imagining how the JIT can trace unsafe/racy publication reliably and
> with minimal overhead.  Perhaps I'm not thinking hard enough though ...
>
> It's almost unfortunate that final fields were granted this right to be
> published unsafely :) - would've been perhaps better if explicit fencing
> was required for such specialized case.
>
> sent from my phone
> On Jun 17, 2015 5:27 PM, "John Rose" <john.r.rose at oracle.com> wrote:
>
>
> On Jun 17, 2015, at 1:23 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>
>> Nope, that's an oversimplified understanding.  One place where the JMM
>> will bite you is with publication of object state via final fields. Normal
>> stores used to initialize a structure which is published via final-field
>> semantics must be ordered to take place before the object is published.  We
>> don't (and perhaps can't) track object publication events, nor their
>> relation to stores into newly-reachable subgraphs.  Instead, we have fences
>> that gently but firmly ensure that data (from normal stores, even to
>> non-final fields and array elements!) is posted to memory before any store
>> which could be a publishing store for that data.
>
>
> Not sure what's oversimplified —
>
>
> I probably misread you, then.
>
> you're describing a JMM semantic for final fields, which I'd expect to be
> modeled as barriers in the IR, just like volatile writes would be modeled
> as barriers, preventing removal or reordering of them.  I appreciate that
> it can be troublesome to track this information, but that only means
> the compiler will have to be more conservative and there may be some
> optimization opportunities lost.  I'd think the pattern would look like:
>
> obj = allocZerodMemory(); // obj has final fields
> obj.ctor(); // arbitrarily long/complex CFG
> StoreStore
> _someRef = obj;
>
> I'd expect redundant stores to be removed as part of ctor() CFG without
> violating the storestore barrier.  But, I do understand the
> complexity/trickiness of getting this right.
>
>
> You are correct.  The StoreStore approximates the point at which the
> object is first published to other threads.  All normal stores above the
> StoreStore can be issued in any order (as far as this fence is concerned)
> but must settle before the object is published.  Presumably it is published
> shortly after the StoreStore, and the StoreStore could be sunk until that
> point, if we wanted to do this, or even eliminated if the object never gets
> published.  Also, stores provably unrelated to (unreachable from) the
> published object could drop below the StoreStore.  We don't attempt to make
> this distinction.  None of these trains of thought affects the basic
> assertion that (if fences are absent) normal stores can be reordered.
>
> If we wish to remove that StoreStore (for some reason) we would either
> need a more precise set of fences (or HB edges), or else we would have to
> hold back on aggressive store reordering.  This is what makes me think we
> may discover a missing fence, once we start letting those little stores
> swarm around each other.
>
> What makes me more nervous about this is the clear fact that non-TSO
> platforms (PPC, Itanium) have to tweak their fences in various ad hoc ways
> to avoid breaking user code.  See, for example, Parse::do_exits.  If we
> make our thread-local orderings more non-TSO-ish, we might run into the
> same subtle issues that the PPC port wrestles with.  By "subtle" I partly
> mean "relating to unstated user expectations even if not supported by the
> JMM", and I also mean "hard to detect, characterize, and fix".
>
> — John
>
>
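
The pattern discussed in the thread can be sketched in plain Java as follows.
This is a minimal illustration under stated assumptions: PublishSketch, Holder
and shared are hypothetical names, and the comment marking the barrier shows
where the StoreStore is conceptually placed, not what any particular compiler
emits.  The loop performs intermediate stores to the same element; only the
last one needs to be settled before the publishing store.

```java
final class PublishSketch {
    static final class Holder {
        final int[] data;   // final field: gives the constructor freeze semantics

        Holder(int n) {
            int[] a = new int[1];
            for (int i = 0; i < n; i++) {
                a[0] = i;   // intermediate stores; all but the last are redundant
            }
            this.data = a;
            // Conceptually a StoreStore barrier sits here, at the end of a
            // constructor that writes final fields.
        }
    }

    // Plain (non-volatile) field, so publication through it is racy.
    static Holder shared;

    static void publish(int n) {
        shared = new Holder(n);   // the publishing store
    }
}
```

Collapsing the loop into a single store of the final value stays above the
barrier, so it does not interfere with the ordering that the publishing store
relies on.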