[jmm-dev] Jmm revision status

Sat Jul 19 16:39:10 UTC 2014

I would like to make a minor plea to not assume that ARM is an exotic processor example.  Sometimes it sounds like you're saying "if it's ok for ARM then it's going to be ok for everyone".  From my perspective that apple didn't fall far from the tree, so it's not much of a fitness proof if it works for ARM.

That being said, my constructive feedback is that the relative cost of overriding the native RMO order varies a lot among RMO processors, because the space includes many more radical designs.  So forcing LD->ST ordering is not something that's cheap for everyone to do.  I don't see that changing very much in the near future.

Cheers,

Olivier


> On Jul 18, 2014, at 3:21 PM, "Hans Boehm" <boehm at acm.org> wrote:
> 
> I'm not quite sure what you mean by "the usual benign weirdness"?
> 
> Preserving load->store ordering (or equivalently requiring rf U hb to be
> acyclic) leads to an observably different memory model from what we have
> now.  If I have
> 
> Thread1: r1 = x; y = r1;
> 
> Thread2: r2 = y; x = 42;
> 
> r1 = r2 = 42 would no longer be allowed.
> 
> (In the section 6.2 version of the spec, we'd have to write something like
> 
> Thread 2: r2 = y; x = 0 * y + 42;
> 
> to exhibit the difference.)
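To make the example above concrete, here is a minimal sketch in Java that enumerates outcomes under a plain interleaving model. The model is an assumption made for illustration: it is neither the JMM nor any real hardware, and the class and method names (LoadStoreLitmus, outcomes) are hypothetical. Thread 1 always runs in program order because of its data dependency. Thread 2 runs in program order, and optionally also with its store to x moved ahead of its load of y.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class LoadStoreLitmus {
    // The four memory accesses in the two-thread example.
    static final int T1_LOAD_X = 0;   // r1 = x
    static final int T1_STORE_Y = 1;  // y = r1
    static final int T2_LOAD_Y = 2;   // r2 = y
    static final int T2_STORE_X = 3;  // x = 42

    /** Collects every (r1, r2) outcome reachable by interleaving the threads. */
    public static Set<String> outcomes(boolean allowLoadStoreReordering) {
        Set<String> results = new TreeSet<>();
        // Thread 1 has a data dependency, so it is never reordered here.
        int[] t1 = {T1_LOAD_X, T1_STORE_Y};
        List<int[]> t2Orders = new ArrayList<>();
        t2Orders.add(new int[] {T2_LOAD_Y, T2_STORE_X});
        if (allowLoadStoreReordering) {
            // Assumed relaxation: the independent store may pass the load.
            t2Orders.add(new int[] {T2_STORE_X, T2_LOAD_Y});
        }
        for (int[] t2 : t2Orders) {
            interleave(t1, 0, t2, 0, new int[4], 0, results);
        }
        return results;
    }

    // Recursively builds every merge of t1 and t2 that keeps each thread's order.
    private static void interleave(int[] t1, int i, int[] t2, int j,
                                   int[] trace, int n, Set<String> results) {
        if (n == trace.length) {
            results.add(run(trace));
            return;
        }
        if (i < t1.length) {
            trace[n] = t1[i];
            interleave(t1, i + 1, t2, j, trace, n + 1, results);
        }
        if (j < t2.length) {
            trace[n] = t2[j];
            interleave(t1, i, t2, j + 1, trace, n + 1, results);
        }
    }

    // Executes one interleaving against a single shared memory, starting from zero.
    private static String run(int[] trace) {
        int x = 0, y = 0, r1 = 0, r2 = 0;
        for (int op : trace) {
            switch (op) {
                case T1_LOAD_X: r1 = x; break;
                case T1_STORE_Y: y = r1; break;
                case T2_LOAD_Y: r2 = y; break;
                case T2_STORE_X: x = 42; break;
                default: throw new IllegalStateException("unknown op " + op);
            }
        }
        return "r1=" + r1 + " r2=" + r2;
    }

    public static void main(String[] args) {
        System.out.println("load->store preserved:   " + outcomes(false));
        System.out.println("load->store reorderable: " + outcomes(true));
    }
}
```

In this toy model, r1 = r2 = 42 shows up only when the reordering of Thread 2 is allowed, which is the observable difference described above.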
> 
> I don't see how we could both prohibit this in the specification and then
> not actually enforce it at the hardware level.
> 
> Note that having the hardware enforce it often has a minimal or zero
> impact, even on ARMv8, depending on how the initial load is used.  If we
> actually had, on ARMv8:
> 
> Thread2: r2 = y; x = 42; if (r2 > 0) z = 17;
> 
> I believe we can just transform the code to
> 
> Thread2: r2 = y; if (r2 > 0) z = 17; x = 42;
> 
> to preserve the ordering.  We add no instructions, but may stall earlier
> for r2 to become available.
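The two Thread2 shapes above can be written out side by side. This is only a sketch of the code shape, under stated assumptions: the class and method names (StoreSinking, original, transformed, runOnce) are hypothetical, the fields are plain Java statics, and nothing here stops a Java compiler from reordering the source itself. The point is the order a code generator would emit, not a Java-level guarantee. The check is limited to confirming that the two forms agree when run on a single thread.

```java
public class StoreSinking {
    static int x, y, z;

    // Original order: the store to x comes before the branch on r2.
    static int original() {
        int r2 = y;
        x = 42;
        if (r2 > 0) z = 17;
        return r2;
    }

    // Transformed order: the store to x is sunk below the existing branch,
    // so it follows a conditional that depends on the loaded value.
    static int transformed() {
        int r2 = y;
        if (r2 > 0) z = 17;
        x = 42;
        return r2;
    }

    // Resets the shared fields, runs one variant, and reports the final state.
    public static String runOnce(boolean sinkStore, int initialY) {
        x = 0;
        z = 0;
        y = initialY;
        int r2 = sinkStore ? transformed() : original();
        return "r2=" + r2 + " x=" + x + " z=" + z;
    }

    public static void main(String[] args) {
        for (int initialY : new int[] {0, 1}) {
            System.out.println(runOnce(false, initialY) + " | " + runOnce(true, initialY));
        }
    }
}
```

No statement is added or removed; only the position of the store changes, which matches the "no extra instructions, possibly an earlier stall" description.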
> 
> This makes the cost estimates subtle and nontrivial.  A really dumb
> implementation that just adds a conditional branch to each load
> (carefully, so that no branch prediction slots are consumed) would be an
> interesting data point.  Delaying the bogus branch until the next
> non-data-dependent store, and omitting it if there already is an adequate
> branch in the interval, would be better.  Also delaying stores past
> existing conditionals would presumably be best.
> 
> I have no idea how many of these bogus branches actually remain after these
> transformations.  The down side is that this will have a major impact on
> optimization of Java code.  We'd be dramatically changing the ground rules,
> again.
> 
> Hans
> 
> 
>> On Fri, Jul 18, 2014 at 8:38 AM, Doug Lea <dl at cs.oswego.edu> wrote:
>> 
>>> On 07/17/2014 06:57 PM, Hans Boehm wrote:
>>> 
>>> A few other updates:  Brian and I had a paper in MSPC 14
>>> (http://dl.acm.org/citation.cfm?id=2618134) that mostly summarizes the
>>> out-of-thin-air issues and solutions based on prohibiting load->store
>>> reordering.  I would argue that those are still the most practical
>>> solutions we currently have.
>> Has anyone thought through a rule that amounts to requiring
>> compilers to preserve them (as in your section 6), but treating
>> those relaxed/non-atomic cases where ARM/Power don't honor
>> them as just the usual benign weirdness? Are there any
>> cases where the consequences are any worse than other cases
>> that we claim are benign? I think that some of Viktor et al's
>> variants come close to this.
>> 
>> -Doug
>> 
>> 