Value equality

Wed May 18 22:58:30 UTC 2016

Yes but if CE is a subset of OE then their union is just OE.  In 

– John

> On May 18, 2016, at 3:52 PM, Remi Forax <forax at univ-mlv.fr> wrote:
> 
> Even without talking about optimizations,
> LIFE for a value type is not correct because CE may not be included in EQ.
> 
> value class Foo {
>  int value;
> 
>  public boolean equals(Foo foo) {
>    return value & 0xFF == foo.value & 0xFF;
>  }
> }
> 
> here, if CE do a bitwise comparaison, the semantics of LIFE is not the semantics of EQ.
> 
> Rémi
> 
> ----- Mail original -----
>> De: "John Rose" <john.r.rose at oracle.com>
>> À: "Brian Goetz" <Brian.Goetz at oracle.com>
>> Cc: valhalla-spec-experts at openjdk.java.net
>> Envoyé: Jeudi 19 Mai 2016 00:11:12
>> Objet: Re: Value equality
>> 
>>> On May 18, 2016, at 7:57 AM, Brian Goetz <Brian.Goetz at Oracle.COM> wrote:
>>> 
>>> [1] John points out that if == is CE, then (a==b||a.equals(b)) will
>>> redundantly load the fields on failed ==.  But, many equals
>>> implementations start with "a==b" as a short-circuiting optimization,
>>> which means "a==b" will be a common (pure) subexpression in the resulting
>>> expansion (and for values, methods are monomorphic and will get inlined
>>> more frequently), so the two checks can be collapsed.
>> 
>> Proposed acronym:  Call the explicit legacy idiom for equality "LIFE".  This
>> is "a==b||a.equals(b)", maybe with a null (or NaN?) check.  Semantically it
>> is CE||OE.
>> 
>> The question of optimization is worth expanding on.  In a nutshell, pure
>> control flow test like a.f=ACMP=b.f are often collapsible.  (The dominating
>> test makes the later test go away, or in other words, the dominated test can
>> rise upward to merge into the dominating test.)  But, even pure tests cannot
>> be sunk past exception-producing code.  If a sub-operation of OE can produce
>> an exception, it cannot be reordered with a previous CE sub-operation.
>> Unfortunately, exceptions provide a semantic "window" into order of
>> operations; this is one of the major difficulties with Java optimization.
>> 
>> For example:
>> 
>> if (a.someTag != b.someTag)  return NOT_EQUAL;  //CE, first pass
>> if (!a.equals(b))  return NOT_EQUAL; //OE, second pass
>> return IS_EQUAL;
>> 
>> If the heavy comparison somehow includes a repetition of the "someTag"
>> comparison, that latter comparison can be removed as a dominating test.
>> 
>> But, if the heavy comparison can throw an exception (or make any side
>> effect), then the "someTag" comparison cannot be moved *down* into or past
>> the equals method.
>> 
>> The net of this is that the writer of the equals method has, in general, no
>> control over the first pass of the equals algorithm, even after
>> optimization.  The code might get some spot benefits from removal of
>> explicit tests in the OE pass if they are already covered (dominated) by the
>> CE pass.  If the code *omits* the equivalent of a CE test (e.g., an ACMP
>> comparison) for some reason, the CE test is extra work.  Not much extra
>> work, but it could cause extra fills or spills as the JIT works to provide a
>> first-pass access to a component that will be needed, again, in a second
>> pass after arbitrary intermediate labor.
>> 
>> So a mandatory LIFE sentence (couldn't resist that), if op== is CE (that's
>> P1), means some loss of performance control around equals methods.
>> 
>> By comparison, if op== is OE (that's P2), there is no need to explicitly call
>> CE; the .equals method does the exact job, and the coder has control.
>> 
>> In legacy code where op== is OE, the LIFE expression produces back-to-back
>> calls to .equals, semantically OE||OE.  That would be even worse than
>> CE||OE, unless we have a way to optimize OE||OE to OE, or until the users
>> pick the LIFE (or the Legacy Idiom Connoting Equality) out of their code.
>> 
>> But, if it is allowed, collapsing two adjacent calls to .equals is a much
>> simpler JIT transform than merging an adjacent CE and NE.  It's a
>> macro-level CSE on partially lowered IR.
>> 
>> As noted above, merging a CE into an NE requires that all the logic of the CE
>> be performed before the first possible exception produced by NE.
>> 
>> Let's try a more complete example:
>> 
>> class MyVal {
>>  TypeDesc type;
>>  long payload1;
>>  long[] payload2;
>>  boolean equals(MyVal that) {
>>    return this.type.equals(that.type) && this.payload1 == that.payload1 &&
>>    Arrays.equals(this.payload1, that.payload2);
>>  }
>> }
>> class User {
>>   boolean foo(MyVal a, MyVal b) {
>>      return a==b || a.equals(b);  // CE || OE == LIFE
>>   }
>> }
>> 
>> In P1, the intermediate representation for the legacy idiom will contain
>> these operations:
>> 
>> PASS1: // start CE
>> if (a.type !ACMP= b.type)  goto PASS2;
>> if (a.payload1 !LCMP= b.payload1)  goto PASS2;
>> if (a.payload2 !ACMP= b.payload2)  goto PASS2;
>> return IS_EQUAL;
>> PASS2: // start OE
>> z = a.type.INVOKEVIRTUAL[TypeDesc, equals, (TypeDesc)Z](b.type); //[DT
>> a.type?]
>> … // random logic here; can throw exceptions in general
>> if (!z)  return NOT_EQUAL;
>> if (a.payload1 !LCMP= b.payload1)  return NOT_EQUAL; //[DT a.payload1]
>> z = INVOKESTATIC[Arrays, equals, ([J [J)Z](a.payload2, b.payload2); //[DT
>> a.payload2]
>> … // random logic here; can throw exceptions in general
>> if (!z)  return NOT_EQUAL;
>> return IS_EQUAL;
>> 
>> What's wrong with this?  Well, from the JIT POV it is micromanagement, by the
>> CE code, of the order of operations.  The JIT has to reverse any unfortunate
>> decisions in the CE code if it wants to respect the order of operations in
>> the OE code.
>> 
>> From the POV of the author of MyVal.equals. the CE enforces a policy that it
>> is more profitable to compare the bits of payload1 and the object identity
>> of payload2 before the structure of TypeDesc is consulted.  Sometimes this
>> is right, sometimes it is wrong, but the CE+OE idiom fits everybody into a
>> one-size-fits all policy, and takes away ordering control from the .equals
>> method.
>> 
>> Could the JIT reorder stuff any way it wants, and make the code more like
>> what the author intended?  After all, CE is provably a subset of OE.  There
>> are two problems with that.  First, the subset relation is subtle and not
>> visible (to the JIT) at the bytecode level.  (After all, CE and OE are two
>> completely different bytecode shapes.)
>> 
>> Second, as noted above, the possibility of exceptions places barriers to
>> reordering.  You cannot reorder exceptions with normal exits.  In the above
>> IR, if the payload1 fields differ, TypeDesc.equals is never called, so even
>> if it were going to throw an exception (due to an assert or a bug or a
>> malformed MyVal or TypeDesc, say) the IR says it can't, and the JIT can't
>> reorder the CE code to merge it into the OE code.
>> 
>> In this particular example, the possible optimizations are removal of the
>> dominated tests marked "[DT …]" above.  The DT on payload2 is inside the
>> implementation of Arrays.equals.  The DT on a.type may or may not be present
>> in the TypeDesc.equals method; it's a programmer choice.
>> 
>> More perturbations are possible with larger value objects:  If you add more
>> 64-bit payload fields, the bitwise comparison clearly needs to load the
>> entire value object (from whatever buffer or box it is in) before the first
>> call to TypeDesc.equals.  If the programmer knows that TypeDesc.equals
>> method is likely to prove inequality, without examination of the payload,
>> then the first pass can be wasted bandwidth, in cases of degenerate values
>> with (say) all zeroes payloads.  Is this a problem?  Depends on the data
>> structure.  In this case, the dominated tests (e.g., DT of payload1) can
>> usually be elided, so at worst it is a reordering of programmer intentions.
>> 
>> On the other hand, if you add more reference fields (like type or payload2),
>> the first pass amounts to a reordering of (likely) ACMP operations inside
>> the .equals method of each successive component.  Is that bad?  Well, as
>> noted above, it forces the JIT to load each reference field twice, once to
>> do ACMP, and then (much later) to call .equals.  Are there enough registers
>> to avoid double loads?  Often, yes, until the computation gets register
>> bound, and then you simply have multiple loads, just to ensure the semantics
>> of exception ordering.
>> 
>> In summary, optimizing CE || OE (the legacy semantics of LIFE) is not
>> horrible but will remove some programmer control from .equals ordering, and
>> will cause more data motion in the presence of multiple reference
>> components.
>> 
>> LIFE with P1 will also make the JIT slightly slower and less robust, as it
>> pattern matches on more ad hoc and verbose bytecode shapes.  (Complex
>> pattern matching is difficult; that's why we moved the string concatenation
>> idioms into invokedynamic in JDK 9.)
>> 
>> Let's go to P2 for a moment.  In the presence of LIFE, the IR will first
>> contain unexpanded intrinsics for a repeated OE || OE:
>> 
>> // a==b
>> z = INTRINSIC[a.equals(b)];
>> if (!z)  goto NOT_EQUAL;
>> // a.equals(b)
>> z = INTRINSIC[a.equals(b)];
>> if (!z)  goto NOT_EQUAL;
>> goto IS_EQUAL;
>> 
>> We need a special permission, an above-the-bytecode guarantee that any normal
>> or abnormal result produced by the second call will be produced by the first
>> call.  In that case, we can elide the IR for the second call (as provably
>> redundant):
>> 
>> z = INTRINSIC[a.equals(b)];
>> if (!z)  goto NOT_EQUAL;
>> goto IS_EQUAL;
>> 
>> Expansion then continues as above, but starting with PASS2.  Code needs less
>> optimization and is more under the control of the author of .equals.
>> 
>> HTH
>> — John