Value equality

John Rose john.r.rose at oracle.com
Wed May 18 22:11:12 UTC 2016


On May 18, 2016, at 7:57 AM, Brian Goetz <Brian.Goetz at Oracle.COM> wrote:
> 
> [1] John points out that if == is CE, then (a==b||a.equals(b)) will redundantly load the fields on failed ==.  But, many equals implementations start with "a==b" as a short-circuiting optimization, which means "a==b" will be a common (pure) subexpression in the resulting expansion (and for values, methods are monomorphic and will get inlined more frequently), so the two checks can be collapsed.

Proposed acronym:  Call the explicit legacy idiom for equality "LIFE".  This is "a==b||a.equals(b)", maybe with a null (or NaN?) check.  Semantically it is CE||OE.

The question of optimization is worth expanding on.  In a nutshell, pure control flow test like a.f=ACMP=b.f are often collapsible.  (The dominating test makes the later test go away, or in other words, the dominated test can rise upward to merge into the dominating test.)  But, even pure tests cannot be sunk past exception-producing code.  If a sub-operation of OE can produce an exception, it cannot be reordered with a previous CE sub-operation.  Unfortunately, exceptions provide a semantic "window" into order of operations; this is one of the major difficulties with Java optimization.

For example:

if (a.someTag != b.someTag)  return NOT_EQUAL;  //CE, first pass
if (!a.equals(b))  return NOT_EQUAL; //OE, second pass
return IS_EQUAL;

If the heavy comparison somehow includes a repetition of the "someTag" comparison, that latter comparison can be removed as a dominating test.

But, if the heavy comparison can throw an exception (or make any side effect), then the "someTag" comparison cannot be moved *down* into or past the equals method.

The net of this is that the writer of the equals method has, in general, no control over the first pass of the equals algorithm, even after optimization.  The code might get some spot benefits from removal of explicit tests in the OE pass if they are already covered (dominated) by the CE pass.  If the code *omits* the equivalent of a CE test (e.g., an ACMP comparison) for some reason, the CE test is extra work.  Not much extra work, but it could cause extra fills or spills as the JIT works to provide a first-pass access to a component that will be needed, again, in a second pass after arbitrary intermediate labor.

So a mandatory LIFE sentence (couldn't resist that), if op== is CE (that's P1), means some loss of performance control around equals methods.

By comparison, if op== is OE (that's P2), there is no need to explicitly call CE; the .equals method does the exact job, and the coder has control.

In legacy code where op== is OE, the LIFE expression produces back-to-back calls to .equals, semantically OE||OE.  That would be even worse than CE||OE, unless we have a way to optimize OE||OE to OE, or until the users pick the LIFE (or the Legacy Idiom Connoting Equality) out of their code.

But, if it is allowed, collapsing two adjacent calls to .equals is a much simpler JIT transform than merging an adjacent CE and NE.  It's a macro-level CSE on partially lowered IR.

As noted above, merging a CE into an NE requires that all the logic of the CE be performed before the first possible exception produced by NE.

Let's try a more complete example:

class MyVal {
  TypeDesc type;
  long payload1;
  long[] payload2;
  boolean equals(MyVal that) {
    return this.type.equals(that.type) && this.payload1 == that.payload1 && Arrays.equals(this.payload1, that.payload2);
  }
}
class User {
   boolean foo(MyVal a, MyVal b) {
      return a==b || a.equals(b);  // CE || OE == LIFE
   }
}

In P1, the intermediate representation for the legacy idiom will contain these operations:

PASS1: // start CE
if (a.type !ACMP= b.type)  goto PASS2;
if (a.payload1 !LCMP= b.payload1)  goto PASS2;
if (a.payload2 !ACMP= b.payload2)  goto PASS2;
return IS_EQUAL;
PASS2: // start OE
z = a.type.INVOKEVIRTUAL[TypeDesc, equals, (TypeDesc)Z](b.type); //[DT a.type?]
… // random logic here; can throw exceptions in general
if (!z)  return NOT_EQUAL;
if (a.payload1 !LCMP= b.payload1)  return NOT_EQUAL; //[DT a.payload1]
z = INVOKESTATIC[Arrays, equals, ([J [J)Z](a.payload2, b.payload2); //[DT a.payload2]
… // random logic here; can throw exceptions in general
if (!z)  return NOT_EQUAL;
return IS_EQUAL;

What's wrong with this?  Well, from the JIT POV it is micromanagement, by the CE code, of the order of operations.  The JIT has to reverse any unfortunate decisions in the CE code if it wants to respect the order of operations in the OE code.

From the POV of the author of MyVal.equals. the CE enforces a policy that it is more profitable to compare the bits of payload1 and the object identity of payload2 before the structure of TypeDesc is consulted.  Sometimes this is right, sometimes it is wrong, but the CE+OE idiom fits everybody into a one-size-fits all policy, and takes away ordering control from the .equals method.

Could the JIT reorder stuff any way it wants, and make the code more like what the author intended?  After all, CE is provably a subset of OE.  There are two problems with that.  First, the subset relation is subtle and not visible (to the JIT) at the bytecode level.  (After all, CE and OE are two completely different bytecode shapes.)

Second, as noted above, the possibility of exceptions places barriers to reordering.  You cannot reorder exceptions with normal exits.  In the above IR, if the payload1 fields differ, TypeDesc.equals is never called, so even if it were going to throw an exception (due to an assert or a bug or a malformed MyVal or TypeDesc, say) the IR says it can't, and the JIT can't reorder the CE code to merge it into the OE code.

In this particular example, the possible optimizations are removal of the dominated tests marked "[DT …]" above.  The DT on payload2 is inside the implementation of Arrays.equals.  The DT on a.type may or may not be present in the TypeDesc.equals method; it's a programmer choice.

More perturbations are possible with larger value objects:  If you add more 64-bit payload fields, the bitwise comparison clearly needs to load the entire value object (from whatever buffer or box it is in) before the first call to TypeDesc.equals.  If the programmer knows that TypeDesc.equals method is likely to prove inequality, without examination of the payload, then the first pass can be wasted bandwidth, in cases of degenerate values with (say) all zeroes payloads.  Is this a problem?  Depends on the data structure.  In this case, the dominated tests (e.g., DT of payload1) can usually be elided, so at worst it is a reordering of programmer intentions.

On the other hand, if you add more reference fields (like type or payload2), the first pass amounts to a reordering of (likely) ACMP operations inside the .equals method of each successive component.  Is that bad?  Well, as noted above, it forces the JIT to load each reference field twice, once to do ACMP, and then (much later) to call .equals.  Are there enough registers to avoid double loads?  Often, yes, until the computation gets register bound, and then you simply have multiple loads, just to ensure the semantics of exception ordering.

In summary, optimizing CE || OE (the legacy semantics of LIFE) is not horrible but will remove some programmer control from .equals ordering, and will cause more data motion in the presence of multiple reference components.

LIFE with P1 will also make the JIT slightly slower and less robust, as it pattern matches on more ad hoc and verbose bytecode shapes.  (Complex pattern matching is difficult; that's why we moved the string concatenation idioms into invokedynamic in JDK 9.)

Let's go to P2 for a moment.  In the presence of LIFE, the IR will first contain unexpanded intrinsics for a repeated OE || OE:

// a==b
z = INTRINSIC[a.equals(b)];
if (!z)  goto NOT_EQUAL;
// a.equals(b)
z = INTRINSIC[a.equals(b)];
if (!z)  goto NOT_EQUAL;
goto IS_EQUAL;

We need a special permission, an above-the-bytecode guarantee that any normal or abnormal result produced by the second call will be produced by the first call.  In that case, we can elide the IR for the second call (as provably redundant):

z = INTRINSIC[a.equals(b)];
if (!z)  goto NOT_EQUAL;
goto IS_EQUAL;

Expansion then continues as above, but starting with PASS2.  Code needs less optimization and is more under the control of the author of .equals.

HTH
— John


More information about the valhalla-spec-observers mailing list