Value equality
Remi Forax
forax at univ-mlv.fr
Wed May 18 22:52:18 UTC 2016
Even without talking about optimizations,
LIFE for a value type is not correct because CE may not be included in EQ.
value class Foo {
int value;
public boolean equals(Foo foo) {
return value & 0xFF == foo.value & 0xFF;
}
}
here, if CE do a bitwise comparaison, the semantics of LIFE is not the semantics of EQ.
Rémi
----- Mail original -----
> De: "John Rose" <john.r.rose at oracle.com>
> À: "Brian Goetz" <Brian.Goetz at oracle.com>
> Cc: valhalla-spec-experts at openjdk.java.net
> Envoyé: Jeudi 19 Mai 2016 00:11:12
> Objet: Re: Value equality
>
> On May 18, 2016, at 7:57 AM, Brian Goetz <Brian.Goetz at Oracle.COM> wrote:
> >
> > [1] John points out that if == is CE, then (a==b||a.equals(b)) will
> > redundantly load the fields on failed ==. But, many equals
> > implementations start with "a==b" as a short-circuiting optimization,
> > which means "a==b" will be a common (pure) subexpression in the resulting
> > expansion (and for values, methods are monomorphic and will get inlined
> > more frequently), so the two checks can be collapsed.
>
> Proposed acronym: Call the explicit legacy idiom for equality "LIFE". This
> is "a==b||a.equals(b)", maybe with a null (or NaN?) check. Semantically it
> is CE||OE.
>
> The question of optimization is worth expanding on. In a nutshell, pure
> control flow test like a.f=ACMP=b.f are often collapsible. (The dominating
> test makes the later test go away, or in other words, the dominated test can
> rise upward to merge into the dominating test.) But, even pure tests cannot
> be sunk past exception-producing code. If a sub-operation of OE can produce
> an exception, it cannot be reordered with a previous CE sub-operation.
> Unfortunately, exceptions provide a semantic "window" into order of
> operations; this is one of the major difficulties with Java optimization.
>
> For example:
>
> if (a.someTag != b.someTag) return NOT_EQUAL; //CE, first pass
> if (!a.equals(b)) return NOT_EQUAL; //OE, second pass
> return IS_EQUAL;
>
> If the heavy comparison somehow includes a repetition of the "someTag"
> comparison, that latter comparison can be removed as a dominating test.
>
> But, if the heavy comparison can throw an exception (or make any side
> effect), then the "someTag" comparison cannot be moved *down* into or past
> the equals method.
>
> The net of this is that the writer of the equals method has, in general, no
> control over the first pass of the equals algorithm, even after
> optimization. The code might get some spot benefits from removal of
> explicit tests in the OE pass if they are already covered (dominated) by the
> CE pass. If the code *omits* the equivalent of a CE test (e.g., an ACMP
> comparison) for some reason, the CE test is extra work. Not much extra
> work, but it could cause extra fills or spills as the JIT works to provide a
> first-pass access to a component that will be needed, again, in a second
> pass after arbitrary intermediate labor.
>
> So a mandatory LIFE sentence (couldn't resist that), if op== is CE (that's
> P1), means some loss of performance control around equals methods.
>
> By comparison, if op== is OE (that's P2), there is no need to explicitly call
> CE; the .equals method does the exact job, and the coder has control.
>
> In legacy code where op== is OE, the LIFE expression produces back-to-back
> calls to .equals, semantically OE||OE. That would be even worse than
> CE||OE, unless we have a way to optimize OE||OE to OE, or until the users
> pick the LIFE (or the Legacy Idiom Connoting Equality) out of their code.
>
> But, if it is allowed, collapsing two adjacent calls to .equals is a much
> simpler JIT transform than merging an adjacent CE and NE. It's a
> macro-level CSE on partially lowered IR.
>
> As noted above, merging a CE into an NE requires that all the logic of the CE
> be performed before the first possible exception produced by NE.
>
> Let's try a more complete example:
>
> class MyVal {
> TypeDesc type;
> long payload1;
> long[] payload2;
> boolean equals(MyVal that) {
> return this.type.equals(that.type) && this.payload1 == that.payload1 &&
> Arrays.equals(this.payload1, that.payload2);
> }
> }
> class User {
> boolean foo(MyVal a, MyVal b) {
> return a==b || a.equals(b); // CE || OE == LIFE
> }
> }
>
> In P1, the intermediate representation for the legacy idiom will contain
> these operations:
>
> PASS1: // start CE
> if (a.type !ACMP= b.type) goto PASS2;
> if (a.payload1 !LCMP= b.payload1) goto PASS2;
> if (a.payload2 !ACMP= b.payload2) goto PASS2;
> return IS_EQUAL;
> PASS2: // start OE
> z = a.type.INVOKEVIRTUAL[TypeDesc, equals, (TypeDesc)Z](b.type); //[DT
> a.type?]
> … // random logic here; can throw exceptions in general
> if (!z) return NOT_EQUAL;
> if (a.payload1 !LCMP= b.payload1) return NOT_EQUAL; //[DT a.payload1]
> z = INVOKESTATIC[Arrays, equals, ([J [J)Z](a.payload2, b.payload2); //[DT
> a.payload2]
> … // random logic here; can throw exceptions in general
> if (!z) return NOT_EQUAL;
> return IS_EQUAL;
>
> What's wrong with this? Well, from the JIT POV it is micromanagement, by the
> CE code, of the order of operations. The JIT has to reverse any unfortunate
> decisions in the CE code if it wants to respect the order of operations in
> the OE code.
>
> From the POV of the author of MyVal.equals. the CE enforces a policy that it
> is more profitable to compare the bits of payload1 and the object identity
> of payload2 before the structure of TypeDesc is consulted. Sometimes this
> is right, sometimes it is wrong, but the CE+OE idiom fits everybody into a
> one-size-fits all policy, and takes away ordering control from the .equals
> method.
>
> Could the JIT reorder stuff any way it wants, and make the code more like
> what the author intended? After all, CE is provably a subset of OE. There
> are two problems with that. First, the subset relation is subtle and not
> visible (to the JIT) at the bytecode level. (After all, CE and OE are two
> completely different bytecode shapes.)
>
> Second, as noted above, the possibility of exceptions places barriers to
> reordering. You cannot reorder exceptions with normal exits. In the above
> IR, if the payload1 fields differ, TypeDesc.equals is never called, so even
> if it were going to throw an exception (due to an assert or a bug or a
> malformed MyVal or TypeDesc, say) the IR says it can't, and the JIT can't
> reorder the CE code to merge it into the OE code.
>
> In this particular example, the possible optimizations are removal of the
> dominated tests marked "[DT …]" above. The DT on payload2 is inside the
> implementation of Arrays.equals. The DT on a.type may or may not be present
> in the TypeDesc.equals method; it's a programmer choice.
>
> More perturbations are possible with larger value objects: If you add more
> 64-bit payload fields, the bitwise comparison clearly needs to load the
> entire value object (from whatever buffer or box it is in) before the first
> call to TypeDesc.equals. If the programmer knows that TypeDesc.equals
> method is likely to prove inequality, without examination of the payload,
> then the first pass can be wasted bandwidth, in cases of degenerate values
> with (say) all zeroes payloads. Is this a problem? Depends on the data
> structure. In this case, the dominated tests (e.g., DT of payload1) can
> usually be elided, so at worst it is a reordering of programmer intentions.
>
> On the other hand, if you add more reference fields (like type or payload2),
> the first pass amounts to a reordering of (likely) ACMP operations inside
> the .equals method of each successive component. Is that bad? Well, as
> noted above, it forces the JIT to load each reference field twice, once to
> do ACMP, and then (much later) to call .equals. Are there enough registers
> to avoid double loads? Often, yes, until the computation gets register
> bound, and then you simply have multiple loads, just to ensure the semantics
> of exception ordering.
>
> In summary, optimizing CE || OE (the legacy semantics of LIFE) is not
> horrible but will remove some programmer control from .equals ordering, and
> will cause more data motion in the presence of multiple reference
> components.
>
> LIFE with P1 will also make the JIT slightly slower and less robust, as it
> pattern matches on more ad hoc and verbose bytecode shapes. (Complex
> pattern matching is difficult; that's why we moved the string concatenation
> idioms into invokedynamic in JDK 9.)
>
> Let's go to P2 for a moment. In the presence of LIFE, the IR will first
> contain unexpanded intrinsics for a repeated OE || OE:
>
> // a==b
> z = INTRINSIC[a.equals(b)];
> if (!z) goto NOT_EQUAL;
> // a.equals(b)
> z = INTRINSIC[a.equals(b)];
> if (!z) goto NOT_EQUAL;
> goto IS_EQUAL;
>
> We need a special permission, an above-the-bytecode guarantee that any normal
> or abnormal result produced by the second call will be produced by the first
> call. In that case, we can elide the IR for the second call (as provably
> redundant):
>
> z = INTRINSIC[a.equals(b)];
> if (!z) goto NOT_EQUAL;
> goto IS_EQUAL;
>
> Expansion then continues as above, but starting with PASS2. Code needs less
> optimization and is more under the control of the author of .equals.
>
> HTH
> — John
More information about the valhalla-spec-observers
mailing list