Value equality
John Rose
john.r.rose at oracle.com
Wed May 18 22:58:30 UTC 2016
Yes but if CE is a subset of OE then their union is just OE. In
– John
> On May 18, 2016, at 3:52 PM, Remi Forax <forax at univ-mlv.fr> wrote:
>
> Even without talking about optimizations,
> LIFE for a value type is not correct because CE may not be included in EQ.
>
> value class Foo {
> int value;
>
> public boolean equals(Foo foo) {
> return value & 0xFF == foo.value & 0xFF;
> }
> }
>
> here, if CE do a bitwise comparaison, the semantics of LIFE is not the semantics of EQ.
>
> Rémi
>
> ----- Mail original -----
>> De: "John Rose" <john.r.rose at oracle.com>
>> À: "Brian Goetz" <Brian.Goetz at oracle.com>
>> Cc: valhalla-spec-experts at openjdk.java.net
>> Envoyé: Jeudi 19 Mai 2016 00:11:12
>> Objet: Re: Value equality
>>
>>> On May 18, 2016, at 7:57 AM, Brian Goetz <Brian.Goetz at Oracle.COM> wrote:
>>>
>>> [1] John points out that if == is CE, then (a==b||a.equals(b)) will
>>> redundantly load the fields on failed ==. But, many equals
>>> implementations start with "a==b" as a short-circuiting optimization,
>>> which means "a==b" will be a common (pure) subexpression in the resulting
>>> expansion (and for values, methods are monomorphic and will get inlined
>>> more frequently), so the two checks can be collapsed.
>>
>> Proposed acronym: Call the explicit legacy idiom for equality "LIFE". This
>> is "a==b||a.equals(b)", maybe with a null (or NaN?) check. Semantically it
>> is CE||OE.
>>
>> The question of optimization is worth expanding on. In a nutshell, pure
>> control flow test like a.f=ACMP=b.f are often collapsible. (The dominating
>> test makes the later test go away, or in other words, the dominated test can
>> rise upward to merge into the dominating test.) But, even pure tests cannot
>> be sunk past exception-producing code. If a sub-operation of OE can produce
>> an exception, it cannot be reordered with a previous CE sub-operation.
>> Unfortunately, exceptions provide a semantic "window" into order of
>> operations; this is one of the major difficulties with Java optimization.
>>
>> For example:
>>
>> if (a.someTag != b.someTag) return NOT_EQUAL; //CE, first pass
>> if (!a.equals(b)) return NOT_EQUAL; //OE, second pass
>> return IS_EQUAL;
>>
>> If the heavy comparison somehow includes a repetition of the "someTag"
>> comparison, that latter comparison can be removed as a dominating test.
>>
>> But, if the heavy comparison can throw an exception (or make any side
>> effect), then the "someTag" comparison cannot be moved *down* into or past
>> the equals method.
>>
>> The net of this is that the writer of the equals method has, in general, no
>> control over the first pass of the equals algorithm, even after
>> optimization. The code might get some spot benefits from removal of
>> explicit tests in the OE pass if they are already covered (dominated) by the
>> CE pass. If the code *omits* the equivalent of a CE test (e.g., an ACMP
>> comparison) for some reason, the CE test is extra work. Not much extra
>> work, but it could cause extra fills or spills as the JIT works to provide a
>> first-pass access to a component that will be needed, again, in a second
>> pass after arbitrary intermediate labor.
>>
>> So a mandatory LIFE sentence (couldn't resist that), if op== is CE (that's
>> P1), means some loss of performance control around equals methods.
>>
>> By comparison, if op== is OE (that's P2), there is no need to explicitly call
>> CE; the .equals method does the exact job, and the coder has control.
>>
>> In legacy code where op== is OE, the LIFE expression produces back-to-back
>> calls to .equals, semantically OE||OE. That would be even worse than
>> CE||OE, unless we have a way to optimize OE||OE to OE, or until the users
>> pick the LIFE (or the Legacy Idiom Connoting Equality) out of their code.
>>
>> But, if it is allowed, collapsing two adjacent calls to .equals is a much
>> simpler JIT transform than merging an adjacent CE and NE. It's a
>> macro-level CSE on partially lowered IR.
>>
>> As noted above, merging a CE into an NE requires that all the logic of the CE
>> be performed before the first possible exception produced by NE.
>>
>> Let's try a more complete example:
>>
>> class MyVal {
>> TypeDesc type;
>> long payload1;
>> long[] payload2;
>> boolean equals(MyVal that) {
>> return this.type.equals(that.type) && this.payload1 == that.payload1 &&
>> Arrays.equals(this.payload1, that.payload2);
>> }
>> }
>> class User {
>> boolean foo(MyVal a, MyVal b) {
>> return a==b || a.equals(b); // CE || OE == LIFE
>> }
>> }
>>
>> In P1, the intermediate representation for the legacy idiom will contain
>> these operations:
>>
>> PASS1: // start CE
>> if (a.type !ACMP= b.type) goto PASS2;
>> if (a.payload1 !LCMP= b.payload1) goto PASS2;
>> if (a.payload2 !ACMP= b.payload2) goto PASS2;
>> return IS_EQUAL;
>> PASS2: // start OE
>> z = a.type.INVOKEVIRTUAL[TypeDesc, equals, (TypeDesc)Z](b.type); //[DT
>> a.type?]
>> … // random logic here; can throw exceptions in general
>> if (!z) return NOT_EQUAL;
>> if (a.payload1 !LCMP= b.payload1) return NOT_EQUAL; //[DT a.payload1]
>> z = INVOKESTATIC[Arrays, equals, ([J [J)Z](a.payload2, b.payload2); //[DT
>> a.payload2]
>> … // random logic here; can throw exceptions in general
>> if (!z) return NOT_EQUAL;
>> return IS_EQUAL;
>>
>> What's wrong with this? Well, from the JIT POV it is micromanagement, by the
>> CE code, of the order of operations. The JIT has to reverse any unfortunate
>> decisions in the CE code if it wants to respect the order of operations in
>> the OE code.
>>
>> From the POV of the author of MyVal.equals. the CE enforces a policy that it
>> is more profitable to compare the bits of payload1 and the object identity
>> of payload2 before the structure of TypeDesc is consulted. Sometimes this
>> is right, sometimes it is wrong, but the CE+OE idiom fits everybody into a
>> one-size-fits all policy, and takes away ordering control from the .equals
>> method.
>>
>> Could the JIT reorder stuff any way it wants, and make the code more like
>> what the author intended? After all, CE is provably a subset of OE. There
>> are two problems with that. First, the subset relation is subtle and not
>> visible (to the JIT) at the bytecode level. (After all, CE and OE are two
>> completely different bytecode shapes.)
>>
>> Second, as noted above, the possibility of exceptions places barriers to
>> reordering. You cannot reorder exceptions with normal exits. In the above
>> IR, if the payload1 fields differ, TypeDesc.equals is never called, so even
>> if it were going to throw an exception (due to an assert or a bug or a
>> malformed MyVal or TypeDesc, say) the IR says it can't, and the JIT can't
>> reorder the CE code to merge it into the OE code.
>>
>> In this particular example, the possible optimizations are removal of the
>> dominated tests marked "[DT …]" above. The DT on payload2 is inside the
>> implementation of Arrays.equals. The DT on a.type may or may not be present
>> in the TypeDesc.equals method; it's a programmer choice.
>>
>> More perturbations are possible with larger value objects: If you add more
>> 64-bit payload fields, the bitwise comparison clearly needs to load the
>> entire value object (from whatever buffer or box it is in) before the first
>> call to TypeDesc.equals. If the programmer knows that TypeDesc.equals
>> method is likely to prove inequality, without examination of the payload,
>> then the first pass can be wasted bandwidth, in cases of degenerate values
>> with (say) all zeroes payloads. Is this a problem? Depends on the data
>> structure. In this case, the dominated tests (e.g., DT of payload1) can
>> usually be elided, so at worst it is a reordering of programmer intentions.
>>
>> On the other hand, if you add more reference fields (like type or payload2),
>> the first pass amounts to a reordering of (likely) ACMP operations inside
>> the .equals method of each successive component. Is that bad? Well, as
>> noted above, it forces the JIT to load each reference field twice, once to
>> do ACMP, and then (much later) to call .equals. Are there enough registers
>> to avoid double loads? Often, yes, until the computation gets register
>> bound, and then you simply have multiple loads, just to ensure the semantics
>> of exception ordering.
>>
>> In summary, optimizing CE || OE (the legacy semantics of LIFE) is not
>> horrible but will remove some programmer control from .equals ordering, and
>> will cause more data motion in the presence of multiple reference
>> components.
>>
>> LIFE with P1 will also make the JIT slightly slower and less robust, as it
>> pattern matches on more ad hoc and verbose bytecode shapes. (Complex
>> pattern matching is difficult; that's why we moved the string concatenation
>> idioms into invokedynamic in JDK 9.)
>>
>> Let's go to P2 for a moment. In the presence of LIFE, the IR will first
>> contain unexpanded intrinsics for a repeated OE || OE:
>>
>> // a==b
>> z = INTRINSIC[a.equals(b)];
>> if (!z) goto NOT_EQUAL;
>> // a.equals(b)
>> z = INTRINSIC[a.equals(b)];
>> if (!z) goto NOT_EQUAL;
>> goto IS_EQUAL;
>>
>> We need a special permission, an above-the-bytecode guarantee that any normal
>> or abnormal result produced by the second call will be produced by the first
>> call. In that case, we can elide the IR for the second call (as provably
>> redundant):
>>
>> z = INTRINSIC[a.equals(b)];
>> if (!z) goto NOT_EQUAL;
>> goto IS_EQUAL;
>>
>> Expansion then continues as above, but starting with PASS2. Code needs less
>> optimization and is more under the control of the author of .equals.
>>
>> HTH
>> — John
More information about the valhalla-spec-observers
mailing list