Value types questions & comments
John Rose
john.r.rose at oracle.com
Mon May 16 19:41:12 UTC 2016
On Apr 12, 2016, at 1:51 PM, Brian Goetz <Brian.Goetz at Oracle.COM> wrote:
>
>>
>> Using a value type for something that isn't a value raises alarm bells for me. At the minimum I would expect this user to have to implement eq/hc by hand, because the default behavior users want 99% of the time is (deep) content-based equality.
>
> This may be the reality-distortion field speaking, but in my view a reference *is* a kind of value — albeit a very special kind. They’re immutable, like other values.
To be more precise: Values have no mutable subfields. References and primitives have no mutable subfields. (References, when not null, point to various objects, some of which themselves have mutable subfields.) In this way, they are all similar.
Also, *some* value or references (but not any primitives or some references like String) *may* refer to objects which have mutable subfields or some other kind of mutable state.
Mainly because of this property, values, primitives, and references are at least partially referentially transparent under copy operations. When you copy a value (of any sort, ref or val), you capture all of its substructure, including any possible future value of its substructure. But the copy is not infinitely deep. (That would be an additional commitment.) The copy only copies the immediate subfields of the value: The whole ref, both halves of the long, all immediate fields of a custom value type.
So far a reference is indistinguishable from a one-field value wrapping a reference. The next question is whether a value-wrapped reference should be restricted in its mutability (or in its type, etc.). I think we have to give a firm *no* here; there's too much to lose from making restrictions that are not natural to the computational machinery of the JVM.
For example, if (following some well-intentioned commentators) we require that the transitive closure of a value be fully immutable ("like an int, right?") we give up the ability to use values as cursors and tuples. Q: "Why can't I return two values from my method without boxing to the heap?" A: "Well, one of your values was really a reference to mutable state, don't you see." That's not a conversation I am willing to have. Some languages (even on the JVM) might not allow partial immutability, but Java must, IMO.
> Almost all their state is encapsulated (they can be compared by identity, that’s it). They can only be constructed by privileged factories (we call these constructors.) But, ultimately, they behave like values — they are passed by value, they have no identity of their own.
…In any case, values with immutable immediate components lead us to questions about their equivalence relations (equality).
There is more than one natural-looking equality predicate that can be assigned to a tuple (and therefore to any value). This means any choice of default has a ready-made rebuttal.
1. Component-wise boxed normal. Equivalent to boxing the tuple into a List<Object>, and running List.equals.
2. Component-wise op==. Equivalent to using the Java op== on each component, and-ing the bits together, or (if possible) packing into an array and using Arrays.equals.
3. Copy-wise equality. Component-wise op==, except that floats and doubles are converted to their corresponding "raw bits".
And, perhaps a surprise:
0. Component-wise normal. Equivalent to boxing the tuple into a List<Object>, and running List.equals, except that floats and doubles are boxed to a non-standard container that lifts floating-point equality op==(float,float).
(Non-tuple values can be thought of as those with private state, as hashcode-savers, or with value-specific symmetries, as rationals. These guys should define their own equality relations explicitly.)
The basic choice (for tuples) is whether to recurse into "equals" on references (choices 0, 1). A secondary choice is whether to treat floats and doubles copy-wise (choices 0, 2). If not copy-wise, there is a further choice between Float.equals and op==(float,float). (Sadly, they are not exactly the same; see the javadoc!)
The numbered cases 0..3 are in order of increasing refinement. Actually, 2 does not exactly refine 1, because of the difference between boxed and unboxed float equality. But #3 distinguishes the logical maximum number of values. (I.e., there is no stronger equality relation, where the "weakest" relation would make everything appear equal.)
#0 captures the common practice of using op==(val,val) for values and Object.equals for refs. I'd like to call this "normal equality", and it is may be what we want for any-variables. (<T> T t1, t2; t1==t2 could be normal in this way.) For floats there's a second choice that may be called "boxed normal equality", where Object.equal is used uniformly after boxing non-refs. (You can also invent more: What if the refs are box types? What about arrays? Etc.)
Both kinds of "normal" of equality (actually, all kinds that resort to Object.equals) are in tension with what I want to call "copy-wise equality". Two values are copy-wise equal if and only if there is no way to prove that one is not an copy (by assignment) of the other. This is basically bitwise equality, but don't tell implementors that, because they will be required to skip padding bits, and to traverse internal indirections (if the implementation uses them), to prevent spurious inequality results.
Copy-wise equality is sometimes desirable, in order to prove that an operand has been seen before. (IdentityHashMap is used today for this purpose.) I think it should get a separate name, "System.isEqualCopy(T,T)" or some such. It is possible to imagine value types (such as IHM entries or proxies) which would use copy-wise equality on some of their fields.
Here's my evaluation of the four choices:
0. Component-wise normal. This is equivalent to writing the sort of method that people already write, which performs .equals on refs and op== on vals. Probably the least surprising. It is also generally useful, having worked well for collections. Use it if you can afford it. But beware: Cursors require identity comparison on any backing store component instead of Object.equals.
1. Component-wise boxed normal. Easy to describe, but has an unfortunate dependency on the surprising Float.equals, which makes it unsuitable for numerically-oriented values. Avoid for that reason in favor of #0.
2. Component-wise op==. A bad candidate for V.equals, but sometimes a useful analog to op==(ref,ref). However, because of float oddities, consider favoring #3 instead.
3. Copy-wise equality. Component-wise op==, except that floats and doubles are converted to their corresponding "raw bits". Distinguishes the logical maximum number of values (there is no stronger equality relation). Should not be the default, but give it a name and allow users and value class writers to apply it when needed.
Based on this analysis, if we were to assign a default equality predicate to value types, I think the weakest (#0) would be best. This is the one which works most like collections, but which "respects" primitives more by not boxing them.
I think the current translation strategy uses normal equality for op==(T,T) where T is an any-type variable. This is consistent with imagining the default equality predicate (#0 as recommended) being expanded from any-generic code that gives each field a separate any-type.
(IMO, this is a desirable incompatibility with legacy generics, where op==(T,T) means op==(ref,ref), and usually requires an explicit backup call to Object.equals.)
Note that copy-wise equality breaks abstraction boundaries in a way analogous to its version applied to refs, which is op==(ref,ref). It can be used to prove that two apparently equivalent values (or refs) are in fact different in some way (perhaps an invisible way). Open question: Are there some values that would want to specialize the isEqualCopy method, perhaps by making it behave more like the equals method? Is that worth the cost of virtualizing this operation?
The costs (in user model and implementation complexity) of isEqualCopy are (in my mind) the exact counterpart of today's costs of reference equality. (They add complexity to the user model which is not always needed, as for Strings, and they prevent some important unbox/box optimizations when EA fails.)
— John
P.S. Exercise for the reader: For a value type V which needs deep equality through many layers of indirect structure, it might be desirable to have factory methods (V constructors) perform interning (invisibly) on indirect portions of the value, using something like an IdentityHashTable (private to the implementation). How could such a value avoid a full recursion on every call to V.equals? What would the implementation look like? Does this technique require a cost increase somewhere other than V.equals? Under what circumstances will the cost savings in V.equals pay for any other costs? (Hint: Consider using System.isEqualCopy.)
More information about the valhalla-spec-observers
mailing list