Updated document on data classes and sealed types

Thu Mar 7 20:45:33 UTC 2019

Thanks for these great comments.  These cut to the heart of some uncomfortable tradeoffs.

> I have two remarks about this proposal. The first is basically: why allow overriding accessors? If a record is required to have a one-to-one correspondence between its (private final) fields and its public accessors, and is required to “give up [its] data freely to all requestors” what possible override could be correct? It makes sense to allow overriding the constructor, for validation and normalization, but once the fields are cemented in place, what could an accessor do but return its corresponding field?

Yes, overriding accessors could be abused to avoid giving up the class’s data; they could be overridden to throw, for example, which would undermine the “give up their data easily” dictum.

Note that if overriding accessors were not allowed, it would be as if the fields were public and final.  (We actually considered that as an option, briefly.)  I actually think that public final fields get a bad rap, but the Uniform Access principle encourages accessors, and public final fields freak a lot of people out.  So between the two, non-overridable accessors seem better.  

But, there’s still a reason to allow overriding accessors — mutable types which don’t provide unmodifiable views — arrays being the obvious case.  Yes, arrays and records are an uncomfortable pairing, but if you can override the accessors, at least you can clone them on the way out.  (If you can’t override the accessors, people might still use records with arrays, and then expose the mutable state, possibly without realizing it.  That seems worse.)  So overriding accessors seems like it should be in the “safe, legal, and rare” category.  
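To make the array case concrete, here is a minimal sketch, written in the record syntax that later shipped in Java 16.  The names Samples and values are made up for illustration, and the copy in the constructor is an extra assumption beyond the "clone on the way out" point above.

```java
// Sketch only: Samples and its component name are hypothetical.
record Samples(int[] values) {
    // Assumption: also copy on the way in, so the caller's array is not shared.
    public Samples {
        values = values.clone();
    }

    // Overridden accessor: hand out a copy rather than the internal array.
    @Override
    public int[] values() {
        return values.clone();
    }
}

class SamplesDemo {
    public static void main(String[] args) {
        Samples s = new Samples(new int[] {1, 2, 3});
        s.values()[0] = 99;                 // mutates only the returned copy
        System.out.println(s.values()[0]);  // prints 1
    }
}
```

Without the overridden accessor, the write through s.values() would land in the record's own array.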

Note too that the deconstruction pattern is likely to delegate to the accessors, so that you only have to override things in one place to prevent mutable state from leaking.  

There’s another consideration, too.  We considered outlawing overriding the equals/hashCode methods.  This goes a long way towards enforcing the desired invariants, but again seems pretty restrictive.  And, having an irregular set of rules about what can be overridden and what can’t (e.g., no to equals, yes to toString), seems likely to (a) make the feature harder to learn/understand and (b) lead to lots more “why can’t I, I just want to ….” complaints.  Better to have an all-or-nothing treatment of overriding, even though people can undermine the intent by careless overriding.  (One thing working in our favor here is that, if you’re overriding a bunch of methods, the concision benefit drops a lot, so that helps limit the problem.)

Before I jump into the second, let me talk about intended overriding modes for the constructor.  These are primarily: validation and normalization.  

The validation cases are obvious:

     record Range(int lo, int hi) { 
          public Range {
               if (lo > hi) throw new LowGreaterThanHighException();
          }
     }

Normalization can happen on single arguments or multiple:

     record Person(String name) { 
          public Person {
               name = name.toUpperCase();
          }
     }

(Note that I’m mutating the parameter, which will then get written to the field.)

     record Rational(int num, int denom) {
          public Rational {
               int gcd = gcd(num, denom);
               num /= gcd;
               denom /= gcd;
          }
     }

> 
> My second remark is much more long-winded, and inspired by the first. The TL;DR version is: what about normalization and derived fields? 

This is two questions :)  Let’s start with the first.

> In the longer version below, I’ll be using Fraction as an example of a simple class that could be a record instead, where normalization is reducing a fraction to simplest form. However, please generalize from this: it could apply to any record where a derived field can be computed from the provided fields by computing a perhaps-expensive pure function on them.

Rational numbers are a great example; Guy raised these earlier as well.  Where rationals challenge the model here is: the user provided a state vector of (4, 2), but the final state of the object is (2, 1).  This is at odds with the following desirable-seeming invariant:

     record Foo(int x, int y)
     assert new Foo(1, 2).x() == 1
     assert new Foo(1, 2).y() == 2

That is, if we normalize any fields in the ctor, then the relationship of “the constructor argument x and the accessor x() are referring to the same state” appears to be severed.

> 
> 
> A Fraction library can’t satisfy both Fran and Peter. It has to choose a place to do this normalization, or else decline to do it at all - but this is no solution, as now the class has very sharp edges, really no more useful than a Pair<Integer, Integer>.

That’s true, but what Peter really _wants_ is an IntIntPair class!  Because his goals are that it should hold the pair, and do no extra computation (and commit to no additional semantic requirements).  And he can easily write one.  (Or, he could get over his micro-performance obsession and use Fran’s class.)  
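For comparison, a pair class of the kind Peter wants could be as small as the sketch below, again in the syntax records later shipped with.  The component names first and second are assumptions; nothing above fixes them.

```java
// Sketch only: a pair with no validation, no normalization, no extra semantics.
record IntIntPair(int first, int second) { }

class IntIntPairDemo {
    public static void main(String[] args) {
        IntIntPair p = new IntIntPair(4, 2);
        // The components come back exactly as they went in.
        System.out.println(p.first() + ", " + p.second());  // prints 4, 2
    }
}
```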

So, let’s wrap up normalization before we get to derived fields.  I don’t mind the Peter/Fran tension here, but I am mildly uncomfortable at the fact that “new Foo(x, y).x() == x” doesn’t always hold, because it complicates an attractive invariant.  

The actual invariant you get with normalization is slightly more complicated: that there be a projection-embedding pair between the constructor arguments and the representation.  Let’s write the ctor args and state as a tuple; while ctor \andThen dtor is not the identity, going around the other way (dtor \andThen ctor) is, as long as the normalization is well-defined and consistently applied.  For example, the arguments (4, 2) produce the state (2, 1), but deconstructing (2, 1) and constructing again gives back (2, 1).  This is a tradeoff of simplicity vs usefulness; overall it seems a fair balance.
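The two directions can be checked with a small sketch.  The gcd helper, the zero-denominator check, and the use of Math.abs are assumptions added to make it self-contained; sign normalization is deliberately left out.

```java
// Sketch only: a normalizing record, in the syntax records later shipped with.
record Rational(int num, int denom) {
    public Rational {
        if (denom == 0) throw new IllegalArgumentException("zero denominator");
        int g = gcd(Math.abs(num), Math.abs(denom));
        num /= g;      // reassigning the parameters; they are then written to the fields
        denom /= g;
    }

    // Hypothetical helper: Euclid's algorithm on non-negative inputs.
    private static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }
}

class RoundTripDemo {
    public static void main(String[] args) {
        Rational r = new Rational(4, 2);

        // ctor then dtor is not the identity: we passed 4 but read back 2.
        System.out.println(r.num() == 4);     // prints false

        // dtor then ctor is: rebuilding from the accessors gives an equal record.
        Rational again = new Rational(r.num(), r.denom());
        System.out.println(again.equals(r));  // prints true
    }
}
```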

> 
> There are two possible solutions I see to this. The first is to permit some kind of derived-field mechanism, preferably lazy. Then, Fraction’s constructor would save a thunk for producing the reduced form, and refer to that thunk in the numerator() and denominator() accessors, but ignore it in the #mul method so that we don’t pay the cost of reducing unless we want it (here, imagine reducing a Fraction is more expensive than allocating a thunk).

The stricture against derived fields was probably the hardest choice here.  On the one hand, strictly derived fields are safe and don’t undermine the invariants; on the other, without more help from the language or runtime, we can’t enforce that additional fields are actually derived, *and* it will be ultra-super-duper-tempting to make them not so.  (I don’t see remotely as much temptation to implement maliciously nonconformant accessors or equals methods.)  If we allowed additional fields, we would surely have to lock down equals/hashCode.  

We’re exploring the notion of lazy final fields; I think that would move the balance on allowing additional fields, since the mechanism would push pretty hard to making them truly derived from the record state.  

> 
> The second is to simply say that Fraction is a bad candidate for a record, because it wants to decouple its interface from its implementation. I think this is actually the right approach, but it may be unconvincing because of how “obvious” it is that a Fraction is just a pair with some extra calculations to perform based on its components. If we say that Fraction is a bad record, I worry that many more bad records like it will be built, and their subtle problems discovered only after their APIs have been published and committed to. Further, if this is indeed a bad record, I can’t think of any other good use case for overriding an accessor method (my first remark).

I’m sympathetic to both sides of this argument.  On the one hand, we want the feature to be useful; on the other, we want it to have a clear, unambiguous user model.

A third explanation is that Peter’s expectations are either unreasonable or inconsistent with the idea of using someone else’s library class.  

> 
> On Fri, Mar 1, 2019 at 12:28 PM Brian Goetz <brian.goetz at oracle.com> wrote:
> I've updated the document on data classes here:
> 
>      http://cr.openjdk.java.net/~briangoetz/amber/datum.html
> 
> (older versions of the document are retained in the same directory for 
> historical comparison.)
> 
> While the previous version was mostly about tradeoffs, this version 
> takes a much more opinionated interpretation of the feature, offering 
> more examples of use cases of where it is intended to be used (and not 
> used).  Many of the "under consideration" flexibilities (extension, 
> mutability, additional fields) have collapsed to their more restrictive 
> form; while some people will be disappointed because it doesn't solve 
> the worst of their boilerplate problems, our conclusion is: records are 
> a powerful feature, but they're not necessarily the delivery vehicle for 
> easing all the (often self-inflicted) pain of JavaBeans.  We can 
> continue to explore relief for these situations too as separate 
> features, but trying to be all things to all classes has delayed the 
> records train long enough, and I'm convinced they're separate problems 
> that want separate solutions.  Time to let the records train roll.
> 
> I've also combined the information on sealed types in this document, as 
> the two are so tightly related.
> 
> Comments welcome.
