How records would fit into Google's codebase

Wed Apr 3 18:53:22 UTC 2019

Thanks Alan for this good work; grounding the analysis in real codebases is a valuable tool for validating our theories.

Some comments inline.  

> One additional thing I would have liked to do is to somehow find classes which were “almost” records, but which ended up not using @AutoValue. A survey of these might help us decide what changes we could make to increase adoption, or simply confirm for us, “Yes, it’s a good thing we included restriction X, because this class is a bad candidate for a record, but without restriction X it might have become a record”. Alas, unsurprisingly, it is much easier to find actual @AutoValue classes than to design a heuristic for “almost an @AutoValue”, and so I have not done this.

This is desirable, but as you point out, hard to do — you can’t grep for “@WishIWasAnAutoValue”.  Often the best we can do is recall specific instances of coding in the past when we tumbled off the cliff, and try to reconstruct what was going on.

> * Records can expect to be about as common as enums

If this were the outcome, I’d call it winning, though I hope for more.  The concision of records should make people more interested in using them as _local_ classes, as pure implementation details (such as the stream example in my writeup).
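To make the local-class use concrete, here is a minimal sketch. It is not the stream example from the writeup, which is not reproduced here. It assumes the `record Name(components) {}` declaration form as it later shipped in Java 16, including records declared inside a method; `LocalRecordDemo`, `WordLength`, and `longestWords` are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class LocalRecordDemo {
    // Returns the n longest words, longest first.
    static List<String> longestWords(List<String> words, int n) {
        // Local record: visible only inside this method, so it stays
        // a pure implementation detail of the pipeline below.
        record WordLength(String word, int length) {}

        return words.stream()
                .map(w -> new WordLength(w, w.length()))
                .sorted(Comparator.comparingInt(WordLength::length).reversed())
                .limit(n)
                .map(WordLength::word)
                .collect(Collectors.toList());
    }
}
```

The intermediate pair never escapes the method, which is the situation where a one-line declaration matters most.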

> * Records should be allowed to implement interfaces, and perhaps to extend a superclass, but should not allow their own subclasses (except perhaps abstract records)

The current design says yes on interfaces and no on superclasses, for good reason: if a superclass has state, then the record descriptor is not “the state, the whole state, and nothing but the state.”  But it’s possible this distinction is too coarse; it would probably be fine for a record to extend a class _with no instance fields_.  (In fact, in various versions of the prototype, records extended a base class, analogous to j.l.Enum, which is like this.)  So it’s conceivable the “no superclass” rule could be relaxed to “no superclasses with state.”  And value types (which have the same restriction) could conceivably relax in the same way.  

That said, what would it buy us?  Such an abstract class could well be an interface — so is the problem that we have no-state, abstract classes in APIs that should have been interfaces, but were not?  
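A small sketch of the "could well be an interface" point, assuming the record syntax as it later shipped in Java 16. `Shape`, `Circle`, and `Square` are hypothetical names; `Shape` plays the part of what might otherwise have been written as a stateless abstract class.

```java
// Stateless shared behavior lives in an interface with a default method,
// so records can pick it up without extending a superclass.
interface Shape {
    double area();

    default String describe() {
        return getClass().getSimpleName() + " with area " + area();
    }
}

// Each record's state is exactly its declared components.
record Circle(double radius) implements Shape {
    public double area() {
        return Math.PI * radius * radius;
    }
}

record Square(double side) implements Shape {
    public double area() {
        return side * side;
    }
}
```

Because `Shape` has no instance fields, the record descriptor still describes all of the state of a `Circle` or `Square`.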

> * Records should have a very lightweight syntax for the common case, preferably one line. @AutoValue’s clunkier syntax may be reducing adoption for some use cases.

Especially when used encapsulated, such as in local classes.  

> * Records should be immutable. The language should enforce shallow immutability (by making all fields final), and style guides should recommend deep immutability. 
> * We should consider adding an alternative way to construct records besides a constructor with positional parameters. Builders are popular for @AutoValue, but perhaps something better could be done for records as a language feature.

Named parameter invocation is a frequently requested feature.  As with most such features, it is more complicated than it initially seems, but the restrictions of records manage to avoid most of the nastier issues here.  So in theory, we could support named invocation of constructor arguments for records.  But we have a concern that this would very quickly be viewed as “glass half empty”, and as an impediment to freely refactoring from records to classes once you exceed the profile for which records were intended.  So I would prefer to come up with (not now, later) a more comprehensive way to address named invocation, and fit records into that.  

The other “alternative to positional constructors” option is factory methods.  Here, I think we’re stuck in an uncomfortable spot; on the one hand, factories are a common practice and have well-documented benefits over constructors; on the other hand, they’re not part of the language, so it’s even weirder for another language feature to rest on them.  (We have the same discomfort with accessors.)
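For illustration, this is roughly what hand-written factories on a record look like, assuming the record syntax as it later shipped in Java 16. `Range`, `of`, and `singleton` are hypothetical names, and nothing here is generated by the language.

```java
record Range(int lo, int hi) {
    // Compact canonical constructor: validation runs for every
    // construction path, including the factories below.
    Range {
        if (lo > hi) {
            throw new IllegalArgumentException("lo > hi: " + lo + " > " + hi);
        }
    }

    // Hand-written static factories that delegate to the constructor.
    static Range of(int lo, int hi) {
        return new Range(lo, hi);
    }

    static Range singleton(int value) {
        return new Range(value, value);
    }
}
```

In the shipped form of records the canonical constructor cannot be made less accessible than the record itself, so factories like these sit alongside the constructor rather than replacing it.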

> * Records do not need language-level support for withFoo methods, or toBuilder, even if builder support is included.
> * Records do not need a way to include private/hidden state, or to memoize derived properties
> * Records should allow implementing Object methods by hand, rejecting the auto-generated implementation, but expect this to be done rarely

Overall, I take this as a validation that we’ve landed at just about the right place.  Good!  

> How often do @AutoValue classes make use of inheritance? 77% of @AutoValue classes are “islands” in the inheritance graph: they extend Object, and implement no interfaces. 15% of @AutoValue classes extend Object and implement exactly one interface. A mere 4% extend some class other than Object, and implement no interfaces. Very few do anything else (implement 2+ interfaces, or implement interface(s) while also extending a class).

Of those that extend a class other than Object, do you spot any common cases, either specific classes or classes with specific characteristics?  

> Just like @AutoValue, the current proposal plans to allow implementing interfaces and/or extending a class. That seems reasonable, but extending a class is rare enough that we could consider forbidding it if that fits better onto the semantic goals of “simple data carriers”: it won’t harm too large a percentage of the usages. Perhaps, for example, we could reserve the extends syntax for the future possibility of extending “abstract records” that Brian suggested. Still, it would be reasonable to allow normal subclassing, if we judge that it helps achieve the semantic goals of records.

The current proposal prohibits extending classes at all, but anticipates we might relax eventually to “abstract records”, as suggested.  

> Consider, for example, this pattern I have seen a few times:
> 
> public interface JobInfo {
>   String session();
>   boolean privileged();
>   Instant startTime();
> }

Someone else also suggested at one point we support some sort of 

    record interface JobInfo(String session, boolean privileged, Instant startTime)

option.  But, what I’m having a hard time seeing is: is this an interface that multiple classes would want to implement?  It seems mostly an extraction of the API of a single class.  
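Without any `record interface` form, the quoted pattern can be sketched with a plain interface and a record that implements it. This assumes the record syntax as it later shipped in Java 16; `SimpleJobInfo` is a hypothetical name.

```java
import java.time.Instant;

interface JobInfo {
    String session();
    boolean privileged();
    Instant startTime();
}

// The record's generated accessors session(), privileged(), and startTime()
// have the same names and types as the interface methods, so the body is empty.
record SimpleJobInfo(String session, boolean privileged, Instant startTime)
        implements JobInfo {}
```

With only one implementation like this, the interface mostly restates the record's own API, which is the concern raised above.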

> Half of @AutoValue classes do the simplest thing: they define a single static factory which delegates to the generated constructor. 10% define two different factories. These groups, totaling 60% of @AutoValue classes, map well onto records as currently proposed. However, a third of @AutoValue classes think it’s worth the trouble to define a builder instead of, or in addition to, the static factory, even though they have to write a bunch more code to support it. This is an area where records could serve developers’ needs better, by offering some kind of opt-in support for generating a builder to go with your data carrier.
> 
> But why do people want a builder? How do they use it? Perhaps what they really want is named arguments, or default arguments. A builder may just be the best @AutoValue can do with the language as-is, but a new feature can try something bolder. In some use cases I see, every use of the builder sets every field explicitly, so the main advantage of a builder is that the field names are associated with their values at the construction call site. Such classes would probably be happy with a constructor with named arguments.

This is my theory: that named/default arguments will obviate 90+% of builders.  

> “Modification”
> 
> Of course with records being immutable, you can’t modify an existing record. But is it common to ask for a “modified version” of a record, copying a subset of fields but changing others? An often-suggested feature for records is support for “wither methods”: methods like
> 
> MyRecord withFoo(int newFoo) {return new MyRecord(newFoo, this.bar);}

I think such things will be more common with value types than with records — especially with values that encapsulate some state, such as a Cursor into a data structure.  A value cursor would be used like this:

    for (Cursor c = source.cursor(); c.hasNext(); c = c.next()) { … }

which is like an iterator, but doesn’t require mutation of a heap-based object.  With-like behavior (encapsulated within the next() method) will be a regular feature of such values.  
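A rough sketch of such a cursor follows. Value types are not available, so a record stands in for one here; that substitution, the `over` factory, the `get()` method, and the reading of `hasNext()` as "the current position is valid" are all assumptions made for illustration.

```java
import java.util.List;

// Stand-in for a value type: an immutable cursor over a List.
record Cursor<T>(List<T> source, int index) {
    static <T> Cursor<T> over(List<T> source) {
        return new Cursor<>(source, 0);
    }

    // True while the cursor points at an element.
    boolean hasNext() {
        return index < source.size();
    }

    // The element at the current position.
    T get() {
        return source.get(index);
    }

    // With-like step: a new cursor with the index advanced; this one is unchanged.
    Cursor<T> next() {
        return new Cursor<>(source, index + 1);
    }
}

class CursorDemo {
    static int sum(List<Integer> xs) {
        int total = 0;
        for (Cursor<Integer> c = Cursor.over(xs); c.hasNext(); c = c.next()) {
            total += c.get();
        }
        return total;
    }
}
```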

> @AutoValue supports another kind of “modification” that I expected to be more popular: toBuilder(). If your data carrier uses a builder for its construction, you can ask @AutoValue to generate a toBuilder() method, which converts an existing value into a builder, so that you can ask for a subset of fields to be changed before solidifying back down into an immutable value. But it turns out this feature is used very rarely: only 1.5% of @AutoValue classes with builders use this feature, which is less than 1% of all @AutoValue classes. So even considering wither methods and toBuilder together, less than 5% of @AutoValue classes use this feature.

This idiom might well be replaced with a pattern match, since records will (eventually) come with built-in pattern matching.
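As a sketch of what that could look like, the following uses the record deconstruction patterns that later shipped in Java 21; it assumes that JDK version or newer. `Point`, `WithDemo`, and `withX` are hypothetical names.

```java
record Point(int x, int y) {}

class WithDemo {
    // Returns a copy of the point with x replaced. The pattern binds the
    // components, and the constructor call rebuilds the record.
    static Object withX(Object o, int newX) {
        if (o instanceof Point(int x, int y)) {
            return new Point(newX, y);
        }
        return o; // not a Point: returned unchanged
    }
}
```

No `withFoo` method or `toBuilder()` is involved: deconstruct, then reconstruct with the changed component.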

> In addition to overriding Object methods, there are other method signatures that crop up multiple times. Most common, although still less common than toString(), are conversion functions like toJson, toProto(), or toBuilder() (see Construction section). Much more rare, at around 0.1%, are methods like iterator() and size(): some @AutoValue classes wrap a single ImmutableCollection of some kind, and implement methods that delegate to this field. This could be an argument in favor of the recent method-delegation proposal, but it is a pretty rare thing to do, and many of these cases are really not a great idea: they should just call foo.coll().iterator() instead of foo.iterator(), and having Foo implement Iterable brings relatively little benefit.

And is well handled by the current proposal; just declare the interfaces and implement the methods.  
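A minimal sketch of that, assuming the record syntax as it later shipped in Java 16. `Names` is a hypothetical name, and the JDK's `List.copyOf` stands in for the ImmutableCollection mentioned in the report.

```java
import java.util.Iterator;
import java.util.List;

// A record wrapping a single collection; the delegation is written by hand.
record Names(List<String> values) implements Iterable<String> {
    // Defensive, unmodifiable copy so the record stays immutable.
    Names {
        values = List.copyOf(values);
    }

    @Override
    public Iterator<String> iterator() {
        return values.iterator();
    }

    int size() {
        return values.size();
    }
}
```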

> <JDKRecordsProposalReport.html>

Thanks again for the great data.  I was a bit surprised, but pleased, to see that many of these “what about X” issues that came up in the design turn out to be infrequently used in practice.