Mutable records

Fri Mar 23 18:03:20 UTC 2018

A few people have asked, "wouldn't it just be easier to prohibit 
mutability in records?"  And while it surely would be easier (most of 
the issues I raised in my writeup go away without mutability), I think 
it would also greatly restrict the utility of the feature.  Let me talk 
about why, and give some examples -- and then I'd like to talk about 
what we can do, if anything, to make the mutable use cases easier.

## General argument: Mutability is pervasive in Java; you can only push it away a bit.

We saw this with lambdas; developers are all too eager to "work around" 
the limitation on mutable local capture by wrapping their mutables in a 
one-element array.  In fact, IDEs even "helpfully" offer to do this for 
you, thus ensuring that everyone thinks this is OK.
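
For anyone who hasn't seen the workaround in the wild, here is a minimal sketch of it.  The class and method names (`CaptureWorkaround`, `countLongerThan`) are made up for illustration.

```java
import java.util.List;

class CaptureWorkaround {
    // Counts the words longer than minLength using the one-element-array trick.
    static int countLongerThan(List<String> words, int minLength) {
        // A plain "int count = 0" could not be incremented inside the lambda,
        // because captured locals must be effectively final.
        int[] count = {0};  // the array reference is effectively final; its contents are not
        words.forEach(w -> {
            if (w.length() > minLength) {
                count[0]++;  // mutation smuggled in through the array
            }
        });
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countLongerThan(List.of("a", "bb", "ccc"), 1));
    }
}
```

The restriction is satisfied to the letter, and the mutable state is still there, just harder to see.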

We will see this again with value types; even though value types are 
immutable, value types can contain references to mutable objects, and 
trying to enforce "values all the way down" would result in fairly 
useless value types.

(That doesn't mean we can't nudge towards immutability where we think it 
makes sense, if we think the value of the nudge exceeds the irregularity 
or complexity it entails.)

## Records and value types: goals and similarities

While records and value types have some features in common (getting 
equals() for free), they have different motivations.

Value types are about treating aggregates as, well, values, with all the 
things that entails; they can be freely shared, the runtime can 
routinely optimize them by putting them on the stack or in registers and 
flatten them into enclosing values, classes, or arrays (yielding better 
density and flatness).  What they ask you to give up in exchange is 
identity, which means giving up mutability and layout polymorphism.

Records are about treating data as data; when modeling aggregates with 
records, the result is transparent classes whose API and representation 
are the same thing.  This means that records can be freely 
interconverted between their exploded and aggregate forms with no loss 
of information.  What they ask you to give up is the freedom to define 
the mapping between representation and API (constructors, accessors, 
equals, hashCode, deconstruction) in a nontransparent way.  
(Essentially, you give up all encapsulation except for the ability to 
control writes to their state.)

My claim is that the goals are mostly orthogonal, and the benefits and 
tradeoffs of each are as well.  All four quadrants make sense to me.  
Some aggregates are values but not transparent (think cursors that hold 
references into the internals of a data structure, or hold a native 
resource); some are "just their data" but not values (graph nodes, as 
well as the mutable examples below), and others are both (value records).

The superficial commonalities between records and values (both are 
restricted forms of aggregate, and these restrictions make it possible 
to provide sensible defaults for things like equals) tease us into 
thinking they are the same thing, but I don't think they are.

Assuming this to be true, how can we justify having two new constructs?  
Value types, by nature of what they require the developer to give up, 
enable the runtime to make significant optimizations it could not 
otherwise make.  So if we want flat and dense data, this is basically 
our only option -- make the programmer consent to the handcuffs.  The 
argument for records is more of a contingent one; records allow you to 
express more with less.  The "more with less" has at least two aspects; 
in addition to the obvious reduction in boilerplate, libraries and 
frameworks can make more reasonable assumptions about what construction 
or deconstruction means, and therefore can build useful functionality 
safely (such as marshaling to/from XML).  But records don't let you do 
anything you can't already do with classes.  So if I had a quota, I'd 
have to pick values over records.

In a language with values on the roadmap, immutable-only records seem to 
offer a pretty lame return-on-complexity.  Nothing about values requires 
you to use encapsulation, so you could model most immutable records with 
a value type, with less boilerplate than a class (but more than none), 
and the remainder with classes. (Immutable records buy you one thing 
that values do not -- pointer polymorphism.  That lets you make graphs 
or trees of them.)  But I think it is clear that this model of records 
is a kind of weird half-one, half-the-other thing, and it's not entirely 
clear it would carry its weight.

And, when users ask "why can't record components be mutable, after all, 
records are about data, and some data is mutable", I don't think we have 
a very good answer other than "immutability is good for you."  I much 
prefer the argument of "there are two orthogonal sets of tradeoffs; pick 
one, the other, or both."

## Use cases for mutable records

Here are two use cases that immediately come to mind; please share others.

Groups of related mutable state.  An example here is a set of counters.  
If I define:

     record CacheCounters(public int hitCount, public int accessCount) {
         float hitRate() { ... }
     }

then I can treat them as a single entity; store a counter-pair in a 
variable, have arrays of them, use them as values of Maps, pass them 
around, etc.  (The fact that they're mutable introduces constraints, but 
not new constraints; we deal with this problem every day.)  I can even 
lock on it, if that's how I want to do it.
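
As a rough sketch of what "treat them as a single entity" looks like in practice, here is a hand-written stand-in using today's classes.  `Counters` and `CounterRegistry` are hypothetical names, and `hitRate()` is left out since its body isn't spelled out above.

```java
import java.util.HashMap;
import java.util.Map;

// Hand-written stand-in for the CacheCounters record (hitRate() omitted).
class Counters {
    public int hitCount;
    public int accessCount;
}

class CounterRegistry {
    // Counter pairs stored as Map values, one per named cache.
    private final Map<String, Counters> byCache = new HashMap<>();

    // Records one access for the named cache, locking on the counter pair itself.
    void record(String cacheName, boolean hit) {
        Counters c;
        synchronized (byCache) {
            c = byCache.computeIfAbsent(cacheName, k -> new Counters());
        }
        synchronized (c) {
            c.accessCount++;
            if (hit) {
                c.hitCount++;
            }
        }
    }

    // Returns the counter pair for the named cache, or null if none was recorded.
    Counters countersFor(String cacheName) {
        synchronized (byCache) {
            return byCache.get(cacheName);
        }
    }
}
```

The two counters travel together and are updated under one lock, which is the whole point of grouping them.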

Domain objects.  Another common use is domain aggregates:

     record Person(String first, String last);

If I want to marshal one of these to or from XML using a framework like 
JAXB, I can provide mapping metadata between XML schema and classes, and 
the framework will gladly populate the object for me.  The way these 
frameworks want to work is to instantiate a mutable object with a no-arg 
constructor, and then set the fields (or call setters) as components 
become available on the stream. Yes, you can write a binding framework 
that waits until it has all the stuff and then calls an N-arg 
constructor, but that's a lot harder, and uses a lot more memory.  
Mutable records will play nicely with these frameworks.
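
To illustrate the instantiate-then-populate style, here is a toy binder.  This is not how JAXB is actually implemented; `MutablePerson` and `ToyBinder` are hypothetical names, and the sketch assumes the components arrive as name/value string pairs.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical mutable stand-in for the Person aggregate.
class MutablePerson {
    String first;
    String last;

    MutablePerson() { }  // the no-arg constructor the binder relies on
}

class ToyBinder {
    // Instantiates the type via its no-arg constructor, then sets one field
    // per (name, value) pair, in whatever order the pairs arrive.
    static <T> T bind(Class<T> type, Map<String, String> components) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T target = ctor.newInstance();
            for (Map.Entry<String, String> e : components.entrySet()) {
                Field f = type.getDeclaredField(e.getKey());
                f.setAccessible(true);
                f.set(target, e.getValue());
            }
            return target;
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException("cannot bind " + type.getName(), ex);
        }
    }
}
```

Note that the binder never needs to hold all the components at once; it can set each one as it shows up, which is why this style is cheap for frameworks.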

## Embracing mutability

I cheated a bit in the two examples I gave; neither had a no-arg 
constructor.  We could do a few things about this:
  - Make the user write a no-arg constructor (and hopefully make this 
easy enough)
  - Provide a no-arg constructor for all records that just passes the 
default value for each component's type to the default constructor (which 
might reject them, if it doesn't like nulls)
  - Try to provide a "minimal" constructor that only takes the final 
fields.  (I don't like this because changing a field between final and 
not changes the signature of an implicit constructor, which won't be 
binary compatible.)
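
Here is a sketch of how the second option would behave, written out by hand with ordinary classes.  `CountersWithDefaults` and `PersonWithDefaults` are hypothetical names, and the null check in the two-arg constructor is an assumption about what a user might write.

```java
import java.util.Objects;

// Primitive components: the type defaults (0, 0) are harmless.
class CountersWithDefaults {
    int hitCount;
    int accessCount;

    CountersWithDefaults(int hitCount, int accessCount) {
        this.hitCount = hitCount;
        this.accessCount = accessCount;
    }

    // What the generated no-arg constructor would amount to.
    CountersWithDefaults() {
        this(0, 0);
    }
}

// Reference components: the type default is null, which this constructor rejects.
class PersonWithDefaults {
    String first;
    String last;

    PersonWithDefaults(String first, String last) {
        this.first = Objects.requireNonNull(first, "first");
        this.last = Objects.requireNonNull(last, "last");
    }

    // Same generated shape, but this one throws NullPointerException when called.
    PersonWithDefaults() {
        this(null, null);
    }
}
```

So the generated constructor is only as useful as the validation lets it be, which is the caveat in the parenthetical above.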

Similarly, you could object that deriving equals/hashCode from mutable 
state is dangerous.  (But List does do this.)  Again, there are a few 
ways to deal with it.  We could adjust the standard equals/hashCode to only take 
into account final fields.  But, I'm skeptical of this, because I could 
easily imagine people constructing records via mutation but then using 
them in an effectively immutable way thereafter, and they might want the 
stronger equals contract.  Or, we could tell people, as we do with List, 
not to use them as keys in hash-based collections.  (We could even have 
compiler warnings about this.)
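
For reference, this is the hazard we already live with for List.  A minimal sketch (the class name `MutableKeyHazard` is made up):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MutableKeyHazard {
    // Returns whether the set can still find the key after the key is mutated.
    static boolean foundAfterMutation() {
        List<String> key = new ArrayList<>(List.of("a"));
        Set<List<String>> set = new HashSet<>();
        set.add(key);   // stored under the hash of ["a"]
        key.add("b");   // hashCode() now reflects ["a", "b"]
        return set.contains(key);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation());
    }
}
```

The lookup fails even though the very same object is still sitting in the set; a mutable record with state-based equals/hashCode would behave the same way.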

## Additional considerations

Here are a few less fundamental points about accepting mutable records, 
none of which are slam-dunks, but might still be useful to consider:
  - People will just work around it anyway, as they do with lambdas.  If 
a class has N-1 final fields, and one mutable one, what do we think 
they're going to do?
  - C# embraced mutable records.  This isn't surprising, but what is 
surprising is that Scala's case classes did also.  While I don't have 
data from either Neal or Martin, I suspect that they went through a 
similar analysis -- that it would leave out too many desirable use cases 
for the feature, and still not protect us from deeper mutability anyway.
  - Mutability introduces pain, but so does repetition and boilerplate 
-- it gives bugs a place to hide.  Making the feature less applicable 
consigns more users to using a more error-prone mechanism.

## Fields: final by default?

One of the nudges we've considered is making fields final by default, 
but letting them be declared non-final.  This is a nudge, in that it 
sends a message that records are best served immutable, but if you want 
your revenge warm, you can have it.  I think there are reasonable 
arguments on both sides of this story, but one argument I am not 
particularly motivated by is "but then we'd have to introduce non-final 
as a keyword."  If we think final-by-default is a good idea, I don't 
think the lack of a denotation should be the impediment.

## Clone

Clone is a mess, and I'm not sure there's a good answer here, but 
there's surely a good discussion.

As a user, I find the ability to clone arrays (despite being shallow) is 
super useful, and it makes it far easier to be good about doing 
defensive copies all the time.  If cloning were harder (new 
array/arraycopy), I'd probably cut more corners.  If we can deliver the 
same benefit for records, that seems enticing.
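
To show the difference in ceremony, here is a small sketch of defensive copying with clone() next to the arraycopy spelling.  The `Schedule` class and its `slots` field are hypothetical.

```java
class Schedule {
    private final int[] slots;

    Schedule(int[] slots) {
        this.slots = slots.clone();  // defensive copy on the way in
    }

    int[] slots() {
        return slots.clone();        // and again on the way out
    }

    // The longer spelling that clone() saves us from writing.
    int[] slotsViaArraycopy() {
        int[] copy = new int[slots.length];
        System.arraycopy(slots, 0, copy, 0, slots.length);
        return copy;
    }
}
```

Both copies are shallow, which is harmless for an int[] but would share the elements of an array of mutable objects.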

There's a fair argument over whether the standard clone should be 
shallow (easy to specify and implement) or should try to deeply clone 
Cloneable components.  Or maybe both options suck.  Or maybe it should 
be opt-in; if the record extends Cloneable, you get a clone() method.

What did I miss?