Mutable records
Brian Goetz
brian.goetz at oracle.com
Fri Mar 23 18:03:20 UTC 2018
A few people have asked, "wouldn't it just be easier to prohibit
mutability in records"? And while it surely would be easier (most of
the issues I raised in my writeup go away without mutability), I think
it would also greatly restrict the utility of the feature. Let me talk
about why, and give some examples -- and then I'd like to talk about
what we can do, if anything, to make the mutable use cases easier.
## General argument: Mutability is pervasive in Java; you can only push it away a bit.
We saw this with lambdas; developers are all too eager to "work around"
the limitation on mutable local capture by wrapping their mutables in a
one-element array. In fact, IDEs even "helpfully" offer to do this for
you, thus ensuring that everyone thinks this is OK.
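For readers who haven't seen the workaround, here is a minimal sketch of it. The class and method names (CaptureWorkaround, countLong) are made up for illustration.

```java
import java.util.List;

class CaptureWorkaround {
    // Counts the words longer than three characters, mutating "captured" state from a lambda.
    static int countLong(List<String> words) {
        // A plain local (int count = 0;) could not be incremented inside the lambda,
        // because captured locals must be effectively final.
        int[] count = {0};   // the array reference is effectively final; its element is not
        words.forEach(w -> {
            if (w.length() > 3) {
                count[0]++;
            }
        });
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countLong(List.of("a", "lambda", "captures", "it")));
    }
}
```

The compiler is satisfied, but the mutability hasn't gone anywhere; it has just moved to the heap, where it is no longer confined to one thread.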
We will see this again with value types; even though value types are
immutable, value types can contain references to mutable objects, and
trying to enforce "values all the way down" would result in fairly
useless value types.
(That doesn't mean we can't nudge towards immutability where we think it
makes sense, if we think the value of the nudge exceeds the irregularity
or complexity it entails.)
## Records and value types: goals and similarities
While records and value types have some features in common (getting
equals() for free), they have different motivations.
Value types are about treating aggregates as, well, values, with all the
things that entails; they can be freely shared, and the runtime can
routinely optimize them by putting them on the stack or in registers and
flattening them into enclosing values, classes, or arrays (yielding better
density and flatness). What they ask you to give up in exchange is
identity, which means giving up mutability and layout polymorphism.
Records are about treating data as data; when modeling aggregates with
records, the result is transparent classes whose API and representation
are the same thing. This means that records can be freely
interconverted between their exploded and aggregate forms with no loss
of information. What they ask you to give up is the freedom to define
the mapping between representation and API (constructors, accessors,
equals, hashCode, deconstruction) in a nontransparent way.
(Essentially, you give up all encapsulation except for the ability to
control writes to their state.)
My claim is that the goals are mostly orthogonal, and the benefits and
tradeoffs of each are as well. All four quadrants make sense to me.
Some aggregates are values but not transparent (think cursors that hold
references into the internals of a data structure, or hold a native
resource); some are "just their data" but not values (graph nodes, as
well as the mutable examples below), and others are both (value records).
The superficial commonalities between records and values (both are
restricted forms of aggregate, and these restrictions make it possible
to provide sensible defaults for things like equals) tease us into
thinking they are the same thing, but I don't think they are.
Assuming this to be true, how can we justify having two new constructs?
Value types, by nature of what they require the developer to give up,
enable the runtime to make significant optimizations it could not
otherwise make. So if we want flat and dense data, this is basically
our only option -- make the programmer consent to the handcuffs. The
argument for records is more of a contingent one; records allow you to
express more with less. The "more with less" has at least two aspects;
in addition to the obvious reduction in boilerplate, libraries and
frameworks can make more reasonable assumptions about what construction
or deconstruction means, and therefore can build useful functionality
safely (such as marshaling to/from XML). But records don't let you do
anything you can't already do with classes. So if I had a quota, I'd
have to pick values over records.
In a language with values on the roadmap, immutable-only records seem to
offer a pretty lame return-on-complexity. Nothing about values requires
you to use encapsulation, so you could model most immutable records with
a value type, with less boilerplate than a class (but more than none),
and the remainder with classes. (Immutable records buy you one thing
that values do not -- pointer polymorphism. That lets you make graphs
or trees of them.) But I think it is clear that this model of records
is a kind of weird half-one, half-the-other thing, and it's not entirely
clear it would carry its weight.
And, when users ask "why can't record components be mutable, after all,
records are about data, and some data is mutable", I don't think we have
a very good answer other than "immutability is good for you." I much
prefer the argument of "there are two orthogonal sets of tradeoffs; pick
one, the other, or both."
## Use cases for mutable records
Here are two use cases that immediately come to mind; please share others.
Groups of related mutable state. An example here is a set of counters.
If I define:
    record CacheCounters(public int hitCount, public int accessCount) {
        float hitRate() { ... }
    }
then I can treat them as a single entity; store a counter-pair in a
variable, have arrays of them, use them as values of Maps, pass them
around, etc. (The fact that they're mutable introduces constraints, but
not new constraints; we deal with this problem every day.) I can even
lock on it, if that's how I want to do it.
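To make that concrete, here is a rough sketch of the class one would write by hand today to get the same thing. The body of hitRate() is an assumption (hits divided by accesses), since I elided it above, and CacheCountersDemo and its record() helper are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hand-written stand-in for the hypothetical mutable record CacheCounters.
class CacheCounters {
    public int hitCount;
    public int accessCount;

    CacheCounters(int hitCount, int accessCount) {
        this.hitCount = hitCount;
        this.accessCount = accessCount;
    }

    // Assumed definition; the original example leaves this body elided.
    float hitRate() {
        return accessCount == 0 ? 0f : (float) hitCount / accessCount;
    }
}

class CacheCountersDemo {
    // Records one access against the counters stored under the given cache name.
    static void record(Map<String, CacheCounters> stats, String cache, boolean hit) {
        CacheCounters c = stats.computeIfAbsent(cache, k -> new CacheCounters(0, 0));
        synchronized (c) {   // the pair has identity, so it can serve as its own lock
            c.accessCount++;
            if (hit) {
                c.hitCount++;
            }
        }
    }

    public static void main(String[] args) {
        Map<String, CacheCounters> stats = new HashMap<>();
        record(stats, "users", true);
        record(stats, "users", false);
        System.out.println(stats.get("users").hitRate());
    }
}
```

Note that the map itself is not thread-safe here; the sketch only shows that the aggregate can be stored, passed around, and locked as one unit.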
Domain objects. Another common use is domain aggregates:
    record Person(String first, String last);
If I want to marshal one of these to or from XML using a framework like
JAXB, I can provide mapping metadata between XML schema and classes, and
the framework will gladly populate the object for me. The way these
frameworks want to work is to instantiate a mutable object with a no-arg
constructor, and then set the fields (or call setters) as components
become available on the stream. Yes, you can write a binding framework
that waits until it has all the stuff and then calls an N-arg
constructor, but that's a lot harder, and uses a lot more memory.
Mutable records will play nicely with these frameworks.
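As a rough illustration of the "instantiate, then populate" style, here is a toy binder. It is not how JAXB is implemented; TinyBinder is a hypothetical name, it only handles String fields, and the Person class is a hand-written mutable stand-in for the record above.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.Map;

// Mutable stand-in for record Person(String first, String last).
class Person {
    String first;
    String last;

    Person() { }   // the no-arg constructor this style of binder relies on
}

class TinyBinder {
    // Creates the target via its no-arg constructor, then sets each named
    // String field as its value "arrives" from the (already parsed) input.
    static <T> T bind(Class<T> type, Map<String, String> components) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T target = ctor.newInstance();
            for (Map.Entry<String, String> e : components.entrySet()) {
                Field f = type.getDeclaredField(e.getKey());
                f.setAccessible(true);
                f.set(target, e.getValue());
            }
            return target;
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException("cannot bind " + type.getName(), ex);
        }
    }
}
```

A binder targeting an N-arg constructor would instead have to buffer every component until the last one shows up, which is the extra work and memory mentioned above.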
## Embracing mutability
I cheated a bit in the two examples I gave; neither had a no-arg
constructor. We could do a few things about this:
- Make the user write a no-arg constructor (and hopefully make this
easy enough)
- Provide a no-arg constructor for all records that just passes the
default value of each component's type to the default constructor (which
might reject them, if it doesn't like nulls)
- Try to provide a "minimal" constructor that only takes the final
fields. (I don't like this because changing a field between final and
not changes the signature of an implicit constructor, which won't be
binary compatible.)
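Here is a sketch of what the second option would amount to, written out by hand. Range and Label are hypothetical classes, and the no-arg constructors show what an implicit one might do under that option, not a settled design.

```java
import java.util.Objects;

class Range {
    int lo;
    int hi;

    Range(int lo, int hi) {
        this.lo = lo;
        this.hi = hi;
    }

    // Passes the default value of each component type (0 for int).
    Range() {
        this(0, 0);
    }
}

class Label {
    String text;

    Label(String text) {
        this.text = Objects.requireNonNull(text, "text");
    }

    // The default for a reference type is null, which the constructor above
    // rejects, so this always throws NullPointerException.
    Label() {
        this(null);
    }
}
```

For Range the implicit constructor is harmless; for Label it exists but can never succeed, which is the "might reject them" case.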
Similarly, you could object that deriving equals/hashCode from mutable
state is dangerous. (But List does do this.) Again, there are a few
ways to deal with this. We could adjust the standard equals/hashCode to only take
into account final fields. But, I'm skeptical of this, because I could
easily imagine people constructing records via mutation but then using
them in an effectively immutable way thereafter, and they might want the
stronger equals contract. Or, we could tell people, as we do with List,
not to use them as keys in hash-based collections. (We could even have
compiler warnings about this.)
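The List hazard is easy to reproduce, and a mutable record with state-based hashCode would behave the same way. A minimal sketch (MutableKeyHazard is a made-up name):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MutableKeyHazard {
    // Reports whether a HashSet can still find a list that was mutated after insertion.
    static boolean foundAfterMutation() {
        List<String> key = new ArrayList<>(List.of("a"));
        Set<List<String>> set = new HashSet<>();
        set.add(key);
        key.add("b");   // changes key.hashCode() after the set has recorded the old hash
        return set.contains(key);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation());
    }
}
```

The element is still in the set (its size is 1 and iteration will reach it), but a lookup by hash no longer finds it.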
## Additional considerations
Here are a few less fundamental points about accepting mutable records,
none of which are slam-dunks, but might still be useful to consider:
- People will just work around it anyway, as they do with lambdas. If
a class has N-1 final fields, and one mutable one, what do we think
they're going to do?
- C# embraced mutable records. This isn't surprising, but what is
surprising is that Scala's case classes did also. While I don't have
data from either Neal or Martin, I suspect that they went through a
similar analysis -- that it would leave out too many desirable use cases
for the feature, and still not protect us from deeper mutability anyway.
- Mutability introduces pain, but so does repetition and boilerplate
-- it gives bugs a place to hide. Making the feature less applicable
consigns more users to using a more error-prone mechanism.
## Fields: final by default?
One of the nudges we've considered is making fields final by default,
but letting them be declared non-final. This is a nudge, in that it
sends a message that records are best served immutable, but if you want
your revenge warm, you can have it. I think there are reasonable
arguments on both sides of this story, but one argument I am not
particularly motivated by is "but then we'd have to introduce non-final
as a keyword." If we think final-by-default is a good idea, I don't
think the lack of a denotation should be the impediment.
## Clone
Clone is a mess, and I'm not sure there's a good answer here, but
there's surely a good discussion.
As a user, I find the ability to clone arrays (despite being shallow) is
super useful, and it makes it far easier to be good about doing
defensive copies all the time. If cloning were harder (new
array/arraycopy), I'd probably cut more corners. If we can deliver the
same benefit for records, that seems enticing.
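This is the idiom I have in mind, sketched with a hypothetical Schedule class:

```java
class Schedule {
    private final int[] hours;

    Schedule(int[] hours) {
        this.hours = hours.clone();   // defensive copy on the way in
    }

    int[] hours() {
        return hours.clone();         // defensive copy on the way out
    }
}
```

Each copy costs one method call at the call site, which is why it's easy to do it every time.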
There's a fair argument over whether the standard clone should be
shallow (easy to specify and implement) or should try to deeply clone
Cloneable components. Or maybe both options suck. Or maybe it should
be opt in; if the record implements Cloneable, you get a clone() method.
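To illustrate the shallow/deep distinction, here is a sketch using an ordinary class as a stand-in for a record. Route and the two method names are hypothetical, and "deep" here goes only one level down.

```java
class Route implements Cloneable {
    String name;
    int[] stops;   // a Cloneable component

    Route(String name, int[] stops) {
        this.name = name;
        this.stops = stops;
    }

    // Shallow: copies the field values, so both objects share one stops array.
    Route shallowClone() {
        try {
            return (Route) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);   // cannot happen: Route implements Cloneable
        }
    }

    // Deep (one level): additionally clones the Cloneable component.
    Route deepClone() {
        Route copy = shallowClone();
        copy.stops = stops.clone();
        return copy;
    }
}
```

With the shallow version, a write through one copy's stops array is visible through the other; with the deep version it is not, but specifying how far "deep" goes for arbitrary component types is where it gets hard.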
What did I miss?