Mutable records

Fri Mar 23 20:13:09 UTC 2018

FWIW, some data:

Our AutoValue <https://github.com/google/auto/tree/master/value> tool at
Google has been *extremely* successful and forbids mutation.

   - Despite only being added relatively recently compared to the age of
   the codebase, a full 25% of all getters in our codebase are written with
   AutoValue.
   - We provide a @Memoized annotation that memoizes the results of
   methods, and tooling for generating builders.
   - We don't forbid mutable field types.

I can confidently say that we've never regretted the decision to only
support immutable objects.

To give some broader context, I analyzed all instance fields in the Google
codebase, and came up with some edifying numbers:

   - Outside of generated code, 71% of all instance fields in our codebase
   are never assigned to outside of constructors for the class where they are
   defined, which I'd call "effectively final."
   - That number goes up one or two percent if you exclude test code and/or
   exclude fields in classes whose names end with "Builder".
   - It goes up to 76% if you include AutoValue "fields" (the fields are
   only declared as normal Java fields in generated code, which we'd excluded).
   - 67% of all classes with at least one field have all effectively final
   fields.
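
For anyone unsure what I counted as "effectively final" above, here is a minimal sketch. Point and Cursor are hypothetical classes made up for illustration, not code from our codebase. Point's fields are never assigned outside its constructor, so they would count; Cursor's field would not.

```java
// Hypothetical classes illustrating the "effectively final" classification.
final class Point {
    // Not declared final, but never assigned outside the constructor,
    // so both fields count as "effectively final".
    private int x;
    private int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }
}

final class Cursor {
    // Assigned in advance(), so this field does not count.
    private int position;

    Cursor(int start) {
        this.position = start;
    }

    void advance() { position++; }
    int position() { return position; }
}
```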

On Fri, Mar 23, 2018 at 11:03 AM Brian Goetz <brian.goetz at oracle.com> wrote:

> A few people have asked, "wouldn't it just be easier to prohibit
> mutability in records"?  And while it surely would be easier (most of
> the issues I raised in my writeup go away without mutability), I think
> it would also greatly restrict the utility of the feature.  Let me talk
> about why, and give some examples -- and then I'd like to talk about
> what we can do, if anything, to make the mutable use cases easier.
>
> ## General argument: Mutability is pervasive in Java; you can only push
> it away a bit.
>
> We saw this with lambdas; developers are all too eager to "work around"
> the limitation on mutable local capture by wrapping their mutables in a
> one-element array.  In fact, IDEs even "helpfully" offer to do this for
> you, thus ensuring that everyone thinks this is OK.
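
For readers who haven't run into it, the one-element-array workaround looks roughly like this. It is only a sketch; LambdaCaptureDemo and countEvens are made-up names.

```java
import java.util.List;

class LambdaCaptureDemo {
    // Counts even numbers using the one-element-array workaround.
    static int countEvens(List<Integer> values) {
        // Writing "int count = 0;" and incrementing it inside the lambda
        // would not compile: captured locals must be effectively final.
        int[] count = {0};          // the array reference never changes...
        values.forEach(v -> {
            if (v % 2 == 0) {
                count[0]++;         // ...but its single element is mutated freely
            }
        });
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countEvens(List.of(1, 2, 3, 4)));
    }
}
```

The compiler is satisfied because the captured variable is the array reference, not the int, which is exactly how the restriction gets pushed away rather than honored.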
>
> We will see this again with value types; even though value types are
> immutable, value types can contain references to mutable objects, and
> trying to enforce "values all the way down" would result in fairly
> useless value types.
>
> (That doesn't mean we can't nudge towards immutability where we think it
> makes sense, if we think the value of the nudge exceeds the irregularity
> or complexity it entails.)
>
> ## Records and value types: goals and similarities
>
> While records and value types have some features in common (getting
> equals() for free), they have different motivations.
>
> Value types are about treating aggregates as, well, values, with all the
> things that entails; they can be freely shared, and the runtime can
> routinely optimize them by putting them on the stack or in registers and
> flattening them into enclosing values, classes, or arrays (yielding better
> density and flatness).  What they ask you to give up in exchange is
> identity, which means giving up mutability and layout polymorphism.
>
> Records are about treating data as data; when modeling aggregates with
> records, the result is transparent classes whose API and representation
> are the same thing.  This means that records can be freely
> interconverted between their exploded and aggregate forms with no loss
> of information.  What they ask you to give up is the freedom to define
> the mapping between representation and API (constructors, accessors,
> equals, hashCode, deconstruction) in a nontransparent way.
> (Essentially, you give up all encapsulation except for the ability to
> control writes to their state.)
>
> My claim is that the goals are mostly orthogonal, and the benefits and
> tradeoffs of each are as well.  All four quadrants make sense to me.
> Some aggregates are values but not transparent (think cursors that hold
> references into the internals of a data structure, or hold a native
> resource); some are "just their data" but not values (graph nodes, as
> well as the mutable examples below), and others are both (value records).
>
> The superficial commonalities between records and values (both are
> restricted forms of aggregate, and these restrictions make it possible
> to provide sensible defaults for things like equals) tease us into
> thinking they are the same thing, but I don't think they are.
>
> Assuming this to be true, how can we justify having two new constructs?
> Value types, by nature of what they require the developer to give up,
> enable the runtime to make significant optimizations it could not
> otherwise make.  So if we want flat and dense data, this is basically
> our only option -- make the programmer consent to the handcuffs.  The
> argument for records is more of a contingent one; records allow you to
> express more with less.  The "more with less" has at least two aspects;
> in addition to the obvious reduction in boilerplate, libraries and
> frameworks can make more reasonable assumptions about what construction
> or deconstruction means, and therefore can build useful functionality
> safely (such as marshaling to/from XML).  But records don't let you do
> anything you can't already do with classes.  So if I had a quota, I'd
> have to pick values over records.
>
> In a language with values on the roadmap, immutable-only records seem to
> offer a pretty lame return-on-complexity.  Nothing about values requires
> you to use encapsulation, so you could model most immutable records with
> a value type, with less boilerplate than a class (but more than none),
> and the remainder with classes. (Immutable records buy you one thing
> that values do not -- pointer polymorphism.  That lets you make graphs
> or trees of them.)  But I think it is clear that this model of records
> is a kind of weird half-one, half-the-other thing, and it's not entirely
> clear it would carry its weight.
>
> And, when users ask "why can't record components be mutable, after all,
> records are about data, and some data is mutable", I don't think we have
> a very good answer other than "immutability is good for you."  I much
> prefer the argument of "there are two orthogonal sets of tradeoffs; pick
> one, the other, or both."
>
> ## Use cases for mutable records
>
> Here are two use cases that immediately come to mind; please share others.
>
> Groups of related mutable state.  An example here is a set of counters.
> If I define:
>
>      record CacheCounters(public int hitCount, public int accessCount) {
>          float hitRate() { ... }
>      }
>
> then I can treat them as a single entity; store a counter-pair in a
> variable, have arrays of them, use them as values of Maps, pass them
> around, etc.  (The fact that they're mutable introduces constraints, but
> not new constraints; we deal with this problem every day.)  I can even
> lock on it, if that's how I want to do it.
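
Since the record syntax in the quoted example is a proposal and not compilable Java, here is a hand-written class that approximates it, as a rough sketch. The hitRate() body is my assumption (the original elides it), and CacheStats is a hypothetical holder added only to show the "values of Maps" and "lock on it" uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hand-written approximation of the proposed mutable record.
class CacheCounters {
    public int hitCount;
    public int accessCount;

    CacheCounters(int hitCount, int accessCount) {
        this.hitCount = hitCount;
        this.accessCount = accessCount;
    }

    // Assumed definition: the quoted mail elides the body of hitRate().
    float hitRate() {
        return accessCount == 0 ? 0f : (float) hitCount / accessCount;
    }
}

// Hypothetical holder: one counter pair per cache name.
class CacheStats {
    private final Map<String, CacheCounters> byCache = new HashMap<>();

    CacheCounters countersFor(String cacheName) {
        return byCache.computeIfAbsent(cacheName, k -> new CacheCounters(0, 0));
    }

    void recordAccess(String cacheName, boolean hit) {
        CacheCounters c = countersFor(cacheName);
        synchronized (c) {          // locking on the counter pair itself
            c.accessCount++;
            if (hit) {
                c.hitCount++;
            }
        }
    }
}
```

Note that the lock here only guards the two counters; the HashMap itself would need its own protection under concurrent use.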
>
> Domain objects.  Another common use is domain aggregates:
>
>      record Person(String first, String last);
>
> If I want to marshal one of these to or from XML using a framework like
> JAXB, I can provide mapping metadata between XML schema and classes, and
> the framework will gladly populate the object for me.  The way these
> frameworks want to work is to instantiate a mutable object with a no-arg
> constructor, and then set the fields (or call setters) as components
> become available on the stream. Yes, you can write a binding framework
> that waits until it has all the stuff and then calls an N-arg
> constructor, but that's a lot harder, and uses a lot more memory.
> Mutable records will play nicely with these frameworks.
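
To make the "instantiate, then set fields as they arrive" style concrete, here is a minimal sketch. It is not JAXB and uses no JAXB API; StreamingBinder and its name-to-setter table are hypothetical stand-ins for the mapping metadata a real framework would derive.

```java
import java.util.Map;
import java.util.function.BiConsumer;

// Mutable domain object with the no-arg constructor such frameworks expect.
class Person {
    private String first;
    private String last;

    Person() {}

    void setFirst(String first) { this.first = first; }
    void setLast(String last) { this.last = last; }
    String getFirst() { return first; }
    String getLast() { return last; }
}

// Hypothetical binder: populates the target as elements appear on the stream,
// without buffering values for a later N-arg constructor call.
class StreamingBinder {
    private static final Map<String, BiConsumer<Person, String>> SETTERS = Map.of(
            "first", Person::setFirst,
            "last", Person::setLast);

    private final Person target = new Person();

    void onElement(String name, String text) {
        BiConsumer<Person, String> setter = SETTERS.get(name);
        if (setter != null) {       // unknown elements are ignored in this sketch
            setter.accept(target, text);
        }
    }

    Person result() {
        return target;
    }
}
```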
>
> ## Embracing mutability
>
> I cheated a bit in the two examples I gave; neither had a no-arg
> constructor.  We could do a few things about this:
>   - Make the user write a no-arg constructor (and hopefully make this
> easy enough)
>   - Provide a no-arg constructor for all records that just passes the
> default values for that type to the default constructor (which might
> reject them, if it doesn't like nulls)
>   - Try to provide a "minimal" constructor that only takes the final
> fields.  (I don't like this because changing a field between final and
> not changes the signature of an implicit constructor, which won't be
> binary compatible.)
>
> Similarly, you could object that deriving equals/hashCode from mutable
> state is dangerous.  (But List does do this.)  Again, there are a few
> ways to deal.  We could adjust the standard equals/hashCode to only take
> into account final fields.  But, I'm skeptical of this, because I could
> easily imagine people constructing records via mutation but then using
> them in an effectively immutable way thereafter, and they might want the
> stronger equals contract.  Or, we could tell people, as we do with List,
> not to use them as keys in hash-based collections.  (We could even have
> compiler warnings about this.)
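
The List hazard mentioned above can be shown in a few lines. This is a sketch with made-up names (MutableKeyDemo, foundAfterMutation); a mutable record with state-derived hashCode would presumably behave the same way.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MutableKeyDemo {
    // Adds a list to a HashSet, mutates it, then looks it up again.
    static boolean foundAfterMutation() {
        List<String> key = new ArrayList<>(List.of("a"));
        Set<List<String>> set = new HashSet<>();
        set.add(key);               // stored under the hash of ["a"]
        key.add("b");               // hashCode() now reflects ["a", "b"]
        return set.contains(key);   // lookup uses the new hash
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation());
    }
}
```

The element is still in the set (its size is 1), but the lookup no longer finds it because the hash it was stored under is stale.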
>
> ## Additional considerations
>
> Here are a few less fundamental points about accepting mutable records,
> none of which are slam-dunks, but might still be useful to consider:
>   - People will just work around it anyway, as they do with lambdas.  If
> a class has N-1 final fields, and one mutable one, what do we think
> they're going to do?
>   - C# embraced mutable records.  This isn't surprising, but what is
> surprising is that Scala's case classes did also.  While I don't have
> data from either Neal or Martin, I suspect that they went through a
> similar analysis -- that it would leave out too many desirable use cases
> for the feature, and still not protect us from deeper mutability anyway.
>   - Mutability introduces pain, but so does repetition and boilerplate
> -- it gives bugs a place to hide.  Making the feature less applicable
> consigns more users to using a more error-prone mechanism.
>
> ## Fields: final by default?
>
> One of the nudges we've considered is making fields final by default,
> but letting them be declared non-final.  This is a nudge, in that it
> sends a message that records are best served immutable, but if you want
> your revenge warm, you can have it.  I think there are reasonable
> arguments on both sides of this story, but one argument I am not
> particularly motivated by is "but then we'd have to introduce non-final
> as a keyword."  If we think final-by-default is a good idea, I don't
> think the lack of a denotation should be the impediment.
>
> ## Clone
>
> Clone is a mess, and I'm not sure there's a good answer here, but
> there's surely a good discussion.
>
> As a user, I find the ability to clone arrays (despite being shallow) is
> super useful, and it makes it far easier to be good about doing
> defensive copies all the time.  If cloning were harder (new
> array/arraycopy), I'd probably cut more corners.  If we can deliver the
> same benefit for records, that seems enticing.
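
For reference, the array idiom being praised here is roughly the following. Samples is a hypothetical class; only the clone() calls matter.

```java
// Defensive copies via array clone(), on the way in and on the way out.
final class Samples {
    private final int[] values;

    Samples(int[] values) {
        this.values = values.clone();   // caller's later writes don't leak in
    }

    int[] values() {
        return values.clone();          // caller's writes to the result don't leak back
    }
}
```

For an int[] a shallow copy is a full copy; for an array of objects the elements themselves would still be shared, which is the shallow-versus-deep question raised in the next paragraph.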
>
> There's a fair argument over whether the standard clone should be
> shallow (easy to specify and implement) or should try to deeply clone
> Cloneable components.  Or maybe both options suck.  Or maybe it should
> be opt in; if the record extends Cloneable, you get a clone() method.
>
>
> What did I miss?
>
>
>