Records and derived fields (was: Updated document on data classes and sealed types)

Sat Mar 9 13:37:16 UTC 2019

The second of the two topics raised by Kevin’s note.

The subject of ancillary fields seems to be the hardest question to answer; none of the answers seem great.  But let’s tease apart some of the use cases.  

I think Kevin’s comment here is, at root: “but derived fields are super-useful, and seem perfectly safe, isn’t there some way to wedge these into the model?”  

Correct me if I’m wrong, but I would think you would agree that arbitrary mutable fields in records are _not_ a great idea.  So the game here is: is it possible to carve out a space where derived fields are possible, but non-derived ones are not?  

(And, I think there is, if we stop thinking about “fields” and start thinking about semantics.  Derived fields are a mechanism, in aid of: is it possible to ensure that a computation on record state is done at-most-once?  More on that later.)

> 
>> 
>> I still want to understand what the scenario we're worried about here is. Whether the value is computed later using a "lazy fields" feature or eagerly in the constructor, only the record's state is in scope, and sure, people can shoot themselves in the foot by calling out to some static method and getting some result not determined by the parameters, but why is this worth worrying about? Do you have an example that's both dangerous and tempting? (Sorry if you've said it before.)
> 
> If we did allow them, the next thing people (Alan already did, and you made this same comment in an earlier round) would ask is whether they can be mutable, so that derived fields can be lazily derived.  Now, records are a combination of a “true record” plus an unconstrained bag of mutable state.
> 
> My question did actually exclude this option. We should tell those people no.

OK, so you’re suggesting: ancillary final fields with initializers are OK.  Not a totally silly option (we did talk about it before), but let’s look at how it affects the user perception of what the feature is for, and then try to make a cost-benefit comparison between this and the base case (no additional fields).  

Note that the benefit we’re aiming for here is purely an optimization; the avoidance of recalculating derived state.  (Nagging question: if this optimization _is_ super-important, is this the best way to ensure it?)  

I think you have also noted in the past that there are lots of ways to get around the restriction, at various degrees of obviously-missing-the-point:

     final Foo[] notReallyDerived = new Foo[1]; // effectively, a mutable Foo field

     static final WeakHashMap<MyRecord, Foo> cache = new WeakHashMap<>(); // same effect, just more absurd
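
To make the first workaround concrete, here is a minimal sketch.  It uses a plain final class as a stand-in for a record, since the whole question is whether records should permit such a field at all; the names Foo, MyRecord, get() and set() are placeholders for illustration, not anything proposed.

```java
// Placeholder payload type, purely for illustration.
final class Foo {
    final String label;

    Foo(String label) {
        this.label = label;
    }
}

// Stand-in for a record that was allowed an "ancillary final field".
final class MyRecord {
    private final String first;
    private final String last;

    // The reference is final, but slot 0 of the array can be reassigned
    // at any time, so this behaves exactly like a mutable Foo field.
    private final Foo[] notReallyDerived = new Foo[1];

    MyRecord(String first, String last) {
        this.first = first;
        this.last = last;
    }

    Foo get() {
        return notReallyDerived[0];
    }

    void set(Foo foo) {
        notReallyDerived[0] = foo; // legal despite the 'final' modifier
    }
}
```

Nothing in the declaration ties the array's contents to first or last, which is why a "final fields only" rule does not by itself confine the extra state to derived quantities.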

> 
>  And what do you think the chances are that this state won’t make it into equals/hashCode semantics?
> 
> If it's derived, it doesn't hurt that much; if it's not, why are they working so hard to not make it a regular record field? 
> This is part of what I'm talking about when I say "sure, they can shoot themselves in the foot". What is a realistic example we are worried about?

The language doesn’t have a notion of “derived quantity”, so any attempt to restrict the relaxation to derived quantities will be an approximation.  But I think the “why are they working so hard” question answers itself: concision!  And the more complex the boundary of the records feature is, the more that Billy will be confused into thinking this is just an overly-complicated, sharp-edged, frankly-crappy way to get concision.  

If we want to carve out an exception for “derived state”, let’s go there more directly (in a way that benefits all classes, not just records).  One way we discussed was something like this:

     lazy final String fullName = first + last;

and then allowing records to have lazy fields.  This is a stronger hint that this is no ordinary field, but ultimately still is too easy to abuse by initializing it with a one-element array.  
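
For comparison, this is roughly what one writes by hand today to get the effect of such a lazy field.  It is only a sketch: Name is a plain class standing in for the record, and the caching uses the same racy single-check idiom as String.hashCode(), which is safe here only because String is immutable.

```java
// Hand-written approximation of "lazy final String fullName = first + last;".
final class Name {
    private final String first;
    private final String last;

    // Cannot be declared final, because it is assigned after construction.
    // null means "not computed yet"; concatenation never produces null.
    private String fullName;

    Name(String first, String last) {
        this.first = first;
        this.last = last;
    }

    String fullName() {
        String result = fullName;   // read the field exactly once
        if (result == null) {
            result = first + last;  // derived purely from record state
            fullName = result;      // racing threads may each compute and store an equal value
        }
        return result;
    }
}
```

Note that this idiom avoids repeated work in the common case but does not strictly guarantee a single computation under contention, and it costs a non-final field plus several lines of boilerplate per derived value.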

A more direct way to get there is to introduce a semantic notion that a computation should be done at most once: 

     record Name(String first, String last) { 
          __at_most_once String fullName() -> first + last;
     }

This has a lot of advantages over the field approach:

 - We have said directly what we mean, in a way that doesn’t “leak” its implementation mechanism (fields);
 - The runtime can probably optimize more directly and flexibly;
 - We haven’t distorted the set of class members for a performance concern;
 - Mechanism usable equally by records and non-records.

A downside is that we don’t have this mechanism yet, and we don’t really want to hold up records to get it.    
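
In the meantime, one possible reading of the intended semantics can be approximated in library code.  The sketch below is an assumption on my part, not a proposed design: AtMostOnce and PersonName are hypothetical names, and the counter exists only to make the "computed once" behaviour observable.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper: runs the supplied computation at most once
// (successfully) and hands the same result to every later caller.
final class AtMostOnce<T> {
    private Supplier<T> computation;  // dropped once the value is known
    private volatile boolean done;    // published after 'value' is written
    private T value;

    AtMostOnce(Supplier<T> computation) {
        this.computation = computation;
    }

    T get() {
        if (!done) {                  // fast path once computed
            synchronized (this) {
                if (!done) {          // re-check under the lock
                    value = computation.get();
                    computation = null;
                    done = true;
                }
            }
        }
        return value;
    }
}

// Plain class standing in for "record Name(String first, String last)".
final class PersonName {
    private final AtomicInteger computations = new AtomicInteger();
    private final AtMostOnce<String> fullName;

    PersonName(String first, String last) {
        this.fullName = new AtMostOnce<>(() -> {
            computations.incrementAndGet(); // demo only: count evaluations
            return first + last;
        });
    }

    String fullName() {
        return fullName.get();
    }

    int computations() {
        return computations.get();
    }
}
```

Calling fullName() any number of times on one PersonName leaves computations() at 1.  The cost is still an extra field and an extra object per instance, which is exactly the kind of implementation leakage a language-level notion would avoid.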
