Record copy()/with()
John Rose
john.r.rose at oracle.com
Sat May 23 20:00:36 UTC 2020
On May 23, 2020, at 9:07 AM, Remi Forax <forax at univ-mlv.fr> wrote:
>
>>
>> 3. Record components could also be normalized in the constructor. E.g.
>> assume record Fraction(int numerator, int denominator) that normalizes
>> the components using GCD in the constructor. Using withers would
>> produce a weird result based on the previous value.
>
> A record is transparent (same API externally and internally) so as a user i expect that if i set a record component to 3 using a constructor and then ask its value, the result will be 3,
This is not the expectation we set for record constructors, so it is
debatable that it would be an expectation for record “withers”.
This is why I like to use a special term for “withers”: If an object
can be built from scratch by calling its class’s constructor, then
it follows that an API point which takes an existing record and
builds a new one (with modifications) is a sort of constructor
also; I call it a “reconstructor”.
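(Concretely, a hand-written reconstructor today is just a method that routes back through the canonical constructor, so any checks in that constructor are re-applied. A minimal sketch, with a hypothetical Point record; the with* names are illustrative, not a proposed API:

```java
// Hypothetical example: each "wither" is really a reconstructor in
// disguise, because it re-runs the canonical constructor rather than
// poking a field directly.
record Point(int x, int y) {
    Point withX(int x) { return new Point(x, this.y); } // re-runs the constructor
    Point withY(int y) { return new Point(this.x, y); }
}
```
)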
Some context: I’ve been thinking about this for a long time.
When we added non-static final variables, with their funny DU/DA
rules, I did the design and implementation, and one thing that
irritated me was the lack of a way to change just one or two
final fields of an object full of finals. At the time final fields
were rare so we thought the user would always prefer to build
another object “from scratch”. And it would be expensive enough,
since fast GC was not yet a thing, that the user would not want a
slick way to do the job. Now with inline types the balance of
concerns has shifted, towards objects with *only* final fields,
and very low allocation costs, yet we still have a “hole” in our
language, corresponding to that old and now-important
technical debt. Back in the ‘90s I didn’t have a crisp idea about
the shape of this technical debt, but now I think I do. Which
is not to say that I have a crisp idea how to pay it off, but I have
some insights I want to share.
The debt shows up when we transform mutable iterators into
immutable value-based “cursors”. For those we need a way to say
“offset++” inside an active cursor—yielding a new cursor value which
is the logical “next state” of the iteration. Physically, that’s a
constructor for the cursor class which takes the old cursor and does
“offset++” (or the equivalent) in the body of that constructor. Because
it takes the old cursor and preserves all other fields untouched, it is a
very special kind of constructor, which should be called “reconstructor”
because it reconstructs (a copy of) the old object to match some
new modeling condition (an incremented offset). As a side note, it
is telling that the rules of constructors (but no other methods)
allow you to perform local side effects on variables which
correspond to fields; of course you commit “this.offset = offset”
at some point and shouldn’t dream of modifying “this.offset”,
but the constructor is free to change the values before committing
them. Compact record constructors make this more seamless.
In such settings (which are natural to reconstructors), saying
something like “offset++” or “offset -= adjust”, before commit,
is totally normal.
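(The cursor shape I have in mind might look like the sketch below; ArrayCursor and next() are illustrative names of my own, not anything specified anywhere:

```java
// Hypothetical immutable cursor over an array: advancing yields a new
// cursor value instead of mutating the old one. next() plays the role
// of a reconstructor: it copies every field except the offset, which it
// increments, by calling the canonical constructor again.
record ArrayCursor(int[] elements, int offset) {
    boolean hasNext()  { return offset < elements.length; }
    int current()      { return elements[offset]; }
    ArrayCursor next() { return new ArrayCursor(elements, offset + 1); } // "offset++"
}
```
)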
A reconstructor makes the most sense for an inline type, but it
also makes sense for identity types (as long as users are willing
to eat the cost of making a new version of the object instead of
side-effecting the old version).
> so i don't think normalizing values of a record in the constructor is a good idea.
> This issue is independent of with/copy, calling a constructor with the results of some accessors of an already constructed gcd will produce weird results.
Calling it a reconstructor sets expectation that those results are
not weird at all: You are just getting the usual constructor logic
for that particular class, which enforces all of its invariants.
Having a “wither” that can “poke” any new value unchecked
into an already-checked configuration, without allowing the class
to validate the new data, would be the “weird result” in this case.
Let’s not do that, and let’s not set that kind of expectation.
Encapsulation means never having anyone else tell you the
exact values of your fields. Constructors are the gatekeepers
of that encapsulation. Even record classes (transparent as they
are) are allowed to have opinions about valid and invalid field
values, and to reject or modify requests to create instances
which are invalid according to the contract of the record class.
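(The Fraction record from earlier in the thread, sketched with a compact canonical constructor that normalizes via GCD; the exact normalization is my assumption, since the thread doesn't spell it out. The point is that any reconstruction routed through this constructor gets re-normalized for free:

```java
// Sketch: the compact canonical constructor normalizes the components
// before the fields are committed. A reconstructor that delegates to
// this constructor can never produce an unnormalized Fraction.
record Fraction(int numerator, int denominator) {
    Fraction {
        if (denominator == 0) throw new ArithmeticException("zero denominator");
        int g = gcd(Math.abs(numerator), Math.abs(denominator));
        numerator /= g;      // local side effects on the parameters,
        denominator /= g;    // committed to the fields afterward
    }
    private static int gcd(int a, int b) { return b == 0 ? a : gcd(b, a % b); }
}
```
)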
>
>>
>> 4. Points 2 and 3 may lead to the conclusion that not every record
>> actually needs copying. In fact, I believe, only a few of them would
>> need it. Adding them automatically would pollute the API and people
>> may accidentally use them. I believe, if any automatic copying
>> mechanism will be added, it should be explicitly enabled for specific
>> records.
An explicitly declared reconstructor would fulfill this goal.
A reconstructor, as opposed to a “wither” feature, would also scale
from one argument to any number of arguments.
This leads to the issue of defining API points which are polymorphic
across collections of (statically determined) fields. Which is the
present point. Note, however, that it has surprisingly deep roots.
As soon as we added non-static final fields to Java, we incurred a
debt to eventually examine this problem. Time’s up; here we are.
> with/copy calls the canonical constructor at the end, it's not something that provide a new behavior, but more a syntactic sugar you provide because updating few fields of a record declaring a dozen of components by calling the canonical constructor explicitly involve a lot of boilerplate code that may hide stupid bugs like the values of two components can be swapped because the code called the accessors in the wrong order.
Well, this is an argument for keyword-based constructors also.
And reconstructors as well. (See the connection? It’s 1.25 things
here not 2 things.)
Setting all of the above aside for now, I have one old and one new
idea about how to smooth out keyword-based calling sequences.
These are offered in the spirit of brainstorming.
The old idea is that, while good old Object… is a fine way to pass
stuff around, we could (not now but later) choose to expand the
set of available varargs calls by adding new carrier types as possible
varargs bundles. So a key/val/key/val/... sequence could be passed
with keys strongly typed as strings (or enum members, for extra
checking!) and the vals typed as… well Object, still. The move
needed for such a thing is, I think, simple though somewhat
disruptive. Sketch of design:
- have some way for marking a class A as varargs-capable
- allow a method m to be marked as A-varargs instead of Object[]-varargs (m(…A a)?)
- transform any call to m(a,b,c…) into m(new A(a,b,…))
- use the standard rules for constructor resolution in A
- note that at least one A constructor is probably A-varargs (recursive)
- this allows A’s constructors to do a L-to-R parse of m’s arguments
- A(T,U) and A(T,U,…A) give you Map<T,U> key/val/key/val lists
I’m just putting that out there. We can use Object… for the
foreseeable future. An enhanced varargs feature would let us do
better type checking, though. It would also allow the varargs
carrier (A not Object…) to be (drum roll please) an inline type,
getting rid of several kinds of technical debt associated with
array-based varargs.
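(Since no such A-varargs feature exists yet, a rough stand-in using today's Object… varargs can at least illustrate the left-to-right key/val parse the carrier would do; KeyValArgs is a hypothetical name, and the runtime key check stands in for what the real feature would verify statically:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Rough stand-in for the proposed varargs carrier: parse a
// key/val/key/val... sequence left to right, requiring the keys to be
// Strings. The real feature would let the carrier's constructors do
// this parse with static type checking.
final class KeyValArgs {
    private final Map<String, Object> parsed = new LinkedHashMap<>();
    KeyValArgs(Object... kvs) {
        if (kvs.length % 2 != 0) throw new IllegalArgumentException("odd arg count");
        for (int i = 0; i < kvs.length; i += 2) {
            if (!(kvs[i] instanceof String key))
                throw new IllegalArgumentException("key at " + i + " is not a String");
            parsed.put(key, kvs[i + 1]);
        }
    }
    Object get(String key) { return parsed.get(key); }
}
```
)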
Second, here’s a new idea: During the JSR 292 design, we talked
about building BSMs which could somehow capture constant
(or presumed-constant) argument values and fold them into the
target of the call-site. Remi, your proposed design for record
<strikeout>withers</strikeout> reconstructors could use such
a thing. We were (IIRC) uncertain how to do this well, although
you may have prototyped something slick like you often do.
This conversation made me revisit the question, and I have a
proposal, a new general-purpose BSM combinator which sets
a “trap” for the first call to a call site, samples the arguments
which are purported to be constant, and then spins a subsidiary
call site which “sees” the constants, and patches the latter call
site into the former. Various configurations of mutable and
constant call sites are possible and useful. A new kind of
call site might be desirable, the StableCallSite, which is one
that computes its true target on the first call (not linkage)
and thereafter does not allow target changes.
/** Arrange a call site which samples selected arguments
* on the first call to the call site and calls the indicated bsm
* to hand-craft a sub-call site based on those arguments.
* The bsm is called as CallSite subcs = bsm(L,S,MT,ca…,arg…)
* where the ca values are sampled from the initial dynamic
* list according to caspec. */
StableCallSite bootstrapWithConstantArguments(L,S,MT,caspec,bsm,arg…)
/** Same as bootstrapWithConstantArguments, but the bsm
* is called not only the first time, but every time a new argument
* value is encountered. Arguments are compared with == not equals.
*/
CallSite bootstrapWithSpeculatedArguments(L,S,MT,caspec,bsm,arg…)
Other variations are possible, using other comparators and also
key extractors. Object::getClass is a great key extractor; this gives
us the pattern of monomorphic inline caches. For the speculating
version, the existing and new targets could be recombined into a
decision tree; that requires an extra hook, perhaps a SwitchingCallSite
or a MH switch combinator.
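(There is no StableCallSite in java.lang.invoke today, but the first-call “trap” can be sketched with the existing MutableCallSite: the initial target lands in a fallback that samples an argument, builds a specialized target around it, and patches itself out. The names and the toy specialization below are mine, not part of any proposal:

```java
import java.lang.invoke.*;

// Sketch of the "trap on first call" idea using today's API. A real
// StableCallSite would additionally forbid target changes after the
// first call; MutableCallSite merely permits this usage pattern.
final class FirstCallTrap {
    static final MethodType TYPE = MethodType.methodType(String.class, String.class);
    final MutableCallSite site = new MutableCallSite(TYPE);

    FirstCallTrap() throws ReflectiveOperationException {
        MethodHandle fallback = MethodHandles.lookup()
            .findVirtual(FirstCallTrap.class, "firstCall", TYPE)
            .bindTo(this);
        site.setTarget(fallback);  // trap: first call goes through firstCall
    }

    // Runs only on the first call; later calls hit the specialized target.
    String firstCall(String arg) throws Throwable {
        // "Fold in" the sampled argument: the specialized target ignores
        // its parameter and returns a result built from the first one.
        MethodHandle specialized = MethodHandles.dropArguments(
            MethodHandles.constant(String.class, "hello, " + arg), 0, String.class);
        site.setTarget(specialized);  // patch the trap out
        return (String) specialized.invoke(arg);
    }
}
```
)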
Anyway, I’m brainstorming here, but it seems like we might have
some MH API work to do that would give us leverage on the wither
problem. It’s probably obvious, but I’ll say it anyway: The “caspec”
thingy (a String? “0,2,4”?) would point out the places where the key
arguments are placed in the field-polymorphic reconstructor call.
The secondary BSM would take responsibility for building a custom
reconstructor MH that takes the non-key (val) arguments and builds
the requested record. The primary BSM (bootstrapWithCA) would
recede into the background; it’s just a bit of colorless plumbing, having
no linkage at all to record type translation strategy, other than the
fact that it’s useful. Maybe there’s a record-specific BSM that wraps
the whole magic trick, but it’s a simple combo on top.
The hard part is building the reconstructor factory. I think that
should be done in such a way that the record class itself has complete
autonomy over the reconstruction. Probably the reconstructor
factory should just wire up arguments “foo” where they exist
in the reconstructor call, and pass argument “this.bar” where
they are not mentioned via keys. This is easy. It’s clunky too,
but until the JVM gives a real way to say the thing directly,
it will work. Note that the clunkiness is hidden deep inside
the runtime, and can be swapped out (or optimized) when
a better technique is available, *without changing translation
strategy*. For records, everything goes through the canonical
constructor, including synthesized reconstructors. In the
case of *inline* records, the runtime would create a suitable
constructor call, and might (if it could prove it correct) use
bare “withfield” opcodes to make optimization easier.
I think if we expose the API point as MyRecord::with(Object…)
it should be possible to call the thing reflectively, or with
variable keywords, or whatever. But javac should detect the
common case of non-variable keywords, do some checks,
and replace the call site with an indy, for those cases. That
way we can have our cake and eat it too.
(There are other things we can do beyond that, by slicing
up the per-variable concerns from the per-instance concerns
in the constructor, leading to better optimizations. That requires
translation strategy hooks not yet designed. But this is enough
brainstorming for one email.)
— John