Record copy()/with()

Sun May 24 20:44:54 UTC 2020

On May 24, 2020, at 7:50 AM, forax at univ-mlv.fr wrote:
> ...I call it a “reconstructor”.
> 
> yes, it's a good name.

Glad you agree!

> ...A reconstructor makes the most sense for an inline type, but it
> also makes sense for identity types (as long as users are willing
> to eat the cost of making a new version of the object instead of
> side-effecting the old version).
> 
> I would say a reconstructor makes sense where there is a canonical constructor.

Very good point.  I suspect there are other uses where it makes
sense too:  For a single-field “wither” (special case, but maybe the
class designer likes it), or for a named-argument reconstructor
which is not canonical, but only lets you change a subset of the
fields, and/or field values within smaller limits (useful if the
canonical constructor is more private than the reconstructor).

Basically, the insight is that the canonical reconstructor provides
access to all possible names, but that may be too much, and so
lesser reconstructors (with names, probably) need to provide
restricted views.
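
A rough approximation in today's Java, with no reconstructor syntax, may make the "restricted view" idea concrete. This is only a sketch: the class `Account` and its methods `open` and `deposit` are invented for illustration and appear nowhere in this thread. The canonical constructor is private, while a named public method rebuilds the object with one field changed, under a tighter limit than the constructor itself enforces.

```java
// Hypothetical class, for illustration only.
final class Account {
    private final String id;
    private final long balanceCents;

    // The canonical constructor is private: it can set every field.
    private Account(String id, long balanceCents) {
        if (balanceCents < 0) {
            throw new IllegalArgumentException("negative balance");
        }
        this.id = id;
        this.balanceCents = balanceCents;
    }

    static Account open(String id) { return new Account(id, 0); }

    String id() { return id; }
    long balanceCents() { return balanceCents; }

    // A "lesser" reconstructor: public, named, changes only one field,
    // and only within a narrower limit than the constructor enforces.
    Account deposit(long cents) {
        if (cents <= 0) {
            throw new IllegalArgumentException("deposit must be positive");
        }
        return new Account(id, Math.addExact(balanceCents, cents));
    }
}
```

The old object is left untouched; `deposit` hands back a new version, which is the cost mentioned earlier for identity types.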

> One thing we have not done is to figure out exactly what a canonical constructor means for a plain old Java class,
> it's something we will have to figure out when introducing inline classes, because not all inline classes are record classes.

Yep.  See above:  I think there is a range of possibilities.

>> ...An explicitly declared reconstructor would fulfill this goal.
> 
> yes, but given that we already provide a canonical constructor, i don't see the point of not providing the corresponding reconstructor.

Yes.  The CC associates closely with a canonical deconstructor
(matcher) and a canonical reconstructor.

>> 
>> A reconstructor, as opposed to a “wither” feature, would also scale
>> from one argument to any number of arguments.
>> 
> 
> yes, as Tagir said, being able to set several fields at the same time allows the invariants to be kept right.

+1 I don’t know how to express this nicely in concrete notation,
but each field could express the fragment of the (re/de)constructor
logic that pertains to it, in isolation.  That would cover clipping,
validation, defensive copying, etc.  And then the bodies of the
(re/de)constructors would be mostly empty, because they would
contain only the logic which ensures coherence between multiple
fields (such as a “same sign” constraint).  To be clear:  I’m not
sure if that would pay for itself as a Java language feature, but
I’m on the lookout for the distinction between “what a constructor
does on a single object field” vs. “what a constructor does for
the object as a whole”.
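
Lacking a notation, the distinction can at least be imitated by hand in a record's compact constructor. The sketch below is an assumption-laden illustration, not a proposal: the record `Move`, the bound `MAX`, and the helpers `clip` and `copy` are all made up. The per-field steps (clipping, defensive copying) are written as isolated helpers, and the constructor body proper keeps only the multi-field "same sign" check.

```java
import java.util.List;

// Hypothetical record, for illustration only.
record Move(int dx, int dy, List<String> notes) {
    static final int MAX = 100;

    // Per-field logic, each piece written in isolation.
    private static int clip(int v) { return Math.max(-MAX, Math.min(MAX, v)); }
    private static List<String> copy(List<String> notes) { return List.copyOf(notes); }

    Move {
        // Single-field steps: clipping and defensive copying.
        dx = clip(dx);
        dy = clip(dy);
        notes = copy(notes);
        // Whole-object step: the "same sign" coherence constraint.
        if ((long) dx * dy < 0) {
            throw new IllegalArgumentException("dx and dy must have the same sign");
        }
    }
}
```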

>> 
>> ...Well, this is an argument for keyword-based constructors also.
>> And reconstructors as well.  (See the connection?  It’s 1.25 things
>> here not 2 things.)
>> 
> yes, but introducing named arguments constructors without named arguments methods is weird.
> That said, there is a difference between the two, at least if i understand correctly what they have done in C# (which has two different syntaxes for named arguments and named argument constructors for record),
> Using your cursor example, a cursor on an array can be defined like this
>   record Cursor(int offset, String... array) {
>    String item() { return array[offset]; }
>   }
>   var cursor = new Cursor(0, "foo", "bar");
> and to update the offset
>   cursor = cursor.with("offset", cursor.offset() + 1);
> 
> what C# does by introducing a special syntax is to say that inside the method call to with, 'this' doesn't reference the enclosing class but the current cursor,
> so 'offset' in the right hand side of the expression refers to the cursor offset
>   cursor = cursor with { offset = offset + 1 }

OK, yes; I phrased the translation strategy suggestions in the terms
you set (key/val/key/val), but it’s better to have something that looks
like a higher-order function that can tweak all the fields.  Then
there are no stringy names, just carefully adjusted scope rules.

I like to call this form of a reconstructor an “external reconstructor”,
because the body of the reconstruction logic is arbitrary code, supplied
from *outside* the encapsulation.  But it does fit in the rubric of
(re)constructors, because the body of C# “with” looks and acts like
a constructor placed *inside* the class, especially if you use the
(wonderful IMO) convention that final fields are not committed
until the end, and before that they are in mutable variables, to
be validated, clipped, and defensively copied, before the object
is populated.

inline record class Cursor {
   Object[] base; int offset;
   Cursor { rangeCheck(); }
   // named reconstructor provides a bump:
   __RECONSTRUCTOR incrementBy(int delta) {
      offset += delta;
      rangeCheck();
   }
   Cursor increment() { return incrementBy(1); }
}

As opposed to:

Cursor x = …;
x = x with { offset += 1; };  // ad hoc increment

The “with” form here is external, and must not be entrusted
with the task of calling rangeCheck.  The fields of the Cursor
are private, but somehow the “with” syntax needs to provide
access to them, at a distance, to allow the with-block to read
and write them.  Reading is easy using record accessors, while
writing must be done on exit from the block with a “commit”
which creates the new record from the ambient values.  That
looks exactly like the internal reconstructor API we already
discussed, with indy and all the rest.

I don’t know a nice notation for a class to provide an external
reconstructor, but I think it would take the form of an *internal*
reconstructor, plus a “hole” where external clients were invited
to place their own block of code.  That block would have fields
of the class (some or all) in scope, mediated by accessors, and
on exit would provide a new object state.  Note that this new
object state would have to be further validated, clipped, and
defensively copied by additional code in the reconstructor body,
so the “hole” for the external block would have to be *nested*
inside the declaration of the reconstructor.  Like this:

inline record class Cursor {
   Object[] base; int offset;
   Cursor { rangeCheck(); }
   __RECONSTRUCTOR withMyChanges() {
      __EXTERNAL_BLOCK_HERE;
      rangeCheck();
   }
}

Cursor x = …;
Object[] newBase = null; ...
x = x.withMyChanges() {
   if (newBase != null) { base = newBase; offset = 0; return; }
   ++offset;
};
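
For comparison, here is roughly how the same shape can be imitated in current Java, with a lambda standing in for the external block. Everything here is a stand-in I am assuming for illustration: the mutable `Draft` holder plays the role of "fields in scope", `Consumer<Draft>` plays the role of the hole, and the commit is an ordinary call to the canonical constructor, which re-runs the range check. It allocates and uses a visible holder object, so it shows the control flow only, not the intended scoping or performance.

```java
import java.util.Objects;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names are proposed API.
record Cursor(Object[] base, int offset) {
    Cursor {
        Objects.requireNonNull(base, "base");
        // Range check: offset may point one past the last element.
        if (offset < 0 || offset > base.length) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
    }

    // Mutable stand-in for "the fields in scope" inside the external block.
    static final class Draft {
        Object[] base;
        int offset;
    }

    // The "reconstructor with a hole": client code runs in the middle,
    // and the class revalidates before committing the new state.
    Cursor withMyChanges(Consumer<Draft> block) {
        Draft d = new Draft();
        d.base = base;
        d.offset = offset;
        block.accept(d);                      // external block runs here
        return new Cursor(d.base, d.offset);  // commit; the range check runs again
    }
}

class CursorDemo {
    public static void main(String[] args) {
        Cursor x = new Cursor(new Object[] {"foo", "bar"}, 0);
        Object[] newBase = null;
        x = x.withMyChanges(d -> {
            if (newBase != null) { d.base = newBase; d.offset = 0; return; }
            ++d.offset;
        });
        System.out.println(x.offset());
    }
}
```

Note that the block never calls the range check itself; the class does that on commit, so an external block cannot produce an out-of-range Cursor.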

> 
> This notion that inside a 'builder' 'this' may not reference the enclosing instance but the receiver of the method call was first introduced in Groovy and is now used in Kotlin, Swift and C#.

To avoid confusion, it might be nice if internal and external reconstructor
blocks didn’t have “this” in scope at all.  Not sure.  With external blocks,
there is potential confusion with an enclosing “this” from a method body.
We don’t need to have lots of little “this” bindings floating around as
a prerequisite to doing external reconstructors:  It’s enough to say that
(some or all of) the *fields* are in scope in the block, not “this”.  The
result is a transactional notation which operates on a cadre of named
values, produced by an encapsulation, and later consumed by it.

If the user wants to name the whole object, it’s easy enough to define
a named temp nearby which points at the whole object.  So dropping
“magic mini-this” from the design doesn’t hurt expressiveness,
and arguably forces the user to write more readable code.

> 
>> ...Sketch of design:
>> 
>>  - have some way for marking a class A as varargs-capable
>>  - allow a method m to be marked as A-varargs instead of Object[]-varargs (m(…A a)?)
>>  - transform any call to m(a,b,c…) as m(new A(a,b,…))
>>  - use the standard rules for constructor resolution in A
>>  - note that at least one A constructor is probably A-varargs (recursive)
>>  - this allows A’s constructors to do a L-to-R parse of m’s arguments
>>  - A(T,U) and A(T,U,…A) give you Map<T,U> key/val/key/val lists
>> 
> I don't think you need A to be varargs-capable, having a constructor which is a varargs is enough.
> I prefer using ** in a postfix way, Object[]** for array, List<String>** for list, Map<String,Integer>** for map.
> and + 1 of being recursive

Yes you are right; a varargs-capable class is really one which has one
or more constructors (or factory methods!!) which are acceptable to
use for reducing varargs lists into objects (that is, varargs boxing).

I mention factory methods because it would be a pretty straightforward
thing to mark java.util.List.of as a varargs factory, and then instantly
we’d have well-behaved varargs lists shaped as immutable Lists instead of
mutable arrays.  Constructors come first conceptually, but static factory
methods come next.  (And they are conflated in the case of inline objects.)
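
To show what that boxing step amounts to, here is the hand-written equivalent in current Java. This is only a sketch of the desugaring under my own assumptions: the class `VarargsBoxingDemo` and its two methods are illustrative names, and no "List-varargs" marker exists today, so the caller writes the `List.of` call explicitly where a compiler might someday insert it.

```java
import java.util.List;

// Hypothetical demo class; the method names are illustrative only.
class VarargsBoxingDemo {
    // Today's array-shaped varargs: the callee receives a mutable String[].
    static int countArray(String... items) { return items.length; }

    // List-shaped parameter: what a "List-varargs" callee would receive.
    static int countList(List<String> items) { return items.size(); }

    public static void main(String[] args) {
        System.out.println(countArray("a", "b", "c"));
        // Hand-written form of the boxing step a compiler might insert:
        System.out.println(countList(List.of("a", "b", "c")));
    }
}
```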

> 
> And i think you need to add what i was saying above, that 'this' in this context means the current constructed object

> 
> yes, varargs call should not allocate any object, implementing List** and Map** doesn't seem complex if they are translated into indy by the compiler and inlined at runtime.

Yes, I think I see what you mean.  You need *something*
to reify the set of arguments, but it can be an inline.

>> Remi, your proposed design for record
>> <strikeout>withers</strikeout> reconstructors could use such
>> a thing.  We were (IIRC) uncertain how to do this well, although
>> you may have prototyped something slick like you often do.
>> 
> I have done something like this, you don't need a new kind of BSM just a BSM that traps arguments that are constant (that are the same as when previously called) and calls another BSM with an array of the constant arguments as last arguments.
> I don't use that code anymore
> 1/ testing with == to find the constant is not enough, sometimes you want the class to not change, not the value
>    By example, for a reconstructor having the names of the components to be constant is not enough, you also need the class of the receiver to be able to see the name as record component.
>    Or a value like a Locale to be equivalent to a previously seen version

That’s the sort of thing I meant to address by suggesting key extractors
and comparators in the mix.

> 2/ you want to speed up the process if the compiler knows that something is a constant, there is no point to rediscover that it's a constant, so the compiler has to be aware of the special protocol

Yes, there’s often JIT work with these things to tweak the optimizations.

> 3/ storing method arguments means they are leaking because you're delaying the time they can be GCed

Yes.  The user of this should probably use long-lived keys, rather than
“sample” the first argument and save that.

assert(isSameClass(firstX, currentX));

is simpler but worse than

assert(firstXClassSaved == currentX.getClass());

In one case, a comparator looks at classes, and in the other
a class (the comparison key) is saved and reused, with getClass
as the key extractor.
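
A tiny one-entry cache may make the key-extractor variant concrete. This is a sketch under my own assumptions, not the BSM protocol under discussion: `KeyedCache`, its `misses` counter, and the two function parameters are hypothetical, and the code is not thread-safe. The point is only that the cache retains the extracted key (such as a Class), never the argument it was sampled from.

```java
import java.util.function.Function;

// Hypothetical one-entry cache keyed by an extracted key, not by the argument.
final class KeyedCache<A, K, R> {
    private final Function<A, K> keyExtractor;
    private final Function<K, R> slowPath;
    private K savedKey;      // long-lived key (for example a Class), not the argument
    private R savedResult;
    int misses;              // exposed only so the behavior can be observed

    KeyedCache(Function<A, K> keyExtractor, Function<K, R> slowPath) {
        this.keyExtractor = keyExtractor;
        this.slowPath = slowPath;
    }

    R get(A arg) {
        K key = keyExtractor.apply(arg);
        if (savedKey == null || !savedKey.equals(key)) {
            // Key changed (or first call): redo the expensive work once.
            savedResult = slowPath.apply(key);
            savedKey = key;
            misses++;
        }
        return savedResult;  // arg itself is never stored
    }
}
```

With `Object::getClass` as the extractor, two different strings hit the same entry, and nothing keeps the first string alive.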

> 4/ in the end, it's a kind of "half empty" mechanism (i don't know if there is a better name), it's for improving the performance of few cases but at the same time, writing an ad-hoc BSM for those cases is always better.
>     And in a runtime, if you have crossed the Rubicon saying i want to optimize that call, you don't want to half optimize it but squeeze the last drop of perf you can.

Oh, that’s a sad story of non-reusability.  Surely we can do better,
but it looks like you ran into a firm limit somewhere.

> ...
> For inline, the inline static constructor has to be marked as canonical as i was saying earlier,
> So there is no need to use withfield directly, calling the inline static constructor is enough.

There is no logical need to use withfield, as you note, but
there are physical reasons that withfield is better if you only
(in fact) need to change one field.  This is touching on the
decision to use withfield instead of a “vnew” or “newinline”
instruction, which would embody the canonical constructor.
The physical problem at the JVM level with the canonical
constructor, as a primitive, is the complexity of its linking
(naming) and also of its data motion.  Also, assuming an
inline object is scalarized in the IR, then a reconstruction
of the whole object creates a lot of identity functions that
have to be cleared away.  So, because I prefer withfield as
the primitive at the JVM level, even if canonical constructors
are (currently) the language primitive, I’m shopping around
withfield as a solution to higher-level problems.  My point
is that, if withfield is the more agile building block, then
some reconstructor calls can be boiled down to bare
withfields, instead of a full trip through a canonical
constructor.

(IMO this is a bit more compelling if we can isolate the
per-field validation logic, so that even if a field has validation,
a reconstruction that changes just that one field can turn
into a withfield plus the validation logic, with no “touches”
on unrelated fields.)

>> 
>> 
>> I think if we expose the API point as MyRecord::with(Object…)
>> it should be possible to call the thing reflectively, or with
>> variable keywords, or whatever.  But javac should detect the
>> common case of non-variable keywords, do some checks,
>> and replace the call site with an indy, for those cases.  That
>> way we can have our cake and eat it too.
>> 
> 
> I disagree, we should provide a good perf model for the reconstructors 
> I prefer that code that performs badly either not compile or not be possible to express, instead of allowing people to call 'with' with non constant component name and ask us later to come with some miraculous optimizations.
> A cake should not require tons of engineers to be salvaged because someone starts by adding a spoonful of salt.

OK, but I will take this as an argument against stringy names,
rather than an argument for very magic treatment of methods
that take stringy names.  Also, there must be a reflective story
for calling these things, but I think that’s easy, and doesn’t
lead people into false expectations of performance.
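
As a rough indication of why the reflective story looks easy for records, here is one way to write it with the existing record reflection API (`Class.getRecordComponents`, `RecordComponent.getAccessor`). This is a sketch, not a proposed API: the helper class `ReflectiveWith`, its `with` method, and the sample record `Point` are hypothetical. It takes key/val pairs, routes everything through the canonical constructor so validation still runs, and makes no claim to performance.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.lang.reflect.RecordComponent;

// Hypothetical sample record.
record Point(int x, int y) {}

// Hypothetical helper; slow by design, for reflective callers only.
final class ReflectiveWith {
    static <R extends Record> R with(R record, Object... keyVals) {
        if (keyVals.length % 2 != 0) {
            throw new IllegalArgumentException("expected key/val pairs");
        }
        try {
            RecordComponent[] comps = record.getClass().getRecordComponents();
            Class<?>[] types = new Class<?>[comps.length];
            Object[] args = new Object[comps.length];
            for (int i = 0; i < comps.length; i++) {
                types[i] = comps[i].getType();
                Method accessor = comps[i].getAccessor();
                accessor.setAccessible(true);
                args[i] = accessor.invoke(record);   // start from the current values
            }
            for (int k = 0; k < keyVals.length; k += 2) {
                args[indexOf(comps, (String) keyVals[k])] = keyVals[k + 1];
            }
            // The canonical constructor has exactly the component types, in order.
            @SuppressWarnings("unchecked")
            Constructor<R> canonical =
                (Constructor<R>) record.getClass().getDeclaredConstructor(types);
            canonical.setAccessible(true);
            return canonical.newInstance(args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    private static int indexOf(RecordComponent[] comps, String name) {
        for (int i = 0; i < comps.length; i++) {
            if (comps[i].getName().equals(name)) return i;
        }
        throw new IllegalArgumentException("no such component: " + name);
    }
}
```

A call such as `ReflectiveWith.with(new Point(1, 2), "y", 5)` rebuilds the point with only `y` replaced.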

Thanks!

— John