Record copy()/with()

forax at univ-mlv.fr
Sun May 24 14:50:17 UTC 2020


> De: "John Rose" <john.r.rose at oracle.com>
> À: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "Tagir Valeev" <amaembo at gmail.com>, "amber-spec-experts"
> <amber-spec-experts at openjdk.java.net>
> Envoyé: Samedi 23 Mai 2020 22:00:36
> Objet: Re: Record copy()/with()

> On May 23, 2020, at 9:07 AM, Remi Forax <forax at univ-mlv.fr> wrote:

>>> 3. Record components could also be normalized in the constructor. E.g.
>>> assume record Fraction(int numerator, int denominator) that normalizes
>>> the components using GCD in the constructor. Using withers would
>>> produce a weird result based on the previous value.

>> A record is transparent (same API externally and internally) so as a user i
>> expect that if i set a record component to 3 using a constructor and then ask
>> its value, the result will be 3,

> This is not the expectation we set for record constructors, so it is
> debatable that it would be an expectation for record “withers”.

> This is why I like to use a special term for “withers”: If an object
> can be built from scratch by calling its class’s constructor, then
> it follows that an API point which takes an existing record and
> builds a new one (with modifications) is a sort of constructor
> also; I call it a “reconstructor”.

yes, it's a good name. 

> Some context: I’ve been thinking about this for a long time.
> When we added non-static final variables, with their funny DU/DA
> rules, I did the design and implementation, and one thing that
> irritated me was the lack of a way to change just one or two
> final fields of an object full of finals. At the time final fields
> were rare so we thought the user would always prefer to build
> another object “from scratch”. And it would be expensive enough,
> since fast GC was not yet a thing, that the user would not want a
> slick way to do the job. Now with inline types the balance of
> concerns has shifted, towards objects with *only* final fields,
> and very low allocation costs, yet we still have a “hole” in our
> language, corresponding to that old and now-important
> technical debt. Back in the ‘90s I didn’t have a crisp idea about
> the shape of this technical debt, but now I think I do. Which
> is not to say that I have a crisp idea how to pay it off, but I have
> some insights I want to share.

> The debt shows up when we transform mutable iterators into
> immutable value-based “cursors”. For those we need a way to say
> “offset++” inside an active cursor—yielding a new cursor value which
> is the logical “next state” of the iteration. Physically, that’s a
> constructor for the cursor class which takes the old cursor and does
> “offset++” (or the equivalent) in the body of that constructor. Because
> it takes the old cursor and preserves all other fields untouched, it is a
> very special kind of constructor, which should be called “reconstructor”
> because it reconstructs (a copy of) the old object to match some
> new modeling condition (an incremented offset). As side note, it
> is telling that the rules of constructors (but no other methods)
> allow you to perform local side effects on variables which
> correspond to fields; of course you commit “this.offset = offset”
> at some point and shouldn’t dream of modifying “this.offset”,
> but the constructor is free to change the values before committing
> them. Compact record constructors make this more seamless.
> In such settings (which are natural to reconstructors), saying
> something like “offset++” or “offset -= adjust”, before commit,
> is totally normal.
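Right, and a compact record constructor already lets you do exactly that: adjust the parameter locally before it is committed to the final field. A minimal sketch (the names are mine, not part of any proposed API):

```java
// hypothetical wrap-around cursor: the compact canonical constructor
// adjusts the 'offset' parameter before it is committed to the final
// field -- a local side effect on the parameter, not a write to
// this.offset
record WrappingCursor(int offset, int size) {
    WrappingCursor {
        if (size <= 0) {
            throw new IllegalArgumentException("size: " + size);
        }
        offset = Math.floorMod(offset, size);  // normalize before commit
    }

    // hand-written "reconstructor": it goes back through the canonical
    // constructor, so the new offset is normalized too
    WrappingCursor next() {
        return new WrappingCursor(offset + 1, size);
    }
}
```

so new WrappingCursor(2, 3).next() wraps back to offset 0, because the reconstructor re-runs the constructor logic.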

> A reconstructor makes the most sense for an inline type, but it
> also makes sense for identity types (as long as users are willing
> to eat the cost of making a new version of the object instead of
> side-effecting the old version).

I would say a reconstructor makes sense wherever there is a canonical constructor. 
One thing we have not done yet is to figure out exactly what a canonical constructor means for a plain old Java class; 
it's something we will have to figure out when introducing inline classes, because not all inline classes are record classes. 

>> so i don't think normalizing values of a record in the constructor is a good
>> idea.
>> This issue is independent of with/copy, calling a constructor with the results
>> of some accessors of an already constructed record will produce weird results.

> Calling it a reconstructor sets expectation that those results are
> not weird at all: You are just getting the usual constructor logic
> for that particular class, which enforces all of its invariants.

> Having a “wither” that can “poke” any new value unchecked
> into an already-checked configuration, without allowing the class
> to validate the new data, would be the “weird result” in this case.
> Let’s not do that, and let’s not set that kind of expectation.
> Encapsulation means never having anyone else tell you the
> exact values of your fields. Constructors are the gatekeepers
> of that encapsulation. Even record classes (transparent as they
> are) are allowed to have opinions about valid and invalid field
> values, and to reject or modify requests to create instances
> which are invalid according to the contract of the record class.
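To make that concrete with Tagir's Fraction example, here is a sketch of a normalizing canonical constructor (the gcd helper is mine); a reconstructor routed through it would simply re-normalize, not produce a weird result:

```java
record Fraction(int numerator, int denominator) {
    Fraction {
        if (denominator == 0) {
            throw new ArithmeticException("denominator is zero");
        }
        // normalize: reduce by the gcd and keep the sign on the numerator
        int g = gcd(Math.abs(numerator), Math.abs(denominator));
        numerator /= g;
        denominator /= g;
        if (denominator < 0) {
            numerator = -numerator;
            denominator = -denominator;
        }
    }

    // classical Euclid's algorithm; g is never 0 because denominator != 0
    private static int gcd(int a, int b) {
        while (b != 0) { int t = a % b; a = b; b = t; }
        return a;
    }
}
```

so new Fraction(2, 4) and new Fraction(1, 2) are the same value, and a reconstructor changing only the numerator still goes through the normalization.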

>>> 4. Points 2 and 3 may lead to the conclusion that not every record
>>> actually needs copying. In fact, I believe, only a few of them would
>>> need it. Adding them automatically would pollute the API and people
>>> may accidentally use them. I believe, if any automatic copying
>>> mechanism will be added, it should be explicitly enabled for specific
>>> records.

> An explicitly declared reconstructor would fulfill this goal.

yes, but given that we already provide a canonical constructor, i don't see the point of not providing the corresponding reconstructor as well. 

> A reconstructor, as opposed to a “wither” feature, would also scale
> from one argument to any number of arguments.

yes, as Tagir said, being able to set several fields at the same time makes it possible to keep the invariants right. 
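For instance, with a hypothetical Range record, the invariant only survives if both components are updated in a single call to the canonical constructor:

```java
record Range(int low, int high) {
    Range {
        if (low > high) {
            throw new IllegalArgumentException(low + " > " + high);
        }
    }

    // shifting must set both components at once: a one-component wither
    // that first sets 'low' to low + delta could transiently violate
    // low <= high and be rejected by the canonical constructor
    Range shift(int delta) {
        return new Range(low + delta, high + delta);
    }
}
```

new Range(0, 5).shift(10) is fine, while setting low to 10 alone would throw.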

> This leads to the issue of defining API points which are polymorphic
> across collections of (statically determined) fields. Which is the
> present point. Note, however, that it has surprisingly deep roots.
> As soon as we added non-static final fields to Java, we incurred a
> debt to eventually examine this problem. Time’s up; here we are.

>> with/copy calls the canonical constructor at the end, it's not something that
>> provide a new behavior, but more a syntactic sugar you provide because updating
>> few fields of a record declaring a dozen of components by calling the canonical
>> constructor explicitly involve a lot of boilerplate code that may hide stupid
>> bugs like the values of two components can be swapped because the code called
>> the accessors in the wrong order.

> Well, this is an argument for keyword-based constructors also.
> And reconstructors as well. (See the connection? It’s 1.25 things
> here not 2 things.)

yes, but introducing named-argument constructors without named-argument methods is weird. 
That said, there is a difference between the two, at least if i understand correctly what they have done in C# (which has two different syntaxes, one for named arguments and one for named-argument construction of records). 
Using your cursor example, a cursor on an array can be defined like this: 

  record Cursor(int offset, String... array) { 
    String item() { return array[offset]; } 
  } 
  var cursor = new Cursor(0, "foo", "bar"); 

and to update the offset: 

  cursor = cursor.with("offset", cursor.offset() + 1); 

what C# does by introducing a special syntax is to say that inside the call to with, 'this' doesn't reference the enclosing class but the current cursor, 
so 'offset' on the right-hand side of the expression refers to the cursor's offset: 

  cursor = cursor with { offset = offset + 1 } 

This notion that inside a 'builder' 'this' may not reference the enclosing instance but the receiver of the method call was first introduced in Groovy and is now used in Kotlin, Swift and C#. 
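Whatever the surface syntax, the desugaring can just copy the untouched components from the receiver and call the canonical constructor; a hand-written sketch of what a generated 'with' could translate to (the method name is mine):

```java
record Cursor(int offset, String[] array) {
    String item() { return array[offset]; }

    // hand-written equivalent of `cursor with { offset = offset + 1 }`:
    // the replaced component comes from the argument, every other
    // component is copied from the receiver, and the canonical
    // constructor re-checks its invariants
    Cursor withOffset(int offset) {
        return new Cursor(offset, array);
    }
}
```

used as: cursor = cursor.withOffset(cursor.offset() + 1);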

> Setting all of the above aside for now, I have one old and one new
> idea about how to smooth out keyword-based calling sequences.
> These are offered in the spirit of brainstorming.

> The old idea is that, while good old Object… is a fine way to pass
> stuff around, we could (not now but later) choose to expand the
> set of available varargs calls by adding new carrier types as possible
> varargs bundles. So a key/val/key/val/... sequence could be passed
> with keys strongly typed as strings (or enum members, for extra
> checking!) and the vals typed as… well Object, still. The move
> needed for such a thing is, I think, simple though somewhat
> disruptive. Sketch of design:

> - have some way for marking a class A as varargs-capable
> - allow a method m to be marked as A-varargs instead of Object[]-varargs (m(…A
> a)?)
> - transform any call to m(a,b,c…) as m(new A(a,b,…))
> - use the standard rules for constructor resolution in A
> - note that at least one A constructor is probably A-varargs (recursive)
> - this allows A’s constructors to do a L-to-R parse of m’s arguments
> - A(T,U) and A(T,U,…A) give you Map<T,U> key/val/key/val lists

I don't think you need A to be varargs-capable, having a constructor which is a varargs is enough. 
I prefer using ** in a postfix way: Object[]** for an array, List<String>** for a list, Map<String,Integer>** for a map, 
and +1 for being recursive. 

And i think you need to add what i was saying above, that 'this' in this context means the object currently being constructed. 
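With today's Object... varargs, the key/val/key/val protocol can only be parsed at runtime; a left-to-right parsing sketch (class and method names are mine), showing exactly the checks an A-varargs carrier would move to compile time:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyVals {
    // parse an Object... list as key/val pairs: keys are checked to be
    // Strings at runtime, values stay typed as Object
    static Map<String, Object> parse(Object... kvs) {
        if (kvs.length % 2 != 0) {
            throw new IllegalArgumentException("odd number of arguments");
        }
        var map = new LinkedHashMap<String, Object>();
        for (int i = 0; i < kvs.length; i += 2) {
            if (!(kvs[i] instanceof String key)) {
                throw new IllegalArgumentException("key at " + i + " is not a String");
            }
            map.put(key, kvs[i + 1]);
        }
        return map;
    }
}
```

With a Map<String,Integer>** carrier, the odd-count and non-String-key errors above would become compile-time errors instead.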

> I’m just putting that out there. We can use Object… for the
> foreseeable future. An enhanced varargs feature would let us do
> better type checking, though. It would also allow the varargs
> carrier (A not Object…) to be (drum roll please) an inline type,
> getting rid of several kinds of technical debt associated with
> array-based varargs.

yes, a varargs call should not allocate any object; implementing List** and Map** doesn't seem complex if they are translated into invokedynamic by the compiler and inlined at runtime. 

> Second, here’s a new idea: During the JSR 292 design, we talked
> about building BSMs which could somehow capture constant
> (or presumed-constant) argument values and fold them into the
> target of the call-site. Remi, your proposed design for record
> <strikeout>withers</strikeout> reconstructors could use such
> a thing. We were (IIRC) uncertain how to do this well, although
> you may have prototyped something slick like you often do.

I have done something like this; you don't need a new kind of BSM, just a BSM that traps arguments that are constant (that are the same as in a previous call) and calls another BSM with an array of the constant arguments as last arguments. 
I don't use that code anymore: 
1/ testing with == to find the constants is not enough; sometimes you want the class not to change, not the value. 
For example, for a reconstructor, having the names of the components be constant is not enough, you also need the class of the receiver to be able to see the name as a record component. 
Or you want a value like a Locale to be equivalent to a previously seen version. 
2/ you want to speed up the process if the compiler knows that something is a constant; there is no point in rediscovering that it's a constant, so the compiler has to be aware of the special protocol. 
3/ storing method arguments means they are leaking, because you're delaying the time at which they can be GCed. 
4/ in the end, it's a kind of "half empty" mechanism (i don't know if there is a better name): it improves the performance of a few cases, but at the same time writing an ad-hoc BSM for those cases is always better. 
And in a runtime, once you have crossed the Rubicon and said "i want to optimize that call", you don't want to half-optimize it but to squeeze out the last drop of perf you can. 
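For reference, a much simplified sketch of the trapping mechanism (my code, not a proposed API): the first call samples the name argument, a stand-in for the secondary BSM spins a specialized target, and a == guard patches it into the call site:

```java
import java.lang.invoke.*;

// simplified trapping call site: the first call samples the argument
// purported to be constant (here a component name), a stand-in for the
// secondary bsm spins a specialized target, and a == guard patches it
// in; a different name falls back to the trap
final class TrappingCallSite extends MutableCallSite {
    TrappingCallSite(MethodType type) {
        super(type);
        try {
            MethodHandle trap = MethodHandles.lookup()
                .findVirtual(TrappingCallSite.class, "firstCall",
                    MethodType.methodType(Object.class, String.class, Object.class))
                .bindTo(this);
            setTarget(trap.asType(type));
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    private Object firstCall(String name, Object value) throws Throwable {
        MethodHandle specialized = specialize(name);  // "secondary bsm"
        MethodHandle guard = MethodHandles.insertArguments(
            MethodHandles.lookup().findStatic(TrappingCallSite.class, "sameName",
                MethodType.methodType(boolean.class, String.class, String.class)),
            0, name);
        guard = MethodHandles.dropArguments(guard, 1, Object.class);
        // patch: same constant -> specialized target, else trap again
        setTarget(MethodHandles.guardWithTest(
            guard, specialized.asType(type()), getTarget()));
        return specialized.invoke(name, value);
    }

    // stand-in for the user-provided bsm: the name is now a constant,
    // so the returned target just ignores it and passes the value through
    private static MethodHandle specialize(String constantName) {
        return MethodHandles.dropArguments(
            MethodHandles.identity(Object.class), 0, String.class);
    }

    private static boolean sameName(String expected, String actual) {
        return expected == actual;  // == as discussed above, not equals
    }

    // helper to invoke the site without a checked Throwable at the call site
    static Object call(CallSite cs, String name, Object value) {
        try {
            return cs.dynamicInvoker().invoke(name, value);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }
}
```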

> This conversation made me revisit the question, and I have a
> proposal, a new general-purpose BSM combinator which sets
> a “trap” for the first call to a call site, samples the arguments
> which are purported to be constant, and then spins a subsidiary
> call site which “sees” the constants, and patches the latter call
> site into the former. Various configurations of mutable and
> constant call sites are possible and useful. A new kind of
> call site might be desirable, the StableCallSite, which is one
> that computes its true target on the first call (not linkage)
> and thereafter does not allow target changes.

> /** Arrange a call site which samples selected arguments
> * on the first call to the call site and calls the indicated bsm
> * to hand-craft a sub-call site based on those arguments.
> * The bsm is called as CallSite subcs = bsm(L,S,MT,ca…,arg…)
> * where the ca values are sampled from the initial dynamic
> * list according to caspec. */
> StableCallSite bootstrapWithConstantArguments(L,S,MT,caspec,bsm,arg…)

> /** Same as bootstrapWithConstantArguments, but the bsm
> * is called not only the first time, but every time a new argument
> * value is encountered. Arguments are compared with == not equals.
> */
> CallSite bootstrapWithSpeculatedArguments(L,S,MT,caspec,bsm,arg…)

> Other variations are possible, using other comparators and also
> key extractors. Object::getClass is a great key extractor; this gives
> us the pattern of monomorphic inline caches. For the speculating
> version, the existing and new targets could be recombined into a
> decision tree; that requires an extra hook, perhaps a SwitchingCallSite
> or a MH switch combinator.

> Anyway, I’m brainstorming here, but it seems like we might have
> some MH API work to do that would give us leverage on the wither
> problem. It’s probably obvious, but I’ll say it anyway: The “caspec”
> thingy (a String? “0,2,4”?) would point out the places where the key
> arguments are placed in the field-polymorphic reconstructor call.
> The secondary BSM would take responsibility for building a custom
> reconstructor MH that takes the non-key (val) arguments and builds
> the requested record. The primary BSM (bootstrapWithCA) would
> recede to the background; it’s just a bit of colorless plumbing, having
> no linkage at all to record type translation strategy, other than the
> fact that it’s useful. Maybe there’s a record-specific BSM that wraps
> the whole magic trick, but it’s a simple combo on top.

> The hard part is building the reconstructor factory. I think that
> should be done in such a way that the record class itself has complete
> autonomy over the reconstruction. Probably the reconstructor
> factory should just wire up arguments “foo” where they exist
> in the reconstructor call, and pass argument “this.bar” where
> they are not mentioned via keys. This is easy. It’s clunky too,
> but until the JVM gives a real way to say the thing directly,
> it will work. Note that the clunkiness is hidden deep inside
> the runtime, and can be swapped out (or optimized) when
> a better technique is available, *without changing translation
> strategy*. For records, everything goes through the canonical
> constructor, including synthesized reconstructors. In the
> case of *inline* records, the runtime would create a suitable
> constructor call, and might (if it could prove it correct) use
> bare “withfield” opcodes to make optimization easier.

For inline records, the inline static constructor has to be marked as canonical, as i was saying earlier, 
so there is no need to use withfield directly, calling the inline static constructor is enough. 

> I think if we expose the API point as MyRecord::with(Object…)
> it should be possible to call the thing reflectively, or with
> variable keywords, or whatever. But javac should detect the
> common case of non-variable keywords, do some checks,
> and replace the call site with an indy, for those cases. That
> way we can have our cake and eat it too.

I disagree, we should provide a good perf model for the reconstructors. 
I prefer that code that performs badly either not compile or not be expressible, instead of allowing people to call 'with' with a non-constant component name and then ask us later to come up with some miraculous optimizations. 
A cake should not require tons of engineers to salvage it because someone started by adding a spoonful of salt. 

> (There are other things we can do beyond that, by slicing
> up the per-variable concerns from the per-instance concerns
> in the constructor, leading to better optimizations. That requires
> unknown translation strategy hooks. But this is enough
> brainstorming for one email.)

> — John

Rémi 

[1] i've spent half a day last week chasing a perf bug due to a cast, because a VarHandle was not created correctly. 

