A peek at the roadmap for pattern matching and more

John Rose john.r.rose at oracle.com
Thu Aug 13 23:00:28 UTC 2020


On Aug 13, 2020, at 3:01 PM, Guy Steele <guy.steele at oracle.com> wrote:
> 
>> 
>> On Aug 13, 2020, at 5:37 PM, John Rose <john.r.rose at oracle.com> wrote:
>> 
>> On Aug 13, 2020, at 12:39 PM, Guy Steele <guy.steele at oracle.com> wrote:
>>> 
>>> Whereas I can more easily understand that the job of
>>> 
>>>     public deconstructor Point(int x, int y) {
>>>         x = this.x;
>>>         y = this.y;
>>>     }
>>> 
>>> is to take values out of the object “this” and put them into separate _variables_, not a new object.  (Granted, these variables have a somewhat new and mysterious existence and character.)
>> 
>> And if this mysterious character were something completely unrelated
>> to any other part of the Java language, I’d be more inclined to admit
>> that maybe the missing primitive is some sort of tuple.  It might have
>> to be given a status like that of Java arrays to avoid the infinite regress
>> problem you point out.
>> 
>> BUT, when I stare at a block of code that is setting some named/typed
>> variables, where those variables must be DA at the end of the block,
>> and then they are to be available in various other scopes (not the
>> current scope, but other API points), THEN I say, “where have I
>> seen that pattern before…?”  There is ALREADY a well-developed
>> part of the Java language which covers this sort of workflow
>> (of putting values into a set of required named/typed variables).
>> 
>> Of course, it’s a constructor,
> 
> Actually, a constructor _body_.

Yep.  And it is distinguished from a tuple-based notation in its
reference to (live) named/typed values on exit.  We *could* have
used tuples there, by requiring that every (normal) exit from a constructor
must “return multiple values” by specifying a positional argument package
(a tuple) corresponding to all required (final) field settings.
We *could* have observed that something like `this(a,b,c)`, where
the argument list is exactly the required fields, is a perfectly universal
way to commit all required field values to an object, at the end of its
constructor.

Why didn’t we?  It would have been more symmetric, in a way,
to have the outputs of the constructor leave the block in the same
format as the inputs.  One reason is that the target entities are
already present: the fields are there, ready and waiting for assignment.

Another reason is surely that tuples would have been the wrong
notation for that job.  In a nutshell, positional notations only work
well when there are only a few positions, and named notations,
though more verbose, are more robustly expressive regardless
of the number of positions; they also degrade gracefully when
items may be omitted (optional initialization/binding/assignment).

I think we should (continue to) design for object arities which are
larger than (comfortable) parameter list arities.

> Let us also recall that there is a second well-developed part of the Java language
> that puts values into a set of required named/typed variables: method invocation.
> And its structure and behavior are rather different from that of a constructor body.
> 
> (more below)
> 
> ...
> All of which would seem to suggest Rémi’s multi-value-return minmax example as the dual to method invocation:
> 
>>>   . . . a method minMax that returns both the minimum and the maximum of an array
>>>   public static (int min, int max) minMax(int[] array) {
>>> 
>> Nope.  Not going there.  I went down this road too, but multiple-return is another one of those “tease” features that looks cool but very quickly looks like glass 80% empty.  
> 
> Part of the job of method invocation is to take a set of values and definitely assign them to a set of variables (the method parameters).  This could be done with a block that is charged with the task of definitely assigning to those variables:
> 
> 	Math.atan{ x = 2.0; y = 3.0 }
> 	myString.substring{ if (weird) { beginIndex = 3; endIndex = 5; } else { beginIndex = 0; endIndex = myString.length(); } }
> 
> but for convenience (or for compatibility with C) we provide a different mechanism, with different syntax, that in effect uses positional tuples. A block-with-assignment mechanism is possible, but that’s not Java.
> 
> Therefore we will keep re-encountering the question of why positional tuples are good Java style for passing several arguments to a method but not for returning several values from a method.

That’s a good argument; your code example looks plenty ugly.
Surely positional notation is better for those simple use cases,
of well-known APIs where programmers have committed the
order of arguments firmly to memory.

But there are two reasons “that’s not Java” is not the whole story
here.

1. At high arities, positional notations falter, and people ask for
keyword-based argument notations, because it’s hard to commit
to memory the order of arguments for every API.  Java might
answer those demands at some point.  What we are discussing
here could do the job.

2. Java already has a “block of assignments” notation, the constructor
body.  Using that notation elsewhere, rewarding programmers for
learning that notation by giving them more ways to use it, is a
legitimate tactic.  (Yeah, maybe putting it in an external block,
outside its class, is “Not Java”; but lambdas were similarly
“Not Java” at one point; now they are.)

The imperative constructor body, with its named assignments,
can be more expressive and compact than a tuple expression.
It can be read piecewise, and the names help the reading (and
writing) process.  Conditional control flow can visually reify
case analysis for setting up the field values to be output from
the constructor body, without introducing extra temps.
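
To make that contrast concrete, here is a minimal sketch in today's Java,
reusing the substring case analysis quoted above.  The class `Window` and
the factory `ofTuple` are hypothetical names chosen for illustration; they
are not part of any proposal in this thread.

```java
// Hypothetical example class; all names here are illustrative only.
final class Window {
    final int beginIndex;
    final int endIndex;

    // Named style: each branch assigns the fields directly.  The compiler
    // checks that both final fields are definitely assigned on every path.
    Window(String s, boolean weird) {
        if (weird) {
            beginIndex = 3;
            endIndex = 5;
        } else {
            beginIndex = 0;
            endIndex = s.length();
        }
    }

    // Positional style: the values are committed as an argument list.
    private Window(int beginIndex, int endIndex) {
        this.beginIndex = beginIndex;
        this.endIndex = endIndex;
    }

    // The same case analysis, but it needs temporaries to carry the
    // values to the single positional "commit" at the end.
    static Window ofTuple(String s, boolean weird) {
        int begin;
        int end;
        if (weird) {
            begin = 3;
            end = 5;
        } else {
            begin = 0;
            end = s.length();
        }
        return new Window(begin, end);
    }
}
```

Both forms compute the same object; the difference is only in whether the
branches talk to the fields by name or hand values to a positional list.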

All this is even more true when we connect up record parameters
to record fields, and allow elision of assignments of the form
`this.x = x`.  That amounts to an optionality feature where
the (positional) argument list of a record provides defaults
and then the compact constructor body provides a named
argument set (not an ordered list) of additionally processed
values.  Tuples are not the right notation here; it would be
less clear code if changing one record component (say,
doing a range clip) required the coder to specify the
adjusted record components as a new argument list.
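
A small sketch of the range-clip case, assuming a Java version with
records (16 or later).  The record names `Percent` and `PercentExplicit`
and their components are made up for this example.

```java
// Hypothetical record: the compact constructor mentions only the
// component it adjusts; `label` is assigned to its field implicitly.
record Percent(String label, int value) {
    Percent {
        value = Math.max(0, Math.min(100, value));  // clip to 0..100
    }
}

// The same record with an explicit canonical constructor, for contrast:
// every component must be assigned, even the ones that pass through
// unchanged.
record PercentExplicit(String label, int value) {
    PercentExplicit(String label, int value) {
        this.label = label;
        this.value = Math.max(0, Math.min(100, value));
    }
}
```

With two components the difference is minor, but the explicit form grows
with every added component while the compact form does not.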

Tuple notations work OK for two or three items but don’t scale nearly
as well as name-based notations when you have a larger collection of
columns to wrangle.  You could say, well, tuples are better if you are
going to specify all the names in some well-determined order—as
is the case with argument lists I suppose—because you can drop the
noise of the names (they don’t add anything).

Yes, in that case tuples are better.  But even for argument lists there is
a place where you really want by-name arguments, because remembering
the order of names is just too hard.  That’s what I mean by positional
notation not scaling well to high arities.
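
As a rough illustration of the arity problem, here is a sketch with a
six-component record.  Everything in it is hypothetical: `Customer`,
`CustomerDraft`, and `commit` are names invented for this example, and the
mutable draft object is only one way that today's Java can approximate
by-name arguments, not the mechanism discussed in this thread.

```java
// Hypothetical row type with more components than a comfortable
// argument list; several adjacent components share a type.
record Customer(String first, String last, String city, String country,
                int age, int tier) {}

// Hypothetical mutable draft: values are set by name, in any order,
// and then committed in one step.
final class CustomerDraft {
    String first = "", last = "", city = "", country = "";
    int age, tier;

    Customer commit() {
        return new Customer(first, last, city, country, age, tier);
    }
}

final class NamedVsPositional {
    // Positional: the reader must remember which string is which,
    // and the compiler cannot catch two swapped String arguments.
    static Customer positional() {
        return new Customer("Ada", "Lovelace", "London", "UK", 36, 2);
    }

    // Named: order is irrelevant and each value is labeled at the
    // point where it is supplied.
    static Customer named() {
        CustomerDraft d = new CustomerDraft();
        d.tier = 2;
        d.age = 36;
        d.country = "UK";
        d.city = "London";
        d.last = "Lovelace";
        d.first = "Ada";
        return d.commit();
    }
}
```

The two methods build equal records; the named form costs more
characters but does not depend on anyone remembering the component order.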

When we are talking about objects, I think we need to design for field
sets that are more numerous than comfortable argument list arities.
The constructor body notation is therefore a better precedent to
build on, for deconstructors and reconstructors, and anything else
that has a transaction on an object-sized scope (bigger than an
arg-list sized scope).

ADTs like Box and Rational and Point3D don’t support my case very well,
because they amount (at most) to pairs or triples.  But if you get anywhere
close to database rows (and I do think we want to scale out that way), then
tuples won’t take us where we want to go, but transactional blocks on names
(that is, constructor bodies suitably generalized) will take us places, and
will make use of mindshare already present in Java programmers.

Back to the point about “the fields are already there”:  While this may
be why constructor bodies are the way they are, I think we could
reconsider the source of the names that are present in what I call
a “transactional block” (with named values falling out the bottom,
and perhaps also falling in the top), starting with deconstructors.
These names could be specified by an argument list for an ad hoc
API point, not the (final non-static) field set of a class.

So an arrow-reversed constructor body is not just a fine way to
unpack the pre-existing fields of a class (that wants to cooperate
with pattern matching).  It is a direction in which Java can, maybe,
move to add some benefits of keyword-based calling sequences,
without importing something completely new.

— John


More information about the amber-spec-experts mailing list