Array patterns (and varargs patterns)
Brian Goetz
brian.goetz at oracle.com
Fri Sep 9 18:29:37 UTC 2022
Again, look for the embedding-projection pairs. The sets involved are
T^n and T[]. The array creation operator is an embedding from T^n to
T[]; the missing dual is the projection from T[] to T^k (for specific
k.) Projections are partial (or lossy), so these are patterns rather
than total functions. The dual of packing an array from a list of
expressions is unpacking the elements into a list of variables.
When I pack an array:
String[] ss = new String[] { "Hi", "Bob" };
this has a similar feel to
Object o = "Bob";
in that we've thrown away some static typing information (in the former,
that the array has length two.) But this information is retained
dynamically, and we can recover it with a runtime test. Asking
if (o instanceof String s) { ... }
is asking "was the last assignment to `o` from a String". Asking
if (ss instanceof String[] { var a, var b }) { ... }
is asking "was the last assignment to ss a String[] with two elements"
(and similar for other configurations of the nested patterns.) In both
cases, we are asking the same generalized question: could this { object,
array } have come from an assignment / creation expression that has a
certain shape.
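
For comparison, here is a minimal sketch of how these two runtime questions can be asked in Java as it stands today, without array patterns. The names `ShapeTests` and `describe` are hypothetical and exist only for illustration; the second test is a hand-written approximation of what `String[] { var a, var b }` would express, not a definitive desugaring.

```java
final class ShapeTests {
    // Hypothetical helper: asks "what shape of assignment could this object have come from?"
    static String describe(Object o) {
        // Type pattern: recovers the static type that was discarded at assignment.
        if (o instanceof String s) {
            return "string: " + s;
        }
        // Hand-written approximation of: o instanceof String[] { var a, var b }
        // The length check recovers the "two elements" fact that the array type does not carry.
        if (o instanceof String[] arr && arr.length == 2) {
            String a = arr[0];
            String b = arr[1];
            return "pair: " + a + ", " + b;
        }
        return "no match";
    }

    public static void main(String[] args) {
        Object o = "Bob";
        Object ss = new String[] { "Hi", "Bob" };
        System.out.println(describe(o));
        System.out.println(describe(ss));
        System.out.println(describe(new String[] { "Hi" }));
    }
}
```

The array case needs a separate length test and explicit indexing, which is the bookkeeping an array pattern would absorb.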
I get it; you don't find this feature compelling. You've said that
already, and now we're just going in circles. Your mail reads to me
like "it's a bad idea because I think it's a bad idea." Yes, other
languages approach this in different ways; Caml deconstructs into (head,
tail) because its fundamental data structure is a cons list. That makes
sense given how the language works. Java works differently, so
transplanting from Caml or JavaScript is not always going to be a good
answer. Remember the pattern mantra: each aggregation idiom in the
language should have a corresponding form of deconstruction pattern.
Constructors have deconstruction patterns; factory methods will
eventually have named static patterns; if we add collection literals,
there will be collection patterns, etc. If an aggregation form lacks a
corresponding dual, this turns into an asymmetry which in turn means
*destructuring cannot compose the same way aggregation composes*. This
is bad! Arrays have their own special form of aggregation (array
creation expression); array patterns are the corresponding destructuring.
I encourage you to re-read
https://openjdk.org/projects/amber/design-notes/patterns/pattern-match-object-model
and the "red ball" API examples, to see what I mean. This is about
composability, not about whether any given form of pattern "pays its
weight."
So again, please try harder to engage with _why do we think this is
important_, and the specifics of what has been proposed, rather than
just waving the YAGNI stick. There's a bigger picture here.
>>> For me, Arrays.of() is a named pattern with a vararg list of bindings, no?
>> It's a named pattern, but to work, it would need varargs patterns -- and
>> array patterns are the underpinnings of varargs, just as array creation
>> is the underpinning of varargs invocation. We're not going to do
>> varargs patterns differently than we do varargs invocation, just to
>> avoid doing array patterns -- that would be silly.
> Here we want to extract the values into bindings/variables; that is not what varargs does. Varargs takes a bunch of values on the stack and puts them into an array.
> Here we want the opposite operation of varargs: the spread (or splat) operator, which takes the arguments from an array (or a collection?) and puts them on the stack.
>
> If we have the pattern method Arrays.of()
>
> static <T> pattern (T...) of(T[] array) { // here it's a varargs
> ...
> }
>
> and we call it using a named pattern
> switch(array) {
> case Arrays.of(/* insert a syntax here */) -> ...
>
> the syntax should extract some/all values of the array into one or several bindings.
>
> If we are in Caml, we have the :: operator to separate the first element from the rest
> switch(array) {
> case Arrays.of(String first :: String[] rest) -> ...
>
> If we are in JavaScript, we have the spread operator (notice that the ... is before the type)
> switch(array) {
> case Arrays.of(String first, ... String[] rest) -> ...
>
> So varargs is on the declaration side; on the pattern side we need a new spread operator, so I think that adding an array pattern now is not a good idea.
>
> regards,
> Rémi
>
>>>>> With best regards,
>>>>> Tagir Valeev.
>>>>>
>>>>> On Tue, Sep 6, 2022 at 11:11 PM Brian Goetz <brian.goetz at oracle.com> wrote:
>>>>>> We dropped this out of the record patterns JEP, but I think it is time to
>>>>>> revisit this.
>>>>>>
>>>>>> The concept of array patterns was pretty straightforward; they mimic the nesting
>>>>>> and exhaustiveness rules of record patterns, they are just a different sort of
>>>>>> container for nested patterns. And they have an obvious duality with array
>>>>>> creation expressions.
>>>>>>
>>>>>> The main open question here was how we distinguish between "match an array of
>>>>>> length exactly N" (where there are N nested patterns) and "match an array of
>>>>>> length at least N". We toyed with the idea of a "..." indicator to mean "more
>>>>>> elements", but this felt a little forced and opened new questions.
>>>>>>
>>>>>> It later occurred to me that there is another place to nest a pattern in an
>>>>>> array pattern -- to match (and bind) the length. In the following, assume for
>>>>>> sake of exposition that "_" is the "any" pattern (matches everything, binds
>>>>>> nothing) and that we have some way to denote a constant pattern, which I'll
>>>>>> denote here with a constant literal.
>>>>>>
>>>>>> There is an obvious place to put this (optional) pattern: in between the
>>>>>> brackets. So:
>>>>>>
>>>>>> case String[1] { P }:
>>>>>>             ^ a constant pattern
>>>>>>
>>>>>> would match string arrays of length 1 whose sole element matches P. And
>>>>>>
>>>>>> case String[] { P, Q }
>>>>>>
>>>>>> would match string arrays of length exactly 2, whose first two elements match P
>>>>>> and Q respectively. (If the length pattern is not specified, we infer a
>>>>>> constant pattern whose constant is equal to the length of the nested pattern
>>>>>> list.)
>>>>>>
>>>>>> Matching a target to `String[L] { P0, .., Pn }` means
>>>>>>
>>>>>> x instanceof String[] arr
>>>>>> && arr.length matches L
>>>>>> && arr.length > n
>>>>>> && arr[0] matches P0
>>>>>> && arr[1] matches P1
>>>>>> ...
>>>>>> && arr[n] matches Pn
>>>>>>
>>>>>> More examples:
>>>>>>
>>>>>> case String[int len] { P }
>>>>>>
>>>>>> would match string arrays of length >= 1 whose first element matches P, and
>>>>>> further binds the array length to `len`.
>>>>>>
>>>>>> case String[_] { P, Q }
>>>>>>
>>>>>> would match string arrays of any length whose first two elements match P and Q.
>>>>>>
>>>>>> case String[3] { }
>>>>>>             ^ a constant pattern
>>>>>>
>>>>>> matches all string arrays of length 3.
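
The matching rule and the examples above can be approximated in today's Java. This is only a sketch under stated assumptions: the length pattern is modeled as an `IntPredicate`, each nested pattern as a `Predicate<String>`, and bindings are not modeled at all. Every name here (`ArrayPatternSketch`, `matches`, `matchesExactly`, `anyLength`, `exactly`, `eq`) is hypothetical and not part of the proposal.

```java
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.Predicate;

final class ArrayPatternSketch {
    // Approximates matching x against String[L] { P0, ..., Pn }.
    static boolean matches(Object x, IntPredicate lengthPattern,
                           List<Predicate<String>> elementPatterns) {
        if (!(x instanceof String[] arr)) return false;             // x instanceof String[] arr
        if (!lengthPattern.test(arr.length)) return false;          // arr.length matches L
        if (arr.length < elementPatterns.size()) return false;      // enough elements for every Pi
        for (int i = 0; i < elementPatterns.size(); i++) {
            if (!elementPatterns.get(i).test(arr[i])) return false; // arr[i] matches Pi
        }
        return true;
    }

    // Approximates String[] { P0, ..., Pn }: with no length pattern given,
    // a constant pattern equal to the number of nested patterns is inferred.
    static boolean matchesExactly(Object x, List<Predicate<String>> elementPatterns) {
        int n = elementPatterns.size();
        return matches(x, len -> len == n, elementPatterns);
    }

    // Stand-in for the "_" length pattern: matches any length.
    static IntPredicate anyLength() { return len -> true; }

    // Stand-in for a constant length pattern such as the 3 in String[3] { }.
    static IntPredicate exactly(int k) { return len -> len == k; }

    // Stand-in for a nested pattern that matches one specific string.
    static Predicate<String> eq(String expected) { return s -> expected.equals(s); }

    public static void main(String[] args) {
        Object target = new String[] { "a", "b", "c" };
        // Like String[_] { P, Q }: prefix match on an array of any length.
        System.out.println(matches(target, anyLength(), List.of(eq("a"), eq("b"))));
        // Like String[] { P, Q }: the array must have exactly two elements.
        System.out.println(matchesExactly(target, List.of(eq("a"), eq("b"))));
        // Like String[3] { }: any string array of length 3.
        System.out.println(matches(target, exactly(3), List.of()));
    }
}
```

Seen this way, the exact match and the prefix match differ only in which length pattern is supplied.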
>>>>>>
>>>>>>
>>>>>> This is a more principled way to do it, because the length is a part of the
>>>>>> array and deserves a chance to match via nested patterns, just as with the
>>>>>> elements, and it avoids trying to give "..." a new meaning.
>>>>>>
>>>>>> The downside is that it might be confusing at first (though people will learn
>>>>>> quickly enough) how to distinguish between an exact match and a prefix match.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 1/5/2021 1:48 PM, Brian Goetz wrote:
>>>>>>
>>>>>> As we get into the next round of pattern matching, I'd like to opportunistically
>>>>>> attach another sub-feature: array patterns. (This also bears on the question
>>>>>> of "how would varargs patterns work", which I'll address below, though they
>>>>>> might come later.)
>>>>>>
>>>>>> ## Array Patterns
>>>>>>
>>>>>> If we want to create a new array, we do so with an array construction
>>>>>> expression:
>>>>>>
>>>>>> new String[] { "a", "b" }
>>>>>>
>>>>>> Since each form of aggregation should have its dual in destructuring, the
>>>>>> natural way to represent an array pattern (h/t to AlanM for suggesting this)
>>>>>> is:
>>>>>>
>>>>>> if (arr instanceof String[] { var a, var b }) { ... }
>>>>>>
>>>>>> Here, the applicability test is: "are you an instance of String[], with length
>>>>>> == 2", and if so, we cast to String[], extract the two elements, and match them
>>>>>> to the nested patterns `var a` and `var b`. This is the natural analogue of
>>>>>> deconstruction patterns for arrays, complete with nesting.
>>>>>>
>>>>>> Since an array can have more elements, we likely need a way to say "length >= 2"
>>>>>> rather than simply "length == 2". There are multiple syntactic ways to get
>>>>>> there; for now I'm going to write
>>>>>>
>>>>>> if (arr instanceof String[] { var a, var b, ... })
>>>>>>
>>>>>> to indicate "more". The "..." matches zero or more elements and binds nothing.
>>>>>>
>>>>>> <digression>
>>>>>> People are immediately going to ask "can I bind something to the remainder"; I
>>>>>> think this is mostly an "attractive distraction", and would prefer to not have
>>>>>> this dominate the discussion.
>>>>>> </digression>
>>>>>>
>>>>>> Here's an example from the JDK that could use this effectively:
>>>>>>
>>>>>> String[] limits = limitString.split(":");
>>>>>> try {
>>>>>>     switch (limits.length) {
>>>>>>         case 2: {
>>>>>>             if (!limits[1].equals("*"))
>>>>>>                 setMultilineLimit(MultilineLimit.DEPTH, Integer.parseInt(limits[1]));
>>>>>>         }
>>>>>>         case 1: {
>>>>>>             if (!limits[0].equals("*"))
>>>>>>                 setMultilineLimit(MultilineLimit.LENGTH, Integer.parseInt(limits[0]));
>>>>>>         }
>>>>>>     }
>>>>>> }
>>>>>> catch(NumberFormatException ex) {
>>>>>>     setMultilineLimit(MultilineLimit.DEPTH, -1);
>>>>>>     setMultilineLimit(MultilineLimit.LENGTH, -1);
>>>>>> }
>>>>>>
>>>>>> becomes (eventually)
>>>>>>
>>>>>> switch (limitString.split(":")) {
>>>>>>     case String[] { var _, Integer.parseInt(var i) } -> setMultilineLimit(DEPTH, i);
>>>>>>     case String[] { Integer.parseInt(var i) } -> setMultilineLimit(LENGTH, i);
>>>>>>     default -> { setMultilineLimit(DEPTH, -1); setMultilineLimit(LENGTH, -1); }
>>>>>> }
>>>>>>
>>>>>> Note how not only does this become more compact, but the unchecked
>>>>>> "NumberFormatException" is folded into the match, rather than being a separate
>>>>>> concern.
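
The idea of folding the exception into the match can be approximated today by making the parse step partial. The sketch below assumes a hypothetical helper `tryParseInt` that stands in for the `Integer.parseInt(var i)` pattern, and a hypothetical `classify` method that returns a description instead of calling `setMultilineLimit`; it mirrors the first-match-wins shape of the pattern switch above, not the fall-through of the original JDK code.

```java
import java.util.OptionalInt;

final class PartialParse {
    // Hypothetical stand-in for a partial pattern Integer.parseInt(var i):
    // a failed parse becomes "no match" rather than an exception.
    static OptionalInt tryParseInt(String s) {
        try {
            return OptionalInt.of(Integer.parseInt(s));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    // Hypothetical driver: reports which case of the pattern switch would be selected.
    static String classify(String limitString) {
        String[] limits = limitString.split(":");
        // Like: case String[] { var _, Integer.parseInt(var i) }
        if (limits.length == 2) {
            OptionalInt depth = tryParseInt(limits[1]);
            if (depth.isPresent()) return "DEPTH=" + depth.getAsInt();
        }
        // Like: case String[] { Integer.parseInt(var i) }
        if (limits.length == 1) {
            OptionalInt length = tryParseInt(limits[0]);
            if (length.isPresent()) return "LENGTH=" + length.getAsInt();
        }
        // Like: default
        return "DEFAULT";
    }

    public static void main(String[] args) {
        System.out.println(classify("3:4"));
        System.out.println(classify("7"));
        System.out.println(classify("3:*"));
    }
}
```

Here the length test, the element access, and the parse failure are all separate concerns again, which is the contrast the pattern version is meant to remove.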
>>>>>>
>>>>>>
>>>>>> ## Varargs patterns
>>>>>>
>>>>>> Having array patterns offers us a natural way to interpret deconstruction
>>>>>> patterns for varargs records. Assume we have:
>>>>>>
>>>>>> void m(X... xs) { }
>>>>>>
>>>>>> Then a varargs invocation
>>>>>>
>>>>>> m(a, b, c)
>>>>>>
>>>>>> is really sugar for
>>>>>>
>>>>>> m(new X[] { a, b, c })
>>>>>>
>>>>>> So the dual of a varargs invocation, a varargs match, is really a match to an
>>>>>> array pattern. So for a record
>>>>>>
>>>>>> record R(X... xs) { }
>>>>>>
>>>>>> a varargs match:
>>>>>>
>>>>>> case R(var a, var b, var c):
>>>>>>
>>>>>> is really sugar for an array match:
>>>>>>
>>>>>> case R(X[] { var a, var b, var c }):
>>>>>>
>>>>>> And similarly, we can use our "more arity" indicator:
>>>>>>
>>>>>> case R(var a, var b, var c, ...):
>>>>>>
>>>>>> to indicate that there are at least three elements.
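
The invocation half of this duality is observable in Java today, and the match half can be written by hand. In the sketch below, `X` is taken to be `String` for concreteness, and `VarargsDuality`, `matchThree`, and `matchAtLeastThree` are hypothetical names; the two match methods are hand-written approximations of `case R(var a, var b, var c)` and `case R(var a, var b, var c, ...)`, not a definitive translation.

```java
import java.util.Arrays;

final class VarargsDuality {
    // A varargs record, as in: record R(X... xs) { }, with X assumed to be String.
    record R(String... xs) { }

    // A varargs method that simply exposes the array it was handed.
    static String[] m(String... xs) {
        return xs;
    }

    // Hand-written approximation of: case R(var a, var b, var c)
    // which would be sugar for:      case R(String[] { var a, var b, var c })
    static String matchThree(Object o) {
        if (o instanceof R r && r.xs() != null && r.xs().length == 3) {
            String[] xs = r.xs();
            return xs[0] + xs[1] + xs[2];
        }
        return "no match";
    }

    // Hand-written approximation of: case R(var a, var b, var c, ...)
    static String matchAtLeastThree(Object o) {
        if (o instanceof R r && r.xs() != null && r.xs().length >= 3) {
            String[] xs = r.xs();
            return xs[0] + xs[1] + xs[2];
        }
        return "no match";
    }

    public static void main(String[] args) {
        // m(a, b, c) and m(new X[] { a, b, c }) hand the method equal arrays.
        System.out.println(Arrays.equals(m("a", "b", "c"), m(new String[] { "a", "b", "c" })));
        System.out.println(matchThree(new R("a", "b", "c")));
        System.out.println(matchAtLeastThree(new R("a", "b", "c", "d")));
    }
}
```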
>>>>>>