Array patterns (and varargs patterns)

Brian Goetz brian.goetz at oracle.com
Fri Sep 9 18:29:37 UTC 2022


Again, look for the embedding-projection pairs.  The sets involved are 
T^n and T[].  The array creation operator is an embedding from T^n to 
T[]; the missing dual is the projection from T[] to T^k (for specific 
k.)  Projections are partial (or lossy), so these are patterns rather 
than total functions.  The dual of packing an array from a list of 
expressions is unpacking the elements into a list of variables.

When I pack an array:

     String[] ss = new String[] { "Hi", "Bob" };

this has a similar feel to

     Object o = "Bob";

in that we've thrown away some static typing information (in the former, 
that the array has length two.)  But this information is retained 
dynamically, and we can recover it with a runtime test. Asking

     if (o instanceof String s) { ... }

is asking "was the last assignment to `o` from a String".  Asking

     if (ss instanceof String[] { var a, var b }) { ... }

is asking "was the last assignment to ss a String[] with two elements" 
(and similar for other configurations of the nested patterns.)  In both 
cases, we are asking the same generalized question: could this { object, 
array } have come from an assignment / creation expression that has a 
certain shape.
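
For comparison, here is roughly how that second question has to be asked in Java as it stands today, without array patterns.  This is only a sketch: the class and method names (ArrayShapeSketch, describe) are made up for illustration, and it covers just the "String[] of length exactly two" shape.

```java
class ArrayShapeSketch {
    // Hypothetical helper: hand-written equivalent of the proposed
    //     o instanceof String[] { var a, var b }
    // using only a type pattern plus an explicit length check.
    static String describe(Object o) {
        if (o instanceof String[] arr && arr.length == 2) {
            String a = arr[0];   // plays the role of the binding `var a`
            String b = arr[1];   // plays the role of the binding `var b`
            return a + ", " + b;
        }
        return "no match";
    }

    public static void main(String[] args) {
        String[] ss = new String[] { "Hi", "Bob" };
        System.out.println(describe(ss));                      // prints "Hi, Bob"
        System.out.println(describe(new String[] { "Hi" }));   // prints "no match"
        System.out.println(describe("Bob"));                   // prints "no match"
    }
}
```

The type test, the length test, and the element extraction are three separate steps here; the array pattern would state them as one shape.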

I get it; you don't find this feature compelling.  You've said that 
already, and now we're just going in circles.  Your mail reads to me 
like "it's a bad idea because I think it's a bad idea."  Yes, other 
languages approach this in different ways; Caml deconstructs into (head, 
tail) because its fundamental data structure is a cons list. That makes 
sense given how the language works.  Java works differently, so 
transplanting from Caml or JavaScript is not always going to be a good 
answer.  Remember the pattern mantra: each aggregation idiom in the 
language should have a corresponding form of deconstruction pattern.  
Constructors have deconstruction patterns; factory methods will 
eventually have named static patterns; if we add collection literals, 
there will be collection patterns, etc.  If an aggregation form lacks a 
corresponding dual, this turns into an asymmetry which in turn means 
*destructuring cannot compose the same way aggregation composes*.  This 
is bad!  Arrays have their own special form of aggregation (array 
creation expression); array patterns are the corresponding destructuring.

I encourage you to re-read 
https://openjdk.org/projects/amber/design-notes/patterns/pattern-match-object-model 
, and the "red ball" API examples, to see what I mean.  This is about 
composability, not about whether any given form of pattern "pays its 
weight."


So again, please try harder to engage with _why do we think this is 
important_, and the specifics of what has been proposed, rather than 
just waving the YAGNI stick.  There's a bigger picture here.

>>> For me, Arrays.of() is a named pattern with a vararg list of bindings, no ?
>> It's a named pattern, but to work, it would need varargs patterns -- and
>> array patterns are the underpinnings of varargs, just as array creation
>> is the underpinning of varargs invocation.  We're not going to do
>> varargs patterns differently than we do varargs invocation, just to
>> avoid doing array patterns -- that would be silly.
> Here we want to extract the values into bindings/variables, which is not what varargs does; varargs takes a bunch of values on the stack and puts them into an array.
> Here we want the opposite operation of varargs, the spread (or splat) operator, which takes the arguments from an array (or a collection?) and puts them on the stack.
>
> If we have the pattern method Arrays.of()
>
> static <T> pattern (T...) of(T[] array) {  // here it's a varargs
>    ...
> }
>
> and we call it using a named pattern
>    switch(array) {
>      case Arrays.of(/* insert a syntax here */) -> ...
>
> the syntax should extract some/all values of the array into one or several bindings.
>
> If we are in Caml, we have the :: operator to separate the first element from the rest
>    switch(array) {
>      case Arrays.of(String first :: String[] rest) -> ...
>
> If we are in JavaScript, we have the spread operator (notice that the ... is before the type)
>    switch(array) {
>      case Arrays.of(String first, ... String[] rest) -> ...
>
> So the varargs is on the declaration side; on the pattern side we need a new spread operator, so I think that adding an array pattern now is not a good idea.
>
> regards,
> Rémi
>
>>>>> With best regards,
>>>>> Tagir Valeev.
>>>>>
>>>>> On Tue, Sep 6, 2022 at 11:11 PM Brian Goetz <brian.goetz at oracle.com> wrote:
>>>>>> We dropped this out of the record patterns JEP, but I think it is time to
>>>>>> revisit this.
>>>>>>
>>>>>> The concept of array patterns was pretty straightforward; they mimic the nesting
>>>>>> and exhaustiveness rules of record patterns, they are just a different sort of
>>>>>> container for nested patterns.  And they have an obvious duality with array
>>>>>> creation expressions.
>>>>>>
>>>>>> The main open question here was how we distinguish between "match an array of
>>>>>> length exactly N" (where there are N nested patterns) and "match an array of
>>>>>> length at least N".  We toyed with the idea of a "..." indicator to mean "more
>>>>>> elements", but this felt a little forced and opened new questions.
>>>>>>
>>>>>> It later occurred to me that there is another place to nest a pattern in an
>>>>>> array pattern -- to match (and bind) the length.  In the following, assume for
>>>>>> sake of exposition that "_" is the "any" pattern (matches everything, binds
>>>>>> nothing) and that we have some way to denote a constant pattern, which I'll
>>>>>> denote here with a constant literal.
>>>>>>
>>>>>> There is an obvious place to put this (optional) pattern: in between the
>>>>>> brackets.  So:
>>>>>>
>>>>>>        case String[1] { P }:
>>>>>>                    ^ a constant pattern
>>>>>>
>>>>>> would match string arrays of length 1 whose sole element matches P.  And
>>>>>>
>>>>>>        case String[] { P, Q }
>>>>>>
>>>>>> would match string arrays of length exactly 2, whose first two elements match P
>>>>>> and Q respectively.  (If the length pattern is not specified, we infer a
>>>>>> constant pattern whose constant is equal to the length of the nested pattern
>>>>>> list.)
>>>>>>
>>>>>> Matching a target to `String[L] { P0, .., Pn }` means
>>>>>>
>>>>>>        x instanceof String[] arr
>>>>>>            && arr.length matches L
>>>>>>            && arr.length > n
>>>>>>            && arr[0] matches P0
>>>>>>            && arr[1] matches P1
>>>>>>            ...
>>>>>>            && arr[n] matches Pn
>>>>>>
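The expansion above can be approximated in today's Java by modelling each nested pattern as a predicate.  What follows is a rough sketch, not the proposed semantics themselves: the names (LengthPatternSketch, matches, exactly, anyLength, eq) are invented here, the length pattern is stood in for by an IntPredicate, and bindings are ignored entirely.

```java
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.Predicate;

class LengthPatternSketch {
    // Hypothetical stand-in for matching `String[L] { P0, P1, ... }`.
    // `length` models the length pattern L; `elements` models the nested patterns.
    static boolean matches(Object x, IntPredicate length, List<Predicate<String>> elements) {
        if (!(x instanceof String[] arr)) return false;       // x instanceof String[] arr
        if (!length.test(arr.length)) return false;           // arr.length matches L
        if (arr.length < elements.size()) return false;       // enough elements for every Pi
        for (int i = 0; i < elements.size(); i++) {
            if (!elements.get(i).test(arr[i])) return false;  // arr[i] matches Pi
        }
        return true;
    }

    // Models a constant length pattern such as `String[3]`.
    static IntPredicate exactly(int n) { return len -> len == n; }

    // Models the "any" length pattern `String[_]`.
    static IntPredicate anyLength() { return len -> true; }

    // Models a constant element pattern (assumed notation, as in the mail).
    static Predicate<String> eq(String s) { return s::equals; }

    public static void main(String[] args) {
        Object x = new String[] { "a", "b", "c" };
        List<Predicate<String>> ab = List.of(eq("a"), eq("b"));
        // String[] { "a", "b" }: length pattern inferred as the constant 2
        System.out.println(matches(x, exactly(ab.size()), ab));   // prints "false"
        // String[_] { "a", "b" }: prefix match, any length
        System.out.println(matches(x, anyLength(), ab));          // prints "true"
        // String[3] { }: all string arrays of length 3
        System.out.println(matches(x, exactly(3), List.of()));    // prints "true"
    }
}
```

Inferring the length pattern when it is omitted corresponds to passing exactly(elements.size()), which is what separates the exact match from the prefix match.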
>>>>>> More examples:
>>>>>>
>>>>>>        case String[int len] { P }
>>>>>>
>>>>>> would match string arrays of length >= 1 whose first element matches P, and
>>>>>> further binds the array length to `len`.
>>>>>>
>>>>>>        case String[_] { P, Q }
>>>>>>
>>>>>> would match string arrays of any length whose first two elements match P and Q.
>>>>>>
>>>>>>        case String[3] { }
>>>>>>                    ^constant pattern
>>>>>>
>>>>>> matches all string arrays of length 3.
>>>>>>
>>>>>>
>>>>>> This is a more principled way to do it, because the length is a part of the
>>>>>> array and deserves a chance to match via nested patterns, just as with the
>>>>>> elements, and it avoids trying to give "..." a new meaning.
>>>>>>
>>>>>> The downside is that it might be confusing at first (though people will learn
>>>>>> quickly enough) how to distinguish between an exact match and a prefix match.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 1/5/2021 1:48 PM, Brian Goetz wrote:
>>>>>>
>>>>>> As we get into the next round of pattern matching, I'd like to opportunistically
>>>>>> attach another sub-feature: array patterns.  (This also bears on the question
>>>>>> of "how would varargs patterns work", which I'll address below, though they
>>>>>> might come later.)
>>>>>>
>>>>>> ## Array Patterns
>>>>>>
>>>>>> If we want to create a new array, we do so with an array construction
>>>>>> expression:
>>>>>>
>>>>>>        new String[] { "a", "b" }
>>>>>>
>>>>>> Since each form of aggregation should have its dual in destructuring, the
>>>>>> natural way to represent an array pattern (h/t to AlanM for suggesting this)
>>>>>> is:
>>>>>>
>>>>>>        if (arr instanceof String[] { var a, var b }) { ... }
>>>>>>
>>>>>> Here, the applicability test is: "are you an instance of String[], with length
>>>>>> == 2", and if so, we cast to String[], extract the two elements, and match them
>>>>>> to the nested patterns `var a` and `var b`.   This is the natural analogue of
>>>>>> deconstruction patterns for arrays, complete with nesting.
>>>>>>
>>>>>> Since an array can have more elements, we likely need a way to say "length >= 2"
>>>>>> rather than simply "length == 2".  There are multiple syntactic ways to get
>>>>>> there, for now I'm going to write
>>>>>>
>>>>>>        if (arr instanceof String[] { var a, var b, ... })
>>>>>>
>>>>>> to indicate "more".  The "..." matches zero or more elements and binds nothing.
>>>>>>
>>>>>> <digression>
>>>>>> People are immediately going to ask "can I bind something to the remainder"; I
>>>>>> think this is mostly an "attractive distraction", and would prefer to not have
>>>>>> this dominate the discussion.
>>>>>> </digression>
>>>>>>
>>>>>> Here's an example from the JDK that could use this effectively:
>>>>>>
>>>>>> String[] limits = limitString.split(":");
>>>>>> try {
>>>>>>        switch (limits.length) {
>>>>>>            case 2: {
>>>>>>                if (!limits[1].equals("*"))
>>>>>>                    setMultilineLimit(MultilineLimit.DEPTH, Integer.parseInt(limits[1]));
>>>>>>            }
>>>>>>            case 1: {
>>>>>>                if (!limits[0].equals("*"))
>>>>>>                    setMultilineLimit(MultilineLimit.LENGTH, Integer.parseInt(limits[0]));
>>>>>>            }
>>>>>>        }
>>>>>> }
>>>>>> catch(NumberFormatException ex) {
>>>>>>        setMultilineLimit(MultilineLimit.DEPTH, -1);
>>>>>>        setMultilineLimit(MultilineLimit.LENGTH, -1);
>>>>>> }
>>>>>>
>>>>>> becomes (eventually)
>>>>>>
>>>>>>        switch (limitString.split(":")) {
>>>>>>            case String[] { var _, Integer.parseInt(var i) } -> setMultilineLimit(DEPTH, i);
>>>>>>            case String[] { Integer.parseInt(var i) } -> setMultilineLimit(LENGTH, i);
>>>>>>            default -> { setMultilineLimit(DEPTH, -1); setMultilineLimit(LENGTH, -1); }
>>>>>>        }
>>>>>>
>>>>>> Note how not only does this become more compact, but the unchecked
>>>>>> "NumberFormatException" is folded into the match, rather than being a separate
>>>>>> concern.
>>>>>>
>>>>>>
>>>>>> ## Varargs patterns
>>>>>>
>>>>>> Having array patterns offers us a natural way to interpret deconstruction
>>>>>> patterns for varargs records.  Assume we have:
>>>>>>
>>>>>>        void m(X... xs) { }
>>>>>>
>>>>>> Then a varargs invocation
>>>>>>
>>>>>>        m(a, b, c)
>>>>>>
>>>>>> is really sugar for
>>>>>>
>>>>>>        m(new X[] { a, b, c })
>>>>>>
>>>>>> So the dual of a varargs invocation, a varargs match, is really a match to an
>>>>>> array pattern.  So for a record
>>>>>>
>>>>>>        record R(X... xs) { }
>>>>>>
>>>>>> a varargs match:
>>>>>>
>>>>>>        case R(var a, var b, var c):
>>>>>>
>>>>>> is really sugar for an array match:
>>>>>>
>>>>>>        case R(X[] { var a, var b, var c }):
>>>>>>
>>>>>> And similarly, we can use our "more arity" indicator:
>>>>>>
>>>>>>        case R(var a, var b, var c, ...):
>>>>>>
>>>>>> to indicate that there are at least three elements.
>>>>>>
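
The varargs duality quoted above can be checked against current Java, at least on the invocation side.  The sketch below is illustrative only: VarargsMatchSketch, count, and firstThree are invented names, String stands in for X, and the "match" is written by hand with a type pattern and a length check because varargs patterns do not exist.

```java
class VarargsMatchSketch {
    // A varargs record, as in `record R(X... xs) { }` with X = String.
    record R(String... xs) { }

    // A varargs method, as in `void m(X... xs) { }`.
    static int count(String... xs) { return xs.length; }

    // Hypothetical hand-written equivalent of the proposed
    //     case R(var a, var b, var c)
    // which the mail describes as sugar for
    //     case R(X[] { var a, var b, var c })
    static String firstThree(Object o) {
        if (o instanceof R r && r.xs().length == 3) {
            String[] xs = r.xs();
            return xs[0] + xs[1] + xs[2];
        }
        return "no match";
    }

    public static void main(String[] args) {
        // m(a, b, c) is sugar for m(new X[] { a, b, c })
        System.out.println(count("a", "b", "c") == count(new String[] { "a", "b", "c" })); // prints "true"
        System.out.println(firstThree(new R("a", "b", "c")));   // prints "abc"
        System.out.println(firstThree(new R("a", "b")));        // prints "no match"
    }
}
```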


