Unnamed variables and match-all patterns

Wed Sep 7 17:41:33 UTC 2022

We've gone around and around a few times on "unnamed variables" 
(underscore), starting with JEP 302 (Lambda Leftovers).  We reclaimed 
the underscore token in Java 9 with the intention of using it for 
unnamed variables and "any" patterns.  Along the way, we ran into some 
hiccups, and it has sat on the shelf for a while.  Let's take it down, 
dust it off, and see if we have any more clarity than before.

There are three syntactic productions in which we might want to use 
underscore as a "don't care" indicator:

  - Unnamed variables.  Here, underscore stands in for a variable name.  
We can declare a local variable, catch formal, pattern variable, etc, 
whose name is `_`, which has the effect of entering no new names in 
scope.  It becomes an "initialize-only" variable.

     try { ... }
     catch (FooException _) { throw new BarException("foo"); }

  - Partial inference.  Here, underscore stands in for a type name.  
Today, we can infer type variables for generic method invocations and 
constructor invocations, but it is all-or-nothing.  Being able to denote 
"infer this type" would allow us to do partial inference:

     foo.<String, _>m(...)

  - "Any" patterns.  Here, underscore is a pattern, which matches 
everything, and binds nothing.

     case Foo(var s, _): ...

We don't have to do all of these; right now we're not considering 
partial inference, but the other two are reasonable options.  Unnamed 
variables have been a long-standing request; any patterns will likely be 
a common request soon as well.

For a match-all pattern, there is little to say other than "_" is one of 
the alternatives of the Pattern production, it is applicable to all 
types, it is unconditional on all types, and it has no bindings.  The 
specification already has a concept of "any" patterns; this is just 
making it denotable.
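
As a rough illustration of what "making it denotable" buys, here is how a 
nested position that we don't care about has to be written without `_`.  
This is only a sketch: it assumes Java 21 or later (record patterns in 
switch), and the records `Point` and `Circle` and the class 
`AnyPatternToday` are made up for the example.

```java
// Assumes Java 21 or later; Point, Circle and AnyPatternToday are hypothetical names.
record Point(int x, int y) {}
record Circle(Point center, int radius) {}

class AnyPatternToday {
    static int radiusOf(Object o) {
        return switch (o) {
            // `var unusedCenter` matches every Point but binds a name we never read.
            // With a denotable any pattern this could read: case Circle(_, var r) -> r;
            case Circle(var unusedCenter, var r) -> r;
            default -> -1;
        };
    }
}
```

The `var unusedCenter` sub-pattern already matches everything; the only 
thing it does that `_` would not is introduce a binding.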

I think there is little controversy about using unnamed local variables 
(local variable declaration statements, catch formals, foreach induction 
variables, resources in try-with-resources) and unnamed lambda 
parameters.  What is common to all of these is that these are _pure 
implementation details_, where the author has elected to not give a name 
to a variable that is entirely implementation-facing.  This seems 
eminently reasonable.  Unnamed variables can help eliminate errors by 
capturing design assumptions and make life easier for static analysis 
tools that like to point out unused variables.
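
To make the "pure implementation detail" cases concrete, here is a sketch 
of what we write today, with the proposed form shown in comments.  The 
class and method names are invented for illustration, and the `_` forms 
in the comments are the proposal under discussion, not something this 
code relies on.

```java
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical examples of variables that are declared but never read.
class UnusedLocals {
    // The loop variable is never read.
    // Proposed form: for (String _ : items)
    static int count(List<String> items) {
        int n = 0;
        for (String ignored : items) {
            n++;
        }
        return n;
    }

    // The caught exception is never read.
    // Proposed form: catch (NumberFormatException _)
    static int parseOrDefault(String s, int fallback) {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException ignored) {
            return fallback;
        }
    }

    // A lambda that uses only its first parameter.
    // Proposed form: (first, _) -> first
    static final BiFunction<String, String, String> FIRST =
            (first, ignored) -> first;
}
```

In each case the name `ignored` is a convention, not a guarantee; nothing 
stops later code from reading it, and tools have to guess at the intent.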

Where we stumble is on method parameters, because method parameter names 
serve two masters -- the implementation (as the declaration of a 
variable) and the API (as part of the specification of what the method 
does.)  Among other things, we like to document the semantics of method 
parameters in Javadoc with the `@param` tag, but doing so requires a 
name (or inventing a new Javadoc mechanism like `@param #4`, likely a 
loser.)  Secondarily, sometimes parameter names are retained in the 
MethodParameters attribute, though that attribute (JVMS 4.7.24) already 
supports parameters without names by using a zero CP index.
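
The reflective view of this is already tolerant of missing names.  A 
small sketch, using `java.lang.reflect.Parameter`; the class 
`ParameterNames` and its methods are hypothetical, and whether real names 
show up depends on whether the class was compiled with `-parameters`:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;

// Hypothetical class for illustration only.
class ParameterNames {
    // The second parameter is not used by the body.
    static int scale(int value, int unusedFactor) {
        return value * 2;
    }

    // Looks up scale(int, int), wrapping the checked exception.
    static Method scaleMethod() {
        try {
            return ParameterNames.class.getDeclaredMethod("scale", int.class, int.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // Describes each parameter, tolerating names that were not retained.
    static String describe(Method m) {
        StringBuilder sb = new StringBuilder();
        for (Parameter p : m.getParameters()) {
            // isNamePresent() is false when the class file carries no name;
            // getName() then returns a synthesized name of the form argN.
            sb.append(p.isNamePresent() ? p.getName() : "<unnamed:" + p.getName() + ">")
              .append(' ');
        }
        return sb.toString().trim();
    }
}
```

Code that consumes parameter names reflectively already has to handle the 
"no name" case, so unnamed method parameters would not be a new condition 
there.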

With `var`, we drew a clear line of "implementation only" -- you can't 
infer a method return type, even for a private method; you can only use 
it for local variables and lambda formals.  This has been pretty 
successful.
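
For reference, the `var` line looks roughly like this in practice.  This 
is a sketch assuming Java 11 or later (for `var` in lambda formals); the 
class `VarBoundary` is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical class showing where `var` is and is not permitted.
class VarBoundary {
    // The return type must be written out, even on a private method:
    // `private static var sum(...)` does not compile.
    private static int sum(List<Integer> xs) {
        var total = 0;                 // local variable: allowed
        for (var x : xs) {             // for-each variable: allowed
            total += x;
        }
        return total;
    }

    static int demo() {
        var xs = new ArrayList<Integer>();   // inferred as ArrayList<Integer>
        xs.add(1);
        xs.add(2);
        // Lambda formals may use var (Java 11+).
        BinaryOperator<Integer> add = (var a, var b) -> a + b;
        return add.apply(sum(xs), 10);
    }
}
```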

We've explored a number of intermediate points on the spectrum with 
varying degrees of stability:

  A) Implementation only -- local variables, catch formals, for-loop 
induction variables, TWR resources, pattern variables, lambda formals
  B) "A++", where we add in method parameters of anonymous classes
  C) Adding in method parameters _for non-initial declarations_ -- allow 
unnamed parameters only for methods that override a method from a 
supertype, ensuring that there is a real specification of what the 
parameters mean.
  D) Anything goes, any method parameter can be unnamed, throwing 
specification to the wind.

A is a stable point, and has the advantage of mostly lining up with 
where we can use `var`.  But users will surely grumble that they can't 
use it for implementations of methods from supertypes.  As this feature 
request predates lambdas and patterns, giving it to lambdas and patterns 
but not ordinary methods might feel a bit mean.

The motivation for B is obvious -- to support smooth refactoring between 
lambdas and inner classes -- but is not a very stable point, as one will 
immediately ask "what about refactoring to named classes".

C feels attractive, though there would surely be complaints too; it 
excludes constructors and static methods (which might sometimes want 
unnamed parameters when a parameter is no longer used, but stays around 
for binary compatibility), and even some initial declarations.  But, 
these cases are likely to be somewhat rarer, so I don't object to 
leaving these aside. The main concern is that this might feel 
arbitrary.  There is also the possibility for some confusion; it is not 
obvious what it means when you override a method that already has an 
unnamed parameter.  Can you give it a name and use it?  It is a little 
weird that the lack of name applies only to the implementation of the 
method, but somehow bleeds into the specification.  There is also some 
impact on Javadoc, as well as lingering concerns that there are other 
shoes to drop besides Javadoc and MethodParameters.
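
A sketch of the situation C is aimed at, and of the override question 
raised above.  All of the types here (`Visitor`, `FlatVisitor`, 
`IndentingFlatVisitor`) are hypothetical, the code assumes Java 11 or 
later for `String.repeat`, and the `_` form appears only in comments 
since it is the thing being proposed.

```java
// Hypothetical types for illustration only.
interface Visitor {
    /**
     * @param node  the node being visited
     * @param depth the nesting depth of the node
     */
    String visit(String node, int depth);
}

class FlatVisitor implements Visitor {
    // This override ignores depth.  Under option C it could be declared as
    // visit(String node, int _), with the meaning of the second parameter
    // still specified by Visitor.visit.
    @Override
    public String visit(String node, int unusedDepth) {
        return node.toUpperCase();
    }
}

class IndentingFlatVisitor extends FlatVisitor {
    // The open question: if FlatVisitor had left the parameter unnamed,
    // may a further override name it and use it, as is done here?
    @Override
    public String visit(String node, int depth) {
        return " ".repeat(depth) + super.visit(node, depth);
    }
}
```

With today's rules the second override is unremarkable, since parameter 
names are not part of the override relationship; the oddity would be only 
in how the unnamed-ness of the middle declaration is presented to readers.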

D is also stable, but feels like it makes the language less safe, by 
making some methods unspecifiable.  On the other hand, the people who 
might use it for initial declarations, static methods, etc, are also the 
sort of people who probably don't write specification anyway (otherwise 
they would realize that they are depriving their callers of useful 
information.)

In (C), Javadoc could insert an `@implNote` that says something like 
"this implementation ignores the value of parameters <x> and <y> from 
declaring method Foo::bar".  In (D), it could say "ignores its 3rd and 
4th parameters", or insert synthetic `@param` tags for parameters whose 
name is something like "<unnamed>".

Past discussions seemed to gravitate toward either A or D, which are 
also the simplest / most stable points.  I guess it becomes a question 
of getting over the "makes the language less safe" concerns.

Regardless, I'd like to see if we can quantify the "lingering concerns 
about other shoes to drop."