Yield as contextual keyword (was: Call for bikeshed -- break replacement in expression switch)

Fri May 17 16:57:28 UTC 2019

As was pointed out in Keyword Management for the Java Language (https://openjdk.java.net/jeps/8223002), contextual keywords are a compromise, and their compromises vary by lexical position.  Let’s take a more organized look at the costs and options for doing `yield` as a contextual keyword.

But, before we do, let’s put this in context (heh): methods called yield() are rare (there’s only one in the JDK), and blocks on the RHS of an arrow-switch are rare, so we’re talking about the interaction of two corner cases.  

Let’s take the following example.  

class C {
  /* 1 */  void yield(int x) { }

  void m(int y) {
      /* 2 */  yield (1);
      /* 3 */  yield 1;

      int z = switch (y) {
          case 0 -> {
              /* 4 */  yield (1);
          }
          case 1 -> {
              /* 5 */  yield 1;
          }
          default -> 42;
      };
  }
}

First, requirements: 

For usage (1), this has to be a valid method declaration.  

For usage (2), this has to be a method invocation.  

For usage (3), this has to be some sort of compilation error.  

For usage (4), there is some discussion to be had.

For usage (5), this has to be a yield statement.

(1) is not problematic, as the yield-statement production is not in play at all when parsing method declarations.  

(3) is not problematic, as there is no ambiguity between method-invocation and yield-statement, and yield-statement is not allowed here.  (Even if the operand were an identifier, not a numeric literal, it would not be ambiguous with a local variable declaration, because `yield` will not be permitted as a type identifier.)

(5) is not problematic, as there is no ambiguity between method invocation and yield-statement.

Let’s talk about (2) and (4).  

Let’s assume the parser production only allows a yield statement inside a block on the RHS of an arrow-switch (and maybe some other contexts in the future, but not all blocks).  Let’s call these “switchy blocks” for clarity.  That means that (2) is as unambiguous as (3): it is not in a switchy block, so it will be parsed as a method invocation.  So this is really all about (4).

OPTION A: DISALLOW YIELD (E)
----------------------------

In this option, we disallow yield statements whose argument is a parenthesized expression, instead parsing them as method invocations.  Most such invocations will fail as there is unlikely to be a yield() method in scope.  

From a parser perspective, this is straightforward enough; we need an alternate Expression production which omits “parenthesized expression.”  

From a user perspective, I think this is likely to be a sharp edge, as I would expect wanting a parenthesized operand to be more common than having a yield method in scope.
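To make the sharp edge concrete, here is a minimal sketch of what user code might look like if Option A were adopted.  The names (OptionASketch, scaled) are invented for illustration, and the comment describes the hypothetical Option A behavior rather than what any compiler does today.  The operand is hoisted into a local so that the yield operand does not begin with a parenthesis.

```java
// Hypothetical illustration of Option A; all names are invented for this sketch.
class OptionASketch {
    static int scaled(int kind, int a, int b) {
        return switch (kind) {
            case 0 -> {
                // Under Option A, a statement beginning `yield (a + b) ...` would
                // start out being parsed as an invocation of a method named yield,
                // so the parenthesized part is computed into a local first.
                int sum = a + b;
                yield sum * 2;
            }
            case 1 -> {
                int diff = a - b;
                yield diff * 2;
            }
            default -> 0;
        };
    }

    public static void main(String[] args) {
        System.out.println(scaled(0, 3, 4));
        System.out.println(scaled(1, 3, 4));
    }
}
```

The extra local is harmless, but it is the kind of workaround users would have to discover on their own, which is why this option is likely to feel like a trap.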

OPTION B: DISALLOW UNQUALIFIED INVOCATION
-----------------------------------------

From a parser perspective, this is similarly straightforward: inside a switchy block, give the rule `yield <expr>` a higher priority than method invocation.  The compiler can warn on this ambiguity, if we like.  

From a user perspective, users wanting to invoke yield() methods inside switchy blocks will need to qualify the receiver (Foo.yield(), this.yield(), etc). 

The cost is that a statement “yield (e)” parses to different things in different contexts; in a switchy block, it is a yield statement, the rest of the time, it is a method invocation.  

I think this is much less likely to cause user distress than Option A, because it is rare that there is an unqualified yield(x) method in scope.  (And, given every yield() method I can think of, you’d likely never call one from a switchy block anyway, as they are side-effectful and blocking.)  And in the case of collision, there is a clear workaround if the user really wanted a method invocation, and the compiler can deliver a warning when there is actual ambiguity.
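As a rough sketch of the Option B workaround, the class below declares its own yield(int) method and invokes it from inside a switchy block by qualifying the receiver.  The class name, the counter field, and the method bodies are invented for illustration; only the shape of the qualified calls matters.

```java
// Hypothetical illustration of the Option B workaround; names are invented.
class OptionBSketch {
    private int calls = 0;

    // A user-defined method that happens to be named yield (usage 1 in the example).
    void yield(int x) {
        calls += x;
    }

    int m(int y) {
        return switch (y) {
            case 0 -> {
                // Qualifying the receiver keeps this a method invocation.
                this.yield(1);
                // An unparenthesized operand is a yield statement either way.
                yield calls;
            }
            case 1 -> {
                // A static yield() method is reached through its class name.
                Thread.yield();
                yield 1;
            }
            default -> 42;
        };
    }

    public static void main(String[] args) {
        OptionBSketch s = new OptionBSketch();
        System.out.println(s.m(0));
        System.out.println(s.m(1));
        System.out.println(s.m(2));
    }
}
```

The sketch deliberately avoids an unqualified yield (e) form, since that is exactly the case whose meaning the options here disagree on.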

OPTION C: SYMBOL-DRIVEN PARSING
-------------------------------

In this option, the context-sensitivity of parsing includes a check for whether a `yield()` method is in scope.  I think we can rule this out as overly heroic; requiring the parser to be aware of the symbol table is asking a lot of compilers.

OPTION D: BOTH WAYS
-------------------

In this option, we proceed as with Option A, but when we get to symbol analysis, if we are in a switchy block and there is no yield() method in scope, we rewrite the tree to be a yield statement instead.  
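The options so far differ only in how they treat the statement `yield (e);`, so they can be compared with a small toy model.  This is a sketch, not compiler code: the enum and method names are invented, and it assumes that Option C resolves to a method invocation exactly when a yield() method is in scope, which the description above implies but does not spell out.  Under that assumption, C and D reach the same result and differ only in which compiler phase makes the decision.

```java
// Toy model of how each option treats the statement `yield (e);`.
// All names are invented for this sketch; this is not compiler code.
class YieldOptions {
    enum Option { A, B, C, D }
    enum Parse { YIELD_STATEMENT, METHOD_INVOCATION }

    static Parse classify(Option option, boolean inSwitchyBlock, boolean yieldMethodInScope) {
        // Outside a switchy block the yield-statement production is not in play.
        if (!inSwitchyBlock) {
            return Parse.METHOD_INVOCATION;
        }
        return switch (option) {
            // A: a parenthesized operand is never a yield statement.
            case A -> Parse.METHOD_INVOCATION;
            // B: the yield statement takes priority inside a switchy block.
            case B -> Parse.YIELD_STATEMENT;
            // C decides while parsing, D by rewriting the tree during attribution;
            // the final outcome is assumed to be the same for both.
            case C, D -> yieldMethodInScope ? Parse.METHOD_INVOCATION : Parse.YIELD_STATEMENT;
        };
    }

    public static void main(String[] args) {
        for (Option o : Option.values()) {
            System.out.println(o + ": no yield() in scope -> " + classify(o, true, false)
                    + ", yield() in scope -> " + classify(o, true, true));
        }
    }
}
```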

OPTION E: A REAL KEYWORD
------------------------

The pain above is an artifact of choosing a contextual keyword; on the scale of contextual pain, this rates a “mild”, largely because true collisions are likely to be quite rare, and there is no backward compatibility concern.  So while choosing a real keyword (break-with) would be cleaner, I don’t think the users will like it as much.  

My opinions: I think C is pretty much a non-starter, and IMO B is measurably more attractive than A.  Option D is not as terrible as C but seems overly heroic, as we try to avoid tree-rewriting in attribution.  I don’t think the pain of either A or B merits grabbing for E.