We need more keywords, captain!

Brian Goetz brian.goetz at oracle.com
Tue Jan 8 15:22:17 UTC 2019

This document proposes a possible move that will buy us some breathing 
room in the perpetual problem where the keyword-management tail wags the 
programming-model dog.

## We need more keywords, captain!

Java has a fixed set of _keywords_ (JLS 3.9) which are not allowed to
be used as identifiers.  This set has remained quite stable over the
years (for good reason), with the exceptions of `assert` added in 1.4,
`enum` added in 5, and `_` added in 9.  In addition, there are also
several _reserved identifiers_ (`true`, `false`, and `null`) which
behave almost like keywords.

Over time, as the language evolves, language designers face a
challenge: the set of keywords imagined in version 1.0 is rarely
suitable for expressing all the things we might ever want our language
to express.  We have several tools at our disposal for addressing this
problem:

  - Eminent domain.  Take words that were previously identifiers, and
    turn them into keywords, as we did with `assert` in 1.4.

  - Recycle.  Repurpose an existing keyword for something that it was
    never really meant for (such as using `default` for annotation
    values or default methods).

  - Do without.  Find a way to pick a syntax that doesn't require a
    new keyword, such as using `@interface` for annotations instead of
    `annotation` -- or don't do the feature at all.

  - Smoke and mirrors.  Create the illusion of context-dependent
    keywords through various linguistic heroics (restricted keywords,
    reserved type names).

In any given situation, all of these options are on the table -- but
most of the time, none of these options are very good.  The lack of
reasonable options for extending the syntax of the language threatens
to become a significant impediment to language evolution.

#### Why not "just" make new keywords?

While it may be legal for us to declare `i` to be a keyword in a
future version of Java, this would likely break every program in the
world, since `i` is used so commonly as an identifier.  (When the
`assert` keyword was added in 1.4, it broke every testing framework.)
The cost of remediating the effect of such incompatible changes varies
as well; invalidating a name choice for a local variable has a local
fix, but invalidating the name of a public type or an interface
method might well be fatal.

Additionally, the keywords we're likely to want to reclaim are often
those that are popular as identifiers (e.g., `value`, `var`,
`method`), making such fatal collisions more likely.  In some cases,
if the keyword candidate in question is sufficiently rarely used as an
identifier, we might still opt to take that source-compatibility hit
-- but names that are less likely to collide (e.g.,
`usually_but_not_always_final`) are likely not the ones we want in our
language. Realistically, this is unlikely to be a well we can go to
very often, and the bar must be very high.

#### Why not "just" live with the keywords we have?

Reusing keywords in multiple contexts has ample precedent in
programming languages, including Java.  (For example, we (ab)use `final`
for "not mutable", "not overridable", and "not extensible".)
Sometimes, using an existing keyword in a new context is natural and
sensible, but usually it's not our first choice.  Over time, as the
range of demands we place on our keyword set expands, this may well
descend into the ridiculous; no one wants to use `null final` as a way
of negating finality.  (While one might think such things are too
ridiculous to consider, note that we received serious-seeming
suggestions during JEP 325 to use `new switch` to describe a switch
with different semantics.  Presumably to be followed by `new new
switch` in ten years.)

Of course, one way to live without making new keywords is to stop
evolving the language entirely.  While there are some who think this
is a fine idea, doing so because of the lack of available tokens would
be a silly reason. We are convinced that Java has a long life ahead of
it, and developers are excited about new features that enable them
to write more expressive and reliable code.

#### Why not "just" make contextual keywords?

At first glance, contextual keywords (and their friends, such as
reserved type identifiers) may appear to be a magic wand; they let us
create the illusion of adding new keywords without breaking existing
programs.  But the positive track record of contextual keywords hides
a great deal of complexity and distortion.

Each grammar position is its own story; contextual keywords that might
be used as modifiers (e.g., `readonly`) have different ambiguity
considerations than those that might be used in code (e.g., a `matches`
expression).  The process of selecting a contextual keyword is not a
simple matter of adding it to the grammar; each one requires an
analysis of potential current and future interactions.  Similarly,
each token we try to repurpose may have its own special
considerations; for example, we could justify the use of `var` as a
reserved type name because the naming conventions are so broadly
adhered to (type names start with an uppercase letter, so almost no
existing code declared a type named `var`).  Finally, the use of
contextual keywords in certain syntactic positions can create
additional considerations for extending the syntax later.
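The `var` case can be observed directly in Java 10 and later: because `var` is a reserved type name rather than a true keyword, it remains legal as a variable or method name and is only special where a type is expected. The sketch below assumes a Java 10+ compiler; the class and method names (`VarDemo`, `demo`) are purely illustrative.

```java
public class VarDemo {
    // A method named `var` is still legal: `var` is not a keyword.
    static String var(String s) {
        return s.toUpperCase();
    }

    static String demo() {
        // `var` in type position triggers local type inference (String here);
        // `var` in name position is an ordinary identifier.
        var var = "keyword";
        // Call the method named `var`, passing the local variable named `var`.
        return var(var);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints KEYWORD
    }

    // By contrast, a type declaration such as `class var {}` is rejected
    // since Java 10, because `var` can no longer name a type.
}
```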

Contextual keywords create complexity for specifications, compilers,
and IDEs.  With one or two special cases, we can often cope well
enough, but if special cases were to become more pervasive, this would
likely result in significant maintenance costs or a longer bug tail.
While it is easy to dismiss this as “not my problem”, in reality, this
is everybody’s problem. IDEs often have to guess whether a use of a
contextual keyword is a keyword or an identifier, and they may not have
enough information to make a good guess until they have seen more input.
This results in worse syntax highlighting, auto-completion, and
refactoring abilities -- or worse.  These problems quickly become
everyone's problems.

So, while contextual keywords are one of the tools in our toolbox,
they should also be used sparingly.

#### Why is this a problem?

Aside from the obvious consequences of these problems (clunky syntax,
complexity, bugs), there is a more insidious hidden cost --
distortion.  The accidental details of keyword management pose a
constant risk of distortion in language design.

One could consider the choice to use `@interface` instead of
`annotation` for annotations to be a distortion; having a descriptive
name rather than a funky combination of punctuation and keyword would
surely have made it easier for people to become familiar with
annotations.

As another example, the set of modifiers (`public`, `private`,
`static`, `final`, etc.) is not complete; there is no way to say “not
final” or “not static”. This, in turn, means that we cannot create
features where variables or classes are `final` by default, or members
are `static` by default, because there’s no way to denote the desire
to opt out of it.  While there may be reasons to justify a locally
suboptimal default anyway (such as global consistency), we want to
make these choices deliberately, not have them made for us by the
accidental details of keyword management. Choosing to leave out a
feature for reasons of simplicity is fine; leaving it out because we
don't have a way to denote the obvious semantics is not.

It may not be obvious from the outside, but this is a constant problem
in evolving the language, and an ongoing tax that we all pay, directly
or indirectly.

## We need a new source of keyword candidates

Every time we confront this problem, the overwhelming tendency is to
punt and pick one of the bad options, because the problem only comes
along every once in a while.  But, with the features in the pipeline, I
expect it will continue to come along with some frequency, and I’d
rather get ahead of it. Given that all of these current options are
problematic, and there is not even a least-problematic move that
applies across all situations, my inclination is to try to expand the
set of lexical forms that can be used as keywords.

As a not-serious example, take the convention that we’ve used for
experimental features, where we prefix provisional keywords in
prototypes with two underscores, as we did with `__ByValue` in the
Valhalla prototype. (We commonly do this in feature proposals and
prototypes, mostly to signify “this keyword is a placeholder for a
syntax decision to be made later”, but also because it permits a
simple implementation that is unlikely to collide with existing code.)
We could, for example, carve out the space of identifiers that begin
with underscore as being reserved for keywords. Of course, this isn’t
so pretty, and it also means we'd have a mix of underscore and
non-underscore keywords, so it’s not so much a serious suggestion as
an example of the sort of move we are looking for.

But I do have a serious suggestion: allow _hyphenated_ keywords where
one or more of the terms are already keywords or reserved identifiers.
Compared with restricted keywords, this creates much less trouble for
parsing, as (for example) `non-null` cannot be confused with a
subtraction expression, and the lexer can always tell with fixed
lookahead whether `a-b` is three tokens or one. This gives us a lot
more room for creating new, less-conflicting keywords. And these new
keywords are likely to be good names, too, as many of the missing
concepts we want to add describe their relationship to existing
language constructs -- such as `non-null`.
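To make the lexing claim concrete, here is a minimal toy tokenizer, assuming a closed, hypothetical set of hyphenated keywords (drawn from the illustrative examples in this note; none of them is a real Java keyword, and `HyphenLexer` is an illustrative name). After scanning a word, it peeks past a single `-` at the next word and emits one token only if the joined text is in the set; otherwise `a-b` stays three tokens. It deliberately ignores literals, comments, and multi-character operators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy tokenizer sketch. The keyword set is HYPOTHETICAL: these hyphenated
// forms come from an illustrative list and are NOT Java keywords.
public class HyphenLexer {
    static final Set<String> HYPHENATED_KEYWORDS = Set.of(
            "non-null", "non-final", "package-private", "this-class", "this-return");

    // Returns the index just past the identifier-like word starting at `start`.
    static int scanWord(String src, int start) {
        int j = start + 1;
        while (j < src.length() && Character.isJavaIdentifierPart(src.charAt(j))) {
            j++;
        }
        return j;
    }

    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isJavaIdentifierStart(c)) {
                int end = scanWord(src, i);
                // Bounded lookahead: examine only an immediately following '-'
                // and the single word after it, never anything further.
                if (end + 1 < src.length() && src.charAt(end) == '-'
                        && Character.isJavaIdentifierStart(src.charAt(end + 1))) {
                    int end2 = scanWord(src, end + 1);
                    String joined = src.substring(i, end2);
                    if (HYPHENATED_KEYWORDS.contains(joined)) {
                        tokens.add(joined); // one token, e.g. "non-null"
                        i = end2;
                        continue;
                    }
                }
                tokens.add(src.substring(i, end)); // ordinary identifier or keyword
                i = end;
            } else {
                tokens.add(String.valueOf(c)); // any other single character, e.g. '-'
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("non-null String s")); // [non-null, String, s]
        System.out.println(tokenize("a-b"));               // [a, -, b]
    }
}
```

Spacing matters in this sketch: `x - null` still lexes as three tokens, and only the exact spelling of a listed form, such as `non-null`, becomes a single token.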

Here are some examples where this approach might yield credible
candidates. (Note: none of these are being proposed here; this is
merely an illustrative list of examples of how this mechanism could
form keywords that might, in some particular possible future, be
useful and better than the alternatives we have now.)

   - `non-null`
   - `non-final`
   - `package-private` (the default accessibility for class members,
     currently not denotable)
   - `public-read` (publicly readable, privately writable)
   - `null-checked`
   - `type-static` (a concept needed in Valhalla, which is static
     relative to a particular specialization of a class, rather than the
     class itself)
   - `default-value`
   - `eventually-final` (what the `@Stable` annotation currently suggests)
   - `semi-final` (an alternative to `sealed`)
   - `exhaustive-switch` (opting into exhaustiveness checking for statement
     switches)
   - `enum-class`, `annotation-class`, `record-class` (we might have
     chosen these as alternatives to `enum` and `@interface`, had we had
     the option)
   - `this-class` (to describe the class literal for the current class)
   - `this-return` (a common request is a way to mark a setter or
     builder method as returning its receiver)

(Again, the point is not to debate the merits of any of these specific
examples; the point is merely to illustrate what we might be able to do
with such a mechanism.)

Having this as an option doesn't mean we can't also use the other
approaches when they are suitable; it just means we have more, and
likely less fraught, options with which to make better decisions.

There are likely to be other lexical schemes by which new keywords can
be created without impinging on existing code; this one seems credible
and reasonably parsable by both machines and humans.

#### "But that's ugly"

Invariably, some percentage of readers will have an immediate and
visceral reaction to this idea.  Let's stipulate for the record that
some people will find this ugly.  (At least, at first.  Many such
reactions are possibly-transient (see what I did there?) responses
to unfamiliarity.)

More information about the amber-spec-experts mailing list