An alternative to "restricted keywords"

Fri May 12 06:17:35 UTC 2017

[CC JPMS expert mailing list because it's an important issue, IMO]

I have a counter-proposal.

I do not like your proposal because, from the user's point of view, '^' looks like a hack; it's not used anywhere else in the grammar.
I agree that restricted keywords are not properly specified in the JLS. Reading your mail, I've discovered that what I was calling restricted keywords is not what javac implements :(
I agree that restricted keywords should only be enabled when parsing module-info.java.
I agree that, given the way javac currently implements the module-info grammar, error recovery leads to less than ideal error messages.

In my opinion, both
   module m { requires transitive transitive; }
   module m { requires transitive; }
should be rejected, because what javac implements is closer to the JavaScript ASI (automatic semicolon insertion) rules than to restricted keywords as currently specified by Alex.

For me, a restricted keyword is a keyword that is activated when you are at a position in the grammar where it can be recognized, and because it's a keyword, it takes precedence over an identifier.
For example, after
  module m { 
if the next token is 'requires', it should be recognized as a keyword, because a 'requires ...' directive can be parsed at that position, i.e. there is a production that starts with the 'requires' keyword.
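A minimal Java sketch of this rule, assuming a heavily simplified model of the module-info grammar with only two parser positions. The `Position` enum, the keyword sets and the class name are hypothetical illustrations, not javac's data structures. Note that classification needs only the current parser position, no extra lookahead:

```java
import java.util.Set;

/** Hypothetical model of "a restricted keyword wins wherever the grammar can accept it". */
final class ContextualKeywords {
    enum Kind { KEYWORD, IDENTIFIER }

    // Two hypothetical parser positions inside a module body.
    enum Position { DIRECTIVE_START, AFTER_REQUIRES }

    // Restricted keywords that some production can accept at the given position.
    static Set<String> acceptableKeywords(Position p) {
        return p == Position.DIRECTIVE_START
            ? Set.of("requires", "exports", "opens", "uses", "provides")
            : Set.of("transitive");
    }

    // Where the grammar can accept the word as a keyword, the keyword reading wins;
    // otherwise the word is an ordinary identifier.
    static Kind classify(String word, Position p) {
        return acceptableKeywords(p).contains(word) ? Kind.KEYWORD : Kind.IDENTIFIER;
    }

    public static void main(String[] args) {
        System.out.println(classify("requires", Position.DIRECTIVE_START));  // KEYWORD
        System.out.println(classify("transitive", Position.AFTER_REQUIRES)); // KEYWORD
        System.out.println(classify("foo", Position.AFTER_REQUIRES));        // IDENTIFIER
    }
}
```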

So
  module m { requires transitive; }
should be rejected because 'transitive' should be recognized as a keyword after 'requires', and the compiler should report a missing module name.

And
  module m { requires transitive transitive; }
should be rejected because the grammar that parses the modifiers is defined as a loop, so from the grammar's point of view it's like
  module m { requires Modifier Modifier; }
so the front end of the compiler should report a missing module name, and a later phase should report that the modifier 'transitive' appears twice.
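The two rejections above can be sketched as a small hand-written parser for a single `requires` directive. This is a hypothetical, simplified model (tokens are pre-split strings, a qualified module name is a single token, and only `transitive` is modelled as a modifier), not javac's parser; it only illustrates that one token of lookahead suffices and where each error would be reported:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical parser for "requires {Modifier} ModuleName ;" under the keyword-wins rule. */
final class RequiresParser {
    // Only 'transitive' is modelled as a modifier in this sketch.
    static final Set<String> MODIFIERS = Set.of("transitive");

    /** Parses one directive, e.g. ["requires", "transitive", "a.b", ";"], and returns the errors found. */
    static List<String> parseRequires(List<String> tokens) {
        List<String> errors = new ArrayList<>();
        int i = 0;
        if (tokens.isEmpty() || !tokens.get(0).equals("requires")) {
            errors.add("expected 'requires'");
            return errors;
        }
        i++;
        // Modifier loop: whenever a modifier can appear here, the word is read as one.
        List<String> modifiers = new ArrayList<>();
        while (i < tokens.size() && MODIFIERS.contains(tokens.get(i))) {
            modifiers.add(tokens.get(i));
            i++;
        }
        // One token of lookahead decides whether a module name is present.
        if (i < tokens.size() && !tokens.get(i).equals(";")) {
            i++; // the module name (a qualified name is a single token here)
        } else {
            errors.add("missing module name");
        }
        if (i >= tokens.size() || !tokens.get(i).equals(";")) {
            errors.add("expected ';'");
        }
        // Later phase: detect a modifier that appears more than once.
        Set<String> seen = new HashSet<>();
        for (String m : modifiers) {
            if (!seen.add(m)) {
                errors.add("repeated modifier '" + m + "'");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(parseRequires(List.of("requires", "transitive", ";")));
        // [missing module name]
        System.out.println(parseRequires(List.of("requires", "transitive", "transitive", ";")));
        // [missing module name, repeated modifier 'transitive']
        System.out.println(parseRequires(List.of("requires", "transitive", "foo", ";")));
        // []
    }
}
```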

I believe that with this definition of 'restricted keyword', the compiler can recover from errors more easily and offer meaningful error messages, and the module-info part of the grammar is LR(1).

regards,
Rémi

----- Original Message -----
> From: "Stephan Herrmann" <stephan.herrmann at berlin.de>
> To: jigsaw-dev at openjdk.java.net
> Sent: Tuesday, 9 May 2017 16:56:11
> Subject: An alternative to "restricted keywords"

> (1) I understand the need to avoid conflicts between new module-related
> keywords and existing code, where these words may be used
> as identifiers. Moreover, it must be possible for a module declaration
> to refer to packages or types that are so named.
> 
> However,
> 
> (2) The currently proposed "restricted keywords" are not appropriately
> specified in JLS.
> 
> (3) The currently proposed "restricted keywords" pose difficulties to
> the implementation of all tools that need to parse a module declaration.
> 
> (4) A simple alternative to "restricted keywords" exists, which has not
> received the attention it deserves.
> 
> Details:
> 
> (2) The current specification implicitly violates the assumption that
> parsing can be performed on the basis of a token stream produced by
> a scanner (aka lexer). From discussion on this list we learned that
> the following examples are intended to be syntactically legal:
>    module m { requires transitive transitive; }
>    module m { requires transitive; }
> (Please for the moment disregard heuristic solutions, while we are
>  investigating whether generally "restricted keywords" is a well-defined
>  concept, or not.)
> Of the three occurrences of "transitive", #1 is a keyword and the others
> are identifiers. At the point when the parser has consumed "requires"
> and now asks about the classification of the word "transitive", the scanner
> cannot possibly answer. It can only answer for sure
> after the *parser* has accepted the full declaration. Put differently,
> the parser must consume more tokens than have been classified by the
> scanner. In other words, to faithfully parse arbitrary grammars using
> a concept of "restricted keywords", scanners must provide speculative
> answers, which may later need to be revised by backtracking or similar
> exhaustive exploration of the space of possible interpretations.
> 
> The specification is totally silent about this fundamental change.
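A minimal Java sketch of the classification problem described in (2), assuming the reading under which both example declarations are legal: of the words between "requires" and ";", the last one is the module name and every earlier one must be the modifier "transitive". The class and method names are hypothetical, and qualified names and other modifiers are ignored. The point is that the kind of the first word depends on how many words follow it, which a scanner cannot know when it is asked about that word:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model: classifying the words after "requires" once the ';' is known. */
final class SpeculativeClassification {
    enum Kind { KEYWORD, IDENTIFIER }

    static List<Kind> classify(List<String> wordsBeforeSemicolon) {
        List<Kind> kinds = new ArrayList<>();
        int n = wordsBeforeSemicolon.size();
        for (int i = 0; i < n; i++) {
            // The kind of word i depends on n, i.e. on how many words follow it,
            // which is only known once the parser has reached the ';'.
            if (i == n - 1) {
                kinds.add(Kind.IDENTIFIER);
            } else if (wordsBeforeSemicolon.get(i).equals("transitive")) {
                kinds.add(Kind.KEYWORD);
            } else {
                throw new IllegalArgumentException("not a modifier: " + wordsBeforeSemicolon.get(i));
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        System.out.println(classify(List.of("transitive", "transitive"))); // [KEYWORD, IDENTIFIER]
        System.out.println(classify(List.of("transitive")));               // [IDENTIFIER]
    }
}
```

The same word "transitive" in the same position (right after "requires") receives different answers in the two calls.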
> 
> 
> (3) "restricted keywords" pose three problems to tool implementations:
> 
> (3.a) Any known practical approach to implementing a parser with
> "restricted keywords" requires leveraging heuristics, which are based
> on the exact set of rules defined in the grammar. Such heuristics
> reduce the look-ahead that needs to be performed by the scanner,
> in order to avoid the full exhaustive exploration mentioned above.
> Such a set of heuristics is extremely fragile and can easily break when
> more rules are later added to the grammar. This means that small future
> language changes can easily break any chosen strategy.
> 
> (3.b) If parsing works for error-free input, this doesn't imply that
> a parser will be able to give any useful answer for input with syntax
> errors. As a worst-case example consider an arbitrary input sequence
> consisting of just the two words "requires" and "transitive" in random
> order and with no punctuation.
> A parser will not be able to detect any structure in this sequence.
> By comparison, normal keywords serve as a baseline, where parsing
> typically can resume regardless of any leading garbage.
> While this is not relevant for normal compilation, it is paramount
> for assistive functions, which most of the time operate on incomplete
> text that is likely to contain syntax errors.
> Strictly speaking, any "module declaration" with syntax errors is
> not a ModuleDeclaration, and thus none of the "restricted keywords"
> can be interpreted as keywords (which per JLS can only happen inside
> a ModuleDeclaration).
> All this means that functionality like code completion is
> systematically broken in a language using "restricted keywords".
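The "baseline" role of normal keywords can be sketched as panic-mode error recovery over the words of a module body, assuming (as in proposal (4)) that directive words are unconditional keywords inside module-info.java. The class and method names are hypothetical, and this is far simpler than the recovery in javac or any IDE; it only shows why an unconditional keyword is a safe restart point. Under "restricted keywords" the same word may also be an identifier, so it could not be trusted in this way:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical panic-mode recovery that restarts parsing at directive keywords. */
final class KeywordResync {
    // Words that always start a directive when they are unconditional keywords.
    static final Set<String> DIRECTIVE_KEYWORDS =
        Set.of("requires", "exports", "opens", "uses", "provides");

    /** Splits the input into directives, skipping garbage up to the next directive keyword. */
    static List<String> recover(List<String> words) {
        List<String> report = new ArrayList<>();
        int i = 0;
        while (i < words.size()) {
            int start = i;
            boolean atKeyword = DIRECTIVE_KEYWORDS.contains(words.get(i));
            if (atKeyword) {
                i++; // consume the directive keyword itself
            }
            // Advance to the next restart point: the next directive keyword or end of input.
            while (i < words.size() && !DIRECTIVE_KEYWORDS.contains(words.get(i))) {
                i++;
            }
            String span = String.join(" ", words.subList(start, i));
            report.add(atKeyword ? "directive: " + span : "skipped: " + span);
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> input = List.of("x", "y", "requires", "transitive", "a", ";", "exports", "b", ";");
        System.out.println(recover(input));
        // [skipped: x y, directive: requires transitive a ;, directive: exports b ;]
    }
}
```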
> 
> (3.c) Other IDE functionality assumes that small fragments of the
> input text can be scanned out of context. The classical example here
> is syntax highlighting but there are more examples.
> Any such functionality has to be re-implemented, replacing the
> highly efficient local scanning with full parsing of the input text.
> For functionality that is implicitly invoked on every keystroke, on
> mouse hover, etc., this difference in efficiency negatively affects
> the overall user experience of an IDE.
> 
> 
> (4) The following proposal avoids all difficulties described above:
> 
> * open, module, requires, transitive, exports, opens, to, uses,
>    provides, and with are "module words", to which the following
>    interpretation is applied:
>    * within any ordinary compilation unit, a module word is a normal
>      identifier.
>    * within a modular compilation unit, all module words are
>      (unconditional) keywords.
> * We introduce three new auxiliary non-terminals:
>      LegacyPackageName:
>          LegacyIdentifier
>          LegacyPackageName . LegacyIdentifier
>      LegacyTypeName:
>          LegacyIdentifier
>          LegacyTypeName . LegacyIdentifier
>      LegacyIdentifier:
>          Identifier
>          ^open
>          ^module
>          ...
>          ^with
> * We modify all productions in JLS §7.7, replacing PackageName with
>   LegacyPackageName and TypeName with LegacyTypeName.
> * After parsing, each of the words '^open', '^module' etc.
>   is interpreted by removing the leading '^' (escape character).
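A minimal Java sketch of a scanner for a modular compilation unit under this proposal, assuming that Java's ordinary reserved keywords can be ignored for the illustration and that every other non-space character is a one-character token. The class and method names are hypothetical. Because no parser state is consulted, the same function classifies any fragment in isolation, which is the property (3.c) relies on:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical scanner: module words are unconditional keywords, '^word' is an identifier. */
final class ModuleWordScanner {
    static final Set<String> MODULE_WORDS = Set.of(
        "open", "module", "requires", "transitive", "exports",
        "opens", "to", "uses", "provides", "with");

    // No parser state is consulted, so any fragment can be scanned in isolation.
    static List<String> scan(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            boolean escaped = c == '^';
            int start = escaped ? i + 1 : i;
            if (start < text.length() && Character.isJavaIdentifierStart(text.charAt(start))) {
                int end = start;
                while (end < text.length() && Character.isJavaIdentifierPart(text.charAt(end))) {
                    end++;
                }
                String word = text.substring(start, end);
                if (escaped) {
                    // Per LegacyIdentifier, only module words may be escaped; the '^' is dropped.
                    tokens.add(MODULE_WORDS.contains(word) ? "IDENT:" + word : "ERROR:^" + word);
                } else {
                    // Inside a modular compilation unit, a module word is always a keyword.
                    tokens.add(MODULE_WORDS.contains(word) ? "KEYWORD:" + word : "IDENT:" + word);
                }
                i = end;
            } else {
                tokens.add(escaped ? "ERROR:^" : "PUNCT:" + c);
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("requires transitive ^transitive;"));
        // [KEYWORD:requires, KEYWORD:transitive, IDENT:transitive, PUNCT:;]
        System.out.println(scan("exports ^module.foo to bar;"));
        // [KEYWORD:exports, IDENT:module, PUNCT:., IDENT:foo, KEYWORD:to, IDENT:bar, PUNCT:;]
    }
}
```

With this scanner, a fragment such as "transitive transitive" yields two keywords regardless of where it was cut from, so a highlighter needs no parse of the surrounding text.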
> 
> Here, '^' is chosen as the escape character following the precedent
> of Xtext. Plenty of other options for this purpose are possible, too.
> 
> 
> 
> This proposal completely satisfies requirement (1), and avoids
> all of the problems (2) and (3). There's an obvious price to pay:
> users will have to add the escape character when referring to code
> that uses a module word as a package name or type name.
> 
> Not only is this a very low price compared to the benefits; one can
> even argue that it also helps the human reader of a module declaration,
> because it clearly marks which occurrences of a module word are indeed
> identifiers.
> 
> An IDE can easily help in interactively adding escapes where necessary.
> 
> Finally, in this trade-off it is relevant to consider the expected
> frequencies: legacy names (needing an escape) will surely be the exception,
> by orders of magnitude. So the small price to be paid will only
> affect a comparatively small number of locations.
> 
> 
> Stephan