An alternative to "restricted keywords"
Remi Forax
forax at univ-mlv.fr
Fri May 12 06:17:35 UTC 2017
[CC'ing the JPMS expert mailing list because this is an important issue, IMO]
I have a counter-proposal.
I do not like your proposal because, from the user's point of view, '^' looks like a hack: it is not used anywhere else in the grammar.
I agree that restricted keywords are not properly specified in the JLS. Reading your mail, I've discovered that what I was calling restricted keywords is not what javac implements :(
I agree that restricted keywords should only be enabled when parsing module-info.java.
I agree that error recovery, given the way the grammar for module-info is currently implemented in javac, leads to less-than-ideal error messages.
In my opinion, both
module m { requires transitive transitive; }
module m { requires transitive; }
should be rejected, because what javac implements is closer to the JavaScript ASI rules than to restricted keywords as currently specified by Alex.
For me, a restricted keyword is a keyword which is activated if you are at a position in the grammar where it can be recognized and, because it's a keyword, it takes precedence over an identifier.
For example, for
module m {
if the next token is 'requires', it should be recognized as a keyword because you can parse a directive 'requires ...', so there is a production that starts with the 'requires' keyword.
so
module m { requires transitive; }
should be rejected because transitive should be recognized as a keyword after requires and the compiler should report a missing module name.
and
module m { requires transitive transitive; }
should be rejected because the grammar that parses the modifiers is defined as "a loop", so from the grammar's point of view it's like
module m { requires Modifier Modifier; }
so the front end of the compiler should report a missing module name and a later phase should report that the modifier 'transitive' appears twice.
I believe that with this definition of 'restricted keyword', the compiler can recover from errors more easily and offer meaningful error messages, and the module-info part of the grammar is LR(1).
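To make this definition concrete, here is a minimal sketch in Java. It is not javac's code: the class and method names are hypothetical, a module name is simplified to a single token, and only the 'requires' directive is handled. It takes a modifier word as a keyword wherever a modifier production can start, reports the missing module name in the front end, and reports the repeated modifier in a separate later step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: a word is a keyword wherever the grammar can accept that keyword. */
final class RequiresDirectiveSketch {
    // Modifiers that may follow 'requires' in a module declaration.
    private static final Set<String> MODIFIERS = Set.of("transitive", "static");

    /** Checks the tokens of one directive, e.g. ["requires", "transitive", ";"], and returns diagnostics. */
    static List<String> check(List<String> tokens) {
        List<String> errors = new ArrayList<>();
        int i = 0;
        if (tokens.isEmpty() || !tokens.get(0).equals("requires")) {
            errors.add("expected 'requires'");
            return errors;
        }
        i++;
        // The modifier list is a loop: at this position a modifier production can start,
        // so every modifier word is taken as a keyword, never as an identifier.
        List<String> modifiers = new ArrayList<>();
        while (i < tokens.size() && MODIFIERS.contains(tokens.get(i))) {
            modifiers.add(tokens.get(i));
            i++;
        }
        // Front end: a module name must follow the modifiers.
        if (i >= tokens.size() || tokens.get(i).equals(";")) {
            errors.add("missing module name");
        } else {
            i++; // consume the module name (simplified to one token in this sketch)
        }
        if (i >= tokens.size() || !tokens.get(i).equals(";")) {
            errors.add("expected ';'");
        }
        // Later phase: report a modifier that appears more than once.
        for (int m = 0; m < modifiers.size(); m++) {
            if (modifiers.indexOf(modifiers.get(m)) < m) {
                errors.add("repeated modifier '" + modifiers.get(m) + "'");
            }
        }
        return errors;
    }
}
```

With this sketch, the tokens of "requires transitive;" yield the single diagnostic "missing module name", and the tokens of "requires transitive transitive;" yield "missing module name" followed by "repeated modifier 'transitive'".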
regards,
Rémi
----- Original Message -----
> From: "Stephan Herrmann" <stephan.herrmann at berlin.de>
> To: jigsaw-dev at openjdk.java.net
> Sent: Tuesday, 9 May 2017 16:56:11
> Subject: An alternative to "restricted keywords"
> (1) I understand the need for avoiding that new module-related
> keywords conflict with existing code, where these words may be used
> as identifiers. Moreover, it must be possible for a module declaration
> to refer to packages or types thusly named.
>
> However,
>
> (2) The currently proposed "restricted keywords" are not appropriately
> specified in JLS.
>
> (3) The currently proposed "restricted keywords" pose difficulties to
> the implementation of all tools that need to parse a module declaration.
>
> (4) A simple alternative to "restricted keywords" exists, which has not
> received the attention it deserves.
>
> Details:
>
> (2) The current specification implicitly violates the assumption that
> parsing can be performed on the basis of a token stream produced by
> a scanner (aka lexer). From discussion on this list we learned that
> the following examples are intended to be syntactically legal:
> module m { requires transitive transitive; }
> module m { requires transitive; }
> (Please for the moment disregard heuristic solutions, while we are
> investigating whether generally "restricted keywords" is a well-defined
> concept, or not.)
> Of the three occurrences of "transitive", #1 is a keyword, the others
> are identifiers. At the point when the parser has consumed "requires"
> and now asks about classification of the word "transitive", the scanner
> cannot possibly answer this classification. It can only answer for sure,
> after the *parser* has accepted the full declaration. Put differently,
> the parser must consume more tokens than have been classified by the
> scanner. In other words, to faithfully parse arbitrary grammars using
> a concept of "restricted keywords", scanners must provide speculative
> answers, which may later need to be revised by backtracking or similar
> exhaustive exploration of the space of possible interpretations.
>
> The specification is totally silent about this fundamental change.
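The point about classification in (2) can be illustrated with a small Java sketch. It is purely illustrative and not how javac or any other tool is known to work: the class name, the enum, and the one-token heuristic are assumptions. What it shows is that the answer for "transitive" depends on the token after it, which a scanner working token by token has not classified yet.

```java
import java.util.List;

/** Hypothetical sketch: classifying "transitive" requires looking past the current token. */
final class TransitiveLookaheadSketch {
    enum Kind { KEYWORD, IDENTIFIER }

    /** Classifies the word at position i of a 'requires' directive using one token of lookahead. */
    static Kind classify(List<String> tokens, int i) {
        if (!tokens.get(i).equals("transitive")) {
            throw new IllegalArgumentException("not the word 'transitive'");
        }
        boolean followedBySemicolon = i + 1 < tokens.size() && tokens.get(i + 1).equals(";");
        // Heuristic (an assumption of this sketch): the word right before ';' has to be
        // the module name, so it is an identifier; an earlier occurrence is the modifier.
        return followedBySemicolon ? Kind.IDENTIFIER : Kind.KEYWORD;
    }
}
```

For the tokens of "requires transitive transitive;" this classifies the first occurrence as a keyword and the second as an identifier, and for "requires transitive;" it classifies the only occurrence as an identifier. A heuristic like this is tied to the exact shape of the 'requires' production.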
>
>
> (3) "restricted keywords" pose three problems to tool implementations:
>
> (3.a) Any known practical approach to implement a parser with
> "restricted keywords" requires leveraging heuristics, which are based
> on the exact set of rules defined in the grammar. Such heuristics
> reduce the look-ahead that needs to be performed by the scanner,
> in order to avoid the full exhaustive exploration mentioned above.
> A set of such heuristics is extremely fragile and can easily break when
> later more rules are added to the grammar. This means small future
> language changes can easily break any chosen strategy.
>
> (3.b) If parsing works for error-free input, this doesn't imply that
> a parser will be able to give any useful answer for input with syntax
> errors. As a worst-case example consider an arbitrary input sequence
> consisting of just the two words "requires" and "transitive" in random
> order and with no punctuation.
> A parser will not be able to detect any structure in this sequence.
> By comparison, normal keywords serve as a baseline, where parsing
> typically can resume regardless of any leading garbage.
> While this is not relevant for normal compilation, it is paramount
> for assistive functions, which most of the time operate on incomplete
> text, likely to contain even syntax errors.
> Strictly speaking, any "module declaration" with syntax errors is
> not a ModuleDeclaration, and thus none of the "restricted keywords"
> can be interpreted as keywords (which per JLS can only happen inside
> a ModuleDeclaration).
> All this means that functionality like code completion is
> systematically broken in a language using "restricted keywords".
>
> (3.c) Other IDE functionality assumes that small fragments of the
> input text can be scanned out of context. The classical example here
> is syntax highlighting but there are more examples.
> Any such functionality has to be re-implemented, replacing the
> highly efficient local scanning with full parsing of the input text.
> For functionality that is implicitly invoked per keystroke, or on
> mouse hover etc, this difference in efficiency negatively affects
> the overall user experience of an IDE.
>
>
> (4) The following proposal avoids all difficulties described above:
>
> * open, module, requires, transitive, exports, opens, to, uses,
> provides, and with are "module words", to which the following
> interpretation is applied:
> * within any ordinary compilation unit, a module word is a normal
> identifier.
> * within a modular compilation unit, all module words are
> (unconditional) keywords.
> * We introduce three new auxiliary non-terminals:
> LegacyPackageName:
> LegacyIdentifier
> LegacyPackageName . LegacyIdentifier
> LegacyTypeName:
> LegacyIdentifier
> LegacyTypeName . LegacyIdentifier
> LegacyIdentifier:
> Identifier
> ^open
> ^module
> ...
> ^with
> * We modify all productions in 7.7, replacing PackageName with
> LegacyPackageName and replacing TypeName with LegacyTypeName.
> * After parsing, each of the words '^open', '^module' etc.
> is interpreted by removing the leading '^' (escape character).
>
> Here, '^' is chosen as the escape character following the precedent
> of Xtext. Plenty of other options for this purpose are possible, too.
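The scanner side of proposal (4) can be sketched in Java as follows. This is only an illustration under assumptions: the class and method names are hypothetical, and the sketch covers only word classification inside a modular compilation unit, not the grammar changes to 7.7. Each word is classified on its own, with no knowledge of the surrounding tokens.

```java
import java.util.Set;

/** Hypothetical sketch of context-free word classification in a modular compilation unit. */
final class ModuleWordScannerSketch {
    // The ten "module words" listed in the proposal.
    private static final Set<String> MODULE_WORDS = Set.of(
            "open", "module", "requires", "transitive", "exports",
            "opens", "to", "uses", "provides", "with");

    enum Kind { KEYWORD, IDENTIFIER }

    /** A bare module word is always a keyword; anything else, including '^word', is an identifier. */
    static Kind classify(String word) {
        return MODULE_WORDS.contains(word) ? Kind.KEYWORD : Kind.IDENTIFIER;
    }

    /** After parsing, an escaped module word is interpreted by removing the leading '^'. */
    static String identifierName(String word) {
        if (word.startsWith("^") && MODULE_WORDS.contains(word.substring(1))) {
            return word.substring(1);
        }
        // Other words are returned unchanged; a real scanner would presumably
        // reject '^' in front of a word that is not a module word.
        return word;
    }
}
```

Here classify("transitive") gives KEYWORD, classify("^transitive") gives IDENTIFIER, and identifierName("^transitive") gives "transitive", so a directive such as "requires transitive ^transitive;" can be tokenized without any lookahead.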
>
>
>
> This proposal completely satisfies the requirements (1), and avoids
> all of the problems (2) and (3). There's an obvious price to pay:
> users will have to add the escape character when referring to code
> that uses a module word as a package name or type name.
>
> Not only is this a very low price compared to the benefits; one can
> even argue that it also helps the human reader of a module declaration,
> because it clearly marks which occurrences of a module word are indeed
> identifiers.
>
> An IDE can easily help in interactively adding escapes where necessary.
>
> Finally, in this trade-off it is relevant to consider the expected
> frequencies: legacy names (needing escape) will surely be the exception
> - by orders of magnitude. So the small price to be paid will only
> affect a comparatively small number of locations.
>
>
> Stephan