An alternative to "restricted keywords"
Remi Forax
forax at univ-mlv.fr
Fri May 12 19:21:19 UTC 2017
Hi Peter,
On May 12, 2017 6:08:58 PM GMT+02:00, Peter Levart <peter.levart at gmail.com> wrote:
>Hi Remi,
>
>On 05/12/2017 08:17 AM, Remi Forax wrote:
>> [CC'ing the JPMS expert mailing list because it's an important issue IMO]
>>
>> I have a counter-proposal.
>>
>> I do not like your proposal because, from the user's point of view, '^'
>> looks like a hack; it is not used anywhere else in the grammar.
>> I agree that restricted keywords are not properly specified in the JLS.
>> Reading your mail, I discovered that what I was calling restricted
>> keywords is not what javac implements :(
>> I agree that restricted keywords should only be enabled when parsing
>> module-info.java.
>> I agree that doing error recovery on the grammar for module-info, as it
>> is currently implemented in javac, leads to less-than-ideal error
>> messages.
>>
>> In my opinion, both
>> module m { requires transitive transitive; }
>> module m { requires transitive; }
>> should be rejected, because what javac implements is closer to the
>> JavaScript ASI rules than to restricted keywords as currently
>> specified by Alex.
>>
>> For me, a restricted keyword is a keyword that is activated when you
>> are at a position in the grammar where it can be recognized and, because
>> it is a keyword, it takes precedence over an identifier.
>> For example, for
>> module m {
>> if the next token is 'requires', it should be recognized as a keyword
>> because you can parse a directive 'requires ...', so there is a
>> production that starts with the 'requires' keyword.
>>
>> so
>> module m { requires transitive; }
>> should be rejected because transitive should be recognized as a
>> keyword after requires, and the compiler should report a missing module
>> name.
>>
>> and
>> module m { requires transitive transitive; }
>> should be rejected because the grammar that parses the modifiers is
>> defined as "a loop", so from the grammar's point of view it is like
>> module m { requires Modifier Modifier; }
>> so the front end of the compiler should report a missing module
>> name, and a later phase should report that the modifier 'transitive'
>> appears twice.
>>
>> I believe that with this definition of 'restricted keyword', the compiler
>> can recover from errors more easily and offer meaningful error messages,
>> and the module-info part of the grammar is LR(1).
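
To make the rule above concrete, here is a minimal sketch of a parser for
a single 'requires' directive that treats 'transitive' and 'static' as
keywords whenever the grammar position allows a modifier. It is only an
illustration under stated assumptions: the class name, the method names and
the error strings are hypothetical and are not taken from javac, the input
is assumed to be pre-tokenized, and a module name is simplified to a single
token.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Hypothetical sketch; names and messages are illustrative, not javac's.
class StrictRequiresParser {

    // At the modifier position these words are always keywords.
    static boolean isModifier(String token) {
        return token.equals("transitive") || token.equals("static");
    }

    // Parses one pre-tokenized directive, e.g. ["requires", "transitive", "m", ";"].
    // Simplification (assumption): a module name is a single token.
    static String parseRequires(List<String> tokens) {
        int pos = 0;
        if (tokens.isEmpty() || !tokens.get(pos).equals("requires")) {
            return "error: expected 'requires'";
        }
        pos++;
        // Modifier "loop": consume every modifier keyword greedily.
        List<String> modifiers = new ArrayList<>();
        while (pos < tokens.size() && isModifier(tokens.get(pos))) {
            modifiers.add(tokens.get(pos++));
        }
        // Syntax check: a module name must follow the modifiers.
        if (pos >= tokens.size() || tokens.get(pos).equals(";")) {
            return "error: missing module name";
        }
        String moduleName = tokens.get(pos++);
        if (pos >= tokens.size() || !tokens.get(pos).equals(";")) {
            return "error: expected ';'";
        }
        // Later phase: report a modifier that appears more than once.
        if (new HashSet<>(modifiers).size() != modifiers.size()) {
            return "error: duplicate modifier";
        }
        return "ok: modifiers=" + modifiers + " module=" + moduleName;
    }

    public static void main(String[] args) {
        System.out.println(parseRequires(List.of("requires", "transitive", ";")));
        System.out.println(parseRequires(List.of("requires", "transitive", "transitive", ";")));
        System.out.println(parseRequires(List.of("requires", "transitive", "m", ";")));
    }
}
```

With this rule the parser needs only the current token to decide, so both
disputed declarations fail with a missing-module-name error instead of
being accepted.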
>
>This will make "requires", "uses", "provides", "with", "to", "static",
>"transitive", "exports", etc. all illegal module names. OK, no big
>deal, because there are no module names yet (apart from JDK modules, and
>those are named differently). But...
You should use reverse-DNS naming for module names, so no problem.
>
>What about:
>
>module m { exports transitive; }
>
>Here 'transitive' is an existing package name for example. Who
>guarantees that there are no packages out there with names matching
>restricted keywords? Current restriction for modules is that they can
>not have an unnamed package. Do we want to restrict package names a
>module can export too?
You should use reverse-DNS naming for packages, so no problem :)
>
>Stephan's solution does not have this problem.
>
>Regards, Peter
I don't think those issues are real problems.
Rémi
>
>>
>> regards,
>> Rémi
>>
>> ----- Original Message -----
>>> From: "Stephan Herrmann" <stephan.herrmann at berlin.de>
>>> To: jigsaw-dev at openjdk.java.net
>>> Sent: Tuesday, May 9, 2017 16:56:11
>>> Subject: An alternative to "restricted keywords"
>>> (1) I understand the need for avoiding that new module-related
>>> keywords conflict with existing code, where these words may be used
>>> as identifiers. Moreover, it must be possible for a module declaration
>>> to refer to packages or types thusly named.
>>>
>>> However,
>>>
>>> (2) The currently proposed "restricted keywords" are not appropriately
>>> specified in JLS.
>>>
>>> (3) The currently proposed "restricted keywords" pose difficulties to
>>> the implementation of all tools that need to parse a module declaration.
>>>
>>> (4) A simple alternative to "restricted keywords" exists, which has not
>>> received the attention it deserves.
>>>
>>> Details:
>>>
>>> (2) The current specification implicitly violates the assumption that
>>> parsing can be performed on the basis of a token stream produced by
>>> a scanner (aka lexer). From discussion on this list we learned that
>>> the following examples are intended to be syntactically legal:
>>> module m { requires transitive transitive; }
>>> module m { requires transitive; }
>>> (Please for the moment disregard heuristic solutions, while we are
>>> investigating whether generally "restricted keywords" is a well-defined
>>> concept, or not.)
>>> Of the three occurrences of "transitive", #1 is a keyword, the others
>>> are identifiers. At the point when the parser has consumed "requires"
>>> and now asks about classification of the word "transitive", the scanner
>>> cannot possibly answer this classification. It can only answer for sure
>>> after the *parser* has accepted the full declaration. Put differently,
>>> the parser must consume more tokens than have been classified by the
>>> scanner. Put differently, to faithfully parse arbitrary grammars using
>>> a concept of "restricted keywords", scanners must provide speculative
>>> answers, which may later need to be revised by backtracking or similar
>>> exhaustive exploration of the space of possible interpretations.
>>>
>>> The specification is totally silent about this fundamental change.
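
As an illustration of the point above, the following sketch classifies the
words of a 'requires' directive so that both examples are accepted. The
rule it uses is an assumption chosen only to be consistent with the two
examples in the message; it is not claimed to be what javac or any other
tool implements, and all names are hypothetical. Note that the answer for
a word depends on the token that follows it, so the classification cannot
be made from the word alone.

```java
import java.util.List;

// Hypothetical sketch; the rule below is an assumption, not javac's algorithm.
class LookaheadClassifier {

    enum Kind { KEYWORD, IDENTIFIER }

    // Classifies the word at index i of a pre-tokenized 'requires' directive.
    // Assumed rule: 'transitive' directly after 'requires' is a keyword
    // unless the next token is ';', in which case it is the module name.
    static Kind classify(List<String> tokens, int i) {
        String word = tokens.get(i);
        if (i == 0 && word.equals("requires")) {
            return Kind.KEYWORD;
        }
        boolean afterRequires = i > 0 && tokens.get(i - 1).equals("requires");
        if (word.equals("transitive") && afterRequires) {
            // One token of look-ahead past the word being classified.
            String next = i + 1 < tokens.size() ? tokens.get(i + 1) : ";";
            return next.equals(";") ? Kind.IDENTIFIER : Kind.KEYWORD;
        }
        return Kind.IDENTIFIER;
    }

    public static void main(String[] args) {
        List<String> twice = List.of("requires", "transitive", "transitive", ";");
        List<String> once = List.of("requires", "transitive", ";");
        System.out.println(classify(twice, 1) + " " + classify(twice, 2));
        System.out.println(classify(once, 1));
    }
}
```

The same word at the same index is classified differently in the two
inputs, which is the dependency on later tokens that the paragraph
describes.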
>>>
>>>
>>> (3) "restricted keywords" pose three problems to tool implementations:
>>>
>>> (3.a) Any known practical approach to implementing a parser with
>>> "restricted keywords" requires leveraging heuristics, which are based
>>> on the exact set of rules defined in the grammar. Such heuristics
>>> reduce the look-ahead that needs to be performed by the scanner,
>>> in order to avoid the full exhaustive exploration mentioned above.
>>> A set of such heuristics is extremely fragile and can easily break when
>>> more rules are later added to the grammar. This means small future
>>> language changes can easily break any chosen strategy.
>>>
>>> (3.b) If parsing works for error-free input, this doesn't imply that
>>> a parser will be able to give any useful answer for input with syntax
>>> errors. As a worst-case example consider an arbitrary input sequence
>>> consisting of just the two words "requires" and "transitive" in random
>>> order and with no punctuation.
>>> A parser will not be able to detect any structure in this sequence.
>>> By comparison, normal keywords serve as a baseline, where parsing
>>> typically can resume regardless of any leading garbage.
>>> While this is not relevant for normal compilation, it is paramount
>>> for assistive functions, which most of the time operate on incomplete
>>> text, likely to contain even syntax errors.
>>> Strictly speaking, any "module declaration" with syntax errors is
>>> not a ModuleDeclaration, and thus none of the "restricted keywords"
>>> can be interpreted as keywords (which per JLS can only happen inside
>>> a ModuleDeclaration).
>>> All this means that functionality like code completion is
>>> systematically broken in a language using "restricted keywords".
>>>
>>> (3.c) Other IDE functionality assumes that small fragments of the
>>> input text can be scanned out of context. The classical example here
>>> is syntax highlighting but there are more examples.
>>> Any such functionality has to be re-implemented, replacing the
>>> highly efficient local scanning with full parsing of the input text.
>>> For functionality that is implicitly invoked per keystroke, or on
>>> mouse hover etc, this difference in efficiency negatively affects
>>> the overall user experience of an IDE.
>>>
>>>
>>> (4) The following proposal avoids all difficulties described above:
>>>
>>> * open, module, requires, transitive, exports, opens, to, uses,
>>>   provides, and with are "module words", to which the following
>>>   interpretation is applied:
>>>   * within any ordinary compilation unit, a module word is a normal
>>>     identifier.
>>>   * within a modular compilation unit, all module words are
>>>     (unconditional) keywords.
>>> * We introduce three new auxiliary non-terminals:
>>>     LegacyPackageName:
>>>       LegacyIdentifier
>>>       LegacyPackageName . LegacyIdentifier
>>>     LegacyTypeName:
>>>       LegacyIdentifier
>>>       LegacyTypeName . LegacyIdentifier
>>>     LegacyIdentifier:
>>>       Identifier
>>>       ^open
>>>       ^module
>>>       ...
>>>       ^with
>>> * We modify all productions in 7.7, replacing PackageName with
>>>   LegacyPackageName and replacing TypeName with LegacyTypeName.
>>> * After parsing, each of the words '^open', '^module' etc.
>>>   is interpreted by removing the leading '^' (escape character).
>>>
>>> Here, '^' is chosen as the escape character following the precedent
>>> of Xtext. Plenty of other options for this purpose are possible, too.
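
The proposal above can be sketched as a scanner step that classifies each
word in isolation, with no look-ahead and no knowledge of the parser
state. This is a minimal sketch assuming the word has already been split
from the input and that the scanner knows it is inside a modular
compilation unit; the class and method names are hypothetical.

```java
import java.util.Set;

// Hypothetical sketch of the "module words" proposal; names are illustrative.
class ModuleWordScanner {

    enum Kind { KEYWORD, IDENTIFIER }

    // The module words listed in the proposal.
    static final Set<String> MODULE_WORDS = Set.of(
            "open", "module", "requires", "transitive", "exports",
            "opens", "to", "uses", "provides", "with");

    // Inside a modular compilation unit every module word is a keyword;
    // an escaped word such as "^transitive" is not in the set, so it is
    // classified as an identifier.
    static Kind classify(String word) {
        return MODULE_WORDS.contains(word) ? Kind.KEYWORD : Kind.IDENTIFIER;
    }

    // After parsing, an escaped module word is interpreted by removing
    // the leading '^'. Other words are returned unchanged.
    static String identifierText(String word) {
        if (word.startsWith("^") && MODULE_WORDS.contains(word.substring(1))) {
            return word.substring(1);
        }
        return word;
    }

    public static void main(String[] args) {
        System.out.println(classify("transitive"));
        System.out.println(classify("^transitive"));
        System.out.println(identifierText("^transitive"));
    }
}
```

Under this scheme a package named 'transitive' would be exported as
'exports ^transitive;', and the classification of each word never depends
on its neighbours.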
>>>
>>>
>>>
>>> This proposal completely satisfies the requirements (1), and avoids
>>> all of the problems (2) and (3). There's an obvious price to pay:
>>> users will have to add the escape character when referring to code
>>> that uses a module word as a package name or type name.
>>>
>>> Not only is this a very low price compared to the benefits; one can
>>> even argue that it also helps the human reader of a module declaration,
>>> because it clearly marks which occurrences of a module word are indeed
>>> identifiers.
>>>
>>> An IDE can easily help in interactively adding escapes where necessary.
>>>
>>> Finally, in this trade-off it is relevant to consider the expected
>>> frequencies: legacy names (needing escape) will surely be the exception
>>> - by orders of magnitude. So the small price to be paid will only
>>> affect a comparatively small number of locations.
>>>
>>>
>>> Stephan
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.