An alternative to "restricted keywords" + helping automatic modules

Fri May 19 15:38:04 UTC 2017

I haven't seen any reaction to this sub-topic:

- If an automatic module would have a name containing a Java keyword,
   is it OK to simply refuse handling this artifact as an automatic module?
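
To make the question concrete, here is a minimal sketch of such a check. It assumes
the candidate module name has already been derived from the artifact as a
dot-separated string. The class and method names are hypothetical and only
illustrate the idea; the only existing API used is javax.lang.model.SourceVersion.

```java
import javax.lang.model.SourceVersion;

public class AutomaticModuleNameCheck {

    /**
     * Hypothetical helper: returns true if any dot-separated segment of the
     * derived name is a Java keyword (or a boolean/null literal).
     */
    public static boolean containsKeyword(String derivedName) {
        for (String segment : derivedName.split("\\.")) {
            if (SourceVersion.isKeyword(segment)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "native" is a Java keyword, so this artifact would be refused.
        System.out.println(containsKeyword("foo.native.bar")); // prints "true"
        // No segment is a keyword, so this name would be accepted.
        System.out.println(containsKeyword("foo.bar"));        // prints "false"
    }
}
```

An artifact for which containsKeyword returns true would then simply not be
treated as an automatic module.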

Stephan

On 18.05.2017 10:59, Stephan Herrmann wrote:
> Remi,
>
> I see your proposal as a minimal compromise, avoiding the worst
> of difficulties, but I think we can do better.
>
> Trade-off:
> In all posts I could not find a real reason against escaping,
> aside from aesthetics. I don't see this as sufficient motivation
> for a less-than-perfect solution.
>
>
> Clarity:
> I'm still not completely following your explanations, partly because
> of the jargon you are using. I'll leave it to Alex to decide if he
> likes the idea that JLS would have to explain terms like dotted
> production.
>
> Compare this to just adding a few more rules to the grammar,
> where no hand-waving is needed for an explanation.
> No, I did not say that escaping is a pervasive change.
> I never said that the grammar for ordinary compilation units
> should be changed.
> If you like we only need to extend one rule for the scope of
> modular compilation units: Identifier. It can't get simpler.
>
>
> Completeness:
> I understand you as saying, module names cannot start with
> "transitive". Mind you, that every modifier that will be added
> to the grammar for modules in the future will cause conflicts for
> names that are now legal, and you won't have a means to resolve this.
>
> By contrast, we can use the escaping approach even to solve one
> more problem that has been briefly touched on this list before:
>
> Automatic modules suffer from the fact that some artifact names may
> have Java keywords in their name, which means that these artifacts
> simply cannot be used as automatic modules, right?
> Why not apply escaping also here? *Any* dot-separated sequence
> of words could be used as module name, as long as module references
> have a means to escape any keywords in that sequence.
>
>
> Suitability for implementation:
> As said, your proposal resolves one problem, but still IDE
> functionality suffers from restricted keywords, because scanning
> and parsing need more context information than normal.
> - Recovery after a syntax error will regress.
> - Scanning arbitrary regions of code is not possible.
> Remember:
> In an IDE code with syntax errors is the norm, not an exception,
> as the IDE provides functionality to work on incomplete code.
>
>
> Stephan
>
>
> On 18.05.2017 00:34, Remi Forax wrote:
>> I want to answer this before we start the meetings because I really think that restricted keywords, as I propose them, solve the
>> issues Stephan raised.
>>
>>
>> ----- Original Message -----
>>> From: "Stephan Herrmann" <stephan.herrmann at berlin.de>
>>> To: jigsaw-dev at openjdk.java.net
>>> Sent: Tuesday, 16 May 2017 11:49:45
>>> Subject: Re: An alternative to "restricted keywords"
>>
>>> Thanks, Remi, for taking this to the EG list.
>>>
>>> Some collected responses:
>>>
>>>
>>> Remi: "from the user point of view, '^' looks like a hack"
>>>
>>> This is, of course, a subjective statement. I don't share this view
>>> and in years of experience with Xtext-languages (where this concept
>>> is used by default) I never heard any user complain about this.
>>>
>>> More importantly, I hold that such aesthetic considerations are of
>>> much lesser significance than the question, whether we can explain
>>> - unambiguously explain - the concept in a few simple sentences.
>>> Explaining must be possible at two levels: in a rigorous specification
>>> and in simple words for users of the language.
>>
>> I'm not against ^, or ` as has already been requested, to escape an identifier, but as you said, it's a pervasive change that
>> applies to the whole grammar, while I think that with restricted keywords (which really should be called local keywords) the
>> changes only impact the grammar that specifies a module-info.java.
>>
>>>
>>> Remi: "a keyword which is activated if you are at a position in the
>>>  grammar where it can be recognized".
>>>
>>> I don't think 'being at a position in the grammar' is a good way of
>>> explaining. Parsing doesn't generally have one position in a grammar,
>>> multiple productions can be active in the same parser state.
>>> Also speaking of a "loop" for modifiers seems to complicate matters
>>> more than necessary.
>>>
>>> Under these considerations I still see '^' as the clearest of all
>>> solutions. Clear as a specification, simple to explain to users.
>>
>> Eclipse uses an LR parser; for an LR parser, position == dotted production, as I have written earlier, so there is no problem
>> because it corresponds to only one parser state.  Note that even if one does not use an LR or an LL parser, most hand-written
>> parsers I've seen, javac being one of them, also refer to dotted productions in the comments of the corresponding methods.
>>
>>>
>>>
>>>
>>> Peter spoke about module names vs. package names.
>>>
>>> I think we agree, that module names cannot use "module words",
>>> whereas package names should be expected to contain them.
>>
>> Yes, that's the main issue: package names may contain unqualified names like 'transitive', 'with' or 'to'.
>> But I think people will also want to use an existing package, or more exactly a prefix of an existing package, as a module name,
>> so we should also support having a restricted keyword as part of a module name.
>>
>> The grammar is:
>>
>>   open? module module_name {
>>     requires (transitive | static)* module_name;
>>     exports package_name;
>>     exports package_name to module_name1, module_name2;
>>     opens package_name;
>>     opens package_name to module_name1, module_name2;
>>     uses xxx;
>>     provides xxx with xxx, yyy;
>>   }
>>
>> If we just consider package names, only 'opens' and 'exports' are followed by a package name, and a package name can only be
>> followed by ';' or 'to'. So once 'opens' is parsed, you know that only an identifier can follow; if the next token is not an
>> identifier but one of the restricted keywords, it should be considered an identifier.
>>
>> As I said earlier, the scanner can see the restricted keyword as a keyword, and before feeding the token to the parser, you can
>> check the parser state to see whether the keyword has to be lowered to an identifier or not.
>>
>> For module names, there is the supplementary problem of transitive, because if a module name starts with transitive, you can
>> have a conflict. As I said earlier, instead of using the next token to know whether transitive is the keyword or part of the
>> module name, I think we should consider it a keyword; as the JLS says, a restricted keyword is activated when it can appear, so
>> "requires transitive" is not a valid directive.
>>
>>>
>>> Remi: "you should use reverse DNS naming for package so no problem :)"
>>>
>>> "to" is a "module word" and a TLD.
>>> I think we should be very careful in judging that an existing conflict
>>> is not a real problem. Better to clearly and rigorously avoid the
>>> conflict in the first place.
>>
>> 'to' as the first part of a package/module name and 'to' as in "exports ... to" cannot be present in the same dotted production,
>> because exports has to be followed by a package_name, so 'to' there means the start of a package name. And because a package
>> name cannot end with '.', you always know whether you are inside the production recognizing the package_name or outside it,
>> matching the 'to' of the exports directive.
>>
>>>
>>>
>>>
>>> Some additional notes from my side:
>>>
>>> In the escape-approach, it may be prudent to technically allow
>>> escaping even words that are identifiers in Java 9, but could become
>>> keywords in a future version. This ensures that modules which need
>>> more escaping in Java 9+X can still be parsed in Java 9.
>>
>> Yes, that's why I think that escaping is not the right mechanism here: we want to solve a very local problem, so we do not
>> need a global, grammar-wide way to solve it.
>>
>>>
>>>
>>> Current focus was on names of modules, packages and types.
>>> A complete solution must also give an answer for annotations on modules.
>>> Some possible solutions:
>>> a. Assume that annotations for modules are designed with modules in mind
>>>    and thus have to avoid any module words in their names.
>>> b. Support escaping also in annotations
>>> c. Refine the scope where "module words" are keywords, let it start only
>>>    when the word "module" or the group "open module" has been consumed.
>>>    This would make the words "module" and "open" special, as being
>>>    switch words, where we switch from one language to another.
>>>    (For this I previously coined the term "scoped keywords" [1])
>>
>> For annotations, again, because annotation names are qualified, you know when you see 'module' whether you are in the middle of
>> the annotation name or outside it.
>>
>>>
>>>
>>> I think we all agree that the conflicts we are solving here are rare
>>> corner cases. Most names do not contain module words. Still, from a
>>> conceptual and technical p.o.v. the solution must be bullet proof.
>>> But there's no need to be afraid of module declarations being spammed
>>> with dozens of '^' characters. Realistically, this will not happen.
>>>
>>
>> I agree, and I strongly believe that scoped keywords, local keywords or restricted keywords, i.e. whatever the name, keywords
>> that are keywords or identifiers depending on the parser state, are the general mechanism that solves our problem.
>>
>>> Stephan
>>>
>>> [1] http://www.objectteams.org/def/1.3/sA.html#sA.0.1
>>>
>>
>> Rémi
>>
>