Java Platform Module System

Wed May 3 22:06:56 UTC 2017

----- Original Message -----
> From: "Stephan Herrmann" <stephan.herrmann at berlin.de>
> To: jigsaw-dev at openjdk.java.net, "Remi Forax" <forax at univ-mlv.fr>, "Alex Buckley" <alex.buckley at oracle.com>
> Cc: "Brian Goetz" <Brian.Goetz at oracle.com>, "Dan Smith" <daniel.smith at oracle.com>
> Sent: Wednesday, May 3, 2017 23:31:14
> Subject: Re: Java Platform Module System

> On 03.05.2017 20:55, Remi Forax wrote:
> > It's context-free because a context-free grammar defines its input in terms of
> > terminals, and the theory does not say how to map a token to a terminal.
> >
> > Jay is right that it requires either a specific parser generator like
> > Tatoo [1], which I wrote 10 years ago (because I wanted the tool to help me
> > extend a grammar easily), or modifying an existing parser generator so
> > the parser can send the production state to the lexer, which will enable/disable
> > the automata that recognize the associated keywords.
> 
> Just feeding parser state into the Lexer doesn't cut it for Java 9,
> because the classification keyword / identifier cannot be made at
> the time when the stream passes the Lexer.

No, the classification is done between the lexer and the parser.

> Let me remind you of this example:
>        module foo { exports transitive
> How should the poor lexer recognize in this situation that transitive
> is an identifier (sic) (if you complete the text accordingly)?

There is a simple solution: treat module, requires, etc. as keywords in the lexer and, when a keyword is sent to the parser, downgrade it to an identifier if you are not at the right dotted production.

It's easy to implement if your lexer/parser is non-blocking, i.e. if you push a byte buffer to the lexer and the lexer pushes terminals to the parser.
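
A minimal sketch of the downgrade step, under several assumptions: all names are hypothetical, tokens are whitespace-separated, and a hand-written state machine over a small subset of module declarations stands in for the dotted productions of a real LR parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a hand-written state machine stands in for the
// dotted productions of a generated LR parser.
class KeywordDowngradeSketch {
    enum Kind { KEYWORD, IDENTIFIER, PUNCT }

    // Parser positions for the subset:
    //   module Name { (requires [transitive] Name ; | exports Name ;)* }
    enum State { START, MODULE_NAME, OPEN_BRACE, BODY, AFTER_REQUIRES, NAME, SEMI, DONE }

    // The lexer is context-blind: these words always come out as keywords.
    static final Set<String> RESTRICTED =
            Set.of("module", "requires", "exports", "transitive");

    static Kind lex(String word) {
        if (RESTRICTED.contains(word)) return Kind.KEYWORD;
        return Character.isJavaIdentifierStart(word.charAt(0)) ? Kind.IDENTIFIER : Kind.PUNCT;
    }

    // Keywords the parser can accept at each position.
    static Set<String> expectedKeywords(State state) {
        switch (state) {
            case START:          return Set.of("module");
            case BODY:           return Set.of("requires", "exports");
            case AFTER_REQUIRES: return Set.of("transitive");
            default:             return Set.of();
        }
    }

    // Transition on the terminal after any downgrade has been applied.
    static State next(State state, Kind kind, String word) {
        switch (state) {
            case START:
                if (kind == Kind.KEYWORD) return State.MODULE_NAME;
                break;
            case MODULE_NAME:
                if (kind == Kind.IDENTIFIER) return State.OPEN_BRACE;
                break;
            case OPEN_BRACE:
                if (word.equals("{")) return State.BODY;
                break;
            case BODY:
                if (word.equals("}")) return State.DONE;
                if (kind == Kind.KEYWORD) {
                    return word.equals("requires") ? State.AFTER_REQUIRES : State.NAME;
                }
                break;
            case AFTER_REQUIRES:
                if (kind == Kind.KEYWORD) return State.NAME;     // 'transitive' modifier
                if (kind == Kind.IDENTIFIER) return State.SEMI;  // module name
                break;
            case NAME:
                if (kind == Kind.IDENTIFIER) return State.SEMI;
                break;
            case SEMI:
                if (word.equals(";")) return State.BODY;
                break;
            default:
                break;
        }
        throw new IllegalStateException("unexpected '" + word + "' in state " + state);
    }

    // Feeds tokens to the parser one at a time, downgrading a keyword to an
    // identifier whenever the current position does not expect that keyword.
    static String classify(String source) {
        State state = State.START;
        List<String> out = new ArrayList<>();
        for (String word : source.trim().split("\\s+")) {
            Kind kind = lex(word);
            if (kind == Kind.KEYWORD && !expectedKeywords(state).contains(word)) {
                kind = Kind.IDENTIFIER; // downgrade
            }
            out.add(word + "/" + kind);
            state = next(state, kind, word);
        }
        return String.join(" ", out);
    }
}
```

With this sketch, `classify("module foo { exports transitive ; }")` reports `transitive/IDENTIFIER`, while in `module foo { requires transitive bar ; }` the same word stays a keyword. The sketch does not attempt the case `requires transitive ;`, which would need extra lookahead.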

The other solution, used by Tatoo, is to replace the one giant automaton that recognizes all tokens with a list of automata, each recognizing one token, and to activate them or not depending on the parser state.
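
A rough sketch of the per-token idea, not Tatoo's actual API: here `java.util.regex.Pattern` stands in for each automaton, and the rule names and the `recognize` helper are hypothetical. The caller passes the set of rules the parser state currently allows; only those are tried.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: one recognizer per terminal instead of one merged automaton.
class PerTokenAutomataSketch {
    // Insertion order breaks ties, so keyword rules come before the identifier rule.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("requires", Pattern.compile("requires\\b"));
        RULES.put("transitive", Pattern.compile("transitive\\b"));
        RULES.put("identifier", Pattern.compile("[A-Za-z_][A-Za-z0-9_]*"));
    }

    // Returns the name of the terminal recognized at offset pos, trying only
    // the rules in 'active'. The longest match wins; null means no rule matched.
    static String recognize(String input, int pos, Set<String> active) {
        String best = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            if (!active.contains(rule.getKey())) {
                continue; // this automaton is disabled in the current parser state
            }
            Matcher m = rule.getValue().matcher(input);
            m.region(pos, input.length());
            if (m.lookingAt() && m.end() - pos > bestLength) {
                best = rule.getKey();
                bestLength = m.end() - pos;
            }
        }
        return best;
    }
}
```

For the input `exports transitive;`, calling `recognize(input, 8, Set.of("identifier"))` yields `identifier`, because the `transitive` recognizer is simply not active after `exports`; with both rules active the keyword rule wins the tie.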

> Aside from specific heuristics
> (which are not available to any parser generator),

any but one :)

> we only know about this classification after the parser has matched
> an entire declaration.

So I suppose your parser is LR: you know the classification from the dotted production just before the terminal is about to be recognized.
When you construct the LR table, you know for each dotted production which terminals can appear, so the parser generator can keep this information in a side table and, during parsing, use the parser state to find which terminals can be recognized.
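
To illustrate the side table, here is a small sketch. The action-table fragment is hand-written and its state numbers are made up for illustration, not generated from the JLS grammar; `sideTable` and `classify` are hypothetical helpers.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: derive "which terminals can appear in this state"
// from an LR action table, then use it to classify a word at parse time.
class ExpectedTerminalsSketch {
    // Hand-written fragment of an action table: state -> terminal -> action.
    static final Map<Integer, Map<String, String>> ACTIONS = Map.of(
            0, Map.of("module", "shift 1"),
            3, Map.of("requires", "shift 4", "exports", "shift 5", "}", "shift 6"),
            4, Map.of("transitive", "shift 7", "identifier", "shift 8"),
            5, Map.of("identifier", "shift 8"));

    // What the generator would emit: for each state, the terminals that
    // lead to a shift or a reduce.
    static Map<Integer, Set<String>> sideTable(Map<Integer, Map<String, String>> actions) {
        Map<Integer, Set<String>> table = new TreeMap<>();
        actions.forEach((state, row) -> table.put(state, new TreeSet<>(row.keySet())));
        return table;
    }

    // A word is a keyword terminal only if the current state expects it;
    // otherwise it is treated as an identifier.
    static String classify(String word, int state, Map<Integer, Set<String>> table) {
        Set<String> expected = table.getOrDefault(state, Set.of());
        return expected.contains(word) ? word : "identifier";
    }
}
```

In this fragment, state 5 (just after `exports`) expects only `identifier`, so `classify("transitive", 5, table)` gives `identifier`, whereas state 4 (just after `requires`) lists `transitive` and the word is kept as a keyword.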

> I'm not even sure that theory has a name for this kind of grammar.
> Maybe we should speak of a constraint solver rather than a parser.

No need for a constraint solver here; you only need to export, for every LR state, the terminals that will lead to a shift or a reduce.

> 
> Stephan
> 

Rémi

>>
>> Rémi
>>
>> [1] http://dl.acm.org/citation.cfm?id=1168057
>>
>> ----- Original Message -----
>>> From: "Alex Buckley" <alex.buckley at oracle.com>
>>> To: "Jayaprakash Arthanareeswaran" <jarthana at in.ibm.com>, "Dan Smith"
>>> <daniel.smith at oracle.com>, "Brian Goetz"
>>> <Brian.Goetz at oracle.com>
>>> Cc: jigsaw-dev at openjdk.java.net
>>> Sent: Wednesday, May 3, 2017 19:46:54
>>> Subject: Re: Java Platform Module System
>>
>>> On 5/2/2017 3:39 PM, Alex Buckley wrote:
>>>> On 5/2/2017 7:07 AM, Jayaprakash Arthanareeswaran wrote:
>>>>> Chapter 2 in [1] describes context-free grammars. The addition to "3.9
>>>>> Keywords" defines "restricted keywords", which prevent the grammar for
>>>>> ModuleDeclaration from being context-free. This prevents compilers from
>>>>> using common parser generators, since those typically only support
>>>>> context-free grammars. The lexical/syntactic grammar split defined in
>>>>> chapter 2 is not of much use for actual implementations of
>>>>> module-info.java parsers.
>>>>> The spec at least needs to point out that the given grammar for
>>>>> ModuleDeclaration is not actually context-free.
>>>>
>>>> The syntactic grammar in JLS8 was not context-free either; the opening
>>>> line of Chapter 2 has been false for years. For JLS9, I will remove the
>>>> claim that the lexical and syntactic grammars are context-free, and
>>>> perhaps a future JLS can discuss the difficulties in parsing the
>>>
>>> Jan Lahoda pointed out privately that the syntactic grammar in JLS8 and
>>> JLS9 is in fact context-free -- it's just not LL(1). What I should have
>>> said is that the grammar hasn't been LL(1) for a long time.
>>>
>>> Alex