REPL code review -- CompletenessAnalyzer

Tue Sep 22 09:50:21 UTC 2015

On 22/09/15 06:46, Robert Field wrote:
>
> On 09/11/15 08:25, Maurizio Cimadamore wrote:
>> * CompletenessAnalyzer
>>
>>    - Location position constants appear more complex than necessary 
>> (or I'm missing something); it appears that constants like XEXPR1o, 
>> XDECL1o, XSTMT1o are never used? The only uages I see are two:
>>
>>    - to build other constants (such as XEXPR1, XDECL1 and XSTMT1)
>>
>>    - in a switch inside parseUnit
>
> Exactly, they define the bits and the semantics of those bits. They 
> are applied as a set of bits.
>
> The comments were broken, I'm updating as such --
>
>     // Location position kinds -- a token is ...
>     private static final int XEXPR         = 
> 0b1;                       // OK in expression (not first)
>     private static final int XDECL         = 
> 0b10;                      // OK in declaration (not first)
>     private static final int XSTMT         = 
> 0b100;                     // OK in statement framework (not first)
>     private static final int XEXPR1o       = 
> 0b1000;                    // OK first in expression
>     private static final int XDECL1o       = 
> 0b10000;                   // OK first in declaration
>     private static final int XSTMT1o       = 
> 0b100000;                  // OK first or only in statement framework
>     private static final int XEXPR1        = XEXPR1o | 
> XEXPR;           // OK in expression (anywhere)
>     private static final int XDECL1        = XDECL1o | 
> XDECL;           // OK in declaration (anywhere)
>     private static final int XSTMT1        = XSTMT1o | 
> XSTMT;           // OK in statement framework (anywhere)
>     private static final int XANY1         = XEXPR1o | XDECL1o | 
> XSTMT1o;  // Mask: first in statement, declaration, or expression
>     private static final int XTERM         = 0b100000000;              
> // Can terminate (last before EOF)
>     private static final int XERRO         = 0b1000000000;             
> // Is an error
>
> So, for example "boolean" is XEXPR | XDECL1, which is XEXPR | XDECL | 
> XDECL1o -- meaning it can occur in an expression (in a cast), or at 
> the beginning or later in a declaration.  If we encounter "boolean" at 
> the beginning of a snippet, parseUnit() masks to just look at allowed 
> in first position, in this case XDECL1o.  So, it knows it is the 
> beginning of a declaration.
>
>>
>> Since no token is really using those constants explicitly, maybe they 
>> could be removed in favor of their more general counterparts (this 
>> also kind of means that i.e. XEXPR1 == XEXPR, etc.)
>
> See above, it is being used.
I see now - what I was missing is that 'is at the beginning?' property 
is not a property of the token as a whole - but it is a 
context-dependent property; i.e. a boolean can either be at the 
beginning or not depending on whether it is parsed as a statement or as 
an expression - hence the need for multiple flags (where in my mind only 
one would have been sufficient). Thanks for taking the time to explain - 
the example was useful!
>
>>
>>    - I believe in general the design of location codes + 'belongs' 
>> field feels weird, and given this code is quite convoluted (because 
>> the task is hard!) the code might benefit from some cleanup. My 
>> suggestions:
>>     (1) come up with a cleaner enum which contains: { EXPR, DECL, 
>> STMT, AMBIGUOUS, ERR } - use this for classification purposes in 
>> parseUnit and related predicates (i.e. what is this token like?)
>>     (2) extract the 'isOkToTerminate' property onto a separate 
>> boolean flag on the token. After all, this is an orthogonal property 
>> of all tokens.
>
> Does my explanation clarify?  Am I still missing a simplification?
>
> An enum won't work as they are unions of possibilities.
Yes, your explanation helps - my aim was to separate between three 
orthogonal properties:

* what is the kind of this token?
* is this token a beginning?
* is this token a valid termination?

So that the last two could be moved onto separate boolean flags. But, as 
you point out, you need a richer set of flags to express things like 
'boolean'.

As for enums and union, EnumSet is still your friend, if needs be.
>
>>
>>    - watch out for commented out code/dead code (here and elsewhere)
>>
>> Sometimes the completeness analysis doesn't behave as expected:
>>
>> -> new
>> >> ;
>> >> ;
>> >> ;
>> >> ;
>> >> ;
>> >> ;
>> >> Foo();
>> |  Error:
>> |  <identifier> expected
>> |  new
>> |     ^
>> |  Error:
>> |  '(' or '[' expected
>> |  ;
>> |  ^
>>
>> In the above, I found it confusing that the ';' is not enough to tell 
>> the analysis logic that I'm done with the statement - and the 
>> analysis will keep waiting until it matches a '()'.
>>
>
> OK, "new" has special processing that has makes it error prone. In the 
> webrev below I remove that special processing at a small loss of error 
> detection, and add a general check for bad sequencing.
>
>    http://cr.openjdk.java.net/~rfield/CompletenessAnalyzer_v0/
>
> This fixes "new ;" but does not fix "(;".  The lexical analysis does 
> not have enough context to know this isn't the beginning of a 
> for-statement.  Basically, everything between matching parenthesis, 
> brackets, or braces is currently ignored -- effectively "(...)", 
> "[...]", or "{...}".   A result of this is, for example, that "else" 
> is returned from completion as "unknown" (erroneous) so it is 
> immediately parsed and an error returned; Whereas "(else" is returned 
> from completion as "incomplete" prompting for more input.
>
> To fix this, the nesting analysis would need to be moved into the 
> parser (where is typically belongs) which would be a fairly 
> significant change.  I will poke at that and see how complex it is. 
> Your thoughts on its importance?
I don't think it's very important - the confusing part is that when you 
make a mistake (as I did in the case above) it is hard to realize what 
the REPL is expecting from you in order to mark the statement as 'complete'.
Would some hint help here - i.e. "I'm waiting for (...)"

Also, while you might not be able to discard "(;" immediately, we should 
at least try to discard "(;;;", as that's not compatible with a for anyway.

Maurizio
>
> Thanks,
> Robert
>