REPL code review -- CompletenessAnalyzer

Tue Sep 22 05:46:15 UTC 2015

On 09/11/15 08:25, Maurizio Cimadamore wrote:
> * CompletenessAnalyzer
>
>    - Location position constants appear more complex than necessary 
> (or I'm missing something); it appears that constants like XEXPR1o, 
> XDECL1o, XSTMT1o are never used? The only uages I see are two:
>
>    - to build other constants (such as XEXPR1, XDECL1 and XSTMT1)
>
>    - in a switch inside parseUnit

Exactly, they define the bits and the semantics of those bits. They are 
applied as a set of bits.

The comments were broken, I'm updating as such --

     // Location position kinds -- a token is ...
     private static final int XEXPR         = 0b1;                       
// OK in expression (not first)
     private static final int XDECL         = 0b10;                      
// OK in declaration (not first)
     private static final int XSTMT         = 0b100;                     
// OK in statement framework (not first)
     private static final int XEXPR1o       = 0b1000;                    
// OK first in expression
     private static final int XDECL1o       = 0b10000;                   
// OK first in declaration
     private static final int XSTMT1o       = 0b100000;                  
// OK first or only in statement framework
     private static final int XEXPR1        = XEXPR1o | XEXPR;           
// OK in expression (anywhere)
     private static final int XDECL1        = XDECL1o | XDECL;           
// OK in declaration (anywhere)
     private static final int XSTMT1        = XSTMT1o | XSTMT;           
// OK in statement framework (anywhere)
     private static final int XANY1         = XEXPR1o | XDECL1o | 
XSTMT1o;  // Mask: first in statement, declaration, or expression
     private static final int XTERM         = 0b100000000;              
// Can terminate (last before EOF)
     private static final int XERRO         = 0b1000000000;             
// Is an error

So, for example "boolean" is XEXPR | XDECL1, which is XEXPR | XDECL | 
XDECL1o -- meaning it can occur in an expression (in a cast), or at the 
beginning or later in a declaration.  If we encounter "boolean" at the 
beginning of a snippet, parseUnit() masks to just look at allowed in 
first position, in this case XDECL1o.  So, it knows it is the beginning 
of a declaration.

>
> Since no token is really using those constants explicitly, maybe they 
> could be removed in favor of their more general counterparts (this 
> also kind of means that i.e. XEXPR1 == XEXPR, etc.)

See above, it is being used.

>
>    - I believe in general the design of location codes + 'belongs' 
> field feels weird, and given this code is quite convoluted (because 
> the task is hard!) the code might benefit from some cleanup. My 
> suggestions:
>     (1) come up with a cleaner enum which contains: { EXPR, DECL, 
> STMT, AMBIGUOUS, ERR } - use this for classification purposes in 
> parseUnit and related predicates (i.e. what is this token like?)
>     (2) extract the 'isOkToTerminate' property onto a separate boolean 
> flag on the token. After all, this is an orthogonal property of all 
> tokens.

Does my explanation clarify?  Am I still missing a simplification?

An enum won't work as they are unions of possibilities.

>
>    - watch out for commented out code/dead code (here and elsewhere)
>
> Sometimes the completeness analysis doesn't behave as expected:
>
> -> new
> >> ;
> >> ;
> >> ;
> >> ;
> >> ;
> >> ;
> >> Foo();
> |  Error:
> |  <identifier> expected
> |  new
> |     ^
> |  Error:
> |  '(' or '[' expected
> |  ;
> |  ^
>
> In the above, I found it confusing that the ';' is not enough to tell 
> the analysis logic that I'm done with the statement - and the analysis 
> will keep waiting until it matches a '()'.
>

OK, "new" has special processing that has makes it error prone.  In the 
webrev below I remove that special processing at a small loss of error 
detection, and add a general check for bad sequencing.

    http://cr.openjdk.java.net/~rfield/CompletenessAnalyzer_v0/

This fixes "new ;" but does not fix "(;".  The lexical analysis does not 
have enough context to know this isn't the beginning of a 
for-statement.  Basically, everything between matching parenthesis, 
brackets, or braces is currently ignored -- effectively "(...)", 
"[...]", or "{...}".   A result of this is, for example, that "else" is 
returned from completion as "unknown" (erroneous) so it is immediately 
parsed and an error returned; Whereas "(else" is returned from 
completion as "incomplete" prompting for more input.

To fix this, the nesting analysis would need to be moved into the parser 
(where is typically belongs) which would be a fairly significant 
change.  I will poke at that and see how complex it is. Your thoughts on 
its importance?

Thanks,
Robert