REPL code review -- CompletenessAnalyzer

Tue Sep 22 17:29:12 UTC 2015

Back on the completeness analyzer flags - I guess I still think there's 
some redundancy - i.e.

you have:

private static final int XEXPR         = 0b1; // OK in expression
private static final int XEXPR1o       = 0b1000; // First in statement
private static final int XEXPR1        = XEXPR1o | XEXPR; // or later

But in reality you end up using either XEXPR or XEXPR1.

So, can we at least get rid of the 'XEXPR1o' flags (and similar) on the 
basis that when 'XEXPR1o' is used, you really mean 'XEXPR1' ?

Maurizio

On 22/09/15 18:10, Robert Field wrote:
>
> On 09/22/15 02:50, Maurizio Cimadamore wrote:
>>
>>
>> On 22/09/15 06:46, Robert Field wrote:
>>>
>>> On 09/11/15 08:25, Maurizio Cimadamore wrote:
>>>> * CompletenessAnalyzer
>>>>
>>>>    - Location position constants appear more complex than necessary 
>>>> (or I'm missing something); it appears that constants like XEXPR1o, 
>>>> XDECL1o, XSTMT1o are never used? The only uages I see are two:
>>>>
>>>>    - to build other constants (such as XEXPR1, XDECL1 and XSTMT1)
>>>>
>>>>    - in a switch inside parseUnit
>>>
>>> Exactly, they define the bits and the semantics of those bits. They 
>>> are applied as a set of bits.
>>>
>>> The comments were broken, I'm updating as such --
>>>
>>>     // Location position kinds -- a token is ...
>>>     private static final int XEXPR         = 
>>> 0b1;                       // OK in expression (not first)
>>>     private static final int XDECL         = 
>>> 0b10;                      // OK in declaration (not first)
>>>     private static final int XSTMT         = 
>>> 0b100;                     // OK in statement framework (not first)
>>>     private static final int XEXPR1o       = 
>>> 0b1000;                    // OK first in expression
>>>     private static final int XDECL1o       = 
>>> 0b10000;                   // OK first in declaration
>>>     private static final int XSTMT1o       = 
>>> 0b100000;                  // OK first or only in statement framework
>>>     private static final int XEXPR1        = XEXPR1o | 
>>> XEXPR;           // OK in expression (anywhere)
>>>     private static final int XDECL1        = XDECL1o | 
>>> XDECL;           // OK in declaration (anywhere)
>>>     private static final int XSTMT1        = XSTMT1o | 
>>> XSTMT;           // OK in statement framework (anywhere)
>>>     private static final int XANY1         = XEXPR1o | XDECL1o | 
>>> XSTMT1o;  // Mask: first in statement, declaration, or expression
>>>     private static final int XTERM         = 
>>> 0b100000000;              // Can terminate (last before EOF)
>>>     private static final int XERRO         = 
>>> 0b1000000000;             // Is an error
>>>
>>> So, for example "boolean" is XEXPR | XDECL1, which is XEXPR | XDECL 
>>> | XDECL1o -- meaning it can occur in an expression (in a cast), or 
>>> at the beginning or later in a declaration.  If we encounter 
>>> "boolean" at the beginning of a snippet, parseUnit() masks to just 
>>> look at allowed in first position, in this case XDECL1o.  So, it 
>>> knows it is the beginning of a declaration.
>>>
>>>>
>>>> Since no token is really using those constants explicitly, maybe 
>>>> they could be removed in favor of their more general counterparts 
>>>> (this also kind of means that i.e. XEXPR1 == XEXPR, etc.)
>>>
>>> See above, it is being used.
>> I see now - what I was missing is that 'is at the beginning?' 
>> property is not a property of the token as a whole - but it is a 
>> context-dependent property; i.e. a boolean can either be at the 
>> beginning or not depending on whether it is parsed as a statement or 
>> as an expression - hence the need for multiple flags (where in my 
>> mind only one would have been sufficient). Thanks for taking the time 
>> to explain - the example was useful!
>>>
>>>>
>>>>    - I believe in general the design of location codes + 'belongs' 
>>>> field feels weird, and given this code is quite convoluted (because 
>>>> the task is hard!) the code might benefit from some cleanup. My 
>>>> suggestions:
>>>>     (1) come up with a cleaner enum which contains: { EXPR, DECL, 
>>>> STMT, AMBIGUOUS, ERR } - use this for classification purposes in 
>>>> parseUnit and related predicates (i.e. what is this token like?)
>>>>     (2) extract the 'isOkToTerminate' property onto a separate 
>>>> boolean flag on the token. After all, this is an orthogonal 
>>>> property of all tokens.
>>>
>>> Does my explanation clarify?  Am I still missing a simplification?
>>>
>>> An enum won't work as they are unions of possibilities.
>> Yes, your explanation helps - my aim was to separate between three 
>> orthogonal properties:
>>
>> * what is the kind of this token?
>> * is this token a beginning?
>> * is this token a valid termination?
>>
>> So that the last two could be moved onto separate boolean flags. But, 
>> as you point out, you need a richer set of flags to express things 
>> like 'boolean'.
>>
>> As for enums and union, EnumSet is still your friend, if needs be.
>>>
>>>>
>>>>    - watch out for commented out code/dead code (here and elsewhere)
>>>>
>>>> Sometimes the completeness analysis doesn't behave as expected:
>>>>
>>>> -> new
>>>> >> ;
>>>> >> ;
>>>> >> ;
>>>> >> ;
>>>> >> ;
>>>> >> ;
>>>> >> Foo();
>>>> |  Error:
>>>> |  <identifier> expected
>>>> |  new
>>>> |     ^
>>>> |  Error:
>>>> |  '(' or '[' expected
>>>> |  ;
>>>> |  ^
>>>>
>>>> In the above, I found it confusing that the ';' is not enough to 
>>>> tell the analysis logic that I'm done with the statement - and the 
>>>> analysis will keep waiting until it matches a '()'.
>>>>
>>>
>>> OK, "new" has special processing that has makes it error prone.  In 
>>> the webrev below I remove that special processing at a small loss of 
>>> error detection, and add a general check for bad sequencing.
>>>
>>> http://cr.openjdk.java.net/~rfield/CompletenessAnalyzer_v0/
>>>
>>> This fixes "new ;" but does not fix "(;".  The lexical analysis does 
>>> not have enough context to know this isn't the beginning of a 
>>> for-statement.  Basically, everything between matching parenthesis, 
>>> brackets, or braces is currently ignored -- effectively "(...)", 
>>> "[...]", or "{...}".   A result of this is, for example, that "else" 
>>> is returned from completion as "unknown" (erroneous) so it is 
>>> immediately parsed and an error returned; Whereas "(else" is 
>>> returned from completion as "incomplete" prompting for more input.
>>>
>>> To fix this, the nesting analysis would need to be moved into the 
>>> parser (where is typically belongs) which would be a fairly 
>>> significant change.  I will poke at that and see how complex it is. 
>>> Your thoughts on its importance?
>> I don't think it's very important - the confusing part is that when 
>> you make a mistake (as I did in the case above) it is hard to realize 
>> what the REPL is expecting from you in order to mark the statement as 
>> 'complete'.
>> Would some hint help here - i.e. "I'm waiting for (...)"
>>
>> Also, while you might not be able to discard "(;" immediately, we 
>> should at least try to discard "(;;;", as that's not compatible with 
>> a for anyway.
>
> That too would require context that the matcher's previous-token 
> context does not have.  Pondered parser implementation last night, 
> while I should have been sleeping, giving it a try now....
>
> Thanks,
> Robert
>
>>
>> Maurizio
>>>
>>> Thanks,
>>> Robert
>>>
>>
>