REPL code review -- CompletenessAnalyzer
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Tue Sep 22 09:50:21 UTC 2015
On 22/09/15 06:46, Robert Field wrote:
>
> On 09/11/15 08:25, Maurizio Cimadamore wrote:
>> * CompletenessAnalyzer
>>
>> - Location position constants appear more complex than necessary
>> (or I'm missing something); it appears that constants like XEXPR1o,
>> XDECL1o, XSTMT1o are never used? The only uages I see are two:
>>
>> - to build other constants (such as XEXPR1, XDECL1 and XSTMT1)
>>
>> - in a switch inside parseUnit
>
> Exactly, they define the bits and the semantics of those bits. They
> are applied as a set of bits.
>
> The comments were broken, I'm updating as such --
>
> // Location position kinds -- a token is ...
> private static final int XEXPR =
> 0b1; // OK in expression (not first)
> private static final int XDECL =
> 0b10; // OK in declaration (not first)
> private static final int XSTMT =
> 0b100; // OK in statement framework (not first)
> private static final int XEXPR1o =
> 0b1000; // OK first in expression
> private static final int XDECL1o =
> 0b10000; // OK first in declaration
> private static final int XSTMT1o =
> 0b100000; // OK first or only in statement framework
> private static final int XEXPR1 = XEXPR1o |
> XEXPR; // OK in expression (anywhere)
> private static final int XDECL1 = XDECL1o |
> XDECL; // OK in declaration (anywhere)
> private static final int XSTMT1 = XSTMT1o |
> XSTMT; // OK in statement framework (anywhere)
> private static final int XANY1 = XEXPR1o | XDECL1o |
> XSTMT1o; // Mask: first in statement, declaration, or expression
> private static final int XTERM = 0b100000000;
> // Can terminate (last before EOF)
> private static final int XERRO = 0b1000000000;
> // Is an error
>
> So, for example "boolean" is XEXPR | XDECL1, which is XEXPR | XDECL |
> XDECL1o -- meaning it can occur in an expression (in a cast), or at
> the beginning or later in a declaration. If we encounter "boolean" at
> the beginning of a snippet, parseUnit() masks to just look at allowed
> in first position, in this case XDECL1o. So, it knows it is the
> beginning of a declaration.
>
>>
>> Since no token is really using those constants explicitly, maybe they
>> could be removed in favor of their more general counterparts (this
>> also kind of means that i.e. XEXPR1 == XEXPR, etc.)
>
> See above, it is being used.
I see now - what I was missing is that 'is at the beginning?' property
is not a property of the token as a whole - but it is a
context-dependent property; i.e. a boolean can either be at the
beginning or not depending on whether it is parsed as a statement or as
an expression - hence the need for multiple flags (where in my mind only
one would have been sufficient). Thanks for taking the time to explain -
the example was useful!
>
>>
>> - I believe in general the design of location codes + 'belongs'
>> field feels weird, and given this code is quite convoluted (because
>> the task is hard!) the code might benefit from some cleanup. My
>> suggestions:
>> (1) come up with a cleaner enum which contains: { EXPR, DECL,
>> STMT, AMBIGUOUS, ERR } - use this for classification purposes in
>> parseUnit and related predicates (i.e. what is this token like?)
>> (2) extract the 'isOkToTerminate' property onto a separate
>> boolean flag on the token. After all, this is an orthogonal property
>> of all tokens.
>
> Does my explanation clarify? Am I still missing a simplification?
>
> An enum won't work as they are unions of possibilities.
Yes, your explanation helps - my aim was to separate between three
orthogonal properties:
* what is the kind of this token?
* is this token a beginning?
* is this token a valid termination?
So that the last two could be moved onto separate boolean flags. But, as
you point out, you need a richer set of flags to express things like
'boolean'.
As for enums and union, EnumSet is still your friend, if needs be.
>
>>
>> - watch out for commented out code/dead code (here and elsewhere)
>>
>> Sometimes the completeness analysis doesn't behave as expected:
>>
>> -> new
>> >> ;
>> >> ;
>> >> ;
>> >> ;
>> >> ;
>> >> ;
>> >> Foo();
>> | Error:
>> | <identifier> expected
>> | new
>> | ^
>> | Error:
>> | '(' or '[' expected
>> | ;
>> | ^
>>
>> In the above, I found it confusing that the ';' is not enough to tell
>> the analysis logic that I'm done with the statement - and the
>> analysis will keep waiting until it matches a '()'.
>>
>
> OK, "new" has special processing that has makes it error prone. In the
> webrev below I remove that special processing at a small loss of error
> detection, and add a general check for bad sequencing.
>
> http://cr.openjdk.java.net/~rfield/CompletenessAnalyzer_v0/
>
> This fixes "new ;" but does not fix "(;". The lexical analysis does
> not have enough context to know this isn't the beginning of a
> for-statement. Basically, everything between matching parenthesis,
> brackets, or braces is currently ignored -- effectively "(...)",
> "[...]", or "{...}". A result of this is, for example, that "else"
> is returned from completion as "unknown" (erroneous) so it is
> immediately parsed and an error returned; Whereas "(else" is returned
> from completion as "incomplete" prompting for more input.
>
> To fix this, the nesting analysis would need to be moved into the
> parser (where is typically belongs) which would be a fairly
> significant change. I will poke at that and see how complex it is.
> Your thoughts on its importance?
I don't think it's very important - the confusing part is that when you
make a mistake (as I did in the case above) it is hard to realize what
the REPL is expecting from you in order to mark the statement as 'complete'.
Would some hint help here - i.e. "I'm waiting for (...)"
Also, while you might not be able to discard "(;" immediately, we should
at least try to discard "(;;;", as that's not compatible with a for anyway.
Maurizio
>
> Thanks,
> Robert
>
More information about the kulla-dev
mailing list