REPL code review

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Fri Sep 11 15:25:36 UTC 2015


Hi,
this is a review for REPL - I focused on the javac/language-related 
parts and I think I came up with some interesting examples. I tried to 
cover as many areas as possible and, as a result, the individual 
comments are not as deep as they could be. I can dive more into specific 
sections if required. Overall the feeling I get from this is very good. 
The overall machinery seems to be cleanly defined - I like the fact that 
you use JDI to completely hide the REPL frontend from how user-defined 
classes are loaded/redefined. I did run into some bugs - some of them 
minor, some of them more worrisome - I hope this is the kind of feedback 
you were looking for.

One general high-level comment, which I also pointed out elsewhere, is 
that I'm not sure jshell really belongs in langtools; while it's 
semantically (obviously) related to langtools, it is a rather different 
beast w.r.t. all other tools in langtools-land; the fact that it depends 
on the JDK (for jline, and for JDI in general) makes it very hard to run 
it on top of a stock JDK by bootstrapping classes - a technique widely 
used to run the langtools tools without having to do a full build cycle. 
More specifically, talking about IDE integration, I don't see how the 
IntelliJ/Netbeans langtools projects (and the langtools internal 
developer ant build) could be updated to be able to run/debug jshell 
without a full build.

Below are the more detailed review comments (organized by class):

* TreeDependencyScanner

     - not sure why Object is being used as a return type for the 
visitor methods; you could probably just use Void and then use scan() 
instead of scanAndReduce() [given there's nothing to reduce]?

    - I'm not sure you need the scanDeclaration method - it is only 
used in Scan.processVariable - which runs it on a variable's type - but 
variable types are already scanned (as references, with the right 
'decl' set) by TDS?
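For what it's worth, a Void-returning scanner along those lines could look like this (a minimal sketch on the public com.sun.source API - the DepScan class and the toy snippet are made up for illustration, not taken from the actual TreeDependencyScanner):

```java
import com.sun.source.tree.IdentifierTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class DepScan {
    // A Void-based scanner: visitor methods just call scan() and return null,
    // so there is no reduce step at all.
    static class IdentScanner extends TreeScanner<Void, List<String>> {
        @Override
        public Void visitIdentifier(IdentifierTree node, List<String> deps) {
            deps.add(node.getName().toString());
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        String source = "class C { int x = y; }";
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///C.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(file));
        List<String> deps = new ArrayList<>();
        new IdentScanner().scan(task.parse(), deps);
        System.out.println(deps);  // the only free identifier in the snippet is 'y'
    }
}
```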

* Eval

     - just curious - you seem to do a lot of String-based analysis (see 
example in computeQualifiedParameterTypes()) - why is it not possible to 
rely on the more robust Element API to do that? I.e. couldn't a snippet 
corresponding to a declaration have an Element? Or, even more generally, 
why not just create real AST nodes for wraps?

    - UnresolvedExtractor - it would probably help to use 
DiagnosticFlag.RECOVERABLE instead of hard-coded strings to filter out 
resolution errors (I also noted this somewhere below). However, I also 
note that the logic for extracting the unresolved symbol is inherently 
fragile and dependent on the shape of the javac diagnostics. I wonder 
if looking at the analyzed tree might not provide a more robust 
solution here.

     - the Range logic is clever - but, again, it seems like a big 
workaround for the fact that wrapping is defined in terms of strings - 
not trees; any reason you went down this path?

     - corralled - I found the Snippet hierarchy underused when it comes 
to corralling; the task of corralling a snippet is basically performed 
by Eval through some helper routines - couldn't we have each snippet 
define how corralling should be done? You go half-way there by having a 
corral() method in the Snippet base class, but it doesn't do much, other 
than retrieving whatever value has been set from outside.

    - the corralling logic doesn't seem to work on supertypes using FQN:

-> class A extends B.Inner { static class Inner { } }
|  Error:
|  package B does not exist
|  class A extends B.Inner { static class Inner { } }
|                  ^-----^

I would have expected the snippet to wait for B.

     - the updates of corralled snippets seem to happen at the wrong 
time - this could lead to a chain of symbols that are not usable:

-> class D extends E { }
|  Added class D, however, it cannot be referenced until class E is declared

-> class E { D d; }
|  Added class E, however, it cannot be referenced until class D is declared

     I guess what happens here is that, since the decision on whether to 
corral or not is based on whether we can compile cleanly, we run into 
trouble here. The first snippet obviously cannot be compiled cleanly, so 
it's corralled; the second one cannot compile cleanly because the first 
one is not 'defined' - so you get another resolution error and another 
corralled snippet. Unfortunately, the two snippets taken together form a 
legal source. Btw, I think the current behavior is preventing stuff like 
this:

-> class A extends B { }
|  Added class A, however, it cannot be referenced until class B is declared

-> class B extends A { }
|  Added class B, however, it cannot be referenced until class A is declared

from looping forever in Eval.declare; perhaps the code could be made 
more robust by looking explicitly for update loops (e.g. by keeping a 
stack of all the updated snippets and checking whether we encounter the 
same snippet twice).
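A minimal sketch of the check I have in mind (UpdateLoopCheck and the string-based snippet ids are purely hypothetical stand-ins, not the actual Snippet API):

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class UpdateLoopCheck {
    // Snippets currently on the update stack, identified by some key.
    private final Set<String> inProgress = new LinkedHashSet<>();

    /** Returns false if 'snippetId' is already being updated - i.e. a loop. */
    boolean push(String snippetId) {
        return inProgress.add(snippetId);  // add() returns false on a repeat
    }

    void pop(String snippetId) {
        inProgress.remove(snippetId);
    }

    public static void main(String[] args) {
        UpdateLoopCheck check = new UpdateLoopCheck();
        System.out.println(check.push("class A"));  // true: first visit
        System.out.println(check.push("class B"));  // true: first visit
        System.out.println(check.push("class A"));  // false: A seen twice -> loop
    }
}
```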

    - on update loops - I guess it is indeed possible to have loops, 
even with current corralling limitations (by exploiting redefinition):

-> class Outer { class Inner extends Foo { } }
|  Added class Outer, however, it cannot be referenced until class Foo 
is declared

-> class Foo { }
|  Added class Foo

-> class Foo extends Outer { }

An update-loop check is in order, I think. Also, interestingly, I'm not 
able to stop the REPL once I get into such a loop. Actually, I later 
found out it doesn't loop forever - I guess it eventually runs out of 
stack (after several minutes) and exits saying this:

|  Replaced class Foo
|    Update overwrote class Foo


Which seems bogus anyway - given there's cyclic inheritance (no error is 
reported!).

* ReplParser - in general it looks good. The question I have here is 
more design-oriented: why do we need it? I believe the main use case is 
that you need to parse the input source to be able to decide between 
expressions and declarations. Can't we wrap the snippet in the following 
two ways:

class Temp$Decl {
    <snippet here>
}

and:

class Temp$Expr {
    Object o = <snippet here>
}

and then see who wins? This would also require moving the wrapping 
logic from Eval to some earlier stage (when the input is deemed 
complete). It's probably a very big change, so I'm merely throwing this 
out as a suggestion for future improvements - it's outside the scope of 
the review.
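A rough sketch of the idea (using the public javax.tools/JavacTask API; the Wrap class name and the analyzesCleanly helper are made up for illustration - the real implementation would of course reuse the existing wrapping machinery):

```java
import com.sun.source.util.JavacTask;

import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.util.List;

public class SpeculativeWrap {
    // Parse and attribute one candidate wrapping; true if no errors reported.
    static boolean analyzesCleanly(String unit) throws Exception {
        JavaFileObject src = new SimpleJavaFileObject(
                URI.create("string:///Wrap.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return unit;
            }
        };
        DiagnosticCollector<JavaFileObject> diags = new DiagnosticCollector<>();
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, diags, null, null, List.of(src));
        task.parse();
        task.analyze();
        return diags.getDiagnostics().stream()
                .noneMatch(d -> d.getKind() == Diagnostic.Kind.ERROR);
    }

    public static void main(String[] args) throws Exception {
        String snippet = "1 + 1";
        // Try both wrappings and see which one compiles.
        boolean asExpr = analyzesCleanly("class Wrap { Object o = " + snippet + "; }");
        boolean asDecl = analyzesCleanly("class Wrap { " + snippet + " }");
        System.out.println(asExpr ? (asDecl ? "ambiguous" : "expression") : "declaration");
    }
}
```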

* TypePrinter - so, this is needed when you need to print the type of 
some expression; it seems to be an exact copy of the printer used by the 
RichDiagnosticFormatter, except for the fact that you want to omit the 
captured type-variable details. So, maybe you could split this process 
in two: first you map the original type onto a type where all captured 
variables have been replaced by their bounds (i.e. from List<#CAP1> to 
List<Object>) - then you call the RichDiagnosticFormatter printer on it 
(the best way to do that would probably be to create a fake diagnostic 
with a type as its argument, and then feed that to the diagnostic 
formatter - that should give you back a string). The 
RichDiagnosticFormatter, by default, will also generate stuff that you 
probably don't want, such as 'where' clauses to disambiguate type 
variables with the same name but different owners - but such features 
can be selectively disabled (via javac command line options).

* MemoryFileManager

    - Maybe the code of 'list' could be made lazier by returning a 
compound iterator - i.e. an iterator that keeps scanning the items in 
the first iterator and, when it is finished, moves on to the next 
iterator.
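Something along these lines (a self-contained sketch, not the actual MemoryFileManager code):

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class CompoundIterator<T> implements Iterator<T> {
    private final Iterator<? extends Iterable<T>> outer;
    private Iterator<T> inner = Collections.emptyIterator();

    public CompoundIterator(Iterable<? extends Iterable<T>> sources) {
        this.outer = sources.iterator();
    }

    @Override
    public boolean hasNext() {
        // Advance lazily: move to the next non-empty source only on demand.
        while (!inner.hasNext() && outer.hasNext()) {
            inner = outer.next().iterator();
        }
        return inner.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return inner.next();
    }

    public static void main(String[] args) {
        Iterator<String> it = new CompoundIterator<>(List.<List<String>>of(
                List.of("a", "b"),
                List.of(),          // empty sources are skipped transparently
                List.of("c")));
        StringBuilder sb = new StringBuilder();
        while (it.hasNext()) sb.append(it.next());
        System.out.println(sb);  // abc
    }
}
```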

    - comments: is it normal to repeat all comments from the 
JavaFileManager interface?

    - watch out for unused methods: dumpClasses, findGeneratedClass, 
findGeneratedBytes, inferModuleName, listModueLocations (maybe this 
stuff is for Jigsaw? If so, maybe add a comment)

* CompletenessAnalyzer

    - Location position constants appear more complex than necessary (or 
I'm missing something); it appears that constants like XEXPR1o, XDECL1o, 
XSTMT1o are never used? The only usages I see are these two:

    - to build other constants (such as XEXPR1, XDECL1 and XSTMT1)

    - in a switch inside parseUnit

Since no token is really using those constants explicitly, maybe they 
could be removed in favor of their more general counterparts (which 
effectively means that XEXPR1 == XEXPR, etc.)

    - In general, the design of location codes plus the 'belongs' field 
feels weird to me, and given that this code is quite convoluted (because 
the task is hard!) it might benefit from some cleanup. My suggestions:
     (1) come up with a cleaner enum containing { EXPR, DECL, STMT, 
AMBIGUOUS, ERR } - use this for classification purposes in parseUnit and 
related predicates (i.e. what is this token like?)
     (2) extract the 'isOkToTerminate' property onto a separate boolean 
flag on the token. After all, this is an orthogonal property of all tokens.
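A sketch of the shape I have in mind (TokenInfo is a hypothetical stand-in for the analyzer's token class, not the existing CompletenessAnalyzer code):

```java
public class TokenKinds {
    // (1) one axis: what kind of unit can this token belong to?
    enum Kind { EXPR, DECL, STMT, AMBIGUOUS, ERR }

    // (2) the other axis: may the input terminate after this token?
    // Modeled as an orthogonal boolean instead of being folded into
    // combined location-code constants.
    record TokenInfo(Kind kind, boolean okToTerminate) { }

    public static void main(String[] args) {
        TokenInfo semi = new TokenInfo(Kind.STMT, true);    // ';' can end a unit
        TokenInfo newKw = new TokenInfo(Kind.EXPR, false);  // 'new' cannot
        System.out.println(semi.kind() + " " + semi.okToTerminate());
        System.out.println(newKw.kind() + " " + newKw.okToTerminate());
    }
}
```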

    - watch out for commented out code/dead code (here and elsewhere)

Sometimes the completeness analysis doesn't behave as expected:

-> new
 >> ;
 >> ;
 >> ;
 >> ;
 >> ;
 >> ;
 >> Foo();
|  Error:
|  <identifier> expected
|  new
|     ^
|  Error:
|  '(' or '[' expected
|  ;
|  ^

In the above, I found it confusing that the ';' is not enough to tell 
the analysis logic that I'm done with the statement - and the analysis 
will keep waiting until it matches a '()'.

* SourceCodeAnalysisImpl - generally looks good, and the tab completion 
logic is quite clever:

    - in scopeContent() what is the difference between the 
'scopeIterable' and a call to Scope.getLocalElements (which is routed 
via Scope.getSymbols)?

    - the logic for filtering based on target type is cool - but could 
use some cleanup - for instance there are two smart filters, and it's 
not clear why both are needed; it seems like smartTypeFilter is only 
used in imports, and the main difference is that it is allowed to 
operate on type elements, while smartFilter can't; but the typing 
logic is the same (except for isAssignable vs. isSubtype - which 
shouldn't matter in the smartTypeFilter case anyway). I suggest just 
using smartFilter (without the class filtering logic), and then adding 
the proper extra filtering logic (i.e. must be a class / must be a 
non-class) as a separate 'kind' filter.
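To illustrate the split (Candidate is a hypothetical stand-in for a completion candidate - the real code would of course operate on Elements):

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FilterSplit {
    // Hypothetical completion candidate: name, kind, and whether its type fits.
    record Candidate(String name, boolean isClass, boolean typeMatches) { }

    public static void main(String[] args) {
        // One shared type-based filter...
        Predicate<Candidate> smartFilter = Candidate::typeMatches;
        // ...composed with a separate, orthogonal 'kind' filter per context.
        Predicate<Candidate> importContext = smartFilter.and(Candidate::isClass);
        Predicate<Candidate> exprContext = smartFilter.and(c -> !c.isClass());

        List<Candidate> scope = List.of(
                new Candidate("String", true, true),
                new Candidate("s", false, true),
                new Candidate("x", false, false));

        System.out.println(scope.stream().filter(importContext)
                .map(Candidate::name).collect(Collectors.toList()));
        System.out.println(scope.stream().filter(exprContext)
                .map(Candidate::name).collect(Collectors.toList()));
    }
}
```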

    - found a bug with varargs (varargs expansion does not take place):

-> class Foo { static void m(int... i) { } }
|  Replaced class Foo
|    Update overwrote class Foo
-> Foo.m(Display all 395 possibilities? (y or n)

    - found bugs around constructors; asMemberOf doesn't get called, and 
that leads to useless smart filters:

-> class Baz<X> {
 >> Baz(X x) { }
 >> }
|  Added class Baz

-> String s = "";
|  Added variable s of type String with initial value ""

-> new Baz(Display all 391 possibilities? (y or n)

-> new Baz<String>(Display all 391 possibilities? (y or n)

-> Baz<String> bz = new Baz<>(Display all 391 possibilities? (y or n)

The problem is that the type of the constructor is Baz(X) and no 
variable in scope can possibly have type X, so all elements in the scope 
are filtered out. The first two cases should be fixable by simply 
adding a call to asMemberOf() on the constructor element. The third case 
is more difficult - perhaps some heuristics could be attempted, but the 
general case is hard.
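For the record, this is the kind of asMemberOf call I mean (a self-contained sketch using the public javax.lang.model API against java.util.ArrayList, rather than the REPL's own snippet state):

```java
import com.sun.source.util.JavacTask;

import javax.lang.model.element.ExecutableElement;
import javax.lang.model.element.TypeElement;
import javax.lang.model.type.DeclaredType;
import javax.lang.model.type.ExecutableType;
import javax.lang.model.util.ElementFilter;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.util.List;

public class AsMemberOfDemo {
    public static void main(String[] args) throws Exception {
        JavaFileObject dummy = new SimpleJavaFileObject(
                URI.create("string:///Dummy.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return "class Dummy { }";
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(dummy));
        task.analyze();  // run attribution so Elements/Types are usable

        TypeElement arrayList = task.getElements().getTypeElement("java.util.ArrayList");
        TypeElement string = task.getElements().getTypeElement("java.lang.String");
        DeclaredType arrayListOfString =
                task.getTypes().getDeclaredType(arrayList, string.asType());

        for (ExecutableElement ctor :
                ElementFilter.constructorsIn(arrayList.getEnclosedElements())) {
            // asMemberOf substitutes E := String in the constructor's signature,
            // so ArrayList(Collection<? extends E>) becomes
            // (java.util.Collection<? extends java.lang.String>)void
            ExecutableType instantiated =
                    (ExecutableType) task.getTypes().asMemberOf(arrayListOfString, ctor);
            System.out.println(instantiated);
        }
    }
}
```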

     - smart completion doesn't always work as expected - in particular, 
target-type-based suggestions seem to occur only once:

-> fs.m(
ls                        <press tab to see more>

-> fs.m
|  Error:
|  cannot find symbol
|    symbol:   variable m
|  fs.m
|  ^--^

-> fs.m(Display all 389 possibilities? (y or n)

    After the first time, all members are displayed (unless you start 
working on a different instance...)

* TaskFactory:
     - I see that cntResolve counts the recoverable errors - javac does 
have a notion of recoverable errors that is used in annotation 
processing - all recoverable diagnostics are marked with a special flag 
(DiagnosticFlag.RECOVERABLE). I wonder if it would be more robust to 
rely on that, rather than using raw diagnostic text (as the latter might 
change over time).

    - I like the hierarchy of tasks - it is very easy to see which tasks 
are meant to operate on source code and which on synthetic wraps.


Maurizio



More information about the kulla-dev mailing list