Incremental java compile AKA javac print compile dependencies

Jonathan Gibbons jonathan.gibbons at oracle.com
Wed May 26 20:49:16 PDT 2010


On 05/26/2010 08:38 PM, Joshua Maurice wrote:
> On Wed, May 26, 2010 at 6:56 PM, Jonathan Gibbons 
> <jonathan.gibbons at oracle.com> wrote:
>
>     On 05/25/2010 06:42 PM, Joshua Maurice wrote:
>>     On Tue, May 25, 2010 at 6:38 PM, Joshua Maurice
>>     <joshuamaurice at gmail.com> wrote:
>>
>>         On Tue, May 25, 2010 at 6:01 PM, Jonathan Gibbons
>>         <jonathan.gibbons at oracle.com> wrote:
>>
>>             On 05/25/2010 05:11 PM, Joshua Maurice wrote:
>>
>>
>>
>>                 What is relevant is that to get decent levels of
>>                 incremental, aka skipping unnecessary rebuilds, the
>>                 build system needs to know for each java X, the full
>>                 list of class files and java files which will be
>>                 directly used by javac when compiling X. Whenever any
>>                 of those direct compile dependencies have an
>>                 "interface" / "signature" change, X needs to be
>>                 recompiled.
>>
>>
>>             Stop right there.   There's likely a wrong assumption
>>             here, hidden in the word "directly".
>>
>>             If you start from scratch, with no classes precompiled,
>>             when you compile X, javac will pull in from the
>>             sourcepath the transitive closure of X and all its
>>             dependencies.  Thus if X refers to Y, and if the
>>             implementation of Y refers to Z, then javac will compile
>>             X and Y and Z, even though there is no direct reference in
>>             any way from X to Z.   This is why your proposed
>>             technique of tracking -verbose output will not work.
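The distinction drawn above, between what X refers to directly and everything javac may pull in from the sourcepath, can be illustrated with a small sketch. The dependency map below is hypothetical and hard-coded to match the X, Y, Z example; it does not query javac in any way.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ClosureDemo {
    // Hypothetical data: each class maps to the classes its source refers to directly.
    static final Map<String, Set<String>> DIRECT = Map.of(
        "X", Set.of("Y"),
        "Y", Set.of("Z"),
        "Z", Set.of());

    // Direct dependencies: only what the source of 'name' itself mentions.
    static Set<String> direct(String name) {
        return DIRECT.getOrDefault(name, Set.of());
    }

    // Transitive closure: everything reachable by following direct dependencies.
    static Set<String> transitive(String name) {
        Set<String> seen = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>(direct(name));
        while (!work.isEmpty()) {
            String next = work.pop();
            if (seen.add(next)) {
                work.addAll(direct(next));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("direct(X)     = " + direct("X"));
        System.out.println("transitive(X) = " + transitive("X"));
    }
}
```

Here Z appears in the closure of X although X never mentions Z, which is the case the reply above is concerned with.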
>>
>>
>>         What? For starters, I'm planning on specifically not using
>>         the -sourcepath option. Suppose a user touches X only, and
>>         nothing else depends on X, like your example, and I want to
>>         only recompile X.java. However, if I give the -sourcepath
>>         option, then as you note, javac will recompile X, Y, and Z,
>>         but Y and Z are useless recompiles.
>>
>>         Here are some examples to further explain what I'm planning:
>>
>>         Suppose X, Y, and Z are part of the same javac task. Touch
>>         Z.java. Do a build. The build system notes by rule 1 that
>>         Z.java is "out of date" (source file last modification time
>>         is newer than last compile time). It notes by rule 3b that
>>         Y.java is "out of date" (direct dependency java file in same
>>         javac task is "out of date"). It then notes by rule 3b that
>>         X.java is "out of date" (direct dependency java file in same
>>         javac task is "out of date").
>>
>>         Suppose X, Y, and Z are each part of different javac tasks,
>>         such as in different jars. Touch Z.java. Do a build. The
>>         build system notes by rule 1 that Z.java is out of date
>>         (source file last modification timestamp is newer than last
>>         compile time). It calls javac on Z.java. Z.class has the same
>>         "interface", so its last "interface change" time remains
>>         unchanged. The build system then finds no rule which makes Y
>>         or X out of date, so it does no further recompile.
>>
>>         Suppose X, Y, and Z are each part of different javac tasks,
>>         such as in different jars. Modify Z.java so its "signature" /
>>         "interface" has changed. Do a build. The build system notes
>>         by rule 1 that Z.java is out of date (source file last
>>         modification timestamp is newer than last compile time). It
>>         calls javac on Z.java. Z.class when compared to the old one
>>         has a different "interface", so its last "interface change"
>>         is set to now. The build system then finds Y.java to be "out
>>         of date" by rule 3a (a direct dependency class file has a
>>         newer "last interface change" time than the last compilation
>>         time). Depending on if this affects the "interface" of Y, X
>>         then might also be found to be "out of date", or it might be
>>         found to be "up to date".
>>
>>         Note that when they're part of the same javac task, I do
>>         cascade without termination downstream, to the extent of the
>>         javac task. I have made an educated \guess\ that this is a
>>         reasonably efficient way to get good parallelization, close
>>         to minimal rebuilds, and avoid a great deal of overhead of
>>         calling many separate javac invocations.
>>
>>             There is a difference between the set of files needed to
>>             compile X, and the set of files on which X has a direct
>>             dependency (meaning that if they change, X needs to be
>>             recompiled.)  To determine the set of files (or even
>>             better, the classes) on which X depends, you must either
>>             look at the classfile (which has the constant problem) or
>>             at the AST sometime after Attr.
>>
>>
>>         What? There is? No there isn't. There is no difference between:
>>         - the set A - the set of files needed to compile some java
>>         file X
>>         and
>>         - the set B - the set of files which X has a direct
>>         dependency - meaning that if they change, the java file X
>>         needs to be recompiled.
>>
>>         At least, perhaps a more intelligent / sophisticated build
>>         system could make such a distinction, but that is not my aim
>>         at the moment. I am being conservative at the moment, and if
>>         some class definition Y is required to compile X.java, then I
>>         find it quite reasonable that X.java's compilation might be
>>         different, or fail altogether, with a different class
>>         definition Y or an un-findable class definition Y.
>>
>>         What do you propose is the difference between sets A and B?
>>         An example would be enlightening. (Unless we're talking about
>>         Ghost Dependencies, names which might refer to a different
>>         type or member depending on what's on the classpath and in
>>         the java files in the compile, such as A.B "hiding" A.B,
>>         where one of them is a package A, class B, and the other is a
>>         class A, and an inner class B. I don't think you're talking
>>         about Ghost Dependencies though.)
>>
>>         PS: Hopefully we're not quibbling over the definition of
>>         "minimal rebuild". Yes, by a certain strict definition of
>>         minimal rebuild, where "equivalent to a full clean build" is
>>         defined as "the output class files display the same
>>         observable behavior over all 'allowed by documentation'
>>         inputs", then a minimal rebuild is equivalent to the Halting
>>         problem. However, if we define "equivalent to a full clean
>>         build" in terms of same binary contents of class files, then
>>         I'm inclined to think that it's not equivalent to the Halting
>>         problem, though I'm not sure. Either way, I'm going for a
>>         conservative approximation, one which is 100% correct, but
>>         may do unnecessary rebuilds, though preferably as few
>>         unnecessary rebuilds as "reasonable".
>>
>>
>>     Oh, never mind. I'm sorry. I think I see your point now. You're
>>     talking about transitive dependencies vs direct dependencies.
>>     Yes, a change to a "transitive compile dependency" (quote
>>     unquote), may require a rebuild of me. I believe my above
>>     examples highlight how I plan to catch that. With the direct
>>     dependencies of javac -verbose, I could then construct the
>>     dependency graph and start recompiling out of date nodes.
>>     However, I do not want to cascade endlessly downstream, and to do
>>     that I need to know all possible \direct\ dependencies, to know
>>     that when I have a set of unchanged leaves of the cascade that
>>     there are no possible effects on nodes outside the rebuild
>>     portion of the graph from the changed portion.
>
>     You say "With the direct dependencies of javac -verbose".   
>     Unless *all* other files have been compiled except the one you're
>     interested in, then -verbose is not going to give you direct
>     dependencies.  In the worst case (no files have been compiled)
>     then -verbose is going to give you transitive dependencies.
>
>
> Indeed. My current working experimental build system compiles all of 
> the out of date java files in the javac task in a single javac 
> invocation without -verbose, then it compiles them all \again\, one 
> javac invocation per java file with -verbose. I don't know how many 
> different ways I can say this.
>
>     The only reliable way to get the direct dependencies is to look at
>     the class files or to hook into javac and look at the AST at the
>     right point in the compilation.
>
>
> No. That is definitely not the most reliable way.
>
> Currently the tools.jar API, aka JavacTask, gives access to the parse 
> tree of function bodies, but it does not give access to the analyzed 
> tree of function bodies. I need to know all "external to 
> CompilationUnitTree" resolved type names at the very least, the 
> external types loaded during the compile of the CompilationUnitTree, 
> aka java file. I could use the parse tree and attempt to deduce types, 
> but this would be akin to writing a whole compiler, so I won't take 
> this approach.
>
> (Perhaps you mean a different kind of hooking. In which case, please 
> clarify as I am most interested.)
>
> Class files do not contain all of the compile dependency information 
> either. Example:
>   echo "public class aa { void foo() { java.util.List x = null; } }" > src/aa.java
>   javac -d tgt src/*.java
>   javap -verbose -classpath tgt aa | grep List
> the grep outputs nothing, meaning that there is no reference to "List" 
> in the class file. The local variable is optimized out of existence by 
> javac, yet if that type was later removed (such as if it was a user 
> defined type in the build), then the class file would still load, but 
> the clean recompile would fail because that type would no longer be 
> findable by javac.
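The same experiment can be sketched in Java using the standard javax.tools API. This is a rough stand-in for the javap and grep steps above, not an equivalent: it simply searches the raw class file bytes for the string "java/util/List". The class name ClassFileRefs, the helper mentionsList, and the temporary directory layout are illustrative assumptions, and the sketch needs a JDK (not a bare JRE) to run.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

class ClassFileRefs {
    static final String SOURCE =
        "public class aa { void foo() { java.util.List x = null; } }";

    // Compiles SOURCE, optionally with -g, and reports whether the resulting
    // class file bytes contain the internal name of java.util.List.
    static boolean mentionsList(boolean debugInfo) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("no system Java compiler; a JDK is required");
        }
        try {
            Path dir = Files.createTempDirectory("aa-demo");
            Path src = dir.resolve("aa.java");
            Files.write(src, SOURCE.getBytes(StandardCharsets.UTF_8));

            List<String> args = new ArrayList<>();
            if (debugInfo) args.add("-g");
            args.add("-d");
            args.add(dir.toString());
            args.add(src.toString());

            // Null streams mean javac uses System.in, System.out and System.err.
            int rc = javac.run(null, null, null, args.toArray(new String[0]));
            if (rc != 0) {
                throw new IllegalStateException("javac exited with " + rc);
            }

            byte[] bytes = Files.readAllBytes(dir.resolve("aa.class"));
            // ISO-8859-1 maps every byte to one char, so a plain substring search works.
            return new String(bytes, StandardCharsets.ISO_8859_1).contains("java/util/List");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default flags mention List: " + mentionsList(false));
        System.out.println("with -g mention List:       " + mentionsList(true));
    }
}
```

Comparing the two results gives a quick check of how much of the compile-time dependency information survives into the class file for a given javac version and flag set.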
>
> Now, this is a straw man of sorts. We could enable debug information 
> ala javac -g. In which case, the javap will show that "List" is 
> contained in the class file. I \think\ that the only exception is 
> constant variable fields (as defined by the Java spec third edition). 
> However, I do not know of any good reference which claims 
> that this is the only exception, the only difference between the 
> information contained in class files and the information printed by 
> javac -verbose. That is the purpose of this email chain: to determine 
> what acceptable substitutes there are to javac -verbose (including 
> sprucing up javac -verbose).
>
> I still strongly suspect that current javac -verbose will give the 
> most reliable results: what I need to know is the exact list loaded by 
> the compiler to compile a java file, and that is exactly what javac 
> -verbose prints out (when used as described above, though at a great 
> wall clock run time cost.)
>
> It seems that I might be able to hack my way around this "constant 
> variable" feature by using another form of Ghost Dependencies for 
> constant variable field simple names. I describe this above, 
> specifically rules 4a and 4b of rules 1-4. However, a more ideal 
> situation would be for javac to act like any sane C compiler and print 
> out the actual external files used during the compile in a usable 
> form, aka on a per CompilationUnitTree basis, aka on a per input java 
> file basis.


OK, I give up.  You clearly know all the answers. Have a good life.

-- Jonathan Gibbons
     Tech Lead, javac


More information about the compiler-dev mailing list