Incremental java compile AKA javac print compile dependencies
Jonathan Gibbons
jonathan.gibbons at oracle.com
Wed May 26 20:49:16 PDT 2010
On 05/26/2010 08:38 PM, Joshua Maurice wrote:
> On Wed, May 26, 2010 at 6:56 PM, Jonathan Gibbons
> <jonathan.gibbons at oracle.com> wrote:
>
> On 05/25/2010 06:42 PM, Joshua Maurice wrote:
>> On Tue, May 25, 2010 at 6:38 PM, Joshua Maurice
>> <joshuamaurice at gmail.com> wrote:
>>
>> On Tue, May 25, 2010 at 6:01 PM, Jonathan Gibbons
>> <jonathan.gibbons at oracle.com> wrote:
>>
>> On 05/25/2010 05:11 PM, Joshua Maurice wrote:
>>
>>
>>
>> What is relevant is that to get decent levels of
>> incremental, aka skipping unnecessary rebuilds, the
>> build system needs to know for each java X, the full
>> list of class files and java files which will be
>> directly used by javac when compiling X. Whenever any
>> of those direct compile dependencies have an
>> "interface" / "signature" change, X needs to be
>> recompiled.
>>
>>
>> Stop right there. There's likely a wrong assumption
>> here, hidden in the word "directly".
>>
>> If you start from scratch, with no classes precompiled,
>> when you compile X, javac will pull in from the
>> sourcepath the transitive closure of X and all its
>> dependencies. Thus if X refers to Y, and if the
>> implementation of Y refers to Z, then javac will compile
>> X and Y and Z, even though there is no direct reference in
>> any way from X to Z. This is why your proposed
>> technique of tracking -verbose output will not work.
>>
>>
>> What? For starters, I'm planning on specifically not using
>> the -sourcepath option. Suppose a user touches X only, and
>> nothing else depends on X, like your example, and I want to
>> only recompile X.java. However, if I give the -sourcepath
>> option, then as you note, javac will recompile X, Y, and Z,
>> but Y and Z are useless recompiles.
>>
>> Here are some examples to further explain what I'm planning:
>>
>> Suppose X, Y, and Z are part of the same javac task. Touch
>> Z.java. Do a build. The build system notes by rule 1 that
>> Z.java is "out of date" (source file last modification time
>> is newer than last compile time). It notes by rule 3b that
>> Y.java is "out of date" (direct dependency java file in same
>> javac task is "out of date"). It then notes by rule 3b that
>> X.java is "out of date" (direct dependency java file in same
>> javac task is "out of date").
>>
>> Suppose X, Y, and Z are each part of different javac tasks,
>> such as in different jars. Touch Z.java. Do a build. The
>> build system notes by rule 1 that Z.java is out of date
>> (source file last modification timestamp is newer than last
>> compile time). It calls javac on Z.java. Z.class has the same
>> "interface", so its last "interface change" time remains
>> unchanged. The build system then finds no rule which makes Y
>> or X out of date, so it does no further recompile.
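
The "same interface" comparison in the scenario above could be approximated by fingerprinting the non-private members of a class. What follows is only a sketch under assumptions that are not from the thread: the class name InterfaceFingerprint and the method of() are hypothetical, and it uses reflection on a loaded class, whereas a build tool would more likely inspect the class file itself. It also ignores the values of constant fields, which javac may inline into dependents.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Member;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical helper: builds a text fingerprint of a class's visible signature.
public class InterfaceFingerprint {

    // Returns a sorted, newline-joined list of the non-private declarations of c.
    // Two versions of a class with equal fingerprints are treated as having
    // the same "interface" for rebuild purposes (an assumption of this sketch).
    public static String of(Class<?> c) {
        TreeSet<String> lines = new TreeSet<>();  // sorted, since reflection order is unspecified
        lines.add("class " + c.toGenericString());
        lines.add("extends " + c.getGenericSuperclass());
        lines.add("implements " + Arrays.toString(c.getGenericInterfaces()));
        for (Field f : c.getDeclaredFields()) {
            if (visible(f)) lines.add(f.toGenericString());
        }
        for (Constructor<?> k : c.getDeclaredConstructors()) {
            if (visible(k)) lines.add(k.toGenericString());
        }
        for (Method m : c.getDeclaredMethods()) {
            if (visible(m)) lines.add(m.toGenericString());
        }
        return String.join("\n", lines);
    }

    // Private and compiler-generated members cannot affect other compilation units.
    private static boolean visible(Member m) {
        return !Modifier.isPrivate(m.getModifiers()) && !m.isSynthetic();
    }
}
```

A build system following this idea would store the fingerprint next to each class file and bump the "last interface change" time only when the stored and new fingerprints differ.
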
>>
>> Suppose X, Y, and Z are each part of different javac tasks,
>> such as in different jars. Modify Z.java so its "signature" /
>> "interface" has changed. Do a build. The build system notes
>> by rule 1 that Z.java is out of date (source file last
>> modification timestamp is newer than last compile time). It
>> calls javac on Z.java. Z.class when compared to the old one
>> has a different "interface", so its last "interface change"
>> is set to now. The build system then finds Y.java to be "out
>> of date" by rule 3a (a direct dependency class file has a
>> newer "last interface change" time than the last compilation
>> time). Depending on if this affects the "interface" of Y, X
>> then might also be found to be "out of date", or it might be
>> found to be "up to date".
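
The three scenarios above can be sketched as a small out-of-date check. This is an illustration only, covering just rules 1, 3a and 3b as they are described in these examples. The names OutOfDateSketch, Node and isOutOfDate, and the use of plain long timestamps, are hypothetical and not taken from the build system under discussion.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OutOfDateSketch {

    // Hypothetical record for one java file; all field names are illustrative.
    public static final class Node {
        public final String name;
        public final String task;          // javac task (for example a jar) owning this file
        public long sourceModified;        // last modification time of the .java file
        public long lastCompiled;          // time of the last compile of this file
        public long lastInterfaceChange;   // last time the compiled "interface" changed
        public final List<Node> directDeps = new ArrayList<>();

        public Node(String name, String task, long sourceModified,
                    long lastCompiled, long lastInterfaceChange) {
            this.name = name;
            this.task = task;
            this.sourceModified = sourceModified;
            this.lastCompiled = lastCompiled;
            this.lastInterfaceChange = lastInterfaceChange;
        }
    }

    public static boolean isOutOfDate(Node n) {
        return isOutOfDate(n, new HashSet<Node>());
    }

    // A node already in 'seen' is either still being examined (a cycle) or was
    // found up to date; a node found out of date ends the whole search at once.
    private static boolean isOutOfDate(Node n, Set<Node> seen) {
        if (!seen.add(n)) return false;
        // Rule 1: source is newer than the last compile.
        if (n.sourceModified > n.lastCompiled) return true;
        for (Node d : n.directDeps) {
            if (d.task.equals(n.task)) {
                // Rule 3b: a direct dependency in the same javac task is out of date.
                if (isOutOfDate(d, seen)) return true;
            } else if (d.lastInterfaceChange > n.lastCompiled) {
                // Rule 3a: a dependency in another task changed its interface
                // after this file was last compiled. This assumes the other
                // task has already been rebuilt and its timestamps updated.
                return true;
            }
        }
        return false;
    }
}
```

Within one javac task the check cascades through rule 3b without stopping, while across tasks it stops as soon as a rebuilt dependency leaves its "last interface change" time untouched.
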
>>
>> Note that when they're part of the same javac task, I do
>> cascade without termination downstream, to the extent of the
>> javac task. I have made an educated \guess\ that this is a
>> reasonably efficient way to get good parallelization, close
>> to minimal rebuilds, and avoid a great deal of overhead of
>> calling many separate javac invocations.
>>
>> There is a difference between the set of files needed to
>> compile X, and the set of files on which X has a direct
>> dependency (meaning that if they change, X needs to be
>> recompiled.) To determine the set of files (or even
>> better, the classes) on which X depends, you must either
>> look at the classfile (which has the constant problem) or
>> at the AST sometime after Attr.
>>
>>
>> What? There is? No there isn't. There is no difference between:
>> - the set A - the set of files needed to compile some java
>> file X
>> and
>> - the set B - the set of files which X has a direct
>> dependency - meaning that if they change, the java file X
>> needs to be recompiled.
>>
>> At least, perhaps a more intelligent / sophisticated build
>> system could make such a distinction, but that is not my aim
>> at the moment. I am being conservative at the moment, and if
>> some class definition Y is required to compile X.java, then I
>> find it quite reasonable that X.java's compilation might be
>> different, or fail altogether, with a different class
>> definition Y or an un-findable class definition Y.
>>
>> What do you propose is the difference between sets A and B?
>> An example would be enlightening. (Unless we're talking about
>> Ghost Dependencies, names which might refer to a different
>> type or member depending on what's on the classpath and in
>> the java files in the compile, such as A.B "hiding" A.B,
>> where one of them is a package A, class B, and the other is a
>> class A, and an inner class B. I don't think you're talking
>> about Ghost Dependencies though.)
>>
>> PS: Hopefully we're not quibbling over the definition of
>> "minimal rebuild". Yes, by a certain strict definition of
>> minimal rebuild, where "equivalent to a full clean build" is
>> defined as "the output class files display the same
>> observable behavior over all 'allowed by documentation'
>> inputs", then a minimal rebuild is equivalent to the Halting
>> problem. However, if we define "equivalent to a full clean
>> build" in terms of same binary contents of class files, then
>> I'm inclined to think that it's not equivalent to the Halting
>> problem, though I'm not sure. Either way, I'm going for a
>> conservative approximation, one which is 100% correct, but
>> may do unnecessary rebuilds, though preferably as few
>> unnecessary rebuilds as "reasonable".
>>
>>
>> Oh, never mind. I'm sorry. I think I see your point now. You're
>> talking about transitive dependencies vs direct dependencies.
>> Yes, a change to a "transitive compile dependency" (quote
>> unquote), may require a rebuild of me. I believe my above
>> examples highlight how I plan to catch that. With the direct
>> dependencies of javac -verbose, I could then construct the
>> dependency graph and start recompiling out of date nodes.
>> However, I do not want to cascade endlessly downstream, and to do
>> that I need to know all possible \direct\ dependencies, to know
>> that when I have a set of unchanged leaves of the cascade that
>> there are no possible effects on nodes outside the rebuild
>> portion of the graph from the changed portion.
>
> You say "With the direct dependencies of javac -verbose".
> Unless *all* other files have been compiled except the one you're
> interested in, then -verbose is not going to give you direct
> dependencies. In the worst case (no files have been compiled)
> then -verbose is going to give you transitive dependencies.
>
>
> Indeed. My working current experimental build system compiles all of
> the out of date java files in the javac task in a single javac
> invocation without -verbose, then it compiles them all \again\, one
> javac invocation per java file with -verbose. I don't know how many
> different ways I can say this.
>
> The only reliable way to get the direct dependencies is to look at
> the class files or to hook into javac and look at the AST at the
> right point in the compilation.
>
>
> No. That is definitely not the most reliable way.
>
> Currently the tools.jar API, aka JavacTask, gives access to the parse
> tree of function bodies, but it does not give access to the analyzed
> tree of function bodies. I need to know all "external to
> CompilationUnitTree" resolved type names at the very least, the
> external types loaded during the compile of the CompilationUnitTree,
> aka java file. I could use the parse tree and attempt to deduce types,
> but this would be akin to writing a whole compiler, so I won't take
> this approach.
>
> (Perhaps you mean a different kind of hooking. In which case, please
> clarify as I am most interested.)
>
> Class files do not contain all of the compile dependency information
> either. Example:
>     mkdir -p src tgt
>     echo "public class aa { void foo() { java.util.List x = null; } }" > src/aa.java
>     javac -d tgt src/*.java
>     javap -verbose -classpath tgt aa | grep List
> The grep outputs nothing, meaning that there is no reference to "List"
> in the class file. The local variable is optimized out of existence by
> javac, yet if that type was later removed (such as if it was a user
> defined type in the build), then the class file would still load, but
> the clean recompile would fail because that type would no longer be
> findable by javac.
>
> Now, this is a straw man of sorts. We could enable debug information
> ala javac -g. In which case, the javap will show that "List" is
> contained in the class file. I \think\ that the only exception is
> constant variable fields (as defined by the Java spec third edition).
> However, I do not know of any good reference which claims
> that this is the only exception, the only difference between the
> information contained in class files and the information printed by
> javac -verbose. That is the purpose of this email chain: to determine
> what acceptable substitutes there are to javac -verbose (including
> sprucing up javac -verbose).
>
> I still strongly suspect that current javac -verbose will give the
> most reliable results: what I need to know is the exact list loaded by
> the compiler to compile a java file, and that is exactly what javac
> -verbose prints out (when used as described above, though at a great
> wall clock run time cost.)
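
Collecting that list from the per-file -verbose runs could look roughly like the sketch below. It assumes only that javac -verbose reports each loaded class or source on a line beginning with "[loading " and ending with "]". The text inside the brackets differs between javac versions, so it is returned unparsed. The class name VerboseLoads and the method loaded() are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: extracts the "[loading ...]" entries from javac -verbose output.
public class VerboseLoads {

    // Returns the raw text between "[loading " and the closing "]" for each
    // matching line; every other kind of -verbose line is ignored.
    public static List<String> loaded(List<String> verboseLines) {
        String prefix = "[loading ";
        List<String> out = new ArrayList<>();
        for (String line : verboseLines) {
            String t = line.trim();
            if (t.startsWith(prefix) && t.endsWith("]")) {
                out.add(t.substring(prefix.length(), t.length() - 1));
            }
        }
        return out;
    }
}
```

Running one javac invocation per java file and feeding its output through such a filter would give a per-file list of loaded files, at the wall clock cost already noted.
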
>
> It seems that I might be able to hack my way around this "constant
> variable" feature by using another form of Ghost Dependencies for
> constant variable field simple names. I describe this above,
> specifically rules 4a and 4b of rules 1-4. However, a more ideal
> situation would be for javac to act like any sane C compiler and print
> out the actual external files used during the compile in a usable
> form, aka on a per CompilationUnitTree basis, aka on a per input java
> file basis.
OK, I give up. You clearly know all the answers. Have a good life.
-- Jonathan Gibbons
Tech Lead, javac