Incremental java compile AKA javac print compile dependencies

Joshua Maurice joshuamaurice at gmail.com
Tue May 25 18:42:42 PDT 2010


On Tue, May 25, 2010 at 6:38 PM, Joshua Maurice <joshuamaurice at gmail.com> wrote:

> On Tue, May 25, 2010 at 6:01 PM, Jonathan Gibbons <
> jonathan.gibbons at oracle.com> wrote:
>
>> On 05/25/2010 05:11 PM, Joshua Maurice wrote:
>>
>>>
>>>
>>> What is relevant is that to get a decent level of incrementality, aka
>>> skipping unnecessary rebuilds, the build system needs to know, for each java
>>> file X, the full list of class files and java files which will be directly
>>> used by javac when compiling X. Whenever any of those direct compile
>>> dependencies has an "interface" / "signature" change, X needs to be recompiled.
>>>
>>
>> Stop right there.   There's likely a wrong assumption here, hidden in the
>> word "directly".
>>
>> If you start from scratch, with no classes precompiled, when you compile
>> X, javac will pull in from the sourcepath the transitive closure of X and
>> all its dependencies.  Thus if X refers to Y, and if the implementation of Y
>> refers to Z, then javac will compile X and Y and Z, even though there is no
>> direct reference in any way from X to Z.   This is why your proposed
>> technique of tracking -verbose output will not work.
>>
>
> What? For starters, I'm planning on specifically not using the -sourcepath
> option. Suppose a user touches X only, and nothing else depends on X, like
> your example, and I want to only recompile X.java. However, if I give the
> -sourcepath option, then as you note, javac will recompile X, Y, and Z, but
> Y and Z are useless recompiles.
>
> Here are some examples to further explain what I'm planning:
>
> Suppose X, Y, and Z are part of the same javac task. Touch Z.java. Do a
> build. The build system notes by rule 1 that Z.java is "out of date" (source
> file last modification time is newer than last compile time). It notes by
> rule 3b that Y.java is "out of date" (direct dependency java file in same
> javac task is "out of date"). It then notes by rule 3b that X.java is "out
> of date" (direct dependency java file in same javac task is "out of date").
>
> Suppose X, Y, and Z are each part of different javac tasks, such as in
> different jars. Touch Z.java. Do a build. The build system notes by rule 1
> that Z.java is out of date (source file last modification timestamp is newer
> than last compile time). It calls javac on Z.java. Z.class has the same
> "interface", so its last "interface change" time remains unchanged. The
> build system then finds no rule which makes Y or X out of date, so it does
> no further recompile.
>
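One way to picture the "same interface" check in the example above is a fingerprint built from the non-private members of a class. The sketch below is only an illustration under several assumptions: it uses reflection on loaded classes as a stand-in for reading class files, it ignores supertypes, throws clauses, annotations, and inlined constant values, and the names InterfaceFingerprint, ZBefore, ZAfter, and ZChanged are hypothetical.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.lang.reflect.Type;
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper: summarizes the "interface" of a class as a sorted set
// of signature strings for its non-private, non-synthetic members.
final class InterfaceFingerprint {
    private InterfaceFingerprint() {}

    static SortedSet<String> of(Class<?> c) {
        SortedSet<String> sigs = new TreeSet<>();
        for (Field f : c.getDeclaredFields()) {
            if (visible(f.getModifiers()) && !f.isSynthetic()) {
                sigs.add("field " + Modifier.toString(f.getModifiers()) + " "
                        + f.getGenericType().getTypeName() + " " + f.getName());
            }
        }
        for (Method m : c.getDeclaredMethods()) {
            if (visible(m.getModifiers()) && !m.isSynthetic()) {
                sigs.add("method " + Modifier.toString(m.getModifiers()) + " "
                        + m.getGenericReturnType().getTypeName() + " " + m.getName()
                        + params(m.getGenericParameterTypes()));
            }
        }
        for (Constructor<?> k : c.getDeclaredConstructors()) {
            if (visible(k.getModifiers()) && !k.isSynthetic()) {
                sigs.add("ctor " + Modifier.toString(k.getModifiers())
                        + params(k.getGenericParameterTypes()));
            }
        }
        return sigs;
    }

    // Private members cannot be referenced from other classes, so they are
    // left out of the fingerprint.
    private static boolean visible(int modifiers) {
        return !Modifier.isPrivate(modifiers);
    }

    private static String params(Type[] types) {
        return Arrays.stream(types).map(Type::getTypeName)
                .collect(Collectors.joining(", ", "(", ")"));
    }
}

// Hypothetical stand-ins for two builds of Z where only a method body differs.
final class ZBefore { public int size(int n) { return n; } private void helper() {} }
final class ZAfter { public int size(int n) { return n + 1; } }
// A variant whose return type, and therefore its interface, has changed.
final class ZChanged { public long size(int n) { return n; } }
```

With this sketch, ZBefore and ZAfter produce equal fingerprints, so the "last interface change" time would stay as it was, while ZChanged produces a different one.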
> Suppose X, Y, and Z are each part of different javac tasks, such as in
> different jars. Modify Z.java so its "signature" / "interface" has changed.
> Do a build. The build system notes by rule 1 that Z.java is out of date
> (source file last modification timestamp is newer than last compile time).
> It calls javac on Z.java. Z.class when compared to the old one has a
> different "interface", so its last "interface change" is set to now. The
> build system then finds Y.java to be "out of date" by rule 3a (a direct
> dependency class file has a newer "last interface change" time than the last
> compilation time). Depending on if this affects the "interface" of Y, X then
> might also be found to be "out of date", or it might be found to be "up to
> date".
>
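The rule checks used in the three examples above can be sketched as plain predicates. This is a minimal sketch that assumes the rules mean exactly what the parenthetical descriptions say; rule 2 is not restated in this message, so it is left out. The class name OutOfDateRules, the method names, and the use of epoch milliseconds are assumptions made for illustration.

```java
import java.util.List;

// Hypothetical helper: all times are epoch milliseconds.
final class OutOfDateRules {
    private OutOfDateRules() {}

    // Rule 1: the source file was modified after it was last compiled.
    static boolean rule1(long sourceModified, long lastCompiled) {
        return sourceModified > lastCompiled;
    }

    // Rule 3a: some direct dependency class file has a "last interface
    // change" time newer than this file's last compile time.
    static boolean rule3a(long lastCompiled, List<Long> depInterfaceChangeTimes) {
        for (long changed : depInterfaceChangeTimes) {
            if (changed > lastCompiled) {
                return true;
            }
        }
        return false;
    }

    // Rule 3b is passed in as a flag: true when a direct dependency java
    // file in the same javac task has already been found out of date.
    static boolean needsRecompile(long sourceModified,
                                  long lastCompiled,
                                  List<Long> depInterfaceChangeTimes,
                                  boolean sameTaskDepOutOfDate) {
        return rule1(sourceModified, lastCompiled)
                || rule3a(lastCompiled, depInterfaceChangeTimes)
                || sameTaskDepOutOfDate;
    }
}
```

In the second example, Y was compiled after the last interface change of Z.class, so rule3a returns false and Y is left alone. In the third example, the interface change time of Z.class is "now", so rule3a returns true for Y.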
> Note that when they're part of the same javac task, I do cascade without
> termination downstream, to the extent of the javac task. I have made an
> educated *guess* that this is a reasonably efficient way to get good
> parallelization, close to minimal rebuilds, and avoid a great deal of
> overhead of calling many separate javac invocations.
>
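The unconditional cascade inside a single javac task amounts to growing the out of date set until nothing more is added. Below is a minimal sketch of that fixed-point loop, assuming the direct dependencies of each file in the task are already known; the class name SameTaskCascade and the map layout are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for one javac task.
final class SameTaskCascade {
    private SameTaskCascade() {}

    // directDeps maps each source file in the task to the same-task source
    // files it directly depends on. touched holds the files that are already
    // out of date by rule 1. The result is every file to pass to javac.
    static Set<String> outOfDate(Map<String, Set<String>> directDeps, Set<String> touched) {
        Set<String> dirty = new HashSet<>(touched);
        boolean grew = true;
        while (grew) {
            grew = false;
            for (Map.Entry<String, Set<String>> entry : directDeps.entrySet()) {
                String file = entry.getKey();
                // Rule 3b: a direct dependency in the same task is out of date.
                if (!dirty.contains(file) && !Collections.disjoint(entry.getValue(), dirty)) {
                    dirty.add(file);
                    grew = true;
                }
            }
        }
        return dirty;
    }
}
```

With X depending on Y and Y depending on Z, touching Z.java yields X.java, Y.java, and Z.java, which matches the first example. Touching only X.java yields X.java alone.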
>> There is a difference between the set of files needed to compile X, and the
>> set of files on which X has a direct dependency (meaning that if they
>> change, X needs to be recompiled.)  To determine the set of files (or even
>> better, the classes) on which X depends, you must either look at the
>> classfile (which has the constant problem) or at the AST sometime after
>> Attr.
>>
>
> What? There is? No there isn't. There is no difference between:
> - the set A - the set of files needed to compile some java file X
> and
> - the set B - the set of files which X has a direct dependency - meaning
> that if they change, the java file X needs to be recompiled.
>
> At least, perhaps a more intelligent / sophisticated build system could
> make such a distinction, but that is not my aim at the moment. I am being
> conservative at the moment, and if some class definition Y is required to
> compile X.java, then I find it quite reasonable that X.java's compilation
> might be different, or fail altogether, with a different class definition Y
> or an un-findable class definition Y.
>
> What do you propose is the difference between sets A and B? An example
> would be enlightening. (Unless we're talking about Ghost Dependencies, names
> which might refer to a different type or member depending on what's on the
> classpath and in the java files in the compile, such as A.B "hiding" A.B,
> where one of them is a package A, class B, and the other is a class A, and
> an inner class B. I don't think you're talking about Ghost Dependencies
> though.)
>
> PS: Hopefully we're not quibbling over the definition of "minimal rebuild".
> Yes, by a certain strict definition of minimal rebuild, where "equivalent to
> a full clean build" is defined as "the output class files display the same
> observable behavior over all 'allowed by documentation' inputs", then a
> minimal rebuild is equivalent to the Halting problem. However, if we define
> "equivalent to a full clean build" in terms of same binary contents of class
> files, then I'm inclined to think that it's not equivalent to the Halting
> problem, though I'm not sure. Either way, I'm going for a conservative
> approximation, one which is 100% correct, but may do unnecessary rebuilds,
> though preferably as few unnecessary rebuilds as "reasonable".
>

Oh, never mind. I'm sorry. I think I see your point now. You're talking about
transitive dependencies vs direct dependencies. Yes, a change to a
"transitive compile dependency" (quote unquote) may require a rebuild of a
given file. I believe my above examples highlight how I plan to catch that.
With the direct dependencies reported by javac -verbose, I could then
construct the dependency graph and start recompiling out of date nodes.
However, I do not want to cascade endlessly downstream. To stop early, I need
to know all possible *direct* dependencies, so that when the leaves of the
cascade turn out to have unchanged interfaces, I know that the changed
portion of the graph cannot affect any node outside the rebuilt portion.
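A rough sketch of that bounded cascade follows, assuming a graph of direct dependents is already available and that each recompile reports whether the "interface" of its output changed. The class name RebuildPlanner and the interfaceChanged predicate are hypothetical stand-ins for the real recompile-and-compare step. The sketch also ignores build ordering, which a real build system would need to respect.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical planner: walks downstream from the changed nodes and stops at
// any node whose recompile leaves its interface unchanged.
final class RebuildPlanner {
    private RebuildPlanner() {}

    // directDependents maps a node to the nodes that directly depend on it.
    // interfaceChanged stands in for "recompile the node and compare the new
    // output with the old one".
    static Set<String> plan(Map<String, Set<String>> directDependents,
                            Set<String> changedSources,
                            Predicate<String> interfaceChanged) {
        Set<String> rebuilt = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(changedSources);
        while (!pending.isEmpty()) {
            String node = pending.removeFirst();
            if (!rebuilt.add(node)) {
                continue; // already rebuilt in this run
            }
            if (interfaceChanged.test(node)) {
                // Only an interface change can affect the direct dependents.
                pending.addAll(directDependents.getOrDefault(node, Collections.emptySet()));
            }
        }
        return rebuilt;
    }
}
```

If Y depends on Z and X depends on Y, a body-only change to Z rebuilds Z alone. If the interface of Z changes but the interface of Y does not, Z and Y are rebuilt and X is never visited. The early stop is only safe when the graph lists every direct dependency.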


More information about the compiler-dev mailing list