Incremental java compile AKA javac print compile dependencies

Joshua Maurice joshuamaurice at gmail.com
Tue May 25 18:38:42 PDT 2010


On Tue, May 25, 2010 at 6:01 PM, Jonathan Gibbons <
jonathan.gibbons at oracle.com> wrote:

> On 05/25/2010 05:11 PM, Joshua Maurice wrote:
>
>>
>>
>> What is relevant is that to get decent levels of incremental, aka skipping
>> unnecessary rebuilds, the build system needs to know for each java X, the
>> full list of class files and java files which will be directly used by javac
>> when compiling X. Whenever any of those direct compile dependencies have an
>> "interface" / "signature" change, X needs to be recompiled.
>>
>
> Stop right there.   There's likely a wrong assumption here, hidden in the
> word "directly".
>
> If you start from scratch, with no classes precompiled, when you compile X,
> javac will pull in from the sourcepath the transitive closure of X and all
> its dependencies.  Thus if X refers to Y, and if the implementation of Y
> refers to Z, then javac will compile X and Y and Z, even though there is no
> direct reference in any way from X to Z.   This is why your proposed
> technique of tracking -verbose output will not work.
>

What? For starters, I'm planning on specifically not using the -sourcepath
option. Suppose a user touches X only, and nothing else depends on X, like
your example, and I want to only recompile X.java. However, if I give the
-sourcepath option, then as you note, javac will recompile X, Y, and Z, but
Y and Z are useless recompiles.

Here are some examples to further explain what I'm planning:

Suppose X, Y, and Z are part of the same javac task. Touch Z.java. Do a
build. The build system notes by rule 1 that Z.java is "out of date" (source
file last modification time is newer than last compile time). It notes by
rule 3b that Y.java is "out of date" (direct dependency java file in same
javac task is "out of date"). It then notes by rule 3b that X.java is "out
of date" (direct dependency java file in same javac task is "out of date").

Suppose X, Y, and Z are each part of different javac tasks, such as in
different jars. Touch Z.java. Do a build. The build system notes by rule 1
that Z.java is out of date (source file last modification timestamp is newer
than last compile time). It calls javac on Z.java. Z.class has the same
"interface", so its last "interface change" time remains unchanged. The
build system then finds no rule which makes Y or X out of date, so it does
no further recompile.

Suppose X, Y, and Z are each part of different javac tasks, such as in
different jars. Modify Z.java so its "signature" / "interface" has changed.
Do a build. The build system notes by rule 1 that Z.java is out of date
(source file last modification timestamp is newer than last compile time).
It calls javac on Z.java. Z.class when compared to the old one has a
different "interface", so its last "interface change" is set to now. The
build system then finds Y.java to be "out of date" by rule 3a (a direct
dependency class file has a newer "last interface change" time than the last
compilation time). Depending on if this affects the "interface" of Y, X then
might also be found to be "out of date", or it might be found to be "up to
date".
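
As a rough illustration of the "same interface" check in the last two
examples, here is a minimal sketch that fingerprints the non-private
declarations of a class and compares the result against a fingerprint saved
from the previous build. Everything in it is an assumption made for
illustration: the class name InterfaceFingerprint, the methods signatureOf
and interfaceChanged, and the use of reflection on a loaded class. A real
build tool would more likely read the class file directly, and would also
have to treat the values of compile-time constants as part of the
"interface", since javac inlines them into dependents.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Member;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.lang.reflect.Type;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of javac or of any build system.
public class InterfaceFingerprint {

    /** Collects the non-private declarations of a class as sorted strings. */
    public static SortedSet<String> signatureOf(Class<?> c) {
        SortedSet<String> sig = new TreeSet<>();
        sig.add("type " + c.toGenericString());
        Type superType = c.getGenericSuperclass();
        if (superType != null) {
            sig.add("extends " + superType.getTypeName());
        }
        for (Type t : c.getGenericInterfaces()) {
            sig.add("implements " + t.getTypeName());
        }
        for (Field f : c.getDeclaredFields()) {
            if (visible(f)) sig.add("field " + f.toGenericString());
        }
        for (Constructor<?> k : c.getDeclaredConstructors()) {
            if (visible(k)) sig.add("constructor " + k.toGenericString());
        }
        for (Method m : c.getDeclaredMethods()) {
            if (visible(m)) sig.add("method " + m.toGenericString());
        }
        return sig;
    }

    /** Private and compiler-generated members cannot affect other source files. */
    private static boolean visible(Member m) {
        return !Modifier.isPrivate(m.getModifiers()) && !m.isSynthetic();
    }

    /** True if the freshly compiled class no longer matches the saved fingerprint. */
    public static boolean interfaceChanged(Set<String> previous, Class<?> current) {
        return !signatureOf(current).equals(previous);
    }

    /** Sample class: only the non-private members end up in the fingerprint. */
    public static class Demo {
        public int visibleField;
        private int hiddenField;
        void visibleMethod() {}
        private void hiddenMethod() {}
    }

    public static void main(String[] args) {
        SortedSet<String> saved = signatureOf(Demo.class);
        saved.forEach(System.out::println);
        System.out.println("changed: " + interfaceChanged(saved, Demo.class));
    }
}
```

With a check like this, touching Z.java without changing any non-private
declaration leaves the saved fingerprint equal to the new one, so the "last
interface change" time of Z.class would stay as it was.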

Note that when they're part of the same javac task, I do cascade without
termination downstream, to the extent of the javac task. I have made an
educated guess that this is a reasonably efficient way to get good
parallelization, close to minimal rebuilds, and avoid a great deal of
overhead of calling many separate javac invocations.
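
The out-of-date rules used in the examples above can be sketched roughly as
follows. This covers only rules 1, 3a, and 3b as they are described in this
message; the class StalenessSketch, the Unit record of timestamps, and the
method names are all hypothetical. It also assumes that upstream javac tasks
have already been rebuilt, so that their "last interface change" times are
current when rule 3a is evaluated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the build system's bookkeeping, for illustration only.
public class StalenessSketch {

    /** One java file plus the timestamps the build system tracks for it. */
    public static final class Unit {
        public final String name;
        public final String task;              // javac task (e.g. jar) it belongs to
        public final long sourceModified;      // last modification time of the .java file
        public final long lastCompiled;        // last time the file was compiled
        public final long lastInterfaceChange; // last time its class "interface" changed
        public final List<Unit> directDeps = new ArrayList<>();

        public Unit(String name, String task, long sourceModified,
                    long lastCompiled, long lastInterfaceChange) {
            this.name = name;
            this.task = task;
            this.sourceModified = sourceModified;
            this.lastCompiled = lastCompiled;
            this.lastInterfaceChange = lastInterfaceChange;
        }

        public Unit dependsOn(Unit dep) {
            directDeps.add(dep);
            return this;
        }
    }

    static boolean isStale(Unit u, Set<Unit> staleSoFar) {
        if (u.sourceModified > u.lastCompiled) {
            return true;                                   // rule 1
        }
        for (Unit d : u.directDeps) {
            if (d.task.equals(u.task)) {
                if (staleSoFar.contains(d)) return true;   // rule 3b
            } else if (d.lastInterfaceChange > u.lastCompiled) {
                return true;                               // rule 3a
            }
        }
        return false;
    }

    /** Repeats the rules until no new out-of-date file is found (handles cycles). */
    public static Set<Unit> staleUnits(Collection<Unit> all) {
        Set<Unit> stale = new LinkedHashSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Unit u : all) {
                if (!stale.contains(u) && isStale(u, stale)) {
                    stale.add(u);
                    changed = true;
                }
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        // X -> Y -> Z in one javac task; Z.java was touched after its last compile.
        Unit z = new Unit("Z", "core", 20, 10, 5);
        Unit y = new Unit("Y", "core", 1, 10, 5).dependsOn(z);
        Unit x = new Unit("X", "core", 1, 10, 5).dependsOn(y);
        for (Unit u : staleUnits(Arrays.asList(x, y, z))) {
            System.out.println(u.name + " is out of date");
        }
    }
}
```

Within one task, rule 3b cascades without regard to interface changes, which
matches the first example. Across tasks, only rule 3a applies, so a touch
that leaves Z's interface unchanged stops at Z.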

> There is a difference between the set of files needed to compile X, and the
> set of files on which X has a direct dependency (meaning that if they
> change, X needs to be recompiled.)  To determine the set of files (or even
> better, the classes) on which X depends, you must either look at the
> classfile (which has the constant problem) or at the AST sometime after
> Attr.
>

What? There is? No there isn't. There is no difference between:
- the set A - the set of files needed to compile some java file X
and
- the set B - the set of files which X has a direct dependency - meaning
that if they change, the java file X needs to be recompiled.

At least, perhaps a more intelligent / sophisticated build system could make
such a distinction, but that is not my aim at the moment. I am being
conservative at the moment, and if some class definition Y is required to
compile X.java, then I find it quite reasonable that X.java's compilation
might be different, or fail altogether, with a different class definition Y
or an un-findable class definition Y.

What do you propose is the difference between sets A and B? An example would
be enlightening. (Unless we're talking about Ghost Dependencies, names which
might refer to a different type or member depending on what's on the
classpath and in the java files in the compile, such as A.B "hiding" A.B,
where one of them is a package A, class B, and the other is a class A, and
an inner class B. I don't think you're talking about Ghost Dependencies
though.)

PS: Hopefully we're not quibbling over the definition of "minimal rebuild".
Yes, by a certain strict definition of minimal rebuild, where "equivalent to
a full clean build" is defined as "the output class files display the same
observable behavior over all 'allowed by documentation' inputs", then a
minimal rebuild is equivalent to the Halting problem. However, if we define
"equivalent to a full clean build" in terms of same binary contents of class
files, then I'm inclined to think that it's not equivalent to the Halting
problem, though I'm not sure. Either way, I'm going for a conservative
approximation, one which is 100% correct, but may do unnecessary rebuilds,
though preferably as few unnecessary rebuilds as "reasonable".