Incremental java compile AKA javac print compile dependencies
Joshua Maurice
joshuamaurice at gmail.com
Wed May 26 20:38:53 PDT 2010
On Wed, May 26, 2010 at 6:56 PM, Jonathan Gibbons <
jonathan.gibbons at oracle.com> wrote:
> On 05/25/2010 06:42 PM, Joshua Maurice wrote:
>
> On Tue, May 25, 2010 at 6:38 PM, Joshua Maurice <joshuamaurice at gmail.com> wrote:
>
>> On Tue, May 25, 2010 at 6:01 PM, Jonathan Gibbons <
>> jonathan.gibbons at oracle.com> wrote:
>>
>>> On 05/25/2010 05:11 PM, Joshua Maurice wrote:
>>>
>>>>
>>>>
>>>> What is relevant is that to get decent levels of incremental, aka
>>>> skipping unnecessary rebuilds, the build system needs to know for each java
>>>> X, the full list of class files and java files which will be directly used
>>>> by javac when compiling X. Whenever any of those direct compile dependencies
>>>> have an "interface" / "signature" change, X needs to be recompiled.
>>>>
>>>
>>> Stop right there. There's likely a wrong assumption here, hidden in
>>> the word "directly".
>>>
>>> If you start from scratch, with no classes precompiled, when you compile
>>> X, javac will pull in from the sourcepath the transitive closure of X and
>>> all its dependencies. Thus if X refers to Y, and if the implementation of Y
>>> refers to Z, then javac will compile X and Y and Z, even though there is no
>>> direct reference in any way from X to Z. This is why your proposed
>>> technique of tracking -verbose output will not work.
>>>
>>
>> What? For starters, I'm planning on specifically not using the -sourcepath
>> option. Suppose a user touches X only, and nothing else depends on X, like
>> your example, and I want to only recompile X.java. However, if I give the
>> -sourcepath option, then as you note, javac will recompile X, Y, and Z, but
>> Y and Z are useless recompiles.
>>
>> Here are some examples to further explain what I'm planning:
>>
>> Suppose X, Y, and Z are part of the same javac task. Touch Z.java. Do a
>> build. The build system notes by rule 1 that Z.java is "out of date" (source
>> file last modification time is newer than last compile time). It notes by
>> rule 3b that Y.java is "out of date" (direct dependency java file in same
>> javac task is "out of date"). It then notes by rule 3b that X.java is "out
>> of date" (direct dependency java file in same javac task is "out of date").
>>
>> Suppose X, Y, and Z are each part of different javac tasks, such as in
>> different jars. Touch Z.java. Do a build. The build system notes by rule 1
>> that Z.java is out of date (source file last modification timestamp is newer
>> than last compile time). It calls javac on Z.java. Z.class has the same
>> "interface", so its last "interface change" time remains unchanged. The
>> build system then finds no rule which makes Y or X out of date, so it does
>> no further recompile.
>>
>> Suppose X, Y, and Z are each part of different javac tasks, such as in
>> different jars. Modify Z.java so its "signature" / "interface" has changed.
>> Do a build. The build system notes by rule 1 that Z.java is out of date
>> (source file last modification timestamp is newer than last compile time).
>> It calls javac on Z.java. Z.class when compared to the old one has a
>> different "interface", so its last "interface change" is set to now. The
>> build system then finds Y.java to be "out of date" by rule 3a (a direct
>> dependency class file has a newer "last interface change" time than the last
>> compilation time). Depending on if this affects the "interface" of Y, X then
>> might also be found to be "out of date", or it might be found to be "up to
>> date".
>>
>> Note that when they're part of the same javac task, I do cascade without
>> termination downstream, to the extent of the javac task. I have made an
>> educated \guess\ that this is a reasonably efficient way to get good
>> parallelization, close to minimal rebuilds, and avoid a great deal of
>> overhead of calling many separate javac invocations.
>>
>>> There is a difference between the set of files needed to compile X, and
>>> the set of files on which X has a direct dependency (meaning that if they
>>> change, X needs to be recompiled.) To determine the set of files (or even
>>> better, the classes) on which X depends, you must either look at the
>>> classfile (which has the constant problem) or at the AST sometime after
>>> Attr.
>>>
>>
>> What? There is? No there isn't. There is no difference between:
>> - the set A - the set of files needed to compile some java file X
>> and
>> - the set B - the set of files which X has a direct dependency - meaning
>> that if they change, the java file X needs to be recompiled.
>>
>> At least, perhaps a more intelligent / sophisticated build system could
>> make such a distinction, but that is not my aim at the moment. I am being
>> conservative at the moment, and if some class definition Y is required to
>> compile X.java, then I find it quite reasonable that X.java's compilation
>> might be different, or fail altogether, with a different class definition Y
>> or an un-findable class definition Y.
>>
>> What do you propose is the difference between sets A and B? An example
>> would be enlightening. (Unless we're talking about Ghost Dependencies, names
>> which might refer to a different type or member depending on what's on the
>> classpath and in the java files in the compile, such as A.B "hiding" A.B,
>> where one of them is a package A, class B, and the other is a class A, and
>> an inner class B. I don't think you're talking about Ghost Dependencies
>> though.)
>>
>> PS: Hopefully we're not quibbling over the definition of "minimal
>> rebuild". Yes, by a certain strict definition of minimal rebuild, where
>> "equivalent to a full clean build" is defined as "the output class files
>> display the same observable behavior over all 'allowed by documentation'
>> inputs", then a minimal rebuild is equivalent to the Halting problem.
>> However, if we define "equivalent to a full clean build" in terms of same
>> binary contents of class files, then I'm inclined to think that it's not
>> equivalent to the Halting problem, though I'm not sure. Either way, I'm
>> going for a conservative approximation, one which is 100% correct, but may
>> do unnecessary rebuilds, though preferably as few unnecessary rebuilds as
>> "reasonable".
>>
>
> Oh, never mind. I'm sorry. I think I see your point now. You're talking
> about transitive dependencies vs direct dependencies. Yes, a change to a
> "transitive compile dependency" (quote unquote), may require a rebuild of
> me. I believe my above examples highlight how I plan to catch that. With the
> direct dependencies of javac -verbose, I could then construct the dependency
> graph and start recompiling out of date nodes. However, I do not want to
> cascade endlessly downstream, and to do that I need to know all possible
> \direct\ dependencies, to know that when I have a set of unchanged leaves of
> the cascade that there are no possible effects on nodes outside the rebuild
> portion of the graph from the changed portion.
>
>
> You say "With the direct dependencies of javac -verbose". Unless *all*
> other files have been compiled except the one you're interested in, then
> -verbose is not going to give you direct dependencies. In the worst case
> (no files have been compiled) then -verbose is going to give you transitive
> dependencies.
>
Indeed. My current working experimental build system compiles all of the out
of date java files in the javac task in a single javac invocation without
-verbose, then it compiles them all \again\, one javac invocation per java
file with -verbose. I don't know how many different ways I can say this.

> The only reliable way to get the direct dependencies is to look at the class
> files or to hook into javac and look at the AST at the right point in the
> compilation.
>
No. That is definitely not the most reliable way.

Currently the tools.jar API, aka JavacTask, gives access to the parse tree
of function bodies, but it does not give access to the analyzed tree of
function bodies. I need to know all "external to CompilationUnitTree"
resolved type names at the very least, the external types loaded during the
compile of the CompilationUnitTree, aka java file. I could use the parse
tree and attempt to deduce types, but this would be akin to writing a whole
compiler, so I won't take this approach.

(Perhaps you mean a different kind of hooking. In which case, please clarify
as I am most interested.)

Class files do not contain all of the compile dependency information either.
Example:

    mkdir -p src tgt
    echo "public class aa { void foo() { java.util.List x = null; } }" > src/aa.java
    javac -d tgt src/*.java
    javap -verbose -classpath tgt aa | grep List

The grep outputs nothing, meaning that there is no reference to "List" in
the class file. The local variable is optimized out of existence by javac,
yet if that type was later removed (such as if it was a user defined type in
the build), then the class file would still load, but the clean recompile
would fail because that type would no longer be findable by javac.

Now, this is a straw man of sorts. We could enable debug information a la
javac -g. In which case, javap will show that "List" is contained in the
class file. I \think\ that the only exception is constant variable fields
(as defined by the Java spec third edition). However, I do not know of any
good reference which claims that this is the only exception, the only
difference between the information contained in class files and the
information printed by javac -verbose. That is the purpose of this email
chain: to determine what acceptable substitutes there are to javac -verbose
(including sprucing up javac -verbose).
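
To make the "constant variable" exception concrete, here is a minimal sketch.
The names Limits, MAX and Client are hypothetical and exist only for this
illustration. Because MAX is a final primitive field initialized with a
compile-time constant, javac copies its value into the code that uses it, so
the compiled Client needs no field reference back to Limits.MAX even though
its source cannot be compiled without Limits.

```java
public class ConstantInlineDemo {
    // Hypothetical class holding a constant variable in the JLS sense:
    // final, primitive, initialized with a compile-time constant expression.
    static class Limits {
        static final int MAX = 42;
    }

    // Hypothetical user of the constant. javac inlines the value 42 here,
    // so the compiled bytecode does not read Limits.MAX at run time.
    static class Client {
        static int limit() {
            return Limits.MAX;
        }
    }

    public static void main(String[] args) {
        System.out.println(Client.limit()); // prints 42
    }
}
```

If only Limits were recompiled with a different MAX, Client would keep
returning the old value until it was recompiled as well, which is why a build
system that reads dependencies from class files alone can miss this case.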
I still strongly suspect that current javac -verbose will give the most
reliable results: what I need to know is the exact list loaded by the
compiler to compile a java file, and that is exactly what javac -verbose
prints out (when used as described above, though at a great wall clock run
time cost.)
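
As a rough sketch of consuming that output, the snippet below collects the
"[loading ...]" lines from captured javac -verbose text. The class and method
names are hypothetical, the sample lines in main are made up for illustration,
and the exact text inside the brackets differs between JDK versions, so real
output would likely need extra normalization before it is mapped back to files
in the build.

```java
import java.util.ArrayList;
import java.util.List;

public class VerboseLoadParser {
    private static final String PREFIX = "[loading ";

    // Returns the text inside each "[loading ...]" line, in order of appearance.
    // Lines of any other kind (parsing, wrote, timing, ...) are ignored.
    static List<String> loadedEntries(String verboseOutput) {
        List<String> entries = new ArrayList<>();
        for (String raw : verboseOutput.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith(PREFIX) && line.endsWith("]")) {
                entries.add(line.substring(PREFIX.length(), line.length() - 1));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Made-up sample text; real javac output varies by JDK version.
        String sample = String.join("\n",
            "[parsing started src/X.java]",
            "[loading tgt/Y.class]",
            "[loading /modules/java.base/java/lang/Object.class]",
            "[wrote tgt/X.class]");
        for (String entry : loadedEntries(sample)) {
            System.out.println(entry);
        }
    }
}
```

With one javac invocation per java file, the entries that fall inside the
build's own source and output directories would be that file's recorded
compile dependencies.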
It seems that I might be able to hack my way around this "constant variable"
feature by using another form of Ghost Dependencies for constant variable
field simple names. I describe this above, specifically rules 4a and 4b of
rules 1-4. However, a more ideal situation would be for javac to act like
any sane C compiler and print out the actual external files used during the
compile in a usable form, aka on a per CompilationUnitTree basis, aka on a
per input java file basis.
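
For illustration only, the out-of-date checks used in the quoted examples
(rules 1, 3a and 3b as paraphrased there; the full rule list is not part of
this message) could be sketched as below. All class, field and method names
are hypothetical, timestamps are plain numbers, and a dependency with no
recorded "last interface change" time is assumed to be unchanged.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class OutOfDateSketch {
    /** Bookkeeping for one java file; every field name here is hypothetical. */
    static final class Source {
        final long lastModified;        // timestamp of the .java file
        final long lastCompiled;        // time of its last successful compile
        final Set<String> sameTaskDeps; // direct deps on java files in the same javac task
        final Set<String> classDeps;    // direct deps on class files built by other tasks

        Source(long lastModified, long lastCompiled,
               Set<String> sameTaskDeps, Set<String> classDeps) {
            this.lastModified = lastModified;
            this.lastCompiled = lastCompiled;
            this.sameTaskDeps = sameTaskDeps;
            this.classDeps = classDeps;
        }
    }

    static Set<String> set(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    /** Returns the java files of one javac task that need recompiling. */
    static Set<String> outOfDate(Map<String, Source> task,
                                 Map<String, Long> lastInterfaceChange) {
        Set<String> stale = new TreeSet<>();
        for (Map.Entry<String, Source> e : task.entrySet()) {
            Source s = e.getValue();
            // Rule 1: the source was modified after its last compile.
            boolean rule1 = s.lastModified > s.lastCompiled;
            // Rule 3a: a class file it depends on changed "interface" since then.
            boolean rule3a = false;
            for (String dep : s.classDeps) {
                Long changed = lastInterfaceChange.get(dep);
                if (changed != null && changed > s.lastCompiled) {
                    rule3a = true;
                }
            }
            if (rule1 || rule3a) {
                stale.add(e.getKey());
            }
        }
        // Rule 3b: cascade inside the task until no new file becomes stale.
        boolean grew = true;
        while (grew) {
            grew = false;
            for (Map.Entry<String, Source> e : task.entrySet()) {
                if (!stale.contains(e.getKey())
                        && !Collections.disjoint(e.getValue().sameTaskDeps, stale)) {
                    stale.add(e.getKey());
                    grew = true;
                }
            }
        }
        return stale;
    }

    /** First quoted example: X uses Y, Y uses Z, same task, Z.java touched. */
    static Map<String, Source> sameTaskExample() {
        Map<String, Source> task = new HashMap<>();
        task.put("X", new Source(50, 100, set("Y"), set()));
        task.put("Y", new Source(50, 100, set("Z"), set()));
        task.put("Z", new Source(200, 100, set(), set())); // modified after last compile
        return task;
    }

    public static void main(String[] args) {
        System.out.println(outOfDate(sameTaskExample(), new HashMap<String, Long>()));
        // prints [X, Y, Z]
    }
}
```

When the three files live in separate javac tasks, each task would be checked
on its own with classDeps in place of sameTaskDeps, so the cascade stops as
soon as a recompiled class keeps the same "interface".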