Incremental java compile AKA javac print compile dependencies
Joshua Maurice
joshuamaurice at gmail.com
Wed May 26 13:35:14 PDT 2010
On Wed, May 26, 2010 at 8:10 AM, Jonathan Gibbons <
jonathan.gibbons at oracle.com> wrote:
> On 05/25/2010 06:38 PM, Joshua Maurice wrote:
>
> On Tue, May 25, 2010 at 6:01 PM, Jonathan Gibbons <
> jonathan.gibbons at oracle.com> wrote:
>
>> On 05/25/2010 05:11 PM, Joshua Maurice wrote:
>>
>>>
>>>
>>> What is relevant is that to get decent levels of incrementality, i.e.
>>> skipping unnecessary rebuilds, the build system needs to know, for each
>>> Java file X, the full list of class files and java files which will be
>>> directly used by javac when compiling X. Whenever any of those direct
>>> compile dependencies has an "interface" / "signature" change, X needs to
>>> be recompiled.
>>>
>>
>> Stop right there. There's likely a wrong assumption here, hidden in the
>> word "directly".
>>
>> If you start from scratch, with no classes precompiled, when you compile
>> X, javac will pull in from the sourcepath the transitive closure of X and
>> all its dependencies. Thus if X refers to Y, and if the implementation of
>> Y refers to Z, then javac will compile X, Y, and Z, even though there is
>> no direct reference in any way from X to Z. This is why your proposed
>> technique of tracking -verbose output will not work.
>>
>
> What? For starters, I'm specifically planning not to use the -sourcepath
> option. Suppose a user touches X only, and nothing else depends on X, as in
> your example, and I want to recompile only X.java. However, if I give the
> -sourcepath option, then as you note, javac will recompile X, Y, and Z, but
> recompiling Y and Z is wasted work.
>
>
>
>
> If you have mutually referential class files (X refers to Y, Y refers to X)
> then you either need -sourcepath or you need to compile X and Y together.
> Most large code bases will have such mutually referential classes. If
> you're talking about a general purpose build tool, you need to deal with
> this case. If you're only dealing with a specific code base that does not
> have mutually referential classes, that's a whole lot less of an interesting
> problem.
>
> Mutually referential class files are what make it difficult to identify the
> set of dependencies for each individual class. Unless you look at the class
> file, or get into the compiler and look at the AST, you'll likely end up
> finding the set of dependencies only for groups or cycles of classes, and
> those groups can sometimes be surprisingly large.
>
So, let me try again. I don't know how the rest of the world does it, but my
company separates its java files into distinct groupings, separate "javac
tasks". Currently we use Maven, and one of our products has over 800
pom.xmls, roughly 400 of which are jar-type pom.xmls, for a combined total
of around 25,000 java files (in addition to other kinds of source files,
such as C++). So there will be roughly 400 calls to javac in a full clean
rebuild. There may be, and probably are, classes with circular dependencies
in this build, but the only cycles are intra-pom.xml, never between pom.xmls.
My proposed solution is to build a dependency graph of these pom.xmls, these
"javac units", and traverse them in dependency order, parallelizing the
traversal as far as the pom.xml aka "javac unit" dependencies allow. For each
"javac unit", the system will determine which files are out of date using the
above rules, and cascade endlessly down the java / class dependencies in that
"javac unit" to get a list of "out of date" java files. It will then invoke
javac once on those "out of date" java files in that "javac task".
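
To make that traversal concrete, here is a minimal sketch of the scheduling
loop I have in mind. Everything in it (JavacUnit, directDeps, isOutOfDate,
runJavac) is a hypothetical placeholder of my own, not an existing Maven or
javac API, and a real implementation would hand independent units to a
thread pool rather than the plain depth-first walk shown here:

    import java.util.*;

    public class IncrementalScheduler {

        // Hypothetical stand-in for one "javac unit" / jar-type pom.xml.
        static class JavacUnit {
            final String name;
            final List<JavacUnit> dependencies = new ArrayList<JavacUnit>();
            // For each java file in this unit: the java/class files javac
            // used directly the last time it was compiled (from -verbose).
            final Map<String, Set<String>> directDeps =
                    new HashMap<String, Set<String>>();
            JavacUnit(String name) { this.name = name; }
        }

        // Walk the units in dependency order.
        static void build(Collection<JavacUnit> units) {
            Set<JavacUnit> done = new HashSet<JavacUnit>();
            for (JavacUnit u : units)
                buildUnit(u, done);
        }

        static void buildUnit(JavacUnit unit, Set<JavacUnit> done) {
            if (!done.add(unit))
                return;
            for (JavacUnit dep : unit.dependencies)
                buildUnit(dep, done);

            // Seed with files that are out of date on their own, then cascade
            // over the intra-unit dependency edges until the set stops growing.
            Set<String> stale = new HashSet<String>();
            for (String file : unit.directDeps.keySet())
                if (isOutOfDate(file))
                    stale.add(file);
            boolean grew = true;
            while (grew) {
                grew = false;
                for (Map.Entry<String, Set<String>> e : unit.directDeps.entrySet())
                    if (!stale.contains(e.getKey())
                            && !Collections.disjoint(e.getValue(), stale))
                        grew |= stale.add(e.getKey());
            }

            if (!stale.isEmpty())
                runJavac(unit, stale);   // exactly one javac call per unit
        }

        // Placeholders for the real timestamp / "interface" comparison and
        // for the actual javac invocation.
        static boolean isOutOfDate(String javaFile) { return false; }
        static void runJavac(JavacUnit unit, Set<String> staleFiles) { }
    }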
(Note that this could result in additional java files being "out of date" in
that "javac task" due to Ghost Dependencies. I'll do another "out of date"
check after the first javac invocation, and if there are any new "out of
date" java files, I'll just do a clean build of that "javac task".)
Specifically, inside a "javac task" I will cascade down the class / java file
dependencies with no cutoff at all. Those java files will then be rebuilt. I
will then move on to the next "javac task" and apply the same rules again.
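
Reusing the hypothetical JavacUnit and runJavac placeholders from the sketch
above, the Ghost Dependency guard would look roughly like this; again it is
only an illustration of the re-check, not working build-tool code, and
collectStaleFiles / cleanBuild are names I made up for the seed-and-cascade
loop and for a from-scratch rebuild of the unit:

    // Second "out of date" pass after the first javac call; if it turns up
    // anything new (Ghost Dependencies), fall back to a clean build of the
    // whole unit.
    static void compileUnit(JavacUnit unit) {
        Set<String> stale = collectStaleFiles(unit);
        if (stale.isEmpty())
            return;
        runJavac(unit, stale);                 // first javac invocation

        Set<String> staleAfter = collectStaleFiles(unit);
        if (!staleAfter.isEmpty())
            cleanBuild(unit);                  // give up and rebuild the unit
    }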
Hopefully this is a bit clearer. To reiterate, my educated guess is that this
is a good tradeoff between parallelization, skipping unnecessary recompiles,
and avoiding the overhead of calling javac many times. I have gotten quite
promising results on my current code base using this approach. I have my
experimental setup working on roughly 100 of the product's pom.xmls at the
moment. That's how I was able to measure that ~85% of the time in a full
clean rebuild was spent calling javac multiple times with -verbose to get the
dependency information. Even with that overhead, I have achieved roughly 3x
faster wall-clock build times for a full clean rebuild on a decent 4-core
machine than with our current Maven solution.
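
For what it's worth, here is roughly how the -verbose harvesting looks. The
-verbose output is not a documented interface and its exact format differs
between JDK versions, so the "[loading ...]" / "[parsing started ...]"
pattern below is an assumption you would need to adjust for your javac, and
VerboseDeps is just a name of my own:

    import java.io.ByteArrayOutputStream;
    import java.util.*;
    import java.util.regex.*;
    import javax.tools.JavaCompiler;
    import javax.tools.ToolProvider;

    public class VerboseDeps {
        // Assumed shape of the interesting -verbose lines; adjust per JDK.
        private static final Pattern DEP_LINE =
                Pattern.compile("^\\[(?:loading|parsing started)\\s+(.*)\\]$");

        // Compile the given .java files with -verbose and return every file
        // javac reported loading or parsing, i.e. the direct compile
        // dependencies of this batch.
        public static Set<String> compileAndCollect(List<String> javaFiles) {
            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            List<String> args = new ArrayList<String>();
            args.add("-verbose");
            args.addAll(javaFiles);

            // Capture both streams; where the -verbose chatter lands varies.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ByteArrayOutputStream err = new ByteArrayOutputStream();
            javac.run(null, out, err, args.toArray(new String[0]));

            Set<String> deps = new TreeSet<String>();
            String text = out.toString() + "\n" + err.toString();
            for (String line : text.split("\\r?\\n")) {
                Matcher m = DEP_LINE.matcher(line.trim());
                if (m.matches())
                    deps.add(m.group(1));   // a .java or .class javac touched
            }
            return deps;
        }
    }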