Incremental java compile AKA javac print compile dependencies

Joshua Maurice joshuamaurice at gmail.com
Tue May 25 17:11:21 PDT 2010


On Tue, May 25, 2010 at 7:43 AM, Jonathan Gibbons <
jonathan.gibbons at oracle.com> wrote:

>  On 05/24/2010 11:47 PM, Joshua Maurice wrote:
>
> Thank you. I will ask that group.
>
> However, your suggestion of the class files doesn't quite fit what I want
> to do. Using the information from class files, (ignoring the constant
> problem), (without great modification), will only work if the build cascades
> endlessly downstream. I am specifically trying to make a system where the
> cascade will terminate when no further recompiles downstream are required.
> To this end, the information in the class files is woefully insufficient.
> Ex: C extends B. B extends A. A declares member field x. Class D uses C.x.
> I'm pretty sure that in D's class file, A and B are not mentioned. Some
> change to B could introduce a field named x, "hiding" A.x, potentially
> changing the meaning and compile of D, potentially even resulting in a clean
> build failure, but this "incremental" build would silently call a success.
>
> Also, the JavaFileManager idea will not work for the same reason that
> -verbose as is will not work. I need / want this information on a per java
> file basis, but javac "caches" referenced class files, only loading them
> once, so neither will give me what I need.
>
>
> By analysing the contents of a class file you can determine what accessible
> aspects of any class each class depends on, and that should be sufficient to
> solve the dependency analysis problem.  You do /not/ want to look at the
> verbose output because it will tell you too much irrelevant info.  For
> example, if class A calls method print in class B, all that matters is that
> A calls B.print -- it does not matter how B.print is implemented -- whether
> it uses System.err.println, or java.util.logging or CORBA.    Analyzing
> -verbose output will tell you all the implementation dependencies; you only
> need the "API signature" dependencies.
>
> Furthermore, by computing an MD5 checksum for the API signature of each
> classfile, you can determine whether a classfile's API signature has been
> changed by a recompilation.   If the checksum is changed after a
> compilation, you need to recompile dependent classfiles. If the checksum has
> not changed, you don't need to recompile dependent classfiles. So if you
> just edit comments in your file, or if you just change use of
> System.err.println to System.out.println, then you don't need to recompile
> any of the classes that depend on the recompiled class.  And that's the holy
> grail of incremental compilation.
>
>
>
>
> PS: Is there a non-arrogant way to suggest reading the
> comp.lang.java.programmer discussion, as it contains most of this discussion
> already, without sounding like a self absorbed jerk who thinks his time is
> more important than everyone else's? Probably not. I'll ask compiler-dev.
>
>  Very perceptive.
>
>
> On Mon, May 24, 2010 at 9:29 PM, Jonathan Gibbons <
> jonathan.gibbons at oracle.com> wrote:
>
>> Joshua,
>>
>> This question is better asked on compiler-dev at openjdk.java.net.
>>
>> There are various ways you can get the information you are asking, in a
>> single invocation of the compiler and without changing the compiler.
>>
>> One way would be to invoke the compiler via the JSR 199 API. You can
>> provide your own JavaFileManager which can keep track of files read and
>> written by the compiler. For the class files that are written, you can
>> analyze the class files to determine all the classes that are referenced by
>> that class file. Class files also identify the source files that they come
>> from. That should be enough to get you almost all the information that you
>> need. The only information you'll miss using that technique is inlined
>> constants.
>>
>> -- Jon G
>>
>>
>>
>>
>> On 05/24/2010 07:23 PM, Joshua Maurice wrote:
>>
>>> I'm sorry. I'm new to this list, and I'm not sure if this is the
>>> appropriate forum. I was looking at the javac source code, and I am now
>>> looking for help or guidance. Specifically, I want to create an
>>> incrementally correct build system for java, preferably without having
>>> to (re)write a Java compiler. I need some additional information to be
>>> printed from javac to get full dependency information. Please see:
>>>   http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4639384
>>>   http://groups.google.com/group/comp.lang.java.programmer/browse_thread/thread/bb6f663d55700951#
>>> for a detailed discussion. The short version is that -verbose output is
>>> almost what I need. I need to know the full list of class files loaded
>>> for each java file being compiled, and the full list of java files in
>>> the compile used by each java file being compiled; that is, the full
>>> list of java file and class file compile dependencies of each java file
>>> in the compile. One can already get this information by calling javac
>>> once on the set of interested java files to do the compile, then an
>>> additional javac -verbose once per java file to get the dependency
>>> information.
>>>
>>> However, calling javac so many times is quite slow, and unnecessary. I
>>> would like to enhance javac to print out this dependency information
>>> with one invocation aka without resorting to calling javac once per java
>>> file, and preferably get this change into the actual official released
>>> javac.
>>>
>>> PS: Yes this information on its own is insufficient to do an
>>> incrementally correct java compile. However, when combined with Ghost
>>> Dependencies (please see:
>>>   http://www.jot.fm/issues/issue_2004_12/article4.pdf
>>> ), I think that this would actually work. I have some tests to this
>>> effect already with my proposed system which currently calls javac once
>>> per java file to get the compile dependency information.
>>>
So, do we top post or bottom post here? Oh well, I'll try to stick to
bottom posting.

To start from scratch, my goal is to create a fully incrementally correct
build system, where a "correct" build is defined as a build which produces
output "equivalent" to that of a full rebuild from a completely pristine
source code "view" / checkout, and an incrementally correct build is defined
as a correct build which skips some unnecessary rebuilds. The less
unnecessary work done, the more "incremental" the build is.

I am quite unwilling to compromise on incremental correctness over these
common developer actions:
1- add task "jar from class folder"
2- remove task "jar from class folder"
3- modify task "jar from class folder", such as changing the jar name or
changing the class folder
4- add task "compile java files from source folders to class folder"
5- remove task "compile java files from source folders to class folder"
6- modify task "compile java files from source folders to class folder",
such as changing the compile classpath,
7- add java file
8- remove java file
9- modify java file

To do this, one needs to keep track of all expected files in the output
folder(s), remove unexpected / stale files in the output folder(s), and
handle various other minutiae which are not relevant here.

What is relevant is that to get a decent level of incrementality, aka
skipping unnecessary rebuilds, the build system needs to know, for each java
file X, the full list of class files and java files which will be directly
used by javac when compiling X. Whenever any of those direct compile
dependencies has an "interface" / "signature" change, X needs to be
recompiled.
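
One way to detect an "interface" / "signature" change is the checksum idea
quoted above. The following is only a minimal sketch under several
assumptions: the class name ApiSignature is hypothetical, it inspects an
already loaded class through reflection (a real build tool would more likely
parse the class file without loading it), and it treats every non-private
member as part of the "interface".

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.lang.reflect.Type;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: MD5 checksum over the non-private declarations of a class. */
final class ApiSignature {
    static String digest(Class<?> type) {
        List<String> lines = new ArrayList<>();
        lines.add("type " + Modifier.toString(type.getModifiers()) + " " + type.getName());
        lines.add("extends " + type.getGenericSuperclass());
        for (Type iface : type.getGenericInterfaces()) {
            lines.add("implements " + iface.getTypeName());
        }
        for (Field f : type.getDeclaredFields()) {
            if (!Modifier.isPrivate(f.getModifiers())) lines.add("field " + f.toGenericString());
        }
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            if (!Modifier.isPrivate(c.getModifiers())) lines.add("ctor " + c.toGenericString());
        }
        for (Method m : type.getDeclaredMethods()) {
            if (!Modifier.isPrivate(m.getModifiers())) lines.add("method " + m.toGenericString());
        }
        // Reflection does not guarantee member order, so sort for a stable checksum.
        Collections.sort(lines);
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (String line : lines) {
                md5.update(line.getBytes(StandardCharsets.UTF_8));
                md5.update((byte) '\n');
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }
}
```

If ApiSignature.digest(X.class) differs before and after a recompile, the
"interface changed" time of X would be bumped. Note that this sketch does
not include the values of constant variable fields, so on its own it does
not cover the constant problem.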

This is why I very much want to use javac itself to print out this list. I
am somewhat afraid that any other method will have corner cases which will
miss dependencies that javac -verbose would catch. Ex: constant variable
fields. Can anyone list with certainty all the corner cases?

The newest idea I have to get this list is as follows. I am not convinced of
its accuracy.

1- A java file's compilation is out of date when its source file has been
modified since the last compilation.

2- A java file's compilation is out of date when it has a newer Ghost
Dependency, see paper: www.jot.fm/issues/issue_2004_12/article4.pdf

3- A java file's compilation is out of date when one of its output class
files has a reference to a type
3a- whose class file has a last "interface changed" time which is newer than
the java file's last compilation,
3b- or which is in an output class file of an "out of date" java file which
is part of this javac task,
3c- or which has a super type whose class has a last "interface changed"
time which is newer than the java file's last compilation,
3d- or which has a super type which is in an output class file of an "out of
date" java file which is part of this javac task.

4a- A java file's compilation is out of date when
- it has a potentially used constant variable field simple name X (which is
basically any simple name of any name in the source),
- and there is a class file on the compile classpath which "exports" a
constant variable field which has simple name X,
- and the "exported" constant variable field has a "last changed" time which
is newer than the java file's last compilation.
4b- A java file's compilation is out of date when
- it has a potentially used constant variable field simple name X (which is
basically any simple name of any name in the source),
- and there is an "out of date" java file in this javac task which has a
class file which "exports" a constant variable field which has simple name
X.

The reasons for rule 1 should be obvious.

The reasons for rule 2 are clearly explained in the paper.

The reasons for rules 3a and 3b should be equally obvious.

However, we also need rules 3c and 3d to be fully correct. Ex:
  public class aa { int x; }
  public class bb extends aa {}
  public class cc { int x = new bb().x; }
In this example, the class file of "cc" contains no reference to "aa". The
member "x" of "bb" is declared in "aa.java". A change to "aa.java" will
produce a class file "aa.class" which has a new last "interface changed"
time. This will trigger a rebuild of "bb.java", but the class file
"bb.class" will be binary equivalent, so no rebuild will occur for "cc.java"
without rules 3c and 3d.
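
Rules 3a through 3d can be sketched as a walk over the super types of each
referenced type. This is only an illustration under assumptions: the names
SuperTypeRule and TypeInfo are hypothetical, times are plain long values,
and "super type" is taken to mean every transitive super class and super
interface.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of rules 3a-3d for one type referenced by a class file. */
final class SuperTypeRule {
    /** What the build system is assumed to record about one type. */
    static final class TypeInfo {
        final List<String> superTypes;  // direct super class and super interfaces
        final long interfaceChangedAt;  // last "interface changed" time of its class file
        final String ownerSource;       // java file in this javac task, or null if from the classpath

        TypeInfo(List<String> superTypes, long interfaceChangedAt, String ownerSource) {
            this.superTypes = superTypes;
            this.interfaceChangedAt = interfaceChangedAt;
            this.ownerSource = ownerSource;
        }
    }

    /** True if the referenced type, or any of its super types, forces a recompile. */
    static boolean triggersRebuild(String referencedType, long lastCompiledAt,
                                   Map<String, TypeInfo> types, Set<String> outOfDateSources) {
        Deque<String> pending = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        pending.add(referencedType);
        while (!pending.isEmpty()) {
            String name = pending.remove();
            if (!seen.add(name)) continue;          // already visited
            TypeInfo info = types.get(name);
            if (info == null) continue;             // untracked type
            if (info.interfaceChangedAt > lastCompiledAt) return true;   // rules 3a and 3c
            if (info.ownerSource != null && outOfDateSources.contains(info.ownerSource)) {
                return true;                                             // rules 3b and 3d
            }
            pending.addAll(info.superTypes);
        }
        return false;
    }
}
```

With the example above, "cc" references "bb", the walk reaches "aa" through
"bb", and the newer "interface changed" time of "aa" marks "cc.java" as out
of date even though "bb.class" is unchanged.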

Rules 4a and 4b are needed to handle the exception of constant variable
fields as this dependency information does not make it into the class files.
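
A rough sketch of rule 4a follows, assuming the build system keeps a map
from the simple name of each "exported" constant variable field to its "last
changed" time. The names ConstantNameRule and exportedConstants are
hypothetical. The identifier scan deliberately over-approximates: it also
picks up keywords and names inside comments and string literals, which only
costs extra rebuilds. It does not decode Unicode escapes in the source,
which is one corner case where it could miss a name.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of rule 4a: simple names versus exported constants. */
final class ConstantNameRule {
    // Approximation of a Java identifier; good enough for a conservative scan.
    private static final Pattern IDENTIFIER = Pattern.compile("[\\p{L}_$][\\p{L}\\p{N}_$]*");

    /** Collects every identifier-like token in the source text. */
    static Set<String> simpleNames(String source) {
        Set<String> names = new HashSet<>();
        Matcher m = IDENTIFIER.matcher(source);
        while (m.find()) {
            names.add(m.group());
        }
        return names;
    }

    /**
     * True if some name used in the source matches an exported constant whose
     * "last changed" time is newer than the last compilation of the source.
     */
    static boolean outOfDate(String source, long lastCompiledAt, Map<String, Long> exportedConstants) {
        for (String name : simpleNames(source)) {
            Long changedAt = exportedConstants.get(name);
            if (changedAt != null && changedAt > lastCompiledAt) {
                return true;
            }
        }
        return false;
    }
}
```

Rule 4b would use the same name scan, but compare against the constants
exported by the "out of date" java files of the current javac task instead
of comparing times.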


Note that a change to a java file in a javac task might result in a new
dependency between that java file and another java file in that same javac
task, so finding all "out of date" java files in a javac task and compiling
those might result in more "out of date" java files. The fix is simply to
rerun the "out of date" logic to see if there are more "out of date" java
files, at which point I'll probably just rebuild the whole task.
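
The rerun step could look roughly like the sketch below. The Task interface
is a hypothetical stand-in for one javac task; findOutOfDate is assumed to
apply the rules above to the current state on each call.

```java
import java.util.Set;

/** Hypothetical sketch of the "compile, recheck, then rebuild everything" loop. */
final class RebuildLoop {
    /** Stand-in for one "compile java files to class folder" task. */
    interface Task {
        Set<String> allSources();
        Set<String> findOutOfDate();        // applies the "out of date" rules
        void compile(Set<String> sources);  // one javac invocation
    }

    /** Returns the number of javac invocations that were needed (0, 1 or 2). */
    static int rebuild(Task task) {
        Set<String> first = task.findOutOfDate();
        if (first.isEmpty()) {
            return 0;                        // everything is up to date
        }
        task.compile(first);
        Set<String> second = task.findOutOfDate();
        if (second.isEmpty()) {
            return 1;                        // the first pass was enough
        }
        // New dependencies surfaced after the first pass: fall back to a
        // full rebuild of the task rather than iterating further.
        task.compile(task.allSources());
        return 2;
    }
}
```

Falling back to a full task rebuild after the second check bounds the work
at two javac invocations per task, at the cost of some incrementality.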

Having thought this through, this seems a bit more doable than when I gave
it a passing glance in the comp.lang.java.programmer discussion. This could
work. Rules 4a and 4b could hurt the level of incrementality quite a bit,
depending on the actual names being used in the source code. I suppose it
wouldn't be horrifically difficult to change the names in most circumstances
to not conflict though.

I'm still back to my last "objection", that I'm not knowledgeable enough to
know if this catches all possible corner cases. Does anyone know any
possible cases which this scheme would not catch? I suppose I'll implement
and start testing this.