Recording source information in a build

Thu Apr 3 18:36:19 UTC 2008

Problem Statement:

   Given a build of the OpenJDK, how can you find out exactly which sources were
     used to build that binary install?

Seed of a Solution:

   With Mercurial, a single changeset ID identifies the state of the complete
     source repository. If this changeset (or set of changesets) could somehow be
     recorded with the built bits, then given any build you could quickly and easily
     reconstruct the exact source files that were used at build time.

Problems:

   We have a forest, not a single repository.
   We often create source bundles (the sources minus the SCM data, e.g. ".hg"),
     so this needs to work when building from source bundles too.

Possible Solution:

   The first issue is identifying each repository of the forest relative to the root of the forest.
     So each repository would get a managed file ".identification" containing
       information that identifies the repository.
       For example, the top-level OpenJDK repository would have a ".identification" file containing:
         root=.
         directory=.
         description=Root of the JDK Source Tree
       and the corba one would have:
         root=..
         directory=corba
         description=Corba Sources
       etc. (the directory could be a deeper nested directory, like jdk/src/closed)
     This .identification file would be a permanent file at the root of the
       repository. It says that to get from the repository to the root of the forest,
       you 'cd ${root}'. And if this repository is not located at ${root}/${directory},
       something is wrong, or the repository is not currently part of a forest.
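     To make the "something is wrong" check concrete, here is a minimal shell
       sketch, assuming a POSIX sh and the key=value format shown above. The
       function name check_identification is hypothetical; it follows root= from
       the repository up to the forest root, then directory= back down, and
       complains if that does not land on the same directory.

```sh
# Hypothetical helper (sketch): verify that the repository in directory $1
# is located where its .identification file says, relative to the forest root.
# Returns 0 if it is, otherwise prints a message to stderr and returns 1.
check_identification() {
    repo=${1:-.}
    idfile="$repo/.identification"
    if [ ! -f "$idfile" ]; then
        echo "$repo: no .identification file" >&2
        return 1
    fi
    # Extract the root= and directory= values from the key=value file.
    root=$(sed -n 's/^root=//p' "$idfile")
    directory=$(sed -n 's/^directory=//p' "$idfile")
    if [ -z "$root" ] || [ -z "$directory" ]; then
        echo "$repo: .identification lacks root= or directory=" >&2
        return 1
    fi
    # Physical path of the repository itself.
    here=$(cd "$repo" && pwd -P) || return 1
    # Go up to the forest root via root=, then back down via directory=.
    # If either cd fails, 'there' ends up empty and the comparison fails.
    there=$(cd "$repo/$root" 2>/dev/null && cd "./$directory" 2>/dev/null && pwd -P)
    if [ "$here" != "$there" ]; then
        echo "$repo: not at $root/$directory; not part of a forest?" >&2
        return 1
    fi
    return 0
}
```

     For example, 'check_identification corba' run in the forest root succeeds,
       while the same check on a copy of the corba repository placed somewhere
       else returns non-zero with a message on stderr.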

   The second issue is the changeset ID.
     A second file, ".changeset", would not be a managed file. It would be created
       before the source bundles are created, and would not exist at all when it can't
       be created, because there are no repositories (building from raw source trees)
       or 'hg' isn't available. Each such file would contain just a changeset=id line,
       created with:
          hg tip --template 'changeset={node}\n'
       So somewhere, before the source bundles are created and before this data is
         used, something like this needs to happen:
         TREES:=$(shell hg ftrees)
         # Illustrative target name; run before creating source bundles.
         changeset_files:
                 if [ "$(TREES)" != "" ] ; then \
                   for i in $(TREES) ; do \
                     ( cd $$i && hg tip --template 'changeset={node}\n' > .changeset ) ; \
                   done ; \
                 fi
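     One wrinkle with the loop above: if 'hg tip' fails inside a repository, the
       redirection still leaves an empty .changeset behind. A per-repository sketch
       that keeps the file non-existent (or untouched) unless 'hg tip' succeeds might
       look like this. The name write_changeset is hypothetical, and checking for a
       .hg directory is an assumption about how to tell a real repository from a
       source bundle.

```sh
# Hypothetical helper (sketch): record the tip changeset of the repository in
# directory $1 as $1/.changeset, but only when that actually works.
# - No 'hg' on the PATH, or no .hg directory (e.g. a source bundle):
#   do nothing, so a .changeset recorded earlier is kept as-is.
# - 'hg tip' fails: leave any existing .changeset untouched, and never leave
#   an empty or partial file behind.
write_changeset() {
    repo=${1:-.}
    command -v hg >/dev/null 2>&1 || return 0
    [ -d "$repo/.hg" ] || return 0
    tmpfile="$repo/.changeset.tmp.$$"
    # Write to a temporary file first; move it into place only on success.
    if (cd "$repo" && hg tip --template 'changeset={node}\n') > "$tmpfile" 2>/dev/null; then
        mv "$tmpfile" "$repo/.changeset"
    else
        rm -f "$tmpfile"
    fi
}
```

     Calling it once per entry of $(TREES), in place of the cd/hg tip line, keeps
       the loop otherwise unchanged. Leaving an existing .changeset alone when there
       is no .hg is what lets a source bundle carry the changeset recorded when the
       bundle was made.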

   Third, all this data needs to be merged into a single file that could later be
     used to recreate the source tree by running:
        hg clone -r ${changeset} http://hg.openjdk.java.net/jdk7/${directory} ${directory}
     once for each repository listed in it.
     The Makefiles would check for the .changeset files and tolerate their absence
       wherever they are used, since they won't be there in all cases. But when they
       are there, do something like:
         jdk_source_information.txt:
                 $(RM) $@
                 echo "# JDK Source Information" > $@
                 if [ "$(TREES)" != "" ] ; then \
                   for i in $(TREES) ; do \
                     if [ -f $${i}/.identification ] ; then \
                       cat $${i}/.identification >> $@ ; \
                       if [ -f $${i}/.changeset ] ; then \
                         cat $${i}/.changeset >> $@ ; \
                       fi ; \
                     fi ; \
                   done ; \
                 fi
        This results in a file like:
          # JDK Source Information
          root=.
          directory=.
          description=Root of the JDK Source Tree
          changeset=BIGHEXNUMBER
          root=..
          directory=corba
          description=Corba Sources
          changeset=BIGHEXNUMBER
          ...
        This file would be left in the JDK install tree.
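     Going the other way, the recorded file can be turned back into the clone
       commands from the third step. A minimal sketch, assuming the record layout
       shown above (each repository's entry starts with root=, and changeset= may be
       missing). The names print_clone_commands and _flush_record are hypothetical,
       and the URL prefix is just the one used above, passed as an optional second
       argument. It only prints the commands, so they can be reviewed before running.

```sh
# Hypothetical helper (sketch): print one 'hg clone' command per repository
# recorded in a jdk_source_information.txt file ($1). Assumes each entry
# starts with a root= line and may or may not have a changeset= line.
# $2 optionally overrides the URL prefix (default taken from the text above).
# Commands are printed, not executed.
_flush_record() {
    if [ -n "$directory" ] && [ -n "$changeset" ]; then
        echo "hg clone -r $changeset $base/$directory $directory"
    elif [ -n "$directory" ]; then
        echo "# $directory: no changeset recorded"
    fi
    directory= changeset=
}

print_clone_commands() {
    info=$1
    base=${2:-http://hg.openjdk.java.net/jdk7}
    directory= changeset=
    # '|| [ -n "$line" ]' also handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            root=*)
                # A new repository entry begins; emit the previous one.
                _flush_record ;;
            directory=*)
                directory=${line#directory=} ;;
            changeset=*)
                changeset=${line#changeset=} ;;
        esac
    done < "$info"
    _flush_record
}
```

     For example, 'print_clone_commands jdk_source_information.txt > reclone.sh'.
       For the root entry (directory=.) the formula yields a URL ending in '/.', so
       the root repository's actual URL may need to be substituted by hand.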

---
Just a first guess at a basic idea as to how this could work...

Please don't take the above as an implementation; it just sketches the basic
ideas: having members of the forest identify themselves, recording the
changesets, and leaving source information in the resulting binary build.

Comments?

-kto

P.S. Full RFE can be seen at: http://bugs.sun.com/view_bug.do?bug_id=6631003