JDK8 Preliminary Repository Layout

Fri Mar 11 18:07:35 UTC 2011

On Mar 11, 2011, at 2:11 AM, Steve Poole wrote:

> Kelly - can you explain for us newbies why you have separate repositories?  I'm sure I can list any number of reasons but it would be good to get your view.   It may sound like a dumb question but it does help in these sort of discussions to know some of the history :-)
> 

This is probably tainted, but I will try to provide an honest view, with some humor thrown in. ;^)

Prior to Mercurial, we used the Sun product Teamware and we had separate workspaces (Teamware's term for
a repository) for: control, hotspot, j2se, deploy, install, etc. (deploy and install were Sun plugins & installers).
So that set a pattern. Teamware basically managed SCCS files, so as a workspace grew, it did not scale well,
and Teamware relied on NFS access to share these files (80,000+ files, when you count SCCS s.* and * files).
So this separation initially, in my view, was done for developer productivity.
I don't have any history on why the other workspaces existed as separate workspaces, but I just assume
it was for the same reasons as hotspot: nobody wanted to be part of the big j2se gorilla in the room,
and having your own workspace created more of a separate silo for that team to work in, I suspect.
The control workspace was a small batch of makefiles that built all the workspaces, used by Release
Engineering mostly.

Note that Teamware allowed for partial workspaces. Since it was only managing SCCS and individual
file edits, you could trim a Teamware workspace down to just the directories you were working in, and
still sync and push with subset workspaces. This flexibility was taken advantage of by the j2se team
to minimize the NFS traffic and improve productivity too. Mercurial doesn't allow for subset repositories.

The hotspot team found that their smaller 5,000 file workspace was easier to deal with, and in fact
the VM was a natural interface boundary, easy to isolate, controlled APIs, pre-built VMs could be
dropped into a JDK, testing/experiments were easy. Hotspot was also mostly C++ and native code.
Later, a "Hotspot Express" delivery model was possible so that the same sources could be delivered
to completely separate JDK releases.
The hotspot developers were happy, well, as happy as a hotspot developer can be I suppose ;^)
(The Serviceability Agent or SA was developed by the hotspot team and was/is very tightly integrated
with hotspot, so it became part of hotspot, not the j2se).

The j2se workspace was much larger, maybe 35,000 source files, and it initially included all the sources from the
corba, jaxp, jaxws, and langtools repositories that exist now.
This j2se workspace was very hard to deal with and many of the sources were copy&paste from other projects
that weren't even managed by the JDK team; new deliveries created lost-fix situations and an unreliable state.
The build process was complicated because part of the workspace had the javac sources, which had to be built
first, and then that was used to build the sources all over again.

So just prior to OpenJDK, or about then, we decided to try and split up the j2se workspace to better manage our build
and source importing issues. The corba, jaxp, and jaxws workspaces were created and those files were pulled
from the j2se workspace, as were the javac and "language tools" sources, which went into a langtools workspace.
The j2se workspace was then renamed "jdk".

That gave us the workspaces: corba, jaxp, jaxws, langtools, jdk, hotspot, ...

These Teamware workspaces eventually became what you see today as the openjdk7 Mercurial repositories,
but we had to push some files down into smaller closed repositories: src/closed, test/closed, and make/closed
for jdk, and src/closed, test/closed, and build/closed for hotspot. The fact that hotspot had managed sources
in a build directory was a thorn in our sides for a while and it was eventually removed along with build/closed.
Makefile logic is pretty much 100% open right now.

I'm not sure that the open sourcing influenced this, but note that corba, jaxp, jaxws, and langtools are pure
open source, and 100% Java (except for one .c file in corba initially). Managing pure open Java projects is a
joy if you ask me. ;^)

For langtools, the team wanted this separate repository and lobbied hard for it as a productivity aid and also to allow
them to use the NetBeans IDE on just their sources (NetBeans and some IDEs had a hard time swallowing the entire
j2se sources), but they also needed to try and ship a separate javac product somewhere, I forget the details.
Maybe some work with some outside developers, Jonathan Gibbons would remember.
I'm sure if you asked him, there is no way they would want back into a larger repository.

The corba sources haven't changed much since then, apart from makefile changes and the removal of all native code.
Originally, we wanted an ant script for a faster build and to allow for NetBeans/IDE use as it became pure
Java. That hasn't happened. We keep thinking that these sources should be updated with newer Corba
sources and use whatever build process the J2EE Corba team has. Not sure what the plans are here.

The jaxp and jaxws repositories got the source drop model and the sources originally managed were deleted
in favor of source drops from these teams, where they manage the master sources for these products that also
ship in other forms in other products. This is still a work in progress in terms of finding the best way to
manage this. We need the sources (can't just get class/jar drops) so that we can build classes with -target 7,
but changes really need to go through these teams so they can be managed properly.

Mercurial's changeset model and the need for "merge changesets" when two changesets were created from
the same parent changeset is another aspect to this. Many teams that changed from a file based management
system to Mercurial have encountered "merge mania"; the NetBeans team ran into this.
It's an issue with too many developers trying to push changes into a single large repository.
You can't push a changeset into Mercurial unless you have done a pull and sync'd up with the latest changesets
in the repository. If there are frequent pushes going on, either from too much activity or too many developers,
someone may experience a:
   hg push    # fails because you need to do a pull ("too many heads" message)
   hg pull -u && hg merge && hg commit -m Merge    #  Or hg fetch
   hg push   # fails because you took too long and someone else pushed a new one
   hg pull -u && hg merge && hg commit -m Merge    #  Or hg fetch
   hg push   # fails because you took too long and someone else pushed a new one
   ...
This is minimized by reducing the "fan in", smaller repositories, fewer developers pushing into the same
repository, etc. Our team forests minimize this, and our separate repositories minimize this.
Now some people might say this is a flaw in Mercurial, and I disagree.
By having one "tip" and explicit merge changesets, the sources have a singular state: with one simple
changeset ID, you know the state of all 20,000 files in the jdk repository.
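
For illustration only, the push/pull/merge race shown above could be wrapped in a small retry loop.
This is a sketch, not something from our actual tooling: the function name push_with_retry and the
HG and MAX_TRIES variables are made up for this example, and it assumes you have outgoing changesets
and that the merge resolves without conflicts.

```shell
#!/bin/sh
# Sketch only: retry "push; on failure pull, merge, commit" until the push wins.
# HG and MAX_TRIES are hypothetical knobs introduced for this example.
HG=${HG:-hg}
MAX_TRIES=${MAX_TRIES:-5}

push_with_retry() {
    try=1
    while [ "$try" -le "$MAX_TRIES" ]; do
        # The push is refused if someone else pushed since our last pull.
        # Note: hg push also exits non-zero when there is nothing to push,
        # so only call this when you have outgoing changesets.
        if "$HG" push; then
            echo "pushed after $try attempt(s)"
            return 0
        fi
        # Same recovery steps as in the list above; stop if any of them fails
        # (for example, a merge conflict that needs a human).
        "$HG" pull -u || return 1
        "$HG" merge || return 1
        "$HG" commit -m Merge || return 1
        try=$((try + 1))
    done
    echo "gave up after $MAX_TRIES attempts" >&2
    return 1
}
```

Source the file and run push_with_retry from inside the repository. The more developers pushing into
the same repository, the more trips around this loop you should expect, which is the "fan in" problem
described above.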

Mercurial handles very large repositories very well in my opinion, tremendously fast when using local
disk and not NFS file systems. So having Mercurial manage one repository of 50,000 files is not an issue,
except needing the disk space.

Hope this helps and I wasn't too long-winded.

-kto