JDK8 Preliminary Repository Layout

Wed Apr 27 14:39:29 UTC 2011

On Mar 11, 2011, at 10:07 AM, Kelly O'Hair wrote:

> On Mar 11, 2011, at 2:11 AM, Steve Poole wrote:
> 
>> Kelly - can you explain for us newbies why you have separate repositories? I'm sure I can list any number of reasons but it would be good to get your view. It may sound like a dumb question but it does help in these sorts of discussions to know some of the history :-)
> 
> This is probably tainted, but I will try to provide an honest view, with some humor thrown in. ;^)
> 
> Prior to Mercurial, we used the Sun product Teamware and we had separate workspaces (what Teamware called
> a repository) for: control, hotspot, j2se, deploy, install, etc. (deploy and install were Sun plugins & installers).
> So that set a pattern. Teamware basically managed SCCS files, so as a workspace grew, it did not scale well,
> and Teamware relied on NFS access to share these files (80,000+ files, when you count SCCS s.* and * files).
> So this separation initially, in my view, was done for developer productivity.
> I don't have any history on why the other workspaces existed as separate workspaces, but I just assume
> it was for the same reasons as hotspot: nobody wanted to be part of the big j2se gorilla in the room,
> and having your own workspace created more of a separate silo for that team to work in, I suspect.
> The control workspace was a small batch of makefiles that built all the workspaces, used by Release
> Engineering mostly.
> 
> Note that Teamware allowed for partial workspaces: since it was only managing SCCS files and individual
> file edits, you could trim a Teamware workspace down to just the directories you were working in, and
> still sync and push with subset workspaces. This flexibility was taken advantage of by the j2se team
> to minimize the NFS traffic and improve productivity too. Mercurial doesn't allow for subset repositories.
> 
> The hotspot team found that their smaller 5,000 file workspace was easier to deal with, and in fact
> the VM was a natural interface boundary: easy to isolate, with controlled APIs. Pre-built VMs could be
> dropped into a JDK, and testing/experiments were easy. Hotspot was also mostly C++ and native code.
> Later, a "Hotspot Express" delivery model was possible so that the same sources could be delivered
> to completely separate JDK releases.
> The hotspot developers were happy, well, as happy as a hotspot developer can be I suppose ;^)
> (The Serviceability Agent or SA was developed by the hotspot team and was/is very tightly integrated
> with hotspot, so it became part of hotspot, not the j2se).
> 
> The j2se workspace was much larger, maybe 35,000 source files; it initially included all the sources from the
> corba, jaxp, jaxws, and langtools repositories that exist now.
> This j2se workspace was very hard to deal with, and many of the sources were copy&paste from other projects
> that weren't even managed by the JDK team; new deliveries created lost-fix situations and an unreliable state.
> The build process was complicated because part of the workspace had the javac sources, which had to be built
> first, and then that javac was used to build the sources all over again.
> 
> So just prior to OpenJDK, or about then, we decided to try and split up the j2se workspace to better manage our build
> and source importing issues. The corba, jaxp, and jaxws workspaces were created and those files were pulled
> from the j2se workspace, as were the javac and "language tools" sources, which went into a langtools workspace.
> The j2se workspace was then renamed "jdk".
> 
> That gave us the workspaces: corba, jaxp, jaxws, langtools, jdk, hotspot, ...
> 
> These Teamware workspaces eventually became what you see today as the openjdk7 Mercurial repositories,
> but we had to push some files down into smaller closed repositories: src/closed, test/closed, and make/closed
> for jdk, and src/closed, test/closed, and build/closed for hotspot. The fact that hotspot had managed sources
> in a build directory was a thorn in our sides for a while and it was eventually removed along with build/closed.
> Makefile logic is pretty much 100% open right now.
> 
> I'm not sure that the open sourcing influenced this, but note that corba, jaxp, jaxws, and langtools are pure
> open source, and 100% Java (except for one .c file in corba initially). Managing pure open Java projects is a
> joy if you ask me. ;^)
> 
> For langtools, the team wanted this separate repository and lobbied hard for it as a productivity aid and also to allow
> them to use the NetBeans IDE on just their sources (NetBeans and some IDEs had a hard time swallowing the entire
> j2se sources), but they also needed to try and ship a separate javac product somewhere, I forget the details.
> Maybe some work with some outside developers, Jonathan Gibbons would remember.
> I'm sure if you asked him, there is no way they would want back into a larger repository.
> 
> The corba sources haven't changed much since then, apart from makefile changes and the removal of all native code.
> Originally, we wanted an ant script for a faster build and to allow for NetBeans/IDE use as it became pure
> Java. That hasn't happened. We keep thinking that these sources should be updated with newer Corba
> sources and use whatever build process the J2EE Corba team has. Not sure what the plans are here.
> 
> The jaxp and jaxws repositories got the source drop model and the sources originally managed were deleted
> in favor of source drops from these teams, where they manage the master sources for these products that also
> ship in other forms in other products. This is still a work in progress in terms of finding the best way to
> manage this. We need the sources (can't just get class/jar drops) so that we can build classes with -target 7,
> but changes really need to go through these teams so they can be managed properly.
> 
> Mercurial's changeset model and the need for "merge changesets" when two changesets were created from
> the same parent changeset is another aspect to this. Many teams that changed from a file-based management
> system to Mercurial have encountered "merge mania"; the NetBeans team ran into this.
> It's an issue with too many developers trying to push changes into a single large repository.
> You can't push a changeset into Mercurial unless you have done a pull and sync'd up with the latest changesets
> in the repository. If there are frequent pushes going on, either from too much activity or too many developers,
> someone may experience a:
>   hg push    # fails because you need to do a pull ("too many heads" message)
>   hg pull -u && hg merge && hg commit -m Merge    #  Or hg fetch
>   hg push   # fails because you took too long and someone else pushed a new one
>   hg pull -u && hg merge && hg commit -m Merge    #  Or hg fetch
>   hg push   # fails because you took too long and someone else pushed a new one
>   ...
> This is minimized by reducing the "fan in", smaller repositories, fewer developers pushing into the same
> repository, etc. Our team forests minimize this, and our separate repositories minimize this.
> Now some people might say this is a flaw in Mercurial, but I disagree.
> By having one "tip" and explicit merge changesets, the sources have a singular state: with one simple
> changeset ID, you know the state of all 20,000 files in the jdk repository.
> 
> Mercurial handles very large repositories very well in my opinion, tremendously fast when using local
> disk and not NFS file systems. So having Mercurial manage one repository of 50,000 files is not an issue,
> except needing the disk space.
> 
> Hope this helps and I wasn't too long-winded.

This should go up on a wiki somewhere. I'd love to point folks interested in the macosx-port project to it, and the exact state will change over time as well.

Cheers,
Mike Swingler
Java Engineering
Apple Inc.