Setting up the Mercurial Repositories for Sumatra

Mon Dec 10 23:34:35 PST 2012

On Dec 10, 2012, at 5:47 PM, John Coomes wrote:

> John Rose (john.r.rose at oracle.com) wrote:
>> In concrete terms, we think this comes to something like the following repositories:
>> 
>> A. http://hg.openjdk.java.net/sumatra/sumatra/{hotspot,jdk,langtools,...}  (cloned from jdk8, buildable & testable, occasionally refreshed and rebased)
>> 
>> B. http://hg.openjdk.java.net/sumatra/sumatra-dev/{hotspot,jdk,langtools}  (bundle of independent branches, occasionally rebased on A)
> 
> By "rebase", I assume you are *not* referring to the history rewriting
> done by the mercurial rebase extension.

That's correct.  I take it as a given that Mercurial history has to be monotonically increasing within any given repo.

One advantage of the flat patch file model (as you point out below) is that rebasing a patch set does not require destruction of history.  You just generate a new patch (or patches).  The effect on repo history is confined to the patch repository, and to the particular patch file that required a rebase.

You pay for this separation when two people try to update the same patch: then you have to merge diffs-of-diffs, which is painful.  For this reason, if two people are working on one mlvm patch, we split the patch into parts, and merge it back up later when the concurrent development is finished.

> That sort of history
> rewriting is very inconvenient in shared repos, as everyone must throw
> away their old repo, clone the rewritten one and then reapply their
> work in progress to it.  It also requires an admin's help to do that
> on the servers.

Yuck.  I suppose part of the problem is that the admin has to create a blacklist of obsolete versions (hashes) which the repo will refuse to accept from out-of-date users.

>> C. http://hg.openjdk.java.net/sumatra/sumatra-dev/scratch (initially empty workbench shelf, for sharing artifacts other than JDK source changes)
>> 
>> John Coomes is looking into configuring these repos with suitable "jcheck" options to relax some of the OpenJDK rules on changeset structure, and to allow B. to have multiple simultaneous branches.
> 
> The first (lax checking of changeset comments) is already present in
> jcheck, and just needs to be enabled for the sumatra repos.  The
> second (allowing named branches) will require a modest change to
> jcheck and a small change to jprt[1].

My biggest question about branches is, do they work in practice for the workflows we intend?  (I.e., small scale, provisional & experimental changes, independent workers, controlled mini-integrations.)  I don't know anyone who has used them, so I'm slightly doubtful.

>> Before we jump in, though, I have one big worry, and it's the same as yours, Tom:  Which practice of branching will work for us?  I have enjoyed using the simplest possible version:  Flat patch files, handled with MQ (hg qpush etc.) and manually rebased.  This gives maximum flexibility, but may be too unfriendly for us.  If there is a HG branch model that works better than (versioned) flat patch files, let's use it.
> 
> Let me list the pros & cons as I see them.  Note that I have not
> worked directly with the mlvm flat patch file model, but have talked
> to those who have:
> 
> Flat patch file model (à la mlvm):
> 
> pros:
> 
> 	separation - each patch remains a separate unit for its entire
> 	lifetime--until the point it is ready to include in a jdk
> 	release repository (e.g., jdk9/jdk9/hotspot), if that's
> 	desired.

Corollary of separation:  No direct interaction between patch history and baseline history.  There are rebase events which cause changes to patches, but there are no merge nodes in the baseline repository.  Not sure if this is truly a simplification, but it seems so to me.

> 	flexibility - can include/exclude various changes as needed,
> 	without any commitment to keeping the result (other than
> 	the effort needed to resolve conflicts, if any)

Corollary:  Patches can be reordered relative to each other, and split or merged.  Changes tend to be small, and (again) do not affect baseline repository.

> cons:
> 
> 	conflicts result in patch reject files, which must be fixed up
> 	by hand - merge tools are not invoked

There is a claimed solution for this, to the effect that "hg pull --rebase" does the right thing on MQ patches:
  http://stackoverflow.com/questions/11700136/when-doing-qpush-can-i-get-a-merge-tool-instead-of-rej-files

I have not tested this.  But it seems reasonable.  Note that the scary word "rebase" applies to the MQ patches only, which are designed to be rebased, as discussed above.

> 	changeset history is a diff against patch files ("diffs of diffs")

Yes.  That's the worst.  In essence, the changeset history is useful (a) for its comments and other metadata, and (b) as a way to return to an old configuration.  The actual diffs-of-diffs (∂²) are hard to read.

When I have to evaluate a patch change in mlvm, what I usually do is (a) manually reverse the current version of the patch using "patch -sp1 -R < .hg/patches/foo.patch", (b) materialize the previous version of the patch, (c) apply the previous version forward using "patch -sp1 -N < .hg/patches/foo.patch.prev", and (d) examine the effect on the working files, using "hg diff".  If I had to do it frequently, I'd make it into a script, but I don't.
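
Steps (a) through (d) above could be scripted roughly as follows.  This is only a sketch, not something in actual use.  It assumes the patch queue is itself a versioned Mercurial repo at .hg/patches, and the names review_patch_change, run, and RUN are made up for illustration.  By default it just prints the commands it would run; set RUN=1 to execute them.

```shell
#!/bin/sh
# Sketch only: review_patch_change, run, and RUN are hypothetical names.
# Assumes the patch queue is a versioned hg repo at .hg/patches.

# Print the command (dry run) unless RUN=1, in which case execute it.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        sh -c "$1"
    else
        printf '%s\n' "$1"
    fi
}

# usage: review_patch_change PATCH_NAME PREVIOUS_REV
# Note: names containing spaces or shell metacharacters are not handled.
review_patch_change() {
    name=$1          # patch file name, e.g. foo.patch
    rev=$2           # revision of the patch repo holding the older version
    q=.hg/patches
    run "patch -sp1 -R < $q/$name" &&                           # (a) reverse current patch
    run "hg --cwd $q cat -r $rev $name > $q/$name.prev" &&      # (b) materialize previous version
    run "patch -sp1 -N < $q/$name.prev" &&                      # (c) apply previous version forward
    run "hg diff"                                               # (d) examine the effect
}
```

For example, "review_patch_change foo.patch 41" run from the top of the working directory prints the four commands for foo.patch, taking the older version from revision 41 of the patch repo.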

> 	requires careful attention to avoid committing or merging
> 	changes into the wrong patch

(That con applies to both scenarios.)

> 	many operations are indirect - must clone a stable repo, then
> 	apply the patch files before doing the desired operation:
> 	build, generate webrev, compare changes, etc.
> 
> Named branch model:
> 
> As I envision it, each sumatra feature of decent size would be
> developed on a separate named branch.  During development of the
> feature, changes from the "stable" branch would regularly be merged
> into the feature branch (but not the other way around).  Once the
> feature was deemed complete and stable, the feature branch would be
> closed and the net result applied to the "stable" branch as a single
> changeset.
> 
> pros:
> 
> 	standard mercurial development model
> 
> 	conflicts are resolved using the normal mercurial merge
> 	machinery
> 
> cons:
> 
> 	jcheck and jprt[1] would have to be updated to allow named
> 	branches (modest changes in both cases).
> 
> 	requires careful attention to avoid committing or merging
> 	changes on the wrong branch
> 
> 	incomplete separation - after the feature is complete and its
> 	feature branch is merged into the "stable" branch, the changes
> 	required to keep the code in sync with the upstream repos
> 	would be scattered across various merge changesets, which
> 	would accumulate over time.  Thus there would not be a single
> 	entity that held the entire change for that feature.

Hmm...  This problem is related to the one of "cherry-picking" or back-porting, where you want to take some changesets from one repo and move them sideways to another, probably consolidating them into a single changeset.  The problem is greatly complicated by the fact that the list of required changesets from the original repo is not easy to compute, if the main changeset has complex dependencies on other changesets, which themselves may contain changes that the back-porter doesn't want.

(In practice, I find it difficult to predict how these details of workflow will shake out, which is why I would prefer to rely on somebody's experience.  That's why I keep asking about branches.)

> 	the old mercurial version currently in use on our servers
> 	would require the use of 'push -f' when creating a new branch
> 	or after merging the "stable" branch into a feature branch
> 	(newer hg versions don't require this)
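
For concreteness, the named branch workflow described above might look something like the following hg commands.  This is a sketch under assumptions, not a tested recipe.  The branch names "stable" and "feature-x" are placeholders; the final collapse using "hg revert --all -r" is just one possible way to land the net result as a single changeset, and not necessarily what John Coomes has in mind; and "hg push --new-branch" is for newer hg versions (the old server version would need "push -f" instead, as noted above).  The function names and the RUN and STABLE variables are made up for illustration.  By default the script only prints the commands; set RUN=1 to execute them.

```shell
#!/bin/sh
# Sketch only: all function names, RUN, and STABLE are hypothetical.

# Print the command (dry run) unless RUN=1, in which case execute it.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        sh -c "$1"
    else
        printf '%s\n' "$1"
    fi
}

STABLE=${STABLE:-stable}    # assumed name of the "stable" branch

# Start a feature branch off the stable branch and publish it.
start_feature() {
    run "hg update $STABLE" &&
    run "hg branch $1" &&
    run "hg commit -m 'Start branch $1'" &&
    run "hg push --new-branch"
}

# Merge stable into the feature branch (never the other way around).
# "hg merge" fails if there is nothing new to merge; skip this step then.
sync_feature() {
    run "hg update $1" &&
    run "hg merge $STABLE" &&
    run "hg commit -m 'Merge $STABLE into $1'"
}

# Close the feature branch and apply its net result to stable as one
# changeset.  Assumes the feature branch was synced with stable just before.
finish_feature() {
    run "hg update $1" &&
    run "hg commit --close-branch -m 'Close branch $1'" &&
    run "hg update $STABLE" &&
    run "hg revert --all -r $1" &&
    run "hg commit -m 'Add feature $1'"
}
```

A feature's lifetime would then be "start_feature feature-x", any number of "sync_feature feature-x" calls during development, and finally "finish_feature feature-x".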

Thanks, John, for the detailed analysis.  It's really interesting.

— John