Looking ahead: proposed Hg forest consolidation for JDK 10

Wed Oct 12 07:16:14 UTC 2016

Hi Joe, 

thanks for your detailed answer.  Unfortunately it 
doesn't dispel my concerns.

> Hi Goetz,
> 
> On 10/11/2016 2:30 AM, Lindenmaier, Goetz wrote:
> > Hi,
> >
> > I see several problems with this approach.
> >
> > 1.) Mercurial already has problems scaling with the current repositories.
> >      This will get worse with bigger repos. E.g. 'hg diff' takes
> >      14 secs on jdk, but only 2 secs on jaxp:
> >      jdk:  ~90000 files, 15000 changes, hg diff takes  14 secs
> >      jaxp: ~12000 files,  1000 changes, hg diff takes  2 secs
> 
> By its nature, hg diff needs to walk the directory tree so a bigger tree
> will generally be slower. 
Yes, and that's bad!
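
To put rough numbers on this, here is a small sketch that only reuses the
two measurements quoted above and computes the average cost per file. The
helper name ms_per_file is made up for illustration, and a linear cost
model is an assumption on my side, not something I measured:

```python
# Figures quoted above: (files in repo, seconds for 'hg diff').
measurements = {
    "jdk": (90000, 14.0),
    "jaxp": (12000, 2.0),
}


def ms_per_file(files, seconds):
    """Average 'hg diff' cost per file, in milliseconds."""
    return seconds * 1000.0 / files


for repo, (files, seconds) in measurements.items():
    print(f"{repo}: {ms_per_file(files, seconds):.3f} ms per file")
```

Both repos end up at roughly the same cost per file, so the time seems to
grow about linearly with the number of files in the tree.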

> Doing a diff on a particular subdirectory, say
> for hotspot,  should have comparable performance as today.

The use case of hg diff is to find what was changed. Obviously, if I only 
do it on the subdir, I might miss something. 

> The fsmonitor extension,
> https://www.mercurial-scm.org/wiki/FsMonitorExtension, could help in
> this case too.
> 
> > 2.) Cloning the repo does not scale.
> >      Cloning the root repo and calling get_source.sh takes 20 min.
> >      I usually only clone the root repo and hotspot. This only
> >      takes 3 min.
> >      I don't think merging the repos will improve the 20 mins.
> >      On the contrary, as cloning the jdk repo takes most of the time,
> >      and the others run in parallel, cloning an even bigger repo
> >      will be slower.
> >      Alternatively, one could hold a 'master' repo and replicate that
> >      by local copy. But this shows similar timings (1:40 vs. 9min).
> 
> We've discussed this kind of use-case internally as well. The
> recommendation is to have a designated local master and then do local
> clones of that. On a unix system if the local clones are on the same
> disk, hard links are used with a copy-on-write policy so the clones are
> space-efficient and time-efficient to create. The local clone times
> we've seen are about 30 seconds in that case.

I would have to run watchman on all the machines I happen to 
work on. It is a possible solution, but one that imposes work on every user.

> > 3.) Having to clone the full repos will require considerably more
> >      disk space.
> >      I'm working on various issues in hotspot and keep them separated
> >      by doing this in individual repositories that only contain hotspot.
> >      These repos will require considerably more space.

> If disk space is a concern, you can use mq or bookmarks against a single
> repo.

I use mq a lot.  But separate tasks often require separate repos.
Say, I'm working on
  - testing someone else's change against the head revision to review it.
  - developing the s390 port with an mq that contains 10 patches
  - looking for a performance regression by syncing to older revisions, 
    building and running benchmarks in a script. 
You can't combine such tasks with an mq in one repo.

> > 4.) There will be additional merges because changes that are now done
> >      in two repos will then be done in a single repo. If I then sync back
> >      a few hotspot changes, a lot of files in the other subdirectories
> >      will get touched. This slows down sync and causes rebuilds.
> >      Sure this might just be what is intended, but currently I don't
> >      need to rebuild jdk etc. very often.
> 
> While hotspot and the rest of the JDK can often be treated as
> approximately independent, they are not truly independent.

Yes, but they _are_ approximately independent. That suffices to 
avoid lots of boilerplate work.
In other SCM systems you can sync back only a subdirectory.
Mercurial does not support that.

> > 5.) It will get harder to monitor submitted changes that are relevant
> >      for a specific area. E.g., I might only want to see changes in hotspot.
> >      In the web frontend, you can not browse changes on subdirectory basis.
> >      Maybe this can be solved, as the commandline 'hg log' etc. already
> >      support this.
> 
> We don't have plans to change the Hg web UI so I think a command line
> solution would be appropriate here.

You should consider fixing this, maybe as a follow-up.  You can already 
browse file history.  This should also be possible for directories.
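
Until then, the command-line route could look roughly like the sketch
below. It relies only on 'hg log' accepting a directory argument and the
--limit option. The helper names and the subdirectory name "hotspot" in a
consolidated repo are assumptions for illustration:

```python
import subprocess


def hg_log_command(subdir, limit=10):
    """Build an 'hg log' command restricted to changesets touching subdir."""
    return ["hg", "log", "--limit", str(limit), subdir]


def recent_changes(repo_root, subdir, limit=10):
    """Run the command inside repo_root and return hg's output as text."""
    result = subprocess.run(
        hg_log_command(subdir, limit),
        cwd=repo_root, capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # "hotspot" is the assumed subdirectory name in the consolidated repo.
    print(" ".join(hg_log_command("hotspot", limit=5)))
```

This covers a local clone, but it is no substitute for browsing the
changes of a directory in the web frontend.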

> > 6.) A single repo will simplify making combined changes. So there will be
> >      more of these. But combined changes complicate handling of our
> >      licensed code.
> >      In our activities as licensee, we are consuming hotspot change-wise.
> >      This is because we modified a lot in hotspot, and merging hotspot
> >      changes step by step simplifies the merging.
> >      On the other side, we consume the changes to jdk etc. as chunks.
> >      This is because we changed much less in these directories so
> >      that merging causes fewer problems. Also, there are many more
> >      changes and we don't have the manpower to consume them
> >      change-wise.
> >      Having combined changes requires more synchronization between
> >      the two merging tasks. It's already an increasing effort in
> >      jdk9.
> >      Also, to follow these two different merging approaches for hotspot
> >      and the rest, we would have to first split the single repo into
> >      two parts.
> >
> >
> > Comments to the JEP:
> >
> > I appreciate that the change history is kept as it makes research
> > in old changes easier. On the other hand, dropping the history
> > might speed up handling of the new repo.
> 
> We are aware that Facebook has developed Hg plugins to allow shallow
> clones, i.e. clones without all the history, but we haven't investigated
> using them yet.
> 
> >
> > I also appreciate the changes in directory layout. If the
> > repos are merged, this should be done this way.
> >
> > We find it difficult to keep the jtreg runner in sync with our
> > current version of jdk9, especially as we have two of them (we
> > test OpenJDK and SAP JVM 9, and within SAP JVM 9 hotspot and
> > jdk often differ in a few builds.)
> > I would appreciate if the runner could be included in the
> > root/test directory.
> 
> I'm not quite sure what you are referring to by the jtreg runner.

I mean the code in http://hg.openjdk.java.net/code-tools/jtreg

As Andrew stated, some subdirectories are pretty stable. It
might make complete sense to merge these into one repository, but I'm
really concerned about jdk and hotspot. 

In general, I think those people who are highly specialized in complex
subcomponents of the VM will suffer from this.  They are often fine
just working with hotspot / jdk etc.  In general, these people develop
new components in the latest branch.
Those people who have to maintain and test the VM will really profit
from the new setup.  They always operate with the full repo tree
anyway.
Having said this, I think it would make more sense to put the legacy code
bases into merged repos, and not the development branch?

Best regards,
  Goetz.