Looking ahead: proposed Hg forest consolidation for JDK 10

Tue Oct 11 09:30:08 UTC 2016

Hi,

I see several problems with this approach.

1.) Mercurial already has problems scaling with the current repositories.
    This will get worse with bigger repos. E.g. 'hg diff' takes 
    14 secs on jdk, but only 2 secs on jaxp:
    jdk:  ~90000 files, 15000 changes, hg diff takes  14 secs
    jaxp: ~12000 files,  1000 changes, hg diff takes  2 secs
2.) Cloning the repo does not scale.
    Cloning the root repo and calling get_source.sh takes 20 min.
    I usually only clone the root repo and hotspot. This only
    takes 3 min.
    I don't think merging the repos will improve on the 20 min.
    On the contrary: as cloning the jdk repo takes most of the time
    and the others run in parallel, cloning an even bigger repo
    will be slower.
    Alternatively, one could keep a local 'master' repo and replicate
    it by local copy. But this shows similar timings (1:40 min vs. 9 min).
3.) Having to clone the full repos will require considerably more
    disk space.
    I'm working on various issues in hotspot and keep them separate
    by working on each in an individual repository that only contains
    hotspot. These repos will require considerably more space.

4.) There will be additional merges, because changes that are now done
    in two repos will then be done in a single repo. If I then sync back
    a few hotspot changes, a lot of files in the other subdirectories
    will get touched. This slows down syncing and causes rebuilds.
    Sure, this might be exactly what is intended, but currently I don't
    need to rebuild jdk etc. very often.

5.) It will get harder to monitor submitted changes that are relevant
    to a specific area. E.g., I might only want to see changes in hotspot.
    In the web frontend, you cannot browse changes per subdirectory.
    Maybe this can be solved, as the command-line 'hg log' etc. already
    support this.

6.) A single repo will simplify making combined changes, so there will be
    more of them. But combined changes complicate the handling of our
    licensed code.
    As a licensee, we consume hotspot changes one by one.
    This is because we modified a lot in hotspot, and merging hotspot
    changes step by step simplifies the merging.
    On the other hand, we consume the changes to jdk etc. in chunks.
    This is because we changed much less in these directories, so
    merging causes fewer problems. Also, there are many more
    changes there, and we don't have the manpower to consume them one
    by one.
    Combined changes require more synchronization between
    the two merging tasks. This is already an increasing effort in
    jdk9.
    Also, to follow these two different merging approaches for hotspot
    and the rest, we would first have to split the single repo into
    two parts.

Comments to the JEP:

I appreciate that the change history is kept, as it makes researching
old changes easier. On the other hand, dropping the history
might speed up handling of the new repo.

I also appreciate the changes in directory layout. If the 
repos are merged, this should be done this way.

We find it difficult to keep the jtreg runner in sync with our
current version of jdk9, especially as we have two of them (we
test OpenJDK and SAP JVM 9, and within SAP JVM 9, hotspot and
jdk often differ by a few builds).
I would appreciate it if the runner could be included in the
root/test directory.

Best regards,
  Goetz.

> -----Original Message-----
> From: jdk9-dev [mailto:jdk9-dev-bounces at openjdk.java.net] On Behalf Of
> joe darcy
> Sent: Tuesday, October 11, 2016 04:12
> To: jdk9-dev at openjdk.java.net
> Subject: Looking ahead: proposed Hg forest consolidation for JDK 10
> 
> Hello,
> 
> Looking ahead to JDK 10, a group of JDK engineers has been exploring
> consolidating the large number of Hg repositories in an OpenJDK forest
> into a single one, with the goal of using the consolidated arrangement
> for JDK 10.
> 
> This message is being sent to jdk9-dev since a jdk10-dev alias to
> discuss JDK 10 doesn't exist yet.
> 
> A JEP describing the project has been submitted:
> 
>      JDK-8167368: Consolidate JDK 10 OpenJDK repositories to a single repository
>      https://bugs.openjdk.java.net/browse/JDK-8167368
> 
> The text of the JEP describes the motivation and current state of the
> work in more detail, including proposed changes to the file layout.
> Publication of the prototype consolidated repository is planned, but not
> done yet. The email below has a list of additional anticipated questions
> and answers.
> 
> We feel this consolidated arrangement offers some significant structural
> advantages for managing the JDK's source code and we are now asking for
> feedback on this potential change. In particular, if you feel there is a
> show-stopper problem with making this change, please let us know!
> 
> I'd like to acknowledge Stefan Sarne, Stuart Marks, and Ingemar Aberg
> for participating in the discussions leading up to the prototype,
> and I'd like to especially recognize the contributions of Erik Helin for
> savvy Hg manipulations and Erik Joelsson for skillful build wrangling in
> this project.
> 
> Please send initial comments by October 18, 2016.
> 
> Cheers,
> 
> -Joe
> 
> Q: What about the set of forests for JDK 10? Are we going to have
> master, dev, client, hotspot, etc. the same set as in 9?
> A: That is a separate question from the repository consolidation, but
> there will likely be simplifications here too. Discussions on that point
> will come later.
> 
> Q: I usually just build the code in repo X today. Will I have to
> build the *whole JDK* now?
> A: Not necessarily. The same top-level build targets should work in the
> consolidated forest.
> 
> Q: Does disk usage change?
> A: The total disk usage of the current forest compared to the
> consolidated forest is nearly the same.
> 
> Q: In more detail, how were the changesets imported?
> A: The scripts used for the consolidation conversion are attached to the
> JEP.
> 
> Q: What happens to the Hg hashes?
> A: The conversion scheme used in the prototype does *not* preserve Hg
> hashes of changesets compared to the current forests. However, the bug ids
> are preserved and can be searched for. In addition, one or more
> pre-consolidation forests should be archived in perpetuity so that URLs
> in bug comments continue to work, etc.
> 
> A mapping of the old hashes to the corresponding new hashes might be
> generated and placed in the final new repo.
> 
> Q: I'm allergic to tabs; what about jcheck?
> A: If history is preserved, the checking done by jcheck needs to be
> modified for the consolidated forest. One way to do this is to augment
> the white lists used in jcheck with the conflicting changesets. This
> approach may not be elegant, but it is effective and doesn't appear to
> appreciably impact jcheck running times.
> 
> Q: Will the future 9 update forest also have this consolidation
> restructuring?
> A: The script used to do the consolidation conversion is deterministic
> and could be run to create the 9 update forest in the future at the
> discretion of the 9 update team.
> 
> Q: For backports or forwardports, will there be a script to translate
> patch files across the consolidation boundary?
> A: That work is planned, but not yet done; see JDK-8165623: Create patch
> translator to update paths pre/post consolidation.
> 
> Q: It's the 21st century and I develop using an IDE. That is still going
> to work, right?
> A: The prototype to date does *not* include updating the various IDE
> support files, but bug JDK-8167142 has been filed to track that work.