Call for Discussion: New Project: Skara -- investigating source code management options for the JDK sources

joe darcy joe.darcy at oracle.com
Fri Jul 27 20:10:20 UTC 2018


Hello Mario,

On 7/27/2018 2:26 AM, Mario Torre wrote:
> Hi Martijn,
>
> How many contributions from developers in those git mirrors came into
> OpenJDK (or even, how many contributions happened on those mirrors
> outside of OpenJDK development?).
>
> I think the point about performance is sound [1], but I would be very
> careful to introduce a new SCM, lots of developers are used with
> mercurial now, and even if git is probably just a small learning step
> away, I would argue that this is unnecessary to the people who are
> already contributing.
>
[snip]

>
> [1] It really is terrible now with a single repo, but is it a problem
> of mercurial really? Git also carries all the history in the clone,
> did somebody do some testing on this, and I mean, on the same servers
> and network?
>

In Mercurial, when a file is moved, its history is restarted, meaning a 
full copy of the file is stored. Therefore, lots of file moves will tend 
to make a Mercurial repo disproportionately large. In the JDK, many 
files were moved in JDK 9 for modularity and large numbers of files were 
moved again in JDK 10 for the repo consolidation.

The Mercurial representation of JDK 8 GA takes about 412 MB, JDK 9 GA 
~808 MB, and JDK 10 GA ~1553 MB. Given the number of changesets in JDK 
10, extrapolating from the good linear fit between number of changesets 
and size in the JDK 7 and 8 update releases, one would expect JDK 10 in 
hg to take in the neighborhood of 450 MB - 500 MB. Therefore, the file 
moves are certainly bulking up the repo size, contributing to the 
increased download times.
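
As a rough sanity check of that arithmetic, here is a minimal Python sketch. It uses only the figures quoted above; the variable names are mine and purely illustrative, and the "expected" range is the 450 MB - 500 MB extrapolation mentioned in the text, not a new measurement.

```python
# Sizes quoted above, in MB, for the Mercurial representation of each GA release.
hg_size_mb = {"JDK 8 GA": 412, "JDK 9 GA": 808, "JDK 10 GA": 1553}

# Range expected for JDK 10 from the linear changeset/size fit (from the text).
expected_jdk10_mb = (450, 500)

actual = hg_size_mb["JDK 10 GA"]

# Excess size attributable to something other than changeset count.
excess_low = actual - expected_jdk10_mb[1]   # measured against the upper estimate
excess_high = actual - expected_jdk10_mb[0]  # measured against the lower estimate

# How many times larger the actual repo is than the extrapolated size.
ratio_low = actual / expected_jdk10_mb[1]
ratio_high = actual / expected_jdk10_mb[0]

print(f"excess over expectation: {excess_low}-{excess_high} MB")
print(f"actual / expected: {ratio_low:.1f}x-{ratio_high:.1f}x")
```

In other words, on these numbers roughly 1 GB of the JDK 10 repo is beyond what the changeset count alone would predict.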

While a simple import of the JDK sources into git can lead to a larger 
representation, if the git repo is repacked [1], the result is a 
much, much smaller representation. Basically, a repack asks git to 
use forward and backward differencing with a large window to look for a 
more compact representation; this removes the excess size introduced 
by the file moves. In particular, by running

     git repack -a -d --depth=250 --window=250 -f

on some git imports of the JDK we've done internally, we ended up with a 
git repo size of recent JDK sources of around 300 MB, roughly 5X 
smaller. That 300 MB includes all the JDK changeset history, tags, etc.
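
For anyone who wants to try this on their own import, the following is a minimal sketch of how one might measure the effect, not the procedure we used internally. It assumes git is on the PATH; the helper names (parse_count_objects, repo_size_mib, repack_and_compare) are hypothetical and chosen for illustration. It relies on "git count-objects -v", which reports "size" and "size-pack" in KiB.

```python
import subprocess


def parse_count_objects(output: str) -> dict:
    """Parse the 'key: value' lines printed by `git count-objects -v`."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value.strip())
    return stats


def repo_size_mib(repo: str) -> float:
    """Return loose plus packed object size of the repo, in MiB."""
    out = subprocess.run(
        ["git", "-C", repo, "count-objects", "-v"],
        check=True, capture_output=True, text=True,
    ).stdout
    stats = parse_count_objects(out)
    # Both "size" (loose objects) and "size-pack" are reported in KiB.
    return (stats.get("size", 0) + stats.get("size-pack", 0)) / 1024


def repack_and_compare(repo: str) -> tuple:
    """Run the repack command quoted above and return (before, after) in MiB."""
    before = repo_size_mib(repo)
    subprocess.run(
        ["git", "-C", repo, "repack", "-a", "-d",
         "--depth=250", "--window=250", "-f"],
        check=True,
    )
    return before, repo_size_mib(repo)
```

Calling repack_and_compare on the path of a local clone would give the before and after sizes; note that a repack with these settings can take a long time and a lot of memory on a repo of this size.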

In some experiments with hosting providers, cloning such a repacked git 
repo can be completed within 1 to 3 minutes, which is considerably 
faster than the clone times we see now from hg.openjdk.java.net.

HTH,

-Joe

[1] 
https://metalinguist.wordpress.com/2007/12/06/the-woes-of-git-gc-aggressive-and-how-git-deltas-work/



