Call for Discussion: New Project: Skara -- investigating source code management options for the JDK sources

Mario Torre neugens at redhat.com
Mon Jul 30 13:16:16 UTC 2018


On Mon, Jul 30, 2018 at 2:41 PM, Erik Österlund
<erik.osterlund at oracle.com> wrote:
> Hi,
>
> etc for free? I would personally rather ride on the source code hosting experience and expertise of GitHub than to chase after homegrown solutions to patch the problems.
> /Erik

That surely isn't free. Besides, what you described are just clones;
how does GitHub protect us from multiple clones going out of sync?
Since it's a model that favours branching, I can only see more
out-of-sync repos.

The GitHub model may be good for the vast majority of small projects
out there, but not for us. No disrespect intended: they may have
large communities etc., but clearly the majority of projects have a
smaller source count and narrower focus areas than OpenJDK. This
project is huge; that's what makes it special.

I do have experience with one other project on GitHub that is not even
large enough to approach the critical mass of OpenJDK, but it is large
and the development model is insane: it's very dispersive and there's
no simple way of filtering or keeping discussions, reviews, bugs,
features etc. organised. Once a project becomes somewhat big, GitHub
is a mess.

So, yes, we may decide to host on GitHub (or similar), but we should
be *very* careful not to use the GitHub model; it won't scale for us.

Our current model is not broken, and Mercurial is only a tad slow, so
we shouldn't change anything other than make the SCM a tad faster, and
again, if the only solution is to move to git... well, whatever, but
that's just about it.

I usually compare OpenJDK and the Kernel because they are very similar
(by design, I think?), and although we no longer have the
mono-tree/multiple repos approach, this is still valid:

https://blog.ffwll.ch/2017/08/github-why-cant-host-the-kernel.html

In my view, moving to GitHub would be a mistake that will bring us
more pain down the road and force us to adapt to a new workflow for no
reason.

Cheers,
Mario

> On Jul 30 2018, at 1:53 pm, Aleksey Shipilev <shade at redhat.com> wrote:
>>
>> On 07/30/2018 01:13 PM, Weijun Wang wrote:
>> > Joe said on Jul 28:
>> >
>> > > In Mercurial, when a file is moved, its history is restarted, meaning a full copy of the file is stored. Therefore, lots of file moves will tend to make a Mercurial repo get disproportionally larger. In the JDK, many files were moved in JDK 9 for modularity and large numbers of files were moved again in JDK 10 for the repo consolidation.
>> > > The Mercurial representation of JDK 8 GA takes about 412 MB, JDK 9 GA ~808 MB, and JDK 10 GA ~1553 MB.
>> > So this is related to Mercurial's design that a rename equals to a remove and a create.
>> > Maybe we can fix Mercurial to make this a real "move", and I doubt if there is a space-time tradeoff here.
>> What I meant to say is that space-time tradeoff between on-the-wire format (bundles) and on-the-disk
>> format (.hg folder) is there, and you can choose either, depending on the context. Publishing blobs
>> in on-the-wire format has better compatibility, while tarballs in on-the-disk format are ultimately
>> faster to "clone".
>>
>> Two mega-moves (Jigsaw in 9, and monorepo in 10) inflated the on-the-disk size quite badly, as Joe
>> indicated above, but on-the-wire format size seems to remain okay. So, if we enabled CDN-backed
>> bundles-assisted clone, it should probably cut down clone pains, at least for our Europe-side folks,
>> at the expense of some client CPU churn associated with converting on-the-wire to on-the-disk during
>> the clone.
>>
>> Some optimization for on-the-disk size is possible if you re-clone the repo with
>> "--config=format.generaldelta=1 --config=format.aggressivemergedeltas=1", thus optimizing internal
>> .hg metadata. That would take a lot of time. If you have some time to spare, then it makes sense to
>> do so. My build scripts do that automatically before packaging the .hg snapshots.
>>
>> Also, it seems that doing the "clone --pull" twice with generaldelta enabled compacts metadata even
>> more: jdk/jdk .hg size fell from 1.5 GB to 1.2 GB uncompressed, and from 750M to 590M
>> xz9-compressed. I just fixed my build scripts and currently testing them.
>>
>> -Aleksey
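
For anyone who wants to try the re-clone Aleksey describes above, here
is a minimal sketch in Python that only assembles the command line. The
two config values are copied from his message; combining them with
"--pull" in one invocation, the helper name build_reclone_command, and
the paths "jdk" and "jdk-compact" are assumptions made for illustration,
and whether both options are honoured depends on the Mercurial version
installed.

```python
import shlex

# Config overrides quoted from the message above; whether they are
# honoured depends on the installed Mercurial version.
DELTA_OPTIONS = ("format.generaldelta=1", "format.aggressivemergedeltas=1")


def build_reclone_command(source, dest):
    """Return the argv list for re-cloning `source` into `dest`.

    "--pull" asks Mercurial to use the pull protocol instead of copying
    the store files, so the on-disk metadata is written afresh.
    """
    cmd = ["hg", "clone", "--pull"]
    for option in DELTA_OPTIONS:
        cmd += ["--config", option]
    cmd += [source, dest]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only; nothing is executed here.
    print(shlex.join(build_reclone_command("jdk", "jdk-compact")))
```

The list can be passed to subprocess.run(cmd, check=True) to actually
run the clone. Comparing the size of the .hg directory before and after
would show whether the compaction reported above reproduces locally.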



-- 
Mario Torre
Associate Manager, Software Engineering
Red Hat GmbH <https://www.redhat.com>
9704 A60C B4BE A8B8 0F30  9205 5D7E 4952 3F65 7898

