Consolidated repo vs. old forest size differences

Erik Joelsson erik.joelsson at oracle.com
Fri Sep 29 11:20:10 UTC 2017


Hello Volker,

As the implementor of the conversion strategy, I have investigated and 
played around with this a lot. There are two main contributors to the 
increased size.

* First, as Joe said, the file moves, which account for around 350MB. A 
similar (though not quite as big) size increase occurred when we 
reorganized the source into modules.

* The second is caused by inefficient storage in the "manifest" file in 
the metadata. This file can grow large when adjacent changesets have 
large diffs between them, typically from non-linear history. By merging 
together previously unrelated histories, we created quite a few of 
those. If you check your .hg dir, you will see that this file is around 
400MB.
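
To check that figure on a local clone, something like the following 
should work. It is only a sketch: it assumes the usual store layout, 
where the manifest lives in .hg/store/00manifest.i (plus 00manifest.d 
if the data is kept in a separate file), and the function name 
manifest_size is made up for illustration.

```shell
# Sum the on-disk size of the manifest revlog of a Mercurial repository.
# Assumes the standard store layout: .hg/store/00manifest.i (+ optional .d).
manifest_size() {
    repo="${1:-.}"
    # -c appends a grand total line after the per-file lines; keep only that.
    du -ch "$repo"/.hg/store/00manifest.* | tail -n 1
}
```

Usage: manifest_size <jdk10-master>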

There is a way to reduce the size of the manifest file, but it requires 
a newer version of Mercurial than we have on our servers and the feature 
is not well documented (see format.aggressivemergedeltas). Because of this, we 
have been hesitant to deploy that change on our main servers. If you are 
curious, you can convert your repository locally using a command like this:

hg --config=format.generaldelta=1 \
   --config=format.aggressivemergedeltas=1 \
   clone --pull <jdk10-master> <new-jdk10-master>

That should result in a manifest file size of around 40MB and bring down 
the metadata total size to around 1100MB. All of the conversion was done 
with this option enabled, but by pushing to an older server, the 
"compression" didn't stick, which it otherwise would have.
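
One way to tell whether a clone ended up in the generaldelta format is 
to look at .hg/requires, which lists the repository's storage 
requirements, one per line. The sketch below assumes that layout; the 
function name uses_generaldelta is hypothetical. Note that this only 
shows the storage format. It does not show whether the aggressive merge 
delta setting was in effect when the revisions were written, so 
comparing the manifest size before and after remains the practical check.

```shell
# Report whether a repository's store uses the generaldelta format.
# .hg/requires holds one requirement name per line.
uses_generaldelta() {
    repo="${1:-.}"
    # -x matches the whole line, -q suppresses output (exit status only).
    grep -qx 'generaldelta' "$repo/.hg/requires"
}
```

Usage: uses_generaldelta <new-jdk10-master> && echo "generaldelta in use"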

/Erik


On 2017-09-29 01:09, Volker Simonis wrote:
> Hi Joe,
>
> thanks for the quick answer. Yes, that explains the doubled size. So at
> least it is a one-time effect only.
>
> Regards,
> Volker
>
>
> On Fri, Sep 29, 2017 at 12:48 AM, joe darcy <joe.darcy at oracle.com> wrote:
>> Hi Volker,
>>
>> When there is a file move in Hg, it starts a fresh snapshot of the file.
>> With the reorganized source structure, having a single top-level src
>> subdirectory instead of jdk/src, langtools/src, etc. all the source files
>> were moved.
>>
>> This accounts for much of the greater size post consolidation. We're looking
>> into some ways to mitigate the impact of the larger size.
>>
>> Thanks,
>>
>> -Joe
>>
>>
>>
>> On 9/28/2017 3:15 PM, Volker Simonis wrote:
>>> Hi,
>>>
>>> not sure if this has been discussed before but at least I couldn't
>>> find any references in the previous mail threads on the repo
>>> consolidation.
>>>
>>> I've just realized that the size of the repository history (i.e.
>>> everything under .hg) has doubled in the new consolidated repo (800mb
>>> vs. 1600mb) and I don't exactly understand why:
>>>
>>> $ du -shc jdk10-hs-old/*/.hg jdk10-hs-old/.hg
>>> 16M    jdk10-hs-old/corba/.hg
>>> 141M    jdk10-hs-old/hotspot/.hg
>>> 49M    jdk10-hs-old/jaxp/.hg
>>> 57M    jdk10-hs-old/jaxws/.hg
>>> 453M    jdk10-hs-old/jdk/.hg
>>> 76M    jdk10-hs-old/langtools/.hg
>>> 33M    jdk10-hs-old/nashorn/.hg
>>> 8,1M    jdk10-hs-old/.hg
>>> 829M    total
>>>
>>> $ du -sh jdk10-hs/.hg
>>> 1,6G    jdk10-hs/.hg
>>>
>>> I wonder why this is the case?
>>>
>>> Is this because the consolidated repo has more and bigger merge changes?
>>>
>>> The consolidated repo has a total of 47297 changes with about 13878
>>> merge changes:
>>>
>>> $ hg -R jdk10-hs log --template "{rev}\n" -r tip
>>> 47297
>>> $ hg -R jdk10-hs log --template "{rev}\n" -k Merge | wc
>>>     13878   13878   79600
>>>
>>> The old forest had a total of  43102 changes with about 10408 merge
>>> changes:
>>>
>>> $ bash common/bin/hgforest.sh log --template "{rev}\n" | wc
>>>     43102   86285 1295798
>>> $ bash common/bin/hgforest.sh log -k "Merge" --template "{rev}\n" | wc
>>>     10408   20897  312491
>>>
>>> So the new consolidated repo has about 3000-4000 more changes of which
>>> all are merge changesets. Does anybody know a nice command to sum up
>>> the size of all merge changesets?
>>>
>>> Any other insights or comments? It would be especially interesting to
>>> know how this will evolve in the future.
>>>
>>> Regards,
>>> Volker
>>>
>>> PS: this also partially explains why downloading the new repo takes
>>> considerably longer compared to the old forest (the fact that the
>>> get_sources.sh script downloaded the forest in parallel being the
>>> second reason).
>>



More information about the jdk10-dev mailing list