hg clone is unbelievably slow
Ioi Lam
ioi.lam at oracle.com
Tue Feb 6 15:00:44 UTC 2018
I used to have a nightly script that synced and zipped up the entire contents of the .hg files. Then, when I wanted to clone a new repo, I would just unzip the .hg files and run hg update -c. All of that takes only a few seconds on an SSD. After that, hg pull -u downloads the changes from the past day, and you have a fresh repo in less than a minute.
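A minimal sketch of that workflow, not the original script. The function names and paths are hypothetical, and tar is used here instead of zip purely as a portable stand-in. It assumes an existing local mirror clone that the nightly job keeps up to date.

```shell
#!/bin/sh
# Hypothetical helpers; names and paths are illustrative only.

# Nightly: refresh a local mirror and archive only its .hg metadata.
snapshot_hg() {
    mirror=$1      # existing local clone
    snapshot=$2    # absolute path of the archive to (re)create
    (
        cd "$mirror" || exit 1
        hg pull || exit 1
        tar -czf "$snapshot" .hg
    )
}

# On demand: unpack the metadata, populate the working tree, then catch up.
restore_hg() {
    snapshot=$1    # absolute path of the archive
    target=$2      # directory for the fresh repo
    mkdir -p "$target" || return 1
    (
        cd "$target" || exit 1
        tar -xzf "$snapshot" || exit 1
        hg update -c || exit 1   # command as given in the message above
        hg pull -u               # fetch and apply the changes since the snapshot
    )
}
```

Typical use would be snapshot_hg from cron against the mirror, and restore_hg /path/to/snapshot.tar.gz new-repo whenever a fresh working copy is needed.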
Ioi
> On Feb 6, 2018, at 7:05 PM, Aleksey Shipilev <ashipile at redhat.com> wrote:
>
>> On 02/06/2018 11:50 AM, Andrew Haley wrote:
>> Half an hour or more here. AFAIK the problem is due to the
>> inefficiency of Mercurial itself and the hg protocol.
>
> The compounding factors are:
> - hg.openjdk.java.net is too far from Europe, and bandwidth-delay-product kills TCP performance
> with regular-sized buffers (which have to be tuned on both client and server side). It was partially
> alleviated by forests where you had several concurrent hg clones (hotspot, jdk, ...) at the same time;
> - the Jigsaw and monorepo file moves inflated the repository size dramatically. See
> https://builds.shipilev.net/workspaces/: 8u is 240 MB compressed, 9 is 420 MB compressed, 10 is 760
> MB compressed!
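
To make the bandwidth-delay point concrete: a single TCP connection cannot have more than one window of data in flight per round trip, so its throughput is capped at roughly window / RTT. The sketch below does only that arithmetic. The 64 KiB window and 150 ms round-trip time are assumed figures for illustration, not measurements of hg.openjdk.java.net, and the function name is made up.

```shell
#!/bin/sh
# Upper bound on single-connection TCP throughput: window / RTT.
# Arguments: window size in bytes, round-trip time in milliseconds.
# Prints the bound in Mbit/s (10^6 bits per second) with two decimals.
tcp_throughput_cap() {
    awk -v w="$1" -v rtt="$2" \
        'BEGIN { printf "%.2f\n", (w * 8) / (rtt / 1000) / 1000000 }'
}

# Assumed example values, not measurements.
tcp_throughput_cap 65536 150    # prints 3.50
```

Under those assumed numbers, moving 760 MB of data over one connection would take roughly 29 minutes, which is why larger buffers on both ends, or several concurrent connections, make such a difference.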
>
>> Aleksey Shipilev has done an experiment whereby trees are regularly
>> cloned and compressed tarballs created; these can be downloaded in a
>> couple of minutes. But really we don't want to depend on the largesse
>> of one developer: if we could download the OpenJDK trees directly by
>> means of wget (or something similar) we would reduce the load on the
>> servers and reduce the time taken to download as well.
>
> I second that. Also, we can do Mercurial and compressing tricks to make the compressed archive
> easier to download. Happy to share the script that makes the densest .tar.xz without going
> full-crazy (maybe other simple tricks missing?):
>
> function repack-jdk8 {
>   URL=$1
>   NAME=$2
>   if [ ! -d $NAME ]; then
>     hg clone $URL $NAME
>   fi
>   cd $NAME
>   hg pull
>   hg update
>   HGFOREST_GLOBALOPTS=" --config=format.generaldelta=1 --config=format.aggressivemergedeltas=1" sh common/bin/hgforest.sh clone
>   sh common/bin/hgforest.sh pull
>   sh common/bin/hgforest.sh update null
>   hg update null
>   cd ..
>
>   # Cluster similar files together
>   find $NAME/ -type f | awk -F '/' '{ print $(NF-2) " " $(NF-1) " " $(NF) " " $L; }' | sort | awk '{ print $4; }' > list.txt
>   tar -T list.txt -c -f - | xz -6 > $NAME.tar.xz
>
>   rsync $NAME.tar.xz builds@builds.shipilev.net:~/wwwroot/workspaces/
> }
>
> function repack-jdk10 {
>   URL=$1
>   NAME=$2
>   if [ ! -d $NAME ]; then
>     hg --config=format.generaldelta=1 --config=format.aggressivemergedeltas=1 clone $URL $NAME
>   fi
>   cd $NAME
>   hg pull
>   hg update null
>   cd ..
>
>   # Cluster similar files together
>   find $NAME/ -type f | awk -F '/' '{ print $(NF-2) " " $(NF-1) " " $(NF) " " $L; }' | sort | awk '{ print $4; }' > list.txt
>
>   tar -T list.txt -c -f - | xz -6 > $NAME.tar.xz
>   rsync $NAME.tar.xz builds@builds.shipilev.net:~/wwwroot/workspaces/
> }
>
> repack-jdk8 http://hg.openjdk.java.net/jdk8u/jdk8u/ jdk8u-jdk8u
> repack-jdk10 http://hg.openjdk.java.net/jdk/jdk jdk-jdk
>
> -Aleksey
>
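
The "cluster similar files together" step in the quoted script orders the file list by the last three path components. Presumably the point is that files whose paths end the same way (for example, the same file before and after a directory move) end up next to each other in the tar stream, which gives xz more nearby redundancy to work with. Here is a small sketch of just that ordering, runnable on any directory. The function name is illustrative, and unlike the quoted pipeline it uses tab separators and cut so that spaces in paths do not break the field split; it still assumes no tabs or newlines in file names.

```shell
#!/bin/sh
# List regular files under a directory, sorted by the key
# "<grandparent dir> <parent dir> <file name>", and print only the paths.
cluster_list() {
    find "$1" -type f |
        awk -F '/' -v OFS='\t' '{ print $(NF-2), $(NF-1), $NF, $0 }' |
        LC_ALL=C sort |
        cut -f4-
}
```

The output can be fed to tar in the same way as list.txt in the quoted script, e.g. cluster_list "$NAME" > list.txt followed by tar -T list.txt -c -f - | xz -6 > "$NAME.tar.xz".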
More information about the jdk-dev mailing list