hg clone is unbelievably slow

Ioi Lam ioi.lam at oracle.com
Tue Feb 6 15:00:44 UTC 2018


I used to have a nightly script that synced and zipped up the entire contents of the .hg files. When I wanted to clone a new repo, I would just unzip the .hg files and run hg update -c. All of that takes only a few seconds on an SSD. Then hg pull -u downloads the changes from the past day. You have a fresh repo in less than a minute.
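
A minimal sketch of that workflow, not the original script: the function names, the overridable `HG` variable, and the use of tar instead of zip are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helpers. HG can be overridden, e.g. to point at another binary.
HG=${HG:-hg}

# snapshot_hg REPO ARCHIVE: sync the repo, then archive only its .hg directory.
snapshot_hg() {
  repo=$1 archive=$2
  ( cd "$repo" && "$HG" pull ) || return 1
  tar -C "$repo" -czf "$archive" .hg
}

# restore_hg ARCHIVE DEST: unpack the metadata, rebuild the working copy,
# then fetch whatever changed since the snapshot was taken.
restore_hg() {
  archive=$1 dest=$2
  mkdir -p "$dest" || return 1
  tar -C "$dest" -xzf "$archive" || return 1
  ( cd "$dest" && "$HG" update -c && "$HG" pull -u )
}
```

A nightly job would call something like `snapshot_hg "$HOME/jdk" "$HOME/snapshots/jdk-hg.tar.gz"`, and a new clone becomes `restore_hg "$HOME/snapshots/jdk-hg.tar.gz" "$HOME/jdk-new"`. Absolute paths are the safe choice here, since the functions change directory.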

Ioi

> On Feb 6, 2018, at 7:05 PM, Aleksey Shipilev <ashipile at redhat.com> wrote:
> 
>> On 02/06/2018 11:50 AM, Andrew Haley wrote:
>> Half an hour or more here.  AFAIK the problem is due to the
>> inefficiency of Mercurial itself and the hg protocol.
> 
> The compounding factors are:
>  - hg.openjdk.java.net is too far from Europe, and bandwidth-delay-product kills TCP performance
>    with regular-sized buffers (which have to be tuned on both client and server side). It was partially
>    alleviated by forests where you had several concurrent hg clones (hotspot, jdk, ...) at the same time;
>  - the Jigsaw and monorepo file moves inflated the repository size dramatically. See
>    https://builds.shipilev.net/workspaces/: 8u is 240 MB compressed, 9 is 420 MB compressed, 10 is 760
>    MB compressed!
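
To make the bandwidth-delay point concrete, here is a rough calculation. The 100 Mbit/s link, 150 ms round-trip time, and 64 KiB window are illustrative assumptions, not measurements of hg.openjdk.java.net, and the helper names are hypothetical.

```shell
# bdp_bytes MBIT_PER_S RTT_MS: bytes that must be in flight to keep the link full.
bdp_bytes() {
  awk -v bw="$1" -v rtt="$2" 'BEGIN { printf "%d\n", bw * 1e6 / 8 * rtt / 1000 }'
}

# max_kbit_per_s WINDOW_BYTES RTT_MS: throughput ceiling in kbit/s for a single
# TCP connection whose window is capped at WINDOW_BYTES.
max_kbit_per_s() {
  awk -v win="$1" -v rtt="$2" 'BEGIN { printf "%d\n", win * 8 / (rtt / 1000) / 1000 }'
}

bdp_bytes 100 150         # prints 1875000
max_kbit_per_s 65536 150  # prints 3495
```

Under these assumptions a single connection with a 64 KiB window tops out near 3.5 Mbit/s no matter how fast the link is, which fits the observation that several concurrent clones helped.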
> 
>> Aleksey Shipilev has done an experiment whereby trees are regularly
>> cloned and compressed tarballs created; these can be downloaded in a
>> couple of minutes.  But really we don't want to depend on the largesse
>> of one developer: if we could download the OpenJDK trees directly by
>> means of wget (or something similar) we would reduce the load on the
>> servers and reduce the time taken to download as well.
> 
> I second that. Also, we can do Mercurial and compressing tricks to make the compressed archive
> easier to download. Happy to share the script that makes the densest .tar.xz without going
> full-crazy (maybe other simple tricks missing?):
> 
> function repack-jdk8 {
>  URL=$1
>  NAME=$2
>  if [ ! -d "$NAME" ]; then
>    hg clone "$URL" "$NAME"
>  fi
>  cd "$NAME"
>  hg pull
>  hg update
>  HGFOREST_GLOBALOPTS=" --config=format.generaldelta=1 --config=format.aggressivemergedeltas=1" sh common/bin/hgforest.sh clone
>  sh common/bin/hgforest.sh pull
>  # Empty the working copies so that only the .hg metadata gets archived
>  sh common/bin/hgforest.sh update null
>  hg update null
>  cd ..
>
>  # Cluster similar files together
>  find "$NAME/" -type f | awk -F '/' '{ print $(NF-2) " " $(NF-1) " " $(NF) " " $0; }' | sort | awk '{ print $4; }' > list.txt
>  tar -T list.txt -c -f - | xz -6 > "$NAME.tar.xz"
>
>  rsync "$NAME.tar.xz" builds@builds.shipilev.net:~/wwwroot/workspaces/
> }
> 
> function repack-jdk10 {
>  URL=$1
>  NAME=$2
>  if [ ! -d "$NAME" ]; then
>    hg --config=format.generaldelta=1 --config=format.aggressivemergedeltas=1 clone "$URL" "$NAME"
>  fi
>  cd "$NAME"
>  hg pull
>  # Empty the working copy so that only the .hg metadata gets archived
>  hg update null
>  cd ..
>
>  # Cluster similar files together
>  find "$NAME/" -type f | awk -F '/' '{ print $(NF-2) " " $(NF-1) " " $(NF) " " $0; }' | sort | awk '{ print $4; }' > list.txt
>
>  tar -T list.txt -c -f - | xz -6 > "$NAME.tar.xz"
>  rsync "$NAME.tar.xz" builds@builds.shipilev.net:~/wwwroot/workspaces/
> }
> 
> repack-jdk8  http://hg.openjdk.java.net/jdk8u/jdk8u/ jdk8u-jdk8u
> repack-jdk10 http://hg.openjdk.java.net/jdk/jdk      jdk-jdk
> 
> -Aleksey
> 
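
The "cluster similar files together" step can be tried on its own. The sketch below is a variant rather than the script's exact pipeline: it puts a tab between the sort key and the path so that paths containing spaces survive, and the function name `cluster_paths` is hypothetical. The sort key is the grandparent directory name, then the parent directory name, then the file name, so files that live under identically named directories end up adjacent in the archive wherever they sit in the tree. That gives xz a better chance of finding the redundancy between them.

```shell
# cluster_paths: read file paths on stdin and print them ordered by their last
# three components (grandparent directory, parent directory, file name).
cluster_paths() {
  awk -F '/' '{
    a = (NF >= 3) ? $(NF-2) : ""
    b = (NF >= 2) ? $(NF-1) : ""
    print a " " b " " $NF "\t" $0
  }' | LC_ALL=C sort | cut -f 2-
}
```

It would slot into the packing step as `find "$NAME/" -type f | cluster_paths > list.txt`, followed by the same `tar -T list.txt -c -f - | xz -6` invocation. Paths containing tabs or newlines are not handled.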


