hg clone is unbelievably slow

Tue Feb 6 11:05:05 UTC 2018

On 02/06/2018 11:50 AM, Andrew Haley wrote:
> Half an hour or more here.  AFAIK the problem is due to the
> inefficiency of Mercurial itself and the hg protocol.

The compounding factors are:
  - hg.openjdk.java.net is too far from Europe, and the bandwidth-delay product kills TCP
performance with regular-sized buffers (which have to be tuned on both the client and the server
side). This was partially alleviated by forests, where you had several concurrent hg clones
(hotspot, jdk, ...) running at the same time;
  - the Jigsaw and monorepo file moves inflated the repository size dramatically. See
https://builds.shipilev.net/workspaces/: 8u is 240 MB compressed, 9 is 420 MB compressed, 10 is 760
MB compressed!
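
To put rough numbers on the bandwidth-delay point, here is a small sketch of the usual upper bound for a single TCP stream: throughput cannot exceed window size divided by round-trip time. The 64 KiB window and the 150 ms round trip are assumptions picked for illustration, not measurements of hg.openjdk.java.net, and the helper names are made up for this example. The 760 MB figure is the jdk10 tarball size quoted above, which is not necessarily what hg clone puts on the wire.

```shell
#!/bin/sh
# Upper bound on single-stream TCP throughput: window size / round-trip time.
# All inputs below are illustrative assumptions, not measurements.

# max_throughput WINDOW_BYTES RTT_MS -> bytes per second (integer division)
max_throughput() {
  echo $(( $1 * 1000 / $2 ))
}

# transfer_seconds SIZE_BYTES WINDOW_BYTES RTT_MS -> whole seconds
transfer_seconds() {
  echo $(( $1 / $(max_throughput "$2" "$3") ))
}

window=65536                    # assumed 64 KiB effective TCP window
rtt_ms=150                      # assumed round-trip time in milliseconds
size=$(( 760 * 1024 * 1024 ))   # 760 MB, the jdk10 size quoted above

echo "max throughput: $(max_throughput "$window" "$rtt_ms") bytes/s"
echo "transfer time:  $(transfer_seconds "$size" "$window" "$rtt_ms") s"
```

Under these assumptions the bound works out to roughly half an hour for one stream, regardless of how fat the link is. Several concurrent streams each get their own window, which is why the forest layout helped.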

> Aleksey Shipilev has done an experiment whereby trees are regularly
> cloned and compressed tarballs created; these can be downloaded in a
> couple of minutes.  But really we don't want to depend on the largesse
> of one developer: if we could download the OpenJDK trees directly by
> means of wget (or something similar) we would reduce the load on the
> servers and reduce the time taken to download as well.

I second that. We can also use Mercurial and compression tricks to make the archive smaller and
easier to download. Happy to share the script that makes the densest .tar.xz without going
full-crazy (maybe other simple tricks are missing?):

function repack-jdk8 {
  URL=$1
  NAME=$2
  if [ ! -d $NAME ]; then
    hg clone $URL $NAME
  fi
  cd $NAME
  hg pull
  hg update
  HGFOREST_GLOBALOPTS=" --config=format.generaldelta=1 --config=format.aggressivemergedeltas=1" \
    sh common/bin/hgforest.sh clone
  sh common/bin/hgforest.sh pull
  # Empty the working copies so that only the .hg history gets archived
  sh common/bin/hgforest.sh update null
  hg update null
  cd ..

  # Cluster similar files together
  find $NAME/ -type f | awk -F '/' '{ print $(NF-2) " " $(NF-1) " " $(NF) " " $0; }' | sort \
    | awk '{ print $4; }' > list.txt
  tar -T list.txt -c -f - | xz -6 > $NAME.tar.xz

  rsync $NAME.tar.xz builds@builds.shipilev.net:~/wwwroot/workspaces/
}

function repack-jdk10 {
  URL=$1
  NAME=$2
  if [ ! -d $NAME ]; then
    hg --config=format.generaldelta=1 --config=format.aggressivemergedeltas=1  clone $URL $NAME
  fi
  cd $NAME
  hg pull
  # Empty the working copy so that only the .hg history gets archived
  hg update null
  cd ..

  # Cluster similar files together
  find $NAME/ -type f | awk -F '/' '{ print $(NF-2) " " $(NF-1) " " $(NF) " " $0; }' | sort \
    | awk '{ print $4; }' > list.txt

  tar -T list.txt -c -f - | xz -6 > $NAME.tar.xz
  rsync $NAME.tar.xz builds@builds.shipilev.net:~/wwwroot/workspaces/
}

repack-jdk8  http://hg.openjdk.java.net/jdk8u/jdk8u/ jdk8u-jdk8u
repack-jdk10 http://hg.openjdk.java.net/jdk/jdk      jdk-jdk
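
For anyone wondering what the "cluster similar files together" step does, here is a standalone sketch of the same awk/sort pipeline on three made-up paths. The function name cluster_order and the sample paths are hypothetical, and LC_ALL=C is added here only to make the ordering reproducible; the script above uses plain sort. The idea is that each path is keyed by its last three components (grandparent directory, parent directory, file name), so files with the same name and location in different subtrees end up adjacent in the tar stream, where xz can exploit their similarity.

```shell
#!/bin/sh
# Reorder a list of paths (one per line on stdin) by their last three
# path components, then emit the original paths in that order.
# Assumes paths have at least three components and contain no spaces.
cluster_order() {
  awk -F '/' '{ print $(NF-2) " " $(NF-1) " " $(NF) " " $0; }' \
    | LC_ALL=C sort \
    | awk '{ print $4; }'
}

# Hypothetical sample paths: two files live under x/y in different subtrees.
printf '%s\n' \
  'r/a/x/y/B.i' \
  'r/b/p/q/A.i' \
  'r/c/x/y/A.i' \
  | cluster_order
```

The two x/y entries come out next to each other even though they sit under different top-level directories, which is the whole point of the reordering.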

-Aleksey