RFR: race with nested repos in hgforest.sh

Neil Richards neil.richards at ngmr.net
Wed Feb 6 16:51:15 UTC 2013


fwiw, I've always found the attempts by hg_forest to process things in
parallel cause me more pain than it's worth.

Marching through the repo locations serially in order might take a few
moments longer elapsed time, but creates instantly understandable
status / output / log and points for returning sensible errors when
things don't succeed.

Doing so also reduces the burst load on the server (it has to deal with
one request per invocation at any time, rather than five? (or more)),
which also strikes me as a win.

(There are times that my 'hg pull' requests get momentarily rebuffed by
the OpenJDK server, in a manner that suggests to me it's getting flooded
by such requests).

Obviously, figuring out how to do everything properly in parallel is a
lot more fun :) ...
but I don't think this script is there yet, and I wonder if the KISS
principle shouldn't apply here.

Regards,
Neil

On Wed, 2013-02-06 at 10:41 +0000, Chris Hegarty wrote:
> 
> On 06/02/2013 10:24, David Holmes wrote:
> > Thanks Chris. I missed a couple of obvious things there. :(
> >
> > One thing related to the repo name in the $(rc) variable. Somewhere the
> > repo name gets mangled:
> >
> > + echo jdk/src/closed
> > + sed -e s at ./@@ -e s@/@_ at g
> > + repopidfile=jdsrc_closed
> >
> > the / to _ doesn't work for the first / (not part of your fix though).
> >
> > I must say though that this bug still has me perplexed. I ran with set
> > -x and observed the following. Given:
> >
> > # Repositories: corba jaxp jaxws langtools jdk hotspot jdk/src/closed
> > jdk/make/closed jdk/test/closed hotspot/make/closed
> > hotspot/src/closed hotspot/test/closed deploy install sponsors pubs
> >
> > The first clone I see is corba and the second is jdk/make/closed. The
> > very last clone I see is jdk. BUT there was no error creating the
> > sub-repos before the enclosing repo! ??? Makes no sense.
> 
> I only see the problem when running with old versions of hg. More recent 
> ones seem to be immune to it. I guess they must do something equivalent 
> to 'mkdir -p' on the destination directory.
> 
> Blocking the clone of the nested repos until the enclosing repo creates 
> the holding directory seems like a small price to pay for such an 
> obscure bug.
> 
> Also, now returning an error code when something fails should help avoid 
> any future issues of missing or out of date repos.
> 
> -Chris.
> 
> >
> > David
> > -----
> >
> > On 6/02/2013 8:04 PM, Chris Hegarty wrote:
> >> Thank for the review. comments inline...
> >>
> >> On 06/02/2013 01:53, David Holmes wrote:
> >>> 204 # Terminate with exit 0 only if all subprocesses were successful
> >>> 205 ec=0
> >>> 206 if [ -d ${tmp} ]; then
> >>> 207 for rc in `ls ${tmp}/*.pid.rc` ; do
> >>>
> >>> I don't think you need to subshell an invocation of ls, simply do
> >>>
> >>> for rc in "${tmp}/*.pid.rc" ; do
> >>
> >> Yes this is better, but the quotes are incorrect. It should simply be
> >> for rc in ${tmp}/*.pid.rc ; do
> >>
> >>> 208 exit_code=`cat ${rc} | tr -d ' \n\r'`
> >>> 209 if [ "${exit_code}" != "0" ] ; then
> >>> 210 echo "WARNING: ${rc} exited abnormally."
> >>>
> >>> Not sure printing ${rc} will be very informative there ??
> >>
> >> It prints the file name which contains the subrepo name. Not pretty, but
> >> this gives more information about which clone/pull failed.
> >>
> >>> 211 ec=1
> >>>
> >>> Is there a "break" we can add here?
> >>
> >> Yes, we could break out/exit early, but I deliberately want to iterator
> >> over all the child process exits codes to better highlight which ones,
> >> if more than one, failed.
> >>
> >>> In get_source.sh
> >>>
> >>> # Get clones of all nested repositories
> >>> ! sh ./common/bin/hgforest.sh clone "$@" || exit 1
> >>>
> >>> # Update all existing repositories to the latest sources
> >>> sh ./common/bin/hgforest.sh pull -u
> >>>
> >>> don't we want "|| exit 1" for the pull case as well?
> >>
> >> Since the pull is the last command there is no need for "|| exit 1",
> >> since its exit code will be returned.
> >>
> >> -Chris.
> >>
> >>>
> >>> Thanks,
> >>> David
> >>>
> >>> On 6/02/2013 12:07 AM, Chris Hegarty wrote:
> >>>> On 02/05/2013 06:21 AM, David Holmes wrote:
> >>>>> Chris,
> >>>>>
> >>>>> When these failures occur does the failure get reflected in an error
> >>>>> exit code?
> >>>>
> >>>> Not currently, good point.
> >>>>
> >>>>> I'm seeing hudson builds merilly buildinh OpenJDK instead of Oracle
> >>>>> JDK
> >>>>> because the closed repos were skipped due to this bug :(
> >>>>
> >>>> This is what I am seeing too, and the reason I proposed a patch to
> >>>> resolve it.
> >>>>
> >>>> Updated webrev;
> >>>> http://cr.openjdk.java.net/~chegar/hgforest_nestedRepos/webrev/
> >>>>
> >>>> 1) now checks, and returns, appropriate error code, so that higher
> >>>> level scripts can determine if it completed successfully
> >>>> 2) Added diagnostic error message to help identify potential future
> >>>> problems if a nested repo is blocked for too long
> >>>>
> >>>> -Chris.
> >>>>
> >>>>>
> >>>>> David
> >>>>>
> >>>>> On 2/02/2013 12:40 AM, Chris Hegarty wrote:
> >>>>>> [ to build-dev and core-libs-dev, expect reviewer from either, but
> >>>>>> will
> >>>>>> integrate through jdk8/tl ]
> >>>>>>
> >>>>>> This issue is mainly of interest to Oracle engineers, but it effects
> >>>>>> the
> >>>>>> public hgforest script.
> >>>>>>
> >>>>>> When hgforest.sh is run with an addition argument to specify a closed
> >>>>>> server, there is a problem/race between the creation of the
> >>>>>> directories
> >>>>>> to hold nested repositories and the clone itself. These directories
> >>>>>> need
> >>>>>> to be created before the clone command is executed, otherwise it will
> >>>>>> fail, as below.
> >>>>>>
> >>>>>> The trivial fix is to back off these nested repos until their
> >>>>>> containing
> >>>>>> directory exists.
> >>>>>>
> >>>>>> Webrev:
> >>>>>> http://cr.openjdk.java.net/~chegar/hgforest_nestedRepos/webrev/
> >>>>>>
> >>>>>> sh ./get_source.sh http://xxx.yyy.oracle.com | & tee clone.log
> >>>>>> # Repositories: corba jaxp jaxws langtools jdk hotspot jdk/src/closed
> >>>>>> jdk/make/closed jdk/test/closed hotspot/make/closed
> >>>>>> hotspot/src/closed
> >>>>>> hotspot/test/closed deploy install sponsors pubs
> >>>>>>
> >>>>>> Waiting 5 secs before spawning next background command.
> >>>>>> Waiting 5 secs before spawning next background command.
> >>>>>> Waiting 5 secs before spawning next background command.
> >>>>>> jdk/src/closed: /java/devtools/sparc/mercurial/0.9.5/bin/python -u
> >>>>>> /java/devtools/sparc/mercurial/latest/bin/hg clone
> >>>>>> http://xxx.yyy.oracle.com/jdk8/tl/jdk/src/closed jdk/src/closed
> >>>>>> jdk/src/closed: abort: No such file or directory: jdk/src/closed
> >>>>>> jdk/make/closed: /java/devtools/sparc/mercurial/0.9.5/bin/python -u
> >>>>>> /java/devtools/sparc/mercurial/latest/bin/hg clone
> >>>>>> http://xxx.yyy.oracle.com/jdk8/tl/jdk/make/closed jdk/make/closed
> >>>>>> jdk/make/closed: abort: No such file or directory: jdk/make/closed
> >>>>>> Waiting 5 secs before spawning next background command.
> >>>>>> ....
> >>>>>>
> >>>>>>
> >>>>>> -Chris.


-- 
Unless stated above:
IBM email: neil_richards at uk.ibm.com
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU




More information about the core-libs-dev mailing list