RFR: race with nested repos in hgforest.sh

Chris Hegarty chris.hegarty at oracle.com
Wed Feb 6 17:03:36 UTC 2013


Neil,

I think a wider discussion needs to be had around how this script should 
operate, but I didn't want this specific issue to get side tracked by that.

The hgforest script already clones/pulls in parallel, the problem here 
is that it just doesn't work with older versions of hg. And worse than 
that doesn't even return an error. I simply want to resolve this issue 
so that the script is once again useable.

If there is an issue with parallel clones/pulls, then we should consider 
that separately.

-Chris.

On 06/02/2013 16:51, Neil Richards wrote:
> fwiw, I've always found the attempts by hg_forest to process things in
> parallel cause me more pain than it's worth.
>
> Marching through the repo locations serially in order might take a few
> moments longer elapsed time, but creates instantly understandable
> status / output / log and points for returning sensible errors when
> things don't succeed.
>
> Doing so also reduces the burst load on the server (it has to deal with
> one request per invocation at any time, rather than five? (or more)),
> which also strikes me as a win.
>
> (There are times that my 'hg pull' requests get momentarily rebuffed by
> the OpenJDK server, in a manner that suggests to me it's getting flooded
> by such requests).
>
> Obviously, figuring out how to do everything properly in parallel is a
> lot more fun :) ...
> but I don't think this script is there yet, and I wonder if the KISS
> principle shouldn't apply here.
>
> Regards,
> Neil
>
> On Wed, 2013-02-06 at 10:41 +0000, Chris Hegarty wrote:
>>
>> On 06/02/2013 10:24, David Holmes wrote:
>>> Thanks Chris. I missed a couple of obvious things there. :(
>>>
>>> One thing related to the repo name in the $(rc) variable. Somewhere the
>>> repo name gets mangled:
>>>
>>> + echo jdk/src/closed
>>> + sed -e s at ./@@ -e s@/@_ at g
>>> + repopidfile=jdsrc_closed
>>>
>>> the / to _ doesn't work for the first / (not part of your fix though).
>>>
>>> I must say though that this bug still has me perplexed. I ran with set
>>> -x and observed the following. Given:
>>>
>>> # Repositories: corba jaxp jaxws langtools jdk hotspot jdk/src/closed
>>> jdk/make/closed jdk/test/closed hotspot/make/closed
>>> hotspot/src/closed hotspot/test/closed deploy install sponsors pubs
>>>
>>> The first clone I see is corba and the second is jdk/make/closed. The
>>> very last clone I see is jdk. BUT there was no error creating the
>>> sub-repos before the enclosing repo! ??? Makes no sense.
>>
>> I only see the problem when running with old versions of hg. More recent
>> ones seem to be immune to it. I guess they must do something equivalent
>> to 'mkdir -p' on the destination directory.
>>
>> Blocking the clone of the nested repos until the enclosing repo creates
>> the holding directory seems like a small price to pay for such an
>> obscure bug.
>>
>> Also, now returning an error code when something fails should help avoid
>> any future issues of missing or out of date repos.
>>
>> -Chris.
>>
>>>
>>> David
>>> -----
>>>
>>> On 6/02/2013 8:04 PM, Chris Hegarty wrote:
>>>> Thank for the review. comments inline...
>>>>
>>>> On 06/02/2013 01:53, David Holmes wrote:
>>>>> 204 # Terminate with exit 0 only if all subprocesses were successful
>>>>> 205 ec=0
>>>>> 206 if [ -d ${tmp} ]; then
>>>>> 207 for rc in `ls ${tmp}/*.pid.rc` ; do
>>>>>
>>>>> I don't think you need to subshell an invocation of ls, simply do
>>>>>
>>>>> for rc in "${tmp}/*.pid.rc" ; do
>>>>
>>>> Yes this is better, but the quotes are incorrect. It should simply be
>>>> for rc in ${tmp}/*.pid.rc ; do
>>>>
>>>>> 208 exit_code=`cat ${rc} | tr -d ' \n\r'`
>>>>> 209 if [ "${exit_code}" != "0" ] ; then
>>>>> 210 echo "WARNING: ${rc} exited abnormally."
>>>>>
>>>>> Not sure printing ${rc} will be very informative there ??
>>>>
>>>> It prints the file name which contains the subrepo name. Not pretty, but
>>>> this gives more information about which clone/pull failed.
>>>>
>>>>> 211 ec=1
>>>>>
>>>>> Is there a "break" we can add here?
>>>>
>>>> Yes, we could break out/exit early, but I deliberately want to iterator
>>>> over all the child process exits codes to better highlight which ones,
>>>> if more than one, failed.
>>>>
>>>>> In get_source.sh
>>>>>
>>>>> # Get clones of all nested repositories
>>>>> ! sh ./common/bin/hgforest.sh clone "$@" || exit 1
>>>>>
>>>>> # Update all existing repositories to the latest sources
>>>>> sh ./common/bin/hgforest.sh pull -u
>>>>>
>>>>> don't we want "|| exit 1" for the pull case as well?
>>>>
>>>> Since the pull is the last command there is no need for "|| exit 1",
>>>> since its exit code will be returned.
>>>>
>>>> -Chris.
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On 6/02/2013 12:07 AM, Chris Hegarty wrote:
>>>>>> On 02/05/2013 06:21 AM, David Holmes wrote:
>>>>>>> Chris,
>>>>>>>
>>>>>>> When these failures occur does the failure get reflected in an error
>>>>>>> exit code?
>>>>>>
>>>>>> Not currently, good point.
>>>>>>
>>>>>>> I'm seeing hudson builds merilly buildinh OpenJDK instead of Oracle
>>>>>>> JDK
>>>>>>> because the closed repos were skipped due to this bug :(
>>>>>>
>>>>>> This is what I am seeing too, and the reason I proposed a patch to
>>>>>> resolve it.
>>>>>>
>>>>>> Updated webrev;
>>>>>> http://cr.openjdk.java.net/~chegar/hgforest_nestedRepos/webrev/
>>>>>>
>>>>>> 1) now checks, and returns, appropriate error code, so that higher
>>>>>> level scripts can determine if it completed successfully
>>>>>> 2) Added diagnostic error message to help identify potential future
>>>>>> problems if a nested repo is blocked for too long
>>>>>>
>>>>>> -Chris.
>>>>>>
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> On 2/02/2013 12:40 AM, Chris Hegarty wrote:
>>>>>>>> [ to build-dev and core-libs-dev, expect reviewer from either, but
>>>>>>>> will
>>>>>>>> integrate through jdk8/tl ]
>>>>>>>>
>>>>>>>> This issue is mainly of interest to Oracle engineers, but it effects
>>>>>>>> the
>>>>>>>> public hgforest script.
>>>>>>>>
>>>>>>>> When hgforest.sh is run with an addition argument to specify a closed
>>>>>>>> server, there is a problem/race between the creation of the
>>>>>>>> directories
>>>>>>>> to hold nested repositories and the clone itself. These directories
>>>>>>>> need
>>>>>>>> to be created before the clone command is executed, otherwise it will
>>>>>>>> fail, as below.
>>>>>>>>
>>>>>>>> The trivial fix is to back off these nested repos until their
>>>>>>>> containing
>>>>>>>> directory exists.
>>>>>>>>
>>>>>>>> Webrev:
>>>>>>>> http://cr.openjdk.java.net/~chegar/hgforest_nestedRepos/webrev/
>>>>>>>>
>>>>>>>> sh ./get_source.sh http://xxx.yyy.oracle.com |&  tee clone.log
>>>>>>>> # Repositories: corba jaxp jaxws langtools jdk hotspot jdk/src/closed
>>>>>>>> jdk/make/closed jdk/test/closed hotspot/make/closed
>>>>>>>> hotspot/src/closed
>>>>>>>> hotspot/test/closed deploy install sponsors pubs
>>>>>>>>
>>>>>>>> Waiting 5 secs before spawning next background command.
>>>>>>>> Waiting 5 secs before spawning next background command.
>>>>>>>> Waiting 5 secs before spawning next background command.
>>>>>>>> jdk/src/closed: /java/devtools/sparc/mercurial/0.9.5/bin/python -u
>>>>>>>> /java/devtools/sparc/mercurial/latest/bin/hg clone
>>>>>>>> http://xxx.yyy.oracle.com/jdk8/tl/jdk/src/closed jdk/src/closed
>>>>>>>> jdk/src/closed: abort: No such file or directory: jdk/src/closed
>>>>>>>> jdk/make/closed: /java/devtools/sparc/mercurial/0.9.5/bin/python -u
>>>>>>>> /java/devtools/sparc/mercurial/latest/bin/hg clone
>>>>>>>> http://xxx.yyy.oracle.com/jdk8/tl/jdk/make/closed jdk/make/closed
>>>>>>>> jdk/make/closed: abort: No such file or directory: jdk/make/closed
>>>>>>>> Waiting 5 secs before spawning next background command.
>>>>>>>> ....
>>>>>>>>
>>>>>>>>
>>>>>>>> -Chris.
>
>



More information about the build-dev mailing list