RFR: race with nested repos in hgforest.sh
Chris Hegarty
chris.hegarty at oracle.com
Wed Feb 6 10:41:39 UTC 2013
On 06/02/2013 10:24, David Holmes wrote:
> Thanks Chris. I missed a couple of obvious things there. :(
>
> One thing related to the repo name in the $(rc) variable. Somewhere the
> repo name gets mangled:
>
> + echo jdk/src/closed
> + sed -e s at ./@@ -e s@/@_ at g
> + repopidfile=jdsrc_closed
>
> the / to _ doesn't work for the first / (not part of your fix though).
>
> I must say though that this bug still has me perplexed. I ran with set
> -x and observed the following. Given:
>
> # Repositories: corba jaxp jaxws langtools jdk hotspot jdk/src/closed
> jdk/make/closed jdk/test/closed hotspot/make/closed
> hotspot/src/closed hotspot/test/closed deploy install sponsors pubs
>
> The first clone I see is corba and the second is jdk/make/closed. The
> very last clone I see is jdk. BUT there was no error creating the
> sub-repos before the enclosing repo! ??? Makes no sense.
I only see the problem when running with old versions of hg. More recent
ones seem to be immune to it. I guess they must do something equivalent
to 'mkdir -p' on the destination directory.
Blocking the clone of the nested repos until the enclosing repo creates
the holding directory seems like a small price to pay for such an
obscure bug.
Also, now returning an error code when something fails should help avoid
any future issues of missing or out of date repos.
-Chris.
>
> David
> -----
>
> On 6/02/2013 8:04 PM, Chris Hegarty wrote:
>> Thank for the review. comments inline...
>>
>> On 06/02/2013 01:53, David Holmes wrote:
>>> 204 # Terminate with exit 0 only if all subprocesses were successful
>>> 205 ec=0
>>> 206 if [ -d ${tmp} ]; then
>>> 207 for rc in `ls ${tmp}/*.pid.rc` ; do
>>>
>>> I don't think you need to subshell an invocation of ls, simply do
>>>
>>> for rc in "${tmp}/*.pid.rc" ; do
>>
>> Yes this is better, but the quotes are incorrect. It should simply be
>> for rc in ${tmp}/*.pid.rc ; do
>>
>>> 208 exit_code=`cat ${rc} | tr -d ' \n\r'`
>>> 209 if [ "${exit_code}" != "0" ] ; then
>>> 210 echo "WARNING: ${rc} exited abnormally."
>>>
>>> Not sure printing ${rc} will be very informative there ??
>>
>> It prints the file name which contains the subrepo name. Not pretty, but
>> this gives more information about which clone/pull failed.
>>
>>> 211 ec=1
>>>
>>> Is there a "break" we can add here?
>>
>> Yes, we could break out/exit early, but I deliberately want to iterator
>> over all the child process exits codes to better highlight which ones,
>> if more than one, failed.
>>
>>> In get_source.sh
>>>
>>> # Get clones of all nested repositories
>>> ! sh ./common/bin/hgforest.sh clone "$@" || exit 1
>>>
>>> # Update all existing repositories to the latest sources
>>> sh ./common/bin/hgforest.sh pull -u
>>>
>>> don't we want "|| exit 1" for the pull case as well?
>>
>> Since the pull is the last command there is no need for "|| exit 1",
>> since its exit code will be returned.
>>
>> -Chris.
>>
>>>
>>> Thanks,
>>> David
>>>
>>> On 6/02/2013 12:07 AM, Chris Hegarty wrote:
>>>> On 02/05/2013 06:21 AM, David Holmes wrote:
>>>>> Chris,
>>>>>
>>>>> When these failures occur does the failure get reflected in an error
>>>>> exit code?
>>>>
>>>> Not currently, good point.
>>>>
>>>>> I'm seeing hudson builds merilly buildinh OpenJDK instead of Oracle
>>>>> JDK
>>>>> because the closed repos were skipped due to this bug :(
>>>>
>>>> This is what I am seeing too, and the reason I proposed a patch to
>>>> resolve it.
>>>>
>>>> Updated webrev;
>>>> http://cr.openjdk.java.net/~chegar/hgforest_nestedRepos/webrev/
>>>>
>>>> 1) now checks, and returns, appropriate error code, so that higher
>>>> level scripts can determine if it completed successfully
>>>> 2) Added diagnostic error message to help identify potential future
>>>> problems if a nested repo is blocked for too long
>>>>
>>>> -Chris.
>>>>
>>>>>
>>>>> David
>>>>>
>>>>> On 2/02/2013 12:40 AM, Chris Hegarty wrote:
>>>>>> [ to build-dev and core-libs-dev, expect reviewer from either, but
>>>>>> will
>>>>>> integrate through jdk8/tl ]
>>>>>>
>>>>>> This issue is mainly of interest to Oracle engineers, but it effects
>>>>>> the
>>>>>> public hgforest script.
>>>>>>
>>>>>> When hgforest.sh is run with an addition argument to specify a closed
>>>>>> server, there is a problem/race between the creation of the
>>>>>> directories
>>>>>> to hold nested repositories and the clone itself. These directories
>>>>>> need
>>>>>> to be created before the clone command is executed, otherwise it will
>>>>>> fail, as below.
>>>>>>
>>>>>> The trivial fix is to back off these nested repos until their
>>>>>> containing
>>>>>> directory exists.
>>>>>>
>>>>>> Webrev:
>>>>>> http://cr.openjdk.java.net/~chegar/hgforest_nestedRepos/webrev/
>>>>>>
>>>>>> sh ./get_source.sh http://xxx.yyy.oracle.com | & tee clone.log
>>>>>> # Repositories: corba jaxp jaxws langtools jdk hotspot jdk/src/closed
>>>>>> jdk/make/closed jdk/test/closed hotspot/make/closed
>>>>>> hotspot/src/closed
>>>>>> hotspot/test/closed deploy install sponsors pubs
>>>>>>
>>>>>> Waiting 5 secs before spawning next background command.
>>>>>> Waiting 5 secs before spawning next background command.
>>>>>> Waiting 5 secs before spawning next background command.
>>>>>> jdk/src/closed: /java/devtools/sparc/mercurial/0.9.5/bin/python -u
>>>>>> /java/devtools/sparc/mercurial/latest/bin/hg clone
>>>>>> http://xxx.yyy.oracle.com/jdk8/tl/jdk/src/closed jdk/src/closed
>>>>>> jdk/src/closed: abort: No such file or directory: jdk/src/closed
>>>>>> jdk/make/closed: /java/devtools/sparc/mercurial/0.9.5/bin/python -u
>>>>>> /java/devtools/sparc/mercurial/latest/bin/hg clone
>>>>>> http://xxx.yyy.oracle.com/jdk8/tl/jdk/make/closed jdk/make/closed
>>>>>> jdk/make/closed: abort: No such file or directory: jdk/make/closed
>>>>>> Waiting 5 secs before spawning next background command.
>>>>>> ....
>>>>>>
>>>>>>
>>>>>> -Chris.
More information about the build-dev
mailing list