Some update on Cygwin hangs

Magnus Ihse Bursie magnus.ihse.bursie at oracle.com
Thu Oct 18 02:39:43 PDT 2012


Erik and I have been chasing the cygwin instability for a while. This is 
a report of our (or at least my :)) current understanding.
* Overall, cygwin builds seem pretty stable nowadays, even on Windows 7.
* The one glaring exception is a hang that's quite repeatable, under the 
right circumstances.
* This hang happens while building images. Typically you see output 
about "ctsym" right before the hang. We've named this the "ctsym bug", 
even though ctsym does not have anything to do with it. (It's just about 
the last thing we managed to do successfully before the hang.)
* Adding a simple "echo" output between running create-jars and images 
made the issue much harder to repeat.
* When running with JOBS=1, the hang is much harder to reproduce.

I made a small hack that ran just the images target with -j1, and that 
ran orders of magnitude more times before hanging than when running with 
default parallelism (4 on my machine). However, in the end it still 
hung, though only after 2-3 days.
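
For reference, a repeat-until-hang loop along these lines could look like 
the sketch below. It is not the actual hack: run_until_hang is a made-up 
helper name, it relies on timeout from GNU coreutils (which exits with 
status 124 when the limit is reached), and the command to run is whatever 
invokes the images target on your setup.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly until one run exceeds a
# time limit. Usage: run_until_hang MAX_RUNS LIMIT_SECONDS COMMAND [ARGS...]
run_until_hang() {
    max_runs=$1
    limit=$2
    shift 2
    i=1
    while [ "$i" -le "$max_runs" ]; do
        status=0
        # GNU timeout exits with status 124 when the limit is reached.
        timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
        if [ "$status" -eq 124 ]; then
            echo "run $i: no completion within ${limit}s (possible hang)"
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $max_runs runs without a hang"
    return 0
}
```

Something like "run_until_hang 1000 3600 make -f Images.gmk" would stop at 
the first run that takes longer than an hour. The counts and the make 
command line are placeholders, and the images output would presumably have 
to be removed between runs so that the rules execute again.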

I have now managed to reproduce the hang with JOBS=1 LOG=trace. Of 
course, adding debugging might change things enough that this is not the 
real case, but it seems likely to be.

This is how far we got:
* make -f Images.gmk
* all static code (VAR=$(shell ...), mostly a bunch of finds) has been 
executed.
* we have *just* started executing our first rule, which is at line 77 
in Images.gmk:
$(JRE_IMAGE_DIR)/bin/%: $(JDK_OUTPUTDIR)/bin/%
     $(ECHO) $(LOG_INFO) Copying $(patsubst $(OUTPUT_ROOT)/%,%,$@)
     $(install-file)

The last few lines of output are:
Images.gmk:78: Building /cygdrive/c/localdata/hg/build-infra-jdk8-b/build/windows-x86_64-normal-server-release/images/j2re-image/bin/attach.diz (from /cygdrive/c/localdata/hg/build-infra-jdk8-b/build/windows-x86_64-normal-server-release/jdk/bin/attach.diz) (/cygdrive/c/localdata/hg/build-infra-jdk8-b/build/windows-x86_64-normal-server-release/jdk/bin/attach.diz newer)
/usr/bin/echo  Copying images/j2re-image/bin/attach.diz

and then we hang. The macro install-file is defined as follows:
ifeq ($(OPENJDK_TARGET_OS),solaris)
# On Solaris, if the target is a symlink and exists, cp won't overwrite.
define install-file
# ...
endef
else ifeq ($(OPENJDK_TARGET_OS),macosx)
define install-file
# ...
endef
else
define install-file
     $(MKDIR) -p $(@D)
     $(CP) -fP '$<' '$@'
endef
endif

So we seem to get stuck at the mkdir. Let's check the output dir!
$ ls /cygdrive/c/localdata/hg/build-infra-jdk8-b/build/windows-x86_64-normal-server-release/images/j2re-image/bin
ls: cannot access /cygdrive/c/localdata/hg/build-infra-jdk8-b/build/windows-x86_64-normal-server-release/images/j2re-image/bin: No such file or directory

Aha! Not created. We haven't even started the recursive mkdir:
$ ls /cygdrive/c/localdata/hg/build-infra-jdk8-b/build/windows-x86_64-normal-server-release/images
lib  local_policy_jar.tmp  src  src.zip  symbols US_export_policy_jar.tmp

Look, no j2re-image directory.
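
To pin down which of the two steps in the default install-file branch is 
actually reached, each step could be bracketed with a trace line. The 
sketch below is a plain shell stand-in for that branch, not the macro 
itself: install_file_traced is a made-up name, and the trace messages go 
to stderr so they don't mix with normal output.

```shell
#!/bin/sh
# Hypothetical shell stand-in for the default install-file branch
# (mkdir -p on the target directory, then cp -fP), with a trace line
# before and after each step.
# Usage: install_file_traced SOURCE DESTINATION
install_file_traced() {
    src=$1
    dst=$2
    dstdir=$(dirname "$dst")
    echo "trace: before mkdir -p $dstdir" >&2
    mkdir -p "$dstdir" || return 1
    echo "trace: after mkdir, before cp" >&2
    cp -fP "$src" "$dst" || return 1
    echo "trace: after cp $dst" >&2
}
```

If the first trace line is the last thing printed, the mkdir itself never 
returned. If not even that line shows up, the problem sits before the 
recipe's commands are run at all.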

So what do we make of this? I don't know. I'm not sure how to proceed on 
this, except to add some more debug output. It might be that multi-level 
directory creation (mkdir -p needed to create both j2re-image and 
j2re-image/bin) is unstable in make on Windows. On the other hand, since 
we're running with LOG=trace, make should always execute the external 
shell rather than take a shortcut for operations it knows about.
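
One cheap way to probe the multi-level mkdir theory outside of make would 
be a stress loop like the sketch below. The assumptions are mine: 
stress_mkdir_p is a made-up name, timeout comes from GNU coreutils, and 
the loop works in a scratch directory from mktemp rather than in the 
build output. It only exercises mkdir -p from a plain shell, so a clean 
run says nothing about how make spawns the command.

```shell
#!/bin/sh
# Hypothetical stress test: repeatedly create a two-level directory
# (like j2re-image/bin) with mkdir -p, each attempt under a time limit.
# Usage: stress_mkdir_p COUNT LIMIT_SECONDS
stress_mkdir_p() {
    count=$1
    limit=$2
    base=$(mktemp -d) || return 2
    i=1
    while [ "$i" -le "$count" ]; do
        target="$base/run$i/j2re-image/bin"
        status=0
        # Status 124 from GNU timeout means mkdir did not finish in time.
        timeout "$limit" mkdir -p "$target" || status=$?
        if [ "$status" -ne 0 ] || [ ! -d "$target" ]; then
            echo "iteration $i: mkdir -p failed or timed out (status $status)"
            rm -rf "$base"
            return 1
        fi
        i=$((i + 1))
    done
    rm -rf "$base"
    echo "created $count two-level directories"
}
```

If a loop like this survives many thousands of iterations while the build 
still hangs, that would point away from mkdir itself and toward how the 
command gets launched.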

I didn't say I had a solution, just an update. :-)

/Magnus



