Review request for 5049299

Martin Buchholz martinrb at google.com
Tue Jun 23 00:45:59 UTC 2009


clone-exec update:

I submitted the changes for this, but the jtreg tests failed on 32-bit Linux
(I had only tested on 64-bit Linux).

We disabled (but did not roll back) the use of clone to allow the
TL integration to proceed.

(As I promised elsewhere...)
I just filed a bug against upstream glibc demonstrating the problem
with clone(CLONE_VM).  You can see the small C program
in my bug report below.
Discussion that concerns only the glibc bug should probably
happen on the public glibc bugzilla at
http://sources.redhat.com/bugzilla/show_bug.cgi?id=10311

glibc maintainer Uli Drepper has already responded
saying

"If you use clone() you're on your own."

so if we are going to fix it, we'll have to do it ourselves.
Help from threading/kernel hackers appreciated.

Thanks much,

Martin

 ---------- Forwarded message ----------
From: martinrb at google dot com <sourceware-bugzilla at sourceware.org>
Date: Mon, Jun 22, 2009 at 12:23
Subject: [Bug nptl/10311] New: clone(CLONE_VM) fails with pthread_getattr_np
on i386
To: martinrb at google.com


I'm using clone() with flags CLONE_VM, but not CLONE_THREAD.
(background: I'm trying to solve the ancient overcommit failure
when spawning a small Unix process from a big process).
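
For readers who have not seen the pattern, here is a minimal sketch of
spawning a small process with clone(CLONE_VM) but without CLONE_THREAD.
It is an illustration only, not the JDK code: the names child_fn and
spawn_with_clone_vm, the 64 KB stack size, and the empty environment are
all assumptions made for the example.  On a glibc affected by the problem
described here, the parent's pthread state may be disturbed after the call.

```c
#define _GNU_SOURCE            /* for clone() */
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs in the child, which shares the parent's memory: exec right away
 * and touch as little shared state as possible. */
static int child_fn(void *arg) {
    char **argv = arg;
    char *envp[] = { NULL };   /* explicit empty environment; environ is left alone */
    execve(argv[0], argv, envp);
    _exit(127);                /* reached only if execve failed */
}

/* Hypothetical helper: run argv[0] and return its exit status, or -1. */
static int spawn_with_clone_vm(char **argv) {
    const size_t stack_size = 64 * 1024;   /* assumed to be plenty for exec */
    char *stack;
    pid_t pid;
    int status = 0;

    assert(argv != NULL && argv[0] != NULL);
    stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    /* The stack grows down on x86, so pass the top of the allocation. */
    pid = clone(child_fn, stack + stack_size, CLONE_VM | SIGCHLD, argv);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    /* The child may still be running on this stack until it has exec'd or
     * exited, so if the wait fails we deliberately leak the stack. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    free(stack);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called with the argument vector { "/bin/sh", "-c", "exit 3", NULL }, this
should return 3 on a system that has /bin/sh.  No copy of the parent's
address space is made, which is the point of using CLONE_VM here.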

The act of calling clone appears to mess up the pthread library,
but only on i386, not on x86_64, using glibc version 2.7.
(The bugzilla Version drop-down does not allow one to specify 2.7;
y'all should fix that.)

Here's a shell transcript containing a program
that demonstrates the problem, and shows that
the problem does not occur when running in 64-bit mode
on 64-bit Linux.  (The problem also occurs when running in 32-bit mode
on 32-bit Linux).

A program like this would be a fine addition to the glibc test suite.

$ set -x; for flag in -m32 -m64; do gcc $flag -lpthread ./clone_bug.c &&
./a.out; done; cat clone_bug.c; uname -a; getconf GNU_LIBPTHREAD_VERSION;
getconf GNU_LIBC_VERSION
+zsh:1464> set -x
+zsh:1464> flag=-m32
+zsh:1464> gcc -m32 -lpthread ./clone_bug.c
+zsh:1464> ./a.out
count=2, pthread_getattr_np failed with errno = "No such process"
+zsh:1464> flag=-m64
+zsh:1464> gcc -m64 -lpthread ./clone_bug.c
+zsh:1464> ./a.out
+zsh:1464> cat clone_bug.c
#define _GNU_SOURCE   /* for clone() and pthread_getattr_np() */
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>   /* for strerror() */
#include <sys/types.h>
#include <wait.h>
#include <errno.h>
#include <unistd.h>
#include <pthread.h>
#include <syscall.h>
#include <sched.h>

static void
debugPrint(char *format, ...) {
 FILE *tty = fopen("/dev/tty", "w");
 va_list ap;
 va_start(ap, format);
 vfprintf(tty, format, ap);
 va_end(ap);
 fclose(tty);
}

static void debugPids(void) {
//   debugPrint("getpid()=%d gettid()=%d, syscall(getpid)=%d pthread_self=%d\n",
//              getpid(), syscall(SYS_gettid), syscall(SYS_getpid), pthread_self());
 static int count = 0;
 pthread_attr_t attr;
 int result;
 ++count;
 if ((result = pthread_getattr_np(pthread_self(), &attr)) != 0)
   debugPrint("count=%d, pthread_getattr_np failed with errno = \"%s\"\n",
              count, strerror(result));
}

static int childProcess(void *ignored) {
 _exit(0);
 // debugPrint("child\n");
 // execve("/bin/true", NULL, NULL);
 // perror("execve");
}

// I'm sure there's a better way to do this,
// but pthread_join ain't it - we can't trust it.
volatile int done = 0;

void* run(void *x) {
 const int stack_size = 1024 * 1024;
 void *clone_stack = malloc(2 * stack_size);
 int status;
 debugPids();
 int pid = clone(childProcess, clone_stack + stack_size,
                 CLONE_VM | SIGCHLD, NULL);
 waitpid(pid, &status, 0);
 debugPids();
 done = 1;
 pthread_exit(0);
 return NULL;
}

int main(int argc, char *argv[]) {
 pthread_attr_t attr;
 pthread_t tid;

 pthread_attr_init(&attr);
 pthread_create(&tid, &attr, (void* (*)(void*)) run, NULL);
 // pthread_join(tid, NULL);
 while (! done)
   ;
}
+zsh:1464> uname -a
Linux spraggett.mtv.corp.google.com 2.6.24-gg23-generic #1 SMP Fri Jan 30 14:07:49 PST 2009 x86_64 GNU/Linux
+zsh:1464> getconf GNU_LIBPTHREAD_VERSION
NPTL 2.7
+zsh:1464> getconf GNU_LIBC_VERSION
glibc 2.7

--
          Summary: clone(CLONE_VM) fails with pthread_getattr_np on i386
          Product: glibc
          Version: 2.8
           Status: NEW
         Severity: normal
         Priority: P2
        Component: nptl
       AssignedTo: drepper at redhat dot com
       ReportedBy: martinrb at google dot com
               CC: glibc-bugs at sources dot redhat dot com
 GCC host triplet: x86_64-unknown-linux-gnu


http://sourceware.org/bugzilla/show_bug.cgi?id=10311

Martin


On Thu, Jun 11, 2009 at 14:16, Martin Buchholz <martinrb at google.com> wrote:

> Thanks, Michael
>
> I'm hoping the following will placate Sun Studio cc:
>
> diff --git a/src/solaris/native/java/lang/UNIXProcess_md.c b/src/solaris/native/java/lang/UNIXProcess_md.c
> --- a/src/solaris/native/java/lang/UNIXProcess_md.c
> +++ b/src/solaris/native/java/lang/UNIXProcess_md.c
> @@ -651,6 +651,7 @@
>      }
>      close(FAIL_FILENO);
>      _exit(-1);
> +    return 0;  /* Suppress warning "no return value from function" */
>  }
>
> I'm also adding my manual test case BigFork.java.
> It may be helpful while implementing the Solaris version of this feature.
>
> webrev updated.
>
> I need a Sun bug to commit these changes for Linux.  Please create one.
>
> Synopsis: (process) Use clone(CLONE_VM), not fork, on Linux to avoid
> swap exhaustion
> <http://bugs.sun.com/view_bug.do?bug_id=5049299>
>
> Description:
> On Linux it is possible to use clone with CLONE_VM, but not CLONE_THREAD,
> which is like fork() but much cheaper and avoids swap exhaustion due to
> momentary overcommit of swap space.  One has to be very careful in this
> case to not mutate global variables such as environ, but it's worth it.
>
> Evaluation:
> Make it so.
>
> See also: 5049299
>
> Once that is done, I will commit my changes.
>
> Thanks,
>
> Martin
>
>
> On Thu, Jun 11, 2009 at 07:22, Michael McMahon <Michael.McMahon at sun.com> wrote:
>
>> Martin Buchholz wrote:
>>
>>> I broke down and finally created a "proper" webrev,
>>> just like the good old days.
>>>
>>>  http://cr.openjdk.java.net/~martin/clone-exec/
>>>
>>>  I've run the regression tests on Solaris and Linux and they seem fine.
>> There is a compile warning on Solaris at line 654: no return value from
>> function.
>> Aside from that, I'm happy with the change now.
>>
>> - Michael.
>>
>
>