RFR: JDK-8274320: os::fork_and_exec() should be using posix_spawn

Thomas Stüfe thomas.stuefe at gmail.com
Fri Oct 22 10:01:52 UTC 2021


Hi Florian, David,

@Florian: thanks a lot for digging this up!

The oldest glibc release we need to support is difficult to estimate; it
depends on what individual downstream vendors want to do with the JVM. If
we don't downport this patch, I guess glibc 2.24 would be a safe bet. From
what I can see, this is when posix_spawn on Linux started using clone().
The only still-supported commercial distro with an older glibc that I am
aware of is Ubuntu 16.04.

---

So IIUC we could deadlock today with fork() too, if we crash inside malloc.
I'd say posix_spawn sounds good then, since according to Florian its
current glibc implementation is async-signal-safe, and it works better
under memory pressure. We still don't know what other libcs do.
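
For illustration, a minimal sketch of a fork_and_exec-style helper built on
posix_spawn. This is not the actual patch: the name spawn_and_wait, the use
of /bin/sh, and the return-value convention are assumptions made for this
example. It passes NULL for both the file actions and the attributes, so it
stays away from the posix_spawn_file_actions_* helpers that Florian notes
may call malloc.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper (not HotSpot code): run `cmd` via /bin/sh.
 * Returns the exit status, 0x80 + signal number if the child was
 * killed by a signal, or -1 if spawning or waiting failed. */
int spawn_and_wait(const char *cmd) {
    char *argv[] = { "sh", "-c", (char *)cmd, NULL };
    pid_t pid;

    /* NULL file actions and NULL attributes: nothing to allocate. */
    int rc = posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, environ);
    if (rc != 0) {
        return -1;  /* posix_spawn returns an error number directly */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
        /* interrupted by a signal: retry the wait */
    }

    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);
    }
    if (WIFSIGNALED(status)) {
        return 0x80 + WTERMSIG(status);
    }
    return -1;
}
```

With this sketch, spawn_and_wait("exit 7") would return 7. Whether the call
is safe from a signal handler still depends on the libc in use, as
discussed above.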

The worst thing that can happen is that we hang. David is right, that
would be bad. But not super-bad: we still have the global
`ErrorLogTimeout`. By default it kicks in after 2 minutes and _exit()s the
VM. But we lose the core then.

As a future improvement, it may make sense to extend the
secondary-signal-and-timeout-capture-feature in VMError::report() (the STEP
macro and friends) to encompass the caller function
VMError::report_and_die(). In other words, all the steps in
VMError::report_and_die(), including the handling of '-XX:OnError', should
run with individual timeouts too and not endanger the follow-up steps. That
way, if we spawn a tool with -XX:OnError which hangs, we would not wait for
the ErrorLogTimeout to _exit() but would cancel this individual step and
continue with the next step. And we would still abort() at the end and get
a core.
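
A rough sketch of what a per-step timeout could look like for the spawned
tool alone. It is a simplified stand-in, not a design for
VMError::report_and_die(): the name spawn_with_timeout, the 10 ms polling
interval, and the return codes are assumptions made for this example. The
parent polls the child with waitpid(WNOHANG), and once the step's budget is
used up it kills the child and returns so that the caller can move on to
the next step.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

extern char **environ;

/* Hypothetical helper (not HotSpot code): run `cmd` via /bin/sh and
 * wait at most roughly `timeout_ms` milliseconds for it.
 * Returns the exit status, -2 if the child was killed after the
 * timeout, or -1 on any other failure. */
int spawn_with_timeout(const char *cmd, int timeout_ms) {
    char *argv[] = { "sh", "-c", (char *)cmd, NULL };
    pid_t pid;

    if (posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, environ) != 0) {
        return -1;
    }

    const struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    int status;

    for (int waited_ms = 0; ; waited_ms += 10) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) {
            break;                      /* child has terminated */
        }
        if (r < 0 && errno != EINTR) {
            return -1;
        }
        if (waited_ms >= timeout_ms) {
            /* Budget used up: cancel this step only. */
            kill(pid, SIGKILL);
            while (waitpid(pid, &status, 0) < 0 && errno == EINTR) {
                /* retry until the child has been reaped */
            }
            return -2;
        }
        nanosleep(&tick, NULL);         /* elapsed time is approximate */
    }

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that the sketch only kills the direct child; processes started by the
tool itself would need a process group or similar to be cleaned up too.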

Cheers, Thomas


On Fri, Oct 22, 2021 at 9:19 AM David Holmes <david.holmes at oracle.com>
wrote:

> Hi Florian,
>
> On 22/10/2021 5:34 am, Florian Weimer wrote:
> > * David Holmes:
> >
> >> Sorry but I have to disagree. fork() is async-signal-safe, but if an
> >> at-fork handler is not, then all bets are off - that is fine, it is
> >> the best we can do. But posix_spawn makes no claim to any kind of
> >> async-signal safety so we very much do lose something by switching to
> >> it IMO.
> >
> > Sorry, I didn't see the thread until now.
> >
> > The glibc implementation of fork is not async-signal-safe even if the
> > process has not installed any fork handlers.  Our (downstream)
> > perspective is captured here:
> >
> >    Using the fork function in signal handlers
> >    <https://access.redhat.com/articles/2921161>
>
> I'm surprised then that we have not encountered any such reported
> deadlocks in recent years. I found this issue also somewhat illuminating:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=4737
>
> especially the report that fork() is no longer required to be
> async-signal-safe, but IIUC neither is posix_spawn, so we're left with
> no way to implement this functionality in a sound way and must hope for
> the best from the implementation. That's not very satisfactory. But in
> light of this I can't really reject the change to use posix_spawn on the
> grounds that fork() is safer.
>
> Cheers,
> David
> -----
>
> > The current implementation of posix_spawn in glibc is async-signal-safe,
> > I think.  I would have to ask on libc-alpha if we can make this official
> > in any way.  The current musl implementation seems to be safe as well.
> >
> > The glibc implementation of posix_spawn has changed substantially over
> > the years, and I can dig through the history to make sure it has not
> > changed materially.  What's the oldest glibc release you still need to
> > support?
> >
> > Other functions in the posix_spawn corner (for maintaining file actions)
> > are definitely not safe because they call malloc internally, but the
> > current patch does not use them.
> >
> > When used carefully, vfork can be made async-signal-safe.  But you
> > really have to block signals before calling it, and in the subprocess,
> > restore the signal handler disposition to SIG_DFL, and then unblock the
> > signals.  Otherwise some signal handler might run with a
> > slightly-incorrect TCB.  (Historic posix_spawn implementations did not
> > do the signal handlers dance.)  At least vfork does not run fork
> > handlers.
> >
> > VMError::report_and_die() seems to call fopen for the replay data file.
> > There are probably more issues like that.
> >
> > Thanks,
> > Florian
> >
>

