Re: 6850720: Allow POSIX_SPAWN to be used for ProcessImpl on Linux

22 Oct 2018

Hi Florian,

our mails crossed... I think I am fine now with posix_spawn(),
provided we do enough testing.

But I'll answer your questions inline.

On Mon, Oct 22, 2018 at 9:00 PM Florian Weimer <fweimer@redhat.com> wrote:
...
* Thomas Stüfe:
...
So far I have not read a single technical reason in this thread why
vfork needs to be abandoned now - apart from it being obsolete. If you
read my initial thread from September, you know that I think we have
understood vfork's shortcomings very well, and that our (SAP's)
proposed patch shows that they can be dealt with. In our port, our
vfork+exec*2 has been solid for many years, without any issues.
The main problem for vfork in application code is that you need to
disable *all* signals, even signals used by the implementation.  If a signal
handler runs by accident while the vfork is active, memory corruption is
practically guaranteed.  The only way to disable the signals is with a
direct system call; sigprocmask/pthread_sigmask do not work.
Does your implementation do this?
I understand. No, admittedly not. But we squeeze the vulnerable time
window to the minimal possible:

if (vfork() == 0) exec(..);

which was a large step forward from the stock OpenJDK solution.

While not completely bulletproof, I saw not a single instance of an
error in all these years (I understand those errors would be very
intermittent and difficult to attribute to vfork+signalling, so we may
have missed some).
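
For illustration, the pattern above could be spelled out roughly as
follows. This is only a sketch, not the code from our port: the function
name run_via_vfork is made up, and it does not block any signals, so the
window Florian describes is still there, just kept short.

```c
#define _DEFAULT_SOURCE          /* make vfork() visible in strict modes */
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: vfork, exec immediately in the child, then wait.
 * Returns the child's exit status, or -1 on failure. */
int run_via_vfork(const char *path, char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;               /* vfork failed */

    if (pid == 0) {
        /* Child: shares memory with the suspended parent, so do
         * nothing here except exec (or _exit if exec fails). */
        execv(path, argv);
        _exit(127);
    }

    /* Parent: resumes once the child has exec'ed or exited. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child branch touches no parent state at all; anything more than
exec/_exit there widens the window.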
...
...
The current posix_spawn() implementation was added to glibc with glibc
2.24. So, what was the state of posix_spawn() before that version? Is
it safe to use, does it do the right thing, or will we encounter
regressions?
It uses fork by default.  It can be told to use vfork, via
POSIX_SPAWN_USEVFORK, but then it is buggy.  For generic JDK code, this
seems hardly appropriate.
Are you sure about this? The code I saw in glibc < 2.24 was that it
would use vfork if both attributes and file actions were NULL, which
should be the case with the OpenJDK and jspawnhelper.

fork() would be bad and a reason not to use posix_spawn().
...
...
My Ubuntu 16.04 box runs glibc 2.23. Arguably, Ubuntu 16.04 is quite a
common distro. I have to check our machines at work, but I am very
sure that our zoo of SLES and RHEL servers do not all run glibc>=2.24,
especially on the more exotic architectures.
In glibc, the vfork-based performance does not bring in any new ABIs, so
it is in theory backportable.  The main risk is that the vfork
optimization landed in glibc 2.24, and the PID cache was removed in
glibc 2.25.  vfork with the PID cache was really iffy, but I would not
recommend backporting the PID cache removal.  But Debian 9/stretch uses
glibc 2.24, and I think that shows that the vfork optimization with the
PID cache should be safe enough.  (Of course you need to remove the
assert that fires if the vfork does not actually stop the parent process
and is implemented as a fork; the glibc implementation still works, but
with somewhat degraded error checking.)
How far back would you want to see this changed?  Debian jessie and Red
Hat Enterprise Linux 6 would be rather unlikely.  If you want to target
those, your only chance is to essentially duplicate the glibc
implementation in OpenJDK.
As I wrote before, if I understand the code in glibc between 2.4 and
2.24 correctly, I think it uses vfork() and that should be fine by me:

posix_spawn() using vfork(), with no attributes/file actions and in
conjunction with the jspawnhelper, is almost exactly the same as the
proposed vfork() + exec*2 patch: posix_spawn() will exec() immediately
after the vfork(), then, in jspawnhelper, we set up the new process and
exec() again. So I am fine with that.
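
To make that concrete, here is a minimal sketch of a posix_spawn() call
with both the file actions and the attributes set to NULL. The name
spawn_plain is hypothetical and this is not the actual ProcessImpl code;
in the JDK the spawned program would be jspawnhelper, which then does
the setup and the second exec(). Whether glibc uses fork or vfork
underneath depends on the glibc version and cannot be seen from the
call itself.

```c
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: spawn `path` with no file actions and no
 * attributes, wait for it, and return its exit status (-1 on failure). */
int spawn_plain(const char *path, char *const argv[])
{
    pid_t pid;
    int status;

    /* Third and fourth arguments are file_actions and attrp. */
    int rc = posix_spawn(&pid, path, NULL, NULL, argv, environ);
    if (rc != 0)
        return -1;               /* rc is an errno value, errno is not set */

    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

All per-process setup (closing fds, chdir and so on) is left to the
spawned helper, so nothing has to run between the vfork and the first
exec.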

Provided I have understood all that stuff correctly and have not made a
mistake in my reasoning somewhere.

Cheers, Thomas
...
Thanks,
Florian