SIGBUS on linux in perfMemory_init
Nico Williams
nico at cryptonector.com
Tue May 3 15:41:32 UTC 2022
On Fri, Apr 29, 2022 at 09:44:00AM -0400, Vitaly Davidovich wrote:
> As for possible solutions, would it be possible to use the global PID
> instead of the namespaced PID to "regain" the uniqueness invariant of the
> PID? Also, might it make sense to flock() the file to prevent another
> process from mucking with it?
My unsolicited, outsider opinions:
 - Sharing /tmp across containers is a Bad Idea (tm).

 - Sharing /tmp across related containers (in a pod) is not _as_ bad an
   idea.

   (It might be a way to implement some cross-container communications,
   though it would be better to have an explicit mechanism for that
   rather than the rather-generic /tmp.)

 - Containerizing apps that *do* communicate over /tmp might be one
   reason one might configure a shared /tmp in a pod.

   Some support for such a configuration might be needed.

   (Alternatively, pods that share /tmp should also share a PID
   namespace.)

 - Since there is an option to not have an mmap'ed hsperf file, it might
   be nice to have an option to use the global PID for naming hsperf
   files.  Or, better, implement an automatic mechanism for detecting
   conflict and switching to global PID for naming hsperf files (or
   switching to anonymous hsperf mmaps).

 - In any case, on systems that have a real flock(2), using flock(2) for
   liveness testing is better than kill(2) with signal 0 -- the latter
   has false positives, while the former does not [provided O_CLOEXEC is
   used].

   For this reason, and though I am not too sympathetic to the situation
   that caused this crash, I believe that it would be better to have
   some sort of fix for this problem than to declare it a non-problem
   and not-fix it.
I would like to expand on Vitaly's mention of flock(2). Using the
global PID would leave the JVM unable to use kill(2) with signal 0 for
liveness detection during hsperf garbage file collection. Using kill(2)
with signal 0 for liveness is not that reliable anyways because of PID
reuse -- it can have false positives.
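
To make the false-positive case concrete, here is a minimal sketch of a signal-0 liveness probe. It is written in Python for brevity and is not HotSpot's code; the function name pid_looks_alive is made up for illustration.

```python
import os


def pid_looks_alive(pid: int) -> bool:
    """Probe a PID with signal 0 (hypothetical helper; no signal is delivered)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        # ESRCH: no process with this PID is visible in our PID namespace.
        return False
    except PermissionError:
        # EPERM: a process with this PID exists but we may not signal it.
        return True
    # Success only proves that *some* process currently has this PID.
    # It may be an unrelated process that reused the number, or, with a
    # shared /tmp, a process in a different PID namespace altogether.
    return True
```

A True result says nothing about who owns the PID, which is exactly where the false positives come from.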
A better mechanism for liveness detection would be to have the owning
JVM take an exclusive (LOCK_EX) flock(2) on the hsperf file at startup,
and for hsperf garbage file collection to try (LOCK_NB) to get an
exclusive lock (LOCK_EX) on a candidate hsperf garbage file as a
liveness detection mechanism.
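
A rough sketch of that locking protocol, again in Python rather than HotSpot's C++, and only under the assumption that the file system honors flock(2). The names hold_owner_lock and owner_is_alive are hypothetical.

```python
import fcntl
import os


def hold_owner_lock(path: str) -> int:
    """Owner side: open (or create) the file and take an exclusive lock.

    The caller keeps the returned descriptor open for the life of the
    process; the lock goes away when the process exits. O_CLOEXEC keeps
    the descriptor from leaking into exec'ed children, which would
    otherwise keep the lock alive after the owner is gone.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_CLOEXEC, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        raise
    return fd


def owner_is_alive(path: str) -> bool:
    """Collector side: a failed non-blocking lock attempt means a live owner."""
    fd = os.open(path, os.O_RDONLY | os.O_CLOEXEC)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True   # someone still holds the lock
        return False      # we got the lock, so the file is garbage
    finally:
        os.close(fd)      # closing also drops the lock if we took it
```

A real collector would presumably unlink the file while still holding the lock rather than after releasing it, so that a new owner cannot slip in between the check and the removal.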
When using the namespaced PID the kill(2) with signal 0 method of
liveness detection should still be used for backwards-compatibility in,
e.g., jvisualvm.
Using flock(2) would be less portable than kill(2) with signal 0, but
already there is a bunch of Linux-specific code here looking through
/proc, and Linux does have a real flock(2).
An adaptive, zero-conf hsperf file naming scheme might use the
namespaced PID if available (i.e., if an exclusive flock(2) could be
obtained on the file), or the global PID if not, with some indication in
the file's name of which kind of PID was used.
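
One way such a scheme could look, as a sketch only. The "g" prefix used to mark a global-PID name is invented here for illustration, as is the function name claim_hsperf_file. How a containerized process would learn its global PID is left open, so both PIDs are simply passed in.

```python
import fcntl
import os
from typing import Tuple


def claim_hsperf_file(directory: str, ns_pid: int, global_pid: int) -> Tuple[str, int]:
    """Return (path, locked_fd) for the first candidate name we can lock.

    Candidates are tried in order: the namespaced PID first, then the
    global PID with a hypothetical "g" prefix marking which kind it is.
    """
    candidates = [str(ns_pid), f"g{global_pid}"]
    for name in candidates:
        path = os.path.join(directory, name)
        fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_CLOEXEC, 0o600)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            os.close(fd)  # a live process already owns this name
            continue
        return path, fd   # keep fd open to keep the lock
    raise RuntimeError("no hsperf file name could be claimed")
```

With two processes that share a directory and happen to have the same namespaced PID, the first would get the plain name and the second would fall back to the prefixed global-PID name.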
Cheers,
Nico
--
More information about the hotspot-runtime-dev
mailing list