SIGBUS on linux in perfMemory_init

Nico Williams nico at cryptonector.com
Tue May 3 15:41:32 UTC 2022


On Fri, Apr 29, 2022 at 09:44:00AM -0400, Vitaly Davidovich wrote:
> As for possible solutions, would it be possible to use the global PID
> instead of the namespaced PID to "regain" the uniqueness invariant of the
> PID? Also, might it make sense to flock() the file to prevent another
> process from mucking with it?

My unsolicited, outsider opinions:

 - Sharing /tmp across containers is a Bad Idea (tm).

 - Sharing /tmp across related containers (in a pod) is not _as_ bad an
   idea.

   (It might be a way to implement some cross-container communications,
   though it would be better to have an explicit mechanism for that
   rather than the rather-generic /tmp.)

 - Containerizing apps that *do* communicate over /tmp is one reason
   to configure a shared /tmp in a pod.

   Some support for such a configuration might be needed.

   (Alternatively, pods that share /tmp should also share a PID
   namespace.)

 - Since there is an option to not have an mmap'ed hsperf file, it might
   be nice to have an option to use the global PID for naming hsperf
   files.  Or, better, implement an automatic mechanism for detecting
   conflict and switching to global PID for naming hsperf files (or
   switching to anonymous hsperf mmaps).

 - In any case, on systems that have a real flock(2), using flock(2) for
   liveness testing is better than kill(2) with signal 0 -- the latter
   has false positives, while the former does not [provided O_CLOEXEC is
   used].

   For this reason, and though I am not too sympathetic to the situation
   that caused this crash, I believe that it would be better to have
   some sort of fix for this problem than to declare it a non-problem
   and not-fix it.


I would like to expand on Vitaly's mention of flock(2).  Using the
global PID would leave the JVM unable to use kill(2) with signal 0 for
liveness detection during hsperf garbage file collection.  Using kill(2)
with signal 0 for liveness is not that reliable anyway because of PID
reuse -- it can have false positives.
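
For reference, the kill(2)-with-signal-0 probe looks roughly like the
sketch below.  This is only an illustration, not the HotSpot code; the
function name pid_seems_alive() is made up.  Note that the probe can only
say "some process has this PID right now", which is where the false
positives come from.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/*
 * Hypothetical helper: probe a PID with signal 0.
 * Returns 1 if a process with that PID exists, 0 if none does,
 * and -1 on any other error.  A result of 1 does NOT prove it is
 * the process that created the file: the PID may have been reused.
 */
int pid_seems_alive(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;               /* exists and we may signal it */
    if (errno == EPERM)
        return 1;               /* exists, but owned by someone else */
    if (errno == ESRCH)
        return 0;               /* no such process */
    return -1;
}
```

Signal 0 delivers nothing; the kernel only performs the existence and
permission checks.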

A better mechanism for liveness detection would be to have the owning
JVM take an exclusive (LOCK_EX) flock(2) on the hsperf file at startup,
and for hsperf garbage file collection to try, without blocking
(LOCK_NB), to get an exclusive lock (LOCK_EX) on a candidate hsperf
garbage file.  If that attempt succeeds, no live JVM owns the file.
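
A minimal sketch of both sides, assuming Linux flock(2) semantics.  The
function names hsperf_claim() and hsperf_owner_alive() are made up for
illustration, and the sketch ignores the race between creating the file
and locking it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Owner side (hypothetical): open or create the file and take an
 * exclusive lock.  The returned fd must stay open for the life of the
 * process; the kernel drops the lock when the process exits.
 * O_CLOEXEC keeps exec'ed children from inheriting the fd and thereby
 * keeping the lock alive after the owner is gone.
 */
int hsperf_claim(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_CLOEXEC, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/*
 * Collector side (hypothetical): returns 1 if some process holds the
 * lock, 0 if the lock could be taken (file is stale), -1 on error.
 */
int hsperf_owner_alive(const char *path)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    int rc;

    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
        rc = 0;                 /* nobody holds it: safe to collect */
        flock(fd, LOCK_UN);
    } else if (errno == EWOULDBLOCK) {
        rc = 1;                 /* a live process holds the lock */
    } else {
        rc = -1;
    }
    close(fd);
    return rc;
}
```

A collector would unlink the file only when hsperf_owner_alive() returns
0.  Unlike the PID probe, the result does not depend on which PID
namespace the collector runs in.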

When using the namespaced PID the kill(2) with signal 0 method of
liveness detection should still be used for backwards-compatibility in,
e.g., jvisualvm.

Using flock(2) would be less portable than kill(2) with signal 0, but
already there is a bunch of Linux-specific code here looking through
/proc, and Linux does have a real flock(2).

An adaptive, zero-conf hsperf file naming scheme might use the
namespaced PID if available (i.e., if an exclusive flock(2) could be
obtained on the file), or the global PID if not, with some indication in
the file's name of which kind of PID was used.
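
Roughly, and only as a sketch: the function name hsperf_pick_name(), the
"g" prefix for global-PID names, and the way the directory is passed in
are all assumptions of mine, and how the JVM would learn its global PID
is not shown (it is simply a parameter here).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

/* Open (creating if needed) and lock without blocking; -1 on failure. */
static int open_locked(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_CLOEXEC, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical: try "<dir>/<ns_pid>" first; if that file cannot be
 * opened and locked (e.g. another JVM holds it), fall back to
 * "<dir>/g<global_pid>".  The chosen path is written to out.
 * Returns the locked fd, or -1 if neither name could be claimed.
 */
int hsperf_pick_name(const char *dir, pid_t ns_pid, pid_t global_pid,
                     char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s/%ld", dir, (long)ns_pid);
    if (n > 0 && (size_t)n < outlen) {
        int fd = open_locked(out);
        if (fd >= 0)
            return fd;
    }
    n = snprintf(out, outlen, "%s/g%ld", dir, (long)global_pid);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return open_locked(out);
}
```

The prefix lets a reader of the directory tell which kind of PID a file
name carries, so tools that expect namespaced PIDs can skip the others.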

Cheers,

Nico
-- 


More information about the hotspot-runtime-dev mailing list