SIGBUS on linux in perfMemory_init

Ioi Lam ioi.lam at oracle.com
Thu May 5 20:48:37 UTC 2022



On 5/3/2022 8:41 AM, Nico Williams wrote:
> On Fri, Apr 29, 2022 at 09:44:00AM -0400, Vitaly Davidovich wrote:
>> As for possible solutions, would it be possible to use the global PID
>> instead of the namespaced PID to "regain" the uniqueness invariant of the
>> PID? Also, might it make sense to flock() the file to prevent another
>> process from mucking with it?
> My unsolicited, outsider opinions:
>
>   - Sharing /tmp across containers is a Bad Idea (tm).
>
>   - Sharing /tmp across related containers (in a pod) is not _as_ bad an
>     idea.
>
>     (It might be a way to implement some cross-container communications,
>     though it would be better to have an explicit mechanism for that
>     rather than the rather-generic /tmp.)
>
>   - Containerizing apps that *do* communicate over /tmp might be one
>     reason one might configure a shared /tmp in a pod.
>
>     Some support for such a configuration might be needed.
>
>     (Alternatively, pods that share /tmp should also share a PID
>     namespace.)
>
>   - Since there is an option to not have an mmap'ed hsperf file, it might
>     be nice to have an option to use the global PID for naming hsperf
>     files.  Or, better, implement an automatic mechanism for detecting
>     conflict and switching to global PID for naming hsperf files (or
>     switching to anonymous hsperf mmaps).
>
>   - In any case, on systems that have a real flock(2), using flock(2) for
>     liveness testing is better than kill(2) with signal 0 -- the latter
>     has false positives, while the former does not [provided O_CLOEXEC is
>     used].
>
>     For this reason, and though I am not too sympathetic to the situation
>     that caused this crash, I believe that it would be better to have
>     some sort of fix for this problem than to declare it a non-problem
>     and not-fix it.
>
>
> I would like to expand on Vitaly's mention of flock(2).  Using the
> global PID would leave the JVM unable to use kill(2) with signal 0 for
> liveness detection during hsperf garbage file collection.  Using kill(2)
> with signal 0 for liveness is not that reliable anyways because of PID
> reuse -- it can have false positives.
>
> A better mechanism for liveness detection would be to have the owning
> JVM take an exclusive (LOCK_EX) flock(2) on the hsperf file at startup,
> and for hsperf garbage file collection to try (LOCK_NB) to get an
> exclusive lock (LOCK_EX) on a candidate hsperf garbage file as a
> liveness detection mechanism.
>
> When using the namespaced PID the kill(2) with signal 0 method of
> liveness detection should still be used for backwards-compatibility in,
> e.g., jvisualvm.
>
> Using flock(2) would be less portable than kill(2) with signal 0, but
> already there is a bunch of Linux-specific code here looking through
> /proc, and Linux does have a real flock(2).
>
> An adaptive, zero-conf hsperf file naming scheme might use the
> namespaced PID if available (i.e., if an exclusive flock(2) could be
> obtained on the file), or the global PID if not, with some indication in
> the file's name of which kind of PID was used.

Hi Nico,

I read your message again and now I totally agree with using flock(2) :-)

As you said, we should start with getpid(). That way the behavior is 
compatible with older versions of jcmd tools, especially when Java is 
used outside of containers.

One thing I realized is that if we have a collision, we don't need to 
use a globally unique ID. We just need an ID that's unique in the 
directory being written into.

I think we can do this on the VM side:

     String id = getpid();
     while (true) {
         String file = "/tmp/hsperfdata_" + username() + "/" + id;
         if (get_exclusive_access(file)) {
             // I won the contest and
             // (a) the file didn't exist, or
             // (b) the file existed but the JVM that used it has died
             return file;
         }
         // Add an "x" here so we don't collide with the getpid() of
         // another process
         id = "x" + random();
     }
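
As a rough illustration of the loop above, here is a minimal Python sketch of how get_exclusive_access() could be built on a non-blocking exclusive flock(2). It is not HotSpot code: the names claim_perf_file and first_id, the 0o600 mode, and the 32-bit random suffix are assumptions made for the example. It also ignores the race where a cleaner unlinks the file between open() and flock().

```python
import fcntl
import os
import random
import tempfile


def get_exclusive_access(path):
    """Try to take a non-blocking exclusive flock(2) on path.

    Returns an open file descriptor that holds the lock, or None if
    some other open file description already holds it (owner is alive).
    """
    # O_CLOEXEC keeps the lock from leaking into exec'ed children.
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_CLOEXEC, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Lock is held elsewhere: the file belongs to a live process.
        os.close(fd)
        return None
    return fd


def claim_perf_file(directory, first_id):
    """Hypothetical helper: start with first_id (e.g. getpid()) and fall
    back to "x" + random until a file in directory can be locked."""
    file_id = str(first_id)
    while True:
        path = os.path.join(directory, file_id)
        fd = get_exclusive_access(path)
        if fd is not None:
            # Either the file did not exist, or its previous owner died.
            return path, fd
        # The "x" prefix cannot collide with a purely numeric PID name.
        file_id = "x" + str(random.getrandbits(32))


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp(prefix="hsperfdata_demo_")
    path, fd = claim_perf_file(demo_dir, os.getpid())
    print("claimed", path)
    os.close(fd)  # closing the descriptor releases the lock
```

The lock lives as long as the returned descriptor stays open, so the owner has to keep it for its whole lifetime. A cleaner can call get_exclusive_access() on a candidate file and treat a non-None result as "owner is gone".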

On the tools side, we can do the pid -> rendezvous file mapping as I 
described in the other e-mail.

Thanks
- Ioi



> Cheers,
>
> Nico


