SIGBUS on linux in perfMemory_init

Vitaly Davidovich vitalyd at gmail.com
Tue May 3 11:05:08 UTC 2022


Hi all,

Wanted to bump this thread in case someone with thoughts/opinions missed it
the first time around.

Solutions aside, should a JBS entry be filed to record/track this?

Thanks

On Fri, Apr 29, 2022 at 9:44 AM Vitaly Davidovich <vitalyd at gmail.com> wrote:

> Hi all,
>
> We've been seeing intermittent SIGBUS failures on Linux with JDK 11.  They
> all have this distinctive backtrace:
>
> C  [libc.so.6+0x12944d]
> V  [libjvm.so+0xcca542]  perfMemory_init()+0x72
> V  [libjvm.so+0x8a3242]  vm_init_globals()+0x22
> V  [libjvm.so+0xedc31d]  Threads::create_vm(JavaVMInitArgs*, bool*)+0x1ed
> V  [libjvm.so+0x9615b2]  JNI_CreateJavaVM+0x52
> C  [libjli.so+0x49af]  JavaMain+0x8f
> C  [libjli.so+0x9149]  ThreadJavaMain+0x9
>
>
> Initially we suspected that /tmp was full, but that turned out not to be
> the case.  After a few more instances of the crash and some investigation,
> we believe we have found the root cause.
>
>
> The crashing applications are all running in a Kubernetes (K8s) pod, with
> each JVM in a separate container:
>
>
> container_type: cgroupv1 (from the hs_err file)
>
>
> /tmp is mounted such that it's shared by multiple containers.  Since each
> container has its own PID namespace, we believe the namespaced (i.e.
> per-container) PIDs overlap between containers: two JVMs in separate
> containers can end up with the same namespaced PID.  Since /tmp is shared
> and the perfMemory file path is derived from the user name and PID
> (/tmp/hsperfdata_<user>/<pid>), two such JVMs running as the same user can
> now "contend" on the same perfMemory file.
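
To make the collision concrete, here is a minimal C sketch of how the file name ends up identical, assuming the default /tmp/hsperfdata_<user>/<pid> layout. perf_file_path and perf_paths_collide are hypothetical helpers for illustration only, not HotSpot code, and the user names and PIDs are made-up values:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper (not HotSpot's code): builds a perf data path using the
 * default layout <tmpdir>/hsperfdata_<user>/<pid>.
 * Returns 0 on success, -1 if the result does not fit in buf. */
static int perf_file_path(const char *tmpdir, const char *user, pid_t pid,
                          char *buf, size_t len) {
    int n = snprintf(buf, len, "%s/hsperfdata_%s/%ld", tmpdir, user, (long)pid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: returns 1 if two JVMs that see the same tmpdir would
 * pick the same perf data file, 0 if not, -1 on error. Only the user name and
 * the (namespaced) PID go into the name; the container does not. */
static int perf_paths_collide(const char *tmpdir,
                              const char *user_a, pid_t pid_a,
                              const char *user_b, pid_t pid_b) {
    char a[4096], b[4096];
    if (perf_file_path(tmpdir, user_a, pid_a, a, sizeof a) != 0 ||
        perf_file_path(tmpdir, user_b, pid_b, b, sizeof b) != 0)
        return -1;
    return strcmp(a, b) == 0;
}
```

For example, two JVMs running as the same user and both started as PID 1 in their own containers would resolve to the same /tmp/hsperfdata_<user>/1 file once /tmp is shared.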
>
>
> Once multiple JVMs can contend on the same file, a SIGBUS can arise if one
> JVM has mmap'd the file and another ftruncate()s it out from under it: any
> later access to mapped pages that now lie past the end of the file raises
> SIGBUS (e.g.
> https://github.com/openjdk/jdk11/blob/37115c8ea4aff13a8148ee2b8832b20888a5d880/src/hotspot/os/linux/perfMemory_linux.cpp#L909
> ).
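
The mmap/ftruncate interaction can be reproduced in a single process. This is a minimal sketch, assuming Linux semantics (mmap(2) documents SIGBUS for accesses to a mapping beyond the end of the file, including when another process has truncated it). The ftruncate(fd, 0) call stands in for the other JVM, and sigbus_after_truncate is a hypothetical name, not anything in HotSpot:

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_env;

static void on_sigbus(int sig) {
    (void)sig;
    siglongjmp(bus_env, 1);  /* jump back out of the faulting store */
}

/* Hypothetical demo: maps a scratch file MAP_SHARED, truncates it to 0 bytes
 * while the mapping is still live (simulating a second JVM that opened the
 * same hsperfdata file), then touches the mapping again.
 * Returns 1 if SIGBUS was delivered, 0 if not, -1 on setup failure. */
static int sigbus_after_truncate(void) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;

    char path[] = "/tmp/sigbus-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);  /* the open fd keeps the file alive */

    if (ftruncate(fd, page) != 0) { close(fd); return -1; }
    volatile char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }
    p[0] = 1;  /* fine: the page is backed by the file */

    struct sigaction sa = {0}, old;
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    volatile int result = -1;
    if (sigsetjmp(bus_env, 1) == 0) {
        if (ftruncate(fd, 0) == 0) {  /* "the other JVM" truncates the file */
            p[0] = 2;                 /* page is now past EOF: SIGBUS here */
            result = 0;               /* reached only if no SIGBUS arrived */
        }
    } else {
        result = 1;
    }

    sigaction(SIGBUS, &old, NULL);  /* restore the previous handler */
    munmap((void *)p, (size_t)page);
    close(fd);
    return result;
}
```

In the real crash the second store corresponds to the first JVM touching its mapping (the libc frame under perfMemory_init) after the other JVM has truncated the shared file.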
>
>
> Is this a known issue? I couldn't find any existing JBS entries or mailing
> list discussions around this specific circumstance.
>
>
> As for possible solutions, would it be possible to use the global PID
> instead of the namespaced PID to "regain" the uniqueness invariant of the
> PID? Also, might it make sense to flock() the file to prevent another
> process from mucking with it?
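
For the flock() idea, a minimal sketch, assuming a JVM would take an exclusive, non-blocking lock right after opening its perf data file and back off (e.g. not use the shared file) if the lock is already held. lock_perf_file is hypothetical, not existing HotSpot code. flock() locks are advisory, so this only helps if every JVM takes the lock; they attach to the open file description, so two opens of the same path conflict even within one process:

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: opens (creating if needed) the perf data file and takes
 * an exclusive, non-blocking advisory lock on it.
 * Returns the fd on success (the lock is held until the fd is closed),
 * -1 if another open file description already holds the lock, -2 on error. */
static int lock_perf_file(const char *path) {
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return -2;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        int err = errno;
        close(fd);
        return err == EWOULDBLOCK ? -1 : -2;
    }
    return fd;
}
```

Containers on the same node share one kernel, so a lock taken through a shared /tmp mount should be visible to JVMs in other containers that mount the same directory.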
>
>
> Happy to provide more info if needed.
>
>
> Thanks
>
>


More information about the hotspot-runtime-dev mailing list