SIGBUS on linux in perfMemory_init
Vitaly Davidovich
vitalyd at gmail.com
Tue May 3 15:54:15 UTC 2022
On Tue, May 3, 2022 at 11:07 AM Ioi Lam <ioi.lam at oracle.com> wrote:
>
>
> On 5/3/2022 4:05 AM, Vitaly Davidovich wrote:
> > Hi all,
> >
> > Wanted to bump this thread in case someone with thoughts/opinions missed it
> > the first time around.
> >
> > Solutions aside, should a JBS entry be filed to record/track this?
>
> I already filed a JBS issue on your behalf
> https://bugs.openjdk.java.net/browse/JDK-8286030
>
> Thanks
> - Ioi
Ah, thanks Ioi!
>
>
> > Thanks
> >
> > On Fri, Apr 29, 2022 at 9:44 AM Vitaly Davidovich <vitalyd at gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> We've been seeing intermittent SIGBUS failures on Linux with JDK 11. They
> >> all have this distinctive backtrace:
> >>
> >> C [libc.so.6+0x12944d]
> >> V [libjvm.so+0xcca542] perfMemory_init()+0x72
> >> V [libjvm.so+0x8a3242] vm_init_globals()+0x22
> >> V [libjvm.so+0xedc31d] Threads::create_vm(JavaVMInitArgs*, bool*)+0x1ed
> >> V [libjvm.so+0x9615b2] JNI_CreateJavaVM+0x52
> >> C [libjli.so+0x49af] JavaMain+0x8f
> >> C [libjli.so+0x9149] ThreadJavaMain+0x9
> >>
> >>
> >> Initially, we suspected that /tmp was full but that turned out to not be
> >> the case. After a few more instances of the crash and investigation, we
> >> believe we know the root cause.
> >>
> >>
> >> The crashing applications are all running in a Kubernetes pod, with each
> >> JVM in a separate container:
> >>
> >>
> >> container_type: cgroupv1 (from the hs_err file)
> >>
> >>
> >> /tmp is mounted such that it's shared by multiple containers. Since these
> >> JVMs are running in containers, we believe what happens is the namespaced
> >> (i.e. per container) PIDs overlap between different containers - 2 JVMs, in
> >> separate containers, can end up with the same namespaced PID. Since /tmp
> >> is shared, they can now "contend" on the same perfMemory file since those
> >> file names are PID based.
> >>
> >>
> >> Once multiple JVMs can contend on the same file, a SIGBUS can arise if one
> >> JVM has mmap'd the file and another ftruncate()'s it from under it (e.g.
> >> https://github.com/openjdk/jdk11/blob/37115c8ea4aff13a8148ee2b8832b20888a5d880/src/hotspot/os/linux/perfMemory_linux.cpp#L909
> >> ).
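
The failure mode described above can be reproduced outside the JVM. The
following is a minimal C sketch, not HotSpot code: the file name template and
the function name signal_after_truncate are made up for illustration, and for
brevity a single child process both maps and truncates the file (the kernel
does not care which process shrinks it). It assumes Linux and a writable /tmp.

```c
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the signal that terminated the child, 0 if the child exited
 * normally, or -1 if the setup failed. */
int signal_after_truncate(void) {
    char path[] = "/tmp/sigbus_sketch_XXXXXX";   /* hypothetical file name */
    int fd = mkstemp(path);
    if (fd < 0) return -1;

    long page = sysconf(_SC_PAGESIZE);
    if (ftruncate(fd, page) != 0) { close(fd); unlink(path); return -1; }

    pid_t child = fork();
    if (child < 0) { close(fd); unlink(path); return -1; }

    if (child == 0) {
        /* Stand-in for the first JVM: map the file and touch the page. */
        volatile char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) _exit(2);
        p[0] = 1;

        /* Stand-in for the second JVM: shrink the file under the mapping. */
        if (ftruncate(fd, 0) != 0) _exit(3);

        /* The mapped page now lies beyond end-of-file, so this access faults. */
        p[0] = 2;
        _exit(0);
    }

    int status = 0;
    if (waitpid(child, &status, 0) < 0) status = 0;
    close(fd);
    unlink(path);
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On Linux the function is expected to return SIGBUS, the same signal reported
in the crashes.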
> >>
> >>
> >> Is this a known issue? I couldn't find any existing JBS entries or mailing
> >> list discussions around this specific circumstance.
> >>
> >>
> >> As for possible solutions, would it be possible to use the global PID
> >> instead of the namespaced PID to "regain" the uniqueness invariant of the
> >> PID? Also, might it make sense to flock() the file to prevent another
> >> process from mucking with it?
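
To make the flock() idea concrete, here is a minimal C sketch. It is only an
illustration under stated assumptions, not a proposed HotSpot patch: the
function name try_lock_exclusive is hypothetical, and since flock() locks are
advisory, this only helps if every process that touches the file takes the
lock before truncating or mapping it.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Opens (creating if needed) the file at path and tries to take an
 * exclusive, non-blocking advisory lock on it.
 * Returns the locked descriptor, or -1 if the open failed or another
 * open file description already holds a lock on the file. */
int try_lock_exclusive(const char *path) {
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return -1;

    /* LOCK_NB makes flock() fail immediately instead of waiting. */
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);
        return -1;
    }
    return fd;   /* the lock is released when this descriptor is closed */
}
```

A process that gets -1 back would know the file is in use and could skip the
ftruncate() rather than shrink a file that someone else has mapped.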
> >>
> >>
> >> Happy to provide more info if needed.
> >>
> >>
> >> Thanks
> >>
> >>
> >> --
> > Sent from my phone
>
> --
Sent from my phone