SIGBUS on linux in perfMemory_init
Vitaly Davidovich
vitalyd at gmail.com
Tue May 3 16:05:48 UTC 2022
On Tue, May 3, 2022 at 11:54 AM Vitaly Davidovich <vitalyd at gmail.com> wrote:
>
>
> On Tue, May 3, 2022 at 11:07 AM Ioi Lam <ioi.lam at oracle.com> wrote:
>
>>
>>
>> On 5/3/2022 4:05 AM, Vitaly Davidovich wrote:
>> > Hi all,
>> >
>> > Wanted to bump this thread in case someone with thoughts/opinions
>> > missed it the first time around.
>> >
>> > Solutions aside, should a JBS entry be filed to record/track this?
>>
>> I already filed a JBS issue on your behalf
>> https://bugs.openjdk.java.net/browse/JDK-8286030
>
> I can't comment on the JBS, but another workaround (which we're employing)
> is -XX:+PerfDisableSharedMem. Per my understanding, this will prevent
> certain tools from locating the JVM instance but still allows something
> like `jcmd` to connect (via an explicitly supplied pid) and read the perf
> counters.
>
>>
>> Thanks
>> - Ioi
>
> Ah, thanks Ioi!
>
>>
>>
>> > Thanks
>> >
>> > On Fri, Apr 29, 2022 at 9:44 AM Vitaly Davidovich <vitalyd at gmail.com>
>> > wrote:
>> >
>> >> Hi all,
>> >>
>> >> We've been seeing intermittent SIGBUS failures on Linux with JDK 11. They
>> >> all have this distinctive backtrace:
>> >>
>> >> C [libc.so.6+0x12944d]
>> >> V [libjvm.so+0xcca542] perfMemory_init()+0x72
>> >> V [libjvm.so+0x8a3242] vm_init_globals()+0x22
>> >> V [libjvm.so+0xedc31d] Threads::create_vm(JavaVMInitArgs*, bool*)+0x1ed
>> >> V [libjvm.so+0x9615b2] JNI_CreateJavaVM+0x52
>> >> C [libjli.so+0x49af] JavaMain+0x8f
>> >> C [libjli.so+0x9149] ThreadJavaMain+0x9
>> >>
>> >>
>> >> Initially, we suspected that /tmp was full but that turned out to not be
>> >> the case. After a few more instances of the crash and investigation, we
>> >> believe we know the root cause.
>> >>
>> >>
>> >> The crashing applications are all running in a Kubernetes pod, with each
>> >> JVM in a separate container:
>> >>
>> >>
>> >> container_type: cgroupv1 (from the hs_err file)
>> >>
>> >>
>> >> /tmp is mounted such that it's shared by multiple containers. Since these
>> >> JVMs are running in containers, we believe what happens is the namespaced
>> >> (i.e. per container) PIDs overlap between different containers - 2 JVMs, in
>> >> separate containers, can end up with the same namespaced PID. Since /tmp
>> >> is shared, they can now "contend" on the same perfMemory file since those
>> >> file names are PID based.
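
To make the collision concrete: on Linux, HotSpot keeps the perf data file under /tmp/hsperfdata_<user>/<pid>, so two JVMs that see the same PID inside their own PID namespaces resolve to the same path once /tmp is shared. The sketch below only illustrates that naming scheme; the helper name `perf_file_path`, the user name `app`, and PID 7 are hypothetical.

```python
import os


def perf_file_path(user: str, namespaced_pid: int, tmpdir: str = "/tmp") -> str:
    """Build a perf data file path following the hsperfdata_<user>/<pid> layout."""
    return os.path.join(tmpdir, f"hsperfdata_{user}", str(namespaced_pid))


# Two JVMs in different containers, each seeing itself as PID 7 in its own
# PID namespace, with both containers mounting the same /tmp.
path_a = perf_file_path("app", 7)
path_b = perf_file_path("app", 7)

# The paths are identical, so both JVMs would create and map the same file.
print(path_a, path_a == path_b)
```

Nothing in the path identifies the container, which is why the file name stops being unique as soon as /tmp is shared.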
>> >>
>> >>
>> >> Once multiple JVMs can contend on the same file, a SIGBUS can arise if one
>> >> JVM has mmap'd the file and another ftruncate()'s it from under it (e.g.
>> >> https://github.com/openjdk/jdk11/blob/37115c8ea4aff13a8148ee2b8832b20888a5d880/src/hotspot/os/linux/perfMemory_linux.cpp#L909
>> >> ).
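
The mmap/ftruncate interaction described above can be reproduced outside the JVM. The following is a minimal sketch, not HotSpot code: a forked child maps a temporary file, the file is then truncated through a second descriptor (standing in for the other JVM), and the child touches the mapping again. The function name `child_signal_after_truncate` is hypothetical, and the behaviour shown assumes Linux.

```python
import mmap
import os
import signal
import tempfile


def child_signal_after_truncate(size: int = 4096) -> int:
    """Fork a child that touches a shared mapping after its file is truncated.

    Returns the number of the signal that killed the child, or 0 if the
    child exited without being killed by a signal.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)                # give the file one page of data
        pid = os.fork()
        if pid == 0:                          # child plays both JVMs for brevity
            try:
                m = mmap.mmap(fd, size)       # "JVM A" maps the file (MAP_SHARED)
                m[0] = 1                      # page is backed by the file: fine
                fd2 = os.open(path, os.O_RDWR)
                os.ftruncate(fd2, 0)          # "JVM B" truncates the same file
                m[0] = 2                      # mapping now extends past end of file
                os._exit(0)                   # not reached if the kernel sends a signal
            except BaseException:
                os._exit(1)
        _, status = os.waitpid(pid, 0)
        return os.WTERMSIG(status) if os.WIFSIGNALED(status) else 0
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    sig = child_signal_after_truncate()
    print(signal.Signals(sig).name if sig else "child exited normally")
```

The access happens in a child process because the signal is fatal by default; the parent only inspects the wait status.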
>> >>
>> >>
>> >> Is this a known issue? I couldn't find any existing JBS entries or mailing
>> >> list discussions around this specific circumstance.
>> >>
>> >>
>> >> As for possible solutions, would it be possible to use the global PID
>> >> instead of the namespaced PID to "regain" the uniqueness invariant of the
>> >> PID? Also, might it make sense to flock() the file to prevent another
>> >> process from mucking with it?
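
As a rough illustration of the flock() idea raised above (a sketch under assumptions, not a description of how HotSpot does or should implement it): a process takes a non-blocking exclusive lock on the file before touching it, and a second opener that fails to get the lock backs off instead of truncating. The helper name `try_lock_exclusive` is hypothetical.

```python
import fcntl
import os
import tempfile


def try_lock_exclusive(path: str):
    """Open path and try to take a non-blocking exclusive flock on it.

    Returns the open descriptor if the lock was obtained, or None if another
    open file description already holds a conflicting lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)                      # someone else owns the file: do not touch it
        return None
    return fd


if __name__ == "__main__":
    tmp_fd, tmp_path = tempfile.mkstemp()
    os.close(tmp_fd)
    owner = try_lock_exclusive(tmp_path)      # first opener gets the lock
    intruder = try_lock_exclusive(tmp_path)   # second opener is refused
    print(owner is not None, intruder is None)
    os.close(owner)                           # closing the descriptor drops the lock
    os.unlink(tmp_path)
```

flock() locks are advisory, so this only helps if every process that might truncate the file also takes the lock first.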
>> >>
>> >>
>> >> Happy to provide more info if needed.
>> >>
>> >>
>> >> Thanks
>> >>
>> >>
>> >> --
>> > Sent from my phone
>>
>> --
> Sent from my phone
>
More information about the hotspot-runtime-dev
mailing list