SIGBUS on linux in perfMemory_init

Ioi Lam ioi.lam at oracle.com
Tue May 3 15:06:58 UTC 2022



On 5/3/2022 4:05 AM, Vitaly Davidovich wrote:
> Hi all,
>
> Wanted to bump this thread in case someone with thoughts/opinions missed it
> the first time around.
>
> Solutions aside, should a JBS entry be filed to record/track this?

I already filed a JBS issue on your behalf:
https://bugs.openjdk.java.net/browse/JDK-8286030

Thanks
- Ioi

> Thanks
>
> On Fri, Apr 29, 2022 at 9:44 AM Vitaly Davidovich <vitalyd at gmail.com> wrote:
>
>> Hi all,
>>
>> We've been seeing intermittent SIGBUS failures on linux with jdk11.  They
>> all have this distinctive backtrace:
>>
>> C  [libc.so.6+0x12944d]
>> V  [libjvm.so+0xcca542]  perfMemory_init()+0x72
>> V  [libjvm.so+0x8a3242]  vm_init_globals()+0x22
>> V  [libjvm.so+0xedc31d]  Threads::create_vm(JavaVMInitArgs*, bool*)+0x1ed
>> V  [libjvm.so+0x9615b2]  JNI_CreateJavaVM+0x52
>> C  [libjli.so+0x49af]  JavaMain+0x8f
>> C  [libjli.so+0x9149]  ThreadJavaMain+0x9
>>
>>
>> Initially, we suspected that /tmp was full but that turned out to not be
>> the case.  After a few more instances of the crash and investigation, we
>> believe we know the root cause.
>>
>>
>> The crashing applications are all running in a K8s pod, with each JVM in a
>> separate container:
>>
>>
>> container_type: cgroupv1 (from the hs_err file)
>>
>>
>> /tmp is mounted such that it's shared by multiple containers.  Since these
>> JVMs are running in containers, we believe what happens is the namespaced
>> (i.e. per container) PIDs overlap between different containers - 2 JVMs, in
>> separate containers, can end up with the same namespaced PID.  Since /tmp
>> is shared, they can now "contend" on the same perfMemory file since those
>> file names are PID based.
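
To make the collision concrete, here is a minimal Python sketch (not HotSpot code). It assumes the usual hsperfdata layout of <tmpdir>/hsperfdata_<user>/<pid>; the message above only says the names are PID based, so the exact layout, the helper name perf_file_path, and the user name and PID values are illustrative assumptions. Note that under this layout the two JVMs would also need the same user name to collide.

```python
import os

def perf_file_path(user: str, ns_pid: int, tmpdir: str = "/tmp") -> str:
    """Hypothetical helper: backing-file path built only from the user
    name and the PID as seen inside the container (assumed layout)."""
    return os.path.join(tmpdir, f"hsperfdata_{user}", str(ns_pid))

# Two JVMs in different containers that share /tmp. Each container has its
# own PID namespace, so both can observe the same small PID value.
jvm_a = {"container": "a", "user": "app", "ns_pid": 7}
jvm_b = {"container": "b", "user": "app", "ns_pid": 7}

path_a = perf_file_path(jvm_a["user"], jvm_a["ns_pid"])
path_b = perf_file_path(jvm_b["user"], jvm_b["ns_pid"])

# Same user name plus same namespaced PID yields the same file.
print(path_a, path_a == path_b)
```

A host-wide PID, if one were available to the JVM, would differ between the two processes and so would give distinct paths under the same scheme.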
>>
>>
>> Once multiple JVMs can contend on the same file, a SIGBUS can arise if one
>> JVM has mmap'd the file and another calls ftruncate() on it while it is
>> still mapped (e.g.
>> https://github.com/openjdk/jdk11/blob/37115c8ea4aff13a8148ee2b8832b20888a5d880/src/hotspot/os/linux/perfMemory_linux.cpp#L909
>> ).
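
The failure mode itself can be reproduced outside the JVM. The sketch below is an assumption-laden stand-in, not the HotSpot code path: a child Python process maps a small temporary file, truncates it to zero (playing the role of the second JVM), and then touches the mapping. The function name truncate_under_mapping and the 4096-byte size are made up for illustration. The child runs as a separate process so that the signal does not take down the caller.

```python
import os
import signal
import subprocess
import sys
import tempfile

# Code run in the child process. It is expected to die from a signal.
CHILD = r"""
import mmap, os, sys
fd = os.open(sys.argv[1], os.O_RDWR)
m = mmap.mmap(fd, 4096)   # shared, file-backed mapping (the first "JVM")
os.ftruncate(fd, 0)       # stands in for the second "JVM" truncating the file
m[0]                      # the mapped page no longer has any file behind it
"""

def truncate_under_mapping() -> int:
    """Return the child's exit status; a negative value means it was
    killed by that signal number."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\0" * 4096)
        path = f.name
    try:
        return subprocess.run([sys.executable, "-c", CHILD, path]).returncode
    finally:
        os.unlink(path)

rc = truncate_under_mapping()
print("child killed by SIGBUS:", rc == -signal.SIGBUS)
```

On Linux, an access to a mapped page that lies beyond the end of the underlying file is documented to raise SIGBUS, which matches the libc frame at the top of the backtrace.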
>>
>>
>> Is this a known issue? I couldn't find any existing JBS entries or mailing
>> list discussions around this specific circumstance.
>>
>>
>> As for possible solutions, would it be possible to use the global PID
>> instead of the namespaced PID to "regain" the uniqueness invariant of the
>> PID? Also, might it make sense to flock() the file to prevent another
>> process from mucking with it?
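
As a rough illustration of the flock() idea, the Python sketch below has the "owner" take an exclusive lock on the file and has any would-be truncator try the same lock first, backing off if it is held. This is only a sketch under assumptions: try_lock and truncate_if_unowned are hypothetical names, and flock() is advisory, so it only helps if every process that might truncate the file checks the lock.

```python
import fcntl
import os
import tempfile

def try_lock(fd: int) -> bool:
    """Try to take an exclusive, non-blocking flock() on fd."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False

def truncate_if_unowned(path: str) -> bool:
    """Hypothetical helper: truncate the file only if nobody holds the lock."""
    fd = os.open(path, os.O_RDWR)
    try:
        if not try_lock(fd):
            return False          # someone else owns the file, leave it alone
        os.ftruncate(fd, 0)
        return True
    finally:
        os.close(fd)              # closing the descriptor drops our lock

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * 4096)
    path = f.name

owner_fd = os.open(path, os.O_RDWR)       # the JVM that created the file
owner_locked = try_lock(owner_fd)
blocked = truncate_if_unowned(path)       # a second opener must back off
size_while_locked = os.path.getsize(path)

os.close(owner_fd)                        # owner exits, lock is released
reclaimed = truncate_if_unowned(path)     # stale file can now be reused
size_after = os.path.getsize(path)
os.unlink(path)
```

Because flock() locks belong to the open file description, the second os.open() conflicts with the first even within one process, which is what lets this run as a single script.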
>>
>>
>> Happy to provide more info if needed.
>>
>>
>> Thanks
>>
>>
>> --
> Sent from my phone


