SIGBUS on linux in perfMemory_init (containers)

Ioi Lam ioi.lam at oracle.com
Tue May 3 05:13:54 UTC 2022



On 5/2/2022 5:41 AM, Severin Gehwolf wrote:
> Hi,
>
> On Fri, 2022-04-29 at 14:19 -0700, Ioi Lam wrote:
>>
>> On 4/29/2022 1:55 PM, Ioi Lam wrote:
>>>
>>> On 4/29/2022 6:44 AM, Vitaly Davidovich wrote:
>>>> Hi all,
>>>>
>>>> We've been seeing intermittent SIGBUS failures on Linux with JDK 11. They
>>>> all have this distinctive backtrace:
>>>>
>>>> C  [libc.so.6+0x12944d]
>>>> V  [libjvm.so+0xcca542]  perfMemory_init()+0x72
>>>> V  [libjvm.so+0x8a3242]  vm_init_globals()+0x22
>>>> V  [libjvm.so+0xedc31d]  Threads::create_vm(JavaVMInitArgs*, bool*)+0x1ed
>>>> V  [libjvm.so+0x9615b2]  JNI_CreateJavaVM+0x52
>>>> C  [libjli.so+0x49af]  JavaMain+0x8f
>>>> C  [libjli.so+0x9149]  ThreadJavaMain+0x9
>>>>
>>>>
>>>> Initially, we suspected that /tmp was full but that turned out to not be
>>>> the case.  After a few more instances of the crash and investigation, we
>>>> believe we know the root cause.
>>>>
>>>>
>>>> The crashing applications are all running in a Kubernetes pod, with each
>>>> JVM in a separate container:
>>>>
>>>>
>>>> container_type: cgroupv1 (from the hs_err file)
>>>>
>>>>
>>>> /tmp is mounted such that it's shared by multiple containers. Since these
>>>> JVMs are running in containers, we believe what happens is that the
>>>> namespaced (i.e. per-container) PIDs overlap between different containers:
>>>> two JVMs, in separate containers, can end up with the same namespaced PID.
>>>> Since /tmp is shared, they can now "contend" on the same perfMemory file,
>>>> since those file names are PID-based.
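
To make the collision described above concrete, here is a minimal Python sketch. It assumes the /tmp/hsperfdata_<user>/<pid> layout mentioned in this thread; the helper name hsperfdata_path, the user name "app" and the PID 10 are made up for illustration and are not taken from HotSpot.

```python
import posixpath


def hsperfdata_path(tmpdir: str, user: str, namespaced_pid: int) -> str:
    """Build a PID-based perf data file name (hypothetical helper)."""
    return posixpath.join(tmpdir, f"hsperfdata_{user}", str(namespaced_pid))


# Two JVMs in different containers. Each one sees itself as PID 10 inside
# its own PID namespace, and both containers mount the same /tmp.
path_a = hsperfdata_path("/tmp", "app", 10)
path_b = hsperfdata_path("/tmp", "app", 10)

# Nothing in the name distinguishes the two containers.
collides = path_a == path_b
```

Both JVMs end up with the same file name, which is the precondition for the contention reported here.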
>>> Hi Vitaly,
>>>
>>> Is there any reason for sharing the same /tmp directory across
>>> different containers?
>>>
>>> Are you using the /tmp/hsperfdata_$USER/<pid> files at all? If not,
>>> for the time being, you can disable them with the -XX:-UsePerfData flag.
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8255008 has a related proposal:
>>>
> This bug is private. Could this one be made accessible somehow?

I've made the bug public.

> Another related bug seems to be, though not quite the same:
> https://bugs.openjdk.java.net/browse/JDK-8284330

Vitaly's scenario will still crash with the above fix.

>>> Java: -Djdk.attach.tmpdir=/container-attachdir
>>> -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:+StartAttachListener
>>> Docker: --volume /tmp/container-attachdir:/container-attachdir
>>>
>>> In this case, we probably will run into the same UID clash problem as
>>> well.
>>>
>>> Maybe we should have an additional property like
>>> -Djdk.attach.use.global.pid=true
>>>
>> I read the proposal in JDK-8255008 again and realized that the JVM
>> inside the container doesn't know what its host PID is. The proposal is
>> to create these files:
>>
>> $jdk_attach_dir/hsperfdata_{user}/e4f3e2e4fd97:10
>> $jdk_attach_dir/.java_pid:e4f3e2e4fd97:10
>>
>> where e4f3e2e4fd97 is the container ID, which is visible as
>> /etc/hostname from inside the container.
>>
>> I'll try to implement a prototype for the proposal.
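
As a rough illustration of the naming scheme quoted above, the following Python sketch builds the two file names from a container ID and a namespaced PID. It is not the prototype and not HotSpot code; the function name proposed_names and its parameters are hypothetical, and how the container ID is obtained is deliberately left to the caller.

```python
import posixpath
from typing import Tuple


def proposed_names(attach_dir: str, user: str,
                   container_id: str, pid: int) -> Tuple[str, str]:
    """Return (perf data file, attach file) using <container_id>:<pid> names."""
    suffix = f"{container_id}:{pid}"
    perf_file = posixpath.join(attach_dir, f"hsperfdata_{user}", suffix)
    attach_file = posixpath.join(attach_dir, f".java_pid:{suffix}")
    return perf_file, attach_file


# Same namespaced PID, different containers: the names no longer clash.
# The second container ID is an arbitrary example value.
first = proposed_names("/container-attachdir", "app", "e4f3e2e4fd97", 10)
second = proposed_names("/container-attachdir", "app", "0123456789ab", 10)
```

The names stay unique only as long as the container IDs themselves are unique across the containers sharing the directory.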
> Please be aware that the container's hostname is also user-settable.
> E.g.
>
> $ docker run --hostname foo ...
>
> Would set the hostname to 'foo'.

Maybe that's OK, as the user will probably set them to unique names.

Or we can use some sort of UUID. Is there anything that cgroup provides 
for a containerized process to uniquely identify itself?
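
One candidate, offered only as a sketch: on cgroup v1 setups with Docker-style runtimes, the cgroup paths in /proc/self/cgroup often end in the 64-hex-digit container ID. The Python below parses such text. The function name container_id_from_cgroup is hypothetical, the path shapes are an assumption about the runtime, and with cgroup v2 and a private cgroup namespace the path may simply be "/", in which case nothing is found.

```python
import re
from typing import Optional

# Assumption: the runtime embeds a 64-character lowercase hex ID in the path.
_CONTAINER_ID = re.compile(r"[0-9a-f]{64}")


def container_id_from_cgroup(cgroup_text: str) -> Optional[str]:
    """Return the first 64-hex-digit ID found in any cgroup path, else None."""
    for line in cgroup_text.splitlines():
        # Each line has the form  hierarchy-ID:controller-list:cgroup-path
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = _CONTAINER_ID.search(parts[2])
        if match:
            return match.group(0)
    return None
```

Inside a container this could be called as container_id_from_cgroup(open("/proc/self/cgroup").read()); a None result would mean falling back to something else, such as the hostname.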

And, do we need to handle nested containers? Is this a practical use case?

> Ioi, did you end up creating a bug for this?

I created a JBS issue from Vitaly's original report:

https://bugs.openjdk.java.net/browse/JDK-8286030

Thanks
- Ioi

>
> Thanks,
> Severin
>
>> Thanks
>> - Ioi
>>
>>>> Once multiple JVMs can contend on the same file, a SIGBUS can arise if one
>>>> JVM has mmap'd the file and another ftruncate()'s it from under it (e.g.
>>>> https://github.com/openjdk/jdk11/blob/37115c8ea4aff13a8148ee2b8832b20888a5d880/src/hotspot/os/linux/perfMemory_linux.cpp#L909).
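
The mmap/ftruncate interaction described above can be reproduced outside the JVM. The Python sketch below assumes Linux and is not HotSpot code; the names CHILD_SOURCE, run_truncate_demo and killed_by_sigbus are made up for illustration. A child process maps one page of a file, the parent truncates the file to zero length, and the child then touches the mapped page.

```python
import os
import signal
import subprocess
import sys
import tempfile

# Child: map the file, report readiness, wait for a newline, touch the page.
CHILD_SOURCE = """
import mmap, os, sys
fd = os.open(sys.argv[1], os.O_RDWR)
mapping = mmap.mmap(fd, 4096)   # shared file mapping by default on Unix
print("mapped", flush=True)
sys.stdin.readline()            # the parent truncates the file meanwhile
mapping[0]                      # the page now lies past the end of the file
"""


def run_truncate_demo() -> int:
    """Return the child's exit status; a negative value is a signal number."""
    with tempfile.NamedTemporaryFile() as backing:
        backing.truncate(4096)               # give the file 4096 bytes
        child = subprocess.Popen(
            [sys.executable, "-c", CHILD_SOURCE, backing.name],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
        child.stdout.readline()              # wait until the child has mapped
        os.truncate(backing.name, 0)         # stands in for the other JVM
        child.communicate("\n")              # let the child touch the page
        return child.returncode


def killed_by_sigbus(status: int) -> bool:
    """True if the exit status says the process died from SIGBUS."""
    return status == -signal.SIGBUS
```

On Linux the child is expected to be killed by SIGBUS, because the mapping outlives the file contents that backed it.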
>>>>
>>>>
>>>> Is this a known issue? I couldn't find any existing JBS entries or mailing
>>>> list discussions around this specific circumstance.
>>>>
>>>>
>>>> As for possible solutions, would it be possible to use the global PID
>>>> instead of the namespaced PID to "regain" the uniqueness invariant of the
>>>> PID?
>>> I think this needs to be optional. E.g., if you run a tool such as
>>> "jcmd" inside the same container as the jvm process, the jcmd tool
>>> would expect the PID to be the local version specific to this
>>> container.
>>>
>>>> Also, might it make sense to flock() the file to prevent another
>>>> process from mucking with it?
>>>>
>>> That will not solve the fundamental problem where two processes are
>>> using the same hsperfdata file. One of them would fail to write to it,
>>> and tools won't be able to monitor both processes.
>>>
>>> Thanks
>>> - Ioi
>>>
>>>> Happy to provide more info if needed.
>>>>
>>>>
>>>> Thanks



More information about the hotspot-runtime-dev mailing list