SIGBUS on linux in perfMemory_init (containers)

Severin Gehwolf sgehwolf at redhat.com
Tue May 3 12:52:28 UTC 2022


On Mon, 2022-05-02 at 22:13 -0700, Ioi Lam wrote:
> 
> 
> On 5/2/2022 5:41 AM, Severin Gehwolf wrote:
> > Hi,
> > 
> > On Fri, 2022-04-29 at 14:19 -0700, Ioi Lam wrote:
> > > 
> > > On 4/29/2022 1:55 PM, Ioi Lam wrote:
> > > > 
> > > > On 4/29/2022 6:44 AM, Vitaly Davidovich wrote:
> > > > > Hi all,
> > > > > 
> > > > > We've been seeing intermittent SIGBUS failures on linux with jdk11.
> > > > > They all have this distinctive backtrace:
> > > > > 
> > > > > C  [libc.so.6+0x12944d]
> > > > > 
> > > > > V  [libjvm.so+0xcca542]  perfMemory_init()+0x72
> > > > > 
> > > > > V  [libjvm.so+0x8a3242]  vm_init_globals()+0x22
> > > > > 
> > > > > V  [libjvm.so+0xedc31d]  Threads::create_vm(JavaVMInitArgs*,
> > > > > bool*)+0x1ed
> > > > > 
> > > > > V  [libjvm.so+0x9615b2]  JNI_CreateJavaVM+0x52
> > > > > 
> > > > > C  [libjli.so+0x49af]  JavaMain+0x8f
> > > > > 
> > > > > C  [libjli.so+0x9149]  ThreadJavaMain+0x9
> > > > > 
> > > > > 
> > > > > Initially, we suspected that /tmp was full but that turned out to not be
> > > > > the case.  After a few more instances of the crash and investigation, we
> > > > > believe we know the root cause.
> > > > > 
> > > > > 
> > > > > The crashing applications are all running in a K8s pod, with each
> > > > > JVM in a separate container:
> > > > > 
> > > > > 
> > > > > container_type: cgroupv1 (from the hs_err file)
> > > > > 
> > > > > 
> > > > > /tmp is mounted such that it's shared by multiple containers. Since
> > > > > these JVMs are running in containers, we believe what happens is the
> > > > > namespaced (i.e. per container) PIDs overlap between different
> > > > > containers - 2 JVMs, in separate containers, can end up with the
> > > > > same namespaced PID. Since /tmp is shared, they can now "contend"
> > > > > on the same perfMemory file since those file names are PID based.
> > > > Hi Vitaly,
> > > > 
> > > > Is there any reason for sharing the same /tmp directory across
> > > > different containers?
> > > > 
> > > > Are you using the /tmp/hsperfdata_$USER/<pid> files at all? If not,
> > > > for the time being, you can disable them with the -XX:-UsePerfData flag.
> > > > 
> > > > https://bugs.openjdk.java.net/browse/JDK-8255008 has a related proposal:
> > > > 
> > This bug is private. Could this one be made accessible somehow?
> 
> I've made the bug public.

Thank you!

> > Another related bug seems to be, though not quite the same:
> > https://bugs.openjdk.java.net/browse/JDK-8284330
> 
> Vitaly's scenario will still crash with the above fix.

Right. My understanding is that Vitaly's scenario is /tmp shared across
multiple containers, each of which likely runs its JVM as pid 1 (if it
is the only process in the container). Add to it that they run as the
same user inside each container and you get the clash:

Container 1: user A, single process (=> pid 1)
Container 2: user A, single process (=> pid 1)

Container 1 and 2 share /tmp (e.g. via volume mounts). Container 1
*and* Container 2's processes suddenly share /tmp/hsperfdata_usera/1.

$ ps ax | grep java
  17662 pts/0    Ssl+   0:01 java -jar /deployments/undertow-servlet.jar
  18057 pts/0    Ssl+   0:01 java -jar /deployments/undertow-servlet.jar
$ sudo lsof -p 17662 | grep hsperf
java    17662 sgehwolf  DEL       REG  253,6           2464492 /tmp/hsperfdata_sgehwolf/1
$ sudo lsof -p 18057 | grep hsperf
java    18057 sgehwolf  mem       REG  253,6           2464503 /tmp/hsperfdata_sgehwolf/1 (stat: No such file or directory)

Containers started with:
$ podman run --rm -ti --userns keep-id --user $(id -u) -v $(pwd)/tmp_share_test:/tmp:z quay.io/sgehwolf/fedora-35-undertow:jdk17
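
For reference, the perf data file name depends only on the effective
user name and the (namespaced) PID, which is why two single-process
containers that share /tmp and run as the same user collide. A minimal
Java sketch that mirrors the naming scheme (an approximation for
illustration, not the actual HotSpot code):

import java.nio.file.Path;

public class PerfPathSketch {
    public static void main(String[] args) {
        // effective user name inside the container
        String user = System.getProperty("user.name");
        // PID as seen inside the container's PID namespace, e.g. 1
        long pid = ProcessHandle.current().pid();
        // HotSpot places the file under /tmp on Linux, independent of java.io.tmpdir
        Path perfFile = Path.of("/tmp", "hsperfdata_" + user, Long.toString(pid));
        System.out.println("perf data file: " + perfFile);
    }
}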

The question is: why share /tmp to begin with?

> > > > Java: -Djdk.attach.tmpdir=/container-attachdir
> > > > -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:+StartAttachListener
> > > > Docker: --volume /tmp/container-attachdir:/container-attachdir
> > > > 
> > > > In this case, we will probably run into the same PID clash problem as
> > > > well.
> > > > 
> > > > Maybe we should have an additional property like
> > > > -Djdk.attach.use.global.pid=true
> > > > 
> > > I read the proposal in JDK-8255008 again and realized that the JVM
> > > inside the container doesn't know what its host PID is. The proposal is
> > > to create these files:
> > > 
> > > $jdk_attach_dir/hsperfdata_{user}/e4f3e2e4fd97:10
> > > $jdk_attach_dir/.java_pid:e4f3e2e4fd97:10
> > > 
> > > where e4f3e2e4fd97 is the container ID, which is visible in
> > > /etc/hostname from inside the container.
> > > 
> > > I'll try to implement a prototype for the proposal.
> > Please be aware that the container's hostname is also user-settable.
> > E.g.
> > 
> > $ docker run --hostname foo ...
> > 
> > Would set the hostname to 'foo'.
> 
> Maybe that's OK, as the user will probably set them to unique names.

My concern is that, all of a sudden, user input ends up determining
paths in the filesystem. That's a no-no to me.
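
If the hostname were used, it would presumably have to be validated
before being embedded in a file name. A rough sketch of the kind of
check I mean (hypothetical helper, not part of any proposed patch):

public class ContainerIdSanitizer {
    // Accept only the default 12-char hex short container ID; anything
    // user-set that doesn't match falls back to a fixed name instead of
    // ending up in the filesystem path.
    static String sanitizeContainerId(String hostname) {
        if (hostname != null && hostname.matches("[0-9a-f]{12}")) {
            return hostname;
        }
        return "unknown-container";
    }
}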

> Or we can use some sort of UUID. Is there anything that cgroup provides 
> for a containerized process to uniquely identify itself?

Not that I'm aware of. But then again, I'm doubtful the use case has
much to stand on.
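
For what it's worth, the only self-identification I can see from inside
a container is /proc/self/cgroup and /etc/hostname, and neither is a
reliable unique ID (with a cgroup namespace the cgroup path is typically
just "0::/", and the hostname is user-settable as noted above). A small
sketch that just dumps both, assuming they are readable:

import java.nio.file.Files;
import java.nio.file.Path;

public class ContainerSelfId {
    public static void main(String[] args) throws Exception {
        // cgroup membership as seen from inside the container
        Files.readAllLines(Path.of("/proc/self/cgroup"))
             .forEach(System.out::println);
        // usually the short container ID, unless overridden via --hostname
        System.out.println("hostname: "
                + Files.readString(Path.of("/etc/hostname")).trim());
    }
}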

> And, do we need to handle nested containers? Is this a practical use case?

TBH, I'm not sure this is a fix worth having. Other than increased code
complexity I don't see much benefit. What's the use case exactly? To be
able to extract JFR recordings from a process *in* a container without
running as root on the host?

> > Ioi, did you end up creating a bug for this?
> 
> I created a JBS issue from Vitaly's original report:
> 
> https://bugs.openjdk.java.net/browse/JDK-8286030

Thanks.

--
Severin


