CheckpointOpenFileException for /var/lib/sss/mc/passwd
Ashutosh Mehra
asmehra at redhat.com
Thu Jan 20 16:04:43 UTC 2022
While trying checkpoint/restore (C/R) with a CRaC build on my Linux system
(RHEL 8 based), I encountered this exception:
$ sudo ./build/linux-x86_64-server-slowdebug/images/jdk/bin/java \
    -XX:+UnlockExperimentalVMOptions -XX:CRaCCheckpointTo=cr \
    -XX:+CRPrintResourcesOnCheckpoint HelloWorld
Before checkpoint
JVM: FD fd=0 type=character: details1="/dev/pts/4" OK: inherited from process env
JVM: FD fd=1 type=character: details1="/dev/pts/4" OK: inherited from process env
JVM: FD fd=2 type=character: details1="/dev/pts/4" OK: inherited from process env
JVM: FD fd=3 type=regular: details1="/home/asmehra/data/ashu-mehra/crac/build/linux-x86_64-server-slowdebug/images/jdk/lib/modules" OK: inherited from process env
JVM: FD fd=4 type=regular: details1="/var/lib/sss/mc/passwd" BAD: opened by application
JVM: FD fd=5 type=socket: details1="socket:[2248020]" BAD: opened by application details2="socket:[2248020]"
Exception in thread "main" jdk.crac.CheckpointException
        at java.base/jdk.crac.Core.checkpointRestore1(Core.java:142)
        at java.base/jdk.crac.Core.checkpointRestore(Core.java:193)
        at HelloWorld.main(HelloWorld.java:9)
        Suppressed: jdk.crac.impl.CheckpointOpenFileException: /var/lib/sss/mc/passwd
                at java.base/jdk.crac.Core.translateJVMExceptions(Core.java:84)
                at java.base/jdk.crac.Core.checkpointRestore1(Core.java:145)
                ... 2 more
        Suppressed: jdk.crac.impl.CheckpointOpenSocketException: socket:[2248020]
                at java.base/jdk.crac.Core.translateJVMExceptions(Core.java:88)
                at java.base/jdk.crac.Core.checkpointRestore1(Core.java:145)
                ... 2 more
Notice this message in the above output:
JVM: FD fd=4 type=regular: details1="/var/lib/sss/mc/passwd" BAD: opened by application
Here is the HelloWorld application used in the above example:
public class HelloWorld {
    public static void main(String[] args) throws Exception {
        System.out.println("Before checkpoint");
        jdk.crac.Core.checkpointRestore();
        System.out.println("After checkpoint");
    }
}
Clearly the application itself never opens /var/lib/sss/mc/passwd, so I
tried to figure out what causes the process to open this file.
It turns out the open originates in libc: the JVM looks up the user name,
which it uses as part of the location of its mmap-based shared memory
(perf memory) file:
(gdb) bt
#0 0x00007ffff72af550 in open64 () from /lib64/libc.so.6
#1 0x00007ffff44b4314 in sss_open_cloexec () from /lib64/libnss_sss.so.2
#2 0x00007ffff44b3fc9 in sss_nss_mc_get_ctx () from /lib64/libnss_sss.so.2
#3 0x00007ffff44b4770 in sss_nss_mc_getpwuid () from /lib64/libnss_sss.so.2
#4 0x00007ffff44b061e in _nss_sss_getpwuid_r () from /lib64/libnss_sss.so.2
#5 0x00007ffff728a41d in getpwuid_r@@GLIBC_2.2.5 () from /lib64/libc.so.6
#6 0x00007ffff5e960eb in get_user_name (uid=0) at
/home/asmehra/data/ashu-mehra/crac/src/hotspot/os/posix/perfMemory_posix.cpp:470
#7 0x00007ffff5e97026 in mmap_create_shared (size=32768) at
/home/asmehra/data/ashu-mehra/crac/src/hotspot/os/posix/perfMemory_posix.cpp:972
#8 0x00007ffff5e972e8 in create_shared_memory (size=32768) at
/home/asmehra/data/ashu-mehra/crac/src/hotspot/os/posix/perfMemory_posix.cpp:1049
#9 0x00007ffff5e97ac1 in PerfMemory::create_memory_region (size=32768) at
/home/asmehra/data/ashu-mehra/crac/src/hotspot/os/posix/perfMemory_posix.cpp:1232
#10 0x00007ffff5e94e7c in PerfMemory::initialize () at
/home/asmehra/data/ashu-mehra/crac/src/hotspot/share/runtime/perfMemory.cpp:107
#11 0x00007ffff5e94d9f in perfMemory_init () at
/home/asmehra/data/ashu-mehra/crac/src/hotspot/share/runtime/perfMemory.cpp:62
#12 0x00007ffff5935d03 in vm_init_globals () at
/home/asmehra/data/ashu-mehra/crac/src/hotspot/share/runtime/init.cpp:108
#13 0x00007ffff610bfeb in Threads::create_vm (args=0x7ffff7fd3df0,
canTryAgain=0x7ffff7fd3ce3) at
/home/asmehra/data/ashu-mehra/crac/src/hotspot/share/runtime/thread.cpp:2813
#14 0x00007ffff5a37ddf in JNI_CreateJavaVM_inner (vm=0x7ffff7fd3e48,
penv=0x7ffff7fd3e50, args=0x7ffff7fd3df0) at
/home/asmehra/data/ashu-mehra/crac/src/hotspot/share/prims/jni.cpp:3621
#15 0x00007ffff5a38138 in JNI_CreateJavaVM (vm=0x7ffff7fd3e48,
penv=0x7ffff7fd3e50, args=0x7ffff7fd3df0) at
/home/asmehra/data/ashu-mehra/crac/src/hotspot/share/prims/jni.cpp:3709
#16 0x00007ffff79b04ce in InitializeJVM (pvm=0x7ffff7fd3e48,
penv=0x7ffff7fd3e50, ifn=0x7ffff7fd3ea0) at
/home/asmehra/data/ashu-mehra/crac/src/java.base/share/native/libjli/java.c:1541
#17 0x00007ffff79ad042 in JavaMain (_args=0x7fffffffb040) at
/home/asmehra/data/ashu-mehra/crac/src/java.base/share/native/libjli/java.c:415
#18 0x00007ffff79b3e16 in ThreadJavaMain (args=0x7fffffffb040) at
/home/asmehra/data/ashu-mehra/crac/src/java.base/unix/native/libjli/java_md.c:651
#19 0x00007ffff779115a in start_thread () from /lib64/libpthread.so.0
#20 0x00007ffff72bef73 in clone () from /lib64/libc.so.6
libnss_sss.so.2 comes into the picture because NSS on this system is
configured to consult SSSD first for the passwd and group databases. This is
outside the control of the JVM. Ideally this fd should have been treated as
opened by the JVM, not by the application.
During startup the JVM records the fds that are already open in
_vm_inited_fds, so that it can later separate the fds opened by the JVM from
those opened by the application. However, this happens before the JVM creates
the shared memory. As a result, the fd for /var/lib/sss/mc/passwd is not part
of _vm_inited_fds, and the checkpoint fails with the exception shown above.
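To see which descriptors a Java process holds at a given point, without
relying on -XX:+CRPrintResourcesOnCheckpoint, the entries under /proc/self/fd
can be read directly. The following is a minimal sketch, assuming a Linux
system with /proc mounted; the class and method names (OpenFdLister,
listOpenFds) are made up for illustration and are not part of CRaC or the JDK.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of CRaC: lists the fds of the current process.
final class OpenFdLister {

    // Returns fd number -> link target (file path, "socket:[inode]", ...).
    // Returns an empty map when /proc/self/fd is not available.
    static Map<Integer, String> listOpenFds() {
        Map<Integer, String> fds = new TreeMap<>();
        Path dir = Paths.get("/proc/self/fd");
        if (!Files.isDirectory(dir)) {
            return fds;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                try {
                    int fd = Integer.parseInt(entry.getFileName().toString());
                    fds.put(fd, Files.readSymbolicLink(entry).toString());
                } catch (IOException | RuntimeException e) {
                    // The fd may have been closed while scanning; skip it.
                }
            }
        } catch (IOException e) {
            // Directory could not be read; return what was collected so far.
        }
        return fds;
    }

    public static void main(String[] args) {
        listOpenFds().forEach((fd, target) ->
                System.out.println("fd=" + fd + " -> " + target));
    }
}
```

The listing also includes the descriptor used to scan the directory itself,
so one extra entry pointing at /proc/<pid>/fd is expected.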
I think this issue shows that any segregation of resources into JVM-owned
and application-owned should be done as late as possible, probably just
before we start executing application code.
In this case, delaying the initialization of _vm_inited_fds should help.
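The effect of the snapshot timing can be illustrated with a small model. This
is only a sketch of the idea, not HotSpot code: the class FdOwnershipModel and
its methods are hypothetical, the fd numbers are taken from the output above,
and the real bookkeeping in the JVM may differ in detail.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the classification: an fd is "BAD" at checkpoint
// time if it is open but was not in the snapshot taken during VM startup.
final class FdOwnershipModel {
    private final Set<Integer> openFds = new HashSet<>();
    private Set<Integer> vmInitedFds = new HashSet<>();

    void open(int fd) {
        openFds.add(fd);
    }

    // Models the point where the JVM records the fds it considers its own.
    void snapshot() {
        vmInitedFds = new HashSet<>(openFds);
    }

    boolean isBad(int fd) {
        return openFds.contains(fd) && !vmInitedFds.contains(fd);
    }

    // Replays startup with the snapshot taken either before or after the
    // perf memory setup that triggers the open of /var/lib/sss/mc/passwd.
    static boolean passwdCacheFdIsBad(boolean snapshotBeforePerfMemory) {
        FdOwnershipModel vm = new FdOwnershipModel();
        for (int fd = 0; fd <= 3; fd++) {
            vm.open(fd); // stdin, stdout, stderr, lib/modules
        }
        if (snapshotBeforePerfMemory) {
            vm.snapshot();
            vm.open(4); // getpwuid_r -> libnss_sss opens the passwd cache
        } else {
            vm.open(4);
            vm.snapshot(); // delayed snapshot sees the fd as JVM-owned
        }
        return vm.isBad(4);
    }

    public static void main(String[] args) {
        System.out.println("early snapshot, fd=4 BAD: " + passwdCacheFdIsBad(true));
        System.out.println("late snapshot,  fd=4 BAD: " + passwdCacheFdIsBad(false));
    }
}
```

With the early snapshot the model reports fd=4 as BAD, matching the observed
behaviour; with the delayed snapshot the same fd is counted as JVM-owned.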
For now, I can work around this issue by making NSS consult local files
before SSSD in /etc/nsswitch.conf.
The current order of entries in /etc/nsswitch.conf is:
passwd: sss files systemd
group: sss files systemd
To work around the issue, change the order to:
passwd: files sss systemd
group: files sss systemd
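To check whether a system is likely to be affected, the order of the sources
in /etc/nsswitch.conf can be inspected. The sketch below is a hypothetical
helper (NsswitchCheck and sssBeforeFiles are names invented here); it only
looks at the position of the "sss" and "files" tokens and does not interpret
action items such as [NOTFOUND=return].

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: reports whether "sss" precedes "files" for a database.
final class NsswitchCheck {

    // Returns true if the first line for the given database (for example
    // "passwd") lists "sss" and lists it before "files", or without "files".
    static boolean sssBeforeFiles(List<String> lines, String database) {
        for (String raw : lines) {
            String line = raw;
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // strip comments
            }
            line = line.trim();
            if (!line.startsWith(database + ":")) {
                continue;
            }
            String rest = line.substring(database.length() + 1).trim();
            List<String> sources = Arrays.asList(rest.split("\\s+"));
            int sss = sources.indexOf("sss");
            int files = sources.indexOf("files");
            return sss >= 0 && (files < 0 || sss < files);
        }
        return false; // database not configured in the given lines
    }

    public static void main(String[] args) {
        Path conf = Paths.get("/etc/nsswitch.conf");
        try {
            List<String> lines = Files.readAllLines(conf);
            for (String db : new String[] {"passwd", "group"}) {
                System.out.println(db + ": sss before files = "
                        + sssBeforeFiles(lines, db));
            }
        } catch (IOException e) {
            System.out.println("Cannot read " + conf + ": " + e.getMessage());
        }
    }
}
```

As far as I understand NSS, putting "files" first only avoids the SSSD lookup
for users that are defined in the local /etc/passwd (such as root in the run
above); other users would still be resolved through sss.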
Regards,
Ashutosh Mehra