Inquiry: Handling Unix sockets created by glibc/NSS during checkpoint

Mon Sep 15 15:12:08 UTC 2025

Hi Ma Zhen,

We are aware of a similar issue where an application has 
`/var/cache/nscd/passwd` mapped despite not having the privilege to 
open() this file: the application can receive a file descriptor through 
a socket and is then able to mmap it. Another case is files under 
`/var/lib/sss/mc/` opened by getpwuid_r, getpwnam_r, getgrgid_r, 
getgrnam_r or similar functions.

You're right that File Descriptor Policies cannot be applied here; they 
work at the Java level (the FD must have an associated Java object).

There is a VM option `-XX:CRaCAllowedOpenFilePrefixes` that lets the 
checkpoint proceed if the path of an open file starts with one of the 
configured prefixes; in most cases CRIU can reopen a regular file 
without issues (and it should be able to handle sockets as well). I have 
not tested whether the path matching works with sockets, but it 
shouldn't be too difficult to fix up for Unix sockets.
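
As a rough illustration of passing that option, here is a minimal 
sketch that only assembles and prints a launch command. The class 
`CracLaunch`, the jar name `app.jar` and the image directory `cr` are 
placeholders, and the single-prefix value syntax is an assumption that 
should be checked against the documentation of your CRaC build.

```java
import java.util.List;

public class CracLaunch {
    // Builds a hypothetical launch command; it does not start the JVM.
    static List<String> command(String jar, String allowedPrefix) {
        return List.of(
                "java",
                // Directory for the checkpoint image ("cr" is a placeholder).
                "-XX:CRaCCheckpointTo=cr",
                // Assumed syntax: a single path prefix after '='.
                "-XX:CRaCAllowedOpenFilePrefixes=" + allowedPrefix,
                "-jar", jar);
    }

    public static void main(String[] args) {
        // Print the command so it can be reviewed before use.
        System.out.println(String.join(" ", command("app.jar", "/var/lib/sss/mc/")));
    }
}
```

Whether a Unix socket is matched by such a prefix at all is exactly the 
untested part mentioned above.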

Besides this, there is no resource handling at the native level (you 
cannot register a native hook), though it might be possible to find an 
open FD and close it from Java. I wouldn't recommend such a hacky approach.
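
For diagnosis only, the "find an open FD" part can be sketched as 
follows, assuming Linux with `/proc` mounted. The class `OpenFds` and 
its methods are hypothetical names, not part of CRaC. The sketch only 
lists descriptors; it does not attempt to close anything.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

public class OpenFds {
    // Maps each open FD number of this process to its /proc/self/fd link target.
    static Map<Integer, String> list() throws IOException {
        Map<Integer, String> fds = new TreeMap<>();
        try (DirectoryStream<Path> dir = Files.newDirectoryStream(Path.of("/proc/self/fd"))) {
            for (Path entry : dir) {
                try {
                    String target = Files.readSymbolicLink(entry).toString();
                    fds.put(Integer.parseInt(entry.getFileName().toString()), target);
                } catch (IOException e) {
                    // The descriptor was closed while we were iterating; skip it.
                }
            }
        }
        return fds;
    }

    // On Linux, socket descriptors resolve to "socket:[inode]" rather than a path.
    static boolean isSocket(String target) {
        return target.startsWith("socket:[");
    }

    public static void main(String[] args) throws IOException {
        list().forEach((fd, target) ->
                System.out.println(fd + " -> " + target + (isSocket(target) ? "  (socket)" : "")));
    }
}
```

Regular files show up with their filesystem path, while sockets appear 
as `socket:[inode]`, so tools such as lsof are still needed to see the 
peer address of a Unix socket.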

To be honest, on systems where we've encountered this issue we chose to 
disable the NSCD service completely. If you can't control the 
environment, you can run the application in a container that isn't 
configured with these services.

Cheers,

Radim

On 9/15/25 11:29, ma zhen wrote:
>
> Hi CRaC developers,
>
> I am currently working on adapting a Java application to support CRaC. 
> I've encountered a specific challenge related to a Unix socket that is 
> preventing successful checkpoint creation.
>
> During the checkpoint process, I consistently receive a 
> CheckpointOpenSocketException for a specific file descriptor, which 
> lsof identifies as a Unix socket.
>
> I have conducted a detailed investigation to trace the origin of this 
> socket and found that it is not created directly by my Java 
> application code. Instead, it is created by the underlying glibc 
> library as part of the Name Service Switch (NSS) framework. The call 
> stack, captured using BCC, clearly shows that the socket() call 
> originates from glibc's __nscd_* functions. This happens when the JVM 
> or application triggers a name service lookup (e.g., resolving a user 
> ID). In my specific environment, this results in a Unix socket 
> connection from the Java process to the lwsmd daemon for authentication.
>
> Because this socket is created and managed within the native C 
> library, the standard approach of implementing a Java-level 
> org.crac.Resource to close and restore it doesn't seem applicable, as 
> my application code has no direct handle or control over its lifecycle.
>
> I have documented the full analysis, including the error, lsof output, 
> and BCC stack traces, in a detailed write-up which you can find here:
> https://github.com/mz1999/blog/blob/master/docs/trace_java_socket_creation-en.md
>
> My question is: What is the recommended approach for handling such 
> file descriptors that are opened by underlying native libraries 
> without direct control from the Java application?
>
> Are there any existing mechanisms, perhaps through advanced file 
> descriptor policies, or any planned features that might address this 
> common scenario? Or is there another workaround that the team would 
> suggest?
>
> Thank you for your time and for developing this fantastic project. Any 
> guidance you can provide would be greatly appreciated.
>
> Best regards,
> mazhen