[crac] RFR: Persist memory in-JVM [v8]

Radim Vansa rvansa at openjdk.org
Thu Oct 5 14:21:08 UTC 2023

On Wed, 4 Oct 2023 12:54:42 GMT, Radim Vansa <rvansa at openjdk.org> wrote:

>> This is a WIP for persisting various portions of JVM memory from within the process, rather than leaving that up to C/R engine (such as CRIU). In the future this will enable us to optimize loading (theoretically leading to faster startup), compress and encrypt the data.
>> At this moment the implementation of persisting thread stacks is in proof-of-concept shape; ~in particular, waking up the primordial thread involves some hacks. This could be improved by using a custom (global, ideally robust) futex instead of the internal futex used by `pthread_join`.~ Fix already implemented.
>> ~One of the concerns related to thread stacks is the rseq used by glibc; without disabling it, CRIU would attempt to 'fix' the rseqs (see `fixup_thread_rseq`) and touch the unmapped memory. CRIU uses special ptrace commands to check the status; I am not aware of any public API that exposes this information from within the process.~ Solved. The JVM forks and the child ptraces the JVM, recording the rseq info. Once we have that, we can unregister the rseq before the checkpoint and register it again afterwards (here we have the advantage that we know the threads won't be in any critical section, as we're in a safepoint).
>> ~Currently this works with `/proc/sys/kernel/yama/ptrace_scope` set to `0`; we should make it work with `1` (default), too.~ Fixed.
>> Regarding the persistence implementation, we currently store the memory in multiple files; the first block (page-size aligned) contains some validation data and an index mapping memory addresses to file offsets. The way this is implemented requires the size of the index to be known before dumping the memory. It might be more convenient (and more portable, e.g. for network-based storage) to use a single file and either keep the index in the 'fundamental' memory (C heap), put it at the end of the file, or move it to a separate index file.
> Radim Vansa has updated the pull request incrementally with six additional commits since the last revision:
>  - Fix x86 build
>  - Use number of threads directly
>  - Don't persist CodeCache used for non-nMethods
>    The stubs allocated in this heap are used for atomic operations on
>    aarch64, avoiding them before the memory is restored would be complicated.
>  - Fix close() on windows
>  - Add overrides (OSX build fix)
>  - Fix recursiveCheckpoint

After some attempts to avoid allocations while the CodeCache is unmapped, I resolved this by not persisting the non-nmethod part of the CodeCache: not being able to allocate/free can be difficult, e.g. when we're receiving an arbitrary number of new parameters on restore, and it would require blocking all native threads.

Another notable fix is for pauseengine/simengine: it's not possible to mmap the memory for these, it must simply be read in.

I've also fixed the x86 (ia32) build. There's a checkpoint issue (on the Java side) which I'll file as a separate PR; with the fix I can verify that ia32 runs fine. While we don't have to put too much effort into ia32 per se, testing 32-bit revealed some problems that could also manifest on 64-bit and would be much harder to reproduce there. The most important one (architecture-wise) was that the code can mmap new regions before the persisted memory is reloaded; the loading code would then silently overwrite those.
This is now avoided by mmaping the regions recorded in the index early on (e.g. before reading new parameters from shm) with PROT_NONE; mmaps without `MAP_FIXED` won't land in those ranges. This problem is less likely to happen in the sparser 64-bit address space, but on ia32 it happened reliably.


PR Comment: https://git.openjdk.org/crac/pull/95#issuecomment-1748946554
