[crac] RFR: Persist memory in-JVM

Anton Kozlov akozlov at openjdk.org
Fri Sep 1 16:40:16 UTC 2023


On Fri, 28 Jul 2023 17:07:22 GMT, Radim Vansa <rvansa at openjdk.org> wrote:

> This is a WIP for persisting various portions of JVM memory from within the process, rather than leaving that up to C/R engine (such as CRIU). In the future this will enable us to optimize loading (theoretically leading to faster startup), compress and encrypt the data.
> 
> At this moment the implementation of persisting thread stacks is in proof-of-concept shape, ~especially waking up the primordial thread includes some hacks. This could be improved by using a custom (global, ideally robust) futex instead of the internal futex used by `pthread_join`.~ Fix already implemented.
> 
> ~One of the concerns related to thread stacks is rseq used by glibc; without disabling this CRIU would attempt to 'fix' the rseqs (see `fixup_thread_rseq`) and touch the unmapped memory. CRIU uses special ptrace commands to check the status; I am not aware if it is possible to access this information from within the process using any public API.~ Solved. The JVM forks and the child ptraces JVM, recording the rseq info. Once we have the we can unregister the rseq before checkpoint and register it afterwards (here we have the advantage that we know the threads won't be in any critical section as we're in a safepoint).
> ~Currently this works with `/proc/sys/kernel/yama/ptrace_scope` set to `0`; we should make it work with `1` (default), too.~ Fixed.
> 
> Regarding persistence implementation, currently we store the memory in multiple files; first block (page-size aligned) contains some validation data and index of memory address - file offsets. The way this is implemented requires the size of index to be known before dumping memory. It might be more convenient (and portable for e.g. network-based storage) to use single file, and either keep the index in the 'fundamental' memory (C heap), put it at the end of file or to another index file.

src/hotspot/os/linux/crac_linux.cpp line 578:

> 576:   while (persist_futex) {
> 577:     syscall(SYS_futex, &persist_futex, FUTEX_WAIT_PRIVATE, 1, nullptr, nullptr, 0);
> 578:   }

I'm not sure will this guarantee no stack accesses. I met with compiler just spilling valued to x86-64 red zone below SP https://en.wikipedia.org/wiki/Red_zone_(computing).

To avoid dealing with this problems, does it make sense to leave a page or two around SP still mmapped?

src/hotspot/share/gc/g1/g1PageBasedVirtualSpace.cpp line 298:

> 296:     flip = !flip;
> 297:     index = next;
> 298:   }

There is a pattern of computing the size of memory to be persisted, and re-computing the same value to be loaded. This function in particular looks very similar to G1PageBasedVirtualSpace::persist_for_checkpoint. Only later I found out, MemoryLoader verifies that the load request corresponds to a previous store request.

I propose to change this and similar functions to do verification solely, and make MemoryLoader to drive process of loading, since it anyway possesses all the data required for that. In this function, for verification it will be enough to check `_committed` did not change, and it can be done more efficiently, just by making a snapshot of the data and comparing current state and the snapshot. The verification can also be performed in debug builds only then.

src/hotspot/share/runtime/crac.cpp line 361:

> 359:   MonitorLocker ml(PeriodicTask_lock, Mutex::_safepoint_check_flag);
> 360:   WatcherThread::watcher_thread()->unpark();
> 361: }

This looks like a part of #106

src/hotspot/share/runtime/crac.cpp line 406:

> 404:     if (CRPersistMemory) {
> 405:       // Since VM_Crac instance is allocated on stack of other thread
> 406:       // we must not use it from now on

I see no problem with the code, but the comment bothers me. Threads stack ummapping is turning out to be pretty fragile.

src/hotspot/share/runtime/crac.cpp line 685:

> 683: GrowableArray<struct crac::MemoryPersisterBase::record> crac::MemoryPersisterBase::_index(256, mtInternal);
> 684: int crac::MemoryPersisterBase::_fd = -1;
> 685: bool crac::MemoryPersisterBase::_loading = false;

I see this is being used for assert only, then it should probably be DEBUG_ONLY(...)

src/hotspot/share/runtime/crac.cpp line 735:

> 733:       // We don't have to punch holes using fallocate, OS creates holes automatically
> 734:       // when we are seeking over gaps.
> 735:       // Note: in the future it might be useful to record holes explicitly, too,

Agree, let's add `TODO` marker here.

src/hotspot/share/runtime/crac.cpp line 812:

> 810:     if (a->addr < b->addr) return -1;
> 811:     if (a->addr > b->addr) return 1;
> 812:     return 0;

Out of curiosity, why not `a->addr - b->addr`?

src/hotspot/share/runtime/crac.hpp line 82:

> 80:     bool store_gap(void *addr, size_t length);
> 81: 
> 82:     static void persist();

I think `persist()` should be renamed to `finalize()`, or something like this. Functions `store` and `persist` look pretty similar, their responsibilities are not clear.

src/hotspot/share/runtime/globals.hpp line 2025:

> 2023:   product(bool, CRPersistMemory, true, "Persist/load memory from within "   \
> 2024:       "the VM rather than relying on C/R engine.")                          \
> 2025:                                                                             \

I see some value in having CRPersistMemory as JVM option, but for now the reason is not clear. For debugging, to be able to revert back to previous behavior, it should be DIAGNOSTIC. To highlight the potentially incomplete or bogus implmenetation, it should be EXPERIMENTAL. Unattributed, it means there are valid production use-cases when it would be beneficial to not use it. Is it really so? In what cases?

src/java.base/unix/native/libjli/java_md.c line 658:

> 656:  */
> 657: static void* ThreadJavaMain(void* args) {
> 658:     main_thread_result = JavaMain(args);

I think this will work, but let's assign and read `main_thread_result` in the critical section. To avoid the need for reasoning why main_thread_result is visible in other thread.

-------------

PR Review Comment: https://git.openjdk.org/crac/pull/95#discussion_r1313060458
PR Review Comment: https://git.openjdk.org/crac/pull/95#discussion_r1312918995
PR Review Comment: https://git.openjdk.org/crac/pull/95#discussion_r1313057520
PR Review Comment: https://git.openjdk.org/crac/pull/95#discussion_r1313162596
PR Review Comment: https://git.openjdk.org/crac/pull/95#discussion_r1313176252
PR Review Comment: https://git.openjdk.org/crac/pull/95#discussion_r1313169099
PR Review Comment: https://git.openjdk.org/crac/pull/95#discussion_r1313173155
PR Review Comment: https://git.openjdk.org/crac/pull/95#discussion_r1313167830
PR Review Comment: https://git.openjdk.org/crac/pull/95#discussion_r1312869901
PR Review Comment: https://git.openjdk.org/crac/pull/95#discussion_r1313264339


More information about the crac-dev mailing list