[crac] RFR: Persist memory in-JVM

Mon Sep 4 07:14:08 UTC 2023

On Fri, 1 Sep 2023 13:53:58 GMT, Anton Kozlov <akozlov at openjdk.org> wrote:

>> This is a WIP for persisting various portions of JVM memory from within the process, rather than leaving that up to the C/R engine (such as CRIU). In the future this will enable us to optimize loading (theoretically leading to faster startup), and to compress and encrypt the data.
>> 
>> At this moment the implementation of persisting thread stacks is in proof-of-concept shape, ~especially waking up the primordial thread includes some hacks. This could be improved by using a custom (global, ideally robust) futex instead of the internal futex used by `pthread_join`.~ Fix already implemented.
>> 
>> ~One of the concerns related to thread stacks is rseq used by glibc; without disabling it, CRIU would attempt to 'fix' the rseqs (see `fixup_thread_rseq`) and touch the unmapped memory. CRIU uses special ptrace commands to check the status; I am not aware whether it is possible to access this information from within the process using any public API.~ Solved. The JVM forks and the child ptraces the JVM, recording the rseq info. Once we have the info, we can unregister the rseq before checkpoint and register it again afterwards (here we have the advantage that we know the threads won't be in any critical section, as we're in a safepoint).
>> ~Currently this works with `/proc/sys/kernel/yama/ptrace_scope` set to `0`; we should make it work with `1` (default), too.~ Fixed.
>> 
>> Regarding the persistence implementation, currently we store the memory in multiple files; the first block (page-size aligned) contains some validation data and an index mapping memory addresses to file offsets. The way this is implemented requires the size of the index to be known before dumping memory. It might be more convenient (and portable, e.g. for network-based storage) to use a single file, and either keep the index in the 'fundamental' memory (C heap), put it at the end of the file, or write it to a separate index file.
>
> src/hotspot/os/linux/crac_linux.cpp line 578:
> 
>> 576:   while (persist_futex) {
>> 577:     syscall(SYS_futex, &persist_futex, FUTEX_WAIT_PRIVATE, 1, nullptr, nullptr, 0);
>> 578:   }
> 
> I'm not sure this guarantees that there are no stack accesses. I have seen the compiler simply spill values into the x86-64 red zone below SP: https://en.wikipedia.org/wiki/Red_zone_(computing).
> 
> To avoid dealing with these problems, does it make sense to leave a page or two around SP still mapped?

You're right, it is not a sufficient guarantee (though in practice it works well enough for the POC). I've checked the assembly and found that there were some stack accesses (though IMO they are not related to the red zone). Ideally I would rewrite this in assembly (which is not too difficult but requires arch-specific code); otherwise, estimating how far the accesses could go is guesswork.
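A minimal sketch of the suggestion to keep a page or two around SP mapped while the rest of the stack is released. It assumes a downward-growing stack described by hypothetical page-aligned bounds `lo`/`hi` and a hypothetical `margin_pages` parameter; `split_stack`, `Range` and `StackSplit` are illustrative names, not code from the PR. The kept window is the page containing SP plus the margin on each side, so with at least one margin page it also covers the 128-byte x86-64 red zone below SP. How large the margin has to be to cover every compiler-generated access is still the guesswork mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// Half-open address range [start, end); empty when start == end.
struct Range {
  uintptr_t start;
  uintptr_t end;
};

// Result of splitting a thread stack around the current SP.
struct StackSplit {
  Range below;  // may be released (e.g. munmap'ed) after persisting
  Range kept;   // stays mapped while the thread waits on the futex
  Range above;  // may be released after persisting
};

static uintptr_t align_down(uintptr_t v, uintptr_t page) { return v & ~(page - 1); }
static uintptr_t align_up(uintptr_t v, uintptr_t page) { return align_down(v + page - 1, page); }

// Hypothetical helper: keep the page containing `sp` plus `margin_pages` pages
// on each side mapped, clamped to the stack bounds [lo, hi).
StackSplit split_stack(uintptr_t lo, uintptr_t hi, uintptr_t sp,
                       uintptr_t page, uintptr_t margin_pages) {
  assert((page & (page - 1)) == 0);  // page size must be a power of two
  assert(lo % page == 0 && hi % page == 0 && lo <= sp && sp < hi);
  const uintptr_t margin = margin_pages * page;
  // Lower edge of the window: page containing SP, minus the margin, but not below lo.
  uintptr_t keep_lo = align_down(sp, page);
  keep_lo = (keep_lo - lo > margin) ? keep_lo - margin : lo;
  // Upper edge of the window: end of the page containing SP, plus the margin, but not above hi.
  uintptr_t keep_hi = align_up(sp + 1, page);
  keep_hi = (hi - keep_hi > margin) ? keep_hi + margin : hi;
  return StackSplit{{lo, keep_lo}, {keep_lo, keep_hi}, {keep_hi, hi}};
}
```

With a 4 KiB page and `margin_pages = 2`, five pages stay mapped in the common case; near either end of the stack the window is simply clamped and the corresponding released range becomes empty.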

> just by making a snapshot of the data and comparing current state and the snapshot

I'm not sure I follow here; the verification could check whether there's a correct index entry in the loader. Is that what you mean? Regrettably, AFAIK it is not possible to check whether a given memory range is mapped with certain access modes.
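To make the index-based check concrete, here is a minimal sketch of a page-aligned header holding validation data plus an address-to-file-offset index, and of a loader-side lookup that tells whether an address is covered by some entry. All names (`IndexEntry`, `PersistHeader`, `header_block_size`, `find_entry`) and the field layout are hypothetical; the actual format in this PR may differ. It also shows why the number of entries has to be known up front: it determines where the dumped data starts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical index entry: one persisted memory range and where it lives in the file.
struct IndexEntry {
  uintptr_t addr;    // start address of the range in the process
  size_t    size;    // length of the range in bytes
  uint64_t  offset;  // file offset of the range's data
};

// Hypothetical fixed part of the first block; the index entries follow it.
struct PersistHeader {
  uint64_t magic;        // validation data
  uint64_t num_entries;  // must be known before any memory is dumped
};

// Size of the first block, rounded up to a page boundary so that the
// dumped data following it stays page-aligned.
size_t header_block_size(size_t num_entries, size_t page) {
  size_t raw = sizeof(PersistHeader) + num_entries * sizeof(IndexEntry);
  return (raw + page - 1) / page * page;
}

// Returns the entry covering `addr`, or nullptr if there is none.
// `index` must be sorted by addr and its ranges must not overlap.
const IndexEntry* find_entry(const std::vector<IndexEntry>& index, uintptr_t addr) {
  assert(std::is_sorted(index.begin(), index.end(),
      [](const IndexEntry& x, const IndexEntry& y) { return x.addr < y.addr; }));
  // First entry starting after addr; the only candidate is the one before it.
  auto it = std::upper_bound(index.begin(), index.end(), addr,
      [](uintptr_t a, const IndexEntry& e) { return a < e.addr; });
  if (it == index.begin()) return nullptr;
  --it;
  return (addr - it->addr < it->size) ? &*it : nullptr;
}
```

A binary search keeps the check cheap even with many ranges; it only verifies that the address was persisted, not how it is currently mapped.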

> src/hotspot/share/runtime/crac.cpp line 361:
> 
>> 359:   MonitorLocker ml(PeriodicTask_lock, Mutex::_safepoint_check_flag);
>> 360:   WatcherThread::watcher_thread()->unpark();
>> 361: }
> 
> This looks like a part of #106

Yes, this PR includes those commits so that it can actually run. I wonder why GitHub still highlights them now that #106 is integrated and there shouldn't be any diff. I can rebase...

> src/hotspot/share/runtime/crac.cpp line 812:
> 
>> 810:     if (a->addr < b->addr) return -1;
>> 811:     if (a->addr > b->addr) return 1;
>> 812:     return 0;
> 
> Out of curiosity, why not `a->addr - b->addr`?

When you cast a 64-bit difference of addresses into a 32-bit `int`, the sign is not preserved (I think this just chops off the higher bits); the sort got pretty messed up by that. Maybe there's a bitwise hack, but `cmov`s are cheap enough: https://wonderfly.github.io/cs-basics/2020/10/03/comparing-uint64_t/
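A minimal sketch of the point above, using a hypothetical `Entry` type with an `addr` field as a stand-in for the entries sorted in `crac.cpp`. Narrowing `a->addr - b->addr` to `int` keeps only the low 32 bits on common compilers (the conversion is implementation-defined before C++20 and modular since), so addresses that differ by 2^31 or more can compare with the wrong sign, and addresses that differ by a multiple of 2^32 compare as equal. The `(x > y) - (x < y)` idiom is a common compact alternative to the explicit `if`s and never narrows the difference.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for the entries being sorted.
struct Entry {
  uintptr_t addr;
};

// Three-way comparison for qsort that never narrows the 64-bit difference:
// yields -1, 0 or 1, exactly like the explicit if-chain.
static int compare_by_addr(const void* p, const void* q) {
  const Entry* a = static_cast<const Entry*>(p);
  const Entry* b = static_cast<const Entry*>(q);
  return (a->addr > b->addr) - (a->addr < b->addr);
}

// Sorts entries in ascending address order.
void sort_by_addr(Entry* entries, std::size_t n) {
  assert(entries != nullptr || n == 0);
  std::qsort(entries, n, sizeof(Entry), compare_by_addr);
}
```

For example, with `addr` values `1` and `0x100000001`, the subtraction-based comparator would see a difference of exactly 2^32, which truncates to `0`, and treat the two ranges as equal; `compare_by_addr` correctly orders them.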

> src/hotspot/share/runtime/globals.hpp line 2025:
> 
>> 2023:   product(bool, CRPersistMemory, true, "Persist/load memory from within "   \
>> 2024:       "the VM rather than relying on C/R engine.")                          \
>> 2025:                                                                             \
> 
> I see some value in having CRPersistMemory as a JVM option, but for now the reason for it is not clear. For debugging, to be able to revert to the previous behavior, it should be DIAGNOSTIC. To highlight a potentially incomplete or bogus implementation, it should be EXPERIMENTAL. Without either attribute, it implies there are valid production use cases in which it would be beneficial not to use it. Is that really so? In which cases?

Right, I'll make it diagnostic. It will still let us benchmark the baseline: persisting and restoring via CRIU only.
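For reference, a sketch of what the diagnostic declaration could look like, assuming the flag-attribute syntax that `globals.hpp` uses in recent JDKs (JDK 16 and later), where `DIAGNOSTIC` is passed as an extra argument to `product`. The exact wording and placement in the crac repository may differ. Since diagnostic flags must be unlocked before they can be changed, the CRIU-only baseline would then be run with `-XX:+UnlockDiagnosticVMOptions -XX:-CRPersistMemory`.

```cpp
  product(bool, CRPersistMemory, true, DIAGNOSTIC,                          \
          "Persist/load memory from within the VM rather than relying "    \
          "on the C/R engine.")                                             \
```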

-------------

PR Review Comment: https://git.openjdk.org/crac/pull/95#discussion_r1314477640
PR Review Comment: https://git.openjdk.org/crac/pull/95#discussion_r1314484613
PR Review Comment: https://git.openjdk.org/crac/pull/95#discussion_r1314486227
PR Review Comment: https://git.openjdk.org/crac/pull/95#discussion_r1314492720
PR Review Comment: https://git.openjdk.org/crac/pull/95#discussion_r1314497154