From jkratochvil at openjdk.org Mon Oct 2 08:30:12 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Mon, 2 Oct 2023 08:30:12 GMT Subject: [crac] Integrated: Fix OSX x86_64 CRaC crash In-Reply-To: References: Message-ID: <9ypPg-9t85bx0UkaqMTL5YWn_zbXlYQwcvCX9QXgT6g=.64f550c1-fd37-4058-9f41-2a41d3049fe8@github.com> On Fri, 29 Sep 2023 10:12:50 GMT, Jan Kratochvil wrote: > - copy-paste the same code from src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp > > It has fixed 3 testcases but the last 4th failure looks to be unrelated to this problem. > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/jdk/jdk/crac/JarFileFactoryCacheTest/JarFileFactoryCacheTest.java > 1 1 0 0 > jtreg:test/jdk/jdk/crac/MXBean.java 1 1 0 0 > jtreg:test/jdk/jdk/crac/RefQueueTest.java 1 1 0 0 > jtreg:test/jdk/jdk/crac/recursiveCheckpoint/Test.java >>> 1 0 1 0 << > ============================== This pull request has now been integrated. Changeset: 4d1e7b9d Author: Jan Kratochvil Committer: Radim Vansa URL: https://git.openjdk.org/crac/commit/4d1e7b9d0bb8d399bb692ce477ac503f0bffe2ed Stats: 10 lines in 2 files changed: 9 ins; 0 del; 1 mod Fix OSX x86_64 CRaC crash Reviewed-by: rvansa ------------- PR: https://git.openjdk.org/crac/pull/121 From jkratochvil at openjdk.org Mon Oct 2 14:56:01 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Mon, 2 Oct 2023 14:56:01 GMT Subject: [crac] RFR: CRaC: Fix fds opened for logging [v2] In-Reply-To: References: Message-ID: > CRaC: Fix fds opened for logging Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: - AsyncLogWriter: add stop() and resume() - Merge branch 'crac' into crac-logfd - b229ea41: - 8f161825: - d8a454cb: - Merge branch 'crac' into crac-logfd - +testcase - CRaC: Fix fds opened for logging ------------- Changes: - all: https://git.openjdk.org/crac/pull/113/files - new: https://git.openjdk.org/crac/pull/113/files/17362702..39d38191 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=113&range=01 - incr: https://webrevs.openjdk.org/?repo=crac&pr=113&range=00-01 Stats: 153 lines in 16 files changed: 133 ins; 5 del; 15 mod Patch: https://git.openjdk.org/crac/pull/113.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/113/head:pull/113 PR: https://git.openjdk.org/crac/pull/113 From jkratochvil at openjdk.org Mon Oct 2 14:56:03 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Mon, 2 Oct 2023 14:56:03 GMT Subject: [crac] RFR: CRaC: Fix fds opened for logging In-Reply-To: References: Message-ID: On Tue, 19 Sep 2023 14:30:15 GMT, Jan Kratochvil wrote: > CRaC: Fix fds opened for logging I am not sure when to call those functions, whether in VM Thread or before... ------------- PR Comment: https://git.openjdk.org/crac/pull/113#issuecomment-1734977560 From rvansa at openjdk.org Mon Oct 2 14:56:04 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 2 Oct 2023 14:56:04 GMT Subject: [crac] RFR: CRaC: Fix fds opened for logging [v2] In-Reply-To: References: Message-ID: On Mon, 2 Oct 2023 14:49:56 GMT, Jan Kratochvil wrote: >> CRaC: Fix fds opened for logging > > Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: > > - AsyncLogWriter: add stop() and resume() > - Merge branch 'crac' into crac-logfd > - b229ea41: > - 8f161825: > - d8a454cb: > - Merge branch 'crac' into crac-logfd > - +testcase > - CRaC: Fix fds opened for logging src/hotspot/share/runtime/crac.cpp line 431: > 429: Universe::heap()->finish_collection(); > 430: > 431: AsyncLogWriter::instance()->flush(); As a non-java thread, AsyncLogWriter does not participate in the safepoint protocol. What happens if another non-java thread enqueues a message to be written while the output is closed? ------------- PR Review Comment: https://git.openjdk.org/crac/pull/113#discussion_r1337075009 From jkratochvil at openjdk.org Mon Oct 2 14:56:04 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Mon, 2 Oct 2023 14:56:04 GMT Subject: [crac] RFR: CRaC: Fix fds opened for logging [v2] In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 11:40:45 GMT, Radim Vansa wrote: >> Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - AsyncLogWriter: add stop() and resume() >> - Merge branch 'crac' into crac-logfd >> - b229ea41: >> - 8f161825: >> - d8a454cb: >> - Merge branch 'crac' into crac-logfd >> - +testcase >> - CRaC: Fix fds opened for logging > > src/hotspot/share/runtime/crac.cpp line 431: > >> 429: Universe::heap()->finish_collection(); >> 430: >> 431: AsyncLogWriter::instance()->flush(); > > As a non-java thread, AsyncLogWriter does not participate in the safepoint protocol. What happens if another non-java thread enqueues a message to be written while the output is closed? OK, I have to improve it, thanks. 
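One way to picture the concern raised in this review is the sketch below. It is not HotSpot's `AsyncLogWriter`; the class `SuspendableLogQueue` and all of its members are hypothetical names chosen for illustration. It assumes the writer holds the lock for the whole drain, including the write itself, so that once `suspend()` returns no write to a closed output can still be in flight, while producers can keep enqueuing.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an asynchronous log queue (not HotSpot code).
class SuspendableLogQueue {
 public:
  // Producers may enqueue at any time, also while output is suspended.
  void enqueue(std::string msg) {
    std::lock_guard<std::mutex> guard(_lock);
    _pending.push_back(std::move(msg));
  }

  // One writer iteration: moves pending messages into `sink` unless suspended.
  // The lock is held while writing, so suspend() cannot return mid-write.
  // Returns the number of messages written.
  std::size_t drain_once(std::vector<std::string>& sink) {
    std::lock_guard<std::mutex> guard(_lock);
    if (_suspended) {
      return 0;  // output may be closed; keep messages queued
    }
    std::size_t written = _pending.size();
    for (auto& m : _pending) {
      sink.push_back(std::move(m));
    }
    _pending.clear();
    return written;
  }

  // After this returns, no drain is in progress and none will write.
  void suspend() {
    std::lock_guard<std::mutex> guard(_lock);
    _suspended = true;
  }

  // Allows the writer to drain again, including messages queued meanwhile.
  void resume() {
    std::lock_guard<std::mutex> guard(_lock);
    _suspended = false;
  }

  std::size_t pending() const {
    std::lock_guard<std::mutex> guard(_lock);
    return _pending.size();
  }

 private:
  mutable std::mutex _lock;
  std::deque<std::string> _pending;
  bool _suspended = false;
};
```

Holding the lock during the write is a simplification; a writer that swaps queues and writes outside the lock would need an extra handshake so that the suspending thread waits for the in-progress batch.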
------------- PR Review Comment: https://git.openjdk.org/crac/pull/113#discussion_r1337256728 From jkratochvil at openjdk.org Mon Oct 2 14:56:05 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Mon, 2 Oct 2023 14:56:05 GMT Subject: [crac] RFR: CRaC: Fix fds opened for logging [v2] In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 13:56:50 GMT, Jan Kratochvil wrote: >> src/hotspot/share/runtime/crac.cpp line 431: >> >>> 429: Universe::heap()->finish_collection(); >>> 430: >>> 431: AsyncLogWriter::instance()->flush(); >> >> As a non-java thread, AsyncLogWriter does not participate in the safepoint protocol. What happens if another non-java thread enqueues a message to be written while the output is closed? > > OK, I have to improve it, thanks. I hope the stop() and resume() do fix it. But there is no testcase for that, do you want it? ------------- PR Review Comment: https://git.openjdk.org/crac/pull/113#discussion_r1342793215 From akozlov at openjdk.org Mon Oct 2 17:55:12 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Mon, 2 Oct 2023 17:55:12 GMT Subject: [crac] RFR: Close files opened by Decoder before checkpoint In-Reply-To: <6ZZkAT8Gst8w1pQ-y6FV8wm6Fystcs4Wpy0HzelKc-E=.2c029ec7-8ab7-4767-99a2-0feaab574510@github.com> References: <6ZZkAT8Gst8w1pQ-y6FV8wm6Fystcs4Wpy0HzelKc-E=.2c029ec7-8ab7-4767-99a2-0feaab574510@github.com> Message-ID: On Mon, 25 Sep 2023 13:08:24 GMT, Radim Vansa wrote: > Native memory tracking needs to resolve some addresses for stack unwinding and the decoders keep some shared library files open as a cache. Since the decoder instances can be re-allocated anytime this fix just destroys them before a checkpoint. src/hotspot/share/utilities/decoder.cpp line 134: > 132: delete _error_handler_decoder; > 133: _error_handler_decoder = nullptr; > 134: } Error handler appears only during fatal error reporting. 
We can drop the handling and syncing with the errored thread, as a fatal error will follow anyway after we find the error handler. So this can be guarantee(_error_handler_decoder == nullptr) ------------- PR Review Comment: https://git.openjdk.org/crac/pull/116#discussion_r1342986789 From akozlov at openjdk.org Mon Oct 2 17:57:29 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Mon, 2 Oct 2023 17:57:29 GMT Subject: [crac] RFR: Drop perfdata and cppath In-Reply-To: References: Message-ID: <8dTVuXOkdCnAogpVI26448Yn8i7gC7ViG9N5e9qnnWg=.d3c681d8-19ef-4c68-a883-516aec32f6a4@github.com> On Wed, 27 Sep 2023 20:40:53 GMT, Radim Vansa wrote: > Storing PerfMemory contents into file might be fragile; if the file is corrupted (e.g. due to wrong permissions) JVM might receive SIGBUS when updating performance counters after restore. > This commit provides alternate solution, moving the shared file-mapped memory into private anonymous memory during C/R. test/jdk/jdk/crac/PerfMemoryRestoreTest.java line 66: > 64: Thread.sleep(10); > 65: } > 66: // Note: we need to check the checkpoint.pid(), which should be restored (when using CRIU), Could you also check that jcmd works after restore? ------------- PR Review Comment: https://git.openjdk.org/crac/pull/119#discussion_r1342988896 From rvansa at openjdk.org Mon Oct 2 18:37:12 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 2 Oct 2023 18:37:12 GMT Subject: [crac] RFR: CRaC: Fix fds opened for logging [v2] In-Reply-To: References: Message-ID: On Mon, 2 Oct 2023 14:48:48 GMT, Jan Kratochvil wrote: >> OK, I have to improve it, thanks. > > I hope the stop() and resume() do fix it. But there is no testcase for that, do you want it? I guess that a testcase for the synchronization would be quite difficult to do, as you'd be trying to simulate a race. 
However, I am not convinced the `stop()` and `resume()` work correctly: the log writer does not need to own the lock while writing messages to the (potentially nulled) outputs; you flush the buffer and then acquire the lock, effectively blocking all log producers from enqueuing further messages. However if someone enqueues a message right after flush, the log writer has a chance to dequeue it (actually swap queues) and even if you acquire lock then, continue writing these messages to the closed output. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/113#discussion_r1343024529 From rvansa at openjdk.org Mon Oct 2 18:59:40 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 2 Oct 2023 18:59:40 GMT Subject: [crac] RFR: Close files opened by Decoder before checkpoint [v2] In-Reply-To: <6ZZkAT8Gst8w1pQ-y6FV8wm6Fystcs4Wpy0HzelKc-E=.2c029ec7-8ab7-4767-99a2-0feaab574510@github.com> References: <6ZZkAT8Gst8w1pQ-y6FV8wm6Fystcs4Wpy0HzelKc-E=.2c029ec7-8ab7-4767-99a2-0feaab574510@github.com> Message-ID: <6S30weJNZ0mIL0VExeOO1BQwq8xU9G5gZ_xiXM4mJ_E=.5e069f4b-789f-4fbf-a985-a13f270fd79a@github.com> > Native memory tracking needs to resolve some addresses for stack unwinding and the decoders keep some shared library files open as a cache. Since the decoder instances can be re-allocated anytime this fix just destroys them before a checkpoint. 
Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Assume error handler decoder is null during checkpoint ------------- Changes: - all: https://git.openjdk.org/crac/pull/116/files - new: https://git.openjdk.org/crac/pull/116/files/f2e99a2d..ee544fc8 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=116&range=01 - incr: https://webrevs.openjdk.org/?repo=crac&pr=116&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.org/crac/pull/116.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/116/head:pull/116 PR: https://git.openjdk.org/crac/pull/116 From rvansa at openjdk.org Mon Oct 2 18:59:41 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 2 Oct 2023 18:59:41 GMT Subject: [crac] RFR: Close files opened by Decoder before checkpoint In-Reply-To: <6ZZkAT8Gst8w1pQ-y6FV8wm6Fystcs4Wpy0HzelKc-E=.2c029ec7-8ab7-4767-99a2-0feaab574510@github.com> References: <6ZZkAT8Gst8w1pQ-y6FV8wm6Fystcs4Wpy0HzelKc-E=.2c029ec7-8ab7-4767-99a2-0feaab574510@github.com> Message-ID: On Mon, 25 Sep 2023 13:08:24 GMT, Radim Vansa wrote: > Native memory tracking needs to resolve some addresses for stack unwinding and the decoders keep some shared library files open as a cache. Since the decoder instances can be re-allocated anytime this fix just destroys them before a checkpoint. Agreed, updated. ------------- PR Comment: https://git.openjdk.org/crac/pull/116#issuecomment-1743577342 From rvansa at openjdk.org Mon Oct 2 19:51:51 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 2 Oct 2023 19:51:51 GMT Subject: [crac] RFR: Drop perfdata and cppath [v2] In-Reply-To: References: Message-ID: > Storing PerfMemory contents into file might be fragile; if the file is corrupted (e.g. due to wrong permissions) JVM might receive SIGBUS when updating performance counters after restore. 
> This commit provides alternate solution, moving the shared file-mapped memory into private anonymous memory during C/R. Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Add jcmd PerfCounter.print to the test ------------- Changes: - all: https://git.openjdk.org/crac/pull/119/files - new: https://git.openjdk.org/crac/pull/119/files/71e54c25..bd6adcc8 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=119&range=01 - incr: https://webrevs.openjdk.org/?repo=crac&pr=119&range=00-01 Stats: 13 lines in 2 files changed: 10 ins; 0 del; 3 mod Patch: https://git.openjdk.org/crac/pull/119.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/119/head:pull/119 PR: https://git.openjdk.org/crac/pull/119 From rvansa at openjdk.org Mon Oct 2 19:51:53 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 2 Oct 2023 19:51:53 GMT Subject: [crac] RFR: Drop perfdata and cppath In-Reply-To: References: Message-ID: On Wed, 27 Sep 2023 20:40:53 GMT, Radim Vansa wrote: > Storing PerfMemory contents into file might be fragile; if the file is corrupted (e.g. due to wrong permissions) JVM might receive SIGBUS when updating performance counters after restore. > This commit provides alternate solution, moving the shared file-mapped memory into private anonymous memory during C/R. Added `jcmd PerfCounter.print` as requested. ------------- PR Comment: https://git.openjdk.org/crac/pull/119#issuecomment-1743637518 From rvansa at openjdk.org Tue Oct 3 07:11:55 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Tue, 3 Oct 2023 07:11:55 GMT Subject: [crac] RFR: CRaC: Fix fds opened for logging [v2] In-Reply-To: References: Message-ID: On Mon, 2 Oct 2023 14:56:01 GMT, Jan Kratochvil wrote: >> CRaC: Fix fds opened for logging > > Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: > > - AsyncLogWriter: add stop() and resume() > - Merge branch 'crac' into crac-logfd > - b229ea41: > - 8f161825: > - d8a454cb: > - Merge branch 'crac' into crac-logfd > - +testcase > - CRaC: Fix fds opened for logging About when to call them: IMO the more you do outside the VM thread the better. OpenJDK CRaC has gone the way where we perform most of the C/R handling in parallel with running the application, as opposed to the approach in OpenJ9 where even the Java callbacks (Resource.beforeCheckpoint/afterRestore) are performed in a single-threaded mode (which has its own advantage in a reduced need for complicated synchronization), so it is better to keep the philosophy consistent. ------------- PR Comment: https://git.openjdk.org/crac/pull/113#issuecomment-1744311006 From rvansa at openjdk.org Tue Oct 3 07:17:12 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Tue, 3 Oct 2023 07:17:12 GMT Subject: [crac] RFR: Persist memory in-JVM [v7] In-Reply-To: References: Message-ID: > This is a WIP for persisting various portions of JVM memory from within the process, rather than leaving that up to C/R engine (such as CRIU). In the future this will enable us to optimize loading (theoretically leading to faster startup), compress and encrypt the data. > > At this moment the implementation of persisting thread stacks is in proof-of-concept shape, ~especially waking up the primordial thread includes some hacks. This could be improved by using a custom (global, ideally robust) futex instead of the internal futex used by `pthread_join`.~ Fix already implemented. > > ~One of the concerns related to thread stacks is rseq used by glibc; without disabling this CRIU would attempt to 'fix' the rseqs (see `fixup_thread_rseq`) and touch the unmapped memory. 
CRIU uses special ptrace commands to check the status; I am not aware if it is possible to access this information from within the process using any public API.~ Solved. The JVM forks and the child ptraces JVM, recording the rseq info. Once we have the we can unregister the rseq before checkpoint and register it afterwards (here we have the advantage that we know the threads won't be in any critical section as we're in a safepoint). > ~Currently this works with `/proc/sys/kernel/yama/ptrace_scope` set to `0`; we should make it work with `1` (default), too.~ Fixed. > > Regarding persistence implementation, currently we store the memory in multiple files; first block (page-size aligned) contains some validation data and index of memory address - file offsets. The way this is implemented requires the size of index to be known before dumping memory. It might be more convenient (and portable for e.g. network-based storage) to use single file, and either keep the index in the 'fundamental' memory (C heap), put it at the end of file or to another index file. Radim Vansa has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: - Merge remote-tracking branch 'origin/crac' into persist_memory - Fix reloading memory when checkpoint fails - Fix compilation on different OSes - Improve aarch64 assembly - Move MemoryPersister impl to own file - Merge branch 'crac' into persist_memory - Backport of API from future changes for other persistent memory features - Another assembly fix - Don't fork when we're not unregistering rseq - Fix assembly loop - ... 
and 27 more: https://git.openjdk.org/crac/compare/4d1e7b9d...fea0155a ------------- Changes: https://git.openjdk.org/crac/pull/95/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=95&range=06 Stats: 1195 lines in 33 files changed: 1164 ins; 12 del; 19 mod Patch: https://git.openjdk.org/crac/pull/95.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/95/head:pull/95 PR: https://git.openjdk.org/crac/pull/95 From rvansa at openjdk.org Wed Oct 4 12:54:42 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Wed, 4 Oct 2023 12:54:42 GMT Subject: [crac] RFR: Persist memory in-JVM [v8] In-Reply-To: References: Message-ID: > This is a WIP for persisting various portions of JVM memory from within the process, rather than leaving that up to C/R engine (such as CRIU). In the future this will enable us to optimize loading (theoretically leading to faster startup), compress and encrypt the data. > > At this moment the implementation of persisting thread stacks is in proof-of-concept shape, ~especially waking up the primordial thread includes some hacks. This could be improved by using a custom (global, ideally robust) futex instead of the internal futex used by `pthread_join`.~ Fix already implemented. > > ~One of the concerns related to thread stacks is rseq used by glibc; without disabling this CRIU would attempt to 'fix' the rseqs (see `fixup_thread_rseq`) and touch the unmapped memory. CRIU uses special ptrace commands to check the status; I am not aware if it is possible to access this information from within the process using any public API.~ Solved. The JVM forks and the child ptraces JVM, recording the rseq info. Once we have the we can unregister the rseq before checkpoint and register it afterwards (here we have the advantage that we know the threads won't be in any critical section as we're in a safepoint). > ~Currently this works with `/proc/sys/kernel/yama/ptrace_scope` set to `0`; we should make it work with `1` (default), too.~ Fixed. 
> > Regarding persistence implementation, currently we store the memory in multiple files; first block (page-size aligned) contains some validation data and index of memory address - file offsets. The way this is implemented requires the size of index to be known before dumping memory. It might be more convenient (and portable for e.g. network-based storage) to use single file, and either keep the index in the 'fundamental' memory (C heap), put it at the end of file or to another index file. Radim Vansa has updated the pull request incrementally with six additional commits since the last revision: - Fix x86 build - Use number of threads directly - Don't persist CodeCache used for non-nMethods The stubs allocated in this heap are used for atomic operations on aarch64, avoiding them before the memory is restored would be complicated. - Fix close() on windows - Add overrides (OSX build fix) - Fix recursiveCheckpoint ------------- Changes: - all: https://git.openjdk.org/crac/pull/95/files - new: https://git.openjdk.org/crac/pull/95/files/fea0155a..78dfa7cb Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=95&range=07 - incr: https://webrevs.openjdk.org/?repo=crac&pr=95&range=06-07 Stats: 203 lines in 9 files changed: 128 ins; 40 del; 35 mod Patch: https://git.openjdk.org/crac/pull/95.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/95/head:pull/95 PR: https://git.openjdk.org/crac/pull/95 From jkratochvil at openjdk.org Wed Oct 4 14:00:56 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Wed, 4 Oct 2023 14:00:56 GMT Subject: [crac] RFR: CRaC: Fix fds opened for logging [v3] In-Reply-To: References: Message-ID: > CRaC: Fix fds opened for logging Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: Fix a race by new _block_async. 
- bugreported by Radim Vansa ------------- Changes: - all: https://git.openjdk.org/crac/pull/113/files - new: https://git.openjdk.org/crac/pull/113/files/39d38191..a4666eb4 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=113&range=02 - incr: https://webrevs.openjdk.org/?repo=crac&pr=113&range=01-02 Stats: 9 lines in 3 files changed: 5 ins; 1 del; 3 mod Patch: https://git.openjdk.org/crac/pull/113.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/113/head:pull/113 PR: https://git.openjdk.org/crac/pull/113 From jkratochvil at openjdk.org Wed Oct 4 14:14:38 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Wed, 4 Oct 2023 14:14:38 GMT Subject: [crac] RFR: CRaC: Fix fds opened for logging [v4] In-Reply-To: References: Message-ID: > CRaC: Fix fds opened for logging Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: Move log suspension out of VM thread. ------------- Changes: - all: https://git.openjdk.org/crac/pull/113/files - new: https://git.openjdk.org/crac/pull/113/files/a4666eb4..58fd093b Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=113&range=03 - incr: https://webrevs.openjdk.org/?repo=crac&pr=113&range=02-03 Stats: 22 lines in 1 file changed: 11 ins; 11 del; 0 mod Patch: https://git.openjdk.org/crac/pull/113.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/113/head:pull/113 PR: https://git.openjdk.org/crac/pull/113 From rvansa at openjdk.org Thu Oct 5 06:56:44 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Thu, 5 Oct 2023 06:56:44 GMT Subject: [crac] RFR: CRaC: Fix fds opened for logging [v4] In-Reply-To: References: Message-ID: On Wed, 4 Oct 2023 14:14:38 GMT, Jan Kratochvil wrote: >> CRaC: Fix fds opened for logging > > Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: > > Move log suspension out of VM thread. 
src/hotspot/share/runtime/crac.cpp line 439: > 437: } > 438: if (cr.ok()) { > 439: LogConfiguration::reopen(); Looks like the log stays closed & blocked if the checkpoint fails? ------------- PR Review Comment: https://git.openjdk.org/crac/pull/113#discussion_r1346869978 From jkratochvil at openjdk.org Thu Oct 5 07:31:30 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Thu, 5 Oct 2023 07:31:30 GMT Subject: [crac] RFR: CRaC: Fix fds opened for logging [v5] In-Reply-To: References: Message-ID: > CRaC: Fix fds opened for logging Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: Do not keep logs closed if snapshot failed - bugreported by Radim Vansa ------------- Changes: - all: https://git.openjdk.org/crac/pull/113/files - new: https://git.openjdk.org/crac/pull/113/files/58fd093b..b941490f Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=113&range=04 - incr: https://webrevs.openjdk.org/?repo=crac&pr=113&range=03-04 Stats: 11 lines in 1 file changed: 6 ins; 5 del; 0 mod Patch: https://git.openjdk.org/crac/pull/113.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/113/head:pull/113 PR: https://git.openjdk.org/crac/pull/113 From jkratochvil at openjdk.org Thu Oct 5 07:38:43 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Thu, 5 Oct 2023 07:38:43 GMT Subject: [crac] RFR: CRaC: Fix fds opened for logging [v5] In-Reply-To: References: Message-ID: On Mon, 2 Oct 2023 18:34:50 GMT, Radim Vansa wrote: >> I hope the stop() and resume() do fix it. But there is no testcase for that, do you want it? > > I guess that a testcase for the synchronization would be quite difficult to do, as you'd be trying to simulate a race. 
> > However, I am not convinced the `stop()` and `resume()` work correctly: the log writer does not need to own the lock while writing messages to the (potentially nulled) outputs; you flush the buffer and then acquire the lock, effectively blocking all log producers from enqueuing further messages. However if someone enqueues a message right after flush, the log writer has a chance to dequeue it (actually swap queues) and even if you acquire lock then, continue writing these messages to the closed output. Race reproducers are also a part of testsuites - [ptrace-testsuite](https://sourceware.org/systemtap/wiki/utrace/tests) limits their run by its `TESTTIME` parameter. I have not tried how easy it would be to reproduce this race, and I am happy not to write such a testcase. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/113#discussion_r1346938564 From jkratochvil at openjdk.org Thu Oct 5 07:38:43 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Thu, 5 Oct 2023 07:38:43 GMT Subject: [crac] RFR: CRaC: Fix fds opened for logging [v4] In-Reply-To: References: Message-ID: On Thu, 5 Oct 2023 06:40:32 GMT, Radim Vansa wrote: >> Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: >> >> Move log suspension out of VM thread. > > src/hotspot/share/runtime/crac.cpp line 439: > >> 437: } >> 438: if (cr.ok()) { >> 439: LogConfiguration::reopen(); > > Looks like the log stays closed & blocked if the checkpoint fails? Hopefully it is fixed now? 
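The failure path questioned in this review (logs staying closed when the checkpoint fails) can be illustrated with a scope guard that reopens the logs on every exit path. This is only a sketch under stated assumptions: `ScopedSuspend` and `checkpoint_with_logs_closed` are hypothetical names, the callbacks stand in for whatever closes and reopens the log outputs, and nothing here claims to be how the PR implements the fix.

```cpp
#include <functional>
#include <utility>

// Hypothetical scope guard: calls `suspend` on construction and `resume`
// on destruction, so the resume step runs on success and on failure alike.
class ScopedSuspend {
 public:
  ScopedSuspend(const std::function<void()>& suspend,
                std::function<void()> resume)
      : _resume(std::move(resume)) {
    suspend();
  }
  ~ScopedSuspend() { _resume(); }

  ScopedSuspend(const ScopedSuspend&) = delete;
  ScopedSuspend& operator=(const ScopedSuspend&) = delete;

 private:
  std::function<void()> _resume;
};

// Hypothetical wrapper: `do_checkpoint` returns false when the checkpoint
// fails. The logs are reopened in both cases because the guard's destructor
// runs when the function returns.
bool checkpoint_with_logs_closed(const std::function<bool()>& do_checkpoint,
                                 const std::function<void()>& close_logs,
                                 std::function<void()> reopen_logs) {
  ScopedSuspend guard(close_logs, std::move(reopen_logs));
  return do_checkpoint();
}
```

The design choice is to tie the reopen step to scope exit rather than to a success check, which removes the branch where the outputs could be left closed.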
------------- PR Review Comment: https://git.openjdk.org/crac/pull/113#discussion_r1346938993 From rvansa at openjdk.org Thu Oct 5 14:21:08 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Thu, 5 Oct 2023 14:21:08 GMT Subject: [crac] RFR: Persist memory in-JVM [v8] In-Reply-To: References: Message-ID: <8gEJkR4zP6Vxfa7seGhWkBIXpPM_lUZQr3F4It5f0kA=.d53b18aa-11b0-46ec-be26-213183d3dc38@github.com> On Wed, 4 Oct 2023 12:54:42 GMT, Radim Vansa wrote: >> This is a WIP for persisting various portions of JVM memory from within the process, rather than leaving that up to C/R engine (such as CRIU). In the future this will enable us to optimize loading (theoretically leading to faster startup), compress and encrypt the data. >> >> At this moment the implementation of persisting thread stacks is in proof-of-concept shape, ~especially waking up the primordial thread includes some hacks. This could be improved by using a custom (global, ideally robust) futex instead of the internal futex used by `pthread_join`.~ Fix already implemented. >> >> ~One of the concerns related to thread stacks is rseq used by glibc; without disabling this CRIU would attempt to 'fix' the rseqs (see `fixup_thread_rseq`) and touch the unmapped memory. CRIU uses special ptrace commands to check the status; I am not aware if it is possible to access this information from within the process using any public API.~ Solved. The JVM forks and the child ptraces JVM, recording the rseq info. Once we have the we can unregister the rseq before checkpoint and register it afterwards (here we have the advantage that we know the threads won't be in any critical section as we're in a safepoint). >> ~Currently this works with `/proc/sys/kernel/yama/ptrace_scope` set to `0`; we should make it work with `1` (default), too.~ Fixed. 
>> >> Regarding persistence implementation, currently we store the memory in multiple files; first block (page-size aligned) contains some validation data and index of memory address - file offsets. The way this is implemented requires the size of index to be known before dumping memory. It might be more convenient (and portable for e.g. network-based storage) to use single file, and either keep the index in the 'fundamental' memory (C heap), put it at the end of file or to another index file. > > Radim Vansa has updated the pull request incrementally with six additional commits since the last revision: > > - Fix x86 build > - Use number of threads directly > - Don't persist CodeCache used for non-nMethods > > The stubs allocated in this heap are used for atomic operations on > aarch64, avoiding them before the memory is restored would be complicated. > - Fix close() on windows > - Add overrides (OSX build fix) > - Fix recursiveCheckpoint After some attempts to avoid allocations when CodeCache is unmapped I resolved this by not persisting the non-nmethod part of CodeCache, as not being able to allocate/free can be difficult e.g. when we're receiving (arbitrary number of) new parameters on restore, and requires blocking all native threads. Another notable fix is for pauseengine/simengine - it's not possible to mmap the memory for these; it must just be read in. I've also fixed the x86 (ia32) build. There's a checkpoint issue (on Java side) which I'll file as a separate PR; with the fix I can verify ia32 is running fine. While we don't have to put too much effort into ia32 per se, testing 32 bit revealed some problems that could manifest on 64 bit and would be much harder to reproduce. The most important one (architecture-wise) was that the code can mmap new regions before the persisted memory is reloaded. Later on the loading code would silently overwrite those. This is now avoided by mmaping the regions recorded in index earlier on (e.g. 
before reading new parameters from shm) and mapping them with PROT_NONE; mmaps without `MAP_FIXED` won't acquire those. This problem is less likely to happen with more sparse 64 bit address space but on ia32 this happened reliably. ------------- PR Comment: https://git.openjdk.org/crac/pull/95#issuecomment-1748946554 From rvansa at openjdk.org Thu Oct 5 15:23:15 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Thu, 5 Oct 2023 15:23:15 GMT Subject: [crac] RFR: Persist memory in-JVM [v9] In-Reply-To: References: Message-ID: > This is a WIP for persisting various portions of JVM memory from within the process, rather than leaving that up to C/R engine (such as CRIU). In the future this will enable us to optimize loading (theoretically leading to faster startup), compress and encrypt the data. > > At this moment the implementation of persisting thread stacks is in proof-of-concept shape, ~especially waking up the primordial thread includes some hacks. This could be improved by using a custom (global, ideally robust) futex instead of the internal futex used by `pthread_join`.~ Fix already implemented. > > ~One of the concerns related to thread stacks is rseq used by glibc; without disabling this CRIU would attempt to 'fix' the rseqs (see `fixup_thread_rseq`) and touch the unmapped memory. CRIU uses special ptrace commands to check the status; I am not aware if it is possible to access this information from within the process using any public API.~ Solved. The JVM forks and the child ptraces JVM, recording the rseq info. Once we have the we can unregister the rseq before checkpoint and register it afterwards (here we have the advantage that we know the threads won't be in any critical section as we're in a safepoint). > ~Currently this works with `/proc/sys/kernel/yama/ptrace_scope` set to `0`; we should make it work with `1` (default), too.~ Fixed. 
> > Regarding persistence implementation, currently we store the memory in multiple files; first block (page-size aligned) contains some validation data and index of memory address - file offsets. The way this is implemented requires the size of index to be known before dumping memory. It might be more convenient (and portable for e.g. network-based storage) to use single file, and either keep the index in the 'fundamental' memory (C heap), put it at the end of file or to another index file. Radim Vansa has updated the pull request incrementally with two additional commits since the last revision: - Fix build on non-linux - Fix issues manifesting on ia32 * reinit memory early during restore * use native size for fields in index * fix syscall passing too long pointer (64bit) on 32bit * fix various bugs in code ------------- Changes: - all: https://git.openjdk.org/crac/pull/95/files - new: https://git.openjdk.org/crac/pull/95/files/78dfa7cb..11b4a04e Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=95&range=08 - incr: https://webrevs.openjdk.org/?repo=crac&pr=95&range=07-08 Stats: 145 lines in 8 files changed: 66 ins; 19 del; 60 mod Patch: https://git.openjdk.org/crac/pull/95.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/95/head:pull/95 PR: https://git.openjdk.org/crac/pull/95 From jkratochvil at openjdk.org Thu Oct 5 15:24:27 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Thu, 5 Oct 2023 15:24:27 GMT Subject: [crac] RFR: CRaC: Fix fds opened for logging [v5] In-Reply-To: References: Message-ID: <-TcOVVdJt0YRbAadg_33l8mrDnOT5fabQ63AwF73jho=.d14eb1a7-7814-47ca-a39d-cb94a58e27b1@github.com> On Thu, 5 Oct 2023 07:31:30 GMT, Jan Kratochvil wrote: >> CRaC: Fix fds opened for logging > > Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: > > Do not keep logs closed if snapshot failed > - bugreported by Radim Vansa There is a GHA error... 
------------- PR Comment: https://git.openjdk.org/crac/pull/113#issuecomment-1748626556 From jkratochvil at openjdk.org Thu Oct 5 15:24:23 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Thu, 5 Oct 2023 15:24:23 GMT Subject: [crac] RFR: CRaC: Fix fds opened for logging [v6] In-Reply-To: References: Message-ID: > CRaC: Fix fds opened for logging Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: Attempt to solve a compilation error from GHA: /usr/bin/ld: build/linux-x64/hotspot/variant-zero/libjvm/objs/crac.o: in function `AsyncLogWriter::stop()': src/hotspot/share/logging/logAsyncWriter.hpp:202: undefined reference to `PlatformMutex::lock()' /usr/bin/ld: build/linux-x64/hotspot/variant-zero/libjvm/objs/crac.o: in function `AsyncLogWriter::resume()': src/hotspot/share/logging/logAsyncWriter.hpp:203: undefined reference to `PlatformMutex::unlock()' /usr/bin/ld: build/linux-x64/hotspot/variant-zero/libjvm/objs/crac.o: in function `AsyncLogWriter::stop()': src/hotspot/share/logging/logAsyncWriter.hpp:202: undefined reference to `PlatformMutex::lock()' /usr/bin/ld: build/linux-x64/hotspot/variant-zero/libjvm/objs/crac.o: in function `AsyncLogWriter::resume()': src/hotspot/share/logging/logAsyncWriter.hpp:203: undefined reference to `PlatformMutex::unlock()' ------------- Changes: - all: https://git.openjdk.org/crac/pull/113/files - new: https://git.openjdk.org/crac/pull/113/files/b941490f..a79974a4 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=113&range=05 - incr: https://webrevs.openjdk.org/?repo=crac&pr=113&range=04-05 Stats: 11 lines in 2 files changed: 9 ins; 0 del; 2 mod Patch: https://git.openjdk.org/crac/pull/113.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/113/head:pull/113 PR: https://git.openjdk.org/crac/pull/113 From rvansa at openjdk.org Thu Oct 5 15:24:26 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Thu, 5 Oct 2023 15:24:26 GMT Subject: [crac] RFR: 
CRaC: Fix fds opened for logging [v5] In-Reply-To: References: Message-ID: On Thu, 5 Oct 2023 07:31:30 GMT, Jan Kratochvil wrote: >> CRaC: Fix fds opened for logging > > Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: > > Do not keep logs closed if snapshot failed > - bugreported by Radim Vansa LGTM, thank you! ------------- Marked as reviewed by rvansa (Committer). PR Review: https://git.openjdk.org/crac/pull/113#pullrequestreview-1659683801 From jkratochvil at openjdk.org Thu Oct 5 15:40:28 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Thu, 5 Oct 2023 15:40:28 GMT Subject: [crac] RFR: CRaC: Fix fds opened for logging [v6] In-Reply-To: References: Message-ID: On Thu, 5 Oct 2023 15:24:23 GMT, Jan Kratochvil wrote: >> CRaC: Fix fds opened for logging > > Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: > > Attempt to solve a compilation error from GHA: > /usr/bin/ld: build/linux-x64/hotspot/variant-zero/libjvm/objs/crac.o: in function `AsyncLogWriter::stop()': > src/hotspot/share/logging/logAsyncWriter.hpp:202: undefined reference to `PlatformMutex::lock()' > /usr/bin/ld: build/linux-x64/hotspot/variant-zero/libjvm/objs/crac.o: in function `AsyncLogWriter::resume()': > src/hotspot/share/logging/logAsyncWriter.hpp:203: undefined reference to `PlatformMutex::unlock()' > /usr/bin/ld: build/linux-x64/hotspot/variant-zero/libjvm/objs/crac.o: in function `AsyncLogWriter::stop()': > src/hotspot/share/logging/logAsyncWriter.hpp:202: undefined reference to `PlatformMutex::lock()' > /usr/bin/ld: build/linux-x64/hotspot/variant-zero/libjvm/objs/crac.o: in function `AsyncLogWriter::resume()': > src/hotspot/share/logging/logAsyncWriter.hpp:203: undefined reference to `PlatformMutex::unlock()' Please do not sponsor it yet, maybe there is really some OSX compilation regression. 
------------- PR Comment: https://git.openjdk.org/crac/pull/113#issuecomment-1749153061 From jkratochvil at openjdk.org Thu Oct 5 15:47:27 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Thu, 5 Oct 2023 15:47:27 GMT Subject: [crac] RFR: Fix compilation with clang Message-ID: ../../src/hotspot/os/linux/crac_linux.cpp:192:13: error: using the result of an assignment as a condition without parentheses [-Werror,-Wparentheses] while (dp = readdir(dir)) { ~~~^~~~~~~~~~~~~~ ../../src/hotspot/os/linux/crac_linux.cpp:192:13: note: place parentheses around the assignment to silence this warning while (dp = readdir(dir)) { ^ ( ) ../../src/hotspot/os/linux/crac_linux.cpp:192:13: note: use '==' to turn this assignment into an equality comparison while (dp = readdir(dir)) { ^ == ../../src/hotspot/os/linux/crac_linux.cpp:402:13: error: using the result of an assignment as a condition without parentheses [-Werror,-Wparentheses] while (dp = readdir(dir)) { ~~~^~~~~~~~~~~~~~ ../../src/hotspot/os/linux/crac_linux.cpp:402:13: note: place parentheses around the assignment to silence this warning while (dp = readdir(dir)) { ^ ( ) ../../src/hotspot/os/linux/crac_linux.cpp:402:13: note: use '==' to turn this assignment into an equality comparison while (dp = readdir(dir)) { ^ == 2 errors generated. 
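The clang diagnostic above fires because an assignment is used directly as a loop condition. A minimal sketch of a `readdir` loop that builds under `-Werror,-Wparentheses` follows; the helper name `count_dir_entries` is made up for illustration and does not come from `crac_linux.cpp`, and the change made in the PR itself may be written differently.

```cpp
#include <cassert>
#include <cstring>
#include <dirent.h>

// Hypothetical helper (not from crac_linux.cpp): counts the entries of a
// directory, skipping "." and "..". Returns -1 if the directory cannot be
// opened.
static int count_dir_entries(const char* path) {
  assert(path != nullptr);
  DIR* dir = opendir(path);
  if (dir == nullptr) {
    return -1;
  }
  int count = 0;
  struct dirent* dp;
  // The explicit comparison states the intent. The extra pair of
  // parentheses alone, while ((dp = readdir(dir))), is what the clang note
  // suggests and would silence the warning as well.
  while ((dp = readdir(dir)) != nullptr) {
    if (strcmp(dp->d_name, ".") == 0 || strcmp(dp->d_name, "..") == 0) {
      continue;
    }
    count++;
  }
  closedir(dir);
  return count;
}
```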
------------- Commit messages: - Fix compilation with clang Changes: https://git.openjdk.org/crac/pull/122/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=122&range=00 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/crac/pull/122.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/122/head:pull/122 PR: https://git.openjdk.org/crac/pull/122 From jkratochvil at openjdk.org Thu Oct 5 16:23:45 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Thu, 5 Oct 2023 16:23:45 GMT Subject: [crac] RFR: CRaC: Fix fds opened for logging [v7] In-Reply-To: References: Message-ID: <6M2n-EYHwJrk68TvprpTLJQr-dHm50DrYD_kSbcjBbs=.e6772ac8-0c20-4e7a-8aa5-3ffbef814a56@github.com> > CRaC: Fix fds opened for logging Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: Fix clang compilation -Werror warnings ------------- Changes: - all: https://git.openjdk.org/crac/pull/113/files - new: https://git.openjdk.org/crac/pull/113/files/a79974a4..02a5fab3 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=113&range=06 - incr: https://webrevs.openjdk.org/?repo=crac&pr=113&range=05-06 Stats: 12 lines in 2 files changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.org/crac/pull/113.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/113/head:pull/113 PR: https://git.openjdk.org/crac/pull/113 From rvansa at openjdk.org Fri Oct 6 06:57:35 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Fri, 6 Oct 2023 06:57:35 GMT Subject: [crac] RFR: Fix compilation with clang In-Reply-To: References: Message-ID: On Thu, 5 Oct 2023 15:38:55 GMT, Jan Kratochvil wrote: > ../../src/hotspot/os/linux/crac_linux.cpp:192:13: error: using the result of an assignment as a condition without parentheses [-Werror,-Wparentheses] > while (dp = readdir(dir)) { > ~~~^~~~~~~~~~~~~~ > ../../src/hotspot/os/linux/crac_linux.cpp:192:13: note: place parentheses around the assignment to silence this 
warning > while (dp = readdir(dir)) { > ^ > ( ) > ../../src/hotspot/os/linux/crac_linux.cpp:192:13: note: use '==' to turn this assignment into an equality comparison > while (dp = readdir(dir)) { > ^ > == > ../../src/hotspot/os/linux/crac_linux.cpp:402:13: error: using the result of an assignment as a condition without parentheses [-Werror,-Wparentheses] > while (dp = readdir(dir)) { > ~~~^~~~~~~~~~~~~~ > ../../src/hotspot/os/linux/crac_linux.cpp:402:13: note: place parentheses around the assignment to silence this warning > while (dp = readdir(dir)) { > ^ > ( ) > ../../src/hotspot/os/linux/crac_linux.cpp:402:13: note: use '==' to turn this assignment into an equality comparison > while (dp = readdir(dir)) { > ^ > == > 2 errors generated. Marked as reviewed by rvansa (Committer). ------------- PR Review: https://git.openjdk.org/crac/pull/122#pullrequestreview-1661222184 From rvansa at openjdk.org Fri Oct 6 07:10:33 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Fri, 6 Oct 2023 07:10:33 GMT Subject: [crac] RFR: CRaC: Fix fds opened for logging [v7] In-Reply-To: <6M2n-EYHwJrk68TvprpTLJQr-dHm50DrYD_kSbcjBbs=.e6772ac8-0c20-4e7a-8aa5-3ffbef814a56@github.com> References: <6M2n-EYHwJrk68TvprpTLJQr-dHm50DrYD_kSbcjBbs=.e6772ac8-0c20-4e7a-8aa5-3ffbef814a56@github.com> Message-ID: On Thu, 5 Oct 2023 16:23:45 GMT, Jan Kratochvil wrote: >> CRaC: Fix fds opened for logging > > Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: > > Fix clang compilation -Werror warnings The build looks good, please set to integrate again. 
------------- PR Comment: https://git.openjdk.org/crac/pull/113#issuecomment-1750090216 From rvansa at openjdk.org Fri Oct 6 08:10:17 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Fri, 6 Oct 2023 08:10:17 GMT Subject: [crac] RFR: Persist memory in-JVM [v10] In-Reply-To: References: Message-ID: <-AAgO5ziLiuXJ7VISbH5rCCtPfDYMEHTI5Vx3EgNICI=.bbedf232-fa9c-4170-932e-a24d1429d0f5@github.com> > This is a WIP for persisting various portions of JVM memory from within the process, rather than leaving that up to C/R engine (such as CRIU). In the future this will enable us to optimize loading (theoretically leading to faster startup), compress and encrypt the data. > > At this moment the implementation of persisting thread stacks is in proof-of-concept shape, ~especially waking up the primordial thread includes some hacks. This could be improved by using a custom (global, ideally robust) futex instead of the internal futex used by `pthread_join`.~ Fix already implemented. > > ~One of the concerns related to thread stacks is rseq used by glibc; without disabling this CRIU would attempt to 'fix' the rseqs (see `fixup_thread_rseq`) and touch the unmapped memory. CRIU uses special ptrace commands to check the status; I am not aware if it is possible to access this information from within the process using any public API.~ Solved. The JVM forks and the child ptraces JVM, recording the rseq info. Once we have the we can unregister the rseq before checkpoint and register it afterwards (here we have the advantage that we know the threads won't be in any critical section as we're in a safepoint). > ~Currently this works with `/proc/sys/kernel/yama/ptrace_scope` set to `0`; we should make it work with `1` (default), too.~ Fixed. > > Regarding persistence implementation, currently we store the memory in multiple files; first block (page-size aligned) contains some validation data and index of memory address - file offsets. 
The way this is implemented requires the size of index to be known before dumping memory. It might be more convenient (and portable for e.g. network-based storage) to use single file, and either keep the index in the 'fundamental' memory (C heap), put it at the end of file or to another index file. Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Try fix other platforms ------------- Changes: - all: https://git.openjdk.org/crac/pull/95/files - new: https://git.openjdk.org/crac/pull/95/files/11b4a04e..37819a6b Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=95&range=09 - incr: https://webrevs.openjdk.org/?repo=crac&pr=95&range=08-09 Stats: 93 lines in 6 files changed: 31 ins; 45 del; 17 mod Patch: https://git.openjdk.org/crac/pull/95.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/95/head:pull/95 PR: https://git.openjdk.org/crac/pull/95 From jkratochvil at openjdk.org Fri Oct 6 11:15:34 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Fri, 6 Oct 2023 11:15:34 GMT Subject: [crac] Integrated: CRaC: Fix fds opened for logging In-Reply-To: References: Message-ID: On Tue, 19 Sep 2023 14:30:15 GMT, Jan Kratochvil wrote: > CRaC: Fix fds opened for logging This pull request has now been integrated. 
Changeset: dc16f1da Author: Jan Kratochvil Committer: Radim Vansa URL: https://git.openjdk.org/crac/commit/dc16f1da298f8f9162e3e8634d2aa7690e05e1c6 Stats: 183 lines in 11 files changed: 170 ins; 0 del; 13 mod CRaC: Fix fds opened for logging Reviewed-by: rvansa ------------- PR: https://git.openjdk.org/crac/pull/113 From jkratochvil at openjdk.org Fri Oct 6 11:15:28 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Fri, 6 Oct 2023 11:15:28 GMT Subject: [crac] Integrated: Fix compilation with clang In-Reply-To: References: Message-ID: On Thu, 5 Oct 2023 15:38:55 GMT, Jan Kratochvil wrote: > ../../src/hotspot/os/linux/crac_linux.cpp:192:13: error: using the result of an assignment as a condition without parentheses [-Werror,-Wparentheses] > while (dp = readdir(dir)) { > ~~~^~~~~~~~~~~~~~ > ../../src/hotspot/os/linux/crac_linux.cpp:192:13: note: place parentheses around the assignment to silence this warning > while (dp = readdir(dir)) { > ^ > ( ) > ../../src/hotspot/os/linux/crac_linux.cpp:192:13: note: use '==' to turn this assignment into an equality comparison > while (dp = readdir(dir)) { > ^ > == > ../../src/hotspot/os/linux/crac_linux.cpp:402:13: error: using the result of an assignment as a condition without parentheses [-Werror,-Wparentheses] > while (dp = readdir(dir)) { > ~~~^~~~~~~~~~~~~~ > ../../src/hotspot/os/linux/crac_linux.cpp:402:13: note: place parentheses around the assignment to silence this warning > while (dp = readdir(dir)) { > ^ > ( ) > ../../src/hotspot/os/linux/crac_linux.cpp:402:13: note: use '==' to turn this assignment into an equality comparison > while (dp = readdir(dir)) { > ^ > == > 2 errors generated. This pull request has now been integrated. 
Changeset: b6d39f11 Author: Jan Kratochvil Committer: Radim Vansa URL: https://git.openjdk.org/crac/commit/b6d39f11bc216c097dcc255a37e859b6c8d01144 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Fix compilation with clang Reviewed-by: rvansa ------------- PR: https://git.openjdk.org/crac/pull/122 From rvansa at openjdk.org Fri Oct 6 11:23:00 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Fri, 6 Oct 2023 11:23:00 GMT Subject: [crac] RFR: Persist memory in-JVM [v11] In-Reply-To: References: Message-ID: <38x5Pn_mYikGvIabE9kPiNkEPGbZYu62uEXIVDEso7Y=.4a8bb200-5f39-4ef9-aa36-f41ea36fe6e3@github.com> > This is a WIP for persisting various portions of JVM memory from within the process, rather than leaving that up to C/R engine (such as CRIU). In the future this will enable us to optimize loading (theoretically leading to faster startup), compress and encrypt the data. > > At this moment the implementation of persisting thread stacks is in proof-of-concept shape, ~especially waking up the primordial thread includes some hacks. This could be improved by using a custom (global, ideally robust) futex instead of the internal futex used by `pthread_join`.~ Fix already implemented. > > ~One of the concerns related to thread stacks is rseq used by glibc; without disabling this CRIU would attempt to 'fix' the rseqs (see `fixup_thread_rseq`) and touch the unmapped memory. CRIU uses special ptrace commands to check the status; I am not aware if it is possible to access this information from within the process using any public API.~ Solved. The JVM forks and the child ptraces JVM, recording the rseq info. Once we have the we can unregister the rseq before checkpoint and register it afterwards (here we have the advantage that we know the threads won't be in any critical section as we're in a safepoint). > ~Currently this works with `/proc/sys/kernel/yama/ptrace_scope` set to `0`; we should make it work with `1` (default), too.~ Fixed. 
> > Regarding persistence implementation, currently we store the memory in multiple files; first block (page-size aligned) contains some validation data and index of memory address - file offsets. The way this is implemented requires the size of index to be known before dumping memory. It might be more convenient (and portable for e.g. network-based storage) to use single file, and either keep the index in the 'fundamental' memory (C heap), put it at the end of file or to another index file. Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Another attempt to fix Win & OSX ------------- Changes: - all: https://git.openjdk.org/crac/pull/95/files - new: https://git.openjdk.org/crac/pull/95/files/37819a6b..be580c5f Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=95&range=10 - incr: https://webrevs.openjdk.org/?repo=crac&pr=95&range=09-10 Stats: 13 lines in 2 files changed: 11 ins; 1 del; 1 mod Patch: https://git.openjdk.org/crac/pull/95.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/95/head:pull/95 PR: https://git.openjdk.org/crac/pull/95 From rvansa at openjdk.org Fri Oct 6 13:20:32 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Fri, 6 Oct 2023 13:20:32 GMT Subject: [crac] RFR: Fix OSX failure in SunMiscSignalTest Message-ID: Fixes GHA failure in SunMiscSignalTest on OSX ------------- Commit messages: - Fix OSX failure in SunMiscSignalTest Changes: https://git.openjdk.org/crac/pull/123/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=123&range=00 Stats: 6 lines in 2 files changed: 4 ins; 2 del; 0 mod Patch: https://git.openjdk.org/crac/pull/123.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/123/head:pull/123 PR: https://git.openjdk.org/crac/pull/123 From rvansa at openjdk.org Fri Oct 6 13:23:13 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Fri, 6 Oct 2023 13:23:13 GMT Subject: [crac] RFR: Handle C/R in BasicImageReader (via reflection) Message-ID: Fixes 
checkpoint failure due to open FD to modules on x86 (the Java code behaves differently on 32bit and 64bit). ------------- Commit messages: - Handle C/R in BasicImageReader (via reflection) Changes: https://git.openjdk.org/crac/pull/124/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=124&range=00 Stats: 112 lines in 1 file changed: 86 ins; 22 del; 4 mod Patch: https://git.openjdk.org/crac/pull/124.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/124/head:pull/124 PR: https://git.openjdk.org/crac/pull/124 From rvansa at openjdk.org Fri Oct 6 13:24:56 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Fri, 6 Oct 2023 13:24:56 GMT Subject: [crac] RFR: Stabilize GHA testsuite Message-ID: <79rGmrItxYDFygbMRycCnfgunRHRtY2w9w9be9vFICU=.92f04e17-9266-41e8-9f60-5f81022b6132@github.com> At this point GitHub Actions always show some failures; let's ignore platforms that we don't handle for CRaC and some problematic tests in mainline JDK. ------------- Commit messages: - ProblemList test runtime/ClassInitErrors/TestStackOverflowDuringInit.java - Exclude s390, ppc64le and riscv64 from build Changes: https://git.openjdk.org/crac/pull/125/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=125&range=00 Stats: 20 lines in 2 files changed: 2 ins; 0 del; 18 mod Patch: https://git.openjdk.org/crac/pull/125.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/125/head:pull/125 PR: https://git.openjdk.org/crac/pull/125 From rvansa at openjdk.org Fri Oct 6 15:27:27 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Fri, 6 Oct 2023 15:27:27 GMT Subject: [crac] RFR: Persist memory in-JVM [v12] In-Reply-To: References: Message-ID: > This is a WIP for persisting various portions of JVM memory from within the process, rather than leaving that up to C/R engine (such as CRIU). In the future this will enable us to optimize loading (theoretically leading to faster startup), compress and encrypt the data.
> > At this moment the implementation of persisting thread stacks is in proof-of-concept shape, ~especially waking up the primordial thread includes some hacks. This could be improved by using a custom (global, ideally robust) futex instead of the internal futex used by `pthread_join`.~ Fix already implemented. > > ~One of the concerns related to thread stacks is rseq used by glibc; without disabling this CRIU would attempt to 'fix' the rseqs (see `fixup_thread_rseq`) and touch the unmapped memory. CRIU uses special ptrace commands to check the status; I am not aware if it is possible to access this information from within the process using any public API.~ Solved. The JVM forks and the child ptraces JVM, recording the rseq info. Once we have the we can unregister the rseq before checkpoint and register it afterwards (here we have the advantage that we know the threads won't be in any critical section as we're in a safepoint). > ~Currently this works with `/proc/sys/kernel/yama/ptrace_scope` set to `0`; we should make it work with `1` (default), too.~ Fixed. > > Regarding persistence implementation, currently we store the memory in multiple files; first block (page-size aligned) contains some validation data and index of memory address - file offsets. The way this is implemented requires the size of index to be known before dumping memory. It might be more convenient (and portable for e.g. network-based storage) to use single file, and either keep the index in the 'fundamental' memory (C heap), put it at the end of file or to another index file. Radim Vansa has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 49 commits: - maybe fixup - Merge branch 'crac' into persist_memory - Another attempt to fix Win & OSX - Try fix other platforms - Fix build on non-linux - Fix issues manifesting on ia32 * reinit memory early during restore * use native size for fields in index * fix syscall passing too long pointer (64bit) on 32bit * fix various bugs in code - Fix x86 build - Use number of threads directly - Don't persist CodeCache used for non-nMethods The stubs allocated in this heap are used for atomic operations on aarch64, avoiding them before the memory is restored would be complicated. - Fix close() on windows - ... and 39 more: https://git.openjdk.org/crac/compare/b6d39f11...6a215b41 ------------- Changes: https://git.openjdk.org/crac/pull/95/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=95&range=11 Stats: 1368 lines in 37 files changed: 1312 ins; 29 del; 27 mod Patch: https://git.openjdk.org/crac/pull/95.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/95/head:pull/95 PR: https://git.openjdk.org/crac/pull/95 From akozlov at openjdk.org Thu Oct 12 14:23:51 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Thu, 12 Oct 2023 14:23:51 GMT Subject: [crac] RFR: Drop perfdata and cppath [v2] In-Reply-To: References: Message-ID: <-9ATcgIRdSIu5pNkyQcLe77SdRWc3z_9HznGH6IPXa4=.3bf77a09-065e-4be0-b7ae-58530f30c51a@github.com> On Mon, 2 Oct 2023 19:51:51 GMT, Radim Vansa wrote: >> Storing PerfMemory contents into file might be fragile; if the file is corrupted (e.g. due to wrong permissions) JVM might receive SIGBUS when updating performance counters after restore. >> This commit provides alternate solution, moving the shared file-mapped memory into private anonymous memory during C/R. 
> > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Add jcmd PerfCounter.print to the test It should be possible to remove static int checkpoint_fd = -1; from https://github.com/openjdk/crac/pull/119/files#diff-7313eb3d328797a7720fa1b2b73cd159934506593443e45534baad80cb1382b7R66 ------------- PR Review: https://git.openjdk.org/crac/pull/119#pullrequestreview-1674179076 From rvansa at openjdk.org Thu Oct 12 15:42:00 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Thu, 12 Oct 2023 15:42:00 GMT Subject: [crac] RFR: Drop perfdata and cppath [v3] In-Reply-To: References: Message-ID: > Storing PerfMemory contents into file might be fragile; if the file is corrupted (e.g. due to wrong permissions) JVM might receive SIGBUS when updating performance counters after restore. > This commit provides alternate solution, moving the shared file-mapped memory into private anonymous memory during C/R. Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Remove forgotten global variable ------------- Changes: - all: https://git.openjdk.org/crac/pull/119/files - new: https://git.openjdk.org/crac/pull/119/files/bd6adcc8..bc8cf838 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=119&range=02 - incr: https://webrevs.openjdk.org/?repo=crac&pr=119&range=01-02 Stats: 67 lines in 1 file changed: 36 ins; 9 del; 22 mod Patch: https://git.openjdk.org/crac/pull/119.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/119/head:pull/119 PR: https://git.openjdk.org/crac/pull/119 From rvansa at openjdk.org Thu Oct 12 15:47:27 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Thu, 12 Oct 2023 15:47:27 GMT Subject: [crac] RFR: Drop perfdata and cppath [v4] In-Reply-To: References: Message-ID: > Storing PerfMemory contents into file might be fragile; if the file is corrupted (e.g. 
due to wrong permissions) JVM might receive SIGBUS when updating performance counters after restore. > This commit provides alternate solution, moving the shared file-mapped memory into private anonymous memory during C/R. Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Fixup for previous commit ------------- Changes: - all: https://git.openjdk.org/crac/pull/119/files - new: https://git.openjdk.org/crac/pull/119/files/bc8cf838..79298fd2 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=119&range=03 - incr: https://webrevs.openjdk.org/?repo=crac&pr=119&range=02-03 Stats: 73 lines in 1 file changed: 15 ins; 43 del; 15 mod Patch: https://git.openjdk.org/crac/pull/119.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/119/head:pull/119 PR: https://git.openjdk.org/crac/pull/119 From rvansa at openjdk.org Fri Oct 13 12:28:20 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Fri, 13 Oct 2023 12:28:20 GMT Subject: [crac] RFR: Drop perfdata and cppath [v5] In-Reply-To: References: Message-ID: > Storing PerfMemory contents into file might be fragile; if the file is corrupted (e.g. due to wrong permissions) JVM might receive SIGBUS when updating performance counters after restore. > This commit provides alternate solution, moving the shared file-mapped memory into private anonymous memory during C/R. 
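The idea in the description above, moving a shared file-mapped region into private anonymous memory at the same address, might be sketched as follows. This is not the code from the PR: the helper name `replace_with_private_anonymous` is hypothetical, the sketch assumes Linux `mmap` semantics and a page-aligned region, and it leaves out whatever the JVM does to restore the file mapping afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/mman.h>

// Hypothetical helper (not from the PR): replaces whatever is mapped at
// [addr, addr + len) with private anonymous memory holding the same bytes.
// addr must be page-aligned. Returns true on success.
static bool replace_with_private_anonymous(void* addr, size_t len) {
  assert(addr != nullptr && len > 0);
  // 1. Save the current contents in a scratch mapping.
  void* scratch = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (scratch == MAP_FAILED) {
    return false;
  }
  memcpy(scratch, addr, len);
  // 2. Map anonymous memory over the old region. With MAP_FIXED the previous
  //    mapping at this address is discarded, so the file is no longer
  //    backing these pages.
  void* fresh = mmap(addr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
  if (fresh == MAP_FAILED) {
    munmap(scratch, len);
    return false;
  }
  // 3. Copy the saved contents back and drop the scratch mapping.
  memcpy(addr, scratch, len);
  munmap(scratch, len);
  return true;
}
```

Once the region is anonymous, a damaged or inaccessible backing file can no longer cause a SIGBUS when the counters in it are updated, which is the failure mode the description is concerned with.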
Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Fix NPE in some tests ------------- Changes: - all: https://git.openjdk.org/crac/pull/119/files - new: https://git.openjdk.org/crac/pull/119/files/79298fd2..da777fd5 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=119&range=04 - incr: https://webrevs.openjdk.org/?repo=crac&pr=119&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/crac/pull/119.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/119/head:pull/119 PR: https://git.openjdk.org/crac/pull/119 From rvansa at openjdk.org Fri Oct 13 14:31:17 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Fri, 13 Oct 2023 14:31:17 GMT Subject: [crac] RFR: Drop perfdata and cppath [v6] In-Reply-To: References: Message-ID: > Storing PerfMemory contents into file might be fragile; if the file is corrupted (e.g. due to wrong permissions) JVM might receive SIGBUS when updating performance counters after restore. > This commit provides alternate solution, moving the shared file-mapped memory into private anonymous memory during C/R. 
Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Add timeouts to the test ------------- Changes: - all: https://git.openjdk.org/crac/pull/119/files - new: https://git.openjdk.org/crac/pull/119/files/da777fd5..0309d72a Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=119&range=05 - incr: https://webrevs.openjdk.org/?repo=crac&pr=119&range=04-05 Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod Patch: https://git.openjdk.org/crac/pull/119.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/119/head:pull/119 PR: https://git.openjdk.org/crac/pull/119 From rvansa at openjdk.org Fri Oct 13 15:39:35 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Fri, 13 Oct 2023 15:39:35 GMT Subject: [crac] RFR: Drop perfdata and cppath [v7] In-Reply-To: References: Message-ID: <9h1quFhIzx-ceWjw6PzPIxUjRoKhml42KFu9TXSa32M=.0e55cf67-a0fd-466c-bcf5-3f34852cd4a7@github.com> > Storing PerfMemory contents into file might be fragile; if the file is corrupted (e.g. due to wrong permissions) JVM might receive SIGBUS when updating performance counters after restore. > This commit provides alternate solution, moving the shared file-mapped memory into private anonymous memory during C/R. 
Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Use hardcoded path ------------- Changes: - all: https://git.openjdk.org/crac/pull/119/files - new: https://git.openjdk.org/crac/pull/119/files/0309d72a..ce6cb495 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=119&range=06 - incr: https://webrevs.openjdk.org/?repo=crac&pr=119&range=05-06 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/crac/pull/119.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/119/head:pull/119 PR: https://git.openjdk.org/crac/pull/119 From rvansa at openjdk.org Fri Oct 13 20:33:01 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Fri, 13 Oct 2023 20:33:01 GMT Subject: [crac] RFR: Drop perfdata and cppath [v2] In-Reply-To: <-9ATcgIRdSIu5pNkyQcLe77SdRWc3z_9HznGH6IPXa4=.3bf77a09-065e-4be0-b7ae-58530f30c51a@github.com> References: <-9ATcgIRdSIu5pNkyQcLe77SdRWc3z_9HznGH6IPXa4=.3bf77a09-065e-4be0-b7ae-58530f30c51a@github.com> Message-ID: On Thu, 12 Oct 2023 13:26:03 GMT, Anton Kozlov wrote: >> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: >> >> Add jcmd PerfCounter.print to the test > > It should be possible to remove > > static int checkpoint_fd = -1; > > > from https://github.com/openjdk/crac/pull/119/files#diff-7313eb3d328797a7720fa1b2b73cd159934506593443e45534baad80cb1382b7R66 @AntonKozlov Done. I also had to fixup the test, as in GHA the temp directory was set to a local folder and actually it has to be hardcoded to `/tmp` on Linux (local testing did not reveal that). 
------------- PR Comment: https://git.openjdk.org/crac/pull/119#issuecomment-1762160673 From duke at openjdk.org Mon Oct 16 05:15:57 2023 From: duke at openjdk.org (Pushkar N Kulkarni) Date: Mon, 16 Oct 2023 05:15:57 GMT Subject: [crac] RFR: Update the "modified criu" link to the latest release Message-ID: The modified criu binary downloaded from the release that is linked from the README causes CRaC to segfault while checkpointing. Updating to the latest release helps resolve the problem. ------------- Commit messages: - Update the "modified criu" link to the latest release Changes: https://git.openjdk.org/crac/pull/126/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=126&range=00 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/crac/pull/126.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/126/head:pull/126 PR: https://git.openjdk.org/crac/pull/126 From duke at openjdk.org Mon Oct 16 05:41:52 2023 From: duke at openjdk.org (Pushkar N Kulkarni) Date: Mon, 16 Oct 2023 05:41:52 GMT Subject: [crac] RFR: Update the "modified criu" link to the latest release [v2] In-Reply-To: References: Message-ID: > The modified criu binary downloaded from the release that is linked from the README causes CRaC to segfault while checkpointing. Updating to the latest release helps resolve the problem. 
Pushkar N Kulkarni has updated the pull request incrementally with one additional commit since the last revision: Update full name ------------- Changes: - all: https://git.openjdk.org/crac/pull/126/files - new: https://git.openjdk.org/crac/pull/126/files/10fc10e2..3bc972ad Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=126&range=01 - incr: https://webrevs.openjdk.org/?repo=crac&pr=126&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/crac/pull/126.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/126/head:pull/126 PR: https://git.openjdk.org/crac/pull/126 From rvansa at openjdk.org Mon Oct 16 07:39:37 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 16 Oct 2023 07:39:37 GMT Subject: [crac] RFR: Update the "modified criu" link to the latest release [v2] In-Reply-To: References: Message-ID: On Mon, 16 Oct 2023 05:41:52 GMT, Pushkar N Kulkarni wrote: >> The modified criu binary downloaded from the release that is linked from the README causes CRaC to segfault while checkpointing. Updating to the latest release helps resolve the problem. > > Pushkar N Kulkarni has updated the pull request incrementally with one additional commit since the last revision: > > Update full name Thanks for the correction! ------------- Marked as reviewed by rvansa (Committer). 
PR Review: https://git.openjdk.org/crac/pull/126#pullrequestreview-1679308568 From jkratochvil at openjdk.org Mon Oct 16 11:45:35 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Mon, 16 Oct 2023 11:45:35 GMT Subject: [crac] RFR: -XX:+ShowCPUFeatures: Fix double 0x (0x0x) Message-ID: - `CPU features being used are: -XX:CPUFeatures=0x0x4ff7fff9dfcfbf7,0x0x3e6` ------------- Commit messages: - -XX:+ShowCPUFeatures: Fix double 0x (0x0x) Changes: https://git.openjdk.org/crac/pull/127/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=127&range=00 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/crac/pull/127.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/127/head:pull/127 PR: https://git.openjdk.org/crac/pull/127 From rvansa at openjdk.org Mon Oct 16 11:59:13 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 16 Oct 2023 11:59:13 GMT Subject: [crac] RFR: Persist memory in-JVM [v13] In-Reply-To: References: Message-ID: > This is a WIP for persisting various portions of JVM memory from within the process, rather than leaving that up to C/R engine (such as CRIU). In the future this will enable us to optimize loading (theoretically leading to faster startup), compress and encrypt the data. > > At this moment the implementation of persisting thread stacks is in proof-of-concept shape, ~especially waking up the primordial thread includes some hacks. This could be improved by using a custom (global, ideally robust) futex instead of the internal futex used by `pthread_join`.~ Fix already implemented. > > ~One of the concerns related to thread stacks is rseq used by glibc; without disabling this CRIU would attempt to 'fix' the rseqs (see `fixup_thread_rseq`) and touch the unmapped memory. CRIU uses special ptrace commands to check the status; I am not aware if it is possible to access this information from within the process using any public API.~ Solved. 
The JVM forks and the child ptraces the JVM, recording the rseq info. Once we have that, we can unregister the rseq before checkpoint and register it afterwards (here we have the advantage that we know the threads won't be in any critical section as we're in a safepoint). > ~Currently this works with `/proc/sys/kernel/yama/ptrace_scope` set to `0`; we should make it work with `1` (default), too.~ Fixed. > > Regarding persistence implementation, currently we store the memory in multiple files; the first block (page-size aligned) contains some validation data and an index mapping memory addresses to file offsets. The way this is implemented requires the size of the index to be known before dumping memory. It might be more convenient (and portable for e.g. network-based storage) to use a single file, and either keep the index in the 'fundamental' memory (C heap), put it at the end of the file or in another index file. Radim Vansa has updated the pull request incrementally with three additional commits since the last revision: - Revert changes not needed anymore in vm_version - Windows build fix - Refactored to fix OSX ------------- Changes: - all: https://git.openjdk.org/crac/pull/95/files - new: https://git.openjdk.org/crac/pull/95/files/6a215b41..b23f9246 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=95&range=12 - incr: https://webrevs.openjdk.org/?repo=crac&pr=95&range=11-12 Stats: 403 lines in 17 files changed: 178 ins; 174 del; 51 mod Patch: https://git.openjdk.org/crac/pull/95.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/95/head:pull/95 PR: https://git.openjdk.org/crac/pull/95 From rvansa at openjdk.org Tue Oct 17 06:17:57 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Tue, 17 Oct 2023 06:17:57 GMT Subject: [crac] RFR: -XX:+ShowCPUFeatures: Fix double 0x (0x0x) In-Reply-To: References: Message-ID: On Mon, 16 Oct 2023 11:38:09 GMT, Jan Kratochvil wrote: > - `CPU features being used are: -XX:CPUFeatures=0x0x4ff7fff9dfcfbf7,0x0x3e6` Marked as reviewed by rvansa
(Committer). ------------- PR Review: https://git.openjdk.org/crac/pull/127#pullrequestreview-1681437154 From rvansa at openjdk.org Tue Oct 17 06:37:58 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Tue, 17 Oct 2023 06:37:58 GMT Subject: [crac] RFR: Update the "modified criu" link to the latest release [v2] In-Reply-To: References: Message-ID: On Mon, 16 Oct 2023 05:41:52 GMT, Pushkar N Kulkarni wrote: >> The modified criu binary downloaded from the release that is linked from the README causes CRaC to segfault while checkpointing. Updating to the latest release helps resolve the problem. > > Pushkar N Kulkarni has updated the pull request incrementally with one additional commit since the last revision: > > Update full name @pushkarnk Could you please type in the `/integrate` command? ------------- PR Comment: https://git.openjdk.org/crac/pull/126#issuecomment-1765758338 From duke at openjdk.org Tue Oct 17 07:07:56 2023 From: duke at openjdk.org (Pushkar N Kulkarni) Date: Tue, 17 Oct 2023 07:07:56 GMT Subject: [crac] Integrated: Update the "modified criu" link to the latest release In-Reply-To: References: Message-ID: On Mon, 16 Oct 2023 05:07:57 GMT, Pushkar N Kulkarni wrote: > The modified criu binary downloaded from the release that is linked from the README causes CRaC to segfault while checkpointing. Updating to the latest release helps resolve the problem. This pull request has now been integrated. 
Changeset: ba0087fc Author: Pushkar N Kulkarni Committer: Radim Vansa URL: https://git.openjdk.org/crac/commit/ba0087fca72ae0f21351fd7bd8d1e7926175b860 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Update the "modified criu" link to the latest release Reviewed-by: rvansa ------------- PR: https://git.openjdk.org/crac/pull/126 From jkratochvil at openjdk.org Tue Oct 17 07:10:53 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Tue, 17 Oct 2023 07:10:53 GMT Subject: [crac] Integrated: -XX:+ShowCPUFeatures: Fix double 0x (0x0x) In-Reply-To: References: Message-ID: <4aTIHFBk16qZjmygYVcwVH77t7Qli0cuKpD8V5e9RwM=.62a8fb2a-a7c1-4b5f-9c44-a33266714c87@github.com> On Mon, 16 Oct 2023 11:38:09 GMT, Jan Kratochvil wrote: > - `CPU features being used are: -XX:CPUFeatures=0x0x4ff7fff9dfcfbf7,0x0x3e6` This pull request has now been integrated. Changeset: 396db644 Author: Jan Kratochvil Committer: Radim Vansa URL: https://git.openjdk.org/crac/commit/396db64430b00c734a1538c2133cc6ddfb9264c5 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod -XX:+ShowCPUFeatures: Fix double 0x (0x0x) Reviewed-by: rvansa ------------- PR: https://git.openjdk.org/crac/pull/127 From rvansa at openjdk.org Tue Oct 17 07:37:35 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Tue, 17 Oct 2023 07:37:35 GMT Subject: [crac] RFR: Fix test when running as root Message-ID: Root can reopen the file even without permissions. 
------------- Commit messages: - Fix test when running as root Changes: https://git.openjdk.org/crac/pull/128/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=128&range=00 Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/crac/pull/128.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/128/head:pull/128 PR: https://git.openjdk.org/crac/pull/128 From jkratochvil at openjdk.org Tue Oct 17 07:58:44 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Tue, 17 Oct 2023 07:58:44 GMT Subject: [crac] RFR: Fix test when running as root In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 07:28:44 GMT, Radim Vansa wrote: > Root can reopen the file even without permissions. Yes, it does work for me now, thanks. ------------- PR Comment: https://git.openjdk.org/crac/pull/128#issuecomment-1765871584 From rmarchenko at openjdk.org Tue Oct 17 08:59:13 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Tue, 17 Oct 2023 08:59:13 GMT Subject: [crac] RFR: Fixing crash on restore when user name is not set Message-ID: This change fixes a crash occurring during restore in a container when user name is not set. ------------- Commit messages: - Fixing crash on restore when user name is not set Changes: https://git.openjdk.org/crac/pull/129/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=129&range=00 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/crac/pull/129.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/129/head:pull/129 PR: https://git.openjdk.org/crac/pull/129 From rmarchenko at openjdk.org Tue Oct 17 09:36:11 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Tue, 17 Oct 2023 09:36:11 GMT Subject: [crac] RFR: Reset JVM start time and up time on restore (CRaCResetStartTime flag) Message-ID: This change adds an option to reset both the JVM's start time and uptime on restore.
Resetting time may be performed with the new flag "-XX:+CRaCResetStartTime". The flag is 'false' by default. ------------- Commit messages: - Reset JVM start time and up time on restore (CRaCResetStartTime flag) Changes: https://git.openjdk.org/crac/pull/130/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=130&range=00 Stats: 96 lines in 6 files changed: 96 ins; 0 del; 0 mod Patch: https://git.openjdk.org/crac/pull/130.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/130/head:pull/130 PR: https://git.openjdk.org/crac/pull/130 From rvansa at openjdk.org Tue Oct 17 11:50:30 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Tue, 17 Oct 2023 11:50:30 GMT Subject: [crac] RFR: Reset JVM start time and up time on restore (CRaCResetStartTime) In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 09:29:35 GMT, Roman Marchenko wrote: > This change adds an opportunity reset both JVM's start time and uptime on restoring. > > Resetting time may be performed with the new flag "-XX:+CRaCResetStartTime". > > The flag is 'false' by default. src/hotspot/os/linux/crac_linux.cpp line 468: > 466: > 467: void crac::initialize_time_counters() { > 468: os::Posix::init(); The initialization updates some condition variable attributes; have you checked if there's any chance that another thread could observe an inconsistent state while this is updated? test/jdk/jdk/crac/ResetStartTimeTest.java line 40: > 38: * @library /test/lib > 39: * @build SimpleTest > 40: * @requires (os.family == "linux") I think that if you use simengine the test is applicable to other platforms as well. test/jdk/jdk/crac/ResetStartTimeTest.java line 44: > 42: * @run driver/timeout=60 jdk.test.lib.crac.CracTest true > 43: */ > 44: public class SimpleTest implements CracTest { Could you please name the test `ResetStartTimeTest` to match the filename? 
test/jdk/jdk/crac/ResetStartTimeTest.java line 75: > 73: assertLessThan(uptime1, WAIT_TIMEOUT); > 74: } else { > 75: assertLessThan(uptime0, uptime1); If the C/R is really quick this should be <= ------------- PR Review Comment: https://git.openjdk.org/crac/pull/130#discussion_r1361985624 PR Review Comment: https://git.openjdk.org/crac/pull/130#discussion_r1361962742 PR Review Comment: https://git.openjdk.org/crac/pull/130#discussion_r1361959649 PR Review Comment: https://git.openjdk.org/crac/pull/130#discussion_r1361962107 From rvansa at openjdk.org Tue Oct 17 11:53:41 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Tue, 17 Oct 2023 11:53:41 GMT Subject: [crac] RFR: Fixing crash on restore when user name is not set In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 08:53:02 GMT, Roman Marchenko wrote: > This change fixes a crash occuring during restore in a container when user name is not set. src/hotspot/os/posix/perfMemory_posix.cpp line 1405: > 1403: char* user_name = get_user_name(geteuid()); > 1404: if (!user_name) { > 1405: return false; Could we accompany this with an error message? ------------- PR Review Comment: https://git.openjdk.org/crac/pull/129#discussion_r1361990839 From rmarchenko at openjdk.org Wed Oct 18 07:29:37 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Wed, 18 Oct 2023 07:29:37 GMT Subject: [crac] RFR: Fixing crash on restore when user name is not set In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 11:51:08 GMT, Radim Vansa wrote: >> This change fixes a crash occuring during restore in a container when user name is not set. > > src/hotspot/os/posix/perfMemory_posix.cpp line 1405: > >> 1403: char* user_name = get_user_name(geteuid()); >> 1404: if (!user_name) { >> 1405: return false; > > Could we accompany this with an error message? I'm not sure about an error message as it doesn't fail restore procedure. 
The same scenario from [here](https://github.com/openjdk/crac/blob/crac/src/hotspot/os/posix/perfMemory_posix.cpp#L1032) doesn't show any message either. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/129#discussion_r1363376921 From rvansa at openjdk.org Wed Oct 18 08:27:06 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Wed, 18 Oct 2023 08:27:06 GMT Subject: [crac] RFR: Fixing crash on restore when user name is not set In-Reply-To: References: Message-ID: <50LnQyfUIeLZa3uw6eJtgG9HZGnjA4NlxJhDTM9MGQM=.af644d43-6b48-4ace-bea6-7b6fb71aa7af@github.com> On Tue, 17 Oct 2023 08:53:02 GMT, Roman Marchenko wrote: > This change fixes a crash occurring during restore in a container when user name is not set. Marked as reviewed by rvansa (Committer). ------------- PR Review: https://git.openjdk.org/crac/pull/129#pullrequestreview-1684376573 From rvansa at openjdk.org Wed Oct 18 08:27:26 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Wed, 18 Oct 2023 08:27:26 GMT Subject: [crac] RFR: Fixing crash on restore when user name is not set In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 07:26:28 GMT, Roman Marchenko wrote: >> src/hotspot/os/posix/perfMemory_posix.cpp line 1405: >> >>> 1403: char* user_name = get_user_name(geteuid()); >>> 1404: if (!user_name) { >>> 1405: return false; >> >> Could we accompany this with an error message? > > I'm not sure about an error message as it doesn't fail restore procedure. > The same scenario from [here](https://github.com/openjdk/crac/blob/crac/src/hotspot/os/posix/perfMemory_posix.cpp#L1032) doesn't show any message either. OK, that scenario would end up with an error message only with [develop flags](https://github.com/openjdk/crac/blob/crac/src/hotspot/os/posix/perfMemory_posix.cpp#L1261).
------------- PR Review Comment: https://git.openjdk.org/crac/pull/129#discussion_r1363425156 From rmarchenko at openjdk.org Wed Oct 18 08:28:18 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Wed, 18 Oct 2023 08:28:18 GMT Subject: [crac] RFR: Reset JVM start time and up time on restore (CRaCResetStartTime) In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 11:46:55 GMT, Radim Vansa wrote: >> This change adds an opportunity reset both JVM's start time and uptime on restoring. >> >> Resetting time may be performed with the new flag "-XX:+CRaCResetStartTime". >> >> The flag is 'false' by default. > > src/hotspot/os/linux/crac_linux.cpp line 468: > >> 466: >> 467: void crac::initialize_time_counters() { >> 468: os::Posix::init(); > > The initialization updates some condition variable attributes; have you checked if there's any chance that another thread could observe an inconsistent state while this is updated? You're right, I missed it. > test/jdk/jdk/crac/ResetStartTimeTest.java line 44: > >> 42: * @run driver/timeout=60 jdk.test.lib.crac.CracTest true >> 43: */ >> 44: public class SimpleTest implements CracTest { > > Could you please name the test `ResetStartTimeTest` to match the filename? Ye olde copy-n-paste :) Thanks! ------------- PR Review Comment: https://git.openjdk.org/crac/pull/130#discussion_r1363461639 PR Review Comment: https://git.openjdk.org/crac/pull/130#discussion_r1363417336 From rmarchenko at openjdk.org Wed Oct 18 08:33:02 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Wed, 18 Oct 2023 08:33:02 GMT Subject: [crac] RFR: Reset JVM start time and up time on restore (CRaCResetStartTime) [v2] In-Reply-To: References: Message-ID: > This change adds an opportunity reset both JVM's start time and uptime on restoring. > > Resetting time may be performed with the new flag "-XX:+CRaCResetStartTime". > > The flag is 'false' by default. 
Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision: Fixing review comments ------------- Changes: - all: https://git.openjdk.org/crac/pull/130/files - new: https://git.openjdk.org/crac/pull/130/files/cd5b63eb..91913cb2 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=130&range=01 - incr: https://webrevs.openjdk.org/?repo=crac&pr=130&range=00-01 Stats: 29 lines in 5 files changed: 14 ins; 7 del; 8 mod Patch: https://git.openjdk.org/crac/pull/130.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/130/head:pull/130 PR: https://git.openjdk.org/crac/pull/130 From rmarchenko at openjdk.org Wed Oct 18 08:33:06 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Wed, 18 Oct 2023 08:33:06 GMT Subject: [crac] RFR: Reset JVM start time and up time on restore (CRaCResetStartTime) [v2] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 08:24:49 GMT, Roman Marchenko wrote: >> src/hotspot/os/linux/crac_linux.cpp line 468: >> >>> 466: >>> 467: void crac::initialize_time_counters() { >>> 468: os::Posix::init(); >> >> The initialization updates some condition variable attributes; have you checked if there's any chance that another thread could observe an inconsistent state while this is updated? > > You're right, I missed it. Done >> test/jdk/jdk/crac/ResetStartTimeTest.java line 44: >> >>> 42: * @run driver/timeout=60 jdk.test.lib.crac.CracTest true >>> 43: */ >>> 44: public class SimpleTest implements CracTest { >> >> Could you please name the test `ResetStartTimeTest` to match the filename? > > Ye olde copy-n-paste :) > Thanks! 
Done ------------- PR Review Comment: https://git.openjdk.org/crac/pull/130#discussion_r1363475223 PR Review Comment: https://git.openjdk.org/crac/pull/130#discussion_r1363474387 From rmarchenko at openjdk.org Wed Oct 18 08:33:08 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Wed, 18 Oct 2023 08:33:08 GMT Subject: [crac] RFR: Reset JVM start time and up time on restore (CRaCResetStartTime) [v2] In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 11:37:11 GMT, Radim Vansa wrote: >> Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixing review comments > > test/jdk/jdk/crac/ResetStartTimeTest.java line 40: > >> 38: * @library /test/lib >> 39: * @build SimpleTest >> 40: * @requires (os.family == "linux") > > I think that if you use simengine the test is applicable to other platforms as well. Done > test/jdk/jdk/crac/ResetStartTimeTest.java line 75: > >> 73: assertLessThan(uptime1, WAIT_TIMEOUT); >> 74: } else { >> 75: assertLessThan(uptime0, uptime1); > > If the C/R is really quick this should be <= Done ------------- PR Review Comment: https://git.openjdk.org/crac/pull/130#discussion_r1363474783 PR Review Comment: https://git.openjdk.org/crac/pull/130#discussion_r1363474550 From rmarchenko at openjdk.org Wed Oct 18 09:08:41 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Wed, 18 Oct 2023 09:08:41 GMT Subject: [crac] RFR: Reset JVM start time and up time on restore (CRaCResetStartTime) [v3] In-Reply-To: References: Message-ID: > This change adds an opportunity reset both JVM's start time and uptime on restoring. > > Resetting time may be performed with the new flag "-XX:+CRaCResetStartTime". > > The flag is 'false' by default. 
Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision: Restoring file ------------- Changes: - all: https://git.openjdk.org/crac/pull/130/files - new: https://git.openjdk.org/crac/pull/130/files/91913cb2..3daec256 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=130&range=02 - incr: https://webrevs.openjdk.org/?repo=crac&pr=130&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/crac/pull/130.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/130/head:pull/130 PR: https://git.openjdk.org/crac/pull/130 From rvansa at openjdk.org Wed Oct 18 18:59:12 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Wed, 18 Oct 2023 18:59:12 GMT Subject: [crac] RFR: Reset JVM start time and up time on restore (CRaCResetStartTime) [v3] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 09:08:41 GMT, Roman Marchenko wrote: >> This change adds an opportunity reset both JVM's start time and uptime on restoring. >> >> Resetting time may be performed with the new flag "-XX:+CRaCResetStartTime". >> >> The flag is 'false' by default. > > Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision: > > Restoring file LGTM in general, but the build is failing on some platforms with /home/runner/work/crac/crac/src/hotspot/os/posix/crac_posix.cpp:44:14: error: incomplete type 'os::Posix' used in nested name specifier and similar errors. 
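For readers unfamiliar with the diagnostic quoted above: one common cause of "incomplete type ... used in nested name specifier" is that the translation unit has only seen a declaration of the nested class, not its definition. The sketch below is a self-contained illustration under that assumption; `os`, `Posix`, `init` and `initialize_time_counters` are stand-ins with made-up bodies, not the HotSpot sources, and it is not a claim about what the actual fix in this PR was.

```cpp
// Stand-in for a header that only *declares* the nested class.
struct os {
  class Posix;  // declared but not defined: incomplete type at this point
};

// A call such as os::Posix::init() placed here would be rejected by GCC with
//   error: incomplete type 'os::Posix' used in nested name specifier
// because the compiler cannot look inside a class it has not seen defined.

// Stand-in for the header that provides the *definition*.
class os::Posix {
 public:
  static int init() { return 1; }  // made-up body for illustration only
};

// With the definition visible, the qualified call compiles.
int initialize_time_counters() {
  return os::Posix::init();
}
```

In a multi-platform code base this typically shows up only on the platforms where the defining header is not pulled in transitively, which would be consistent with the build failing on some platforms but not others.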
------------- PR Review: https://git.openjdk.org/crac/pull/130#pullrequestreview-1685876624 From rmarchenko at openjdk.org Wed Oct 18 19:09:16 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Wed, 18 Oct 2023 19:09:16 GMT Subject: [crac] Integrated: Fixing crash on restore when user name is not set In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 08:53:02 GMT, Roman Marchenko wrote: > This change fixes a crash occurring during restore in a container when user name is not set. This pull request has now been integrated. Changeset: f0b2fcda Author: Roman Marchenko Committer: Radim Vansa URL: https://git.openjdk.org/crac/commit/f0b2fcda193c532f1140479b085aedc93585cd88 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Fixing crash on restore when user name is not set Reviewed-by: rvansa ------------- PR: https://git.openjdk.org/crac/pull/129 From rmarchenko at openjdk.org Thu Oct 19 05:51:13 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Thu, 19 Oct 2023 05:51:13 GMT Subject: [crac] RFR: Reset JVM start time and up time on restore (CRaCResetStartTime) [v4] In-Reply-To: References: Message-ID: > This change adds an option to reset both the JVM's start time and uptime on restore. > > Resetting time may be performed with the new flag "-XX:+CRaCResetStartTime". > > The flag is 'false' by default.
Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision: Fixing builds ------------- Changes: - all: https://git.openjdk.org/crac/pull/130/files - new: https://git.openjdk.org/crac/pull/130/files/3daec256..360bd03d Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=130&range=03 - incr: https://webrevs.openjdk.org/?repo=crac&pr=130&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/crac/pull/130.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/130/head:pull/130 PR: https://git.openjdk.org/crac/pull/130 From rmarchenko at openjdk.org Thu Oct 19 07:25:12 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Thu, 19 Oct 2023 07:25:12 GMT Subject: [crac] RFR: Reset JVM start time and up time on restore (CRaCResetStartTime) [v5] In-Reply-To: References: Message-ID: > This change adds an opportunity reset both JVM's start time and uptime on restoring. > > Resetting time may be performed with the new flag "-XX:+CRaCResetStartTime". > > The flag is 'false' by default. 
Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision: Fixing win32 build ------------- Changes: - all: https://git.openjdk.org/crac/pull/130/files - new: https://git.openjdk.org/crac/pull/130/files/360bd03d..79c08f62 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=130&range=04 - incr: https://webrevs.openjdk.org/?repo=crac&pr=130&range=03-04 Stats: 3 lines in 1 file changed: 1 ins; 2 del; 0 mod Patch: https://git.openjdk.org/crac/pull/130.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/130/head:pull/130 PR: https://git.openjdk.org/crac/pull/130 From rvansa at openjdk.org Thu Oct 19 09:22:40 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Thu, 19 Oct 2023 09:22:40 GMT Subject: [crac] RFR: Reset JVM start time and up time on restore (CRaCResetStartTime) [v5] In-Reply-To: References: Message-ID: <51lNVtqyDzmCsr7DXDcvxBQ1YtcAKoMzCoYuiitKUzU=.29593240-9d4b-4cfa-83f1-dacc8a15f9a2@github.com> On Thu, 19 Oct 2023 07:25:12 GMT, Roman Marchenko wrote: >> This change adds an opportunity reset both JVM's start time and uptime on restoring. >> >> Resetting time may be performed with the new flag "-XX:+CRaCResetStartTime". >> >> The flag is 'false' by default. > > Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision: > > Fixing win32 build Marked as reviewed by rvansa (Committer). ------------- PR Review: https://git.openjdk.org/crac/pull/130#pullrequestreview-1687161525 From rvansa at openjdk.org Thu Oct 19 11:28:38 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Thu, 19 Oct 2023 11:28:38 GMT Subject: [crac] RFR: Terminate restored process when criuengine restorewait exits Message-ID: <-KSjntDhlfM5JvGVbp8MyfeW7XER7usE3i17FEufyyU=.ff7e9f88-b755-4172-b052-da8973e0af1f@github.com> With criuengine the restored process gets restorewait process as its parent; scripts not expecting two processes might signal (e.g. 
terminate) the parent process but the actual restored process would get orphaned. ------------- Commit messages: - Terminate restored process when criuengine restorewait exits Changes: https://git.openjdk.org/crac/pull/131/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=131&range=00 Stats: 147 lines in 5 files changed: 147 ins; 0 del; 0 mod Patch: https://git.openjdk.org/crac/pull/131.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/131/head:pull/131 PR: https://git.openjdk.org/crac/pull/131 From akozlov at openjdk.org Fri Oct 20 14:53:00 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Fri, 20 Oct 2023 14:53:00 GMT Subject: [crac] RFR: Drop perfdata and cppath [v7] In-Reply-To: <9h1quFhIzx-ceWjw6PzPIxUjRoKhml42KFu9TXSa32M=.0e55cf67-a0fd-466c-bcf5-3f34852cd4a7@github.com> References: <9h1quFhIzx-ceWjw6PzPIxUjRoKhml42KFu9TXSa32M=.0e55cf67-a0fd-466c-bcf5-3f34852cd4a7@github.com> Message-ID: On Fri, 13 Oct 2023 15:39:35 GMT, Radim Vansa wrote: >> Storing PerfMemory contents into file might be fragile; if the file is corrupted (e.g. due to wrong permissions) JVM might receive SIGBUS when updating performance counters after restore. >> This commit provides alternate solution, moving the shared file-mapped memory into private anonymous memory during C/R. > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Use hardcoded path LGTM, thank you! ------------- Marked as reviewed by akozlov (Lead). 
PR Review: https://git.openjdk.org/crac/pull/119#pullrequestreview-1690219416 From abakhtin at openjdk.org Fri Oct 20 18:22:00 2023 From: abakhtin at openjdk.org (Alexey Bakhtin) Date: Fri, 20 Oct 2023 18:22:00 GMT Subject: [crac] RFR: Drop perfdata and cppath [v7] In-Reply-To: <9h1quFhIzx-ceWjw6PzPIxUjRoKhml42KFu9TXSa32M=.0e55cf67-a0fd-466c-bcf5-3f34852cd4a7@github.com> References: <9h1quFhIzx-ceWjw6PzPIxUjRoKhml42KFu9TXSa32M=.0e55cf67-a0fd-466c-bcf5-3f34852cd4a7@github.com> Message-ID: On Fri, 13 Oct 2023 15:39:35 GMT, Radim Vansa wrote: >> Storing PerfMemory contents into file might be fragile; if the file is corrupted (e.g. due to wrong permissions) JVM might receive SIGBUS when updating performance counters after restore. >> This commit provides alternate solution, moving the shared file-mapped memory into private anonymous memory during C/R. > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Use hardcoded path src/hotspot/os/linux/perfMemory_linux.hpp line 33: > 31: > 32: public: > 33: static bool checkpoint(const char* checkpoint_path); With the proposed changes you do not need the `checkpoint_path` parameter anymore. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/119#discussion_r1367362790 From rvansa at openjdk.org Mon Oct 23 06:48:18 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 23 Oct 2023 06:48:18 GMT Subject: [crac] RFR: Drop perfdata and cppath [v8] In-Reply-To: References: Message-ID: > Storing PerfMemory contents into file might be fragile; if the file is corrupted (e.g. due to wrong permissions) JVM might receive SIGBUS when updating performance counters after restore. > This commit provides alternate solution, moving the shared file-mapped memory into private anonymous memory during C/R. 
Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Remove unused parameter ------------- Changes: - all: https://git.openjdk.org/crac/pull/119/files - new: https://git.openjdk.org/crac/pull/119/files/ce6cb495..7157f39f Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=119&range=07 - incr: https://webrevs.openjdk.org/?repo=crac&pr=119&range=06-07 Stats: 4 lines in 2 files changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/crac/pull/119.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/119/head:pull/119 PR: https://git.openjdk.org/crac/pull/119 From rvansa at openjdk.org Mon Oct 23 13:09:34 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 23 Oct 2023 13:09:34 GMT Subject: [crac] RFR: Drop perfdata and cppath [v9] In-Reply-To: References: Message-ID: > Storing PerfMemory contents into file might be fragile; if the file is corrupted (e.g. due to wrong permissions) JVM might receive SIGBUS when updating performance counters after restore. > This commit provides alternate solution, moving the shared file-mapped memory into private anonymous memory during C/R. 
Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: fixup ------------- Changes: - all: https://git.openjdk.org/crac/pull/119/files - new: https://git.openjdk.org/crac/pull/119/files/7157f39f..1349d46c Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=119&range=08 - incr: https://webrevs.openjdk.org/?repo=crac&pr=119&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/crac/pull/119.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/119/head:pull/119 PR: https://git.openjdk.org/crac/pull/119 From rvansa at openjdk.org Mon Oct 23 13:25:34 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 23 Oct 2023 13:25:34 GMT Subject: [crac] RFR: Put restored Java process on foreground instead of restorewait Message-ID: Solves a scenario where we restore the process in background, `fg` it and then try to Ctrl+C interrupt it. The usefulness of this actually depends on https://github.com/CRaC/criu/pull/14 `criuengine` restorewait does not have tty on stdin/stdout/stderr, though one FD is open to tty; we have to find it. 
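For readers following along, the "find the one FD that is open to a tty" step can be sketched roughly as below. This is only an illustration under assumptions, not the code from the PR: the helper name `find_tty_fd` is hypothetical, the scan relies on the Linux-specific `/proc/self/fd` directory, and it stops at the first terminal descriptor it encounters.

```cpp
#include <cassert>
#include <cstdlib>
#include <dirent.h>
#include <unistd.h>

// Hypothetical helper (not the criuengine code): returns the first open
// descriptor found in /proc/self/fd that refers to a terminal, or -1.
static int find_tty_fd() {
  DIR* dir = opendir("/proc/self/fd");
  if (dir == nullptr) {
    return -1;  // no procfs available, give up
  }
  int result = -1;
  struct dirent* dp;
  while ((dp = readdir(dir)) != nullptr) {
    char* end = nullptr;
    long fd = strtol(dp->d_name, &end, 10);
    if (end == dp->d_name || *end != '\0') {
      continue;  // skips "." and ".."
    }
    assert(fd >= 0);
    if (fd == dirfd(dir)) {
      continue;  // the descriptor used for the scan itself
    }
    if (isatty(static_cast<int>(fd))) {
      result = static_cast<int>(fd);
      break;  // one terminal descriptor is enough
    }
  }
  closedir(dir);
  return result;
}
```

A caller would presumably treat a result of -1 as "no controlling terminal reachable" and skip the foreground handling altogether.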
------------- Commit messages: - Put restored Java process on foreground instead of restorewait Changes: https://git.openjdk.org/crac/pull/133/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=133&range=00 Stats: 31 lines in 1 file changed: 30 ins; 0 del; 1 mod Patch: https://git.openjdk.org/crac/pull/133.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/133/head:pull/133 PR: https://git.openjdk.org/crac/pull/133 From rvansa at openjdk.org Mon Oct 23 15:34:01 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 23 Oct 2023 15:34:01 GMT Subject: [crac] RFR: Drop perfdata and cppath [v7] In-Reply-To: References: <9h1quFhIzx-ceWjw6PzPIxUjRoKhml42KFu9TXSa32M=.0e55cf67-a0fd-466c-bcf5-3f34852cd4a7@github.com> Message-ID: On Fri, 20 Oct 2023 18:19:02 GMT, Alexey Bakhtin wrote: >> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: >> >> Use hardcoded path > > src/hotspot/os/linux/perfMemory_linux.hpp line 33: > >> 31: >> 32: public: >> 33: static bool checkpoint(const char* checkpoint_path); > > With the proposed changes you do not need the `checkpoint_path` parameter anymore. Thanks for noting that, updated. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/119#discussion_r1368878655 From rvansa at openjdk.org Mon Oct 23 15:37:21 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 23 Oct 2023 15:37:21 GMT Subject: [crac] Integrated: Drop perfdata and cppath In-Reply-To: References: Message-ID: On Wed, 27 Sep 2023 20:40:53 GMT, Radim Vansa wrote: > Storing PerfMemory contents into file might be fragile; if the file is corrupted (e.g. due to wrong permissions) JVM might receive SIGBUS when updating performance counters after restore. > This commit provides alternate solution, moving the shared file-mapped memory into private anonymous memory during C/R. This pull request has now been integrated. 
Changeset: 84f170a3 Author: Radim Vansa URL: https://git.openjdk.org/crac/commit/84f170a30ac49552be5b479032e34cfa68a939cb Stats: 279 lines in 7 files changed: 132 ins; 126 del; 21 mod Drop perfdata and cppath Reviewed-by: akozlov ------------- PR: https://git.openjdk.org/crac/pull/119 From rmarchenko at openjdk.org Tue Oct 24 07:22:59 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Tue, 24 Oct 2023 07:22:59 GMT Subject: [crac] RFR: Put restored Java process on foreground instead of restorewait In-Reply-To: References: Message-ID: On Mon, 23 Oct 2023 13:18:03 GMT, Radim Vansa wrote: > Solves a scenario where we restore the process in background, `fg` it and then try to Ctrl+C interrupt it. > > The usefulness of this actually depends on https://github.com/CRaC/criu/pull/14 > > `criuengine` restorewait does not have tty on stdin/stdout/stderr, though one FD is open to tty; we have to find it. src/java.base/unix/native/criuengine/criuengine.c line 391: > 389: int fd = atoi(dp->d_name); > 390: if (isatty(fd)) { > 391: g_tty_fd = fd; Shouldn't we break here? ------------- PR Review Comment: https://git.openjdk.org/crac/pull/133#discussion_r1369722656 From rmarchenko at openjdk.org Tue Oct 24 14:17:11 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Tue, 24 Oct 2023 14:17:11 GMT Subject: [crac] Integrated: Reset JVM start time and up time on restore (CRaCResetStartTime) In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 09:29:35 GMT, Roman Marchenko wrote: > This change adds an opportunity reset both JVM's start time and uptime on restoring. > > Resetting time may be performed with the new flag "-XX:+CRaCResetStartTime". > > The flag is 'false' by default. This pull request has now been integrated. 
Changeset: 35d5bc4d Author: Roman Marchenko Committer: Radim Vansa URL: https://git.openjdk.org/crac/commit/35d5bc4dc1f67f22dadec31cf8926bc8c2151593 Stats: 106 lines in 9 files changed: 104 ins; 2 del; 0 mod Reset JVM start time and up time on restore (CRaCResetStartTime) Reviewed-by: rvansa ------------- PR: https://git.openjdk.org/crac/pull/130 From jkratochvil at openjdk.org Wed Oct 25 15:17:09 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Wed, 25 Oct 2023 15:17:09 GMT Subject: [crac] RFR: Fix CPUFeatures crash on new->old CPU Message-ID: - reproducible on i7-1165G7 "qemu-kvm -cpu host" for a checkpoint and "qemu-kvm -cpu SandyBridge" for its restore ------------- Commit messages: - Fix CPUFeatures crash on new->old CPU Changes: https://git.openjdk.org/crac/pull/134/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=134&range=00 Stats: 44 lines in 4 files changed: 33 ins; 0 del; 11 mod Patch: https://git.openjdk.org/crac/pull/134.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/134/head:pull/134 PR: https://git.openjdk.org/crac/pull/134 From akozlov at openjdk.org Thu Oct 26 18:58:00 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Thu, 26 Oct 2023 18:58:00 GMT Subject: [crac] RFR: Close files opened by Decoder before checkpoint [v2] In-Reply-To: <6S30weJNZ0mIL0VExeOO1BQwq8xU9G5gZ_xiXM4mJ_E=.5e069f4b-789f-4fbf-a985-a13f270fd79a@github.com> References: <6ZZkAT8Gst8w1pQ-y6FV8wm6Fystcs4Wpy0HzelKc-E=.2c029ec7-8ab7-4767-99a2-0feaab574510@github.com> <6S30weJNZ0mIL0VExeOO1BQwq8xU9G5gZ_xiXM4mJ_E=.5e069f4b-789f-4fbf-a985-a13f270fd79a@github.com> Message-ID: On Mon, 2 Oct 2023 18:59:40 GMT, Radim Vansa wrote: >> Native memory tracking needs to resolve some addresses for stack unwinding and the decoders keep some shared library files open as a cache. Since the decoder instances can be re-allocated anytime this fix just destroys them before a checkpoint. 
> > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Assume error handler decoder is null during checkpoint Marked as reviewed by akozlov (Lead). ------------- PR Review: https://git.openjdk.org/crac/pull/116#pullrequestreview-1700426220 From akozlov at openjdk.org Thu Oct 26 19:34:09 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Thu, 26 Oct 2023 19:34:09 GMT Subject: [crac] RFR: Reset JVM start time and up time on restore (CRaCResetStartTime) [v5] In-Reply-To: References: Message-ID: <4zHfAOtXP476tJjgt5IblV4uXi3BIGwtB-RrG8_vIw8=.6df06499-8110-41d5-ac46-363afba1e988@github.com> On Thu, 19 Oct 2023 07:25:12 GMT, Roman Marchenko wrote: >> This change adds an opportunity reset both JVM's start time and uptime on restoring. >> >> Resetting time may be performed with the new flag "-XX:+CRaCResetStartTime". >> >> The flag is 'false' by default. > > Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision: > > Fixing win32 build src/hotspot/share/runtime/globals.hpp line 2001: > 1999: product(bool, CRaCResetStartTime, false, RESTORE_SETTABLE, \ > 2000: "Reset JVM's start time and uptime on restore") \ > 2001: \ Sorry for being late. Since we're evaluating providing the restore time via getUptime(), the option should be true by default, and be diagnostic. Nit: the end of lines are not aligned. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/130#discussion_r1373722320 From akozlov at openjdk.org Thu Oct 26 19:35:07 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Thu, 26 Oct 2023 19:35:07 GMT Subject: [crac] RFR: Persist memory in-JVM [v13] In-Reply-To: References: Message-ID: On Mon, 16 Oct 2023 11:59:13 GMT, Radim Vansa wrote: >> This is a WIP for persisting various portions of JVM memory from within the process, rather than leaving that up to C/R engine (such as CRIU). 
In the future this will enable us to optimize loading (theoretically leading to faster startup), compress and encrypt the data. >> >> At this moment the implementation of persisting thread stacks is in proof-of-concept shape, ~especially waking up the primordial thread includes some hacks. This could be improved by using a custom (global, ideally robust) futex instead of the internal futex used by `pthread_join`.~ Fix already implemented. >> >> ~One of the concerns related to thread stacks is rseq used by glibc; without disabling this CRIU would attempt to 'fix' the rseqs (see `fixup_thread_rseq`) and touch the unmapped memory. CRIU uses special ptrace commands to check the status; I am not aware if it is possible to access this information from within the process using any public API.~ Solved. The JVM forks and the child ptraces JVM, recording the rseq info. Once we have the we can unregister the rseq before checkpoint and register it afterwards (here we have the advantage that we know the threads won't be in any critical section as we're in a safepoint). >> ~Currently this works with `/proc/sys/kernel/yama/ptrace_scope` set to `0`; we should make it work with `1` (default), too.~ Fixed. >> >> Regarding persistence implementation, currently we store the memory in multiple files; first block (page-size aligned) contains some validation data and index of memory address - file offsets. The way this is implemented requires the size of index to be known before dumping memory. It might be more convenient (and portable for e.g. network-based storage) to use single file, and either keep the index in the 'fundamental' memory (C heap), put it at the end of file or to another index file. 
> > Radim Vansa has updated the pull request incrementally with three additional commits since the last revision: > > - Revert changes not needed anymore in vm_version > - Windows build fix > - Refactored to fix OSX Incomplete first round of review src/hotspot/os/linux/crac_linux.cpp line 481: > 479: > 480: Atomic::add(&persist_waiters, 1); > 481: // From now on the code must not use stack variables! Does it work in slowdebug? Suppose Atomic::add is not inlined and it creates a stack frame. Once the variable is decremented, but before the return, the stack may already be unmapped. Although the chances are not high, this can in theory crash, right? src/hotspot/os/linux/crac_linux.cpp line 562: > 560: // We cannot release this in after_threads_restored(), have to wait > 561: // until the last thread restores > 562: int dec = Atomic::sub(&persist_waiters, 1); `Atomic::sub(&persist_waiters, 1)` is in the HAS_RSEQ ifdef. Apparently it needs ... #endif // HAS_RSEQ int dec = Atomic::sub(&persist_waiters, 1); #ifdef HAS_RSEQ ... Or move the rseq_config memory freeing to `crac::after_threads_restored`, as a mirror of `crac::before_threads_persisted` which already spins on the `persist_waiters`. src/hotspot/share/runtime/crac.cpp line 363: > 361: VM_Version::crac_restore_finalize(); > 362: > 363: memory_restore(); memory_restore() does not guarantee that it avoids calling glibc extensively, so it may execute an unsupported CPU instruction before `VM_Version::crac_restore_finalize()` has a chance to terminate the VM cleanly. src/hotspot/share/runtime/crac.cpp line 372: > 370: if (CRPersistMemory) { > 371: // Before reinit_memory the code must not change memory layout, e.g. mmapping > 372: // or even malloc'ing anything (malloc running out of space could run short and allocate Is the comment regarding malloc still valid in the current implementation? Is that the C library malloc, or the JVM's own one, which we control and indeed should not use?
------------- PR Review: https://git.openjdk.org/crac/pull/95#pullrequestreview-1700082007 PR Review Comment: https://git.openjdk.org/crac/pull/95#discussion_r1373459286 PR Review Comment: https://git.openjdk.org/crac/pull/95#discussion_r1373453376 PR Review Comment: https://git.openjdk.org/crac/pull/95#discussion_r1373476222 PR Review Comment: https://git.openjdk.org/crac/pull/95#discussion_r1373472920 From akozlov at openjdk.org Thu Oct 26 19:41:08 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Thu, 26 Oct 2023 19:41:08 GMT Subject: [crac] RFR: Fix: arguments supplied to restore are split with whitespace In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 16:02:32 GMT, null wrote: > This PR fixes a bug that causes arguments with whitespaces to be split into multiple arguments during restore. > > It contains two commits. The first commit is refactor only. It moves all side effects from `CracRestoreParameters` to the call sites, and changes the type of `CracRestoreParameters::args` from a single string to a `GrowableArray` of strings. Serialization code is also cleaned up a bit. > > The second commit fixes the bug by introducing a new pseudo property, `-DCRaCJavaMainArgs`, that is set in `JavaMain`. An instance of `JavaMainArgs` (containing `argc` and `argv`) is stored as extra info of the property, which is later extracted in `Arguments::parse_options_for_restore` and passed to `crac::restore`. > > Potential issues: > > * We use `putenv` to modify environment variables, which expects `char*`. In this PR, I'm `const_cast`ing from `const char*` to `char*`, since I believe `putenv` doesn't actually modify the string. Is that OK? Maybe rewriting with `setenv` would be better? > * `read_growable_array` is implemented by reading byte-at-a-time from the shared memory. Will it be too slow? I could rewrite it to read everything at once (like the original code did), but the current implementation is cute and I kinda want to keep it. 
Would you mind rebasing the PR? src/hotspot/share/runtime/arguments.cpp line 2287: > 2285: } > 2286: } > 2287: else if (strcmp(key, "jdk.internal.crac.mainArgs") == 0) { Nit: please use One-True-Brace-Style https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#whitespace ------------- PR Review: https://git.openjdk.org/crac/pull/101#pullrequestreview-1700515230 PR Review Comment: https://git.openjdk.org/crac/pull/101#discussion_r1373726907 From rmarchenko at openjdk.org Thu Oct 26 19:41:13 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Thu, 26 Oct 2023 19:41:13 GMT Subject: [crac] RFR: Reset JVM start time and up time on restore (CRaCResetStartTime) [v5] In-Reply-To: <4zHfAOtXP476tJjgt5IblV4uXi3BIGwtB-RrG8_vIw8=.6df06499-8110-41d5-ac46-363afba1e988@github.com> References: <4zHfAOtXP476tJjgt5IblV4uXi3BIGwtB-RrG8_vIw8=.6df06499-8110-41d5-ac46-363afba1e988@github.com> Message-ID: On Thu, 26 Oct 2023 19:31:15 GMT, Anton Kozlov wrote: >> Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixing win32 build > > src/hotspot/share/runtime/globals.hpp line 2001: > >> 1999: product(bool, CRaCResetStartTime, false, RESTORE_SETTABLE, \ >> 2000: "Reset JVM's start time and uptime on restore") \ >> 2001: \ > > Sorry for being late. Since we're evaluating providing the restore time via getUptime(), the option should be true by default, and be diagnostic. > > Nit: the end of lines are not aligned. 
Ok, I will change it in a new PR ------------- PR Review Comment: https://git.openjdk.org/crac/pull/130#discussion_r1373729413 From rvansa at openjdk.org Fri Oct 27 08:07:11 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Fri, 27 Oct 2023 08:07:11 GMT Subject: [crac] Integrated: Close files opened by Decoder before checkpoint In-Reply-To: <6ZZkAT8Gst8w1pQ-y6FV8wm6Fystcs4Wpy0HzelKc-E=.2c029ec7-8ab7-4767-99a2-0feaab574510@github.com> References: <6ZZkAT8Gst8w1pQ-y6FV8wm6Fystcs4Wpy0HzelKc-E=.2c029ec7-8ab7-4767-99a2-0feaab574510@github.com> Message-ID: On Mon, 25 Sep 2023 13:08:24 GMT, Radim Vansa wrote: > Native memory tracking needs to resolve some addresses for stack unwinding and the decoders keep some shared library files open as a cache. Since the decoder instances can be re-allocated anytime this fix just destroys them before a checkpoint. This pull request has now been integrated. Changeset: 1880603b Author: Radim Vansa URL: https://git.openjdk.org/crac/commit/1880603bcb67e88818aa2d12ab8d23067f0fd4d9 Stats: 76 lines in 4 files changed: 76 ins; 0 del; 0 mod Close files opened by Decoder before checkpoint Reviewed-by: akozlov ------------- PR: https://git.openjdk.org/crac/pull/116 From rmarchenko at openjdk.org Fri Oct 27 10:26:08 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Fri, 27 Oct 2023 10:26:08 GMT Subject: [crac] RFR: Changing CRaCResetStartTime flag defaults Message-ID: By the request in this comment https://github.com/openjdk/crac/pull/130/files#r1373722320 ------------- Commit messages: - Changing CRaCResetStartTime flag defaults Changes: https://git.openjdk.org/crac/pull/135/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=135&range=00 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/crac/pull/135.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/135/head:pull/135 PR: https://git.openjdk.org/crac/pull/135 From rmarchenko at openjdk.org Fri Oct 27 12:07:34 2023 From: rmarchenko 
at openjdk.org (Roman Marchenko) Date: Fri, 27 Oct 2023 12:07:34 GMT Subject: [crac] RFR: Changing CRaCResetStartTime flag defaults [v2] In-Reply-To: References: Message-ID: <6RRWH6xcx99cF_4uMeHFsaWrq2USJNU7g4KWnwiDyfc=.fc7edd6a-231e-47a4-92d6-f899093e516e@github.com> > By the request in this comment https://github.com/openjdk/crac/pull/130/files#r1373722320 Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision: Forgot unlocking diag options ------------- Changes: - all: https://git.openjdk.org/crac/pull/135/files - new: https://git.openjdk.org/crac/pull/135/files/b2ed68a2..eb2d3479 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=135&range=01 - incr: https://webrevs.openjdk.org/?repo=crac&pr=135&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/crac/pull/135.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/135/head:pull/135 PR: https://git.openjdk.org/crac/pull/135 From rvansa at openjdk.org Fri Oct 27 13:20:06 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Fri, 27 Oct 2023 13:20:06 GMT Subject: [crac] RFR: Fix CPUFeatures crash on new->old CPU In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 15:09:58 GMT, Jan Kratochvil wrote: > - reproducible on i7-1165G7 "qemu-kvm -cpu host" for a checkpoint and "qemu-kvm -cpu SandyBridge" for its restore src/hotspot/share/runtime/crac.cpp line 526: > 524: // crac_restore_finalize() may terminate the process if we run on (older) CPU where glibc string functions may crash. > 525: // The flag is stored separately as all the code of this function below is difficult to implement without the string functions. > 526: bool IgnoreCPUFeatures_local; In general, please use snake_case for local variables, even though it refers to VM option. Rather than individual 'bits', can we make this at least part of the `header` structure? 
We could read that one without using any string functions, and then the variable-size rest of the data (which requires string funcs). The size being read can be part of the `header` structure rather than stat-ing the file. Also, there is `read_all` function in `crac_linux.cpp` (I am moving that to a shared place in some PRs) that deals better with `::read` returning only partial result. src/hotspot/share/runtime/crac.cpp line 532: > 530: } > 531: if (!IgnoreCPUFeatures_local) { > 532: VM_Version::crac_restore_finalize(); Is this function idempotent? It's called from `VM_Crac::doit()` as well (and it should be since reading the shm is optional), but could calling it twice cause any trouble? src/hotspot/share/runtime/crac_structs.hpp line 225: > 223: assert(id > 0, "id is expected to be a PID and therefore > 0"); > 224: char *d = _path; > 225: const char prefix[] = "/tmp/cracshm."; Have you investigated any other option for transferring the data? While both `/tmp/` and `/dev/shm` are temp files, using `shm_open` looks like the go-to approach for sharing data. I dislike hardcoding the `/tmp` path, though it might be viable since perfMemory also has it hardcoded? In any case, this code is not linux-specific (this should be implemented in `crac_posix.cpp`, I suppose). 
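To illustrate the point about `::read` returning only a partial result: a loop of the following shape is the usual way to fill a fixed-size buffer, such as a header read before the variable-size remainder. This is a minimal sketch under assumptions, not the `read_all` from `crac_linux.cpp`; the name `read_fully` and its signature are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical stand-in for a read_all-style helper (not the HotSpot code).
// Keeps reading until `size` bytes have arrived; returns false on EOF or on
// any error other than an interrupted call.
static bool read_fully(int fd, void* buf, size_t size) {
  assert(buf != nullptr || size == 0);
  char* dst = static_cast<char*>(buf);
  size_t done = 0;
  while (done < size) {
    ssize_t n = ::read(fd, dst + done, size - done);
    if (n > 0) {
      done += static_cast<size_t>(n);  // partial read, continue with the rest
    } else if (n == 0) {
      return false;                    // EOF before the buffer was filled
    } else if (errno != EINTR) {
      return false;                    // real error
    }
    // n < 0 with EINTR: simply retry
  }
  return true;
}
```

With a helper like this, the fixed-size `header` could be read first, and the size stored in it could then drive a second call for the remaining data, so no `stat` on the file would be needed.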
------------- PR Review Comment: https://git.openjdk.org/crac/pull/134#discussion_r1374550270 PR Review Comment: https://git.openjdk.org/crac/pull/134#discussion_r1374553073 PR Review Comment: https://git.openjdk.org/crac/pull/134#discussion_r1374526685 From jkratochvil at openjdk.org Fri Oct 27 13:33:04 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Fri, 27 Oct 2023 13:33:04 GMT Subject: [crac] RFR: Fix CPUFeatures crash on new->old CPU In-Reply-To: References: Message-ID: <2YuvgXzzcgqS9X1RPbB_JRhAKDRm6CdLcUOfSgMNTeo=.5e6d236d-dc6e-46c7-8bb1-7c46753832d2@github.com> On Fri, 27 Oct 2023 12:45:53 GMT, Radim Vansa wrote: >> - reproducible on i7-1165G7 "qemu-kvm -cpu host" for a checkpoint and "qemu-kvm -cpu SandyBridge" for its restore > > src/hotspot/share/runtime/crac_structs.hpp line 225: > >> 223: assert(id > 0, "id is expected to be a PID and therefore > 0"); >> 224: char *d = _path; >> 225: const char prefix[] = "/tmp/cracshm."; > > Have you investigated any other option for transferring the data? While both `/tmp/` and `/dev/shm` are temp files, using `shm_open` looks like the go-to approach for sharing data. I dislike hardcoding the `/tmp` path, though it might be viable since perfMemory also has it hardcoded? In any case, this code is not linux-specific (this should be implemented in `crac_posix.cpp`, I suppose). `shm_open` calls some string functions. So with the current (mis)design of CPUFeatures it cannot be used. One could call directly the syscall but that is too ugly. And then I do not see why to use SHM with the current design. One process creates the object, writes its data, closes the object. Second process opens the object, deletes the object, reads the data, closes the object. That is a normal usage of a tmp file. Shared memory has its use for some simultaneous data exchange but this is not the case so shared memory is overengineered. There is also limited amount of shared memory available so it is wasteful. 
Linux distributions nowadays even mount /tmp as tmpfs by default, so no disk operations are involved. I agree "/tmp" should not be in this platform-independent file; I will fix that, thanks. There are more portable functions for temporary files (`mktemp()` et al.) but these all use string functions. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/134#discussion_r1374589002 From jkratochvil at openjdk.org Fri Oct 27 13:50:06 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Fri, 27 Oct 2023 13:50:06 GMT Subject: [crac] RFR: Fix CPUFeatures crash on new->old CPU In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 13:08:25 GMT, Radim Vansa wrote: >> - reproducible on i7-1165G7 "qemu-kvm -cpu host" for a checkpoint and "qemu-kvm -cpu SandyBridge" for its restore > > src/hotspot/share/runtime/crac.cpp line 532: > >> 530: } >> 531: if (!IgnoreCPUFeatures_local) { >> 532: VM_Version::crac_restore_finalize(); > > Is this function idempotent? It's called from `VM_Crac::doit()` as well (and it should be since reading the shm is optional), but could calling it twice cause any trouble? It is, but only by luck. At least one comment there is wrong. I will improve it, thanks.
------------- PR Review Comment: https://git.openjdk.org/crac/pull/134#discussion_r1374611831 From akozlov at openjdk.org Fri Oct 27 14:36:59 2023 From: akozlov at openjdk.org (Anton Kozlov) Date: Fri, 27 Oct 2023 14:36:59 GMT Subject: [crac] RFR: Changing CRaCResetStartTime flag defaults [v2] In-Reply-To: <6RRWH6xcx99cF_4uMeHFsaWrq2USJNU7g4KWnwiDyfc=.fc7edd6a-231e-47a4-92d6-f899093e516e@github.com> References: <6RRWH6xcx99cF_4uMeHFsaWrq2USJNU7g4KWnwiDyfc=.fc7edd6a-231e-47a4-92d6-f899093e516e@github.com> Message-ID: On Fri, 27 Oct 2023 12:07:34 GMT, Roman Marchenko wrote: >> By the request in this comment https://github.com/openjdk/crac/pull/130/files#r1373722320 > > Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision: > > Forgot unlocking diag options LGTM, thank you! ------------- Marked as reviewed by akozlov (Lead). PR Review: https://git.openjdk.org/crac/pull/135#pullrequestreview-1701947532 From rvansa at openjdk.org Mon Oct 30 08:15:04 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 30 Oct 2023 08:15:04 GMT Subject: [crac] RFR: Persist memory in-JVM [v13] In-Reply-To: References: Message-ID: On Mon, 16 Oct 2023 11:59:13 GMT, Radim Vansa wrote: >> This is a WIP for persisting various portions of JVM memory from within the process, rather than leaving that up to C/R engine (such as CRIU). In the future this will enable us to optimize loading (theoretically leading to faster startup), compress and encrypt the data. >> >> At this moment the implementation of persisting thread stacks is in proof-of-concept shape, ~especially waking up the primordial thread includes some hacks. This could be improved by using a custom (global, ideally robust) futex instead of the internal futex used by `pthread_join`.~ Fix already implemented. 
>> >> ~One of the concerns related to thread stacks is rseq used by glibc; without disabling this CRIU would attempt to 'fix' the rseqs (see `fixup_thread_rseq`) and touch the unmapped memory. CRIU uses special ptrace commands to check the status; I am not aware if it is possible to access this information from within the process using any public API.~ Solved. The JVM forks and the child ptraces JVM, recording the rseq info. Once we have the we can unregister the rseq before checkpoint and register it afterwards (here we have the advantage that we know the threads won't be in any critical section as we're in a safepoint). >> ~Currently this works with `/proc/sys/kernel/yama/ptrace_scope` set to `0`; we should make it work with `1` (default), too.~ Fixed. >> >> Regarding persistence implementation, currently we store the memory in multiple files; first block (page-size aligned) contains some validation data and index of memory address - file offsets. The way this is implemented requires the size of index to be known before dumping memory. It might be more convenient (and portable for e.g. network-based storage) to use single file, and either keep the index in the 'fundamental' memory (C heap), put it at the end of file or to another index file. > > Radim Vansa has updated the pull request incrementally with three additional commits since the last revision: > > - Revert changes not needed anymore in vm_version > - Windows build fix > - Refactored to fix OSX Closing this as we want to consider alternative approach, less invasive to the JVM. 
------------- PR Comment: https://git.openjdk.org/crac/pull/95#issuecomment-1784679006 From rvansa at openjdk.org Mon Oct 30 08:15:04 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 30 Oct 2023 08:15:04 GMT Subject: [crac] Withdrawn: Persist memory in-JVM In-Reply-To: References: Message-ID: On Fri, 28 Jul 2023 17:07:22 GMT, Radim Vansa wrote: > This is a WIP for persisting various portions of JVM memory from within the process, rather than leaving that up to C/R engine (such as CRIU). In the future this will enable us to optimize loading (theoretically leading to faster startup), compress and encrypt the data. > > At this moment the implementation of persisting thread stacks is in proof-of-concept shape, ~especially waking up the primordial thread includes some hacks. This could be improved by using a custom (global, ideally robust) futex instead of the internal futex used by `pthread_join`.~ Fix already implemented. > > ~One of the concerns related to thread stacks is rseq used by glibc; without disabling this CRIU would attempt to 'fix' the rseqs (see `fixup_thread_rseq`) and touch the unmapped memory. CRIU uses special ptrace commands to check the status; I am not aware if it is possible to access this information from within the process using any public API.~ Solved. The JVM forks and the child ptraces JVM, recording the rseq info. Once we have the we can unregister the rseq before checkpoint and register it afterwards (here we have the advantage that we know the threads won't be in any critical section as we're in a safepoint). > ~Currently this works with `/proc/sys/kernel/yama/ptrace_scope` set to `0`; we should make it work with `1` (default), too.~ Fixed. > > Regarding persistence implementation, currently we store the memory in multiple files; first block (page-size aligned) contains some validation data and index of memory address - file offsets. The way this is implemented requires the size of index to be known before dumping memory. 
It might be more convenient (and portable for e.g. network-based storage) to use a single file, and either keep the index in the 'fundamental' memory (C heap), put it at the end of the file, or put it in a separate index file. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/crac/pull/95 From rvansa at openjdk.org Mon Oct 30 08:17:07 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 30 Oct 2023 08:17:07 GMT Subject: [crac] Withdrawn: If std descriptor is bound with regular file, redirect it to pseudo terminal before the checkpoint In-Reply-To: References: Message-ID: On Fri, 17 Jun 2022 14:22:37 GMT, Ilarion Nakonechnyy wrote: > CRIU restore fails after a checkpoint taken with stdout redirected to a file. > Redirecting stdout to a pipe does not break the restore. > > The proposed approach forcibly closes the stdout, stdin, and stderr file descriptors during checkpoint resource verification if they are redirected to files. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/crac/pull/24 From jkratochvil at openjdk.org Mon Oct 30 13:26:19 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Mon, 30 Oct 2023 13:26:19 GMT Subject: [crac] RFR: Fix CPUFeatures crash on new->old CPU [v2] In-Reply-To: References: Message-ID: <_1A6n-o1iH57xmfl4YtnkSRvVRkgQ-DsHJR_kQJ5h5k=.2385876d-f1ec-4c41-8371-a29ab4d2caa8@github.com> > - reproducible on i7-1165G7 "qemu-kvm -cpu host" for a checkpoint and "qemu-kvm -cpu SandyBridge" for its restore Jan Kratochvil has updated the pull request incrementally with four additional commits since the last revision: - Make _ignore_cpu_features a tri-state. - Fix up double call of VM_Version::crac_restore_finalize(). - found by Radim Vansa - Move IgnoreCPUFeatures_local into the header.
- suggested by Radim Vansa - Move CracSHM::_prefix to crac_posix.cpp - suggested by Radim Vansa ------------- Changes: - all: https://git.openjdk.org/crac/pull/134/files - new: https://git.openjdk.org/crac/pull/134/files/d26f972a..f9470978 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=134&range=01 - incr: https://webrevs.openjdk.org/?repo=crac&pr=134&range=00-01 Stats: 56 lines in 5 files changed: 24 ins; 15 del; 17 mod Patch: https://git.openjdk.org/crac/pull/134.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/134/head:pull/134 PR: https://git.openjdk.org/crac/pull/134 From jkratochvil at openjdk.org Mon Oct 30 15:17:58 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Mon, 30 Oct 2023 15:17:58 GMT Subject: [crac] RFR: Fix CPUFeatures crash on new->old CPU [v3] In-Reply-To: References: Message-ID: <-i-OQDAsVq682vmEs5n7XOiziSCKUlwYmq_Vseu5sWE=.ceff798d-d630-47cc-a22b-9d3d5edf9fb6@github.com> > - reproducible on i7-1165G7 "qemu-kvm -cpu host" for a checkpoint and "qemu-kvm -cpu SandyBridge" for its restore Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: Fix non-x86* compilation ------------- Changes: - all: https://git.openjdk.org/crac/pull/134/files - new: https://git.openjdk.org/crac/pull/134/files/f9470978..f66a71ae Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=134&range=02 - incr: https://webrevs.openjdk.org/?repo=crac&pr=134&range=01-02 Stats: 14 lines in 9 files changed: 2 ins; 2 del; 10 mod Patch: https://git.openjdk.org/crac/pull/134.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/134/head:pull/134 PR: https://git.openjdk.org/crac/pull/134 From jkratochvil at openjdk.org Mon Oct 30 15:40:53 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Mon, 30 Oct 2023 15:40:53 GMT Subject: [crac] RFR: Fix CPUFeatures crash on new->old CPU [v4] In-Reply-To: References: Message-ID: > - reproducible on i7-1165G7 "qemu-kvm -cpu host" for a 
checkpoint and "qemu-kvm -cpu SandyBridge" for its restore Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: Fix MSVC compilation ------------- Changes: - all: https://git.openjdk.org/crac/pull/134/files - new: https://git.openjdk.org/crac/pull/134/files/f66a71ae..1b24fed5 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=134&range=03 - incr: https://webrevs.openjdk.org/?repo=crac&pr=134&range=02-03 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/crac/pull/134.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/134/head:pull/134 PR: https://git.openjdk.org/crac/pull/134 From jkratochvil at openjdk.org Tue Oct 31 00:13:23 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Tue, 31 Oct 2023 00:13:23 GMT Subject: [crac] RFR: Fix CPUFeatures crash on new->old CPU [v5] In-Reply-To: References: Message-ID: <2i-sgzbKFF7PDRW4_LAJsYGG1dlEZNPtHTDT28pDkBQ=.5dc6b33d-375e-4c61-b497-7e9c8318d7fd@github.com> > - reproducible on i7-1165G7 "qemu-kvm -cpu host" for a checkpoint and "qemu-kvm -cpu SandyBridge" for its restore Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: Fix github compilation signedness problem ------------- Changes: - all: https://git.openjdk.org/crac/pull/134/files - new: https://git.openjdk.org/crac/pull/134/files/1b24fed5..c1983089 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=134&range=04 - incr: https://webrevs.openjdk.org/?repo=crac&pr=134&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/crac/pull/134.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/134/head:pull/134 PR: https://git.openjdk.org/crac/pull/134 From jkratochvil at openjdk.org Tue Oct 31 07:03:25 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Tue, 31 Oct 2023 07:03:25 GMT Subject: [crac] RFR: Compile all the 
INCLUDE_LD_SO_LIST_DIAGNOSTICS code unconditionally Message-ID: <2mqoLQRftHj0yP4KYkOY-G6s1zFq2pu4ozfB5zpBAq8=.b935db48-4bae-48c1-84bd-841486ab121c@github.com> Make INCLUDE_LD_SO_LIST_DIAGNOSTICS an `if` conditional instead of a `#if` one. Fix compilation failures. ------------- Commit messages: - Compile all the INCLUDE_LD_SO_LIST_DIAGNOSTICS code unconditionally Changes: https://git.openjdk.org/crac/pull/136/files Webrev: https://webrevs.openjdk.org/?repo=crac&pr=136&range=00 Stats: 91 lines in 1 file changed: 12 ins; 13 del; 66 mod Patch: https://git.openjdk.org/crac/pull/136.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/136/head:pull/136 PR: https://git.openjdk.org/crac/pull/136 From rvansa at openjdk.org Tue Oct 31 07:03:28 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Tue, 31 Oct 2023 07:03:28 GMT Subject: [crac] RFR: Compile all the INCLUDE_LD_SO_LIST_DIAGNOSTICS code unconditionally In-Reply-To: <2mqoLQRftHj0yP4KYkOY-G6s1zFq2pu4ozfB5zpBAq8=.b935db48-4bae-48c1-84bd-841486ab121c@github.com> References: <2mqoLQRftHj0yP4KYkOY-G6s1zFq2pu4ozfB5zpBAq8=.b935db48-4bae-48c1-84bd-841486ab121c@github.com> Message-ID: On Tue, 31 Oct 2023 06:28:04 GMT, Jan Kratochvil wrote: > Make INCLUDE_LD_SO_LIST_DIAGNOSTICS an `if` conditional instead of a `#if` one. > Fix compilation failures. While I like the certainty that the code is always compilable, I wonder if handling optional features like this (using macros in conditions) is a pattern that is used elsewhere in the JDK? src/hotspot/cpu/x86/vm_version_x86.cpp line 856: > 854: } > 855: > 856: //#if INCLUDE_LD_SO_LIST_DIAGNOSTICS Please remove commented out code. src/hotspot/cpu/x86/vm_version_x86.cpp line 944: > 942: const char *env = getenv(TUNABLES_NAME); > 943: if (env && strcmp(env, env_val) == 0) { > 944: if (!INCLUDE_CPU_FEATURE_ACTIVE && !INCLUDE_LD_SO_LIST_DIAGNOSTICS) { Have you tested compilation with these disabled? 
Looks to me like the compiler could complain about unreachable code when the condition is always false... ------------- PR Review: https://git.openjdk.org/crac/pull/136#pullrequestreview-1705554187 PR Review Comment: https://git.openjdk.org/crac/pull/136#discussion_r1377123401 PR Review Comment: https://git.openjdk.org/crac/pull/136#discussion_r1377126065 From jkratochvil at openjdk.org Tue Oct 31 07:07:58 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Tue, 31 Oct 2023 07:07:58 GMT Subject: [crac] RFR: Compile all the INCLUDE_LD_SO_LIST_DIAGNOSTICS code unconditionally In-Reply-To: References: <2mqoLQRftHj0yP4KYkOY-G6s1zFq2pu4ozfB5zpBAq8=.b935db48-4bae-48c1-84bd-841486ab121c@github.com> Message-ID: <-CVBLLzC2LgrvDHYxmrcMW7PbQ3h599J1HraCYvVIqs=.6556bc9f-f743-4195-8c07-1a216be6bb64@github.com> On Tue, 31 Oct 2023 06:52:47 GMT, Radim Vansa wrote: >> Make INCLUDE_LD_SO_LIST_DIAGNOSTICS an `if` conditional instead of a `#if` one. >> Fix compilation failures. > > src/hotspot/cpu/x86/vm_version_x86.cpp line 856: > >> 854: } >> 855: >> 856: //#if INCLUDE_LD_SO_LIST_DIAGNOSTICS > > Please remove commented out code. I wanted to indicate that this function is used only for `INCLUDE_LD_SO_LIST_DIAGNOSTICS`. I can also put a full sentence there.
------------- PR Review Comment: https://git.openjdk.org/crac/pull/136#discussion_r1377132708 From rvansa at openjdk.org Tue Oct 31 07:14:57 2023 From: rvansa at openjdk.org (Radim Vansa) Date: Tue, 31 Oct 2023 07:14:57 GMT Subject: [crac] RFR: Changing CRaCResetStartTime flag defaults [v2] In-Reply-To: <6RRWH6xcx99cF_4uMeHFsaWrq2USJNU7g4KWnwiDyfc=.fc7edd6a-231e-47a4-92d6-f899093e516e@github.com> References: <6RRWH6xcx99cF_4uMeHFsaWrq2USJNU7g4KWnwiDyfc=.fc7edd6a-231e-47a4-92d6-f899093e516e@github.com> Message-ID: On Fri, 27 Oct 2023 12:07:34 GMT, Roman Marchenko wrote: >> By the request in this comment https://github.com/openjdk/crac/pull/130/files#r1373722320 > > Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision: > > Forgot unlocking diag options @wkia Sorry to chime in rather late, but now it seems that this flag can be set only before checkpoint; why is that? Is there an issue in making this both diagnostic and restore-settable? (I guess that `UnlockDiagnosticOptions` might need to become restore-settable, too, but I don't see an issue with that) ------------- PR Comment: https://git.openjdk.org/crac/pull/135#issuecomment-1786589857 From jkratochvil at openjdk.org Tue Oct 31 07:40:22 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Tue, 31 Oct 2023 07:40:22 GMT Subject: [crac] RFR: Compile all the INCLUDE_LD_SO_LIST_DIAGNOSTICS code unconditionally [v2] In-Reply-To: <2mqoLQRftHj0yP4KYkOY-G6s1zFq2pu4ozfB5zpBAq8=.b935db48-4bae-48c1-84bd-841486ab121c@github.com> References: <2mqoLQRftHj0yP4KYkOY-G6s1zFq2pu4ozfB5zpBAq8=.b935db48-4bae-48c1-84bd-841486ab121c@github.com> Message-ID: > Make INCLUDE_LD_SO_LIST_DIAGNOSTICS an `if` conditional instead of a `#if` one. > Fix compilation failures. 
Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: Fix compilation errors during all the 3 configurations: #undef INCLUDE_CPU_FEATURE_ACTIVE #undef INCLUDE_LD_SO_LIST_DIAGNOSTICS #define INCLUDE_CPU_FEATURE_ACTIVE 0/1 #define INCLUDE_LD_SO_LIST_DIAGNOSTICS 0/1 - but never 1 for both ------------- Changes: - all: https://git.openjdk.org/crac/pull/136/files - new: https://git.openjdk.org/crac/pull/136/files/0b4f6692..c1769a92 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=136&range=01 - incr: https://webrevs.openjdk.org/?repo=crac&pr=136&range=00-01 Stats: 69 lines in 2 files changed: 22 ins; 27 del; 20 mod Patch: https://git.openjdk.org/crac/pull/136.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/136/head:pull/136 PR: https://git.openjdk.org/crac/pull/136 From jkratochvil at openjdk.org Tue Oct 31 07:40:25 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Tue, 31 Oct 2023 07:40:25 GMT Subject: [crac] RFR: Compile all the INCLUDE_LD_SO_LIST_DIAGNOSTICS code unconditionally [v2] In-Reply-To: References: <2mqoLQRftHj0yP4KYkOY-G6s1zFq2pu4ozfB5zpBAq8=.b935db48-4bae-48c1-84bd-841486ab121c@github.com> Message-ID: On Tue, 31 Oct 2023 06:56:41 GMT, Radim Vansa wrote: >> Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix compilation errors during all the 3 configurations: >> #undef INCLUDE_CPU_FEATURE_ACTIVE >> #undef INCLUDE_LD_SO_LIST_DIAGNOSTICS >> #define INCLUDE_CPU_FEATURE_ACTIVE 0/1 >> #define INCLUDE_LD_SO_LIST_DIAGNOSTICS 0/1 >> - but never 1 for both > > src/hotspot/cpu/x86/vm_version_x86.cpp line 944: > >> 942: const char *env = getenv(TUNABLES_NAME); >> 943: if (env && strcmp(env, env_val) == 0) { >> 944: if (!INCLUDE_CPU_FEATURE_ACTIVE && !INCLUDE_LD_SO_LIST_DIAGNOSTICS) { > > Have you tested compilation with these disabled? 
Looks to me like the compiler could complain about unreachable code when the condition is always false... I have only now tried all 3 compilation cases. I do not see such compiler warnings (although different compilations and compilers produce different warnings), but I was able to fix some other problems, which will make me update #112. ------------- PR Review Comment: https://git.openjdk.org/crac/pull/136#discussion_r1377162882 From jkratochvil at openjdk.org Tue Oct 31 08:02:19 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Tue, 31 Oct 2023 08:02:19 GMT Subject: [crac] RFR: Compile all the INCLUDE_LD_SO_LIST_DIAGNOSTICS code unconditionally [v3] In-Reply-To: <2mqoLQRftHj0yP4KYkOY-G6s1zFq2pu4ozfB5zpBAq8=.b935db48-4bae-48c1-84bd-841486ab121c@github.com> References: <2mqoLQRftHj0yP4KYkOY-G6s1zFq2pu4ozfB5zpBAq8=.b935db48-4bae-48c1-84bd-841486ab121c@github.com> Message-ID: > Make INCLUDE_LD_SO_LIST_DIAGNOSTICS an `if` conditional instead of a `#if` one. > Fix compilation failures.
Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: More compilation cases fixes ------------- Changes: - all: https://git.openjdk.org/crac/pull/136/files - new: https://git.openjdk.org/crac/pull/136/files/c1769a92..f628f71f Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=136&range=02 - incr: https://webrevs.openjdk.org/?repo=crac&pr=136&range=01-02 Stats: 9 lines in 1 file changed: 1 ins; 0 del; 8 mod Patch: https://git.openjdk.org/crac/pull/136.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/136/head:pull/136 PR: https://git.openjdk.org/crac/pull/136 From rmarchenko at openjdk.org Tue Oct 31 08:44:00 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Tue, 31 Oct 2023 08:44:00 GMT Subject: [crac] RFR: Changing CRaCResetStartTime flag defaults [v2] In-Reply-To: References: <6RRWH6xcx99cF_4uMeHFsaWrq2USJNU7g4KWnwiDyfc=.fc7edd6a-231e-47a4-92d6-f899093e516e@github.com> Message-ID: On Tue, 31 Oct 2023 07:12:07 GMT, Radim Vansa wrote: >> Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision: >> >> Forgot unlocking diag options > > @wkia Sorry to chime in rather late, but now it seems that this flag can be set only before checkpoint; why is that? Is there an issue in making this both diagnostic and restore-settable? (I guess that `UnlockDiagnosticOptions` might need to become restore-settable, too, but I don't see an issue with that) @rvansa You're right, thanks. This also needs to be reviewed by @AntonKozlov. Should we change `UnlockDiagnosticOptions` to `RESTORE_SETTABLE` as well? If not, `CRaCResetStartTime` will be available on checkpoint only as a diagnostic option.
------------- PR Comment: https://git.openjdk.org/crac/pull/135#issuecomment-1786752477 From rmarchenko at openjdk.org Tue Oct 31 09:19:23 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Tue, 31 Oct 2023 09:19:23 GMT Subject: [crac] RFR: Changing CRaCResetStartTime flag defaults [v3] In-Reply-To: References: Message-ID: > By the request in this comment https://github.com/openjdk/crac/pull/130/files#r1373722320 Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision: Fixing review comments ------------- Changes: - all: https://git.openjdk.org/crac/pull/135/files - new: https://git.openjdk.org/crac/pull/135/files/eb2d3479..69e18582 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=135&range=02 - incr: https://webrevs.openjdk.org/?repo=crac&pr=135&range=01-02 Stats: 10 lines in 2 files changed: 3 ins; 2 del; 5 mod Patch: https://git.openjdk.org/crac/pull/135.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/135/head:pull/135 PR: https://git.openjdk.org/crac/pull/135 From jkratochvil at openjdk.org Tue Oct 31 10:23:14 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Tue, 31 Oct 2023 10:23:14 GMT Subject: [crac] RFR: Compile all the INCLUDE_LD_SO_LIST_DIAGNOSTICS code unconditionally [v4] In-Reply-To: <2mqoLQRftHj0yP4KYkOY-G6s1zFq2pu4ozfB5zpBAq8=.b935db48-4bae-48c1-84bd-841486ab121c@github.com> References: <2mqoLQRftHj0yP4KYkOY-G6s1zFq2pu4ozfB5zpBAq8=.b935db48-4bae-48c1-84bd-841486ab121c@github.com> Message-ID: > Make INCLUDE_LD_SO_LIST_DIAGNOSTICS an `if` conditional instead of a `#if` one. > Fix compilation failures. 
Jan Kratochvil has updated the pull request incrementally with two additional commits since the last revision: - Fix compilation on non-Linux OSes - Fix more compilation problems ------------- Changes: - all: https://git.openjdk.org/crac/pull/136/files - new: https://git.openjdk.org/crac/pull/136/files/f628f71f..ac5191c6 Webrevs: - full: https://webrevs.openjdk.org/?repo=crac&pr=136&range=03 - incr: https://webrevs.openjdk.org/?repo=crac&pr=136&range=02-03 Stats: 24 lines in 2 files changed: 9 ins; 5 del; 10 mod Patch: https://git.openjdk.org/crac/pull/136.diff Fetch: git fetch https://git.openjdk.org/crac.git pull/136/head:pull/136 PR: https://git.openjdk.org/crac/pull/136
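For readers following the `if` versus `#if` discussion in pull/136, the difference can be sketched roughly as below. This is only an illustration under stated assumptions: the macro `INCLUDE_DEMO_DIAGNOSTICS` and all function names are hypothetical stand-ins and are not taken from `vm_version_x86.cpp`. With `#if`, the disabled branch is never seen by the compiler proper, so it can silently stop compiling; with a plain `if` on a macro that expands to 0 or 1, both branches are always parsed and type-checked, and an optimizing compiler will typically drop the dead one.

```cpp
#include <cstdio>

// Hypothetical feature macro (an assumption for this sketch).
// A plain `if` needs the macro to expand to a value, so default it to 0
// when the build does not define it.
#ifndef INCLUDE_DEMO_DIAGNOSTICS
#define INCLUDE_DEMO_DIAGNOSTICS 0
#endif

static int diagnostics_calls = 0;

// Hypothetical helper that only matters when the feature is enabled.
static void print_diagnostics() {
  ++diagnostics_calls;
  std::puts("diagnostics enabled");
}

// Preprocessor variant: when the macro is 0 the call below is removed
// before compilation, so errors inside that branch go unnoticed.
int run_with_preprocessor() {
#if INCLUDE_DEMO_DIAGNOSTICS
  print_diagnostics();
  return 1;
#else
  return 0;
#endif
}

// Plain `if` variant: the branch is always compiled, whatever the macro
// value is, and is simply never taken when the macro is 0.
int run_with_plain_if() {
  if (INCLUDE_DEMO_DIAGNOSTICS) {
    print_diagnostics();
    return 1;
  }
  return 0;
}
```

Both functions behave the same at run time for a given macro value; the trade-off raised in the review is that the plain `if` form keeps the code compilable in every configuration, at the price of possible warnings about constant conditions on some compilers.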