From duke at openjdk.org Mon Jan 30 03:30:52 2023
From: duke at openjdk.org (Jan Kratochvil)
Date: Mon, 30 Jan 2023 03:30:52 GMT
Subject: [crac] RFR: RFC: -XX:CPUFeatures=0xnumber for CPU migration
Message-ID:

Currently, if you run `-XX:CRaCCheckpointTo` on a better CPU and `-XX:CRaCRestoreFrom` on a worse CPU, the restored OpenJDK will crash.

1. An obvious reason is that the JIT-compiled code uses CPU features not implemented on the CPU where the image is restored.
2. A second reason is that glibc has a similar problem: its PLT entries point to CPU-optimized functions, which also crash on the worse CPU. https://sourceware.org/glibc/wiki/GNU_IFUNC

(1) could be solved automatically by deoptimizing and re-JITing all the JIT code. But that would defeat the performance goal of restoring a ready image in the first place. Therefore a new OpenJDK option had to be implemented:

> use -XX:CPUFeatures=0xnumber with -XX:CRaCCheckpointTo when you get an error during -XX:CRaCRestoreFrom on a different machine

It is intended to specify the lowest common denominator of all CPUs in a farm. Instead of possibly crashing, OpenJDK will now refuse to run:

> Error occurred during initialization of VM
> You have to specify -XX:CPUFeatures=0x421801fcfbd7 during -XX:CRaCCheckpointTo making of the checkpoint; specified -XX:CRaCRestoreFrom file contains CPU features 0x7fff9dfcfbf7; this machine's CPU features are 0x421801fcfbd7; missing features of this CPU are 0x3de79c000020 = 3dnowpref, adx, avx512f, avx512dq, avx512cd, avx512bw, avx512vl, sha, avx512_vpopcntdq, avx512_vpclmulqdq, avx512_vaes, avx512_vnni, clflushopt, clwb, avx512_vbmi2, avx512_vbmi

(2) has been implemented according to Anton Kozlov's idea that glibc can reset its IFUNC PLT entries at any later time (after restore), not just during the first initialization of glibc. The current problem is that this has turned out to be very invasive into private glibc structures.
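The masks in the error message above can be checked by hand. The following is a minimal sketch, assuming the feature sets are plain bitmasks as the hexadecimal values suggest; the class and method names are hypothetical and are not part of the patch.

```java
// Sketch only: CpuFeatureMask and its methods are hypothetical names,
// not code from the pull request.
final class CpuFeatureMask {
    /** Bits set in the checkpoint image but absent on the restoring CPU. */
    static long missing(long imageFeatures, long machineFeatures) {
        return imageFeatures & ~machineFeatures;
    }

    /** Lowest common denominator: bits supported by every given CPU. */
    static long common(long... machineFeatures) {
        long result = -1L; // all bits set; each AND can only clear bits
        for (long features : machineFeatures) {
            result &= features;
        }
        return result;
    }

    public static void main(String[] args) {
        long image = 0x7fff9dfcfbf7L;   // features recorded in the checkpoint
        long machine = 0x421801fcfbd7L; // features of the restoring CPU
        System.out.printf("missing = 0x%x%n", missing(image, machine));
        System.out.printf("common  = 0x%x%n", common(image, machine));
    }
}
```

With the two values from the message, `missing` yields 0x3de79c000020 (16 bits set, matching the 16 listed feature names) and `common` yields 0x421801fcfbd7, the value the message suggests passing to `-XX:CPUFeatures`.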
It could work with glibc debuginfo (*-debuginfo.rpm or *-dbg.deb) installed, but that has been considered an unacceptable requirement just to run CRaC. Therefore I have provided this proof of concept, and I will propose such a feature to glibc upstream, where it is surely easy to implement.

If upstream glibc maintainers do not like the IFUNC reset idea, then I do not think this hacky IFUNC reset, which patches many glibc internal data structures, is a good way forward for a 3rd-party implementation like CRaC/OpenJDK. In that case I believe one should switch to the GLIBC_TUNABLES environment variable, re-execing OpenJDK after converting the `-XX:CPUFeatures` OpenJDK format into the glibc GLIBC_TUNABLES format. Unfortunately there is a precedent: OpenJDK upstream has already rejected such a re-exec idea in the past: https://github.com/openjdk/crac/pull/31#issuecomment-1275707621
That IMO does not preclude trying the same for this case.

- Debian 11 x86_64: It does not work, glibc is too different and inlined there.
- Debian 12 x86_64: It works even without libc6-dbg as its offsets are the default.
- Fedora 36 x86_64: It works, as on Fedora the glibc debuginfo is embedded.

-------------

Commit messages:
 - whitespaces
 - ifunc: Fix GDB parsing.
 - +ifunc comment
 - +linux_ifunc_fetch_offsets()
 - Remove a needless workaround.
 - Adjust version dependent #defines.
 - Fixes for debian12.
 - 89757be5:
 - -deoptimization
 - +Freeze tasks
 - ...
and 27 more: https://git.openjdk.org/crac/compare/65e0785e...328e6b5d

Changes: https://git.openjdk.org/crac/pull/41/files
 Webrev: https://webrevs.openjdk.org/?repo=crac&pr=41&range=00
  Stats: 1045 lines in 17 files changed: 1034 ins; 2 del; 9 mod
  Patch: https://git.openjdk.org/crac/pull/41.diff
  Fetch: git fetch https://git.openjdk.org/crac pull/41/head:pull/41

PR: https://git.openjdk.org/crac/pull/41

From rmarchenko at openjdk.org Tue Jan 31 07:26:38 2023
From: rmarchenko at openjdk.org (Roman Marchenko)
Date: Tue, 31 Jan 2023 07:26:38 GMT
Subject: [crac] RFR: Environment vars propagation into restored process
In-Reply-To:
References: <4bABOyN_ecZeOziPY8Wsmfbco0VPwgYWNiD4IZFSntA=.3ca718e6-f1db-469e-a466-b55fe9ebde17@github.com>
Message-ID:

On Mon, 3 Oct 2022 15:21:05 GMT, Dan Heidinga wrote:

> We should also consider doing something similar to the OpenJ9 approach where we restrict the set of env vars available prior to the checkpoint (minimize the accidental use of checkpoint env), and limit the env var changes to only add new env vars (no inconsistencies). This got them a long ways in their work with Liberty though they did find it necessary to eventually support overriding some env vars.

Recently I realized that if users do not want some variables to be propagated into a restored process, they can prepare the environment before a restore by setting and/or unsetting any environment variables they want. So there is no need to implement additional solutions in CRaC. I've extended the test with an example (#42).
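Preparing the environment before a restore, as described above, can be sketched from a Java launcher. This is only an illustration under assumptions: the `RestoreLauncher` class, the `cr` image directory, and the `APP_MODE` and `SECRET_TOKEN` variable names are all hypothetical, and the restore command line is assumed to take the form `java -XX:CRaCRestoreFrom=<dir>`.

```java
import java.util.List;
import java.util.Map;

// Sketch only: RestoreLauncher and the variable names used in main() are
// hypothetical, not part of CRaC.
final class RestoreLauncher {
    /** Builds a restore command whose environment has been adjusted beforehand. */
    static ProcessBuilder prepare(String imageDir,
                                  Map<String, String> set,
                                  List<String> unset) {
        ProcessBuilder pb = new ProcessBuilder("java", "-XX:CRaCRestoreFrom=" + imageDir);
        // The map starts as a copy of the current process environment.
        Map<String, String> env = pb.environment();
        unset.forEach(env::remove); // variables that must not reach the restored process
        env.putAll(set);            // variables to add or override
        return pb.inheritIO();
    }

    public static void main(String[] args) {
        ProcessBuilder pb = prepare("cr", Map.of("APP_MODE", "restored"), List.of("SECRET_TOKEN"));
        System.out.println(pb.command());
        System.out.println("APP_MODE=" + pb.environment().get("APP_MODE"));
        // pb.start() would launch the restore; it is omitted so the sketch runs anywhere.
    }
}
```

The same effect can be had from a shell by exporting or unsetting variables before invoking the restore command; the point is only that the filtering happens outside CRaC.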
-------------

PR: https://git.openjdk.org/crac/pull/30

From rmarchenko at openjdk.org Tue Jan 31 07:30:23 2023
From: rmarchenko at openjdk.org (Roman Marchenko)
Date: Tue, 31 Jan 2023 07:30:23 GMT
Subject: [crac] RFR: RestoreEnvironmentTest refactoring
Message-ID:

The test was extended with an example illustrating the scenario where a user does not want to propagate some environment variables into a restored process (see `RESTORE_ENVIRONMENT_TEST_VAR2` in `RestoreEnvironmentTest.sh`).

See the initial discussion in #30.

-------------

Commit messages:
 - RestoreEnvironmentTest is extended with an example

Changes: https://git.openjdk.org/crac/pull/42/files
 Webrev: https://webrevs.openjdk.org/?repo=crac&pr=42&range=00
  Stats: 10 lines in 2 files changed: 5 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/crac/pull/42.diff
  Fetch: git fetch https://git.openjdk.org/crac pull/42/head:pull/42

PR: https://git.openjdk.org/crac/pull/42

From duke at openjdk.org Tue Jan 31 10:34:43 2023
From: duke at openjdk.org (Radim Vansa)
Date: Tue, 31 Jan 2023 10:34:43 GMT
Subject: [crac] RFR: Improved open file descriptors tracking
Message-ID:

Tracks `java.io.FileDescriptor` instances as CRaC resources; before checkpoint these are reported, and if not allow-listed (e.g. those opened as standard descriptors) an exception is thrown. Further investigation can use the system property `jdk.crac.collect-fd-stacktraces=true` to record the origin of those file descriptors.

File descriptors claimed in Java code are passed to native code; the native code checks all open file descriptors and reports an error if there is an unexpected FD that is not included in the list passed previously.

-------------

Commit messages:
 - Avoid claiming invalid FileDescriptor
 - Whitelist RandomAccessFile opening classpath files
 - Add tracking of FD origin
 - Track FileDescriptors closed by NIO
 - Track native FDs from EPoll
 - Ignore FileDescriptors closed externally
 - Better FD, does not work
 - Finish claimFD
 - WIP
 - WIP
 - ...
and 1 more: https://git.openjdk.org/crac/compare/65e0785e...432c3f73

Changes: https://git.openjdk.org/crac/pull/43/files
 Webrev: https://webrevs.openjdk.org/?repo=crac&pr=43&range=00
  Stats: 602 lines in 26 files changed: 361 ins; 204 del; 37 mod
  Patch: https://git.openjdk.org/crac/pull/43.diff
  Fetch: git fetch https://git.openjdk.org/crac pull/43/head:pull/43

PR: https://git.openjdk.org/crac/pull/43

From duke at openjdk.org Tue Jan 31 17:11:24 2023
From: duke at openjdk.org (Radim Vansa)
Date: Tue, 31 Jan 2023 17:11:24 GMT
Subject: [crac] RFR: Improved open file descriptors tracking [v2]
In-Reply-To:
References:
Message-ID:

> Tracks `java.io.FileDescriptor` instances as CRaC resource; before checkpoint these are reported and if not allow-listed (e.g. as opened as standard descriptors) an exception is thrown. Further investigation can use system property `jdk.crac.collect-fd-stacktraces=true` to record origin of those file descriptors.
> File descriptors claimed in Java code are passed to native; native code checks all open file descriptors and reports error if there's an unexpected FD that is not included in the list passed previously.

Radim Vansa has updated the pull request incrementally with one additional commit since the last revision:

  Drop native FDs tracking

-------------

Changes:
  - all: https://git.openjdk.org/crac/pull/43/files
  - new: https://git.openjdk.org/crac/pull/43/files/432c3f73..91b223ef

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=crac&pr=43&range=01
 - incr: https://webrevs.openjdk.org/?repo=crac&pr=43&range=00-01

  Stats: 93 lines in 6 files changed: 0 ins; 89 del; 4 mod
  Patch: https://git.openjdk.org/crac/pull/43.diff
  Fetch: git fetch https://git.openjdk.org/crac pull/43/head:pull/43

PR: https://git.openjdk.org/crac/pull/43
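The check described in the quoted PR text, comparing the descriptors that are actually open against those that were claimed or allow-listed, can be illustrated with a small sketch. It is not the PR's code: the `FdCheck` class and its methods are hypothetical, and reading `/proc/self/fd` is an assumption that only holds on Linux.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: FdCheck and its methods are hypothetical names, not CRaC code.
final class FdCheck {
    /** Descriptors that are open but were neither claimed nor allow-listed. */
    static Set<Integer> unexpected(Set<Integer> open, Set<Integer> claimed) {
        Set<Integer> result = new TreeSet<>(open);
        result.removeAll(claimed);
        return result;
    }

    /** Linux-only: lists this process's descriptors from /proc/self/fd. */
    static Set<Integer> openFds() throws IOException {
        Set<Integer> fds = new TreeSet<>();
        try (DirectoryStream<Path> dir = Files.newDirectoryStream(Path.of("/proc/self/fd"))) {
            for (Path entry : dir) {
                fds.add(Integer.parseInt(entry.getFileName().toString()));
            }
        }
        // The descriptor used to read the directory may itself appear in the result.
        return fds;
    }

    public static void main(String[] args) throws IOException {
        Set<Integer> claimed = Set.of(0, 1, 2); // standard streams, treated as allow-listed
        System.out.println("unexpected FDs: " + unexpected(openFds(), claimed));
    }
}
```

Any descriptor reported by such a check would then be a candidate for investigation with `-Djdk.crac.collect-fd-stacktraces=true`, the system property named in the description.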