[crac] RFR: 8355974: [CRaC] Move CPUFeatures verification to the parent process of JVM [v16]
Timofei Pushkin
tpushkin at openjdk.org
Mon May 19 08:21:23 UTC 2025
On Fri, 16 May 2025 12:43:58 GMT, Jan Kratochvil <jkratochvil at openjdk.org> wrote:
>> There was originally a mistake:
>> - restoring JVM did restore the image
>> - the restored JVM started checking whether CPU Features of the new host >= CPU Features of the checkpoint host
>>
>> That is difficult as glibc is already configured (IFUNC) in the image for the checkpoint host and calling any such glibc functions in the restored image will crash (as the advanced instructions from misconfigured IFUNC are not available). Some glibc functions had to be reimplemented in a dummy way inside JVM due to this misdesign.
>>
>> This patch changes it to:
>> - restoring JVM checks `cpufeatures` user data in the image against current CPU Features
>> - the restored JVM is started only if the CPU Features are satisfied, restored JVM no longer has to verify anything
>>
>> The patch is a bit of a kitchen sink, there are various improvements of the CPU Features code.
>
> Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision:
>
> Wrap cpufeatures_store() by VM_Version::ignore_cpu_features()
The runtime part mostly LGTM given that we've decided to address cleanup of argument parsing and CPU features ignoring in separate patches. The other parts I've reviewed only briefly.
src/hotspot/cpu/x86/vm_version_x86.cpp line 2613:
> 2611:
> 2612: bool VM_Version::cpu_features_binary_check(const CPUFeaturesBinary *data) {
> 2613: ResourceMark rm;
Is this `ResourceMark` still needed? I can't see where it might be used.
src/hotspot/share/runtime/crac.cpp line 123:
> 121: VM_Version::CPUFeaturesBinary data;
> 122: if (!VM_Version::cpu_features_binary(&data)) {
> 123: // This backend does not use CPUFeatures. That is OK.
In [my suggestion](https://github.com/openjdk/crac/pull/227#discussion_r2087372826) I placed this check before `_engine->prepare_user_data_api()` because if the backend doesn't support CPU features there is no point in querying the engine for its support (and possibly erroring on `CracEngine::ApiStatus::ERR`): we would ask the user to fix the engine issue only for them to find out their backend cannot use this feature anyway.
But since this can be bypassed by ignoring CPU features I'm OK with postponing fixing this together with that.
test/jdk/jdk/crac/CPUFeatures/CPUFeatures.sh line 36:
> 34: qemuimgurl=https://download.fedoraproject.org/pub/fedora/linux/releases/41/Cloud/x86_64/images/Fedora-Cloud-Base-Generic-41-1.4.x86_64.qcow2
> 35: qemuimgsumurl=https://download.fedoraproject.org/pub/fedora/linux/releases/41/Cloud/x86_64/images/Fedora-Cloud-41-1.4-x86_64-CHECKSUM
> 36: # FIXME: warp+criu need an update for new kernels:
OpenJDK only has CRIU, so it was decided not to have any mentions of other engines and their exclusive features in OpenJDK codebase. Could you please remove the mentions here and in other places?
-------------
PR Review: https://git.openjdk.org/crac/pull/227#pullrequestreview-2849746499
PR Review Comment: https://git.openjdk.org/crac/pull/227#discussion_r2095058575
PR Review Comment: https://git.openjdk.org/crac/pull/227#discussion_r2095066392
PR Review Comment: https://git.openjdk.org/crac/pull/227#discussion_r2095116881
More information about the crac-dev
mailing list