[crac] RFR: 8355974: [CRaC] Move CPUFeatures verification to the parent process of JVM [v5]

Timofei Pushkin tpushkin at openjdk.org
Wed Apr 30 13:55:58 UTC 2025


On Wed, 30 Apr 2025 12:40:16 GMT, Jan Kratochvil <jkratochvil at openjdk.org> wrote:

>> There was originally a mistake:
>> - restoring JVM did restore the image
>> - the restored JVM started checking whether CPU Features of the new host >= CPU Features of the checkpoint host
>> 
>> That is difficult because glibc in the image is already configured (via IFUNC) for the checkpoint host, so calling any such glibc function in the restored image will crash when the advanced instructions selected by the IFUNC resolvers are not available on the new host. Because of this design flaw, some glibc functions had to be reimplemented in a simplified form inside the JVM.
>> 
>> This patch changes it to:
>> - restoring JVM checks `cpufeatures` user data in the image against current CPU Features
>> - the restored JVM is started only if the CPU Features are satisfied, restored JVM no longer has to verify anything
>> 
>> The patch is a bit of a kitchen sink, there are various improvements of the CPU Features code.
>
> Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Refactor the usage of prepare_user_data_api()

src/hotspot/share/runtime/crac_engine.cpp line 473:

> 471:     datap = nullptr;
> 472:   }
> 473:   if (!VM_Version::cpu_features_binary_check(datap)) {

From an architectural point of view, I don't like that the check is performed inside `CracEngine`, because the class represents the engine API (originally it just encapsulated the handles for the lib, APIs and conf to get RAII). Calling `crac_engine.cpufeatures_check()` reads as if we are asking the engine to check the features, but that is not something the engine itself does.

I would suggest:
- `CracEngine::cpufeatures_store` receives a pre-filled `VM_Version::CPUFeaturesBinary` and just stores it.
- `CracEngine::cpufeatures_load` loads `VM_Version::CPUFeaturesBinary`, validates that it is non-null and has the expected size, and returns a copy of it (the copy allows the user data to be destroyed afterwards).
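
To make the suggested split concrete, here is a minimal standalone sketch, not the actual HotSpot code. `CPUFeaturesBinary`, `EngineSketch`, `set_raw_user_data` and `cpu_features_satisfied` are hypothetical stand-ins, and the two-bitmask layout and the "host must be a superset" rule are assumptions made only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for VM_Version::CPUFeaturesBinary; the real layout may differ.
struct CPUFeaturesBinary {
  uint64_t cpu;    // assumed: bitmask of CPU features required by the image
  uint64_t glibc;  // assumed: bitmask of glibc-relevant features
};

// Hypothetical stand-in for the engine side: it only moves bytes around
// and knows nothing about what the feature bits mean.
class EngineSketch {
  std::vector<unsigned char> _user_data;  // raw bytes as kept in the image
  bool _has_data = false;

public:
  // Receives a pre-filled record and just stores it.
  void cpufeatures_store(const CPUFeaturesBinary& features) {
    set_raw_user_data(&features, sizeof(features));
  }

  // Sketch-only helper that simulates an image carrying arbitrary user data.
  void set_raw_user_data(const void* data, size_t size) {
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    _user_data.assign(bytes, bytes + size);
    _has_data = true;
  }

  // Validates presence and size, then copies the record out so the
  // caller no longer depends on the lifetime of the user data.
  bool cpufeatures_load(CPUFeaturesBinary* out) const {
    if (!_has_data || _user_data.size() != sizeof(CPUFeaturesBinary)) {
      return false;
    }
    std::memcpy(out, _user_data.data(), sizeof(CPUFeaturesBinary));
    return true;
  }
};

// The comparison lives outside the engine (in VM_Version in the real code).
// Assumed rule: every feature bit required by the image must be set on the host.
inline bool cpu_features_satisfied(const CPUFeaturesBinary& required,
                                   const CPUFeaturesBinary& host) {
  return (required.cpu & ~host.cpu) == 0 &&
         (required.glibc & ~host.glibc) == 0;
}
```

With this shape the caller would load the record through the engine and then pass the copy to the feature check, so the engine class stays a thin wrapper around storage.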

This is just a suggestion, feel free to not implement this if you don't like it.

-------------

PR Review Comment: https://git.openjdk.org/crac/pull/227#discussion_r2068716468

