[crac] RFR: 8371337: [CRaC] Fastdebug build fails when calling OperatingSystemMxBean.getProcessCpuLoad after restore

Radim Vansa rvansa at openjdk.org
Fri Nov 14 07:02:08 UTC 2025


On Wed, 12 Nov 2025 08:23:53 GMT, Timofei Pushkin <tpushkin at openjdk.org> wrote:

>> A JVM that executed OperatingSystemImpl.getProcessCpuLoad() before checkpoint can fail with an assertion failure after restore:
>> 
>> java: /home/rvansa/work/zulu/src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c:291: get_cpuload_internal: Assertion `pticks->usedKernel >= tmp.usedKernel' failed.
>> 
>> Since this is an assertion, it fails only in debug builds; release builds probably return a nonsensical value instead. We should remove the assertion and return a negative value (documented as the value for ‘unavailable’) if this is detected.
>
> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c line 290:
> 
>> 288:         // After restoring with CRaC the new process can appear 'younger'
>> 289:         // than last value in counters - we will return -1 (unavailable).
>> 290:         if (!failed && pticks->usedKernel >= tmp.usedKernel) {
> 
> I haven't read the precise definitions of `used`, `usedKernel` and `total`; is checking only `usedKernel` enough to guarantee that the comparison also holds for the other two values? Shouldn't we check both `used` and `usedKernel` (assuming `total` is exactly their sum, which may not be the case), in case only one of them has gone backwards?

The whole check is quite nondeterministic: if this is called at an arbitrary point, `usedKernel` may just happen to be large enough to pass the check, and the result would be incorrect anyway. I think what matters is that the final result is sanitized to the safe range 0 to 1.
In a more realistic use case, e.g. calling this every 5 seconds, I would expect the application to have spent significantly more CPU time before the checkpoint than between restore and the first call after it, so this check will detect a potentially invalid result.
My main motivation was to avoid the crash in debug builds. We could refine this by adding a hook that resets the values during checkpoint/restore (C/R), but I would opt for a minimal change.

-------------

PR Review Comment: https://git.openjdk.org/crac/pull/274#discussion_r2526033290

