[crac] RFR: 8371337: [CRaC] Fastdebug build fails when calling OperatingSystemMxBean.getProcessCpuLoad after restore
Timofei Pushkin
tpushkin at openjdk.org
Fri Nov 14 08:23:41 UTC 2025
On Fri, 14 Nov 2025 06:59:00 GMT, Radim Vansa <rvansa at openjdk.org> wrote:
>> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c line 290:
>>
>>> 288: // After restoring with CRaC the new process can appear 'younger'
>>> 289: // than last value in counters - we will return -1 (unavailable).
>>> 290: if (!failed && pticks->usedKernel >= tmp.usedKernel) {
>>
>> I haven't read the precise definitions of `used`, `usedKernel` and `total`. Is checking only `usedKernel` enough to guarantee that the comparison is also valid for the other two values? Shouldn't we check both `used` and `usedKernel` (assuming `total` is exactly their sum, which may not be the case) in case only one of them has gone backwards?
>
> The whole check is inherently nondeterministic: if this is called at an arbitrary point, the value of `usedKernel` can happen to pass the check and the result would still be incorrect. I think what matters is that, in the end, the result is sanitized to the safe range 0 to 1.
> In a more realistic use case, e.g. calling this every 5 seconds, I would expect the application to have accumulated significantly more CPU time before the checkpoint than between restore and the first call after it, so the check will detect a potentially invalid result.
> My main motivation was to avoid the crash in debug builds. We could perfect this by adding a hook that resets the values during C/R, but I would opt for a minimalistic change.
To me it looks like we should either check all three counters (`used`, `usedKernel` and `total`), because they are not fully dependent and the checks are cheap, or not check any of them, because checking does not guarantee correctness anyway.
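
A minimal sketch of the "check all three" option, for illustration only. The `ticks_sample` struct and the `ticks_monotonic` and `cpu_load` helpers are hypothetical names, not the code in `UnixOperatingSystem.c`; the field names follow the thread, and the load formula (user plus kernel delta over total delta) is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the tick counters discussed above.
 * Field names follow the review thread; the layout is assumed. */
typedef struct {
    uint64_t used;
    uint64_t usedKernel;
    uint64_t total;
} ticks_sample;

/* True only if none of the three counters went backwards. */
static bool ticks_monotonic(const ticks_sample *prev, const ticks_sample *cur) {
    return cur->used >= prev->used
        && cur->usedKernel >= prev->usedKernel
        && cur->total >= prev->total;
}

/* Returns a load in [0.0, 1.0], or -1.0 (unavailable) when any
 * counter regressed, e.g. because the process was restored. */
static double cpu_load(const ticks_sample *prev, const ticks_sample *cur) {
    if (!ticks_monotonic(prev, cur)) {
        return -1.0;
    }
    /* Safe to subtract: every delta is known to be non-negative here. */
    uint64_t busy = (cur->used - prev->used)
                  + (cur->usedKernel - prev->usedKernel);
    uint64_t elapsed = cur->total - prev->total;
    if (elapsed == 0) {
        return 0.0; /* no time passed between samples */
    }
    double load = (double)busy / (double)elapsed;
    /* Sanitize: the counters are sampled non-atomically, so cap at 1.0. */
    return load > 1.0 ? 1.0 : load;
}
```

With this shape, a sample where only `used` went backwards is rejected as well, which a check on `usedKernel` alone would let through.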
-------------
PR Review Comment: https://git.openjdk.org/crac/pull/274#discussion_r2526406827