Call for Discussion: New Project: CRaC

Michael Bien mbien42 at gmail.com
Thu Jul 22 20:51:53 UTC 2021


On 22.07.21 21:17, Anton Kozlov wrote:
>> - How to make the JVM/JDK behave gracefully after "time-jumps".
>
> I assume there should be no correctness problems, as the time-jump does not
> substantially differ from a time spent off-CPU due to OS scheduling.  Some
> internal counters could overflow, but this does not look more than just a bug
> that needs fixing.
This might certainly cause some interesting issues, e.g. GC ergonomics
getting confused after thinking the last pause lasted 5 days :)
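
To illustrate the kind of guard I have in mind, here is a minimal sketch,
not actual HotSpot code. The class name, the method name and the 10-minute
bound are all hypothetical. It drops an elapsed-time sample that is
implausibly large (or negative) instead of feeding it into a heuristic:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of any JVM: filters out elapsed-time
// samples that most likely span a checkpoint/restore "time-jump".
final class PauseSample {
    // Assumed upper bound for a believable pause; the value is arbitrary.
    static final long MAX_PLAUSIBLE_PAUSE_NANOS = TimeUnit.MINUTES.toNanos(10);

    // Returns the elapsed nanoseconds between two System.nanoTime() style
    // readings, or -1 if the sample should be discarded.
    static long sanitizedElapsedNanos(long startNanos, long endNanos) {
        long elapsed = endNanos - startNanos;
        if (elapsed < 0 || elapsed > MAX_PLAUSIBLE_PAUSE_NANOS) {
            return -1; // clock went backwards or jumped: ignore the sample
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long fiveDays = TimeUnit.DAYS.toNanos(5);
        System.out.println(sanitizedElapsedNanos(0, 250));      // normal sample
        System.out.println(sanitizedElapsedNanos(0, fiveDays)); // discarded
    }
}
```

Whether discarding is the right policy (as opposed to subtracting the time
spent in the checkpoint) is a design question only the JVM can answer,
since only it knows that a restore happened.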

That is another reason why I believe the only way to implement this
properly is with the cooperation of the JVM. CRIU via Panama was nice for
experiments, but it would never be reliable.

>
> However, I saw cases when CRIU did restore monotonic clock that broke timed
> waits, causing 100% of CPU loaded with an improper time limit. After not
> restoring the clock completely, the issue has gone away.  That brought us again
> to the time jump, which was correctly handled.

If we are thinking of the same bug, this was fixed in Linux 5.10
(https://lkml.org/lkml/2020/10/15/582) and was possibly also backported.
Since 5.10 I have never again seen 100% load after restoring JVMs.
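
For reference, a user-level sketch of a timed wait that stays well behaved
when the monotonic clock moves unexpectedly. This is not the kernel fix and
not how the JDK implements its waits; the class and method names are made up
for illustration. The remaining time is recomputed on every wakeup and
clamped, so a forward jump ends the wait and a backward jump can never
extend it beyond the original timeout:

```java
import java.util.function.BooleanSupplier;

// Hypothetical example class; names are illustrative only.
final class DeadlineWait {
    // Remaining wait time, clamped to the range [0, timeoutNanos].
    static long remainingNanos(long deadlineNanos, long nowNanos, long timeoutNanos) {
        long remaining = deadlineNanos - nowNanos;
        if (remaining <= 0) {
            return 0; // deadline passed, e.g. after a forward time-jump
        }
        // A clock that moved backwards must not lengthen the wait.
        return Math.min(remaining, timeoutNanos);
    }

    // Waits on 'lock' until 'condition' holds or the timeout expires.
    // Returns true if the condition became true, false on timeout.
    static boolean await(Object lock, BooleanSupplier condition, long timeoutNanos)
            throws InterruptedException {
        final long deadline = System.nanoTime() + timeoutNanos;
        synchronized (lock) {
            while (!condition.getAsBoolean()) {
                long remaining = remainingNanos(deadline, System.nanoTime(), timeoutNanos);
                if (remaining == 0) {
                    return false;
                }
                // remaining > 0 here, so this never becomes wait(0, 0),
                // which would block without a timeout.
                lock.wait(remaining / 1_000_000, (int) (remaining % 1_000_000));
            }
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // The condition never becomes true, so this times out after about 50 ms.
        boolean ok = await(new Object(), () -> false, 50_000_000L);
        System.out.println("condition met: " + ok);
    }
}
```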

-michael


