[crac] RFR: Wait until G1 GC has finished before creating a snapshot. [v8]

Anton Kozlov akozlov at openjdk.org
Fri Sep 1 14:54:25 UTC 2023


On Sun, 20 Aug 2023 18:33:24 GMT, Jan Kratochvil <jkratochvil at openjdk.org> wrote:

>> @rvansa did report his snapshots are about 2x-3x bigger than they should be. He then also found it only happens if the snapshot is done too quickly after GC should have been run.
>> 
>> One can reproduce the race case by:
>> 
>> --- a/src/hotspot/share/gc/g1/g1UncommitRegionTask.hpp
>> +++ b/src/hotspot/share/gc/g1/g1UncommitRegionTask.hpp
>> @@ -35,7 +35,7 @@ class G1UncommitRegionTask : public G1ServiceTask {
>>    // is short, while still making reasonable progress.
>>    static const uint UncommitSizeLimit = 128 * M;
>>    // Initial delay in milliseconds after GC before the regions are uncommitted.
>> -  static const uint UncommitInitialDelayMs = 100;
>> +  static const uint UncommitInitialDelayMs = 10*1000;
>>    // The delay between two uncommit task executions.
>>    static const uint UncommitTaskDelayMs = 10;
>
> Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits:
> 
>  - Merge branch 'crac' into crac-g1gcwait
>  - Merge branch 'crac' into crac-g1gcwait
>  - Do not use synchronization, directly unmap it.
>  - Merge branch 'crac' into crac-g1gcwait
>  - Make -active volatile.
>     - bugreported by Sergey Nazarkin
>  - Remove accidental change of UncommitInitialDelayMs.
>     - bugreported by Radim Vansa
>  - Add an assertion at least for !is_ConcurrentGC_thread().
>  - Remove calling thread state assertions for wait_for_collection_finish().
>  - Add a generic virtual method wait_for_collection_finish().
>     - based on a review by Radim Vansa.
>    Assert calling thread state for wait_for_collection_finish().
>     - based on a review by Radim Vansa.
>  - Wait until G1 GC has finished before creating a snapshot.

> > To be sure, have you tested the change in a debug build with asserts enabled?
> 
> I do run a simple Linux x86_64 [fastdebug] snapshot+restore which does work. 

Great, thanks!

> As the full regression test takes me 10 hours I just rely on GHA. But it is true I check GHA only very briefly for a massive failure. Few unexplainable failures are always there so I ignore them. In this case I see some crac x86 (32-bit) failures.

I can confirm this PR does not regress GHA checks. So they don't block the integration, so I propose to proceed.

Among the failures, I'm concerned about jdk tier1 failures. We'd better to look at them.

-------------

PR Comment: https://git.openjdk.org/crac/pull/93#issuecomment-1702879300


More information about the crac-dev mailing list