[crac] RFR: Wait until G1 GC has finished before creating a snapshot. [v8]
Anton Kozlov
akozlov at openjdk.org
Fri Sep 1 14:54:25 UTC 2023
On Sun, 20 Aug 2023 18:33:24 GMT, Jan Kratochvil <jkratochvil at openjdk.org> wrote:
>> @rvansa did report his snapshots are about 2x-3x bigger than they should be. He then also found it only happens if the snapshot is done too quickly after GC should have been run.
>>
>> One can reproduce the race case by:
>>
>> --- a/src/hotspot/share/gc/g1/g1UncommitRegionTask.hpp
>> +++ b/src/hotspot/share/gc/g1/g1UncommitRegionTask.hpp
>> @@ -35,7 +35,7 @@ class G1UncommitRegionTask : public G1ServiceTask {
>> // is short, while still making reasonable progress.
>> static const uint UncommitSizeLimit = 128 * M;
>> // Initial delay in milliseconds after GC before the regions are uncommitted.
>> - static const uint UncommitInitialDelayMs = 100;
>> + static const uint UncommitInitialDelayMs = 10*1000;
>> // The delay between two uncommit task executions.
>> static const uint UncommitTaskDelayMs = 10;
>
> Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits:
>
> - Merge branch 'crac' into crac-g1gcwait
> - Merge branch 'crac' into crac-g1gcwait
> - Do not use synchronization, directly unmap it.
> - Merge branch 'crac' into crac-g1gcwait
> - Make -active volatile.
> - bugreported by Sergey Nazarkin
> - Remove accidental change of UncommitInitialDelayMs.
> - bugreported by Radim Vansa
> - Add an assertion at least for !is_ConcurrentGC_thread().
> - Remove calling thread state assertions for wait_for_collection_finish().
> - Add a generic virtual method wait_for_collection_finish().
> - based on a review by Radim Vansa.
> Assert calling thread state for wait_for_collection_finish().
> - based on a review by Radim Vansa.
> - Wait until G1 GC has finished before creating a snapshot.
> > To be sure, have you tested the change in a debug build with asserts enabled?
>
> I do run a simple Linux x86_64 [fastdebug] snapshot+restore which does work.
Great, thanks!
> As the full regression test takes me 10 hours I just rely on GHA. But it is true I check GHA only very briefly for a massive failure. Few unexplainable failures are always there so I ignore them. In this case I see some crac x86 (32-bit) failures.
I can confirm this PR does not regress GHA checks. So they don't block the integration, so I propose to proceed.
Among the failures, I'm concerned about jdk tier1 failures. We'd better to look at them.
-------------
PR Comment: https://git.openjdk.org/crac/pull/93#issuecomment-1702879300
More information about the crac-dev
mailing list