[crac] RFR: Wait until G1 GC has finished before creating a snapshot. [v3]

Fri Jul 28 12:26:40 UTC 2023

On Thu, 27 Jul 2023 13:00:15 GMT, Jan Kratochvil <jkratochvil at openjdk.org> wrote:

>> @rvansa did report his snapshots are about 2x-3x bigger than they should be. He then also found it only happens if the snapshot is done too quickly after GC should have been run.
>> 
>> One can reproduce the race case by:
>> 
>> --- a/src/hotspot/share/gc/g1/g1UncommitRegionTask.hpp
>> +++ b/src/hotspot/share/gc/g1/g1UncommitRegionTask.hpp
>> @@ -35,7 +35,7 @@ class G1UncommitRegionTask : public G1ServiceTask {
>>    // is short, while still making reasonable progress.
>>    static const uint UncommitSizeLimit = 128 * M;
>>    // Initial delay in milliseconds after GC before the regions are uncommitted.
>> -  static const uint UncommitInitialDelayMs = 100;
>> +  static const uint UncommitInitialDelayMs = 10*1000;
>>    // The delay between two uncommit task executions.
>>    static const uint UncommitTaskDelayMs = 10;
>
> Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove calling thread state assertions for wait_for_collection_finish().

I do not see what is the concern. I do not have much overview what all threads and VM threads are running in JVM. But there could be two concerns:
- Whether `_instance` could be set only soon after it was checked as NULL by `G1UncommitRegionTask::wait_if_active`. That would mean nobody called garbage collection before calling `G1UncommitRegionTask::wait_if_active`. That is not the case for CRaC:

  Universe::heap()->collect(GCCause::_full_gc_alot);
...
  Universe::heap()->wait_for_collection_finish();

If someone does it there is a race but such call would not make sense. And after all the worst what can happen is that one snapshots (or whatever the caller will do) more memory, that is inconvenient but not fatal.
- Whether `_active_sem.wait();` can wait forever. That could happen for example if `G1UncommitRegionTask::wait_if_active()` was called by G1CollectedHeap::heap()->service_thread(). T
Hat is asserted now. When else it could happen?

-------------

PR Comment: https://git.openjdk.org/crac/pull/93#issuecomment-1655593667