RFR(S) 8214388 CDS dumping fails with java heap fragmentation

Ioi Lam ioi.lam at oracle.com
Fri Nov 30 19:02:21 UTC 2018


Hi Stefan,

Thanks for the review. Here's an updated patch:

http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v02/
http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v02-delta/

Please see responses to your comments below.

I also added code to handle humongous strings that may be allocated
during dump time, and to make sure they are GC'ed (so no humongous
regions remain) before we start dumping the heap.

I'll add one more test covering the handling of humongous regions
that exist while the heap is being dumped, but that will require
a little hacking with JVMTI. I'll post it later :-)


On 11/30/18 3:17 AM, Stefan Johansson wrote:
> Hi Ioi,
>
> I mostly looked at the GC interaction, some comments below.
>
> On 2018-11-30 02:07, Ioi Lam wrote:
>> http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v01/ 
>>
> src/hotspot/share/gc/g1/g1CollectedHeap.cpp
> -------------------------------------------
>  325 #if INCLUDE_CDS
>  326   if (DumpSharedSpaces && HeapShared::is_heap_object_archiving_allowed()) {
>  327     // See G1HeapVerifier::verify_ready_for_archiving()
>  328     // This should not happen during normal operation of -Xshare:dump, so let's give a warning.
>  329     log_warning(gc)("Avoid allocating humongous objects during -Xshare:dump (" SIZE_FORMAT
>  330                     " bytes) - may cause fragmentation",
>  331                     word_size * HeapWordSize);
>  332   }
>  333 #endif
>
> I think this warning should be left out, you still have output in the 
> verification that tells the user what is problematic.
>

Removed.

> /src/hotspot/share/gc/g1/g1HeapVerifier.cpp
> -------------------------------------------
> What do you think about extending the verification with one more 
> state, maybe _fatal_hole. When iterating the regions you can then look 
> for holes not caused by humongous. If such holes are found log an 
> error and fail.
>
> You can then turn the current log_error into a log_warning letting the 
> user know there has been humongous allocations which can cause 
> problems during dump time.
>
>  314   // turn on the following assert only if we disallow humongous allocations during
>  315   // dump time.
>  316   //assert(!cl.has_holes(), "must not have holes");
>
> I think you should remove these lines.
>

I modified the verification loop as you suggested, and changed the 
assert to check that all holes must be caused by humongous regions.
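
To make the intent of the relaxed assert concrete, here is a toy model of
the check (plain standalone C++, not the real G1HeapVerifier code; the
region representation and the `freed_humongous` flag are illustrative
assumptions). A "hole" is a free region sitting below the topmost archive
region, and the check only tolerates holes left behind by humongous
allocations:

```cpp
#include <cassert>
#include <vector>

// Toy region model: Archive = region holding archived heap objects,
// Humongous = live humongous object, Free = unused region.
enum RegionType { Archive, Humongous, Free };

struct Region {
  RegionType type;
  bool freed_humongous;  // free region left behind by a dead humongous object
};

// Returns true if every hole below the topmost archive region is explained
// by a (freed) humongous allocation, mirroring the relaxed assert.
bool holes_only_from_humongous(const std::vector<Region>& heap) {
  int top_archive = -1;
  for (int i = 0; i < (int)heap.size(); i++) {
    if (heap[i].type == Archive) top_archive = i;
  }
  for (int i = 0; i < top_archive; i++) {
    if (heap[i].type == Free && !heap[i].freed_humongous) {
      return false;  // unexplained hole -> would trip the assert
    }
  }
  return true;
}
```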

> src/hotspot/share/gc/shared/adaptiveSizePolicy.cpp
> --------------------------------------------------
>  184 #if INCLUDE_CDS
>  185   if (DumpSharedSpaces && HeapShared::is_heap_object_archiving_allowed()) {
>  186     // See G1HeapVerifier::verify_ready_for_archiving()
>  187     return 1;
>  188   }
>  189 #endif
>  190
>
> I think this decision should be moved to the g1 argument parsing. In 
> G1Arguments::initialize() you can add this after the initial setting 
> of ParallelGCThreads.
>
>   // When dumping the CDS archive we want to reduce fragmentation by
>   // triggering a full collection. To get as low fragmentation as
>   // possible we only use one worker thread.
>   if (DumpSharedSpaces) {
>     FLAG_SET_ERGO(uint, ParallelGCThreads, 1);
>     FLAG_SET_ERGO(bool, ExplicitGCInvokesConcurrent, false);
>   }
>

Done.

> As you see I also include setting ExplicitGCInvokesConcurrent to 
> false, because if that flag is set a full gc will be turned into a 
> concurrent one.
>
> src/hotspot/share/memory/metaspaceShared.cpp
> --------------------------------------------
> 1706 #if INCLUDE_CDS_JAVA_HEAP
> 1707     if (UseG1GC && HeapShared::is_heap_object_archiving_allowed()) {
> 1708       // Avoid fragmentation while archiving heap objects.
> 1709       Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true);
> 1710       Universe::heap()->collect(GCCause::_java_lang_system_gc);
> 1711       Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(false);
> 1712       G1HeapVerifier::verify_ready_for_archiving();
> 1713     }
> 1714 #endif
>
> Do you need the #if here, is_heap_object_archiving_allowed() only 
> returns true when INCLUDE_CDS_JAVA_HEAP is true. Also the check for 
> UseG1GC is also done within is_heap_object_archiving_allowed().
>
> Regarding triggering the GC, the calls to the soft_ref_policy are not 
> needed unless you care about clearing soft references.

I removed the #if. I kept the clearing of soft refs -- soft refs are not 
archived, so clearing them would give more free space for archiving.

> An even nicer solution would be to add a new GCCause that is only used 
> for CDS dumping. That way you can get a better message in the GC log 
> than "System.gc()", and you can also skip the setting of 
> ExplicitGCInvokesConcurrent mentioned above, since the new GCCause 
> won't be affected by that flag.
>

I added GCCause::_cds_dump and skipped the setting of 
ExplicitGCInvokesConcurrent.
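
For context, the reason a dedicated cause sidesteps the flag can be
sketched like this (a simplified standalone model, not the actual HotSpot
decision logic; the function name mirrors HotSpot's but the body is an
illustrative assumption). Only explicit System.gc() requests get upgraded
to a concurrent cycle, so _cds_dump always stays a stop-the-world full
collection:

```cpp
#include <cassert>

// Illustrative subset of GC causes.
enum GCCause { _java_lang_system_gc, _cds_dump };

// Sketch: a concurrent cycle replaces the full GC only when the request
// came from System.gc() AND ExplicitGCInvokesConcurrent is set. A
// _cds_dump request is never upgraded, so dump time gets the compacting
// full collection it needs.
bool should_do_concurrent_full_gc(GCCause cause,
                                  bool explicit_gc_invokes_concurrent) {
  return cause == _java_lang_system_gc && explicit_gc_invokes_concurrent;
}
```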

Thanks
- Ioi

> Thanks,
> Stefan
>
>> https://bugs.openjdk.java.net/browse/JDK-8214388
>>
>>
>> Symptom:
>> ========
>>
>> "java -Xshare:dump" would intermittently fail with
>>
>>     Unable to write archive heap ... due to fragmentation.
>>
>> This usually happens when you try to dump many classes (e.g., 10000)
>> with a relatively small heap (e.g., 1g) and a lot of GC threads
>> (e.g., 24).
>>
>> (Example use case -- Eclipse IDE loads 15,000 classes with 512MB heap.)
>>
>> When GC happens during class loading, some old G1 regions may be placed
>> at the top end of the heap (due to the large number of GC threads).
>>
>> Later, when writing the archived heap, G1 tries to allocate contiguous
>> regions from the top end of the heap. This would fail due to the
>> presence of those old regions.
>>
>>
>> Fix:
>> ====
>>
>> As suggested by Stefan Johansson, we run a full GC with a single GC
>> thread. This guarantees that all old blocks will be moved to the bottom
>> end of the heap.
>>
>> Because there's no API for specifying the number of GC threads
>> dynamically, and CDS dump time doesn't allocate lots of objects, I have
>> statically forced the number of threads to 1 in
>> AdaptiveSizePolicy::calc_active_workers during CDS dump time.
>>
>> (This seems like a more direct way than assigning ParallelGCThreads ...)
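
As an aside, the effect of the single-worker full GC can be illustrated
with a toy simulation (plain standalone C++, not HotSpot code; the
one-slot-per-region heap model is an illustrative assumption). The archive
needs a contiguous free run at the top of the heap, and a compacting full
GC that slides all live regions to the bottom maximizes that run:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy heap: one bool per region, 'true' means a live old region.

// Length of the contiguous free run at the TOP of the heap -- this is
// the space available for the archive regions.
int free_run_at_top(const std::vector<bool>& heap) {
  int run = 0;
  for (int i = (int)heap.size() - 1; i >= 0 && !heap[i]; i--) run++;
  return run;
}

// Model of a single-threaded compacting full GC: all live regions are
// slid to the bottom, leaving one maximal free run at the top.
void compact_to_bottom(std::vector<bool>& heap) {
  int live = (int)std::count(heap.begin(), heap.end(), true);
  std::fill(heap.begin(), heap.begin() + live, true);
  std::fill(heap.begin() + live, heap.end(), false);
}
```

With several GC workers, each worker compacts into its own destination
regions, which is how live regions can end up scattered near the top;
forcing one worker removes that source of scatter.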
>>
>>
>> Notes:
>> ======
>>
>> 1. Humongous regions cannot move. However, we currently don't do
>>     humongous allocations during CDS dump, so we should be fine. I have
>>     added diagnostic warnings so if fragmentation does happen in the
>>     future, the user can find out why.
>>
>> 2. Fixed a minor bug in
>>     HeapShared::check_closed_archive_heap_region_object_class.
>>
>> 3. Fixed a bug in MetaspaceShared::read_extra_data, where the
>>     symbols/strings would be lost due to GC.
>>
>> 4. Added a stress test that successfully archives about 18MB of objects
>>     with -Xmx64m. This used to fail even with -Xmx512m on a Solaris box.
>>
>> 5. With default CDS archive generation during JDK build time, -Xmx128m
>>     is used. Before this fix, the EDEN region lived at the top of the
>>     heap during CDS dump time, and we ended up with a 2MB gap between
>>     the archive regions and the top of the heap. Because the archive
>>     regions cannot move, at run time using CDS would reduce the max
>>     humongous allocation by 2MB.
>>
>>     With this fix, the archive regions are now placed at the very top
>>     of the heap, so the gap no longer exists.
>>
>>
>> Tests:
>> ======
>>
>> Running hs-tiers{1-6} for sanity.
>>
>> Thanks
>> - Ioi
>>
>>



More information about the hotspot-runtime-dev mailing list