RFR(S) 8214388 CDS dumping fails with java heap fragmentation

Stefan Johansson stefan.johansson at oracle.com
Fri Nov 30 11:17:05 UTC 2018


Hi Ioi,

I mostly looked at the GC interaction, some comments below.

On 2018-11-30 02:07, Ioi Lam wrote:
> http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v01/
src/hotspot/share/gc/g1/g1CollectedHeap.cpp
-------------------------------------------
  325 #if INCLUDE_CDS
  326   if (DumpSharedSpaces && HeapShared::is_heap_object_archiving_allowed()) {
  327     // See G1HeapVerifier::verify_ready_for_archiving()
  328     // This should not happen during normal operation of -Xshare:dump, so let's give a warning.
  329     log_warning(gc)("Avoid allocating humongous objects during -Xshare:dump (" SIZE_FORMAT
  330                     " bytes) - may cause fragmentation",
  331                     word_size * HeapWordSize);
  332   }
  333 #endif

I think this warning should be left out; you already have output in the 
verification that tells the user what is problematic.

src/hotspot/share/gc/g1/g1HeapVerifier.cpp
-------------------------------------------
What do you think about extending the verification with one more state, 
maybe _fatal_hole? When iterating the regions you can then look for 
holes not caused by humongous allocations. If such holes are found, log 
an error and fail.

You can then turn the current log_error into a log_warning, letting the 
user know there have been humongous allocations which can cause problems 
during dump time.

  314   // turn on the following assert only if we disallow humongous allocations during
  315   // dump time.
  316   //assert(!cl.has_holes(), "must not have holes");

I think you should remove these lines.
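
If you go with the extra state, the region walk could look roughly like the 
sketch below. The closure name, the _fatal_hole mapping and the exact hole 
detection are just my assumptions to illustrate the idea, not code from the 
webrev:

   // Sketch only: walk the regions bottom-up and remember whether we have
   // passed a free region. A used region above a free one means there is a
   // hole; if that used region is not humongous, the hole is unexpected.
   class DumpTimeVerifyClosure : public HeapRegionClosure {
     bool _seen_free;
     bool _has_unexpected_holes;   // would map to the _fatal_hole state
     bool _has_humongous;          // hole caused by a humongous region
   public:
     DumpTimeVerifyClosure() :
       _seen_free(false), _has_unexpected_holes(false), _has_humongous(false) {}
     virtual bool do_heap_region(HeapRegion* hr) {
       if (hr->is_free()) {
         _seen_free = true;
       } else if (_seen_free) {
         if (hr->is_humongous()) {
           _has_humongous = true;          // warn: humongous allocation at dump time
         } else {
           _has_unexpected_holes = true;   // error: fragmentation we cannot explain
         }
       }
       return false; // continue iterating over all regions
     }
     bool has_unexpected_holes() const { return _has_unexpected_holes; }
     bool has_humongous() const        { return _has_humongous; }
   };

The verification would then iterate with 
G1CollectedHeap::heap()->heap_region_iterate(&cl), log a warning if 
has_humongous() is set, and log an error and fail if has_unexpected_holes() 
is set.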

src/hotspot/share/gc/shared/adaptiveSizePolicy.cpp
--------------------------------------------------
  184 #if INCLUDE_CDS
  185   if (DumpSharedSpaces && HeapShared::is_heap_object_archiving_allowed()) {
  186     // See G1HeapVerifier::verify_ready_for_archiving()
  187     return 1;
  188   }
  189 #endif
  190

I think this decision should be moved to the G1 argument parsing. In 
G1Arguments::initialize() you can add this after the initial setting of 
ParallelGCThreads.

   // When dumping the CDS archive we want to reduce fragmentation by
   // triggering a full collection. To get as low fragmentation as
   // possible we only use one worker thread.
   if (DumpSharedSpaces) {
     FLAG_SET_ERGO(uint, ParallelGCThreads, 1);
     FLAG_SET_ERGO(bool, ExplicitGCInvokesConcurrent, false);
   }

As you can see, I also set ExplicitGCInvokesConcurrent to false, because 
if that flag is set a full GC will be turned into a concurrent one.

src/hotspot/share/memory/metaspaceShared.cpp
--------------------------------------------
1706 #if INCLUDE_CDS_JAVA_HEAP
1707     if (UseG1GC && HeapShared::is_heap_object_archiving_allowed()) {
1708       // Avoid fragmentation while archiving heap objects.
1709       Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true);
1710       Universe::heap()->collect(GCCause::_java_lang_system_gc);
1711       Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(false);
1712       G1HeapVerifier::verify_ready_for_archiving();
1713     }
1714 #endif

Do you need the #if here? is_heap_object_archiving_allowed() only 
returns true when INCLUDE_CDS_JAVA_HEAP is true. Also, the check for 
UseG1GC is already done within is_heap_object_archiving_allowed().
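
So the whole block could be guarded with just the one check (same body as in 
your patch, shown here without the webrev line numbers):

   if (HeapShared::is_heap_object_archiving_allowed()) {
     // Avoid fragmentation while archiving heap objects.
     Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true);
     Universe::heap()->collect(GCCause::_java_lang_system_gc);
     Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(false);
     G1HeapVerifier::verify_ready_for_archiving();
   }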

Regarding triggering the GC, the calls to the soft_ref_policy are not 
needed unless you care about clearing soft references. An even nicer 
solution would be to add a new GCCause that is only used for CDS 
dumping; that way you get a better message in the GC log than 
"System.gc()", and you can also skip setting ExplicitGCInvokesConcurrent 
as mentioned above, since the new GCCause won't be affected by that flag.
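
To make this concrete, something along these lines should be enough. The 
cause name _archive_time_gc and the log string are placeholders I made up 
for the sketch, not names from the webrev:

   // Sketch only, names are placeholders:
   //
   // gcCause.hpp  -- add a new value to the GCCause::Cause enum:
   //     _archive_time_gc,
   //
   // gcCause.cpp  -- map it to a descriptive string in GCCause::to_string():
   //     case _archive_time_gc:
   //       return "Full GC for -Xshare:dump";
   //
   // metaspaceShared.cpp -- request the collection with the new cause. Since
   // it is not _java_lang_system_gc, ExplicitGCInvokesConcurrent does not
   // apply and the collection stays a full GC:
   Universe::heap()->collect(GCCause::_archive_time_gc);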

Thanks,
Stefan

> https://bugs.openjdk.java.net/browse/JDK-8214388
> 
> 
> Symptom:
> ========
> 
> "java -Xshare:dump" would intermittently fail with
> 
>     Unable to write archive heap ... due to fragmentation.
> 
> This usually happens when you try to dump many classes (e.g., 10000) with
> a relatively small heap (e.g., 1g) and a lot of GC threads (e.g., 24).
> 
> (Example use case -- Eclipse IDE loads 15,000 classes with 512MB heap.)
> 
> When GC happens during class loading, some old G1 regions may be placed
> at the top end of the heap (due to the large number of GC threads).
> 
> Later, when writing the archived heap, G1 tries to allocate contiguous
> regions from the top end of the heap. This would fail due to the presence
> of those old regions.
> 
> 
> Fix:
> ====
> 
> As suggested by Stefan Johansson, we run a full GC with a single GC
> thread. This guarantees that all old blocks will be moved to the bottom
> end of the heap.
> 
> Because there's no API for specifying the number of GC threads dynamically,
> and CDS dump time doesn't allocate lots of objects, I have statically forced
> the number of threads to 1 in AdaptiveSizePolicy::calc_active_workers during
> CDS dump time.
> 
> (This seems like a more direct way than assigning ParallelGCThreads ...)
> 
> 
> Notes:
> ======
> 
> 1. Humongous regions cannot move. However, currently we don't do humongous
>     allocations during CDS dump, so we should be fine. I have added diagnostic
>     warnings so if fragmentation does happen in the future, the user can
>     find out why.
> 
> 2. Fixed a minor bug in HeapShared::check_closed_archive_heap_region_object_class.
> 
> 3. Fixed a bug in MetaspaceShared::read_extra_data, where the symbols/strings
>     would be lost due to GC.
> 
> 4. Added a stress test to successfully archive about 18MB of objects with -Xmx64m.
>     This used to fail even with -Xmx512m on a Solaris box.
> 
> 5. With the default CDS archive generation during JDK build time, -Xmx128m is used.
>     Before this fix, the EDEN region lived at the top of the heap during CDS dump
>     time, and we ended up with a 2MB gap between the archive regions and the top
>     of the heap. Because the archive regions cannot move, using CDS at run time
>     would reduce the max humongous allocation by 2MB.
> 
>     With this fix, the archive regions are now placed at the very top of the heap,
>     so the gap no longer exists.
> 
> 
> Tests:
> ======
> 
> Running hs-tiers{1-6} for sanity.
> 
> Thanks
> - Ioi
> 
> 

