RFR(S) 8214388 CDS dumping fails with java heap fragmentation
Ioi Lam
ioi.lam at oracle.com
Fri Nov 30 20:11:22 UTC 2018
On 11/30/18 11:02 AM, Ioi Lam wrote:
> Hi Stefan,
>
> Thanks for the review. Here's an updated patch:
>
> http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v02/
>
> http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v02-delta/
>
>
> Please see responses to your comments below.
>
> I also added code to handle humongous strings that can be
> allocated during dump time, and to make sure they are GC'ed (so
> no humongous regions remain) before we start dumping the heap.
>
> I'll add one more test to exercise the handling of humongous regions
> that exist while the heap is being dumped, but that will require
> a little hacking with JVMTI. I'll post that later :-)
>
^^^^^ I added a new test "HumongousDuringDump.java" for the above
purpose and updated the webrev in-place.
Thanks
- Ioi
>
> On 11/30/18 3:17 AM, Stefan Johansson wrote:
>> Hi Ioi,
>>
>> I mostly looked at the GC interaction, some comments below.
>>
>> On 2018-11-30 02:07, Ioi Lam wrote:
>>> http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v01/
>>>
>> src/hotspot/share/gc/g1/g1CollectedHeap.cpp
>> -------------------------------------------
>>  325 #if INCLUDE_CDS
>>  326   if (DumpSharedSpaces && HeapShared::is_heap_object_archiving_allowed()) {
>>  327     // See G1HeapVerifier::verify_ready_for_archiving()
>>  328     // This should not happen during normal operation of -Xshare:dump, so let's give a warning.
>>  329     log_warning(gc)("Avoid allocating humongous objects during -Xshare:dump (" SIZE_FORMAT
>>  330                     " bytes) - may cause fragmentation",
>>  331                     word_size * HeapWordSize);
>>  332   }
>>  333 #endif
>>
>> I think this warning should be left out; you still have output in the
>> verification that tells the user what is problematic.
>>
>
> Removed.
>
>> src/hotspot/share/gc/g1/g1HeapVerifier.cpp
>> -------------------------------------------
>> What do you think about extending the verification with one more
>> state, maybe _fatal_hole? When iterating the regions you can then
>> look for holes not caused by humongous regions. If such holes are
>> found, log an error and fail.
>>
>> You can then turn the current log_error into a log_warning, letting
>> the user know there have been humongous allocations, which can cause
>> problems during dump time.
>>
>>  314 // turn on the following assert only if we disallow humongous allocations during
>>  315 // dump time.
>>  316 //assert(!cl.has_holes(), "must not have holes");
>>
>> I think you should remove these lines.
>>
>
> I modified the verification loop as you suggested, and changed the
> assert to check that all holes must be caused by humongous regions.
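>
> Roughly, the region iteration now works like this (a paraphrased
> sketch, not the exact code in the webrev):
>
>   class VerifyReadyForArchivingRegionClosure : public HeapRegionClosure {
>     bool _seen_free;            // have we passed a free region yet?
>     bool _has_holes;            // non-free region found above a free one
>     bool _has_unexpected_holes; // holes not caused by humongous regions
>    public:
>     VerifyReadyForArchivingRegionClosure() :
>       _seen_free(false), _has_holes(false), _has_unexpected_holes(false) {}
>     bool has_holes() const            { return _has_holes; }
>     bool has_unexpected_holes() const { return _has_unexpected_holes; }
>     virtual bool do_heap_region(HeapRegion* hr) {
>       if (hr->is_free()) {
>         _seen_free = true;
>       } else if (_seen_free) {
>         // Free space below a used region is a hole. Only humongous
>         // regions, which cannot move, are allowed to cause one.
>         _has_holes = true;
>         if (!hr->is_humongous()) {
>           _has_unexpected_holes = true;
>         }
>       }
>       return false; // false == keep iterating over all regions
>     }
>   };
>
> has_holes() drives the log_warning about humongous allocations, and the
> assert checks !cl.has_unexpected_holes() instead of !cl.has_holes().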
>
>> src/hotspot/share/gc/shared/adaptiveSizePolicy.cpp
>> --------------------------------------------------
>>  184 #if INCLUDE_CDS
>>  185   if (DumpSharedSpaces && HeapShared::is_heap_object_archiving_allowed()) {
>>  186     // See G1HeapVerifier::verify_ready_for_archiving()
>>  187     return 1;
>>  188   }
>>  189 #endif
>>  190
>>
>> I think this decision should be moved to the G1 argument parsing. In
>> G1Arguments::initialize() you can add this after the initial setting
>> of ParallelGCThreads.
>>
>> // When dumping the CDS archive we want to reduce fragmentation by
>> // triggering a full collection. To get as low fragmentation as
>> // possible we only use one worker thread.
>> if (DumpSharedSpaces) {
>>   FLAG_SET_ERGO(uint, ParallelGCThreads, 1);
>>   FLAG_SET_ERGO(bool, ExplicitGCInvokesConcurrent, false);
>> }
>>
>
> Done.
>
>> As you can see, I also included setting ExplicitGCInvokesConcurrent to
>> false, because if that flag is set, a full GC will be turned into a
>> concurrent one.
>>
>> src/hotspot/share/memory/metaspaceShared.cpp
>> --------------------------------------------
>>  1706 #if INCLUDE_CDS_JAVA_HEAP
>>  1707   if (UseG1GC && HeapShared::is_heap_object_archiving_allowed()) {
>>  1708     // Avoid fragmentation while archiving heap objects.
>>  1709     Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true);
>>  1710     Universe::heap()->collect(GCCause::_java_lang_system_gc);
>>  1711     Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(false);
>>  1712     G1HeapVerifier::verify_ready_for_archiving();
>>  1713   }
>>  1714 #endif
>>
>> Do you need the #if here? is_heap_object_archiving_allowed() only
>> returns true when INCLUDE_CDS_JAVA_HEAP is true, and the check for
>> UseG1GC is also done within is_heap_object_archiving_allowed().
>>
>> Regarding triggering the GC, the calls to the soft_ref_policy are not
>> needed unless you care about clearing soft references.
>
> I removed the #if. I kept the clearing of soft refs -- soft refs are
> not archived, so clearing them would give more free space for archiving.
>
>> An even nicer solution would be to add a new GCCause that is only
>> used for CDS dumping. That way you also get a better message in
>> the GC log than "System.gc()", and you can skip the setting of
>> ExplicitGCInvokesConcurrent that I mentioned above, since the new
>> GCCause won't be affected by that flag.
>>
>
> I added GCCause::_cds_dump and skipped the setting of
> ExplicitGCInvokesConcurrent.
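>
> For reference, the trigger in metaspaceShared.cpp now reads roughly
> as follows (paraphrased; the exact comments may differ):
>
>   if (HeapShared::is_heap_object_archiving_allowed()) {
>     // Avoid fragmentation while archiving heap objects: run a full GC
>     // to move all live objects to the bottom of the heap. Soft refs
>     // are not archived, so clear them to free up more space. The new
>     // GCCause is not affected by ExplicitGCInvokesConcurrent.
>     Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true);
>     Universe::heap()->collect(GCCause::_cds_dump);
>     Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(false);
>     G1HeapVerifier::verify_ready_for_archiving();
>   }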
>
> Thanks
> - Ioi
>
>> Thanks,
>> Stefan
>>
>>> https://bugs.openjdk.java.net/browse/JDK-8214388
>>>
>>>
>>> Symptom:
>>> ========
>>>
>>> "java -Xshare:dump" would intermittently fail with
>>>
>>> Unable to write archive heap ... due to fragmentation.
>>>
>>> This usually happens when you try to dump many classes (e.g., 10000)
>>> with a relatively small heap (e.g., 1g) and a lot of GC threads
>>> (e.g., 24).
>>>
>>> (Example use case -- Eclipse IDE loads 15,000 classes with 512MB heap.)
>>>
>>> When GC happens during class loading, some old G1 regions may be
>>> placed at the top end of the heap (due to the large number of GC
>>> threads).
>>>
>>> Later, when writing the archived heap, G1 tries to allocate contiguous
>>> regions from the top end of the heap. This would fail due to the
>>> presence of those old regions.
>>>
>>>
>>> Fix:
>>> ====
>>>
>>> As suggested by Stefan Johansson, we run a full GC with a single GC
>>> thread. This guarantees that all old blocks will be moved to the bottom
>>> end of the heap.
>>>
>>> Because there's no API for specifying the number of GC threads
>>> dynamically, and CDS dump time doesn't allocate lots of objects,
>>> I have statically forced the number of threads to 1 in
>>> AdaptiveSizePolicy::calc_active_workers during CDS dump time.
>>>
>>> (This seems like a more direct way than assigning ParallelGCThreads
>>> ...)
>>>
>>>
>>> Notes:
>>> ======
>>>
>>> 1. Humongous regions cannot move. However, we currently don't do
>>>    humongous allocations during CDS dump, so we should be fine. I have
>>>    added diagnostic warnings so that if fragmentation does happen in
>>>    the future, the user can find out why.
>>>
>>> 2. Fixed a minor bug in
>>>    HeapShared::check_closed_archive_heap_region_object_class.
>>>
>>> 3. Fixed a bug in MetaspaceShared::read_extra_data, where the
>>>    symbols/strings would be lost due to GC.
>>>
>>> 4. Added a stress test that successfully archives about 18MB of
>>>    objects with -Xmx64m. This used to fail even with -Xmx512m on a
>>>    Solaris box.
>>>
>>> 5. With default CDS archive generation during JDK build time, -Xmx128m
>>>    is used. Before this fix, the EDEN region lived at the top of the
>>>    heap during CDS dump time, and we ended up with a 2MB gap between
>>>    the archive regions and the top of the heap. Because the archive
>>>    regions cannot move, using CDS at run time would reduce the max
>>>    humongous allocation by 2MB.
>>>
>>>    With this fix, the archive regions are now placed at the very top
>>>    of the heap, so the gap no longer exists.
>>>
>>>
>>> Tests:
>>> ======
>>>
>>> Running hs-tiers{1-6} for sanity.
>>>
>>> Thanks
>>> - Ioi
>>>
>>>
>