RFR(S) 8214388 CDS dumping fails with java heap fragmentation
coleen.phillimore at oracle.com
coleen.phillimore at oracle.com
Mon Dec 3 11:38:53 UTC 2018
I didn't review but looked at this because I was curious.
http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v03-delta/src/hotspot/share/memory/heapShared.cpp.udiff.html
+ if (!fs.access_flags().is_final() && (ft == T_ARRAY || T_OBJECT)) {
This doesn't look right.
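(T_OBJECT is a non-zero BasicType constant, so the second operand of the
|| is always true. Presumably the intent is to compare ft against both
constants, i.e. something like:
+ if (!fs.access_flags().is_final() && (ft == T_ARRAY || ft == T_OBJECT)) {
but I'll let Ioi confirm.)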
thanks,
Coleen
On 11/30/18 5:41 PM, Ioi Lam wrote:
> (Resending since my last mail seems to have been mangled)
>
> Hi Jiangli,
>
> Thanks for the review!
>
> Updated webrev:
> http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v03-delta/
>
>
> On 11/30/18 1:04 PM, Jiangli Zhou wrote:
>> Hi Ioi,
>>
>> - src/hotspot/share/gc/g1/g1Arguments.cpp
>>
>> You missed FLAG_SET_ERGO(bool, ExplicitGCInvokesConcurrent, false);
>
>
> As Stefan mentioned below, doing this is not necessary because I have
> added a new GC cause.
>
>>
>> - src/hotspot/share/gc/g1/g1CollectedHeap.cpp
>>
>> - src/hotspot/share/gc/shared/adaptiveSizePolicy.cpp
>>
>> Above files have no change.
>>
>
> I had multiple MQ patches that cancel each other out. Webrev still
> generates an empty diff for those files. Please ignore them.
>
>
>> - src/hotspot/share/gc/g1/g1HeapVerifier.cpp
>>
>>     320         log_warning(gc, verify)("All free regions should be at the top end of the heap, but"
>>     321                                 " we found holes. This is probably caused by (unmovable) humongous"
>>     322                                 " allocations, and may lead to fragmentation while"
>>     323                                 " writing archive heap memory regions.");
>>
>> This warning is not very clear and can raise false alarms. A hole in
>> the middle of the heap causes heap fragmentation, but not necessarily
>> in the archived regions. I'd leave the warning out and let the
>> existing code in FileMapInfo::write_archive_heap_regions detect
>> archive fragmentation and report an error.
>>
>
> This warning is about a *possibility* of fragmentation. It gives the
> user a chance to check what's going on, without having to wait for an
> actual failure to happen.
>
> In practice, this means if the JDK start-up code is changed in the
> future to allocate a humongous array, someone will discover this while
> building the JDK and dumping the default archive. That way, we can fix
> the JDK to avoid the use of humongous arrays.
>
> In fact, I just modified the code to warn against humongous regions
> even if no holes are found.
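>
> For the record, the verification is roughly shaped like this (just a
> sketch; the closure name and the exact messages may differ slightly in
> the final webrev):
>
>   void G1HeapVerifier::verify_ready_for_archiving() {
>     // Scan the region list once, bottom to top, recording free regions
>     // that are followed by used regions ("holes") and humongous regions.
>     VerifyReadyForArchivingRegionClosure cl;
>     G1CollectedHeap::heap()->heap_region_iterate(&cl);
>     if (cl.has_holes()) {
>       log_warning(gc, verify)("All free regions should be at the top end of the heap, but"
>                               " we found holes. This is probably caused by (unmovable) humongous"
>                               " allocations, and may lead to fragmentation while"
>                               " writing archive heap memory regions.");
>     }
>     if (cl.has_humongous()) {
>       log_warning(gc, verify)("(Unmovable) humongous regions have been found and"
>                               " may lead to fragmentation while"
>                               " writing archive heap memory regions.");
>     }
>     // Any hole that is not explained by a humongous region is unexpected.
>     assert(!cl.has_unexpected_holes(), "all holes should be caused by humongous allocations");
>   }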
>
>
>> - src/hotspot/share/gc/shared/gcCause.cpp
>>
>> I'd suggest renaming _cds_dump to _archive_time_gc. The word
>> 'archive' has already been adopted in the GC code.
>>
>> - src/hotspot/share/memory/filemap.cpp
>>
>
> Done
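>
> I.e., gcCause.hpp/gcCause.cpp now carry something along these lines
> (paraphrased, not quoted from the webrev):
>
>   // gcCause.hpp
>   _archive_time_gc,          // full GC requested before writing the archive heap regions
>
>   // gcCause.cpp
>   case _archive_time_gc:
>     return "Full GC for -Xshare:dump";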
>
>
>> I see no change for line 686?
>>
>> - src/hotspot/share/memory/heapShared.cpp
>>
>
> A space character is missing after the word "fragmentation."
>
>
>> Please revert the renaming of
>> HeapShared::check_closed_archive_heap_region_object.
>> HeapShared::check_closed_archive_heap_region_object_class is not
>> correct: we check the objects in the graph, not the classes. It's
>> better to use a separate bug and changeset for the bug fix in this
>> function, so it's not mixed with the change for 8214388 and is easier
>> to track back.
>>
>
> I'll revert the whole change and create a new bug.
>
> I found the name check_closed_archive_heap_region_object confusing
> because it says "object" but is passed a class. I think it's better to
> pass the object, and then have this function read the class from the
> object. I'll send a separate RFR.
>
>
>> - src/hotspot/share/memory/metaspaceShared.cpp
>>
>> The G1 specific code needs to be wrapped with #if INCLUDE_G1GC.
>>
>>     391       log_warning(cds)("[line %d] extra interned string ignored; size too large: %d",
>>     392                        reader.last_line_no(), utf8_length);
>>
>> You can use cds+heap for object archiving related logging.
>>
>
> Done
>
>
>> - 1718   if (UseG1GC && HeapShared::is_heap_object_archiving_allowed()) {
>>
>> As Stefan also pointed out, you don't need the UseG1GC check here.
>> HeapShared::is_heap_object_archiving_allowed already includes it.
>>
>
> Fixed
>
>
>> G1HeapVerifier::verify_ready_for_archiving() is called in two
>> places. One is in HeapShared::archive_java_heap_objects and the other
>> is in metaspaceShared.cpp. The duplication should be eliminated.
>
>
> I removed the one inside metaspaceShared.cpp
>
>
>> I also question the need for
>> G1HeapVerifier::verify_ready_for_archiving(). As there is already a
>> more efficient check in FileMapInfo::write_archive_heap_regions to
>> detect fragmented archive regions, verify_ready_for_archiving does
>> not add additional value. It's also a more expensive check.
>>
>
> It's supposed to provide better logging for diagnosing future
> fragmentation (much easier to read than tracing all the gc+region
> logs). I don't know about you, but for me trying to find out why
> fragmentation happened in JDK-8214217 was pretty damn tedious!
>
> It's not expensive at all since it just scans the list of regions once.
>
> Thanks
>
> - Ioi
>
>
>> Thanks,
>> Jiangli
>>
>> On 11/30/18 11:02 AM, Ioi Lam wrote:
>>> Hi Stefan,
>>>
>>> Thanks for the review. Here's an updated patch:
>>>
>>> http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v02/
>>>
>>> http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v02-delta/
>>>
>>>
>>> Please see responses to your comments below.
>>>
>>> I also added code to handle humongous strings that can be
>>> allocated during dump time, and to make sure they are GC'ed (so no
>>> humongous regions remain) before we start dumping the heap.
>>>
>>> I'll add one more test for the handling of humongous regions
>>> that exist while the heap is being dumped, but that will require
>>> a little hacking with JVMTI. I'll post that later :-)
>>>
>>>
>>> On 11/30/18 3:17 AM, Stefan Johansson wrote:
>>>> Hi Ioi,
>>>>
>>>> I mostly looked at the GC interaction, some comments below.
>>>>
>>>> On 2018-11-30 02:07, Ioi Lam wrote:
>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v01/
>>>>>
>>>> src/hotspot/share/gc/g1/g1CollectedHeap.cpp
>>>> -------------------------------------------
>>>>    325 #if INCLUDE_CDS
>>>>    326   if (DumpSharedSpaces && HeapShared::is_heap_object_archiving_allowed()) {
>>>>    327     // See G1HeapVerifier::verify_ready_for_archiving()
>>>>    328     // This should not happen during normal operation of -Xshare:dump, so let's give a warning.
>>>>    329     log_warning(gc)("Avoid allocating humongous objects during -Xshare:dump (" SIZE_FORMAT
>>>>    330                     " bytes) - may cause fragmentation",
>>>>    331                     word_size * HeapWordSize);
>>>>    332   }
>>>>    333 #endif
>>>>
>>>> I think this warning should be left out; you still have output in
>>>> the verification that tells the user what is problematic.
>>>>
>>>
>>> Removed.
>>>
>>>> /src/hotspot/share/gc/g1/g1HeapVerifier.cpp
>>>> -------------------------------------------
>>>> What do you think about extending the verification with one more
>>>> state, maybe _fatal_hole? When iterating the regions you can then
>>>> look for holes not caused by humongous regions. If such holes are
>>>> found, log an error and fail.
>>>>
>>>> You can then turn the current log_error into a log_warning letting
>>>> the user know there have been humongous allocations, which can cause
>>>> problems during dump time.
>>>>
>>>>    314   // turn on the following assert only if we disallow humongous allocations during
>>>>    315   // dump time.
>>>>    316   //assert(!cl.has_holes(), "must not have holes");
>>>>
>>>> I think you should remove these lines.
>>>>
>>>
>>> I modified the verification loop as you suggested, and changed the
>>> assert to check that all holes must be caused by humongous regions.
>>>
>>>> src/hotspot/share/gc/shared/adaptiveSizePolicy.cpp
>>>> --------------------------------------------------
>>>>    184 #if INCLUDE_CDS
>>>>    185   if (DumpSharedSpaces && HeapShared::is_heap_object_archiving_allowed()) {
>>>>    186     // See G1HeapVerifier::verify_ready_for_archiving()
>>>>    187     return 1;
>>>>    188   }
>>>>    189 #endif
>>>>    190
>>>>
>>>> I think this decision should be moved to the g1 argument parsing.
>>>> In G1Arguments::initialize() you can add this after the initial
>>>> setting of ParallelGCThreads.
>>>>
>>>> // When dumping the CDS archive we want to reduce fragmentation by
>>>> // triggering a full collection. To get as low fragmentation as
>>>> // possible we only use one worker thread.
>>>> if (DumpSharedSpaces) {
>>>> FLAG_SET_ERGO(uint, ParallelGCThreads, 1);
>>>> FLAG_SET_ERGO(bool, ExplicitGCInvokesConcurrent, false);
>>>> }
>>>>
>>>
>>> Done.
>>>
>>>> As you can see, I also include setting ExplicitGCInvokesConcurrent to
>>>> false, because if that flag is set, a full GC will be turned into a
>>>> concurrent one.
>>>>
>>>> src/hotspot/share/memory/metaspaceShared.cpp
>>>> --------------------------------------------
>>>>    1706 #if INCLUDE_CDS_JAVA_HEAP
>>>>    1707   if (UseG1GC && HeapShared::is_heap_object_archiving_allowed()) {
>>>>    1708     // Avoid fragmentation while archiving heap objects.
>>>>    1709     Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true);
>>>>    1710     Universe::heap()->collect(GCCause::_java_lang_system_gc);
>>>>    1711     Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(false);
>>>>    1712     G1HeapVerifier::verify_ready_for_archiving();
>>>>    1713   }
>>>>    1714 #endif
>>>>
>>>> Do you need the #if here? is_heap_object_archiving_allowed() only
>>>> returns true when INCLUDE_CDS_JAVA_HEAP is true. Also, the check for
>>>> UseG1GC is already done within is_heap_object_archiving_allowed().
>>>>
>>>> Regarding triggering the GC, the calls to the soft_ref_policy are
>>>> not needed unless you care about clearing soft references.
>>>
>>> I removed the #if. I kept the clearing of soft refs -- soft refs are
>>> not archived, so clearing them would give more free space for
>>> archiving.
>>>
>>>> An even nicer solution would be to add a new GCCause that is only
>>>> used for CDS dumping. That way you also get a better message in
>>>> the GC log than "System.gc()", and you can skip the setting of
>>>> ExplicitGCInvokesConcurrent that I mentioned above, since the new
>>>> GCCause won't be affected by that flag.
>>>>
>>>
>>> I added GCCause::_cds_dump and skipped the setting of
>>> ExplicitGCInvokesConcurrent.
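>>>
>>> So the dump-time GC trigger is now roughly the following (a sketch;
>>> same sequence as quoted above, just with the dedicated cause, and the
>>> exact placement in metaspaceShared.cpp may still change):
>>>
>>>   if (HeapShared::is_heap_object_archiving_allowed()) {
>>>     // Soft refs are not archived; clearing them frees more room for the archive regions.
>>>     Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true);
>>>     Universe::heap()->collect(GCCause::_cds_dump);  // not affected by ExplicitGCInvokesConcurrent
>>>     Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(false);
>>>     G1HeapVerifier::verify_ready_for_archiving();
>>>   }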
>>>
>>> Thanks
>>> - Ioi
>>>
>>>> Thanks,
>>>> Stefan
>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8214388
>>>>>
>>>>>
>>>>> Symptom:
>>>>> ========
>>>>>
>>>>> "java -Xshare:dump" would intermittently fail with
>>>>>
>>>>> Unable to write archive heap ... due to fragmentation.
>>>>>
>>>>> This usually happens when you try to dump many classes (e.g., 10000)
>>>>> with a relatively small heap (e.g., 1g) and a lot of GC threads
>>>>> (e.g., 24).
>>>>>
>>>>> (Example use case: the Eclipse IDE loads 15,000 classes with a 512MB
>>>>> heap.)
>>>>>
>>>>> When GC happens during class loading, some old G1 regions may be
>>>>> placed at the top end of the heap (due to the large number of GC
>>>>> threads).
>>>>>
>>>>> Later, when writing the archived heap, G1 tries to allocate
>>>>> contiguous regions from the top end of the heap. This would fail due
>>>>> to the presence of those old regions.
>>>>>
>>>>>
>>>>> Fix:
>>>>> ====
>>>>>
>>>>> As suggested by Stefan Johansson, we run a full GC with a single GC
>>>>> thread. This guarantees that all old blocks will be moved to the
>>>>> bottom end of the heap.
>>>>>
>>>>> Because there's no API for specifying the number of GC threads
>>>>> dynamically, and CDS dump time doesn't allocate lots of objects, I
>>>>> have statically forced the number of threads to 1 in
>>>>> AdaptiveSizePolicy::calc_active_workers during CDS dump time.
>>>>>
>>>>> (This seems like a more direct way than assigning
>>>>> ParallelGCThreads ...)
>>>>>
>>>>>
>>>>> Notes:
>>>>> ======
>>>>>
>>>>> 1. Humongous regions cannot move. However, currently we don't do
>>>>>    humongous allocations during CDS dump, so we should be fine. I
>>>>>    have added diagnostic warnings so that if fragmentation does
>>>>>    happen in the future, the user can find out why.
>>>>>
>>>>> 2. Fixed a minor bug in
>>>>>    HeapShared::check_closed_archive_heap_region_object_class.
>>>>>
>>>>> 3. Fixed a bug in MetaspaceShared::read_extra_data, where the
>>>>>    symbols/strings would be lost due to GC.
>>>>>
>>>>> 4. Added a stress test that successfully archives about 18MB of
>>>>>    objects with -Xmx64m. This used to fail even with -Xmx512m on a
>>>>>    Solaris box.
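>>>>>
>>>>>    For reference, such a dump is invoked roughly as follows (the
>>>>>    class list and archive file names below are just placeholders):
>>>>>
>>>>>      java -Xshare:dump -Xmx64m \
>>>>>           -XX:SharedClassListFile=stress.classlist \
>>>>>           -XX:SharedArchiveFile=stress.jsa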
>>>>>
>>>>> 5. With default CDS archive generation during JDK build time,
>>>>>    -Xmx128m is used. Before this fix, the EDEN region lived at the
>>>>>    top of the heap during CDS dump time, and we ended up with a 2MB
>>>>>    gap between the archive regions and the top of the heap. Because
>>>>>    the archive regions cannot move, at run time using CDS would
>>>>>    reduce the max humongous allocation by 2MB.
>>>>>
>>>>>    With this fix, the archive regions are now placed at the very top
>>>>>    of the heap, so the gap no longer exists.
>>>>>
>>>>>
>>>>> Tests:
>>>>> ======
>>>>>
>>>>> Running hs-tiers{1-6} for sanity.
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>
>>>>>
>>>
>>