RFR(S) 8214388 CDS dumping fails with java heap fragmentation

Ioi Lam ioi.lam at oracle.com
Tue Dec 4 00:06:46 UTC 2018



On 12/3/2018 10:11 AM, Jiangli Zhou wrote:
> On 11/30/18 11:06 PM, Ioi Lam wrote:
>>
>> On 11/30/18 11:01 PM, Ioi Lam wrote:
>>>
>>> On 11/30/18 9:18 PM, Jiangli Zhou wrote:
>>>>
>>>> Hi Ioi,
>>>>
>>>> On 11/30/18 2:41 PM, Ioi Lam wrote:
>>>>
>>>>> (Resending since my last mail seems to have been mangled)
>>>>>
>>>>> Hi Jiangli,
>>>>>
>>>>> Thanks for the review!
>>>>>
>>>>> Updated webrev: 
>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v03-delta/
>>>>>
>>>>>
>>>>> On 11/30/18 1:04 PM, Jiangli Zhou wrote:
>>>>>> Hi Ioi,
>>>>>>
>>>>>> - src/hotspot/share/gc/g1/g1Arguments.cpp
>>>>>>
>>>>>> You missed FLAG_SET_ERGO(bool, ExplicitGCInvokesConcurrent, false);
>>>>>
>>>>>
>>>>> As Stefan mentioned below, doing this is not necessary because I 
>>>>> have added a new GC cause.
>>>>>
>>>>
>>>> Ok. I see that G1CollectedHeap::is_user_requested_concurrent_full_gc
>>>> returns false by default.
>>>>>
>>>>>
>>>>>>
>>>>>> - src/hotspot/share/gc/g1/g1CollectedHeap.cpp
>>>>>>
>>>>>> - src/hotspot/share/gc/shared/adaptiveSizePolicy.cpp
>>>>>>
>>>>>> Above files have no change.
>>>>>>
>>>>>
>>>>> I had multiple MQ patches that cancel each other out. Webrev still 
>>>>> generates an empty diff for those files. Please ignore them.
>>>>>
>>>>>
>>>>>> - src/hotspot/share/gc/g1/g1HeapVerifier.cpp
>>>>>>
>>>>>>  320     log_warning(gc, verify)("All free regions should be at the top end of the heap, but "
>>>>>>  321                             " we found holes. This is probably caused by (unmovable) humongous"
>>>>>>  322                             " allocations, and may lead to fragmentation while"
>>>>>>  323                             " writing archive heap memory regions.");
>>>>>>
>>>>>> This warning is not very clear and can raise a false alarm. A hole
>>>>>> in the middle of the heap causes heap fragmentation, but not
>>>>>> necessarily in the archived regions. I'd leave the warning out and
>>>>>> let the existing code in FileMapInfo::write_archive_heap_regions
>>>>>> detect archive fragmentation and report the error.
>>>>>>
>>>>>
>>>>> This warning is about a *possibility* of fragmentation. It gives 
>>>>> the user a chance to check what's going on, without having to wait 
>>>>> for an actual failure to happen.
>>>>>
>>>>> In practice, this means if the JDK start-up code is changed in the 
>>>>> future to allocate a humongous array, someone will discover this 
>>>>> while building the JDK and dumping the default archive. That way, 
>>>>> we can fix the JDK to avoid the use of humongous arrays.
>>>>>
>>>>> In fact, I just modified the code to warn against humongous 
>>>>> regions even if no holes are found.
>>>>>
>>>>
>>>> A warning implies that something is wrong, which is not always the case here.
>>>
>>>
>>>> That's why it can raise a false alarm. To avoid that, you can change
>>>> log_warning to log_info or log_debug, and also add a note to the
>>>> FileMapInfo::write_archive_heap_regions error output suggesting that
>>>> the logging be enabled.
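>>>>
>>>> For example (the wording here is purely illustrative), the dump-time
>>>> error could be extended along these lines:
>>>>
>>>>   Unable to write archive heap ... due to fragmentation.
>>>>   Re-run with -Xlog:gc+verify,cds+heap to see the region layout.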
>>>
>>>
>>> If I change it to log_info, no one will see it, because no one dumps
>>> with -Xlog:cds.
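>>>
>>> (For reference, enabling that logging at dump time would look roughly
>>> like this -- the exact tag selection is just an example:
>>>
>>>     java -Xshare:dump -Xlog:cds+heap=info,gc+verify=info ...
>>>
>>> and in practice nobody passes such flags when dumping the default archive.)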
>>>
>>> Our current implementation doesn't handle humongous allocations
>>> properly, and may lead to fragmentation. NO ONE will see this
>>> warning unless the JDK is modified to allocate and store humongous
>>> arrays during bootstrapping. If that happens, we want to know about
>>> it immediately.
>>>
>>> I fail to see how this can be considered a false alarm. I'll
>>> gladly provide personal 24-hour customer support if someone comes
>>> and claims that they have been falsely alarmed :-)
>
> I was suggesting changing the FileMapInfo::write_archive_heap_regions
> error to suggest that users enable -Xlog if they want to find out what
> causes the error. As this is a temporary workaround and humongous
> allocations during dump time are rare (they probably would not happen
> at all for most usages), I'm okay with what you have. Let's get this in
> so we can move on to other tasks.
>

Hi Jiangli, thanks for the review. I will do more testing and push what 
I have.

Thanks
- Ioi

>>>
>>>
>>>
>>>>
>>>>>
>>>>>
>>>>>> - src/hotspot/share/gc/shared/gcCause.cpp
>>>>>>
>>>>>> I'd suggest renaming _cds_dump to _archive_time_gc. The word
>>>>>> 'archive' has already been adopted in GC code.
>>>>>>
>>>>>> - src/hotspot/share/memory/filemap.cpp
>>>>>>
>>>>>
>>>>> Done
>>>>>
>>>>>
>>>>>> I see no change for line 686?
>>>>>>
>>>>>> - src/hotspot/share/memory/heapShared.cpp
>>>>>>
>>>>>
>>>>> A space character is missing after the word "fragmentation."
>>>>>
>>>>>
>>>>>> Please revert the renaming of
>>>>>> HeapShared::check_closed_archive_heap_region_object. The new name,
>>>>>> HeapShared::check_closed_archive_heap_region_object_class, is not
>>>>>> correct: we check the objects in the graph, not the classes. It's
>>>>>> better to use a separate bug and changeset for the bug fix in this
>>>>>> function, so it's not mixed with the change for 8214388 and
>>>>>> back-tracking stays easier.
>>>>>>
>>>>>
>>>>> I'll revert the whole change and create a new bug.
>>>>>
>>>>> I found the name check_closed_archive_heap_region_object confusing 
>>>>> because it says "object" but passes a class. I think it's better 
>>>>> to pass the object, and then have this function read the class 
>>>>> from the object. I'll send a separate RFR.
>>>>>
>>>>>
>>>>>> - src/hotspot/share/memory/metaspaceShared.cpp
>>>>>>
>>>>>> The G1 specific code needs to be wrapped with #if INCLUDE_G1GC.
>>>>>>
>>>>>> 391           log_warning(cds)("[line %d] extra interned string ignored; size too large: %d",
>>>>>> 392                            reader.last_line_no(), utf8_length);
>>>>>>
>>>>>> You can use cds+heap for object archiving related logging.
>>>>>>
>>>>>
>>>>> Done
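>>>>>
>>>>> Roughly, the call becomes something like this (a sketch only, not the
>>>>> exact diff; the two-tag form is standard UL usage):
>>>>>
>>>>>   log_warning(cds, heap)("[line %d] extra interned string ignored; size too large: %d",
>>>>>                          reader.last_line_no(), utf8_length);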
>>>>>
>>>>>
>>>>>> - 1718     if (UseG1GC && HeapShared::is_heap_object_archiving_allowed()) {
>>>>>>
>>>>>> As Stefan also pointed out, you don't need UseG1GC check here. 
>>>>>> HeapShared::is_heap_object_archiving_allowed already includes it.
>>>>>>
>>>>>
>>>>> Fixed
>>>>>
>>>>>
>>>>>> G1HeapVerifier::verify_ready_for_archiving() is called in two
>>>>>> places. One is in HeapShared::archive_java_heap_objects and the
>>>>>> other one is in metaspaceShared.cpp. The duplication should be
>>>>>> eliminated.
>>>>>
>>>>>
>>>>> I removed the one inside metaspaceShared.cpp
>>>>>
>>>>>
>>>>>> I also question the need for
>>>>>> G1HeapVerifier::verify_ready_for_archiving(). As there is already a
>>>>>> more efficient check in FileMapInfo::write_archive_heap_regions
>>>>>> to detect fragmented archive regions, verify_ready_for_archiving
>>>>>> does not add additional value. It's also a more expensive check.
>>>>>>
>>>>>
>>>>> It's supposed to provide better logging for diagnosing future
>>>>> fragmentation (much easier to read than tracing through all the
>>>>> gc+region logs). I don't know about you, but for me trying to find
>>>>> out why fragmentation happened in JDK-8214217 was pretty damn tedious!
>>>>>
>>>>> It's not expensive at all since it just scans the list of regions 
>>>>> once.
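>>>>>
>>>>> Roughly, it is a single pass over the region list with a closure,
>>>>> something like this (a sketch only; the class and field names are
>>>>> approximate, not the exact code in the webrev):
>>>>>
>>>>>   class VerifyReadyForArchivingClosure : public HeapRegionClosure {
>>>>>     bool _seen_free;   // a free region has been seen (scanning bottom-up)
>>>>>     bool _has_holes;   // a used, non-humongous region sits above a free one
>>>>>   public:
>>>>>     VerifyReadyForArchivingClosure() : _seen_free(false), _has_holes(false) {}
>>>>>     virtual bool do_heap_region(HeapRegion* hr) {
>>>>>       if (hr->is_free()) {
>>>>>         _seen_free = true;
>>>>>       } else if (_seen_free && !hr->is_humongous()) {
>>>>>         _has_holes = true;  // a hole below occupied space => potential fragmentation
>>>>>       }
>>>>>       return false;         // keep iterating over all regions
>>>>>     }
>>>>>     bool has_holes() const { return _has_holes; }
>>>>>   };
>>>>>
>>>>>   // driven by: G1CollectedHeap::heap()->heap_region_iterate(&cl);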
>>>>>
>>>>
>>>> Please see my suggestion above if more info about fragmentation is needed.
>>>>
>>>> After JDK 12, approach #2 described in the RFE can be
>>>> implemented. It has no issue even if there are humongous regions.
>>>>
>>>
>>> We can remove this logging AFTER fixing the problems with humongous
>>> regions.
>>>
>>>
>>>> I'm still pondering the question that Stefan raised in the Slack
>>>> channel about how much GC work goes on during dump time.
>>>> Although only Strings, mirrors and a few other types of objects are
>>>> allocated currently, if the number of allocated objects is large
>>>> and the Java heap is inadequately small, what are the side effects
>>>> of this change (how much slowdown would be introduced)?
>>>>
>>>
>>> I ran the LotsOfClasses test with and without the change. Over 30000
>>> classes are loaded and there are over 32MB of archived objects. The
>>> whole execution time slowed from 25 seconds to 26 seconds, which is
>>> barely noticeable.
>>>
>>
>> Just to add one thing:
>>
>> What IS noticeable is -- before the fix, LotsOfClasses would fail 
>> with fragmentation with 1308MB of heap when 24 GC threads are used. 
>> Now LotsOfClasses can run successfully with only about 70MB of heap.
>
> Ok, thanks for confirming.
>
> Thanks,
> Jiangli
>>
>>
>>
>>> Thanks
>>>
>>> - Ioi
>>>
>>>
>>>
>>>>
>>>> Thanks,
>>>> Jiangli
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> - Ioi
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Jiangli
>>>>>>
>>>>>> On 11/30/18 11:02 AM, Ioi Lam wrote:
>>>>>>> Hi Stefan,
>>>>>>>
>>>>>>> Thanks for the review. Here's an updated patch:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v02/ 
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v02-delta/ 
>>>>>>>
>>>>>>>
>>>>>>> Please see responses to your comments below.
>>>>>>>
>>>>>>> I also added code to handle humongous strings that can be
>>>>>>> allocated during dump time, and to make sure they are GC'ed (so
>>>>>>> there are no humongous regions) before we start dumping the heap.
>>>>>>>
>>>>>>> I'll add one more test to exercise the handling of humongous regions
>>>>>>> that exist while the heap is being dumped, but that will require
>>>>>>> a little hacking with JVMTI. I'll post that later :-)
>>>>>>>
>>>>>>>
>>>>>>> On 11/30/18 3:17 AM, Stefan Johansson wrote:
>>>>>>>> Hi Ioi,
>>>>>>>>
>>>>>>>> I mostly looked at the GC interaction, some comments below.
>>>>>>>>
>>>>>>>> On 2018-11-30 02:07, Ioi Lam wrote:
>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v01/ 
>>>>>>>>>
>>>>>>>> src/hotspot/share/gc/g1/g1CollectedHeap.cpp
>>>>>>>> -------------------------------------------
>>>>>>>>  325 #if INCLUDE_CDS
>>>>>>>>  326   if (DumpSharedSpaces && HeapShared::is_heap_object_archiving_allowed()) {
>>>>>>>>  327     // See G1HeapVerifier::verify_ready_for_archiving()
>>>>>>>>  328     // This should not happen during normal operation of -Xshare:dump, so let's give a warning.
>>>>>>>>  329     log_warning(gc)("Avoid allocating humongous objects during -Xshare:dump (" SIZE_FORMAT
>>>>>>>>  330                     " bytes) - may cause fragmentation",
>>>>>>>>  331                     word_size * HeapWordSize);
>>>>>>>>  332   }
>>>>>>>>  333 #endif
>>>>>>>>
>>>>>>>> I think this warning should be left out; you still have output
>>>>>>>> in the verification that tells the user what is problematic.
>>>>>>>>
>>>>>>>
>>>>>>> Removed.
>>>>>>>
>>>>>>>> /src/hotspot/share/gc/g1/g1HeapVerifier.cpp
>>>>>>>> -------------------------------------------
>>>>>>>> What do you think about extending the verification with one
>>>>>>>> more state, maybe _fatal_hole? When iterating the regions you
>>>>>>>> can then look for holes not caused by humongous allocations. If
>>>>>>>> such holes are found, log an error and fail.
>>>>>>>>
>>>>>>>> You can then turn the current log_error into a log_warning,
>>>>>>>> letting the user know there have been humongous allocations,
>>>>>>>> which can cause problems during dump time.
>>>>>>>>
>>>>>>>>  314   // turn on the following assert only if we disallow humongous allocations during
>>>>>>>>  315   // dump time.
>>>>>>>>  316   //assert(!cl.has_holes(), "must not have holes");
>>>>>>>>
>>>>>>>> I think you should remove these lines.
>>>>>>>>
>>>>>>>
>>>>>>> I modified the verification loop as you suggested, and changed 
>>>>>>> the assert to check that all holes must be caused by humongous 
>>>>>>> regions.
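>>>>>>>
>>>>>>> Something along these lines (a sketch only; the closure method name
>>>>>>> is approximate):
>>>>>>>
>>>>>>>   // Any hole not explained by a humongous region is a real problem.
>>>>>>>   assert(!cl.has_unexpected_holes(), "all holes should be caused by humongous regions");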
>>>>>>>
>>>>>>>> src/hotspot/share/gc/shared/adaptiveSizePolicy.cpp
>>>>>>>> --------------------------------------------------
>>>>>>>>  184 #if INCLUDE_CDS
>>>>>>>>  185   if (DumpSharedSpaces && HeapShared::is_heap_object_archiving_allowed()) {
>>>>>>>>  186     // See G1HeapVerifier::verify_ready_for_archiving()
>>>>>>>>  187     return 1;
>>>>>>>>  188   }
>>>>>>>>  189 #endif
>>>>>>>>  190
>>>>>>>>
>>>>>>>> I think this decision should be moved to the g1 argument 
>>>>>>>> parsing. In G1Arguments::initialize() you can add this after 
>>>>>>>> the initial setting of ParallelGCThreads.
>>>>>>>>
>>>>>>>>   // When dumping the CDS archive we want to reduce fragmentation by
>>>>>>>>   // triggering a full collection. To get as low fragmentation as
>>>>>>>>   // possible we only use one worker thread.
>>>>>>>>   if (DumpSharedSpaces) {
>>>>>>>>     FLAG_SET_ERGO(uint, ParallelGCThreads, 1);
>>>>>>>>     FLAG_SET_ERGO(bool, ExplicitGCInvokesConcurrent, false);
>>>>>>>>   }
>>>>>>>>
>>>>>>>
>>>>>>> Done.
>>>>>>>
>>>>>>>> As you can see, I also include setting ExplicitGCInvokesConcurrent
>>>>>>>> to false, because if that flag is set, a full GC will be turned
>>>>>>>> into a concurrent one.
>>>>>>>>
>>>>>>>> src/hotspot/share/memory/metaspaceShared.cpp
>>>>>>>> --------------------------------------------
>>>>>>>> 1706 #if INCLUDE_CDS_JAVA_HEAP
>>>>>>>> 1707     if (UseG1GC && HeapShared::is_heap_object_archiving_allowed()) {
>>>>>>>> 1708       // Avoid fragmentation while archiving heap objects.
>>>>>>>> 1709       Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true);
>>>>>>>> 1710       Universe::heap()->collect(GCCause::_java_lang_system_gc);
>>>>>>>> 1711       Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(false);
>>>>>>>> 1712       G1HeapVerifier::verify_ready_for_archiving();
>>>>>>>> 1713     }
>>>>>>>> 1714 #endif
>>>>>>>>
>>>>>>>> Do you need the #if here? is_heap_object_archiving_allowed()
>>>>>>>> only returns true when INCLUDE_CDS_JAVA_HEAP is true. Also, the
>>>>>>>> check for UseG1GC is already done within
>>>>>>>> is_heap_object_archiving_allowed().
>>>>>>>>
>>>>>>>> Regarding triggering the GC, the calls to the soft_ref_policy
>>>>>>>> are not needed unless you care about clearing soft references.
>>>>>>>
>>>>>>> I removed the #if. I kept the clearing of soft refs -- soft refs 
>>>>>>> are not archived, so clearing them would give more free space 
>>>>>>> for archiving.
>>>>>>>
>>>>>>>> An even nicer solution would be to add a new GCCause that is
>>>>>>>> only used for CDS dumping. That way you also get a better
>>>>>>>> message in the GC log than "System.gc()", and you can skip
>>>>>>>> the setting of ExplicitGCInvokesConcurrent that I mentioned
>>>>>>>> above, since the new GCCause won't be affected by that flag.
>>>>>>>>
>>>>>>>
>>>>>>> I added GCCause::_cds_dump and skipped the setting of 
>>>>>>> ExplicitGCInvokesConcurrent
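>>>>>>>
>>>>>>> In sketch form (the enum name is what I used; the log string shown
>>>>>>> here is only illustrative), the new cause is just an entry in
>>>>>>> gcCause.hpp plus its name in GCCause::to_string():
>>>>>>>
>>>>>>>   // gcCause.hpp
>>>>>>>   _cds_dump,
>>>>>>>
>>>>>>>   // gcCause.cpp
>>>>>>>   case _cds_dump:
>>>>>>>     return "CDS dump time GC";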
>>>>>>>
>>>>>>> Thanks
>>>>>>> - Ioi
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8214388
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Symptom:
>>>>>>>>> ========
>>>>>>>>>
>>>>>>>>> "java -Xshare:dump" would intermittently fail with
>>>>>>>>>
>>>>>>>>>     Unable to write archive heap ... due to fragmentation.
>>>>>>>>>
>>>>>>>>> This usually happens when you try to dump many classes (e.g.,
>>>>>>>>> 10000) with a relatively small heap (e.g., 1g) and a lot of GC
>>>>>>>>> threads (e.g., 24).
>>>>>>>>>
>>>>>>>>> (Example use case -- Eclipse IDE loads 15,000 classes with 
>>>>>>>>> 512MB heap.)
>>>>>>>>>
>>>>>>>>> When GC happens during class loading, some old G1 regions may be
>>>>>>>>> placed at the top end of the heap (due to the large number of GC
>>>>>>>>> threads).
>>>>>>>>>
>>>>>>>>> Later, when writing the archived heap, G1 tries to allocate 
>>>>>>>>> contiguous
>>>>>>>>> regions from the top end of the heap. This would fail due to 
>>>>>>>>> the presence
>>>>>>>>> of those old regions.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Fix:
>>>>>>>>> ====
>>>>>>>>>
>>>>>>>>> As suggested by Stefan Johansson, we run a full GC with a 
>>>>>>>>> single GC
>>>>>>>>> thread. This guarantees that all old blocks will be moved to 
>>>>>>>>> the bottom
>>>>>>>>> end of the heap.
>>>>>>>>>
>>>>>>>>> Because there's no API for specifying the number of GC threads
>>>>>>>>> dynamically, and CDS dump time doesn't allocate lots of objects,
>>>>>>>>> I have statically forced the number of threads to 1 in
>>>>>>>>> AdaptiveSizePolicy::calc_active_workers during CDS dump time.
>>>>>>>>>
>>>>>>>>> (This seems like a more direct way than assigning 
>>>>>>>>> ParallelGCThreads ...)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Notes:
>>>>>>>>> ======
>>>>>>>>>
>>>>>>>>> 1. Humongous regions cannot move. However, currently we don't do
>>>>>>>>>     humongous allocations during CDS dump, so we should be fine.
>>>>>>>>>     I have added diagnostic warnings so that if fragmentation does
>>>>>>>>>     happen in the future, the user can find out why.
>>>>>>>>>
>>>>>>>>> 2. Fixed a minor bug in 
>>>>>>>>> HeapShared::check_closed_archive_heap_region_object_class
>>>>>>>>>
>>>>>>>>> 3. Fixed a bug in MetaspaceShared::read_extra_data, where the
>>>>>>>>>     symbols/strings would be lost due to GC.
>>>>>>>>>
>>>>>>>>> 4. Added a stress test that successfully archives about 18MB of
>>>>>>>>>     objects with -Xmx64m. This used to fail even with -Xmx512m on
>>>>>>>>>     a Solaris box.
>>>>>>>>>
>>>>>>>>> 5. With default CDS archive generation during JDK build time, 
>>>>>>>>> -Xmx128m is used.
>>>>>>>>>     Before this fix, the EDEN region lives at the top of the 
>>>>>>>>> heap during CDS dump
>>>>>>>>>     time, and we end up with a 2MB gap between the archive 
>>>>>>>>> regions and the top
>>>>>>>>>     of the heap. Because the archive regions cannot move, at 
>>>>>>>>> run time, using CDS
>>>>>>>>>     would reduce the max humongous allocation by 2MB.
>>>>>>>>>
>>>>>>>>>     With this fix, the archive regions are now placed at the 
>>>>>>>>> very top of the heap,
>>>>>>>>>     so the gap no longer exists.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Tests:
>>>>>>>>> ======
>>>>>>>>>
>>>>>>>>> Running hs-tiers{1-6} for sanity.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> - Ioi
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>


