RFR(S) 8214388 CDS dumping fails with java heap fragmentation
Ioi Lam
ioi.lam at oracle.com
Mon Dec 3 23:58:50 UTC 2018
Hi Stefan, thanks for the review.
- Ioi
On 12/3/2018 1:28 AM, Stefan Johansson wrote:
> Hi Ioi,
>
> Thanks for addressing my comments I think this version looks good from
> a G1 point of view.
>
> Thanks,
> Stefan
>
> On 2018-11-30 23:36, Ioi Lam wrote:
>> Hi Jiangli,
>>
>> Thanks for the review!
>>
>>
>> Updated webrev:
>> http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v03-delta/
>>
>>
>> On 11/30/18 1:04 PM, Jiangli Zhou wrote:
>>> Hi Ioi,
>>>
>>> - src/hotspot/share/gc/g1/g1Arguments.cpp
>>>
>>> You missed
>>>   FLAG_SET_ERGO(bool, ExplicitGCInvokesConcurrent, false);
>>
>> As Stefan mentioned below, doing this is not necessary because I have
>> added a new GC cause.
>>
>>> - src/hotspot/share/gc/g1/g1CollectedHeap.cpp
>>> - src/hotspot/share/gc/shared/adaptiveSizePolicy.cpp
>>>
>>> The above files have no change.
>>
>> I had multiple MQ patches that cancel each other out. Webrev still
>> generates an empty diff for those files. Please ignore them.
>>
>>> - src/hotspot/share/gc/g1/g1HeapVerifier.cpp
>>>
>>>  320     log_warning(gc, verify)("All free regions should be at the top end of the heap, but"
>>>  321                             " we found holes. This is probably caused by (unmovable) humongous"
>>>  322                             " allocations, and may lead to fragmentation while"
>>>  323                             " writing archive heap memory regions.");
>>>
>>> This warning is not very clear and can be a false alarm. A hole in
>>> the middle of the heap causes heap fragmentation, but not necessarily
>>> in the archived regions. I'd leave the warning out and let the
>>> existing code in FileMapInfo::write_archive_heap_regions detect
>>> archive fragmentation and report an error.
>> This warning is about a *possibility* of fragmentation. It gives the
>> user a chance to check what's going on, without having to wait for an
>> actual failure to happen.
>>
>> In practice, this means if the JDK start-up code is changed in the
>> future to allocate a humongous array, someone will discover this
>> while building the JDK and dumping the default archive. That way, we
>> can fix the JDK to avoid the use of humongous arrays.
>>
>> In fact, I just modified the code to warn against humongous regions
>> even if no holes are found.
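
To make the intent of that check concrete, here is a small standalone sketch of the invariant being verified (hypothetical names and a toy region model, not the actual G1HeapVerifier code): scanning from the top of the heap, free regions must form a contiguous run at the top, and a free region trapped below occupied regions is tolerated only when an unmovable humongous region sits above it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy region states; real G1 heap regions carry much more state.
enum class RegionType { Free, Old, Humongous };

// Returns true if the heap layout is suitable for archiving: all free
// regions form one contiguous run at the top end of the heap. A free
// region below an unmovable humongous region is a hole we tolerate
// (and would warn about); a free region below only movable regions is
// real fragmentation and fails the check.
bool ready_for_archiving(const std::vector<RegionType>& regions) {
    bool seen_occupied = false;   // passed an occupied region yet?
    bool seen_humongous = false;  // was any occupied region humongous?
    for (std::size_t i = regions.size(); i > 0; i--) {
        switch (regions[i - 1]) {
        case RegionType::Free:
            if (seen_occupied && !seen_humongous) {
                return false;     // hole below movable regions only
            }
            break;                // hole below a humongous region: tolerated
        case RegionType::Humongous:
            seen_humongous = true;
            seen_occupied = true;
            break;
        case RegionType::Old:
            seen_occupied = true;
            break;
        }
    }
    return true;
}
```

This also shows why the check is cheap, as noted later in the thread: it is a single linear pass over the region list.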
>>
>>
>>> - src/hotspot/share/gc/shared/gcCause.cpp
>>>
>>> I'd suggest renaming _cds_dump to _archive_time_gc. The word
>>> 'archive' has been adopted for use in GC code.
>>
>> Done
>>
>>
>>> - src/hotspot/share/memory/filemap.cpp
>>>
>>> I see no change for line 686?
>>
>> A space character was missing after the word "fragmentation."
>>
>>
>>> - src/hotspot/share/memory/heapShared.cpp
>>>
>>> Please revert the renaming of
>>> HeapShared::check_closed_archive_heap_region_object.
>>> HeapShared::check_closed_archive_heap_region_object_class is not
>>> correct. We check the objects in the graph, not the classes. It's
>>> better to use a separate bug and changeset for the bug fix in this
>>> function, so it's not mixed with the change for 8214388, for better
>>> back-tracking.
>> I'll revert the whole change and create a new bug.
>>
>> I found the name check_closed_archive_heap_region_object confusing
>> because it says "object" but passes a class. I think it's better to
>> pass the object, and then have this function read the class from the
>> object. I'll send a separate RFR.
>>
>>> - src/hotspot/share/memory/metaspaceShared.cpp
>>>
>>> The G1-specific code needs to be wrapped with #if INCLUDE_G1GC.
>> Done
>>
>>
>>>  391       log_warning(cds)("[line %d] extra interned string ignored; size too large: %d",
>>>  392                        reader.last_line_no(), utf8_length);
>>>
>>> You can use cds+heap for object-archiving-related logging.
>> Done
>>
>>
>>> - 1718   if (UseG1GC && HeapShared::is_heap_object_archiving_allowed()) {
>>>
>>> As Stefan also pointed out, you don't need the UseG1GC check here.
>>> HeapShared::is_heap_object_archiving_allowed already includes it.
>> Fixed
>>
>>
>>> G1HeapVerifier::verify_ready_for_archiving() is called in two
>>> places. One is in HeapShared::archive_java_heap_objects and the
>>> other one is in metaspaceShared.cpp. The duplication should be
>>> eliminated.
>> I removed the one inside metaspaceShared.cpp
>>
>>
>>> I also have a question about the need for
>>> G1HeapVerifier::verify_ready_for_archiving(). As there is already a
>>> more efficient check in FileMapInfo::write_archive_heap_regions to
>>> detect fragmented archive regions, verify_ready_for_archiving does
>>> not add additional value. It's also a more expensive check.
>> It's supposed to provide better logging for diagnosing future
>> fragmentation (much easier to read than tracing through all the
>> gc+region logs). I don't know about you, but for me, trying to find
>> out why fragmentation happened in JDK-8214217 was pretty damn tedious!
>>
>> It's not expensive at all since it just scans the list of regions once.
>>
>> Thanks
>>
>> - Ioi
>>
>>
>>> Thanks,
>>> Jiangli
>>>
>>> On 11/30/18 11:02 AM, Ioi Lam wrote:
>>>> Hi Stefan,
>>>>
>>>> Thanks for the review. Here's an updated patch:
>>>>
>>>> http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v02/
>>>> http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v02-delta/
>>>>
>>>> Please see responses to your comments below.
>>>>
>>>> I also added code to handle humongous strings that can be allocated
>>>> during dump time, and make sure they are GC'ed (and thus there are
>>>> no humongous regions) before we start dumping the heap.
>>>>
>>>> I'll add one more test to test the handling of humongous regions
>>>> that exist while the heap is being dumped, but that will require a
>>>> little hacking with JVMTI. I'll post that later :-)
>>>>
>>>> On 11/30/18 3:17 AM, Stefan Johansson wrote:
>>>>> Hi Ioi,
>>>>>
>>>>> I mostly looked at the GC interaction, some comments below.
>>>>>
>>>>> On 2018-11-30 02:07, Ioi Lam wrote:
>>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v01/
>>>>>
>>>>> src/hotspot/share/gc/g1/g1CollectedHeap.cpp
>>>>> -------------------------------------------
>>>>>  325 #if INCLUDE_CDS
>>>>>  326   if (DumpSharedSpaces && HeapShared::is_heap_object_archiving_allowed()) {
>>>>>  327     // See G1HeapVerifier::verify_ready_for_archiving()
>>>>>  328     // This should not happen during normal operation of -Xshare:dump, so let's give a warning.
>>>>>  329     log_warning(gc)("Avoid allocating humongous objects during -Xshare:dump (" SIZE_FORMAT
>>>>>  330                     " bytes) - may cause fragmentation",
>>>>>  331                     word_size * HeapWordSize);
>>>>>  332   }
>>>>>  333 #endif
>>>>>
>>>>> I think this warning should be left out, you still have output in
>>>>> the verification that tells the user what is problematic.
>>>>
>>>> Removed.
>>>>
>>>>> src/hotspot/share/gc/g1/g1HeapVerifier.cpp
>>>>> -------------------------------------------
>>>>> What do you think about extending the verification with one more
>>>>> state, maybe _fatal_hole. When iterating the regions you can then
>>>>> look for holes not caused by humongous. If such holes are found,
>>>>> log an error and fail.
>>>>>
>>>>> You can then turn the current log_error into a log_warning letting
>>>>> the user know there have been humongous allocations which can
>>>>> cause problems during dump time.
>>>>>
>>>>>  314   // turn on the following assert only if we disallow humongous allocations during
>>>>>  315   // dump time.
>>>>>  316   //assert(!cl.has_holes(), "must not have holes");
>>>>>
>>>>> I think you should remove these lines.
>>>>
>>>> I modified the verification loop as you suggested, and changed the
>>>> assert to check that all holes must be caused by humongous regions.
>>>>
>>>>> src/hotspot/share/gc/shared/adaptiveSizePolicy.cpp
>>>>> --------------------------------------------------
>>>>>  184 #if INCLUDE_CDS
>>>>>  185   if (DumpSharedSpaces && HeapShared::is_heap_object_archiving_allowed()) {
>>>>>  186     // See G1HeapVerifier::verify_ready_for_archiving()
>>>>>  187     return 1;
>>>>>  188   }
>>>>>  189 #endif
>>>>>  190
>>>>>
>>>>> I think this decision should be moved to the G1 argument parsing.
>>>>> In G1Arguments::initialize() you can add this after the initial
>>>>> setting of ParallelGCThreads:
>>>>>
>>>>> // When dumping the CDS archive we want to reduce fragmentation by
>>>>> // triggering a full collection. To get as low fragmentation as
>>>>> // possible we only use one worker thread.
>>>>> if (DumpSharedSpaces) {
>>>>>   FLAG_SET_ERGO(uint, ParallelGCThreads, 1);
>>>>>   FLAG_SET_ERGO(bool, ExplicitGCInvokesConcurrent, false);
>>>>> }
>>>>
>>>> Done.
>>>>
>>>>> As you see I also include setting ExplicitGCInvokesConcurrent to
>>>>> false, because if that flag is set a full gc will be turned into a
>>>>> concurrent one.
>>>>>
>>>>> src/hotspot/share/memory/metaspaceShared.cpp
>>>>> --------------------------------------------
>>>>>  1706 #if INCLUDE_CDS_JAVA_HEAP
>>>>>  1707   if (UseG1GC && HeapShared::is_heap_object_archiving_allowed()) {
>>>>>  1708     // Avoid fragmentation while archiving heap objects.
>>>>>  1709     Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true);
>>>>>  1710     Universe::heap()->collect(GCCause::_java_lang_system_gc);
>>>>>  1711     Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(false);
>>>>>  1712     G1HeapVerifier::verify_ready_for_archiving();
>>>>>  1713   }
>>>>>  1714 #endif
>>>>>
>>>>> Do you need the #if here? is_heap_object_archiving_allowed() only
>>>>> returns true when INCLUDE_CDS_JAVA_HEAP is true. Also, the check
>>>>> for UseG1GC is already done within is_heap_object_archiving_allowed().
>>>>>
>>>>> Regarding triggering the GC, the calls to the soft_ref_policy are
>>>>> not needed unless you care about clearing soft references.
>>>>
>>>> I removed the #if. I kept the clearing of soft refs -- soft refs
>>>> are not archived, so clearing them gives more free space for
>>>> archiving.
>>>>
>>>>> An even nicer solution would be to add a new GCCause that is only
>>>>> used for CDS dumping. That way you can also get a better message
>>>>> in the GC log than "System.gc()", and you can also skip the
>>>>> setting of ExplicitGCInvokesConcurrent that I mentioned above,
>>>>> since the new GCCause won't be affected by that flag.
>>>>
>>>> I added GCCause::_cds_dump and skipped the setting of
>>>> ExplicitGCInvokesConcurrent.
>>>>
>>>> Thanks - Ioi
>>>>
>>>>> Thanks,
>>>>> Stefan
>>>>>
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8214388
>>>>>>
>>>>>> Symptom:
>>>>>> ========
>>>>>>
>>>>>> "java -Xshare:dump" would intermittently fail with
>>>>>>
>>>>>>     Unable to write archive heap ... due to fragmentation.
>>>>>>
>>>>>> This usually happens when you try to dump many classes (e.g.,
>>>>>> 10000) with a relatively small heap (e.g., 1g) with a lot of GC
>>>>>> threads (e.g., 24).
>>>>>>
>>>>>> (Example use case -- Eclipse IDE loads 15,000 classes with 512MB
>>>>>> heap.)
>>>>>>
>>>>>> When GC happens during class loading, some old G1 regions may be
>>>>>> placed at the top end of the heap (due to the large number of GC
>>>>>> threads).
>>>>>>
>>>>>> Later, when writing the archived heap, G1 tries to allocate
>>>>>> contiguous regions from the top end of the heap. This would fail
>>>>>> due to the presence of those old regions.
>>>>>>
>>>>>> Fix:
>>>>>> ====
>>>>>>
>>>>>> As suggested by Stefan Johansson, we run a full GC with a single
>>>>>> GC thread. This guarantees that all old blocks will be moved to
>>>>>> the bottom end of the heap.
>>>>>>
>>>>>> Because there's no API for specifying the number of GC threads
>>>>>> dynamically, and CDS dump time doesn't allocate lots of objects,
>>>>>> I have statically forced the number of threads to 1 in
>>>>>> AdaptiveSizePolicy::calc_active_workers during CDS dump time.
>>>>>>
>>>>>> (This seems like a more direct way than assigning
>>>>>> ParallelGCThreads ...)
>>>>>>
>>>>>> Notes:
>>>>>> ======
>>>>>>
>>>>>> 1. Humongous regions cannot move. However, currently we don't do
>>>>>>    humongous allocations during CDS dump, so we should be fine.
>>>>>>    I have added diagnostic warnings so if fragmentation does
>>>>>>    happen in the future, the user can find out why.
>>>>>>
>>>>>> 2. Fixed a minor bug in
>>>>>>    HeapShared::check_closed_archive_heap_region_object_class
>>>>>>
>>>>>> 3. Fixed a bug in MetaspaceShared::read_extra_data, where the
>>>>>>    symbols/strings would be lost due to GC.
>>>>>>
>>>>>> 4. Added a stress test to successfully archive about 18MB of
>>>>>>    objects with -Xmx64m. This used to fail even with -Xmx512m on
>>>>>>    a Solaris box.
>>>>>>
>>>>>> 5. With default CDS archive generation during JDK build time,
>>>>>>    -Xmx128m is used. Before this fix, the EDEN region lives at
>>>>>>    the top of the heap during CDS dump time, and we end up with a
>>>>>>    2MB gap between the archive regions and the top of the heap.
>>>>>>    Because the archive regions cannot move, at run time, using
>>>>>>    CDS would reduce the max humongous allocation by 2MB.
>>>>>>
>>>>>>    With this fix, the archive regions are now placed at the very
>>>>>>    top of the heap, so the gap no longer exists.
>>>>>>
>>>>>> Tests:
>>>>>> ======
>>>>>>
>>>>>> Running hs-tiers{1-6} for sanity.
>>>>>>
>>>>>> Thanks - Ioi
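
[For readers curious why the single-worker full GC in the fix above removes the fragmentation: a single-threaded sliding compaction moves every live region down to the lowest free slot, so all free regions end up in one contiguous run at the top of the heap, which is exactly where the archive regions are allocated. The toy model below illustrates that property only; it is not HotSpot code, and all names are illustrative.]

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy heap model: true = live region, false = free region.
// Single-threaded sliding compaction: slide each live region down to
// the next free slot, leaving one contiguous free run at the top.
std::vector<bool> compact_single_threaded(const std::vector<bool>& heap) {
    std::vector<bool> out(heap.size(), false);
    std::size_t next = 0;            // next free slot at the bottom
    for (bool live : heap) {
        if (live) {
            out[next++] = true;      // live region moves to the bottom end
        }
    }
    return out;
}

// Number of contiguous free regions at the top end of the heap --
// the space available for contiguous archive-region allocation.
std::size_t free_at_top(const std::vector<bool>& heap) {
    std::size_t n = 0;
    for (std::size_t i = heap.size(); i > 0; i--) {
        if (heap[i - 1]) break;      // hit a live region
        n++;
    }
    return n;
}
```

With multiple GC workers, each worker compacts into its own claimed range, so live regions can remain scattered (including at the top end); with one worker the result is always the fully-slid layout above.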
More information about the hotspot-runtime-dev mailing list