RFR(S) 8214388 CDS dumping fails with java heap fragmentation
Stefan Johansson
stefan.johansson at oracle.com
Mon Dec 3 09:28:51 UTC 2018
Hi Ioi,
Thanks for addressing my comments. I think this version looks good from a
G1 point of view.
Thanks,
Stefan
On 2018-11-30 23:36, Ioi Lam wrote:
> Hi Jiangli,
>
> Thanks for the review!
>
>
> Updated webrev:
> http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v03-delta/
>
>
> On 11/30/18 1:04 PM, Jiangli Zhou wrote:
>> Hi Ioi,
>>
>> - src/hotspot/share/gc/g1/g1Arguments.cpp
>>
>> You missed
>> FLAG_SET_ERGO(bool, ExplicitGCInvokesConcurrent, false);
>
> As Stefan mentioned below, doing this is not necessary because I have
> added a new GC cause.
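>
> (The reason a dedicated cause makes that flag irrelevant: G1 decides
> per GCCause whether an explicit GC gets upgraded to a concurrent
> cycle. A simplified sketch of that logic, using the _archive_time_gc
> name adopted below -- not the exact HotSpot code:
>
>   bool G1CollectedHeap::should_do_concurrent_full_gc(GCCause::Cause cause) {
>     switch (cause) {
>       case GCCause::_java_lang_system_gc:
>         // Only user-requested GCs consult ExplicitGCInvokesConcurrent.
>         return ExplicitGCInvokesConcurrent;
>       case GCCause::_archive_time_gc:
>         // The CDS dump-time GC always wants a full, compacting STW collection.
>         return false;
>       default:
>         return false;
>     }
>   }
> )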
>
>> - src/hotspot/share/gc/g1/g1CollectedHeap.cpp
>>
>> - src/hotspot/share/gc/shared/adaptiveSizePolicy.cpp
>>
>> Above files have no change.
>
> I had multiple MQ patches that cancel each other out. Webrev still
> generates an empty diff for those files. Please ignore them.
>
>> - src/hotspot/share/gc/g1/g1HeapVerifier.cpp
>>
>>  320     log_warning(gc, verify)("All free regions should be at the top end of the heap, but "
>>  321                             " we found holes. This is probably caused by (unmovable) humongous"
>>  322                             " allocations, and may lead to fragmentation while"
>>  323                             " writing archive heap memory regions.");
>>
>> This warning is not very clear and can be a false alarm. A hole in
>> the middle of the heap causes heap fragmentation, but not necessarily
>> in the archived regions. I'd leave the warning out and let the
>> existing code in FileMapInfo::write_archive_heap_regions detect
>> archive fragmentation and report an error.
>
> This warning is about a *possibility* of fragmentation. It gives the
> user a chance to check what's going on, without having to wait for an
> actual failure to happen.
>
> In practice, this means if the JDK start-up code is changed in the
> future to allocate a humongous array, someone will discover this while
> building the JDK and dumping the default archive. That way, we can fix
> the JDK to avoid the use of humongous arrays.
>
> In fact, I just modified the code to warn against humongous regions even
> if no holes are found.
>
>
>> - src/hotspot/share/gc/shared/gcCause.cpp
>>
>> I'd suggest renaming _cds_dump to _archive_time_gc. The word
>> 'archive' has been adopted for use in GC code.
>
>
> Done
>
>
>> - src/hotspot/share/memory/filemap.cpp
>>
>> I see no change for line 686?
>
> The change on that line adds a space character that was missing after
> the word "fragmentation."
>
>
>> - src/hotspot/share/memory/heapShared.cpp
>>
>> Please revert the renaming for
>> HeapShared::check_closed_archive_heap_region_object.
>> HeapShared::check_closed_archive_heap_region_object_class is not
>> correct. We check the objects in the graph, not the classes. It's
>> better to use a separate bug and changeset for the bug fix in this
>> function, so it's not mixed with the change for 8214388, for better
>> back-tracking.
>
> I'll revert the whole change and create a new bug.
>
> I found the name check_closed_archive_heap_region_object confusing
> because it says "object" but passes a class. I think it's better to pass
> the object, and then have this function read the class from the object.
> I'll send a separate RFR.
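>
> (Roughly, the idea is:
>
>   void HeapShared::check_closed_archive_heap_region_object(oop obj) {
>     Klass* k = obj->klass();  // derive the class from the object itself
>     // ... then check the object based on its class ...
>   }
>
> instead of taking a Klass* parameter directly.)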
>
>> - src/hotspot/share/memory/metaspaceShared.cpp
>>
>> The G1-specific code needs to be wrapped with #if INCLUDE_G1GC.
>
> Done
>
>
>>  391       log_warning(cds)("[line %d] extra interned string ignored; size too large: %d",
>>  392                        reader.last_line_no(), utf8_length);
>>
>> You can use cds+heap for object-archiving-related logging.
>
> Done
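>
> (i.e., the call becomes something like:
>
>   log_warning(cds, heap)("[line %d] extra interned string ignored; size too large: %d",
>                          reader.last_line_no(), utf8_length);
> )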
>
>
>> - 1718   if (UseG1GC && HeapShared::is_heap_object_archiving_allowed()) {
>>
>> As Stefan also pointed out, you don't need the UseG1GC check here.
>> HeapShared::is_heap_object_archiving_allowed already includes it.
>
> Fixed
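>
> (For reference, the guard is roughly the following -- exact details
> may differ:
>
>   bool HeapShared::is_heap_object_archiving_allowed() {
>     // Archive regions are a G1 feature, so UseG1GC is already part
>     // of this check.
>     return UseG1GC && UseCompressedOops && UseCompressedClassPointers;
>   }
> )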
>
>
>> G1HeapVerifier::verify_ready_for_archiving() is called in two
>> places. One is in HeapShared::archive_java_heap_objects and the
>> other is in metaspaceShared.cpp. The duplication should be
>> eliminated.
> I removed the one inside metaspaceShared.cpp
>
>
>> I also question the need for
>> G1HeapVerifier::verify_ready_for_archiving(). As there is already a
>> more efficient check in FileMapInfo::write_archive_heap_regions to
>> detect fragmented archive regions, verify_ready_for_archiving does
>> not add additional value. It's also a more expensive check.
>
> It's supposed to provide better logging for diagnosing future
> fragmentation problems (much easier to read than tracing all the
> gc+region logs). I don't know about you, but for me trying to find out
> why fragmentation happened in JDK-8214217 was pretty damn tedious!
>
> It's not expensive at all since it just scans the list of regions once.
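>
> (Conceptually it's a single pass over the region list, something like
> this simplified sketch -- not the actual HotSpot code:
>
>   // Walk the regions from bottom to top. Once a free region has been
>   // seen, any later non-free region means there is a hole below it.
>   bool seen_free = false;
>   for (uint i = 0; i < num_regions; i++) {
>     HeapRegion* r = region_at(i);
>     if (r->is_free()) {
>       seen_free = true;
>     } else if (seen_free) {
>       log_warning(gc, verify)("Region %u is above a free region - hole found", i);
>     }
>   }
> )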
>
> Thanks
>
> - Ioi
>
>
>> Thanks,
>> Jiangli
>>
>> On 11/30/18 11:02 AM, Ioi Lam wrote:
>>> Hi Stefan,
>>>
>>> Thanks for the review. Here's an updated patch:
>>>
>>> http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v02/
>>>
>>> http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v02-delta/
>>>
>>> Please see responses to your comments below.
>>>
>>> I also added code to handle humongous strings that can be allocated
>>> during dump time, and to make sure they are GC'ed (so that no
>>> humongous regions remain) before we start dumping the heap.
>>>
>>> I'll add one more test for the handling of humongous regions that
>>> exist while the heap is being dumped, but that will require a little
>>> hacking with JVMTI. I'll post that later :-)
>>>
>>> On 11/30/18 3:17 AM, Stefan Johansson wrote:
>>>> Hi Ioi,
>>>>
>>>> I mostly looked at the GC interaction, some comments below.
>>>>
>>>> On 2018-11-30 02:07, Ioi Lam wrote:
>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8214388-dumptime-fragmentation.v01/
>>>> src/hotspot/share/gc/g1/g1CollectedHeap.cpp
>>>> -------------------------------------------
>>>>  325 #if INCLUDE_CDS
>>>>  326   if (DumpSharedSpaces && HeapShared::is_heap_object_archiving_allowed()) {
>>>>  327     // See G1HeapVerifier::verify_ready_for_archiving()
>>>>  328     // This should not happen during normal operation of -Xshare:dump, so let's give a warning.
>>>>  329     log_warning(gc)("Avoid allocating humongous objects during -Xshare:dump (" SIZE_FORMAT
>>>>  330                     " bytes) - may cause fragmentation",
>>>>  331                     word_size * HeapWordSize);
>>>>  332   }
>>>>  333 #endif
>>>>
>>>> I think this warning should be left out; you still have output in
>>>> the verification that tells the user what is problematic.
>>>
>>> Removed.
>>>
>>>> src/hotspot/share/gc/g1/g1HeapVerifier.cpp
>>>> ------------------------------------------
>>>> What do you think about extending the verification with one more
>>>> state, maybe _fatal_hole? When iterating the regions you can then
>>>> look for holes not caused by humongous allocations. If such holes
>>>> are found, log an error and fail.
>>>>
>>>> You can then turn the current log_error into a log_warning, letting
>>>> the user know there have been humongous allocations which can cause
>>>> problems during dump time.
>>>>
>>>>  314   // turn on the following assert only if we disallow humongous allocations during
>>>>  315   // dump time.
>>>>  316   //assert(!cl.has_holes(), "must not have holes");
>>>>
>>>> I think you should remove these lines.
>>>
>>> I modified the verification loop as you suggested, and changed the
>>> assert to check that all holes must be caused by humongous regions.
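>>>
>>> (So the assert is now roughly of this form -- a simplified sketch,
>>> where all_holes_caused_by_humongous() is a made-up name for whatever
>>> the closure actually records:
>>>
>>>   // Holes left behind by (unmovable) humongous regions are tolerated;
>>>   // any other hole would indicate a problem with the compacting full GC.
>>>   assert(!cl.has_holes() || cl.all_holes_caused_by_humongous(),
>>>          "free regions may only be interrupted by humongous regions");
>>> )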
>>>> src/hotspot/share/gc/shared/adaptiveSizePolicy.cpp
>>>> --------------------------------------------------
>>>>  184 #if INCLUDE_CDS
>>>>  185   if (DumpSharedSpaces && HeapShared::is_heap_object_archiving_allowed()) {
>>>>  186     // See G1HeapVerifier::verify_ready_for_archiving()
>>>>  187     return 1;
>>>>  188   }
>>>>  189 #endif
>>>>  190
>>>>
>>>> I think this decision should be moved to the G1 argument parsing.
>>>> In G1Arguments::initialize() you can add this after the initial
>>>> setting of ParallelGCThreads:
>>>>
>>>>   // When dumping the CDS archive we want to reduce fragmentation by
>>>>   // triggering a full collection. To get as low fragmentation as
>>>>   // possible we only use one worker thread.
>>>>   if (DumpSharedSpaces) {
>>>>     FLAG_SET_ERGO(uint, ParallelGCThreads, 1);
>>>>     FLAG_SET_ERGO(bool, ExplicitGCInvokesConcurrent, false);
>>>>   }
>>>
>>> Done.
>>>
>>>> As you see, I also include setting ExplicitGCInvokesConcurrent to
>>>> false, because if that flag is set a full GC will be turned into a
>>>> concurrent one.
>>>>
>>>> src/hotspot/share/memory/metaspaceShared.cpp
>>>> --------------------------------------------
>>>> 1706 #if INCLUDE_CDS_JAVA_HEAP
>>>> 1707   if (UseG1GC && HeapShared::is_heap_object_archiving_allowed()) {
>>>> 1708     // Avoid fragmentation while archiving heap objects.
>>>> 1709     Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true);
>>>> 1710     Universe::heap()->collect(GCCause::_java_lang_system_gc);
>>>> 1711     Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(false);
>>>> 1712     G1HeapVerifier::verify_ready_for_archiving();
>>>> 1713   }
>>>> 1714 #endif
>>>>
>>>> Do you need the #if here? is_heap_object_archiving_allowed() only
>>>> returns true when INCLUDE_CDS_JAVA_HEAP is true. Also, the check
>>>> for UseG1GC is already done within is_heap_object_archiving_allowed().
>>>>
>>>> Regarding triggering the GC, the calls to the soft_ref_policy are
>>>> not needed unless you care about clearing soft references.
>>>
>>> I removed the #if. I kept the clearing of soft refs -- soft refs
>>> are not archived, so clearing them would give more free space for
>>> archiving.
>>>
>>>> An even nicer solution would be to add a new GCCause that is only
>>>> used for CDS dumping. That way you can also get a better message
>>>> in the GC log than "System.gc()", and you can also skip the
>>>> setting of ExplicitGCInvokesConcurrent that I mentioned above,
>>>> since the new GCCause won't be affected by that flag.
>>>
>>> I added GCCause::_cds_dump and skipped the setting of
>>> ExplicitGCInvokesConcurrent.
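>>>
>>> (Putting the pieces together, the dump-time GC request now looks
>>> roughly like this -- a simplified sketch of the revised code, not a
>>> verbatim quote of the patch:
>>>
>>>   if (HeapShared::is_heap_object_archiving_allowed()) {
>>>     // Soft references are not archived, so clearing them frees up
>>>     // more space for the archive regions.
>>>     Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true);
>>>     Universe::heap()->collect(GCCause::_cds_dump);
>>>     Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(false);
>>>     G1HeapVerifier::verify_ready_for_archiving();
>>>   }
>>> )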
>>>
>>> Thanks
>>> - Ioi
>>>
>>>> Thanks,
>>>> Stefan
>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8214388
>>>>>
>>>>> Symptom:
>>>>> ========
>>>>>
>>>>> "java -Xshare:dump" would intermittently fail with
>>>>>
>>>>>   Unable to write archive heap ... due to fragmentation.
>>>>>
>>>>> This usually happens when you try to dump many classes (e.g.,
>>>>> 10000) with a relatively small heap (e.g., 1g) and a lot of GC
>>>>> threads (e.g., 24).
>>>>>
>>>>> (Example use case -- the Eclipse IDE loads 15,000 classes with a
>>>>> 512MB heap.)
>>>>>
>>>>> When GC happens during class loading, some old G1 regions may be
>>>>> placed at the top end of the heap (due to the large number of GC
>>>>> threads).
>>>>>
>>>>> Later, when writing the archived heap, G1 tries to allocate
>>>>> contiguous regions from the top end of the heap. This would fail
>>>>> due to the presence of those old regions.
>>>>>
>>>>> Fix:
>>>>> ====
>>>>>
>>>>> As suggested by Stefan Johansson, we run a full GC with a single
>>>>> GC thread. This guarantees that all old blocks will be moved to
>>>>> the bottom end of the heap.
>>>>>
>>>>> Because there's no API for specifying the number of GC threads
>>>>> dynamically, and CDS dump time doesn't allocate lots of objects,
>>>>> I have statically forced the number of threads to 1 in
>>>>> AdaptiveSizePolicy::calc_active_workers during CDS dump time.
>>>>>
>>>>> (This seems like a more direct way than assigning
>>>>> ParallelGCThreads ...)
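>>>>>
>>>>> (The shape of that change, roughly -- a simplified sketch, not the
>>>>> exact diff:
>>>>>
>>>>>   // in AdaptiveSizePolicy::calc_active_workers()
>>>>>   if (DumpSharedSpaces && HeapShared::is_heap_object_archiving_allowed()) {
>>>>>     // A single worker makes the full GC compact everything toward
>>>>>     // the bottom of the heap, leaving the top end contiguous.
>>>>>     return 1;
>>>>>   }
>>>>> )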
>>>>>
>>>>> Notes:
>>>>> ======
>>>>>
>>>>> 1. Humongous regions cannot move. However, currently we don't do
>>>>> humongous allocations during CDS dump, so we should be fine. I
>>>>> have added diagnostic warnings so that if fragmentation does
>>>>> happen in the future, the user can find out why.
>>>>>
>>>>> 2. Fixed a minor bug in
>>>>> HeapShared::check_closed_archive_heap_region_object_class.
>>>>>
>>>>> 3. Fixed a bug in MetaspaceShared::read_extra_data, where the
>>>>> symbols/strings would be lost due to GC.
>>>>>
>>>>> 4. Added a stress test to successfully archive about 18MB of
>>>>> objects with -Xmx64m. This used to fail even with -Xmx512m on a
>>>>> Solaris box.
>>>>>
>>>>> 5. With default CDS archive generation during JDK build time,
>>>>> -Xmx128m is used. Before this fix, the EDEN region lived at the
>>>>> top of the heap during CDS dump time, and we ended up with a 2MB
>>>>> gap between the archive regions and the top of the heap. Because
>>>>> the archive regions cannot move, at run time using CDS would
>>>>> reduce the max humongous allocation by 2MB.
>>>>>
>>>>> With this fix, the archive regions are now placed at the very top
>>>>> of the heap, so the gap no longer exists.
>>>>>
>>>>> Tests:
>>>>> ======
>>>>>
>>>>> Running hs-tiers{1-6} for sanity.
>>>>>
>>>>> Thanks - Ioi