RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC
Jiangli Zhou
jianglizhou at google.com
Mon Jun 1 03:33:20 UTC 2020
On Sun, May 31, 2020 at 8:27 PM Jiangli Zhou <jianglizhou at google.com> wrote:
>
> On Fri, May 29, 2020 at 10:44 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> >
> >
> >
> > On 5/29/20 8:40 PM, Jiangli Zhou wrote:
> > > On Fri, May 29, 2020 at 7:30 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> > >> https://bugs.openjdk.java.net/browse/JDK-8245925
> > >> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/
> > >>
> > >>
> > >> Summary:
> > >>
> > >> CDS supports archived heap objects only for G1. During -Xshare:dump,
> > >> CDS executes a full GC so that G1 will compact the heap regions, leaving
> > >> maximum contiguous free space at the top of the heap. Then, the archived
> > >> heap regions are allocated from the top of the heap.
> > >>
> > >> Under some circumstances, java.lang.ref.Cleaners will execute
> > >> after the GC has completed. The cleaners may allocate or synchronize, which
> > >> can cause G1 to allocate an EDEN region at the top of the heap.
> > > This is an interesting one. Please give more details on the circumstances
> > > under which java.lang.ref.Cleaners cause the issue. It's unclear to me
> > > why it hasn't shown up before.
> >
> > Hi Jiangli,
> >
> > Thanks for the review. It's very helpful.
> >
> > The assert (see my comment in JDK-8245925) happened in my prototype for
> > JDK-8244778
> >
> > http://cr.openjdk.java.net/~iklam/jdk15/8244778-archive-full-module-graph.v00.8/
> >
> > I have to archive AppClassLoader and PlatformClassLoader, but need to
> > clear their URLClassPath field (the "ucp" field). See
> > clear_loader_states in metaspaceShared.cpp. Because of this, some
> > java.util.zip.ZipFiles referenced by the URLClassPath become garbage,
> > and their Cleaners are executed after the full GC has finished.
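> >
> > The scrubbing itself is essentially just nulling out the field on the
> > archived loader oop. A rough sketch (not the actual clear_loader_states
> > code; the field lookup below is only for illustration):
> >
> >     // Null out the "ucp" (URLClassPath) field of a class loader oop.
> >     static void clear_ucp_field(oop loader) {
> >       InstanceKlass* ik = InstanceKlass::cast(loader->klass());
> >       fieldDescriptor fd;
> >       Symbol* name = SymbolTable::new_symbol("ucp");
> >       Symbol* sig  = SymbolTable::new_symbol("Ljdk/internal/loader/URLClassPath;");
> >       if (ik->find_field(name, sig, &fd) != NULL) {
> >         loader->obj_field_put(fd.offset(), NULL);
> >         // The URLClassPath and the java.util.zip.ZipFiles it references
> >         // become unreachable; their Cleaners run after the next GC.
> >       }
> >     }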
> >
>
> I haven't looked at your 8244778-archive-full-module-graph change yet.
> If you are going to archive class loader objects, you probably want to
> go with a solution that scrubs the fields that are 'not archivable' and
> then restores them at runtime. It sounds like you are going with that.
> When I worked on the initial implementation for system module object
> archiving, I implemented a static field scrubber with the goal of
> archiving class loaders. I didn't complete it as it was not yet needed,
> but the code is probably helpful for you now. I might have sent you a
> pointer to one of the versions at the time, but try looking under my
> old /home directory if it's still around. It might be good to trigger
> the runtime field restoration from Java code; that's the part I haven't
> fully explored yet. But hopefully these inputs will be useful for your
> current work.
>
> > I think the bug has always existed, but it was just never triggered
> > because we had not activated the Cleaners before.
> >
> > >
> > >>
> > >> The fix is simple -- after CDS has entered a safepoint, if EDEN regions
> > >> exist,
> > >> exit the safepoint, run GC, and try again. Eventually all the cleaners will
> > >> be executed and no more allocation can happen.
> > >>
> > >> For safety, I limit the retry count to 30 (about 9 seconds in total).
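> > >>
> > >> In pseudo-code, the retry loop at the dump call site is roughly (not the
> > >> exact code in the webrev, just the shape of it):
> > >>
> > >>     VM_PopulateDumpSharedSpace op;
> > >>     for (int i = 0; i < 30; i++) {
> > >>       // Full GC so G1 compacts live objects away from the top of the heap.
> > >>       Universe::heap()->collect(GCCause::_archive_time_gc);
> > >>       VMThread::execute(&op);   // doit() bails out if EDEN regions exist
> > >>       if (!has_unwanted_g1_eden_regions()) {
> > >>         break;                  // top of the heap is clean; dump succeeded
> > >>       }
> > >>       // A Cleaner allocated after the GC; run another GC and try again.
> > >>     }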
> > >>
> > > I think it's better to skip the top allocated region(s) in such cases
> > > and avoid retrying. Dump-time performance is important, as we are
> > > moving the cost from runtime to CDS dump time. It's desirable to keep
> > > the dump-time cost as low as possible, so that using CDS delivers a
> > > better net gain overall.
> > >
> > > Here are some comments for your current webrev itself.
> > >
> > > 1611 static bool has_unwanted_g1_eden_regions() {
> > > 1612 #if INCLUDE_G1GC
> > > 1613 return HeapShared::is_heap_object_archiving_allowed() && UseG1GC &&
> > > 1614 G1CollectedHeap::heap()->eden_regions_count() > 0;
> > > 1615 #else
> > > 1616 return false;
> > > 1617 #endif
> > > 1618 }
> > >
> > > You can remove 'UseG1GC' from line 1613, as the
> > > is_heap_object_archiving_allowed() check already covers it:
> > >
> > >   static bool is_heap_object_archiving_allowed() {
> > >     CDS_JAVA_HEAP_ONLY(return (UseG1GC && UseCompressedOops &&
> > >                                UseCompressedClassPointers);)
> > >     NOT_CDS_JAVA_HEAP(return false;)
> > >   }
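> > >
> > > i.e. the check can simply become:
> > >
> > >    return HeapShared::is_heap_object_archiving_allowed() &&
> > >           G1CollectedHeap::heap()->eden_regions_count() > 0;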
> > >
> > > Please include the heap archiving code under #if INCLUDE_CDS_JAVA_HEAP.
> > > It would be better to extract the GC handling code in
> > > VM_PopulateDumpSharedSpace::doit() into a separate API in
> > > heapShared.*.
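> > >
> > > For example, something along these lines (the name is made up, just to
> > > illustrate the shape; the body would essentially be
> > > has_unwanted_g1_eden_regions() from the webrev, moved into heapShared.cpp):
> > >
> > >   // heapShared.hpp
> > >   // True if an EDEN region is still sitting at the top of the heap after
> > >   // the dump-time GC, i.e. another GC + retry is needed.
> > >   static bool is_heap_top_blocked_by_eden() NOT_CDS_JAVA_HEAP_RETURN_(false);
> > >
> > >   // metaspaceShared.cpp, in VM_PopulateDumpSharedSpace::doit():
> > >   if (HeapShared::is_heap_top_blocked_by_eden()) {
> > >     return;   // the caller runs another GC and retries
> > >   }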
> > >
> > > As a longer-term fix, it's time to enhance heap archiving to use a
> > > separate buffer when copying the objects at dump time (as discussed
> > > before). I'll file an RFE.
> >
> > Thanks for reminding me. I think that is a better way to fix this
> > problem. It should be fairly easy to do, as we can already relocate the
> > heap regions using HeapShared::patch_archived_heap_embedded_pointers().
> > Let me try to implement it.
> >
>
> Sounds good. Thanks for doing that.
>
> > BTW, the GC speed is very fast, because the heap is not used very much
> > during -Xshare:dump. -Xlog:gc shows:
> >
> > [0.259s][info][gc ] GC(0) Pause Full (Full GC for -Xshare:dump)
> > 4M->1M(32M) 8.220ms
> >
> > So we have allocated only 4MB of objects, and only 1MB of those are
> > reachable.
> >
> > Anyway, I think we can even avoid running the GC altogether. We can scan
> > for contiguous free space from the top of the heap (below the EDEN
> > region). If there's more contiguous free space than the currently
> > allocated heap regions, we know for sure that we can archive all the
> > heap objects that we need without doing the GC. That can be done as an
> > RFE after the object-copying work. It won't save much, though (8ms out
> > of about 700ms of total -Xshare:dump time).
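> >
> > The scan itself could be something like this (untested sketch; assuming
> > I'm remembering the G1 region iteration API correctly):
> >
> >     G1CollectedHeap* g1h = G1CollectedHeap::heap();
> >     uint free_regions_at_top = 0;
> >     for (uint i = g1h->num_regions(); i-- > 0; ) {
> >       HeapRegion* r = g1h->region_at(i);
> >       if (r->is_eden())  continue;  // skip the EDEN region(s) at the very top
> >       if (!r->is_free()) break;     // first used region ends the free run
> >       free_regions_at_top++;
> >     }
> >     // If free_regions_at_top covers everything we need to archive, we can
> >     // skip the full GC entirely.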
>
> Looks like we had similar thoughts about finding free heap regions for
> copying. Here are the details:
>
> Solution 1):
> Allocate a buffer (no specific memory location requirement) and copy
> the heap objects to the buffer. Additional buffers can be allocated
> when needed, and they don't need to form a consecutive block of
> memory. The pointers within the copied Java objects need to be
> computed against the Java heap top, i.e. the 'real' heap addresses (so
> their runtime positions are at the heap top), instead of the buffer
> addresses. The region verification code also needs to be updated to
> reflect how the pointers are computed.
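>
> The pointer fix-up is then just a constant delta per buffer. A minimal
> sketch (the names are placeholders, not actual code):
>
>   // Map an address inside the dump-time copy buffer to the address the
>   // object will have at runtime, at the top of the Java heap.
>   static address to_requested_addr(address obj_in_buffer,
>                                    address buffer_base,
>                                    address heap_top_base) {
>     return heap_top_base + (obj_in_buffer - buffer_base);
>   }
>
>   // Every oop field in the copied objects is rewritten with
>   // to_requested_addr(), so the regions are self-consistent once they are
>   // mapped at the top of the runtime heap.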
>
> Solution 2):
> Find a range (consecutive heap regions) within the Java heap that is
> free. Copy the archive objects to that range. The pointer update and
> verification are similar to solution 1).
>
> I think you can go with solution 1) for now. Solution 2) has the benefit
> of not requiring additional memory for copying the archived objects.
> That's important, as I did run into insufficient memory at dump time in
> real use cases, so any memory saving at dump time is desirable.
Some clarification to avoid confusion: the insufficient memory was due
to memory restrictions for builds in a cloud environment.
Thanks,
Jiangli
> It's better
> to go with 2) when the size of the archive range is known before copying.
> With the planned work for class pre-initialization and enhanced object
> archiving support, it will be possible to obtain (or have a good estimate
> of) the total size before copying. Solution 1) can be enhanced to use
> heap memory when that happens.
>
> I'll log these details as an RFE on Monday.
>
> Best,
> Jiangli
>
> >
> > I'll withdraw this patch for now, and will try to implement the object
> > copying.
> >
> > Thanks
> > - Ioi
> >
> > >
> > > Best,
> > > Jiangli
> > >
> > >> Thanks
> > >> - Ioi
> > >>
> > >>
> >