RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC

Jiangli Zhou jianglizhou at google.com
Mon Jun 1 03:27:08 UTC 2020


On Fri, May 29, 2020 at 10:44 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>
>
>
> On 5/29/20 8:40 PM, Jiangli Zhou wrote:
> > On Fri, May 29, 2020 at 7:30 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> >> https://bugs.openjdk.java.net/browse/JDK-8245925
> >> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/
> >>
> >>
> >> Summary:
> >>
> >> CDS supports archived heap objects only for G1. During -Xshare:dump,
> >> CDS executes a full GC so that G1 will compact the heap regions, leaving
> >> maximum contiguous free space at the top of the heap. Then, the archived
> >> heap regions are allocated from the top of the heap.
> >>
> >> Under some circumstances, java.lang.ref.Cleaners will execute
> >> after the GC has completed. The cleaners may allocate or synchronize, which
> >> will cause G1 to allocate an EDEN region at the top of the heap.
> > This is an interesting one. Please give more details on the
> > circumstances under which java.lang.ref.Cleaners cause the issue. It's
> > unclear to me why this hasn't shown up before.
>
> Hi Jiangli,
>
> Thanks for the review. It's very helpful.
>
> The assert (see my comment in JDK-8245925) happened in my prototype for
> JDK-8244778
>
> http://cr.openjdk.java.net/~iklam/jdk15/8244778-archive-full-module-graph.v00.8/
>
> I have to archive AppClassLoader and PlatformClassLoader, but need to
> clear their URLClassPath field (the "ucp" field). See
> clear_loader_states in metaspaceShared.cpp. Because of this, some
> java.util.zip.ZipFiles referenced by the URLClassPath become garbage,
> and their Cleaners are executed after the full GC has finished.
>

I haven't looked at your 8244778-archive-full-module-graph change yet.
If you are going to archive class loader objects, you probably want to
go with a solution that scrubs the fields that are 'not archivable' and
then restores them at runtime. Sounds like you are going with that
approach. When I worked on the initial implementation for system module
object archiving, I implemented a static field scrubber with the goal
of archiving class loaders. I didn't complete it as it was not yet
needed, but the code is probably helpful for you now. I might have
sent you the pointer to one of the versions at the time, but try
looking under my old /home directory if it's still around. It might be
good to trigger the runtime field restoration from Java code; that's
the part I haven't fully explored yet. Hopefully these inputs will be
useful for your current work.
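
To make the scrub/restore idea concrete, here is a rough,
self-contained sketch. The names and structure are made up for
illustration (this is not actual HotSpot code), and it models both
steps in a single process just to show their shape; in the real design
the restore side would run in the new JVM against the mapped archive:

#include <vector>

struct ScrubbedField {
  void** location;   // slot in an archived object cleared at dump time
};

static std::vector<ScrubbedField> g_scrubbed_fields;

// Dump time: clear a field that must not go into the archive (for
// example a class loader's "ucp" field) and remember where it lives.
void scrub_field(void** location) {
  *location = nullptr;
  g_scrubbed_fields.push_back(ScrubbedField{location});
}

// Runtime: recompute each scrubbed field. As noted above, this step
// could be driven from Java code once the archived objects have been
// materialized.
void restore_scrubbed_fields() {
  for (ScrubbedField& f : g_scrubbed_fields) {
    *f.location = nullptr;  // placeholder: rebuild the real value here,
                            // e.g. reconstruct the URLClassPath
  }
}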

> I think the bug has always existed, but it was just never triggered
> because we had not activated the Cleaners before.
>
> >
> >>
> >> The fix is simple -- after CDS has entered a safepoint, if EDEN regions
> >> exist,
> >> exit the safepoint, run GC, and try again. Eventually all the cleaners will
> >> be executed and no more allocation can happen.
> >>
> >> For safety, I limit the retry count to 30 (or about 9 seconds in total).
> >>
> > I think it's better to skip the top allocated region(s) in such cases
> > and avoid retrying. Dump time performance is important, as we are
> > moving the cost from runtime to CDS dump time. It's desirable to keep
> > the dump time cost as low as possible, so that using CDS delivers a
> > better net gain overall.
> >
> > Here are some comments for your current webrev itself.
> >
> > 1611 static bool has_unwanted_g1_eden_regions() {
> > 1612 #if INCLUDE_G1GC
> > 1613   return HeapShared::is_heap_object_archiving_allowed() && UseG1GC &&
> > 1614          G1CollectedHeap::heap()->eden_regions_count() > 0;
> > 1615 #else
> > 1616   return false;
> > 1617 #endif
> > 1618 }
> >
> > You can remove 'UseG1GC' from line 1613, as the
> > is_heap_object_archiving_allowed() check already covers it:
> >
> > static bool is_heap_object_archiving_allowed() {
> >    CDS_JAVA_HEAP_ONLY(return (UseG1GC && UseCompressedOops &&
> >                              UseCompressedClassPointers);)
> >    NOT_CDS_JAVA_HEAP(return false;)
> > }
> >
> > Please include heap archiving code under #if INCLUDE_CDS_JAVA_HEAP.
> > It's better to extract the GC handling code in
> > VM_PopulateDumpSharedSpace::doit() into a separate API in
> > heapShared.*.
> >
> > It's time to enhance heap archiving to use a separate buffer when
> > copying the objects at dump time (discussed before), as a longer-term
> > fix. I'll file an RFE.
>
> Thanks for reminding me. I think that is a better way to fix this
> problem. It should be fairly easy to do, as we can already relocate the
> heap regions using HeapShared::patch_archived_heap_embedded_pointers().
> Let me try to implement it.
>

Sounds good. Thanks for doing that.

> BTW, the GC is very fast, because the heap is not used very much
> during -Xshare:dump. -Xlog:gc shows:
>
> [0.259s][info][gc ] GC(0) Pause Full (Full GC for -Xshare:dump)
> 4M->1M(32M) 8.220ms
>
> So we have allocated only 4MB of objects, and only 1MB of those are
> reachable.
>
> Anyway, I think we can even avoid running the GC altogether. We can scan
> for contiguous free space from the top of the heap (below the EDEN
> region). If there's more contiguous free space than the currently
> allocated heap regions occupy, we know for sure that we can archive
> all the heap objects that we need without doing the GC. That can be
> done as an
> RFE after copying the objects. It won't save much though (8ms out of
> about 700ms of total -Xshare:dump time).

Looks like we had similar thoughts about finding free heap regions for
copying. Here are the details:

Solution 1):
Allocate a buffer (with no specific memory location requirement) and
copy the heap objects into the buffer. Additional buffers can be
allocated when needed, and they don't need to form a consecutive block
of memory. The pointers within the copied Java objects need to be
computed relative to the Java heap top as 'real heap addresses' (so
the objects' runtime positions are at the heap top), instead of using
the buffer addresses. The region verification code also needs to be
updated to reflect how the pointers are computed now.

Solution 2):
Find a range (consecutive heap regions) within the Java heap that is
free. Copy the archived objects into that range. The pointer updates
and verification are similar to solution 1).
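
To make the pointer computation in 1) and 2) concrete, here is a
rough, self-contained sketch (illustrative names, not actual HotSpot
code): translate a copied object's buffer address into the address it
will have once the archived regions sit at their requested runtime
location, and rewrite each embedded reference with the translated
value before writing it out.

#include <cstddef>
#include <cstdint>

typedef uint8_t* address;

// Map a location inside the dump-time copy buffer to the address the
// same object will have when the archived regions are mapped at their
// requested position in the runtime heap.
address buffer_to_requested(address obj_in_buffer,
                            address buffer_base,
                            address requested_base) {
  ptrdiff_t offset = obj_in_buffer - buffer_base;  // offset within archive
  return requested_base + offset;                  // runtime address
}

// Rewrite one embedded reference so the archive stores the runtime
// address of the referent rather than its temporary buffer address.
void relocate_field(address* field, address buffer_base, address requested_base) {
  if (*field != nullptr) {
    *field = buffer_to_requested(*field, buffer_base, requested_base);
  }
}

The verification would then check the rewritten values against the
requested runtime range instead of the buffer.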

I think you can go with solution 1) now. Solution 2) has the benefit
of not requiring additional memory for copying the archived objects.
That's important, as I did run into insufficient memory at dump time
in real use cases, so any memory saving at dump time is desirable.
It's better to go with 2) once the size of the archive range is known
before copying. With the planned work for class pre-initialization and
enhanced object archiving support, it will be possible to obtain (or
have a good estimate of) the total size before copying. Solution 1)
can be enhanced to use heap memory when that happens.
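
For the 2)-style check of whether a large enough free, consecutive
range exists, here is a toy sketch scanning down from the heap top (in
the spirit of the idea quoted above; it deliberately models the
regions as a plain boolean array rather than using the real
G1CollectedHeap APIs):

#include <cstddef>

// occupied[i] == true means heap region i currently holds live data.
// Returns true if at least 'regions_needed' free regions sit
// contiguously at the top of the heap, so the archive range can be
// placed there.
bool enough_free_regions_at_top(const bool* occupied,
                                std::size_t num_regions,
                                std::size_t regions_needed) {
  if (regions_needed == 0) {
    return true;
  }
  std::size_t free_run = 0;
  for (std::size_t i = num_regions; i-- > 0; ) {
    if (occupied[i]) {
      break;                 // first occupied region ends the top free run
    }
    if (++free_run >= regions_needed) {
      return true;
    }
  }
  return false;
}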

I'll log these details as an RFE on Monday.

Best,
Jiangli

>
> I'll withdraw this patch for now, and will try to implement the object
> copying.
>
> Thanks
> - Ioi
>
> >
> > Best,
> > Jiangli
> >
> >> Thanks
> >> - Ioi
> >>
> >>
> >> <https://bugs.openjdk.java.net/browse/JDK-8245925>
>

