RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC

Ioi Lam ioi.lam at oracle.com
Tue Jun 2 02:28:25 UTC 2020



On 5/31/20 8:33 PM, Jiangli Zhou wrote:
> On Sun, May 31, 2020 at 8:27 PM Jiangli Zhou <jianglizhou at google.com> wrote:
>> On Fri, May 29, 2020 at 10:44 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>>
>>>
>>> On 5/29/20 8:40 PM, Jiangli Zhou wrote:
>>>> On Fri, May 29, 2020 at 7:30 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8245925
>>>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/
>>>>>
>>>>>
>>>>> Summary:
>>>>>
>>>>> CDS supports archived heap objects only for G1. During -Xshare:dump,
>>>>> CDS executes a full GC so that G1 will compact the heap regions, leaving
>>>>> maximum contiguous free space at the top of the heap. Then, the archived
>>>>> heap regions are allocated from the top of the heap.
>>>>>
>>>>> Under some circumstances, java.lang.ref.Cleaners will execute
>>>>> after the GC has completed. The cleaners may allocate or synchronize, which
>>>>> will cause G1 to allocate an EDEN region at the top of the heap.
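
[For illustration only; this is not code from the webrev, and the class name
and the allocation inside the cleaning action are made up. It is a minimal,
self-contained Java sketch of the mechanism: the cleaning action runs on the
Cleaner's background thread after the referent has been collected, and it is
free to allocate, which is what forces G1 to hand out an EDEN region after
the dump-time GC.]

import java.lang.ref.Cleaner;

public class CleanerAllocates {
    private static final Cleaner CLEANER = Cleaner.create();

    public static void main(String[] args) throws Exception {
        Object resource = new Object();
        // The cleaning action must not capture 'resource'; it runs on the
        // Cleaner's background thread once the referent becomes unreachable,
        // and note that it allocates.
        CLEANER.register(resource, () -> {
            byte[] scratch = new byte[1024];   // allocation after the GC
            System.out.println("cleaner ran, allocated " + scratch.length + " bytes");
        });
        resource = null;      // drop the only strong reference
        System.gc();          // make the referent collectable
        Thread.sleep(500);    // give the cleaner thread a chance to run
    }
}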
>>>> This is an interesting one. Please give more details about the
>>>> circumstances under which java.lang.ref.Cleaners cause the issue. It's
>>>> unclear to me why this hasn't shown up before.
>>> Hi Jiangli,
>>>
>>> Thanks for the review. It's very helpful.
>>>
>>> The assert (see my comment in JDK-8245925) happened in my prototype for
>>> JDK-8244778
>>>
>>> http://cr.openjdk.java.net/~iklam/jdk15/8244778-archive-full-module-graph.v00.8/
>>>
>>> I have to archive AppClassLoader and PlatformClassLoader, but need to
>>> clear their URLClassPath field (the "ucp" field). See
>>> clear_loader_states in metaspaceShared.cpp. Because of this, some
>>> java.util.zip.ZipFiles referenced by the URLClassPath become garbage,
>>> and their Cleaners are executed after the full GC has finished.
>>>
>> I haven't looked at your 8244778-archive-full-module-graph change yet.
>> If you are going to archive class loader objects, you probably want a
>> solution that scrubs fields that are 'not archivable' and then restores
>> them at runtime; it sounds like that is what you are doing. When I
>> worked on the initial implementation of system module object archiving,
>> I implemented a static field scrubber with the goal of archiving class
>> loaders. I didn't complete it as it was not yet needed, but the code is
>> probably helpful for you now. I might have sent you a pointer to one of
>> the versions at the time; otherwise, try looking under my old /home
>> directory if it's still around. It might be good to trigger runtime
>> field restoration from Java code; that's the part I haven't fully
>> explored yet. Hopefully these inputs will be useful for your current
>> work.

Hi Jiangli,

I can't access your old home directory. I already implemented the field 
scrubbing in the above patch. It's in clear_loader_states in 
metaspaceShared.cpp.
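
As a purely hypothetical Java sketch of the general scrub-and-restore idea
(all names are invented; this is not the code in the patch, where the
scrubbing happens in VM code, and the runtime-restoration side is still an
open question):

class ArchivedLoaderState {
    private volatile Object ucp;          // stand-in for the "not archivable" field

    void scrubForDump() {                 // dump time: drop the unarchivable state
        ucp = null;
    }

    Object ucp() {                        // runtime: rebuild lazily on first use
        Object p = ucp;
        if (p == null) {
            p = rebuildClassPath();       // hypothetical re-initialization helper
            ucp = p;
        }
        return p;
    }

    private Object rebuildClassPath() {
        return new Object();              // placeholder for the real restoration
    }
}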

>>> I think the bug has always existed, but was just never triggered because
>>> we had not activated the Cleaners.
>>>
>>>>> The fix is simple -- after CDS has entered a safepoint, if EDEN regions
>>>>> exist, exit the safepoint, run a GC, and try again. Eventually all the
>>>>> cleaners will be executed and no more allocation can happen.
>>>>>
>>>>> For safety, I limit the retry count to 30 (or about 9 seconds in total).
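
[In Java-flavored pseudocode, the retry loop described above looks roughly
like the sketch below. The real fix is VM code, and the 300 ms pause per
attempt is my assumption, chosen only because it makes 30 retries add up to
about 9 seconds.]

static boolean waitForEmptyEden(java.util.function.IntSupplier edenRegionCount,
                                Runnable runFullGc) throws InterruptedException {
    final int maxRetries = 30;
    for (int attempt = 0; attempt < maxRetries; attempt++) {
        if (edenRegionCount.getAsInt() == 0) {
            return true;               // top of the heap is clear; safe to archive
        }
        runFullGc.run();               // let pending Cleaners run, then re-check
        Thread.sleep(300);             // assumed pause; 30 * 300 ms ~= 9 seconds
    }
    return false;                      // still not clean; give up on the dump
}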
>>>>>
>>>> I think it's better to skip the top allocated region(s) in such cases
>>>> and avoid retrying. Dump time performance is important, as we are
>>>> moving the cost from runtime to CDS dump time. It's desirable to keep
>>>> the dump time cost as low as possible, so using CDS delivers better
>>>> net gain overall.
>>>>
>>>> Here are some comments for your current webrev itself.
>>>>
>>>> 1611 static bool has_unwanted_g1_eden_regions() {
>>>> 1612 #if INCLUDE_G1GC
>>>> 1613   return HeapShared::is_heap_object_archiving_allowed() && UseG1GC &&
>>>> 1614          G1CollectedHeap::heap()->eden_regions_count() > 0;
>>>> 1615 #else
>>>> 1616   return false;
>>>> 1617 #endif
>>>> 1618 }
>>>>
>>>> You can remove 'UseG1GC' from line 1613, as
>>>> is_heap_object_archiving_allowed() check already covers it:
>>>>
>>>> static bool is_heap_object_archiving_allowed() {
>>>>   CDS_JAVA_HEAP_ONLY(return (UseG1GC && UseCompressedOops &&
>>>>                              UseCompressedClassPointers);)
>>>>   NOT_CDS_JAVA_HEAP(return false;)
>>>> }
>>>>
>>>> Please include heap archiving code under #if INCLUDE_CDS_JAVA_HEAP.
>>>> It's better to extract the GC handling code in
>>>> VM_PopulateDumpSharedSpace::doit() into a separate API in
>>>> heapShared.*.
>>>>
>>>> It's time to enhance heap archiving to use a separate buffer when
>>>> copying the objects at dump time (discussed before), as a longer-term
>>>> fix. I'll file an RFE.
>>> Thanks for reminding me. I think that is a better way to fix this
>>> problem. It should be fairly easy to do, as we can already relocate the
>>> heap regions using HeapShared::patch_archived_heap_embedded_pointers().
>>> Let me try to implement it.
>>>
>> Sounds good. Thanks for doing that.
>>
>>> BTW, the GC is very fast, because the heap is not used very much
>>> during -Xshare:dump. -Xlog:gc shows:
>>>
>>> [0.259s][info][gc ] GC(0) Pause Full (Full GC for -Xshare:dump)
>>> 4M->1M(32M) 8.220ms
>>>
>>> So we have allocated only 4MB of objects, and only 1MB of those are
>>> reachable.
>>>
>>> Anyway, I think we can even avoid running the GC altogether. We can scan
>>> for contiguous free space from the top of the heap (below the EDEN
>>> region). If there's more contiguous free space than the currently
>>> allocated heap regions occupy, we know for sure that we can archive all
>>> the heap objects that we need without doing the GC. That can be done as
>>> an RFE after the object-copying work. It won't save much, though (8ms
>>> out of about 700ms of total -Xshare:dump time).
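
[A back-of-the-envelope version of that check could look like the following
Java sketch. This is not HotSpot code; the free-map parameter is a stand-in
for whatever G1 query would actually be used. It walks the region map
downward, starting just below the topmost EDEN region, and asks whether the
contiguous free run is already big enough for everything we plan to archive.]

static boolean canSkipDumpTimeGc(boolean[] freeBelowTopEden, long regionSizeBytes,
                                 long bytesToArchive) {
    long contiguousFreeBytes = 0;
    // index length-1 is the region right below the topmost EDEN region;
    // stop at the first region that is not free
    for (int i = freeBelowTopEden.length - 1; i >= 0 && freeBelowTopEden[i]; i--) {
        contiguousFreeBytes += regionSizeBytes;
    }
    return contiguousFreeBytes >= bytesToArchive;
}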
>> Looks like we had similar thoughts about finding free heap regions for
>> copying. Here are the details:
>>
>> Solution 1):
>> Allocate a buffer (no specific memory location requirement) and copy
>> the heap objects to the buffer. Additional buffers can be allocated
>> when needed, and they don't need to form a consecutive block of
>> memory. The pointers within the copied Java objects need to be
>> computed relative to the Java heap top, as 'real heap addresses' (so
>> that their runtime positions are at the heap top), instead of relative
>> to the buffer address. Region verification code also needs to be
>> updated to reflect how the pointers are computed now.
>>
>> Solution 2):
>> Find a range (consecutive heap regions) within the Java heap that is
>> free. Copy the archive objects to that range. The pointer update and
>> verification are similar to solution 1).
>>
>>
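
For what it's worth, the pointer arithmetic behind solution 1) is just a
fixed shift, something like this illustrative sketch (the method and
parameter names are made up):

// The copies live in a side buffer at dump time, but every reference stored
// inside them is written as if the copies already sat at their final spot
// at the top of the runtime heap.
static long archivedPointerValue(long targetAddressInBuffer,
                                 long bufferBase, long runtimeBase) {
    long delta = runtimeBase - bufferBase;   // shift from buffer to heap top
    return targetAddressInBuffer + delta;    // value actually stored in the archive
}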

I am thinking of doing something simpler. During heap archiving, G1 just 
allocates the highest free region:

bool G1ArchiveAllocator::alloc_new_region() {
   // Allocate the highest free region in the reserved heap,
   // and add it to our list of allocated regions. It is marked
   // archive and added to the old set.
   HeapRegion* hr = _g1h->alloc_highest_free_region();

If there are used regions scattered around in the heap, we will end up 
with a few archive regions that are not contiguous, and the highest 
archive region may not be flush with the top of the heap.

Inside VM_PopulateDumpSharedSpace::dump_archive_heap_oopmaps, I will 
rewrite the pointers inside each of the archived regions, such that the 
contents would be the same as if all the archive regions were 
consecutively allocated at the top of the heap.

This will require no more memory allocation at dump time than what we 
do today, and can be done with very little overhead.
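
Roughly, the per-region math is something like the following sketch
(illustrative only, names invented; the real rewriting would be done while
producing the oopmaps):

// Each archived region i, wherever G1 happened to place it at dump time, gets
// its own delta so that the regions read as if they were stacked contiguously
// just below the top of the heap. A pointer stored in the archive is adjusted
// by the delta of the region that contains its target.
static long[] perRegionDeltas(long[] dumpTimeRegionBase, long regionSizeBytes,
                              long heapTop) {
    int n = dumpTimeRegionBase.length;
    long[] delta = new long[n];
    for (int i = 0; i < n; i++) {
        // region n-1 ends at heapTop, region n-2 sits right below it, and so on
        long runtimeBase = heapTop - (long) (n - i) * regionSizeBytes;
        delta[i] = runtimeBase - dumpTimeRegionBase[i];
    }
    return delta;
}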

> I think you can go with solution 1 now. Solution 2) has the benefit of
> not requiring additional memory for copying archived objects. That's
> important as I did run into insufficient memory at dump time in real
> use cases, so any memory saving at dump time is desirable.
>
> Some clarification to avoid confusion: the insufficient memory is due
> to memory restrictions for builds in a cloud environment.
Could you elaborate? The Java heap space is reserved but initially not 
committed. Physical memory is allocated when we write into the archive 
heap regions.

Were you getting OOM during virtual space reservation, or during 
os::commit_memory()?

How many heap objects are you dumping? The current jdk repo needs only 2 
G1 regions, so that's 2 * 4M of memory for small heaps like -Xmx128m, and 
2 * 8M of memory for larger heaps.

Thanks
- Ioi

> Thanks,
> Jiangli
>
>> It's better
>> to go with 2) when the size of the archive range is known before
>> copying. With the planned work for class pre-initialization and enhanced
>> object archiving support, it will be possible to obtain (or have a good
>> estimate of) the total size before copying. Solution 1 can be enhanced
>> to use heap memory when that happens.
>>
>> I'll log these details as an RFE on Monday.
>>
>> Best,
>> Jiangli
>>
>>> I'll withdraw this patch for now, and will try to implement the object
>>> copying.
>>>
>>> Thanks
>>> - Ioi
>>>
>>>> Best,
>>>> Jiangli
>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>
>>>>>
>>>>> <https://bugs.openjdk.java.net/browse/JDK-8245925>


