RFR: 8296263: Uniform APIs for using archived heap regions

Mon Nov 7 19:58:29 UTC 2022

On Fri, 4 Nov 2022 05:14:50 GMT, Ioi Lam <iklam at openjdk.org> wrote:

>> This is an attempt to unify the two different approaches for using archived heap regions. The main goal is to restructure and modify the code so that a single set of GC APIs can be called for using archived heap regions.
>> 
>> In the current state, the VM either tries to "map" (for G1) or "load" (for non-G1 GC policies) the archived heap regions into the Java heap.
>> When mapping, the VM determines the address range in the java heap where the archived regions should be mapped. It tries to map the regions towards the end of the heap. The APIs used for this purpose are G1 specific.
>> When loading, the VM asks the GC to provide a chunk of memory from the heap, into which it reads the contents of the archived heap regions. The APIs used are GC policy agnostic but challenging to use for region based collectors.
>> 
>> This PR attempts to add a new set of GC APIs that can be used by the VM to reserve space in the heap for mapping the archived heap regions. It combines the good parts of the two existing approaches. Similar to the "loading" API, in this new approach the VM is not responsible for determining the mapping address. That responsibility always resides with the GC policy. This also gives the GC implementation the flexibility to decide where and how to reserve the space for the archived regions. For instance, the G1 implementation can continue to attempt to allocate the space towards the end of the heap.
>> This PR also provides the implementation of the new APIs for all the existing GC policies that currently support archived heap regions, viz. G1, serial, parallel and epsilon.
>
> I am not sure if the existing implementation is 100% correct, but for these test cases, I think we are probably saved by this code: 
> 
> 
>   if (!is_aligned(relocated_closed_heap_region_bottom, HeapRegion::GrainBytes)) {
>     // Align the bottom of the closed archive heap regions at G1 region boundary.
>     // This will avoid the situation where the highest open region and the lowest
>     // closed region sharing the same G1 region. Otherwise we will fail to map the
>     // open regions.
>     size_t align = size_t(relocated_closed_heap_region_bottom) % HeapRegion::GrainBytes;
>     delta -= align;
>     log_info(cds)("CDS heap data needs to be relocated lower by a further " SIZE_FORMAT
>                   " bytes to " INTX_FORMAT " to be aligned with HeapRegion::GrainBytes",
>                   align, delta);
>     set_shared_heap_runtime_delta(delta);
>     relocated_closed_heap_region_bottom = heap_region_runtime_start_address(si);
>     _heap_pointers_need_patching = true;
>   }
> 
> 
> G1 regions are at least 1MB, and are always a power of 2.
> 
> By patching SharedStringsStress.java with this, I can get the CA1 and OA0 regions to be not aligned by GrainBytes, but that doesn't seem to cause the test to fail.
> 
> 
> -                TestCommon.concat(vmOptionsPrefix, "HelloString"));
> +                TestCommon.concat(vmOptionsPrefix, "-Xlog:cds=debug", "-Xmx6g", "HelloString"));
> 
> 
> In any case, I think we can consider first changing the way the regions are written ([JDK-8296344](https://bugs.openjdk.org/browse/JDK-8296344)) so that they can be more easily mapped by various collectors.
> 
> (Also, tactically, we should probably first change G1 to use the new "Uniform API" you are thinking about, but leave the other collectors unchanged. This way, we can gradually test things out and fix the other collectors in subsequent RFEs).
> 
> Currently, when writing the archived heap, we allocate a G1 region and write objects into it, from bottom to top. When it fills up, we allocate another G1 region that's immediately below, and start filling it from bottom to top. At the end, we merge all the fully-filled regions into the CA0 region, and make the last, half-filled region CA1.
> 
> (Same for the OA0, OA1 regions, but usually the OA0 region never has more than 1MB objects, so we'd never have the OA1 region).
> 
> This is kind of kludgy. We should be able to first determine all objects to be archived, and then write them out as a single contiguous "closed" region, and a single contiguous "open" region. When filling out these regions, we can pack the objects so that they will never cross a 1MB boundary.
> 
> Also, I think it may not even be worthwhile to have the "closed" region and treat it specially at runtime. We can have just a single contiguous block of archived objects like this, where S are the String objects and their char arrays, and O are the other types of objects
> 
> 
> OOOOOOOOOOOSSSSSSSSSSS
> 
> 
> At runtime, we allocate enough G1 regions from the top of the heap to accommodate the archived objects, and put a dummy object at the bottom to fix the bottom-most region.
> 
> (The reason we align the archived regions to the top of the G1 heap is that the top of the heap usually has the same narrowOop for various heap sizes, so we can usually avoid patching the embedded oop pointers.
> 
> This is a trade off with other collectors, which may not allow you to start allocating memory from the top. We may want to reconsider this.)
> 
> All the Strings are always in the interned table so they will never be collected. Also, we already computed their hashcode, so they are never written into (unless you `synchronize` on them at runtime). So for the region(s) that contain only the S objects, we can effectively share the memory across multiple processes, and the GC will never collect them.
> 
> Anyway, we usually just have a few MBs of archived objects, so it may not matter whether we keep them immutable or not.
> *******
> 
> I want to thank you for starting working in this area. Going forward, I think we need more discussion and design before we can decide exactly what to do.

@iklam thanks for sharing the information and details on the future work in this space.

> By patching SharedStringsStress.java with this, I can get the CA1 and OA0 regions to be not aligned by GrainBytes, but that doesn't seem to cause the test to fail.

I was actually referring to CA0 and CA1 in my figures (which I realized was not clear in my explanation earlier). 
Anyway, I now understand the existing mechanism works fine because the following conditions are maintained (which you have already mentioned in your comment):
1. G1 regions are at least 1MB, and are always a power of 2.
2. At dump time the objects are placed such that they do not cross a `HeapRegion::min_region_size_in_words()` boundary, which I believe is 1M.

Because of these two constraints, a change in the G1 region size at run time cannot result in objects crossing a region boundary, provided the regions are mapped at a 1M-aligned address.
So if I update the G1 code such that at run time the regions are mapped at a 1M boundary, then I can get rid of the problem of objects crossing a region boundary, and the two tests also pass.
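
To make the alignment argument concrete, here is a minimal standalone sketch of the arithmetic. It is not HotSpot code: the helper names (`fits_in_one_block`, `stays_in_one_region`) and the sample offsets are hypothetical, and it assumes the dump-time packing granule is 1M, as I believe it is.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical constants for illustration only.
constexpr std::size_t K = 1024;
constexpr std::size_t M = 1024 * K;

// True if the byte range [start, start + size) lies entirely inside one
// granule-aligned block, i.e. it does not cross a multiple of `granule`.
constexpr bool fits_in_one_block(std::uintptr_t start, std::size_t size,
                                 std::size_t granule) {
  return size != 0 && (start / granule) == ((start + size - 1) / granule);
}

// True if an archived object, located at `offset` from the start of the
// archived data, stays inside a single G1 region once the data is mapped at
// `mapped_base` and the run-time region size is `region_size`.
constexpr bool stays_in_one_region(std::uintptr_t mapped_base,
                                   std::size_t offset, std::size_t size,
                                   std::size_t region_size) {
  return fits_in_one_block(mapped_base + offset, size, region_size);
}
```

For example, an object at offset 300K of size 400K does not cross a 1M boundary at dump time. Mapped at a 1M-aligned base such as 64M, it stays inside one region for any power-of-2 region size of 1M or more, because every such region boundary is also a 1M boundary. Mapped at 64M + 512K instead, the same object straddles a boundary when the region size is 1M.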

> In any case, I think we can consider first changing the way the regions are written ([JDK-8296344](https://bugs.openjdk.org/browse/JDK-8296344)) so that they can be more easily mapped by various collectors.

I agree that [JDK-8296344](https://bugs.openjdk.org/browse/JDK-8296344) would make it easier to map them at run time, and I would be happy to contribute to it in any way possible. But again, that's a GC policy specific implementation detail.
I guess you would agree we need to de-couple the CDS code from the GC policy details. While JDK-8296344 aims at decoupling the code at dump time, my aim with this PR is to achieve the same at run time by having GC-agnostic APIs. 
Moreover, the dump time mechanism should not affect the APIs used for mapping regions at run time (though the implementation may need to be adjusted).
So, with this in mind do you think we can continue working on this PR, or do you believe the GC APIs this PR proposes to add would not be sufficient once JDK-8296344 is implemented?

> (Also, tactically, we should probably first change G1 to use the new "Uniform API" you are thinking about, but leave the other collectors unchanged. This way, we can gradually test things out and fix the other collectors in subsequent RFEs).

That makes sense. Ideally I should have done the implementation for the other collectors in separate RFEs. But I was worried about whether the new APIs are flexible enough to support the non-G1 policies, and in an attempt to verify that I added the support for those policies as well. If it helps, I can remove those commits and deliver them later in subsequent RFEs.

-------------

PR: https://git.openjdk.org/jdk/pull/10970