[aarch64-port-dev ] Question about ccs reservation, CDS and aarch64 specifics

Ioi Lam ioi.lam at oracle.com
Tue Apr 21 04:42:56 UTC 2020



On 4/20/20 4:10 AM, Thomas Stüfe wrote:
> On Mon, Apr 20, 2020 at 10:47 AM Ioi Lam <ioi.lam at oracle.com 
> <mailto:ioi.lam at oracle.com>> wrote:
>
>
>
>     On 4/18/20 12:15 AM, Thomas Stüfe wrote:
>>     Hi Ioi,
>>
>>     I am working on a small patch and have some more questions.
>>
>>     - First, a simple one, in
>>     DynamicArchiveBuilder::reserve_space_and_init_buffer_to_target_delta(),
>>     the space does not have anything to do with metaspace, as you
>>     wrote, so the alignment could be anything, right?
>>
>     I think so.
>
>>     - Out of curiosity, when you pack the different regions
>>     (DumpRegion::pack) you align the end to page size. Why? Why could
>>     the next region not simply follow immediately? I looked if any
>>     code needs a region to be page aligned, but may have missed it.
>
>     We map RO read-only and MC/RW in read-write. If the regions are
>     not aligned, you will have a page that wants half to be read-only
>     and half to be read-write.
>
>
> Okay. I wondered why page align here and not allocation granularity. 
> Now I understand. I guess this is also the reason why we could not use 
> large pages for the archive?
>
> I think this is fine, I did not want to change it. On some platforms 
> we have 64K (non-large) pages, but even there I think the waste would 
> be acceptable.
>
>     I guess we can adjust the mapping to be more lenient (if a page
>     wants half read-write, we map it read-write), but that's not done
>     today.
>
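A minimal sketch of the page-alignment point above, for illustration only. It is not the DumpRegion::pack code; `align_up` and `next_region_start` are hypothetical helpers, and the page sizes (4K and 64K) are assumptions taken from the discussion.

```cpp
#include <cstdint>

// Round value up to the next multiple of alignment.
// Assumption: alignment is a power of two (true for page sizes).
constexpr uint64_t align_up(uint64_t value, uint64_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: given the address where one region's data ends,
// return the first address at which the next region may start so that
// no page is shared between regions with different protections.
constexpr uint64_t next_region_start(uint64_t region_end, uint64_t page_size) {
  return align_up(region_end, page_size);
}

// With 4K pages, a region ending one byte into a page wastes the rest of it.
static_assert(next_region_start(0x1001, 0x1000) == 0x2000, "4K page example");
// With 64K pages the same region end is rounded up much further.
static_assert(next_region_start(0x1001, 0x10000) == 0x10000, "64K page example");
```

The waste per region is at most one page minus one byte, which is why the 64K case is larger but still bounded.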
>>
>>     - void
>>     MetaspaceShared::initialize_dumptime_shared_and_meta_spaces() :
>>
>>     I assume this code has to work for all three cases, right?
>>     1) lp32
>>     2) lp64 with UseCompressedClassPointers
>>     3) lp64 without UseCompressedClassPointers
>>
>>     If yes, does the setting for UseCompressedClassPointers have to
>>     be the same at run time?
>
>     Yes. The value of UseCompressedOops and UseCompressedClassPointers
>     must be the same between dump time and run time.
>
>>
>>
>>     In this layout:
>>       // On 64-bit VM, the heap and class space layout will be the same as if
>>       // you're running in -Xshare:on mode:
>>       //
>>       //                              +-- SharedBaseAddress (default = 0x800000000)
>>       //                              v
>>       // +-..---------+---------+ ... +----+----+----+--------------------+
>>       // |    Heap    | Archive |     | MC | RW | RO |    class space     |
>>       // +-..---------+---------+ ... +----+----+----+--------------------+
>>       // |<--   MaxHeapSize  -->|     |<-- UnscaledClassSpaceMax = 4GB -->|
>>       //
>>
>>     Why does the class space have to follow mc+rw+ro? Could it come
>>     before?
>>
>>
>     Compressed klass pointers are stored in archived objects. If the
>     class space is now lower than SharedBaseAddress, you will need to
>     rebase all of the compressed klass pointers. This is not efficient
>     and will slow down start-up.
>
>
> Well, could SharedBaseAddress not point to start of the ccs:
>
>   // +-- SharedBaseAddress (default = 0x800000000)
>   // v
>   // +----+----+----+-----------------------------------+
>   // |    class space     | ..gap maybe.. | MC | RW | RO
>   // +----+----+----+-----------------------------------+
>
> you'd then need to make sure that the relative offset of MC to 
> SharedBaseAddress is the same at dump time and at runtime. Is my 
> understanding correct? I am not saying I want to do this, I just try 
> to understand the way ccs archive allocation works.

That should work. But you are still using a fixed offset from the bottom 
of MC to SharedBaseAddress (instead of a fixed address of 0). I am not 
sure if that will buy you any flexibility.
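
To make the constraint concrete, here is a minimal sketch of how a narrow klass value relates to the encoding base. It is not the HotSpot implementation: `is_encodable`, `encode` and `decode` are hypothetical names, and the sketch assumes a zero shift, a 4 GB (unscaled) encoding range, and the default SharedBaseAddress as the base.

```cpp
#include <cstdint>

// Illustration only; names and constants are assumptions, not HotSpot code.
constexpr uint64_t kEncodingBase  = 0x800000000ULL;  // default SharedBaseAddress (32 GB)
constexpr uint64_t kEncodingRange = 4ULL << 30;      // 4 GB, assuming shift == 0

// A Klass address can be compressed only if it lies in [base, base + range).
constexpr bool is_encodable(uint64_t klass_addr) {
  return klass_addr >= kEncodingBase &&
         klass_addr - kEncodingBase < kEncodingRange;
}

// The narrow value is the offset from the base. Caller must check
// is_encodable() first; this sketch does not validate its input.
constexpr uint32_t encode(uint64_t klass_addr) {
  return static_cast<uint32_t>(klass_addr - kEncodingBase);
}

constexpr uint64_t decode(uint32_t narrow_klass) {
  return kEncodingBase + narrow_klass;
}

// A Klass inside the archive round-trips; one below the base cannot be encoded.
static_assert(decode(encode(kEncodingBase + 0x1000)) == kEncodingBase + 0x1000,
              "round trip inside the range");
static_assert(!is_encodable(kEncodingBase - 8), "below the base");
```

Because archived objects store the narrow value, the offset of each archived Klass from the base has to be the same at dump time and at run time, wherever the class space itself ends up.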

>
>>
>>     Actually, does it have to be in the same space at all, or could
>>     it live somewhere completely different?
>
>     It can be higher. You just need to ensure that the distance
>     from SharedBaseAddress to the end of the class space is within
>     the max compressed klass space size.
>
>     But, I am wondering why you're asking this :-)
>
>
> I try to understand the allocation and which restrictions apply where. 
> We have at least three parties, cds, metaspace and the underlying 
> platform, all with their own subtleties of how the memory should be 
> allocated:
> - metaspace will in the near future want a larger alignment than what 
> cds uses for reservation.
> - platforms like aarch64 and maybe ppc want the compressed class base 
> to look in a certain way
>
> Part of my confusion was that I always thought of 
> CompressedKlassPointers::base() as basically the same as the start of 
> the ccs (maybe modulo being zero in zero-based mode) but that is 
> obviously not true since CDS exists. So what I wrote first:
>
> "Metaspace::reserve_preferred_space.. Despite its generic-sounding 
> name, these functions can only be used to allocate ccs."
>
> is actually not fully correct. In reality this space is to be used to 
> allocate memory to house Klass structures so that their pointers are 
> compressible, so the reserved start address has to be compatible with 
> that. But, e.g., that start address does not have to be aligned to 
> Metaspace::reserve_alignment().
>
That's very true. I have some logic in CDS to pick the greater of 
Metaspace::reserve_alignment() and os alignment. I think this is 
probably unnecessary.

Maybe we should simply untangle CDS/CCS from Metaspace altogether? We 
really just need a 1GB reservation to be anchored at 32GB (or some 
address that aarch64 likes). That way, you can do whatever you want with 
Metaspace and not worry about CDS/CCS.
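
As a rough sketch of what such a standalone check might look like: the helper below is hypothetical (`is_acceptable_anchor` does not exist in HotSpot) and encodes only two assumptions from this thread, that the base should be a multiple of 4 GB and that the whole reservation must fit in a 4 GB unscaled encoding range.

```cpp
#include <cstdint>

constexpr uint64_t G = 1ULL << 30;  // one gigabyte

// Hypothetical helper: true if a reservation [base, base + size) satisfies
// the two assumptions stated above. Real platform constraints may differ.
constexpr bool is_acceptable_anchor(uint64_t base, uint64_t size) {
  return base % (4 * G) == 0 && size <= 4 * G;
}

// The default SharedBaseAddress is exactly 32 GB.
static_assert(0x800000000ULL == 32 * G, "0x800000000 is 32 GB");
// A 1 GB reservation anchored at 32 GB passes both checks.
static_assert(is_acceptable_anchor(32 * G, 1 * G), "1 GB at 32 GB");
// A base that is not a multiple of 4 GB fails.
static_assert(!is_acceptable_anchor(33 * G, 1 * G), "33 GB is not 4 GB aligned");
```

Nothing in this check refers to Metaspace::reserve_alignment(), which is the point of untangling the two.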

Thanks
- Ioi
> In both cds dump and runtime case, the ccs is carved from the end part 
> of the reserved space. Only that split point, and the size of that 
> second part, have to be aligned to Metaspace::reserve_alignment().
>
> Were we to allocate ccs first and put the archives behind it this 
> would simplify some matters, but only minor points. I think the way it 
> works now is okay. I will try to disentangle it a bit in a way you 
> proposed.
>
>>     To ask in a more precise way: I understand that both the mc+rw+ro
>>     archives and the ccs have to live in an area encompassed by the
>>     compressed class pointers encoding scheme. I wonder whether there
>>     are any restrictions beyond that.
>>
>>     Could there be a gap between archives and ccs? 
>     Yes
>
>>     Can the order be reversed? 
>     No.
>
>>     Do the relative positions between archives and ccs have to be the
>>     same between dump time and runtime?
>     No. All the pointers stored inside CDS point to inside of the
>     MC/RW/RO regions, so the archive doesn't retain any knowledge of
>     where the CCS was at dump time.
>
>
> Clear answers, thank you!
>
> ..Thomas
>
>
>     Thanks
>     - Ioi
>
>
>>
>>     Thanks!
>>
>>     On Thu, Apr 16, 2020 at 8:31 PM Ioi Lam <ioi.lam at oracle.com
>>     <mailto:ioi.lam at oracle.com>> wrote:
>>
>>
>>
>>         On 4/16/20 11:14 AM, Thomas Stüfe wrote:
>>>         Hi Ioi,
>>>
>>>         On Thu, Apr 16, 2020 at 7:49 PM Ioi Lam <ioi.lam at oracle.com
>>>         <mailto:ioi.lam at oracle.com>> wrote:
>>>
>>>             (I suppose you mean "compressed class space" by "ccs" :-)
>>>
>>>
>>>         Yes, I think I stole this from Stefan Karlsson :)
>>>
>>>             <snip>
>>>
>>>             I am not even sure if case (C) can happen at all.
>>>
>>>             I admit that I've been guilty of making the interface
>>>             even more complicated
>>>             with JDK-8231610
>>>             <https://bugs.openjdk.java.net/browse/JDK-8231610>(Relocate
>>>             the CDS archive if it cannot be mapped to the
>>>             requested address). Looks like now is a good time to clean up.
>>>
>>>
>>>         The coding has been complicated to begin with, and then it
>>>         usually only gets worse since no-one has time for a revamp
>>>         :( A clean up would be very helpful.
>>>
>>>         One reason I look at this coding now, besides the aarch64
>>>         problem, was that I try to disentangle CDS from Metaspace,
>>>         especially the alignment policy. Remember, I tried to tackle
>>>         this last summer, but it keeps biting me. For such a small
>>>         problem this is weirdly complicated.
>>>
>>>             One thing that can be cleaned up is the call to
>>>             Metaspace::allocate_metaspace_compressed_klass_ptrs:
>>>
>>>             (a) when CDS is enabled:
>>>
>>>             Metaspace::global_initialize()
>>>                 ->
>>>             MetaspaceShared::initialize_runtime_shared_and_meta_spaces()
>>>                    -> ... MetaspaceShared::map_archives()
>>>                      -> ... reserve the space, eventually calling
>>>             Metaspace::reserve_space
>>>                      -> call
>>>             Metaspace::allocate_metaspace_compressed_klass_ptrs()
>>>
>>>             (b) when CDS is disabled
>>>
>>>             Metaspace::global_initialize()
>>>             -> allocate_metaspace_compressed_klass_ptrs
>>>                    -> (if cds is not enabled) Metaspace::reserve_space()
>>>
>>>
>>>             In case (b), we should first reserve the space, and then
>>>             call into
>>>             allocate_metaspace_compressed_klass_ptrs. This will
>>>             simplify the arguments
>>>             of allocate_metaspace_compressed_klass_ptrs, and will
>>>             also limit the variations
>>>             of calls to Metaspace::reserve_space(). I think this
>>>             will make it possible to
>>>             drop the use_requested_addr argument and rely simply on
>>>             (requested_addr != NULL)
>>>
>>>
>>>         So, in all cases we'd pre-reserve the ReservedSpace and hand
>>>         it down to
>>>         Metaspace::allocate_metaspace_compressed_klass_ptrs()?
>>>
>>>         This would melt down
>>>         Metaspace::allocate_metaspace_compressed_klass_ptrs() to
>>>         just "initialize compressed class space from a pre-arranged
>>>         ReservedSpace, and set up base + shift".
>>>
>>>         We could probably rename that thing
>>>         to Metaspace::set_up_compressed_klass_space(ReservedSpace*
>>>         rs, cds_base);
>>>
>>>         We even could move set_narrow_klass_base_and_shift() out of
>>>         Metaspace::set_up_compressed_klass_space, then it becomes a
>>>         series of three simple operations:
>>>         1) obtain a ReservedSpace however you see fit
>>>         2) register it with Metaspace as address space for ccs,
>>>         3) set_narrow_klass_base_and_shift. We would not have to
>>>         hand down cds_base to Metaspace, only for it to be used as
>>>         base address in set_narrow_klass_base_and_shift.
>>>
>>
>>         Yes, that seems the right thing to do. That will hopefully
>>         make the aarch64 initialization code a little simpler as well.
>>
>>>         One question which came to me today was:
>>>
>>>         In AppCDS, DynamicArchiveBuilder::do_it() calls
>>>         Metaspace::reserve_space(). Is that really needed, does a
>>>         DumpRegion have anything to do with ccs? Don't they just
>>>         need some space to dump into? Hope that question is not dumb.
>>>
>>         Do you mean:
>>
>>         DynamicArchiveBuilder::reserve_space_and_init_buffer_to_target_delta()
>>
>>         -> MetaspaceShared::reserve_shared_space
>>             -> Metaspace::reserve_space
>>
>>         That's not necessary. When I wrote the code I thought
>>         Metaspace::reserve_space was a general function for reserving
>>         spaces :-) but as you said, this function is probably
>>         intended only for initializing the CCS.
>>
>>         Thanks
>>         - Ioi
>>
>>>         Thanks, Thomas
>>>
>>>             Thanks
>>>             - Ioi
>>>
>>>
>>>>>             Does that make sense? In other words, if the whole point of
>>>>>             Metaspace::reserve_preferred_space() is "OS knows better, let it try
>>>>>             to find a good address", would it not make sense to just try a low
>>>>>             address as part of the try-addresses-loop?
>>>>             We certainly don't want to have to use a dedicated heapbase register
>>>>             or a shift. Just give us a multiple of 4*G and we're happy.
>>>>
>>>
>>
>