RFR(S) 8208658: Make CDS archived heap regions usable even if compressed oop encoding has changed

Ioi Lam ioi.lam at oracle.com
Tue Aug 21 00:32:10 UTC 2018


Hi Jiangli,

Thanks for the review. I've updated the webrev at:

http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v02/

Delta from the last webrev:

http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v02.delta/

Please see comments below


On 8/17/2018 11:49 AM, Jiangli Zhou wrote:
> Hi Ioi,
>
> Thanks a lot for the updated webrev! Please see some more 
> comments/suggestions below.
>
> - src/hotspot/share/memory/filemap.cpp
>
>  195   if (MetaspaceShared::is_heap_object_archiving_allowed()) {
>  196     _g1_reserved = G1CollectedHeap::heap()->g1_reserved();
>  197   }
>
> The above are G1 specific. It should be included under #if INCLUDE_G1GC.
>

Fixed. I tested with "configure --with-jvm-features=-g1gc ...". I ended 
up having to fix a couple of JFR files as well.

>  881 MemRegion 
> FileMapInfo::get_heap_regions_range_with_current_oop_encoding_mode() {
> ...
>  884
>  885   for (int i = MetaspaceShared::first_string; i <= 
> MetaspaceShared::last_valid_region; i++) {
>  886     FileMapHeader::space_info* si = space_at(i);
>  887     size_t size = si->_used;
>  888     if (size > 0) {
>  889       address s = 
> si->start_address_with_current_oop_encoding_mode();
>  890       address e = s + size;
>  891       if (start > s) {
>  892         start = s;
>  893       }
>  894       if (end < e) {
>  895         end = e;
>  896       }
>  897     }
>  898   }
>
> We can record the ‘start’ and ‘end’ within the archive at dump time. 
> At runtime, we only need to compute the addresses of the ‘start’ and 
> ‘end’ using the current encoding mode without iterating all archived 
> heap spaces. It’s a minor optimization, we can do that after current 
> RFE if you prefer.
>

I used to have a version like what you suggested. However, that means I 
need to split the logic between dump time and run time, and you end up 
with more code.  Since this code is not performance critical, I think 
it's better to put it all in one place.

> - src/hotspot/share/memory/filemap.hpp
> 91   address _oopmap;               // bitmap for relocating embedded 
> oops
>
> Nit. ‘pointers’ seems to be a better word than ‘oops’ here.
>

oopmap is actually a common terminology in HotSpot that means 'a map of 
all the pointers to oop'

> 316 MemRegion get_heap_regions_range_with_current_oop_encoding_mode();
>
> Does the above needs NOT_CDS_JAVA_HEAP_RETURN also?
>

I don't know how to easily return an 'empty' ResourceBitMap. Since this 
function is used only by the archived heap code, I moved its declaration 
inside INCLUDE_CDS_JAVA_HEAP.

> 524 class FindEmbeddedPointers: public BasicOopIterateClosure {
>
> Please add some comments for the FindEmbeddedPointers closure. Since 
> we iterate all objects inside the archived heap regions, we still can 
> end up with the correct & complete bitmaps without follow the 
> references within each archived object. BTW, can you please check and 
> make sure NULL oop pointers are not on the bitmap so we don’t waste 
> cycles at runtime.
>

That's a good suggestion. I added code to filter out all the null 
pointers. It turns out about 50% of the pointers in the open region are 
NULL, so it's good saving. The new version is slightly faster than the 
last version.

> Please also add comments explaining it’s dump time only logic.
>

Done.

> 564 class PatchEmbeddedPointers: public BitMapClosure {
>
> Some comments would be useful to explain it’s for runtime use.
>

Done

>  573 if (!CompressedOops::is_null(v)) {
>  574       oop o = HeapShared::decode_with_archived_oop_encoding_mode(v);
>  575       RawAccess<IS_NOT_NULL>::oop_store(p, o);
>  576     }
>
> So looks like we do include null in the bitmap? If we eliminate the 
> nulls from the bitmap, the is_null check above can be avoided at 
> runtime. It’s a very small optimization, but probably worth to consider?
>

Fixed

> - src/hotspot/share/oops/instanceMirrorKlass.hpp
>
> 102   static void serialize(class SerializeClosure* f) NOT_CDS_RETURN;
>
> To sync up with your recent change, the above can be renamed to 
> serialize_offset.
>

Fixed

> - test/hotspot/jtreg/runtime/appcds/cacheObject/DifferentHeapSizes.java
>
> Could you please add the scenario that Calvin suggested:
>
>
>         new Scenario(       32,         32, 64, 512, 2048, 4097, 
> 16374, 31000)

Added.

Thanks
- Ioi

>
> Thanks,
> Jiangli
>
> On 8/14/18 7:27 PM, Ioi Lam wrote:
>> Here's an updated webrev:
>>
>> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v01/ 
>>
>> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v01/log.txt 
>>
>>
>>
>> [1] Incorporated many suggestions from Jiangli (Thanks!):
>>
>>     - Moved the patching code code to heapShared.cpp
>>     - Added various comments and improved logging (see above log.txt 
>> link)
>>     - Added test cases for >32GB heaps.
>>
>> [2] Fixed the region relocation logic to avoid any GC code changes and
>>     support all possible compressed oop encoding changes.
>>
>>     It turns out I don't need to force 8MB region size during dump time.
>>     I just need to align the bottom of the closed heap on G1 region
>>     boundary. This will ensure that the open and close heap will never
>>     occupy the same G1 region.
>>
>>     All test cases in DifferentHeapSizes.java can successfully
>>     map the archived regions.
>>
>>     See FileMapInfo::map_heap_regions_impl
>>
>> [3] Generate a bitmap of all the pointers at dump time, and use this
>>     for pointer patching at runtime. I can't find any measurable
>>     improvements, but it should be at least as fast as the last
>>     version (which used a BasicOopIterateClosure at runtime).
>>
>> [4] Various clean up, such as
>>
>>     struct FileMapInfo::FileMapHeader::space_info* si = 
>> &_header->_space[i];
>>   ->
>>     FileMapHeader::space_info* si = space_at(i);
>>
>> Testing: Passed hs tiers 1/2/3/4 with no new failures.
>>
>> Thanks
>> - Ioi
>>
>>
>> On 8/7/18 6:47 PM, Jiangli Zhou wrote:
>>> On 8/7/18 3:38 PM, Ioi Lam wrote:
>>>
>>>> Hi Jiangli,
>>>>
>>>> Thanks for the review.
>>>>
>>>>
>>>> On 8/7/18 9:25 AM, Jiangli Zhou wrote:
>>>>> Hi Ioi,
>>>>>
>>>>> This is a welcomed change! Thanks for doing this. I'd like to 
>>>>> suggest building a relocation table at dump time. The table can 
>>>>> contain entries with pair of 
>>>>> 'offset_to_region_start:narrow_oop_value'. Then at runtime, we can 
>>>>> avoid iterating the objects. That would help to reduce the 
>>>>> overhead that you are observing when relocation is needed at 
>>>>> runtime. This could be done as a further optimization in a 
>>>>> follow-up RFE for 12.
>>>>>
>>>>
>>>> That's a good suggestion.  I'll try to do that as part of this fix, 
>>>> by generating a bitmap of all the pointers that need to be 
>>>> relocated. I will need one bit per 8 bytes of archived heap data, 
>>>> so the (read-only) memory cost is pretty minimal.
>>>>
>>>>> - src/hotspot/share/memory/heapShared.inline.hpp
>>>>>
>>>>> - src/hotspot/share/memory/heapShared.cpp
>>>>>
>>>>>   33 inline oop HeapShared::decode_not_null(narrowOop v) {
>>>>>   34   assert(!CompressedOops::is_null(v), "narrow oop value can 
>>>>> never be zero");
>>>>>   35   oop result = (oop)(void*)((uintptr_t)_narrow_oop_base + 
>>>>> ((uintptr_t)v << _narrow_oop_shift));
>>>>>   36   assert(check_obj_alignment(result), "address not aligned: " 
>>>>> INTPTR_FORMAT, p2i((void*) result));
>>>>>   37   return result;
>>>>>   38 }
>>>>>
>>>>>  510 void HeapShared::init_narrow_oop_decoding(address base, int 
>>>>> shift) {
>>>>>  511   _narrow_oop_base = base;
>>>>>  512   _narrow_oop_shift = shift;
>>>>>  513 }
>>>>>
>>>>> - src/hotspot/share/memory/filemap.cpp
>>>>>
>>>>>
>>>>>  913     //   dumptime heap end  ------------v
>>>>>  914     //   [      |archived heap regions| ] runtime heap end 
>>>>> ------v
>>>>>  915     //                                       [ |archived heap 
>>>>> regions| ]
>>>>>  916     // |<-----delta-------------------->|
>>>>>  917     //
>>>>>  918     // At dump time, the archived heap region were near the 
>>>>> top of the heap.
>>>>>  919     // At run time, that region may not be inside the heap, 
>>>>> so we move it so
>>>>>  920     // that it's now near the top of teh runtime time. This 
>>>>> can be done by
>>>>>  921     // the simple math of adding the delta as shown above.
>>>>>  922     address dumptime_heap_end = 
>>>>> (address)_header->_g1_reserved.end();
>>>>>  923     address runtime_heap_end = 
>>>>> (address)G1CollectedHeap::heap()->g1_reserved().end();
>>>>>  924     delta = runtime_heap_end - dumptime_heap_end;
>>>>>  925   }
>>>>>  926
>>>>>  927 HeapShared::init_narrow_oop_decoding(narrow_oop_base() + 
>>>>> delta, narrow_oop_shift());
>>>>>
>>>>> 1051 class RelocateInternalPointers: public BasicOopIterateClosure {
>>>>> ...
>>>>> 1056   virtual void do_oop(narrowOop *p) {
>>>>> 1057     narrowOop v = *p;
>>>>> 1058     if (!CompressedOops::is_null(v)) {
>>>>> 1059       oop o = HeapShared::decode_not_null(v);
>>>>> 1060       RawAccess<IS_NOT_NULL>::oop_store(p, o);
>>>>> 1061     }
>>>>> 1062   }
>>>>>
>>>>> Your relocation change boils down to above sections of code. It 
>>>>> works as both UnscaledNarrowOop encoding and ZeroBasedNarrowOop 
>>>>> encoding use 0 as the base. I believe we don't handle heap based 
>>>>> encoding with CDS enabled. Can you please double check that.
>>>>>
>>>> It actually does work. See
>>>>
>>>> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v00/log.txt 
>>>>
>>>>
>>>> Command line: 
>>>> [/home/iklam/jdk/bld/abe-fastdebug/images/jdk/bin/java ... 
>>>> -Xmx31000m -XX:+UseStringDeduplication Hello ]
>>>>
>>>> [STDOUT]
>>>> [0.092s][info][cds] Archived narrow_klass_base = 
>>>> 0x0000000800000000, narrow_klass_shift = 3
>>>> [0.092s][info][cds] Archived narrow_oop_mode = 0, narrow_oop_base = 
>>>> 0x0000000000000000, narrow_oop_shift = 0
>>>> [0.092s][info][cds] CDS heap data need to be relocated because
>>>> [0.092s][info][cds] the archive was created with an incompatible 
>>>> heap size: 128M.
>>>> [0.092s][info][cds] Current narrow_oop_mode = 2, narrow_oop_base = 
>>>> 0x0000001000000000, narrow_oop_shift = 3
>>>> Hello World
>>>>
>>>>> The new HeapShared::decode_not_null() decodes the narrowOop value 
>>>>> using the dump time encoding mode with added base delta. Can you 
>>>>> please rename it to be more descriptive, for example 
>>>>> HeapShared::decode_with_archived_encoding_mode()? Please also add 
>>>>> some comments to include these detail for this function.
>>>>>
>>>>
>>>> OK, I will fix that.
>>>>
>>>>> - src/hotspot/share/memory/filemap.cpp
>>>>>
>>>>>  886   if (narrow_klass_base() != Universe::narrow_klass_base() ||
>>>>>  887       narrow_klass_shift() != Universe::narrow_klass_shift()) {
>>>>>  888     log_info(cds)("Cached heap data from the CDS archive need 
>>>>> to be relocated because");
>>>>>  889     log_info(cds)("the CDS archive was created with an 
>>>>> incompatible heap size: " UINTX_FORMAT "M.", max_heap_size()/M);
>>>>>  890     log_info(cds)("Current narrow_klass_base = " PTR_FORMAT 
>>>>> ", narrow_klass_shift = %d",
>>>>>  891 p2i(Universe::narrow_klass_base()), 
>>>>> Universe::narrow_klass_shift());
>>>>>  892     return;
>>>>>  893   }
>>>>>
>>>>> Please fix the log message at 888 as relocation is not done in 
>>>>> above case.
>>>>>
>>>> Oops, it should have been "Cached heap data from the CDS archive 
>>>> cannot be used because", but I messed up with copy-pasting.
>>>>
>>>>> Please move the code at line 895 - 898 to be before the above 
>>>>> code, so we still have the logging information about the dump time 
>>>>> encoding modes when runtime narrow_klass_encoding is different. 
>>>>> The dump time encoding modes are useful debugging information.
>>>>>
>>>> OK. Will do.
>>>>
>>>>>
>>>>> 1039 void FileMapInfo::relocate_archived_heap_embedded_pointers()
>>>>>
>>>>> 1051 class RelocateInternalPointers: public BasicOopIterateClosure
>>>>>
>>>>> 1068 void FileMapInfo::relocate_archived_heap_embedded_pointers_impl
>>>>>
>>>>> Could you please place above new code blocks in heapShared.* 
>>>>> files? I filed JDK-8206009 to move all heap object archiving code 
>>>>> to heapShared.*. It's helpful to make sure the related new code 
>>>>> goes to the right files before we completely resolve existing code 
>>>>> for JDK-8206009.
>>>>>
>>>>
>>>> OK. Will do.
>>>>
>>>>> - src/hotspot/share/gc/g1/heapRegion.cpp
>>>>>
>>>>> Can you please check the percentage of unused region space if we 
>>>>> force 8M gc region at dump time? The unused spaces in the archived 
>>>>> heap regions are filled with dummy objects at runtime and cannot 
>>>>> be utilized.
>>>> At runtime, the region size is chosen according to -Xmx. The amount 
>>>> of waste (filled by dummy objects) is related only to the runtime 
>>>> region size. It's not related to the region size chosen at dump time.
>>> Yes, the heap memory waste is more related with the runtime GC 
>>> region size.
>>>
>>> The other side effect of forcing hard coded 8M GC region at dump 
>>> time is that you might have archived objects across the region 
>>> boundary at runtime when the runtime GC region size is less than 8M. 
>>> When that happens, you would need to invalidate all archived java 
>>> heap data. Also, you need to detect such cases, which adds 
>>> additional overhead. However, my memory tells me that we allocate 
>>> archived objects at 2M boundary within a GC region regardless the 
>>> region size at dump time to make sure we don't ever run into cases 
>>> with objects across boundary when runtime and dump have different GC 
>>> region sizes. I need to double check on that to make sure it is the 
>>> case.
>>>>
>>>> So in this regard, this patch does not increase the amount of 
>>>> runtime memory waste.
>>>>
>>>>> It would also introduce gaps between the archived heap regions at 
>>>>> runtime when the GC region size is < 8M with smaller heap size.
>>>>
>>>> Yes, this is an issue for small heaps. Previously, without my 
>>>> patch, if you run with
>>>>
>>>>     java -Xshare:dump -Xmx128m
>>>>     java -Xshare:on -Xmx128m
>>>>
>>>> the open and closed archive regions will in two consecutive 1MB 
>>>> regions.
>>>>
>>>> With my patch, there will be a gap of 7MB between the 2 regions 
>>>> (each as still 1MB in size). This means if you run with very small 
>>>> heaps with CDS, you may not be able to allocate very large arrays.
>>>>
>>>>> We should check with GC team on this and see if there are other 
>>>>> alternatives to resolve the overlapping issue. One possible 
>>>>> solution I think is to relocate the 'open' archive heap region 
>>>>> further down at runtime if the 'open' and 'closed' archive regions 
>>>>> are in the same G1 region at runtime.
>>>>>
>>>> I'll try implement what you described above (in a separate RFE). 
>>>> This will make runtime relocation a little slower, but hopefully 
>>>> that will just be noise, and I won't need to make any change in G1.
>>> We just need to make sure the open archive regions are allocated 
>>> outside the GC region used for the closed archive regions. It should 
>>> not add additional overhead for object relocation. However, it does 
>>> add complexity to the relocation implementation as you would have 
>>> different base deltas.
>>>>
>>>> For this RFE, I will revert the changes in heapRegion.cpp. This 
>>>> means in some cases, we won't be able to map the archived regions:
>>>>
>>>> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v00/log2.txt 
>>>>
>>>>
>>>> Note that 2 out of 26 cases fail to map the heap regions with 
>>>> "Unable to allocate region, java heap range is already in use",
>>>> but it's still works for the majority of the cases, so it still 
>>>> much better than what we have today
>>> That's a reasonable trade-off. Seems acceptable to me.
>>>>
>>>>> - 
>>>>> test/hotspot/jtreg/runtime/appcds/cacheObject/DifferentHeapSizes.java
>>>>>
>>>>> Looks good! How about adding some scenarios with heap size > 32G 
>>>>> as sanity checks.
>>>>>
>>>>
>>>> Will do, but with heap size > 32G CDS will be disabled because 
>>>> compressed oop is disabled.
>>> Yes. I think we don't have any existing test cases with >32G. So 
>>> it's good to add it for sanity check.
>>>
>>> Thanks,
>>> Jiangli
>>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>> Thanks,
>>>>>
>>>>> Jiangli
>>>>>
>>>>>
>>>>> On 8/1/18 11:04 PM, Ioi Lam wrote:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8208658
>>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v00/ 
>>>>>>
>>>>>>
>>>>>> PROBLEM:
>>>>>>
>>>>>>     At runtime, if you use a maximum heap size (-Xmx) setting that's
>>>>>>     different than the one used at CDS archive creation time, the 
>>>>>> archived
>>>>>>     heap regions may fail to map. The reason is the archived heap 
>>>>>> regions
>>>>>>     are hard-coded to a specific compressed oop encoding scheme.
>>>>>>
>>>>>>     This causes degradation in start-up time, because the 
>>>>>> archived heap
>>>>>>     regions enable many aggressive optimizations, such as 
>>>>>> JDK-8202035,
>>>>>>     archiving the system module info.
>>>>>>
>>>>>>     This degradation is undesirable for JDK-8204247 (include default
>>>>>>     CDS archive in JDK binary) -- what we want is to ship a CDS 
>>>>>> archive
>>>>>>     that's optimal regardless of the max heap settings.
>>>>>>
>>>>>> FIX:
>>>>>>
>>>>>>     If the compressed oop encoding scheme is different:
>>>>>>         - relocate the archived heap regions so it
>>>>>>           fits within the runtime heap bounds.
>>>>>>         - patch all reference fields in the archived objects to
>>>>>>           use the runtime compressed oop encoding scheme.
>>>>>>
>>>>>> RESULTS:
>>>>>>
>>>>>>     # dump archive
>>>>>>     java -Xshare:dump -Xmx2000m
>>>>>>
>>>>>>     # same compressed oop encoding
>>>>>>     java -Xshare:on -Xmx2000m -Xlog:cds -version
>>>>>>
>>>>>>     # different compressed oop encoding
>>>>>>     java -Xshare:on -Xmx2090m -Xlog:cds -version
>>>>>>
>>>>>>     Note: the archive is dumped with UnscaledNarrowOop encoding.
>>>>>>     At run time, if -Xmx2090m is specified, ZeroBasedNarrowOop is
>>>>>>     used instead.
>>>>>>
>>>>>> ======================================================================== 
>>>>>>
>>>>>>     OLD same encoding      34.45 ms +- 0.51%
>>>>>>     OLD diff encoding      47.76 ms +- 0.94% <<<<< 13.3 ms 
>>>>>> degradation!
>>>>>>
>>>>>>     NEW same encoding      34.58 ms +- 0.61%
>>>>>>     NEW diff encoding      35.09 ms +- 0.24% <<<<< 0.5 ms 
>>>>>> degradation
>>>>>> ======================================================================== 
>>>>>>
>>>>>>
>>>>>>
>>>>>>     The cost of relocating/patching the archived heap regions is 
>>>>>> only
>>>>>>     0.5 ms. This is much better than the 13.3 ms degradation 
>>>>>> caused by the
>>>>>>     failure to map the archived heap regions.
>>>>>>
>>>>>> TESTING:
>>>>>>
>>>>>>     All 150+ CDS tests passed locally. I am starting tiers 1/2/3 
>>>>>> runs.
>>>>>>
>>>>>> Thanks
>>>>>> - Ioi
>>>>>
>>>>
>>>
>>
>




More information about the hotspot-gc-dev mailing list