RFR(S) 8208658: Make CDS archived heap regions usable even if compressed oop encoding has changed

Jiangli Zhou jiangli.zhou at oracle.com
Fri Aug 17 18:49:14 UTC 2018


Hi Ioi,

Thanks a lot for the updated webrev! Please see some more 
comments/suggestions below.

- src/hotspot/share/memory/filemap.cpp

  195   if (MetaspaceShared::is_heap_object_archiving_allowed()) {
  196     _g1_reserved = G1CollectedHeap::heap()->g1_reserved();
  197   }

The above are G1 specific. It should be included under #if INCLUDE_G1GC.

  881 MemRegion 
FileMapInfo::get_heap_regions_range_with_current_oop_encoding_mode() {
...
  884
  885   for (int i = MetaspaceShared::first_string; i <= 
MetaspaceShared::last_valid_region; i++) {
  886     FileMapHeader::space_info* si = space_at(i);
  887     size_t size = si->_used;
  888     if (size > 0) {
  889       address s = si->start_address_with_current_oop_encoding_mode();
  890       address e = s + size;
  891       if (start > s) {
  892         start = s;
  893       }
  894       if (end < e) {
  895         end = e;
  896       }
  897     }
  898   }

We can record the ‘start’ and ‘end’ within the archive at dump time. At 
runtime, we only need to compute the addresses of the ‘start’ and ‘end’ 
using the current encoding mode without iterating all archived heap 
spaces. It’s a minor optimization, we can do that after current RFE if 
you prefer.

- src/hotspot/share/memory/filemap.hpp
91   address _oopmap;               // bitmap for relocating embedded oops

Nit. ‘pointers’ seems to be a better word than ‘oops’ here.

316   MemRegion get_heap_regions_range_with_current_oop_encoding_mode();

Does the above needs NOT_CDS_JAVA_HEAP_RETURN also?

524 class FindEmbeddedPointers: public BasicOopIterateClosure {

Please add some comments for the FindEmbeddedPointers closure. Since we 
iterate all objects inside the archived heap regions, we still can end 
up with the correct & complete bitmaps without follow the references 
within each archived object. BTW, can you please check and make sure 
NULL oop pointers are not on the bitmap so we don’t waste cycles at runtime.

Please also add comments explaining it’s dump time only logic.

564 class PatchEmbeddedPointers: public BitMapClosure {

Some comments would be useful to explain it’s for runtime use.

  573     if (!CompressedOops::is_null(v)) {
  574       oop o = HeapShared::decode_with_archived_oop_encoding_mode(v);
  575       RawAccess<IS_NOT_NULL>::oop_store(p, o);
  576     }

So looks like we do include null in the bitmap? If we eliminate the 
nulls from the bitmap, the is_null check above can be avoided at 
runtime. It’s a very small optimization, but probably worth to consider?

- src/hotspot/share/oops/instanceMirrorKlass.hpp

102   static void serialize(class SerializeClosure* f) NOT_CDS_RETURN;

To sync up with your recent change, the above can be renamed to 
serialize_offset.

- test/hotspot/jtreg/runtime/appcds/cacheObject/DifferentHeapSizes.java

Could you please add the scenario that Calvin suggested:


         new Scenario(       32,         32, 64, 512, 2048, 4097, 16374, 
31000)

Thanks,
Jiangli

On 8/14/18 7:27 PM, Ioi Lam wrote:
> Here's an updated webrev:
>
> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v01/ 
>
> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v01/log.txt 
>
>
>
> [1] Incorporated many suggestions from Jiangli (Thanks!):
>
>     - Moved the patching code code to heapShared.cpp
>     - Added various comments and improved logging (see above log.txt 
> link)
>     - Added test cases for >32GB heaps.
>
> [2] Fixed the region relocation logic to avoid any GC code changes and
>     support all possible compressed oop encoding changes.
>
>     It turns out I don't need to force 8MB region size during dump time.
>     I just need to align the bottom of the closed heap on G1 region
>     boundary. This will ensure that the open and close heap will never
>     occupy the same G1 region.
>
>     All test cases in DifferentHeapSizes.java can successfully
>     map the archived regions.
>
>     See FileMapInfo::map_heap_regions_impl
>
> [3] Generate a bitmap of all the pointers at dump time, and use this
>     for pointer patching at runtime. I can't find any measurable
>     improvements, but it should be at least as fast as the last
>     version (which used a BasicOopIterateClosure at runtime).
>
> [4] Various clean up, such as
>
>     struct FileMapInfo::FileMapHeader::space_info* si = 
> &_header->_space[i];
>   ->
>     FileMapHeader::space_info* si = space_at(i);
>
> Testing: Passed hs tiers 1/2/3/4 with no new failures.
>
> Thanks
> - Ioi
>
>
> On 8/7/18 6:47 PM, Jiangli Zhou wrote:
>> On 8/7/18 3:38 PM, Ioi Lam wrote:
>>
>>> Hi Jiangli,
>>>
>>> Thanks for the review.
>>>
>>>
>>> On 8/7/18 9:25 AM, Jiangli Zhou wrote:
>>>> Hi Ioi,
>>>>
>>>> This is a welcomed change! Thanks for doing this. I'd like to 
>>>> suggest building a relocation table at dump time. The table can 
>>>> contain entries with pair of 
>>>> 'offset_to_region_start:narrow_oop_value'. Then at runtime, we can 
>>>> avoid iterating the objects. That would help to reduce the overhead 
>>>> that you are observing when relocation is needed at runtime. This 
>>>> could be done as a further optimization in a follow-up RFE for 12.
>>>>
>>>
>>> That's a good suggestion.  I'll try to do that as part of this fix, 
>>> by generating a bitmap of all the pointers that need to be 
>>> relocated. I will need one bit per 8 bytes of archived heap data, so 
>>> the (read-only) memory cost is pretty minimal.
>>>
>>>> - src/hotspot/share/memory/heapShared.inline.hpp
>>>>
>>>> - src/hotspot/share/memory/heapShared.cpp
>>>>
>>>>   33 inline oop HeapShared::decode_not_null(narrowOop v) {
>>>>   34   assert(!CompressedOops::is_null(v), "narrow oop value can 
>>>> never be zero");
>>>>   35   oop result = (oop)(void*)((uintptr_t)_narrow_oop_base + 
>>>> ((uintptr_t)v << _narrow_oop_shift));
>>>>   36   assert(check_obj_alignment(result), "address not aligned: " 
>>>> INTPTR_FORMAT, p2i((void*) result));
>>>>   37   return result;
>>>>   38 }
>>>>
>>>>  510 void HeapShared::init_narrow_oop_decoding(address base, int 
>>>> shift) {
>>>>  511   _narrow_oop_base = base;
>>>>  512   _narrow_oop_shift = shift;
>>>>  513 }
>>>>
>>>> - src/hotspot/share/memory/filemap.cpp
>>>>
>>>>
>>>>  913     //   dumptime heap end  ------------v
>>>>  914     //   [      |archived heap regions| ] runtime heap end 
>>>> ------v
>>>>  915     //                                       [ |archived heap 
>>>> regions| ]
>>>>  916     // |<-----delta-------------------->|
>>>>  917     //
>>>>  918     // At dump time, the archived heap region were near the 
>>>> top of the heap.
>>>>  919     // At run time, that region may not be inside the heap, so 
>>>> we move it so
>>>>  920     // that it's now near the top of teh runtime time. This 
>>>> can be done by
>>>>  921     // the simple math of adding the delta as shown above.
>>>>  922     address dumptime_heap_end = 
>>>> (address)_header->_g1_reserved.end();
>>>>  923     address runtime_heap_end = 
>>>> (address)G1CollectedHeap::heap()->g1_reserved().end();
>>>>  924     delta = runtime_heap_end - dumptime_heap_end;
>>>>  925   }
>>>>  926
>>>>  927 HeapShared::init_narrow_oop_decoding(narrow_oop_base() + 
>>>> delta, narrow_oop_shift());
>>>>
>>>> 1051 class RelocateInternalPointers: public BasicOopIterateClosure {
>>>> ...
>>>> 1056   virtual void do_oop(narrowOop *p) {
>>>> 1057     narrowOop v = *p;
>>>> 1058     if (!CompressedOops::is_null(v)) {
>>>> 1059       oop o = HeapShared::decode_not_null(v);
>>>> 1060       RawAccess<IS_NOT_NULL>::oop_store(p, o);
>>>> 1061     }
>>>> 1062   }
>>>>
>>>> Your relocation change boils down to above sections of code. It 
>>>> works as both UnscaledNarrowOop encoding and ZeroBasedNarrowOop 
>>>> encoding use 0 as the base. I believe we don't handle heap based 
>>>> encoding with CDS enabled. Can you please double check that.
>>>>
>>> It actually does work. See
>>>
>>> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v00/log.txt 
>>>
>>>
>>> Command line: [/home/iklam/jdk/bld/abe-fastdebug/images/jdk/bin/java 
>>> ... -Xmx31000m -XX:+UseStringDeduplication Hello ]
>>>
>>> [STDOUT]
>>> [0.092s][info][cds] Archived narrow_klass_base = 0x0000000800000000, 
>>> narrow_klass_shift = 3
>>> [0.092s][info][cds] Archived narrow_oop_mode = 0, narrow_oop_base = 
>>> 0x0000000000000000, narrow_oop_shift = 0
>>> [0.092s][info][cds] CDS heap data need to be relocated because
>>> [0.092s][info][cds] the archive was created with an incompatible 
>>> heap size: 128M.
>>> [0.092s][info][cds] Current narrow_oop_mode = 2, narrow_oop_base = 
>>> 0x0000001000000000, narrow_oop_shift = 3
>>> Hello World
>>>
>>>> The new HeapShared::decode_not_null() decodes the narrowOop value 
>>>> using the dump time encoding mode with added base delta. Can you 
>>>> please rename it to be more descriptive, for example 
>>>> HeapShared::decode_with_archived_encoding_mode()? Please also add 
>>>> some comments to include these detail for this function.
>>>>
>>>
>>> OK, I will fix that.
>>>
>>>> - src/hotspot/share/memory/filemap.cpp
>>>>
>>>>  886   if (narrow_klass_base() != Universe::narrow_klass_base() ||
>>>>  887       narrow_klass_shift() != Universe::narrow_klass_shift()) {
>>>>  888     log_info(cds)("Cached heap data from the CDS archive need 
>>>> to be relocated because");
>>>>  889     log_info(cds)("the CDS archive was created with an 
>>>> incompatible heap size: " UINTX_FORMAT "M.", max_heap_size()/M);
>>>>  890     log_info(cds)("Current narrow_klass_base = " PTR_FORMAT ", 
>>>> narrow_klass_shift = %d",
>>>>  891 p2i(Universe::narrow_klass_base()), 
>>>> Universe::narrow_klass_shift());
>>>>  892     return;
>>>>  893   }
>>>>
>>>> Please fix the log message at 888 as relocation is not done in 
>>>> above case.
>>>>
>>> Oops, it should have been "Cached heap data from the CDS archive 
>>> cannot be used because", but I messed up with copy-pasting.
>>>
>>>> Please move the code at line 895 - 898 to be before the above code, 
>>>> so we still have the logging information about the dump time 
>>>> encoding modes when runtime narrow_klass_encoding is different. The 
>>>> dump time encoding modes are useful debugging information.
>>>>
>>> OK. Will do.
>>>
>>>>
>>>> 1039 void FileMapInfo::relocate_archived_heap_embedded_pointers()
>>>>
>>>> 1051 class RelocateInternalPointers: public BasicOopIterateClosure
>>>>
>>>> 1068 void FileMapInfo::relocate_archived_heap_embedded_pointers_impl
>>>>
>>>> Could you please place above new code blocks in heapShared.* files? 
>>>> I filed JDK-8206009 to move all heap object archiving code to 
>>>> heapShared.*. It's helpful to make sure the related new code goes 
>>>> to the right files before we completely resolve existing code for 
>>>> JDK-8206009.
>>>>
>>>
>>> OK. Will do.
>>>
>>>> - src/hotspot/share/gc/g1/heapRegion.cpp
>>>>
>>>> Can you please check the percentage of unused region space if we 
>>>> force 8M gc region at dump time? The unused spaces in the archived 
>>>> heap regions are filled with dummy objects at runtime and cannot be 
>>>> utilized.
>>> At runtime, the region size is chosen according to -Xmx. The amount 
>>> of waste (filled by dummy objects) is related only to the runtime 
>>> region size. It's not related to the region size chosen at dump time.
>> Yes, the heap memory waste is more related with the runtime GC region 
>> size.
>>
>> The other side effect of forcing hard coded 8M GC region at dump time 
>> is that you might have archived objects across the region boundary at 
>> runtime when the runtime GC region size is less than 8M. When that 
>> happens, you would need to invalidate all archived java heap data. 
>> Also, you need to detect such cases, which adds additional overhead. 
>> However, my memory tells me that we allocate archived objects at 2M 
>> boundary within a GC region regardless the region size at dump time 
>> to make sure we don't ever run into cases with objects across 
>> boundary when runtime and dump have different GC region sizes. I need 
>> to double check on that to make sure it is the case.
>>>
>>> So in this regard, this patch does not increase the amount of 
>>> runtime memory waste.
>>>
>>>> It would also introduce gaps between the archived heap regions at 
>>>> runtime when the GC region size is < 8M with smaller heap size.
>>>
>>> Yes, this is an issue for small heaps. Previously, without my patch, 
>>> if you run with
>>>
>>>     java -Xshare:dump -Xmx128m
>>>     java -Xshare:on -Xmx128m
>>>
>>> the open and closed archive regions will in two consecutive 1MB 
>>> regions.
>>>
>>> With my patch, there will be a gap of 7MB between the 2 regions 
>>> (each as still 1MB in size). This means if you run with very small 
>>> heaps with CDS, you may not be able to allocate very large arrays.
>>>
>>>> We should check with GC team on this and see if there are other 
>>>> alternatives to resolve the overlapping issue. One possible 
>>>> solution I think is to relocate the 'open' archive heap region 
>>>> further down at runtime if the 'open' and 'closed' archive regions 
>>>> are in the same G1 region at runtime.
>>>>
>>> I'll try implement what you described above (in a separate RFE). 
>>> This will make runtime relocation a little slower, but hopefully 
>>> that will just be noise, and I won't need to make any change in G1.
>> We just need to make sure the open archive regions are allocated 
>> outside the GC region used for the closed archive regions. It should 
>> not add additional overhead for object relocation. However, it does 
>> add complexity to the relocation implementation as you would have 
>> different base deltas.
>>>
>>> For this RFE, I will revert the changes in heapRegion.cpp. This 
>>> means in some cases, we won't be able to map the archived regions:
>>>
>>> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v00/log2.txt 
>>>
>>>
>>> Note that 2 out of 26 cases fail to map the heap regions with 
>>> "Unable to allocate region, java heap range is already in use",
>>> but it's still works for the majority of the cases, so it still much 
>>> better than what we have today
>> That's a reasonable trade-off. Seems acceptable to me.
>>>
>>>> - 
>>>> test/hotspot/jtreg/runtime/appcds/cacheObject/DifferentHeapSizes.java
>>>>
>>>> Looks good! How about adding some scenarios with heap size > 32G as 
>>>> sanity checks.
>>>>
>>>
>>> Will do, but with heap size > 32G CDS will be disabled because 
>>> compressed oop is disabled.
>> Yes. I think we don't have any existing test cases with >32G. So it's 
>> good to add it for sanity check.
>>
>> Thanks,
>> Jiangli
>>>
>>> Thanks
>>> - Ioi
>>>
>>>> Thanks,
>>>>
>>>> Jiangli
>>>>
>>>>
>>>> On 8/1/18 11:04 PM, Ioi Lam wrote:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8208658
>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v00/ 
>>>>>
>>>>>
>>>>> PROBLEM:
>>>>>
>>>>>     At runtime, if you use a maximum heap size (-Xmx) setting that's
>>>>>     different than the one used at CDS archive creation time, the 
>>>>> archived
>>>>>     heap regions may fail to map. The reason is the archived heap 
>>>>> regions
>>>>>     are hard-coded to a specific compressed oop encoding scheme.
>>>>>
>>>>>     This causes degradation in start-up time, because the archived 
>>>>> heap
>>>>>     regions enable many aggressive optimizations, such as 
>>>>> JDK-8202035,
>>>>>     archiving the system module info.
>>>>>
>>>>>     This degradation is undesirable for JDK-8204247 (include default
>>>>>     CDS archive in JDK binary) -- what we want is to ship a CDS 
>>>>> archive
>>>>>     that's optimal regardless of the max heap settings.
>>>>>
>>>>> FIX:
>>>>>
>>>>>     If the compressed oop encoding scheme is different:
>>>>>         - relocate the archived heap regions so it
>>>>>           fits within the runtime heap bounds.
>>>>>         - patch all reference fields in the archived objects to
>>>>>           use the runtime compressed oop encoding scheme.
>>>>>
>>>>> RESULTS:
>>>>>
>>>>>     # dump archive
>>>>>     java -Xshare:dump -Xmx2000m
>>>>>
>>>>>     # same compressed oop encoding
>>>>>     java -Xshare:on -Xmx2000m -Xlog:cds -version
>>>>>
>>>>>     # different compressed oop encoding
>>>>>     java -Xshare:on -Xmx2090m -Xlog:cds -version
>>>>>
>>>>>     Note: the archive is dumped with UnscaledNarrowOop encoding.
>>>>>     At run time, if -Xmx2090m is specified, ZeroBasedNarrowOop is
>>>>>     used instead.
>>>>>
>>>>> ======================================================================== 
>>>>>
>>>>>     OLD same encoding      34.45 ms +- 0.51%
>>>>>     OLD diff encoding      47.76 ms +- 0.94% <<<<< 13.3 ms 
>>>>> degradation!
>>>>>
>>>>>     NEW same encoding      34.58 ms +- 0.61%
>>>>>     NEW diff encoding      35.09 ms +- 0.24% <<<<< 0.5 ms degradation
>>>>> ======================================================================== 
>>>>>
>>>>>
>>>>>
>>>>>     The cost of relocating/patching the archived heap regions is only
>>>>>     0.5 ms. This is much better than the 13.3 ms degradation 
>>>>> caused by the
>>>>>     failure to map the archived heap regions.
>>>>>
>>>>> TESTING:
>>>>>
>>>>>     All 150+ CDS tests passed locally. I am starting tiers 1/2/3 
>>>>> runs.
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>
>>>
>>
>




More information about the hotspot-gc-dev mailing list