RFR(S) 8208658: Make CDS archived heap regions usable even if compressed oop encoding has changed

Ioi Lam ioi.lam at oracle.com
Wed Aug 15 02:27:39 UTC 2018


Here's an updated webrev:

http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v01/
http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v01/log.txt


[1] Incorporated many suggestions from Jiangli (Thanks!):

     - Moved the patching code code to heapShared.cpp
     - Added various comments and improved logging (see above log.txt link)
     - Added test cases for >32GB heaps.

[2] Fixed the region relocation logic to avoid any GC code changes and
     support all possible compressed oop encoding changes.

     It turns out I don't need to force 8MB region size during dump time.
     I just need to align the bottom of the closed heap on G1 region
     boundary. This will ensure that the open and close heap will never
     occupy the same G1 region.

     All test cases in DifferentHeapSizes.java can successfully
     map the archived regions.

     See FileMapInfo::map_heap_regions_impl

[3] Generate a bitmap of all the pointers at dump time, and use this
     for pointer patching at runtime. I can't find any measurable
     improvements, but it should be at least as fast as the last
     version (which used a BasicOopIterateClosure at runtime).

[4] Various clean up, such as

     struct FileMapInfo::FileMapHeader::space_info* si = 
&_header->_space[i];
   ->
     FileMapHeader::space_info* si = space_at(i);

Testing: Passed hs tiers 1/2/3/4 with no new failures.

Thanks
- Ioi


On 8/7/18 6:47 PM, Jiangli Zhou wrote:
> On 8/7/18 3:38 PM, Ioi Lam wrote:
>
>> Hi Jiangli,
>>
>> Thanks for the review.
>>
>>
>> On 8/7/18 9:25 AM, Jiangli Zhou wrote:
>>> Hi Ioi,
>>>
>>> This is a welcomed change! Thanks for doing this. I'd like to 
>>> suggest building a relocation table at dump time. The table can 
>>> contain entries with pair of 
>>> 'offset_to_region_start:narrow_oop_value'. Then at runtime, we can 
>>> avoid iterating the objects. That would help to reduce the overhead 
>>> that you are observing when relocation is needed at runtime. This 
>>> could be done as a further optimization in a follow-up RFE for 12.
>>>
>>
>> That's a good suggestion.  I'll try to do that as part of this fix, 
>> by generating a bitmap of all the pointers that need to be relocated. 
>> I will need one bit per 8 bytes of archived heap data, so the 
>> (read-only) memory cost is pretty minimal.
>>
>>> - src/hotspot/share/memory/heapShared.inline.hpp
>>>
>>> - src/hotspot/share/memory/heapShared.cpp
>>>
>>>   33 inline oop HeapShared::decode_not_null(narrowOop v) {
>>>   34   assert(!CompressedOops::is_null(v), "narrow oop value can 
>>> never be zero");
>>>   35   oop result = (oop)(void*)((uintptr_t)_narrow_oop_base + 
>>> ((uintptr_t)v << _narrow_oop_shift));
>>>   36   assert(check_obj_alignment(result), "address not aligned: " 
>>> INTPTR_FORMAT, p2i((void*) result));
>>>   37   return result;
>>>   38 }
>>>
>>>  510 void HeapShared::init_narrow_oop_decoding(address base, int 
>>> shift) {
>>>  511   _narrow_oop_base = base;
>>>  512   _narrow_oop_shift = shift;
>>>  513 }
>>>
>>> - src/hotspot/share/memory/filemap.cpp
>>>
>>>
>>>  913     //   dumptime heap end  ------------v
>>>  914     //   [      |archived heap regions| ]         runtime heap 
>>> end ------v
>>>  915     //                                       [ |archived heap 
>>> regions| ]
>>>  916     // |<-----delta-------------------->|
>>>  917     //
>>>  918     // At dump time, the archived heap region were near the top 
>>> of the heap.
>>>  919     // At run time, that region may not be inside the heap, so 
>>> we move it so
>>>  920     // that it's now near the top of teh runtime time. This can 
>>> be done by
>>>  921     // the simple math of adding the delta as shown above.
>>>  922     address dumptime_heap_end = 
>>> (address)_header->_g1_reserved.end();
>>>  923     address runtime_heap_end = 
>>> (address)G1CollectedHeap::heap()->g1_reserved().end();
>>>  924     delta = runtime_heap_end - dumptime_heap_end;
>>>  925   }
>>>  926
>>>  927 HeapShared::init_narrow_oop_decoding(narrow_oop_base() + delta, 
>>> narrow_oop_shift());
>>>
>>> 1051 class RelocateInternalPointers: public BasicOopIterateClosure {
>>> ...
>>> 1056   virtual void do_oop(narrowOop *p) {
>>> 1057     narrowOop v = *p;
>>> 1058     if (!CompressedOops::is_null(v)) {
>>> 1059       oop o = HeapShared::decode_not_null(v);
>>> 1060       RawAccess<IS_NOT_NULL>::oop_store(p, o);
>>> 1061     }
>>> 1062   }
>>>
>>> Your relocation change boils down to above sections of code. It 
>>> works as both UnscaledNarrowOop encoding and ZeroBasedNarrowOop 
>>> encoding use 0 as the base. I believe we don't handle heap based 
>>> encoding with CDS enabled. Can you please double check that.
>>>
>> It actually does work. See
>>
>> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v00/log.txt 
>>
>>
>> Command line: [/home/iklam/jdk/bld/abe-fastdebug/images/jdk/bin/java 
>> ... -Xmx31000m -XX:+UseStringDeduplication Hello ]
>>
>> [STDOUT]
>> [0.092s][info][cds] Archived narrow_klass_base = 0x0000000800000000, 
>> narrow_klass_shift = 3
>> [0.092s][info][cds] Archived narrow_oop_mode = 0, narrow_oop_base = 
>> 0x0000000000000000, narrow_oop_shift = 0
>> [0.092s][info][cds] CDS heap data need to be relocated because
>> [0.092s][info][cds] the archive was created with an incompatible heap 
>> size: 128M.
>> [0.092s][info][cds] Current narrow_oop_mode = 2, narrow_oop_base = 
>> 0x0000001000000000, narrow_oop_shift = 3
>> Hello World
>>
>>> The new HeapShared::decode_not_null() decodes the narrowOop value 
>>> using the dump time encoding mode with added base delta. Can you 
>>> please rename it to be more descriptive, for example 
>>> HeapShared::decode_with_archived_encoding_mode()? Please also add 
>>> some comments to include these detail for this function.
>>>
>>
>> OK, I will fix that.
>>
>>> - src/hotspot/share/memory/filemap.cpp
>>>
>>>  886   if (narrow_klass_base() != Universe::narrow_klass_base() ||
>>>  887       narrow_klass_shift() != Universe::narrow_klass_shift()) {
>>>  888     log_info(cds)("Cached heap data from the CDS archive need 
>>> to be relocated because");
>>>  889     log_info(cds)("the CDS archive was created with an 
>>> incompatible heap size: " UINTX_FORMAT "M.", max_heap_size()/M);
>>>  890     log_info(cds)("Current narrow_klass_base = " PTR_FORMAT ", 
>>> narrow_klass_shift = %d",
>>>  891 p2i(Universe::narrow_klass_base()), 
>>> Universe::narrow_klass_shift());
>>>  892     return;
>>>  893   }
>>>
>>> Please fix the log message at 888 as relocation is not done in above 
>>> case.
>>>
>> Oops, it should have been "Cached heap data from the CDS archive 
>> cannot be used because", but I messed up with copy-pasting.
>>
>>> Please move the code at line 895 - 898 to be before the above code, 
>>> so we still have the logging information about the dump time 
>>> encoding modes when runtime narrow_klass_encoding is different. The 
>>> dump time encoding modes are useful debugging information.
>>>
>> OK. Will do.
>>
>>>
>>> 1039 void FileMapInfo::relocate_archived_heap_embedded_pointers()
>>>
>>> 1051 class RelocateInternalPointers: public BasicOopIterateClosure
>>>
>>> 1068 void FileMapInfo::relocate_archived_heap_embedded_pointers_impl
>>>
>>> Could you please place above new code blocks in heapShared.* files? 
>>> I filed JDK-8206009 to move all heap object archiving code to 
>>> heapShared.*. It's helpful to make sure the related new code goes to 
>>> the right files before we completely resolve existing code for 
>>> JDK-8206009.
>>>
>>
>> OK. Will do.
>>
>>> - src/hotspot/share/gc/g1/heapRegion.cpp
>>>
>>> Can you please check the percentage of unused region space if we 
>>> force 8M gc region at dump time? The unused spaces in the archived 
>>> heap regions are filled with dummy objects at runtime and cannot be 
>>> utilized.
>> At runtime, the region size is chosen according to -Xmx. The amount 
>> of waste (filled by dummy objects) is related only to the runtime 
>> region size. It's not related to the region size chosen at dump time.
> Yes, the heap memory waste is more related with the runtime GC region 
> size.
>
> The other side effect of forcing hard coded 8M GC region at dump time 
> is that you might have archived objects across the region boundary at 
> runtime when the runtime GC region size is less than 8M. When that 
> happens, you would need to invalidate all archived java heap data. 
> Also, you need to detect such cases, which adds additional overhead. 
> However, my memory tells me that we allocate archived objects at 2M 
> boundary within a GC region regardless the region size at dump time to 
> make sure we don't ever run into cases with objects across boundary 
> when runtime and dump have different GC region sizes. I need to double 
> check on that to make sure it is the case.
>>
>> So in this regard, this patch does not increase the amount of runtime 
>> memory waste.
>>
>>> It would also introduce gaps between the archived heap regions at 
>>> runtime when the GC region size is < 8M with smaller heap size.
>>
>> Yes, this is an issue for small heaps. Previously, without my patch, 
>> if you run with
>>
>>     java -Xshare:dump -Xmx128m
>>     java -Xshare:on -Xmx128m
>>
>> the open and closed archive regions will in two consecutive 1MB regions.
>>
>> With my patch, there will be a gap of 7MB between the 2 regions (each 
>> as still 1MB in size). This means if you run with very small heaps 
>> with CDS, you may not be able to allocate very large arrays.
>>
>>> We should check with GC team on this and see if there are other 
>>> alternatives to resolve the overlapping issue. One possible solution 
>>> I think is to relocate the 'open' archive heap region further down 
>>> at runtime if the 'open' and 'closed' archive regions are in the 
>>> same G1 region at runtime.
>>>
>> I'll try implement what you described above (in a separate RFE). This 
>> will make runtime relocation a little slower, but hopefully that will 
>> just be noise, and I won't need to make any change in G1.
> We just need to make sure the open archive regions are allocated 
> outside the GC region used for the closed archive regions. It should 
> not add additional overhead for object relocation. However, it does 
> add complexity to the relocation implementation as you would have 
> different base deltas.
>>
>> For this RFE, I will revert the changes in heapRegion.cpp. This means 
>> in some cases, we won't be able to map the archived regions:
>>
>> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v00/log2.txt 
>>
>>
>> Note that 2 out of 26 cases fail to map the heap regions with "Unable 
>> to allocate region, java heap range is already in use",
>> but it's still works for the majority of the cases, so it still much 
>> better than what we have today
> That's a reasonable trade-off. Seems acceptable to me.
>>
>>> - test/hotspot/jtreg/runtime/appcds/cacheObject/DifferentHeapSizes.java
>>>
>>> Looks good! How about adding some scenarios with heap size > 32G as 
>>> sanity checks.
>>>
>>
>> Will do, but with heap size > 32G CDS will be disabled because 
>> compressed oop is disabled.
> Yes. I think we don't have any existing test cases with >32G. So it's 
> good to add it for sanity check.
>
> Thanks,
> Jiangli
>>
>> Thanks
>> - Ioi
>>
>>> Thanks,
>>>
>>> Jiangli
>>>
>>>
>>> On 8/1/18 11:04 PM, Ioi Lam wrote:
>>>> https://bugs.openjdk.java.net/browse/JDK-8208658
>>>> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v00/ 
>>>>
>>>>
>>>> PROBLEM:
>>>>
>>>>     At runtime, if you use a maximum heap size (-Xmx) setting that's
>>>>     different than the one used at CDS archive creation time, the 
>>>> archived
>>>>     heap regions may fail to map. The reason is the archived heap 
>>>> regions
>>>>     are hard-coded to a specific compressed oop encoding scheme.
>>>>
>>>>     This causes degradation in start-up time, because the archived 
>>>> heap
>>>>     regions enable many aggressive optimizations, such as JDK-8202035,
>>>>     archiving the system module info.
>>>>
>>>>     This degradation is undesirable for JDK-8204247 (include default
>>>>     CDS archive in JDK binary) -- what we want is to ship a CDS 
>>>> archive
>>>>     that's optimal regardless of the max heap settings.
>>>>
>>>> FIX:
>>>>
>>>>     If the compressed oop encoding scheme is different:
>>>>         - relocate the archived heap regions so it
>>>>           fits within the runtime heap bounds.
>>>>         - patch all reference fields in the archived objects to
>>>>           use the runtime compressed oop encoding scheme.
>>>>
>>>> RESULTS:
>>>>
>>>>     # dump archive
>>>>     java -Xshare:dump -Xmx2000m
>>>>
>>>>     # same compressed oop encoding
>>>>     java -Xshare:on -Xmx2000m -Xlog:cds -version
>>>>
>>>>     # different compressed oop encoding
>>>>     java -Xshare:on -Xmx2090m -Xlog:cds -version
>>>>
>>>>     Note: the archive is dumped with UnscaledNarrowOop encoding.
>>>>     At run time, if -Xmx2090m is specified, ZeroBasedNarrowOop is
>>>>     used instead.
>>>>
>>>> ======================================================================== 
>>>>
>>>>     OLD same encoding      34.45 ms +- 0.51%
>>>>     OLD diff encoding      47.76 ms +- 0.94% <<<<< 13.3 ms 
>>>> degradation!
>>>>
>>>>     NEW same encoding      34.58 ms +- 0.61%
>>>>     NEW diff encoding      35.09 ms +- 0.24% <<<<< 0.5 ms degradation
>>>> ======================================================================== 
>>>>
>>>>
>>>>
>>>>     The cost of relocating/patching the archived heap regions is only
>>>>     0.5 ms. This is much better than the 13.3 ms degradation caused 
>>>> by the
>>>>     failure to map the archived heap regions.
>>>>
>>>> TESTING:
>>>>
>>>>     All 150+ CDS tests passed locally. I am starting tiers 1/2/3 runs.
>>>>
>>>> Thanks
>>>> - Ioi
>>>
>>
>




More information about the hotspot-gc-dev mailing list