RFR(S) 8208658: Make CDS archived heap regions usable even if compressed oop encoding has changed
Jiangli Zhou
jiangli.zhou at oracle.com
Tue Aug 21 03:54:57 UTC 2018
Looks good. It would be good to ask someone in the GC team to also
review the changes, especially the FileMapInfo::map_heap_regions_impl()
code that maps the archived heap regions into the runtime java heap in
the relocation case.
Thanks,
Jiangli
On 8/20/18 5:32 PM, Ioi Lam wrote:
> Hi Jiangli,
>
> Thanks for the review. I've updated the webrev at:
>
> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v02/
>
>
> Delta from the last webrev:
>
> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v02.delta/
>
>
> Please see comments below
>
>
> On 8/17/2018 11:49 AM, Jiangli Zhou wrote:
>> Hi Ioi,
>>
>> Thanks a lot for the updated webrev! Please see some more
>> comments/suggestions below.
>>
>> - src/hotspot/share/memory/filemap.cpp
>>
>> 195 if (MetaspaceShared::is_heap_object_archiving_allowed()) {
>> 196 _g1_reserved = G1CollectedHeap::heap()->g1_reserved();
>> 197 }
>>
>> The above are G1 specific. It should be included under #if INCLUDE_G1GC.
>>
>
> Fixed. I tested with "configure --with-jvm-features=-g1gc ...". I
> ended up having to fix a couple of JFR files as well.
>
>> 881 MemRegion
>> FileMapInfo::get_heap_regions_range_with_current_oop_encoding_mode() {
>> ...
>> 884
>> 885 for (int i = MetaspaceShared::first_string; i <=
>> MetaspaceShared::last_valid_region; i++) {
>> 886 FileMapHeader::space_info* si = space_at(i);
>> 887 size_t size = si->_used;
>> 888 if (size > 0) {
>> 889 address s =
>> si->start_address_with_current_oop_encoding_mode();
>> 890 address e = s + size;
>> 891 if (start > s) {
>> 892 start = s;
>> 893 }
>> 894 if (end < e) {
>> 895 end = e;
>> 896 }
>> 897 }
>> 898 }
>>
>> We can record the ‘start’ and ‘end’ within the archive at dump time.
>> At runtime, we only need to compute the addresses of the ‘start’ and
>> ‘end’ using the current encoding mode without iterating all archived
>> heap spaces. It’s a minor optimization, we can do that after current
>> RFE if you prefer.
>>
>
> I used to have a version like what you suggested. However, that means
> I need to split the logic between dump time and run time, and you end
> up with more code. Since this code is not performance critical, I
> think it's better to put it all in one place.
>
>> - src/hotspot/share/memory/filemap.hpp
>> 91 address _oopmap; // bitmap for relocating embedded
>> oops
>>
>> Nit. ‘pointers’ seems to be a better word than ‘oops’ here.
>>
>
> oopmap is actually a common terminology in HotSpot that means 'a map
> of all the pointers to oop'
>
>> 316 MemRegion get_heap_regions_range_with_current_oop_encoding_mode();
>>
>> Does the above needs NOT_CDS_JAVA_HEAP_RETURN also?
>>
>
> I don't know how to easily return an 'empty' ResourceBitMap. Since
> this function is used only by the archived heap code, I moved its
> declaration inside INCLUDE_CDS_JAVA_HEAP.
>
>> 524 class FindEmbeddedPointers: public BasicOopIterateClosure {
>>
>> Please add some comments for the FindEmbeddedPointers closure. Since
>> we iterate all objects inside the archived heap regions, we still can
>> end up with the correct & complete bitmaps without follow the
>> references within each archived object. BTW, can you please check and
>> make sure NULL oop pointers are not on the bitmap so we don’t waste
>> cycles at runtime.
>>
>
> That's a good suggestion. I added code to filter out all the null
> pointers. It turns out about 50% of the pointers in the open region
> are NULL, so it's good saving. The new version is slightly faster than
> the last version.
>
>> Please also add comments explaining it’s dump time only logic.
>>
>
> Done.
>
>> 564 class PatchEmbeddedPointers: public BitMapClosure {
>>
>> Some comments would be useful to explain it’s for runtime use.
>>
>
> Done
>
>> 573 if (!CompressedOops::is_null(v)) {
>> 574 oop o =
>> HeapShared::decode_with_archived_oop_encoding_mode(v);
>> 575 RawAccess<IS_NOT_NULL>::oop_store(p, o);
>> 576 }
>>
>> So looks like we do include null in the bitmap? If we eliminate the
>> nulls from the bitmap, the is_null check above can be avoided at
>> runtime. It’s a very small optimization, but probably worth to consider?
>>
>
> Fixed
>
>> - src/hotspot/share/oops/instanceMirrorKlass.hpp
>>
>> 102 static void serialize(class SerializeClosure* f) NOT_CDS_RETURN;
>>
>> To sync up with your recent change, the above can be renamed to
>> serialize_offset.
>>
>
> Fixed
>
>> - test/hotspot/jtreg/runtime/appcds/cacheObject/DifferentHeapSizes.java
>>
>> Could you please add the scenario that Calvin suggested:
>>
>>
>> new Scenario( 32, 32, 64, 512, 2048, 4097,
>> 16374, 31000)
>
> Added.
>
> Thanks
> - Ioi
>
>>
>> Thanks,
>> Jiangli
>>
>> On 8/14/18 7:27 PM, Ioi Lam wrote:
>>> Here's an updated webrev:
>>>
>>> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v01/
>>>
>>> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v01/log.txt
>>>
>>>
>>>
>>> [1] Incorporated many suggestions from Jiangli (Thanks!):
>>>
>>> - Moved the patching code code to heapShared.cpp
>>> - Added various comments and improved logging (see above log.txt
>>> link)
>>> - Added test cases for >32GB heaps.
>>>
>>> [2] Fixed the region relocation logic to avoid any GC code changes and
>>> support all possible compressed oop encoding changes.
>>>
>>> It turns out I don't need to force 8MB region size during dump
>>> time.
>>> I just need to align the bottom of the closed heap on G1 region
>>> boundary. This will ensure that the open and close heap will never
>>> occupy the same G1 region.
>>>
>>> All test cases in DifferentHeapSizes.java can successfully
>>> map the archived regions.
>>>
>>> See FileMapInfo::map_heap_regions_impl
>>>
>>> [3] Generate a bitmap of all the pointers at dump time, and use this
>>> for pointer patching at runtime. I can't find any measurable
>>> improvements, but it should be at least as fast as the last
>>> version (which used a BasicOopIterateClosure at runtime).
>>>
>>> [4] Various clean up, such as
>>>
>>> struct FileMapInfo::FileMapHeader::space_info* si =
>>> &_header->_space[i];
>>> ->
>>> FileMapHeader::space_info* si = space_at(i);
>>>
>>> Testing: Passed hs tiers 1/2/3/4 with no new failures.
>>>
>>> Thanks
>>> - Ioi
>>>
>>>
>>> On 8/7/18 6:47 PM, Jiangli Zhou wrote:
>>>> On 8/7/18 3:38 PM, Ioi Lam wrote:
>>>>
>>>>> Hi Jiangli,
>>>>>
>>>>> Thanks for the review.
>>>>>
>>>>>
>>>>> On 8/7/18 9:25 AM, Jiangli Zhou wrote:
>>>>>> Hi Ioi,
>>>>>>
>>>>>> This is a welcomed change! Thanks for doing this. I'd like to
>>>>>> suggest building a relocation table at dump time. The table can
>>>>>> contain entries with pair of
>>>>>> 'offset_to_region_start:narrow_oop_value'. Then at runtime, we
>>>>>> can avoid iterating the objects. That would help to reduce the
>>>>>> overhead that you are observing when relocation is needed at
>>>>>> runtime. This could be done as a further optimization in a
>>>>>> follow-up RFE for 12.
>>>>>>
>>>>>
>>>>> That's a good suggestion. I'll try to do that as part of this
>>>>> fix, by generating a bitmap of all the pointers that need to be
>>>>> relocated. I will need one bit per 8 bytes of archived heap data,
>>>>> so the (read-only) memory cost is pretty minimal.
>>>>>
>>>>>> - src/hotspot/share/memory/heapShared.inline.hpp
>>>>>>
>>>>>> - src/hotspot/share/memory/heapShared.cpp
>>>>>>
>>>>>> 33 inline oop HeapShared::decode_not_null(narrowOop v) {
>>>>>> 34 assert(!CompressedOops::is_null(v), "narrow oop value can
>>>>>> never be zero");
>>>>>> 35 oop result = (oop)(void*)((uintptr_t)_narrow_oop_base +
>>>>>> ((uintptr_t)v << _narrow_oop_shift));
>>>>>> 36 assert(check_obj_alignment(result), "address not aligned:
>>>>>> " INTPTR_FORMAT, p2i((void*) result));
>>>>>> 37 return result;
>>>>>> 38 }
>>>>>>
>>>>>> 510 void HeapShared::init_narrow_oop_decoding(address base, int
>>>>>> shift) {
>>>>>> 511 _narrow_oop_base = base;
>>>>>> 512 _narrow_oop_shift = shift;
>>>>>> 513 }
>>>>>>
>>>>>> - src/hotspot/share/memory/filemap.cpp
>>>>>>
>>>>>>
>>>>>> 913 // dumptime heap end ------------v
>>>>>> 914 // [ |archived heap regions| ] runtime heap end
>>>>>> ------v
>>>>>> 915 // [ |archived
>>>>>> heap regions| ]
>>>>>> 916 // |<-----delta-------------------->|
>>>>>> 917 //
>>>>>> 918 // At dump time, the archived heap region were near the
>>>>>> top of the heap.
>>>>>> 919 // At run time, that region may not be inside the heap,
>>>>>> so we move it so
>>>>>> 920 // that it's now near the top of teh runtime time. This
>>>>>> can be done by
>>>>>> 921 // the simple math of adding the delta as shown above.
>>>>>> 922 address dumptime_heap_end =
>>>>>> (address)_header->_g1_reserved.end();
>>>>>> 923 address runtime_heap_end =
>>>>>> (address)G1CollectedHeap::heap()->g1_reserved().end();
>>>>>> 924 delta = runtime_heap_end - dumptime_heap_end;
>>>>>> 925 }
>>>>>> 926
>>>>>> 927 HeapShared::init_narrow_oop_decoding(narrow_oop_base() +
>>>>>> delta, narrow_oop_shift());
>>>>>>
>>>>>> 1051 class RelocateInternalPointers: public BasicOopIterateClosure {
>>>>>> ...
>>>>>> 1056 virtual void do_oop(narrowOop *p) {
>>>>>> 1057 narrowOop v = *p;
>>>>>> 1058 if (!CompressedOops::is_null(v)) {
>>>>>> 1059 oop o = HeapShared::decode_not_null(v);
>>>>>> 1060 RawAccess<IS_NOT_NULL>::oop_store(p, o);
>>>>>> 1061 }
>>>>>> 1062 }
>>>>>>
>>>>>> Your relocation change boils down to above sections of code. It
>>>>>> works as both UnscaledNarrowOop encoding and ZeroBasedNarrowOop
>>>>>> encoding use 0 as the base. I believe we don't handle heap based
>>>>>> encoding with CDS enabled. Can you please double check that.
>>>>>>
>>>>> It actually does work. See
>>>>>
>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v00/log.txt
>>>>>
>>>>>
>>>>> Command line:
>>>>> [/home/iklam/jdk/bld/abe-fastdebug/images/jdk/bin/java ...
>>>>> -Xmx31000m -XX:+UseStringDeduplication Hello ]
>>>>>
>>>>> [STDOUT]
>>>>> [0.092s][info][cds] Archived narrow_klass_base =
>>>>> 0x0000000800000000, narrow_klass_shift = 3
>>>>> [0.092s][info][cds] Archived narrow_oop_mode = 0, narrow_oop_base
>>>>> = 0x0000000000000000, narrow_oop_shift = 0
>>>>> [0.092s][info][cds] CDS heap data need to be relocated because
>>>>> [0.092s][info][cds] the archive was created with an incompatible
>>>>> heap size: 128M.
>>>>> [0.092s][info][cds] Current narrow_oop_mode = 2, narrow_oop_base =
>>>>> 0x0000001000000000, narrow_oop_shift = 3
>>>>> Hello World
>>>>>
>>>>>> The new HeapShared::decode_not_null() decodes the narrowOop value
>>>>>> using the dump time encoding mode with added base delta. Can you
>>>>>> please rename it to be more descriptive, for example
>>>>>> HeapShared::decode_with_archived_encoding_mode()? Please also add
>>>>>> some comments to include these detail for this function.
>>>>>>
>>>>>
>>>>> OK, I will fix that.
>>>>>
>>>>>> - src/hotspot/share/memory/filemap.cpp
>>>>>>
>>>>>> 886 if (narrow_klass_base() != Universe::narrow_klass_base() ||
>>>>>> 887 narrow_klass_shift() != Universe::narrow_klass_shift()) {
>>>>>> 888 log_info(cds)("Cached heap data from the CDS archive
>>>>>> need to be relocated because");
>>>>>> 889 log_info(cds)("the CDS archive was created with an
>>>>>> incompatible heap size: " UINTX_FORMAT "M.", max_heap_size()/M);
>>>>>> 890 log_info(cds)("Current narrow_klass_base = " PTR_FORMAT
>>>>>> ", narrow_klass_shift = %d",
>>>>>> 891 p2i(Universe::narrow_klass_base()),
>>>>>> Universe::narrow_klass_shift());
>>>>>> 892 return;
>>>>>> 893 }
>>>>>>
>>>>>> Please fix the log message at 888 as relocation is not done in
>>>>>> above case.
>>>>>>
>>>>> Oops, it should have been "Cached heap data from the CDS archive
>>>>> cannot be used because", but I messed up with copy-pasting.
>>>>>
>>>>>> Please move the code at line 895 - 898 to be before the above
>>>>>> code, so we still have the logging information about the dump
>>>>>> time encoding modes when runtime narrow_klass_encoding is
>>>>>> different. The dump time encoding modes are useful debugging
>>>>>> information.
>>>>>>
>>>>> OK. Will do.
>>>>>
>>>>>>
>>>>>> 1039 void FileMapInfo::relocate_archived_heap_embedded_pointers()
>>>>>>
>>>>>> 1051 class RelocateInternalPointers: public BasicOopIterateClosure
>>>>>>
>>>>>> 1068 void FileMapInfo::relocate_archived_heap_embedded_pointers_impl
>>>>>>
>>>>>> Could you please place above new code blocks in heapShared.*
>>>>>> files? I filed JDK-8206009 to move all heap object archiving code
>>>>>> to heapShared.*. It's helpful to make sure the related new code
>>>>>> goes to the right files before we completely resolve existing
>>>>>> code for JDK-8206009.
>>>>>>
>>>>>
>>>>> OK. Will do.
>>>>>
>>>>>> - src/hotspot/share/gc/g1/heapRegion.cpp
>>>>>>
>>>>>> Can you please check the percentage of unused region space if we
>>>>>> force 8M gc region at dump time? The unused spaces in the
>>>>>> archived heap regions are filled with dummy objects at runtime
>>>>>> and cannot be utilized.
>>>>> At runtime, the region size is chosen according to -Xmx. The
>>>>> amount of waste (filled by dummy objects) is related only to the
>>>>> runtime region size. It's not related to the region size chosen at
>>>>> dump time.
>>>> Yes, the heap memory waste is more related with the runtime GC
>>>> region size.
>>>>
>>>> The other side effect of forcing hard coded 8M GC region at dump
>>>> time is that you might have archived objects across the region
>>>> boundary at runtime when the runtime GC region size is less than
>>>> 8M. When that happens, you would need to invalidate all archived
>>>> java heap data. Also, you need to detect such cases, which adds
>>>> additional overhead. However, my memory tells me that we allocate
>>>> archived objects at 2M boundary within a GC region regardless the
>>>> region size at dump time to make sure we don't ever run into cases
>>>> with objects across boundary when runtime and dump have different
>>>> GC region sizes. I need to double check on that to make sure it is
>>>> the case.
>>>>>
>>>>> So in this regard, this patch does not increase the amount of
>>>>> runtime memory waste.
>>>>>
>>>>>> It would also introduce gaps between the archived heap regions at
>>>>>> runtime when the GC region size is < 8M with smaller heap size.
>>>>>
>>>>> Yes, this is an issue for small heaps. Previously, without my
>>>>> patch, if you run with
>>>>>
>>>>> java -Xshare:dump -Xmx128m
>>>>> java -Xshare:on -Xmx128m
>>>>>
>>>>> the open and closed archive regions will in two consecutive 1MB
>>>>> regions.
>>>>>
>>>>> With my patch, there will be a gap of 7MB between the 2 regions
>>>>> (each as still 1MB in size). This means if you run with very small
>>>>> heaps with CDS, you may not be able to allocate very large arrays.
>>>>>
>>>>>> We should check with GC team on this and see if there are other
>>>>>> alternatives to resolve the overlapping issue. One possible
>>>>>> solution I think is to relocate the 'open' archive heap region
>>>>>> further down at runtime if the 'open' and 'closed' archive
>>>>>> regions are in the same G1 region at runtime.
>>>>>>
>>>>> I'll try implement what you described above (in a separate RFE).
>>>>> This will make runtime relocation a little slower, but hopefully
>>>>> that will just be noise, and I won't need to make any change in G1.
>>>> We just need to make sure the open archive regions are allocated
>>>> outside the GC region used for the closed archive regions. It
>>>> should not add additional overhead for object relocation. However,
>>>> it does add complexity to the relocation implementation as you
>>>> would have different base deltas.
>>>>>
>>>>> For this RFE, I will revert the changes in heapRegion.cpp. This
>>>>> means in some cases, we won't be able to map the archived regions:
>>>>>
>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v00/log2.txt
>>>>>
>>>>>
>>>>> Note that 2 out of 26 cases fail to map the heap regions with
>>>>> "Unable to allocate region, java heap range is already in use",
>>>>> but it's still works for the majority of the cases, so it still
>>>>> much better than what we have today
>>>> That's a reasonable trade-off. Seems acceptable to me.
>>>>>
>>>>>> -
>>>>>> test/hotspot/jtreg/runtime/appcds/cacheObject/DifferentHeapSizes.java
>>>>>>
>>>>>>
>>>>>> Looks good! How about adding some scenarios with heap size > 32G
>>>>>> as sanity checks.
>>>>>>
>>>>>
>>>>> Will do, but with heap size > 32G CDS will be disabled because
>>>>> compressed oop is disabled.
>>>> Yes. I think we don't have any existing test cases with >32G. So
>>>> it's good to add it for sanity check.
>>>>
>>>> Thanks,
>>>> Jiangli
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Jiangli
>>>>>>
>>>>>>
>>>>>> On 8/1/18 11:04 PM, Ioi Lam wrote:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8208658
>>>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v00/
>>>>>>>
>>>>>>>
>>>>>>> PROBLEM:
>>>>>>>
>>>>>>> At runtime, if you use a maximum heap size (-Xmx) setting
>>>>>>> that's
>>>>>>> different than the one used at CDS archive creation time,
>>>>>>> the archived
>>>>>>> heap regions may fail to map. The reason is the archived
>>>>>>> heap regions
>>>>>>> are hard-coded to a specific compressed oop encoding scheme.
>>>>>>>
>>>>>>> This causes degradation in start-up time, because the
>>>>>>> archived heap
>>>>>>> regions enable many aggressive optimizations, such as
>>>>>>> JDK-8202035,
>>>>>>> archiving the system module info.
>>>>>>>
>>>>>>> This degradation is undesirable for JDK-8204247 (include
>>>>>>> default
>>>>>>> CDS archive in JDK binary) -- what we want is to ship a CDS
>>>>>>> archive
>>>>>>> that's optimal regardless of the max heap settings.
>>>>>>>
>>>>>>> FIX:
>>>>>>>
>>>>>>> If the compressed oop encoding scheme is different:
>>>>>>> - relocate the archived heap regions so it
>>>>>>> fits within the runtime heap bounds.
>>>>>>> - patch all reference fields in the archived objects to
>>>>>>> use the runtime compressed oop encoding scheme.
>>>>>>>
>>>>>>> RESULTS:
>>>>>>>
>>>>>>> # dump archive
>>>>>>> java -Xshare:dump -Xmx2000m
>>>>>>>
>>>>>>> # same compressed oop encoding
>>>>>>> java -Xshare:on -Xmx2000m -Xlog:cds -version
>>>>>>>
>>>>>>> # different compressed oop encoding
>>>>>>> java -Xshare:on -Xmx2090m -Xlog:cds -version
>>>>>>>
>>>>>>> Note: the archive is dumped with UnscaledNarrowOop encoding.
>>>>>>> At run time, if -Xmx2090m is specified, ZeroBasedNarrowOop is
>>>>>>> used instead.
>>>>>>>
>>>>>>> ========================================================================
>>>>>>>
>>>>>>> OLD same encoding 34.45 ms +- 0.51%
>>>>>>> OLD diff encoding 47.76 ms +- 0.94% <<<<< 13.3 ms
>>>>>>> degradation!
>>>>>>>
>>>>>>> NEW same encoding 34.58 ms +- 0.61%
>>>>>>> NEW diff encoding 35.09 ms +- 0.24% <<<<< 0.5 ms
>>>>>>> degradation
>>>>>>> ========================================================================
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The cost of relocating/patching the archived heap regions is
>>>>>>> only
>>>>>>> 0.5 ms. This is much better than the 13.3 ms degradation
>>>>>>> caused by the
>>>>>>> failure to map the archived heap regions.
>>>>>>>
>>>>>>> TESTING:
>>>>>>>
>>>>>>> All 150+ CDS tests passed locally. I am starting tiers 1/2/3
>>>>>>> runs.
>>>>>>>
>>>>>>> Thanks
>>>>>>> - Ioi
>>>>>>
>>>>>
>>>>
>>>
>>
>
More information about the hotspot-gc-dev
mailing list