RFR(S) 8208658: Make CDS archived heap regions usable even if compressed oop encoding has changed
Ioi Lam
ioi.lam at oracle.com
Tue Aug 21 00:32:10 UTC 2018
Hi Jiangli,
Thanks for the review. I've updated the webrev at:
http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v02/
Delta from the last webrev:
http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v02.delta/
Please see comments below
On 8/17/2018 11:49 AM, Jiangli Zhou wrote:
> Hi Ioi,
>
> Thanks a lot for the updated webrev! Please see some more
> comments/suggestions below.
>
> - src/hotspot/share/memory/filemap.cpp
>
> 195 if (MetaspaceShared::is_heap_object_archiving_allowed()) {
> 196 _g1_reserved = G1CollectedHeap::heap()->g1_reserved();
> 197 }
>
> The above are G1 specific. It should be included under #if INCLUDE_G1GC.
>
Fixed. I tested with "configure --with-jvm-features=-g1gc ...". I ended
up having to fix a couple of JFR files as well.
> 881 MemRegion
> FileMapInfo::get_heap_regions_range_with_current_oop_encoding_mode() {
> ...
> 884
> 885 for (int i = MetaspaceShared::first_string; i <=
> MetaspaceShared::last_valid_region; i++) {
> 886 FileMapHeader::space_info* si = space_at(i);
> 887 size_t size = si->_used;
> 888 if (size > 0) {
> 889 address s =
> si->start_address_with_current_oop_encoding_mode();
> 890 address e = s + size;
> 891 if (start > s) {
> 892 start = s;
> 893 }
> 894 if (end < e) {
> 895 end = e;
> 896 }
> 897 }
> 898 }
>
> We can record the ‘start’ and ‘end’ within the archive at dump time.
> At runtime, we only need to compute the addresses of the ‘start’ and
> ‘end’ using the current encoding mode without iterating all archived
> heap spaces. It’s a minor optimization, we can do that after current
> RFE if you prefer.
>
I used to have a version like what you suggested. However, that means I
need to split the logic between dump time and run time, and you end up
with more code. Since this code is not performance critical, I think
it's better to put it all in one place.
> - src/hotspot/share/memory/filemap.hpp
> 91 address _oopmap; // bitmap for relocating embedded
> oops
>
> Nit. ‘pointers’ seems to be a better word than ‘oops’ here.
>
oopmap is actually a common terminology in HotSpot that means 'a map of
all the pointers to oop'
> 316 MemRegion get_heap_regions_range_with_current_oop_encoding_mode();
>
> Does the above needs NOT_CDS_JAVA_HEAP_RETURN also?
>
I don't know how to easily return an 'empty' ResourceBitMap. Since this
function is used only by the archived heap code, I moved its declaration
inside INCLUDE_CDS_JAVA_HEAP.
> 524 class FindEmbeddedPointers: public BasicOopIterateClosure {
>
> Please add some comments for the FindEmbeddedPointers closure. Since
> we iterate all objects inside the archived heap regions, we still can
> end up with the correct & complete bitmaps without follow the
> references within each archived object. BTW, can you please check and
> make sure NULL oop pointers are not on the bitmap so we don’t waste
> cycles at runtime.
>
That's a good suggestion. I added code to filter out all the null
pointers. It turns out about 50% of the pointers in the open region are
NULL, so it's good saving. The new version is slightly faster than the
last version.
> Please also add comments explaining it’s dump time only logic.
>
Done.
> 564 class PatchEmbeddedPointers: public BitMapClosure {
>
> Some comments would be useful to explain it’s for runtime use.
>
Done
> 573 if (!CompressedOops::is_null(v)) {
> 574 oop o = HeapShared::decode_with_archived_oop_encoding_mode(v);
> 575 RawAccess<IS_NOT_NULL>::oop_store(p, o);
> 576 }
>
> So looks like we do include null in the bitmap? If we eliminate the
> nulls from the bitmap, the is_null check above can be avoided at
> runtime. It’s a very small optimization, but probably worth to consider?
>
Fixed
> - src/hotspot/share/oops/instanceMirrorKlass.hpp
>
> 102 static void serialize(class SerializeClosure* f) NOT_CDS_RETURN;
>
> To sync up with your recent change, the above can be renamed to
> serialize_offset.
>
Fixed
> - test/hotspot/jtreg/runtime/appcds/cacheObject/DifferentHeapSizes.java
>
> Could you please add the scenario that Calvin suggested:
>
>
> new Scenario( 32, 32, 64, 512, 2048, 4097,
> 16374, 31000)
Added.
Thanks
- Ioi
>
> Thanks,
> Jiangli
>
> On 8/14/18 7:27 PM, Ioi Lam wrote:
>> Here's an updated webrev:
>>
>> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v01/
>>
>> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v01/log.txt
>>
>>
>>
>> [1] Incorporated many suggestions from Jiangli (Thanks!):
>>
>> - Moved the patching code code to heapShared.cpp
>> - Added various comments and improved logging (see above log.txt
>> link)
>> - Added test cases for >32GB heaps.
>>
>> [2] Fixed the region relocation logic to avoid any GC code changes and
>> support all possible compressed oop encoding changes.
>>
>> It turns out I don't need to force 8MB region size during dump time.
>> I just need to align the bottom of the closed heap on G1 region
>> boundary. This will ensure that the open and close heap will never
>> occupy the same G1 region.
>>
>> All test cases in DifferentHeapSizes.java can successfully
>> map the archived regions.
>>
>> See FileMapInfo::map_heap_regions_impl
>>
>> [3] Generate a bitmap of all the pointers at dump time, and use this
>> for pointer patching at runtime. I can't find any measurable
>> improvements, but it should be at least as fast as the last
>> version (which used a BasicOopIterateClosure at runtime).
>>
>> [4] Various clean up, such as
>>
>> struct FileMapInfo::FileMapHeader::space_info* si =
>> &_header->_space[i];
>> ->
>> FileMapHeader::space_info* si = space_at(i);
>>
>> Testing: Passed hs tiers 1/2/3/4 with no new failures.
>>
>> Thanks
>> - Ioi
>>
>>
>> On 8/7/18 6:47 PM, Jiangli Zhou wrote:
>>> On 8/7/18 3:38 PM, Ioi Lam wrote:
>>>
>>>> Hi Jiangli,
>>>>
>>>> Thanks for the review.
>>>>
>>>>
>>>> On 8/7/18 9:25 AM, Jiangli Zhou wrote:
>>>>> Hi Ioi,
>>>>>
>>>>> This is a welcomed change! Thanks for doing this. I'd like to
>>>>> suggest building a relocation table at dump time. The table can
>>>>> contain entries with pair of
>>>>> 'offset_to_region_start:narrow_oop_value'. Then at runtime, we can
>>>>> avoid iterating the objects. That would help to reduce the
>>>>> overhead that you are observing when relocation is needed at
>>>>> runtime. This could be done as a further optimization in a
>>>>> follow-up RFE for 12.
>>>>>
>>>>
>>>> That's a good suggestion. I'll try to do that as part of this fix,
>>>> by generating a bitmap of all the pointers that need to be
>>>> relocated. I will need one bit per 8 bytes of archived heap data,
>>>> so the (read-only) memory cost is pretty minimal.
>>>>
>>>>> - src/hotspot/share/memory/heapShared.inline.hpp
>>>>>
>>>>> - src/hotspot/share/memory/heapShared.cpp
>>>>>
>>>>> 33 inline oop HeapShared::decode_not_null(narrowOop v) {
>>>>> 34 assert(!CompressedOops::is_null(v), "narrow oop value can
>>>>> never be zero");
>>>>> 35 oop result = (oop)(void*)((uintptr_t)_narrow_oop_base +
>>>>> ((uintptr_t)v << _narrow_oop_shift));
>>>>> 36 assert(check_obj_alignment(result), "address not aligned: "
>>>>> INTPTR_FORMAT, p2i((void*) result));
>>>>> 37 return result;
>>>>> 38 }
>>>>>
>>>>> 510 void HeapShared::init_narrow_oop_decoding(address base, int
>>>>> shift) {
>>>>> 511 _narrow_oop_base = base;
>>>>> 512 _narrow_oop_shift = shift;
>>>>> 513 }
>>>>>
>>>>> - src/hotspot/share/memory/filemap.cpp
>>>>>
>>>>>
>>>>> 913 // dumptime heap end ------------v
>>>>> 914 // [ |archived heap regions| ] runtime heap end
>>>>> ------v
>>>>> 915 // [ |archived heap
>>>>> regions| ]
>>>>> 916 // |<-----delta-------------------->|
>>>>> 917 //
>>>>> 918 // At dump time, the archived heap region were near the
>>>>> top of the heap.
>>>>> 919 // At run time, that region may not be inside the heap,
>>>>> so we move it so
>>>>> 920 // that it's now near the top of teh runtime time. This
>>>>> can be done by
>>>>> 921 // the simple math of adding the delta as shown above.
>>>>> 922 address dumptime_heap_end =
>>>>> (address)_header->_g1_reserved.end();
>>>>> 923 address runtime_heap_end =
>>>>> (address)G1CollectedHeap::heap()->g1_reserved().end();
>>>>> 924 delta = runtime_heap_end - dumptime_heap_end;
>>>>> 925 }
>>>>> 926
>>>>> 927 HeapShared::init_narrow_oop_decoding(narrow_oop_base() +
>>>>> delta, narrow_oop_shift());
>>>>>
>>>>> 1051 class RelocateInternalPointers: public BasicOopIterateClosure {
>>>>> ...
>>>>> 1056 virtual void do_oop(narrowOop *p) {
>>>>> 1057 narrowOop v = *p;
>>>>> 1058 if (!CompressedOops::is_null(v)) {
>>>>> 1059 oop o = HeapShared::decode_not_null(v);
>>>>> 1060 RawAccess<IS_NOT_NULL>::oop_store(p, o);
>>>>> 1061 }
>>>>> 1062 }
>>>>>
>>>>> Your relocation change boils down to above sections of code. It
>>>>> works as both UnscaledNarrowOop encoding and ZeroBasedNarrowOop
>>>>> encoding use 0 as the base. I believe we don't handle heap based
>>>>> encoding with CDS enabled. Can you please double check that.
>>>>>
>>>> It actually does work. See
>>>>
>>>> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v00/log.txt
>>>>
>>>>
>>>> Command line:
>>>> [/home/iklam/jdk/bld/abe-fastdebug/images/jdk/bin/java ...
>>>> -Xmx31000m -XX:+UseStringDeduplication Hello ]
>>>>
>>>> [STDOUT]
>>>> [0.092s][info][cds] Archived narrow_klass_base =
>>>> 0x0000000800000000, narrow_klass_shift = 3
>>>> [0.092s][info][cds] Archived narrow_oop_mode = 0, narrow_oop_base =
>>>> 0x0000000000000000, narrow_oop_shift = 0
>>>> [0.092s][info][cds] CDS heap data need to be relocated because
>>>> [0.092s][info][cds] the archive was created with an incompatible
>>>> heap size: 128M.
>>>> [0.092s][info][cds] Current narrow_oop_mode = 2, narrow_oop_base =
>>>> 0x0000001000000000, narrow_oop_shift = 3
>>>> Hello World
>>>>
>>>>> The new HeapShared::decode_not_null() decodes the narrowOop value
>>>>> using the dump time encoding mode with added base delta. Can you
>>>>> please rename it to be more descriptive, for example
>>>>> HeapShared::decode_with_archived_encoding_mode()? Please also add
>>>>> some comments to include these detail for this function.
>>>>>
>>>>
>>>> OK, I will fix that.
>>>>
>>>>> - src/hotspot/share/memory/filemap.cpp
>>>>>
>>>>> 886 if (narrow_klass_base() != Universe::narrow_klass_base() ||
>>>>> 887 narrow_klass_shift() != Universe::narrow_klass_shift()) {
>>>>> 888 log_info(cds)("Cached heap data from the CDS archive need
>>>>> to be relocated because");
>>>>> 889 log_info(cds)("the CDS archive was created with an
>>>>> incompatible heap size: " UINTX_FORMAT "M.", max_heap_size()/M);
>>>>> 890 log_info(cds)("Current narrow_klass_base = " PTR_FORMAT
>>>>> ", narrow_klass_shift = %d",
>>>>> 891 p2i(Universe::narrow_klass_base()),
>>>>> Universe::narrow_klass_shift());
>>>>> 892 return;
>>>>> 893 }
>>>>>
>>>>> Please fix the log message at 888 as relocation is not done in
>>>>> above case.
>>>>>
>>>> Oops, it should have been "Cached heap data from the CDS archive
>>>> cannot be used because", but I messed up with copy-pasting.
>>>>
>>>>> Please move the code at line 895 - 898 to be before the above
>>>>> code, so we still have the logging information about the dump time
>>>>> encoding modes when runtime narrow_klass_encoding is different.
>>>>> The dump time encoding modes are useful debugging information.
>>>>>
>>>> OK. Will do.
>>>>
>>>>>
>>>>> 1039 void FileMapInfo::relocate_archived_heap_embedded_pointers()
>>>>>
>>>>> 1051 class RelocateInternalPointers: public BasicOopIterateClosure
>>>>>
>>>>> 1068 void FileMapInfo::relocate_archived_heap_embedded_pointers_impl
>>>>>
>>>>> Could you please place above new code blocks in heapShared.*
>>>>> files? I filed JDK-8206009 to move all heap object archiving code
>>>>> to heapShared.*. It's helpful to make sure the related new code
>>>>> goes to the right files before we completely resolve existing code
>>>>> for JDK-8206009.
>>>>>
>>>>
>>>> OK. Will do.
>>>>
>>>>> - src/hotspot/share/gc/g1/heapRegion.cpp
>>>>>
>>>>> Can you please check the percentage of unused region space if we
>>>>> force 8M gc region at dump time? The unused spaces in the archived
>>>>> heap regions are filled with dummy objects at runtime and cannot
>>>>> be utilized.
>>>> At runtime, the region size is chosen according to -Xmx. The amount
>>>> of waste (filled by dummy objects) is related only to the runtime
>>>> region size. It's not related to the region size chosen at dump time.
>>> Yes, the heap memory waste is more related with the runtime GC
>>> region size.
>>>
>>> The other side effect of forcing hard coded 8M GC region at dump
>>> time is that you might have archived objects across the region
>>> boundary at runtime when the runtime GC region size is less than 8M.
>>> When that happens, you would need to invalidate all archived java
>>> heap data. Also, you need to detect such cases, which adds
>>> additional overhead. However, my memory tells me that we allocate
>>> archived objects at 2M boundary within a GC region regardless the
>>> region size at dump time to make sure we don't ever run into cases
>>> with objects across boundary when runtime and dump have different GC
>>> region sizes. I need to double check on that to make sure it is the
>>> case.
>>>>
>>>> So in this regard, this patch does not increase the amount of
>>>> runtime memory waste.
>>>>
>>>>> It would also introduce gaps between the archived heap regions at
>>>>> runtime when the GC region size is < 8M with smaller heap size.
>>>>
>>>> Yes, this is an issue for small heaps. Previously, without my
>>>> patch, if you run with
>>>>
>>>> java -Xshare:dump -Xmx128m
>>>> java -Xshare:on -Xmx128m
>>>>
>>>> the open and closed archive regions will in two consecutive 1MB
>>>> regions.
>>>>
>>>> With my patch, there will be a gap of 7MB between the 2 regions
>>>> (each as still 1MB in size). This means if you run with very small
>>>> heaps with CDS, you may not be able to allocate very large arrays.
>>>>
>>>>> We should check with GC team on this and see if there are other
>>>>> alternatives to resolve the overlapping issue. One possible
>>>>> solution I think is to relocate the 'open' archive heap region
>>>>> further down at runtime if the 'open' and 'closed' archive regions
>>>>> are in the same G1 region at runtime.
>>>>>
>>>> I'll try implement what you described above (in a separate RFE).
>>>> This will make runtime relocation a little slower, but hopefully
>>>> that will just be noise, and I won't need to make any change in G1.
>>> We just need to make sure the open archive regions are allocated
>>> outside the GC region used for the closed archive regions. It should
>>> not add additional overhead for object relocation. However, it does
>>> add complexity to the relocation implementation as you would have
>>> different base deltas.
>>>>
>>>> For this RFE, I will revert the changes in heapRegion.cpp. This
>>>> means in some cases, we won't be able to map the archived regions:
>>>>
>>>> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v00/log2.txt
>>>>
>>>>
>>>> Note that 2 out of 26 cases fail to map the heap regions with
>>>> "Unable to allocate region, java heap range is already in use",
>>>> but it's still works for the majority of the cases, so it still
>>>> much better than what we have today
>>> That's a reasonable trade-off. Seems acceptable to me.
>>>>
>>>>> -
>>>>> test/hotspot/jtreg/runtime/appcds/cacheObject/DifferentHeapSizes.java
>>>>>
>>>>> Looks good! How about adding some scenarios with heap size > 32G
>>>>> as sanity checks.
>>>>>
>>>>
>>>> Will do, but with heap size > 32G CDS will be disabled because
>>>> compressed oop is disabled.
>>> Yes. I think we don't have any existing test cases with >32G. So
>>> it's good to add it for sanity check.
>>>
>>> Thanks,
>>> Jiangli
>>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>> Thanks,
>>>>>
>>>>> Jiangli
>>>>>
>>>>>
>>>>> On 8/1/18 11:04 PM, Ioi Lam wrote:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8208658
>>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8208658-relocate-archived-heap-regions.v00/
>>>>>>
>>>>>>
>>>>>> PROBLEM:
>>>>>>
>>>>>> At runtime, if you use a maximum heap size (-Xmx) setting that's
>>>>>> different than the one used at CDS archive creation time, the
>>>>>> archived
>>>>>> heap regions may fail to map. The reason is the archived heap
>>>>>> regions
>>>>>> are hard-coded to a specific compressed oop encoding scheme.
>>>>>>
>>>>>> This causes degradation in start-up time, because the
>>>>>> archived heap
>>>>>> regions enable many aggressive optimizations, such as
>>>>>> JDK-8202035,
>>>>>> archiving the system module info.
>>>>>>
>>>>>> This degradation is undesirable for JDK-8204247 (include default
>>>>>> CDS archive in JDK binary) -- what we want is to ship a CDS
>>>>>> archive
>>>>>> that's optimal regardless of the max heap settings.
>>>>>>
>>>>>> FIX:
>>>>>>
>>>>>> If the compressed oop encoding scheme is different:
>>>>>> - relocate the archived heap regions so it
>>>>>> fits within the runtime heap bounds.
>>>>>> - patch all reference fields in the archived objects to
>>>>>> use the runtime compressed oop encoding scheme.
>>>>>>
>>>>>> RESULTS:
>>>>>>
>>>>>> # dump archive
>>>>>> java -Xshare:dump -Xmx2000m
>>>>>>
>>>>>> # same compressed oop encoding
>>>>>> java -Xshare:on -Xmx2000m -Xlog:cds -version
>>>>>>
>>>>>> # different compressed oop encoding
>>>>>> java -Xshare:on -Xmx2090m -Xlog:cds -version
>>>>>>
>>>>>> Note: the archive is dumped with UnscaledNarrowOop encoding.
>>>>>> At run time, if -Xmx2090m is specified, ZeroBasedNarrowOop is
>>>>>> used instead.
>>>>>>
>>>>>> ========================================================================
>>>>>>
>>>>>> OLD same encoding 34.45 ms +- 0.51%
>>>>>> OLD diff encoding 47.76 ms +- 0.94% <<<<< 13.3 ms
>>>>>> degradation!
>>>>>>
>>>>>> NEW same encoding 34.58 ms +- 0.61%
>>>>>> NEW diff encoding 35.09 ms +- 0.24% <<<<< 0.5 ms
>>>>>> degradation
>>>>>> ========================================================================
>>>>>>
>>>>>>
>>>>>>
>>>>>> The cost of relocating/patching the archived heap regions is
>>>>>> only
>>>>>> 0.5 ms. This is much better than the 13.3 ms degradation
>>>>>> caused by the
>>>>>> failure to map the archived heap regions.
>>>>>>
>>>>>> TESTING:
>>>>>>
>>>>>> All 150+ CDS tests passed locally. I am starting tiers 1/2/3
>>>>>> runs.
>>>>>>
>>>>>> Thanks
>>>>>> - Ioi
>>>>>
>>>>
>>>
>>
>
More information about the hotspot-gc-dev
mailing list