RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive
Ioi Lam
ioi.lam at oracle.com
Fri Aug 4 21:22:30 UTC 2017
Hi Jiangli,
The code looks good in general. I just have a few pet peeves for
readability:
(1) stringTable.cpp and metaspaceShared.cpp have the same asserts
704 assert(UseG1GC, "Only support G1 GC");
705 assert(UseCompressedOops && UseCompressedClassPointers,
706 "Only support UseCompressedOops and
UseCompressedClassPointers enabled");
1615 assert(UseG1GC, "Only support G1 GC");
1616 assert(UseCompressedOops && UseCompressedClassPointers,
1617 "Only support UseCompressedOops and
UseCompressedClassPointers enabled");
Maybe it's better to combine them into a single function like
MetaspaceShared::assert_vm_flags() so they don't get out of sync?
(2) FileMapInfo::write_archive_heap_regions()
I still find this code very hard to read, especially due to the loop.
First, the comments are not consistent with the code:
498 assert(arr_len <= max_num_regions, "number of memory regions
exceeds maximum");
but the comments says: "The rest are consecutive full GC regions" which
means there's a chance for max_num_regions to be more than 2 (which will
be the case with Calvin's java-loader dumping changes using very small
heap size).So the code is actually wrong.
The word "region" is used in these parameters, but they don't mean the
same thing.
GrowableArray<MemRegion> *regions
int first_region, int max_num_regions,
How about regions -> g1_regions_list
first_region -> first_region_in_archive
In the comments, I find the phrase 'the current archive heap region'
ambiguous. It could be (erroneously) interpreted as "a region from the
currently mapped archive"
To make it unambiguous, how about changing
464 // Write the current archive heap region, which contains one or
multiple GC(G1) regions.
to
// Write the given list of G1 memory regions into the archive,
starting at
// first_region_in_archive.
Also, for the explanation of how the G1 regions are written into the
archive, how about:
// The G1 regions in the list are sorted in ascending address order.
When there are more objects
// than the capacity of a single G1 region, the bottom-most G1
region may be partially filled, and the
// remaining G1 region(s) are consecutively allocated and fully filled.
//
// Thus, the bottom-most G1 region (if not empty) is written into
first_region_in_archive.
// The remaining G1 regions (if exist) are coalesced and written as
a single block
// into (first_region_in_archive + 1)
// Here's the mapping from (g1 regions) -> (archive regions).
All this function needs to do is to decide the values for
r0_start, r0_top
r1_start, r1_top
I think it would be much better to not use the loop, and not use the
max_num_regions parameter (it's always 2 anyway).
*r0_start = *r0_top = NULL;
*r1_start = *r1_top = NULL;
if (arr_len >= 1) {
*r0_start = regions->at(0).start();
*r0_end = *r0_start + regions->at(0).byte_size();
}
if (arr_len >= 2) {
int last = arr_len - 1;
*r1_start = regions->at(1).start();
*r1_end = regions->at(last).start() +
regions->at(last).byte_size();
}
what do you think?
(3) metaspace.cpp
3350 // Map the archived heap regions after compressed pointers
3351 // because it relies on compressed class pointers setting
to work
do you mean this?
// Archived heap regions depend on the parameters of compressed
class pointers, so
// they must be mapped after such parameters have been decided in
the above call.
(4) I found this name not strictly grammatical. How about this:
allow_archive_heap_object -> is_heap_object_archiving_allowed
(5) in most of your code, 'archive' is used as a noun, except in
StringTable::archive_string() where it's used as a verb.
archive_string could also be interpreted erroneously as "return a string
that's already in the archive". So to be consistent and unambiguous, I
think it's better to rename it to StringTable::create_archived_string()
Thanks
- Ioi
On 8/3/17 5:15 PM, Jiangli Zhou wrote:
> Here are the updated webrevs.
>
> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.02/
> <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.hotspot.02/>
> http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.02/
>
> Changes in the updated webrevs include:
>
> * Merge with Ioi’s recent shared space auto-sizing change (8072061)
> * Addressed all feedbacks from Ioi and Coleen (Thanks for detailed
> review!)
>
>
> Thanks,
> Jiangli
>
>
>> On Aug 1, 2017, at 5:29 PM, Jiangli Zhou <jiangli.zhou at Oracle.COM> wrote:
>>
>> Hi Ioi,
>>
>> Thank you so much for reviewing this. I’ve addressed all your
>> feedbacks. Please see details below. I’ll updated the webrev
>> after addressing Coleen’s comments.
>>
>>> On Jul 30, 2017, at 9:07 PM, Ioi Lam <ioi.lam at oracle.com> wrote:
>>>
>>> Hi Jiangli,
>>>
>>> Here are my comments. I've not reviewed the GC code and I'll leave
>>> that to the GC experts :-)
>>>
>>> stringTable.cpp: StringTable::archive_string
>>>
>>> add assert for DumpSharedSpaces only
>>
>> Ok.
>>
>>>
>>> filemap.cpp
>>>
>>> 525 void
>>> FileMapInfo::write_archive_heap_regions(GrowableArray<MemRegion>
>>> *regions,
>>> 526 int first_region,
>>> int num_regions) {
>>>
>>> When I first read this function, I found it hard to follow,
>>> especially this part that coalesces the trailing regions:
>>>
>>> 537 int len = regions->length();
>>> 538 if (len > 1) {
>>> 539 start = (char*)regions->at(1).start();
>>> 540 size = (char*)regions->at(len - 1).end() - start;
>>> 541 }
>>> 542 }
>>>
>>> The rest of filemap.cpp always perform identical operations on
>>> MemRegion arrays, which are either 1 or 2 in size. However,
>>> this function doesn't follow that pattern; it also has a very
>>> different notion of "region", and the confusing part is
>>> regions->size() is not the same as num_regions.
>>>
>>> How about we change the API to something like the following? Before
>>> calling this API, the caller needs to coalesce the trailing
>>> G1 regions into a single MemRegion.
>>>
>>> FileMapInfo::write_archive_heap_regions(MemRegion *regions, int
>>> first, int num_regions) {
>>> if (first == MetaspaceShared::first_string) {
>>> assert(num_regons <= MetaspaceShared::max_strings, "...");
>>> } else {
>>> assert(first ==
>>> MetaspaceShared::first_open_archive_heap_region, "...");
>>> assert(num_regons <=
>>> MetaspaceShared::max_open_archive_heap_region, "...");
>>> }
>>> ....
>>>
>>>
>>
>> I’ve reworked the function and simplified the code.
>>
>>>
>>> 756 if (!string_data_mapped) {
>>> 757 StringTable::ignore_shared_strings(true);
>>> 758 assert(string_ranges == NULL && num_string_ranges == 0,
>>> "sanity");
>>> 759 }
>>> 760
>>> 761 if (open_archive_heap_data_mapped) {
>>> 762 MetaspaceShared::set_open_archive_heap_region_mapped();
>>> 763 } else {
>>> 764 assert(open_archive_heap_ranges == NULL &&
>>> num_open_archive_heap_ranges == 0, "sanity");
>>> 765 }
>>>
>>> Maybe the two "if" statements should be more consistent? Instead of
>>> StringTable::ignore_shared_strings, how
>>> about StringTable::set_shared_strings_region_mapped()?
>>
>> Fixed.
>>
>>>
>>> FileMapInfo::map_heap_data() --
>>>
>>> 818 char* addr = (char*)regions[i].start();
>>> 819 char* base = os::map_memory(_fd, _full_path, si->_file_offset,
>>> 820 addr, regions[i].byte_size(),
>>> si->_read_only,
>>> 821 si->_allow_exec);
>>>
>>> What happens when the first region succeeds to map but the second
>>> region fails to map? Will both regions be unmapped? I don't see
>>> where you store the return value (base) from os::map_memory(). Does
>>> it mean the code assumes that (addr == base). If so, we need an
>>> assert here.
>>
>> If any of the region fails to map, we bail out and call
>> dealloc_archive_heap_regions(), which handles the deallocation of any
>> regions specified. If second region fails to map, all memory ranges
>> specified by ‘regions’ array are deallocated. We don’t unmap the
>> memory here since it is part of the java heap. Unmapping of heap
>> memory are handled by GC code. The ‘if’ check below makes sure base
>> == addr.
>>
>> if (base == NULL || base != addr) {
>> // dealloc the regions from java heap
>> dealloc_archive_heap_regions(regions, region_num);
>> if (log_is_enabled(Info, cds)) {
>> log_info(cds)("UseSharedSpaces: Unable to map at required
>> address in java heap.");
>> }
>> return false;
>> }
>>
>>
>>>
>>> constantPool.cpp
>>>
>>> Handle refs_handle;
>>> ...
>>> refs_handle = Handle(THREAD, (oop)archived);
>>>
>>> This will first create a NULL handle, then construct a temporary
>>> handle, and then assign the temp handle back to the null
>>> handle. This means two handles will be pushed onto
>>> THREAD->metadata_handles()
>>>
>>> I think it's more efficient if you merge these into a single statement
>>>
>>> Handle refs_handle(THREAD, (oop)archived);
>>
>> Fixed.
>>
>>>
>>> Is this experimental code? Maybe it should be removed?
>>>
>>> 664 if (tag_at(index).is_unresolved_klass()) {
>>> 665 #if 0
>>> 666 CPSlot entry = cp->slot_at(index);
>>> 667 Symbol* name = entry.get_symbol();
>>> 668 Klass* k = SystemDictionary::find(name, NULL, NULL, THREAD);
>>> 669 if (k != NULL) {
>>> 670 klass_at_put(index, k);
>>> 671 }
>>> 672 #endif
>>> 673 } else
>>
>> Removed.
>>
>>>
>>> cpCache.hpp:
>>>
>>> u8 _archived_references
>>>
>>> shouldn't this be declared as an narrowOop to avoid the type casts
>>> when it's used?
>>
>> Ok.
>>
>>>
>>> cpCache.cpp:
>>>
>>> add assert so that one of these is used only at dump time and the
>>> other only at run time?
>>>
>>> 610 oop ConstantPoolCache::archived_references() {
>>> 611 return oopDesc::decode_heap_oop((narrowOop)_archived_references);
>>> 612 }
>>> 613
>>> 614 void ConstantPoolCache::set_archived_references(oop o) {
>>> 615 _archived_references = (u8)oopDesc::encode_heap_oop(o);
>>> 616 }
>>
>> Ok.
>>
>> Thanks!
>>
>> Jiangli
>>
>>>
>>> Thanks!
>>> - Ioi
>>>
>>> On 7/27/17 1:37 PM, Jiangli Zhou wrote:
>>>> Sorry, the mail didn’t handle the rich text well. I fixed the
>>>> format below.
>>>>
>>>> Please help review the changes for JDK-8179302 (Pre-resolve
>>>> constant pool string entries and cache resolved_reference arrays
>>>> in CDS archive). Currently, the CDS archive can contain cached
>>>> class metadata, interned java.lang.String objects. This RFE adds
>>>> the constant pool ‘resolved_references’ arrays (hotspot specific)
>>>> to the archive for startup/runtime performance enhancement.
>>>> The ‘resolved_references' arrays are used to hold references of
>>>> resolved constant pool entries including Strings, mirrors, etc.
>>>> With the 'resolved_references’ being cached, string constants in
>>>> shared classes can now be resolved to existing interned
>>>> java.lang.Strings at CDS dump time. G1 and 64-bit platforms are
>>>> required.
>>>>
>>>> The GC changes in the RFE were discussed and guided by Thomas
>>>> Schatzl and GC team. Part of the changes were contributed by Thomas
>>>> himself.
>>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8179302
>>>> hotspot:
>>>> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.01/
>>>> whitebox: http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.01/
>>>>
>>>> Please see below for details of supporting cached
>>>> ‘resolved_references’ and pre-resolving string constants.
>>>>
>>>> Types of Pinned G1 Heap Regions
>>>>
>>>> The pinned region type is a super type of all archive region types,
>>>> which include the open archive type and the closed archive type.
>>>>
>>>> 00100 0 [ 8] Pinned Mask
>>>> 01000 0 [16] Old Mask
>>>> 10000 0 [32] Archive Mask
>>>> 11100 0 [56] Open Archive: ArchiveMask | PinnedMask | OldMask
>>>> 11100 1 [57] Closed Archive: ArchiveMask | PinnedMask | OldMask + 1
>>>>
>>>>
>>>> Pinned Regions
>>>>
>>>> Objects within the region are 'pinned', which means GC does not
>>>> move any live objects. GC scans and marks objects in the
>>>> pinned region as normal, but skips forwarding live objects.
>>>> Pointers in live objects are updated. Dead objects (unreachable)
>>>> can be collected and freed.
>>>>
>>>> Archive Regions
>>>>
>>>> The archive types are sub-types of 'pinned'. There are two types of
>>>> archive region currently, open archive and closed archive. Both can
>>>> support caching java heap objects via the CDS archive.
>>>>
>>>> An archive region is also an old region by design.
>>>>
>>>> Open Archive (GC-RW) Regions
>>>>
>>>> Open archive region is GC writable. GC scans & marks objects within
>>>> the region and adjusts (updates) pointers in live objects the same
>>>> way as a pinned region. Live objects (reachable) are pinned and not
>>>> forwarded by GC.
>>>> Open archive region does not have 'dead' objects. Unreachable
>>>> objects are 'dormant' objects. Dormant objects are not
>>>> collected and freed by GC.
>>>>
>>>> Adjustable Outgoing Pointers
>>>>
>>>> As GC can adjust pointers within the live objects in open archive
>>>> heap region, objects can have outgoing pointers to another
>>>> java heap region, including closed archive region, open archive
>>>> region, pinned (or humongous) region, and normal generational
>>>> region. When a referenced object is moved by GC, the pointer within
>>>> the open archive region is updated accordingly.
>>>>
>>>> Closed Archive (GC-RO) Regions
>>>>
>>>> The closed archive region is GC read-only region. GC cannot write
>>>> into the region. Objects are not scanned and marked by GC. Objects
>>>> are pinned and not forwarded. Pointers are not updated by GC
>>>> either. Hence, objects within the archive region cannot have any
>>>> outgoing pointers to another java heap region. Objects however can
>>>> still have pointers to other objects within the closed archive
>>>> regions (we might allow pointers to open archive regions in the
>>>> future). That restricts the type of java objects that can
>>>> be supported by the archive region.
>>>> In JDK 9 we support archive Strings with the archive regions.
>>>>
>>>> The GC-readonly archive region makes java heap memory sharable
>>>> among different JVM processes. NOTE: synchronization on the objects
>>>> within the archive heap region can still cause writes to the memory
>>>> page.
>>>>
>>>> Dormant Objects
>>>>
>>>> Dormant objects are unreachable java objects within the open
>>>> archive heap region.
>>>> A java object in the open archive heap region is a live object if
>>>> it can be reached during scanning. Some of the java objects in
>>>> the region may not be reachable during scanning. Those objects are
>>>> considered as dormant, but not dead. For example, a constant pool
>>>> 'resolved_references' array is reachable via the klass root if its
>>>> container klass (shared) is already loaded at the time during GC
>>>> scanning. If a shared klass is not yet loaded, the klass root is
>>>> not scanned and it's constant pool 'resolved_reference' array
>>>> (A) in the open archive region is not reachable. Then A is a
>>>> dormant object.
>>>>
>>>> Object State Transition
>>>>
>>>> All java objects are initially dormant objects when open archive
>>>> heap regions are mapped to the runtime java heap. A dormant object
>>>> becomes live object when the associated shared class is loaded at
>>>> runtime. Explicit call to G1SATBCardTableModRefBS::enqueue() needs
>>>> to be made when a dormant object becomes live. That should be the
>>>> case for cached objects with strong roots as well, since strong
>>>> roots are only scanned at the start of GC marking (the initial
>>>> marking) but not during Remarking/Final marking. If a cached object
>>>> becomes live during concurrent marking phase, G1 may not find it
>>>> and mark it live unless a call to
>>>> G1SATBCardTableModRefBS::enqueue() is made for the object.
>>>>
>>>> Currently, a live object in the open archive heap region cannot
>>>> become dormant again. This restriction simplifies GC
>>>> requirement and guarantees all outgoing pointers are updated by GC
>>>> correctly. Only objects for shared classes from the builtin class
>>>> loaders (boot, PlatformClassLoaders, and AppClassLoaders) are
>>>> supported for caching.
>>>>
>>>> Caching Java Objects at Archive Dump Time
>>>>
>>>> The closed archive and open archive regions are allocated near the
>>>> top of the dump time java heap. Archived java objects are copied
>>>> into the designated archive heap regions. For example, String
>>>> objects and the underlying 'value' arrays are copied into
>>>> the closed archive regions. All references to the archived objects
>>>> (from shared class metadata, string table, etc) are set to the
>>>> new heap locations. A hash table is used to keep track of all
>>>> archived java objects during the copying process to make sure java
>>>> object is not archived more than once if reached from different
>>>> roots. It also makes sure references to the same archived object
>>>> are updated using the same new address location.
>>>>
>>>> Caching Constant Pool resolved_references Array
>>>>
>>>> The 'resolved_references' is an array that holds references of
>>>> resolved constant pool entries including Strings, mirrors
>>>> and methodTypes, etc. Each loaded class has one
>>>> 'resolved_references' array (in ConstantPoolCache). The
>>>> 'resolved_references' arrays are copied into the open archive
>>>> regions during dump process. Prior to copying the
>>>> 'resolved_references' arrays, JVM iterates through constant pool
>>>> entries and resolves all JVM_CONSTANT_String entries to existing
>>>> interned Strings for all archived classes. When resolving, JVM only
>>>> looks up the string table and finds existing interned Strings
>>>> without inserting new ones. If a string entry cannot be resolved to
>>>> an existing interned String, the constant pool entry remain as
>>>> unresolved. That prevents memory waste if a constant pool string
>>>> entry is never used at runtime.
>>>>
>>>> All String objects referenced by the string table are copied first
>>>> into the closed archive regions. The string table entry is
>>>> updated with the new location when each String object is archived.
>>>> The JVM updates the resolved constant pool string entries with the
>>>> new object locations when copying the 'resolved_references' arrays
>>>> to the open archive regions. References to
>>>> the 'resolved_references' arrays in the ConstantPoolCache are also
>>>> updated.
>>>> At runtime as part of ConstantPool::restore_unshareable_info()
>>>> work, call G1SATBCardTableModRefBS::enqueue() to let GC know the
>>>> 'resolved_references' is becoming live. A handle is created for the
>>>> cached object and added to the loader_data's handles.
>>>>
>>>> Runtime Java Heap With Cached Java Objects
>>>>
>>>>
>>>> The closed archive regions (the string regions) and open archive
>>>> regions are mapped to the runtime java heap at the same offsets as
>>>> the dump time offsets from the runtime java heap base.
>>>>
>>>> Preliminary test execution and status:
>>>>
>>>> JPRT: passed
>>>> Tier2-rt: passed
>>>> Tier2-gc: passed
>>>> Tier2-comp: passed
>>>> Tier3-rt: passed
>>>> Tier3-gc: passed
>>>> Tier3-comp: passed
>>>> Tier4-rt: passed
>>>> Tier4-gc: passed
>>>> Tier4-comp:6 jobs timed out, all other tests passed
>>>> Tier5-rt: one test failed but passed when running locally, all
>>>> other tests passed
>>>> Tier5-gc: passed
>>>> Tier5-comp: running
>>>> hotspot_gc: two jobs timed out, all other tests passed
>>>> hotspot_gc in CDS mode: two jobs timed out, all other tests passed
>>>> vm.gc: passed
>>>> vm.gc in CDS mode: passed
>>>> Kichensink: passed
>>>> Kichensink in CDS mode: passed
>>>>
>>>> Thanks,
>>>> Jiangli
>>>
>>
>
More information about the hotspot-runtime-dev
mailing list