RFR: 8179302: Pre-resolve constant pool string entries and cache resolved_reference arrays in CDS archive

Ioi Lam ioi.lam at oracle.com
Fri Aug 4 21:22:30 UTC 2017


Hi Jiangli,

The code looks good in general. I just have a few pet peeves for 
readability:


(1) stringTable.cpp and metaspaceShared.cpp have the same asserts

  704   assert(UseG1GC, "Only support G1 GC");
  705   assert(UseCompressedOops && UseCompressedClassPointers,
  706          "Only support UseCompressedOops and 
UseCompressedClassPointers enabled");

1615   assert(UseG1GC, "Only support G1 GC");
1616   assert(UseCompressedOops && UseCompressedClassPointers,
1617          "Only support UseCompressedOops and 
UseCompressedClassPointers enabled");

Maybe it's better to combine them into a single function like 
MetaspaceShared::assert_vm_flags() so they don't get out of sync?



(2) FileMapInfo::write_archive_heap_regions()

I still find this code very hard to read, especially due to the loop.

First, the comments are not consistent with the code:

     498   assert(arr_len <= max_num_regions, "number of memory regions 
exceeds maximum");

but the comments says: "The rest are consecutive full GC regions" which 
means there's a chance for max_num_regions to be more than 2 (which will 
be the case with Calvin's java-loader dumping changes using very small 
heap size).So the code is actually wrong.

The word "region" is used in these parameters, but they don't mean the 
same thing.

               GrowableArray<MemRegion> *regions
               int first_region, int max_num_regions,


How about regions      -> g1_regions_list
           first_region -> first_region_in_archive


In the comments, I find the phrase 'the current archive heap region' 
ambiguous. It could be (erroneously) interpreted as "a region from the 
currently mapped archive"

To make it unambiguous, how about changing


  464 // Write the current archive heap region, which contains one or 
multiple GC(G1) regions.


to

     // Write the given list of G1 memory regions into the archive, 
starting at
     // first_region_in_archive.


Also, for the explanation of how the G1 regions are written into the 
archive, how about:

    // The G1 regions in the list are sorted in ascending address order. 
When there are more objects
    // than the capacity of a single G1 region, the bottom-most G1 
region may be partially filled, and the
    // remaining G1 region(s) are consecutively allocated and fully filled.
    //
    // Thus, the bottom-most G1 region (if not empty) is written into 
first_region_in_archive.
    // The remaining G1 regions (if exist) are coalesced and written as 
a single block
    // into (first_region_in_archive + 1)

    // Here's the mapping from (g1 regions) -> (archive regions).


All this function needs to do is to decide the values for

      r0_start, r0_top
      r1_start, r1_top

I think it would be much better to not use the loop, and not use the 
max_num_regions parameter (it's always 2 anyway).

      *r0_start = *r0_top = NULL;
      *r1_start = *r1_top = NULL;

      if (arr_len >= 1) {
          *r0_start = regions->at(0).start();
          *r0_end = *r0_start + regions->at(0).byte_size();
      }
      if (arr_len >= 2) {
          int last = arr_len - 1;
          *r1_start = regions->at(1).start();
          *r1_end = regions->at(last).start() + 
regions->at(last).byte_size();
}

what do you think?



(3) metaspace.cpp

3350         // Map the archived heap regions after compressed pointers
3351         // because it relies on compressed class pointers setting 
to work

do you mean this?

     // Archived heap regions depend on the parameters of compressed 
class pointers, so
     // they must be mapped after such parameters have been decided in 
the above call.


(4) I found this name not strictly grammatical. How about this:

      allow_archive_heap_object -> is_heap_object_archiving_allowed

(5) in most of your code, 'archive' is used as a noun, except in 
StringTable::archive_string() where it's used as a verb.

archive_string could also be interpreted erroneously as "return a string 
that's already in the archive". So to be consistent and unambiguous, I 
think it's better to rename it to StringTable::create_archived_string()


Thanks
- Ioi


On 8/3/17 5:15 PM, Jiangli Zhou wrote:
> Here are the updated webrevs.
>
> http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.02/ 
> <http://cr.openjdk.java.net/%7Ejiangli/8179302/webrev.hotspot.02/>
> http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.02/
>
> Changes in the updated webrevs include:
>
>   * Merge with Ioi’s recent shared space auto-sizing change (8072061)
>   * Addressed all feedbacks from Ioi and Coleen (Thanks for detailed
>     review!)
>
>
> Thanks,
> Jiangli
>
>
>> On Aug 1, 2017, at 5:29 PM, Jiangli Zhou <jiangli.zhou at Oracle.COM> wrote:
>>
>> Hi Ioi,
>>
>> Thank you so much for reviewing this. I’ve addressed all your 
>> feedbacks. Please see details below. I’ll updated the webrev 
>> after addressing Coleen’s comments.
>>
>>> On Jul 30, 2017, at 9:07 PM, Ioi Lam <ioi.lam at oracle.com> wrote:
>>>
>>> Hi Jiangli,
>>>
>>> Here are my comments. I've not reviewed the GC code and I'll leave 
>>> that to the GC experts :-)
>>>
>>> stringTable.cpp: StringTable::archive_string
>>>
>>>    add assert for DumpSharedSpaces only
>>
>> Ok.
>>
>>>
>>> filemap.cpp
>>>
>>> 525 void 
>>> FileMapInfo::write_archive_heap_regions(GrowableArray<MemRegion> 
>>> *regions,
>>> 526                                              int first_region, 
>>> int num_regions) {
>>>
>>> When I first read this function, I found it hard to follow, 
>>> especially this part that coalesces the trailing regions:
>>>
>>> 537         int len = regions->length();
>>> 538         if (len > 1) {
>>> 539           start = (char*)regions->at(1).start();
>>> 540           size = (char*)regions->at(len - 1).end() - start;
>>> 541         }
>>> 542       }
>>>
>>> The rest of filemap.cpp always perform identical operations on 
>>> MemRegion arrays, which are either 1 or 2 in size. However, 
>>> this function doesn't follow that pattern; it also has a very 
>>> different notion of "region", and the confusing part is 
>>> regions->size() is not the same as num_regions.
>>>
>>> How about we change the API to something like the following? Before 
>>> calling this API, the caller needs to coalesce the trailing 
>>> G1 regions into a single MemRegion.
>>>
>>>     FileMapInfo::write_archive_heap_regions(MemRegion *regions, int 
>>> first, int num_regions) {
>>>        if (first == MetaspaceShared::first_string) {
>>>           assert(num_regons <=  MetaspaceShared::max_strings, "...");
>>>        } else {
>>>           assert(first == 
>>> MetaspaceShared::first_open_archive_heap_region, "...");
>>>           assert(num_regons <= 
>>> MetaspaceShared::max_open_archive_heap_region, "...");
>>> }
>>>        ....
>>>
>>>
>>
>> I’ve reworked the function and simplified the code.
>>
>>>
>>> 756   if (!string_data_mapped) {
>>> 757     StringTable::ignore_shared_strings(true);
>>> 758     assert(string_ranges == NULL && num_string_ranges == 0, 
>>> "sanity");
>>> 759   }
>>> 760
>>> 761   if (open_archive_heap_data_mapped) {
>>> 762 MetaspaceShared::set_open_archive_heap_region_mapped();
>>> 763   } else {
>>> 764     assert(open_archive_heap_ranges == NULL && 
>>> num_open_archive_heap_ranges == 0, "sanity");
>>> 765   }
>>>
>>> Maybe the two "if" statements should be more consistent? Instead of 
>>> StringTable::ignore_shared_strings, how 
>>> about StringTable::set_shared_strings_region_mapped()?
>>
>> Fixed.
>>
>>>
>>> FileMapInfo::map_heap_data() --
>>>
>>> 818     char* addr = (char*)regions[i].start();
>>> 819     char* base = os::map_memory(_fd, _full_path, si->_file_offset,
>>> 820                                 addr, regions[i].byte_size(), 
>>> si->_read_only,
>>> 821                                 si->_allow_exec);
>>>
>>> What happens when the first region succeeds to map but the second 
>>> region fails to map? Will both regions be unmapped? I don't see 
>>> where you store the return value (base) from os::map_memory(). Does 
>>> it mean the code assumes that (addr == base). If so, we need an 
>>> assert here.
>>
>> If any of the region fails to map, we bail out and call 
>> dealloc_archive_heap_regions(), which handles the deallocation of any 
>> regions specified. If second region fails to map, all memory ranges 
>> specified by ‘regions’ array are deallocated. We don’t unmap the 
>> memory here since it is part of the java heap. Unmapping of heap 
>> memory are handled by GC code. The ‘if’ check below makes sure base 
>> == addr.
>>
>>     if (base == NULL || base != addr) {
>>       // dealloc the regions from java heap
>>       dealloc_archive_heap_regions(regions, region_num);
>>       if (log_is_enabled(Info, cds)) {
>>         log_info(cds)("UseSharedSpaces: Unable to map at required 
>> address in java heap.");
>>       }
>>       return false;
>>     }
>>
>>
>>>
>>> constantPool.cpp
>>>
>>>     Handle refs_handle;
>>>     ...
>>>     refs_handle = Handle(THREAD, (oop)archived);
>>>
>>> This will first create a NULL handle, then construct a temporary 
>>> handle, and then assign the temp handle back to the null 
>>> handle. This means two handles will be pushed onto 
>>> THREAD->metadata_handles()
>>>
>>> I think it's more efficient if you merge these into a single statement
>>>
>>>     Handle refs_handle(THREAD, (oop)archived);
>>
>> Fixed.
>>
>>>
>>> Is this experimental code? Maybe it should be removed?
>>>
>>> 664     if (tag_at(index).is_unresolved_klass()) {
>>> 665 #if 0
>>> 666       CPSlot entry = cp->slot_at(index);
>>> 667       Symbol* name = entry.get_symbol();
>>> 668       Klass* k = SystemDictionary::find(name, NULL, NULL, THREAD);
>>> 669       if (k != NULL) {
>>> 670         klass_at_put(index, k);
>>> 671       }
>>> 672 #endif
>>> 673     } else
>>
>> Removed.
>>
>>>
>>> cpCache.hpp:
>>>
>>>     u8   _archived_references
>>>
>>> shouldn't this be declared as an narrowOop to avoid the type casts 
>>> when it's used?
>>
>> Ok.
>>
>>>
>>> cpCache.cpp:
>>>
>>>    add assert so that one of these is used only at dump time and the 
>>> other only at run time?
>>>
>>> 610 oop ConstantPoolCache::archived_references() {
>>> 611   return oopDesc::decode_heap_oop((narrowOop)_archived_references);
>>> 612 }
>>> 613
>>> 614 void ConstantPoolCache::set_archived_references(oop o) {
>>> 615   _archived_references = (u8)oopDesc::encode_heap_oop(o);
>>> 616 }
>>
>> Ok.
>>
>> Thanks!
>>
>> Jiangli
>>
>>>
>>> Thanks!
>>> - Ioi
>>>
>>> On 7/27/17 1:37 PM, Jiangli Zhou wrote:
>>>> Sorry, the mail didn’t handle the rich text well. I fixed the 
>>>> format below.
>>>>
>>>> Please help review the changes for JDK-8179302 (Pre-resolve 
>>>> constant pool string entries and cache resolved_reference arrays 
>>>> in CDS archive). Currently, the CDS archive can contain cached 
>>>> class metadata, interned java.lang.String objects. This RFE adds 
>>>> the constant pool ‘resolved_references’ arrays (hotspot specific) 
>>>> to the archive for startup/runtime performance enhancement. 
>>>> The ‘resolved_references' arrays are used to hold references of 
>>>> resolved constant pool entries including Strings, mirrors, etc. 
>>>> With the 'resolved_references’ being cached, string constants in 
>>>> shared classes can now be resolved to existing interned 
>>>> java.lang.Strings at CDS dump time. G1 and 64-bit platforms are 
>>>> required.
>>>>
>>>> The GC changes in the RFE were discussed and guided by Thomas 
>>>> Schatzl and GC team. Part of the changes were contributed by Thomas 
>>>> himself.
>>>> RFE:  https://bugs.openjdk.java.net/browse/JDK-8179302
>>>> hotspot: 
>>>>   http://cr.openjdk.java.net/~jiangli/8179302/webrev.hotspot.01/
>>>> whitebox: http://cr.openjdk.java.net/~jiangli/8179302/webrev.whitebox.01/
>>>>
>>>> Please see below for details of supporting cached 
>>>> ‘resolved_references’ and pre-resolving string constants.
>>>>
>>>> Types of Pinned G1 Heap Regions
>>>>
>>>> The pinned region type is a super type of all archive region types, 
>>>> which include the open archive type and the closed archive type.
>>>>
>>>> 00100 0 [ 8] Pinned Mask
>>>> 01000 0 [16] Old Mask
>>>> 10000 0 [32] Archive Mask
>>>> 11100 0 [56] Open Archive:   ArchiveMask | PinnedMask | OldMask
>>>> 11100 1 [57] Closed Archive: ArchiveMask | PinnedMask | OldMask + 1
>>>>
>>>>
>>>> Pinned Regions
>>>>
>>>> Objects within the region are 'pinned', which means GC does not 
>>>> move any live objects. GC scans and marks objects in the 
>>>> pinned region as normal, but skips forwarding live objects. 
>>>> Pointers in live objects are updated. Dead objects (unreachable) 
>>>> can be collected and freed.
>>>>
>>>> Archive Regions
>>>>
>>>> The archive types are sub-types of 'pinned'. There are two types of 
>>>> archive region currently, open archive and closed archive. Both can 
>>>> support caching java heap objects via the CDS archive.
>>>>
>>>> An archive region is also an old region by design.
>>>>
>>>> Open Archive (GC-RW) Regions
>>>>
>>>> Open archive region is GC writable. GC scans & marks objects within 
>>>> the region and adjusts (updates) pointers in live objects the same 
>>>> way as a pinned region. Live objects (reachable) are pinned and not 
>>>> forwarded by GC.
>>>> Open archive region does not have 'dead' objects. Unreachable 
>>>> objects are 'dormant' objects. Dormant objects are not 
>>>> collected and freed by GC.
>>>>
>>>> Adjustable Outgoing Pointers
>>>>
>>>> As GC can adjust pointers within the live objects in open archive 
>>>> heap region, objects can have outgoing pointers to another 
>>>> java heap region, including closed archive region, open archive 
>>>> region, pinned (or humongous) region, and normal generational 
>>>> region. When a referenced object is moved by GC, the pointer within 
>>>> the open archive region is updated accordingly.
>>>>
>>>> Closed Archive (GC-RO) Regions
>>>>
>>>> The closed archive region is GC read-only region. GC cannot write 
>>>> into the region. Objects are not scanned and marked by GC. Objects 
>>>> are pinned and not forwarded. Pointers are not updated by GC 
>>>> either. Hence, objects within the archive region cannot have any 
>>>> outgoing pointers to another java heap region. Objects however can 
>>>> still have pointers to other objects within the closed archive 
>>>> regions (we might allow pointers to open archive regions in the 
>>>> future). That restricts the type of java objects that can 
>>>> be supported by the archive region.
>>>> In JDK 9 we support archive Strings with the archive regions.
>>>>
>>>> The GC-readonly archive region makes java heap memory sharable 
>>>> among different JVM processes. NOTE: synchronization on the objects 
>>>> within the archive heap region can still cause writes to the memory 
>>>> page.
>>>>
>>>> Dormant Objects
>>>>
>>>> Dormant objects are unreachable java objects within the open 
>>>> archive heap region.
>>>> A java object in the open archive heap region is a live object if 
>>>> it can be reached during scanning. Some of the java objects in 
>>>> the region may not be reachable during scanning. Those objects are 
>>>> considered as dormant, but not dead. For example, a constant pool 
>>>> 'resolved_references' array is reachable via the klass root if its 
>>>> container klass (shared) is already loaded at the time during GC 
>>>> scanning. If a shared klass is not yet loaded, the klass root is 
>>>> not scanned and it's constant pool 'resolved_reference' array 
>>>> (A) in the open archive region is not reachable. Then A is a 
>>>> dormant object.
>>>>
>>>> Object State Transition
>>>>
>>>> All java objects are initially dormant objects when open archive 
>>>> heap regions are mapped to the runtime java heap. A dormant object 
>>>> becomes live object when the associated shared class is loaded at 
>>>> runtime. Explicit call to G1SATBCardTableModRefBS::enqueue() needs 
>>>> to be made when a dormant object becomes live. That should be the 
>>>> case for cached objects with strong roots as well, since strong 
>>>> roots are only scanned at the start of GC marking (the initial 
>>>> marking) but not during Remarking/Final marking. If a cached object 
>>>> becomes live during concurrent marking phase, G1 may not find it 
>>>> and mark it live unless a call to 
>>>> G1SATBCardTableModRefBS::enqueue() is made for the object.
>>>>
>>>> Currently, a live object in the open archive heap region cannot 
>>>> become dormant again. This restriction simplifies GC 
>>>> requirement and guarantees all outgoing pointers are updated by GC 
>>>> correctly. Only objects for shared classes from the builtin class 
>>>> loaders (boot, PlatformClassLoaders, and AppClassLoaders) are 
>>>> supported for caching.
>>>>
>>>> Caching Java Objects at Archive Dump Time
>>>>
>>>> The closed archive and open archive regions are allocated near the 
>>>> top of the dump time java heap. Archived java objects are copied 
>>>> into the designated archive heap regions. For example, String 
>>>> objects and the underlying 'value' arrays are copied into 
>>>> the closed archive regions. All references to the archived objects 
>>>> (from shared class metadata, string table, etc) are set to the 
>>>> new heap locations. A hash table is used to keep track of all 
>>>> archived java objects during the copying process to make sure java 
>>>> object is not archived more than once if reached from different 
>>>> roots. It also makes sure references to the same archived object 
>>>> are updated using the same new address location.
>>>>
>>>> Caching Constant Pool resolved_references Array
>>>>
>>>> The 'resolved_references' is an array that holds references of 
>>>> resolved constant pool entries including Strings, mirrors 
>>>> and methodTypes, etc. Each loaded class has one 
>>>> 'resolved_references' array (in ConstantPoolCache). The 
>>>> 'resolved_references' arrays are copied into the open archive 
>>>> regions during dump process. Prior to copying the 
>>>> 'resolved_references' arrays, JVM iterates through constant pool 
>>>> entries and resolves all JVM_CONSTANT_String entries to existing 
>>>> interned Strings for all archived classes. When resolving, JVM only 
>>>> looks up the string table and finds existing interned Strings 
>>>> without inserting new ones. If a string entry cannot be resolved to 
>>>> an existing interned String, the constant pool entry remain as 
>>>> unresolved. That prevents memory waste if a constant pool string 
>>>> entry is never used at runtime.
>>>>
>>>> All String objects referenced by the string table are copied first 
>>>> into the closed archive regions. The string table entry is 
>>>> updated with the new location when each String object is archived. 
>>>> The JVM updates the resolved constant pool string entries with the 
>>>> new object locations when copying the 'resolved_references' arrays 
>>>> to the open archive regions. References to 
>>>> the 'resolved_references' arrays in the ConstantPoolCache are also 
>>>> updated.
>>>> At runtime as part of ConstantPool::restore_unshareable_info() 
>>>> work, call G1SATBCardTableModRefBS::enqueue() to let GC know the 
>>>> 'resolved_references' is becoming live. A handle is created for the 
>>>> cached object and added to the loader_data's handles.
>>>>
>>>> Runtime Java Heap With Cached Java Objects
>>>>
>>>>
>>>> The closed archive regions (the string regions) and open archive 
>>>> regions are mapped to the runtime java heap at the same offsets as 
>>>> the dump time offsets from the runtime java heap base.
>>>>
>>>> Preliminary test execution and status:
>>>>
>>>> JPRT: passed
>>>> Tier2-rt: passed
>>>> Tier2-gc: passed
>>>> Tier2-comp: passed
>>>> Tier3-rt: passed
>>>> Tier3-gc: passed
>>>> Tier3-comp: passed
>>>> Tier4-rt: passed
>>>> Tier4-gc: passed
>>>> Tier4-comp:6 jobs timed out, all other tests passed
>>>> Tier5-rt: one test failed but passed when running locally, all 
>>>> other tests passed
>>>> Tier5-gc: passed
>>>> Tier5-comp: running
>>>> hotspot_gc: two jobs timed out, all other tests passed
>>>> hotspot_gc in CDS mode: two jobs timed out, all other tests passed
>>>> vm.gc: passed
>>>> vm.gc in CDS mode: passed
>>>> Kichensink: passed
>>>> Kichensink in CDS mode: passed
>>>>
>>>> Thanks,
>>>> Jiangli
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20170804/94de9413/attachment.htm>


More information about the hotspot-gc-dev mailing list