Proposal for improving CDS archive creation

Fri Jul 13 19:43:14 UTC 2018

On 7/13/18 11:50 AM, Ioi Lam wrote:

> When writing into the buffer, the algorithm works like this:
>
>
>     MetaspaceObj* get_buffered(MetaspaceObj *p) {
>         MetaspaceObj* saved = buffer_find(p);
>         if (saved == NULL) {
>             saved = buffer_write(p);
>         }
>         return saved;
>     }
>
> So when you're writing a vtable into the buffer:
>
>     Method** vtable = ...; // points to the "real" class X
>     Method** vtable_buffered = ...; // points to the "buffered" class X
>
>     for (int i=0; i<vtable_length; i++) {
>         Method* m = vtable[i];
>         Method* buffered_m = get_buffered(m);
>         vtable_buffered[i] = buffered_m;
>     }
>
> buffer_write(m) will not happen if m is a method defined by a super 
> class of X.
>
> However, if some classes are unloaded and their metaspace blocks are 
> reused, a new MetaspaceObject may happen to occupy the exact same 
> address as an old MetaspaceObject from an unloaded class. This would 
> make the buffering operation more complicated.
>
> We have 2 choices:
>
> [1] Disable the deallocation of MetaspaceObjects when 
> -Xshare:autocreate is specified.
> [2] When a MetaspaceObject is deallocated, remove it from the hash 
> table used by buffer_find().
>
> We can start with [1], as it is less likely to work incorrectly 
> (although it might run out of metaspace memory in some pathological 
> cases).

That sounds reasonable to me. Could you please also add a note to the 
RFE report so we can keep track of the design decision?
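
In case it helps with the RFE note, here is a minimal sketch of how I read 
the two options. It is a toy model, not HotSpot code: Obj, Buffer, 
on_deallocate() and id_after_reuse() are names made up for illustration, 
and a placement-new into a fixed slot stands in for metaspace block reuse.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <new>
#include <unordered_map>

// Stand-in for a metaspace object (hypothetical, not HotSpot's MetaspaceObj).
struct Obj { int id; };

// Toy buffer keyed by the address of the live object.
class Buffer {
public:
  // Same shape as get_buffered() in the thread: copy on first sight,
  // return the existing copy afterwards.
  Obj* get_buffered(const Obj* p) {
    auto it = _table.find(p);
    if (it != _table.end()) {
      return it->second;
    }
    _copies.push_back(*p);          // deque keeps earlier copies at stable addresses
    Obj* saved = &_copies.back();
    _table.emplace(p, saved);
    return saved;
  }

  // Option [2]: forget the address once the object has been deallocated.
  // The copy itself stays in the buffer.
  void on_deallocate(const Obj* p) { _table.erase(p); }

  std::size_t num_copies() const { return _copies.size(); }

private:
  std::deque<Obj> _copies;
  std::unordered_map<const Obj*, Obj*> _table;
};

// Buffers an object, lets a new object reuse the same address, and returns
// the id of whatever the buffer hands back for the new object.
int id_after_reuse(bool remove_on_deallocate) {
  Buffer buf;
  alignas(Obj) unsigned char slot[sizeof(Obj)];

  Obj* old_obj = new (slot) Obj{1};
  buf.get_buffered(old_obj);
  if (remove_on_deallocate) {
    buf.on_deallocate(old_obj);
  }

  Obj* new_obj = new (slot) Obj{2};  // same address, different object
  return buf.get_buffered(new_obj)->id;
}
```

If the table entry is left in place, the object that reuses the address is 
mapped to the copy of the old object. With [2] the stale entry is gone and a 
fresh copy is written. With [1] the address is never reused, so the lookup 
stays as simple as it is today.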

Thanks,
Jiangli
>
>
> - Ioi
>
> On 7/11/18 5:46 PM, Jiangli Zhou wrote:
>> Volker originally suggested the idea in the email thread "Improving 
>> AppCDS for Custom Loaders". I think this is a cleaner approach.
>>
>> Thanks,
>>
>> Jiangli
>>
>>
>> On 7/11/18 4:13 PM, Ioi Lam wrote:
>>> I had an off-line discussion with Jiangli, and she has an 
>>> alternative proposal:
>>>
>>> When -Xshare:autocreate is specified, but the CDS archive is not 
>>> available,
>>>
>>> 1. Load classes as normal. After each InstanceKlass is loaded, but 
>>> before it's used,
>>>    make a deep copy of this class into an internal cache.
>>>
>>> 2. The deep copy includes all methods, etc, for this class. However, 
>>> if a Method is
>>>    inherited from a super class, then only a reference to this 
>>> Method is copied.
>>>
>>> 3. At a certain point (probably at VM exit), copy all the (suitable) 
>>> classes from the
>>>    cache and write them into the CDS archive.
>>>
>>> The advantage of this approach is we will be able to archive classes 
>>> that were
>>> loaded by custom loaders, but have been freed at VM exit time 
>>> because the class
>>> loaders were GC'ed.
>>>
>>>
>>> Note: When a class X is loaded, if its supertype(s) have already 
>>> been redefined,
>>> we probably should not copy X into the buffer. That's because the 
>>> vtable of X may
>>> point to some redefined methods from a supertype, which do not match 
>>> the bytecodes
>>> of these methods in the supertype's original class file, so it's a 
>>> messy situation.
>>>
>>> Thanks
>>> - Ioi
>>>
>>>
>>>
>>> On 7/10/18 12:50 PM, Ioi Lam wrote:
>>>> Fixing some sloppy text below ....
>>>>
>>>>
>>>> On 7/10/18 10:16 AM, Ioi Lam wrote:
>>>>> I have a proposal for improving the process of creating the CDS 
>>>>> archive(s),
>>>>> so we can make CDS easier to use and support more use cases.
>>>>>
>>>>>    - better support for custom loaders
>>>>>    - remove explicit training run
>>>>>    - support 2 levels of shared archives
>>>>>
>>>>> I think the proposal is relatively straightforward to implement, 
>>>>> as we already
>>>>> have most of the required infrastructure:
>>>>>
>>>>>    + the ability to use Java class loaders at archive creation time
>>>>>    + the ability to relocate MetaspaceObjects
>>>>>
>>>>> Parts of this proposal will also simplify the CDS code and make it 
>>>>> more
>>>>> maintainable.
>>>>>
>>>>> Current process of creating the base archive - [C]
>>>>> ==================================================
>>>>>
>>>>> Currently each JVM process can map at most one CDS archive. Let's 
>>>>> call this
>>>>> the "base archive". It is created by [ref1]:
>>>>>
>>>>>  C1. Reserve a region R of 3GB at 0x800000000.
>>>>>  C2. Load all classes specified in the class list. All data for 
>>>>> these classes
>>>>>      live outside of R.
>>>>>      (E.g., the Klass objects are loaded into tmp_class_space, 
>>>>> which is
>>>>>       adjacent to R).
>>>>>  C3. Copy the metadata of all archivable classes (e.g., excluding 
>>>>> generated
>>>>>      Lambda classes) into R. At this step, R is divided into several
>>>>>      sections (RO, RW, etc).
>>>>>
>>>>>
>>>>>   //  +-- SharedBaseAddress   (default = 0x800000000)
>>>>>   //  +-- _narrow_klass._base
>>>>>   //  |
>>>>>   //  | +-tmp_class_space.base
>>>>>   //  v                               V
>>>>>   // +----+----+----+----+----+-....-+-------------------+
>>>>>   //  |<-           R               ->|
>>>>>   //  | MC | RW | RO | MD | OD |unused| tmp_class_space |
>>>>>   // +----+----+----+----+----+------+-------------------+
>>>>>   //  |<--  3GB        -------------->|
>>>>>   //  |<-- UnscaledClassSpaceMax = 4GB ------------------>|
>>>>>
>>>>>
>>>>> New process for creating the base archive - [N]
>>>>> ===============================================
>>>>>
>>>>> Currently we have a lot of "if (DumpSharedSpaces)" code for 
>>>>> special case
>>>>> handling of the above scheme. We can improve it by
>>>>>
>>>>>  N1. Remove all code for special memory layout initialization for 
>>>>> -Xshare:dump.
>>>>>      As a result, we will reserve a region R of 1GB at 
>>>>> 0x800000000, which
>>>>>      is used by Klass objects (this is the same as if -Xshare:off 
>>>>> were
>>>>>      specified.)
>>>>>  N2. Load all classes in the class list.
>>>>>  N3. Now R contains the Klass objects of all loaded classes.
>>>>>      Allocate a temporary space T, and copy all contents of R into T.
>>>>>  N4. Now R is empty. Copy the metadata of all archivable classes 
>>>>> into R.
>>>>>
>>>>>
>>>>> Dump-as-you-go for the base archive - [G]
>>>>> =========================================
>>>>>
>>>>> Note that the [N] scheme will work even if you're running an app with
>>>>> -Xshare:off. At some point (e.g., when the VM is about to exit), you
>>>>> can:
>>>>>
>>>>>  G1. Enter a safe point
>>>>>  G2. Go to step [N3].
>>>>>
>>>>> The benefit of [G] is you don't need a separate run to dump the 
>>>>> archive, and
>>>>> there's no need to use the class list. Instead, we can have an 
>>>>> option like:
>>>>>
>>>>>    java -Xshare:autocreate -cp app.jar 
>>>>> -XX:SharedArchiveFile=foo.jsa App
>>>>>
>>>>> If foo.jsa is not available, we run in [G] mode. At VM exit, we 
>>>>> dump into
>>>>> foo.jsa.
>>>>>
>>>>> This way, we don't need to have an explicit training run with
>>>>> -XX:DumpLoadedClassList. Instead, the training run is
>>>>>
>>>> I meant, "Instead, your first run, when the archive is not yet 
>>>> available, becomes the
>>>> training run".
>>>>
>>>> Thanks to Calvin and Dan for spotting this :-)
>>>> - Ioi
>>>>
>>>>> This also makes it easy to support the classes from custom 
>>>>> loaders. There's no
>>>>> need for special tooling to convert -Xlog:class+load=debug output 
>>>>> into a
>>>>> classlist. [ref2]
>>>>>
>>>>>
>>>>> Dumping for second-level archive - [S]
>>>>> ======================================
>>>>>
>>>>>  S1. Load the base archive
>>>>>  S2. Run the app as normal
>>>>>  S3. All Klass objects of the dynamically loaded classes will be 
>>>>> loaded in
>>>>>      the region R, which immediately follows the end of the base 
>>>>> archive.
>>>>>
>>>>>   //  +-- SharedBaseAddress
>>>>>   //  |                          +--- dynamically loaded Klasses
>>>>>   //  |                          |    start from here.
>>>>>   //  v                          v
>>>>>   // +--------------------------+---------...-----------------|
>>>>>   //  | base archive             | region R |
>>>>>   // +--------------------------+---------...-----------------|
>>>>>   //  |<- size of base archive ->|
>>>>>   //  |<--            1GB -->|
>>>>>
>>>>>
>>>>>   S4. At some point (possibly when the VM is about to exit) we start
>>>>>       dumping the second level archive
>>>>>   S5. Enter safe point
>>>>>   S6. Now R contains the Klass objects of all dynamically loaded 
>>>>> classes.
>>>>>       Allocate a temporary space T, and copy all contents of R 
>>>>> into T.
>>>>>   S7. Now R is empty. Copy the metadata of all archivable, 
>>>>> dynamically loaded
>>>>>       classes into R.
>>>>>   S8. Create a new shared_dictionary (and shared_symbol_table) 
>>>>> that contains
>>>>>       all the Klasses (Symbols) from both the base and 
>>>>> second-level archives.
>>>>>
>>>>> References
>>>>> ==========
>>>>>
>>>>> [ref1] Current initialization of memory space layout during 
>>>>> -Xshare:dump
>>>>> http://hg.openjdk.java.net/jdk/jdk/file/e0028bb6dd3d/src/hotspot/share/memory/metaspaceShared.cpp#l250 
>>>>>
>>>>> [ref2] Volker Simonis's tool for supporting custom class loaders in CDS
>>>>>        https://github.com/simonis/cl4cds
>>>>> ---------------------------------------------------------------------- 
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Any thoughts?
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>
>>>
>>
>