RFR(L) 8244778 Archive full module graph in CDS

Lois Foltan lois.foltan at oracle.com
Thu Sep 3 18:45:45 UTC 2020


On 8/31/2020 11:31 AM, Ioi Lam wrote:
> Hi Lois,
>
> Thanks for the review. Your comments has led me to discover a couple 
> of pretty serious issues, which hopefully I have fixed. Also the code 
> is cleaner now, with much fewer control-flow changes than the last 
> webrev.
>
> http://cr.openjdk.java.net/~iklam/jdk16/8244778-archive-full-module-graph.v03/ 
>
> http://cr.openjdk.java.net/~iklam/jdk16/8244778-archive-full-module-graph.v03.delta/ 
>

Hi Ioi,

Looks very good, thanks for making those changes.  One minor comment 
here then another comment below when you discuss the addition of 
check_cds_restrictions().

Minor nit in moduleEntry.cpp & packageEntry.cpp when dealing with the 
ModuleEntry's reads list and a PackageEntry's exports list.  The names 
of the methods to write and read those arrays is somewhat confusing.

ModuleEntry::write_archived_entry_array
ModuleEntry::read_archived_entry_array

At first I thought you were reading/writing an array of archived 
entries, not the array within an archived entry itself.  I was trying to 
think of a better name.  Please consider adding a comment at line #400 & 
line #417 ahead of those methods in moduleEntry.cpp to indicate that 
they are used for both reading/writing a ModuleEntry's reads list and a 
PackageEntry's exports list.

>
> Please see my comments in-line.
>
> On 8/25/20 7:58 AM, Lois Foltan wrote:
>>
>> Hi Ioi,
>>
>> Changes looks really good.  Comments interspersed below.
>>
>> Thanks,
>> Lois
>>
>> On 8/12/2020 6:06 PM, Ioi Lam wrote:
>>> Hi Lois,
>>>
>>> Thanks for the comments. I have an updated webrev
>>>
>>> http://cr.openjdk.java.net/~iklam/jdk16/8244778-archive-full-module-graph.v02/ 
>>>
>>> http://cr.openjdk.java.net/~iklam/jdk16/8244778-archive-full-module-graph.v02.delta/ 
>>>
>>>
>>> Here are the general notes on the changes. Replies to your questions 
>>> are in-line below.
>>>
>>> (1) Integrated updates in the Java code from Alan Bateman. Here are 
>>> Alan's
>>>     notes:
>>>
>>>     The archive of the boot layer is as before, except that 
>>> archiving is
>>>     skipped if there are split packages or incubator modules. 
>>> Incubating
>>>     modules aren't resolved by default so we shouldn't have them in the
>>>     boot layer by default anyway.
>>>
>>>     I've dropped the module finders from the boot layer archive. I've
>>>     left the IllegalAccessLogger.Builder in the acrhive for now (even
>>>     though it is not the boot layer). We should be able to remove that
>>>     once the JEP to disallow illegal access by default is in.
>>>
>>>     Related is that I don't like the archive for the module graph
>>>     (ArchivedModuleGraph, pre-dates this RFE) including the set of
>>>     packages to export/open for illegal access as they aren't part
>>>     of the module graph. I've left it for now but we can also remove
>>>     that once we disallow illegal access by default (as those sets
>>>     will be empty).
>>>
>>>     The archive of built-in class loaders is now in one object
>>>     jdk.internal.loader.ArchivedClassLoaders which I think will be a
>>>     bit more maintainable. I've also drop the ucp field from the
>>>     AppClassLoader as the changes to BuiltinClassLoader means is no
>>>     longer needs to duplicated.
>>>
>>>     There's one remaining issue in ModuleBootstrap.boot where it has 
>>> fix
>>>     an app class loader value (ModuleLayer.CLV). Ideally the 
>>> initialization
>>>     of the built-in class loaders would do this but we are kinda forced
>>>     to separate the archiving of the built-in class loaders from the 
>>> boot
>>>     layer. I might look at this again some time.
>>>
>>>
>>> (2) Moved code from classLoaderData.cpp -> classLoaderSharedData.cpp
>>>
>>> (3) Reverted unnecessary changes in JavaClasses::compute_offset
>>>
>>> (4) Minor clean up to use QuickSort::sort() instead of qsort()
>>>
>>> (5) Moved the C-side of module initialization for platform/system
>>>     loaders to inside java.lang.System::initPhase2(), so this happens
>>>     at the same time as without full module archiving.
>>>
>>> (6) Moved the use of Module_lock to so all modules in a class loader
>>>     are restored atomically. See ArchivedClassLoaderData::restore()
>>>
>>>     This fixed a bug where test/jdk/com/sun/jdi/ModulesTest.java
>>>     would fail as it sees a partially restored set of modules.
>>>
>>>
>>>
>>> On 8/7/20 12:06 PM, Lois Foltan wrote:
>>>> Hi Ioi,
>>>>
>>>> Overall looks promising.  I have some review comments below, but 
>>>> not a complete review.  I concentrated on the JVM side primarily 
>>>> how the archived module graph is read in, how the ModuleEntry and 
>>>> PackageEntry tables are created from the archive for the 3 builtin 
>>>> loaders and potential timing issues during start up.
>>>>
>>>> On 7/22/2020 3:36 PM, Ioi Lam wrote:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8244778
>>>>> http://cr.openjdk.java.net/~iklam/jdk16/8244778-archive-full-module-graph.v01/ 
>>>>>
>>>>>
>>>>> Please review this patch that stores the full module graph in the CDS
>>>>> archive heap. This reduces the initialization time of the basic 
>>>>> JVM by
>>>>> about 22%:
>>>>>
>>>>> $ perf stat -r 100 bin/java -version
>>>>> before: 98,219,329 instructions 0.03971 secs elapsed (+- 0.44%)
>>>>> after:  55,835,815 instructions 0.03109 secs elapsed (+- 0.65%)
>>>>>
>>>>> [1] Start with ModuleBootstrap.java. The current implementation is
>>>>>     quite restrictive: the archived module graph is used only when no
>>>>>     module options are specified.
>>>>>
>>>>>     See ModuleBootstrap.mayUseArchivedBootLayer().
>>>>>
>>>>>     We can probably support options such as main module and module 
>>>>> path
>>>>>     in a future RFE.
>>>>
>>>> Yes, I noticed the restrictions.  The JBS issue discusses the 
>>>> possibility of using the dumped module graph in the event that the 
>>>> value of the options are the same.  I think for this implementation 
>>>> to be viable and have a positive impact you should consider 
>>>> comparing at least the options --add-modules, --add-exports and 
>>>> --add-reads.
>>>
>>> I agree. I want to keep the changes minimal in this RFE, and will 
>>> add more support for other use cases in follow-on RFEs. Instead of 
>>> requiring these options to be unspecified, we can relax the 
>>> restriction such that these options must be the same between archive 
>>> dump time and run time.
>>
>> Sounds good!
>>
>>>
>>>>
>>>>
>>>>>
>>>>> [2] In the current JDK implementation, there is no single object
>>>>>     that represents "the module graph". Most of the information
>>>>>     is stored in the archive bootLayer object, but a few additional
>>>>>     restoration operations need to be performed:
>>>>>
>>>>>     + See ModuleBootstrap.getArchivedBootLayer()
>>>>>     + Some static fields need to be archived/restored in
>>>>>       Module.java, BuiltinClassLoader.java, ClassLoaders.java
>>>>>       and BootLoader.java
>>>>>
>>>>> [3] I ran into a complication with two loader instances of
>>>>>     PlatformClassLoader and AppClassLoader. They are stored in
>>>>>     multiple tables inside the module graph (e.g.,
>>>>>     BuiltinClassLoader$LoadedModule) so I cannot easily recreate
>>>>>     them at runtime.
>>>>>
>>>>>     However, these two loaders contain information specific to the
>>>>>     dump time VM lifecycle (such as the classes that were loaded
>>>>>     during CDS dumping) that need to be scrubbed. I couldn't find an
>>>>>     elegant way of doing this, so I added a private 
>>>>> "resetArchivedStates"
>>>>>     method to a few classes. They are called inside
>>>>>     HeapShared::reset_archived_object_states().
>>>>>
>>>>> [4] Related native data structures (PackageEntry and ModuleEntry)
>>>>>     are also archived. Start with classLoaderData.cpp
>>>>>
>>>>> Passes mach5 tiers 1-4. I will test with additional tiers.
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>
>>>> classfile/classLoader.cpp
>>>> - line #1644 pulling the javabase_moduleEntry() check outside of 
>>>> the Module_lock maybe problematic if after you check it another
>>>>   thread initializes in the time between the check and the 
>>>> obtaining of the Module_lock on line #1645.
>>>
>>> Fixed.
>>
>> That looks good.  I think it is fine that you are checking if 
>> java.base is defined via a call to javabase_moduleEntry() because you 
>> are just trying to determine if a ModuleEntry should be created or 
>> not.  In most cases though using ModuleEntryTable::javabase_defined() 
>> is the correct way to ensure that both the ModuleEntry for java.base 
>> has been created and that the ModuleEntry has been injected in the 
>> corresponding java.lang.Module oop.
>>
>>>
>>>>
>>>> classfile/classLoader.hpp
>>>> - somewhere in ArchivedClassLoaderData there should be an assert to 
>>>> make sure that the CLD, whose package entries and module entries 
>>>> are being archived is not a "_has_class_mirror_holder" CLD for a 
>>>> weakly defined hidden class.  Those dedicated CLDs should never 
>>>> have package entries or module entries.
>>>>
>>>
>>> I added ArchivedClassLoaderData::assert_valid()
>>>
>>>> classfile/moduleEntry.cpp
>>>> - line #400, typo "conver" --> "convert"
>>>> - line #500, maybe sort if n is greater than 1 instead of 0 (same 
>>>> comment for packageEntry.cpp line #270)
>>>>
>>> Fixed
>>>
>>>> classfile/systemDictionary.cpp
>>>> - It looks like the PackageEntry and ModuleEntry tables for the 
>>>> system and platform class loaders are  added within 
>>>> SystemDictionary::compute_java_loaders() which is called from 
>>>> Thread::create_vm() but after the call in Thread::create_vm() to 
>>>> call_initPhase2 which is when Universe::_module_initialized is set 
>>>> to true.  Did I read this correctly?  If yes, then I think you have 
>>>> to be sure that Universe::_module_initialized is not set until the 
>>>> module graph for the 3 builtin loaders is loaded from the archive.
>>>>
>>>
>>> I moved the code to be called by ModuleBootstrap::boot() so it 
>>> should now happen inside call_initPhase2.
>>
>> Yes, that looks good.
>>
>>>
>>>> memory/filemap.cpp
>>>> - line #237 typo "modue" --> "module"
>>>>
>>>
>>> Fixed
>>>
>>>> - Since the CDS support for capturing the module graph does not 
>>>> archive unnamed modules, I assume 
>>>> Modules::set_bootloader_unnamed_module() is still called. Is this 
>>>> true or does the unnamed module for the boot loader get archived?
>>> The unnamed module for the boot loader is not archived.
>>>
>>> Modules::set_bootloader_unnamed_module()  wasn't called in my last 
>>> webrev. Thanks for catching it.
>>>
>>> I added a call to BootLoader.getUnnamedModule() in 
>>> ModuleBootstrap::boot()  to trigger <clinit> of BootLoader, which 
>>> will call into the VM for Modules::set_bootloader_unnamed_module().
>>
>> Looks good.
>>
>>>
>>>
>>>
>>>> Clarification/timing questions:
>>>>
>>>
>>> Here's an overall problem I am facing:
>>>
>>> The module graph is archived after the module system has fully 
>>> started up. This means that for the boot loader, we only know the 
>>> full set of modules/packages, but we don't know which part is the 
>>> subset that was initialized during early VM bootstraping (e.g., when 
>>> ModuleEntryTable::javabase_defined() == false).
>>>
>>> So the behavior is a bit different during the early bootstrap phase 
>>> (all the way before we reach ModuleBootstrap::boot()): 
>>> ClassLoaderData::the_null_class_loader_data()->modules() and 
>>> ClassLoaderData::the_null_class_loader_data()->packages() are 
>>> already populated before a single class is loaded.
>>
>> If this is the case then, at the point when a ModuleEntry is created 
>> for java.base using the archived module graph, there should be a 
>> assertion that java_lang_Class::_fixup_module_field_list is NULL. To 
>> confirm no class has been loaded before java.base is defined. Maybe 
>> in ClassLoaderDataShared::serialize where you restore the boot 
>> loader's archived modules?
>
> Thanks for pointing this out. It turned out that in the last webrev 
> (v02), I had a bug where the module of the primitive classes were not 
> initialized. Now I have changed the initialization code to behave the 
> same whether archived full module graph is used or not. The only 
> differences are:
>
> [1] ClassLoader::create_javabase(): 
> ModuleEntryTable::javabase_moduleEntry() may be inited by CDS.
>
> [2] When archived full module graph is used, 
> ModuleEntryTable::patch_javabase_entries is called from 
> Modules::define_archived_modules. 
> (java_lang_Class::_fixup_module_field_list is used within this call.)
>
> I also added a new test case: cds/PrimitiveClassMirrors.java

Good test, I'm glad you added that!

>
>>
>>> This difference doesn't seem to make a practical difference. E.g., 
>>> none of our code seems to assume that "before any classes in the 
>>> java.util package is loaded, 
>>> ClassLoaderData::the_null_class_loader_data()->packages() must not 
>>> contain an entry for java.util".
>>>
>>> I think we have two choices when the archived module graph is used:
>>>
>>> [1] We require that the state of the module system must, at every 
>>> step of VM initialization, be identical to that of a VM that doesn't 
>>> use an archived module graph.
>>>
>>> [2] We make sure that the VM/JDK bootstrap code can tolerate the 
>>> existence of module/packages that are added earlier than a VM that 
>>> doesn't use an archived module graph.
>>>
>>> I tried doing a version of [1] and found that to be too difficult. 
>>> [2] seems much simpler and is the approach I am using now.
>>
>> I think [2] is reasonable.
>>
>>>
>>>> oops/instanceKlass.cpp
>>>> - line #2545, comment "TO DO  -- point it to the archived 
>>>> PackageEntry (JDK-8249262)"
>>>> are you thinking that since the module graph is read in ahead of 
>>>> time that it can be set when an InstanceKlass is created? There is 
>>>> a point before java.base is defined that InstanceKlass::set_package 
>>>> takes into account that could be a timing issue.
>>>>
>>>>
>>>
>>> I think it will work as if another class in the same package has 
>>> already been defined.
>>>
>>>> - There are some checks in modules.cpp that are valuable to have 
>>>> during bootstrapping.  For example, packages that are encountered 
>>>> during bootstrapping before java.base is defined are put into the 
>>>> java.base module but then verified after java.base is defined via 
>>>> verify_javabase_packages.  So at what point in the bootstrapping 
>>>> process does the boot loader's archived module's become known? 
>>>> Could classes have been defined to the VM before this point? Then 
>>>> their packages will have to be verified to be sure that they are 
>>>> indeed packages within java.base.  Consider looking at other checks 
>>>> that occur within classfile/modules.cpp as well.
>>>>
>>> As mentioned above, calling verify_javabase_packages() at run time 
>>> will fail, as we have loaded all packages for the boot loader, not 
>>> just those for java.base.
>>
>> I think not calling verify_javabase_packages works because as you 
>> stated above no classes have been loaded before java.base is defined 
>> which is not the same situation as running without the archived 
>> module graph.
>>
>> A couple of additional comments:
>>
>> - ModuleEntry's reads list and PackageEntry's exports list.  We had 
>> hoped eventually to change these lists from being a c heap allocated 
>> GrowableArray over to a ResourceHashTable for faster lookup.  Doing 
>> that as a separate RFE might help CDS archiving not to have to 
>> archive those lists as an Array?
>>
>
> CDS cannot archive ResourceHashTable either. We have CompactHashtable 
> which can be archived, but it cannot be modified at run time. I think 
> the export list can be modified at run time with 
> java.lang.Module::addExports(), so we probably need to do it like:
>
> - if a Module was archived, check its CompactHashtable first
> - if not found, check the ResourceHashTable
>
> This would make the start-up a little faster (no more copying from the 
> array lists into the hashtable, but would make the code more 
> complicated. I need to investigate to see if it's worthwhile.
>
>> - moduleEntry.cpp and packageEntry.cpp: Both methods 
>> "compare_module_by_name" and "compare_package_by_name" should have an 
>> assert if the names are equal.  No ModuleEntryTable or 
>> PackageEntryTable should ever have 2 same named modules or packages.
>>
>
> I tried adding:
>
> static int compare_package_by_name(PackageEntry* a, PackageEntry* b) {
>   assert(a != b && a->name() != b->name(), "array should not contain 
> duplicated entries");
>   return a->name()->fast_compare(b->name());
> }
>
> but this caused an assert, because our QuickSort implementation would 
> sometimes compare the same element!
>
> #7  0x00007ffff659238b in QuickSort::partition<true, PackageEntry*, 
> int (*)(void const*, void const*)> (array=0x80043fcb0,
>     pivot=248, length=496, comparator=0x7ffff6590a5a 
> <compare_package_by_name(PackageEntry*, PackageEntry*)>)
>     at /jdk2/gil/open/src/hotspot/share/utilities/quickSort.hpp:76
> 76          for ( ; comparator(array[left_index], pivot_val) < 0; 
> ++left_index) {
> (gdb) p array[left_index]
> $1 = (PackageEntry *) 0x7ffff0439d00
> (gdb) p pivot_val
> $2 = (PackageEntry *) 0x7ffff0439d00
> (gdb) p pivot
> $3 = 248
> (gdb) p left_index
> $4 = 248
>
> So I ended up with:
>
> static int compare_package_by_name(PackageEntry* a, PackageEntry* b) {
>   assert(a == b || a->name() != b->name(), "no duplicated names");
>   return a->name()->fast_compare(b->name());
> }

Looks good to me!

>
>> - Another timing clarification question for me.  I assume that the 
>> module graph is dumped post module initialization when Java has 
>> completely defined the module graph to the JVM, is this correct?
>
> Yes
>
>> My concern is that there could be a window post module initialization 
>> and pre archiving the module graph where another thread could define 
>> a new module or add Module readability or add Package exportability.  
>> So that the module graph you are dumping is not the same module graph 
>> at the end of module initialization.  Is this a concern?
>
> We don't allow arbitrary code to be executed during -Xshare:dump, so 
> the module graph shouldn't be changed after module initialization has 
> finished.
>
> I've added Modules::check_cds_restrictions() to check for this.

A question about this because a user's program can define modules post 
module initialization via ModuleDescriptor.newModule().  See for 
example, tests within 
open/test/hotspot/jtreg/runtime/module/AccessCheck.  So all of these 
tests would trigger check_cds_restrictions() if -Xshare:dump was turned 
on.  Is that a concern?

Thanks,
Lois


>
> I also added asserts to make sure that none of the classes used by the 
> archived module graph can be modified by JVMTI. All these classes are 
> loaded in the "early" phase of JVMTI, and we would disable CDS if a 
> JVMTI agent is registered to modify classes in this phase, so we 
> should be safe, but the asserts will ensure that. I updated the 
> ReplaceCriticalClassesForSubgraphs.java test and added a new test 
> RedefineClassesInModuleGraph.java.
>
>
> Thanks
> - Ioi
>
>
>>
>> Thanks,
>> Lois
>>
>>>
>>> Well, verify_javabase_packages() was called once and it succeeded, 
>>> but that was during CDS dump time. So we know at least we have 
>>> verified this once :-)
>>>
>>> Thanks
>>> - Ioi
>>>
>>>> I may have more review comments as I continue to look at this!
>>>>
>>>> Thanks,
>>>> Lois
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>>
>



More information about the hotspot-runtime-dev mailing list