RFR(L) 8244778 Archive full module graph in CDS
Lois Foltan
lois.foltan at oracle.com
Thu Sep 3 18:45:45 UTC 2020
On 8/31/2020 11:31 AM, Ioi Lam wrote:
> Hi Lois,
>
> Thanks for the review. Your comments has led me to discover a couple
> of pretty serious issues, which hopefully I have fixed. Also the code
> is cleaner now, with much fewer control-flow changes than the last
> webrev.
>
> http://cr.openjdk.java.net/~iklam/jdk16/8244778-archive-full-module-graph.v03/
>
> http://cr.openjdk.java.net/~iklam/jdk16/8244778-archive-full-module-graph.v03.delta/
>
Hi Ioi,
Looks very good, thanks for making those changes. One minor comment
here then another comment below when you discuss the addition of
check_cds_restrictions().
Minor nit in moduleEntry.cpp & packageEntry.cpp when dealing with the
ModuleEntry's reads list and a PackageEntry's exports list. The names
of the methods to write and read those arrays is somewhat confusing.
ModuleEntry::write_archived_entry_array
ModuleEntry::read_archived_entry_array
At first I thought you were reading/writing an array of archived
entries, not the array within an archived entry itself. I was trying to
think of a better name. Please consider adding a comment at line #400 &
line #417 ahead of those methods in moduleEntry.cpp to indicate that
they are used for both reading/writing a ModuleEntry's reads list and a
PackageEntry's exports list.
>
> Please see my comments in-line.
>
> On 8/25/20 7:58 AM, Lois Foltan wrote:
>>
>> Hi Ioi,
>>
>> Changes looks really good. Comments interspersed below.
>>
>> Thanks,
>> Lois
>>
>> On 8/12/2020 6:06 PM, Ioi Lam wrote:
>>> Hi Lois,
>>>
>>> Thanks for the comments. I have an updated webrev
>>>
>>> http://cr.openjdk.java.net/~iklam/jdk16/8244778-archive-full-module-graph.v02/
>>>
>>> http://cr.openjdk.java.net/~iklam/jdk16/8244778-archive-full-module-graph.v02.delta/
>>>
>>>
>>> Here are the general notes on the changes. Replies to your questions
>>> are in-line below.
>>>
>>> (1) Integrated updates in the Java code from Alan Bateman. Here are
>>> Alan's
>>> notes:
>>>
>>> The archive of the boot layer is as before, except that
>>> archiving is
>>> skipped if there are split packages or incubator modules.
>>> Incubating
>>> modules aren't resolved by default so we shouldn't have them in the
>>> boot layer by default anyway.
>>>
>>> I've dropped the module finders from the boot layer archive. I've
>>> left the IllegalAccessLogger.Builder in the acrhive for now (even
>>> though it is not the boot layer). We should be able to remove that
>>> once the JEP to disallow illegal access by default is in.
>>>
>>> Related is that I don't like the archive for the module graph
>>> (ArchivedModuleGraph, pre-dates this RFE) including the set of
>>> packages to export/open for illegal access as they aren't part
>>> of the module graph. I've left it for now but we can also remove
>>> that once we disallow illegal access by default (as those sets
>>> will be empty).
>>>
>>> The archive of built-in class loaders is now in one object
>>> jdk.internal.loader.ArchivedClassLoaders which I think will be a
>>> bit more maintainable. I've also drop the ucp field from the
>>> AppClassLoader as the changes to BuiltinClassLoader means is no
>>> longer needs to duplicated.
>>>
>>> There's one remaining issue in ModuleBootstrap.boot where it has
>>> fix
>>> an app class loader value (ModuleLayer.CLV). Ideally the
>>> initialization
>>> of the built-in class loaders would do this but we are kinda forced
>>> to separate the archiving of the built-in class loaders from the
>>> boot
>>> layer. I might look at this again some time.
>>>
>>>
>>> (2) Moved code from classLoaderData.cpp -> classLoaderSharedData.cpp
>>>
>>> (3) Reverted unnecessary changes in JavaClasses::compute_offset
>>>
>>> (4) Minor clean up to use QuickSort::sort() instead of qsort()
>>>
>>> (5) Moved the C-side of module initialization for platform/system
>>> loaders to inside java.lang.System::initPhase2(), so this happens
>>> at the same time as without full module archiving.
>>>
>>> (6) Moved the use of Module_lock to so all modules in a class loader
>>> are restored atomically. See ArchivedClassLoaderData::restore()
>>>
>>> This fixed a bug where test/jdk/com/sun/jdi/ModulesTest.java
>>> would fail as it sees a partially restored set of modules.
>>>
>>>
>>>
>>> On 8/7/20 12:06 PM, Lois Foltan wrote:
>>>> Hi Ioi,
>>>>
>>>> Overall looks promising. I have some review comments below, but
>>>> not a complete review. I concentrated on the JVM side primarily
>>>> how the archived module graph is read in, how the ModuleEntry and
>>>> PackageEntry tables are created from the archive for the 3 builtin
>>>> loaders and potential timing issues during start up.
>>>>
>>>> On 7/22/2020 3:36 PM, Ioi Lam wrote:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8244778
>>>>> http://cr.openjdk.java.net/~iklam/jdk16/8244778-archive-full-module-graph.v01/
>>>>>
>>>>>
>>>>> Please review this patch that stores the full module graph in the CDS
>>>>> archive heap. This reduces the initialization time of the basic
>>>>> JVM by
>>>>> about 22%:
>>>>>
>>>>> $ perf stat -r 100 bin/java -version
>>>>> before: 98,219,329 instructions 0.03971 secs elapsed (+- 0.44%)
>>>>> after: 55,835,815 instructions 0.03109 secs elapsed (+- 0.65%)
>>>>>
>>>>> [1] Start with ModuleBootstrap.java. The current implementation is
>>>>> quite restrictive: the archived module graph is used only when no
>>>>> module options are specified.
>>>>>
>>>>> See ModuleBootstrap.mayUseArchivedBootLayer().
>>>>>
>>>>> We can probably support options such as main module and module
>>>>> path
>>>>> in a future RFE.
>>>>
>>>> Yes, I noticed the restrictions. The JBS issue discusses the
>>>> possibility of using the dumped module graph in the event that the
>>>> value of the options are the same. I think for this implementation
>>>> to be viable and have a positive impact you should consider
>>>> comparing at least the options --add-modules, --add-exports and
>>>> --add-reads.
>>>
>>> I agree. I want to keep the changes minimal in this RFE, and will
>>> add more support for other use cases in follow-on RFEs. Instead of
>>> requiring these options to be unspecified, we can relax the
>>> restriction such that these options must be the same between archive
>>> dump time and run time.
>>
>> Sounds good!
>>
>>>
>>>>
>>>>
>>>>>
>>>>> [2] In the current JDK implementation, there is no single object
>>>>> that represents "the module graph". Most of the information
>>>>> is stored in the archive bootLayer object, but a few additional
>>>>> restoration operations need to be performed:
>>>>>
>>>>> + See ModuleBootstrap.getArchivedBootLayer()
>>>>> + Some static fields need to be archived/restored in
>>>>> Module.java, BuiltinClassLoader.java, ClassLoaders.java
>>>>> and BootLoader.java
>>>>>
>>>>> [3] I ran into a complication with two loader instances of
>>>>> PlatformClassLoader and AppClassLoader. They are stored in
>>>>> multiple tables inside the module graph (e.g.,
>>>>> BuiltinClassLoader$LoadedModule) so I cannot easily recreate
>>>>> them at runtime.
>>>>>
>>>>> However, these two loaders contain information specific to the
>>>>> dump time VM lifecycle (such as the classes that were loaded
>>>>> during CDS dumping) that need to be scrubbed. I couldn't find an
>>>>> elegant way of doing this, so I added a private
>>>>> "resetArchivedStates"
>>>>> method to a few classes. They are called inside
>>>>> HeapShared::reset_archived_object_states().
>>>>>
>>>>> [4] Related native data structures (PackageEntry and ModuleEntry)
>>>>> are also archived. Start with classLoaderData.cpp
>>>>>
>>>>> Passes mach5 tiers 1-4. I will test with additional tiers.
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>
>>>> classfile/classLoader.cpp
>>>> - line #1644 pulling the javabase_moduleEntry() check outside of
>>>> the Module_lock maybe problematic if after you check it another
>>>> thread initializes in the time between the check and the
>>>> obtaining of the Module_lock on line #1645.
>>>
>>> Fixed.
>>
>> That looks good. I think it is fine that you are checking if
>> java.base is defined via a call to javabase_moduleEntry() because you
>> are just trying to determine if a ModuleEntry should be created or
>> not. In most cases though using ModuleEntryTable::javabase_defined()
>> is the correct way to ensure that both the ModuleEntry for java.base
>> has been created and that the ModuleEntry has been injected in the
>> corresponding java.lang.Module oop.
>>
>>>
>>>>
>>>> classfile/classLoader.hpp
>>>> - somewhere in ArchivedClassLoaderData there should be an assert to
>>>> make sure that the CLD, whose package entries and module entries
>>>> are being archived is not a "_has_class_mirror_holder" CLD for a
>>>> weakly defined hidden class. Those dedicated CLDs should never
>>>> have package entries or module entries.
>>>>
>>>
>>> I added ArchivedClassLoaderData::assert_valid()
>>>
>>>> classfile/moduleEntry.cpp
>>>> - line #400, typo "conver" --> "convert"
>>>> - line #500, maybe sort if n is greater than 1 instead of 0 (same
>>>> comment for packageEntry.cpp line #270)
>>>>
>>> Fixed
>>>
>>>> classfile/systemDictionary.cpp
>>>> - It looks like the PackageEntry and ModuleEntry tables for the
>>>> system and platform class loaders are added within
>>>> SystemDictionary::compute_java_loaders() which is called from
>>>> Thread::create_vm() but after the call in Thread::create_vm() to
>>>> call_initPhase2 which is when Universe::_module_initialized is set
>>>> to true. Did I read this correctly? If yes, then I think you have
>>>> to be sure that Universe::_module_initialized is not set until the
>>>> module graph for the 3 builtin loaders is loaded from the archive.
>>>>
>>>
>>> I moved the code to be called by ModuleBootstrap::boot() so it
>>> should now happen inside call_initPhase2.
>>
>> Yes, that looks good.
>>
>>>
>>>> memory/filemap.cpp
>>>> - line #237 typo "modue" --> "module"
>>>>
>>>
>>> Fixed
>>>
>>>> - Since the CDS support for capturing the module graph does not
>>>> archive unnamed modules, I assume
>>>> Modules::set_bootloader_unnamed_module() is still called. Is this
>>>> true or does the unnamed module for the boot loader get archived?
>>> The unnamed module for the boot loader is not archived.
>>>
>>> Modules::set_bootloader_unnamed_module() wasn't called in my last
>>> webrev. Thanks for catching it.
>>>
>>> I added a call to BootLoader.getUnnamedModule() in
>>> ModuleBootstrap::boot() to trigger <clinit> of BootLoader, which
>>> will call into the VM for Modules::set_bootloader_unnamed_module().
>>
>> Looks good.
>>
>>>
>>>
>>>
>>>> Clarification/timing questions:
>>>>
>>>
>>> Here's an overall problem I am facing:
>>>
>>> The module graph is archived after the module system has fully
>>> started up. This means that for the boot loader, we only know the
>>> full set of modules/packages, but we don't know which part is the
>>> subset that was initialized during early VM bootstraping (e.g., when
>>> ModuleEntryTable::javabase_defined() == false).
>>>
>>> So the behavior is a bit different during the early bootstrap phase
>>> (all the way before we reach ModuleBootstrap::boot()):
>>> ClassLoaderData::the_null_class_loader_data()->modules() and
>>> ClassLoaderData::the_null_class_loader_data()->packages() are
>>> already populated before a single class is loaded.
>>
>> If this is the case then, at the point when a ModuleEntry is created
>> for java.base using the archived module graph, there should be a
>> assertion that java_lang_Class::_fixup_module_field_list is NULL. To
>> confirm no class has been loaded before java.base is defined. Maybe
>> in ClassLoaderDataShared::serialize where you restore the boot
>> loader's archived modules?
>
> Thanks for pointing this out. It turned out that in the last webrev
> (v02), I had a bug where the module of the primitive classes were not
> initialized. Now I have changed the initialization code to behave the
> same whether archived full module graph is used or not. The only
> differences are:
>
> [1] ClassLoader::create_javabase():
> ModuleEntryTable::javabase_moduleEntry() may be inited by CDS.
>
> [2] When archived full module graph is used,
> ModuleEntryTable::patch_javabase_entries is called from
> Modules::define_archived_modules.
> (java_lang_Class::_fixup_module_field_list is used within this call.)
>
> I also added a new test case: cds/PrimitiveClassMirrors.java
Good test, I'm glad you added that!
>
>>
>>> This difference doesn't seem to make a practical difference. E.g.,
>>> none of our code seems to assume that "before any classes in the
>>> java.util package is loaded,
>>> ClassLoaderData::the_null_class_loader_data()->packages() must not
>>> contain an entry for java.util".
>>>
>>> I think we have two choices when the archived module graph is used:
>>>
>>> [1] We require that the state of the module system must, at every
>>> step of VM initialization, be identical to that of a VM that doesn't
>>> use an archived module graph.
>>>
>>> [2] We make sure that the VM/JDK bootstrap code can tolerate the
>>> existence of module/packages that are added earlier than a VM that
>>> doesn't use an archived module graph.
>>>
>>> I tried doing a version of [1] and found that to be too difficult.
>>> [2] seems much simpler and is the approach I am using now.
>>
>> I think [2] is reasonable.
>>
>>>
>>>> oops/instanceKlass.cpp
>>>> - line #2545, comment "TO DO -- point it to the archived
>>>> PackageEntry (JDK-8249262)"
>>>> are you thinking that since the module graph is read in ahead of
>>>> time that it can be set when an InstanceKlass is created? There is
>>>> a point before java.base is defined that InstanceKlass::set_package
>>>> takes into account that could be a timing issue.
>>>>
>>>>
>>>
>>> I think it will work as if another class in the same package has
>>> already been defined.
>>>
>>>> - There are some checks in modules.cpp that are valuable to have
>>>> during bootstrapping. For example, packages that are encountered
>>>> during bootstrapping before java.base is defined are put into the
>>>> java.base module but then verified after java.base is defined via
>>>> verify_javabase_packages. So at what point in the bootstrapping
>>>> process does the boot loader's archived module's become known?
>>>> Could classes have been defined to the VM before this point? Then
>>>> their packages will have to be verified to be sure that they are
>>>> indeed packages within java.base. Consider looking at other checks
>>>> that occur within classfile/modules.cpp as well.
>>>>
>>> As mentioned above, calling verify_javabase_packages() at run time
>>> will fail, as we have loaded all packages for the boot loader, not
>>> just those for java.base.
>>
>> I think not calling verify_javabase_packages works because as you
>> stated above no classes have been loaded before java.base is defined
>> which is not the same situation as running without the archived
>> module graph.
>>
>> A couple of additional comments:
>>
>> - ModuleEntry's reads list and PackageEntry's exports list. We had
>> hoped eventually to change these lists from being a c heap allocated
>> GrowableArray over to a ResourceHashTable for faster lookup. Doing
>> that as a separate RFE might help CDS archiving not to have to
>> archive those lists as an Array?
>>
>
> CDS cannot archive ResourceHashTable either. We have CompactHashtable
> which can be archived, but it cannot be modified at run time. I think
> the export list can be modified at run time with
> java.lang.Module::addExports(), so we probably need to do it like:
>
> - if a Module was archived, check its CompactHashtable first
> - if not found, check the ResourceHashTable
>
> This would make the start-up a little faster (no more copying from the
> array lists into the hashtable, but would make the code more
> complicated. I need to investigate to see if it's worthwhile.
>
>> - moduleEntry.cpp and packageEntry.cpp: Both methods
>> "compare_module_by_name" and "compare_package_by_name" should have an
>> assert if the names are equal. No ModuleEntryTable or
>> PackageEntryTable should ever have 2 same named modules or packages.
>>
>
> I tried adding:
>
> static int compare_package_by_name(PackageEntry* a, PackageEntry* b) {
> assert(a != b && a->name() != b->name(), "array should not contain
> duplicated entries");
> return a->name()->fast_compare(b->name());
> }
>
> but this caused an assert, because our QuickSort implementation would
> sometimes compare the same element!
>
> #7 0x00007ffff659238b in QuickSort::partition<true, PackageEntry*,
> int (*)(void const*, void const*)> (array=0x80043fcb0,
> pivot=248, length=496, comparator=0x7ffff6590a5a
> <compare_package_by_name(PackageEntry*, PackageEntry*)>)
> at /jdk2/gil/open/src/hotspot/share/utilities/quickSort.hpp:76
> 76 for ( ; comparator(array[left_index], pivot_val) < 0;
> ++left_index) {
> (gdb) p array[left_index]
> $1 = (PackageEntry *) 0x7ffff0439d00
> (gdb) p pivot_val
> $2 = (PackageEntry *) 0x7ffff0439d00
> (gdb) p pivot
> $3 = 248
> (gdb) p left_index
> $4 = 248
>
> So I ended up with:
>
> static int compare_package_by_name(PackageEntry* a, PackageEntry* b) {
> assert(a == b || a->name() != b->name(), "no duplicated names");
> return a->name()->fast_compare(b->name());
> }
Looks good to me!
>
>> - Another timing clarification question for me. I assume that the
>> module graph is dumped post module initialization when Java has
>> completely defined the module graph to the JVM, is this correct?
>
> Yes
>
>> My concern is that there could be a window post module initialization
>> and pre archiving the module graph where another thread could define
>> a new module or add Module readability or add Package exportability.
>> So that the module graph you are dumping is not the same module graph
>> at the end of module initialization. Is this a concern?
>
> We don't allow arbitrary code to be executed during -Xshare:dump, so
> the module graph shouldn't be changed after module initialization has
> finished.
>
> I've added Modules::check_cds_restrictions() to check for this.
A question about this because a user's program can define modules post
module initialization via ModuleDescriptor.newModule(). See for
example, tests within
open/test/hotspot/jtreg/runtime/module/AccessCheck. So all of these
tests would trigger check_cds_restrictions() if -Xshare:dump was turned
on. Is that a concern?
Thanks,
Lois
>
> I also added asserts to make sure that none of the classes used by the
> archived module graph can be modified by JVMTI. All these classes are
> loaded in the "early" phase of JVMTI, and we would disable CDS if a
> JVMTI agent is registered to modify classes in this phase, so we
> should be safe, but the asserts will ensure that. I updated the
> ReplaceCriticalClassesForSubgraphs.java test and added a new test
> RedefineClassesInModuleGraph.java.
>
>
> Thanks
> - Ioi
>
>
>>
>> Thanks,
>> Lois
>>
>>>
>>> Well, verify_javabase_packages() was called once and it succeeded,
>>> but that was during CDS dump time. So we know at least we have
>>> verified this once :-)
>>>
>>> Thanks
>>> - Ioi
>>>
>>>> I may have more review comments as I continue to look at this!
>>>>
>>>> Thanks,
>>>> Lois
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>>
>
More information about the hotspot-runtime-dev
mailing list