JDK-8275509: (jlink) SystemModulesPlugin generates a jdk.internal.module.SystemModules$all.class which isn't reproducible

Ioi Lam ioi.lam at oracle.com
Wed Dec 1 21:39:00 UTC 2021



On 11/7/21 9:44 PM, Jaikiran Pai wrote:
> Hello Ioi,
>
> On 02/11/21 12:13 am, Ioi Lam wrote:
>> Hi Jaikiran,
>>
>> Thanks for writing the test case to explore the problems in this area.
>>
>> Please see my comments below:
>> ...
>>
>> Generally speaking, CDS has two levels of archiving:
>>
>> [1] archiving class metadata -- classes in the 
>> $JAVA_HOME/lib/classlist are considered to be frequently loaded 
>> classes. They are parsed from classfiles and stored into the CDS 
>> archive. At run time, instead of parsing the classes from classfiles, 
>> the VM directly use the pre-parsed version of these classes (as 
>> InstanceKlass* in C++).
>>
>> At runtime, all such pre-parsed classes are initially in the "loaded" 
>> state. This means their static constructors will be executed when 
>> these classes are referenced for the first time. So as far as Java 
>> semantic is concerned, there's no difference between a pre-parsed 
>> class vs a class loaded from classfile.
>>
>> E.g, the examples of loggers in static initializers will be executed 
>> at runtime.
>>
>> [2] archiving heap objects
>>
>> As shown in your test, we cannot arbitrarily archive the static 
>> fields that were initialized during -Xshare:dump, because they may 
>> have environment dependency.
>>
>> The strategy used by CDS is to archive only a few static fields in a 
>> small number of carefully hand-picked system classes. You can see the 
>> list in
>>
>> https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/977154400be786c500f36ba14188bff79db57075/src/hotspot/share/cds/heapShared.cpp*L97__;Iw!!ACWV5N9M2RV99hQ!eWsSVUa8qjZ0BWlbOonNdDtE7dcU3w4c9Su5hb24IXirxZFdPoS6wVBMi-78hA$ 
>>
>>
> Thank you for that link. That helped. So essentially even though the 
> list of classes used for archiving class metadata isn't very tightly 
> controlled, the list of objects which are archived in the heap is much 
> more selective.
>
> The reason why my PoC ended up reproducing this issue is because it 
> just so happened that I selected a class (ModuleDescriptor) which 
> (indirectly) is hand-picked in that list of classes that can end up in 
> the archived heap.
>
>> These static fields are stored into the CDS archive. At run time, 
>> these fields are essentially copied into the Java heap, and then 
>> picked up by code like this:
>>
>> https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/977154400be786c500f36ba14188bff79db57075/src/java.base/share/classes/jdk/internal/module/ModuleBootstrap.java*L163__;Iw!!ACWV5N9M2RV99hQ!eWsSVUa8qjZ0BWlbOonNdDtE7dcU3w4c9Su5hb24IXirxZFdPoS6wVABkO6Q0w$ 
>>
>>
>>     public static ModuleLayer boot() {
>>         Counters.start();
>>
>>         ModuleLayer bootLayer;
>>         ArchivedBootLayer archivedBootLayer = ArchivedBootLayer.get();
>>         if (archivedBootLayer != null) {
>>             assert canUseArchivedBootLayer();
>>             bootLayer = archivedBootLayer.bootLayer();
>>             BootLoader.getUnnamedModule(); // trigger <clinit> of 
>> BootLoader.
>> CDS.defineArchivedModules(ClassLoaders.platformClassLoader(), 
>> ClassLoaders.appClassLoader());
>>
>>             // assume boot layer has at least one module providing a 
>> service
>>             // that is mapped to the application class loader.
>>             JLA.bindToLoader(bootLayer, ClassLoaders.appClassLoader());
>>         } else {
>>             bootLayer = boot2();
>>         }
>>
>> In the case of the module graph, we remove things that depend on the 
>> environment (such as CLASSPATH)
>>
>> https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/977154400be786c500f36ba14188bff79db57075/src/hotspot/share/cds/heapShared.cpp*L190__;Iw!!ACWV5N9M2RV99hQ!eWsSVUa8qjZ0BWlbOonNdDtE7dcU3w4c9Su5hb24IXirxZFdPoS6wVAx46l0Bg$ 
>>
>>
>> The remaining parts of the archived module graph only depend on the 
>> following system properties:
>>
>>     private static boolean canUseArchivedBootLayer() {
>>         return getProperty("jdk.module.upgrade.path") == null &&
>>                getProperty("jdk.module.path") == null &&
>>                getProperty("jdk.module.patch.0") == null &&       // 
>> --patch-module
>>                getProperty("jdk.module.main") == null &&          // 
>> --module
>>                getProperty("jdk.module.addmods.0") == null &&    // 
>> --add-modules
>>                getProperty("jdk.module.limitmods") == null &&     // 
>> --limit-modules
>>                getProperty("jdk.module.addreads.0") == null &&    // 
>> --add-reads
>>                getProperty("jdk.module.addexports.0") == null &&  // 
>> --add-exports
>>                getProperty("jdk.module.addopens.0") == null; // 
>> --add-opens
>>     }
>>
>> As a result, we will invalidate the archived module graph if these 
>> properties differ between dump time and run time. The Java code above 
>> only asserts that the check has already been done. The actual check 
>> is done in here:
>>
>> https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/977154400be786c500f36ba14188bff79db57075/src/hotspot/share/runtime/arguments.cpp*L1339__;Iw!!ACWV5N9M2RV99hQ!eWsSVUa8qjZ0BWlbOonNdDtE7dcU3w4c9Su5hb24IXirxZFdPoS6wVAwq_becQ$ 
>>
>
> Understood.
>
>>
>>> Am I misunderstanding the severity of this issue or is this serious 
>>> enough that -Xshare:off should be default (or heap archiving 
>>> disabled somehow by default till this is fixed) to prevent issues 
>>> like these which can at the minimal be hard to debug bugs and on the 
>>> extreme end perhaps leak things from the build server where the JDK 
>>> was built? I guess it all boils down to which exact classes get 
>>> replaced/mapped/copied over from the heap archive? Is there a 
>>> definitive list that can be derived in the current JDK?
>>>
>>
>> I believe the currently implementation is still safe to use (barring 
>> the problems with enums). For sanity, I'll try to write a static 
>> analysis tool to check that the archived module graph doesn't contain 
>> any reference to fields that may be reinitialized at runtime.
>
> I think if such a static analysis tool can be developed, then it would 
> certainly be useful/reassuring that we don't accidentally end up with 
> unexpected data in the archived heap. I'm not a Reviewer but I can 
> imagine it being difficult to catch these changes that have valid Java 
> semantics but at the same time can cause issues due to archiving the 
> heap. So if this static tool can be automated maybe as a jtreg test 
> case then any such changes could automatically be caught before those 
> changes end up in a release.
>
> Thank you again for the detailed response along with pointers to the 
> code. That helped understand this heap archiving process to a large 
> extent. 

Hi Jaikiran,

I finally finished the static analysis tool and the fix for the archived 
Enums. Please see https://github.com/openjdk/jdk/pull/6653

The tool requires quite a bit of human interaction (see the ADD_EXCL 
lines in cdsHeapVerifier.cpp).

The output of the tool is clean right now. The test case 
ArchivedEnumTest.java monitors the tool's output. If anything is flagged 
in the future (due to changes inside the Java code), it will result in a 
test failure so hopefully someone will take a look, and either fix the 
bug, or fix the ADD_EXCL lines :-(

If you have any suggestion for improvement, please let me know.

Thanks
- Ioi





More information about the core-libs-dev mailing list