JDK-8275509: (jlink) SystemModulesPlugin generates a jdk.internal.module.SystemModules$all.class which isn't reproducible

Jaikiran Pai jai.forums2013 at gmail.com
Tue Oct 19 12:31:07 UTC 2021


This relates to the intermittent failures in the
tools/jlink/JLinkReproducibleTest.java test case, which has been
ProblemListed for a while now. The root cause is
https://bugs.openjdk.java.net/browse/JDK-8275509. I couldn't find a
mailing list specific to the jlink tool, and I remember seeing
jlink/jpackage related discussions on this mailing list previously, so
I am starting this discussion here.

The jlink tool uses plugins to transform the contents of the modules
that are part of the image. One such plugin is the SystemModulesPlugin;
as per its javadoc, its role is to avoid parsing module-info.class
files during JVM startup. To do so, it dynamically generates
specialized class files during the jlink run. These class files are
then bundled into the generated image.

One such class file is a dynamically generated 
jdk/internal/module/SystemModules$all.class. The SystemModulesPlugin 
generates the bytecode for this class. The details of that bytecode are 
very implementation specific. One part of the bytecode generation 
involves bytecode for an internal method called "moduleDescriptors()". 
One of the statements generated for this method is a call to the
jdk.internal.module.Builder.build(int hashCode) method[1], which
creates a (pre-populated/validated) instance of
java.lang.module.ModuleDescriptor.

The SystemModulesPlugin, when generating the bytecode of this method, 
uses the hashCode() of the ModuleDescriptor of the current JVM instance 
(through a typical ModuleDescriptor#hashCode() call)[2]. By doing so, it 
ends up generating bytecode which "embeds" the current runtime specific 
hashcode into the generated class file.

The contract of java.lang.Object#hashCode() states:

"
...
This integer need not remain consistent from one execution of an 
application to another execution of the same application.
....
"

In effect, if jlink is run more than once with the exact same set of
modules (requires, packages, exports, opens etc.), it still can (and
does) generate a jdk/internal/module/SystemModules$all.class file whose
binary content differs between those runs, which makes the image
non-reproducible.
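
To make the contract concrete, here is a minimal sketch (not taken from
the JDK sources) that prints three hash values. String.hashCode() is
specified and therefore identical in every run, whereas Enum.hashCode()
is final and identity-based, so it may differ from one JVM run to the
next. The module name "example.m" and package "example.m.api" are made
up for illustration. Whether the ModuleDescriptor value changes between
runs depends on the JDK in use, so the sketch only prints it for
comparison across runs and does not claim a particular result.

```java
import java.lang.module.ModuleDescriptor;
import java.util.Set;

final class HashCodeStabilitySketch {

    // Builds a small descriptor for a hypothetical module "example.m".
    static ModuleDescriptor sampleDescriptor() {
        return ModuleDescriptor.newModule("example.m")
                .requires(Set.of(ModuleDescriptor.Requires.Modifier.TRANSITIVE),
                          "java.logging")
                .exports("example.m.api")
                .build();
    }

    public static void main(String[] args) {
        // Specified by String.hashCode(): identical in every JVM run.
        System.out.println("String hash:     " + "java.base".hashCode());

        // Enum.hashCode() is identity-based: may differ between JVM runs.
        System.out.println("Enum hash:       "
                + ModuleDescriptor.Requires.Modifier.TRANSITIVE.hashCode());

        // Run this program several times and compare the printed value.
        System.out.println("Descriptor hash: " + sampleDescriptor().hashCode());
    }
}
```

Within a single JVM run, two equal descriptors always report the same
hashcode; the reproducibility question only arises when a value from one
run is written into a class file and compared with the value from
another run.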

The implementation in java.lang.module.ModuleDescriptor is such that if
the hashcode passed to it during construction is 0, it lazily computes
the hashcode the first time hashCode() is invoked at runtime. This
means that the bytecode the SystemModulesPlugin generates for the
moduleDescriptors() method could always pass 0 as the hashcode to the
jdk.internal.module.Builder.build(int hashCode) call. I experimented
with that change and, over hundreds of runs of JLinkReproducibleTest, I
no longer get any failures.

However, given that the SystemModulesPlugin's goal appears to be to
reduce the boot time for system modules, I was wondering whether this
change would introduce a noticeable performance penalty. With this
change, the first call to hashCode() on each system module's
ModuleDescriptor has to compute the value, and that computation is
relatively expensive given how many "components" go into it[3]. It's a
(mostly) one-time cost for each instance, but I don't know how large
that cost is. Do we have any existing benchmarks in this area that I
could reuse to measure the performance impact?

Keeping aside the performance issue for a bit, is this proposed patch
something worth considering:

--- a/src/jdk.jlink/share/classes/jdk/tools/jlink/internal/plugins/SystemModulesPlugin.java
+++ b/src/jdk.jlink/share/classes/jdk/tools/jlink/internal/plugins/SystemModulesPlugin.java
@@ -1099,7 +1099,13 @@ public final class SystemModulesPlugin extends AbstractPlugin {
                 mv.visitVarInsn(ALOAD, MD_VAR);
                 pushInt(mv, index);
                 mv.visitVarInsn(ALOAD, BUILDER_VAR);
-                mv.visitLdcInsn(md.hashCode());
+                // Let the ModuleDescriptor hashcode be computed at runtime.
+                // Embedding the current hashcode of the ModuleDescriptor
+                // into the bytecode of a generated class can cause the generated
+                // bytecode to be not reproducible, since an object's hashcode is allowed
+                // to change across JVM runs.
+                mv.visitLdcInsn(0); // the hashcode to be passed to the
+                                          // jdk.internal.module.Builder.build(int) method
                 mv.visitMethodInsn(INVOKEVIRTUAL, MODULE_DESCRIPTOR_BUILDER,
                     "build", "(I)Ljava/lang/module/ModuleDescriptor;",
                     false);
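
The lazy computation that passing 0 relies on can be sketched as
follows. This is only an illustration of the idiom under stated
assumptions, not the actual ModuleDescriptor code: the class
LazyHashSketch, its two fields and the "computations" counter are
hypothetical, and the hash formula is arbitrary.

```java
import java.util.Set;

final class LazyHashSketch {
    private final String name;
    private final Set<String> packages;
    private int hash;   // 0 means "not computed yet"
    int computations;   // illustration only: counts full computations

    // precomputedHash == 0 asks for the hash to be computed lazily.
    LazyHashSketch(String name, Set<String> packages, int precomputedHash) {
        this.name = name;
        this.packages = Set.copyOf(packages);
        this.hash = precomputedHash;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            // First call (or the rare case where the real hash is 0):
            // compute from the components and cache the result.
            computations++;
            h = 31 * name.hashCode() + packages.hashCode();
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof LazyHashSketch other
                && name.equals(other.name)
                && packages.equals(other.packages);
    }

    public static void main(String[] args) {
        LazyHashSketch lazy = new LazyHashSketch("m", Set.of("p"), 0);
        lazy.hashCode();
        lazy.hashCode();
        System.out.println("computations after two calls: " + lazy.computations);
    }
}
```

In this sketch the cost of the computation is paid once per instance, on
the first hashCode() call, which is the cost the performance question
above is about. An instance whose computed hash happens to be 0 would be
recomputed on every call.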


The other option I experimented with was to make
ModuleDescriptor#hashCode() generate the same hashcode across multiple
JVM runs. Although I do have a "working" version of that change, I
decided not to spend too much time on it because the
java.lang.Object#hashCode() contract itself clearly states that this
value isn't expected to be the same across multiple JVM runs. So
whatever I do here is going to be brittle.


[1] 
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/module/Builder.java#L265
[2] 
https://github.com/openjdk/jdk/blob/master/src/jdk.jlink/share/classes/jdk/tools/jlink/internal/plugins/SystemModulesPlugin.java#L1102
[3] 
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/module/ModuleDescriptor.java#L2250


P.S: Most of the detailed investigation work for these intermittent
failures was already done by Dongbo He in
https://bugs.openjdk.java.net/browse/JDK-8258945. I just used that
script and those details to try and come up with a patch.



-Jaikiran


