JDK-8275509: (jlink) SystemModulesPlugin generates a jdk.internal.module.SystemModules$all.class which isn't reproducible
Jaikiran Pai
jai.forums2013 at gmail.com
Tue Oct 19 12:31:07 UTC 2021
This relates to the intermittent failures in the
tools/jlink/JLinkReproducibleTest.java test case, which has been
ProblemListed for a while now. The root cause is tracked in
https://bugs.openjdk.java.net/browse/JDK-8275509. I couldn't find a
mailing list specific to the jlink tool, and I remember seeing
jlink/jpackage related discussions on this mailing list previously, so
I am starting this discussion here.
The jlink tool uses plugins to transform the contents of the modules
that are part of the image. One such plugin is the
SystemModulesPlugin; as per its javadoc, its role is to prevent
parsing of module-info.class files during JVM startup. To do so, it
dynamically generates specialized class files (during jlink). The
class file(s) are then bundled into the image that gets generated.
One such class file is a dynamically generated
jdk/internal/module/SystemModules$all.class. The SystemModulesPlugin
generates the bytecode for this class. The details of that bytecode are
very implementation specific. One part of the bytecode generation
involves bytecode for an internal method called "moduleDescriptors()".
One of the statements generated for this method implementation is a call
to jdk.internal.module.Builder.build(int hashCode) method[1]. What this
call does is, it creates a (pre-populated/validated) instance of the
java.lang.module.ModuleDescriptor.
The SystemModulesPlugin, when generating the bytecode of this method,
uses the hashCode() of the ModuleDescriptor of the current JVM instance
(through a typical ModuleDescriptor#hashCode() call)[2]. By doing so, it
ends up generating bytecode which "embeds" the current runtime specific
hashcode into the generated class file.
The contract of java.lang.Object#hashCode() states:
"
...
This integer need not remain consistent from one execution of an
application to another execution of the same application.
....
"
Effectively, what this means is that if jlink is used to generate an
image with the exact same set of modules (requires, packages, exports,
opens, etc.), it still can (and does) end up generating a
jdk/internal/module/SystemModules$all.class file whose binary content
differs across these runs, making it non-reproducible.
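A quick way to observe this is to print the hashcode of a small
ModuleDescriptor and run the program several times. This is only a
sketch: the module and package names below are made up, and whether
the printed value actually changes between runs depends on the JDK
version being used.

```java
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleDescriptor.Requires;
import java.util.Set;

public class DescriptorHashDemo {

    // Builds a small descriptor. The module and package names are
    // hypothetical and exist only for this illustration.
    public static ModuleDescriptor sample() {
        return ModuleDescriptor.newModule("com.example.demo")
                .requires(Set.of(Requires.Modifier.TRANSITIVE), "java.logging")
                .exports("com.example.demo.api")
                .build();
    }

    public static int sampleHash() {
        return sample().hashCode();
    }

    public static void main(String[] args) {
        // Run this program several times and compare the printed values.
        // Within one JVM run the value is consistent; across runs it is
        // allowed to differ.
        System.out.println("hashCode = " + sampleHash());
    }
}
```

Within a single run, two equal descriptors always report the same
hashcode; it is only across separate JVM runs that the value is not
guaranteed to match.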
The implementation in java.lang.module.ModuleDescriptor is such that if
the hashcode passed to it (during construction) is 0, then it lazily
computes the correct hashcode the first time hashCode() is invoked at
runtime. This means that the SystemModulesPlugin, in its bytecode
generation for the moduleDescriptors() method, could always pass 0 as
the hashcode to the jdk.internal.module.Builder.build(int hashCode)
call. I experimented with that change and, after hundreds of runs of
the JLinkReproducibleTest, I no longer get any
failures. However, given that the SystemModulesPlugin's goal appears to
be to reduce the boot time for system modules, I was wondering whether
this change would introduce a performance penalty big enough to matter.
With this change, the first time hashCode() is called on a system
module's ModuleDescriptor instance, the value has to be computed, and
that computation is relatively expensive (given how many "components"
it uses to calculate it[3]). It's a (mostly) one-time cost for each
instance, but I don't know how expensive that would be. Do we have any
existing benchmarks in this area that I could reuse to see what
performance impact this might have? Keeping aside the performance issue
for a bit, is this proposed patch something worth considering:
--- a/src/jdk.jlink/share/classes/jdk/tools/jlink/internal/plugins/SystemModulesPlugin.java
+++ b/src/jdk.jlink/share/classes/jdk/tools/jlink/internal/plugins/SystemModulesPlugin.java
@@ -1099,7 +1099,13 @@ public final class SystemModulesPlugin extends AbstractPlugin {
             mv.visitVarInsn(ALOAD, MD_VAR);
             pushInt(mv, index);
             mv.visitVarInsn(ALOAD, BUILDER_VAR);
-            mv.visitLdcInsn(md.hashCode());
+            // Let the ModuleDescriptor hashcode be computed at runtime.
+            // Embedding the current hashcode of the ModuleDescriptor
+            // into the bytecode of a generated class can cause the generated
+            // bytecode to be not reproducible, since an object's hashcode is allowed
+            // to change across JVM runs.
+            mv.visitLdcInsn(0); // the hashcode to be passed to the
+                                // jdk.internal.module.Builder.build(int) method
             mv.visitMethodInsn(INVOKEVIRTUAL, MODULE_DESCRIPTOR_BUILDER,
                                "build", "(I)Ljava/lang/module/ModuleDescriptor;",
                                false);
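
For readers unfamiliar with the idiom, the lazy computation that this
patch relies on can be sketched roughly as follows. This is a
simplified, hypothetical class (LazyHashDescriptor, its fields and the
computations counter are made up for illustration), not the actual
ModuleDescriptor code:

```java
import java.util.Objects;
import java.util.Set;

public final class LazyHashDescriptor {
    private final String name;
    private final Set<String> packages;
    private int hash;          // 0 means "not computed yet"
    private int computations;  // illustration only: counts real computations

    public LazyHashDescriptor(String name, Set<String> packages, int precomputedHash) {
        this.name = name;
        this.packages = Set.copyOf(packages);
        this.hash = precomputedHash;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            // No precomputed value was supplied: compute once and cache.
            computations++;
            h = Objects.hash(name, packages);
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof LazyHashDescriptor other
                && name.equals(other.name)
                && packages.equals(other.packages);
    }

    public int computations() {
        return computations;
    }
}
```

Passing 0 defers the work to the first hashCode() call, after which
the cached value is returned; passing a non-zero value skips the
computation entirely, which is what the currently generated bytecode
does.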
The other option I experimented with was to make
ModuleDescriptor#hashCode() generate the same hashcode across multiple
JVM runs. Although I do have a "working" version of that change, I
decided not to spend too much time on it because the
java.lang.Object#hashCode() contract itself clearly states that this
value isn't expected to be the same across multiple JVM runs. So
whatever I do here is going to be brittle.
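
To illustrate what a run-independent hash could look like, here is a
rough sketch. It is not the "working" version mentioned above. It
assumes that enum-typed components (such as the Requires modifiers)
are one source of variation, since Enum#hashCode() is the identity
hashcode inherited from Object, whereas String#hashCode() is fully
specified. The class and method names are hypothetical.

```java
import java.lang.module.ModuleDescriptor.Requires.Modifier;
import java.util.Set;

public class StableModifierHash {

    // Relies on Enum#hashCode(), which is Object's identity hashcode and
    // is therefore allowed to differ from one JVM run to another.
    public static int identityBasedHash(Set<Modifier> mods) {
        return mods.hashCode();
    }

    // Uses the constant names instead. String#hashCode() is specified by
    // its javadoc, so this sum is the same in every run. Summing keeps
    // the result independent of the set's iteration order.
    public static int stableHash(Set<Modifier> mods) {
        int h = 0;
        for (Modifier m : mods) {
            h += m.name().hashCode();
        }
        return h;
    }

    public static void main(String[] args) {
        Set<Modifier> mods = Set.of(Modifier.TRANSITIVE, Modifier.STATIC);
        System.out.println("identity-based: " + identityBasedHash(mods));
        System.out.println("stable:         " + stableHash(mods));
    }
}
```

Any approach along these lines would have to be applied to every
component that feeds into the hashcode, which is part of why it feels
brittle.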
[1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/module/Builder.java#L265
[2] https://github.com/openjdk/jdk/blob/master/src/jdk.jlink/share/classes/jdk/tools/jlink/internal/plugins/SystemModulesPlugin.java#L1102
[3] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/module/ModuleDescriptor.java#L2250
P.S. Most of the careful investigation work for these intermittent
failures had already been done by Dongbo He in
https://bugs.openjdk.java.net/browse/JDK-8258945. I just used that
script and those details to try and come up with a patch.
-Jaikiran
More information about the core-libs-dev
mailing list