JDK-8275509: (jlink) SystemModulesPlugin generates a jdk.internal.module.SystemModules$all.class which isn't reproducible

Jaikiran Pai jai.forums2013 at gmail.com
Tue Oct 19 13:49:21 UTC 2021


Hello Alan,

On 19/10/21 6:59 pm, Alan Bateman wrote:
> On 19/10/2021 13:31, Jaikiran Pai wrote:
>> This relates to the intermittent failures in 
>> tools/jlink/JLinkReproducibleTest.java test case which has been 
>> ProblemListed for a while now. The root cause is 
>> https://bugs.openjdk.java.net/browse/JDK-8275509. I couldn't find any 
>> specific mailing lists for jlink tool and I remember seeing 
>> jlink/jpackage related discussions on this mailing list previously, 
>> so creating this discussion here.
>
> Reproducible builds have been a game of whack-a-mole. Many issues have 
> been fixed in recent releases to the JDK is a lot better than it used 
> to be. As it happens, someone else interested in reproducible builds 
> brought up the issue of the hashCode on jigsaw-dev a few weeks ago [1].

Ah! So this exact same investigation had already happened a few weeks 
back then. I haven't subscribed to that list, so missed it. I see in one 
of those messages this part:

"Off hand I can't think of any issues with the ModuleDescriptor 
hashCode. It is computed at link time and should be deterministic. If I 
were to guess then then this may be something to do with the module 
version recorded at compile-time at that is one of the components that 
the hash is based on."

To be clear, is the ModuleDescriptor#hashCode() expected to return 
reproducible (same) hashCode across multiple runs? What currently 
changes the hashCode() across multiple runs is various components within 
ModuleDescriptor's hashCode() implementation using the hashCode() of the 
enums (specifically the various Modifier enums). For example here 
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/module/ModuleDescriptor.java#L330 
(the mods.hashCode()). Since the hashCode() returned by enums is 
literally through a call to java.lang.Object#hashCode(), those 
hashCode() value ended up changing across JVM runs, in one of the setups 
I was testing (which I didn't consider a surprise since that's what the 
Object#hashCode() stated).

The other approach that I talked about in my previous mail of trying to 
make ModuleDescriptor#hashCode() reproducible involved using the enum's 
ordinal value as a part of the hashCode() computation instead of calling 
the enum's hashCode() method. Very crudely, it looked like:

diff --git 
a/src/java.base/share/classes/java/lang/module/ModuleDescriptor.java 
b/src/java.base/share/classes/java/lang/module/ModuleDescriptor.java
index a412dd753cc..13c8cd04360 100644
--- a/src/java.base/share/classes/java/lang/module/ModuleDescriptor.java
+++ b/src/java.base/share/classes/java/lang/module/ModuleDescriptor.java
@@ -327,7 +327,7 @@ public class ModuleDescriptor
           */
          @Override
          public int hashCode() {
-            int hash = name.hashCode() * 43 + mods.hashCode();
+            int hash = name.hashCode() * 43 + enumOrdinalHashCode(mods);
              if (compiledVersion != null)
                  hash = hash * 43 + compiledVersion.hashCode();
              if (rawCompiledVersion != null)
@@ -505,7 +505,7 @@ public class ModuleDescriptor
           */
          @Override
          public int hashCode() {
-            int hash = mods.hashCode();
+            int hash = enumOrdinalHashCode(mods);
              hash = hash * 43 + source.hashCode();
              return hash * 43 + targets.hashCode();
          }
@@ -708,7 +708,7 @@ public class ModuleDescriptor
           */
          @Override
          public int hashCode() {
-            int hash = mods.hashCode();
+            int hash = enumOrdinalHashCode(mods);
              hash = hash * 43 + source.hashCode();
              return hash * 43 + targets.hashCode();
          }
@@ -2261,7 +2261,7 @@ public class ModuleDescriptor
          int hc = hash;
          if (hc == 0) {
              hc = name.hashCode();
-            hc = hc * 43 + Objects.hashCode(modifiers);
+            hc = hc * 43 + enumOrdinalHashCode(modifiers);
              hc = hc * 43 + requires.hashCode();
              hc = hc * 43 + Objects.hashCode(packages);
              hc = hc * 43 + exports.hashCode();
@@ -2546,6 +2546,21 @@ public class ModuleDescriptor
                  .collect(Collectors.joining(" "));
      }

+    /**
+     * Generates and returns a hashcode for the enum instances. The 
returned hashcode
+     * is a sum of each of the enum instances' {@link Enum#ordinal() 
ordinal} value.
+     */
+    private static int enumOrdinalHashCode(final Iterable<? extends 
Enum<?>> enums) {
+        int h = 0;
+        for (final Enum<?> e : enums) {
+            if (e == null) {
+                continue;
+            }
+            h += e.ordinal();
+        }
+        return h;
+    }
+
      private static <T extends Object & Comparable<? super T>>
      int compare(T obj1, T obj2) {
          if (obj1 != null) {

With this change (and this change only, no changes in 
SystemModulesPlugin were needed) I was able to consistently run that 
test without any failures. But I didn't pursue this effort further 
because I thought making ModuleDescriptor#hashCode() return reproducible 
result wasn't a goal.

Do you think we should be making ModuleDescriptor#hashCode() 
deterministic and reproducible across runs? If so, if that above patch 
looks reasonable, I can clean it up a bit and run some further tests.


-Jaikiran



More information about the core-libs-dev mailing list