RFR: 8275509: ModuleDescriptor.hashCode isn't reproducible across builds [v9]

Wed Nov 3 03:20:42 UTC 2021

Hello Alan,

On 02/11/21 5:30 pm, Alan Bateman wrote:
> On 02/11/2021 06:38, Jaikiran Pai wrote:
>> :
>>
>> Perhaps run 1 writing the hash code of each of the boot modules' 
>> descriptor into a file and then run 2 reading that same file to 
>> compare the hash codes would be one way to do it. But I think that 
>> would just make this test more complex, which I think is avoidable.
>
> I'm not sure that we really need this. There are several jlink tests 
> that check reproducibility, we think one of them is failing 
> intermittently with this issue already. So maybe we should leave build 
> reproducibility to the jlink tests and expand them as needed.
>
> For ModuleDescriptor::hashCode then I was hoping we keep it simple and 
> just test that the hashCode of the descriptors for modules in the boot 
> layer matches the hashCode computed from re-parsing the module-info 
> files, e.g. something simple like this:
>
>       for (Module module : ModuleLayer.boot().modules()) {
>             System.out.println(module);
>             ModuleDescriptor descriptor1 = module.getDescriptor();
>             ModuleDescriptor descriptor2;
>             try (InputStream in = 
> module.getResourceAsStream("module-info.class")) {
>                 descriptor2 = ModuleDescriptor.read(in);
>             }
>             assertEquals(descriptor1, descriptor2);
>             assertEquals(descriptor1.hashCode(), descriptor2.hashCode());
>             assertEquals(descriptor1.compareTo(descriptor2), 0);
>         }
>
> It run can twice, with with the default, the other with -Xshare:off.

Sounds reasonable. I've updated the PR to simplify the test and run it 
once with default CDS and once with -Xshare:off. Given this 
simplification, it now allows me to use testng for these comparisons, so 
I've moved it from regular "main" to testng test case. I hope that's OK.

>
> One other thought on the change to ModuleDescriptor is that we could 
> drop the modifiers from the computation of the hashCode of 
> ModuleDescriptor, Requires, Exports, Opens. Summing the ordinals is 
> okay but it doesn't give a good spread if you really have modules that 
> have everything the same except for the modifiers then it doesn't 
> really help.

I gave it a bit of thought. However I let the current PR to continue to 
use ordinals for the hash code computation. The reason for that is this 
statement on the Object.hashCode() method which says:

"the programmer should be aware that producing distinct integer results 
for unequal objects may improve the performance of hash tables."

So if two module descriptors have all components equals() except for 
their modifiers and if we don't use the modifiers in our hashCode() 
computation, then they end up producing the same hashCode() which won't 
violate the spec but will not satisfy the above statement about performance.

So I decided to stick with using the ordinal modifiers in the hashCode() 
computation. However, I'm not an expert on the performance aspects of 
the Collections and how much using the modifiers for hashCode() will 
improve the performance in this case. Perhaps that's what you meant by 
it not giving a good (enough) spread? If so, do let me know and I'll go 
ahead and update the PR to remove using the modifiers in the hashCode() 
computation and also update the javadoc of each of those hashCode() 
methods which state that the modifiers are used in the hashCode() 
computation.

-Jaikiran