RFC: Change CDS JAR file validation

Ioi Lam ioi.lam at oracle.com
Thu Sep 1 04:47:46 UTC 2022


Proposal Summary:

Use a digest of a JAR file to detect if the file has changed


Background
==========

CDS is in effect a caching mechanism -- it needs to make sure that the 
InstanceKlasses stored in the archive are the same as those parsed from 
classfiles.

To do this, we archive only the classes from (a) the JDK's modules image 
file and (b) JAR files. We don't archive classes in directories since 
it's difficult to check if the contents of a directory have changed.

At runtime, we assume that (a) didn't change, since we require the exact 
same JDK build to be used.

For (b) we currently do this:

(1) Check that -classpath and -Xbootclasspath (absolute paths) are 
identical between run time and dump time.
(2) For each JAR file in cp and bcp, check if its size or modification 
time has changed.
(3) (Something similar happens with the module path ....)
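
To make the current scheme concrete, here is a minimal sketch in Python 
(not the actual HotSpot code). The names JarRecord, record_classpath and 
validate_current are hypothetical and exist only for illustration. It 
assumes the classpath is given as a list of entries and that the check 
in (1) is a strict equality test on absolute paths.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class JarRecord:
    """What gets remembered about one JAR at dump time (illustrative)."""
    path: str       # absolute path of the JAR
    size: int       # file size in bytes
    mtime_ns: int   # modification time in nanoseconds


def record_jar(path):
    st = os.stat(path)
    return JarRecord(os.path.abspath(path), st.st_size, st.st_mtime_ns)


def record_classpath(classpath):
    """Dump time: remember path, size and mtime of every entry."""
    return [record_jar(p) for p in classpath]


def validate_current(records, runtime_classpath):
    """Run time: steps (1) and (2), using only stat() and no file reads."""
    runtime_paths = [os.path.abspath(p) for p in runtime_classpath]
    # (1) The absolute paths must be identical, in the same order.
    if [r.path for r in records] != runtime_paths:
        return False
    # (2) Size and modification time of each JAR must be unchanged.
    return all(record_jar(r.path) == r for r in records)
```

Note that copying a JAR into a container image typically changes both 
the absolute path and the modification time, so either step can reject 
a JAR whose contents are byte-for-byte identical.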

We have used this scheme for almost a decade. Note that we avoid reading 
the JAR files, as that may slow down start-up on old spinning disks. 
However, as most workloads run on SSDs now, I believe this is no longer 
a concern.

Recently, we have been seeing problems when people deploy CDS inside 
containers:

For (1) the file system structure may differ between run time and 
dump time. We can kludge around this problem by using relative paths 
instead of absolute paths, but this will make the existing code even 
more complicated.

For (2) when deploying the app, it may not be easy to keep the 
modification time unchanged (see JDK-8284692).


Proposal
========

For (1) - don't compare directory names anymore. Only check that the 
filenames are the same:

E.g.

Dump:
     java -Xshare:dump -cp dir1/Foo.jar:dir2/Bar.jar ..

Run:
     java -cp dir1/Foo.jar:dir2/Bar.jar ...     [OK]
     java -cp Foo.jar:Bar.jar ...               [OK]
     java -cp Foo.jar:Bxx.jar ... [Fail - changed from Bar.jar to Bxx.jar]
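
As a rough sketch of the filename-only comparison (Python, for 
illustration only): the helper names classpath_names and same_jar_names 
are made up, and the ":" separator and "/" path syntax are assumed to 
match the examples above.

```python
import posixpath


def classpath_names(classpath, sep=":"):
    """Reduce each classpath entry to its filename, dropping directories."""
    return [posixpath.basename(entry) for entry in classpath.split(sep) if entry]


def same_jar_names(dump_cp, run_cp):
    """True if both classpaths list the same filenames in the same order."""
    return classpath_names(dump_cp) == classpath_names(run_cp)


dump = "dir1/Foo.jar:dir2/Bar.jar"
print(same_jar_names(dump, "dir1/Foo.jar:dir2/Bar.jar"))  # True
print(same_jar_names(dump, "Foo.jar:Bar.jar"))            # True
print(same_jar_names(dump, "Foo.jar:Bxx.jar"))            # False
```

The order of the entries still matters in this sketch, since only the 
directory part is ignored.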

For (2)

- Check that file size has not changed.
- Compute a digest of the file. Check that this has not changed.
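
The two checks for (2) could be combined roughly as follows. This is a 
Python sketch only: fingerprint and jar_unchanged are hypothetical 
names, and the digest is passed in as a function so that the sketch 
does not commit to any particular algorithm (see the Digest section).

```python
import os


def fingerprint(path, digest):
    """Dump time: remember (size, digest) for one JAR file."""
    return (os.path.getsize(path), digest(path))


def jar_unchanged(path, recorded, digest):
    """Run time: compare the JAR against what was recorded at dump time."""
    size, value = recorded
    # Cheap check first: a size mismatch is detected without reading the file.
    if os.path.getsize(path) != size:
        return False
    # Only then read the file to compute and compare the digest.
    return digest(path) == value
```

The modification time is deliberately absent here, so a JAR that was 
copied or re-deployed without preserving its mtime still validates.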


Digest
======

The purpose is not security or (malicious) tamper detection. It's 
for guarding against innocent mistakes (such as forgetting to regenerate 
the CDS archive after the JAR files have been updated). Therefore, we 
don't need to run an expensive digest like MD5.

Instead, it should be enough to just do a quick XOR digest of the first 
128 bytes of the JAR file. This part usually contains the entry for the 
META-INF/ directory and its modification time, so it effectively 
contains the time when the JAR file was created. The timestamp seems to 
have a 2 second resolution:

$ while true; do jar cfm foo.jar MANIFEST.MF HelloWorld.class; \
    head -c 128 foo.jar | cksum; sleep 2; done
3803507028 128
1857545662 128
916098721 128
3740087168 128
2260752543 128
3257546723 128
2584173820 128
...
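
Here is one possible reading of "XOR digest", sketched in Python: fold 
the first 128 bytes into a single 32-bit word. The folding scheme, the 
function names and the in-memory test JAR (with META-INF/MANIFEST.MF 
standing in for the first entry) are all assumptions made for 
illustration. The sketch also decodes the time field of the first ZIP 
local file header, which stores seconds divided by two and so explains 
the 2 second resolution.

```python
import io
import struct
import zipfile

PREFIX_LEN = 128  # prefix length taken from the proposal


def xor_digest(data: bytes) -> int:
    """XOR-fold the first PREFIX_LEN bytes into one 32-bit value."""
    prefix = data[:PREFIX_LEN]
    prefix += b"\0" * (-len(prefix) % 4)  # pad to a multiple of 4 bytes
    digest = 0
    for (word,) in struct.iter_unpack("<I", prefix):
        digest ^= word
    return digest


def first_entry_time(data: bytes):
    """Return (hour, minute, second) stored in the first local file header."""
    (signature,) = struct.unpack_from("<I", data, 0)
    if signature != 0x04034B50:  # the bytes "PK\x03\x04"
        raise ValueError("data does not start with a ZIP local file header")
    (dos_time,) = struct.unpack_from("<H", data, 10)
    # DOS time: 5 bits hour, 6 bits minute, 5 bits of (second / 2).
    return dos_time >> 11, (dos_time >> 5) & 0x3F, (dos_time & 0x1F) * 2


def make_jar(second: int) -> bytes:
    """Build a tiny JAR-like archive in memory with a chosen timestamp."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        entry = zipfile.ZipInfo("META-INF/MANIFEST.MF",
                                date_time=(2022, 9, 1, 4, 47, second))
        jar.writestr(entry, b"Manifest-Version: 1.0\r\n")
    return buf.getvalue()


print(first_entry_time(make_jar(47)))                      # (4, 47, 46)
print(xor_digest(make_jar(46)) == xor_digest(make_jar(47)))  # True
```

Two archives built within the same 2 second slot get the same header 
bytes and therefore the same digest. One caveat of plain XOR folding is 
that identical changes at the same alignment in two different words 
cancel out, so a real implementation may want a slightly stronger 
mixing step.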


Advantages:

- Make it easier to deploy CDS archives (fewer false negatives)
- Simplify logic in the CDS code

Risks:

- Opening every JAR file may cause a slowdown if you have lots of JARs 
in the classpath and are running on a slow file system.


