RFC: Change CDS JAR file validation
Ioi Lam
ioi.lam at oracle.com
Thu Sep 1 04:47:46 UTC 2022
Proposal Summary:
Use a digest of a JAR file to detect if the file has changed
Background
==========
CDS is in effect a caching mechanism -- it needs to make sure that the
InstanceKlasses stored in the archive are the same as those parsed from
classfiles.
To do this, we archive only the classes from (a) the JDK's modules image
file and (b) JAR files. We don't archive classes in directories since
it's difficult to check if the contents of a directory have changed.
At runtime, we assume that (a) didn't change, since we require the exact
same JDK build to be used.
For (b) we currently do this:
(1) Check that -classpath and -Xbootclasspath (absolute paths) are
identical between run time and dump time.
(2) For each JAR file in cp and bcp, check if its size or modification
time has changed.
(3) (Something similar happens with the module path ....)
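As a rough illustration of check (2), here is a minimal Python sketch of
a size-and-mtime comparison. It is not the HotSpot code; the names
JarStamp, record_stamp and still_valid are hypothetical and exist only
for this example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class JarStamp:
    """What would be recorded for one JAR at dump time (hypothetical layout)."""
    path: str      # absolute path, as in check (1)
    size: int      # file size in bytes
    mtime_ns: int  # modification time in nanoseconds


def record_stamp(path):
    """Capture path, size and mtime of a JAR without reading its contents."""
    st = os.stat(path)
    return JarStamp(os.path.abspath(path), st.st_size, st.st_mtime_ns)


def still_valid(stamp, path):
    """Return True only if path, size and mtime all match the recorded stamp."""
    st = os.stat(path)
    return (os.path.abspath(path) == stamp.path
            and st.st_size == stamp.size
            and st.st_mtime_ns == stamp.mtime_ns)
```

Note that the check only calls stat() and never opens the file, which is
why a copy that preserves the contents but not the timestamp fails it.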
We have used this scheme for almost a decade. Note that we avoid reading
the JAR files as that may slow down start-up time on old spinning disks.
However, as most workloads run on SSDs now, I believe this is no longer
a concern.
Recently, we have been seeing problems when people deploy CDS inside
containers:
For (1) the file system structure may differ between run time and
dump time. We can kludge around this problem by using relative paths
instead of absolute paths, but this will make the existing code even
more complicated.
For (2) when deploying the app, it may not be easy to keep the
modification time unchanged (see JDK-8284692).
Proposal
========
For (1) - don't compare directory names anymore. Only check that the
filename is the same:
E.g.
Dump:
java -Xshare:dump -cp dir1/Foo.jar:dir2/Bar.jar ..
Run:
java -cp dir1/Foo.jar:dir2/Bar.jar ... [OK]
java -cp Foo.jar:Bar.jar ... [OK]
java -cp Foo.jar:Bxx.jar ... [Fail - changed from Bar.jar to Bxx.jar]
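The filename-only comparison could be sketched as below. This is a
Python illustration, not the proposed HotSpot change. The helper names
jar_names and classpath_matches are hypothetical, and the sketch assumes
that the run-time classpath must list the same filenames in the same
order as the dump-time classpath, which the examples above suggest but
do not state.

```python
import posixpath


def jar_names(classpath, sep=":"):
    """Split a classpath and keep only the final filename of each entry."""
    return [posixpath.basename(entry) for entry in classpath.split(sep) if entry]


def classpath_matches(dump_cp, run_cp, sep=":"):
    """Compare two classpaths by filename only, ignoring directory names.

    Assumption: entries must match one-for-one and in the same order.
    """
    return jar_names(dump_cp, sep) == jar_names(run_cp, sep)


if __name__ == "__main__":
    dump_cp = "dir1/Foo.jar:dir2/Bar.jar"
    for run_cp in ("dir1/Foo.jar:dir2/Bar.jar", "Foo.jar:Bar.jar", "Foo.jar:Bxx.jar"):
        verdict = "OK" if classpath_matches(dump_cp, run_cp) else "Fail"
        print(run_cp, verdict)
```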
For (2)
- Check that file size has not changed.
- Compute a digest of the file. Check that this has not changed.
Digest
======
The purpose is not security or (malicious) tamper detection. It's
to guard against innocent mistakes (such as forgetting to regenerate
the CDS archive after JAR files have been updated). Therefore, we don't
need to run an expensive digest like MD5.
Instead, it should be enough to just do a quick XOR digest of the first
128 bytes of the JAR file. This part usually contains the META-INF/
directory and its modification time, so it effectively contains the
time when this JAR file was created. The timestamp seems to have a
2 second resolution:
$ while true; do jar cfm foo.jar MANIFEST.MF HelloWorld.class; \
    head -c 128 foo.jar | cksum; sleep 2; done
3803507028 128
1857545662 128
916098721 128
3740087168 128
2260752543 128
3257546723 128
2584173820 128
...
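For illustration, a header digest along these lines might look like the
Python sketch below. The exact XOR scheme is not specified in this
proposal, so folding the bytes into a 32-bit word (little-endian, with
zero padding for short files) is an assumption, as are the names
xor_digest and jar_unchanged.

```python
import os
import struct

HEADER_BYTES = 128  # only this many leading bytes are ever read


def xor_digest(path, nbytes=HEADER_BYTES):
    """XOR the first nbytes of a file together as 32-bit little-endian words.

    Assumption: the word size and byte order are illustrative choices.
    """
    with open(path, "rb") as f:
        head = f.read(nbytes)
    # Zero-pad so files shorter than nbytes still split into whole words.
    head += b"\x00" * (-len(head) % 4)
    digest = 0
    for (word,) in struct.iter_unpack("<I", head):
        digest ^= word
    return digest


def jar_unchanged(path, recorded_size, recorded_digest):
    """Apply both proposed checks: same file size and same header digest."""
    return (os.path.getsize(path) == recorded_size
            and xor_digest(path) == recorded_digest)
```

Because only the first 128 bytes are read, the cost per JAR is constant
regardless of file size. A change deeper in the file that leaves both
the size and the header untouched would not be detected, which is
consistent with the goal of catching innocent mistakes rather than
tampering.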
Advantages:
- Make it easier to deploy CDS archive (fewer false negatives)
- Simplify logic in the CDS code
Risks:
- Opening every JAR file may cause a slowdown if there are lots of JARs
in the classpath and the app runs on a slow file system.