RFC: Change CDS JAR file validation
Jiangli Zhou
jianglizhou at google.com
Thu Sep 1 19:51:24 UTC 2022
When utilizing CDS for tools in a cloud environment a few years back,
we ran into the path checking issue. One of the main problems that we
observed was that the mtime-based check was not reliable. Internally,
we've explored a few potential solutions. One of the suggested ideas was
to compute a checksum of the JAR file and store the value in the zip
central directory. The runtime can then validate the checksum. That
approach can be reliable, but it may require specification changes.
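A rough Java sketch of the whole-file checksum idea, assuming CRC-32 (java.util.zip.CRC32) as a stand-in for whatever checksum would actually be chosen. The class and method names are hypothetical, and where the value would be stored (the zip central directory, per the idea above) is deliberately left out. It reads every byte of the file, which is what would make such a check reliable and also what would make it costly for large JARs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

public class JarChecksum {
    // Hypothetical helper: CRC-32 over the entire file, read in 64 KiB chunks.
    static long checksum(Path jar) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(jar)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java JarChecksum <file.jar>...");
            return;
        }
        for (String arg : args) {
            Path jar = Path.of(arg);
            // A dump step would store this value; the runtime would recompute and compare it.
            System.out.printf("%s %08x%n", jar, checksum(jar));
        }
    }
}
```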
Another approach is to provide a runtime flag (e.g.
-XX:+|-ValidateSharedClassPaths), which can be used to disable the
problematic jar path checking in use cases where it is safe to do so.
This is the approach that we have been using for tools that use CDS.
The tools' JAR files and CDS images are built and released together.
Because the release process guarantees compatibility between the JAR
files and the CDS archive in these kinds of usages, we can safely
disable the corresponding path checking. It also minimizes the related
overhead. I'd like to contribute the related patch in the short term.
Thanks,
Jiangli
On Wed, Aug 31, 2022 at 9:47 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>
> Proposal Summary:
>
> Use a digest of a JAR file to detect if the file has changed
>
>
> Background
> ==========
>
> CDS is in effect a caching mechanism -- it needs to make sure that the
> InstanceKlasses stored in the archive are the same as those parsed from
> classfiles.
>
> To do this, we archive only the classes from (a) the JDK's modules image
> file and (b) JAR files. We don't archive classes in directories since
> it's difficult to check if the contents of a directory have changed.
>
> At runtime, we assume that (a) didn't change, since we require the exact
> same JDK build to be used.
>
> For (b) we currently do this:
>
> (1) Check that -classpath and -Xbootclasspath (absolute paths) are
> identical between run time and dump time.
> (2) For each JAR file in cp and bcp, check if its size and modification
> time have changed.
> (3) (Something similar happens with the module path ....)
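The current check in (1) and (2) can be sketched in Java using only file metadata. This is an illustration, not the HotSpot implementation (which is C++): the class, record, and method names are hypothetical, `java.nio.file.Files` stands in for whatever stat calls the VM makes, and records require Java 16 or later. The demo in `main` reproduces the deployment problem mentioned later in this message: changing only the timestamp, with identical bytes, is enough to fail the check.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

public class SizeMtimeCheck {
    // Hypothetical per-JAR record saved at dump time.
    record JarInfo(String absolutePath, long size, FileTime mtime) {}

    // Collects metadata only; the JAR contents are never read.
    static JarInfo capture(Path jar) throws IOException {
        Path abs = jar.toAbsolutePath().normalize();
        return new JarInfo(abs.toString(), Files.size(abs), Files.getLastModifiedTime(abs));
    }

    // Runtime check: absolute path, size, and mtime must all be unchanged.
    static boolean matches(JarInfo dumped, Path jar) throws IOException {
        return dumped.equals(capture(jar));
    }

    public static void main(String[] args) throws IOException {
        Path jar = Files.createTempFile("app", ".jar");
        Files.write(jar, new byte[] {1, 2, 3});
        JarInfo atDump = capture(jar);
        System.out.println("unchanged:   " + matches(atDump, jar));
        // Simulate a redeploy that keeps the bytes but not the timestamp.
        Files.setLastModifiedTime(jar, FileTime.fromMillis(atDump.mtime().toMillis() + 60_000));
        System.out.println("after touch: " + matches(atDump, jar));
        Files.delete(jar);
    }
}
```

Run without arguments, it prints `true` for the unchanged file and `false` after the timestamp is moved forward by a minute.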
>
> We have used this scheme for almost a decade. Note that we avoid reading
> the JAR files as that may slow down start-up time on old spinning disks.
> However, as most workloads run on SSDs now, I believe this is no longer
> a concern.
>
> Recently, we have been seeing problems when people deploy CDS inside containers:
>
> For (1) the file system structure may differ between run time and
> dump time. We can kludge around this problem by using relative paths
> instead of absolute paths, but this will make the existing code even
> more complicated.
>
> For (2) when deploying the app, it may not be easy to keep the
> modification time unchanged (see JDK-8284692).
>
>
> Proposal
> ========
>
> For (1) - don't compare directory names anymore. Only check that the
> file names are the same:
>
> E.g.
>
> Dump:
> java -Xshare:dump -cp dir1/Foo.jar:dir2/Bar.jar ..
>
> Run:
> java -cp dir1/Foo.jar:dir2/Bar.jar ... [OK]
> java -cp Foo.jar:Bar.jar ... [OK]
> java -cp Foo.jar:Bxx.jar ... [Fail - changed from Bar.jar to Bxx.jar]
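A minimal Java sketch of the file-name-only rule for (1), using the `:` separator from the examples above (real code would use `File.pathSeparator`, since `:` can appear in Windows drive letters). `ClassPathNames`, `fileNames`, and `compatible` are hypothetical names; the actual check would live in the VM's C++ code. `Stream.toList()` requires Java 16 or later.

```java
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ClassPathNames {
    // Reduces each classpath entry to its last path element (the file name).
    static List<String> fileNames(String classPath, String separator) {
        return Arrays.stream(classPath.split(Pattern.quote(separator)))
                .map(entry -> Path.of(entry).getFileName().toString())
                .toList();
    }

    // Proposed rule (1): same file names in the same order; the directories
    // the JARs live in are ignored.
    static boolean compatible(String dumpCp, String runCp, String separator) {
        return fileNames(dumpCp, separator).equals(fileNames(runCp, separator));
    }

    public static void main(String[] args) {
        String dump = "dir1/Foo.jar:dir2/Bar.jar";
        for (String run : List.of("dir1/Foo.jar:dir2/Bar.jar", "Foo.jar:Bar.jar", "Foo.jar:Bxx.jar")) {
            System.out.println(run + " -> " + (compatible(dump, run, ":") ? "OK" : "Fail"));
        }
    }
}
```

The demo prints OK, OK, Fail for the three run-time classpaths, matching the examples above.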
>
> For (2)
>
> - Check that file size has not changed.
> - Compute a digest of the file. Check that this has not changed.
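Rule (2) can be sketched as a two-step check: compare the size from file metadata first, and read the file only when the size still matches. The digest here is one hypothetical reading of the "quick XOR digest of the first 128 bytes" described in the Digest section: byte i is XOR-ed into byte lane i % 8 of a 64-bit value. The folding scheme and all names are assumptions for illustration; records require Java 16 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class JarDigestCheck {
    static final int PREFIX_LEN = 128; // number of leading bytes hashed

    // Hypothetical XOR digest: fold the first PREFIX_LEN bytes into 64 bits by
    // XOR-ing byte i into byte lane (i % 8). Fast, but not collision-resistant.
    static long xorDigest(Path jar) throws IOException {
        byte[] prefix;
        try (InputStream in = Files.newInputStream(jar)) {
            prefix = in.readNBytes(PREFIX_LEN); // fewer bytes if the file is shorter
        }
        long digest = 0;
        for (int i = 0; i < prefix.length; i++) {
            digest ^= (prefix[i] & 0xFFL) << (8 * (i % 8));
        }
        return digest;
    }

    // What a dump step would record for each JAR under this sketch.
    record Stamp(long size, long digest) {}

    static Stamp stamp(Path jar) throws IOException {
        return new Stamp(Files.size(jar), xorDigest(jar));
    }

    // Runtime check: size first (metadata only), then the digest (reads the file).
    static boolean unchanged(Stamp dumped, Path jar) throws IOException {
        if (Files.size(jar) != dumped.size()) {
            return false;
        }
        return xorDigest(jar) == dumped.digest();
    }

    public static void main(String[] args) throws IOException {
        Path jar = Files.createTempFile("app", ".jar");
        byte[] content = new byte[200];
        Files.write(jar, content);
        Stamp atDump = stamp(jar);

        content[150] = 1; // change beyond the first 128 bytes, same size
        Files.write(jar, content);
        System.out.println("tail edit detected:   " + !unchanged(atDump, jar));

        content[10] = 1; // change inside the first 128 bytes
        Files.write(jar, content);
        System.out.println("header edit detected: " + !unchanged(atDump, jar));
        Files.delete(jar);
    }
}
```

The demo prints `false` for the tail edit and `true` for the header edit. Missing a same-size change past byte 128 is the trade-off the proposal accepts, on the assumption that rebuilding a JAR always rewrites the META-INF/ timestamp near the start of the file.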
>
>
> Digest
> ======
>
> The purpose is not security or (malicious) tamper detection. It's
> for guarding against innocent mistakes (forgetting to regenerate the CDS
> archive after the JAR files have been updated). Therefore, we don't need
> to run an expensive digest like MD5.
>
> Instead, it should be enough to just do a quick XOR digest of the first
> 128 bytes of the JAR file. This part usually contains the META-INF/
> directory and its modification time, so it effectively records the
> time when the JAR file was created. The timestamp seems to have a
> 2-second resolution:
>
> $ while true; do jar cfm foo.jar MANIFEST.MF HelloWorld.class ; head -c 128 foo.jar | cksum; sleep 2; done
> 3803507028 128
> 1857545662 128
> 916098721 128
> 3740087168 128
> 2260752543 128
> 3257546723 128
> 2584173820 128
> ...
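The 2-second resolution is consistent with how ZIP stores entry times: each local file header carries an MS-DOS time and date at byte offsets 10 and 12, and the seconds field holds seconds/2 in 5 bits (PKWARE APPNOTE). The Java sketch below decodes that field from the first local header of a file. The class and method names are illustrative, it ignores the extended-timestamp extra fields some tools also write, and the demo relies on `ZipEntry.setTimeLocal` (Java 9+) to build a small test archive.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class JarHeaderTime {
    static final int LOCAL_HEADER_SIG = 0x04034b50; // "PK\003\004", little-endian
    static final int LOCAL_HEADER_LEN = 30;         // fixed part of a local file header

    // Decodes an MS-DOS time/date pair. Seconds are stored halved in a 5-bit
    // field, so only even seconds can be represented.
    static LocalDateTime decodeDosTime(int dosTime, int dosDate) {
        int second = (dosTime & 0x1F) * 2;
        int minute = (dosTime >> 5) & 0x3F;
        int hour = (dosTime >> 11) & 0x1F;
        int day = dosDate & 0x1F;
        int month = (dosDate >> 5) & 0x0F;
        int year = ((dosDate >> 9) & 0x7F) + 1980;
        return LocalDateTime.of(year, month, day, hour, minute, second);
    }

    // Reads the modification time of the first entry (for a `jar`-built file,
    // typically META-INF/) directly from the first local file header.
    static LocalDateTime firstEntryTime(Path zip) throws IOException {
        byte[] header;
        try (InputStream in = Files.newInputStream(zip)) {
            header = in.readNBytes(LOCAL_HEADER_LEN);
        }
        ByteBuffer bb = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        if (header.length < LOCAL_HEADER_LEN || bb.getInt(0) != LOCAL_HEADER_SIG) {
            throw new IOException("not a ZIP local file header: " + zip);
        }
        int time = Short.toUnsignedInt(bb.getShort(10));
        int date = Short.toUnsignedInt(bb.getShort(12));
        return decodeDosTime(time, date);
    }

    // Writes a one-entry ZIP whose entry time is set to the given local time.
    static Path writeDemoZip(LocalDateTime when) throws IOException {
        Path zip = Files.createTempFile("demo", ".jar");
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zip))) {
            ZipEntry dir = new ZipEntry("META-INF/");
            dir.setTimeLocal(when);
            zos.putNextEntry(dir);
            zos.closeEntry();
        }
        return zip;
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            for (String arg : args) {
                System.out.println(arg + " " + firstEntryTime(Path.of(arg)));
            }
            return;
        }
        // Two build times one second apart fall into the same 2-second bucket.
        for (int sec : new int[] {24, 25}) {
            Path zip = writeDemoZip(LocalDateTime.of(2022, 9, 1, 19, 51, sec));
            System.out.println("built at :" + sec + " -> header says " + firstEntryTime(zip));
            Files.delete(zip);
        }
    }
}
```

Both demo builds should decode to the same even-second timestamp, which is why the loop above sleeps 2 seconds between builds to get a distinct checksum each time.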
>
>
> Advantage:
>
> - Make it easier to deploy CDS archives (fewer false negatives)
> - Simplify logic in the CDS code
>
> Risks:
>
> - Opening every JAR file may cause a slow down if you have lots of JARs
> in the classpath running on a slow file system.
>
More information about the hotspot-runtime-dev
mailing list