RFC: Change CDS JAR file validation

Jiangli Zhou jianglizhou at google.com
Thu Sep 8 04:07:15 UTC 2022


On Thu, Sep 1, 2022 at 12:51 PM Jiangli Zhou <jianglizhou at google.com> wrote:
>
> When utilizing CDS for tools in a cloud environment a few years back,
> we ran into the path checking issue. One of the main problems that we
> observed was that the mtime based check was not reliable. Internally,
> we've explored a few potential solutions. One of the suggested ideas was
> to compute a checksum of the jar file and store the value in the zip
> central directory. Runtime can then validate the checksum. That can be
> reliable. It may require specification changes.

Some additional details about the idea of using the zip central directory
to store the checksum (which was suggested by @martin):

The checksum would be computed at JAR creation time and stored in the
zip central directory. The checksum can be updated when the JAR file
is updated. At CDS image creation time, the JAR checksum could be
obtained and stored in the CDS image header. Runtime can then compare
the checksums to validate compatibility. The advantage is that it
avoids having to compute the checksum at both CDS creation time and
runtime. However, it requires JAR tools to be updated to support that.
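
As a rough illustration of the dump-time/run-time comparison described
above, here is a minimal Java sketch. It is a stand-in under stated
assumptions: it computes a CRC-32 over the whole JAR file with
java.util.zip.CRC32 instead of reading a precomputed value from the zip
central directory (no such field is defined today), and the names
JarChecksumSketch, checksumOf and isCompatible are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

class JarChecksumSketch {
    // Computes a CRC-32 over the entire file. In the scheme discussed above,
    // this value would instead be read from the zip central directory
    // (a hypothetical field that JAR tools would have to write).
    static long checksumOf(Path jar) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(jar)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    // Run-time check: compare the value recorded at CDS dump time
    // (assumed to live in the CDS image header) with the current JAR.
    static boolean isCompatible(long recordedChecksum, Path jar) throws IOException {
        return recordedChecksum == checksumOf(jar);
    }
}
```

With a stored checksum, checksumOf would become a cheap lookup, which is
the advantage mentioned above; the comparison in isCompatible would stay
the same.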

>
> Another approach is to provide a runtime flag (e.g.
> -XX:+|-ValidateSharedClassPaths), which can be used to disable the
> problematic jar path checking in use cases where it is safe to do so.
> This is the approach that we have been using for tools that use CDS.
> The tools' JAR files and CDS images are built and released together.
> As the release process guarantees the compatibility between the JAR
> file and the CDS archive in these kinds of usages, we can safely
> disable the corresponding path checking. It also minimizes the related
> overhead. I'd like to contribute the related patch in the short term.

Created https://bugs.openjdk.org/browse/JDK-8293526 (apologies for
the duplication with https://bugs.openjdk.org/browse/JDK-8284692).

Thanks,
Jiangli

>
> Thanks,
> Jiangli
>
> On Wed, Aug 31, 2022 at 9:47 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> >
> > Proposal Summary:
> >
> > Use a digest of a JAR file to detect if the file has changed
> >
> >
> > Background
> > ==========
> >
> > CDS is in effect a caching mechanism -- it needs to make sure that the
> > InstanceKlasses stored in the archive are the same as those parsed from
> > classfiles.
> >
> > To do this, we archive only the classes from (a) the JDK's modules image
> > file and (b) JAR files. We don't archive classes in directories since
> > it's difficult to check if the contents of a directory have changed.
> >
> > At runtime, we assume that (a) didn't change, since we require the exact
> > same JDK build to be used.
> >
> > For (b) we currently do this:
> >
> > (1) Check that -classpath and -Xbootclasspath (absolute paths) are
> > identical between run time and dump time.
> > (2) For each JAR file in cp and bcp, check if its size or modification
> > time has changed.
> > (3) (Something similar happens with the module path ....)
> >
> > We have used this scheme for almost a decade. Note that we avoid reading
> > the JAR files as that may slow down start-up time on old spinning disks.
> > However, as most work-loads run on SSDs now, I believe this is no longer
> > a concern.
> >
> > Recently, we have been seeing problems when people deploy CDS inside containers:
> >
> > For (1) the file system structure may be different between run time and
> > dump time. We can kludge around this problem by using relative paths
> > instead of absolute paths, but this will make the existing code even
> > more complicated.
> >
> > For (2) when deploying the app, it may not be easy to keep the
> > modification time unchanged (see JDK-8284692).
> >
> >
> > Proposal
> > ========
> >
> > For (1) - don't compare the directory name anymore. Only check that the
> > filename is the same:
> >
> > E.g.
> >
> > Dump:
> >      java -Xshare:dump -cp dir1/Foo.jar:dir2/Bar.jar ..
> >
> > Run:
> >      java -cp dir1/Foo.jar:dir2/Bar.jar ...     [OK]
> >      java -cp Foo.jar:Bar.jar ...               [OK]
> >      java -cp Foo.jar:Bxx.jar ... [Fail - changed from Bar.jar to Bxx.jar]
> >
> > For (2)
> >
> > - Check that file size has not changed.
> > - Compute a digest of the file. Check that this has not changed.
> >
> >
> > Digest
> > ======
> >
> > The purpose is not for security or (malicious) tamper detection. It's
> > for guarding against innocent mistakes (forgot to regenerate CDS archive
> > after JAR files have been updated). Therefore, we don't need to run an
> > expensive digest like MD5.
> >
> > Instead, it should be enough to just do a quick XOR digest of the first
> > 128 bytes of the JAR file. This part usually contains the META-INF/
> > directory entry and its modification time, so it effectively contains
> > the time when this JAR file was created. The timestamp seems to have a
> > 2 second resolution:
> >
> > $ while true; do jar cfm foo.jar MANIFEST.MF HelloWorld.class; \
> >     head -c 128 foo.jar | cksum; sleep 2; done
> > 3803507028 128
> > 1857545662 128
> > 916098721 128
> > 3740087168 128
> > 2260752543 128
> > 3257546723 128
> > 2584173820 128
> > ...
> >
> >
> > Advantages:
> >
> > - Make it easier to deploy CDS archive (fewer false negatives)
> > - Simplify logic in the CDS code
> >
> > Risks:
> >
> > - Opening every JAR file may cause a slowdown if you have lots of JARs
> > in the classpath and are running on a slow file system.
> >
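
The two checks proposed above can be sketched in Java as follows. This
is only an illustration, not the HotSpot implementation (which is C++).
The names JarValidationSketch, sameJarNames, xorDigest and unchanged are
hypothetical; the classpath is assumed to use the Unix-style ':'
separator shown in the examples; and folding the first 128 bytes into a
32-bit word, 4 bytes at a time, is just one possible reading of "XOR
digest".

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class JarValidationSketch {
    // (1) Compare only the file names of the classpath entries, in order,
    // ignoring the directories they live in. Assumes ':' as the separator.
    static boolean sameJarNames(String dumpClasspath, String runClasspath) {
        String[] dump = dumpClasspath.split(":");
        String[] run = runClasspath.split(":");
        if (dump.length != run.length) {
            return false;
        }
        for (int i = 0; i < dump.length; i++) {
            Path d = Paths.get(dump[i]).getFileName();
            Path r = Paths.get(run[i]).getFileName();
            if (d == null || r == null || !d.toString().equals(r.toString())) {
                return false;
            }
        }
        return true;
    }

    // (2) XOR the first 128 bytes of the file into a single int,
    // placing byte i at bit offset 8 * (i % 4).
    static int xorDigest(Path jar) throws IOException {
        byte[] head = new byte[128];
        int len;
        try (InputStream in = Files.newInputStream(jar)) {
            len = in.readNBytes(head, 0, head.length);
        }
        int digest = 0;
        for (int i = 0; i < len; i++) {
            digest ^= (head[i] & 0xFF) << (8 * (i % 4));
        }
        return digest;
    }

    // Run-time check: both the size and the digest recorded at dump time
    // must match the current file.
    static boolean unchanged(long dumpSize, int dumpDigest, Path jar) throws IOException {
        return Files.size(jar) == dumpSize && xorDigest(jar) == dumpDigest;
    }
}
```

For the example above, sameJarNames("dir1/Foo.jar:dir2/Bar.jar",
"Foo.jar:Bar.jar") accepts the run-time classpath, while replacing
Bar.jar with Bxx.jar is rejected. Checking the size first means the
file only has to be opened when the cheap check passes.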


More information about the hotspot-runtime-dev mailing list