RFR: 8311302: Allow for jlinking a custom runtime without packaged modules being present [v23]

Tue Apr 2 14:08:08 UTC 2024

On Tue, 19 Mar 2024 16:55:14 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

>> Please review this patch which adds a jlink mode to the JDK which doesn't need the packaged modules to be present. A.k.a. run-time image based jlink. Fundamentally, this patch adds an option to use `jlink` even though your JDK install might not come with the packaged modules (directory `jmods`). This is particularly useful to further reduce the size of a jlinked runtime. After the removal of the concept of a JRE, a common distribution mechanism is still the full JDK with all modules and packaged modules. However, packaged modules can incur an additional size tax. For example, in a container scenario it could be useful to have a base JDK container including all modules, but without also delivering the packaged modules. This comes at a size advantage of `~25%`. Such a base JDK container could then be used to `jlink` application-specific runtimes, further reducing the size of the application runtime image (App + JDK runtime; as a single image *or* separate bundles, depending on the app being modularized).
>> 
>> The basic design of this approach is to add a jlink plugin for tracking non-class and non-resource files of a JDK install. I.e. files which aren't present in the jimage (`lib/modules`). This enables producing a `JRTArchive` class which has all the info of what constitutes the final jlinked runtime.
>> 
>> Basic usage example:
>> 
>> $ diff -u <(./bin/java --list-modules --limit-modules java.se) <(../linux-x86_64-server-release/images/jdk/bin/java --list-modules --limit-modules java.se)
>> $ diff -u <(./bin/java --list-modules --limit-modules jdk.jlink) <(../linux-x86_64-server-release/images/jdk/bin/java --list-modules --limit-modules jdk.jlink)
>> $ ls ../linux-x86_64-server-release/images/jdk/jmods
>> java.base.jmod            java.net.http.jmod       java.sql.rowset.jmod      jdk.crypto.ec.jmod         jdk.internal.opt.jmod                     jdk.jdi.jmod         jdk.management.agent.jmod  jdk.security.auth.jmod
>> java.compiler.jmod        java.prefs.jmod          java.transaction.xa.jmod  jdk.dynalink.jmod          jdk.internal.vm.ci.jmod                   jdk.jdwp.agent.jmod  jdk.management.jfr.jmod    jdk.security.jgss.jmod
>> java.datatransfer.jmod    java.rmi.jmod            java.xml.crypto.jmod      jdk.editpad.jmod           jdk.internal.vm.compiler.jmod             jdk.jfr.jmod         jdk.management.jmod        jdk.unsupported.desktop.jmod
>> java.desktop.jmod         java.scripting.jmod      java.xml.jmod             jdk.hotspot.agent.jmod     jdk.i...
>
> Severin Gehwolf has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Move CreateLinkableRuntimePlugin to build folder
>   
>   Keep runtime link supporting classes in package
>   jdk.tools.jlink.internal.runtimelink

I'm posting this for posterity, since I did some research on the `jimage` format with a view to re-creating an existing `jimage` file with potentially a few resources added. One use-case for this would be to add the "diff" data to a pre-existing, optimized `jimage` in `images/jdk/lib/modules`. The `jimage` write algorithm relies on knowing **all** resources and their bytes at `jlink` time. It iterates over all resources, generates an index (and header) for them, and then serializes the resource bytes as well as container (folder) information so as to support various iteration entry points. Since reading a `jimage` inherently relies on the header, index and container information, the format doesn't lend itself to "appending": adding just a few resources invalidates the header, the index, and the container iteration information. Therefore, a new `jimage` would need to be created by inferring `ResourcePoolEntry`s from the old `jimage`'s resources and then working with those. The additional resources would then be added and the header/index/container info generated afresh. To support this, the existing `jimage` file would need to contain sufficient information to reconstitute a "similar" `jimage` from it (at the infer-`ResourcePoolEntry` step). There are two specific issues that need to be overcome to support this: resource ordering and resource compression. There might be more that I'm missing.
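
To make the appending problem concrete, here is a toy model. It is **not** the real `jimage` layout, which carries more structure (strings, location attributes, container entries); the class `ToyImageWriter`, its fixed-size index entries and its absolute offsets are hypothetical. It only mimics the property described above: a header and an index precede the resource bytes, so both can only be computed once the full resource set is known.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only (hypothetical, NOT the real jimage format): header + index + content.
class ToyImageWriter {
    static final int HEADER_SIZE = Integer.BYTES;     // header holds the resource count
    static final int ENTRY_SIZE = 3 * Integer.BYTES;  // index entry: name hash, offset, size

    // Serializes the *complete* resource set; offsets depend on the index size.
    static byte[] write(Map<String, byte[]> resources) {
        int count = resources.size();
        int offset = HEADER_SIZE + count * ENTRY_SIZE; // content starts right after the index
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(count);
            for (Map.Entry<String, byte[]> e : resources.entrySet()) {
                out.writeInt(e.getKey().hashCode());
                out.writeInt(offset);                  // absolute file offset of the content
                out.writeInt(e.getValue().length);
                offset += e.getValue().length;
            }
            for (byte[] content : resources.values()) {
                out.write(content);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the content offset stored in the i-th index entry (skipping the name hash).
    static int contentOffset(byte[] image, int i) {
        return ByteBuffer.wrap(image).getInt(HEADER_SIZE + i * ENTRY_SIZE + Integer.BYTES);
    }

    public static void main(String[] args) {
        Map<String, byte[]> resources = new LinkedHashMap<>();
        resources.put("/java.base/java/lang/Object.class", "AAAA".getBytes(StandardCharsets.UTF_8));
        resources.put("/java.base/java/lang/String.class", "BBBBBB".getBytes(StandardCharsets.UTF_8));
        byte[] before = write(resources);

        // Hypothetical "diff" resource to be added on top of the existing image.
        resources.put("/jdk.example/Diff.class", "CC".getBytes(StandardCharsets.UTF_8));
        byte[] after = write(resources);

        System.out.println("old image is a prefix of new: "
                + Arrays.equals(before, 0, before.length, after, 0, before.length));
        System.out.println("first offset: " + contentOffset(before, 0) + " -> " + contentOffset(after, 0));
    }
}
```

Running `main` prints `false` for the prefix check and moves the first offset from 28 to 40: the header count changes and the index grows by one entry, so the old bytes cannot simply be extended and everything has to be regenerated.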

## Ordering of Resources

`jlink` supports ordering of resources: the `--order-resources` plugin instructs `jlink` to produce a `jimage` in which resources appear in a specified order. This can be observed by listing the `jimage` contents sorted by content byte offset. For example, the default JDK build for Linux adds this expression to the `jlink` invocation at build time: `--order-resources=**module-info.class,@/path/to/build/output/support/link_opt/classlist,/java.base/java/**,/java.base/jdk/**,/java.base/sun/**,/java.base/com/**,/jdk.localedata/**`. That is, `module-info.class` files come first in the resulting `jimage`, then the classes listed in CDS's `classlist` file, then `java.base/java` classes, and so on. How would one reconstitute the desired ordering given only the `jimage` file as input? I think this could be solved by looking at the content byte offset of each resource in the existing `jimage`, which would allow one to reconstitute the ordering in the derived image. The open question is how the ordering of the added resources should be handled in such a case.
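
The offset-based recovery can be sketched as a plain sort. The record `ResourceLocation` is a hypothetical stand-in for whatever per-resource information a `jimage` reader can report (the JDK keeps that in internal classes); only a name and a content offset are assumed. Placing added resources after all existing ones, sorted by name, is just one possible policy for the open question above, not a proposal from the patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class OrderRecovery {
    // Hypothetical per-resource info read back from an existing jimage.
    record ResourceLocation(String name, long contentOffset) {}

    // Recovers the original ordering by sorting on content offset, then appends
    // newly added resources (which have no offset yet) in name order (assumed policy).
    static List<String> recoverOrder(List<ResourceLocation> existing, List<String> added) {
        List<String> order = new ArrayList<>();
        existing.stream()
                .sorted(Comparator.comparingLong(ResourceLocation::contentOffset))
                .forEach(loc -> order.add(loc.name()));
        added.stream().sorted().forEach(order::add);
        return order;
    }

    public static void main(String[] args) {
        List<ResourceLocation> existing = List.of(
                new ResourceLocation("/java.base/java/lang/String.class", 4096),
                new ResourceLocation("/java.base/module-info.class", 0),
                new ResourceLocation("/java.base/java/lang/Object.class", 1024));
        System.out.println(recoverOrder(existing, List.of("/jdk.example/b.txt", "/jdk.example/a.txt")));
    }
}
```

This prints `[/java.base/module-info.class, /java.base/java/lang/Object.class, /java.base/java/lang/String.class, /jdk.example/a.txt, /jdk.example/b.txt]`, i.e. `module-info.class` first again, as the original `--order-resources` expression requested.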

## Compression of Resources

`jlink` supports compression of resources. Currently, only `zip` and `compact-cp` are supported. There is still code that handles multiple compressions; in fact, the `jimage` format itself doesn't bound the number of compressions applied to a resource. In other words, a resource could be compressed with the `zip` compressor multiple times, or use the `compact-cp` compressor in conjunction with the `zip` compressor. In my investigation I've focused on the `zip` compressor for now. I haven't found a reliable way to detect the **input** compression level, like `zip-1`, by looking only at a `jimage` file. `jlink` uses the `Inflater` and `Deflater` classes internally to decompress and compress, respectively, but the `zlib` format doesn't seem to expose the 10 input levels (`zip-0` to `zip-9`). It only has a coarse notion of compression level that is exposed in the `zlib` header: fastest, fast, default, and maximum compression. However, the `zip-1` compression level, for example, corresponds to the `good`, `lazy`, `nice` and `chain` config values `4`, `4`, `8`, `4` together with the `deflate_fast` compression function, and only the coarse level ends up in the `zlib` header. One possible way to overcome this would be to support some kind of pass-through compressor in `jlink` code. I haven't looked much into the `compact-cp` code, but the challenges there might be similar.
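
The loss of information can be checked directly with `java.util.zip.Deflater`. Per RFC 1950, the second byte of a `zlib` stream (FLG) carries a 2-bit `FLEVEL` field in bits 6 and 7, and `(CMF * 256 + FLG)` must be a multiple of 31. The sketch below, assuming `zip-N` corresponds to `Deflater` level `N`, compresses the same input at levels 0 to 9 and records `FLEVEL` for each. The class name `ZlibLevelProbe` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.Deflater;

class ZlibLevelProbe {
    // Compresses input with the zlib wrapper (nowrap = false) and returns the
    // first two output bytes, CMF and FLG, as one 16-bit value.
    static int zlibHeader(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] out = new byte[input.length + 64]; // room for a stored block plus overhead
            int n = deflater.deflate(out);
            if (n < 2) {
                throw new IllegalStateException("no zlib header produced");
            }
            return ((out[0] & 0xff) << 8) | (out[1] & 0xff);
        } finally {
            deflater.end();
        }
    }

    // FLEVEL lives in bits 6-7 of FLG, the low byte of the header value.
    static int flevel(int header) {
        return (header & 0xff) >>> 6;
    }

    public static void main(String[] args) {
        byte[] input = "module-info.class ".repeat(20).getBytes(StandardCharsets.UTF_8);
        Map<Integer, Integer> levelToFlevel = new TreeMap<>();
        for (int level = 0; level <= 9; level++) {
            levelToFlevel.put(level, flevel(zlibHeader(input, level)));
        }
        // Ten input levels map onto at most four FLEVEL values.
        System.out.println(levelToFlevel);
    }
}
```

The exact grouping depends on the `zlib` build the JDK uses, but with only two bits available several levels must share a value (on common builds, levels 7 to 9 all report "maximum"), so the original `zip-N` cannot be read back from the header alone.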

It's also worth noting that the current `jlink` code seems to support compression only for a select set of files. How that could be handled would likely need investigation as well. The "pass-through" idea could be viable here too, as compressed bytes would be "passed through" to the target `jimage`.
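
A minimal sketch of the pass-through idea, contrasted with the naive alternative of decompressing and recompressing with a guessed level. Everything here is hypothetical: `StoredResource` stands in for a resource read back from an existing image together with a flag saying whether its stored bytes are `zip`-compressed, and `passThrough`/`recompress` are illustrative names, not `jlink` APIs. Because the stored bytes are copied verbatim, the compressed/uncompressed decision made by the original `jlink` run is preserved as well.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class PassThroughSketch {
    // Hypothetical resource as read back from an existing image.
    record StoredResource(String name, byte[] storedBytes, boolean compressed) {}

    static byte[] deflate(byte[] data, int level) {
        Deflater d = new Deflater(level);
        try {
            d.setInput(data);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!d.finished()) {
                int n = d.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            d.end();
        }
    }

    static byte[] inflate(byte[] data) throws DataFormatException {
        Inflater inf = new Inflater();
        try {
            inf.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && (inf.needsInput() || inf.needsDictionary())) {
                    throw new DataFormatException("truncated or dictionary-compressed input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inf.end();
        }
    }

    // Pass-through: the stored bytes go into the new image unchanged.
    static byte[] passThrough(StoredResource r) {
        return r.storedBytes().clone();
    }

    // Naive alternative: needs the original level, which the image does not record.
    static byte[] recompress(StoredResource r, int guessedLevel) throws DataFormatException {
        return deflate(inflate(r.storedBytes()), guessedLevel);
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] original = "class bytes ".repeat(50).getBytes(StandardCharsets.UTF_8);
        // Pretend the existing image stored this resource compressed at level 1.
        StoredResource stored = new StoredResource("/jdk.example/Example.class", deflate(original, 1), true);
        System.out.println("pass-through identical: " + Arrays.equals(passThrough(stored), stored.storedBytes()));
        System.out.println("recompressed identical: " + Arrays.equals(recompress(stored, 6), stored.storedBytes()));
        System.out.println("content intact: " + Arrays.equals(inflate(passThrough(stored)), original));
    }
}
```

The pass-through copy is always byte-identical to what the old image held, whereas the recompressed bytes only match if the guessed level (and `zlib` build) happen to match the original ones.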

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14787#issuecomment-2032140826