jmods-less jlinking prototype

Wed Mar 15 15:02:49 UTC 2023

On Wed, 2023-03-15 at 12:36 +0000, Ron Pressler wrote:
> > On 15 Mar 2023, at 10:42, Severin Gehwolf <sgehwolf at redhat.com> wrote:
> > Reducing JDK's size is important and would in our opinion be worth some
> > extra complexity in jlink code. Why?
> > 
> >   1. Allow recursive jlink runs (see above).
> 
> I understand it’s an added capability. I’d like to understand why
> it’s important.

From [i] there are 3 critical properties:

"Condensation is composable." Consider jlink as a tool driving
condensation. The proposed prototype satisfies the composable property.
So its importance derives from that.

It seems conceivable that JDK developers would run some condensation.
If Leyden supports condensation for application developers, this jlink
approach could be a step in that direction. It enables that use-case on
a broader spectrum of JDK installs (not just the ones including jmods
files).

So in a way it would also support "Condensation is selectable".

I understand that this is a moot point if jlink ends up not being used
as a tool for condensation. The premise of this prototype work was that
it might be. Does that make sense?

> >   2. Installed JDK size is something everyone is paying a tax for,
> >      even though they might not even use jlink for their application
> >      needs.
> 
> But if they don’t use jlink, the easier solution is to just delete the jmod files.

That's right. Hence our approach of shipping jmods in a separate RPM
sub-package, so that users can choose whether to install them. However,
that gets harder for container images, where image owners would either
install jmods by default or provide two different images (with and
without jmods). I phrased it the way I did because your initial reply
seemed to have reservations about a jmods-less JDK install. My argument
was therefore made in light of a full JDK install (of which jmods are a
part). I hope that makes some sense.

> > For example installing the *full* JDK on Fedora or Red Hat
> >      Enterprise Linux by picking the 'java-17-openjdk-jmods' package,
> >      would have users download a whopping extra ~230MB of data.
> 
> If size is that important you can get an even bigger reduction by not
> including debug info.

Debug info is an important support tool. In our world, the native
libraries of a base JDK build carry their debuginfo in-file (internal).
The RPM/deb build system strips it from executables and shared
libraries and moves it into corresponding `-debuginfo` subpackages, so
that it can be installed when needed. Not including debuginfo would
break this. Including it has the same size issue as described for
jmods. But we are digressing...

> >   3. Considering a cloud setup where a full JDK container image is
> >      being used to generate an application specific image including
> >      the Java runtime, such a JDK container image would have to
> >      include the jmods archives. The full JDK container image is an
> >      infrastructure component in such a setup. Even a ~80MB extra for
> >      such images results in extra money needing to be spent (for
> >      storage or network bandwidth).
> 
> I still don’t understand. How many containers are used for building?
> Assuming a nice JDK build where jmods are 25%, we’re talking about a
> 25% difference in the bandwidth and storage for the *build* infra.
> How big of an impact is it?

How many containers? One per JDK version. Say we'd have an image for
each of JDK 21, 17 and 11. Each such image then receives quarterly
updates (at the very least; there are base image updates as well). So
on a cloud build system, updated images would get re-pulled at least
once a quarter to pick up the latest update. Add 25% in size to that
and it quickly adds up.
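
As a rough back-of-the-envelope sketch of how that adds up (not a
measurement): the ~80MB figure is the jmods overhead mentioned in point
3 above, while the number of pulls per update is a made-up assumption
for a busy build system, and the function name is purely illustrative.

```python
def extra_transfer_mb(jdk_versions, updates_per_year, jmods_mb, pulls_per_update):
    """Extra data (in MB) pulled per year because builder images carry jmods."""
    return jdk_versions * updates_per_year * jmods_mb * pulls_per_update


# Three JDK versions (21, 17, 11), quarterly updates, ~80 MB of jmods per image.
# pulls_per_update=1000 is an assumed figure, not taken from any real system.
extra = extra_transfer_mb(
    jdk_versions=3, updates_per_year=4, jmods_mb=80, pulls_per_update=1000
)
print(f"{extra / 1000:.0f} GB of extra transfer per year")
```

The result scales linearly with every input, so base image updates on
top of the quarterly ones would increase it further.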

> 
> > What's more, the size difference
> >      makes using the same JDK image for application runtime - yes some
> >      users want the full JDK in containers - as well as for the build-
> >      your-own-application-image jlink use-case uncompelling.
> > 
> 
> This one is really confusing to me. If you’re concerned with runtime
> size, with jlink you can reduce the size to 40MB in total; that’s a
> much, much bigger impact than removing the jmod files.
> So if size is important, jlink has a far bigger positive impact than
> a negative one, and a bigger positive impact than what you’re
> proposing — running jlink reduces the size by 85% as opposed to 25% —
> and if you don’t want to use jlink you can just delete the jmod files
> and be done with it.

IMHO an important point in this scenario is that there are different
user groups (owning teams) involved. One group are application owners
(A), another JDK providers (B), and possibly a third group container
image providers (C). Groups need not be the same people (but could be
in some cases).

Going back to the cloud jlink use-case, where the end goal for group
(A) is to get application code into a container that is as small as
possible, we have: application source code as input (i), and as output
the application binary with a jlink-ed runtime image, all bundled up in
a container (ii). In order to do this, artifacts from (C) are being
used (artifacts from group (C) in turn use artifacts from group (B)).
Note that group (A) doesn't want, or have, access to the JDK
themselves. It's provided to them by (C), and they don't want to
change or maintain their own image for this.

Now, group (A) builds their apps in the cloud. So they need a "builder
image" with jmods (from group (C)), since they will be using jlink
without knowing it. From that aspect, a size reduction for a
jlink-capable "builder image" is already a win. The end result of group
(A), namely (ii), would be even smaller, but you have to produce this
artifact first. Thus, the jlink prototype wins here.

Also, for group (C) it's a win in terms of maintenance when there need
not be two different container images to satisfy the use cases of
group (A), who either 1) want just a runtime (full JDK) + app in
a container or 2) want to use jlink to create a container app image.

This would be a scenario where the "best of both worlds" approach wins.

> I understand that what you’re saying is that with a bit more
> complexity you can get the best of both worlds. It’s just that
> without more information about the impact, it’s unclear how
> significantly are both worlds better than just one world.

Understood.

Thanks,
Severin

[i] https://openjdk.org/projects/leyden/notes/02-shift-and-constrain