Loading native libraries from the classpath

Mike Hearn mike at plan99.net
Wed Jan 17 11:10:46 UTC 2024


Hi,

Apologies for breaking the thread; I saw this discussion in the
archives and subscribed to respond.

Viewing library unpacking and loading as a packaging problem is not
ideal. I say this as someone who develops and sells a packaging tool
with extra support for JVM apps [1], including support for native
libraries [2], so I have a lot of experience with this by now.

Shipping JVM apps in any context is made _significantly_ more awkward
by the lack of leadership from the JDK around how native code is
handled, so it would be great if Panama would step up and tackle this
next. To make this work well for end users and developers, Conveyor
has to perform the following steps. It would be excellent to
standardize this area and eliminate the need for all this complexity.

1. Scan all JARs to find native code modules. This is slow because
there are no standards for placement, so it has to be cached.

2. Detect which OS and CPU the module is for. This is difficult because
there are no naming standards, so Conveyor has to do binary sniffing
to figure it out. Some widely used Java libraries ship JNI libraries
for old or exotic operating systems, and unfortunately ELF's
mechanisms for marking which OS a binary targets don't really work. We
have some extremely complicated heuristics to correctly separate e.g.
binaries compiled for NetBSD from binaries compiled for Linux. (A
rough sketch of what steps 1 and 2 involve follows this list.)

3. Rewrite the JAR to delete all the native code modules, as otherwise
you're bloating the package for no benefit.

4. Possibly sign the native libraries that were extracted. Conveyor
does this in a cross-platform way; otherwise this step requires
OS-specific native tooling, which introduces a CI dependency for
packaging that is awkward and expensive to set up and maintain.

5. Move the code module to a place where the JVM can find it, then
disable the library's own unpacking logic, because not every library
is smart enough to try loading the library first and only unpack if
that fails. In practice this means compiling lists of magic system
properties and their values. Sometimes it's not even possible, and the
extraction mechanism has to be disabled outright.

6. In case library extraction breaks the app, place the signed library
back into the JAR where the app expects it to be so it can unpack it
to a temporary directory.
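
To make steps 1 and 2 concrete, here's a rough sketch of the kind of
scanning and sniffing involved. This is a simplified illustration, not
Conveyor's actual code, and the class name NativeLibScanner is just my
invention: it only looks at file extensions and a few well-known magic
numbers, whereas the real heuristics also have to deal with versioned
.so names, fat binaries, OSABI/note sections and so on.

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class NativeLibScanner {

    public static void scan(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry e = entries.nextElement();
                String name = e.getName().toLowerCase();
                // Step 1: there is no placement standard, so every entry in
                // the JAR has to be considered.
                if (name.endsWith(".so") || name.endsWith(".dylib")
                        || name.endsWith(".jnilib") || name.endsWith(".dll")) {
                    try (InputStream in = jar.getInputStream(e)) {
                        System.out.println(e.getName() + " -> " + sniff(in));
                    }
                }
            }
        }
    }

    // Step 2: guess the format and architecture from the first few bytes.
    // (Short files and error handling are ignored in this sketch.)
    private static String sniff(InputStream in) throws IOException {
        byte[] h = new byte[20];
        new DataInputStream(in).readFully(h);
        if (h[0] == 0x7F && h[1] == 'E' && h[2] == 'L' && h[3] == 'F') {
            // e_machine is a 16-bit field at offset 18; for little-endian
            // (ELFDATA2LSB) files the low byte comes first. Telling Linux
            // apart from the BSDs needs far more than this header.
            int machine = (h[18] & 0xFF) | ((h[19] & 0xFF) << 8);
            return switch (machine) {
                case 0x3E -> "ELF, x86-64";
                case 0xB7 -> "ELF, aarch64";
                default -> "ELF, e_machine=0x" + Integer.toHexString(machine);
            };
        }
        if (h[0] == 'M' && h[1] == 'Z')
            return "PE/COFF (Windows)";
        if ((h[0] & 0xFF) == 0xCF && (h[1] & 0xFF) == 0xFA
                && (h[2] & 0xFF) == 0xED && (h[3] & 0xFF) == 0xFE)
            return "Mach-O 64-bit (macOS)";
        if ((h[0] & 0xFF) == 0xCA && (h[1] & 0xFF) == 0xFE
                && (h[2] & 0xFF) == 0xBA && (h[3] & 0xFF) == 0xBE)
            return "Mach-O universal binary (same magic as a class file!)";
        return "unknown";
    }
}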

This process is not only exceptionally complicated but also tends to
break things in subtle ways. For example, on macOS dyld doesn't like
loading modules with mixed code signatures, so if a JAR comes
pre-signed and is then re-signed as part of packaging, but its
libraries are not extracted, the library's own unpacking code will
probably not notice the cache mismatch and so won't re-extract, which
can cause loading to fail.

And that's not all! We also have to ship a Gradle plugin that lets you
specify per-machine dependencies, because some frameworks ship their
libraries in separate per-platform container JARs that are selected by
build system plugins based on the OS doing the build, but to distribute
you need all the possible native libraries. And then some popular
libraries just need extra help on top of all that.

And that's still not all! All this assumes the library being loaded is
actually shipped with the app to begin with. Your app might load a JNI
wrapper for a library that's already provided by the OS, especially if
you only target Linux. So Conveyor scans the native code to locate its
shared library dependencies and then adds matching package dependencies
to the generated debs (unfortunately RPM is not yet supported). If your code
accesses native libraries directly using Panama then there is nothing
for it but to add the package dependency manually.
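
For illustration, here's one way to list those dependencies for an ELF
library. This is a hypothetical sketch (the NeededLibs name is mine)
that shells out to binutils' readelf rather than parsing the binary
itself, and it stops at the soname level; mapping sonames to deb
package names is the part that actually takes the work.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class NeededLibs {

    // Returns the DT_NEEDED sonames of an ELF shared library, e.g.
    // [libc.so.6, libm.so.6], by parsing `readelf -d` output lines like:
    //   0x0000000000000001 (NEEDED)  Shared library: [libc.so.6]
    public static List<String> needed(String libPath)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder("readelf", "-d", libPath).start();
        List<String> sonames = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.contains("(NEEDED)")) {
                    int start = line.indexOf('[') + 1;
                    int end = line.indexOf(']');
                    if (start > 0 && end > start)
                        sonames.add(line.substring(start, end));
                }
            }
        }
        p.waitFor();
        return sonames;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(needed(args[0]));
    }
}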

As you can see, all of that is stuff that jlink probably isn't going to
do anytime soon (unless Oracle buys my company). Therefore making this a
packaging problem is really just saying that the only rational way to
ship JVM apps that use native code is to use commercial products to
outsource the complexity. Good for me perhaps, but not ideal for the
Java ecosystem.

To make improvements there are a couple of options:

1. Do nothing. Because jlink/jpackage don't solve this problem, the
Java ecosystem suffers and continues to have a reputation for painful
deployment, which will steadily get worse over time as native code
usage goes up, now that Panama makes it easier.

2. Staff up a project to solve this problem well, leaning into
established community practices. It should be quite cheap!

The obvious place to start would be to supply a library unpacker
utility class as part of Panama that uses standardized naming and
locations inside a JAR (not JMODs). That would allow for incremental
adoption by projects that already have their own unpacker code: a
first PR moves libraries to the standardized locations; a second PR
uses the built-in unpacker when available and falls back to the old
home-grown unpacker when not; a third PR removes the home-grown
unpacker once enough years have passed and everyone is using a new
enough JDK.
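
To sketch what such a utility class could look like: the class name
NativeLoader and the resource layout META-INF/native/<os>-<arch>/ below
are just invented placeholders for whatever convention the JDK would
actually standardize, and real code would want content-hashed caching
rather than a fresh temp file per run.

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public final class NativeLoader {

    public static void load(String libName) {
        try {
            // A library already on java.library.path wins, e.g. one that
            // jlink/jpackage baked into the runtime image or that the OS
            // package manager installed. No unpacking happens in that case.
            System.loadLibrary(libName);
            return;
        } catch (UnsatisfiedLinkError e) {
            // Not found; fall back to extracting from the classpath.
        }
        String osName = System.getProperty("os.name").toLowerCase();
        String os = osName.contains("win") ? "windows"
                  : osName.contains("mac") ? "macos" : "linux";
        String arch = System.getProperty("os.arch");
        String fileName = System.mapLibraryName(libName); // libfoo.so, libfoo.dylib, foo.dll
        String resource = "/META-INF/native/" + os + "-" + arch + "/" + fileName;
        try (InputStream in = NativeLoader.class.getResourceAsStream(resource)) {
            if (in == null)
                throw new UnsatisfiedLinkError("no bundled native library at " + resource);
            Path tmp = Files.createTempFile("native-", "-" + fileName);
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            tmp.toFile().deleteOnExit();
            System.load(tmp.toAbsolutePath().toString());
        } catch (IOException e) {
            throw new UnsatisfiedLinkError("failed to extract " + resource + ": " + e.getMessage());
        }
    }
}

Note that a design like this also degrades gracefully in the fully
pre-baked jlink scenario discussed below: when the library is already
on the library path, the loadLibrary call succeeds and the extraction
branch is never taken.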

Maurizio has objected that sometimes unpacking isn't possible because
there's nowhere writeable to unpack to. That's true! But consider the
following:

a. There usually is! A huge number of apps require somewhere to stash
temporary or cache files and will break if that's not available.
Ensuring a writeable $HOME or $TMP can be made a responsibility of the
user.

b. Once conventions are established, jlink can do library extraction
from JARs without needing complex search+sniff heuristics, ensuring
that libraries are a part of the pre-baked JVM image. So the use case
of a fully immutable system can also be satisfied quite easily. In
this case the Panama unpacker would just do nothing and load the
library immediately from the library path.

The rest of the code signing and packaging problem could be left for
commercial tools that do this job well already.

JMODs don't seem to have much of a role here: they have no way to
contain native code for multiple CPU architectures at once, and because
they can't be used on the classpath the ecosystem doesn't support them
anyway. Even support for platform-specific JARs is entirely DIY via
build system plugins, which is why, when the native code files are
small, they tend to just all be shipped in one JAR for simplicity. So
any improvement has to be something that fits with regular JARs.

I hope this email is useful and outlines some of the less often
discussed difficulties with the current approach. There's an
opportunity for JDK upgrades to really improve things here with
minimal effort, as long as the functionality meets community
requirements.

thanks,
-mike

[1] https://conveyor.hydraulic.dev/

[2] https://conveyor.hydraulic.dev/13.0/configs/jvm/#native-code

