Notes on packaging of native code in libraries
Hello,

Recently there was a discussion of hermetic Java and how native libraries are packaged in Google's solution. That approach is Linux specific, and once you leave that OS new issues emerge. We've just published a blog post that explores the problems that arise when distributing Java apps to platforms that require code signing: https://hydraulic.software/blog/11-in-jar-signing.html

Conveyor is a build tool that layers on top of jlink and does extensive work to turn JARs into something that can be shipped to Mac/Windows users successfully. The blog post discusses some of these challenges, which any approach to static Java will also face (assuming it's supported on more than just Linux, of course). The core issues relate to code signing and the practice of extracting libraries on the fly from JARs into caches or temp directories. Conveyor has two modes: one does this extraction ahead of time (but many libraries don't expect this and must be explicitly configured via system properties, or they just break), and the other code signs libraries inside the JARs. The latter is less likely to break apps but yields other problems. Neither approach is perfect.

Especially once Panama ships, the Java ecosystem could strongly benefit from standardizing how native components are bundled into and loaded from libraries, as current build systems and JVM tooling don't have much to say on the topic. The result is a lot of wheel reinvention across the ecosystem in the form of NativeLibraryLoader.java classes, always unique per project, and a bunch of bugs and developer friction that doesn't need to happen.

The good news is that all this would be very cheap to improve. All that's needed is:

- A defined layout for JARs (or JMODs) that standardizes where to place native libraries given an OS and CPU architecture.
- Tooling that extracts native code to a user-specified directory that's then appended to java.library.path at runtime (e.g. a flag to the java launcher?), so that once build systems learn to pass this flag or do the extraction themselves, library authors can deprecate and eventually remove all their custom loader code (which is large, complex, copy/pasted between projects and inconsistent).
- Support for that mechanism in jlink.

The current JMOD mechanism improved things over the prior situation, but unfortunately doesn't quite supply everything needed. A solution that works with JARs and supports multiple libraries in one JAR would be easily adopted.
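For illustration, the extract-on-the-fly pattern that projects keep reinventing typically looks something like this minimal sketch. The `/native/<os>-<arch>/` resource layout used here is a hypothetical convention invented for the example, not an existing standard, and the class is not from any real project:

```java
// Sketch of the per-project loader pattern: locate a native library inside
// the JAR by OS/arch, extract it to a temp directory, then System.load it.
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public final class NativeLibraryLoader {
    // Map os.name/os.arch to a classpath resource, e.g. "/native/macos-aarch64/libfoo.dylib".
    static String resourcePathFor(String osName, String osArch, String lib) {
        String lower = osName.toLowerCase();
        String os = lower.startsWith("mac") ? "macos"
                  : lower.startsWith("win") ? "windows"
                  : "linux";
        String ext = os.equals("windows") ? ".dll" : os.equals("macos") ? ".dylib" : ".so";
        String prefix = os.equals("windows") ? "" : "lib";
        return "/native/" + os + "-" + osArch + "/" + prefix + lib + ext;
    }

    public static void load(String lib) throws IOException {
        String resource = resourcePathFor(
                System.getProperty("os.name"), System.getProperty("os.arch"), lib);
        try (InputStream in = NativeLibraryLoader.class.getResourceAsStream(resource)) {
            if (in == null)
                throw new UnsatisfiedLinkError("no bundled library at " + resource);
            // Copy the library out of the JAR so the OS loader can see it as a file.
            Path out = Files.createTempDirectory("native")
                            .resolve(Paths.get(resource).getFileName());
            Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
            System.load(out.toAbsolutePath().toString());
        }
    }
}
```

The extraction step is exactly where the code-signing trouble starts: on macOS the freshly extracted file is what actually executes, so it must carry a valid signature, which is the problem the blog post explores.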
On 24/02/2023 14:45, Mike Hearn wrote:
Especially once Panama ships the Java ecosystem could strongly benefit from a standardization of how native components are bundled into and loaded from libraries, as current build systems and the JVM tooling don't have much to say on the topic. The result is a lot of wheel reinvention across the ecosystem in the form of NativeLibraryLoader.java classes, always unique per project, and a bunch of bugs / developer friction that doesn't need to happen.
I tend to agree with the overall assessment. Shipping native libraries with Java projects is a known pain point, and it would be nice to have some solution for that.

That said, while I'm aware that the best way to make things work in today's world is to ship native libraries in a jar and then extract them _somewhere_ so they can be loaded with `System::loadLibrary`, I'm not sure how much that can be viewed as a full solution rather than a workaround. I can imagine cases where extracting libraries into a custom folder is not feasible (e.g. because of missing permissions). My general feeling is that with jars and native libraries it's like trying to fit a round peg into a square hole: surely you can devise some pragmatic solution that makes things sort of work, but what you get is always a little brittle.

If you look at what we did for jextract [1], the approach we used was different: jextract is written entirely in Java, but has a dependency (via the Foreign Function & Memory API) on libclang. When we build jextract, we create a jmod [2] for jextract, with the native library for libclang in the right place. We then create a JDK image which contains jdk.compiler, java.base and the newly created jextract module. The resulting JDK has the libraries in the right place, which means we can provide a launcher simply by calling jextract's entry point using the custom JDK image. You can run jextract and all the jextract tests against this custom image with zero extra arguments passed on the command line (because the native libraries are already in the right place).

An approach such as this seems more promising than doing heroics with jarfiles, at least for applications, and one that could be more amenable to things like code signing (in jextract we don't do this, but I don't see why it could not be added).

Maurizio

[1] - https://jdk.java.net/jextract/
[2] - https://github.com/openjdk/jextract/blob/master/build.gradle
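For illustration, the jmod-plus-jlink flow described above can be sketched roughly as follows. Module and directory names here are made up for the example; jextract's real build.gradle [2] automates the equivalent steps:

```shell
# Package compiled classes together with the native library.
# --libs places the shared library in the jmod's native-library section,
# so no extraction or java.library.path tweaking is needed later.
jmod create \
    --class-path build/classes \
    --libs build/natives \
    build/jmods/com.example.tool.jmod

# Link a self-contained runtime image containing the module;
# the native library ends up in the image's lib/ directory.
jlink \
    --module-path "$JAVA_HOME/jmods:build/jmods" \
    --add-modules com.example.tool \
    --output build/image

# Launch via the image; the OS loader finds the library automatically.
build/image/bin/java -m com.example.tool/com.example.tool.Main
```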
The good news is that all this would be very cheap to improve. All that's needed is:
* A defined layout for JARs (or JMODs) that standardizes where to place native libraries given an OS and CPU architecture.
* Tooling that extracts native code to a user-specified directory that's then appended to the java.library.path at runtime (e.g. a flag to the java launcher?), so that once build systems learn to pass this flag or do the extraction themselves library authors can just deprecate and eventually remove all their custom loader code (which is large, complex, copy/pasted between projects and inconsistent).
* Support for that mechanism in jlink.
Thanks! Yes, creating JMODs is one way to do it, but if you were to distribute jextract as a library and not just an application you'd get user complaints, because it'd take more work to consume:

- JMODs are platform specific, but build tools don't make it easy to select the right artifacts based on platform. You can do it, but it often requires custom Maven/Gradle plugins and such.
- You can't put JMODs on the module path, so users would need to run jlink to get a JDK they could use; but build tools don't invoke jlink during the normal development cycle and don't easily support switching which JDK they use halfway through based on the output of build tasks (maybe they should, but last time I tried, they don't). You could write extra plugins to teach them to do that, but then you'd hit performance problems: jlinking is a fairly heavy and slow operation, and doesn't cache anything. All this complexity is why some vendor JDKs pre-jlink JavaFX.
- It requires libraries to be modular and jlinkable. Sadly this is often quite challenging :( e.g. out of the box a vanilla Spring Boot Web app can't be linked because the module graph is invalid. The Spring team knows and doesn't plan to fix it. See https://github.com/spring-projects/spring-boot/issues/33942

So this runs into a fundamental question that Jigsaw never really resolved and maybe Leyden needs to: is jlink meant to be a last-step optimization you run before distribution (today, yes), or is it meant to be an integrated part of the compile-and-test cycle? If it's the former, you can get bugs that only appear at distribution time. If it's the latter, it needs a different performance and compatibility model. GraalVM Native Image faces the same problem: it's slow to compile and introduces new bugs, so it's tough to integrate into the standard development and testing process.
Ordinary HotSpot with a classpath/module path is so great for development because changing things about your app is just so darn fast; users are loath to lose that. Hence the proliferation of libraries that extract code on the fly. It's inelegant and creates other problems, but it preserves the ultra-fast build loops that people love so much, and doesn't require special support from build systems.

Sponsored link: maybe jextract should be distributed with Conveyor ;) It'd be convenient to have installs that can keep themselves up to date, get added to the path automatically, etc.
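As a sketch of a middle ground between the two Conveyor modes mentioned earlier: a loader can first look for a library already present on java.library.path (e.g. extracted ahead of time by the packaging tool, where it can be code signed in place) and only fall back to on-the-fly extraction if nothing is found. Class and method names here are illustrative, not from any real project:

```java
// Sketch: prefer a library that is already unpacked on java.library.path,
// so pre-extracted (and pre-signed) copies win over extraction from the JAR.
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public final class PreferInstalledLoader {
    // Search each java.library.path entry for the platform-specific
    // file name of `lib` (libfoo.so, libfoo.dylib, foo.dll, ...).
    static Optional<Path> findOnLibraryPath(String libraryPath, String lib) {
        String fileName = System.mapLibraryName(lib);
        for (String dir : libraryPath.split(File.pathSeparator)) {
            Path candidate = Paths.get(dir, fileName);
            if (Files.isRegularFile(candidate))
                return Optional.of(candidate);
        }
        return Optional.empty();
    }

    public static void load(String lib) {
        Optional<Path> installed =
            findOnLibraryPath(System.getProperty("java.library.path", ""), lib);
        if (installed.isPresent()) {
            System.load(installed.get().toAbsolutePath().toString());
        } else {
            // Fall back to extracting the library from the JAR (not shown).
            throw new UnsatisfiedLinkError("no installed copy of " + lib);
        }
    }
}
```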
On 24/02/2023 16:59, Mike Hearn wrote:
Thanks! Yes creating JMODs is one way to do it, but if you were to distribute jextract as a library and not just an application you'd get user complaints because it'd take more work to consume:

Yep - I agree libraries are a different beast (in my email I specifically mentioned "application").
* JMODs are platform specific but build tools don't make it easy to select the right artifacts based on platforms. You can do it but it often requires custom Maven/Gradle plugins and such.
* You can't put JMODs on the module path so users would now need to run jlink to get a JDK they could use, but build tools don't invoke jlink during the normal development cycle and don't easily support switching which JDK they use half way through based on the output of build tasks (maybe they should but last time I tried, they don't). You could write extra plugins to teach them to do that maybe, but then you'd hit performance problems - jlinking is a fairly heavy and slow operation, and doesn't cache anything. All this complexity is why some vendor JDKs pre-jlink JavaFX.
* It requires libraries to be modular and jlinkable. Sadly this is often quite challenging :( e.g. out of the box a vanilla Spring Boot Web app can't be linked because the module graph is invalid. The Spring guys know and don't plan to fix it. See https://github.com/spring-projects/spring-boot/issues/33942
It's true that jmods are mostly designed to be inputs to jlink. That said, my general feeling is that tools such as jlink are under-used and, when used correctly, they can simplify processes quite a bit. So perhaps investing in that direction might provide better dividends.
So this runs into a fundamental question that Jigsaw never really resolved and maybe Leyden needs to: is jlink meant to be a last step optimization process you run before distribution (today, yes) or is it meant to be an integrated part of the compile-and-test cycle?
I personally think that if jlink were used as part of the compile-and-test cycle it would be for the betterment of mankind :-) I can't count the number of times I was staring at a Maven/Gradle build file in confusion, trying to understand why a certain dependency wasn't being pulled in, or why the IDE had a "view" of the world that didn't match the one provided by Maven/Gradle. While IDEs have become quite good at this, there is still a lot of friction which, IMHO, completely goes away if the artifact of a build is not just a bunch of classes but a full JDK image. But perhaps we're getting away from the scope of this mailing list :-)

Maurizio
If it's the former you can get bugs that only appear at distribution time. If it's the latter then it needs a different performance and compatibility model. GraalVM Native Image faces the same problem: it's slow to compile and introduces new bugs, so it's tough to integrate it into the standard development and testing process. Ordinary HotSpot with a classpath/module path is so great for development because changing things about your app is just so darn fast, users are loath to lose that. Hence the proliferation of libraries that extract code on the fly. It's inelegant and creates other problems but it preserves the ultra-fast build loops that people love so much, and doesn't require special support from build systems.
I have jextract set up with jlink for my daily work and honestly I can't complain about it being "slow". I can change code and run all the tests, and it feels quite instant and natural (and with very, very little fiddling). The main issue is that the build tools we know and love (Maven, Gradle) are _not_ designed around the idea of linked images, which makes it very difficult to do what we did (and I understand why somebody might look elsewhere rather than pick a fight with Gradle to try and run jlink, and then use the generated image to run tests).
Sponsored Link: maybe jextract should be distributed with Conveyor ;) It'd be convenient to have installs that can keep themselves up to date, added to the path automatically etc.
Hehe

Maurizio
participants (2)
- Maurizio Cimadamore
- Mike Hearn