Questions about the Hermetic Java project
Magnus Ihse Bursie
magnus.ihse.bursie at oracle.com
Wed Feb 14 09:00:27 UTC 2024
First some background for build-dev: I have spent some time looking at
the build implications of the Hermetic Java effort, which is part of
Project Leyden. A high-level overview is available here:
https://cr.openjdk.org/~jiangli/hermetic_java.pdf and the current source
code is here: https://github.com/openjdk/leyden/tree/hermetic-java-runtime.
Hermetic Java faces several challenges, but the part that is relevant
for the build system is the ability to create static libraries. We've
had this functionality (in three different ways...) for some time, but
it is rather badly implemented.
As a result of my investigations, I have a bunch of questions. :-) I
have gotten some answers in private discussion, but for the sake of
transparency I will repeat them here, to foster an open dialogue.
1. Am I correct in understanding that the ultimate goal of this exercise
is to be able to have jmods which include static libraries (*.a) of the
native code which the module uses, and that the user can then run a
special jlink command to have this linked into a single executable
binary (which also bundles the *.class files and any additional
resources needed)?
2. If so, is the idea to create special kinds of static jmods, like
java.base-static.jmod, that contains *.a files instead of lib*.so files?
Or is the idea that the normal jmod should contain both?
3. Linking .o and .a files into an executable is a formidable task. Is
the intention to have jlink call a system-provided ld, or to bundle ld
with jlink, or to reimplement this functionality in Java?
4. Is the intention to allow users to create their own jmods with
static libraries, and have these linked in as well? This seems to be the
case. If that is so, then there will always be a risk of name
collisions, and we can only minimize the risk by making sure any global
names are as unique as possible.
5. The original implementation of static builds in the JDK, created for
the Mobile project, used a configure flag, --enable-static-builds, to
change the entire behavior of the build system to only produce *.a files
instead of lib*.so. In contrast, the current system is using a special
target instead. In my eyes, this is a much worse solution. Apart from
the conceptual principle (whether the build should generate static or
dynamic libraries is definitely a property of what a "configuration" means),
this makes it much harder to implement efficiently, since we cannot make
changes in NativeCompilation.gmk, where they are needed.
That was not as much a question as a statement. 🙂 But here is the
question: Do you think it would be reasonable to restore the old
behavior but with the new methods, so that we don't use special targets,
but instead tell configure to generate static libraries? I'm thinking
we should have a flag like "--with-library-type=" that can have values
"dynamic" (which is default), "static" or "both". I am not sure if
"both" is needed, but if we want to bundle both lib*.so and *.a files
into a single jmod file (see question 2 above), then it definitely is.
In general, the cost of producing two kinds of libraries is quite
small, compared to the cost of compiling the source code to object files.
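To make the proposed flag concrete, here is a minimal Python sketch of how the three values could map to build outputs. It is only an illustration: the flag itself is a proposal, the function and table names are hypothetical, and the lib*.so / lib*.a naming assumes a Linux-style platform.

```python
# Hypothetical mapping from the proposed --with-library-type value
# to the kinds of library the build would produce.
LIBRARY_TYPES = {
    "dynamic": ("shared",),          # proposed default
    "static": ("static",),
    "both": ("shared", "static"),
}


def library_outputs(name, library_type="dynamic"):
    """Return the file names the build would produce for one native library."""
    if library_type not in LIBRARY_TYPES:
        raise ValueError(f"unknown library type: {library_type!r}")
    # Linux-style naming is an assumption made for this sketch.
    file_names = {"shared": f"lib{name}.so", "static": f"lib{name}.a"}
    return [file_names[kind] for kind in LIBRARY_TYPES[library_type]]


print(library_outputs("net", "both"))
```

With "both", the object files are compiled once and only the final archive/link step is done twice, which is why the extra cost should be small.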
Finally, I have looked at how to manipulate symbol visibility. There
seem to be many ways forward, so I feel confident that we can find a good
solution.
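To illustrate why visibility matters for the name collision risk in question 4, here is a minimal Python sketch that reports global symbols defined by more than one static library. The library and symbol names are invented for the example; in practice the symbol lists would have to come from a tool such as nm.

```python
from collections import defaultdict


def find_collisions(exports):
    """Return {symbol: sorted libraries} for symbols that are global in
    more than one library."""
    owners = defaultdict(set)
    for library, symbols in exports.items():
        for symbol in symbols:
            owners[symbol].add(library)
    return {sym: sorted(libs) for sym, libs in owners.items() if len(libs) > 1}


# Hypothetical input: two libraries that both define a global helper.
exports = {
    "libfoo.a": {"Java_com_example_Foo_init", "hash_bytes"},
    "libbar.a": {"Java_com_example_Bar_init", "hash_bytes"},
}
collisions = find_collisions(exports)
print(collisions)
```

Here only the helper collides; the JNI entry points are already unique because their names encode the class. If the helper were local in each library, there would be nothing left to clash.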
One way forward is to use objcopy to manipulate symbol status
(global/local). There is an option --localize-symbol in objcopy, which
has been available since at least version 2.15, released in 2004, so it
should be safe to use. But ideally we should avoid using
objcopy and do this as part of the linking process. This should be
possible to do, given that we make changes in NativeCompilation.gmk --
see question 5 above.
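As a rough sketch of the objcopy route, the following Python builds (but does not run) an objcopy command line that makes every non-exported global symbol local. The file name and symbol names are hypothetical. It also assumes that the library's object files have first been combined into a single relocatable object, since a symbol made local in one object file can no longer be referenced from another.

```python
def localize_args(global_symbols, exported):
    """Build one --localize-symbol option per global symbol that is not
    part of the exported API."""
    to_hide = sorted(set(global_symbols) - set(exported))
    return [f"--localize-symbol={name}" for name in to_hide]


def objcopy_command(obj_file, global_symbols, exported, objcopy="objcopy"):
    """Return the argument list for an in-place objcopy invocation."""
    return [objcopy, *localize_args(global_symbols, exported), obj_file]


# Hypothetical symbols of one library; only the JNI entry point is API.
all_globals = ["Java_com_example_Foo_init", "table_init", "helper"]
api = ["Java_com_example_Foo_init"]
cmd = objcopy_command("libfoo_combined.o", all_globals, api)
print(" ".join(cmd))
```

For long symbol lists, objcopy also accepts --localize-symbols with a file name, which avoids very long command lines.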
As a fallback, it is also possible to rename symbols, either piecewise
or wholesale, using objcopy. There are many ways to do this, using
--prefix-symbols, --redefine-sym or --redefine-syms (note the -s, this
takes a file with a list of symbols). Thus we can always introduce a
"post factum namespace" by renaming symbols.
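The renaming fallback can be sketched in the same spirit. This minimal Python example generates the contents of a file for --redefine-syms, which expects one "old new" pair per line. The prefix and symbol names are made up for the illustration, and the same file would have to be applied to every object that refers to the renamed symbols.

```python
def redefine_map(symbols, prefix, keep=()):
    """Map each symbol not listed in `keep` to a prefixed name."""
    kept = set(keep)
    return {name: prefix + name for name in sorted(symbols) if name not in kept}


def redefine_syms_text(mapping):
    """Render the mapping as a --redefine-syms file: one 'old new' pair per line."""
    return "".join(f"{old} {new}\n" for old, new in mapping.items())


# Hypothetical symbols; the JNI entry point must keep its name.
symbols = ["Java_com_example_Foo_init", "table_init", "hash_bytes"]
mapping = redefine_map(symbols, "foo_", keep=["Java_com_example_Foo_init"])
text = redefine_syms_text(mapping)
print(text, end="")
```

The exported API names are deliberately left alone here, since the JVM looks up JNI functions by their exact names.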
So in the end, I think it will be fully possible to produce .a files
that only have global symbols for the functions that are part of the API
exposed by that library, and have all other symbols local, and to do
this in a way that is consistent with the rest of the build system.
Finally, a note on Hotspot. For debugging reasons, we export
basically all symbols in hotspot as global. This is not reasonable to do
for a static build. The effect of not exporting those symbols will be
that SA (the Serviceability Agent) will not be fully functional. On the
other hand, I have no idea if SA works at all with a static build. Have
you tested this? Is supporting it part of the plan, or will it be
officially dropped for Hermetic Java?
/Magnus