Bytecode transformation investigation
Dan Heidinga
heidinga at redhat.com
Fri Aug 5 14:49:31 UTC 2022
Responding to one piece of this now as it's important to get everyone
on the same page with the requirements. And I know I've tripped over
the "move fast, break things" philosophy multiple times in this space
before coming to this conclusion.
> From a specification perspective, there are multiple separate specifications viewpoints to consider: JLS, JDK and JVMS. From a JLS perspective, I would say that if the Java *compiler* were to do what your jlink plugin does, this would be a reasonable way to implement a compiler for the Java language -- the classfiles emitted would respect the semantics of the language. There's nothing that says a Java compiler has to translate lambdas with indy, or with hidden classes, so if the indy never got generated, that's not a problem.
>
> From the JDK+JVMS perspective, it starts to get a little murky, and one of the goals of Leyden is to bring more clarity to this area. The compiler emits certain classfiles with `invokedynamic`, and then some build-time tool rewrites these classes to be different. Is this OK? If the build-time tool is just "Dan's Magic Unofficial (Not) Java Bytecode Mangler", then this is the sort of build time mangling people do every day. But we want this to be an official part of the platform, so I think there's a little more specification work to be done to allow (and specify) such transformations. This is not a deal breaker, but we need to apply more thought here. I think there are two categories of new work here: some specification work to characterize what build-time transformations like this are allowed to do or not do, and your transformer will likely want a specification for what it does as well.
>
What if we doubled down on treating all pre-runtime bytecode
transformations as optional behaviours akin to "Dan's Magic Unofficial
(Not) Java Bytecode Mangler", despite their shipping with the platform?
Each transformation (a jlink plugin?) could be self-describing so
users know what they are opting into (the key point!) when they enable
it. This allows treating these transformations as a pre-step that has
significant leeway in what it does, provided the modified classfiles
run correctly.
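To make the opt-in idea concrete, here is a minimal Java sketch. All
of it is hypothetical: the Transformation class, the applyOptedIn
helper, and the magic-number check are illustrative names, not the
jlink plugin API, and checking for 0xCAFEBABE is only a crude stand-in
for real classfile verification.

```java
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

public class OptInTransforms {

    /** Hypothetical self-describing transformation; NOT the real jlink plugin API. */
    static final class Transformation {
        final String name;                    // what the user names to opt in
        final String description;             // what the user is told it does
        final UnaryOperator<byte[]> rewrite;  // classfile bytes in, classfile bytes out

        Transformation(String name, String description, UnaryOperator<byte[]> rewrite) {
            this.name = name;
            this.description = description;
            this.rewrite = rewrite;
        }
    }

    /** Assumption: a crude validity check that only looks at the 0xCAFEBABE magic. */
    static boolean hasClassfileMagic(byte[] b) {
        return b.length >= 4
                && (b[0] & 0xFF) == 0xCA && (b[1] & 0xFF) == 0xFE
                && (b[2] & 0xFF) == 0xBA && (b[3] & 0xFF) == 0xBE;
    }

    /** Applies only the transformations the user explicitly opted into, in order. */
    static byte[] applyOptedIn(byte[] classfile,
                               List<Transformation> available,
                               Set<String> optedIn) {
        byte[] current = classfile;
        for (Transformation t : available) {
            if (!optedIn.contains(t.name)) {
                continue; // not enabled by the user, so never runs
            }
            byte[] result = t.rewrite.apply(current);
            if (!hasClassfileMagic(result)) {
                throw new IllegalStateException(t.name + " produced an invalid classfile");
            }
            current = result;
        }
        return current;
    }

    public static void main(String[] args) {
        // Eight bytes: the magic number plus a made-up minor/major version.
        byte[] input = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 61};
        Transformation noop =
                new Transformation("noop", "Returns the classfile unchanged", b -> b);

        for (Transformation t : List.of(noop)) {
            System.out.println(t.name + ": " + t.description);
        }
        byte[] output = applyOptedIn(input, List.of(noop), Set.of("noop"));
        System.out.println("bytes after transformation: " + output.length);
    }
}
```

Note that the only thing this sketch enforces is "the output still
looks like a classfile". It says nothing about what the rewritten
code does.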
The JVM's role is then to load / verify / execute the classes as
required by the application and defined by the JVM specification.
Anything done to the classfiles prior to that is outside the JVM
spec's remit.
This "user opt-in to transformations" model shrinks the two categories
to one: specifying what a transformer does. The first category,
"specification work to characterize what build-time transformations
like this are allowed to do or not do", is answered with "whatever
they want, provided they generate valid classfiles". And if the user
is opting in to an application-specific (jlinked) runtime, then why
not?
Although it's kind of satisfying to say we can do what we want here,
it doesn't actually work. Why? Because this model destroys any
invariants built into the JDK platform.
Don't like how a method operates? Transform it to do something else!
Introduce bugs! Open security holes! It's trivially easy to break
the platform invariants, get surprising results, or open subtle
security holes here. Basically, all the concerns raised with Native
Image's Substitution mechanism come into play here. Though it's
possible to do many of these things today with JVMTI agents or even
user-written jlink plugins (or historically by hand-hacking rt.jar),
it's less common because it's hard, and because users have been
rightfully wary of what this can do to their applications.
Not to mention that Support Engineers will hate us if we take this
approach, as it's hard to argue something isn't a supported config if
the JDK ships the transformation that breaks the invariant.
All that to say, I think the "specification work to characterize what
build-time transformations like this are allowed to do or not do" is
important to this work actually being successful.
--Dan