Bytecode transformation investigation

Fri Aug 5 21:18:57 UTC 2022

> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>, "Dan Heidinga" <heidinga at redhat.com>
> Cc: "leyden-dev" <leyden-dev at openjdk.java.net>
> Sent: Friday, August 5, 2022 10:39:40 PM
> Subject: Re: Bytecode transformation investigation

> Remi;

> I think this misses a bigger picture here. A key goal of Leyden is that we be
> able to _selectively and flexibly constrain and shift dynamism_. We don't want
> to force users to decide at compile time whether they want to AOT it, partially
> evaluate the program, gather profiling data, constrain away indy and other
> classloading, etc; we want them to be able to write their program and run it on
> a dynamic VM, as well as choosing to condense it (perhaps in a series of
> phases) to shift some behavior from runtime to an earlier phase, all completely
> optionally.

> This is why Dan has focused on jlink; because jlink is positioned as the thing
> you run when you're ready to accept some tighter coupling in exchange for a
> smaller or faster deployment unit. So the choice of `jlink` here is entirely
> appropriate, and has the advantage that developers can do their
> develop-test-run cycle with the lightest possible build chain, and spend more
> cycles to condense the program to a smaller/faster one only when spending those
> cycles has a positive return.
Thanks for re-explaining the problem to me.

The downside I see is that most of the tooling we have is not ready for that. Nobody tests after jlink, which is why people expect 100% compatibility with the bytecode generated by javac.

I don't have a magic solution; I just know that doing the transformation at the same time as the generation of the classfiles is easier.

Rémi 

> On 8/5/2022 1:49 PM, Remi Forax wrote:

>> ----- Original Message -----

>>> From: "Dan Heidinga" <heidinga at redhat.com>
>>> To: "Brian Goetz" <brian.goetz at oracle.com>
>>> Cc: "leyden-dev" <leyden-dev at openjdk.java.net>
>>> Sent: Friday, August 5, 2022 4:49:31 PM
>>> Subject: Re: Bytecode transformation investigation

>>> Responding to one piece of this now as it's important to get everyone
>>> on the same page with the requirements.  And I know I've tripped over
>>> the "move fast, break things" philosophy multiple times in this space
>>> before coming to this conclusion.

>>>> From a specification perspective, there are multiple separate specification
>>>> viewpoints to consider: JLS, JDK and JVMS.  From a JLS perspective, I would say
>>>> that if the Java *compiler* were to do what your jlink plugin does, this would
>>>> be a reasonable way to implement a compiler for the Java language -- the
>>>> classfiles emitted would respect the semantics of the language.  There's
>>>> nothing that says a Java compiler has to translate lambdas with indy, or with
>>>> hidden classes, so if the indy never got generated, that's not a problem.

>>>> From the JDK+JVMS perspective, it starts to get a little murky, and one of the
>>>> goals of Leyden is to bring more clarity to this area.  The compiler emits
>>>> certain classfiles with `invokedynamic`, and then some build-time tool rewrites
>>>> these classes to be different.  Is this OK?  If the build-time tool is just
>>>> "Dan's Magic Unofficial (Not) Java Bytecode Mangler", then this is the sort of
>>>> build time mangling people do every day.  But we want this to be an official
>>>> part of the platform, so I think there's a little more specification work to be
>>>> done to allow (and specify) such transformations.  This is not a deal breaker,
>>>> but we need to apply more thought here.  I think there are two categories of
>>>> new work here: some specification work to characterize what build-time
>>>> transformations like this are allowed to do or not do, and your transformer
>>>> will likely want a specification for what it does as well.

>>> What if we doubled down on treating all pre-runtime bytecode
>>> transformations as optional behaviours akin to "Dan's Magic Unofficial
>>> (Not) Java Bytecode Mangler" despite shipping with the platform?  Each
>>> transformation - jlink plugin? - could be self describing so users
>>> know what they are opting (the key point!) into when they enable the
>>> transformation.  This allows treating these transformations as a
>>> pre-step that has significant leeway on what it does provided the
>>> modified classfiles run correctly.

>> I don't think it should be run by jlink, but rather as a post-processing step
>> of javac, more like annotation processors.
>> It will work with anything that uses invokedynamic.

>> If the transformations are done by jlink, you can do more of them, resolving
>> Class.forName() / ServiceLoader for example, but you are then under a
>> closed-world assumption.

>>> The JVM's role is then to load / verify / execute the classes as
>>> required by the application and defined by the JVM specification.
>>> Anything done to the classfiles prior to that is outside the JVM
>>> spec's remit.

>>> This "user opt-in to transformations" model shrinks the two categories
>>> to one: specifying what a transformer does.  As the first
>>> "specification work to characterize what build-time transformations
>>> like this are allowed to do or not do" category is answered with
>>> "whatever they want, provided they generate valid classfiles".  And if
>>> the user is opting-in for an application-specific runtime (jlinked),
>>> then why not?

>>> Although it's kind of satisfying to say we can do what we want here,
>>> it doesn't actually work.  Why? Because this model destroys any
>>> invariants built into the JDK platform.

>>> Don't like how a method operates?  Transform it to do something else!
>>> Introduce bugs!  Open security holes!  It's trivially easy to break
>>> the platform invariants, get surprising results, or open subtle
>>> security holes here. Basically, all the concerns raised with Native
>>> Image's Substitution mechanism come into play here.  Though it's
>>> possible to do many of these things today with JVMTI agents or even
>>> user written jlink plugins (or historically by hand hacking rt.jar),
>>> it's less common because it's hard, and because users have been
>>> rightfully wary of what this can do to their applications.

>> Why do you want the user to be able to opt in to an unbounded set of
>> transformations?
>> You can be far more restrictive by saying that you only have one javac flag to
>> opt in to a more "static" view of the world; whether a bytecode transformer is
>> used or not becomes an implementation detail in that case.

>>> Not to mention that Support Engineers will hate us if we take this
>>> approach as it's hard to argue something isn't a supported config if
>>> the JDK ships the transformation that breaks the invariant.

>>> All that to say, I think the "specification work to characterize what
>>> build-time transformations like this are allowed to do or not do" is
>>> important to this work actually being successful.

>> yes

>>> --Dan

>> Rémi