Bytecode transformation investigation

Fri Aug 5 17:49:31 UTC 2022

----- Original Message -----
> From: "Dan Heidinga" <heidinga at redhat.com>
> To: "Brian Goetz" <brian.goetz at oracle.com>
> Cc: "leyden-dev" <leyden-dev at openjdk.java.net>
> Sent: Friday, August 5, 2022 4:49:31 PM
> Subject: Re: Bytecode transformation investigation

> Responding to one piece of this now as it's important to get everyone
> on the same page with the requirements.  And I know I've tripped over
> the "move fast, break things" philosophy multiple times in this space
> before coming to this conclusion.
> 
>> From a specification perspective, there are multiple separate specifications
>> viewpoints to consider: JLS, JDK and JVMS.  From a JLS perspective, I would say
>> that if the Java *compiler* were to do what your jlink plugin does, this would
>> be a reasonable way to implement a compiler for the Java language -- the
>> classfiles emitted would respect the semantics of the language.  There's
>> nothing that says a Java compiler has to translate lambdas with indy, or with
>> hidden classes, so if the indy never got generated, that's not a problem.
>>
>> From the JDK+JVMS perspective, it starts to get a little murky, and one of the
>> goals of Leyden is to bring more clarity to this area.  The compiler emits
>> certain classfiles with `invokedynamic`, and then some build-time tool rewrites
>> these classes to be different.  Is this OK?  If the build-time tool is just
>> "Dan's Magic Unofficial (Not) Java Bytecode Mangler", then this is the sort of
>> build time mangling people do every day.  But we want this to be an official
>> part of the platform, so I think there's a little more specification work to be
>> done to allow (and specify) such transformations.  This is not a deal breaker,
>> but we need to apply more thought here.  I think there are two categories of
>> new work here: some specification work to characterize what build-time
>> transformations like this are allowed to do or not do, and your transformer
>> will likely want a specification for what it does as well.
>>
> 
> What if we doubled down on treating all pre-runtime bytecode
> transformations as optional behaviours akin to "Dan's Magic Unofficial
> (Not) Java Bytecode Mangler" despite shipping with the platform?  Each
> transformation - jlink plugin? - could be self describing so users
> know what they are opting (the key point!) into when they enable the
> transformation.  This allows treating these transformations as a
> pre-step that has significant leeway on what it does provided the
> modified classfiles run correctly.

I don't think it should be run by jlink but rather as a post-processing step of javac, much like annotation processors.
That way it will work with anything that uses invokedynamic.
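
To make that concrete, here is a rough hand-written sketch of what such a post-javac rewrite amounts to for a lambda. This is only an illustration: the names IndyRewriteSketch, GreetingLambda, viaIndy and viaClass are made up, and it is not what Dan's plugin actually emits.

```java
import java.util.function.Supplier;

public class IndyRewriteSketch {
    // As written: current javac compiles this lambda to an invokedynamic
    // call site that is linked at run time through LambdaMetafactory.
    static Supplier<String> viaIndy(String name) {
        return () -> "hello " + name;
    }

    // Hypothetical output of a build-time rewrite: the lambda body lives in
    // an ordinary class, so no invokedynamic linkage is needed for it.
    static final class GreetingLambda implements Supplier<String> {
        private final String name; // the captured variable becomes a field

        GreetingLambda(String name) {
            this.name = name;
        }

        @Override
        public String get() {
            return "hello " + name;
        }
    }

    static Supplier<String> viaClass(String name) {
        return new GreetingLambda(name);
    }

    public static void main(String[] args) {
        // Both forms are observably the same to the caller.
        System.out.println(viaIndy("leyden").get().equals(viaClass("leyden").get()));
    }
}
```

Both methods respect the semantics of the language, which is why the choice between them can be left to the compiler or to a step that runs right after it.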

If the transformations are done by jlink, you can do more of them, for example resolving Class.forName() / ServiceLoader, but you are then under a closed-world assumption.
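
A small sketch of the Class.forName() case, assuming the linker can see every class that will ever be loaded (the class ClosedWorldSketch and its method names are hypothetical, chosen only for this example):

```java
import java.util.ArrayList;
import java.util.List;

public class ClosedWorldSketch {
    // As written: the class is looked up by name when the method runs.
    @SuppressWarnings("unchecked")
    static List<String> reflective() {
        try {
            Class<?> c = Class.forName("java.util.ArrayList");
            return (List<String>) c.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // What a link-time step could substitute, but only if it knows that
    // the name can never resolve to a different class at run time.
    static List<String> resolved() {
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        System.out.println(reflective().getClass() == resolved().getClass());
    }
}
```

Without the closed-world assumption the second form is not a safe replacement, because the set of classes visible at run time may differ from what the tool saw.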

> 
> The JVM's role is then to load / verify / execute the classes as
> required by the application and defined by the JVM specification.
> Anything done to the classfiles prior to that is outside the JVM
> spec's remit.
> 
> This "user opt-in to transformations" model shrinks the two categories
> to one: specifying what a transformer does.  As the first
> "specification work to characterize what build-time transformations
> like this are allowed to do or not do" category is answered with
> "whatever they want, provided they generate valid classfiles".  And if
> the user is opting-in for an application-specific runtime (jlinked),
> then why not?
> 
> Although it's kind of satisfying to say we can do what we want here,
> it doesn't actually work.  Why? Because this model destroys any
> invariants built into the JDK platform.
> 
> Don't like how a method operates?  Transform it to do something else!
> Introduce bugs!  Open security holes!  It's trivially easy to break
> the platform invariants, get surprising results, or open subtle
> security holes here. Basically, all the concerns raised with Native
> Image's Substitution mechanism come into play here.  Though it's
> possible to do many of these things today with JVMTI agents or even
> user written jlink plugins (or historically by hand hacking rt.jar),
> it's less common because it's hard! and because users have been
> rightfully wary of what this can do to their applications.

Why do you want the user to be able to opt in to an unbounded set of transformations?
You can be far more restrictive by saying that there is only one javac flag to opt in to a more "static" view of the world; whether a bytecode transformer is used or not then becomes an implementation detail.

> 
> Not to mention that Support Engineers will hate us if we take this
> approach as it's hard to argue something isn't a supported config if
> jdk ships the transformation that breaks the invariant.
> 
> All that to say, I think the "specification work to characterize what
> build-time transformations like this are allowed to do or not do" is
> important to this work actually being successful.

yes

> 
> --Dan

Rémi