From heidinga at redhat.com Tue Aug 2 20:30:38 2022
From: heidinga at redhat.com (Dan Heidinga)
Date: Tue, 2 Aug 2022 16:30:38 -0400
Subject: Bytecode transformation investigation
Message-ID:

When Mark kicked off the project, he wrote about the "spectrum of constraints" enabling optimizations that are weaker than those of the closed world constraint, but more broadly applicable. In line with that, I've been doing some investigation into bytecode transformations.

While bytecode transformations are strictly less powerful than AOT, they provide a way to simplify the program we're running based on information available at build / deploy time. They allow us to move (some) dynamic behaviour from one phase (runtime) to an earlier one - behaviour such as reflective operations, runtime class generation, optional paths, etc. can be simplified at the bytecode level based on information the author (or deployer) of the software knows without having to discover it at runtime.

Great! But there's always a catch. And the primary catch here is that bytecode transformation can result in user-visible changes. Before we go too far down the path of developing transformations, we should determine which user-visible changes are legitimate and where the lines need to be drawn.

jlink experiment:
----------------------
As a starting point, I prototyped using jlink to transform Lambda expressions to use pre-generated classes rather than runtime generated ones.

Lambda expressions
* encode the lambda body as a private method in the defining class
* use an invokedynamic instruction to dynamically pick the strategy for creating the lambda instances at runtime, and
* encode a "recipe" combining MethodHandle, MethodType, Class and int arguments passed to the LambdaMetafactory to actually generate the required class and create the lambda instance.
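For readers less familiar with the recipe, the linkage step can be approximated by calling LambdaMetafactory.metafactory directly, which is roughly what the invokedynamic bootstrap does on first execution. This is only a minimal sketch: the class name LmfRecipeSketch and the method lambdaBody are hypothetical stand-ins for the defining class and the private synthetic method a compiler would emit, and none of this is taken from the prototype.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Supplier;

public class LmfRecipeSketch {
    // Hypothetical stand-in for the private synthetic method holding the lambda body.
    private static String lambdaBody() {
        return "hello";
    }

    // Performs by hand the linkage that an invokedynamic call site would trigger.
    @SuppressWarnings("unchecked")
    public static Supplier<String> link() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle impl = lookup.findStatic(
                    LmfRecipeSketch.class, "lambdaBody", MethodType.methodType(String.class));
            CallSite site = LambdaMetafactory.metafactory(
                    lookup,                                  // caller: the defining class
                    "get",                                   // name of the interface method
                    MethodType.methodType(Supplier.class),   // factory type: no captures, returns Supplier
                    MethodType.methodType(Object.class),     // erased signature of Supplier.get
                    impl,                                    // handle to the lambda body
                    MethodType.methodType(String.class));    // signature enforced at invocation time
            return (Supplier<String>) site.getTarget().invoke();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        Supplier<String> s = link();
        System.out.println(s.get());
        // The proxy class is spun at run time; its name and hidden-ness are implementation details.
        System.out.println(s.getClass().getName());
    }
}
```

A build-time transformation would replace the call site with a reference to an ordinary class that does the same job, which is why the name printed on the last line is one of the things that can change.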
None of the code outside the LambdaMetafactory (LMF) cares how the lambda is implemented as long as it meets the contract by implementing the correct interfaces and by calling the private implementation method.

I modified the LMF internals to allow a jlink plugin to pre-generate the lambda classes [0], but doing so produces user-visible behaviour changes:

1) Lambda classes are no longer hidden anonymous classes.
The LMF loaded the implementation class as a hidden, anonymous class. This meant Class.forName() couldn't find the class, that Class::isHidden() [1] returned true, and that the class was specially named [2].
With the pre-generated class, Class.forName() can find the class, it is no longer hidden as it is loaded using normal class loading, and the name is a normal class name [3].

2) The pre-generated class must be a nest peer of the defining class.
Since Lambda implementation methods are private on the class that defines them, the pre-generated lambda class must be a nest peer of the defining class in order to call them.
Calls to Class::getNestHost on the lambda class may result in different answers between the two strategies. The nest host will also now include the pre-generated classes in its list of nest members for the pre-generated case.
Users can observe this difference with the Class::getNestHost & ::getNestMembers calls.

3) Stacktraces
Classes generated by the LMF at runtime are not visible in stack traces. The pre-generated classes are visible.
Users will be able to observe this with the StackWalker class and may notice the difference in any tools they use to process stack traces.

This is the initial set of user-visible changes I've run across in this experiment. There are likely other corner cases that I haven't hit yet, and other experiments will reveal other user-visible differences.

The key question out of this effort is whether these kinds of user-visible differences are "acceptable"?
Where do we draw the line and how do we inform users of these differences? --Dan [0] https://github.com/DanHeidinga/jdk-sandbox/pull/1/files (prototype code) [1] https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/Class.html#isHidden() [2] ex.mod.Example$$Lambda$23/0x0000000800c019f0 [3] ex.mod.Example$$Lambda$4 From forax at univ-mlv.fr Wed Aug 3 18:38:13 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 3 Aug 2022 20:38:13 +0200 (CEST) Subject: Bytecode transformation investigation In-Reply-To: References: Message-ID: <764185537.18340734.1659551893567.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "Dan Heidinga" > To: "leyden-dev" > Sent: Tuesday, August 2, 2022 10:30:38 PM > Subject: Bytecode transformation investigation > When Mark kicked off the project, he wrote about the "spectrum of > constraints" enabling optimizations that are weaker than those of the > closed world constraint, but more broadly applicable. In line with > that, I've been doing some investigation into bytecode > transformations. > > While bytecode transformations are strictly less powerful than AOT, > they provide a way to simplify the program we're running based on > information available at build / deploy time. They allow us to move > (some) dynamic behaviour from one phase (runtime) to an earlier one - > behaviour such as reflective operations, runtime class generation, > optional paths, etc can be simplified at the bytecode level based on > information the author (or deployer) of the software knows without > having to discover it at runtime. > > Great! But there's always a catch. And the primary catch here is > that bytecode transformation can result in user visible changes. > Before we go too far down the path of developing transformations, we > should determine which user-visible changes are legitimate and where > the lines need to be drawn. 
> > jlink experiment: > ---------------------- > As a starting point, I prototyped using jlink to transform Lambda > expressions to use pre-generated classes rather than runtime generated > ones. > > Lambda expressions > * encode the lambda body as a private method in the defining class > * use an invokedynamic instruction to dynamically pick the strategy > for creating the lambda instances at runtime, and > * encode a "recipe" combining MethodHandle, MethodType, Class and int > arguments passed to the LambdaMetaFactory to actually generate the > required class and create the lambda instance. > > None of the code outside the LambdaMetafactory (LMF) cares how the > lambda is implemented as long as it meets the contract by implementing > the correct interfaces and by calling the private implementation > method. > > I modified the LMF internals to allow a jlink plugin to pre-generate > the lambda classes [0], but doing so produces user-visible behaviour > changes: > > 1) Lambda classes are no longer hidden anonymous classes. > The LMF loaded the implementation class as a hidden, anonymous class. > This meant Class.forName() can't find the class, that > Class::isHidden()[1] returned true, and that the class was specially > named [2]. > With the pre-generated class, Class.forName can find the class, it is > no longer hidden as it is loaded using normal class loading, and the > name is a normal class name. [3] > > 2) The pregenerated class must be a Nest member peer to the defining class. > Since Lambda implementation methods are private on the class that > defines them, the pre-generated lambda class must be a nest peer of > the defining class inorder to call them. > Calls to Class:getNestHost on the lambda class may result in different > answers between the two strategies. The nest host will also now > include the pre-generated classes in its list of nest members for the > pre-generated case. 
> Users can observe this difference with the Class::getNestHost &
> ::getNestMember calls.
>
> 3) Stacktraces
> Classes generated by the LMF at runtime are not visible in stack
> traces. The pre-generated classes are visible.
> Users will be able to observe this with the StackWalker class and may
> notice the difference in any tools they use to process stack traces.
>
> This is the initial set of user visible changes I've run across in
> this experiment. There are likely other corner cases that I haven't
> hit yet, and other experiments will reveal other user visible
> differences.
>
> The key question out of this effort is whether these kinds of
> user-visible differences are "acceptable"? Where do we draw the line
> and how do we inform users of these differences?

isHidden() returning false is a compatibility issue because I've seen it used as an equivalent of isALambda() (like isAnonymous() was used before isHidden()); GraalVM emulates isHidden() for this reason.

For me, instead of trying to emulate those differences, I think it's easier here to provide a method Class.isLambdaProxy() and add an empty classfile attribute LambdaProxy to the VM spec, so that lambda proxies, whether generated using invokedynamic or pre-generated, will mostly behave the same way.

I'm afraid that Leyden will be exactly that: see how people are using a dynamic thingy, see how it can be emulated at generation time, provide a way for library developers to see them the same way by bridging the gap between the two, and also try to convince library developers that relying too much on implementation details is not a good idea.
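The isHidden()-as-isALambda() idiom mentioned above can be sketched roughly as follows. The helper name looksLikeLambda is hypothetical and the code needs Java 15 or later for Class::isHidden; it is meant only to illustrate why the check is fragile, not to reproduce any particular library's code.

```java
import java.util.function.Supplier;

public class IsLambdaHeuristicSketch {
    // Hypothetical helper mirroring the "isALambda()" idiom. It really asks
    // "is this class hidden?", which is a different question.
    public static boolean looksLikeLambda(Object o) {
        return o.getClass().isHidden();
    }

    public static Supplier<String> asLambda() {
        return () -> "lambda";
    }

    public static Supplier<String> asAnonymousClass() {
        return new Supplier<String>() {
            @Override
            public String get() {
                return "anonymous";
            }
        };
    }

    public static void main(String[] args) {
        // Depends on how the proxy class was produced: runtime-spun proxies are
        // hidden today, pre-generated ones would not be.
        System.out.println(looksLikeLambda(asLambda()));
        // An ordinary class that implements the interface explicitly is not hidden.
        System.out.println(looksLikeLambda(asAnonymousClass()));
        // Both objects honour the same contract, which is all a caller can rely on.
        System.out.println(asLambda().get() + " " + asAnonymousClass().get());
    }
}
```

The first printed value is exactly the kind of answer that a build-time transformation can flip while the behaviour of the Supplier itself stays the same.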
>
> --Dan
>
> [0] https://github.com/DanHeidinga/jdk-sandbox/pull/1/files (prototype code)
> [1]
> https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/Class.html#isHidden()
> [2] ex.mod.Example$$Lambda$23/0x0000000800c019f0
> [3] ex.mod.Example$$Lambda$4

Rémi

From heidinga at redhat.com Wed Aug 3 20:34:46 2022
From: heidinga at redhat.com (Dan Heidinga)
Date: Wed, 3 Aug 2022 16:34:46 -0400
Subject: Bytecode transformation investigation
In-Reply-To: <764185537.18340734.1659551893567.JavaMail.zimbra@u-pem.fr>
References: <764185537.18340734.1659551893567.JavaMail.zimbra@u-pem.fr>
Message-ID:

Thanks for the reply Remi.

> isHidden() returning false is a compatibility issue because i've seen it used has an equivalent of isALambda() (like isAnonymous() was used before isHidden()), GraalVM emulates isHidden() for this reason.

I see two parts to the handling of isHidden. The first is whether the *behaviour change* is a *compatibility issue* or an *implementation detail*. The fact that classes generated by the LMF happen to be hidden classes today seems like an implementation detail (especially given there is no compiler specification that outlines how language features need to be compiled to classfiles). How far do we want to commit to implementation details being compatibility constraints? I would posit the platform is moving away from treating implementation details in this way though there will be cases where they should be constraints.

The second part is why is an "isALambda()" query useful? Can you expand on why it matters whether the instance in hand was created by explicitly implementing the interface or by the LMF?

>
> For me, instead of trying to emulate those differences, i think it's easier here to provide a method Class.isLambdaProxy() and adds an empty classfile attribute LambdaProxy in the VM spec so both the lambda proxy generated using invokedynamic or pre-generated will mostly behave the same way.
>

That's a potential solution (though a costly one in terms of extra VM & spec efforts) but can at best be a suggestion as other class generation tools could easily add the attribute to any class. I think we need to understand which implementation details should be treated as constraints and why this "isALambda" use case is important to preserve before we spec the solution.

> I'm afraid that Leyden will be exactly that, see how people are using a dynamic thingy, see how it can be emulated it at generation time, provide a way for library developers to see them the same way by bridging the gap between the two and also try to convince library developers that relying too much on implementation details is not a good idea.
>

We agree on educating users not to rely on implementation details. And that there's going to be a number of implementation details that we trip over in doing bytecode transformation. Ideally, we end up with a consistent approach to decide which ones should be treated as constraints and which shouldn't.

--Dan

> >
> > --Dan
> >
> > [0] https://github.com/DanHeidinga/jdk-sandbox/pull/1/files (prototype code)
> > [1]
> > https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/Class.html#isHidden()
> > [2] ex.mod.Example$$Lambda$23/0x0000000800c019f0
> > [3] ex.mod.Example$$Lambda$4
>
> Rémi
>

From brian.goetz at oracle.com Wed Aug 3 23:26:38 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 3 Aug 2022 19:26:38 -0400
Subject: Bytecode transformation investigation
In-Reply-To: <764185537.18340734.1659551893567.JavaMail.zimbra@u-pem.fr>
References: <764185537.18340734.1659551893567.JavaMail.zimbra@u-pem.fr>
Message-ID: <50123447-5198-017a-41cc-f21d71c698d3@oracle.com>

> isHidden() returning false is a compatibility issue because i've seen it used has an equivalent of isALambda() (like isAnonymous() was used before isHidden()), GraalVM emulates isHidden() for this reason.

I'm not very sympathetic here.
Code that interprets isHidden in this way is just wrong. There were extensive discussions about "how do I detect whether an object is a lambda" and the answer has consistently been "don't try, you don't need to know, and none of the mechanisms answer the question you are asking."

> For me, instead of trying to emulate those differences, i think it's easier here to provide a method Class.isLambdaProxy() and adds an empty classfile attribute LambdaProxy in the VM spec so both the lambda proxy generated using invokedynamic or pre-generated will mostly behave the same way.

We made a very clear decision in the JSR 335 EG -- that at runtime, lambdas are not a thing. The question of "are you a lambda proxy" is no more interesting than "was it a Tuesday when the source file for this class was last changed", and it was a deliberate choice to not provide any sort of reflection support here. So I would not want to expose this; it's an implementation detail.

From heidinga at redhat.com Thu Aug 4 12:57:01 2022
From: heidinga at redhat.com (Dan Heidinga)
Date: Thu, 4 Aug 2022 08:57:01 -0400
Subject: Bytecode transformation investigation
In-Reply-To: <50123447-5198-017a-41cc-f21d71c698d3@oracle.com>
References: <764185537.18340734.1659551893567.JavaMail.zimbra@u-pem.fr> <50123447-5198-017a-41cc-f21d71c698d3@oracle.com>
Message-ID:

Hi Brian,

Glad we're on the same page regarding isHidden being an implementation detail. Do the ::getNestHost & ::getNestMembers calls and stacktrace differences fall into the same implementation detail bucket in your mind?

I'd be happy for the nest mates cases to be implementation details but would need to look closer at the intersection of stacktraces, @CallerSensitive methods, and the SecurityManager to be certain stacktrace differences aren't making bigger problems. Any other areas of concern with this kind of approach?
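The nest and stack-trace observations in question can be reproduced with a few lines of standard API. The sketch below is an illustration only; the class LambdaObservabilitySketch and its helper names are hypothetical, and the values it prints depend on how the running JDK happens to implement lambda proxies.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;

public class LambdaObservabilitySketch {
    // Class names of the frames on the current stack, as reported by the given walker.
    static List<String> frames(StackWalker walker) {
        return walker.walk(s -> s.map(StackWalker.StackFrame::getClassName)
                                 .collect(Collectors.toList()));
    }

    // Walks the stack from inside a lambda body, so the proxy class's frame
    // sits between the lambda body and this method.
    public static List<String> framesSeenFromLambda(StackWalker walker) {
        Supplier<List<String>> probe = () -> frames(walker);
        return probe.get();
    }

    // Nest host reported for the class that implements a lambda.
    public static Class<?> nestHostOfLambda() {
        Runnable r = () -> { };
        return r.getClass().getNestHost();
    }

    public static void main(String[] args) {
        // The default walker omits hidden frames; SHOW_HIDDEN_FRAMES includes them.
        List<String> visible = framesSeenFromLambda(StackWalker.getInstance());
        List<String> all = framesSeenFromLambda(
                StackWalker.getInstance(StackWalker.Option.SHOW_HIDDEN_FRAMES));
        System.out.println("default walker:     " + visible);
        System.out.println("with hidden frames: " + all);

        // Nest relationships as seen from both sides.
        System.out.println("nest host of lambda: " + nestHostOfLambda().getName());
        System.out.println("nest members of host: "
                + Arrays.toString(LambdaObservabilitySketch.class.getNestMembers()));
    }
}
```

Running this once against runtime-generated proxies and once against pre-generated ones would be one way to see the differences side by side; code that branches on either result is depending on an implementation detail.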
--Dan

On Wed, Aug 3, 2022 at 7:26 PM Brian Goetz wrote:
>
>
> > isHidden() returning false is a compatibility issue because i've seen it used has an equivalent of isALambda() (like isAnonymous() was used before isHidden()), GraalVM emulates isHidden() for this reason.
>
> I'm not very sympathetic here. Code that interprets isHidden in this
> way is just wrong. There were extensive discussions about "how do I
> detect whether an object is a lambda" and the answer has consistently
> been "don't try, you don't need to know, and none of the mechanisms
> answer the question you are asking."
>
> > For me, instead of trying to emulate those differences, i think it's easier here to provide a method Class.isLambdaProxy() and adds an empty classfile attribute LambdaProxy in the VM spec so both the lambda proxy generated using invokedynamic or pre-generated will mostly behave the same way.
>
> We made a very clear decision in the JSR 335 EG -- that at runtime,
> lambdas are not a thing. The question of "are you a lambda proxy" is no
> more interesting than "was it a tuesday when the source file for this
> class was last changed", and it was a deliberate choice to not provide
> any sort of reflection support here. So I would not want to expose
> this; it's an implementation detail.
>

From brian.goetz at oracle.com Thu Aug 4 16:36:27 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 4 Aug 2022 12:36:27 -0400
Subject: Bytecode transformation investigation
In-Reply-To:
References: <764185537.18340734.1659551893567.JavaMail.zimbra@u-pem.fr> <50123447-5198-017a-41cc-f21d71c698d3@oracle.com>
Message-ID: <2b4d5e3e-676d-a523-d8ed-b1b6dafa3d60@oracle.com>

Yes, sorry for the delay, I've been trying to organize my thoughts on this.

Overall I am very happy to see this investigation. It is obviously relevant to a number of points across the Leyden spectrum.
I had done a related thought experiment at one point about behavioral differences, and came up with a similar list with respect to LambdaMetafactory:

 - proxy class goes from hidden to non-hidden;
 - perturbs the set of nestmates of both the proxy class and capturing class;
 - potentially perturbs the timing of loading the proxy class (though this can be controlled);
 - freezing of bootstrap behavior -- if the bootstrap behavior were to change between build time and runtime (e.g., different JDK), any changes wouldn't be reflected in the execution.

Your "stack traces" observation wasn't on my list, so that's a good catch.

The "freezing of behavior" one is likely to be common to a number of Leyden techniques, such as AOT. The answer there likely involves the creation of some sort of coupling between the artifact and a specific JDK version. Since the main mission of `jlink` is to create a runtime image with both an application and a specific JDK, this seems sensible but there is likely additional spec work needed here.

Overall, none of these seem like show stoppers, but the devil is in the details. There are categories of details here, too, such as implementation vs specification.

The implementation details, such as "is it OK to have the lambda proxy class be findable via Class::forName" (even if the lambda is never captured!) need to at least be evaluated through the security lens; does it allow anyone to instantiate a lambda with bogus captured arguments? I'm guessing no, because the constructor/factory is still private to the nest, but this is the sort of question we'd have to ask ourselves. My gut feeling says that these behavioral changes can be, as you suggest, framed as acceptable implementation variation.

From a specification perspective, there are multiple separate specification viewpoints to consider: JLS, JDK and JVMS.
From a JLS perspective, I would say that if the Java *compiler* were to do what your jlink plugin does, this would be a reasonable way to implement a compiler for the Java language -- the classfiles emitted would respect the semantics of the language. There's nothing that says a Java compiler has to translate lambdas with indy, or with hidden classes, so if the indy never got generated, that's not a problem.

From the JDK+JVMS perspective, it starts to get a little murky, and one of the goals of Leyden is to bring more clarity to this area. The compiler emits certain classfiles with `invokedynamic`, and then some build-time tool rewrites these classes to be different. Is this OK? If the build-time tool is just "Dan's Magic Unofficial (Not) Java Bytecode Mangler", then this is the sort of build time mangling people do every day. But we want this to be an official part of the platform, so I think there's a little more specification work to be done to allow (and specify) such transformations. This is not a deal breaker, but we need to apply more thought here. I think there are two categories of new work here: some specification work to characterize what build-time transformations like this are allowed to do or not do, and your transformer will likely want a specification for what it does as well.

As with related techniques such as intrinsification, we need to ensure that there are not going to be observable differences with respect to behavior specified by either JVMS or JDK, or that those differences are permissible under the specifications.
Some of the things to worry about here might be:

 - timing of loading the proxy class
 - observable side-effects of indy linkage
 - observable side-effects of bootstrap execution
 - conformance with LMF specification, not just for the code shapes emitted by `javac`, but for any code shape supported by LMF

From a side-effects perspective, the answer might well be "there aren't any", but the claim "this code has no side-effects" is often both tricky to ascertain, and can easily become false over time as the code is evolved.

As an example (I'm not worried about this one, but it is a good illustration), there's a system property, `jdk.internal.lambda.dumpProxyClasses`, which causes proxy class files to be dumped to the file system for debugging. That's a side-effect of bootstrap execution that would not happen (or would happen at build time instead of run time). As this one turns out, this is an implementation detail, not a specified behavior, but this is the sort of line-by-line analysis we'd have to do to convince ourselves that what we're doing is safe -- and watch how the bootstrap implementation evolves to keep it so.

On 8/4/2022 8:57 AM, Dan Heidinga wrote:
> Hi Brian,
>
> Glad we're on the same page regarding isHidden being an implementation
> detail. Do the ::getNestHost & ::getNestMembers calls and stacktrace
> differences fall into the same implementation detail bucket in your
> mind?
>
> I'd be happy for the nest mates cases to be implementation details
> but would need to look closer at the intersection of stacktraces,
> @callerSensitive methods, and the SecurityManager to be certain
> stacktrace differences aren't making bigger problems. Any other areas
> concern with this kind of approach?
>
> --Dan
>
> On Wed, Aug 3, 2022 at 7:26 PM Brian Goetz wrote:
>>
>>> isHidden() returning false is a compatibility issue because i've seen it used has an equivalent of isALambda() (like isAnonymous() was used before isHidden()), GraalVM emulates isHidden() for this reason.
>> I'm not very sympathetic here. Code that interprets isHidden in this
>> way is just wrong. There were extensive discussions about "how do I
>> detect whether an object is a lambda" and the answer has consistently
>> been "don't try, you don't need to know, and none of the mechanisms
>> answer the question you are asking."
>>
>>> For me, instead of trying to emulate those differences, i think it's easier here to provide a method Class.isLambdaProxy() and adds an empty classfile attribute LambdaProxy in the VM spec so both the lambda proxy generated using invokedynamic or pre-generated will mostly behave the same way.
>> We made a very clear decision in the JSR 335 EG -- that at runtime,
>> lambdas are not a thing. The question of "are you a lambda proxy" is no
>> more interesting than "was it a tuesday when the source file for this
>> class was last changed", and it was a deliberate choice to not provide
>> any sort of reflection support here. So I would not want to expose
>> this; it's an implementation detail.
>>

From heidinga at redhat.com Fri Aug 5 14:49:31 2022
From: heidinga at redhat.com (Dan Heidinga)
Date: Fri, 5 Aug 2022 10:49:31 -0400
Subject: Bytecode transformation investigation
In-Reply-To: <2b4d5e3e-676d-a523-d8ed-b1b6dafa3d60@oracle.com>
References: <764185537.18340734.1659551893567.JavaMail.zimbra@u-pem.fr> <50123447-5198-017a-41cc-f21d71c698d3@oracle.com> <2b4d5e3e-676d-a523-d8ed-b1b6dafa3d60@oracle.com>
Message-ID:

Responding to one piece of this now as it's important to get everyone on the same page with the requirements.
And I know I've tripped over the "move fast, break things" philosophy multiple times in this space before coming to this conclusion.

> From a specification perspective, there are multiple separate specifications viewpoints to consider: JLS, JDK and JVMS. From a JLS perspective, I would say that if the Java *compiler* were to do what your jlink plugin does, this would be a reasonable way to implement a compiler for the Java language -- the classfiles emitted would respect the semantics of the language. There's nothing that says a Java compiler has to translate lambdas with indy, or with hidden classes, so if the indy never got generated, that's not a problem.
>
> From the JDK+JVMS perspective, it starts to get a little murky, and one of the goals of Leyden is to bring more clarity to this area. The compiler emits certain classfiles with `invokedynamic`, and then some build-time tool rewrites these classes to be different. Is this OK? If the build-time tool is just "Dan's Magic Unofficial (Not) Java Bytecode Mangler", then this is the sort of build time mangling people do every day. But we want this to be an official part of the platform, so I think there's a little more specification work to be done to allow (and specify) such transformations. This is not a deal breaker, but we need to apply more thought here. I think there are two categories of new work here: some specification work to characterize what build-time transformations like this are allowed to do or not do, and your transformer will likely want a specification for what it does as well.
>

What if we doubled down on treating all pre-runtime bytecode transformations as optional behaviours akin to "Dan's Magic Unofficial (Not) Java Bytecode Mangler" despite shipping with the platform? Each transformation - jlink plugin? - could be self-describing so users know what they are opting into (the key point!) when they enable the transformation.
This allows treating these transformations as a pre-step that has significant leeway on what it does provided the modified classfiles run correctly. The JVM's role is then to load / verify / execute the classes as required by the application and defined by the JVM specification. Anything done to the classfiles prior to that is outside the JVM spec's remit. This "user opt-in to transformations" model shrinks the two categories to one: specifying what a transformer does. As the first "specification work to characterize what build-time transformations like this are allowed to do or not do" category is answered with "whatever they want, provided they generate valid classfiles". And if the user is opting-in for an application-specific runtime (jlinked), then why not? Although it's kind of satisfying to say we can do what we want here, it doesn't actually work. Why? Because this model destroys any invariants built into the JDK platform. Don't like how a method operates? Transform it to do something else! Introduce bugs! Open security holes! It's trivially easy to break the platform invariants, get surprising results, or open subtle security holes here. Basically, all the concerns raised with Native Image's Substitution mechanism come into play here. Though it's possible to do many of these things today with JVMTI agents or even user written jlink plugins (or historically by hand hacking rt.jar), it's less common because it's hard! and because users have been rightfully wary of what this can do to their applications. Not to mention that Support Engineers will hate us if we take this approach as it's hard to argue something isn't a supported config if jdk ships the transformation that breaks the invariant. All that to say, I think the "specification work to characterize what build-time transformations like this are allowed to do or not do" is important to this work actually being successful. 
--Dan

From forax at univ-mlv.fr Fri Aug 5 17:35:51 2022
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Fri, 5 Aug 2022 19:35:51 +0200 (CEST)
Subject: Bytecode transformation investigation
In-Reply-To: <50123447-5198-017a-41cc-f21d71c698d3@oracle.com>
References: <764185537.18340734.1659551893567.JavaMail.zimbra@u-pem.fr> <50123447-5198-017a-41cc-f21d71c698d3@oracle.com>
Message-ID: <1337937183.19043020.1659720951533.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Brian Goetz"
> To: "Remi Forax" , "Dan Heidinga"
> Cc: "leyden-dev"
> Sent: Thursday, August 4, 2022 1:26:38 AM
> Subject: Re: Bytecode transformation investigation

>> isHidden() returning false is a compatibility issue because i've seen it used
>> has an equivalent of isALambda() (like isAnonymous() was used before
>> isHidden()), GraalVM emulates isHidden() for this reason.
>
> I'm not very sympathetic here. Code that interprets isHidden in this
> way is just wrong. There were extensive discussions about "how do I
> detect whether an object is a lambda" and the answer has consistently
> been "don't try, you don't need to know, and none of the mechanisms
> answer the question you are asking."
>
>> For me, instead of trying to emulate those differences, i think it's easier here
>> to provide a method Class.isLambdaProxy() and adds an empty classfile attribute
>> LambdaProxy in the VM spec so both the lambda proxy generated using
>> invokedynamic or pre-generated will mostly behave the same way.
>
> We made a very clear decision in the JSR 335 EG -- that at runtime,
> lambdas are not a thing. The question of "are you a lambda proxy" is no
> more interesting than "was it a tuesday when the source file for this
> class was last changed", and it was a deliberate choice to not provide
> any sort of reflection support here. So I would not want to expose
> this; it's an implementation detail.

Okay, I was wrong here.
R?mi From forax at univ-mlv.fr Fri Aug 5 17:38:05 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 5 Aug 2022 19:38:05 +0200 (CEST) Subject: Bytecode transformation investigation In-Reply-To: <2b4d5e3e-676d-a523-d8ed-b1b6dafa3d60@oracle.com> References: <764185537.18340734.1659551893567.JavaMail.zimbra@u-pem.fr> <50123447-5198-017a-41cc-f21d71c698d3@oracle.com> <2b4d5e3e-676d-a523-d8ed-b1b6dafa3d60@oracle.com> Message-ID: <184503407.19043808.1659721085425.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Dan Heidinga" > Cc: "leyden-dev" > Sent: Thursday, August 4, 2022 6:36:27 PM > Subject: Re: Bytecode transformation investigation > Yes, sorry for the delay, I've been trying to organize my thoughts on this. > Overall I am very happy to see this investigation. It is obviously relevant to a > number of points across the Leyden spectrum. I had done a related thought > experiment at one point about behavioral differences, and came up with a > similar list with respect to LambdaMetafactory: > - proxy class goes from hidden to non-hidden; > - perturbs the set of nestmates of both the proxy class and capturing class; > - potentially perturbs the timing of loading the proxy class (though this can be > controlled); > - freezing of bootstrap behavior -- if the bootstrap behavior were to change > between build time and runtime (e.g., different JDK), any changes wouldn't be > reflected in the execution. > Your "stack traces" observation wasn't on my list, so that's a good catch. also, you can not change the fields of the classes by reflection and the field access are constant folded, the last one is vicious because it means that the performance are not the same. R?mi -------------- next part -------------- An HTML attachment was scrubbed... 
From forax at univ-mlv.fr Fri Aug 5 17:49:31 2022
From: forax at univ-mlv.fr (Remi Forax)
Date: Fri, 5 Aug 2022 19:49:31 +0200 (CEST)
Subject: Bytecode transformation investigation
In-Reply-To:
References: <764185537.18340734.1659551893567.JavaMail.zimbra@u-pem.fr>
 <50123447-5198-017a-41cc-f21d71c698d3@oracle.com>
 <2b4d5e3e-676d-a523-d8ed-b1b6dafa3d60@oracle.com>
Message-ID: <1395024911.19046050.1659721771245.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Dan Heidinga"
> To: "Brian Goetz"
> Cc: "leyden-dev"
> Sent: Friday, August 5, 2022 4:49:31 PM
> Subject: Re: Bytecode transformation investigation

> Responding to one piece of this now as it's important to get everyone
> on the same page with the requirements. And I know I've tripped over
> the "move fast, break things" philosophy multiple times in this space
> before coming to this conclusion.
>
>> From a specification perspective, there are multiple separate specification
>> viewpoints to consider: JLS, JDK and JVMS. From a JLS perspective, I would say
>> that if the Java *compiler* were to do what your jlink plugin does, this would
>> be a reasonable way to implement a compiler for the Java language -- the
>> classfiles emitted would respect the semantics of the language. There's
>> nothing that says a Java compiler has to translate lambdas with indy, or with
>> hidden classes, so if the indy never got generated, that's not a problem.
>>
>> From the JDK+JVMS perspective, it starts to get a little murky, and one of the
>> goals of Leyden is to bring more clarity to this area. The compiler emits
>> certain classfiles with `invokedynamic`, and then some build-time tool rewrites
>> these classes to be different. Is this OK? If the build-time tool is just
>> "Dan's Magic Unofficial (Not) Java Bytecode Mangler", then this is the sort of
>> build time mangling people do every day. But we want this to be an official
>> part of the platform, so I think there's a little more specification work to be
>> done to allow (and specify) such transformations. This is not a deal breaker,
>> but we need to apply more thought here. I think there are two categories of
>> new work here: some specification work to characterize what build-time
>> transformations like this are allowed to do or not do, and your transformer
>> will likely want a specification for what it does as well.
>>
>
> What if we doubled down on treating all pre-runtime bytecode
> transformations as optional behaviours akin to "Dan's Magic Unofficial
> (Not) Java Bytecode Mangler" despite shipping with the platform? Each
> transformation - jlink plugin? - could be self describing so users
> know what they are opting (the key point!) into when they enable the
> transformation. This allows treating these transformations as a
> pre-step that has significant leeway on what it does provided the
> modified classfiles run correctly.

I don't think it should be run by jlink, but rather as a post-processing step of javac, more like annotation processors. It would then work with anything that uses invokedynamic.

If the transformations are done by jlink, you can do more of them, for example resolving Class.forName() / ServiceLoader calls, but then you are under the closed-world assumption.

>
> The JVM's role is then to load / verify / execute the classes as
> required by the application and defined by the JVM specification.
> Anything done to the classfiles prior to that is outside the JVM
> spec's remit.
>
> This "user opt-in to transformations" model shrinks the two categories
> to one: specifying what a transformer does. The first category,
> "specification work to characterize what build-time transformations
> like this are allowed to do or not do", is answered with
> "whatever they want, provided they generate valid classfiles". And if
> the user is opting-in for an application-specific runtime (jlinked),
> then why not?
>
> Although it's kind of satisfying to say we can do what we want here,
> it doesn't actually work. Why? Because this model destroys any
> invariants built into the JDK platform.
>
> Don't like how a method operates? Transform it to do something else!
> Introduce bugs! Open security holes! It's trivially easy to break
> the platform invariants, get surprising results, or open subtle
> security holes here. Basically, all the concerns raised with Native
> Image's Substitution mechanism come into play here. Though it's
> possible to do many of these things today with JVMTI agents or even
> user-written jlink plugins (or historically by hand-hacking rt.jar),
> it's less common because it's hard! and because users have been
> rightfully wary of what this can do to their applications.

Why do you want the user to be able to opt in to an unbounded set of transformations? You can be far more restrictive by saying that there is only one javac flag to opt in to a more "static" view of the world; whether a bytecode transformer is used then becomes an implementation detail.

>
> Not to mention that Support Engineers will hate us if we take this
> approach as it's hard to argue something isn't a supported config if
> the JDK ships the transformation that breaks the invariant.
>
> All that to say, I think the "specification work to characterize what
> build-time transformations like this are allowed to do or not do" is
> important to this work actually being successful.

yes

>
> --Dan

Rémi

From brian.goetz at oracle.com Fri Aug 5 20:39:40 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 5 Aug 2022 16:39:40 -0400
Subject: Bytecode transformation investigation
In-Reply-To: <1395024911.19046050.1659721771245.JavaMail.zimbra@u-pem.fr>
References: <764185537.18340734.1659551893567.JavaMail.zimbra@u-pem.fr>
 <50123447-5198-017a-41cc-f21d71c698d3@oracle.com>
 <2b4d5e3e-676d-a523-d8ed-b1b6dafa3d60@oracle.com>
 <1395024911.19046050.1659721771245.JavaMail.zimbra@u-pem.fr>
Message-ID: <12a803ab-4db4-4d75-9e79-e27c7e33bb82@oracle.com>

Remi;

I think this misses a bigger picture here. A key goal of Leyden is that we be able to _selectively and flexibly constrain and shift dynamism_. We don't want to force users to decide at compile time whether they want to AOT it, partially evaluate the program, gather profiling data, constrain away indy and other classloading, etc; we want them to be able to write their program and run it on a dynamic VM, as well as choosing to condense it (perhaps in a series of phases) to shift some behavior from runtime to an earlier phase, all completely optionally.

This is why Dan has focused on jlink; because jlink is positioned as the thing you run when you're ready to accept some tighter coupling in exchange for a smaller or faster deployment unit. So the choice of `jlink` here is entirely appropriate, and has the advantage that developers can do their develop-test-run cycle with the lightest possible build chain, and spend more cycles to condense the program to a smaller/faster one only when spending those cycles has a positive return.

From forax at univ-mlv.fr Fri Aug 5 21:18:57 2022
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Fri, 5 Aug 2022 23:18:57 +0200 (CEST)
Subject: Bytecode transformation investigation
In-Reply-To: <12a803ab-4db4-4d75-9e79-e27c7e33bb82@oracle.com>
References: <764185537.18340734.1659551893567.JavaMail.zimbra@u-pem.fr>
 <50123447-5198-017a-41cc-f21d71c698d3@oracle.com>
 <2b4d5e3e-676d-a523-d8ed-b1b6dafa3d60@oracle.com>
 <1395024911.19046050.1659721771245.JavaMail.zimbra@u-pem.fr>
 <12a803ab-4db4-4d75-9e79-e27c7e33bb82@oracle.com>
Message-ID: <1603403938.19082449.1659734337566.JavaMail.zimbra@u-pem.fr>

> From: "Brian Goetz"
> To: "Remi Forax", "Dan Heidinga"
> Cc: "leyden-dev"
> Sent: Friday, August 5, 2022 10:39:40 PM
> Subject: Re: Bytecode transformation investigation

> Remi;

> I think this misses a bigger picture here. A key goal of Leyden is that we be
> able to _selectively and flexibly constrain and shift dynamism_.
> We don't want
> to force users to decide at compile time whether they want to AOT it, partially
> evaluate the program, gather profiling data, constrain away indy and other
> classloading, etc; we want them to be able to write their program and run it on
> a dynamic VM, as well as choosing to condense it (perhaps in a series of
> phases) to shift some behavior from runtime to an earlier phase, all completely
> optionally.

> This is why Dan has focused on jlink; because jlink is positioned as the thing
> you run when you're ready to accept some tighter coupling in exchange for a
> smaller or faster deployment unit. So the choice of `jlink` here is entirely
> appropriate, and has the advantage that developers can do their
> develop-test-run cycle with the lightest possible build chain, and spend more
> cycles to condense the program to a smaller/faster one only when spending those
> cycles has a positive return.

Thanks for re-explaining the problem to me.

The downside I see is that most of the tooling we have is not ready for that: nobody tests after jlink, which is why people expect 100% compatibility with the bytecode generated by javac.

I don't have a magic solution; I just know that doing the transformation at the same time as the generation of the classfiles is easier.

Rémi