Call for Discussion: JEP: Java Expression Trees API

Mon May 6 17:50:21 UTC 2024

Hi Konstantin,

> On May 4, 2024, at 4:06 AM, Konstantin Triger <kostat at gmail.com> wrote:
> 
> Hi Paul,
> 
> > I don’t think we should embed specific profiles of code into the platform.
> This makes sense depending on what you mean by "embed". Today there is no way to specify anything. How will a library be able to know what profile @CodeReflection annotated method or Quoted lambda should conform to?
> 

A library consuming code models specifies constraints on the Java programming model and the developer passing reflected code to that library is aware of those constraints. It may be possible in some cases for composition (code models in, code models out), for example a machine learning library could potentially leverage an auto-diff library. It’s still early days so it will be interesting to see how this plays out.
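As a rough illustration of a library stating and checking its own constraints, here is a minimal sketch. It does not use the Babylon API: `Op`, `SUPPORTED`, and `unsupported` are hypothetical names over a toy tree of named operations, and the supported set is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical toy model, NOT the Babylon API: an op has a name and nested child ops.
final class ProfileCheck {
    record Op(String name, List<Op> children) {
        static Op of(String name, Op... children) {
            return new Op(name, List.of(children));
        }
    }

    // Op names this imaginary library is willing to translate (an assumption).
    static final Set<String> SUPPORTED =
            Set.of("var.load", "constant", "lt", "add", "java.cexpression", "return");

    // Walks the tree depth-first and returns the names of unsupported ops in visit order.
    static List<String> unsupported(Op root) {
        List<String> found = new ArrayList<>();
        collect(root, found);
        return found;
    }

    private static void collect(Op op, List<String> found) {
        if (!SUPPORTED.contains(op.name())) {
            found.add(op.name());
        }
        for (Op child : op.children()) {
            collect(child, found);
        }
    }
}
```

A library could run such a check when it receives reflected code and report the offending operations to the developer; doing the same at compile time is the integration question that remains open.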

> --
> Code Model analysis thoughts.
> The current design defines a very lean set of Core ops and allows extensions (Extended ops), which basically go after language (e.g. Java) features. My understanding is that other eco languages should define their respective Extended ops.

“Core” and “Extended” are poorly named; eventually we will find better names. The core and extended operations are designed to support the modeling of Java programs. Other "eco languages" do not need to depend on core ops. I think it depends on the use case and the target environment. They can also define their own distinct set of operations (e.g., see the Triton example), so they would depend on the meta-code model.

You make an interesting point about lowering and I have been wondering about that myself. But, I am wary of moving outside the domain of Java. We (Babylon team) need to keep very focused, otherwise the problem space becomes very large, and it’s already large! The meta-code model provides generic modeling of control flow (connected basic blocks), so for Java’s modeling purposes I did not see the need to add more abstract operations, such as a more general (lowerable) loop operation. I was thinking others can define their own operations for such purposes, or reinterpret the Java operations.
(Note that in the Triton example we translate the for loop operation to an SSA-based counted loop. In an earlier design I tried to model Java high-level language constructs in SSA form. This quickly became complex, so I backed off.)

However, even if we keep a Java focus, perhaps there are more intermediate (perhaps optional) steps in this regard that would make analysis of Java programs easier and that just so happen to be more generally useful, just as I think the meta-code model is more generally useful. Lowering to generic (further lowerable) operations would make a great example and test case.
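To make the idea of an intermediate lowering level concrete, here is a minimal sketch. It is not Babylon code: `JavaForOp`, `JavaWhileOp`, and `LoopOp` are hypothetical toy records (bodies are plain strings), and the shape of the generic loop is an assumption for illustration only.

```java
import java.util.List;

// Hypothetical toy ops, NOT the Babylon API. Bodies are plain strings to keep the sketch short.
final class LoopLowering {
    interface Op {}

    record JavaForOp(String init, String cond, String update, String body) implements Op {}
    record JavaWhileOp(String cond, String body) implements Op {}
    // Assumed intermediate op: one generic loop shape that both Java loops lower to.
    record LoopOp(String init, String cond, String step, String body) implements Op {}
    record OtherOp(String text) implements Op {}

    // First lowering level: Java-specific loops become LoopOp; every other op passes through.
    static Op lower(Op op) {
        if (op instanceof JavaForOp f) {
            return new LoopOp(f.init(), f.cond(), f.update(), f.body());
        }
        if (op instanceof JavaWhileOp w) {
            // A while loop is treated as a loop with no initialization and no step.
            return new LoopOp("", w.cond(), "", w.body());
        }
        return op;
    }

    static List<Op> lowerAll(List<Op> ops) {
        return ops.stream().map(LoopLowering::lower).toList();
    }
}
```

An analysis library would then only need to understand `LoopOp`, and a further lowering step could turn `LoopOp` into connected basic blocks.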

> For the library developer this poses a dilemma: should I work with Extended or Core ops? Extended might be much easier, but can be a very diverse + constantly moving target. On the other hand, the Core is very lean and low level.
> Maybe it would be possible to extend the Core set with more ops? For example JavaConditionalExpressionOp -> ConditionalExpressionOp. 
> It should be also possible to define a generic version of LoopExpressionOp, TryExpressionOp, SwitchExpressionOp, etc. (All these Ops can be lowerable to a simpler form)
> The idea is to have several levels of lowering. For example, JavaForOp and JavaWhileOp will lower to LoopExpressionOp first.
> If the "intermediate" Ops set will be convenient and stable, it will allow the analysis libraries to target it in many practical cases.
> 
> --
> I understand very well the security consideration of having Quotable interface. But it comes with a price of defining another interface and a potentially redundant hierarchy as a result.

That’s true, I suspect it will likely be more impactful than serializable lambdas.

> Just wanted to mention that there might be alternative implementations addressing this requirement.
> 1. Structural quoting (a la C#) - derives from the variable type, e.g. Quoted<T>. So, the user understands that the lambda body will be accessible.

There are some challenges with Java’s type system, which is why we ended up where we are with the current approaches. Maurizio is better able than I to describe these challenges.

> 2. Special annotation applied on the variable or a method parameter.
> 

This is tricky. It’s not actually clear to me yet whether `@CodeReflection` on methods is the right way to go. We are skating on thin ice, as platform annotations should not change program behavior. We might need a contextual keyword. Even so it may be a deep cut to apply beyond method declarations themselves, and we would still need to wire up reflectable lambda bodies into the type system. (I suspect we are likely to revisit some of the design discussions on serializable lambdas in that respect.)

Probably Quotable should not have any methods on it, and one has to use a reflection API to access the quoted instance. Thereby avoiding situations where the developer explicitly implements the quotable functional interface using a class, from which we may be able to enforce the method implementation to be reflectable.
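For comparison, serializable lambdas already have this shape today: a marker interface with no methods, and reflective access to the extra information. The sketch below uses only standard JDK classes and nothing from Babylon. `SerFunction` and `describe` are names made up for the example, and looking up the synthetic `writeReplace` method relies on how current JDKs happen to implement serializable lambdas, so treat that as an assumption rather than a supported API.

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;
import java.util.function.Function;

final class SerializableLambdaDemo {
    // Opt-in marker: Serializable itself declares no methods.
    interface SerFunction<T, R> extends Function<T, R>, Serializable {}

    // Reads the serialized form of a lambda via reflection.
    // Assumes the lambda class has the synthetic writeReplace method (an implementation detail).
    static SerializedLambda describe(Serializable lambda) {
        try {
            Method writeReplace = lambda.getClass().getDeclaredMethod("writeReplace");
            writeReplace.setAccessible(true);
            return (SerializedLambda) writeReplace.invoke(lambda);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("not a serializable lambda?", e);
        }
    }

    static SerializedLambda example() {
        // The target type opts this lambda in to the extra capability.
        SerFunction<Integer, Integer> inc = x -> x + 1;
        return describe(inc);
    }
}
```

The functional interface carries no accessor for the serialized form; the caller has to go through reflection, which is the property suggested above for Quotable.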

Paul.

> Thanks,
> Kosta
> 
> On Thu, May 2, 2024 at 2:00 AM Paul Sandoz <paul.sandoz at oracle.com> wrote:
> Hi Konstantin,
> 
> I don’t think we should embed specific profiles of code into the platform. This is something that libraries should get to define and we work out how to connect those libraries into the compilation process (like previously mentioned to report errors). We have not put much effort into working out that integration, it’s less of a priority for us right now. IMO a higher priority would be exploring the analysis required on code models to help support such profiling. This would help ensure the core code model design is fit for purpose and comes with useful APIs.
> 
> One analysis area I am interested in is producing a dependency tree, which in effect can produce the expression tree, since after all the expressions are there, they are just encoded differently. IIRC I believe Gary has explored some of that to help lift up to OpenCL C 99 source code. Given a code model body we should be able to ask the question “does the code in this body conform to an expression?” and perhaps “if so please give me the tree structure of the operations because I prefer it in that form”. We have the low-level means to traverse the data dependencies (down from a value to its uses, and up from a result to its operands), that’s a fundamental part of the model and we can build up from that. More generally we can apply such analysis to values within a body e.g., what’s the expression tree, if any, for the value assigned to this variable.
> 
> --
> 
> Quotable behaves similarly to Serializable. It tells the compiler this is a special kind of lambda expression, and thus the instance of the functional interface has more capabilities, in this case the lambda body being made accessible as a code model. Importantly it also tells the reader of source code that the writer of source code explicitly opted in to the code being made accessible. Such a quotable lambda expression conforms to the existing lambda expression targeting behavior, so the compiler will verify the lambda expression is target compatible. We also have support for structural quoting where a lambda expression can be targeted directly to Quoted, in such cases there is no functional interface associated with the lambda expression. I can see how both may be useful in different cases, but we are not fully certain about the latter.
> 
> Paul.
> 
> [*] More info…
> 
> FWIW pattern expressions are modeled as trees encoded as operations and dependencies. So far it has worked out well, and it has proven easy to traverse (backwards in the code model from the root pattern operation upwards through the operands) and lower into executable code. Here’s a link to an example for nested record patterns and a link to the lowering code:
> 
> https://github.com/openjdk/babylon/blob/code-reflection/test/langtools/tools/javac/reflect/PatternsTest.java#L169
> 
> https://github.com/openjdk/babylon/blob/code-reflection/src/java.base/share/classes/java/lang/reflect/code/op/ExtendedOps.java#L2671
> (You may be able to observe for records it walking up the code model tree by recursively traversing the operands of the record operation.)
> 
> I have also used some simple analysis for transforming a counted for loop into SSA-based for operation (Triton example) where the expressions within the initialization, condition, and step bodies need to be analyzed and some of them hoisted out.
> 
> 
> > On May 1, 2024, at 2:42 PM, Konstantin Triger <kostat at gmail.com> wrote:
> > 
> > Hi Paul,
> > 
> > Thanks a lot for your extensive response, it's very much appreciated.
> > I totally agree with you that the Code Model's MLIR-like approach addresses very well all the "imperative" (i.e. when the target is an imperative language/runtime) translations.
> > 
> > My call is about "declarative" translations. Into this category fall SQL (and other QLs), fluent configurations and "general" declarative DSLs.
> > For these, the MLIR approach is too low level. Reconstruction of mathematical expressions is a fairly challenging task, in addition to other issues I mentioned in previous mails.
> > Basically, the developer needs first to eliminate branches and other "incompatible" ops and then perform compatibility checks. In principle, this is what I do today, but it would be nice to get some compile time validation and tooling support.
> > 
> > All this makes me think of an opportunity to address "declarative" translation cases better inside the project Babylon.
> > For example, there can be a "declarative" profile, which would prohibit certain instructions and put some additional constraints (TBD) on what can be inside the method. To enable this profile @CodeReflection might have a parameter. Also there should be a mechanism to specify that a certain method accepts Quotable lambda with "declarative" profile; potentially @CodeReflection targets might include PARAMETER and LOCAL_VARIABLE? (Honestly I didn't fully understand why there is a need in a magical Quotable interface, it feels like a potential limitation and the lambda's Quoted object could be retrieved by other means).
> > 
> > Then there can be an external library that will convert Code Model to a potentially better representation, e.g. Expressions. Ideally it could be part of JDK, but just having the profile is "good enough" for the beginning.
> > 
> > Thanks,
> > Kosta
> > 
> > 
> > 
> > 
> > On Mon, Apr 29, 2024 at 11:32 PM Paul Sandoz <paul.sandoz at oracle.com> wrote:
> > Hi Konstantin,
> > 
> > Thanks for looking.
> > 
> > A significant challenge when devising the Babylon code model is ensuring it can support a wide variety of use cases for analysis and transformation. Inevitably because of its broad scope it will never be as convenient as something devised for a specific use-case. However, I believe it can be applied effectively to your use case, where you can use it as a more appropriate form to analyze and transform Java code, as opposed to doing so using bytecode. In many Babylon use cases I would expect translation into something more specific for the domain, whether that is a specific kind of model, or using the same Babylon code model capabilities with domain specific operations (see later for some examples).
> > 
> > You have a constrained programming model. The developer can always express more in their Java source files than the constrained programming model allows. So you have to verify and report errors when the developer uses some language construct, or particular expression of, that is not supported by the programming model. IIUC since you are lifting from bytecode there will be bytecode instructions, produced from unsupported language constructs, that you don’t support and will reject. If you translated from a code model you would do something similar. Ideally such errors should be reported at compile time, rather than at run time. This is something we need to explore more fully in Babylon. At the moment we have rudimentary access to models at compile time that are also accessible at run time. We need to work out how a Java library can participate in analyzing models at compile time, to say, reject and report errors when a model contains unsupported operations or structural patterns.
> > 
> > There are similarities between Class-File API and Babylon. The building and translation of code models was inspired by the Class-File API. But the way code is modeled in Babylon is very different from byte code. Take for example the expression you mention:
> > 
> > @CodeReflection
> > static int nestedConditionalExpr(int i, int j) {
> >   return (i < 2) ? (j < 3) ? i : j : i + j;
> > }
> > 
> > The code model produced by the source compiler preserves the nested structure of the conditional expressions, as modeled by the java.cexpression operation. It’s AST-like, closer to the compiler’s AST, but the tree is more shallow. If the developer does not want to handle the specifics of those operations they can transform and lower the model into one which results in a control flow graph of (seven) connected (basic) blocks. This is more compiler-like, closer to bytecode, but with useful properties for control flow and data dependency analysis. We can go even more compiler-like by translating into pure SSA form, which may further aid analysis. Those three models are all represented in the same meta-code model. In the latter two cases I expect it would be more challenging to transform to an SQL CASE expression, but I would further expect it would be even more challenging to do so from bytecode. (Note I am not an expert on SQL!)
> > The textual serialization of those three models is presented at the end of the email. 
> > 
> > It is possible to translate a code model originating from Java source into a code model associated with some other domain. We are exploring a number of examples of this, primarily focused on the GPU use-case but which I think show what is generally possible:
> > 
> > 1. Translating Java code models to SPIRV code models, from which we further translate to SPIRV binaries. (See the example in the Babylon repo.)
> > 2. Translating Java code models to Triton code models, the intent being to then translate into Triton MLIR and compile to executable code using the MLIR toolchain. (See the example in the Babylon repo, and the article on the Babylon project page.)
> > 
> > In other examples we translate into different forms of model:
> > 
> > 3. Translating Java code models to OpenCL C code. This is an example of where we prefer to operate on high-level Java code models since it makes the problem easier. (This is part of our GPU investigations. We will be open sourcing that some time this year.)
> > 4. Translating Java code models to bytecode and bytecode to Java code models. (This is part of the Babylon repo. WIP.)
> > 
> > (Note when I say Java code models I mean code models that preserve Java program meaning.)
> > 
> > Hope this gives some more perspective and you find it informative,
> > Paul.
> > 
> > 
> > func @"nestedConditionalExpr" @loc="15:5:file:/Users/sandoz/Projects/jdk/test/babylon-test/src/main/java/X.java" (%0 : int, %1 : int)int -> {
> >     %2 : Var<int> = var %0 @"i" @loc="15:5";
> >     %3 : Var<int> = var %1 @"j" @loc="15:5";
> >     %4 : int = java.cexpression @loc="17:16"
> >         ()boolean -> {
> >             %5 : int = var.load %2 @loc="17:17";
> >             %6 : int = constant @"2" @loc="17:21";
> >             %7 : boolean = lt %5 %6 @loc="17:17";
> >             yield %7 @loc="17:16";
> >         }
> >         ()int -> {
> >             %8 : int = java.cexpression @loc="17:26"
> >                 ()boolean -> {
> >                     %9 : int = var.load %3 @loc="17:27";
> >                     %10 : int = constant @"3" @loc="17:31";
> >                     %11 : boolean = lt %9 %10 @loc="17:27";
> >                     yield %11 @loc="17:26";
> >                 }
> >                 ()int -> {
> >                     %12 : int = var.load %2 @loc="17:36";
> >                     yield %12 @loc="17:26";
> >                 }
> >                 ()int -> {
> >                     %13 : int = var.load %3 @loc="17:40";
> >                     yield %13 @loc="17:26";
> >                 };
> >             yield %8 @loc="17:16";
> >         }
> >         ()int -> {
> >             %14 : int = var.load %2 @loc="17:44";
> >             %15 : int = var.load %3 @loc="17:48";
> >             %16 : int = add %14 %15 @loc="17:44";
> >             yield %16 @loc="17:16";
> >         };
> >     return %4 @loc="17:9";
> > };
> > 
> > func @"nestedConditionalExpr" @loc="15:5:file:/Users/sandoz/Projects/jdk/test/babylon-test/src/main/java/X.java" (%0 : int, %1 : int)int -> {
> >     %2 : Var<int> = var %0 @"i" @loc="15:5";
> >     %3 : Var<int> = var %1 @"j" @loc="15:5";
> >     %4 : int = var.load %2 @loc="17:17";
> >     %5 : int = constant @"2" @loc="17:21";
> >     %6 : boolean = lt %4 %5 @loc="17:17";
> >     cbranch %6 ^block_0 ^block_1;
> > 
> >   ^block_0:
> >     %7 : int = var.load %3 @loc="17:27";
> >     %8 : int = constant @"3" @loc="17:31";
> >     %9 : boolean = lt %7 %8 @loc="17:27";
> >     cbranch %9 ^block_2 ^block_3;
> > 
> >   ^block_2:
> >     %10 : int = var.load %2 @loc="17:36";
> >     branch ^block_4(%10);
> > 
> >   ^block_3:
> >     %11 : int = var.load %3 @loc="17:40";
> >     branch ^block_4(%11);
> > 
> >   ^block_4(%12 : int):
> >     branch ^block_5(%12);
> > 
> >   ^block_1:
> >     %13 : int = var.load %2 @loc="17:44";
> >     %14 : int = var.load %3 @loc="17:48";
> >     %15 : int = add %13 %14 @loc="17:44";
> >     branch ^block_5(%15);
> > 
> >   ^block_5(%16 : int):
> >     return %16 @loc="17:9";
> > };
> > 
> > func @"nestedConditionalExpr" @loc="15:5:file:/Users/sandoz/Projects/jdk/test/babylon-test/src/main/java/X.java" (%0 : int, %1 : int)int -> {
> >     %2 : int = constant @"2" @loc="17:21";
> >     %3 : boolean = lt %0 %2 @loc="17:17";
> >     cbranch %3 ^block_0 ^block_1;
> > 
> >   ^block_0:
> >     %4 : int = constant @"3" @loc="17:31";
> >     %5 : boolean = lt %1 %4 @loc="17:27";
> >     cbranch %5 ^block_2 ^block_3;
> > 
> >   ^block_2:
> >     branch ^block_4(%0);
> > 
> >   ^block_3:
> >     branch ^block_4(%1);
> > 
> >   ^block_4(%6 : int):
> >     branch ^block_5(%6);
> > 
> >   ^block_1:
> >     %7 : int = add %0 %1 @loc="17:44";
> >     branch ^block_5(%7);
> > 
> >   ^block_5(%8 : int):
> >     return %8 @loc="17:9";
> > };
> > 
> > 
> > > On Apr 23, 2024, at 2:47 PM, Konstantin Triger <kostat at gmail.com> wrote:
> > > 
> > > Thank you all for your comments. Indeed, similar goals expressed differently. Unfortunately, I wasn't aware of the Babylon project and could not comment before.
> > > I did a review of the Code Model and do have a few concerns regarding usability of MLIR-like approach for libraries such as FluentJPA and FluentMongo and full freedom of what can be inside the lambda. Let me explain.
> > >     • My journey was "FluentJPA first" and not the code model first.
> > >     • This project goal is an ability to fully express SQL in Java, so I first prototyped the "end game", i.e. some SQL statements written in Java.
> > >     • Then I built the desired "Code Model" classes and interfaces. Actually, I was quite confident that I would not be too challenged by byte code transformation into the Code Model since I had some experience in the field.
> > >     • My primary concern was the ability to generate SQL from the Model.
> > >     • Why? There are several C# Database LINQ providers. All of them solve a much, much simpler problem - the ability to transform just a single expression to SQL. Multi lines, variables, functions are not allowed.
> > >     • I wanted all these goodies available because otherwise I would not be able to say "SQL written in Java".
> > >     • So I carefully added more capabilities to what can be inside "SQL lambda" (compared to what C# compile time Expression Tree parser can have inside the lambda).
> > >     • On the other hand, there are many Java capabilities that are not translatable to SQL at all and should not be allowed in this context.
> > >     • These additions added huge complexity to the SQL transformer, so I'm quite confident that if Babylon aims to be a foundation to transpile JBC into something like SQL, there must be a mechanism to enforce some restrictions on what can be inside the Lambda.
> > >     • You may reasonably say that the FluentJPA library developer should filter out inappropriate Ops.
> > >     • Well, here is the first problem - it would mean there is no way for the tooling to perform any runtime validation and ensure that no "unsupported" constructs exist inside the method.
> > >     • Next problem is branching. Consider the following expression (borrowed from the Babylon test suite - nestedConditionalExpr method): (i < 2) ? (j < 3) ? i : j : i + j
> > >         • Basically it's a completely valid SQL/Mongo construct that should be mapped to e.g. CASE statement in SQL.
> > >         • Babylon Code Model generates 6 blocks with complex branching between them. Do you think it would be easy to transpile it to CASE?
> > >         • With expression tree it maps quite well.
> > >         • I deliberately don't support BranchExpression (GotoExpression in C#) due to associated complexity. The decompiler performs some work to convert branching into binary tree with conditions. If it fails, a method is considered not compatible with SQL/Mongo.
> > >     • The next problem is the MLIR-like approach. Clearly, it's convenient to the JDK developers and allows straightforward mapping to/from JBC. Is it useful for transpiling, i.e. producing high level language code? I'm not sure. And vice versa, if I need to produce a machine level code - I feel at home.
> > >     • The final problem is logical transformation/normalization. Consider 2 BigDecimals that we compare using compareTo method. Clearly, we would want them to be compared using an appropriate operator, but JBC does not allow that. When transpiling to SQL I normalize these sorts of constructs ahead of the main transpiler. On a larger scale, consider an ecosystem language that might have more numeric types, support operators, etc. All this is usually compiled to the corresponding runtime library methods. Ideally it should be possible to normalize it to a condition. I'm not an expert in Code Model, but my impression is that it's not possible to express something that does not really exist in JBC.
> > > Overall, I don't feel that Code Model provides any significant upgrade on top of Class-File API beyond the standard way to get the "quoted" object. I will have to transform the Code Model to expressions first to address the issues raised above.
> > > Don't misunderstand me, the Code Model introduced in Babylon is great and a huge step forward. I can find a lot of use cases where it would be especially useful. My point is that it's not THE perfect fit for projects like FluentJPA/FluentMongo and basically any transpiler to a simple high-level language.
> > > 
> > > If you find it useful, I'll be more than happy to have a conversation, show and explain different use cases, provide pros/cons, etc.
> > > 
> > > Thanks,
> > > Kosta
> > > 
> > > 
> > > 
> > > On Mon, Apr 22, 2024 at 9:25 PM Paul Sandoz <paul.sandoz at oracle.com> wrote:
> > > Hi,
> > > 
> > > Yes, very similar goals expressed differently e.g., the modeling of Java code (Babylon's code model is more MLIR-like rather than C# expression tree-like). Please see the articles linked to from the Babylon project page. One of the goals is to enable developers to write libraries such as FluentJPA and FluentMongo. If you are willing to experiment it would be very interesting to see if Babylon’s code reflection can easily support those libraries you have written.
> > > 
> > > Paul.
> > > 
> > > > On Apr 21, 2024, at 7:11 PM, liangchenblue at gmail.com wrote:
> > > > 
> > > > Hi Konstantin,
> > > > What you propose has a large overlap with Project Babylon (https://openjdk.org/projects/babylon/), which accomplishes "code reflection" from the Java compiler. The project itself also ships a code model that's suitable for representing ASTs from different programming languages, including ones from Java bytecode. Then your proposal would be much simpler - to generate an expression tree from class files.
> > > > 
> > > > Since the discuss list isn't really for development questions, I am replying to babylon-dev mailing list instead and we will sail from there.
> > > > 
> > > > Regards
> > > > 
> > > 
> > > 
> > > -- 
> > > Regards,
> > > Konstantin Triger