Call for Discussion: JEP: Java Expression Trees API

Mon Apr 29 20:32:37 UTC 2024

Hi Konstantin,

Thanks for looking.

A significant challenge in devising the Babylon code model is ensuring it can support a wide variety of use cases for analysis and transformation. Because of its broad scope, it will inevitably never be as convenient as something devised for a specific use case. However, I believe it can be applied effectively to your use case, giving you a more appropriate form than bytecode in which to analyze and transform Java code. In many Babylon use cases I would expect translation into something more specific to the domain, whether that is a specific kind of model or the same Babylon code model capabilities with domain-specific operations (see later for some examples).

You have a constrained programming model. The developer can always express more in their Java source files than the constrained programming model allows. So you have to verify and report errors when the developer uses some language construct, or a particular form of one, that is not supported by the programming model. IIUC since you are lifting from bytecode there will be bytecode instructions, produced from unsupported language constructs, that you don’t support and will reject. If you translated from a code model you would do something similar. Ideally such errors should be reported at compile time, rather than at run time. This is something we need to explore more fully in Babylon. At the moment we have rudimentary access to models at compile time that are also accessible at run time. We need to work out how a Java library can participate in analyzing models at compile time to, say, reject and report errors when a model contains unsupported operations or structural patterns.
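To illustrate the kind of check a library might run over a model, here is a minimal sketch. The Op record is a hypothetical stand-in for a code model operation and is not the Babylon API; the allow-list reuses operation names that appear in the dumps at the end of this email, and "example.loop" is a made-up name for an unsupported operation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class ModelValidatorSketch {
    // Hypothetical stand-in for an operation: a name plus nested operations.
    // This is NOT the Babylon API, only an illustration of the traversal.
    record Op(String name, List<Op> children) {
        static Op of(String name, Op... children) {
            return new Op(name, List.of(children));
        }
    }

    // Assumed allow-list for a constrained programming model.
    static final Set<String> SUPPORTED =
            Set.of("return", "java.cexpression", "var.load", "constant", "lt", "add");

    // Walks the whole tree and returns one message per unsupported operation.
    static List<String> findUnsupported(Op root) {
        List<String> errors = new ArrayList<>();
        collect(root, errors);
        return errors;
    }

    private static void collect(Op op, List<String> errors) {
        if (!SUPPORTED.contains(op.name())) {
            errors.add("Unsupported operation: " + op.name());
        }
        for (Op child : op.children()) {
            collect(child, errors);
        }
    }

    public static void main(String[] args) {
        // "example.loop" is a made-up operation name used to trigger an error.
        Op model = Op.of("return",
                Op.of("add", Op.of("var.load"), Op.of("example.loop")));
        findUnsupported(model).forEach(System.out::println);
    }
}
```

Collecting all errors, rather than failing on the first, is a design choice that would suit compile-time reporting, where the developer wants to see every unsupported construct at once.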

There are similarities between the Class-File API and Babylon. The building and translation of code models was inspired by the Class-File API. But the way code is modeled in Babylon is very different from bytecode. Take, for example, the expression you mention:

@CodeReflection
static int nestedConditionalExpr(int i, int j) {
  return (i < 2) ? (j < 3) ? i : j : i + j;
}

The code model produced by the source compiler preserves the nested structure of the conditional expressions, as modeled by the java.cexpression operation. It’s AST-like, closer to the compiler’s AST, but the tree is shallower. If the developer does not want to handle the specifics of those operations they can transform and lower the model into one which results in a control flow graph of (seven) connected (basic) blocks. This is more compiler-like, closer to bytecode, but with useful properties for control flow and data dependency analysis. We can go even more compiler-like by translating into pure SSA form, which may further aid analysis. Those three models are all represented in the same meta-code model. In the latter two cases I expect it would be more challenging to transform to an SQL CASE expression, but I would further expect it would be even more challenging to do so from bytecode. (Note I am not an expert on SQL!)
The textual serialization of those three models is presented at the end of the email. 
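
As a rough illustration of why the nested form is the convenient starting point for SQL, here is a minimal sketch that renders a tree of conditionals as nested CASE expressions. The Expr types below are hypothetical and are not the Babylon API; they only mimic the nesting that java.cexpression preserves, and the generated SQL is an assumption about what a library might want to emit.

```java
final class CaseSketch {
    // Hypothetical mini expression tree; NOT the Babylon API.
    interface Expr {}
    record Column(String name) implements Expr {}
    record IntLiteral(int value) implements Expr {}
    record Binary(String operator, Expr left, Expr right) implements Expr {}
    record Conditional(Expr test, Expr whenTrue, Expr whenFalse) implements Expr {}

    // Recursively renders the tree; each Conditional becomes one CASE expression.
    static String toSql(Expr e) {
        if (e instanceof Column col) {
            return col.name();
        } else if (e instanceof IntLiteral lit) {
            return Integer.toString(lit.value());
        } else if (e instanceof Binary bin) {
            return "(" + toSql(bin.left()) + " " + bin.operator() + " " + toSql(bin.right()) + ")";
        } else if (e instanceof Conditional cond) {
            return "CASE WHEN " + toSql(cond.test())
                    + " THEN " + toSql(cond.whenTrue())
                    + " ELSE " + toSql(cond.whenFalse()) + " END";
        }
        throw new IllegalArgumentException("Unsupported node: " + e);
    }

    // Builds the tree for: (i < 2) ? (j < 3) ? i : j : i + j
    static Expr nestedConditionalExpr() {
        Expr i = new Column("i");
        Expr j = new Column("j");
        Expr inner = new Conditional(new Binary("<", j, new IntLiteral(3)), i, j);
        return new Conditional(new Binary("<", i, new IntLiteral(2)), inner, new Binary("+", i, j));
    }

    public static void main(String[] args) {
        // Prints: CASE WHEN (i < 2) THEN CASE WHEN (j < 3) THEN i ELSE j END ELSE (i + j) END
        System.out.println(toSql(nestedConditionalExpr()));
    }
}
```

Because each conditional maps to exactly one CASE, the translation is a plain recursive walk; doing the same from the block-based forms would first require recovering this nesting from the branches.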

It is possible to translate a code model originating from Java source into a code model associated with some other domain. We are exploring a number of examples of this, primarily focused on the GPU use-case but which I think show what is generally possible:

1. Translating Java code models to SPIRV code models, from which we further translate to SPIRV binaries. (See the example in the Babylon repo.)
2. Translating Java code models to Triton code models, the intent being to then translate into Triton MLIR and compile to executable code using the MLIR toolchain. (See the example in the Babylon repo, and the article on the Babylon project page.)

In other examples we translate into different forms of model:

3. Translating Java code models to OpenCL C code. This is an example of where we prefer to operate on high-level Java code models since it makes the problem easier. (This is part of our GPU investigations. We will be open sourcing that some time this year.)
4. Translating Java code models to bytecode and bytecode to Java code models. (This is part of the Babylon repo. WIP.)

(Note when I say Java code models I mean code models that preserve Java program meaning.)

Hope this gives some more perspective and you find it informative,
Paul.

func @"nestedConditionalExpr" @loc="15:5:file:/Users/sandoz/Projects/jdk/test/babylon-test/src/main/java/X.java" (%0 : int, %1 : int)int -> {
    %2 : Var<int> = var %0 @"i" @loc="15:5";
    %3 : Var<int> = var %1 @"j" @loc="15:5";
    %4 : int = java.cexpression @loc="17:16"
        ()boolean -> {
            %5 : int = var.load %2 @loc="17:17";
            %6 : int = constant @"2" @loc="17:21";
            %7 : boolean = lt %5 %6 @loc="17:17";
            yield %7 @loc="17:16";
        }
        ()int -> {
            %8 : int = java.cexpression @loc="17:26"
                ()boolean -> {
                    %9 : int = var.load %3 @loc="17:27";
                    %10 : int = constant @"3" @loc="17:31";
                    %11 : boolean = lt %9 %10 @loc="17:27";
                    yield %11 @loc="17:26";
                }
                ()int -> {
                    %12 : int = var.load %2 @loc="17:36";
                    yield %12 @loc="17:26";
                }
                ()int -> {
                    %13 : int = var.load %3 @loc="17:40";
                    yield %13 @loc="17:26";
                };
            yield %8 @loc="17:16";
        }
        ()int -> {
            %14 : int = var.load %2 @loc="17:44";
            %15 : int = var.load %3 @loc="17:48";
            %16 : int = add %14 %15 @loc="17:44";
            yield %16 @loc="17:16";
        };
    return %4 @loc="17:9";
};

func @"nestedConditionalExpr" @loc="15:5:file:/Users/sandoz/Projects/jdk/test/babylon-test/src/main/java/X.java" (%0 : int, %1 : int)int -> {
    %2 : Var<int> = var %0 @"i" @loc="15:5";
    %3 : Var<int> = var %1 @"j" @loc="15:5";
    %4 : int = var.load %2 @loc="17:17";
    %5 : int = constant @"2" @loc="17:21";
    %6 : boolean = lt %4 %5 @loc="17:17";
    cbranch %6 ^block_0 ^block_1;

  ^block_0:
    %7 : int = var.load %3 @loc="17:27";
    %8 : int = constant @"3" @loc="17:31";
    %9 : boolean = lt %7 %8 @loc="17:27";
    cbranch %9 ^block_2 ^block_3;

  ^block_2:
    %10 : int = var.load %2 @loc="17:36";
    branch ^block_4(%10);

  ^block_3:
    %11 : int = var.load %3 @loc="17:40";
    branch ^block_4(%11);

  ^block_4(%12 : int):
    branch ^block_5(%12);

  ^block_1:
    %13 : int = var.load %2 @loc="17:44";
    %14 : int = var.load %3 @loc="17:48";
    %15 : int = add %13 %14 @loc="17:44";
    branch ^block_5(%15);

  ^block_5(%16 : int):
    return %16 @loc="17:9";
};

func @"nestedConditionalExpr" @loc="15:5:file:/Users/sandoz/Projects/jdk/test/babylon-test/src/main/java/X.java" (%0 : int, %1 : int)int -> {
    %2 : int = constant @"2" @loc="17:21";
    %3 : boolean = lt %0 %2 @loc="17:17";
    cbranch %3 ^block_0 ^block_1;

  ^block_0:
    %4 : int = constant @"3" @loc="17:31";
    %5 : boolean = lt %1 %4 @loc="17:27";
    cbranch %5 ^block_2 ^block_3;

  ^block_2:
    branch ^block_4(%0);

  ^block_3:
    branch ^block_4(%1);

  ^block_4(%6 : int):
    branch ^block_5(%6);

  ^block_1:
    %7 : int = add %0 %1 @loc="17:44";
    branch ^block_5(%7);

  ^block_5(%8 : int):
    return %8 @loc="17:9";
};
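
For readers less used to block-based forms, here is a hand-written Java mirror of the SSA model above. It is only a sketch for reading the dump and is not produced by, nor part of, Babylon: each switch case stands for one basic block, and the local variable arg plays the role of a block parameter (%6 and %8 in the dump).

```java
final class LoweredFormSketch {
    // Source-level form, as in the example method.
    static int nestedConditionalExpr(int i, int j) {
        return (i < 2) ? (j < 3) ? i : j : i + j;
    }

    // Hand-written mirror of the SSA dump: one case per basic block.
    static int lowered(int i, int j) {
        int block = -1; // -1 denotes the entry block
        int arg = 0;    // value passed as a block parameter
        while (true) {
            switch (block) {
                case -1: block = (i < 2) ? 0 : 1; break; // cbranch %3 ^block_0 ^block_1
                case 0:  block = (j < 3) ? 2 : 3; break; // cbranch %5 ^block_2 ^block_3
                case 1:  arg = i + j; block = 5; break;  // branch ^block_5(%7)
                case 2:  arg = i; block = 4; break;      // branch ^block_4(%0)
                case 3:  arg = j; block = 4; break;      // branch ^block_4(%1)
                case 4:  block = 5; break;               // branch ^block_5(%6)
                case 5:  return arg;                     // return %8
                default: throw new IllegalStateException("unknown block " + block);
            }
        }
    }

    public static void main(String[] args) {
        // Compare both forms over a small range of inputs.
        for (int i = -3; i <= 5; i++) {
            for (int j = -3; j <= 5; j++) {
                if (lowered(i, j) != nestedConditionalExpr(i, j)) {
                    throw new AssertionError("mismatch at i=" + i + ", j=" + j);
                }
            }
        }
        System.out.println("lowered form agrees with the source expression");
    }
}
```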

> On Apr 23, 2024, at 2:47 PM, Konstantin Triger <kostat at gmail.com> wrote:
> 
> Thank you all for your comments. Indeed, similar goals expressed differently. Unfortunately, I wasn't aware of the Babylon project and could not comment before.
> I did a review of the Code Model and do have a few concerns regarding usability of MLIR-like approach for libraries such as FluentJPA and FluentMongo and full freedom of what can be inside the lambda. Let me explain.
>     • My journey was "FluentJPA first" and not the code model first.
>     • This project goal is an ability to fully express SQL in Java, so I first prototyped the "end game", i.e. some SQL statements written in Java.
>     • Then I built the desired "Code Model" classes and interfaces. Actually, I was quite confident that I would not be too challenged by byte code transformation into the Code Model since I had some experience in the field.
>     • My primary concern was the ability to generate SQL from the Model.
>     • Why? There are several C# Database LINQ providers. All of them solve a much, much simpler problem - the ability to transform just a single expression to SQL. Multi lines, variables, functions are not allowed.
>     • I wanted all these goodies available because otherwise I would not be able to say "SQL written in Java".
>     • So I carefully added more capabilities to what can be inside "SQL lambda" (compared to what C# compile time Expression Tree parser can have inside the lambda).
>     • On the other hand, there are many Java capabilities that are not translatable to SQL at all and should not be allowed in this context.
>     • These additions added huge complexity to the SQL transformer, so I'm quite confident that if Babylon aims to be a foundation to transpile JBC into something like SQL, there must be a mechanism to enforce some restrictions on what can be inside the Lambda.
>     • You may reasonably say that the FluentJPA library developer should filter out inappropriate Ops.
>     • Well, here is the first problem - it would mean there is no way for the tooling to perform any runtime validation and ensure that no "unsupported" constructs exist inside the method.
>     • Next problem is branching. Consider the following expression (borrowed from the Babylon test suite - nestedConditionalExpr method): (i < 2) ? (j < 3) ? i : j : i + j
>         • Basically it's a completely valid SQL/Mongo construct that should be mapped to e.g. CASE statement in SQL.
>         • Babylon Code Model generates 6 blocks with complex branching between them. Do you think it would be easy to transpile it to CASE?
>         • With expression tree it maps quite well.
>         • I deliberately don't support BranchExpression (GotoExpression in C#) due to associated complexity. The decompiler performs some work to convert branching into binary tree with conditions. If it fails, a method is considered not compatible with SQL/Mongo.
>     • The next problem is the MLIR-like approach. Clearly, it's convenient to the JDK developers and allows straightforward mapping to/from JBC. Is it useful for transpiling, i.e. producing high-level language code? I'm not sure. And vice versa, if I need to produce machine-level code - I feel at home.
>     • The final problem is logical transformation/normalization. Consider 2 BigDecimals that we compare using compareTo method. Clearly, we would want them to be compared using an appropriate operator, but JBC does not allow that. When transpiling to SQL I normalize this sort of constructs ahead of the main transpiler. On a larger scale, consider an ecosystem language that might have more numeric types, support operators, etc. All this is usually compiled to the corresponding runtime library methods. Ideally it should be possible to normalize it to a condition. I'm not an expert in Code Model, but my impression is that it's not possible to express something that does not really exist in JBC.
> Overall, I don't feel that Code Model provides any significant upgrade on top of Class-File API beyond the standard way to get the "quoted" object. I will have to transform the Code Model to expressions first to address the issues raised above.
> Don't misunderstand me, the Code Model introduced in Babylon is great and a huge step forward. I can find a lot of use cases where it would be especially useful. My point is that it's not THE perfect fit for projects like FluentJPA/FluentMongo and basically any transpiler to a simple high-level language.
> 
> If you find it useful, I'll be more than happy to have a conversation, show and explain different use cases, provide pros/cons, etc.
> 
> Thanks,
> Kosta
> 
> 
> 
> On Mon, Apr 22, 2024 at 9:25 PM Paul Sandoz <paul.sandoz at oracle.com> wrote:
> Hi,
> 
> Yes, very similar goals expressed differently e.g., the modeling of Java code (Babylon's code model is more MLIR-like rather than C# expression tree-like). Please see the articles linked to from the Babylon project page. One of the goals is to enable developers to write libraries such as FluentJPA and FluentMongo. If you are willing to experiment it would be very interesting to see if Babylon’s code reflection can easily support those libraries you have written.
> 
> Paul.
> 
> > On Apr 21, 2024, at 7:11 PM, liangchenblue at gmail.com wrote:
> > 
> > Hi Konstantin,
> > What you propose has a large overlap with Project Babylon (https://openjdk.org/projects/babylon/), which accomplishes "code reflection" from the Java compiler. The project itself also ships a code model that's suitable for representing ASTs from different programming languages, including ones from Java bytecode. Then your proposal would be much simpler - to generate an expression tree from class files.
> > 
> > Since the discuss list isn't really for development questions, I am replying to babylon-dev mailing list instead and we will sail from there.
> > 
> > Regards
> > 
> 
> 
> -- 
> Regards,
> Konstantin Triger