Prototype details

Wed Jan 17 18:58:08 UTC 2024

Hi Andrew,

A fair question.

We will be exploring that option too (the lifting mentioned in item 7), primarily for cases where the user has direct access to class files via the file system. I have found this useful when experimenting with Leyden-like use cases, e.g., identifying and transforming constant dynamic proxy call sites. I think it is important that, at runtime, we don’t grant access to code whose author has not agreed that it may be accessed and which was never written with the knowledge that it would be. Examples are differentiating a method the user never intended for differentiation, or transforming a method into GPU-executable code, both of which are constrained programming models.

I could imagine that code models produced by the source compiler, and models lifted from the bytecode that the source compiler produces, might meet somewhere in the middle, to some approximation.

We want to support a wide variety of use cases. Some require that structure and typing be preserved with high fidelity. Others need less, and for those, bytecode could provide sufficient fidelity from which to produce a lower-level model. In the former case it is easier and more accurate to start from the AST than to heroically reconstruct the model from patterns in the bytecode. Note that one problematic aspect of bytecode that can make lifting harder is the compiler's translation strategy (an implementation detail), which may need to be reverse engineered to produce a reasonable model. Lambda expressions are one example, and it could get more complicated still with switch statements/expressions and patterns.
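To make the translation-strategy point concrete for lambdas: current javac typically moves a lambda body into a private synthetic method (with a name like lambda$main$0) and creates the functional interface instance through an invokedynamic call site bootstrapped by LambdaMetafactory. The language specification does not mandate any of this, so a bytecode lifter would have to recognize the pattern. The small sketch below assumes a recent javac (other compilers or versions may differ). It lists the synthetic methods declared in a class so the desugared body becomes visible. The class LambdaTranslation and its helper are hypothetical and illustrative only.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntUnaryOperator;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of Babylon or any JDK API.
public class LambdaTranslation {

    // Returns the sorted names of compiler-generated (synthetic) methods
    // declared directly in this class.
    static List<String> syntheticMethodNames() {
        return Arrays.stream(LambdaTranslation.class.getDeclaredMethods())
                .filter(Method::isSynthetic)
                .map(Method::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int offset = 1;
        // A capturing lambda. javac typically emits an invokedynamic call site
        // here and moves the body "x + offset" into a private static synthetic
        // method; this is a translation strategy, not a language guarantee.
        IntUnaryOperator inc = x -> x + offset;
        System.out.println(inc.applyAsInt(41)); // prints 42

        // With javac this typically prints a name such as lambda$main$0.
        syntheticMethodNames().forEach(System.out::println);
    }
}
```

A lifter working from this class file would see only the synthetic method and the invokedynamic instruction. It would have to reassemble them into a single lambda expression, whereas the source compiler still has the lambda as a node in its AST.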

For many use cases the goal is not to run Java code but to transform it into a different programming domain where the meaning of the Java program is not fully preserved. A good example of this is Oracle’s Parallel Graph AnalytiX (PGX) platform. It comes with a Java API and a PGX compiler that transforms a Java program into an executable PGX program that performs queries over graph data. The compiler is implemented as a javac plugin. As an experiment, we successfully reimplemented the front end of the compiler using code reflection and slotted it into their existing compiler toolchain. I would argue it was easier to do so using code reflection. I hope to see that pattern repeated for other use cases, such as reusing native MLIR compiler toolchains.

Paul.

> On Jan 17, 2024, at 1:25 AM, Andrew Haley <aph-open at littlepinkcloud.com> wrote:
> 
> On 1/16/24 20:52, Paul Sandoz wrote:
>> 4. Enhancements to the Java source compiler to produce code models from
>> its AST representation of Java source and store them in class files.
>> We can model many Java programs, but not all. We will continue to expand
>>  that set as we model more language constructs.
> 
> So this seems like a good time to ask a question that's been bothering me
> since I first heard about Babylon.
> 
> Why not produce the code model from bytecode? Sure, you'd lose some stuff,
> but as we all know there's enough in bytecode to produce runnable Java.
> 
> -- 
> Andrew Haley  (he/him)
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com>
> https://keybase.io/andrewhaley
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
>