Feedback on Code Reflection API

Fri Sep 13 23:38:28 UTC 2024

Hi Olexandr,

Thank you for the feedback. Some comments in line below.

Paul.

> On Sep 13, 2024, at 2:44 PM, Olexandr Rotan <rotanolexandr842 at gmail.com> wrote:
> 
> Hello to everyone on the mailing list. I would love to share some feedback with you regarding the Code Reflection API used to implement LINQ in java.
> 
> Unfortunately, I found myself in the situation where I don't have much time to work on a project and probably will not have it in the foreseeable future, so I will have to speak based on limited experience using the API.

Ok, appreciate the time you have spent.

> Firstly, I will talk about the problems I encountered, then mark some positive features and finish with general thoughts on the direction of the project and jdk in general.
> 
> The problems started earlier than expected, particularly on extending the Quotable interface. When I just built the JDK from Babylon and created a new project, the quoted() method, to my surprise, kept throwing UnsupportedOperationException. I spent multiple hours trying to figure out what's wrong, until finally I stumbled onto the following lines in the Babylon repo:
> 
> A subset of code in java.base is copied with package renaming into the jdk.compiler module.
> 
> This gave me a hint to add "requires jdk.compiler" to module-info, and it finally worked! I am not sure if this is how it is intended to work, but if so, this topic could use more clarification.
> 

That’s odd. At runtime you should not need to require the jdk.compiler module or the java.compiler you declared here:

  https://github.com/Evemose/linq/blob/master/src/main/java/module-info.java#L2

I successfully compiled your project with that declaration removed.

The API in java.base is copied into an *internal* package in the jdk.compiler module. This is because parts of the compiler cannot depend on new APIs in java.base, since the compiler compiles itself. We are working around that limitation by copying code. Perhaps it is possible to overcome it, but we have not investigated.

> Moving to the API itself, there are a few points I would like to address. I would like to emphasize that I am speaking as a person who builds an API to reach out from one high-level language to another, so my feedback is obviously biased. That said, I found that the API contains too many low-level details. I didn't even get to know what dominatedBy and the other methods related to this "domination" are.

Yes, that is understandable. Better documentation will, I hope, help, and the same applies to your points on method naming below. There are features for code analysis and transformation that compilers/transformers will leverage, but not all these features are relevant to all use cases.

> The difference between body and block is also not clear, I didn't find a case during work on a project where the body just had one block. 
> 

Did or did not? The source compiler will produce a code model where each body has just one (entry) block, which is intentional. Lowering that model (a transformation) will produce a new model with one body and many blocks, interconnected to form a control flow graph. In your case, mapping to SQL, I would expect you could stay at the higher level, and you are likely only dealing with Java expressions, where control flow is commonly limited to conditional expressions (although you can now include switch expressions too!).

Conceptually you can think of a code model body encapsulating some structural unit of code. The code model produced by the compiler approximately mirrors the structure in source in the arrangement of operations and their bodies. But, if you are expecting a more traditional AST representation you may be disappointed and frustrated :-) The whole area of how we model Java code is not yet documented.

> Naming could also use some improvement. Specifically, the uses() method, due to word meaning both that it uses something and that it is used by something is unclear to me.

Good point.

> The capturedValues() name was also really misleading to me, since it seems like it should return a map of values of expressions inside the op, while it, as I understand, just includes some part of the underlying ops. It is also really similar to capturedValues() of LambdaOp, which in fact contains captured values.

Yes, it’s fundamentally the same concept in a code model, but I can see how “capture” can be misleading from a Java language perspective. A captured value in this sense is a value that is declared before an operation, O say, (it dominates the operation) but used by operations within *descendant bodies* (if any) of O. Perhaps we can find a better name to describe this.
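
To make that definition concrete, here is a minimal sketch using a toy model rather than the actual code reflection API. ToyOp, its string-named values and capturedValues are all hypothetical names invented for illustration; block parameters and real dominance checks are deliberately left out.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these types belong to the Babylon API.
final class CaptureSketch {
    /** A toy operation: declares one value, uses some values, may nest bodies. */
    record ToyOp(String result, List<String> operands, List<List<ToyOp>> bodies) {
        static ToyOp leaf(String result, String... operands) {
            return new ToyOp(result, List.of(operands), List.of());
        }
    }

    /** Values used inside the descendant bodies of op but declared outside them. */
    static Set<String> capturedValues(ToyOp op) {
        Set<String> declaredInside = new LinkedHashSet<>();
        Set<String> usedInside = new LinkedHashSet<>();
        for (List<ToyOp> body : op.bodies()) {
            collect(body, declaredInside, usedInside);
        }
        usedInside.removeAll(declaredInside);
        return usedInside;
    }

    // Walks a body and everything nested below it.
    private static void collect(List<ToyOp> body, Set<String> declared, Set<String> used) {
        for (ToyOp nested : body) {
            used.addAll(nested.operands());
            declared.add(nested.result());
            for (List<ToyOp> inner : nested.bodies()) {
                collect(inner, declared, used);
            }
        }
    }

    static Set<String> example() {
        // %x and %y are declared before the lambda-like op %f;
        // %t and %r are declared inside its body.
        ToyOp add = ToyOp.leaf("%t", "%x", "%y");
        ToyOp mul = ToyOp.leaf("%r", "%t", "%x");
        ToyOp f = new ToyOp("%f", List.of(), List.of(List.of(add, mul)));
        return capturedValues(f);
    }
}
```

In this toy, the op %f "captures" %x and %y simply because its body uses them while they are declared outside it, which is the same notion whether or not the op happens to model a Java lambda.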

> And just generally, having VarAccessOp contain its captured value would be really helpful, since it would spare passing a Map of captured values down through a long chain of methods.
> 

A VarAccessOp does not capture any value. It *uses* a value as an operand, and the value’s type is a variable type, and commonly the result of a Var operation. Given such a (variable) value you can query its uses to find all the loads and stores to that variable.
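
As a rough illustration of "query its uses", the following sketch uses a toy model, not the real API. ToyOp, usesOf and the op names are assumptions made up for this example.

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy model only: a flat list of ops with string-named values.
final class UsesSketch {
    record ToyOp(String name, String result, List<String> operands) {}

    /** Names of all ops that take the given value as an operand. */
    static List<String> usesOf(String value, List<ToyOp> ops) {
        return ops.stream()
                .filter(op -> op.operands().contains(value))
                .map(ToyOp::name)
                .collect(Collectors.toList());
    }

    static List<String> example() {
        List<ToyOp> ops = List.of(
                new ToyOp("var", "%v", List.of("%init")),         // declares the variable value %v
                new ToyOp("var.load", "%a", List.of("%v")),       // a load uses %v
                new ToyOp("add", "%b", List.of("%a", "%a")),      // does not touch %v
                new ToyOp("var.store", "%c", List.of("%v", "%b"))); // a store uses %v
        return usesOf("%v", ops);
    }
}
```

Starting from the variable value and asking for its uses yields the load and the store, so nothing needs to be threaded through a chain of methods.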

> I also did not find a particular use for the transform method. I initially thought it could be used to translate a tree into another language tree, but it instead transforms op into another op. I'm not sure if it replaces one node with another (which would make the tree mutable), or if it just produces a new one and then I don't really know what's the use of it.

This transformation API is code model in and code model out. It’s a functional flat-map transformation that composes traversing the input model and building the output model, that works with immutable code models. It does not make sense to use it when transforming a code model to some other in-memory representation. In that case you can traverse the model explicitly.
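
The flat-map idea can be sketched with a toy model, which is not the actual transformation API. ToyOp, transform and expandInc are hypothetical names; the point is only that each input op maps to zero, one or many output ops, and the input is never mutated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy model only: an immutable list of ops stands in for a code model.
final class TransformSketch {
    record ToyOp(String name, List<String> operands) {}

    /** Traverses the input and builds a new, immutable output model. */
    static List<ToyOp> transform(List<ToyOp> input, Function<ToyOp, List<ToyOp>> mapper) {
        List<ToyOp> output = new ArrayList<>();
        for (ToyOp op : input) {
            output.addAll(mapper.apply(op)); // zero, one or many ops per input op
        }
        return List.copyOf(output);
    }

    /** Example mapper: drops "nop", expands "inc" into "const" + "add", keeps the rest. */
    static List<ToyOp> expandInc(ToyOp op) {
        if (op.name().equals("nop")) {
            return List.of();
        }
        if (op.name().equals("inc")) {
            return List.of(
                    new ToyOp("const", List.of("1")),
                    new ToyOp("add", List.of(op.operands().get(0), "1")));
        }
        return List.of(op);
    }
}
```

Calling transform(model, TransformSketch::expandInc) returns a new list and leaves model untouched. If the target is another in-memory representation, such as a SQL tree, a plain traversal is the simpler tool.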

> 
> Lastly, what was particularly strange is the fact that composite condition ops do not contain their operands and they are not in their operands() method, but instead in the bodies() method. Not sure if it's intended, but for me it was extremely counterintuitive.

Yes, it is intended. Such high-level operations have nested bodies, each corresponding to structured units of code that may or may not be executed based on the control flow rules of the language construct the operation models. Lowering such operations will collapse the bodies into an interconnected set of basic blocks forming a control flow graph (see slides 30 to 35 here https://cr.openjdk.org/~psandoz/conferences/2023-JVMLS/Code-Reflection-JVMLS-23-08-07.pdf).

Hopefully you might be able to guess how many bodies the for operation modeling a for statement might have :-)
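
For the conditional operations, a toy sketch (again, not the real API) may show why the sub-expressions are bodies rather than operands. ToyCondAnd and bodiesExecuted are hypothetical names; each body is modeled as a BooleanSupplier that may or may not run.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Toy model only: a short-circuiting "&&" whose sides are bodies, not values.
final class CondAndSketch {
    record ToyCondAnd(List<BooleanSupplier> bodies) {
        /** Runs bodies in order and stops at the first one that yields false. */
        boolean evaluate() {
            for (BooleanSupplier body : bodies) {
                if (!body.getAsBoolean()) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Counts how many of the two bodies execute for the given outcomes. */
    static int bodiesExecuted(boolean first, boolean second) {
        int[] executed = new int[1];
        ToyCondAnd and = new ToyCondAnd(List.of(
                () -> { executed[0]++; return first; },
                () -> { executed[0]++; return second; }));
        and.evaluate();
        return executed[0];
    }
}
```

An operand is a value that has already been computed before the operation runs. The right-hand side of && may never be computed at all, so it has to be modeled as a unit of code, that is, a body.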

> 
> 
> Now let's talk about the positive sides. Generally, I really liked that the API is (if all low-level methods are ignored) pretty simple yet powerful. I managed to get everything that I required using just a few simple methods like operands(), resolveToHandle(), result() and others situationally. By the way, talking about result(), I didn't find a case where it returns something other than Op.Result, and I guess for many op types the result() return type could be narrowed.

I don’t understand your last sentence. Can you explain more? (The method Op::result returns a value of type Op.Result). Perhaps you are referring to Op::operands returning List<Value>?

> 
> I also found it really pleasant that Quotable is implemented by default, so I don't have to do any additional steps to start working with Quotable-extending interfaces. Though, it would be really helpful if java.util.function interfaces can become Quotable (unless there is a conversion between equivalent functional interfaces), so LINQ and other querying apis could become interchangeable with streams.
> 

Yeah, that’s a challenge we also have with Serializable. We will not retrofit the existing functional interfaces to be Quotable. Reflecting over code requires permission to do so. Note in general I think we will likely make Quotable a marker interface and one will appeal to some reflection API to obtain the quoted instance.

> 
> And lastly, a few thoughts about the general direction of Babylon development. I am obviously biased, but I found the API too low level. I would also argue that most of the use cases in which Java developers interact with the code reflection API would involve reaching out to other high-level languages, so it would make sense to make the API a little more abstracted. Generally speaking, the JDK approach to "shoot in the middle" seems wrong to me. Generally, some particular use case takes up like 95% of demand for a feature. This is not particularly the case here, but cumulatively high-level languages would be, I guess, the most common target. Currently, on the other hand, the API aims at middle-level languages (if those even exist), and the lowered tree is used for low-level interactions. I would argue that the non-lowered API should aim for high-level languages, and the lowered one for low-level languages. Middle-level models could be produced as a high-level tree + some details from a lowered one.
> 

Note that we have successfully managed to transform high-level models to C code (OpenCL C and CUDA C), or to high-level ASTs for graph query expressions in the spoofax framework. (We want to explore an ONNX script use case when we get the time). The code model at a high-level contains enough structural information to do this, but there could be some additional help (e.g., like indicating the precedence of Java operators or extracting expressions, see https://openjdk.org/projects/babylon/articles/code-models).

There are potentially many use cases at various levels of representation. The challenge we set ourselves is to see if we can devise a single representation that is applicable to many cases.

> That's all for now. I hope in the future I can spare some more time and contribute more than I have so far, maybe also providing feedback on the API's evolution.
> 
> PS: github repo if you are interested: https://github.com/Evemose/linq

I hope so! I shall take a closer look at your code.