Feedback on Code Reflection API

Olexandr Rotan rotanolexandr842 at gmail.com
Sat Sep 14 09:04:02 UTC 2024


Hi! I really failed to articulate some points well enough yesterday. So,
let me make some of the statements more clear, along with some thoughts on
your response.


That’s odd. At runtime you should not need to require the jdk.compiler
> module or the java.compiler you declared here

https://github.com/Evemose/linq/blob/master/src/main/java/module-info.java#L2
>

I successfully compiled your project with that declaration removed.


I also found it really strange, but it at least worked. If it helps in any
way, I built jdk from this commit:
https://github.com/openjdk/babylon/commit/bf2ee69c0e14d78b207c51cf96aaaf95a7d7aa83


> The difference between body and block is also not clear, I didn't find a
> case during work on a project where the body just had one block.
> >
> Did or did not? The source compiler will produce a code model where each
> body has just one (entry) block, which is intentional. Lowering that model
> (a transformation) will produce a new model with one body and many blocks,
> interconnected to form a control flow graph. In your case, mapping to SQL I
> would expect you could keep at the higher-level, and likely are only
> dealing with Java expressions where control flow is commonly more limited
> to conditional expressions (although you can now include switch expressions
> too!).


That's what I was missing. Every time I reached into bodies (which most of
the time was when parsing composite condition ops ), I just did
bodies().blocks().getFirst() and was wondering if there is any use case
where there are more than one block.

Yes, it’s fundamentally the same concept in a code model but I can see how
> “capture" can be misleading from a Java language perspective. A captured
> value in this sense is a value that is declared before an operation, O say,
> (it dominates the operation) but used by operations within *descendent
> bodies* (if any) of O. Perhaps we can find a better name to describe this.


That's what I was wondering. There is nothing inherently wrong with this
naming, it just feels like it clashes definitions of "capture" from two
relatively distant subject areas. I am speaking as a person whose most
robust interaction with bytecode instructions was bubble sort in asm, so I
understand how some of my statements could be barbaric to people who are
much more experienced than me in this area. I am just speaking on behalf of
less immersed into bytecode users, who just want API to sound the way they
used to.

 Lastly, what was particularly strange is the fact that composite condition
> ops do not contain their operands and are not in their operands() method,
> but instead in bodies() method. Not sure if its intended, but for me it was
> extremely counterintuitive.

Yes, it is intended. Such high-level operations have nested bodies, each
> corresponding to structured units of code that may or may not be executed
> based on the control flow rules of the language construct the operation
> models.


Once again, I just fell for my own ignorance. Although, I guess there will
be plenty of people like me wondering where the operands went :), so I
guess it would be worth of @apiNote in docs when they are here.

A VarAccessOp does not capture any value. It *uses* a value as an operand,
> and the value’s type is a variable type, and commonly the result of a Var
> operation.


I guess I have yet to learn captures and uses in cod models. Nevertheless,
while I am not saying that it makes any sense from a bytecode perspective,
it would be nice quality-of-life improvement to be able to just have
something like Object currentValue() or something like this, even if
internally it just receives its value from the parent that captured it.

I don’t understand your last sentence. Can you explain more? (The method
> Op::result returns a type of Op.Result). Perhaps you are referring to
> Op:operands returning List<Value>?


My bad. Haven't opened IDe with this project for roughly a week, some
details already have been gone from my mind. Yes, I am indeed talking about
the operands() method, which returns a list of Value. If you look into my
codebase, you will notice that every time I encounter Value, I just cast it
to Op.Result. Maybe I just haven't found the correct case yet, but
currently there has not been a single time when Value wasn't Op.Result.
Maybe I just didn't work with Ops that have such operands, or
Block.Parameter only present in a lowered model, but for now cast to
Op.Result never failed me.

Yeah, that’s a challenge we also have with Serializable. We will not
> retrofit the existing functional interfaces to be Quotable. Reflecting over
> code requires permission to do so.


Well, that's a shame. To my case specifically some times, when operation in
unmappable to sql, I plan to load chunks of data into memory using some
kind of prefetching persistent spliterator, which effectively makes API
work the same as streams the moment data is loaded in memory, and it would
be really nice to have them be interchangeable. I guess this problem might
eventually be solved by itself in Valhalla when there will finally be a
discussion on making IntStream and Stream<int> interchangeable, which will
require some sort of implicit conversion of functional interfaces. There
are of course a million problems to it, especially for stateful interfaces,
but that's another rant.

Note that we have successfully managed to transform high-level models to C
> code (OpenCL C and CUDA C), or to high-level ASTs for graph query
> expressions in the spoofax framework.


It was my intention to make feedback subjective here. I completely
understand that, most likely, a person who reaches out to, lets say, OpenCL
API most likely would consider me insane with all this "i didn't use this,
I didn't need that". That's why I think that it's important to not try
to be objective, but provide a subjective opinion and let people who get
tons of those opinions every day deduct objective one. That said, I would
argue that there are plenty of subjective opinions that would benefit from
some high-level representation, but it may be just my bubble.

And a few points that I forgot to put in a last mail:

!. That is not particularly required for my use case, but it may be good to
have some visitors (or their more modern alternatives like Jsoup API), for
tasks that aim to analyze code, not translate it. For example, the
assertion library that one of the mail list members presented a while ago,
I think, would benefit very much from something like this.
2. Limited bytecode auditing. I, honestly, not sure that it falls under
scope of this project, but it would be really nice to have some simple
auditing mechanisms that allow, for example, knowing the position from
which method is invoked (similar features are present in .Net as far as I
remember, in Java there is bytebuddy but it is always a few weeks late to
release because they have to adapt to bytecode updates in the new java
version). The use case for this example is particular in translating one
model to another. I noticed (which is pretty intuitive) that one chain of
method calls always produces the same template of request, only changes are
observed in variable values, so it would be nice to cache templates by
invocation point after the first request and then just fill in values in
these templates.

I hope now some of my points have become clearer to you. Hope some of my
feedback will be helpful for determining direction of project

Best regards

On Sat, Sep 14, 2024 at 2:38 AM Paul Sandoz <paul.sandoz at oracle.com> wrote:

> Hi Olexandr,
>
> Thank you for the feedback. Some comments in line below.
>
> Paul.
>
>
> > On Sep 13, 2024, at 2:44 PM, Olexandr Rotan <rotanolexandr842 at gmail.com>
> wrote:
> >
> > Hello to everyone on the mailing list. I would love to share some
> feedback with you regarding the Code Reflection API used to implement LINQ
> in java.
> >
> > Unfortunately, I found myself in the situation where I don't have much
> time to work on a project and probably will not have it in the foreseeable
> future, so I will have to speak based on limited experience using the API.
>
> Ok, appreciate the time you have spent.
>
>
> > Firstly, I will talk about the problems I encountered, then mark some
> positive features and finish with general thoughts on the direction of the
> project and jdk in general.
> >
> > The problems started earlier than expected, particularly on extending
> the Quotable interface. When I just built jdk from Babylon and created a
> new project, the quoted() method, to my surprise, has been throwing
> UnsupportedOperationException. I spent multiple hours trying to figure out
> whats wrong, unit finally I stumbled onto following rows in babylon repo:
> >
> > A subset of code in java.base is copied with package renaming into the
> jdk.compiler module.
> >
> > This gave me a hint to add "requires jdk.compiler" to module-info, and
> it finally worked! I am not sure if this is how it intended to work, but if
> so, this topic could use more clarification.
> >
>
> That’s odd. At runtime you should not need to require the jdk.compiler
> module or the java.compiler you declared here:
>
>
> https://github.com/Evemose/linq/blob/master/src/main/java/module-info.java#L2
>
> I successfully compiled your project with that declaration removed.
>
> The API in java.base is copied into an *internal* in the JDK compiler.
> This is because parts of the compiler cannot depend on new APIs in
> java.base, since the compiler compiles itself. We are working around that
> limitation by copying code. Perhaps it is possible to overcome it, but we
> have not investigated.
>
>
> > Moving to API itself, there are a few points I would like to address. I
> would like to emphasize that I am speaking as a person that builds API to
> reach out from one high-level language to another, so my feedback is
> obviously biased. That said, I found an API containing too many low level
> details.I didn't even get to know what is dominatedBy and other methods
> related to this "domination".
>
> Yes, that is understandable. Better documentation will I hope help, and
> similarly to your points on method naming below. There are features for
> code analysis and transformation that compilers/transformers will leverage,
> but not all these features are relevant to all use cases.
>
>
> > The difference between body and block is also not clear, I didn't find a
> case during work on a project where the body just had one block.
> >
>
> Did or did not? The source compiler will produce a code model where each
> body has just one (entry) block, which is intentional. Lowering that model
> (a transformation) will produce a new model with one body and many blocks,
> interconnected to form a control flow graph. In your case, mapping to SQL I
> would expect you could keep at the higher-level, and likely are only
> dealing with Java expressions where control flow is commonly more limited
> to conditional expressions (although you can now include switch expressions
> too!).
>
> Conceptually you can think of a code model body encapsulating some
> structural unit of code. The code model produced by the compiler
> approximately mirrors the structure in source in the arrangement of
> operations and their bodies. But, if you are expecting a more traditional
> AST representation you may be disappointed and frustrated :-) The whole
> area of how we model Java code is not yet documented.
>
>
> > Naming could also use some improvement. Specifically, the uses() method,
> due to word meaning both that it uses something and that it is used by
> something is unclear to me.
>
> Good point.
>
>
> > capturedValues() name also was really misleading to me, since it seems
> like it should return a map of values of expressions inside the OP, while
> it, as I understand, just includes some part underlying ops. It also is
> really similar to capturedValues() of LambdaOp which in fact contains
> captured values.
>
> Yes, it’s fundamentally the same concept in a code model but I can see how
> “capture" can be misleading from a Java language perspective. A captured
> value in this sense is a value that is declared before an operation, O say,
> (it dominates the operation) but used by operations within *descendent
> bodies* (if any) of O. Perhaps we can find a better name to describe this.
>
>
> > AAnd just generally, having VarAccessOp to contain its captured value
> would be really helpful since it would spare from passing down Map of
> captured values through a long chain of methods.
> >
>
> A VarAccessOp does not capture any value. It *uses* a value as an operand,
> and the value’s type is a variable type, and commonly the result of a Var
> operation. Given such a (variable) value you can query it's uses to find
> all the loads and stores to that variable.
>
>
> > I also did not find a particular use for the transform method. I
> initially thought it could be used to translate a tree into another
> language tree, but it instead transforms op into another op. I'm not sure
> if it replaces one node with another (which would make the tree mutable),
> or if it just produces a new one and then I don't really know what's the
> use of it.
>
> This transformation API is code model in and code model out. It’s a
> functional flat-map transformation that composes traversing the input model
> and building the output model, that works with immutable code models. It
> does not make sense to use it when transforming a code model to some other
> in-memory representation. In that case you can traverse the model
> explicitly.
>
>
> >
> > Lastly, what was particularly strange is the fact that composite
> condition ops do not contain their operands and are not in their operands()
> method, but instead in bodies() method. Not sure if its intended, but for
> me it was extremely counterintuitive.
>
> Yes, it is intended. Such high-level operations have nested bodies, each
> corresponding to structured units of code that may or may not be executed
> based on the control flow rules of the language construct the operation
> models. Lowering such operations will collapse the bodies into an
> interconnected set of basic blocks forming a control flow graph (see slides
> 30 to 35 here
> https://cr.openjdk.org/~psandoz/conferences/2023-JVMLS/Code-Reflection-JVMLS-23-08-07.pdf
> ).
>
> Hopefully you might be able to guess how many bodies the for operation
> modeling a for statement might have :-)
>
>
> >
> >
> > Now let's talk about positive sides. Generally, I really liked that API
> is (if all low-level methods are ignored), pretty simple yet powerful. I
> managed to get everything that I required using just a few simple methods
> like operands(), resolveTOHandle(), result() and other once situatively. By
> the way, talking about result(), I didn't find a case where it returns
> something other than Op.Result, and I guess for many op types result()
> return type could be narrowed.
>
> I don’t understand your last sentence. Can you explain more? (The method
> Op::result returns a type of Op.Result). Perhaps you are referring to
> Op:operands returning List<Value>?
>
>
> >
> > I also found it really pleasant that Quotable is implemented by default,
> so I don't have to do any additional steps to start working with
> Quotable-extending interfaces. Though, it would be really helpful if
> java.util.function interfaces can become Quotable (unless there is a
> conversion between equivalent functional interfaces), so LINQ and other
> querying apis could become interchangeable with streams.
> >
>
> Yeah, that’s a challenge we also have with Serializable. We will not
> retrofit the existing functional interfaces to be Quotable. Reflecting over
> code requires permission to do so. Note in general I think we will likely
> make Quotable a marker interface and one will appeal to some reflection API
> to obtain the quoted instance.
>
>
> >
> > And lastly, a few thoughts about the general direction of Babylon
> development. I am obviously biased, but I found the API too low level. I
> would also argue that most of the use cases for java developers will
> interact with code reflection api would involve reaching out to other
> high-level languages, so it would make sense to make api a little more
> abstracted. Generally speaking, the jdk approach to "shoot in the middle"
> seems wrong to me. Generally, some particular use case takes up like 95% of
> demand for features. This is not particularly the case here, but
> accumulatively high-level languages would be, I guess, the most common
> target. Currently, on the other hand, API aims at middle-level languages
> (if those even exist), and lowering tree is used for low-level
> interactions. I would argue that non-lowered API should aim for high level
> languages, while lowered for low level. Middle-level models could be
> produced as a high level tree + some details from a lowered one.
> >
>
> Note that we have successfully managed to transform high-level models to C
> code (OpenCL C and CUDA C), or to high-level ASTs for graph query
> expressions in the spoofax framework. (We want to explore an ONNX script
> use case when we get the time). The code model at a high-level contains
> enough structural information to do this, but there could be some
> additional help (e.g., like indicating the precedence of Java operators or
> extracting expressions, see
> https://openjdk.org/projects/babylon/articles/code-models).
>
> There are potentially many use cases at various levels of representation.
> The challenge we set ourselves is to see if we can devise a single
> representation that is applicable to many cases.
>
>
> > That's all for now. I hope in future I could spare some more time and
> contribute something more than currently, maybe providing feedback also on
> API evolution.
> >
> > PS: github repo if you are interested: https://github.com/Evemose/linq
>
>
> I hope so! I shall take a closer look at your code.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/babylon-dev/attachments/20240914/0147c988/attachment-0001.htm>


More information about the babylon-dev mailing list