[code-reflection] RFR: Regularize support for Java types/references

Wed May 21 11:37:59 UTC 2025

On Wed, 21 May 2025 10:00:12 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> This PR regularizes the support for Java types in the code model. There are two competing constraints to balance: on the one hand, the Java types should be readable from humans when inspecting the textual form of code model; on the other hand, we don't want to bake in Java-specific type parsing in the core code model API.
> 
> Unfortunately, the current implementation scores poorly on both aspects, as (a) textual Java types can become very hard to read (e.g. `.<.<NewTest, NewTest$B>, NewTest$B$C>`), and (b) the parsing logic (defined in `DescParser`) contains several ad-hoc extensions that are only there to support Java types.
> 
> To address this, this PR introduces two different kinds of externalized type forms: *inflated* type forms and *flattened* type forms. An inflated type form closely follow the structure of the `JavaType` or `JavaRef` it models. For instance, the Java type `Foo<? extends Bar>` is modelled as follows:
> 
> 
> java.type.class(Foo, java.type.primitive(void),
>                   java.type.wildcard(EXTENDS,
>                                      java.type.class(Bar, java.type.primitive(void))))
> 
> 
> Inflated type forms can be *flattened* (using `JavaTypeUtils::flatten`), to derive a form that is more suitable for humans. For instance, the above inflated form is flattened as follows:
> 
> 
> java.type:"Foo<? extends Bar>"
> 
> Conversely,  flattened type forms can be *inflated* back (using `JavaTypeUtils::inflate`) to their original inflated form.
> 
> This distinction between inflated and flattened forms allow us to massively simplify `DescParser` -- which no longer has to worries about the syntactic vagaries of Java types. All that complexity is now pushed onto the flattened type forms support. The goal is to progressively make flattened Java type forms an "implementation detail" (more on that below).
> 
> To accommodate flattened and inflated type forms, two changes are needed:
> * in `OpWriter` we need to flatten an inflated type form (if possible) before writing it out as a string (this allows for the textual form of a code model to be more readable)
> * in `OpParser` we need to inflate a flattened type form (if possible) before handing the type form back to the rest of the parsing machinery (which will e.g. invoke the type factory on such inflated type form)
> 
> All Java types and references now follow this pattern:
> 1. the `externalize` method returns an inflated type form
> 2. the `toString` method returns a readable string (the one associated with the corresponding flat type ...

src/jdk.incubator.code/share/classes/jdk/incubator/code/TypeElement.java line 48:

> 46:             }
> 47: 
> 48:             // Unpack array-like identifier [+

This code was removed, as it was specific to Java types

src/jdk.incubator.code/share/classes/jdk/incubator/code/parser/OpParser.java line 127:

> 125: public final class OpParser {
> 126: 
> 127:     static final TypeElement.ExternalizedTypeElement VOID = JavaType.VOID.externalize();

This is a workaround: the parser uses a private void type -- but such a type is not handled by either the core type factory or the Java type factory. So injecting such type into the parsed model results into issues.

src/jdk.incubator.code/share/classes/jdk/incubator/code/parser/OpParser.java line 606:

> 604: 
> 605:     TypeElement.ExternalizedTypeElement parseExTypeElem() {
> 606:         return JavaTypeUtils.inflate(DescParser.parseExTypeElem(lexer));

When parsing an externalized type form, inflate it fist (from its readable form, if it's a Java type).

src/jdk.incubator.code/share/classes/jdk/incubator/code/parser/impl/DescParser.java line 48:

> 46:     }
> 47: 
> 48:     //    ExType:

Note that the grammar here is very simple, as we no longer have to have support for Java types in here. An externalized type form is a type identifier, followed by zero or more externalized type arguments. An identifier is a qualified identifier -- each part in the identifier can be either string literals or identifiers. Various parts can be separated either with `.` or `:`.

src/jdk.incubator.code/share/classes/jdk/incubator/code/type/ArrayType.java line 82:

> 80:     @Override
> 81:     public ExternalizedTypeElement externalize() {
> 82:         return JavaTypeUtils.arrayType(componentType.externalize());

This pattern is common to all types and refs:
* the `externalize` method returns an inflated externalized form built with the corresponding factory in `JavaTypeUtils`
* the `toString` method returns a flat string, obtained with the corresponding method in `JavaTypeUtils`
* the `ofString` factory parses a flat string and turns it into an inflated form, which is then parsed into a `JavaType` or `JavaRef`, using corresponding methods in `JavaTypeUtils`

src/jdk.incubator.code/share/classes/jdk/incubator/code/type/CoreTypeFactory.java line 100:

> 98:         @Override
> 99:         public TypeElement constructType(TypeElement.ExternalizedTypeElement tree) {
> 100:             return JavaTypeUtils.toJavaType(tree);

Note how parsing an inflated type form is now done entirely inside `JavaTypeUtils`.

I'm not entirely sure whether the Java type factory should support both Java type and reference inflated forms. For now I went with just types -- but if we want to push this further (e.g. to cover also op attributes) we would need the factory to support both types and refs (which is a trivial change to do).

src/jdk.incubator.code/share/classes/jdk/incubator/code/type/impl/JavaTypeUtils.java line 655:

> 653:     }
> 654: 
> 655:     //    JavaRef:

Note: the grammar of ref and types is partially overlapping -- for this reason it's slightly easier to keep the two parsing methods separate. If we wanted to unify more, we could require all refs to be prefixed with a `&` which would simplify things a bit (and also be compatible with what we do to disambiguate between class and method type-variable owner).

src/jdk.incubator.code/share/classes/jdk/incubator/code/writer/OpWriter.java line 561:

> 559: 
> 560:     void writeType(TypeElement te) {
> 561:         write(JavaTypeUtils.flatten(te.externalize()).toString());

When writing a type element into textual form, flatten it first (to make it more readable if it's a Java type).

test/langtools/tools/javac/reflect/CodeReflectionTester.java line 72:

> 70:     }
> 71: 
> 72:     static void checkModel(Member member, String found, IR ir) {

This new method allows the harness to print the model we found on the compiled class even in the face of an error when parsing the `@IR` annotation. This is very handy when bulk-updating the tests, as it means that parser errors will cause the harness to just dump the "expected" model for all the methods in the class (instead of throwing on the first parser error).

test/langtools/tools/javac/reflect/ForLoopTest.java line 42:

> 40:                 %2 : Var<java.type:"java.util.List<java.util.List<java.lang.String>>"> = var %1 @"ll";
> 41:                 java.enhancedFor
> 42:                     ()java.type:"java.util.List<java.util.List<java.lang.String>>" -> {

Here and elsewhere -- it seems like block names like `^expr` or `^def` are no longer generated. But this PR doesn't change that (in fact, I tried creating the model for a simple for loop without these changes, and I still don't see the body names).

I believe this has to do with the fact that e.g. for ops are composed of _bodies_, and each body in the op has a single block (thus, it is the entry block for that body). And, entry block names are skipped.

The test was probably still working fine (despite the mismatch) because we parse the IR string, then generate the textual form again before comparing it -- this will drop the entry block names -- meaning the "expected" IR will match the one generated by javac.

-------------

PR Review Comment: https://git.openjdk.org/babylon/pull/432#discussion_r2099884559
PR Review Comment: https://git.openjdk.org/babylon/pull/432#discussion_r2099888570
PR Review Comment: https://git.openjdk.org/babylon/pull/432#discussion_r2099894899
PR Review Comment: https://git.openjdk.org/babylon/pull/432#discussion_r2099898042
PR Review Comment: https://git.openjdk.org/babylon/pull/432#discussion_r2099902238
PR Review Comment: https://git.openjdk.org/babylon/pull/432#discussion_r2100046974
PR Review Comment: https://git.openjdk.org/babylon/pull/432#discussion_r2100043265
PR Review Comment: https://git.openjdk.org/babylon/pull/432#discussion_r2099893171
PR Review Comment: https://git.openjdk.org/babylon/pull/432#discussion_r2099890681
PR Review Comment: https://git.openjdk.org/babylon/pull/432#discussion_r2099974036