[code-reflection] RFR: Regularize support for Java types/references

Maurizio Cimadamore mcimadamore at openjdk.org
Wed May 21 11:37:54 UTC 2025


This PR regularizes the support for Java types in the code model. There are two competing constraints to balance: on the one hand, Java types should be readable by humans when inspecting the textual form of a code model; on the other hand, we don't want to bake Java-specific type parsing into the core code model API.

Unfortunately, the current implementation scores poorly on both aspects, as (a) textual Java types can become very hard to read (e.g. `.<.<NewTest, NewTest$B>, NewTest$B$C>`), and (b) the parsing logic (defined in `DescParser`) contains several ad-hoc extensions that are only there to support Java types.

To address this, this PR introduces two different kinds of externalized type forms: *inflated* type forms and *flattened* type forms. An inflated type form closely follows the structure of the `JavaType` or `JavaRef` it models. For instance, the Java type `Foo<? extends Bar>` is modelled as follows:


java.type.class(Foo, java.type.primitive(void),
                  java.type.wildcard(EXTENDS,
                                     java.type.class(Bar, java.type.primitive(void))))


Inflated type forms can be *flattened* (using `JavaTypeUtils::flatten`), to derive a form that is more suitable for humans. For instance, the above inflated form is flattened as follows:


java.type:"Foo<? extends Bar>"

Conversely, flattened type forms can be *inflated* back (using `JavaTypeUtils::inflate`) to their original inflated form.
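To make the relationship between the two forms concrete, here is a minimal, self-contained sketch of flattening. It does not use the real `JavaTypeUtils` API: the `Form` record and the `FlattenSketch` class are hypothetical stand-ins, only class, primitive and wildcard forms are handled, and the second argument of `java.type.class` is assumed to be the enclosing-type slot (which the sketch ignores).

```java
import java.util.List;
import java.util.Locale;

public class FlattenSketch {
    // Hypothetical stand-in for an externalized type form:
    // an identifier plus a list of nested arguments.
    record Form(String id, List<Form> args) {
        static Form of(String id, Form... args) { return new Form(id, List.of(args)); }
        static Form leaf(String id) { return new Form(id, List.of()); }
    }

    // Turns an inflated form into the text that goes between the quotes.
    static String flattenBody(Form f) {
        switch (f.id()) {
            case "java.type.primitive":
                return f.args().get(0).id();
            case "java.type.class": {
                String name = f.args().get(0).id();
                // Assumption: args[1] is the enclosing type (ignored here),
                // and the remaining arguments are type arguments.
                List<Form> typeArgs = f.args().subList(2, f.args().size());
                if (typeArgs.isEmpty()) {
                    return name;
                }
                StringBuilder sb = new StringBuilder(name).append('<');
                for (int i = 0; i < typeArgs.size(); i++) {
                    if (i > 0) sb.append(", ");
                    sb.append(flattenBody(typeArgs.get(i)));
                }
                return sb.append('>').toString();
            }
            case "java.type.wildcard": {
                String kind = f.args().get(0).id().toLowerCase(Locale.ROOT);
                return "? " + kind + " " + flattenBody(f.args().get(1));
            }
            default:
                throw new IllegalArgumentException("Unsupported form: " + f.id());
        }
    }

    // Wraps the flat text in the java.type:"..." notation shown above.
    static String flatten(Form f) {
        return "java.type:\"" + flattenBody(f) + "\"";
    }

    // The inflated form of Foo<? extends Bar> from the example above.
    static Form example() {
        Form none = Form.of("java.type.primitive", Form.leaf("void"));
        Form bar = Form.of("java.type.class", Form.leaf("Bar"), none);
        Form wildcard = Form.of("java.type.wildcard", Form.leaf("EXTENDS"), bar);
        return Form.of("java.type.class", Form.leaf("Foo"), none, wildcard);
    }

    public static void main(String[] args) {
        System.out.println(flatten(example()));
    }
}
```

Inflating goes the other way and needs an actual parser for Java type syntax, which is exactly the complexity this PR moves out of `DescParser`.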

This distinction between inflated and flattened forms allows us to massively simplify `DescParser` -- which no longer has to worry about the syntactic vagaries of Java types. All that complexity is now pushed onto the flattened type forms support. The goal is to progressively make flattened Java type forms an "implementation detail" (more on that below).

To accommodate flattened and inflated type forms, two changes are needed:
* in `OpWriter` we need to flatten an inflated type form (if possible) before writing it out as a string (this allows for the textual form of a code model to be more readable)
* in `OpParser` we need to inflate a flattened type form (if possible) before handing the type form back to the rest of the parsing machinery (which will e.g. invoke the type factory on such inflated type form)

All Java types and references now follow this pattern:
1. the `externalize` method returns an inflated type form
2. the `toString` method returns a readable string (the one associated with the corresponding flat type form)
3. the `ofString` factory parses the flat string into an inflated form, then turns that inflated form into either a `JavaType` or `JavaRef`, as needed

Note that (3) is the only reason why the flattened type forms surface in the API (since the factories are public, we need to specify what strings they accept). Crucially, these factories are only used when parsing op attributes -- where Java types and refs are just modeled as plain flat strings -- e.g.:


%6 : java.type:"int" = invoke %5 @"java.lang.Object::hashCode():int";


Note how the Java method reference after `@` is encoded as a plain string, which is then parsed by the `InvokeOp` factory using `MethodRef::ofString`.
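As an illustration of what parsing such a flat string involves, the following sketch splits a method reference string of the shape `owner::name(params):returnType` into its parts. This is not the implementation of `MethodRef::ofString`; `MethodRefSketch` and `ParsedRef` are hypothetical names, and the string shape is inferred only from the single example above.

```java
import java.util.ArrayList;
import java.util.List;

public class MethodRefSketch {
    // Hypothetical holder for the pieces of a flat method reference string.
    record ParsedRef(String owner, String name, List<String> params, String returnType) {}

    // Parses strings shaped like "java.lang.Object::hashCode():int".
    // Limitation: parameters are split on ',' so generic parameter types
    // that themselves contain commas are not handled by this sketch.
    static ParsedRef parse(String s) {
        int sep = s.indexOf("::");
        if (sep < 0) {
            throw new IllegalArgumentException("Missing '::' in " + s);
        }
        int open = s.indexOf('(', sep);
        int close = s.lastIndexOf(')');
        if (open < 0 || close < open
                || close + 1 >= s.length() || s.charAt(close + 1) != ':') {
            throw new IllegalArgumentException("Malformed method reference: " + s);
        }
        String owner = s.substring(0, sep);
        String name = s.substring(sep + 2, open);
        String paramText = s.substring(open + 1, close).trim();
        List<String> params = new ArrayList<>();
        if (!paramText.isEmpty()) {
            for (String p : paramText.split(",")) {
                params.add(p.trim());
            }
        }
        String returnType = s.substring(close + 2);
        return new ParsedRef(owner, name, List.copyOf(params), returnType);
    }

    public static void main(String[] args) {
        System.out.println(parse("java.lang.Object::hashCode():int"));
    }
}
```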

#### Implementation

This PR moves all the support for inflated and flattened Java type forms into a single class, namely `JavaTypeUtils`. This class contains factories to create the various inflated forms, as well as methods to parse them, to turn them into flat strings, or to turn them into `JavaType` or `JavaRef` objects.

The code in this class is a bit tedious, as I wanted to express all the functionality we needed in terms of externalized type forms. This allows us to reduce the impact of Java types on the rest of the code model (e.g. the changes to `OpParser` and `OpWriter` are literally one-liners) -- at the cost of making the implementation of some of these methods slightly more convoluted.

For instance, it would have been easier to define a mapping from `JavaType` into a flat string (as we could leverage the `toString` method on the various `JavaType` subclasses). But doing so would create some issues, as now `OpWriter` would need to sometimes call `externalize` on the type, sometimes call `toString` on it -- similar idiosyncrasies would show up on the `OpParser` side.

#### Future work

The next step in this journey (not addressed by this PR) would be to parse Java type/ref attributes in a more structural fashion, so that we can let go of the `ofString` factories.

And, after that, another possible improvement would be to augment and generalize the grammar supported by `DescParser` to cover more interesting structural forms -- something like:


node:
    ident '<' node* '>'
    literal


This would allow us to model all externalized type forms as more general *nodes* -- but would also allow us to model other kinds of interesting structure more directly, such as location information, Java literals and, possibly, even Java annotations.
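A grammar of this shape is straightforward to handle with a small recursive-descent parser. The sketch below is only an illustration of the idea, not a proposal for `DescParser`: `NodeGrammarSketch`, `Tree` and `Literal` are hypothetical names, and it assumes details the grammar above leaves open, namely that literals are double-quoted strings without escapes, that identifiers may contain dots, and that child nodes are separated by whitespace.

```java
import java.util.ArrayList;
import java.util.List;

public class NodeGrammarSketch {
    interface Node {}
    record Tree(String ident, List<Node> children) implements Node {}
    record Literal(String value) implements Node {}

    private final String input;
    private int pos;

    private NodeGrammarSketch(String input) {
        this.input = input;
    }

    // Parses a single node and rejects trailing input.
    static Node parse(String text) {
        NodeGrammarSketch p = new NodeGrammarSketch(text);
        Node n = p.node();
        p.skipSpaces();
        if (p.pos != text.length()) {
            throw new IllegalArgumentException("Trailing input at " + p.pos);
        }
        return n;
    }

    // node: ident '<' node* '>' | literal
    private Node node() {
        skipSpaces();
        if (pos >= input.length()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        if (input.charAt(pos) == '"') {
            // Assumption: a literal is a double-quoted string with no escapes.
            int end = input.indexOf('"', pos + 1);
            if (end < 0) {
                throw new IllegalArgumentException("Unterminated literal at " + pos);
            }
            String value = input.substring(pos + 1, end);
            pos = end + 1;
            return new Literal(value);
        }
        int start = pos;
        while (pos < input.length()
                && (Character.isJavaIdentifierPart(input.charAt(pos))
                    || input.charAt(pos) == '.')) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected identifier at " + pos);
        }
        String ident = input.substring(start, pos);
        skipSpaces();
        expect('<');
        List<Node> children = new ArrayList<>();
        skipSpaces();
        while (pos < input.length() && input.charAt(pos) != '>') {
            children.add(node());
            skipSpaces();
        }
        expect('>');
        return new Tree(ident, List.copyOf(children));
    }

    private void expect(char c) {
        if (pos >= input.length() || input.charAt(pos) != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at " + pos);
        }
        pos++;
    }

    private void skipSpaces() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
    }
}
```

Under these assumptions, a text such as `java.type.class<"Foo" java.type.wildcard<"EXTENDS" java.type.class<"Bar">>>` parses into a `Tree` whose children are a `Literal` and a nested `Tree`.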

-------------

Commit messages:
 - Fix JavaType grammar
 - Drop redundant field
 - Add Javadoc to `JavaTypeUtils`
 - Cleanup exType parser
 - Cleanup and strengthen code that tests for the specific kind of externalized type
 - Remove parens from method type var external type strings
 - All tests pass
 - Drop Java-type specific code in ExternalizedTypeElement
 - Fix ClassType::rawType() to take enclosing type into account
 - Re-enable all the test cases in `TestJavaType`
 - ... and 6 more: https://git.openjdk.org/babylon/compare/2b0d5a2d...b76890dc

Changes: https://git.openjdk.org/babylon/pull/432/files
  Webrev: https://webrevs.openjdk.org/?repo=babylon&pr=432&range=00
  Stats: 7361 lines in 72 files changed: 840 ins; 450 del; 6071 mod
  Patch: https://git.openjdk.org/babylon/pull/432.diff
  Fetch: git fetch https://git.openjdk.org/babylon.git pull/432/head:pull/432

PR: https://git.openjdk.org/babylon/pull/432

