Providing runtime type information without changing type erasure

Sun Dec 4 18:43:53 UTC 2022

You posted this earlier, but I was kind of at a loss for how to respond, 
because it reads to me like "why don't we 'just' reify generics", as if 
we were just waiting for someone to come up with the idea of 
reification.  I think you are underestimating the scope of what you are 
suggesting by several orders of magnitude.

Reification is invasive.  You would have to plumb new data paths in 
method calling conventions to pass type witnesses through generic 
methods (so when someone says `new Foo<T>`, where T is a method type 
variable, you know what T is), and through object layout in the heap (so 
when someone asks a List what it is a List of, it can find out), and 
through reflection, and through serialization, and ...  And even if you 
did that, you might discover that some runtime typechecking that you'd 
like to do is undecidable (see e.g. 
https://www.researchgate.net/publication/303969346_On_Decidability_of_Nominal_Subtyping_with_Variance). 
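
To make the heap-layout point concrete, here is a minimal sketch of what 
erasure leaves behind at runtime.  The class `ErasureDemo` and its 
helper names are made up for illustration; only the `java.util` calls 
are real API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: shows what a List knows about itself at runtime.
final class ErasureDemo {
    // Both parameterizations are backed by the same runtime class.
    static boolean sameRuntimeClass(List<?> a, List<?> b) {
        return a.getClass() == b.getClass();
    }

    // The heap object stores no element type, so the best a caller can do
    // is look at an element; an empty list gives no answer at all.
    static Class<?> guessElementType(List<?> list) {
        if (list.isEmpty() || list.get(0) == null) {
            return null;
        }
        return list.get(0).getClass();
    }
}
```

With a `List<String>` and a `List<Integer>`, `sameRuntimeClass` returns 
true, and `guessElementType` on an empty `List<String>` returns null: 
the list cannot say what it is a list of.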

The mechanics of reification are well enough understood; see e.g. the 
Kennedy and Syme paper on how they did it in .NET 
(https://www.microsoft.com/en-us/research/publication/design-and-implementation-of-generics-for-the-net-common-language-runtime/), 
or Maurizio Cimadamore's PhD thesis.  But in addition to the substantial 
re-architecting of the VM that is required here, you also have to face 
the problem of migration compatibility.  A key requirement when we did 
generics is that libraries and clients could be generified 
independently; if you generify a library, its clients can upgrade to 
generics now, later, or never, at their choice.  When .NET chose 
heterogeneous translation over homogeneous, they had to write new 
collection libraries.  (That was OK because they had relatively few 
users at the time; Java does not have this luxury.)
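
As a rough illustration of what migration compatibility buys, the 
sketch below has one generified library method used by two clients, one 
still written against the raw type.  `Library`, `LegacyClient`, and 
`ModernClient` are hypothetical names, not anything from the JDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical library class, generified after its clients were written.
final class Library {
    static List<String> names() {
        List<String> result = new ArrayList<>();
        result.add("valhalla");
        return result;
    }
}

// Hypothetical pre-generics client: raw type plus an explicit cast.
final class LegacyClient {
    @SuppressWarnings("rawtypes")
    static String firstName() {
        List names = Library.names();   // raw type is still accepted
        return (String) names.get(0);
    }
}

// Client that has opted in to generics; no cast needed.
final class ModernClient {
    static String firstName() {
        return Library.names().get(0);
    }
}
```

Both clients work against the same `Library` class because `List` and 
`List<String>` are the same class at runtime.  A reified `List<String>` 
would need some answer for what the raw-typed client is holding.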

This is not a "why don't you just" problem.

Several thousand expert-hours have already gone into researching how 
these things could be achieved; the best summary so far is John's 
discussion of the "Parametric VM", which is still in the "working out 
the ideas" stage.  You can read the current state here:

http://cr.openjdk.java.net/~jrose/values/parametric-vm.pdf

There's a ton there, and a ton left to go.

On 12/4/2022 4:06 AM, Red IO wrote:
>
> I was once again writing a generic Java class when I hit the same 
> wall again: I could not get the type of T.
>
> I’m sure this scenario sounds familiar to every Java developer out there.
>
> When you first stumble across this problem, type erasure comes up as 
> the culprit: “Type T is not there at runtime; it is replaced with Object”
>
> There are many solutions to this problem; some are hacky, and some 
> require breaking changes to the generics system.
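
One commonly cited workaround of the "hacky" kind is the super type 
token: an anonymous subclass records its generic superclass in the class 
file, and reflection can read it back.  The sketch below assumes a 
hand-rolled `TypeToken` class (a hypothetical name here, not a JDK 
type); the reflection calls themselves are standard.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical helper: captures T from the declaration of an anonymous subclass.
abstract class TypeToken<T> {
    private final Type type;

    protected TypeToken() {
        // For `new TypeToken<List<String>>() {}`, the generic superclass is
        // the parameterized type TypeToken<List<String>>.
        Type superclass = getClass().getGenericSuperclass();
        if (!(superclass instanceof ParameterizedType)) {
            throw new IllegalStateException(
                "TypeToken must be created as a parameterized anonymous subclass");
        }
        this.type = ((ParameterizedType) superclass).getActualTypeArguments()[0];
    }

    Type type() {
        return type;
    }
}
```

Usage would look like `new TypeToken<java.util.List<String>>() {}.type()`. 
This only works where the type argument is written out at the creation 
site; inside a generic method, `new TypeToken<T>() {}` just captures the 
type variable T.
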
>
> My idea is: why don’t we trace the types back at compile time? Every 
> generic class is constructed somewhere in one of 6 possible ways:
>
> Foo<String> foo = new Foo<>();
> Foo<T> foo2 = new Foo<>();
> Foo foo3 = new Foo();
> Foo<?> foo4 = new Foo<>();
> Foo<? extends String> foo5 = new Foo<>();
> Foo<? super String> foo6 = new Foo<>();
>
> In case 1 we are already at our destination. The code the compiler 
> gets contains the information we want. Why don’t we attach the type 
> written in plain sight to the generic constructor?
>
> Like an implicit Foo<String> foo = new Foo<>(java.lang.String.class); 
> (which explicitly is a current way of solving this. Which is in my 
> opinion pretty ugly.)
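
For reference, the explicit form mentioned above is usually written as a 
class-token constructor.  A minimal sketch, assuming a `Foo<T>` with a 
single type parameter (the field and method names are illustrative):

```java
import java.util.Objects;

// Hypothetical generic class that carries an explicit class token.
final class Foo<T> {
    private final Class<T> type;

    Foo(Class<T> type) {
        this.type = Objects.requireNonNull(type);
    }

    // Stands in for the proposed `T.class` accessor.
    Class<T> type() {
        return type;
    }

    // Stands in for `o instanceof T`, which erasure does not permit.
    boolean accepts(Object o) {
        return type.isInstance(o);
    }

    // Checked cast; throws ClassCastException on a mismatch.
    T cast(Object o) {
        return type.cast(o);
    }
}
```

A caller writes `Foo<String> foo = new Foo<>(String.class);` and can 
then ask `foo.accepts(someObject)`.  Note that a `Class<T>` token can 
only describe a raw class such as `List`, never `List<String>`.
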
>
> It would be syntactic sugar for the syntax above, which spares 
> developers from repeating themselves (just like <> does).
>
> The compiler could simply pass the information it is erasing into the 
> constructor. The only change would be that the constructor takes 1 
> additional hidden argument, and since the argument is filled in at 
> compile time it would not break any code.
>
> In case 2 we are in some generic context like a nested class or a 
> generic method. In both cases the actual type of T will be available. 
> Such situations might require multiple passes to solve all 
> dependencies. Then we can proceed like in 1.
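
Written out by hand, case 2 might look like the sketch below.  Inside a 
generic method body T is only a type variable, so the token has to 
arrive as an argument and be forwarded through every generic layer. 
`Holder`, `Factory`, and the `witness` parameter are hypothetical names 
for illustration.

```java
import java.util.List;

// Hypothetical generic class with an explicit token, as in case 1.
final class Holder<T> {
    final Class<T> type;

    Holder(Class<T> type) {
        this.type = type;
    }
}

final class Factory {
    // The compiler has no Class object for T here; the caller must supply it.
    static <T> Holder<T> make(Class<T> witness) {
        return new Holder<>(witness);
    }

    // A second generic layer has to accept and forward the witness again.
    static <T> List<Holder<T>> makeTwo(Class<T> witness) {
        return List.of(make(witness), make(witness));
    }
}
```

Only the outermost, non-generic call site (for example 
`Factory.makeTwo(String.class)`) knows the concrete type, which is why 
every generic method in between needs the extra parameter.
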
>
> In case 3 we are dealing with legacy code and the type is always 
> Object. (Foo foo = new Foo(java.lang.Object.class))
>
> In case 4 we are dealing with an open type bound. The type is not 
> constrained so we can only assume Object (Foo<?> foo = new 
> Foo<>(java.lang.Object.class))
>
> In case 5 we have an upper-constrained type bound. We can assume any 
> object passing that bound is a subclass of String (Foo<? extends 
> String> foo = new Foo<>(java.lang.String.class))
>
> In case 6 we have a lower-constrained type bound. The object passing 
> this type bound could be an Object. (Foo<? super String> foo = new 
> Foo<>(java.lang.Object.class))
>
>
> We would store the types at constructor or method invocation as 
> arguments and then map them to the accessor T.class. This would allow 
> type checking of T at runtime by tracing the real type at compile time.
>
> Since the type of T.class would be Class<T> or Class<?> we would not 
> need to create a method or a class for each different version of the 
> method or class like other languages do.
>
> I’m not quite sure how reflection is implemented but I’m sure there is 
> a central invocation where the compiler could add the T.class 
> argument. Of course we would need a way for reflection to specify the 
> type T for this to work. We would simply need a method that is 
> designed for generic classes, like “<T> Constructor<T> 
> getGenericConstructor(Class<?>[] typeParameterTypes, Class<?>... 
> parameterTypes)” (could have a better signature).
>
> If “getConstructor” is used on a generic class, there would be a 
> warning and the type parameters would be set to Object (resulting in a 
> raw object, as any object created by reflection currently is).
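
For comparison, this is roughly what reflection offers today, using 
only existing `java.lang.reflect` calls.  `Widget` and `ReflectionDemo` 
are hypothetical classes; the proposed `getGenericConstructor` does not 
exist and is not used here.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.TypeVariable;

// Hypothetical generic class with an explicit token constructor.
final class Widget<T> {
    final Class<T> type;

    public Widget(Class<T> type) {
        this.type = type;
    }
}

final class ReflectionDemo {
    // Reflection knows the declared type parameter, but only its name and bounds.
    static String typeParameterName() {
        TypeVariable<?>[] params = Widget.class.getTypeParameters();
        return params[0].getName();
    }

    // The constructor is looked up by its erased signature; the result is
    // typed Widget<?> because nothing records the type argument.
    static Widget<?> create(Class<?> token) {
        try {
            Constructor<?> ctor = Widget.class.getDeclaredConstructor(Class.class);
            return (Widget<?>) ctor.newInstance(token);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Here the token travels as an ordinary, visible constructor argument.  A 
hidden argument would need some equivalent way for reflective callers to 
supply it.
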
>
>
> Risks:
>
> It would require adding n fields to any generic class or method, 
> where n is the number of generic arguments. Adding this many new 
> fields to the heap and stack could cause performance and memory 
> issues. This could be reduced by only adding the mechanism where 
> T.class or instanceof T is actually used in the context, and skipping 
> it where the field is never used. The field count would then be 
> unchanged in all class files if the feature is used nowhere.
>
> Another risk would be the obvious change in class files, since the 
> hidden arguments need to be stored at compile time near the invocation 
> of the method/constructor.
>
>
> This is just an idea based on my knowledge of generics in Java.
>
> Please feel free to correct any misconceptions in this idea and tell 
> me if I missed something.
>
>
> Great regards
>
> RedIODev
>
>
>