Primitives in Generics proposal.

Tue Mar 9 16:57:17 PST 2010

Reinier-

An interesting idea, but unfortunately it is far larger in scope than
project lambda, and requires a much deeper and more formal design of
the type system impact.  A type parameter implicitly "extends Object",
but primitives do not.

There are a few fatal flaws in your suggested approaches.  First,
subtype operations need to preserve reference identity, for reasons
discussed earlier.  Your plan to convert int[] to Integer[] and back
again undermines that.  Second, the boxing conversions blur but do not
erase the distinction between primitives and reference types.  You are
simply incorrect when you say that primitives do not have subtypes.
byte is a subtype of long but Byte is not a subtype of Long.  Third,
you cannot forbid primitives as bounds, even if primitives are
considered invariant for the purposes of generics, because type
parameters can be used as bounds too.  Arrays of primitives are not
syntactically allowed as bounds, but that is a syntactic restriction
only, as type parameters can be primitive-array types, and type
parameters can be bounds.

It is hard to see how this can be implemented without at least
partially reifying generics, as the bytecode to handle primitives
differs from the code to handle reference types. Java has separate
compilation and runtime binding, exposed in the language as "binary
compatibility"; to support that, the VM must generate specializations
at runtime.

Cheers,
Neal

On Tue, Mar 9, 2010 at 1:28 PM, Reinier Zwitserloot
<reinier at zwitserloot.com> wrote:
> As Stephen Colebourne already suggested, the various problems with function
> types can be sidestepped rather simply by sticking purely with (SAM) type
> names, immediately fitting any lambda to an appropriate SAM type. However,
> to feasibly go this route, generics _need_ to be able to work with primitive
> types, in order to reduce ParallelArray's 132 SAM types in Ops down to a
> nice lean 8.
>
> I've written up a proposal to get the discussion going. If indeed the
> existence of such a proposal results in a much simpler Project Lambda, then
> the time saved implementing Project Lambda can be used to implement this
> idea. It should go without saying that allowing primitives in generics would
> be a great feature for JDK7 on its own.
>
> The canonical version that will be updated and is nice on the eyes can be
> found here:
>
> http://projectlombok.org/primitive_generics/
>
>
> but in the interests of saving all discussion in one thread I've pasted the
> markdown source of v1.0 below.
>
> Happy reading!
>
>
> # Primitives in generics proposal
>
> v1.0
>
> by Reinier Zwitserloot
>
>
> ## &sect;1 Preamble and necessity of the feature
>
> The goal of this proposal is to allow primitives in generics. To simplify
> the proposal, `int` is used in all examples and in all text. All the other 7
> primitive types are of course also allowed as type arguments. In particular,
> a closure strategy for JDK7's [Project Lambda][] to always _box_ any closure
> into a so-called SAM type is a far more realistic option if the used SAM
> types can accept primitives in their generics.  (a type that contains
> exactly 1 abstract method, usually an interface. For example,
> `java.util.Comparator` is a SAM type. So is `java.lang.Runnable`). For
> example, [jsr166y-extra - Parallel Arrays][jsr166yx], the standard usecase
> example for Project Lambda, currently lists 132 SAM interface types in its
> [Ops](
> http://gee.cs.oswego.edu/dl/jsr166/dist/extra166ydocs/extra166y/Ops.html)
> container class, but most of those are simply to provide a primitive-based
> version of a few central concepts. If primitive generics were possible, this
> list of 132 would be reduced to a much more manageable set. To be precise:
>
>    Ops.Op<A, R>
>    Ops.BinaryOp<A, B, R>
>    Ops.Predicate<A>
>    Ops.BinaryPredicate<A, B>
>    Ops.Procedure<A>
>    Ops.Generator<R>
>    Ops.Reducer<A> extends BinaryOp<A, A, A>
>    java.util.Comparator<A>
>    Ops.Action
>
> These names are more meaningful than their equivalent purely structural
> function type constructs. For example, jsr166yx includes the concept of a
> `Ops.Reducer` even though the signature of a reducer is 100% equivalent to
> the more general `Op` class. It is self-evident then that the designers of
> jsr166yx find this kind of purely name-based distinction important enough to
> dedicate 4 of Ops' 132 SAM types to it even though they didn't have to.
>
> Furthermore, as the inclusion of `java.util.Comparator` foreshadowed,
> sticking with names for function types fits better with existing java code.
> jsr166yx as it stands uses `java.util`'s version of Comparator for its
> Object based `ParallelArray` but defined its own versions for the
> primitives, such as `Ops.LongComparator`. If function types were introduced
> instead, then jsr166yx would obviously replace its `Ops.LongComparator` with
> `#int(long, long)` but it would face a dilemma for its `Object` based
> `ParallelArray`: Should it take `#int(T, T)`, should it take
> `java.util.Comparator<T>`, or should it take both? None of these three
> options are fully consistent. This isn't a rare coincidence; the amount of
> SAM types that are in active use across the java community is staggering,
> and introducing function types to solve the primitives problem is going to
> introduce a lot of inconsistency as APIs evolve to accept closures but keep
> their old SAM based methods for backwards compatibility.
>
> The solution is as clear as it is obvious: Make primitives allowable as type
> argument. Together with a relatively minor and proven (by JRockit which
> already includes it) JVM optimization, both the API and the performance
> issues can be solved.
>
> As a secondary step, one could start questioning if, once this feature has
> been added, function types are even necessary. As recent traffic on the
> [Project Lambda mailing list](
> http://mail.openjdk.java.net/mailman/listinfo/lambda-dev) has indicated,
> there are a host of issues with function types, from syntax ambiguities to
> problems around allowing arrays of function types. By sticking with SAM
> types, no new ideas are introduced for the java community to learn; they
> presumably already know that for example the expression `new
> Comparator<int>[10]` is not legal.
>
> Last but certainly not least: Primitives as type arguments are a great
> feature for the java language all by themselves and not only as a part of a
> closure proposal. For example, if this proposal is implemented in JDK7, the
> following could be allowed: `List<int> intList = new IntList();` and be
> virtually as fast as a purely `int` array based list implementation such as
> the ones that can be found in [GNU Trove](
> http://trove4j.sourceforge.net/javadocs/gnu/trove/TIntArrayList.html), but
> which is API compatible with `java.util.List`.
>
> This v1.0 of the proposal does not highlight which sections of the JLS are
> involved or need to be changed. Its primary intention is to start a
> discussion, explore the feasibility, if it has clear solutions for all
> potential issues, and if it can lead to a simplification of Project Lambda,
> thus freeing up enough time in the JDK7 schedule to implement it.
>
>
> [Project Lambda]: http://openjdk.java.net/projects/lambda/ "Project Lambda
> Mailing List"
> [jsr166yx]: http://gee.cs.oswego.edu/dl/concurrency-interest/ "jsr166y -
> Parallel Arrays"
>
> ## &sect;2 Proposal
>
> ### &sect;2.1 Type Variables
>
> Type variables are the `T` in `public class Foo<T> {}`. Type variables can
> only include actual types in their bounds, such as `<T extends Number>`. As
> `T extends int` doesn't really convey a useful construct, we simply won't
> allow it. This means type variables themselves don't have to change at all.
>
> ### &sect;2.2 Type Arguments
>
> Type Arguments come in two flavours: Just the type, as in: `List<String>`,
> or as a bound, such as `List<? extends Number>`. As primitive types don't
> really have subtype relations, there is again not much sense in allowing
> either `List<? extends int>` or `List<? super int>`. It is therefore not
> allowed; *only* `List<int>` would be legal. To be precise, `List<? extends
> Set<int>>` would be allowed, but `List<Set<? extends int>>` would not be.
>
> ### &sect;2.3 Syntax Sugar
>
> The JVM barely supports generics (instances don't preserve their type
> arguments, and method signatures don't either), and changing the JVM is a
> major undertaking so this proposal does not include any required changes to
> the JVM. Therefore, Something like `List<int> x;` is effectively syntax
> sugar similar to how `List<String> x;` is effectively syntax sugar.
>
> ### &sect;2.4 Legality of primitive type arguments.
>
> A primitive can be bound to a type argument anywhere its wrapped type would
> be legal. Therefore, `int` is valid when the bound is `T extends Number`,
> but it would not be if the bound was, for example, `T extends InputStream`.
>
> ### &sect;2.5 The *typearg_int* imaginary type.
>
> Anytime a primitive type is encountered as a type argument, the compiler
> needs to keep track of the notion that the primitive type is used as type
> bound. For example, the compiler will usually have to inject synthetic
> methods that replace the primitive type with its wrapper type. To help
> clarify the actions that the compiler needs to take, we'll call any `int`
> that appears in a location where it is being used to fit a type variable as
> the imaginary `typearg_int` type. This isn't a real type, just a virtual
> construct of the compiler. This concept already exists in java. If, for
> example, you write the following code:
>
>    public abstract class Example implements List<String> {
>        @Override public boolean void add(String x) {
>            return true;
>        }
>    }
>
> and inspect the class produced by javac, you'll notice 2, and not 1, `add`
> method; you'll see `add(Object)` as well as `add(String)` in the generated
> class file. The `add(Object)` simply passes the call to the `add(String x)`
> method, which involves a type cast. For primitive types, this concept is no
> different:
>
>    public abstract class Example implements List<int> {
>        @Override public boolean void add(int x) {
>            return true;
>        }
>    }
>
> would result in not just `add(int)` but also `add(Object)` which simply
> passes the call on. In addition to a type cast (to `java.lang.Integer`),
> this wrapper method will also auto-unbox the object.
>
> ### Sugar
>
> #### &sect;2.6 Field Read
>
> Anytime a field of type "int" is read, the compiler emits bytecode:
>
>    GETFIELD/GETSTATIC "java/awt/Point" "x" "I"
>
> However, if a field of type "typearg_int" is read, it is unboxed first, so,
> the compiler emits:
>
>    GETFIELD/GETSTATIC "java/awt/Point" "x" "Ljava/lang/Object;"
>    INVOKEVIRTUAL "java/lang/Integer" "intValue" "()I"
>
> In other words, the field is assumed to be of the erased typevar's type
> (Object in the example, but could also be e.g. `Number`), is assumed to
> contain an `Integer` object, and `intValue()` is called on it.
>
> This act may result in the usual `ClassCastException` if heap pollution has
> occurred, but can additionally also result in a `NullPointerException`, even
> without heap pollution. This is analogous to auto-unboxing causing an NPE,
> which is in the current Java6 also possible even if heap pollution has not
> taken place. The java community has embraced autoboxing/unboxing even with
> this shortcoming in the typing system, hence this similar case, which is not
> readily solvable, can be considered the same way: A problem, but not a
> show-stopper.
>
> #### &sect;2.7 Field Write
>
> Analogous to field read; instead of generating a `PUTFIELD` of type "I", the
> erased typevar's type is used instead, and a call to `INVOKESTATIC
> "java/lang/Integer" "valueOf" "(I)Ljava/lang/Integer;"` is inserted before
> the `PUTFIELD` instruction.
>
> Unlike the field read, there's no NullPointerException issue here.
>
> #### &sect;2.8 Creating a field with a "typearg_int" type.
>
> This isn't possible; Fields cannot be overridden, they can only be shadowed.
> Fields *can* of course be defined as having a type variable as type, but a
> type variable's actual type is unknown. For example:
>
>    public class Foo<T> {
>        T x;
>    }
>
> The `T` can be anything and is in practice erased to `java.lang.Object`.
> This erasure will not change because of this proposal, and an erasure to a
> primitive type isn't possible, because `<T extends int>` is not legal.
>
> #### &sect;2.9 Method invocation
>
> When invoking a method where one or more parameters is of type
> `typearg_int`, then during the creation of the bytecode to load the
> parameters on the stack, an `INVOKESTATIC` to `Integer.valueOf(int)` is
> generated to box the primitive. Similarly, the signature generated in the
> `INVOKEVIRTUAL`/`INVOKESTATIC` call is of the erased typevar's type and not
> `I`.
>
> If the invoked method's return type is of type `typearg_int`, the return
> type in the signature is as usual replaced with the erased typevar's type.
> Unless the return value is discarded because the method invocation appears
> as expression statement, an `INVOKEVIRTUAL` call to `Integer.intValue()` is
> inserted immediately after the `INVOKEVIRTUAL`/`INVOKESTATIC` instruction.
>
> #### &sect;2.10 Method declaration (Preamble)
>
> This part of the spec gets somewhat complicated, but this complication is
> not new! For example, imagine the following two types:
>
>    public class A<T> {
>        void foo(T in) {}
>    }
>
>    public class B extends A<String> {
>        void foo(String in) {}
>    }
>
> The current javac compiler solves this scenario by generating a synthetic
> wrapper with the signature: `void foo(Object)`, as `A`'s `foo` method has
> that erased type (`Object` is `T`'s type bound in class `A`). This synthetic
> method wraps the call to `B`. You can actually see this in action by looking
> at the stack trace you get when you throw an exception from `B.foo` and then
> calling it via: `A<String> x = new B(); x.foo("");`
>
> The current java gets even more complicated if the supertype contains both
> `void foo(T)` and `void foo(String)`, which is of course legal as `T` erases
> to `Object`, not `String`. When defining `void foo(String)` in `B`, *BOTH*
> methods are overridden. Furthermore, trying to call `foo("")` on a variable
> of type `A<String>` is not possible; a *this call is ambiguous* error is
> generated. This error cannot be avoided by casting the argument; only by
> casting the receiver (an instance of `A`) to a raw type can `foo` even be
> called in such a circumstance.
>
> The same problems occur when primitives are legal type bounds for type
> variables, and the same solutions suffice as well. There is some friction
> here, as the above java puzzler-esque exercise shows, but I'm not aware of
> any proposal for e.g. Project Coin to adress this problem, and thus one can
> safely conclude that this oddity is not that important.
>
> #### &sect;2.11 Method Declaration (parameter)
>
> Let's say we have:
>
>    public abstract class IntList implements List<int> {
>        public boolean add(int v) {return false;}
>   }
>
> In this case, both the actual method is generated, as well as a synthetic
> method where every `typearg_int` is replaced with `Ljava/lang/Object;` (the
> erased type of typevar), i.e. both of these:
>
>    add(I)Z
>    add(Ljava/lang/Object;)Z
>
> are generated, with the second method typecasting its argument to `Integer`,
> unboxing it, and calling the first. Both `ClassCastException` and
> `NullPointerException` can occur as part of this operation, but the former
> only if heap pollution has taken place.
>
> Note there is no risk of an explosion of method signatures when a large
> number of `typearg_int` typed parameters show up. In a hypothetical method
> `add(T1 a, T2 b, T3 c)` with all 3 parameters a type variable, when all type
> variables are bound to a primitive type, then only two methods are
> generated: all 3 as primitive, and all 3 as wrapper type. Entirely analogous
> to the convoluted example at the beginning of this section, if this method
> is accessed as a `List<int>`, the wrapper type version will be called, but
> code that works directly with the `IntList` type will generate a call to the
> primitive version.
>
> #### &sect;2.12 Method Declaration (return type)
>
> Let's say we have:
>
>    public abstract class IntList implements List<int> {
>        public int get(int idx) {return 0;}
>    }
>
> In this case, again two versions are generated:
>
>    get(I)I
>    get(I)Ljava/lang/Object;
>
> Where `Object` comes from the erased type of this particular version of
> `typearg_int`. The second method simply calls on the first, boxes the
> result, and returns that.
>
> If a method contains both `typearg_int` in its parameter list as well as its
> return type, still only 2 methods are produced; one with all-primitives, and
> one with all-wrappers. The wrapper will unbox parameters and box the
> returned value.
>
> ### &sect;2.13 The JRockit gambit
>
> It is virtually impossible to make something like this:
>
>    List<int> list = new ArrayList<int>();
>
> as efficient as an `int[]` without reification; `ArrayList`'s implementation
> creates an `Object[]` to store the list's elements in, and once it does
> this, the performance game is automatically lost - it must create an `int[]`
> array to get performance that is comparable to the performance of a raw
> `int` array. However, it couldn't possibly do this; when `ArrayList`'s
> constructor is called, its type parameter has been erased, so `ArrayList`
> simply cannot tell what kind of array it should make. However, all hope is
> not lost here, as this can be made quite efficient:
>
>     List<int> list = new IntArrayList();
>
> or, to avoid up to 8 new classes in java.util, these specialized classes can
> be turned into inner classes of ArrayList, and ArrayList can gain a new
> "static" constructor:
>
>    List<int> list = ArrayList.with(int.class);
>
> The above example is poor man's reification: This way ArrayList's "with"
> static method can in fact create a new IntArrayList by runtime-inspecting
> the `int.class` parameter. However, even with the receiver (`IntArrayList`)
> being specifically built with the `int` primitive type in mind, and the
> caller (the one that declared `List<int> list`) also operating solely on the
> primitives level, the above compilation strategy still means that the
> following code:
>
>    list.add(10);
>
> results in the following bytecode:
>
>    ICONST 10
>    INVOKESTATIC java/lang/Integer valueOf(I)Ljava/lang/Integer;
>    INVOKEVIRTUAL java/util/List add(Ljava/lang/Object;)Z
>
>    .... execution moves to `IntArrayList`'s code
>
>    INVOKEVIRTUAL java/lang/Integer intValue()I
>    INVOKEVIRTUAL java/util/IntArrayList add(I)Z
>
>    ....
>
> The hotspot compiler will eventually inline the static call and the last 2
> calls (as they are to small final methods), but nevertheless there's an
> object on the heap involved (the boxed integer), and two calls, even if
> inlined.
>
> However, the JRockit VM already eliminates sequential box/unbox calls even
> across invocations, which eliminates the heap object entirely. Together with
> inlining the pass-through call from add(Object) to add(int) this call is
> then just as fast as calling the *GNU Trove* `TIntArrayList`. See <
> http://blogs.oracle.com/ohrstrom/2009/05/pulling_a_machine_code_rabbit.html>
> for a treatise on how the JRockit VM handles eliminating sequential
> box/unbox operations.
>
> In situations where either the receiver (such as `java.util.ArrayList`) or
> the sender (some generic code that created a new list of it's own type
> variable `E`) is not working exclusively with `int` this optimization
> obviously won't work, _but_, this optimization would be impossible without
> at least reification, which is not feasible for JDK7.
>
> ### &sect;2.14 Generics Conversion
>
> A type such as `Map<String, String>` can be treated as a `Map<String, ?
> extends Object>` implicitly. The same property is granted to primitives; a
> primitive will readily convert to its wrapper type, which includes
> conversion to a (bound) wildcard as `Integer` has a place in the type
> hierarchy. In other words, a `Map<int, boolean>` will implicitly convert to
> a `Map<? extends Number, ?>`, for example. This leads automatically to a
> dilemma; it means `List<int>` can be treated as a `List<Integer>`, and in
> that way, `null` can be added to this list which isn't technically
> compatible with primitives, and as a result, treating this list as a
> `List<int>` again can result in a `NullPointerException`. This type
> conversion feels like autoboxing/unboxing which suffers from the same
> problem, and as covered this omission in the type system has been deemed
> acceptable before. As has been stated, `int` as a type bound is legal
> anywhere `Integer` would be, which implies that converting from for example
> `List<Integer>` to `List<int>` is also legal.
>
> ### &sect;2.15 The benefit to the java community - recap
>
> Even without the JRockit optimization (which can at any time be added
> without requiring changes to this spec, which is focussed on the JLS and the
> compiler, and not the VM), this proposal is highly beneficial to the java
> community. Generified libraries virtually never offer primitive-optimized
> versions. For example, there is no ArrayList-like class for primitive ints
> in the java runtime; one would have to reach for third party libraries such
> as GNU Trove to find them. The large amount of code duplication that is
> required to produce a primitive-optimized version for all 8 primitive types
> is also very inpractical. It is clear, then, that the java community has
> decided to accept the performance penalty, and more problematically, the
> readability penalty, by choosing to use for example `List<Integer>`. The
> readily available alternative, arrays, are continuously problematic in new
> proposals, as they don't work well together with generics and have different
> variance rules.
>
> This proposal would also help greatly in APIs which are designed with
> primitive arrays in mind. For example, the java.util.Arrays class has a
> utility method to sort an int[], but only by the natural order of integers.
> There is no `Comparator` based method for `int[]` in `java.util.Arrays`;
> there is only such a method for sorting `T[]`, and `T` can currently only
> stand for non-primitive types. With this feature, the following code could
> be legal:
>
>    int[] array = {1, -3, 2, 8, 6};
>    Arrays.sort(array, new Comparator<int> {
>        public int compare(int a, int b) {
>            a = Math.abs(a); b = Math.abs(b);
>            return a < b ? -1 : a > b ? 1 : 0;
>        }
>    });
>
> or, with closure support:
>
>    Arrays.sort(array, #(int a, int b) {
>        a = Math.abs(a); b = Math.abs(b);
>        return a < b ? -1 : a > b ? 1 : 0;
>    });
>
> In other words, convenient syntax, fast execution, while *keeping* the
> `Comparator` concept. Comparator cannot obviously be removed (it would not
> be backwards compatible), and it would be inconsistent for the sorting APIs
> to work with `Comparator` when sorting Objects, but to work with `#int(int,
> int)` closures when trying to sort an array of primitive ints.
>
> ### &sect;3 Curses! What about arrays!
>
> There's one case that this proposal has not yet covered: What happens when
> `typearg_int` is used in an array?
>
> Let's say we have:
>
>    public class Foo<T> {
>        protected T[] array = null;
>        public abstract void method(T[] param);
>        public abstract T[] returnType();
>   }
>
> What would happen if we tried to write:
>
>    public class Bar extends Foo<int> {
>        @Override public void method(int[] param) {
>            this.array = param;
>        }
>
>        @Override public int[] returnType() {
>            return param;
>        }
>    }
>
> This question seems academic, in that generics and arrays almost never meet,
> but that isn't quite true: `java.util.Collection` itself contains `public
> abstract <T> T[] toList(T[] in);`. It is important to realize that because
> arrays and generics are already convoluted when used together in java6, the
> `T` in that signature is not related to the `E` of List itself (the `E` in
> `public interface Collection<E>`!). For usage in method declarations the
> wrapper methods can create a new array and copy each element,
> autoboxing/unboxing on the way. This is mind-bogglingly inefficient, but it
> is simple. In practice the authors strongly suggest the implementer of
> `IntArrayList` write a specific method to get a straight `int[]` array
> without this convolution which obviously cannot be optimized by the VM the
> way a single `int` to/from `Integer` conversion can be. For array access,
> the proposal suggests treating this concept as extremely rare and thus
> coming up with the simplest thing that could possibly work: Access to the
> array (read/write) is allowed and involves boxing as usual, but a reference
> to `Foo.array` itself will simply remain its erased type and emits a
> warning. Therefore:
>
>    public abstract class Bar extends Foo<int> {
>        void test() {
>            Object[] o = array; //warning but legal
>            int[] p = array;    //error
>        }
>    }
>
> Nevertheless this concept introduces an API incompatibility: The javadoc for
> `toList` states that if the input array has enough room, the same array will
> be returned, and this cannot of course be done if the input array is first
> number-for-number copied into a `Object[]` array, and then after the copy
> operation has completed, this array is translated back to `int[]`. This
> proposal offers no solution to this dilemma. It can only note that as `int`
> is currently illegal in generics, it is technically backwards compatible to
> add a note to the javadoc that explains the same array is *not* returned
> even if it has enough room if the type bound is a primitive.
>
> It also shows a gap in the API of a hypothetical `IntArrayList`: The
> signature (as mandated by the `Collection` interface) must be `public <T>
> T[] toArray(T[] in)`, and as `T[]`'s erasure, which is `Object[]`, is not
> compatible with `int[]`, this method cannot be made fast even for a specific
> subtype such as `IntArrayList`. This proposal offers no pragmatic solution
> other than to create a custom method in `IntArrayList` as well as a static
> utility method in `java.util.Collections` that can translate any list to
> `int[]`, which uses an instanceof check to pick up
> `java.util.ArrayList.IntArrayList`s special method and use that.
>
> The second issue raised is, in `public <T> T[] toArray(T[] in)`, what
> happens if the calling code ends up binding `int` to `T`? This is a
> particularly painful process: Javac needs to generate the conversion from
> `int[]` to `Integer[]` prior to calling the method, and then the method will
> most likely convert back to `int[]`, and on the return phase the same double
> convert occurs for a grand total of 4 conversions. To avoid accidental
> triggering of this potentially very slow operation, a warning should be
> emitted anytime the compiler generates conversion code to convert a
> primitive array to a non-primitive array in order to fit with the erased
> signature of an invoked method. (For fields, as mentioned, the arrays remain
> their erased type, and an attempt to treat them as primitive array results
> in an error).
>
>