Finding the spirit of L-World
Remi Forax
forax at univ-mlv.fr
Tue Jan 8 08:55:40 UTC 2019
But i agree that having all these Q-ref in the bytecode will not help the introduction of generics.
I think that the way to implement the reification of generics is:
- do the specialization of the data shape by asking a boostrap method with the restriction that the data shape has to be covariant with the generic description (so the specialization of a L-type can be a Q-type)
- pass the type arguments out of band so a call site can call with or without the type arguments
- do the specialization of the bytecode at JIT time because at that time you have the type arguments and their usages and you avoid the bytecode explosion of the C++ like templating mechanism.
So Brian, i agree with you that the way to describe a generic class is to use only L-type descriptor and aload/astore bytecodes but it doesn't mean that at runtime when we specialize a generic class, we can not say that a field typed as a L-type is not in fact a Q-type (or an array of L-type is not in fact an array of Q-type).
Rémi
----- Mail original -----
> De: "Brian Goetz" <brian.goetz at oracle.com>
> À: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Envoyé: Lundi 7 Janvier 2019 18:21:26
> Objet: Finding the spirit of L-World
> I’ve been processing the discussions at the Burlington meeting. While I think
> we made a lot of progress, I think we fell into a few wishful-thinking traps
> with regard to the object model that we are exposing to users. What follows is
> what I think is the natural conclusion of the L-World design — which is a model
> I think users can love, but requires us to go a little farther in what the VM
> does to support it.
>
>
>
>
> # Finding the Spirit of L-world
>
> L-World is, at heart, an attempt to unify reference objects and
> values; they're unified under a common top type (`Object`), a common
> basic type descriptor (`L`), and a common set of bytecodes (`aload` et
> al.) The war cry for L-World should be, therefore, "Everything is an
> Object". And users will be thrilled to see such a unification --
> assuming we can live up to the high expectations that such a promise
> sets.
>
> By unifying references and values under a common type descriptor and
> supertype, we gain significant benefits for _migration_ -- that
> migrating a reference class to a value class does not break the ways
> existing code refers to it.
>
> By unifying under a common set of bytecodes, we gain significant
> benefits for _specialization_; the method body output by the compiler
> can apply equally to reference and value parameterizations, and all
> specialization can be applied on the constant pool only.
>
> If our war cry is "Everything is an Object", we need to ask ourselves
> what behaviors uses should reasonably expect of all objects -- and
> ensure that values and references alike conform to those behaviors.
>
> ## Object model
>
> In Q-world, we struggled with the fact that there was no true top
> type, but most code was written as if `Object` were the top type. This
> was trying to square a circle; the options for introducing a new top
> type in Q-world were not good (an `Any` superclass provided the
> desired unification but a woefully confusing cost model; an
> `Objectible` interface shared between `Object` and values would set
> off a snail stampede to migrate libraries to use `Objectible` as the
> new fake top), but having multiple roots would have further
> exacerbated the pain of the existing bipartite type system.
>
> L-world offers us an out; it makes `Object` a true top type (save for
> primitives -- but see "Poxing", below), so existing code that deals
> with `Object` can immediately accept values (save for totality -- but
> see "Totality", below) without requiring disruptive migration.
>
> A sensible rationalization of the object model for L-World would be to
> have special subclasses of `Object` for references and values:
>
> ```
> class Object { ... }
> class RefObject extends Object { ... }
> class ValObject extends Object { ... }
> ```
>
> We would enforce that `RefObject` is only extended by classes that do
> not have the `ACC_VALUE` bit, that `ValObject` is only extended by
> classes that do have the `ACC_VALUE` bit, and that classes that claim
> to extend `Object` are implicitly reparented according to their
> `ACC_VALUE` bit. (Actually, in this scheme, we can ditch the
> `ACC_VALUE` bit entirely; at load time, we just look at the
> superclass, and if it's `ValObject`, its a value class, otherwise
> it's a reference class.)
>
> Bringing ref-ness and val-ness into the type system in this way has
> many benefits:
>
> - It reinforces the user's understanding of the relationship between
> values and references.
> - It allows us to declare methods or fields that accept any object,
> reference objects only, or value objects only, using existing
> concepts.
> - It provides a place to declare ref-specific or val-specific methods,
> and ref-specific or value-specific implementations of `Object`
> methods. (For example, we could implement `Object::wait` as a final
> throwing method in `ValObject`, if that's the behavior we want).
> - It allows us to express ref-ness or val-ness as generic type
> bounds, as in `<T extends RefObject>`.
>
> We can pull the same move with nullability, by declaring an interface
> `Nullable`:
>
> ```
> interface Nullable { }
> ```
>
> which is implemented by `RefObject`, and, if we support value classes
> being declared as nullable, would be implemented by those value
> classes as well. Again, this allows us to use `Nullable` as a
> parameter type or field type, or as a type bound (`<T extends
> Nullable>`).
>
> ## Totality
>
> The biggest pain point in the LW1 model is that we're saying that
> everything is an `Object`, but we've had to distort the rules of
> `Object` operations in ways that users might find confusing. LW1 says
> that equality comparison, identity hash code, locking, and
> `Object::wait` are effectively partial, but existing code that deals
> in `Object` may be surprised to find this out. Additionally, arrays
> of reference objects are covariant with `Object`, but arrays of value
> objects are currently not.
>
> #### Equality
>
> The biggest and most important challenge is assigning sensible total
> semantics to equality on `Object`; the LW1 equality semantics are
> sound, but not intuitive. There's no way we can explain why for
> values, you don't get `v == v` in a way that people will say "oh, that
> makes sense." If everything is an object, `==` should be a reasonable
> equality relation on objects. This leads us to a somewhat painful
> shift in the semantics of equality, but once we accept that pain, I
> think things look a lot better.
>
> Users will expect (100% reasonably) the following to work:
>
> ```
> Point p1, p2;
>
> p1 == p1 // true
>
> p2 = p1
> p1 == p2 // true
>
> Object o1 = p1, o2 = p2;
>
> o1 == o1 // true
> o1 == o2 // true
> ```
>
> In LW1, if we map `==` to `ACMP`, they do not, and this will violate
> both user intuition and the spirit of "everything is an object". (If
> everything is an object, then when we assign `o1 = p1`, this is just a
> widening conversion, not a boxing conversion -- it's the same
> underlying object, just with a new static type, so it should behave
> the same.)
>
> The crux of the matter is that interfaces, and `Object` (which for
> purposes of this document should be considered an honorary interface)
> can hold either a reference or a value, but we've not yet upgraded our
> notion of interfaces to reflect this kind-polymorphism. This is what
> we have to put on a sounder footing in order to not have users fall
> into the chasm of anomalies. To start with:
>
> - A class is either a ref class or a value class.
> - `C implements I` means that instances of `C` are instances of `I`.
> - Interfaces are polymorphic over value and ref classes.
>
> Now we need to define equality. The terminology is messy, as so many
> of the terms we might want to use (object, value, instance) already
> have associations. For now, we'll describe a _substitutability_
> predicate on two instances:
>
> - Two refs are substitutable if they refer to the same object
> identity.
> - Two primitives are substitutable if they are `==` (modulo special
> pleading for `NaN` -- see `Float::equals` and `Double::equals`).
> - Two values `a` and `b` are substitutable if they are of the same
> type, and for each of the fields `f` of that type, `a.f` and `b.f`
> are substitutable.
>
> We then say that for any two objects, `a == b` iff a and b are
> substitutable.
>
> This is an "everything is an object" story that users can love!
> Everything is an object, equality is total and intuitive on objects,
> interfaces play nicely -— and there are no pesky boxes (except for
> primitives, but see below.) The new concept here is that interfaces
> abstract over refs and values, and therefore operations that we want
> to be total on interfaces -- like equality -- have to take this seam
> into account.
>
> The costs come in two lumps. The first is that if we're comparing two
> objects, we first have to determine whether they are refs or values,
> and do something different for each. We already paid this cost in
> LW1, but here comes the bigger cost: if a value class has fields
> whose static types are interfaces, the comparison may have to recur on
> substitutability. This is horrifying for a VM engineer, but for users,
> this is just a day at the office -- `equals` comparisons routinely
> recur. (For values known to (recursively) have no interface fields
> and no floating point fields, the VM can optimize comparison to a flat
> bitwise comparison.)
>
> This model eliminates the equality anomalies, and provides users with
> an intuitive and sound basis for "same instance".
>
> One might ask whether we really need to push this into `acmp`, or
> whether we can leave `acmp` alone and provide a new API point for
> substitutability, and have the compiler generate invocations of that.
> While the latter is OK for new code, doing so would cause old code to
> behave differently than new when operating on values (or interfaces
> that may hold values), and may cause it to change its behavior on
> recompile. If we're changing what `Object` means, and what `aload`
> can operate on, we should update `acmp` accordingly.
>
> #### `==` and `equals()`
>
> Code that knows what type it is dealing with generally uses either
> `==` or `equals()`, but not both; generic code (such as `HashMap`)
> generally uses the idiom `a == b || a.equals(b)`. Such code _could_
> fall back to just using `equals()`; this idiom arose as an
> optimization to avoid the virtual method invocation, but the first
> part can be dropped with no semantic loss.
>
> As the cost of `==` gets higher, this optimization (as optimizations
> often do!) may begin to bite back; the `equals()` implementation often
> includes an `==` check as well. There are lots of things we can do
> here, but it is probably best to wait to see what the actual
> performance impact is before doing anything.
>
> #### Identity hash code
>
> Because values have no identity, in LW1 `System::identityHashCode`
> throws `UnsupportedOperationException`. However, this is
> unnecessarily harsh; for values, `identityHashCode` could simply
> return `hashCode`. This would enable classes like `IdentityHashMap`
> (used by serialization frameworks) to accept values without
> modification, with reasonable semantics -- two objects would be deemed
> the same if they are `==`. (For serialization, this means that equal
> values would be interned in the stream, which is probably what is
> wanted.)
>
> #### Locking
>
> Locking is a difficult one. On the one hand, it's bad form to lock on
> an object that hasn't explicitly invited you to participate in its
> locking protocol. On the other hand, there is likely code out there
> that does things like lock on client objects, which might expect at
> least exclusion with other code that locks the same object, and a
> _happens-before_ edge between the release and the acquire. Having
> locking all of a sudden throw `IllegalMonitorStateException` would
> break such code; while we may secretly root for such code to be
> broken, the reality is that such code is likely at the heart of large
> legacy systems that are difficult to modify. So we may well be forced
> into totalizing locking in some way. (Totalizing locking also means
> totalizing the `Object` methods related to locking, `wait`, `notify`,
> and `notifyAll`.)
>
> There are a spectrum of interpretations for totalizing locking, each
> with different tradeoffs:
>
> - Treat locking on a value as an entirely local operation, providing
> no exclusion and no happens-before edge. Existing code will
> continue to run when provided with values, but may produce
> unexpected results.
> - Alternately, treat locking on a value as providing no exclusion,
> but with acquire and release semantics.) Wait and notify would
> still throw.
> - Treat locking on a value as acquiring a fat lock (say, a global
> value lock, a per-type value lock, etc.) This gives us exclusion
> and visibility, with a small risk of deadlock in situations where
> multiple such locks are held, and a sensible semantics for wait
> and notify (single notify would have to be promoted to `notifyAll`).
> - Treat locking on a value as acquiring a proxy lock which is
> inflated by the runtime, which assigns a unique lock to each
> distinguishable value.
> - Put lock-related methods on `ValObject`, whose defaults do one of
> the above, and allow implementations to override them.
>
> While nearly all of these options are horrifying, the goal here is
> not to do something _good_, but merely to do something _good enough_
> to avoid crushing legacy code.
>
> #### Array covariance
>
> Currently, for any class `C`, `C[] <: Object[]`. This makes
> `Object[]` the "top array type". If everything is an object, then an
> array of anything should also be an array of `Object`.
>
> There are two paths to delivering on this vision: extend traditional
> array covariance to value arrays (potentially making `aaload` sites
> megamorphic), or moving in the direction of "Arrays 2.0" and define a
> specializable generic type `Array<T>` where the legacy arrays
> implement `Array<T>`, and require clients to migrate from `T[]` to
> `Array<T>` before specializing their generic classes.
>
> ## Poxing
>
> The Model 3 specializer focused on specializing generics over
> primitives, not values (because we hadn't implemented values yet).
> Many of the complexities we ran into in that exploration stemmed from
> the accidental asymmetries between primitives and objects, including
> irregularities in the bytecode set (single vs double slot, `if_icmpeq`
> vs `dcmp` + `if`). Having unified references and values, it would be
> really nice to unify primitives as well.
>
> While we can't exactly do that easily, beacause of the intrusion to
> the bytecode set, we may be able to come close, using a modified
> boxing conversion. The problem with the existing boxing conversion is
> that `Integer` is a heavy box with identity -- which means boxing is
> expensive. There are two possible paths by which we could mitigate
> this pain:
>
> - Migrate `Integer` to be a value type;
> - Create an alternate box for `int`, which is a value class (`ValInt`)
>
> If we can box primitives to values, then we need not unify primitives
> with objects -- we just insert boxing conversions in the places we
> already do, and interpret specializations like `List<int>` to mean
> "List of int's box".
>
> Migrating `Integer` to be a value may seem the obvious move, but it is
> fraught with compatibility constraints -- there is tons of legacy code
> that does things like locking on `Integer` or depending on it's
> strange accidental identity. Perhaps if we could totalize locking and
> remove the public box constructors, we could get there -- but this
> is not a slam-dunk.
>
> The alternative is creating a value box for primitives (a "pox") and
> adjust the compiler's boxing behavior (when boxing to `Object` or an
> interface, prefer the pox to the box). This too has some
> compatibility concerns, such as code that deals in `Object` that
> assumes that primitives are always boxed to legacy boxes. We may be
> able to finesse this by a trick -- to teach `instanceof` and
> `checkcast` of the relationship between boxes and poxes, so that code
> like:
>
> ```
> if (o instanceof Integer) {
> Integer i = (Integer) o;
> // use o
> }
> ```
>
> would work on both `Integer` and `int`'s pox (by saying "yes" in
> `instanceof` and doing the conversion in `checkcast`.) This move,
> while somewhat risky, could allow us to relegate the legacy boxes to
> legacy, and eventually deprecate them. (We could then have methods
> and intefaces on the poxes, and lift them to the primitives via
> poxing, so that `int` could be seen to implement `Comparable<int>` and
> you could call `compareTo()` on ints.) While this would not be a true
> unification, it would come much closer than we are now.
>
> Clearly, both alternatives are risky and require more investigation
> -- but both have promising payoffs.
>
> ## Migration
>
> In both Q-world and L-world, we took care to ensure that for a value
> class `C`, the descriptor `LC;` describes a subtype of `Object`. This
> is a key part of the story for migrating reference types to values,
> since clients of `C` will describe it with `LC;` and we don't want to
> require a flag day on migration. In Q-world, `LC;` is the (nullable)
> box for `C`; in L-world, it is a nullable `C`.
>
> This is enough that we can migrate a value-based class to a value and
> _existing binary clients_ will not break, even if they stuff a null
> into an `LC;`. However, there are other migration compatibility
> concerns which we need to take up (which I'll do in a separate
> document.)
>
> ## Generics
>
> In Q-world, because values and references were so different,
> specializable generic classes had to be compiled with additional
> constraints. For a specializable type variable `T`, we enforced:
>
> - Cannot compare a `T` to `null`
> - Cannot assign `null` to a `T`
> - Cannot assign a `T` to `Object`
> - Cannot assign a `T[]` to `Object[]`
> - Cannot lock on a `T`
> - Cannot `==` on a `T`
>
> In L-world, the need for most of these can go away. Because
> everything is an object, we can assign values to `Object`, and
> `acmp_null` should work on all objects, so comparing with `null` is
> OK. If we have array covariance, the array assignment restriction
> goes away. If we totalize locking and equality, those restrictions go
> away. The only restriction that remains is the assignment to `null`.
> But now the VM can express the difference between nullable values and
> non-nullable values, and we can express this in the source type system
> with `Nullable`. So all the Q-world restrictions go away, and they
> are replaced by an indication that a given type variable (or perhaps
> an entire generic class) is erased or reifiable, and we treat erased
> type variables as if they have an implicit `Nullable` bound. Then the
> compile-time null-assignment restriction reduces to "does `T` have a
> `Nullable` bound", and the restriction against instantiating an erased
> generic class with a Q-type reduces to a simple bounds violation.
>
> (There's lots more to cover on generics -- again, separate document.)
More information about the valhalla-spec-observers
mailing list