From karen.kinnear at oracle.com Wed Jan 2 14:21:01 2019
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Wed, 2 Jan 2019 09:21:01 -0500
Subject: Next Valhalla EG meeting January 16, 2019
In-Reply-To:
References:
Message-ID:

Happy New Year!

Correction to my previous email - January 16, 2019 is our next Valhalla EG meeting

thanks,
Karen

From forax at univ-mlv.fr Sat Jan 5 16:41:52 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Sat, 5 Jan 2019 17:41:52 +0100 (CET)
Subject: val and box
Message-ID: <2089888137.393043.1546706512867.JavaMail.zimbra@u-pem.fr>

Hi,
from the language perspective, one can create a value type using a "value class" VT or a nullable value type NVT with the annotation @ValueBased.

The language lets you automatically derive
- a nullable value type VT.box from a value type VT, where VT.val is equivalent to VT.
- a non-nullable value type NVT.val from a nullable value type NVT, where NVT.box is equivalent to NVT.

The problem with this scheme is that it creates aliasing of types: VT and VT.val (resp. NVT and NVT.box) are the same type for the compiler, and as usual, if you have a language that supports type aliasing, you have trouble with the error messages (basically you need to keep track of the type given by the user and substitute it for the type used by the compiler when creating an error message).

I think it's far simpler to not allow NVT.box and VT.val, at least until we talk about primitive types and their current val/box.

Rémi

From brian.goetz at oracle.com Mon Jan 7 17:21:26 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 7 Jan 2019 12:21:26 -0500
Subject: Finding the spirit of L-World
Message-ID:

I've been processing the discussions at the Burlington meeting. While I think we made a lot of progress, I think we fell into a few wishful-thinking traps with regard to the object model that we are exposing to users. What follows is what I think is the natural conclusion of the L-World design --
which is a model I think users can love, but requires us to go a little farther in what the VM does to support it.

# Finding the Spirit of L-world

L-World is, at heart, an attempt to unify reference objects and values; they're unified under a common top type (`Object`), a common basic type descriptor (`L`), and a common set of bytecodes (`aload` et al.) The war cry for L-World should be, therefore, "Everything is an Object". And users will be thrilled to see such a unification -- assuming we can live up to the high expectations that such a promise sets.

By unifying references and values under a common type descriptor and supertype, we gain significant benefits for _migration_ -- that migrating a reference class to a value class does not break the ways existing code refers to it.

By unifying under a common set of bytecodes, we gain significant benefits for _specialization_; the method body output by the compiler can apply equally to reference and value parameterizations, and all specialization can be applied on the constant pool only.

If our war cry is "Everything is an Object", we need to ask ourselves what behaviors users should reasonably expect of all objects -- and ensure that values and references alike conform to those behaviors.

## Object model

In Q-world, we struggled with the fact that there was no true top type, but most code was written as if `Object` were the top type. This was trying to square a circle; the options for introducing a new top type in Q-world were not good (an `Any` superclass provided the desired unification but a woefully confusing cost model; an `Objectible` interface shared between `Object` and values would set off a snail stampede to migrate libraries to use `Objectible` as the new fake top), but having multiple roots would have further exacerbated the pain of the existing bipartite type system.

L-world offers us an out; it makes `Object` a true top type (save for primitives -- but see "Poxing", below), so existing code that deals with `Object` can immediately accept values (save for totality -- but see "Totality", below) without requiring disruptive migration.

A sensible rationalization of the object model for L-World would be to have special subclasses of `Object` for references and values:

```
class Object { ... }
class RefObject extends Object { ... }
class ValObject extends Object { ... }
```

We would enforce that `RefObject` is only extended by classes that do not have the `ACC_VALUE` bit, that `ValObject` is only extended by classes that do have the `ACC_VALUE` bit, and that classes that claim to extend `Object` are implicitly reparented according to their `ACC_VALUE` bit. (Actually, in this scheme, we can ditch the `ACC_VALUE` bit entirely; at load time, we just look at the superclass, and if it's `ValObject`, it's a value class, otherwise it's a reference class.)

Bringing ref-ness and val-ness into the type system in this way has many benefits:

- It reinforces the user's understanding of the relationship between values and references.
- It allows us to declare methods or fields that accept any object, reference objects only, or value objects only, using existing concepts.
- It provides a place to declare ref-specific or val-specific methods, and ref-specific or value-specific implementations of `Object` methods. (For example, we could implement `Object::wait` as a final throwing method in `ValObject`, if that's the behavior we want).
- It allows us to express ref-ness or val-ness as generic type bounds, as in ``.

We can pull the same move with nullability, by declaring an interface `Nullable`:

```
interface Nullable { }
```

which is implemented by `RefObject`, and, if we support value classes being declared as nullable, would be implemented by those value classes as well.
Again, this allows us to use `Nullable` as a parameter type or field type, or as a type bound (``).

## Totality

The biggest pain point in the LW1 model is that we're saying that everything is an `Object`, but we've had to distort the rules of `Object` operations in ways that users might find confusing. LW1 says that equality comparison, identity hash code, locking, and `Object::wait` are effectively partial, but existing code that deals in `Object` may be surprised to find this out. Additionally, arrays of reference objects are covariant with `Object`, but arrays of value objects are currently not.

#### Equality

The biggest and most important challenge is assigning sensible total semantics to equality on `Object`; the LW1 equality semantics are sound, but not intuitive. There's no way we can explain why for values, you don't get `v == v` in a way that people will say "oh, that makes sense." If everything is an object, `==` should be a reasonable equality relation on objects. This leads us to a somewhat painful shift in the semantics of equality, but once we accept that pain, I think things look a lot better.

Users will expect (100% reasonably) the following to work:

```
Point p1, p2;

p1 == p1 // true

p2 = p1
p1 == p2 // true

Object o1 = p1, o2 = p2;

o1 == o1 // true
o1 == o2 // true
```

In LW1, if we map `==` to `ACMP`, they do not, and this will violate both user intuition and the spirit of "everything is an object". (If everything is an object, then when we assign `o1 = p1`, this is just a widening conversion, not a boxing conversion -- it's the same underlying object, just with a new static type, so it should behave the same.)

The crux of the matter is that interfaces, and `Object` (which for purposes of this document should be considered an honorary interface) can hold either a reference or a value, but we've not yet upgraded our notion of interfaces to reflect this kind-polymorphism.
This is what we have to put on a sounder footing in order to not have users fall into the chasm of anomalies. To start with:

- A class is either a ref class or a value class.
- `C implements I` means that instances of `C` are instances of `I`.
- Interfaces are polymorphic over value and ref classes.

Now we need to define equality. The terminology is messy, as so many of the terms we might want to use (object, value, instance) already have associations. For now, we'll describe a _substitutability_ predicate on two instances:

- Two refs are substitutable if they refer to the same object identity.
- Two primitives are substitutable if they are `==` (modulo special pleading for `NaN` -- see `Float::equals` and `Double::equals`).
- Two values `a` and `b` are substitutable if they are of the same type, and for each of the fields `f` of that type, `a.f` and `b.f` are substitutable.

We then say that for any two objects, `a == b` iff `a` and `b` are substitutable.

This is an "everything is an object" story that users can love! Everything is an object, equality is total and intuitive on objects, interfaces play nicely -- and there are no pesky boxes (except for primitives, but see below.) The new concept here is that interfaces abstract over refs and values, and therefore operations that we want to be total on interfaces -- like equality -- have to take this seam into account.

The costs come in two lumps. The first is that if we're comparing two objects, we first have to determine whether they are refs or values, and do something different for each. We already paid this cost in LW1, but here comes the bigger cost: if a value class has fields whose static types are interfaces, the comparison may have to recur on substitutability. This is horrifying for a VM engineer, but for users, this is just a day at the office -- `equals` comparisons routinely recur.
(For values known to (recursively) have no interface fields and no floating point fields, the VM can optimize comparison to a flat bitwise comparison.)

This model eliminates the equality anomalies, and provides users with an intuitive and sound basis for "same instance".

One might ask whether we really need to push this into `acmp`, or whether we can leave `acmp` alone and provide a new API point for substitutability, and have the compiler generate invocations of that. While the latter is OK for new code, doing so would cause old code to behave differently than new when operating on values (or interfaces that may hold values), and may cause it to change its behavior on recompile. If we're changing what `Object` means, and what `aload` can operate on, we should update `acmp` accordingly.

#### `==` and `equals()`

Code that knows what type it is dealing with generally uses either `==` or `equals()`, but not both; generic code (such as `HashMap`) generally uses the idiom `a == b || a.equals(b)`. Such code _could_ fall back to just using `equals()`; this idiom arose as an optimization to avoid the virtual method invocation, but the first part can be dropped with no semantic loss.

As the cost of `==` gets higher, this optimization (as optimizations often do!) may begin to bite back; the `equals()` implementation often includes an `==` check as well. There are lots of things we can do here, but it is probably best to wait to see what the actual performance impact is before doing anything.

#### Identity hash code

Because values have no identity, in LW1 `System::identityHashCode` throws `UnsupportedOperationException`. However, this is unnecessarily harsh; for values, `identityHashCode` could simply return `hashCode`. This would enable classes like `IdentityHashMap` (used by serialization frameworks) to accept values without modification, with reasonable semantics -- two objects would be deemed the same if they are `==`.
(For serialization, this means that equal values would be interned in the stream, which is probably what is wanted.)

#### Locking

Locking is a difficult one. On the one hand, it's bad form to lock on an object that hasn't explicitly invited you to participate in its locking protocol. On the other hand, there is likely code out there that does things like lock on client objects, which might expect at least exclusion with other code that locks the same object, and a _happens-before_ edge between the release and the acquire. Having locking all of a sudden throw `IllegalMonitorStateException` would break such code; while we may secretly root for such code to be broken, the reality is that such code is likely at the heart of large legacy systems that are difficult to modify. So we may well be forced into totalizing locking in some way. (Totalizing locking also means totalizing the `Object` methods related to locking, `wait`, `notify`, and `notifyAll`.)

There is a spectrum of interpretations for totalizing locking, each with different tradeoffs:

- Treat locking on a value as an entirely local operation, providing no exclusion and no happens-before edge. Existing code will continue to run when provided with values, but may produce unexpected results.
- Alternately, treat locking on a value as providing no exclusion, but with acquire and release semantics. Wait and notify would still throw.
- Treat locking on a value as acquiring a fat lock (say, a global value lock, a per-type value lock, etc.) This gives us exclusion and visibility, with a small risk of deadlock in situations where multiple such locks are held, and a sensible semantics for wait and notify (single notify would have to be promoted to `notifyAll`).
- Treat locking on a value as acquiring a proxy lock which is inflated by the runtime, which assigns a unique lock to each distinguishable value.
- Put lock-related methods on `ValObject`, whose defaults do one of the above, and allow implementations to override them.

While nearly all of these options are horrifying, the goal here is not to do something _good_, but merely to do something _good enough_ to avoid crushing legacy code.

#### Array covariance

Currently, for any class `C`, `C[] <: Object[]`. This makes `Object[]` the "top array type". If everything is an object, then an array of anything should also be an array of `Object`.

There are two paths to delivering on this vision: extend traditional array covariance to value arrays (potentially making `aaload` sites megamorphic), or move in the direction of "Arrays 2.0" and define a specializable generic type `Array` where the legacy arrays implement `Array`, and require clients to migrate from `T[]` to `Array` before specializing their generic classes.

## Poxing

The Model 3 specializer focused on specializing generics over primitives, not values (because we hadn't implemented values yet). Many of the complexities we ran into in that exploration stemmed from the accidental asymmetries between primitives and objects, including irregularities in the bytecode set (single vs double slot, `if_icmpeq` vs `dcmp` + `if`). Having unified references and values, it would be really nice to unify primitives as well.

While we can't exactly do that easily, because of the intrusion to the bytecode set, we may be able to come close, using a modified boxing conversion.

The problem with the existing boxing conversion is that `Integer` is a heavy box with identity -- which means boxing is expensive.
There are two possible paths by which we could mitigate this pain:

- Migrate `Integer` to be a value type;
- Create an alternate box for `int`, which is a value class (`ValInt`)

If we can box primitives to values, then we need not unify primitives with objects -- we just insert boxing conversions in the places we already do, and interpret specializations like `List` to mean "List of int's box".

Migrating `Integer` to be a value may seem the obvious move, but it is fraught with compatibility constraints -- there is tons of legacy code that does things like locking on `Integer` or depending on its strange accidental identity. Perhaps if we could totalize locking and remove the public box constructors, we could get there -- but this is not a slam-dunk.

The alternative is creating a value box for primitives (a "pox") and adjusting the compiler's boxing behavior (when boxing to `Object` or an interface, prefer the pox to the box). This too has some compatibility concerns, such as code that deals in `Object` that assumes that primitives are always boxed to legacy boxes. We may be able to finesse this by a trick -- to teach `instanceof` and `checkcast` of the relationship between boxes and poxes, so that code like:

```
if (o instanceof Integer) {
    Integer i = (Integer) o;
    // use o
}
```

would work on both `Integer` and `int`'s pox (by saying "yes" in `instanceof` and doing the conversion in `checkcast`.) This move, while somewhat risky, could allow us to relegate the legacy boxes to legacy, and eventually deprecate them. (We could then have methods and interfaces on the poxes, and lift them to the primitives via poxing, so that `int` could be seen to implement `Comparable` and you could call `compareTo()` on ints.) While this would not be a true unification, it would come much closer than we are now.

Clearly, both alternatives are risky and require more investigation -- but both have promising payoffs.
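
The box/pox relationship described above can be approximated in today's Java, with heavy caveats. In the sketch below, the record `ValInt` is only a stand-in for `int`'s pox: a record is an ordinary identity class, not a value class, and the helpers `isIntegerLike` and `asInteger` merely imitate in library code what the text proposes `instanceof` and `checkcast` would do inside the VM. All of these names are assumptions made for illustration; none of them is a JDK or Valhalla API.

```java
// Hypothetical sketch: PoxSketch, ValInt, isIntegerLike and asInteger are
// illustrative names, not JDK or Valhalla APIs. A record only approximates a
// pox's state-based equals(); it still has identity under ==.
final class PoxSketch {

    // Stand-in for int's value box ("pox"); implements Comparable on behalf of int.
    record ValInt(int value) implements Comparable<ValInt> {
        @Override
        public int compareTo(ValInt other) {
            return Integer.compare(value, other.value);
        }
    }

    // Imitates `o instanceof Integer` answering "yes" for both the box and the pox.
    static boolean isIntegerLike(Object o) {
        return o instanceof Integer || o instanceof ValInt;
    }

    // Imitates `(Integer) o` performing the pox-to-box conversion.
    static Integer asInteger(Object o) {
        if (o instanceof Integer i) {
            return i;
        }
        if (o instanceof ValInt v) {
            return Integer.valueOf(v.value());
        }
        throw new ClassCastException(o + " is neither Integer nor ValInt");
    }

    // Legacy-style client code that only knows about Integer.
    static int legacyDouble(Object o) {
        if (isIntegerLike(o)) {
            Integer i = asInteger(o);
            return i * 2;
        }
        throw new IllegalArgumentException("not an int box");
    }

    public static void main(String[] args) {
        System.out.println(legacyDouble(Integer.valueOf(21)));        // legacy box
        System.out.println(legacyDouble(new ValInt(21)));             // pox
        System.out.println(new ValInt(1).compareTo(new ValInt(2)));   // compareTo via the pox
        System.out.println(new ValInt(7).equals(new ValInt(7)));      // state-based equals
    }
}
```

The point of the sketch is only that client code written against `Integer` keeps working when handed the other representation; whether the real conversion could be made cheap and invisible is exactly the open question raised above.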
## Migration

In both Q-world and L-world, we took care to ensure that for a value class `C`, the descriptor `LC;` describes a subtype of `Object`. This is a key part of the story for migrating reference types to values, since clients of `C` will describe it with `LC;` and we don't want to require a flag day on migration. In Q-world, `LC;` is the (nullable) box for `C`; in L-world, it is a nullable `C`. This is enough that we can migrate a value-based class to a value and _existing binary clients_ will not break, even if they stuff a null into an `LC;`. However, there are other migration compatibility concerns which we need to take up (which I'll do in a separate document.)

## Generics

In Q-world, because values and references were so different, specializable generic classes had to be compiled with additional constraints. For a specializable type variable `T`, we enforced:

- Cannot compare a `T` to `null`
- Cannot assign `null` to a `T`
- Cannot assign a `T` to `Object`
- Cannot assign a `T[]` to `Object[]`
- Cannot lock on a `T`
- Cannot `==` on a `T`

In L-world, the need for most of these can go away. Because everything is an object, we can assign values to `Object`, and `acmp_null` should work on all objects, so comparing with `null` is OK. If we have array covariance, the array assignment restriction goes away. If we totalize locking and equality, those restrictions go away. The only restriction that remains is the assignment to `null`. But now the VM can express the difference between nullable values and non-nullable values, and we can express this in the source type system with `Nullable`.

So all the Q-world restrictions go away, and they are replaced by an indication that a given type variable (or perhaps an entire generic class) is erased or reifiable, and we treat erased type variables as if they have an implicit `Nullable` bound.
Then the compile-time null-assignment restriction reduces to "does `T` have a `Nullable` bound", and the restriction against instantiating an erased generic class with a Q-type reduces to a simple bounds violation.

(There's lots more to cover on generics -- again, separate document.)

From forax at univ-mlv.fr Tue Jan 8 08:38:52 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Tue, 8 Jan 2019 09:38:52 +0100 (CET)
Subject: Finding the spirit of L-World
In-Reply-To:
References:
Message-ID: <782061282.336996.1546936732842.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Brian Goetz"
> To: "valhalla-spec-experts"
> Sent: Monday, January 7, 2019 18:21:26
> Subject: Finding the spirit of L-World

> I've been processing the discussions at the Burlington meeting. While I think
> we made a lot of progress, I think we fell into a few wishful-thinking traps
> with regard to the object model that we are exposing to users. What follows is
> what I think is the natural conclusion of the L-World design -- which is a model
> I think users can love, but requires us to go a little farther in what the VM
> does to support it.
>
> # Finding the Spirit of L-world
>
> L-World is, at heart, an attempt to unify reference objects and
> values; they're unified under a common top type (`Object`), a common
> basic type descriptor (`L`), and a common set of bytecodes (`aload` et
> al.) The war cry for L-World should be, therefore, "Everything is an
> Object". And users will be thrilled to see such a unification --
> assuming we can live up to the high expectations that such a promise
> sets.
>
> By unifying references and values under a common type descriptor and
> supertype, we gain significant benefits for _migration_ -- that
> migrating a reference class to a value class does not break the ways
> existing code refers to it.

No more than using ACC_VALUE does (see below).

>
> By unifying under a common set of bytecodes, we gain significant
> benefits for _specialization_; the method body output by the compiler
> can apply equally to reference and value parameterizations, and all
> specialization can be applied on the constant pool only.
>
> If our war cry is "Everything is an Object", we need to ask ourselves
> what behaviors users should reasonably expect of all objects -- and
> ensure that values and references alike conform to those behaviors.

yes

>
> ## Object model
>
> In Q-world, we struggled with the fact that there was no true top
> type, but most code was written as if `Object` were the top type. This
> was trying to square a circle; the options for introducing a new top
> type in Q-world were not good (an `Any` superclass provided the
> desired unification but a woefully confusing cost model; an
> `Objectible` interface shared between `Object` and values would set
> off a snail stampede to migrate libraries to use `Objectible` as the
> new fake top), but having multiple roots would have further
> exacerbated the pain of the existing bipartite type system.
>
> L-world offers us an out; it makes `Object` a true top type (save for
> primitives -- but see "Poxing", below), so existing code that deals
> with `Object` can immediately accept values (save for totality -- but
> see "Totality", below) without requiring disruptive migration.
>
> A sensible rationalization of the object model for L-World would be to
> have special subclasses of `Object` for references and values:
>
> ```
> class Object { ... }
> class RefObject extends Object { ... }
> class ValObject extends Object { ... }
> ```
>
> We would enforce that `RefObject` is only extended by classes that do
> not have the `ACC_VALUE` bit, that `ValObject` is only extended by
> classes that do have the `ACC_VALUE` bit, and that classes that claim
> to extend `Object` are implicitly reparented according to their
> `ACC_VALUE` bit.
> (Actually, in this scheme, we can ditch the
> `ACC_VALUE` bit entirely; at load time, we just look at the
> superclass, and if it's `ValObject`, it's a value class, otherwise
> it's a reference class.)

Introducing ValObject as a superclass and having the ACC_VALUE bit are two representations of the same thing for the VM. In both cases, we need to load the class to know whether it's a value type, and in Java, class loading is delayed, which works well for reference types but not for value types: if a class has a field that is a value type, you need to know whether the field is a value type to be able to flatten it. With your proposal, the VM doesn't know whether a field contains a value type until it's too late. Or are you suggesting shape-shifting objects (Skrull objects)?

>
> Bringing ref-ness and val-ness into the type system in this way has
> many benefits:
>
> - It reinforces the user's understanding of the relationship between
>   values and references.
> - It allows us to declare methods or fields that accept any object,
>   reference objects only, or value objects only, using existing
>   concepts.
> - It provides a place to declare ref-specific or val-specific methods,
>   and ref-specific or value-specific implementations of `Object`
>   methods. (For example, we could implement `Object::wait` as a final
>   throwing method in `ValObject`, if that's the behavior we want).
> - It allows us to express ref-ness or val-ness as generic type
>   bounds, as in ``.
>
> We can pull the same move with nullability, by declaring an interface
> `Nullable`:
>
> ```
> interface Nullable { }
> ```
>
> which is implemented by `RefObject`, and, if we support value classes
> being declared as nullable, would be implemented by those value
> classes as well. Again, this allows us to use `Nullable` as a
> parameter type or field type, or as a type bound (` Nullable>`).

Same issue here: you want to know whether something is nullable when you verify the bytecode, but at that point the class may not be loaded, so you don't know whether the class implements Nullable.

>
> ## Totality
>
> The biggest pain point in the LW1 model is that we're saying that
> everything is an `Object`, but we've had to distort the rules of
> `Object` operations in ways that users might find confusing. LW1 says
> that equality comparison, identity hash code, locking, and
> `Object::wait` are effectively partial, but existing code that deals
> in `Object` may be surprised to find this out. Additionally, arrays
> of reference objects are covariant with `Object`, but arrays of value
> objects are currently not.

On the last point, we have decided for now to try not to allow an array of values to be seen as an array of Object, but I think it's a mistake because, as you said, it makes the code irregular.

>
> #### Equality
>
> The biggest and most important challenge is assigning sensible total
> semantics to equality on `Object`; the LW1 equality semantics are
> sound, but not intuitive. There's no way we can explain why for
> values, you don't get `v == v` in a way that people will say "oh, that
> makes sense." If everything is an object, `==` should be a reasonable
> equality relation on objects. This leads us to a somewhat painful
> shift in the semantics of equality, but once we accept that pain, I
> think things look a lot better.
>
> Users will expect (100% reasonably) the following to work:
>
> ```
> Point p1, p2;
>
> p1 == p1 // true
>
> p2 = p1
> p1 == p2 // true
>
> Object o1 = p1, o2 = p2;
>
> o1 == o1 // true
> o1 == o2 // true
> ```
>
> In LW1, if we map `==` to `ACMP`, they do not, and this will violate
> both user intuition and the spirit of "everything is an object".
> (If everything is an object, then when we assign `o1 = p1`, this is just a
> widening conversion, not a boxing conversion -- it's the same
> underlying object, just with a new static type, so it should behave
> the same.)
>
> The crux of the matter is that interfaces, and `Object` (which for
> purposes of this document should be considered an honorary interface)
> can hold either a reference or a value, but we've not yet upgraded our
> notion of interfaces to reflect this kind-polymorphism. This is what
> we have to put on a sounder footing in order to not have users fall
> into the chasm of anomalies. To start with:
>
> - A class is either a ref class or a value class.
> - `C implements I` means that instances of `C` are instances of `I`.
> - Interfaces are polymorphic over value and ref classes.
>
> Now we need to define equality. The terminology is messy, as so many
> of the terms we might want to use (object, value, instance) already
> have associations. For now, we'll describe a _substitutability_
> predicate on two instances:
>
> - Two refs are substitutable if they refer to the same object
>   identity.
> - Two primitives are substitutable if they are `==` (modulo special
>   pleading for `NaN` -- see `Float::equals` and `Double::equals`).
> - Two values `a` and `b` are substitutable if they are of the same
>   type, and for each of the fields `f` of that type, `a.f` and `b.f`
>   are substitutable.
>
> We then say that for any two objects, `a == b` iff a and b are
> substitutable.
>
> This is an "everything is an object" story that users can love!
> Everything is an object, equality is total and intuitive on objects,
> interfaces play nicely -- and there are no pesky boxes (except for
> primitives, but see below.) The new concept here is that interfaces
> abstract over refs and values, and therefore operations that we want
> to be total on interfaces -- like equality -- have to take this seam
> into account.
>
> The costs come in two lumps.
> The first is that if we're comparing two
> objects, we first have to determine whether they are refs or values,
> and do something different for each. We already paid this cost in
> LW1, but here comes the bigger cost: if a value class has fields
> whose static types are interfaces, the comparison may have to recur on
> substitutability. This is horrifying for a VM engineer, but for users,
> this is just a day at the office -- `equals` comparisons routinely
> recur. (For values known to (recursively) have no interface fields
> and no floating point fields, the VM can optimize comparison to a flat
> bitwise comparison.)
>
> This model eliminates the equality anomalies, and provides users with
> an intuitive and sound basis for "same instance".

Currently, you cannot use == on value types, i.e. point1 == point2 doesn't compile; if you want a unified equality, you have to use equals. Users are already used to this odd semantics because it is the semantics of the wrapper classes, java.lang.Integer, etc. We don't have to retrofit == in the LIFE pattern (`a == b || a.equals(b)`) because even if == returns false, there is an equals call right after the == test.

I still prefer to stick with the simple concept that == tests whether the addresses are equal, and is false for a non-nullable value type, rather than trying to elevate == to a kind of equals that is not exactly equals, because that introduces another kind of "equality" (substitutability). As you said, it is an extension of primitive equality, but for me it's just another new semantics.

>
> One might ask whether we really need to push this into `acmp`, or
> whether we can leave `acmp` alone and provide a new API point for
> substitutability, and have the compiler generate invocations of that.
> While the latter is OK for new code, doing so would cause old code to
> behave differently than new when operating on values (or interfaces
> that may hold values), and may cause it to change its behavior on
> recompile.
> If we're changing what `Object` means, and what `aload`
> can operate on, we should update `acmp` accordingly.
>
> #### `==` and `equals()`
>
> Code that knows what type it is dealing with generally uses either
> `==` or `equals()`, but not both; generic code (such as `HashMap`)
> generally uses the idiom `a == b || a.equals(b)`. Such code _could_
> fall back to just using `equals()`; this idiom arose as an
> optimization to avoid the virtual method invocation, but the first
> part can be dropped with no semantic loss.
>
> As the cost of `==` gets higher, this optimization (as optimizations
> often do!) may begin to bite back; the `equals()` implementation often
> includes an `==` check as well. There are lots of things we can do
> here, but it is probably best to wait to see what the actual
> performance impact is before doing anything.
>
> #### Identity hash code
>
> Because values have no identity, in LW1 `System::identityHashCode`
> throws `UnsupportedOperationException`. However, this is
> unnecessarily harsh; for values, `identityHashCode` could simply
> return `hashCode`. This would enable classes like `IdentityHashMap`
> (used by serialization frameworks) to accept values without
> modification, with reasonable semantics -- two objects would be deemed
> the same if they are `==`. (For serialization, this means that equal
> values would be interned in the stream, which is probably what is
> wanted.)

It's not a reasonable semantics: if you serialize to JSON, for example, you don't want to share value types as objects, you want to flatten them. Serialization protocols will need to be updated to work with value types, so throwing UnsupportedOperationException tells users that they have to use an updated version of their serialization library.

>
> #### Locking
>
> Locking is a difficult one. On the one hand, it's bad form to lock on
> an object that hasn't explicitly invited you to participate in its
> locking protocol.
On the other hand, there is likely code out there
> that does things like lock on client objects, which might expect at
> least exclusion with other code that locks the same object, and a
> _happens-before_ edge between the release and the acquire. Having
> locking all of a sudden throw `IllegalMonitorStateException` would
> break such code; while we may secretly root for such code to be
> broken, the reality is that such code is likely at the heart of large
> legacy systems that are difficult to modify. So we may well be forced
> into totalizing locking in some way. (Totalizing locking also means
> totalizing the `Object` methods related to locking, `wait`, `notify`,
> and `notifyAll`.)

Same issue as above: whatever clever emulation you come up with, you have less knowledge than the maintainers of the code, so it's better to ask them to fix the issue than to come up with a trick that will never fully work. In terms of compatibility, it's far easier to say "my library doesn't work with value types" than "my library maybe works with value types".

>
> There is a spectrum of interpretations for totalizing locking, each
> with different tradeoffs:
>
> - Treat locking on a value as an entirely local operation, providing
>   no exclusion and no happens-before edge. Existing code will
>   continue to run when provided with values, but may produce
>   unexpected results.
> - Alternately, treat locking on a value as providing no exclusion,
>   but with acquire and release semantics. Wait and notify would
>   still throw.
> - Treat locking on a value as acquiring a fat lock (say, a global
>   value lock, a per-type value lock, etc.) This gives us exclusion
>   and visibility, with a small risk of deadlock in situations where
>   multiple such locks are held, and a sensible semantics for wait
>   and notify (single notify would have to be promoted to `notifyAll`).
> - Treat locking on a value as acquiring a proxy lock which is
>   inflated by the runtime, which assigns a unique lock to each
>   distinguishable value.
> - Put lock-related methods on `ValObject`, whose defaults do one of
>   the above, and allow implementations to override them.
>
> While nearly all of these options are horrifying, the goal here is
> not to do something _good_, but merely to do something _good enough_
> to avoid crushing legacy code.

It's engineering 101: there are two kinds of good enough, the kind you want in an application, because you control the data, and the kind you don't want in a library, because you don't control the data.

>
> #### Array covariance
>
> Currently, for any class `C`, `C[] <: Object[]`. This makes
> `Object[]` the "top array type". If everything is an object, then an
> array of anything should also be an array of `Object`.
>
> There are two paths to delivering on this vision: extend traditional
> array covariance to value arrays (potentially making `aaload` sites
> megamorphic), or moving in the direction of "Arrays 2.0" and define a
> specializable generic type `Array<T>` where the legacy arrays
> implement `Array<T>`, and require clients to migrate from `T[]` to
> `Array<T>` before specializing their generic classes.
>
> ## Poxing
>
> The Model 3 specializer focused on specializing generics over
> primitives, not values (because we hadn't implemented values yet).
> Many of the complexities we ran into in that exploration stemmed from
> the accidental asymmetries between primitives and objects, including
> irregularities in the bytecode set (single vs double slot, `if_icmpeq`
> vs `dcmp` + `if`). Having unified references and values, it would be
> really nice to unify primitives as well.
>
> While we can't exactly do that easily, because of the intrusion to
> the bytecode set, we may be able to come close, using a modified
> boxing conversion.
The problem with the existing boxing conversion is
> that `Integer` is a heavy box with identity -- which means boxing is
> expensive. There are two possible paths by which we could mitigate
> this pain:
>
> - Migrate `Integer` to be a value type;
> - Create an alternate box for `int`, which is a value class (`ValInt`)
>
> If we can box primitives to values, then we need not unify primitives
> with objects -- we just insert boxing conversions in the places we
> already do, and interpret specializations like `List<int>` to mean
> "List of int's box".
>
> Migrating `Integer` to be a value may seem the obvious move, but it is
> fraught with compatibility constraints -- there is tons of legacy code
> that does things like locking on `Integer` or depending on its
> strange accidental identity. Perhaps if we could totalize locking and
> remove the public box constructors, we could get there -- but this
> is not a slam-dunk.

No good-enough solution here?

>
> The alternative is creating a value box for primitives (a "pox") and
> adjust the compiler's boxing behavior (when boxing to `Object` or an
> interface, prefer the pox to the box). This too has some
> compatibility concerns, such as code that deals in `Object` that
> assumes that primitives are always boxed to legacy boxes. We may be
> able to finesse this by a trick -- to teach `instanceof` and
> `checkcast` of the relationship between boxes and poxes, so that code
> like:
>
> ```
> if (o instanceof Integer) {
>     Integer i = (Integer) o;
>     // use o
> }
> ```
>
> would work on both `Integer` and `int`'s pox (by saying "yes" in
> `instanceof` and doing the conversion in `checkcast`.) This move,
> while somewhat risky, could allow us to relegate the legacy boxes to
> legacy, and eventually deprecate them. (We could then have methods
> and interfaces on the poxes, and lift them to the primitives via
> poxing, so that `int` could be seen to implement `Comparable<int>` and
> you could call `compareTo()` on ints.)
While this would not be a true
> unification, it would come much closer than we are now.
>
> Clearly, both alternatives are risky and require more investigation
> -- but both have promising payoffs.

It's something I've been contemplating for some time now: make int.box a generic specialization of java.lang.Integer, so that the instanceof/cast trick will work.

>
> ## Migration
>
> In both Q-world and L-world, we took care to ensure that for a value
> class `C`, the descriptor `LC;` describes a subtype of `Object`. This
> is a key part of the story for migrating reference types to values,
> since clients of `C` will describe it with `LC;` and we don't want to
> require a flag day on migration. In Q-world, `LC;` is the (nullable)
> box for `C`; in L-world, it is a nullable `C`.
>
> This is enough that we can migrate a value-based class to a value and
> _existing binary clients_ will not break, even if they stuff a null
> into an `LC;`. However, there are other migration compatibility
> concerns which we need to take up (which I'll do in a separate
> document.)
>
> ## Generics
>
> In Q-world, because values and references were so different,
> specializable generic classes had to be compiled with additional
> constraints. For a specializable type variable `T`, we enforced:
>
> - Cannot compare a `T` to `null`
> - Cannot assign `null` to a `T`
> - Cannot assign a `T` to `Object`
> - Cannot assign a `T[]` to `Object[]`
> - Cannot lock on a `T`
> - Cannot `==` on a `T`
>
> In L-world, the need for most of these can go away. Because
> everything is an object, we can assign values to `Object`, and
> `acmp_null` should work on all objects, so comparing with `null` is
> OK. If we have array covariance, the array assignment restriction
> goes away. If we totalize locking and equality, those restrictions go
> away. The only restriction that remains is the assignment to `null`.
> But now the VM can express the difference between nullable values and
> non-nullable values, and we can express this in the source type system
> with `Nullable`. So all the Q-world restrictions go away, and they
> are replaced by an indication that a given type variable (or perhaps
> an entire generic class) is erased or reifiable, and we treat erased
> type variables as if they have an implicit `Nullable` bound. Then the
> compile-time null-assignment restriction reduces to "does `T` have a
> `Nullable` bound", and the restriction against instantiating an erased
> generic class with a Q-type reduces to a simple bounds violation.
>
> (There's lots more to cover on generics -- again, separate document.)

Rémi

From forax at univ-mlv.fr Tue Jan 8 08:55:40 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Tue, 8 Jan 2019 09:55:40 +0100 (CET)
Subject: Finding the spirit of L-World
In-Reply-To:
References:
Message-ID: <1099383875.343962.1546937740164.JavaMail.zimbra@u-pem.fr>

But I agree that having all these Q-refs in the bytecode will not help the introduction of generics.

I think that the way to implement the reification of generics is:
- do the specialization of the data shape by asking a bootstrap method, with the restriction that the data shape has to be covariant with the generic description (so the specialization of an L-type can be a Q-type)
- pass the type arguments out of band, so a call site can call with or without the type arguments
- do the specialization of the bytecode at JIT time, because at that time you have the type arguments and their usages, and you avoid the bytecode explosion of a C++-like templating mechanism.

So Brian, I agree with you that the way to describe a generic class is to use only L-type descriptors and aload/astore bytecodes, but that doesn't mean that at runtime, when we specialize a generic class, a field typed as an L-type cannot in fact be a Q-type (or an array of L-type cannot in fact be an array of Q-type).
Rémi

From brian.goetz at oracle.com Tue Jan 8 13:03:18 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 8 Jan 2019 08:03:18 -0500
Subject: Finding the spirit of L-World
In-Reply-To: <782061282.336996.1546936732842.JavaMail.zimbra@u-pem.fr>
References: <782061282.336996.1546936732842.JavaMail.zimbra@u-pem.fr>
Message-ID: <19542dce-ab62-0bd3-4e87-22fc1d7e15a6@oracle.com>

> Introducing a ValObject as super or having the ACC_VALUE bit are two representations of the same thing for the VM.
> In both cases, we need to load the class to know if it's a value type or not, and in Java, class loading is delayed, which works well for reference types but not well for value types: if you have a class that has a field which is a value type, you need to know if the field is a value type or not to be able to flatten it. With your proposal, the VM doesn't know if a field contains a value type or not until it's too late. Or are you suggesting to have shape-shifting objects (Skrull objects)?

No. As you say, from the VM perspective, the two are equivalent, as long as RefObject and ValObject are loaded super-early (which of course they can be). To know whether to flatten a field is an orthogonal question. We explored an ACC_FLATTENABLE bit, and in BUR we settled on "flatten Qs, don't flatten Ls" -- but we could change again. But that is completely separate from how the class is declared.

>
> same issue here, you want to know if something is nullable or not when you verify the bytecode, but at that point the class may not be loaded so you don't know if the class implements Nullable or not.

Again, you're talking at a different layer. At the VM level, we still use L/Q to describe nullability of _instances_. Putting Nullable in the type system lets the _language_ apply it to _types_, as in a type bound: <T extends Nullable>. Different things.

> Currently, you can not do == on value types, i.e. point1 == point2 doesn't compile, if you want a unified equality, you have to use equals.

Right. And I'm saying, we can't sell that. Values should work like an int; you can compare ints with ==. I think the "Currently" story doesn't wash.
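
For readers following the `==` debate, here is a rough sketch of the substitutability predicate under discussion, emulated in today's Java. It is not how the VM or `acmp` would implement it. Ordinary final classes stand in for value classes, and the `Val` interface, its `fields()` method, and the `Point` and `Holder` classes are hypothetical names invented for this illustration. One known limitation: a field that really holds an `Integer` reference is treated here like a primitive.

```java
// Sketch only: ordinary final classes stand in for value classes, and the
// Val interface and its fields() method are hypothetical helpers.
final class SubstitutabilitySketch {

    /** Hypothetical marker for classes we pretend are value classes. */
    interface Val {
        /** Field values in declaration order; primitives arrive boxed. */
        Object[] fields();
    }

    static final class Point implements Val {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        public Object[] fields() { return new Object[] { x, y }; }
    }

    /** A "value" with a field whose static type is Object. */
    static final class Holder implements Val {
        final Object payload;
        Holder(Object payload) { this.payload = payload; }
        public Object[] fields() { return new Object[] { payload }; }
    }

    static boolean isBoxedPrimitive(Object o) {
        return o instanceof Integer || o instanceof Long || o instanceof Short
            || o instanceof Byte || o instanceof Character || o instanceof Boolean
            || o instanceof Float || o instanceof Double;
    }

    static boolean substitutable(Object a, Object b) {
        if (a instanceof Val && b instanceof Val) {
            // Values: same class, and every field pairwise substitutable.
            if (a.getClass() != b.getClass()) return false;
            Object[] fa = ((Val) a).fields();
            Object[] fb = ((Val) b).fields();
            for (int i = 0; i < fa.length; i++) {
                if (!substitutable(fa[i], fb[i])) return false;
            }
            return true;
        }
        if (isBoxedPrimitive(a) && isBoxedPrimitive(b)) {
            // Stand-in for primitive ==; equals() gives the Float::equals and
            // Double::equals treatment of NaN.
            return a.equals(b);
        }
        // References (and null): plain identity comparison.
        return a == b;
    }
}
```

In this sketch two separately constructed `Point(1, 2)` instances are substitutable, while two `Holder`s are substitutable only when they hold the same payload reference, which is where the recursion through `Object`-typed fields comes from.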
From forax at univ-mlv.fr Tue Jan 8 16:20:06 2019
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Tue, 8 Jan 2019 17:20:06 +0100 (CET)
Subject: Finding the spirit of L-World
In-Reply-To: <19542dce-ab62-0bd3-4e87-22fc1d7e15a6@oracle.com>
References: <782061282.336996.1546936732842.JavaMail.zimbra@u-pem.fr> <19542dce-ab62-0bd3-4e87-22fc1d7e15a6@oracle.com>
Message-ID: <340341882.521306.1546964406712.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Brian Goetz"
> To: "Remi Forax"
> Cc: "valhalla-spec-experts"
> Sent: Tuesday, January 8, 2019 14:03:18
> Subject: Re: Finding the spirit of L-World

>> Introducing a ValObject as super or having the ACC_VALUE bit are two
>> representations of the same thing for the VM.
>> In both cases, we need to load the class to know if it's a value type or not, and
>> in Java, class loading is delayed, which works well for reference types but
>> not well for value types: if you have a class that has a field which is a value
>> type, you need to know if the field is a value type or not to be able to
>> flatten it. With your proposal, the VM doesn't know if a field contains a value
>> type or not until it's too late. Or are you suggesting to have shape-shifting
>> objects (Skrull objects)?
>
> No. As you say, from the VM perspective, the two are equivalent, as
> long as RefObject and ValObject are loaded super-early (which of course
> they can be). To know whether to flatten a field is an orthogonal
> question. We explored an ACC_FLATTENABLE bit, and in BUR we settled on
> "flatten Qs, don't flatten Ls" -- but we could change again. But that
> is completely separate from how the class is declared.
We changed because you need to know whether a value type is flattenable in three places: fields, array creation and method parameters. ACC_FLATTENABLE only works for fields; you don't need any flag for array creation, because at that time the class of the array component needs to be loaded; and we had no real solution for method signatures.

Anyway, what you are thinking of is more in terms of the language than the VM, so yes, you can have a ValObject.

>
>>
>> same issue here, you want to know if something is nullable or not when you
>> verify the bytecode, but at that point the class may not be loaded so you don't
>> know if the class implements Nullable or not.
>
> Again, you're talking at a different layer. At the VM level, we still
> use L/Q to describe nullability of _instances_. Putting Nullable in the
> type system lets the _language_ apply it to _types_, as in a type bound:
> <T extends Nullable>. Different things.

BTW, I believe it should be otherwise, the interface Nullable will appear in the code when erased.

>
>> Currently, you can not do == on value types, i.e. point1 == point2 doesn't
>> compile, if you want a unified equality, you have to use equals.
>
> Right. And I'm saying, we can't sell that. Values should work like an
> int; you can compare ints with ==. I think the "Currently" story
> doesn't wash.

You cannot use the "work like an int" argument here: a value type can contain references, so it doesn't work like an int.

And what you propose as semantics for == is not the == semantics of a primitive type:
1/ it's an extension of that semantics.
2/ your proposed extension makes it awful (see just below).

  value class IntList {
    private final int value;
    private final Object next;

    IntList(int value, Object next) {
      this.value = value;
      this.next = next;
    }
  }

  Object list = null;
  for (int i = 0; i < 100_000; i++) {
    list = new IntList(i, list);
  }
  list == list // will gently loop over the list of 100_000 links and stack overflow!
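
To make the cost concern concrete, the following is a small emulation of the `IntList` example in current Java. It is only a sketch: `IntList` is a plain final class standing in for the value class, and `substitutable`, `chain`, `linksVisited` and the `visited` counter are hypothetical helpers. There is deliberately no identity shortcut for `IntList`, since a value has no identity to short-circuit on. The chain length is kept small here so the recursion stays well within a default thread stack.

```java
// Sketch only: a plain final class emulates the thread's `value class IntList`.
final class RecursiveEqualitySketch {

    static final class IntList {
        final int value;
        final Object next;
        IntList(int value, Object next) { this.value = value; this.next = next; }
    }

    /** Number of IntList pairs visited since the counter was last reset. */
    static int visited;

    /** Field-by-field comparison for IntList, identity for everything else. */
    static boolean substitutable(Object a, Object b) {
        if (a instanceof IntList && b instanceof IntList) {
            visited++;
            IntList x = (IntList) a;
            IntList y = (IntList) b;
            // Recurs through the Object-typed `next` field, one frame per link.
            return x.value == y.value && substitutable(x.next, y.next);
        }
        return a == b;
    }

    /** Builds a chain of the given length, ending in null. */
    static Object chain(int length) {
        Object list = null;
        for (int i = 0; i < length; i++) {
            list = new IntList(i, list);
        }
        return list;
    }

    /** Runs one comparison and reports how many links it had to visit. */
    static int linksVisited(Object a, Object b) {
        visited = 0;
        substitutable(a, b);
        return visited;
    }
}
```

Comparing a 1000-link chain with itself visits all 1000 links, so the work and the recursion depth grow linearly with the chain, which is the behavior the 100_000-link example above is warning about.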

John proposed to stop the recursion at some point, but that will be very surprising too!
I've thought about doing a component-wise comparison if there is no reference in the value type, but it means that the semantics will vary depending on the implementation (it will behave differently, for example, if the value type encapsulates a reference or an int as index).

So for now, I think the only possible semantics is to consider that '==' means an address comparison for all kinds of classes (reference or value class), so a value class acts like a class for ==, and given that a value has no address, it should return false.

cheers,
Rémi

From brian.goetz at oracle.com Tue Jan 8 17:44:11 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 8 Jan 2019 12:44:11 -0500
Subject: Finding the spirit of L-World
In-Reply-To: <340341882.521306.1546964406712.JavaMail.zimbra@u-pem.fr>
References: <782061282.336996.1546936732842.JavaMail.zimbra@u-pem.fr> <19542dce-ab62-0bd3-4e87-22fc1d7e15a6@oracle.com> <340341882.521306.1546964406712.JavaMail.zimbra@u-pem.fr>
Message-ID: <6340a209-1a63-df53-88c9-818babefc04c@oracle.com>

On 1/8/2019 11:20 AM, forax at univ-mlv.fr wrote:
>> Right. And I'm saying, we can't sell that. Values should work like an int; you can compare ints with ==. I think the "Currently" story doesn't wash.
> You can not use the "work like an int" argument here, a value type can contain references, so it doesn't work like an int.

Sorry, I don't buy it.

One of the primary use cases for value types is numerics. Are we seriously telling people that they can't compare non-intrinsic numerics with `==`? I realize that from a VM perspective, the "values have no ==" seems sound, but for the 99.9999% of Java developers that are not VM engineers, I don't think a single one of them will buy it.

> John proposed to stop the recursion at some point, but it will be very surprising too !

No, in corner cases like this (and surely this is at least a corner of a corner), we eat the recursion. That's the sensible equality semantics for embedding a linked list in a value.

> So for now, i think the only possible semantics is to consider that '==' means an address comparison for all kind of classes (reference or value class), so a value class acts like a class for == and given that a value has no address, it should return false.

That's certainly not the only solution. Nor do I think it's a remotely good one for Java developers.

From forax at univ-mlv.fr Tue Jan 8 18:12:58 2019
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Tue, 8 Jan 2019 19:12:58 +0100 (CET)
Subject: Finding the spirit of L-World
In-Reply-To: <6340a209-1a63-df53-88c9-818babefc04c@oracle.com>
References: <782061282.336996.1546936732842.JavaMail.zimbra@u-pem.fr> <19542dce-ab62-0bd3-4e87-22fc1d7e15a6@oracle.com> <340341882.521306.1546964406712.JavaMail.zimbra@u-pem.fr> <6340a209-1a63-df53-88c9-818babefc04c@oracle.com>
Message-ID: <977994009.541412.1546971178944.JavaMail.zimbra@u-pem.fr>

> From: "Brian Goetz"
> To: "Remi Forax"
> Cc: "valhalla-spec-experts"
> Sent: Tuesday, January 8, 2019 18:44:11
> Subject: Re: Finding the spirit of L-World

> On 1/8/2019 11:20 AM, forax at univ-mlv.fr wrote:
>>> Right. And I'm saying, we can't sell that. Values should work like an int; you can compare ints with ==. I think the "Currently" story doesn't wash.
>> You can not use the "work like an int" argument here, a value type can contain references, so it doesn't work like an int.
> Sorry, I don't buy it.
> One of the primary use cases for value types is numerics. Are we seriously telling people that they can't compare non-intrinsic numerics with `==`? I realize that from a VM perspective, the "values have no ==" seems sound, but for the 99.9999% of Java developers that are not VM engineers, I don't think a single one of them will buy it.

At the level of the language, very early at the beginning of Valhalla, we talked about a way to annotate equals so that == on a numeric value type would be redirected to equals (instead of not compiling like now). This is how you make a numeric value type work like an int.

Another primary use case is to be able to encode any lightweight functional abstraction, like a monad or a cursor, without paying any cost at runtime; for those value types, defining == as a numeric comparison makes little sense.

Rémi

>> John proposed to stop the recursion at some point, but it will be very surprising too !
> No, in corner cases like this (and surely this is at least a corner of a corner), we eat the recursion. That's the sensible equality semantics for embedding a linked list in a value.
>> So for now, i think the only possible semantics is to consider that '==' means an address comparison for all kind of classes (reference or value class), so a value class acts like a class for == and given that a value has no address, it should return false.
> That's certainly not the only solution. Nor do I think it's a remotely good one for Java developers.

From karen.kinnear at oracle.com Wed Jan 16 14:25:04 2019
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Wed, 16 Jan 2019 09:25:04 -0500
Subject: Next Valhalla EG meeting January 16, 2019
In-Reply-To:
References:
Message-ID: <39A0C3BC-DC05-403E-83C8-3F6067762D40@oracle.com>

It has been long enough I thought I would send a reminder.

thanks,
Karen

> On Jan 2, 2019, at 9:21 AM, Karen Kinnear wrote:
>
> Happy New Year!
>
> Correction to my previous email - January 16, 2019 is our next Valhalla EG meeting
>
> thanks,
> Karen

From brian.goetz at oracle.com Tue Jan 22 13:51:35 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 22 Jan 2019 08:51:35 -0500
Subject: Bridge methods in the VM
Message-ID: <18977A01-FE12-4A19-A2BB-A352084E969E@oracle.com>

We've been thinking for a long time about the possibilities of pushing bridging down into the VM. The reasons we have had until now have not been strong enough, but generic specialization, and compatible migration of libraries, give us reason to take another swing. HTML inline (list willing); MD attached.

VM Bridging

Historically, bridges have been generated by the static compiler. Bridges are generated today when there is a covariant override (a String-returning method overrides an Object-returning method), or when there is a generic instantiation (class Foo implements List<String>). (Historically, we also generated access bridges when accessing private fields of classes in the same nest, but nestmates did away with those!)

Intuitively, a bridge method is generated when a single method implementation wants to respond to two distinct descriptors. At the language level, these two methods really are the same method (the compiler enforces that subclasses cannot override bridges), but at the VM level, they are two completely unrelated methods. This asymmetry is the source of the problems with bridges.

One of the main values of making the JVM more aware of bridges is that we no longer need to throw away the useful information that two seemingly different methods are related in this way. We took a running leap at this problem back in Java 8, when we were doing default methods; this document constitutes a second run at this problem.
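
To make the two bridge-generating cases above concrete, here is a minimal sketch that asks reflection which methods javac marked as bridges. The class names (`BridgeDemo`, `Parent`, `Child`, `ByLength`) and the helper `bridgeCount` are illustrative assumptions, not taken from the message; only `Method.isBridge()` and `Class.getDeclaredMethods()` are real JDK APIs.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Comparator;

public class BridgeDemo {
    public static class Parent {
        Object make() { return "parent"; }
    }

    // Covariant override: javac also emits a synthetic bridge make()Object in Child.
    public static class Child extends Parent {
        @Override
        String make() { return "child"; }
    }

    // Generic instantiation: javac also emits a bridge compare(Object,Object)int.
    public static class ByLength implements Comparator<String> {
        @Override
        public int compare(String a, String b) {
            return Integer.compare(a.length(), b.length());
        }
    }

    /** Counts the compiler-generated bridge methods of the given name declared in c. */
    public static long bridgeCount(Class<?> c, String name) {
        return Arrays.stream(c.getDeclaredMethods())
                .filter(m -> m.getName().equals(name) && m.isBridge())
                .count();
    }

    public static void main(String[] args) {
        for (Class<?> c : new Class<?>[] { Child.class, ByLength.class }) {
            for (Method m : c.getDeclaredMethods()) {
                // Each class shows two methods with the same name: the real one and its bridge.
                System.out.println(c.getSimpleName() + "." + m.getName()
                        + " returns " + m.getReturnType().getSimpleName()
                        + ", bridge=" + m.isBridge());
            }
        }
    }
}
```

At the VM level the bridge and the real method are separate entries in the class file, which is the asymmetry the text describes.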

Bridge anomalies

Compiler-generated bridge methods are brittle; separate compilation can easily generate situations where bridges are missing or inconsistent, which in turn can result in AME (AbstractMethodError), invoking a superclass method when an override exists in a subclass, or everyone's favorite anomaly, the bridge loop. Start with:

    class Parent implements Cloneable {
        protected Object clone() { return (Parent)null; }
    }

    class Child extends Parent {
        protected Parent clone() { return (Parent)super.clone(); }
    }

Then, change Parent as follows, and recompile only that:

    class Parent implements Cloneable {
        protected Parent clone() { return (Parent)null; }
    }

If you call clone() on Child you get a StackOverflowError (try it!) What's going on is that when we make this change, the place in the hierarchy where the bridge is introduced changes, but we don't recompile the entire hierarchy. As a result, we have a vestigial bridge, and when we invoke clone() with invokevirtual from the new bridge, we hit the old bridge, and loop.

The fundamental problem here is that we are rendering bridges into concrete code "too early", based on a compile-time view of the type hierarchy. We want to make bridge dispatch more dynamic; we can accomplish this by making bridges more declarative than imperative, by recording the notion "A is a bridge for B" in the classfile -- and using that in dispatch -- without having to decide ahead of time exactly what bytecodes to use for bridging.

Generic specialization

Generics gave us a few situations where we want to be able to access a class member through more than one signature; specialized generics will give us more. For example, in a specialized class:

    class Foo<T> {
        T t;
        T get();
    }

In the instantiation Foo<int>, the type of the field t, and the return type of get(), are int. In the wildcard type Foo<?>, the type of both of these is Object.
But because a Foo<int> is a Foo<?>, we want that a Foo<int> responds to invocations of get()Object, and to accesses of the field t as if it were of type Object. We could handle the method with yet more bridge methods, but bridge methods don't do anything to help us with the field access. (In the M2 prototype we lifted field access on wildcards to method invocations, which was a useful prototyping move, but this does nothing to help existing erased binaries.) So while bridge methods as a mechanism run out of gas here, the concept of bridging -- recording that one member is merely an adaptation for another -- is still applicable.

Summary of problems

We can divide the problems with bridges into two groups -- old and new. The old problems are not immediately urgent to fix (brittleness, separate compilation anomalies), but are a persistent source of technical debt, bug tails, and constraints on translation evolution.

The new problems are that specialized generics give us more places we want bridges, making the old problems worse, as well as some places where we want the effects of bridging, but for which traditional bridge methods won't do the trick -- adaptation of fields.

Looking ahead, there are also some other issues on the horizon that we will surely encounter as we migrate the libraries to use specialized generics -- that have related characteristics.

Proposed solution: forwarded members

I'll lay out one way to express bridges for both fields and methods in the classfile, but there are others. In this model, for a member B that is a bridge for member M, we include a declaration for B in the class file, but we attach a Forwarding attribute to it, identifying the underlying member M (by descriptor, since its name will be the same) and indicating that attempts to link to B should be forwarded to M:

    Forwarding {
        u2 name;
        u4 length;
        u2 forwardeeType;
    }

A method with a Forwarding attribute has no Code attribute.
We would then replace existing bridges with forwarding members, and for specializable classes, we would generate a forwarding member for every method and field whose signature contains type variables (and which therefore would change under erasure), whose descriptor is the erasure of the forwardee descriptor.

Adaptation

In all the cases so far, the descriptor of the bridge and of the forwardee differ in relatively narrow ways -- the bridge descriptor can be adapted to the forwardee descriptor with a subset of the adaptations performed by MethodHandle::asType. This is adequate for generic and specialization bridges, but as we'll see below, we may want to extend this set.

Conflicts

If a class contains a bridge whose forwardee descriptor matches the bridge descriptor exactly, the bridge is simply discarded. This decision can be made only looking at the forwarding member, since we'll immediately see that the member descriptor and the forwarding descriptor are identical. (Such situations can arise when a class is specialized with the erasure of its type variables.)

Semantics

The linkage semantics of forwarding members are different from that of ordinary members. When linking a field or method access, if the resolved target is a forwarding member, we want to make some adjustments at the invocation site.

For a getfield that links to a forwarding member, we link the access such that it reads the forwardee field, and then adapts the resulting value to the bridge field type, and leaves the adapted value on the stack. (This is as if the getfield is linked to the result of taking a field getter method handle, and adapting it with asType() to the bridge type.) For a putfield, we do the reverse; we adapt the new field value to the forwardee type, and write to that field.
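
The "as if" description of getfield and putfield can be tried out today with the real `java.lang.invoke` API. The following is only a sketch of that analogy, not of the proposed linkage itself: `Box`, `read`, and `write` are hypothetical names, the forwardee field is assumed to be of type `String`, and the bridge type is assumed to be its erasure `Object`.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class FieldForwardingSketch {
    /** Stand-in for a class whose real ("forwardee") field type is String. */
    public static class Box {
        String t;
        public Box(String t) { this.t = t; }
    }

    static final MethodHandle ERASED_GET; // behaves like a getfield of type Object
    static final MethodHandle ERASED_PUT; // behaves like a putfield of type Object

    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            // Field getter (Box)String, adapted with asType to the bridge type (Box)Object.
            ERASED_GET = lookup.findGetter(Box.class, "t", String.class)
                    .asType(MethodType.methodType(Object.class, Box.class));
            // Field setter (Box,String)void, adapted to (Box,Object)void;
            // asType inserts a checked cast from Object to String.
            ERASED_PUT = lookup.findSetter(Box.class, "t", String.class)
                    .asType(MethodType.methodType(void.class, Box.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Reads the forwardee field and returns it at the bridge type. */
    public static Object read(Box box) {
        try {
            return ERASED_GET.invoke(box);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable e) {
            throw new IllegalStateException(e);
        }
    }

    /** Adapts the new value to the forwardee type, then writes the forwardee field. */
    public static void write(Box box, Object value) {
        try {
            ERASED_PUT.invoke(box, value);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Box box = new Box("hello");
        System.out.println(read(box));
        write(box, "world");
        System.out.println(read(box));
    }
}
```

Writing a value that cannot be adapted (for example an `Integer`) fails with a `ClassCastException` at the adapted access, which is where a forwarded putfield would presumably surface the failure as well.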

If the forwarding member is a method, we re-resolve the method using the forwardee signature, adapt its parameters as we would for putfield and its return value as we would for getfield, and invoke the forwardee with the invocation mode present at the call site. Again, the semantics here are as if we took a method handle for the forwardee method, using the invocation mode present at the call site, and adapted it with asType to the bridge descriptor.

The natural interpretation here is that rather than materializing a real field or method body in the class, we manage the forwarding as part of the linkage process, and include any necessary adaptations at the access site. The bridge "body" is never actually invoked; we use the Forwarding metadata to adapt and re-link the access sites.

Bridge loops

The linkage strategy outlined above -- where we truly treat bridges as forwarding to another member -- is the key to breaking the bridge loops. Specifying forwarded members means that the JVM can be aware that two methods are, at some level, the same method; the more complex linkage procedure allows us to invoke the bridgee with the correct invocation mode all the time, even under separate compilation.

In our Parent/Child example, Child::clone will do an invokespecial to invoke Parent::clone()Object, which after recompilation is a bridge to Parent::clone()Parent. We'll see that this is a bridge, and will forward to Parent::clone()Parent, with an invokespecial, and we'll land in the right place.

The elimination of bridge loops here stems from having raised the level of abstraction in which we render the classfile; we record that Parent::clone()Object is merely a bridge for Parent::clone()Parent, and so any invocation of the former is redirected -- with the same invocation mode -- to the latter. It is as if the client knew to invoke the right method.
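
The role of the invocation mode can be illustrated with method handles, following the "as if" wording above. This is a sketch under assumptions: `Parent`, `Child`, `name()`, `callVirtual`, and `callSuper` are made-up names; the forwardee descriptor is assumed to be `()String` and the bridge descriptor `()Object`. `findVirtual` stands in for an invokevirtual call site and `findSpecial` for an invokespecial (super) call site.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class MethodForwardingSketch {
    static final MethodType FORWARDEE = MethodType.methodType(String.class);

    public static class Parent {
        String name() { return "Parent"; }
    }

    public static class Child extends Parent {
        @Override
        String name() { return "Child"; }

        /** invokespecial mode: the lookup must come from Child itself to act as a super call. */
        static MethodHandle superMode() throws ReflectiveOperationException {
            return MethodHandles.lookup()
                    .findSpecial(Parent.class, "name", FORWARDEE, Child.class)
                    .asType(MethodType.methodType(Object.class, Child.class));
        }
    }

    /** invokevirtual mode: forwardee handle adapted to the bridge descriptor (Parent)Object. */
    static MethodHandle virtualMode() throws ReflectiveOperationException {
        return MethodHandles.lookup()
                .findVirtual(Parent.class, "name", FORWARDEE)
                .asType(MethodType.methodType(Object.class, Parent.class));
    }

    public static Object callVirtual(Parent receiver) {
        try {
            return virtualMode().invoke(receiver);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable e) {
            throw new IllegalStateException(e);
        }
    }

    public static Object callSuper(Child receiver) {
        try {
            return Child.superMode().invoke(receiver);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Child child = new Child();
        System.out.println(callVirtual(child)); // virtual dispatch selects Child.name()
        System.out.println(callSuper(child));   // super-style call selects Parent.name()
    }
}
```

Both handles target the same forwardee descriptor and are adapted to the same bridge return type; only the invocation mode differs, and that alone decides which implementation runs.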

User-controlled bridges

The compiler will generate bridges where the language requires it, but we also have the opportunity to enable users to ask for bridges by providing a bridging annotation on the declaration:

    @GenerateBridge(returnType=Object.class)
    public static String foo() { ... }

This will instruct the compiler to generate an Object-returning method that is a bridge for foo(). This could be done for either fields or methods. (People have written frameworks to do this; see for example http://kohsuke.org/2010/08/07/potd-bridge-method-injector/).

Near-future problem: type migration

This mechanism may also be able to help us deal with the case when we want to migrate signatures in an otherwise-incompatible manner, such as changing a method that returns int to return long, or an OptionalInt to Optional<int>, or an old-style Date to the newer LocalDate. Numerous library modernizations (such as migrating from the old date-time libraries to the JSR-310 versions) are blocked on the ability to make such migrations; specializing the core libraries (especially Stream) will also generate such migrations.

Such migrations are a generalization of the sort of bridges we've been discussing here; they involve adding an additional two features:

- Additional adaptations, including user-defined adaptations (such as between Date and LocalDate)
- Interaction with overriding, so that subclasses that override the old signature can still work properly.

Projection-embedding pairs

Given two types T and U, a projection-embedding pair is a pair of functions p : T → U and e : U → T such that ∀u ∈ U, p(e(u)) = u, and, if t is in the range of p, then e(p(t)) = t. Examples of useful projection-embedding pairs are the value sets of LV and QV for any value class V (we can embed the entirety of QV in LV, but LV contains one value -- null -- that can't be mapped back), any types T and U where T <: U, int and long (we can embed int in long), and Date and LocalDate.
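
For the int/long case, a projection-embedding pair can be written down directly. This is a minimal sketch assuming T = long and U = int; the class and method names are hypothetical, and `Math.toIntExact` is used as one possible way of failing outside the embeddable subset (the message does not prescribe which failure mode to use).

```java
public class ProjectionEmbeddingSketch {
    /** Embedding e : int -> long. Total: every int has an image in long. */
    public static long embed(int u) {
        return u;
    }

    /** Projection p : long -> int. Throws ArithmeticException for values that no int embeds to. */
    public static int project(long t) {
        return Math.toIntExact(t);
    }

    public static void main(String[] args) {
        int u = 42;
        // p(e(u)) = u holds for every int.
        System.out.println(project(embed(u)) == u);

        long t = 7L;
        // e(p(t)) = t holds for the longs that lie in the embeddable subset.
        System.out.println(embed(project(t)) == t);

        try {
            project(1L << 40); // outside the embeddable subset
        } catch (ArithmeticException ex) {
            System.out.println("projection failed: " + ex.getMessage());
        }
    }
}
```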

Intuitively, a p-e pair means we can freely map back and forth for the embeddable subset, and we get some sort of failure (e.g., NPE, or range truncation) otherwise.

User-provided adaptations

Many of the adaptations we want to do are handled by MethodHandle::asType: casting, widening, boxing. But sometimes, a migration involves types that require user-provided adaptation behavior, such as converting Date to LocalDate. (Bridges need to do these in both directions; we use the embedding for reads and projection for writes.) Here, we can extend the format of the Forwarding attribute to capture this additional behavior as pairs of method handles, such as:

    Forwarding {
        u2 name;
        u4 length;
        u2 forwardeeType;
        // adaptation metadata
        u1 pePairs;
        {   u2 projection;
            u2 embedding;
        }[pePairs];
    }

When linking an access site for a forwarding member, when an adaptation is not supported by MethodHandle::asType, we use the user-provided embedding function for adapting return types and field reads, and the projection function for adapting parameter types and field writes.

Overriding

A more complicated problem is when we want to migrate the signature of an instance member in a non-final class, because the class may have existing subclasses that override the member, and may not yet have been recompiled. For example, we might start with:

    interface Collection {
        int size();
    }

    class ArrayList implements Collection {
        int size() { return elements.length; }
    }

Now, we recompile Collection but not ArrayList:

    interface Collection {
        @TypeMigration(returnType=int.class)
        long size();
    }

When we go to load ArrayList, we'll find that it overrides the bridge (size()int), and does not override the real method. We'll want to adjust ArrayList as we load it to make up for this.

Half the problem of this migration is addressed by having a forwarding method from size()int to size()long; any legacy clients that call the old signature will be bridged to the new one.
To further indicate that overrides of such a method should be adjusted, suppose we mark this forwarding bridge with ACC_MIGRATED (in reality, we can probably use ACC_FINAL for this). Now, when we go to load ArrayList, we'll see that size()int is trying to override a migrated method (this is much like the existing override-a-final check). Instead of rejecting the subtype, we use ACC_MIGRATED bridges as a signal to fix up overrides.

We already have all the information in the Forwarding attribute that we need to fix ArrayList::size; we rewrite the descriptor to the forwardee descriptor, use the projection function for adapting argument types, and the embedding function for adapting the return type, and install the result in ArrayList. It is as if we adapted the subclass method with asType to the forwardee descriptor, and installed that in the subclass instead.

The effect is that in the presence of a migrated bridge, the bridge descriptor is a toxic waste zone; callers are redirected to the new descriptor by bridging, and overriders are redirected to the new descriptor by adaptation.

From brian.goetz at oracle.com Wed Jan 23 17:51:58 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 23 Jan 2019 12:51:58 -0500
Subject: Finding the spirit of L-World
In-Reply-To:
References:
Message-ID:

> The key questions are around the mental model of what we're trying to accomplish and how to make it easy (easier?) for users to migrate to use value types or handle when their pre-value code is passed a valuetype. There's a cost for some group of users regardless of how we address each of these issues. Who pays these costs? Those migrating to use the new value types functionality? Those needing to address the performance costs of migrating to a values capable runtime (JDK-N?).

Indeed, this is the question. And, full disclosure, my thoughts have evolved since we started this exercise.

We initially started with the idea that value types were this thing "off to the side"
-- a special category of classes that only experts would ever use, and so it was OK if they had sharp edges. But this is the sort of wishful thinking one engages in when you are trying to do something that seems impossible; you bargain with the problem.

When we did generics, there was a pervasive belief that the complexity of generics could be contained to where only experts would have to deal with it, and the rest of us could happily use our strongly typed collections without having to understand wildcards and such. This turned out to be pure wishful thinking; generics are part of the language, and in order to be an effective Java programmer, you have to understand them. (And this only gets more true; the typing of lambdas builds on generics.)

The first experiments (Q world) were along the lines of value types being off to the side. While it was possible to build the VM that way, we ran into one problem after another as we tried to use them in Java code. Value types would be useless if you can't put them in an ArrayList or HashMap, so we were going to have to migrate our existing libraries to be value-aware. And with the myriad distinctions between values and objects (different top types, different bytecodes, different type signatures), it was a migration nightmare. In the early EG meetings, Kevin frequently stood up and said things like "it's bad enough that we have a type system split in two; are you really trying to sell me one split in three? You can't do that to the users." (Thank you, Kevin.)

The problems of Q-world were in a sense the problems of erased generics -- we were trying to minimize the disruption to the VM (a worthy goal), but the cost was that sharp edges were exposed to the users in ways they couldn't avoid. And the solution of L World is: push more of it into the VM. (Obviously there's a balance to be struck here.)
And I believe that we are finally close to a substrate on which we can build a strong, stable tower, where we can compatibly migrate our existing billions of lines of code with minimal intrusion. So this is encouraging.

The vision of being able to "flatten all the way down", and having values interact cleanly with all the other language features is hard to argue against. But as you say, the question is, who pays.

> One concern writ large across our response is performance. I know we're looking at user model here but performance is part of that model. Java has a well understood performance model for array access, == (acmp), and it would be unfortunate if we damaged that model significantly when introducing value types.

I agree that this is an expensive place to be making tradeoffs. Surely if the cost were that ACMP got .0000001% slower, it's a slam dunk "who cares", and if ACMP got 100000x slower, it's a slam-dunk the other way. The real numbers (for which we'll need data) will not be at either of these extremes, and so some hard decisions are in our future.

> Is this a fair statement of the project's goals: to improve memory locality in Java by introducing flattenable data? The rest of where we've gotten to has been working all the threads of that key desire through the rest of the java platform. The L/Q world design has come about from starting from a VM perspective based on what's implementable in ways that allows the JVM to optimize the layout.

It's a fair summary, but I would like to be more precise.

Value types offer the user the ability to trade away some programming flexibility (mutability, subtyping) for flatter and denser memory layouts. And we want value types to interact cleanly with the other features of the platform, so that when you (say) put value types in an ArrayList, you still get flat and dense representations. So I think a good way to think about it is "enabling flattening all the way down".
(Flattenability also maps fairly cleanly to scalarizability, so the same tradeoffs that give us flattenability on the heap give us scalarization on the stack.) Those are the performance goals.

But there are also some "all the way up" goals I'd like to state. Programming with value types should interact cleanly with the rest of the platform; writing code that is generic over references and values should only be slightly harder than writing code that is generic only over erased references. Users should be able to reason about the properties of Object, which means reasoning about the union of references and values. Otherwise, we may gain performance, but we've turned Java into C++ (or worse), and one of the core values of the platform will be gone.

Balancing these things is a very tricky balance, and I think we're still spiraling into the right balance. Q World was way too far off in one direction; it gave the experts what they needed but at the cost of making everyone's language far more complex and hard to code in, and creating intractable migration problems. I think L World is much closer to where we want to be, but I think we're still a little too much focused on bottom-up decision making, and we need to temper that with some top-down "what language do we get, and is it the one we want" thinking. I am optimistic, but I'm not declaring victory yet.

> One of the other driving factors has been the desire to have valuetypes work with existing collections classes. And a further goal of enabling generic specialization to allow those collections to get the benefits of the flattened data representations (ie: backed by flattened data arrays).

Yes. I think this is "table stakes" for this exercise. Not being able to use HashMap with values, except via boxing, would be terrible; not being able to generify over all the types would be equally terrible.
And one of the biggest assets of the Java ecosystem is the rich set of libraries; having to throw them all out and rewrite them (and deal with the migration mess from OldList to NewList) could well be the death sentence.

We don't have to get there all at once; the intermediate target (L10) is "erased generics over values", which gives us reuse and reasonable calling conventions but not yet flattening. But that has to lead to a sane generics model where values are first-class type arguments, with flattening all the way down.

> The other goal we discussed in Burlington was that pre-value code should be minimally penalized when values are introduced, especially for code that isn't using them. Otherwise, it will be a hard sell for users to take a new JDK release that regresses their existing code.

Yes, I think the question here is "what is minimal." And the answer is going to be hard to quantify, because there are slippery slopes and sharp cliffs everywhere. If we have some old dusty code and just run unchanged on a future JVM, there probably won't be many value types flying around, so speculation might get us 99% of the way there. But once you start mixing that old legacy code with some new code that uses values, it might be different.

Also, bear in mind that values might provide performance benefits to non-value-using code. For example, say we rewrite HashMap using values as entries. That makes for fewer indirections in everyone's code, even if they never see a value in the wild. Do we count that when we are counting the "value penalty" for legacy code?

So, we have to balance the cost to existing code (that never asked for values) with the benefits to future code that can do amazing new things with values.

> Does that accurately sum up the goals we've been aiming for?

With some caveats, it's a good starting point :)

> A sensible rationalization of the object model for L-World would be to have special subclasses of `Object` for references and values:
>
> ```
> class Object { ... }
> class RefObject extends Object { ... }
> class ValObject extends Object { ... }
> ```
>
> Would the intention here be to retcon existing Object subclasses to instead subclass RefObject? While this is arguably the type hierarchy we'd have if creating Java today, it will require additional speculation from the JIT on all Object references in the bytecode to bias the code one way or the other. Some extra checks plus a potential performance cliff if the speculation is wrong and a single valuetype hits a previous RefObject-only callsite.

That was what I was tossing out, yes. This is one of those nice-to-haves that we might ultimately compromise on because of costs, but we should be aware what the costs are. It has some obvious benefits (clear statement of reality, brings value-ness into the type system.) And the fact that value-ness wasn't reflected in the type system in Q world was a real problem; it meant we had modifiers on code and type variables like "val T" that might have been decent prototyping moves, but were not the language we wanted to work with.

That said, if the costs are too high, we can revisit.

> ```
> interface Nullable { }
> ```
>
> which is implemented by `RefObject`, and, if we support value classes being declared as nullable, would be implemented by those value classes as well. Again, this allows us to use `Nullable` as a parameter type or field type, or as a type bound (`<T extends Nullable>`).
>
> I'm still unclear on the nullability story.

Me too :) Some recent discussions have brought us to a refined view of this problem, which is: what's missing from the object model right now is not necessarily nullable values (we already have these with L-types!), but classes which require initialization through their constructor in order to be valid. This is more about "initialization safety" than nullability. Stay tuned for some fresh ideas here.

> #### Equality
>
> The biggest and most important challenge is assigning sensible total semantics to equality on `Object`; the LW1 equality semantics are sound, but not intuitive. There's no way we can explain why for values, you don't get `v == v` in a way that people will say "oh, that makes sense." If everything is an object, `==` should be a reasonable equality relation on objects. This leads us to a somewhat painful shift in the semantics of equality, but once we accept that pain, I think things look a lot better.
>
> Users will expect (100% reasonably) the following to work:
>
> ```
> Point p1, p2;
>
> p1 == p1 // true
>
> p2 = p1
> p1 == p2 // true
>
> Object o1 = p1, o2 = p2;
>
> o1 == o1 // true
> o1 == o2 // true
> ```
>
> We ran into this problem with PackedObjects which allowed creating multiple "detached" object headers that could refer to the same data. While early users found this painful, it was usually a sign they had deeper problems in their code & understanding. One of the difficulties was that depending on how the PackedObjects code was written, == might be true in some cases. We found a consistent answer was better - and helped to define the user model.

I am deeply concerned that this is wishful thinking based on performance concerns -- and validated with a non-representative audience. I'd guess that most of the Packed users were experts who were reaching for packed objects because they had serious performance problems to solve.
(What works in a pilot school for gifted students with hand-picked teachers doesn't always scale up to LA County Unified.)

I think that we muck with the intuitiveness of `==` at our peril. Of all the concerns I have about totality, equality is bigger than all the rest put together.

> In terms of values, is this really the model we want? Users are already used to needing to call .equals() on equivalent objects. By choosing the answer carefully here, we help to guide the right user mental model for some of the other proposals - locking being a key one.

I think this is probably wishful thinking too. A primary use case for values is numerics. Are we going to tell people they can't compare numerics with ==? And if we base `==` on the static type, then we'll get different semantics when you convert to Object. But conversion to Object is not a boxing conversion -- it's a widening conversion. I'm really worried about this.

> While the conceptual model may be clean, it's also, as you point out, horrifying. Trees and linked structures of values become very very expensive to acmp in ways users wouldn't expect.

I'm not sure about the "expect" part. We're telling people that values are "just" their state (even if that state is rich.) Wouldn't you then expect equality to be based on state?

> If we do this, users will build the mental model that values are interned and that they are merely fetching the same instances from some pool of values. This kind of model will lead them down rabbit holes - and seems to give values an identity. We've all seen abuses of String.intern() - do we want values to be subject to that kind of code?

That's not the mental model that comes to mind immediately for me, so let's talk more about this.

> The costs here are likely quite large - all objects that might be values need to be checked, all interfaces that have ever had a value implement them, and of course, all value type fields plus whatever the Nullability model ends up being.
I would say that _in the worst case_ the costs could be large, but in the common cases (e.g., Point), the costs are quite manageable -- the cost of a comparison is a bulk bit comparison. That's more than a single word comparison, but it's not so bad. I get that this is where the cost is -- I said up front, this is the pill to swallow. Let's figure out what it really costs.

> #### Identity hash code
>
> Because values have no identity, in LW1 `System::identityHashCode` throws `UnsupportedOperationException`. However, this is unnecessarily harsh; for values, `identityHashCode` could simply return `hashCode`. This would enable classes like `IdentityHashMap` (used by serialization frameworks) to accept values without modification, with reasonable semantics -- two objects would be deemed the same if they are `==`. (For serialization, this means that equal values would be interned in the stream, which is probably what is wanted.)

> By return `hashCode`, do you mean call a user defined hashCode function? Would the VM enforce that all values must implement `hashCode()`? Is the intention they are stored (growing the size of the flattened values) or would calling the hashcode() method each time be sufficient?

I would prefer to call the "built-in" value hashCode -- the one that is deterministically derived from state. That way, we preserve the invariant that == values have equal identity hash codes.

> The only consistent answer here is to throw on lock operations for values. Anything else hides incorrect code, makes it harder for users to debug issues, and leaves a mess for the VM. As values are immutable, the lock isn't protecting anything. Code locking on unknown objects is fundamentally broken - any semantics we give it comes at a cost and doesn't actually serve users.

I don't disagree. The question is, what are we going to do when Web{Logic,Sphere} turns out to be locking on user objects, and some user passes in a value?
Are we going to tell them "go back to Java 8 if you don't like it"? (Serious question.) If so, then great, sign me up!

> To be continued...

From brian.goetz at oracle.com Fri Jan 25 18:12:51 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 25 Jan 2019 13:12:51 -0500
Subject: Migrating the primitive boxes to values
Message-ID:

Let's take this problem from the other direction. What are the impediments to us migrating the existing java.lang.Integer classes and friends to being values? I think it's pretty clear that "all things being equal", this is a better choice than creating new box classes, and dealing with the sharp edges that entails. So, what's stopping us? It's the usual suspects:

1. Equality. People do `==` on hopefully-interned Integer values. We advise against it, but they do. Having these always fail would surely break lots and lots of code.

2. System.identityHashCode. People put `Integer` in object graphs that get serialized; serialization will put them through IdentityHashMap.

3. Nullability. These types are surely nullable; I think there's no turning back from that.

4. Locking. While it seems dumb, there is surely code out there that locks on Integer instances.

5. Constructor access. There are existing constructors, which we deprecated in 9. Existing binaries will invoke them with new/dup/invokespecial rather than the appropriate value instantiation.

My "Finding the Spirit" proposal offered a cure for 1: turn `==` into a substitutability test. For wrappers, this behaves as if all instances were interned, rather than just the numbers from zero to some small value; the spec warns that the range depends on runtime parameters. Combined with deprecating and eventually removing the constructors, this seems like it is a move that is within the range of the spec, and only would affect code that is relying on accidental identity. Could we get away with this?
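To make the "accidental identity" point concrete, here is a minimal sketch of how `==` behaves on boxed integers today. It assumes only the documented guarantee that `Integer.valueOf` caches the values -128 to 127; the class `BoxIdentityDemo` and its helpers are hypothetical names for illustration. Outside the cached range the result of `==` depends on runtime parameters, so the sketch prints it without claiming an expected value.

```java
// Hypothetical demo class; not part of any proposal.
class BoxIdentityDemo {
    // Reference comparison of two independently boxed copies of the same int.
    static boolean sameReference(int v) {
        Integer a = Integer.valueOf(v);
        Integer b = Integer.valueOf(v);
        return a == b;
    }

    // State-based comparison, which is what a substitutability test would amount to for Integer.
    static boolean sameState(int v) {
        return Integer.valueOf(v).equals(Integer.valueOf(v));
    }

    public static void main(String[] args) {
        // Inside the guaranteed cache range, == already behaves like substitutability.
        System.out.println("100 by reference: " + sameReference(100));
        // Outside the range, the answer depends on the cache size configured at runtime.
        System.out.println("100000 by reference: " + sameReference(100_000));
        // State-based comparison does not depend on the cache.
        System.out.println("100000 by state: " + sameState(100_000));
    }
}
```

Under the proposal, the second line would presumably report the same answer as the third, which is the behavior change that code relying on accidental identity would observe.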
My proposed cure for (2) is similar: make identityHashCode on values return the "built-in hashCode" -- that is, the state-based hashCode that is the default for values if you don't override hashCode.

Nullability is a migration concern, shared by other types migrating to value types, so this is something we likely have to address regardless.

Which brings us to ... locking. The two choices are: assign locking on values a state-based semantics (which no one really wants to do), or ... let code that locks on Integer just break. Both are obviously squirmy options.

Which brings me to my real point: if we go the latter route, when a big legacy customer with a big legacy codebase has their code broken by this, what happens next? I know it's really easy to say that we'll tell them they were making a mistake for 22 years and their bad behavior finally caught up with them, but this answer is rarely well received in reality.

It seems that if we can get comfortable with `==` being substitutability (which I still think is a kind of forced move), and with outlawing locking on the primitive boxes -- both potentially big-ticket choices -- then we can rehabilitate the existing boxes. Which would be a nice place to be.

From rschmitt at pobox.com Fri Jan 25 22:57:32 2019
From: rschmitt at pobox.com (Ryan Schmitt)
Date: Fri, 25 Jan 2019 14:57:32 -0800
Subject: Migrating the primitive boxes to values
In-Reply-To:
References:
Message-ID:

Is there data to suggest that this breakage would be more painful than, say, the removal of sun.misc.BASE64Encoder? (Assuming that's even a relevant comparison.)

On Fri, Jan 25, 2019 at 10:15 AM Brian Goetz wrote:

> Which brings me to my real point: if we go the latter route, when a big legacy customer with a big legacy codebase has their code broken by this, what happens next?
> I know it's really easy to say that we'll tell them they were making a mistake for 22 years and their bad behavior finally caught up with them, but this answer is rarely well received in reality.

From forax at univ-mlv.fr Sat Jan 26 11:49:20 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Sat, 26 Jan 2019 12:49:20 +0100 (CET)
Subject: Migrating the primitive boxes to values
In-Reply-To:
References:
Message-ID: <737114433.1347685.1548503360282.JavaMail.zimbra@u-pem.fr>

For 5, the first idea is to forward the call to the constructor to the factory method, but detecting the new + dup + invokespecial may be hard especially in the interpreter.

The other solution is to redirect the call to new to create a larval value type and have the invokespecial call fill the larval buffer, but it means that the larval API (at least part of it) has to be included in the VM spec even if only restricted to wrapper types :(

Rémi

From forax at univ-mlv.fr Sat Jan 26 12:22:23 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Sat, 26 Jan 2019 13:22:23 +0100 (CET)
Subject: Migrating the primitive boxes to values
In-Reply-To:
References:
Message-ID: <90848810.1349724.1548505343519.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Ryan Schmitt"
> To: "Valhalla Expert Group Observers"
> Sent: Friday, January 25, 2019 23:57:32
> Subject: Re: Migrating the primitive boxes to values

> Is there data to suggest that this breakage would be more painful than, say, the removal of sun.misc.BASE64Encoder? (Assuming that's even a relevant comparison.)

Not exactly, you have two different effects,
1) == starts to return true where it was returning false previously
2) synchronized on an Integer throws IllegalMonitorStateException

For (1), we hope it's a minor issue because changing the size of the Integer cache (setting the AutoBoxCacheMax VM option in early JDKs or changing the property java.lang.Integer.IntegerCache.high) has the same side effect.

For (2), at runtime, you have the same kind of effect: you get an exception, but it's a runtime exception and not an error. Here again, we expect that it's not a real issue because changing the size of the Integer cache changes the behavior of synchronized, so the code is already broken. But given that a runtime exception is thrown and some people love to write catch(Exception), you can have programs that run erratically instead of stopping.

In both cases, the program may behave in a strange way when the JDK is updated, which is worse than throwing an error saying that BASE64Encoder is not available.
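As an illustration of why code that synchronizes on an Integer is already fragile, the following sketch shows two independently boxed values sharing one monitor. It relies only on the guaranteed `Integer.valueOf` cache for -128 to 127; `BoxLockDemo` and `locksAlias` are hypothetical names, and the sketch describes behavior of current JDKs, not of any value-type prototype.

```java
// Hypothetical demo class; shows today's behavior only.
class BoxLockDemo {
    // Returns true when a freshly boxed 'v' is the very object whose monitor is currently held.
    static boolean locksAlias(int v) {
        Object lock = Integer.valueOf(v);      // typed as Object, as in code that locks on "unknown" objects
        synchronized (lock) {
            // A second, unrelated boxing of the same int.
            return Thread.holdsLock(Integer.valueOf(v));
        }
    }

    public static void main(String[] args) {
        // Inside the cache range, unrelated callers contend on the same monitor.
        System.out.println("42 aliases the lock: " + locksAlias(42));
        // Outside the range, the answer depends on the configured cache size.
        System.out.println("100000 aliases the lock: " + locksAlias(100_000));
    }
}
```

Whether the second call reports aliasing depends on the cache size in effect, which is the sense in which such code already relies on an accident of configuration.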
Also, doing a static analysis of the code to find (1) and (2) is hard, because an Integer can be typed as an Object so you need to do an inter-procedural data flow analysis on the whole code (something that you may never be able to do because the notion of "whole code" is fuzzy because of Java dynamic class loading). So detecting bad behavior will be harder than just running jdeps --jdk-internals on your jars.

So compared to the removal of BASE64Encoder, it's better because for most existing code, you don't have to do anything, it will just work, but a small subset of existing programs will start to behave weirdly because they rely on broken semantics but were lucky to work.

Rémi

> On Fri, Jan 25, 2019 at 10:15 AM Brian Goetz wrote:
>
>> Which brings me to my real point: if we go the latter route, when a big legacy customer with a big legacy codebase has their code broken by this, what happens next? I know its really easy to say that we'll tell them they were making a mistake for 22 years and their bad behavior finally caught up with them, but this answer is rarely well received in reality.

From forax at univ-mlv.fr Sat Jan 26 13:05:56 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Sat, 26 Jan 2019 14:05:56 +0100 (CET)
Subject: Bridge methods in the VM
In-Reply-To: <18977A01-FE12-4A19-A2BB-A352084E969E@oracle.com>
References: <18977A01-FE12-4A19-A2BB-A352084E969E@oracle.com>
Message-ID: <1496363553.1351659.1548507956590.JavaMail.zimbra@u-pem.fr>

> From: "Brian Goetz"
> To: "valhalla-spec-experts"
> Sent: Tuesday, January 22, 2019 14:51:35
> Subject: Bridge methods in the VM

> We've been thinking for a long time about the possibilities of pushing bridging down into the VM. The reasons we have had until now have not been strong enough, but generic specialization, and compatible migration of libraries, give us reason to take another swing. HTML inline (list willing); MD attached.
I'm worried that we are missing the big picture here. Bridging by the VM is one way to patch the vtable; there is another feature we need which is also equivalent to patching the vtable: the where condition, the method specialization where a generic method is replaced by a specific version depending on the value of the type arguments.

If we have a general mechanism of vtable patching, the attribute Forwarding may still exist, but instead of being known by the VM, read directly and treated in an ad hoc way, it will be read by the JDK side that interacts with the vtable patching (think of something like a bootstrap method).

That said, having a forwarding attribute can be fun by itself.

> VM Bridging
>
> Historically, bridges have been generated by the static compiler. Bridges are generated today when there is a covariant override (a String-returning method overrides an Object-returning method), or when there is a generic instantiation ( class Foo implements List ). (Historically, we also generated access bridges when accessing private fields of classes in the same nest, but nestmates did away with those!)
>
> Intuitively, a bridge method is generated when a single method implementation wants to respond to two distinct descriptors. At the language level, these two methods really are the same method (the compiler enforces that subclasses cannot override bridges), but at the VM level, they are two completely unrelated methods. This asymmetry is the source of the problems with bridges.
>
> One of the main values of making the JVM more aware of bridges is that we no longer need to throw away the useful information that two seemingly different methods are related in this way.
>
> We took a running leap at this problem back in Java 8, when we were doing default methods; this document constitutes a second run at this problem.
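For readers who have not looked at compiler-generated bridges directly, the covariant-override case described above can be observed with plain reflection. This is a minimal sketch; `BridgeDemo`, `Base`, `Derived` and `countBridges` are hypothetical names, and the only API assumed is `java.lang.reflect.Method.isBridge()`.

```java
import java.lang.reflect.Method;

// Hypothetical demo class showing a javac-generated bridge for a covariant override.
class BridgeDemo {
    static class Base {
        Object make() { return "base"; }
    }

    static class Derived extends Base {
        @Override
        String make() { return "derived"; }   // covariant return type: String instead of Object
    }

    // Counts the methods declared directly in 'c' that carry the bridge flag.
    static long countBridges(Class<?> c) {
        long n = 0;
        for (Method m : c.getDeclaredMethods()) {
            if (m.isBridge()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Derived declares two methods named make at the VM level: the real one and the bridge.
        for (Method m : Derived.class.getDeclaredMethods()) {
            System.out.println(m.getReturnType().getSimpleName() + " " + m.getName()
                    + "() bridge=" + m.isBridge());
        }
    }
}
```

The two `make` entries in `Derived` are the "two completely unrelated methods" at the VM level that the text refers to; the relationship between them is not recorded anywhere except in the bridge's bytecode.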
> Bridge anomalies
>
> Compiler-generated bridge methods are brittle; separate compilation can easily generate situations where bridges are missing or inconsistent, which in turn can result in AME, invoking a superclass method when an override exists in a subclass, or everyone's favorite anomaly, the bridge loop. Start with:
>
>     class Parent implements Cloneable {
>         protected Object clone() { return (Parent)null; }
>     }
>     class Child extends Parent {
>         protected Parent clone() { return (Parent)super.clone(); }
>     }
>
> Then, change Parent as follows, and recompile only that:
>
>     class Parent implements Cloneable {
>         protected Parent clone() { return (Parent)null; }
>     }
>
> If you call clone() on Child you get a StackOverflowError (try it!) What's going on is that when we make this change, the place in the hierarchy where the bridge is introduced changes, but we don't recompile the entire hierarchy. As a result, we have a vestigial bridge, and when we invoke clone() with invokevirtual from the new bridge, we hit the old bridge, and loop.
>
> The fundamental problem here is that we are rendering bridges into concrete code "too early", based on a compile-time view of the type hierarchy. We want to make bridge dispatch more dynamic; we can accomplish this by making bridges more declarative than imperative, by recording the notion "A is a bridge for B" in the classfile -- and using that in dispatch -- without having to decide ahead of time exactly what bytecodes to use for bridging.
>
> Generic specialization
>
> Generics gave us a few situations where we want to be able to access a class member through more than one signature; specialized generics will give us more. For example, in a specialized class:
>
>     class Foo<T> {
>         T t;
>         T get();
>     }
>
> In the instantiation Foo<int>, the type of the field t, and the return type of get(), are int. In the wildcard type Foo<?>, the type of both of these is Object.
> But because a Foo<int> is a Foo<?>, we want that a Foo<int> responds to invocations of get()Object, and to accesses of the field t as if it were of type Object.
>
> We could handle the method with yet more bridge methods, but bridge methods don't do anything to help us with the field access. (In the M2 prototype we lifted field access on wildcards to method invocations, which was a useful prototyping move, but this does nothing to help existing erased binaries.)
>
> So while bridge methods as a mechanism run out of gas here, the concept of bridging -- recording that one member is merely an adaptation for another -- is still applicable.
>
> Summary of problems
>
> We can divide the problems with bridges into two groups -- old and new. The old problems are not immediately urgent to fix (brittleness, separate compilation anomalies), but are a persistent source of technical debt, bug tails, and constraints on translation evolution.
>
> The new problems are that specialized generics give us more places we want bridges, making the old problems worse, as well as some places where we want the effects of bridging, but for which traditional bridge methods won't do the trick -- adaptation of fields.
>
> Looking ahead, there are also some other issues on the horizon that we will surely encounter as we migrate the libraries to use specialized generics -- that have related characteristics.
>
> Proposed solution: forwarded members
>
> I'll lay out one way to express bridges for both fields and methods in the classfile, but there are others. In this model, for a member B that is a bridge for member M, we include a declaration for B in the class file, but we attach a Forwarding attribute to it, identifying the underlying member M (by descriptor, since its name will be the same) and indicating that attempts to link to B should be forwarded to M:
>
>     Forwarding {
>         u2 name;
>         u4 length;
>         u2 forwardeeType;
>     }
>
> A method with a Forwarding attribute has no Code attribute.
> We would then replace existing bridges with forwarding members, and for specializable classes, we would generate a forwarding member for every method and field whose signature contains type variables (and which therefore would change under erasure), whose descriptor is the erasure of the forwardee descriptor.

Note that if, instead of forwardeeType being a descriptor, we use a constant method handle, we have the semantics of the lambda metafactory when there are no captured values.

> Adaptation
>
> In all the cases so far, the descriptor of the bridge and of the forwardee differ in relatively narrow ways -- the bridge descriptor can be adapted to the forwardee descriptor with a subset of the adaptations performed by MethodHandle::asType. This is adequate for generic and specialization bridges, but as we'll see below, we may want to extend this set.
>
> Conflicts
>
> If a class contains a bridge whose forwardee descriptor matches the bridge descriptor exactly, the bridge is simply discarded. This decision can be made only looking at the forwarding member, since we'll immediately see that the member descriptor and the forwarding descriptor are identical. (Such situations can arise when a class is specialized with the erasure of its type variables.)

You can also have two forwarding descriptors with no real implementation, for example if you implement the same interface twice with two different type arguments. I believe the semantics are exactly the same as the default method semantics.

> Semantics
>
> The linkage semantics of forwarding members are different from that of ordinary members. When linking a field or method access, if the resolved target is a forwarding member, we want to make some adjustments at the invocation site.
>
> For a getfield that links to a forwarding member, we link the access such that it reads the forwardee field, and then adapts the resulting value to the bridge field type, and leaves the adapted value on the stack.
(This is as if the > getfield is linked to the result of taking a field getter method handle, and > adapting it with asType() to the bridge type.) For a putfield , we do the > reverse; we adapt the new field value to the forwardee type, and write to that > field. > If the forwarding member is a method, we re-resolve the method using the > forwardee signature, adapt its parameters as we would for putfield and its > return value as we would for getfield , and invoke the forwardee with the > invocation mode present at the call site . Again, the semantics here are as if > we took a method handle for the forwardee method, using the invocation mode > present at the call site, and adapted it with asType to the bridge descriptor. > The natural interpretation here is that rather than materializing a real field > or method body in the class, we manage the forwarding as part of the linkage > process, and include any necessary adaptations at the access site. The bridge > "body" is never actually invoked; we use the Forwarding metadata to adapt and > re-link the access sites. Note: that asType() implied that you can have a method with a varargs that can be forwarded to a method with no varargs (which is a nice way to implement the java.lang.reflect method invocation). > Bridge loops > The linkage strategy outlined above -- where we truly treat bridges as > forwarding to another member -- is the key to breaking the bridge loops. > Specifying forwarded members means that the JVM can be aware that two methods > are, at some level, the same method; the more complex linkage procedure allows > us to invoke the bridgee with the correct invocation mode all the time, even > under separate compilation. > In our Parent / Child example, Child::clone will do an invokespecial to invoke > Parent::clone()Object , which after recompilation is a bridge to > Parent::clone()Parent . 
> We'll see that this is a bridge, and will forward to Parent::clone()Parent, with an invokespecial, and we'll land in the right place.
>
> The elimination of bridge loops here stems from having raised the level of abstraction in which we render the classfile; we record that Parent::clone()Object is merely a bridge for Parent::clone()Parent, and so any invocation of the former is redirected -- with the same invocation mode -- to the latter. It is as if the client knew to invoke the right method.
>
> User-controlled bridges
>
> The compiler will generate bridges where the language requires it, but we also have the opportunity to enable users to ask for bridges by providing a bridging annotation on the declaration:
>
>     @GenerateBridge(returnType=Object.class)
>     public static String foo() { ... }
>
> This will instruct the compiler to generate an Object-returning method that is a bridge for foo(). This could be done for either fields or methods. (People have written frameworks to do this; see for example http://kohsuke.org/2010/08/07/potd-bridge-method-injector/ ).
>
> Near-future problem: type migration
>
> This mechanism may also be able to help us deal with the case when we want to migrate signatures in an otherwise-incompatible manner, such as changing a method that returns int to return long, or an OptionalInt to Optional, or an old-style Date to the newer LocalDate. Numerous library modernizations (such as migrating from the old date-time libraries to the JSR-310 versions) are blocked on the ability to make such migrations; specializing the core libraries (especially Stream) will also generate such migrations.
>
> Such migrations are a generalization of the sort of bridges we've been discussing here; they involve adding an additional two features:
>
> * Additional adaptations, including user-defined adaptations (such as between Date and LocalDate)
> * Interaction with overriding, so that subclasses that override the old signature can still work properly.
>
> Projection-embedding pairs
>
> Given two types T and U, a projection-embedding pair is a pair of functions p : T → U and e : U → T such that ∀ u ∈ U, p(e(u)) = u, and, if t is in the range of e, then e(p(t)) = t. Examples of useful projection-embedding pairs are the value sets of LV and QV for any value class V (we can embed the entirety of QV in LV, but LV contains one value -- null -- that can't be mapped back), any types T and U where T <: U, int and long (we can embed int in long), and Date and LocalDate. Intuitively, a p-e pair means we can freely map back and forth for the embeddable subset, and we get some sort of failure (e.g., NPE, or range truncation) otherwise.
>
> User-provided adaptations
>
> Many of the adaptations we want to do are handled by MethodHandle::asType: casting, widening, boxing. But sometimes, a migration involves types that require user-provided adaptation behavior, such as converting Date to LocalDate. (Bridges need to do these in both directions; we use the embedding for reads and projection for writes.)
> Here, we can extend the format of the Forwarding attribute to capture this additional behavior as pairs of method handles, such as:
>
>     Forwarding {
>         u2 name;
>         u4 length;
>         u2 forwardeeType;
>         // adaptation metadata
>         u1 pePairs;
>         { u2 projection; u2 embedding; }[pePairs];
>     }
>
> When linking an access site for a forwarding member, when an adaptation is not supported by MethodHandle::asType, we use the user-provided embedding function for adapting return types and field reads, and the projection function for adapting parameter types and field writes.

If instead of using filter methods, you use a bootstrap method, you can do all the adaptations you want (at least the ones provided by the j.l.i package). And if instead of a forward descriptor, you use a constant method handle (or if you send it as a bootstrap constant argument), you have the semantics of what I've called Mjolnir, which has the same expressive power as intrinsics described in Java. And combined with your user-controlled bridge, you have a typesafe macro system.

> Overriding
>
> A more complicated problem is when we want to migrate the signature of an instance member in a non-final class, because the class may have existing subclasses that override the member, and may not yet have been recompiled. For example, we might start with:
>
>     interface Collection {
>         int size();
>     }
>     class ArrayList implements Collection {
>         int size() { return elements.length; }
>     }
>
> Now, we recompile Collection but not ArrayList:
>
>     interface Collection {
>         @TypeMigration(returnType=int.class)
>         long size();
>     }
>
> When we go to load ArrayList, we'll find that it overrides the bridge ( size()int ), and does not override the real method. We'll want to adjust ArrayList as we load it to make up for this.
>
> Half the problem of this migration is addressed by having a forwarding method from size()int to size()long; any legacy clients that call the old signature will be bridged to the new one.
> To further indicate that overrides of such a method should be adjusted, suppose we mark this forwarding bridge with ACC_MIGRATED (in reality, we can probably use ACC_FINAL for this). Now, when we go to load ArrayList, we'll see that size()int is trying to override a migrated method (this is much like the existing override-a-final check). Instead of rejecting the subtype, we use ACC_MIGRATED bridges as a signal to fix up overrides.
>
> We already have all the information in the Forwarding attribute that we need to fix ArrayList::size; we rewrite the descriptor to the forwardee descriptor, use the projection function for adapting argument types, and the embedding function for adapting the return type, and install the result in ArrayList. It is as if we adapted the subclass method with asType to the forwardee descriptor, and installed that in the subclass instead.
>
> The effect is that in the presence of a migrated bridge, the bridge descriptor is a toxic waste zone; callers are redirected to the new descriptor by bridging, and overriders are redirected to the new descriptor by adaptation.

I think this kind of adaptation is better done when the vtable is constructed. I mean conceptually constructed, because I think that in terms of implementation, the VM should insert stubs in the vtable that are resolved lazily the first time a stub is called, by invoking the bootstrap method. This avoids the initialization issue caused by the vtable usually being created very early in the process.
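The "as if we adapted the method with asType" wording used for the size()int to size()long migration can be tried out with today's method handle API. The following is only a sketch of the adaptation step, not of any VM linkage: `AsTypeDemo` and `sizeAsLong` are hypothetical names, and the int-to-long widening is the standard conversion performed by `MethodHandle.asType`.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.List;

// Hypothetical demo class: adapts an int-returning method to a long-returning descriptor.
class AsTypeDemo {
    static long sizeAsLong(List<?> list) {
        try {
            // Handle for the "old" descriptor: (List)int.
            MethodHandle sizeInt = MethodHandles.lookup()
                    .findVirtual(List.class, "size", MethodType.methodType(int.class));
            // Adapt to the "new" descriptor (List)long; asType inserts the int -> long widening.
            MethodHandle sizeLong = sizeInt.asType(MethodType.methodType(long.class, List.class));
            return (long) sizeLong.invokeExact(list);
        } catch (Throwable t) {
            // Method handle invocation declares Throwable; wrap it for this sketch.
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(sizeAsLong(List.of("a", "b", "c")));
    }
}
```

The opposite direction (long back to int) is not something asType will do on its own, which is where the user-provided projection functions in the extended Forwarding attribute would come in.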
Rémi

From brian.goetz at oracle.com Mon Jan 28 21:00:30 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 28 Jan 2019 16:00:30 -0500
Subject: Bridge methods in the VM
In-Reply-To: <1496363553.1351659.1548507956590.JavaMail.zimbra@u-pem.fr>
References: <18977A01-FE12-4A19-A2BB-A352084E969E@oracle.com> <1496363553.1351659.1548507956590.JavaMail.zimbra@u-pem.fr>
Message-ID:

> I'm worried that we are missing the big picture here. Bridging by the VM is one way to patch the vtable; there is another feature we need which is also equivalent to patching the vtable: the where condition, the method specialization where a generic method is replaced by a specific version depending on the value of the type arguments.

We're not missing that picture; we're just separating the two issues.

But, bridging is _not_ just about patching the vtable. As outlined here, it is about _redirecting_ before selection time. Compiler-generated bridges can only operate at selection time; forwarding bridges can also operate at resolution time, eliminating the problem of "stale" bridging code (and enabling bridging for fields.)

> Note that if, instead of forwardeeType being a descriptor, we use a constant method handle, we have the semantics of the lambda metafactory when there are no captured values.

Right, but that only handles the part where you actually want to invoke the bridge. It doesn't address the resolution-time option of not calling that method _at all_, but instead calling a different method. (Method handles already have their invocation mode built in.)
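The remark that method handles have their invocation mode built in can be seen with the existing lookup API: the same method, looked up two ways, yields handles that dispatch differently. This is a small sketch under the usual rules for `findVirtual` and `findSpecial`; `InvocationModeDemo`, `Parent`, `Child`, `viaVirtual` and `viaSpecial` are hypothetical names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical demo class: one method, two handles, two invocation modes.
class InvocationModeDemo {
    static class Parent {
        String name() { return "parent"; }
    }

    static class Child extends Parent {
        @Override
        String name() { return "child"; }

        // invokevirtual-style handle: selects the override based on the receiver.
        static String viaVirtual(Child c) {
            try {
                MethodHandle mh = MethodHandles.lookup()
                        .findVirtual(Parent.class, "name", MethodType.methodType(String.class));
                return (String) mh.invoke(c);
            } catch (Throwable t) {
                throw new IllegalStateException(t);
            }
        }

        // invokespecial-style handle: calls Parent's implementation without virtual selection.
        // The lookup is created inside Child, which is what permits Child as the special caller.
        static String viaSpecial(Child c) {
            try {
                MethodHandle mh = MethodHandles.lookup()
                        .findSpecial(Parent.class, "name", MethodType.methodType(String.class), Child.class);
                return (String) mh.invoke(c);
            } catch (Throwable t) {
                throw new IllegalStateException(t);
            }
        }
    }

    public static void main(String[] args) {
        Child c = new Child();
        System.out.println(Child.viaVirtual(c));
        System.out.println(Child.viaSpecial(c));
    }
}
```

Because the mode is fixed when the handle is created, a handle stored in a class file cannot follow "the invocation mode present at the call site" the way the forwarding proposal describes.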
From karen.kinnear at oracle.com Tue Jan 29 18:55:17 2019
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Tue, 29 Jan 2019 13:55:17 -0500
Subject: Valhalla EG notes Jan 16, 2019
Message-ID:

Attendees: Remi, Tobi, Dan H, John, Brian, Simms, Fred, Karen

Corrections welcome - thank you John for your summary notes

AIs:
Remi - write up why, if we retrofit arrays for the Arrays 2.0 specializable interface, we need covariance
All: find examples where existing code assumes Class.getSuperclass() is java.lang.Object explicitly, or other pain points due to reparenting existing subclasses of Object with RefObject
John: question about RefObject handling please?
  John: val x - new Object() assert x instanceof RefObject;

ed. note - John - I don't get the sentence above - I thought existing explicit java.lang.Objects would stay as supers of RefObject and ValObject, or did I miss context here?

I. Array Covariance

Karen: Is it reasonable to prototype full array covariance for LW2+?
Brian: For generic specialization, if we do not have full array covariance then we would need to move toward Arrays 2.0. The cost of not doing array covariance is worse than the cost of doing it.
Dan H: concern with array covariance - megamorphism performance cliff?
Dan H: Also measure code cache pressure for existing code in the field - key to get this on larger applications, not just microbenchmarks
John: prototype full covariance - get user feedback and quantify performance costs
Brian: Could flesh out Arrays 2.0 path a bit more so we see migration costs with covariance, least cognitive dissonance
Dan H: Arrays 2.0 - retrofit existing arrays to implement an interface?
Brian: Yes - e.g. something like an ArrayInterface specializable
Remi: if we need to retrofit existing arrays we will need covariance

II. acmp

Karen: proposal - reframe as substitutability
  goal: allow existing bytecodes to work with value types - since we have to find a way to handle interface/java.lang.Object ==, != comparisons with dynamic value types anyway
  initial prototype - shows no significant performance loss for specjvm2008, i.e. for existing non-VT code
Dan H: Concern about the cost of recursion if a value type contains fields of its own type
John: yes. Also interface/Object must recurse
Brian: must do recursion based on static type
  "good cases": if no Float/Double and no interface/Object, can do bit comparison: expect mostly memcmp
  (ed. note - and no non-flattened value type fields)
Remi: concern StackOverflowError - e.g. linked list whose next field is an Object - could always just return false
Brian: sharp edge:
  Object o1 = p1;
  Object o2 = p2
  if (o1 == o2) ... // ouch
Karen: if the concern is StackOverflowError - the implementation could be done iteratively
Dan H: IdentityHashMap handling?
Brian: serialization uses - may need to rewrite serialization
Remi: Arrays.toString() uses
Brian: if, for values, IdentityHashMap also uses substitutability - all that are substitutable get lumped together
John: Need to prototype and get Google or others to run on their stack
Brian: Spirit of L-World goal is to explore a design from the user model - i.e. what users will expect vs. current explorations that have mostly been from performance/cost; will need to balance these
Dan H: not sure what users will prefer - if they only want better escape analysis and stack allocation - why are we doing value types?
Brian: flat & dense data if they can make compromises
John: need to be able to throw away identities - leave out, reconstruct, or redefine operations to reabstract to something more than just identity
Remi: what if .equals instead of substitutability?
Brian: author overridable, but less optimizable; recursion in Java code - better SOE backtrace
Dan H: concern about safepoints in .equals
Brian: ValObject.equals must always return the same result. Will need multiple prototyping rounds here
Karen: e.g. always false, substitutability, .equals

III. Spirit of L-World RefObject/ValObject

Dan H: retcon existing Objects to extend RefObject - modify existing or reparent? What will break?
Brian: Reparent - allows generic type constraints - as part of generic specialization side explorations
Remi: Groovy 1 -> Groovy 2 migration challenges: getClass().getSuperclass() assumptions broke
John: val x - new Object() assert x instanceof RefObject; (ed. note - see question under AIs above)
Remi: solving a type issue with a class hierarchy - could we use interfaces?
John: no - we need control over Object.wait() and Object.notify()
Brian: If we were writing a language from scratch, we would put wait/notify on RefObject
  Let's distinguish the model from legacy issues
John: elevate user model into the type system
  e.g. Object.wait - ad hoc handling/JVMS corner case vs. final method that throws
Fred: Change the sources to retcon?
Brian: at class definition time
Remi: existing bytecodes use Object.wait()
Brian: create a final method that throws an exception, so it is transparent in javadoc, makes sense, not magic
  RefObject - rules - e.g. only public members
  ValObject - rules - e.g. no fields
Remi: interfaces do not derive from Object
FP: JVMS: calls via interfaces call public non-static methods of Object
Brian: why classes not interfaces?
  more JLS ad hoc restrictions
  interfaces cannot override Object methods
  interfaces could be a possible fallback
  Arrays are RefObjects
FP: need to double-check JVMS array parent class
Simms: alternative (in chat)
  ValueType extends ValObject
  existing NotValue implement Identity
John: possibly. ValObject is a class. NotValue (RefObject) is a mandatory interface for non-values
  (not the way we would do this if designing from scratch)
  Better to make both classes for symmetry so concrete non-default methods can be placed in both

IV. Pattern match using ValueTypes

Brian: not yet proposed translation strategy - still exploring
  see email thread instead

thanks,
Karen

From forax at univ-mlv.fr Tue Jan 29 19:10:24 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Tue, 29 Jan 2019 20:10:24 +0100 (CET)
Subject: Valhalla EG notes Jan 16, 2019
In-Reply-To:
References:
Message-ID: <1005471793.590171.1548789024403.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Karen Kinnear"
> To: "valhalla-spec-experts"
> Sent: Tuesday, January 29, 2019 19:55:17
> Subject: Valhalla EG notes Jan 16, 2019

> Attendees: Remi, Tobi, Dan H, John, Brian, Simms, Fred, Karen
> Corrections welcome - thank you John for your summary notes
>
> AIs:
> Remi - write up why if we retrofit arrays for Arrays 2.0 specializable
> interface,
> do we need covariance?
> All: find examples where existing code assumes Class.getSuperClass() is
> java.lang.Object explicitly, or other pain points due to reparenting existing
> subclasses of Object with RefObject
> John: question about RefObject handling please?
> John: val x - new Object() assert x instanceof RefObject;
>
> ed. note - John - I don't get the sentence above - I thought existing explicit
> java.lang.Objects
> would stay as supers of RefObject and ValObject, or did I miss context here?

Hi Karen,
currently the result of the expression "new Object()" is a reference type, so it should be a RefObject, but we have created an Object, not a RefObject, so it's at best weird.

[...]
>
> thanks,
> Karen

Rémi

From brian.goetz at oracle.com Tue Jan 29 20:27:35 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 29 Jan 2019 15:27:35 -0500
Subject: Valhalla EG notes Jan 16, 2019
In-Reply-To:
References:
Message-ID: <56a03e73-62f9-7e96-519c-bc8ee512f3d4@oracle.com>

On 1/29/2019 1:55 PM, Karen Kinnear wrote:
> Brian: Could flesh out Arrays 2.0 path a bit more so we see migration costs
> with covariance, least cognitive dissonance

As promised....

The motivation for all this is driven by the desire to compatibly migrate existing generic libraries (JDK and external both) to Valhalla. Currently, generic code has the following assumptions regarding arrays:

 - a T[] is always a subtype of Object[]
 - T[] in descriptors is erased to Bound[], usually Object[]
 - Erased descriptors show up in the wildcard type.

The following all assumes we're not heading down the array covariance path, and instead charts a different path.

When we allow generics to range over values, we have a problem: a Point[] is not an Object[]. That means a class like:

```
class Foo<T> {
    T[] getArray() { ... }
}
```

has a problem; if the wildcard type `Foo<?>` has its `getArray()` returning `Object[]`, but the underlying array is really a `Point[]`, we can't cast it, and we can't copy it without violating user expectations (for one, the identity changes.)

So the Arrays 2.0 direction here says we need a new top type for arrays. Let's call it Array. In the long run, `Array` will be a specializable generic type:

```
interface Array<T> { ... }
```

We'll retrofit all the existing array classes to implement Array, and we'll change the compiler translation to erase arrays to `Array` or `Array`, not `Object[]`. (Sadly, because Array will not yet be specializable, the best we can do for now is retrofit value and primitive arrays to be subtypes of raw `Array`, rather than the nicer `Array` or `Array`.)
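
The first assumption in the list above (every reference `T[]` is an `Object[]`) can be observed in today's Java. The sketch below is only an illustration under stated assumptions: `int[]` stands in for a flattened value array such as the `Point[]` in the message, since value types are not available in released JDKs, and the class and method names are hypothetical.

```java
public class ArrayCovarianceDemo {
    // Returns true if storing an Integer through an Object[] view of a
    // String[] is rejected by the runtime array store check.
    static boolean storeCheckFails() {
        Object[] view = new String[1]; // legal: String[] is a subtype of Object[]
        try {
            view[0] = Integer.valueOf(42);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // Primitive arrays sit outside the Object[] hierarchy today; this is the
    // same gap a flattened value array would fall into without covariance.
    static boolean intArrayIsObjectArray() {
        Object a = new int[1];
        return a instanceof Object[];
    }

    public static void main(String[] args) {
        System.out.println(storeCheckFails());       // true
        System.out.println(intArrayIsObjectArray()); // false
    }
}
```

Erased generic code that returns `Object[]` relies on the first behavior and has no answer for the second, which is why the choice is between extending covariance and introducing a new array top type.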
But now we have a problem: we have existing code that erases arrays to `Object[]`, but after the flag day, such code will erase to Array instead. Our solution for that problem is the full-blown solution to signature migration -- bridge methods, forwarding methods, and all the fancy handling of overriding "final" bridges as outlined in my Bridge Methods doc. We'd have to emit bridges for every place we erase arrays to Array, which will take/return Object[] instead.

And, I could easily imagine migration pain, even with all this....

So, scorecard:

 - We'd have to have a new top type for arrays, but one that we can't fully make generic yet (since that requires specialization.)
 - We'd have to do the full support for signature migration -- forwarding bridges for fields and methods, plus support for moving overrides out of the way when they override a "final" forwarding bridge.
 - We'd have to change how the compiler translates arrays in generic code, and issue bridges to fix up the difference.

These are all things we'd like eventually, but we have to do them all now if we don't want to do covariance now.

From john.r.rose at oracle.com Tue Jan 29 23:29:01 2019
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 29 Jan 2019 15:29:01 -0800
Subject: Valhalla EG notes Jan 16, 2019
In-Reply-To: <1005471793.590171.1548789024403.JavaMail.zimbra@u-pem.fr>
References: <1005471793.590171.1548789024403.JavaMail.zimbra@u-pem.fr>
Message-ID: <5EDCDCBC-7C90-451F-ADFE-53655893E39C@oracle.com>

On Jan 29, 2019, at 11:10 AM, Remi Forax wrote:
>
> currently the result of the expression "new Object()" is a reference type, so it should be a RefObject, but we have created an Object not a RefObject,
> so it's at best weird.

I'd like to rationalize this in two steps.

First, allow `new I(x...)` where `I` is an interface, to be treated as shorthand for `I.F(x...)` where the method `F` is somehow declared by `I` as its canonical factory. I'm thinking `List.of` is a good one.
(Maybe also extend this rule to classes with non-public constructors.)

Second, since `Object` is an honorary interface, change the meaning of `new Object()` to be `Object.newReference()` (or some such), by having `Object` declare `newReference` (of no arguments) as its canonical factory.

Moving `new` statements to factories is coherent, also, with changing the translation strategy for Java to deprecate the new/init dance outside of the class being constructed, and eventually make it illegal in bytecode outside of the nest of the class being made. In other words, if I could go back in a time machine and rewrite the translation strategy, I'd insist that each class (or the JVM itself) would define a canonical factory for each constructor of that class, and require all other classes to allocate via the canonical factory. The new/init dance would be legal inside the class but nowhere else. That attack surface has been a painful one. And value types have to use factories from the get-go, so we've got to figure it out sooner or later. The name of the canonical factory can be, in fact, fixed as ''.

-- John

From john.r.rose at oracle.com Wed Jan 30 00:20:50 2019
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 29 Jan 2019 16:20:50 -0800
Subject: Valhalla EG notes Jan 16, 2019
In-Reply-To: <56a03e73-62f9-7e96-519c-bc8ee512f3d4@oracle.com>
References: <56a03e73-62f9-7e96-519c-bc8ee512f3d4@oracle.com>
Message-ID:

On Jan 29, 2019, at 12:27 PM, Brian Goetz wrote:
>
> But, now we have a problem: we have existing code that erases arrays to `Object[]`, but after the flag day, such code will erase to Array instead.

Maybe, another way to address this, besides bridging distinct descriptors, is to declare that the old descriptors describe the new types.

After all, the old descriptors will be dead (to Java) after the translation strategy stops using them. Make all descriptors of the form `[T` erase to `Array` or some subtype.
This is a lossy translation, which means the interpreter will have to do extra runtime type checks on xaload/xastore instructions. The JIT probably won't care. The scheme might shipwreck on reflection, since we need to distinguish all the array classes reflectively, but maybe that's where Crasses (runtime class mirrors) come in handy, to make that distinction. Put another way, int[] and String[] and Object[] are mirrored by Crasses, not proper classes.

Maybe there's a solution along this path. Or maybe the floor gives way at some point and you are plunged into the basement where the monsters are.

-- John

From forax at univ-mlv.fr Wed Jan 30 15:05:51 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 30 Jan 2019 16:05:51 +0100 (CET)
Subject: Valhalla EG notes Jan 16, 2019
In-Reply-To:
References: <56a03e73-62f9-7e96-519c-bc8ee512f3d4@oracle.com>
Message-ID: <397968322.798226.1548860751756.JavaMail.zimbra@u-pem.fr>

> From: "John Rose"
> To: "Brian Goetz"
> Cc: "valhalla-spec-experts"
> Sent: Wednesday, January 30, 2019 01:20:50
> Subject: Re: Valhalla EG notes Jan 16, 2019

> On Jan 29, 2019, at 12:27 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

>> But, now we have a problem: we have existing code that erases arrays to `Object[]`,
>> but after the flag day, such code will erase to Array instead.

> Maybe, another way to address this, besides bridging distinct
> descriptors, is to declare that the old descriptors describe
> the new types.

> After all, the old descriptors will be dead (to Java) after the
> translation strategy stops using them. Make all descriptors
> of the form `[T` erase to `Array` or some subtype.

> This is a lossy translation, which means the interpreter will have
> to do extra runtime type checks on xaload/xastore instructions.
> The JIT probably won't care.
> The scheme might shipwreck
> on reflection, since we need to distinguish all the array classes
> reflectively, but maybe that's where Crasses (runtime class
> mirrors) come in handy, to make that distinction. Put another
> way, int[] and String[] and Object[] are mirrored by Crasses,
> not proper classes.

yes,
  String[].class == Array.class.getSpecialization(String.class)

Note that even if String is a reference type, here we still want the specialization, unlike for generics, so an Array is a special kind of generic type that requires full reification.

> Maybe there's a solution along this path. Or maybe the floor
> gives way at some point and you are plunged into the basement
> where the monsters are.

the risk is more a Frankenstein's monster than Cthulhu IMO

> -- John

Rémi

From forax at univ-mlv.fr Wed Jan 30 15:21:47 2019
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Wed, 30 Jan 2019 16:21:47 +0100 (CET)
Subject: Valhalla EG notes Jan 16, 2019
In-Reply-To: <5EDCDCBC-7C90-451F-ADFE-53655893E39C@oracle.com>
References: <1005471793.590171.1548789024403.JavaMail.zimbra@u-pem.fr> <5EDCDCBC-7C90-451F-ADFE-53655893E39C@oracle.com>
Message-ID: <1400545288.804352.1548861707348.JavaMail.zimbra@u-pem.fr>

> From: "John Rose"
> To: "Remi Forax"
> Cc: "Karen Kinnear", "valhalla-spec-experts"
> Sent: Wednesday, January 30, 2019 00:29:01
> Subject: Re: Valhalla EG notes Jan 16, 2019

> On Jan 29, 2019, at 11:10 AM, Remi Forax <forax at univ-mlv.fr> wrote:

>> currently the result of the expression "new Object()" is a reference type, so it
>> should be a RefObject, but we have created an Object not a RefObject,
>> so it's at best weird.

> I'd like to rationalize this in two steps.

> First, allow `new I(x...)` where `I` is an interface,
> to be treated as shorthand for `I.F(x...)` where
> the method `F` is somehow declared by `I` as
> its canonical factory. I'm thinking `List.of` is
> a good one.
> (Maybe also extend this rule to classes
> with non-public constructors.)

> Second, since `Object` is an honorary interface,
> change the meaning of `new Object()` to be
> `Object.newReference()` (or some such), by
> having `Object` declare `newReference` (of
> no arguments) as its canonical factory.

> Moving `new` statements to factories is coherent,
> also, with changing the translation strategy for Java
> to deprecate the new/init dance outside of the class
> being constructed, and eventually make it illegal in
> bytecode outside of the nest of the class being made.
> In other words, if I could go back in a time machine
> and rewrite the translation strategy, I'd insist that
> each class (or the JVM itself) would define a canonical
> factory for each constructor of that class, and require
> all other classes to allocate via the canonical factory.
> The new/init dance would be legal inside the class
> but nowhere else. That attack surface has been a
> painful one. And value types have to use factories
> from the get-go, so we've got to figure it out sooner
> or later. The name of the canonical factory can be,
> in fact, fixed as ''.

so new Object() <==> Object.new() and

  class Object {
    ...
    public static Object new() { return ReferenceObject.new(); }
  }

and the translation of new Object() to Object.new() has to be done by the VM, because there is already a lot of code that does a new Object().

> -- John

I think I still prefer having interfaces Val/Ref instead of classes. These interfaces can be injected by the VM; at least the Ref interface can be injected, and for the Val interface, we can require all value types to implement that interface.

Another question: we have two representations of a value class, the real value class and the nullable value class, the Q and the L classes, Point.val and Point.box. Is Point.box a subtype of Val/ValObject?
Rémi

From brian.goetz at oracle.com Wed Jan 30 15:28:05 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 30 Jan 2019 10:28:05 -0500
Subject: Valhalla EG notes Jan 16, 2019
In-Reply-To: <1400545288.804352.1548861707348.JavaMail.zimbra@u-pem.fr>
References: <1005471793.590171.1548789024403.JavaMail.zimbra@u-pem.fr> <5EDCDCBC-7C90-451F-ADFE-53655893E39C@oracle.com> <1400545288.804352.1548861707348.JavaMail.zimbra@u-pem.fr>
Message-ID: <6E04E969-2D55-40CF-A204-401A2D3DE82A@oracle.com>

> I think i still prefer having interfaces Val/Ref instead of classes, these interfaces can be injected by the VM, at least the Ref interface can be injected, for the Val interface, we can requires all value types to implement that interface.

I view using interfaces instead of classes here as a fallback; it's more twitchy (VM now has to ensure that nothing implements both), it fails to "paint the world as it is" (which I think has tremendous educational value), but most importantly, it fails to give us an opportunity to put final methods in {Ref,Val}Object.

For example, suppose we handled wait/notify by making ValObject.wait() be a final method which throws. This lets us manage the constraint using tools we already have -- inheritance and final methods -- rather than inventing yet more special rules for what values can and cannot do. The more we can express in the object model users are familiar with, the more users will be able to immediately understand what is going on.

Now, maybe injecting superclasses will be so problematic that we give up, and go with interfaces instead. That's OK, but let's be clear about what we're doing -- that's the "we gave up because the right thing was too hard" route.
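
The "final method which throws" idea in the message above can be sketched with ordinary classes. This is an illustration only, under loud assumptions: `Top`, `RefBase`, and `ValBase` are hypothetical stand-ins for `Object`, `RefObject`, and `ValObject`, and `park()` stands in for `wait()`, which cannot be redeclared in user code because `Object.wait()` is already final.

```java
public class ValObjectSketch {
    // Hypothetical top type standing in for Object; declares the operation.
    static abstract class Top {
        abstract String park();
    }

    // Stand-in for RefObject: identity-bearing classes get a working version.
    static abstract class RefBase extends Top {
        @Override
        String park() { return "parked on monitor"; }
    }

    // Stand-in for ValObject: the operation is final and always throws, so
    // the restriction is visible in the class hierarchy, not in special rules.
    static abstract class ValBase extends Top {
        @Override
        final String park() {
            throw new IllegalMonitorStateException("values have no monitor");
        }
    }

    static final class Point extends ValBase {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        // Declaring park() here would be a compile-time error: it is final in ValBase.
    }

    static final class Account extends RefBase { }

    // Calls park() through the common top type and reports the outcome.
    static String tryPark(Top t) {
        try {
            return t.park();
        } catch (IllegalMonitorStateException e) {
            return "rejected: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryPark(new Account()));   // parked on monitor
        System.out.println(tryPark(new Point(1, 2))); // rejected: values have no monitor
    }
}
```

With interfaces in place of `RefBase` and `ValBase`, a default method could not be made final, so the guarantee that no value class overrides `park()` would have to come from a rule outside the ordinary inheritance model.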
From karen.kinnear at oracle.com Wed Jan 30 15:46:52 2019
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Wed, 30 Jan 2019 10:46:52 -0500
Subject: Valhalla EG notes Jan 16, 2019
In-Reply-To: <1400545288.804352.1548861707348.JavaMail.zimbra@u-pem.fr>
References: <1005471793.590171.1548789024403.JavaMail.zimbra@u-pem.fr> <5EDCDCBC-7C90-451F-ADFE-53655893E39C@oracle.com> <1400545288.804352.1548861707348.JavaMail.zimbra@u-pem.fr>
Message-ID: <89434A64-6615-43CA-88D8-689F3F21B34C@oracle.com>

I see two options proposed:

Option 1: RefObject, ValObject - classes
  VM at class definition time replaces superclass of all existing classes from Object -> RefObject
  VM translates: new Object -> treated as Object.new() { returns RefObject.new() } // assume old code only wants references here

  risk: existing getClass().getSuperclass() assumptions

  Questions:
  How would the VM translate new Object[]?
    - we might want existing code to be able to handle all subtypes of the top type in the existing arrays - or erased generics will break
  If I wanted to instantiate a new top type which could hold either RefObject or ValObject, how would I do that? Object.new()? Object.newObject()

Option 2: RefObject, ValObject - interfaces
  VM at class definition adds RefObject to all existing classes
  value classes required to declare ValObject as superinterface

  awkwardness: VM needs special behaviors for Object.wait, Object.notify - if dynamically a ValObject, throw IMSE
  instead of having that obvious from the implementation in a superclass.

  Are there other concerns here?
  This seems cleaner to me - especially since I also believe that synchronization on ValObjects will also require special handling - which will be in VM implementations rather than visible in Java sources (exact plan is TBD).
Remi - L/Q signatures are orthogonal to the type hierarchy - they declare null-free or null-tolerant, so Point.val and Point.box are both subclasses of ValObject

thanks,
Karen

From brian.goetz at oracle.com Wed Jan 30 16:01:24 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 30 Jan 2019 11:01:24 -0500
Subject: Valhalla EG notes Jan 16, 2019
In-Reply-To: <89434A64-6615-43CA-88D8-689F3F21B34C@oracle.com>
References: <1005471793.590171.1548789024403.JavaMail.zimbra@u-pem.fr> <5EDCDCBC-7C90-451F-ADFE-53655893E39C@oracle.com> <1400545288.804352.1548861707348.JavaMail.zimbra@u-pem.fr> <89434A64-6615-43CA-88D8-689F3F21B34C@oracle.com>
Message-ID: <99CF3462-E2D1-4FF8-9D69-2642067E6EDF@oracle.com>

> Option 2: RefObject, ValObject - interfaces
> VM at class definition adds RefObject to all existing classes
> value classes required to declare ValObject as superinterface
>
> awkwardness: VM needs special behaviors for Object.wait, Object.notify - if dynamically a ValObject, throw IMSE
> instead of having that obvious from the implementation in a superclass.
>
> Are there other concerns here?
> This seems cleaner to me - especially since I also believe that synchronization on ValObjects will also require
> special handling - which will be vm implementations rather than visible in java sources (exact plan is TBD).
As I said to Remi, let's be crystal clear that this seems cleaner _only to VM implementors_. To everyone else, it's less clean, less powerful, less educational, and more complicated ("why can't I implement both RefObject and ValObject?") If we bash our heads against the former for a while and give up, that's OK; we can't solve all the problems all the time. But let's be clear what we're doing, and why.

There's also a middle ground: ValObject is a class, and RefObject is an interface. While this may appear stupid, it reclaims more than half of what we lose by making both interfaces -- because it gives us a place to put privileged, final, value-specific behaviors. And values are going to need more of them than references will.

From forax at univ-mlv.fr Thu Jan 31 11:19:32 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Thu, 31 Jan 2019 12:19:32 +0100 (CET)
Subject: An example of substituability test that is recursive
Message-ID: <999562327.975266.1548933572641.JavaMail.zimbra@u-pem.fr>

Hi Karen,
here is an example that recurses to its death with the current prototype

  import java.lang.invoke.ValueBootstrapMethods;
  import java.util.stream.IntStream;

  public class Substituable {
    static value class Link {
      private final int value;
      private final Object next;

      public Link(int value, Object next) {
        this.value = value;
        this.next = next;
      }

      static Object times(int count) {
        return IntStream.range(0, count).boxed().reduce(null, (acc, index) -> new Link(index, acc), (l1, l2) -> { throw null; });
      }
    }

    public static void main(String[] args) {
      var l = Link.times(1_000);

      //System.out.println(l == l);
      System.out.println(ValueBootstrapMethods.isSubstitutable(l, l));
    }
  }

Rémi

From karen.kinnear at oracle.com Thu Jan 31 13:57:32 2019
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Thu, 31 Jan 2019 08:57:32 -0500
Subject: An example of substituability test that is recursive
In-Reply-To: <999562327.975266.1548933572641.JavaMail.zimbra@u-pem.fr>
References:
 <999562327.975266.1548933572641.JavaMail.zimbra@u-pem.fr>
Message-ID: <3712C9F6-844F-45B7-9C14-283443398645@oracle.com>

Remi,

Thank you. So there were two kinds of "recurse to its death" we talked about
1) expected behavior
2) surprise

This strikes me as the expected behavior - where we can set expectations.

If we were to always return false - how would you make this kind of example work?

thanks,
Karen

From forax at univ-mlv.fr Thu Jan 31 14:34:31 2019
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Thu, 31 Jan 2019 15:34:31 +0100 (CET)
Subject: An example of substituability test that is recursive
In-Reply-To: <3712C9F6-844F-45B7-9C14-283443398645@oracle.com>
References: <999562327.975266.1548933572641.JavaMail.zimbra@u-pem.fr> <3712C9F6-844F-45B7-9C14-283443398645@oracle.com>
Message-ID: <2112638685.1026191.1548945271037.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Karen Kinnear"
> To: "Remi Forax"
> Cc: "valhalla-spec-experts"
> Sent: Thursday, January 31, 2019 14:57:32
> Subject: Re: An example of substituability test that is recursive

> Remi,
>
> Thank you. So there were two kinds of "recurse to its death"
> we talked about
> 1) expected behavior
> 2) surprise
>
> This strikes me as the expected behavior - where we can set expectations.

I disagree that it's the expected behavior; that is like saying that a GC crashing because a linked list is too long is the expected behavior. The expected behavior is to cut the recursive calls if the recursion is too deep and restart once you have popped all the frames.

The other solution is to say that == should do an upcall to equals (after the null check and the class check); if equals throws a StackOverflowError, it's the expected behavior because the user is in control of that behavior.

>
> If we were to always return false - how would you make this kind of example
> work?

no

>
> thanks,
> Karen

Rémi

From forax at univ-mlv.fr Thu Jan 31 16:53:43 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Thu, 31 Jan 2019 17:53:43 +0100 (CET)
Subject: An example of substituability test that is recursive
In-Reply-To: <999562327.975266.1548933572641.JavaMail.zimbra@u-pem.fr>
References: <999562327.975266.1548933572641.JavaMail.zimbra@u-pem.fr>
Message-ID: <1648738568.1064824.1548953623242.JavaMail.zimbra@u-pem.fr>

Thinking a little more
about this example, I think it will be more common if we retrofit lambdas to be value types, because a series of compositions of lambdas is a kind of linked list in terms of data structure in memory.

For the composition of lambdas, a stack overflow is unlikely, because otherwise calling the lambda would overflow the stack too, but it means that == will be slow (because it does a recursive comparison).

Rémi

From brian.goetz at oracle.com Thu Jan 31 16:56:47 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 31 Jan 2019 11:56:47 -0500
Subject: An example of substituability test that is recursive
In-Reply-To: <1648738568.1064824.1548953623242.JavaMail.zimbra@u-pem.fr>
References: <999562327.975266.1548933572641.JavaMail.zimbra@u-pem.fr> <1648738568.1064824.1548953623242.JavaMail.zimbra@u-pem.fr>
Message-ID: <09709FA1-7EFD-49E7-A843-7A8CC45E252C@oracle.com>

Currently, `==` is almost useless on lambdas, as we disclaim nearly all promises. What this would mean is that `==` becomes slightly less useless and slightly more expensive.
It's not obvious this is a bad trade (or that it really matters, because people are discouraged from using `==` on lambdas anyway.)

From forax at univ-mlv.fr Thu Jan 31 17:38:51 2019
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Thu, 31 Jan 2019 18:38:51 +0100 (CET)
Subject: An example of substituability test that is recursive
In-Reply-To: <09709FA1-7EFD-49E7-A843-7A8CC45E252C@oracle.com>
References: <999562327.975266.1548933572641.JavaMail.zimbra@u-pem.fr>
<1648738568.1064824.1548953623242.JavaMail.zimbra@u-pem.fr> <09709FA1-7EFD-49E7-A843-7A8CC45E252C@oracle.com>
Message-ID: <207294304.1072029.1548956331346.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Brian Goetz"
> To: "Remi Forax"
> Cc: "Karen Kinnear", "valhalla-spec-experts"
> Sent: Thursday, January 31, 2019 17:56:47
> Subject: Re: An example of substituability test that is recursive

> Currently, `==` is almost useless on lambdas, as we disclaim nearly all
> promises. What this would mean is that `==` becomes slightly less useless and
> slightly more expensive. It's not obvious this is a bad trade (or that it
> really matters, because people are discouraged from using `==` on lambdas
> anyway.)
>

I agree.
and it's still useless because there is no guarantee that with
  Runnable r1 = () -> {};
  Runnable r2 = () -> {};
r1 and r2 use the same proxy, so r1 == r2 can still return false.

I'm just saying that recursive value types are more frequent than what i was thinking before.

Rémi
From john.r.rose at oracle.com Thu Jan 31 18:03:13 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 31 Jan 2019 10:03:13 -0800
Subject: An example of substituability test that is recursive
In-Reply-To: <999562327.975266.1548933572641.JavaMail.zimbra@u-pem.fr>
References: <999562327.975266.1548933572641.JavaMail.zimbra@u-pem.fr>
Message-ID: <3273D08D-4505-4062-84D7-3AEC89A55B97@oracle.com>

On Jan 31, 2019, at 3:19 AM, Remi Forax wrote:
>
> here is an example that recurse to its death with the current prototype

Fun fact: Change the Link to a Tree and you go from linear to exponential in the depth. *Just* a fun fact; it doesn't change Remi's point, which is that we can construct value object instances that have large "interiors".

(Definition of the day: The "interior" of a value object instance is the set of variables that determine its substitutability equality and substitutability hash.)
To me this takes on a different shade of urgency when I think about turning arrays into values. Suppose we had Arrays.valueCopyOf to take an immutable value-typed snapshot of an array. Very useful! (Sort of like frozen arrays.) You can make a size 1_000 value-array very quickly and easily, and its interior would be as large as Remi's laboriously constructed list.

From john.r.rose at oracle.com Thu Jan 31 18:05:33 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 31 Jan 2019 10:05:33 -0800
Subject: An example of substituability test that is recursive
In-Reply-To: <2112638685.1026191.1548945271037.JavaMail.zimbra@u-pem.fr>
References: <999562327.975266.1548933572641.JavaMail.zimbra@u-pem.fr> <3712C9F6-844F-45B7-9C14-283443398645@oracle.com> <2112638685.1026191.1548945271037.JavaMail.zimbra@u-pem.fr>
Message-ID: <1790611A-F656-4ACC-B2CD-8D6F88BBE636@oracle.com>

On Jan 31, 2019, at 6:34 AM, forax at univ-mlv.fr wrote:
>
> The other solution is to say that == should do an upcall to equals (after the null checking and the class checking), if equals throw a StackOverflow, it's the expected behavior because the user is in control of that behavior.

What you are doing here, I think, is exposing a requirement that we *don't* use the control stack for recursion on subst. testing (or hashing). That's a reasonable requirement. It leads to a worklist algorithm for doing this tricky thing, just like we had to do many times in the JIT.
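
A minimal sketch of the worklist idea mentioned above, under stated assumptions: `Link` here is a plain final class standing in for the prototype's `value class Link` (value classes are not part of standard Java), and `isSubstitutable` is a hypothetical helper written for this illustration, not the prototype's `ValueBootstrapMethods.isSubstitutable`. Pending pairs of fields are kept in a heap-allocated deque rather than in stack frames, so the depth of the chain does not turn into control-stack depth.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class WorklistSubst {
    // Stand-in for the thread's `value class Link` (assumption: an ordinary
    // final class, since value classes do not exist in standard Java).
    static final class Link {
        final int value;
        final Object next;
        Link(int value, Object next) { this.value = value; this.next = next; }
    }

    // Hypothetical substitutability test for Link chains. Pairs that still
    // need comparing are pushed on an explicit deque instead of recursing.
    static boolean isSubstitutable(Object a, Object b) {
        Deque<Object[]> worklist = new ArrayDeque<>();
        worklist.push(new Object[] { a, b });
        while (!worklist.isEmpty()) {
            Object[] pair = worklist.pop();
            Object x = pair[0], y = pair[1];
            if (x == y) continue;                      // same reference, or both null
            if (x == null || y == null) return false;
            if (x instanceof Link && y instanceof Link) {
                Link lx = (Link) x, ly = (Link) y;
                if (lx.value != ly.value) return false;
                worklist.push(new Object[] { lx.next, ly.next });
            } else {
                return false;                          // distinct identity objects
            }
        }
        return true;
    }

    // Iterative equivalent of the thread's Link.times(count).
    static Object times(int count) {
        Object acc = null;
        for (int i = 0; i < count; i++) acc = new Link(i, acc);
        return acc;
    }

    public static void main(String[] args) {
        Object l1 = times(200_000);
        Object l2 = times(200_000);
        System.out.println(isSubstitutable(l1, l2));              // prints "true"
        System.out.println(isSubstitutable(l1, times(199_999)));  // prints "false"
    }
}
```

The two chains are built separately so that the reference-equality shortcut does not short-circuit the whole comparison. The sketch only removes the stack-depth problem; the running time is still proportional to the number of pairs visited.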
From forax at univ-mlv.fr Thu Jan 31 18:41:48 2019
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Thu, 31 Jan 2019 19:41:48 +0100 (CET)
Subject: An example of substituability test that is recursive
In-Reply-To: <3273D08D-4505-4062-84D7-3AEC89A55B97@oracle.com>
References: <999562327.975266.1548933572641.JavaMail.zimbra@u-pem.fr> <3273D08D-4505-4062-84D7-3AEC89A55B97@oracle.com>
Message-ID: <1784099434.1077554.1548960108830.JavaMail.zimbra@u-pem.fr>

> From: "John Rose"
> To: "Remi Forax"
> Cc: "Karen Kinnear", "valhalla-spec-experts"
> Sent: Thursday, January 31, 2019 19:03:13
> Subject: Re: An example of substituability test that is recursive

> On Jan 31, 2019, at 3:19 AM, Remi Forax <forax at univ-mlv.fr> wrote:
>> here is an example that recurse to its death with the current prototype

> Fun fact: Change the Link to a Tree and you go from
> linear to exponential in the depth. *Just* a fun fact;
> it doesn't change Remi's point, which is that we can
> construct value object instances that have large
> "interiors".

you mean like this:

  static value class Link {
    private final int value;
    private final Object next;
    private final Object next2;

    public Link(int value, Object next) {
      this.value = value;
      this.next = next;
      this.next2 = next;
    }
  }

yes, it creates a DAG that will be too long to traverse :(
you basically DDOS yourself if you do a ==.

> (Definition of the day: The "interior" of a value
> object instance is the set of variables that determine
> its substitutability equality and substitutability hash.)

> To me this takes on a different shade of urgency
> when I think about turning arrays into values.
> Suppose we had Arrays.valueCopyOf to take an
> immutable value-typed snapshot of an array.
> Very useful! (Sort of like frozen arrays.) You
> can make a size 1_000 value-array very quickly
> and easily, and its interior would be as large
> as Remi's laboriously constructed list.
Facebook's Skip [1] has an operation like this: data structures are mutable inside a function (for the runtime; from the user's POV everything is immutable) and frozen when published outside.

[1] http://www.skiplang.com/

Rémi

From forax at univ-mlv.fr Thu Jan 31 18:46:30 2019
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Thu, 31 Jan 2019 19:46:30 +0100 (CET)
Subject: An example of substituability test that is recursive
In-Reply-To: <1790611A-F656-4ACC-B2CD-8D6F88BBE636@oracle.com>
References: <999562327.975266.1548933572641.JavaMail.zimbra@u-pem.fr> <3712C9F6-844F-45B7-9C14-283443398645@oracle.com> <2112638685.1026191.1548945271037.JavaMail.zimbra@u-pem.fr> <1790611A-F656-4ACC-B2CD-8D6F88BBE636@oracle.com>
Message-ID: <433488857.1077959.1548960390308.JavaMail.zimbra@u-pem.fr>

> From: "John Rose"
> To: "Remi Forax"
> Cc: "Karen Kinnear", "valhalla-spec-experts"
> Sent: Thursday, January 31, 2019 19:05:33
> Subject: Re: An example of substituability test that is recursive

> On Jan 31, 2019, at 6:34 AM, forax at univ-mlv.fr wrote:
>> The other solution is to say that == should do an upcall to equals (after the
>> null checking and the class checking), if equals throw a StackOverflow, it's
>> the expected behavior because the user is in control of that behavior.

> What you are doing here, I think, is exposing a requirement
> that we *don't* use the control stack for recursion on subst.
> testing (or hashing). That's a reasonable requirement.
> It leads to a worklist algorithm for doing this tricky thing,
> just like we had to do many times in the JIT.

IMO that's the other solution,
solution 1: you use a worklist (and also perhaps a marking algorithm to avoid re-crawling the DAG)
solution 2: you claim it's too complex and you just let the user deal with it by calling equals() (and provide a way for a user to call the default subst).
Rémi

From brian.goetz at oracle.com Thu Jan 31 18:52:34 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 31 Jan 2019 13:52:34 -0500
Subject: An example of substituability test that is recursive
In-Reply-To: <433488857.1077959.1548960390308.JavaMail.zimbra@u-pem.fr>
References: <999562327.975266.1548933572641.JavaMail.zimbra@u-pem.fr> <3712C9F6-844F-45B7-9C14-283443398645@oracle.com> <2112638685.1026191.1548945271037.JavaMail.zimbra@u-pem.fr> <1790611A-F656-4ACC-B2CD-8D6F88BBE636@oracle.com> <433488857.1077959.1548960390308.JavaMail.zimbra@u-pem.fr>
Message-ID:

Let's not do this to Java, shall we?

https://twitter.com/seldo/status/1090931182227861508

> solution 2: you claim it's too complex and you just let the user deal with it by calling equals() (and provide a way for a user to call the default subs).

From john.r.rose at oracle.com Thu Jan 31 19:43:33 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 31 Jan 2019 11:43:33 -0800
Subject: An example of substituability test that is recursive
In-Reply-To: <1784099434.1077554.1548960108830.JavaMail.zimbra@u-pem.fr>
References: <999562327.975266.1548933572641.JavaMail.zimbra@u-pem.fr> <3273D08D-4505-4062-84D7-3AEC89A55B97@oracle.com> <1784099434.1077554.1548960108830.JavaMail.zimbra@u-pem.fr>
Message-ID: <0BE3D92C-E0A2-464F-B2DB-D628CE6FE0B4@oracle.com>

On Jan 31, 2019, at 10:41 AM, forax at univ-mlv.fr wrote:
>
> yes, it creates a DAG that will be too long to traverse :(
> you basically DDOS yourself if you do a ==.

The complexity is exponential in depth, and can be more than linear in heap allocation, because of the risk of repeat traversals. A worklist algorithm could make use of the secret implementation identity of heap nodes to push the complexity back down to heap allocation size. A portable algorithm could not. This is one more bit of evidence it should be a system intrinsic rather than a vanilla library function.
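
A small sketch of the repeat-traversal cost discussed above, with its assumptions spelled out: the two-field `Link` is modeled as an ordinary identity class, and `IdentityHashMap` plays the role of the "secret implementation identity" of heap nodes. Real value objects would expose no such identity to Java code, which is the reason a portable library version could not use this trick. The helper `countPairs` is hypothetical and only counts how many pairs of nodes a comparison expands.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

class DagSubst {
    // Stand-in for the two-field Link: both fields point at the same child,
    // so a chain of depth d unfolds into a tree with 2^d - 1 nodes.
    static final class Link {
        final int value;
        final Object next, next2;
        Link(int value, Object next) { this.value = value; this.next = next; this.next2 = next; }
    }

    static Object build(int depth) {
        Object acc = null;
        for (int i = 0; i < depth; i++) acc = new Link(i, acc);
        return acc;
    }

    // Compares a and b field by field with a worklist and returns the number
    // of Link pairs expanded, or -1 if the structures differ. With memoize
    // set, a pair of heap nodes that was already seen is not expanded again.
    static long countPairs(Object a, Object b, boolean memoize) {
        Map<Object, Set<Object>> seen = new IdentityHashMap<>();
        Deque<Object[]> worklist = new ArrayDeque<>();
        worklist.push(new Object[] { a, b });
        long expanded = 0;
        while (!worklist.isEmpty()) {
            Object[] p = worklist.pop();
            Object x = p[0], y = p[1];
            if (x == null && y == null) continue;
            if (!(x instanceof Link) || !(y instanceof Link)) return -1;
            if (memoize && !seen
                    .computeIfAbsent(x, k -> Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>()))
                    .add(y)) {
                continue;                       // this pair of nodes was already compared
            }
            Link lx = (Link) x, ly = (Link) y;
            if (lx.value != ly.value) return -1;
            expanded++;
            worklist.push(new Object[] { lx.next, ly.next });
            worklist.push(new Object[] { lx.next2, ly.next2 });
        }
        return expanded;
    }

    public static void main(String[] args) {
        Object a = build(20), b = build(20);
        System.out.println("naive:    " + countPairs(a, b, false)); // prints "naive:    1048575"
        System.out.println("memoized: " + countPairs(a, b, true));  // prints "memoized: 20"
    }
}
```

With depth 20 the unmemoized traversal expands 2^20 - 1 pairs, while the identity-keyed version expands one pair per allocated node.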
From john.r.rose at oracle.com Thu Jan 31 19:45:13 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 31 Jan 2019 11:45:13 -0800
Subject: An example of substituability test that is recursive
In-Reply-To:
References: <999562327.975266.1548933572641.JavaMail.zimbra@u-pem.fr> <3712C9F6-844F-45B7-9C14-283443398645@oracle.com> <2112638685.1026191.1548945271037.JavaMail.zimbra@u-pem.fr> <1790611A-F656-4ACC-B2CD-8D6F88BBE636@oracle.com> <433488857.1077959.1548960390308.JavaMail.zimbra@u-pem.fr>
Message-ID: <06B574B0-A8CF-4816-9E92-C69819306567@oracle.com>

On Jan 31, 2019, at 10:52 AM, Brian Goetz wrote:
>
> Let's not do this to Java, shall we?
>
> https://twitter.com/seldo/status/1090931182227861508
>
>> solution 2: you claim it's too complex and you just let the user deal with it by calling equals() (and provide a way for a user to call the default subs).

That one motivated this:

https://twitter.com/JohnRose00/status/1091046489256644608

From forax at univ-mlv.fr Thu Jan 31 20:12:07 2019
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Thu, 31 Jan 2019 21:12:07 +0100 (CET)
Subject: An example of substituability test that is recursive
In-Reply-To: <0BE3D92C-E0A2-464F-B2DB-D628CE6FE0B4@oracle.com>
References: <999562327.975266.1548933572641.JavaMail.zimbra@u-pem.fr> <3273D08D-4505-4062-84D7-3AEC89A55B97@oracle.com> <1784099434.1077554.1548960108830.JavaMail.zimbra@u-pem.fr> <0BE3D92C-E0A2-464F-B2DB-D628CE6FE0B4@oracle.com>
Message-ID: <609538561.1085248.1548965527314.JavaMail.zimbra@u-pem.fr>

> From: "John Rose"
> To: "Remi Forax"
> Cc: "Karen Kinnear", "valhalla-spec-experts"
> Sent: Thursday, January 31, 2019 20:43:33
> Subject: Re: An example of substituability test that is recursive

> On Jan 31, 2019, at 10:41 AM, forax at univ-mlv.fr wrote:
>> yes, it creates a DAG that will be too long to traverse :(
>> you basically DDOS yourself if you do a ==.
> The complexity is exponential in depth, and
> can be more than linear in heap allocation,
> because of the risk of repeat traversals.
> A worklist algorithm could make use of the
> secret implementation identity of heap nodes
> to push the complexity back down to heap
> allocation size. A portable algorithm could
> not. This is one more bit of evidence it should
> be a system intrinsic rather than a vanilla library
> function.

yes, i buy that!
it will always be more expensive in Java,

Rémi

From karen.kinnear at oracle.com Thu Jan 31 20:36:09 2019
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Thu, 31 Jan 2019 15:36:09 -0500
Subject: An example of substituability test that is recursive
In-Reply-To: <433488857.1077959.1548960390308.JavaMail.zimbra@u-pem.fr>
References: <999562327.975266.1548933572641.JavaMail.zimbra@u-pem.fr> <3712C9F6-844F-45B7-9C14-283443398645@oracle.com> <2112638685.1026191.1548945271037.JavaMail.zimbra@u-pem.fr> <1790611A-F656-4ACC-B2CD-8D6F88BBE636@oracle.com> <433488857.1077959.1548960390308.JavaMail.zimbra@u-pem.fr>
Message-ID:

Option #1 was what I was suggesting in the meeting two weeks ago - if this starts to recurse too deeply, create a worklist - which should give you the same result.

If you switch to .equals - you might get a different result ?

thanks,
Karen
From forax at univ-mlv.fr Thu Jan 31 20:54:17 2019
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Thu, 31 Jan 2019 21:54:17 +0100 (CET)
Subject: An example of substituability test that is recursive
In-Reply-To:
References: <999562327.975266.1548933572641.JavaMail.zimbra@u-pem.fr> <3712C9F6-844F-45B7-9C14-283443398645@oracle.com> <2112638685.1026191.1548945271037.JavaMail.zimbra@u-pem.fr> <1790611A-F656-4ACC-B2CD-8D6F88BBE636@oracle.com> <433488857.1077959.1548960390308.JavaMail.zimbra@u-pem.fr>
Message-ID: <57381204.731.1548968057867.JavaMail.zimbra@u-pem.fr>

> From: "Karen Kinnear"
> To: "Remi Forax"
> Cc: "John Rose", "valhalla-spec-experts"
> Sent: Thursday, January 31, 2019 21:36:09
> Subject: Re: An example of substituability test that is recursive

> Option #1 was what I was suggesting in the meeting two weeks ago - if this
> starts
> to recurse too deeply, create a worklist - which should give you the same
> result.

> If you switch to .equals - you might get a different result ?

yes, you are right, i did not understand what you meant by "expected behavior", my bad on that.
> thanks,
> Karen

regards,
Rémi
From karen.kinnear at oracle.com Thu Jan 31 21:29:27 2019
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Thu, 31 Jan 2019 16:29:27 -0500
Subject: An example of substituability test that is recursive
In-Reply-To: <57381204.731.1548968057867.JavaMail.zimbra@u-pem.fr>
References: <999562327.975266.1548933572641.JavaMail.zimbra@u-pem.fr> <3712C9F6-844F-45B7-9C14-283443398645@oracle.com> <2112638685.1026191.1548945271037.JavaMail.zimbra@u-pem.fr> <1790611A-F656-4ACC-B2CD-8D6F88BBE636@oracle.com> <433488857.1077959.1548960390308.JavaMail.zimbra@u-pem.fr> <57381204.731.1548968057867.JavaMail.zimbra@u-pem.fr>
Message-ID: <63A4F64D-A3D0-46C1-A21E-CEB29E95DF64@oracle.com>

Actually - I was unclear - apologies.
K
From john.r.rose at oracle.com Thu Jan 31 23:03:35 2019
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 31 Jan 2019 15:03:35 -0800
Subject: An example of substituability test that is recursive
References:
Message-ID: <1262FC31-3272-43F5-A7AC-6758E3465562@oracle.com>

(FYI, offline comment with my reply.)

Begin forwarded message:

From: John Rose
Subject: Re: An example of substituability test that is recursive
Date: January 31, 2019 at 11:48:53 AM PST
To: Palo Marton
Cc: Rémi Forax, Brian Goetz

On Jan 31, 2019, at 11:28 AM, Palo Marton wrote:
>
> (Sorry to bother you with direct email, I'm only on observer list...)
>
> Here is a possible solution for "infinite recursion/DDOS" problem with cmp:
> Make values that might be affected by this problem singletons on the heap, so compare function will be just fast pointer comparison (but at the cost of much slower creation).
>
> Larger values will likely be on the heap anyway, so this will just add another "way" of how values are stored.
> 1) flattened
> 2) heap
> 3) heap+forced singleton
>
> Palo

Not bad. By forced singleton I think you mean a structural intern, kind of like string interning? (Long ago for lists it was called hash-consing.)

The whole point of forcing subst into acmp is to create a virtual emulation of such semantics. So we could switch implementation tactics for problem spots like polymorphic value fields.
-- John
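
The "forced singleton" / hash-consing idea above can be sketched roughly as follows. This is an illustration under assumptions, not anything from the prototype: `Link`, `Key`, `TABLE` and the factory `of` are hypothetical names, the intern table is a plain `HashMap` (not thread-safe, and it keeps every node alive), and `Link` is an ordinary identity class. Because children are always interned before their parent, the key only needs to compare the child by reference, and two structurally equal chains end up being the same heap object.

```java
import java.util.HashMap;
import java.util.Map;

class HashCons {
    // Node type; instances are only created through of(), so every
    // reachable `next` is itself an interned node (or null).
    static final class Link {
        final int value;
        final Link next;
        private Link(int value, Link next) { this.value = value; this.next = next; }
    }

    // Lookup key: `value` is compared structurally, `next` by identity,
    // which is sound because `next` is already canonical.
    private static final class Key {
        final int value;
        final Link next;
        Key(int value, Link next) { this.value = value; this.next = next; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).value == value && ((Key) o).next == next;
        }
        @Override public int hashCode() {
            return 31 * value + System.identityHashCode(next);
        }
    }

    private static final Map<Key, Link> TABLE = new HashMap<>();

    // Returns the canonical Link for (value, next), creating it on first use.
    // Creation pays for a hash lookup; comparison becomes a pointer check.
    static Link of(int value, Link next) {
        return TABLE.computeIfAbsent(new Key(value, next), k -> new Link(k.value, k.next));
    }

    static Link times(int count) {
        Link acc = null;
        for (int i = 0; i < count; i++) acc = of(i, acc);
        return acc;
    }

    public static void main(String[] args) {
        Link a = times(1_000);
        Link b = times(1_000);
        System.out.println(a == b);           // prints "true"
        System.out.println(a == times(999));  // prints "false"
    }
}
```

The trade is the one described in the forwarded message: construction gets slower, and `==` on interned nodes no longer has to walk the interior at all.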