From forax at univ-mlv.fr Thu Aug 1 16:40:46 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Thu, 01 Aug 2019 16:40:46 +0000
Subject: Face to face meeting of yesterday
Message-ID: <7BFD162C-DB6D-4E71-BCA9-BBEB1945F25D@univ-mlv.fr>

Hi all,
here is my takeaway from the Valhalla face-to-face meeting we had yesterday.

On the VM side, the model can be simplified by having every class be either a Q-type or an L-type, but not both at the same time. This will simplify the JVMS and solve the naming issue I have with Dan, because the descriptor side and the runtime side will be aligned. Obviously, this has to be prototyped.

Making that move forces us to reconsider the migration scenario for value-based classes (Optional, LocalDateTime, etc.). One possible solution for migrating those types is to introduce null-default inline types, i.e. inline types that use a special bit pattern to encode null. Again, this has to be prototyped (in the VM and in javac too), because introducing those types may have an impact on the performance of non-nullable inline types. Brian has confirmed that the migration of value-based classes is not necessarily part of LW10, so the prototyping is not a huge priority; at the same time, given that LW10 has to be compatible with LW100, the migration story has to be figured out before LW10 is released.

For LW10, we are currently not able to find a Java syntax to represent a nullable inline type that interacts with non-reified generics. The question mark notation seems to be the least-worst compromise. Another possible syntax is John's Indirect class. Anyway, no proposal is a clear winner.

A new data point is that we now have the Panama replacement of the ByteBuffer API, the Vector API, Doug Lea's new HashMap and Remi's version of the carrier object for Amber's pattern matching, all of which depend on Valhalla, so it may be wise to introduce an intermediate target between now and LW10 (LW5?) which, as proposed by Maurizio, would only allow non-public inline classes.
In that case, in order to use an inline type with non-reified generics, the user will have to declare an interface. LW5 can be a good intermediate step until we figure out how LW10 works. One risk of LW5 is that it's not a release targeting all Java developers, only the ones interested in the performance of their libraries. We will have to be careful about how we communicate about LW5 to avoid frustration.

I think part of the issue with figuring out what LW10 is, is that we have (at least I have) a fuzzy vision of what LW100 is. I don't even know what the different steps to introduce reified generics are, i.e. the path between LW10 and LW100. I think we need a minimal prototype of what reified generics might be, similar to the MVT for inline types.

As an eternal optimist, I think we can deliver LW5 next year?

Karen's replacement?
This is something we forgot to discuss: Karen is retiring, so who will take up the mantle? Put differently, I'm worried that Karen's retirement may impact our velocity. Having these Wednesday meetings is important; it's one of the things that has allowed us to keep a good velocity. If I can have a say in this, I vote for David Holmes because, like Karen, he doesn't hesitate to call our bluff. Is there another candidate?

Rémi

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

From forax at univ-mlv.fr Fri Aug 2 14:58:37 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Fri, 2 Aug 2019 16:58:37 +0200 (CEST)
Subject: Toward LW10, the inline interface proposal
Message-ID: <556617713.847797.1564757917376.JavaMail.zimbra@u-pem.fr>

Hi all,

We (Maurizio, John and I) have spent some time playing with another proposal for what LW10 could be, based on the "confinement" idea, i.e. an inline type cannot appear in a public API, and you need an interface to use it with erased generics.

So the idea is to have, at the Java language level, a construct that desugars itself into an interface and a package-private inline type.
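As a rough illustration of the shape this desugaring targets, the following sketch approximates it in today's Java, assuming Java 17+ for sealed interfaces. An ordinary final class stands in for the package-private inline class (there is no `inline` modifier, so nothing here is flattened), and the names FooVal (instead of Foo$val) and FooDemo are hypothetical. In a single-file sketch every top-level type is package-private; in a real library Foo would be public.

```java
class FooDemo {
    public static void main(String[] args) {
        Foo a = Foo.of(40);
        Foo b = a.add(Foo.of(2));
        System.out.println(b); // FooVal[42]
    }
}

// The public face of the type: only the public API is lifted into the interface.
sealed interface Foo permits FooVal {
    Foo add(Foo foo);

    static Foo of(int value) {
        return new FooVal(value);
    }
}

// Hypothetical stand-in for Foo$val: a plain final class holding the private state.
final class FooVal implements Foo {
    private final int value;

    FooVal(int value) {
        this.value = value;
    }

    @Override
    public Foo add(Foo foo) {
        // Foo permits only FooVal, so this cast cannot fail for a non-null foo.
        return new FooVal(value + ((FooVal) foo).value);
    }

    // Package-private accessor, used only to inspect the result.
    int value() {
        return value;
    }

    @Override
    public String toString() {
        return "FooVal[" + value + "]";
    }
}
```

Callers only ever see Foo; the sealed clause is what lets the implementation cast back to its single permitted subtype.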
I will use the following syntax just to be able to explain the semantics, not as a commitment to any syntax. Let's say you can declare an "inline interface" (again, the wrong name) that contains the implementation of the inline type and surfaces its API as an interface.

  public inline interface Foo {
    private int value;

    private Foo(int value) {
      this.value = value;
    }

    public Foo add(Foo foo) {
      return new Foo(value + foo.value);
    }

    public static Foo of(int value) {
      return new Foo(value);
    }
  }

This code is desugared into Foo and Foo$val, with Foo and Foo$val being nestmates (but not inner/outer classes).

  public sealed interface Foo permits Foo$val {
    public Foo add(Foo foo);

    public static Foo of(int value) {
      return new Foo$val(value);
    }
  }

  /* package-private */ final inline class Foo$val implements Foo {
    private int value;

    /* package-private */ Foo$val(int value) {
      this.value = value;
    }

    public Foo add(Foo foo) {
      // Foo permits only Foo$val, so the cast cannot fail for a non-null foo
      return new Foo$val(value + ((Foo$val) foo).value);
    }
  }

So an inline interface contains only private or public members:
  - private members are moved into the inline class;
  - public fields and public constructors are not allowed;
  - public instance methods are copied into the inline class, and their signatures are copied as abstract methods into the interface;
  - default methods may stay in the interface (if supported);
  - references to the interface constructor (meta: that's why it's the wrong name) are changed to references to the inline class constructor;
  - Foo$val is denotable as Foo.val, but it can only appear in a non-public context (local variable, parameter type/return type of a private method, private field), and it cannot be a type argument or a bound of a generic class/method.

With that, I think we may not need null-default inline classes anymore.

For value-based classes, we have the issue that those are classes, not interfaces. My hope here is that we can teach the VM to retrofit invokevirtual and invokeinterface so that they work on both classes and interfaces.
It would also make the transformation of any class into an interface binary compatible, which would be a nice tool if you want to grow a library.

For LW100, we will enable support for public constructors, which will make Foo$val visible and denotable.

Obviously, it's just a proposal, and I hope I'm not too far off in my description of what we discussed.

Rémi

From brian.goetz at oracle.com Fri Aug 2 15:36:10 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 2 Aug 2019 08:36:10 -0700
Subject: Toward LW10, the inline interface proposal
In-Reply-To: <556617713.847797.1564757917376.JavaMail.zimbra@u-pem.fr>
References: <556617713.847797.1564757917376.JavaMail.zimbra@u-pem.fr>
Message-ID: <95F3D2B9-B509-4706-A662-161E91FBCE85@oracle.com>

Obviously, having the compiler generate both from the single declaration is just sugar. The key idea here is that pairing a sealed interface with a single inline implementation is a powerful combination.

I am uncomfortable with the ad-hoc-ness of having this work only for inline classes. Is there any reason such a construct couldn't work for any class?

Sent from my MacBook Wheel

From brian.goetz at oracle.com Sat Aug 3 16:37:56 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 3 Aug 2019 09:37:56 -0700
Subject: Collapsing the requirements
Message-ID:

As Remi noted, we had some good discussions at JVMLS this week. Combining that with some discussions John and I have been having over the past few weeks, I think the stars are aligning to let us dramatically slim down the requirements. The following threads have been in play for a while:

- John: I hate the LPoint/QPoint distinction
- Brian: I hate null-default types
- Remi: I hate the V? type

But the argument for each of these depended, in some way, on the others. I believe that, with a few compromises, we can now prune them as a group, which would bring us to a much lower energy state.

## L^Q World: Goodbye `LV;`

We've taken it as a requirement that for a value type V, we have to support both LV and QV, where LV is the null-adjunction of QV. This has led to a lot of complexity in the runtime, where we have to manage dual mirrors.

The main reason we wanted LV was to support in-place migration. (In Q-world, LV was the box for QV, so it was natural for migration.) But as we've worked on our migration story, we've discovered we may not need LV for migration. And if we don't, we surely don't need it for anything else; worst case, we can erase LV to `LValObject` or even `LObject` (or, if we're worried about erasure and overloading, to something like `LObject//V` using John's type-operator notation.)

Assuming we can restructure the migration story to not require LV to represent a VM-generated "box" (which I believe we can, see below), we can drop the requirement for LV.
An inline class V gives rise to a single type descriptor, QV (or whatever we decide to call it; John may have plans here.)

## Goodbye `V?`

The other reason we wanted LV was that it was the obvious representation for the language type `V?` (V adjoined with null.) Uses for `V?` include:

- Denoting non-flattened value fields;
- Denoting non-flattened value arrays;
- Denoting erased generics over values (`Foo`);
- Denoting the type that is the adjunction of null to V (V | Null), when we really want to talk about nullability.

But we can do all this without a `V?` type; for every V, there is already at least one supertype of V that includes `V|Null`: Object, and any interface implemented by V. If we arrange that every value type V has a supertype V', not implemented by any other type, then the value set of this V' is exactly that of `V?`. And we can use V' to do all the things `V?` did with respect to V, including subtyping. The language doesn't need the `?` type operator; it just needs to ensure that V' always exists. That turns out to be easy, and also turns out to be essential to the migration story.

#### Eclairs

We can formalize this by requiring that every value type have a companion interface (or abstract class) supertype. Define an envelope-class pair ("eclair") as a pair (V, I) such that:

- V is an inline class
- I is a sealed type
- I permits V (and only V)
- V <: I

(We can define eclairs for indirect classes, but they are less interesting, because indirect classes already contain null.)

If every value type is a member of an eclair, we can use V when we want the flattenable, non-nullable, specializable type, and use I when we want the non-flattenable, nullable, erased "box". We don't need to denote `V?`; we can just use I, which is an ordinary, nominal type.

Note that the VM can optimize eclairs about as well as it could optimize LV; it knows that I is the adjunction of null to V, so all non-null values of I are identity-free and must be of type V.
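A minimal sketch of an eclair (V, I) in Java 17+, assuming a record can stand in for the inline class V (so nothing is actually flattened or identity-free here); the names Point, PointBox and EclairDemo are hypothetical. Because the sealed interface permits exactly one implementation, every non-null PointBox can be cast to Point without risk of a ClassCastException.

```java
class EclairDemo {
    // PointBox's value set is "every Point, plus null".
    static String describe(PointBox box) {
        if (box == null) {
            return "null";
        }
        // Point is the only permitted subtype, so this cast cannot fail.
        Point p = (Point) box;
        return "Point(" + p.x() + ", " + p.y() + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(new Point(1, 2))); // Point(1, 2)
        System.out.println(describe(null));            // null
    }
}

// I: a sealed type that permits V, and only V. Public members of V are lifted here.
sealed interface PointBox permits Point {
    int x();

    int y();
}

// V <: I. A record stands in for the inline class; records are implicitly final.
record Point(int x, int y) implements PointBox {}
```

The four conditions of the definition map directly onto the sealed/permits/implements clauses; only "V is an inline class" cannot be expressed in current Java.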
What we lose relative to `V?` is access to fields; it was possible to do `getfield` on an LV, but not on I. If this is important (and maybe it's not), we can handle it in other ways.

#### With sugar on top, please

We can provide syntax sugar (please, let's not bikeshed it now) so that an inline class _automatically_ acquires a corresponding interface (if one is not explicitly provided), onto which the public members (and type variables, and other supertypes) of C are lifted. For the sake of exposition, let's say this is called `C.Box`, and is a legitimate inner class of C (which can be generated by the compiler as an ordinary classfile.) We've been here before, and abandoned it because "Box" seemed misleading, but let's call it that for now. And now it is a real nominal type, not a fake type. In the simplest case, merely declaring an inline class could give rise to V.Box.

Now, the type formerly known as `V?` is an ordinary, nominal interface (or abstract class) type. The user can say what they mean, and no magic is needed by either the language or the VM. Goodbye `V?`.

#### Boxing conversion

Given the constraints of the eclair relationship, it would be reasonable for the compiler to derive from this that there is a boxing conversion between C and I (I is just the value set of C, plus null, which is the relationship boxes have with their corresponding primitives.) The boxing operation is a no-op (since C <: I) and the unboxing operation is a null-checking cast.

#### Erased generics

Using the eclair wrapper also kicks the problem of erased generics down the road; if we use `Foo` for erased generics, and temporarily ban `Foo`, when we get to specialized generics, it will be obvious what `Foo` means (their common supertype will be `Foo`). This is a less confusing world, as then "List of erased V" and "List of specialized V" don't coexist; there's only "List of V" and "List of V's Box".
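The boxing conversion and the erased-generics usage can be approximated by hand, again assuming Java 17+ and hypothetical names (PointBox playing I, a record Point playing V). Boxing is just a widening reference conversion; unboxing is `Objects.requireNonNull` followed by a cast; and an erased `List` is instantiated over the box type, so it may hold null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class BoxingDemo {
    // Box: a no-op widening from V to I, since V <: I.
    static PointBox box(Point p) {
        return p;
    }

    // Unbox: a null-checking cast from I back to V.
    static Point unbox(PointBox b) {
        return (Point) Objects.requireNonNull(b, "cannot unbox null");
    }

    public static void main(String[] args) {
        // Erased generics are instantiated over the box type, which admits null.
        List<PointBox> boxes = new ArrayList<>();
        boxes.add(box(new Point(3, 4)));
        boxes.add(null);

        Point p = unbox(boxes.get(0));
        System.out.println(p.x() + p.y()); // 7

        try {
            unbox(boxes.get(1));
        } catch (NullPointerException e) {
            System.out.println("unbox(null) -> NPE");
        }
    }
}

sealed interface PointBox permits Point {
    int x();

    int y();
}

// A record stands in for the inline class.
record Point(int x, int y) implements PointBox {}
```

In the proposal the compiler would insert these conversions implicitly; here they are explicit helper methods so the two directions are visible.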
## Migration

The ability to migrate Optional and friends to values has been an important goal, but it has been the source of significant complexity. Our previous story leaned hard on "When we migrate X to a value, LX will describe the box, so old callsites will continue to link." But it turned out that this brought a lot of baggage (forwarding bridges, null-default values) and compromises (null-default values lose their calling-convention optimizations), and over the past few weeks John and I have been cooking up a simpler eclair-based recipe for this.

The world is indeed full of existing utterances of `LOptional`, and they will still want to work. Fortunately, Optional follows the rules for being a value-based class. We start by migrating Optional from a reference class to an eclair with a public abstract class and a private value implementation. Now, existing code just works (source and binary), and optionals are values. But this isn't good enough; existing variables of type Optional are not flattened.

One of the objections raised to in-place migration was nullity; in order to migrate Optional to a true value, it would have to be a null-default value, and this already entailed compromises. If we're willing to compromise further, we can get what we want without the baggage. And that compromise is: give up the name.

So we define a new public value class `Opt`, which is the value half of the eclair, and the existing Optional is the interface/abstract class half. Now, existing fields/arrays can migrate gradually to Opt, as they want the benefit of flattening; existing APIs can continue to truck in Optional (which has about the same optimizations on the stack as a null-default value would have.)

This works because of the boxing conversion. Suppose we have old code that does:

    Optional o = makeAnOptional()

when the user changes this to

    Opt o = ...
the compiler sees that the RHS is an Optional and the LHS is an Opt, and that there is a boxing conversion between them, so it inserts an unboxing conversion (null check) and we're done. Users can migrate their fields gradually. The cost: the good name gets burned. But there is a compatible migration path from ref to value.

Later, when we have bridges (we don't need them yet!), we can migrate the library uses from Optional to Opt.

## Null-default values

About 75% of the motivation for null-default values (another huge source of complexity) was to support the migration of value-based classes. And it wasn't even a great solution, because we still lost some key optimizations (e.g., calling conventions.) With the Optional -> Opt path, we don't need null-default values; we get ordinary values. So while we pay the cost of changing the name, we gain the benefit that, once the full migration is effected, the new values don't carry the legacy performance baggage.

Another 20% of the motivation was for security-sensitive classes whose default value did not represent a useful value, for which we wanted not null-default-ness but really initialization safety. Let's look at other ways to get there.

There are a few. One is to treat this problem as protecting such classes from uninitialized fields or array elements; another is to ensure that such classes (a) have no public fields and (b) perform the correct check at the top of each method (which can be injected by the compiler.) I don't want to solve that problem right here, but I think there are enough ways to get there that we can assume this isn't a hard requirement.

The other 5% was just the user-based "I want null in my value set." For those, we can tell users: use the interface box when you need null.

## Summary

In one swoop, we can banish LV from the VM, `V?` from the language, and null-default values, by making a simple requirement: every value type is paired with an interface or abstract class "box".
For most values, this can be automatically generated by the compiler and denoted via a well-known name (e.g., V.Box); for some values, such as those that are migrated from reference types, we can explicitly declare the box type and pick explicit names for both types.

There's a lot to work out, but I think it should be clear enough that this is a much, much lower energy state than what we were aiming at for L10, and also a simpler user model.

Let's focus discussions on validating the model first, before we dive into mechanism or surface syntax.

From forax at univ-mlv.fr Sat Aug 3 17:48:01 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Sat, 3 Aug 2019 19:48:01 +0200 (CEST)
Subject: Collapsing the requirements
In-Reply-To:
References:
Message-ID: <950816529.15227.1564854481184.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Brian Goetz"
> To: "valhalla-spec-experts"
> Sent: Saturday, August 3, 2019 18:37:56
> Subject: Collapsing the requirements
Trying to implement the eclair interface by hand, it seems we need the methods of the interface and those of the implementation to use covariant return types: the box version returns a box, while the inline class version returns the inline class (which is fine because it's a subtype). Otherwise, when you call a method of the inline class, the result is the box, so you lose the non-null property when chaining calls.

With Option the inline class and Option.Box the eclair interface,
  Option.Box box = Option.Box.of("foo");
  box.filter() should return an Option.Box,
while, for Option option,
  option.filter() should return an Option.

If I'm not wrong about the need for covariant return types, it seems to suggest that we need a desugaring mechanism to at least take care of the covariant return types automatically.

Here is an example, but in reverse order, with OptionEclair being the eclair and OptionEclair.val being the inline type:
https://github.com/forax/valuetype-lworld/blob/master/src/main/java/fr.umlv.valuetype/fr/umlv/valuetype/OptionEclair.java

A quick run of a benchmark seems to indicate that if the code uses the eclair on the stack, it's as fast as using an inline class when the code is fully inlined.
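A sketch of the covariant-return pattern described in this reply, assuming Java 17+ and hypothetical names: OptionBox plays the eclair interface and a final class Option stands in for the inline class. A top-level OptionBox is used because javac rejects a class that implements its own member interface (Option implements Option.Box) as cyclic inheritance. Option.filter narrows the return type from OptionBox<T> to Option<T>, so chained calls on an Option keep the non-null static type.

```java
import java.util.Objects;
import java.util.function.Predicate;

class CovariantDemo {
    public static void main(String[] args) {
        // Through the box interface, filter returns the box type.
        OptionBox<String> box = OptionBox.of("foo");
        OptionBox<String> filteredBox = box.filter(s -> s.startsWith("f"));

        // Through the value class, the covariant override keeps the static
        // type Option<String> across chained calls, with no casts.
        Option<String> option = Option.of("foo");
        Option<String> chained = option.filter(s -> s.length() == 3)
                                       .filter(s -> s.endsWith("o"));

        System.out.println(filteredBox.isPresent() + " " + chained.isPresent()); // true true
    }
}

// The eclair interface: sealed, permitting only Option.
sealed interface OptionBox<T> permits Option {
    OptionBox<T> filter(Predicate<? super T> predicate);

    boolean isPresent();

    static <T> OptionBox<T> of(T value) {
        return Option.of(value);
    }
}

// Stand-in for the inline class; a null field encodes "empty" in this sketch.
final class Option<T> implements OptionBox<T> {
    private static final Option<?> EMPTY = new Option<>(null);

    private final T value;

    private Option(T value) {
        this.value = value;
    }

    static <T> Option<T> of(T value) {
        return new Option<>(Objects.requireNonNull(value));
    }

    @SuppressWarnings("unchecked")
    static <T> Option<T> empty() {
        return (Option<T>) EMPTY;
    }

    // Covariant override: narrows OptionBox<T> to Option<T>.
    @Override
    public Option<T> filter(Predicate<? super T> predicate) {
        return value != null && predicate.test(value) ? this : empty();
    }

    @Override
    public boolean isPresent() {
        return value != null;
    }
}
```

Writing each override twice (abstract in the interface, narrowed in the class) is exactly the boilerplate a desugaring step could generate.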
https://github.com/forax/valuetype-lworld/blob/master/src/test/java/fr.umlv.valuetype/fr/umlv/valuetype/perf/OptionBenchMark.java#L67 R?mi From forax at univ-mlv.fr Sun Aug 4 20:06:26 2019 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 4 Aug 2019 22:06:26 +0200 (CEST) Subject: Collapsing the requirements In-Reply-To: <950816529.15227.1564854481184.JavaMail.zimbra@u-pem.fr> References: <950816529.15227.1564854481184.JavaMail.zimbra@u-pem.fr> Message-ID: <15185241.85842.1564949186845.JavaMail.zimbra@u-pem.fr> ----- Mail original ----- > De: "Remi Forax" > ?: "Brian Goetz" > Cc: "valhalla-spec-experts" > Envoy?: Samedi 3 Ao?t 2019 19:48:01 > Objet: Re: Collapsing the requirements > ----- Mail original ----- >> De: "Brian Goetz" >> ?: "valhalla-spec-experts" >> Envoy?: Samedi 3 Ao?t 2019 18:37:56 >> Objet: Collapsing the requirements > >> As Remi noted, we had some good discussions at JVMLS this week. Combining that >> with some discussions John and I have been having over the past few weeks, I >> think the stars are aligning to enable us to dramatically slim down the >> requirements. The following threads have been in play for a while: >> >> - John: I hate the LPoint/QPoint distinction >> - Brian: I hate null-default types >> - Remi: I hate the V? type >> >> But the argument for each of these depended, in some way, on the others. I >> believe, with a few compromises, we can now prune them as a group, which would >> bring us to a much lower energy state. >> >> ## L^Q World ? Goodbye `LV;` >> >> We?ve taken it as a requirement that for a value type V, we have to support both >> LV and QV, where LV is the null-adjunction of QV. This has led to a lot of >> complexity in the runtime, where we have to manage dual mirrors. >> >> The main reason why we wanted LV was to support in-place migration. (In >> Q-world, LV was the box for QV, so it was natural for migration.) But, as >> we?ve worked our migration story, we?ve discovered we may not need LV for >> migration. 
>> And if we don't, we surely don't need it for anything else; worst-case, we can erase LV to `LValObject` or even `LObject` (or, if we're worried about erasure and overloading, to something like `LObject//V` using John's type-operator notation.)
>>
>> Assuming we can restructure the migration story to not require LV to represent a VM-generated "box" (which I believe we can; see below), we can drop the requirement for LV. An inline class V gives rise to a single type descriptor, QV (or whatever we decide to call it; John may have plans here.)
>>
>> ## Goodbye `V?`
>>
>> The other reason we wanted LV was that it was the obvious representation for the language type `V?` (V adjoined with null.) Uses for `V?` include:
>>
>> - Denoting non-flattened value fields;
>> - Denoting non-flattened value arrays;
>> - Denoting erased generics over values (`Foo<V?>`);
>> - Denoting the type that is the adjunction of null to V (V | Null), when we really want to talk about nullability.
>>
>> But, we can do all this without a `V?` type; for every V, there is already at least one super type of V that includes `V|Null`: Object, and any interface implemented by V. If we arrange that every value type V has a super type V′, not implemented by any other type, then the value set of this V′ is exactly that of `V?`. And we can use V′ to do all the things `V?` did with respect to V, including sub typing. The language doesn't need the `?` type operator, it just needs to ensure that V′ always exists. Which turns out to be easy, and also turns out to be essential to the migration story.
>>
>> #### Eclairs
>>
>> We can formalize this by requiring that every value type have a companion interface (or abstract class) supertype. Define an envelope-class pair ("eclair")
>> as a pair (V, I) such that:
>>
>> - V is an inline class
>> - I is a sealed type
>> - I permits V (and only V)
>> - V <: I
>>
>> (We can define eclairs for indirect classes, but they are less interesting, because indirect classes already contain null.)
>>
>> If every value type is a member of an eclair, we can use V when we want the flattenable, non-nullable, specializable type; and we use I when we want the non-flattenable, nullable, erased "box". We don't need to denote `V?`; we can just use I, which is an ordinary, nominal type.
>>
>> Note that the VM can optimize eclairs about as well as it could for LV; it knows that I is the adjunction of null to V, so that all non-null values of I are identity free and must be of type V.
>>
>> What we lose relative to V? is access to fields; it was possible to do `getfield` on a LV, but not on I. If this is important (and maybe it's not), we can handle this in other ways.
>>
>> #### With sugar on top, please
>>
>> We can provide syntax sugar (please, let's not bike shed it now) so that an inline class _automatically_ acquires a corresponding interface (if one is not explicitly provided), onto which the public members (and type variables, and other super types) of C are lifted. For sake of exposition, let's say this is called `C.Box`, and is a legitimate inner class of C (which can be generated by the compiler as an ordinary classfile.) We've been here before, and abandoned it because "Box" seemed misleading, but let's call it that for now. And now it is a real nominal type, not a fake type. In the simplest case, merely declaring an inline class could give rise to V.Box.
>>
>> Now, the type formerly known as `V?` is an ordinary, nominal interface (or abstract class) type. The user can say what they mean, and no magic is needed by either the language or the VM. Goodbye `V?`.
>>
>> #### Boxing conversion
>>
>> Given the constraints of the eclair relationship, it would be reasonable for the compiler to derive from this that there is a boxing conversion between C and I (I is just the value set of C, plus null, which is the relationship boxes have with their corresponding primitives.) The boxing operation is a no-op (since C <: I) and the unboxing operation is a null checking cast.
>>
>> #### Erased generics
>>
>> Using the eclair wrapper also kicks the problem of erased generics down the road; if we use `Foo<I>` for erased generics, and temporarily ban `Foo<V>`, when we get to specialized generics, it will be obvious what `Foo<V>` means (their common super type will be `Foo<? extends I>`). This is a less confusing world, as then "List of erased V" and "List of specialized V" don't coexist; there's only "List of V" and "List of V's Box".
>>
>> ## Migration
>>
>> The ability to migrate Optional and friends to values has been an important goal, but it has been the source of significant complexity. Our previous story leaned hard on "When we migrate X to a value, LX will describe the box, so old callsites will continue to link." But it turned out that brought a lot of baggage (forwarding bridges, null-default values) and compromises (null-default values lose their calling-convention optimizations), and over the past few weeks John and I have been cooking up a simpler eclair-based recipe for this.
>>
>> The world is indeed full of existing utterances of `LOptional`, and they will still want to work. Fortunately, Optional follows the rules for being a value-based class. We start with migrating Optional from a reference class to an eclair with a public abstract class and a private value implementation. Now, existing code just works (source and binary), and optionals are values. But, this isn't good enough; existing variables of type Optional are not flattened.
>>
>> One of the objections raised to in-place migration was nullity; in order to migrate Optional to a true value, it would have to be a null-default value, and this already entailed compromises. If we're willing to compromise further, we can get what we want without the baggage. And that compromise is: give up the name.
>>
>> So we define a new public value class `Opt` which is the value half of the eclair, and the existing Optional is the interface/abstract class half. Now, existing fields / arrays can migrate gradually to Opt, as they want the benefit of flattening; existing APIs can continue to truck in Optional (which have about the same optimizations as a null-default value would have on the stack.)
>>
>> This works because of the boxing conversion. Suppose we have old code that does:
>>
>>     Optional o = makeAnOptional();
>>
>> when the user changes this to
>>
>>     Opt o = ...
>>
>> the compiler sees the RHS is an Optional and the LHS is an Opt, and there is a boxing conversion between them, so we insert an unbox conversion (null check) and we're done. Users can migrate their fields gradually. The cost: the good name gets burned. But there is a compatible migration path from ref to value.
>>
>> Later, when we have bridges (we don't need them yet!), we can migrate the library uses from Optional to Opt.
>>
>> ## Null-default values
>>
>> About 75% of the motivation for null-default values (another huge source of complexity) was to support the migration of value-based classes. And it wasn't even a great solution, because we still lost some key optimizations (e.g., calling conventions.) With the Optional -> Opt path, we don't need null-default values, we get ordinary values. So while we pay the cost of changing the name, we gain the benefit that, once the full migration is effected, the new values don't carry the legacy performance baggage.
>>
>> Another 20% of the motivation was for security-sensitive classes whose default value did not represent a useful value, for which we wanted not null-default-ness but really initialization safety. Let's look at another way to get there.
>>
>> There are a few ways to get there. One is to treat this problem as protecting such classes from uninitialized fields or array elements; another is to ensure that such classes (a) have no public fields and (b) perform the correct check at the top of each method (which can be injected by the compiler.) I don't want to solve that problem right here, but I think there are enough ways to get there that we can assume this isn't a hard requirement.
>>
>> The other 5% was just the user-based "I want null in my value set." For those, we can tell users: use the interface box when you need null.
>>
>> ## Summary
>>
>> In one swoop, we can banish LV from the VM, V? from the language, and null-default values, by making a simple requirement: every value type is paired with an interface or abstract class "box". For most values, this can be automatically generated by the compiler and denoted via a well-known name (e.g., V.Box); for some values, such as those that are migrated from reference types, we can explicitly declare the box type and pick explicit names for both types.
>>
>> There's a lot to work out, but I think it should be clear enough that this is a much, much lower energy state than what we were aiming at for LW10, and also a simpler user model.
>>
>> Let's focus discussions on validating the model first before we dive into mechanism or surface syntax.
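The eclair rules and the boxing conversion described above can be approximated in today's Java (17 or later) as a rough sketch. Released Java has no inline classes, so a `record` stands in for the inline class V, and a `sealed` interface that permits only that record stands in for the box I. The names `Point`, `PointBox`, and `EclairConversions` are hypothetical; `unbox` plays the role of the null check the compiler would insert for `Opt o = ...`.

```java
import java.util.Objects;

// Hypothetical eclair (V, I) in Java 17+; records stand in for inline classes.
final class EclairConversions {
    private EclairConversions() { }

    // Boxing V -> I is a widening reference conversion: a no-op.
    static PointBox box(Point p) {
        return p;
    }

    // Unboxing I -> V is a null-checking cast. PointBox permits only Point,
    // so any non-null PointBox is a Point; only null can be rejected.
    static Point unbox(PointBox b) {
        return (Point) Objects.requireNonNull(b, "cannot unbox null");
    }

    public static void main(String[] args) {
        PointBox legacy = new Point(1, 2);   // e.g. a value handed out by an old API
        Point p = unbox(legacy);             // explicit form of the inserted null check
        System.out.println(p + " " + (box(p) == legacy));
    }
}

// I: a sealed type that permits V and only V.
sealed interface PointBox permits Point {
    int x();
    int y();
}

// V: the value half; V <: I.
record Point(int x, int y) implements PointBox { }
```

The sealed `permits` clause is what makes the cast in `unbox` total on non-null inputs, mirroring the point that all non-null values of I must be of type V.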
>
> Trying to implement the Eclair interface by hand, it seems we need the method of the interface and the one of the implementation to use covariant return types, the box version returning a box while the inline class version returns the inline class (which is fine because it's a subtype); otherwise, when you call a method of the inline class, the result is the box, so you lose the non-null property when chaining calls.

So it depends on:
- whether you want to 'emulate' a value-based class; in that case the eclair is, for example, Optional, and the inline class can have a specific name;
- whether you want an inline class and an eclair only for interacting with erased generics, like Complex and Complex.Box;
- whether the inline class uses covariant return types.

So there is no good solution that will fit them all, which suggests that we should not provide any special compiler support.

Rémi

From brian.goetz at oracle.com Mon Aug 5 02:15:16 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 4 Aug 2019 19:15:16 -0700 Subject: Collapsing the requirements In-Reply-To: <15185241.85842.1564949186845.JavaMail.zimbra@u-pem.fr> References: <950816529.15227.1564854481184.JavaMail.zimbra@u-pem.fr> <15185241.85842.1564949186845.JavaMail.zimbra@u-pem.fr> Message-ID: <3403A975-531D-4B35-A4F6-BA7C1734E6EA@oracle.com>

I'm not really following the concern you're having.

Let's divide into two cases:
- Inline classes declared explicitly with their box (e.g., Optional)
- Inline classes declared without a box, which get an implicit box.

So let's say you have:

    inline class Foo implements Bar {
        Foo id(Foo f) { return f; }
    }

This would acquire a nested interface Foo.Box, as a super type of Foo:

    interface Box extends Bar {
        Foo id(Foo f);
    }

Can you outline what you see as going wrong here?
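The covariant-return pattern Rémi describes can be written by hand in today's Java (17 or later). In this hypothetical sketch, a record `Maybe` stands in for the inline class (released Java has none) and the sealed interface `MaybeBox` for its box; these names are illustrative and are not taken from Rémi's `OptionEclair` code. Because `Maybe.filter` overrides `MaybeBox.filter` with a narrower return type, chained calls on a `Maybe` stay typed as `Maybe`, while the same calls through the box yield `MaybeBox`.

```java
import java.util.function.Predicate;

// Hypothetical hand-written eclair showing covariant return types.
final class CovariantDemo {
    public static void main(String[] args) {
        Maybe<String> m = new Maybe<>("foo");
        // Compiles only because Maybe.filter returns Maybe, not MaybeBox.
        Maybe<String> chained = m.filter(s -> s.startsWith("f")).filter(s -> s.length() == 3);
        MaybeBox<String> box = m;
        MaybeBox<String> viaBox = box.filter(String::isEmpty); // typed as the box
        System.out.println(chained.isPresent() + " " + viaBox.isPresent());
    }
}

// Box side of the pair: nullable and usable with erased generics.
sealed interface MaybeBox<T> permits Maybe {
    MaybeBox<T> filter(Predicate<? super T> p);   // box version returns the box
    boolean isPresent();
}

// Stand-in for the inline class: overrides filter with a covariant return type.
record Maybe<T>(T value) implements MaybeBox<T> {
    @Override
    public Maybe<T> filter(Predicate<? super T> p) {
        if (value != null && p.test(value)) {
            return this;
        }
        return new Maybe<>(null);
    }

    @Override
    public boolean isPresent() {
        return value != null;
    }
}
```

Without the covariant override, `m.filter(...)` would be typed as `MaybeBox`, and the chained result could no longer be assigned to a `Maybe` without an explicit cast.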

From forax at univ-mlv.fr Mon Aug 5 11:08:27 2019 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Mon, 5 Aug 2019 13:08:27 +0200 (CEST) Subject: Collapsing the requirements In-Reply-To: <3403A975-531D-4B35-A4F6-BA7C1734E6EA@oracle.com> References: <950816529.15227.1564854481184.JavaMail.zimbra@u-pem.fr> <15185241.85842.1564949186845.JavaMail.zimbra@u-pem.fr> <3403A975-531D-4B35-A4F6-BA7C1734E6EA@oracle.com> Message-ID: <1291415783.81747.1565003307284.JavaMail.zimbra@u-pem.fr>

> From: "Brian Goetz"
> To: "Remi Forax"
> Cc: "valhalla-spec-experts"
> Sent: Monday, August 5, 2019 04:15:16
> Subject: Re: Collapsing the requirements

> I'm not really following the concern you're having.
> Let's divide into two cases:
> - Inline classes declared explicitly with their box (e.g., Optional)
> - Inline classes declared without a box, which get an implicit box.
> So let's say you have:
>     inline class Foo implements Bar {
>         Foo id(Foo f) { return f; }
>     }
> This would acquire a nested interface Foo.Box, as a super type of Foo:
>     interface Box extends Bar {
>         Foo id(Foo f);
>     }
> Can you outline what you see as going wrong here?

This is breaking confinement, but maybe confinement is a bad idea.
Let me try to explain why I think confinement is a necessary evil for LW10.

If we have a public method like "id" that takes or returns a Foo, it means that you are providing a public API that doesn't play well with inference (because Foo cannot be a type argument for T).
Any user code that wants to use a Stream, an Optional, List.of(), Arrays.asList(), etc. will not work. So you are transferring the problem of generics over inline classes not being supported in LW10 from the library developers to the users of those libraries.

So the question is: should we do something to help library developers avoid publishing public APIs with inline classes in their signatures, or do we think that library developers will discover by themselves that having an API with an inline class in the middle is not user friendly?

Rémi

From brian.goetz at oracle.com Mon Aug 5 14:18:48 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 5 Aug 2019 07:18:48 -0700 Subject: Collapsing the requirements In-Reply-To: <1291415783.81747.1565003307284.JavaMail.zimbra@u-pem.fr> References: <950816529.15227.1564854481184.JavaMail.zimbra@u-pem.fr> <15185241.85842.1564949186845.JavaMail.zimbra@u-pem.fr> <3403A975-531D-4B35-A4F6-BA7C1734E6EA@oracle.com> <1291415783.81747.1565003307284.JavaMail.zimbra@u-pem.fr> Message-ID:

OK, now I see the chain of assumptions that led you to your concern. Not sure if all those assumptions are valid, but we're going to have to tackle them explicitly.

From john.r.rose at oracle.com Mon Aug 5 21:41:48 2019 From: john.r.rose at oracle.com (John Rose) Date: Mon, 5 Aug 2019 14:41:48 -0700 Subject: Collapsing the requirements In-Reply-To: References: Message-ID: <80E2C4CE-6735-499E-86E0-063BA5C5EBEB@oracle.com>

Yay! Excellent summary of a major break in the logjam!

On Aug 3, 2019, at 9:37 AM, Brian Goetz wrote:
>
> As Remi noted, we had some good discussions at JVMLS this week. Combining that with some discussions John and I have been having over the past few weeks, I think the stars are aligning to enable us to dramatically slim down the requirements. The following threads have been in play for a while:
>
> - John: I hate the LPoint/QPoint distinction

(Expansion: I have come to dislike the costs, in the JVMS, of disambiguating Point into indirect-Point and inline-Point. The root cause is two meanings for the same name Point.)

> - Brian: I hate null-default types
> - Remi: I hate the V? type
>
> But the argument for each of these depended, in some way, on the others. I believe, with a few compromises, we can now prune them as a group, which would bring us to a much lower energy state.
>
> ## L^Q World: Goodbye `LV;`
>
> We've taken it as a requirement that for a value type V, we have to support both LV and QV, where LV is the null-adjunction of QV. This has led to a lot of complexity in the runtime, where we have to manage dual mirrors.
>
> The main reason why we wanted LV was to support in-place migration. (In Q-world, LV was the box for QV, so it was natural for migration.) But, as we've worked our migration story, we've discovered we may not need LV for migration. And if we don't, we surely don't need it for anything else; worst-case, we can erase LV to `LValObject` or even `LObject` (or, if we're worried about erasure and overloading, to something like `LObject//V` using John's type-operator notation.)

(The basic move Brian is alluding to is to distinguish the verifier type T0 from additional "type decorations" T1, and encode the descriptor in the form T0//T1. The verifier ignores "//T1", probably, or at least pays attention only to a small set of "hardwired" instances of "//x", like "//n" for not-null maybe. The JVM is free to use only the T0 prefix to build calling sequences. The "//T1" part is not necessarily enforced but gives translation strategies a hook to attach unchecked "intentionality" to the bare type T0. JITs might use the //T1 part in speculative predication, which would win as long as the translation strategy wasn't "polluted". Overloads of m(T0//T1) and m(T0//T2) and m(T0) are all distinct. The effect is similar to that of interface types, which are also unchecked, at least as arguments and returns of simple method calls. The reflective properties of T0//T1 are TBD.)

>
> Assuming we can restructure the migration story to not require LV to represent a VM-generated "box" (which I believe we can; see below), we can drop the requirement for LV. An inline class V gives rise to a single type descriptor, QV (or whatever we decide to call it; John may have plans here.)
More on John's plans: The best way to trigger pre-loading of V robustly and consistently seems to be to have a new descriptor letter x != 'L'; right now x == 'Q'. We've talked through a lot of other ways to keep "LV;" and drive preloading (or other oracular schema queries) from another signal channel, but nothing works as simply and reliably as a new descriptor letter.

My best prediction right now is that we keep 'Q' for now, and if we want to further decouple null-hostility (which is an aspect of today's 'Q') from preloading (which is logically independent), then we shift to another letter ('G') to drive the preloading and signal non-nullity using a type operator like "//n", which might be partially enforced by the verifier.

The factor which might drive such decoupling in the future is the support for templates. A template instance might be created in response to a preload-mode descriptor like "QOpt[QPoint;];", even if the expansion of the template for some reason works out to a pointer which permits null. In such a hypothetical case, the presence of 'Q' means "expand when you see it", not "preload and by the way not null".

Make sense?

>
> ## Goodbye `V?`
>
> The other reason we wanted LV was that it was the obvious representation for the language type `V?` (V adjoined with null.) Uses for `V?` include:
>
> - Denoting non-flattened value fields;
> - Denoting non-flattened value arrays;
> - Denoting erased generics over values (`Foo<V?>`);
> - Denoting the type that is the adjunction of null to V (V | Null), when we really want to talk about nullability.
>
> But, we can do all this without a `V?` type; for every V, there is already at least one super type of V that includes `V|Null`: Object, and any interface implemented by V. If we arrange that every value type V has a super type V′, not implemented by any other type, then the value set of this V′ is exactly that of `V?`. And we can use V′ to do all the things `V?` did with respect to V, including sub typing.
> The language doesn't need the `?` type operator; it just needs to ensure that V′ always exists. Which turns out to be easy, and also turns out to be essential to the migration story.
>
> #### Eclairs
>
> We can formalize this by requiring that every value type have a companion interface (or abstract class) supertype. Define an envelope-class pair ("eclair") as a pair (V, I) such that:
>
> - V is an inline class
> - I is a sealed type
> - I permits V (and only V)
> - V <: I
>
> (We can define eclairs for indirect classes, but they are less interesting, because indirect classes already contain null.)
>
> If every value type is a member of an eclair, we can use V when we want the flattenable, non-nullable, specializable type; and we use I when we want the non-flattenable, nullable, erased "box". We don't need to denote `V?`; we can just use I, which is an ordinary, nominal type.
>
> Note that the VM can optimize eclairs about as well as it could for LV; it knows that I is the adjunction of null to V, so that all non-null values of I are identity-free and must be of type V.
>
> What we lose relative to V? is access to fields; it was possible to do `getfield` on a LV, but not on I. If this is important (and maybe it's not), we can handle this in other ways.
>
> #### With sugar on top, please
>
> We can provide syntax sugar (please, let's not bike shed it now) so that an inline class _automatically_ acquires a corresponding interface (if one is not explicitly provided), onto which the public members (and type variables, and other super types) of C are lifted. For sake of exposition, let's say this is called `C.Box`, and is a legitimate inner class of C (which can be generated by the compiler as an ordinary classfile.) We've been here before, and abandoned it because "Box" seemed misleading, but let's call it that for now. And now it is a real nominal type, not a fake type.
> In the simplest case, merely declaring an inline class could give rise to V.Box.
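A rough approximation of an eclair in today's Java may help fix the shape (sealed types need Java 17 or later). Java cannot declare inline classes yet, so a record, which is implicitly final, stands in for V; `PointBox`, `Point` and `EclairDemo` are hypothetical names. The sealed interface lifts V's public accessors, and because it permits only `Point`, every non-null `PointBox` is a `Point`, so unboxing reduces to a null check plus a cast that cannot fail.

```java
import java.util.Objects;

// Hypothetical eclair (V, I): PointBox plays I, Point plays V.
// A record stands in for an inline class, which today's Java cannot declare.
sealed interface PointBox permits Point {
    int x();   // public members of V, lifted into I
    int y();
}

record Point(int x, int y) implements PointBox { }

public class EclairDemo {
    // PointBox permits only Point, so a non-null PointBox is always a Point.
    static Point unbox(PointBox box) {
        return (Point) Objects.requireNonNull(box, "null has no Point value");
    }

    // Members lifted into I are callable without unboxing; a null box still throws NPE.
    static int sum(PointBox box) {
        return box.x() + box.y();
    }

    public static void main(String[] args) {
        PointBox boxed = new Point(1, 2);   // V <: I: "boxing" is just an upcast
        PointBox absent = null;             // I adds null to V's value set
        System.out.println(unbox(boxed) + " " + sum(boxed) + " " + (absent == null));
    }
}
```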

I sincerely hope we can do this little trick, so that you can write one-liner inline types (such as records) without mandated interface boilerplate. This interacts with the rules for lifting methods from V into V.Box (see below). But IDEs would be able to refactor the code to make the sugary default visible (for further editing) or invisible again.

Point of comparison: this is roughly how Java treats empty constructors (in classes which don't define their own constructors). It is as if the class's author had written the "obvious trivial" constructor with no arguments and an empty body. And IDEs can in principle reveal or suppress such trivial members as routine refactorings.

> Now, the type formerly known as `V?` is an ordinary, nominal interface (or abstract class) type. The user can say what they mean, and no magic is needed by either the language or the VM. Goodbye `V?`.

(Thunderous applause as the crowd goes wild! At least, I'm getting a little excited here.)

Part of the model, I think, is that the I aspect of an eclair is "just as functional" as the V aspect, excepting only the presence of null and NPE. This means we need a mechanism (which I won't speculate on here; there are multiple possibilities) for lifting the public methods of V into I so they can be invoked via I.

But here are some related user model questions:

Is there ever a case where a non-sealed super of V is "good enough", or is there always a 1-1 relation between every V and its I? (A type V without a sealed I would still have supers like Object and Comparable. But its unbox operation would go to a type like Object, which isn't unambiguously unboxed back to V. One might call such a half-baked type "weakly boxable", a raw biscuit rather than a tasty eclair. Is it possible? Is it worth it?)

Given a proper eclair (V and I in 1-1 relation), we can also ask how the following sets compare:
- The instance methods of V and those of I?
- Also the static methods of V and those of I?
- Also the instance fields of V and the methods of I?
- Also the static fields of V and the fields and/or methods of I?
- And the nested types of V and I?

(Quick take: Public instance fields of V are all present in I by some means TBD. Users can, but shouldn't, define members in I such as default methods, constants, static factories, etc. Such things always work better on V, unless there is a compatibility play going on, in which case perhaps the members should be defined in *both* places as a best practice.)

Do we always nest I=V.Box inside of V, or do we sometimes allow other couplings, such as I, V as sibling members of another type or package, or V=I.Val inside of I? (Quick take: Flexibility is good, plus clear best practices. Assuming all the 1-1 relations are always derivable at compile time and run time.)

How does this interact with nested classes? If one inline nests inside another, is there any tricky way to go from inner to outer via the box? (Probably not.) Given a nested inner inline type V as C.V, where every V has an outer C instance, what is C.V.default? (NDVs solved that. Maybe we can somehow rewrite vulnerable expressions of type C.V as C.V.Box and let them go to null? Maybe the array type C.V[] is hard to obtain and the language directs you instead to C.V.Box[]?)

Since V=I.Val and I=V.Box are statically and dynamically related, it appears that an erased generic Foo can potentially instantiate as Foo and then inside its type signature make use of the static type V=I.Val, spelled as "T.Val" or "unbox" or some such. The unboxing casts to V (which reject null) would be planted around the edges of the generic at all uses of its instance, not inside it. This seems possible to me; is it useful enough to justify the weird intrinsic type operator in the JLS? Probably not, but I'm putting it out there anyway. (What's T.Val / unbox when T is not a ValObject? Maybe it's just T.
Or maybe T.Val is only allowed to instantiate when T is of the form V.Box, but there's no bound that expresses that, currently, a hint that this "feature" is DOA. But a possible use of T.Val / unbox would be in argument and return positions of erased generics, in all places where null should be excluded. There's no need for T.Box / box, since T cannot ever be an inline type V. Today's generics could be upgraded to be smarter about nulls if they could "guard" their arguments and return values with box instead of T.)

John

From john.r.rose at oracle.com Mon Aug 5 21:47:48 2019
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 5 Aug 2019 14:47:48 -0700
Subject: Collapsing the requirements
In-Reply-To: <1291415783.81747.1565003307284.JavaMail.zimbra@u-pem.fr>
References: <950816529.15227.1564854481184.JavaMail.zimbra@u-pem.fr> <15185241.85842.1564949186845.JavaMail.zimbra@u-pem.fr> <3403A975-531D-4B35-A4F6-BA7C1734E6EA@oracle.com> <1291415783.81747.1565003307284.JavaMail.zimbra@u-pem.fr>
Message-ID: <14A6B91F-5A1A-4C7A-9FC7-67925B3D01EE@oracle.com>

On Aug 5, 2019, at 4:08 AM, forax at univ-mlv.fr wrote:
>
> if we have a public method like "id" that takes a Foo or returns a Foo, it means that you are providing a public API that doesn't play well with inference (because Foo can not be a type argument for T).
> Any user code that will want to use a Stream, an Optional, List.of(), Arrays.asList(), etc. will not work.
> So you are transferring the problem of generics over inline classes not being supported in LW10 from the library developer to the users of those libraries.
>
> So the question is: should we do something to help library developers avoid publishing public APIs with inline classes in their signatures, or do we think that library developers will discover by themselves that having an API with an inline class in the middle is not user friendly?

I would prefer to make the inline classes work well with libraries, including generic inference, so that the companion box type is required as rarely as possible, only for instantiation.

So, List is required, but if you say List.of(V.default) inference should pop up to List, just like List.of(1) pops up to List.

As Brian suggests, this is the sort of detail that needs to be worked out. I'm hopeful it can work at least as well as today's int/Integer rules.

From forax at univ-mlv.fr Tue Aug 6 13:17:07 2019
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Tue, 6 Aug 2019 15:17:07 +0200 (CEST)
Subject: Collapsing the requirements
In-Reply-To: <14A6B91F-5A1A-4C7A-9FC7-67925B3D01EE@oracle.com>
References: <950816529.15227.1564854481184.JavaMail.zimbra@u-pem.fr> <15185241.85842.1564949186845.JavaMail.zimbra@u-pem.fr> <3403A975-531D-4B35-A4F6-BA7C1734E6EA@oracle.com> <1291415783.81747.1565003307284.JavaMail.zimbra@u-pem.fr> <14A6B91F-5A1A-4C7A-9FC7-67925B3D01EE@oracle.com>
Message-ID: <991931561.245143.1565097427234.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "John Rose"
> To: "Remi Forax"
> Cc: "Brian Goetz" , "valhalla-spec-experts"
> Sent: Monday, August 5, 2019 23:47:48
> Subject: Re: Collapsing the requirements

> On Aug 5, 2019, at 4:08 AM, forax at univ-mlv.fr wrote:
>>
>> if we have a public method like "id" that takes a Foo or returns a Foo, it means that you are providing a public API that doesn't play well with inference (because Foo can not be a type argument for T).
>> Any user code that will want to use a Stream, an Optional, List.of(), Arrays.asList(), etc. will not work.
>> So you are transferring the problem of generics over inline classes not being supported in LW10 from the library developer to the users of those libraries.
>>
>> So the question is: should we do something to help library developers avoid publishing public APIs with inline classes in their signatures, or do we think that library developers will discover by themselves that having an API with an inline class in the middle is not user friendly?
>
> I would prefer to make the inline classes work well with libraries, including generic inference, so that the companion box type is required as rarely as possible, only for instantiation.
>
> So, List is required, but if you say List.of(V.default) inference should pop up to List, just like List.of(1) pops up to List.
>
> As Brian suggests, this is the sort of detail that needs to be worked out. I'm hopeful it can work at least as well as today's int/Integer rules.

A long-term objective is to get rid of boxing; if we are introducing more boxing in LW10, we may cripple our own future. So I don't think LW10 should introduce rules for inline classes that work "as well" as boxing conversions; they should work well enough. For example, List.of(V.default) being typed as a List is perhaps too much, but List.of(V.default) being typed as a List *and* List list = List.of(V.default) being valid should be OK. The difference may be subtle, but in the second case, the assignment to List works because a target type is specified. This will ensure that the support of specialized generics in LW100 will also work and is the preferred mechanism.

Here are the rules I propose:
- we should not allow inline classes in the signatures of public methods (the confinement rule),
- there should be an opt-in mechanism to have the box (the eclair interface) derived automatically by the compiler; this default derived interface is empty (unlike Integer), so you have to unbox it to call a method in the bytecode, but as part of the desugaring mechanism, when invoking a method on the box, the compiler will invoke the corresponding method on the inline class (or an NPE is thrown).
- the autobox rule should be derived from the handshake between an interface and its sole permitted implementation being an inline class (so it works for the generated Box classes but also for any eclair interfaces).

So for a value-based class like Optional, Optional will have a hand-crafted box (Opt), the autobox rule works, and calling a method on Optional is a plain old interface call.
For an inline class like Complex with a Box provided by the compiler, Complex.Box is an empty interface, the autobox rule works, and calling a method on Complex.Box is equivalent to inserting a cast like this: ((Complex)box).method()

Rémi

From frederic.parain at oracle.com Tue Aug 6 15:44:08 2019
From: frederic.parain at oracle.com (Frederic Parain)
Date: Tue, 6 Aug 2019 11:44:08 -0400
Subject: Collapsing the requirements
In-Reply-To:
References:
Message-ID:

Brian,

Thank you for this description of the new model.
See my comments and questions inlined below.

> On Aug 3, 2019, at 12:37, Brian Goetz wrote:
>
> As Remi noted, we had some good discussions at JVMLS this week. Combining that with some discussions John and I have been having over the past few weeks, I think the stars are aligning to enable us to dramatically slim down the requirements. The following threads have been in play for a while:
>
> - John: I hate the LPoint/QPoint distinction
> - Brian: I hate null-default types
> - Remi: I hate the V? type
>
> But the argument for each of these depended, in some way, on the others. I believe, with a few compromises, we can now prune them as a group, which would bring us to a much lower energy state.
>
> ## L^Q World: Goodbye `LV;`
>
> We've taken it as a requirement that for a value type V, we have to support both LV and QV, where LV is the null-adjunction of QV. This has led to a lot of complexity in the runtime, where we have to manage dual mirrors.
>
> The main reason why we wanted LV was to support in-place migration.
> (In Q-world, LV was the box for QV, so it was natural for migration.) But, as we've worked our migration story, we've discovered we may not need LV for migration. And if we don't, we surely don't need it for anything else; worst-case, we can erase LV to `LValObject` or even `LObject` (or, if we're worried about erasure and overloading, to something like `LObject//V` using John's type-operator notation.)
>
> Assuming we can restructure the migration story to not require LV to represent a VM-generated "box" (which I believe we can; see below), we can drop the requirement for LV. An inline class V gives rise to a single type descriptor, QV (or whatever we decide to call it; John may have plans here.)
>
> ## Goodbye `V?`
>
> The other reason we wanted LV was that it was the obvious representation for the language type `V?` (V adjoined with null.) Uses for `V?` include:
>
> - Denoting non-flattened value fields;
> - Denoting non-flattened value arrays;
> - Denoting erased generics over values (`Foo`);
> - Denoting the type that is the adjunction of null to V (V | Null), when we really want to talk about nullability.
>
> But, we can do all this without a `V?` type; for every V, there is already at least one super type of V that includes `V|Null`: Object, and any interface implemented by V. If we arrange that every value type V has a super type V′, not implemented by any other type, then the value set of this V′ is exactly that of `V?`. And we can use V′ to do all the things `V?` did with respect to V, including subtyping. The language doesn't need the `?` type operator; it just needs to ensure that V′ always exists. Which turns out to be easy, and also turns out to be essential to the migration story.
>
> #### Eclairs
>
> We can formalize this by requiring that every value type have a companion interface (or abstract class) supertype. Define an envelope-class pair ("eclair")
> as a pair (V, I) such that:
>
> - V is an inline class
> - I is a sealed type
> - I permits V (and only V)
> - V <: I
>
> (We can define eclairs for indirect classes, but they are less interesting, because indirect classes already contain null.)
>
> If every value type is a member of an eclair, we can use V when we want the flattenable, non-nullable, specializable type; and we use I when we want the non-flattenable, nullable, erased "box". We don't need to denote `V?`; we can just use I, which is an ordinary, nominal type.

So, legal signatures will be:
- QV;
- LI;
and that's it, right?

Q will continue to have its current semantics (flattenable, non-nullable, triggers pre/eager-loading).
L will continue to have its legacy semantics (indirection, nullable, no new loading rules).

>
> Note that the VM can optimize eclairs about as well as it could for LV; it knows that I is the adjunction of null to V, so that all non-null values of I are identity-free and must be of type V.

Optimizing I might require some knowledge about V, but because V <: I, I could be loaded while V is not loaded yet. This might not be an issue if all information about the eclair is contained in I, but as far as I understand the model so far, this is not guaranteed. For instance, if I is an interface, it won't include information about the field layout of V (but if I is an abstract class, it could). So, some exploration is required here to determine:
- how information is split between I and V
- whether there is an impact on performance if I is loaded but V is not
- whether V and I should be loaded together

>
> What we lose relative to V? is access to fields; it was possible to do `getfield` on a LV, but not on I. If this is important (and maybe it's not), we can handle this in other ways.

This is related to an open question that shows up in many places in this document.
What should be the nature of V's super type? An interface or an abstract class?
If it is an abstract class, it could declare and access the fields.
The question expands further than just fields: what about method bodies?
Should they be in V or in I? This has an impact on the type of "this" in these
methods, even if this model has the nice property that "this" will always point
to an instance of V (as long as the JVM protects the model, and prevents external
forces (JVMTI, Unsafe, etc.) from breaking the special and unique relationship
between I and V). And the type of "this" will also impact the way methods are
invoked (invokevirtual vs invokeinterface).

>
> #### With sugar on top, please
>
> We can provide syntax sugar (please, let's not bike shed it now) so that an inline class _automatically_ acquires a corresponding interface (if one is not explicitly provided), onto which the public members (and type variables, and other super types) of C are lifted.

Does the interface only declare public methods, or does it also provide the implementation (default methods)?

> For sake of exposition, let's say this is called `C.Box`, and is a legitimate inner class of C (which can be generated by the compiler as an ordinary classfile.)

Is it a new feature? Or just an idea of how it could be implemented in the future?

Because I've tried to compile this:

    public class C implements C.Box {
        static public interface Box {

        }
    }

And I got:

    C.java:1: error: cyclic inheritance involving C
    public class C implements C.Box {
           ^
    1 error

> We've been here before, and abandoned it because "Box" seemed misleading, but let's call it that for now. And now it is a real nominal type, not a fake type. In the simplest case, merely declaring an inline class could give rise to V.Box.
>
> Now, the type formerly known as `V?` is an ordinary, nominal interface (or abstract class) type. The user can say what they mean, and no magic is needed by either the language or the VM. Goodbye `V?`.
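javac rejects a class that implements an interface nested inside itself, as the error above shows. One coupling that does compile today is the reverse nesting John raises earlier in the thread, V = I.Val, with the value class nested inside its box. The sketch below (Java 17+) is hypothetical: `Complex`, `Complex.Val` and `NestedEclairDemo` are invented names, and a record stands in for the inline class.

```java
// Hypothetical "V = I.Val" coupling: the value class is nested inside its box.
// Unlike "class C implements C.Box", this has no inheritance cycle.
sealed interface Complex permits Complex.Val {
    double re();
    double im();

    // Stand-in for an inline class; a nested record is implicitly static and final.
    record Val(double re, double im) implements Complex {
        Val plus(Val other) {
            return new Val(re + other.re, im + other.im);
        }
    }
}

public class NestedEclairDemo {
    public static void main(String[] args) {
        Complex.Val a = new Complex.Val(1.0, 2.0);
        Complex boxed = a.plus(new Complex.Val(0.5, 0.5));  // upcast to the box type
        System.out.println(boxed.re() + " " + boxed.im());
    }
}
```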
>
> #### Boxing conversion
>
> Given the constraints of the eclair relationship, it would be reasonable for the compiler to derive from this that there is a boxing conversion between C and I (I is just the value set of C, plus null, which is the relationship boxes have with their corresponding primitives.) The boxing operation is a no-op (since C <: I) and the unboxing operation is a null-checking cast.

Could we assume that boxing/unboxing would be handled by the static compiler (like primitive boxing today),
and there's no expectation that the JVM will do magic boxing when needed? (Not considering auto-bridges yet.)

>
> #### Erased generics
>
> Using the eclair wrapper also kicks the problem of erased generics down the road; if we use `Foo` for erased generics, and temporarily ban `Foo`, when we get to specialized generics, it will be obvious what `Foo` means (their common super type will be `Foo`). This is a less confusing world, as then "List of erased V" and "List of specialized V" don't coexist; there's only "List of V" and "List of V's Box".
>
> ## Migration
>
> The ability to migrate Optional and friends to values has been an important goal, but it has been the source of significant complexity. Our previous story leaned hard on "When we migrate X to a value, LX will describe the box, so old callsites will continue to link." But it turned out that brought a lot of baggage (forwarding bridges, null-default values) and compromises (null-default values lose their calling-convention optimizations), and over the past few weeks John and I have been cooking up a simpler eclair-based recipe for this.
>
> The world is indeed full of existing utterances of `LOptional`, and they will still want to work. Fortunately, Optional follows the rules for being a value-based class. We start with migrating Optional from a reference class to an eclair with a public abstract class and a private value implementation. Now, existing code just works (source and binary),
> and optionals are values. But, this isn't good enough; existing variables of type Optional are not flattened.

Notable difference with previous statements: here the eclair is made of an inline class and an abstract class
(instead of an inline class and an interface). I assume this is for backward compatibility (Optional's methods
are currently invoked using invokevirtual and not invokeinterface).

Is the plan to have both types of eclair:
- inline class + interface
- inline class + abstract class
Or is it simply an open question about the nature of V's super type?

If V's super type is an abstract class, some additional issues have to be considered.
If both V and V's super are classes (abstract or not), they both can declare fields, so they
could end up having different layouts. Even if javac checks against that, manually crafted
class files and instrumentation frameworks injecting fields (with redefineClass) could create
situations where a mismatch exists between V and V's super.
Would this cause issues? Should the JVM guard against that? To be investigated.

>
> One of the objections raised to in-place migration was nullity; in order to migrate Optional to a true value, it would have to be a null-default value, and this already entailed compromises. If we're willing to compromise further, we can get what we want without the baggage. And that compromise is: give up the name.
>
> So we define a new public value class `Opt` which is the value half of the eclair, and the existing Optional is the interface/abstract class half. Now, existing fields / arrays can migrate gradually to Opt, as they want the benefit of flattening; existing APIs can continue to truck in Optional (which have about the same optimizations as a null-default value would have on the stack.)
>
> This works because of the boxing conversion. Suppose we have old code that does:
>
>     Optional o = makeAnOptional()
>
> when the user changes this to
>
>     Opt o = …
>
> the compiler sees the RHS is an Optional and the LHS is an Opt, and there is a boxing conversion between them, so we insert an unbox conversion (null check) and we're done. Users can migrate their fields gradually. The cost: the good name gets burned. But there is a compatible migration path from ref to value.
>
> Later, when we have bridges (we don't need them yet!), we can migrate the library uses from Optional to Opt.
>
> ## Null-default values
>
> About 75% of the motivation for null-default values (another huge source of complexity) was to support the migration of value-based classes. And it wasn't even a great solution, because we still lost some key optimizations (e.g., calling conventions.) With the Optional -> Opt path, we don't need null-default values; we get ordinary values. So while we pay the cost of changing the name, we gain the benefit that, once the full migration is effected, the new values don't carry the legacy performance baggage.
>
> Another 20% of the motivation was for security-sensitive classes whose default value did not represent a useful value, for which we wanted not null-default-ness but really initialization safety. Let's look at another way to get there.
>
> There are a few ways to get there. One is to treat this problem as protecting such classes from uninitialized fields or array elements; another is to ensure that such classes (a) have no public fields and (b) perform the correct check at the top of each method (which can be injected by the compiler.) I don't want to solve that problem right here, but I think there are enough ways to get there that we can assume this isn't a hard requirement.

Would (b) be applied to non-static inner inline classes, or are they definitively considered a lost cause?
Currently they can throw an NPE, which is not so bad after all.

>
> The other 5% was just the user-based "I want null in my value set." For those, we can tell users: use the interface box when you need null.
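The Optional-to-Opt recipe can be approximated in today's Java to show where the unboxing conversion would sit. In this hypothetical sketch (Java 17+), `LegacyOptional` stands in for the existing abstract-class half (renamed only to avoid clashing with `java.util.Optional`), a final class stands in for the inline class `Opt`, and the explicit `unbox` helper models the null-checking cast the compiler would insert. `LegacyOptional`, `unbox`, `OptMigrationDemo` and the body of `makeAnOptional` are assumptions, not part of the proposal.

```java
import java.util.NoSuchElementException;
import java.util.Objects;

// Abstract-class half: existing call sites keep linking against this name.
abstract sealed class LegacyOptional<T> permits Opt {
    public abstract boolean isPresent();
    public abstract T get();
}

// Value half. A final class stands in for a future inline class.
final class Opt<T> extends LegacyOptional<T> {
    private static final Opt<?> EMPTY = new Opt<>(null);
    private final T value;

    private Opt(T value) { this.value = value; }

    static <T> Opt<T> of(T value) { return new Opt<>(Objects.requireNonNull(value)); }

    @SuppressWarnings("unchecked")
    static <T> Opt<T> empty() { return (Opt<T>) EMPTY; }

    @Override public boolean isPresent() { return value != null; }

    @Override public T get() {
        if (value == null) throw new NoSuchElementException("empty Opt");
        return value;
    }
}

public class OptMigrationDemo {
    // An old API that still trucks in the abstract-class half.
    static LegacyOptional<String> makeAnOptional() { return Opt.of("hello"); }

    // Models the conversion LegacyOptional -> Opt: a null check plus a cast that cannot fail.
    static <T> Opt<T> unbox(LegacyOptional<T> box) {
        return (Opt<T>) Objects.requireNonNull(box);
    }

    public static void main(String[] args) {
        LegacyOptional<String> o1 = makeAnOptional();   // old code: unchanged
        Opt<String> o2 = unbox(makeAnOptional());        // migrated variable
        LegacyOptional<String> o3 = o2;                   // boxing is a plain upcast
        System.out.println(o1.get() + " " + o2.isPresent() + " " + (o3 == o2));
    }
}
```

Old call sites keep compiling against `LegacyOptional`, while migrated variables opt into `Opt`; in this sketch the only new runtime check is the null check inside `unbox`.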
>
> ## Summary
>
> In one swoop, we can banish LV from the VM,

The JVM will eventually have to enforce this at some point. It seems beyond what the verifier can do (because it doesn't load types in L-signatures), so we have to identify when checks can be performed at runtime.

> V? from the language, and null-default values, by making a simple requirement: every value type is paired with an interface or abstract class "box". For most values, this can be automatically generated by the compiler and denoted via a well-known name (e.g., V.Box); for some values, such as those that are migrated from reference types, we can explicitly declare the box type and pick explicit names for both types.
>
> There's a lot to work out, but I think it should be clear enough that this is a much, much lower energy state than what we were aiming at for L10, and also a simpler user model.
>
> Let's focus discussions on validating the model first before we dive into mechanism or surface syntax.

The model looks promising, but a more precise specification of eclairs would be helpful
to estimate the impact on the JVM:

- What is the nature of V's super?
- How are fields/methods declared/implemented between V and V's super?
- Are there any special requirements regarding static members between V and V's super?
- Is there a requirement that V and V's super share the bodies of their non-static public methods?

These questions do not expect answers right away, but are suggestions about aspects of the model that would be worth discussing.

Thank you,

Fred

From brian.goetz at oracle.com Tue Aug 6 16:50:26 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 6 Aug 2019 09:50:26 -0700
Subject: Collapsing the requirements
In-Reply-To:
References:
Message-ID:

> So, legal signatures will be:
> - QV;
> - LI;
> and that's it, right?
>
> Q will continue to have its current semantics (flattenable, non-nullable, triggers pre/eager-loading).
> L will continue to have its legacy semantics (indirection, nullable, no new loading rules).

Correct. Nice and simple!

>
>>
>> Note that the VM can optimize eclairs about as well as it could for LV; it knows that I is the adjunction of null to V, so that all non-null values of I are identity-free and must be of type V.
>
> Optimizing I might require some knowledge about V, but because V <: I, I could be loaded while V is not loaded yet.

If the rule is "always preload Q" (which I think is what John is suggesting), then this case cannot come up, because I's class file will mention QV. Similarly, the opposite case does not happen either, as we load super types first, so loading V will trigger loading I. Of course, we can twiddle these rules and get different answers, but this is my understanding based on the rules I have heard for load order.

>
>>
>> What we lose relative to V? is access to fields; it was possible to do `getfield` on a LV, but not on I. If this is important (and maybe it's not), we can handle this in other ways.
>
> This is related to an open question that shows up in many places in this document.
> What should be the nature of V's super type? An interface or an abstract class?
> If it is an abstract class, it could declare and access the fields.
> The question expands further than just fields: what about method bodies?
> Should they be in V or in I? This has an impact on the type of "this" in these
> methods, even if this model has the nice property that "this" will always point
> to an instance of V (as long as the JVM protects the model, and prevents external
> forces (JVMTI, Unsafe, etc.) from breaking the special and unique relationship
> between I and V). And the type of "this" will also impact the way methods are
> invoked (invokevirtual vs invokeinterface).

There's a longer discussion to be had about bringing abstract classes and interfaces closer together, or allowing abstract class super types of values, and if so, how.
I have some vague ideas of how the VM and language could handle this combination; rather than dive into that now, I'll just say that here are the places where the concept of inline-extends-abstract-class has come up:

- Migrating VBC to inline classes
- Inline records (as there is an abstract Record super type)
- Whether ValObject is an interface or an abstract class

Which is to say, we should untangle this knot, which I think is pretty closely related to the RefObject/ValObject knot, so I would think it is best to untangle them together.

>
>>
>> #### With sugar on top, please
>>
>> We can provide syntax sugar (please, let's not bike shed it now) so that an inline class _automatically_ acquires a corresponding interface (if one is not explicitly provided), onto which the public members (and type variables, and other super types) of C are lifted.
>
> Does the interface only declare public methods, or does it also provide the implementation (default methods)?

If we extract an interface from the class mechanically, we would lift the public methods, the super types, and the type variables to the interface. If the user writes the interface by hand, they will do what they're going to do.

>
>> For sake of exposition, let's say this is called `C.Box`, and is a legitimate inner class of C (which can be generated by the compiler as an ordinary classfile.)
>
> Is it a new feature? Or just an idea of how it could be implemented in the future?
>
> Because I've tried to compile this:
>
>     public class C implements C.Box {
>         static public interface Box {
>
>         }
>     }

Yes, we would have to address this. The cycle here is not a real cycle, in that Box does not depend on C for anything, except that it happens to live there.

>> #### Boxing conversion
>>
>> Given the constraints of the eclair relationship, it would be reasonable for the compiler to derive from this that there is a boxing conversion between C and I (I is just the value set of C, plus null,
>> which is the relationship boxes have with their corresponding primitives.) The boxing operation is a no-op (since C <: I) and the unboxing operation is a null-checking cast.
>
> Could we assume that boxing/unboxing would be handled by the static compiler (like primitive boxing today),
> and there's no expectation that the JVM will do magic boxing when needed? (Not considering auto-bridges yet.)

Yes. In fact, we only need this in one direction; since C <: I, the conversion C -> I comes for free (subtyping); it is only the conversion I -> C that would require an unboxing conversion. The compiler would introduce the necessary casts (which the VM can optimize to null checks.)

>>
>> The world is indeed full of existing utterances of `LOptional`, and they will still want to work. Fortunately, Optional follows the rules for being a value-based class. We start with migrating Optional from a reference class to an eclair with a public abstract class and a private value implementation. Now, existing code just works (source and binary), and optionals are values. But, this isn't good enough; existing variables of type Optional are not flattened.
>
> Notable difference with previous statements: here the eclair is made of an inline class and an abstract class
> (instead of an inline class and an interface). I assume this is for backward compatibility (Optional's methods
> are currently invoked using invokevirtual and not invokeinterface).

Correct. There are multiple ways to handle this. One is to allow eclairs with abstract classes; another is to blur the distinction between abstract class and interface so that we can make Optional an interface and support the invokevirtual callsites in the wild. I think I prefer the former, but once we start to untangle the ValObject/RefObject knot, I suspect we'll know more.

> If V's super type is an abstract class, some additional issues have to be considered.
> If both V and V's super are classes (abstract or not), they both can declare fields, so they > could end up having different layouts. Even if javac checks against that, manually crafted > class files and instrumentation frameworks injecting fields (with redefineClass) could create > situations where a mismatch exists between V and V's super. > Would this cause issues? Should the JVM guard against that? To be investigated. It would have to be worked out. I think John said something like "let the language guard against inline value classes extending inappropriate abstract classes, and if the VM sees an inline class extend an abstract class, ignore the fields and the ctors." This is probably a reasonable first-order approximation if we decide to go this route. > >> There are a few ways to get there. One is to treat this problem as protecting such classes from uninitialized fields or array elements; another is to ensure that such classes (a) have no public fields and (b) perform the correct check at the top of each method (which can be injected by the compiler.) I don't want to solve that problem right here, but I think there are enough ways to get there that we can assume this isn't a hard requirement. > > Would (b) be applied to non-static inner inline classes, or are they definitively considered a lost cause? > Currently they can throw an NPE, which is not so bad after all. Depends on how early we can guarantee that NPE. If the class might do a bunch of side effects before hitting the dereference of the outer pointer, then we might leave things in an inconsistent state. If we can fail faster, that is good. This area definitely needs investigation. > The model looks promising, but a more precise specification of eclairs would be helpful > to estimate the impact on the JVM: > > - What is the nature of V's super? > - How are fields/methods declared/implemented between V and V's super? > - Are there any special requirements regarding static members between V and V's super?
> - Is there a requirement that V and V's super share the bodies of their non-static public methods? Good questions, I hope to have answers eventually. If you have preferred answers, please share your thinking!

From forax at univ-mlv.fr Tue Aug 6 20:50:07 2019 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 6 Aug 2019 22:50:07 +0200 (CEST) Subject: Collapsing the requirements In-Reply-To: References: Message-ID: <725129568.291668.1565124607633.JavaMail.zimbra@u-pem.fr> > From: "Brian Goetz" > To: "Frederic Parain" > Cc: "valhalla-spec-experts" > Sent: Tuesday, 6 August 2019 18:50:26 > Subject: Re: Collapsing the requirements >> So, legal signatures will be: >> - QV; >> - LI; >> and that's it, right? >> Q will continue to have its current semantic (flattenable, non-nullable, >> triggers pre/eager-loading). >> L will continue to have its legacy semantic (indirection, nullable, no new >> loading rules) > Correct. Nice and simple! I believe 'Q' should only mean preload, and the fact that the class has an inline bit should imply flattenable and non-nullable. Yes, we consume one of these precious bits, but at the same time, we nicely decouple the meaning of the descriptor from the meaning of the class itself. Rémi

From john.r.rose at oracle.com Tue Aug 6 21:03:09 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 6 Aug 2019 14:03:09 -0700 Subject: Collapsing the requirements In-Reply-To: References: Message-ID: <3B71E9A6-2746-4B5E-8621-5C0EFB35F5E3@oracle.com> Good discussion! On Aug 6, 2019, at 9:50 AM, Brian Goetz wrote: > >> So, legal signatures will be: >> - QV; >> - LI; >> and that's it, right? >> >> Q will continue to have its current semantic (flattenable, non-nullable, triggers pre/eager-loading). >> L will continue to have its legacy semantic (indirection, nullable, no new loading rules) > > Correct. Nice and simple! Not completely simple. The old contract of LV; will haunt us slightly.
Remember that LG; is a valid descriptor for any garbage name G, even if G doesn't exist. (E.g., "Lno/such/package/or/type!!;".) You can't find all such LG;. Therefore, LV; must be allowed as a possibility, on the same footing as LG;. Note that reflecting over LG; will get a CNFE. And the verifier will make only limited accommodation for such types, in effect allowing only "null" into such variables. There's nothing to be gained by trying to make the rules against LV; more strict than those for LG;. Therefore, the interpretation of LV; should be "as if" the string V in that descriptor were truly a non-existing type, to be diagnosed at all the same times that any other LG; would be checked and diagnosed. > >> >>> >>> Note that the VM can optimize eclairs about as well as it could for LV; it knows that I is the adjunction of null to V, so that all non-null values of I are identity free and must be of type V. >> >> Optimizing I might require some knowledge about V, but because V <: I, I could be loaded while V is not loaded yet. > > If the rule is "always preload Q" (which I think is what John is suggesting), then this case cannot come up, because I's class file will mention QV. Similarly, the opposite case does not happen either, as we load super types first, so loading V will trigger loading I. Yup. The only truly lazy scenario would be when some API uses only the LI; type, as a descriptor not a CONSTANT_Class. Then the normal contract for L-descriptors applies: I.class isn't loaded until there's some specific need for I (as in a CONSTANT_MethodType). That is pleasingly similar to the situation with today's primitives and their wrappers: "I" is hardwired but "java/lang/Integer" is not hardwired to the same degree (the verifier doesn't have to load it always, for example). > Of course, we can twiddle these rules and get different answers, but this is my understanding based on the rules I have heard for load order. > >> >>> >>> What we lose relative to V?
is access to fields; it was possible to do `getfield` on a LV, but not on I. If this is important (and maybe it's not), we can handle this in other ways. >> >> This is related to an open question that shows up in many places in this document. >> What should be the nature of V's super type? An interface or an abstract class? >> If it is an abstract class, it could declare and access the fields. >> The question expands further than just fields: what about methods' bodies? Yes, these are interesting questions. One thing that makes me happier about this model is the fact that several of the possible answers require no new JVM functionality, but are simply translation strategy decisions. At the moment, I personally prefer the idea (out of several possible ideas) of keeping all concrete functionality inside the inline class V, and lifting only API surface into I as (i) abstract methods, (ii) supers, and (iii) type variables, and further to do this lifting "the old fashioned way" by requiring javac to do the copying at compile time. This is good enough to kick off experimentation with the resulting user model, IMO, if not in LW10 then in LW5 (if we need margin for adjustment). Indeed, after that many questions follow, about fields, static methods, the role (if any) of non-interface supers such as ValObject (if not an interface), the possible role of covariance (or not) on V<:I within the V/I APIs, nested classes of V, type inference rules for V and I, support for user customization of I, alternative patterns other than I=V.Box, JVM or JLS support for defining various bits of the pattern, and so on. (I'm sure I missed something!) But simply copying the (public!) methods into an otherwise-empty I.class (as abstracts, plus supers & typevars) seems a great first cut to me. >> Should they be in V or in I? This has an impact on the type of "this" in these >> methods, even if this model has the nice property that "this"
will always point >> to an instance of V (as long as the JVM protects the model, and prevents external >> forces (JVMTI, Unsafe, etc.) from breaking the special and unique relationship >> between I and V). And the type of "this" will also impact the way methods are >> invoked (invokevirtual vs invokeinterface). > > There's a longer discussion to be had about bringing abstract classes and interfaces closer together, or allowing abstract class super types of values, and if so, how. I have some vague ideas of how the VM and language could handle this combination; rather than dive into that now, I'll just say that here are the places where the concept of inline-extends-abstract-class has come up: > > - Migrating VBC to inline classes > - Inline records (as there is an abstract Record super type) > - Whether ValObject is an interface or an abstract class > > Which is to say, we should untangle this knot, which I think is pretty closely related to the RefObject/ValObject knot, so I would think it is best to untangle them together. +1 I think there are several ways forward on this front, and we can pick a good one. > >> >>> >>> #### With sugar on top, please >>> >>> We can provide syntax sugar (please, let's not bike shed it now) so that an inline class _automatically_ acquires a corresponding interface (if one is not explicitly provided), onto which the public members (and type variables, and other super types) of C are lifted. >> >> Does the interface only declare public methods, or does it also provide the implementation (default method)? > > If we extract an interface from the class mechanically, we would lift the public methods, the super types, and the type variables to the interface. If the user writes the interface by hand, they will do what they're going to do. +1; a good first cut and maybe even the last cut. > >> >>> For sake of exposition, let's say this is called `C.Box`,
and is a legitimate inner class of C (which can be generated by the compiler as an ordinary classfile.) >> >> Is it a new feature? Or just an idea how it could be implemented in the future? >> >> Because I've tried to compile this: >> >> public class C implements C.Box { >> static public interface Box { >> >> } >> } > > Yes, we would have to address this. The cycle here is not a real cycle, in that Box does not depend on C for anything, except it happens to live there. As the author of that particular restriction I would support lifting it, at least in the case of interfaces, and probably also of any "static" nested class. The proposed inheritance would be ill-founded if the outer were to extend a non-static inner, which is why it's a restriction in the first place, but I widened it to a simpler rule out of an abundance of caution. Time to change it. > >>> #### Boxing conversion >>> >>> Given the constraints of the eclair relationship, it would be reasonable for the compiler to derive from this that there is a boxing conversion between C and I (I is just the value set of C, plus null, which is the relationship boxes have with their corresponding primitives.) The boxing operation is a no-op (since C <: I) and the unboxing operation is a null checking cast. >> >> Could we assume that boxing/unboxing would be handled by the static compiler (like primitive boxing today), >> and there's no expectation that the JVM will do magic boxing when needed? (Not considering auto-bridges yet). > > Yes. In fact, we only need this in one direction; since C <: I, the conversion C -> I comes for free (subtyping), it is only the conversion I -> C that would require an unboxing conversion. The compiler would introduce the necessary casts (which the VM can optimize to null checks.) It's less than the full boxing/unboxing pattern, since "boxing" is subsumed by simple widening to a super. Also, "unboxing" is just a cast (narrowing to a sub).
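To make the eclair shape concrete, here is a minimal Java sketch under stated assumptions: today's Java has no `inline` modifier, so the final class `Pair` stands in for the inline class V, and the names `Pair`, `PairBox` and `PairConversions` are hypothetical. It follows the translation John prefers above (only type variables, supertypes and abstract public methods are lifted into I) together with the boxing story from Brian's draft (V -> I is plain widening, I -> V is a null-checking cast). `PairBox` is top-level rather than `Pair.Box` because current javac still rejects a class that implements its own nested interface.

```java
import java.io.Serializable;
import java.util.Objects;

// I: only the API surface is lifted (type variables, supertypes, and public
// methods as abstract methods). No fields, no method bodies.
// Serializable stands in for whatever supertypes V declares.
interface PairBox<A, B> extends Serializable {
    A first();
    B second();
    PairBox<B, A> swap();
}

// V: all concrete state and behavior stay here (a final class stands in
// for an inline class, which plain Java cannot declare).
final class Pair<A, B> implements PairBox<A, B> {
    private final A first;
    private final B second;

    Pair(A first, B second) {
        this.first = first;
        this.second = second;
    }

    @Override public A first() { return first; }
    @Override public B second() { return second; }

    // Covariant return type: one possible answer to the open covariance question.
    @Override public Pair<B, A> swap() { return new Pair<>(second, first); }

    @Override public boolean equals(Object o) {
        return o instanceof Pair<?, ?> p
                && Objects.equals(first, p.first)
                && Objects.equals(second, p.second);
    }

    @Override public int hashCode() { return Objects.hash(first, second); }
}

final class PairConversions {
    // "Boxing" V -> I is plain widening, since V <: I: no code is needed.
    static <A, B> PairBox<A, B> box(Pair<A, B> v) {
        return v;
    }

    // "Unboxing" I -> V is a null-checking cast. In a real eclair every non-null
    // PairBox would be a Pair; in plain Java a foreign implementation would make
    // this cast throw ClassCastException.
    static <A, B> Pair<A, B> unbox(PairBox<A, B> i) {
        return (Pair<A, B>) Objects.requireNonNull(i, "cannot unbox null");
    }
}
```

Both conversions compile to ordinary bytecode here (a no-op and a checkcast), which is the sense in which the compiler, not the JVM, would own boxing in this model.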
We might need a new term to express this hybrid between full-on "unboxing" and a plain casting conversion, so the JLS can say "unboxing and devoxing" (or whatever) wherever today's unboxing comes into play. > >>> >>> The world is indeed full of existing utterances of `LOptional`, and they will still want to work. Fortunately, Optional follows the rules for being a value-based class. We start with migrating Optional from a reference class to an eclair with a public abstract class and a private value implementation. Now, existing code just works (source and binary), and optionals are values. But, this isn't good enough; existing variables of type Optional are not flattened. >> >> Notable difference with previous statements: here the eclair is made of an inline class and an abstract class >> (instead of an inline class and an interface). I assume this is for backward compatibility (Optional's methods >> are currently invoked using invokevirtual and not invokeinterface). > > Correct. There are multiple ways to handle this. One is to allow eclairs with abstract classes; another is to blur the distinction between abstract class and interface so that we can make Optional an interface and support the invokevirtual callsites in the wild. I think I prefer the former, but once we start to untangle the ValObject/RefObject knot, I suspect we'll know more. My long-term wish list of JVM cleanups already includes deprecating invokeinterface and upgrading invokevirtual to cover its job. This is a fine time to think about doing that. > >> Having V's super type be an abstract class, some additional issues have to be considered. >> If both V and V's super are classes (abstract or not), they both can declare fields, so they >> could end up having different layouts. Even if javac checks against that, manually crafted >> class files and instrumentation frameworks injecting fields (with redefineClass) could create >> situations where a mismatch exists between V and V's super.
>> Would this cause issues? Should the JVM guard against that? To be investigated. Indeed. My thought here is that fields inherited into an inline type would be completely taken over by the inline type; the layout of the abstract super would *not* be reused, so there *would* be mismatches between V and its super. We'd have to distinguish carefully between uses of fields inside an inline instance (which are always "full custom") and fields inside a classic "identity" (indirect) instance, which are always set inside the super and inherited as a full layout. Unsafe field offsets would be subject to restrictions: You can always use them on the declaring class if it's concrete, but if a field is inherited into an inline you somehow have to determine the field offset relative to the particular inline class. These restrictions apply to numeric offsets. For symbolic references the problem is probably not so bad. We can probably mandate that a symbolic reference to an inherited field must mention the inline type using the field, not the abstract declaring it; a similar effect is already obtained by the rules of protected fields. Maybe we get some useful leverage from mandating that all fields inherited into an inline are protected? Just brainstorming here... As Brian says, there are details to work out. I'd be happy to exclude fields for now and have abstract superclasses define only behavior and statics, not instance state. Or, if the problem is confined to just ValObject and methods of the Object protocol, I'd be OK with making ValObject be an interface, *but* a special one that can hold methods for the Object protocol (which is forbidden for most interfaces), and maybe also final methods (which is also forbidden for interfaces). Like I say, we have multiple options. > > It would have to be worked out.
I think John said something like "let the language guard against inline value classes extending inappropriate abstract classes, and if the VM sees an inline class extend an abstract class, ignore the fields and the ctors." This is probably a reasonable first-order approximation if we decide to go this route. Yeah; if there are no fields then I think we can make a structural rule that the abstract's constructor is just as free of behavior as that of an interface. The JVM can verify that it is a bare call to Object.<init>()V or whatever is next up the chain, and javac would forbid constructors to be coded. If there are fields then there are complications to work out... But I'll stop brainstorming/ratholing now, since there's more important stuff to consider. > >> >>> There are a few ways to get there. One is to treat this problem as protecting such classes from uninitialized fields or array elements; another is to ensure that such classes (a) have no public fields and (b) perform the correct check at the top of each method (which can be injected by the compiler.) I don't want to solve that problem right here, but I think there are enough ways to get there that we can assume this isn't a hard requirement. >> >> Would (b) be applied to non-static inner inline classes, or are they definitively considered a lost cause? >> Currently they can throw an NPE, which is not so bad after all. > > Depends on how early we can guarantee that NPE. If the class might do a bunch of side effects before hitting the dereference of the outer pointer, then we might leave things in an inconsistent state. If we can fail faster, that is good. This area definitely needs investigation. Suppose C.IV is an inner inline class of C, in which both C.this and IV.this are in scope. It might be OK if IV.default is NPE-happy *if* we can make it harder to observe. There are lots of things we could try to do this, but perhaps the simplest thing to do is take Remi's remedy, to confine such types to a non-public role.
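Strategy (b) can be pictured with a small Java sketch. It is only an illustration under stated assumptions: the static nested class `Outer.Inner`, with a reference field `outer`, stands in for a non-static inner inline class whose default value has a null outer pointer, and `defaultValue()` and `checkInitialized()` are hypothetical names for what the VM and compiler would provide. What matters is the ordering: the guard fails before any side effect, which is the "fail faster" property Brian asks for.

```java
import java.util.Objects;

// Hypothetical sketch of a compiler-injected guard at the top of each method.
final class Outer {
    private final String name;

    Outer(String name) {
        this.name = name;
    }

    static final class Inner {
        private final Outer outer; // null in the default value
        private final int count;   // 0 in the default value

        Inner(Outer outer, int count) {
            this.outer = outer;
            this.count = count;
        }

        // Stand-in for IV.default, which the VM would hand out for
        // uninitialized fields and array elements.
        static Inner defaultValue() {
            return new Inner(null, 0);
        }

        // The check a compiler could inject at the start of each method.
        private void checkInitialized() {
            Objects.requireNonNull(outer, "default value of Inner used");
        }

        String describe(StringBuilder log) {
            checkInitialized();              // fails before any side effect
            log.append("describe called; "); // side effect only after the guard
            return outer.name + "#" + count;
        }
    }
}
```

Without the guard, `describe` would append to the log and only then hit the NPE on `outer.name`, leaving exactly the inconsistent state the thread worries about.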
A companion interface could be placed into the public API of C, as a replacement, and C would be free to pass around either null or valid instances of C.IV (but not IV.default, which C would avoid). Since IV would be non-public, "nobody but family" would be making arrays or fields of type IV. Maybe that's enough; I think it's worth an experiment. Maybe more specific tactics would work also, such as having the JLS forbid uninitialized fields and array construction of type C.IV *outside of C's nest*. The JVM would allow such things, and they'd have NPE risks, but non-family would be firmly discouraged by the language from declaring variables that initialize to IV.default. An explicit mention of IV.default is probably also to be discouraged; if C wants to export a constant that exposes this NPE-risky value, that's the business of C's author, but the language can forbid it outside of C's nest. Lots to talk about here, but we have (as I say) multiple options that seem OK. FTR, I think Remi's remedy (of confining inlines to non-public) is a little too restrictive for inlines in general, although maybe it's a conservative thing to try in LW5, when we are running user model experiments. If we want to start using inlines as "new numerics" (B-float, etc.) it's really unfriendly to require users to encounter them only via their companion interfaces. > >> The model looks promising, but a more precise specification of eclairs would be helpful >> to estimate the impact on the JVM: >> >> - What is the nature of V's super? Low impact option for JVM: Just an interface for starters, to be adjusted for ValObject later as needed. >> - How are fields/methods declared/implemented between V and V's super? Low impact: javac takes responsibility for copying stuff up from V to V's super. JVM just reads the class file. >> - Are there any special requirements regarding static members between V and V's super?
Low impact: JVM just does what it's told according to whatever is in V.class and I.class, using standard rules. >> - Is there a requirement that V and V's super share the bodies of their non-static public methods? Low impact: javac plants abstract methods inside I.class and the JVM does what it's told. (Javac is handling the Mirandas this time.) > Good questions, I hope to have answers eventually. If you have preferred answers, please share your thinking! (Done for my part, see above!) -- John

From john.r.rose at oracle.com Tue Aug 6 21:09:50 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 6 Aug 2019 14:09:50 -0700 Subject: Collapsing the requirements In-Reply-To: <725129568.291668.1565124607633.JavaMail.zimbra@u-pem.fr> References: <725129568.291668.1565124607633.JavaMail.zimbra@u-pem.fr> Message-ID: On Aug 6, 2019, at 1:50 PM, Remi Forax wrote: > > > I believe 'Q' should only mean preload, and the fact that the class has an inline bit should imply flattenable and non-nullable. > Yes, we consume one of these precious bits, but at the same time, we nicely decouple the meaning of the descriptor from the meaning of the class itself. +1; JVM folks who were at the Burlington meeting earlier this year heard me saying "Q means 'go and look'" ad nauseam. In other words, since Q means "preload" to start with, every other implication of Q's meaning can be obtained by looking inside the classfile of the referenced class. This commits us to Q entailing only def-site properties, and not carrying additional use-site information. I'd like to use a separate "type operator" syntax, à la QPoint//anno; LString//anno; to carry per-use-site information not derived from the class definition. (What's "anno"? TBD, and open for continual extension like any annotation mechanism. Could be "n" for "not null" for starters if we need that. Could expand to cover species information.)
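One way to read "Q means 'go and look'" is that the descriptor carries a single bit of use-site intent (preload or not), and every other property is read off the resolved class. The sketch below models that in plain Java; it is not a JVM or JDK API. `DescriptorModel`, `Kind`, `Complex` and the marker interface `ValObject` are hypothetical stand-ins (the thread has not decided whether ValObject is an interface or an abstract class), and `ClassValue` is used only as an analogue of caching a derived fact in internal class metadata instead of spending an `access_flags` bit. It also illustrates John's earlier point that an L-descriptor naming a nonexistent class is syntactically fine and only fails, with a `ClassNotFoundException`, once something actually resolves it.

```java
// Hypothetical marker supertype standing in for ValObject.
interface ValObject {}

final class DescriptorModel {
    enum Kind { PRIMITIVE, ARRAY, L_REFERENCE, Q_PRELOAD }

    // Classify a field descriptor. 'Q' is the LW-era prototype notation.
    static Kind classify(String desc) {
        if (desc.isEmpty()) throw new IllegalArgumentException("empty descriptor");
        char tag = desc.charAt(0);
        if ("BCDFIJSZ".indexOf(tag) >= 0 && desc.length() == 1) return Kind.PRIMITIVE;
        if (tag == '[' && desc.length() > 1) return Kind.ARRAY;
        if ((tag == 'L' || tag == 'Q') && desc.length() > 2 && desc.endsWith(";")) {
            // Any name is accepted here, even one that names no class.
            return tag == 'L' ? Kind.L_REFERENCE : Kind.Q_PRELOAD;
        }
        throw new IllegalArgumentException("bad field descriptor: " + desc);
    }

    // "Go and look": only now is the named class resolved. A garbage name
    // fails at this point with ClassNotFoundException.
    static Class<?> resolve(String desc) throws ClassNotFoundException {
        Kind kind = classify(desc);
        if (kind != Kind.L_REFERENCE && kind != Kind.Q_PRELOAD)
            throw new IllegalArgumentException("not a class descriptor: " + desc);
        return Class.forName(desc.substring(1, desc.length() - 1).replace('/', '.'));
    }

    // Inline-ness derived from the supertype and cached per class.
    private static final ClassValue<Boolean> IS_INLINE = new ClassValue<>() {
        @Override
        protected Boolean computeValue(Class<?> type) {
            return !type.isInterface() && ValObject.class.isAssignableFrom(type);
        }
    };

    static boolean isInline(Class<?> c) {
        return IS_INLINE.get(c);
    }
}

// A stand-in "inline" class used in the usage example.
final class Complex implements ValObject {
    final double re, im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }
}
```

In this model `classify("QComplex;")` says only "preload"; whether `Complex` is flattenable and non-nullable comes from `isInline(Complex.class)`, which mirrors Remi's proposal to decouple the descriptor from the class's own properties.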
As for burning a bit in the class file header, that's doable, but we might also consider deriving the necessary schema information from the supers of the inline class, if we can hardwire the supers just right. For example, if a class declares ValObject as its super, and ValObject is hardwired as the mother of all inlines, then who needs an extra bit to confirm that the class is an inline? Put the extra bit into an internal JVM field on the metadata, but don't burn an access_flags bit for that.

From brian.goetz at oracle.com Fri Aug 9 15:46:19 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 9 Aug 2019 10:46:19 -0500 Subject: Equality for values -- new analysis, same conclusion Message-ID: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com> Time to take another look at equality, now that we've simplified away the LFoo/QFoo distinction. This mail focuses on the language notion of equality (==) only. Let's start with the simplest case, the ==(V,V) operator. There is a range of possible interpretations: - Not allowed; the compiler treats == as not applicable to operands of type V. (Note that since V <: Object, == may still be called upon to have an opinion about two Vs whose static types are Object or interface.) - Allowed, but always false. (This appeals to a concept of "aggressive reboxing", where a value is reboxed between every pair of bytecodes.) - Weak substitutability. This is where we make a "good faith" attempt to treat equal points as equal, but there are cases (such as those where a value hides behind an Object/interface) where two otherwise equal objects might report not equal. This would have to appeal to some notion of invisible boxing, where sometimes two boxes for the same value are not equal. - Substitutability. This is where we extend == field-wise over the fields of the object, potentially recursively. As noted above, we don't only have to define ==V, but ==Object when there may be a value hiding behind the object.
It might be acceptable, though clearly weird, for two values that are ==V to not be ==Object when viewed as Objects. However, the only way this might make sense to users is if this were appealing to a boxing conversion, and it's hard to say there's a boxing conversion from V to Object when V <: Object. There is a gravitational force that says that if two values are V==, then they should still be == when viewed as Object or Comparable. Let's take a look at the use cases for Object==. - Direct identity comparison. This is used for objects that are known to be interned (such as interned strings or Enum constants), as well as algorithms that want to compare objects by identity, such as IdentityHashMap. (When the operands are of generic (T) or dynamic (Object) type, the "Interned" case is less credible, but the other cases are still credible.) - As a fast path for deeper equality comparisons (a == b || a.equals(b)), since the contract of equals() requires that == objects are equals(). - In comparing references against null. - In comparing references against a known sentinel value, such as a value already observed, or a sentinel value provided by the user. When generics are specialized, T== will specialize too, so when T specializes to a value, we will get V==, and when T is erased, we will get Object==. Subtyping is a powerful constraint; it says that a value is-a Object. While it is theoretically possible to say that v1==v2 does not imply that ((Object)v1 == (Object) v2), I think we'll have a very hard time suggesting this with a straight face. (If, instead, the conversion from value to Object were a straight boxing conversion, this would become credible.) Which says to me that if we define == on values at all, then it must be consistent with == on object or interface types.
Similarly, the fact that we want to migrate erased generics to specialized, where T== will degenerate to V== on specialization, suggests that having Object== and V== be consistent is a strong normalizing force. Having == not be allowed on values at all would surely be strange, since == is (mostly) a substitutability test on primitives, and values are supposed to "work like an int." And, even if we disallowed == on values, one could always cast the value to an Object, and compare them. While this is not an outright indefensible position, it is going to be an uncomfortable one. Having V== always be false still does not seem like something we can offer with a straight face, again, citing "works like an int." Having V== be "weak substitutability" is possible, but I don't think it would make the VM people happy anyway. Most values won't require recursive comparisons (since most fields of value types will be statically typed as primitives, refs, or values), but much of the cost is in having the split at all. Note too that treating == as substitutability means use cases such as IdentityHashMap will just work as expected, with no modification for a value-full world. So if V <: Object, it feels we are still being "boxed" into the corner that == is a substitutability test. But, in generic / dynamically typed code, we are likely to discourage broad use of Object==, since the most common case (fast path comparison) is no longer as fast as it once was. We have a few other options to mitigate the performance concerns here: - Live with legacy ACMP anomalies; - Re-explore a boxing relationship between V and Object. If we say that == is substitutability, we still have the option to translate == to something other than ACMP. Which means that existing binaries (and likely, binaries recompiled with -source 8) will still use ACMP. If we give ACMP the "false if value"
interpretation, then existing classfiles (which mostly use == as a fast-path check) will still work, as those tests should be backed up with .equals(), though they may suffer performance changes on recompilation. This is an uncomfortable compromise, but is worth considering. Down this route, ACMP has a much narrower portfolio, as we would not use it in translating most Object== unless we were sure we were dealing with identityful types. The alternate route to preserving a narrower definition of == is to say that _at the language level_, values are not subtypes of Object. Then, we can credibly say that the eclair companion type is the box, and there is a boxing conversion between V and I (putting the cream in the eclair is like putting it in a box.) This may seem like a huge step backwards, but it actually is a consistent world, and in this world, boxing is a super-lightweight operation. The main concern here is that when a user assigns a value to an object/interface type, and then invokes Object.getClass(), they will see the value class, which perhaps we can present as "the runtime box is so light that you can't even see it." Where this world runs into more trouble is with specialized generics; we'd like to treat specialized Foo as being generic in "T extends Object", which subsumes values. This complicates things like bound computation and type inference, and also makes invoking Object methods trickier, since we have to do some sort of reasoning by parts (which we did in M3, but didn't like it.) tl;dr: if we want a unified type system where values are objects, then I think we have to take the obvious semantics for ==, and if we want to reduce the runtime impact on _old_ binaries, we should consider giving older binaries older semantics, and taking the discontinuity as the cost of unification.
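Brian's compromise (legacy ACMP answers false when a value is involved, while newly compiled code translates == to a substitutability test) can be checked against the fast-path idiom `a == b || a.equals(b)` with a small Java sketch. Records stand in for inline values, and `AcmpCompromise`, `isValue`, `legacyAcmp` and `lifeEquals` are hypothetical helpers, not real APIs. The sketch shows why old classfiles that use == only as a fast path before `.equals()` keep giving correct answers, just without the fast path for values.

```java
// Hypothetical sketch: records stand in for inline values; none of these
// helpers are real JDK or JVM APIs.
final class AcmpCompromise {
    // Treat any record instance as a "value" for the purpose of the sketch.
    static boolean isValue(Object o) {
        return o != null && o.getClass().isRecord();
    }

    // Legacy ACMP under the "false if value" interpretation: identity
    // comparison for identity objects, always false when a value is involved.
    static boolean legacyAcmp(Object a, Object b) {
        if (isValue(a) || isValue(b)) {
            return false;
        }
        return a == b;
    }

    // The fast-path idiom as an old classfile would execute it: == is backed
    // up by equals(), so the answer stays correct for values.
    static boolean lifeEquals(Object a, Object b) {
        return legacyAcmp(a, b) || (a != null && a.equals(b));
    }
}

record Point(int x, int y) {}
```

The cost shows up only as a lost shortcut: for two equal values the idiom always falls through to `equals()`, which is the "performance changes" Brian mentions rather than a change in results.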
From john.r.rose at oracle.com Sat Aug 10 18:57:59 2019 From: john.r.rose at oracle.com (John Rose) Date: Sat, 10 Aug 2019 11:57:59 -0700 Subject: Equality for values -- new analysis, same conclusion In-Reply-To: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com> References: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com> Message-ID: <131B9329-ECE7-46BE-909F-AC88421FF6CB@oracle.com> Good analysis. I'm going to comment in part because I want us to be especially clear about definitions. I also have a few additions, naturally. At the end I try to distill our options down to 3 or 4 cases, A/B/C/D. On Aug 9, 2019, at 8:46 AM, Brian Goetz wrote: > > Time to take another look at equality, now that we've simplified away the LFoo/QFoo distinction. This mail focuses on the language notion of equality (==) only. Something implicit in this analysis is that there are really only two possible syntactic occurrences of operator== as it relates to an inline type: the statically typed and the dynamically typed. We are calling them V== and Object==; sometimes we say val==, ref==, etc. By context V is either a specific inline type, or else is a stand-in for all inline types, in a statement that applies to them all. I suppose val== is clearer than V== in the latter case, so I'll say val== when I'm talking about all inline types. The term Object== is unambiguous, as the occurrence of operator== when at least one operand is a non-inline, non-primitive type, such as an interface. When we talk about generics we might say T==, where T stands for some type parameter, but I think that's a trap, because we immediately have to say whether we are talking about erased vs. specialized generics, and also it's conceivable that the bound type of T might affect the interpretation of the operator== sign.
So, there are two kinds of dynamically typed code: (1) code which works on values under a top type, either Object or an Interface (or some other abstract type, if we go there), and (2) code which works on values under some type variable T *where T is erased, not specialized*. There are two kinds of statically typed code: (3) code which works on values under their own type V, and (4) code which uses a type variable T bound to V *using specialization*. Today we work with all types but (4), which comes in with specialized generics. It's a good simplifying assumption that operator== means one thing in both dynamic cases, and the other thing in both static cases. We could try to weasel in a difference between (1) and (2), but that spoils a refactoring which replaces T by a subtype of its bound. Same point for (3) and (4). I hope it's not necessary to go there, even though we are in a tight spot. If we buy the above classifications, then we are really talking about static== and dynamic==, under the guise of val== and Object==. This is useful to remember when we go back and think about primitive prim== (int==, float==, etc.) which are similar to val== but, naturally, have to differ in various details due to 25-year-old decisions. (It's not premature for me to mention primitives at this point, since they are one of the two design centers in the normative slogan, "codes like a class, works like an int". More in a bit.) > Let's start with the simplest case, the ==(V,V) operator. There is a range of possible interpretations: > > - Not allowed; the compiler treats == as not applicable to operands of type V. (Note that since V <: Object, == may still be called upon to have an opinion about two Vs whose static types are Object or interface.) > - Allowed, but always false. (This appeals to a concept of "aggressive reboxing", where a value is reboxed between every pair of bytecodes.) > - Weak substitutability. This is where we make a "good faith"
> attempt to treat equal points as equal, but there are cases (such as those where a value hides behind an Object/interface) where two otherwise equal objects might report not equal. This would have to appeal to some notion of invisible boxing, where sometimes two boxes for the same value are not equal.
> - Substitutability. This is where we extend == field wise over the fields of the object, potentially recursively.

Let me venture to give some very specific names to these operations, at the risk of derailing the discussion into bikeshed painting.

"ID==" Objects.identityEquals(o1, o2) - Returns true if o1 and o2 both have the same object identity, or if o1 and o2 are both null. Returns false for non-identity objects, even if o1 SAME== o2. Then we also have Objects.hasIdentity(o) -> o != null && identityEquals(o, o); (Cf. Double.isNaN for the x==x move.)

"SAME==" Objects.substitutabilityEquals(o1, o2) - Returns true if and only if o1 and o2 are substitutable for each other. Either Objects.identityEquals or else both are the same inline type V with recursively "SAME==" fields. (In the recursion step, "SAME==" applies to primitives in the same way that Object.equals applies to their wrappers. Since arrays have object identity, there's no recursion into arrays.) Since ID== and SAME== differ only for inline types, they mean the same thing in today's Java, but differ in Valhalla.

"FAST==" Unsafe.fastEqualsCheck(o1, o2) - Must return false if SAME== would return false. May return false even if SAME== would return true. Attempts to return true when it can do so efficiently, on a best efforts basis. JITs may transform to more or less sensitive tests, or even to constant false, based on optimization context. The typical implementation is raw pointer comparison, which is very fast, but sometimes fails to detect identical inline values buffered in different places. There's no need for FAST== today, since ID== is very fast. If ID== gets slower maybe we want FAST== as a backup.
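To make the three definitions concrete, here is a minimal sketch in today's Java. It is not a proposed API: `InlineLike` is a hypothetical marker interface standing in for "this class is an inline type", records stand in for inline classes, and `EqualitySketch` with its static methods is a hypothetical holder whose names merely mirror the ones above. FAST== is modeled as a raw reference comparison, which is one legal (conservative) choice among many.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.RecordComponent;

/** Hypothetical marker standing in for "this class is an inline (identity-free) type". */
interface InlineLike {}

/** Sketch of ID==, SAME== and one legal FAST== in today's Java; records stand in for inline classes. */
final class EqualitySketch {
    private EqualitySketch() {}

    // ID==: true for the same identity or both null; never true when either side is "inline".
    static boolean identityEquals(Object o1, Object o2) {
        if (o1 instanceof InlineLike || o2 instanceof InlineLike) {
            return false;
        }
        return o1 == o2;
    }

    // The x==x move, in the style of Double.isNaN.
    static boolean hasIdentity(Object o) {
        return o != null && identityEquals(o, o);
    }

    // SAME==: identityEquals, or the same inline type with recursively SAME== fields.
    static boolean substitutabilityEquals(Object o1, Object o2) {
        if (identityEquals(o1, o2)) {
            return true;
        }
        if (o1 == null || o2 == null || !(o1 instanceof InlineLike) || o1.getClass() != o2.getClass()) {
            return false;
        }
        if (!o1.getClass().isRecord()) {
            throw new IllegalArgumentException("sketch assumes inline stand-ins are records");
        }
        for (RecordComponent rc : o1.getClass().getRecordComponents()) {
            Object f1 = read(rc, o1);
            Object f2 = read(rc, o2);
            boolean same = rc.getType().isPrimitive()
                    ? f1.equals(f2)                     // primitives: like the wrapper's equals (NaN matches NaN)
                    : substitutabilityEquals(f1, f2);   // references: recurse; arrays have identity, so no descent
            if (!same) {
                return false;
            }
        }
        return true;
    }

    // One legal FAST==: raw reference comparison. It never answers true when SAME== would answer
    // false, but may answer false for equal inline values held in different places.
    static boolean fastEqualsCheck(Object o1, Object o2) {
        return o1 == o2;
    }

    private static Object read(RecordComponent rc, Object target) {
        try {
            Method accessor = rc.getAccessor();
            accessor.setAccessible(true);
            return accessor.invoke(target);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical inline stand-ins for the usage example.
record Point(double x, double y) implements InlineLike {}
record Segment(Point from, Point to) implements InlineLike {}
```

For example, two separately created `new Point(1.0, Double.NaN)` values are SAME== but not ID== (and not FAST== under this reference-comparison model), while `new Point(0.0, 0.0)` and `new Point(-0.0, 0.0)` are not SAME==, because the wrapper-equals rule distinguishes the two zeros.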
These "==" operations are not syntactic but semantic, hence the different (upper-case) notation. The syntax operators val==, Object== have to be associated with one of those semantic operations.

> As noted above, we don't only have to define ==V, but ==Object when there may be a value hiding behind the object. It might be acceptable, though clearly weird, for two values that are ==V to not be ==Object when viewed as Objects. However, the only way this might make sense to users is if this were appealing to a boxing conversion, and it's hard to say there's a boxing conversion from V to Object when V <: Object. There is a gravitational force that says that if two values are V==, then they should still be == when viewed as Object or Comparable.
>
> Let's take a look at the use cases for Object==.
>
> - Direct identity comparison. This is used for objects that are known to be interned (such as interned strings or Enum constants), as well as algorithms that want to compare objects by identity, such as IdentityHashMap. (When the operands are of generic (T) or dynamic (Object) type, the "Interned" case is less credible, but the other cases are still credible.)

Under the assumption that all operands are identity-objects, ID== and SAME== produce the same results and can be replaced secretly by FAST== (if the JIT can speculate or prove that assumption). If the JIT can't tell what's going on, then both ID== and SAME== work the same, since the slow path of SAME== never gets executed. There's a small runtime cost for detecting non-ID objects (inlines).

> - As a fast path for deeper equality comparisons (a == b || a.equals(b)), since the contract of equals() requires that == objects are equals().

This is what I call L.I.F.E., the Legacy Idiom For Equality. ID== is good here too. FAST== would be fine here, and a JIT could perform that strength reduction if it notices that the == expression is post-dominated by Object.equals (on the false path). I think that's usually detectable.
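A small sketch of why the fast path in L.I.F.E. can be weakened: as long as equals() is reflexive and free of side effects, replacing the `a == b` disjunct with any conservative check, down to constant false (FALSE==), does not change the result. `LifeSketch`, `life`, the two fast-path constants and the `Money` record are hypothetical names for illustration only; this models the reasoning, not what any JIT actually does.

```java
import java.util.function.BiPredicate;

/** Hypothetical illustration of L.I.F.E. ("a == b || a.equals(b)") with a pluggable fast path. */
final class LifeSketch {
    private LifeSketch() {}

    // A fast path may answer true only when a and b really are equal; it may always answer false.
    static final BiPredicate<Object, Object> REFERENCE_FAST_PATH = (a, b) -> a == b;
    static final BiPredicate<Object, Object> FALSE_FAST_PATH = (a, b) -> false;

    static boolean life(Object a, Object b, BiPredicate<Object, Object> fastPath) {
        // The slow path never compares a and b with each other by reference, only against null.
        return fastPath.test(a, b) || (a == null ? b == null : a.equals(b));
    }
}

// Hypothetical value-like type with a well-behaved (reflexive, side-effect-free) equals.
record Money(long cents, String currency) {}
```

With a well-behaved equals, `life(x, y, REFERENCE_FAST_PATH)` and `life(x, y, FALSE_FAST_PATH)` agree for every pair of arguments; the fast path only affects cost.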
If we define Object== as SAME==, then the optimization challenge becomes urgent, since the user has written the moral equivalent of "v.equals(v2) || v.equals(v2)" which appears to duplicate effort. Oops. But the same optimization as for ID== applies: Strength-reduce the first part of L.I.F.E. to FAST== (and maybe then to constant false, FALSE==). In that case, the equals call can happen even if the two inputs are SAME==, so the JIT would have to ensure that this is a valid transform: The V.equals method must not have side effects and must only consult fields of V (and recursive uses of SAME== or other safe equals methods) to make its decisions. This requires putting constraints on certain equals methods, which we could do either by fiat or by analysis. Either way, we should do it.

> - In comparing references against null.

This is easy to optimize to FAST==, since the JIT almost certainly can see the null. (So FAST== wins a lot in these cases, but it can do so secretly, without the user mentioning Unsafe methods, or burning the FAST== semantics into a public spec.)

> - In comparing references against a known sentinel value, such as a value already observed, or a sentinel value provided by the user.

This can be optimized if something is statically known about the sentinel. If it is ID-laden, then FAST== is sufficient. If the sentinel is a known inline type, then it's an instanceof check plus a monomorphic SAME== (for a particular V). If the sentinel is of unknown type, then polymorphic techniques can reduce the cost, but it's potentially an expensive SAME==, if that's the semantics we demand. So this scenario splits into a statically-known sentinel and a statically-unknown one. (Maybe Brian only meant the statically-known case. But the other shows up too. More in a second.)

> When generics are specialized, T== will specialize too, so when T specializes to a value, we will get V==, and when T is erased, we will get Object==.
For an erased generic, the cases above come into play, since it's really just another instance of dynamic code.

OK, so Object== has three candidates: FAST==, ID==, SAME==. SAME== cleanly extends the notion of "is the same object" to non-identity objects, using the concept of substitutability. I think we'd prefer that outcome. (Note: The term "same" is the relevant term in the JLS and JVMS today, applying a substitutability test to both primitives and identity-laden objects. "Substitutable" means the same thing as "same" but with more syllables. Sometimes you want more syllables, sometimes fewer.)

FAST== would inject a lot of uncertainty into Java semantics, so I say let's leave it to JITs as a secret weapon.

ID== is plausible: It is optimizable in the same places SAME== can be optimized, and its slow path is still fast. But it creates strange behaviors, such as "x != x".

SAME== is often optimizable to FAST==. In a few cases we'd reach the slow path and see a performance drop, in code that works like this:

    boolean goofyNonLifeContains(List xs, Object x) {
        for (var x1 : xs)
            if (x1 == x)  return true;
        return false;
    }

(Note: This is non-LIFE code. ArrayList.contains can optimize SAME== and ID== to FAST==, by leaning on the semantics of the Object.equals call in the LIFE.)

Here we don't know what x is, necessarily, and so if we say that Object== must translate to SAME==, we have an optimization challenge. The code will probably try to speculate that x is an ID-object (or null) and run a special loop that lowers SAME== to FAST==, with another copy of the loop that "knows" x is an inline object. Some library code may adapt to this new reality manually the way it does with null today (see ArrayList::indexOf for an example). Or not: Code with a proper LIFE can rely on the JIT to clean things up.

I think that *only these cases* of variable sentinels or algorithms involving "raw"
pointers compares will put demands on Object== that might discourage us from what we know we should do, which is define Object== as SAME==. I think it?s worth it. I hope I won?t eat my words, with angry customers showing us their code falling into the slow path of SAME== on some miserably polymorphic code. Let?s watch for this? The fallback position would ID==. (Note: We haven?t said anything yet about val==. We are about to.) > Suptyping is a powerful constraint; it says that a value is-a Object. While it is theoretically possible to say that v1==v2 does not imply that ((Object)v1 == (Object) v2), I think we?ll have a very hard time suggesting this with a straight face. (If, instead, the conversion from value to Object were a straight boxing conversion, this would become credible.) Which says to me that if we define == on values at all, then it must be consistent with == on object or interface types. It?s not just theory, it?s practice, so maybe this is overstating this case, a little. IDEs guide users away from the second expression (for many types, like String and Integer), offering to replace it with a call to v1.equals(v2). In the presence of primitives (the ?works like an int? half of values), the above implication is routinely false for values outside of the valueOf cache (abs(v) > 256). So there?s no code base out there that does the ?cast to object and compare? trick for ?works like an int? types, today. For today's ?codes like a class? entities, what you say is 100% true. And the non-boxing relation V <: Object is on the ?like a class? side of the ledger. Still, I think we could do the above with a straight face if it was part of a user model that split the difference between int and class, making clear which aspects of V are ?like an int? and which ?like a class?. See more thoughts below. 
> Similarly, the fact that we want to migrate erased generics to specialized, where T== will degenerate to V== on specialization, suggests that having Object== and V== be consistent is a strong normalizing force.

I think the root phenomenon here is that dynamic code and static code with similar syntax should have similar semantics. This link is made in the ways Brian points out: On the "codes like a class" side of the design, V <: Object entails that expressions valid under the higher type should be valid under the lower type also. And migration from erased to specialized also moves code from Object down to some specific V, where it should have the same meaning. So Object== (the friend of dynamic code, at the top of the type hierarchy) should do the same as little val== (as viewed in static code).

On the other hand, V must also "work like an int". And int== is not the same as Integer== (hence Brian's note that boxing gives us some extra wiggle room in our design).

(I should also point out that int+ is not the same as Object+. We shouldn't expect expressions involving obj+obj to migrate to specialized code without a hitch. So op== isn't unique in threatening automagic migration of generics.)

> Having == not be allowed on values at all would surely be strange, since == is (mostly) a substitutability test on primitives, and values are supposed to "work like an int." And, even if we disallowed == on values, one could always cast the value to an Object, and compare them. While this is not an outright indefensible position, it is going to be an uncomfortable one.

The goal is surely to find the least uncomfortable position, and I expect discomfort on operator== no matter what. (I'll explain why in a moment.) I do think we could defend a *temporary* ban on val== until we gain more data, to confirm SAME== or some other option. I also think, in any case, we want a named method like v.isSame(v1), to make it clear that a full SAME== test is intended.
People could use that instead of == while we evaluated our options further. OTOH, I think we are close, now, to being able to make a solid decision.

> Having V== always be false still does not seem like something we can offer with a straight face, again, citing "works like an int."

Designer: "See, for consistency with Object==, we define val== as ID==."
User: "You mean, you *know* it's an identity-free type and you are looking for its *object identity*?"
Designer: "It seemed like a consistent thing to do. Let me get back to you..."

(Java has lots of precedent for statically rejecting stupid expressions, like ((String)x) instanceof Integer. Using ID== to compare non-id objects is deeply stupid, so should be rejected.)

> Having V== be "weak substitutability" is possible, but I don't think it would make the VM people happy anyway. Most values won't require recursive comparisons (since most fields of value types will be statically typed as primitives, refs, or values), but much of the cost is in having the split at all.

Yep. And the JIT can get most of the benefit of explicit FAST== secretly, by strength reduction in common cases.

> Note too that treating == as substitutability means use cases such as IdentityHashMap will just work as expected, with no modification for a value-full world.

That's a nice point. The effect of SAME== (with a suitable System.identityHashCode) is to make every distinct inline value appear to be interned to unique object identity, as long as you don't look too closely (e.g., by synchronizing).

> So if V <: Object, it feels we are still being "boxed" into the corner that == is a substitutability test. But, in generic / dynamically typed code, we are likely to discourage broad use of Object==, since the most common case (fast path comparison) is no longer as fast as it once was.

Generally speaking, I agree we should write up some warnings about Object==.
Happily, the JIT's secret use of FAST== is a safety net for customers that ignore our warnings. And simple stuff like "x == null" or "x == y || x.equals(y)" won't lose performance, given some L.I.F.E. support in the JIT.

As far as Object== goes in today's usages, ID== would be an acceptable alternative to SAME==, since the two differ only for value types. But (as Brian says) if Object== is bound to ID==, and val== *must not* be bound to ID==, then we have a schism between static and dynamic views of every V. This schism would be painful. What would it gain? We would avoid running the slow path of SAME==, which is the cost that ID== doesn't have. But I think I've shown that this slow path is not a risk for most uses of Object==. So, ID== doesn't buy much over SAME==, and therefore Object== and val== don't need a schism. But if I'm wrong, and we need to fall back to ID==, then we will want to reconsider making a schism between dynamic and static views.

> We have a few other options to mitigate the performance concerns here:
>
> - Live with legacy ACMP anomalies;

I.e., don't change ACMP, so it turns out to be FAST==, or change it to ID== but not SAME==. This play makes ACMP into a third independent design point, besides Object== and val==, and makes it differ from both. Currently, Object== compiles to ACMP, so this would suggest that ACMP should be upgraded to the new behavior of Object==, which is SAME== (or as a fallback ID==). But Object== could compile to a method call, if we want to decouple ACMP from Object==.

> - Re-explore a boxing relationship between V and Object.

This is a different take on "codes like a class, works like an int". (More below.)

> If we say that == is substitutability, we still have the option to translate == to something other than ACMP. Which means that existing binaries (and likely, binaries recompiled with -source 8) will still use ACMP. If we give ACMP the "false if value"
> interpretation, then existing classfiles (which mostly use == as a fast-path check) will still work, as those tests should be backed up with .equals(), though they may suffer performance changes on recompilation. This is an uncomfortable compromise, but is worth considering. Down this route, ACMP has a much narrower portfolio, as we would not use it in translating most Object== unless we were sure we were dealing with identityful types.

I like the idea of limiting the use of ACMP to ID==, *if it pays off*. Or, we could deprecate and remove ACMP, in favor of method intrinsics for ID== and SAME==. We could keep ACMP as FAST==, period. But I think that's too unpredictable; ID== looks safer to me even for legacy code. As noted above, the JIT will often strength-reduce ACMP to FAST==, if it can prove the move is valid. So there's little upside to mandating FAST== in the JVMS. So the good candidates for ACMP are SAME== and ID==.

I think maybe we want performance runs on a JVM which is configured with a switch to bind ACMP to either SAME== or ID==, and see what is the downside for SAME==. Surely it's very small today, since inline instances are not yet common. We can run artificial workloads with a number of inline types flowing through generic code. (In fact, we already have started such analyses, IIRC.)

> The alternate route to preserving a narrower definition of == is to say that _at the language level_, values are not subtypes of Object. Then, we can credibly say that the eclair companion type is the box, and there is a boxing conversion between V and I (putting the cream in the eclair is like putting it in a box.) This may seem like a huge step backwards, but it actually is a consistent world, and in this world, boxing is a super-lightweight operation. The main concern here is that when a user assigns a value to an object/interface type, and then invokes Object.getClass(), they will see the value class,
> which perhaps we can present as "the runtime box is so light that you can't even see it."

So, this is a bold move, which changes the balance between "codes like a class" and "works like an int". In fact I am convinced that the tension between those two propositions is where all of our discomfort comes from (about == and many things). The question about val== vs. Object== stresses precisely those points where int and Object have incompatible protocols. Our thesis requires us to split the difference between Object== and int== in order to assign a meaning to val== that is compatible enough with both sides.

How do we split the difference? Do we align closely with "codes like a class", avoiding schism between val== and Object==? Or do we align closely with "works like an int", and embrace boxing as a transform that allows such a schism? Is there a third way that makes obvious sense for inline types (even int, eventually)? I think we should shoot for a third way, if we can find one. That seems to offer the best security against importing the limitations of the past into the future.

> Where this world runs into more trouble is with specialized generics; we'd like to treat specialized Foo as being generic in "T extends Object", which subsumes values. This complicates things like bound computation and type inference, and also makes invoking Object methods trickier, since we have to do some sort of reasoning by parts (which we did in M3, but didn't like it.)

Yikes, I don't understand these particular issues fully, but they sound scary.

Part of the job of splitting the difference between int-nature and Object-nature involves deciding what to do with box/unbox conversions (which are part of int-nature). The "gravity" of the situation prompts us to click into a tight orbit around either "boxes and unboxes like an int" or "subtypes Object like a class".
Maybe there's a halfway (Lagrange) point out there somewhere, such as "unboxes like an int and subtypes Object like a class", where unboxing artifacts show up (as a special form of cast) but not boxing artifacts. Such a new point would compromise the user models of int and Object. We've looked hard for such a thing with not much luck on the problem of operator==, but I think it's worth trying a little longer to come up with it. (Especially if it could make sense of float==, but I'm not holding my breath for that. See below for more thoughts about float==.)

Looking very closely at the current cycle of int-box-Integer-unbox-int, maybe we can limit the irregularities (the schism mentioned above) in some way that allows us to "blame" them on the unbox step. After all, the "codes like a class" part only implies one direction of type conversion, from some V to Object or V.Box (the eclair interface). Going the other way, from V.Box to V, is a pure "works like an int" move, since Java does not allow implicit down-casting, only unboxing conversions. And we can view V.Box-to-V as an unbox *as well as* (or maybe *instead of*) a down-cast. Does this buy us anything?

Consider the parallel case of Integer (motivated by "works like an int"):

    Integer == Integer  =>  ID==
    int     == Integer  =>  SAME==
    Integer == int      =>  SAME==
    int     == int      =>  SAME==

(Here, I'm relying on the fact that int== is an instance of SAME==. That's true except for float== and double==.)

This suggests a new category of syntactic context for operator==, V.Box. So we have val==, val.box==, and Object==. For int, SAME== makes sense for all of them, although we might want to back off Object== to ID== (see above). Since val.box <: Object, we might expect that operator== must be the same for both, *but no*. The operator== for the box is referred to the prim== for the primitive, if at least one primitive is present. It looks like our job got more complicated.
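The Integer table can be read as a binding rule from the static types of the two operands to a semantic operation. A minimal sketch, with hypothetical names (`EqBinding`, `StaticType`, `Semantics`): a primitive (or inline) "witness" on either side pulls the comparison down to prim==/val== (SAME==, setting float and double aside), and otherwise it stays at the reference level as Object==, bound here to ID==. The sketch does not check whether a given combination of static types is accepted by the compiler.

```java
/** Hypothetical model of how operator== is bound, based on the static types of its operands. */
final class EqBinding {
    enum StaticType { PRIMITIVE_OR_INLINE, BOX, OBJECT }   // e.g. int, Integer, Object
    enum Semantics { SAME, ID }

    private EqBinding() {}

    static Semantics bind(StaticType x, StaticType y) {
        // A primitive (or inline) witness on either side: the other operand is unboxed (prim==/val==).
        if (x == StaticType.PRIMITIVE_OR_INLINE || y == StaticType.PRIMITIVE_OR_INLINE) {
            return Semantics.SAME;
        }
        // Box == Box, Box == Object, Object == Object: reference comparison (Object==).
        return Semantics.ID;
    }
}
```

Under this rule, `bind(BOX, BOX)` yields ID== and `bind(PRIMITIVE_OR_INLINE, BOX)` yields SAME==, matching the first two rows of the table.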
The gravitational attraction between val== and Object== is also more entangled.

What is the place (in the above setup) of a schism between val== and Object==? It would be that two ID-types (Integer, Object, V.Box) compared with operator== would bind to Object==, hence SAME== (or ID== as a backup). A mix of V-types and ID-types (or pure V-types) would bind to V==, hence SAME== (and *not* ID==). The relation V <: V.Box preserves the SAME== semantics of V as long as any given v1==v2 has at least one "witness" of V (not V.Box). The relation V.Box preserves the SAME== semantics of V, unless we have to back down to ID== (for reasons discussed above).

The choice of val== vs. Object== is uncomfortable here, but it's a feature of today's language. Clearly we wish to improve it, but we are not mandated to fix it, under the slogan "codes like a class works like an int", which simply requires us to make a new compromise between existing compromises.

So I'll repeat that SAME== saves us a lot of trouble, since it is already compatible with prim== and Object== (except for float== and double==). If we can use SAME== (except for double== and float==) we win.

If we are forced to back down to ID== (as a binding for Object==), then where and how do we transition from ID== (as Object==) to SAME== (as val==)? I would say that a reasonable level of discomfort would be to emulate today's rules about X==Y:

- If either X or Y is an inline type (or primitive) then use val== which is SAME== (or prim==, which is usually SAME==).
- If either X or Y is a super of V.Box (such as Object) then use Object== which is ID== (as a fallback).
- If both X and Y are V.Box then also use Object== (today's rule!)

So, today's Integer, applied to operator==, does not always fall up to Object==. Sometimes it falls down to int==, if an int witness is present. If we replicate that rule for inlines, we obtain an uncomfortable mix between "codes like a class" (can jump up to Object) and "works like an int"
(can unbox down to a primitive, which is not like an Object). If we don't want to replicate that model, we have to create a new one and sell *that* to users.

So my point here, I guess, is that if we move towards a "box-centric" design for separating val== from Object==, as Brian suggests, there are still shades of gray between V and Object (namely V.Box).

> tl;dr: if we want a unified type system where values are objects, then I think we have to take the obvious semantics for ==, and if we want to reduce the runtime impact on _old_ binaries, we should consider whether giving older binaries older semantics, and taking the discontinuity as the cost of unification.

So far we have avoided the vexed issue of float== and double==, which is yet another kind of equality. *If* we were to somehow nullify the gravitational entanglement of val== with Object==, and define val==(x,y) as something different from Object==(x,y) (i.e., (Object)x==(Object)y), *then* we could safely define val== in a way that can extend to cover float== (which isn't even an equality comparison, since it isn't reflexive on NaN). *But* the costs of this are (1) not being able to define any val== until we have an operator overloading story, *and* (2) having a permanent schism between Object== and V== for many V types, not just float and double.

Personally, I think the cost seems high for this. But, it provides (to me) a slight hint that we could *delay* defining val==, forcing early access users to say what they mean, using a method call to compare their values in statically typed code, while we consider the matter further. This is a toe-dip into the waters of Brian's "Re-explore a boxing relationship between V and Object" but without going all the way in.

To summarize, I see three stable positions:

A. Make Object== and val== bind to SAME==, using JIT optimizations to the hilt. (At this point, bind ACMP to SAME== also.)
Then only float== and double== are outliers to be dealt with later using ad hoc rules.

B. Make Object== be SAME== and val== be an overloaded operator, usually SAME== (except for float, double, and IEEE-like user types). Overloading can be introduced later; we start by hardwiring val== as SAME== but plan to unlock it to users.

C. Make Object== be ID== and val== be an overloaded operator as in B. This is motivated as a fall-back from B, if we get evidence that the slow path of SAME== is burdensome for dynamic code.

In *all cases* we need named entry points for SAME== and ID== (not FAST==) so users can say what they mean. I think SAME== should have bindings for both Object (dynamic typing) and every V (static typing).

We can choose A and later extend it to B.

In addition, there is a fourth unstable position which may be temporarily desirable:

D. Do not define val== as yet, but rather supply symbolic methods for ID== and SAME==. Let users say what they mean. Define Object== as ID== or even FAST== (undefined result) as a temporary measure. See what users say, and adjust the model accordingly to one of A or B above.

We can choose D and later extend it to any of A, B, or C.

This was very long; I hope it helps.

-- John

From maurizio.cimadamore at oracle.com  Sun Aug 11 23:27:45 2019
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Mon, 12 Aug 2019 00:27:45 +0100
Subject: Equality for values -- new analysis, same conclusion
In-Reply-To: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com>
References: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com>
Message-ID: <010b2e96-4dcf-8a73-0cd2-dd7d8633c246@oracle.com>

Agree on the analysis - in a world where values ARE objects, == needs to mean the right thing. Any other answer will be seen as trickery.

On 09/08/2019 16:46, Brian Goetz wrote:
> The alternate route to preserving a narrower definition of == is to say that _at the language level_, values are not subtypes of Object.
> Then, we can credibly say that the eclair companion type is the box, and there is a boxing conversion between V and I (putting the cream in the eclair is like putting it in a box.) This may seem like a huge step backwards, but it actually is a consistent world, and in this world, boxing is a super-lightweight operation. The main concern here is that when a user assigns a value to an object/interface type, and then invokes Object.getClass(), they will see the value class, which perhaps we can present as "the runtime box is so light that you can't even see it."

We have discussed this at the EG meeting in Santa Clara two weeks ago - IIRC the conclusion there was that, since values can implement interfaces, it would be odd for values not to subtype same interfaces; and Object is really just another kind of interface.

FWIW, I too sense that subtyping is 'boxing' us in (and have been in favor of finding ways to cut that particular knot) - but I realize the above argument is also very strong.

Maurizio

From brian.goetz at oracle.com  Sun Aug 11 23:52:36 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sun, 11 Aug 2019 19:52:36 -0400
Subject: Equality for values -- new analysis, same conclusion
In-Reply-To: <010b2e96-4dcf-8a73-0cd2-dd7d8633c246@oracle.com>
References: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com> <010b2e96-4dcf-8a73-0cd2-dd7d8633c246@oracle.com>
Message-ID:

In Q world, we derived:

    V implements I
    -----------------------
    box(V) <: I

which is a credible interpretation. So if we wanted to have a box-centric interpretation, we could go back to that.

But, in terms of where we want to go, "values are objects", and "== is same-value", is a better place to land, even if the road is rockier in the short run.

> On Aug 11, 2019, at 7:27 PM, Maurizio Cimadamore wrote:
>
> Agree on the analysis - in a world where values ARE objects, == needs to mean the right thing. Any other answer will be seen as trickery.
>
> On 09/08/2019 16:46, Brian Goetz wrote:
>> The alternate route to preserving a narrower definition of == is to say that _at the language level_, values are not subtypes of Object. Then, we can credibly say that the eclair companion type is the box, and there is a boxing conversion between V and I (putting the cream in the eclair is like putting it in a box.) This may seem like a huge step backwards, but it actually is a consistent world, and in this world, boxing is a super-lightweight operation. The main concern here is that when a user assigns a value to an object/interface type, and then invokes Object.getClass(), they will see the value class, which perhaps we can present as "the runtime box is so light that you can't even see it."
>
> We have discussed this at the EG meeting in Santa Clara two weeks ago - IIRC the conclusion there was that, since values can implement interfaces, it would be odd for values not to subtype same interfaces; and Object is really just another kind of interface.
>
> FWIW, I too sense that subtyping is 'boxing' us in (and have been in favor of finding ways to cut that particular knot) - but I realize the above argument is also very strong.
>
> Maurizio
>

From forax at univ-mlv.fr  Mon Aug 12 14:23:00 2019
From: forax at univ-mlv.fr (Remi Forax)
Date: Mon, 12 Aug 2019 16:23:00 +0200 (CEST)
Subject: Equality for values -- new analysis, same conclusion
In-Reply-To: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com>
References: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com>
Message-ID: <440965400.64542.1565619780321.JavaMail.zimbra@u-pem.fr>

I think we should take a step back on that subject, because you are all jumping to the conclusion too fast in my opinion.

Let's start at the beginning: the question about supporting == on inline types should first be guided by what we would have decided if inline types had been present from the inception of Java; it's the usual trick when you want to retcon a feature.
If we had had inline types from the beginning, I believe we would never have allowed == on Object, the root type of the hierarchy, but would have had a special method call that only works on indirect types, like in C#. So Object== is a kind of liability; we are not here to provide a nice semantics for Object== but to deprecate Object== and provide a backward compatible way to make inline types work with old code.

So first, we have to clearly convey that == should be deprecated except on primitive types. I propose:
- to ban V== (compile error)
- to make Object== and T== emit a compiler warning explaining that the code should be changed
- to add a method System.identityEquals(RefObject, RefObject) as a replacement

Now, the second thing that disturbs me is that no email in this thread lists the two issues with the substitutability test that make it unsuitable as an implementation of Object==.
- it's not compatible with the primitive == on float and double, for example,

    inline class InlineFloat {
      float value;

      public boolean equals(Object o) {
        if (!(o instanceof InlineFloat i)) {
          return false;
        }
        return value == i.value;
      }
    }

  has the stupid property of having == being true and equals() being false if value is NaN.
- it can be really slow
  1) Object== can be megamorphic
  2) Object== can do a recursive call
  so it destroys the assumption that Object== is faster than equals.

Fortunately, we are not trying to make == work on inline types, but only to make Object== have a compatible semantics when one of the operands is an inline type. So the only choice we have is to return false if the left or the right operand is an inline type. And yes, people will find it weird, but that's why we are deprecating it after all.
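The NaN point can be reproduced in today's Java with a record standing in for the inline class. Records still have identity today, so this only demonstrates the float semantics, not inline behaviour. The record's equals() uses float== as in the example above, while the hypothetical helper `InlineFloatDemo.substitutable` compares the field the way SAME== treats primitives in this thread (like Float.equals, i.e. by Float.floatToIntBits). The two disagree on NaN, and also on 0.0f versus -0.0f.

```java
/** Stand-in for the InlineFloat example: a record, since inline classes are not in today's Java. */
record InlineFloat(float value) {
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof InlineFloat i)) {
            return false;
        }
        return value == i.value;   // float==: NaN != NaN, and 0.0f == -0.0f
    }

    @Override
    public int hashCode() {
        // Map -0.0f to 0.0f so hashCode stays consistent with this float== based equals.
        return Float.hashCode(value == 0.0f ? 0.0f : value);
    }
}

final class InlineFloatDemo {
    // Hypothetical field-wise substitutability check, treating the float field like Float.equals.
    static boolean substitutable(InlineFloat a, InlineFloat b) {
        return Float.floatToIntBits(a.value()) == Float.floatToIntBits(b.value());
    }

    public static void main(String[] args) {
        InlineFloat nan = new InlineFloat(Float.NaN);
        System.out.println(substitutable(nan, new InlineFloat(Float.NaN)));  // true
        System.out.println(nan.equals(nan));                                 // false: not even reflexive
        InlineFloat zero = new InlineFloat(0.0f);
        InlineFloat negZero = new InlineFloat(-0.0f);
        System.out.println(substitutable(zero, negZero));                    // false
        System.out.println(zero.equals(negZero));                            // true
    }
}
```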
Rémi ----- Original Message ----- > From: "Brian Goetz" > To: "valhalla-spec-experts" > Sent: Friday, 9 August 2019 17:46:19 > Subject: Equality for values -- new analysis, same conclusion > Time to take another look at equality, now that we've simplified away the > LFoo/QFoo distinction. This mail focuses on the language notion of equality > (==) only. Let's start with the simplest case, the ==(V,V) operator. There is > a range of possible interpretations: > > - Not allowed; the compiler treats == as not applicable to operands of type V. > (Note that since V <: Object, == may still be called upon to have an opinion > about two Vs whose static types are Object or interface.) > - Allowed, but always false. (This appeals to a concept of "aggressive > reboxing", where a value is reboxed between every pair of bytecodes.) > - Weak substitutability. This is where we make a "good faith" attempt to treat > equal points as equal, but there are cases (such as those where a value hides > behind an Object/interface) where two otherwise equal objects might report not > equal. This would have to appeal to some notion of invisible boxing, where > sometimes two boxes for the same value are not equal. > - Substitutability. This is where we extend == field-wise over the fields of > the object, potentially recursively. > > As noted above, we don't only have to define ==V, but ==Object when there may be > a value hiding behind the object. It might be acceptable, though clearly weird, > for two values that are ==V to not be ==Object when viewed as Objects. However, > the only way this might make sense to users is if this were appealing to a > boxing conversion, and it's hard to say there's a boxing conversion from V to > Object when V <: Object. There is a gravitational force that says that if two > values are V==, then they should still be == when viewed as Object or > Comparable. > > Let's take a look at the use cases for Object==. > > - Direct identity comparison.
This is used for objects that are known to be > interned (such as interned strings or Enum constants), as well as algorithms > that want to compare objects by identity, such as IdentityHashMap. (When the > operands are of generic (T) or dynamic (Object) type, the "Interned" case is > less credible, but the other cases are still credible.) > - As a fast path for deeper equality comparisons (a == b || a.equals(b)), since > the contract of equals() requires that == objects are equals(). > - In comparing references against null. > - In comparing references against a known sentinel value, such as a value > already observed, or a sentinel value provided by the user. > > When generics are specialized, T== will specialize too, so when T specializes to > a value, we will get V==, and when T is erased, we will get Object==. > > Subtyping is a powerful constraint; it says that a value is-a Object. While it > is theoretically possible to say that v1==v2 does not imply that ((Object)v1 == > (Object) v2), I think we'll have a very hard time suggesting this with a > straight face. (If, instead, the conversion from value to Object were a > straight boxing conversion, this would become credible.) Which says to me that > if we define == on values at all, then it must be consistent with == on object > or interface types. > > Similarly, the fact that we want to migrate erased generics to specialized, > where T== will degenerate to V== on specialization, suggests that having > Object== and V== be consistent is a strong normalizing force. > > > Having == not be allowed on values at all would surely be strange, since == is > (mostly) a substitutability test on primitives, and values are supposed to "work > like an int." And, even if we disallowed == on values, one could always cast > the value to an Object, and compare them. While this is not an outright > indefensible position, it is going to be an uncomfortable one.
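The Object== use cases listed above can be illustrated with today's identity semantics. This is an illustrative sketch: `MISSING`, `fastPathEquals` and `isAbsent` are hypothetical names, while `HashMap`, `IdentityHashMap` and `Map.getOrDefault` are standard JDK APIs.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Illustrative sketch of the Object== use cases, using current identity ==.
class ObjectEqualityUseCases {
    // A private sentinel: no other object can ever be == to it
    static final Object MISSING = new Object();

    // Fast-path idiom: identity check first, then equals()
    static boolean fastPathEquals(Object a, Object b) {
        return a == b || (a != null && a.equals(b));
    }

    // Sentinel comparison: distinguishes "absent" from a stored null
    static boolean isAbsent(Map<String, Object> map, String key) {
        return map.getOrDefault(key, MISSING) == MISSING;
    }

    public static void main(String[] args) {
        String k1 = new String("key");
        String k2 = new String("key"); // equals(k1), but a distinct object

        // Identity comparison: IdentityHashMap keeps both keys
        Map<String, Integer> byEquality = new HashMap<>();
        Map<String, Integer> byIdentity = new IdentityHashMap<>();
        byEquality.put(k1, 1);
        byEquality.put(k2, 2);
        byIdentity.put(k1, 1);
        byIdentity.put(k2, 2);
        System.out.println(byEquality.size()); // 1
        System.out.println(byIdentity.size()); // 2

        System.out.println(fastPathEquals(k1, k2)); // true, via equals()

        Map<String, Object> m = new HashMap<>();
        m.put("present", null);
        System.out.println(isAbsent(m, "present")); // false
        System.out.println(isAbsent(m, "other"));   // true
    }
}
```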
> > Having V== always be false still does not seem like something we can offer with > a straight face, again, citing "works like an int." > > Having V== be "weak substitutability" is possible, but I don't think it would > make the VM people happy anyway. Most values won't require recursive > comparisons (since most fields of value types will be statically typed as > primitives, refs, or values), but much of the cost is in having the split at > all. > > Note too that treating == as substitutability means use cases such as > IdentityHashMap will just work as expected, with no modification for a > value-full world. > > So if V <: Object, it feels we are still being "boxed" into the corner that == > is a substitutability test. But, in generic / dynamically typed code, we are > likely to discourage broad use of Object==, since the most common case (fast > path comparison) is no longer as fast as it once was. > > > We have a few other options to mitigate the performance concerns here: > > - Live with legacy ACMP anomalies; > - Re-explore a boxing relationship between V and Object. > > If we say that == is substitutability, we still have the option to translate == > to something other than ACMP. Which means that existing binaries (and likely, > binaries recompiled with -source 8) will still use ACMP. If we give ACMP the > "false if value" interpretation, then existing classfiles (which mostly use == > as a fast-path check) will still work, as those tests should be backed up with > .equals(), though they may suffer performance changes on recompilation. This > is an uncomfortable compromise, but is worth considering. Down this route, > ACMP has a much narrower portfolio, as we would not use it in translating most > Object== unless we were sure we were dealing with identityful types. > > The alternate route to preserving a narrower definition of == is to say that _at > the language level_, values are not subtypes of Object.
Then, we can credibly > say that the eclair companion type is the box, and there is a boxing conversion > between V and I (putting the cream in the eclair is like putting it in a box.) > This may seem like a huge step backwards, but it actually is a consistent > world, and in this world, boxing is a super-lightweight operation. The main > concern here is that when a user assigns a value to an object/interface type, > and then invokes Object.getClass(), they will see the value class -- which > perhaps we can present as "the runtime box is so light that you can't even see > it." > > Where this world runs into more trouble is with specialized generics; we'd like > to treat specialized Foo as being generic in "T extends Object", which > subsumes values. This complicates things like bound computation and type > inference, and also makes invoking Object methods trickier, since we have to do > some sort of reasoning by parts (which we did in M3, but didn't like it.) > > tl;dr: if we want a unified type system where values are objects, then I think > we have to take the obvious semantics for ==, and if we want to reduce the > runtime impact on _old_ binaries, we should consider giving older > binaries older semantics, and taking the discontinuity as the cost of > unification.

From brian.goetz at oracle.com Mon Aug 12 17:37:41 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 12 Aug 2019 13:37:41 -0400 Subject: Equality for values -- new analysis, same conclusion In-Reply-To: <440965400.64542.1565619780321.JavaMail.zimbra@u-pem.fr> References: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com> <440965400.64542.1565619780321.JavaMail.zimbra@u-pem.fr> Message-ID: <05253002-9c5e-902d-285d-a42affe8bf48@oracle.com> > I think we should take a step back on that subject, > because you are all jumping to conclusions too fast in my opinion.
> > Let's start at the beginning: > the question of supporting == on inline types should first be guided by what we would have decided if inline types had been present from the inception of Java; it's the usual trick when you want to retcon a feature. This is a good question, and worth discussing. > If we had had inline types from the beginning, I believe we would never have allowed == on Object, the root type of the hierarchy, but would have had a special method call that only works on indirect types, like in C#. Talk about jumping to conclusions too fast :) This is surely one of the options, but by far not the only one. If we had the benefits of hindsight (both for how Java is used, and where it was going), we might instead have chosen the following total operators: - `==` is a substitutability test for all types - `===` delegates to equals() for all class types, and == for primitives (let's not discuss this further, as it is separable and surely not the problem on the table.) Note that on "traditional" object references, Object== _is already_ a substitutability test. In fact, on every type for which `==` is defined today, it is a substitutability test (modulo NaN.) So while we might have chosen a different path back then, we can still choose a path that is consistent with where we might have gone, by extending == to be a substitutability test for the new types. This also seems the path of least astonishment. The problem with Object== is not that it is unsound, it's that it is _badly overused_. This largely comes from coding conventions set very early in Java's lifetime, such as using `==` as a quick check both in the implementation of `equals()` methods, and before calling `equals()` (e.g., `x == y || x.equals(y)`). And this overuse comes from performance assumptions from the Java 1.0 days, which were that everything was interpreted and virtual method calls like equals() were super-expensive.
This was true for the first few years, but in hindsight, these coding patterns are the boat anchor, not the semantics of Object== itself. These patterns went from "necessary for performance" to "useless but harmless", and it is their harmlessness that has allowed them to survive. Also, let's be honest: the sole reason we're having this conversation is that we are concerned about the performance impact. That should surely be considered, but letting that dictate the semantics of language-level equality would be an extremely risky move -- and something we should consider with the utmost care and skepticism. > I propose > - to ban V== (compile error) > - to make Object== and T== emit a compiler warning explaining that the code should be changed > - to add a method System.identityEquals(RefObject, RefObject) as a replacement I get why this is attractive to you, but I think it will be a constant source of confusion to users. First, we've told users that one of the key use cases for value types is numerics. Numerics are frequently compared for equality. That users can't use `==` on numeric values at all will surely be a puzzlement, and not just once per user. (There are things that we can explain to users, and they'll say "OK, I don't like it but I get it", but if we try to explain to them why they can't compare two Float16s for equality, their eyes will likely glaze over and they will say "you guys have gone off the deep end.") Further, many algorithms need to use == to say "have I reached the sentinel value" or "is this the element I am looking for." In performance-sensitive code, users want to use == in preference to equals(). This again will be a source of puzzlement. So this approach, while viable, has a much higher cognitive-load cost than you are imagining. (Yes, you could say "when we have operator overloading, it won't be a problem." Given that this is not coming for a while, if at all, I don't see this as an answer.
And again, let's not discuss this now, as it is a distraction.) I do agree that we should seek to discourage the overuse of `==` through compiler warnings and other tools. But I think that's a separate and separable problem. > Now, the second thing that disturbs me is that no email in this thread lists the two issues with the substitutability test that make it unsuitable as an implementation of Object==. > - it's not compatible with the primitive == on float and double, for example, I think this is mostly a "whatabout" argument. Yes, it's irritating. Yes, it's tiring to keep saying "modulo NaN". Yes, it was probably a mistake. But given the choice between: - NaN is so weird that we should just treat it as a removable discontinuity - See, NaN does it, so we have precedent, and now can do it wherever we like, there's a reasonable choice, and an insane choice. The reason no one has brought it up is because no one wanted to advocate for the insane choice. That seems sane to me. > has the stupid property of having == being true and equals() being false if value is NaN. Yes, it's stupid. Do we want to say "oops, we made a mistake there", or emulate that mistake forevermore? > - it can be really slow > 1) Object== can be megamorphic > 2) Object== can do a recursive call > so it destroys the assumption that Object== is faster than equals. This has been discussed extensively, so it puzzles me why you think it hasn't been discussed. Yes, this is a big concern. Yes, we should look for ways to mitigate this. Yes, we should seek to discourage the rampant overuse of ==, and to the extent that the performance model has shifted, educate users about the new performance model. But it is not, in itself, an argument for why we should pick the wrong (or no) semantics for val==. > so the only choice we have is to return false if the left or the right operand is an inline type. OK, my turn to be disturbed. Yes, this is a valid choice, and we can discuss it.
But to claim that it is the only choice ... well, to misquote Lord Vader: "I find your lack of imagination ... disturbing." https://youtu.be/m0XuKORufGk?t=20

From daniel.smith at oracle.com Mon Aug 12 18:17:51 2019 From: daniel.smith at oracle.com (Dan Smith) Date: Mon, 12 Aug 2019 12:17:51 -0600 Subject: Equality for values -- new analysis, same conclusion In-Reply-To: <131B9329-ECE7-46BE-909F-AC88421FF6CB@oracle.com> References: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com> <131B9329-ECE7-46BE-909F-AC88421FF6CB@oracle.com> Message-ID: <96650203-B2F7-4FAE-B999-BE7503FC9B02@oracle.com> > On Aug 9, 2019, at 9:46 AM, Brian Goetz wrote: > > So if V <: Object, it feels we are still being "boxed" into the corner that == is a substitutability test. But, in generic / dynamically typed code, we are likely to discourage broad use of Object==, since the most common case (fast path comparison) is no longer as fast as it once was. > > > We have a few other options to mitigate the performance concerns here: > > - Live with legacy ACMP anomalies; > - Re-explore a boxing relationship between V and Object. Seems clear to me that a substitutability test is what makes the most semantic sense to almost everyone, and we're just struggling to justify that it's "worth it". But before we jump to mitigation -- up to and including redesigning the whole object model -- do we have anything concrete to say about costs? Like, if every program got 5% slower in JDK X, obviously that would be bad. If programs that make heavy use of values have to recognize that equality is a little more expensive than they're used to, ... it's not even clear that's a problem. (Shorter: benchmarks, please?) > On Aug 10, 2019, at 12:57 PM, John Rose wrote: >> >> - As a fast path for deeper equality comparisons (a == b || a.equals(b)), since the contract of equals() requires that == objects are equals(). > > This is what I call L.I.F.E., the Legacy Idiom For Equality. ID== is good here too.
FAST== would be fine here, and a JIT could perform that strength reduction if it notices that the == expression is post-dominated by Object.equals (on the false path). I think that's usually detectable. > Major caveat for this kind of optimization: it relies on a "well-behaved" 'equals' method. If 'equals' can throw an exception or have some other side effect (even indirectly) when a == b, we can't just blindly execute that code. Maybe the optimization you envision is able to cope with these possibilities. JIT is a mystery to me. But it seems like something that needs careful attention.

From brian.goetz at oracle.com Mon Aug 12 18:34:01 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 12 Aug 2019 14:34:01 -0400 Subject: Equality for values -- new analysis, same conclusion In-Reply-To: <96650203-B2F7-4FAE-B999-BE7503FC9B02@oracle.com> References: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com> <131B9329-ECE7-46BE-909F-AC88421FF6CB@oracle.com> <96650203-B2F7-4FAE-B999-BE7503FC9B02@oracle.com> Message-ID: <9eb5d9b0-ab80-d338-85d0-4e6f16dd99ac@oracle.com> > > (Shorter: benchmarks, please?) +1. I'll let Sergey answer in more detail here, since he's closest to it, with the caveat that we've not really tried to optimize the substitutability test in the JIT. The biggest concern is not so much that new code will suffer, but that old code which never knew about, and will never care about, values will suffer. (Much of this can be mitigated by aggressive speculation, but this is not free either.) > >> On Aug 10, 2019, at 12:57 PM, John Rose > > wrote: >>> >>> - As a fast path for deeper equality comparisons (a == b || >>> a.equals(b)), since the contract of equals() requires that == >>> objects are equals(). >> >> This is what I call L.I.F.E., the Legacy Idiom For Equality. ID== is >> good here too.
FAST== would be fine here, and a JIT could perform >> that strength reduction if it notices that the == expression is >> post-dominated by Object.equals (on the false path). I think that's >> usually detectable. >> > > Major caveat for this kind of optimization: it relies on a > "well-behaved" 'equals' method. If 'equals' can throw an exception or > have some other side effect (even indirectly) when a == b, we can't > just blindly execute that code. > > Maybe the optimization you envision is able to cope with these > possibilities. JIT is a mystery to me. But it seems like something > that needs careful attention. > I believe much of the "is this legal" ground here has been covered by previous attempts to intrinsify various methods (though this would probably be the first time we apply this reasoning to user-overridable methods.) Essentially, this would be using invariants of the specification to enable certain transformations. Specifically, the specification https://docs.oracle.com/en/java/javase/12/docs/api/java.base/java/lang/Object.html#equals(java.lang.Object) clearly says that for a non-null reference x, x.equals(x) must be true. Can we use that to optimize `x==y || x.equals(y)`? I can see the arguments on both sides.
From daniel.smith at oracle.com Mon Aug 12 18:40:56 2019 From: daniel.smith at oracle.com (Dan Smith) Date: Mon, 12 Aug 2019 12:40:56 -0600 Subject: Equality for values -- new analysis, same conclusion In-Reply-To: <9eb5d9b0-ab80-d338-85d0-4e6f16dd99ac@oracle.com> References: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com> <131B9329-ECE7-46BE-909F-AC88421FF6CB@oracle.com> <96650203-B2F7-4FAE-B999-BE7503FC9B02@oracle.com> <9eb5d9b0-ab80-d338-85d0-4e6f16dd99ac@oracle.com> Message-ID: > On Aug 12, 2019, at 12:34 PM, Brian Goetz wrote: > > I believe much of the "is this legal" ground here has been covered by previous attempts to intrinsify various methods (though this would probably be the first time we apply this reasoning to user-overridable methods.) Essentially, this would be using invariants of the specification to enable certain transformations. Specifically, the specification > > https://docs.oracle.com/en/java/javase/12/docs/api/java.base/java/lang/Object.html#equals(java.lang.Object) > > clearly says that for a non-null reference x, x.equals(x) must be true. Can we use that to optimize `x==y || x.equals(y)`? I can see the arguments on both sides. "I didn't get the boolean value I expected because my 'equals' method doesn't follow the specified contract" is one thing. What I'm more concerned with is "I'm getting an exception from a method that I never called, but only when -XXfoobar is turned on".
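Dan's concern can be sketched in plain Java. `Grumpy` is a hypothetical class whose `equals()` breaks the contract by throwing, and `weakenedFastPath` is an assumed model of a transformed version in which the first check is weaker than identity and so falls through to `equals()`; it is not how any particular JIT works.

```java
// Sketch of the concern above: with a contract-breaking equals(), the
// identity fast path and a weakened transformation behave differently.
class LifeIdiomDemo {
    // Hypothetical class whose equals() violates the Object.equals contract
    static final class Grumpy {
        @Override
        public boolean equals(Object o) {
            throw new IllegalStateException("equals() was called");
        }

        @Override
        public int hashCode() {
            return 0;
        }
    }

    // The Legacy Idiom For Equality: identity fast path, then equals()
    static boolean life(Object x, Object y) {
        return x == y || x.equals(y);
    }

    // Assumed model of a transformation whose cheap first check could not
    // prove x == y, so control falls through to equals()
    static boolean weakenedFastPath(Object x, Object y) {
        boolean fastCheck = false;
        return fastCheck || x.equals(y);
    }

    public static void main(String[] args) {
        Grumpy g = new Grumpy();
        System.out.println(life(g, g)); // true: equals() never runs
        try {
            weakenedFastPath(g, g);
        } catch (IllegalStateException e) {
            System.out.println("transformed code threw: " + e.getMessage());
        }
    }
}
```

The source program never calls `equals()` on this path, so an exception surfacing only in the transformed form is exactly the "method that I never called" symptom.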
From forax at univ-mlv.fr Mon Aug 12 19:12:38 2019 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Mon, 12 Aug 2019 21:12:38 +0200 (CEST) Subject: Equality for values -- new analysis, same conclusion In-Reply-To: <05253002-9c5e-902d-285d-a42affe8bf48@oracle.com> References: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com> <440965400.64542.1565619780321.JavaMail.zimbra@u-pem.fr> <05253002-9c5e-902d-285d-a42affe8bf48@oracle.com> Message-ID: <206106430.84918.1565637158262.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "Brian Goetz" > To: "Remi Forax" > Cc: "valhalla-spec-experts" > Sent: Monday, 12 August 2019 19:37:41 > Subject: Re: Equality for values -- new analysis, same conclusion >> I think we should take a step back on that subject, >> because you are all jumping to conclusions too fast in my opinion. >> >> Let's start at the beginning: >> the question of supporting == on inline types should first be guided by what >> we would have decided if inline types had been present from the inception of Java; >> it's the usual trick when you want to retcon a feature. > > This is a good question, and worth discussing. > >> If we had had inline types from the beginning, I believe we would never have allowed >> == on Object, the root type of the hierarchy, but would have had a special method call >> that only works on indirect types, like in C#. > Talk about jumping to conclusions too fast :) This is surely one of the > options, but by far not the only one. > > If we had the benefits of hindsight (both for how Java is used, and > where it was going), we might instead have chosen the following total > operators: > > - `==` is a substitutability test for all types > - `===` delegates to equals() for all class types, and == for > primitives (let's not discuss this further, as it is separable and > surely not the problem on the table.) > > Note that on "traditional" object references, Object== _is already_ a > substitutability test.
In fact, on every type for which `==` is defined > today, it is a substitutability test (modulo NaN.) So while we might > have chosen a different path back then, we can still choose a path that > is consistent with where we might have gone, by extending == to be a > substitutability test for the new types. This also seems the path of > least astonishment. And here we disagree. First, you don't have to extend ==; you can let it die. Then, Object== on indirect types is not a substitutability test: you can have two strings that are equal when calling equals() but not when calling Object==, so it's a substitutability test only when it returns true; otherwise, you don't know. > > The problem with Object== is not that it is unsound, it's that it is > _badly overused_. This largely comes from coding conventions set very > early in Java's lifetime, such as using `==` as a quick check both in > the implementation of `equals()` methods, and before calling `equals()` > (e.g., `x == y || x.equals(y)`). And this overuse comes from > performance assumptions from the Java 1.0 days, which were that > everything was interpreted and virtual method calls like equals() were > super-expensive. This was true for the first few years, but in > hindsight, these coding patterns are the boat anchor, not the semantics > of Object== itself. These patterns went from "necessary for > performance" to "useless but harmless", and it is their harmlessness > that has allowed them to survive. > > Also, let's be honest: the sole reason we're having this conversation is > that we are concerned about the performance impact. That should surely > be considered, but letting that dictate the semantics of language-level > equality would be an extremely risky move -- and something we should > consider with the utmost care and skepticism. Don't use ==, use equals: it's something I repeat over and over (and over) to my students. If you want to test equality, use equals.
> >> I propose >> - to ban V== (compile error) >> - to make Object== and T== emit a compiler warning explaining that the code >> should be changed >> - to add a method System.identityEquals(RefObject, RefObject) as a replacement > > I get why this is attractive to you, but I think it will be a constant > source of confusion to users. First, we've told users that one of the > key use cases for value types is numerics. Numerics are frequently > compared for equality. That users can't use `==` on numeric values at > all will surely be a puzzlement, and not just once per user. (There are > things that we can explain to users, and they'll say "OK, I don't like > it but I get it", but if we try to explain to them why they can't > compare two Float16s for equality, their eyes will likely glaze over and > they will say "you guys have gone off the deep end.") Don't use ==, use equals. > > Further, many algorithms need to use == to say "have I reached the > sentinel value" or "is this the element I am looking for." In > performance-sensitive code, users want to use == in preference to > equals(). This again will be a source of puzzlement. Either you are using indirect types, and you can use System.identityEquals, or you have to find a creative way to mark an inline object as the sentinel; for example, the C# equivalent of HashMap uses the sign bit of the hashCode field of the inline object corresponding to an entry of the hash table to mark the entry as a sentinel. > > So this approach, while viable, has a much higher cognitive-load cost > than you are imagining. (Yes, you could say "when we have operator > overloading, it won't be a problem." Given that this is not coming for > a while, if at all, I don't see this as an answer. And again, let's not > discuss this now, as it is a distraction.) > > I do agree that we should seek to discourage the overuse of `==` > through compiler warnings and other tools. But I think that's a > separate and separable problem.
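The sentinel technique Rémi describes can be sketched as follows, loosely in the spirit of the older C# `Dictionary<TKey,TValue>` entry layout. This is a hypothetical fixed-capacity `int`-to-`int` map with linear probing and no removal. The parallel arrays stand in for a flattened array of inline entries, and a negative stored hash (sign bit set) marks a free slot, so no sentinel object and no `==` on entries is needed.

```java
import java.util.Arrays;

// Hypothetical sketch: entries live "flattened" in parallel arrays, and a
// free slot is marked by a negative stored hash (sign bit set) instead of
// a sentinel object compared with ==. Fixed capacity, no removal.
class SignBitSentinelMap {
    private final int[] hashes;
    private final int[] keys;
    private final int[] values;

    SignBitSentinelMap(int capacity) {
        hashes = new int[capacity];
        keys = new int[capacity];
        values = new int[capacity];
        Arrays.fill(hashes, -1); // sign bit set: every slot starts free
    }

    // Stored hashes always have the sign bit cleared
    private static int hash(int key) {
        return Integer.hashCode(key) & 0x7FFFFFFF;
    }

    void put(int key, int value) {
        int h = hash(key);
        int n = hashes.length;
        for (int i = 0; i < n; i++) {
            int slot = (h % n + i) % n; // linear probing, no int overflow
            boolean free = hashes[slot] < 0;
            if (free || (hashes[slot] == h && keys[slot] == key)) {
                hashes[slot] = h;
                keys[slot] = key;
                values[slot] = value;
                return;
            }
        }
        throw new IllegalStateException("table is full");
    }

    int getOrDefault(int key, int defaultValue) {
        int h = hash(key);
        int n = hashes.length;
        for (int i = 0; i < n; i++) {
            int slot = (h % n + i) % n;
            if (hashes[slot] < 0) {
                return defaultValue; // reached a free slot: key is absent
            }
            if (hashes[slot] == h && keys[slot] == key) {
                return values[slot];
            }
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        SignBitSentinelMap map = new SignBitSentinelMap(8);
        map.put(1, 10);
        map.put(9, 90); // 9 % 8 == 1, so this probes past slot 1
        System.out.println(map.getOrDefault(9, -1)); // 90
        System.out.println(map.getOrDefault(2, -1)); // -1
    }
}
```

Keeping the marker inside an entry field means "is this slot empty?" is an ordinary `int` comparison, which works the same whether entries are objects or flattened values.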
We are moving to a world with 3 kinds of values (primitive, indirect, inline); the way to go back to 2 kinds of values is to retcon primitive types as inline types. If we achieve that, we will be able to call .equals() on primitives too, making the last "safe" usage of == disappear. > >> Now, the second thing that disturbs me is that no email in this thread lists the >> two issues with the substitutability test that make it unsuitable as an >> implementation of Object==. >> - it's not compatible with the primitive == on float and double, for example, > > I think this is mostly a "whatabout" argument. Yes, it's irritating. > Yes, it's tiring to keep saying "modulo NaN". Yes, it was probably a > mistake. But given the choice between: > > - NaN is so weird that we should just treat it as a removable > discontinuity > - See, NaN does it, so we have precedent, and now can do it wherever > we like, > > there's a reasonable choice, and an insane choice. The reason no one > has brought it up is because no one wanted to advocate for the insane > choice. That seems sane to me. > >> has the stupid property of having == being true and equals() being false if >> value is NaN. > > Yes, it's stupid. Do we want to say "oops, we made a mistake there", or > emulate that mistake forevermore? It exposes a flaw in the proposed implementation of the substitutability test, and - it shows that we will have to change the definition of Object.equals to force it to be more precise than the substitutability test, something not required by the current javadoc/spec; - it's the kind of issue that may make our lives miserable when we want to see the primitive types as inline types. > >> - it can be really slow >> 1) Object== can be megamorphic >> 2) Object== can do a recursive call >> so it destroys the assumption that Object== is faster than equals. > > This has been discussed extensively, so it puzzles me why you think it > hasn't been discussed. Yes, this is a big concern.
Yes, we should look > for ways to mitigate this. Yes, we should seek to discourage the > rampant overuse of ==, and to the extent that the performance model has > shifted, educate users about the new performance model. But it is not, > in itself, an argument for why we should pick the wrong (or no) semantics > for val==. If the perf model has shifted, that's why we should not try to provide a remotely useful val==; otherwise more people will start to use == more. > >> so the only choice we have is to return false if the left or the right operand >> is an inline type. > > OK, my turn to be disturbed. Yes, this is a valid choice, and we can > discuss it. But to claim that it is the only choice ... well, to > misquote Lord Vader: "I find your lack of imagination ... disturbing." > > https://youtu.be/m0XuKORufGk?t=20 It's the only choice if you agree that we want to demote ==. Rémi

From brian.goetz at oracle.com Mon Aug 12 19:47:00 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 12 Aug 2019 15:47:00 -0400 Subject: Equality for values -- new analysis, same conclusion In-Reply-To: <206106430.84918.1565637158262.JavaMail.zimbra@u-pem.fr> References: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com> <440965400.64542.1565619780321.JavaMail.zimbra@u-pem.fr> <05253002-9c5e-902d-285d-a42affe8bf48@oracle.com> <206106430.84918.1565637158262.JavaMail.zimbra@u-pem.fr> Message-ID: > And here we disagree. > First, you don't have to extend ==; you can let it die. I agree that is a possibility (and I've been quite clear about this). Where I don't agree is that this is the _only_ sane possibility, nor do I agree that this is somehow intrinsically desirable. You've flat-out assumed that it is, and gone running from there, which is a totally fair opinion, but rates a zero on the persuasion scale... So, if you want to make this case, start over, and convince people that Object== is the root problem here.
From brian.goetz at oracle.com Mon Aug 12 20:29:46 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 12 Aug 2019 16:29:46 -0400 Subject: Equality for values -- new analysis, same conclusion In-Reply-To: <206106430.84918.1565637158262.JavaMail.zimbra@u-pem.fr> References: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com> <440965400.64542.1565619780321.JavaMail.zimbra@u-pem.fr> <05253002-9c5e-902d-285d-a42affe8bf48@oracle.com> <206106430.84918.1565637158262.JavaMail.zimbra@u-pem.fr> Message-ID: <9a58c13c-81e9-09d0-b7e1-e34db16b3929@oracle.com> Second pass... > Then, Object== on indirect types is not a substitutability test: you can have two strings that are equal when calling equals() but not when calling Object==, so it's a substitutability test only when it returns true; otherwise, you don't know. Maybe this is part of the disconnect: you are using the term "substitutability" incorrectly. Two identity objects that are .equals() but not == are _not_ substitutable, because they have an observable difference -- their identity. Substitutability means "you can't discern any difference." For example, when comparing ints, all "ones" are interchangeable, regardless of what memory location or register they are stored in. But when comparing Integers, regardless of their content, they have an extra component of state -- the identity -- which is observable through a number of means. So two distinct (!=) Integer objects are not substitutable, even if they describe the same integer. On all types where == is defined in Java (modulo NaN), == is currently a substitutability test, because two object references are only substitutable if they are the same object.
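Brian's definition can be made concrete with today's identity objects. In this illustrative sketch (the class and helper names are hypothetical), two equal `String`s are told apart by an identity-based set built with `Collections.newSetFromMap` over an `IdentityHashMap`, while two `int` values carry no identity to observe.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Illustrative sketch: equals() compares content, but identity is a further,
// observable component of state, so equal objects need not be substitutable.
class SubstitutabilityDemo {
    // Observes identity without writing == directly: an identity-based set
    // can tell two equal-but-distinct objects apart.
    static boolean identitySetTellsApart(Object a, Object b) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        seen.add(a);
        return !seen.contains(b);
    }

    public static void main(String[] args) {
        String s1 = new String("valhalla");
        String s2 = new String("valhalla");
        System.out.println(s1.equals(s2));                 // true
        System.out.println(s1 == s2);                      // false
        System.out.println(identitySetTellsApart(s1, s2)); // true
        System.out.println(identitySetTellsApart(s1, s1)); // false

        int i = 1000;
        int j = 1000;
        System.out.println(i == j); // true: ints carry no identity
    }
}
```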
From brian.goetz at oracle.com Mon Aug 12 21:02:38 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 12 Aug 2019 17:02:38 -0400 Subject: Equality for values -- new analysis, same conclusion In-Reply-To: <131B9329-ECE7-46BE-909F-AC88421FF6CB@oracle.com> References: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com> <131B9329-ECE7-46BE-909F-AC88421FF6CB@oracle.com> Message-ID: <4691f7f0-141a-c2ca-3bef-f0ce3ba49a69@oracle.com> >> Where this world runs into more trouble is with specialized generics; we'd like to treat specialized Foo as being generic in "T extends Object", which subsumes values. This complicates things like bound computation and type inference, and also makes invoking Object methods trickier, since we have to do some sort of reasoning by parts (which we did in M3, but didn't like it.) > Yikes, I don't understand these particular issues fully, but they sound scary. If we have a class

    class Foo {
        T t;
        boolean equals(Object o) -> (o instanceof Foo oo) && this.t == oo.t;
    }

Today, this is an erased class, so when we instantiate Foo, we get Object== semantics for the last comparison. But tomorrow, when we make this a specialized class, Foo will get specialized with val== for the latter comparison, since `this.t` will be statically typed to `V` rather than `Object`. If we assign anything other than SAME== to Object==, then upon specialization, the specialized and erased instantiations of Foo will have different equality semantics. (Remi will say that the bug here is using ==; this is neither wrong nor right, but simply evidence that there is more than one way to attack this problem.) > Part of the job of splitting the difference between int-nature and Object-nature involves deciding what to do with box/unbox conversions (which are part of int-nature). Our plan, ideally, is to demote the boxes, in favor of a lighter-weight boxing conversion to a value box. Then, the object-primitive divide effectively goes away, and we are left with: everything is an object.
(Perhaps it is a mistake to reuse the term "box" as it is laden with the old associations of "accidental identity" and "slow".)

> The "gravity" of the situation prompts us to click into a tight orbit around either "boxes and unboxes like an int" or "subtypes Object like a class". Maybe there's a halfway (Lagrange) point out there somewhere, such as "unboxes like an int and subtypes Object like a class", where unboxing artifacts show up (as a special form of cast) but not boxing artifacts.

In fact, that's where we currently are with the eclair story. V converts to V.Box via subtyping, but V.Box converts to V (when erased values hit sharply-typed client code) via unboxing.

> Looking very closely at the current cycle of int-box-Integer-unbox-int, maybe we can limit the irregularities (the schism mentioned above) in some way that allows us to "blame" them on the unbox step. After all, the "codes like a class" part only implies one direction of type conversion, from some V to Object or V.Box (the eclair interface). Going the other way, from V.Box to V, is a pure "works like an int" move, since Java does not allow implicit down-casting, only unboxing conversions. And we can view V.Box-to-V as an unbox *as well as* (or maybe *instead of*) a down-cast.

That pretty much describes the erased-generics-via-eclairs story.

> This suggests a new category of syntactic context for operator==, V.Box. So we have val==, val.box==, and Object==.

With the same caveat as above: if Val.Box== is not the same as Val==, on specialization, we risk observing semantic changes. And if it's subtyping, not boxing, I think it's a forced move.

> So I'll repeat that SAME== saves us a lot of trouble, since it is already compatible with prim== and Object== (except for float== and double==). If we can use SAME== (except for double== and float==) we win.

And you find yourself with me in the bottom of the gravity well. Hi, friend!
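The erased half of the `Foo` example above can be tried in current Java; specialization cannot, since it does not exist yet. The sketch below shows that in an erased generic class `this.t == oo.t` is always a reference comparison, whatever `T` is, and contrasts it with comparing the payload via `equals()`, the direction Remi argues for. `FooEq` is a hypothetical name introduced for the contrast.

```java
import java.util.Objects;

// Erased generic class: this.t == oo.t is an Object== (reference) comparison.
final class Foo<T> {
    final T t;

    Foo(T t) { this.t = t; }

    @Override public boolean equals(Object o) {
        return (o instanceof Foo<?> oo) && this.t == oo.t;
    }

    // Kept consistent with the identity-based equals() above.
    @Override public int hashCode() { return System.identityHashCode(t); }
}

// Same shape, but comparing the payload with equals() instead of ==.
final class FooEq<T> {
    final T t;

    FooEq(T t) { this.t = t; }

    @Override public boolean equals(Object o) {
        return (o instanceof FooEq<?> oo) && Objects.equals(this.t, oo.t);
    }

    @Override public int hashCode() { return Objects.hashCode(t); }
}
```

With two distinct `new String("x")` payloads, the `Foo` instances compare unequal while the `FooEq` instances compare equal. The `==` version is only "right" when the payload's identity is what matters.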
From daniel.smith at oracle.com Tue Aug 13 17:03:19 2019
From: daniel.smith at oracle.com (Dan Smith)
Date: Tue, 13 Aug 2019 11:03:19 -0600
Subject: Equality for values -- new analysis, same conclusion
In-Reply-To: <96650203-B2F7-4FAE-B999-BE7503FC9B02@oracle.com>
References: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com> <131B9329-ECE7-46BE-909F-AC88421FF6CB@oracle.com> <96650203-B2F7-4FAE-B999-BE7503FC9B02@oracle.com>
Message-ID: <42221A76-5434-4E41-B7D6-75D4C8BA28FE@oracle.com>

> On Aug 12, 2019, at 12:17 PM, Dan Smith wrote:
>
>> On Aug 10, 2019, at 12:57 PM, John Rose wrote:
>>>
>>> - As a fast path for deeper equality comparisons (a == b || a.equals(b)), since the contract of equals() requires that == objects are equals().
>>
>> This is what I call L.I.F.E., the Legacy Idiom For Equality. ID== is good here too. FAST== would be fine here, and a JIT could perform that strength reduction if it notices that the == expression is post-dominated by Object.equals (on the false path). I think that's usually detectable.
>>
>
> Major caveat for this kind of optimization: it relies on a "well-behaved" 'equals' method. If 'equals' can throw an exception or have some other side effect (even indirectly) when a == b, we can't just blindly execute that code.
>
> Maybe the optimization you envision is able to cope with these possibilities. JIT is a mystery to me. But it seems like something that needs careful attention.

Want to expand on this, because I'm not sure my description was clear, and it seems like a serious constraint.
Example:

    inline class Dollar {
        public static int EQUALS_COUNTER = 0;
        public final long cents;

        public Dollar(long cents) { this.cents = cents; }

        public boolean equals(Object o) {
            if (!(o instanceof Dollar)) return false;
            EQUALS_COUNTER++;
            return cents == ((Dollar) o).cents;
        }
    }

    class Test {
        public static int COMPARE_COUNTER = 0;

        static void compare(Dollar d1, Dollar d2) {
            if (d1 == d2 || d1.equals(d2)) COMPARE_COUNTER++;
        }

        public static void main(String... args) {
            Dollar d1 = new Dollar(100);
            Dollar d2 = new Dollar(100);
            compare(d1, d2);
            System.out.println("compare: " + COMPARE_COUNTER + "; equals: " + Dollar.EQUALS_COUNTER);
        }
    }

Using SAME== semantics, the output should be:

    compare: 1; equals: 0

I worry that, for certain optimization strategies that claim to implement SAME== semantics, the output will be:

    compare: 1; equals: 1

From daniel.smith at oracle.com Tue Aug 13 17:18:41 2019
From: daniel.smith at oracle.com (Dan Smith)
Date: Tue, 13 Aug 2019 11:18:41 -0600
Subject: Collapsing the requirements
In-Reply-To:
References:
Message-ID:

> On Aug 3, 2019, at 10:37 AM, Brian Goetz wrote:
>
> For sake of exposition, let's say this is called `C.Box` -- and is a legitimate inner class of C (which can be generated by the compiler as an ordinary classfile.) We've been here before, and abandoned it because "Box" seemed misleading, but let's call it that for now. And now it is a real nominal type, not a fake type. In the simplest case, merely declaring an inline class could give rise to V.Box.
>
> Now, the type formerly known as `V?` is an ordinary, nominal interface (or abstract class) type. The user can say what they mean, and no magic is needed by either the language or the VM. Goodbye `V?`.

What if the spelling of "V.Box" were "V?"

Serious point: seems like the way we spell V.Box is orthogonal to how we implement it.
The crux of what you're proposing here is to implement V.Box in the compiler, removing the requirement that the JVM support the type LV;, and living with some resulting limitations. Syntax can be set aside as a separate discussion.

From john.r.rose at oracle.com Tue Aug 13 18:19:58 2019
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 13 Aug 2019 11:19:58 -0700
Subject: Equality for values -- new analysis, same conclusion
In-Reply-To: <96650203-B2F7-4FAE-B999-BE7503FC9B02@oracle.com>
References: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com> <131B9329-ECE7-46BE-909F-AC88421FF6CB@oracle.com> <96650203-B2F7-4FAE-B999-BE7503FC9B02@oracle.com>
Message-ID:

On Aug 12, 2019, at 11:17 AM, Dan Smith wrote:
>>
>> This is what I call L.I.F.E., the Legacy Idiom For Equality. ID== is good here too. FAST== would be fine here, and a JIT could perform that strength reduction if it notices that the == expression is post-dominated by Object.equals (on the false path). I think that's usually detectable.
>>
>
> Major caveat for this kind of optimization: it relies on a "well-behaved" 'equals' method. If 'equals' can throw an exception or have some other side effect (even indirectly) when a == b, we can't just blindly execute that code.
>
> Maybe the optimization you envision is able to cope with these possibilities. JIT is a mystery to me. But it seems like something that needs careful attention.

Yes it does. I think we need to consider (a) classifying equals methods which are well behaved and maybe (b) allowing the spec to open up loopholes for inexact execution of equals as a way of simplifying online classification.

The JIT optimization requires skipping the SAME== check if the equals method will also carry that burden. For that to be valid we need to ensure that applying equals of the same value to itself is a constant true with no side effects. This is easy to argue for IMO since that is part of the contract of Object::equals.
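The condition just stated (equals applied to the same value must be a constant true with no side effects) is exactly what Dan's `Dollar` example violates. Since today's javac rejects `inline class`, the sketch below keeps `Dollar` as an ordinary class and emulates SAME== with a hypothetical helper, `same()`, which compares the only field. That field-wise comparison is what substitutability would mean for an inline class with a single `long` field. `DollarDemo` and `run` are also illustrative names.

```java
final class Dollar {
    static int EQUALS_COUNTER = 0;
    final long cents;

    Dollar(long cents) { this.cents = cents; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Dollar)) return false;
        EQUALS_COUNTER++;  // observable side effect, as in Dan's example
        return cents == ((Dollar) o).cents;
    }

    @Override public int hashCode() { return Long.hashCode(cents); }

    // Hypothetical emulation of SAME== for an inline Dollar:
    // two values are substitutable when all their fields are the same.
    static boolean same(Dollar a, Dollar b) { return a.cents == b.cents; }
}

final class DollarDemo {
    static int COMPARE_COUNTER = 0;

    // LIFE under SAME== semantics: equals() never runs when same() is true.
    static void compareSame(Dollar d1, Dollar d2) {
        if (Dollar.same(d1, d2) || d1.equals(d2)) COMPARE_COUNTER++;
    }

    // LIFE under today's identity ==: distinct objects fall through to equals().
    static void compareIdentity(Dollar d1, Dollar d2) {
        if (d1 == d2 || d1.equals(d2)) COMPARE_COUNTER++;
    }

    static String run(boolean useSame) {
        Dollar.EQUALS_COUNTER = 0;
        COMPARE_COUNTER = 0;
        Dollar d1 = new Dollar(100);
        Dollar d2 = new Dollar(100);
        if (useSame) compareSame(d1, d2); else compareIdentity(d1, d2);
        return "compare: " + COMPARE_COUNTER + "; equals: " + Dollar.EQUALS_COUNTER;
    }
}
```

`run(true)` returns `compare: 1; equals: 0`, the SAME== output Dan expects. `run(false)` returns `compare: 1; equals: 1`, which is also what an optimization that drops the fast path and always calls `equals()` would produce.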
What's harder is to carve the spec so that equals methods which violate it can be covered within the limits of the spec. I'd like the JIT to have latitude under the JVMS to run such methods at will after SAME== is true and accept their side effects as part of the indeterminate behavior that occurs when an object fails its contract.

If I can't get that latitude, there are other things we can do to special case Object::equals, but more transparently to the JVMS. The simplest is to compile two versions, one fused with SAME== and one not. The first one replaces uses of LIFE, transparently. No spec impact. Maybe the compiled method has two entry points. Call the two compile units C::equals and C::equals/LIFE. The latter prepends a SAME== test to the former. Since the prepended test is now in the same compilation unit, it can be fused and combined with the user-written equals logic. It won't be executed redundantly out of line.

From daniel.smith at oracle.com Tue Aug 13 20:39:25 2019
From: daniel.smith at oracle.com (Dan Smith)
Date: Tue, 13 Aug 2019 14:39:25 -0600
Subject: Equality for values -- new analysis, same conclusion
In-Reply-To:
References: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com> <131B9329-ECE7-46BE-909F-AC88421FF6CB@oracle.com> <96650203-B2F7-4FAE-B999-BE7503FC9B02@oracle.com>
Message-ID: <8CBC4EFF-9FFC-42A0-BF51-B8068590D58C@oracle.com>

> On Aug 13, 2019, at 12:19 PM, John Rose wrote:
>
> The JIT optimization requires skipping the SAME== check if the equals method will also carry that burden. For that to be valid we need to ensure that applying equals of the same value to itself is a constant true with no side effects. This is easy to argue for IMO since that is part of the contract of Object::equals.

I don't think Object.equals has anything to say about side effects?

> (a) classifying equals methods which are well behaved

What makes me nervous here: whoever does the classification needs to be very careful about things like exceptions.
Once you do something more than 'instanceof', 'getfield', and '==', things get pretty fuzzy quickly (e.g., can I guarantee that some accessor method I call in a third party API won't NPE because of some internal state?)

From frederic.parain at oracle.com Wed Aug 14 20:13:27 2019
From: frederic.parain at oracle.com (Frederic Parain)
Date: Wed, 14 Aug 2019 16:13:27 -0400
Subject: Leaking instance of uninitialized classes
Message-ID: <1D9FEC40-BF09-45BD-8F25-312F50997139@oracle.com>

This is a follow-up on the discussion about static inline fields and the risk of having a default value escaping while the inline class is uninitialized (the problematic scenario being when the class fails to initialize properly).

This situation already exists today; it is possible to produce it with this code, for instance (the discussion continues after the code):

    import java.lang.reflect.InvocationTargetException;
    import java.lang.reflect.Method;

    public class UninitilizedClassTest {
        public static A a;
        public static boolean b = true;

        static class A {
            public int i;
            public Object o;
            static long l;

            static {
                UninitilizedClassTest.a = new A();
                if (UninitilizedClassTest.b) {
                    throw new Error();
                }
            }

            static void static_print() {
                System.out.println("Hello!");
            }

            void nonstatic_print() {
                System.out.println("i=" + i);
            }
        }

        public static void main(String[] args) {
            try {
                A test_a = new A();
            } catch (Throwable t) { }
            System.out.println("Accessing instance fields");
            System.out.println("a.o=" + a.o);
            System.out.println("Invoking non-static methods");
            a.nonstatic_print();
            System.out.println("Accessing static fields");
            try {
                System.out.println("A.l=" + A.l);
            } catch (Throwable t) {
                t.printStackTrace();
            }
            System.out.println("Invoking static methods");
            Method m = null;
            try {
                m = a.getClass().getDeclaredMethod("static_print", null);
            } catch (NoSuchMethodException e) {
                e.printStackTrace();
            }
            try {
                m.invoke(null);
            } catch (IllegalAccessException e) {
                e.printStackTrace();
            } catch (InvocationTargetException e) {
                e.printStackTrace();
            }
        }
    }

This situation is known, and the way the JVM deals with it is:
  - access to the leaked instance (fields, methods) is fine
  - access to the static context of the class which failed to initialize properly (fields, methods, new) throws an exception

Here's the output of the program above:

    Accessing instance fields
    a.o=null
    Invoking non-static methods
    i=0
    Accessing static fields
    java.lang.NoClassDefFoundError: Could not initialize class UninitilizedClassTest$A
        at UninitilizedClassTest.main(UninitilizedClassTest.java:36)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at com.intellij.rt.execution.application.AppMainV2.main(AppMainV2.java:131)
    Invoking static methods
    Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class UninitilizedClassTest$A
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at UninitilizedClassTest.main(UninitilizedClassTest.java:48)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at com.intellij.rt.execution.application.AppMainV2.main(AppMainV2.java:131)

What is new with static inline fields?
It just becomes simpler to produce this scenario. With regular classes, it requires a faulty static initializer to leak an instance to a publicly accessible place before the class initialization fails to complete. With inline classes, any static inline field could potentially produce this situation.

Do we want to provide more protection for inline classes than we actually provide for identity classes? If so, why? Should we provide compile-time warnings? They are likely to generate a lot of false positives.

When I last investigated this issue and discussed it with Alex Buckley, I came to the conclusion that it was not necessary to do additional work for inline classes. But the EG may have a different opinion.

Fred

From leoneldossantosjeronimo at gmail.com Thu Aug 15 20:02:22 2019
From: leoneldossantosjeronimo at gmail.com (leonel dossantos)
Date: Thu, 15 Aug 2019 21:02:22 +0100
Subject: No subject
Message-ID:

-te 23

From forax at univ-mlv.fr Sat Aug 17 12:08:08 2019
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Sat, 17 Aug 2019 14:08:08 +0200 (CEST)
Subject: Equality for values -- new analysis, same conclusion
In-Reply-To:
References: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com> <440965400.64542.1565619780321.JavaMail.zimbra@u-pem.fr> <05253002-9c5e-902d-285d-a42affe8bf48@oracle.com> <206106430.84918.1565637158262.JavaMail.zimbra@u-pem.fr>
Message-ID: <860271596.506234.1566043688770.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Brian Goetz"
> To: "Remi Forax"
> Cc: "valhalla-spec-experts"
> Sent: Monday, 12 August 2019 21:47:00
> Subject: Re: Equality for values -- new analysis, same conclusion

>> and here we disagree,
>> first, you don't have to extend ==, you can let it die.
>
> I agree that is a possibility (and I've been quite clear about this). Where I don't agree is that this is the _only_ sane possibility, nor do I agree that this is somehow intrinsically desirable. You've flat-out
You've flat-out > assumed that it is, and gone running from there, which is a totally fair > opinion, but rates a zero on the persuasion scale.... not zero, i was able to convince Stephen it seems :) > > So, if you want to make this case, start over, and convince people that > Object== is the root problem here. Object== is not the root of the problem, Object== becomes a problem when we have decided lword, when at the end, every types is a subtype of Object, because this is what lworld is. == has been created with ad hoc polymorphism in mind (overload polymorphism is a better term BTW), let say your are in Java 1.0 time, you have a strong rift between objects and primitive types, and no super type in between them, the way be able to write polymorphic code is to use overloading, so you have println(Object)/println(int)/println(double) etc. But it's not enough, so in 1.1 you introduce the wrapper types, Integer, Double etc, because you can not write reflection code without being able to see a primitive value as an Object. Here, we are doing the opposite, since we have decided to use lworld, Object is the root of every things, indirect types obviously, inline types too. We also know that in the future, we don't want to stay in a 3 kinds of types world. So we have to retrofit primitive types to see them as inline types. By doing this, we are also saying that every types has now Object has its root type. In this brave new world, val== makes little sense, because it's introducing a new overload in a world where you have subtyping polymorphism so you don't need overload polymorphism anymore. For an indirect type, the way to test structural equality is to use equals(), if every types is a subtypes of Object, the logical move for me is to say, use equals() everywhere and to stop using ==. So having a useful val== or a useful Object== goes in the wrong direction, we should demote == and look to the future*. 
Rémi

* and it's very intellectually satisfactory to have a solution which means that our users will have fewer things to learn instead of more; I'm thrilled that there will be a time when my students will be able to use .equals on primitive types.

From brian.goetz at oracle.com Tue Aug 20 17:14:00 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 20 Aug 2019 13:14:00 -0400
Subject: Equality for values -- new analysis, same conclusion
In-Reply-To: <860271596.506234.1566043688770.JavaMail.zimbra@u-pem.fr>
References: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com> <440965400.64542.1565619780321.JavaMail.zimbra@u-pem.fr> <05253002-9c5e-902d-285d-a42affe8bf48@oracle.com> <206106430.84918.1565637158262.JavaMail.zimbra@u-pem.fr> <860271596.506234.1566043688770.JavaMail.zimbra@u-pem.fr>
Message-ID:

> We also know that in the future...

So, let's pull on this string, because now we're talking about the right thing -- what Java do we want to have in the future, even if we can only take one step now.

First, today. But before that, a digression on terminology. While the terminology is not nailed down (and please, start a separate thread if you want to comment on that), the word "value" is problematic, but it's hard to break the habit. For purposes of this mail, a value is _any_ datum that can be stored in a variable: primitives, object references, and soon, instances of inline classes. Similarly, the term "object reference" is problematic, because it is laden with overtones of identity. So, for purposes of this mail:

 - value: any datum
 - inline class: what we used to call value classes
 - identity class: what we used to call classes
 - class: an identity or inline class
 - object instance: instance of a class, whether identity or inline
 - object reference: a reference to an instance of an identity class

A variable of type Object (or interface) may hold _either_ an object reference, or an instance of an inline class, or null (this is the confusing new thing).
Note that all values are still passed by value: primitives, object references, and instances of inline classes. I'll try to use these consistently, but I'll likely fail.

Primitives* have a well-defined equivalence relation: do the two operands describe the exact same value (SAME==)? And it is super-useful. And, it is really the only useful equivalence on primitives. Conveniently, we have assigned this the operator `==`. No one argues with this move.

Where things get dodgy is that objects (which historically have always been described through object references) have TWO well-defined, and useful, equivalence relations:

 - Do the two operands refer to the same object instance (SAME==), denoted by Object==;
 - Are the two objects "equivalent" in the sense defined by their author, denoted by .equals(). Let's call this "equivalence".

Both are useful, so we can't get rid of either. Identity comparison has semantic uses (e.g., topology-aware code like IdentityHashMap, or comparing with sentinels in data structures). It is also used as an optimization, a faster way to get to equality, and this optimization has unfortunately outlived its usefulness but not outlived its use. Obviously equivalence is useful, and in most cases, the more generally useful of the two, but for better or worse, identity comparison got custody of the operator `==`. This might have been a questionable move, but it's what we've got, and we're surely not un-assigning this.

Taking primitives and objects together, despite the very visible seam between them, the == operator partially heals the seam by working across all types, and assigning a consistent meaning across all types: SAME== ("are you the exact same thing", where same-ness can incorporate identity.) Some may feel this was a mistake or an accident of history, and it might have been, but the outcome has a sense to it: `==` has a consistent meaning (SAME==) over all data types.
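The two semantic uses of identity comparison mentioned above can be sketched with the standard library. `IdentityUses`, `NO_VALUE`, and `lookupOrSentinel` are hypothetical names: a private sentinel whose only meaningful property is its identity, and an `IdentityHashMap` that keys two `equals()` strings separately because it compares keys with `==`.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

final class IdentityUses {
    // Sentinel: only its identity matters, so it is compared with ==.
    private static final Object NO_VALUE = new Object();

    static boolean isMissing(Object o) { return o == NO_VALUE; }

    // Distinguishes "key absent" from "key mapped to null".
    static Object lookupOrSentinel(Map<String, Object> m, String key) {
        return m.containsKey(key) ? m.get(key) : NO_VALUE;
    }

    // Two distinct but equals() strings: HashMap keys by equivalence,
    // IdentityHashMap keys by identity (Object==).
    // Returns { HashMap size, IdentityHashMap size }.
    static int[] mapSizes() {
        String k1 = new String("key");
        String k2 = new String("key");
        Map<String, Integer> byEquals = new HashMap<>();
        Map<String, Integer> byIdentity = new IdentityHashMap<>();
        for (String k : new String[] { k1, k2 }) {
            byEquals.put(k, 1);
            byIdentity.put(k, 1);
        }
        return new int[] { byEquals.size(), byIdentity.size() };
    }
}
```

`mapSizes()` returns `{ 1, 2 }`: the same two keys collapse under equivalence but stay separate under identity.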
The part that is uncomfortable is that what's been totalized is the less broadly useful equivalence. We can be aware of this, and try to do better, but as I've observed before, wanting to fix mistakes of history often leads us into new, worse mistakes, so let's not fixate on this.

I'll note at this point (and come back to it later) that just as we have some control over what `==` means for inline instances, we _also_ have some control over what `.equals()` means for primitives.

OK, now we are adding inline classes to the mix. Many of these, like Complex or Point, are like primitives -- they only have one sensible equality semantics -- do they represent the same number. This is suitable for binding to ==, or .equals() -- or better, both. But there are also other values which are more complicated, because they contain potentially-but-not-necessarily-identityful data, like:

    inline class Holder { Object o; }

This is the conundrum of L-World. (The irritating part is that these are the values we are spending all our time talking about, even though they will not be the most common ones.) Like with classic objects, for such classes, of the two equivalence relations ("exactly the same", or "semantically the same"), the former is generally the less useful. And so, were we rewriting history, we might bind the "good" syntax to .equals() here too, and relegate the less useful test to some other uglier API point or operator. But again, let's not let this distract us.

In the future, we'll have primitives, identity objects, and inline objects, and we'd like not only to not have three things, but we'd like to not have two things. So we'd like to have a total story for comparing them all.

Our story for primitives (but please, let's not get too distracted on this now), is that primitives can be "boxed" to inline classes, which will be lighter-weight boxes than our current boxes.
And we can lift members and interfaces from the box to the primitives, so that (say) int can be seen to implement Comparable and Serializable, and have whatever methods the lightweight box has -- such as equals(). Which means that the equivalence interpretation can be totalized via Object::equals -- primitives, identity objects, and inline objects can all have an equals() method. And of course, for primitives, equals() and == will be the same* thing.

So, in the happy future, there will be a total operation that implements the desirable equality comparison. (Which is important for specializable generic code, since this operation on a T must be available on all the types that can instantiate T.) Or, as you say:

> don't use ==, use equals.

I agree, but here's the difference in the approaches: we don't have to punish == to make it less desirable; we can raise equals() up and make it more desirable.

But we're not done with val==. For the same reason that id== is still useful, if overused, on references, it is useful on values that hold potential references too. Yes, it is unfortunate that the weaker claimant (SAME==) got the good syntax. But we still need a way to denote this operation, and it would be even worse (IMO, far worse) than the status quo to say "well, we write SAME== for identity objects one way, but a different way for inline objects, even though you can put both in an Object." So even given the above, it _still_ seems like a sensible (if not forced) move to extend the current meaning of == -- SAME== -- to the new types. Then everything is total, and everything is consistent:

 - == means "are the two operands the same value" (indistinguishable);
 - equals() means "are the two operands semantically equivalent"

and both are total, working on primitives, references, and inline instances alike. (As mentioned earlier, we can also later -- but absolutely not now -- explore whether equals() merits a better syntax.)
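The footnoted exception for primitives can already be observed with today's heavyweight boxes, the closest current approximation of "primitives with an equals() method". In this minimal sketch (`PrimitiveEquality` is a hypothetical name), `==` and `Integer.equals` always agree for `int`. For `double`, `==` follows IEEE 754 (NaN is not equal to itself, and `0.0 == -0.0`), while `Double.equals` compares `doubleToLongBits` representations, so the two relations disagree exactly at NaN and at signed zero.

```java
final class PrimitiveEquality {
    // Does == on the primitive agree with equals() on its current box?
    static boolean intsAgree(int a, int b) {
        return (a == b) == Integer.valueOf(a).equals(b);
    }

    static boolean doublesAgree(double a, double b) {
        // Double.equals compares doubleToLongBits(a) with doubleToLongBits(b).
        return (a == b) == Double.valueOf(a).equals(b);
    }
}
```

`doublesAgree(1.0, 1.0)` is true, but `doublesAgree(Double.NaN, Double.NaN)` and `doublesAgree(0.0, -0.0)` are both false.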
Your agenda here (which I agree with) is to lessen the importance of ==. Where I disagree is that we should do so by making == harder to use. Instead, I think we should do so by making the better alternatives easier to use, and educating people about the changed object model and performance reality.

(I'm still not sure whether exposing V <: Object, rather than V convertible-to Object, sets the right user model here -- but that's a separate discussion.)

*Curse you, NaN.

>> So, if you want to make this case, start over, and convince people
>> that Object== is the root problem here.
> Object== is not the root of the problem. Object== becomes a problem once we have decided on lworld, where in the end every type is a subtype of Object, because this is what lworld is. == has been created with ad hoc polymorphism in mind (overload polymorphism is a better term BTW). Let's say you are in Java 1.0 times: you have a strong rift between objects and primitive types, and no super type in between them, so the way to write polymorphic code is to use overloading, and you have println(Object)/println(int)/println(double) etc. But it's not enough, so in 1.1 you introduce the wrapper types, Integer, Double etc, because you cannot write reflection code without being able to see a primitive value as an Object. Here, we are doing the opposite: since we have decided to use lworld, Object is the root of everything, indirect types obviously, inline types too. We also know that in the future, we don't want to stay in a world with 3 kinds of types. So we have to retrofit primitive types to see them as inline types. By doing this, we are also saying that every type now has Object as its root type. In this brave new world, val== makes little sense, because it introduces a new overload in a world where you have subtyping polymorphism, so you don't need overload polymorphism anymore.
> For an indirect type, the way to test structural equality is to use equals(); if every type is a subtype of Object, the logical move for me is to say: use equals() everywhere and stop using ==. So having a useful val== or a useful Object== goes in the wrong direction; we should demote == and look to the future*.
>
> Rémi
>
> * and it's very intellectually satisfactory to have a solution which means that our users will have fewer things to learn instead of more; I'm thrilled that there will be a time when my students will be able to use .equals on primitive types.

From brian.goetz at oracle.com Tue Aug 20 20:09:25 2019
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 20 Aug 2019 16:09:25 -0400
Subject: Equality for values -- new analysis, same conclusion
In-Reply-To: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com>
References: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com>
Message-ID: <4CFB8A97-62EF-46D5-98C3-706D8CBF003E@oracle.com>

The latest proposal at the tip of this thread lands in the following place for the LANGUAGE:

 - Totalize `==` over all types to mean SAME== (as it already does for all existing types today);
 - Totalize `.equals()` over all types to mean EQ==, which means adding primitives to the .equals() game

The result is two total equivalence relations, each with semantics that are useful in some situations. At the same time, we can start to discourage excessive use of Object== if the compiler can't prove that both operands are identity class instances, by issuing warnings.

Now, that leaves the question of what to do in the VM, and how to bridge the two (via the translation strategy.) For the purposes of this message, we'll assume the first of the two paths, where V <: Object.

> If we say that == is substitutability, we still have the option to translate == to something other than ACMP. Which means that existing binaries (and likely, binaries recompiled with -source 8) will still use ACMP. If we give ACMP the "false if value"
> interpretation, then existing classfiles (which mostly use == as a fast-path check) will still work, as those tests should be backed up with .equals(), though they may suffer performance changes on recompilation. This is an uncomfortable compromise, but is worth considering. Down this route, ACMP has a much narrower portfolio, as we would not use it in translating most Object== unless we were sure we were dealing with identityful types.

Currently, Object== translates to ACMP byte codes, which have ID== semantics. The VM folks understandably want to avoid perturbing ACMP, especially for legacy code. We have the option to translate Object== differently between (say) -target 14 and -target 15, where we translate to ACMP for pre-valhalla language levels and to something else for post, where ACMP retains the "false if either operand is a value" semantics, and the new target means "SAME==". This is a tradeoff between not creating performance potholes for legacy code which does not use values, and creating a discontinuous behavior when migrating old code forward. It means code compiled for later JVMs will use a more refined implementation of Object==. (If we believe it's acceptable to return false always, it should also be acceptable to return false _sometimes_ but return true when the two values are not externally distinguishable.)

Cue Dan to say: "OK, do we have benchmarks that differentiate between the 'false if value' and 'deep == if value' options? Do we have reason to believe that the former is better enough to risk the discontinuity?"

Assuming the benchmarks bear out the sense of doing so, where we end up is that ACMP becomes the ID== operator, and we move the SAME== operator somewhere else (either to a new byte code, or an intrinsified static method.)
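The two candidate readings of ACMP can be sketched in plain Java, under the loudly hypothetical assumption that records stand in for inline classes (`isInlineLike` is an invented test, not a JVM notion). `acmpFalseIfValue` models "false if either operand is a value". `same` models SAME== ("deep == if value") as a recursive substitutability check over record components, read through the real `java.lang.reflect.RecordComponent` API. This illustrates the semantics only; it says nothing about how the VM would implement either option. `AcmpSketch`, `Pt`, and `Wrap` are illustrative names.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.RecordComponent;

record Pt(int x, int y) { }      // stand-in for a "Point-like" inline class
record Wrap(Object o) { }        // stand-in for "inline class Holder { Object o; }"

final class AcmpSketch {
    // Hypothetical stand-in: treat record instances as if they were inline values.
    static boolean isInlineLike(Object o) {
        return o != null && o.getClass().isRecord();
    }

    // "False if value": identity ==, but never true when either operand is a value.
    static boolean acmpFalseIfValue(Object a, Object b) {
        if (isInlineLike(a) || isInlineLike(b)) return false;
        return a == b;
    }

    // SAME==: identity for identity objects; component-wise substitutability for values.
    static boolean same(Object a, Object b) {
        if (a == b) return true;
        if (!isInlineLike(a) || !isInlineLike(b)) return false;
        if (a.getClass() != b.getClass()) return false;
        for (RecordComponent rc : a.getClass().getRecordComponents()) {
            Object x = read(rc, a);
            Object y = read(rc, b);
            // Primitive components arrive boxed; box equals() compares the values
            // (Float/Double via their bit representations, so NaN matches NaN).
            boolean componentSame = rc.getType().isPrimitive() ? x.equals(y) : same(x, y);
            if (!componentSame) return false;
        }
        return true;
    }

    private static Object read(RecordComponent rc, Object rec) {
        try {
            return rc.getAccessor().invoke(rec);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Two `new Pt(1, 2)` instances are `same` but never `acmpFalseIfValue`-equal, and two `Wrap`s holding distinct `equals()` strings are not `same`, which is the `Holder` conundrum in miniature.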
From daniel.smith at oracle.com Wed Aug 21 06:00:31 2019
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 21 Aug 2019 00:00:31 -0600
Subject: Equality for values -- new analysis, same conclusion
In-Reply-To: <4CFB8A97-62EF-46D5-98C3-706D8CBF003E@oracle.com>
References: <90C17B62-5968-4352-825E-4F1EFAC81052@oracle.com> <4CFB8A97-62EF-46D5-98C3-706D8CBF003E@oracle.com>
Message-ID:

> On Aug 20, 2019, at 2:09 PM, Brian Goetz wrote:
>
> The VM folks understandably want to avoid perturbing ACMP, especially for legacy code. We have the option to translate Object== differently between (say) -target 14 and -target 15, where we translate to ACMP for pre-valhalla language levels and to something else for post, where ACMP retains the "false if either operand is a value" semantics, and the new target means "SAME==". This is a tradeoff between not creating performance potholes for legacy code which does not use values, and creating a discontinuous behavior when migrating old code forward. It means code compiled for later JVMs will use a more refined implementation of Object==. (If we believe it's acceptable to return false always, it should also be acceptable to return false _sometimes_ but return true when the two values are not externally distinguishable.)
>
> Cue Dan to say: "OK, do we have benchmarks that differentiate between the 'false if value' and 'deep == if value' options? Do we have reason to believe that the former is better enough to risk the discontinuity?"

+1 to strawman Dan.

I'd also add that if the performance is so bad that we feel the need to give users an opt-out, an opt-out that disappears when you recompile your code is pretty unsatisfying.

Suggestion: let's be more precise about what we mean by "legacy code", and what sort of expectations we have. Something like:

- Old bytecode, no inline classes: no performance regression (type profiling ought to solve this?)
- Recompiled source, no inline classes: same
- Old bytecode interacting with inline class instances: minimize behavioral change*, tolerate slowdown compared to equivalent identity classes
- Recompiled source interacting with inline class instances: same

* Note that SAME== and FAST== both risk behavioral change, but the risk for SAME== is far less -- it would involve code that assumes instances of a class are unique and unpublished.
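Code at risk under SAME== relies on fresh instances being unique. This hypothetical sketch (`Marker`, `UniquenessDemo`, `END`, `isEndIfInline` are invented names) uses a field-less identity class as a sentinel: today no `new Marker()` is ever `==` to `END`. If `Marker` were migrated to a field-less inline class, SAME== would make all of its instances indistinguishable. Inline classes cannot be compiled today, so `isEndIfInline` emulates that outcome by hand.

```java
// Identity class today: every `new Marker()` is a distinct object.
final class Marker { }

final class UniquenessDemo {
    // Sentinel whose correctness depends on no other Marker ever being == to it.
    static final Marker END = new Marker();

    static boolean isEnd(Object o) { return o == END; }

    // Hypothetical emulation of SAME== if Marker became a field-less inline class:
    // with no fields to differ in, any two Markers would be substitutable.
    static boolean isEndIfInline(Object o) { return o instanceof Marker; }
}
```

Today `isEnd(new Marker())` is false. The emulated inline version reports true, which is the behavioral change the footnote warns about.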