From daniel.smith at oracle.com Wed May 2 20:10:41 2018
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 2 May 2018 14:10:41 -0600
Subject: Nestmates spec updates
Message-ID:

FYI, as the JVMS changes for nestmates have moved forward toward integration with the official JVMS, there were some substantial updates to improve presentation. For clarity and transparency, I've updated the document that appears in my CR space, along with those attached to JBS issues, to reflect these changes.

http://cr.openjdk.java.net/~dlsmith/nestmates.html

We also concluded that we should not touch JLS 13.1 in the JLS updates. Revised document:

http://cr.openjdk.java.net/~dlsmith/nestmates-jls.html

--Dan

From john.r.rose at oracle.com Wed May 2 22:46:32 2018
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 2 May 2018 15:46:32 -0700
Subject: Nestmates spec updates
In-Reply-To:
References:
Message-ID: <0F48ED49-5A65-449F-A179-0AFBB4827310@oracle.com>

These look good. The changes for clarity greatly improve the spec. I'm glad we're leaving it better than we found it, in this case. Though it took longer to polish the spec than just hacking in the new attributes, it was well worth the effort, especially to clarify and simplify the troublesome invokespecial instruction.

The nestmate specification per se (JVMS 5.4.4) appears to be unchanged since the last time we discussed it.

I think these spec changes are ready for prime time.

-- John

On May 2, 2018, at 1:10 PM, Dan Smith wrote:
> 
> FYI, as the JVMS changes for nestmates have moved forward toward integration with the official JVMS, there were some substantial updates to improve presentation. For clarity and transparency, I've updated the document that appears in my CR space, along with those attached to JBS issues, to reflect these changes.
> 
> http://cr.openjdk.java.net/~dlsmith/nestmates.html
> 
> We also concluded that we should not touch JLS 13.1 in the JLS updates. Revised document:
> 
> http://cr.openjdk.java.net/~dlsmith/nestmates-jls.html
> 
> --Dan

From karen.kinnear at oracle.com Thu May 3 17:45:52 2018
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Thu, 3 May 2018 13:45:52 -0400
Subject: Nestmates spec updates
In-Reply-To:
References:
Message-ID: <0A41A556-29FF-45AA-9DE2-CD0393954051@oracle.com>

Thank you for the updates. They look great. Very much appreciate the ways in which you have made the specification clearer in the process.

thanks,
Karen

> On May 2, 2018, at 4:10 PM, Dan Smith wrote:
> 
> FYI, as the JVMS changes for nestmates have moved forward toward integration with the official JVMS, there were some substantial updates to improve presentation. For clarity and transparency, I've updated the document that appears in my CR space, along with those attached to JBS issues, to reflect these changes.
> 
> http://cr.openjdk.java.net/~dlsmith/nestmates.html
> 
> We also concluded that we should not touch JLS 13.1 in the JLS updates. Revised document:
> 
> http://cr.openjdk.java.net/~dlsmith/nestmates-jls.html
> 
> --Dan

From daniel.smith at oracle.com Thu May 3 22:00:18 2018
From: daniel.smith at oracle.com (Dan Smith)
Date: Thu, 3 May 2018 16:00:18 -0600
Subject: Nestmates spec updates
In-Reply-To: <0A41A556-29FF-45AA-9DE2-CD0393954051@oracle.com>
References: <0A41A556-29FF-45AA-9DE2-CD0393954051@oracle.com>
Message-ID:

> On May 3, 2018, at 11:45 AM, Karen Kinnear wrote:
> 
> Thank you for the updates. They look great.
> Very much appreciate the ways in which you have made the specification clearer in the process.
> 
> thanks,
> Karen

Thank Alex, BTW. This round of changes came from his review and massaging of the text to integrate it with the official JVMS.

From john.r.rose at oracle.com Sun May 6 09:17:01 2018
From: john.r.rose at oracle.com (John Rose)
Date: Sun, 6 May 2018 02:17:01 -0700
Subject: value type hygiene
Message-ID:

Like many of us, I have been thinking about the problems of keeping values, nulls, and objects separate in L-world. I wrote up some long-ish notes on the subject. I hope it will help us wrap our arms around the problem, and get it solved.

TL;DR: Remi was right in January. We need a ValueTypes attribute.

http://cr.openjdk.java.net/~jrose/values/value-type-hygiene.html

Cheers!
-- John

P.S. Raw markdown source follows for the record.
http://cr.openjdk.java.net/~jrose/values/value-type-hygiene.md

# Value Type Hygiene

#### May 2018 _(v. 0.1)_

#### John Rose and the Valhalla Expert Group

Briefly put, types in L-world are ambiguous, leading to unhygienic mixtures of value operations with reference operations, and uncontrolled pollution from `null`s infecting value code.

This note explores a promising proposal for resolving the key ambiguity. It is a cleaner design than the ad hoc mechanisms tried so far. The resulting system would seem to allow more predictable and debuggable behavior, a stronger backward compatibility story, and better optimization.

## Problem statement

In the _L-world_ design for value types, the classfile type descriptor syntax is left unchanged, and the pre-existing descriptor form `"LFoo;"` is overloaded to denote value types as well as object types. A previous design introduced new descriptors for value types of the form `"QFoo;"`, and possibly a union type `"UFoo;"`. This design might be called _Q-world_. In comparison with Q-world, the L-world design approach has two advantages--compatibility and migration--but also one serious disadvantage: ambiguity.

L-world is _backward compatible_ with tools that must parse classfile descriptors, since it leaves descriptor syntax unchanged. There have been no changes to this syntax in almost thirty years, and there is a huge volume of code that depends on its stability. The HotSpot JVM itself makes hundreds of distinct decisions based on descriptor syntax which would need careful review and testing if they were to be adapted to take account of a new descriptor type (`"QFoo;"`, etc.).

Because of its backward compatibility, L-world also has a distinctly simpler _migration story_ than previous designs. Some _value-based classes_, such as `Optional` and `LocalTime`, have been engineered to be candidates for migration to proper value types. We wish to allow such a migration without recompiling the world or forcing programmers to recode uses of the migrated types. It is very difficult to sustain the illusion in Q-world that a value type `Qjava/util/Optional;` can be operated on in old code under the original object type `Ljava/util/Optional;`, since the descriptors do not match and a myriad of adapters must be spun (one for every mention of the wrong descriptor). With L-world, we have the simpler problem (addressed in this document) of keeping straight the meaning of L-descriptors in each relevant context, whether freshly recompiled or legacy code; this is a simpler problem than spinning adapters.

But not all is well in L-world.
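To make the ambiguity concrete, here is a small illustration in today's Java (a sketch; it assumes, per the migration story above, that `Optional` is a candidate to become a value type). The descriptor below is the same whether `Optional` is still a value-based reference class or has been migrated, so nothing in the descriptor itself tells a classfile reader which treatment is intended:

```java
import java.util.Optional;

public class Finder {
    // Compiles to the descriptor (Ljava/lang/String;)Ljava/util/Optional;
    // both before and after any migration of Optional to a value type.
    // Only a side channel -- the ValueTypes attribute proposed below --
    // could record which reading this classfile intends.
    public Optional<String> find(String key) {
        return Optional.ofNullable(System.getProperty(key));
    }
}
```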
The compatibility of descriptors implies that, when a classfile must express a semantic distinction between a value type and an object type, it must be allowed to do so unambiguously, in a side channel outside of the descriptor.

Our first thought was, "well, just load all the value types and then you will know the list of them". If we have a global registry of classes (such as the HotSpot system dictionary), nobody needs to express any additional distinctions, since everybody can just ask the registry which are the value types.

This simple idea has a useful insight, but it goes wrong in three ways. First, for some use cases such as classfile transformation, it might be difficult to find such a global registry; in some cases we might prefer to rely on local information in the classfile. We need a way for a classfile to encode, within itself, which types it is using as value types, so that all viewers of the classfile can make consistent decisions about what's a value and what's not.

Second, if we are running in the JVM, the global registry of value types has to be built up by loading classfiles. In order for every classfile that _uses_ a value type to know its status, the classfile that _defines_ the value type must be loaded _first_. But there is no way to totally order these constraints, since it is easy to create circular dependencies between value types, either directly or indirectly. (N.B. Well-foundedness rules for layout don't eliminate all the possible circularities.) And it won't work to add more initialization phases ("first load all the classfiles, then let them all start asking questions about their contents"), because that would require preloading a classfile for every potential value type mentioned in some other classfile. That's every name in every `"LFoo;"` descriptor. Loading a file for every name mentioned anywhere is very un-Java-like, and something that drastic would be required in order to make correct decisions about value types.

That leads to the third problem, which comes from our desire to make a migration story. Some classfiles need to operate on value types as if they were object references. (Below, we will see details of how operations can differ between value and reference types.) This means that, if we are to support migration, we need a way for legacy classfiles to make a _local_ decision to treat a given type as a reference type, for backward compatibility. Luckily, this is possible, but it requires a local indication in the classfile so the JVM can adjust certain operations.

A solution to these problems requires a way for each classfile to declare how it intends to use each type that is a value type, and (what is more) a way for legacy classfiles to peacefully interoperate with migrated value types. We have experimented with various partial solutions, such as adding an extra bit in a context where a value type may occur, to let the JVM know that the classfile intends a value type. (This is the famous `ACC_FLATTENABLE` bit on fields.) But it turns out that the number of places where value-ness is significant is hard to limit to just a few spots where we can sprinkle a mode bit. We need a _comprehensive_ solution that can clearly and consistently define a classfile's (local) view of the status of each type it works with, so that when the "value or reference?" question comes up, there is a clear and consistent answer. We need to prevent the values and the references from polluting each other; we need _value type hygiene_.
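The loading-order problem above can be made concrete with two mutually referential types (plain, compilable Java today; the point is what would happen if, hypothetically, both were migrated to value types):

```java
// Plain Java; imagine both classes migrated to value types. Each classfile
// mentions the other in a method descriptor, so there is no loading order
// in which every *user* of a value type is preceded by the classfile that
// *defines* it: the dependency is circular.
final class Complex {
    final double re, im;
    Complex(double re, double im) { this.re = re; this.im = im; }
    Polar toPolar() { return new Polar(Math.hypot(re, im), Math.atan2(im, re)); }
}

final class Polar {
    final double r, theta;
    Polar(double r, double theta) { this.r = r; this.theta = theta; }
    Complex toCartesian() { return new Complex(r * Math.cos(theta), r * Math.sin(theta)); }
}
```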
## Value vs. reference operations

Value types can be thought of as simpler than reference types, because they lack two features of reference types:

- _identity:_ Two value types with the same immediate components are indistinguishable, even if they were created by different code paths. Objects, by contrast, "remember" when they were created, and each object is a unique identity. Identities are distinguished using the `acmp` family of instructions, and Java's `==` operator.

- _nullability:_ Any variable of any reference type can store the value `null`; in fact, `null` is the initial value for fields and array elements. So `null` is one of the possible values of any reference type, including `Object` and all interfaces. By contrast, `null` is _not_ the value of any value type. Value type variables are not nullable, because `null` is a reference. (But read on for an awkward exception.)

The type `Object` can represent all values and references. Casting an unknown operand of type `Object` to a value type `Foo` must succeed if in fact the operand is of type `Foo`, but a null `Object` reference must never successfully cast to a value type.

This strong distinction between values and references is inspired, in part, by the design of Java's primitive types, which also are identity-free and are not nullable. Every copy of the `int` value 42 is completely indistinguishable from every other copy, and you can't cast a `null` to `int` (without a null pointer exception). We hope eventually to unify value types and primitives, but even if this never comes to pass, our design slogan for value types is, _codes like a class, works like an int_.

By divesting themselves of identity and nullability, value types are able to enjoy new behaviors and optimizations akin to those of primitives, notably flattening in the heap and scalarization in compiled code. To unlock these benefits, the JVM must treat values and references as operationally distinct. Some of these operational distinctions are quite subtle; some required months of discussion to elucidate, though soon (we hope) they will be obvious in hindsight.

Here is a partial list of cases where the JVM should be able to distinguish value types from reference types:

- _instance fields:_ A value field should be flattened (if possible) to components in adjacent memory words. A reference field must not be flattened, in order to retain identity and store the null reference.

- _static fields:_ A static field must be properly initialized to the default value of its type, not to null. This holds true for all fields, in fact. Flattening does not seem to be important for static fields.

- _array elements:_ An element of a value array (array whose component type is a value type) should flatten its elements and arrange them compactly in successive memory locations. Such an array must be initialized to the default value of its value type, and never to `null`.

- _methods:_ A value parameter or return value should be flattened (if possible) to components in registers. A reference must not be treated this way, because of identity and nullability.

- _verifier:_ The verifier needs to know value types, so it can forbid inapplicable operations, such as `new` or `monitorenter`.

- _conversions:_ The `checkcast` operator for a value type might reject `null` (as well as rejecting instances of the wrong type). The `ldc` of a dynamic constant of value type must not produce `null` (instead it must fail to link).

- _comparisons:_ The `acmp` operator family must not detect value type identities (since they are not present), so it must operate differently on values and references. In some cases, the verifier might reject `acmp` altogether.

- _optimization:_ The JIT needs to know whether it can discard any internal reference (for a value type) and just explode the value into registers. The possibility of `null` mixing with value types would defeat such optimizations.
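The primitive analogy can be seen directly in today's Java; a runnable sketch of the behaviors that the list above generalizes to value types:

```java
public class WorksLikeAnInt {
    public static void main(String[] args) {
        int a = 42, b = 42;
        System.out.println(a == b);       // true: primitives compare by value only

        Integer x = new Integer(42);      // deprecated constructor, used here only
        Integer y = new Integer(42);      // to force two distinct identities
        System.out.println(x == y);       // false: acmp sees two identities
        System.out.println(x.equals(y));  // true: the underlying values agree

        Integer z = null;                 // references are nullable...
        int n = z;                        // ...but "works like an int" is not:
        System.out.println(n);            // NullPointerException at the unboxing above
    }
}
```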
This list can be tweaked to make it shorter, by adjusting the rules in ways that lessen the impact of ambiguity in type names. The list is also incomplete. (We will add to it later.) Each point of distinction is the subject of detailed design trade-offs, many of which we are sketching here.

Some of these distinctions can be pushed to instruction link time (when resolved value classes may be present) or run time (when the actual values are on stack). A dynamic check can make a final decision, after all questions of value-ness are settled. This seems to be a good decision for `acmp`. The linkage of a `new` instruction can fail on a value class, or a `checkcast` instruction can reject a `null`, as part of the dynamic execution of those operations.

But this delaying tactic doesn't always work. For example, field layout must be computed during class loading, which (as was seen above) is too early to use the supposed global list of value types. Even if some check can be delayed, like the detection of an erroneous `new` on a value type, we may well decide it is more useful (or "hygienic") to detect the error earlier, such as at verification time, so that a broken program can be detected before it starts to run.

Also, some operations may be contextual, to support backward compatibility. Thus, `checkcast` may need to consult the local classfile about whether to reject nulls, so that legacy code won't suddenly fail to verify or execute just because it mixes nulls with (what it thought were) references. Basically, a "legacy checkcast" should work correctly with nulls, while an "upgraded checkcast" should probably reject nulls immediately, without requiring extra tests.

We will examine these points in more detail later, but now we need to examine how to contextualize information about value types.

## Towards a solution

What is to be done? The rest of this note will propose some solutions to the problem of value type hygiene, and specifically the problem of preventing nulls from mixing with values ("null hygiene").

Both Remi Forax[[1]] and Frederic Parain[[2]] have proposed the idea of having each classfile explicitly declare the list of all value types that it is using. For the record, this author initially resisted the idea[[3]] as overkill: I was hoping to get away with a band-aid (`ACC_FLATTENABLE`), but have since realized we need a more aggressive treatment. Clean and tidy behavior from the JVM will make it easier to implement clean and tidy support for value types in the Java language.

[[1]]: http://mail.openjdk.java.net/pipermail/valhalla-dev/2018-January/003685.html
[[2]]: http://mail.openjdk.java.net/pipermail/valhalla-dev/2018-January/003699.html
[[3]]: http://mail.openjdk.java.net/pipermail/valhalla-dev/2018-January/003687.html

Throughout the processing of the classfile, the list can serve as a reliable local registry of decisions about values vs. references. First we will sketch the attribute, and then revisit the points above to see how the list may be used.
## The `ValueTypes` attribute

As proposed above, let us define a new attribute called `ValueTypes` which is simply a counted array of `CONSTANT_Class` indexes. Each indexed constant is loaded and checked to be a value type. The JVM uses this list of locally declared value types for all further decisions about value types, relative to the current class.

As a running reference, let's call the loaded class `C`. `C` may be any class, either an object or a value. The value types locally declared by `C` we can call `Q`, `Q1`, `Q2`, etc. These are exactly the types which would get `Q` descriptors in Q-world.

As an attribute, `ValueTypes` is somewhat like the `InnerClasses` attribute. Both list all classes, within the context of a particular classfile, which need some sort of special processing. The `InnerClasses` attribute includes additional data for informing the special processing (including the breakdown of "binary names" into outer and inner names, and extra modifier bits), but the `ValueTypes` attribute only needs to mention the classes which are known to be value types.

Already with the `ACC_FLATTENABLE` bit we have successfully defined logic that pre-loads a supposed value type, ensures that it _is_ in fact a value type, and then allows the JVM to use all of the necessary properties of that value type to improve the layout of the current class. The classes mentioned in `ValueTypes` would be pre-loaded similarly. In fact, the `ACC_FLATTENABLE` bit is no longer needed, since the JVM can simply flatten all fields whose type names are mentioned in the local `ValueTypes` list.

We now come to the distinction between properly resolved classes (`CONSTANT_Class` entries) and types named in descriptors. This distinction is important to keep in mind. Once a proper class constant `K` is resolved by `C`, everything is known about it, and a permanent link to `K` goes into `C`'s constant pool. The same is not true of other type names that occur within field and method descriptors. In order for `C` to check whether its field type `"LK;"` is a value type, it must _not_ try to resolve `K`. Instead it must look for `K` _by name_ in the list of locally declared value types. Later on, when we examine verifier types and the components of method descriptors, a similar by-name lookup will be necessary to decide whether they refer to value types.

Thus, there are two ways a type can occur in a classfile and two ways to decide if it is a value type: by resolving a proper constant `K` and looking at the metadata, and by matching a name `"LK;"` against the local list. Happily, the answers will be complete and consistent if all the queries look at the same list.

So a type name can be classified as a value type without resolution, by looking for the same name in the names of the list of declared value types. And this can be done even before the declared value types themselves are loaded. This means that any particular declared value type might not need to be loaded until "hard data" is required of it. A provisional determination of the value status of some `Q` can be made very early, before `Q`'s classfile is actually located and pre-loaded. That provisional answer might be enough to check some early structural constraint. It seems reasonable to actually pre-load the `Q` values lazily, and only when the JVM needs hard data about `Q`, like its actual layout, or its actual supers.
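Concretely, the by-name classification just described might look like the following (illustrative Java only; the counted-array shape is taken from the proposal above, but every helper name here is hypothetical):

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntFunction;

// Hypothetical helper: reads the body of the proposed ValueTypes attribute
// (a counted array of CONSTANT_Class indexes) and answers the
// "is this a value type?" question locally, by name, with no resolution.
final class LocalValueTypes {
    private final Set<String> names = new HashSet<>();  // internal names, e.g. "java/util/Optional"

    // 'in' is positioned at the attribute body; 'constantPool' maps a
    // CONSTANT_Class index to its internal name (assumed, not shown).
    LocalValueTypes(DataInputStream in, IntFunction<String> constantPool) throws IOException {
        int count = in.readUnsignedShort();
        for (int i = 0; i < count; i++) {
            names.add(constantPool.apply(in.readUnsignedShort()));
        }
    }

    // Classify a descriptor component by name only, no class loading.
    boolean isValueType(String descriptor) {
        return descriptor.startsWith("L")
            && descriptor.endsWith(";")
            && names.contains(descriptor.substring(1, descriptor.length() - 1));
    }
}
```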
What if an element of `ValueTypes` turns out to be a reference type? (Perhaps someone deployed a value-type version of `Optional` but then got cold feet; meanwhile `C` is still using it under the impression that it is a value type.) There are two choices here, loose and strict: either pretend the type wasn't there anyway, or raise an error in the loading of the current classfile. The strict behavior is safer; we can loosen it later if we find a need.

The case of an element failing to load at all can be treated like the previous problem, either loosely or strictly; strict is better, all else being equal. The strict treatment is also more in line with how to treat failed resolution of super-types, which are a somewhat similar kind of dependency: super-types, like value types, are loaded as early as possible, and play a role in all phases of classfile loading, notably verification.

One corollary of making the list an attribute is that it can be easily stripped, just like `InnerClasses` or `BootstrapMethods`. Is this a bug or a feature? In the case of `InnerClasses`, stripping the attribute doesn't affect execution of the classfile, but it does break some Core Reflection queries. In the case of `BootstrapMethods`, the structural constraints on dynamic constant pool constants will break, and the classfile will promptly fail to load. The effect of removing a `ValueTypes` attribute is probably somewhere in between. Because L-world types are ambiguous, and because we specifically allow value types to be used as references from legacy classfiles (for migration), there's always a way to "fake" enough reference behavior from a value type in a classfile which doesn't make special claims about it. So it seems reasonable to allow `ValueTypes` to be stripped, at least in principle. In the worst case the classfile will fail to load, as in the case of a stripped `BootstrapMethods`, but the feature might actually prove useful (say, for before-and-after migration testing).

Note that in principle a classfile generator could choose to ignore a value type, and treat it as a (legacy) reference type. Because of migration, the JVM must support at least some such moves, but such picking and choosing is not the center of our design. In particular, we do not want the same compilation unit to treat a type as a value type in one line of code, and a reference type in the next. This may come later, if we decide to introduce concepts of nullable values and/or value boxes, but we think we can defer such features. So for now, classfiles may differ among themselves about which types are value types, but within a single classfile there is only one source of local truth about value types. (Locally-sourced, fresh, hygienic data!)

## Value types and class structure

Very early during class loading, the JVM assigns an instance layout to the new class `C`. Before that point it must first load the declared value types (`Q1`, `Q2`, ...), and then recursively extract the layout information from each one. There is no danger of circularity in this, because a value type instance cannot contain another instance of itself, directly or indirectly.

Both non-static and static fields of value type make sense (because a value "works like an int"). But static fields interact differently with the loading process than non-static fields. A static value type field has no enclosing instance, unless the JVM chooses to make one secretly. Therefore it doesn't need to be flattened.
The JVM can make an invisible choice about how to store a static value type field:

- Buffered immutably on the heap and stored by (invisible) reference next to the other statics. The `putstatic` instruction would put the new value in a _different_ buffer and change the pointer.

- Buffered mutably somewhere, with the pointer stored next to the other statics, or in metadata. The `putstatic` instruction would store the flattened value into the _same_ buffer.

- Flattened fully into the same container as the other statics.

The first option seems easiest, but the second might be more performant. The third is difficult due to bootstrapping concerns. In fact, the same implementation options apply for non-statics as for statics, but only the third one (full flattening) is desirable. The first one (immutable buffering) may be useful as a fallback implementation technique for special cases like jumbo values and fields which are `volatile`, and thus need to provide atomicity.

The root container for all of `C`'s statics, in HotSpot, happens to be the relevant `java.lang.Class` value `C.class`. Presumably it's a good place to put the invisible pointers mentioned above.

A static field of value type `Q` cannot make its initial value available to `getstatic` until `Q`'s `<clinit>` method runs (or, in the case of re-entrant initialization, has at least started). Since classes can circularly refer to instances of each other via static references, `Q` might return the favor and require materialization of `C`. The first time `C` requires `Q`'s default value, if `Q` has not been initialized, its `<clinit>` method should run. This may trigger re-entry into the initializer for `C`, so `Q` needs to get its act together _before_ it runs its `<clinit>`, and immediately create `Q`'s own default value, storing it somewhere in `Q`'s own metadata (or else the `Class` mirror looks like a good spot). The invariant is that, before `Q`'s class initializer can run one bytecode, the default value for `Q` is created and registered for all time. Creating the default value before the initializer runs is odd but harmless, as long as no bytecode can actually access the default value without triggering `Q`'s initialization. This also implies that `C` should create and register its own default value (if it is a value type) before it runs its own `<clinit>` method, lest `Q` come back and ask `C` for its default value.

The JVM may be required to bootstrap value-type statics as invisible null pointers, which are inflated (invisibly, by the `getstatic` and/or `putstatic` instructions) into appropriate buffers, after ensuring the initialization of the value type class. But it seems possible that, if the previous careful sequencing is observed, there is no need to do lazy inflation of nulls, which would simplify the code for `getstatic` and `putstatic`.
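The re-entrant initialization shape described above already has a plain-Java analogue that runs today (no value types involved; the class and field names are illustrative only):

```java
// Runnable sketch of circular <clinit> re-entry, the pattern the
// default-value registration rule above must survive.
public class InitOrder {
    static class C {
        static final int X;
        static { X = Q.Y + 1; }     // C's <clinit> triggers Q's...
    }
    static class Q {
        static final int Y;
        static { Y = peek() + 1; }  // ...which re-enters C mid-initialization
    }
    static int peek() { return C.X; }  // observes X's default value (0), not its final value

    public static void main(String[] args) {
        System.out.println(C.X);    // prints 2: Q saw the default 0 for X
    }
}
```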
## Value types and method linkage

A class `C` includes methods as well as fields, of course. A method can receive or return a value type `Q` simply by mentioning `Q` as a component of its method descriptor (as an L-descriptor `"LQ;"`). If a method `C.m()LD;` mentions some type `D` which is not on the declared list, then that type `D` will be treated, like always, as a nullable, identity-bearing reference. Interestingly, migration compatibility requires this to be the case whether or not `D` is in actual fact a value type. If `C` is unconscious of `D`'s value-ness, the JVM must respect this, and preserve the illusion that `D` values are "just references, nothing to see here, move along". Perhaps `D` is freshly upgraded to a value type, and `C` isn't recompiled yet. `C` should not be penalized for this change, if at all possible.

This points to a core decision of the L-world design, that nearly all of the normal operations on object references "just do the right thing" when applied to value types. The two kinds of data use the same descriptor syntax. Value types can be converted to `Object` references, even though the resulting pseudo-reference does not expose any identity (and will never be null). Instructions like `aload` operate on values just as well as references, and so on. Basically, values in L-world routinely go around lightly disguised as references, special pseudo-references which do not retain object identity. As long as nobody looks closely, the fiction that they are references is unbroken. If someone tries a `monitorenter` instruction, the game is over, but we think those embarrassing moments will be rare.

On the other hand, if a method `C.m()LQ;` uses a locally-declared value type, then the JVM has some interesting options. It may choose to notice that the `Q`-value is not nullable and has no identity. It can adjust the calling sequence of `m` to work with undisguised "naked values", which are passed on the stack, or broken into components for transport across the method API. This would almost be a purely invisible decision, except that naked values cannot be null, and so such calling sequences are hostile to null. Again, it "works like an int". A null `Integer` value will do just the same thing if you try to pass it to an `int`-receiving method. So we have to be prepared for an occasional embarrassing NPE, when one party thinks a type is a nullable reference type and the other party knows it's a value type.

One might think that it is straightforward to assign a value-using method a calling sequence by examining the method signature and the locally declared value types of the declaring class. But in fact there are non-local constraints. Only static and private methods can easily be adjusted to work with naked values. Unlike fields, methods can override similar methods in some super-type `S` of `C`. This immediately leads to the possibility of `C` and `S` differing as to the status of some type `X` in the method's signature.

If neither of the `ValueTypes` lists of `C` and `S` mentions `X`, then the classes are agreed that `X` is an object type (even if in truth it happens to be a value type). They can agree to use a reference-based calling sequence for some `m` that works with `X`.

If both lists mention some `Q`, then both classes agree, and in fact it must be a value type. They might be able to agree to use "naked values" for the `Q` type when calling the method. Or not: they still have to worry about other supers that might have another opinion about `Q`.

What if `C` doesn't list `Q` but `S` does, and they share a method that returns `Q`? For example, what about `C.m()Q` vs. `S.m()Q`? In that case, the JVM may have already set up `S.m` to return its `Q` result as a naked value. Probably this happened before `C` was even loaded. The code for `C.m` will expect simply to return a normal reference. In reality, it will be unconsciously holding a JVM-assigned pseudo-reference to the buffered `Q`-value. The JVM must then unwrap the reference into a naked value to match the calling sequence it assigned (earlier, before `C` was loaded) to `S.m`.
The bottom line is that even though `C.m` was loaded as a reference-returning function, the JVM may secretly rewrite it to return a naked value.

Since `C.m` returns a reference, it might choose to return `null`. What happens then? The secretly installed adaptation logic cannot extract the components of a buffer that doesn't exist. A `NullPointerException` must be thrown, at the point where `C.m` is adapted to `S.m`, which has the greater knowledge that `Q` is a value type (hence non-nullable). It will be as if the `areturn` instruction of `C.m` included a hidden null check.

Is such a hidden null check reasonable? One might explain that the `C` code thinks (wrongly) it is working with boxes, while the `S` code _knows_ it is working with values. If the method were `C.m()Integer` and it were overriding `S.m()int`, then if `C.m` returns `null`, the adapter that converts to `S.m()int` must throw NPE during the implicit conversion from `Integer` to `int`. A value "works like an int", so the result must be similar with a value type. It is as if the deficient class `C` were working with boxes for `Q` (indeed that's all it sees) while the knowledgeable class `S` is working with true values. The NPE seems justifiable in such terms, although there is no visible adapter method to switch descriptors in this case.

The situation is a little odd when looked at the following way: If you view nullability as a privilege, then this privilege is enjoyed only by deficient classes, ones that have not yet been recompiled to "see" that the type `Q` is a value type. Ignorant classes may pass `null` back and forth through `Q` APIs, all day long, until they pass it through a class that knows `Q` is a value. Then an `NPE` will end their streak of luck. Is using `null` a privilege? Well, yes, but remember also that if `Q` started its career as an object type, it was a value-based class, and such classes are documented as being null-hostile. The null-passers were in a fool's paradise.

What if `C` lists `Q` as a value but `S` doesn't? Then the calling sequence assigned when `S` was loaded will use references, and these references will in fact be pseudo-references to buffered `Q` values (or `null`, as just discussed). The knowledgeable method `C.m()Q` will never produce a `null` through this API. The JVM will arrange to properly clothe the `Q`-value produced by `C.m` into a buffer whose pointer can be returned from `S.m`.

Class hierarchies can be much deeper than just `C` and `S`, and overrides can occur at many levels on the way down. Frederic Parain has pointed out that the net result seems to be that the first (highest) class that declares a given method (with descriptor) also gets to determine the calling sequence, which is then imposed on all overrides through that class. This leads to a workable implementation strategy, based on v-table packing. A class's v-table is packed during the "preparation" phase of class linking, just after loading and before any subclass v-table is packed. The JVM knows, unambiguously, whether a given v-table entry is new to a class, or is being reaffirmed from a previous super-class (perhaps with an override, perhaps just with an abstract). At this point, a new v-table slot can be given a carefully selected internal calling sequence, which will then be imposed on all overrides. An old v-table slot will have the super's calling sequence imposed on it.
In this scheme, the interpreter and compiler must examine both the method descriptor and some metadata about the v-table slot when performing `invokevirtual` or `invokespecial`.

A method coming in "sideways" from an interface is harder to manage. It is reasonable to treat such a method as "owned" by the first proper class that makes a v-table entry for it. But that only works for one class hierarchy; the same method might show up in a different hierarchy with incompatible opinions about value types in the method signature. It appears that interface default methods, if not class methods, must be prepared to use more than one kind of calling sequence, in some cases. It is as if, when a class uses a default method, it imports that method and adjusts the method's calling sequence to agree with that class's hierarchy.

Often an interface default method is completely new to a class hierarchy. In that case, the interface can choose the calling sequence, and this is likely to provide more coherent calling sequences for that API point. These complexities will need watching as value types proliferate and begin to show up in interface-based APIs.

## Value types and the verifier

Let us assume that, if the verifier sees a value type, it should flag all invalid uses of that value type immediately, rather than wait for execution. (This assumption can be relaxed, in which case many points in this section can be dropped. We may also try to get away with implementing as few of these checks as possible, saving them for a later release.)

When verifying a method, the verifier tracks and checks types by name, mostly. Sometimes it pre-loads classes to see the class hierarchy. With the `ValueTypes` attribute, there is no need to pre-load value classes; the symbolic information is sufficient.

The verifier type system needs a way to distinguish value types from regular object types. To keep the changes small, this distinction can be expressed as a local predicate on type names called `isValueType`, implemented by referring to `ValueTypes`. In this way, the `StackMapTable` attribute does not need any change at all. Nor does the verifier type system need a change: value types go under the `Object` and `Reference` categories, despite the fact that value types are not object types, and values are not references.

The verifier rules need to consult `isValueType` at some points. The assignability rules for `null` must be adjusted to exclude value classes.

```
isAssignable(null, class(X, _)) :- not(isValueType(X)).
```

This one change triggers widespread null rejection: wherever a value type is required, the verifier will not allow a `null` to be on the stack. Assuming `null` is on the stack and `Q` is a value type, the following will be rejected as a consequence of the above change:

- `putfield` or `putstatic` to a field of type `Q`
- `areturn` to a return type `Q`
- any `invoke` passing `null` to a parameter of type `Q`
- any `invoke` passing `null` to a receiver of type `Q` (but this is rare)

Given comprehensive null blocking (along other paths also), the implementation of the `putfield` (or `withfield`) instruction could go ahead and pull a buffered value off the stack without first checking for `null`. If the verifier does not actually reject such `null`s, the dynamic behavior of the bytecodes themselves should, to prevent null pollution from spreading.

The verifier rules for `aastore` and `checkcast` only check that the input type is an object reference of some sort. More narrow type checks are performed at runtime. A null may be rejected dynamically by these instructions, but the verifier logic does not need to track `null`s for them.
The verifier rules for `invokespecial` have special cases for `<init>` methods, but these do not need special treatment, since such calls will fail to link when applied to a value type receiver.

The verifier _could_ reject reference comparisons between value types and other operands (including `null`, other value types, and reference types). This would look something like an extra pair of constraints after the main assertion that two references are on the stack:

```
instructionIsTypeSafe(if_acmpeq(Target), Environment, _Offset, StackFrame,
                      NextStackFrame, ExceptionStackFrame) :-
    canPop(StackFrame, [reference, reference], NextStackFrame),
+   not( canPop(StackFrame, [_, class(X, _)], _), isValueType(X) ),
+   not( canPop(StackFrame, [class(X, _), _], _), isValueType(X) ),
    targetIsTypeSafe(Environment, NextStackFrame, Target),
    exceptionStackFrame(StackFrame, ExceptionStackFrame).
```

(The JVMS doesn't use any such `not` operator. The actual Prolog changes would be more complex, perhaps requiring a `real_reference` target type instead of `reference`.)

This point applies equally to `if_acmpeq`, `if_acmpne`, `if_null`, and `if_nonnull`. This doesn't seem to be worthwhile, although it might be interesting to try to catch javac bugs this way. In any case, such comparisons are guaranteed to return `false` in L-world, and will optimize quickly in the JIT.

In a similar vein, the verifier _could_ reject `monitorenter` and `monitorexit` instructions when they apply to value types:

```
instructionIsTypeSafe(monitorenter, _Environment, _Offset, StackFrame,
                      NextStackFrame, ExceptionStackFrame) :-
    canPop(StackFrame, [reference], NextStackFrame),
+   not( canPop(StackFrame, [class(X, _)], _), isValueType(X) ),
    exceptionStackFrame(StackFrame, ExceptionStackFrame).
```

And a `new` or `putfield` could be quickly rejected if it applies to a value type:

```
instructionIsTypeSafe(new(CP), Environment, Offset, StackFrame,
                      NextStackFrame, ExceptionStackFrame) :-
    StackFrame = frame(Locals, OperandStack, Flags),
    CP = class(X, _),
+   not( isValueType(X) ),
    ...

instructionIsTypeSafe(putfield(CP), Environment, _Offset, StackFrame,
                      NextStackFrame, ExceptionStackFrame) :-
    CP = field(FieldClass, FieldName, FieldDescriptor),
+   not( isValueType(FieldClass) ),
    ...
```

Likewise, `withfield` could be rejected by the verifier if applied to a non-value type.

The effect of any or all of these verifier rule changes (if we choose to implement them) would be to prevent local code from creating a `null` and accidentally putting it somewhere a value type belongs, or from accidentally applying an identity-sensitive operation to an operand _known statically_ to be a value type. These rules only work when a sharp verifier type unambiguously reports an operand as `null` or as a value type. Nulls must also be rejected, and value types detected, when they are hidden, at verification time, under looser types like `Object`. Protecting local code from outside `null`s must also be done dynamically.

Omitting all of these rules will simply shift the responsibility for null rejection and value detection fully to dynamic checks at execution time, but such dynamic checks must be implemented in any case, so the verifier's help is mainly an earlier error check, especially to prevent null pollution inside of a single stack frame. For that reason, the only really important verifier change is the `isAssignable` adjustment, mentioned first.
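Since that one adjustment carries most of the weight, a transliteration into plain Java may help (a sketch only; the names are hypothetical, and the real rule lives in the verifier's Prolog):

```java
import java.util.Set;

final class NullAssignability {
    // isAssignable(null, class(X, _)) :- not(isValueType(X)).
    // null is assignable to a named class type only if that name is
    // absent from this classfile's local ValueTypes list.
    static boolean nullAssignableTo(String className, Set<String> localValueTypes) {
        return !localValueTypes.contains(className);
    }
}
```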
The dynamic checks which back up or replace the other verifier checks will be discussed shortly.

## Value types and legacy classfiles

We need to discuss the awkward situation of `null` being passed as a value type, and value types being operated on as objects, by legacy classfiles. One legacy classfile can dump null values into surprising places, even if all the other classfiles are scrupulous about containing `null`. We will also observe some of the effects of having value types "invade" a legacy classfile which expects to apply identity-sensitive operations to them.

By "legacy classfile" we of course mean classfiles which lack `ValueTypes` attributes, and which may attempt to misuse value types in some way. (Force of habit: it's strong.) We also can envision half-way cases where a legacy classfile has a `ValueTypes` attribute which is not fully up to date. In any case, there is a type `Q` which is _not_ locally declared as a value type, by the legacy class `C`.

The first bad thing that can happen is that `C` declares a field of type `Q`. This field will be formatted as a reference field, even though the field type is a value type. Although we implementors might grumble a bit, the JVM will have to arrange to use pseudo-pointers to represent values stored in that field. (It's as if the field were volatile, or not flattenable for some other reason.)

That wasn't too bad, but look what's in the field to start with: It's a null. That means that any legitimate operation on this initial value will throw an `NPE`. Of course, the writer of `C` knew `Q` as a value-based class, so the initial null will be discarded and replaced by a suitable non-null value, before anything else happens.

What if `C` makes a mistake, and passes a `null` to another class which _does_ know `Q` is a value? At that point we have a choice, as with the verifier's null rejection: whether to do more work to detect the problem earlier, or whether to let the `null` flow through and eventually cause an `NPE` down the line. Recall that if an API point gets a calling sequence which recognizes that `Q` is a value type, it will probably unbuffer the value, throwing `NPE` immediately if `C` makes a mistake. This is good, because that's the earliest we could hope to flag the mistake. But if the method accepts the boxed form of `Q`, then the `null` will sneak in, skulk around in the callee's stack frame, and maybe cause an error later. Meanwhile, if the JVM tries to optimize the callee, it will have to limit its optimizations somewhat, because the argument value is nullable (even if only ever by mistake).

To cover this case, it may be useful to define that _method entry_ to a method that knows about `Q` is null-hostile, even if the _calling sequence_ for some reason allows references. This means that, at function entry, every known value type parameter is null-checked. This needs to be an official rule in the JVM, not just an optimization for the JIT, in order for the JIT to use it.

What if our `C` returns a `null` value to a caller who intends to use it as a value? That won't go well either, but unless we detect the `null` aggressively, it might rattle around for a while, disrupting optimization, before producing an inscrutable error. ("Where'd that `null` come from??") The same logic applies as with arguments: when a `null` is returned from a method call that purports to return `Q`, this can only be from a legacy classfile, and the calling sequences were somehow not upgraded. In that case, the JVM needs to mandate a null check on every method invocation which is known to return a value type.
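Today's value-based classes already let us act out this mistake in runnable form (a sketch; `Optional` stands in for the legacy-held value type `Q`):

```java
import java.util.Optional;

public class LegacyNull {
    // Legacy-style code treats the value-based class Optional as a nullable
    // reference and leaks a null out through an API...
    static Optional<String> lookup(boolean found) {
        return found ? Optional.of("hit") : null;  // the documented no-no
    }

    public static void main(String[] args) {
        // ...and the caller trips over it later, away from the source of the
        // null -- exactly the delayed NPE the text wants moved to the boundary.
        System.out.println(lookup(false).isPresent());  // NullPointerException
    }
}
```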
The same point also applies if another class `A`, knowing `Q` as a value type, happens to load a `null` from one of `C`'s fields. The `C` field is formatted as a reference, and thus can hand `A` a surprise `null`, but `A` must refuse to see it, and throw `NPE`. Thus, the `getfield` instruction, if it is pointed at a legacy non-flattened field, will need to null-check the value loaded from the field.

Meanwhile, `C` is allowed to `putfield` and `getfield` `null` all day long into its own fields (and fields of other benighted legacy classes that it may be friends with). Thus, the `getfield` and `putfield` instructions link to slightly different behavior, not only based on the format of the field, but _also_ based on "who's asking". Code in `C` is allowed to witness `null`s in its `Q` fields, but code in `A` (upgraded) is _not_ allowed to see them, even though it's the same `getfield` to the same symbolic reference. Happily, fields are not shared widely across uncoordinated classfiles, so this is a corner case mainly for testers to worry about.

What if `C` stores a `null` into somebody else's `Q` field, or into an element of a `Q[]` array? In that case, `C` must throw an immediate `NPE`; there's no way to reformat someone else's data structure, however out-of-date `C` may be.

What if `C` gets a null value from somewhere and casts it to `Q`? Should the `checkcast` throw `NPE` (as it should in a classfile where `Q` is known to be a value type)? For compatibility, the answer is "no"; old code needs to be left undisturbed if possible. After all, `C` believes it has a legitimate need for `null`s, and won't be fixed until it is recompiled and its programmer fixes the source code.

That's about it for `null`. If the above dynamic checks are implemented, then legacy classfiles will be unable to disturb upgraded classfiles with surprise null values. The goal mentioned above about controlling `null` on all paths is fulfilled by blocking `null` across API calls (which might have a legacy class on one end), and by verifying that `null`s never mix with values, locally within a single stack frame.

There are a few other things `C` could do to abuse `Q` values. Legacy code needs to be prevented immediately from making any of the following mistakes:

- `new` of `Q` should throw `ICCE`
- `putfield` to a field of `Q` should throw `ICCE`
- `monitorenter`, `monitorexit` on a `Q` value should throw `IllegalMonitorStateException`

Happily, the above rules are not specific to legacy code but apply uniformly everywhere.

A final mistake is executing an `acmp` instruction on a value type. Again, this is possible everywhere, not just in legacy classfiles, even if the verifier tries to prevent the obvious occurrences. There are several options for `acmp` on value types. The option which breaks the least code and preserves the O(1) performance model of `acmp` is to quickly detect a value type operand and just report `false`, even if the JVM can tell, somehow, that it's the same buffer containing the same value, being compared to itself.

All of these mistakes can be explained by analogy, supposing that the legacy class `C` were working with a box type `Integer` where other classes had been recoded to use `int`. All variables under `C`'s control are nullable, but when it works with new code it sees only `int` variables. Implicit conversions sometimes throw `NPE`, and `acmp` (or `monitorenter`) operations on boxed `Integer` values yield unspecific (or nonsensical) results.
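The "unspecific results" of the box analogy can be demonstrated in standard Java today: identity comparison on boxes depends on an invisible caching decision, much as `acmp` on buffered values would without the mandated `false` answer.

```java
public class BoxedIdentity {
    public static void main(String[] args) {
        Integer a = 127, b = 127;        // inside the mandated Integer cache
        Integer c = 128, d = 128;        // outside it (under default settings)
        System.out.println(a == b);      // true: same cached identity
        System.out.println(c == d);      // false on typical JVMs: box identity is unspecific
        System.out.println(c.equals(d)); // true: the values agree
    }
}
```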
## Value types and instruction linkage

Linked instructions which are clearly wrong should throw a `LinkageError` of some type. Examples already given are `new` and `putfield` on value types.

When a field reference of value type is linked, it will have to correctly select the behavior required by both the physical layout of the field, and also the stance toward any possible `null` if the field is nullable. (As argued above, the stance is either lenient for legacy code or strict for new code.) A `getstatic` linkage may elect to replace an invisible `null` with a default value.

When an `invoke` is linked it will have to arrange to correctly execute the calling sequence assigned to its method or its v-table. Linkage of `invokeinterface` will be even more dynamic, since the calling sequence cannot be determined until the receiver class is examined.

Linkage of dynamic constants in the constant pool must reject `null` for value types. Value types can be determined either globally based on the resolved constant type, or locally based on the `ValueTypes` attribute associated with the constant pool in which the resolution occurs.

## Value types and instruction execution

Most of the required dynamic behaviors to support value type hygiene have already been mentioned. Since values are identity-free and non-nullable, the basic requirement is to avoid storing `null`s in value-type variables, and to degrade gracefully when value types are queried about their identities. A secondary requirement is to support the needs of legacy code.

For null hygiene, the following points apply:

- A nullable argument, return value (from a callee), or loaded field must be null-checked before being further processed in the current frame, if its descriptor is locally declared as a value type.
- `checkcast` should reject `null` for _locally_ declared value types, but not for others.
- If the verifier does not reject `null`, the `areturn`, `putfield`, and `withfield` instructions should do so dynamically. (Otherwise the other rules are sufficient to contain `null`s.)
- An `aastore` to a value type array (`Q[]`) should reject `null` even if the array happens to use invisible indirections as an implementation tactic (say, for jumbo values). This is a purely dynamic behavior, not affected by the `ValueTypes` attribute.

Linked field and invoke instructions need sufficient linkage metadata to correctly flatten instance fields and use unboxed (and/or `null`-hostile) calling sequences. As discussed above, the `acmp` instruction must short-circuit on values. This is a dynamic behavior, not affected by the `ValueTypes` attribute.

Generally speaking, any instruction that doesn't refer to the constant pool cannot have contextual behavior, because there is no place to store metadata to adjust the behavior. The `areturn` instruction is an exception to this observation; it is a candidate for bytecode rewriting to gate the extra null check for applicable methods.

## Value types and reflection

Some adjustments may be needed for the various reflection APIs, in order to bring them into alignment with the changed bytecode.

- `Class.cast` should be given a null-hostile partner `Class.castValue`, to emulate the updated `checkcast` semantics. (Today's lenient behavior is sketched after this list.)
- `Field` should be given a dynamic `with` to emulate `withfield`, and the `Lookup` API given a way to surface the corresponding MH.
- `Class.getValueTypes`, to reflect the attribute, may be useful.
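For contrast with the first item, here is today's behavior in runnable form (`Class.castValue` itself is only proposed above and does not exist):

```java
import java.util.Optional;

public class CastNull {
    public static void main(String[] args) {
        // Class.cast accepts null for any type today; this is precisely the
        // leniency that the proposed null-hostile partner (Class.castValue,
        // hypothetical) would remove for value types.
        Optional<?> maybe = Optional.class.cast(null);  // succeeds quietly
        System.out.println(maybe);                      // prints "null"
    }
}
```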
## Conclusions

The details are complex, but the story as a whole becomes more intelligible when we require each classfile to locally declare its value types, and handle all values appropriately according to the local declaration. Outside of legacy code, and at its boundaries, tight control of null values is feasible. Inside value-rich code, and across value-rich APIs, full optimization seems within reach.

Potential problems with ambiguity in L-world are effectively addressed by a systematic side channel for local value type declarations, assisting the interpretation of `L`-type descriptors. This side channel can be the `ValueTypes` attribute.

From john.r.rose at oracle.com Mon May 7 20:55:22 2018
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 7 May 2018 13:55:22 -0700
Subject: value type hygiene
In-Reply-To:
References:
Message-ID: <9732E876-6A81-4D23-A5CB-709F98CAE830@oracle.com>

On May 6, 2018, at 2:17 AM, John Rose wrote:
> 
> Like many of us, I have been thinking about the problems of keeping values, nulls, and objects separate in L-world. I wrote up some long-ish notes on the subject. I hope it will help us wrap our arms around the problem, and get it solved.
> 
> TL;DR: Remi was right in January. We need a ValueTypes attribute.

FTR, there's an interesting meander, in the big picture, to our design process:

1. 2012 Recognizably modern value types are proposed as something to sneak under L-types, as a gated optimization on restricted references rather than on a new set of types.
https://blogs.oracle.com/jrose/value-types-in-the-vm

(In conversations at this time, indeterminate "heisenboxes" are suggested to suppress identity, following the Common Lisp EQ function semantics. Thinking about specifying and using heisenboxes makes our skin crawl. For a description, see JDK-8163133. Various folks propose hacking acmp to suppress value identity, upgrading it to Common Lisp EQV, but that makes our skin crawl when we think about optimization and generics.)

2. 2014 We realize that making it all an optimization is just too sneaky, and invent Q-descriptors to make everything explicit.
http://cr.openjdk.java.net/~jrose/values/values-0.html

It takes several months to get confident that we've found all the places where Q-types matter. Generics over Q-types look very difficult (as they still do).

3. 2016-2017 We define and implement a subset of Q-types as "Minimal Value Types".
http://cr.openjdk.java.net/~jrose/values/shady-values.html

It works for small cases, but putting Q for L starts to feel like TMI. No, it's not sneaky. It's more like your neighbor playing rock music at 2 AM.

4. 2017-2018 We decide to try "L-world", where we use ambiguous L-descriptors. We put needed information, previously held by Q-descriptors, in ad hoc side channels like ACC_FLATTENABLE bits.
http://cr.openjdk.java.net/~fparain/L-world/L-World-JVMS-4.pdf

This works only because we now know all the places where Q descriptors are significant to semantics or optimization. It's hard to imagine winning this information without the Q-type exercises of steps 2 and 3. But the need for ad hoc channels grows uncomfortably as we expand our prototype.

5. (present) We are considering consolidating the Q descriptor information in a uniform side channel, one per classfile, an attribute called ValueTypes. In this way, all semantic and optimization decisions about value types can be made based on one source of information (per classfile).
http://cr.openjdk.java.net/~jrose/values/value-type-hygiene.html

This is not sneaky. On the other hand, it does not impose Q-types on tools and classes that don't want to know about them. In fact, we might have a happy medium here!

That's the big picture of value type design, as I see it.

-- John

P.S. Yes, there's also history before 2012. We can start with Java, although SmallTalk and Common Lisp provide interesting precedents.

0. 1999 James Gosling worries about numerics in Java and proposes, as one component of a solution, "immutable class" declarations.
http://web.archive.org/web/19990202050412/http://java.sun.com/people/jag/FP.html#classes

The project is put aside, in part because Gosling (correctly) assesses that it requires the equivalent of several PhD theses to flesh out the design.

Over the years, other folks make similar proposals. If we could force enough optimizations on java.lang.Integer, it could behave like an int. These proposals run into difficulties not with immutability but with other aspects of reference types, such as nullability and (especially) object identity.

Also during this period, there are many hopeful suggestions of the form, "Why don't you just do what $Language does, but for Java?" Where Language is usually C++ or C#. There is, of course, no language whose design for values can be independently extracted, let alone successfully applied to Java. One constraint with Java is strong interoperability with pre-existing APIs, which are built solely of primitives and reference types. Another is Java's strong concurrency guarantees, which make immutability more important than in older languages.

From forax at univ-mlv.fr Mon May 7 21:23:20 2018
From: forax at univ-mlv.fr (Remi Forax)
Date: Mon, 7 May 2018 23:23:20 +0200 (CEST)
Subject: value type hygiene
In-Reply-To: <9732E876-6A81-4D23-A5CB-709F98CAE830@oracle.com>
References: <9732E876-6A81-4D23-A5CB-709F98CAE830@oracle.com>
Message-ID: <280700120.517784.1525728200896.JavaMail.zimbra@u-pem.fr>

> From: "John Rose"
> To: "valhalla-spec-experts"
> Sent: Monday, May 7, 2018 22:55:22
> Subject: Re: value type hygiene

> On May 6, 2018, at 2:17 AM, John Rose < [ mailto:john.r.rose at oracle.com | john.r.rose at oracle.com ] > wrote:

>> Like many of us, I have been thinking about the problems of keeping values, nulls,
>> and objects separate in L-world. I wrote up some long-ish notes on the subject.
>> I hope it will help us wrap our arms around the problem, and get it solved.

>> TL;DR: Remi was right in January. We need a ValueTypes attribute.

> FTR, there's an interesting meander, in the big picture, to our design process:

> 1. 2012 Recognizably modern value types are proposed as something to
> sneak under L-types, as a gated optimization on restricted references rather
> than on a new set of types.
> [ https://blogs.oracle.com/jrose/value-types-in-the-vm | https://blogs.oracle.com/jrose/value-types-in-the-vm ]

> (In conversations at this time, indeterminate "heisenboxes" are suggested to
> suppress identity, following the Common Lisp EQ function semantics.
> Thinking about specifying and using heisenboxes makes our skin crawl.
> For a description, see JDK-8163133. Various folks propose hacking acmp
> to suppress value identity, upgrading it to Common Lisp EQV, but that makes
> our skin crawl when we think about optimization and generics.)

> 2. 2014 We realize that making it all an optimization is just too sneaky,
> and invent Q-descriptors to make everything explicit.
> http://cr.openjdk.java.net/~jrose/values/values-0.html

> It takes several months to get confident that we've found all the places where
> Q-types matter. Generics over Q-types look very difficult (as they still do).

> 3. 2016-2017 We define and implement a subset of Q-types as
> "Minimal Value Types".

> http://cr.openjdk.java.net/~jrose/values/shady-values.html

> It works for small cases, but putting Q for L starts to feel like
> TMI. No, it's not sneaky. It's more like your neighbor playing rock
> music at 2 AM.

> 4. 2017-2018 We decide to try "L-world", where we use ambiguous
> L-descriptors. We put needed information, previously held by Q-descriptors,
> in ad hoc side channels like ACC_FLATTENABLE bits.

> http://cr.openjdk.java.net/~fparain/L-world/L-World-JVMS-4.pdf

> This works only because we now know all the places where Q
> descriptors are significant to semantics or optimization. It's hard
> to imagine winning this information without the Q-type exercises
> of steps 2 and 3. But the need for ad hoc channels grows
> uncomfortably as we expand our prototype.

> 5. (present) We are considering consolidating the Q descriptor
> information in a uniform side channel, one per classfile, an attribute
> called ValueTypes. In this way, all semantic and optimization
> decisions about value types can be made based on one source
> of information (per classfile).

> http://cr.openjdk.java.net/~jrose/values/value-type-hygiene.html

> This is not sneaky. On the other hand, it does not impose Q-types
> on tools and classes that don't want to know about them. In fact,
> we might have a happy medium here!

> That's the big picture of value type design, as I see it.

> -- John

> P.S. Yes, there's also history before 2012. We can start with Java, although
> Smalltalk and Common Lisp provide interesting precedents.

> 0. 1999 James Gosling worries about numerics in Java and proposes, as
> one component of a solution, "immutable class" declarations.

> http://web.archive.org/web/19990202050412/http://java.sun.com/people/jag/FP.html#classes

> The project is put aside, in part because Gosling (correctly) assesses that it
> requires the equivalent of several PhD theses to flesh out the design.

> Over the years, other folks make similar proposals. If we could force enough
> optimizations on java.lang.Integer, it could behave like an int. These proposals
> run into difficulties not with immutability but with other aspects of reference
> types, such as nullability and (especially) object identity.

> Also during this period, there are many hopeful suggestions of the form,
> "Why don't you just do what $Language does, but for Java?", where $Language
> is usually C++ or C#. There is, of course, no language whose design for
> values can be independently extracted, let alone successfully applied to Java.

> One constraint with Java is strong interoperability with pre-existing APIs,
> which are built solely of primitives and reference types. Another is Java's
> strong concurrency guarantees, which make immutability more important
> than in older languages.

And there is also no concept of taking the address of a stack location in
Java (unlike C, C++, or C#).
Rémi

From paul.sandoz at oracle.com  Tue May  8 01:06:00 2018
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Mon, 7 May 2018 18:06:00 -0700
Subject: value type hygiene
In-Reply-To: 
References: 
Message-ID: <4BD8F0A2-73F8-4DF6-8969-A92E24C646D8@oracle.com>

Thanks for sharing this!

I like the null containment approach. It recognizes that nulls (for better or worse) are a thing in the ref world but stops the blighters from infecting the value world at the borders.

We will need to extend this hygiene to javac and the libraries. Javac could fail to compile when it knows enough, and in other cases insert explicit null checks, if not otherwise performed by existing instructions, so as to fail fast.

Certain APIs that rely on null as a signal will need careful reviewing and possible adaptation if the prevention has some side effects, and maybe errors/warnings from javac. The poster child is Map.get, but others like Map.compute are problematic too (if a value is not present for a key, then a null value is passed to the remapping function).

How we proceed might depend on whether specialized generics progresses at a slower rate than value types.

Paul.

> On May 6, 2018, at 2:17 AM, John Rose wrote:
> 
> Like many of us, I have been thinking about the problems of keeping values, nulls,
> and objects separate in L-world. I wrote up some long-ish notes on the subject.
> I hope it will help us wrap our arms around the problem, and get it solved.
> 
> TL;DR: Remi was right in January. We need a ValueTypes attribute.
> 
> http://cr.openjdk.java.net/~jrose/values/value-type-hygiene.html
> 
> Cheers!
> -- John
> 
> P.S. Raw markdown source follows for the record.
> http://cr.openjdk.java.net/~jrose/values/value-type-hygiene.md
> 
> # Value Type Hygiene
> 
> #### May 2018 _(v. 0.1)_
> 
> #### John Rose and the Valhalla Expert Group
> 
> Briefly put, types in L-world are ambiguous, leading to unhygienic
> mixtures of value operations with reference operations, and
> uncontrolled pollution from `null`s infecting value code.
> 
> This note explores a promising proposal for resolving the key
> ambiguity. It is a cleaner design than the ad hoc mechanisms tried so
> far. The resulting system would seem to allow more predictable and
> debuggable behavior, a stronger backward compatibility story, and
> better optimization.
> 
> ## Problem statement
> 
> In the _L-world_ design for value types, the classfile type descriptor
> syntax is left unchanged, and the pre-existing descriptor form
> `"LFoo;"` is overloaded to denote value types as well as object types.
> A previous design introduced new descriptors for value types of the
> form `"QFoo;"`, and possibly a union type `"UFoo;"`. This design
> might be called _Q-world_. In comparison with Q-world, the L-world
> design approach has two advantages--compatibility and migration--but
> also one serious disadvantage: ambiguity.
> 
> L-world is _backward compatible_ with tools that must parse classfile
> descriptors, since it leaves descriptor syntax unchanged. There have
> been no changes to this syntax in almost thirty years, and there is a
> huge volume of code that depends on its stability. The HotSpot JVM
> itself makes hundreds of distinct decisions based on descriptor syntax
> which would need careful review and testing if they were to be adapted
> to take account of a new descriptor type (`"QFoo;"`, etc.).
> 
> Because of its backward compatibility, L-world also has a distinctly
> simpler _migration story_ than previous designs.
> Some _value-based
> classes_, such as `Optional` and `LocalTime`, have been engineered to
> be candidates for migration to proper value types. We wish to allow
> such a migration without recompiling the world or forcing programmers
> to recode uses of the migrated types. It is very difficult to sustain
> the illusion in Q-world that a value type `Qjava/util/Optional;` can
> be operated on in old code under the original object type
> `Ljava/util/Optional;`, since the descriptors do not match and a
> myriad of adapters must be spun (one for every mention of the wrong
> descriptor). With L-world, we have the simpler problem (addressed in
> this document) of keeping straight the meaning of L-descriptors
> in each relevant context, whether freshly recompiled or legacy
> code; this is a simpler problem than spinning adapters.
> 
> But not all is well in L-world. The compatibility of descriptors
> implies that, when a classfile must express a semantic distinction
> between a value type and an object type, it must be allowed to do
> so unambiguously, in a side channel outside of the descriptor.
> 
> Our first thought was, "well, just load all the value types and then
> you will know the list of them". If we have a global registry of
> classes (such as the HotSpot system dictionary), nobody needs to
> express any additional distinctions, since everybody can just ask the
> registry which are the value types.
> 
> This simple idea has a useful insight, but it goes wrong in three
> ways. First, for some use cases such as classfile transformation, it
> might be difficult to find such a global registry; in some cases we
> might prefer to rely on local information in the classfile. We need a
> way for a classfile to encode, within itself, which types it is using
> as value types, so that all viewers of the classfile can make
> consistent decisions about what's a value and what's not.
> 
> Second, if we are running in the JVM, the global registry of value
> types has to be built up by loading classfiles. In order for every
> classfile that _uses_ a value type to know its status, the classfile
> that _defines_ the value type must be loaded _first_. But there is no
> way to totally order these constraints, since it is easy to create
> circular dependencies between value types, either directly or
> indirectly. (N.B. Well-foundedness rules for layout don't eliminate
> all the possible circularities.) And it won't work to add more
> initialization phases ("first load all the classfiles, then let them
> all start asking questions about their contents"), because that would
> require preloading a classfile for every potential value type
> mentioned in some other classfile. That's every name in every
> `"LFoo;"` descriptor. Loading a file for every name mentioned
> anywhere is very un-Java-like, and something that drastic would be
> required in order to make correct decisions about value types.
> 
> That leads to the third problem, which comes from our desire to make a
> migration story. Some classfiles need to operate on value types as if
> they were object references. (Below, we will see details of how
> operations can differ between value and reference types.) This means
> that, if we are to support migration, we need a way for legacy
> classfiles to make a _local_ decision to treat a given type as a
> reference type, for backward compatibility. Luckily, this is
> possible, but it requires a local indication in the classfile so the
> JVM can adjust certain operations.
>
> A solution to these problems requires a way for each classfile to
> declare how it intends to use each type that is a value type, and
> (what is more) a way for legacy classfiles to peacefully interoperate
> with migrated value types. We have experimented with various partial
> solutions, such as adding an extra bit in a context where a value type
> may occur, to let the JVM know that the classfile intends a value
> type. (This is the famous `ACC_FLATTENABLE` bit on fields.) But it
> turns out that the number of places where value-ness is significant is
> hard to limit to just a few spots where we can sprinkle a mode bit.
> We need a _comprehensive_ solution that can clearly and consistently
> define a classfile's (local) view of the status of each type it works
> with, so that when the "value or reference?" question comes up, there
> is a clear and consistent answer. We need to prevent the values and
> the references from polluting each other; we need _value type
> hygiene_.
> 
> ## Value vs. reference operations
> 
> Value types can be thought of as simpler than reference types, because
> they lack two features of reference types:
> 
> - _identity:_ Two values with the same immediate components are
> indistinguishable, even if they were created by different code
> paths. Objects, by contrast, "remember" when they were created,
> and each object has a unique identity. Identities are
> distinguished using the `acmp` family of instructions, and Java's
> `==` operator.
> 
> - _nullability:_ Any variable of any reference type can store the
> value `null`; in fact, `null` is the initial value for fields and
> array elements. So `null` is one of the possible values of any
> reference type, including `Object` and all interfaces. By
> contrast, `null` is _not_ the value of any value type. Value type
> variables are not nullable, because `null` is a reference. (But
> read on for an awkward exception.) The type `Object` can
> represent all values and references. Casting an unknown operand
> of type `Object` to a value type `Foo` must succeed if in fact the
> operand is of type `Foo`, but a null `Object` reference must never
> successfully cast to a value type.
> 
> This strong distinction between values and references is inspired, in
> part, by the design of Java's primitive types, which also are
> identity-free and are not nullable. Every copy of the `int` value 42 is
> completely indistinguishable from every other copy, and you can't cast
> a `null` to `int` (without a null pointer exception). We hope
> eventually to unify value types and primitives, but even if this
> never comes to pass, our design slogan for value types is, _codes
> like a class, works like an int_.
> 
> By divesting themselves of identity and nullability, value types are
> able to enjoy new behaviors and optimizations akin to those of
> primitives, notably flattening in the heap and scalarization in
> compiled code.
> 
> To unlock these benefits, the JVM must treat values and references
> as operationally distinct. Some of these operational distinctions
> are quite subtle; some required months of discussion to elucidate,
> though soon (we hope) they will be obvious in hindsight.
> 
> Here is a partial list of cases where the JVM should be able to
> distinguish value types from reference types:
> 
> - _instance fields:_ A value field should be flattened (if possible)
> to components in adjacent memory words. A reference field must
> not be flattened, in order to retain identity and store the null
> reference.
> - _static fields:_ A static field must be properly initialized
> to the default value of its type, not to null. This holds true
> for all fields, in fact. Flattening does not seem to be important
> for static fields.
> - _array elements:_ An element of a value array (array whose
> component type is a value type) should flatten its elements and
> arrange them compactly in successive memory locations. Such
> an array must be initialized to the default value of its value
> type, and never to `null`.
> - _methods:_ A value parameter or return value should be
> flattened (if possible) to components in registers. A reference
> must not be treated this way, because of identity and nullability.
> - _verifier:_ The verifier needs to know value types, so it can
> forbid inapplicable operations, such as `new` or `monitorenter`.
> - _conversions:_ The `checkcast` operator for a value type might
> reject `null` (as well as rejecting instances of the wrong type).
> The `ldc` of a dynamic constant of value type must not produce
> `null` (instead it must fail to link).
> - _comparisons:_ The `acmp` operator family must not detect
> value type identities (since they are not present), so it must
> operate differently on values and references. In some cases,
> the verifier might reject `acmp` altogether.
> - _optimization:_ The JIT needs to know whether it can discard
> any internal reference (for a value type) and just explode the
> value into registers. The possibility of `null` mixing with
> value types would block such optimizations.
> 
> This list can be tweaked to make it shorter, by adjusting the rules in
> ways that lessen the impact of ambiguity in type names. The list is
> also incomplete. (We will add to it later.) Each point of
> distinction is the subject of detailed design trade-offs, many of
> which we are sketching here.
> 
> Some of these distinctions can be pushed to instruction link time
> (when resolved value classes may be present) or run time (when the
> actual values are on stack). A dynamic check can make a final
> decision, after all questions of value-ness are settled. This seems
> to be a good decision for `acmp`. The linkage of a `new` instruction
> can fail on a value class, or a `checkcast` instruction can reject a
> `null`, once the resolved class is inspected as part of the dynamic
> execution of the operation.
> 
> But this delaying tactic doesn't always work. For example, field
> layout must be computed during class loading, which (as was seen
> above) is too early to use the supposed global list of value types.
> 
> Even if some check can be delayed, like the detection of an erroneous
> `new` on a value type, we may well decide it is more useful (or
> "hygienic") to detect the error earlier, such as at verification time,
> so that a broken program can be detected before it starts to run.
> 
> Also, some operations may be contextual, to support backward
> compatibility. Thus, `checkcast` may need to consult the local
> classfile about whether to reject nulls, so that legacy code won't
> suddenly fail to verify or execute just because it mixes nulls with
> (what it thought were) references. Basically, a "legacy checkcast"
> should work correctly with nulls, while an "upgraded checkcast" should
> probably reject nulls immediately, without requiring extra tests.
> 
> We will examine these points in more detail later, but now we need to
> examine how to contextualize information about value types.
> 
> ## Towards a solution
> 
> What is to be done?
> The rest of this note will propose some solutions
> to the problem of value type hygiene, and specifically the problem of
> preventing nulls from mixing with values ("null hygiene").
> 
> Both Remi Forax[[1]] and Frederic Parain[[2]] have proposed the idea
> of having each classfile explicitly declare the list of all value
> types that it is using. For the record, this author initially
> resisted the idea[[3]] as overkill: I was hoping to get away with a
> band-aid (`ACC_FLATTENABLE`), but have since realized we need a more
> aggressive treatment. Clean and tidy behavior from the JVM will make
> it easier to implement clean and tidy support for value types in the
> Java language.
> 
> [[1]]: http://mail.openjdk.java.net/pipermail/valhalla-dev/2018-January/003685.html
> [[2]]: http://mail.openjdk.java.net/pipermail/valhalla-dev/2018-January/003699.html
> [[3]]: http://mail.openjdk.java.net/pipermail/valhalla-dev/2018-January/003687.html
> 
> Throughout the processing of the classfile, the list can serve as a
> reliable local registry of decisions about values vs. references.
> First we will sketch the attribute, and then revisit the points above
> to see how the list may be used.
> 
> ## The `ValueTypes` attribute
> 
> As proposed above, let us define a new attribute called `ValueTypes`
> which is simply a counted array of `CONSTANT_Class` indexes. Each
> indexed constant is loaded and checked to be a value type. The JVM
> uses this list of locally declared value types for all further
> decisions about value types, relative to the current class.
> 
> As a running reference, let's call the loaded class `C`. `C` may be
> any class, either an object or a value. The value types locally
> declared by `C` we can call `Q`, `Q1`, `Q2`, etc. These are exactly
> the types which would get `Q` descriptors in Q-world.
> 
> As an attribute, `ValueTypes` is somewhat like the `InnerClasses`
> attribute. Both list all classes, within the context of a particular
> classfile, which need some sort of special processing. The
> `InnerClasses` attribute includes additional data for informing the
> special processing (including the breakdown of "binary names" into
> outer and inner names, and extra modifier bits), but the `ValueTypes`
> attribute only needs to mention the classes which are known to be
> value types.
> 
> Already with the `ACC_FLATTENABLE` bit we have successfully defined
> logic that pre-loads a supposed value type, ensures that it _is_ in
> fact a value type, and then allows the JVM to use all of the necessary
> properties of that value type to improve the layout of the current
> class. The classes mentioned in `ValueTypes` would be pre-loaded
> similarly. In fact, the `ACC_FLATTENABLE` bit is no longer needed,
> since the JVM can simply flatten all fields whose type names are
> mentioned in the local `ValueTypes` list.
> 
> We now come to the distinction between properly resolved classes
> (`CONSTANT_Class` entries) and types named in descriptors. This
> distinction is important to keep in mind. Once a proper class
> constant `K` is resolved by `C`, everything is known about it, and a
> permanent link to `K` goes into `C`'s constant pool. The same is not
> true of other type names that occur within field and method
> descriptors. In order for `C` to check whether its field type `"LK;"`
> is a value type, it must _not_ try to resolve `K`. Instead it must
> look for `K` _by name_ in the list of locally declared value types.
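>
> (For concreteness, here is a minimal sketch, not part of the proposal
> itself, of how a classfile consumer might implement that by-name
> query. The class and method names here are made up; the point is
> only the shape of the test: a string lookup against the attribute's
> contents, with no class resolution.)
>
> ```
> import java.util.Set;
>
> // Hypothetical sketch of the local, by-name value-type query.
> final class LocalValueTypes {
>     // Internal names from this classfile's ValueTypes attribute,
>     // e.g. "java/util/Optional"; no classes are resolved here.
>     private final Set<String> declared;
>
>     LocalValueTypes(Set<String> namesFromAttribute) {
>         this.declared = Set.copyOf(namesFromAttribute);
>     }
>
>     // True if the descriptor "LK;" names a locally declared value type.
>     boolean isValueType(String descriptor) {
>         if (descriptor.startsWith("L") && descriptor.endsWith(";")) {
>             return declared.contains(
>                 descriptor.substring(1, descriptor.length() - 1));
>         }
>         return false;  // primitives and arrays never match the list
>     }
> }
> ```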
>
> Later on, when we examine verifier types and the components of method
> descriptors, a similar by-name lookup will be necessary to decide
> whether they refer to value types. Thus, there are two ways a type
> can occur in a classfile and two ways to decide if it is a value type:
> By resolving a proper constant `K` and looking at the metadata, and by
> matching a name `"LK;"` against the local list. Happily, the answers
> will be complete and consistent if all the queries look at the same
> list.
> 
> So a type name can be classified as a value type without resolution,
> by looking for that name in the list of declared value types. And
> this can be done even before the classfiles of the declared value
> types are available. This means that any particular declared value
> type might not need to be loaded until "hard data" is required of it.
> A provisional determination of the value status of some `Q` can be
> made very early, before `Q`'s classfile is actually located and
> pre-loaded. That provisional answer might be enough to check some
> early structural constraint. It seems reasonable to actually pre-load
> the `Q` classes lazily, and only when the JVM needs hard data about
> `Q`, like its actual layout, or its actual supers.
> 
> What if an element of `ValueTypes` turns out to be a reference type?
> (Perhaps someone deployed a value-type version of `Optional` but then
> got cold feet; meanwhile `C` is still using it under the impression it
> is a value type.) There are two choices here, loose and strict:
> either pretend the type wasn't there anyway, or raise an error in the
> loading of the current classfile. The strict behavior is safer; we
> can loosen it later if we find a need. The case of an element failing
> to load at all can be treated like the previous problem, either
> loosely or strictly; strict is better, all else being equal.
> 
> The strict treatment is also more in line with how to treat failed
> resolution of super-types, which are a somewhat similar kind of
> dependency: Super-types, like value types, are loaded as early as
> possible, and play a role in all phases of classfile loading, notably
> verification.
> 
> One corollary of making the list an attribute is that it can be easily
> stripped, just like `InnerClasses` or `BootstrapMethods`. Is this a
> bug or a feature? In the case of `InnerClasses`, stripping the
> attribute doesn't affect execution of the classfile but it does break
> some Core Reflection queries. In the case of `BootstrapMethods`, the
> structural constraints on dynamic constant pool constants will break,
> and the classfile will promptly fail to load. The effect of removing
> a `ValueTypes` attribute is probably somewhere in between. Because
> L-world types are ambiguous, and because we specifically allow value
> types to be used as references from legacy classfiles (for migration),
> there's always a way to "fake" enough reference behavior from a value
> type in a classfile which doesn't make special claims about it. So it
> seems reasonable to allow `ValueTypes` to be stripped, at least in
> principle. In the worst case the classfile will fail to load, as in
> the case of a stripped `BootstrapMethods`, but the feature might
> actually prove useful (say, for before-and-after migration testing).
> 
> Note that in principle a classfile generator could choose to ignore a
> value type, and treat it as a (legacy) reference type.
> Because of
> migration, the JVM must support at least some such moves, but such
> picking and choosing is not the center of our design. In particular,
> we do not want the same compilation unit to treat a type as a value
> type in one line of code, and a reference type in the next. This may
> come later, if we decide to introduce concepts of nullable values
> and/or value boxes, but we think we can defer such features.
> 
> So for now, classfiles may differ among themselves about which types
> are value types, but within a single classfile there is only one
> source of local truth about value types. (Locally-sourced, fresh,
> hygienic data!)
> 
> ## Value types and class structure
> 
> Very early during class loading, the JVM assigns an instance layout to
> the new class `C`. Before that point it must first load the declared
> value types (`Q1`, `Q2`, ...), and then recursively extract the layout
> information from each one. There is no danger of circularity in this
> because a value type instance cannot contain another instance of
> itself, directly or indirectly.
> 
> Both non-static and static fields of value type make sense (because a
> value "works like an int"). But static fields interact differently
> with the loading process than non-static fields.
> 
> A static value type field has no enclosing instance, unless the JVM
> chooses to make one secretly. Therefore it doesn't need to be
> flattened. The JVM can make an invisible choice about how to store a
> static value type field:
> 
> - Buffered immutably on the heap and stored by (invisible) reference
> next to the other statics. The `putstatic` instruction would
> put the new value in a _different_ buffer and change the pointer.
> - Buffered mutably somewhere, with the pointer stored next to
> the other statics, or in metadata. The `putstatic` instruction
> would store the flattened value into the _same_ buffer.
> - Flattened fully into the same container as the other statics.
> 
> The first option seems easiest, but the second might be more
> performant. The third is difficult due to bootstrapping concerns.
> 
> In fact, the same implementation options apply for non-statics as for
> statics, but only the third one (full flattening) is desirable. The
> first one (immutable buffering) may be useful as a fallback
> implementation technique for special cases like jumbo values and
> fields which are `volatile`, and thus need to provide atomicity.
> 
> The root container for all of `C`'s statics, in HotSpot, happens to be
> the relevant `java.lang.Class` value `C.class`. Presumably it's a
> good place to put the invisible pointers mentioned above.
> 
> A static field of value type `Q` cannot make its initial value
> available to `getstatic` until `Q`'s `<clinit>` method runs (or, in
> the case of re-entrant initialization, has at least started). Since
> classes can circularly refer to instances of each other via static
> references, `Q` might return the favor and require materialization of
> `C`.
> 
> The first time `C` requires `Q`'s default value, if `Q` has not been
> initialized, its `<clinit>` method should run. This may trigger
> re-entry into the initializer for `C`, so `Q` needs to get its act
> together _before_ it runs its `<clinit>`, and immediately create `Q`'s
> own default value, storing it somewhere in `Q`'s own metadata (or else
> the `Class` mirror looks like a good spot). The invariant is that,
> before `Q`'s class initializer can run one bytecode, the default value
> for `Q` is created and registered for all time.
> Creating the default
> value before the initializer runs is odd but harmless, as long as no
> bytecode can actually access the default value without triggering
> `Q`'s initialization.
> 
> This also implies that `C` should create and register its own default
> value (if it is a value type) before it runs its own `<clinit>`
> method, lest `Q` come back and ask `C` for its default value.
> 
> The JVM may be required to bootstrap value-type statics as invisible
> null pointers, which are inflated (invisibly by the `getstatic` and/or
> `putstatic` instructions) into appropriate buffers, after ensuring the
> initialization of the value type class. But it seems possible that if
> the previous careful sequencing is observed, there is no need to do
> lazy inflation of nulls, which would simplify the code for `getstatic`
> and `putstatic`.
> 
> ## Value types and method linkage
> 
> A class `C` includes methods as well as fields, of course. A method
> can receive or return a value type `Q` simply by mentioning `Q` as a
> component of its method descriptor (as an L-descriptor `"LQ;"`).
> 
> If a method `C.m()LD;` mentions some type `D` which is not on the
> declared list, then that type `D` will be treated, like always, as a
> nullable, identity-bearing reference.
> 
> Interestingly, migration compatibility requires this to be the case
> whether or not `D` is in actual fact a value type. If `C` is
> unconscious of `D`'s value-ness, the JVM must respect this, and
> preserve the illusion that `D` values are "just references, nothing to
> see here, move along". Perhaps `D` is freshly upgraded to a value
> type, and `C` isn't recompiled yet. `C` should not be penalized
> for this change, if at all possible.
> 
> This points to a core decision of the L-world design, that nearly all
> of the normal operations on object references "just do the right
> thing" when applied to value types. The two kinds of data use the
> same descriptor syntax. Value types can be converted to `Object`
> references, even though the resulting pseudo-reference does not expose
> any identity (and will never be null). Instructions like `aload`
> operate on values just as well as references, and so on.
> 
> Basically, values in L-world routinely go around lightly disguised as
> references, special pseudo-references which do not retain object
> identity. As long as nobody looks closely, the fiction that they are
> references is unbroken. If someone tries a `monitorenter`
> instruction, the game is over, but we think those embarrassing moments
> will be rare.
> 
> On the other hand, if a method `C.m()LQ;` uses a locally-declared
> value type, then the JVM has some interesting options. It may choose
> to notice that the `Q`-value is not nullable and has no identity. It
> can adjust the calling sequence of `m` to work with undisguised "naked
> values", which are passed on the stack, or broken into components for
> transport across the method API. This would almost be a purely
> invisible decision, except that naked values cannot be null, and so
> such calling sequences are hostile to null. Again, it "works like an
> int". A null `Integer` value will do just the same thing if you try
> to pass it to an `int`-receiving method. So we have to be prepared
> for an occasional embarrassing NPE, when one party thinks a type is a
> nullable reference type and the other party knows it's a value type.
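>
> (A tiny illustration of that analogy, in today's Java; nothing here
> is new API, it is just the boxed-`Integer` behavior that naked-value
> calling sequences would mimic:)
>
> ```
> class UnboxDemo {
>     static int twice(int x) { return 2 * x; }
>
>     public static void main(String[] args) {
>         Integer boxed = null;   // the "ignorant" party supplies null
>         int r = twice(boxed);   // NullPointerException on unboxing,
>                                 // just where a naked-value calling
>                                 // sequence would reject the null
>     }
> }
> ```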
>
> One might think that it is straightforward to assign a value-using
> method a calling sequence by examining the method signature and the
> locally declared value types of the declaring class. But in fact
> there are non-local constraints. Only static and private methods
> can easily be adjusted to work with naked values.
> 
> Unlike fields, methods can override similar methods in some `C`'s
> super-type `S`. This immediately leads to the possibility of `C` and
> `S` differing as to the status of some type `X` in the method's
> signature. If neither of the `ValueTypes` lists of `C` and `S`
> mentions `X`, then the classes are agreed that `X` is an object type
> (even if in truth it happens to be a value type). They can agree
> to use a reference-based calling sequence for some `m` that works
> with `X`.
> 
> If both lists mention some `Q`, then both classes agree, and in fact
> it must be a value type. They might be able to agree to use "naked
> values" for the `Q` type when calling the method. Or not: they still
> have to worry about other supers that might have another opinion about
> `Q`.
> 
> What if `C` doesn't list `Q` but `S` does, and they share a method
> that returns `Q`? For example, what about `C.m()Q` vs. `S.m()Q`? In
> that case, the JVM may have already set up `S.m` to return its `Q`
> result as a naked value. Probably this happened before `C` was even
> loaded. The code for `C.m` will expect simply to return a normal
> reference. In reality, it will be unconsciously holding a
> JVM-assigned pseudo-reference to the buffered `Q`-value. The JVM must
> then unwrap the reference into a naked value to match the calling
> sequence it assigned (earlier, before `C` was loaded) to `S.m`. The
> bottom line is that even though `C.m` was loaded as a
> reference-returning function, the JVM may secretly rewrite it to
> return a naked value.
> 
> Since `C.m` returns a reference, it might choose to return `null`.
> What happens then? The secretly installed adaptation logic cannot
> extract the components of a buffer that doesn't exist. A
> `NullPointerException` must be thrown, at the point where `C.m` is
> adapted to `S.m`, which has the greater knowledge that `Q` is a value
> type (hence non-nullable). It will be as if the `areturn` instruction
> of `C.m` included a hidden null check.
> 
> Is such a hidden null check reasonable? One might explain that the
> `C` code thinks (wrongly) it is working with boxes, while the `S` code
> _knows_ it is working with values. If the method were `C.m()Integer`,
> overriding `S.m()int`, and `C.m` returned `null`, then the adapter
> that converts to `S.m()int` would have to throw NPE during the
> implicit conversion from `Integer` to `int`. A value "works like an
> int", so the result must be similar with a value type. It is as if
> the deficient class `C` were working with boxes for `Q` (indeed that's
> all it sees) while the knowledgeable class `S` is working with true
> values. The NPE seems justifiable in such terms, although there is no
> visible adapter method to switch descriptors in this case.
> 
> The situation is a little odd when looked at in the following way: If
> you view nullability as a privilege, then this privilege is enjoyed
> only by deficient classes, ones that have not yet been recompiled to
> "see" that the type `Q` is a value type. Ignorant classes may pass
> `null` back and forth through `Q` APIs, all day long, until they pass
> it through a class that knows `Q` is a value. Then an `NPE` will end
> their streak of luck.
> Is using `null` a privilege? Well, yes, but
> remember also that if `Q` started its career as an object type, it was
> a value-based class, and such classes are documented as being
> null-hostile. The null-passers were in a fool's paradise.
> 
> What if `C` lists `Q` as a value but `S` doesn't? Then the calling
> sequence assigned when `S` was loaded will use references, and these
> references will in fact be pseudo-references to buffered `Q` values
> (or `null`, as just discussed). The knowledgeable method `C.m()Q`
> will never produce a `null` through this API. The JVM will arrange
> to properly clothe the `Q`-value produced by `C.m` into a buffer
> whose pointer can be returned from `S.m`.
> 
> Class hierarchies can be much deeper than just `C` and `S`, and
> overrides can occur at many levels on the way down. Frederic Parain
> has pointed out that the net result seems to be that the first
> (highest) class that declares a given method (with descriptor) also
> gets to determine the calling sequence, which is then imposed on all
> overrides through that class. This leads to a workable implementation
> strategy, based on v-table packing. A class's v-table is packed
> during the "preparation" phase of class linking, just after loading
> and before any subclass v-table is packed. The JVM knows,
> unambiguously, whether a given v-table entry is new to a class, or is
> being reaffirmed from a previous super-class (perhaps with an
> override, perhaps just with an abstract). At this point, a new
> v-table slot can be given a carefully selected internal calling
> sequence, which will then be imposed on all overrides. An old
> v-table slot will have the super's calling sequence imposed on it.
> In this scheme, the interpreter and compiler must examine both the
> method descriptor and some metadata about the v-table slot when
> performing `invokevirtual` or `invokespecial`.
> 
> A method coming in "sideways" from an interface is harder to manage.
> It is reasonable to treat such a method as "owned" by the first proper
> class that makes a v-table entry for it. But that only works for one
> class hierarchy; the same method might show up in a different
> hierarchy with incompatible opinions about value types in the method
> signature. It appears that interface default methods, if not class
> methods, must be prepared to use more than one kind of calling
> sequence, in some cases. It is as if, when a class uses a default
> method, it imports that method and adjusts the method's calling
> sequence to agree with that class's hierarchy.
> 
> Often an interface default method is completely new to a class
> hierarchy. In that case, the interface can choose the calling
> sequence, and this is likely to provide more coherent calling
> sequences for that API point.
> 
> These complexities will need watching as value types proliferate and
> begin to show up in interface-based APIs.
> 
> ## Value types and the verifier
> 
> Let us assume that, if the verifier sees a value type, it should flag
> all invalid uses of that value type immediately, rather than wait for
> execution.
> 
> (This assumption can be relaxed, in which case many points in this
> section can be dropped. We may also try to get away with implementing
> as few of these checks as possible, saving them for a later release.)
> 
> When verifying a method, the verifier tracks and checks types by name,
> mostly. Sometimes it pre-loads classes to see the class hierarchy.
>
> With the `ValueTypes` attribute, there is no need to pre-load value
> classes; matching symbolic names against the attribute is sufficient.
> 
> The verifier type system needs a way to distinguish value types from
> regular object types. To keep the changes small, this distinction can
> be expressed as a local predicate on type names called `isValueType`,
> implemented by referring to `ValueTypes`. In this way, the
> `StackMapTable` attribute does not need any change at all. Nor does
> the verifier type system need a change: value types go under the
> `Object` and `Reference` categories, despite the fact that value types
> are not object types, and values are not references.
> 
> The verifier rules need to consult `isValueType` at some points. The
> assignability rules for `null` must be adjusted to exclude value
> classes.
> 
> ```
> isAssignable(null, class(X, _)) :- not(isValueType(X)).
> ```
> 
> This one change triggers widespread null rejection: wherever a value
> type is required, the verifier will not allow a `null` to be on the
> stack. Assuming `null` is on the stack and `Q` is a value type, the
> following will be rejected as a consequence of the above change:
> 
> - `putfield` or `putstatic` to a field of type `Q`
> - `areturn` to a return type `Q`
> - any `invoke` passing `null` to a parameter of type `Q`
> - any `invoke` passing `null` to a receiver of type `Q` (but this is rare)
> 
> Given comprehensive null blocking (along other paths also), the
> implementation of the `putfield` (or `withfield`) instruction could go
> ahead and pull a buffered value off the stack without first checking
> for `null`. If the verifier does not actually reject such `null`s,
> the dynamic behavior of the bytecodes themselves should, to prevent
> null pollution from spreading.
> 
> The verifier rules for `aastore` and `checkcast` only check that the
> input type is an object reference of some sort. More narrow type
> checks are performed at runtime. A null may be rejected dynamically
> by these instructions, but the verifier logic does not need to track
> `null`s for them.
> 
> The verifier rules for `invokespecial` have special cases for
> `<init>` methods, but these do not need special treatment, since such
> calls will fail to link when applied to a value type receiver.
> 
> The verifier _could_ reject reference comparisons between value types
> and other operands (including `null`, other value types, and reference
> types). This would look something like an extra pair of constraints
> after the main assertion that two references are on the stack:
> 
> ```
> instructionIsTypeSafe(if_acmpeq(Target), Environment, _Offset, StackFrame,
>                       NextStackFrame, ExceptionStackFrame) :-
>     canPop(StackFrame, [reference, reference], NextStackFrame),
> +   not( canPop(StackFrame, [_, class(X, _)], _), isValueType(X) ),
> +   not( canPop(StackFrame, [class(X, _), _], _), isValueType(X) ),
>     targetIsTypeSafe(Environment, NextStackFrame, Target),
>     exceptionStackFrame(StackFrame, ExceptionStackFrame).
> ```
> 
> (The JVMS doesn't use any such `not` operator. The actual Prolog
> changes would be more complex, perhaps requiring a `real_reference`
> target type instead of `reference`.)
> 
> This point applies equally to `if_acmpeq`, `if_acmpne`, `ifnull`, and
> `ifnonnull`.
> 
> This doesn't seem to be worthwhile, although it might be
> interesting to try to catch javac bugs this way. In any case, such
> comparisons are guaranteed to return `false` in L-world, and will
> optimize quickly in the JIT.
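>
> (For intuition, today's boxed types already show why code should not
> lean on `acmp`; the following is only an analogy, since a real value
> type would report `false` even for the "same" buffer:)
>
> ```
> class AcmpDemo {
>     public static void main(String[] args) {
>         Integer a = 1000, b = 1000;  // equal components, distinct boxes
>         System.out.println(a == b);  // false (outside the Integer cache):
>                                      // acmp sees two identities
>         System.out.println(a == a);  // true for a box; for a value type,
>                                      // L-world acmp would report false
>                                      // even here
>     }
> }
> ```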
> > In a similar vein, the verifier _could_ reject `monitorenter` and > `monitorexit` instructions when they apply to value types: > > ``` > instructionIsTypeSafe(monitorenter, _Environment, _Offset, StackFrame, > NextStackFrame, ExceptionStackFrame) :- > canPop(StackFrame, [reference], NextStackFrame), > + not( canPop(StackFrame, [class(X, _)], _), isValueType(X) ), > exceptionStackFrame(StackFrame, ExceptionStackFrame). > ``` > > And a `new` or `putfield` could be quickly rejected if it applies to a > value type: > > ``` > instructionIsTypeSafe(new(CP), Environment, Offset, StackFrame, > NextStackFrame, ExceptionStackFrame) :- > StackFrame = frame(Locals, OperandStack, Flags), > CP = class(X, _), > + not( isValueType(X) ), > ... > > instructionIsTypeSafe(putfield(CP), Environment, _Offset, StackFrame, > NextStackFrame, ExceptionStackFrame) :- > CP = field(FieldClass, FieldName, FieldDescriptor), > + not( isValueType(FieldClass) ), > ... > ``` > > Likewise `withfield` could be rejected by the verifier if applied to a > non-value type. > > The effect of any or all of these verifier rule changes (if we choose > to implement them) would be to prevent local code from creating a > `null` and accidentally putting it somewhere a value type belongs, or > from accidentally applying an identity-sensitive operation to an > operand _known statically_ to be a value type. These rules only work > when a sharp verifier type unambiguously reports an operand as `null` > or as a value type. > > Nulls must also be rejected, and value types detected, when they are > hidden, at verification time, under looser types like `Object`. > Protecting local code from outside `null`s must also be done > dynamically. > > Omitting all of these rules will simply shift the responsibility for > null rejection and value detection fully to dynamic checks at > execution time, but such dynamic checks must be implemented in any > case, so the verifier's help is mainly an earlier error check, > especially to prevent null pollution inside of a single stack frame. > For that reason, the only really important verifier change is the > `isAssignable` adjustment, mentioned first. > > The dynamic checks which back up or replace the other verifier checks > will be discussed shortly. > > ## Value types and legacy classfiles > > We need to discuss the awkward situation of `null` being passed as a > value type, and value types being operated on as objects, by legacy > classfiles. One legacy classfile can dump null values into surprising > places, even if all the other classfiles are scrupulous about > containing `null`. > > We will also observe some of the effects of having value types > "invade" a legacy classfile which expects to apply identity-sensitive > operations to them. > > By "legacy classfile" we of course mean classfiles which lack > `ValueTypes` attributes, and which may attempt to misuse value types > in some way. (Force of habit: it's strong.) We also can envision > half-way cases where a legacy classfile has a `ValueTypes` attribute > which is not fully up to date. In any case, there is a type `Q` which > is _not_ locally declared as a value type, by the legacy class `C`. > > The first bad thing that can happen is that `C` declares a field of > type `Q`. This field will be formatted as a reference field, even > though the field type is a value type. Although we implementors might > grumble a bit, the JVM will have to arrange to use pseudo-pointers to > represent values stored in that field. 
> (It's as if the field were
> volatile, or not flattenable for some other reason.) That wasn't too
> bad, but look what's in the field to start with: It's a null. That
> means that any legitimate operation on this initial value will throw
> an `NPE`. Of course, the writer of `C` knew `Q` as a value-based
> class, so the initial null will be discarded and replaced by a
> suitable non-null value, before anything else happens.
> 
> What if `C` makes a mistake, and passes a `null` to another class
> which _does_ know `Q` is a value? At that point we have a choice, as
> with the verifier's null rejection, whether to do more work to detect
> the problem earlier, or whether to let the `null` flow through and
> eventually cause an `NPE` down the line. Recall that if an API point
> gets a calling sequence which recognizes that `Q` is a value type, it
> will probably unbuffer the value, throwing `NPE` immediately if `C`
> makes a mistake. This is good, because that's the earliest we could
> hope to flag the mistake. But if the method accepts the boxed form of
> `Q`, then the `null` will sneak in, skulk around in the callee's stack
> frame, and maybe cause an error later.
> 
> Meanwhile, if the JVM tries to optimize the callee, it will have to
> limit its optimizations somewhat, because the argument value is
> nullable (even if only ever by mistake). To cover this case, it may
> be useful to define that _method entry_ to a method that knows about
> `Q` is null-hostile, even if the _calling sequence_ for some reason
> allows references. This means that, at function entry, every known
> value type parameter is null-checked. This needs to be an official
> rule in the JVM, not just an optimization for the JIT, in order for
> the JIT to use it.
> 
> What if our `C` returns a `null` value to a caller who intends to use
> it as a value? That won't go well either, but unless we detect the
> `null` aggressively, it might rattle around for a while, disrupting
> optimization, before producing an inscrutable error. ("Where'd that
> `null` come from??"). The same logic applies as with arguments: When
> a `null` is returned from a method call that purports to return `Q`,
> this can only be from a legacy file, and the calling sequences were
> somehow not upgraded. In that case, the JVM needs to mandate a null
> check on every method invocation which is known to return a value
> type.
> 
> The same point also applies if another class `A`, knowing `Q` as a
> value type, happens to load a `null` from one of `C`'s fields. The
> `C` field is formatted as a reference, and thus can hand `A` a
> surprise `null`, but `A` must refuse to see it, and throw `NPE`.
> Thus, the `getfield` instruction, if it is pointed at a legacy
> non-flattened field, will need to null-check the value loaded
> from the field.
> 
> Meanwhile, `C` is allowed to `putfield` and `getfield` `null` all day
> long into its own fields (and fields of other benighted legacy classes
> that it may be friends with). Thus, the `getfield` and `putfield`
> instructions link to slightly different behavior, not only based on
> the format of the field, but _also_ based on "who's asking". Code in
> `C` is allowed to witness `null`s in its `Q` fields, but code in `A`
> (upgraded) is _not_ allowed to see them, even though it's the same
> `getfield` to the same symbolic reference. Happily, fields are not
> shared widely across uncoordinated classfiles, so this is a corner
> case mainly for testers to worry about.
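>
> (A sketch of that corner case, with made-up class names; imagine `Q`
> has been migrated to a value type, and the stub below exists only so
> the fragment compiles as ordinary Java today:)
>
> ```
> class Q { int component; }     // stand-in for the migrated value type
>
> class C {                      // legacy: no ValueTypes entry for Q
>     Q q;                       // formatted as a nullable reference field
>     void reset() { q = null; } // legal; C may witness its own nulls
> }
>
> class A {                      // upgraded: declares Q in ValueTypes
>     Q read(C c) {
>         return c.q;            // this getfield, linked from A, would
>     }                          // null-check and throw NPE if C had
> }                              // left a null behind
> ```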
>
> What if `C` stores a `null` into somebody else's `Q` field, or into an
> element of a `Q[]` array? In that case, `C` must throw an immediate
> `NPE`; there's no way to reformat someone else's data structure,
> however out-of-date `C` may be.
> 
> What if `C` gets a null value from somewhere and casts it to `Q`?
> Should the `checkcast` throw `NPE` (as it should in a classfile where
> `Q` is known to be a value type)? For compatibility, the answer is
> "no"; old code needs to be left undisturbed if possible. After all,
> `C` believes it has a legitimate need for `null`s, and won't be fixed
> until it is recompiled and its programmer fixes the source code.
> 
> That's about it for `null`. If the above dynamic checks are
> implemented, then legacy classfiles will be unable to disturb upgraded
> classfiles with surprise null values. The goal mentioned above
> about controlling `null` on all paths is fulfilled by blocking `null`
> across API calls (which might have a legacy class on one end), and by
> verifying that `null`s never mix with values, locally within a single
> stack frame.
> 
> There are a few other things `C` could do to abuse `Q` values.
> Legacy code needs to be prevented immediately from making any of the
> following mistakes:
> 
> - `new` of `Q` should throw `ICCE`
> - `putfield` to a field of `Q` should throw `ICCE`
> - `monitorenter`, `monitorexit` on a `Q` value should throw `IllegalMonitorStateException`
> 
> Happily, the above rules are not specific to legacy code but apply
> uniformly everywhere.
> 
> A final mistake is executing an `acmp` instruction on a value type.
> Again, this is possible everywhere, not just in legacy files, even if
> the verifier tries to prevent the obvious occurrences. There are
> several options for `acmp` on value types. The option which breaks
> the least code and preserves the O(1) performance model of `acmp` is
> to quickly detect a value type operand and just report `false`, even
> if the JVM can tell, somehow, that it's the same buffer containing the
> same value, being compared to itself.
> 
> All of these mistakes can be explained by analogy, supposing that the
> legacy class `C` were working with a box type `Integer` where other
> classes had been recoded to use `int`. All variables under `C`'s
> control are nullable, but when it works with new code it sees only
> `int` variables. Implicit conversions sometimes throw `NPE`, and
> `acmp` (or `monitorenter`) operations on boxed `Integer` values yield
> unspecific (or nonsensical) results.
> 
> ## Value types and instruction linkage
> 
> Linked instructions which are clearly wrong should throw a
> `LinkageError` of some type. Examples already given are `new` and
> `putfield` on value types.
> 
> When a field reference of value type is linked, it will have to
> correctly select the behavior required by both the physical layout of
> the field, and also the stance toward any possible `null` if the field
> is nullable. (As argued above, the stance is either lenient for
> legacy code or strict for new code.)
> 
> A `getstatic` linkage may elect to replace an invisible `null` with
> a default value.
> 
> When an `invoke` is linked, it will have to arrange to correctly
> execute the calling sequence assigned to its method or its v-table.
> 
> Linkage of `invokeinterface` will be even more dynamic, since the
> calling sequence cannot be determined until the receiver class is
> examined.
> 
> Linkage of dynamic constants in the constant pool must reject `null`
> for value types.
> Value types can be determined either globally based
> on the resolved constant type, or locally based on the `ValueTypes`
> attribute associated with the constant pool in which the resolution
> occurs.
> 
> ## Value types and instruction execution
> 
> Most of the required dynamic behaviors to support value type hygiene
> have already been mentioned. Since values are identity-free and
> non-nullable, the basic requirement is to avoid storing `null`s in
> value-type variables, and degrade gracefully when value types are
> queried about their identities. A secondary requirement is to support
> the needs of legacy code.
> 
> For null hygiene, the following points apply:
> 
> - A nullable argument, return value (from a callee),
> or loaded field must be null-checked before being further
> processed in the current frame, if its descriptor is locally
> declared as a value type.
> - `checkcast` should reject `null` for _locally_ declared value
> types, but not for others.
> - If the verifier does not reject `null`, the `areturn`, `putfield`,
> and `withfield` instructions should do so dynamically. (Otherwise the
> other rules are sufficient to contain `null`s.)
> - An `aastore` to a value type array (`Q[]`) should reject `null`
> even if the array happens to use invisible indirections as an
> implementation tactic (say, for jumbo values). This is a purely
> dynamic behavior, not affected by the `ValueTypes` attribute.
> 
> Linked field and invoke instructions need sufficient linkage metadata
> to correctly flatten instance fields and use unboxed (and/or
> `null`-hostile) calling sequences.
> 
> As discussed above, `acmp` must short-circuit on values. This is
> a dynamic behavior, not affected by the `ValueTypes` attribute.
> 
> Generally speaking, any instruction that doesn't refer to the constant
> pool cannot have contextual behavior, because there is no place to
> store metadata to adjust the behavior. The `areturn` instruction is
> an exception to this observation; it is a candidate for bytecode
> rewriting to gate the extra null check for applicable methods.
> 
> ## Value types and reflection
> 
> Some adjustments may be needed for the various reflection APIs, in
> order to bring them into alignment with the changed bytecode.
> 
> - `Class.cast` should be given a null-hostile partner
> `Class.castValue`, to emulate the updated `checkcast` semantics.
> - `Field` should be given a dynamic `with` to emulate `withfield`,
> and the `Lookup` API given a way to surface the corresponding MH.
> - `Class.getValueTypes`, to reflect the attribute, may be useful.
> 
> ## Conclusions
> 
> The details are complex, but the story as a whole becomes more
> intelligible when we require each classfile to locally declare its
> value types, and handle all values appropriately according to the
> local declaration.
> 
> Outside of legacy code, and at its boundaries, tight control of null
> values is feasible. Inside value-rich code, and across value-rich
> APIs, full optimization seems within reach.
> 
> Potential problems with ambiguity in L-world are effectively addressed
> by a systematic side channel for local value type declarations,
> assisting the interpretation of `L`-type descriptors. This side
> channel can be the `ValueTypes` attribute.
> 

From john.r.rose at oracle.com  Tue May  8 04:25:53 2018
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 7 May 2018 21:25:53 -0700
Subject: value type hygiene
In-Reply-To: <4BD8F0A2-73F8-4DF6-8969-A92E24C646D8@oracle.com>
References: <4BD8F0A2-73F8-4DF6-8969-A92E24C646D8@oracle.com>
Message-ID: <659C82D3-0F15-4739-B5D9-9C0E8AED6F62@oracle.com>

On May 7, 2018, at 6:06 PM, Paul Sandoz wrote:
> 
> Thanks for sharing this!
> 
> I like the null containment approach. It recognizes that nulls (for better or worse) are a thing in the ref world but stops the blighters from infecting the value world at the borders.
> 
> We will need to extend this hygiene to javac and the libraries.

For any particular null-rejecting behavior, we have a choice of
doing it in the VM only, doing it in javac only (Objects.requireNN),
or doing it in both. I think this is true for all of the null checks
proposed. (Fourth choice: Neither; don't reject nulls and hope
they don't sneak in.)

A good example is checkcast. javac knows each value type
that is the subject of a cast, so the checkcast bytecode doesn't
necessarily have to include a null check; javac could follow each
checkcast by a call to O.rNN. A few considerations tip me towards
putting it into the instruction rather than letting javac have the job:

- When wouldn't we add O.rNN after checkcast Q? If it's always
there, isn't it safer to fold it into the checkcast?
- No temptation to bytecode generators to fool around with
"optimizing away" the null check.
- The JVM has more complete information about APIs and schemas,
so it can better optimize away the checks than javac can.
- The amended checkcast corresponds better to the source-level
operation.
- Code complexity is better (smaller methods) if the bytecode has
a higher-level behavior.

These same considerations apply to all the cases of dynamic null
rejection:
- checkcast
- null check of incoming parameters
- null check of received return types (after invoke* to legacy code)
- null check of field reads (from legacy fields, which might be null).

Although it's easy to imagine javac putting O.rNN in all of
these places, it will become annoying enough that we'll want
to let the JVM do it. The JVM will have on-line information about
some things, such as which methods or fields might be legacy
code, which allows it to omit checks many times.

If the bytecodes conspire to prevent nulls from getting into
verifiable Q-types, then the verifier can add robustness by
excluding mixes of aconst_null with Q-types, which is a value
add. This checking is inherently after javac, so it's not possible
if we require javac to do some or all of the null checking explicitly.

> Javac could fail to compile when it knows enough, and in other cases insert explicit null checks if not otherwise performed by existing instructions so as to fail fast.

I think it's the case that if we adopt all of the null-checking proposals
I wrote, then javac won't need any explicit ones. I think that's one
possible sweet spot for the design. I currently don't see a sweeter
spot, in fact.

> Certain APIs that rely on null as a signal will need careful reviewing and possible adaptation if the prevention has some side effects, and maybe errors/warnings from javac. The poster child is Map.get, but others like Map.compute are problematic too (if a value is not present for a key, then a null value is passed to the remapping function).

Yep.
The sooner we implement (a) a JVM that has a clean upward model for value types, and (b) a javac with a null-sniffing lint mode, then we can begin experimenting with this stuff.

We can even experiment with Map, List, Function, etc., in their current erased forms. The idea would be "values work like ints, so use these APIs with values just like you would with nullable Integer wrappers".

For List.of (and any other null-rejecting API) this will just work out of the box.

For Map we'll have to train ourselves to avoid Map.get and use methods like computeIfAbsent. We might want to add more methods to make it easier to avoid nulls, such as Map.getOrElse(K,V) which returns the given V if the map entry is not present.

We can also define an interface type which is somehow related to concrete value types (this should be a template but it also works in the erased world):

    interface ValueRef<V extends ValueRef<V>> {
        default V byValue() { return (V) this; }
    }

The idea would be to make every value type explicitly (via javac) or implicitly (via JVM magic) implement this interface, instantiated to itself, of course. This interface would then stand for a nullable version of the value type itself, and could be used safely with Map.

    Map<String, ValueRef<ComplexDouble>> m = ...;
    var ref = m.get("pi");
    if (ref == null) return "no pi for you";
    var pi = ref.byValue();  // NPE if we didn't check first!
    return "pi=" + pi;

Here, ComplexDouble to ValueRef<ComplexDouble> is like int to Integer. Pretty much, although the conversion between CD and VR is one way only, while int and Integer convert both ways. Also VR still has no object identity, which is fine. More importantly, in L-world erased generics will accept both ComplexDouble and ValueRef<ComplexDouble>.

> How we proceed might depend on whether specialized generics progresses at a slower rate than value types.

Indeed. And I'm pretty sure we will be ready to ship a workable value type system before we have finished figuring out the specialized generics.

-- John

From forax at univ-mlv.fr Tue May 8 10:06:53 2018
From: forax at univ-mlv.fr (Remi Forax)
Date: Tue, 8 May 2018 12:06:53 +0200 (CEST)
Subject: value type hygiene
In-Reply-To: <659C82D3-0F15-4739-B5D9-9C0E8AED6F62@oracle.com>
References: <4BD8F0A2-73F8-4DF6-8969-A92E24C646D8@oracle.com> <659C82D3-0F15-4739-B5D9-9C0E8AED6F62@oracle.com>
Message-ID: <211281922.569203.1525774013983.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "John Rose"
> To: "Paul Sandoz"
> Cc: "valhalla-spec-experts"
> Sent: Tuesday, May 8, 2018 06:25:53
> Subject: Re: value type hygiene

> On May 7, 2018, at 6:06 PM, Paul Sandoz wrote:
>> Thanks for sharing this!
>> I like the null containment approach. It recognizes that nulls (for better or worse) are a thing in the ref world but stops the blighters from infecting the value world at the borders.
>> We will need to extend this hygiene to javac and the libraries.
>
> For any particular null-rejecting behavior, we have a choice of doing it in the VM only, doing it in javac only (Objects.requireNN), or doing it in both. I think this is true for all of the null checks proposed. (Fourth choice: neither; don't reject nulls and hope they don't sneak in.)
>
> A good example is checkcast. javac knows each value type that is the subject of a cast, so the checkcast bytecode doesn't necessarily have to include a null check; javac could follow each checkcast by a call to O.rNN. A few considerations tip me towards putting it into the instruction rather than letting javac have the job:
>
> - When wouldn't we add O.rNN after checkcast Q?
> If it's always there, isn't it safer to fold it into the checkcast?
> - No temptation to bytecode generators to fool around with "optimizing away" the null check.
> - The JVM has more complete information about APIs and schemas, so it can better optimize away the checks than javac can.
> - The amended checkcast corresponds better to the source-level operation.
> - Code complexity is better (smaller methods) if the bytecode has a higher-level behavior.
>
> These same considerations apply to all the cases of dynamic null rejection:
> - checkcast
> - null check of incoming parameters
> - null check of received return types (after invoke* to legacy code)
> - null check of field reads (from legacy fields, which might be null).
>
> Although it's easy to imagine javac putting O.rNN in all of these places, it will become annoying enough that we'll want to let the JVM do it. The JVM will have on-line information about some things, such as which methods or fields might be legacy code, which allows it to omit checks many times.
>
> If the bytecodes conspire to prevent nulls from getting into verifiable Q-types, then the verifier can add robustness by excluding mixes of aconst_null with Q-types, which is a value add. This checking is inherently after javac, so it's not possible if we require javac to do some or all of the null checking explicitly.
>
>> Javac could fail to compile when it knows enough, and in other cases place in explicit null checks if not otherwise performed by existing instructions so as to fail fast.
>
> I think it's the case that if we adopt all of the null-checking proposals I wrote, then javac won't need any explicit ones. I think that's one possible sweet spot for the design. I currently don't see a sweeter spot, in fact.
>
>> Certain APIs that rely on null as a signal will need careful reviewing and possible adaptation if the prevention has some side effects, and maybe errors/warnings from javac. The poster child being Map.get, but others like Map.compute are problematic too (if a value is not present for a key, then a null value is passed to the remapping function).
>
> Yep. The sooner we implement (a) a JVM that has a clean upward model for value types, and (b) a javac with a null-sniffing lint mode, then we can begin experimenting with this stuff.

I agree.

> We can even experiment with Map, List, Function, etc., in their current erased forms. The idea would be "values work like ints, so use these APIs with values just like you would with nullable Integer wrappers".
>
> For List.of (and any other null-rejecting API) this will just work out of the box.
>
> For Map we'll have to train ourselves to avoid Map.get and use methods like computeIfAbsent. We might want to add more methods to make it easier to avoid nulls, such as Map.getOrElse(K,V) which returns the given V if the map entry is not present.

Map.getOrDefault() already exists :)

> We can also define an interface type which is somehow related to concrete value types (this should be a template but it also works in the erased world):
>
>     interface ValueRef<V extends ValueRef<V>> {
>         default V byValue() { return (V) this; }
>     }
>
> The idea would be to make every value type explicitly (via javac) or implicitly (via JVM magic) implement this interface, instantiated to itself, of course. This interface would then stand for a nullable version of the value type itself, and could be used safely with Map.
>     Map<String, ValueRef<ComplexDouble>> m = ...;
>     var ref = m.get("pi");
>     if (ref == null) return "no pi for you";
>     var pi = ref.byValue();  // NPE if we didn't check first!
>     return "pi=" + pi;
>
> Here, ComplexDouble to ValueRef<ComplexDouble> is like int to Integer. Pretty much, although the conversion between CD and VR is one way only, while int and Integer convert both ways. Also VR still has no object identity, which is fine. More importantly, in L-world erased generics will accept both ComplexDouble and ValueRef<ComplexDouble>.

It's not clear to me that we have to do something here. When you do a List.add or a Map.put, because those methods take an Object, the value type will be buffered; then, when the value type is stored in an array (ArrayList) or a field (HashMap), the value type is boxed. Now, when you call List.get() or Map.get(), the boxed object is returned, and transformed back to a value type when it hits the cast inserted by the compiler due to erasure. If someone calls Map.get() with an unknown key, null is returned and the cast fails (with the extended semantics for cast) because null is not a valid value for a value type.

It's more problematic for the APIs that send null as an argument of a lambda, if the lambda asks for a value type, because at runtime the user will get an NPE in the bridge.

So the question "Should the cast do a null check, or should we introduce a null check implicitly?" reduces itself to "Is sending null to a lambda a common case or not?"

And BTW, another solution is to introduce a new opcode, vcheckcast, that does nullcheck+checkcast while checkcast doesn't, so the compiler will use vcheckcast in caller code due to erasure and checkcast in the bridge code, to allow a user to be able to do a null check. This model is more complex for javac because it means that a value type taken as a parameter may be null, while a value type that comes from a return value cannot be null.

>> How we proceed might depend on whether specialized generics progresses at a slower rate than value types.
>
> Indeed. And I'm pretty sure we will be ready to ship a workable value type system before we have finished figuring out the specialized generics.

I agree.

> -- John

Rémi

From daniel.smith at oracle.com Wed May 9 23:46:07 2018
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 9 May 2018 17:46:07 -0600
Subject: value type hygiene
In-Reply-To: References: Message-ID: <4DE1FC97-835B-435F-AE48-3DE3E60A457D@oracle.com>

> On May 6, 2018, at 3:17 AM, John Rose wrote:
>
> Like many of us, I have been thinking about the problems of keeping values, nulls, and objects separate in L-world. I wrote up some long-ish notes on the subject. I hope it will help us wrap our arms around the problem, and get it solved.
>
> TL;DR: Remi was right in January. We need a ValueTypes attribute.
>
> http://cr.openjdk.java.net/~jrose/values/value-type-hygiene.html

So I've been digesting this for a few days. I don't like it much. Subtle contextual dependencies are a good recipe for exploits and general confusion. If it were the only way forward, okay, but I find myself frequently thinking, "yeah, but... Q types!"

The way you've framed the problem has evolved from the original idea. Which is fine, but it's helpful to review: the idea was to make a choice between two type hierarchies, U-world and L-world:

       U
      / \
     L   Q

    or

       L
      / \
     R   Q

The crux of the choice was: in what way do value types interact with legacy bytecode? Does the old code reject values, or does it get automatically enhanced to work with them?
We acknowledged that, in the latter hierarchy, we must push many operations into the top, which minimizes the need for 'R' and 'Q', perhaps so much that they can be elided entirely. You said in a November write-up:

"The Q-type syntax is *maybe* needed, but in any case does not appear in a parallel position of importance with the dominant L-type syntax."

In other words, working exclusively with L types wasn't a requirement, it was a might-be-nice.

So we set out on an experiment to see how far we could get without 'R' and 'Q'. My read of the current situation is that we've probably stretched that to the breaking point, so: good experiment, we've learned some things, and we understand what value 'Q' types give us.

Another read is that we're not ready to end the experiment yet, we have a few tricks up our sleeves, and we can force this to work. That's fair, but I'm not convinced we need to force it. Not changing descriptors is not a hard requirement.

(To be clear about my preferred alternative: we introduce Q types as first-class types (applicable to value classes only), update the descriptor syntax, assert QFoo <: LFoo, and ask compilers to use Qs when they want to guarantee non-nullability and allow flattenability. Compilers generate bridge methods (and bridge fields?) where needed/if desired.)

You talk a little about why it's nice to avoid changing descriptors:

"L-world is backward compatible with tools that must parse classfile descriptors, since it leaves descriptor syntax unchanged. There have been no changes to this syntax in almost thirty years, and there is a huge volume of code that depends on its stability. The HotSpot JVM itself makes hundreds of distinct decisions based on descriptor syntax which would need careful review and testing if they were to be adapted to take account of a new descriptor type ("QFoo;", etc.)."

Okay, put that in the "pro" column for "Should we leave descriptors untouched?" In the "con" column is all the weird new complexity in this proposal. Notably:

- The mess of overloading and implicit adaptations. Huge complexity cost here, from spec to implementation to debugging. We've been there before, and have always thrown up our hands and retreated (not always for the same reasons, but still).

- The JVM "knows" internally about the two kinds of types, but we won't give users the ability to directly express them, or inspect them with reflection. That mismatch seems bound to bite us repeatedly.

- We talk a lot about nullability being a migration problem, but it is sometimes just a really nice feature! All things being equal, not being able to freely talk about nullable value types is limiting.

I'd rather spend the feature budget on getting dusty code to work with shiny new descriptors than on dealing with these problems/compromises.

I guess that, before going all in on this approach, it would be helpful for me to see a more complete exploration of the relative costs.
From forax at univ-mlv.fr Thu May 10 09:11:37 2018
From: forax at univ-mlv.fr (Remi Forax)
Date: Thu, 10 May 2018 11:11:37 +0200 (CEST)
Subject: value type hygiene
In-Reply-To: <4DE1FC97-835B-435F-AE48-3DE3E60A457D@oracle.com>
References: <4DE1FC97-835B-435F-AE48-3DE3E60A457D@oracle.com>
Message-ID: <490843007.1145546.1525943497000.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "daniel smith"
> To: "John Rose"
> Cc: "valhalla-spec-experts"
> Sent: Thursday, May 10, 2018 01:46:07
> Subject: Re: value type hygiene

>> On May 6, 2018, at 3:17 AM, John Rose wrote:
>>
>> Like many of us, I have been thinking about the problems of keeping values, nulls, and objects separate in L-world. I wrote up some long-ish notes on the subject. I hope it will help us wrap our arms around the problem, and get it solved.
>>
>> TL;DR: Remi was right in January. We need a ValueTypes attribute.
>>
>> http://cr.openjdk.java.net/~jrose/values/value-type-hygiene.html
>
> So I've been digesting this for a few days. I don't like it much. Subtle contextual dependencies are a good recipe for exploits and general confusion. If it were the only way forward, okay, but I find myself frequently thinking, "yeah, but... Q types!"
>
> The way you've framed the problem has evolved from the original idea. Which is fine, but it's helpful to review: the idea was to make a choice between two type hierarchies, U-world and L-world:
>
>        U
>       / \
>      L   Q
>
>     or
>
>        L
>       / \
>      R   Q
>
> The crux of the choice was: in what way do value types interact with legacy bytecode? Does the old code reject values, or does it get automatically enhanced to work with them?
>
> We acknowledged that, in the latter hierarchy, we must push many operations into the top, which minimizes the need for 'R' and 'Q', perhaps so much that they can be elided entirely. You said in a November write-up:
>
> "The Q-type syntax is *maybe* needed, but in any case does not appear in a parallel position of importance with the dominant L-type syntax."
>
> In other words, working exclusively with L types wasn't a requirement, it was a might-be-nice.
>
> So we set out on an experiment to see how far we could get without 'R' and 'Q'. My read of the current situation is that we've probably stretched that to the breaking point, so: good experiment, we've learned some things, and we understand what value 'Q' types give us.
>
> Another read is that we're not ready to end the experiment yet, we have a few tricks up our sleeves, and we can force this to work. That's fair, but I'm not convinced we need to force it. Not changing descriptors is not a hard requirement.

A Q-type (if the root is j.l.Object + interfaces) and having a ValueTypes attribute are two different encodings of the same semantics: either the descriptor is a Q-type, or the descriptor is an L-type and you have a side table that says it's a Q-type.

> (To be clear about my preferred alternative: we introduce Q types as first-class types (applicable to value classes only), update the descriptor syntax, assert QFoo <: LFoo, and ask compilers to use Qs when they want to guarantee non-nullability and allow flattenability. Compilers generate bridge methods (and bridge fields?) where needed/if desired.)

The main difference between the two encodings is that you have to generate bridges in the case of Q-types.
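To make the comparison concrete, the same method can be written down under both encodings. This is illustrative only; the exact layout of the ValueTypes attribute is not restated here.

    // Q-world: value-ness is spelled in the descriptor itself.
    //     frob(QFoo;)QFoo;
    //
    // L-world plus ValueTypes: the descriptor is unchanged, and a side
    // table in the classfile records that Foo is locally a value type.
    //     frob(LFoo;)LFoo;
    //     ValueTypes: { Foo }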
Generating bridges in general is far from obvious (that's why invokedynamic does the adaptation at the caller site, btw); you need a subtype relation, like String <: T for generics. If you do not have a subtype relationship, you cannot generate bridges.

For value types, QFoo <: LFoo is not what we need. For example, we want the following example to work. Let's say I have:

    class A {
        void m(LFoo)
    }
    class B extends A {
        void m(LFoo)
    }

Foo is now declared as a value type, and now I recompile B:

    class B extends A {
        void m(QFoo)
    }

If I call A::m, I want B::m to be valid at runtime, so QFoo has also to be a supertype of LFoo.

So the relation between QFoo and LFoo is more like auto-boxing: you have QFoo <: LFoo, but you also have LFoo <: QFoo because of the separate compilation issue, and if you do not have a subtyping relationship between types, you cannot generate bridges.

> You talk a little about why it's nice to avoid changing descriptors:
>
> "L-world is backward compatible with tools that must parse classfile descriptors, since it leaves descriptor syntax unchanged. There have been no changes to this syntax in almost thirty years, and there is a huge volume of code that depends on its stability. The HotSpot JVM itself makes hundreds of distinct decisions based on descriptor syntax which would need careful review and testing if they were to be adapted to take account of a new descriptor type ("QFoo;", etc.)."
>
> Okay, put that in the "pro" column for "Should we leave descriptors untouched?" In the "con" column is all the weird new complexity in this proposal. Notably:
>
> - The mess of overloading and implicit adaptations. Huge complexity cost here, from spec to implementation to debugging. We've been there before, and have always thrown up our hands and retreated (not always for the same reasons, but still).

I believe you have the same mess of adaptation whatever the encoding; it's due to the fact that you want to allow people to upgrade to a value type from a reference type.

> - The JVM "knows" internally about the two kinds of types, but we won't give users the ability to directly express them, or inspect them with reflection. That mismatch seems bound to bite us repeatedly.

The fact that Java the language surfaces whether a type is a value type or not is a language issue, and it's true for both encodings. For reflection, at runtime you know whether a class is a value type or not; the same is true for both encodings. If you mean that, at runtime, you cannot see if a method was compiled with the knowledge that a type is a value type or not, again, it depends on whether you surface Q-types or the ValueTypes attribute at runtime, so this choice is independent of the encoding.

> - We talk a lot about nullability being a migration problem, but it is sometimes just a really nice feature! All things being equal, not being able to freely talk about nullable value types is limiting.

Again, it's a language thing; it's the same issue for both encodings.

> I'd rather spend the feature budget on getting dusty code to work with shiny new descriptors than on dealing with these problems/compromises.

All problems are the same for both encodings; the only difference is that you avoid the bridging problem by using a side attribute.
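The same separate-compilation scenario in compilable source form, as a hedged reconstruction (Foo stands for the class being migrated to a value type; nothing here is the final translation scheme):

    class Foo { }  // later migrated to a value class

    class A {
        void m(Foo f) { System.out.println("A.m"); }
    }

    class B extends A {
        @Override
        void m(Foo f) { System.out.println("B.m"); }  // recompiled after the migration
    }

    final class Dispatch {
        public static void main(String[] args) {
            A a = new B();
            a.m(new Foo());  // prints "B.m": the call through A's slot must still
                             // reach B.m under either descriptor encoding
        }
    }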
So the question is more: should we allow retrofitting a reference type to a value type seamlessly? If the answer is yes, then QFoo <: LFoo is not enough, so we cannot use Q-types, but we can use a side table. If the answer is no, then QFoo <: LFoo is OK; we permit retrofitting an L-type to a Q-type, but user code has to wait until all its dependencies have been updated to use the Q-type before being able to use it.

> I guess that, before going all in on this approach, it would be helpful for me to see a more complete exploration of the relative costs.

regards,
Rémi

From brian.goetz at oracle.com Thu May 10 15:52:32 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 10 May 2018 11:52:32 -0400
Subject: value type hygiene
In-Reply-To: References: Message-ID:

Thanks for this great writeup. I find much to agree with here, and a few things to be concerned about (I'll express the latter in a separate mail; Dan touched on some of them.)

Now that we see it, elevating from ACC_FLATTENABLE to the ValueTypes attribute makes obvious sense. The key thing to reify is whether V was a value type at the time C was compiled. This flows into many decisions within C, and at the boundary of C and other V-users, so capturing it in one place makes sense.

I'll add that this reminds me very much of loader constraints. When class C calls method D.m(P)R, we first textually match the call with m(P)R in D via descriptor match, and then we additionally make sure that C and D agree on any loader constraints, throwing an error if they do not. In L-world, whether C and D think V is a value or object class is another kind of constraint. At linkage time, if these constraints agree, they can use an optimized protocol; if they disagree, rather than failing, the VM can introduce hidden adaptation to iron out the disagreement. This is a big win over the use of bridges in Q-world, since the adaptors are only generated at runtime when they are strictly needed, and as the ecosystem gets recompiled over time to a more uniform view of V's value-ness, will steadily go away. We saw shades of this in Albert's first prototype of heisenboxes, where the JIT compiled multiple versions of each method (if needed) according to different views of value-ness, and then fit them together, lego-style.

A note on the responses:

- I think the Map.get() discussion is a red herring.
From maurizio.cimadamore at oracle.com Thu May 10 16:27:00 2018
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Thu, 10 May 2018 17:27:00 +0100
Subject: value type hygiene
In-Reply-To: <4DE1FC97-835B-435F-AE48-3DE3E60A457D@oracle.com>
References: <4DE1FC97-835B-435F-AE48-3DE3E60A457D@oracle.com>
Message-ID: <77993d79-5097-e9c3-f2f7-d5f744396137@oracle.com>

On 10/05/18 00:46, Dan Smith wrote:
> (To be clear about my preferred alternative: we introduce Q types as first-class types (applicable to value classes only), update the descriptor syntax, assert QFoo <: LFoo, and ask compilers to use Qs when they want to guarantee non-nullability and allow flattenability. Compilers generate bridge methods (and bridge fields?) where needed/if desired.)

Yes!

To be clear, it seems like we've been here before - I recall a rainy Friday spent at a whiteboard in Dublin a couple of years ago on this one. As you point out there's a tension here: on the one hand, 'just' using L-types (at least in method signatures) gives you a path, when it comes to type-specialization, not to touch method bodies.
> On the other hand, if L means "maybe L, maybe Q", clients have no way to disambiguate L-code from Q-code - meaning that the Q-accepting-method-in-L-disguise will always have to be prepared for handling things coming from outside its clean almost-Q-but-not-quite bubble.
>
> Also, with my bridge hat on (having written many of them) I have to note that, while Remi is right in saying that having separate descriptors for L and Q is *almost* (added by me) equally expressive as having just L with a side-channel (e.g. an attribute which lists which Ls are Qs), that move restricts us a lot in terms of the bridg-y moves we can make - as, at the VM level, there's only one true signature (the L one) and you can't write a bridge (from where? to whom?). Unless we want to start emitting bridges whose signature is identical to the bridged thing - but this seems to violate so many constraints that I don't think it's even worth mentioning (in fact pretend I didn't write it :-)).

You can try to bridge inside the implementation, i.e. you have one method, so one signature, with an indy in it (indy lets you access the class metadata and the runtime view) that can decide to which implementations (implemented as side static methods) you want to delegate. I've already done this in the case where you have one method in Java that matches multiple methods in a dynamic language. (Pretend I did not write that either.)

> Maurizio

Rémi

From paul.sandoz at oracle.com Thu May 10 16:47:54 2018
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Thu, 10 May 2018 09:47:54 -0700
Subject: value type hygiene
In-Reply-To: References: Message-ID:

> On May 10, 2018, at 8:52 AM, Brian Goetz wrote:
>
> Thanks for this great writeup. I find much to agree with here, and a few things to be concerned about (I'll express the latter in a separate mail; Dan touched on some of them.)
>
> Now that we see it, elevating from ACC_FLATTENABLE to the ValueTypes attribute makes obvious sense. The key thing to reify is whether V was a value type at the time C was compiled. This flows into many decisions within C, and at the boundary of C and other V-users, so capturing it in one place makes sense.
>
> I'll add that this reminds me very much of loader constraints. When class C calls method D.m(P)R, we first textually match the call with m(P)R in D via descriptor match, and then we additionally make sure that C and D agree on any loader constraints, throwing an error if they do not. In L-world, whether C and D think V is a value or object class is another kind of constraint. At linkage time, if these constraints agree, they can use an optimized protocol; if they disagree, rather than failing, the VM can introduce hidden adaptation to iron out the disagreement.

Also, bridges generated by generics are naturally a place for such checks/adaptations from the ref world to the value world; the cast could be co-opted to perform the null check and throw, e.g. when forEach'ing with a Consumer over a List.

> This is a big win over the use of bridges in Q-world, since the adaptors are only generated at runtime when they are strictly needed, and as the ecosystem gets recompiled over time to a more uniform view of V's value-ness, will steadily go away. We saw shades of this in Albert's first prototype of heisenboxes, where the JIT compiled multiple versions of each method (if needed) according to different views of value-ness, and then fit them together, lego-style.
>
> A note on the responses:
>
> - I think the Map.get() discussion is a red herring.
> This is a signature that simply makes no sense when V is a value. We've looked at several alternatives - optional-bearing, a pattern match (case withMapping(var v)), a get-with-default, etc. In Q-world, we observed that sometimes a method doesn't make it to the any-fied version; it becomes a restricted method that only makes sense on reference types. In L-world, we don't necessarily have 'ref V' to fall back on (though we might), but there will need to be some way to give Map.get() a gold watch and thank it for its service (and lament that the best name has been retired from the namespace.)

Yes, Map.get has to somehow retire (although I still think it represents a good use case of what to do at the boundary of the value and ref worlds; perhaps List.get is a better case to discuss in this regard). IMO that's part of the hygiene we need to do to the libraries. I just don't have a strong sense on how to retire this if value types and specialized generics proceed at different rates. We can start by deprecating it (with forRemoval? that might be tricky), but perhaps it requires some compiler and linkage error when used with values?

Paul.

> I'll start a separate thread on my concerns.

From daniel.smith at oracle.com Thu May 10 16:54:48 2018
From: daniel.smith at oracle.com (Dan Smith)
Date: Thu, 10 May 2018 10:54:48 -0600
Subject: value type hygiene
In-Reply-To: <490843007.1145546.1525943497000.JavaMail.zimbra@u-pem.fr>
References: <4DE1FC97-835B-435F-AE48-3DE3E60A457D@oracle.com> <490843007.1145546.1525943497000.JavaMail.zimbra@u-pem.fr>
Message-ID:

> On May 10, 2018, at 3:11 AM, Remi Forax wrote:
>
> A Q-type (if the root is j.l.Object + interfaces) and having a ValueTypes attribute are two different encodings of the same semantics: either the descriptor is a Q-type, or the descriptor is an L-type and you have a side table that says it's a Q-type.

Yes, with some huge caveats attached to the attribute strategy:

- You have to pick one mode for all types of a given class in your class file
- The semantics are indirect; people will get used to reading them as a property of the class name, when in reality they're a property of a side attribute ("Debugging: I know Foo is a value class, so why is this null slipping through?...")
- Descriptor equality is redefined so that non-equal descriptors match (that is, where one descriptor uses a Q type and one uses an L type); adaptations are necessary to make mismatched descriptors cooperate
- We'll probably try very hard to present users with the fiction that there is only one type (e.g., in reflection)

> The main difference between the two encodings is that you have to generate bridges in the case of Q-types.
>
> Generating bridges in general is far from obvious (that's why invokedynamic does the adaptation at the caller site, btw); you need a subtype relation, like String <: T for generics. If you do not have a subtype relationship, you cannot generate bridges.
>
> For value types, QFoo <: LFoo is not what we need. For example, we want the following example to work. Let's say I have:
>
>     class A {
>         void m(LFoo)
>     }
>     class B extends A {
>         void m(LFoo)
>     }
>
> Foo is now declared as a value type, and now I recompile B:
>
>     class B extends A {
>         void m(QFoo)
>     }
>
> If I call A::m, I want B::m to be valid at runtime, so QFoo has also to be a supertype of LFoo.
> So the relation between QFoo and LFoo is more like auto-boxing: you have QFoo <: LFoo, but you also have LFoo <: QFoo because of the separate compilation issue, and if you do not have a subtyping relationship between types, you cannot generate bridges.

Tentatively, the bridge generation strategy I envision looks like this:

- When I convert a class to a value class, I annotate it ("@WasAReferenceClass")
- When a descriptor mentions a Q type, the compiler also generates an L bridge

There are problems with this: for example, when mentioning n distinct Q types, you need 2^n bridges. And maybe there are things the JVM can do to help -- we've explored lots of general-purpose "this class has moved" features. My preference is to tackle those problems as needed, on their own terms.

But, yes, I'll grant that probably having the JVM totally ignore the problem ultimately won't work.

>> - The JVM "knows" internally about the two kinds of types, but we won't give users the ability to directly express them, or inspect them with reflection. That mismatch seems bound to bite us repeatedly.
>
> The fact that Java the language surfaces whether a type is a value type or not is a language issue, and it's true for both encodings. For reflection, at runtime you know whether a class is a value type or not; the same is true for both encodings. If you mean that, at runtime, you cannot see if a method was compiled with the knowledge that a type is a value type or not, again, it depends on whether you surface Q-types or the ValueTypes attribute at runtime, so this choice is independent of the encoding.

The reflection question boils down to: are there two java.lang.Class objects per value class, or one? My read of the goals here is that we'd very much like for there to be only one, for the same reason that we'd like to not change the spelling of descriptors. In that world, I think it will be hard to reason about where null checks happen. (Sure, maybe you can figure it out by consulting the ValueTypes attributes, but that's a huge pain.)

>>> - We talk a lot about nullability being a migration problem, but it is sometimes just a really nice feature! All things being equal, not being able to freely talk about nullable value types is limiting.
>
>> Again, it's a language thing; it's the same issue for both encodings.

I don't buy this. If the JVM doesn't give me (a compiler writer) the direct ability to talk about nullable value types, I can maybe work around that. But there will be seams. It will be confusing. Debugging will be messy.

> So the question is more: should we allow retrofitting a reference type to a value type seamlessly? If the answer is yes, then QFoo <: LFoo is not enough, so we cannot use Q-types, but we can use a side table. If the answer is no, then QFoo <: LFoo is OK; we permit retrofitting an L-type to a Q-type, but user code has to wait until all its dependencies have been updated to use the Q-type before being able to use it.

For the "answer is no" case: in the scenario where I've started using QFoo, but a library still uses LFoo, what can I do?

- Subtyping still works, so I can pass QFoos in. Great!
- When I get LFoos out, I will want to null check them and convert to QFoo. Fine.
- A QFoo[] that I pass in will reject nulls. Which is to be expected. If the semantics demand nullability, I should use an LFoo[] instead.
- Similar with objects I pass in that have LFoo->QFoo bridge methods: there will be null checks; if that's a problem, the objects shouldn't operate on QFoos.
- No new identities or boxes get created. It's the same values passing between the two APIs.
- The library doesn't get the flattening benefits. It needs to make a choice to opt in to them first.

This seems like a fine picture. Ideally, it envisions a language that gives some fine-grained control over whether "Foo" means QFoo or LFoo. Maybe we'll provide that ability in Java -- I don't know. It's nice if the JVM gives languages the ability to make that choice.

From john.r.rose at oracle.com Thu May 10 18:53:27 2018
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 10 May 2018 11:53:27 -0700
Subject: value type hygiene
In-Reply-To: <2066983148.1293217.1525970526700.JavaMail.zimbra@u-pem.fr>
References: <4DE1FC97-835B-435F-AE48-3DE3E60A457D@oracle.com> <77993d79-5097-e9c3-f2f7-d5f744396137@oracle.com> <2066983148.1293217.1525970526700.JavaMail.zimbra@u-pem.fr>
Message-ID: <14A662F2-ACC2-4AB4-8A19-451DF26D6F1A@oracle.com>

On May 10, 2018, at 9:42 AM, Remi Forax wrote:
>
> You can try to bridge inside the implementation, i.e. you have one method, so one signature, with an indy in it (indy lets you access the class metadata and the runtime view) that can decide to which implementations (implemented as side static methods) you want to delegate. I've already done this in the case where you have one method in Java that matches multiple methods in a dynamic language.

Yes, this is something the proposal buys us (by having L-only). The JVM sets up bridges internally, at runtime (or AOT time), using full knowledge of types. (At least, as they are loaded.)

The easiest thing is to assign *one* calling sequence per v-table slot, based on the preferences of the first class allocating the slot. That can fail under obscure circumstances when one v-table slot is constrained by two supers (1 or 2 interfaces), which is rare but possible. For single inheritance, everything works.

I'd like to explore making MI-induced conflicts a CLC failure, under the theory that they won't happen in practice. CLCs, after all, exist to prevent divergence of opinions about the classfile of a named type, and the point of contention is a v-table slot. So this is natural to try as a rider on the CLC rules. (Note that failures only come when, out of two supers in an MI situation, one fails to be recompiled with knowledge of values. E.g., a third-party interface using LocalDate fails to recompile, and fourth-party code does recompile but implements the third-party interface *and* also implements a core interface that also mentions the same descriptor. Sounds rare to me.)

If, after experimentation, we realize that we don't want to tolerate a failure in those corner cases (of legacy APIs mixed in just right), then jump forward to Brian's suggestion, which is simply making up multiple internal/invisible bridges. They only have to be built for default methods, AFAICS. Since they are a pain and a source of bugs, I don't want to do them in our earlier explorations.

Dan could rejoin here (following Remi's method but in reverse) that adding the Q-types doesn't change this story much: the JVM can treat Q-types and L-types as interchangeable, *except* when v-table calling sequences are assigned by the JVM, and when things get flattened.

The thing I like about ValueTypes that Q-types don't give me is that it restrains clever bytecode generators from creating nullable versions of Q-types *for the long haul*.
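A source-form sketch of the rare multiple-inheritance conflict described above, with hypothetical interfaces standing in for the third- and fourth-party code (this compiles today; the conflict itself would only arise under the proposed rules):

    import java.time.LocalDate;

    // Third-party API, not recompiled: still views LocalDate as an object type.
    interface LegacyApi { void accept(LocalDate d); }

    // Core API, recompiled: its ValueTypes attribute would list LocalDate.
    interface ModernApi { void accept(LocalDate d); }

    // One concrete method, one v-table slot, two disagreeing views of the
    // same descriptor: the case that could surface as a CLC-style failure.
    class Impl implements LegacyApi, ModernApi {
        public void accept(LocalDate d) { }
    }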
How about this for a proposal: We do Q-less L-world for now, forcing language compilers to make the val-vs-ref decision consistently *for each whole classfile* and by implication *as globally as possible*, and save for later a new descriptor bit which says "this thing is nullable" (if a value) and/or "this thing is not nullable" (if a ref).

Such descriptors would give us emotional types. They are not needed (and a distraction) to give us value types.

-- John

From brian.goetz at oracle.com Thu May 10 18:53:56 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 10 May 2018 14:53:56 -0400
Subject: value type hygiene
In-Reply-To: References: Message-ID:

As promised, here's the part I'm not so comfortable with. My concern has specifically to do with migration. Suppose V is a value-based class (as per https://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html), which is later migrated to a value type. D has been compiled before this transition, so its view of V is still that it is an object type.

Here's where I think this approach gets too optimistic:

> Well, yes, but remember also that if `Q` started its career as an object type, it was a value-based class, and such classes are documented as being null-hostile. The null-passers were in a fool's paradise.

Objection, Your Honor, assumes facts not in evidence! The letters "n-u-l-l" do not appear in the definition of value-based linked above. Users have no reason to believe any of the following are bad for a VBC, among others:

    V v = null;

    if (v == null) { ... }

    List<V> list = new ArrayList<>();
    list.add(null);

In Q world, these uses of V were compiled to LV rather than QV, so these idioms mapped to a natural and sensible translation. Our story was that when you went to recompile, then the stricter requirements of value-ness would be enforced, and you'd have to fix your code. (That is: making V a value is binary compatible but not necessarily source compatible. This was a pretty valuable outcome.)

One of the possible coping strategies in Q world for such code is to allow the box type LV to be denoted at source level in a possibly-ugly way, so that code like the above could continue to work by saying "V.BOX" instead of "V". IOW, you can opt into the old behavior, and even mix and match between int and Integer. So users who wanted to fight progress had some escape hatches.

While I don't have a specific answer here, I do think we have to back up and reconsider the assumption that all uses of V were well informed about V's null-hostility, and have a more nuanced notion of the boundaries between V-as-value-world and V-as-ref-world.

From john.r.rose at oracle.com Thu May 10 19:06:28 2018
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 10 May 2018 12:06:28 -0700
Subject: value type hygiene
In-Reply-To: <14A662F2-ACC2-4AB4-8A19-451DF26D6F1A@oracle.com>
References: <4DE1FC97-835B-435F-AE48-3DE3E60A457D@oracle.com> <77993d79-5097-e9c3-f2f7-d5f744396137@oracle.com> <2066983148.1293217.1525970526700.JavaMail.zimbra@u-pem.fr> <14A662F2-ACC2-4AB4-8A19-451DF26D6F1A@oracle.com>
Message-ID: <73A0FC3C-C180-41C3-AED5-313C5BD1CA3A@oracle.com>

On May 10, 2018, at 11:53 AM, John Rose wrote:
>
> The easiest thing is to assign *one* calling sequence per v-table slot, based on the preferences of the first class allocating the slot.

P.S. I suppose that key point was too telegraphic.
Trying again:

The easiest thing is to recognize that we can (and do, already) assign *one* v-table slot and highest super-class to each concrete method defined in a class. (Default methods don't get to do this, though.) If the method is not an override (according to descriptor match, not JLS rules), then that class is the super and the v-table slot is fresh. If the method is an override, then it inherits its v-table slot from the method it is overriding.

Given that assignment, we can assign a unique calling sequence per v-table slot, and (equivalently) to the tree of overrides in that slot. Base this on the preferences of the highest class (common super). All concrete methods sharing that same v-table slot must interoperate with the same view of types. The CLCs dictate the same thing, already.

The CLC mechanism may be extended (if we choose) to spread the unique calling sequence constraint through default methods. This would enforce a unique view of val vs. ref across all v-table slots, for each concrete method, even defaults.

The above logic works even if your JVM impl. doesn't use v-tables. The rules simply decorate methods according to the preferences of the highest classes in each override tree.

Can the "highest class" define its method as abstract, not-concrete? Sure; then you have an override forest, not an override tree. So you assign the same calling sequence to the whole forest. And the information is rooted in the abstract method. That's why I started out by saying the information flows from a v-table slot. But it flows through concrete overrides, as well as (optionally) the top method that they all override.

From paul.sandoz at oracle.com Thu May 10 19:39:38 2018
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Thu, 10 May 2018 12:39:38 -0700
Subject: value type hygiene
In-Reply-To: References: Message-ID:

> On May 10, 2018, at 11:53 AM, Brian Goetz wrote:
>
>> Well, yes, but remember also that if `Q` started its career as an object type, it was a value-based class, and such classes are documented as being null-hostile. The null-passers were in a fool's paradise.
>
> Objection, Your Honor, assumes facts not in evidence! The letters "n-u-l-l" do not appear in the definition of value-based linked above. Users have no reason to believe any of the following are bad for a VBC, among others:
>
>     V v = null;
>
>     if (v == null) { ... }
>
>     List<V> list = new ArrayList<>();
>     list.add(null);
>
> In Q world, these uses of V were compiled to LV rather than QV, so these idioms mapped to a natural and sensible translation. Our story was that when you went to recompile, then the stricter requirements of value-ness would be enforced, and you'd have to fix your code. (That is: making V a value is binary compatible but not necessarily source compatible. This was a pretty valuable outcome.)
>
> One of the possible coping strategies in Q world for such code is to allow the box type LV to be denoted at source level in a possibly-ugly way, so that code like the above could continue to work by saying "V.BOX" instead of "V". IOW, you can opt into the old behavior, and even mix and match between int and Integer. So users who wanted to fight progress had some escape hatches.
>
> While I don't have a specific answer here, I do think we have to back up and reconsider the assumption that all uses of V were well informed about V's null-hostility, and have a more nuanced notion of the boundaries between V-as-value-world and V-as-ref-world.
If source files tend to be migrated as whole units rather than bit by bit, then perhaps this problem can be considered similar to compiling with the --release flag? Thus there is no explicit intermixing of the two worlds within one compilation unit.

Paul.

From john.r.rose at oracle.com Thu May 10 20:36:08 2018
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 10 May 2018 13:36:08 -0700
Subject: value type hygiene
In-Reply-To: References: Message-ID:

On May 10, 2018, at 11:53 AM, Brian Goetz wrote:
>
> Objection, Your Honor, assumes facts not in evidence! The letters "n-u-l-l" do not appear in the definition of value-based linked above. Users have no reason to believe any of the following are bad for a VBC, among others:

Objection sustained. You are right; null-hostility is not documented in VBCs. It *is* documented for the VBC Optional:

> A variable whose type is {@code Optional} should never itself be {@code null}; it should always point to an {@code Optional} instance.

-- but not for the VBC LocalDate. So some VBCs that are null-friendly will require a more nuanced migration story, since we don't want value types per se to be nullable.

Options: Make *some* value types accept null according to some ad hoc opt-in API, yuck. Make a nullable type constructor available, like int?, Optional? or Optional.BOX. Or limit nullability to particular uses of the same type. Or create ValueRef as a workaround interface (no VM or language changes!).

The last two are my prefs. The nullable type constructor is fairly principled and arguably useful, but *not necessary* for value types per se, unless it is the cheapest way to migrate. Which IMO it isn't. (N.B. Assuming migration is a goal.)

My high-order concern in all of this is to reduce degrees of freedom in classfiles, from "every descriptor makes its own choice about nullity" down to "each classfile makes its decision about each type's valueness" backed up by "and if they don't agree, the JVM expects the user to fix it by recompilation or special workarounds". This is why I'm not jumping at the shiny possibility of int? and String! "just by adding info to the descriptors"; the JVM complexity costs are large and optional for value types.

> In Q world, these uses of V were compiled to LV rather than QV, so these idioms mapped to a natural and sensible translation. Our story was that when you went to recompile, then the stricter requirements of value-ness would be enforced, and you'd have to fix your code. (That is: making V a value is binary compatible but not necessarily source compatible. This was a pretty valuable outcome.)
>
> One of the possible coping strategies in Q world for such code is to allow the box type LV to be denoted at source level in a possibly-ugly way, so that code like the above could continue to work by saying "V.BOX" instead of "V". IOW, you can opt into the old behavior, and even mix and match between int and Integer. So users who wanted to fight progress had some escape hatches.
>
> While I don't have a specific answer here, I do think we have to back up and reconsider the assumption that all uses of V were well informed about V's null-hostility, and have a more nuanced notion of the boundaries between V-as-value-world and V-as-ref-world.

That's fair. Following is a specific answer to consider, plus a completely different one in a P.S.

I'd like to spell V.BOX as ValueRef, at least just for argument.
More on ValueRef:

    @ForValueTypesOnly
    interface ValueRef<T extends ValueRef<T>> {
        @CanBeFreebieDownCast
        @SuppressWarnings("unchecked")
        default T byValue() { return (T) this; }
    }

Ignore the annotations for a moment. Here's an example use (after LocalDate migrates, and is given ValueRef as a super):

    LocalDate ld = LocalDate.now();
    ValueRef<LocalDate> ldOrNull = ld;
    if (p)  ldOrNull = null;
    ld = (LocalDate) ldOrNull;  // downcast with null check
    ld = ldOrNull.byValue();    // same thing

ValueRef<T> and T are bijective apart from null, with the usual downcast and upcast. Differences between the companion types are:

- T is not nullable (if a VT), while ValueRef<T> is (being an interface)
- converting ValueRef<T> to T requires an explicit cast, the other way is implicit
- you can't call T's methods on ValueRef<T>

The language builds in an upcast for free, but the downcast has to be explicit. If we were to put in the upcast as a "freebie" supplied by the language, then we'd have ourselves a wrapper type, like Integer for int:

    ld = ldOrNull;  //freebie downcast, with null check
    ldOrNull = ld;  //normal upcast

This shows a possible way to associate a box type with each value type, with only incremental changes to JLS and JVMS. (Also worth considering, later on, is imputing methods of T to ValueRef<T>, and conversely methods of Integer to int. Save that for later when we retcon prims as vals.)

The @ForValueTypesOnly annotation means that it is probably useless for object types to implement this interface. The Java compiler should warn or error out if that is attempted. The JVM could refuse ValueRef as a super on object types at class load time, if that would add value.

Since ValueRef<T> and T are both subtypes of Object, it is also possible to use ValueRef<T> as an ad hoc substitute for T when T itself is a generic type parameter that erases to Object. We might be able to pull off stunts like this:

    interface Map<K, V> {
        V.BOX get(K key);
    }

-- where V.BOX erases to V's bound, and instantiates as ValueRef<V> when V implements ValueRef<V> (tricky, huh?) and otherwise instantiates as V itself (non-value types, as today).

Basically, ValueRef can define the special gymnastics we need for nullable value types, without much special pleading to the JLS or JVMS, and then T.BOX can map to either T or ValueRef<T> as needed. (There's also P.BOX waiting in the wings, if P is a primitive type. Maybe we want P.UNBOX.)

Maybe there's a story here. What's important for JVM-level value hygiene is that we seem to have our choice of several stories for dealing with legacy nulls, and that none of our choices forces us into fine-grained per-descriptor declarations about nullity or value-ness.

-- John

P.S. As a completely different solution, we could make a value type nullable in an ad hoc, opt-in manner. This is the one I said "yuck" about above. Here FTR is a simple way to do that.

Define a marker interface which says "this value type can be converted from null".

    @ForValueTypesOnly
    interface NullConvertsToDefault { }

The main thing this would do, when applied as a super to a value type Q, is tell the JVM not to throw NPE when casting to Q, but rather substitute Q's default value. If LocalDate were to implement this marker interface, then null-initialized variables (of whatever source) would cast to LocalDate.default rather than throw NPE. The methods on LocalDate could then opt to DTRT, perhaps even throw NPE for a close emulation of the legacy behavior.
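For completeness, here is a version of the ValueRef workaround that compiles under current (erased) Java, with a hypothetical value-candidate class standing in for a migrated value type. The self-bounded type parameter is a reconstruction of generics lost in the archive, and none of the annotations above are needed for this to run.

    import java.util.HashMap;
    import java.util.Map;

    interface ValueRef<T extends ValueRef<T>> {
        @SuppressWarnings("unchecked")
        default T byValue() { return (T) this; }
    }

    // Hypothetical value-based class playing the role of a migrated value type.
    final class ComplexDouble implements ValueRef<ComplexDouble> {
        final double re, im;
        ComplexDouble(double re, double im) { this.re = re; this.im = im; }
    }

    final class ValueRefDemo {
        public static void main(String[] args) {
            Map<String, ValueRef<ComplexDouble>> m = new HashMap<>();
            m.put("pi", new ComplexDouble(Math.PI, 0.0));
            ValueRef<ComplexDouble> ref = m.get("e");  // nullable side: absent key yields null
            if (ref == null) {
                System.out.println("no e for you");
                return;
            }
            System.out.println(ref.byValue().re);      // non-nullable side, after the check
        }
    }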
Not many adjustments are needed in the JLS to make this a workable tool for migration, but comparisons against null would be necessary: "ld == null" should not return false, but rather translate to something like these instructions:

    checkcast LocalDate
    vdefault LocalDate        #push LD.default
    invokestatic System.substitutableValues

This translation could be made generic across the NCTD option if we were willing for "v == null" to throw NPE:

    checkcast LocalDate
    aconst_null; checkcast LocalDate   //NPE or LD.default
    invokestatic System.substitutableValues

At the language level, comparison with an explicit null would be amended by adding the checkcast:

    //if (ld == null)
    //=> if (ld == (LocalDate)null)  //either NPE or LD.default

It begins to get messy, does it not? I think it's a long string if you pull on it. Where are all the cut points where null converts to default? Should the default be stored back over the null, ever? Are null and default distinct values, or do we try to pretend they are the same? Should default ever convert back to null? Etc., etc.

From forax at univ-mlv.fr Thu May 10 21:34:40 2018
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Thu, 10 May 2018 23:34:40 +0200 (CEST)
Subject: value type hygiene
In-Reply-To: References: <4DE1FC97-835B-435F-AE48-3DE3E60A457D@oracle.com> <490843007.1145546.1525943497000.JavaMail.zimbra@u-pem.fr>
Message-ID: <141285681.1349697.1525988080163.JavaMail.zimbra@u-pem.fr>

> From: "daniel smith"
> To: "Remi Forax"
> Cc: "John Rose" <john.r.rose at oracle.com>, "valhalla-spec-experts"
> Sent: Thursday, May 10, 2018 18:54:48
> Subject: Re: value type hygiene

>> On May 10, 2018, at 3:11 AM, Remi Forax <forax at univ-mlv.fr> wrote:
>> A Q-type (if the root is j.l.Object + interfaces) and having a ValueTypes attribute are two different encodings of the same semantics: either the descriptor is a Q-type, or the descriptor is an L-type and you have a side table that says it's a Q-type.

> Yes, with some huge caveats attached to the attribute strategy:
> - You have to pick one mode for all types of a given class in your class file

Attaching this attribute to a method makes little sense, since you compile/re-compile a Java class as a whole. BTW, attaching the attribute to the nest host is not a good idea, even if it's what is closest to a compilation unit, because it will trigger the classloading of the nest host very early, since the verifier will use this attribute.

> - The semantics are indirect; people will get used to reading them as a property of the class name, when in reality they're a property of a side attribute ("Debugging: I know Foo is a value class, so why is this null slipping through?...")

Here, we are talking about the people that read bytecode; javap can be patched to show in the signature what is considered a value type or not.

> - Descriptor equality is redefined so that non-equal descriptors match (that is, where one descriptor uses a Q type and one uses an L type); adaptations are necessary to make mismatched descriptors cooperate

Yes, see John's mail about trying to do the adaptation while keeping one vtable slot.

> - We'll probably try very hard to present users with the fiction that there is only one type (e.g., in reflection)

Yes, you do not want to show them the bridges/ValueTypes attribute unless they want to know.

>> The main difference between the two encodings is that you have to generate bridges in the case of Q-types.
>> Generating bridges in general is far from obvious (that's why
>> invokedynamic does the adaptation at the call site, btw); you need a
>> subtype relation, like String <: T for generics. If you do not have a
>> subtype relationship, you can not generate bridges.
>> For value types, QFoo <: LFoo is not what we need. For example, we
>> want the following example to work. Let's say I have:
>>   class A {
>>     void m(LFoo)
>>   }
>>   class B extends A {
>>     void m(LFoo)
>>   }
>> Foo is now declared as a value type, and now I recompile B:
>>   class B extends A {
>>     void m(QFoo)
>>   }
>> if I call A::m, I want B::m to be valid at runtime, so QFoo also has
>> to be a super type of LFoo.
>> so the relation between QFoo and LFoo is more like auto-boxing: you
>> have QFoo <: LFoo, but you also have LFoo <: QFoo because of the
>> separate compilation issue, and if you do not have a subtyping
>> relationship between types, you can not generate bridges.

> Tentatively, the bridge generation strategy I envision looks like this:
> - When I convert a class to a value class, I annotate it ("@WasAReferenceClass")
> - When a descriptor mentions a Q type, the compiler also generates an L bridge
> There are problems with this: for example, when mentioning n distinct Q types,
> you need 2^n bridges. And maybe there are things the JVM can do to help--we've
> explored lots of general-purpose "this class has moved" features. My preference
> is to tackle those problems as needed, on their own terms.
> But, yes, I'll grant that probably having the JVM totally ignore the problem
> ultimately won't work.

People will publish articles showing that a few lines of Java can
generate very big vtables at runtime, like there were several articles
on exponential verification time before the split verifier.

>>> - The JVM "knows" internally about the two kinds of types, but we won't give
>>> users the ability to directly express them, or inspect them with reflection.
>>> That mismatch seems bound to bite us repeatedly.

>> The fact that Java the language surfaces whether a type is a value
>> type or not is a language issue, and it's true for both encodings.
>> For reflection, at runtime you know if a class is a value type or not;
>> the same is true for both encodings.
>> If you mean that, at runtime, you can not see if a method was compiled
>> with the knowledge that a type is a value type or not: again, it
>> depends on whether you surface Q-types or the ValueTypes attribute at
>> runtime, so this choice is independent of the encoding.

> The reflection question boils down to: are there two java.lang.Class objects per
> value class, or one? My read of the goals here is that we'd very much like for
> there to be only one, for the same reason that we'd like to not change the
> spelling of descriptors. In that world, I think it will be hard to reason about
> where null checks happen. (Sure, maybe you can figure it out by consulting the
> ValueTypes attributes, but that's a huge pain.)

the VM has to consult it to generate a NPE, so the VM can emit an error
message that says which value type is expected to be non-null.

>>> - We talk a lot about nullability being a migration problem, but it is sometimes
>>> just a really nice feature! All things being equal, not being able to freely
>>> talk about nullable value types is limiting.

>> again, it's a language thing, it's the same issue for both encodings.

> I don't buy this. If the JVM doesn't give me (a compiler writer) the direct
> ability to talk about nullable value types, I can maybe work around that.
> But there will be seams. It will be confusing. Debugging will be messy.

Separate compilation issues are always messy, but if the NPE has a
specific error message, we are a Stack Overflow entry away from people
being able to debug their problem.

>> So the question is more: should we allow retrofitting a reference
>> type into a value type seamlessly?
>> if the answer is yes, then QFoo <: LFoo is not enough, so we can not
>> use Q-types but we can use a side table,
>> if the answer is no, then QFoo <: LFoo is ok, we permit retrofitting
>> an L-type to a Q-type, but user code has to wait until all its
>> dependencies have been updated to use the Q-type before being able to
>> use it.

> For the "answer is no" case: in the scenario where I've started using QFoo, but
> a library still uses LFoo, what can I do?
> - Subtyping still works, so I can pass QFoos in. Great!
> - When I get LFoos out, I will want to null check them and convert to QFoo.
> Fine.
> - A QFoo[] that I pass in will reject nulls. Which is to be expected. If the
> semantics demand nullability, I should use an LFoo[] instead.
> - Similar with objects I pass in that have LFoo->QFoo bridge methods: there will
> be null checks; if that's a problem, the objects shouldn't operate on QFoos.
> - No new identities or boxes get created. It's the same values passing between
> the two APIs.
> - The library doesn't get the flattening benefits. It needs to make a choice to
> opt in to them first.
> This seems like a fine picture. Ideally, it envisions a language that gives some
> fine-grained control over whether "Foo" means QFoo or LFoo. Maybe we'll provide
> that ability in Java--I don't know. It's nice if the JVM gives languages the
> ability to make that choice.

You can do everything you list apart from the first item if the answer
is no; Q-types and the ValueTypes attribute are two encodings of the
same problem.
The idea of the ValueTypes attribute is to encode the compiler's
knowledge about value types at the time the code was compiled, so the
VM can do the adaptations/checks needed to allow users to upgrade a
reference type to a value type.

Rémi

From forax at univ-mlv.fr  Thu May 10 22:08:57 2018
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Fri, 11 May 2018 00:08:57 +0200 (CEST)
Subject: value type hygiene
In-Reply-To: <73A0FC3C-C180-41C3-AED5-313C5BD1CA3A@oracle.com>
References: <4DE1FC97-835B-435F-AE48-3DE3E60A457D@oracle.com>
 <77993d79-5097-e9c3-f2f7-d5f744396137@oracle.com>
 <2066983148.1293217.1525970526700.JavaMail.zimbra@u-pem.fr>
 <14A662F2-ACC2-4AB4-8A19-451DF26D6F1A@oracle.com>
 <73A0FC3C-C180-41C3-AED5-313C5BD1CA3A@oracle.com>
Message-ID: <20053177.1351568.1525990137944.JavaMail.zimbra@u-pem.fr>

Hi John,
it's an implementation detail, right!
The strawman strategy is to always consider that you have to send a
pointer, so you need to buffer value types before calling a virtual
method; if it's not a virtual method, you can do the adaptation because
you know the caller and the callee. All other strategies should be
semantically equivalent.

Rémi

> From: "John Rose" 
> To: "Remi Forax" 
> Cc: "valhalla-spec-experts" 
> Sent: Thursday, May 10, 2018 21:06:28
> Subject: Re: value type hygiene

> On May 10, 2018, at 11:53 AM, John Rose <john.r.rose at oracle.com> wrote:
>> The easiest thing is to assign *one* calling sequence per v-table
>> slot, based on the preferences of the first class allocating the slot.

> P.S. I suppose that key point was too telegraphic.
> Trying again:
> The easiest thing is to recognize that we can (and do, already)
> assign *one* v-table slot and highest super-class to each concrete
> method defined in a class. (Default methods don't get to do this,
> though.) If the method is not an override (according to descriptor
> match, not JLS rules), then that class is the super and the v-table
> slot is fresh. If the method is an override, then it inherits its v-table
> slot from the method it is overriding.
> Given that assignment, we can assign a unique calling sequence
> per v-table slot, and (equivalently) to the tree of overrides in that
> slot. Base this on the preferences of the highest class (common
> super). All concrete methods sharing that same v-table slot
> must interoperate with the same view of types. The CLCs
> dictate the same thing, already. The CLC mechanism may
> be extended (if we choose) to spread the unique calling sequence
> constraint through default methods. This would enforce a
> unique view of val vs. ref across all v-table slots, for each
> concrete method, even defaults.
> The above logic works even if your JVM impl. doesn't use
> v-tables. The rules simply decorate methods according to
> the preferences of the highest classes in each override
> tree.
> Can the "highest class" define its method as abstract,
> not-concrete? Sure; then you have an override forest,
> not an override tree. So you assign the same calling
> sequence to the whole forest. And the information is
> rooted in the abstract method. That's why I started out
> by saying the information flows from a v-table slot.
> But it flows through concrete overrides, as well as
> (optionally) the top method that they all override.

From john.r.rose at oracle.com  Fri May 11 01:36:33 2018
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 10 May 2018 18:36:33 -0700
Subject: value type hygiene
In-Reply-To: <20053177.1351568.1525990137944.JavaMail.zimbra@u-pem.fr>
References: <4DE1FC97-835B-435F-AE48-3DE3E60A457D@oracle.com>
 <77993d79-5097-e9c3-f2f7-d5f744396137@oracle.com>
 <2066983148.1293217.1525970526700.JavaMail.zimbra@u-pem.fr>
 <14A662F2-ACC2-4AB4-8A19-451DF26D6F1A@oracle.com>
 <73A0FC3C-C180-41C3-AED5-313C5BD1CA3A@oracle.com>
 <20053177.1351568.1525990137944.JavaMail.zimbra@u-pem.fr>
Message-ID: 

On May 10, 2018, at 3:08 PM, forax at univ-mlv.fr wrote:
> 
> The strawman strategy is to always consider that you have to send a
> pointer, so you need to buffer value types before calling a virtual
> method; if it's not a virtual method, you can do the adaptation
> because you know the caller and the callee. All other strategies
> should be semantically equivalent.

Calling sequence is just an implementation choice, but there are two
ways it can poke up into the user model. Here's my understanding of how
that works, and what our options are, FTR.

One of the motivations for using ValueTypes (or ad hoc V/Q variation)
instead of ACC_FLATTENABLE is being able to assign scalarized calling
sequences uniformly across an override tree (i.e. the methods reached
by virtual calls which use the same v-table entry). Buffering and
scalarizing are semantically equivalent except buffering also supports
null. On the other hand, scalarizing is faster in many cases.

For any given component of the shared method descriptor of an override
tree, if we can prove that all methods in the tree are null hostile
(throwing NPE on entry and/or never returning null), then we can hoist
this property outside the tree.
We can throw NPE at the call site, instead of after the virtual
dispatch into a method. Then we get the payoff: All methods in the
override tree may be compiled to use a scalarized representation of the
affected argument or return type. All virtual calls into that tree use
a non-buffered representation, too.

There could be an interface default method, or some other method, which
is simultaneously a member of two trees with two different decisions
about scalarization vs. buffering. This can be handled by having the
JVM create multiple adapters. I'd rather forbid the condition that
requires multiple adapters as a CLC violation, because it is
potentially complex and buggy, and it's not clear we need this level of
service from the JVM.

If I'm right, this is one way calling sequences can poke up into the
user model. In the end, the JVM could go the extra mile and spin
adapters. (We spin multiple method adapters, after all, for on-stack
replacement--which is a rare but provably valuable optimization. First
releases of the JVM omitted this optimization, until it was proven
valuable, and then we put in the effort. I like the move of deferring
optimizations which are not yet proven valuable!)

We can't always get the scalarization payoff, though. If legacy code
is making virtual calls into the override tree (via the single v-table
slot), *and* if there is at least one legacy method *in* the override
tree, then we can concoct cases where a null is significant and must
not be rejected by the common calling sequence used by the tree. At
that point buffering is a forced move, or else we declare that the
program does not fully enjoy its legacy behaviors. (See below for an
example, where 'Bleg' makes a legacy call to its own legacy method.)

The other way the calling sequence of an override tree pokes up into
the user model is this: if we declare an override tree to be hostile to
nulls (on some method descriptor component type), then dynamically
loaded legacy code could come late to the party and add a null-loving
method to the override tree. At that point, the legacy code cannot
fully enjoy legacy semantics. There's a choice here: When the legacy
method shows up in a scalarizing override tree, either reject the class
on CLC grounds, or allow the method but firewall it from nulls. That
is, virtual calls to the legacy method will be forbidden from returning
null, and they will never see null arguments, even if the legacy code
is expecting to do something useful with them. I am proposing the
firewall instead of the harsher CLC.

Again, the JVM could go the extra mile to make this problem disappear,
by re-organizing the calling sequence of the override tree as soon as
the first legacy method shows up. For simplicity I'd rather exclude
this tactic until forced by experience to add it. It seems like a
heroic optimization to me, seldom used and likely to be buggy. It also
seems to spoil the whole performance party when one bad actor shows up,
which looks dubious to me.

I think we need to experiment with a restrictive model that allows easy
scalarization across override trees, with firewalling of legacy methods
in override trees defined by modern classes, and with CLC-like
rejection of modern interfaces mixing into override trees which have
already been classified as legacy trees (w.r.t. some particular method
descriptor component).
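To make the firewall concrete, here is a hand-written model in plain
Java of the check the JVM would generate (hypothetical names; Object
stands in for the value type Q, and callM stands in for the v-table
dispatch path):

  class A { Object m(Object q) { return q; } }

  // Model of a null-hostile virtual call into the override tree of A.m:
  // the null check is hoisted to the call site, before dispatch, so
  // every override (even a firewalled legacy one) may assume a non-null
  // argument and can receive it in scalarized form.
  static Object callM(A receiver, Object q) {
    if (q == null)
      throw new NullPointerException("Q is a value type; null rejected at call site");
    return receiver.m(q);
  }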
(FTR, there's also the option of polymorphic calls, where the virtual
calling sequence does a both-and, passing either buffered or scalarized
arguments, and using some convention for the caller to say which is
which. The callee would respond appropriately. This is likely to be
slower than buffering in some cases, but it could be given an
optimistic fast path sort of like invokeExact has.)

None of these problems occur with distinct Q and L descriptors, since
if you want scalarization you just say Q and legacy code can't bother
you. L-world adds ambiguity, which is why we are having this
discussion about override trees and calling sequences. Resolving the
ambiguity with the ValueTypes attribute reduces the complexity of the
problem. In fact, I think the problem is clearly manageable *if* the
JVM is allowed to exclude hard cases on CLC-like grounds. If the JVM
is required to go the extra mile and reorganize calling sequences on
the fly, then we should consider whether going back to Q-world is
easier, but in that case the same problems of migration appear
elsewhere, with many adapters, probably more than even the worst case
in L-world.

-- John

P.S. Here's an example of the misadventures of a late-to-the-party
legacy method.

  #ValueTypes(Q)
  class A { Q m(Q q) { return q; } }   // modern
  // A.m(null) ==> NPE

  #ValueTypes(Q)
  class A2 extends A { Q m(Q q) { return q; } }   // modern
  // A2.m(null) ==> NPE

  A a = p ? new A() : new A2();
  // a.m(null) ==> NPE

  // (no ValueTypes attr)
  class Bleg extends A { Q m(Q q) { return q; } }   // legacy method
  // Bleg.m(null) == null
  // choices at this point:
  // - firewall m so it never sees null,
  // - refuse to load Bleg (b/c CLCs)
  // - heroically refactor override tree of A.m

  A ab = new Bleg();
  // ab.m(null) ==> NPE if firewall
  // ab.m(null) == null if heroic refactor (loss of perf. too)

  #ValueTypes(Q)
  class Client { static {
    Bleg b = new Bleg();
    b.m(null);   //==> NPE b/c of local knowledge
  }}

  // (no ValueTypes attr)
  class Cleg { static {
    A a = new A2();
    a.m(null);   //==> NPE b/c A2.m is null hostile and/or whole A.m override tree
    Bleg b = new Bleg();
    b.m(null);   //==> NPE b/c of JVM-assigned firewall on Bleg.m <: A.m
    //b.m(null) == null if heroic refactor (loss of perf. too)
  }}

Even if Bleg and Cleg privately agree that Q is nullable, if Cleg makes
an invokevirtual of the Bleg.m method, it will get the consensus of the
modern override tree of A.m, which is to throw NPE on null Q. Bleg and
Cleg can be the same class, in which case the class is making calls to
itself, but still gets null rejection due to a policy decision in a
supertype. Is this tolerable or not? If not, should we forbid Bleg
from loading, on CLC-like grounds? I'd like to experiment, first with
the firewalling option.

From frederic.parain at oracle.com  Fri May 11 14:39:14 2018
From: frederic.parain at oracle.com (Frederic Parain)
Date: Fri, 11 May 2018 10:39:14 -0400
Subject: value type hygiene
In-Reply-To: 
References: 
Message-ID: <9D5A86F2-0F1F-4E23-BC51-7A69D6376905@oracle.com>

John,

I have a question about the semantics within legacy class files (class
files lacking a ValueTypes attribute). Your document clarifies the
semantics for fields as follows:

"Meanwhile, C is allowed to putfield and getfield null all day long
into its own fields (and fields of other benighted legacy classes that
it may be friends with). Thus, the getfield and putfield instructions
link to slightly different behavior, not only based on the format of
the field, but also based on "who's asking".
Code in C is allowed to witness nulls in its Q fields, but code in A
(upgraded) is not allowed to see them, even though it's the same
getfield to the same symbolic reference. Happily, fields are not shared
widely across uncoordinated classfiles, so this is a corner case mainly
for testers to worry about."

But what about arrays? If I follow the same logic that "old code needs
to be left undisturbed if possible", if a legacy class C creates an
array of Q, without knowing that Q is now a value type, C would expect
to be allowed to write and read null from this array, as it does from
its own fields. Is that a correct assumption?

This would mean that the JVM would have to make the distinction between
an array of nullable elements, and an array of non-nullable elements.
Which could be a good thing if we want to catch leaking of arrays with
potentially null elements from old code to new code, instead of waiting
for new code to access a null element to throw an exception. On the
other hand, the lazy check solution allows arrays of non-nullable
elements with zero null elements to work fine with new code.

From an implementation point of view, the JVM already has to make the
distinction between flattened and not flattened arrays, so there's
logic in place to detect some internal constraints of arrays, but the
nullable/non-nullable element semantics would require one additional
bit.

Fred

> On May 6, 2018, at 05:17, John Rose wrote:
> 
> Like many of us, I have been thinking about the problems of keeping values, nulls,
> and objects separate in L-world. I wrote up some long-ish notes on the subject.
> I hope it will help us wrap our arms around the problem, and get it solved.
> 
> TL;DR: Remi was right in January. We need a ValueTypes attribute.
> 
> http://cr.openjdk.java.net/~jrose/values/value-type-hygiene.html
> 
> Cheers!
> -- John
> 
> P.S. Raw markdown source follows for the record.
> http://cr.openjdk.java.net/~jrose/values/value-type-hygiene.md
> 
> # Value Type Hygiene
> 
> #### May 2018 _(v. 0.1)_
> 
> #### John Rose and the Valhalla Expert Group
> 
> Briefly put, types in L-world are ambiguous, leading to unhygienic
> mixtures of value operations with reference operations, and
> uncontrolled pollution from `null`s infecting value code.
> 
> This note explores a promising proposal for resolving the key
> ambiguity. It is a cleaner design than the ad hoc mechanisms tried so
> far. The resulting system would seem to allow more predictable and
> debuggable behavior, a stronger backward compatibility story, and
> better optimization.
> 
> ## Problem statement
> 
> In the _L-world_ design for value types, the classfile type descriptor
> syntax is left unchanged, and the pre-existing descriptor form
> `"LFoo;"` is overloaded to denote value types as well as object types.
> A previous design introduced new descriptors for value types of the
> form `"QFoo;"`, and possibly a union type `"UFoo;"`. This design
> might be called _Q-world_. In comparison with Q-world, the L-world
> design approach has two advantages--compatibility and migration--but
> also one serious disadvantage: ambiguity.
> 
> L-world is _backward compatible_ with tools that must parse classfile
> descriptors, since it leaves descriptor syntax unchanged. There have
> been no changes to this syntax in almost thirty years, and there is a
> huge volume of code that depends on its stability.
> The HotSpot JVM
> itself makes hundreds of distinct decisions based on descriptor syntax
> which would need careful review and testing if they were to be adapted
> to take account of a new descriptor type (`"QFoo;"`, etc.).
> 
> Because of its backward compatibility, L-world also has a distinctly
> simpler _migration story_ than previous designs. Some _value-based
> classes_, such as `Optional` and `LocalTime`, have been engineered to
> be candidates for migration to proper value types. We wish to allow
> such a migration without recompiling the world or forcing programmers
> to recode uses of the migrated types. It is very difficult to sustain
> the illusion in Q-world that a value type `Qjava/util/Optional;` can
> be operated on in old code under the original object type
> `Ljava/util/Optional;`, since the descriptors do not match and a
> myriad of adapters must be spun (one for every mention of the wrong
> descriptor). With L-world, we have the simpler problem (addressed in
> this document) of keeping straight the meaning of L-descriptors
> in each relevant context, whether freshly recompiled or legacy
> code; this is a simpler problem than spinning adapters.
> 
> But not all is well in L-world. The compatibility of descriptors
> implies that, when a classfile must express a semantic distinction
> between a value type and an object type, it must be allowed to do
> so unambiguously, in a side channel outside of the descriptor.
> 
> Our first thought was, "well, just load all the value types and then
> you will know the list of them". If we have a global registry of
> classes (such as the HotSpot system dictionary), nobody needs to
> express any additional distinctions, since everybody can just ask the
> registry which are the value types.
> 
> This simple idea has a useful insight, but it goes wrong in three
> ways. First, for some use cases such as classfile transformation, it
> might be difficult to find such a global registry; in some cases we
> might prefer to rely on local information in the classfile. We need a
> way for a classfile to encode, within itself, which types it is using
> as value types, so that all viewers of the classfile can make
> consistent decisions about what's a value and what's not.
> 
> Second, if we are running in the JVM, the global registry of value
> types has to be built up by loading classfiles. In order for every
> classfile that _uses_ a value type to know its status, the classfile
> that _defines_ the value type must be loaded _first_. But there is no
> way to totally order these constraints, since it is easy to create
> circular dependencies between value types, either directly or
> indirectly. (N.B. Well-foundedness rules for layout don't eliminate
> all the possible circularities.) And it won't work to add more
> initialization phases ("first load all the classfiles, then let them
> all start asking questions about their contents"), because that would
> require preloading a classfile for every potential value type
> mentioned in some other classfile. That's every name in every
> `"LFoo;"` descriptor. Loading a file for every name mentioned
> anywhere is very un-Java-like, and something that drastic would be
> required in order to make correct decisions about value types.
> 
> That leads to the third problem, which comes from our desire to make a
> migration story. Some classfiles need to operate on value types as if
> they were object references. (Below, we will see details of how
> operations can differ between value and reference types.)
This means > that, if we are to support migration, we need a way for legacy > classfiles to make a _local_ decision to treat a given type as a > reference type, for backward compatibility. Luckily, this is > possible, but it requires a local indication in the classfile so the > JVM can adjust certain operations. > > A solution to these problems requires a way for each classfile to > declare how it intends to use each type that is a value type, and > (what is more) a way for legacy classfiles to peacefully interoperate > with migrated value types. We have experimented with various partial > solutions, such as adding an extra bit in a context where a value type > may occur, to let the JVM know that the classfile intends a value > type. (This is the famous `ACC_FLATTENABLE` bit on fields.) But it > turns out that the number of places where value-ness is significant is > hard to limit to just a few spots where we can sprinkle a mode bit. > We need a _comprehensive_ solution that can clearly and consistently > define a classfile's (local) view of the status of each type it works > with, so that when the "value or reference?" question comes up, there > is a clear and consistent answer. We need to prevent the values and > the references from polluting each other; we need _value type > hygiene_. > > ## Value vs. reference operations > > Value types can be thought of as simpler than reference types, because > they lack two features of reference types: > > - _identity:_ Two value types with the same immediate components are > indistinguishable, even if they were created by different code > paths. Objects, by contrast, "remember" when they were created, > and each object is a unique identity. Identities are > distinguished using the `acmp` family of instructions, and Java's > `==` operator. > > - _nullability:_ Any variable of any reference type can store the > value `null`; in fact, `null` is the initial value for fields and > array elements. So `null` is one of the possible values of any > reference type, including `Object` and all interfaces. By > contrast, `null` is _not_ the value of any value type. Value type > variables are not nullable, because `null` is a reference. (But > read on for an awkward exception.) The type `Object` can > represent all values and references. Casting an unknown operand > of type `Object` to a value type `Foo` must succeed if in fact the > operand is of type `Foo`, but a null `Object` reference must never > successfully cast to a value type. > > This strong distinction between values and references is inspired, in > part, by the design of Java's primitive types, which also are identity > free and are not nullable. Every copy of the `int` value 42 is > completely indistinguishable from every other copy, and you can't cast > a `null` to `int` (without a null pointer exception). We hope > eventually to unify value types and primitives, but even if this > never comes to pass, our design slogan for value types is, _codes > like a class, works like an int_. > > By divesting themselves of identity and nullability, value types are > able to enjoy new behaviors and optimizations akin to those of > primitives, notably flattening in the heap and scalarization in > compiled code. > > To unlock these benefits, the JVM must treat values and references > as operationally distinct. Some of these operational distinctions > are quite subtle; some required months of discussion to elucidate, > though soon (we hope) they will be obvious in hindsight. 
> 
> Here is a partial list of cases where the JVM should be able to
> distinguish value types from reference types:
> 
> - _instance fields:_ A value field should be flattened (if possible)
>   to components in adjacent memory words. A reference field must
>   not be flattened, in order to retain identity and store the null
>   reference.
> - _static fields:_ A static field must be properly initialized
>   to the default value of its type, not to null. This holds true
>   for all fields, in fact. Flattening does not seem to be important
>   for static fields.
> - _array elements:_ An element of a value array (array whose
>   component type is a value type) should flatten its elements and
>   arrange them compactly in successive memory locations. Such
>   an array must be initialized to the default value of its value
>   type, and never to `null`.
> - _methods:_ A value parameter or return value should be
>   flattened (if possible) to components in registers. A reference
>   must not be treated this way, because of identity and nullability.
> - _verifier:_ The verifier needs to know value types, so it can
>   forbid inapplicable operations, such as `new` or `monitorenter`.
> - _conversions:_ The `checkcast` operator for a value type might
>   reject `null` (as well as rejecting instances of the wrong type).
>   The `ldc` of a dynamic constant of value type must not produce
>   `null` (instead it must fail to link).
> - _comparisons:_ The `acmp` operator family must not detect
>   value type identities (since they are not present), so it must
>   operate differently on values and references. In some cases,
>   the verifier might reject `acmp` altogether.
> - _optimization:_ The JIT needs to know whether it can discard
>   any internal reference (for a value type) and just explode the
>   value into registers. The possibility of `null` mixing with
>   value types complicates such optimizations.
> 
> This list can be tweaked to make it shorter, by adjusting the rules in
> ways that lessen the impact of ambiguity in type names. The list is
> also incomplete. (We will add to it later.) Each point of
> distinction is the subject of detailed design trade-offs, many of
> which we are sketching here.
> 
> Some of these distinctions can be pushed to instruction link time
> (when resolved value classes may be present) or run time (when the
> actual values are on stack). A dynamic check can make a final
> decision, after all questions of value-ness are settled. This seems
> to be a good decision for `acmp`. The linkage of a `new` instruction
> can fail on a value class, or a `checkcast` instruction can reject a
> `null`, when the class is inspected as part of the dynamic execution
> of operations like `new`.
> 
> But this delaying tactic doesn't always work. For example, field
> layout must be computed during class loading, which (as was seen
> above) is too early to use the supposed global list of value types.
> 
> Even if some check can be delayed, like the detection of an erroneous
> `new` on a value type, we may well decide it is more useful (or
> "hygienic") to detect the error earlier, such as at verification time,
> so that a broken program can be detected before it starts to run.
> 
> Also, some operations may be contextual, to support backward
> compatibility. Thus, `checkcast` may need to consult the local
> classfile about whether to reject nulls, so that legacy code won't
> suddenly fail to verify or execute just because it mixes nulls with
> (what it thought were) references.
> Basically, a "legacy checkcast"
> should work correctly with nulls, while an "upgraded checkcast" should
> probably reject nulls immediately, without requiring extra tests.
> 
> We will examine these points in more detail later, but now we need to
> examine how to contextualize information about value types.
> 
> ## Towards a solution
> 
> What is to be done? The rest of this note will propose some solutions
> to the problem of value type hygiene, and specifically the problem of
> preventing nulls from mixing with values ("null hygiene").
> 
> Both Remi Forax[[1]] and Frederic Parain[[2]] have proposed the idea
> of having each classfile explicitly declare the list of all value
> types that it is using. For the record, this author initially
> resisted the idea[[3]] as overkill: I was hoping to get away with a
> band-aid (`ACC_FLATTENABLE`), but have since realized we need a more
> aggressive treatment. Clean and tidy behavior from the JVM will make
> it easier to implement clean and tidy support for value types in the
> Java language.
> 
> [[1]]: http://mail.openjdk.java.net/pipermail/valhalla-dev/2018-January/003685.html
> [[2]]: http://mail.openjdk.java.net/pipermail/valhalla-dev/2018-January/003699.html
> [[3]]: http://mail.openjdk.java.net/pipermail/valhalla-dev/2018-January/003687.html
> 
> Throughout the processing of the classfile, the list can serve as a
> reliable local registry of decisions about values vs. references.
> First we will sketch the attribute, and then revisit the points above
> to see how the list may be used.
> 
> ## The `ValueTypes` attribute
> 
> As proposed above, let us define a new attribute called `ValueTypes`
> which is simply a counted array of `CONSTANT_Class` indexes. Each
> indexed constant is loaded and checked to be a value type. The JVM
> uses this list of locally declared value types for all further
> decisions about value types, relative to the current class.
> 
> As a running reference, let's call the loaded class `C`. `C` may be
> any class, either an object or a value. The value types locally
> declared by `C` we can call `Q`, `Q1`, `Q2`, etc. These are exactly
> the types which would get `Q` descriptors in Q-world.
> 
> As an attribute, `ValueTypes` is somewhat like the `InnerClasses`
> attribute. Both list all classes, within the context of a particular
> classfile, which need some sort of special processing. The
> `InnerClasses` attribute includes additional data for informing the
> special processing (including the breakdown of "binary names" into
> outer and inner names, and extra modifier bits), but the `ValueTypes`
> attribute only needs to mention the classes which are known to be
> value types.
> 
> Already with the `ACC_FLATTENABLE` bit we have successfully defined
> logic that pre-loads a supposed value type, ensures that it _is_ in
> fact a value type, and then allows the JVM to use all of the necessary
> properties of that value type to improve the layout of the current
> class. The classes mentioned in `ValueTypes` would be pre-loaded
> similarly. In fact, the `ACC_FLATTENABLE` bit is no longer needed,
> since the JVM can simply flatten all fields whose type names are
> mentioned in the local `ValueTypes` list.
> 
> We now come to the distinction between properly resolved classes
> (`CONSTANT_Class` entries) and types named in descriptors. This
> distinction is important to keep in mind. Once a proper class
> constant `K` is resolved by `C`, everything is known about it, and a
> permanent link to `K` goes into `C`'s constant pool.
> The same is not
> true of other type names that occur within field and method
> descriptors. In order for `C` to check whether its field type `"LK;"`
> is a value type, it must _not_ try to resolve `K`. Instead it must
> look for `K` _by name_ in the list of locally declared value types.
> Later on, when we examine verifier types and the components of method
> descriptors, a similar by-name lookup will be necessary to decide
> whether they refer to value types. Thus, there are two ways a type
> can occur in a classfile and two ways to decide if it is a value type:
> By resolving a proper constant `K` and looking at the metadata, and by
> matching a name `"LK;"` against the local list. Happily, the answers
> will be complete and consistent if all the queries look at the same
> list.
> 
> So a type name can be classified as a value type without resolution,
> by looking for the same name among the names in the list of declared
> value types. And this can be done even before the list of declared
> value types is available. This means that any particular declared
> value type might not need to be loaded until "hard data" is required
> of it. A provisional determination of the value status of some `Q`
> can be made very early, before `Q`'s classfile is actually located and
> pre-loaded. That provisional answer might be enough to check some
> early structural constraint. It seems reasonable to actually pre-load
> the `Q` values lazily, and only when the JVM needs hard data about
> `Q`, like its actual layout, or its actual supers.
> 
> What if an element of `ValueTypes` turns out to be a reference type?
> (Perhaps someone deployed a value-type version of `Optional` but then
> got cold feet; meanwhile `C` is still using it under the impression it
> is a value type.) There are two choices here, loose and strict:
> either pretend the type wasn't there anyway, or raise an error in the
> loading of the current classfile. The strict behavior is safer; we
> can loosen it later if we find a need. The case of an element failing
> to load at all can be treated like the previous problem, either
> loosely or strictly; strict is better all else being equal.
> 
> The strict treatment is also more in line with how to treat failed
> resolution of super-types, which are a somewhat similar kind of
> dependency: Super-types, like value types, are loaded as early as
> possible, and play a role in all phases of classfile loading, notably
> verification.
> 
> One corollary of making the list an attribute is that it can be easily
> stripped, just like `InnerClasses` or `BootstrapMethods`. Is this a
> bug or a feature? In the case of `InnerClasses`, stripping the
> attribute doesn't affect execution of the classfile but it does break
> some Core Reflection queries. In the case of `BootstrapMethods`, the
> structural constraints on dynamic constant pool constants will break,
> and the classfile will promptly fail to load. The effect of removing
> a `ValueTypes` attribute is probably somewhere in between. Because
> L-world types are ambiguous, and because we specifically allow value
> types to be used as references from legacy classfiles (for migration),
> there's always a way to "fake" enough reference behavior from a value
> type in a classfile which doesn't make special claims about it. So it
> seems reasonable to allow `ValueTypes` to be stripped, at least in
> principle.
> At a worst case the classfile will fail to load, as in the
> case of a stripped `BootstrapMethods`, but the feature might actually
> prove useful (say, for before-and-after migration testing).
> 
> Note that in principle a classfile generator could choose to ignore a
> value type, and treat it as a (legacy) reference type. Because of
> migration, the JVM must support at least some such moves, but such
> picking and choosing is not the center of our design. In particular,
> we do not want the same compilation unit to treat a type as a value
> type in one line of code, and a reference type in the next. This may
> come later, if we decide to introduce concepts of nullable values
> and/or value boxes, but we think we can defer such features.
> 
> So for now, classfiles may differ among themselves about which types
> are value types, but within a single classfile there is only one
> source of local truth about value types. (Locally-sourced, fresh,
> hygienic data!)
> 
> ## Value types and class structure
> 
> Very early during class loading, the JVM assigns an instance layout to
> the new class `C`. Before that point it must first load the declared
> value types (`Q1`, `Q2`, ...), and then recursively extract the layout
> information from each one. There is no danger of circularity in this
> because a value type instance cannot contain another instance of
> itself, directly or indirectly.
> 
> Both non-static and static fields of value type make sense (because a
> value "works like an int"). But static fields interact differently
> with the loading process than non-static fields.
> 
> A static value type field has no enclosing instance, unless the JVM
> chooses to make one secretly. Therefore it doesn't need to be
> flattened. The JVM can make an invisible choice about how to store a
> static value type field:
> 
> - Buffered immutably on the heap and stored by (invisible) reference
>   next to the other statics. The `putstatic` instruction would
>   put the new value in a _different_ buffer and change the pointer.
> - Buffered mutably somewhere, with the pointer stored next to
>   the other statics, or in metadata. The `putstatic` instruction
>   would store the flattened value into the _same_ buffer.
> - Flattened fully into the same container as the other statics.
> 
> The first option seems easiest, but the second might be more
> performant. The third is difficult due to bootstrapping concerns.
> 
> In fact, the same implementation options apply for non-statics as for
> statics, but only the third one (full flattening) is desirable. The
> first one (immutable buffering) may be useful as a fallback
> implementation technique for special cases like jumbo values and
> fields which are `volatile`, and thus need to provide atomicity.
> 
> The root container for all of `C`'s statics, in HotSpot, happens to be
> the relevant `java.lang.Class` value `C.class`. Presumably it's a
> good place to put the invisible pointers mentioned above.
> 
> A static field of value type `Q` cannot make its initial value
> available to `getstatic` until `Q`'s `<clinit>` method runs (or, in
> the case of re-entrant initialization, has at least started). Since
> classes can circularly refer to instances of each other via static
> references, `Q` might return the favor and require materialization of
> `C`.
> 
> The first time `C` requires `Q`'s default value, if `Q` has not been
> initialized, its `<clinit>` method should run.
> This may trigger
> re-entry into the initializer for `C`, so `Q` needs to get its act
> together _before_ it runs its `<clinit>`, and immediately create `Q`'s
> own default value, storing it somewhere in `Q`'s own metadata (or else
> the `Class` mirror looks like a good spot). The invariant is that,
> before `Q`'s class initializer can run one bytecode, the default value
> for `Q` is created and registered for all time. Creating the default
> value before the initializer runs is odd but harmless, as long as no
> bytecode can actually access the default value without triggering
> `Q`'s initialization.
> 
> This also implies that `C` should create and register its own default
> value (if it is a value type) before it runs its own `<clinit>`
> method, lest `Q` come back and ask `C` for its default value.
> 
> The JVM may be required to bootstrap value-type statics as invisible
> null pointers, which are inflated (invisibly by the `getstatic` and/or
> `putstatic` instructions) into appropriate buffers, after ensuring the
> initialization of the value type class. But it seems possible that if
> the previous careful sequencing is observed, there is no need to do
> lazy inflation of nulls, which would simplify the code for `getstatic`
> and `putstatic`.
> 
> ## Value types and method linkage
> 
> A class `C` includes methods as well as fields, of course. A method
> can receive or return a value type `Q` simply by mentioning `Q` as a
> component of its method descriptor (as an L-descriptor `"LQ;"`).
> 
> If a method `C.m()LD;` mentions some type `D` which is not on the
> declared list, then that type `D` will be treated, like always, as a
> nullable, identity-bearing reference.
> 
> Interestingly, migration compatibility requires this to be the case
> whether or not `D` is in actual fact a value type. If `C` is
> unconscious of `D`'s value-ness, the JVM must respect this, and
> preserve the illusion that `D` values are "just references, nothing to
> see here, move along". Perhaps `D` is freshly upgraded to a value
> type, and `C` isn't recompiled yet. `C` should not be penalized
> for this change, if at all possible.
> 
> This points to a core decision of the L-world design, that nearly all
> of the normal operations on object references "just do the right
> thing" when applied to value types. The two kinds of data use the
> same descriptor syntax. Value types can be converted to `Object`
> references, even though the resulting pseudo-reference does not expose
> any identity (and will never be null). Instructions like `aload`
> operate on values just as well as references, and so on.
> 
> Basically, values in L-world routinely go around lightly disguised as
> references, special pseudo-references which do not retain object
> identity. As long as nobody looks closely, the fiction that they are
> references is unbroken. If someone tries a `monitorenter`
> instruction, the game is over, but we think those embarrassing moments
> will be rare.
> 
> On the other hand, if a method `C.m()LQ;` uses a locally-declared
> value type, then the JVM has some interesting options. It may choose
> to notice that the `Q`-value is not nullable and has no identity. It
> can adjust the calling sequence of `m` to work with undisguised "naked
> values", which are passed on the stack, or broken into components for
> transport across the method API. This would almost be a purely
> invisible decision, except that naked values cannot be null, and so
> such calling sequences are hostile to null. Again, it "works like an
> int".
> A null `Integer` value will do just the same thing if you try
> to pass it to an `int`-receiving method. So we have to be prepared
> for an occasional embarrassing NPE, when one party thinks a type is a
> nullable reference type and the other party knows it's a value type.
> 
> One might think that it is straightforward to assign a value-using
> method a calling sequence by examining the method signature and the
> locally declared value types of the declaring class. But in fact
> there are non-local constraints. Only static and private methods
> can easily be adjusted to work with naked values.
> 
> Unlike fields, methods can override similar methods in some `C`'s
> super-type `S`. This immediately leads to the possibility of `C` and
> `S` differing as to the status of some type `X` in the method's
> signature. If neither of the `ValueTypes` lists of `C` and `S`
> mentions `X`, then the classes are agreed that `X` is an object type
> (even if in truth it happens to be a value type). They can agree
> to use a reference-based calling sequence for some `m` that works
> with `X`.
> 
> If both lists mention some `Q`, then both classes agree, and in fact
> it must be a value type. They might be able to agree to use "naked
> values" for the `Q` type when calling the method. Or not: they still
> have to worry about other supers that might have another opinion about
> `Q`.
> 
> What if `C` doesn't list `Q` but `S` does, and they share a method
> that returns `Q`? For example, what about `C.m()Q` vs. `S.m()Q`? In
> that case, the JVM may have already set up `S.m` to return its `Q`
> result as a naked value. Probably this happened before `C` was even
> loaded. The code for `C.m` will expect simply to return a normal
> reference. In reality, it will be unconsciously holding a
> JVM-assigned pseudo-reference to the buffered `Q`-value. The JVM must
> then unwrap the reference into a naked value to match the calling
> sequence it assigned (earlier, before `C` was loaded) to `S.m`. The
> bottom line is that even though `C.m` was loaded as a
> reference-returning function, the JVM may secretly rewrite it to
> return a naked value.
> 
> Since `C.m` returns a reference, it might choose to return `null`.
> What happens then? The secretly installed adaptation logic cannot
> extract the components of a buffer that doesn't exist. A
> `NullPointerException` must be thrown, at the point where `C.m` is
> adapted to `S.m`, which has the greater knowledge that `Q` is a value
> type (hence non-nullable). It will be as if the `areturn` instruction
> of `C.m` included a hidden null check.
> 
> Is such a hidden null check reasonable? One might explain that the
> `C` code thinks (wrongly) it is working with boxes, while the `S` code
> _knows_ it is working with values. If the method were `C.m()Integer`
> and it were overriding `S.m()int`, then if `C.m` returns `null` then
> the adapter that converts to `S.m()int` must throw NPE during the
> implicit conversion from `Integer` to `int`. A value "works like an
> int", so the result must be similar with a value type. It is as if
> the deficient class `C` were working with boxes for `Q` (indeed that's
> all it sees) while the knowledgeable class `S` is working with true
> values. The NPE seems justifiable in such terms, although there is no
> visible adapter method to switch descriptors in this case.
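For reference, the hidden check described in that analogy can be
reproduced in today's Java, where the implicit unboxing conversion
supplies the same null rejection:

  Integer boxed = null;   // the "deficient" view: a nullable box
  int naked = boxed;      // implicit unboxing throws NullPointerException

The imagined adapter between C.m and S.m behaves like that unboxing
step.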
> 
> The situation is a little odd when looked at the following way: If you
> view nullability as a privilege, then this privilege is enjoyed only
> by deficient classes, ones that have not yet been recompiled to "see"
> that the type `Q` is a value type. Ignorant classes may pass `null`
> back and forth through `Q` APIs, all day long, until they pass it
> through a class that knows `Q` is a value. Then an `NPE` will end
> their streak of luck. Is using `null` a privilege? Well, yes, but
> remember also that if `Q` started its career as an object type, it was
> a value-based class, and such classes are documented as being
> null-hostile. The null-passers were in a fool's paradise.
> 
> What if `C` lists `Q` as a value but `S` doesn't? Then the calling
> sequence assigned when `S` was loaded will use references, and these
> references will in fact be pseudo-references to buffered `Q` values
> (or `null`, as just discussed). The knowledgeable method `C.m()Q`
> will never produce a `null` through this API. The JVM will arrange
> to properly clothe the `Q`-value produced by `C.m` into a buffer
> whose pointer can be returned from `S.m`.
> 
> Class hierarchies can be much deeper than just `C` and `S`, and
> overrides can occur at many levels on the way down. Frederic Parain
> has pointed out that the net result seems to be that the first
> (highest) class that declares a given method (with descriptor) also
> gets to determine the calling sequence, which is then imposed on all
> overrides through that class. This leads to a workable implementation
> strategy, based on v-table packing. A class's v-table is packed
> during the "preparation" phase of class linking, just after loading
> and before any subclass v-table is packed. The JVM knows,
> unambiguously, whether a given v-table entry is new to a class, or is
> being reaffirmed from a previous super-class (perhaps with an
> override, perhaps just with an abstract). At this point, a new
> v-table slot can be given a carefully selected internal calling
> sequence, which will then be imposed on all overrides. An old v-table
> slot will have the super's calling sequence imposed on it. In this
> scheme, the interpreter and compiler must examine both the method
> descriptor and some metadata about the v-table slot when performing
> `invokevirtual` or `invokespecial`.
> 
> A method coming in "sideways" from an interface is harder to manage.
> It is reasonable to treat such a method as "owned" by the first proper
> class that makes a v-table entry for it. But that only works for one
> class hierarchy; the same method might show up in a different
> hierarchy with incompatible opinions about value types in the method
> signature. It appears that interface default methods, if not class
> methods, must be prepared to use more than one kind of calling
> sequence, in some cases. It is as if, when a class uses a default
> method, it imports that method and adjusts the method's calling
> sequence to agree with that class's hierarchy.
> 
> Often an interface default method is completely new to a class
> hierarchy. In that case, the interface can choose the calling
> sequence, and this is likely to provide more coherent calling
> sequences for that API point.
> 
> These complexities will need watching as value types proliferate and
> begin to show up in interface-based APIs.
> 
> ## Value types and the verifier
> 
> Let us assume that, if the verifier sees a value type, it should flag
> all invalid uses of that value type immediately, rather than wait for
> execution.
> (This assumption can be relaxed, in which case many points in this
> section can be dropped. We may also try to get away with implementing
> as few of these checks as possible, saving them for a later release.)
> 
> When verifying a method, the verifier tracks and checks types by name,
> mostly. Sometimes it pre-loads classes to see the class hierarchy.
> With the `ValueTypes` attribute, there is no need to pre-load value
> classes; the symbolic method is sufficient.
> 
> The verifier type system needs a way to distinguish value types from
> regular object types. To keep the changes small, this distinction can
> be expressed as a local predicate on type names called `isValueType`,
> implemented by referring to `ValueTypes`. In this way, the
> `StackMapTable` attribute does not need any change at all. Nor does
> the verifier type system need a change: value types go under the
> `Object` and `Reference` categories, despite the fact that value types
> are not object types, and values are not references.
> 
> The verifier rules need to consult `isValueType` at some points. The
> assignability rules for `null` must be adjusted to exclude value
> classes.
> 
> ```
> isAssignable(null, class(X, _)) :- not(isValueType(X)).
> ```
> 
> This one change triggers widespread null rejection: wherever a value
> type is required, the verifier will not allow a `null` to be on the
> stack. Assuming `null` is on the stack and `Q` is a value type, the
> following will be rejected as a consequence of the above change:
> 
> - `putfield` or `putstatic` to a field of type `Q`
> - `areturn` to a return type `Q`
> - any `invoke` passing `null` to a parameter of type `Q`
> - any `invoke` passing `null` to a receiver of type `Q` (but this is rare)
> 
> Given comprehensive null blocking (along other paths also), the
> implementation of the `putfield` (or `withfield`) instruction could go
> ahead and pull a buffered value off the stack without first checking
> for `null`. If the verifier does not actually reject such `null`s,
> the dynamic behavior of the bytecodes themselves should, to prevent
> null pollution from spreading.
> 
> The verifier rules for `aastore` and `checkcast` only check that the
> input type is an object reference of some sort. More narrow type
> checks are performed at runtime. A null may be rejected dynamically
> by these instructions, but the verifier logic does not need to track
> `null`s for them.
> 
> The verifier rules for `invokespecial` have special cases for `<init>`
> methods, but these do not need special treatment, since such calls
> will fail to link when applied to a value type receiver.
> 
> The verifier _could_ reject reference comparisons between value types
> and other operands (including `null`, other value types, and reference
> types). This would look something like an extra pair of constraints
> after the main assertion that two references are on the stack:
> 
> ```
> instructionIsTypeSafe(if_acmpeq(Target), Environment, _Offset, StackFrame,
>                       NextStackFrame, ExceptionStackFrame) :-
>     canPop(StackFrame, [reference, reference], NextStackFrame),
> +   not( canPop(StackFrame, [_, class(X, _)], _), isValueType(X) ),
> +   not( canPop(StackFrame, [class(X, _), _], _), isValueType(X) ),
>     targetIsTypeSafe(Environment, NextStackFrame, Target),
>     exceptionStackFrame(StackFrame, ExceptionStackFrame).
> ```
> 
> (The JVMS doesn't use any such `not` operator. The actual Prolog
> changes would be more complex, perhaps requiring a `real_reference`
> target type instead of `reference`.)
> This point applies equally to `if_acmpeq`, `if_acmpne`, `ifnull`, and
> `ifnonnull`.
> 
> This doesn't seem to be worthwhile, although it might be
> interesting to try to catch javac bugs this way. In any case, such
> comparisons are guaranteed to return `false` in L-world, and will
> optimize quickly in the JIT.
> 
> In a similar vein, the verifier _could_ reject `monitorenter` and
> `monitorexit` instructions when they apply to value types:
> 
> ```
> instructionIsTypeSafe(monitorenter, _Environment, _Offset, StackFrame,
>                       NextStackFrame, ExceptionStackFrame) :-
>     canPop(StackFrame, [reference], NextStackFrame),
> +   not( canPop(StackFrame, [class(X, _)], _), isValueType(X) ),
>     exceptionStackFrame(StackFrame, ExceptionStackFrame).
> ```
> 
> And a `new` or `putfield` could be quickly rejected if it applies to a
> value type:
> 
> ```
> instructionIsTypeSafe(new(CP), Environment, Offset, StackFrame,
>                       NextStackFrame, ExceptionStackFrame) :-
>     StackFrame = frame(Locals, OperandStack, Flags),
>     CP = class(X, _),
> +   not( isValueType(X) ),
>     ...
> 
> instructionIsTypeSafe(putfield(CP), Environment, _Offset, StackFrame,
>                       NextStackFrame, ExceptionStackFrame) :-
>     CP = field(FieldClass, FieldName, FieldDescriptor),
> +   not( isValueType(FieldClass) ),
>     ...
> ```
> 
> Likewise `withfield` could be rejected by the verifier if applied to a
> non-value type.
> 
> The effect of any or all of these verifier rule changes (if we choose
> to implement them) would be to prevent local code from creating a
> `null` and accidentally putting it somewhere a value type belongs, or
> from accidentally applying an identity-sensitive operation to an
> operand _known statically_ to be a value type. These rules only work
> when a sharp verifier type unambiguously reports an operand as `null`
> or as a value type.
> 
> Nulls must also be rejected, and value types detected, when they are
> hidden, at verification time, under looser types like `Object`.
> Protecting local code from outside `null`s must also be done
> dynamically.
> 
> Omitting all of these rules will simply shift the responsibility for
> null rejection and value detection fully to dynamic checks at
> execution time, but such dynamic checks must be implemented in any
> case, so the verifier's help is mainly an earlier error check,
> especially to prevent null pollution inside of a single stack frame.
> For that reason, the only really important verifier change is the
> `isAssignable` adjustment, mentioned first.
> 
> The dynamic checks which back up or replace the other verifier checks
> will be discussed shortly.
> 
> ## Value types and legacy classfiles
> 
> We need to discuss the awkward situation of `null` being passed as a
> value type, and value types being operated on as objects, by legacy
> classfiles. One legacy classfile can dump null values into surprising
> places, even if all the other classfiles are scrupulous about
> containing `null`.
> 
> We will also observe some of the effects of having value types
> "invade" a legacy classfile which expects to apply identity-sensitive
> operations to them.
> 
> By "legacy classfile" we of course mean classfiles which lack
> `ValueTypes` attributes, and which may attempt to misuse value types
> in some way. (Force of habit: it's strong.) We also can envision
> half-way cases where a legacy classfile has a `ValueTypes` attribute
> which is not fully up to date. In any case, there is a type `Q` which
> is _not_ locally declared as a value type, by the legacy class `C`.
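As a concrete picture for the discussion that follows, here is a sketch
in plain Java source (hypothetical names; pretend Q has been migrated
to a value type, while C predates the migration and carries no
ValueTypes attribute):

  final class Q { }               // stand-in; imagine a migrated value class

  class C {                       // legacy: treats Q as a nullable reference
    Q q;                          // formatted as a reference field; starts null
    void clear() { q = null; }    // legal inside C, which may witness its nulls
    Q leak()     { return q; }    // may hand a null to an upgraded caller,
                                  // which must then reject it with an NPE
  }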
>
> The first bad thing that can happen is that `C` declares a field of
> type `Q`. This field will be formatted as a reference field, even
> though the field type is a value type. Although we implementors might
> grumble a bit, the JVM will have to arrange to use pseudo-pointers to
> represent values stored in that field. (It's as if the field were
> volatile, or not flattenable for some other reason.) That wasn't too
> bad, but look what's in the field to start with: It's a null. That
> means that any legitimate operation on this initial value will throw an
> `NPE`. Of course, the writer of `C` knew `Q` as a value-based class,
> so the initial null will be discarded and replaced by a suitable
> non-null value, before anything else happens.
>
> What if `C` makes a mistake, and passes a `null` to another class
> which _does_ know `Q` is a value? At that point we have a choice, as
> with the verifier's null rejection, whether to do more work to detect
> the problem earlier, or whether to let the `null` flow through and
> eventually cause an `NPE` down the line. Recall that if an API point
> gets a calling sequence which recognizes that `Q` is a value type, it
> will probably unbuffer the value, throwing `NPE` immediately if `C`
> makes a mistake. This is good, because that's the earliest we could
> hope to flag the mistake. But if the method accepts the boxed form of
> `Q`, then the `null` will sneak in, skulk around in the callee's stack
> frame, and maybe cause an error later.
>
> Meanwhile, if the JVM tries to optimize the callee, it will have to
> limit its optimizations somewhat, because the argument value is
> nullable (even if only ever by mistake). To cover this case, it may
> be useful to define that _method entry_ to a method that knows about
> `Q` is null-hostile, even if the _calling sequence_ for some reason
> allows references. This means that, at function entry, every known
> value type parameter is null-checked. This needs to be an official
> rule in the JVM, not just an optimization for the JIT, in order for
> the JIT to use it.
>
> What if our `C` returns a `null` value to a caller who intends to use
> it as a value? That won't go well either, but unless we detect the
> `null` aggressively, it might rattle around for a while, disrupting
> optimization, before producing an inscrutable error. ("Where'd that
> `null` come from??") The same logic applies as with arguments: When
> a `null` is returned from a method call that purports to return `Q`,
> this can only be from a legacy file, and the calling sequences were
> somehow not upgraded. In that case, the JVM needs to mandate a null
> check on every method invocation which is known to return a value
> type.
>
> The same point also applies if another class `A`, knowing `Q` as a
> value type, happens to load a `null` from one of `C`'s fields. The
> `C` field is formatted as a reference, and thus can hand `A` a
> surprise `null`, but `A` must refuse to see it, and throw `NPE`.
> Thus, the `getfield` instruction, if it is pointed at a legacy
> non-flattened field, will need to null-check the value loaded
> from the field.
>
> Meanwhile, `C` is allowed to `putfield` and `getfield` `null` all day
> long into its own fields (and fields of other benighted legacy classes
> that it may be friends with). Thus, the `getfield` and `putfield`
> instructions link to slightly different behavior, not only based on
> the format of the field, but _also_ based on "who's asking".
> Code in `C` is allowed to witness `null`s in its `Q` fields, but code
> in `A` (upgraded) is _not_ allowed to see them, even though it's the
> same `getfield` to the same symbolic reference. Happily, fields are
> not shared widely across uncoordinated classfiles, so this is a corner
> case mainly for testers to worry about.
>
> What if `C` stores a `null` into somebody else's `Q` field, or into an
> element of a `Q[]` array? In that case, `C` must throw an immediate
> `NPE`; there's no way to reformat someone else's data structure,
> however out-of-date `C` may be.
>
> What if `C` gets a null value from somewhere and casts it to `Q`?
> Should the `checkcast` throw `NPE` (as it should in a classfile where
> `Q` is known to be a value type)? For compatibility, the answer is
> "no"; old code needs to be left undisturbed if possible. After all,
> `C` believes it has a legitimate need for `null`s, and won't be fixed
> until it is recompiled and its programmer fixes the source code.
>
> That's about it for `null`. If the above dynamic checks are
> implemented, then legacy classfiles will be unable to disturb upgraded
> classfiles with surprise null values. The goal mentioned above
> about controlling `null` on all paths is fulfilled by blocking `null`
> across API calls (which might have a legacy class on one end), and by
> verifying that `null`s never mix with values, locally within a single
> stack frame.
>
> There are a few other things `C` could do to abuse `Q` values.
> Legacy code needs to be prevented immediately from making any of the
> following mistakes:
>
> - `new` of `Q` should throw `ICCE`
> - `putfield` to a field of `Q` should throw `ICCE`
> - `monitorenter`, `monitorexit` on a `Q` value should throw `IllegalMonitorStateException`
>
> Happily, the above rules are not specific to legacy code but apply
> uniformly everywhere.
>
> A final mistake is executing an `acmp` instruction on a value type.
> Again, this is possible everywhere, not just in legacy files, even if
> the verifier tries to prevent the obvious occurrences. There are
> several options for `acmp` on value types. The option which breaks
> the least code and preserves the O(1) performance model of `acmp` is
> to quickly detect a value type operand and just report `false`, even
> if the JVM can tell, somehow, that it's the same buffer containing the
> same value, being compared to itself.
>
> All of these mistakes can be explained by analogy, supposing that the
> legacy class `C` were working with a box type `Integer` where other
> classes had been recoded to use `int`. All variables under `C`'s
> control are nullable, but when it works with new code it sees only
> `int` variables. Implicit conversions sometimes throw `NPE`, and
> `acmp` (or `monitorenter`) operations on boxed `Integer` values yield
> unspecific (or nonsensical) results.
>
> ## Value types and instruction linkage
>
> Linked instructions which are clearly wrong should throw a
> `LinkageError` of some type. Examples already given are `new` and
> `putfield` on value types.
>
> When a field reference of value type is linked, it will have to
> correctly select the behavior required by both the physical layout of
> the field, and also the stance toward any possible `null` if the field
> is nullable. (As argued above, the stance is either lenient for
> legacy code or strict for new code.)
>
> A `getstatic` linkage may elect to replace an invisible `null` with
> a default value.
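>
> (The field-linkage decision just described can be summarized in
> Java-like pseudocode; this is a hedged sketch, and the helper names
> are illustrative, not a real JVM interface:)
>
> ```
> // At getfield link time, in the context of the accessing classfile:
> void linkGetfield(FieldRef f, ClassFile accessor) {
>     boolean flat   = layoutIsFlattened(f);                      // physical layout
>     boolean strict = accessor.valueTypes().contains(f.type());  // "who's asking"
>     if (!flat && strict) {
>         // Legacy-formatted field, upgraded accessor: null-check
>         // every loaded value, throwing NPE on a surprise null.
>         emitNullCheckOnLoad(f);
>     }
>     // Flattened fields can never hold null; lenient (legacy)
>     // accessors may witness null in non-flattened fields.
> }
> ```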
>
> When an `invoke` is linked, it will have to arrange to correctly
> execute the calling sequence assigned to its method or its v-table.
>
> Linkage of `invokeinterface` will be even more dynamic, since the
> calling sequence cannot be determined until the receiver class is
> examined.
>
> Linkage of dynamic constants in the constant pool must reject `null`
> for value types. Value types can be determined either globally based
> on the resolved constant type, or locally based on the `ValueTypes`
> attribute associated with the constant pool in which the resolution
> occurs.
>
> ## Value types and instruction execution
>
> Most of the required dynamic behaviors to support value type hygiene
> have already been mentioned. Since values are identity-free and
> non-nullable, the basic requirement is to avoid storing `null`s in
> value-type variables, and degrade gracefully when value types are
> queried about their identities. A secondary requirement is to support
> the needs of legacy code.
>
> For null hygiene, the following points apply:
>
> - A nullable argument, return value (from a callee),
>   or loaded field must be null-checked before being further
>   processed in the current frame, if its descriptor is locally
>   declared as a value type.
> - `checkcast` should reject `null` for _locally_ declared value
>   types, but not for others.
> - If the verifier does not reject `null`, the `areturn`, `putfield`,
>   and `withfield` instructions should do so dynamically. (Otherwise the
>   other rules are sufficient to contain `null`s.)
> - An `aastore` to a value type array (`Q[]`) should reject `null`
>   even if the array happens to use invisible indirections as an
>   implementation tactic (say, for jumbo values). This is a purely
>   dynamic behavior, not affected by the `ValueTypes` attribute.
>
> Linked field and invoke instructions need sufficient linkage metadata
> to correctly flatten instance fields and use unboxed (and/or
> `null`-hostile) calling sequences.
>
> As discussed above, the `acmp` must short-circuit on values. This is
> a dynamic behavior, not affected by the `ValueTypes` attribute.
>
> Generally speaking, any instruction that doesn't refer to the constant
> pool cannot have contextual behavior, because there is no place to
> store metadata to adjust the behavior. The `areturn` instruction is
> an exception to this observation; it is a candidate for bytecode
> rewriting to gate the extra null check for applicable methods.
>
> ## Value types and reflection
>
> Some adjustments may be needed for the various reflection APIs, in
> order to bring them into alignment with the changed bytecode.
>
> - `Class.cast` should be given a null-hostile partner
>   `Class.castValue`, to emulate the updated `checkcast` semantics.
> - `Field` should be given a dynamic `with` to emulate `withfield`,
>   and the `Lookup` API given a way to surface the corresponding MH.
> - `Class.getValueTypes`, to reflect the attribute, may be useful.
>
> ## Conclusions
>
> The details are complex, but the story as a whole becomes more
> intelligible when we require each classfile to locally declare its
> value types, and handle all values appropriately according to the
> local declaration.
>
> Outside of legacy code, and at its boundaries, tight control of null
> values is feasible. Inside value-rich code, and across value-rich
> APIs, full optimization seems within reach.
>
> Potential problems with ambiguity in L-world are effectively addressed
> by a systematic side channel for local value type declarations,
> assisting the interpretation of `L`-type descriptors. This side
> channel can be the `ValueTypes` attribute.
>

From brian.goetz at oracle.com  Fri May 11 17:07:59 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 11 May 2018 13:07:59 -0400
Subject: value type hygiene
In-Reply-To: 
References: 
Message-ID: <15062ede-ba64-c6c0-21eb-cbe7eb376c4a@oracle.com>

On 5/10/2018 4:36 PM, John Rose wrote:
> On May 10, 2018, at 11:53 AM, Brian Goetz wrote:
>> Objection sustained. You are right; null-hostility is not documented
>> in VBCs.
>
> It *is* documented for the VBC Optional:
>> A variable whose type is {@code Optional} should
>> never itself be {@code null}; it should always point to an {@code Optional}
>> instance.
> ...but not for the VBC LocalDate. So some VBCs that are null-friendly will
> require a more nuanced migration story, since we don't want value types
> per se to be nullable.

More precisely, I think you'll find that Optional is the sole type that
makes this claim, because the very point of Optional is to be better
than a null sentinel. So I think we can approximate this to "all VBCs
are currently accidentally null-friendly." This is the big migration
challenge.

> Options: Make *some* value types accept null according to some ad
> hoc opt-in API, yuck. Make a nullable type constructor available,
> like int?, Optional? or Optional.BOX. Or limit nullability to particular
> uses of the same type. Or create ValueRef as a workaround
> interface (no VM or language changes!). The last two are my prefs.

On "make some of them nullable", this maps to an idea we kicked around
which is to mark values that used to be references (@MigratedFromRef),
and have that trigger some looser behavior at either the language or VM
or both. More generally, this fits into a bigger idea (not yet even a
story) which is that if we can reify the history of a class better, we
can provide better migration tools. (Currently, we very much have a
"these are the classes that are" philosophy, which is understandable,
but it makes us guess about history, which isn't great. Code evolves.)

Nullable types are generally useful (though an enormous project, and
not one particularly amenable to nibbling around the edges.) This has
the advantage that people who thought LD was nullable can just fix
their code to say "LD?" and now everyone knows what's going on.

In Q-world, LD and LD.BOX could map to separate type descriptors, with
appropriate implicit conversions, which was good, but with the
ValueTypes attribute, we lose the ability to mix and match LD and
LD.BOX in one file (which is OK), but that means we need to either
enforce uniform use (bleh) or have some crappy declaration like "import
LD as nullable" in a place where the user will ignore it.

ValueRef sounds like another flavor of Optional, and both share the
problem that we don't have a generics story over values yet, which
would make it magic.

So, none of these sound like slam-dunks yet.

> My high-order concern in all of this is to reduce degrees of freedom
> in classfiles, from "every descriptor makes its own choice about nullity"
> down to "each classfile makes its decision about each type's valueness"
> backed up by "and if they don't agree, the JVM expects the user to fix
> it by recompilation or special workarounds". This is why I'm not jumping
"just by adding info to the > descriptors"; the JVM complexity costs are large and optional for > value types. Understood.? We'll have to validate such moves against use cases, of course, but its a good ATBE position. > More on ValueRef: > > @ForValueTypesOnly > interface ValueRef> { > ? @CanBeFreebieDownCast > ? @SuppressWarnings("unchecked") > ? default T byValue() { return (T) this; } > } > > Ignore the annotations for a moment. ... and the magic-ness of it. IMO, I think one of the biases that has driven us into this corner is the distaste for boxes.? I think we've tried too hard to say "there are no boxes".? I think we should admit there are boxes, maybe ValueRef is spelled "Box" or V.BOX (where reference types box to themselves).? And then it has all the behavior of a heavy box, for better or worse. From brian.goetz at oracle.com Fri May 11 17:13:19 2018 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 11 May 2018 13:13:19 -0400 Subject: value type hygiene In-Reply-To: References: <4DE1FC97-835B-435F-AE48-3DE3E60A457D@oracle.com> <77993d79-5097-e9c3-f2f7-d5f744396137@oracle.com> <2066983148.1293217.1525970526700.JavaMail.zimbra@u-pem.fr> <14A662F2-ACC2-4AB4-8A19-451DF26D6F1A@oracle.com> <73A0FC3C-C180-41C3-AED5-313C5BD1CA3A@oracle.com> <20053177.1351568.1525990137944.JavaMail.zimbra@u-pem.fr> Message-ID: <2ae3da57-61cb-951d-8fb4-c940dcb44f2f@oracle.com> I get the motivation for this.? FTR, though, I am pretty skeptical that such unexpected and hard-to-explain NPEs won't show up more frequently than the ignorability threshold, and the result will be a perception that Java is unstable.? (Remember, people mix and match with libraries that have been compiled at all different language levels.) On 5/10/2018 9:36 PM, John Rose wrote: > > Again, JVM could go the extra mile to make this problem > disappear, by re-organizing the calling sequence of the override > tree as soon as the first legacy method shows up. ?For simplicity > I'd rather exclude this tactic until forced by experience to add it. > It seems like a heroic optimization to me, seldom used and likely > to be buggy. ?It also seems to spoil the whole performance party > when one bad actor shows up, which looks dubious to me. From forax at univ-mlv.fr Fri May 11 17:22:48 2018 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Fri, 11 May 2018 19:22:48 +0200 (CEST) Subject: value type hygiene In-Reply-To: <2ae3da57-61cb-951d-8fb4-c940dcb44f2f@oracle.com> References: <77993d79-5097-e9c3-f2f7-d5f744396137@oracle.com> <2066983148.1293217.1525970526700.JavaMail.zimbra@u-pem.fr> <14A662F2-ACC2-4AB4-8A19-451DF26D6F1A@oracle.com> <73A0FC3C-C180-41C3-AED5-313C5BD1CA3A@oracle.com> <20053177.1351568.1525990137944.JavaMail.zimbra@u-pem.fr> <2ae3da57-61cb-951d-8fb4-c940dcb44f2f@oracle.com> Message-ID: <1646505560.1592619.1526059368944.JavaMail.zimbra@u-pem.fr> I agree with Brian that having spurious NPEs will not help. I think reporting a NPE is nefarious here, the semantics of a NPE is too engraved in the Java dev heads, we should report an incompatible change class error or exactly a subtype like by example VMHeroicEffortToProvideValueTypeMigrationFromAReferenceTypeFailError saying which class see the value type as a reference type (in order to be re-compiled). R?mi ----- Mail original ----- > De: "Brian Goetz" > ?: "John Rose" , "Remi Forax" > Cc: "valhalla-spec-experts" > Envoy?: Vendredi 11 Mai 2018 19:13:19 > Objet: Re: value type hygiene > I get the motivation for this.? 
> FTR, though, I am pretty skeptical that
> such unexpected and hard-to-explain NPEs won't show up more frequently
> than the ignorability threshold, and the result will be a perception
> that Java is unstable. (Remember, people mix and match with libraries
> that have been compiled at all different language levels.)
>
> On 5/10/2018 9:36 PM, John Rose wrote:
>>
>> Again, JVM could go the extra mile to make this problem
>> disappear, by re-organizing the calling sequence of the override
>> tree as soon as the first legacy method shows up. For simplicity
>> I'd rather exclude this tactic until forced by experience to add it.
>> It seems like a heroic optimization to me, seldom used and likely
>> to be buggy. It also seems to spoil the whole performance party
>
> when one bad actor shows up, which looks dubious to me.

From daniel.smith at oracle.com  Fri May 11 17:48:56 2018
From: daniel.smith at oracle.com (Dan Smith)
Date: Fri, 11 May 2018 11:48:56 -0600
Subject: value type hygiene
In-Reply-To: 
References: <4DE1FC97-835B-435F-AE48-3DE3E60A457D@oracle.com>
 <77993d79-5097-e9c3-f2f7-d5f744396137@oracle.com>
 <2066983148.1293217.1525970526700.JavaMail.zimbra@u-pem.fr>
 <14A662F2-ACC2-4AB4-8A19-451DF26D6F1A@oracle.com>
 <73A0FC3C-C180-41C3-AED5-313C5BD1CA3A@oracle.com>
 <20053177.1351568.1525990137944.JavaMail.zimbra@u-pem.fr>
Message-ID: 

> On May 10, 2018, at 7:36 PM, John Rose wrote:
>
> There could be an interface default method, or some other method,
> which is simultaneously a member of two trees with two different
> decisions about scalarization vs. buffering. This can be handled
> by having the JVM create multiple adapters. I'd rather forbid the
> condition that requires multiple adapters as a CLC violation,
> because it is potentially complex and buggy, and it's not clear
> we need this level of service from the JVM.

Use case:

---

Library code

interface Event {
    LocalDateTime timestamp();
}

---

Client 1 code, compiled with reference class LocalDateTime

class Client1Event implements Event { ... }

---

Client 2 code, compiled with value class LocalDateTime

class Client2Event implements Event { ... }

---

Various circumstances can lead to two different clients running on a
single JVM, including, say, dependencies between different libraries.

Am I understanding correctly that you would consider loading/invoking
these methods to be a JVM error?

From john.r.rose at oracle.com  Fri May 11 17:58:49 2018
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 11 May 2018 10:58:49 -0700
Subject: value type hygiene
In-Reply-To: <15062ede-ba64-c6c0-21eb-cbe7eb376c4a@oracle.com>
References: <15062ede-ba64-c6c0-21eb-cbe7eb376c4a@oracle.com>
Message-ID: 

On May 11, 2018, at 10:07 AM, Brian Goetz wrote:
>
>>
>> More on ValueRef:
>>
>> @ForValueTypesOnly
>> interface ValueRef<T extends ValueRef<T>> {
>>   @CanBeFreebieDownCast
>>   @SuppressWarnings("unchecked")
>>   default T byValue() { return (T) this; }
>> }
>>
>> Ignore the annotations for a moment.
>
> ... and the magic-ness of it.
>
> IMO, I think one of the biases that has driven us into this corner is
> the distaste for boxes. I think we've tried too hard to say "there are
> no boxes". I think we should admit there are boxes, maybe ValueRef is
> spelled "Box" or V.BOX (where reference types box to themselves). And
> then it has all the behavior of a heavy box, for better or worse.

I suspect you don't yet realize how *little* magic there is in this
particular spelling of V.BOX. It requires no VM changes, using
only existing descriptors available today.
It can hand you
nullable types *as a source code convention* above both
the language and the VM. With a little magic (very little) it
can serve as a translation strategy totally above the VM
for V and V.BOX.

From brian.goetz at oracle.com  Sat May 12 14:42:35 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 12 May 2018 10:42:35 -0400
Subject: Valhalla EG meeting notes March 28, 2018
In-Reply-To: 
References: 
Message-ID: 

> Specific issues with arrays and sub typing:

I've lost track of the current state of this; is a V[] yet a subtype of
Object[], as object classes are?

If values are to play nicely with erased generics, we have to get
there. For example:

    <T> void sort(T[] elements, Comparator<? super T> c) { ... }

This erases to

    void sort(Object[] elements, Comparator c) { ... }

So to sort an array of V, we need V[] <: Object[].

From brian.goetz at oracle.com  Sat May 12 14:32:35 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 12 May 2018 10:32:35 -0400
Subject: value type hygiene
In-Reply-To: 
References: 
Message-ID: <18B58ACE-1FB8-4D68-8652-F30CC7EE9A70@oracle.com>

I want to drill into this point a bit. I know that you would prefer not
to make heroic efforts to deal with cases where a random subset of a
hierarchy was compiled with one polarity and the rest with the other,
but I think there's reasons to keep drilling on it. (Also, because I
don't think you'll get away with it.)

I'll note that this reminds me further of a related migration
discussion in Burlington (DanH was there too) when we were exploring
how to crowbar L/D primitives into one slot. Teaching the interpreter
to do so in a consistent world was easy; the hard part was dealing with
calling sequences / overrides across separately compiled files that had
differentially gotten the memo about the new encoding. We talked about
various adapters/fixups needed at the junction of a slot arity
mismatch.

While we didn't solve it then either, I claim we want to solve it
anyway, because this is how we get to primitives as values. Just as the
ValueTypes attribute lists the types known as of compile time to be
values, we can similarly reify, on a per-class basis, whether the class
is able to treat primitives as values. When class A calls / overrides a
method in B with the same encoding, everything is great; when A and B
differ on the value-ness of primitives, an adaptation, similar to the
scalarize/bufferize adaptation, is needed at the junction. Over time,
fewer classes will be out of date and the adapters will eventually be
purged from the ecosystem.

> On May 10, 2018, at 11:52 AM, Brian Goetz wrote:
>
> I'll add that this reminds me very much of loader constraints. When class C calls method D.m(P)R, we first textually match the call with m(P)R in D via descriptor match, and then we additionally make sure that C and D agree on any loader constraints, throwing an error if they do not. In L-world, whether C and D think V is a value or object class is another kind of constraint. At linkage time, if these constraints agree, they can use an optimized protocol; if they disagree, rather than failing, the VM can introduce hidden adaptation to iron out the disagreement. This is a big win over the use of bridges in Q-world, since the adaptors are only generated at runtime when they are strictly needed, and as the ecosystem gets recompiled over time to a more uniform view of V's value-ness, will steadily go away.
> We saw shades of this in Albert's first prototype of heisenboxes, where the JIT compiled multiple versions of each method (if needed) according to different views of value-ness, and then fit them together, lego-style.

From brian.goetz at oracle.com  Sat May 12 14:34:27 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 12 May 2018 10:34:27 -0400
Subject: value type hygiene
In-Reply-To: 
References: <15062ede-ba64-c6c0-21eb-cbe7eb376c4a@oracle.com>
Message-ID: 

I get that. What I'm saying is: boxes have a place in the user model.
We may hate them, but without them, we likely find ourselves "boxed"
into a corner. So I don't want them to be a library convention; I want
them to be understood by, say, asType(). Otherwise we're playing
whack-a-mole.

> On May 11, 2018, at 1:58 PM, John Rose wrote:
>
> On May 11, 2018, at 10:07 AM, Brian Goetz wrote:
>>
>>>
>>> More on ValueRef:
>>>
>>> @ForValueTypesOnly
>>> interface ValueRef<T extends ValueRef<T>> {
>>>   @CanBeFreebieDownCast
>>>   @SuppressWarnings("unchecked")
>>>   default T byValue() { return (T) this; }
>>> }
>>>
>>> Ignore the annotations for a moment.
>>
>> ... and the magic-ness of it.
>>
>> IMO, I think one of the biases that has driven us into this corner is the distaste for boxes. I think we've tried too hard to say "there are no boxes". I think we should admit there are boxes, maybe ValueRef is spelled "Box" or V.BOX (where reference types box to themselves). And then it has all the behavior of a heavy box, for better or worse.
>
> I suspect you don't yet realize how *little* magic there is in this
> particular spelling of V.BOX. It requires no VM changes, using
> only existing descriptors available today. It can hand you
> nullable types *as a source code convention* above both
> the language and the VM. With a little magic (very little) it
> can serve as a translation strategy totally above the VM
> for V and V.BOX.

From john.r.rose at oracle.com  Sun May 13 01:48:51 2018
From: john.r.rose at oracle.com (John Rose)
Date: Sat, 12 May 2018 18:48:51 -0700
Subject: Valhalla EG meeting notes March 28, 2018
In-Reply-To: 
References: 
Message-ID: <518C124B-EB04-4E85-B5C7-0171B2E63B1A@oracle.com>

On May 12, 2018, at 7:42 AM, Brian Goetz wrote:
>
>> Specific issues with arrays and sub typing:
>
> I've lost track of the current state of this; is a V[] yet a subtype of Object[], as object classes are?
>
> If values are to play nicely with erased generics, we have to get there. For example:
>
> <T> void sort(T[] elements, Comparator<? super T> c) { ... }
>
> This erases to
>
> void sort(Object[] elements, Comparator c) { ... }
>
> So to sort an array of V, we need V[] <: Object[].

Yes, we are already there. Using aastore on V[] makes V[] <: Object[]
almost a forced move. L-world brings values into a place where we can
erase them to the types we are used to, as bounds in generics.

— John

From john.r.rose at oracle.com  Mon May 14 16:06:05 2018
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 14 May 2018 09:06:05 -0700
Subject: value type hygiene
In-Reply-To: 
References: <15062ede-ba64-c6c0-21eb-cbe7eb376c4a@oracle.com>
Message-ID: <9C0128E4-C4D4-4E8D-B154-BEE0CC9C47C1@oracle.com>

On May 12, 2018, at 7:34 AM, Brian Goetz wrote:
>
> I get that. What I'm saying is: boxes have a place in the user model. We may hate them, but without them, we likely find ourselves "boxed" into a corner. So I don't want them to be a library convention; I want them to be understood by, say, asType(). Otherwise we're playing whack-a-mole.

I don't fully understand the point you make by "a library convention".
Maybe "only a library convention unknown to the rest of the stack"? The existing box types are a library convention. It's even an irregular one (int vs. Integer). The JLS recognizes them (in the box/unbox rules). Following the JLS, so do asType, core reflection, etc. Since value types "code like a class" there are new moves for making library conventions unavailable to primitive types, and adding an interface super seems to be a better move, for value types, than the companion type pattern we must use for int/Integer. Further, we can meet many of the requirements met by the companion class pattern by using the generic-super pattern (V <: ValueRef) in the case of value types. One requirement *not* met by using the generic-super pattern is run-time reification of the box types, since before erasure ValueRef and ValueRef are different types, but to the VM they are the same type. But I don't think that's a deal-killer. Maybe you see a problem with the erasure that I don't? Specifically: Do we really need a VT version of the reified wrapper type Integer? That's what I'm trying to question here, at the risk of playing whack-a-mole. There is serious money to be saved if we can decide the companion class isn't needed in the case of value types, even if it is necessary scaffolding for non-L-types. It seems to me that most or all of the machinery in reflection and method handles and the JLS for special-casing the companion classes exists to hoist primitives into the L-descriptor world. When the hoisting occurs to a wrapper class, many use cases go straight up to Object itself (or to another super of a wrapper). Others stay on the wrapper just to make a method call like toString or hashCode. Since value types are already classes with methods, and already are L-descriptors, it follows that they don't need wrapper types very often. Expressing nullity is one of those residual use cases; we know it happens sometimes but the JVM needs help calling it out as a special case. I claim we don't need a fully differentiated, runtime-reified wrapper type like Integer to handle those occasional special cases. In the JVM we just need a way to process the nullable VT instance as an Object or an interface. In the language an erasable static type like ValueRef works as well as a fully reified companion type like Integer. How far should language go in healing the gap between the int/Integer pattern and the VT/VR pattern? Probably not too far until we are ready to fully unify the primitives with values. But there are simple things we could do that might help, like making a new notation like C.BOX that connects the various types in new ways. ? John From john.r.rose at oracle.com Mon May 14 23:02:04 2018 From: john.r.rose at oracle.com (John Rose) Date: Mon, 14 May 2018 16:02:04 -0700 Subject: value type hygiene In-Reply-To: <9D5A86F2-0F1F-4E23-BC51-7A69D6376905@oracle.com> References: <9D5A86F2-0F1F-4E23-BC51-7A69D6376905@oracle.com> Message-ID: <28A2DCCB-263B-4D0E-BAB6-975DECC643B9@oracle.com> On May 11, 2018, at 7:39 AM, Frederic Parain wrote: > > John, > > I have a question about the semantic within legacy class files (class > files lacking a ValueTypes attribute). Your document clarifies the semantic > for fields as follow: > > "Meanwhile, C is allowed to putfield and getfield null all day long into its own fields (and fields of other benighted legacy classes that it may be friends with). 
> Thus, the getfield and putfield instructions link to slightly different
> behavior, not only based on the format of the field, but also based on
> "who's asking". Code in C is allowed to witness nulls in its Q fields,
> but code in A (upgraded) is not allowed to see them, even though it's
> the same getfield to the same symbolic reference. Happily, fields are
> not shared widely across uncoordinated classfiles, so this is a corner
> case mainly for testers to worry about."
>
> But what about arrays? If I follow the same logic that "old code needs to
> be left undisturbed if possible", if a legacy class C creates an array of Q,
> without knowing that Q is now a value type, C would expect to be allowed
> to write and read null from this array, as it does from its own fields. Is it a
> correct assumption?

Yes, I avoided this question in the write-up. To apply the same move
as fields, we could try to say that arrays of type V[] created by a legacy
class C do not reject nulls, while arrays of type V[] created by normal
classes (that recognize V as value types) are created as flattened.

But the analogy between fields and array elements doesn't work in this
case. While a class C can only define fields in itself, by creating arrays
it is working with a common global type. Think of V[] as a global type,
and you'll see that it needs a global definition of what is flattened and
what is nullable. I think we will get away with migrating types and
declaring that legacy classes that use their arrays will fail. The mode
of failure needs engineering via experiment. We could go so far as
to reject legacy classes that use anewarray to build arrays of value
type, without putting those types on the ValueTypes list.

This means that if there is a current class C out there that is creating
arrays of type Optional[] or type LocalDate[], then if one of those types
is migrated to a value type, then C becomes a legacy class, and it will
probably fail to operate correctly. OTOH, since those classes use
factories to create non-null values of type Optional or LocalDate, such
a legacy class is likely to refrain from using nulls. I think it's possible
but not likely that the author of a legacy class will make some clever
use of nulls, storing them into an array of upgraded type V.

In the end, some legacy code will not port forward without recompilation
and even recoding. Let's do what we can to make it easier to diagnose
and upgrade such code, as long as it doesn't hurt the basic requirement
of making values flattenable. The idea of making fields nullable seems
a reasonable low-cost compromise, but making elements nullable a
much higher cost.
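(To make the failure mode concrete, here is a hedged sketch of the kind
of "clever use of nulls" meant above; the class and field names are
invented:)

    // Legacy code, compiled while Optional was still a reference type:
    class SlotTable {
        Optional[] slots = new Optional[16];     // anewarray of a migrated type
        void clear(int i) { slots[i] = null; }   // aastore of null: fine before
                                                 // migration, fails on a
                                                 // flattened Optional[]
    }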
> Which
> could be a good thing if we want to catch leaking of arrays with potentially
> null elements from old code to new code, instead of waiting for new code
> to access a null element to throw an exception.

Why not try to catch the problem when the array is created? Have the
anewarray instruction do a cross check (like CLCs) between the base type
of the array and local ValueTypes.

> On the other hand, the lazy
> check solution allows arrays of non-nullable elements with zero null elements
> to work fine with new code.

So, we have discussed the alternative of adding extra polymorphism to
all value array types: Some arrays are flat and reject nulls, while others
are boxy and accept nulls. But here again I want to push back against
inventing a parallel set of boxy implementations, because it's a long term
systemic cost for a short term marginal gain.

Besides, some library classes don't use native anewarray but use
jlr.Array.newInstance to make arrays. Do we make that guy caller-sensitive
so he can tell which kind of array to make? I think this is a long string to
pull on. It's easier to define something as "clearly in error" (see above)
than to try to fix it on the fly, because you probably have to fix more and
more stuff, and keep track of the fixes. Like I say, long term cost for
marginal migration improvements.

> From an implementation point of view, the JVM already has to make the
> distinction between flattened and not flattened arrays, so there's a logic
> in place to detect some internal constraints of arrays, but the nullable/
> non-nullable element semantic would require one additional bit.

We *can* do this, but we shouldn't because (a) it's a long string to pull
on a user model that is ultimately disappointing, and (b) it means that
even optimized code, twenty years from now, will have to deal with
this extra polymorphism.

— John

From forax at univ-mlv.fr  Mon May 14 23:13:59 2018
From: forax at univ-mlv.fr (Remi Forax)
Date: Tue, 15 May 2018 01:13:59 +0200 (CEST)
Subject: value type hygiene
In-Reply-To: <9C0128E4-C4D4-4E8D-B154-BEE0CC9C47C1@oracle.com>
References: <15062ede-ba64-c6c0-21eb-cbe7eb376c4a@oracle.com>
 <9C0128E4-C4D4-4E8D-B154-BEE0CC9C47C1@oracle.com>
Message-ID: <424446009.441965.1526339639570.JavaMail.zimbra@u-pem.fr>

I think I prefer a declaration site annotation like @ValueBasedClass
to a use site annotation ValueRef.

For me a value based class, if you want to be as compatible as
possible, is a reference type (so 'L') that behaves like a value type
at runtime, so the JIT can see it as a value type at runtime and can
unbox it at will and buffer/box it before calling a method at the
horizon of the inlining blob.

So as a library designer, I can choose to either replace a class by a
real value type and it will fail if null is present or as a value based
class if I value (sorry !) the backward compatibility more than the
performance.

Note that even if we solve the null issue when a reference type is
changed to a value type and I think it's not something we should focus
on, there is still the problem of the identity, so transforming a
reference to a value type will never be 100% compatible.

Rémi

> From: "John Rose"
> To: "Brian Goetz"
> Cc: "valhalla-spec-experts"
> Sent: Monday, May 14, 2018 18:06:05
> Subject: Re: value type hygiene

> On May 12, 2018, at 7:34 AM, Brian Goetz wrote:
>> I get that. What I'm saying is: boxes have a place in the user model.
>> We may hate them, but without them, we likely find ourselves "boxed" into
>> a corner. So I don't want them to be a library convention; I want them to
>> be understood by, say, asType(). Otherwise we're playing whack-a-mole.

> I don't fully understand the point you make by "a library convention". Maybe
> "only a library convention unknown to the rest of the stack"? The existing box
> types are a library convention. It's even an irregular one (int vs. Integer).
> The JLS recognizes them (in the box/unbox rules). Following the JLS, so do
> asType, core reflection, etc. Since value types "code like a class" there are
> new moves for making library conventions unavailable to primitive types, and
> adding an interface super seems to be a better move, for value types, than the
> companion type pattern we must use for int/Integer.

> Further, we can meet many of the requirements met by the companion class
> pattern by using the generic-super pattern (V <: ValueRef<V>) in the case of
> value types. One requirement *not* met by using the generic-super pattern is
> run-time reification of the box types, since before erasure ValueRef<V> and
> ValueRef<W> are different types, but to the VM they are the same type. But I
> don't think that's a deal-killer. Maybe you see a problem with the erasure
> that I don't?

> Specifically: Do we really need a VT version of the reified wrapper type
> Integer? That's what I'm trying to question here, at the risk of playing
> whack-a-mole. There is serious money to be saved if we can decide the
> companion class isn't needed in the case of value types, even if it is
> necessary scaffolding for non-L-types.

> It seems to me that most or all of the machinery in reflection and method
> handles and the JLS for special-casing the companion classes exists to hoist
> primitives into the L-descriptor world. When the hoisting occurs to a wrapper
> class, many use cases go straight up to Object itself (or to another super of
> a wrapper). Others stay on the wrapper just to make a method call like
> toString or hashCode. Since value types are already classes with methods, and
> already are L-descriptors, it follows that they don't need wrapper types very
> often.

> Expressing nullity is one of those residual use cases; we know it happens
> sometimes but the JVM needs help calling it out as a special case. I claim we
> don't need a fully differentiated, runtime-reified wrapper type like Integer
> to handle those occasional special cases. In the JVM we just need a way to
> process the nullable VT instance as an Object or an interface. In the language
> an erasable static type like ValueRef<VT> works as well as a fully reified
> companion type like Integer.

> How far should the language go in healing the gap between the int/Integer
> pattern and the VT/VR pattern? Probably not too far until we are ready to
> fully unify the primitives with values. But there are simple things we could
> do that might help, like making a new notation like C.BOX that connects the
> various types in new ways.

> — John

From john.r.rose at oracle.com  Mon May 14 23:53:31 2018
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 14 May 2018 16:53:31 -0700
Subject: value type hygiene
In-Reply-To: <424446009.441965.1526339639570.JavaMail.zimbra@u-pem.fr>
References: <15062ede-ba64-c6c0-21eb-cbe7eb376c4a@oracle.com>
 <9C0128E4-C4D4-4E8D-B154-BEE0CC9C47C1@oracle.com>
 <424446009.441965.1526339639570.JavaMail.zimbra@u-pem.fr>
Message-ID: <682CDE4C-E077-4899-9460-27FBF4E11075@oracle.com>

On May 14, 2018, at 4:13 PM, Remi Forax wrote:
>
> I think I prefer a declaration site annotation like @ValueBasedClass to a use site annotation ValueRef.

In case it's not clear what I am proposing, I am not weighing in on
language notations, but rather on VM structures in the context of the
L-world experiment:

- The L-world JVM should *not* be asked to invent two new runtime
  types for every VT.
- Nullable references to value types in new code should be expressed
  using reference types.
- In bytecodes, such nullable references can be Object or a new
  VT-specific interface or a mix of both.

In source code, I really don't care how nullability is expressed:
annotation? keyword? punctuation? Java 5 generic? etc. In source code
there is the *option* to use interface ValueRef<VT> to denote "nullable
ref to VT", erased to "LValueRef;" in the JVM. I think that would be a
clever use of erased static types, and a thrifty re-use of existing
Java 5 language features.

(Surely a bespoke syntax would be more concise; that's fine, but when
it boils down to a JVM descriptor, please let it be non-nullable "LVT;"
or else a reference type like "Ljava/lang/Object;" or
"Ljava/lang/ValueMumble;", not a new kind of descriptor or an MVT-style
mangled name. I am claiming we don't need that and don't want to pay
for it. Please prove me wrong, if I am wrong, by experiment with a
restricted L-world JVM running real code, and not by positing various
unlikely compatibility puzzles.)

> For me a value based class, if you want to be as compatible as possible, is a reference type (so 'L') that behaves like a value type at runtime, so the JIT can see it as a value type at runtime and can unbox it at will and buffer/box it before calling a method at the horizon of the inlining blob.
>
> So as a library designer, I can choose to either replace a class by a real value type and it will fail if null is present or as a value based class if I value (sorry !) the backward compatibility more than the performance.

(Now I'll respond to your suggestion of notation.)

The annotation approach you suggest makes it nice and easy to sneak an
annotation next to a use of a type. That's appealing. It's also sneaky
since it slightly modifies the behavior of the problem (NPE vs. pass).
The downside of the annotation is that, unless coupled with an
annotation-sensitive translation strategy, the JVM has to mine through
annotations to find out which descriptor components are nullable and
which are not. The ValueTypes attribute is our best proposal yet for a
way to pass the necessary information through to the JVM; surely you
don't expect the JVM to turn around and look at type annotations in
addition?

I prefer ValueRef as a way to express nullability (instead of a type
annotation) because it carries the same information, but all in the
(static) type system. After erasure, the JVM can inspect the
descriptor directly and immediately see whether the type is nullable
or not. (ValueRef is *not* in the ValueTypes table. Simple!)
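(To illustrate with a hedged sketch, reusing the ValueRef interface
from earlier in the thread, plus an invented value class Point and the
hypothetical Point.default syntax:)

    value class Point implements ValueRef<Point> { ... }   // hypothetical

    class Finder {
        // Nullable by declaration: erases to the descriptor ()LValueRef;
        // and ValueRef is not in the ValueTypes table, so the JVM treats
        // the result as an ordinary nullable reference.
        static ValueRef<Point> tryFind() { return null; }

        // Non-nullable: descriptor ()LPoint; with Point listed in the
        // ValueTypes attribute, so a null return is rejected.
        static Point find() { return Point.default; }
    }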
You could also drive the translation strategy to perform surgery on
descriptor elements annotated by @ValueBasedClass. Thus, an argument
or return type of type VT with that annotation would be adjusted by
the translation strategy to descriptor Object (erased from VT-or-null)
or a suitable interface (such as ValueRef). That is an erasure, and so
will disturb overloading rules and other things. Or we could push the
annotation down into the descriptor syntax as a modifier. That's
Q-world, with its own complementary problems we are trying to avoid by
running the L-world experiment.

One problem that shows up both at JVM and language level is that any
special mechanisms we invent to grease the skids of migration will be
with us forever, even after migration is a distant memory. I'd want a
story for deprecating and removing API points that need the extra
grease of @ValueBasedClass. Eventually @ValueBasedClass itself should
be deprecated.

> Note that even if we solve the null issue when a reference type is changed to a value type and I think it's not something we should focus on, there is still the problem of the identity, so transforming a reference to a value type will never be 100% compatible.

+100. Trying to cleverly solve every last compatibility problem will
freeze us into inaction.

— John

From john.r.rose at oracle.com  Tue May 15 00:57:01 2018
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 14 May 2018 17:57:01 -0700
Subject: value type hygiene
In-Reply-To: 
References: <4DE1FC97-835B-435F-AE48-3DE3E60A457D@oracle.com>
 <77993d79-5097-e9c3-f2f7-d5f744396137@oracle.com>
 <2066983148.1293217.1525970526700.JavaMail.zimbra@u-pem.fr>
 <14A662F2-ACC2-4AB4-8A19-451DF26D6F1A@oracle.com>
 <73A0FC3C-C180-41C3-AED5-313C5BD1CA3A@oracle.com>
 <20053177.1351568.1525990137944.JavaMail.zimbra@u-pem.fr>
Message-ID: <842319FF-B112-43CD-8824-39EFD5AF07E5@oracle.com>

On May 11, 2018, at 10:48 AM, Dan Smith wrote:
>
>> On May 10, 2018, at 7:36 PM, John Rose wrote:
>>
>> There could be an interface default method, or some other method,
>> which is simultaneously a member of two trees with two different
>> decisions about scalarization vs. buffering. This can be handled
>> by having the JVM create multiple adapters. I'd rather forbid the
>> condition that requires multiple adapters as a CLC violation,
>> because it is potentially complex and buggy, and it's not clear
>> we need this level of service from the JVM.
>
> Use case:
>
> ---
>
> Library code
>
> interface Event {
>     LocalDateTime timestamp();
> }
>
> ---
>
> Client 1 code, compiled with reference class LocalDateTime
>
> class Client1Event implements Event { ... }
>
> ---
>
> Client 2 code, compiled with value class LocalDateTime
>
> class Client2Event implements Event { ... }
>
> ---
>
> Various circumstances can lead to two different clients running on a single JVM, including, say, dependencies between different libraries.
>
> Am I understanding correctly that you would consider loading/invoking these methods to be a JVM error?

(Do you mean "loading these classes"?)

Let's assume, because it is most likely, that the library interface
Event is updated to know the LDT is a VT. Client1Event is the legacy
class. Depending on what it does with the Event type, its code is
potentially out of date. We have grounds to hope it could still run;
if it only uses factory methods to obtain LDT values, and doesn't
introduce nulls on its own, then it might just work. OTOH, there are
lots of corner cases where it might get itself into trouble.
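(For instance, a hedged sketch of one such corner case, elaborating
Dan's Client1Event:)

    // Client 1, compiled against the old, nullable reference LocalDateTime:
    class Client1Event implements Event {
        private LocalDateTime ts;     // reference field, initially null
        public LocalDateTime timestamp() {
            return ts;   // legacy code may return null here, into callers
                         // that now treat LocalDateTime as non-nullable
        }
    }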
We sometimes throw AME after loading a no-longer-valid implementation of an interface. The AME is thrown later on when an affected API point is called. Some gross violations (as when something is no longer an interface) are sanctioned when the class is loaded. I am considering various ways to state that Client1Event, despite its innocent-looking method descriptor, is no longer a valid implementation of Event after the upgrade of Event. There are various ways to do detect and report the mismatch: 1. Exclude loading the class, on grounds similar to CLC's. 2. Fail to fill the v-table slot for Event.timestamp, leading to AME. 3. Fill the v-table slot with a null-rejecting version of Client1Event.timestamp. The earlier ways give more decisive diagnostics, but reject workable code along with unworkable code. What you quote above is me suggesting something like #1. That would give a fast-fail diagnostic. I think it's a good first experiment to try, although I suppose we are likely to find it is too harsh. A more lenient option is #3, because it sanctions unworkable code only when that code actually produces a null and tries to mix it into an API that treats values as non-nullable types. Doing this requires putting an adapter of some sort into the v-table entry for the override of the Event.timestamp method in Client1Event. The adapter has to agree to call the legacy method, and then null-check its return value. I think I prefer #3 to #1, FTR, although I'd like to get away with #1. Tobias points out that we seem to need two entry points for methods with optimized calling sequences, one for when all callers and callees agree on flattening, and another for use in contexts where close coupling of calling sequences is not desired or not possible. This may include lambda forms, reflection, and/or the interpreter. So #3 has an efficient implementation in that setup, where Client1Event.timestamp has a non-flattened entry point for legacy code to call, and one with the upgraded calling sequence, for use in the v-table. The flattened entry point simply null-checks the return value; otherwise it is identical to the non-flattened one. In the above quoted text, I say, "This can be handled by having the JVM create multiple adapters". The reason I want to avoid that, even at the cost of the harshness of #1, is that a function whose descriptor mentions N different value types has up to 2^N different calling sequences. I'd rather pick the one that most optimally applies to the API that defines the method which creates the v-table slot, and define at most one more, which boxes everything and is used in all cases where the preferred calling sequence won't work (such as legacy code wanting to pass null). It's natural to ask, "Why can't we all be friends here?" Maybe we could allow legacy code free use of nulls in all APIs which mention value types, even new APIs. Then legacy code will never experience a NPE even when using upgraded value type values. I'm pretty sure this option would be much more expensive to implement than the previous options (1/2/3). All optimized v-table calling sequences would be speculative and tentative, flattening only until the first null shows up, and deoptimizing after that, or at least using data-dependent slow paths. And the expense would leak through to users, because bad actors (who refuse to upgrade) would slow down all code using the new APIs. (Reminder: v-table slot refers to an overridable method descriptor in a particular class which first mentions that descriptor. 
It is a JVM level concept but is portable across JVMs. Calling
sequences are easier to optimize in other settings, because we know
exactly which method we are calling. But v-table slots mediate virtual
and interface methods, which must agree on a common calling sequence
for any given v-table slot.)

— John

From john.r.rose at oracle.com  Tue May 15 01:18:20 2018
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 14 May 2018 18:18:20 -0700
Subject: value type hygiene
In-Reply-To: <18B58ACE-1FB8-4D68-8652-F30CC7EE9A70@oracle.com>
References: <18B58ACE-1FB8-4D68-8652-F30CC7EE9A70@oracle.com>
Message-ID: <59A6AE0C-9CAA-4C20-AD05-0E401F95F18E@oracle.com>

On May 12, 2018, at 7:32 AM, Brian Goetz wrote:
>
> I want to drill into this point a bit. I know that you would prefer not to make heroic efforts to deal with cases where a random subset of a hierarchy was compiled with one polarity and the rest with the other, but I think there's reasons to keep drilling on it. (Also, because I don't think you'll get away with it.)

When you say "polarity" it sounds like the two states are equally
likely. In fact, one state is clearly preferable, and one is locally
incorrect and tolerated, to some extent, by the rest of the system.
This is true in both Q-world and L-world. In Q-world, eventually the
adapter spinners will say, "you can't do that". In L-world, there are
no adapter spinners, but the same "laws of value physics" apply. At
some point legacy code can do something so bad that we must forbid it,
lest we inflict a performance hit on the entire system.

What is the point at which that occurs? I think the easiest way to
find this point is to make a fairly restrictive JVM (plus lint modes
in javac) and learn what bad code looks like, and (more subtly) what
not-quite-bad code looks like, that we want to keep running.

An obvious example of bad code is something that calls monitorenter.
In the setting of null hygiene, bad code can use null as an
out-of-band value of a VBC, and expect the rest of the system to
faithfully store it in arbitrary containers, eventually returning it
undamaged to the bad actor. This can't work. Finding ways to sieve
out the really bad legacy code from workable stuff is an experimental
process.

> I'll note that this reminds me further of a related migration discussion in Burlington (DanH was there too) when we were exploring how to crowbar L/D primitives into one slot. Teaching the interpreter to do so in a consistent world was easy; the hard part was dealing with calling sequences / overrides across separately compiled files that had differentially gotten the memo about the new encoding. We talked about various adapters/fixups needed at the junction of a slot arity mismatch.

The thing that's different here is there are no explicit adapters
across different descriptors. The L-world JVM can spin them internally
as needed, as it already does, under the matched descriptors. What's
also different is everything runs under one slot; the J and D
descriptors are just hard to work with because of low-level stack
wrangling problems.

> While we didn't solve it then either, I claim we want to solve it anyway, because this is how we get to primitives as values.

Based on our conversations about the matter, I'll be surprised if we
get to primitives as values and preserve the two-slot nature of J and
D, unless we introduce one-slot versions of those types. The two-slot
problem is hard all the way around, not just in L-world.
In any case, I think we don't need to solve primitives-as-values in
L-world, as long as we think L-world doesn't deprive us of important
moves.

> Just as the ValueTypes attribute lists the types known as of compile time to be values, we can similarly reify, on a per-class basis, whether the class is able to treat primitives as values. When class A calls / overrides a method in B with the same encoding, everything is great; when A and B differ on the value-ness of primitives, an adaptation, similar to the scalarize/bufferize adaptation, is needed at the junction. Over time, fewer classes will be out of date and the adapters will eventually be purged from the ecosystem.

That sounds plausible. We might be able to pull the trick of
normalizing old descriptors to new ones, rewriting "I" to "Lint;" etc.
Or we might choose to let new and old descriptors co-exist. It does
feel like the flattened calling sequence problem we are talking about
with L-world, but it is riskier, since the classfiles must embody a
tentative solution to a problem that might change between compile time
and run time, rather than having the JVM come up with a solution on
the fly, from final information.

I don't think we ever want to change J and D to single slot entities.
I think we will want to make new L-type descriptors like Llong; and
Ldouble; (or Q-type, in Q-world).

— John

From david.holmes at oracle.com  Tue May 15 01:32:44 2018
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 15 May 2018 11:32:44 +1000
Subject: JEP-181 Nest-based Access Control is out for review
Message-ID: <2a24bb8c-0a2b-4567-6cfd-40fdae649577@oracle.com>

I'm pleased to announce that the implementation of JEP-181 has now gone
out for community review with the intent it be targeted to, and
integrated into JDK 11 - hopefully by the end of this month. (Those
processes still have to follow their natural course ...)

My thanks and appreciation to all those involved in the project so far:
Alex Buckley, Maurizio Cimadamore, Mandy Chung, Tobias Hartmann,
Vladimir Ivanov, Karen Kinnear, Vladimir Kozlov, John Rose, Dan Smith,
Serguei Spitsyn, Kumar Srinivasan. Hopefully I did not overlook anyone.

David

From paul.sandoz at oracle.com  Tue May 15 01:53:31 2018
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Mon, 14 May 2018 18:53:31 -0700
Subject: value type hygiene
In-Reply-To: <28A2DCCB-263B-4D0E-BAB6-975DECC643B9@oracle.com>
References: <9D5A86F2-0F1F-4E23-BC51-7A69D6376905@oracle.com>
 <28A2DCCB-263B-4D0E-BAB6-975DECC643B9@oracle.com>
Message-ID: <14FCCF01-AB8B-48A1-993C-2D00D6527909@oracle.com>

Hi John,

The answers below might depend on experimentation but what might you
propose the behavior should be for the following code, assuming we
have no specialized generics, and ArrayList is not yet modified to
cope better:

value class Point { ... }

class VW {
    public static void main(String[] s) {
        List<Point> l = new ArrayList<>();
        l.add(Point.default);
        l.add(Point.default); // assuming this works :-)

        Point[] p = new Point[10]; // Flattened array is created
        l.toArray(p); // What should happen here?
    }
}

(I know toArray is value hostile and maybe should be deprecated or
changed but I find it a useful example to think about as it may be
indicative of legacy code in general.)

Should the call to l.toArray link? If so, then I presume some form of
array store exception will be thrown when ArrayList attempts to store
null into the flattened array at index 2?

Or:

    Point[] p = l.toArray(new Point[2]);

a flattened array is returned (the argument)? assuming
System.arraycopy works.
Or:

  Point[] p = l.toArray(new Point[1]);

a non-flattened array is returned? since Arrays.copyOf operates reflectively on the argument's class and not additional runtime properties.

What about:

  Object[] o = l.toArray();

A non-flattened array is returned containing elements that are instances of boxed Point?

Paul.

> On May 14, 2018, at 4:02 PM, John Rose wrote:
>
> On May 11, 2018, at 7:39 AM, Frederic Parain wrote:
>>
>> John,
>>
>> I have a question about the semantics within legacy class files (class files lacking a ValueTypes attribute). Your document clarifies the semantics for fields as follows:
>>
>> "Meanwhile, C is allowed to putfield and getfield null all day long into its own fields (and fields of other benighted legacy classes that it may be friends with). Thus, the getfield and putfield instructions link to slightly different behavior, not only based on the format of the field, but also based on 'who's asking'. Code in C is allowed to witness nulls in its Q fields, but code in A (upgraded) is not allowed to see them, even though it's the same getfield to the same symbolic reference. Happily, fields are not shared widely across uncoordinated classfiles, so this is a corner case mainly for testers to worry about."
>>
>> But what about arrays? If I follow the same logic that "old code needs to be left undisturbed if possible", if a legacy class C creates an array of Q, without knowing that Q is now a value type, C would expect to be allowed to write and read null from this array, as it does from its own fields. Is that a correct assumption?
>
> Yes, I avoided this question in the write-up. To apply the same move as fields, we could try to say that arrays of type V[] created by a legacy class C do not reject nulls, while arrays of type V[] created by normal classes (that recognize V as value types) are created as flattened.
>
> But the analogy between fields and array elements doesn't work in this case. While a class C can only define fields in itself, by creating arrays it is working with a common global type. Think of V[] as a global type, and you'll see that it needs a global definition of what is flattened and what is nullable. I think we will get away with migrating types and declaring that legacy classes that use their arrays will fail. The mode of failure needs engineering via experiment. We could go so far as to reject legacy classes that use anewarray to build arrays of value type, without putting those types on the ValueTypes list.
>
> This means that if there is a current class C out there that is creating arrays of type Optional[] or type LocalDate[], then if one of those types is migrated to a value type, then C becomes a legacy class, and it will probably fail to operate correctly. OTOH, since those classes use factories to create non-null values of type Optional or LocalDate, such a legacy class is likely to refrain from using nulls. I think it's possible but not likely that the author of a legacy class will make some clever use of nulls, storing them into an array of upgraded type V.
>
> In the end, some legacy code will not port forward without recompilation and even recoding. Let's do what we can to make it easier to diagnose and upgrade such code, as long as it doesn't hurt the basic requirement of making values flattenable. The idea of making fields nullable seems a reasonable low-cost compromise, but making elements nullable a much higher cost.
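> To make the contrast concrete, here is a minimal sketch, assuming the L-world rules we have been discussing (the exact failure mode of a null store into a flattened array is still to be engineered):
>
>   value class Point { int x; int y; }
>
>   Point[] flat = new Point[4];    // V[] is a global type; elements flattened
>   flat[0] = null;                 // would fail at run time (NPE or ASE, TBD)
>
>   Object[] boxy = new Object[4];  // ordinary reference array
>   boxy[0] = flat[1];              // fine: element is a buffered Point
>   boxy[0] = null;                 // fine: nulls are always welcome here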
> Any need for a boxy or nullable array is more easily served by an explicit reference array, of type Object[] or ValueRef[]. Overloading that behavior into V[] is asking for long-term trouble with performance surprises. Erased Object or interface arrays will fill this gap just as well as a first-class nullable VT.BOX[], with few exceptions. I think those exceptions are manageable by other means than complicating (un-flattening) the basic data types of the VM.
>
>> This would mean that the JVM would have to make the distinction between an array of nullable elements, and an array of non-nullable elements.
>
> We could try this, but let's prove that it's worth the trouble before pulling on that string. I'm proposing Object[] and ValueRef[] as workaround types.
>
>> Which could be a good thing if we want to catch leaking of arrays with potentially null elements from old code to new code, instead of waiting for new code to access a null element to throw an exception.
>
> Why not try to catch the problem when the array is created? Have the anewarray instruction do a cross check (like CLCs) between the base type of the array and the local ValueTypes.
>
>> On the other hand, the lazy check solution allows arrays of non-nullable elements with zero null elements to work fine with new code.
>
> So, we have discussed the alternative of adding extra polymorphism to all value array types: Some arrays are flat and reject nulls, while others are boxy and accept nulls. But here again I want to push back against inventing a parallel set of boxy implementations, because it's a long term systemic cost for a short term marginal gain.
>
> Besides, some library classes don't use native anewarray but use jlr.Array.newInstance to make arrays. Do we make that guy caller-sensitive so he can tell which kind of array to make? I think this is a long string to pull on. It's easier to define something as "clearly in error" (see above) than to try to fix it on the fly, because you probably have to fix more and more stuff, and keep track of the fixes. Like I say, long term cost for marginal migration improvements.
>
>> From an implementation point of view, the JVM already has to make the distinction between flattened and not flattened arrays, so there's logic in place to detect some internal constraints of arrays, but the nullable/non-nullable element semantic would require one additional bit.
>
> We *can* do this, but we shouldn't, because (a) it's a long string to pull on a user model that is ultimately disappointing, and (b) it means that even optimized code, twenty years from now, will have to deal with this extra polymorphism.
>
> -- John
From forax at univ-mlv.fr Tue May 15 06:12:22 2018
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Tue, 15 May 2018 08:12:22 +0200 (CEST)
Subject: value type hygiene
In-Reply-To: <682CDE4C-E077-4899-9460-27FBF4E11075@oracle.com>
References: <15062ede-ba64-c6c0-21eb-cbe7eb376c4a@oracle.com> <9C0128E4-C4D4-4E8D-B154-BEE0CC9C47C1@oracle.com> <424446009.441965.1526339639570.JavaMail.zimbra@u-pem.fr> <682CDE4C-E077-4899-9460-27FBF4E11075@oracle.com>
Message-ID: <1297467316.468949.1526364742437.JavaMail.zimbra@u-pem.fr>

> From: "John Rose" <john.r.rose at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "Brian Goetz" <brian.goetz at oracle.com>, "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Tuesday, May 15, 2018 01:53:31
> Subject: Re: value type hygiene

> On May 14, 2018, at 4:13 PM, Remi Forax <forax at univ-mlv.fr> wrote:
>> I think I prefer a declaration site annotation like @ValueBasedClass to a use site annotation ValueRef.

> In case it's not clear what I am proposing, I am not weighing in on language notations, but rather on VM structures in the context of the L-world experiment:
> - The L-world JVM should *not* be asked to invent two new runtime types for every VT.
> - Nullable references to value types in new code should be expressed using reference types.
> - In bytecodes, such nullable references can be Object or a new VT-specific interface or a mix of both.

I agree with your first 2 items but not the last one; a nullable reference should be an L reference which is not listed in the attribute ValueTypes.

So to be clear, what I'm proposing is to have a way at source level to say it's a value type at runtime but it behaves like a reference type: a class which is tagged with the value type bit but that the compiler doesn't list in the attribute ValueTypes.

> In source code, I really don't care how nullability is expressed: annotation? keyword? punctuation? Java 5 generic? etc.

In my opinion, it has to be a declaration site thing, not a use site thing, but as you said, I do not mind if it's an annotation or a keyword. But it should not be erased generics, since that is a use site 'annotation'. That's why I said I prefer an annotation at declaration site like @ValueBased.

> In source code there is the *option* to use interface ValueRef to denote "nullable ref to VT", erased to "LValueRef;" in the JVM. I think that would be a clever use of erased static types, and a thrifty re-use of existing Java 5 language features.

I do not think we need to invent something other than the attribute ValueTypes.

> (Surely a bespoke syntax would be more concise; that's fine, but when it boils down to a JVM descriptor, please let it be non-nullable "LVT;" or else a reference type like "Ljava/lang/Object;" or "Ljava/lang/ValueMumble;", not a new kind of descriptor or an MVT-style mangled name. I am claiming we don't need that and don't want to pay for it. Please prove me wrong, if I am wrong, by experiment with a restricted L-world JVM running real code, and not by positing various unlikely compatibility puzzles.)

I agree with you.

>> For me a value based class, if you want to be as compatible as possible, is a reference type (so 'L') that behaves like a value type at runtime, so the JIT can see it as a value type at runtime and can unbox it at will and buffer/box it before calling a method at the horizon of the inlining blob.
>> So as a library designer, I can choose either to replace a class by a real value type, and it will fail if null is present, or by a value based class, if I value (sorry!) the backward compatibility more than the performance.
> (Now I'll respond to your suggestion of notation.)
>
> The annotation approach you suggest makes it nice and easy to sneak an annotation next to a use of a type. That's appealing. It's also sneaky, since it slightly modifies the behavior of the program (NPE vs. pass). The downside of the annotation is that, unless coupled with an annotation-sensitive translation strategy, the JVM has to mine through annotations to find out which descriptor components are nullable and which are not. The ValueTypes attribute is our best proposal yet for a way to pass the necessary information through to the JVM; surely you don't expect the JVM to turn around and look at type annotations in addition?

I was not clear in my previous mail: I was proposing a declaration site mechanism like the annotation, not using that annotation at runtime.

> I prefer ValueRef as a way to express nullability (instead of a type annotation) because it carries the same information, but all in the (static) type system. After erasure, the JVM can inspect the descriptor directly and immediately see whether the type is nullable or not. (ValueRef is *not* in the ValueTypes table. Simple!)

I think that the attribute ValueTypes is enough, so we do not have to have another use site thingy.

> You could also drive the translation strategy to perform surgery on descriptor elements annotated by @ValueBasedClass. Thus, an argument or return type of type VT with that annotation would be adjusted by the translation strategy to descriptor Object (erased from VT-or-null) or a suitable interface (such as ValueRef). That is an erasure, and so will disturb overloading rules and other things.
>
> Or we could push the annotation down into the descriptor syntax as a modifier. That's Q-world, with its own complementary problems we are trying to avoid by running the L-world experiment.
>
> One problem that shows up both at JVM and language level is that any special mechanisms we invent to grease the skids of migration will be with us forever, even after migration is a distant memory. I'd want a story for deprecating and removing API points that need the extra grease of @ValueBasedClass. Eventually @ValueBasedClass itself should be deprecated.

Yes, we still use raw types; but again, if it's a class declaration hint for the compiler, it will be used for a handful of classes in the source code, and it uses the attribute ValueTypes. For the VM side it's not another mechanism, it's the same mechanism, but we let the end user choose which kind of compatibility he wants.

>> Note that even if we solve the null issue when a reference type is changed to a value type, and I think it's not something we should focus on, there is still the problem of identity, so transforming a reference to a value type will never be 100% compatible.
>
> +100. Trying to cleverly solve every last compatibility problem will freeze us into inaction.
>
> -- John
Rémi

From john.r.rose at oracle.com Tue May 15 06:36:31 2018
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 14 May 2018 23:36:31 -0700
Subject: value type hygiene
In-Reply-To: <14FCCF01-AB8B-48A1-993C-2D00D6527909@oracle.com>
References: <9D5A86F2-0F1F-4E23-BC51-7A69D6376905@oracle.com> <28A2DCCB-263B-4D0E-BAB6-975DECC643B9@oracle.com> <14FCCF01-AB8B-48A1-993C-2D00D6527909@oracle.com>
Message-ID:

On May 14, 2018, at 6:53 PM, Paul Sandoz wrote:
>
> Hi John,
>
> The answers below might depend on experimentation, but what might you propose the behavior should be for the following code, assuming we have no specialized generics and ArrayList is not yet modified to cope better:

This is the right sort of question to ask and answer about value types, and the sooner we get L-world up and running, the quicker we can validate our answers.

We'll probably end up with a bunch of rules of thumb about how to handle nulls. In this case, null is a sentinel value used by the external API. There are two main choices with cases like this: (a) disallow the null when used with null-hostile types (V[]) and (b) adapt null to another value when a null-hostile type is detected.

> value class Point { ... }
>
> class VW {
>   public static void main(String[] s) {
>     List<Point> l = new ArrayList<>();
>     l.add(Point.default);
>     l.add(Point.default); // assuming this works :-)

(There's no reason why it shouldn't work.)

>     Point[] p = new Point[10]; // Flattened array is created
>     l.toArray(p); // What should happen here?
>   }
> }
>
> (I know toArray is value hostile and maybe should be deprecated or changed, but I find it a useful example to think about, as it may be indicative of legacy code in general.)

In the case of List.toArray(.) the simple answer IMO is (b), and the sentinel value is clearly T.default, to be computed something like this:

diff --git a/src/java.base/share/classes/java/util/AbstractCollection.java b/src/java.base/share/classes/java/util/AbstractCollection.java
--- a/src/java.base/share/classes/java/util/AbstractCollection.java
+++ b/src/java.base/share/classes/java/util/AbstractCollection.java
@@ -186,13 +186,13 @@
         for (int i = 0; i < r.length; i++) {
             if (! it.hasNext()) { // fewer elements than expected
                 if (a == r) {
-                    r[i] = null; // null-terminate
+                    r[i] = a.getClass().getComponentType().getDefaultValue();
                 } else if (a.length < i) {
                     return Arrays.copyOf(r, i);
                 } else {
                     System.arraycopy(r, 0, a, 0, i);
                     if (a.length > i) {
-                        a[i] = null;
+                        a[i] = a.getClass().getComponentType().getDefaultValue();
                     }
                 }
                 return a;

Eventually when int[] <: Object[], then int[].class.getComponentType().getDefaultValue() will return an appropriate zero value, at which point the above behavior will "work like an int".

Another way to make this API point "work like an int" would be to throw an exception (ASE or the like), on the grounds that you can't store a null into an int[] so you shouldn't be able to store a null into a Point[].

> Should the call to l.toArray link?

Yes, because Point[] <: Object[]. There's a separate question on whether the source language should allow the instance List<Point>; I think it should do so because that's more useful than disallowing it.

> If so then I presume some form of array store exception will be thrown when ArrayList attempts to store null into the flattened array at index 2?

In the case of the List API it's more useful for the container, which is attempting to contain all kinds of data, to bend a little and store T.default as the proper generalization of null.
Under this theory, Object.default == null, and X.default is also null for any non-value, non-primitive X. (Including Integer but not int.)

> Or:
>
>   Point[] p = l.toArray(new Point[2]);
>
> a flattened array is returned (the argument)? assuming System.arraycopy works.

It does. (Or will.) Reason: Point[] <: Object[].

> Or:
>
>   Point[] p = l.toArray(new Point[1]);
>
> a non-flattened array is returned?

The reflective argument's class is Point[], so Arrays.copyOf has no choice but to create another instance of Point[], which will also be flattened. It appears that Arrays.copyOf won't need any code changes for values.

(As I replied to Frederic, it is technically possible to imagine a system of non-flat versions of VT[] co-existing with flat versions of VT[], but we shouldn't do that just because we can; only because there is a proven need and not doing it is even more costly than doing it. There are good substitutes for non-flat VT[], such as Object[] and I[] where VT <: I. We can even contrive to gain static typing for the substitutes, by using the ValueRef device.)

> since Arrays.copyOf operates reflectively on the argument's class and not additional runtime properties.

I don't get this. What runtime properties are you thinking of? ValueClasses? That exists to give meaning to descriptors. The actual Class mirror always knows exactly whether it is a value class or not, and thus whether its arrays are flat or not.

> What about:
>
>   Object[] o = l.toArray();
>
> A non-flattened array is returned containing elements that are instances of boxed Point?
>
> Paul.

Yes, this is a non-flattened array, since Object[] is never flattened.

Here's another option if (a) and (b) don't work out for List: Globally define a mapping between value types and null, and make the VM silently "unbox" null into the correct value type. This isn't a cure-all, because it masks true NPE errors in code. And it only applies when a null is being stored into a container which is strongly typed as a value type.

When faced with a non-nullable container of a value type VT, promote stored nulls to VT.default, for all VTs, or else for VTs which opt in (maybe VT <: interface PromoteNullToDefault). If we buy that trick, then a[i] = null turns into a[i] = VT.default automatically everywhere, not just in AbstractCollection. This is technically possible, but IMO would require experimentation with a real VM running actual code, to see where the paradoxes arise from erasing nulls quietly to VT.default.

I'd rather try to get away with changing the API of List to store not null but rather VT.default, when the passed-in array is a value array. This change only affects behavior on new types (value arrays), so it is backward compatible, in some strict sense. And it is arguably unsurprising to a programmer who is working with value arrays. At least, I think it is a defensible generalization of the old rule to store a null after the last stored output value.

-- John
From john.r.rose at oracle.com Tue May 15 06:56:49 2018
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 14 May 2018 23:56:49 -0700
Subject: value type hygiene
In-Reply-To: <1297467316.468949.1526364742437.JavaMail.zimbra@u-pem.fr>
References: <15062ede-ba64-c6c0-21eb-cbe7eb376c4a@oracle.com> <9C0128E4-C4D4-4E8D-B154-BEE0CC9C47C1@oracle.com> <424446009.441965.1526339639570.JavaMail.zimbra@u-pem.fr> <682CDE4C-E077-4899-9460-27FBF4E11075@oracle.com> <1297467316.468949.1526364742437.JavaMail.zimbra@u-pem.fr>
Message-ID:

On May 14, 2018, at 11:12 PM, forax at univ-mlv.fr wrote:
>
>> From: "John Rose" <john.r.rose at oracle.com>
>> To: "Remi Forax" <forax at univ-mlv.fr>
>> Cc: "Brian Goetz" <brian.goetz at oracle.com>, "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
>> Sent: Tuesday, May 15, 2018 01:53:31
>> Subject: Re: value type hygiene
>> On May 14, 2018, at 4:13 PM, Remi Forax wrote:
>>> I think I prefer a declaration site annotation like @ValueBasedClass to a use site annotation ValueRef.

D'oh. You said declaration site and you really meant it, at the declaration of the value class itself? I was responding as if you were proposing a use-site annotation, where a method is declared that uses a VT but wants to make that occurrence of the VT nullable. So, some of my reply was non-responsive; sorry.

>> In case it's not clear what I am proposing, I am not weighing in on language notations, but rather on VM structures in the context of the L-world experiment:
>> - The L-world JVM should *not* be asked to invent two new runtime types for every VT.
>> - Nullable references to value types in new code should be expressed using reference types.
>> - In bytecodes, such nullable references can be Object or a new VT-specific interface or a mix of both.
>
> I agree with your first 2 items but not the last one; a nullable reference should be an L reference which is not listed in the attribute ValueTypes.

You must admit this use of supers to carry nullable values is possible, but you are saying (I think) that you don't agree that this is useful.

> So to be clear, what I'm proposing is to have a way at source level to say it's a value type at runtime but it behaves like a reference type: a class which is tagged with the value type bit but that the compiler doesn't list in the attribute ValueTypes.

What would be the benefit of such a value type? If it is nullable everywhere, conversely it is flattenable nowhere. That seems like it's giving up a fair chunk of valuable value-ness. The VT itself would resist identity checks (acmp => false). Would the arrays be flattenable or not? Seems to me that if a VT author uses such a big hammer to ask for nullability, the arrays also should allow null, hence be non-flattenable. I don't see much payoff from this user model.

>> In source code, I really don't care how nullability is expressed: annotation? keyword? punctuation? Java 5 generic? etc.
>
> In my opinion, it has to be a declaration site thing, not a use site thing, but as you said, I do not mind if it's an annotation or a keyword. But it should not be erased generics, since that is a use site 'annotation'. That's why I said I prefer an annotation at declaration site like @ValueBased.
> ...
> I was not clear in my previous mail: I was proposing a declaration site mechanism like the annotation, not using that annotation at runtime.

Got it now; see above.

> ... it will be used for a handful of classes in the source code, and it uses the attribute ValueTypes. For the VM side it's not another mechanism, it's the same mechanism, but we let the end user choose which kind of compatibility he wants.
Changing a VBC to a VT, and then putting @VBC on it, is like ten steps forward and nine steps back, if I'm correctly understanding the implications about flattening.

So for def-site choices we have:

0. leave it alone, it's a VBC
1. make it a proper value type, get flattening on recompile, and deal with the null hygiene fallout
0.1 make it a value type but mark it @VBC, no sync or acmp, no flattening either

The use-site choices for VTs are:

0. what choice? you didn't want that API point anyway
1. Object is the untyped workaround for all your nullable needs
1.2 clever ValueRef is your statically typed workaround for nullables
2. Q-world: ad hoc variations everywhere between L-VT and Q-VT (cost += 1e6)
3. some sugar like VT.BOX or an annotation for one of the previous

-- John

From forax at univ-mlv.fr Tue May 15 08:05:22 2018
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Tue, 15 May 2018 10:05:22 +0200 (CEST)
Subject: value type hygiene
In-Reply-To:
References: <9C0128E4-C4D4-4E8D-B154-BEE0CC9C47C1@oracle.com> <424446009.441965.1526339639570.JavaMail.zimbra@u-pem.fr> <682CDE4C-E077-4899-9460-27FBF4E11075@oracle.com> <1297467316.468949.1526364742437.JavaMail.zimbra@u-pem.fr>
Message-ID: <1511980940.533002.1526371522545.JavaMail.zimbra@u-pem.fr>

> From: "John Rose" <john.r.rose at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "Brian Goetz" <brian.goetz at oracle.com>, "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Tuesday, May 15, 2018 08:56:49
> Subject: Re: value type hygiene

> On May 14, 2018, at 11:12 PM, forax at univ-mlv.fr wrote:
>>> From: "John Rose" <john.r.rose at oracle.com>
>>> To: "Remi Forax" <forax at univ-mlv.fr>
>>> Sent: Tuesday, May 15, 2018 01:53:31
>>> Subject: Re: value type hygiene
>>> On May 14, 2018, at 4:13 PM, Remi Forax <forax at univ-mlv.fr> wrote:
>>>> I think I prefer a declaration site annotation like @ValueBasedClass to a use site annotation ValueRef.
>
> D'oh. You said declaration site and you really meant it, at the declaration of the value class itself? I was responding as if you were proposing a use-site annotation, where a method is declared that uses a VT but wants to make that occurrence of the VT nullable. So, some of my reply was non-responsive; sorry.
>
>>> In case it's not clear what I am proposing, I am not weighing in on language notations, but rather on VM structures in the context of the L-world experiment:
>>> - The L-world JVM should *not* be asked to invent two new runtime types for every VT.
>>> - Nullable references to value types in new code should be expressed using reference types.
>>> - In bytecodes, such nullable references can be Object or a new VT-specific interface or a mix of both.
>>
>> I agree with your first 2 items but not the last one; a nullable reference should be an L reference which is not listed in the attribute ValueTypes.
>
> You must admit this use of supers to carry nullable values is possible, but you are saying (I think) that you don't agree that this is useful.

We have already decided that j.l.Object is the super that can carry null, so yes, we do not need another one.
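A minimal sketch of what I mean, with hypothetical names (here Point is listed in the ValueTypes attribute of the using class, so its L-descriptor is null-hostile there, while Object stays null-friendly):

  value class Point { ... }

  Point p = getPoint();       // LPoint; is non-nullable in this class
  Object maybe = canFail() ? (Object) p : null;  // Object carries the null
  if (maybe != null) {
    Point again = (Point) maybe;  // checkcast back to the value type
  }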
>> So to be clear, what I'm proposing is to have a way at source level to say it's a value type at runtime but it behaves like a reference type: a class which is tagged with the value type bit but that the compiler doesn't list in the attribute ValueTypes.
>
> What would be the benefit of such a value type? If it is nullable everywhere, conversely it is flattenable nowhere. That seems like it's giving up a fair chunk of valuable value-ness. The VT itself would resist identity checks (acmp => false). Would the arrays be flattenable or not? Seems to me that if a VT author uses such a big hammer to ask for nullability, the arrays also should allow null, hence be non-flattenable. I don't see much payoff from this user model.

In a sense, you're right: asking for nullability comes with a high cost. It's not flattenable (otherwise you could not store null), and acmp => false (it's a value type at runtime), but you still have the fact that JITs can spill a nullable value type in registers, which is an important case. The idea is that even if a nullable value type escapes, the JIT doesn't have to keep it; it can scatter it into its multiple components and gather it back when it escapes.

If you take a look at Optional or LocalDate, I'm not sure the need for flattening is that important, but being able to treat it as a value type inside an inlining blob (inside a function of the generated assembly) is important in terms of performance when you do operations like map()/filter() or plus*()/minus*().

>>> In source code, I really don't care how nullability is expressed: annotation? keyword? punctuation? Java 5 generic? etc.
>
>> In my opinion, it has to be a declaration site thing, not a use site thing, but as you said, I do not mind if it's an annotation or a keyword. But it should not be erased generics, since that is a use site 'annotation'. That's why I said I prefer an annotation at declaration site like @ValueBased.
>> ...
>> I was not clear in my previous mail: I was proposing a declaration site mechanism like the annotation, not using that annotation at runtime.
>
> Got it now; see above.
>
>> ... it will be used for a handful of classes in the source code, and it uses the attribute ValueTypes. For the VM side it's not another mechanism, it's the same mechanism, but we let the end user choose which kind of compatibility he wants.
>
> Changing a VBC to a VT, and then putting @VBC on it, is like ten steps forward and nine steps back, if I'm correctly understanding the implications about flattening.
>
> So for def-site choices we have:
>
> 0. leave it alone, it's a VBC
> 1. make it a proper value type, get flattening on recompile, and deal with the null hygiene fallout
> 0.1 make it a value type but mark it @VBC, no sync or acmp, no flattening either

But it's nullable, the semantics is simple and mostly backward compatible (== does not work, use equals instead; do not synchronize on it), and there is no allocation cost where it can be important, like in loops. In my opinion, yes, it's a trade off, but it's closer to 0.5 than 0.1.

> The use-site choices for VTs are:
>
> 0. what choice? you didn't want that API point anyway
> 1. Object is the untyped workaround for all your nullable needs
> 1.2 clever ValueRef is your statically typed workaround for nullables

At the cost of some oddities, like what ValueRef or ValueRef.class means.

> 2. Q-world: ad hoc variations everywhere between L-VT and Q-VT (cost += 1e6)
> 3. some sugar like VT.BOX or an annotation for one of the previous
And in all cases, each use-site choice means that people will have to annotate their code to make it work like it was working before with respect to null, so it's not really a practical option, because Optional is so widespread in the code that all the code that contains Optional will never be rewritten.

> -- John

Rémi

From maurizio.cimadamore at oracle.com Tue May 15 12:06:32 2018
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Tue, 15 May 2018 13:06:32 +0100
Subject: value type hygiene
In-Reply-To: <842319FF-B112-43CD-8824-39EFD5AF07E5@oracle.com>
References: <4DE1FC97-835B-435F-AE48-3DE3E60A457D@oracle.com> <77993d79-5097-e9c3-f2f7-d5f744396137@oracle.com> <2066983148.1293217.1525970526700.JavaMail.zimbra@u-pem.fr> <14A662F2-ACC2-4AB4-8A19-451DF26D6F1A@oracle.com> <73A0FC3C-C180-41C3-AED5-313C5BD1CA3A@oracle.com> <20053177.1351568.1525990137944.JavaMail.zimbra@u-pem.fr> <842319FF-B112-43CD-8824-39EFD5AF07E5@oracle.com>
Message-ID: <8db40e21-1854-7b86-5b2b-fc6756e7356c@oracle.com>

I wonder if we shouldn't also consider something along the lines of restricting migration compatibility only to _nullable_ value types, where a nullable value type is a value whose representation is big enough that it can afford one spare value to denote null-ness. So, if you want to convert existing legacy reference classes to value types, they'd better be nullable values; this way you don't lose any value in the legacy domain - nulls will be remapped accordingly (using logic specified in the nullable value type declaration).

It seems like we've been somewhere along this path before (when we were exploring the Q vs. L split) - why would something like that not be workable?

Maurizio

On 15/05/18 01:57, John Rose wrote:
> There are various ways to detect and report the mismatch:
>
> 1. Exclude loading the class, on grounds similar to CLCs.
> 2. Fail to fill the v-table slot for Event.timestamp, leading to AME.
> 3. Fill the v-table slot with a null-rejecting version of Client1Event.timestamp.

From paul.sandoz at oracle.com Tue May 15 19:32:51 2018
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Tue, 15 May 2018 12:32:51 -0700
Subject: value type hygiene
In-Reply-To:
References: <9D5A86F2-0F1F-4E23-BC51-7A69D6376905@oracle.com> <28A2DCCB-263B-4D0E-BAB6-975DECC643B9@oracle.com> <14FCCF01-AB8B-48A1-993C-2D00D6527909@oracle.com>
Message-ID: <461F7C21-5FB1-4B71-BF96-5B11A1B03347@oracle.com>

> On May 14, 2018, at 11:36 PM, John Rose wrote:
>
> On May 14, 2018, at 6:53 PM, Paul Sandoz wrote:
>>
>> Hi John,
>>
>> The answers below might depend on experimentation, but what might you propose the behavior should be for the following code, assuming we have no specialized generics and ArrayList is not yet modified to cope better:
>
> This is the right sort of question to ask and answer about value types, and the sooner we get L-world up and running, the quicker we can validate our answers.
>
> We'll probably end up with a bunch of rules of thumb about how to handle nulls. In this case, null is a sentinel value used by the external API. There are two main choices with cases like this: (a) disallow the null when used with null-hostile types (V[]) and (b) adapt null to another value when a null-hostile type is detected.
>
>> value class Point { ... }
>>
>> class VW {
>>   public static void main(String[] s) {
>>     List<Point> l = new ArrayList<>();
>>     l.add(Point.default);
>>     l.add(Point.default); // assuming this works :-)
>
> (There's no reason why it shouldn't work.)

Agreed.

>>     Point[] p = new Point[10]; // Flattened array is created
>>     l.toArray(p); // What should happen here?
>>   }
>> }
>>
>> (I know toArray is value hostile and maybe should be deprecated or changed, but I find it a useful example to think about, as it may be indicative of legacy code in general.)
>
> In the case of List.toArray(.) the simple answer IMO is (b), and the sentinel value is clearly T.default, to be computed something like this:
>
> diff --git a/src/java.base/share/classes/java/util/AbstractCollection.java b/src/java.base/share/classes/java/util/AbstractCollection.java
> --- a/src/java.base/share/classes/java/util/AbstractCollection.java
> +++ b/src/java.base/share/classes/java/util/AbstractCollection.java
> @@ -186,13 +186,13 @@
>          for (int i = 0; i < r.length; i++) {
>              if (! it.hasNext()) { // fewer elements than expected
>                  if (a == r) {
> -                    r[i] = null; // null-terminate
> +                    r[i] = a.getClass().getComponentType().getDefaultValue();
>                  } else if (a.length < i) {
>                      return Arrays.copyOf(r, i);
>                  } else {
>                      System.arraycopy(r, 0, a, 0, i);
>                      if (a.length > i) {
> -                        a[i] = null;
> +                        a[i] = a.getClass().getComponentType().getDefaultValue();
>                      }
>                  }
>                  return a;
>
> Eventually when int[] <: Object[], then int[].class.getComponentType().getDefaultValue() will return an appropriate zero value, at which point the above behavior will "work like an int".
>
> Another way to make this API point "work like an int" would be to throw an exception (ASE or the like), on the grounds that you can't store a null into an int[] so you shouldn't be able to store a null into a Point[].

A third approach could be to check if the array is non-nullable and not store a default value. That may be surprising, but storing a default is arguably less useful in general for arrays of value types; still, it is, I suppose, mostly harmless (I am thinking of cases where a value type has a default that is hostile to being operated on, like perhaps LocalDate).

>> Should the call to l.toArray link?
>
> Yes, because Point[] <: Object[]. There's a separate question on whether the source language should allow the instance List<Point>; I think it should do so because that's more useful than disallowing it.
>
>> If so then I presume some form of array store exception will be thrown when ArrayList attempts to store null into the flattened array at index 2?
>
> In the case of the List API it's more useful for the container, which is attempting to contain all kinds of data, to bend a little and store T.default as the proper generalization of null. Under this theory, Object.default == null, and X.default is also null for any non-value, non-primitive X. (Including Integer but not int.)

Agreed, I just wanted to do the thought experiment given the current behavior of List/ArrayList as if it's unmodified legacy code.

>> Or:
>>
>>   Point[] p = l.toArray(new Point[2]);
>>
>> a flattened array is returned (the argument)? assuming System.arraycopy works.
>
> It does. (Or will.) Reason: Point[] <: Object[].

Ok.

>> Or:
>>
>>   Point[] p = l.toArray(new Point[1]);
>>
>> a non-flattened array is returned?
>
> The reflective argument's class is Point[], so Arrays.copyOf has no choice but to create another instance of Point[], which will also be flattened. It appears that Arrays.copyOf won't need any code changes for values.
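> (For the record, the interesting part of Arrays.copyOf is already driven purely by the reflective array class; roughly, from the JDK source:
>
>   T[] copy = (T[]) Array.newInstance(newType.getComponentType(), newLength);
>   System.arraycopy(original, 0, copy, 0, Math.min(original.length, newLength));
>
> so if newType is Point[].class, Array.newInstance hands back whatever the VM considers the one true Point[], flattened or not.)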
>
> (As I replied to Frederic, it is technically possible to imagine a system of non-flat versions of VT[] co-existing with flat versions of VT[], but we shouldn't do that just because we can; only because there is a proven need and not doing it is even more costly than doing it. There are good substitutes for non-flat VT[], such as Object[] and I[] where VT <: I. We can even contrive to gain static typing for the substitutes, by using the ValueRef device.)
>
>> since Arrays.copyOf operates reflectively on the argument's class and not additional runtime properties.
>
> I don't get this. What runtime properties are you thinking of? ValueClasses? That exists to give meaning to descriptors. The actual Class mirror always knows exactly whether it is a value class or not, and thus whether its arrays are flat or not.

Ok, I was unsure about the class mirror, and whether there would be runtime state associated with the array instance. And just to be clear, so I get this straight in my head...

ValueWorld:

  value class Point {}

  class A {
    static void m() {
      Point[] pa = new Point[10];
      B.m1(pa); // returns false
      B.m2(pa); // returns true
    }
  }

RefWorld:

  final class Point {} // note that the class is declared final

  class B {
    static boolean m1(Point[] p) {
      return p.getClass() != Point[].class;
    }
    static boolean m2(Point[] p) {
      return Point[].class.isAssignableFrom(p.getClass());
    }
  }

And, for the same reasons, that also applies to the class mirror for Point in the value world and ref world. Which got me thinking of the implications, if any, for checked collections :-) e.g. Collections.checkedList, which currently does:

  E typeCheck(Object o) {
    if (o != null && !type.isInstance(o))
      throw new ClassCastException(badElementMsg(o));
    return (E) o;
  }

>> What about:
>>
>>   Object[] o = l.toArray();
>>
>> A non-flattened array is returned containing elements that are instances of boxed Point?
>>
>> Paul.
>
> Yes, this is a non-flattened array, since Object[] is never flattened.

Ok.

> Here's another option if (a) and (b) don't work out for List: Globally define a mapping between value types and null, and make the VM silently "unbox" null into the correct value type. This isn't a cure-all, because it masks true NPE errors in code. And it only applies when a null is being stored into a container which is strongly typed as a value type.
>
> When faced with a non-nullable container of a value type VT, promote stored nulls to VT.default, for all VTs, or else for VTs which opt in (maybe VT <: interface PromoteNullToDefault). If we buy that trick, then a[i] = null turns into a[i] = VT.default automatically everywhere, not just in AbstractCollection. This is technically possible, but IMO would require experimentation with a real VM running actual code, to see where the paradoxes arise from erasing nulls quietly to VT.default.
>
> I'd rather try to get away with changing the API of List to store not null but rather VT.default, when the passed-in array is a value array.

Me too.

Paul.

> This change only affects behavior on new types (value arrays), so it is backward compatible, in some strict sense. And it is arguably unsurprising to a programmer who is working with value arrays. At least, I think it is a defensible generalization of the old rule to store a null after the last stored output value.
>
> -- John
John > From daniel.smith at oracle.com Tue May 15 19:35:51 2018 From: daniel.smith at oracle.com (Dan Smith) Date: Tue, 15 May 2018 13:35:51 -0600 Subject: value type hygiene In-Reply-To: <28A2DCCB-263B-4D0E-BAB6-975DECC643B9@oracle.com> References: <9D5A86F2-0F1F-4E23-BC51-7A69D6376905@oracle.com> <28A2DCCB-263B-4D0E-BAB6-975DECC643B9@oracle.com> Message-ID: <7A2B6203-BC49-40B1-886F-972F7FF0DCD9@oracle.com> > On May 14, 2018, at 5:02 PM, John Rose wrote: > > Think of V[] as a global type, > and you'll see that it needs a global definition of what is flattened and > what is nullable. I think we will get away with migrating types and > declaring that legacy classes that use their arrays will fail. The mode > of failure needs engineering via experiment. We could go so far as > to reject legacy classes that use anewarray to build arrays of value > type, without putting those types on the ValueTypes list. > > This means that if there is a current class C out there that is creating > arrays of type Optional[] or type LocalDate[], then if one of those types > is migrated to a value type, then C becomes a legacy class, and it will > probably fail to operate correctly. ! I don't think it would be acceptable to change the meaning of code like this: LocalDate[] dates = new LocalDate[10]; void set(int i, LocalDate d) { dates[i] = d; } boolean check(int i) { return dates[i] != null; } *Maybe* when it gets recompiled we force flattening and report an error on the comparison to null (there are other possibilities, but this is the strawman language design of record). But if migrating a class to a value class risks breakage like this everywhere in existing binaries, it's simply not a compatible change, and I would discourage anyone (including Java SE) from doing it. My vision of migration is a lot more inclusive: there are classes everywhere that meet the requirements for value classes. We want to give those classes a performance boost, for the benefit of the subset of clients who care, *without* disrupting the clients who just want a nice abstraction and don't have a performance bottleneck. We achieve this by encouraging widespread migration to value classes, and then managing semantics through some form of opt in: opt in, and you get the full performance benefit, but need to adjust for different semantics; remain opted out, and your semantics are stable (with perhaps some marginal performance gains). From john.r.rose at oracle.com Tue May 15 20:27:13 2018 From: john.r.rose at oracle.com (John Rose) Date: Tue, 15 May 2018 13:27:13 -0700 Subject: value type hygiene In-Reply-To: <1511980940.533002.1526371522545.JavaMail.zimbra@u-pem.fr> References: <9C0128E4-C4D4-4E8D-B154-BEE0CC9C47C1@oracle.com> <424446009.441965.1526339639570.JavaMail.zimbra@u-pem.fr> <682CDE4C-E077-4899-9460-27FBF4E11075@oracle.com> <1297467316.468949.1526364742437.JavaMail.zimbra@u-pem.fr> <1511980940.533002.1526371522545.JavaMail.zimbra@u-pem.fr> Message-ID: <0BF68E46-CC61-41D2-94E7-407B837B3BBC@oracle.com> On May 15, 2018, at 1:05 AM, forax at univ-mlv.fr wrote: > > > ... > You must admit this use of supers to carry nullable values is possible, > but you are saying (I think) that you don't agree that this is useful. > > We have already decided that j.l.Object is the super that can carry null, so yes, we do not need another one. I see. Yes, the clever ValueRef adds mainly static checking, because it can carry a type parameter. Other than that, it is just another Object. 
We could also play this move: An interface ValueRef could be defined such that, at the JVM level, the JVM enforces that (x instanceof ValueRef) if and only if (x.getClass().isValue()). The JVM would simply enforce the correspondence at class load time. It's a play I'm keeping in my pocket, which could add special runtime strength to a static typing story.

> ...
> In a sense, you're right: asking for nullability comes with a high cost. It's not flattenable (otherwise you could not store null), and acmp => false (it's a value type at runtime), but you still have the fact that JITs can spill a nullable value type in registers, which is an important case. The idea is that even if a nullable value type escapes, the JIT doesn't have to keep it; it can scatter it into its multiple components and gather it back when it escapes.
>
> If you take a look at Optional or LocalDate, I'm not sure the need for flattening is that important, but being able to treat it as a value type inside an inlining blob (inside a function of the generated assembly) is important in terms of performance when you do operations like map()/filter() or plus*()/minus*().

This is an interesting tricky point. I'm glad it's moot for 99.9% of value types, which are the non-migrated ones.

Dan's objections to non-nullable migrated arrays would be met, at very high performance cost, by making arrays of these types nullable also. What do you think about flattening arrays of these specially marked VTs (that were VBCs)? If we *don't* flatten arrays of a particular VT, then we need a special way to convey that decision to the JVM.

(I wonder if Panama vector types would tolerate such a move: We mainly need in-loop optimizations, so nullability would be tolerable there, but having them boxed in arrays would be a non-starter.)

>> 0. leave it alone, it's a VBC
>> 1. make it a proper value type, get flattening on recompile, and deal with the null hygiene fallout
>> 0.1 make it a value type but mark it @VBC, no sync or acmp, no flattening either
>
> But it's nullable, the semantics is simple and mostly backward compatible (== does not work, use equals instead; do not synchronize on it), and there is no allocation cost where it can be important, like in loops. In my opinion, yes, it's a trade off, but it's closer to 0.5 than 0.1.

OK, I see how it is worth the experiment. It's nice that (except for the array question) it's purely in the translation strategy: The existing proposed ValueTypes attribute would (as you say) simply never mention such a marked VT.

>> The use-site choices for VTs are:
>> 0. what choice? you didn't want that API point anyway
>> 1. Object is the untyped workaround for all your nullable needs
>> 1.2 clever ValueRef is your statically typed workaround for nullables
>
> At the cost of some oddities, like what ValueRef or ValueRef.class means.

See above: It would mean "this is any value type". Surely that's useful? We already have reflective Class.isValue, and this would be the static type for the same concept. (You see I'm reluctant to kill this "darling"[1].)

[1] https://en.wiktionary.org/wiki/kill_one%27s_darlings

>> 2. Q-world: ad hoc variations everywhere between L-VT and Q-VT (cost += 1e6)
>> 3. some sugar like VT.BOX or an annotation for one of the previous
>
> And in all cases, each use-site choice means that people will have to annotate their code to make it work like it was working before with respect to null, so it's not really a practical option, because Optional is so widespread in the code that all the code that contains Optional will never be rewritten.

We're in a tug-of-war here, between the goal of migratability and the prime goal of value types (mainly flattenability, of all variables). One side says "I need to flatten your variable *here*" and the other side says, "nope, not backward compatible". Maybe we're proving that migration is not really possible. We are certainly proving that migration is tricky and requires compromising various kinds of correctness (relative to the pre-migrated semantics).

Ultimately, we must flatten values, except perhaps for a negligible, clearly marked fraction of "compromised values" which dragged themselves away from VBC-hood, but incompletely. Ultimately we must tell the migrators to migrate with semantic changes, or not migrate. To paraphrase Yoda, "either do or do not, but about it don't cry."

-- John

From john.r.rose at oracle.com Tue May 15 21:26:59 2018
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 15 May 2018 14:26:59 -0700
Subject: value type hygiene
In-Reply-To: <8db40e21-1854-7b86-5b2b-fc6756e7356c@oracle.com>
References: <4DE1FC97-835B-435F-AE48-3DE3E60A457D@oracle.com> <77993d79-5097-e9c3-f2f7-d5f744396137@oracle.com> <2066983148.1293217.1525970526700.JavaMail.zimbra@u-pem.fr> <14A662F2-ACC2-4AB4-8A19-451DF26D6F1A@oracle.com> <73A0FC3C-C180-41C3-AED5-313C5BD1CA3A@oracle.com> <20053177.1351568.1525990137944.JavaMail.zimbra@u-pem.fr> <842319FF-B112-43CD-8824-39EFD5AF07E5@oracle.com> <8db40e21-1854-7b86-5b2b-fc6756e7356c@oracle.com>
Message-ID:

On May 15, 2018, at 5:06 AM, Maurizio Cimadamore wrote:
>
> I wonder if we shouldn't also consider something along the lines of restricting migration compatibility only to _nullable_ value types, where a nullable value type is a value whose representation is big enough that it can afford one spare value to denote null-ness. So, if you want to convert existing legacy reference classes to value types, they'd better be nullable values; this way you don't lose any value in the legacy domain - nulls will be remapped accordingly (using logic specified in the nullable value type declaration).
>
> It seems like we've been somewhere along this path before (when we were exploring the Q vs. L split) - why would something like that not be workable?

We do something similar explicitly (not globally) with the combinator MHs.explicitCastArguments, which converts null to zeroes of primitive types. But not vice versa: Zeroes don't re-box to null. And it's a localized thing.

I think you are suggesting a def-site opt-in where a class VT somehow nominates a special value VT.N (perhaps its VT.default, perhaps not) with one or both of these behaviors:

  assert( (VT)null == VT.N );     // unbox null to N
  assert( (Object)VT.N == null ); // box N to null (probably not!)

For simplicity, let's say N must be VT.default, and that the conversion is just one way (null to VT). Then the opt-in could be as simple as mixing in an interface NullConvertsToDefault. See the P.S. of this message:

http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2018-May/000634.html

(Perhaps you are suggesting something different?) It might be workable. It's complicated, of course.
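To be concrete about the shape of that opt-in, here is a sketch only; the marker interface and the conversion rule are the proposal under discussion, not anything specified:

  interface NullConvertsToDefault { }  // empty marker, consulted by the JVM

  value class Distance implements NullConvertsToDefault {
    long meters;
    ...
  }

  Distance[] ds = new Distance[8];  // still flattened
  ds[0] = null;                     // would quietly store Distance.default
  Distance d = (Distance) null;     // would convert to Distance.default, not NPE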
It would need to inject special logic into many paths in the JVM which are currently just error paths. There might be collateral damage on performance. For optimized code, throwing an NPE is always simpler than patching and continuing. This is roughly because, in optimized code, control flow merges are harder to optimize than control flow forks.

(Adding in the symmetric feature, of converting N to null, is probably very expensive, since it would seem to require lots of new tests of the form, "are you N?" And there's little value in converting N to null and then having List.of or some other null-hostile API throw an error. So then you have a puzzler: The seam between N and null is not completely hidden.)

AFAIK C# does something like this as a one-time deal, which cannot be mixed in as an interface:

https://docs.microsoft.com/en-us/dotnet/api/system.nullable

In C#, at the use-site of a type you can opt into it with an emotional type like 'int?'. But we could make it opt in at the def-site, too, if the value type has a spare code-point it's not using. I'm sure if C# doesn't do this there are excellent reasons for them not to. Anybody got information on this?

I am hoping to avoid playing such a card. (In case anyone didn't notice, we are playing with a large deck here, if not a full deck. There are lots of potential moves we can make.) I want us to win with a small number of moves.

-- John

P.S. Other examples of moves: Adding another type descriptor, having one array type be polymorphically boxed or flattened, having two VM-level types per source value type, waiting for reified generics, waiting for primitive convergence, adding large infrastructure for migration. Maybe we will be forced to do one or all of these before we can get anywhere. I hope not; I'm trying to sneak across a meaningful waypoint (not finish line) simply with L-world.

From forax at univ-mlv.fr Tue May 15 22:03:52 2018
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Wed, 16 May 2018 00:03:52 +0200 (CEST)
Subject: value type hygiene
In-Reply-To: <0BF68E46-CC61-41D2-94E7-407B837B3BBC@oracle.com>
References: <9C0128E4-C4D4-4E8D-B154-BEE0CC9C47C1@oracle.com> <424446009.441965.1526339639570.JavaMail.zimbra@u-pem.fr> <682CDE4C-E077-4899-9460-27FBF4E11075@oracle.com> <1297467316.468949.1526364742437.JavaMail.zimbra@u-pem.fr> <1511980940.533002.1526371522545.JavaMail.zimbra@u-pem.fr> <0BF68E46-CC61-41D2-94E7-407B837B3BBC@oracle.com>
Message-ID: <26710850.918579.1526421832691.JavaMail.zimbra@u-pem.fr>

> From: "John Rose" <john.r.rose at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "Brian Goetz" <brian.goetz at oracle.com>, "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Tuesday, May 15, 2018 22:27:13
> Subject: Re: value type hygiene

> On May 15, 2018, at 1:05 AM, forax at univ-mlv.fr wrote:
>>> ...
>>> You must admit this use of supers to carry nullable values is possible, but you are saying (I think) that you don't agree that this is useful.
>>
>> We have already decided that j.l.Object is the super that can carry null, so yes, we do not need another one.
>
> I see. Yes, the clever ValueRef adds mainly static checking, because it can carry a type parameter. Other than that, it is just another Object.
>
> We could also play this move: An interface ValueRef could be defined such that, at the JVM level, the JVM enforces that (x instanceof ValueRef) if and only if (x.getClass().isValue()). The JVM would simply enforce the correspondence at class load time. It's a play I'm keeping in my pocket, which could add special runtime strength to a static typing story.
So it's a way to resurrect Qjava/lang/Object; as the super type of all value types: it's nullable like Ljava/lang/Object; but you cannot put a reference type in it, by construction.

I'm not sure it can solve our problem, because at the callsite of a method you know the value type, and you know that you will call a method that takes a ValueRef, so you can use the calling convention of a value type; but at the callee side, you only know that you will be called with a value type, not which one, and passing the class of the value type in the calling convention doesn't seem a good idea. It's what Swift does for its generics, and I hope we do not need such complexity for value types.

>> ...
>> In a sense, you're right: asking for nullability comes with a high cost. It's not flattenable (otherwise you could not store null), and acmp => false (it's a value type at runtime), but you still have the fact that JITs can spill a nullable value type in registers, which is an important case. The idea is that even if a nullable value type escapes, the JIT doesn't have to keep it; it can scatter it into its multiple components and gather it back when it escapes.
>> If you take a look at Optional or LocalDate, I'm not sure the need for flattening is that important, but being able to treat it as a value type inside an inlining blob (inside a function of the generated assembly) is important in terms of performance when you do operations like map()/filter() or plus*()/minus*().
>
> This is an interesting tricky point. I'm glad it's moot for 99.9% of value types, which are the non-migrated ones.
>
> Dan's objections to non-nullable migrated arrays would be met, at very high performance cost, by making arrays of these types nullable also. What do you think about flattening arrays of these specially marked VTs (that were VBCs)? If we *don't* flatten arrays of a particular VT, then we need a special way to convey that decision to the JVM.

When you create an array, you can check whether it has to be flattened or not using the ValueTypes attribute, if that is possible (see below). Or you can, for these specific types, allow flattening by default and do a heroic change if there is code that tries to put null in it, re-boxing all the elements as you said in the message to Maurizio. Usually doing heroic things like this in the VM is a stupid idea; here, we are talking about a corner case, how to retrofit value based classes, so doing heroic things like this is doubly stupid.

> (I wonder if Panama vector types would tolerate such a move: We mainly need in-loop optimizations, so nullability would be tolerable there, but having them boxed in arrays would be a non-starter.)

I hope that if the Panama vector API is published, it will be as an experimental API, so there is no need to maintain a backward compatibility story here. They can be value types.

>>> 0. leave it alone, it's a VBC
>>> 1. make it a proper value type, get flattening on recompile, and deal with the null hygiene fallout
>>> 0.1 make it a value type but mark it @VBC, no sync or acmp, no flattening either
>
>> But it's nullable, the semantics is simple and mostly backward compatible (== does not work, use equals instead; do not synchronize on it), and there is no allocation cost where it can be important, like in loops.
>> In my opinion, yes, it's a trade off, but it's closer to 0.5 than 0.1.
>
> OK, I see how it is worth the experiment.
> It's nice that (except for the array question) it's purely in the translation
> strategy: The existing proposed ValueTypes attribute would (as you say) simply
> never mention such a marked VT.

It can be a pure translation strategy for arrays too, if the VM uses the ValueTypes attribute to know which kind of array should be created; but in that case the array needs to be tagged as flattened or not, and I do not know if that is possible.

>>> The use-site choices for VTs are:
>>> 0. what choice? you didn't want that API point anyway
>>> 1. Object is the untyped workaround for all your nullable needs
>>> 1.2 clever ValueRef is your statically typed workaround for nullables
>> at the cost of some oddities, like what a bare ValueRef or ValueRef.class means.
> See above: It would mean "this is any value type". Surely that's useful?
> We already have reflective Class.isValue, and this would be the static type
> for the same concept. (You see I'm reluctant to kill this "darling"[1].)
> [1] https://en.wiktionary.org/wiki/kill_one%27s_darlings

I can kill it for you :)
The argument is that we can always re-introduce it in a later release.

>>> 2. Q-world: ad hoc variations everywhere between L-VT and Q-VT (cost += 1e6)
>>> 3. some sugar like VT.BOX or an annotation for one of the previous
>> and in all cases, each use-site choice means that people will have to annotate
>> their code to make it work the way it worked before with respect to null; so
>> it's not really a practical option, because Optional is so widespread that all
>> the code containing Optional will never be rewritten.
> We're in a tug-of-war here, between the goal of migratability and
> the prime goal of value types (mainly flattenability, of all variables).
> One side says "I need to flatten your variable *here*" and the other
> side says, "nope, not backward compatible". Maybe we're proving
> that migration is not really possible. We are certainly proving that
> migration is tricky and requires compromising various kinds of
> correctness (relative to the pre-migrated semantics).

yes

> Ultimately, we must flatten values, except perhaps for a negligible
> clearly marked fraction of "compromised values" which dragged
> themselves away from VBC-hood, but incompletely. Ultimately
> we must tell the migrators to migrate with semantic changes,
> or not migrate. To paraphrase Yoda, "either do or do not,
> but about it don't cry." :)
> -- John

Rémi

From john.r.rose at oracle.com Tue May 15 22:13:19 2018
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 15 May 2018 15:13:19 -0700
Subject: value type hygiene
In-Reply-To: <461F7C21-5FB1-4B71-BF96-5B11A1B03347@oracle.com>
References: <9D5A86F2-0F1F-4E23-BC51-7A69D6376905@oracle.com> <28A2DCCB-263B-4D0E-BAB6-975DECC643B9@oracle.com> <14FCCF01-AB8B-48A1-993C-2D00D6527909@oracle.com> <461F7C21-5FB1-4B71-BF96-5B11A1B03347@oracle.com>
Message-ID: <94D24E28-9638-496A-9E53-7872828EEB25@oracle.com>

> On May 15, 2018, at 12:32 PM, Paul Sandoz wrote:
>
>> On May 14, 2018, at 11:36 PM, John Rose wrote:
>> ...
>> Eventually when int[] <: Object[], then int[].class.getComponentType().getDefaultValue()
>> will return an appropriate zero value, at which point the above behavior will
>> "work like an int".
>>
>> Another way to make this API point "work like an int" would be to throw an
>> exception (ASE or the like), on the grounds that you can't store a null into
>> an int[] so you shouldn't be able to store a null into a Point[].
> A third approach could be to check if the array is non-nullable and not store a
> default value at all, which may be surprising; but storing a default is arguably
> less useful in general for arrays of value types, though I suppose it is mostly
> harmless (I am thinking of cases where a value type has a default that is
> hostile to operate on, like perhaps LocalDate).

Sure, it could just do nothing if the array doesn't accept a null. It really depends on what "null" means in this use case.

Remember the basic reason that toArray puts a null into the first unused spot: It is restoring it to (approximately) what the array looked like (at that spot) when it was first created. The toArray function doesn't know that this is a valid sentinel, but it is the most reasonable deterministic value to store, since it was there at the beginning. And, in the common case where toArray is handed a fresh array (but unfortunately too long), the stored null has no effect at all: It overwrites the null that was there from the beginning.

Now, hmmm, how would we do a similar thing with flattened arrays?? Basically, I'm saying that today's spec. should not say toArray "stores a null", but rather "stores the default value of the array element". Which today is null, and tomorrow is something perhaps more interesting.

Given that we don't control all implementations of List, we can't upgrade all their toArray methods, which means we have to either be weaselly or hack the JVM to convert null to default (see Maurizio's suggestion). Weasel words would be "if toArray is handed a flattened array, the implementation may choose to throw NPE or reset the array element to its flat initial value. Implementors are encouraged to do the latter. All java.base implementations do so."

>> ...
>> In the case of the List API it's more useful for the container, which is attempting
>> to contain all kinds of data, to bend a little and store T.default as the proper
>> generalization of null. Under this theory, Object.default == null, and X.default
>> is also null for any non-value, non-primitive X. (Including Integer but not int.)
>
> Agreed, I just wanted to do the thought experiment given the current behavior of List/ArrayList as if it's unmodified legacy code.

Yup; see above.

>
>> ...
>> (As I replied to Frederic, it is technically possible to imagine a system of
>> non-flat versions of VT[] co-existing with flat versions of VT[] but we shouldn't
>> do that just because we can, but because there is a proven need and not
>> doing it is even more costly than doing it. There are good substitutes for
>> non-flat VT[], such as Object[] and I[] where VT <: I. We can even contrive
>> to gain static typing for the substitutes, by using the ValueRef device.)
>>
>>> since Arrays.copyOf operates reflectively on the argument's class and not additional runtime properties.
>>
>> I don't get this. What runtime properties are you thinking of? The ValueTypes attribute?
>> That exists to give meaning to descriptors. The actual Class mirror always
>> knows exactly whether it is a value class or not, and thus whether its arrays
>> are flat or not.
>>
>
> OK, I was unsure about the class mirror, and whether there would be runtime state associated with the array instance.

This is a big difference (as I noted to Frederic) between array elements and object fields. For array elements, we need global agreement on flattenability (or nullability, or a polymorphic mix of both). So there has to be some sort of runtime tracking, either in the array type (preferably) or each instance (yuck!), of whether flattening is happening. Given the tracking, we can ask whether any given array is flattenable or not (i.e., nullable or not).
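[To make the "tracking in the array type" concrete: with the reflective Class.isValue mentioned elsewhere in this thread, the question can be asked per array class rather than per instance. A sketch; the helper method is hypothetical, and isValue() is the assumed reflective API:]

    // Sketch: does this array reject nulls? In L-world the answer hangs
    // off the component type, since flattenability is agreed per type.
    static boolean rejectsNulls(Object[] a) {
        return a.getClass().getComponentType().isValue();
    }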
(Note: VM folks make a fine distinction between flattenability and flattening, which non-VM folks can ignore. A field or array element is flattened when it really gets unboxed and stored as its value components. A field or array element is merely flattenable when the JVM is permitted to flatten it but has tried and failed to do so for some reason. To provide deterministic behavior, the JVM must keep such a failure a secret from the user. Thus, all flattenable fields and elements reject nulls, even if they are secretly not flattened, and thus their box pointer *could* be replaced by a null.)

>
> And just to be clear so I've got this straight in my head...
>
> ValueWorld
> ----------
>
> value class Point {}
>
> class A {
>   static void m() {
>     Point[] pa = new Point[10];
>     B.m1(pa); // returns false
>     B.m2(pa); // returns true
>   }
> }

Are you suggesting that new Point[10].getClass() != Point[].class in some design we are considering? That would be very surprising. Anyway, in L-world it's simple, like this:

    Point[] pa = new Point[10];  // flattened array of 10 values
    assert( pa.getClass() == Point[].class );
    assert( Object[].class.isAssignableFrom( Point[].class ) );

In U-world, values and refs are under a distinct abstract top type: Q-X <: U-X && L-X <: U-X. In such a design there could be up to three array types for each X. If we mix into L-world a Q-descriptor, so that Q-X <: L-X, to capture nullable vs. non-nullable descriptions of value types, we would have two array types for each X. That is one way to have it both ways for arrays, a move which I am resisting pending strong proof that it is really needed. (And if we need polymorphic array types, it is still possible to do it without introducing a new descriptor. It could be a yucky per-instance bit, if we had no other need for the expensive new descriptor.)

(In this I am assuming that there is a 1-1 correspondence between field descriptors and Class objects. That too is something we could complexify, if it simplified something else greatly. You won't be surprised to know that I don't want to do that move either unless it's forced. Eventually we will have to carefully introduce "crass" pointers when we split a single class into many reified types. And the array distinction, if one exists, could be aligned with the crass/species distinction. We don't need to do it now, though.)

>
> RefWorld
> --------
>
> final class Point {}  // note that the class is declared final
>
> class A {
>   static boolean m1(Point[] p) {
>     return p.getClass() != Point[].class;
>   }
>
>   static boolean m2(Point[] p) {
>     return Point[].class.isAssignableFrom(p.getClass());
>   }
> }

If Point is final, then m1 will never return true, right? (True for both VBC Point and VT Point.) I'm not sure what you are saying here. That VBCs and VTs, and their arrays, look similar under such type tests? I agree to that. In L-world the covariant array typing just works the same as always, for better or worse.

> And, for the same reasons, that also applies to the class mirror for Point in the value world and ref world.
>
> Which got me thinking of the implications, if any, for checked collections :-) e.g. Collections.checkedList, which currently does:
>
>   E typeCheck(Object o) {
>     if (o != null && !type.isInstance(o))
>       throw new ClassCastException(badElementMsg(o));
>     return (E) o;
>   }

So 'o != null' is true, for a couple of reasons, for any value instance o. And then type.isInstance(o) will be true iff o is of the right type. That means a null leaks down to the cast (E) even if E is a value type. And if the cast (E) is a generic cast, it will let the null through. The null will only be detected and rejected downstream, when a client of the generic actually casts the null to a concrete value type, not just a type parameter.
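[The leak is easy to demonstrate with today's erasure, using a plain generic holder in place of the checked-collection wrapper; a self-contained sketch, where Holder is a made-up name:]

    class Holder<E> {
        E held;
        @SuppressWarnings("unchecked")
        void put(Object o) { held = (E) o; }  // erased cast: no runtime check here
    }

    // Holder<SomeValueType> h = new Holder<>();
    // h.put(null);   // slips through silently; the null surfaces only at a
    //                // later concrete cast or at a store into a flattened slot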
You left out the previous line!

    @SuppressWarnings("unchecked")

The above code should give a warning for an unchecked cast on (E). That's also a warning that the VT null checking may fail. The fix is to use Class.cast (if we decide that that will reject nulls properly) or else a new API point Class.castValue (which DTRT for value types w.r.t. null).

Fixed code:

    E typeCheck(Object o) {
        if (o == null) {
            if (type.isValue())
                throw new NullPointerException(badElementMsg(o));
            return null;
        }
        if (!type.isInstance(o))
            throw new ClassCastException(badElementMsg(o));
        @SuppressWarnings("unchecked")
        E result = (E) o;
        return result;
    }

or just:

    E typeCheck(Object o) {
        return type.castValue(o);
    }

Where in Class:

    @HotSpotIntrinsicCandidate
    public T castValue(Object obj) {
        if (obj == null) {
            if (isValue())
                throw new NullPointerException(cannotCastMsg(obj));
            return null;
        }
        if (!isInstance(obj))
            throw new ClassCastException(cannotCastMsg(obj));
        @SuppressWarnings("unchecked")
        T result = (T) obj;
        return result;
    }

(I'm agnostic over whether we can sneak the null check into Class::cast. The current arguments about migrating nullable VBCs to VTs will affect that decision.)

-- John

From john.r.rose at oracle.com Tue May 15 22:30:58 2018
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 15 May 2018 15:30:58 -0700
Subject: value type hygiene
In-Reply-To: <26710850.918579.1526421832691.JavaMail.zimbra@u-pem.fr>
References: <9C0128E4-C4D4-4E8D-B154-BEE0CC9C47C1@oracle.com> <424446009.441965.1526339639570.JavaMail.zimbra@u-pem.fr> <682CDE4C-E077-4899-9460-27FBF4E11075@oracle.com> <1297467316.468949.1526364742437.JavaMail.zimbra@u-pem.fr> <1511980940.533002.1526371522545.JavaMail.zimbra@u-pem.fr> <0BF68E46-CC61-41D2-94E7-407B837B3BBC@oracle.com> <26710850.918579.1526421832691.JavaMail.zimbra@u-pem.fr>
Message-ID:

On May 15, 2018, at 3:03 PM, forax at univ-mlv.fr wrote:

> We could also play this move: An interface ValueRef could be defined
> such that, at the JVM level, the JVM enforces that (x instanceof ValueRef)
> if and only if (x.getClass().isValue()). The JVM would simply enforce the
> correspondence at class load time. It's a play I'm keeping in my pocket,
> which could add special runtime strength to a static typing story.
>
> so it's a way to resurrect Qjava/lang/Object; as a supertype of all value types: it's nullable like Ljava/lang/Object; but, by construction, you cannot put a reference type in it.

Yes. Some small usefulness there, not a lot.

> I'm not sure it can solve our problem because at the call site of a method you know the value type, and you know that you will call a method that takes a ValueRef, so you can use the calling convention of a value type; but on the callee side, you only know that you will be called with a value type, not which one, and passing the class of the value type in the calling convention doesn't seem a good idea; that's what Swift does for its generics, and I hope we do not need such complexity for value types.

At the receiving site the code would either treat it as a generic pointer (if the code is generic) or else cast it to a particular VT. It depends on whether the API point being called said ValueRef<T> for some type parameter T, or whether it said ValueRef<LocalDate>, requesting a nullable date. In neither case do we need to pass extra metadata. Either it's a non-varying use case of ValueRef, or a varying case, where the variation will be managed by casts or other type tests somewhere else.

> ...
> When you create an array, you can check whether it has to be flattened or not using the ValueTypes attribute, if that is possible (see below).

Yep, that's a move we can do for anewarray, as I've already mentioned. But that doesn't work for jlr.Array.newInstance, which is a significant contributor of array instances. Best to keep arrays monomorphic, if we can hold that line.

> Or, for these specific types, you can allow flattening by default and make a heroic change if some code tries to put null in the array, re-boxing all the elements, as you said in the message to Maurizio.
> Usually doing heroic things like this in the VM is a stupid idea; here, we are talking about a corner case (how to retrofit value-based classes), so doing heroic things like this is doubly stupid.

Yep.

> > (I wonder if Panama vector types would tolerate such a move: We mainly
> > need in-loop optimizations, so nullability would be tolerable there, but having
> > them boxed in arrays would be a non-starter.)
>
> I hope that if the Panama vector API is published, it will be published as an experimental API, so there is no need to maintain a backward compatibility story there. They can be value types.

+1

> > 0. leave it alone, it's a VBC
> > 1. make it a proper value type, get flattening on recompile, and deal with the null hygiene fallout
> > 0.1 make it a value type but mark it @VBC, no sync or acmp, no flattening either
>
> > but it's nullable, the semantics is simple and mostly backward compatible (== does not work, use equals instead; do not synchronize on it), and there is no allocation cost where it can be important, like in loops.
> > In my opinion, yes, it's a trade-off, but it's closer to 0.5 than 0.1.
>
> > OK, I see how it is worth the experiment. It's nice that (except for the array question) it's purely in the translation strategy: The existing proposed ValueTypes attribute would (as you say) simply never mention such a marked VT.
>
> It can be a pure translation strategy for arrays too, if the VM uses the ValueTypes attribute to know which kind of array should be created; but in that case, it means that the array needs to be tagged as flattened or not, and I do not know if that is possible.

That's very possible, actually; remember that the JVM already hides arrays which are supposed to be flattenable but for some reason have failed to truly flatten. Also, T[] <: Object[] for all (non-primitive) T, whether T can really flatten or not. (This is one of the big moves in L-world!)

So, we could have the JVM look for that annotation at class-load time and do two things: (1) throw the switch that secretly de-flattens a nominally flattenable array, and (2) allow nulls in, as a public promise. That first step is a hidden thing that is easy to gate with an annotation (if it's not too complicated in structure; type annotations *are* complex). The second step is a somewhat non-annotation-like effect on semantics, but maybe it can pass muster if the alternatives are all worse.

(Example of a worse alternative: A new keyword in the language just for migrating 3 legacy VBCs. Don't think that will fly. Second example: Adding nullability and non-nullability modifiers on all sorts of type uses and defs. That might fly, but it will take years to get off the ground. If we do that, we might be able to retire the makeshift annotation.)

I'm not against doing such an annotation; it is cheap to do, and doesn't harm any other parts of the system. All the downsides would be inflicted on the class that opted into it.

-- John
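[For concreteness, the annotation under discussion might look like this; the name @ValueBasedClass comes from Remi's earlier proposal, the rest is an assumed sketch, not a worked-out design:]

    import java.lang.annotation.*;

    @Target(ElementType.TYPE)
    @Retention(RetentionPolicy.CLASS)  // kept in the classfile for the JVM to read at load time
    public @interface ValueBasedClass { }

    // Assumed effect when placed on a value class: javac omits the class
    // from every ValueTypes attribute, and the JVM de-flattens its arrays
    // and accepts nulls in all of its heap variables.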
From john.r.rose at oracle.com Tue May 15 22:39:14 2018
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 15 May 2018 15:39:14 -0700
Subject: value type hygiene
In-Reply-To: <7A2B6203-BC49-40B1-886F-972F7FF0DCD9@oracle.com>
References: <9D5A86F2-0F1F-4E23-BC51-7A69D6376905@oracle.com> <28A2DCCB-263B-4D0E-BAB6-975DECC643B9@oracle.com> <7A2B6203-BC49-40B1-886F-972F7FF0DCD9@oracle.com>
Message-ID: <71572A15-59A0-440A-9948-A9D38CF73079@oracle.com>

On May 15, 2018, at 12:35 PM, Dan Smith wrote:
>
> I don't think it would be acceptable to change the meaning of code like this:
>
>   LocalDate[] dates = new LocalDate[10];
>   void set(int i, LocalDate d) { dates[i] = d; }
>   boolean check(int i) { return dates[i] != null; }
>
> *Maybe* when it gets recompiled we force flattening and report an error on the comparison to null (there are other possibilities, but this is the strawman language design of record). But if migrating a class to a value class risks breakage like this everywhere in existing binaries, it's simply not a compatible change, and I would discourage anyone (including Java SE) from doing it.
>
> My vision of migration is a lot more inclusive: there are classes everywhere that meet the requirements for value classes. We want to give those classes a performance boost, for the benefit of the subset of clients who care, *without* disrupting the clients who just want a nice abstraction and don't have a performance bottleneck. We achieve this by encouraging widespread migration to value classes, and then managing semantics through some form of opt-in: opt in, and you get the full performance benefit, but need to adjust for different semantics; remain opted out, and your semantics are stable (with perhaps some marginal performance gains).

I suppose Remi's annotation, including a non-flattening effect on arrays, would meet the requirements you are posing here. The non-flattening effect would be that the JVM, when loading the annotated value type, would disable all flattening and enable nullability for all heap variables, including array elements. For fields, the effect would be as if the declaring class did not mention the type in the ValueTypes attribute. Remi is proposing that javac in fact would *never* mention the annotated value type in *any* local ValueTypes attribute.

You'd get nullability everywhere, and flattening *only* locally in optimized code (but not in data structures or across virtual APIs). I guess it's not too bad, if a type author has to opt into it with eyes wide open. ("Do or do not?")

Does that help? For a prototype, we could throw ICCE if (a) a ValueTypes attribute causes loading of a class, and (b) it checks out as a value type but in fact has the special annotation also. As a further exercise it's worth asking if there is any value in allowing an annotated value type to appear in a class's ValueTypes attribute, and what that would mean; I hope we can avoid such fine distinctions.

-- John

From john.r.rose at oracle.com Tue May 15 22:53:02 2018
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 15 May 2018 15:53:02 -0700
Subject: value type hygiene
In-Reply-To: <424446009.441965.1526339639570.JavaMail.zimbra@u-pem.fr>
References: <15062ede-ba64-c6c0-21eb-cbe7eb376c4a@oracle.com> <9C0128E4-C4D4-4E8D-B154-BEE0CC9C47C1@oracle.com> <424446009.441965.1526339639570.JavaMail.zimbra@u-pem.fr>
Message-ID:

More on your annotation proposal:

One way to look at it is that you are proposing two new kinds of concrete classes to go with our old object class.
- object class = what we have today: identity-rich, nullable, not flattened
- value class = new, flattened*, not nullable, identity-free
- enforced value-based class = new, nullable, not flattened, identity-free

(* VM folks say "flattenable")

Your annotation almost succeeds in being pure translation strategy, except that the JVM needs to be told to take care of the array type. So it's a little more than "just an annotation for javac".

Dan seems to want to migrate many classes (most not enumerated) as EVBCs. And certainly LocalDate and friends would seem to slide easily into this mold.

If we can implement EVBCs easily as a one-off from full value types, in the context of L-world, should we try it? People responsible for the user model (hi Brian!) might say "yuck, we are admitting design failure by giving a consolation prize to the VBCs, instead of the real VTs promised". Maybe EVBCs are the best engineering compromise, or maybe we just cut EVBCs off the feature list and say "VT or VT not", at which point people who wrote VBCs will have sad decisions to make, and Dan will tell them not to migrate at all.

From a VM POV, I think EVBCs are simple to implement on top of L-world, and so seem worth the experiment. One challenge will be to resist feature creep on EVBCs that would entrench them in a position as a true third kind of class. To avoid that, I like the idea of making them a one-off of one of the prime types: objects or values. And given that choice, having an annotation tweak them from a value type seems like a fine choice, at least from the VM POV.

-- John

On May 14, 2018, at 4:13 PM, Remi Forax wrote:
>
> I think I prefer a declaration-site annotation like @ValueBasedClass to a use-site annotation ValueRef.
>
> For me a value-based class, if you want to be as compatible as possible, is a reference type (so 'L') that behaves like a value type at runtime, so the JIT can see it as a value type at runtime and can unbox it at will and buffer/box it before calling a method at the horizon of the inlining blob.
>
> So as a library designer, I can choose to either replace a class by a real value type, and it will fail if null is present, or by a value-based class, if I value (sorry!) the backward compatibility more than the performance.
>
> Note that even if we solve the null issue when a reference type is changed to a value type (and I think it's not something we should focus on), there is still the problem of identity, so transforming a reference type to a value type will never be 100% compatible.

From daniel.smith at oracle.com Wed May 16 00:17:20 2018
From: daniel.smith at oracle.com (Dan Smith)
Date: Tue, 15 May 2018 18:17:20 -0600
Subject: value type hygiene
In-Reply-To:
References: <15062ede-ba64-c6c0-21eb-cbe7eb376c4a@oracle.com> <9C0128E4-C4D4-4E8D-B154-BEE0CC9C47C1@oracle.com> <424446009.441965.1526339639570.JavaMail.zimbra@u-pem.fr>
Message-ID: <563DA41B-EF57-42C0-A4C4-1803074FF2C3@oracle.com>

> On May 15, 2018, at 4:53 PM, John Rose wrote:
>
> If we can implement EVBCs easily as a one-off from full value types,
> in the context of L-world, should we try it? [...] at which point people who wrote VBCs
> will have sad decisions to make, and Dan will tell them not to
> migrate at all.
Yeah, I'm pretty down on the benefit of these half-value classes. We'd be better off deprecating the old API classes and introducing new, "real" value class versions.

The great thing about use-site expressiveness is that different clients can choose different trade-offs, rather than forcing a single choice on all clients.

The one declaration-site strategy I could see being viable is (per Maurizio, above) to allow a value class to assert that its default value should be treated as null, with all associated semantics (ifnull, NPE checks). Then your safe migration path for VBCs is to design a field layout that has an acceptable 'null' encoding; often no sacrifice at all.

From john.r.rose at oracle.com Wed May 16 00:43:44 2018
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 15 May 2018 17:43:44 -0700
Subject: value type hygiene
In-Reply-To: <563DA41B-EF57-42C0-A4C4-1803074FF2C3@oracle.com>
References: <15062ede-ba64-c6c0-21eb-cbe7eb376c4a@oracle.com> <9C0128E4-C4D4-4E8D-B154-BEE0CC9C47C1@oracle.com> <424446009.441965.1526339639570.JavaMail.zimbra@u-pem.fr> <563DA41B-EF57-42C0-A4C4-1803074FF2C3@oracle.com>
Message-ID:

On May 15, 2018, at 5:17 PM, Dan Smith wrote:
>
>> On May 15, 2018, at 4:53 PM, John Rose wrote:
>> [...]
>
> Yeah, I'm pretty down on the benefit of these half-value classes. We'd be better off deprecating the old API classes and introducing new, "real" value class versions.

OK, good; that's one less vote for a special JVM feature just for migration.

Doing things that way could look like this:
1. rename LocalDateTime to LocalDateTimeVT (not its real name)
2. make LocalDateTimeVT be a value type
3. reconstruct the API of LocalDateTime, this time as an Integer-like wrapper for LocalDateTimeVT

BTW, this raises a bee which was sleeping in my bonnet: This is a good time to reconsider the rules for capitalizing types.

Right now, at this special moment in time, all value types have lower-case names, and all object type names begin with an upper-case letter. This is trivially true because only primitives are value types, and we followed the C tradition of naming them.

So, how about we keep this useful state of affairs? Let's declare that value types will conventionally be camel-case names with an initial lower-case letter.

At that point, the migrated LocalDateTime gets an obvious and memorable name, localDateTime. (Perhaps, to avoid collisions with method names, we might perturb the convention a little more, but I don't think it's needed.) I think that would be a useful outcome.
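[In code, the three numbered steps above might come out like this; strawman value-class syntax, with LocalDateTimeVT explicitly a placeholder name:]

    value class LocalDateTimeVT {          // steps 1-2: the real value type
        // ... date and time fields ...
    }

    final class LocalDateTime {            // step 3: an Integer-like wrapper
        private final LocalDateTimeVT value;   // flattenable field
        LocalDateTime(LocalDateTimeVT v) { this.value = v; }
        LocalDateTimeVT toValue() { return value; }
        // ... the rest of the legacy API, delegating to the value type ...
    }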
> The one declaration-site strategy I could see being viable is (per Maurizio, above) to allow a value class to assert that its default value should be treated as null, with all associated semantics (ifnull, NPE checks). As noted before, that's doable, but there are details to work out. > Then your safe migration path for VBCs is to design a field layout that has an acceptable 'null' encoding?often no sacrifice at all. Yes; all nulls get funneled to the naDT value of LocalDateTime. That's not terrible, but needs work to flesh out. ? John From asviraspossible at gmail.com Wed May 16 07:00:29 2018 From: asviraspossible at gmail.com (Victor Nazarov) Date: Wed, 16 May 2018 10:00:29 +0300 Subject: value type hygiene In-Reply-To: References: <15062ede-ba64-c6c0-21eb-cbe7eb376c4a@oracle.com> <9C0128E4-C4D4-4E8D-B154-BEE0CC9C47C1@oracle.com> <424446009.441965.1526339639570.JavaMail.zimbra@u-pem.fr> Message-ID: Hi, I've been following a discussion and noticed that most of it revolve around arrays. I'd like to share an idea of mine that I find rather attractive following a compatibility story. What about arrays with null-bitmap. Arrays that are flattened array of value types plus a bitmap of is-null-status glued togeather. Every write operation to an array sets a nonnull-bit in a bitmap. For new clients that are aware of value-typeness this is the whole story. Bitmap is uninteresting to them. For legacy clients read operation can return null if is-non-null bit is not set. And write operation can clear a nonnull-bit when it writes null. This bitmap is analogues to ValueType attribute: it'a side channel that augments some operations. -- Victor Nazarov ??, 16 ??? 2018 ?., 1:53 John Rose : > More on your annotation proposal: > > One way to look at it is that you are proposing two new kinds of > concrete classes to go with our old object class. > > - object class = what we have today: identity-rich, nullable, not > flattened > - value class = new, flattened*, not nullable, identity-free > - enforced value-based class = new, nullable, not flattened, identity-free > > (* VM folks say "flattenable") > > Your annotation almost succeeds in being pure translation strategy, > except that the JVM needs to be told to take care of the array type. > So it's a little more than "just an annotation for javac". > > Dan seems to want to migrate many classes (most not enumerated) > as EVBCs. And certainly LocalDate and friends would seem to slide > easily into this mold. > > If we can implement EVBCs easily as a one-off from full value type, > in the context of L-world, should we try it? People responsible for > user model (hi Brian!) might say "yuck, we are admitting design > failure by giving a consolation prize to the VBCs, instead of the > real VTs promised". Maybe EVBCs are the best engineering > compromise, or maybe we just cut EVBCs off the feature list > and say "VT or VT not", at which point people who wrote VBCs > will have sad decisions to make, and Dan will tell them not to > migrate at all. > > From a VM POV, I think EVBCs are simple to implement on > top of L-world, and so seem worth the experiment. One > challenge will be to resist feature creep on EVBCs that > would entrench them in a position as a true third kind > of class. To avoid that, I like the idea of making them > a one-off of one of the prime types: objects or values. > And given that choice, having an annotation tweak them > from a value type seems like a fine choice, at least > from the VM POV. > > ? 
John > > On May 14, 2018, at 4:13 PM, Remi Forax wrote: > > > > I think i prefer a declaration site annotation like @ValueBasedClass to > a use site annotation ValueRef. > > > > For me a value based class, if you want to be as compatible as possible, > is a reference type (so 'L') that behave like a value type at runtime, so > the JIT can see it has a value type at runtime and can unbox it at will and > buffer/box it before calling a method at the horizon of the inlining blob. > > > > So as a library designer, i can choose to either replace a class by a > real value type and it will fail if null is present or as a value based > class if i value (sorry !) the backward compatibility more than the > performance. > > > > Note that even if we solve the null issue when a reference type is > changed to a value type and i think it's not something we should focus on, > there is still the problem of the identity, so transforming a reference to > a value type will never be 100% compatible. > > > > > > From brian.goetz at oracle.com Wed May 16 12:05:11 2018 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 16 May 2018 08:05:11 -0400 Subject: value type hygiene In-Reply-To: References: <15062ede-ba64-c6c0-21eb-cbe7eb376c4a@oracle.com> <9C0128E4-C4D4-4E8D-B154-BEE0CC9C47C1@oracle.com> <424446009.441965.1526339639570.JavaMail.zimbra@u-pem.fr> <563DA41B-EF57-42C0-A4C4-1803074FF2C3@oracle.com> Message-ID: <7840AC8A-771A-4A73-A6FF-687DF6F6817D@oracle.com> Please, no. Sent from my iPad > On May 15, 2018, at 8:43 PM, John Rose wrote: > >> On May 15, 2018, at 5:17 PM, Dan Smith wrote: >> >>> On May 15, 2018, at 4:53 PM, John Rose wrote: >>> >>> If we can implement EVBCs easily as a one-off from full value type, >>> in the context of L-world, should we try it? People responsible for >>> user model (hi Brian!) might say "yuck, we are admitting design >>> failure by giving a consolation prize to the VBCs, instead of the >>> real VTs promised". Maybe EVBCs are the best engineering >>> compromise, or maybe we just cut EVBCs off the feature list >>> and say "VT or VT not", at which point people who wrote VBCs >>> will have sad decisions to make, and Dan will tell them not to >>> migrate at all. >> >> Yeah, I'm pretty down on the benefit of these half-value classes. We'd be better off deprecating the old API classes and introducing new, "real" value class versions. > > OK, good; that's one less vote for a special JVM feature just for > migration. > > Doing things that way could look like this: > 1. rename LocalDateTime to LocalDateTimeVT (not its real name) > 2. make LocalDateTimeVT be a value type > 3. reconstruct the API of LocalDateTime, this time as an Integer-like wrapper for LocalDateTimeVT > > BTW, this raises a bee which was sleeping in my bonnet: > This is a good time to reconsider the rules for capitalizing types. > > Right now, at this special moment in time, all value types have > lower-case names, and all object type names begin with an > upper-case letter. This is trivially true because only primitives > are value types, and we followed the C tradition of naming them. > > So, how about we keep this useful state of affairs? Let's declare > that value types will conventionally be camel-case names with > an initial lower-case letter. > > At that point, the migrated LocalDateTime gets an obvious > and memorable name, localDateTime. (Perhaps, to avoid > collisions with method names, we might perturb the convention > a little more, but I don't think it's needed.) 
I think that would be > a useful outcome. > >> The great thing about use site expressiveness is that different clients can choose different trade-offs, rather than forcing a single choice on all clients. > > We can use ValueRef vs VT, with language sugar or without, > to provide the most common kinds of use-site expressiveness. > >> The one declaration-site strategy I could see being viable is (per Maurizio, above) to allow a value class to assert that its default value should be treated as null, with all associated semantics (ifnull, NPE checks). > > As noted before, that's doable, but there are details to work out. > >> Then your safe migration path for VBCs is to design a field layout that has an acceptable 'null' encoding?often no sacrifice at all. > > Yes; all nulls get funneled to the naDT value of LocalDateTime. > That's not terrible, but needs work to flesh out. > > ? John From forax at univ-mlv.fr Wed May 16 20:53:08 2018 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Wed, 16 May 2018 22:53:08 +0200 (CEST) Subject: value type hygiene In-Reply-To: References: <15062ede-ba64-c6c0-21eb-cbe7eb376c4a@oracle.com> <9C0128E4-C4D4-4E8D-B154-BEE0CC9C47C1@oracle.com> <424446009.441965.1526339639570.JavaMail.zimbra@u-pem.fr> Message-ID: <53008618.1341123.1526503988733.JavaMail.zimbra@u-pem.fr> We can also restrict EVBC to be declared in the JDK (or java.base ?), because as far as i know, there is no other value based classes that currently exist, so the model exposed to the user is still object class vs value class. Also, I've come to realize that in a generified code an EVBC may be able to behave as a true value type because those codes doesn't already exist so there is less need to be backward compatible in this case. R?mi > De: "John Rose" > ?: "Remi Forax" > Cc: "Brian Goetz" , "valhalla-spec-experts" > > Envoy?: Mercredi 16 Mai 2018 00:53:02 > Objet: Re: value type hygiene > More on your annotation proposal: > One way to look at it is that you are proposing two new kinds of > concrete classes to go with our old object class. > - object class = what we have today: identity-rich, nullable, not flattened > - value class = new, flattened*, not nullable, identity-free > - enforced value-based class = new, nullable, not flattened, identity-free > (* VM folks say "flattenable") > Your annotation almost succeeds in being pure translation strategy, > except that the JVM needs to be told to take care of the array type. > So it's a little more than "just an annotation for javac". > Dan seems to want to migrate many classes (most not enumerated) > as EVBCs. And certainly LocalDate and friends would seem to slide > easily into this mold. > If we can implement EVBCs easily as a one-off from full value type, > in the context of L-world, should we try it? People responsible for > user model (hi Brian!) might say "yuck, we are admitting design > failure by giving a consolation prize to the VBCs, instead of the > real VTs promised". Maybe EVBCs are the best engineering > compromise, or maybe we just cut EVBCs off the feature list > and say "VT or VT not", at which point people who wrote VBCs > will have sad decisions to make, and Dan will tell them not to > migrate at all. > From a VM POV, I think EVBCs are simple to implement on > top of L-world, and so seem worth the experiment. One > challenge will be to resist feature creep on EVBCs that > would entrench them in a position as a true third kind > of class. 
To avoid that, I like the idea of making them > a one-off of one of the prime types: objects or values. > And given that choice, having an annotation tweak them > from a value type seems like a fine choice, at least > from the VM POV. > ? John > On May 14, 2018, at 4:13 PM, Remi Forax < [ mailto:forax at univ-mlv.fr | > forax at univ-mlv.fr ] > wrote: >> I think i prefer a declaration site annotation like @ValueBasedClass to a use >> site annotation ValueRef. >> For me a value based class, if you want to be as compatible as possible, is a >> reference type (so 'L') that behave like a value type at runtime, so the JIT >> can see it has a value type at runtime and can unbox it at will and buffer/box >> it before calling a method at the horizon of the inlining blob. >> So as a library designer, i can choose to either replace a class by a real value >> type and it will fail if null is present or as a value based class if i value >> (sorry !) the backward compatibility more than the performance. >> Note that even if we solve the null issue when a reference type is changed to a >> value type and i think it's not something we should focus on, there is still >> the problem of the identity, so transforming a reference to a value type will >> never be 100% compatible. From john.r.rose at oracle.com Wed May 16 21:21:00 2018 From: john.r.rose at oracle.com (John Rose) Date: Wed, 16 May 2018 14:21:00 -0700 Subject: value type hygiene In-Reply-To: References: <15062ede-ba64-c6c0-21eb-cbe7eb376c4a@oracle.com> <9C0128E4-C4D4-4E8D-B154-BEE0CC9C47C1@oracle.com> <424446009.441965.1526339639570.JavaMail.zimbra@u-pem.fr> Message-ID: <9BA71B4C-C709-4F6D-9ABD-2436734EAD41@oracle.com> On May 16, 2018, at 12:00 AM, Victor Nazarov wrote: > > What about arrays with null-bitmap That kind of split storage is going to require STM to satisfy the JMM. From brian.goetz at oracle.com Wed May 16 22:45:37 2018 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 16 May 2018 18:45:37 -0400 Subject: Towards Minimal L World Message-ID: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> Putting my project management hat on ?. LWorld started out as a bold ? and risky ? experiment; could we throw away the information of what is a value and what is not from our type signatures, and reconstruct it sufficiently to not pollute performance? And it seems the results are quite promising ? so we would like to get more experience with it beyond writing toy examples and micro benchmarks. So its probably getting to be time to publish (as an EA) some sort of ?Minimal L World?. This list has been full of claims of the from ?we don?t need X?, ?we must have Y?, ?we should have Z?. I claim that all of these claims are type errors ? because they are missing the temporal clause that qualifies _when_ we might or might not need them. Let?s put some temporal structure on this, so we can rectify these transgressions. Let?s start with three milestones. LW1. This is the most minimal L-world implementation we could credibly publish. I would like to suggest we make this really minimal (more on this below), so we can get something into the hands of those that provide us useful feedback. Even a truly minimal version might be useful for machine learning (lots of data, mostly in arrays, no migration, no generics), algorithm design (such as squeezing indirections out of HAMT-based data structures), etc. I?ll leave some room between this and ... LW10. This would be the least we could actually ship as a product. 
This would need to support, for example, erased generics over values, but wouldn?t have specialized generics, yet. And more room between this and ? LW100. This is having achieved, well, Valhalla. Full optimization, specialized generics, migration support, you name it. OK, so how minimal is LW1? Well, I say really minimal: - User-definable value classes in javac, with a crappy ad-hoc syntax - Reasonable flattening and scalarization of values - No support for migration of VBCs to VCs - No support for any interaction between values and generics _whatsoever_ (I said minimal!) Even with these restrictions, I think we can call this good enough to be LW1, because it is still useful to the folks who are going to put it through its paces and give us feedback. Sure, broad users will not be interested, but that?s fine. We can then proceed to identify what are the sensible candidates for LWn (n < 10), and in what order. From john.r.rose at oracle.com Thu May 17 00:19:44 2018 From: john.r.rose at oracle.com (John Rose) Date: Wed, 16 May 2018 17:19:44 -0700 Subject: Towards Minimal L World In-Reply-To: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> References: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> Message-ID: +100 for LW1 ! Then we can more readily prove what?s needed for 10 and 100. From daniel.smith at oracle.com Thu May 17 02:53:38 2018 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 16 May 2018 20:53:38 -0600 Subject: Towards Minimal L World In-Reply-To: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> References: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> Message-ID: <6C1200D2-8830-4CD5-834F-216F4D369DEF@oracle.com> > On May 16, 2018, at 4:45 PM, Brian Goetz wrote: > > No support for any interaction between values and generics _whatsoever_ (I said minimal!) You clarified that this means the compiler actively rejects types like List. Not clear to me what would prompt that?it's more work for the compiler, the JVM doesn't care either way, and it's easy for users to work around (use a raw type). But the quality of language support will be "crappy ad-hoc" anyway, so, you know, whatever works. I'm just happy to have a compiler at all! From john.r.rose at oracle.com Thu May 17 04:05:01 2018 From: john.r.rose at oracle.com (John Rose) Date: Wed, 16 May 2018 21:05:01 -0700 Subject: Towards Minimal L World In-Reply-To: <6C1200D2-8830-4CD5-834F-216F4D369DEF@oracle.com> References: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> <6C1200D2-8830-4CD5-834F-216F4D369DEF@oracle.com> Message-ID: IMO, for experimental purposes any special javac check that blocks VTs from genetics needs to have a command line switch to unblock, so we can see by special experiment where exactly VT erasure breaks generics. > On May 16, 2018, at 7:53 PM, Dan Smith wrote: > >> On May 16, 2018, at 4:45 PM, Brian Goetz wrote: >> >> No support for any interaction between values and generics _whatsoever_ (I said minimal!) > > You clarified that this means the compiler actively rejects types like List. Not clear to me what would prompt that?it's more work for the compiler, the JVM doesn't care either way, and it's easy for users to work around (use a raw type). > > But the quality of language support will be "crappy ad-hoc" anyway, so, you know, whatever works. I'm just happy to have a compiler at all! 
From brian.goetz at oracle.com Thu May 17 12:38:03 2018 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 17 May 2018 08:38:03 -0400 Subject: Towards Minimal L World In-Reply-To: References: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> <6C1200D2-8830-4CD5-834F-216F4D369DEF@oracle.com> Message-ID: <120D9053-1C0E-48C6-B273-F7C9998EF4D8@oracle.com> Or perhaps that is LWn (1 < n < 10). > On May 17, 2018, at 12:05 AM, John Rose wrote: > > IMO, for experimental purposes any special javac check that blocks VTs from genetics needs to have a command line switch to unblock, so we can see by special experiment where exactly VT erasure breaks generics. > > On May 16, 2018, at 7:53 PM, Dan Smith > wrote: > >>> On May 16, 2018, at 4:45 PM, Brian Goetz > wrote: >>> >>> No support for any interaction between values and generics _whatsoever_ (I said minimal!) >> >> You clarified that this means the compiler actively rejects types like List. Not clear to me what would prompt that?it's more work for the compiler, the JVM doesn't care either way, and it's easy for users to work around (use a raw type). >> >> But the quality of language support will be "crappy ad-hoc" anyway, so, you know, whatever works. I'm just happy to have a compiler at all! From maurizio.cimadamore at oracle.com Thu May 17 16:15:00 2018 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Thu, 17 May 2018 17:15:00 +0100 Subject: value type hygiene In-Reply-To: References: <4DE1FC97-835B-435F-AE48-3DE3E60A457D@oracle.com> <77993d79-5097-e9c3-f2f7-d5f744396137@oracle.com> <2066983148.1293217.1525970526700.JavaMail.zimbra@u-pem.fr> <14A662F2-ACC2-4AB4-8A19-451DF26D6F1A@oracle.com> <73A0FC3C-C180-41C3-AED5-313C5BD1CA3A@oracle.com> <20053177.1351568.1525990137944.JavaMail.zimbra@u-pem.fr> <842319FF-B112-43CD-8824-39EFD5AF07E5@oracle.com> <8db40e21-1854-7b86-5b2b-fc6756e7356c@oracle.com> Message-ID: On 15/05/18 22:26, John Rose wrote: > On May 15, 2018, at 5:06 AM, Maurizio Cimadamore wrote: >> I wonder if we shouldn't also consider something along the lines of restricting migration compatibility only for _nullable_ value types, where a nullable value type is a value whose representation is big enough that it can afford one spare value to denote null-ness. So, if you want to convert existing legacy reference classes to value types, they'd better be nullable values; this way you don't lose any value in the legacy domain - nulls will be remapped accordingly (using a logic specified in the nullable value type declaration). >> >> It seems like we've been somewhere along this path before (when we were exploring the Q vs. L split) - why would something like that not be workable? > We do something similar explicitly (not globally) with the combinator > MHs.explicitCastArguments, which converts null to zeroes of > primitive types. But not vice versa: Zeroes don't re-box to null. > And it's a localized thing. > > I think you are suggesting a def-site opt-in where a class VT somehow > nominates a special value VT.N (perhaps its VT.default default, perhaps > not) with one or both of these behaviors: > assert( (VT)null == VT.N ); // unbox null to N > assert( (Object)VT.N == null ); // box N to null (probably not!) Yes, that is the spirit. > > For simplicity, let's say N must be VT.default, and that the conversion > is just one way (null to VT). Then the opt-in could be as simple as > mixing in an interface NullConvertsToDefault. See the P.S. 
of this > message: > http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2018-May/000634.html > > (Perhaps you are suggesting something different?) Well, I'm suggesting something along those lines, but I don't think that generally you can assume that null pattern and default pattern are the same thing, semantically; we've been there before, and came away with the feeling that these were two different things; e.g. null is often used as a special sentinel value, denoting some missing element. That's not default (which could be a totally legitimate value within the domain). > > It might be workable. It's complicated, of course. It would need to inject > special logic into many paths in the JVM which are currently just error > paths. There might be collateral damage on performance. For optimized > code, throwing an NPE is always simpler than patching and continuing. > This is roughly because, in optimized code, control flow merges are harder > to optimize than control flow forks. > > (Adding in the symmetric feature, of converting N to null, is probably > very expensive, since it would seem to require lots of new tests of > the form, "are you N?" And there's little value in converting N to > null and then having List.of or some other null-hostile API throw > an error. So then you have a puzzler: The seam between N and > null is not completely hidden.) But doesn't this more complex path only come up in migration cases? E.g. maybe there's a way to have this w/o totally compromising the performance model for regularly co-compiled classes? Maurizio > > AFAIK C# does something like this as a one-time deal, which cannot be > mixed in as an interface: > https://docs.microsoft.com/en-us/dotnet/api/system.nullable > > In C#, at the use-site of a type you can opt into it with an emotional > type like 'int?'. But we could make it opt in at the def-site, too, > if the value type has a spare code-point it's not using. I'm sure > if C# doesn't do this there are excellent reasons for them not to. > Anybody got information on this? > > I am hoping to avoid playing such a card. (In case anyone didn't notice, > we are playing with a large deck here, if not a full deck. There are > lots of potential moves we can make.) I want us to win with a small > number of moves. > > ? John > > P.S. Other examples of moves: Adding another type descriptor, having > one array type be polymorphically boxed or flattened, having two > VM-level types per source value type, waiting for reified generics, waiting > for primitive convergence, adding large infrastructure for migration. > Maybe we will be forced to do one or all of these before we can > get anywhere. I hope not; I'm trying to sneak across a meaningful > waypoint (not finish line) simply with L-world. > From maurizio.cimadamore at oracle.com Thu May 17 18:07:23 2018 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Thu, 17 May 2018 19:07:23 +0100 Subject: Towards Minimal L World In-Reply-To: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> References: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> Message-ID: <3fca935e-25e0-cf62-90cf-33fe8aa1f0d0@oracle.com> This seems a very promising direction; as Dan points out, one of the sore points of previous 'minimal VT' effort was lack of compiler support; this guarantees more or less same features, but with some compiler support sprinkled on top. 
While I think going down into a syntactic bikeshed right now would be way too premature, IMHO, one thing that the language has to figure out is how the construction model for value types is exposed to programmers. Withers are very good at the bytecode level, not so much at the source level. Whatever we can do to minimize the entry barrier there I think would be very welcome. Maurizio On 16/05/18 23:45, Brian Goetz wrote: > Putting my project management hat on ?. > > LWorld started out as a bold ? and risky ? experiment; could we throw away the information of what is a value and what is not from our type signatures, and reconstruct it sufficiently to not pollute performance? And it seems the results are quite promising ? so we would like to get more experience with it beyond writing toy examples and micro benchmarks. So its probably getting to be time to publish (as an EA) some sort of ?Minimal L World?. > > This list has been full of claims of the from ?we don?t need X?, ?we must have Y?, ?we should have Z?. I claim that all of these claims are type errors ? because they are missing the temporal clause that qualifies _when_ we might or might not need them. Let?s put some temporal structure on this, so we can rectify these transgressions. > > Let?s start with three milestones. > > LW1. This is the most minimal L-world implementation we could credibly publish. I would like to suggest we make this really minimal (more on this below), so we can get something into the hands of those that provide us useful feedback. Even a truly minimal version might be useful for machine learning (lots of data, mostly in arrays, no migration, no generics), algorithm design (such as squeezing indirections out of HAMT-based data structures), etc. > > I?ll leave some room between this and ... > > LW10. This would be the least we could actually ship as a product. This would need to support, for example, erased generics over values, but wouldn?t have specialized generics, yet. > > And more room between this and ? > > LW100. This is having achieved, well, Valhalla. Full optimization, specialized generics, migration support, you name it. > > > OK, so how minimal is LW1? Well, I say really minimal: > - User-definable value classes in javac, with a crappy ad-hoc syntax > - Reasonable flattening and scalarization of values > - No support for migration of VBCs to VCs > - No support for any interaction between values and generics _whatsoever_ (I said minimal!) > > Even with these restrictions, I think we can call this good enough to be LW1, because it is still useful to the folks who are going to put it through its paces and give us feedback. Sure, broad users will not be interested, but that?s fine. > > We can then proceed to identify what are the sensible candidates for LWn (n < 10), and in what order. > > From paul.sandoz at oracle.com Thu May 17 19:24:32 2018 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Thu, 17 May 2018 12:24:32 -0700 Subject: Towards Minimal L World In-Reply-To: <6C1200D2-8830-4CD5-834F-216F4D369DEF@oracle.com> References: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> <6C1200D2-8830-4CD5-834F-216F4D369DEF@oracle.com> Message-ID: <5D045092-2F81-4EF5-AB1B-1146CC43064A@oracle.com> > On May 16, 2018, at 7:53 PM, Dan Smith wrote: > >> On May 16, 2018, at 4:45 PM, Brian Goetz > wrote: >> >> No support for any interaction between values and generics _whatsoever_ (I said minimal!) > > You clarified that this means the compiler actively rejects types like List. 
Not clear to me what would prompt that?it's more work for the compiler, the JVM doesn't care either way, and it's easy for users to work around (use a raw type). > Right, one could use a raw type, which would induce, by default, a warning from javac, so maybe a warning is sufficient by default for the List case. > But the quality of language support will be "crappy ad-hoc" anyway, so, you know, whatever works. I'm just happy to have a compiler at all! Indeed, it's liberating :-) Paul. From john.r.rose at oracle.com Thu May 17 22:22:59 2018 From: john.r.rose at oracle.com (John Rose) Date: Thu, 17 May 2018 15:22:59 -0700 Subject: Towards Minimal L World In-Reply-To: <3fca935e-25e0-cf62-90cf-33fe8aa1f0d0@oracle.com> References: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> <3fca935e-25e0-cf62-90cf-33fe8aa1f0d0@oracle.com> Message-ID: <6AD9CC43-62F0-4747-9363-74A518C5E0ED@oracle.com> On May 17, 2018, at 11:07 AM, Maurizio Cimadamore wrote: > > While I think going down into a syntactic bikeshed right now would be way too premature, IMHO, one thing that the language has to figure out is how the construction model for value types is exposed to programmers. Withers are very good at the bytecode level, not so much at the source level. Whatever we can do to minimize the entry barrier there I think would be very welcome. I think our starting point is porting classic Java constructor syntax & semantics to the value world. This gives programmers a way (which is awkward but workable) to build withers etc. A Java constructor in a value class will internally use withfield to translate any assignment of the form "this.x = y", and instead of the blank instance being an incoming reference in L[0], the constructor builds a blank value instances out of thin air using vdefault. An explicit wither can simply reconstruct another instance from scratch. This is a workaround in place of direct access to a single wither instruction, using a special syntax. (?Which has been previously discussed in terms like v2=__With(v.x, y) or something similar.) As a workaround, classic Java constructors are good enough to allow other experiments to get off the ground. After all, they are good enough for VBCs. ? John P.S. Teaser: Generalizing classic Java constructors to cover the case of withers and ad hoc update instructions is the subject of a forthcoming-but-delayed memo concerning "reconstructors". From maurizio.cimadamore at oracle.com Thu May 17 23:40:01 2018 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Fri, 18 May 2018 00:40:01 +0100 Subject: Towards Minimal L World In-Reply-To: <6AD9CC43-62F0-4747-9363-74A518C5E0ED@oracle.com> References: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> <3fca935e-25e0-cf62-90cf-33fe8aa1f0d0@oracle.com> <6AD9CC43-62F0-4747-9363-74A518C5E0ED@oracle.com> Message-ID: On 17/05/18 23:22, John Rose wrote: > A Java constructor in a value class will internally use withfield > to translate any assignment of the form "this.x = y", and instead > of the blank instance being an incoming reference in L[0], the > constructor builds a blank value instances out of thin air using > vdefault. So, if I understand correctly, a classic Java constructor is a void-returning instance method; in the model you propose a value class constructor would be more similar to a V-returning static method (where V is the value to be constructed). This is all and well, but I feel that this pushes the problem under the (assignment) rug. E.g. 
E.g. I believe that reinterpreting the meaning of 'this.x = y' inside a value constructor to mean "get a brand new value and stick y into x" would be very confusing, as semantically, there's no assignment taking place. And, semantically, it doesn't even make sense to think about a 'this' (after all this is more like a static factory?).

Of course you can spin this as reinterpreting the meaning of the word 'this' inside a value constructor - e.g. the new meaning being "the opaque value being constructed"; but that is likely to clash with other utterances of 'this' in the same value class (e.g. in other instance methods - where 'this' would simply mean 'this value').

Language-wise (and I repeat, it might well be too soon to dive into this), it feels like we're missing a way to express a new kind of primitive operation (the wither). Without that, I'm a bit skeptical of our ability to express value type constructors in a good way.

Maurizio

From brian.goetz at oracle.com Fri May 18 00:33:33 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 17 May 2018 20:33:33 -0400
Subject: Towards Minimal L World
In-Reply-To: 
References: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> <3fca935e-25e0-cf62-90cf-33fe8aa1f0d0@oracle.com> <6AD9CC43-62F0-4747-9363-74A518C5E0ED@oracle.com>
Message-ID: <78E786CB-CD3B-4192-BA6D-2DD8882FE696@oracle.com>

How about this (which is not unlike one of the ideas you proposed earlier for pattern declarations):

- For a value type V with fields f1 .. fn, let the user write a constructor as if it were a regular class.
- The compiler inserts synthetic blank finals f`1..f`n, and translates accesses to this.fi to accesses to f`i
- The compiler requires that all f`i are DA at all normal completion points, and inserts { default_value / withfield* } copying from the synthetic f`i locals

Now, a value ctor looks _exactly_ like a ctor in a non-value type with final fields. No new idioms to learn.

> On May 17, 2018, at 7:40 PM, Maurizio Cimadamore wrote:
>
> On 17/05/18 23:22, John Rose wrote:
>> A Java constructor in a value class will internally use withfield to translate any assignment of the form "this.x = y", and instead of the blank instance being an incoming reference in L[0], the constructor builds a blank value instance out of thin air using vdefault.
>
> So, if I understand correctly, a classic Java constructor is a void-returning instance method; in the model you propose a value class constructor would be more similar to a V-returning static method (where V is the value to be constructed).
>
> This is all well and good, but I feel that this pushes the problem under the (assignment) rug. E.g. I believe that reinterpreting the meaning of 'this.x = y' inside a value constructor to mean "get a brand new value and stick y into x" would be very confusing, as semantically, there's no assignment taking place. And, semantically, it doesn't even make sense to think about a 'this' (after all this is more like a static factory?).
>
> Of course you can spin this as reinterpreting the meaning of the word 'this' inside a value constructor - e.g. the new meaning being "the opaque value being constructed"; but that is likely to clash with other utterances of 'this' in the same value class (e.g. in other instance methods - where 'this' would simply mean 'this value').
>
> Language-wise (and I repeat, it might well be too soon to dive into this), it feels like we're missing a way to express a new kind of primitive operation (the wither).
> Without that, I'm a bit skeptical of our ability to express value type constructors in a good way.
>
> Maurizio

From maurizio.cimadamore at oracle.com Fri May 18 08:57:23 2018
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Fri, 18 May 2018 09:57:23 +0100
Subject: Towards Minimal L World
In-Reply-To: <78E786CB-CD3B-4192-BA6D-2DD8882FE696@oracle.com>
References: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> <3fca935e-25e0-cf62-90cf-33fe8aa1f0d0@oracle.com> <6AD9CC43-62F0-4747-9363-74A518C5E0ED@oracle.com> <78E786CB-CD3B-4192-BA6D-2DD8882FE696@oracle.com>
Message-ID: <4ebce2f4-85a6-5167-7858-27537ed02413@oracle.com>

On 18/05/18 01:33, Brian Goetz wrote:
> How about this (which is not unlike one of the ideas you proposed earlier for pattern declarations):
>
> - For a value type V with fields f1 .. fn, let the user write a constructor as if it were a regular class.
> - The compiler inserts synthetic blank finals f`1..f`n, and translates accesses to this.fi to accesses to f`i
> - The compiler requires that all f`i are DA at all normal completion points, and inserts { default_value / withfield* } copying from the synthetic f`i locals
>
> Now, a value ctor looks _exactly_ like a ctor in a non-value type with final fields. No new idioms to learn.

Seems more natural - as we're only doing assignments of locals, not assignment of fields (which is, in fact, impossible in the V-world). That's a well-spotted connection!

Maurizio

>
>> On May 17, 2018, at 7:40 PM, Maurizio Cimadamore wrote:
>>
>> On 17/05/18 23:22, John Rose wrote:
>>> A Java constructor in a value class will internally use withfield to translate any assignment of the form "this.x = y", and instead of the blank instance being an incoming reference in L[0], the constructor builds a blank value instance out of thin air using vdefault.
>>
>> So, if I understand correctly, a classic Java constructor is a void-returning instance method; in the model you propose a value class constructor would be more similar to a V-returning static method (where V is the value to be constructed).
>>
>> This is all well and good, but I feel that this pushes the problem under the (assignment) rug. E.g. I believe that reinterpreting the meaning of 'this.x = y' inside a value constructor to mean "get a brand new value and stick y into x" would be very confusing, as semantically, there's no assignment taking place. And, semantically, it doesn't even make sense to think about a 'this' (after all this is more like a static factory?).
>>
>> Of course you can spin this as reinterpreting the meaning of the word 'this' inside a value constructor - e.g. the new meaning being "the opaque value being constructed"; but that is likely to clash with other utterances of 'this' in the same value class (e.g. in other instance methods - where 'this' would simply mean 'this value').
>>
>> Language-wise (and I repeat, it might well be too soon to dive into this), it feels like we're missing a way to express a new kind of primitive operation (the wither).
>>
>> Maurizio

From ali.ebrahimi1781 at gmail.com Fri May 18 09:58:31 2018
From: ali.ebrahimi1781 at gmail.com (Ali Ebrahimi)
Date: Fri, 18 May 2018 14:28:31 +0430
Subject: Towards Minimal L World
In-Reply-To: <78E786CB-CD3B-4192-BA6D-2DD8882FE696@oracle.com>
References: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> <3fca935e-25e0-cf62-90cf-33fe8aa1f0d0@oracle.com> <6AD9CC43-62F0-4747-9363-74A518C5E0ED@oracle.com> <78E786CB-CD3B-4192-BA6D-2DD8882FE696@oracle.com>
Message-ID: 

Hi,

On Fri, May 18, 2018 at 5:03 AM, Brian Goetz wrote:

> How about this (which is not unlike one of the ideas you proposed earlier for pattern declarations):
>
> - For a value type V with fields f1 .. fn, let the user write a constructor as if it were a regular class.
> - The compiler inserts synthetic blank finals f`1..f`n, and translates accesses to this.fi to accesses to f`i
> - The compiler requires that all f`i are DA at all normal completion points, and inserts { default_value / withfield* } copying from the synthetic f`i locals
>
> Now, a value ctor looks _exactly_ like a ctor in a non-value type with final fields. No new idioms to learn.

I don't think quite so, consider this:

public value class Value {
    int x;
    int y;

    public Value() {
        this.x = 10;
        Supplier<Value> f1 = () -> this;
        this.y = 15;
        Supplier<Value> f2 = () -> this;
        assert(f1.get() == f2.get());
    }
}

-- 
Best Regards,
Ali Ebrahimi

From john.r.rose at oracle.com Fri May 18 19:28:59 2018
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 18 May 2018 12:28:59 -0700
Subject: Towards Minimal L World
In-Reply-To: 
References: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> <3fca935e-25e0-cf62-90cf-33fe8aa1f0d0@oracle.com> <6AD9CC43-62F0-4747-9363-74A518C5E0ED@oracle.com>
Message-ID: <0F02BD00-3220-4CDE-9AAD-3BC9902A964F@oracle.com>

On May 17, 2018, at 4:40 PM, Maurizio Cimadamore wrote:
>
> On 17/05/18 23:22, John Rose wrote:
>> A Java constructor in a value class will internally use withfield to translate any assignment of the form "this.x = y", and instead of the blank instance being an incoming reference in L[0], the constructor builds a blank value instance out of thin air using vdefault.
>
> So, if I understand correctly, a classic Java constructor is a void-returning instance method; in the model you propose a value class constructor would be more similar to a V-returning static method (where V is the value to be constructed).

I think I failed to say clearly that, while I'm talking about JVM-level translation strategy, I'm mainly concerned with JLS-level notation, semantics, and user model. Usually I have my JVM hat on and JLS on the shelf, but this note is different.

Thesis: Value constructors are the same notation as VBC constructors.

I was speaking of the JLS-level semantics, in which there is no connection at all between constructors and methods (whether void returning or not, static or not). There's a good reason for this, because JLS-level constructors for objects translate to a bizarre thingy with properties like you describe, but with more properties which don't fit at all into the JLS, like the requirement that the JVM-level caller create a blank object instance and pass it as the receiver of an invokespecial instruction to the quasi-method (method at JVM level, constructor at JLS level).
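(ed. note: for concreteness, the user model being defended here can be sketched with a plain final class as a stand-in, since no value-class syntax exists yet; the names are illustrative, and for a real value class the constructor body would translate to vdefault/withfield rather than putfield on a blank receiver:)

final class Point {                  // stand-in for a value class
    final int x, y;
    Point(int x, int y) {            // classic ctor notation, "codes like a class"
        this.x = x;                  // for a value type: withfield, not putfield
        this.y = y;
    }
    Point withX(int newX) {          // "wither" workaround: reconstruct from scratch
        return new Point(newX, this.y);
    }
}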
So I'd like to restate that it is my conviction that the JLS-level semantics of constructors (please pretend for a moment that you never read bytecodes) can be carried over 100% wholesale from objects to values, by inspecting the JLS-level semantics of constructors on objects (*especially* in value-based classes) and observing that they apply in detail, without significant change, to a perfectly reasonable (and familiar!) semantics of constructors on values.

All the stuff like "this is a factory and that is a void function" is an artifact of translation strategy. The semantics come first, and the translation strategy is a distant second. Then, if you work the details of translation strategy for values, you will find that the user model is the same but the detailed instructions are very different. Kind of like putfield vs. withfield, but deeper in the case of constructors. It is not a red flag, nor is it confusing to users, that the translation strategy must differ between objects and values, even though the constructor semantics are 99% identical. (The 1% comes in if object identity plays a role.)

Brian's reply is more on-point to what I am trying to say (sorry I was not clear!), in that he proposes a translation strategy that "saves the appearances" of classic Java constructors (our promise is "codes like a class"), while taking care of the necessary changes to bytecode mechanics behind the scenes. (I would like to be slightly more aggressive than Brian about saving the appearance of 'this'; his solution is 99% but I think we can go safely to 99.9% of VBC operations. But that's a sub-discussion.)

> This is all well and good, but I feel that this pushes the problem under the (assignment) rug. E.g. I believe that reinterpreting the meaning of 'this.x = y' inside a value constructor to mean "get a brand new value and stick y into x" would be very confusing, as semantically, there's no assignment taking place. And, semantically, it doesn't even make sense to think about a 'this' (after all this is more like a static factory?).

I don't know how to respond to this. The semantics *I* am proposing preserve all the appearances of classic Java constructors, which *do* appear semantically to initialize an instance (object or value) by assigning to fields. Even when such assignment is illegal (even meaningless!) in other contexts. Note that constructors are allowed to assign to fields when all other class code is not, and that (at the JLS level) the assignments to fields in a constructor are carefully choreographed by DA/DU rules. I am, of course, talking about final fields here. That's the common ground between VTs and VBCs.

> Of course you can spin this as reinterpreting the meaning of the word 'this' inside a value constructor - e.g. the new meaning being "the opaque value being constructed"; but that is likely to clash with other utterances of 'this' in the same value class (e.g. in other instance methods - where 'this' would simply mean 'this value').

The only spin I want to spin is to preserve the classic constructor semantics, of field-wise initialization by assignment (static single assignment!). I think that is the least surprising model.

> Language-wise (and I repeat, it might well be too soon to dive into this), it feels like we're missing a way to express a new kind of primitive operation (the wither). Without that, I'm a bit skeptical of our ability to express value type constructors in a good way.
>
> Maurizio

It sounds like you fear that migrating a VBC to a VT must require rewriting its constructor into a completely new notation, not yet invented. I am saying that's not required, as the VBC notation is perfectly reasonable.

Yes, withers require an *extension* to constructors, which (in a not yet specified way) allows field-wise single assignment to produce a new instance of a value given the field bindings of an old value, plus a constructor-like body of code. Working title for that is "reconstructor". For me, one sign that such extensions are done right *at the JLS level* is that they will be immediately applicable to VBCs as well as VTs.

If we do withers right (at JLS level!), they will apply to VBCs as well as VTs. Their translation strategy will differ, just as values and objects differ, but the notation will apply to both indiscriminately. If we do it right. If we fumble the ball we'll split "codes like a class" into "codes like a value class when it's a value". But I don't think we will.

-- John

From brian.goetz at oracle.com Fri May 18 20:05:15 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 18 May 2018 16:05:15 -0400
Subject: Towards Minimal L World
In-Reply-To: <0F02BD00-3220-4CDE-9AAD-3BC9902A964F@oracle.com>
References: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> <3fca935e-25e0-cf62-90cf-33fe8aa1f0d0@oracle.com> <6AD9CC43-62F0-4747-9363-74A518C5E0ED@oracle.com> <0F02BD00-3220-4CDE-9AAD-3BC9902A964F@oracle.com>
Message-ID: <80820f60-e07a-9918-e08d-3ed09861f2e1@oracle.com>

> Thesis: Value constructors are the same notation as VBC constructors.

+1. I don't think it's a good use of our "user model complexity budget" to have a different way to write these guys. It also facilitates migration. And "Codes like a class" has a pretty sensible answer for "how do I write a constructor."

From forax at univ-mlv.fr Fri May 18 20:07:45 2018
From: forax at univ-mlv.fr (Remi Forax)
Date: Fri, 18 May 2018 22:07:45 +0200 (CEST)
Subject: Towards Minimal L World
In-Reply-To: <80820f60-e07a-9918-e08d-3ed09861f2e1@oracle.com>
References: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> <3fca935e-25e0-cf62-90cf-33fe8aa1f0d0@oracle.com> <6AD9CC43-62F0-4747-9363-74A518C5E0ED@oracle.com> <0F02BD00-3220-4CDE-9AAD-3BC9902A964F@oracle.com> <80820f60-e07a-9918-e08d-3ed09861f2e1@oracle.com>
Message-ID: <1166217988.2241310.1526674065136.JavaMail.zimbra@u-pem.fr>

It's more "codes like a class" and doesn't leak 'this', as Ali and Maurizio said.

Rémi

----- Mail original -----
> De: "Brian Goetz" 
> À: "John Rose" , "Maurizio Cimadamore" 
> Cc: "valhalla-spec-experts" 
> Envoyé: Vendredi 18 Mai 2018 22:05:15
> Objet: Re: Towards Minimal L World

>> Thesis: Value constructors are the same notation as VBC constructors.
>
> +1. I don't think it's a good use of our "user model complexity budget" to have a different way to write these guys. It also facilitates migration. And "Codes like a class" has a pretty sensible answer for "how do I write a constructor."
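(ed. note: Brian's desugaring scheme from earlier in the thread can be sketched as a source-level rewrite; 'Complex', the f$ locals, and the final factory call are illustrative stand-ins -- in a real value class that last step would be the defaultvalue/withfield instruction sequence, not a constructor call:)

final class Complex {                    // stand-in for a value class
    final double re, im;
    Complex(double re, double im) { this.re = re; this.im = im; }

    // what the compiler would conceptually emit for the value-class ctor:
    static Complex ctorDesugared(double re, double im) {
        final double f$re;               // synthetic blank final standing in for this.re
        final double f$im;               // synthetic blank final standing in for this.im
        f$re = re;                       // 'this.re = re' rewritten to a local store
        f$im = im;                       // 'this.im = im' rewritten to a local store
        // every f$i is definitely assigned at this normal completion point:
        return new Complex(f$re, f$im);  // stands in for defaultvalue + withfield*
    }
}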
From brian.goetz at oracle.com Fri May 18 20:50:02 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 18 May 2018 16:50:02 -0400
Subject: Towards Minimal L World
In-Reply-To: <1166217988.2241310.1526674065136.JavaMail.zimbra@u-pem.fr>
References: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> <3fca935e-25e0-cf62-90cf-33fe8aa1f0d0@oracle.com> <6AD9CC43-62F0-4747-9363-74A518C5E0ED@oracle.com> <0F02BD00-3220-4CDE-9AAD-3BC9902A964F@oracle.com> <80820f60-e07a-9918-e08d-3ed09861f2e1@oracle.com> <1166217988.2241310.1526674065136.JavaMail.zimbra@u-pem.fr>
Message-ID: <25abc77d-56ee-8f52-38fc-adb79ced8bf4@oracle.com>

Well, leaking `this` from a ctor in a regular class isn't so great either, but yes, the only sensible use of `this` in a value ctor is to qualify a DU field.

On 5/18/2018 4:07 PM, Remi Forax wrote:
> It's more "codes like a class" and doesn't leak 'this', as Ali and Maurizio said.
>
> Rémi
>
> ----- Mail original -----
>> De: "Brian Goetz" 
>> À: "John Rose" , "Maurizio Cimadamore" 
>> Cc: "valhalla-spec-experts" 
>> Envoyé: Vendredi 18 Mai 2018 22:05:15
>> Objet: Re: Towards Minimal L World
>>> Thesis: Value constructors are the same notation as VBC constructors.
>> +1. I don't think it's a good use of our "user model complexity budget" to have a different way to write these guys. It also facilitates migration. And "Codes like a class" has a pretty sensible answer for "how do I write a constructor."

From john.r.rose at oracle.com Sat May 19 01:38:16 2018
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 18 May 2018 18:38:16 -0700
Subject: Towards Minimal L World
In-Reply-To: <1166217988.2241310.1526674065136.JavaMail.zimbra@u-pem.fr>
References: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> <3fca935e-25e0-cf62-90cf-33fe8aa1f0d0@oracle.com> <6AD9CC43-62F0-4747-9363-74A518C5E0ED@oracle.com> <0F02BD00-3220-4CDE-9AAD-3BC9902A964F@oracle.com> <80820f60-e07a-9918-e08d-3ed09861f2e1@oracle.com> <1166217988.2241310.1526674065136.JavaMail.zimbra@u-pem.fr>
Message-ID: 

On May 18, 2018, at 1:07 PM, Remi Forax wrote:
>
> It's more "codes like a class" and doesn't leak 'this', as Ali and Maurizio said.

That rule is an add-on which might help the user model in some way but is in no way logically necessary, any more than for VBCs.

From daniel.smith at oracle.com Sat May 19 13:31:01 2018
From: daniel.smith at oracle.com (Dan Smith)
Date: Sat, 19 May 2018 07:31:01 -0600
Subject: Towards Minimal L World
In-Reply-To: <80820f60-e07a-9918-e08d-3ed09861f2e1@oracle.com>
References: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> <3fca935e-25e0-cf62-90cf-33fe8aa1f0d0@oracle.com> <6AD9CC43-62F0-4747-9363-74A518C5E0ED@oracle.com> <0F02BD00-3220-4CDE-9AAD-3BC9902A964F@oracle.com> <80820f60-e07a-9918-e08d-3ed09861f2e1@oracle.com>
Message-ID: <4C8773ED-CF29-46E4-826D-B5D7071D48BB@oracle.com>

> On May 18, 2018, at 2:05 PM, Brian Goetz wrote:
>
>> Thesis: Value constructors are the same notation as VBC constructors.
>
> +1. I don't think it's a good use of our "user model complexity budget" to have a different way to write these guys. It also facilitates migration. And "Codes like a class" has a pretty sensible answer for "how do I write a constructor."

A simpler approach, in the spirit of "let's not waste effort on problems we don't know how we're going to solve": you get one implicit constructor, and it's private. You can't declare custom constructors. Write factory methods for your clients.
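(ed. note: a minimal sketch of this factory-only model, again with a plain final class as the stand-in; all names are illustrative:)

final class UnsignedShort {              // stand-in for a value class
    final int bits;
    private UnsignedShort(int bits) {    // the implicit, fields-in-order ctor
        this.bits = bits;
    }
    static UnsignedShort of(int value) { // user-written factory: validation lives here
        if (value < 0 || value > 0xFFFF)
            throw new IllegalArgumentException("out of range: " + value);
        return new UnsignedShort(value);
    }
}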
From brian.goetz at oracle.com Sun May 20 16:44:21 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sun, 20 May 2018 12:44:21 -0400
Subject: Towards Minimal L World
In-Reply-To: <4C8773ED-CF29-46E4-826D-B5D7071D48BB@oracle.com>
References: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> <3fca935e-25e0-cf62-90cf-33fe8aa1f0d0@oracle.com> <6AD9CC43-62F0-4747-9363-74A518C5E0ED@oracle.com> <0F02BD00-3220-4CDE-9AAD-3BC9902A964F@oracle.com> <80820f60-e07a-9918-e08d-3ed09861f2e1@oracle.com> <4C8773ED-CF29-46E4-826D-B5D7071D48BB@oracle.com>
Message-ID: <56A7F3CB-915F-45AF-AD11-E10321382F4E@oracle.com>

Yes, this would work as a hack -- implicitly declare a private ctor that initializes the fields only, and let users write factories that do validation / normalization.

The hacky part is that there has to be an order of the fields, and we'd probably take the order in which the fields are declared in the source file, which is kind of brittle. So I'd say that this definitely goes in the "expedient MLW tricks" category, rather than a viable user model for LW10.

> On May 19, 2018, at 9:31 AM, Dan Smith wrote:
>
>> On May 18, 2018, at 2:05 PM, Brian Goetz wrote:
>>
>>> Thesis: Value constructors are the same notation as VBC constructors.
>>
>> +1. I don't think it's a good use of our "user model complexity budget" to have a different way to write these guys. It also facilitates migration. And "Codes like a class" has a pretty sensible answer for "how do I write a constructor."
>
> A simpler approach, in the spirit of "let's not waste effort on problems we don't know how we're going to solve": you get one implicit constructor, and it's private. You can't declare custom constructors. Write factory methods for your clients.

From maurizio.cimadamore at oracle.com Mon May 21 10:44:04 2018
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Mon, 21 May 2018 11:44:04 +0100
Subject: Towards Minimal L World
In-Reply-To: <0F02BD00-3220-4CDE-9AAD-3BC9902A964F@oracle.com>
References: <50DAF1FF-A82A-4F51-982C-7133FC5EBA30@oracle.com> <3fca935e-25e0-cf62-90cf-33fe8aa1f0d0@oracle.com> <6AD9CC43-62F0-4747-9363-74A518C5E0ED@oracle.com> <0F02BD00-3220-4CDE-9AAD-3BC9902A964F@oracle.com>
Message-ID: <56b20b7e-e3c5-6133-c6f1-b2eaa4975238@oracle.com>

On 18/05/18 20:28, John Rose wrote:
> It sounds like you fear that migrating a VBC to a VT must require rewriting its constructor into a completely new notation, not yet invented. I am saying that's not required, as the VBC notation is perfectly reasonable.

More the opposite; what I fear is that, by using a syntax that is 99% similar to constructors, developers would be led into a false sense of security which might result in sharp edges.

You speak about semantics; I believe that the fact that object construction is based upon a mutability assumption that is totally absent in values is an important semantic difference. Final fields are, I believe, not a very good metaphor for things that are possible only inside a constructor; when you speak about a final field, well, you still have a regular field and you could almost even putfield on it - if it weren't for restrictions added on top. With values, we're saying that putfield _doesn't exist_ - it's not a matter of allowing it in one place and not in others (which, IMHO, the 'this.x = y' suggests); we're replacing it with a whole new pattern that is not based on mutation. So, the fact that syntax-wise we're stuck with a description that is based on mutability seems like a red herring to me.
Maurizio

From david.holmes at oracle.com Wed May 23 00:08:33 2018
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 23 May 2018 10:08:33 +1000
Subject: [Nestmates] Minor updates and clarifications to the Reflection API specification
Message-ID: 

Code review found some minor issues that needed attention. Please advise if there are any concerns with these changes.

Thanks,

David

Full specs: http://cr.openjdk.java.net/~dholmes/8010319-JEP181/specs

java.lang.Class::getNestHost()
http://cr.openjdk.java.net/~dholmes/8010319-JEP181/specs/java.lang/java/lang/Class.html

* Error handling

The original text stated:

"If there is any error accessing the nest host, or the nest host is in any way invalid, then `this` is returned."

but the implementation only catches LinkageErrors. In the original discussion:

http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2017-October/000386.html

this wasn't discussed explicitly. It was mentioned by me in passing:

"Though a case can still be made to allow VME's to pass through."

and that is what has been happening with other APIs (e.g. MethodHandles). It is generally bad form to catch things like OutOfMemoryError and StackOverflowError, so these should just propagate. So the text is updated to read:

"If there is any linkage error accessing the nest host, or the nest host is in any way invalid, then `this` is returned."

where "linkage error" links to LinkageError.

* Additional clarifying/explanatory text
  o To the paragraph starting "A /nest/ is a set of classes and interfaces ..", we add the final sentence: "All nestmates are implicitly defined in the same runtime package."
  o In the sentence starting "A class or interface that is not explicitly a member of a nest, is a member of the nest consisting only of itself, ..." we insert a clarification concerning primitive and array classes: "A class or interface that is not explicitly a member of a nest (such as a primitive or array class), is a member of the nest consisting only of itself, ..."
  o The @return text is reworded from: "the nest host of this class, or `this` if we cannot obtain a valid nest host" to "the nest host of this class, or `this` if a valid nest host cannot be obtained"

From karen.kinnear at oracle.com Wed May 23 14:11:30 2018
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Wed, 23 May 2018 10:11:30 -0400
Subject: Valhalla EG minutes May 9 and Apr 25 2018
Message-ID: <2DA3375A-6D11-41E7-8731-AA4F360FA0A3@oracle.com>

Corrections welcome - apologies for the delay

May 9, 2018

Attendees: Remi, Dan H, Dan Smith, Frederic, David Simms, Karen

JVMLS upcoming talks:
Any interest in cohosting:
1) LWorld
2) Nestmates and the Alternate Accessors
3) VT workshop
4) Condy

Nestmates:
Plan is to get this into JDK 11
JEP is targeted
Spec update was sent for review - any questions?
Remi: ASM 6.2 will be ready next week with Condy and Nestmates (later email update: and preview version handling)
John: cleaning up issues e.g. invokespecial

Value Types:
1. Cloneable:
options: 1) disallow override and allow call: what does it mean to return the same Value Type? 2) disallow calling clone?
Should javac give a warning? What about backward compatibility? For signatures that pass Object?
John: note: clone is protected so must be called from subtype
Dan H: propose prototype throw exception, we can change this later

2. Null hygiene email
VM static typing model - distinguish heap vs. stack
verifier - check static type on stack
others: ensure type publishing to heap
enhance static typing model: list locally known value types - provide a classfile view
when load classfile - check if really a Value Type, lazily load other types
a) identity sensitive: most checks are dynamic type based - even if operating on "Object"
b) null suppression (ed. note: under re-discussion)
Want static and dynamic from classfile point of view - catch surprise nulls
Helps users, helps optimizations e.g. method scalarization
static type:
1) verifier: aconst_null
2) checkcast: if local classfile statically knows VT: add null check
3) callee checks input arguments for null
4) return: caller check
Dan H: for return - only statically declared VT known locally
John: invocations are linked, may want to cache non-nullable information
Remi: what about libraries that use null as a return type? e.g. Map.get()
John: wants the VM to be locally strict; if we have to deal with occurrences of nullable VT, revisit
maybe explore: ValueRef in the language?
Remi: 1) map.get() 2) push null to a Lambda
John: does checkcast save you?
Remi: need checkcast in bridge
what about aconst_null call to erased generic?
John: NPE is thrown, location varies
checkcast could work for non null VT, could optimize away checkcast
Remi: user may get NPE
e.g. java.util.concurrent: methods take null in API spec - null as sentinel input
Remi: prefer: nullable ValueType annotation, VM has to not do null check
John: what about when erased to Object?
Remi: or implement interface?
John: need to handle this in the context of erasure
Perhaps lang and API folks fix the rest?
Remi: requires more bridges
John: don't want to change fields, arrays, method descriptors - that way lies pain - but may have to do this
Frederic: we have the L descriptor which today implies nullable, perhaps a new descriptor for non-nullable?
John: what if map.get were to return a ValueRef of ComplexDouble?
Frederic: Concern is if we do the first prototype with the same descriptor but change the semantics, it is confusing if we add a new descriptor with the same semantics
exploring: nonnullable container for VT - without exploring general nonnullability
might help with backward compatibility for APIs with Object/Interface/[Object/[Interface
allow generic specialization to optimize if you have nonnullable references
allow javac/verifier/runtime checks
Remi: isn't this the same as the side table?
Frederic: No. Different signature perhaps. Key is the nonnullability as a property of the container/reference, not the type
Caller must convert, e.g. in new code: use NVT; - which is non-nullable - to pass to a method that expects Object;, safe to cast NVT; -> LVT;
do the work at the boundary
In Q-world we changed the type hierarchy of classes. In LN world - we change the property of the container/reference, not the type of the class
NVT; is a non-nullable reference to the same class VT as LVT;, which is a nullable reference to VT
Trying to scale down the pain
Remi: Need adaptation, e.g. hierarchy of methods
John: prefer to try simpler approaches before new descriptors, e.g. ValueRef Interface, e.g. LComplexDouble -> LValueRef
Karen: requires changing implementation of existing code - e.g. APIs which return null
John: 1. generic code: free to instantiate to DoubleComplex, ValueRef of DoubleComplex 2. APIs which return Optional - don't have nulls as part of the API
Karen: Concern about requiring changing existing implementations in the field: 1) erased generics and 2) APIs that assume null, e.g.
new code using old APIs that passes a non-nullable [VT and the implementation stores a null -> NPE
John: Works like an int - NPE is analogous to ArrayStoreException
With a null return: checkcast will catch this immediately
Also an expectation that most users will do a null check first for APIs that use null as a sentinel, e.g. that no entry was found
John: we may have to go to excited types ! and ?
warning - that is a separate language discussion before it enters the vm discussion
Remi: need another kind of erasure
John: could use lazy casting: e.g. allow null check before casting - above the vm
Remi: not many APIs use null for sentinels, perhaps we don't use those methods with VT
John: perhaps only allow "?type" in the return position?
Dan H: Look at Collections?
note: Brian started looking in 2012 - there is a literature of methods like this, may need to look again
John: did a verbal introduction to Brian yesterday about value type hygiene, Brian is thinking about it
Remi: type annotation may allow javac/verifier/runtime/JIT
generics: may need special case null checks - perhaps with local annotations?
John: let's prototype the super-strict approach and see if it helps with optimizations
Remi: there is an existing code road block for value-based-class migration
What does == on Optional.empty do? Need VBC to use .equals()
John: libraries may need implementation changes e.g. Class.cast() - may need to add Class.castRef? reject null for VT?
call Object.equals()
Clear that a strict vm does not solve all problems
Remi: benefit to having a prototype sooner rather than later, outside world needs to see progress
John: performance depends on the strictness dial
benefits in experimenting with and without strictness
Dan H: consider binaries via the AdoptOpenJDK project? talk to Brian - using this for Amber

====

April 25, 2018

attendees: Dan H, Tobi, John, Frederic, David Simms, Lois, Karen

Value Types
1. pre-loading of classes: needed for field flattening
JIT method calls - preload formal parameters
maybe a single mechanism? details not worked out yet
2. proposing a new attribute
javac view vs. runtime view
javac - list all types known to be value types
runtime: classes not listed in the attribute - don't reload, don't flatten
if in the attribute - preload, check if VT, can decide to flatten
can use for fields and method signatures
3. circularity
For layout we need to check circularity errors
discussion of - if we detect a circularity error - perhaps don't fail the load but rather treat as non-flattenable (ed. note - not sure how this would work)
Note: VTs used in fields need to preload at class load time, used in method signatures - need to load at preparation time
JIT needs to know value types for optimization
note: calling convention: if inconsistencies between superclass and subclass lists - need to follow the least specific declaring classes' view of ValueType
Dan H: if value type is not in the list -> throw ICCE
Remove an attractive nuisance
John: if we can be null hostile at call boundaries we can allow optimizations
migrated value types are a corner case
alternative: leave as is: JIT can be null hostile if it has VT information and deoptimize if null -> reduces performance
Frederic: a common approach is to add an additional argument to specify null
John: in new code we would never have null for a VT
Karen: concern about nullability and backward compatibility
Dan H: historically tried to avoid recompilation by javac increasing performance, discouraging javac clever optimizations
Frederic: only time this would happen would be if you migrate a VBC -> VT

Nullability:
John: LWorld Types are all nullable, verifier can't tell, if values are nullable they are not flattenable
goal: have the vm be as null hostile as possible
Clearly we will check nullability for containers - instance fields, static fields, array elements
Verifier limitations: note: locals are NOT null hostile
one approach: javac could add null checks as necessary
or have a dial between hostility and leniency
John prefers complete hostility when possible, e.g. arrays of VT are really flattenable and null hostile
Dan S: original LWorld discussion - QTypes expressed null hostility
alternative exploring: flag on fields
if you want full control over null - maybe need QTypes
arrays: preload the element, so we know it is a VT
fields: have a flag
could have QTypes in method signatures and verifier
OR: casts could allow nulls
locals - could expect javac to introduce null checks
If we use Types for identity and value type and we preload all VT, ACC_FLATTENABLE allows checking non-nullability for fields
John: want the semantic effect: if preload a VT and mismatch -> throw ICCE
John: summary: Treat Types as Types in this classfile
verifier can reject null bearing values, method call boundaries, heap containers
worth exploring

From brian.goetz at oracle.com Wed May 23 14:55:37 2018
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 23 May 2018 10:55:37 -0400
Subject: Valhalla EG minutes May 9 and Apr 25 2018
In-Reply-To: <2DA3375A-6D11-41E7-8731-AA4F360FA0A3@oracle.com>
References: <2DA3375A-6D11-41E7-8731-AA4F360FA0A3@oracle.com>
Message-ID: 

> JVMLS upcoming talks:
> Any interest in cohosting:
> 1) LWorld
> 2) Nestmates and the Alternate Accessors
> 3) VT workshop
> 4) Condy

From my perspective, I'm interested in having the NM/Alternate Accessors talk, not because it's "hot" news, but because it illustrates the real impact of seemingly-harmless VM changes, and by extension illustrates the scope of what the runtime does. I think everyone who has ever had an idea for "why don't they just..." could benefit from this talk.

From karen.kinnear at oracle.com Wed May 23 17:07:58 2018
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Wed, 23 May 2018 13:07:58 -0400
Subject: [Nestmates] Minor updates and clarifications to the Reflection API specification
In-Reply-To: 
References: 
Message-ID: <44B72D0A-B081-452A-BA20-8E6058EFE91D@oracle.com>

David,

The Valhalla EG met today May 23, 2018 and walked through the details of this proposal and how they relate to the JVMS.

The EG was ok with these changes. Thank you for sending them for review.

thanks,
Karen

> On May 22, 2018, at 8:08 PM, David Holmes wrote:
>
> Code review found some minor issues that needed attention. Please advise if there are any concerns with these changes.
>
> Thanks,
>
> David
>
> Full specs: http://cr.openjdk.java.net/~dholmes/8010319-JEP181/specs
>
> java.lang.Class::getNestHost()
> http://cr.openjdk.java.net/~dholmes/8010319-JEP181/specs/java.lang/java/lang/Class.html
>
> * Error handling
>
> The original text stated:
>
> "If there is any error accessing the nest host, or the nest host is in any way invalid, then `this` is returned."
>
> but the implementation only catches LinkageErrors. In the original discussion:
>
> http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2017-October/000386.html
>
> this wasn't discussed explicitly. It was mentioned by me in passing:
>
> "Though a case can still be made to allow VME's to pass through."
>
> and that is what has been happening with other APIs (e.g. MethodHandles). It is generally bad form to catch things like OutOfMemoryError and StackOverflowError, so these should just propagate. So the text is updated to read:
>
> "If there is any linkage error accessing the nest host, or the nest host is in any way invalid, then `this` is returned."
>
> where "linkage error" links to LinkageError.
>
> * Additional clarifying/explanatory text
>   o To the paragraph starting "A /nest/ is a set of classes and interfaces ..", we add the final sentence: "All nestmates are implicitly defined in the same runtime package."
>   o In the sentence starting "A class or interface that is not explicitly a member of a nest, is a member of the nest consisting only of itself, ..." we insert a clarification concerning primitive and array classes: "A class or interface that is not explicitly a member of a nest (such as a primitive or array class), is a member of the nest consisting only of itself, ..."
>   o The @return text is reworded from: "the nest host of this class, or `this` if we cannot obtain a valid nest host" to "the nest host of this class, or `this` if a valid nest host cannot be obtained"

From david.holmes at oracle.com Wed May 23 21:27:32 2018
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 24 May 2018 07:27:32 +1000
Subject: [Nestmates] Minor updates and clarifications to the Reflection API specification
In-Reply-To: <44B72D0A-B081-452A-BA20-8E6058EFE91D@oracle.com>
References: <44B72D0A-B081-452A-BA20-8E6058EFE91D@oracle.com>
Message-ID: 

Thank you Karen - and EG members.

David

On 24/05/2018 3:07 AM, Karen Kinnear wrote:
> David,
>
> The Valhalla EG met today May 23, 2018 and walked through the details of this proposal and how they relate to the JVMS.
>
> The EG was ok with these changes. Thank you for sending them for review.
>
> thanks,
> Karen
>
>> On May 22, 2018, at 8:08 PM, David Holmes wrote:
>>
>> Code review found some minor issues that needed attention. Please advise if there are any concerns with these changes.
>> >> and that is what has been happening with other API's (eg. >> MethodHandles). It is generally bad form to catch things like >> OutOfMemoryError and StackOverflow, so these should just >> propagate. So the text is updated to read: >> >> "If there is anylinkage error >> >> accessing the nest host, or the nest host is in any way >> invalid, then |this| is returned." >> >> where "linkage error" links to LinkageError. >> >> * Additional clarifying/explanatory text >> o To the paragraph starting "A /nest/ is a set of classes and >> interfaces ..", we add the final sentence: " >> All nestmates are implicitly defined in the same runtime package." >> o In the sentence starting "A class or interface that is not >> explicitly a member of a nest, is a member of the nest >> consisting only of itself, ..." we insert a clarification >> concerning primitive and array classes: "A class or interface >> that is not explicitly a member of a nest (such as a primitive >> or array class), is a member of the nest consisting only of >> itself, ..." >> o The @return text is reworded from: "the nest host of this >> class, or this if we cannot obtain a valid nest host" to "the >> nest host of this class, or |this| if a valid nest host cannot >> be obtained" >> >> >> > From frederic.parain at oracle.com Fri May 25 15:18:33 2018 From: frederic.parain at oracle.com (Frederic Parain) Date: Fri, 25 May 2018 11:18:33 -0400 Subject: Static value fields initialization Message-ID: The JVMS has to be modified to cover the initialization of static value fields. However, the implications of the change should be evaluated carefully before committing them. The JVMS10, section 5.4.2 Preparation says: "Preparation involves creating the static fields for a class or interface and initializing such fields to their default values (?2.3, ?2.4). This does not require the execution of any Java Virtual Machine code; explicit initializers for static fields are executed as part of initialization (?5.5), not preparation.? In the JVMS draft for Value types, section 2.4 Reference Types and Values says: "Value classes have a special value, called the default value, which has all its instance variables set to their default value according to their declaration (?4.5) and the initial default value of each type (?2.3, ?2.4). This default value for value classes is a valid, fully initialized value. Any use of a default value of a value type requires class initialization of the value class since a default value is an instance and all instance bytecodes assume pre-initialization. This is subject to the same exception all classes have, which is that during the initializing thread can create instances (including default values) of themselves.? Let?s consider the model for minimal L-world 1, where all value fields are flattenable. Static value fields must be initialized to their default value at preparation time, which implies the initialization of their value class. Here?s the contradiction: the value class initialization requires code execution, when previous statement in the preparation specification says there?s no code execution. How big would be the implications to allow code execution during the preparation phase? Note that the code to be executed is not the code of the class being prepared, but the code of the value class used for one of its static fields. What could be consequences on the current class life cycle if the initialization of the value class fails to initialize? Any though to help understanding this issue is welcome. 
Thank you,

Fred

From forax at univ-mlv.fr Sat May 26 10:06:42 2018
From: forax at univ-mlv.fr (Remi Forax)
Date: Sat, 26 May 2018 12:06:42 +0200 (CEST)
Subject: Valhalla EG minutes May 9 and Apr 25 2018
In-Reply-To: <2DA3375A-6D11-41E7-8731-AA4F360FA0A3@oracle.com>
References: <2DA3375A-6D11-41E7-8731-AA4F360FA0A3@oracle.com>
Message-ID: <99938119.1836273.1527329201999.JavaMail.zimbra@u-pem.fr>

----- Mail original -----
> De: "Karen Kinnear" 
> À: "valhalla-spec-experts" 
> Envoyé: Mercredi 23 Mai 2018 16:11:30
> Objet: Valhalla EG minutes May 9 and Apr 25 2018

> Corrections welcome - apologies for the delay
>
> May 9, 2018
>
> Attendees: Remi, Dan H, Dan Smith, Frederic, David Simms, Karen
>
> JVMLS upcoming talks:
> Any interest in cohosting:
> 1) LWorld
> 2) Nestmates and the Alternate Accessors
> 3) VT workshop
> 4) Condy
>
> Nestmates:
> Plan is to get this into JDK 11
> JEP is targeted
> Spec update was sent for review - any questions?
> Remi: ASM 6.2 will be ready next week with Condy and Nestmates (later email update: and preview version handling)
> John: cleaning up issues e.g. invokespecial

Just to say that ASM 6.2 is now available on Maven Central with the nestmate support, but also constant dynamic and the preview flag; everything is under an experimental flag (ASM7_EXPERIMENTAL) until JDK 11 is released.

Rémi

From john.r.rose at oracle.com Mon May 28 23:12:42 2018
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 28 May 2018 16:12:42 -0700
Subject: Static value fields initialization
In-Reply-To: 
References: 
Message-ID: 

On May 25, 2018, at 8:18 AM, Frederic Parain wrote:
> ... Let's consider the model for minimal L-world 1, where all value fields are flattenable. Static value fields must be initialized to their default value at preparation time, which implies the initialization of their value class. Here's the contradiction: the value class initialization requires code execution, when the previous statement in the preparation specification says there's no code execution.

Currently, preparation is a very early phase which can only assume that supers are loaded. With value types, we also load the non-static field types. So far, neither of those (supers or instance fields) will tolerate circular dependencies, and the recursive loading of supers (and presumably instance fields) doesn't cause code to execute either.

(Exception: Class loader code can run, at the "meta" level to the class. We usually disregard that kind of execution. We are concerned here with the phased execution of user-written bytecodes in loaded classes.)

Executing code during preparation would greatly constrain the JVM's execution order, both creating new dependencies that might not be resolvable, because of circularities, and also preventing optimizations (such as CDS or AOT) which might perform early phases like preparation in a special way, but cannot tolerate general code execution.

The good news is I don't think we need to complicate preparation to that extent; some small rule tweaks will get us where we want.

When supers and instance field types are loaded, it is not yet possible to run <clinit> but it *is* possible to compute the bits of a default value, because we have defined the default value without reference to any computation: It is pure structure (the nesting of the value down to its primitive and reference components) plus the zero bits for each leaf of the structure.
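(ed. note: the structural rule can be modeled in plain Java as below; 'isValueType' is a hypothetical stand-in for the JVM's own knowledge, and a real JVM computes this on flattened layouts rather than via reflection:)

import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

class DefaultValues {
    // assumption: stub for what the JVM knows from the ValueTypes attribute
    static boolean isValueType(Class<?> t) { return false; }

    static Object zeroOf(Class<?> t) {             // the zero bits for one leaf
        if (t == boolean.class) return false;
        if (t == char.class)    return '\0';
        if (t == long.class)    return 0L;
        if (t == float.class)   return 0.0f;
        if (t == double.class)  return 0.0d;
        return 0;                                  // byte, short, int
    }

    // default value = pure structure + zero bits, with no user code executed
    static Map<String, Object> defaultOf(Class<?> vt) {
        Map<String, Object> layout = new LinkedHashMap<>();
        for (Field f : vt.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())) continue;  // instance fields only
            Class<?> t = f.getType();
            if (t.isPrimitive())     layout.put(f.getName(), zeroOf(t));
            else if (isValueType(t)) layout.put(f.getName(), defaultOf(t)); // recurse into flattened field
            else                     layout.put(f.getName(), null);         // reference leaf
        }
        return layout;
    }
}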
The early determination of a default value for a value type V lets us compute (and allocate and initialize) static default values for V very early, before V begins to initialize.

A class C might have some "static V v;" and V is mentioned in C's ValueTypes attribute. This means that during C's preparation, storage must be prepared for C.v. This also requires that V be loaded enough to determine V's default value. A reasonable implementation might allocate static storage for *both* a reference to V (as if V were a reference type) and *also* a writable buffer (or one-element array) holding the default bits (all zeroes) for V; it would initialize the reference C.v to point to the buffer for C.v and also copy the bits of V.default (previously computed) into the buffer for C.v; further "putstatic" ops on C.v would overwrite the bits in the writable buffer for C.v, but leave the reference for C.v unchanged. Another reasonable implementation (and a simpler one) might just allocate static storage for a reference C.v and patch it with a read-only copy of V.default; further "putstatic" ops would box the new value and overwrite the reference for C.v. Both of these implementations could leverage existing static-field preparation logic that applies to references, to set up the reference part of C.v.

(In either of the above implementation strategies, a JVM should probably allocate, as a standard feature of every value class V, a reference to a read-only copy of V's default value, and place that reference in the same table as the user-defined static fields of V. It is as if the JVM adds a synthetic static field V.$default of synthetic type "reference to V", which points to a read-only buffered default (all-zero) value of type V. Such a thing can be easily created during preparation time, when the JVM is already concerned with creating the machinery for V's statics. After that point, V or any of its client classes can easily obtain the pointer to the canonical copy of V's default as V.$default. I don't suggest literally naming it "$default" but rather having an injected "extra" static reference slot, wherever the static field references are kept by the JVM.)

Does any of this lead to a paradox where the default value C.v of type V can become visible before V.<clinit> is run? We certainly want to avoid such a thing; no value of V.default should enter the JVM stack until V.<clinit> is triggered. Does this mean that we need to execute V.<clinit> in order to prepare C.v? That brings in additional circular dependencies which I think we will find intractable.

The answer is simple, when you think about it: The preparation phase allocates a static variable C.v initialized to a value V.default, but neither the variable nor its value is ever accessed until the static variable is loaded. This means that we can *prepare* C.v and V.default before V.<clinit> runs, as long as we ensure that the value doesn't escape before V.<clinit> is triggered.

Doing this seems simple to me. It is already the case that C.<clinit> must be triggered before C.v becomes available. What about V.<clinit>? I think that is easy to handle with a new rule in the JVMS, that makes sure V.<clinit> triggers before C.<clinit>.

There's a simple way to phrase this rule: The initialization of a class C must recursively trigger the initialization of every value type that occurs as the type of a field (whether static field or instance field). Thus, just before C.<clinit> is run, the JVM must first run V.<clinit>, because the type of C.v is V.
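(ed. note: a sketch of the ordering this rule guarantees, using ordinary classes as stand-ins for C and a value type V; under the proposed rule, triggering C's initialization first triggers V's, so the prepared default never escapes before V's <clinit> has been triggered:)

class V {
    static { System.out.println("V.<clinit>"); }
}
class C {
    static V v = new V();   // stand-in for a static field of value type V
    static { System.out.println("C.<clinit> completes"); }
}
class InitOrder {
    public static void main(String[] args) {
        Object o = C.v;     // triggers C's <clinit>, which triggers V's first
        // prints "V.<clinit>", then "C.<clinit> completes"
    }
}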
A simpler, broader rule would be: Just before a class C is initialized, for each type V in its ValueTypes attribute, the type V is initialized (recursively, with the usual short-circuit logic if the initialization of V has already started in the same thread). It is tempting to play this game at every step: Each V in C.ValueTypes is loaded before C is loaded, and so on for the other phases (prepared and/or linked, initialized). Perhaps that gains us simplicity without loss of function.

(Notes: I say C.<clinit> and V.<clinit> for concreteness; what I really mean is the initialization phase of C and V. The logic above should be adjusted to reflect this, because it still must apply even if C or V lacks an actual <clinit> block. I say "trigger" above because, as we know, the <clinit> has to *start* before anything in the class can be executed, but it doesn't have to *finish* because of the corner case of circular dependencies, which are resolved in the first thread that needs the initialization to execute.)

So let's resolve this contradiction by creating the default value at preparation time but ensuring that it never appears as the result of a bytecode execution (vdefault, getstatic, aaload, etc.) until the value's <clinit> has been triggered.

One final note: You might be thinking that we at least need to execute the code of the value type's nullary constructor, in order to compute the default value. That's where we profit from laying down a hard rule, that the VM owns the default value, not the user. So the nullary constructor is not definable by the user, and (as noted above) its effect is defined purely in terms of the structure of the value type, and the default (all-zero) values of the leaf components of the value. Put another way, the JVM doesn't need *any* constructor to create the unique default value of a given value type (and forbidding nullary constructors is just a way of avoiding misunderstandings). In any case, there is never any need to execute code to derive the default.

-- John

From karen.kinnear at oracle.com Wed May 30 17:41:35 2018
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Wed, 30 May 2018 13:41:35 -0400
Subject: LW1 and ValueTypes consistency
Message-ID: 

Details for LW1:

Goals of LW1:
Support Early Access Binaries ASAP
Support immutable, identity-free value types that are subtypes of java.lang.Object

Limitations:
No support for value classes as type parameters for generics (enforced by javac)
No migration of value-based classes

In order to deliver this as quickly as possible, we want to decouple the agreed-upon proposals from the currently open issues. Specifically, we are decoupling adding the ValueTypes attribute for consistently identifying known value types from the discussion of how we deal with nullability, which allows us to make significant progress and narrows the open design issues. This email details this initial step. We have done detailed exploration of eager loading requirements.

=======

I. Support for ValueTypes attribute and value type consistency

Javac: ValueTypes attribute identifies value types known at compile time

Verifier: Ensure specific bytecodes operate on a match of value type vs. not: (withfield, defaultvalue vs. new, monitor*)

Runtime: Ensure consistency between assumptions of a type being a value type and the actual type's expectations when loaded.

Thanks to Frederic: improvement on consistency guarantees for being a Value Type. Checks for consistency will be performed and throw ICCE on mismatches between the ValueTypes attribute and the actual loaded class is_value_type.
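(ed. note: a hedged sketch of the mismatch check described here; the method shape and names are illustrative, not JVM internals:)

import java.util.Set;

class ValueTypesCheck {
    static void check(Set<String> valueTypesAttr,  // names listed in the ValueTypes attribute
                      String typeName,
                      boolean loadedIsValueType) { // the loaded class's is_value_type
        boolean attrSaysValue = valueTypesAttr.contains(typeName);
        if (attrSaysValue != loadedIsValueType) {
            throw new IncompatibleClassChangeError(
                typeName + ": ValueTypes attribute says " + attrSaysValue
                         + ", loaded class says " + loadedIsValueType);
        }
    }
}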
Clarification of terms:
"Preloading" means loading before completion of loading for the declaring class. This is only used for flattenable fields.
"Link-time loading" means loading during the link phase for the declaring class, which is used for all other uses of entries in the ValueTypes attribute.

1. Flattenable fields: preload value types used for local flattenable field descriptors listed in the ValueTypes attribute
2. Local methods: load value types in method descriptors (parameters/return) at link time, prior to preparation
3. Constant pool resolution: all constant pool resolution will ensure value type consistency
4. References to constant pool descriptors: field and method descriptors at resolution of target field and method descriptors, for all parameters/return, ensure value type consistency

Interpreter: specific bytecodes change behavior based on dynamic checks for is_value_type
JIT: specific bytecodes change behavior based on dynamic checks for is_value_type

Open design issues for LW1
1. flattenable statics: http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2018-May/000699.html
2. nullability handling for LW1. I will send follow-up email exploring trade-offs.

thanks,
Karen

From karen.kinnear at oracle.com Wed May 30 20:14:25 2018
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Wed, 30 May 2018 16:14:25 -0400
Subject: Static value fields initialization
In-Reply-To: 
References: 
Message-ID: 

John,

Very much like the direction of your proposal, which I summarize as:

1. preparation: continue no code execution
VM determines default value via size information from loaded class (flattened fields are pre-loaded, so we have that information at preparation time)

2. pre-<clinit>: <clinit> of the containing class requires <clinit> for all entries in the ValueTypes attribute prior to completing its own <clinit>
----
I think we could optimize this to pre-<clinit> for entries in the ValueTypes attribute that are referenced by local flattenable fields, static or instance.
Specifically: we have entries in the ValueTypes attribute for:
1. local fields
2. local method signatures
3. remote fields
4. remote method signatures
I think we only need the pre-<clinit> for #1.

3. I think we also need to ensure that other bytecodes that can return a default value instance of a value type would also require <clinit> of the value type: defaultvalue, anewarray, multianewarray (JVMS draft 4d already includes defaultvalue, but not yet array bytecodes)

Still exploring to see if there are any holes in these assumptions, appreciate additional eyes on this.

So far so good.

thanks,
Karen

p.s. The circularity issue relative to flattenable static fields is the same as the one relative to flattenable instance fields. When you pre-load the value type you risk circularity if it is defined to contain, directly or indirectly, a flattenable field of the type of itself. For LW1, with all value type fields flattenable, users will not be able to have a static field containing an instance of itself. If we want to support that capability in a future LWX, we could do that with a non-flattenable static field containing a value type.

> On May 28, 2018, at 7:12 PM, John Rose wrote:
>
> On May 25, 2018, at 8:18 AM, Frederic Parain wrote:
>> ... Let's consider the model for minimal L-world 1, where all value fields are flattenable. Static value fields must be initialized to their default value at preparation time, which implies the initialization of their value class.
> On May 28, 2018, at 7:12 PM, John Rose wrote:
>
> On May 25, 2018, at 8:18 AM, Frederic Parain wrote:
>> ... Let's consider the model for minimal L-world 1, where all value
>> fields are flattenable. Static value fields must be initialized to their
>> default value at preparation time, which implies the initialization
>> of their value class. Here's the contradiction: the value class
>> initialization requires code execution, when the previous statement
>> in the preparation specification says there's no code execution.
>
> Currently, preparation is a very early phase which can only assume
> that supers are loaded. With value types, we also load the non-static
> field types. So far, neither of those (supers or instance fields) will tolerate
> circular dependencies, and the recursive loading of supers (and
> presumably instance fields) doesn't cause code to execute either.
>
> (Exception: Class loader code can run, at the "meta" level to the
> class. We usually disregard that kind of execution. We are
> concerned here with the phased execution of user-written
> bytecodes in loaded classes.)
>
> Executing code during preparation would greatly constrain the
> JVM's execution order, both creating new dependencies that might
> not be resolvable, because of circularities, and also preventing
> optimizations (such as CDS or AOT) which might perform early
> phases like preparation in a special way, but cannot tolerate
> general code execution.
>
> The good news is I don't think we need to complicate preparation
> to that extent; some small rule tweaks will get us where we want.
>
> When supers and instance field types are loaded, it is not yet possible
> to run <clinit>, but it *is* possible to compute the bits of a default value,
> because we have defined the default value without reference to any
> computation: It is pure structure (the nesting of the value down to its
> primitive and reference components) plus the zero bits for each leaf
> of the structure.
>
> The early determination of a default value for a value type V lets us
> compute (and allocate and initialize) static default values for V very
> early, before V begins to initialize.
>
> A class C might have some "static V v;" and V is mentioned in C's
> ValueTypes attribute. This means that during C's preparation, storage
> must be prepared for C.v. This also requires that V be loaded enough
> to determine V's default value. A reasonable implementation might
> allocate static storage for *both* a reference to V (as if V were a
> reference type) and *also* a writable buffer (or one-element array)
> holding the default bits (all zeroes) for V; it would initialize the reference
> C.v to point to the buffer for C.v and also copy the bits of V.default (previously
> computed) into the buffer for C.v; further "putstatic" ops on C.v would
> overwrite the bits in the writable buffer for C.v, but leave the reference
> for C.v unchanged. Another reasonable implementation (and a simpler
> one) might just allocate static storage for a reference C.v and patch it
> with a read-only copy of V.default; further "putstatic" ops would box
> the new value and overwrite the reference for C.v. Both of these
> implementations could leverage existing static-field preparation logic
> that applies to references, to set up the reference part of C.v.
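A rough pseudo-Java model of the first strategy above (reference plus
writable buffer); the class and method names are invented, and a real JVM
would of course keep this in native metadata rather than Java objects:

    // Hedged sketch of a prepared static of value type V under the
    // "reference + writable buffer" strategy.
    final class PreparedStatic {
        private final byte[] buffer;  // flattened, writable storage for one V
        private final Object ref;     // reference part; never reassigned

        PreparedStatic(int sizeOfV) {
            buffer = new byte[sizeOfV]; // all-zero bits == V's default value
            ref = buffer;               // reference set up once, at preparation
        }

        // putstatic: overwrite the buffer bits, leaving the reference alone
        void put(byte[] newBits) {
            System.arraycopy(newBits, 0, buffer, 0, buffer.length);
        }

        Object get() { return ref; }    // getstatic sees the same reference
    }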
> (In either of the above implementation strategies, a JVM should
> probably allocate, as a standard feature of every value class V,
> a reference to a read-only copy of V's default value, and place
> that reference in the same table as the user-defined static fields
> of V. It is as if the JVM adds a synthetic static field V.$default
> of synthetic type "reference to V", which points to a read-only
> buffered default (all-zero) value of type V. Such a thing can
> be easily created during preparation time, when the JVM is
> already concerned with creating the machinery for V's statics.
> After that point, V or any of its client classes can easily obtain
> the pointer to the canonical copy of V's default as V.$default.
> I don't suggest literally naming it "$default" but rather having
> an injected "extra" static reference slot, wherever the static
> field references are kept by the JVM.)
>
> Does any of this lead to a paradox where the default value C.v of type
> V can become visible before V.<clinit> is run? We certainly want to
> avoid such a thing; no value of V.default should enter the JVM stack
> until V.<clinit> is triggered. Does this mean that we need to execute
> V.<clinit> in order to prepare C.v? That brings in additional circular
> dependencies which I think we will find intractable.
>
> The answer is simple, when you think about it: The preparation
> phase allocates a static variable C.v initialized to a value V.default,
> but neither the variable nor its value is ever accessed until the static
> variable is loaded. This means that we can *prepare* C.v and
> V.default before V.<clinit> runs, as long as we ensure that the
> value doesn't escape before V.<clinit> is triggered.
>
> Doing this seems simple to me. It is already the case that C.<clinit>
> must be triggered before C.v becomes available. What about V.<clinit>?
> I think that is easy to handle with a new rule in the JVMS, that makes
> sure the V.<clinit> triggers before C.<clinit>.
>
> There's a simple way to phrase this rule: The initialization of a class
> C must recursively trigger the initialization of every value type that
> occurs as the type of a field (whether static field or instance field).
> Thus, just before C.<clinit> is run, the JVM must first run V.<clinit>,
> because the type of C.v is V.
>
> A simpler, broader rule would be: Just before a class C is initialized,
> for each type V in its ValueTypes attribute, the type V is initialized
> (recursively, with the usual short-circuit logic if the initialization of V
> has already started in the same thread). It is tempting to play this
> game at every step: Each V in C.ValueTypes is loaded before
> C is loaded, and so on for the other phases (prepared and/or linked,
> initialized). Perhaps that gains us simplicity without loss of function.
>
> (Notes: I say C.<clinit> and V.<clinit> for concreteness; what I really
> mean is the initialization phase of C and V. The logic above should
> be adjusted to reflect this, because it still must apply even if C or V
> lacks an actual <clinit> block. I say "trigger" above because, as we
> know, the <clinit> has to *start* before anything in the class can be
> executed, but it doesn't have to *finish* because of the corner case
> of circular dependencies, which are resolved in the first thread that
> needs the initialization to execute.)
>
> So let's resolve this contradiction by creating the default value
> at preparation time but ensuring that it never appears as the
> result of a bytecode execution (vdefault, getstatic, aaload, etc.)
> until the value's <clinit> has been triggered.
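The broader rule above, rendered as hedged pseudo-Java; names such as
valueTypesAttribute and runClinitIfPresent are invented, and thread
synchronization is deliberately ignored:

    import java.util.List;

    // Pseudocode only, not JVMS text or HotSpot code.
    interface Klass {
        boolean isInitialized();
        boolean initStartedBy(Thread t);
        void markInitStarted(Thread t);
        List<Klass> valueTypesAttribute();
        void runClinitIfPresent();
    }

    final class InitRule {
        static void ensureInitialized(Klass c) {
            // usual short-circuit: done, or in progress in this thread
            if (c.isInitialized() || c.initStartedBy(Thread.currentThread()))
                return;
            c.markInitStarted(Thread.currentThread());
            // the broader rule: trigger each V in C.ValueTypes first
            for (Klass v : c.valueTypesAttribute())
                ensureInitialized(v);
            c.runClinitIfPresent();  // then C.<clinit>, if C has one
        }
    }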
> One final note: You might be thinking that we at least need to
> execute the code of the value type's nullary constructor, in order
> to compute the default value. That's where we profit from laying
> down a hard rule: the VM owns the default value, not the user.
> So the nullary constructor is not definable by the user, and (as
> noted above) its effect is defined purely in terms of the structure
> of the value type, and the default (all-zero) values of the leaf
> components of the value. Put another way, the JVM doesn't
> need *any* constructor to create the unique default value of
> a given value type (and forbidding nullary constructors is just
> a way of avoiding misunderstandings). In any case, there is
> never any need to execute code to derive the default.
>
> -- John
>

From john.r.rose at oracle.com  Wed May 30 20:32:58 2018
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 30 May 2018 13:32:58 -0700
Subject: Static value fields initialization
In-Reply-To:
References:
Message-ID:

On May 30, 2018, at 1:14 PM, Karen Kinnear wrote:
>
> John,
>
> Very much like the direction of your proposal, which I summarize as:
>
> 1. preparation: continue no code execution
> VM determines default value via size information from the loaded class
> (flattened fields are pre-loaded, so we have that information at
> preparation time)
>
> 2. pre-<clinit>: <clinit> of the containing class requires <clinit> for
> all entries in the ValueTypes attribute prior to completing its own
> <clinit>

Yes. Point 2 expands slightly to be "ensure VT.<clinit> before any
bytecode usage of VT."

>
> ----
>
> I think we could optimize this to pre-<clinit> only for entries in the
> ValueTypes attribute that are referenced by local flattenable fields,
> static or instance.
>
> Specifically: we have entries in the ValueTypes attribute for:
> 1. local fields
> 2. local method signatures
> 3. remote fields
> 4. remote method signatures
>
> I think we only need the pre-<clinit> for #1.

I agree, and would prefer this.

> 3. I think we also need to ensure that other bytecodes that can return
> a default value instance of a value type would also require <clinit> of
> the value type: defaultvalue, anewarray, multianewarray (JVMS draft 4d
> already includes defaultvalue, but not yet the array bytecodes)

Good catch on defaultvalue (what I meant by "vdefault"). The array
bytecodes *do* load the array element class, so we are close already.
The missing bit, I think, is to inspect the loaded array class, see if
it is a value type, and run <clinit> before creating the array. This
will require us to adjust our initialization barriers a little.

> Still exploring to see if there are any holes in these assumptions;
> appreciate additional eyes on this. So far so good.

I'm very glad you think so. In our experiments, I'd like to lift
restrictions on static field types sooner rather than later. (A
workaround for Java coders would be to define a private static inner
class to hold the problematic statics, but I'd rather not have this
sharp edge.)

-- John
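A hypothetical sketch of the nested-holder workaround John mentions,
again using placeholder __ByValue syntax (not compilable with any
shipping javac), and assuming the experimental restriction on value-type
statics described above:

    __ByValue final class Point { int x, y; }

    class Geometry {
        // static Point ORIGIN;       // would trip the experimental
        //                            // restriction on value-type statics

        private static class Holder { // the problematic static lives in a
            static Point ORIGIN;      // nested class instead; it is only
        }                             // prepared when Holder is first used

        static Point origin() { return Holder.ORIGIN; }
    }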