From Tobi_Ajila at ca.ibm.com Wed Apr 1 14:19:58 2020 From: Tobi_Ajila at ca.ibm.com (Tobi Ajila) Date: Wed, 1 Apr 2020 09:19:58 -0500 Subject: Updated SoV documents In-Reply-To: <7c7f797d-b5da-7f1e-04f1-e0b3e9d2e745@oracle.com> References: <7c7f797d-b5da-7f1e-04f1-e0b3e9d2e745@oracle.com> Message-ID: Hi Brian Thanks for the updated SoV docs. In section 4, it mentions: > In most cases, such as field descriptors and method descriptors, uses of C.ref is translated as LC$ref;, uses of C.val is translated as QC$val;, In the LW2 spec the `name_index` in `CONSTANT_Class_info` structures could refer to "binary class or interface name" as well as "ReferenceType descriptors" which referenced UTF8s with 'Q' descriptors. In LW2 inline-types were both nullable and null-free so it was necessary to have ReferenceType descriptors in order make a distinction in CONSTANT_Class_info structures for things like allocating arrays. With the new model inline-types can only be null-free, so will the CONSTANT_Class_info structures be limited to binary class or interface names? or will ReferenceType descriptors be used for inline-types? --Tobi "valhalla-spec-experts" wrote on 2020/03/27 03:59:38 PM: > From: Brian Goetz > To: valhalla-spec-experts > Date: 2020/03/27 03:59 PM > Subject: [EXTERNAL] Updated SoV documents > Sent by: "valhalla-spec-experts" bounces at openjdk.java.net> > > I've updated the SoV documents, including the new sections on VM > model and translation: > > http://cr.openjdk.java.net/~briangoetz/valhalla/sov/01-background.html > From brian.goetz at oracle.com Thu Apr 2 16:26:18 2020 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 2 Apr 2020 12:26:18 -0400 Subject: Updated SoV documents In-Reply-To: References: <7c7f797d-b5da-7f1e-04f1-e0b3e9d2e745@oracle.com> Message-ID: Hi Tobi; I don't have an answer for you offhand, but the current "eclair" model eliminates one of the reasons we needed the new syntax for C_Class -- that a class name could refer to two VM-level types. This is no longer the case; each VM type corresponds to exactly one classfile. The reason I can't give you an unqualified yes at this point is that I am not sure whether we still need to include a "Q" somewhere as a preload signal.? When Q descriptors appear in certain places (e.g., field declarations), they generate load order constraints, and in other places, they generate nullity assumptions. I will leave it to Frederic to answer whether we have run out of reasons to need anything other than binary class names in C_Class structures.? (It would be nice if we have!) On 4/1/2020 10:19 AM, Tobi Ajila wrote: > > Hi Brian > > Thanks for the updated SoV docs. In section 4, it mentions: > > > In most cases, such as field descriptors and method descriptors, > uses of C.ref is translated as LC$ref;, uses of C.val is translated as > QC$val;, > > In the LW2 spec the `name_index` in `CONSTANT_Class_info` structures > could refer to "binary class or interface name" as well as > "ReferenceType descriptors" which referenced UTF8s with 'Q' > descriptors. In LW2 inline-types were both nullable and null-free so > it was necessary to have ReferenceType descriptors in order make a > distinction in CONSTANT_Class_info structures for things like > allocating arrays. With the new model inline-types can only be > null-free, so will the CONSTANT_Class_info structures be limited to > binary class or interface names? or will ReferenceType descriptors be > used for inline-types? 
> > --Tobi > > "valhalla-spec-experts" > wrote on 2020/03/27 > 03:59:38 PM: > > > From: Brian Goetz > > To: valhalla-spec-experts > > Date: 2020/03/27 03:59 PM > > Subject: [EXTERNAL] Updated SoV documents > > Sent by: "valhalla-spec-experts" > bounces at openjdk.java.net> > > > > I've updated the SoV documents, including the new sections on VM > > model and translation: > > > > http://cr.openjdk.java.net/~briangoetz/valhalla/sov/01-background.html > > > From frederic.parain at oracle.com Fri Apr 3 17:08:24 2020 From: frederic.parain at oracle.com (Frederic Parain) Date: Fri, 3 Apr 2020 13:08:24 -0400 Subject: Updated SoV documents In-Reply-To: References: <7c7f797d-b5da-7f1e-04f1-e0b3e9d2e745@oracle.com> Message-ID: Hi, This is a good question. The ReferenceType descriptor was added to make the distinction between the two VM-level type sharing the same class name. The CONSTANT_Class_info is used by the following bytecodes: - ldc - new - defaultvalue - anewarray - multianewarray - checkcast - instanceof I?m not seeing any case now where a ReferenceType would be needed. All these bytecodes can work properly with the old fashioned CONSTANT_Class_info. For bytecodes like new, defaultvalue, anewarray, multianewarray, the class is loaded, so verifications are performed at runtime (for instance, new can throw an InstantiationError if the loaded type is an inline type), and layout information are available to allocate structures likes arrays. The Q-signature is used as a marker for inline types (null-free/immutable/identity-free), but also as a pre-loading and eager-loading signal. The places where it is used as such a signal are the descriptors in field_info and method_info, which point directly to an Utf8 entry. CONSTANT_Class_info is also used for the this_class and super_class fields at the classfile top-level structure. In both cases, there?s no need to have a ReferenceType: no distinction is needed, and these types are loaded anyway, so verification is performed on loaded types. Unless I?ve missed something, I would say that we are clear to return CONSTANT_Class_info to its pre-LW2 format, supporting only "binary class or interface name?. Fred > On Apr 2, 2020, at 12:26, Brian Goetz wrote: > > Hi Tobi; > > I don't have an answer for you offhand, but the current "eclair" model eliminates one of the reasons we needed the new syntax for C_Class -- that a class name could refer to two VM-level types. This is no longer the case; each VM type corresponds to exactly one classfile. > > The reason I can't give you an unqualified yes at this point is that I am not sure whether we still need to include a "Q" somewhere as a preload signal. When Q descriptors appear in certain places (e.g., field declarations), they generate load order constraints, and in other places, they generate nullity assumptions. > > I will leave it to Frederic to answer whether we have run out of reasons to need anything other than binary class names in C_Class structures. (It would be nice if we have!) > > On 4/1/2020 10:19 AM, Tobi Ajila wrote: >> Hi Brian >> >> Thanks for the updated SoV docs. In section 4, it mentions: >> >> > In most cases, such as field descriptors and method descriptors, uses of C.ref is translated as LC$ref;, uses of C.val is translated as QC$val;, >> >> In the LW2 spec the `name_index` in `CONSTANT_Class_info` structures could refer to "binary class or interface name" as well as "ReferenceType descriptors" which referenced UTF8s with 'Q' descriptors. 
In LW2 inline-types were both nullable and null-free so it was necessary to have ReferenceType descriptors in order make a distinction in CONSTANT_Class_info structures for things like allocating arrays. With the new model inline-types can only be null-free, so will the CONSTANT_Class_info structures be limited to binary class or interface names? or will ReferenceType descriptors be used for inline-types? >> >> --Tobi >> >> "valhalla-spec-experts" wrote on 2020/03/27 03:59:38 PM: >> >> > From: Brian Goetz >> > To: valhalla-spec-experts >> > Date: 2020/03/27 03:59 PM >> > Subject: [EXTERNAL] Updated SoV documents >> > Sent by: "valhalla-spec-experts" > > bounces at openjdk.java.net> >> > >> > I've updated the SoV documents, including the new sections on VM >> > model and translation: >> > >> > http://cr.openjdk.java.net/~briangoetz/valhalla/sov/01-background.html >> > >> >> > From frederic.parain at oracle.com Fri Apr 3 17:12:18 2020 From: frederic.parain at oracle.com (Frederic Parain) Date: Fri, 3 Apr 2020 13:12:18 -0400 Subject: Updated SoV documents In-Reply-To: References: <7c7f797d-b5da-7f1e-04f1-e0b3e9d2e745@oracle.com> Message-ID: I haven?t mentioned bytecodes that are using CONSTANT_Class_info indirectly, all the get/put and invoke bytecodes which are using CONSTANT_Class_info through CONSTANT_Methodref_info and CONSTANT_Fieldref_info. Anyway, they don?t need ReferenceType either. Fred > On Apr 3, 2020, at 13:08, Frederic Parain wrote: > > Hi, > > This is a good question. > > The ReferenceType descriptor was added to make the distinction between the two VM-level > type sharing the same class name. The CONSTANT_Class_info is used by the following bytecodes: > - ldc > - new > - defaultvalue > - anewarray > - multianewarray > - checkcast > - instanceof > > I?m not seeing any case now where a ReferenceType would be needed. All these bytecodes can > work properly with the old fashioned CONSTANT_Class_info. For bytecodes like new, defaultvalue, > anewarray, multianewarray, the class is loaded, so verifications are performed at runtime > (for instance, new can throw an InstantiationError if the loaded type is an inline type), > and layout information are available to allocate structures likes arrays. > > The Q-signature is used as a marker for inline types (null-free/immutable/identity-free), > but also as a pre-loading and eager-loading signal. The places where it is used as such a > signal are the descriptors in field_info and method_info, which point directly to an > Utf8 entry. > > CONSTANT_Class_info is also used for the this_class and super_class fields at the classfile > top-level structure. In both cases, there?s no need to have a ReferenceType: no distinction > is needed, and these types are loaded anyway, so verification is performed on loaded types. > > Unless I?ve missed something, I would say that we are clear to return CONSTANT_Class_info > to its pre-LW2 format, supporting only "binary class or interface name?. > > Fred > > >> On Apr 2, 2020, at 12:26, Brian Goetz wrote: >> >> Hi Tobi; >> >> I don't have an answer for you offhand, but the current "eclair" model eliminates one of the reasons we needed the new syntax for C_Class -- that a class name could refer to two VM-level types. This is no longer the case; each VM type corresponds to exactly one classfile. >> >> The reason I can't give you an unqualified yes at this point is that I am not sure whether we still need to include a "Q" somewhere as a preload signal. 
When Q descriptors appear in certain places (e.g., field declarations), they generate load order constraints, and in other places, they generate nullity assumptions. >> >> I will leave it to Frederic to answer whether we have run out of reasons to need anything other than binary class names in C_Class structures. (It would be nice if we have!) >> >> On 4/1/2020 10:19 AM, Tobi Ajila wrote: >>> Hi Brian >>> >>> Thanks for the updated SoV docs. In section 4, it mentions: >>> >>>> In most cases, such as field descriptors and method descriptors, uses of C.ref is translated as LC$ref;, uses of C.val is translated as QC$val;, >>> >>> In the LW2 spec the `name_index` in `CONSTANT_Class_info` structures could refer to "binary class or interface name" as well as "ReferenceType descriptors" which referenced UTF8s with 'Q' descriptors. In LW2 inline-types were both nullable and null-free so it was necessary to have ReferenceType descriptors in order make a distinction in CONSTANT_Class_info structures for things like allocating arrays. With the new model inline-types can only be null-free, so will the CONSTANT_Class_info structures be limited to binary class or interface names? or will ReferenceType descriptors be used for inline-types? >>> >>> --Tobi >>> >>> "valhalla-spec-experts" wrote on 2020/03/27 03:59:38 PM: >>> >>>> From: Brian Goetz >>>> To: valhalla-spec-experts >>>> Date: 2020/03/27 03:59 PM >>>> Subject: [EXTERNAL] Updated SoV documents >>>> Sent by: "valhalla-spec-experts" >>> bounces at openjdk.java.net> >>>> >>>> I've updated the SoV documents, including the new sections on VM >>>> model and translation: >>>> >>>> http://cr.openjdk.java.net/~briangoetz/valhalla/sov/01-background.html >>>> >>> >>> >> > From brian.goetz at oracle.com Wed Apr 8 16:54:03 2020 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 8 Apr 2020 12:54:03 -0400 Subject: IdentityObject and InlineObject Message-ID: This is a good time to review the motivations for Identity/InlineObject, and see if there's anything we want to tweak about it. There are two main branches of the motivation here: pedagogical and functional. Pedagogically, we're asking users to amend their view of "object" to allow for some objects to have identity, and some objects to not have identity, and supertypes are one of the prime ways we capture these sorts of divisions.? It's not an accident that we have both the informal statement "everything (except primitives) is an object" and the hierarchy constraint "all classes extend (implicitly or not) java.lang.Object".? Not only is Object a place to hang the behavior that is common to all objects (equality, etc), its position at the root of the hierarchy sends a message that conditions how we think about objects.? The intent of partitioning Object into IdentityObject and InlineObject is an attempt to capture the same. Functionally, there are operations that apply only to identity objects, such as identity equality, synchronization, Object::wait, etc.? Some of these have been totalized appropriately (such as `==`); others are partial.? Having synchronization be partial, without offering developers a way to express "if I tried to synchronize on this thing, would it throw", just makes Java less reliable, so we want a way to express identity both in the dynamic type system (`instanceof IdentityObject`) and the static type system (``). We also thought, at one point in time, that InlineObject and IdentityObject would be a sensible place to put new methods or default implementations of Object methods.? 
However, as the design has evolved, the need for this has gone away.? This opens the door to a new possibility, which I'd like to evaluate: just have one of them.? (And, if we only have one, the move is forced: IdentityObject.) In this world, we'd just have IdentityObject, which would be viewed as a refinement of Object -- "Object, with identity". Identity classes would implement it implicitly, as today.? The pedagogy would then be, instead of "there are two disjoint kinds of Object", be "Some objects are enhanced with identity."? You'd still be able to say ??? x instanceof IdentityObject and ??? void foo(IdentityObject o) { ... } and ??? class Foo { ... } as a way of detecting the refinement, but not the opposite.? So the questions are: ?- Pedagogically, does this help move users to the right mental model, or does the symmetric model do a better job? ?- Functionally, is there anything we might do with InlineObject, that we would miss? Secondarily: ?- If we take this step, is `IdentityObject` still the best name? From john.r.rose at oracle.com Wed Apr 8 18:43:20 2020 From: john.r.rose at oracle.com (John Rose) Date: Wed, 8 Apr 2020 11:43:20 -0700 Subject: null checks vs. class resolution, and translation strategy for casts Message-ID: The latest translation strategies for inline classes involve two classfiles, one for the actual inline class C and one for its reference projection N. The reference projection N exists to provide a name for the type ?C or null?. As we all know on this list, this is a surprisingly pleasant way to handle the problem of representing the two types. (This was a surprise to me; I had assumed from the beginning of the project that our build-out of new descriptors, including Q-types, would inevitably provide the natural way for the JVM to express null vs. non-null versions of the same nominal class. But this failed to correspond to a language-level type system that was workable, and also broke binary compatibility in some cases where we wished to migrate old L-types to new Q-types. Having names for both C and N fixes both problems, with surprisingly little cost to the JVM?s model of types.) But there?s a problem in this translation strategy with null values which needs resolution. To be quite precise, this problem requires careful non-resolution. The issue is the exact sequencing of the JVM?s intrinsic null-checking operations as applied to types which may or may not be inline classes. (One of the delights of working on a compatible language a quarter century old is that there?s always more to the story, because however simple your model of things might seem, there?s always some 25-year-old constraint you have to cope with, that adds surprising complexity to your simple mental model. Today?s topic is null checking of the instanceof and checkcast instructions, which we just discussed in a Zoom meeting?special thanks to Dan H. and Remi and Fred P. for guiding me in this topic.) The static operand of an instanceof or checklist instruction indexes a constant poool entry of type CONSTANT_Class_info (defined in JVMS ?4.4.1). Such a C_Class entry is resolvable. Indeed, bytecodes that use such an entry are specified to resolve it first, which, may cause a cascade of side effects including loading a classfile, if it has not already been loaded. ?6.5 says this about instanceof (I have added numbers but nothing else): > 1. The run-time constant pool item at the index must be a symbolic > reference to a class, array, or interface type. > > 2. 
If objectref is null, the instanceof instruction pushes an int > result of 0 as an int onto the operand stack. > > 3. Otherwise, the named class, array, or interface type is resolved > (?5.4.3.1). The corresponding documentation for checkcast is identical except (as you might expect) for this step: > 2. If objectref is null, then the operand stack is unchanged. Step 1 says, ?you must point to a C_Class?. This is checked when the class file containing the instruction is loaded. This step does *not* call for any classfiles to be loaded. Step 2 handles the null case. Step 3 requires that the C_Class reference be resolved, so that the resolved class can be used to finish the instruction. The next step (4) is not so important here but I?ll include it here for completeness, for both instanceof and checkcast: > 4. If objectref is an instance of the resolved class or array type, or > implements the resolved interface, the instanceof instruction pushes > an int result of 1 as an int onto the operand stack; otherwise, it > pushes an int result of 0. > 4. If objectref can be cast to the resolved class, array, or interface > type, the operand stack is unchanged; otherwise, the checkcast > instruction throws a ClassCastException. Notice that if the object reference on the stack is null then step 2 finishes the instruction, and step 3 is not executed to load the referenced class (nor is step 4 executed). This is a little bit inconvenient in the case of a checkcast to an inline class type. The Java language requires that a cast to an inline class must always fail on null, while a cast to a regular identity class must always succeed on null. (If we ever add other null-rejecting types to the language, similar points will hold for their casts.) This means that checkcast is not exactly right as a translation for source-level cast to an inline type. You might think the ordering of steps 2 and 3 is an unimportant optimization: Why bother to do the work of loading the class if you know the outcome of the instruction (because the operand happens to be null)? It?s a little more than an optimization, though. What would happen if we were to switch the order of steps 2 and 3, so that the class is always loaded? Could we switch the order of checks in the JVM, moving forward from here, so that the Java language compiler can use checkcast to translate inline type casts? Or, does it even matter; why not just translate with the existing instruction even if it does let nulls through? First, the existing behavior is important, to some extent. If we were to switch steps 2 and 3, existing programs would change their behavior during bootstrapping (class loading). Suppose some class X is referred to in a checkcast instruction, and the early behavior of some program executes this instruction before X is loaded. At that point, the only possible operand of the instruction is null (since there are no instances of X around yet). The checkcast instruction will leave X on disk, and the JVM will wait for some other event to trigger X?s loading. In fact, X might not even exist at all; perhaps it?s an optional component that is never dynamically loaded. Java?s dynamic linking model encourages programs to be structured this way (whether or not it?s a good idea in any particular case). Yes, we have static frameworks like the module system, but they co-exist with the original model of Java, which allows loading decisions to be deferred until resolution of a symbolic reference. 
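A tiny illustration of that ordering (the class name is invented; this is a sketch, not anything from the SoV documents):

    // Run with -verbose:class: Lazy should never appear in the load log,
    // because the null operand finishes the checkcast (step 2) before
    // resolution of the class (step 3) is ever attempted.
    class Lazy {}                        // stands in for an optional component

    class CastDemo {
        static Object castToLazy(Object x) {
            return (Lazy) x;             // translates to a single checkcast
        }
        public static void main(String[] args) {
            System.out.println(castToLazy(null));   // prints "null"; Lazy stays on disk
        }
    }
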
Even if X exists in the application performing an early checkcast to X on a null, it may be incorrect to load X the first time a checkcast instruction refers to it. The cascade of side effects that arise from the resolution of X do not involve running X?s static initializers () but they can involve running various methods on class loaders, including those that may be defined by the application. If X has not been loaded yet, perhaps its eventual class loader is not ready to run, and so a checkcast may cause bootstrap errors in the application, if the semantics of the checkcast are changed by switching steps 2 and 3 above. (BTW, people who work on ahead-of-time compilers for Java sometimes wish they could switch steps 2 and 3, so that they can assume that, by the time a cast on X executes, there is already a predictably stable definition of X in the application. This is one of the many points where Java?s dynamic linking model makes AOT hard.) OK, so applications may fail to configure themselves correctly if we switch steps 2 and 3 above. (Who knew??) What happens if we leave the instruction as it is and also continue to use it to translate Java language casts of inline types? Consider this code: interface Pointable { ? } inline class Point implements Pointable { ? } Point getThePoint(Pointable ref) { return (Point) ref; } The method is never allowed to return null, because null is not part of the value set of Point. But if the method were implemented using a single checkcast instruction, then nulls would be allowed to leak out, since checkcast must pass nulls without complaint. This would be a simple case of a more general problem we might call ?null pollution?, when the JVM is presented with the possibility of a null value where it is expecting an instance of an inline class. Using null-permissive checkcast instructions to implement cast expressions which are null-rejecting would allow polluting nulls to travel to places where they should not be allowed. Even if the language allows null pollution in some places (which is not the case here), the polluting nulls are likely to have a performance cost, since the JVM must somehow track the null-ness of a value when it would prefer just to break it up into its fields. I think this example (and also more subtle ones) proves that the static compiler needs to translate casts to inline classes differently from casts to regular identity classes. (As soon as we admit that we gate translation on static properties of classes, the problem of binary compatibility arises. If classfile W was compiled in 1998 with a checkcast to some X which was an identity class, and in 2031 X migrates to an inline class, then W must execute, in some sense, without recompiling. If W were recompiled then the checkcast to X would be null-rejecting, but as originally compiled in 1998 it is null-restrictive. Both behaviors must be somehow compatible with the overall contracts of binary compatibility. I think that our recent revisions of translation strategies make this work out pretty well, since X is likely to be migrated to the reference projection of its inline class, so null pollution will not be a problem.) I have a proposal for a translation strategy: 1. Translate casts to inline classes differently from ?classic? casts. Add an extra step of null hostility. For very low-level reasons, I suggest using ?ldc X? followed by Class::cast. 
Generally speaking, it?s a reasonable move to use reflective API points (like Class::cast) on constant metadata (like X.class) to implement language semantics. The following alternatives are also possible; I present them in decreasing order of preference: 2. Use invokedynamic to roll our own instruction. It will be a trivial BSM since we are really just doing an asType operation. But I think this is probably overkill, despite my fondness for indy. 3. Translate to Object::getClass followed by pop. JVMs are likely to optimize this even better than Class::cast, since getClass is final and also probably an intrinsic. But the first proposal is cleaner, and also has better binary compatibility properties, because it works for both kind of classes. 4. Use Objects::requireNonNull instead of getClass. That?s what users are supposed to say, after all. But JVMs are slightly more likely to optimize getClass. 5. Use test-and-branch bytecodes instead of methods. Please, no; control flow is harder to optimize than simple method calls. In general, every additional bytecode you add to the idioms of a translation strategy slightly reduces the probability that it will be properly optimized by the JIT. All the other options are compact, using 1 or 2 instructions. 6. Consider adding an eager-resolution option in some form to good old checkcast. Basically, allow an annotated instruction which swaps steps 2 and 3 above. We had something like this at one point when we allowed CONSTANT_Class names of the form ?QPoint;?; the special semicolon could be taken to trigger eager loading, or at least prove that nulls are to be rejected. I don?t think this is the right place to put the primitive. ? John From john.r.rose at oracle.com Wed Apr 8 22:46:08 2020 From: john.r.rose at oracle.com (John Rose) Date: Wed, 8 Apr 2020 15:46:08 -0700 Subject: ClassValue performance model and Record::toString Message-ID: This note is prompted by work in a parallel project, Amber, on the implementation record types, but is properly a JVM question about JSR 292 functionality. Since we?ve got a quorum of experts here, and since we briefly raised the topic this morning on a Zoom chat, I?ll raise the question here of ClassValue performance. I?m BCC-ing amber-spec-experts so they know we are takling about this. (In fact the EGs overlap.) JSR 292 introduced ClassValue as a hook for libraries (especially dynamic language implementations) to efficiently store library specific metadata on JVM classes. A general use case envisioned was to store method handles (or tuples of them) on classes, where a lazy link step (tied to the semantics of ClassValue::get) would materialize the required M?s as needed. A specific use case was to be able to create extensible v-table-like structures, where a CV would embody a v-table position, and each CV::get binding would embody a filled slot at that v-table position, for a particular class. The assumption was that dynamic languages using CV would continue to use the JVM?s built-in class mechanism for part or all of their own types, and also that it would be helpful for a dynamic language to adjoin metadata to system classes like java.lang.String. Both tactics have been used in the field. In the future, template classes may provide an even richer substrate for the types of non-Java languages. JSR 292 was envisioned for dynamic languages, but was built according to the inherent capabilities of the JVM, and so eventually (actually, in the next release!) 
it has been used for Java language implementations as well (indy for lambda). ClassValue has not yet been used to implement Java language features, but I believe the time may have come to do so. The general use case I have in mind is an efficient translation strategy for generic algorithms, where the genericity is in the receiver type. The specific use case is the default toString method of records (and also the equals and hashCode methods). The logic of this method is generic over the receiver type. For each record type (unless that record type overrides its toString method in source code), the toString method is defined to iterate over the fields of the record type, and produce a printed representation that mentions both the names and values of the fields. The name of the record?s class is also mentioned. If you ask an intermediate Java coder for an implementation of this spec., you will get something resembling an interpreter which walks over the metadata of ?this.getClass()? and collects the necessary strings into a string builder. If you then deliver this code to users, after about a microsecond you will get complaints about its performance. We?re old hands who don?t fall for such traps, so we asked an experienced coder for better code. That code runs the interpreter-like logic once per distinct record type, collecting the distinct field accesses and folding up the string concatenations into a spongy mass of method handles, depositing the result in a cache. That?s better! (Programming with method handles is, alas, not an improvement over source code. Java hasn?t found its best self yet for doing partial evaluation algorithms, though there is good work out there, like Truffle.) In order not to have bad performance numbers, we are also preconditioning the v-table slot for each record?s toString method, as follows: 0. If the record already has a source-code definition, do nothing special. 1. Otherwise, synthesize a synthetic override method to Object::toString which contains a single indy instruction. (There is also data movement via aload and return.) 2. Set up the indy to run the fancy partial MH-builder mentioned above, the first time, and use the cached MH the second time. 3. Profit. In essence, toString works like a generic algorithm, where the generic type parameter is the receiver type. (If we had template methods we?d have another route to take but not today?) This works great. But there?s a flaw, because it doesn?t use ClassValue. As far as I can tell, it would be better for the translation strategy to *not* generate synthetic methods, but instead to put steps 1. and 2. above into a plain old Java method called Record::toString. This method would call x=this.getClass() and then y=R_TOSTRING.get(x) and then y.invokeExact(this). Non-use of CV is not the flaw, it?s the cause of the flaw. The flaw is apparent if you read the javadoc for Record::toString. It doesn?t say there?s a method there (because there isn?t) but it says weaselly stuff about ?the default method provided does this and that?. In a purely dynamic OOL, the default method is just method bound to Record::toString, and it?s active as long as nobody overrides it (or calls super.toString). People spend years learning to reason about overrides in OOLs like Java, and we should cater to that. We could in this case, but we don?t, because we are pulling a non-OOL trick under the covers, and we have to be honest about it in the Javadoc. 
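For concreteness, here is a rough sketch of the ClassValue-backed body described above. The names (R_TOSTRING, defaultToString) are invented, and the cached handle merely wraps a reflective formatter; a real translation strategy would fold the component getters and the string concatenation into the handle itself.

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.MethodType;
    import java.lang.reflect.RecordComponent;
    import java.util.StringJoiner;

    class RecordToStringSketch {
        // per-class cache; computeValue runs once per record type
        private static final ClassValue<MethodHandle> R_TOSTRING = new ClassValue<>() {
            @Override protected MethodHandle computeValue(Class<?> type) {
                try {
                    // exact type (Object)String, so invokeExact below matches
                    return MethodHandles.lookup().findStatic(
                            RecordToStringSketch.class, "format",
                            MethodType.methodType(String.class, Object.class));
                } catch (ReflectiveOperationException e) {
                    throw new AssertionError(e);
                }
            }
        };

        // the body a plain-old-Java Record::toString might have
        static String defaultToString(Record self) throws Throwable {
            return (String) R_TOSTRING.get(self.getClass()).invokeExact((Object) self);
        }

        // stand-in for the folded-up method-handle formatter
        static String format(Object r) throws ReflectiveOperationException {
            StringJoiner sj = new StringJoiner(", ",
                    r.getClass().getSimpleName() + "[", "]");
            for (RecordComponent c : r.getClass().getRecordComponents()) {
                sj.add(c.getName() + "=" + c.getAccessor().invoke(r));
            }
            return sj.toString();
        }

        record Point(int x, int y) {}

        public static void main(String[] args) throws Throwable {
            System.out.println(defaultToString(new Point(1, 2)));  // Point[x=1, y=2]
        }
    }
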
So there?s a concern with CV (though I don?t think an overriding one) that we don?t get to step 3 and profit, because the lookups of x and y appear to be interpreter-like overheads. Won?t record types suffer in performance by having those extra indirections happen every time toString (or equals or hashCode) is called? (This problem isn?t unique to Records, but Records are an early case of this sort of problem, of the need for link-time optimization of inheritable OO methods. If you look around you might find similar opportunities with interfaces and default methods.) This is where CV has to get up out of its chair and make itself useful. I think the JVM should take three steps, two sooner and the other later, and both without changing any public API points. 1. Encourage the JIT to constant-fold through ClassValue::get. This would fold up the proposed Record::toString method at all points where the type of the receiver record is known to the JIT. (That?s most places.) 2. Ensure that, if the operand to CV::get is not constant, we get good code anyway. (This is already true, probably.) Look for any small optimization cleanups getting through CV::get and on into MH::invokeExact. 3. Later on, consider v-table slot splitting in response to polymorphic methods which perform CV::get on their receiver. In general, v-table slot splitting is the practice of installing differently compiled code in different v-table slots of the same method. It can make sense if the JIT can do different jobs optimizing the same code on different classes of receivers. It?s usually a heroic hand optimization, but can also be done by the JVM. One more item, not directly related to CV?s but related to the above optimizations: 4. We should invest in one or more auto-bridging features in the JVM, where a call site (such as MyRecord::toString) can be rerouted through an intermediate step before it gets to the built-in target mandated by the JVMS (such as Object::toString or Record::toString), and can also be routed somewhere even if the supposed target method doesn?t even exist. Perhaps the target method symbolic reference is Foo::equals(int) and statically matching method is Foo::equals(Object); normally the static compiler puts in an auto-boxing step to fix the descriptor but there are reasons to consider a more dynamic bridging solution. Such a rerouting decision would be very naturally cached in v-table slots, obviating some or all of step 3 above. In the presence of feature #4, we might rewrite Record::toString to (somehow) advertise that it had no regular method body, but that it would be very happy to bridge any and all calls, using some advertised BSM, and decoupling the implementation from ClassValue. This implementation decision could be hidden from the user (and the Javadoc), but only if we did the ClassValue trick today, so we could advertise Record::toString as a regular old object-oriented method (with clever optimizations inside its implementation, natch). So, let?s take ClassValue off the bench, and start warming up Bridge-O-Matic. ? John From john.r.rose at oracle.com Thu Apr 9 05:29:04 2020 From: john.r.rose at oracle.com (John Rose) Date: Wed, 8 Apr 2020 22:29:04 -0700 Subject: access control for withfield bytecode, compared to putfield Message-ID: <5DD7A25B-BBDD-4BFA-BA86-E0BAC92101D6@oracle.com> In the Java language fields can be final or not, and independently can be access controlled at one of four levels of access: public, protected, package, and private. 
Final fields cannot be written to except under very narrow circumstances: (a) In an initialization block (static initializer or constructor body), and (b) only if the static compiler can prove there has been no previous write (based on the rules of the language). We are adding inline classes, whose non-static fields are always final. (There are possible meanings for non-final fields of inline classes, but nothing I?m saying today interacts or interferes with any known such meanings.) Behaviorally, an inline class behaves like a class with all-final non-static fields, *and* it has its identity radically suppressed by the JVM. In the language, a constructor for an inline class is approximately indistinguishable from a constructor for a regular class with all-final non-static fields. In particular, a constructor of any class (inline or regular identity) is empowered, by rules of the the language, to set each of its (final, non-static) fields exactly once along any path through the constructor. All of this hangs together nicely. When we translate to the JVM, the reading of any non-static field always uses the getfield instruction, and the access checks built into the JVM enforce the language access rules for that field?and this is true equally for inline and identity classes (the JVM doesn?t care). However, we have to use distinct tactics for translating assignments to fields. The existing putfield instruction has no possible applicability to inline classes, because it assumes you can pass it an instance pointer, execute it, and the *same instance pointer* will refer to the updated instance. This cannot possibly work with inline classes (unless we add a whole new layer of ?larval? states to inline classes?which would not be thrifty design). Instead, setting the field of an inline class needs a new bytecode , a new sibling of getfield and putfield, which we call withfield. Its output is a new instance of the same inline class whose field values are all identical to those in the old instance, except for the one field referred to by the withfield instruction. Thus: * getfield consumes a reference and returns a value (I) ? (F) * putfield consumes both and returns a side effect (I F) & state ? () & state? * withfield consumes same as putfield and produces a new instance (I F) ? (I?) The access checking rules are fairly uniform for all of these instructions. If the field F of C has protection level P, unless a client has access to level P of C, then it cannot execute (cannot even resolve) the instruction that tries to access F. In the case of putfield or withfield, if F is final (and for withfield that is currently always the case, though that could change), then an additional check is made, to ensure that F is only being set in a legitimate context. More in a moment on what ?legitimate? means for this ?context?. The getfield instruction only has to pass the access check, and then the client has full access to read the value of the field. This works pleasingly like the source-level expression which fetches the field value. Currently, for a non-static final field, both ?putfield? and ?withfield? are generated only inside of constructors, which have rigid rules, in the source language, that ensure nothing too fishy can happen. For an identity class C, it would be extremely fishy if the classfile of C were able to execute putfield instructions outside of one of C?s constructors. 
The reason for this is that a constructor of C would be able to produce a supposedly all-final instance of C, but then some other method of C would be (in principle) be able to overwrite one of C?s supposedly final fields with some other value, by executing a putfield instruction in that other method. Now, the JVM doesn?t fully trust final fields even today (because they change state at most once from default to some other value), but if maliciously spun classfiles were able to perform ?putfield? at will on fully constructed objects, it might be possible to create paradoxes that could lead to unpredictable behavior. For this reason, not only doesn?t the JVM fully trust final fields, but it also forbids classes from executing putfield on their own final fields, except inside of constructors. In essence, putfield on a final field is a special restricted operating mode of putfield which has unusually tight restrictions on its execution. In this note I?d like to call it out with a special name, putfield-on-a-final. Note that the JVM does *not* fully enforce the Java source language rules for field initialization: At the JVM level, a constructor can run putfield-on-a-final, on some given field, zero, one, or many times, where the Java language requires at most one, and exactly one on normal exits. The JVM simply provides a reasonable backstop check, preventing certain failure modes due either to javac bugs or (what?s more sinister) intentionally broken class files. The main responsibility for ensuring the integrity of some class C is, and always will be, C?s compilation unit C.java, as faithfully compiled by javac into a nest of classes containing at least C.class maybe other nestmates. This is an important point to back up and take notice of: While the JVM can perform some basic checks to help some class C maintain its encapsulation boundary, the responsibility for the meaning of the encapsulation, and the restrictions and/or freedoms within that boundary, are the sole responsibility of the programmer of C.java. If I, the author of C, am claiming that, of two fields, one is always non-null, then it is up to me to enforce those rules in all states of my class, including constructors (start states) and any methods which can create new states (whether constructors or regular methods). A working hypothesis on our project so far has been that withfield is so much like putfield, and inline instance fields are so much like final identity instance fields, that parallel restrictions are appropriate for the two instructions. Penciling this out, we would get to a place where a class C can only issue putfield or withfield instructions inside its own constructors. This is a consistent view, but I do not believe that it is the best view, and I?d like to decouple withfield from putfield-on-a-final to be more like plain old putfield, in some ways. My aim here is to keep withfield alive as a tool for likely future translation strategies (including of non-Java languages), which exposes, not the current envisioned uses of withfield in Java constructors, but its natural set of capabilities in the JVM. What is the natural set of capabilities of withfield? It is more basic and fundamental than putfield-on-a-final, and at the same time does *more* than putfield-on-a-final. Note that putfield-on-a-final is just one operation out of a suite of required operations in a constructor of a class (since you need a putfield-on-a-final for each of the class?s final fields, according to Java rules). 
Note on the other hand that withfield has the same effect as running a constructor which copies out all the old fields from the old instance and writes the new value into the selected field, then returns the new instance. Seen from this point of view, withfield is both simpler and more powerful than putfield-on-a-final, and does not fit at all into an easy analogy. The withfield instruction is also inherently more secure than putfield-on-a-final, because its design does not allow it to invalidate any pre-existing instance; it can only ever create a new instance. The set of security failure modes for withfield is completely different from putfield-on-a-final. This means that there is no particular reason to restrict withfield to execute only in constructors. What about creating an *invalid* new instance? Well, that?s where the JVM says, ?it?s not my responsibility?. As noted above, the sole responsibility for defining and enforcing the invariants of an encapsulation is the human author of the original source file. The JVM protects this encapsulation, not by reading the user?s mind, but by enforcing boundaries, primarily the boundary around the nest of classes that result from the compilation of C.java. Within the nest, any type can access any private member of any nestmate. Outside the nest, private members are strictly inaccessible. (This strict rule can be bent by special reflection modes, and by nestmate injection, but it can?t be broken.) Under this theory, the withfield instruction is the elemental factory mechanism for creating new inline classes. The coder of the source file defining the field has full control to create new instances with arbitrary field settings. In the current language, this still goes only through user-written constructors, but that could change. In any case, the JVM design needs to support the language and *also* natural abilities of the JVM. This leads me to what I think is the right design for withfield. The permission to execute withfield should be derived, not from its placement within a constructor, but rather from its placement in a nest. In effect, when you execute withfield, you should get access checked as if the field you were referring to is private, even if it has some other marking (public, protected, package). That other marking is good and useful, but it pertains only to getfield. This doesn?t call for any change to today?s translation strategies, but it unlocks the JVM?s natural abilities for future strategies and features. Why make the change? After all, restricting withfield like putfield-on-a-final doesn?t hurt anything today. Suppose some language feature in the future requires ad hoc field replacement. (I call one version of such a feature ?reconstructors?, and another ?with-expressions?.) In that case, javac can contrive synthetic constructors which isolate all required withfield instructions, so that the putfield-on-a-final constraints can be satisfied. But there?s a cost to this: Those synthetic constructors become extra noise in the classfile, and if they are opened outside the nest, they can be security hazards. Another cost is the loss of dynamicity: You can?t inject a hidden class to work on your inline class if the hidden class can only define its own constructors, right? But I think we have learned some lessons about fancy compile-time adapters: They are complex, they obscure the code for the JIT, they can open up surprise encapsulation flaws, they cannot be assigned dynamically. 
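To make that cost concrete, here is roughly the shape of the synthetic-constructor workaround, written with an ordinary all-final identity class because inline-class and with-expression syntax are still speculative (the extra constructor and the withY name are invented for illustration):

    final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        // synthetic "reconstructor": copy every field from the old instance and
        // replace one, so that all field writes stay inside a constructor
        private Point(Point old, int newY) { this(old.x, newY); }

        Point withY(int newY) { return new Point(this, newY); }
    }

With the more permissive access rule proposed here, withY could instead compile to a single withfield instruction in a plain (non-constructor) method, and the synthetic constructor, along with its exposure risk, goes away.
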
The nestmate work improves all of these problems, by uniformly defining private access to apply equally to all members of a nest, not just to a single class. Although the nestmate access rules themselves are more complex than the original JVM rules for private access, the overall system is better because we can rip out the various synthetic bridges we used to require. The overall model for ?what does private mean?? is simpler, not more complex: ?private means all nestmates are equal?. On balance this helps security by simplifying the model, so that bridge methods can be dropped. I want to keep the model simple, and not introduce (today) a new kind of access control just for the withfield instruction, nor do I want it to mimic the baroque and complex access control for putfield-on-a-final. To summarize: The simplest rule for access checking a withfield instruction is to say, ?pretend the field was declared private, and perform access checks?. That?s it; the rest follows from the rules we have already laid down. Thus, the security analysis of a class can concentrate on the access declarations of its fields. There will be no pressure to generate adapter methods regardless of where the language goes. Other languages can use the natural semantics of ?withfield? to create and enforce their own notions of encapsulation. And future versions of Java can use indy, condy, hidden classes, and whatever else to create flexible methods, on the fly, that work with inline classes. There are two anchors to my argument here. One is that the access control of putfield-on-a-final is a bad model to replicate for a new instruction. The other is that we shouldn?t limit ourselves to the current uses of withfield (as a surrogate for putfield-on-a-final). Let?s design for the future, or at least for the natural capabilities of the JVM, not for the exact output of today?s translation strategies. ? John From frederic.parain at oracle.com Thu Apr 9 15:05:32 2020 From: frederic.parain at oracle.com (Frederic Parain) Date: Thu, 9 Apr 2020 11:05:32 -0400 Subject: IdentityObject and InlineObject In-Reply-To: References: Message-ID: <8F05BA23-5C3E-47EB-859E-2D1102CEA2C9@oracle.com> My two cents on this topic: Looking back at our previous models, the ?identity? marker and the ?inline? marker served two different and well identified purposes. The ?identity? marker signals that identity sensitive operations are allowed on a given type. This is a marker we absolutely need somehow (the alternative to surround all these operations with try { ? } catch(IMSE e) ? looks like a no-go). The use case for an inline marker was to have the guarantee that a given type can never be null, allowing developers to skip null checks in their code and write null-free algorithms. In the old model, it was achieved by accepting only Q-types which were guaranteed to be null-free. But in the new model, this null-freeness cannot be guaranteed by the InlineObject interface, because it is ? an interface! There's no plan to make all sub-types of InlineObject (including interfaces and abstract classes) null-free, and it sounds like a terrible idea to try to enforce this kind of rules. Bottom line: I think we've already lost the benefits of having an ?inline type? marker, so removing the ?InlineObject? interface would simplify the model and would not have a negative impact on what developers can do with the new user model. 
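To make the first point concrete, this is the kind of up-front guard the identity marker enables, instead of wrapping the operation in the try/catch(IMSE) alternative mentioned above (speculative sketch: IdentityObject is the proposed marker interface and is not in any shipped JDK):

    static void withLockIfIdentity(Object o, Runnable action) {
        if (o instanceof IdentityObject) {
            synchronized (o) { action.run(); }
        } else {
            action.run();                // no identity, so no monitor to acquire
        }
    }
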
Fred > On Apr 8, 2020, at 12:54, Brian Goetz wrote: > > This is a good time to review the motivations for Identity/InlineObject, and see if there's anything we want to tweak about it. > > There are two main branches of the motivation here: pedagogical and functional. > > Pedagogically, we're asking users to amend their view of "object" to allow for some objects to have identity, and some objects to not have identity, and supertypes are one of the prime ways we capture these sorts of divisions. It's not an accident that we have both the informal statement "everything (except primitives) is an object" and the hierarchy constraint "all classes extend (implicitly or not) java.lang.Object". Not only is Object a place to hang the behavior that is common to all objects (equality, etc), its position at the root of the hierarchy sends a message that conditions how we think about objects. The intent of partitioning Object into IdentityObject and InlineObject is an attempt to capture the same. > > Functionally, there are operations that apply only to identity objects, such as identity equality, synchronization, Object::wait, etc. Some of these have been totalized appropriately (such as `==`); others are partial. Having synchronization be partial, without offering developers a way to express "if I tried to synchronize on this thing, would it throw", just makes Java less reliable, so we want a way to express identity both in the dynamic type system (`instanceof IdentityObject`) and the static type system (``). > > We also thought, at one point in time, that InlineObject and IdentityObject would be a sensible place to put new methods or default implementations of Object methods. However, as the design has evolved, the need for this has gone away. This opens the door to a new possibility, which I'd like to evaluate: just have one of them. (And, if we only have one, the move is forced: IdentityObject.) > > In this world, we'd just have IdentityObject, which would be viewed as a refinement of Object -- "Object, with identity". Identity classes would implement it implicitly, as today. The pedagogy would then be, instead of "there are two disjoint kinds of Object", be "Some objects are enhanced with identity." You'd still be able to say > > x instanceof IdentityObject > > and > > void foo(IdentityObject o) { ... } > > and > > class Foo { ... } > > as a way of detecting the refinement, but not the opposite. So the questions are: > > - Pedagogically, does this help move users to the right mental model, or does the symmetric model do a better job? > - Functionally, is there anything we might do with InlineObject, that we would miss? > > Secondarily: > > - If we take this step, is `IdentityObject` still the best name? > > From brian.goetz at oracle.com Thu Apr 9 19:40:45 2020 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 9 Apr 2020 15:40:45 -0400 Subject: IdentityObject and InlineObject In-Reply-To: References: Message-ID: <2ac984c6-3691-13c6-7361-d8283f92e19b@oracle.com> > > Secondarily: > > ?- If we take this step, is `IdentityObject` still the best name? > > Possible candidates: ?- Identity, as in "class Foo implements Identity".? This says what it means, though it is quite possible that it will clash with existing types.? (This is not an absolute disqualifier, but it is a consideration.) ?- IdentityObject.? This is where we are now; it always felt a little clunky to me. ?- ObjectIdentity ("class Foo implements ObjectIdentity").? Better than IdentityObject, less likely to clash than Identity. 
?- WithIdentity.? Not the best name, but less likely than Identity to clash. Others? From john.r.rose at oracle.com Thu Apr 9 20:02:44 2020 From: john.r.rose at oracle.com (John Rose) Date: Thu, 9 Apr 2020 13:02:44 -0700 Subject: null checks vs. class resolution, and translation strategy for casts In-Reply-To: References: Message-ID: Correction? The recommended reflective approach has a flaw (easily fixed), which makes indy my real recommendation. On Apr 8, 2020, at 11:43 AM, John Rose wrote: > ? > I have a proposal for a translation strategy: > > 1. Translate casts to inline classes differently from ?classic? > casts. Add an extra step of null hostility. For very low-level > reasons, I suggest using ?ldc X? followed by Class::cast. > > Generally speaking, it?s a reasonable move to use reflective > API points (like Class::cast) on constant metadata (like X.class) > to implement language semantics. This suggestion is incomplete. If the result of the cast is going to be used as type X, then the verifier must be pacified by adding `checkcast X`. Basically, you have to do both reflective and intrinsic cast operations, if you need to get the verifier on board, as well as do a null check. That tips me over to recommending indy instead, which was #2. Indy, that Swiss army knife of an instruction, can get it done in one. > The following alternatives are also possible; I present them > in decreasing order of preference: > > 2. Use invokedynamic to roll our own instruction. It will > be a trivial BSM since we are really just doing an asType > operation. But I think this is probably overkill, despite > my fondness for indy. For a conversion to type X, where X may be a null-hostile inline type (or any type whose semantics is not exactly covered by native checkcast), a single invokedynamic instruction will cover the operational semantics required and will also feed the right type to the verifier. It will have this signature: (Object) => X It will have a utility bootstrap method which materializes conversions, basically riffing on MethodHandles::identity and asType. (Not MethodHandles::explicitCastArguments, because we are concerned with checked reference conversions.) It will have *no extra arguments* (not even X.class), because the BSM can easily derive X.class from the return type of the method type signature passed to the BSM. ConstantCallSite convertBSM(Lookup ig1, String ig2, MethodType mt) { var mh = MethodHandles.identity(Object.class).asType(mt); return new ConstantCallSite(mh); } As such, it is a candidate for proposed simplifications to bootstrap method configuration (but not the simplest such simplifications, because of the need to feed X.class into the linkage logic). MethodHandle simplifiedConvertBSM() { return MethodHandles.identity(Object.class); } (At some point I should write up those simplifications, shouldn?t I?) ? John From brian.goetz at oracle.com Thu Apr 9 20:03:39 2020 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 9 Apr 2020 16:03:39 -0400 Subject: null checks vs. class resolution, and translation strategy for casts In-Reply-To: References: Message-ID: <14f2102c-c742-d407-3df1-563441fd7071@oracle.com> > I have a proposal for a translation strategy: Casts to inline classes from their reference projections will be frequent.? Because the reference projection is sealed to permit only the value projection, a cast is morally equivalent to a null check. We want to preserve this performance model, because otherwise we're reinventing boxing. 
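Concretely (speculative Valhalla syntax, not compilable today, where Point is an inline class and Point.ref its reference projection):

    Point fromRef(Point.ref pr) {
        // Point.ref is sealed to permit only Point, so the dynamic part of this
        // cast reduces to a null check; it should cost no more than
        // Objects.requireNonNull(pr).
        return (Point) pr;
    }
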
Going through `ldc X.class / invokevirtual Class.cast` will surely be slow in the interpreter, but also risks being slow elsewhere (as do many of the other options.) So let me add to your list: is it time for a `checknonnull` bytecode, which throws NPE if null, or some other more flexible checking bytecode?? (Alternatively, if we're saving bytecodes: `invokevirtual Object.`), where is a fake method that always links to a no-op, but invokevirtual NPEs on a null receiver.) From amalloy at google.com Thu Apr 9 20:14:54 2020 From: amalloy at google.com (Alan Malloy) Date: Thu, 9 Apr 2020 13:14:54 -0700 Subject: access control for withfield bytecode, compared to putfield In-Reply-To: <5DD7A25B-BBDD-4BFA-BA86-E0BAC92101D6@oracle.com> References: <5DD7A25B-BBDD-4BFA-BA86-E0BAC92101D6@oracle.com> Message-ID: This makes a lot of sense to me, John. withfield seems like a good primitive operation to "edit" inline objects, and is the kind of thing other JVM languages will surely want. It seems like a good idea to make it more permissive than putfield-on-a-final, even if the Java language doesn't (yet?) have a use for that greater flexibility. On Wed, Apr 8, 2020 at 10:30 PM John Rose wrote: > In the Java language fields can be final or not, and independently > can be access controlled at one of four levels of access: public, > protected, package, and private. > > Final fields cannot be written to except under very narrow > circumstances: (a) In an initialization block (static initializer > or constructor body), and (b) only if the static compiler can > prove there has been no previous write (based on the rules > of the language). > > We are adding inline classes, whose non-static fields are always > final. (There are possible meanings for non-final fields of inline > classes, but nothing I?m saying today interacts or interferes > with any known such meanings.) Behaviorally, an inline class > behaves like a class with all-final non-static fields, *and* it has > its identity radically suppressed by the JVM. In the language, > a constructor for an inline class is approximately indistinguishable > from a constructor for a regular class with all-final non-static fields. > In particular, a constructor of any class (inline or regular identity) > is empowered, by rules of the the language, to set each of its (final, > non-static) fields exactly once along any path through the constructor. > > All of this hangs together nicely. When we translate to the JVM, > the reading of any non-static field always uses the getfield instruction, > and the access checks built into the JVM enforce the language access > rules for that field?and this is true equally for inline and identity > classes (the JVM doesn?t care). However, we have to use distinct > tactics for translating assignments to fields. The existing putfield > instruction has no possible applicability to inline classes, because > it assumes you can pass it an instance pointer, execute it, and the > *same instance pointer* will refer to the updated instance. This > cannot possibly work with inline classes (unless we add a whole > new layer of ?larval? states to inline classes?which would not be > thrifty design). > > Instead, setting the field of an inline class needs a new bytecode , > a new sibling of getfield and putfield, which we call withfield. > Its output is a new instance of the same inline class whose > field values are all identical to those in the old instance, except > for the one field referred to by the withfield instruction. 
Thus: > > * getfield consumes a reference and returns a value (I) ? (F) > * putfield consumes both and returns a side effect (I F) & state ? () & > state? > * withfield consumes same as putfield and produces a new instance (I F) ? > (I?) > > The access checking rules are fairly uniform for all of these > instructions. If the field F of C has protection level P, unless a client > has access to level P of C, then it cannot execute (cannot even resolve) > the instruction that tries to access F. In the case of putfield or > withfield, if F is final (and for withfield that is currently always > the case, though that could change), then an additional check > is made, to ensure that F is only being set in a legitimate context. > More in a moment on what ?legitimate? means for this ?context?. > The getfield instruction only has to pass the access check, and then > the client has full access to read the value of the field. This works > pleasingly like the source-level expression which fetches the field > value. > > Currently, for a non-static final field, both ?putfield? and ?withfield? > are generated only inside of constructors, which have rigid rules, > in the source language, that ensure nothing too fishy can happen. > > For an identity class C, it would be extremely fishy if the classfile of > C were able to execute putfield instructions outside of one of C?s > constructors. The reason for this is that a constructor of C would > be able to produce a supposedly all-final instance of C, but then > some other method of C would be (in principle) be able to overwrite > one of C?s supposedly final fields with some other value, by executing > a putfield instruction in that other method. Now, the JVM doesn?t > fully trust final fields even today (because they change state at most > once from default to some other value), but if maliciously spun > classfiles were able to perform ?putfield? at will on fully constructed > objects, it might be possible to create paradoxes that could lead > to unpredictable behavior. For this reason, not only doesn?t the > JVM fully trust final fields, but it also forbids classes from executing > putfield on their own final fields, except inside of constructors. > In essence, putfield on a final field is a special restricted operating > mode of putfield which has unusually tight restrictions on its > execution. In this note I?d like to call it out with a special name, > putfield-on-a-final. > > Note that the JVM does *not* fully enforce the Java source language > rules for field initialization: At the JVM level, a constructor can > run putfield-on-a-final, on some given field, zero, one, or many > times, where the Java language requires at most one, and exactly > one on normal exits. The JVM simply provides a reasonable backstop > check, preventing certain failure modes due either to javac bugs > or (what?s more sinister) intentionally broken class files. > > The main responsibility for ensuring the integrity of some class > C is, and always will be, C?s compilation unit C.java, as faithfully > compiled by javac into a nest of classes containing at least C.class > maybe other nestmates. > > This is an important point to back up and take notice of: While > the JVM can perform some basic checks to help some class C maintain > its encapsulation boundary, the responsibility for the meaning > of the encapsulation, and the restrictions and/or freedoms within > that boundary, are the sole responsibility of the programmer of > C.java. 
If I, the author of C, am claiming that, of two fields, one > is always non-null, then it is up to me to enforce those rules in all > states of my class, including constructors (start states) and any methods > which can create new states (whether constructors or regular methods). > > A working hypothesis on our project so far has been that withfield > is so much like putfield, and inline instance fields are so much like > final identity instance fields, that parallel restrictions are appropriate > for the two instructions. Penciling this out, we would get to a place > where a class C can only issue putfield or withfield instructions inside > its own constructors. This is a consistent view, but I do not believe > that it is the best view, and I?d like to decouple withfield from > putfield-on-a-final to be more like plain old putfield, in some ways. > > My aim here is to keep withfield alive as a tool for likely future > translation strategies (including of non-Java languages), which > exposes, not the current envisioned uses of withfield in Java > constructors, but its natural set of capabilities in the JVM. > > What is the natural set of capabilities of withfield? It is more > basic and fundamental than putfield-on-a-final, and at the > same time does *more* than putfield-on-a-final. Note that > putfield-on-a-final is just one operation out of a suite of > required operations in a constructor of a class (since you > need a putfield-on-a-final for each of the class?s final fields, > according to Java rules). Note on the other hand that > withfield has the same effect as running a constructor > which copies out all the old fields from the old instance > and writes the new value into the selected field, then > returns the new instance. Seen from this point of view, > withfield is both simpler and more powerful than > putfield-on-a-final, and does not fit at all into an easy > analogy. > > The withfield instruction is also inherently more secure > than putfield-on-a-final, because its design does not allow > it to invalidate any pre-existing instance; it can only ever > create a new instance. The set of security failure modes > for withfield is completely different from putfield-on-a-final. > This means that there is no particular reason to restrict > withfield to execute only in constructors. > > What about creating an *invalid* new instance? Well, that?s > where the JVM says, ?it?s not my responsibility?. As noted > above, the sole responsibility for defining and enforcing the > invariants of an encapsulation is the human author of the > original source file. The JVM protects this encapsulation, > not by reading the user?s mind, but by enforcing boundaries, > primarily the boundary around the nest of classes that result > from the compilation of C.java. Within the nest, any type > can access any private member of any nestmate. Outside > the nest, private members are strictly inaccessible. > (This strict rule can be bent by special reflection modes, > and by nestmate injection, but it can?t be broken.) > > Under this theory, the withfield instruction is the elemental > factory mechanism for creating new inline classes. The coder > of the source file defining the field has full control to create > new instances with arbitrary field settings. In the current > language, this still goes only through user-written constructors, > but that could change. In any case, the JVM design needs to > support the language and *also* natural abilities of the JVM. 
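A rough source-level picture of the quoted description may help here: a withfield on one field computes what a copying constructor would. Since inline classes cannot yet be written in released Java, an ordinary identity class with all-final fields stands in for one below, and the names Pt and withX are purely illustrative. The two properties called out above are visible: every field of the old instance is copied with the selected one replaced, and the original instance is never modified.

final class Pt {
    final int x;
    final int y;
    Pt(int x, int y) { this.x = x; this.y = y; }

    // Roughly what a `withfield x` computes: copy all fields of the old
    // instance, substitute the new value for x, return the new instance.
    Pt withX(int newX) {
        return new Pt(newX, this.y);
    }
}

class WithfieldSketch {
    public static void main(String[] args) {
        Pt p1 = new Pt(1, 2);
        Pt p2 = p1.withX(5);
        // p1 is untouched (1, 2); p2 is a fresh instance (5, 2): the operation
        // can only ever create a new instance, never invalidate an old one.
        System.out.println(p1.x + "," + p1.y + " -> " + p2.x + "," + p2.y);
    }
}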
> > This leads me to what I think is the right design for > withfield. The permission to execute withfield should > be derived, not from its placement within a constructor, > but rather from its placement in a nest. In effect, when > you execute withfield, you should get access checked as if > the field you were referring to is private, even if it has > some other marking (public, protected, package). That > other marking is good and useful, but it pertains only > to getfield. > > This doesn?t call for any change to today?s translation > strategies, but it unlocks the JVM?s natural abilities > for future strategies and features. > > Why make the change? After all, restricting withfield > like putfield-on-a-final doesn?t hurt anything today. > Suppose some language feature in the future requires ad > hoc field replacement. (I call one version of such a feature > ?reconstructors?, and another ?with-expressions?.) > In that case, javac can contrive synthetic constructors > which isolate all required withfield instructions, so > that the putfield-on-a-final constraints can be satisfied. > But there?s a cost to this: Those synthetic constructors > become extra noise in the classfile, and if they are opened > outside the nest, they can be security hazards. Another > cost is the loss of dynamicity: You can?t inject a hidden > class to work on your inline class if the hidden class can > only define its own constructors, right? > > But I think we have learned some lessons about fancy > compile-time adapters: They are complex, they obscure > the code for the JIT, they can open up surprise encapsulation > flaws, they cannot be assigned dynamically. The nestmate > work improves all of these problems, by uniformly defining > private access to apply equally to all members of a nest, > not just to a single class. Although the nestmate access > rules themselves are more complex than the original JVM > rules for private access, the overall system is better because > we can rip out the various synthetic bridges we used to > require. The overall model for ?what does private mean?? > is simpler, not more complex: ?private means all nestmates > are equal?. On balance this helps security by simplifying > the model, so that bridge methods can be dropped. > > I want to keep the model simple, and not introduce (today) > a new kind of access control just for the withfield instruction, > nor do I want it to mimic the baroque and complex access > control for putfield-on-a-final. > > To summarize: The simplest rule for access checking a > withfield instruction is to say, ?pretend the field was > declared private, and perform access checks?. That?s > it; the rest follows from the rules we have already laid > down. > > Thus, the security analysis of a class can concentrate > on the access declarations of its fields. There will be > no pressure to generate adapter methods regardless > of where the language goes. Other languages can > use the natural semantics of ?withfield? to create > and enforce their own notions of encapsulation. > And future versions of Java can use indy, condy, > hidden classes, and whatever else to create flexible > methods, on the fly, that work with inline classes. > > There are two anchors to my argument here. > One is that the access control of putfield-on-a-final > is a bad model to replicate for a new instruction. > The other is that we shouldn?t limit ourselves to > the current uses of withfield (as a surrogate for > putfield-on-a-final). 
Let?s design for the future, > or at least for the natural capabilities of the JVM, > not for the exact output of today?s translation > strategies. > > ? John From john.r.rose at oracle.com Thu Apr 9 20:16:06 2020 From: john.r.rose at oracle.com (John Rose) Date: Thu, 9 Apr 2020 13:16:06 -0700 Subject: null checks vs. class resolution, and translation strategy for casts In-Reply-To: <14f2102c-c742-d407-3df1-563441fd7071@oracle.com> References: <14f2102c-c742-d407-3df1-563441fd7071@oracle.com> Message-ID: <56DEB1D5-9A78-4551-B599-5514D1A98C77@oracle.com> On Apr 9, 2020, at 1:03 PM, Brian Goetz wrote: > > >> I have a proposal for a translation strategy: > > Casts to inline classes from their reference projections will be frequent. Because the reference projection is sealed to permit only the value projection, a cast is morally equivalent to a null check. We want to preserve this performance model, because otherwise we're reinventing boxing. > > Going through `ldc X.class / invokevirtual Class.cast` will surely be slow in the interpreter, but also risks being slow elsewhere (as do many of the other options.) > > So let me add to your list: is it time for a `checknonnull` bytecode, which throws NPE if null, or some other more flexible checking bytecode? (Alternatively, if we're saving bytecodes: `invokevirtual Object.`), where is a fake method that always links to a no-op, but invokevirtual NPEs on a null receiver.) Um, this feels a lot like a premature optimization. Let?s not add `checknonnull` intrinsics to the interpreter (the very most expensive way to do it) until we have tried the other alternatives (Objects.requireNonNull, etc.) and have proven that the costs are noticeable. And a spec EG is not the place to evaluate such questions; it has to be demonstrated in a prototype. I see now why you are angling for verifier rules that know about sealing relations. I think that also is premature optimizations. Actually, verifier rules (not interpreter bytecodes) are the most costly way to get anything done. Sorry to be a party pooper here, but that?s how it looks right now. ? John From john.r.rose at oracle.com Thu Apr 9 20:20:17 2020 From: john.r.rose at oracle.com (John Rose) Date: Thu, 9 Apr 2020 13:20:17 -0700 Subject: null checks vs. class resolution, and translation strategy for casts In-Reply-To: <56DEB1D5-9A78-4551-B599-5514D1A98C77@oracle.com> References: <14f2102c-c742-d407-3df1-563441fd7071@oracle.com> <56DEB1D5-9A78-4551-B599-5514D1A98C77@oracle.com> Message-ID: <315745F0-9885-406F-84D0-D210B9250DC0@oracle.com> On Apr 9, 2020, at 1:16 PM, John Rose wrote: > > On Apr 9, 2020, at 1:03 PM, Brian Goetz wrote: >> >> >>> I have a proposal for a translation strategy: >> >> Casts to inline classes from their reference projections will be frequent. Because the reference projection is sealed to permit only the value projection, a cast is morally equivalent to a null check. We want to preserve this performance model, because otherwise we're reinventing boxing. >> >> Going through `ldc X.class / invokevirtual Class.cast` will surely be slow in the interpreter, but also risks being slow elsewhere (as do many of the other options.) >> >> So let me add to your list: is it time for a `checknonnull` bytecode, which throws NPE if null, or some other more flexible checking bytecode? (Alternatively, if we're saving bytecodes: `invokevirtual Object.`), where is a fake method that always links to a no-op, but invokevirtual NPEs on a null receiver.) > > Um, this feels a lot like a premature optimization. 
Let?s not add > `checknonnull` intrinsics to the interpreter (the very most > expensive way to do it) until we have tried the other alternatives > (Objects.requireNonNull, etc.) and have proven that the costs > are noticeable. And a spec EG is not the place to evaluate such > questions; it has to be demonstrated in a prototype. > > I see now why you are angling for verifier rules that know about > sealing relations. I think that also is premature optimizations. > Actually, verifier rules (not interpreter bytecodes) are the most > costly way to get anything done. > > Sorry to be a party pooper here, but that?s how it looks right now. > > ? John P.S. The Object. idea is clever, and we have done things like that in the past; the interpreter has special fast entry points for certain math functions. These were added due to certain benchmarks being slow >20 years ago; who knows if they are still relevant. We could do the same for Objects.requireNonNull; that would be a less intrusive (more sneaky) version of Object.. No specs were harmed in making this proposal. From john.r.rose at oracle.com Thu Apr 9 20:26:35 2020 From: john.r.rose at oracle.com (John Rose) Date: Thu, 9 Apr 2020 13:26:35 -0700 Subject: null checks vs. class resolution, and translation strategy for casts In-Reply-To: <315745F0-9885-406F-84D0-D210B9250DC0@oracle.com> References: <14f2102c-c742-d407-3df1-563441fd7071@oracle.com> <56DEB1D5-9A78-4551-B599-5514D1A98C77@oracle.com> <315745F0-9885-406F-84D0-D210B9250DC0@oracle.com> Message-ID: <3604237A-23EB-4D89-B8F1-DDF40E977B7E@oracle.com> On Apr 9, 2020, at 1:20 PM, John Rose wrote: > > No specs were harmed in making this proposal. P.P.S. Although there?s no precedent yet for it except static code rewriters, we could also intrinsify certain indy instructions in the same way, as early as the interpreter. Then we?d have customized verifier rules, based on each indy instruction signature, at no runtime cost, even at startup, thanks to the intrinsification logic. There are lots of ways to skin this? orange. From forax at univ-mlv.fr Thu Apr 9 20:58:57 2020 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 9 Apr 2020 22:58:57 +0200 (CEST) Subject: null checks vs. class resolution, and translation strategy for casts In-Reply-To: <56DEB1D5-9A78-4551-B599-5514D1A98C77@oracle.com> References: <14f2102c-c742-d407-3df1-563441fd7071@oracle.com> <56DEB1D5-9A78-4551-B599-5514D1A98C77@oracle.com> Message-ID: <547544228.508489.1586465937908.JavaMail.zimbra@u-pem.fr> I don't fully understand why not using checkcast because from the user POV the message will be better than just NPE. R?mi ----- Mail original ----- > De: "John Rose" > ?: "Brian Goetz" > Cc: "valhalla-spec-experts" > Envoy?: Jeudi 9 Avril 2020 22:16:06 > Objet: Re: null checks vs. class resolution, and translation strategy for casts > On Apr 9, 2020, at 1:03 PM, Brian Goetz wrote: >> >> >>> I have a proposal for a translation strategy: >> >> Casts to inline classes from their reference projections will be frequent. >> Because the reference projection is sealed to permit only the value >> projection, a cast is morally equivalent to a null check. We want to preserve >> this performance model, because otherwise we're reinventing boxing. >> >> Going through `ldc X.class / invokevirtual Class.cast` will surely be slow in >> the interpreter, but also risks being slow elsewhere (as do many of the other >> options.) 
>> >> So let me add to your list: is it time for a `checknonnull` bytecode, which >> throws NPE if null, or some other more flexible checking bytecode? >> (Alternatively, if we're saving bytecodes: `invokevirtual >> Object.`), where is a fake method that always links to a >> no-op, but invokevirtual NPEs on a null receiver.) > > Um, this feels a lot like a premature optimization. Let?s not add > `checknonnull` intrinsics to the interpreter (the very most > expensive way to do it) until we have tried the other alternatives > (Objects.requireNonNull, etc.) and have proven that the costs > are noticeable. And a spec EG is not the place to evaluate such > questions; it has to be demonstrated in a prototype. > > I see now why you are angling for verifier rules that know about > sealing relations. I think that also is premature optimizations. > Actually, verifier rules (not interpreter bytecodes) are the most > costly way to get anything done. > > Sorry to be a party pooper here, but that?s how it looks right now. > > ? John From john.r.rose at oracle.com Thu Apr 9 21:07:10 2020 From: john.r.rose at oracle.com (John Rose) Date: Thu, 9 Apr 2020 14:07:10 -0700 Subject: null checks vs. class resolution, and translation strategy for casts In-Reply-To: <547544228.508489.1586465937908.JavaMail.zimbra@u-pem.fr> References: <14f2102c-c742-d407-3df1-563441fd7071@oracle.com> <56DEB1D5-9A78-4551-B599-5514D1A98C77@oracle.com> <547544228.508489.1586465937908.JavaMail.zimbra@u-pem.fr> Message-ID: <7A4F8AC2-CCA7-457E-AD30-1EAC98281E60@oracle.com> On Apr 9, 2020, at 1:58 PM, Remi Forax wrote: > > I don't fully understand why not using checkcast because from the user POV the message will be better than just NPE. When today I try unbox an Integer but encounter a null, my experience is ?NPE? rather than a more informative ?failed to unbox this int because you handed me a null?. I agree, generally speaking, that messages should be more informative. But surely the bar is low here, and ?NPE? is not out of bounds? That said, an indy-based solution has full information about the use site, and can be coaxed to generate whatever user experience we desire. And so would intrinsic based solutions, of which I am favorable to some but not all. Perhaps we want another (intrinsically optimized) version of Objects::requireNonNull, which takes a second argument that assists in generating a better diagnostic. ? John From john.r.rose at oracle.com Thu Apr 9 21:10:51 2020 From: john.r.rose at oracle.com (John Rose) Date: Thu, 9 Apr 2020 14:10:51 -0700 Subject: null checks vs. class resolution, and translation strategy for casts In-Reply-To: <7A4F8AC2-CCA7-457E-AD30-1EAC98281E60@oracle.com> References: <14f2102c-c742-d407-3df1-563441fd7071@oracle.com> <56DEB1D5-9A78-4551-B599-5514D1A98C77@oracle.com> <547544228.508489.1586465937908.JavaMail.zimbra@u-pem.fr> <7A4F8AC2-CCA7-457E-AD30-1EAC98281E60@oracle.com> Message-ID: <74A3F318-6832-4019-B930-6D1CC3074BF5@oracle.com> On Apr 9, 2020, at 2:07 PM, John Rose wrote: > > Perhaps we want another (intrinsically optimized) version > of Objects::requireNonNull, which takes a second argument > that assists in generating a better diagnostic. (D?oh; there it stands in the the JDK already.) From forax at univ-mlv.fr Thu Apr 9 21:31:07 2020 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Thu, 9 Apr 2020 23:31:07 +0200 (CEST) Subject: null checks vs. 
class resolution, and translation strategy for casts In-Reply-To: <7A4F8AC2-CCA7-457E-AD30-1EAC98281E60@oracle.com> References: <14f2102c-c742-d407-3df1-563441fd7071@oracle.com> <56DEB1D5-9A78-4551-B599-5514D1A98C77@oracle.com> <547544228.508489.1586465937908.JavaMail.zimbra@u-pem.fr> <7A4F8AC2-CCA7-457E-AD30-1EAC98281E60@oracle.com> Message-ID: <2088247380.511772.1586467867914.JavaMail.zimbra@u-pem.fr> > De: "John Rose" > ?: "Remi Forax" > Cc: "Brian Goetz" , "valhalla-spec-experts" > > Envoy?: Jeudi 9 Avril 2020 23:07:10 > Objet: Re: null checks vs. class resolution, and translation strategy for casts > On Apr 9, 2020, at 1:58 PM, Remi Forax < [ mailto:forax at univ-mlv.fr | > forax at univ-mlv.fr ] > wrote: >> I don't fully understand why not using checkcast because from the user POV the >> message will be better than just NPE. > When today I try unbox an Integer but encounter a null, > my experience is ?NPE? rather than a more informative > ?failed to unbox this int because you handed me a null?. > I agree, generally speaking, that messages should be more > informative. But surely the bar is low here, and ?NPE? is > not out of bounds? Not if JEP 358 is enable, you have a NPE on Integer.intValue(). Anyway, it's not because error message can be improved in an area that it should not be there in another area. Users on the Java side are writing a cast, so having a class cast exception because the value is null seems to be a good error message. > That said, an indy-based solution has full information > about the use site, and can be coaxed to generate whatever > user experience we desire. And so would intrinsic based > solutions, of which I am favorable to some but not all. > Perhaps we want another (intrinsically optimized) version > of Objects::requireNonNull, which takes a second argument > that assists in generating a better diagnostic. yes, indy is a way to create any new bytecode, but it also has some restrictions, the major one being that you can not using it before it has been bootstrapped. > ? John R?mi From john.r.rose at oracle.com Thu Apr 9 21:38:42 2020 From: john.r.rose at oracle.com (John Rose) Date: Thu, 9 Apr 2020 14:38:42 -0700 Subject: IdentityObject and InlineObject In-Reply-To: References: Message-ID: <8A18DA74-231C-4E2D-8260-54848770A074@oracle.com> On Apr 8, 2020, at 9:54 AM, Brian Goetz wrote: > > This is a good time to review the motivations for Identity/InlineObject, and see if there's anything we want to tweak about it. > > There are two main branches of the motivation here: pedagogical and functional. > > Pedagogically, we're asking users to amend their view of "object" to allow for some objects to have identity, and some objects to not have identity, and supertypes are one of the prime ways we capture these sorts of divisions. It's not an accident that we have both the informal statement "everything (except primitives) is an object" and the hierarchy constraint "all classes extend (implicitly or not) java.lang.Object". Not only is Object a place to hang the behavior that is common to all objects (equality, etc), its position at the root of the hierarchy sends a message that conditions how we think about objects. The intent of partitioning Object into IdentityObject and InlineObject is an attempt to capture the same. (hear hear!) These days I visualize an object?s ?identity stuff? as a unique (number-like) token that is (in some fuzzy sense) stored in the object?s header, and participates in all requests for identity-specific operations. 
These include: - acmp/identityHashCode - synch/wait - sitting in a WeakReference - side effects into fields (plus JMM consequences) If an object doesn?t have that extra token, it cannot do any of the above operations. (Well, some like acmp/iHC might have fixups, but they don?t work the same way, and users may see the consequences bubble up.) For sure, that token and its operations sounds like an O-O supertype. And the *lack* of the token sounds like a super of that super, which turns out to be Object (viewed restrictively). > Functionally, there are operations that apply only to identity objects, such as identity equality, synchronization, Object::wait, etc. Some of these have been totalized appropriately (such as `==`); others are partial. Having synchronization be partial, without offering developers a way to express "if I tried to synchronize on this thing, would it throw", just makes Java less reliable, so we want a way to express identity both in the dynamic type system (`instanceof IdentityObject`) and the static type system (``). > > We also thought, at one point in time, that InlineObject and IdentityObject would be a sensible place to put new methods or default implementations of Object methods. I have thought for a long time that we might want to endow inline objects with lots of special furniture just for them, like toString and equals. (You can see this in my earliest blogs on value types.) Now that Record is coming out, I see that such furniture can be factored in better ways, not tied directly to value-ness. > However, as the design has evolved, the need for this has gone away. This opens the door to a new possibility, which I'd like to evaluate: just have one of them. (And, if we only have one, the move is forced: IdentityObject.) > > In this world, we'd just have IdentityObject, which would be viewed as a refinement of Object -- "Object, with identity". Identity classes would implement it implicitly, as today. The pedagogy would then be, instead of "there are two disjoint kinds of Object", be "Some objects are enhanced with identity." You'd still be able to say > > x instanceof IdentityObject > > and > > void foo(IdentityObject o) { ... } > > and > > class Foo { ... } > > as a way of detecting the refinement, but not the opposite. So the questions are: > > - Pedagogically, does this help move users to the right mental model, or does the symmetric model do a better job? > - Functionally, is there anything we might do with InlineObject, that we would miss? As a weaker version of this question, are there any types which authors of inline classes would like to opt into, in order to reify important contracts commonly found on inline classes? Here?s the main contract, I think: ?I promise that I won?t give you race conditions.? (N.B. This promise comes in two forms, shallow and deep. Let?s just stay shallow for now; adding deep immutability is a little easier once the shallow version is defined, but the reverse does not hold.) Would InlineObject be a proxy for such a contract? Yes, sort of, but it would also (like Cloneable and Serializable) leak badly into APIs that are studiously trying to avoid making promises of that sort, or may be making them in other forms (e.g., implementation internal mutable caches transparent to the client). Possibly, we could define an interface ShallowlyImmutable which pertains (at least potentially) to *all* inlines, and also to all inline-like identity classes, notably our friends the all-final-non-static-field classes. 
It could be automatic on records. It could even be automatic on inlines, except maybe then we?d want users to be able to opt out of it (for arcane reasons?the inline object?s implementation logically includes mutable state which the designer has declared is shallowly local to the logical object). In the same circumstances, it could be automatic (with an opt-out) on friendly all-final identity classes. And so on. Is this a contract worth advertising? Don?t know, but it?s a possibility. And (here?s the interesting part for today) it is *not* exclusive to inlines, and perhaps not even *inclusive* of inlines. Yet it correlates with them. > Secondarily: > > - If we take this step, is `IdentityObject` still the best name? FTR, I like the terminology (which is hard-won by weeks of discussion) of ?inline? vs ?identity?. The latter term suggests that the affected object has some mysterious ?identity? property hanging about it (that fuzzy token in the header I mentioned). The term ?inline? has no such connotation, and instead feels adverbial?it?s a mode of use. It?s perfect don?t change please? The word Object is pretty annoyingly redundant, but IdentityObject is (as Doug might say) not too terrible. I also like just plain Identity (but it will clash) and also Object.Identity (feels a little like Map.Entry). ? John From john.r.rose at oracle.com Thu Apr 9 21:56:45 2020 From: john.r.rose at oracle.com (John Rose) Date: Thu, 9 Apr 2020 14:56:45 -0700 Subject: null checks vs. class resolution, and translation strategy for casts In-Reply-To: <2088247380.511772.1586467867914.JavaMail.zimbra@u-pem.fr> References: <14f2102c-c742-d407-3df1-563441fd7071@oracle.com> <56DEB1D5-9A78-4551-B599-5514D1A98C77@oracle.com> <547544228.508489.1586465937908.JavaMail.zimbra@u-pem.fr> <7A4F8AC2-CCA7-457E-AD30-1EAC98281E60@oracle.com> <2088247380.511772.1586467867914.JavaMail.zimbra@u-pem.fr> Message-ID: On Apr 9, 2020, at 2:31 PM, forax at univ-mlv.fr wrote: > > yes, indy is a way to create any new bytecode, but it also has some restrictions, > the major one being that you can not using it before it has been bootstrapped. Good point; we found that with string concatenation, didn?t we? If we use indy for this, we?ll run into similar bootstrapping issues. Which reminds me that Brian has been pondering javac intrinsics for some time, as a way of replacing method calls that would ordinarily be linked and run the normal way, with preferable alternative implementations. This game could also be played (very carefully) with BSMs. That (like javac intrinsics) would sidestep the usual bootstrapping orders. So, here?s a recommendation: Use indy, and use a clunkier fallback in the same places that today use a clunkier fallback for string concatenation. And, record a line item of technical debt that we should further explore indy intrinsics, after we figure out what javac intrinsics look like. ? John From forax at univ-mlv.fr Fri Apr 10 11:19:53 2020 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Fri, 10 Apr 2020 13:19:53 +0200 (CEST) Subject: null checks vs. 
class resolution, and translation strategy for casts In-Reply-To: References: <14f2102c-c742-d407-3df1-563441fd7071@oracle.com> <56DEB1D5-9A78-4551-B599-5514D1A98C77@oracle.com> <547544228.508489.1586465937908.JavaMail.zimbra@u-pem.fr> <7A4F8AC2-CCA7-457E-AD30-1EAC98281E60@oracle.com> <2088247380.511772.1586467867914.JavaMail.zimbra@u-pem.fr> Message-ID: <120125113.740229.1586517593802.JavaMail.zimbra@u-pem.fr> ----- Mail original ----- > De: "John Rose" > ?: "Remi Forax" > Cc: "Brian Goetz" , "valhalla-spec-experts" > Envoy?: Jeudi 9 Avril 2020 23:56:45 > Objet: Re: null checks vs. class resolution, and translation strategy for casts > On Apr 9, 2020, at 2:31 PM, forax at univ-mlv.fr wrote: >> >> yes, indy is a way to create any new bytecode, but it also has some >> restrictions, >> the major one being that you can not using it before it has been bootstrapped. > > Good point; we found that with string concatenation, didn?t we? > If we use indy for this, we?ll run into similar bootstrapping issues. Replacing an inner class by a lambda when calling AccessController.doPrivileged early in the boot process was my first encounter with this issue. > > Which reminds me that Brian has been pondering javac intrinsics > for some time, as a way of replacing method calls that would > ordinarily be linked and run the normal way, with preferable > alternative implementations. This game could also be played > (very carefully) with BSMs. That (like javac intrinsics) would > sidestep the usual bootstrapping orders. javac intrinsics doesn't work well because of profile pollution, by example with String.valueOf(), if the format is a constant, you can transform the format (if it is non Locale sensitive) to a string concatenation, but there is no way to express "if it's a constant" at indy level. Either you capture the format string and hope you will see the same, here you have a profile pollution issue or you have a magic combinator that say if it's constant use this method handle and not the other one but in that case, the method handle called if the argument is a constant has never been called and so has no profile further down. > > So, here?s a recommendation: Use indy, and use a clunkier > fallback in the same places that today use a clunkier fallback > for string concatenation. And, record a line item of technical > debt that we should further explore indy intrinsics, after we > figure out what javac intrinsics look like. What is not clear to me is that javac can replace unbox by a nullcheck, for the VM, the input is an interface and the output is an inline type, given that interfaces are not checked until runtime, how the VM can validate that only a nullcheck is enough ? Also it's still not clear to me what indy provide in this case. So i still think that doing a checkcast (reusing checkcast being a trick to avoid to introduce a new bytecode) or having a special unbox opcode is a better idea. > > ? John R?mi From john.r.rose at oracle.com Sat Apr 11 05:43:28 2020 From: john.r.rose at oracle.com (John Rose) Date: Fri, 10 Apr 2020 22:43:28 -0700 Subject: null checks vs. 
class resolution, and translation strategy for casts In-Reply-To: <120125113.740229.1586517593802.JavaMail.zimbra@u-pem.fr> References: <14f2102c-c742-d407-3df1-563441fd7071@oracle.com> <56DEB1D5-9A78-4551-B599-5514D1A98C77@oracle.com> <547544228.508489.1586465937908.JavaMail.zimbra@u-pem.fr> <7A4F8AC2-CCA7-457E-AD30-1EAC98281E60@oracle.com> <2088247380.511772.1586467867914.JavaMail.zimbra@u-pem.fr> <120125113.740229.1586517593802.JavaMail.zimbra@u-pem.fr> Message-ID: <814A7A4A-51BA-4BC6-83AD-317F8725C1A1@oracle.com> On Apr 10, 2020, at 4:19 AM, forax at univ-mlv.fr wrote: > >> So, here?s a recommendation: Use indy, and use a clunkier >> fallback in the same places that today use a clunkier fallback >> for string concatenation. And, record a line item of technical >> debt that we should further explore indy intrinsics, after we >> figure out what javac intrinsics look like. > > What is not clear to me is that javac can replace unbox by a nullcheck, for the VM, the input is an interface and the output is an inline type, given that interfaces are not checked until runtime, how the VM can validate that only a nullcheck is enough ? It can?t; that?s why I?m saying javac needs to ask for a null check, *and* somehow affirm the inline type (subtype of interface). This is two bytecodes, invokestatic Objects.requireNN, plus checkcast C. > Also it's still not clear to me what indy provide in this case. It provides both of the above effects in one bytecode. The bytecode, in turn, can expand to some internal JVM intrinsic which the runtime will optimize better than a back-to-back combo of the two standard instructions. That intrinsic never has to be admitted to by any spec. > So i still think that doing a checkcast (reusing checkcast being a trick to avoid to introduce a new bytecode) or having a special unbox opcode is a better idea. Changing opcode behaviors and/or adding new opcodes is always more expensive than appealing to indy, even if we have to add secret optimizations to indy. Specs are almost always harder to change than optimizations. ? John From forax at univ-mlv.fr Sun Apr 12 12:09:55 2020 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Sun, 12 Apr 2020 14:09:55 +0200 (CEST) Subject: null checks vs. class resolution, and translation strategy for casts In-Reply-To: <814A7A4A-51BA-4BC6-83AD-317F8725C1A1@oracle.com> References: <56DEB1D5-9A78-4551-B599-5514D1A98C77@oracle.com> <547544228.508489.1586465937908.JavaMail.zimbra@u-pem.fr> <7A4F8AC2-CCA7-457E-AD30-1EAC98281E60@oracle.com> <2088247380.511772.1586467867914.JavaMail.zimbra@u-pem.fr> <120125113.740229.1586517593802.JavaMail.zimbra@u-pem.fr> <814A7A4A-51BA-4BC6-83AD-317F8725C1A1@oracle.com> Message-ID: <2000104233.1208724.1586693395206.JavaMail.zimbra@u-pem.fr> > De: "John Rose" > ?: "Remi Forax" > Cc: "Brian Goetz" , "valhalla-spec-experts" > > Envoy?: Samedi 11 Avril 2020 07:43:28 > Objet: Re: null checks vs. class resolution, and translation strategy for casts > On Apr 10, 2020, at 4:19 AM, [ mailto:forax at univ-mlv.fr | forax at univ-mlv.fr ] > wrote: >>> So, here?s a recommendation: Use indy, and use a clunkier >>> fallback in the same places that today use a clunkier fallback >>> for string concatenation. And, record a line item of technical >>> debt that we should further explore indy intrinsics, after we >>> figure out what javac intrinsics look like. 
>> What is not clear to me is that javac can replace unbox by a nullcheck, for the >> VM, the input is an interface and the output is an inline type, given that >> interfaces are not checked until runtime, how the VM can validate that only a >> nullcheck is enough ? > It can?t; that?s why I?m saying javac needs to ask for a null check, > *and* somehow affirm the inline type (subtype of interface). > This is two bytecodes, invokestatic Objects.requireNN, plus > checkcast C. Ok, >> Also it's still not clear to me what indy provide in this case. > It provides both of the above effects in one bytecode. The bytecode, > in turn, can expand to some internal JVM intrinsic which the runtime > will optimize better than a back-to-back combo of the two standard > instructions. That intrinsic never has to be admitted to by any spec. >> So i still think that doing a checkcast (reusing checkcast being a trick to >> avoid to introduce a new bytecode) or having a special unbox opcode is a better >> idea. > Changing opcode behaviors and/or adding new opcodes is always > more expensive than appealing to indy, even if we have to add secret > optimizations to indy. Specs are almost always harder to change than > optimizations. Why do we have the new opcodes defaultvalue and withfield in that case ? In both case, the semantics "new inline type" and "unbox inline type" can be express with an indy, but for the former we have chosen to go with 2 new bytecodes and for the later you want to use indy, that doesn't seem logical. I understand why you want to use indy but from my armchair it seems like paying the cost upfront (with a new bytecode) or later (when optimizing). Indy is good when the linking is complex, for lambdas when you need to create a proxy class out of thin air, for the string concatenation or for the pattern matching because you have a lot of code shapes to link together. Indy has three major drawbacks, calling the BSM is slow, it's only fully inlined by c2 and you can not use it before it has been bootstraped. Those issues are all severe in our case, i don't see how we can use an inline type to express the entry (the pair of K,V) of a HashMap without being stopped by these issues. I heard you about the cost, but here indy is not the silver bullet, it's a shiny tool with its own weaknesses. And yes, adding a new opcode has a more upfront cost. > ? John R?mi From brian.goetz at oracle.com Sun Apr 12 15:59:11 2020 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 12 Apr 2020 11:59:11 -0400 Subject: IdentityObject and InlineObject In-Reply-To: <2ac984c6-3691-13c6-7361-d8283f92e19b@oracle.com> References: <2ac984c6-3691-13c6-7361-d8283f92e19b@oracle.com> Message-ID: > Possible candidates: > > ?- Identity, as in "class Foo implements Identity".? This says what it > means, though it is quite possible that it will clash with existing > types.? (This is not an absolute disqualifier, but it is a > consideration.) > ?- IdentityObject.? This is where we are now; it always felt a little > clunky to me. > ?- ObjectIdentity ("class Foo implements ObjectIdentity").? Better > than IdentityObject, less likely to clash than Identity. > ?- WithIdentity.? Not the best name, but less likely than Identity to > clash. > > Others? Thinking on this for a few days ... I like "Identity" but it is sure to clash with every ORM and similar system out there.? I think I slightly prefer `ObjectIdentity` to `IdentityObject`, and both to `WithIdentity` and `HasIdentity`. 
Note that originally we were thinking that these types might be abstract classes; now that they are interfaces, this has slight consequences for the natural naming, as interfaces are often named for adjectives (Comparable) but abstract classes are almost always named for nouns.? We should evaluate these, in part, on how they sound in ??? class C implements I and, ??? class String implements ObjectIdentity more directly expresses the point -- that one of the behaviors of String is that it has an object identity -- than does: ??? class String implements IdentityObject It also puts the focus on the _identity_, rather than the object itself, which is consistent with breaking this behavior out of the root type and into a "mix in." From elias at vasylenko.uk Sun Apr 12 18:51:17 2020 From: elias at vasylenko.uk (Elias N Vasylenko) Date: Sun, 12 Apr 2020 19:51:17 +0100 Subject: IdentityObject and InlineObject In-Reply-To: References: <2ac984c6-3691-13c6-7361-d8283f92e19b@oracle.com> Message-ID: <1716fbb4888.27b9.0017d12c154faf61e4684984ecb879d4@vasylenko.uk> Then why not simply Identifiable? class String implements Identifiable Has a decent ring to it. On 12 April 2020 17:04:16 Brian Goetz wrote: >> Possible candidates: >> >> - Identity, as in "class Foo implements Identity". This says what it >> means, though it is quite possible that it will clash with existing >> types. (This is not an absolute disqualifier, but it is a >> consideration.) >> - IdentityObject. This is where we are now; it always felt a little >> clunky to me. >> - ObjectIdentity ("class Foo implements ObjectIdentity"). Better >> than IdentityObject, less likely to clash than Identity. >> - WithIdentity. Not the best name, but less likely than Identity to >> clash. >> >> Others? > > Thinking on this for a few days ... > > I like "Identity" but it is sure to clash with every ORM and similar > system out there. I think I slightly prefer `ObjectIdentity` to > `IdentityObject`, and both to `WithIdentity` and `HasIdentity`. > > Note that originally we were thinking that these types might be abstract > classes; now that they are interfaces, this has slight consequences for > the natural naming, as interfaces are often named for adjectives > (Comparable) but abstract classes are almost always named for nouns. We > should evaluate these, in part, on how they sound in > > class C implements I > > and, > > class String implements ObjectIdentity > > more directly expresses the point -- that one of the behaviors of String > is that it has an object identity -- than does: > > class String implements IdentityObject > > It also puts the focus on the _identity_, rather than the object itself, > which is consistent with breaking this behavior out of the root type and > into a "mix in." Sent with AquaMail for Android https://www.mobisystems.com/aqua-mail From frederic.parain at oracle.com Mon Apr 13 13:24:18 2020 From: frederic.parain at oracle.com (Frederic Parain) Date: Mon, 13 Apr 2020 09:24:18 -0400 Subject: null checks vs. class resolution, and translation strategy for casts In-Reply-To: References: Message-ID: <1FB0CDF6-5AD1-4EE8-8422-78C253D9CA58@oracle.com> > On Apr 8, 2020, at 14:43, John Rose wrote: > > I have a proposal for a translation strategy: > > 1. Translate casts to inline classes differently from ?classic? > casts. Add an extra step of null hostility. For very low-level > reasons, I suggest using ?ldc X? followed by Class::cast. 
>
> Generally speaking, it's a reasonable move to use reflective
> API points (like Class::cast) on constant metadata (like X.class)
> to implement language semantics.


There's an alternative way to implement this:

Casts to inline classes C can be translated to

   #ldc C
   #checkcast C

with this new definition of checkcast:

"If objectref is null then:
  - if type C has not been loaded yet, the operand stack is unchanged,
  - if type C has already been loaded:
      - if type C is not an inline type, the operand stack is unchanged,
      - otherwise the checkcast instruction throws a ClassCastException.

Otherwise, the named class, array, or interface type is resolved (§5.4.3.1). If objectref can be cast to the resolved class, array, or interface type, the operand stack is unchanged; otherwise, the checkcast instruction throws a ClassCastException."

This new definition doesn't change the behavior of checkcast for old class files, and doesn't change the behavior or the translation strategy for casts to non-inline types.

In new class files, javac will use the ldc/checkcast sequence whenever a cast to an inline type is required. Note that in many cases, type C would already have been loaded before the ldc is executed (by pre-loading or eager loading).

With migrated types, an old class file can still have a standalone checkcast (without the ldc) referencing a type which is now an inline type, causing the null reference to pass the checkcast successfully. This is not a new issue. The same situation can be created by reading a field declared as "LC;": getfield would simply read the field (the only possible value is null) and push it on the stack without checking whether C is an inline type or not. This "null" reference to an invalid type can only be used by code that has the wrong information about type C, and it is in fact the only possible value for this type (any attempt to create a real instance of "LC;" would fail). In order to use this reference as a legitimate reference to the real type "QC;", another checkcast, using the proper sequence above, would be required and would throw an exception.

Ill-formed or malicious class files could be aware that C is an inline type, but use a single checkcast instruction (without the preceding ldc) anyway. This is part of a bigger problem that has not been discussed yet: L/Q consistency inside a class file. The SoV document stipulates that the value projection only exists in the Q-form and the reference projection only exists in the L-form. As of today, no verification of this kind is performed on class files. Nothing prevents a class file from declaring a field of type "LC;" and another of type "QC;", and the same remark applies to method arguments.

Going back to the modified specification of checkcast, the new tests are easy to implement in the interpreter, can easily be optimized by JIT compilers (most types would be loaded at compilation time), and there are no bootstrapping issues.

Fred

From brian.goetz at oracle.com  Mon Apr 13 19:28:40 2020
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 13 Apr 2020 15:28:40 -0400
Subject: null checks vs.
class resolution, and translation strategy for casts In-Reply-To: <1FB0CDF6-5AD1-4EE8-8422-78C253D9CA58@oracle.com> References: <1FB0CDF6-5AD1-4EE8-8422-78C253D9CA58@oracle.com> Message-ID: <106b2cf0-056e-34d0-b7d8-2bf3958ad7e5@oracle.com> > There?s an alternative way to implement this: > > Casts to inline classes C can be translated to > #ldc C > #checkcast C And, the ldc can be hoisted statically if desired, to the top of the method, or even into a static initializer. From forax at univ-mlv.fr Mon Apr 13 19:46:57 2020 From: forax at univ-mlv.fr (Remi Forax) Date: Mon, 13 Apr 2020 21:46:57 +0200 (CEST) Subject: null checks vs. class resolution, and translation strategy for casts In-Reply-To: <1FB0CDF6-5AD1-4EE8-8422-78C253D9CA58@oracle.com> References: <1FB0CDF6-5AD1-4EE8-8422-78C253D9CA58@oracle.com> Message-ID: <1254247717.207379.1586807217564.JavaMail.zimbra@u-pem.fr> ----- Mail original ----- > De: "Frederic Parain" > ?: "valhalla-spec-experts" > Envoy?: Lundi 13 Avril 2020 15:24:18 > Objet: Re: null checks vs. class resolution, and translation strategy for casts >> On Apr 8, 2020, at 14:43, John Rose wrote: >> >> I have a proposal for a translation strategy: >> >> 1. Translate casts to inline classes differently from ?classic? >> casts. Add an extra step of null hostility. For very low-level >> reasons, I suggest using ?ldc X? followed by Class::cast. >> >> Generally speaking, it?s a reasonable move to use reflective >> API points (like Class::cast) on constant metadata (like X.class) >> to implement language semantics. > > > There?s an alternative way to implement this: > > Casts to inline classes C can be translated to > #ldc C > #checkcast C > > with this new definition of checkast: > > "If objectref is null then: > - if type C has not been loaded yet, the operand stack is unchanged, > - if type C has already been loaded: > - if type C is not an inline type, the operand stack is unchanged > - otherwise the checkcast instruction throws a ClassCastException so it's more: #ldc C #checkcast 0 <--- > > Otherwise, the named class, array, or interface type is resolved (?5.4.3.1). If > objectref can be cast to the resolved class, array, or interface type, the > operand stack is unchanged; otherwise, the checkcast instruction throws a > ClassCastException." > > > This new definition doesn?t change the behavior of checkcast for old class > files, > and doesn?t change the behavior nor the translation strategy for casts to > non-inline types. > > In new class files, javac will use the ldc/checkcast sequence whenever a cast to > an > inline type is required. Note that in many cases, type C would have already be > loaded > before ldc is executed (by pre-loading or eager loading). > > With migrated types, an old class file can still have a standalone checkcast > (without ldc) > referencing a type which is now an inline type, causing the null reference to > pass the checkcast > successfully. This is not a new issue. The same situation can be created by > reading a field > declared as ?LC;?, getfield would simply read the field (only possible value is > null) and push > it on the stack without checking if C is an inline type or not. This ?null? > reference to an > invalid type can only be used by code that have the wrong information about type > C, and this > is in fact the only possible value for this type (any attempt to create a real > instance of > ?LC;? would fail). 
In order to use this reference as a legitimate reference to > the real > type ?QC;?, another checkcast, using the proper sequence above, would be > required and would > throw an exception. > > Ill formed or malicious class files could be aware that C is an inline type, but > use a > single checkcast instruction (without preceding ldc) anyway. This is part of a > bigger > problem that has not been discussed yet: L/Q consistency inside a class file. > the SoV > document stipules that the value projection only exists in the Q-form and the > reference > projection only exists in the L-form. As of today, there?s no verification of > such kind > performed on class files. Nothing prevent a class file from declaring a field of > type > ?LC;? and another of type ?QC;?, and the same remark applies to method > arguments. > > > Going back to the modified specification of checkcast, new tests are easy to > implement > in the interpreter, and can easily be optimized by JIT compilers (most types > would be > loaded at compilation time), and there?s no bootstrapping issues. > How it is better than a new opcode "unbox" with exactly the same semantics ? > Fred R?mi From brian.goetz at oracle.com Tue Apr 14 13:56:06 2020 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 14 Apr 2020 09:56:06 -0400 Subject: null checks vs. class resolution, and translation strategy for casts In-Reply-To: <1254247717.207379.1586807217564.JavaMail.zimbra@u-pem.fr> References: <1FB0CDF6-5AD1-4EE8-8422-78C253D9CA58@oracle.com> <1254247717.207379.1586807217564.JavaMail.zimbra@u-pem.fr> Message-ID: <3CC36830-DCE2-4A8C-9FC5-E94FF6B3CB3E@oracle.com> > > How it is better than a new opcode "unbox" with exactly the same semantics ? For one, an ?unbox? opcode assumes that the VM understands the fictitious relationship between the val and ref projections. But these are language fictions; the VM sees only classes related by extension and sealing. From forax at univ-mlv.fr Tue Apr 14 14:08:34 2020 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Tue, 14 Apr 2020 16:08:34 +0200 (CEST) Subject: null checks vs. class resolution, and translation strategy for casts In-Reply-To: <3CC36830-DCE2-4A8C-9FC5-E94FF6B3CB3E@oracle.com> References: <1FB0CDF6-5AD1-4EE8-8422-78C253D9CA58@oracle.com> <1254247717.207379.1586807217564.JavaMail.zimbra@u-pem.fr> <3CC36830-DCE2-4A8C-9FC5-E94FF6B3CB3E@oracle.com> Message-ID: <1379923003.713109.1586873314875.JavaMail.zimbra@u-pem.fr> > De: "Brian Goetz" > ?: "Remi Forax" > Cc: "Frederic Parain" , "valhalla-spec-experts" > > Envoy?: Mardi 14 Avril 2020 15:56:06 > Objet: Re: null checks vs. class resolution, and translation strategy for casts >> How it is better than a new opcode "unbox" with exactly the same semantics ? > For one, an ?unbox? opcode assumes that the VM understands the fictitious > relationship between the val and ref projections. But these are language > fictions; the VM sees only classes related by extension and sealing. No, your assuming that the semantics of unbox implies that the verifier will check for 'sealness', this is not the case, unbox <==> checkcast restricted to inline type (so with eager loading). You're free to unbox an Object to a Point, by example. R?mi From brian.goetz at oracle.com Tue Apr 14 14:09:41 2020 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 14 Apr 2020 10:09:41 -0400 Subject: null checks vs. 
class resolution, and translation strategy for casts In-Reply-To: <1379923003.713109.1586873314875.JavaMail.zimbra@u-pem.fr> References: <1FB0CDF6-5AD1-4EE8-8422-78C253D9CA58@oracle.com> <1254247717.207379.1586807217564.JavaMail.zimbra@u-pem.fr> <3CC36830-DCE2-4A8C-9FC5-E94FF6B3CB3E@oracle.com> <1379923003.713109.1586873314875.JavaMail.zimbra@u-pem.fr> Message-ID: Then it?s a terrible and confusing name for an opcode :) > On Apr 14, 2020, at 10:08 AM, forax at univ-mlv.fr wrote: > > > > De: "Brian Goetz" > ?: "Remi Forax" > Cc: "Frederic Parain" , "valhalla-spec-experts" > Envoy?: Mardi 14 Avril 2020 15:56:06 > Objet: Re: null checks vs. class resolution, and translation strategy for casts > > How it is better than a new opcode "unbox" with exactly the same semantics ? > > For one, an ?unbox? opcode assumes that the VM understands the fictitious relationship between the val and ref projections. But these are language fictions; the VM sees only classes related by extension and sealing. > > No, > your assuming that the semantics of unbox implies that the verifier will check for 'sealness', this is not the case, unbox <==> checkcast restricted to inline type (so with eager loading). > > You're free to unbox an Object to a Point, by example. > > R?mi > From brian.goetz at oracle.com Tue Apr 14 23:12:11 2020 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 14 Apr 2020 19:12:11 -0400 Subject: Fwd: IdentityObject and InlineObject In-Reply-To: <50a25a43-f7e6-d718-1612-aa54bb8c5f50@web.de> References: <50a25a43-f7e6-d718-1612-aa54bb8c5f50@web.de> Message-ID: Received on the -comments list. It is tempting to reach for an "implementation by parts" approach like this, but unless we go the extra mile and teach the compiler to "stitch together" these two partial methods into a total method, it doesn't work so well.? Instead, we can lean on an existing overloading: ??? Heap heapify(Object... args) { ... } ??? Heap heapify(IdentityObject... args) { ... } where the former is essentially the inline version, since the existing most-specific-overload rules will select the latter when the arguments are known to be IdentityObject.? So we don't give up much here. Moreover, this overload points to the real distinction we want -- has identity vs not.? Inline-ness here is just the absence of identity, so (at least for this purpose) doesn't need its own interface.? (There may be other reasons we'll discover later, though.) -------- Forwarded Message -------- Subject: IdentityObject and InlineObject Date: Tue, 14 Apr 2020 11:02:02 +0200 From: Gernot Neppert To: valhalla-spec-comments at openjdk.java.net After reading through Brian Goetz's arguments in favour of abandoning 'InlineObject', I was wondering: Couldn't it be useful to be able to overload a function based on the two disjunct type-families? For example, it may turn out to be more efficient to use two different algorithms for inline- vs. non-inline types. So I might want to write this: Heap heapify(T...args) { } Heap heapify(T...args) { } (If we had only 'IdentifyObject', the overload-set would contain functions with non-disjunct argument-types, which is generally discouraged!) 
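The most-specific-overload behavior Brian appeals to above can be seen today with any pair of types in a subtype relationship. In the sketch below, a plain marker interface stands in for IdentityObject, which does not exist yet; the names IdentityLike and HeapifyOverloads are illustrative only.

// Stand-in for the proposed IdentityObject marker interface.
interface IdentityLike {}

class HeapifyOverloads {
    static String heapify(Object... args)       { return "general algorithm"; }
    static String heapify(IdentityLike... args) { return "identity-based algorithm"; }

    public static void main(String[] args) {
        IdentityLike id = new IdentityLike() {};
        // Statically known to be IdentityLike: the more specific overload wins.
        System.out.println(heapify(id, id));
        // Statically only Object: the general overload is chosen.
        System.out.println(heapify((Object) id, new Object()));
    }
}

An inline class instance, having no identity, would only ever match the Object... overload, which is the split described above without needing a second marker type.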
From frederic.parain at oracle.com  Mon Apr 20 17:20:19 2020
From: frederic.parain at oracle.com (Frederic Parain)
Date: Mon, 20 Apr 2020 13:20:19 -0400
Subject: null checks vs class resolution: taking a few steps back
Message-ID: <216C7D44-98C9-4492-A41B-CC5D19D90366@oracle.com>

Here are a few thoughts about the null checks vs class resolution issue (many thanks to Brian for his review and his improvements to this document).

Checkcast: is it a null issue or a type issue?

There has been some discussion recently on how casts should be translated. While the static compiler has considerable latitude on how to translate language constructs to bytecode, I'd like to make sure that we first have a clean story at the bytecode level, and then take up the translation story (if we still need to).

History, and historical inconveniences

Before Valhalla, classfiles had two ways to denote a reference type: the plain name used in CONSTANT_Class_info entries, and the name within an envelope in the field and method descriptors used in CONSTANT_Fieldref_info, CONSTANT_Methodref_info and CONSTANT_InterfaceMethodref_info entries.

Having two syntaxes was already a sign that something was weird, but we mostly wrote that off as a historical accident. (Worse, it is not even applied uniformly: arrays are always denoted with their envelope, even in CONSTANT_Class_info entries.) Aesthetics aside, it worked because there was a single unambiguous translation from a class name to a class name with envelope.

In the bytecode sequence:

    aload_1
    checkcast    #10   // class Foo
    invokestatic #19   // Method Bar:(LFoo;)V

the real meaning of the checkcast was: "I guarantee that the top of stack is a reference to an instance of class Foo (a.k.a. LFoo;), otherwise I'll throw an exception". Because null is a valid value of all reference types, the JVM does not load the class Foo if the value on the top of the stack is null, and the verifier is still satisfied that the arguments on the stack match the signature of the method being invoked.

Valhalla turns up the pressure

The Valhalla project introduces a new kind of envelope: Q*;. The spelling has remained the same, but its meaning has evolved with each prototype:

- With the v* bytecodes, it was a marker of a new kind of type;
- In L-world, it became a marker of null-hostility;
- In the current user model, it has become part of the type.

The last two points require some explanation. In L-world, the L and Q flavors of an inline class were projected from a single set of class metadata. In this world, there were really three names (the L projection of C, the Q projection of C, and the class C itself), all of which could be given meaning. So it could still make sense to denote a class just by name, but it is not clear this was a very good idea.

For instance, the defaultvalue bytecode used a CONSTANT_Class_info entry referring to the value class by its plain name. This was unambiguous, because of course the defaultvalue bytecode was referring to the Q-version of the type. (Until some future when we want to apply defaultvalue to reference types, and get null out.) The information was missing from the constant pool entry but deduced from the context, because of the implicit assumption that defaultvalue only applies to Q-types. But there were other cases where even such an implicit assumption was not sufficient to deduce which variant of a value type should be used.
The checkcast bytecode was one of those cases; it then became necessary to denote the class argument with the full envelope in order to express the expected behavior.

With the new model of inline types, a class can only have one envelope: either Q if it is an inline type, or L otherwise. Which means that LFoo; and QFoo; are not two variants of the same type, but are in fact two different types.

As much as we'd like to ignore it, if Foo is an inline type, it is still possible to forge a reference with type LFoo;: we can create a class that declares a field of type LFoo;, instantiate an instance, and read the field. This LFoo; is a pretty silly type; it cannot interact with any other type, and it can only hold null. But the JVM has to deal with such silly types all the time, such as LBar; when Bar is a nonexistent class. The reality is that LFoo; and QFoo; are two different types (with completely disjoint value sets!), and we should be honest about it.

In the current inline type model, the envelope is an essential part of the identification of a type.

Checkcast

The legacy behavior of checkcast is on a collision course with the new type system. If the following bytecode sequence:

    aload_1
    checkcast #10   // class Foo

still means the same as before (checking that the reference on the top of the stack is of type LFoo;), we have a problem if Foo is an inline class: if the top of stack holds null, the checkcast will succeed (because null is indeed a valid value of the otherwise-useless type LFoo;), but this is not really what we had in mind when we asked whether the top of the stack held a Foo.

It is easy to assume that this is just yet another bad nullity behavior, and forgivable to make this assumption because null has been the source of so much bad behavior in the past. But this would be putting the blame in the wrong place. In this example, the checkcast operation is simply operating on the wrong type, assuming LFoo; where it has no right to do so: LFoo; and QFoo; are completely distinct types.

Quick, plug the hole!

There has been a lot of discussion on the EG mailing list, and many proposals for ways to restore peace and tranquility. Unfortunately, they all seem to be "quick fixes", and each is likely to generate new problems of its own. Without recapitulating the details of each of them, here is a summary of their shortcomings:

- Generate a different sequence of bytecodes when casting to an inline type. This is a workaround for the current checkcast behavior, but is likely to cause trouble for generic code in the future that is specializable over both identity and inline types, because the goal is to share the bytecode across instantiations, and only patch the constant pool or type descriptors.

- Use Class::cast. Class::cast is a generic method returning T, which is erased to Object, which will hide the type information the verifier needs to guarantee the correctness of method argument types. (A short illustration follows at the end of this section.)

- Use invokedynamic to call custom behavior. This has a serious risk of bootstrapping issues.

- Invent a checknull bytecode. This, and other solutions focusing on the handling of null, address the symptom, not the problem. The problem is not the handling of null, it is checking that a particular value is within the value set of this particular type. The null reference should not be handled separately; its handling should just fall out of addressing the general question of whether a given value is in the value set of a given type.

All of these solutions feel like quick fixes that are likely to bite us back in the future. Let's solve the real problem instead.
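[Editorial aside] To make the Class::cast point above concrete, here is a small runnable illustration; it is not from the thread, and Point is an ordinary class standing in for an inline class, since the erasure problem is visible even without Valhalla.

    public class CastErasure {

        static class Point { }   // stand-in for an inline class; an ordinary class is enough here

        public static void main(String[] args) {
            Object o = new Point();

            // Class::cast is declared as <T> T cast(Object obj); after erasure it returns Object.
            // So javac must still emit its own checkcast on the erased return value before the
            // assignment below can verify -- routing casts through Class::cast does not remove
            // the need for a properly typed checkcast in the bytecode, it just hides the type
            // behind Object at the call site.
            Point p = Point.class.cast(o);

            System.out.println(p.getClass().getName());
        }
    }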
Concrete proposal

Let's fix this by fixing the underlying problem: being explicit about what type we are dealing with. Specifically, from Valhalla and beyond, the way to denote a class type in a classfile is always a class name with an envelope. The two possible envelopes (currently) are the L-envelope for types with a value set containing null, and the Q-envelope for types with a value set not containing null.

This has several pleasant consequences:

- All representations within the class file itself are unified: CONSTANT_Class_info, CONSTANT_Fieldref_info, CONSTANT_Methodref_info and CONSTANT_InterfaceMethodref_info will all use the same syntax, with no more translation required between names and type descriptors.

- Class denotation will be aligned with array denotation, which already uses type descriptors in CONSTANT_Class_info entries.

- All bytecodes referencing a CONSTANT_Class_info entry will have access to the full denotation, envelope + name, even when the class has not been loaded yet.

- The verifier will no longer have to translate between names and type descriptors.

For the checkcast bytecode, the semantics has to be rephrased: checkcast must ensure that the reference on the top of the stack is within the value set of the type specified as its argument, or throw an exception. For L types, this is the same behavior as before, but for Q types, the behavior reflects the value set of the type specified in the classfile. If we have:

    aload_1
    checkcast #10   // class LFoo;

then checkcast is being used with a type using an L-envelope, so we still know null is within the value set of Foo without having to load Foo. If the top of stack is not the null reference, then Foo must be loaded to check whether this value is part of the rest of Foo's value set, as before.

On the other hand, if we have:

    aload_1
    checkcast #11   // class QBar;

then checkcast is used with a type using a Q-envelope, which means null cannot be part of the value set of Bar. So if the top of stack contains the null reference, an exception can be thrown (again, without loading Bar if we so desire). If the top of stack is not the null reference, then Bar must be loaded to check whether this value is part of Bar's value set, as before.

The bytecode sequence is the same for both inline types and non-inline types, with the behavior being controlled by a constant pool entry, making it suitable for our specialization model, and the semantics being derived from the type on which checkcast operates.

The benefits of always using a name+envelope will be less significant for other bytecodes, but they still exist. (For example, using new on an inline type could be caught at verification time instead of at runtime.)

Let's take this opportunity to address the real problem (correct denotation of types) rather than pinning the blame on null (however many sins it has committed in the past). The current loose treatment of non-enveloped names has already caused trouble, and will be a huge source of technical debt going forward. Let's just pay it off.

Backward compatibility

Pre-Valhalla class files only know about the L-envelope, so the JVM can continue to deal with them by applying the old default translation from names to L*; descriptors. The implementation of checkcast won't have to check the class file version, as the behavior can be deduced directly from the content of the CONSTANT_Class_info (plain name -> old syntax, name with envelope -> new syntax). New classfiles will reject the old syntax.
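[Editorial aside] As a reading aid, here is a small runnable model, in plain Java, of the decision procedure proposed above for checkcast. The names (CpClassEntry, checkcast) are invented for this sketch and are not JVM or spec names, and Class.forName merely stands in for constant pool resolution; the point is only that the null case is decided by the envelope alone, while the non-null case resolves the class, exactly as the proposal describes.

    public class CheckcastModel {

        // Hypothetical model of a constant pool class entry under the proposal:
        // an envelope character ('L' or 'Q') plus a binary class name.
        record CpClassEntry(char envelope, String binaryName) { }

        static Object checkcast(Object topOfStack, CpClassEntry entry) {
            if (topOfStack == null) {
                if (entry.envelope() == 'L') {
                    return null;  // null is in every L-type's value set; no class loading needed
                }
                // null is never in a Q-type's value set; rejecting it needs no class loading either
                throw new ClassCastException("null is not in the value set of Q" + entry.binaryName() + ";");
            }
            // Non-null: the class must be resolved to decide value-set membership, as today.
            try {
                Class<?> c = Class.forName(entry.binaryName());
                if (!c.isInstance(topOfStack)) {
                    throw new ClassCastException(topOfStack.getClass().getName());
                }
                return topOfStack;
            } catch (ClassNotFoundException e) {
                throw new NoClassDefFoundError(entry.binaryName());
            }
        }

        public static void main(String[] args) {
            System.out.println(checkcast(null, new CpClassEntry('L', "java.lang.String")));  // null passes
            System.out.println(checkcast("hi", new CpClassEntry('L', "java.lang.String")));  // "hi" passes
            try {
                checkcast(null, new CpClassEntry('Q', "Point"));
            } catch (ClassCastException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        }
    }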
From brian.goetz at oracle.com  Mon Apr 20 18:22:08 2020
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 20 Apr 2020 14:22:08 -0400
Subject: Reader mail bag
Message-ID: <32907638-2e3e-ef69-5164-4fb5105f04c3@oracle.com>

Some comments have come in on the `InlineObject` discussion:

> Subject: IdentityObject and InlineObject
> Date: Sun, 19 Apr 2020 11:46:40 +0200
> From: Raffaello Giulietti
> To: valhalla-spec-comments at openjdk.java.net
>
> Hello,
>
> getting rid of InlineObject means that static or dynamic queries for
> inline types need to rely on complementarity: an object is inline if
> it is not identity (except perhaps for an instance of Object, which is
> neither, in the type system as I understand it).
>
> Should a third kind of object beside identity and inline be invented
> in the future (quantum objects? who knows?), complementarity
> ("non-identity") alone would no longer suffice as a test for inline.
>
> As the benefits of giving up InlineObject seem low when compared to
> the costs of re-introducing it in some future, I like the
> explicitness of the current dichotomy better. It gives the
> programmers, rather than the language designers, the choice of
> complementarity versus explicitness.
>
> So, either it is provable that the inline-identity dichotomy is here
> to stay forever (a theorem in computer science, so to say), or I would
> keep InlineObject as a reasonably priced investment for the future,
> even if it is felt as extra luggage today.
>
> Greetings
> Raffaello

While I understand that there might be a future use where we want it after all, the real question is how to model it. The current Inline/Identity duality is unsatisfying because what it really wants to capture is that "it's either one or the other", but we don't actually capture that. The alternative, which is being explored, is to say "there are some operations that you can do on all objects, and then some you can only do on identity objects."

If you want to follow up, start a new thread.

From brian.goetz at oracle.com  Mon Apr 20 18:23:57 2020
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 20 Apr 2020 14:23:57 -0400
Subject: null checks vs class resolution: taking a few steps back
In-Reply-To: <216C7D44-98C9-4492-A41B-CC5D19D90366@oracle.com>
References: <216C7D44-98C9-4492-A41B-CC5D19D90366@oracle.com>
Message-ID: <6e1b9bc0-de66-a2ca-5883-43843f082ee2@oracle.com>

As Fred mentions, he shared this with me a few days ago, and I found these arguments very persuasive. We have already, several times, fixated on nullity when the problem was elsewhere, and this feels like yet another case of that. Fixing the underlying technical debt -- that the language that `checkcast` and friends have for referring to classes is inadequate -- feels like addressing the real problem.

On 4/20/2020 1:20 PM, Frederic Parain wrote:
> Here are a few thoughts about the null checks vs class resolution issue
> (many thanks to Brian for his review and his improvements to this document).
> [...]
From brian.goetz at oracle.com  Thu Apr 23 20:37:16 2020
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 23 Apr 2020 16:37:16 -0400
Subject: Clarifications on SoV documents
Message-ID:

[ Sending to right EG list this time ]

Srikanth asked me to make a few clarifications on the SoV documents. I'll summarize them here and then later work them into the docs.

1. Array subtyping. Historically, array covariance worked like this:

    if T <: U, then T[] <: U[]

In Valhalla, this is refined to be:

    if T.ref <: U, then T[] <: U[]

Because T.ref == T for reference types, this rule generalizes the previous rule. There is no inline widening/narrowing conversion between array types; array types are reference types. Example (an analogy using today's reference arrays is sketched after this message):

    V[] va = ...
    V.ref[] vb = ...
    vb = va        // allowed by ordinary subtyping
    va = vb        // not allowed
    va = (V[]) vb  // allowed, may CCE

2. Bounds checking. If we have a bound:

    class Foo<T extends U> { }

then this was historically satisfied if T <: U. In Valhalla, this is amended in the same way as (1):

    X is within bound `T extends U` if X.ref <: U

If `U` is an inline type, the bound `T extends U` does not make sense, so it is rejected.

3. This story isn't written yet, but where we're aiming (everyone but Srikanth, please hold questions on "how would that work" until I write this up) is that inline types can be used as type parameters for erased generics _without_ having to say `Foo<V.ref>`. This represents a shift from where we were before, where we didn't allow inlines as type arguments.
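[Editorial aside] The V[] / V.ref[] example in point 1 above maps onto machinery that already exists for ordinary reference arrays. Below is a minimal, runnable sketch for a current JDK in which Integer[] plays the role of V[] and Number[] plays the role of V.ref[]; the class names and the analogy are not from the SoV documents, and the only new Valhalla ingredient is the V/V.ref relationship itself.

    public class ArrayCovarianceAnalogy {
        public static void main(String[] args) {
            Integer[] va = { 1, 2, 3 };   // stands in for V[]
            Number[]  vb;                 // stands in for V.ref[]

            vb = va;                      // allowed by ordinary array covariance
            // va = vb;                   // does not compile: needs an explicit cast

            Integer[] back = (Integer[]) vb;   // allowed; succeeds because the runtime array is Integer[]
            System.out.println(back.length);

            Number[] other = new Double[] { 1.0 };
            try {
                Integer[] bad = (Integer[]) other;  // allowed by the compiler...
                System.out.println(bad.length);
            } catch (ClassCastException e) {
                System.out.println("...but may CCE at runtime: " + e);
            }
        }
    }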