From daniel.smith at oracle.com Thu Oct 17 18:22:38 2019 From: daniel.smith at oracle.com (Dan Smith) Date: Thu, 17 Oct 2019 12:22:38 -0600 Subject: and factories Message-ID: 

The plan of record for compiling the constructors of inline classes is to generate static methods named "<init>" with an appropriate return type, and invoke them with 'invokestatic'.

This requires relaxing the existing restrictions on method names and references. Historically, the special names "<init>" and "<clinit>" have been reserved for special-purpose JVM rules (for example, 'invokespecial' is treated like a distinct instruction if it invokes a method named '<init>'); for convenience, we've also prohibited all other method names that include the characters '<' or '>' (JVMS 4.2.2).

Equivalently, we might say that, within the space of method names, we've carved out a reserved space for special purposes: any names that include '<' or '>'.

A few months ago, I put together a tentative specification that effectively cedes a chunk of the reserved space for general usage [1]. The names "<init>" and "<clinit>" are no longer reserved, *unless* they're paired with descriptors of a certain form ("(.*)V" and "()V", respectively). Pulling on the thread, we could even wonder whether the JVM should have a reserved space at all. Why can't I name my method "bob>" or "", for example?

In retrospect, I'm not sure this direction is such a good idea. There is value in having well-known names that instantly indicate important properties, without having more complex tests. (Complex tests are likely to be a source of bugs and security exploits.) Since the JVM ecosystem is already accustomed to the existence of a reserved space for special method names, we can keep that space for free, while it's potentially costly to give it up.
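A minimal Java sketch can show the difference between the "simple name test" and the "more complex test". It models method names and descriptors as plain strings, as they appear in the constant pool. The class and method names are hypothetical, and this is not HotSpot or verifier code. Under the current rule, a single scan of the name is enough. Under the tentative relaxed rule, deciding whether "<init>" is special also requires inspecting the descriptor.

```java
// Hypothetical sketch: method names and descriptors modeled as plain strings.
final class InitNameCheck {
    // Current rule: any name containing '<' or '>' is reserved, so the name
    // alone signals that special JVM rules apply.
    static boolean isReservedName(String name) {
        return name.indexOf('<') >= 0 || name.indexOf('>') >= 0;
    }

    // Tentative relaxed rule: "<init>" is an instance initialization method
    // only when its descriptor has the form "(...)V".
    static boolean isInstanceInitRelaxed(String name, String descriptor) {
        return name.equals("<init>")
                && descriptor.startsWith("(")
                && descriptor.endsWith(")V");
    }

    // Likewise, "<clinit>" is a class initialization method only with "()V".
    static boolean isClassInitRelaxed(String name, String descriptor) {
        return name.equals("<clinit>") && descriptor.equals("()V");
    }

    public static void main(String[] args) {
        System.out.println(isReservedName("bob>"));                        // true
        System.out.println(isInstanceInitRelaxed("<init>", "(I)V"));       // true
        System.out.println(isInstanceInitRelaxed("<init>", "(I)LPoint;")); // false: factory form
    }
}
```

Every consumer of class files (verifier, tools, reflection) would need the two-argument check under the relaxed rule. That duplication is where the bug and exploit risk comes from.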
So here's an alternative design:

- "<init>" continues to indicate instance initialization methods; "<clinit>" continues to indicate class initialization methods
- A new reserved name, "<new>", say, can be used to declare factories
- To avoid misleading declarations, methods named "<new>" must be static and have a return type that matches their declaring class; only 'invokestatic' instructions can reference them
- The rest of the "<.*>" space of names (plus ".*<.*" and ".*>.*") is held in reserve, available for special purposes as we discover them

The Java compiler would only use "<new>" methods for inline class construction, for now; perhaps in the future we'll find other use cases that make sense (like surfacing some sort of factory mechanism).

Does this seem promising? Any particular reason it's better to overload "<init>" than just come up with a new special name?

[1] http://cr.openjdk.java.net/~dlsmith/lw2/lw2-20190628/specs/init-methods-jvms.html

From brian.goetz at oracle.com Thu Oct 17 19:20:10 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 17 Oct 2019 15:20:10 -0400 Subject: and factories In-Reply-To: References: Message-ID: 

I think the choice to keep the reserved space is a good one.

For the <init> vs <new> distinction, it makes me want to ask: what's a factory? Obviously, inline classes have constrained-form constructors that are translated to factory methods, but is this the end of the factory story, or the beginning? As has been discussed, javac could well generate _both_ an <init> method and a <new> factory, where the latter is derived from the former via new/dup/init. Whether this is desirable depends on what we get for this. But the question that this plan leaves me wondering is whether there should be a notion of a factory in the language (such a concept would warrant a novel translation strategy, if for no other reason than not being lossy.)
Currently, we use the word "factory" quite loosely (basically, any this-class-returning static method), and there's no type checking that, for example, prevents a factory from returning null. So I think much of the value of having a factory concept in the VM is coupled to whether we have a factory concept in the language. (If we asked the personification of records, he/she would definitely want factory methods, because then we could be justified in making constructors private and instead exposing a factory, as this would actually have linguistic meaning.)

From john.r.rose at oracle.com Thu Oct 17 20:00:54 2019 From: john.r.rose at oracle.com (John Rose) Date: Thu, 17 Oct 2019 13:00:54 -0700 Subject: and factories In-Reply-To: References: Message-ID: <21D1B657-E623-4CAE-B420-0F8F431719D4@oracle.com>

On Oct 17, 2019, at 11:22 AM, Dan Smith wrote:
> Does this seem promising? Any particular reason it's better to overload "<init>" than just come up with a new special name?

For my part either outcome is fine. The prototype overloads <init> but it could almost as well have added <new>.

Fine points in the VM prototype:

- A method must be static, and it can be restricted to return exactly the type of its declaring class, except in "cases".
- In some cases (VMACs and hidden classes) the declaring class is not denotable in a descriptor; the return type must be a super (maybe always Object). So the prototype allows Object as a return type from a static function. I don't remember whether it checks that the declaring class is a VMAC in that case.

Would there be any restrictions on the contents of a constructor/factory method <new>? (I hope not.)

Would there be any enhancements to the capabilities of a function? For example, I think we should consider allowing <new> to invokespecial super.<init> on a new instance, and/or putfield into the final fields of the new instance. If we don't allow this, then translation strategies may have to spin private methods to handle the super call and final field inits, which seems suboptimal to me. (To be clear: I'm thinking of using <new> here in a non-inline class.)
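The declaration and reference rules proposed for `<new>`, plus the prototype's leniency for classes that cannot be named in a descriptor, could be checked roughly as follows. This is a hypothetical Java sketch, not JVM source: the class, enum and method names are made up, descriptors are plain strings, and `allowObject` is an assumed switch standing in for the VMAC/hidden-class case. It relies on `java.lang.reflect.Modifier.STATIC` having the same value (0x0008) as the class-file `ACC_STATIC` flag.

```java
import java.lang.reflect.Modifier;

// Hypothetical sketch of the proposed "<new>" rules; not JVM source code.
final class NewMethodRules {
    enum InvokeKind { INVOKESTATIC, INVOKESPECIAL, INVOKEVIRTUAL, INVOKEINTERFACE }

    // Return-type part of a method descriptor, e.g. "(II)LPoint;" -> "LPoint;".
    // Simplified: assumes no ')' appears inside class names.
    static String returnType(String descriptor) {
        return descriptor.substring(descriptor.lastIndexOf(')') + 1);
    }

    // Declaration rule: must be static and return exactly the declaring class.
    // allowObject models the prototype accepting Object when the declaring
    // class (VMAC, hidden class) cannot be named in a descriptor.
    static boolean isValidDeclaration(String declaringClass, int accessFlags,
                                      String descriptor, boolean allowObject) {
        if (!Modifier.isStatic(accessFlags)) {
            return false;
        }
        String ret = returnType(descriptor);
        return ret.equals("L" + declaringClass + ";")
                || (allowObject && ret.equals("Ljava/lang/Object;"));
    }

    // Reference rule: only invokestatic may name a "<new>" method.
    static boolean isValidReference(InvokeKind kind) {
        return kind == InvokeKind.INVOKESTATIC;
    }
}
```

Because the name alone marks the method as a factory, each check is local and cheap. That is the property the reserved-name design is trying to keep.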
One result of using a different name (<new>) is that there's no need to require that it be static or not. I don't think there's any benefit to requiring that <new> be static. (Well, maybe some: it partitions <new> from any kind of virtual call.) Maybe a non-static <new> could serve as a factory method which takes the current instance and "reconstructs" it as a new instance. But that can be done by wrapping a static <new> into some other method m, and then there's no confusion about making m virtual.

> [1] http://cr.openjdk.java.net/~dlsmith/lw2/lw2-20190628/specs/init-methods-jvms.html

Using something like <new> is a forced move for inline classes. It is also (IMO) a fruitful move for regular non-inline ("identity") classes. If the translation strategy were adjusted to translate every new Foo() expression as invokestatic <new>, the following benefits would appear:

- Less reliance on the verifier to validate arbitrary-in-the-wild "new/dup/invokespecial" code shapes. (It's been buggy in the past.)
- Simpler, more optimizable bytecode for complex expressions like new A(...new B()...), currently a pain point in our JITs.
- A more direct path for migrating "new VT()" expressions from VT as a value-based class to an inline class. (No migration with new/dup/invokespecial.)
- More compact (and analyzable) classfiles, when they contain new A(...) expressions.
- A future option to make the "new instance" instruction be *private* to the class which it is constructing, a probable security benefit.
- A future option to separate, at the language level, the capability of constructing a subclass instance (super()) from requesting a new object (new A()).

John

P.S. About that last option: A public constructor C allows *both* creation of new instances and subclassing. It is difficult to separately control access for these operations. (They correspond to calls to C.super. and to C..)
If it were possible to tease these apart as separate API points (corresponding to the distinct underlying names), then they could be given independent access control (one public, one private, etc.). In fact, a clearer separation would be to call the super-version C.super.. So that super() calls could be translated to invokespecial (with the same powers and responsibilities as for <init> in that position). And new T() calls would be translated to invokestatic <new>. And <init> would serve both at once, in various use cases, but a class translation might have only and , or perhaps and and some private methods to factor out code used by both, locally.

I'll tip my hand here: I think of a <new> method as a "final constructor": it's the use of a constructor in the terminal position, when the requested class is known, and *not* when a random subclass is requesting initialization of one of its progeny. I also think of (or <init> used only by subclasses) as an "abstract constructor". It's the use of a constructor in a non-terminal position, when the requested class is some subclass elsewhere but it needs to call up the super chain for proper instantiation. The analogy with final and abstract methods is not exact, but it is close enough that I think there's something there.

In this mindset, I think of today's <init> as a hack which performs both jobs, even though they are distinct, and of today's constructor notation as defining *both* the and the methods, and indeed stashing the one copy of the code on the <init> hack.

When we get bridging technology, we can declaratively spin bridges from non-private and API points (w/o bodies) into private methods. So the extra distinctions I'm thinking of don't have to end up duplicating bytecodes, in the common case where a class needs to define parallel and API points.
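Java reflection can mimic the idea of treating `new Foo()` as a single factory call. It can also mimic the fallback Dan suggests later in the thread: when the class declares no `<new>`, behave as new/dup/invokespecial `<init>`. This is only an analogy under stated assumptions. Java source cannot declare a method named `<new>`, so a static method named `create` stands in for it, and reflective lookup stands in for linkage. `FactoryLinker`, `WithFactory` and `WithoutFactory` are hypothetical names.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class FactoryLinker {
    // Hypothetical stand-in for "<new>", which Java source cannot declare.
    static final String FACTORY_NAME = "create";

    // "new C(args)" as one operation: use a declared static factory returning
    // exactly C if present; otherwise fall back to the constructor, the
    // analogue of new/dup/invokespecial <init>.
    static Object instantiate(Class<?> c, Class<?>[] paramTypes, Object... args) {
        try {
            try {
                Method m = c.getDeclaredMethod(FACTORY_NAME, paramTypes);
                if (Modifier.isStatic(m.getModifiers()) && m.getReturnType() == c) {
                    m.setAccessible(true);
                    return m.invoke(null, args);
                }
            } catch (NoSuchMethodException e) {
                // No factory declared: fall through to the constructor.
            }
            Constructor<?> ctor = c.getDeclaredConstructor(paramTypes);
            ctor.setAccessible(true);
            return ctor.newInstance(args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + c.getName(), e);
        }
    }
}

// Hypothetical sample class that declares a factory.
final class WithFactory {
    final String via;
    WithFactory(int x) { this.via = "constructor"; }
    private WithFactory(String via) { this.via = via; }
    static WithFactory create(int x) { return new WithFactory("factory"); }
}

// Hypothetical sample class compiled "before" factories existed.
final class WithoutFactory {
    final String via;
    WithoutFactory(int x) { this.via = "constructor"; }
}
```

The fallback lets factory-style call sites work against classes that were compiled without a factory. This is the "no waiting for recompilation" property. The reflective version, of course, has none of the bytecode-size or JIT benefits listed above.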
From brian.goetz at oracle.com Thu Oct 17 20:38:09 2019 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 17 Oct 2019 16:38:09 -0400 Subject: and factories In-Reply-To: <21D1B657-E623-4CAE-B420-0F8F431719D4@oracle.com> References: <21D1B657-E623-4CAE-B420-0F8F431719D4@oracle.com> Message-ID: 

> Fine points in the VM prototype:
>
> Would there be any restrictions on the contents of a constructor/factory method <new>? (I hope not.)

I'd be sad if it were possible for an invocation of a `<new>` method to leave a `null` on the stack.

From daniel.smith at oracle.com Thu Oct 17 22:51:50 2019 From: daniel.smith at oracle.com (Dan Smith) Date: Thu, 17 Oct 2019 16:51:50 -0600 Subject: and factories In-Reply-To: <21D1B657-E623-4CAE-B420-0F8F431719D4@oracle.com> References: <21D1B657-E623-4CAE-B420-0F8F431719D4@oracle.com> Message-ID: <400DAEF9-E917-4564-8D71-144A7B95EB12@oracle.com>

> On Oct 17, 2019, at 2:00 PM, John Rose wrote:
>
> If the translation strategy were adjusted to translate every
> new Foo() expression as invokestatic <new>

One interesting thing we might consider: 'invokestatic Foo.<new>(...)...' could implicitly be rewritten to 'new Foo; dup; invokespecial ...' if there is no declared '<new>' method. This lets us immediately implement the translation strategy you propose, without waiting for classes to be recompiled. It also gives us an atomic "newdupinit" bytecode for free.

(I recognize, however, that most implicit magic like this that we've considered in the past has been dropped because the JVM doesn't like implicit magic.)

From john.r.rose at oracle.com Fri Oct 18 17:46:59 2019 From: john.r.rose at oracle.com (John Rose) Date: Fri, 18 Oct 2019 10:46:59 -0700 Subject: and factories In-Reply-To: References: <21D1B657-E623-4CAE-B420-0F8F431719D4@oracle.com> Message-ID: 

On Oct 17, 2019, at 1:38 PM, Brian Goetz wrote:
>
>> Fine points in the VM prototype:
>>
>> Would there be any restrictions on the contents of a constructor/factory method <new>? (I hope not.)
>
> I'd be sad if it were possible for an invocation of a `<new>` method to leave a `null` on the stack.

Yes. And should a factory contract sometimes include a guarantee of an exact type for the non-null return value? (Maybe yes, sometimes no. Probably null is always wrong; don't call that a factory.)

So this leads to one or two use cases for type operators:

1. Non-null decoration on descriptor. Could be a template specialization NonNull<*C> where C is the return value and the thing with * is reified. All factories should return this. (Could be LC//NonNull; or LNonNull//C; or LC[NonNull]; or LNonNull[C]; as a decoration syntax for descriptors. Various other considerations would determine the actual bike shed color.)

2. Exact-type decoration on descriptor. Could be another template specialization Exact<*C>. Exact types are sometimes nice to have, although they make API points very rigid. Sometimes that's the goal.

John
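Until descriptor decorations like `NonNull<*C>` or `Exact<*C>` exist, today's Java can at best approximate the two factory contracts with runtime checks at the call site. The sketch below uses hypothetical names. It wraps a factory (`Supplier`) and rejects a `null` result, and for the exact-type contract it also rejects a subclass instance. It illustrates the contracts only and says nothing about how a descriptor decoration would be encoded or verified.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical runtime approximation of the proposed factory contracts.
final class FactoryContracts {
    // Non-null contract: a factory must never produce null.
    static <T> T nonNull(Supplier<? extends T> factory) {
        return Objects.requireNonNull(factory.get(), "factory returned null");
    }

    // Exact-type contract: the result must be exactly 'type', not a subclass.
    static <T> T exact(Class<T> type, Supplier<? extends T> factory) {
        T result = nonNull(factory);
        if (result.getClass() != type) {
            throw new ClassCastException("expected exactly " + type.getName()
                    + " but got " + result.getClass().getName());
        }
        return result;
    }
}
```

The exact check is deliberately rigid. For example, `exact(Number.class, () -> 1)` fails because the result is an `Integer`, which shows the trade-off of exact types for API points.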