From daniel.smith at oracle.com Mon Oct 5 19:18:40 2020
From: daniel.smith at oracle.com (Dan Smith)
Date: Mon, 5 Oct 2020 13:18:40 -0600
Subject: Terminology update: primitive objects
Message-ID:

We've been struggling with some uncomfortable rough edges of the "inline class"/"inline type" terminology for a while. After multiple rounds of bikeshedding, here's an alternative that the Oracle team feels pretty good about:

- A *primitive object* is a new kind of object that lacks identity. It has special behavior with respect to equality, synchronization, and related operations. *Identity object* describes all other objects (including arrays). (The new objects are "primitive" in the sense that they are lighter-weight and represent simple, immutable values.)

- A *primitive class* (formerly *inline class*) is a special class whose instances are primitive objects. Primitive classes are always concrete and final, and their declarations are subject to various restrictions. A class that is not primitive is either an *identity class* or an *abstract class* (or Object, if we don't end up making it abstract).

- A *primitive value type* (formerly *inline type*) is a type whose values are primitive objects: the objects themselves, not *references* to the objects. Each primitive class has a primitive value type, typically denoted by the class name.

- A *primitive reference type* is a type whose values are references to primitive objects, or null. Each primitive class has a primitive reference type, typically denoted as ClassName.ref.

- The general term *primitive type* refers to either a primitive value type or a primitive reference type. The general term *reference type* continues to mean a type whose values are references to objects (of unspecified kind), or null.

- In the Java language, the existing primitive values will become primitive objects, with java.lang.Integer, etc., acting as their primitive classes.
When needed, *built-in primitive value type*, etc., can be used to refer to their types. In the JVM, something like *primitive object type* might be appropriate to distinguish between primitive objects and the built-in numerics.

The type terminology leans on intuitions about "pass by value" and "pass by reference". Some languages pass *variables* by value or reference, but in Java the, er, primitive, is passing *objects* by value or by reference. Similar properties apply in both contexts.

---

This is a brief sketch, just enough to define the terms. Future documents, including project overviews, JEPs, and specs, will illustrate use of the terms in the broader context of the language and VM.

From brian.goetz at oracle.com Mon Oct 5 19:22:15 2020
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 5 Oct 2020 15:22:15 -0400
Subject: Terminology update: primitive objects
In-Reply-To:
References:
Message-ID:

There's always been a duality between whether inline classes are "faster objects" or "user-programmable primitives." Until now, we've been erring on the "faster objects" side of the line, but after some thought, we think that flipping the perspective will better frame what these new features are for.

The obvious stumble you have to get over before this idea is appealing is "but these are not primitives." So we have to be explicit about that. But, once you buy that, I think these terms work much better.

From forax at univ-mlv.fr Wed Oct 7 10:19:06 2020
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 7 Oct 2020 12:19:06 +0200 (CEST)
Subject: Terminology update: primitive objects
In-Reply-To:
References:
Message-ID: <382168465.1772693.1602065946788.JavaMail.zimbra@u-pem.fr>

I'm fine with that change.

For me, it's more where you put the emphasis
- how it behaves on stack, it behaves like a primitive type (the "pass by value" Dan is talking about)
- how it behaves on heap, it behaves by inlining itself in its container.

So they are a primitive inlining class :)

Rémi

From brian.goetz at oracle.com Wed Oct 7 13:37:37 2020
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 7 Oct 2020 09:37:37 -0400
Subject: Terminology update: primitive objects
In-Reply-To:
References: <382168465.1772693.1602065946788.JavaMail.zimbra@u-pem.fr>
Message-ID: <7f381f09-7e80-69f4-4f0c-64dd8006fcb2@oracle.com>

Yes, though a more thorough investigation is in order if we think this is a good direction.

In the first round, the main problem was the use of "value" as both noun and adjective, giving us weird locutions like "the value set of a value type", or "if the value is a value". This was clearly going to be a problem.

In this round, the adjustments are more straightforward. For example, we will likely have to adjust phrases like "primitive values" to something like "primitive objects", and "primitive types" to either "primitive value types" or "built-in primitive value types", depending on context. But most of these involve being _more precise_ about the terminology.

On 10/7/2020 9:14 AM, Dan Heidinga wrote:
> One of the things that moved us away from the use of the term "value"
> was the violence it would do to the Java and JVM specs. Has a similar
> analysis been done on how using primitive objects will affect the
> existing specs?
>
> --Dan

From daniel.smith at oracle.com Wed Oct 7 14:28:55 2020
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 7 Oct 2020 08:28:55 -0600
Subject: EG meeting, 2020-10-07
Message-ID: <2D5E7679-DDE8-4056-8552-98763F29321F@oracle.com>

The next EG Zoom meeting is today at 4pm UTC (9am PDT, 12pm EDT).

Recent threads to discuss:

- "Terminology update: primitive objects": I outline new terminology we're proposing to talk about Valhalla's values, classes, and types

May be a short meeting, depending on how long this topic takes.
From forax at univ-mlv.fr Wed Oct 7 14:47:55 2020
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 7 Oct 2020 16:47:55 +0200 (CEST)
Subject: EG meeting, 2020-10-07
In-Reply-To: <2D5E7679-DDE8-4056-8552-98763F29321F@oracle.com>
References: <2D5E7679-DDE8-4056-8552-98763F29321F@oracle.com>
Message-ID: <826146368.2032522.1602082075415.JavaMail.zimbra@u-pem.fr>

RestrictedField ?
https://bugs.openjdk.java.net/browse/JDK-8254022

Rémi

From brian.goetz at oracle.com Wed Oct 7 15:30:24 2020
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 7 Oct 2020 11:30:24 -0400
Subject: Terminology update: primitive objects
In-Reply-To:
References: <382168465.1772693.1602065946788.JavaMail.zimbra@u-pem.fr> <7f381f09-7e80-69f4-4f0c-64dd8006fcb2@oracle.com>
Message-ID:

> From a user perspective, will the new terms encourage the right mental
> model for when value types get flattened (or not)? As an EG, early on
> we spent a lot of time discussing "flattened" vs "flattenable" and
> trying to guide users to the view that this is a VM-level decision and
> not a guarantee. Does leaning on the "pass by value" intuition undo
> that previous work?

The intent of this taxonomy is indeed to encourage the correct mental models (now that we actually understand what the correct mental model is.)

The centerpiece of the current language design is that, for primitive classes, there are two ways to "describe" an instance: directly and indirectly.
The varying modes of description apply (among others) to fields (flattening) and calling conventions (scalarization) and variables (does the variable hold a value, or a reference.) The intuition of "by value" and "by reference" is intended to capture all of these differences; do you store the value, or a reference to the value? Do you pass a value, or a reference to the value? Is the type of `t` a value, or a reference to a value?

As a statically typed language, we capture this distinction by dividing between _primitive value types_ and _primitive reference types_ (a form of reference type, an existing concept.) The modifiers "value" and "reference" are chosen to evoke "by value" and "by reference", whether for passing, storing, or describing. It took a long time to realize this was essentially a forced move, since the value set of interface types consists of references to objects, and if we want primitive objects to implement interfaces, then there must be a way to describe them with references as well as directly.

The reference vs value characterization is semantic, but reasonably correlates with expectations about runtime behavior. Today, with reference types, we assume that they will _routinely_ be implemented with pointers (and they are), but in some cases, the VM figures out it can scalarize or constant fold (and we don't mind). Similarly, with primitive value types, we assume they will be _routinely_ flattened, but the ultimate decision is the VM's, and there may be circumstances when the VM decides on an indirect representation. So these intuitions are aimed at setting reasonable expectations, but not guarantees.
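
As a rough analogy only, the value/reference split described above can already be seen in today's Java with the built-in `int` and its box `Integer`. The sketch below does not use any proposed primitive class syntax (this thread does not define one); the class name `ValueVsReference` and the helper methods `twice` and `compareToZero` are hypothetical names chosen for illustration. It shows a variable that holds a value directly, a variable that holds a reference (and so may be null), and an interface-typed variable, which can only ever hold a reference.

```java
// Analogy in current Java: int plays the role of a "value type",
// Integer the role of the corresponding "reference type".
class ValueVsReference {
    // Hypothetical helper: the parameter is described directly, by value.
    public static int twice(int v) {
        return v * 2;
    }

    // Hypothetical helper: the parameter has an interface type, so the
    // argument is always described indirectly, by reference.
    public static int compareToZero(Comparable<Integer> c) {
        return c.compareTo(0);
    }

    public static void main(String[] args) {
        int byValue = 42;                            // holds the value itself, never null
        Integer byReference = byValue;               // holds a reference to a boxed value
        Comparable<Integer> viaInterface = byValue;  // boxing, then widening to the interface

        System.out.println(twice(byValue));               // prints 84
        System.out.println(compareToZero(viaInterface));  // prints 1
        System.out.println(byReference.equals(byValue));  // prints true

        Integer absent = null;                       // only the reference form admits null
        System.out.println(absent == null);          // prints true
    }
}
```

Whether the JVM actually allocates a box for `viaInterface` is an implementation decision, which matches the point that these intuitions set expectations rather than guarantees.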
From mcnepp02 at googlemail.com Fri Oct 16 07:41:57 2020
From: mcnepp02 at googlemail.com (Gernot Neppert)
Date: Fri, 16 Oct 2020 09:41:57 +0200
Subject: RFR: 8254275: some more value-based classes
Message-ID:

I think I found some more classes that satisfy all the requirements for "value-based classes" (as listed in ValueBased.htm):

java.net.Inet4Address
java.net.Inet6Address
java.nio.file.attribute.FileTime

Additionally, I was wondering if the deprecations of public constructors that were done on the primitive wrapper classes should not also be done on the following classes, making them properly "value-based" in the future:

java.util.UUID
java.math.BigDecimal
java.math.BigInteger

From daniel.smith at oracle.com Tue Oct 20 00:01:16 2020
From: daniel.smith at oracle.com (Dan Smith)
Date: Mon, 19 Oct 2020 18:01:16 -0600
Subject: Source code analysis: calls to wrapper class constructors
Message-ID:

In the context of the Warnings for Value-Based Classes JEP, we're looking for usages of the deprecated wrapper class constructors ('new Integer(...)', 'new Double(...)', etc.). When do these get used? How often is this motivated by wanting a unique object vs. legacy code that has no particular reason not to use 'valueOf'?

We've got some investigations going on at Oracle, including looking at some open-source projects, but I'm interested in examples from other companies/projects as well. Please investigate and share what you find!

I'll reply in a few days with our analysis for the code we look at.

From daniel.smith at oracle.com Wed Oct 21 14:20:49 2020
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 21 Oct 2020 08:20:49 -0600
Subject: EG meeting *canceled*, 2020-10-21
Message-ID: <2ECA508B-DAE4-429D-A6FC-E35CACC65E00@oracle.com>

I don't think we've got anything to discuss today, so let's cancel.

Please *do* weigh in on the "source code analysis" thread if you can provide some perspective from any major Java projects.
From daniel.smith at oracle.com Wed Oct 21 14:32:31 2020
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 21 Oct 2020 08:32:31 -0600
Subject: Source code analysis: calls to wrapper class constructors
In-Reply-To:
References:
Message-ID: <05FD7627-7C9F-4E58-AA64-98684E4868CB@oracle.com>

Some initial areas of particular concern:

- How disruptive will it be if legacy jar files stop running on the JVM? How often will projects in, say, 2023 depend on binaries that haven't been recompiled since 9, when the constructors were deprecated? (We've seen evidence that many projects, a lot of Apache APIs for example, fixed up their code in response to the deprecation warnings.)

- How disruptive will it be if programs need to update their older shaded sources (repackaged sources copied from another project) before they can compile? These sources are more likely to go unmaintained, preserving 'new Integer' calls from years ago.
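
For anyone wanting a quick first look at their own (or shaded) sources, a text-based scan along these lines might be a starting point. This is only a sketch under stated assumptions: it is a plain regular-expression match over source text, not the jar/class-file analysis reported in this thread, so it will also count occurrences inside comments and string literals and will miss calls hidden behind imports of same-named classes. The class name `WrapperConstructorScan` and the method `countWrapperConstructorCalls` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough, text-only scan for calls such as 'new Integer(...)' or
// 'new java.lang.Double(...)'. Not a parser: comments and string
// literals are not excluded.
class WrapperConstructorScan {
    // The eight wrapper classes whose constructors were deprecated in 9.
    private static final Pattern WRAPPER_NEW = Pattern.compile(
        "\\bnew\\s+(?:java\\.lang\\.)?"
        + "(?:Boolean|Byte|Character|Short|Integer|Long|Float|Double)\\s*\\(");

    // Counts textual matches of a wrapper constructor call in the given source.
    public static int countWrapperConstructorCalls(String source) {
        Matcher m = WRAPPER_NEW.matcher(source);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String sample =
            "Object a = new Integer(1);\n"
            + "Object b = Integer.valueOf(2);\n"
            + "Object c = new java.lang.Double(3.0);\n"
            + "Object d = new IntegerHolder();\n";
        System.out.println(countWrapperConstructorCalls(sample)); // prints 2
    }
}
```

The pattern requires an opening parenthesis right after the class name, so array creation ('new Integer[3]') and unrelated classes such as the made-up 'IntegerHolder' are not counted.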
From daniel.smith at oracle.com Fri Oct 23 06:21:42 2020
From: daniel.smith at oracle.com (Dan Smith)
Date: Fri, 23 Oct 2020 00:21:42 -0600
Subject: Source code analysis: calls to wrapper class constructors
In-Reply-To: <05FD7627-7C9F-4E58-AA64-98684E4868CB@oracle.com>
References: <05FD7627-7C9F-4E58-AA64-98684E4868CB@oracle.com>
Message-ID:

Here are some numbers looking at Maven jars published since 2019:

116,532 jars
4,726 jars that invoke a wrapper constructor (4.1%)
1,031 jars that have more than 10 classes invoking a wrapper constructor (0.9%)

If we focus just on the largest projects (jars with >1000 classes):

3,315 jars
1,620 jars that invoke a wrapper constructor (48.9%)
805 jars that have more than 10 classes invoking a wrapper constructor (24.3%)
133 jars that have more than 100 classes invoking a wrapper constructor (4.0%)

Are these intended to be run on bleeding-edge JVMs? Class files show the following version breakdown:

Pre-5: 2%
5-6: 34%
7: 15%
8: 46%
11: 2%
Others (non-LTS): 0%

What this doesn't tell us is the *latest* JVM a jar file is intended to support. We can at least say that newly-released jars targeting JDK 8 or earlier will be with us for quite some time.

The wrapper constructors were deprecated in 9, and the post-8 sample size is fairly small, but there's not an obvious trend towards reduced use of wrapper constructors in those classes.

On the other hand, when you dig into specific projects, I *do* see plenty of examples of projects that have updated their code, apparently in response to the deprecation warnings.

Some specific examples, chosen pretty much at random among the worst offenders:

- Apache Commons: The legacy commons.lang and commons.collections packages have many constructor calls, but these have been fixed in the newer APIs commons.lang3 and commons.collections4

- Apache FOP: Lots of constructor calls in generated code (representing fonts); these have all been fixed as of FOP 2.2, released in 2017.

- Clover repackages old versions of utility libraries like fastutil and Apache Commons, which make heavy use of the wrapper constructors. Both of these projects have been updated to use 'valueOf', but the repackaged sources preserve the old code.

- The GemFire project has a lot of constructor calls, but almost all appear in tests.

- org.omg.CORBA makes heavy use of the constructors, but is also unmaintained code as of JEP 320 (I think?). Would there be an expectation that these libraries continue to run on a new JVM in a few years?

- OpenCMS is an example of an active project that calls the constructor in what appears to be old code, e.g., manually boxing to Object when storing primitives in collections.

From daniel.smith at oracle.com Sat Oct 24 01:20:36 2020
From: daniel.smith at oracle.com (Dan Smith)
Date: Fri, 23 Oct 2020 19:20:36 -0600
Subject: Source code analysis: calls to wrapper class constructors
In-Reply-To:
References: <05FD7627-7C9F-4E58-AA64-98684E4868CB@oracle.com>
Message-ID: <869708CD-F3A2-4866-90E5-A13B156810C5@oracle.com>

One more data-gathering exercise: I took a closer look at some popular Maven projects to see how they've evolved in their use of wrapper constructors.

-----

junit:junit
< 4.12 (<2014): many problems in junit.framework.Assert & org.junit.Assert
4.12-4.13 (2014-2020): a few problems in junit.framework.Assert
4.13.1 (2020): fixed

org.ow2.asm:asm
< 6.0 (<2017): many problems in org.objectweb.asm.ClassReader & Opcodes
6.0 (2017): problem in org.objectweb.asm.Opcodes
6.1+ (2018+): no problems

commons-lang:commons-lang & org.apache.commons:commons-lang3
< 3.0 (<2011): many problems
3.0 (2011): many problems
3.1+ (2011+): fixed

com.fasterxml.jackson.core:jackson-databind
< 2.9.0 (<2017): one call in com.fasterxml.jackson.databind.deser.std.NumberDeserializers
2.9.0+ (2017+): fixed

javax.xml.bind:jaxb-api
1.0 (2006): two problems
2.0+ (2006+): fixed

Log4j (log4j:log4j & org.apache.logging.log4j:log4j-core)
< 2.0 (<2014): multiple problems
2.0+ (2014+): no issues

org.mockito:mockito-core
< 2.1.0 (<2016): many problems, including in repackaged ASM
2.1.0+ (2016+): fixed

org.scala-lang:scala-library
< 2.8.0 (<2010): problems in scala.Console$, scala.Predef$, scala.mobile.Code, scala.runtime.BoxesUtility
2.8.0-2.9.3 (2010-2013): one issue, scala.actors.threadpool.locks.ReentrantReadWriteLock
2.10.0+ (2012+): fixed

org.hibernate:hibernate-core
< 4.0 (<2011): many, many, problems
4.0.0-5.4.22 (2011-2020): a few problems
6.0.0+ (not yet released): fixed

ch.qos.logback:logback-classic
< 1.0.7 (<2012): lots of problems
1.0.7+ (2012): fixed

org.clojure
< 1.10.0 (<2018): repackages ASM, a few other problems
1.10.0+ (2018+): fixed

com.google.code.gson
< 1.4 (<2010): problems in com.google.gson.DefaultInstanceCreators & JsonParser
1.4+ (2010+): fixed

org.jetbrains.kotlin:kotlin-stdlib
< 1.3.0 (<2018): no problems
1.3.0+ (2018+): new problems in kotlin.coroutines.jvm.internal.Boxing

No problems:
org.junit.jupiter:junit-jupiter-api (earliest release 2017)
com.google.guava:guava (earliest release 2011)
org.apache.commons.commons-collections4 (earliest release 2013)
com.fasterxml.jackson.core:jackson-core (earliest release 2012)
org.slf4j:slf4j-api (earliest release 2006)
org.slf4j:slf4j-log4j12 (earliest release 2006)
org.apache.logging.log4j:log4j-core (earliest release 2014)
commons-io:commons-io (earliest release 2005)
javax.servlet:javax.servlet-api (earliest release 2011)
org.apache.httpcomponents:httpclient (earliest release 2009)

-----

It does seem like the deprecation warnings introduced in 9 are working: lots of projects have responded by changing their code.

A few cases worth special attention:

- JUnit 4 was only very recently fixed (and it's in maintenance mode, most clients use 4.11 or 4.12, not 4.13)

- Hibernate still has problems until its coming 6.0 release

- Kotlin added (!) some constructor calls in 2018, and they're still there

- Clojure and ASM were fixed pretty recently

Still, it's a positive sign that only one of these projects is making wrapper constructor calls in its latest sources.

From brian.goetz at oracle.com Mon Oct 26 17:28:24 2020
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 26 Oct 2020 13:28:24 -0400
Subject: Source code analysis: calls to wrapper class constructors
In-Reply-To: <869708CD-F3A2-4866-90E5-A13B156810C5@oracle.com>
References: <05FD7627-7C9F-4E58-AA64-98684E4868CB@oracle.com> <869708CD-F3A2-4866-90E5-A13B156810C5@oracle.com>
Message-ID:

Overall, I find this data quite encouraging. It says that when we finally deprecate something, the warnings get noticed, and a lot of uses are fixed within a few years. Surely raising DepCon to FOR_REMOVAL will have additional effect.

From daniel.smith at oracle.com Tue Oct 27 19:27:39 2020
From: daniel.smith at oracle.com (Dan Smith)
Date: Tue, 27 Oct 2020 13:27:39 -0600
Subject: Source code analysis: calls to wrapper class constructors
In-Reply-To:
References:
Message-ID:


So, some conclusions that we've drawn:

- In 2020, the constructor calls are fairly pervasive, even in recently released binaries. Removing these constructors may be the most disruptive single API change we've ever made.

- The trend is good: serious projects have mostly responded to the deprecation warnings introduced in 9. In 2024 (for example), the picture may be much better.

- It is impossible, given the current JVM model for primitive classes, for Integer to both be a primitive class and support 'new java/lang/Integer'. Binaries calling the deprecated constructors simply won't work.

- There is a meaningful semantic difference, in terms of promising unique identity, between code that does 'new Integer' and code that does 'Integer.valueOf' or implicitly boxes. The latter is better aligned with the class's future behavior as a primitive class.

- Almost no usages care about the identity distinction; they just want boxing. But it's difficult to automatically detect the usages that do.

These points lead to the following tentative plan:

We'll proceed as planned with deprecation for removal. The message for all Java programs is to stop using these constructors if you want to run in future JDKs. Typically, the simplest refactoring is to replace 'new Integer(x)' with just 'x', but in some cases you might want 'Integer.valueOf(x)'. When you refactor, you should confirm that the change in identity semantics is acceptable.

We'll encourage IDEs to provide tooling for updating sources, including batch processing. (As an example, IntelliJ already highlights deprecated APIs and suggests conversions, although I'm not familiar with its batch processing features.)

For legacy binaries that have not been updated but that need to run on a future JDK, we'll provide tooling to rewrite their bytecode containing constructor calls, either as a preprocessing step on jar files, or as a runtime command-line argument.
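
The identity difference between 'new Integer' and 'Integer.valueOf' noted in the conclusions above can be illustrated with a minimal sketch. It assumes a current JDK on which the deprecated constructors still exist; the class and method names are hypothetical and chosen only for illustration.

```java
// Illustrative sketch only; class and method names are hypothetical.
final class WrapperIdentitySketch {

    // 'new Integer' promises a fresh object identity on every call,
    // so two instances built from the same int are distinct references.
    @SuppressWarnings({"deprecation", "removal"})
    static boolean constructorYieldsDistinctObjects(int x) {
        Integer a = new Integer(x);
        Integer b = new Integer(x);
        return a != b; // reference comparison, not value comparison
    }

    // 'Integer.valueOf' makes no promise of a unique identity;
    // callers should rely only on value equality.
    static boolean valueOfYieldsEqualValues(int x) {
        return Integer.valueOf(x).equals(Integer.valueOf(x));
    }
}
```

Code that depends on the first method returning true is the kind of usage that needs manual review before replacing the constructor call with 'Integer.valueOf(x)' or plain boxing.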

This tooling will support common bytecode patterns like 'new Foo; dup; ...; invokespecial Foo.<init>;', but will not be a comprehensive solution. (Mimicking the behavior of instance initialization method invocation in full generality would be a very difficult task.) It will also carry behavioral compatibility risks. For these reasons, it will be available as a workaround, not as default JVM behavior.

From forax at univ-mlv.fr Tue Oct 27 20:00:08 2020
From: forax at univ-mlv.fr (Remi Forax)
Date: Tue, 27 Oct 2020 21:00:08 +0100 (CET)
Subject: Source code analysis: calls to wrapper class constructors
In-Reply-To:
References:
Message-ID: <941144237.933767.1603828808457.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "daniel smith"
> To: "valhalla-spec-experts"
> Sent: Tuesday, October 27, 2020 20:27:39
> Subject: Re: Source code analysis: calls to wrapper class constructors

>> On Oct 19, 2020, at 6:01 PM, Dan Smith wrote:
>>
>> In the context of the Warnings for Value-Based Classes JEP, we're looking for
>> usages of the deprecated wrapper class constructors ('new Integer(...)', 'new
>> Double(...)', etc.). When do these get used? How often is this motivated by
>> wanting a unique object vs. legacy code that has no particular reason not to
>> use 'valueOf'?
>>
>> We've got some investigations going on at Oracle, including looking at some
>> open-source projects, but I'm interested in examples from other
>> companies/projects as well. Please investigate and share what you find!
>>
>> I'll reply in a few days with our analysis for the code we look at.
>
> So, some conclusions that we've drawn:
>
> - In 2020, the constructor calls are fairly pervasive, even in recently released
> binaries. Removing these constructors may be the most disruptive single API
> change we've ever made.
>
> - The trend is good: serious projects have mostly responded to the deprecation
> warnings introduced in 9. In 2024 (for example), the picture may be much
> better.
>
> - It is impossible, given the current JVM model for primitive classes, for
> Integer to both be a primitive class and support 'new java/lang/Integer'.
> Binaries calling the deprecated constructors simply won't work.
>
> - There is a meaningful semantic difference, in terms of promising unique
> identity, between code that does 'new Integer' and code that does
> 'Integer.valueOf' or implicitly boxes. The latter is better aligned with the
> class's future behavior as a primitive class.
>
> - Almost no usages care about the identity distinction; they just want boxing.
> But it's difficult to automatically detect the usages that do.
>
> These points lead to the following tentative plan:
>
> We'll proceed as planned with deprecation for removal. The message for all Java
> programs is to stop using these constructors if you want to run in future JDKs.
> Typically, the simplest refactoring is to replace 'new Integer(x)' with just
> 'x', but in some cases you might want 'Integer.valueOf(x)'. When you refactor,
> you should confirm that the change in identity semantics is acceptable.
>
> We'll encourage IDEs to provide tooling for updating sources, including batch
> processing. (As an example, IntelliJ already highlights deprecated APIs and
> suggests conversions, although I'm not familiar with its batch processing
> features.)
>
> For legacy binaries that have not been updated but that need to run on a future
> JDK, we'll provide tooling to rewrite their bytecode containing constructor
> calls, either as a preprocessing step on jar files, or as a runtime
> command-line argument.
>
> This tooling will support common bytecode patterns like 'new Foo; dup; ...;
> invokespecial Foo.<init>;', but will not be a comprehensive solution.
> (Mimicking the behavior of instance initialization method invocation in full
> generality would be a very difficult task.) It will also carry behavioral
> compatibility risks.
> For these reasons, it will be available as a workaround,
> not as default JVM behavior.

Three remarks:
- the compiler can warn because code is using new Integer(value), or warn because the compiler will automatically transform all new Integer(value) to Integer.valueOf() once Valhalla is integrated.
  I prefer the second solution: a warning message saying that in the future new Integer() will not be supported and that all new Integer(...) will be transformed by the compiler automatically to Integer.valueOf().
- the introduction of strong encapsulation in 9 was a very similar challenge. The first thing to do is to raise awareness; having a warning at runtime (so emitted by the VM) per callsite using new Wrapper(...) will help a lot. People can easily detect if they are using a dependency that uses something that will not be backward compatible (this warning should be emitted by Java 16+, because there is no point in waiting and because, with JEP 396, there will be a second wave of people wanting to update the libraries they maintain, so they can fix both issues in one pass).
- IDEs should inspect the jars downloaded by Maven and Gradle, and report the use of deprecated-for-removal APIs, again to raise awareness. A warning directly on the tag dependency in the POM saying that this dependency (or one of its sub-dependencies) is using deprecated-for-removal APIs will help a lot.

Rémi

From daniel.smith at oracle.com Tue Oct 27 20:36:21 2020
From: daniel.smith at oracle.com (Dan Smith)
Date: Tue, 27 Oct 2020 14:36:21 -0600
Subject: Source code analysis: calls to wrapper class constructors
In-Reply-To: <941144237.933767.1603828808457.JavaMail.zimbra@u-pem.fr>
References: <941144237.933767.1603828808457.JavaMail.zimbra@u-pem.fr>
Message-ID: <0257B596-D6E1-43D9-8B0D-68F729D243B8@oracle.com>

> On Oct 27, 2020, at 2:00 PM, Remi Forax wrote:
>
> Three remarks:
> - the compiler can warn because a code is using new Integer(value) or warn because the compiler will automatically transform all new Integer(value) to Integer.valueOf() once Valhalla is integrated,
> i prefer the second solution, having a warning message saying that in the future new Integer() will not be supported and that all new Integer(...) will be transformed by the compiler automatically to Integer.valueOf().

I think the idea here is that if the compiler were doing this today, it would reduce the number of binaries that will fail in the future. But... it won't eliminate the problem. I think we'd still need to do something about binaries that are *already* published (e.g., JUnit 4.12), or that target a pre-16 JDK.

So given that a workaround for binaries will exist, there's not a lot to be gained by complicating the source-level story. And in the minus column, some areas of concern:

- We'd have to change the language in 16 to special-case what class instance creation expressions mean.

- We'd be breaking behavioral compatibility, with no way to opt out (if, say, you need unique identities, and don't plan to deploy on future JDKs).

- There's no change in the compile-time experience: using either deprecation or these special warnings, you'd be stuck with warnings until you rewrote your code.

Treating this as a standard application of deprecation, with behavioral changes triggered by source code changes, seems like a more attractive solution.

> - the introduction of the strong encapsulation in 9 was a very similar challenge, the first thing to do is to raise awareness, having a warning at runtime (so emitted by the VM) per callsite using new Wrapper(...)
> will help a lot. People can detect easily if they are using a dependency that use something that will not be backward compatible (this warning should be emitted by Java 16+, because there is no point to wait
> and because of JEP 396, there will be a second wave of people wanting to update the library they maintain, so they can fixed both issues in one pass.

Besides javac, one out-of-the-box tool we already have is jdeprscan, which since 9 reports any wrapper constructor calls in a given jar file.

I'm not sure whether there's a mechanism in HotSpot to generate warnings about deprecated APIs at link/run time. It does seem like it would be a reasonable feature...

> - IDE should inspect the jars downloaded by Maven and Gradle, and report the use of deprecated for removal APIs, again to raise awareness,
> a warning directly on the tag dependency in the POM saying that this dependency (or one of it's sub-dependencies) is using deprecated for removal APIs will help a lot.

Yes, IDEs providing their own jdeprscan-style diagnostics would be quite helpful.

From john.r.rose at oracle.com Wed Oct 28 04:47:13 2020
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 27 Oct 2020 21:47:13 -0700
Subject: Source code analysis: calls to wrapper class constructors
In-Reply-To: <0257B596-D6E1-43D9-8B0D-68F729D243B8@oracle.com>
References: <941144237.933767.1603828808457.JavaMail.zimbra@u-pem.fr> <0257B596-D6E1-43D9-8B0D-68F729D243B8@oracle.com>
Message-ID: <56A201C4-CA61-4BF6-8D32-9BDD92C98331@oracle.com>

On Oct 27, 2020, at 1:36 PM, Dan Smith wrote:
>
> I'm not sure whether there's a mechanism in HotSpot to generate warnings about deprecated APIs at link/run time. It does seem like it would be a reasonable feature...

+1

There is no such feature at present; maybe something could be built on top of the debugger interface. A quick look at the event index [1] does not turn up dynamic linkage (resolution) events, which would be the obvious place to start.

[1]: https://docs.oracle.com/javase/1.5.0/docs/guide/jvmti/jvmti.html#EventIndex

From john.r.rose at oracle.com Wed Oct 28 04:56:29 2020
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 27 Oct 2020 21:56:29 -0700
Subject: Source code analysis: calls to wrapper class constructors
In-Reply-To:
References:
Message-ID:

On Oct 27, 2020, at 12:27 PM, Dan Smith wrote:
>
> This tooling will support common bytecode patterns like 'new Foo; dup; ...; invokespecial Foo.<init>;', but will not be a comprehensive solution. (Mimicking the behavior of instance initialization method invocation in full generality would be a very difficult task.)

One of the reasons it's not going to be comprehensive is code like new Integer(complicatedExpr()), in which the `new` and `invokespecial <init>` are separated by (almost) arbitrarily complex bytecode. The two instructions don't even have to be in the same basic block (at the bytecode level):

    new Integer(foo() ? bar() : baz())
    // compiles to 4 BB's in a diamond

If we add switch expressions with large sub-blocks, I think we get peak separation of the start and end parts of the new/init dance:

    new Integer(switch (x) {
      case 1 -> { complicatedBlock: try { ... } catch ... ; return 0;
      default -> { for (;;) ... }} )

All of this gives me yet one more reason we would have been better off with factory methods instead of open-coding the new/init dance. It was, in hindsight, a false economy to open code the object creation "guts" instead of putting them in factory API points.

And with an eye toward future evolutions of legacy code (legacy code not yet in existence!), and uniformity with the factory methods of inline classes, let's try harder to get rid of the new/init dance for identity objects.
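
The separation of `new` and `invokespecial <init>` across basic blocks can be seen in a small case. This is only a sketch, assuming a current JDK on which the constructor still exists; the class and method names are hypothetical, and the bytecode in the comment is the shape javac typically produces (as `javap -c` would show it, with offsets and constant-pool indices omitted), not a guaranteed output.

```java
// Illustrative sketch only; class and method names are hypothetical.
final class NewInitDanceSketch {

    @SuppressWarnings({"deprecation", "removal"})
    static Integer make(boolean flag) {
        // Typical javac shape for the expression below:
        //
        //       new           java/lang/Integer      <- start of the dance
        //       dup
        //       iload_0
        //       ifeq          ELSE                   <- block boundary
        //       iconst_1
        //       goto          JOIN
        // ELSE: iconst_2
        // JOIN: invokespecial java/lang/Integer.<init>(I)V   <- end of the dance
        //       areturn
        //
        // The 'new' and the 'invokespecial' sit in different basic blocks,
        // with two uninitialized references held on the stack in between.
        return new Integer(flag ? 1 : 2);
    }
}
```

A rewriting tool has to pair the two instructions across the conditional branch, which is why simple pattern matching on adjacent instructions is not enough.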
John

From forax at univ-mlv.fr Wed Oct 28 09:25:59 2020
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 28 Oct 2020 10:25:59 +0100 (CET)
Subject: Source code analysis: calls to wrapper class constructors
In-Reply-To:
References:
Message-ID: <1230117334.1075618.1603877159693.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "John Rose"
> To: "daniel smith"
> Cc: "valhalla-spec-experts"
> Sent: Wednesday, October 28, 2020 05:56:29
> Subject: Re: Source code analysis: calls to wrapper class constructors

> On Oct 27, 2020, at 12:27 PM, Dan Smith wrote:
>>
>> This tooling will support common bytecode patterns like 'new Foo; dup; ...;
>> invokespecial Foo.<init>;', but will not be a comprehensive solution.
>> (Mimicking the behavior of instance initialization method invocation in full
>> generality would be a very difficult task.)
>
> One of the reasons it's not going to be comprehensive
> is code like new Integer(complicatedExpr()), in which
> the `new` and `invokespecial <init>` are separated
> by (almost) arbitrarily complex bytecode. The two
> instructions don't even have to be in the same basic
> block (at the bytecode level):
>
>     new Integer(foo() ? bar() : baz())
>     // compiles to 4 BB's in a diamond
>
> If we add switch expressions with large sub-blocks,
> I think we get peak separation of the start and
> end parts of the new/init dance:
>
>     new Integer(switch (x) {
>       case 1 -> { complicatedBlock: try { ... } catch ... ; return 0;
>       default -> { for (;;) ... }} )
>
> All of this gives me yet one more reason we would have
> been better off with factory methods instead of
> open-coding the new/init dance. It was, in hindsight,
> a false economy to open code the object creation "guts"
> instead of putting them in factory API points.
>
> And with an eye toward future evolutions of legacy code
> (legacy code not yet in existence!), and uniformity with
> the factory methods of inline classes, let's try harder
> to get rid of the new/init dance for identity objects.

I believe there is a quick and dirty trick:
replace new java/lang/Integer by 3 NOPs and replace INVOKESPECIAL java/lang/Integer <init> (I)V by INVOKESTATIC java/lang/Integer valueOf (I)Ljava/lang/Integer;

It has to be done after the code is verified, because the new execution doesn't push java/lang/Integer on the stack anymore before calling the arbitrary init expression, thus any StackMapTables in between the NOPs and INVOKESTATIC are invalid.

>
> -- John

Rémi

From forax at univ-mlv.fr Wed Oct 28 12:48:34 2020
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Wed, 28 Oct 2020 13:48:34 +0100 (CET)
Subject: Source code analysis: calls to wrapper class constructors
In-Reply-To: <0257B596-D6E1-43D9-8B0D-68F729D243B8@oracle.com>
References: <941144237.933767.1603828808457.JavaMail.zimbra@u-pem.fr> <0257B596-D6E1-43D9-8B0D-68F729D243B8@oracle.com>
Message-ID: <1602381123.1205799.1603889314954.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "daniel smith"
> To: "Remi Forax"
> Cc: "valhalla-spec-experts"
> Sent: Tuesday, October 27, 2020 21:36:21
> Subject: Re: Source code analysis: calls to wrapper class constructors

>> On Oct 27, 2020, at 2:00 PM, Remi Forax wrote:
>>
>> Three remarks:
>> - the compiler can warn because a code is using new Integer(value) or warn
>> because the compiler will automatically transform all new Integer(value) to
>> Integer.valueOf() once Valhalla is integrated,
>> i prefer the second solution, having a warning message saying that in the future
>> new Integer() will not be supported and that all new Integer(...) will be
>> transformed by the compiler automatically to Integer.valueOf().
>
> I think the idea here is that if the compiler were doing this today, it would
> reduce the amount of binaries that will fail in the future. But... it won't
> eliminate the problem. I think we'd still need to do something about binaries
> that are *already* published (e.g., JUnit 4.12), or that target a pre-16 JDK.
>
> So given that a workaround for binaries will exist, there's not a lot to be
> gained by complicating the source-level story. And in the minus column, some
> areas of concern:
>
> - We'd have to change the language in 16 to special-case what class instance
> creation expressions mean.
>
> - We'd be breaking behavioral compatibility, with no way to opt out (if, say,
> you need unique identities, and don't plan to deploy on future JDKs).
>
> - There's no change in the compile-time experience: using either deprecation or
> these special warnings, you'd be stuck with warnings until you rewrote your
> code.
>
> Treating this as a standard application of deprecation, with behavioral changes
> triggered by source code changes, seems like a more attractive solution.

[...]

If the VM changes all new Integer(x) to Integer.valueOf(x) (see my other mail on how to do it), then I think the language should mirror the VM behavior.

The warning doesn't have to say that the compiler actually rewrites new Integer(x) to Integer.valueOf(x), only that in the future the compiler will rewrite new Integer(x) to Integer.valueOf(x).

The rewrite by the compiler has to be done at the same time as ==/acmp transitions from meaning pointer comparison to meaning primitive object comparison if both sides are primitive objects.

The idea is that at a point in time, doing a new on a primitive object class is illegal, so the verifier will reject any use of "new java/lang/Integer".
At that time,
- the compiler will rewrite new Integer(x) to Integer.valueOf(x), so the generated class is valid and the source code is still valid, but the semantics have slightly changed, hence the warning.
- the VM will rewrite new Integer(x) to Integer.valueOf(x) for all classfiles with a version lower than the current version.

It's a better plan because it means that Java 1.0+ code still compiles and that the compiler behavior and the VM behavior are aligned.
It works because the semantics of the primitive object subtype of java.lang.Integer and the semantics of java.lang.Integer as an identity object are mostly similar.

It also means that the warning on synchronized(Integer) will now be an error.

regards,
Rémi

From daniel.smith at oracle.com Wed Oct 28 15:49:49 2020
From: daniel.smith at oracle.com (Dan Smith)
Date: Wed, 28 Oct 2020 09:49:49 -0600
Subject: Source code analysis: calls to wrapper class constructors
In-Reply-To: <1230117334.1075618.1603877159693.JavaMail.zimbra@u-pem.fr>
References: <1230117334.1075618.1603877159693.JavaMail.zimbra@u-pem.fr>
Message-ID:

> On Oct 27, 2020, at 10:56 PM, John Rose wrote:
>
> One of the reasons it's not going to be comprehensive
> is code like new Integer(complicatedExpr()), in which
> the `new` and `invokespecial <init>` are separated
> by (almost) arbitrarily complex bytecode.

> On Oct 28, 2020, at 3:25 AM, Remi Forax wrote:
>
> I believe there is a quick and dirty trick,
> replace new java/lang/Integer by 3 NOPs and replace INVOKESPECIAL java/lang/Integer <init> (I)V by INVOKESTATIC java/lang/Integer valueOf (I)Ljava/lang/Integer;
>
> It has to be done after the code is verified because the new execution doesn't push java/lang/Integer on the stack anymore before calling the arbitrary init expression thus any StackMapTables in between the NOPs and INVOKESTATIC are invalid.

Don't forget the 'dup'. We're assuming a 'new' immediately followed by 'dup' (4 nops), and code that will eventually consume the second one and leave the first one fully-initialized.

You're right that this disrupts verification; I think we can address this pre-verification by rewriting the StackMapTable, eliminating all references to 'uninitialized(Offset)' and shrinking the stack by two.

The bigger limitation, which I don't think you run into in any javac-generated code, is that you can put a copy of the uninitialized object reference anywhere you want: in locals, duplicated 15 times on the stack, etc.
That's the point where I'm guessing we give up.

So, there's a tractable rewrite for any code with the shape:

new java/lang/Integer;
dup;
... [ad hoc computation, as long as it doesn't touch the two uninitialized Integer refs]
invokespecial java/lang/Integer.<init>(...)V;

From forax at univ-mlv.fr Wed Oct 28 16:28:29 2020
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Wed, 28 Oct 2020 17:28:29 +0100 (CET)
Subject: Source code analysis: calls to wrapper class constructors
In-Reply-To:
References: <1230117334.1075618.1603877159693.JavaMail.zimbra@u-pem.fr>
Message-ID: <1430442333.1293020.1603902509291.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "daniel smith"
> To: "Remi Forax", "John Rose"
> Cc: "valhalla-spec-experts"
> Sent: Wednesday, October 28, 2020 16:49:49
> Subject: Re: Source code analysis: calls to wrapper class constructors

>> On Oct 27, 2020, at 10:56 PM, John Rose wrote:
>>
>> One of the reasons it's not going to be comprehensive
>> is code like new Integer(complicatedExpr()), in which
>> the `new` and `invokespecial <init>` are separated
>> by (almost) arbitrarily complex bytecode.
>
>
>> On Oct 28, 2020, at 3:25 AM, Remi Forax wrote:
>>
>> I believe there is a quick and dirty trick,
>> replace new java/lang/Integer by 3 NOPs and replace INVOKESPECIAL
>> java/lang/Integer <init> (I)V by INVOKESTATIC java/lang/Integer valueOf
>> (I)Ljava/lang/Integer;
>>
>> It has to be done after the code is verified because the new execution doesn't
>> push java/lang/Integer on the stack anymore before calling the arbitrary init
>> expression thus any StackMapTables in between the NOPs and INVOKESTATIC are
>> invalid.
>
> Don't forget the 'dup'. We're assuming a 'new' immediately followed by 'dup' (4
> nops), and code that will eventually consume the second one and leave the first
> one fully-initialized.

yes, I forgot the DUP :)

>
> You're right that this disrupts verification; I think we can address this
> pre-verification by rewriting the StackMapTable, eliminating all references to
> 'uninitialized(Offset)' and shrinking the stack by two.

Another solution: replace NEW [ref] DUP by NOP INVOKESTATIC java/lang/Integer giveMeAFakeInteger ()Ljava/lang/Integer; and replace the INVOKESPECIAL by an INVOKESTATIC java/lang/Integer trampoline (Ljava/lang/Integer;I), with the method giveMeAFakeInteger returning a special Integer (can be null, maybe?) and the method trampoline calling Integer.valueOf().

>
> The bigger limitation, which I don't think you run into in any javac-generated
> code, is that you can put a copy of the uninitialized object reference anywhere
> you want: in locals, duplicated 15 times on the stack, etc. That's the point
> where I'm guessing we give up.
>
> So, there's a tractable rewrite for any code with the shape:
>
> new java/lang/Integer;
> dup;
> ... [ad hoc computation, as long as it doesn't touch the two uninitialized
> Integer refs]
> invokespecial java/lang/Integer.<init>(...)V;

Yep, some code not generated by javac or ecj will fail.

Rémi

From john.r.rose at oracle.com Wed Oct 28 19:08:30 2020
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 28 Oct 2020 14:08:30 -0500
Subject: Source code analysis: calls to wrapper class constructors
In-Reply-To: <1230117334.1075618.1603877159693.JavaMail.zimbra@u-pem.fr>
References: <1230117334.1075618.1603877159693.JavaMail.zimbra@u-pem.fr>
Message-ID: <05C60E88-1D92-4C8F-87F5-C1E751730F5C@oracle.com>

Please accept the Tiger Woods Code Golf award for that one!

It only works if the "dup" output (after "new") is still contiguous on the stack. That won't be true if javac for some reason spilled the result of "new" to a local instead of holding it on stack. IIRC one reason to spill from stack to locals during expression evaluation is if there is some kind of complicated control flow inside the expression.
Different javac's historically have different policies about stuff like that.

> On Oct 28, 2020, at 4:25 AM, Remi Forax wrote:
>
> ----- Original Message -----
>> From: "John Rose"
>> To: "daniel smith"
>> Cc: "valhalla-spec-experts"
>> Sent: Wednesday, October 28, 2020 05:56:29
>> Subject: Re: Source code analysis: calls to wrapper class constructors
>
>> On Oct 27, 2020, at 12:27 PM, Dan Smith wrote:
>>>
>>> This tooling will support common bytecode patterns like 'new Foo; dup; ...;
>>> invokespecial Foo.<init>;', but will not be a comprehensive solution.
>>> (Mimicking the behavior of instance initialization method invocation in full
>>> generality would be a very difficult task.)
>>
>> One of the reasons it's not going to be comprehensive
>> is code like new Integer(complicatedExpr()), in which
>> the `new` and `invokespecial <init>` are separated
>> by (almost) arbitrarily complex bytecode. The two
>> instructions don't even have to be in the same basic
>> block (at the bytecode level):
>>
>>     new Integer(foo() ? bar() : baz())
>>     // compiles to 4 BB's in a diamond
>>
>> If we add switch expressions with large sub-blocks,
>> I think we get peak separation of the start and
>> end parts of the new/init dance:
>>
>>     new Integer(switch (x) {
>>       case 1 -> { complicatedBlock: try { ... } catch ... ; return 0;
>>       default -> { for (;;) ... }} )
>>
>> All of this gives me yet one more reason we would have
>> been better off with factory methods instead of
>> open-coding the new/init dance. It was, in hindsight,
>> a false economy to open code the object creation "guts"
>> instead of putting them in factory API points.
>>
>> And with an eye toward future evolutions of legacy code
>> (legacy code not yet in existence!), and uniformity with
>> the factory methods of inline classes, let's try harder
>> to get rid of the new/init dance for identity objects.
>
> I believe there is a quick and dirty trick,
> replace new java/lang/Integer by 3 NOPs and replace INVOKESPECIAL java/lang/Integer <init> (I)V by INVOKESTATIC java/lang/Integer valueOf (I)Ljava/lang/Integer;
>
> It has to be done after the code is verified because the new execution doesn't push java/lang/Integer on the stack anymore before calling the arbitrary init expression thus any StackMapTables in between the NOPs and INVOKESTATIC are invalid.
>
>>
>> -- John
>
> Rémi

From john.r.rose at oracle.com Wed Oct 28 19:32:46 2020
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 28 Oct 2020 14:32:46 -0500
Subject: Source code analysis: calls to wrapper class constructors
In-Reply-To:
References: <1230117334.1075618.1603877159693.JavaMail.zimbra@u-pem.fr>
Message-ID:

On Oct 28, 2020, at 10:49 AM, Dan Smith wrote:
>
> You're right that this disrupts verification; I think we can address this pre-verification by rewriting the StackMapTable, eliminating all references to 'uninitialized(Offset)' and shrinking the stack by two.

Or we can try to keep the verification as-is by emulating the stack effects. This requires inserting instructions, I think, but avoids reshaping the stack. Maybe:

new Integer; dup; ...(stuff that pushes int)...; invokespecial Integer.<init>(int)V
=>
ldc_w (String)dummy; dup; ...(stuff that pushes int)...; invokestatic Integer.valueOf(int)Integer; swap; pop; swap; pop

Maybe use a helper which can "gobble up" the stack junk in one go:

invokestatic Integer.valueOf(int)Integer; swap; pop; swap; pop
=>
invokestatic Integer.$pop2$valueOf(String,String,int)Integer

If the dummy value has migrated somewhere random, it could be picked up and popped:

new Integer; astore L42; ...(stuff that pushes int)...; aload L42; invokespecial Integer.<init>(int)V
=>
ldc_w (String)dummy; astore L42; ...(stuff that pushes int)...; invokestatic Integer.valueOf(int)Integer; swap; pop; aload L42; pop

As a further improvement on this theme, note that the dummy always has two copies, one to feed to invokespecial and one to return to the user. The one to return to the user might be at TOS, or it might be elsewhere (in L42 or deeper on stack). We could do a peephole transform which finds the bytecodes that pull up the dummy value, move them *before* the $pop2$valueOf helper, and the net size change of bytecodes is zero. The location of the invokespecial might move a byte or two later. So:

new Integer;
...(stuff like dup that stores a duplicate ref)...;
...(stuff that leaves the new ref on stack, then pushes the int)...;
invokespecial Integer.<init>(int)V;
...(unrelated stuff)...
...(stuff that ensures the replicate ref is now at TOS)...
=>
ldc_w (String)dummy;
...(same stuff like dup that stores a duplicate ref)...;
...(same stuff that leaves the new ref on stack, then pushes the int)...;
...(same stuff that ensures the replicate ref is now at TOS, but moved before the invoke)...;
invokestatic Integer.$pop2$valueOf(Object,int)V;
...(same unrelated stuff)...

This more elaborate scheme works for both the simple "dup" case and for the more complicated "astore L42" case. I don't think it requires changing stack maps.

Hours of educational play for nerds 14 and up!

> The bigger limitation, which I don't think you run into in any javac-generated code, is that you can put a copy of the uninitialized object reference anywhere you want: in locals, duplicated 15 times on the stack, etc. That's the point where I'm guessing we give up.

From john.r.rose at oracle.com Wed Oct 28 19:36:59 2020
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 28 Oct 2020 14:36:59 -0500
Subject: Source code analysis: calls to wrapper class constructors
In-Reply-To:
References: <1230117334.1075618.1603877159693.JavaMail.zimbra@u-pem.fr>
Message-ID: <77AC23DC-88FB-4D68-8F36-D6737385F92B@oracle.com>

On Oct 28, 2020, at 2:32 PM, John Rose wrote:
>
> invokestatic Integer.$pop2$valueOf(Object,int)V

That would be invokestatic Integer.$pop2$valueOf(String,int,String)V

And the dummy object could be an Integer (using a condy) if we don't want to edit the stack maps that might mention the Integer. They might be present if the integer expression contains control flow.

So, invokestatic Integer.$pop2$valueOf(Integer,int,Integer)V

From forax at univ-mlv.fr Wed Oct 28 20:02:51 2020
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Wed, 28 Oct 2020 21:02:51 +0100 (CET)
Subject: Source code analysis: calls to wrapper class constructors
In-Reply-To: <77AC23DC-88FB-4D68-8F36-D6737385F92B@oracle.com>
References: <1230117334.1075618.1603877159693.JavaMail.zimbra@u-pem.fr> <77AC23DC-88FB-4D68-8F36-D6737385F92B@oracle.com>
Message-ID: <1457978744.1327493.1603915371287.JavaMail.zimbra@u-pem.fr>

To summarize, it's possible to rewrite the NEW + DUP + INVOKESPECIAL sequence, and it may also be possible to rewrite other sequences like NEW + STORE + LOAD + INVOKESPECIAL + LOAD or other combinations by loading a fake Integer with an LDC condy and using an INVOKESTATIC with more parameters, so there is no need to change the StackMapFrames.

We may still have some bytecode shapes we don't support, but it's worth a try.
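
At the source level, the helper idea summarized above might look roughly like the following sketch. Everything here is an assumption for illustration: `$pop2$valueOf` exists only as a name in this discussion, so it is modeled as an ordinary static method `pop2ValueOf` in a hypothetical class, it is assumed to return the boxed Integer, and `null` stands in for the dummy that the thread proposes loading with ldc/condy.

```java
// Illustrative sketch only; nothing here is an existing JDK API.
final class Pop2ValueOfSketch {

    // Hypothetical stand-in for Integer.$pop2$valueOf(Integer,int,Integer).
    // The two dummy parameters occupy the stack slots that used to hold the
    // uninitialized references produced by 'new; dup', so the shape of the
    // operand stack around the call site does not change.
    static Integer pop2ValueOf(Integer dummyForInit, int value, Integer dummyForCaller) {
        // Both dummies are ignored; only the int argument matters.
        return Integer.valueOf(value);
    }

    // Source-level picture of a call site that was 'new Integer(x)'
    // before the rewrite.
    static Integer rewrittenCallSite(int x) {
        Integer dummy = null; // assumed placeholder for the ldc/condy dummy
        return pop2ValueOf(dummy, x, dummy);
    }
}
```

The point of the extra parameters is only to absorb the placeholder values; the result is whatever Integer.valueOf returns, so code that relied on a unique identity would still observe a behavioral change.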

Rémi

> From: "John Rose"
> To: "daniel smith"
> Cc: "Remi Forax", "valhalla-spec-experts"
> Sent: Wednesday, October 28, 2020 20:36:59
> Subject: Re: Source code analysis: calls to wrapper class constructors

> On Oct 28, 2020, at 2:32 PM, John Rose <john.r.rose at oracle.com> wrote:

>> invokestatic Integer.$pop2$valueOf(Object,int)V

> That would be invokestatic Integer.$pop2$valueOf(String,int,String)V

> And the dummy object could be an Integer (using a condy) if we don't
> want to edit the stack maps that might mention the Integer. They
> might be present if the integer expression contains control flow.

> So, invokestatic Integer.$pop2$valueOf(Integer,int,Integer)V

From forax at univ-mlv.fr Wed Oct 28 20:05:15 2020
From: forax at univ-mlv.fr (forax at univ-mlv.fr)
Date: Wed, 28 Oct 2020 21:05:15 +0100 (CET)
Subject: Source code analysis: calls to wrapper class constructors
In-Reply-To: <05C60E88-1D92-4C8F-87F5-C1E751730F5C@oracle.com>
References: <1230117334.1075618.1603877159693.JavaMail.zimbra@u-pem.fr> <05C60E88-1D92-4C8F-87F5-C1E751730F5C@oracle.com>
Message-ID: <78928896.1327867.1603915515955.JavaMail.zimbra@u-pem.fr>

> From: "John Rose"
> To: "Remi Forax"
> Cc: "Valhalla Expert Group Observers", "daniel smith", "valhalla-spec-experts"
> Sent: Wednesday, October 28, 2020 20:08:30
> Subject: Re: Source code analysis: calls to wrapper class constructors

> Please accept the Tiger Woods Code Golf award for that one!

Did I win something, a free t-shirt? :)

> It only works if the "dup" output (after "new") is still contiguous
> on the stack. That won't be true if javac for some reason spilled
> the result of "new" to a local instead of holding it on stack.
> IIRC one reason to spill from stack to locals during expression
> evaluation is if there is some kind of complicated control flow
> inside the expression. Different javac's historically have
> different policies about stuff like that.

I've never seen such bytecode shapes, but I don't think I've ever seen a classfile compiled with a version older than Java 1.2.

Rémi

>> On Oct 28, 2020, at 4:25 AM, Remi Forax <forax at univ-mlv.fr> wrote:
>> ----- Original Message -----
>>> From: "John Rose" <john.r.rose at oracle.com>
>>> To: "daniel smith" <daniel.smith at oracle.com>
>>> Cc: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
>>> Sent: Wednesday, October 28, 2020 05:56:29
>>> Subject: Re: Source code analysis: calls to wrapper class constructors
>>> On Oct 27, 2020, at 12:27 PM, Dan Smith <daniel.smith at oracle.com> wrote:
>>>> This tooling will support common bytecode patterns like 'new Foo; dup; ...;
>>>> invokespecial Foo.<init>;', but will not be a comprehensive solution.
>>>> (Mimicking the behavior of instance initialization method invocation in full
>>>> generality would be a very difficult task.)
>>> One of the reasons it's not going to be comprehensive
>>> is code like new Integer(complicatedExpr()), in which
>>> the `new` and `invokespecial <init>` are separated
>>> by (almost) arbitrarily complex bytecode. The two
>>> instructions don't even have to be in the same basic
>>> block (at the bytecode level):
>>> new Integer(foo() ? bar() : baz())
>>> // compiles to 4 BBs in a diamond
>>> If we add switch expressions with large sub-blocks,
>>> I think we get peak separation of the start and
>>> end parts of the new/init dance:
>>> new Integer(switch (x) {
>>> case 1 -> { complicatedBlock: try { ... } catch ... ; return 0;
>>> default -> { for (;;) ... }} )
>>> All of this gives me yet one more reason we would have
>>> been better off with factory methods instead of
>>> open-coding the new/init dance.
>>> It was, in hindsight,
>>> a false economy to open code the object creation "guts"
>>> instead of putting them in factory API points.
>>> And with an eye toward future evolutions of legacy code
>>> (legacy code not yet in existence!), and uniformity with
>>> the factory methods of inline classes, let's try harder
>>> to get rid of the new/init dance for identity objects.
>> I believe there is a quick and dirty trick:
>> replace new java/lang/Integer by 3 NOPs and replace INVOKESPECIAL
>> java/lang/Integer <init> (I)V by INVOKESTATIC java/lang/Integer valueOf
>> (I)Ljava/lang/Integer;
>> It has to be done after the code is verified, because the new execution doesn't
>> push java/lang/Integer on the stack anymore before calling the arbitrary init
>> expression; thus any StackMapTables in between the NOPs and INVOKESTATIC are
>> invalid.
>>> -- John
>> Rémi
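
The rewrite discussed in this thread can be illustrated with a small Java sketch. This is not the tooling Dan describes and it uses no real bytecode library: instructions are modeled as plain strings, and the class name NewInitRewriteSketch and its rewrite method are hypothetical. It replaces a contiguous new/dup pair with nops and the matching invokespecial with invokestatic Integer.valueOf, and it leaves a `new` that is not immediately followed by `dup` (for example, one spilled to a local) untouched as an unsupported shape. Replacing the dup with a nop as well is an assumption of this sketch; like the quick trick above, it ignores whether any stack maps in between stay valid.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: each instruction is a string, not real classfile bytes.
public class NewInitRewriteSketch {
    public static final String NEW = "new java/lang/Integer";
    public static final String INIT = "invokespecial java/lang/Integer.<init>(I)V";
    public static final String VALUE_OF =
            "invokestatic java/lang/Integer.valueOf(I)Ljava/lang/Integer;";

    // Returns a rewritten copy of the instruction list; the input is not modified.
    public static List<String> rewrite(List<String> code) {
        List<String> out = new ArrayList<>(code);
        // Indices of pending `new` instructions; -1 marks an unsupported shape.
        Deque<Integer> pendingNew = new ArrayDeque<>();
        for (int i = 0; i < out.size(); i++) {
            String insn = out.get(i);
            if (insn.equals(NEW)) {
                boolean dupFollows = i + 1 < out.size() && out.get(i + 1).equals("dup");
                pendingNew.push(dupFollows ? i : -1);
            } else if (insn.equals(INIT) && !pendingNew.isEmpty()) {
                // Innermost pending `new` pairs with this constructor call.
                int newIndex = pendingNew.pop();
                if (newIndex >= 0) {
                    out.set(newIndex, "nop");      // was: new java/lang/Integer
                    out.set(newIndex + 1, "nop");  // was: dup
                    out.set(i, VALUE_OF);          // was: invokespecial <init>
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Roughly the shape of: new Integer(foo() ? bar() : baz())
        // (Demo.foo/bar/baz and the labels are made up for illustration.)
        List<String> diamond = List.of(
                NEW,
                "dup",
                "invokestatic Demo.foo()Z",
                "ifeq L1",
                "invokestatic Demo.bar()I",
                "goto L2",
                "L1:",
                "invokestatic Demo.baz()I",
                "L2:",
                INIT);
        rewrite(diamond).forEach(System.out::println);
    }
}
```

Because the `new` and the `invokespecial` are paired with a stack during a linear scan, the branches between them do not matter to this toy matcher; a real tool would still have to deal with the verifier and stack maps, which is the problem the rest of the thread is about.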