abandon all U-types, welcome to L-world (or, what I learned in Burlington)

Remi Forax forax at univ-mlv.fr
Sun Nov 19 22:47:16 UTC 2017


To summarize for myself,
we already know that we need only one U, java.lang.__Value; let's try to make it java.lang.Object (with no boxing).

The claim is that Object is used more as the root of any type, as in collections, than as the root of all references, as in System.out.println().

OK, I need to think more about that.

regards,
Rémi

----- Original Message -----
> From: "John Rose" <john.r.rose at oracle.com>
> To: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Sunday, November 19, 2017 22:40:33
> Subject: abandon all U-types, welcome to L-world (or, what I learned in Burlington)

> We just had a 50-hour week of face-to-face meetings by the
> Valhalla VM team.  We learned a lot and surprised ourselves
> by coming to a consensus that a promising design for value
> types uses mainly the same legacy L-type descriptors, makes
> relatively little use of Q-type descriptors, and does not appear
> to need a third descriptor "kind" or "mode", such as U for
> universal, or R for reference-only.
> 
> First a few highlights out of many.  Fred Parain explained to us how
> he has prototyped a thread-local analog of Java heaps to store value
> structs in a form convenient to the interpreter.  Tobias Hartmann
> and Roland Westrelin (of Red Hat) explained what the compiler
> prefers to see, which is (obviously) the scalarized components
> of each value.  The three of them have worked out detailed
> rules for calling between interpreted and compiled code.
> 
> It seems to me that other implementations of the JVM (looking
> at you, IBM) will tend in similar directions, so although our
> results are strongly informed by our own prototyping, we think
> it is likely that they will apply to other, independent JVM
> implementations.  (Or are there platforms where the interpreter
> will scalarize aggressively and the optimizer will prefer to
> keep everything in structs?  Not.)
> 
> Karen Kinnear and the Oracle Valhalla lead, David Simms, were
> there to make sure we solved the important problems and asked
> the hard questions.  As a special appearance, one of our spec.
> gurus, Dan Smith, was there to help us make rigorous sense out
> of our intuitions and hacks.
> 
> Since we were short on language experts, we just worked in
> the mode (my personal favorite) of pretending that the JVM
> is the most important thing, and the Java language designers
> will just have to figure out how to use it.  Of course, that's an
> oversimplification; the JLS and JVMS inform each other very
> strongly, but it was freeing to temporarily take current thoughts
> about JLS extensions as a given and vary the JVM to find
> the sweet spot that would be simple to implement and supportive
> of what we think we know about the Valhalla Java of the future.
> 
> We had some long conversations about carrier types: L, Q, U,
> and more, and that's what I want to write about here.  We also
> made significant progress in the design of crackable lambdas,
> template classes, and current and future versions of condy.
> We talked to Ron Pressler about kick-starting Loom fibers.
> But it is L-types I want to talk about here; the above is just a
> sketch of the past week's environment.
> 
> Logically speaking, we have two things we want to do, and
> that unfolds to a choice between three "worlds" of up to four
> distinct kinds: L/Q/U/R.  L is always present because it is
> a legacy model for reference types.  Q is always present
> because we know we need (at least sometimes) to make
> a syntactic distinction between flattened values and legacy
> objects.
> 
> (Why not just always look inside the classfile? Because
> the verifier cannot be expected to load a class for every
> type it sees, so it needs a descriptor kind character from
> time to time.)
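
Here is a minimal sketch of the point above. It assumes the prototype's `Q...;` descriptor syntax. The kind of every type in a descriptor can be read off its leading character, so a verifier can tell an L-type from a Q-type without loading the named class. The class `DescriptorKinds` and its methods are hypothetical illustrations, not HotSpot code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: classify descriptors by their first character only,
// the way a verifier can without loading any class.
final class DescriptorKinds {
    enum Kind { PRIMITIVE, L_TYPE, Q_TYPE, ARRAY }

    static Kind kindOf(String descriptor) {
        switch (descriptor.charAt(0)) {
            case 'B': case 'C': case 'D': case 'F':
            case 'I': case 'J': case 'S': case 'Z':
                return Kind.PRIMITIVE;
            case 'L':
                return Kind.L_TYPE;   // e.g. "Ljava/lang/Object;"
            case 'Q':
                return Kind.Q_TYPE;   // prototype value-type syntax (assumed), e.g. "QPoint;"
            case '[':
                return Kind.ARRAY;
            default:
                throw new IllegalArgumentException("bad descriptor: " + descriptor);
        }
    }

    // Split the parameter list of a method descriptor such as
    // "(ILjava/lang/String;QPoint;)V" into field descriptors, again without
    // loading any of the named classes.
    static List<String> params(String methodDescriptor) {
        List<String> out = new ArrayList<>();
        int i = 1; // skip '('
        while (methodDescriptor.charAt(i) != ')') {
            int start = i;
            while (methodDescriptor.charAt(i) == '[') i++;  // array dimensions
            char c = methodDescriptor.charAt(i);
            if (c == 'L' || c == 'Q') {
                i = methodDescriptor.indexOf(';', i) + 1;    // class name runs to ';'
            } else {
                i++;                                         // one-character primitive
            }
            out.add(methodDescriptor.substring(start, i));
        }
        return out;
    }
}
```

Nothing here resolves a class name. That is the property the descriptor kind character buys.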
> 
> The U kind came about a year or two ago when we realized
> that any-generics (and/or templates) and interfaces both
> require a disjoint-union type that is neither Q nor L, but
> can keep track of Q payloads (value instances) and L
> payloads (nullable references to object instances),
> without mixing them up.  In other words, Q-types and
> legacy L-types are parallel class-based constructs,
> and neither conveniently "sits on top" of the other; they
> need a common supertype to carry them without confusion.
> 
> Before I describe the three logically possible "worlds",
> I'll add one more letter, R.  An R-type is exactly a legacy
> L-type, a nullable reference.  Why use a separate letter?
> Answer:  For the same reason we introduced the other
> kind letters, to preserve all the necessary distinctions
> among different kinds of payloads and carrier types,
> and also to talk about the explicit encoding of descriptors.
> 
> There are three worlds we could design to hold both legacy
> R-types (today's L-types) and Q-types:  U-world, L-world,
> and R-world.  They might be notated respectively as U/QL,
> L/Q, and U/QR.
> 
> The "U-world" is what I have been mentally preparing for
> for many months.  It is the design where L-types, marked
> as such in bytecode type descriptors, are always legacy
> object references or null, and Q-types, also marked as
> such in bytecodes, are always new value types.  To
> carry runtime payloads which may dynamically vary
> between the two modes, we need a third mode, U-types,
> which carry the two kinds of payloads (I hesitate to say
> "values" because I want to include reference values also).
> 
> A U-type is a disjoint union between corresponding,
> similarly named Q-types and L-types.
> 
> (Mathematically, a _disjoint union_ C = A |_| B is no more
> and no less than the sum of all elements or points comprised
> by the two constituent sets A and B.  The disjoint union has
> nothing more: no points not in A or B.  It has nothing less:
> every point of C is from either A or B, but never both.
> If A and B somehow look like they have a non-empty
> intersection, then C is adjusted so as to keep straight
> which elements are from A and which are from B.)
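
In programming terms, a disjoint union is a tagged union: every element remembers which side it came from. The Java 17 sketch below models a U-type that way. The names `Carrier`, `FromQ` and `FromR` are hypothetical. A plain int and a boxed Integer stand in for a Q-payload and an R-payload that look alike, and the record type acts as the tag that keeps them apart.

```java
// Hypothetical model of a U-type as a disjoint union C = A |_| B.
// The record type of each element is its tag: it says which side it came from.
sealed interface Carrier permits FromQ, FromR {}

// Side A: a Q-payload, modeled as a plain int (never null).
record FromQ(int value) implements Carrier {}

// Side B: an R-payload, modeled as a nullable Integer reference.
record FromR(Integer ref) implements Carrier {}

final class DisjointUnionDemo {
    static String describe(Carrier c) {
        if (c instanceof FromQ q) {
            return "Q:" + q.value();
        }
        if (c instanceof FromR r) {
            return "R:" + r.ref();   // prints "R:null" for a null reference
        }
        throw new AssertionError("unreachable: Carrier is sealed");
    }
}
```

Even though `new FromQ(5)` and `new FromR(5)` wrap "the same" number, they are different elements of the union. Record equality compares the record class first, so `new FromQ(5).equals(new FromR(5))` is false.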
> 
> The "R-world" is a copy of the "U-world", except that the
> new world has no L-types at all, or rather they are renamed
> as R-types.  In this world, bridges would be required
> between legacy bytecodes (which use L's) and Valhalla
> bytecodes (which use R's for the same concept).
> 
> We are pretty sure we don't want to live in R-world, but
> it helps to think about it, since it makes the maximum
> distinctions between legacy APIs and upgraded Valhalla
> APIs.  Any bridge from R-world to legacy code will
> presumably come after a clear decision has been made
> to allow the legacy code to see, under the name of L-types,
> the R's from the new world, plus whatever Q's are also
> allowed over the bridge to interoperate with the old code.
> 
> The U-world has a similar need for bridges, but less extreme.
> We know we will need some bridges to upgrade legacy
> classes like List to use U-types (List<int>, List<ComplexDouble>).
> The L-types of U-world just mix without effort into the legacy
> L-types of legacy classes, since the same letter is used.
> 
> The third logical choice, and the one we are now looking
> at very seriously, is "L-world".  (Break out the "abandon
> all hope" and "Niflheim" jokes!)  In L-world, we identify
> (some would say conflate or confuse) the necessary
> U-type which unites R-types and Q-types with the legacy
> syntax "L".  The Q-type syntax is *maybe* needed, but
> in any case does not appear in a parallel position of
> importance with the dominant L-type syntax.  The R-type
> syntax seems even less important; we haven't thought
> of a use for it.  But it is in reserve, in case we need
> R-type descriptors for some corner case.
> 
> The distinction between value types and object types
> is still fundamental, as is the distinction between flat
> and non-flat data.  The classfile which defines any
> given type unambiguously declares whether it is an
> object or value type.  But in L-world, the L-type
> descriptors can carry both payloads.  That's the
> key decision before us.
> 
> (For brevity I'll say R-type/R-value when I mean a
> legacy nullable reference type/value, and Q-type/Q-value
> for value type/instance.  This doesn't mean that we
> will need Q's and R's in the final bytecode syntax.
> But they are useful concepts.)
> 
> There are many implications from the decision to
> put L-types at the top:
> 
> * The type L-Object ("Ljava/lang/Object;") carries both
> R-values and Q-values.  Thus, we don't need a
> new top-type.  (There are objectionable properties of
> L-Object which need remediation, but this was always
> true, and is not a showstopper for L-world.)
> 
> * Likewise, legacy interface types like L-Comparable
> are immediately useful (without bridges) for carrying
> value instances as well as object instances (and null).
> 
> * It is possible, in some cases, that standard and user-written
> collection classes will work correctly, without recompilation,
> with value types.  (This is a big claim, and valuable if true.
> Read on.)
> 
> * All basic operations that the JVM applies to R-types must
> extend immediately and pervasively to Q-types, since it
> applies them to L-type values (which may be either,
> dynamically).
> 
> * Today, simple movement of R-types is really cheap, just
> a machine pointer move.  That needs to be true for L-types
> in L-world, or else we will get systematic performance hits
> for legacy code, and new code will go slow too.
> 
> * There are a number of object-specific operations which
> the JVM applies to L-types.  The most common is "acmp"
> (the "==" operator for references).  Those operations must
> be enhanced to do something useful with values, with a
> possible runtime cost to detect the distinction between
> an L-type carrying a Q-value and an L-type carrying a
> legacy R-value.  The performance and usable semantics
> of these object operations will make L-world either
> a programmer's paradise or a…  well you know.
> 
> * There is no need for boxes, and they turn out to be
> undesirable.  Legacy types like java.lang.Integer must
> be given a golden watch and a pension, somehow.
> That's easy for the JVM but hard for the language,
> which mandates that "(Object)(int)x" produces an
> Integer rather than an "int".  It seemed a good idea
> at the time.
> 
> * There is no need for a new "universal" carrier type,
> since L-types do the whole job.  Before the L-world
> discussion, my thought had been that we wanted a 128-bit
> U-type and a 64-bit legacy L/R-type.  Somebody burst
> my bubble this week, by saying that if we do that,
> we may find that interpreter speeds for U-type generics
> will risk a built-in performance barrier just from the
> larger standard carrier type.  If we JVM folks can agree
> that U-types should be 64-bits (by all available means)
> then it is just a simple step to rename U to L.  This is
> the rabbit hole that took our conversation down to L-world.
> 
> * In L-world, the "acmp" instruction needs a very fast way
> to detect Q-values.  This *may* require a tag bit on the 64-bit
> root value.  That in turn will affect GC dynamics.  There is
> a delicate balance here—but we think there is a way through.
> 
> * We probably need extra interpreter profiling to track whether
> a given L-value has ever been a Q-value or an R-value,
> dynamically.  Today we do null tracking on some instructions.
> This probably needs to be upgraded to null/Q/R tracking,
> and perhaps on additional instructions such as "acmp".
> 
> * There are a number of ways to assign semantics to
> an object-like L operation when it encounters a Q-value.
> This will require additional mails, but I think we have
> identified about a half dozen models, of which one or two
> seem very promising, providing useful semantics while
> remaining amenable to optimization.
> 
> * One residual use for Q-types is in the declaration of
> instance fields.  In order to avoid loading *all* classfiles
> of types mentioned in field declarations, a classfile which
> declares a flattened field will need to include enough
> information to allow the class loader to load the types of
> *only* those fields marked as requiring flattening.  There are
> at least two ways to do this:  Use a Q-type descriptor
> syntax *only* for field declarations, as today.  Or,
> require the ACC_VALUE bit on field declarations which
> are supposed to be flattened.
> 
> * As we were able to dispense with boxes, we may also
> dispense with non-flattened value types.  In that case,
> the translation strategy might emit an ACC_VALUE bit
> or Q-type on a field if and only if the classfile for the
> field's type defines it with ACC_VALUE.  The JVM will
> have to support non-flattened values in L-Object fields,
> of course.
> 
> * If the system uses a thread-local store for value structures
> (to avoid heap traffic), a store barrier will have to quickly
> detect Q-types that are inside the thread and reallocate
> them to the heap, when they are first stored to the heap
> (e.g., as an element of an L-Object array).
> 
> * The Q-type modifier *might* be useful in some settings
> to guarantee, in a verifiable way, that a given value is
> *not* an R-type, *not* null, and *not* modifiable; TBD.
> 
> * The R-type modifier *might* be useful in some settings
> to guarantee, in a verifiable way, that a given value is
> *not* a Q-type, and *does* have an object identity or
> is null.  This is also TBD.
> 
> * For best compatibility with legacy code, combined with
> diagnosability of anti-value algorithms like IdentityHashMap,
> the "acmp" instruction should return false unconditionally
> if either operand is a Q-value (punting to the following
> Object.equals call), and other object-like operations
> such as identityHashCode and monitorenter must throw
> errors in the JVM.  (In the language errors and warnings
> will be appropriate.)
> 
> * New operations are needed for substitutability checks
> which generalize reference equality and hashcode.
> These can be system methods, and do not need to be
> loaded onto either new or old bytecodes.
> 
> * We will almost certainly need to make primitives
> retroactively values.  This means "int" all along has
> really been Q-int (in the JVM) and is a real subtype
> of L-Object.
> 
> * Covariant array subtyping only works for R-types.
> So both int[] and DoubleComplex[] are *not* subtypes
> of Object[], even though int and DoubleComplex *are*
> subtypes of Object.
> 
> * From some points of view (legacy code), Q-values
> are masked invaders coming into the home of code
> which expected to work only on R-values.  Changing
> L-descriptors to encompass Q-values opens such
> code to potentially risky new behaviors.  Is it safe?
> Shouldn't we just have boxes to mediate values
> in such settings?  It depends on the code, really.
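
The proposed object-like operations can be modeled in ordinary Java. This is a minimal sketch with the following assumptions:

- Records implementing the hypothetical marker `ValueRecord` play the role of Q-values.
- `acmp` answers false whenever either operand is a Q-value.
- Identity hashing throws. The exception type is an assumption, since the mail only says "errors".
- `substitutable` generalizes reference equality by comparing Q-values component by component.

Q-value detection here is an `instanceof` test. The mail's point is that a real JVM needs something much cheaper, perhaps a tag bit.

```java
import java.lang.reflect.RecordComponent;

// Hypothetical marker: records implementing it are treated as Q-values.
interface ValueRecord {}

// Example value class for the model.
record Complex(double re, double im) implements ValueRecord {}

final class LWorldModel {
    static boolean isQ(Object o) {
        return o instanceof ValueRecord && o instanceof Record;
    }

    // Proposed acmp: unconditionally false if either operand is a Q-value,
    // otherwise plain reference identity.
    static boolean acmp(Object a, Object b) {
        if (isQ(a) || isQ(b)) return false;
        return a == b;
    }

    // Proposed identityHashCode: an error for Q-values (exception type assumed).
    static int identityHash(Object o) {
        if (isQ(o)) throw new UnsupportedOperationException("Q-value has no identity");
        return System.identityHashCode(o);
    }

    // Substitutability: identity for R-values and null,
    // same class plus component-wise substitutability for Q-values.
    static boolean substitutable(Object a, Object b) {
        if (!isQ(a) || !isQ(b)) return a == b;
        if (a.getClass() != b.getClass()) return false;
        try {
            for (RecordComponent rc : a.getClass().getRecordComponents()) {
                Object x = rc.getAccessor().invoke(a);
                Object y = rc.getAccessor().invoke(b);
                // Boxed primitives: Double.equals treats NaN as equal to itself
                // and 0.0 as distinct from -0.0. References: recurse.
                boolean same = rc.getType().isPrimitive() ? x.equals(y) : substitutable(x, y);
                if (!same) return false;
            }
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Legacy code written as `a == b || a.equals(b)` keeps working under this model. For Q-values, acmp answers false and the decision falls through to `equals`, which records already implement component-wise.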
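
The array rule in the list above already holds for primitives in today's Java. That is part of why retroactively treating int as a Q-type fits. This runnable check uses no Valhalla features and shows three things:

- String[] is an Object[].
- int[] is only an Object.
- Covariant R-type arrays pay for their subtyping with a store check at run time.

```java
final class ArrayCovariance {
    static boolean isObjectArray(Object o) {
        return o instanceof Object[];
    }

    // Covariance is checked at run time: storing the wrong type into a
    // String[] viewed as an Object[] throws ArrayStoreException.
    static boolean storeFails(Object[] array, Object element) {
        try {
            array[0] = element;
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object ints = new int[3];         // int[] is an Object...
        Object strings = new String[3];   // ...and so is String[]
        System.out.println(isObjectArray(ints));      // false: int[] is not an Object[]
        System.out.println(isObjectArray(strings));   // true: reference arrays are covariant
        System.out.println(storeFails((Object[]) strings, Integer.valueOf(1)));  // true
    }
}
```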
> 
> There's more, but this is enough for one message.
> 
> The L-world is very attractive:  No bridges or boxes,
> legacy code is value-enabled, and we get all the
> flattening we need.
> 
> We need to do some experiments:  Can we afford
> the extra Q-checks on acmp and storage to the heap?
> Will legacy algorithms really work on masked but not
> boxed values?  Do other JVM implementations experience
> similar trade-offs, or is this only a HotSpot-centric set
> of compromises?  Can we really avoid all those new
> descriptors and bridges!!??
> 
> Let's talk!
> 
> — John
> 
> P.S. Dan, you should send out your notes on U-types.

