Value type companions, encapsulated

John Rose john.r.rose at oracle.com
Wed Jul 13 19:13:33 UTC 2022


I have updated the document online in response to various comments.

http://cr.openjdk.java.net/~jrose/values/encapsulating-val.md
http://cr.openjdk.java.net/~jrose/values/encapsulating-val.html

The Valhalla JVM team is starting to look at these also.  I expect they 
will want to weigh in on the various JVMS details and issues.

So, thanks!  We should start some separate threads on some of the 
issues.

— John

P.S. For the record here are the diffs to the md file:


```
--- a/Users/jrose/Documents/Work/Valhalla/encapsulating-val.md.~6~
+++ b/Users/jrose/Documents/Work/Valhalla/encapsulating-val.md
@@ -1,5 +1,5 @@
  % Value type companions, encapsulated
-% John Rose for Valhalla EG, June 2022 (ver 0.1)
+% John Rose for Valhalla EG, July 2022 (ver 0.2)

  <meta pandoc-flags="--toc">
  <style type="text/css">
@@ -19,7 +19,7 @@
    p             { margin: .125in 0; }
    p+p,blockquote+p { text-indent: .125in; }
    h4            { margin: 0; }
-  a             { text-decoration: none; }
+  a             { text-decoration: underline; }
  </style>

  ## Background
@@ -34,13 +34,23 @@ restrictions]** at the end.)_
  ### Affordances of `C.ref`

  Every class or interface `C` comes with a companion type, the
-reference type `C.ref` derived from `C` which describes any variable
-(argument, return value, array element, etc.) whose values are either
-null or of a concrete class derived from `C`.  We are not in the habit
+reference type `C.ref` derived from `C` which describes any expression
+(variable, return value, array element, etc.) whose values are either
+null or are instances of a concrete class derived from `C`.
+
+> We are not in the habit
  of distinguishing `C.ref` from `C`, but the distinction is there.  For
  example, if we call `Object::getClass` on a variable of type `C.ref`
  we might not get `C.class`; we might even get a null pointer
-exception!
+exception!  Put another way, `C` as a class means a particular class
+declaration, while `C.ref` as a type means a variable which can refer
+to instances of class `C` or any subclass.  Also `C.ref` can be
+`null`, which is of no class at all.  One can view the result of
+`Object::getClass` as a *type* rather than a mere *class*, since the
+API of `Class` includes representation of types like `int` and `C.val`
+as well as classes.  In any case, the fact that a class can now have
+two associated types requires a clearer distinction between classes
+and types.

  We are so very used to working with reference types (for short,
  _ref-types_) that we sometimes forget all that they do for us
@@ -54,7 +64,7 @@ in addition to their linkage to specific classes:
   - `C.ref` allows a single large object to be shared from many locations.
   - `C.ref` with an identity class can centralize access to mutable state.
   - `C.ref` values uniformly convert to and from general types like `Object`.
- - `C.ref` variable types can be reflected using `Class` mirror objects.
+ - `C.ref` values are polymorphic (for non-final `C`), with varying `Object::getClass` values.
   - `C.ref` is safe for publication if the fields of `C` are `final`.

  When I store a bunch of `C` objects into an object array or list, sort
@@ -100,10 +110,22 @@ But the author of the class gets to decide which states are
  legitimate, and the decisions are enforced by access control at the
  boundaries of the encapsulation.

+> The author of an encapsulation determines whether the constant
+`C.default` is part of the public API or not.  Therefore, the value of
+`C.default` is non-constructed only if `C.val` is privatized.
+
  So if I code my class right, using access control to keep bad states
  away from my clients, my class's external API will have no
  non-constructed states.

+> Reflection and serialization provide additional modes of access to a
+class's API.  The author of an encapsulation must be given control
+over these modes of access as well.  (This is discussed further
+below.)  If the author of `C` allows deserialization of `C` values not
+otherwise constructible via the public API, those values must be
+regarded as constructed, not non-constructed, but the API may also
+be regarded as poorly designed.
+
  ### Costs of `C.ref`

  In that case why have value types at all, if references are so
@@ -119,7 +141,7 @@ always wish to pay:
    - A reference must be able to represent `null`; tightly-packed types like `int` and `long` would need to add an extra bit somewhere to cover this.

  The major alternative to references, as provided by Valhalla, is flat
-objects, where object fields are laid out immediately in their
+class instances, where instance fields are laid out immediately in their
  containers, in place of a pointer which points to them stored
  elsewhere.  Neither alternative is always better than the other, which
  is why Java has both `int` and `Integer` types and their arrays, and
@@ -140,13 +162,30 @@ The two companion types are closely related and perform some of the
  same jobs:

   - `C.ref` and `C.val` both give a starting point for accessing `C`'s members.
- - `C.ref` and `C.val` can link `C` objects into acyclic graphs.
+ - `C.ref` and `C.val` can link `C` instances into acyclic graphs.
   - `C.ref` and `C.val` values uniformly convert to and from general types like `Object`.
- - `C.ref` and `C.val` variable types can be reflected using `Class` mirror objects.

  For these jobs, it usually doesn't matter which type companion does
  the work.

+> Specifically,
+
+> - An expression of the form `myc.method()` cares about the class of
+   `myc` but not which companion type it is.  The same point is true
+   (probably) of methods like `Class::getMethods` which ignore the
+   distinction between the mirrors `C.ref.class` and `C.val.class`.
+
+> - I can build a tree of `C` nodes using children lists of either
+   companion type.  (If however my `C` node contains direct child
+   fields they cannot be of the `C.val` type.)
+
+> - Converting a variable `myc` to `Object` (or, respectively, casting
+   an `Object` to store in `myc`), does the same kind of thing
+   regardless of which companion type `myc` has.  The only difference
+   that `null` cannot be a result if `myc` is `C.val` (or,
+   respectively, that `null` is rejected as a `C.val` value).
+
+
  Despite the similarities, many properties of a value companion type
  are subtly different from any reference type:

@@ -158,6 +197,10 @@ are subtly different from any reference type:
   - `C.val` heap variables (fields, array elements) are initialized to all-zeroes.
   - `C.val` might not be safe for publication (even though its fields are `final`).

+The overall effect is that a `C.val` variable has a very specific
+concrete format, a flattened set of application-defined fields, often
+without added overhead from object headers and pointer chasing.
+
  The JVM distinguishes `C.val` by giving it a different descriptor, a
  so-called _Q-descriptor_ of the form `QC;`, and it also provides a
  so-called _secondary mirror_ `C.val.class` which is similar to the
@@ -166,7 +209,7 @@ built-in primitive mirrors like `int.class`.
  As the Valhalla performance model notes, flattening may be expected
  but is not fully guaranteed.  A `C.val` stored in an `Object`
  container is likely to be boxed on the heap, for example.  But `C.val`
-objects created as bytecode temporaries, arguments, and return values
+instances created as bytecode temporaries, arguments, and return values
  are likely to be flattened into machine registers, and `C.val` fields
  and array elements (at least below certain size thresholds) are also
  likely to be flattened into heap words.
@@ -177,12 +220,35 @@ value class.  There are additional terms and conditions for flattening
  Remember that reference types have full abstraction as one of their
  powers, and this means building data structures that can refer to them
  even before they are loaded.  But a class file can request that the JVM
-"peek" at a class to see if it is a value class, and if this request
-is acted on early enough (at the JVM's discretion), then the JVM can
+"peek" at a class to see if it is a value class.
+
+> This request is conveyed via the [`Preload` attribute] defined in
+recent drafts of [JEP 8277163 (Value Objects)].  If this request is
+acted on early enough (at the JVM's discretion), then the JVM can
  choose to lay out some or all `C.ref` values as flattened `C.val`
  values _plus_ a boolean or other sentinel value which indicates the
  `null` state.

+> If the JVM succeeds in flattening a `C.ref` variable, the JMM still
+requires that racing reads to such a variable will always return a
+consistent, safely published state.  The atomicity or non-atomicity of
+the `C.val` companion type has no effect on the races possible to a
+`C.ref` variable.  Thus, flattening a `C.ref` variable with a
+non-atomic value type is not simply a matter of adding a `null`
+channel field to a struct, if races are possible on that variable.
+Most machines today provide hardware atomicity only to 128 bits, so
+racing updates must probably be accomplished within the limits of 64-
+or 128-bit reads and writes, for a flattened `C.ref`.  It seems likely
+that the heap buffering enjoyed by today's value-based classes will
+also be the technique of choice in the future, at least for larger
+value classes, when their containers are in the heap.  Since JVM stack
+and locals can never race, adjoining a null state for a `C.ref` value
+can be a simple matter of allocating another calling sequence register
+or stack slot, for an argument or return value.
+
+[`Preload` attribute]: <http://cr.openjdk.java.net/~dlsmith/jep8277163/jep8277163-20220519/specs/value-objects-jvms.html#jvms-4.7.31>
+[JEP 8277163 (Value Objects)]: <https://openjdk.org/jeps/8277163>
+
  ### Pitfalls of `C.val`

  The advantages of value companion types imply some complementary
@@ -221,13 +287,16 @@ racing writes to the same mutable `C.val` variable in the heap.
  Unlike reference types, value types can be manipulated to create these
  non-constructed states even in well-designed classes.

-Now, it may be that a constructor (or factory) might be perfectly able
-to create one of the above non-constructed states as well, no strings
+Now, it may be that a public constructor (or factory) might be perfectly able
+to create a zero state or an arbitrary field combination, no strings
  attached.  In that case, the class author is enforcing few or no
  invariants on the states of the value class.  Many numeric classes,
  like complex numbers, are like this: Initialization to all-zeroes is
  no problem, and races between components are acceptable, compared to
-the costs of excluding races.
+the costs of excluding races.  The worst a race condition can ever do
+is create a state that is legitimately constructed via the class API.
+We can say that a class which is this permissive has no
+non-constructed states at all.

> (The reader may recall that early JVMs accepted races on the high
  and low halves of 64-bit integers as well; this is no longer a
@@ -259,6 +328,10 @@ Still, it turns out to be useful to give a common single point of
  declarative control to handle _all_ non-constructed states, both
  the default value of `C.val` and its mysterious data races.

+So different encapsulation authors will want to make different
+choices.  We will give them the means to make these choices.  And
+(spoiler alert) we will make the safest choice be the default choice.
+
  ## Privatization to the rescue

  _(Here are the important details about the encapsulation of value
@@ -273,13 +346,13 @@ companion can be used freely, fully under control of the class author.

  But untrusted clients are prevented from building uninitialized fields
  or arrays of type `C.val`.  This prevents such clients from creating
-(either accidentally or purposefully) non-constructed values of type
+(either accidentally or purposefully) non-constructed states of type
  `C.val`.  How privatization is declared and enforced is discussed in
  the rest of this document.

-> (To review, for those who skipped ahead, non-constructed values are
+> (To review, for those who skipped ahead, non-constructed states are
  those not created under control of the class `C` by constructors or
-other accessible API points.  A non-constructed value may be either an
+other accessible API points.  A non-constructed state may be either an
  uninitialized variable of `C.val`, or the result of a data race on a
  shared mutable variable of type `C.val`.  The class itself can work
  internally with such values all day long, but we exclude external
@@ -291,7 +364,7 @@ As a second tactic, a value class `C` may select whether or not the
  JVM enforces atomicity of all occurrences of its value companion
  `C.val`.  A non-atomic value companion is subject to data races, and
  if it is not privatized, external code may misuse `C.val` variables
-(in arrays or mutable fields) to create non-constructed values via
+(in arrays or mutable fields) to create non-constructed states via
  data races.

  A value companion which is atomic is not subject to data races.  This
@@ -328,7 +401,7 @@ of both choices (privatization and declared non-atomicity), although
  it is natural to try to boil down the size of the matrix.

    - `C.val` private & atomic is the default, and safest configuration
-  hiding all non-constructed values outside of `C` and all data races
+  hiding the most non-constructed states outside of `C` and all data races
    even inside of `C`.  There are some runtime costs.

    - `C.val` public & non-atomic is the opposite, with fewer runtime
@@ -338,12 +411,12 @@ it is natural to try to boil down the size of the matrix.
    non-atomic primitive like `long`.

    - `C.val` public & atomic allows everybody to see the all-zero
-  initial value but no other non-constructed states.  This is
+  initial value but no racing non-constructed states.  This is
    analogous to the situation of a naturally atomic primitive like
    `int`.

-  - `C.val` private & non-atomic allows `C` complete control over the
-  visibility of non-constructed states, but `C` also has the ability
+  - `C.val` private & non-atomic allows `C` complete access to and
+  control over non-constructed states, but `C` also has the ability
    to work internally on arrays of non-atomic elements.  `C` should
    take care not to leak internally-created flat arrays to untrusted
    clients, lest they use data races to hammer non-constructed values
@@ -428,7 +501,7 @@ will fail.  If the companion is neither `public` nor `private`, then
  Here is an example of a class which refuses to construct its default
  value, and which prevents clients from seeing that state:

-```
+```{#class-C}
  class C {
    int neverzero;
    public C(int x) {
@@ -500,7 +573,7 @@ have a right to expect that encapsulation of companion types will
  hope to re-use their knowledge about how type name access works when
  reasoning about companion types.  We aim to accommodate that hope.  If
  it works, users won't have to think very often about the class-vs-type
-distinction.  That is why the above design emulates pre-existing
+distinction.  That is also why the above design emulates pre-existing
  usage patterns for non-denotable types.

  ### Privatization in translation
@@ -518,13 +591,36 @@ The `value_flags` field (16 bits) has the following legitimate values:
    - zero: `C.val` default access, non-atomic
    - `ACC_PUBLIC`: `C.val` public access, non-atomic
    - `ACC_PRIVATE`: `C.val` private access, non-atomic
-  - `ACC_VOLATILE`: `C.val` default access, atomic
-  - `ACC_VOLATILE|ACC_PUBLIC`: `C.val` public access, atomic
-  - `ACC_VOLATILE|ACC_PRIVATE`: `C.val` private access, atomic
+  - `ACC_FINAL`: `C.val` default access, atomic
+  - `ACC_FINAL|ACC_PUBLIC`: `C.val` public access, atomic
+  - `ACC_FINAL|ACC_PRIVATE`: `C.val` private access, atomic

  Other values are rejected when the class file is loaded.

-(**JVM ISSUE #0:** Can we kill the `ACC_VALUE` modifier bit?  Do we
+The choice of `ACC_FINAL` for this job is arbitrary.  It basically
+means "please ensure safe publication of `final` fields of this class,
+even for fields inside flattened instances."  The race conditions of a
+non-atomic variable of type `C.val` are about the same as (are
+isomorphic to) the race conditions for the states reachable from a
+non-varying non-null variable of type `MC.ref`, where `MC` is a
+hypothetical identity class containing the same instance fields as
+`C`, but whose fields are not declared `final`.  (Remember that `C`,
+being a value class, must have declared its fields `final`.)  Omitting
+`ACC_FINAL` above means about the same as using the non-final fields
+of `MC` to store `C.val` states.  Omitting `ACC_FINAL` is less safe
+for programmers, but much easier to implement in the JVM, since it can
+just peek and poke the fields retail, instead of updating the whole
+instance value in a wholesale transaction.
+
+> That is, if you see what I mean…  `ACC_VOLATILE` would be another
+clever pun along the same lines, since a `volatile` variable of type
+`long` is one which suppresses tearing race conditions.  But
+`volatile` means additional things as well.  Other puns could be
+attempted with `ACC_STATIC`, `ACC_STRICT`, `ACC_NATIVE`, and more.
+John likes `ACC_FINAL` because of the JMM connection to `final`
+fields.
+
+> (**JVM ISSUE #0:** Can we kill the `ACC_VALUE` modifier bit?  Do we
  really care that `jlr.Modifiers` kind-of wants to own the reflection
  of the contextual modifier `value`?  Who are the customers of this
  modifier bit, as a bit?  John doesn't care about it personally, and
@@ -536,11 +632,11 @@ kinds of structural checks on the fly during class loading even before
  class attributes are processed.  Yet this also seems like a poor
  reason to use a modifier bit.)

-(**JVM ISSUE #1:** What if the attribute is missing; do we reject the
-class file or do we infer `value_flags=ACC_PRIVATE|ACC_VOLATILE`?
+> (**JVM ISSUE #1:** What if the attribute is missing; do we reject the
+class file or do we infer `value_flags=ACC_PRIVATE|ACC_FINAL`?
  Let's just reject the file.)

-(**JVM ISSUE #2:** Is this `ValueClass` attribute really a good place
+> (**JVM ISSUE #2:** Is this `ValueClass` attribute really a good place
  to store the "atomic" bit as well?  This attribute is a green-field
  for VM design, as opposed to the brown-field of modifier bits.  The
  above language assumes the atomic bit belongs in there as well.)
@@ -673,7 +769,7 @@ As required above, the `checkcast` bytecode treats full resolution and
  restricted resolution states the same.

  But when the `anewarray` or `multianewarray` instruction is executed,
-it consults throws an access error if its `CONSTANT_Class` is not
+it must throw an access error if its `CONSTANT_Class` is not
  fully resolved (either it is an error or is restricted).  This is how
  the JVM prevents creation of arrays whose component type is an
  inaccessible value companion type, even if the class file does
@@ -846,7 +942,7 @@ There are a number of standard API points for creating Java array
  objects.  When they create arrays containing uninitialized elements,
  then a non-constructed default value can appear.  Even when they
  create properly initialized arrays, if the type is declared
-non-atomic, then non-constructed values can be created by races.
+non-atomic, then non-constructed states can be created by races.

    - `java.lang.reflect.Array::newInstance` takes an element mirror and length and builds an array.  The elements of the returned array are initialized to the default value of the selected element type.
    - `java.util.Arrays::copyOf` and `copyOfRange` can extend the length of an existing array to include new uninitialized elements.
@@ -870,7 +966,7 @@ the creation of arrays of type `C.val[]` if `C.val` is not public.

    - The special overloading of `java.util.Arrays::copyOf` will refuse
      to create an array of any non-atomic privatized type.  (This
-    refusal protects against non-constructed values arising from data
+    refusal protects against non-constructed states arising from data
      races.)  It also incorporates the restrictions of its sibling
      methods, against creating uninitialized elements (even of an
      atomic type).
@@ -900,8 +996,8 @@ the creation of arrays of type `C.val[]` if `C.val` is not public.

  **API ISSUE #1:** Should we relax construction rules for zero-length
  arrays?  This would add complexity but might be a friendly move for
-some use cases.  A zero-length array cannot expose non-constructed
-values.  It may, however, serve as a misleading "witness" that some
+some use cases.  A zero-length array can never expose non-constructed
+states.  It may, however, serve as a misleading "witness" that some
  code has gained permission to work with flat arrays.  It's safer to
  disallow even zero-length arrays.

@@ -969,6 +1065,9 @@ refuse to expose default values of privatized value companions.
    legitimate need to convert nulls to privatized values can use
    conditional combinators to do this "the hard way".

+  - `MethodHandle::asType` will refuse to convert from a `void` return
+  to a privatized `C.val` type, similarly to `explicitCastArguments`.
+
    - The method `Lookup::accessCompanion` will be defined analogously
    to `Lookup::accessClass`.  If `Lookup::accessClass` is applied to a
    companion, it will check both the class and the companion, whereas
@@ -1003,7 +1102,7 @@ All such methods can be built on top of `MethodHandles.Lookup`.

  In general, a library API may be designed to preserve some aspect of
  companion safety, as it allows untrusted code to work with arrays of
-privatized value type, while preventing non-constructed values of that
+privatized value type, while preventing non-constructed states of that
  type from being materialized.  Each such safe and friendly API has to
  make a choice about how to prevent clients from creating
  non-constructed states, or perhaps how to allow clients to gain
@@ -1011,19 +1110,21 @@ privilege to do so.  Some points are worth remembering:

   - An unprivileged client must not obtain `C.default` if `C.val` is privatized.
   - An unprivileged client must not obtain a non-empty `C.val[]` array if `C.val` is privatized and non-atomic.
- - It's safe to build new (non-empty, mutable) arrays from (non-empty, mutable) old arrays, if the default is not injected.
+ - It's safe to build new (non-empty, mutable) arrays from (non-empty, mutable) old arrays, as long as new elements containing the `C.default` do not appear.
   - If a new array is somehow frozen or wrapped so as be effectively immutable, it is safe as long as it does not expose `C.default` values.
   - If a value companion is `public`, there is no need for any restriction.
   - Also, unrestricted use can be gated by a `Lookup` object or caller sensitivity.

> In the presence of a reconstruction capability, either in the
  language or in a library API or as provided by a single class,
-avoiding non-constructable objects includes allowing legitimate
+avoiding non-constructed instances includes allowing legitimate
  reconstruction requests; each legitimate reconstruction request must
  somehow preserve the intentions of the class's designer.
  Reconstruction should act as if field values had been legitimately
  (from `C`'s API) extracted, transformed, and then again legitimately
-(to `C`'s API) rebuilt into an instance of `C`.  Serialization is an
+(to `C`'s API) rebuilt into an instance of `C`.
+
+> Serialization is an
  example of reconstruction, since field values can be edited in the
  wire format.  Proposed `with` expressions for records are another
  example of reconstruction.  The `withfield` bytecode is the primitive
@@ -1031,7 +1132,25 @@ reconstruction operator, and must be restricted to nestmates of `C`
  since it can perform all physically possible field updates.
  Reconstruction operations defined outside of `C` must be designed with
  great care if they use elevated privileges beyond what `C` provides
-directly.
+directly.  Given the historically tricky nature of deserialization,
+more work is needed to consider what serialization of a C.val actually
+means and how it interacts with default reconstitution behaviours.
+One likely possibility is that wire formats should only work with
+`C.ref` types with proper construction paths (enforced by serialization),
+and leave conversion to `C.val` types to deserialization code inside
+the encapsulation of `C`.
+
+> JNI, like serialization, allows creation of arrays which is hard to
+constrain with access checks.  We have a choice of at least two
+positions on this.  We could allow JNI full permission to create any
+kind of arrays, thus effectively allowing it "inside the nest" of any
+value class, as far as array construction goes.  Or, we could say that
+JNI (like `Arrays::copyOf`) is absolutely forbidden to create
+uninitialized arrays of privatized value type.  The latter is probably
+acceptable.  As with other API points, programmers with a legitimate
+need to create flat privatized arrays can work around the limitations
+of the "nice" API points by using more complex ones that incorporate
+the necessary access checks.

  ## Summary of user model

@@ -1063,12 +1182,15 @@ to find a workaround, such as:
    - ask `C` politely to build such an array for you
    - crack into `C` with a reflective API and build your own

-If you look closely at the code for `C`, you might noticed that it
+If you looked closely at [the code for `C` above],
+you might have noticed that it
  uses its private type `C.val` in its public API.  This is allowed.
  Just be aware that null values will not flow through such API points.
  When you get a `C.val` value into your own code, you can work on it
  perfectly freely with the type `C` (which is `C.ref`).

+[the code for `C` above]: <#class-C>
+
  If a value companion `C.val` is declared `public`, the class has
  declared that it is willing to encounter its own default value
  `C.default` coming from untrusted code.  If it is declared `private`,
```
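
As an illustration of the `value_flags` table in the diff above, here is a minimal Java sketch of a check that accepts exactly the six listed combinations and rejects everything else.  It is not JVM code: the class name `ValueFlags`, the methods `isLegitimate` and `describe`, and the use of `IllegalArgumentException` are assumptions made for this example (the document says only that other values are rejected when the class file is loaded).  The bit values are taken from `java.lang.reflect.Modifier`.

```java
import java.lang.reflect.Modifier;

public final class ValueFlags {
    // Standard access-flag bit values, reused here for the hypothetical value_flags field.
    public static final int ACC_PUBLIC = Modifier.PUBLIC;
    public static final int ACC_PRIVATE = Modifier.PRIVATE;
    public static final int ACC_FINAL = Modifier.FINAL;

    private ValueFlags() {}

    // True only for the six combinations listed in the document:
    // at most one of PUBLIC/PRIVATE, optionally FINAL, and no other bits.
    public static boolean isLegitimate(int flags) {
        int access = flags & (ACC_PUBLIC | ACC_PRIVATE);
        int other = flags & ~(ACC_PUBLIC | ACC_PRIVATE | ACC_FINAL);
        return other == 0 && access != (ACC_PUBLIC | ACC_PRIVATE);
    }

    // Renders a legitimate value in the wording used by the table.
    // Throwing IllegalArgumentException is a stand-in for load-time rejection.
    public static String describe(int flags) {
        if (!isLegitimate(flags)) {
            throw new IllegalArgumentException(
                "illegitimate value_flags: 0x" + Integer.toHexString(flags));
        }
        String access = (flags & ACC_PUBLIC) != 0 ? "public"
                      : (flags & ACC_PRIVATE) != 0 ? "private"
                      : "default";
        String atomicity = (flags & ACC_FINAL) != 0 ? "atomic" : "non-atomic";
        return "C.val " + access + " access, " + atomicity;
    }

    public static void main(String[] args) {
        int[] samples = {
            0, ACC_PUBLIC, ACC_PRIVATE,
            ACC_FINAL, ACC_FINAL | ACC_PUBLIC, ACC_FINAL | ACC_PRIVATE
        };
        for (int flags : samples) {
            System.out.println("0x" + Integer.toHexString(flags) + ": " + describe(flags));
        }
        // A combination outside the table, such as PUBLIC|PRIVATE, is not accepted.
        System.out.println(isLegitimate(ACC_PUBLIC | ACC_PRIVATE));
    }
}
```

Under this sketch the default that JVM ISSUE #1 mentions, `ACC_PRIVATE|ACC_FINAL`, decodes as private and atomic, which matches the "safest configuration" entry in the matrix.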