Value type companions, encapsulated
John Rose
john.r.rose at oracle.com
Wed Jul 13 19:13:33 UTC 2022
I have updated the document online in response to various comments.
http://cr.openjdk.java.net/~jrose/values/encapsulating-val.md
http://cr.openjdk.java.net/~jrose/values/encapsulating-val.html
The Valhalla JVM team is starting to look at these also. I expect they
will want to weigh in on the various JVMS details and issues.
So, thanks! We should start some separate threads on some of the
issues.
— John
P.S. For the record here are the diffs to the md file:
```
--- a/Users/jrose/Documents/Work/Valhalla/encapsulating-val.md.~6~
+++ b/Users/jrose/Documents/Work/Valhalla/encapsulating-val.md
@@ -1,5 +1,5 @@
% Value type companions, encapsulated
-% John Rose for Valhalla EG, June 2022 (ver 0.1)
+% John Rose for Valhalla EG, July 2022 (ver 0.2)
<meta pandoc-flags="--toc">
<style type="text/css">
@@ -19,7 +19,7 @@
p { margin: .125in 0; }
p+p,blockquote+p { text-indent: .125in; }
h4 { margin: 0; }
- a { text-decoration: none; }
+ a { text-decoration: underline; }
</style>
## Background
@@ -34,13 +34,23 @@ restrictions]** at the end.)_
### Affordances of `C.ref`
Every class or interface `C` comes with a companion type, the
-reference type `C.ref` derived from `C` which describes any variable
-(argument, return value, array element, etc.) whose values are either
-null or of a concrete class derived from `C`. We are not in the habit
+reference type `C.ref` derived from `C` which describes any expression
+(variable, return value, array element, etc.) whose values are either
+null or are instances of a concrete class derived from `C`.
+
+> We are not in the habit
of distinguishing `C.ref` from `C`, but the distinction is there. For
example, if we call `Object::getClass` on a variable of type `C.ref`
we might not get `C.class`; we might even get a null pointer
-exception!
+exception! Put another way, `C` as a class means a particular class
+declaration, while `C.ref` as a type means a variable which can refer
+to instances of class `C` or any subclass. Also `C.ref` can be
+`null`, which is of no class at all. One can view the result of
+`Object::getClass` as a *type* rather than a mere *class*, since the
+API of `Class` includes representation of types like `int` and `C.val`
+as well as classes. In any case, the fact that a class can now have
+two associated types requires a clearer distinction between classes
+and types.
We are so very used to working with reference types (for short,
_ref-types_) that we sometimes forget all that they do for us
@@ -54,7 +64,7 @@ in addition to their linkage to specific classes:
- `C.ref` allows a single large object to be shared from many locations.
- `C.ref` with an identity class can centralize access to mutable state.
- `C.ref` values uniformly convert to and from general types like `Object`.
- - `C.ref` variable types can be reflected using `Class` mirror objects.
+ - `C.ref` values are polymorphic (for non-final `C`), with varying `Object::getClass` values.
- `C.ref` is safe for publication if the fields of `C` are `final`.
When I store a bunch of `C` objects into an object array or list, sort
@@ -100,10 +110,22 @@ But the author of the class gets to decide which states are
legitimate, and the decisions are enforced by access control at the
boundaries of the encapsulation.
+> The author of an encapsulation determines whether the constant
+`C.default` is part of the public API or not. Therefore, the value of
+`C.default` is non-constructed only if `C.val` is privatized.
+
So if I code my class right, using access control to keep bad states
away from my clients, my class's external API will have no
non-constructed states.
+> Reflection and serialization provide additional modes of access to a
+class's API. The author of an encapsulation must be given control
+over these modes of access as well. (This is discussed further
+below.) If the author of `C` allows deserialization of `C` values not
+otherwise constructible via the public API, those values must be
+regarded as constructed, not non-constructed, but the API may also
+be regarded as poorly designed.
+
### Costs of `C.ref`
In that case why have value types at all, if references are so
@@ -119,7 +141,7 @@ always wish to pay:
- A reference must be able to represent `null`; tightly-packed types like `int` and `long` would need to add an extra bit somewhere to cover this.
The major alternative to references, as provided by Valhalla, is flat
-objects, where object fields are laid out immediately in their
+class instances, where instance fields are laid out immediately in their
containers, in place of a pointer which points to them stored
elsewhere. Neither alternative is always better than the other, which
is why Java has both `int` and `Integer` types and their arrays, and
@@ -140,13 +162,30 @@ The two companion types are closely related and perform some of the
same jobs:
- `C.ref` and `C.val` both give a starting point for accessing `C`'s members.
- - `C.ref` and `C.val` can link `C` objects into acyclic graphs.
+ - `C.ref` and `C.val` can link `C` instances into acyclic graphs.
- `C.ref` and `C.val` values uniformly convert to and from general types like `Object`.
- - `C.ref` and `C.val` variable types can be reflected using `Class` mirror objects.
For these jobs, it usually doesn't matter which type companion does
the work.
+> Specifically,
+
+> - An expression of the form `myc.method()` cares about the class of
+ `myc` but not which companion type it is. The same point is true
+ (probably) of methods like `Class::getMethods` which ignore the
+ distinction between the mirrors `C.ref.class` and `C.val.class`.
+
+> - I can build a tree of `C` nodes using children lists of either
+ companion type. (If however my `C` node contains direct child
+ fields they cannot be of the `C.val` type.)
+
+> - Converting a variable `myc` to `Object` (or, respectively, casting
+ an `Object` to store in `myc`), does the same kind of thing
+ regardless of which companion type `myc` has. The only difference
+ that `null` cannot be a result if `myc` is `C.val` (or,
+ respectively, that `null` is rejected as a `C.val` value).
+
+
Despite the similarities, many properties of a value companion type
are subtly different from any reference type:
@@ -158,6 +197,10 @@ are subtly different from any reference type:
- `C.val` heap variables (fields, array elements) are initialized to all-zeroes.
- `C.val` might not be safe for publication (even though its fields are `final`).
+The overall effect is that a `C.val` variable has a very specific
+concrete format, a flattened set of application-defined fields, often
+without added overhead from object headers and pointer chasing.
+
The JVM distinguishes `C.val` by giving it a different descriptor, a
so-called _Q-descriptor_ of the form `QC;`, and it also provides a
so-called _secondary mirror_ `C.val.class` which is similar to the
@@ -166,7 +209,7 @@ built-in primitive mirrors like `int.class`.
As the Valhalla performance model notes, flattening may be expected
but is not fully guaranteed. A `C.val` stored in an `Object`
container is likely to be boxed on the heap, for example. But `C.val`
-objects created as bytecode temporaries, arguments, and return values
+instances created as bytecode temporaries, arguments, and return values
are likely to be flattened into machine registers, and `C.val` fields
and array elements (at least below certain size thresholds) are also
likely to be flattened into heap words.
@@ -177,12 +220,35 @@ value class. There are additional terms and conditions for flattening
Remember that reference types have full abstraction as one of their
powers, and this means building data structures that can refer to them
even before they are loaded. But a class file can request that the JVM
-"peek" at a class to see if it is a value class, and if this request
-is acted on early enough (at the JVM's discretion), then the JVM can
+"peek" at a class to see if it is a value class.
+
+> This request is conveyed via the [`Preload` attribute] defined in
+recent drafts of [JEP 8277163 (Value Objects)]. If this request is
+acted on early enough (at the JVM's discretion), then the JVM can
choose to lay out some or all `C.ref` values as flattened `C.val`
values _plus_ a boolean or other sentinel value which indicates the
`null` state.
+> If the JVM succeeds in flattening a `C.ref` variable, the JMM still
+requires that racing reads to such a variable will always return a
+consistent, safely published state. The atomicity or non-atomicity of
+the `C.val` companion type has no effect on the races possible to a
+`C.ref` variable. Thus, flattening a `C.ref` variable with a
+non-atomic value type is not simply a matter of adding a `null`
+channel field to a struct, if races are possible on that variable.
+Most machines today provide hardware atomicity only to 128 bits, so
+racing updates must probably be accomplished within the limits of 64-
+or 128-bit reads and writes, for a flattened `C.ref`. It seems likely
+that the heap buffering enjoyed by today's value-based classes will
+also be the technique of choice in the future, at least for larger
+value classes, when their containers are in the heap. Since JVM stack
+and locals can never race, adjoining a null state for a `C.ref` value
+can be a simple matter of allocating another calling sequence register
+or stack slot, for an argument or return value.
+
+[`Preload` attribute]: <http://cr.openjdk.java.net/~dlsmith/jep8277163/jep8277163-20220519/specs/value-objects-jvms.html#jvms-4.7.31>
+[JEP 8277163 (Value Objects)]: <https://openjdk.org/jeps/8277163>
+
### Pitfalls of `C.val`
The advantages of value companion types imply some complementary
@@ -221,13 +287,16 @@ racing writes to the same mutable `C.val` variable in the heap.
Unlike reference types, value types can be manipulated to create these
non-constructed states even in well-designed classes.
-Now, it may be that a constructor (or factory) might be perfectly able
-to create one of the above non-constructed states as well, no strings
+Now, it may be that a public constructor (or factory) might be perfectly able
+to create a zero state or an arbitrary field combination, no strings
attached. In that case, the class author is enforcing few or no
invariants on the states of the value class. Many numeric classes,
like complex numbers, are like this: Initialization to all-zeroes is
no problem, and races between components are acceptable, compared to
-the costs of excluding races.
+the costs of excluding races. The worst a race condition can ever do
+is create a state that is legitimately constructed via the class API.
+We can say that a class which is this permissive has no
+non-constructed states at all.
> (The reader may recall that early JVMs accepted races on the high
and low halves of 64-bit integers as well; this is no longer a
@@ -259,6 +328,10 @@ Still, it turns out to be useful to give a common single point of
declarative control to handle _all_ non-constructed states, both
the default value of `C.val` and its mysterious data races.
+So different encapsulation authors will want to make different
+choices. We will give them the means to make these choices. And
+(spoiler alert) we will make the safest choice be the default choice.
+
## Privatization to the rescue
_(Here are the important details about the encapsulation of value
@@ -273,13 +346,13 @@ companion can be used freely, fully under control of the class author.
But untrusted clients are prevented from building uninitialized fields
or arrays of type `C.val`. This prevents such clients from creating
-(either accidentally or purposefully) non-constructed values of type
+(either accidentally or purposefully) non-constructed states of type
`C.val`. How privatization is declared and enforced is discussed in
the rest of this document.
-> (To review, for those who skipped ahead, non-constructed values are
+> (To review, for those who skipped ahead, non-constructed states are
those not created under control of the class `C` by constructors or
-other accessible API points. A non-constructed value may be either an
+other accessible API points. A non-constructed state may be either an
uninitialized variable of `C.val`, or the result of a data race on a
shared mutable variable of type `C.val`. The class itself can work
internally with such values all day long, but we exclude external
@@ -291,7 +364,7 @@ As a second tactic, a value class `C` may select whether or not the
JVM enforces atomicity of all occurrences of its value companion
`C.val`. A non-atomic value companion is subject to data races, and
if it is not privatized, external code may misuse `C.val` variables
-(in arrays or mutable fields) to create non-constructed values via
+(in arrays or mutable fields) to create non-constructed states via
data races.
A value companion which is atomic is not subject to data races. This
@@ -328,7 +401,7 @@ of both choices (privatization and declared non-atomicity), although
it is natural to try to boil down the size of the matrix.
- `C.val` private & atomic is the default, and safest configuration
- hiding all non-constructed values outside of `C` and all data races
+ hiding the most non-constructed states outside of `C` and all data races
even inside of `C`. There are some runtime costs.
- `C.val` public & non-atomic is the opposite, with fewer runtime
@@ -338,12 +411,12 @@ it is natural to try to boil down the size of the matrix.
non-atomic primitive like `long`.
- `C.val` public & atomic allows everybody to see the all-zero
- initial value but no other non-constructed states. This is
+ initial value but no racing non-constructed states. This is
analogous to the situation of a naturally atomic primitive like
`int`.
- - `C.val` private & non-atomic allows `C` complete control over the
- visibility of non-constructed states, but `C` also has the ability
+ - `C.val` private & non-atomic allows `C` complete access to and
+ control over non-constructed states, but `C` also has the ability
to work internally on arrays of non-atomic elements. `C` should
take care not to leak internally-created flat arrays to untrusted
clients, lest they use data races to hammer non-constructed values
@@ -428,7 +501,7 @@ will fail. If the companion is neither `public` nor `private`, then
Here is an example of a class which refuses to construct its default
value, and which prevents clients from seeing that state:
-```
+```{#class-C}
class C {
int neverzero;
public C(int x) {
@@ -500,7 +573,7 @@ have a right to expect that encapsulation of companion types will
hope to re-use their knowledge about how type name access works when
reasoning about companion types. We aim to accommodate that hope. If
it works, users won't have to think very often about the class-vs-type
-distinction. That is why the above design emulates pre-existing
+distinction. That is also why the above design emulates pre-existing
usage patterns for non-denotable types.
### Privatization in translation
@@ -518,13 +591,36 @@ The `value_flags` field (16 bits) has the following legitimate values:
- zero: `C.val` default access, non-atomic
- `ACC_PUBLIC`: `C.val` public access, non-atomic
- `ACC_PRIVATE`: `C.val` private access, non-atomic
- - `ACC_VOLATILE`: `C.val` default access, atomic
- - `ACC_VOLATILE|ACC_PUBLIC`: `C.val` public access, atomic
- - `ACC_VOLATILE|ACC_PRIVATE`: `C.val` private access, atomic
+ - `ACC_FINAL`: `C.val` default access, atomic
+ - `ACC_FINAL|ACC_PUBLIC`: `C.val` public access, atomic
+ - `ACC_FINAL|ACC_PRIVATE`: `C.val` private access, atomic
Other values are rejected when the class file is loaded.
-(**JVM ISSUE #0:** Can we kill the `ACC_VALUE` modifier bit? Do we
+The choice of `ACC_FINAL` for this job is arbitrary. It basically
+means "please ensure safe publication of `final` fields of this class,
+even for fields inside flattened instances." The race conditions of a
+non-atomic variable of type `C.val` are about the same as (are
+isomorphic to) the race conditions for the states reachable from a
+non-varying non-null variable of type `MC.ref`, where `MC` is a
+hypothetical identity class containing the same instance fields as
+`C`, but whose fields are not declared `final`. (Remember that `C`,
+being a value class, must have declared its fields `final`.) Omitting
+`ACC_FINAL` above means about the same as using the non-final fields
+of `MC` to store `C.val` states. Omitting `ACC_FINAL` is less safe
+for programmers, but much easier to implement in the JVM, since it can
+just peek and poke the fields retail, instead of updating the whole
+instance value in a wholesale transaction.
+
+> That is, if you see what I mean… `ACC_VOLATILE` would be another
+clever pun along the same lines, since a `volatile` variable of type
+`long` is one which suppresses tearing race conditions. But
+`volatile` means additional things as well. Other puns could be
+attempted with `ACC_STATIC`, `ACC_STRICT`, `ACC_NATIVE`, and more.
+John likes `ACC_FINAL` because of the JMM connection to `final`
+fields.
+
+> (**JVM ISSUE #0:** Can we kill the `ACC_VALUE` modifier bit? Do we
really care that `jlr.Modifiers` kind-of wants to own the reflection
of the contextual modifier `value`? Who are the customers of this
modifier bit, as a bit? John doesn't care about it personally, and
@@ -536,11 +632,11 @@ kinds of structural checks on the fly during class loading even before
class attributes are processed. Yet this also seems like a poor
reason to use a modifier bit.)
-(**JVM ISSUE #1:** What if the attribute is missing; do we reject the
-class file or do we infer `value_flags=ACC_PRIVATE|ACC_VOLATILE`?
+> (**JVM ISSUE #1:** What if the attribute is missing; do we reject the
+class file or do we infer `value_flags=ACC_PRIVATE|ACC_FINAL`?
Let's just reject the file.)
-(**JVM ISSUE #2:** Is this `ValueClass` attribute really a good place
+> (**JVM ISSUE #2:** Is this `ValueClass` attribute really a good place
to store the "atomic" bit as well? This attribute is a green-field
for VM design, as opposed to the brown-field of modifier bits. The
above language assumes the atomic bit belongs in there as well.)
@@ -673,7 +769,7 @@ As required above, the `checkcast` bytecode treats full resolution and
restricted resolution states the same.
But when the `anewarray` or `multianewarray` instruction is executed,
-it consults throws an access error if its `CONSTANT_Class` is not
+it must throw an access error if its `CONSTANT_Class` is not
fully resolved (either it is an error or is restricted). This is how
the JVM prevents creation of arrays whose component type is an
inaccessible value companion type, even if the class file does
@@ -846,7 +942,7 @@ There are a number of standard API points for creating Java array
objects. When they create arrays containing uninitialized elements,
then a non-constructed default value can appear. Even when they
create properly initialized arrays, if the type is declared
-non-atomic, then non-constructed values can be created by races.
+non-atomic, then non-constructed states can be created by races.
- `java.lang.reflect.Array::newInstance` takes an element mirror and
length and builds an array. The elements of the returned array are
initialized to the default value of the selected element type.
- `java.util.Arrays::copyOf` and `copyOfRange` can extend the length
of an existing array to include new uninitialized elements.
@@ -870,7 +966,7 @@ the creation of arrays of type `C.val[]` if `C.val` is not public.
- The special overloading of `java.util.Arrays::copyOf` will refuse
to create an array of any non-atomic privatized type. (This
- refusal protects against non-constructed values arising from data
+ refusal protects against non-constructed states arising from data
races.) It also incorporates the restrictions of its sibling
methods, against creating uninitialized elements (even of an
atomic type).
@@ -900,8 +996,8 @@ the creation of arrays of type `C.val[]` if `C.val` is not public.
**API ISSUE #1:** Should we relax construction rules for zero-length
arrays? This would add complexity but might be a friendly move for
-some use cases. A zero-length array cannot expose non-constructed
-values. It may, however, serve as a misleading "witness" that some
+some use cases. A zero-length array can never expose non-constructed
+states. It may, however, serve as a misleading "witness" that some
code has gained permission to work with flat arrays. It's safer to
disallow even zero-length arrays.
@@ -969,6 +1065,9 @@ refuse to expose default values of privatized value companions.
legitimate need to convert nulls to privatized values can use
conditional combinators to do this "the hard way".
+ - `MethodHandle::asType` will refuse to convert from a `void` return
+ to a privatized `C.val` type, similarly to `explicitCastArguments`.
+
- The method `Lookup::accessCompanion` will be defined analogously
to `Lookup::accessClass`. If `Lookup::accessClass` is applied to a
companion, it will check both the class and the companion, whereas
@@ -1003,7 +1102,7 @@ All such methods can be built on top of `MethodHandles.Lookup`.
In general, a library API may be designed to preserve some aspect of
companion safety, as it allows untrusted code to work with arrays of
-privatized value type, while preventing non-constructed values of that
+privatized value type, while preventing non-constructed states of that
type from being materialized. Each such safe and friendly API has to
make a choice about how to prevent clients from creating
non-constructed states, or perhaps how to allow clients to gain
@@ -1011,19 +1110,21 @@ privilege to do so. Some points are worth remembering:
- An unprivileged client must not obtain `C.default` if `C.val` is privatized.
- An unprivileged client must not obtain a non-empty `C.val[]` array if `C.val` is privatized and non-atomic.
- - It's safe to build new (non-empty, mutable) arrays from (non-empty, mutable) old arrays, if the default is not injected.
+ - It's safe to build new (non-empty, mutable) arrays from (non-empty, mutable) old arrays, as long as new elements containing the `C.default` do not appear.
- If a new array is somehow frozen or wrapped so as be effectively immutable, it is safe as long as it does not expose `C.default` values.
- If a value companion is `public`, there is no need for any restriction.
- Also, unrestricted use can be gated by a `Lookup` object or caller sensitivity.
> In the presence of a reconstruction capability, either in the
language or in a library API or as provided by a single class,
-avoiding non-constructable objects includes allowing legitimate
+avoiding non-constructed instances includes allowing legitimate
reconstruction requests; each legitimate reconstruction request must
somehow preserve the intentions of the class's designer.
Reconstruction should act as if field values had been legitimately
(from `C`'s API) extracted, transformed, and then again legitimately
-(to `C`'s API) rebuilt into an instance of `C`. Serialization is an
+(to `C`'s API) rebuilt into an instance of `C`.
+
+> Serialization is an
example of reconstruction, since field values can be edited in the
wire format. Proposed `with` expressions for records are another
example of reconstruction. The `withfield` bytecode is the primitive
@@ -1031,7 +1132,25 @@ reconstruction operator, and must be restricted to nestmates of `C`
since it can perform all physically possible field updates.
Reconstruction operations defined outside of `C` must be designed with
great care if they use elevated privileges beyond what `C` provides
-directly.
+directly. Given the historically tricky nature of deserialization,
+more work is needed to consider what serialization of a C.val actually
+means and how it interacts with default reconstitution behaviours.
+One likely possibility is that wire formats should only work with
+`C.ref` types with proper construction paths (enforced by serialization),
+and leave conversion to `C.val` types to deserialization code inside
+the encapsulation of `C`.
+
+> JNI, like serialization, allows creation of arrays which is hard to
+constrain with access checks. We have a choice of at least two
+positions on this. We could allow JNI full permission to create any
+kind of arrays, thus effectively allowing it "inside the nest" of any
+value class, as far as array construction goes. Or, we could say that
+JNI (like `Arrays::copyOf`) is absolutely forbidden to create
+uninitialized arrays of privatized value type. The latter is probably
+acceptable. As with other API points, programmers with a legitimate
+need to create flat privatized arrays can work around the limitations
+of the "nice" API points by using more complex ones that incorporate
+the necessary access checks.
## Summary of user model
@@ -1063,12 +1182,15 @@ to find a workaround, such as:
- ask `C` politely to build such an array for you
- crack into `C` with a reflective API and build your own
-If you look closely at the code for `C`, you might noticed that it
+If you looked closely at [the code for `C` above],
+you might have noticed that it
uses its private type `C.val` in its public API. This is allowed.
Just be aware that null values will not flow through such API points.
When you get a `C.val` value into your own code, you can work on it
perfectly freely with the type `C` (which is `C.ref`).
+[the code for `C` above]: <#class-C>
+
If a value companion `C.val` is declared `public`, the class has
declared that it is willing to encounter its own default value
`C.default` coming from untrusted code. If it is declared `private`,
```
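As a rough illustration of the `value_flags` table in the diff above, here is a minimal Java sketch of how a class-file reader might decode the six legitimate values and reject the rest. The class name `ValueFlags` and the method `describe` are hypothetical and are not part of the draft. The draft only says other values are rejected when the class file is loaded; `IllegalArgumentException` stands in for that rejection here. The `ACC_*` bit values are the standard ones from the JVMS access-flag tables.

```java
/** Hypothetical decoder for the 16-bit value_flags field; illustration only. */
final class ValueFlags {
    // Standard JVMS access-flag bit values.
    static final int ACC_PUBLIC  = 0x0001;
    static final int ACC_PRIVATE = 0x0002;
    static final int ACC_FINAL   = 0x0010; // the draft's (arbitrary) choice for "atomic"

    private ValueFlags() {}

    /**
     * Describes the C.val companion for one of the six legitimate flag values.
     * Throws IllegalArgumentException (an assumed stand-in for class-load
     * rejection) for anything else.
     */
    static String describe(int valueFlags) {
        int known = ACC_PUBLIC | ACC_PRIVATE | ACC_FINAL;
        int access = valueFlags & (ACC_PUBLIC | ACC_PRIVATE);
        boolean unknownBits = (valueFlags & ~known) != 0;
        boolean bothAccessBits = access == (ACC_PUBLIC | ACC_PRIVATE);
        if (unknownBits || bothAccessBits) {
            throw new IllegalArgumentException(
                "illegal value_flags: 0x" + Integer.toHexString(valueFlags));
        }
        String accessName = access == ACC_PUBLIC ? "public"
                          : access == ACC_PRIVATE ? "private"
                          : "default";
        boolean atomic = (valueFlags & ACC_FINAL) != 0;
        return "C.val " + accessName + " access, " + (atomic ? "atomic" : "non-atomic");
    }

    public static void main(String[] args) {
        // Walk the six legitimate combinations listed in the draft.
        int[] legal = {
            0, ACC_PUBLIC, ACC_PRIVATE,
            ACC_FINAL, ACC_FINAL | ACC_PUBLIC, ACC_FINAL | ACC_PRIVATE
        };
        for (int flags : legal) {
            System.out.println(describe(flags));
        }
    }
}
```

Under JVM ISSUE #1 as written, a missing attribute would be rejected rather than defaulted, so this sketch deliberately has no "infer `ACC_PRIVATE|ACC_FINAL`" path.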