nullable-inlined-values on the heap
João Mendonça
jf.mend at gmail.com
Thu Jun 30 22:02:17 UTC 2022
Hello,
Would it be possible to decomplect nullability from a variable's
encoding-mode (reference or inline)?
I have been looking at the C# spec on "nullable-value-types" and I wonder
if the Java runtime could do something similar under the hood to allow
nullable-inlined-values, even on the heap.
I think that, compared to C#'s "value-types", Java can take advantage of
the fact that its value-class instances are immutable, so pass-by-value
and pass-by-reference are indistinguishable. Combined with
nullable-inlined-values, this could let Java keep the variable
encoding-mode completely encapsulated/hidden from the user-model, as a
runtime implementation detail.
If this is possible, maybe Valhalla's Java could have a user-model like
this:
*** A decomplected user-model ***
For class-authors:
- *value-knob* to reject identity - Applicable on class declarations,
indicates that the class instances don't require identity (a value-class).
- *zero-knob* to indicate that the value-class has a zero-value - if a
value-class does not have a zero-value, its instances won't be inlined in
any shared-variables (§17.4.1.) since this is the only way for the language
to ensure the non-existence of the zero-value. If the value-class is
declared with a zero-value, then care must be taken when reading/writing
constructors since *no constructor invariant can exclude the zero-value*.
- *tearable-knob* to allow tearing - Applicable on zero value-class
declarations with bitSize > 32 bits, may be used by the class-author to
hand the class-user the responsibility of how to avoid tearing, freeing the
runtime to always inline instances in shared-mutables (non-final
shared-variables). Conversely, if this knob is not used, instances will be
kept atomic, which allows the class-author to guarantee constructor
invariants *provided they're not broken by the zero-value*, which may be
useful for the class implementation and class-users to rely upon.
For class-users:
- *not-nullable-knob (!)* to exclude null from a variable's value-set -
Applicable on any variable declarations. On nullable variables, the default
value is null and, in either encoding-mode (reference or inline), the
runtime is free to choose the encoding for the extra bit of information
required to represent the null state.
- *atomic-knob* to avoid tearing - Applicable on shared-mutable
declarations, may be used to reverse the effect of the tearable-knob,
thereby restoring atomicity.
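To make the "extra bit of information" for the null state concrete, here is a minimal sketch of one way a nullable value could be flattened. The Point record, the PRESENT flag and the packing into a long are all assumptions made for illustration; this is not Valhalla API, and a real runtime would be free to choose a different encoding.

```java
// Hypothetical sketch: one way a runtime *could* flatten a nullable value.
// Point and the packing scheme are illustrative assumptions, not Valhalla API.
final class NullableInlineSketch {

    // Stand-in for a small value class with two 16-bit components.
    record Point(short x, short y) {}

    // Bit 32 is the "present" flag; bits 0..31 hold the payload.
    private static final long PRESENT = 1L << 32;

    // Encode a nullable Point into 33 significant bits of a long.
    static long encode(Point p) {
        if (p == null) {
            return 0L; // all-zero bits represent null, matching a null default
        }
        long payload = ((p.x() & 0xFFFFL) << 16) | (p.y() & 0xFFFFL);
        return PRESENT | payload;
    }

    // Decode back to a Point, or null when the "present" flag is clear.
    static Point decode(long bits) {
        if ((bits & PRESENT) == 0) {
            return null;
        }
        short x = (short) (bits >>> 16); // low 16 bits after the shift are x
        short y = (short) bits;          // low 16 bits of the payload are y
        return new Point(x, y);
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(null)));
        System.out.println(decode(encode(new Point((short) 0, (short) 0))));
        System.out.println(decode(encode(new Point((short) -3, (short) 7))));
    }
}
```

In this sketch the nullable variable needs 33 bits where the non-nullable one would need 32, and null stays distinguishable from a Point whose fields are all zero.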
The encoding-mode of a variable is decided at runtime according to this
ternary expression:
    var encodingMode =
        !valueClass(variable.type)           ? REFERENCE  // value-knob
      : tooBig(variable.type.bitSize)        ? REFERENCE
      : !shared(variable)                    ? INLINE     // (§17.4.1.)
      : !zeroValueClass(variable.type)       ? REFERENCE  // zero-knob
      : final(variable)                      ? INLINE
      : atomicWrite(variable.type.bitSize)   ? INLINE
      : atomic(variable)                     ? REFERENCE  // atomic-knob
      : tearableValueClass(variable.type)    ? INLINE     // tearable-knob
      : REFERENCE;
The variable.type.bitSize depends on nullability as nullable types may
require more space.
The predicates tooBig and atomicWrite depend on the hardware. As an
example, they could be:
    boolean tooBig(int bitSize) { return bitSize > 256; }
    boolean atomicWrite(int bitSize) { return bitSize <= 64; }
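For anyone who wants to experiment with the decision chain, it can be written as ordinary Java. This is only a sketch: TypeInfo, Variable and their fields are hypothetical stand-ins for information the runtime would already have, and the two thresholds are the example values given above.

```java
// Sketch of the encoding-mode decision chain; every type and field name
// here is a hypothetical stand-in, not part of any real runtime API.
final class EncodingModeSketch {

    enum EncodingMode { REFERENCE, INLINE }

    // Class-author knobs plus the (nullability-dependent) size of the type.
    record TypeInfo(boolean valueClass, boolean zeroValueClass,
                    boolean tearable, int bitSize) {}

    // Class-user facts about one variable declaration.
    record Variable(TypeInfo type, boolean shared, boolean isFinal,
                    boolean atomic) {}

    // Example hardware thresholds taken from the post.
    static boolean tooBig(int bitSize) { return bitSize > 256; }
    static boolean atomicWrite(int bitSize) { return bitSize <= 64; }

    // Same order of tests as the ternary expression in the post.
    static EncodingMode encodingMode(Variable v) {
        TypeInfo t = v.type();
        if (!t.valueClass()) return EncodingMode.REFERENCE;      // value-knob
        if (tooBig(t.bitSize())) return EncodingMode.REFERENCE;
        if (!v.shared()) return EncodingMode.INLINE;             // §17.4.1
        if (!t.zeroValueClass()) return EncodingMode.REFERENCE;  // zero-knob
        if (v.isFinal()) return EncodingMode.INLINE;
        if (atomicWrite(t.bitSize())) return EncodingMode.INLINE;
        if (v.atomic()) return EncodingMode.REFERENCE;           // atomic-knob
        if (t.tearable()) return EncodingMode.INLINE;            // tearable-knob
        return EncodingMode.REFERENCE;
    }

    public static void main(String[] args) {
        TypeInfo wideTearable = new TypeInfo(true, true, true, 128);
        // Shared, non-final field of a 128-bit tearable zero value-class.
        System.out.println(encodingMode(new Variable(wideTearable, true, false, false)));
        // Same field, but the class-user asked for atomicity.
        System.out.println(encodingMode(new Variable(wideTearable, true, false, true)));
    }
}
```

With these example thresholds, the first variable in main comes out INLINE and the second REFERENCE, which is the atomic-knob reversing the effect of the tearable-knob.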
Table-view of the user-model knobs:
identity            ‖ (identity)            | value
zeroness            ‖ (no-zero)             | (no-zero)             | zero
atomicity           ‖ (atomic)              | (atomic)              | (atomic)              | tearable
nullability         ‖ (?)       | !         | (?)       | !         | (?)       | !         | (?)       | !
===================================================================================================================
encoding-mode       ‖ reference             | inline/reference
needs reference     ‖ everywhere            | shared-variables      | no/shared-mutables    | no
definite-assignment ‖ no        | yes       | no        | yes       | no        | yes       | yes       | yes
default             ‖ null      | n.a.      | null      | n.a.      | null      | n.a.      | n.a.      | n.a.
init-default        ‖ null                  | null                  | null      | zero/null | null      | zero/null
Notes:
- tokens in parentheses are the default when no knob is used
- definite-assignment (§16.) means that the compiler enforces (to the best
of its ability) variable initialization before usage
- default is the default-value of a variable when not definitely-assigned
- init-default is the default-value of a variable before any
initialization code runs
- on non-nullable zero value-classes, the init-default (zero or null) depends
on the encoding-mode chosen by the runtime
- on atomic zero value-classes, reference-encoding is needed on
shared-mutables if instance bitSize cannot be written atomically
*** Migration of value-based classes ***
Requiring definite-assignment on all non-nullable shared-mutables is useful
to get rid of missed-initialization-bugs, so I think it's a good idea to
require it wherever source-compatibility allows.
In this model, all value-based classes can be migrated to (atomic) zero
value-classes. Due to definite-assignment, even if LocalDate is migrated to
a zero value-class, it will be hard to get an accidental "Jan 1, 1970".
Rational can also be a zero value-class but users will have to keep in mind
that it's possible to get a zero-denominator Rational, even if the
constructor throws when we try to build one.
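The Rational caveat can be illustrated in today's Java with a small sketch. The zeroValue() method below is a hypothetical stand-in for what the runtime would hand back for an uninitialized inlined variable (all fields zero bits); nothing like it exists as a real API, and the private constructor is only there to simulate bypassing the validating one.

```java
// Sketch: a constructor invariant cannot exclude the zero-value.
// zeroValue() is a hypothetical stand-in for the runtime's all-zero instance.
final class RationalSketch {

    static final class Rational {
        final long num;
        final long den;

        // The validating constructor enforces the invariant den != 0.
        Rational(long num, long den) {
            if (den == 0) {
                throw new IllegalArgumentException("zero denominator");
            }
            this.num = num;
            this.den = den;
        }

        // Simulates the zero-value: no constructor code runs, fields are 0.
        private Rational() {
            this.num = 0;
            this.den = 0;
        }

        static Rational zeroValue() {
            return new Rational();
        }

        // Code that relies on the invariant has to check for the zero-value.
        boolean isZeroValue() {
            return num == 0 && den == 0;
        }
    }

    public static void main(String[] args) {
        try {
            new Rational(1, 0);
        } catch (IllegalArgumentException e) {
            System.out.println("constructor rejected: " + e.getMessage());
        }
        Rational z = Rational.zeroValue();
        System.out.println("zero-value denominator: " + z.den);
    }
}
```

So the constructor still rejects 1/0, but code receiving a Rational cannot assume a non-zero denominator without also ruling out the zero-value.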
To maintain source-compatibility, no migrated value-based class can be
tearable, not even Double or Long, since wherever in existing code we have
a field declaration such as:
ValueBasedClass v;
v is always reference encoded and, therefore, atomic. For Double and Long,
this is a bit of an anomaly, because it means that for these two
primitives, and for them alone, the two declarations in each of these
pairs of field declarations will not be semantically equivalent:
    long v;     // tearable
    Long! v;    // atomic
    double d;   // tearable
    Double! d;  // atomic
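As a reminder of what "tearable" means here: JLS §17.7 allows a write to a non-volatile long or double to be treated as two separate 32-bit writes, while reads and writes of references are always atomic. The sketch below simulates that split deterministically in a single thread; SplitLongCell and readBetweenHalves are names made up for the illustration, and this is not a claim about how any particular JVM stores a long.

```java
// Deterministic simulation of a torn 64-bit write (not real concurrency).
// SplitLongCell is a hypothetical model of a long stored as two 32-bit halves.
final class TearingSketch {

    static final class SplitLongCell {
        private int hi;
        private int lo;

        void writeHigh(long v) { hi = (int) (v >>> 32); }
        void writeLow(long v)  { lo = (int) v; }

        long read() { return ((long) hi << 32) | (lo & 0xFFFFFFFFL); }
    }

    // Returns what a reader would see if it ran between the two half-writes.
    static long readBetweenHalves(long oldValue, long newValue) {
        SplitLongCell cell = new SplitLongCell();
        cell.writeHigh(oldValue);
        cell.writeLow(oldValue);
        cell.writeHigh(newValue); // first half of the new write lands
        return cell.read();       // the reader interleaves here
    }

    public static void main(String[] args) {
        long seen = readBetweenHalves(0L, -1L);
        // Neither the old value (0) nor the new value (-1).
        System.out.println(Long.toHexString(seen));
    }
}
```

A reference-encoded Long! field cannot show this mixed state, since swapping the reference is a single atomic write.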
João Mendonça