nullable-inlined-values on the heap

Thu Jun 30 22:02:17 UTC 2022

Hello,

Would it be possible to decomplect nullability from a variable's
encoding-mode (reference or inline)?

I have been looking at the C# spec on "nullable-value-types" and I wonder
if the Java runtime could do something similar under the hood to allow
nullable-inlined-values, even on the heap.
I think that, compared to C#'s "value-types", Java can take advantage of
the fact that its value-class instances are immutable, which makes
pass-by-value and pass-by-reference indistinguishable. Combined with
nullable-inlined-values, this could let Java keep the variable
encoding-mode completely encapsulated/hidden from the user-model, as a
runtime implementation detail.

If this is possible, maybe Valhalla's Java could have a user-model like
this:

*** A decomplected user-model ***

For class-authors:

 - *value-knob* to reject identity - Applicable on class declarations,
indicates that the class instances don't require identity (a value-class).
 - *zero-knob* to indicate that the value-class has a zero-value - if a
value-class does not have a zero-value, its instances won't be inlined in
any shared-variables (§17.4.1.) since this is the only way for the language
to ensure the non-existence of the zero-value. If the value-class is
declared with a zero-value, then care must be taken when reading/writing
constructors since *no constructor invariant can exclude the zero-value*.
 - *tearable-knob* to allow tearing - Applicable on zero value-class
declarations with bitSize > 32 bits, may be used by the class-author to
hand the class-user the responsibility of how to avoid tearing, freeing the
runtime to always inline instances in shared-mutables (non-final
shared-variables). Conversely, if this knob is not used, instances will be
kept atomic, which allows the class-author to guarantee constructor
invariants *provided they're not broken by the zero-value*, which may be
useful for the class implementation and class-users to rely upon.
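
To illustrate why no constructor invariant can exclude the zero-value, here
is a minimal sketch in today's Java. It only simulates inlining: the names
ZeroValueSketch, FlatRationalArray and the extra private constructor are
hypothetical stand-ins for what the runtime would do, not a proposed API.

```java
final class ZeroValueSketch {
    // Stand-in for a zero value-class whose constructor rejects a zero denominator.
    static final class Rational {
        final long num;
        final long den;

        Rational(long num, long den) {
            if (den == 0) throw new IllegalArgumentException("zero denominator");
            this.num = num;
            this.den = den;
        }

        // Hypothetical stand-in for the runtime rebuilding a value from inlined
        // bits: no user constructor code runs, so no invariant is checked.
        private Rational(long num, long den, boolean fromRawBits) {
            this.num = num;
            this.den = den;
        }
    }

    // Simulates an inlined array: two longs per element and no references.
    static final class FlatRationalArray {
        private final long[] bits;

        FlatRationalArray(int length) { bits = new long[2 * length]; }

        void set(int i, Rational r) {
            bits[2 * i] = r.num;
            bits[2 * i + 1] = r.den;
        }

        Rational get(int i) {
            return new Rational(bits[2 * i], bits[2 * i + 1], true);
        }
    }
}
```

Reading an element that was never written returns a Rational whose
denominator is 0, even though the public constructor refuses to build one.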

For class-users:

 - *not-nullable-knob (!)* to exclude null from a variable's value-set -
Applicable on any variable declarations. On nullable variables, the default
value is null and, in either encoding-mode (reference or inline), the
runtime is free to choose the encoding for the extra bit of information
required to represent the null state.
 - *atomic-knob* to avoid tearing - Applicable on shared-mutable
declarations, may be used to reverse the effect of the tearable-knob,
thereby restoring atomicity.

The encoding-mode of a variable is decided at runtime according to this
ternary expression:

var encodingMode =
        !valueClass(variable.type)         ? REFERENCE    // value-knob
    :   tooBig(variable.type.bitSize)      ? REFERENCE
    :   !shared(variable)                  ? INLINE       // (§17.4.1.)
    :   !zeroValueClass(variable.type)     ? REFERENCE    // zero-knob
    :   final(variable)                    ? INLINE
    :   atomicWrite(variable.type.bitSize) ? INLINE
    :   atomic(variable)                   ? REFERENCE    // atomic-knob
    :   tearableValueClass(variable.type)  ? INLINE       // tearable-knob
    :                                        REFERENCE;

The variable.type.bitSize depends on nullability as nullable types may
require more space.
The predicates tooBig and atomicWrite depend on the hardware. As an
example, they could be:

    boolean tooBig(int bitSize)      {return bitSize > 256;}
    boolean atomicWrite(int bitSize) {return bitSize <= 64;}
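
The ternary expression above can also be written as a runnable sketch. The
TypeInfo and Variable records are hypothetical helpers introduced only for
illustration, and the cost of exactly one extra bit for a nullable variable
is an assumption; the runtime would be free to choose another encoding.

```java
final class EncodingModeSketch {
    enum EncodingMode { REFERENCE, INLINE }

    // Hypothetical: class-author knobs plus the payload size of one instance.
    record TypeInfo(boolean valueClass, boolean zeroValueClass,
                    boolean tearable, int payloadBits) {}

    // Hypothetical: class-user knobs and the context of one variable.
    record Variable(TypeInfo type, boolean nullable, boolean shared,
                    boolean isFinal, boolean atomic) {
        // Assumption: a nullable variable costs exactly one extra bit.
        int bitSize() { return type.payloadBits() + (nullable ? 1 : 0); }
    }

    // Hardware-dependent thresholds, using the example values from the text.
    static boolean tooBig(int bitSize)      { return bitSize > 256; }
    static boolean atomicWrite(int bitSize) { return bitSize <= 64; }

    // Same decision order as the ternary expression, one test per line.
    static EncodingMode encodingMode(Variable v) {
        TypeInfo t = v.type();
        if (!t.valueClass())          return EncodingMode.REFERENCE; // value-knob
        if (tooBig(v.bitSize()))      return EncodingMode.REFERENCE;
        if (!v.shared())              return EncodingMode.INLINE;    // §17.4.1
        if (!t.zeroValueClass())      return EncodingMode.REFERENCE; // zero-knob
        if (v.isFinal())              return EncodingMode.INLINE;
        if (atomicWrite(v.bitSize())) return EncodingMode.INLINE;
        if (v.atomic())               return EncodingMode.REFERENCE; // atomic-knob
        if (t.tearable())             return EncodingMode.INLINE;    // tearable-knob
        return EncodingMode.REFERENCE;
    }
}
```

Under these assumptions, a non-tearable zero value-class with a 64-bit
payload is inlined in a non-nullable shared-mutable, but falls back to
reference-encoding when the same variable is nullable, because 65 bits can
no longer be written atomically.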

Table-view of the user-model knobs:

identity            ‖ (identity)  |                          value                           |
zeroness            ‖  (no-zero)  |    (no-zero)     |                 zero                  |
atomicity           ‖  (atomic)   |     (atomic)     |      (atomic)      |     tearable     |
nullability         ‖ (?)  |  !   |  (?)   |    !    |  (?)   |     !     | (?)  |     !     |
==============================================================================================
encoding-mode       ‖  reference  |                     inline/reference                     |
needs reference     ‖ everywhere  | shared-variables | no/shared-mutables |        no        |
definite-assignment ‖  no  | yes  |   no   |   yes   |   no   |    yes    | yes  |    yes    |
default             ‖ null | n.a. |  null  |  n.a.   |  null  |   n.a.    | n.a. |   n.a.    |
init-default        ‖    null     |       null       |  null  | zero/null | null | zero/null |

Notes:
 - tokens in parentheses are the default when no knob is used
 - definite-assignment (§16.) means that the compiler enforces (to the best
of its ability) variable initialization before usage
 - default is the default-value of a variable when not definitely-assigned
 - init-default is the default-value of a variable before any
initialization code runs
 - on non-nullable zero value-classes, the init-default (zero or null) depends
on the encoding-mode chosen by the runtime
 - on atomic zero value-classes, reference-encoding is needed on
shared-mutables if instance bitSize cannot be written atomically

*** Migration of value-based classes ***

Requiring definite-assignment on all non-nullable shared-mutables is useful
to get rid of missed-initialization-bugs, so I think it's a good idea to
require it wherever source-compatibility allows.
In this model, all value-based classes can be migrated to (atomic) zero
value-classes. Due to definite-assignment, even if LocalDate is migrated to
a zero value-class, it will be hard to get an accidental "Jan 1, 1970".
Rational can also be a zero value-class but users will have to keep in mind
that it's possible to get a zero-denominator Rational, even if the
constructor throws when we try to build one.
To maintain source-compatibility, no migrated value-based class can be
tearable, not even Double or Long, since wherever in existing code we have
a field declaration such as:

    ValueBasedClass v;

v is always reference-encoded and, therefore, atomic. For Double and Long,
this is a bit of an anomaly, because it means that for these two
primitives, and for them alone, the two field declarations in each of
these pairs will not be semantically equivalent:

    long v;    // tearable
    Long! v;   // atomic

    double d;  // tearable
    Double! d; // atomic

João Mendonça