nullable-inlined-values on the heap

João Mendonça jf.mend at gmail.com
Fri Jul 1 05:02:04 UTC 2022


>
> My comments: this posting feels mostly like a solution without stating
> what problem it is trying to solve, so it's pretty hard to comment on.
>


The problem it's trying to solve is to remove the .val and .ref
operators/knobs/concepts from the user-model without any loss of
performance or loss of control over nullability, zeroness or atomicity. In
other words, the objective is to take Kevin's ref-by-default idea one step
further.


> In theory, we could construct the union type int|Null, but this type
> doesn't have a practical representation in memory, ...
>


Would it be possible to have a value-class give rise to these 3
hidden/runtime-only companion-types on the heap:

    RefType  - reference to a value-instance or no-reference (null)
    ValType  - inlined [value-instance-fields]
    ValType? - inlined [nullability-boolean + value-instance-fields]

Then, the runtime could transparently choose between RefType|ValType for
non-nullable variables or between RefType|ValType? for nullable variables,
depending on hardware, bitSize, zeroness and atomicity constraints, as
explained by the ternary expression in my previous email. Of course, since
ValType? has a higher bitSize than ValType, nullable values will be less
likely to be inlined. But still, the point is: could nullable values
sometimes be inlined on the heap, as opposed to never being inlined?
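
To make the bitSize effect concrete, here is a rough Java sketch of the
ternary expression from my previous email, with nullability adding one bit
to the inlined size. Every name in it (EncodingModeSketch, Var, Mode, the
record components) is hypothetical and exists neither in the JDK nor in
any Valhalla prototype. The single extra bit for ValType? and the 256/64
thresholds are assumptions; a real runtime may pad or reuse slack bits.

```java
// Hypothetical sketch only: none of these names exist in the JDK or in Valhalla.
final class EncodingModeSketch {
    enum Mode { REFERENCE, INLINE }

    // Illustrative description of a variable; every component is an assumption.
    record Var(boolean valueClass, boolean zeroValueClass, boolean tearableValueClass,
               int fieldBits, boolean nullable, boolean shared,
               boolean isFinal, boolean atomic) {
        // ValType? is modelled as ValType plus one nullability bit (assumed).
        int bitSize() { return fieldBits + (nullable ? 1 : 0); }
    }

    // Example hardware predicates, using the thresholds from the earlier email.
    static boolean tooBig(int bitSize)      { return bitSize > 256; }
    static boolean atomicWrite(int bitSize) { return bitSize <= 64; }

    // Same decision order as the ternary expression in the earlier email.
    static Mode encodingMode(Var v) {
        return !v.valueClass()          ? Mode.REFERENCE   // value-knob
             : tooBig(v.bitSize())      ? Mode.REFERENCE
             : !v.shared()              ? Mode.INLINE      // (§17.4.1.)
             : !v.zeroValueClass()      ? Mode.REFERENCE   // zero-knob
             : v.isFinal()              ? Mode.INLINE
             : atomicWrite(v.bitSize()) ? Mode.INLINE
             : v.atomic()               ? Mode.REFERENCE   // atomic-knob
             : v.tearableValueClass()   ? Mode.INLINE      // tearable-knob
             :                            Mode.REFERENCE;
    }
}
```

With these example thresholds, a shared, non-final field of an (atomic)
zero value-class with 64 bits of instance fields is inlined when
non-nullable and falls back to a reference when nullable (65 bits), while
a 32-bit one stays inlined either way.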


> In theory, we could construct the union type int|Null, but this (...) drags
> in all sorts of mismatches because union types would then flow throughout
> the system.
>


Is my 3-companion-types solution a real union type? Sure, I am suggesting
two sort-of-unions:

    RefType|ValType  - for non-nullable value-class variables
    RefType|ValType? - for nullable value-class variables

However, to the user, both types in each union represent exactly the same
value-set.
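
As an illustration of "same value-set, different encoding", here is a
small Java model of the nullable case. It is only a sketch: Point, RefSlot
and InlineSlot are made-up names standing in for a value-class, its
RefType and its ValType?, and ordinary records are used to imitate what
would really be a hidden runtime layout.

```java
// Hypothetical model only: the real encodings would be invisible runtime layouts.
final class CompanionEncodingSketch {
    // Stand-in for a value-class instance; records compare by state.
    record Point(int x, int y) {}

    // What user code sees: a nullable Point, whatever the encoding is.
    interface NullablePointSlot {
        Point read();
    }

    // "RefType": a reference that may be null.
    record RefSlot(Point ref) implements NullablePointSlot {
        public Point read() { return ref; }
    }

    // "ValType?": a nullability flag followed by the flattened fields.
    record InlineSlot(boolean present, int x, int y) implements NullablePointSlot {
        static InlineSlot of(Point p) {
            return p == null ? new InlineSlot(false, 0, 0)
                             : new InlineSlot(true, p.x(), p.y());
        }
        public Point read() { return present ? new Point(x, y) : null; }
    }

    // Two slots hold the same user-visible value if they decode to equal Points.
    static boolean sameValue(NullablePointSlot a, NullablePointSlot b) {
        return java.util.Objects.equals(a.read(), b.read());
    }
}
```

In this model a RefSlot and an InlineSlot built from the same Point (or
from null) always read back equal, and null stays distinct from the
all-zero Point because the flag carries that one extra bit.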


On Fri, 1 Jul 2022 at 00:55, Brian Goetz <brian.goetz at oracle.com> wrote:

> From the -comments list.
>
> My comments: this posting feels mostly like a solution without stating
> what problem it is trying to solve, so it's pretty hard to comment on.  But
> ...
>
> Would it be possible to decomplect nullability from a variable's
> encoding-mode (reference or inline)?
>
>
> Not in reality.  A null is fundamentally a *reference* (or the absence of
> a reference.)  In theory, we could construct the union type int|Null, but
> this type doesn't have a practical representation in memory, and drags in
> all sorts of mismatches because union types would then flow throughout the
> system.  So the only practical way to represent "int or null" is "reference
> to int."  Which is to say, Integer (minus identity.)
>
> If this is possible, maybe Valhalla's Java could have a user-model like
> this:
>
>
> You should probably start with what problem you are trying to solve.
>
>
>
>
>
> -------- Forwarded Message --------
> Subject: nullable-inlined-values on the heap
> Date: Thu, 30 Jun 2022 23:02:17 +0100
> From: João Mendonça <jf.mend at gmail.com>
> To: valhalla-spec-comments at openjdk.org
>
> Hello,
>
>
> Would it be possible to decomplect nullability from a variable's
> encoding-mode (reference or inline)?
>
> I have been looking at the C# spec on "nullable-value-types" and I wonder
> if the Java runtime could do something similar under the hood to allow
> nullable-inlined-values, even on the heap.
> I think that, compared to C#'s "value-types", Java can take advantage of
> the fact that its value-class instances are immutable, which means that
> pass-by-value and pass-by-reference are indistinguishable, which, with
> nullable-inlined-values, could mean that Java can have the variable
> encoding-mode completely encapsulated/hidden from the user-model as a
> runtime implementation detail.
>
> If this is possible, maybe Valhalla's Java could have a user-model like
> this:
>
>
> *** A decomplected user-model ***
>
> For class-authors:
>
>  - *value-knob* to reject identity - Applicable on class declarations,
> indicates that the class instances don't require identity (a value-class).
>  - *zero-knob* to indicate that the value-class has a zero-value - if a
> value-class does not have a zero-value, its instances won't be inlined in
> any shared-variables (§17.4.1.) since this is the only way for the language
> to ensure the non-existence of the zero-value. If the value-class is
> declared with a zero-value, then care must be taken when reading/writing
> constructors since *no constructor invariant can exclude the zero-value*.
>  - *tearable-knob* to allow tearing - Applicable on zero value-class
> declarations with bitSize > 32 bits, may be used by the class-author to
> hand the class-user the responsibility of how to avoid tearing, freeing the
> runtime to always inline instances in shared-mutables (non-final
> shared-variables). Conversely, if this knob is not used, instances will be
> kept atomic, which allows the class-author to guarantee constructor
> invariants *provided they're not broken by the zero-value*, which may be
> useful for the class implementation and class-users to rely upon.
>
> For class-users:
>
>  - *not-nullable-knob (!)* to exclude null from a variable's value-set -
> Applicable on any variable declarations. On nullable variables, the default
> value is null and, in either encoding-mode (reference or inline), the
> runtime is free to choose the encoding for the extra bit of information
> required to represent the null state.
>  - *atomic-knob* to avoid tearing - Applicable on shared-mutable
> declarations, may be used to reverse the effect of the tearable-knob,
> thereby restoring atomicity.
>
>
> The encoding-mode of a variable is decided at runtime according to this
> ternary expression:
>
> var encodingMode =
>         !valueClass(variable.type)         ? REFERENCE    // value-knob
>     :   tooBig(variable.type.bitSize)      ? REFERENCE
>     :   !shared(variable)                  ? INLINE       // (§17.4.1.)
>     :   !zeroValueClass(variable.type)     ? REFERENCE    // zero-knob
>     :   final(variable)                    ? INLINE
>     :   atomicWrite(variable.type.bitSize) ? INLINE
>     :   atomic(variable)                   ? REFERENCE    // atomic-knob
>     :   tearableValueClass(variable.type)  ? INLINE       // tearable-knob
>     :                                        REFERENCE;
>
> The variable.type.bitSize depends on nullability as nullable types may
> require more space.
> The predicates tooBig and atomicWrite depend on the hardware. As an
> example, they could be:
>
>     boolean tooBig(int bitSize)      {return bitSize > 256;}
>     boolean atomicWrite(int bitSize) {return bitSize <= 64;}
>
>
> Table-view of the user-model knobs:
>
> identity            ‖  (identity) |                          value                           |
> zeroness            ‖  (no-zero)  |    (no-zero)     |                 zero                  |
> atomicity           ‖  (atomic)   |     (atomic)     |      (atomic)      |     tearable     |
> nullability         ‖ (?)  |  !   |  (?)   |    !    | (?)  |      !      | (?)  |     !     |
> ==============================================================================================
> encoding-mode       ‖  reference  |                     inline/reference                     |
> needs reference     ‖  everywhere | shared-variables | no/shared-mutables |        no        |
> definite-assignment ‖  no  | yes  |   no   |   yes   |  no  |     yes     | yes  |    yes    |
> default             ‖ null | n.a. |  null  |  n.a.   | null |    n.a.     | n.a. |   n.a.    |
> init-default        ‖    null     |       null       | null |  zero/null  | null | zero/null |
>
> Notes:
>  - tokens in parentheses are the default when no knob is used
>  - definite-assignment (§16.) means that the compiler enforces (to the
> best of its ability) variable initialization before usage
>  - default is the default-value of a variable when not definitely-assigned
>  - init-default is the default-value of a variable before any
> initialization code runs
>  - on non-nullable zero value-classes, the init-default (zero or null) depends
> on the encoding-mode chosen by the runtime
>  - on atomic zero value-classes, reference-encoding is needed on
> shared-mutables if instance bitSize cannot be written atomically
>
>
> *** Migration of value-based classes ***
>
> Requiring definite-assignment on all non-nullable shared-mutables is
> useful to get rid of missed-initialization-bugs, so I think it's a good
> idea to require it wherever source-compatibility allows.
> In this model, all value-based classes can be migrated to (atomic) zero
> value-classes. Due to definite-assignment, even if LocalDate is migrated to
> a zero value-class, it will be hard to get an accidental "Jan 1, 1970".
> Rational can also be a zero value-class but users will have to keep in mind
> that it's possible to get a zero-denominator Rational, even if the
> constructor throws when we try to build one.
> To maintain source-compatibility, no migrated value-based class can be
> tearable, not even Double or Long, since wherever in existing code we have
> a field declaration such as:
>
>     ValueBasedClass v;
>
> v is always reference encoded and, therefore, atomic. For Double and Long,
> this is a bit of an anomaly, because it means that for these two
> primitives, and for them alone, each of these pairs of field declarations
> will not be semantically equivalent:
>
>     long v;    // tearable
>     Long! v;   // atomic
>
>     double d;  // tearable
>     Double! d; // atomic
>
>
> João Mendonça
>