Fwd: nullable-inlined-values on the heap

Thu Jun 30 23:54:12 UTC 2022

 From the -comments list.

My comments: this posting feels mostly like a solution that never states 
the problem it is trying to solve, so it's pretty hard to comment on.  
But ...

> Would it be possible to decomplect nullability from a variable's 
> encoding-mode (reference or inline)?

Not in reality.  A null is fundamentally a *reference* (or the absence 
of a reference.)  In theory, we could construct the union type int|Null, 
but this type doesn't have a practical representation in memory, and 
drags in all sorts of mismatches because union types would then flow 
throughout the system.  So the only practical way to represent "int or 
null" is "reference to int."  Which is to say, Integer (minus identity.)

> If this is possible, maybe Valhalla's Java could have a user-model 
> like this:

You should probably start with what problem you are trying to solve.

-------- Forwarded Message --------
Subject: 	nullable-inlined-values on the heap
Date: 	Thu, 30 Jun 2022 23:02:17 +0100
From: 	João Mendonça <jf.mend at gmail.com>
To: 	valhalla-spec-comments at openjdk.org

Hello,

Would it be possible to decomplect nullability from a variable's 
encoding-mode (reference or inline)?

I have been looking at the C# spec on "nullable-value-types" and I 
wonder if the Java runtime could do something similar under the hood to 
allow nullable-inlined-values, even on the heap.
I think that, compared to C#'s "value-types", Java can take advantage of 
the fact that its value-class instances are immutable, which makes 
pass-by-value and pass-by-reference indistinguishable. With 
nullable-inlined-values, this could mean that the variable encoding-mode 
is completely encapsulated/hidden from the user-model as a runtime 
implementation detail.

If this is possible, maybe Valhalla's Java could have a user-model like 
this:

*** A decomplected user-model ***

For class-authors:

  - *value-knob* to reject identity - Applicable on class declarations, 
indicates that the class instances don't require identity (a value-class).
  - *zero-knob* to indicate that the value-class has a zero-value - if a 
value-class does not have a zero-value, its instances won't be inlined 
in any shared-variables (§17.4.1.) since this is the only way for the 
language to ensure the non-existence of the zero-value. If the 
value-class is declared with a zero-value, then care must be taken when 
reading/writing constructors since *no constructor invariant can exclude 
the zero-value*.
  - *tearable-knob* to allow tearing - Applicable on zero value-class 
declarations with bitSize > 32 bits, may be used by the class-author to 
hand the class-user the responsibility of how to avoid tearing, freeing 
the runtime to always inline instances in shared-mutables (non-final 
shared-variables). Conversely, if this knob is not used, instances will 
be kept atomic, which allows the class-author to guarantee constructor 
invariants *provided they're not broken by the zero-value*, which may be 
useful for the class implementation and class-users to rely upon.

For class-users:

  - *not-nullable-knob (!)* to exclude null from a variable's value-set 
- Applicable on any variable declarations. On nullable variables, the 
default value is null and, in either encoding-mode (reference or 
inline), the runtime is free to choose the encoding for the extra bit of 
information required to represent the null state.
  - *atomic-knob* to avoid tearing - Applicable on shared-mutable 
declarations, may be used to reverse the effect of the tearable-knob, 
thereby restoring atomicity.
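
To make the "extra bit of information" for the null state concrete, here 
is a minimal sketch in today's Java of one possible inline encoding of a 
nullable 32-bit value. It is only an illustration: the class name 
NullableIntSlot, the PRESENT flag and the choice of a 64-bit slot are all 
assumptions, not anything the runtime is known to do.

```java
/** Hypothetical inline encoding of "int or null" in a single 64-bit slot. */
final class NullableIntSlot {

    /** Assumed null-channel bit: set when a value is present. */
    private static final long PRESENT = 1L << 32;

    /** All-zero bits decode to null, so a zeroed slot defaults to null. */
    static final long NULL = 0L;

    /** Packs the value into the low 32 bits and sets the presence flag. */
    static long encode(Integer value) {
        return value == null ? NULL : PRESENT | (value & 0xFFFF_FFFFL);
    }

    /** Returns null when the presence flag is clear, else the low 32 bits. */
    static Integer decode(long slot) {
        return (slot & PRESENT) == 0 ? null : (int) slot;
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(null))); // prints "null"
        System.out.println(decode(encode(-5)));   // prints "-5"
        System.out.println(decode(NULL));         // prints "null"
    }
}
```

With this layout a nullable int needs 33 bits rather than 32, which is 
the kind of size difference between nullable and non-nullable types that 
the bitSize predicates would have to account for.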

The encoding-mode of a variable is decided at runtime according to this 
ternary expression:

var encodingMode =
         !valueClass(variable.type)         ? REFERENCE // value-knob
     :   tooBig(variable.type.bitSize)      ? REFERENCE
     :   !shared(variable)                  ? INLINE // (§17.4.1.)
     :   !zeroValueClass(variable.type)     ? REFERENCE // zero-knob
     :   final(variable)                    ? INLINE
     :   atomicWrite(variable.type.bitSize) ? INLINE
     :   atomic(variable)                   ? REFERENCE // atomic-knob
     :   tearableValueClass(variable.type)  ? INLINE // tearable-knob
     :                                        REFERENCE;

The variable.type.bitSize depends on nullability as nullable types may 
require more space.
The predicates tooBig and atomicWrite depend on the hardware. As an 
example, they could be:

     boolean tooBig(int bitSize)      {return bitSize > 256;}
     boolean atomicWrite(int bitSize) {return bitSize <= 64;}
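
The ternary cascade above can also be written as a small runnable model. 
This is a sketch under stated assumptions: the TypeInfo and Variable 
records, their field names and the Mode enum are hypothetical stand-ins 
for the knobs described in this message, not Valhalla APIs, and the 
thresholds are the example values given above.

```java
/** Hypothetical model of the encoding-mode decision; not a Valhalla API. */
final class EncodingModeSketch {

    enum Mode { REFERENCE, INLINE }

    /** Class-author knobs plus the (nullability-adjusted) size of the type. */
    record TypeInfo(boolean valueClass, boolean zeroValueClass,
                    boolean tearable, int bitSize) {}

    /** Class-user facts about one variable. */
    record Variable(TypeInfo type, boolean shared, boolean isFinal,
                    boolean atomic) {}

    // Example hardware thresholds, copied from the message.
    static boolean tooBig(int bitSize)      { return bitSize > 256; }
    static boolean atomicWrite(int bitSize) { return bitSize <= 64; }

    /** Same checks, in the same order, as the ternary expression. */
    static Mode encodingMode(Variable v) {
        TypeInfo t = v.type();
        if (!t.valueClass())          return Mode.REFERENCE; // value-knob
        if (tooBig(t.bitSize()))      return Mode.REFERENCE;
        if (!v.shared())              return Mode.INLINE;    // (§17.4.1.)
        if (!t.zeroValueClass())      return Mode.REFERENCE; // zero-knob
        if (v.isFinal())              return Mode.INLINE;
        if (atomicWrite(t.bitSize())) return Mode.INLINE;
        if (v.atomic())               return Mode.REFERENCE; // atomic-knob
        if (t.tearable())             return Mode.INLINE;    // tearable-knob
        return Mode.REFERENCE;
    }

    public static void main(String[] args) {
        // A 128-bit tearable zero value-class in a shared-mutable field.
        TypeInfo wide = new TypeInfo(true, true, true, 128);
        System.out.println(encodingMode(new Variable(wide, true, false, false))); // INLINE
        // The same field with the atomic-knob applied by the class-user.
        System.out.println(encodingMode(new Variable(wide, true, false, true)));  // REFERENCE
    }
}
```

Because the checks run in order, the atomic-knob only matters for 
shared-mutables whose type is too wide for an atomic write; narrower 
types are inlined before that check is reached.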

Table-view of the user-model knobs:

identity            ‖      (identity)       |                                 value                                 |
zeroness            ‖       (no-zero)       |       (no-zero)       |                     zero                      |
atomicity           ‖       (atomic)        |       (atomic)        |       (atomic)        |       tearable        |
nullability         ‖    (?)    |     !     |    (?)    |     !     |    (?)    |     !     |    (?)    |     !     |
=====================================================================================================================
encoding-mode       ‖       reference       |                           inline/reference                            |
needs reference     ‖      everywhere       |   shared-variables    |  no/shared-mutables   |          no           |
definite-assignment ‖    no     |    yes    |    no     |    yes    |    no     |    yes    |    yes    |    yes    |
default             ‖   null    |   n.a.    |   null    |   n.a.    |   null    |   n.a.    |   n.a.    |   n.a.    |
init-default        ‖         null          |         null          |   null    | zero/null |   null    | zero/null |

Notes:
  - tokens in parentheses are the default when no knob is used
  - definite-assignment (§16.) means that the compiler enforces (to the 
best of its ability) variable initialization before usage
  - default is the default-value of a variable when not definitely-assigned
  - init-default is the default-value of a variable before any 
initialization code runs
  - on non-nullable zero value-classes, the init-default (zero or null) 
depends on the encoding-mode chosen by the runtime
  - on atomic zero value-classes, reference-encoding is needed on 
shared-mutables if instance bitSize cannot be written atomically

*** Migration of value-based classes ***

Requiring definite-assignment on all non-nullable shared-mutables is 
useful to get rid of missed-initialization-bugs, so I think it's a good 
idea to require it wherever source-compatibility allows.
In this model, all value-based classes can be migrated to (atomic) zero 
value-classes. Due to definite-assignment, even if LocalDate is migrated 
to a zero value-class, it will be hard to get an accidental "Jan 1, 
1970". Rational can also be a zero value-class but users will have to 
keep in mind that it's possible to get a zero-denominator Rational, even 
if the constructor throws when we try to build one.
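
The Rational remark can be illustrated in today's Java by simulating 
inlined storage with two parallel long arrays. This is only a sketch: 
FlatRationalArray and its methods are hypothetical names, and a real 
zero value-class would not be written this way. The point is that a 
freshly allocated slot already holds the all-zero bit pattern 0/0, which 
no constructor-style check ever saw.

```java
/** Hypothetical flattened storage for rationals; simulates inlining. */
final class FlatRationalArray {
    private final long[] nums;
    private final long[] dens;

    /** Fresh arrays are zero-filled, so every slot starts as 0/0. */
    FlatRationalArray(int length) {
        nums = new long[length];
        dens = new long[length];
    }

    /** Plays the role of the constructor: rejects a zero denominator. */
    void set(int index, long num, long den) {
        if (den == 0) {
            throw new IllegalArgumentException("zero denominator");
        }
        nums[index] = num;
        dens[index] = den;
    }

    long num(int index) { return nums[index]; }
    long den(int index) { return dens[index]; }

    public static void main(String[] args) {
        FlatRationalArray a = new FlatRationalArray(2);
        a.set(0, 1, 2);
        System.out.println(a.den(0)); // prints "2"
        System.out.println(a.den(1)); // prints "0": the zero-value, never validated
    }
}
```

Definite-assignment makes reading such a slot by accident harder, but it 
does not make the zero-value unrepresentable.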
To maintain source-compatibility, no migrated value-based class can be 
tearable, not even Double or Long, since wherever in existing code we 
have a field declaration such as:

     ValueBasedClass v;

v is always reference-encoded and, therefore, atomic. For Double and 
Long, this is a bit of an anomaly, because it means that for these two 
primitives, and for them alone, the two field declarations in each of 
the following pairs will not be semantically equivalent:

long v;    // tearable
Long! v;   // atomic

double d; // tearable
Double! d; // atomic

João Mendonça