Addressing the full range of use cases

Wed Oct 6 09:56:27 UTC 2021

----- Original Message -----
> From: "daniel smith" <daniel.smith at oracle.com>
> To: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Tuesday, October 5, 2021 01:34:37
> Subject: Addressing the full range of use cases

> When we talk about use cases for Valhalla, we've often considered a very broad
> set of class abstractions that represent immutable, identity-free data. JEP 401
> mentions varieties of integers and floats, points, dates and times, tuples,
> records, subarrays, cursors, etc. However, as shorthand this broad set often
> gets reduced to an example like Point or Int128, and these latter examples are
> not necessarily representative of all candidate value types.

yes !

> 
> Specifically, our favorite example classes have a property that doesn't
> generalize: they'll happily accept any combination of field values as a valid
> instance. (In fact, they're even happy to accept any combination of *bits* of
> the appropriate length.) Many candidate primitive classes don't have this
> property—the constructors do important validation work, and only certain
> combinations of fields are allowed to represent valid instances.

I now believe the mantra "codes like a class, works like an int" is harmful.
A class provides encapsulation while an int has none, so there is a mismatch.

> 
> Related areas of concern that we've had on the radar for awhile:
> 
> - The "all zeros is your default value" strategy forces an all-zero instance
> into the class's value set, even if that doesn't make sense for the class. Many
> candidate classes have no reasonable default at all, leading naturally to wish
> for "null is your default value" (or other, more exotic, strategies involving
> revisiting the idea that every type has a default value). We've provided
> 'P.ref' for those use sites that *need* null, but haven't provided a complete
> story for value types that want it to be *their* default value, too.
> 
> - Non-atomic heap updates can be used to create new instances that arbitrarily
> combine previously-validated instances' fields. There is no guarantee that the
> new combination of fields is semantically valid. Again, while there's precedent
> for this with 'double' and 'long' (JLS 17.7), those are special cases that
> don't generalize—any combination of double bit fields is *still a valid
> double*. (This is usually described as "tearing", although JLS 17.6 has
> something else in mind when it uses that word...) The language provides
> 'volatile' as a use-site opt-in to atomicity, and we've toyed with a
> declaration-site opt-in as well. But object integrity being "off" by default
> may not be ideal.
> 
> - Existing class types like LocalDate are both nullable and atomic. These are
> useful properties to preserve during migration; nullability, in particular, is
> essential for source compatibility. We've provided reference-default
> declarations as a mechanism to make reference types (which have these
> properties) the default, with 'P.val' as an opt-in to value types. But in doing
> so we take away the many benefits of value types by default, and force new code
> to work with the "bad name".

The existing class LocalDate is not atomic per se: atomic in Java implies volatile, and currently, if a LocalDate field is updated in one thread, another thread may never see that update.
LocalDate is currently not tearable, whereas a QLocalDate; is tearable in the case of racy code.

And yes, nullability is a huge compatibility issue.

> 
> While we can provide enough knobs to accommodate all of these special cases,
> we're left with a complex user model which asks class authors to make n
> different choices they may not immediately grasp the consequences of, and class
> users to keep 2^n different categories straight in their heads.

yes !

> 
> As an alternative, we've been exploring whether a simpler model is workable. It
> is becoming clear that there are (at least) two clusters of uses for value
> types.  The "classic" value types are like numerics -- they'll happily accept
> any combination of field values as a valid instance, and the zero value is a
> sensible (often the best possible) default value.  They make relatively little
> use of encapsulation.  These are the ones that best "work like an int."  The
> "encapsulated" value types are those that are more like typical aggregates
> ("codes like a class") -- their constructors do important validation work, and
> only certain combinations of fields are allowed to represent valid instances.
> These are more likely to not have valid zero values (and hence want to be
> nullable).

I agree.

> 
> Some questions to consider for this approach:
> 
> - How do we group features into clusters so that they meet the sweet spot of
> user expectations and use cases while minimizing complexity? Is two clusters
> the right number? Is two already too many? (And what do we call them? What
> keywords best convey the intended intuitions?)

Two is too many, see below.

> 
> - If there are knobs within the clusters, what are the right defaults? E.g.,
> should atomicity be opt-in or opt-out?

I prefer opt-in, see below.

> 
> - What are the performance costs (or, in the other direction, performance gains)
> associated with each feature? For certain feature combinations, have we
> canceled out the performance gains over identity classes (and at that point, is
> that combination even worth supporting?)

Good question ...

Let me reformulate.

But before that, we can note that we have 3 ways of specifying primitive class features:
- we can use different types, for example, Foo.val vs Foo.ref
- we can have container attributes (opt-in or opt-out), for example, declaring a field volatile makes it non-tearable
- we have runtime knobs, for example, an array can allow null or not.

First, the problem. As you said, if we have code like the one just below,
the field primFoo is flattened, so primFoo.someValue is 0, bypassing the constructor.

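  primitive class PrimFoo {
    PrimFoo(int someValue) {
      // the constructor rejects 0 ...
      if (someValue == 0) { throw new IllegalArgumentException(); }
      this.someValue = someValue;
    }

    int someValue;
  }

  class Foo {
    // ... but the flattened field starts as all zeros, so primFoo.someValue == 0
    PrimFoo primFoo;
  }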
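To make this concrete in today's Java, here is a rough sketch, not Valhalla code. PrimFooModel, FlatFooStorage and fromRawBits are hypothetical names used only for illustration; the private constructor stands in for the VM materializing a value from flattened bits without running any validation.

```java
// Sketch only: plain Java standing in for a flattened PrimFoo field.
// All names here are hypothetical, none of this is Valhalla API.
final class PrimFooModel {
    final int someValue;

    // The "real" constructor, which validates its argument.
    PrimFooModel(int someValue) {
        if (someValue == 0) {
            throw new IllegalArgumentException("someValue must not be 0");
        }
        this.someValue = someValue;
    }

    // Models the VM rebuilding a value from flattened bits: no validation runs.
    private PrimFooModel(int rawBits, boolean unchecked) {
        this.someValue = rawBits;
    }

    static PrimFooModel fromRawBits(int rawBits) {
        return new PrimFooModel(rawBits, true);
    }
}

// Models the container (class Foo above): the payload is stored inline
// and is zero-initialized like any other field.
final class FlatFooStorage {
    private int someValueBits;

    PrimFooModel read() {
        return PrimFooModel.fromRawBits(someValueBits);
    }

    void write(PrimFooModel v) {
        someValueBits = v.someValue;
    }
}

final class DefaultValueDemo {
    public static void main(String[] args) {
        FlatFooStorage foo = new FlatFooStorage();
        // Never written, yet readable: the constructor check was bypassed.
        System.out.println(foo.read().someValue); // prints 0

        try {
            new PrimFooModel(0);
        } catch (IllegalArgumentException e) {
            System.out.println("constructor rejects 0: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that the value 0 is reachable through the storage even though no constructor call could ever have produced it.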

I believe we should try to make a primitive class nullable and flattenable by default, so that we have one tent pole, with knobs for 2 special cases: non-nullable primitive classes (for use cases like Complex) and classes that are non-flattenable when stored in a field/array cell (the "atomicity" use case).

So a primitive class (the default):
- represents the null value (initialized) with a supplementary field when stored on the heap, and a supplementary register if necessary
- is tearable in case of racy code (don't write racy code)
- is represented by a Q-type in the bytecode for full flattening, or by an L-type using a pointer to be backward compatible
- is represented by two different java.lang.Class objects (one for the Q-type, the primary class, and one for the L-type, the secondary class)
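The "supplementary field" idea can be sketched in plain Java as follows. This is only an illustration under my own assumptions: ValidatedFoo and NullableFlatSlot are hypothetical names, and a real VM would pick its own layout for the extra bit.

```java
// Sketch only, hypothetical names: a flattened slot with one extra
// "initialized" bit acting as the null channel.
final class ValidatedFoo {
    final int someValue;

    ValidatedFoo(int someValue) {
        if (someValue == 0) {
            throw new IllegalArgumentException("someValue must not be 0");
        }
        this.someValue = someValue;
    }
}

final class NullableFlatSlot {
    // false by default, so a freshly created slot reads as null
    private boolean initialized;
    // the flattened payload
    private int someValue;

    void write(ValidatedFoo v) {
        if (v == null) {
            initialized = false;
            someValue = 0;
        } else {
            someValue = v.someValue;
            initialized = true;
        }
    }

    ValidatedFoo read() {
        // The payload is only interpreted when the extra bit is set, so the
        // all-zero default never shows up as a ValidatedFoo instance.
        return initialized ? new ValidatedFoo(someValue) : null;
    }
}

final class NullChannelDemo {
    public static void main(String[] args) {
        NullableFlatSlot slot = new NullableFlatSlot();
        System.out.println(slot.read()); // prints null

        slot.write(new ValidatedFoo(42));
        System.out.println(slot.read().someValue); // prints 42

        slot.write(null);
        System.out.println(slot.read()); // prints null
    }
}
```

With this layout the default state of the slot is null rather than an all-zero instance, at the cost of one more field per flattened value.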

I think that a Q-type can be backward compatible with an L-type in method descriptors: a Q-type should be represented as an L-type plus an out-of-band bit saying that this is a Q-type, so it should be loaded eagerly (like we use out-of-band attributes for generic specialization). Obviously, the way to create a Q-type (default + with + with) is still different from an L-type (new + dup + invokespecial), so creating a Q-type instead of an L-type is not backward compatible. So the VM has to generate several method entry points for a method that is annotated with the attribute saying there is a Q-type in the descriptor (or that overrides a method with such an attribute).
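As a rough illustration of "L-type descriptor plus out-of-band bit": the descriptor string below is what java.lang.invoke.MethodType really produces today, while the BitSet is only my stand-in for a side attribute (no such class-file attribute exists; DescriptorSketch and its methods are hypothetical).

```java
import java.lang.invoke.MethodType;
import java.time.LocalDate;
import java.util.BitSet;

// Sketch only: the descriptor stays L-based, and a side table marks which
// parameters should be treated as Q-types. The side table is hypothetical.
final class DescriptorSketch {
    static String descriptorOf(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes)
                         .toMethodDescriptorString();
    }

    // Hypothetical out-of-band marking: bit i set means parameter i is a Q-type.
    static BitSet markQTypes(int... parameterIndexes) {
        BitSet bits = new BitSet();
        for (int index : parameterIndexes) {
            bits.set(index);
        }
        return bits;
    }

    public static void main(String[] args) {
        String descriptor = descriptorOf(void.class, LocalDate.class, int.class);
        BitSet qTypes = markQTypes(0);

        // Old clients link against this string, which does not change.
        System.out.println(descriptor); // prints (Ljava/time/LocalDate;I)V
        // New clients additionally consult the side table.
        System.out.println("parameter 0 is a Q-type: " + qTypes.get(0));
    }
}
```

Because the descriptor string itself is unchanged, code compiled against the L-type version still resolves the method; only the side information differs.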

The special cases:
1) non-nullable when flattened.
   I believe that all primitive types should be nullable, but that a user should have a knob to choose that a primitive class is non-nullable when flattened.
   So the VM will throw an NPE if a field or an array is annotated with something saying that null is not a supported value.
   For arrays, we already have that bit at runtime; I believe we should have a modifier for fields saying that null is a possible value when flattened.

2) non-tearable.
   We already support the modifier 'volatile' to say that a primitive class should be manipulated by pointer.
   Should we have a declaration-site keyword? I don't know. It's perhaps a corner case where not using a primitive class is better.
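Both special cases can be sketched in plain Java. Everything here is hypothetical naming (Range, NonNullFlatRangeSlot, PointerRangeSlot), and the race is not a real one: the interleaving is replayed by hand in a single thread so that the result is deterministic.

```java
import java.util.Objects;

// Sketch only, all names hypothetical. Range has an invariant (lo <= hi)
// enforced by its constructor.
final class Range {
    final int lo;
    final int hi;

    Range(int lo, int hi) {
        if (lo > hi) {
            throw new IllegalArgumentException("lo must be <= hi");
        }
        this.lo = lo;
        this.hi = hi;
    }
}

// Case 1: a flattened slot that has opted out of null.
final class NonNullFlatRangeSlot {
    int lo;
    int hi;

    void write(Range r) {
        Objects.requireNonNull(r, "null is not a supported value for this slot");
        writeLo(r);
        writeHi(r);
    }

    // The two halves of a flattened write, exposed separately so the demo can
    // stop between them, which is what a racing reader could observe.
    void writeLo(Range r) { lo = r.lo; }
    void writeHi(Range r) { hi = r.hi; }
}

// Case 2: a slot manipulated by pointer; one reference store publishes
// the whole value at once.
final class PointerRangeSlot {
    private volatile Range value;

    void write(Range r) { value = r; }
    Range read() { return value; }
}

final class SpecialCasesDemo {
    public static void main(String[] args) {
        NonNullFlatRangeSlot flat = new NonNullFlatRangeSlot();
        flat.write(new Range(1, 2));

        try {
            flat.write(null);
        } catch (NullPointerException e) {
            System.out.println("NPE: " + e.getMessage());
        }

        // Replay only the first half of a write of Range(5, 6).
        flat.writeLo(new Range(5, 6));
        System.out.println("torn view: lo=" + flat.lo + " hi=" + flat.hi); // lo=5 hi=2

        PointerRangeSlot byPointer = new PointerRangeSlot();
        byPointer.write(new Range(1, 2));
        byPointer.write(new Range(5, 6));
        Range seen = byPointer.read();
        System.out.println("by pointer: lo=" + seen.lo + " hi=" + seen.hi); // lo=5 hi=6
    }
}
```

The torn view (lo=5, hi=2) breaks the invariant although both source values were valid, whereas the by-pointer slot can only ever expose a value that went through the constructor.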

To summarize, I believe that if a primitive class is always nullable (apart from some opt-in special cases), it can be backward compatible (enough) to transform all value-based classes into primitive classes and just let the new version of javac replace all the L-types with Q-types in the method descriptors (using an attribute), without asking the user to think too much about it (unless the code is racy).

regards,
Rémi