Addressing the full range of use cases

Remi Forax forax at univ-mlv.fr
Wed Oct 6 09:56:27 UTC 2021


----- Original Message -----
> From: "daniel smith" <daniel.smith at oracle.com>
> To: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Tuesday, 5 October 2021 01:34:37
> Subject: Addressing the full range of use cases

> When we talk about use cases for Valhalla, we've often considered a very broad
> set of class abstractions that represent immutable, identity-free data. JEP 401
> mentions varieties of integers and floats, points, dates and times, tuples,
> records, subarrays, cursors, etc. However, as shorthand this broad set often
> gets reduced to an example like Point or Int128, and these latter examples are
> not necessarily representative of all candidate value types.

yes !

> 
> Specifically, our favorite example classes have a property that doesn't
> generalize: they'll happily accept any combination of field values as a valid
> instance. (In fact, they're even happy to accept any combination of *bits* of
> the appropriate length.) Many candidate primitive classes don't have this
> property—the constructors do important validation work, and only certain
> combinations of fields are allowed to represent valid instances.

I now believe the mantra "code like a class acts as an int" is harmful.
A class provides encapsulation, an int has no encapsulation, there is a mismatch.

> 
> Related areas of concern that we've had on the radar for awhile:
> 
> - The "all zeros is your default value" strategy forces an all-zero instance
> into the class's value set, even if that doesn't make sense for the class. Many
> candidate classes have no reasonable default at all, leading naturally to wish
> for "null is your default value" (or other, more exotic, strategies involving
> revisiting the idea that every type has a default value). We've provided
> 'P.ref' for those use sites that *need* null, but haven't provided a complete
> story for value types that want it to be *their* default value, too.
> 
> - Non-atomic heap updates can be used to create new instances that arbitrarily
> combine previously-validated instances' fields. There is no guarantee that the
> new combination of fields is semantically valid. Again, while there's precedent
> for this with 'double' and 'long' (JLS 17.7), those are special cases that
> don't generalize—any combination of double bit fields is *still a valid
> double*. (This is usually described as "tearing", although JLS 17.6 has
> something else in mind when it uses that word...) The language provides
> 'volatile' as a use-site opt-in to atomicity, and we've toyed with a
> declaration-site opt-in as well. But object integrity being "off" by default
> may not be ideal.
> 
> - Existing class types like LocalDate are both nullable and atomic. These are
> useful properties to preserve during migration; nullability, in particular, is
> essential for source compatibility. We've provided reference-default
> declarations as a mechanism to make reference types (which have these
> properties) the default, with 'P.val' as an opt-in to value types. But in doing
> so we take away the many benefits of value types by default, and force new code
> to work with the "bad name".

The existing class LocalDate is not atomic per se: atomic in Java implies volatile, and currently, if a LocalDate field is updated in one thread, another thread may never see that update.
LocalDate is currently not tearable, whereas a QLocalDate; is tearable in case of racy code.
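
To make "tearable" concrete, here is a minimal sketch in today's Java, under stated assumptions: Range, FlatHolder and every method name are hypothetical, and the flattened storage is only emulated with two plain int fields. Instead of real threads, it replays one unlucky interleaving by hand, so the result is deterministic.

```java
// Hypothetical illustration only: emulates a flattened two-field value in current Java.
final class TearingSketch {
    /** A value whose constructor enforces an invariant: lo <= hi. */
    static final class Range {
        final int lo;
        final int hi;
        Range(int lo, int hi) {
            if (lo > hi) throw new IllegalArgumentException("lo > hi");
            this.lo = lo;
            this.hi = hi;
        }
    }

    /** Emulated flattened storage: the two fields are stored inline, with no pointer. */
    static final class FlatHolder {
        int lo;
        int hi;
        void writeLo(Range r) { lo = r.lo; }   // first half of a non-atomic store
        void writeHi(Range r) { hi = r.hi; }   // second half of a non-atomic store
        boolean looksValid() { return lo <= hi; }
    }

    /** Replays one interleaving: a reader observes the holder between the two half-stores. */
    static boolean tornReadBreaksInvariant() {
        FlatHolder holder = new FlatHolder();
        Range a = new Range(1, 2);
        Range b = new Range(10, 20);
        holder.writeLo(a);
        holder.writeHi(a);                     // holder contains (1, 2)
        holder.writeLo(b);                     // writer is half-way through storing (10, 20)
        boolean validMidWrite = holder.looksValid(); // racing reader sees (10, 2)
        holder.writeHi(b);                     // writer finishes, holder contains (10, 20)
        return !validMidWrite;
    }

    public static void main(String[] args) {
        System.out.println("torn read breaks invariant: " + tornReadBreaksInvariant());
    }
}
```

Both a and b passed validation, yet the combination (10, 2) seen in the middle of the store was never accepted by any constructor call. A store through a single pointer, which is how LocalDate behaves today, cannot expose such a state.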

And yes, nullability is a huge compatibility issue.

> 
> While we can provide enough knobs to accommodate all of these special cases,
> we're left with a complex user model which asks class authors to make n
> different choices they may not immediately grasp the consequences of, and class
> users to keep 2^n different categories straight in their heads.

yes !

> 
> As an alternative, we've been exploring whether a simpler model is workable. It
> is becoming clear that there are (at least) two clusters of uses for value
> types.  The "classic" value types are like numerics -- they'll happily accept
> any combination of field values as a valid instance, and the zero value is a
> sensible (often the best possible) default value.  They make relatively little
> use of encapsulation.  These are the ones that best "work like an int."  The
> "encapsulated" value types are those that are more like typical aggregates
> ("codes like a class") -- their constructors do important validation work, and
> only certain combinations of fields are allowed to represent valid instances.
> These are more likely to not have valid zero values (and hence want to be
> nullable).

I agree.

> 
> Some questions to consider for this approach:
> 
> - How do we group features into clusters so that they meet the sweet spot of
> user expectations and use cases while minimizing complexity? Is two clusters
> the right number? Is two already too many? (And what do we call them? What
> keywords best convey the intended intuitions?)

Two is too many, see below.

> 
> - If there are knobs within the clusters, what are the right defaults? E.g.,
> should atomicity be opt-in or opt-out?

I prefer opt-in, see below.

> 
> - What are the performance costs (or, in the other direction, performance gains)
> associated with each feature? For certain feature combinations, have we
> canceled out the performance gains over identity classes (and at that point, is
> that combination even worth supporting?)

Good question ...

Let me reformulate.

But first, we can note that we have 3 ways of specifying primitive class features:
- we can use different types, for example, Foo.val vs Foo.ref
- we can have container attributes (opt-in or opt-out), for example, declaring a field volatile makes it non-tearable
- we can have runtime knobs, for example, an array can allow null or not.

First, the problem. As you said, if we have code like the example just below,
the field primFoo is flattened, so primFoo.someValue is 0, bypassing the constructor.

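  primitive class PrimFoo {
    PrimFoo(int someValue) {
      // 0 is rejected, so no constructed instance has someValue == 0
      if (someValue == 0) { throw new IllegalArgumentException(); }
      this.someValue = someValue;
    }

    int someValue;
  }

  class Foo {
    PrimFoo primFoo;  // flattened: starts as all zeros, the constructor never runs
  }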


I believe we should try to make a primitive class nullable and flattenable by default, so that we have one tent pole plus knobs for 2 special cases: non-nullable primitive classes (for use cases like Complex) and classes that are not flattened when stored in a field/array cell (the "atomicity" use case).

So a primitive class (the default):
- represents the null value (an "initialized" bit) with a supplementary field when stored on the heap, and with a supplementary register if necessary
- is tearable in case of racy code (don't write racy code)
- is represented by a Q-type in the bytecode for full flattening, or by an L-type using a pointer to be backward compatible
- is represented by two different java.lang.Class objects (the primary class for the Q-type and the secondary class for the L-type)
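
The supplementary-field idea in the first bullet can be sketched in today's Java. This is an emulation under assumptions, not what the VM does: FlatPrimFooArray is a hypothetical name, the flattened payload is modelled by an int[], and the extra "initialized" bit by a parallel boolean[]. PrimFoo is written as an ordinary class because the primitive class syntax is not available.

```java
// Hypothetical illustration only: a nullable, flattened PrimFoo array emulated by hand.
final class NullChannelSketch {
    /** Stand-in for PrimFoo: the constructor rejects 0. */
    static final class PrimFoo {
        final int someValue;
        PrimFoo(int someValue) {
            if (someValue == 0) throw new IllegalArgumentException("someValue must not be 0");
            this.someValue = someValue;
        }
    }

    /** Emulated flattened array: inline payload plus one "initialized" bit per cell. */
    static final class FlatPrimFooArray {
        private final int[] someValue;        // flattened payload
        private final boolean[] initialized;  // supplementary field: false means null

        FlatPrimFooArray(int length) {
            someValue = new int[length];
            initialized = new boolean[length];
        }

        void set(int index, PrimFoo value) {
            // Two separate writes, so this emulation is tearable under a data race.
            initialized[index] = (value != null);
            someValue[index] = (value == null) ? 0 : value.someValue;
        }

        PrimFoo get(int index) {
            // An untouched cell reads as null, never as an unvalidated PrimFoo holding 0.
            return initialized[index] ? new PrimFoo(someValue[index]) : null;
        }
    }

    public static void main(String[] args) {
        FlatPrimFooArray array = new FlatPrimFooArray(2);
        array.set(1, new PrimFoo(42));
        System.out.println(array.get(0));            // fresh cell
        System.out.println(array.get(1).someValue);  // stored cell
    }
}
```

With the extra bit, the all-zero cell stands for null, so the all-zero PrimFoo that the constructor forbids never becomes visible.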

I think that a Q-type can be backward compatible with an L-type in method descriptors: a Q-type should be represented as an L-type plus an out-of-band bit saying that it is a Q-type, so it should be loaded eagerly (like the out-of-band attributes we use for generic specialization). Obviously, the way to create a Q-type (default + with + with) is still different from the way to create an L-type (new + dup + invokespecial), so creating a Q-type instead of an L-type is not backward compatible. So the VM has to generate several method entry points for any method that is annotated with the attribute saying there is a Q-type in the descriptor (or that overrides a method with such an attribute).

The special cases:
1) non-nullable when flattened.
   I believe that all primitive types should be nullable, but that a user should have a knob to choose that a primitive class is non-nullable when flattened.
   So the VM will throw an NPE when null is stored into a field or an array annotated with something saying that null is not a supported value.
   For arrays, we already have that bit at runtime; I believe we should have a modifier for fields saying that null is a possible value when flattened.
  
2) non-tearable.
   We already support the modifier 'volatile' to say that a primitive class should be manipulated by pointer.
   Should we have a declaration-site keyword? I don't know. It's perhaps a corner case where not using a primitive class is better.
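
Special case 1 can be sketched as a runtime bit on the container, again emulated in today's Java. All names here are hypothetical, Integer merely stands in for a primitive class such as Complex, and the real check would live in the VM rather than in library code.

```java
import java.util.Objects;

// Hypothetical illustration only: a container-level "null allowed" bit.
final class NullRestrictedSketch {
    /** Emulated flattened array; Integer stands in for a primitive class. */
    static final class FlatArray {
        private final int[] payload;          // flattened values
        private final boolean[] initialized;  // null when the array does not allow null

        FlatArray(int length, boolean allowsNull) {
            payload = new int[length];
            initialized = allowsNull ? new boolean[length] : null;
        }

        void set(int index, Integer value) {
            if (initialized == null) {
                // Null-restricted container: storing null fails with an NPE.
                payload[index] = Objects.requireNonNull(value, "null is not a supported value");
            } else {
                initialized[index] = (value != null);
                payload[index] = (value == null) ? 0 : value;
            }
        }

        Integer get(int index) {
            if (initialized != null && !initialized[index]) {
                return null;                  // nullable container, cell never set
            }
            return payload[index];            // null-restricted cells default to zero
        }
    }

    /** Returns true if storing null into the given array throws an NPE. */
    static boolean storingNullThrows(FlatArray array) {
        try {
            array.set(0, null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        FlatArray nullable = new FlatArray(1, true);
        FlatArray restricted = new FlatArray(1, false);
        System.out.println(nullable.get(0));               // prints null
        System.out.println(restricted.get(0));             // prints 0
        System.out.println(storingNullThrows(restricted)); // prints true
    }
}
```

The null-restricted container saves the extra bit per cell, at the price of exposing the all-zero default, which is why it only suits classes like Complex where zero is a valid value.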

To summarize, I believe that if a primitive class is always nullable (apart from some opt-in special cases), it can be backward compatible (enough) to transform all value-based classes into primitive classes and just let the new version of javac replace all the L-types with Q-types in the method descriptors (using an attribute), without asking the user to think too much about it (unless the code is racy).

regards,
Rémi

