Revisiting default values

Remi Forax forax at univ-mlv.fr
Wed Jul 15 14:19:49 UTC 2020


So the default value may be a valid value or an invalid value;
if it's invalid, it should be the author of the class who says so, because in Java we prefer declaration site to use site.

One way is to teach the VM how to do the conversions. I want to explore another way, where we try to solve the issue at the language level, to avoid making the VM more complex.

A default value which is invalid should behave like null, i.e. calling any method on the default value should result in an exception.
Doing that at the language level means adding a check before calling any instance method and before accessing any instance field.

So there are two parts to solve:
1/ how to specify the check: is it just this == Inline.default, or is it whatever the user wants (or something in the middle, like a field check)?
2/ how to execute that check when accessing a field or a method?

Let's explore the solution that offers the maximum freedom to the author of the inline class, i.e. for 1/, the check is user defined.
For that we can introduce a new kind of initializer, like the static block; let's call it the invariant block:
  inline class Foo {
    private final Object o;

    invariant {
      if (o == null) {
        throw new InvalidFooException();
      }
    }
  }
This invariant block is translated into a method (named <invariant>; see below for why) and is called each time a method or a field is accessed.

For 2/, we can either change the spec of the VM so the invariant block is called automatically by the VM or we can use invokedynamic.
invokedynamic has the advantage of not requiring more VM support, at the expense of the bootstrap issue.

The main issue with invokedynamic is that it's not a backward compatible change, because it requires changing the call sites.
So we can relax the requirement and require the call to <invariant> only when an instance method is accessed, because
we can suppose that people will not be foolish enough to declare the fields public.
In that case, there is no need for invokedynamic, because the compiler can insert a call to the invariant method at the beginning of every instance method.
This solution also has a lower runtime cost than using invokedynamic.
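
As a rough illustration of that translation, here is what the compiler output could look like if written by hand in today's Java. This is only a sketch under assumptions: a final class stands in for the inline class, Foo.defaultValue() simulates the all-zero default instance, and checkInvariant() and describe() are made-up names (the real method would be the synthetic <invariant>).

```java
// Hand-written approximation of what a compiler could emit for an invariant
// block. A final class stands in for an inline class; Foo.defaultValue(),
// checkInvariant() and describe() are illustrative names.
class InvariantSketch {
    static final class InvalidFooException extends RuntimeException {
        InvalidFooException() {
            super("Foo default value used");
        }
    }

    static final class Foo {
        private final Object o;

        Foo(Object o) {
            this.o = o;
        }

        // Stand-in for the all-zero default instance, where o is null.
        static Foo defaultValue() {
            return new Foo(null);
        }

        // Body of the invariant block; it only reads fields, so running it
        // twice on the same instance gives the same outcome.
        private void checkInvariant() {
            if (o == null) {
                throw new InvalidFooException();
            }
        }

        // The compiler-inserted call is the first statement of the method.
        String describe() {
            checkInvariant();
            return "Foo(" + o + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(new Foo("a").describe());
        try {
            Foo.defaultValue().describe();
        } catch (InvalidFooException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the check lives inside the method body, existing call sites do not need to be recompiled.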

In terms of performance, I believe the language spec should say that the invariant block has to be idempotent.
In that case, once one call to the <invariant> method has been executed on a specific instance, the VM is free to skip further calls on that instance
(like the JITs currently collapse null checks).

To summarize, I believe we should allow more value-based classes to be retrofitted as inline classes by adding the concept of an invariant block to the Java language spec.
An invariant block is a simple idempotent method called at the beginning of every instance method.

Rémi

----- Original Message -----
> From: "daniel smith" <daniel.smith at oracle.com>
> To: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Friday, July 10, 2020 20:23:25
> Subject: Revisiting default values

> Brian pointed out that my list of candidate inline classes in the Identity
> Warnings JEP (JDK-8249100) includes a number of classes that, despite being
> "value-based classes" and disavowing their identity, might not end up as inline
> classes. The problem? Default values.
> 
> This might be a good time to revisit the open design issues surrounding default
> values and see if we can make some progress.
> 
> Background/status quo: every inline class has a default instance, which provides
> the initial value of fields and array components that have the inline type
> (e.g., in 'new Point[10]'). It's also the prototype instance used to create all
> other instances (start with 'vdefault', then apply 'withfield' as needed). The
> default value is, by fiat, the class instance produced by setting all fields to
> *their* default values. Often, but not always, this means field/array
> initialization amounts to setting all the bits to 0. Importantly, no user code
> is involved in creating a default instance.
> 
> Real code is always useful for grounding design discussions, so let's start
> there. Among the classes I listed as inline class candidates, we can put them
> in three buckets:
> 
> Bucket #1: Have a reasonable default, as declared.
> - wrapper classes (the primitive zeros)
> - Optional & friends (empty)
> - From java.time: Instant (start of 1970-01-01), LocalTime (midnight), Duration
> (0s), Period (0d), Year (1 BC, if that's acceptable)
> 
> Bucket #2: Could have a reasonable default after re-interpreting fields.
> - From java.time: LocalDate, YearMonth, MonthDay, LocalDateTime, ZonedDateTime,
> OffsetTime, OffsetDateTime, ZoneOffset, ZoneRegion, MinguoDate, HijrahDate,
> JapaneseDate, ThaiBuddhistDate (months and days should be nonzero; null
> Strings, ZoneIds, HijrahChronologies, and JapaneseEras require special
> handling)
> - ListN, SetN, MapN (null array interpreted as empty)
> 
> Bucket #3: No good default.
> - Runtime.Version (need a non-null List<Integer>)
> - ProcessHandleImpl (need a valid process ID)
> - List12, Set12, Map1 (need a non-null value)
> - All ConstantDesc implementations (need real class & method names, etc.)
> 
> There's some subjectivity between the 2nd and 3rd buckets, but the idea behind
> the 2nd is that, with some translation layer between physical fields and
> interpretation of those fields, we can come up with an intuitive default (e.g.,
> "0 means January"; "a null String means time zone 'UTC'"). In contrast, in the
> third bucket, any attempt to define a default value is going to be pretty
> unintuitive ("A null method name means 'toString'").
> 
> The question here is how much work the JVM and language are willing to do, or
> how much work we're willing to ask clients to do, in order to support use cases
> that don't fall into Bucket #1.
> 
> I don't think totally excluding Buckets #2 and #3 is a very good outcome. It
> means that, in many cases, inline classes need to be built up exclusively from
> primitives or other inline types, because if you use reference types, your
> default value will have a null field. (Sometimes, as in Optional, null fields
> have straightforward interpretations, but most of the time programs are
> designed to prevent them.)
> 
> Whether we support Bucket #2 but not Bucket #3 is a harder question. It wouldn't
> be so bad if none of the examples above in Bucket #3 become inline classes—for
> the most part they're handled via interfaces, anyway. (Counterpoint: inline
> class instances that are immediately typed with interface types still
> potentially provide a performance boost.) But I'm also not sure this is
> representative. We've noted before that many use cases, like database records
> or data structure cursors, don't have meaningful defaults (what's a default
> mailing address?). The ConstantDesc classes really illustrate this, even though
> they happen to not be public.
> 
> Another observation is that if we support Bucket #3 but not Bucket #2, that's
> probably not a big deal—I'm not sure anybody really *wants* to deal with the
> default instance; it's just the price you pay for being an inline class. If
> there's a way to opt out of that extra weirdness and move from Bucket #2 to
> Bucket #3, great.
> 
> With that discussion in mind, here are some summaries of approaches we've
> considered, or that I think we ought to consider, for supporting buckets #2 and
> #3. (This is as best as I recall. If there's something I've missed, add it to
> the list!)
> 
> [Weighing in for myself: my current preference is to do one of F, G, or I. I'm
> not that interested in supporting Bucket #2, for reasons given above, although
> Option A works for programmers who really want it.]
> 
> 
> 
> === Solutions to support Bucket #2 ===
> 
> Two broad strategies here: re-interpreting fields (A, B), and re-interpreting
> the default instance (C, D).
> 
> ---
> 
> Option A: Encourage programmers to re-interpret fields
> 
> Guidance to programmers: when you declare an inline class, identify any fields
> for which the default instance should hold something other than zero/null;
> define a mapping for your implementation from zero/null to the value you want.
> 
> One way to do this is to define a (possibly private) getter for each field, and
> include logic like 'return month + 1' or 'return id == null ? "UTC" : id'. Or
> maybe you inline that logic, as long as you're careful to do so everywhere.
> Importantly, you also need to reverse the logic in your constructor—for the
> sake of '==', if somebody manually creates the default instance, you should
> set fields to zero/null.
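
A minimal sketch of that mapping in today's Java, under assumptions: MonthInZone and its members are made-up names, a final class stands in for the inline class, physicalDefault() simulates the instance the VM would create without user code, and sameFields() stands in for '==' on inline classes.

```java
import java.util.Objects;

// Option A by hand: fields hold an internal representation chosen so that
// the all-zero/null state decodes to a sensible value.
class FieldMappingSketch {
    static final class MonthInZone {
        private final int monthIndex; // internal: 0 means January
        private final String zoneId;  // internal: null means "UTC"

        // Stand-in for the default instance, which the VM would create
        // without running any user code.
        private MonthInZone() {
            this.monthIndex = 0;
            this.zoneId = null;
        }

        MonthInZone(int month, String zone) {
            if (month < 1 || month > 12) {
                throw new IllegalArgumentException("month out of range: " + month);
            }
            Objects.requireNonNull(zone, "zone");
            // Reverse mapping: the logical default is stored as zero/null.
            this.monthIndex = month - 1;
            this.zoneId = zone.equals("UTC") ? null : zone;
        }

        static MonthInZone physicalDefault() {
            return new MonthInZone();
        }

        int month() {
            return monthIndex + 1;
        }

        String zone() {
            return zoneId == null ? "UTC" : zoneId;
        }

        // Field-wise comparison, standing in for '==' on inline classes.
        boolean sameFields(MonthInZone other) {
            return monthIndex == other.monthIndex
                    && Objects.equals(zoneId, other.zoneId);
        }
    }

    public static void main(String[] args) {
        MonthInZone d = MonthInZone.physicalDefault();
        System.out.println(d.month() + " " + d.zone());
        System.out.println(d.sameFields(new MonthInZone(1, "UTC")));
    }
}
```

The constructor stores "UTC" as null so that an explicitly built January/UTC value has the same fields as the default instance.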
> 
> This doesn't work if you want public fields, but that's life as an OO
> programmer.
> 
> In this approach, it would be important that inline classes be expected to
> document their default instance in Javadoc (perhaps with a new Javadoc tag)—the
> interpretation of the default instance is less apparent to users than "all
> zeros".
> 
> Limitations:
> 
> - It's a fairly error-prone approach. Programmers will absolutely forget to
> apply the mapping in one place, and everything will be fine until somebody
> tries to invoke a particular method on the default instance. Put that bug in a
> security-sensitive context, and maybe you have an exploit. (Something that
> could help some is choosing good names—call your field 'monthIndex', not plain
> 'month', to remind yourself that it's zero-based.)
> 
> - Performance impact of an extra layer of computation on all field accesses.
> Probably not a big deal in general, but all those null checks, etc., could have
> a negative impact in certain contexts. And the *appearance* of extra cost might
> scare programmers away from doing the right thing ("eh, I probably won't use
> the default value anyway, I'll just ignore it to make my code faster").
> 
> ---
> 
> Option B: Language support for field re-interpretation
> 
> The language allows inline classes to declare fields with mappings to/from an
> internal representation. Just like Option A, but with guarantees that the
> internal representation isn't inappropriately accessed directly.
> 
> This pulls on a thread we explored a bit for Amber a while back, some form of
> "abstract fields" or "virtual fields". Maybe there's something there, but it
> seems like a general-purpose feature, and one we're not likely to reach a final
> solution on anytime soon.
> 
> ---
> 
> Option C: Language support for a designated default
> 
> The language provides some way for programmers to declare the "logical" default
> instance (something like a special static field). The compiler inserts a test
> for the "physical" default on any field/array access, and replaces it with the
> logical default.
> 
> That is:
> 
> Point p = points[3];
> 
> compiles to
> 
> Point p$0 = points[3];
> Point p = (p$0 == [vdefault Point]) ? Point.DEFAULT : p$0;
> 
> This is much less bug-prone than Option A—the compiler does all the work—and
> much more achievable in the short/medium term than Option B.
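
The read translation above can be approximated in today's Java, as a sketch only. Assumptions: a final class stands in for the inline class, PHYSICAL_DEFAULT simulates [vdefault Point], newPoints() simulates a flattened 'new Point[n]', and the (1, 1) logical default is an arbitrary choice for illustration.

```java
import java.util.Arrays;

class LogicalDefaultSketch {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // Stand-in for [vdefault Point]: all fields zero.
        static final Point PHYSICAL_DEFAULT = new Point(0, 0);
        // Logical default declared by the class author (arbitrary here).
        static final Point DEFAULT = new Point(1, 1);

        boolean sameFields(Point other) {
            return x == other.x && y == other.y;
        }
    }

    // Stand-in for 'new Point[n]' on a flattened array: every component
    // starts as the physical default rather than null.
    static Point[] newPoints(int n) {
        Point[] points = new Point[n];
        Arrays.fill(points, Point.PHYSICAL_DEFAULT);
        return points;
    }

    // What the compiler would emit in place of a plain 'points[i]' read.
    static Point read(Point[] points, int i) {
        Point p0 = points[i];
        return p0.sameFields(Point.PHYSICAL_DEFAULT) ? Point.DEFAULT : p0;
    }

    public static void main(String[] args) {
        Point[] points = newPoints(10);
        points[4] = new Point(7, 8);
        Point p = read(points, 3);
        Point q = read(points, 4);
        System.out.println(p.x + "," + p.y);
        System.out.println(q.x + "," + q.y);
    }
}
```

Note that a deliberately stored (0, 0) also reads back as Point.DEFAULT, so the scheme only holds up if the class never treats the all-zero state as a meaningful value.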
> 
> Compared to Option B, this pushes the computation overhead from inline class
> field accesses to reads of the inline type from fields/arrays. I don't know if
> that's good or bad—maybe a wash, heavily dependent on the use case.
> 
> A few big problems:
> 
> - The physical default still exists, and malicious bytecode can use it. If
> programmers want strong guarantees, they'll have to check and throw wherever an
> untrusted instance is provided. (Clients with access to the inline class's
> fields have to do so, too.)
> 
> - Covariant arrays mean every read from any array type that might be flattened
> (Object[], Runnable[], ConstantDesc[], ...) has to go through translation
> logic.
> 
> - There's an assumption here that the programmer doesn't intend to use the
> physical default as a valid non-default instance. That's hard for the compiler
> to enforce, and weird stuff happens in fields/arrays if the programmer doesn't
> prevent it. (Could be mitigated with extra implicit logic on field/array writes
> or in constructors.)
> 
> ---
> 
> Option D: JVM support for a designated default
> 
> The VM allows inline classes to designate a logical default instance, and the
> field/array access instructions map from the physical default to the logical
> default. The 'vdefault' instruction produces the logical default instance;
> something else is used by the class's factories to build from the physical
> default.
> 
> This addresses the first two problems with Option C—the VM gives strong
> guarantees, and can make the translation a virtual operation of certain arrays.
> 
> To address the third problem, it seems like we'd need the more complex logic I
> hinted at: on writes, map the physical default to the logical default, and map
> the logical default to the physical default. Do the reverse on reads.
> 
> The problem here is bytecode complexity/slowdowns. We've already added some
> complexity to 'aaload'/'aastore' (covariant flattened arrays), and anticipate
> similar changes to 'putfield'/'getfield' (specialized fields), so maybe that
> means we might as well do more. Or maybe it means we're already over budget.
> :-)
> 
> From the users' perspective, if any performance reduction on reads/writes can be
> limited to the inline classes in Bucket #2, *all* the options have a similar
> cost, whether imposed by the programmer, language, or VM. So, to a first
> approximation, slower opcode execution is fine.
> 
> 
> 
> === Solutions to support Bucket #3 ===
> 
> Two broad strategies here: rejecting member accesses on the default instance (E,
> F, G), and preventing programs from ever seeing the default instance (H, I).
> 
> ---
> 
> Option E: Encourage programmers to guard against default instances
> 
> Guidance to programmers: if you don't like your class's default instance, check
> for it in your methods and throw. Maybe Java SE defines a new RuntimeException
> to encourage this.
> 
> The simple way to do this is with some boilerplate at the start of all your
> methods:
> 
> if (this == MyClass.default) throw new InvalidDefaultException();
> 
> More permissive classes could just do some validation on the fields that are
> relevant to a particular operation. (E.g., 'getMonth' doesn't care if 'zoneId'
> is null.)
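
Both styles of guard can be sketched in today's Java. Everything named here is hypothetical: InvalidDefaultException is not an existing Java SE class, ZonedMonth is a made-up example in which a zero month index is acceptable but a null zoneId is not, and physicalDefault() simulates the default instance.

```java
class GuardSketch {
    // Hypothetical exception type; nothing like it exists in Java SE today.
    static final class InvalidDefaultException extends RuntimeException {
        InvalidDefaultException(String message) {
            super(message);
        }
    }

    static final class ZonedMonth {
        private final int monthIndex; // 0 means January, valid even when zero
        private final String zoneId;  // null only in the default instance

        private ZonedMonth(int monthIndex, String zoneId) {
            this.monthIndex = monthIndex;
            this.zoneId = zoneId;
        }

        static ZonedMonth of(int month, String zoneId) {
            if (month < 1 || month > 12 || zoneId == null) {
                throw new IllegalArgumentException("invalid month or zone");
            }
            return new ZonedMonth(month - 1, zoneId);
        }

        // Stand-in for the all-zero default instance.
        static ZonedMonth physicalDefault() {
            return new ZonedMonth(0, null);
        }

        // Strict guard: hand-written form of 'this == ZonedMonth.default',
        // comparing every field against its zero value.
        String describe() {
            if (monthIndex == 0 && zoneId == null) {
                throw new InvalidDefaultException("default ZonedMonth");
            }
            return getMonth() + "@" + zoneId;
        }

        // Permissive: this operation never reads zoneId, so no check.
        int getMonth() {
            return monthIndex + 1;
        }

        // Permissive: validates only the field this operation needs.
        String getZoneId() {
            if (zoneId == null) {
                throw new InvalidDefaultException("zoneId not set");
            }
            return zoneId;
        }
    }

    public static void main(String[] args) {
        ZonedMonth d = ZonedMonth.physicalDefault();
        System.out.println(d.getMonth());
        try {
            d.getZoneId();
        } catch (InvalidDefaultException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```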
> 
> This doesn't work if you want public fields, but that's life as an OO
> programmer.
> 
> It's not ideal that an invalid instance can float around a program until
> somebody trips on one of these checks, rather than detecting the invalid value
> earlier—we're propagating the NPE problem. And it takes some getting used to
> that there are two null-like values in the reference type's domain.
> 
> ---
> 
> Option F: Language support for default instance guards
> 
> An inline class declaration can indicate that the default instance is invalid.
> The compiler generates guards, as in Option E, at the start of all instance
> method bodies, and perhaps on all field accesses outside of those methods.
> 
> Programmers give up finer-grained control, but get more safety. I'm sure most
> would be happy with that trade.
> 
> Improper/separately-compiled bytecode can skip the field access checks, but
> that's a minor concern.
> 
> Same issues as Option E regarding adding a "new NPE" to the platform.
> 
> ---
> 
> Option G: JVM support for default instance guards
> 
> Inline class files can indicate that their default instance is invalid. All
> attempts to operate on that instance (via field/method accesses, other than
> 'withfield') result in an exception.
> 
> This tightens up Option F, making it just as impossible to access members of the
> default instance as it is to access members of 'null'.
> 
> Same issues as Option E regarding adding a "new NPE" to the platform.
> 
> ---
> 
> Option H: Language checks on field/array reads
> 
> An inline class declaration can indicate that the default instance is invalid.
> Every field and array access that may involve an uninitialized field/array
> component of that inline type gets augmented with a check that rejects reads of
> the default value (treating it as "you forgot to initialize this variable").
> 
> That is:
> 
> Point p = points[3];
> 
> compiles to
> 
> Point p$0 = points[3];
> if (p$0 == [vdefault Point]) throw new UninitializedVariableException();
> Point p = p$0;
> 
> This is much like Option C, and has roughly the same advantages/problems.
> There's not a strong guarantee that the default value won't pop up from
> untrusted bytecode (or unreliable inline class authors), and lots of array
> types need guards.
> 
> ---
> 
> Option I: JVM checks on field/array reads
> 
> Inline class files can indicate that their default instance is invalid. When
> reading from a field/array component of the inline type
> ('getfield'/'getstatic'/'aaload'), an exception is thrown if the default value
> is found (treating it as "you forgot to initialize this variable"). The
> 'vdefault' instruction, like 'withfield', is illegal outside of the inline
> class's nest.
> 
> Better than Option H in that it can be optimized to occur on only certain reads,
> and in that it provides strong guarantees—only the inline class can ever "see"
> the default instance.
> 
> Well, unless the inline class chooses to share that instance with the world. Not
> sure how we prevent that. But maybe at that point, anything bad/weird that
> happens is the author's own fault. (E.g., putting the default value in an array
> will make that component effectively "uninitialized" again.)
> 
> Like Option D, there's a question of whether we're willing to add this
> complexity to the 'getfield'/'getstatic'/'aaload' instructions. My sense is
> that at least it's less complexity than you have in Option D.

