Revisiting default values

Zheka Kozlov orionllmain at gmail.com
Mon Jul 13 04:45:34 UTC 2020


Hi Dan!

Sorry for what is probably a stupid question, but aren't all classes from
Buckets #2 and #3 ref-default? That would mean that when we call
new LocalDate[10], all elements of the array are initialized to null. And
since the constructors of these classes are private, external users will
never see the instances in their default state.
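
For reference, this is how it behaves today with LocalDate as an ordinary identity class. It is only a minimal sketch (the class name RefDefaultDemo is made up), and it says nothing about how an inline LocalDate would behave:

```java
import java.time.LocalDate;

// Hypothetical demo class; shows current (identity-class) behavior only.
class RefDefaultDemo {
    // Counts the null components of a reference array.
    static int countNulls(Object[] array) {
        int nulls = 0;
        for (Object o : array) {
            if (o == null) {
                nulls++;
            }
        }
        return nulls;
    }

    public static void main(String[] args) {
        LocalDate[] dates = new LocalDate[10];
        // Every component of a fresh reference array starts out as null.
        System.out.println(countNulls(dates)); // prints 10
    }
}
```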

So why do we need to care about the default initialization at all? Am I
wrong?


On Sat, Jul 11, 2020 at 01:25, Dan Smith <daniel.smith at oracle.com> wrote:

> Brian pointed out that my list of candidate inline classes in the Identity
> Warnings JEP (JDK-8249100) includes a number of classes that, despite being
> "value-based classes" and disavowing their identity, might not end up as
> inline classes. The problem? Default values.
>
> This might be a good time to revisit the open design issues surrounding
> default values and see if we can make some progress.
>
> Background/status quo: every inline class has a default instance, which
> provides the initial value of fields and array components that have the
> inline type (e.g., in 'new Point[10]'). It's also the prototype instance
> used to create all other instances (start with 'vdefault', then apply
> 'withfield' as needed). The default value is, by fiat, the class instance
> produced by setting all fields to *their* default values. Often, but not
> always, this means field/array initialization amounts to setting all the
> bits to 0. Importantly, no user code is involved in creating a default
> instance.
>
> Real code is always useful for grounding design discussions, so let's
> start there. Among the classes I listed as inline class candidates, we can
> put them in three buckets:
>
> Bucket #1: Have a reasonable default, as declared.
> - wrapper classes (the primitive zeros)
> - Optional & friends (empty)
> - From java.time: Instant (start of 1970-01-01), LocalTime (midnight),
> Duration (0s), Period (0d), Year (1 BC, if that's acceptable)
>
> Bucket #2: Could have a reasonable default after re-interpreting fields.
> - From java.time: LocalDate, YearMonth, MonthDay, LocalDateTime,
> ZonedDateTime, OffsetTime, OffsetDateTime, ZoneOffset, ZoneRegion,
> MinguoDate, HijrahDate, JapaneseDate, ThaiBuddhistDate (months and days
> should be nonzero; null Strings, ZoneIds, HijrahChronologies, and
> JapaneseEras require special handling)
> - ListN, SetN, MapN (null array interpreted as empty)
>
> Bucket #3: No good default.
> - Runtime.Version (need a non-null List<Integer>)
> - ProcessHandleImpl (need a valid process ID)
> - List12, Set12, Map1 (need a non-null value)
> - All ConstantDesc implementations (need real class & method names, etc.)
>
> There's some subjectivity between the 2nd and 3rd buckets, but the idea
> behind the 2nd is that, with some translation layer between physical fields
> and interpretation of those fields, we can come up with an intuitive
> default (e.g., "0 means January"; "a null String means time zone 'UTC'").
> In contrast, in the third bucket, any attempt to define a default value is
> going to be pretty unintuitive ("A null method name means 'toString'").
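
The gap between Bucket #1 and Bucket #2 can be seen with today's public java.time factories. This is a rough sketch only: it passes zeros to the factories as a proxy for all-zero fields, the real private field layouts are an implementation detail, and the class name ZeroStateDemo is made up.

```java
import java.time.DateTimeException;
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.Period;
import java.time.Year;

// Hypothetical demo class contrasting Bucket #1 with Bucket #2.
class ZeroStateDemo {
    // True if the factories, fed all zeros, yield the familiar constants.
    static boolean bucketOneZerosAreValid() {
        return Instant.ofEpochSecond(0, 0).equals(Instant.EPOCH)
            && LocalTime.of(0, 0, 0, 0).equals(LocalTime.MIDNIGHT)
            && Duration.ofSeconds(0, 0).equals(Duration.ZERO)
            && Period.of(0, 0, 0).equals(Period.ZERO)
            && Year.of(0).getValue() == 0; // proleptic year 0, i.e. 1 BC
    }

    // True if LocalDate refuses a zero month and day.
    static boolean localDateRejectsZeros() {
        try {
            LocalDate.of(0, 0, 0);
            return false;
        } catch (DateTimeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(bucketOneZerosAreValid());
        System.out.println(localDateRejectsZeros());
    }
}
```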
>
> The question here is how much work the JVM and language are willing to do,
> or how much work we're willing to ask clients to do, in order to support
> use cases that don't fall into Bucket #1.
>
> I don't think totally excluding Buckets #2 and #3 is a very good outcome.
> It means that, in many cases, inline classes need to be built up
> exclusively from primitives or other inline types, because if you use
> reference types, your default value will have a null field. (Sometimes, as
> in Optional, null fields have straightforward interpretations, but most of
> the time programs are designed to prevent them.)
>
> Whether we support Bucket #2 but not Bucket #3 is a harder question. It
> wouldn't be so bad if none of the examples above in Bucket #3 become inline
> classes—for the most part they're handled via interfaces, anyway.
> (Counterpoint: inline class instances that are immediately typed with
> interface types still potentially provide a performance boost.) But I'm
> also not sure this is representative. We've noted before that many use
> cases, like database records or data structure cursors, don't have
> meaningful defaults (what's a default mailing address?). The ConstantDesc
> classes really illustrate this, even though they happen to not be public.
>
> Another observation is that if we support Bucket #3 but not Bucket #2,
> that's probably not a big deal—I'm not sure anybody really *wants* to deal
> with the default instance; it's just the price you pay for being an inline
> class. If there's a way to opt out of that extra weirdness and move from
> Bucket #2 to Bucket #3, great.
>
> With that discussion in mind, here are some summaries of approaches we've
> considered, or that I think we ought to consider, for supporting buckets #2
> and #3. (This is as best as I recall. If there's something I've missed, add
> it to the list!)
>
> [Weighing in for myself: my current preference is to do one of F, G, or I.
> I'm not that interested in supporting Bucket #2, for reasons given above,
> although Option A works for programmers who really want it.]
>
>
>
> === Solutions to support Bucket #2 ===
>
> Two broad strategies here: re-interpreting fields (A, B), and
> re-interpreting the default instance (C, D).
>
> ---
>
> Option A: Encourage programmers to re-interpret fields
>
> Guidance to programmers: when you declare an inline class, identify any
> fields for which the default instance should hold something other than
> zero/null; define a mapping for your implementation from zero/null to the
> value you want.
>
> One way to do this is to define a (possibly private) getter for each
> field, and include logic like 'return month + 1' or 'return id == null ?
> "UTC" : id'. Or maybe you inline that logic, as long as you're careful to
> do so everywhere. Importantly, you also need to reverse the logic in your
> constructor—for the sake of '==', if somebody manually creates the default
> instance, you should set fields to zero/null.
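
A minimal sketch of what Option A asks of a class author, using the two mappings mentioned above. An ordinary final class stands in for an inline class. The names ZonedMonth and PHYSICAL_DEFAULT are made up, and the private no-arg constructor only imitates an all-zero default instance.

```java
import java.util.Objects;

// Hypothetical class; an ordinary final class stands in for an inline class.
final class ZonedMonth {
    private final int monthIndex; // zero-based: 0 means January
    private final String zoneId;  // null means "UTC"

    // Stand-in for the all-zero default instance that no user code creates.
    static final ZonedMonth PHYSICAL_DEFAULT = new ZonedMonth();

    private ZonedMonth() {
        this.monthIndex = 0;
        this.zoneId = null;
    }

    ZonedMonth(int month, String zoneId) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month: " + month);
        }
        Objects.requireNonNull(zoneId, "zoneId");
        // Reverse mapping: the logical default must be stored as zero/null.
        this.monthIndex = month - 1;
        this.zoneId = zoneId.equals("UTC") ? null : zoneId;
    }

    // Getters apply the forward mapping from the stored representation.
    int month() { return monthIndex + 1; }
    String zoneId() { return zoneId == null ? "UTC" : zoneId; }

    @Override public boolean equals(Object o) {
        return o instanceof ZonedMonth
            && ((ZonedMonth) o).monthIndex == monthIndex
            && Objects.equals(((ZonedMonth) o).zoneId, zoneId);
    }

    @Override public int hashCode() { return Objects.hash(monthIndex, zoneId); }

    public static void main(String[] args) {
        System.out.println(PHYSICAL_DEFAULT.month() + " " + PHYSICAL_DEFAULT.zoneId());
    }
}
```

Here equals() plays the role that '==' would play for an inline class: a manually built (1, "UTC") compares equal to the default only because the constructor reverses the mapping.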
>
> This doesn't work if you want public fields, but that's life as an OO
> programmer.
>
> In this approach, it would be important that inline classes be expected to
> document their default instance in Javadoc (perhaps with a new Javadoc
> tag)—the interpretation of the default instance is less apparent to users
> than "all zeros".
>
> Limitations:
>
> - It's a fairly error-prone approach. Programmers will absolutely forget
> to apply the mapping in one place, and everything will be fine until
> somebody tries to invoke a particular method on the default instance. Put
> that bug in a security-sensitive context, and maybe you have an exploit.
> (Something that could help some is choosing good names—call your field
> 'monthIndex', not plain 'month', to remind yourself that it's zero-based.)
>
> - Performance impact of an extra layer of computation on all field
> accesses. Probably not a big deal in general, but all those null checks,
> etc., could have a negative impact in certain contexts. And the
> *appearance* of extra cost might scare programmers away from doing the
> right thing ("eh, I probably won't use the default value anyway, I'll just
> ignore it to make my code faster").
>
> ---
>
> Option B: Language support for field re-interpretation
>
> The language allows inline classes to declare fields with mappings to/from
> an internal representation. Just like Option A, but with guarantees that
> the internal representation isn't inappropriately accessed directly.
>
> This pulls on a thread we explored a bit for Amber a while back, some form
> of "abstract fields" or "virtual fields". Maybe there's something there,
> but it seems like a general-purpose feature, and one we're not likely to
> reach a final solution on anytime soon.
>
> ---
>
> Option C: Language support for a designated default
>
> The language provides some way for programmers to declare the "logical"
> default instance (something like a special static field). The compiler
> inserts a test for the "physical" default on any field/array access, and
> replaces it with the logical default.
>
> That is:
>
> Point p = points[3];
>
> compiles to
>
> Point p$0 = points[3];
> Point p = (p$0 == [vdefault Point]) ? Point.DEFAULT : p$0;
>
> This is much less bug-prone than Option A—the compiler does all the
> work—and much more achievable in the short/medium term than Option B.
>
> Compared to Option B, this pushes the computation overhead from inline
> class field accesses to reads of the inline type from fields/arrays. I
> don't know if that's good or bad—maybe a wash, heavily dependent on the use
> case.
>
> A few big problems:
>
> - The physical default still exists, and malicious bytecode can use it. If
> programmers want strong guarantees, they'll have to check and throw
> wherever an untrusted instance is provided. (Clients with access to the
> inline class's fields have to do so, too.)
>
> - Covariant arrays mean every read from any array type that might be
> flattened (Object[], Runnable[], ConstantDesc[], ...) has to go through
> translation logic.
>
> - There's an assumption here that the programmer doesn't intend to use the
> physical default as a valid non-default instance. That's hard for the
> compiler to enforce, and weird stuff happens in fields/arrays if the
> programmer doesn't prevent it. (Could be mitigated with extra implicit
> logic on field/array writes or in constructors.)
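
To make the read translation and the last problem concrete, here is a rough model in plain Java. It rests on several assumptions: PointArray imitates a flattened Point[] with two int arrays, Point.DEFAULT is an arbitrarily chosen logical default of (1, 1), and all names other than Point and DEFAULT are made up.

```java
// Hypothetical stand-in for an inline class with a designated logical default.
final class Point {
    final int x, y;

    static final Point DEFAULT = new Point(1, 1); // assumed "logical" default

    Point(int x, int y) { this.x = x; this.y = y; }

    // The "physical" default is the all-zero state.
    boolean isPhysicalDefault() { return x == 0 && y == 0; }
}

// Hypothetical model of a flattened Point[]: components start as all zeros.
final class PointArray {
    private final int[] xs, ys;

    PointArray(int length) {
        xs = new int[length];
        ys = new int[length];
    }

    // Writes are stored untranslated, as in the basic form of Option C.
    void set(int i, Point p) {
        xs[i] = p.x;
        ys[i] = p.y;
    }

    // The read the compiler would emit: swap in the logical default.
    Point get(int i) {
        Point raw = new Point(xs[i], ys[i]);
        return raw.isPhysicalDefault() ? Point.DEFAULT : raw;
    }

    public static void main(String[] args) {
        PointArray points = new PointArray(10);
        Point p = points.get(3);
        System.out.println(p.x + "," + p.y); // prints 1,1
    }
}
```

In this model, storing new Point(0, 0) and reading it back yields (1, 1), which is the "weird stuff" from the last bullet.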
>
> ---
>
> Option D: JVM support for a designated default
>
> The VM allows inline classes to designate a logical default instance, and
> the field/array access instructions map from the physical default to the
> logical default. The 'vdefault' instruction produces the logical default
> instance; something else is used by the class's factories to build from the
> physical default.
>
> This addresses the first two problems with Option C—the VM gives strong
> guarantees, and can make the translation a virtual operation of certain
> arrays.
>
> To address the third problem, it seems like we'd need the more complex
> logic I hinted at: on writes, map the physical default to the logical
> default, and map the logical default to the physical default. Do the
> reverse on reads.
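
The write/read mapping described here is a swap that is its own inverse. A small model in plain Java, under the same kind of assumptions as before: two int arrays imitate a flattened array, the logical default is arbitrarily (1, 1), and the names Pt and SwappingPtArray are made up. In Option D this logic would live in the VM's access instructions, not in library code.

```java
// Hypothetical stand-in for an inline class.
final class Pt {
    final int x, y;
    Pt(int x, int y) { this.x = x; this.y = y; }
}

// Hypothetical model of a flattened array that swaps defaults on access.
final class SwappingPtArray {
    static final Pt LOGICAL_DEFAULT = new Pt(1, 1); // assumed logical default

    private final int[] xs, ys; // all zeros after allocation

    SwappingPtArray(int length) {
        xs = new int[length];
        ys = new int[length];
    }

    // Exchanges the physical and logical defaults; leaves other values alone.
    private static Pt swap(Pt p) {
        if (p.x == 0 && p.y == 0) {
            return LOGICAL_DEFAULT;
        }
        if (p.x == LOGICAL_DEFAULT.x && p.y == LOGICAL_DEFAULT.y) {
            return new Pt(0, 0);
        }
        return p;
    }

    void set(int i, Pt p) {
        Pt stored = swap(p);
        xs[i] = stored.x;
        ys[i] = stored.y;
    }

    Pt get(int i) {
        return swap(new Pt(xs[i], ys[i]));
    }

    public static void main(String[] args) {
        SwappingPtArray pts = new SwappingPtArray(4);
        pts.set(0, new Pt(0, 0));
        System.out.println(pts.get(0).x + "," + pts.get(0).y); // prints 0,0
        System.out.println(pts.get(1).x + "," + pts.get(1).y); // prints 1,1
    }
}
```

With the swap in place, an all-zero value written by the program survives a round trip, while an untouched component still reads as the logical default.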
>
> The problem here is bytecode complexity/slowdowns. We've already added
> some complexity to 'aaload'/'aastore' (covariant flattened arrays), and
> anticipate similar changes to 'putfield'/'getfield' (specialized fields),
> so maybe that means we might as well do more. Or maybe it means we're
> already over budget. :-)
>
> From the users' perspective, if any performance reduction on reads/writes
> can be limited to the inline classes in Bucket #2, *all* the options have a
> similar cost, whether imposed by the programmer, language, or VM. So, to a
> first approximation, slower opcode execution is fine.
>
>
>
> === Solutions to support Bucket #3 ===
>
> Two broad strategies here: rejecting member accesses on the default
> instance (E, F, G), and preventing programs from ever seeing the default
> instance (H, I).
>
> ---
>
> Option E: Encourage programmers to guard against default instances
>
> Guidance to programmers: if you don't like your class's default instance,
> check for it in your methods and throw. Maybe Java SE defines a new
> RuntimeException to encourage this.
>
> The simple way to do this is with some boilerplate at the start of all
> your methods:
>
> if (this == MyClass.default) throw new InvalidDefaultException();
>
> More permissive classes could just do some validation on the fields that
> are relevant to a particular operation. (E.g., 'getMonth' doesn't care if
> 'zoneId' is null.)
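
A sketch of the guard boilerplate in today's Java, since 'MyClass.default' is not valid syntax yet. It makes a few assumptions: an ordinary final class stands in for an inline class, the default instance is recognized by its all-zero/null fields, and ProcessRef and its members are made up. InvalidDefaultException is declared locally, because Java SE has no such class.

```java
// Hypothetical exception; Java SE does not define this today.
class InvalidDefaultException extends RuntimeException {
    InvalidDefaultException(String message) { super(message); }
}

// Hypothetical class; an ordinary final class stands in for an inline class.
final class ProcessRef {
    private final long pid;     // 0 only in the default instance
    private final String label; // null only in the default instance

    // Stand-in for the all-zero default instance that no user code creates.
    static final ProcessRef PHYSICAL_DEFAULT = new ProcessRef();

    private ProcessRef() {
        this.pid = 0;
        this.label = null;
    }

    ProcessRef(long pid, String label) {
        if (pid <= 0 || label == null) {
            throw new IllegalArgumentException("need a positive pid and a label");
        }
        this.pid = pid;
        this.label = label;
    }

    // The guard every method starts with.
    private void requireValid() {
        if (pid == 0 && label == null) {
            throw new InvalidDefaultException("default ProcessRef");
        }
    }

    long pid() { requireValid(); return pid; }
    String label() { requireValid(); return label; }

    public static void main(String[] args) {
        System.out.println(new ProcessRef(42, "worker").pid()); // prints 42
    }
}
```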
>
> This doesn't work if you want public fields, but that's life as an OO
> programmer.
>
> It's not ideal that an invalid instance can float around a program until
> somebody trips on one of these checks, rather than detecting the invalid
> value earlier—we're propagating the NPE problem. And it takes some getting
> used to that there are two null-like values in the reference type's domain.
>
> ---
>
> Option F: Language support for default instance guards
>
> An inline class declaration can indicate that the default instance is
> invalid. The compiler generates guards, as in Option E, at the start of all
> instance method bodies, and perhaps on all field accesses outside of those
> methods.
>
> Programmers give up finer-grained control, but get more safety. I'm sure
> most would be happy with that trade.
>
> Improper/separately-compiled bytecode can skip the field access checks,
> but that's a minor concern.
>
> Same issues as Option E regarding adding a "new NPE" to the platform.
>
> ---
>
> Option G: JVM support for default instance guards
>
> Inline class files can indicate that their default instance is invalid.
> All attempts to operate on that instance (via field/method accesses, other
> than 'withfield') result in an exception.
>
> This tightens up Option F, making it just as impossible to access members
> of the default instance as it is to access members of 'null'.
>
> Same issues as Option E regarding adding a "new NPE" to the platform.
>
> ---
>
> Option H: Language checks on field/array reads
>
> An inline class declaration can indicate that the default instance is
> invalid. Every field and array access that may involve an uninitialized
> field/array component of that inline type gets augmented with a check that
> rejects reads of the default value (treating it as "you forgot to
> initialize this variable").
>
> That is:
>
> Point p = points[3];
>
> compiles to
>
> Point p$0 = points[3];
> if (p$0 == [vdefault Point]) throw new UninitializedVariableException();
> Point p = p$0;
>
> This is much like Option C, and has roughly the same advantages/problems.
> There's not a strong guarantee that the default value won't pop up from
> untrusted bytecode (or unreliable inline class authors), and lots of array
> types need guards.
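
The same kind of model works for the read check. This is only a sketch: two int arrays imitate a flattened array, Cell and CellArray are made-up names, and UninitializedVariableException is declared locally because Java SE has no such class.

```java
// Hypothetical exception; Java SE does not define this today.
class UninitializedVariableException extends RuntimeException {
    UninitializedVariableException(String message) { super(message); }
}

// Hypothetical stand-in for an inline class whose all-zero state is invalid.
final class Cell {
    final int row, col; // valid cells are 1-based, so (0, 0) is never valid

    Cell(int row, int col) {
        this.row = row;
        this.col = col;
    }
}

// Hypothetical model of a flattened Cell[]: components start as all zeros.
final class CellArray {
    private final int[] rows, cols;

    CellArray(int length) {
        rows = new int[length];
        cols = new int[length];
    }

    void set(int i, Cell c) {
        rows[i] = c.row;
        cols[i] = c.col;
    }

    // The read the compiler would emit: reject the default value.
    Cell get(int i) {
        if (rows[i] == 0 && cols[i] == 0) {
            throw new UninitializedVariableException("component " + i);
        }
        return new Cell(rows[i], cols[i]);
    }

    public static void main(String[] args) {
        CellArray cells = new CellArray(4);
        cells.set(0, new Cell(2, 3));
        System.out.println(cells.get(0).row + "," + cells.get(0).col); // prints 2,3
    }
}
```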
>
> ---
>
> Option I: JVM checks on field/array reads
>
> Inline class files can indicate that their default instance is invalid.
> When reading from a field/array component of the inline type
> ('getfield'/'getstatic'/'aaload'), an exception is thrown if the default
> value is found (treating it as "you forgot to initialize this variable").
> The 'vdefault' instruction, like 'withfield', is illegal outside of the
> inline class's nest.
>
> Better than Option H in that it can be optimized to occur on only certain
> reads, and in that it provides strong guarantees—only the inline class can
> ever "see" the default instance.
>
> Well, unless the inline class chooses to share that instance with the
> world. Not sure how we prevent that. But maybe at that point, anything
> bad/weird that happens is the author's own fault. (E.g., putting the
> default value in an array will make that component effectively
> "uninitialized" again.)
>
> Like Option D, there's a question of whether we're willing to add this
> complexity to the 'getfield'/'getstatic'/'aaload' instructions. My sense is
> that at least it's less complexity than you have in Option D.
>
>

