Reference-default style

Dan Smith daniel.smith at oracle.com
Thu Dec 19 21:15:17 UTC 2019


As we flesh out the migration story for inline classes, I've found it useful to identify two different styles that programmers will want to follow as they make use of inline classes. I want to introduce some terminology to talk about these styles, which will hopefully help us think about what use cases we're designing for.

-----
Inline-default

In this style, most clients of an inline class will want to treat it like another primitive type. They'll use the type directly, and maybe allocate flat arrays (or, eventually, other specialized data structures). Operations that make use of null or erased generics will be uncommon.

Examples:
- Numeric types that are effectively variations on primitives
- Typed wrappers for single values (e.g., measurements, pointers)
- Low-level flat building blocks for data structures
- Multiple-return structures (like cursors)

Most of our design is tailored to this style. In the language model, 'Point' is an inline type, and you apply a type operator, 'Point.ref', to get at the equivalent reference type (if you must; we expect people to utter 'Point.ref' about as frequently as they currently utter 'Integer').
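
For concreteness, a minimal sketch using the working syntax from this discussion ('inline class' and the '.ref' operator); the spellings are still in flux, and 'Point' here is just an illustration:

    inline class Point {
        int x;
        int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    Point p = new Point(1, 2);      // used directly, like a primitive
    Point[] ps = new Point[100];    // a flattened array
    Point.ref boxed = p;            // the reference type, for the rare
                                    // client who actually needs it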

-----
Reference-default

In this style, most clients of an inline class will interact with it through a reference type. They'll use nulls and erased generics the same way they always have. Clients may not even realize that there is an inline class under the hood. Flattening is not a priority, and may even be unwanted (because of cycles, tearing, etc.).
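
For example, here's the sort of client code this style wants to leave untouched; nothing in it hints that an inline class might be underneath ('parseOrNull' is just an invented helper):

    LocalDateTime t = parseOrNull(input);   // null still means "no value"
    if (t != null) {
        List<LocalDateTime> log = new ArrayList<>();  // erased generics, as always
        log.add(t);
    }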

Examples:
- Published classes that are already committed to a reference view (Optional, LocalDateTime)
- Components in a system that makes heavy use of null or generics
- General-purpose records (e.g., a POJO view of a database)
- Nodes in recursive data structures
- Behavior abstractions (e.g., functions)
- APIs that don't want their clients to have to think about inline types
- Classes without a natural default value
- APIs that want to limit access to default values

"Why even bother with an inline class?" is a fair question for these use cases. Some answers:
- Principle of least privilege: if you don't need identity, don't claim it
- Potential for GC improvements* and opportunistic JIT flattening
- A subset of users need flattening, but few enough that it doesn't deserve "default" treatment
- A migration strategy: existing code is written against the reference type, so the reference type keeps the good name, while new code can still opt in to inline features (sorry, new code)

(*On GC: do we have good numbers on this? Personally, my choices about adding class abstractions often come down to "is this abstraction worth the allocation and GC pressure costs associated with lots of new objects?" My performance model here is horrible, so who knows if this is a smart question to ask, but it would be nice if we could say broadly "use an inline class and stop worrying about it.")

Brian proposes an approach to supporting the reference-default style in "State of Valhalla", but I'm not sure it's ideal; this design space still seems fairly unexplored to me. Briefly, here are some ways the language might support it:

1) As a design pattern

We tell programmers to produce two declarations, an inline class and an interface (or abstract class, per another thread). The interface gets the "good name" and exposes the intended API. The inline class may be exposed with an alternate name, for clients who need it, or hidden as private.
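
A minimal sketch of the pattern, with invented names (using 'sealed' as one way to tighten the relationship between the two declarations):

    // The interface gets the "good name" and exposes the intended API.
    public sealed interface Cursor permits CursorImpl {
        int position();
        Cursor advance();
    }

    // The inline class is hidden, or exposed under an alternate name
    // for the clients who need guaranteed flattening.
    inline class CursorImpl implements Cursor {
        int position;
        CursorImpl(int position) { this.position = position; }
        public int position() { return position; }
        public Cursor advance() { return new CursorImpl(position + 1); }
    }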

The language is still 100% inline-default—the 'Foo.ref' type still exists, but it's redundant and nobody needs to use it.

If we did nothing (and honestly, that's an attractive feature roadmap! super cheap!), I think we'd see this design pattern developing naturally in the community.

2) As an "advanced" feature of inline classes

This is the State of Valhalla strategy: inline classes are designed to be inline-default, but as a special-case feature, you can also declare the 'Foo.ref' interface, give it a name, and wire it up to the inline class declaration.

In reference-default style, the programmer gives the "good name" to the reference projection, and either gives an alternate name to the inline class or is able to elide it entirely (in that case, clients use 'Foo.inline').
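
Sketching the client's view under this option (the declaration-site wiring is elided, and every spelling here is a placeholder):

    Timestamp t = readTimestamp();   // the "good name" is the reference projection,
    t = null;                        // so null behaves as it always has
    Timestamp.inline flat = readTimestamp();   // opting in to the inline type
    Timestamp boxed = flat;          // implicit conversion between the projections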

Ways this is different from (1):
- The 'Foo.inline' type operator
- Implicit conversions (although sealed types can get us there in (1))
- There are two types, not three (and two JVM classes, not three)
- Opportunities for "boilerplate reduction" in the two declarations

3) As an equal partner with inline-default

An inline class declaration introduces two types, an inline type and a reference type. But a modifier on the declaration determines whether the "good name" goes to the inline type or the reference type. The other type can be derived using an operator ('Foo.ref' or 'Foo.inline'). There's never a need for an alternate name.

In this case, the language isn't biased to one style or the other; each declaration picks one. The trade-off is that clients need to keep track of one more bit when thinking about the inline class ("Is this a *foo* inline class or a *bar* inline class?" Actual terminology to be bikeshedded...)
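
A sketch, where the 'ref-default' modifier is a pure placeholder (per the bikeshedding caveat above):

    inline class Point { int x, y; }       // inline-default: 'Point' is the inline type
    Point.ref maybePoint = null;           // the reference type, derived

    ref-default inline class Money { long cents; }   // reference-default declaration
    Money m = null;                        // 'Money' is the reference type
    Money.inline flat;                     // the inline type, derived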

4) As the only supported style

An inline class declaration always gives the "good name" to the reference type, and you always use an operator to get to the inline type ('Foo.inline'—but we're gonna need better syntax.)

This one would represent a significant shift in the design center of the feature. If you want flattening everywhere, you're going to need to make liberal use of the '.inline' operator. But if you just want to declare that a bunch of your classes don't have identity, and hopefully get a cheap performance boost as a result, it's simple. The burden of learning something new is shifted to "advanced" users and APIs to whom flattening is important.
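
Day-to-day code under this option might read (again, '.inline' is a placeholder spelling):

    inline class Point { int x, y; }   // 'Point' now names the reference type

    Point p = null;                    // familiar reference behavior, for free
    Point.inline[] buf = new Point.inline[1024];   // flattening is an explicit opt-in...
    void plot(Point.inline p) { /* ... */ }        // ...at every site that wants it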

5) As a use-site contextual option

There's a single inline class declaration with two corresponding types. At the use site, the programmer provides context that picks one or the other for the "good name" (perhaps as a property of the 'import' statement, or some new compiler direction in a source file header, package/module declaration, command line flag, ...).

This probably makes the most sense paired with (4): the *default* default is the reference type, but the language lets you switch to the inline type if you want. Then, unless the client opts in to inline types, they get familiar reference type behavior from the class.
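
One hypothetical shape for the opt-in, hanging it off the 'import' statement (entirely invented syntax, just to make the idea concrete):

    import com.example.Point;          // 'Point' resolves to the reference type
    Point p = null;                    // familiar behavior, no opt-in

    import inline com.example.Point;   // invented: opt in to the inline type
    Point q = new Point(1, 2);         // now flat and non-nullable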

Conclusion:

I'm not ready to completely dismiss any of these designs, but my preferences at the moment are (1) and (3). Options (4) and (5) are more ambitious, discarding some of our assumptions and taking things in a different direction.

Like many design patterns, (1) suffers from boilerplate overhead ((2) too, without some language help). It also risks some missed opportunities for optimization or language convenience, because the relationship between the inline and reference type is incidental. (I'd like to get a clearer picture of whether this really matters or not.)

(2), (3), and (5) suffer from added language complexity. (2) tries to manage it by pushing the feature off into "advanced" territory. But, ultimately, you can't understand the language without understanding those advanced features—the first time you encounter reference-default style, you'll have to rethink your understanding of how inline classes work.

(5) feels like something fundamentally new in Java, although if you squint it's "just" a variation on name resolution. What originally prompted this idea was seeing a similar approach in attempts to introduce nullability type operators—legacy code has the "wrong" default, so you need some lightweight way to pick a different default.

(4) is a simple and consistent story, but probably not the feature we're building. It hinges on how important we think the inline-default use cases are, and how painful we think the 'inline' operator (spelling TBD) would be to use in those cases.

Since (1) is already done (it's the "do nothing" option), it makes sense to use it as a baseline, and then ask whether any of the alternatives are a significant enough improvement that they're worth developing. This will be informed by our understanding of the use cases for the two styles, and some real world experience would probably help that understanding.
