Valhalla performance model
Brian Goetz
brian.goetz at oracle.com
Thu Jun 30 21:24:46 UTC 2022
Here's a *first draft* of a document to go into the SoV docset on the
performance model.
# State of Valhalla
## Part 5: Performance Model {.subtitle}
#### Brian Goetz {.author}
#### June 2022 {.date}
This document describes performance considerations for value classes under
Project Valhalla. While we describe the optimizations that we expect the
HotSpot JVM to routinely make, other JVMs may make their own choices, and of
course these choices may vary over time and situations.
## Flattening
Project Valhalla has two broad classes of goals. The first is
_unification_:
unifying the treatment of primitives and references in the Java type system.
The second is _performance_: enabling the declaration of aggregates with
_flatter_ and _denser_ layouts than the layout we get with today's identity
classes.
By _flatness_, we mean the number of memory indirections that must be
traversed
to get to the leaf data in an object graph. If all object references are
implemented as pointers -- as they almost always are for identity objects --
then each object becomes an "island" of data, requiring indirections
each time
we hop to another island. Flatness begets density; each indirection
requires an
object header, and eliminating indirections also reduces the number of
object
headers. Flatness also reduces garbage collection costs, since there
are fewer
objects in the heap to process.
### Heap flattening
The form of flattening that comes most readily to mind is flattening on the
heap: _inlining_ the layout of some objects into that of other objects (and
arrays), which eliminates island-hopping. If `C` is an identity class, then a
field of type `C` is laid out as a pointer, and an array of type `C[]` is laid
out as an array of pointers. If `V` is a value type, we have the option to lay
out fields of type `V` by inlining the fields of `V` into the layout of the
enclosing type, and to lay out arrays of type `V[]` as repeating (aligned)
groups of the fields of `V`. These layout choices reduce the number of
indirections needed to get to the fields of `V` by one hop.
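To make the layout difference concrete, here is a hand-written illustration in
today's Java (names and the parallel-array encoding are illustrative
assumptions, not anything the JVM emits): an array of point objects is an
array of pointers to separately allocated objects, while the hand-flattened
version stores the same data as repeating groups of fields, which is roughly
the layout a flattened `V[]` would get automatically.

```java
// Illustration only: what heap flattening buys, simulated by hand.
// A hypothetical value-candidate point:
record Point(int x, int y) { }

public class FlatteningDemo {
    public static void main(String[] args) {
        // Non-flat: an array of references; each element is a pointer to a
        // separately allocated Point with its own object header.
        Point[] boxed = { new Point(1, 2), new Point(3, 4) };

        // Hand-flattened: the same data as repeating (x, y) groups, with no
        // per-element headers or indirections.
        int[] flat = { 1, 2, 3, 4 };

        int i = 1; // element index
        System.out.println(boxed[i].x() == flat[2 * i]);     // true
        System.out.println(boxed[i].y() == flat[2 * i + 1]); // true
    }
}
```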
### Calling convention flattening
A less obvious, but also important, form of flattening is flattening in
_method calling conventions_. If a method argument or return is a reference to
`C`, where `C` is an identity class or polymorphic type (an interface or
abstract class), the argument will usually be passed as a pointer on the stack
or in a register. If `V` is a value type, we have another option: to
_scalarize_ `V` (explode it into its fields) and pass its fields on the stack
or in registers. (Perhaps surprisingly, under some situations we have the same
option when `V` is a strongly typed reference to a value object (a `V.ref`) as
well.)
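A hand-scalarized sketch of the idea (all names here are illustrative; this is
a manual transformation, shown only to suggest what the JVM can do
automatically for a value-class argument):

```java
// Illustration only: scalarization done by hand.
record Complex(double re, double im) { }

public class ScalarizeDemo {
    // Non-scalarized: the caller passes a reference to a Complex.
    static Complex add(Complex a, Complex b) {
        return new Complex(a.re() + b.re(), a.im() + b.im());
    }

    // Scalarized form: the fields travel directly on the stack or in
    // registers, with no Complex object in the calling convention at all.
    static double addRe(double aRe, double bRe) { return aRe + bRe; }
    static double addIm(double aIm, double bIm) { return aIm + bIm; }

    public static void main(String[] args) {
        Complex r = add(new Complex(1, 2), new Complex(3, 4));
        System.out.println(r.re() == addRe(1, 3) && r.im() == addIm(2, 4));
    }
}
```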
Both heap layouts and calling convention are determined fairly early in the
execution of a program. This means that any information needed to make
these
flattening choices must be available early in the execution as well.
The `Q`
descriptors used by value types act as a preload signal, as does the
`Preload`
attribute used for reference companions of value classes.
### Locals
Local variables have even more latitude in representation because, unlike with
layouts and calling conventions, there is no need for separately compiled code
to agree on a representation. Values and references to values may be routinely
scalarized in local variables.
### Which is more important?
It is tempting to assume that heap flattening is more important, but
this is a
bias we need to overcome. Developers tend to be more aware of heap
allocation
(we can see the `new` in the code) and heap utilization is more easily
measured
with monitoring tools. But this is mostly observability bias. Both are
important to performance, and serve complementary goals.
Stack flattening is what makes much of the cost of boxing and of wrapper
classes like `Optional` go away. As developers, we all flinch a bit when we
have to return a wrapper like `Optional` or a record type that the client is
just going to unpack; this feels like "needless motion". Stack flattening
allows us to get the benefits of these abstractions without paying this cost,
which shows up as a streamlining of general computational costs.
Heap flattening serves a different role; it is about flattening and
compacting
object graphs. This has a bigger impact on data-intensive code,
allowing us to
pack more data into a given sized heap and traverse data in the heap more
efficiently. It also means that the garbage collector has less work to do,
making more CPU cycles and memory bandwidth available for business
calculation.
## Additional considerations
There are two additional considerations that affect performance indirectly:
nullity and tearing.
### Nulls
Nullity is a property of references; null is how a reference refers to no
instance at all. Values (historically primitives, but now also value
types) are
never null, so are directly amenable to scalarization. Perhaps
surprisingly,
_references_ to value types may also be scalarized by adjoining a synthetic
boolean _null channel_ to represent whether or not the reference is null.
(If
the null channel indicates the reference is null, the data in the other
channels
should be ignored.) This null channel may require additional space,
since many
value types (e.g., `int`) use all their bit patterns and therefore would
need
additional bits to represent nullity. However, the JVM has a variety of
possible tricks at its disposal to eliminate this extra footprint in
some cases,
such as using slack in pointers, booleans, or the alignment shadow.
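The null channel can be modeled explicitly in today's Java (the class and
field names below are illustrative assumptions): a scalarized reference
becomes a value channel plus a boolean channel, and the value channel is
ignored whenever the boolean says "null".

```java
// Illustration only: a reference to a value type scalarized with a synthetic
// boolean "null channel", modeled as an explicit pair of fields.
public class NullChannelDemo {
    // Hypothetical scalarized form of a nullable int-like reference: since
    // int uses all 2^32 bit patterns, nullity needs its own channel.
    static final class NullableInt {
        final boolean present; // the null channel
        final int value;       // ignored when present == false
        NullableInt(boolean present, int value) {
            this.present = present;
            this.value = value;
        }
    }

    static NullableInt ofNull()  { return new NullableInt(false, 0); }
    static NullableInt of(int v) { return new NullableInt(true, v); }

    public static void main(String[] args) {
        NullableInt a = of(42);
        NullableInt b = ofNull();
        // When the null channel says "null", the value channel is ignored.
        System.out.println(a.present && a.value == 42 && !b.present);
    }
}
```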
### Tearing
Whether or not to flatten heap-based variables has an additional
consideration:
the possibility of _tearing_. Tearing occurs when a read of a logical quantity
(such as a 64-bit integer) is broken up into multiple physical reads (such as
two 32-bit reads), and the results of those reads correspond to different
writes of the logical quantity.
The Java platform has always allowed for some form of tearing: reads and
writes
of 64-bit primitives (`long` and `double`) may be broken up into two 32-bit
reads and writes. In the presence of a data race (which is a logic error),
these two reads could return data corresponding to two different logical
writes.
This possible inconsistency was permitted because at the time, most hardware
lacked the ability to perform atomic 64-bit operations efficiently, and this
problem only occurs in concurrent programs that already have a serious
concurrency bug. (The recommended cure is to declare the field
`volatile`, but
any technique that eliminates the data race, such as guarding the data by a
lock, will also work.)
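The two-halves failure mode can be simulated deterministically, with no actual
threads or data race (everything below is an illustrative model, not how the
JVM stores a `long`): interleave a read between the two half-writes, and the
observed value corresponds to no single logical write.

```java
// Illustration only: a deterministic simulation of tearing a 64-bit value
// stored as two 32-bit halves.
public class TearingDemo {
    static int hi, lo; // the two physical 32-bit halves of one logical long

    static void write(long v) { hi = (int) (v >>> 32); lo = (int) v; }
    static long read()        { return ((long) hi << 32) | (lo & 0xFFFFFFFFL); }

    public static void main(String[] args) {
        long first  = 0x00000001_00000001L;
        long second = 0x00000002_00000002L;

        write(first);                    // first logical write completes
        hi = (int) (second >>> 32);      // second write stores its high half...
        long torn = read();              // ...a read lands here...
        lo = (int) second;               // ...before the low half is written

        // The torn read mixes halves of the two writes and matches neither.
        System.out.println(torn != first && torn != second);
    }
}
```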
On modern hardware, most JVMs now use atomic 64-bit instructions for reads and
writes of `long` and `double` variables, as the performance of these
instructions has caught up and so JVMs can provide greater safety at
negligible
cost. However, with the advent of value classes, tearing under race again
becomes a possibility, since one can easily declare a value class whose
layout
exceeds the maximum atomic load and store size of any hardware.
Because values are aggregates, some value classes may be less tolerant of
tearing than others. Specifically, some value classes have representational
invariants across their fields (e.g., a `Range` class that requires that the
lower bound not exceed the upper bound), and exposing code to instances that
do not respect these invariants may be surprising or dangerous. Accordingly, some
value classes may be declared with stronger or weaker atomicity requirements
(e.g., `non-atomic`) that affect whether or not instances may tear under
race --
and which potentially constrain how these are flattened in the heap.
(Tearing
is not an issue for local variables or method parameters or returns, as
these
are entirely within-thread and therefore are not at risk for data races.)
## Layout
Today, object layout is simple: reference types are represented as
pointers, and
primitive types are represented directly (flat); similarly, arrays of
reference
types are arrays of pointers, and arrays of primitives are flattened. These
layout choices are common to heap, calling convention, and local
representation.
References to identity objects, and the built-in primitives, will surely
continue to use this layout. But we have additional latitude with value
types
and references to value objects. The choice of layout for these new
types will
depend on a number of factors: whether they have atomicity requirements,
their
size, the context (heap, stack, or local), and mutability.
There are effectively three possible flattening strategies available, which
we'll call non-, low-, and full-flat. Non-flat is the same old strategy
as for
identity objects: pointers. JVMs are free to fall back to non-flat in any
situation. Full-flat is full inlining of the layout into the enclosing
class
or array layout, as we get with primitives today. Low-flat chooses between
these based on the size of the object layout and the hardware -- if the
object
fits into cheap-enough atomic loads and stores, flatten as per full-flat,
otherwise fall back to non-flat. (In addition to requiring suitable
atomic loads
and stores, the low-flat treatment may also require compiler heroics to
support
reading and writing multiple fields in a single memory access.)
The following table outlines the best we can do based on the desired
semantics:
| Kind                  | Stack                         | Heap                          |
| --------------------- | ----------------------------- | ----------------------------- |
| Identity object       | Non-flat                      | Non-flat                      |
| Primitive             | Full-flat                     | Full-flat                     |
| Non-atomic value type | Full-flat                     | Full-flat                     |
| Atomic value type     | Full-flat                     | Low-flat (unless final)       |
| Ref to value type     | Full-flat (with null channel) | Low-flat (with null channel)  |
There are two significant attributes of this chart. First, note that we can
still get full or partial flattening even for some reference types. The
other
is that we can flatten more uniformly on the stack (calling convention,
locals)
than we can on the heap (fields, arrays). This is because of the
intrusion of
atomicity and the possibility of data races. The stack is immune from data
races since it is confined to a single thread, but in the heap, there is
always
the possibility of concurrent access. If there are atomicity requirements
(atomic value types, and all references), then the best we can do is to
flatten up to the threshold of atomicity.
For references to value types, the footprint cost may be larger to accommodate
the null channel (absent heroics to encode the null channel in slack bits).
For final fields, we may be able to upgrade to full-flat even for atomic
value
types and references to value types, because final fields are not subject to
tearing. (Technically, they are if the receiver escapes construction,
but in
this case the Java Memory Model voids the initialization safety
guarantees that
would prevent tearing.)
For very large values, we would likely choose non-flat even when we could
otherwise flatten; flattening a 1000-field value type is likely to be
counterproductive.
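The table and the size caveat above can be read as a small decision procedure.
The sketch below is a toy model only: the byte thresholds, names, and enum
cases are assumptions for illustration, not actual JVM behavior.

```java
// Illustration only: a toy model of the heap-layout decision.
// The 16- and 64-byte thresholds are assumed, not real JVM constants.
public class LayoutModel {
    enum Layout { NON_FLAT, LOW_FLAT_AS_FULL, LOW_FLAT_AS_POINTER, FULL_FLAT }

    static final int MAX_ATOMIC_BYTES  = 16; // assumed atomic load/store limit
    static final int MAX_FLATTEN_BYTES = 64; // assumed "too large to flatten"

    static Layout heapLayout(boolean identity, boolean atomicRequired,
                             int sizeBytes) {
        if (identity) return Layout.NON_FLAT;                      // pointers
        if (sizeBytes > MAX_FLATTEN_BYTES) return Layout.NON_FLAT; // very large
        if (!atomicRequired) return Layout.FULL_FLAT;              // non-atomic
        // Atomic value types and references: flatten only up to the
        // threshold of atomicity (low-flat chooses per size).
        return sizeBytes <= MAX_ATOMIC_BYTES
                ? Layout.LOW_FLAT_AS_FULL : Layout.LOW_FLAT_AS_POINTER;
    }

    public static void main(String[] args) {
        System.out.println(heapLayout(true,  false, 8));  // identity object
        System.out.println(heapLayout(false, false, 8));  // non-atomic value
        System.out.println(heapLayout(false, true,  8));  // small atomic value
        System.out.println(heapLayout(false, true,  32)); // large atomic value
    }
}
```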
### Expected performance
The table above reveals some insights. It means that we can routinely
expect
full flattening in locals and calling convention, regardless of atomicity,
nullity, or reference-ness -- the only thing we need to avoid is
identity. In
turn, this means the performance difference between `V.ref` and `V.val` for
method parameters, returns, and locals, is minimal (though there are some
second-order effects due to the null channel, such as register pressure
and null
check instructions).
The real difference between `V.ref` and `V.val` shows up in the heap, which
means fields and array elements. (Arrays are particularly important
because the
same structure is repeated many times, so any footprint and indirection
costs
are multiplied by the array size.) While identity was an absolute
impediment to
flattening in the heap, once that is removed, the next impediment reveals
itself: atomicity. Java has long offered a powerful safety guarantee of
_initialization safety_ for final fields, even when the object reference is
published via race. This is where the oft-repeated wisdom of "immutable
objects
are automatically thread-safe" comes from; it relies on the atomicity of
loading
object references. To avoid consistency surprises, value types provide
atomicity by default, but can be marked as `non-atomic` to achieve greater
flattening (at the cost of tearing under race).
Mutable variables of atomic types -- atomic value types, and references
to all
value types -- will only be flattened in the heap if they are small
enough to
fit into the atomic memory operations available. For references, if
there is
any additional footprint required to represent null, this additional
footprint
is included in size for purposes of evaluating "small enough."
### Coding guidelines
On the stack (method parameters and returns, locals) the performance
difference
between using `V.ref` and `V.val` is minimal; disavowing identity is enough.
Among other things, this means that migrating value-based classes to
true value
classes should provide an immediate boost with no code changes.
(Further gains
can be had by using the `.val`companion in the heap, but it is generally not
necessary to use it in method signatures or locals.)
The most important consideration for heap flattening is whether the type
disavows not only identity, but atomicity. If a type disavows
atomicity, then
its value companion will get full flattening in the heap.