Switch translation
Brian Goetz
brian.goetz at oracle.com
Fri Apr 6 15:51:49 UTC 2018
The following outlines our story for translating improved switches,
including both the switch improvements coming as part of JEP 325, and
follow-on work to add pattern matching to switches. Much of this has
been discussed already over the last year, but here it is in one place.
# Switch Translation
#### Maurizio Cimadamore and Brian Goetz
#### April 2018
## Part 1 -- constant switches
This part examines the current translation of `switch` constructs by
`javac`, and proposes a more general translation for switching on
primitives, boxes, strings, and enums, with the goals of:
- Unify the treatment of `switch` variants, simplifying the compiler
implementation and reducing the static footprint of generated code;
- Move responsibility for target classification from compile time to
run time, allowing us to more freely update the logic without updating
the compiler.
## Current translation
Switches on `int` (and the smaller integer primitives) are translated in
one of two ways. If the labels are relatively dense, we translate an
`int` switch to a `tableswitch`; if they are sparse, we translate to a
`lookupswitch`. The current heuristic appears to be that we use a
`tableswitch` if it results in a smaller bytecode than a `lookupswitch`
(which uses twice as many bytes per entry), which is a reasonable
heuristic.
#### Switches on boxes
Switches on primitive boxes are currently implemented as if they were
primitive switches, unconditionally unboxing the target before entry
(possibly throwing NPE).
#### Switches on strings
Switches on strings are implemented as a two-step process, exploiting
the fact that strings cache their `hashCode()` and that hash codes are
reasonably spread out. Given a switch on strings like the one below:
switch (s) {
case "Hello": ...
case "World": ...
default: ...
}
The compiler desugar this into two separate switches, where the first
switch maps the input strings into a range of numbers [0..1], as shown
below, which can then be used in a subsequent plain switch on ints. The
generated code unconditionally calls `hashCode()`, again possibly
throwing NPE.
int index=-1;
switch (s.hashCode()) {
case 12345: if (!s.equals("Hello")) break; index = 1; break;
case 6789: if (!s.equals("World")) break; index = 0; break;
default: index = -1;
}
switch (index) {
case 0: ...
case 1: ...
default: ...
}
If there are hash collisions between the strings, the first switch must
try all possible matching strings.
#### Switches on enums
Switches on `enum` constants exploit the fact that enums have (usually
dense) integral ordinal values. Unfortunately, because an ordinal value
can change between compilation time and runtime, we cannot rely on this
mapping directly, but instead need to do an extra layer of mapping.
Given a switch like:
switch(color) {
case RED: ...
case GREEN: ...
}
The compiler numbers the cases starting a 1 (as with string switch), and
creates a synthetic class that maps the runtime values of the enum
ordinals to the statically numbered cases:
class Outer$0 {
synthetic final int[] $EnumMap$Color = new
int[Color.values().length];
static {
try { $EnumMap$Color[RED.ordinal()] = 1; } catch
(NoSuchFieldError ex) {}
try { $EnumMap$Color[GREEN.ordinal()] = 2; } catch
(NoSuchFieldError ex) {}
}
}
Then, the switch is translated as follows:
switch(Outer$0.$EnumMap$Color[color.ordinal()]) {
case 1: stmt1;
case 2: stmt2
}
In other words, we construct an array whose size is the cardinality of
the enum, and then the element at position *i* of such array will
contain the case index corresponding to the enum constant with whose
ordinal is *i*.
## A more general scheme
The handling of strings and enums give us a hint of how to create a more
regular scheme; for `switch` targets more complex than `int`, we lower
the `switch` to an `int` switch with consecutive `case` labels, and use
a separate process to map the target into the range of synthetic case
labels.
Now that we have `invokedynamic` in our toolbox, we can reduce all of
the non-`int` cases to a single form, where we number the cases with
consecutive integers, and perform case selection via an
`invokedynamic`-based classifier function, whose static argument list
receives a description of the actual targets, and which returns an `int`
identifying what `case` to select.
This approach has several advantages:
- Reduced compiler complexity -- all switches follow a common pattern;
- Reduced static code size;
- The classification function can select from a wide range of
strategies (linear search, binary search, building a `HashMap`,
constructing a perfect hash function, etc), which can vary over time or
from situation to situation;
- We are free to improve the strategy or select an alternate strategy
(say, to optimize for startup time) without having to recompile the code;
- Hopefully at least, if not more, JIT-friendly than the existing
translation.
We can also use this approach in preference to `lookupswitch` for
non-dense `int` switches, as well as use it to extend `switch` to handle
`long`, `float`, and `double` targets (which were surely excluded in
part because the JVM didn't provide a convenient translation target for
these types.)
#### Bootstrap design
When designing the `invokedynamic` bootstraps to support this
translation, we face the classic lumping-vs-splitting decision. For now,
we'll bias towards splitting. In the following example,
`BOOTSTRAP_PREAMBLE` indicates the usual leading arguments for an indy
bootstrap. We assume the compiler has numbered the case values densely
from 0..N, and the bootstrap will return [0,n) for success, or N for "no
match".
A strawman design might be:
// Numeric switches for P, accepts invocation as P -> I or Box(P) -> I
CallSite intSwitch(BOOTSTRAP_PREAMBLE, int... caseValues)
// Switch for String, invocation descriptor is String -> I
CallSite stringSwitch(BOOTSTRAP_PREAMBLE, String... caseValues)
// Switch for Enum, invocation descriptor is E -> I
CallSite enumSwitch(BOOTSTRAP_PREAMBLE, Class<Enum<E extends
Enum<E>>> clazz,
String... caseNames)
It might be possible to encode all of these into a single bootstrap, but
given that the compiler already treats each type slightly differently,
it seems there is little value in this sort of lumping for non-pattern
switches.
The `enumSwitch` bootstrap as proposed uses `String` values to describe
the enum constants, rather than encoding the enum constants directly via
condy. This allows us to be more robust to enums disappearing after
compilation.
This strategy is also dependent on having broken the limitation on 253
bootstrap arguments in indy/condy.
#### Extending to other primitive types
This approach extends naturally to other primitive types (long, double,
float), by the addition of some more bootstraps (which need to deal with
the additional complexities of infinity, NaN, etc):
CallSite longSwitch(BOOTSTRAP_PREAMBLE, long... caseValues)
CallSite floatSwitch(BOOTSTRAP_PREAMBLE, float... caseValues)
CallSite doubleSwitch(BOOTSTRAP_PREAMBLE, double... caseValues)
#### Extending to null
The scheme as proposed above does not explicitly handle nulls, which is
a feature we'd like to have in `switch`. There are a few ways we could
add null handling into the API:
- Split entry points into null-friendly or null-hostile switches;
- Find a way to encode nulls in the array of case values (which can be
done with condy);
- Always treat null as a possible input and a distinguished output,
and have the compiler ensure the switch can handle this distinguished
output.
The last strategy is appealing and straightforward; assign a sentinel
value (-1) to `null`, and always return this sentinel when the input is
null. The compiler ensures that some case handles `null`, and if no
case handles `null` then it inserts an implicit
case -1: throw new NullPointerException();
into the generated code.
#### General example
If we have a string switch:
switch (x) {
case "Foo": m(); break;
case "Bar": n(); // fall through
case "Baz": r(); break;
default: p();
}
we translate into:
int t = indy[bsm=stringSwitch["Foo", "Bar", "Baz"]](x)
switch (t) {
case -1: throw new NullPointerException(); // implicit null case
case 0: m(); break;
case 1: n(); // fall through
case 2: r(); break;
case 3: p(); // default case
}
All switches, with the exception of `int` switches (and maybe not even
non-dense `int` switches), follow this exact pattern. If the target
type is not a reference type, the `null` case is not needed.
This strategy is implemented in the `switch` branch of the amber
repository; see `java.lang.runtime.SwitchBootstraps` in that branch for
(rough!) implementations of the bootstraps.
## Patterns in narrow-target switches
When we add patterns, we may encounter switches whose targets are
tightly typed (e.g., `String` or `int`) but still use some patterns in
their expression. For switches whose target type is a primitive,
primitive box, `String`, or `enum`, we'd like to use the optimized
translation strategy outlined here, but the following kinds of patterns
might still show up in a switch on, say, `Integer`:
case var x:
case _:
case Integer x:
case Integer(var x):
The first three can be translated away by the source compiler, as they
are semantically equivalent to `default`. If any nontrivial patterns
are present (including deconstruction patterns), we may need to
translate as a pattern switch scheme -- see Part 2. (While the language
may not distinguish between "legacy" and "pattern" switches -- in that
all switches are pattern switches -- we'd like to avoid giving up
obvious optimizations if we can.)
# Part 2 -- type test patterns and guards
A key motivation for reexamining switch translation is the impending
arrival of patterns in switch. We expect switch translation for the
pattern case to follow a similar structure -- lower to an `int` switch
and use an indy-based classifier to select an index. However, there are
a few additional complexities. One is that pattern cases may have
guards, which means we need to be able to re-enter the bootstrap with an
indication to "continue matching from case N", in the event of a failed
guard. (Even if the language doesn't support guards directly, the
obvious implementation strategy for nested patterns is to desugar them
into guards.)
Translating pattern switches is more complicated because there are more
options for how to divide the work between the statically generated code
and the switch classifier, and different choices have different
performance side-effects (are binding variables "boxed" into a tuple to
be returned, or do they need to be redundantly calculated).
## Type-test patterns
Type-test patterns are notable because their applicability predicate is
purely based on the type system, meaning that the compiler can directly
reason about it both statically (using flow analysis, optimizing away
dynamic type tests) and dynamically (with `instanceof`.) A switch
involving type-tests:
switch (x) {
case String s: ...
case Integer i: ...
case Long l: ...
}
can (among other strategies) be translated into a chain of `if-else`
using `instanceof` and casts:
if (x instanceof String) { String s = (String) x; ... }
else if (x instanceof Integer) { Integer i = (Integer) x; ... }
else if (x instanceof Long) { Long l = (Long) x; ... }
#### Guards
The `if-else` desugaring can also naturally handle guards:
switch (x) {
case String s
where (s.length() > 0): ...
case Integer i
where (i > 0): ...
case Long l
where (l > 0L): ...
}
can be translated to:
if (x instanceof String
&& ((String) x).length() > 0) { String s = (String) x; ... }
else if (x instanceof Integer
&& ((Integer) x) > 0) { Integer i = (Integer) x; ... }
else if (x instanceof Long
&& ((Long) x) > 0L) { Long l = (Long) x; ... }
#### Performance concerns
The translation to `if-else` chains is simple (for switches without
fallthrough), but is harder for the VM to optimize, because we've used a
more general control flow mechanism. If the target is an empty
`String`, which means we'd pass the first `instanceof` but fail the
guard, class-hierarchy analysis could tell us that it can't possibly be
an `Integer` or a `Long`, and so there's no need to perform those tests.
But generating code that takes advantage of this information is more
complex.
In the extreme case, where a switch consists entirely of type test
patterns for final classes, this could be performed as an O(1) operation
by hashing. And this is a common case involving switches over
alternatives in a sum (sealed) type. (We shouldn't rely on finality at
compile time, as this can change between compile and run time, but we
should take advantage of this at run time if we can.)
Finally, the straightforward static translation may miss opportunities
for optimization. For example:
switch (x) {
case Point p
where p.x > 0 && p.y > 0: A
case Point p
where p.x > 0 && p.y == 0: B
}
Here, not only would we potentially test the target twice to see if it
is a `Point`, but we then further extract the `x` component twice and
perform the `p.x > 0` test twice.
#### Optimization opportunities
The compiler can eliminate some redundant calculations through
straightforward techniques. The previous switch can be transformed to:
switch (x) {
case Point p:
if (((Point) p).x > 0 && ((Point) p).y > 0) { A }
else if (((Point) p).x > 0 && ((Point) p).y > 0) { B }
to eliminate the redundant `instanceof` (and admits further CSE
optimizations.)
#### Clause reordering
The above example was easy to transform because the two `case Point`
clauses were adjacent. But what if they are not? In some cases, it is
safe to reorder them. For types `T` and `U`, it is safe to reorder
`case T` and `case U` if the two types have no intersection; that there
can be no types that are subtypes of them both. This is true when `T`
and `U` are classes and neither extends the other, or when one is a
final class and the other is an interface that the class does not
implement.
The compiler could then reorder case clauses so that all the ones whose
first test is `case Point` are adjacent, and then coalesce them all into
a single arm of the `if-else` chain.
A possible spoiler here is fallthrough; if case A falls into case B,
then cases A and B have to be moved as a group. (This is another reason
to consider limiting fallthrough.)
A bigger possible spoiler here is separate compilation. If at compile
time, we see that `T` and `U` are disjoint types, do we want to bake
that assumption into the compilation, or do we have to re-check that
assumption at runtime?
#### Summary of if-else translation
While the if-else translation at first looks pretty bad, we are able to
extract a fair amount of redundancy through well-understood compiler
transformations. If an N-way switch has only M distinct types in it, in
most cases we can reduce the cost from _O(N)_ to _O(M)_. Sometimes _M
== N_, so this doesn't help, but sometimes _M << N_ (and sometimes `N`
is small, in which case _O(N)_ is fine.)
Reordering clauses involves some risk; specifically, that the class
hierarchy will change between compile and run time. It seems eminently
safe to reorder `String` and `Integer`, but more questionable to reorder
an arbitrary class `Foo` with `Runnable`, even if `Foo` doesn't
implement `Runnable` now, because it might easily be changed to do so
later. Ideally we'd like to perform class-hierarchy optimizations using
the runtime hierarchy, not the compile-time hierarchy.
## Type classifiers
The technique outlined in _Part 1_, where we lower the complex switch to
a dense `int` switch, and use an indy-based classifier to select an
index, is applicable here as well. First let's consider a switch
consisting only of unguarded type-test patterns, optionally with a
default clause.
We'll start with an `indy` bootstrap whose static argument are `Class`
constants corresponding to each arm of the switch, whose dynamic
argument is the switch target, and whose return value is a case number
(or distinguished sentinels for "no match" and `null`.) We can easily
implement such a bootstrap with a linear search, but can also do better;
if some subset of the classes are `final`, we can choose between these
more quickly (such as via binary search on `hashCode()`, hash function,
or hash table), and we need perform only a single operation to test all
of those at once. Dynamic techniques (such as a building a hash map of
previously seen target types), which `indy` is well-suited to, can
asymptotically approach _O(1)_ even when the classes involved are not
final.
So we can lower:
switch (x) {
case T t: A
case U u: B
case V v: C
}
to
int y = indy[bootstrap=typeSwitch(T.class, U.class, V.class)](x)
switch (y) {
case 0: A
case 1: B
case 2: C
}
This has the advantages that the generated code is very similar to the
source code, we can (in some cases) get _O(1)_ dispatch performance, and
we can handle fallthrough with no additional complexity.
#### Guards
There are two approaches we could take to add support for guards into
the process; we could try to teach the bootstrap about guards (and would
have to pass locals that appear in guard expressions as additional
arguments to the classifier), or we could leave guards to the generated
bytecode. The latter seems far more attractive, but requires some
tweaks to the bootstrap arguments and to the shape of the generated code.
If the classifier says "you have matched case #3", but then we fail the
guard for #3, we want to go back into the classifier and start again at
#4. (Sometimes the classifier can also use this information ("start
over at #4") to optimize away unnecessary tests.)
We add a second argument (where to start) to the classifier invocation
signature, and wrap the switch in a loop, lowering:
switch (target) {
case T t where (e1): A
case T t where (e2): B
case U u where (e3): C
}
into
int index = -1; // start at the top
while (true) {
index = indy[...](target, index)
switch (index) {
case 0: if (!e1) continue; A
case 1: if (!e2) continue; B
case 2: if (!e3) continue; C
default: break;
}
break;
}
For cases where the same type test is repeated in consecutive positions
(at N and N+1), we can have the static compiler coalesce them as above,
or we could have the bootstrap maintain a table so that if you re-enter
the bootstrap where the previous answer was N, then it can immediately
return N+1. Similarly, if N and N+1 are known to be mutually exclusive
types (like `String` and `Integer`), on reentering the classifier with
N, we can skip right to N+2 since if we matched `String`, we cannot
match `Integer`. Lookup tables for such optimizations can be built at
callsite linkage time.
#### Mixing constants and type tests
This approach also extends to tests that are a mix of constant patterns
and type-test patterns, such as:
switch (x) {
case "Foo": ...
case 0L: ...
case Integer i:
}
We can extend the bootstrap protocol to accept constants as well as
types, and it is a straightforward optimization to combine both type
matching and constant matching in a single pass.
## Nested patterns
Nested patterns are essentially guards; even if we don't expose guards
in the language, we can desugar
case Point(0, var x):
into the equivalent of
case Point(var a, var x) && a matches 0:
using the same translation story as above -- use the classifier to
select a candidate case arm based on the top-type of the pattern, and
then do additional checks in the generated bytecode, and if the checks
fail, continue and re-enter the classifier starting at the next case.
#### Explicit continue
An alternative to exposing guards is to expose an explicit `continue`
statement in switch, which would have the effect of "keep matching at
the next case." Then guards could be expressed imperatively as:
case P:
if (!guard)
continue;
...
break;
case Q: ...
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/amber-spec-experts/attachments/20180406/aa8fd4a0/attachment-0001.html>
More information about the amber-spec-experts
mailing list