indy-based string-switch proposal

Japris Pogrammer mrjarviscraft at gmail.com
Mon Nov 4 19:15:17 UTC 2019


Hi there!.

I am here with the proposal (and a working implementation prototype [1]) of
a invokedynamic-based string-switch which, while not breaking any
backwards-compatibility and not requiring any cascade changes, allows
improved behaviour for this JLS feature.

Idea:
Replace the lookupswitch instruction part (and all the following
instructions corresponding to its labels) generated by javac for a
String-switch (keeping the tableswitch part) with a single invokedynamic
instruction using the new SwitchTableFactory.
Patch contents:
- SwitchTableFactory.java : home for bootstrap-methods replacing the old
lookupswitch behaviour
- SwitchTableFactoryException.java : exception thrown by bootstrap-methods
of SwitchTableFactory
- BaseTest.java : simple behavioral test of bootstrap-methods invoking them
via invokestatic and testing results produced by MethodHandles of returned
CallSites

Reasoning:
1) Current approach (used by javac) is generating a complicated
structure (whose number of opcodes grows ~linearly on number of branches).
Yet this structure is not usually optimized (at current implementation even
1-branch corner-case is not treated specially [2]). Even if these
optimizations were performed at compile-time they could have become useless
(or even causing performance problems) in future versions which is against
backwards-compatibility principle. Also, switches compiled for older
versions whould not use newer features which could have allowed better
performance.
2) As of current spec, String hash-code is one of the only hard-coded
algorithms. This is mostly due to backwards-compatibility principle and was
decided to not be changed (TL;DR and this proposal is not trying to do it).
However, using indy-based approach would make the compiler independent from
it thus allowing removal of String hash-code algorithm in the future.
3) Producing simpler bytecode would allow external tools analyzing it
discover string-switches easier (this may also be said about JIT).
4) Current algorithm cannot rely on String implementation details which are
not defined at compile-time (such as String's internal fields).

Invokedynamic details:
Two bootstrap methods are provided in order to allow big switches (C-style
arguments layout is due to inability to pass complicated arguments (such as
typed arrays) into the bootstap method):
1) SwitchTableFactory:createStringSwitchTable(Lookup, String, MethodType,
Object[]) whose bootstrap arguments are the following:
<default branch id>, <number of String labels>, [<branch id>, <number of
String labels corresponsing to the branch>, [<string labels corresponsing
to the branch>]*]*
2) SwitchTableFactory:altCreateStringSwitchTable(Lookup, String,
MethodType, Object[]) whose bootstrap arguments are the following:
<default branch id>, <number of pages (the following argument groups)>,
[<page>]*
Page is a String of the following pattern:
[<HEX branch id>\0<HEX number of String labels corresponsing to the
branch>\0[<HEX length of the following String label>\0<String label
corresponsing to the branch>]*]*
As you can see, it allows for multiple branches (each containing multiple
labels) to be described under one page. This is done in order to support
big switches which cannot fit into the first method variant.

Example:
As an example of what could be done by the compiler using this feature,
consider the following switch expression:

switch (string) {
    case "foo": <branch 0>
    case "bar": <branch 1>
    case "baz": case "qux": <branch 2>
    default: <branch -1>
}

Generate bytecode would look like:

invokedynamic
java/lang/invoke/SwitchTableFactory:createStringSwitchTable(..) {-1, 4,
        0, 1, "foo",
        1, 1, "bar",
        2, 1, "baz", "qux"
}
<good-old switch-table by code>

or (this should happen for big switches)

invokedynamic
java/lang/invoke/SwitchTableFactory:altCreateStringSwitchTable(..) {-1, 1,
        "0\0001\0003\000foo1\0001\0003\000bar2\0002\0003\000baz3\000qux"
}
<good-old switch-table by code>

(while the argument looks complicates it is (A) shoter (as each '\000' is
actually one char) and (B) linear (meaning it is easy to parse))

Implementation details:
Current implementation is based on ASM-generation. Under the hood it
generates a class ) with a static String (int) method (passed to
ConstantCallSite) performing hash-code() switch + equals(Object) (as
currently done by javac) handling 0- and 1-branch cases specially (by using
a MethodHandle of a method returning a bound value (default branch ID) and
by generating an equals(Object) check respectively). The class is
anonymously defined via Unsafe as done by other JDK bootstrap methods.

Features to be done (this were not done yet as I am not aware yet if this
feature will get positive feedback):
1) Add the class and its indy-details into other classes to have it treated
specially (as done for LambdaMetaFactory and StringConcatFactory) [3].
2) FIll the documentation of the SwitchTableFactory.
3) Use SwitchTableFactory in javac (probably, as a preview feature) and
test its behaviour.

Possible improvements:
1) Bootstrap's generator may use its own hash-code algorithm (based on
direct access to String's content using special Lookup or Unsafe) which
would be more efficient than String#hashCode().
2) Bootstrap's generator may depend on other String's properties, such as
compact-mode, length etc.
2) Bootstrap's generator may use switchtable in case of hash-codes being
close (yet it seems to be too rare to handle).

Subjects to discuss:
1. Should the bootstrap method validate its arguments in order to report
irrational arguments (such as usage of duplicate String labels)?
2. Is the current Boostrap-arguments layout fine or should be remade?
3. Which optimizations should be added (and/or toggled by default) to the
generator?
4. Should branch IDs be specified explicitly or not (using -1 for default
and 0 + i for other branches)?
5. Should there be a way to pass MethodHandles to bootstrap methods to be
invoked instead of branch IDs being returned if a switch returns nothing
(and should there be a way to do it for stack-affecting switches?)?

Possible extensions:
1. As the name of the class suggests, it is a general switch-table factory
and it may be used in future for other types of switches. At least,
enum-switch *may* be done using it, but (in order to keep it one-switch
based) it would be much better to have it implemented via condy producing
int-array for ordinals (this would remove the need for extra synthetic
class currently being generated).
2. Moreover, this approach may be used for boosted implementations of
pattern-matching magic (using the same approach as for the strings) with
the exception of the need to pass more complicated structures into the
bootstrap method (using condy or references to producing methods or more
complicated patterns).
3. In addition to #2, a universal approach may be introduced to allow
faster switch by all types of values (which is primary useful for pattern
matching) such as passing pairs of int(T) and boolean(T) MethodHandles
performing hash-code computation and equality check respectively.

I am ready to continue developing this feature (being ready to sign Oracle
Contributor Agreement) which includes:
- documentation
- improvements to the current methods
- addition of the class to other
- benchmarks and testing
- optimization
- compiler support for this switch-approach
- other features described above, such as a universal switch bootstrap
method (Possible extensions, #3)

Notes:
This might have been discussed in the past (at least indy-based switches in
general), if there was a reason for this to not be implemented, please
refer me to it. I would be happy to work on this proposal to make this
feature (and other ones mentioned) part of OpenJDK.

Thanks for your interest and support,
Peter

Links:
[1] https://gist.github.com/JarvisCraft/53b6e5e4eb91419b6e852cf63e1c8a2f Patch
(dynamic mirror of the file appended)
[2]
https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/jdk.compiler/share/classes/com/sun/tools/javac/comp/Lower.java#l3692
Current
javac String implementation root
[3]
https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java
One
of the cases where JDK's Bootstrap-methods are handled specially


More information about the amber-dev mailing list