indy-based string-switch proposal
Brian Goetz
brian.goetz at oracle.com
Mon Nov 4 20:31:50 UTC 2019
Here’s is the change set that added these files, in branch pattern-runtime.
http://hg.openjdk.java.net/amber/amber/rev/755f726a61a7
There may be later changesets but this should get you to where we left it off.
Sent from my iPad
> On Nov 4, 2019, at 9:24 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>
> Indeed, such a translation improvement is on the roadmap — in fact we want to might all switches other than those over ints to use Indy-based classification. When we extend switch to support patterns, it will force us to revamp switch translation, and this sort of stuff is better done in library code than in generated bytecode.
>
> There is a prototype of the bootstraps for string, long, and others, with some test cases, checked into an amber branch — let me dig this up.
>
> As you discovered, there is an Indy pothole regarding long arg lists, but is slated to be fixed (or may already have been).
>
> This is a good direction - and something for which we’d be glad for the help (as you’ll see, my initial patch has been moldering for a while).
>
> Sent from my iPad
>
>> On Nov 4, 2019, at 8:15 PM, Japris Pogrammer <mrjarviscraft at gmail.com> wrote:
>>
>> Hi there!.
>>
>> I am here with the proposal (and a working implementation prototype [1]) of
>> a invokedynamic-based string-switch which, while not breaking any
>> backwards-compatibility and not requiring any cascade changes, allows
>> improved behaviour for this JLS feature.
>>
>> Idea:
>> Replace the lookupswitch instruction part (and all the following
>> instructions corresponding to its labels) generated by javac for a
>> String-switch (keeping the tableswitch part) with a single invokedynamic
>> instruction using the new SwitchTableFactory.
>> Patch contents:
>> - SwitchTableFactory.java : home for bootstrap-methods replacing the old
>> lookupswitch behaviour
>> - SwitchTableFactoryException.java : exception thrown by bootstrap-methods
>> of SwitchTableFactory
>> - BaseTest.java : simple behavioral test of bootstrap-methods invoking them
>> via invokestatic and testing results produced by MethodHandles of returned
>> CallSites
>>
>> Reasoning:
>> 1) Current approach (used by javac) is generating a complicated
>> structure (whose number of opcodes grows ~linearly on number of branches).
>> Yet this structure is not usually optimized (at current implementation even
>> 1-branch corner-case is not treated specially [2]). Even if these
>> optimizations were performed at compile-time they could have become useless
>> (or even causing performance problems) in future versions which is against
>> backwards-compatibility principle. Also, switches compiled for older
>> versions whould not use newer features which could have allowed better
>> performance.
>> 2) As of current spec, String hash-code is one of the only hard-coded
>> algorithms. This is mostly due to backwards-compatibility principle and was
>> decided to not be changed (TL;DR and this proposal is not trying to do it).
>> However, using indy-based approach would make the compiler independent from
>> it thus allowing removal of String hash-code algorithm in the future.
>> 3) Producing simpler bytecode would allow external tools analyzing it
>> discover string-switches easier (this may also be said about JIT).
>> 4) Current algorithm cannot rely on String implementation details which are
>> not defined at compile-time (such as String's internal fields).
>>
>> Invokedynamic details:
>> Two bootstrap methods are provided in order to allow big switches (C-style
>> arguments layout is due to inability to pass complicated arguments (such as
>> typed arrays) into the bootstap method):
>> 1) SwitchTableFactory:createStringSwitchTable(Lookup, String, MethodType,
>> Object[]) whose bootstrap arguments are the following:
>> <default branch id>, <number of String labels>, [<branch id>, <number of
>> String labels corresponsing to the branch>, [<string labels corresponsing
>> to the branch>]*]*
>> 2) SwitchTableFactory:altCreateStringSwitchTable(Lookup, String,
>> MethodType, Object[]) whose bootstrap arguments are the following:
>> <default branch id>, <number of pages (the following argument groups)>,
>> [<page>]*
>> Page is a String of the following pattern:
>> [<HEX branch id>\0<HEX number of String labels corresponsing to the
>> branch>\0[<HEX length of the following String label>\0<String label
>> corresponsing to the branch>]*]*
>> As you can see, it allows for multiple branches (each containing multiple
>> labels) to be described under one page. This is done in order to support
>> big switches which cannot fit into the first method variant.
>>
>> Example:
>> As an example of what could be done by the compiler using this feature,
>> consider the following switch expression:
>>
>> switch (string) {
>> case "foo": <branch 0>
>> case "bar": <branch 1>
>> case "baz": case "qux": <branch 2>
>> default: <branch -1>
>> }
>>
>> Generate bytecode would look like:
>>
>> invokedynamic
>> java/lang/invoke/SwitchTableFactory:createStringSwitchTable(..) {-1, 4,
>> 0, 1, "foo",
>> 1, 1, "bar",
>> 2, 1, "baz", "qux"
>> }
>> <good-old switch-table by code>
>>
>> or (this should happen for big switches)
>>
>> invokedynamic
>> java/lang/invoke/SwitchTableFactory:altCreateStringSwitchTable(..) {-1, 1,
>> "0\0001\0003\000foo1\0001\0003\000bar2\0002\0003\000baz3\000qux"
>> }
>> <good-old switch-table by code>
>>
>> (while the argument looks complicates it is (A) shoter (as each '\000' is
>> actually one char) and (B) linear (meaning it is easy to parse))
>>
>> Implementation details:
>> Current implementation is based on ASM-generation. Under the hood it
>> generates a class ) with a static String (int) method (passed to
>> ConstantCallSite) performing hash-code() switch + equals(Object) (as
>> currently done by javac) handling 0- and 1-branch cases specially (by using
>> a MethodHandle of a method returning a bound value (default branch ID) and
>> by generating an equals(Object) check respectively). The class is
>> anonymously defined via Unsafe as done by other JDK bootstrap methods.
>>
>> Features to be done (this were not done yet as I am not aware yet if this
>> feature will get positive feedback):
>> 1) Add the class and its indy-details into other classes to have it treated
>> specially (as done for LambdaMetaFactory and StringConcatFactory) [3].
>> 2) FIll the documentation of the SwitchTableFactory.
>> 3) Use SwitchTableFactory in javac (probably, as a preview feature) and
>> test its behaviour.
>>
>> Possible improvements:
>> 1) Bootstrap's generator may use its own hash-code algorithm (based on
>> direct access to String's content using special Lookup or Unsafe) which
>> would be more efficient than String#hashCode().
>> 2) Bootstrap's generator may depend on other String's properties, such as
>> compact-mode, length etc.
>> 2) Bootstrap's generator may use switchtable in case of hash-codes being
>> close (yet it seems to be too rare to handle).
>>
>> Subjects to discuss:
>> 1. Should the bootstrap method validate its arguments in order to report
>> irrational arguments (such as usage of duplicate String labels)?
>> 2. Is the current Boostrap-arguments layout fine or should be remade?
>> 3. Which optimizations should be added (and/or toggled by default) to the
>> generator?
>> 4. Should branch IDs be specified explicitly or not (using -1 for default
>> and 0 + i for other branches)?
>> 5. Should there be a way to pass MethodHandles to bootstrap methods to be
>> invoked instead of branch IDs being returned if a switch returns nothing
>> (and should there be a way to do it for stack-affecting switches?)?
>>
>> Possible extensions:
>> 1. As the name of the class suggests, it is a general switch-table factory
>> and it may be used in future for other types of switches. At least,
>> enum-switch *may* be done using it, but (in order to keep it one-switch
>> based) it would be much better to have it implemented via condy producing
>> int-array for ordinals (this would remove the need for extra synthetic
>> class currently being generated).
>> 2. Moreover, this approach may be used for boosted implementations of
>> pattern-matching magic (using the same approach as for the strings) with
>> the exception of the need to pass more complicated structures into the
>> bootstrap method (using condy or references to producing methods or more
>> complicated patterns).
>> 3. In addition to #2, a universal approach may be introduced to allow
>> faster switch by all types of values (which is primary useful for pattern
>> matching) such as passing pairs of int(T) and boolean(T) MethodHandles
>> performing hash-code computation and equality check respectively.
>>
>> I am ready to continue developing this feature (being ready to sign Oracle
>> Contributor Agreement) which includes:
>> - documentation
>> - improvements to the current methods
>> - addition of the class to other
>> - benchmarks and testing
>> - optimization
>> - compiler support for this switch-approach
>> - other features described above, such as a universal switch bootstrap
>> method (Possible extensions, #3)
>>
>> Notes:
>> This might have been discussed in the past (at least indy-based switches in
>> general), if there was a reason for this to not be implemented, please
>> refer me to it. I would be happy to work on this proposal to make this
>> feature (and other ones mentioned) part of OpenJDK.
>>
>> Thanks for your interest and support,
>> Peter
>>
>> Links:
>> [1] https://gist.github.com/JarvisCraft/53b6e5e4eb91419b6e852cf63e1c8a2f Patch
>> (dynamic mirror of the file appended)
>> [2]
>> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/jdk.compiler/share/classes/com/sun/tools/javac/comp/Lower.java#l3692
>> Current
>> javac String implementation root
>> [3]
>> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java
>> One
>> of the cases where JDK's Bootstrap-methods are handled specially
>
More information about the amber-dev
mailing list