indy-based string-switch proposal

forax at univ-mlv.fr forax at univ-mlv.fr
Tue Nov 5 09:01:53 UTC 2019


----- Mail original -----
> De: "Brian Goetz" <brian.goetz at oracle.com>
> À: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "JARvis PROgrammer" <mrjarviscraft at gmail.com>, "amber-dev" <amber-dev at openjdk.java.net>
> Envoyé: Mardi 5 Novembre 2019 09:26:54
> Objet: Re: indy-based string-switch proposal

> We need these combinators, but it’s unlikely we can use them to really translate
> switch in the general case (think mutable locals and no local breaks).

yes,
i fully agree.

> 
> Sent from my MacBook Wheel

Rémi

> 
>> On Nov 5, 2019, at 12:39 AM, Remi Forax <forax at univ-mlv.fr> wrote:
>> 
>> ----- Mail original -----
>>> De: "Brian Goetz" <brian.goetz at oracle.com>
>>> À: "JARvis PROgrammer" <mrjarviscraft at gmail.com>
>>> Cc: "amber-dev" <amber-dev at openjdk.java.net>
>>> Envoyé: Lundi 4 Novembre 2019 21:24:54
>>> Objet: Re: indy-based string-switch proposal
>> 
>>> Indeed, such a translation improvement is on the roadmap — in fact we want to
>>> might all switches other than those over ints to use Indy-based classification.
>>> When we extend switch to support patterns, it will force us to revamp switch
>>> translation, and this sort of stuff is better done in library code than in
>>> generated bytecode.
>> 
>> We will still need one or two method handle combinators to emulate the table
>> switch and the lookup switch.
>> So the generated code should not use directly but indirectly those new
>> combinators may generate bytecodes.
>> 
>>> 
>>> There is a prototype of the bootstraps for string, long, and others, with some
>>> test cases, checked into an amber branch — let me dig this up.
>>> 
>>> As you discovered, there is an Indy pothole regarding long arg lists, but is
>>> slated to be fixed (or may already have been).
>>> 
>>> This is a good direction - and something for which we’d be glad for the help (as
>>> you’ll see, my initial patch has been moldering for a while).
>> 
>> Rémi
>> 
>>> 
>>> Sent from my iPad
>>> 
>>>>> On Nov 4, 2019, at 8:15 PM, Japris Pogrammer <mrjarviscraft at gmail.com> wrote:
>>>> 
>>>> Hi there!.
>>>> 
>>>> I am here with the proposal (and a working implementation prototype [1]) of
>>>> a invokedynamic-based string-switch which, while not breaking any
>>>> backwards-compatibility and not requiring any cascade changes, allows
>>>> improved behaviour for this JLS feature.
>>>> 
>>>> Idea:
>>>> Replace the lookupswitch instruction part (and all the following
>>>> instructions corresponding to its labels) generated by javac for a
>>>> String-switch (keeping the tableswitch part) with a single invokedynamic
>>>> instruction using the new SwitchTableFactory.
>>>> Patch contents:
>>>> - SwitchTableFactory.java : home for bootstrap-methods replacing the old
>>>> lookupswitch behaviour
>>>> - SwitchTableFactoryException.java : exception thrown by bootstrap-methods
>>>> of SwitchTableFactory
>>>> - BaseTest.java : simple behavioral test of bootstrap-methods invoking them
>>>> via invokestatic and testing results produced by MethodHandles of returned
>>>> CallSites
>>>> 
>>>> Reasoning:
>>>> 1) Current approach (used by javac) is generating a complicated
>>>> structure (whose number of opcodes grows ~linearly on number of branches).
>>>> Yet this structure is not usually optimized (at current implementation even
>>>> 1-branch corner-case is not treated specially [2]). Even if these
>>>> optimizations were performed at compile-time they could have become useless
>>>> (or even causing performance problems) in future versions which is against
>>>> backwards-compatibility principle. Also, switches compiled for older
>>>> versions whould not use newer features which could have allowed better
>>>> performance.
>>>> 2) As of current spec, String hash-code is one of the only hard-coded
>>>> algorithms. This is mostly due to backwards-compatibility principle and was
>>>> decided to not be changed (TL;DR and this proposal is not trying to do it).
>>>> However, using indy-based approach would make the compiler independent from
>>>> it thus allowing removal of String hash-code algorithm in the future.
>>>> 3) Producing simpler bytecode would allow external tools analyzing it
>>>> discover string-switches easier (this may also be said about JIT).
>>>> 4) Current algorithm cannot rely on String implementation details which are
>>>> not defined at compile-time (such as String's internal fields).
>>>> 
>>>> Invokedynamic details:
>>>> Two bootstrap methods are provided in order to allow big switches (C-style
>>>> arguments layout is due to inability to pass complicated arguments (such as
>>>> typed arrays) into the bootstap method):
>>>> 1) SwitchTableFactory:createStringSwitchTable(Lookup, String, MethodType,
>>>> Object[]) whose bootstrap arguments are the following:
>>>> <default branch id>, <number of String labels>, [<branch id>, <number of
>>>> String labels corresponsing to the branch>, [<string labels corresponsing
>>>> to the branch>]*]*
>>>> 2) SwitchTableFactory:altCreateStringSwitchTable(Lookup, String,
>>>> MethodType, Object[]) whose bootstrap arguments are the following:
>>>> <default branch id>, <number of pages (the following argument groups)>,
>>>> [<page>]*
>>>> Page is a String of the following pattern:
>>>> [<HEX branch id>\0<HEX number of String labels corresponsing to the
>>>> branch>\0[<HEX length of the following String label>\0<String label
>>>> corresponsing to the branch>]*]*
>>>> As you can see, it allows for multiple branches (each containing multiple
>>>> labels) to be described under one page. This is done in order to support
>>>> big switches which cannot fit into the first method variant.
>>>> 
>>>> Example:
>>>> As an example of what could be done by the compiler using this feature,
>>>> consider the following switch expression:
>>>> 
>>>> switch (string) {
>>>>   case "foo": <branch 0>
>>>>   case "bar": <branch 1>
>>>>   case "baz": case "qux": <branch 2>
>>>>   default: <branch -1>
>>>> }
>>>> 
>>>> Generate bytecode would look like:
>>>> 
>>>> invokedynamic
>>>> java/lang/invoke/SwitchTableFactory:createStringSwitchTable(..) {-1, 4,
>>>>       0, 1, "foo",
>>>>       1, 1, "bar",
>>>>       2, 1, "baz", "qux"
>>>> }
>>>> <good-old switch-table by code>
>>>> 
>>>> or (this should happen for big switches)
>>>> 
>>>> invokedynamic
>>>> java/lang/invoke/SwitchTableFactory:altCreateStringSwitchTable(..) {-1, 1,
>>>>       "0\0001\0003\000foo1\0001\0003\000bar2\0002\0003\000baz3\000qux"
>>>> }
>>>> <good-old switch-table by code>
>>>> 
>>>> (while the argument looks complicates it is (A) shoter (as each '\000' is
>>>> actually one char) and (B) linear (meaning it is easy to parse))
>>>> 
>>>> Implementation details:
>>>> Current implementation is based on ASM-generation. Under the hood it
>>>> generates a class ) with a static String (int) method (passed to
>>>> ConstantCallSite) performing hash-code() switch + equals(Object) (as
>>>> currently done by javac) handling 0- and 1-branch cases specially (by using
>>>> a MethodHandle of a method returning a bound value (default branch ID) and
>>>> by generating an equals(Object) check respectively). The class is
>>>> anonymously defined via Unsafe as done by other JDK bootstrap methods.
>>>> 
>>>> Features to be done (this were not done yet as I am not aware yet if this
>>>> feature will get positive feedback):
>>>> 1) Add the class and its indy-details into other classes to have it treated
>>>> specially (as done for LambdaMetaFactory and StringConcatFactory) [3].
>>>> 2) FIll the documentation of the SwitchTableFactory.
>>>> 3) Use SwitchTableFactory in javac (probably, as a preview feature) and
>>>> test its behaviour.
>>>> 
>>>> Possible improvements:
>>>> 1) Bootstrap's generator may use its own hash-code algorithm (based on
>>>> direct access to String's content using special Lookup or Unsafe) which
>>>> would be more efficient than String#hashCode().
>>>> 2) Bootstrap's generator may depend on other String's properties, such as
>>>> compact-mode, length etc.
>>>> 2) Bootstrap's generator may use switchtable in case of hash-codes being
>>>> close (yet it seems to be too rare to handle).
>>>> 
>>>> Subjects to discuss:
>>>> 1. Should the bootstrap method validate its arguments in order to report
>>>> irrational arguments (such as usage of duplicate String labels)?
>>>> 2. Is the current Boostrap-arguments layout fine or should be remade?
>>>> 3. Which optimizations should be added (and/or toggled by default) to the
>>>> generator?
>>>> 4. Should branch IDs be specified explicitly or not (using -1 for default
>>>> and 0 + i for other branches)?
>>>> 5. Should there be a way to pass MethodHandles to bootstrap methods to be
>>>> invoked instead of branch IDs being returned if a switch returns nothing
>>>> (and should there be a way to do it for stack-affecting switches?)?
>>>> 
>>>> Possible extensions:
>>>> 1. As the name of the class suggests, it is a general switch-table factory
>>>> and it may be used in future for other types of switches. At least,
>>>> enum-switch *may* be done using it, but (in order to keep it one-switch
>>>> based) it would be much better to have it implemented via condy producing
>>>> int-array for ordinals (this would remove the need for extra synthetic
>>>> class currently being generated).
>>>> 2. Moreover, this approach may be used for boosted implementations of
>>>> pattern-matching magic (using the same approach as for the strings) with
>>>> the exception of the need to pass more complicated structures into the
>>>> bootstrap method (using condy or references to producing methods or more
>>>> complicated patterns).
>>>> 3. In addition to #2, a universal approach may be introduced to allow
>>>> faster switch by all types of values (which is primary useful for pattern
>>>> matching) such as passing pairs of int(T) and boolean(T) MethodHandles
>>>> performing hash-code computation and equality check respectively.
>>>> 
>>>> I am ready to continue developing this feature (being ready to sign Oracle
>>>> Contributor Agreement) which includes:
>>>> - documentation
>>>> - improvements to the current methods
>>>> - addition of the class to other
>>>> - benchmarks and testing
>>>> - optimization
>>>> - compiler support for this switch-approach
>>>> - other features described above, such as a universal switch bootstrap
>>>> method (Possible extensions, #3)
>>>> 
>>>> Notes:
>>>> This might have been discussed in the past (at least indy-based switches in
>>>> general), if there was a reason for this to not be implemented, please
>>>> refer me to it. I would be happy to work on this proposal to make this
>>>> feature (and other ones mentioned) part of OpenJDK.
>>>> 
>>>> Thanks for your interest and support,
>>>> Peter
>>>> 
>>>> Links:
>>>> [1] https://gist.github.com/JarvisCraft/53b6e5e4eb91419b6e852cf63e1c8a2f Patch
>>>> (dynamic mirror of the file appended)
>>>> [2]
>>>> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/jdk.compiler/share/classes/com/sun/tools/javac/comp/Lower.java#l3692
>>>> Current
>>>> javac String implementation root
>>>> [3]
>>>> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java
>>>> One
> >>> of the cases where JDK's Bootstrap-methods are handled specially


More information about the amber-dev mailing list