indy-based string-switch proposal

forax at univ-mlv.fr forax at univ-mlv.fr
Mon Nov 4 23:59:30 UTC 2019


> De: "JARvis PROgrammer" <mrjarviscraft at gmail.com>
> À: "Remi Forax" <forax at univ-mlv.fr>, "amber-dev" <amber-dev at openjdk.java.net>
> Envoyé: Mardi 5 Novembre 2019 00:49:58
> Objet: Re: indy-based string-switch proposal

> I wonit say that the limitation is so strict.
> In the patch (from which this topic starts) I've used more complicated
> String-patterns (aka pages) in order to allow single String bootstrap argument
> contain description for multiple switch labels.

multiple labels can be handled by the tableswitch that follow the call to invokedynamic. 

Rémi 

> вт, 5 нояб. 2019 г. в 02:42, Remi Forax < [ mailto:forax at univ-mlv.fr |
> forax at univ-mlv.fr ] >:

>> I term of arguments to sent to the boostrap methods, you can only send the
>> strings (concatenated as one big string)
>> with the convention that you will return 0 for the first one, 1 for the next
>> one, etc.

>> Rémi

>> ----- Mail original -----
>>> De: "JARvis PROgrammer" < [ mailto:mrjarviscraft at gmail.com |
>> > mrjarviscraft at gmail.com ] >
>>> À: "amber-dev" < [ mailto:amber-dev at openjdk.java.net |
>> > amber-dev at openjdk.java.net ] >
>> > Envoyé: Lundi 4 Novembre 2019 20:15:17
>> > Objet: indy-based string-switch proposal

>> > Hi there!.

>> > I am here with the proposal (and a working implementation prototype [1]) of
>> > a invokedynamic-based string-switch which, while not breaking any
>> > backwards-compatibility and not requiring any cascade changes, allows
>> > improved behaviour for this JLS feature.

>> > Idea:
>> > Replace the lookupswitch instruction part (and all the following
>> > instructions corresponding to its labels) generated by javac for a
>> > String-switch (keeping the tableswitch part) with a single invokedynamic
>> > instruction using the new SwitchTableFactory.
>> > Patch contents:
>> > - SwitchTableFactory.java : home for bootstrap-methods replacing the old
>> > lookupswitch behaviour
>> > - SwitchTableFactoryException.java : exception thrown by bootstrap-methods
>> > of SwitchTableFactory
>> > - BaseTest.java : simple behavioral test of bootstrap-methods invoking them
>> > via invokestatic and testing results produced by MethodHandles of returned
>> > CallSites

>> > Reasoning:
>> > 1) Current approach (used by javac) is generating a complicated
>> > structure (whose number of opcodes grows ~linearly on number of branches).
>> > Yet this structure is not usually optimized (at current implementation even
>> > 1-branch corner-case is not treated specially [2]). Even if these
>> > optimizations were performed at compile-time they could have become useless
>> > (or even causing performance problems) in future versions which is against
>> > backwards-compatibility principle. Also, switches compiled for older
>> > versions whould not use newer features which could have allowed better
>> > performance.
>> > 2) As of current spec, String hash-code is one of the only hard-coded
>> > algorithms. This is mostly due to backwards-compatibility principle and was
>> > decided to not be changed (TL;DR and this proposal is not trying to do it).
>> > However, using indy-based approach would make the compiler independent from
>> > it thus allowing removal of String hash-code algorithm in the future.
>> > 3) Producing simpler bytecode would allow external tools analyzing it
>> > discover string-switches easier (this may also be said about JIT).
>> > 4) Current algorithm cannot rely on String implementation details which are
>> > not defined at compile-time (such as String's internal fields).

>> > Invokedynamic details:
>> > Two bootstrap methods are provided in order to allow big switches (C-style
>> > arguments layout is due to inability to pass complicated arguments (such as
>> > typed arrays) into the bootstap method):
>> > 1) SwitchTableFactory:createStringSwitchTable(Lookup, String, MethodType,
>> > Object[]) whose bootstrap arguments are the following:
>> > <default branch id>, <number of String labels>, [<branch id>, <number of
>> > String labels corresponsing to the branch>, [<string labels corresponsing
>> > to the branch>]*]*
>> > 2) SwitchTableFactory:altCreateStringSwitchTable(Lookup, String,
>> > MethodType, Object[]) whose bootstrap arguments are the following:
>> > <default branch id>, <number of pages (the following argument groups)>,
>> > [<page>]*
>> > Page is a String of the following pattern:
>> > [<HEX branch id>\0<HEX number of String labels corresponsing to the
>> > branch>\0[<HEX length of the following String label>\0<String label
>> > corresponsing to the branch>]*]*
>> > As you can see, it allows for multiple branches (each containing multiple
>> > labels) to be described under one page. This is done in order to support
>> > big switches which cannot fit into the first method variant.

>> > Example:
>> > As an example of what could be done by the compiler using this feature,
>> > consider the following switch expression:

>> > switch (string) {
>> > case "foo": <branch 0>
>> > case "bar": <branch 1>
>> > case "baz": case "qux": <branch 2>
>> > default: <branch -1>
>> > }

>> > Generate bytecode would look like:

>> > invokedynamic
>> > java/lang/invoke/SwitchTableFactory:createStringSwitchTable(..) {-1, 4,
>> > 0, 1, "foo",
>> > 1, 1, "bar",
>> > 2, 1, "baz", "qux"
>> > }
>> > <good-old switch-table by code>

>> > or (this should happen for big switches)

>> > invokedynamic
>> > java/lang/invoke/SwitchTableFactory:altCreateStringSwitchTable(..) {-1, 1,
>> > "0\0001\0003\000foo1\0001\0003\000bar2\0002\0003\000baz3\000qux"
>> > }
>> > <good-old switch-table by code>

>> > (while the argument looks complicates it is (A) shoter (as each '\000' is
>> > actually one char) and (B) linear (meaning it is easy to parse))

>> > Implementation details:
>> > Current implementation is based on ASM-generation. Under the hood it
>> > generates a class ) with a static String (int) method (passed to
>> > ConstantCallSite) performing hash-code() switch + equals(Object) (as
>> > currently done by javac) handling 0- and 1-branch cases specially (by using
>> > a MethodHandle of a method returning a bound value (default branch ID) and
>> > by generating an equals(Object) check respectively). The class is
>> > anonymously defined via Unsafe as done by other JDK bootstrap methods.

>> > Features to be done (this were not done yet as I am not aware yet if this
>> > feature will get positive feedback):
>> > 1) Add the class and its indy-details into other classes to have it treated
>> > specially (as done for LambdaMetaFactory and StringConcatFactory) [3].
>> > 2) FIll the documentation of the SwitchTableFactory.
>> > 3) Use SwitchTableFactory in javac (probably, as a preview feature) and
>> > test its behaviour.

>> > Possible improvements:
>> > 1) Bootstrap's generator may use its own hash-code algorithm (based on
>> > direct access to String's content using special Lookup or Unsafe) which
>> > would be more efficient than String#hashCode().
>> > 2) Bootstrap's generator may depend on other String's properties, such as
>> > compact-mode, length etc.
>> > 2) Bootstrap's generator may use switchtable in case of hash-codes being
>> > close (yet it seems to be too rare to handle).

>> > Subjects to discuss:
>> > 1. Should the bootstrap method validate its arguments in order to report
>> > irrational arguments (such as usage of duplicate String labels)?
>> > 2. Is the current Boostrap-arguments layout fine or should be remade?
>> > 3. Which optimizations should be added (and/or toggled by default) to the
>> > generator?
>> > 4. Should branch IDs be specified explicitly or not (using -1 for default
>> > and 0 + i for other branches)?
>> > 5. Should there be a way to pass MethodHandles to bootstrap methods to be
>> > invoked instead of branch IDs being returned if a switch returns nothing
>> > (and should there be a way to do it for stack-affecting switches?)?

>> > Possible extensions:
>> > 1. As the name of the class suggests, it is a general switch-table factory
>> > and it may be used in future for other types of switches. At least,
>> > enum-switch *may* be done using it, but (in order to keep it one-switch
>> > based) it would be much better to have it implemented via condy producing
>> > int-array for ordinals (this would remove the need for extra synthetic
>> > class currently being generated).
>> > 2. Moreover, this approach may be used for boosted implementations of
>> > pattern-matching magic (using the same approach as for the strings) with
>> > the exception of the need to pass more complicated structures into the
>> > bootstrap method (using condy or references to producing methods or more
>> > complicated patterns).
>> > 3. In addition to #2, a universal approach may be introduced to allow
>> > faster switch by all types of values (which is primary useful for pattern
>> > matching) such as passing pairs of int(T) and boolean(T) MethodHandles
>> > performing hash-code computation and equality check respectively.

>> > I am ready to continue developing this feature (being ready to sign Oracle
>> > Contributor Agreement) which includes:
>> > - documentation
>> > - improvements to the current methods
>> > - addition of the class to other
>> > - benchmarks and testing
>> > - optimization
>> > - compiler support for this switch-approach
>> > - other features described above, such as a universal switch bootstrap
>> > method (Possible extensions, #3)

>> > Notes:
>> > This might have been discussed in the past (at least indy-based switches in
>> > general), if there was a reason for this to not be implemented, please
>> > refer me to it. I would be happy to work on this proposal to make this
>> > feature (and other ones mentioned) part of OpenJDK.

>> > Thanks for your interest and support,
>> > Peter

>> > Links:
>>> [1] [ https://gist.github.com/JarvisCraft/53b6e5e4eb91419b6e852cf63e1c8a2f |
>> > https://gist.github.com/JarvisCraft/53b6e5e4eb91419b6e852cf63e1c8a2f ] Patch
>> > (dynamic mirror of the file appended)
>> > [2]
>>> [
>>> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/jdk.compiler/share/classes/com/sun/tools/javac/comp/Lower.java#l3692
>>> |
>>> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/jdk.compiler/share/classes/com/sun/tools/javac/comp/Lower.java#l3692
>> > ]
>> > Current
>> > javac String implementation root
>> > [3]
>>> [
>>> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java
>>> |
>>> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java
>> > ]
>> > One
>> > of the cases where JDK's Bootstrap-methods are handled specially


More information about the amber-dev mailing list