indy-based string-switch proposal

Japris Pogrammer mrjarviscraft at gmail.com
Tue Nov 5 10:50:56 UTC 2019


I can start by benchmarking ASM and Binary-switch based approaches on
Strings

вт, 5 нояб. 2019 г., 12:06 Brian Goetz <brian.goetz at oracle.com>:

> So, a good next step is to put in place some measurement of switch cost
> using different translations.  We can start by comparing old and new, and
> then can experiment with alternate bootstrap implementations (such as other
> has codes).
>
> Note that we want to use this technique for all switches except those over
> int and smaller.  This includes not only string and enum, but long and
> float, and patterns.
>
> Sent from my MacBook Wheel
>
> > On Nov 5, 2019, at 1:18 AM, Japris Pogrammer <mrjarviscraft at gmail.com>
> wrote:
> >
> > My idea arond it was:
> > - Compiler:
> >  - Counts the number of branches
> >  - Associates branches with some close IDs
> >  - Associates branch values with IDs
> >  After all we get something like <T : label --> int : branch ID> (branch
> > IDs by labels) + int : default ID
> >  - Generated invokedynamic to accept value and return the corresponding
> ID
> >  - Generates lookuptable to go to branches by IDs
> > - Bootstap method at runtime:
> >  - Generate maximally effective implementation of <label value to branch
> > ID> conversion performing all possible runtime optimizations (e.g. access
> > to string's internal data or alternative hash-code strategy for String
> > switch)
> >
> > вт, 5 нояб. 2019 г. в 03:05, Japris Pogrammer <mrjarviscraft at gmail.com>:
> >
> >> But is not it breaking the whole idea of indy-fication of the switches?
> Is
> >> not it better to let the compiler generate minimal code for these
> >> delegating all job to the bootstrap? Again, by doing so we allow future
> >> implementations optimize this behavour without changing the class.
> >> For example, CallSite may omit unneeded checks based on these which it
> has
> >> already performed.
> >>
> >> вт, 5 нояб. 2019 г. в 02:59, <forax at univ-mlv.fr>:
> >>
> >>>
> >>>
> >>> ------------------------------
> >>>
> >>> *De: *"JARvis PROgrammer" <mrjarviscraft at gmail.com>
> >>> *À: *"Remi Forax" <forax at univ-mlv.fr>, "amber-dev" <
> >>> amber-dev at openjdk.java.net>
> >>> *Envoyé: *Mardi 5 Novembre 2019 00:49:58
> >>> *Objet: *Re: indy-based string-switch proposal
> >>>
> >>> I wonit say that the limitation is so strict.
> >>> In the patch (from which this topic starts) I've used more complicated
> >>> String-patterns (aka pages) in order to allow single String bootstrap
> >>> argument contain description for multiple switch labels.
> >>>
> >>>
> >>> multiple labels can be handled by the tableswitch that follow the call
> to
> >>> invokedynamic.
> >>>
> >>> Rémi
> >>>
> >>>
> >>>
> >>> вт, 5 нояб. 2019 г. в 02:42, Remi Forax <forax at univ-mlv.fr>:
> >>>
> >>>> I term of arguments to sent to the boostrap methods, you can only send
> >>>> the strings (concatenated as one big string)
> >>>> with the convention that you will return 0 for the first one, 1 for
> the
> >>>> next one, etc.
> >>>>
> >>>> Rémi
> >>>>
> >>>> ----- Mail original -----
> >>>>> De: "JARvis PROgrammer" <mrjarviscraft at gmail.com>
> >>>>> À: "amber-dev" <amber-dev at openjdk.java.net>
> >>>>> Envoyé: Lundi 4 Novembre 2019 20:15:17
> >>>>> Objet: indy-based string-switch proposal
> >>>>
> >>>>> Hi there!.
> >>>>>
> >>>>> I am here with the proposal (and a working implementation prototype
> >>>> [1]) of
> >>>>> a invokedynamic-based string-switch which, while not breaking any
> >>>>> backwards-compatibility and not requiring any cascade changes, allows
> >>>>> improved behaviour for this JLS feature.
> >>>>>
> >>>>> Idea:
> >>>>> Replace the lookupswitch instruction part (and all the following
> >>>>> instructions corresponding to its labels) generated by javac for a
> >>>>> String-switch (keeping the tableswitch part) with a single
> >>>> invokedynamic
> >>>>> instruction using the new SwitchTableFactory.
> >>>>> Patch contents:
> >>>>> - SwitchTableFactory.java : home for bootstrap-methods replacing the
> >>>> old
> >>>>> lookupswitch behaviour
> >>>>> - SwitchTableFactoryException.java : exception thrown by
> >>>> bootstrap-methods
> >>>>> of SwitchTableFactory
> >>>>> - BaseTest.java : simple behavioral test of bootstrap-methods
> invoking
> >>>> them
> >>>>> via invokestatic and testing results produced by MethodHandles of
> >>>> returned
> >>>>> CallSites
> >>>>>
> >>>>> Reasoning:
> >>>>> 1) Current approach (used by javac) is generating a complicated
> >>>>> structure (whose number of opcodes grows ~linearly on number of
> >>>> branches).
> >>>>> Yet this structure is not usually optimized (at current
> implementation
> >>>> even
> >>>>> 1-branch corner-case is not treated specially [2]). Even if these
> >>>>> optimizations were performed at compile-time they could have become
> >>>> useless
> >>>>> (or even causing performance problems) in future versions which is
> >>>> against
> >>>>> backwards-compatibility principle. Also, switches compiled for older
> >>>>> versions whould not use newer features which could have allowed
> better
> >>>>> performance.
> >>>>> 2) As of current spec, String hash-code is one of the only hard-coded
> >>>>> algorithms. This is mostly due to backwards-compatibility principle
> >>>> and was
> >>>>> decided to not be changed (TL;DR and this proposal is not trying to
> do
> >>>> it).
> >>>>> However, using indy-based approach would make the compiler
> independent
> >>>> from
> >>>>> it thus allowing removal of String hash-code algorithm in the future.
> >>>>> 3) Producing simpler bytecode would allow external tools analyzing it
> >>>>> discover string-switches easier (this may also be said about JIT).
> >>>>> 4) Current algorithm cannot rely on String implementation details
> >>>> which are
> >>>>> not defined at compile-time (such as String's internal fields).
> >>>>>
> >>>>> Invokedynamic details:
> >>>>> Two bootstrap methods are provided in order to allow big switches
> >>>> (C-style
> >>>>> arguments layout is due to inability to pass complicated arguments
> >>>> (such as
> >>>>> typed arrays) into the bootstap method):
> >>>>> 1) SwitchTableFactory:createStringSwitchTable(Lookup, String,
> >>>> MethodType,
> >>>>> Object[]) whose bootstrap arguments are the following:
> >>>>> <default branch id>, <number of String labels>, [<branch id>, <number
> >>>> of
> >>>>> String labels corresponsing to the branch>, [<string labels
> >>>> corresponsing
> >>>>> to the branch>]*]*
> >>>>> 2) SwitchTableFactory:altCreateStringSwitchTable(Lookup, String,
> >>>>> MethodType, Object[]) whose bootstrap arguments are the following:
> >>>>> <default branch id>, <number of pages (the following argument
> groups)>,
> >>>>> [<page>]*
> >>>>> Page is a String of the following pattern:
> >>>>> [<HEX branch id>\0<HEX number of String labels corresponsing to the
> >>>>> branch>\0[<HEX length of the following String label>\0<String label
> >>>>> corresponsing to the branch>]*]*
> >>>>> As you can see, it allows for multiple branches (each containing
> >>>> multiple
> >>>>> labels) to be described under one page. This is done in order to
> >>>> support
> >>>>> big switches which cannot fit into the first method variant.
> >>>>>
> >>>>> Example:
> >>>>> As an example of what could be done by the compiler using this
> feature,
> >>>>> consider the following switch expression:
> >>>>>
> >>>>> switch (string) {
> >>>>>   case "foo": <branch 0>
> >>>>>   case "bar": <branch 1>
> >>>>>   case "baz": case "qux": <branch 2>
> >>>>>   default: <branch -1>
> >>>>> }
> >>>>>
> >>>>> Generate bytecode would look like:
> >>>>>
> >>>>> invokedynamic
> >>>>> java/lang/invoke/SwitchTableFactory:createStringSwitchTable(..) {-1,
> 4,
> >>>>>       0, 1, "foo",
> >>>>>       1, 1, "bar",
> >>>>>       2, 1, "baz", "qux"
> >>>>> }
> >>>>> <good-old switch-table by code>
> >>>>>
> >>>>> or (this should happen for big switches)
> >>>>>
> >>>>> invokedynamic
> >>>>> java/lang/invoke/SwitchTableFactory:altCreateStringSwitchTable(..)
> >>>> {-1, 1,
> >>>>>
>  "0\0001\0003\000foo1\0001\0003\000bar2\0002\0003\000baz3\000qux"
> >>>>> }
> >>>>> <good-old switch-table by code>
> >>>>>
> >>>>> (while the argument looks complicates it is (A) shoter (as each
> '\000'
> >>>> is
> >>>>> actually one char) and (B) linear (meaning it is easy to parse))
> >>>>>
> >>>>> Implementation details:
> >>>>> Current implementation is based on ASM-generation. Under the hood it
> >>>>> generates a class ) with a static String (int) method (passed to
> >>>>> ConstantCallSite) performing hash-code() switch + equals(Object) (as
> >>>>> currently done by javac) handling 0- and 1-branch cases specially (by
> >>>> using
> >>>>> a MethodHandle of a method returning a bound value (default branch
> ID)
> >>>> and
> >>>>> by generating an equals(Object) check respectively). The class is
> >>>>> anonymously defined via Unsafe as done by other JDK bootstrap
> methods.
> >>>>>
> >>>>> Features to be done (this were not done yet as I am not aware yet if
> >>>> this
> >>>>> feature will get positive feedback):
> >>>>> 1) Add the class and its indy-details into other classes to have it
> >>>> treated
> >>>>> specially (as done for LambdaMetaFactory and StringConcatFactory)
> [3].
> >>>>> 2) FIll the documentation of the SwitchTableFactory.
> >>>>> 3) Use SwitchTableFactory in javac (probably, as a preview feature)
> and
> >>>>> test its behaviour.
> >>>>>
> >>>>> Possible improvements:
> >>>>> 1) Bootstrap's generator may use its own hash-code algorithm (based
> on
> >>>>> direct access to String's content using special Lookup or Unsafe)
> which
> >>>>> would be more efficient than String#hashCode().
> >>>>> 2) Bootstrap's generator may depend on other String's properties,
> such
> >>>> as
> >>>>> compact-mode, length etc.
> >>>>> 2) Bootstrap's generator may use switchtable in case of hash-codes
> >>>> being
> >>>>> close (yet it seems to be too rare to handle).
> >>>>>
> >>>>> Subjects to discuss:
> >>>>> 1. Should the bootstrap method validate its arguments in order to
> >>>> report
> >>>>> irrational arguments (such as usage of duplicate String labels)?
> >>>>> 2. Is the current Boostrap-arguments layout fine or should be remade?
> >>>>> 3. Which optimizations should be added (and/or toggled by default) to
> >>>> the
> >>>>> generator?
> >>>>> 4. Should branch IDs be specified explicitly or not (using -1 for
> >>>> default
> >>>>> and 0 + i for other branches)?
> >>>>> 5. Should there be a way to pass MethodHandles to bootstrap methods
> to
> >>>> be
> >>>>> invoked instead of branch IDs being returned if a switch returns
> >>>> nothing
> >>>>> (and should there be a way to do it for stack-affecting switches?)?
> >>>>>
> >>>>> Possible extensions:
> >>>>> 1. As the name of the class suggests, it is a general switch-table
> >>>> factory
> >>>>> and it may be used in future for other types of switches. At least,
> >>>>> enum-switch *may* be done using it, but (in order to keep it
> one-switch
> >>>>> based) it would be much better to have it implemented via condy
> >>>> producing
> >>>>> int-array for ordinals (this would remove the need for extra
> synthetic
> >>>>> class currently being generated).
> >>>>> 2. Moreover, this approach may be used for boosted implementations of
> >>>>> pattern-matching magic (using the same approach as for the strings)
> >>>> with
> >>>>> the exception of the need to pass more complicated structures into
> the
> >>>>> bootstrap method (using condy or references to producing methods or
> >>>> more
> >>>>> complicated patterns).
> >>>>> 3. In addition to #2, a universal approach may be introduced to allow
> >>>>> faster switch by all types of values (which is primary useful for
> >>>> pattern
> >>>>> matching) such as passing pairs of int(T) and boolean(T)
> MethodHandles
> >>>>> performing hash-code computation and equality check respectively.
> >>>>>
> >>>>> I am ready to continue developing this feature (being ready to sign
> >>>> Oracle
> >>>>> Contributor Agreement) which includes:
> >>>>> - documentation
> >>>>> - improvements to the current methods
> >>>>> - addition of the class to other
> >>>>> - benchmarks and testing
> >>>>> - optimization
> >>>>> - compiler support for this switch-approach
> >>>>> - other features described above, such as a universal switch
> bootstrap
> >>>>> method (Possible extensions, #3)
> >>>>>
> >>>>> Notes:
> >>>>> This might have been discussed in the past (at least indy-based
> >>>> switches in
> >>>>> general), if there was a reason for this to not be implemented,
> please
> >>>>> refer me to it. I would be happy to work on this proposal to make
> this
> >>>>> feature (and other ones mentioned) part of OpenJDK.
> >>>>>
> >>>>> Thanks for your interest and support,
> >>>>> Peter
> >>>>>
> >>>>> Links:
> >>>>> [1]
> >>>> https://gist.github.com/JarvisCraft/53b6e5e4eb91419b6e852cf63e1c8a2f
> >>>> Patch
> >>>>> (dynamic mirror of the file appended)
> >>>>> [2]
> >>>>>
> >>>>
> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/jdk.compiler/share/classes/com/sun/tools/javac/comp/Lower.java#l3692
> >>>>> Current
> >>>>> javac String implementation root
> >>>>> [3]
> >>>>>
> >>>>
> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java
> >>>>> One
> >>>>> of the cases where JDK's Bootstrap-methods are handled specially
> >>>>
> >>>
> >>>
>
>


More information about the amber-dev mailing list