indy-based string-switch proposal
Japris Pogrammer
mrjarviscraft at gmail.com
Tue Nov 5 18:33:35 UTC 2019
To start from, I'd actually like to see how the approach used in the
original patch (full K>V specification in bootstrap arguments and/or
ASM-generation) behaves as part of this class. This would allow
benchmarking it now and/or using it as a base for further experiments with
*freedoms* given to the compiler and runtime.
вт, 5 нояб. 2019 г., 21:29 John Rose <john.r.rose at oracle.com>:
> +100
>
> > On Nov 5, 2019, at 12:30 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
> >
> > The general observation that all switches can be combined into an Indy
> based classifier and a densely numbered int switch is consistent with the
> direction we were heading in. Any weird control flow can be well handled
> by the int switch. So you are definitely in the right track.
> >
> >
> > Sent from my MacBook Wheel
> >
> >> On Nov 5, 2019, at 1:18 AM, Japris Pogrammer <mrjarviscraft at gmail.com>
> wrote:
> >>
> >> My idea arond it was:
> >> - Compiler:
> >> - Counts the number of branches
> >> - Associates branches with some close IDs
> >> - Associates branch values with IDs
> >> After all we get something like <T : label --> int : branch ID> (branch
> >> IDs by labels) + int : default ID
> >> - Generated invokedynamic to accept value and return the corresponding
> ID
> >> - Generates lookuptable to go to branches by IDs
> >> - Bootstap method at runtime:
> >> - Generate maximally effective implementation of <label value to branch
> >> ID> conversion performing all possible runtime optimizations (e.g.
> access
> >> to string's internal data or alternative hash-code strategy for String
> >> switch)
> >>
> >> вт, 5 нояб. 2019 г. в 03:05, Japris Pogrammer <mrjarviscraft at gmail.com
> >:
> >>
> >>> But is not it breaking the whole idea of indy-fication of the
> switches? Is
> >>> not it better to let the compiler generate minimal code for these
> >>> delegating all job to the bootstrap? Again, by doing so we allow future
> >>> implementations optimize this behavour without changing the class.
> >>> For example, CallSite may omit unneeded checks based on these which it
> has
> >>> already performed.
> >>>
> >>> вт, 5 нояб. 2019 г. в 02:59, <forax at univ-mlv.fr>:
> >>>
> >>>>
> >>>>
> >>>> ------------------------------
> >>>>
> >>>> *De: *"JARvis PROgrammer" <mrjarviscraft at gmail.com>
> >>>> *À: *"Remi Forax" <forax at univ-mlv.fr>, "amber-dev" <
> >>>> amber-dev at openjdk.java.net>
> >>>> *Envoyé: *Mardi 5 Novembre 2019 00:49:58
> >>>> *Objet: *Re: indy-based string-switch proposal
> >>>>
> >>>> I wonit say that the limitation is so strict.
> >>>> In the patch (from which this topic starts) I've used more complicated
> >>>> String-patterns (aka pages) in order to allow single String bootstrap
> >>>> argument contain description for multiple switch labels.
> >>>>
> >>>>
> >>>> multiple labels can be handled by the tableswitch that follow the
> call to
> >>>> invokedynamic.
> >>>>
> >>>> Rémi
> >>>>
> >>>>
> >>>>
> >>>> вт, 5 нояб. 2019 г. в 02:42, Remi Forax <forax at univ-mlv.fr>:
> >>>>
> >>>>> I term of arguments to sent to the boostrap methods, you can only
> send
> >>>>> the strings (concatenated as one big string)
> >>>>> with the convention that you will return 0 for the first one, 1 for
> the
> >>>>> next one, etc.
> >>>>>
> >>>>> Rémi
> >>>>>
> >>>>> ----- Mail original -----
> >>>>>> De: "JARvis PROgrammer" <mrjarviscraft at gmail.com>
> >>>>>> À: "amber-dev" <amber-dev at openjdk.java.net>
> >>>>>> Envoyé: Lundi 4 Novembre 2019 20:15:17
> >>>>>> Objet: indy-based string-switch proposal
> >>>>>
> >>>>>> Hi there!.
> >>>>>>
> >>>>>> I am here with the proposal (and a working implementation prototype
> >>>>> [1]) of
> >>>>>> a invokedynamic-based string-switch which, while not breaking any
> >>>>>> backwards-compatibility and not requiring any cascade changes,
> allows
> >>>>>> improved behaviour for this JLS feature.
> >>>>>>
> >>>>>> Idea:
> >>>>>> Replace the lookupswitch instruction part (and all the following
> >>>>>> instructions corresponding to its labels) generated by javac for a
> >>>>>> String-switch (keeping the tableswitch part) with a single
> >>>>> invokedynamic
> >>>>>> instruction using the new SwitchTableFactory.
> >>>>>> Patch contents:
> >>>>>> - SwitchTableFactory.java : home for bootstrap-methods replacing the
> >>>>> old
> >>>>>> lookupswitch behaviour
> >>>>>> - SwitchTableFactoryException.java : exception thrown by
> >>>>> bootstrap-methods
> >>>>>> of SwitchTableFactory
> >>>>>> - BaseTest.java : simple behavioral test of bootstrap-methods
> invoking
> >>>>> them
> >>>>>> via invokestatic and testing results produced by MethodHandles of
> >>>>> returned
> >>>>>> CallSites
> >>>>>>
> >>>>>> Reasoning:
> >>>>>> 1) Current approach (used by javac) is generating a complicated
> >>>>>> structure (whose number of opcodes grows ~linearly on number of
> >>>>> branches).
> >>>>>> Yet this structure is not usually optimized (at current
> implementation
> >>>>> even
> >>>>>> 1-branch corner-case is not treated specially [2]). Even if these
> >>>>>> optimizations were performed at compile-time they could have become
> >>>>> useless
> >>>>>> (or even causing performance problems) in future versions which is
> >>>>> against
> >>>>>> backwards-compatibility principle. Also, switches compiled for older
> >>>>>> versions whould not use newer features which could have allowed
> better
> >>>>>> performance.
> >>>>>> 2) As of current spec, String hash-code is one of the only
> hard-coded
> >>>>>> algorithms. This is mostly due to backwards-compatibility principle
> >>>>> and was
> >>>>>> decided to not be changed (TL;DR and this proposal is not trying to
> do
> >>>>> it).
> >>>>>> However, using indy-based approach would make the compiler
> independent
> >>>>> from
> >>>>>> it thus allowing removal of String hash-code algorithm in the
> future.
> >>>>>> 3) Producing simpler bytecode would allow external tools analyzing
> it
> >>>>>> discover string-switches easier (this may also be said about JIT).
> >>>>>> 4) Current algorithm cannot rely on String implementation details
> >>>>> which are
> >>>>>> not defined at compile-time (such as String's internal fields).
> >>>>>>
> >>>>>> Invokedynamic details:
> >>>>>> Two bootstrap methods are provided in order to allow big switches
> >>>>> (C-style
> >>>>>> arguments layout is due to inability to pass complicated arguments
> >>>>> (such as
> >>>>>> typed arrays) into the bootstap method):
> >>>>>> 1) SwitchTableFactory:createStringSwitchTable(Lookup, String,
> >>>>> MethodType,
> >>>>>> Object[]) whose bootstrap arguments are the following:
> >>>>>> <default branch id>, <number of String labels>, [<branch id>,
> <number
> >>>>> of
> >>>>>> String labels corresponsing to the branch>, [<string labels
> >>>>> corresponsing
> >>>>>> to the branch>]*]*
> >>>>>> 2) SwitchTableFactory:altCreateStringSwitchTable(Lookup, String,
> >>>>>> MethodType, Object[]) whose bootstrap arguments are the following:
> >>>>>> <default branch id>, <number of pages (the following argument
> groups)>,
> >>>>>> [<page>]*
> >>>>>> Page is a String of the following pattern:
> >>>>>> [<HEX branch id>\0<HEX number of String labels corresponsing to the
> >>>>>> branch>\0[<HEX length of the following String label>\0<String label
> >>>>>> corresponsing to the branch>]*]*
> >>>>>> As you can see, it allows for multiple branches (each containing
> >>>>> multiple
> >>>>>> labels) to be described under one page. This is done in order to
> >>>>> support
> >>>>>> big switches which cannot fit into the first method variant.
> >>>>>>
> >>>>>> Example:
> >>>>>> As an example of what could be done by the compiler using this
> feature,
> >>>>>> consider the following switch expression:
> >>>>>>
> >>>>>> switch (string) {
> >>>>>> case "foo": <branch 0>
> >>>>>> case "bar": <branch 1>
> >>>>>> case "baz": case "qux": <branch 2>
> >>>>>> default: <branch -1>
> >>>>>> }
> >>>>>>
> >>>>>> Generate bytecode would look like:
> >>>>>>
> >>>>>> invokedynamic
> >>>>>> java/lang/invoke/SwitchTableFactory:createStringSwitchTable(..)
> {-1, 4,
> >>>>>> 0, 1, "foo",
> >>>>>> 1, 1, "bar",
> >>>>>> 2, 1, "baz", "qux"
> >>>>>> }
> >>>>>> <good-old switch-table by code>
> >>>>>>
> >>>>>> or (this should happen for big switches)
> >>>>>>
> >>>>>> invokedynamic
> >>>>>> java/lang/invoke/SwitchTableFactory:altCreateStringSwitchTable(..)
> >>>>> {-1, 1,
> >>>>>>
> "0\0001\0003\000foo1\0001\0003\000bar2\0002\0003\000baz3\000qux"
> >>>>>> }
> >>>>>> <good-old switch-table by code>
> >>>>>>
> >>>>>> (while the argument looks complicates it is (A) shoter (as each
> '\000'
> >>>>> is
> >>>>>> actually one char) and (B) linear (meaning it is easy to parse))
> >>>>>>
> >>>>>> Implementation details:
> >>>>>> Current implementation is based on ASM-generation. Under the hood it
> >>>>>> generates a class ) with a static String (int) method (passed to
> >>>>>> ConstantCallSite) performing hash-code() switch + equals(Object) (as
> >>>>>> currently done by javac) handling 0- and 1-branch cases specially
> (by
> >>>>> using
> >>>>>> a MethodHandle of a method returning a bound value (default branch
> ID)
> >>>>> and
> >>>>>> by generating an equals(Object) check respectively). The class is
> >>>>>> anonymously defined via Unsafe as done by other JDK bootstrap
> methods.
> >>>>>>
> >>>>>> Features to be done (this were not done yet as I am not aware yet if
> >>>>> this
> >>>>>> feature will get positive feedback):
> >>>>>> 1) Add the class and its indy-details into other classes to have it
> >>>>> treated
> >>>>>> specially (as done for LambdaMetaFactory and StringConcatFactory)
> [3].
> >>>>>> 2) FIll the documentation of the SwitchTableFactory.
> >>>>>> 3) Use SwitchTableFactory in javac (probably, as a preview feature)
> and
> >>>>>> test its behaviour.
> >>>>>>
> >>>>>> Possible improvements:
> >>>>>> 1) Bootstrap's generator may use its own hash-code algorithm (based
> on
> >>>>>> direct access to String's content using special Lookup or Unsafe)
> which
> >>>>>> would be more efficient than String#hashCode().
> >>>>>> 2) Bootstrap's generator may depend on other String's properties,
> such
> >>>>> as
> >>>>>> compact-mode, length etc.
> >>>>>> 2) Bootstrap's generator may use switchtable in case of hash-codes
> >>>>> being
> >>>>>> close (yet it seems to be too rare to handle).
> >>>>>>
> >>>>>> Subjects to discuss:
> >>>>>> 1. Should the bootstrap method validate its arguments in order to
> >>>>> report
> >>>>>> irrational arguments (such as usage of duplicate String labels)?
> >>>>>> 2. Is the current Boostrap-arguments layout fine or should be
> remade?
> >>>>>> 3. Which optimizations should be added (and/or toggled by default)
> to
> >>>>> the
> >>>>>> generator?
> >>>>>> 4. Should branch IDs be specified explicitly or not (using -1 for
> >>>>> default
> >>>>>> and 0 + i for other branches)?
> >>>>>> 5. Should there be a way to pass MethodHandles to bootstrap methods
> to
> >>>>> be
> >>>>>> invoked instead of branch IDs being returned if a switch returns
> >>>>> nothing
> >>>>>> (and should there be a way to do it for stack-affecting switches?)?
> >>>>>>
> >>>>>> Possible extensions:
> >>>>>> 1. As the name of the class suggests, it is a general switch-table
> >>>>> factory
> >>>>>> and it may be used in future for other types of switches. At least,
> >>>>>> enum-switch *may* be done using it, but (in order to keep it
> one-switch
> >>>>>> based) it would be much better to have it implemented via condy
> >>>>> producing
> >>>>>> int-array for ordinals (this would remove the need for extra
> synthetic
> >>>>>> class currently being generated).
> >>>>>> 2. Moreover, this approach may be used for boosted implementations
> of
> >>>>>> pattern-matching magic (using the same approach as for the strings)
> >>>>> with
> >>>>>> the exception of the need to pass more complicated structures into
> the
> >>>>>> bootstrap method (using condy or references to producing methods or
> >>>>> more
> >>>>>> complicated patterns).
> >>>>>> 3. In addition to #2, a universal approach may be introduced to
> allow
> >>>>>> faster switch by all types of values (which is primary useful for
> >>>>> pattern
> >>>>>> matching) such as passing pairs of int(T) and boolean(T)
> MethodHandles
> >>>>>> performing hash-code computation and equality check respectively.
> >>>>>>
> >>>>>> I am ready to continue developing this feature (being ready to sign
> >>>>> Oracle
> >>>>>> Contributor Agreement) which includes:
> >>>>>> - documentation
> >>>>>> - improvements to the current methods
> >>>>>> - addition of the class to other
> >>>>>> - benchmarks and testing
> >>>>>> - optimization
> >>>>>> - compiler support for this switch-approach
> >>>>>> - other features described above, such as a universal switch
> bootstrap
> >>>>>> method (Possible extensions, #3)
> >>>>>>
> >>>>>> Notes:
> >>>>>> This might have been discussed in the past (at least indy-based
> >>>>> switches in
> >>>>>> general), if there was a reason for this to not be implemented,
> please
> >>>>>> refer me to it. I would be happy to work on this proposal to make
> this
> >>>>>> feature (and other ones mentioned) part of OpenJDK.
> >>>>>>
> >>>>>> Thanks for your interest and support,
> >>>>>> Peter
> >>>>>>
> >>>>>> Links:
> >>>>>> [1]
> >>>>> https://gist.github.com/JarvisCraft/53b6e5e4eb91419b6e852cf63e1c8a2f
> >>>>> Patch
> >>>>>> (dynamic mirror of the file appended)
> >>>>>> [2]
> >>>>>>
> >>>>>
> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/jdk.compiler/share/classes/com/sun/tools/javac/comp/Lower.java#l3692
> >>>>>> Current
> >>>>>> javac String implementation root
> >>>>>> [3]
> >>>>>>
> >>>>>
> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java
> >>>>>> One
> >>>>>> of the cases where JDK's Bootstrap-methods are handled specially
> >>>>>
> >>>>
> >>>>
> >
>
>
More information about the amber-dev
mailing list