indy-based string-switch proposal

Japris Pogrammer mrjarviscraft at gmail.com
Tue Nov 5 10:52:49 UTC 2019


Binary-search* based algorithm

вт, 5 нояб. 2019 г., 13:50 Japris Pogrammer <mrjarviscraft at gmail.com>:

> I can start by benchmarking ASM and Binary-switch based approaches on
> Strings
>
> вт, 5 нояб. 2019 г., 12:06 Brian Goetz <brian.goetz at oracle.com>:
>
>> So, a good next step is to put in place some measurement of switch cost
>> using different translations.  We can start by comparing old and new, and
>> then can experiment with alternate bootstrap implementations (such as other
>> has codes).
>>
>> Note that we want to use this technique for all switches except those
>> over int and smaller.  This includes not only string and enum, but long and
>> float, and patterns.
>>
>> Sent from my MacBook Wheel
>>
>> > On Nov 5, 2019, at 1:18 AM, Japris Pogrammer <mrjarviscraft at gmail.com>
>> wrote:
>> >
>> > My idea arond it was:
>> > - Compiler:
>> >  - Counts the number of branches
>> >  - Associates branches with some close IDs
>> >  - Associates branch values with IDs
>> >  After all we get something like <T : label --> int : branch ID> (branch
>> > IDs by labels) + int : default ID
>> >  - Generated invokedynamic to accept value and return the corresponding
>> ID
>> >  - Generates lookuptable to go to branches by IDs
>> > - Bootstap method at runtime:
>> >  - Generate maximally effective implementation of <label value to branch
>> > ID> conversion performing all possible runtime optimizations (e.g.
>> access
>> > to string's internal data or alternative hash-code strategy for String
>> > switch)
>> >
>> > вт, 5 нояб. 2019 г. в 03:05, Japris Pogrammer <mrjarviscraft at gmail.com
>> >:
>> >
>> >> But is not it breaking the whole idea of indy-fication of the
>> switches? Is
>> >> not it better to let the compiler generate minimal code for these
>> >> delegating all job to the bootstrap? Again, by doing so we allow future
>> >> implementations optimize this behavour without changing the class.
>> >> For example, CallSite may omit unneeded checks based on these which it
>> has
>> >> already performed.
>> >>
>> >> вт, 5 нояб. 2019 г. в 02:59, <forax at univ-mlv.fr>:
>> >>
>> >>>
>> >>>
>> >>> ------------------------------
>> >>>
>> >>> *De: *"JARvis PROgrammer" <mrjarviscraft at gmail.com>
>> >>> *À: *"Remi Forax" <forax at univ-mlv.fr>, "amber-dev" <
>> >>> amber-dev at openjdk.java.net>
>> >>> *Envoyé: *Mardi 5 Novembre 2019 00:49:58
>> >>> *Objet: *Re: indy-based string-switch proposal
>> >>>
>> >>> I wonit say that the limitation is so strict.
>> >>> In the patch (from which this topic starts) I've used more complicated
>> >>> String-patterns (aka pages) in order to allow single String bootstrap
>> >>> argument contain description for multiple switch labels.
>> >>>
>> >>>
>> >>> multiple labels can be handled by the tableswitch that follow the
>> call to
>> >>> invokedynamic.
>> >>>
>> >>> Rémi
>> >>>
>> >>>
>> >>>
>> >>> вт, 5 нояб. 2019 г. в 02:42, Remi Forax <forax at univ-mlv.fr>:
>> >>>
>> >>>> I term of arguments to sent to the boostrap methods, you can only
>> send
>> >>>> the strings (concatenated as one big string)
>> >>>> with the convention that you will return 0 for the first one, 1 for
>> the
>> >>>> next one, etc.
>> >>>>
>> >>>> Rémi
>> >>>>
>> >>>> ----- Mail original -----
>> >>>>> De: "JARvis PROgrammer" <mrjarviscraft at gmail.com>
>> >>>>> À: "amber-dev" <amber-dev at openjdk.java.net>
>> >>>>> Envoyé: Lundi 4 Novembre 2019 20:15:17
>> >>>>> Objet: indy-based string-switch proposal
>> >>>>
>> >>>>> Hi there!.
>> >>>>>
>> >>>>> I am here with the proposal (and a working implementation prototype
>> >>>> [1]) of
>> >>>>> a invokedynamic-based string-switch which, while not breaking any
>> >>>>> backwards-compatibility and not requiring any cascade changes,
>> allows
>> >>>>> improved behaviour for this JLS feature.
>> >>>>>
>> >>>>> Idea:
>> >>>>> Replace the lookupswitch instruction part (and all the following
>> >>>>> instructions corresponding to its labels) generated by javac for a
>> >>>>> String-switch (keeping the tableswitch part) with a single
>> >>>> invokedynamic
>> >>>>> instruction using the new SwitchTableFactory.
>> >>>>> Patch contents:
>> >>>>> - SwitchTableFactory.java : home for bootstrap-methods replacing the
>> >>>> old
>> >>>>> lookupswitch behaviour
>> >>>>> - SwitchTableFactoryException.java : exception thrown by
>> >>>> bootstrap-methods
>> >>>>> of SwitchTableFactory
>> >>>>> - BaseTest.java : simple behavioral test of bootstrap-methods
>> invoking
>> >>>> them
>> >>>>> via invokestatic and testing results produced by MethodHandles of
>> >>>> returned
>> >>>>> CallSites
>> >>>>>
>> >>>>> Reasoning:
>> >>>>> 1) Current approach (used by javac) is generating a complicated
>> >>>>> structure (whose number of opcodes grows ~linearly on number of
>> >>>> branches).
>> >>>>> Yet this structure is not usually optimized (at current
>> implementation
>> >>>> even
>> >>>>> 1-branch corner-case is not treated specially [2]). Even if these
>> >>>>> optimizations were performed at compile-time they could have become
>> >>>> useless
>> >>>>> (or even causing performance problems) in future versions which is
>> >>>> against
>> >>>>> backwards-compatibility principle. Also, switches compiled for older
>> >>>>> versions whould not use newer features which could have allowed
>> better
>> >>>>> performance.
>> >>>>> 2) As of current spec, String hash-code is one of the only
>> hard-coded
>> >>>>> algorithms. This is mostly due to backwards-compatibility principle
>> >>>> and was
>> >>>>> decided to not be changed (TL;DR and this proposal is not trying to
>> do
>> >>>> it).
>> >>>>> However, using indy-based approach would make the compiler
>> independent
>> >>>> from
>> >>>>> it thus allowing removal of String hash-code algorithm in the
>> future.
>> >>>>> 3) Producing simpler bytecode would allow external tools analyzing
>> it
>> >>>>> discover string-switches easier (this may also be said about JIT).
>> >>>>> 4) Current algorithm cannot rely on String implementation details
>> >>>> which are
>> >>>>> not defined at compile-time (such as String's internal fields).
>> >>>>>
>> >>>>> Invokedynamic details:
>> >>>>> Two bootstrap methods are provided in order to allow big switches
>> >>>> (C-style
>> >>>>> arguments layout is due to inability to pass complicated arguments
>> >>>> (such as
>> >>>>> typed arrays) into the bootstap method):
>> >>>>> 1) SwitchTableFactory:createStringSwitchTable(Lookup, String,
>> >>>> MethodType,
>> >>>>> Object[]) whose bootstrap arguments are the following:
>> >>>>> <default branch id>, <number of String labels>, [<branch id>,
>> <number
>> >>>> of
>> >>>>> String labels corresponsing to the branch>, [<string labels
>> >>>> corresponsing
>> >>>>> to the branch>]*]*
>> >>>>> 2) SwitchTableFactory:altCreateStringSwitchTable(Lookup, String,
>> >>>>> MethodType, Object[]) whose bootstrap arguments are the following:
>> >>>>> <default branch id>, <number of pages (the following argument
>> groups)>,
>> >>>>> [<page>]*
>> >>>>> Page is a String of the following pattern:
>> >>>>> [<HEX branch id>\0<HEX number of String labels corresponsing to the
>> >>>>> branch>\0[<HEX length of the following String label>\0<String label
>> >>>>> corresponsing to the branch>]*]*
>> >>>>> As you can see, it allows for multiple branches (each containing
>> >>>> multiple
>> >>>>> labels) to be described under one page. This is done in order to
>> >>>> support
>> >>>>> big switches which cannot fit into the first method variant.
>> >>>>>
>> >>>>> Example:
>> >>>>> As an example of what could be done by the compiler using this
>> feature,
>> >>>>> consider the following switch expression:
>> >>>>>
>> >>>>> switch (string) {
>> >>>>>   case "foo": <branch 0>
>> >>>>>   case "bar": <branch 1>
>> >>>>>   case "baz": case "qux": <branch 2>
>> >>>>>   default: <branch -1>
>> >>>>> }
>> >>>>>
>> >>>>> Generate bytecode would look like:
>> >>>>>
>> >>>>> invokedynamic
>> >>>>> java/lang/invoke/SwitchTableFactory:createStringSwitchTable(..)
>> {-1, 4,
>> >>>>>       0, 1, "foo",
>> >>>>>       1, 1, "bar",
>> >>>>>       2, 1, "baz", "qux"
>> >>>>> }
>> >>>>> <good-old switch-table by code>
>> >>>>>
>> >>>>> or (this should happen for big switches)
>> >>>>>
>> >>>>> invokedynamic
>> >>>>> java/lang/invoke/SwitchTableFactory:altCreateStringSwitchTable(..)
>> >>>> {-1, 1,
>> >>>>>
>>  "0\0001\0003\000foo1\0001\0003\000bar2\0002\0003\000baz3\000qux"
>> >>>>> }
>> >>>>> <good-old switch-table by code>
>> >>>>>
>> >>>>> (while the argument looks complicates it is (A) shoter (as each
>> '\000'
>> >>>> is
>> >>>>> actually one char) and (B) linear (meaning it is easy to parse))
>> >>>>>
>> >>>>> Implementation details:
>> >>>>> Current implementation is based on ASM-generation. Under the hood it
>> >>>>> generates a class ) with a static String (int) method (passed to
>> >>>>> ConstantCallSite) performing hash-code() switch + equals(Object) (as
>> >>>>> currently done by javac) handling 0- and 1-branch cases specially
>> (by
>> >>>> using
>> >>>>> a MethodHandle of a method returning a bound value (default branch
>> ID)
>> >>>> and
>> >>>>> by generating an equals(Object) check respectively). The class is
>> >>>>> anonymously defined via Unsafe as done by other JDK bootstrap
>> methods.
>> >>>>>
>> >>>>> Features to be done (this were not done yet as I am not aware yet if
>> >>>> this
>> >>>>> feature will get positive feedback):
>> >>>>> 1) Add the class and its indy-details into other classes to have it
>> >>>> treated
>> >>>>> specially (as done for LambdaMetaFactory and StringConcatFactory)
>> [3].
>> >>>>> 2) FIll the documentation of the SwitchTableFactory.
>> >>>>> 3) Use SwitchTableFactory in javac (probably, as a preview feature)
>> and
>> >>>>> test its behaviour.
>> >>>>>
>> >>>>> Possible improvements:
>> >>>>> 1) Bootstrap's generator may use its own hash-code algorithm (based
>> on
>> >>>>> direct access to String's content using special Lookup or Unsafe)
>> which
>> >>>>> would be more efficient than String#hashCode().
>> >>>>> 2) Bootstrap's generator may depend on other String's properties,
>> such
>> >>>> as
>> >>>>> compact-mode, length etc.
>> >>>>> 2) Bootstrap's generator may use switchtable in case of hash-codes
>> >>>> being
>> >>>>> close (yet it seems to be too rare to handle).
>> >>>>>
>> >>>>> Subjects to discuss:
>> >>>>> 1. Should the bootstrap method validate its arguments in order to
>> >>>> report
>> >>>>> irrational arguments (such as usage of duplicate String labels)?
>> >>>>> 2. Is the current Boostrap-arguments layout fine or should be
>> remade?
>> >>>>> 3. Which optimizations should be added (and/or toggled by default)
>> to
>> >>>> the
>> >>>>> generator?
>> >>>>> 4. Should branch IDs be specified explicitly or not (using -1 for
>> >>>> default
>> >>>>> and 0 + i for other branches)?
>> >>>>> 5. Should there be a way to pass MethodHandles to bootstrap methods
>> to
>> >>>> be
>> >>>>> invoked instead of branch IDs being returned if a switch returns
>> >>>> nothing
>> >>>>> (and should there be a way to do it for stack-affecting switches?)?
>> >>>>>
>> >>>>> Possible extensions:
>> >>>>> 1. As the name of the class suggests, it is a general switch-table
>> >>>> factory
>> >>>>> and it may be used in future for other types of switches. At least,
>> >>>>> enum-switch *may* be done using it, but (in order to keep it
>> one-switch
>> >>>>> based) it would be much better to have it implemented via condy
>> >>>> producing
>> >>>>> int-array for ordinals (this would remove the need for extra
>> synthetic
>> >>>>> class currently being generated).
>> >>>>> 2. Moreover, this approach may be used for boosted implementations
>> of
>> >>>>> pattern-matching magic (using the same approach as for the strings)
>> >>>> with
>> >>>>> the exception of the need to pass more complicated structures into
>> the
>> >>>>> bootstrap method (using condy or references to producing methods or
>> >>>> more
>> >>>>> complicated patterns).
>> >>>>> 3. In addition to #2, a universal approach may be introduced to
>> allow
>> >>>>> faster switch by all types of values (which is primary useful for
>> >>>> pattern
>> >>>>> matching) such as passing pairs of int(T) and boolean(T)
>> MethodHandles
>> >>>>> performing hash-code computation and equality check respectively.
>> >>>>>
>> >>>>> I am ready to continue developing this feature (being ready to sign
>> >>>> Oracle
>> >>>>> Contributor Agreement) which includes:
>> >>>>> - documentation
>> >>>>> - improvements to the current methods
>> >>>>> - addition of the class to other
>> >>>>> - benchmarks and testing
>> >>>>> - optimization
>> >>>>> - compiler support for this switch-approach
>> >>>>> - other features described above, such as a universal switch
>> bootstrap
>> >>>>> method (Possible extensions, #3)
>> >>>>>
>> >>>>> Notes:
>> >>>>> This might have been discussed in the past (at least indy-based
>> >>>> switches in
>> >>>>> general), if there was a reason for this to not be implemented,
>> please
>> >>>>> refer me to it. I would be happy to work on this proposal to make
>> this
>> >>>>> feature (and other ones mentioned) part of OpenJDK.
>> >>>>>
>> >>>>> Thanks for your interest and support,
>> >>>>> Peter
>> >>>>>
>> >>>>> Links:
>> >>>>> [1]
>> >>>> https://gist.github.com/JarvisCraft/53b6e5e4eb91419b6e852cf63e1c8a2f
>> >>>> Patch
>> >>>>> (dynamic mirror of the file appended)
>> >>>>> [2]
>> >>>>>
>> >>>>
>> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/jdk.compiler/share/classes/com/sun/tools/javac/comp/Lower.java#l3692
>> >>>>> Current
>> >>>>> javac String implementation root
>> >>>>> [3]
>> >>>>>
>> >>>>
>> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java
>> >>>>> One
>> >>>>> of the cases where JDK's Bootstrap-methods are handled specially
>> >>>>
>> >>>
>> >>>
>>
>>


More information about the amber-dev mailing list