indy-based string-switch proposal
Brian Goetz
brian.goetz at oracle.com
Tue Nov 5 09:06:38 UTC 2019
So, a good next step is to put in place some measurement of switch cost using different translations. We can start by comparing old and new, and then can experiment with alternate bootstrap implementations (such as other has codes).
Note that we want to use this technique for all switches except those over int and smaller. This includes not only string and enum, but long and float, and patterns.
Sent from my MacBook Wheel
> On Nov 5, 2019, at 1:18 AM, Japris Pogrammer <mrjarviscraft at gmail.com> wrote:
>
> My idea arond it was:
> - Compiler:
> - Counts the number of branches
> - Associates branches with some close IDs
> - Associates branch values with IDs
> After all we get something like <T : label --> int : branch ID> (branch
> IDs by labels) + int : default ID
> - Generated invokedynamic to accept value and return the corresponding ID
> - Generates lookuptable to go to branches by IDs
> - Bootstap method at runtime:
> - Generate maximally effective implementation of <label value to branch
> ID> conversion performing all possible runtime optimizations (e.g. access
> to string's internal data or alternative hash-code strategy for String
> switch)
>
> вт, 5 нояб. 2019 г. в 03:05, Japris Pogrammer <mrjarviscraft at gmail.com>:
>
>> But is not it breaking the whole idea of indy-fication of the switches? Is
>> not it better to let the compiler generate minimal code for these
>> delegating all job to the bootstrap? Again, by doing so we allow future
>> implementations optimize this behavour without changing the class.
>> For example, CallSite may omit unneeded checks based on these which it has
>> already performed.
>>
>> вт, 5 нояб. 2019 г. в 02:59, <forax at univ-mlv.fr>:
>>
>>>
>>>
>>> ------------------------------
>>>
>>> *De: *"JARvis PROgrammer" <mrjarviscraft at gmail.com>
>>> *À: *"Remi Forax" <forax at univ-mlv.fr>, "amber-dev" <
>>> amber-dev at openjdk.java.net>
>>> *Envoyé: *Mardi 5 Novembre 2019 00:49:58
>>> *Objet: *Re: indy-based string-switch proposal
>>>
>>> I wonit say that the limitation is so strict.
>>> In the patch (from which this topic starts) I've used more complicated
>>> String-patterns (aka pages) in order to allow single String bootstrap
>>> argument contain description for multiple switch labels.
>>>
>>>
>>> multiple labels can be handled by the tableswitch that follow the call to
>>> invokedynamic.
>>>
>>> Rémi
>>>
>>>
>>>
>>> вт, 5 нояб. 2019 г. в 02:42, Remi Forax <forax at univ-mlv.fr>:
>>>
>>>> I term of arguments to sent to the boostrap methods, you can only send
>>>> the strings (concatenated as one big string)
>>>> with the convention that you will return 0 for the first one, 1 for the
>>>> next one, etc.
>>>>
>>>> Rémi
>>>>
>>>> ----- Mail original -----
>>>>> De: "JARvis PROgrammer" <mrjarviscraft at gmail.com>
>>>>> À: "amber-dev" <amber-dev at openjdk.java.net>
>>>>> Envoyé: Lundi 4 Novembre 2019 20:15:17
>>>>> Objet: indy-based string-switch proposal
>>>>
>>>>> Hi there!.
>>>>>
>>>>> I am here with the proposal (and a working implementation prototype
>>>> [1]) of
>>>>> a invokedynamic-based string-switch which, while not breaking any
>>>>> backwards-compatibility and not requiring any cascade changes, allows
>>>>> improved behaviour for this JLS feature.
>>>>>
>>>>> Idea:
>>>>> Replace the lookupswitch instruction part (and all the following
>>>>> instructions corresponding to its labels) generated by javac for a
>>>>> String-switch (keeping the tableswitch part) with a single
>>>> invokedynamic
>>>>> instruction using the new SwitchTableFactory.
>>>>> Patch contents:
>>>>> - SwitchTableFactory.java : home for bootstrap-methods replacing the
>>>> old
>>>>> lookupswitch behaviour
>>>>> - SwitchTableFactoryException.java : exception thrown by
>>>> bootstrap-methods
>>>>> of SwitchTableFactory
>>>>> - BaseTest.java : simple behavioral test of bootstrap-methods invoking
>>>> them
>>>>> via invokestatic and testing results produced by MethodHandles of
>>>> returned
>>>>> CallSites
>>>>>
>>>>> Reasoning:
>>>>> 1) Current approach (used by javac) is generating a complicated
>>>>> structure (whose number of opcodes grows ~linearly on number of
>>>> branches).
>>>>> Yet this structure is not usually optimized (at current implementation
>>>> even
>>>>> 1-branch corner-case is not treated specially [2]). Even if these
>>>>> optimizations were performed at compile-time they could have become
>>>> useless
>>>>> (or even causing performance problems) in future versions which is
>>>> against
>>>>> backwards-compatibility principle. Also, switches compiled for older
>>>>> versions whould not use newer features which could have allowed better
>>>>> performance.
>>>>> 2) As of current spec, String hash-code is one of the only hard-coded
>>>>> algorithms. This is mostly due to backwards-compatibility principle
>>>> and was
>>>>> decided to not be changed (TL;DR and this proposal is not trying to do
>>>> it).
>>>>> However, using indy-based approach would make the compiler independent
>>>> from
>>>>> it thus allowing removal of String hash-code algorithm in the future.
>>>>> 3) Producing simpler bytecode would allow external tools analyzing it
>>>>> discover string-switches easier (this may also be said about JIT).
>>>>> 4) Current algorithm cannot rely on String implementation details
>>>> which are
>>>>> not defined at compile-time (such as String's internal fields).
>>>>>
>>>>> Invokedynamic details:
>>>>> Two bootstrap methods are provided in order to allow big switches
>>>> (C-style
>>>>> arguments layout is due to inability to pass complicated arguments
>>>> (such as
>>>>> typed arrays) into the bootstap method):
>>>>> 1) SwitchTableFactory:createStringSwitchTable(Lookup, String,
>>>> MethodType,
>>>>> Object[]) whose bootstrap arguments are the following:
>>>>> <default branch id>, <number of String labels>, [<branch id>, <number
>>>> of
>>>>> String labels corresponsing to the branch>, [<string labels
>>>> corresponsing
>>>>> to the branch>]*]*
>>>>> 2) SwitchTableFactory:altCreateStringSwitchTable(Lookup, String,
>>>>> MethodType, Object[]) whose bootstrap arguments are the following:
>>>>> <default branch id>, <number of pages (the following argument groups)>,
>>>>> [<page>]*
>>>>> Page is a String of the following pattern:
>>>>> [<HEX branch id>\0<HEX number of String labels corresponsing to the
>>>>> branch>\0[<HEX length of the following String label>\0<String label
>>>>> corresponsing to the branch>]*]*
>>>>> As you can see, it allows for multiple branches (each containing
>>>> multiple
>>>>> labels) to be described under one page. This is done in order to
>>>> support
>>>>> big switches which cannot fit into the first method variant.
>>>>>
>>>>> Example:
>>>>> As an example of what could be done by the compiler using this feature,
>>>>> consider the following switch expression:
>>>>>
>>>>> switch (string) {
>>>>> case "foo": <branch 0>
>>>>> case "bar": <branch 1>
>>>>> case "baz": case "qux": <branch 2>
>>>>> default: <branch -1>
>>>>> }
>>>>>
>>>>> Generate bytecode would look like:
>>>>>
>>>>> invokedynamic
>>>>> java/lang/invoke/SwitchTableFactory:createStringSwitchTable(..) {-1, 4,
>>>>> 0, 1, "foo",
>>>>> 1, 1, "bar",
>>>>> 2, 1, "baz", "qux"
>>>>> }
>>>>> <good-old switch-table by code>
>>>>>
>>>>> or (this should happen for big switches)
>>>>>
>>>>> invokedynamic
>>>>> java/lang/invoke/SwitchTableFactory:altCreateStringSwitchTable(..)
>>>> {-1, 1,
>>>>> "0\0001\0003\000foo1\0001\0003\000bar2\0002\0003\000baz3\000qux"
>>>>> }
>>>>> <good-old switch-table by code>
>>>>>
>>>>> (while the argument looks complicates it is (A) shoter (as each '\000'
>>>> is
>>>>> actually one char) and (B) linear (meaning it is easy to parse))
>>>>>
>>>>> Implementation details:
>>>>> Current implementation is based on ASM-generation. Under the hood it
>>>>> generates a class ) with a static String (int) method (passed to
>>>>> ConstantCallSite) performing hash-code() switch + equals(Object) (as
>>>>> currently done by javac) handling 0- and 1-branch cases specially (by
>>>> using
>>>>> a MethodHandle of a method returning a bound value (default branch ID)
>>>> and
>>>>> by generating an equals(Object) check respectively). The class is
>>>>> anonymously defined via Unsafe as done by other JDK bootstrap methods.
>>>>>
>>>>> Features to be done (this were not done yet as I am not aware yet if
>>>> this
>>>>> feature will get positive feedback):
>>>>> 1) Add the class and its indy-details into other classes to have it
>>>> treated
>>>>> specially (as done for LambdaMetaFactory and StringConcatFactory) [3].
>>>>> 2) FIll the documentation of the SwitchTableFactory.
>>>>> 3) Use SwitchTableFactory in javac (probably, as a preview feature) and
>>>>> test its behaviour.
>>>>>
>>>>> Possible improvements:
>>>>> 1) Bootstrap's generator may use its own hash-code algorithm (based on
>>>>> direct access to String's content using special Lookup or Unsafe) which
>>>>> would be more efficient than String#hashCode().
>>>>> 2) Bootstrap's generator may depend on other String's properties, such
>>>> as
>>>>> compact-mode, length etc.
>>>>> 2) Bootstrap's generator may use switchtable in case of hash-codes
>>>> being
>>>>> close (yet it seems to be too rare to handle).
>>>>>
>>>>> Subjects to discuss:
>>>>> 1. Should the bootstrap method validate its arguments in order to
>>>> report
>>>>> irrational arguments (such as usage of duplicate String labels)?
>>>>> 2. Is the current Boostrap-arguments layout fine or should be remade?
>>>>> 3. Which optimizations should be added (and/or toggled by default) to
>>>> the
>>>>> generator?
>>>>> 4. Should branch IDs be specified explicitly or not (using -1 for
>>>> default
>>>>> and 0 + i for other branches)?
>>>>> 5. Should there be a way to pass MethodHandles to bootstrap methods to
>>>> be
>>>>> invoked instead of branch IDs being returned if a switch returns
>>>> nothing
>>>>> (and should there be a way to do it for stack-affecting switches?)?
>>>>>
>>>>> Possible extensions:
>>>>> 1. As the name of the class suggests, it is a general switch-table
>>>> factory
>>>>> and it may be used in future for other types of switches. At least,
>>>>> enum-switch *may* be done using it, but (in order to keep it one-switch
>>>>> based) it would be much better to have it implemented via condy
>>>> producing
>>>>> int-array for ordinals (this would remove the need for extra synthetic
>>>>> class currently being generated).
>>>>> 2. Moreover, this approach may be used for boosted implementations of
>>>>> pattern-matching magic (using the same approach as for the strings)
>>>> with
>>>>> the exception of the need to pass more complicated structures into the
>>>>> bootstrap method (using condy or references to producing methods or
>>>> more
>>>>> complicated patterns).
>>>>> 3. In addition to #2, a universal approach may be introduced to allow
>>>>> faster switch by all types of values (which is primary useful for
>>>> pattern
>>>>> matching) such as passing pairs of int(T) and boolean(T) MethodHandles
>>>>> performing hash-code computation and equality check respectively.
>>>>>
>>>>> I am ready to continue developing this feature (being ready to sign
>>>> Oracle
>>>>> Contributor Agreement) which includes:
>>>>> - documentation
>>>>> - improvements to the current methods
>>>>> - addition of the class to other
>>>>> - benchmarks and testing
>>>>> - optimization
>>>>> - compiler support for this switch-approach
>>>>> - other features described above, such as a universal switch bootstrap
>>>>> method (Possible extensions, #3)
>>>>>
>>>>> Notes:
>>>>> This might have been discussed in the past (at least indy-based
>>>> switches in
>>>>> general), if there was a reason for this to not be implemented, please
>>>>> refer me to it. I would be happy to work on this proposal to make this
>>>>> feature (and other ones mentioned) part of OpenJDK.
>>>>>
>>>>> Thanks for your interest and support,
>>>>> Peter
>>>>>
>>>>> Links:
>>>>> [1]
>>>> https://gist.github.com/JarvisCraft/53b6e5e4eb91419b6e852cf63e1c8a2f
>>>> Patch
>>>>> (dynamic mirror of the file appended)
>>>>> [2]
>>>>>
>>>> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/jdk.compiler/share/classes/com/sun/tools/javac/comp/Lower.java#l3692
>>>>> Current
>>>>> javac String implementation root
>>>>> [3]
>>>>>
>>>> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java
>>>>> One
>>>>> of the cases where JDK's Bootstrap-methods are handled specially
>>>>
>>>
>>>
More information about the amber-dev
mailing list