indy-based string-switch proposal

forax at univ-mlv.fr forax at univ-mlv.fr
Tue Nov 5 09:25:14 UTC 2019


----- Mail original -----
> De: "Brian Goetz" <brian.goetz at oracle.com>
> À: "JARvis PROgrammer" <mrjarviscraft at gmail.com>
> Cc: "Remi Forax" <forax at univ-mlv.fr>, "amber-dev" <amber-dev at openjdk.java.net>
> Envoyé: Mardi 5 Novembre 2019 10:06:38
> Objet: Re: indy-based string-switch proposal

> So, a good next step is to put in place some measurement of switch cost using
> different translations.  We can start by comparing old and new, and then can
> experiment with alternate bootstrap implementations (such as other has codes).
> 
> Note that we want to use this technique for all switches except those over int
> and smaller.  This includes not only string and enum, but long and float, and
> patterns.

Switching on floats and doubles is a weird idea for me.

> 
> Sent from my MacBook Wheel
> 

Rémi

>> On Nov 5, 2019, at 1:18 AM, Japris Pogrammer <mrjarviscraft at gmail.com> wrote:
>> 
>> My idea arond it was:
>> - Compiler:
>>  - Counts the number of branches
>>  - Associates branches with some close IDs
>>  - Associates branch values with IDs
>>  After all we get something like <T : label --> int : branch ID> (branch
>> IDs by labels) + int : default ID
>>  - Generated invokedynamic to accept value and return the corresponding ID
>>  - Generates lookuptable to go to branches by IDs
>> - Bootstap method at runtime:
>>  - Generate maximally effective implementation of <label value to branch
>> ID> conversion performing all possible runtime optimizations (e.g. access
>> to string's internal data or alternative hash-code strategy for String
>> switch)
>> 
>> вт, 5 нояб. 2019 г. в 03:05, Japris Pogrammer <mrjarviscraft at gmail.com>:
>> 
>>> But is not it breaking the whole idea of indy-fication of the switches? Is
>>> not it better to let the compiler generate minimal code for these
>>> delegating all job to the bootstrap? Again, by doing so we allow future
>>> implementations optimize this behavour without changing the class.
>>> For example, CallSite may omit unneeded checks based on these which it has
>>> already performed.
>>> 
>>> вт, 5 нояб. 2019 г. в 02:59, <forax at univ-mlv.fr>:
>>> 
>>>> 
>>>> 
>>>> ------------------------------
>>>> 
>>>> *De: *"JARvis PROgrammer" <mrjarviscraft at gmail.com>
>>>> *À: *"Remi Forax" <forax at univ-mlv.fr>, "amber-dev" <
>>>> amber-dev at openjdk.java.net>
>>>> *Envoyé: *Mardi 5 Novembre 2019 00:49:58
>>>> *Objet: *Re: indy-based string-switch proposal
>>>> 
>>>> I wonit say that the limitation is so strict.
>>>> In the patch (from which this topic starts) I've used more complicated
>>>> String-patterns (aka pages) in order to allow single String bootstrap
>>>> argument contain description for multiple switch labels.
>>>> 
>>>> 
>>>> multiple labels can be handled by the tableswitch that follow the call to
>>>> invokedynamic.
>>>> 
>>>> Rémi
>>>> 
>>>> 
>>>> 
>>>> вт, 5 нояб. 2019 г. в 02:42, Remi Forax <forax at univ-mlv.fr>:
>>>> 
>>>>> I term of arguments to sent to the boostrap methods, you can only send
>>>>> the strings (concatenated as one big string)
>>>>> with the convention that you will return 0 for the first one, 1 for the
>>>>> next one, etc.
>>>>> 
>>>>> Rémi
>>>>> 
>>>>> ----- Mail original -----
>>>>>> De: "JARvis PROgrammer" <mrjarviscraft at gmail.com>
>>>>>> À: "amber-dev" <amber-dev at openjdk.java.net>
>>>>>> Envoyé: Lundi 4 Novembre 2019 20:15:17
>>>>>> Objet: indy-based string-switch proposal
>>>>> 
>>>>>> Hi there!.
>>>>>> 
>>>>>> I am here with the proposal (and a working implementation prototype
>>>>> [1]) of
>>>>>> a invokedynamic-based string-switch which, while not breaking any
>>>>>> backwards-compatibility and not requiring any cascade changes, allows
>>>>>> improved behaviour for this JLS feature.
>>>>>> 
>>>>>> Idea:
>>>>>> Replace the lookupswitch instruction part (and all the following
>>>>>> instructions corresponding to its labels) generated by javac for a
>>>>>> String-switch (keeping the tableswitch part) with a single
>>>>> invokedynamic
>>>>>> instruction using the new SwitchTableFactory.
>>>>>> Patch contents:
>>>>>> - SwitchTableFactory.java : home for bootstrap-methods replacing the
>>>>> old
>>>>>> lookupswitch behaviour
>>>>>> - SwitchTableFactoryException.java : exception thrown by
>>>>> bootstrap-methods
>>>>>> of SwitchTableFactory
>>>>>> - BaseTest.java : simple behavioral test of bootstrap-methods invoking
>>>>> them
>>>>>> via invokestatic and testing results produced by MethodHandles of
>>>>> returned
>>>>>> CallSites
>>>>>> 
>>>>>> Reasoning:
>>>>>> 1) Current approach (used by javac) is generating a complicated
>>>>>> structure (whose number of opcodes grows ~linearly on number of
>>>>> branches).
>>>>>> Yet this structure is not usually optimized (at current implementation
>>>>> even
>>>>>> 1-branch corner-case is not treated specially [2]). Even if these
>>>>>> optimizations were performed at compile-time they could have become
>>>>> useless
>>>>>> (or even causing performance problems) in future versions which is
>>>>> against
>>>>>> backwards-compatibility principle. Also, switches compiled for older
>>>>>> versions whould not use newer features which could have allowed better
>>>>>> performance.
>>>>>> 2) As of current spec, String hash-code is one of the only hard-coded
>>>>>> algorithms. This is mostly due to backwards-compatibility principle
>>>>> and was
>>>>>> decided to not be changed (TL;DR and this proposal is not trying to do
>>>>> it).
>>>>>> However, using indy-based approach would make the compiler independent
>>>>> from
>>>>>> it thus allowing removal of String hash-code algorithm in the future.
>>>>>> 3) Producing simpler bytecode would allow external tools analyzing it
>>>>>> discover string-switches easier (this may also be said about JIT).
>>>>>> 4) Current algorithm cannot rely on String implementation details
>>>>> which are
>>>>>> not defined at compile-time (such as String's internal fields).
>>>>>> 
>>>>>> Invokedynamic details:
>>>>>> Two bootstrap methods are provided in order to allow big switches
>>>>> (C-style
>>>>>> arguments layout is due to inability to pass complicated arguments
>>>>> (such as
>>>>>> typed arrays) into the bootstap method):
>>>>>> 1) SwitchTableFactory:createStringSwitchTable(Lookup, String,
>>>>> MethodType,
>>>>>> Object[]) whose bootstrap arguments are the following:
>>>>>> <default branch id>, <number of String labels>, [<branch id>, <number
>>>>> of
>>>>>> String labels corresponsing to the branch>, [<string labels
>>>>> corresponsing
>>>>>> to the branch>]*]*
>>>>>> 2) SwitchTableFactory:altCreateStringSwitchTable(Lookup, String,
>>>>>> MethodType, Object[]) whose bootstrap arguments are the following:
>>>>>> <default branch id>, <number of pages (the following argument groups)>,
>>>>>> [<page>]*
>>>>>> Page is a String of the following pattern:
>>>>>> [<HEX branch id>\0<HEX number of String labels corresponsing to the
>>>>>> branch>\0[<HEX length of the following String label>\0<String label
>>>>>> corresponsing to the branch>]*]*
>>>>>> As you can see, it allows for multiple branches (each containing
>>>>> multiple
>>>>>> labels) to be described under one page. This is done in order to
>>>>> support
>>>>>> big switches which cannot fit into the first method variant.
>>>>>> 
>>>>>> Example:
>>>>>> As an example of what could be done by the compiler using this feature,
>>>>>> consider the following switch expression:
>>>>>> 
>>>>>> switch (string) {
>>>>>>   case "foo": <branch 0>
>>>>>>   case "bar": <branch 1>
>>>>>>   case "baz": case "qux": <branch 2>
>>>>>>   default: <branch -1>
>>>>>> }
>>>>>> 
>>>>>> Generate bytecode would look like:
>>>>>> 
>>>>>> invokedynamic
>>>>>> java/lang/invoke/SwitchTableFactory:createStringSwitchTable(..) {-1, 4,
>>>>>>       0, 1, "foo",
>>>>>>       1, 1, "bar",
>>>>>>       2, 1, "baz", "qux"
>>>>>> }
>>>>>> <good-old switch-table by code>
>>>>>> 
>>>>>> or (this should happen for big switches)
>>>>>> 
>>>>>> invokedynamic
>>>>>> java/lang/invoke/SwitchTableFactory:altCreateStringSwitchTable(..)
>>>>> {-1, 1,
>>>>>>       "0\0001\0003\000foo1\0001\0003\000bar2\0002\0003\000baz3\000qux"
>>>>>> }
>>>>>> <good-old switch-table by code>
>>>>>> 
>>>>>> (while the argument looks complicates it is (A) shoter (as each '\000'
>>>>> is
>>>>>> actually one char) and (B) linear (meaning it is easy to parse))
>>>>>> 
>>>>>> Implementation details:
>>>>>> Current implementation is based on ASM-generation. Under the hood it
>>>>>> generates a class ) with a static String (int) method (passed to
>>>>>> ConstantCallSite) performing hash-code() switch + equals(Object) (as
>>>>>> currently done by javac) handling 0- and 1-branch cases specially (by
>>>>> using
>>>>>> a MethodHandle of a method returning a bound value (default branch ID)
>>>>> and
>>>>>> by generating an equals(Object) check respectively). The class is
>>>>>> anonymously defined via Unsafe as done by other JDK bootstrap methods.
>>>>>> 
>>>>>> Features to be done (this were not done yet as I am not aware yet if
>>>>> this
>>>>>> feature will get positive feedback):
>>>>>> 1) Add the class and its indy-details into other classes to have it
>>>>> treated
>>>>>> specially (as done for LambdaMetaFactory and StringConcatFactory) [3].
>>>>>> 2) FIll the documentation of the SwitchTableFactory.
>>>>>> 3) Use SwitchTableFactory in javac (probably, as a preview feature) and
>>>>>> test its behaviour.
>>>>>> 
>>>>>> Possible improvements:
>>>>>> 1) Bootstrap's generator may use its own hash-code algorithm (based on
>>>>>> direct access to String's content using special Lookup or Unsafe) which
>>>>>> would be more efficient than String#hashCode().
>>>>>> 2) Bootstrap's generator may depend on other String's properties, such
>>>>> as
>>>>>> compact-mode, length etc.
>>>>>> 2) Bootstrap's generator may use switchtable in case of hash-codes
>>>>> being
>>>>>> close (yet it seems to be too rare to handle).
>>>>>> 
>>>>>> Subjects to discuss:
>>>>>> 1. Should the bootstrap method validate its arguments in order to
>>>>> report
>>>>>> irrational arguments (such as usage of duplicate String labels)?
>>>>>> 2. Is the current Boostrap-arguments layout fine or should be remade?
>>>>>> 3. Which optimizations should be added (and/or toggled by default) to
>>>>> the
>>>>>> generator?
>>>>>> 4. Should branch IDs be specified explicitly or not (using -1 for
>>>>> default
>>>>>> and 0 + i for other branches)?
>>>>>> 5. Should there be a way to pass MethodHandles to bootstrap methods to
>>>>> be
>>>>>> invoked instead of branch IDs being returned if a switch returns
>>>>> nothing
>>>>>> (and should there be a way to do it for stack-affecting switches?)?
>>>>>> 
>>>>>> Possible extensions:
>>>>>> 1. As the name of the class suggests, it is a general switch-table
>>>>> factory
>>>>>> and it may be used in future for other types of switches. At least,
>>>>>> enum-switch *may* be done using it, but (in order to keep it one-switch
>>>>>> based) it would be much better to have it implemented via condy
>>>>> producing
>>>>>> int-array for ordinals (this would remove the need for extra synthetic
>>>>>> class currently being generated).
>>>>>> 2. Moreover, this approach may be used for boosted implementations of
>>>>>> pattern-matching magic (using the same approach as for the strings)
>>>>> with
>>>>>> the exception of the need to pass more complicated structures into the
>>>>>> bootstrap method (using condy or references to producing methods or
>>>>> more
>>>>>> complicated patterns).
>>>>>> 3. In addition to #2, a universal approach may be introduced to allow
>>>>>> faster switch by all types of values (which is primary useful for
>>>>> pattern
>>>>>> matching) such as passing pairs of int(T) and boolean(T) MethodHandles
>>>>>> performing hash-code computation and equality check respectively.
>>>>>> 
>>>>>> I am ready to continue developing this feature (being ready to sign
>>>>> Oracle
>>>>>> Contributor Agreement) which includes:
>>>>>> - documentation
>>>>>> - improvements to the current methods
>>>>>> - addition of the class to other
>>>>>> - benchmarks and testing
>>>>>> - optimization
>>>>>> - compiler support for this switch-approach
>>>>>> - other features described above, such as a universal switch bootstrap
>>>>>> method (Possible extensions, #3)
>>>>>> 
>>>>>> Notes:
>>>>>> This might have been discussed in the past (at least indy-based
>>>>> switches in
>>>>>> general), if there was a reason for this to not be implemented, please
>>>>>> refer me to it. I would be happy to work on this proposal to make this
>>>>>> feature (and other ones mentioned) part of OpenJDK.
>>>>>> 
>>>>>> Thanks for your interest and support,
>>>>>> Peter
>>>>>> 
>>>>>> Links:
>>>>>> [1]
>>>>> https://gist.github.com/JarvisCraft/53b6e5e4eb91419b6e852cf63e1c8a2f
>>>>> Patch
>>>>>> (dynamic mirror of the file appended)
>>>>>> [2]
>>>>>> 
>>>>> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/jdk.compiler/share/classes/com/sun/tools/javac/comp/Lower.java#l3692
>>>>>> Current
>>>>>> javac String implementation root
>>>>>> [3]
>>>>>> 
>>>>> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java
>>>>>> One
>>>>>> of the cases where JDK's Bootstrap-methods are handled specially
>>>>> 
>>>> 


More information about the amber-dev mailing list