indy-based string-switch proposal

Japris Pogrammer mrjarviscraft at gmail.com
Tue Nov 5 00:18:11 UTC 2019


My idea arond it was:
- Compiler:
  - Counts the number of branches
  - Associates branches with some close IDs
  - Associates branch values with IDs
  After all we get something like <T : label --> int : branch ID> (branch
IDs by labels) + int : default ID
  - Generated invokedynamic to accept value and return the corresponding ID
  - Generates lookuptable to go to branches by IDs
- Bootstap method at runtime:
  - Generate maximally effective implementation of <label value to branch
ID> conversion performing all possible runtime optimizations (e.g. access
to string's internal data or alternative hash-code strategy for String
switch)

вт, 5 нояб. 2019 г. в 03:05, Japris Pogrammer <mrjarviscraft at gmail.com>:

> But is not it breaking the whole idea of indy-fication of the switches? Is
> not it better to let the compiler generate minimal code for these
> delegating all job to the bootstrap? Again, by doing so we allow future
> implementations optimize this behavour without changing the class.
> For example, CallSite may omit unneeded checks based on these which it has
> already performed.
>
> вт, 5 нояб. 2019 г. в 02:59, <forax at univ-mlv.fr>:
>
>>
>>
>> ------------------------------
>>
>> *De: *"JARvis PROgrammer" <mrjarviscraft at gmail.com>
>> *À: *"Remi Forax" <forax at univ-mlv.fr>, "amber-dev" <
>> amber-dev at openjdk.java.net>
>> *Envoyé: *Mardi 5 Novembre 2019 00:49:58
>> *Objet: *Re: indy-based string-switch proposal
>>
>> I wonit say that the limitation is so strict.
>> In the patch (from which this topic starts) I've used more complicated
>> String-patterns (aka pages) in order to allow single String bootstrap
>> argument contain description for multiple switch labels.
>>
>>
>> multiple labels can be handled by the tableswitch that follow the call to
>> invokedynamic.
>>
>> Rémi
>>
>>
>>
>> вт, 5 нояб. 2019 г. в 02:42, Remi Forax <forax at univ-mlv.fr>:
>>
>>> I term of arguments to sent to the boostrap methods, you can only send
>>> the strings (concatenated as one big string)
>>> with the convention that you will return 0 for the first one, 1 for the
>>> next one, etc.
>>>
>>> Rémi
>>>
>>> ----- Mail original -----
>>> > De: "JARvis PROgrammer" <mrjarviscraft at gmail.com>
>>> > À: "amber-dev" <amber-dev at openjdk.java.net>
>>> > Envoyé: Lundi 4 Novembre 2019 20:15:17
>>> > Objet: indy-based string-switch proposal
>>>
>>> > Hi there!.
>>> >
>>> > I am here with the proposal (and a working implementation prototype
>>> [1]) of
>>> > a invokedynamic-based string-switch which, while not breaking any
>>> > backwards-compatibility and not requiring any cascade changes, allows
>>> > improved behaviour for this JLS feature.
>>> >
>>> > Idea:
>>> > Replace the lookupswitch instruction part (and all the following
>>> > instructions corresponding to its labels) generated by javac for a
>>> > String-switch (keeping the tableswitch part) with a single
>>> invokedynamic
>>> > instruction using the new SwitchTableFactory.
>>> > Patch contents:
>>> > - SwitchTableFactory.java : home for bootstrap-methods replacing the
>>> old
>>> > lookupswitch behaviour
>>> > - SwitchTableFactoryException.java : exception thrown by
>>> bootstrap-methods
>>> > of SwitchTableFactory
>>> > - BaseTest.java : simple behavioral test of bootstrap-methods invoking
>>> them
>>> > via invokestatic and testing results produced by MethodHandles of
>>> returned
>>> > CallSites
>>> >
>>> > Reasoning:
>>> > 1) Current approach (used by javac) is generating a complicated
>>> > structure (whose number of opcodes grows ~linearly on number of
>>> branches).
>>> > Yet this structure is not usually optimized (at current implementation
>>> even
>>> > 1-branch corner-case is not treated specially [2]). Even if these
>>> > optimizations were performed at compile-time they could have become
>>> useless
>>> > (or even causing performance problems) in future versions which is
>>> against
>>> > backwards-compatibility principle. Also, switches compiled for older
>>> > versions whould not use newer features which could have allowed better
>>> > performance.
>>> > 2) As of current spec, String hash-code is one of the only hard-coded
>>> > algorithms. This is mostly due to backwards-compatibility principle
>>> and was
>>> > decided to not be changed (TL;DR and this proposal is not trying to do
>>> it).
>>> > However, using indy-based approach would make the compiler independent
>>> from
>>> > it thus allowing removal of String hash-code algorithm in the future.
>>> > 3) Producing simpler bytecode would allow external tools analyzing it
>>> > discover string-switches easier (this may also be said about JIT).
>>> > 4) Current algorithm cannot rely on String implementation details
>>> which are
>>> > not defined at compile-time (such as String's internal fields).
>>> >
>>> > Invokedynamic details:
>>> > Two bootstrap methods are provided in order to allow big switches
>>> (C-style
>>> > arguments layout is due to inability to pass complicated arguments
>>> (such as
>>> > typed arrays) into the bootstap method):
>>> > 1) SwitchTableFactory:createStringSwitchTable(Lookup, String,
>>> MethodType,
>>> > Object[]) whose bootstrap arguments are the following:
>>> > <default branch id>, <number of String labels>, [<branch id>, <number
>>> of
>>> > String labels corresponsing to the branch>, [<string labels
>>> corresponsing
>>> > to the branch>]*]*
>>> > 2) SwitchTableFactory:altCreateStringSwitchTable(Lookup, String,
>>> > MethodType, Object[]) whose bootstrap arguments are the following:
>>> > <default branch id>, <number of pages (the following argument groups)>,
>>> > [<page>]*
>>> > Page is a String of the following pattern:
>>> > [<HEX branch id>\0<HEX number of String labels corresponsing to the
>>> > branch>\0[<HEX length of the following String label>\0<String label
>>> > corresponsing to the branch>]*]*
>>> > As you can see, it allows for multiple branches (each containing
>>> multiple
>>> > labels) to be described under one page. This is done in order to
>>> support
>>> > big switches which cannot fit into the first method variant.
>>> >
>>> > Example:
>>> > As an example of what could be done by the compiler using this feature,
>>> > consider the following switch expression:
>>> >
>>> > switch (string) {
>>> >    case "foo": <branch 0>
>>> >    case "bar": <branch 1>
>>> >    case "baz": case "qux": <branch 2>
>>> >    default: <branch -1>
>>> > }
>>> >
>>> > Generate bytecode would look like:
>>> >
>>> > invokedynamic
>>> > java/lang/invoke/SwitchTableFactory:createStringSwitchTable(..) {-1, 4,
>>> >        0, 1, "foo",
>>> >        1, 1, "bar",
>>> >        2, 1, "baz", "qux"
>>> > }
>>> > <good-old switch-table by code>
>>> >
>>> > or (this should happen for big switches)
>>> >
>>> > invokedynamic
>>> > java/lang/invoke/SwitchTableFactory:altCreateStringSwitchTable(..)
>>> {-1, 1,
>>> >        "0\0001\0003\000foo1\0001\0003\000bar2\0002\0003\000baz3\000qux"
>>> > }
>>> > <good-old switch-table by code>
>>> >
>>> > (while the argument looks complicates it is (A) shoter (as each '\000'
>>> is
>>> > actually one char) and (B) linear (meaning it is easy to parse))
>>> >
>>> > Implementation details:
>>> > Current implementation is based on ASM-generation. Under the hood it
>>> > generates a class ) with a static String (int) method (passed to
>>> > ConstantCallSite) performing hash-code() switch + equals(Object) (as
>>> > currently done by javac) handling 0- and 1-branch cases specially (by
>>> using
>>> > a MethodHandle of a method returning a bound value (default branch ID)
>>> and
>>> > by generating an equals(Object) check respectively). The class is
>>> > anonymously defined via Unsafe as done by other JDK bootstrap methods.
>>> >
>>> > Features to be done (this were not done yet as I am not aware yet if
>>> this
>>> > feature will get positive feedback):
>>> > 1) Add the class and its indy-details into other classes to have it
>>> treated
>>> > specially (as done for LambdaMetaFactory and StringConcatFactory) [3].
>>> > 2) FIll the documentation of the SwitchTableFactory.
>>> > 3) Use SwitchTableFactory in javac (probably, as a preview feature) and
>>> > test its behaviour.
>>> >
>>> > Possible improvements:
>>> > 1) Bootstrap's generator may use its own hash-code algorithm (based on
>>> > direct access to String's content using special Lookup or Unsafe) which
>>> > would be more efficient than String#hashCode().
>>> > 2) Bootstrap's generator may depend on other String's properties, such
>>> as
>>> > compact-mode, length etc.
>>> > 2) Bootstrap's generator may use switchtable in case of hash-codes
>>> being
>>> > close (yet it seems to be too rare to handle).
>>> >
>>> > Subjects to discuss:
>>> > 1. Should the bootstrap method validate its arguments in order to
>>> report
>>> > irrational arguments (such as usage of duplicate String labels)?
>>> > 2. Is the current Boostrap-arguments layout fine or should be remade?
>>> > 3. Which optimizations should be added (and/or toggled by default) to
>>> the
>>> > generator?
>>> > 4. Should branch IDs be specified explicitly or not (using -1 for
>>> default
>>> > and 0 + i for other branches)?
>>> > 5. Should there be a way to pass MethodHandles to bootstrap methods to
>>> be
>>> > invoked instead of branch IDs being returned if a switch returns
>>> nothing
>>> > (and should there be a way to do it for stack-affecting switches?)?
>>> >
>>> > Possible extensions:
>>> > 1. As the name of the class suggests, it is a general switch-table
>>> factory
>>> > and it may be used in future for other types of switches. At least,
>>> > enum-switch *may* be done using it, but (in order to keep it one-switch
>>> > based) it would be much better to have it implemented via condy
>>> producing
>>> > int-array for ordinals (this would remove the need for extra synthetic
>>> > class currently being generated).
>>> > 2. Moreover, this approach may be used for boosted implementations of
>>> > pattern-matching magic (using the same approach as for the strings)
>>> with
>>> > the exception of the need to pass more complicated structures into the
>>> > bootstrap method (using condy or references to producing methods or
>>> more
>>> > complicated patterns).
>>> > 3. In addition to #2, a universal approach may be introduced to allow
>>> > faster switch by all types of values (which is primary useful for
>>> pattern
>>> > matching) such as passing pairs of int(T) and boolean(T) MethodHandles
>>> > performing hash-code computation and equality check respectively.
>>> >
>>> > I am ready to continue developing this feature (being ready to sign
>>> Oracle
>>> > Contributor Agreement) which includes:
>>> > - documentation
>>> > - improvements to the current methods
>>> > - addition of the class to other
>>> > - benchmarks and testing
>>> > - optimization
>>> > - compiler support for this switch-approach
>>> > - other features described above, such as a universal switch bootstrap
>>> > method (Possible extensions, #3)
>>> >
>>> > Notes:
>>> > This might have been discussed in the past (at least indy-based
>>> switches in
>>> > general), if there was a reason for this to not be implemented, please
>>> > refer me to it. I would be happy to work on this proposal to make this
>>> > feature (and other ones mentioned) part of OpenJDK.
>>> >
>>> > Thanks for your interest and support,
>>> > Peter
>>> >
>>> > Links:
>>> > [1]
>>> https://gist.github.com/JarvisCraft/53b6e5e4eb91419b6e852cf63e1c8a2f
>>> Patch
>>> > (dynamic mirror of the file appended)
>>> > [2]
>>> >
>>> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/jdk.compiler/share/classes/com/sun/tools/javac/comp/Lower.java#l3692
>>> > Current
>>> > javac String implementation root
>>> > [3]
>>> >
>>> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java
>>> > One
>>> > of the cases where JDK's Bootstrap-methods are handled specially
>>>
>>
>>


More information about the amber-dev mailing list