indy-based string-switch proposal
forax at univ-mlv.fr
forax at univ-mlv.fr
Tue Nov 5 09:22:54 UTC 2019
> De: "JARvis PROgrammer" <mrjarviscraft at gmail.com>
> À: "Remi Forax" <forax at univ-mlv.fr>, "amber-dev" <amber-dev at openjdk.java.net>
> Envoyé: Mardi 5 Novembre 2019 01:05:58
> Objet: Re: indy-based string-switch proposal
> But is not it breaking the whole idea of indy-fication of the switches? Is not
> it better to let the compiler generate minimal code for these delegating all
> job to the bootstrap? Again, by doing so we allow future implementations
> optimize this behavour without changing the class. For example, CallSite may
> omit unneeded checks based on these which it has already performed.
If you want to transfer data from indy to the tableswitch, we are thinking about using a carrier object populated by indy and were the values are extracted by the code inside the tableswitch.
By example for the type switch,
switch(o) {
case Foo f -> ...
case Bar b -> ...
}
the equivalent code is an indy calls that return a kind of record that contains an int (the dense switch selector), a Foo and a Bar
record Carrier(int selector, Foo foo, Bar bar);
Carrier carrier = indy ...
switch(carrier.selector) {
case 0: var f = carrier.foo; ...
case 1: var b = carrier.bar; ...
}
In fact, it's a litle more complex than that, to avoid any binary compatibility issue, we want to type the carrier object as Object, so more something like
Object carrier = (Object)indy ...
switch((int)indy "selector" (c)) {
case 0: var f = (Foo)indy "component" (c) [0]; ...
case 1: var b = (Bar)indy "component" (c) [1]; ...
}
For the VM, when everything is inlined, the JIT can see the creation of the record (that maybe an inline type in the future), so there is no real cast at runtime.
Rémi
> вт, 5 нояб. 2019 г. в 02:59, < [ mailto:forax at univ-mlv.fr | forax at univ-mlv.fr ]
> >:
>>> De: "JARvis PROgrammer" < [ mailto:mrjarviscraft at gmail.com |
>>> mrjarviscraft at gmail.com ] >
>>> À: "Remi Forax" < [ mailto:forax at univ-mlv.fr | forax at univ-mlv.fr ] >,
>>> "amber-dev" < [ mailto:amber-dev at openjdk.java.net | amber-dev at openjdk.java.net
>>> ] >
>>> Envoyé: Mardi 5 Novembre 2019 00:49:58
>>> Objet: Re: indy-based string-switch proposal
>>> I wonit say that the limitation is so strict.
>>> In the patch (from which this topic starts) I've used more complicated
>>> String-patterns (aka pages) in order to allow single String bootstrap argument
>>> contain description for multiple switch labels.
>> multiple labels can be handled by the tableswitch that follow the call to
>> invokedynamic.
>> Rémi
>>> вт, 5 нояб. 2019 г. в 02:42, Remi Forax < [ mailto:forax at univ-mlv.fr |
>>> forax at univ-mlv.fr ] >:
>>>> I term of arguments to sent to the boostrap methods, you can only send the
>>>> strings (concatenated as one big string)
>>>> with the convention that you will return 0 for the first one, 1 for the next
>>>> one, etc.
>>>> Rémi
>>>> ----- Mail original -----
>>>>> De: "JARvis PROgrammer" < [ mailto:mrjarviscraft at gmail.com |
>>>> > mrjarviscraft at gmail.com ] >
>>>>> À: "amber-dev" < [ mailto:amber-dev at openjdk.java.net |
>>>> > amber-dev at openjdk.java.net ] >
>>>> > Envoyé: Lundi 4 Novembre 2019 20:15:17
>>>> > Objet: indy-based string-switch proposal
>>>> > Hi there!.
>>>> > I am here with the proposal (and a working implementation prototype [1]) of
>>>> > a invokedynamic-based string-switch which, while not breaking any
>>>> > backwards-compatibility and not requiring any cascade changes, allows
>>>> > improved behaviour for this JLS feature.
>>>> > Idea:
>>>> > Replace the lookupswitch instruction part (and all the following
>>>> > instructions corresponding to its labels) generated by javac for a
>>>> > String-switch (keeping the tableswitch part) with a single invokedynamic
>>>> > instruction using the new SwitchTableFactory.
>>>> > Patch contents:
>>>> > - SwitchTableFactory.java : home for bootstrap-methods replacing the old
>>>> > lookupswitch behaviour
>>>> > - SwitchTableFactoryException.java : exception thrown by bootstrap-methods
>>>> > of SwitchTableFactory
>>>> > - BaseTest.java : simple behavioral test of bootstrap-methods invoking them
>>>> > via invokestatic and testing results produced by MethodHandles of returned
>>>> > CallSites
>>>> > Reasoning:
>>>> > 1) Current approach (used by javac) is generating a complicated
>>>> > structure (whose number of opcodes grows ~linearly on number of branches).
>>>> > Yet this structure is not usually optimized (at current implementation even
>>>> > 1-branch corner-case is not treated specially [2]). Even if these
>>>> > optimizations were performed at compile-time they could have become useless
>>>> > (or even causing performance problems) in future versions which is against
>>>> > backwards-compatibility principle. Also, switches compiled for older
>>>> > versions whould not use newer features which could have allowed better
>>>> > performance.
>>>> > 2) As of current spec, String hash-code is one of the only hard-coded
>>>> > algorithms. This is mostly due to backwards-compatibility principle and was
>>>> > decided to not be changed (TL;DR and this proposal is not trying to do it).
>>>> > However, using indy-based approach would make the compiler independent from
>>>> > it thus allowing removal of String hash-code algorithm in the future.
>>>> > 3) Producing simpler bytecode would allow external tools analyzing it
>>>> > discover string-switches easier (this may also be said about JIT).
>>>> > 4) Current algorithm cannot rely on String implementation details which are
>>>> > not defined at compile-time (such as String's internal fields).
>>>> > Invokedynamic details:
>>>> > Two bootstrap methods are provided in order to allow big switches (C-style
>>>> > arguments layout is due to inability to pass complicated arguments (such as
>>>> > typed arrays) into the bootstap method):
>>>> > 1) SwitchTableFactory:createStringSwitchTable(Lookup, String, MethodType,
>>>> > Object[]) whose bootstrap arguments are the following:
>>>> > <default branch id>, <number of String labels>, [<branch id>, <number of
>>>> > String labels corresponsing to the branch>, [<string labels corresponsing
>>>> > to the branch>]*]*
>>>> > 2) SwitchTableFactory:altCreateStringSwitchTable(Lookup, String,
>>>> > MethodType, Object[]) whose bootstrap arguments are the following:
>>>> > <default branch id>, <number of pages (the following argument groups)>,
>>>> > [<page>]*
>>>> > Page is a String of the following pattern:
>>>> > [<HEX branch id>\0<HEX number of String labels corresponsing to the
>>>> > branch>\0[<HEX length of the following String label>\0<String label
>>>> > corresponsing to the branch>]*]*
>>>> > As you can see, it allows for multiple branches (each containing multiple
>>>> > labels) to be described under one page. This is done in order to support
>>>> > big switches which cannot fit into the first method variant.
>>>> > Example:
>>>> > As an example of what could be done by the compiler using this feature,
>>>> > consider the following switch expression:
>>>> > switch (string) {
>>>> > case "foo": <branch 0>
>>>> > case "bar": <branch 1>
>>>> > case "baz": case "qux": <branch 2>
>>>> > default: <branch -1>
>>>> > }
>>>> > Generate bytecode would look like:
>>>> > invokedynamic
>>>> > java/lang/invoke/SwitchTableFactory:createStringSwitchTable(..) {-1, 4,
>>>> > 0, 1, "foo",
>>>> > 1, 1, "bar",
>>>> > 2, 1, "baz", "qux"
>>>> > }
>>>> > <good-old switch-table by code>
>>>> > or (this should happen for big switches)
>>>> > invokedynamic
>>>> > java/lang/invoke/SwitchTableFactory:altCreateStringSwitchTable(..) {-1, 1,
>>>> > "0\0001\0003\000foo1\0001\0003\000bar2\0002\0003\000baz3\000qux"
>>>> > }
>>>> > <good-old switch-table by code>
>>>> > (while the argument looks complicates it is (A) shoter (as each '\000' is
>>>> > actually one char) and (B) linear (meaning it is easy to parse))
>>>> > Implementation details:
>>>> > Current implementation is based on ASM-generation. Under the hood it
>>>> > generates a class ) with a static String (int) method (passed to
>>>> > ConstantCallSite) performing hash-code() switch + equals(Object) (as
>>>> > currently done by javac) handling 0- and 1-branch cases specially (by using
>>>> > a MethodHandle of a method returning a bound value (default branch ID) and
>>>> > by generating an equals(Object) check respectively). The class is
>>>> > anonymously defined via Unsafe as done by other JDK bootstrap methods.
>>>> > Features to be done (this were not done yet as I am not aware yet if this
>>>> > feature will get positive feedback):
>>>> > 1) Add the class and its indy-details into other classes to have it treated
>>>> > specially (as done for LambdaMetaFactory and StringConcatFactory) [3].
>>>> > 2) FIll the documentation of the SwitchTableFactory.
>>>> > 3) Use SwitchTableFactory in javac (probably, as a preview feature) and
>>>> > test its behaviour.
>>>> > Possible improvements:
>>>> > 1) Bootstrap's generator may use its own hash-code algorithm (based on
>>>> > direct access to String's content using special Lookup or Unsafe) which
>>>> > would be more efficient than String#hashCode().
>>>> > 2) Bootstrap's generator may depend on other String's properties, such as
>>>> > compact-mode, length etc.
>>>> > 2) Bootstrap's generator may use switchtable in case of hash-codes being
>>>> > close (yet it seems to be too rare to handle).
>>>> > Subjects to discuss:
>>>> > 1. Should the bootstrap method validate its arguments in order to report
>>>> > irrational arguments (such as usage of duplicate String labels)?
>>>> > 2. Is the current Boostrap-arguments layout fine or should be remade?
>>>> > 3. Which optimizations should be added (and/or toggled by default) to the
>>>> > generator?
>>>> > 4. Should branch IDs be specified explicitly or not (using -1 for default
>>>> > and 0 + i for other branches)?
>>>> > 5. Should there be a way to pass MethodHandles to bootstrap methods to be
>>>> > invoked instead of branch IDs being returned if a switch returns nothing
>>>> > (and should there be a way to do it for stack-affecting switches?)?
>>>> > Possible extensions:
>>>> > 1. As the name of the class suggests, it is a general switch-table factory
>>>> > and it may be used in future for other types of switches. At least,
>>>> > enum-switch *may* be done using it, but (in order to keep it one-switch
>>>> > based) it would be much better to have it implemented via condy producing
>>>> > int-array for ordinals (this would remove the need for extra synthetic
>>>> > class currently being generated).
>>>> > 2. Moreover, this approach may be used for boosted implementations of
>>>> > pattern-matching magic (using the same approach as for the strings) with
>>>> > the exception of the need to pass more complicated structures into the
>>>> > bootstrap method (using condy or references to producing methods or more
>>>> > complicated patterns).
>>>> > 3. In addition to #2, a universal approach may be introduced to allow
>>>> > faster switch by all types of values (which is primary useful for pattern
>>>> > matching) such as passing pairs of int(T) and boolean(T) MethodHandles
>>>> > performing hash-code computation and equality check respectively.
>>>> > I am ready to continue developing this feature (being ready to sign Oracle
>>>> > Contributor Agreement) which includes:
>>>> > - documentation
>>>> > - improvements to the current methods
>>>> > - addition of the class to other
>>>> > - benchmarks and testing
>>>> > - optimization
>>>> > - compiler support for this switch-approach
>>>> > - other features described above, such as a universal switch bootstrap
>>>> > method (Possible extensions, #3)
>>>> > Notes:
>>>> > This might have been discussed in the past (at least indy-based switches in
>>>> > general), if there was a reason for this to not be implemented, please
>>>> > refer me to it. I would be happy to work on this proposal to make this
>>>> > feature (and other ones mentioned) part of OpenJDK.
>>>> > Thanks for your interest and support,
>>>> > Peter
>>>> > Links:
>>>>> [1] [ https://gist.github.com/JarvisCraft/53b6e5e4eb91419b6e852cf63e1c8a2f |
>>>> > https://gist.github.com/JarvisCraft/53b6e5e4eb91419b6e852cf63e1c8a2f ] Patch
>>>> > (dynamic mirror of the file appended)
>>>> > [2]
>>>>> [
>>>>> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/jdk.compiler/share/classes/com/sun/tools/javac/comp/Lower.java#l3692
>>>>> |
>>>>> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/jdk.compiler/share/classes/com/sun/tools/javac/comp/Lower.java#l3692
>>>> > ]
>>>> > Current
>>>> > javac String implementation root
>>>> > [3]
>>>>> [
>>>>> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java
>>>>> |
>>>>> https://hg.openjdk.java.net/amber/amber/file/c033807bfd49/src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java
>>>> > ]
>>>> > One
>>>> > of the cases where JDK's Bootstrap-methods are handled specially
More information about the amber-dev
mailing list