indy-based string-switch proposal

Tue Nov 5 18:27:18 UTC 2019

On Nov 4, 2019, at 3:42 PM, Remi Forax <forax at univ-mlv.fr> wrote:
> 
> I term of arguments to sent to the boostrap methods, you can only send the strings (concatenated as one big string)
> with the convention that you will return 0 for the first one, 1 for the next one, etc.

Yes, the effect of this is to hand the JIT a dense set of case indexes.

This is IMO a major sweet spot for all indy-based switches.  I know that the JIT
likes to work with dense switch cases more than sparse ones, but javac doesn’t
have enough information to pick a good algorithm for generating dense indexes.
(Note that some useful information, like identity hash codes, is not available to
the static compiler.)  Dense switch indexes can simplify the JIT’s job, especially
when it wants to create a jump table.

Packing down the indexes to a dense set from some unconditioned set of strings
or other objects is an appropriate job for library code.  The library code can take
string hash codes and/or identity hash codes and spin up a perfect hash function
that runs in a handful of ALU ops and outputs a nearly dense set of indexes
(say 50% or better).  It doesn’t take many shift/xor/add steps (or whatever other
fast instructions the CPU has up its sleeve) to mix N arbitrary bits patterns into a
collision free image in a range of size (say) 2^ceil(lg(N)+e), for smallish e.  All
the parameters and searches required by that can be hidden inside the indy BSM
and updated in library code, independently of the static compiler.

These PHF outputs can then be routed through a short permutation/compression table
before handing off to the JIT as true dense numbers in 1..N or whatever.  If the table is
(as it should be) a compile time constant, the JIT will then have two good choices for
deciding how to lay out its jump table, or decision tree, either using the output of the
compression table, or using its input.

This technique potentially applies to any switch that performs complex data-driven
classification.  For example, a “typecase” style switch would pull out some
x.getClass and treat that thingy as input for a classification function, much like
the case of strings above.

Even small integers can sometimes benefit from this treatment.  I experimented
with it here recently, to map ascii chars like ‘I’ to more dense codes like T_INT:
  http://cr.openjdk.java.net/~jrose/jvm/consolidate-8230199/perfect-hash.patch

Another possible use of indy would be to compile *expression* switches with
constant outputs with special BSMs.  There are additional transforms which library
code could use to preprocess the classification problem for the JIT, to use flow-free
classification code, based on the switch keys.  In particular if the switch produces
only the outputs “true” and “false”, then the runtime and JIT can sometimes produce
very good code, depending on the case values and the capabilities of the CPU.

I noticed this when working with C++ on other character classification problems.
Sometimes the compiler (LLVM) would notice that the input cases were all within
64 of each other, and gin up a fast bit test against a 64-bit mask.  (Nice work, LLVM!)
That happened in this code:
  http://cr.openjdk.java.net/~jrose/jvm/verifier-8230199/src/hotspot/share/runtime/signature.cpp.udiff.html
  (see check_field_name_char, check_method_name_char, etc.)

Bottom line:  If we factor stuff right, then new switches will be not only a decent high
level notation for classification problems, but they will also be the fastest way to get the
job done, in many cases.

None of the above is easy, but (again) if we factor it so many decisions are made
inside a BSM, we can more easily work on stepwise improvements over time,
and they will automatically apply to existing classfiles.

— John