RFR: 8213478: Reduce rebinds when applying repeated filters and conversions

Wed Nov 7 14:57:33 UTC 2018

Hi,

by providing a LambdaFormEditor transform which applies the same filter 
repeatedly on some selected arguments, we can optimize for cases where 
there are repeated conversions. This allows the runtime to optimize the 
internal setup of public API methods such as 
MethodHandles.filterArguments(MethodHandle, int, MethodHandle...) and 
MethodHandle.asType(MethodType)

Webrev: http://cr.openjdk.java.net/~redestad/8213478/jdk.00/
Bug: https://bugs.openjdk.java.net/browse/JDK-8213478

Example: mh.asType(mt1) where the mh.type is (JJJJ)I and mt1 is (IIII)I 
would need to add I->J widening conversions to each argument. Each 
application of such a conversion C creates a new BMH with a new species, 
first (JIII)I, with C bound in as a 5th (or 6th, counting the MH) 
argument, then from that we create (JJII)I with C bound in again as the 
6th argument.. and so on.

With the optimization in place we can go from (IIII)I to (JJJJ)I in one 
step: bind in C as the 5th argument, apply expressions to the LF to call 
it on each parameter in order. In this example we then end up with only 
one additional BMH species rather than four.

In synthetic examples invoking methods with maximal number of arguments 
where all arguments need the same filter to be applied, the number of 
intermediary species and LF classes generated can be reduced from up to 
~290 to 6 (98% reduction)[1], while in less contrived examples the 
reduction is more modest, down to no difference in number of shapes 
generated in completely randomized tests. Repeated filters and 
conversions are common in practice, for example in the ISC 
implementation, where this patch modestly reduce the number of shapes 
and LFs generated in a range of tests - from substantial reductions down 
to around 2% for mostly random argument sequences[2]. Without a formal 
proof I posit that the upper bound of generated species and LFs is the 
same with both implementations, just that this reduces the need to 
generate certain intermediaries.

The proposed patch could surely be micro-optimized, e.g., there are 
cases where I'm copying arrays and creating TreeMaps around. I do this 
to simplify insertion of arguments into the right place and ensure 
execution happens in the expected order etc. Some of that could perhaps 
be specialized, but these overheads are tiny compared to the class 
generation overheads we're avoiding.

Thanks!

/Claes

[1] http://cr.openjdk.java.net/~redestad/8213478/BadInvoke.java

Before:
time java -Xlog:class+load BadInvoke | wc -l
819

real    0m0.753s
user    0m4.120s
sys    0m0.164s

After:
time java -Xlog:class+load BadInvoke | wc -l
540

real    0m0.267s
user    0m0.580s
sys    0m0.044s

[2] Generated ISC stress tests such as: 
http://cr.openjdk.java.net/~redestad/8213478/MixedStringCombinations.java 
- which loads and creates 37936 classes before, 37114 after, or -2.2%