Lambda special inline treatment is desirable elsewhere
Remi Forax
forax at univ-mlv.fr
Tue Sep 26 20:54:52 UTC 2023
Hi Randall,
you can use the destructured visitor pattern for that, i.e. a visitor pattern where for each visit method, you can ask for several inlining caches to do the recursive calls
Let suppose you have the following hierarchy
interface Vehicle {}
record Car(int passenger) implements Vehicle { }
record CarrierTruck(Vehicle vehicle) implements Vehicle { }
The idea is to provide a visit method that do the work for a subtype (here CarrierTruck) but also ask to get a dispatch method to visit the rest of the tree.
The annotation @Signature is the signature of the dispatch method. Here, it returns a String and takes a Vehicle and an int as parameters.
static String toJSON(CarrierTruck truck, int depth,
@Signature({String.class, Vehicle.class, int.class}) MethodHandle dispatch) throws Throwable {
return STR."""
{
"vehicle" : \{ (String) dispatch.invokeExact(truck.vehicle, depth + 2) }
}
""".indent(depth);
}
Then you can create a destructured visitor with all the visit method and ask for a dispatch method
var lookup = MethodHandles.lookup();
var methods = Arrays.stream(Demo.class.getDeclaredMethods())
.filter(method -> method.getName().equals("toJSON"))
.toList();
var destructuredVisitor = DestructuredVisitor.of(lookup, methods);
var mh = destructuredVisitor.createVisitor(String.class, Demo.Vehicle.class, int.class);
System.out.println("mh " + mh); // the dispatch method
var truck = new Demo.CarrierTruck(new Demo.Car(4));
var result = (String) mh.invokeExact((Demo.Vehicle) truck, 0);
System.out.println(result);
Here is a gist with a rough sketch showing how to implement it
https://gist.github.com/forax/176a61cc9707c884bfb059c31811dac9
I believe the JIT should be able to fully optimize this kind of code.
regard,
Rémi
----- Original Message -----
> From: "Randall Oveson" <rco at randalloveson.com>
> To: "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com>
> Cc: "hotspot-compiler-dev" <hotspot-compiler-dev at openjdk.org>
> Sent: Tuesday, September 26, 2023 10:00:13 PM
> Subject: Re: Lambda special inline treatment is desirable elsewhere
> Hi Vladimir,
>
> Thanks for the reply. I've been poking at this problem for the past couple weeks
> but mostly abandoned my attempts to solve it using the JIT alone in favor of
> generating composed functions using ASM. However, I would still love to see a
> future where that isn't necessary and I can just throw together a tower of
> virtual calls and have them either inlined or staticized by the JIT reliably.
>
> I've put together an example that I think better illustrates the problem I'm
> trying to solve[0]. Instead of a JVM patch, here I get the performance increase
> I want by copy-and-pasting my composing classes and never reusing the same one,
> causing the JIT to consider all the call sites monomorphic and freely inline or
> staticize them. The baseline is the "idiomatic" form, and suffers from
> recursive inline limits as well as virtual call sites in ways that I wish it
> didn't.
>
> This problem is more severe the more polymorphism there is (bigger runtime
> dispatch tables?) and the simpler the logical behavior of the complete
> invocation is (e.g. an expression compiler working with `x + 1000 / y - 50 * z
> / 2`, or an encoder going from an Object[] of Long, Long, Int, Double, Long to
> a compact struct in off-heap memory). Instances where throughput of the
> hand-written function would be extremely high, so the
> interface-composed-at-runtime function is dominated by dispatch.
>
> 0. https://github.com/randalloveson/hotspot-inline-example
>
>
>
>
> ------- Original Message -------
> On Tuesday, September 26th, 2023 at 11:18 AM, Vladimir Ivanov
> <vladimir.x.ivanov at oracle.com> wrote:
>
>
>>
>>
>> Hi Randall,
>>
>> I don't fully understand what kind of change you experimented with. Do
>> you mind sharing the patch?
>>
>> Compilers have special handling for lambda forms
>> (java.lang.invoke.LambdaForm) which are the crucial piece of performant
>> invokedynamic and java.lang.invoke implementation. Lambda forms are
>> aggressively shared and many distinct MethodHandles share LambdaForm
>> instances. Based on that knowledge, JVM special case them in several
>> places. The check you refer to in InlineTree::try_to_inline() lifts
>> recursive inlining constraints from LambdaForms to MethodHandles, but
>> the constraint is still there.
>>
>> Lambdas (Java language feature) are implemented on top of invokedynamic
>> and JVM doesn't do anything particular to optimize specifically for them.
>>
>> It would be really helpful if you share a benchmark demonstrating the
>> use case you care about.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 9/7/23 10:49, Randall Oveson wrote:
>>
>> > I'm considering a patch to improve the performance of a common pattern
>> > in my (and plausibly others') application. The pattern relates to
>> > polymorphic processing of records or tuples, e.g. serializing or
>> > deserializing an Avro or CSV record, or evaluating a runtime-constructed
>> > expression tree.
>> >
>> > You have an immutable tree (often a mere list) of objects implementing a
>> > common interface. From a CHA perspective the interface is megamorphic,
>> > but it's always runtime-monomorphic at most call sites (anything within
>> > the immutable tree). The methods themselves are often cheap, sometimes
>> > as simple as reading a single byte from a stream or doing a single
>> > arithmetic operation, so it's imperative that they all be inlined. You
>> > might say these methods are "dominated by their composition with other
>> > methods".
>> >
>> > In practice it is not possible to tune C2's inlining acceptably for this
>> > pattern for a few reasons, but the major one is the recursive inlining
>> > detection. If your tuple-processor has to deal with, say, 15-integer
>> > type values (so 15 of the methods in the immutable call tree happen to
>> > be the same method), you won't see any inlining happen because
>> > InlineTree::try_to_inline considers these calls recursive and the
>> > default MaxRecursiveInlineLevel is 1. Intuitively, these calls aren't
>> > really "recursive" in the classic sense; the number of calls to the same
>> > method is statically bounded, and there's nothing significant about them
>> > being the same call anyway; they could just as well have been different
>> > calls if the tuple types at those positions had been different.
>> >
>> > It seems this problem was well-observed with lambdas, because there's an
>> > exception carved out in try_to_inline for lambda-form methods. In those
>> > cases, we check to see if the argument 0 ("receiver") of the method is
>> > the same before considering it recursive.
>> >
>> > One patch I tested is extending that lambda-form detection of recursive
>> > inlining to all non-static methods. That solves my performance problem
>> > and doesn't appear to cause any new performance problems in my project,
>> > but I can imagine cases where it might be problematic. Still, I think
>> > it's worth considering as a solution if it hasn't been already.
>> >
>> > Another patch I've got is one that treats any non-static method that is
>> > also @ForceInline the same as lambda-form methods in the recursive
>> > inline check, along with a change to classFileParser.cpp to allow the
>> > use of @ForceInline outside of privileged code (the latter change I'd
>> > bet has been proposed before). This also solves my problem, but I doubt
>> > it would be acceptable upstream.
>> >
>> > I think my intuition about lambdas--which I'd hesitantly suggest is the
>> > popular intuition about lambdas--being merely "syntatic sugar" for
>> > ad-hoc abstract method implementations is at odds with the current state
>> > of C2. The more considerate and aggressive inlining behavior is
>> > extremely important for any immutable tree of compile-time-polymorphic,
>> > runtime-monomorphic calls. It's unfortunate that the only way to access
>> > that behavior is by using a different syntax, which may not be
>> > appropriate for other reasons.
>> >
>> > I'd appreciate any better ideas than the ones I've proposed here. I only
>> > started digging into this recently and it's my first time on the openjdk
>> > lists, so thanks in advance for your patience.
>> >
> > > Randall
More information about the hotspot-compiler-dev
mailing list