Lambda special inline treatment is desirable elsewhere

Thu Sep 7 17:49:08 UTC 2023

I'm considering a patch to improve the performance of a common pattern
in my (and plausibly others') application. The pattern relates to
polymorphic processing of records or tuples, e.g. serializing or
deserializing an Avro or CSV record, or evaluating a runtime-constructed
expression tree.

You have an immutable tree (often a mere list) of objects implementing a
common interface. From a class hierarchy analysis (CHA) perspective the
interface is megamorphic, but at runtime most call sites (anything
within the immutable tree) are monomorphic. The methods themselves are
often cheap, sometimes
as simple as reading a single byte from a stream or doing a single
arithmetic operation, so it's imperative that they all be inlined. You
might say these methods are "dominated by their composition with other
methods".
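To make the pattern concrete, here is a minimal sketch of a
runtime-constructed expression tree. All names (RecordPipeline, Node,
Column, Const, Add, Mul) are invented for illustration and are not taken
from the application described above.

```java
public class RecordPipeline {
    /** Common interface. With several implementations loaded, CHA sees it as megamorphic. */
    interface Node {
        long eval(long[] row);
    }

    /** Reads one column of the row: about as cheap as a method gets. */
    record Column(int index) implements Node {
        public long eval(long[] row) { return row[index]; }
    }

    /** A constant operand. */
    record Const(long value) implements Node {
        public long eval(long[] row) { return value; }
    }

    /** Children are fixed at construction, so for a given tree each child call has one target. */
    record Add(Node left, Node right) implements Node {
        public long eval(long[] row) { return left.eval(row) + right.eval(row); }
    }

    record Mul(Node left, Node right) implements Node {
        public long eval(long[] row) { return left.eval(row) * right.eval(row); }
    }

    public static void main(String[] args) {
        // (c0 + 3) * c1, built once at runtime and then evaluated per row.
        Node expr = new Mul(new Add(new Column(0), new Const(3)), new Column(1));
        System.out.println(expr.eval(new long[] {4, 5})); // prints 35
    }
}
```

Each eval body is trivial, so nearly all of the cost is in the calls
between nodes.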

In practice it is not possible to tune C2's inlining acceptably for this
pattern for a few reasons, but the major one is the recursive inlining
detection. If your tuple-processor has to deal with, say, 15
integer-typed values (so 15 of the methods in the immutable call tree
happen to be the same method), you won't see any inlining happen because
InlineTree::try_to_inline considers these calls recursive and the
default MaxRecursiveInlineLevel is 1. Intuitively, these calls aren't
really "recursive" in the classic sense; the number of calls to the same
method is statically bounded, and there's nothing significant about them
being the same call anyway; they could just as well have been different
calls if the tuple types at those positions had been different.
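A sketch of how the 15-integer case can put the same method on the
inline stack many times. The chained layout, where each field writer
calls the next, is my assumption about one way such a processor might be
structured; the names (IntChain, FieldWriter, IntField, End) are
hypothetical.

```java
import java.nio.ByteBuffer;

public class IntChain {
    interface FieldWriter {
        void write(int[] row, ByteBuffer out);
    }

    /** Terminates the chain. */
    static final class End implements FieldWriter {
        public void write(int[] row, ByteBuffer out) { }
    }

    /** Writes one int field, then delegates to the writer for the next field. */
    static final class IntField implements FieldWriter {
        private final int index;
        private final FieldWriter next;

        IntField(int index, FieldWriter next) {
            this.index = index;
            this.next = next;
        }

        public void write(int[] row, ByteBuffer out) {
            out.putInt(row[index]);
            // When the next field is also an int, this calls IntField.write again,
            // on a different receiver, nested inside the current call.
            next.write(row, out);
        }
    }

    /** Builds a writer for a record of fieldCount int fields. */
    static FieldWriter intRecord(int fieldCount) {
        FieldWriter writer = new End();
        for (int i = fieldCount - 1; i >= 0; i--) {
            writer = new IntField(i, writer);
        }
        return writer;
    }

    public static void main(String[] args) {
        int[] row = new int[15];
        for (int i = 0; i < row.length; i++) {
            row[i] = i * 10;
        }
        ByteBuffer out = ByteBuffer.allocate(15 * Integer.BYTES);
        intRecord(15).write(row, out);
        System.out.println(out.position()); // prints 60
    }
}
```

The nesting depth here is fixed by the record shape (15), not by the
data, which is the sense in which these calls are statically bounded. To
watch the decisions C2 makes for code like this, one option is running
with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining.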

It seems this problem was well-observed with lambdas, because there's an
exception carved out in try_to_inline for lambda-form methods. In those
cases, a call only counts toward the recursion limit if argument 0 (the
"receiver") is the same as in the enclosing call to that method.
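The difference between the two checks can be modelled roughly as
follows. This is a simplified Java model of the behavior as I understand
it, not a transcription of the HotSpot source; Frame, countByMethod,
countByMethodAndReceiver and allowed are hypothetical names, and the
real check operates on compiler IR nodes rather than Java objects.

```java
import java.util.List;

public class RecursionCheckModel {
    /** One entry on the inline stack: which method, and which receiver it was called on. */
    record Frame(String method, Object receiver) {}

    /** Ordinary methods: every enclosing frame running the same method counts. */
    static int countByMethod(List<Frame> enclosing, Frame callee) {
        int level = 0;
        for (Frame frame : enclosing) {
            if (frame.method().equals(callee.method())) {
                level++;
            }
        }
        return level;
    }

    /** Lambda-form style: a frame counts only if the receiver is also the same object. */
    static int countByMethodAndReceiver(List<Frame> enclosing, Frame callee) {
        int level = 0;
        for (Frame frame : enclosing) {
            if (frame.method().equals(callee.method())
                    && frame.receiver() == callee.receiver()) {
                level++;
            }
        }
        return level;
    }

    /** Inlining is refused once the count exceeds the limit (1 by default, per the text above). */
    static boolean allowed(int level, int maxRecursiveInlineLevel) {
        return level <= maxRecursiveInlineLevel;
    }

    public static void main(String[] args) {
        Object a = new Object(), b = new Object(), c = new Object(), d = new Object();
        List<Frame> enclosing = List.of(
            new Frame("IntField.write", a),
            new Frame("IntField.write", b),
            new Frame("IntField.write", c));
        Frame callee = new Frame("IntField.write", d);

        System.out.println(allowed(countByMethod(enclosing, callee), 1));            // false
        System.out.println(allowed(countByMethodAndReceiver(enclosing, callee), 1)); // true
    }
}
```

In this model, a chain of distinct receivers running the same method is
rejected by the method-only count but accepted by the receiver-aware
count, while a genuine self-call on the same receiver is still limited.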

One patch I tested is extending that lambda-form detection of recursive
inlining to all non-static methods. That solves my performance problem
and doesn't appear to cause any new performance problems in my project,
but I can imagine cases where it might be problematic. Still, I think
it's worth considering as a solution if it hasn't been already.

Another patch I've got is one that treats any non-static method that is
also @ForceInline the same as lambda-form methods in the recursive
inline check, along with a change to classFileParser.cpp to allow the
use of @ForceInline outside of privileged code (the latter change I'd
bet has been proposed before). This also solves my problem, but I doubt
it would be acceptable upstream.

I think my intuition about lambdas (which I'd hesitantly suggest is the
popular intuition), that they are merely "syntactic sugar" for ad-hoc
abstract method implementations, is at odds with the current state of
C2. The more considerate and aggressive inlining behavior is
extremely important for any immutable tree of compile-time-polymorphic,
runtime-monomorphic calls. It's unfortunate that the only way to access
that behavior is by using a different syntax, which may not be
appropriate for other reasons.

I'd appreciate any better ideas than the ones I've proposed here. I only
started digging into this recently and it's my first time on the openjdk
lists, so thanks in advance for your patience.

Randall