early steps on the road to customization
john.r.rose at oracle.com
Wed Aug 26 23:14:11 UTC 2020
To succeed with generics over inlines, we need to make
progress on (what I call) the Loop Customization Problem,
a grand challenge for our ecosystem. In a nutshell, we
need to optimize hot code (pre-eminently loops or immediate
calls within loops) that is written generically for reuse, and
is in fact used in multiple specialized settings.
If we had macros or templates, we might ask our users
to code hot-but-reusable methods as macros of templates,
but that would fragment the language.
In the classical case of object-oriented reuse, we focus
on optimizing a super-class method in the setting of
a known receiver type (and perhaps other known types).
To do this, the user must first write at least a small method
in a subclass which calls the superclass method, and then
wait for the JIT to inline the latter method into the former.
The copy of superclass code that is inlined is then customized
at least for the receiver type.
(Sometimes the subclass method can be defined in an ad hoc
way by the Java compiler. This is how equals, hashCode,
and toString are customized in the current preview of Records.
This doesn’t scale unless you have language rules to guide the
placement and contents of the injected methods.)
Requiring even a small subclass method is sometimes too much
manual ceremony. An automatically spun lambda class has no
place in the code to write even the smallest subclass method.
And a generically-instantiated type like List<int> will not have
any place to define subclass methods, because… it’s not a class,
not even a hidden one.
(What goes for generically reusable code also goes for generically
reusable data, just as algorithms and data structures reinforce
each other and are incomplete in isolation. For the JVM,
polymorphic layouts can defined using reflective techniques as
in Panama. Any solution to the code customization problem
would provide a way to treat polymorphic container access
as a specializable algorithm.)
At the JVM level, what’s needed are mechanisms for organizing
specializations of generically reusable code (and data structures)
so that the JVM is guided to customize shared methods and
install the results of customization along hot paths where they
can be directly reached. In HotSpot terms, we need the moral
equivalent of a subclass method which replicates (via inlining)
a superclass method in a customizable context. Without the
actual subclass method, we need a way to identify hot access
points, such as v-table slots, which need the JIT to re-optimize
the inherited method. This implies a method versioning system,
as well as something like an invocation counter on a v-table slot.
Of course customization has to be filtered according to opportunity:
There is no benefit to re-optimizing a method unless it makes use
of its customization context. I think we can heuristically locate
opportunities for re-optimization, but I believe the user should
be in the loop as well, with the JVM taking hints from source
code structure: Maybe annotations (though that’s a “register
keyword” move), certainly hand-plumbed customization code
(using ClassValue, for example), and pre-eminently any new
language features associated with specialized generics. If a
library author opts into specialized generics, that opt-in can
easily be parleyed into a clear signal to the JVM that code
which is parameterized appropriately is a strong candidate
for customized re-optimization.
Some of what I’m saying here is blue-sky future, and other
parts are in the sad bucket of “hard problems we don’t try to
fix because they are too hard”. But given the likely advent
of specialized generics, it is time to start building what we
can, and not waiting for language changes. (Indeed, the
language changes are waiting for a clearer indication of
what the JVM can readily do!)
So I’d like to propose a medium-sized challenge problem,
a simple customizable list that works on both primitives
and references. This can be worked on today, and doesn’t
require inline types or any other Valhalla mechanism.
In order to make the JVM-level problems really clear,
this example class uses ClassValue and MethodHandle APIs
to express the customization cut points. Perhaps a future
language feature would compile specializable generic methods
to self-customize using this same infrastructure, or at least
The superclass holding the generic algorithm defines a
ClassValue which spins up customized method handles
for each significant variation of the class. There is just
one benchmark which runs repeatedly over the elements
of a long list, using List::get (not iterators). The benchmark
wins if List::get is customized and loses (10x) if not.
Here’s the code, and a README with further thoughts.
This is something we can work on today, in preparation
of support of inlines (and primitive) in specialized generics.
(IMO, this work also shows a way to properly factor the
implementation of Record::toString etc., without ad
hoc methods injected by javac.)
Please give it a look, if you are interested in these things.
N.B. I have not yet looked at the machine code of the benchmark,
so it’s possible that the 10x slowdown (on a trip through the
sharable method) has some low-hanging fruit. But I’m
pretty sure there is solid new work needed here.
P.S. For those not keeping close score, the answer is not
“just inline more deeply”, even if you can make my micro
go faster by inlining it until it bleeds. For robust performance
of full applications, we something more, such as method
More information about the valhalla-dev