early steps on the road to customization

Thu Aug 27 15:37:41 UTC 2020

Since several of these require trusting final fields, and there already 
is a TNSFF hook, does enabling that affect the macro-observed 
performance, or are these things hidden behind bigger roadblocks?

On 8/26/2020 7:14 PM, John Rose wrote:
> To succeed with generics over inlines, we need to make
> progress on (what I call) the Loop Customization Problem,
> a grand challenge for our ecosystem.  In a nutshell, we
> need to optimize hot code (pre-eminently loops or immediate
> calls within loops) that is written generically for reuse, and
> is in fact used in multiple specialized settings.
>
> If we had macros or templates, we might ask our users
> to code hot-but-reusable methods as macros of templates,
> but that would fragment the language.
>
> In the classical case of object-oriented reuse, we focus
> on optimizing a super-class method in the setting of
> a known receiver type (and perhaps other known types).
> To do this, the user must first write at least a small method
> in a subclass which calls the superclass method, and then
> wait for the JIT to inline the latter method into the former.
> The copy of superclass code that is inlined is then customized
> at least for the receiver type.
>
> (Sometimes the subclass method can be defined in an ad hoc
> way by the Java compiler.  This is how equals, hashCode,
> and toString are customized in the current preview of Records.
> This doesn’t scale unless you have language rules to guide the
> placement and contents of the injected methods.)
>
> Requiring even a small subclass method is sometimes too much
> manual ceremony.  An automatically spun lambda class has no
> place in the code to write even the smallest subclass method.
> And a generically-instantiated type like List<int> will not have
> any place to define subclass methods, because… it’s not a class,
> not even a hidden one.
>
> (What goes for generically reusable code also goes for generically
> reusable data, just as algorithms and data structures reinforce
> each other and are incomplete in isolation.  For the JVM,
> polymorphic layouts can defined using reflective techniques as
> in Panama.  Any solution to the code customization problem
> would provide a way to treat polymorphic container access
> as a specializable algorithm.)
>
> At the JVM level, what’s needed are mechanisms for organizing
> specializations of generically reusable code (and data structures)
> so that the JVM is guided to customize shared methods and
> install the results of customization along hot paths where they
> can be directly reached.  In HotSpot terms, we need the moral
> equivalent of a subclass method which replicates (via inlining)
> a superclass method in a customizable context.  Without the
> actual subclass method, we need a way to identify hot access
> points, such as v-table slots, which need the JIT to re-optimize
> the inherited method.  This implies a method versioning system,
> as well as something like an invocation counter on a v-table slot.
>
> Of course customization has to be filtered according to opportunity:
> There is no benefit to re-optimizing a method unless it makes use
> of its customization context.  I think we can heuristically locate
> opportunities for re-optimization, but I believe the user should
> be in the loop as well, with the JVM taking hints from source
> code structure:  Maybe annotations (though that’s a “register
> keyword” move), certainly hand-plumbed customization code
> (using ClassValue, for example), and pre-eminently any new
> language features associated with specialized generics.  If a
> library author opts into specialized generics, that opt-in can
> easily be parleyed into a clear signal to the JVM that code
> which is parameterized appropriately is a strong candidate
> for customized re-optimization.
>
> Some of what I’m saying here is blue-sky future, and other
> parts are in the sad bucket of “hard problems we don’t try to
> fix because they are too hard”.  But given the likely advent
> of specialized generics, it is time to start building what we
> can, and not waiting for language changes.  (Indeed, the
> language changes are waiting for a clearer indication of
> what the JVM can readily do!)
>
> So I’d like to propose a medium-sized challenge problem,
> a simple customizable list that works on both primitives
> and references.  This can be worked on today, and doesn’t
> require inline types or any other Valhalla mechanism.
>
> In order to make the JVM-level problems really clear,
> this example class uses ClassValue and MethodHandle APIs
> to express the customization cut points.  Perhaps a future
> language feature would compile specializable generic methods
> to self-customize using this same infrastructure, or at least
> something similar.
>
> The superclass holding the generic algorithm defines a
> ClassValue which spins up customized method handles
> for each significant variation of the class.  There is just
> one benchmark which runs repeatedly over the elements
> of a long list, using List::get (not iterators).  The benchmark
> wins if List::get is customized and loses (10x) if not.
>
> Here’s the code, and a README with further thoughts.
>
> http://cr.openjdk.java.net/~jrose/values/CustomizationExample/
> http://cr.openjdk.java.net/~jrose/values/CustomizationExample/README.html
>
> This is something we can work on today, in preparation
> of support of inlines (and primitive) in specialized generics.
>
> (IMO, this work also shows a way to properly factor the
> implementation of Record::toString etc., without ad
> hoc methods injected by javac.)
>
> Please give it a look, if you are interested in these things.
> N.B. I have not yet looked at the machine code of the benchmark,
> so it’s possible that the 10x slowdown (on a trip through the
> sharable method) has some low-hanging fruit.  But I’m
> pretty sure there is solid new work needed here.
>
> — John
>
> P.S. For those not keeping close score, the answer is not
> “just inline more deeply”, even if you can make my micro
> go faster by inlining it until it bleeds.  For robust performance
> of full applications, we something more, such as method
> versioning.