[lworld] RFR: 8235914: [lworld] Profile acmp bytecode

Mon Sep 21 19:13:19 UTC 2020

On Mon, 21 Sep 2020 15:35:46 GMT, Roland Westrelin <roland at openjdk.org> wrote:

> @rose00 Ideally, wouldn't we want to collect a set of:
> 
> (left class, right class, count)
> 
> so if:
> 
> 1. left and right are reference types, we compile acmp as a simple pointer comparison and guard that one of left or
> right is the profiled class

I see, now, that you are hoping for a monomorphic (or nearly monomorphic) left or right argument.  That would allow you
to speculate an exact type for that particular argument, which then simplifies matters.  I agree this is helpful.  If
you can correctly speculate the exact identity of a type, you don't need runtime tests for whether the type is inline
or not.

That said, you *only* need a runtime test for inline types *if* the left and right types are identical.  That's almost
as fast as correctly verifying a single type.

> 2. left is a reference type and right an inline type, we compile acmp as a simple pointer comparison and guard that
> left is the profiled class 3. left is an inline type and right a reference type, we compile acmp as a simple pointer
> comparison and guard that right is the profiled class

Cases 1..3 can be grouped as:

* (1..3) Either of left type or right right can be speculated as a (constant) reference type R.  Guard that type as R and
  finish with pointer comparison.  This is faster than computing *both* types and comparing for equality.  But it is more
  brittle:  It requires one side to be monomorphic.  If the monomorphism guard fails, comparing the two types for
  equality must be done, probably as the immediate next step.

> 4. left and right are inline types, we can guard that they are of different (resp same) types (or that each is of the
> corresponding profiled types) and compile a simple pointer comparison (resp a substitutability test)?

I guess my main contribution here is to suggest that the profile could be more helpful if it would predict whether the
specific condition of inline type equality is frequent or infrequent.  If frequent, then we compile in the S-test.  (If
frequent *and* monomorphic, we guard for the mono-type.  This will eventually help further inline the S-test.)
Otherwise, if infrequent, we guard for type equality (if we can't guard for a mono-type that's a reference) and
uncommon-trap the S-test.

There are several independent factors that can allow us to avoid compiling the S-test (and use an uncommon trap
instead):

A. We can guard on a known reference type, either left or right.  (Monomorphic sites only.)
B. We can guard on *unequal* types.  (Requires loading both types.  Doesn't require monomorphism.  Failure of equality
can quickly fall through to C to profit on equal reference types.) C. We can guard on whether a type is an inline type
or not.  (Requires a dependent load and test from a klass.)

I think those speculations are in order of preference.  To speculate A. we need a 1-element type profile on both sides
of the test, plus statistics on how many outlier events occurred.  That is handled by your design, and covers cases
1..3.  Monomorphism on *both* sides covers part of case 4, but that's going to be rarer than polymorphism on either or
both sides, I think.

Case 4 (guard equal types) is the same as B.  To speculate B. we want a profile on how often types are equal.  That's
not something you have here yet.  To collect it, you could profile klass equality, or (better IMO) profile klass
equality *only* if the klass is *also* an inline (maybe testing equality first, to avoid a dependent load and cache
pollution?).  The interpreter would collect this extra information.

To speculate C. (which is not on your list, and maybe is in the noise) we would need statistics on how often a type (on
either side) is inline.  That's something you can derive from a multi-element profile, in your current design, but only
if the profile doesn't overflow.  I think it's easier to just collect the inline bit separately.

To tie these requirements back to the interpreter profile structure:

A. Requires logging of *reference* types on either side, enough to detect monomorphism.  (Or maybe bi-morphism?
diminishing returns there...) B. Requires logging of *inequality* of *inline* types, enough to warrant an uncommon trap
if inline-equality occurs.  (I don't think unqualified inequality is an interesting thing to profile:  Too rare in
practice; it means a mostly-useless `acmp` test.) C. Requires logging of inline types.  If inline types are very rare,
then we can speculate against them.

This leads me to the following interpreter profile structure:

(left null seen)
{ (left type, frequency), … }, (miss frequency)  // A left
(right null seen)
{ (right type, frequency), … }, (miss frequency)  // A right
{ (same inline type, frequency) … }, (miss frequency)  // B/C left==right && left.inline
(left inline seen)
(right inline seen)

That's three type-lists instead of two, which seems to get unwieldy.  It also suggests that I'm really talking about a
follow-on RFE (which I'm sure has crossed your mind already).

We could simplify this structure and make it more robust as follows:

(left null seen), (right null seen), (same inline seen), (left inline seen), (right inline seen)
{ (type, code, frequency) … }
// enum code { left, right, same }; in low 2 bits of profile record count
(left miss frequency)
(right miss frequency)
(same inline miss frequency)

The idea is to merge the lists, but keep separate "long tail" miss frequencies.  This makes sense for `acmp` because it
is a symmetric operation, and you only need monomorphism on one side.  Arguably, you just need one array of types (at
least 3) to prove monomorphism for either side, and then you get a slightly smaller profile for the other side.  The
equal-types case (significant only for `acmp`) is a natural extension.

My suggestion is to put most of this aside as an RFE.  But you could consider, right now, merging the two lists, adding
two type codes (for left and right).  Then adding the third type code (same) would be simpler as an RFE.

> Anyway, at this point, compiled code calls java.lang.invoke.ValueBootstrapMethods::isSubstitutable() for a
> substituability test. It doesn't even take advantage of known inline types to dispatch to the right comparison method.
> So, profile data makes no difference and we would first need to optimize acmp for known inline types.

That's a moving target.  Clearly we want to inline the polymorphic S-test and then try to "de-virtualize" it when we
can.  The current code has fast paths for all the cases we have already discussed (A, B, C above), and then uses a
`jl.ClassValue` to dispatch to a handler.  Devirtualizing that dispatch will require (a) type information about the
inline type shared by the two operands, and (b) a way to inline through a constant `ClassValue::get` call (please see
https://bugs.openjdk.java.net/browse/JDK-8238260).

The profile information to (eventually) drive JDK-8238260 will be either side, but it will be most efficiently
collected if it is filtered down to cases where (a) the two sides have the same klass and (b) that klass is an inline.
If that filtered profile is monomorphic (which doesn't require the profile as a whole to be monomorphic) then
JDK-8238260 (or some other ad hoc devirtualization of the S-test) can get a serious win.

-------------

PR: https://git.openjdk.java.net/valhalla/pull/185