[lworld] RFR: 8235914: [lworld] Profile acmp bytecode

Fri Sep 25 14:27:04 UTC 2020

On Mon, 21 Sep 2020 19:10:21 GMT, John R Rose <jrose at openjdk.org> wrote:

>> @rose00 Ideally, wouldn't we want to collect a set of:
>> 
>> (left class, right class, count)
>> 
>> so if:
>> 
>> 1. left and right are reference types, we compile acmp as a simple pointer comparison and guard that one of left or
>> right is the profiled class 2. left is a reference type and right an inline type, we compile acmp as a simple pointer
>> comparison and guard that left is the profiled class 3. left is an inline type and right a reference type, we compile
>> acmp as a simple pointer comparison and guard that right is the profiled class 4. left and right are inline types, we
>> can guard that they are of different (resp same) types (or that each is of the corresponding profiled types) and
>> compile a simple pointer comparison (resp a substitutability test)?  Anyway, at this point, compiled code calls
>> java.lang.invoke.ValueBootstrapMethods::isSubstitutable() for a substituability test. It doesn't even take advantage of
>> known inline types to dispatch to the right comparison method. So, profile data makes no difference and we would first
>> need to optimize acmp for known inline types.
>
>> @rose00 Ideally, wouldn't we want to collect a set of:
>> 
>> (left class, right class, count)
>> 
>> so if:
>> 
>> 1. left and right are reference types, we compile acmp as a simple pointer comparison and guard that one of left or
>> right is the profiled class
> 
> I see, now, that you are hoping for a monomorphic (or nearly monomorphic) left or right argument.  That would allow you
> to speculate an exact type for that particular argument, which then simplifies matters.  I agree this is helpful.  If
> you can correctly speculate the exact identity of a type, you don't need runtime tests for whether the type is inline
> or not.  That said, you *only* need a runtime test for inline types *if* the left and right types are identical.
> That's almost as fast as correctly verifying a single type.
>> 2. left is a reference type and right an inline type, we compile acmp as a simple pointer comparison and guard that
>> left is the profiled class 3. left is an inline type and right a reference type, we compile acmp as a simple pointer
>> comparison and guard that right is the profiled class
> 
> Cases 1..3 can be grouped as:
> 
> * (1..3) Either of left type or right right can be speculated as a (constant) reference type R.  Guard that type as R and
>   finish with pointer comparison.  This is faster than computing *both* types and comparing for equality.  But it is more
>   brittle:  It requires one side to be monomorphic.  If the monomorphism guard fails, comparing the two types for
>   equality must be done, probably as the immediate next step.
> 
>> 4. left and right are inline types, we can guard that they are of different (resp same) types (or that each is of the
>> corresponding profiled types) and compile a simple pointer comparison (resp a substitutability test)?
> 
> I guess my main contribution here is to suggest that the profile could be more helpful if it would predict whether the
> specific condition of inline type equality is frequent or infrequent.  If frequent, then we compile in the S-test.  (If
> frequent *and* monomorphic, we guard for the mono-type.  This will eventually help further inline the S-test.)
> Otherwise, if infrequent, we guard for type equality (if we can't guard for a mono-type that's a reference) and
> uncommon-trap the S-test.  There are several independent factors that can allow us to avoid compiling the S-test (and
> use an uncommon trap instead):  A. We can guard on a known reference type, either left or right.  (Monomorphic sites
> only.) B. We can guard on *unequal* types.  (Requires loading both types.  Doesn't require monomorphism.  Failure of
> equality can quickly fall through to C to profit on equal reference types.) C. We can guard on whether a type is an
> inline type or not.  (Requires a dependent load and test from a klass.)  I think those speculations are in order of
> preference.  To speculate A. we need a 1-element type profile on both sides of the test, plus statistics on how many
> outlier events occurred.  That is handled by your design, and covers cases 1..3.  Monomorphism on *both* sides covers
> part of case 4, but that's going to be rarer than polymorphism on either or both sides, I think.  Case 4 (guard equal
> types) is the same as B.  To speculate B. we want a profile on how often types are equal.  That's not something you
> have here yet.  To collect it, you could profile klass equality, or (better IMO) profile klass equality *only* if the
> klass is *also* an inline (maybe testing equality first, to avoid a dependent load and cache pollution?).  The
> interpreter would collect this extra information.  To speculate C. (which is not on your list, and maybe is in the
> noise) we would need statistics on how often a type (on either side) is inline.  That's something you can derive from a
> multi-element profile, in your current design, but only if the profile doesn't overflow.  I think it's easier to just
> collect the inline bit separately.  To tie these requirements back to the interpreter profile structure:  A. Requires
> logging of *reference* types on either side, enough to detect monomorphism.  (Or maybe bi-morphism?  diminishing
> returns there...) B. Requires logging of *inequality* of *inline* types, enough to warrant an uncommon trap if
> inline-equality occurs.  (I don't think unqualified inequality is an interesting thing to profile:  Too rare in
> practice; it means a mostly-useless `acmp` test.) C. Requires logging of inline types.  If inline types are very rare,
> then we can speculate against them.  This leads me to the following interpreter profile structure:  (left null seen) {
> (left type, frequency), … }, (miss frequency)  // A left (right null seen)
> { (right type, frequency), … }, (miss frequency)  // A right
> { (same inline type, frequency) … }, (miss frequency)  // B/C left==right && left.inline
> (left inline seen)
> (right inline seen)
> 
> That's three type-lists instead of two, which seems to get unwieldy.  It also suggests that I'm really talking about a
> follow-on RFE (which I'm sure has crossed your mind already).
> We could simplify this structure and make it more robust as follows:
> 
> (left null seen), (right null seen), (same inline seen), (left inline seen), (right inline seen)
> { (type, code, frequency) … }
> // enum code { left, right, same }; in low 2 bits of profile record count
> (left miss frequency)
> (right miss frequency)
> (same inline miss frequency)
> 
> The idea is to merge the lists, but keep separate "long tail" miss frequencies.  This makes sense for `acmp` because it
> is a symmetric operation, and you only need monomorphism on one side.  Arguably, you just need one array of types (at
> least 3) to prove monomorphism for either side, and then you get a slightly smaller profile for the other side.  The
> equal-types case (significant only for `acmp`) is a natural extension.  My suggestion is to put most of this aside as
> an RFE.  But you could consider, right now, merging the two lists, adding two type codes (for left and right).  Then
> adding the third type code (same) would be simpler as an RFE.
>> Anyway, at this point, compiled code calls java.lang.invoke.ValueBootstrapMethods::isSubstitutable() for a
>> substituability test. It doesn't even take advantage of known inline types to dispatch to the right comparison method.
>> So, profile data makes no difference and we would first need to optimize acmp for known inline types.
> 
> That's a moving target.  Clearly we want to inline the polymorphic S-test and then try to "de-virtualize" it when we
> can.  The current code has fast paths for all the cases we have already discussed (A, B, C above), and then uses a
> `jl.ClassValue` to dispatch to a handler.  Devirtualizing that dispatch will require (a) type information about the
> inline type shared by the two operands, and (b) a way to inline through a constant `ClassValue::get` call (please see
> https://bugs.openjdk.java.net/browse/JDK-8238260).  The profile information to (eventually) drive JDK-8238260 will be
> either side, but it will be most efficiently collected if it is filtered down to cases where (a) the two sides have the
> same klass and (b) that klass is an inline.  If that filtered profile is monomorphic (which doesn't require the profile
> as a whole to be monomorphic) then JDK-8238260 (or some other ad hoc devirtualization of the S-test) can get a serious
> win.

@rose00 Thanks for the discussion and suggestions. I propose I file 2 RFEs for this (one to improve handling of known
inline types at acmp and another one to improve profile collection at acmp). I think the patch as it is today is a
worthwhile improvement and I'd like to move forward with it as it is (even small improvements to profiling can turn out
to be quite a bit of work given the interpreter, c1 and c2 all need to be adjusted).

-------------

PR: https://git.openjdk.java.net/valhalla/pull/185