[lworld] RFR: 8235914: [lworld] Profile acmp bytecode
Roland Westrelin
roland at openjdk.java.net
Fri Sep 25 14:27:04 UTC 2020
On Mon, 21 Sep 2020 19:10:21 GMT, John R Rose <jrose at openjdk.org> wrote:
>> @rose00 Ideally, wouldn't we want to collect a set of:
>>
>> (left class, right class, count)
>>
>> so if:
>>
>> 1. left and right are reference types, we compile acmp as a simple pointer comparison and guard that one of left or
>> right is the profiled class 2. left is a reference type and right an inline type, we compile acmp as a simple pointer
>> comparison and guard that left is the profiled class 3. left is an inline type and right a reference type, we compile
>> acmp as a simple pointer comparison and guard that right is the profiled class 4. left and right are inline types, we
>> can guard that they are of different (resp same) types (or that each is of the corresponding profiled types) and
>> compile a simple pointer comparison (resp a substitutability test)? Anyway, at this point, compiled code calls
>> java.lang.invoke.ValueBootstrapMethods::isSubstitutable() for a substituability test. It doesn't even take advantage of
>> known inline types to dispatch to the right comparison method. So, profile data makes no difference and we would first
>> need to optimize acmp for known inline types.
>
>> @rose00 Ideally, wouldn't we want to collect a set of:
>>
>> (left class, right class, count)
>>
>> so if:
>>
>> 1. left and right are reference types, we compile acmp as a simple pointer comparison and guard that one of left or
>> right is the profiled class
>
> I see, now, that you are hoping for a monomorphic (or nearly monomorphic) left or right argument. That would allow you
> to speculate an exact type for that particular argument, which then simplifies matters. I agree this is helpful. If
> you can correctly speculate the exact identity of a type, you don't need runtime tests for whether the type is inline
> or not. That said, you *only* need a runtime test for inline types *if* the left and right types are identical.
> That's almost as fast as correctly verifying a single type.
>> 2. left is a reference type and right an inline type, we compile acmp as a simple pointer comparison and guard that
>> left is the profiled class 3. left is an inline type and right a reference type, we compile acmp as a simple pointer
>> comparison and guard that right is the profiled class
>
> Cases 1..3 can be grouped as:
>
> * (1..3) Either of left type or right right can be speculated as a (constant) reference type R. Guard that type as R and
> finish with pointer comparison. This is faster than computing *both* types and comparing for equality. But it is more
> brittle: It requires one side to be monomorphic. If the monomorphism guard fails, comparing the two types for
> equality must be done, probably as the immediate next step.
>
>> 4. left and right are inline types, we can guard that they are of different (resp same) types (or that each is of the
>> corresponding profiled types) and compile a simple pointer comparison (resp a substitutability test)?
>
> I guess my main contribution here is to suggest that the profile could be more helpful if it would predict whether the
> specific condition of inline type equality is frequent or infrequent. If frequent, then we compile in the S-test. (If
> frequent *and* monomorphic, we guard for the mono-type. This will eventually help further inline the S-test.)
> Otherwise, if infrequent, we guard for type equality (if we can't guard for a mono-type that's a reference) and
> uncommon-trap the S-test. There are several independent factors that can allow us to avoid compiling the S-test (and
> use an uncommon trap instead): A. We can guard on a known reference type, either left or right. (Monomorphic sites
> only.) B. We can guard on *unequal* types. (Requires loading both types. Doesn't require monomorphism. Failure of
> equality can quickly fall through to C to profit on equal reference types.) C. We can guard on whether a type is an
> inline type or not. (Requires a dependent load and test from a klass.) I think those speculations are in order of
> preference. To speculate A. we need a 1-element type profile on both sides of the test, plus statistics on how many
> outlier events occurred. That is handled by your design, and covers cases 1..3. Monomorphism on *both* sides covers
> part of case 4, but that's going to be rarer than polymorphism on either or both sides, I think. Case 4 (guard equal
> types) is the same as B. To speculate B. we want a profile on how often types are equal. That's not something you
> have here yet. To collect it, you could profile klass equality, or (better IMO) profile klass equality *only* if the
> klass is *also* an inline (maybe testing equality first, to avoid a dependent load and cache pollution?). The
> interpreter would collect this extra information. To speculate C. (which is not on your list, and maybe is in the
> noise) we would need statistics on how often a type (on either side) is inline. That's something you can derive from a
> multi-element profile, in your current design, but only if the profile doesn't overflow. I think it's easier to just
> collect the inline bit separately. To tie these requirements back to the interpreter profile structure: A. Requires
> logging of *reference* types on either side, enough to detect monomorphism. (Or maybe bi-morphism? diminishing
> returns there...) B. Requires logging of *inequality* of *inline* types, enough to warrant an uncommon trap if
> inline-equality occurs. (I don't think unqualified inequality is an interesting thing to profile: Too rare in
> practice; it means a mostly-useless `acmp` test.) C. Requires logging of inline types. If inline types are very rare,
> then we can speculate against them. This leads me to the following interpreter profile structure: (left null seen) {
> (left type, frequency), … }, (miss frequency) // A left (right null seen)
> { (right type, frequency), … }, (miss frequency) // A right
> { (same inline type, frequency) … }, (miss frequency) // B/C left==right && left.inline
> (left inline seen)
> (right inline seen)
>
> That's three type-lists instead of two, which seems to get unwieldy. It also suggests that I'm really talking about a
> follow-on RFE (which I'm sure has crossed your mind already).
> We could simplify this structure and make it more robust as follows:
>
> (left null seen), (right null seen), (same inline seen), (left inline seen), (right inline seen)
> { (type, code, frequency) … }
> // enum code { left, right, same }; in low 2 bits of profile record count
> (left miss frequency)
> (right miss frequency)
> (same inline miss frequency)
>
> The idea is to merge the lists, but keep separate "long tail" miss frequencies. This makes sense for `acmp` because it
> is a symmetric operation, and you only need monomorphism on one side. Arguably, you just need one array of types (at
> least 3) to prove monomorphism for either side, and then you get a slightly smaller profile for the other side. The
> equal-types case (significant only for `acmp`) is a natural extension. My suggestion is to put most of this aside as
> an RFE. But you could consider, right now, merging the two lists, adding two type codes (for left and right). Then
> adding the third type code (same) would be simpler as an RFE.
>> Anyway, at this point, compiled code calls java.lang.invoke.ValueBootstrapMethods::isSubstitutable() for a
>> substituability test. It doesn't even take advantage of known inline types to dispatch to the right comparison method.
>> So, profile data makes no difference and we would first need to optimize acmp for known inline types.
>
> That's a moving target. Clearly we want to inline the polymorphic S-test and then try to "de-virtualize" it when we
> can. The current code has fast paths for all the cases we have already discussed (A, B, C above), and then uses a
> `jl.ClassValue` to dispatch to a handler. Devirtualizing that dispatch will require (a) type information about the
> inline type shared by the two operands, and (b) a way to inline through a constant `ClassValue::get` call (please see
> https://bugs.openjdk.java.net/browse/JDK-8238260). The profile information to (eventually) drive JDK-8238260 will be
> either side, but it will be most efficiently collected if it is filtered down to cases where (a) the two sides have the
> same klass and (b) that klass is an inline. If that filtered profile is monomorphic (which doesn't require the profile
> as a whole to be monomorphic) then JDK-8238260 (or some other ad hoc devirtualization of the S-test) can get a serious
> win.
@rose00 Thanks for the discussion and suggestions. I propose I file 2 RFEs for this (one to improve handling of known
inline types at acmp and another one to improve profile collection at acmp). I think the patch as it is today is a
worthwhile improvement and I'd like to move forward with it as it is (even small improvements to profiling can turn out
to be quite a bit of work given the interpreter, c1 and c2 all need to be adjusted).
-------------
PR: https://git.openjdk.java.net/valhalla/pull/185
More information about the valhalla-dev
mailing list