RFR: 8340093: C2 SuperWord: implement cost model [v4]

Thu Nov 6 07:59:08 UTC 2025

On Thu, 6 Nov 2025 07:49:07 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> To be a little more precise, the strict one should be something like:
>> 
>>     vlen * (1 + Matcher::vector_op_pre_select_sz_estimate(Op_Extract, bt, vlen)) + (vlen - 1) * (1 + Matcher::scalar_op_pre_select_sz_estimate(opcode, bt));
>> 
>> and the non-strict one would be:
>> 
>>     float c = Matcher::vector_op_pre_select_sz_estimate(Op_Extract, bt, 2) * 2 + Matcher::scalar_op_pre_select_sz_estimate(opcode, bt) + 3;
>>     for (int i = 4; i <= vlen; i *= 2) {
>>       c += 2 + Matcher::vector_op_pre_select_sz_estimate(Op_VectorRearrange, bt, i) + Matcher::vector_op_pre_select_sz_estimate(opcode, bt, i);
>>     }
>> 
>> A small refactoring to make `Matcher::vector_op_pre_select_sz_estimate` less awkward would also be welcome. Currently, it returns the estimated size minus 1, which is unsettling.
>
> @merykitty Can we do that in a follow-up RFE? For now, I'd like to keep it as simple as possible. Cost models can become arbitrarily complex, and there is a trade-off between simplicity and accuracy. We can certainly improve things in the future; this PR just lays the foundation.
> 
> My goal here is to start as simple as possible, and then add complexity if there is a proven need for it.
> 
> So if you/we can find a benchmark where the cost model is not yet accurate enough, as shown by comparing runs with `-XX:AutoVectorizationOverrideProfitability=0` and `-XX:AutoVectorizationOverrideProfitability=2`, then we should make it more complex.
> 
> Would that be acceptable for you?

What exactly does `Matcher::vector_op_pre_select_sz_estimate` return? The number of instructions or some kind of throughput estimate?

Personally, I don't want to get too stuck on counting instructions; I would rather get a throughput estimate. Counting instructions is a proxy for throughput, but I don't know yet whether it is the best one in the long term.

I would like to wait a little longer and start relying on the cost model for more and more cases (extract, pack, shuffle, if-conversion, ...). Along the way, we will run into cases where the cost model is not yet accurate enough, and at that point we can reconsider what would produce the most accurate results.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27803#discussion_r2497872332