RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v6]

Thu Jan 9 11:28:40 UTC 2025

On Fri, 3 Jan 2025 08:48:37 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> That's right. Neoverse V2 is 4 pipes of 128 bits, V1 is 2 pipes of 256 bits.
>> That comment is "interesting". Maybe it should be tunable by the back end. Given that Neoverse V2 can issue 4 SVE operations per clock cycle, it might still be a win.
>> 
>> Galder, how about you disable that line and give it another try?
>
> FYI: I'm working on removing the line [here](https://github.com/openjdk/jdk/blob/75420e9314c54adc5b45f9b274a87af54dd6b5a8/src/hotspot/share/opto/superword.cpp#L1564-L1566).
> 
> The issue is that on some platforms 2-element vectors are somehow much slower, and we need a cost model to give us a better heuristic rather than the hard "no". See my draft https://github.com/openjdk/jdk/pull/20964.
> 
> But yes: why don't you remove the line, and see if that makes it work. If so, then don't worry about this case for now, and maybe leave a comment in the test. We can then fix that later.

Yeah, this check prevents reductions like this one from being vectorized on 128-bit registers:

      // Length 2 reductions of INT/LONG do not offer performance benefits
      if (((arith_type->basic_type() == T_INT) || (arith_type->basic_type() == T_LONG)) && (size == 2)) {
        retValue = false;
      }

I tried removing it today, but then the profitability checks fail. So I'm not going down that route for now.
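To make the arithmetic concrete: a 128-bit vector register (as on the Neoverse V2 discussed above) holds 128 / 64 = 2 `long` lanes, so a full-width `long` min/max reduction has length 2 and is rejected by the check quoted above, while a 256-bit register (4 lanes) is not. The sketch below models only that quoted condition and nothing else in SuperWord; `ElemType`, `elem_bits`, `lanes` and `reduction_allowed_by_length_rule` are hypothetical names, not HotSpot's `BasicType` or SuperWord API.

```cpp
// Hypothetical stand-in for HotSpot's BasicType; NOT the real enum.
enum class ElemType { Int, Long, Float, Double };

// Width in bits of one element of the given type.
int elem_bits(ElemType t) {
  switch (t) {
    case ElemType::Int:    return 32;
    case ElemType::Long:   return 64;
    case ElemType::Float:  return 32;
    case ElemType::Double: return 64;
  }
  return 0;
}

// Number of lanes a vector register of reg_bits holds for element type t.
int lanes(int reg_bits, ElemType t) {
  return reg_bits / elem_bits(t);
}

// Models only the quoted condition: INT/LONG reductions of length 2 are
// rejected; every other combination passes this particular rule.
bool reduction_allowed_by_length_rule(ElemType t, int size) {
  bool int_or_long = (t == ElemType::Int) || (t == ElemType::Long);
  return !(int_or_long && size == 2);
}
```

For example, `reduction_allowed_by_length_rule(ElemType::Long, lanes(128, ElemType::Long))` is `false` (2 lanes), whereas the same call with `lanes(256, ElemType::Long)` (4 lanes) is `true`.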

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20098#discussion_r1908608309