RFR (M) 8222074: Enhance auto vectorization for x86

Tue Apr 9 23:33:52 UTC 2019

The current sizes (in bytes) are as follows:
libjvm.so                : 24358297
libjvm.so + 8222074      : 24718043  (1.5%)
libjvm.so (panama vapi)  : 28222649  (16%)

This patch (8222074) adds about 1.5% to the overall size of libjvm.so.
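
For anyone who wants to re-check the percentages, here is a minimal sketch of the arithmetic. The three sizes are the ones listed above; the constant names and the growth_percent helper are made up for illustration only.

```python
# Sizes in bytes, copied from the figures above.
BASELINE = 24_358_297     # libjvm.so
WITH_PATCH = 24_718_043   # libjvm.so + 8222074
WITH_VAPI = 28_222_649    # libjvm.so (panama vapi)


def growth_percent(baseline: int, new_size: int) -> float:
    """Return the size increase of new_size over baseline, in percent."""
    return (new_size - baseline) / baseline * 100.0


if __name__ == "__main__":
    for label, size in (("8222074", WITH_PATCH), ("panama vapi", WITH_VAPI)):
        delta = size - BASELINE
        print(f"{label}: +{delta} bytes ({growth_percent(BASELINE, size):.1f}%)")
```

Both percentages are relative to the unpatched libjvm.so, not to each other.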

Best Regards,
Sandhya

-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Ivanov
Sent: Tuesday, April 09, 2019 3:03 PM
To: John Rose <john.r.rose at oracle.com>; Vladimir Kozlov <vladimir.kozlov at oracle.com>
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86

On 09/04/2019 12:53, John Rose wrote:
> I agree that the AD file combinatorics need to be tamed more 
> aggressively in this way.
> 
> I have to wonder if this could be done during incubation, as a cleanup 
> of tech. debt, so we can get experience with the API at the same time.

I believe it depends on how severe the static footprint increase will be.

Last time I checked [1], Vector API-related changes (w/o SVML stubs) contributed ~3 MB (15%) to libjvm.so on Linux.

It would be very helpful to have more detailed and up-to-date information on that.

Best regards,
Vladimir Ivanov

[1]
http://mail.openjdk.java.net/pipermail/panama-dev/2018-October/002992.html

    default branch:        22384560
    vectorIntrinsic -svml: 25635648

> On Apr 9, 2019, at 12:33 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>
>> 4. Most important. The main reason we have a lot of AD instructions is to 'match' the different vector types for the corresponding vector lengths. I think we should revisit this approach.
>>
>> Intel CPUs do not use parts of vector registers separately - C2 does not use the XMM0b, XMM0c, XMM0d parts of xmm0. Even when C2 uses the VecS type, it uses the whole zmm register on AVX-512 but narrows it by passing the length to the assembler instruction (or we use an instruction which uses only part of the 512-bit register).
>>
>> Vladimir I. suggested having a VecMAX type which can be used to match all the different vector length implementations, so that we have only one AD instruction, and using the vector length to generate the corresponding code. For example, vabs8B_reg() and vabs16B_reg() are almost the same except for the vector type, VecD vs. VecX. There should be no difference in code generation (we need to modify vec_mov_helper() and other similar code to check the vector length when it sees VecMAX).
>>
>> We can also apply this approach to existing instructions to reduce the code size generated from AD files.
>
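
To make the VecMAX suggestion quoted above concrete, here is a toy model of the rule-count difference. It is not ADLC syntax or HotSpot code, only a sketch under stated assumptions: the five vector types and their byte widths are C2's, vabs8B_reg and vabs16B_reg are the names from the mail, and every other rule name and all function names (per_type_rules, generic_rule, emit) are hypothetical.

```python
# Toy model only: this is not ADLC syntax or HotSpot code.
# C2 vector register types and their widths in bytes.
VECTOR_BYTES = {"VecS": 4, "VecD": 8, "VecX": 16, "VecY": 32, "VecZ": 64}


def per_type_rules(op: str) -> dict:
    """Current style: one rule per vector type, e.g. vabs8B_reg for VecD.

    Names other than vabs8B_reg and vabs16B_reg just follow the same
    pattern and are illustrative.
    """
    return {f"{op}{size}B_reg": vec_type for vec_type, size in VECTOR_BYTES.items()}


def generic_rule(op: str) -> dict:
    """Proposed style: a single rule whose operand type is VecMAX."""
    return {f"{op}_reg": "VecMAX"}


def emit(op: str, length_in_bytes: int) -> str:
    """Choose the encoding from the vector length at emit time,
    not from which rule happened to match."""
    if length_in_bytes not in VECTOR_BYTES.values():
        raise ValueError(f"unsupported vector length: {length_in_bytes}")
    return f"{op} (vector length {length_in_bytes} bytes)"
```

In this model per_type_rules("vabs") yields five rules while generic_rule("vabs") yields one, and the length check moves into the emit step, which is roughly the trade the mail describes for vec_mov_helper() and similar code.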