[vectorapi] Shape of the preferred species with UseAVX=0

Chris Hegarty chegar999 at gmail.com
Thu May 25 13:35:45 UTC 2023


Thanks Paul.

-Chris.

On 24/05/2023 18:36, Paul Sandoz wrote:
> Hi Chris,
> 
> getMaxLaneCount will return -1 *if* HotSpot is compiled with C2 
> disabled, which is a special case. It is not currently affected when C2 
> is disabled at runtime; we could check that too, but we need to think 
> through the implications.
> 
> The Vector API degrades with functional equivalence, but it does not 
> currently generate the same code as the equivalent scalar code would. 
> Unfortunately, the fallback implementation in pure Java is currently 
> slow, since it operates on instances of Vector etc. as ordinary classes. 
> There is no C1 support to optimize this case.
> 
> You should not rely on the UseAVX/UseSSE flags to restrict the vector 
> size (they are really there to emulate other hardware). For testing 
> purposes, I currently recommend setting MaxVectorSize to limit the size 
> and then checking the preferred species or the value of 
> VectorShape.S_Max_BIT.vectorBitSize(), or otherwise explicitly detecting 
> that C2 is disabled (not clear how easy that is).
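A minimal Java sketch of the two checks suggested above, assuming a HotSpot VM. The class name `VectorEnvProbe` and the "TieredStopAtLevel below 4 means no C2" heuristic are assumptions, not an official way to detect C2. `VectorShape.S_Max_BIT` is read reflectively only so the class compiles and runs without `--add-modules jdk.incubator.vector`; it returns -1 when the module is not resolved.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class VectorEnvProbe {

    // Returns VectorShape.S_Max_BIT.vectorBitSize(), or -1 when the
    // jdk.incubator.vector module is not resolved (no --add-modules).
    // Reflection is used only so this class compiles without the module.
    static int maxVectorBitSize() {
        try {
            Class<?> shape = Class.forName("jdk.incubator.vector.VectorShape");
            Object sMax = shape.getField("S_Max_BIT").get(null);
            return (Integer) shape.getMethod("vectorBitSize").invoke(sMax);
        } catch (ReflectiveOperationException | LinkageError e) {
            return -1;
        }
    }

    // Heuristic (an assumption, not an official API): on HotSpot, a
    // TieredStopAtLevel below 4 means C2 will not compile this code.
    // Other ways of turning C2 off, such as -Xint, are not detected.
    static boolean c2LikelyDisabled() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return false; // no HotSpot diagnostic bean; make no claim
            }
            String level = bean.getVMOption("TieredStopAtLevel").getValue();
            return Integer.parseInt(level) < 4;
        } catch (IllegalArgumentException e) {
            return false; // option or bean unavailable, or value not numeric
        }
    }

    public static void main(String[] args) {
        System.out.println("S_Max_BIT.vectorBitSize(): " + maxVectorBitSize());
        System.out.println("C2 likely disabled: " + c2LikelyDisabled());
    }
}
```

For example, `java --add-modules jdk.incubator.vector -XX:MaxVectorSize=8 VectorEnvProbe.java` exercises the MaxVectorSize route, and adding `-XX:TieredStopAtLevel=1` exercises the C2 heuristic.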
> 
> Paul.
> 
>> On May 24, 2023, at 5:20 AM, Chris Hegarty <chegar999 at gmail.com> wrote:
>>
>> Hi,
>>
>> Thanks for your reply. Lemme try to clarify the actual source of the 
>> issue we see; UseAVX=0 might be related, but it could also be a red herring.
>>
>> Some Lucene tests run with C2 effectively "disabled", via 
>> -XX:TieredStopAtLevel=1, and we observe horribly slow performance, 
>> much worse than the scalar equivalent. We also see that the preferred 
>> species is still wider than expected.
>>
>> $ java --add-modules jdk.incubator.vector -XX:TieredStopAtLevel=1 PrintPreferredVectorSize
>> WARNING: Using incubator modules: jdk.incubator.vector
>> Species[int, 16, S_512_BIT]
>>
>> But (and again this could be a red herring), 
>> VectorShape::getMaxVectorBitSize indicates that this should not be the 
>> case, e.g.
>>
>> // VectorSupport.getMaxLaneCount may return -1 if C2 is not enabled,
>> // or a value smaller than the S_64_BIT.vectorBitSize / elementSizeInBits if MaxVectorSize < 16
>> // If so default to S_64_BIT
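A minimal model of the fallback that comment describes, assuming the maximum bit size is the lane count reported by the VM times the element size, floored at S_64_BIT (64 bits). `MaxShapeModel` is a hypothetical name, and this is a sketch of the rule, not the JDK source.

```java
final class MaxShapeModel {
    static final int S_64_BIT = 64;

    // maxLaneCount: what the VM reports for an element type
    // (-1 when C2 is not compiled into the VM, per the comment above).
    static int maxVectorBitSize(int maxLaneCount, int elementSizeInBits) {
        // A negative or too-small product is raised to the 64-bit floor.
        return Math.max(maxLaneCount * elementSizeInBits, S_64_BIT);
    }

    public static void main(String[] args) {
        System.out.println(maxVectorBitSize(-1, 32)); // C2 compiled out
        System.out.println(maxVectorBitSize(16, 32)); // 16 int lanes
        System.out.println(maxVectorBitSize(0, 32));  // lane count too small
    }
}
```

Under this model, -XX:TieredStopAtLevel=1 does not change the result: C2 is still compiled into the VM, so the VM reports a real lane count rather than -1, which is consistent with Paul's reply above.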
>>
>> I had assumed that this is because the code will not benefit from the 
>> C2 intrinsics.
>>
>> Basically, how do we determine whether the environment should use the 
>> Vector API or not (and fall back to a scalar implementation)? I would 
>> assume this is a question we should not have to answer; the 
>> vectorized implementation should degrade gracefully.
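A minimal sketch of making that decision once, at startup: keep a scalar implementation and pick the vectorized one only when a probe suggests it will pay off. The `DotProduct` interface, the 128-bit threshold and the C2 flag are all hypothetical; the vectorized variant is passed in as a `Supplier` and is not shown.

```java
import java.util.function.Supplier;

final class DotProductSelector {
    // Hypothetical policy: use the Vector API variant only when C2 is
    // expected to compile it and vectors are at least 128 bits wide.
    static DotProduct select(int maxVectorBitSize, boolean c2LikelyDisabled,
                             Supplier<DotProduct> vectorized) {
        if (c2LikelyDisabled || maxVectorBitSize < 128) {
            return new ScalarDotProduct();
        }
        return vectorized.get();
    }

    public static void main(String[] args) {
        // Stand-in supplier; a real one would build a Vector API implementation.
        DotProduct impl = select(512, true, ScalarDotProduct::new);
        System.out.println(impl.getClass().getSimpleName());
        System.out.println(impl.dot(new int[] {1, 2, 3}, new int[] {4, 5, 6}));
    }
}

// Hypothetical kernel with a scalar and a vectorized variant.
interface DotProduct {
    int dot(int[] a, int[] b);
}

// Plain scalar loop used as the fallback.
final class ScalarDotProduct implements DotProduct {
    @Override
    public int dot(int[] a, int[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

Choosing once keeps the per-call path free of environment checks; the inputs to `select` could come from probes such as S_Max_BIT and a C2 check.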
>>
>> -Chris.
>>
>> On 24/05/2023 12:47, Quân Anh Mai wrote:
>>> Hi,
>>> On x86 there are two generations of vector extensions, SSE and AVX, 
>>> controlled by the UseSSE and UseAVX flags respectively. By setting 
>>> UseAVX to 0, you are emulating an SSE4 machine, which has 16-byte 
>>> vector registers. The 64-bit VM requires SSE2, so you cannot set 
>>> UseSSE lower than 2, and for all such values 16-byte vector registers 
>>> are always available. As a result, the minimum preferred vector size 
>>> on x86_64 is 16 bytes.
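The int species sizes printed in this thread can be reproduced with a small model of the rule described above: 16-byte vectors for SSE2 and AVX1, 32 bytes for AVX2, 64 bytes for AVX-512, capped by MaxVectorSize, then the Java-side floor of 64 bits from the VectorShape comment quoted earlier. `X86IntWidthModel` is hypothetical and simplifies HotSpot's `Matcher::vector_width_in_bytes`; it covers int lanes only and is not the VM's code.

```java
final class X86IntWidthModel {

    // Vector width in bytes for int lanes; UseSSE >= 2 is assumed, as on x86_64.
    static int intVectorWidthInBytes(int useAVX, int maxVectorSize) {
        // UseAVX=0 (SSE) and UseAVX=1 give 16 bytes; 2 gives 32; 3 gives 64.
        int size = (useAVX > 1) ? (1 << useAVX) * 8 : 16;
        size = Math.min(size, maxVectorSize);
        // Fewer than two int lanes (8 bytes) means no vector at all.
        return (size < 8) ? 0 : size;
    }

    // Preferred int shape in bits after the Java-side floor of 64 bits.
    static int preferredIntShapeBits(int useAVX, int maxVectorSize) {
        int laneCount = intVectorWidthInBytes(useAVX, maxVectorSize) / Integer.BYTES;
        return Math.max(laneCount * Integer.SIZE, 64);
    }

    public static void main(String[] args) {
        for (int avx = 3; avx >= 0; avx--) {
            System.out.println("UseAVX=" + avx + " -> S_" + preferredIntShapeBits(avx, 64) + "_BIT");
        }
    }
}
```

In this model, making `intVectorWidthInBytes` return 0 when `useAVX < 1` mirrors the hack Chris describes and yields 64 bits.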
>>> Regards,
>>> Quan Anh
>>> On Wed, 24 May 2023 at 19:39, Chris Hegarty <chegar999 at gmail.com> wrote:
>>>    Hi,
>>>    Over in Lucene-land we're experimenting with the Vector API, and we
>>>    ran into an issue when testing against different environments. We're
>>>    effectively simulating different environments with HotSpot's
>>>    command-line flags, and we see an issue with `-XX:UseAVX=0`. With this
>>>    option we expect the shape of the preferred species to fall back to
>>>    64 bits, but it is actually 128.
>>>    Trivial reproducer that just prints the preferred int species:
>>>    $ java -version
>>>    openjdk version "21-ea" 2023-09-19
>>>    OpenJDK Runtime Environment (build 21-ea+23-1988)
>>>    OpenJDK 64-Bit Server VM (build 21-ea+23-1988, mixed mode, sharing)
>>>    $ cat PrintPreferredVectorSize.java
>>>    import jdk.incubator.vector.*;
>>>    public class PrintPreferredVectorSize {
>>>          public static void main(String... args) {
>>>              System.out.println(IntVector.SPECIES_PREFERRED);
>>>          }
>>>    }
>>>    // Start with the default (no flags), then downsize with MaxVectorSize.
>>>    // All looks good.
>>>    $ java --add-modules jdk.incubator.vector PrintPreferredVectorSize
>>>    WARNING: Using incubator modules: jdk.incubator.vector
>>>    Species[int, 16, S_512_BIT]
>>>    $ java --add-modules jdk.incubator.vector -XX:MaxVectorSize=64 PrintPreferredVectorSize
>>>    WARNING: Using incubator modules: jdk.incubator.vector
>>>    Species[int, 16, S_512_BIT]
>>>    $ java --add-modules jdk.incubator.vector -XX:MaxVectorSize=32 PrintPreferredVectorSize
>>>    WARNING: Using incubator modules: jdk.incubator.vector
>>>    Species[int, 8, S_256_BIT]
>>>    $ java --add-modules jdk.incubator.vector -XX:MaxVectorSize=16 PrintPreferredVectorSize
>>>    WARNING: Using incubator modules: jdk.incubator.vector
>>>    Species[int, 4, S_128_BIT]
>>>    $ java --add-modules jdk.incubator.vector -XX:MaxVectorSize=8 PrintPreferredVectorSize
>>>    WARNING: Using incubator modules: jdk.incubator.vector
>>>    Species[int, 2, S_64_BIT]
>>>    ^^^ this all is as expected ^^^
>>>    Now do something similar(ish) with UseAVX:
>>>    $ java --add-modules jdk.incubator.vector -XX:UseAVX=3 PrintPreferredVectorSize
>>>    WARNING: Using incubator modules: jdk.incubator.vector
>>>    Species[int, 16, S_512_BIT]
>>>    $ java --add-modules jdk.incubator.vector -XX:UseAVX=2 PrintPreferredVectorSize
>>>    WARNING: Using incubator modules: jdk.incubator.vector
>>>    Species[int, 8, S_256_BIT]
>>>    $ java --add-modules jdk.incubator.vector -XX:UseAVX=1 PrintPreferredVectorSize
>>>    WARNING: Using incubator modules: jdk.incubator.vector
>>>    Species[int, 4, S_128_BIT]
>>>    $ java --add-modules jdk.incubator.vector -XX:UseAVX=0 PrintPreferredVectorSize
>>>    WARNING: Using incubator modules: jdk.incubator.vector
>>>    Species[int, 4, S_128_BIT]
>>>    It is this last one that is surprising to us. We expected it to behave
>>>    like -XX:MaxVectorSize=8, which gives Species[int, 2, S_64_BIT].
>>>    Firstly, is this a bug? Is our expectation correct? If not, then I'm
>>>    failing to understand something.
>>>    If I'm not mistaken, Matcher::vector_width_in_bytes assumes a
>>>    minimum of 16 bytes regardless of whether UseAVX is < 1. As a trivial
>>>    hack, I changed vector_width_in_bytes to return 0 if UseAVX < 1, and
>>>    with that we get Species[int, 2, S_64_BIT].
>>>    The layering of these flags is not straightforward, so I won't
>>>    pretend that my hack is the right way to fix this, but I just wanted
>>>    to ensure that I was looking in the correct general area.
>>>    Thanks,
>>>    -Chris.
> 


More information about the panama-dev mailing list