[vectorapi] Shape of the preferred species with UseAVX=0
Chris Hegarty
chegar999 at gmail.com
Thu May 25 13:35:45 UTC 2023
Thanks Paul.
-Chris.
On 24/05/2023 18:36, Paul Sandoz wrote:
> Hi Chris,
>
> getMaxLaneCount will return -1 *if* HotSpot is compiled with C2
> disabled, a special case. It is not currently affected when C2 is
> disabled at runtime; we could check that too, but we need to think
> through the implications.
>
> The Vector API degrades with functional equivalence but does not
> currently generate the same code as if it were scalar code.
> Unfortunately the fallback implementation in pure Java is currently slow
> since it operates on instances of Vector etc as ordinary classes. There
> is no C1 support to optimize this case.
> You should not rely on the UseAVX/UseSSE flags to restrict the vector
> size (they are really there to emulate hardware). For testing purposes
> I currently recommend setting MaxVectorSize to limit it and then
> checking the preferred species or the value of
> VectorShape.S_Max_BIT.vectorBitSize(), or otherwise explicitly
> detecting that C2 is disabled (it is not clear how easy that is).
>
> Paul.
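One possible way to approximate the "explicitly detect that C2 is disabled" check is to inspect the JVM's startup arguments. The following is only a heuristic sketch under stated assumptions, not something the Vector API provides: the class name `C2Check` and the method `c2LikelyDisabled` are hypothetical, it only recognizes `-XX:TieredStopAtLevel=N` with N below 4 and `-Xint`, and it cannot detect a HotSpot build that was compiled without C2.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper (not part of the JDK): guesses from the JVM arguments
// whether C2 has been switched off at runtime.
class C2Check {
    static boolean c2LikelyDisabled(List<String> jvmArgs) {
        final String prefix = "-XX:TieredStopAtLevel=";
        boolean interpreterOnly = false;
        int stopLevel = 4; // level 4 is C2; assumed default when the flag is absent
        for (String arg : jvmArgs) {
            if (arg.equals("-Xint")) {
                interpreterOnly = true; // simplification: a later -Xmixed is not handled
            } else if (arg.startsWith(prefix)) {
                try {
                    // The last occurrence of the flag wins.
                    stopLevel = Integer.parseInt(arg.substring(prefix.length()).trim());
                } catch (NumberFormatException e) {
                    // Malformed value: ignore it and keep the previous level.
                }
            }
        }
        return interpreterOnly || stopLevel < 4;
    }

    public static void main(String[] args) {
        // Arguments the running JVM was started with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("C2 likely disabled: " + c2LikelyDisabled(jvmArgs));
    }
}
```

A caller could use such a check to choose the scalar implementation up front when running with -XX:TieredStopAtLevel=1, as in the Lucene test runs described below.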
>
>> On May 24, 2023, at 5:20 AM, Chris Hegarty <chegar999 at gmail.com> wrote:
>>
>> Hi,
>>
>> Thanks for your reply. Lemme try to clarify the actual source of the
>> issue we see, UseAVX=0 might be related but also could be a red herring.
>>
>> Some Lucene tests run with C2 effectively "disabled", with
>> -XX:TieredStopAtLevel=1, and we observe horribly slow performance -
>> much worse than the scalar equivalent. And we see that the preferred
>> species is still wider than expected.
>>
>> $ java --add-modules jdk.incubator.vector -XX:TieredStopAtLevel=1 PrintPreferredVectorSize
>> WARNING: Using incubator modules: jdk.incubator.vector
>> Species[int, 16, S_512_BIT]
>>
>> But (and again this could be a red herring),
>> VectorShape::getMaxVectorBitSize indicates that this should not be the
>> case, e.g.
>>
>> // VectorSupport.getMaxLaneCount may return -1 if C2 is not enabled,
>> // or a value smaller than the S_64_BIT.vectorBitSize /
>> // elementSizeInBits if MaxVectorSize < 16
>> // If so default to S_64_BIT
>>
>> I had assumed that this is because the code will not benefit from the
>> C2 intrinsics.
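The fallback rule described in the quoted comment can be sketched as follows. This is a minimal illustration of the rule as the comment states it, not the JDK's actual source: the class `MaxVectorBits` and its method are hypothetical names, and the lane count is passed in as a plain parameter instead of being read from VectorSupport.

```java
// Hypothetical sketch of the rule in the quoted comment: a lane count of -1,
// or one that yields fewer than 64 bits, defaults to the 64-bit shape.
class MaxVectorBits {
    static final int S_64_BIT_SIZE = 64;

    static int maxVectorBitSize(int maxLaneCount, int elementSizeInBits) {
        // A negative or too-small product is clamped up to 64 bits.
        return Math.max(maxLaneCount * elementSizeInBits, S_64_BIT_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(maxVectorBitSize(-1, Integer.SIZE)); // 64
        System.out.println(maxVectorBitSize(16, Integer.SIZE)); // 512
        System.out.println(maxVectorBitSize(1, Integer.SIZE));  // 64
    }
}
```

Under this rule a 64-bit preferred shape would be the signal that no useful lane count was reported, which is why the S_512_BIT result with -XX:TieredStopAtLevel=1 is surprising.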
>>
>> Basically, how do we determine whether the environment should use the
>> Vector API or not (fall back to a scalar implementation)? I would
>> assume this is a question that we should not have to answer - the
>> Vectorized implementation should degrade gracefully.
>>
>> -Chris.
>>
>> On 24/05/2023 12:47, Quân Anh Mai wrote:
>>> Hi,
>>> In x86 there are 2 vector extension generations, SSE and AVX, and the
>>> flags controlling these are UseSSE and UseAVX, respectively. By
>>> setting UseAVX to 0, you are emulating an SSE4 machine, which has
>>> 16-byte vector registers. 64-bit VM requires SSE2, so you cannot set
>>> a value lower than 2 for UseSSE, and for all such values there are
>>> always 16-byte vector registers available. As a result, the minimum
>>> preferred vector size on x86_64 is 16 bytes.
>>> Regards,
>>> Quan Anh
>>> On Wed, 24 May 2023 at 19:39, Chris Hegarty <chegar999 at gmail.com>
>>> wrote:
>>> Hi,
>>> Over in Lucene-land we're experimenting with the Vector API, and ran
>>> into an issue when testing against different environments. We're
>>> effectively simulating different environments with Hotspot's command
>>> line flags, and see an issue with `-XX:UseAVX=0`. With this option we
>>> expect the shape of the preferred species to fall back to 64 bits,
>>> but it is actually 128.
>>> Trivial reproducer that just prints the preferred int species:
>>> $ java -version
>>> openjdk version "21-ea" 2023-09-19
>>> OpenJDK Runtime Environment (build 21-ea+23-1988)
>>> OpenJDK 64-Bit Server VM (build 21-ea+23-1988, mixed mode, sharing)
>>> $ cat PrintPreferredVectorSize.java
>>> import jdk.incubator.vector.*;
>>> public class PrintPreferredVectorSize {
>>>     public static void main(String... args) {
>>>         System.out.println(IntVector.SPECIES_PREFERRED);
>>>     }
>>> }
>>> // Start with the default - no flags, then downsize with MaxVectorSize.
>>> // All looks good.
>>> $ java --add-modules jdk.incubator.vector PrintPreferredVectorSize
>>> WARNING: Using incubator modules: jdk.incubator.vector
>>> Species[int, 16, S_512_BIT]
>>> $ java --add-modules jdk.incubator.vector -XX:MaxVectorSize=64 PrintPreferredVectorSize
>>> WARNING: Using incubator modules: jdk.incubator.vector
>>> Species[int, 16, S_512_BIT]
>>> $ java --add-modules jdk.incubator.vector -XX:MaxVectorSize=32 PrintPreferredVectorSize
>>> WARNING: Using incubator modules: jdk.incubator.vector
>>> Species[int, 8, S_256_BIT]
>>> $ java --add-modules jdk.incubator.vector -XX:MaxVectorSize=16 PrintPreferredVectorSize
>>> WARNING: Using incubator modules: jdk.incubator.vector
>>> Species[int, 4, S_128_BIT]
>>> $ java --add-modules jdk.incubator.vector -XX:MaxVectorSize=8 PrintPreferredVectorSize
>>> WARNING: Using incubator modules: jdk.incubator.vector
>>> Species[int, 2, S_64_BIT]
>>> ^^^ this all is as expected ^^^
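The arithmetic behind these outputs can be sketched as below: MaxVectorSize is given in bytes, so the shape is that value times 8 bits and the int lane count is the bit size divided by 32. The class `PreferredIntSpecies` and its method are hypothetical, and the sketch assumes the hardware supports the requested width; it only reproduces the printed format, it does not query the JVM.

```java
// Hypothetical helper: derives the species string seen in the runs above
// from a MaxVectorSize value in bytes.
class PreferredIntSpecies {
    static String describe(int maxVectorSizeBytes) {
        int bits = maxVectorSizeBytes * Byte.SIZE; // bytes -> bits
        int lanes = bits / Integer.SIZE;           // number of 32-bit int lanes
        return String.format("Species[int, %d, S_%d_BIT]", lanes, bits);
    }

    public static void main(String[] args) {
        for (int bytes : new int[] {64, 32, 16, 8}) {
            System.out.println("MaxVectorSize=" + bytes + " -> " + describe(bytes));
        }
    }
}
```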
>>> Now do similar(ish) with UseAVX
>>> $ java --add-modules jdk.incubator.vector -XX:UseAVX=3 PrintPreferredVectorSize
>>> WARNING: Using incubator modules: jdk.incubator.vector
>>> Species[int, 16, S_512_BIT]
>>> $ java --add-modules jdk.incubator.vector -XX:UseAVX=2 PrintPreferredVectorSize
>>> WARNING: Using incubator modules: jdk.incubator.vector
>>> Species[int, 8, S_256_BIT]
>>> $ java --add-modules jdk.incubator.vector -XX:UseAVX=1 PrintPreferredVectorSize
>>> WARNING: Using incubator modules: jdk.incubator.vector
>>> Species[int, 4, S_128_BIT]
>>> $ java --add-modules jdk.incubator.vector -XX:UseAVX=0 PrintPreferredVectorSize
>>> WARNING: Using incubator modules: jdk.incubator.vector
>>> Species[int, 4, S_128_BIT]
>>> It is this last one that is surprising to us. We expected a result
>>> similar to -XX:MaxVectorSize=8, which is Species[int, 2, S_64_BIT].
>>> Firstly, is this a bug? Is our expectation correct? If not, then I'm
>>> failing to understand something.
>>> If I'm not mistaken, then I can see that the
>>> Matcher::vector_width_in_bytes assumes a minimum of 16 regardless of
>>> whether UseAVX is < 1. Trivially, and as a hack, I just retrofitted
>>> vector_width_in_bytes to return 0 if UseAVX < 1. And we get
>>> Species[int, 2, S_64_BIT].
>>> The layering of these flags is not straightforward, so I won't pretend
>>> that my hack is the right way to fix this, but I just wanted to ensure
>>> that I was looking in the correct general area.
>>> Thanks,
>>> -Chris.
>