Vector performance issue.
Andrii Lomakin
lomakin.andrey at gmail.com
Mon Sep 18 15:10:59 UTC 2023
Hi,
I have the same problem during calculation of Eucledian distance in my
project too.
Writing just to confirm that it is not a single case and I have got the
same result during profiling.
On Sat, Sep 16, 2023 at 9:50 PM Jake Luciani <jake at apache.org> wrote:
> Hi,
>
> I've been struggling with a problem recently using the vector api.
> It appears as reduceLanes is not using the intrinsic.
>
> ns percent samples top
> ---------- ------- ------- ---
> 13240151836 88.21% 1324
> jdk.incubator.vector.FloatVector.reduceLanesTemplate
> 1349991099 8.99% 135
> jdk.incubator.vector.FloatVector.lanewiseTemplate
>
> I've tested openjdk 20 and 21 and my machine has AVX512.
>
> When I PrintIntrinsics I see the following (among others):
>
> ** missing constant: opr=RShiftI vclass=ConP etype=ConP vlen=ConI
>
> I've included a JMH benchmark that reproduces the issue.
>
> -Jake
>
> import jdk.incubator.vector.FloatVector;
> import jdk.incubator.vector.IntVector;
> import jdk.incubator.vector.ShortVector;
> import jdk.incubator.vector.VectorOperators;
> import org.openjdk.jmh.annotations.*;
> import org.openjdk.jmh.infra.Blackhole;
>
> import java.util.concurrent.ThreadLocalRandom;
> import java.util.concurrent.TimeUnit;
>
>
> @Warmup(iterations = 1, time = 5)
> @Measurement(iterations = 3, time = 5)
> @Fork(warmups = 1, value = 1, jvmArgsPrepend = {
> "--add-modules=jdk.incubator.vector",
> "--enable-preview"})
> public class VectorPerfBench
> {
> private static final int SIZE = 8192;
> private static final IntVector BF16_BYTE_SHIFT =
> IntVector.broadcast(IntVector.SPECIES_512, 16);
>
> public static short float32ToBFloat16(float f) {
> return (short) (Float.floatToIntBits(f) >> 16);
> }
> @State(Scope.Benchmark)
> public static class Parameters {
> final short[] s1 = new short[SIZE];
> final short[] s2 = new short[SIZE];
>
> public Parameters() {
> for (int i = 0; i < SIZE; i++) {
> s1[i] =
> float32ToBFloat16(ThreadLocalRandom.current().nextFloat());
> s2[i] =
> float32ToBFloat16(ThreadLocalRandom.current().nextFloat());
> }
> }
> }
>
> @Benchmark
> @OutputTimeUnit(TimeUnit.MILLISECONDS)
> @BenchmarkMode(Mode.Throughput)
> public void bfloatDot(Parameters p, Blackhole bh) {
> FloatVector acc = FloatVector.zero(FloatVector.SPECIES_512);
> for (int i = 0; i < SIZE; i += FloatVector.SPECIES_512.length()) {
>
> var f1 = ShortVector.fromArray(ShortVector.SPECIES_256, p.s1,
> i)
> .convertShape(VectorOperators.ZERO_EXTEND_S2I,
> IntVector.SPECIES_512, 0)
> .lanewise(VectorOperators.LSHL, BF16_BYTE_SHIFT)
> .reinterpretAsFloats();
>
> var f2 = ShortVector.fromArray(ShortVector.SPECIES_256, p.s2,
> i)
> .convertShape(VectorOperators.ZERO_EXTEND_S2I,
> IntVector.SPECIES_512, 0)
> .lanewise(VectorOperators.LSHL, BF16_BYTE_SHIFT)
> .reinterpretAsFloats();
>
> acc = acc.add(f1.mul(f2));
> }
>
> bh.consume(acc.reduceLanes(VectorOperators.ADD));
> }
>
> public static void main(String[] args) throws Exception {
> org.openjdk.jmh.Main.main(args);
> }
> }
>
--
Best regards,
Andrii Lomakin.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/panama-dev/attachments/20230918/aea5ddd6/attachment-0001.htm>
More information about the panama-dev
mailing list