AVX512 intrinsics not taken on CPU supporting AVX512?

Fri May 29 18:20:34 UTC 2020

I was just measuring the performance of this code:
```
fromArray(SPECIES_512, es, 0).intoByteBuffer(bb, 0, nativeOrder());
```
comparing it with:
```
fromArray(SPECIES_256, es, 0).intoByteBuffer(bb, 0, nativeOrder());
fromArray(SPECIES_256, es, 8).intoByteBuffer(bb, 32, nativeOrder());
```
and found that the former was more than 10x slower than the latter on
a Xeon Platinum 8124M, which according to cpuinfo does support AVX512:
```
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm
constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf
tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe
popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm
3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep
bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb
avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
```
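To make the flag check above reproducible, here is a minimal sketch that pulls the AVX-512 feature names out of a cpuinfo `flags` line. The class and method names (`Avx512Flags`, `avx512Features`) are made up for illustration, and the sample line in `main` is a shortened version of the output above. It only shows what the kernel reports, not what the JIT decides to emit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any JDK or Panama API.
final class Avx512Flags {

    // Returns every flag starting with "avx512" from a cpuinfo "flags" line,
    // in order of appearance. An optional "flags :" key prefix is dropped.
    static List<String> avx512Features(String flagsLine) {
        int colon = flagsLine.indexOf(':');
        String value = colon >= 0 ? flagsLine.substring(colon + 1) : flagsLine;
        List<String> found = new ArrayList<>();
        for (String flag : value.trim().split("\\s+")) {
            if (flag.startsWith("avx512")) {
                found.add(flag);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Shortened sample of the flags line quoted above.
        String flags = "flags : fpu sse sse2 avx avx2 mpx avx512f avx512dq "
                     + "rdseed avx512cd avx512bw avx512vl";
        System.out.println(avx512Features(flags));
    }
}
```

On the machine above this would list avx512f, avx512dq, avx512cd, avx512bw and avx512vl, so the hardware side looks fine.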
Are the AVX512 move intrinsics not implemented yet, or is there some other
reason they are not being used here?
Thanks!

Current benchmark results:
https://github.com/JOML-CI/panama-vector-bench#with--djdkincubatorvectorvector_access_oob_check0-and-abstractshufflecheckindexes_use_vector_access_oob_checkpatch-1

Kai.