X25519 experiment: access to VPMULDQ
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Fri Jul 20 12:05:28 UTC 2018
Hi Adam,
I think a more promising alternative approach to enabling VPMULDQ usage in
the backend would be to special-case a pair of operations: a vector cast
(int-to-long) followed by a long multiply:
Species<Long, S256Bit> s = LongVector.species(S_256_BIT);
Vector<Integer, S128Bit> vi1 = ..., vi2 = ...;
// Widen each int lane to a long lane (128-bit int vector -> 256-bit long vector).
Vector<Long, S256Bit> vl1 = vi1.cast(s), vl2 = vi2.cast(s);
Vector<Long, S256Bit> mul = vl1.mul(vl2); // VPMULDQ
In such a case, the compiler can infer that the upper half of each long
element doesn't affect the result (irrespective of whether it was sign- or
zero-extended), so it's fine to implement the long vector multiply using
VPMULDQ.
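
To illustrate the reasoning, here is a minimal scalar sketch of a single
lane. It assumes the int-to-long cast sign-extends, and it models VPMULDQ as
reading only the low 32 bits of each 64-bit input. The class and method
names (PmuldqSketch, pmuldqLane) are made up for this example and are not
part of the Vector API.

```java
final class PmuldqSketch {
    // Hypothetical scalar model of one VPMULDQ lane: only the low 32 bits
    // of each 64-bit input are read; they are sign-extended and multiplied
    // into a 64-bit result.
    static long pmuldqLane(long x, long y) {
        return (long) (int) x * (long) (int) y;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 12345, -12345,
                         Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                // Sign-extending int-to-long cast, as assumed above.
                long wa = a, wb = b;
                // A full 64-bit multiply of the widened values matches the
                // 32x32->64 model, so the upper halves carry no extra
                // information.
                if (wa * wb != pmuldqLane(wa, wb)) {
                    throw new AssertionError(a + " * " + b);
                }
            }
        }
        System.out.println("ok");
    }
}
```

The check passes because the product of two 32-bit signed values always fits
in 64 bits, so the wider multiply cannot produce anything the narrower one
does not.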
Best regards,
Vladimir Ivanov
On 17/07/2018 22:59, Adam Petcher wrote:
> I'm continuing with my experiment with X25519 on the vectorIntrinsics
> branch, and I have a Vector API question. Is there a way to express a
> vectorized 32x32->64 bit multiply? On AVX, this translates to the
> VPMULDQ instruction. In other words, I think I'm looking for something
> like IntVector::mul(Vector<Integer, S>) that returns a LongVector<S>.
> I'm currently using LongVector::mul, but I don't have VPMULLQ on my
> system, so the resulting assembly does some unnecessary work to
> incorporate the high dwords (which are always zero) into the result.
>
> For more background on my goal, I'm trying to implement a variant of
> Sandy2x[1]. Specifically, I want to be able to do something like the
> radix 2^25.5 multiplication/reduction in section 2.2. Though I'm using a
> signed representation, so I would prefer to use VPMULDQ instead of
> VPMULUDQ, but I could probably make it work either way.
>
> [1] https://eprint.iacr.org/2015/943.pdf
>