[aarch64-port-dev ] RFR(S): 8216259: AArch64: Vectorize Adler32 intrinsics

Andrew Haley aph at redhat.com
Tue Jan 22 17:03:23 UTC 2019


On 1/21/19 10:53 AM, Pengfei Li (Arm Technology China) wrote:
> Webrev: http://cr.openjdk.java.net/~pli/rfr/8216259/webrev.00/
> JBS: https://bugs.openjdk.java.net/browse/JDK-8216259

The patch checks out fine, but there's one thing I'd like you to do. Please don't
repeat this block of code:

    // Below is a vectorized implementation of updating s1 and s2 for 16 bytes.
    // We use b1, b2, ..., b16 to denote the 16 bytes loaded in each iteration.
    // In non-vectorized code, we update s1 and s2 as:
    //   s1 <- s1 + b1
    //   s2 <- s2 + s1
    //   s1 <- s1 + b2
    //   s2 <- s2 + s1
    //   ...
    //   s1 <- s1 + b16
    //   s2 <- s2 + s1
    // Putting above assignments together, we have:
    //   s1_new = s1 + b1 + b2 + ... + b16
    //   s2_new = s2 + (s1 + b1) + (s1 + b1 + b2) + ... + (s1 + b1 + b2 + ... + b16)
    //          = s2 + s1 * 16 + (b1 * 16 + b2 * 15 + ... + b16 * 1)
    //          = s2 + s1 * 16 + (b1, b2, ... b16) dot (16, 15, ... 1)
    __ ld1(vbytes, __ T16B, Address(__ post(buff, 16)));

    // s2 = s2 + s1 * 16
    __ add(s2, s2, s1, Assembler::LSL, 4);

    // vs1acc = b1 + b2 + b3 + ... + b16
    // vs2acc = (b1 * 16) + (b2 * 15) + (b3 * 14) + ... + (b16 * 1)
    __ umullv(vs2acc, __ T8B, vtable, vbytes);
    __ umlalv(vs2acc, __ T16B, vtable, vbytes);
    __ uaddlv(vs1acc, __ T16B, vbytes);
    __ uaddlv(vs2acc, __ T8H, vs2acc);

    // s1 = s1 + vs1acc, s2 = s2 + vs2acc
    __ fmovd(temp0, vs1acc);
    __ fmovd(temp1, vs2acc);
    __ add(s1, s1, temp0);
    __ add(s2, s2, temp1);

    __ subs(count, count, 16);
    __ br(Assembler::HS, L_nmax_loop);

Instead, please put it into a function (e.g. updateBytesAdler32_inner)
and call it from updateBytesAdler32. There's no point writing all this
stuff out twice.

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

