[aarch64-port-dev ] RFR(S): 8216259: AArch64: Vectorize Adler32 intrinsics
Andrew Haley
aph at redhat.com
Tue Jan 22 17:03:23 UTC 2019
On 1/21/19 10:53 AM, Pengfei Li (Arm Technology China) wrote:
> Webrev: http://cr.openjdk.java.net/~pli/rfr/8216259/webrev.00/
> JBS: https://bugs.openjdk.java.net/browse/JDK-8216259
The patch checks out fine, but there's one thing I'd like you to do. Please don't
repeat this block of code:
// Below is a vectorized implementation of updating s1 and s2 for 16 bytes.
// We use b1, b2, ..., b16 to denote the 16 bytes loaded in each iteration.
// In non-vectorized code, we update s1 and s2 as:
//   s1 <- s1 + b1
//   s2 <- s2 + s1
//   s1 <- s1 + b2
//   s2 <- s2 + s1
//   ...
//   s1 <- s1 + b16
//   s2 <- s2 + s1
// Putting above assignments together, we have:
//   s1_new = s1 + b1 + b2 + ... + b16
//   s2_new = s2 + (s1 + b1) + (s1 + b1 + b2) + ... + (s1 + b1 + b2 + ... + b16)
//          = s2 + s1 * 16 + (b1 * 16 + b2 * 15 + ... + b16 * 1)
//          = s2 + s1 * 16 + (b1, b2, ... b16) dot (16, 15, ... 1)
__ ld1(vbytes, __ T16B, Address(__ post(buff, 16)));

// s2 = s2 + s1 * 16
__ add(s2, s2, s1, Assembler::LSL, 4);

// vs1acc = b1 + b2 + b3 + ... + b16
// vs2acc = (b1 * 16) + (b2 * 15) + (b3 * 14) + ... + (b16 * 1)
__ umullv(vs2acc, __ T8B, vtable, vbytes);
__ umlalv(vs2acc, __ T16B, vtable, vbytes);
__ uaddlv(vs1acc, __ T16B, vbytes);
__ uaddlv(vs2acc, __ T8H, vs2acc);

// s1 = s1 + vs1acc, s2 = s2 + vs2acc
__ fmovd(temp0, vs1acc);
__ fmovd(temp1, vs2acc);
__ add(s1, s1, temp0);
__ add(s2, s2, temp1);

__ subs(count, count, 16);
__ br(Assembler::HS, L_nmax_loop);
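
The algebra in the comment above can be checked with a small scalar sketch. This is plain C++, not HotSpot MacroAssembler code, and the function names are hypothetical (they do not appear in the patch). Modular reduction is left out because it does not affect the identity for a single 16-byte block.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names, for illustration only; not taken from the patch.

// Reference form: update s1 and s2 one byte at a time,
// as in the non-vectorized code described in the comment.
inline void adler_update_bytewise(uint32_t& s1, uint32_t& s2, const uint8_t* b) {
    for (size_t i = 0; i < 16; ++i) {
        s1 += b[i];
        s2 += s1;
    }
}

// Block form: s1_new = s1 + sum(b)
//             s2_new = s2 + s1 * 16 + (b1, ..., b16) dot (16, 15, ..., 1)
inline void adler_update_block16(uint32_t& s1, uint32_t& s2, const uint8_t* b) {
    uint32_t sum = 0;  // plays the role of vs1acc
    uint32_t dot = 0;  // plays the role of vs2acc
    for (uint32_t i = 0; i < 16; ++i) {
        sum += b[i];
        dot += static_cast<uint32_t>(b[i]) * (16 - i);
    }
    s2 += (s1 << 4) + dot;  // must use the old s1, hence s2 is updated first
    s1 += sum;
}
```

Both functions should leave s1 and s2 in the same state for any 16 input bytes, which is what lets the loop body replace sixteen dependent additions with one multiply-accumulate and two reductions.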
Instead, please put it into a function (e.g. updateBytesAdler32_inner)
and call it from updateBytesAdler32. There's no point writing all this
stuff out twice.
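
A rough sketch of the suggested shape, again in plain C++ rather than stub-generator code: the 16-byte update is written once and called from both loops. All names (accum16, adler32_sketch, BASE, NMAX) are assumptions made for this illustration; BASE and NMAX use the usual Adler-32 values of 65521 and 5552.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names; this only illustrates the refactoring, not the patch.
constexpr uint32_t BASE = 65521;  // largest prime below 2^16
constexpr size_t NMAX = 5552;     // bytes that can be summed before reducing

// The 16-byte update, written once.
static void accum16(uint32_t& s1, uint32_t& s2, const uint8_t* b) {
    uint32_t sum = 0, dot = 0;
    for (uint32_t i = 0; i < 16; ++i) {
        sum += b[i];
        dot += b[i] * (16 - i);
    }
    s2 += (s1 << 4) + dot;  // uses the old s1
    s1 += sum;
}

uint32_t adler32_sketch(uint32_t adler, const uint8_t* buf, size_t len) {
    uint32_t s1 = adler & 0xffff;
    uint32_t s2 = (adler >> 16) & 0xffff;

    // First call site: full NMAX-byte chunks, reduced after each chunk.
    while (len >= NMAX) {
        for (size_t n = NMAX / 16; n != 0; --n, buf += 16) {
            accum16(s1, s2, buf);
        }
        len -= NMAX;
        s1 %= BASE;
        s2 %= BASE;
    }

    // Second call site: remaining 16-byte blocks (fewer than NMAX bytes).
    for (; len >= 16; len -= 16, buf += 16) {
        accum16(s1, s2, buf);
    }

    // Tail: fewer than 16 bytes, handled one at a time.
    for (; len != 0; --len) {
        s1 += *buf++;
        s2 += s1;
    }

    s1 %= BASE;
    s2 %= BASE;
    return (s2 << 16) | s1;
}
```

In the stub generator the helper would emit instructions instead of computing values, but the structure is the same: one definition, two call sites.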
--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671