RFR(S): Vector popcount support

Tue Mar 13 00:47:48 UTC 2018

Hi Vladimir,

Thank you so much for the quick review.

I have addressed following issues:
- renamed support_avx512_vpopcntdq to supports_vpopcntdq
- No longer check UseAVX > 2 following your suggestion to clear flag when AVX512 is not available
- Added length checking in predicate
- Added appropriate compiler tests to exercise functionality

Update patch is available at: http://cr.openjdk.java.net/~rlupusoru/jdk_hs/webrev_vpopcount_03/

Regarding b and w variants, semantics of popcount are not consistent with other instructions which do allow narrowing to subword types. Namely, since popcount counts bits, narrowing ends up losing bit count information. Potentially, this can be fixed with some additional instructions being generated that test top bit and add appropriate count to each lane based on that top bit. But given users are likely to use Integer.bitCount without casting result to byte/short, I don't believe adding these variants is useful. If you believe otherwise, I can try to see what I can do.

Regarding q variant, it is currently not easily supported in vectorizer without some fundamental changes. That is because Long.bitCount returns int instead of long. The type mismatch in same chain of operations does not pass vectorizer alignment checking.

Let me know if you have any more comments or suggestions!

Thanks,
Razvan

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Friday, March 09, 2018 4:03 PM
To: Lupusoru, Razvan A <razvan.a.lupusoru at intel.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR(S): Vector popcount support

Hi Razvan,

In general changes are good. Do you plans to add vpopcntb,w too?

Use 'supports' with 's' in name as in other support functions names:
supports_avx512_vpopcntdq()

Also why use avx512 in function name? I know it is CPUID bit name. But do you have other vpopcntdq instructions, not avx512?

In assembler_x86.cpp and other places you don't need to check UseAVX,
support_avx512_vpopcntdq() is enough. You can clear feature bit in vm_version_x86.cpp when AVX < 3:

   if (UseAVX < 3) {
     _features &= ~CPU_AVX512F;
     _features &= ~CPU_AVX512DQ;
     _features &= ~CPU_AVX512CD;
     _features &= ~CPU_AVX512BW;
     _features &= ~CPU_AVX512VL;
+   _features &= ~CPU_AVX512_VPOPCNTDQ;
   }

In x86.ad you forgot to add length check in predicate() like next:

instruct vadd2I_reg(vecD dst, vecD src1, vecD src2) %{
   predicate(UseAVX > 0 && n->as_Vector()->length() == 2);

And, please, add code generation test to test/hotspot/jtreg/compiler/vectorization/ tests to verify correctness of vector operations.

Thanks,
Vladimir

On 3/9/18 2:54 PM, Lupusoru, Razvan A wrote:
> Hi everyone,
> 
> As per "Intel Architecture Instruction Set Extensions and Future 
> Features Programming Reference" manual [1], vector popcount 
> instruction will be supported in future Intel ISA. I have updated the 
> superword vectorizer to take advantage of this instruction. I have 
> tested with Intel SDE [2] to confirm encoding and semantics are 
> correctly implemented. Please take a look and let me know if you have 
> any questions or comments.
> 
> http://cr.openjdk.java.net/~rlupusoru/jdk_hs/webrev_vpopcount_01/index
> .html
> 
> Thanks,
> 
> Razvan
> 
> [1]
> https://software.intel.com/sites/default/files/managed/c5/15/architect
> ure-instruction-set-extensions-programming-reference.pdf
> 
> [2]
> https://software.intel.com/en-us/articles/intel-software-development-e
> mulator
> 
> [3] https://bugs.openjdk.java.net/browse/JDK-8199421
>