RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2]

Thu Jan 25 13:41:27 UTC 2024

On Thu, 25 Jan 2024 07:58:14 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

>> @chhagedorn: Do you mean that `test_divc` and `test_divc_n` vectorize after JDK-8282365? They don't vectorize on my machine (on this PR).
>
> I just checked on my machine (on top of commit fb822e49f2a84423c8fd17db2e95bbdd5e7ec191) and these division tests do seem to vectorize. For example, this is the innermost loop in `test_divc` right before code emission:
> 
> ![test_divc](https://github.com/openjdk/jdk/assets/8792647/129d51c2-a1ad-4d02-ab81-02cd849af36f)
> 
> Here are my processor features in case it helps (subset of `lscpu` output):
> 
> 
> Architecture:            x86_64
>   CPU op-mode(s):        32-bit, 64-bit
>   Address sizes:         39 bits physical, 48 bits virtual
>   Byte Order:            Little Endian
> CPU(s):                  12
>   On-line CPU(s) list:   0-11
> Vendor ID:               GenuineIntel
>   Model name:            Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz
>     CPU family:          6
>     Model:               158
>     Thread(s) per core:  2
>     Core(s) per socket:  6
> (...)
>     Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
                         cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
                         ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art
                         arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
                         cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl
                         vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1
                         sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
                         xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault
                         epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced
                         tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust
                         bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx
                         smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves
                         dtherm ida arat pln pts hwp hwp_notify hwp_act_window
                         hwp_epp vnmi md_clear flush_l1d arch_capabilities
> (...)

Thanks for the clarification, @robcasloz and @chhagedorn. I have now investigated, and the tests do vectorize on my machine as well. I was confused because, before the change below, the IR framework did not match the nodes: they have a vector size of 4, whereas a rule without an explicit size expects the default of 8. Is that expected, and should we specify something more precise than the catch-all `IRNode.VECTOR_SIZE_ANY`?


     @Test
-    @IR(counts = { IRNode.ADD_VI,    "> 0",
-                   IRNode.RSHIFT_VI, "> 0",
-                   IRNode.SUB_VI,    "> 0" },
+    @IR(counts = { IRNode.ADD_VI,    IRNode.VECTOR_SIZE_ANY, "> 0",
+                   IRNode.RSHIFT_VI, IRNode.VECTOR_SIZE_ANY, "> 0",
+                   IRNode.SUB_VI,    IRNode.VECTOR_SIZE_ANY, "> 0" },
         applyIfCPUFeatureOr = {"sse2", "true", "asimd", "true"})
     void test_divc(int[] a0, int[] a1) {
         for (int i = 0; i < a0.length; i+=1) {
@@ -519,9 +519,9 @@ void test_divc(int[] a0, int[] a1) {
     }
 
     @Test
-    @IR(counts = { IRNode.ADD_VI,    "> 0",
-                   IRNode.RSHIFT_VI, "> 0",
-                   IRNode.SUB_VI,    "> 0" },
+    @IR(counts = { IRNode.ADD_VI,    IRNode.VECTOR_SIZE_ANY, "> 0",
+                   IRNode.RSHIFT_VI, IRNode.VECTOR_SIZE_ANY, "> 0",
+                   IRNode.SUB_VI,    IRNode.VECTOR_SIZE_ANY, "> 0" },
         applyIfCPUFeatureOr = {"sse2", "true", "asimd", "true"})
     void test_divc_n(int[] a0, int[] a1) {
         for (int i = 0; i < a0.length; i+=1) {
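
For reference, the 4-versus-8 mismatch is consistent with simple lane arithmetic. The sketch below assumes (this is not confirmed anywhere in this thread) that the default size corresponds to the widest `int` vector on an AVX2 machine, 32 bytes, while the matched nodes span only 16 bytes. `VectorLanes` and `lanes` are hypothetical names for illustration and are not part of the IR framework.

```java
// Hypothetical helper, not part of the IR framework: computes how many
// elements of a given size fit into a vector of a given width.
class VectorLanes {
    static int lanes(int vectorBytes, int elementBytes) {
        if (vectorBytes <= 0 || elementBytes <= 0 || vectorBytes % elementBytes != 0) {
            throw new IllegalArgumentException(
                "vector width must be a positive multiple of the element size");
        }
        return vectorBytes / elementBytes;
    }

    public static void main(String[] args) {
        // Assumption: 32 bytes is the widest vector for int on an AVX2 machine.
        System.out.println("32-byte vector of int: " + lanes(32, Integer.BYTES) + " lanes");
        // Assumption: the nodes that were actually emitted are 16 bytes wide.
        System.out.println("16-byte vector of int: " + lanes(16, Integer.BYTES) + " lanes");
    }
}
```

If that assumption holds, a rule that only accepts 8 lanes would miss 4-lane nodes, which would explain why `IRNode.VECTOR_SIZE_ANY` makes the rules match.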

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1466392626