RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2]
Roberto Castañeda Lozano
rcastanedalo at openjdk.org
Thu Jan 25 14:19:38 UTC 2024
On Thu, 25 Jan 2024 13:38:20 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:
>> I just checked on my machine (on top of commit fb822e49f2a84423c8fd17db2e95bbdd5e7ec191) and these division tests do seem to vectorize. For example, this is the innermost loop in `test_divc` right before code emission:
>>
>> ![test_divc](https://github.com/openjdk/jdk/assets/8792647/129d51c2-a1ad-4d02-ab81-02cd849af36f)
>>
>> Here are my processor features in case it helps (subset of `lscpu` output):
>>
>>
>> Architecture: x86_64
>> CPU op-mode(s): 32-bit, 64-bit
>> Address sizes: 39 bits physical, 48 bits virtual
>> Byte Order: Little Endian
>> CPU(s): 12
>> On-line CPU(s) list: 0-11
>> Vendor ID: GenuineIntel
>> Model name: Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz
>> CPU family: 6
>> Model: 158
>> Thread(s) per core: 2
>> Core(s) per socket: 6
>> (...)
>> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
>> ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art
>> arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
>> cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl
>> vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid
>> sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
>> xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault
>> epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced
>> tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust
>> bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx
>> smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves
>> dtherm ida arat pln pts hwp hwp_notify hwp_act_window
>> hwp_epp vnmi md_clear flush_l1d arch_capabilities
>> (...)
>
> Thanks for the clarification @robcasloz and @chhagedorn. I've investigated now, and they do vectorize on my machine as well. I was confused because, before the change below, the IR framework did not register the nodes (wrong vector size of 4 instead of the default of 8). Is that expected, and should we specify something else instead of the catch-all `IRNode.VECTOR_SIZE_ANY`?
>
>
> @Test
> - @IR(counts = { IRNode.ADD_VI, "> 0",
> - IRNode.RSHIFT_VI, "> 0",
> - IRNode.SUB_VI, "> 0" },
> + @IR(counts = { IRNode.ADD_VI, IRNode.VECTOR_SIZE_ANY, "> 0",
> + IRNode.RSHIFT_VI, IRNode.VECTOR_SIZE_ANY, "> 0",
> + IRNode.SUB_VI, IRNode.VECTOR_SIZE_ANY, "> 0" },
> applyIfCPUFeatureOr = {"sse2", "true", "asimd", "true"})
> void test_divc(int[] a0, int[] a1) {
> for (int i = 0; i < a0.length; i+=1) {
> @@ -519,9 +519,9 @@ void test_divc(int[] a0, int[] a1) {
> }
>
> @Test
> - @IR(counts = { IRNode.ADD_VI, "> 0",
> - IRNode.RSHIFT_VI, "> 0",
> - IRNode.SUB_VI, "> 0" },
> + @IR(counts = { IRNode.ADD_VI, IRNode.VECTOR_SIZE_ANY, "> 0",
> + IRNode.RSHIFT_VI, IRNode.VECTOR_SIZE_ANY, "> 0",
> + IRNode.SUB_VI, IRNode.VECTOR_SIZE_ANY, "> 0" },
> applyIfCPUFeatureOr = {"sse2", "true", "asimd", "true"})
> void test_divc_n(int[] a0, int[] a1) {
> for (int i = 0; i < a0.length; i+=1) {
@dlunde do you understand what factors determine the length of the vector? Why is the default of `IRNode.VECTOR_SIZE_MAX` not working?
Perhaps C2 hits the loop unrolling limit? @dlunde you can test this by trying out a large value for `-XX:LoopUnrollLimit`. But even if this turned out to be the case, I would still suggest using `IRNode.VECTOR_SIZE_ANY` rather than forcing a higher loop unroll limit value for the tests.
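
For experimenting with the unroll limit outside the test, a minimal standalone sketch might look like the following. Everything in it is an assumption made for illustration: the class name `DivcUnrollSketch`, the divisor value, the loop body (the diff above only shows the loop header of `test_divc`), and the array contents. It only checks that the results stay correct; it does not inspect the IR.

```java
public class DivcUnrollSketch {
    // Hypothetical divisor; the constant used by the real test is not shown in this thread.
    static final int DIVISOR = 15;

    // Same loop header as test_divc in the diff above; the body is an assumption.
    static void divc(int[] a0, int[] a1) {
        for (int i = 0; i < a0.length; i += 1) {
            a0[i] = a1[i] / DIVISOR;
        }
    }

    public static void main(String[] args) {
        int[] a1 = new int[1024];
        for (int i = 0; i < a1.length; i++) {
            a1[i] = i * 31 - 7000; // mix of negative and positive inputs
        }
        int[] a0 = new int[a1.length];

        // Call the loop repeatedly so that the JIT is likely to compile it.
        for (int iter = 0; iter < 20_000; iter++) {
            divc(a0, a1);
        }

        // Recompute each element and compare with what the hot loop produced.
        for (int i = 0; i < a1.length; i++) {
            if (a0[i] != a1[i] / DIVISOR) {
                throw new AssertionError("mismatch at index " + i);
            }
        }
        System.out.println("ok");
    }
}
```

This could be run with and without a raised limit, e.g. `java -XX:LoopUnrollLimit=1000 DivcUnrollSketch.java` (the value 1000 is arbitrary), and the generated code compared between the two runs.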
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1466441530