Vectorized Loop Unrolling on x64?
Ionut
ionutb83 at yahoo.com
Tue Oct 24 16:46:17 UTC 2017
Hello All,
Meanwhile, I tested two more scenarios:
- a[i] = b[i] + c[i] // where a, b, c are arrays of ints
- a[i] = a[i] + <int_value> // where <int_value> might be a constant, etc.
Both of these were vectorized, but my initial example (iterating through an array of ints and computing the sum of its elements) is not, which makes me think this case is currently not supported by the JIT.
Could you please confirm this?
Regards,
Ionut
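For reference, the three loop shapes discussed in this thread can be written side by side as in the sketch below. The class and method names (LoopShapes, addArrays, addConstant, sumAsLong, sumAsInt) are hypothetical and do not come from the original benchmark. The sumAsInt variant is an extra experiment not mentioned in the messages: it keeps the accumulator the same width as the array elements, which removes the int-to-long widening from the loop. Whether C2 vectorizes it on a given JDK build is not stated in the thread and would need to be checked in the generated assembly.

```java
class LoopShapes {
    // Scenario 1 from the message: element-wise a[i] = b[i] + c[i].
    // Assumes b and c are at least as long as a.
    public static void addArrays(int[] a, int[] b, int[] c) {
        for (int i = 0; i < a.length; i++) {
            a[i] = b[i] + c[i];
        }
    }

    // Scenario 2 from the message: a[i] = a[i] + <int_value>.
    public static void addConstant(int[] a, int value) {
        for (int i = 0; i < a.length; i++) {
            a[i] = a[i] + value;
        }
    }

    // The original case: a reduction over an int[] into a long accumulator.
    // Each element is widened from int to long before it is added.
    public static long sumAsLong(int[] array) {
        long sum = 0;
        for (int i = 0; i < array.length; i++) {
            sum += array[i];
        }
        return sum;
    }

    // Hypothetical variant for comparison: same reduction, int accumulator.
    // Note that this wraps around on overflow, unlike sumAsLong.
    public static int sumAsInt(int[] array) {
        int sum = 0;
        for (int i = 0; i < array.length; i++) {
            sum += array[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] b = {1, 2, 3, 4};
        int[] c = {10, 20, 30, 40};
        int[] a = new int[4];
        addArrays(a, b, c);
        addConstant(a, 5);
        System.out.println(sumAsLong(a));
        System.out.println(sumAsInt(a));
    }
}
```

Scenarios 1 and 2 have no dependency between iterations, while both sum methods carry the accumulator from one iteration to the next, so they are reductions. Comparing the assembly for sumAsLong and sumAsInt may help narrow down whether the reduction itself or the widening to long is what prevents vectorization.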
On Tuesday, October 24, 2017 12:24 PM, Ionut <ionutb83 at yahoo.com> wrote:
Hi Nils,
Thanks, that is clear. However, I tried a simple example (iterating through an array and summing its elements, benchmarked with JMH) on my x64 Linux machine, and it does not seem to be vectorized. The source code and assembly are below. Could you please give me a hint? Am I doing something wrong?
JDK is 9.0.1
Source code:
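import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.NANOSECONDS)
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.NANOSECONDS)
@Fork(value = 3, jvmArgsAppend = { "-XX:-TieredCompilation", "-Xbatch", "-XX:+UseSuperWord" })
@State(Scope.Benchmark)
public class Sum1ToNArray {

    private int[] array;

    public static void main(String[] args) throws RunnerException {
        Options opt =
            new OptionsBuilder()
                .include(Sum1ToNArray.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }

    // Fill the array with 1..N once per trial.
    @Setup(Level.Trial)
    public void setUp() {
        this.array = new int[100_000_000];
        for (int i = 0; i < array.length; i++)
            array[i] = i + 1;
    }

    // Reduction under test: int elements summed into a long accumulator.
    @Benchmark
    public long hotMethod() {
        long sum = 0;
        for (int i = 0; i < array.length; i++) {
            sum += array[i];
        }
        return sum;
    }
}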
Assembly:
....[Hottest Region 1]..............................................................................
c2, com.jpt.Sum1ToNArray::hotMethod, version 139 (63 bytes)

                      0x00007f7bf1bff0f9: mov r8d,r10d
                      0x00007f7bf1bff0fc: add r8d,0xfffffff9
                      0x00007f7bf1bff100: mov r11d,0x1
                      0x00007f7bf1bff106: cmp r8d,0x1
                 ╭    0x00007f7bf1bff10a: jg 0x00007f7bf1bff114
                 │    0x00007f7bf1bff10c: mov rax,rdx
                 │╭   0x00007f7bf1bff10f: jmp 0x00007f7bf1bff15d
                 ││↗  0x00007f7bf1bff111: mov rdx,rax  ;*lload_1 {reexecute=0 rethrow=0 return_oop=0}
                 │││                                   ; - com.jpt.Sum1ToNArray::hotMethod at 13 (line 53)
                 ↘││  0x00007f7bf1bff114: movsxd rsi,DWORD PTR [r14+r11*4+0x10]
 11.08%   8.55%   ││  0x00007f7bf1bff119: movsxd rbp,DWORD PTR [r14+r11*4+0x14]
  0.30%   0.17%   ││  0x00007f7bf1bff11e: movsxd r13,DWORD PTR [r14+r11*4+0x18]
                  ││  0x00007f7bf1bff123: movsxd rax,DWORD PTR [r14+r11*4+0x2c]
  8.86%   2.85%   ││  0x00007f7bf1bff128: movsxd r9,DWORD PTR [r14+r11*4+0x28]
 10.49%  23.29%   ││  0x00007f7bf1bff12d: movsxd rcx,DWORD PTR [r14+r11*4+0x24]
  0.38%   0.45%   ││  0x00007f7bf1bff132: movsxd rbx,DWORD PTR [r14+r11*4+0x20]
  0.03%   0.06%   ││  0x00007f7bf1bff137: movsxd rdi,DWORD PTR [r14+r11*4+0x1c]
  0.23%   0.22%   ││  0x00007f7bf1bff13c: add rsi,rdx
 10.58%  18.59%   ││  0x00007f7bf1bff13f: add rbp,rsi
  0.32%   0.17%   ││  0x00007f7bf1bff142: add r13,rbp
  0.05%   0.04%   ││  0x00007f7bf1bff145: add rdi,r13
 26.10%  28.47%   ││  0x00007f7bf1bff148: add rbx,rdi
  5.55%   5.48%   ││  0x00007f7bf1bff14b: add rcx,rbx
  5.66%   1.32%   ││  0x00007f7bf1bff14e: add r9,rcx
  7.85%   3.11%   ││  0x00007f7bf1bff151: add rax,r9   ;*ladd {reexecute=0 rethrow=0 return_oop=0}
                  ││                                   ; - com.jpt.Sum1ToNArray::hotMethod at 21 (line 53)
 10.19%   5.67%   ││  0x00007f7bf1bff154: add r11d,0x8 ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                  ││                                   ; - com.jpt.Sum1ToNArray::hotMethod at 23 (line 52)
  0.38%   0.12%   ││  0x00007f7bf1bff158: cmp r11d,r8d
                  │╰  0x00007f7bf1bff15b: jl 0x00007f7bf1bff111  ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                  │                                    ; - com.jpt.Sum1ToNArray::hotMethod at 10 (line 52)
                  ↘   0x00007f7bf1bff15d: cmp r11d,r10d
                      0x00007f7bf1bff160: jge 0x00007f7bf1bff174
                      0x00007f7bf1bff162: xchg ax,ax   ;*lload_1 {reexecute=0 rethrow=0 return_oop=0}
                                                       ; - com.jpt.Sum1ToNArray::hotMethod at 13 (line 53)
                      0x00007f7bf1bff164: movsxd r8,DWORD PTR [r14+r11*4+0x10]
                      0x00007f7bf1bff169: add rax,r8   ;*ladd {reexecute=0 rethrow=0 return_oop=0}
                                                       ; - com.jpt.Sum1ToNArray::hotMethod at 21 (line 53)
Regards
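As a reading aid for the listing in this message: each movsxd loads one int and sign-extends it to 64 bits, the eight add instructions form a single dependency chain through the accumulator, and the index advances by 8 per iteration. No packed SIMD instructions or xmm/ymm registers appear in the excerpt, which is consistent with scalar unrolling rather than vectorization. A source-level sketch of roughly what the excerpt seems to do is shown below. The class and method names are hypothetical, and the handling of the first element before the main loop is an assumption, since the excerpt only shows the index being initialized to 1.

```java
class UnrolledSumSketch {
    public static long unrolledSum(int[] array) {
        int length = array.length;
        long sum = 0;
        int i = 0;

        // Assumption: the first element is handled before the excerpt starts,
        // because the listing enters the main loop with the index set to 1.
        if (length > 0) {
            sum = array[0];
            i = 1;
        }

        // Main loop: limit = length - 7 (add r8d,0xfffffff9), 8 elements per pass.
        int limit = length - 7;
        while (i < limit) {
            // Eight scalar loads, each widened to long (movsxd),
            // added one after another into the same running sum.
            sum += array[i];
            sum += array[i + 1];
            sum += array[i + 2];
            sum += array[i + 3];
            sum += array[i + 4];
            sum += array[i + 5];
            sum += array[i + 6];
            sum += array[i + 7];
            i += 8; // add r11d,0x8
        }

        // Post loop: remaining elements one at a time (cmp r11d,r10d).
        while (i < length) {
            sum += array[i];
            i++;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] array = new int[20];
        for (int i = 0; i < array.length; i++) {
            array[i] = i + 1;
        }
        System.out.println(unrolledSum(array));
    }
}
```

The sketch only mirrors the control flow and the order of the additions. It says nothing about why C2 chose this shape for the loop.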
On Tuesday, October 24, 2017 11:22 AM, Nils Eliasson <nils.eliasson at oracle.com> wrote:
Hi Ionut,
In this case x86 refers to both x86_32/ia32 and x86_64/amd64/x64.
Regards,
Nils Eliasson
On 2017-10-24 11:05, Ionut wrote:
Hello All,
I want to ask about https://bugs.openjdk.java.net/browse/JDK-8129920 (Vectorized loop unrolling), which says it is applicable only to x86 targets. Do you plan to port this to x64 as well, or am I missing something?
Regards,
Ionut
More information about the hotspot-compiler-dev mailing list