Vectorized Loop Unrolling on x64?
Vladimir Kozlov
vladimir.kozlov at oracle.com
Tue Oct 24 17:03:51 UTC 2017
You are right - your initial examples are not supported by the current HotSpot JIT vectorization.
The second example (sum/reduction) could be optimized (https://bugs.openjdk.java.net/browse/JDK-8074981), but because the
generated code is very expensive, we limited it to cases where the benefit outweighs the cost:
https://bugs.openjdk.java.net/browse/JDK-8078563.
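
For reference, the loop shapes discussed in this thread can be sketched roughly as below. This is only an illustration: the class and method names (LoopShapes, addArrays, addScalar, sumToLong, sumToInt) are hypothetical, and sumToInt is an assumed variant that does not appear in the original benchmark. Whether a given loop is actually vectorized depends on the JDK build, the CPU, and the cost heuristics mentioned above, so the generated assembly should be inspected rather than assumed.

```java
public class LoopShapes {

    // Element-wise add of two int arrays into a third.
    // Reported in this thread as vectorized. Assumes equal array lengths.
    static void addArrays(int[] a, int[] b, int[] c) {
        for (int i = 0; i < a.length; i++) {
            a[i] = b[i] + c[i];
        }
    }

    // Element-wise add of a loop-invariant int value.
    // Also reported in this thread as vectorized.
    static void addScalar(int[] a, int value) {
        for (int i = 0; i < a.length; i++) {
            a[i] = a[i] + value;
        }
    }

    // Reduction of int elements into a long accumulator.
    // This is the shape of the benchmark's hotMethod; the posted assembly
    // shows it unrolled 8x but still using scalar instructions.
    static long sumToLong(int[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Hypothetical variant with an int accumulator (may overflow for large sums).
    // Not part of the original benchmark; whether the JIT vectorizes it is
    // subject to the reduction heuristics and must be checked in the assembly.
    static int sumToInt(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] b = {1, 2, 3, 4};
        int[] c = {10, 20, 30, 40};
        int[] a = new int[b.length];
        addArrays(a, b, c);
        addScalar(a, 1);
        System.out.println(sumToLong(a) + " " + sumToInt(a));
    }
}
```

The first two methods store each result back into an array element, while the last two carry a single accumulator across iterations, which is what makes them reductions.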
Regards,
Vladimir
On 10/24/17 9:46 AM, Ionut wrote:
> Hello All,
>
> Meanwhile, I tested two other scenarios, as follows:
>
> - a[i] = b[i] + c[i] // where a, b, c are arrays of ints
> - a[i] = a[i] + <int_value> // where <int_value> might be a constant, etc.
>
> Both cases were vectorized, but my initial example (i.e. iterating through the array of ints and computing the
> sum of elements) was not, which makes me think this case is currently not supported by the JIT.
>
> Could you please confirm this?
>
> Regards
> Ionut
>
>
> On Tuesday, October 24, 2017 12:24 PM, Ionut <ionutb83 at yahoo.com> wrote:
>
>
> Hi Nils,
>
> Thanks, that is clear. However, I tried a simple example (just iterating through an array and computing the sum,
> using JMH) on my x64 Linux machine, and it does not seem to be vectorized. The source code and assembly are below.
> Could you please give me a hint? Am I doing something wrong?
>
> *JDK is 9.0.1*
>
> *_Source code:_*
>
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.NANOSECONDS)
> @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.NANOSECONDS)
> @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.NANOSECONDS)
> @Fork(value = 3, jvmArgsAppend = { "-XX:-TieredCompilation", "-Xbatch", "-XX:+UseSuperWord" })
> @State(Scope.Benchmark)
> public class Sum1ToNArray {
>     private int[] array;
>
>     public static void main(String[] args) throws RunnerException {
>         Options opt =
>             new OptionsBuilder()
>                 .include(Sum1ToNArray.class.getSimpleName())
>                 .build();
>         new Runner(opt).run();
>     }
>
>     @Setup(Level.Trial)
>     public void setUp() {
>         this.array = new int[100_000_000];
>         for (int i = 0; i < array.length; i++)
>             array[i] = i + 1;
>     }
>
>     @Benchmark
>     public long hotMethod() {
>         long sum = 0;
>         for (int i = 0; i < array.length; i++) {
>             sum += array[i];
>         }
>         return sum;
>     }
> }
>
> *_Assembly:_*
> ....[Hottest Region 1]..............................................................................
> c2, com.jpt.Sum1ToNArray::hotMethod, version 139 (63 bytes)
>
> 0x00007f7bf1bff0f9: mov r8d,r10d
> 0x00007f7bf1bff0fc: add r8d,0xfffffff9
> 0x00007f7bf1bff100: mov r11d,0x1
> 0x00007f7bf1bff106: cmp r8d,0x1
> ╭ 0x00007f7bf1bff10a: jg 0x00007f7bf1bff114
> │ 0x00007f7bf1bff10c: mov rax,rdx
> │╭ 0x00007f7bf1bff10f: jmp 0x00007f7bf1bff15d
> ││↗ 0x00007f7bf1bff111: mov rdx,rax ;*lload_1 {reexecute=0 rethrow=0 return_oop=0}
> │││ ; - com.jpt.Sum1ToNArray::hotMethod@13 (line 53)
> ↘││ 0x00007f7bf1bff114: movsxd rsi,DWORD PTR [r14+r11*4+0x10]
> 11.08% 8.55% ││ 0x00007f7bf1bff119: movsxd rbp,DWORD PTR [r14+r11*4+0x14]
> 0.30% 0.17% ││ 0x00007f7bf1bff11e: movsxd r13,DWORD PTR [r14+r11*4+0x18]
> ││ 0x00007f7bf1bff123: movsxd rax,DWORD PTR [r14+r11*4+0x2c]
> 8.86% 2.85% ││ 0x00007f7bf1bff128: movsxd r9,DWORD PTR [r14+r11*4+0x28]
> 10.49% 23.29% ││ 0x00007f7bf1bff12d: movsxd rcx,DWORD PTR [r14+r11*4+0x24]
> 0.38% 0.45% ││ 0x00007f7bf1bff132: movsxd rbx,DWORD PTR [r14+r11*4+0x20]
> 0.03% 0.06% ││ 0x00007f7bf1bff137: movsxd rdi,DWORD PTR [r14+r11*4+0x1c]
> 0.23% 0.22% ││ 0x00007f7bf1bff13c: add rsi,rdx
> 10.58% 18.59% ││ 0x00007f7bf1bff13f: add rbp,rsi
> 0.32% 0.17% ││ 0x00007f7bf1bff142: add r13,rbp
> 0.05% 0.04% ││ 0x00007f7bf1bff145: add rdi,r13
> 26.10% 28.47% ││ 0x00007f7bf1bff148: add rbx,rdi
> 5.55% 5.48% ││ 0x00007f7bf1bff14b: add rcx,rbx
> 5.66% 1.32% ││ 0x00007f7bf1bff14e: add r9,rcx
> 7.85% 3.11% ││ 0x00007f7bf1bff151: add rax,r9 ;*ladd {reexecute=0 rethrow=0 return_oop=0}
> ││ ; - com.jpt.Sum1ToNArray::hotMethod@21 (line 53)
> 10.19% 5.67% ││ 0x00007f7bf1bff154: add r11d,0x8 ;*iinc {reexecute=0 rethrow=0 return_oop=0}
> ││ ; - com.jpt.Sum1ToNArray::hotMethod@23 (line 52)
> 0.38% 0.12% ││ 0x00007f7bf1bff158: cmp r11d,r8d
> │╰ 0x00007f7bf1bff15b: jl 0x00007f7bf1bff111 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
> │ ; - com.jpt.Sum1ToNArray::hotMethod@10 (line 52)
> ↘ 0x00007f7bf1bff15d: cmp r11d,r10d
> 0x00007f7bf1bff160: jge 0x00007f7bf1bff174
> 0x00007f7bf1bff162: xchg ax,ax ;*lload_1 {reexecute=0 rethrow=0 return_oop=0}
> ; - com.jpt.Sum1ToNArray::hotMethod@13 (line 53)
> 0x00007f7bf1bff164: movsxd r8,DWORD PTR [r14+r11*4+0x10]
> 0x00007f7bf1bff169: add rax,r8 ;*ladd {reexecute=0 rethrow=0 return_oop=0}
> ; - com.jpt.Sum1ToNArray::hotMethod@21 (line 53)
>
> Regards
>
>
> On Tuesday, October 24, 2017 11:22 AM, Nils Eliasson <nils.eliasson at oracle.com> wrote:
>
>
> Hi Ionut,
> In this case x86 refers to both x86_32/ia32 and x86_64/amd64/x64.
> Regards,
> Nils Eliasson
>
> On 2017-10-24 11:05, Ionut wrote:
>> Hello All,
>>
>> I want to ask you about https://bugs.openjdk.java.net/browse/JDK-8129920 (*Vectorized loop unrolling*), which says
>> it is applicable _only for x86 targets_. Do you plan to port this to x64 as well, or am I missing something here?
>>
>> Regards
>> Ionut
>
>
>
>
>