RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)
Richard Startin
github.com+16439049+richardstartin at openjdk.java.net
Mon Sep 13 17:28:46 UTC 2021
On Thu, 13 May 2021 20:23:16 GMT, Richard Startin <github.com+16439049+richardstartin at openjdk.org> wrote:
>> In private correspondence with Vladimir, it was explained that where Vladimir's code and Laurent's code are identical, including typos ([Vladimir's code](https://github.com/iaroslavski/sorting/commit/f076073b8b819a9687613903a164e3ed71821769#diff-4b4d68fc834c2ad12a9fb9d316a812221af7c398338ed2ee907d0a795e7aadafR672), [Laurent's code](https://github.com/bourgesl/radix-sort-benchmark/commit/a693b26b2e2c14cfeedf9c753c9d643096b0e38d#diff-397ce8fd791e2ce508cf9127201bc9ab46264cd2a79fd0487a63569f2e4b59b2R719)) it is because Vladimir sent the code to Laurent, not the other way around, therefore Vladimir's code does not derive from Laurent's, and it does not derive from mine. I can only trust that this is the case, so please disregard my claim that this is derivative work when reviewing this PR.
>
> For what it's worth, I benchmarked this implementation radix sort ([adapted here to fit in to my harness](https://github.com/richardstartin/radix-sort-benchmark/commit/07169e8e8602152cfda859baa159db165bf5fcab#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR681-R710782)) against a [signed variant](https://github.com/richardstartin/radix-sort-benchmark/commit/07169e8e8602152cfda859baa159db165bf5fcab#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR396-R478) of what I have claimed this work was derived from and the proposed implementation does not perform favourably on uniformly random data:
>
>
>
> Benchmark (bits) (padding) (scenario) (seed) (size) Mode Cnt Score Error Units
> RadixSortBenchmark.jdk 17 7 UNIFORM 0 1000000 avgt 5 11301.950 ± 113.691 us/op
> RadixSortBenchmark.jdk 23 7 UNIFORM 0 1000000 avgt 5 11792.351 ± 60.757 us/op
> RadixSortBenchmark.jdk 30 7 UNIFORM 0 1000000 avgt 5 11184.616 ± 67.094 us/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned 17 7 UNIFORM 0 1000000 avgt 5 9564.626 ± 69.497 us/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned 23 7 UNIFORM 0 1000000 avgt 5 9432.085 ± 58.983 us/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned 30 7 UNIFORM 0 1000000 avgt 5 10772.975 ± 51.848 us/op
>
>
>
> I believe the root cause is a defect in the mechanism employed to skip passes as can be seen by the increased number of instructions and cycles here. In the proposed implementation, instructions is roughly constant as a function of bits. In the case where all passes must be performed (bits = 30), IPC is superior in `unrollOnePassHistogramsSkipLevelsSigned`.
>
>
> Benchmark (bits) (padding) (scenario) (seed) (size) Mode Cnt Score Error Units
> RadixSortBenchmark.jdk:cycles 17 7 UNIFORM 0 1000000 avgt 34976971.877 #/op
> RadixSortBenchmark.jdk:instructions 17 7 UNIFORM 0 1000000 avgt 70121142.003 #/op
> RadixSortBenchmark.jdk:cycles 23 7 UNIFORM 0 1000000 avgt 32369970.385 #/op
> RadixSortBenchmark.jdk:instructions 23 7 UNIFORM 0 1000000 avgt 70201664.963 #/op
> RadixSortBenchmark.jdk:cycles 30 7 UNIFORM 0 1000000 avgt 30789736.602 #/op
> RadixSortBenchmark.jdk:instructions 30 7 UNIFORM 0 1000000 avgt 70180942.122 #/op
> RadixSortBenchmark.jdk:IPC 30 7 UNIFORM 0 1000000 avgt 2.279 insns/clk
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:cycles 17 7 UNIFORM 0 1000000 avgt 26983994.479 #/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:instructions 17 7 UNIFORM 0 1000000 avgt 62065304.827 #/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:cycles 23 7 UNIFORM 0 1000000 avgt 26161703.124 #/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:instructions 23 7 UNIFORM 0 1000000 avgt 63102683.167 #/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:cycles 30 7 UNIFORM 0 1000000 avgt 29780103.795 #/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:instructions 30 7 UNIFORM 0 1000000 avgt 74437097.025 #/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:IPC 30 7 UNIFORM 0 1000000 avgt 2.500 insns/clk
>
>
>
> The biggest difference in executed code appears to be that bounds checks are not eliminated in the first counting pass in the proposed implementation:
>
>
> ....[Hottest Region 1]..............................................................................
> c2, level 4, io.github.richardstartin.radixsort.RadixSort::jdk, version 402 (166 bytes)
>
> 0x00007f0d2fb20579: cmp $0x100,%r10d
> 0x00007f0d2fb20580: jae 0x00007f0d2fb21344
> 0x00007f0d2fb20586: decl 0x10(%rdx,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> ; - io.github.richardstartin.radixsort.RadixSort::jdk at 107 (line 697)
> 0x00007f0d2fb2058b: cmp $0x1,%esi
> 0x00007f0d2fb2058e: jle 0x00007f0d2fb2134c
> 0x00007f0d2fb20594: mov $0x1,%r11d
> 0x00007f0d2fb2059a: xor %ecx,%ecx
> 0x00007f0d2fb2059c: nopl 0x0(%rax) ;*aload_2 {reexecute=0 rethrow=0 return_oop=0}
> ; - io.github.richardstartin.radixsort.RadixSort::jdk at 41 (line 694)
> 0.06% ↗ 0x00007f0d2fb205a0: movzbl 0x10(%rbp,%r11,4),%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 49 (line 694)
> 0.18% │ 0x00007f0d2fb205a6: decl 0x10(%r8,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 54 (line 694)
> 1.37% │ 0x00007f0d2fb205ab: mov 0x10(%rbp,%r11,4),%r10d
> 0.42% │ 0x00007f0d2fb205b0: shr $0x8,%r10d
> 0.34% │ 0x00007f0d2fb205b4: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 66 (line 695)
> 0.50% │ 0x00007f0d2fb205b8: decl 0x10(%r9,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 71 (line 695)
> 1.75% │ 0x00007f0d2fb205bd: mov 0x10(%rbp,%r11,4),%r10d
> 0.42% │ 0x00007f0d2fb205c2: shr $0x10,%r10d
> 0.12% │ 0x00007f0d2fb205c6: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 84 (line 696)
> 0.36% │ 0x00007f0d2fb205ca: decl 0x10(%rbx,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 89 (line 696)
> 1.73% │ 0x00007f0d2fb205cf: mov 0x10(%rbp,%r11,4),%r10d
> 0.26% │ 0x00007f0d2fb205d4: shr $0x18,%r10d
> 0.38% │ 0x00007f0d2fb205d8: xor $0x80,%r10d ;*ixor {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 102 (line 697)
> 0.28% │ 0x00007f0d2fb205df: cmp $0x100,%r10d
> │ 0x00007f0d2fb205e6: jae 0x00007f0d2fb20bab
> 0.38% │ 0x00007f0d2fb205ec: decl 0x10(%rdx,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 107 (line 697)
> 1.25% │ 0x00007f0d2fb205f1: movzbl 0x14(%rbp,%r11,4),%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 49 (line 694)
> 1.63% │ 0x00007f0d2fb205f7: decl 0x10(%r8,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 54 (line 694)
> 1.71% │ 0x00007f0d2fb205fc: mov 0x14(%rbp,%r11,4),%r10d
> 0.16% │ 0x00007f0d2fb20601: shr $0x8,%r10d
> 0.18% │ 0x00007f0d2fb20605: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 66 (line 695)
> 0.36% │ 0x00007f0d2fb20609: decl 0x10(%r9,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 71 (line 695)
> 2.05% │ 0x00007f0d2fb2060e: mov 0x14(%rbp,%r11,4),%r10d
> 0.14% │ 0x00007f0d2fb20613: shr $0x10,%r10d
> 0.38% │ 0x00007f0d2fb20617: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 84 (line 696)
> 0.14% │ 0x00007f0d2fb2061b: decl 0x10(%rbx,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 89 (line 696)
> 1.51% │ 0x00007f0d2fb20620: mov 0x14(%rbp,%r11,4),%r10d
> 0.14% │ 0x00007f0d2fb20625: shr $0x18,%r10d
> 0.32% │ 0x00007f0d2fb20629: xor $0x80,%r10d ;*ixor {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 102 (line 697)
> 0.12% │ 0x00007f0d2fb20630: cmp $0x100,%r10d
> │ 0x00007f0d2fb20637: jae 0x00007f0d2fb20ba8
> 0.78% │ 0x00007f0d2fb2063d: decl 0x10(%rdx,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 107 (line 697)
> 1.04% │ 0x00007f0d2fb20642: add $0x2,%r11d ;*iinc {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 108 (line 693)
> 0.64% │ 0x00007f0d2fb20646: cmp %esi,%r11d
> ╰ 0x00007f0d2fb20649: jl 0x00007f0d2fb205a0 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
> ; - io.github.richardstartin.radixsort.RadixSort::jdk at 38 (line 693)
> 0x00007f0d2fb2064f: cmp %r13d,%r11d
> 0x00007f0d2fb20652: jge 0x00007f0d2fb206ad ;*aload_2 {reexecute=0 rethrow=0 return_oop=0}
> ; - io.github.richardstartin.radixsort.RadixSort::jdk at 41 (line 694)
> 0x00007f0d2fb20654: movzbl 0x10(%rbp,%r11,4),%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> ; - io.github.richardstartin.radixsort.RadixSort::jdk at 49 (line 694)
> 0x00007f0d2fb2065a: decl 0x10(%r8,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> ; - io.github.richardstartin.radixsort.RadixSort::jdk at 54 (line 694)
> ....................................................................................................
> 21.11% <total for region 1>
>
>
>
> Whereas they have been eliminated in the first pass of `unrollOnePassHistogramsSkipLevelsSigned`:
>
>
> ....[Hottest Region 1]..............................................................................
> c2, level 4, io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned, version 402 (544 bytes)
>
> 0x00007f123035bb6c: incl 0x14(%rbp,%rcx,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
> 0x00007f123035bb70: incl 0x14(%r9,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
> 0x00007f123035bb75: cmp $0x1,%esi
> 0x00007f123035bb78: jle 0x00007f123035cc28
> 0x00007f123035bb7e: mov $0x1,%edi
> 0x00007f123035bb83: mov $0x1,%ebx
> 0x00007f123035bb88: nopl 0x0(%rax,%rax,1) ;*aload {reexecute=0 rethrow=0 return_oop=0}
> ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 43 (line 402)
> 0.12% ↗ 0x00007f123035bb90: mov $0x80000000,%r10d
> 0.06% │ 0x00007f123035bb96: add 0x10(%rdx,%rdi,4),%r10d ;*isub {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
> 0.04% │ 0x00007f123035bb9b: mov %r10d,%r8d
> 0.10% │ 0x00007f123035bb9e: shr $0x18,%r8d ;*iushr {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
> 0.12% │ 0x00007f123035bba2: mov %r10d,%r11d
> 0.08% │ 0x00007f123035bba5: shr $0x10,%r11d
> 0.08% │ 0x00007f123035bba9: movzbl %r11b,%r11d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
> 0.08% │ 0x00007f123035bbad: mov %r10d,%ecx
> 0.04% │ 0x00007f123035bbb0: shr $0x8,%ecx
> 0.08% │ 0x00007f123035bbb3: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
> 0.14% │ 0x00007f123035bbb7: mov 0x18(%rsp),%r14
> 0.16% │ 0x00007f123035bbbc: incl 0x14(%r14,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
> 0.36% │ 0x00007f123035bbc1: movzbl %cl,%ecx ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
> 0.06% │ 0x00007f123035bbc4: mov 0x10(%rsp),%r10
> 0.10% │ 0x00007f123035bbc9: incl 0x14(%r10,%rcx,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
> 0.26% │ 0x00007f123035bbce: incl 0x14(%rbp,%r11,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
> 0.28% │ 0x00007f123035bbd3: incl 0x14(%r9,%r8,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
> 0.20% │ 0x00007f123035bbd8: mov $0x80000000,%r10d
> 0.10% │ 0x00007f123035bbde: add 0x14(%rdx,%rdi,4),%r10d ;*isub {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
> 0.08% │ 0x00007f123035bbe3: mov %r10d,%r8d
> 0.12% │ 0x00007f123035bbe6: shr $0x18,%r8d ;*iushr {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
> 0.04% │ 0x00007f123035bbea: mov %r10d,%r11d
> 0.06% │ 0x00007f123035bbed: shr $0x10,%r11d
> 0.06% │ 0x00007f123035bbf1: movzbl %r11b,%r11d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
> 0.14% │ 0x00007f123035bbf5: mov %r10d,%ecx
> 0.04% │ 0x00007f123035bbf8: shr $0x8,%ecx
> 0.04% │ 0x00007f123035bbfb: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
> 0.08% │ 0x00007f123035bbff: incl 0x14(%r14,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
> 0.46% │ 0x00007f123035bc04: movzbl %cl,%ecx ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
> 0.10% │ 0x00007f123035bc07: mov 0x10(%rsp),%r10
> 0.08% │ 0x00007f123035bc0c: incl 0x14(%r10,%rcx,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
> 0.22% │ 0x00007f123035bc11: incl 0x14(%rbp,%r11,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
> 0.30% │ 0x00007f123035bc16: incl 0x14(%r9,%r8,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
> 0.22% │ 0x00007f123035bc1b: mov $0x80000000,%r10d
> 0.04% │ 0x00007f123035bc21: add 0x18(%rdx,%rdi,4),%r10d ;*isub {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
> 0.20% │ 0x00007f123035bc26: mov %r10d,%r8d
> 0.18% │ 0x00007f123035bc29: shr $0x18,%r8d ;*iushr {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
> 0.12% │ 0x00007f123035bc2d: mov %r10d,%r11d
> 0.06% │ 0x00007f123035bc30: shr $0x10,%r11d
> 0.20% │ 0x00007f123035bc34: movzbl %r11b,%r11d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
> 0.10% │ 0x00007f123035bc38: mov %r10d,%ecx
> 0.12% │ 0x00007f123035bc3b: shr $0x8,%ecx
> 0.20% │ 0x00007f123035bc3e: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
> 0.08% │ 0x00007f123035bc42: incl 0x14(%r14,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
> 0.18% │ 0x00007f123035bc47: movzbl %cl,%ecx ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
> 0.08% │ 0x00007f123035bc4a: mov 0x10(%rsp),%r10
> 0.10% │ 0x00007f123035bc4f: incl 0x14(%r10,%rcx,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
> 0.20% │ 0x00007f123035bc54: incl 0x14(%rbp,%r11,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
> 0.12% │ 0x00007f123035bc59: incl 0x14(%r9,%r8,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
> 0.34% │ 0x00007f123035bc5e: mov $0x80000000,%r10d
> 0.04% │ 0x00007f123035bc64: add 0x1c(%rdx,%rdi,4),%r10d ;*isub {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
> 0.91% │ 0x00007f123035bc69: mov %r10d,%r8d
> 0.02% │ 0x00007f123035bc6c: shr $0x18,%r8d ;*iushr {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
> 0.12% │ 0x00007f123035bc70: mov %r10d,%r11d
> 0.06% │ 0x00007f123035bc73: shr $0x10,%r11d
> 0.06% │ 0x00007f123035bc77: movzbl %r11b,%r11d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
> 0.06% │ 0x00007f123035bc7b: mov %r10d,%ecx
> │ 0x00007f123035bc7e: shr $0x8,%ecx
> 0.10% │ 0x00007f123035bc81: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
> 0.08% │ 0x00007f123035bc85: incl 0x14(%r14,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
> 0.16% │ 0x00007f123035bc8a: movzbl %cl,%ecx ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
> │ 0x00007f123035bc8d: mov 0x10(%rsp),%r10
> 0.14% │ 0x00007f123035bc92: incl 0x14(%r10,%rcx,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
> 0.32% │ 0x00007f123035bc97: incl 0x14(%rbp,%r11,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
> 0.32% │ 0x00007f123035bc9c: incl 0x14(%r9,%r8,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
> 0.24% │ 0x00007f123035bca1: mov $0x80000000,%r10d
> 0.12% │ 0x00007f123035bca7: add 0x20(%rdx,%rdi,4),%r10d ;*isub {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
> 0.08% │ 0x00007f123035bcac: mov %r10d,%r8d
> 0.08% │ 0x00007f123035bcaf: shr $0x18,%r8d ;*iushr {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
> 0.06% │ 0x00007f123035bcb3: mov %r10d,%r11d
> 0.08% │ 0x00007f123035bcb6: shr $0x10,%r11d
> 0.10% │ 0x00007f123035bcba: movzbl %r11b,%r11d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
> 0.14% │ 0x00007f123035bcbe: mov %r10d,%ecx
> 0.14% │ 0x00007f123035bcc1: shr $0x8,%ecx
> 0.14% │ 0x00007f123035bcc4: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
> 0.08% │ 0x00007f123035bcc8: incl 0x14(%r14,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
> 0.22% │ 0x00007f123035bccd: movzbl %cl,%ecx ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
> 0.10% │ 0x00007f123035bcd0: mov 0x10(%rsp),%r10
> 0.14% │ 0x00007f123035bcd5: incl 0x14(%r10,%rcx,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
> 0.24% │ 0x00007f123035bcda: incl 0x14(%rbp,%r11,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
> 0.20% │ 0x00007f123035bcdf: incl 0x14(%r9,%r8,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
> 0.18% │ 0x00007f123035bce4: mov $0x80000000,%r10d
> 0.04% │ 0x00007f123035bcea: add 0x24(%rdx,%rdi,4),%r10d ;*isub {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
> 0.12% │ 0x00007f123035bcef: mov %r10d,%r8d
> 0.04% │ 0x00007f123035bcf2: shr $0x18,%r8d ;*iushr {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
> 0.08% │ 0x00007f123035bcf6: mov %r10d,%r11d
> 0.16% │ 0x00007f123035bcf9: shr $0x10,%r11d
> 0.06% │ 0x00007f123035bcfd: movzbl %r11b,%r11d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
> 0.06% │ 0x00007f123035bd01: mov %r10d,%ecx
> 0.04% │ 0x00007f123035bd04: shr $0x8,%ecx
> 0.02% │ 0x00007f123035bd07: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
> 0.12% │ 0x00007f123035bd0b: incl 0x14(%r14,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
> 0.22% │ 0x00007f123035bd10: movzbl %cl,%ecx ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
> 0.06% │ 0x00007f123035bd13: mov 0x10(%rsp),%r10
> 0.04% │ 0x00007f123035bd18: incl 0x14(%r10,%rcx,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
> 0.20% │ 0x00007f123035bd1d: incl 0x14(%rbp,%r11,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
> 0.28% │ 0x00007f123035bd22: incl 0x14(%r9,%r8,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
> 0.46% │ 0x00007f123035bd27: mov $0x80000000,%r10d
> 0.08% │ 0x00007f123035bd2d: add 0x28(%rdx,%rdi,4),%r10d ;*isub {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
> 0.08% │ 0x00007f123035bd32: mov %r10d,%r8d
> 0.04% │ 0x00007f123035bd35: shr $0x18,%r8d ;*iushr {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
> 0.10% │ 0x00007f123035bd39: mov %r10d,%r11d
> 0.04% │ 0x00007f123035bd3c: shr $0x10,%r11d
> 0.06% │ 0x00007f123035bd40: movzbl %r11b,%r11d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
> 0.04% │ 0x00007f123035bd44: mov %r10d,%ecx
> 0.10% │ 0x00007f123035bd47: shr $0x8,%ecx
> 0.10% │ 0x00007f123035bd4a: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
> 0.10% │ 0x00007f123035bd4e: incl 0x14(%r14,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
> 0.28% │ 0x00007f123035bd53: movzbl %cl,%ecx ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
> 0.14% │ 0x00007f123035bd56: mov 0x10(%rsp),%r10
> 0.10% │ 0x00007f123035bd5b: incl 0x14(%r10,%rcx,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
> 0.36% │ 0x00007f123035bd60: incl 0x14(%rbp,%r11,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
> 0.20% │ 0x00007f123035bd65: incl 0x14(%r9,%r8,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
> 0.28% │ 0x00007f123035bd6a: mov $0x80000000,%r10d
> 0.16% │ 0x00007f123035bd70: add 0x2c(%rdx,%rdi,4),%r10d ;*isub {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
> 0.08% │ 0x00007f123035bd75: mov %r10d,%r8d
> 0.10% │ 0x00007f123035bd78: shr $0x18,%r8d ;*iushr {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
> 0.08% │ 0x00007f123035bd7c: mov %r10d,%r11d
> 0.08% │ 0x00007f123035bd7f: shr $0x10,%r11d
> 0.14% │ 0x00007f123035bd83: movzbl %r11b,%r11d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
> 0.02% │ 0x00007f123035bd87: mov %r10d,%ecx
> 0.02% │ 0x00007f123035bd8a: shr $0x8,%ecx
> 0.20% │ 0x00007f123035bd8d: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
> 0.22% │ 0x00007f123035bd91: incl 0x14(%r14,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
> 0.22% │ 0x00007f123035bd96: movzbl %cl,%ecx ;*iand {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
> 0.12% │ 0x00007f123035bd99: mov 0x10(%rsp),%r10
> 0.04% │ 0x00007f123035bd9e: incl 0x14(%r10,%rcx,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
> 0.22% │ 0x00007f123035bda3: incl 0x14(%rbp,%r11,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
> 0.20% │ 0x00007f123035bda8: incl 0x14(%r9,%r8,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
> 0.26% │ 0x00007f123035bdad: add $0x8,%edi ;*iinc {reexecute=0 rethrow=0 return_oop=0}
> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 124 (line 402)
> 0.10% │ 0x00007f123035bdb0: cmp %esi,%edi
> ╰ 0x00007f123035bdb2: jl 0x00007f123035bb90 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
> ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 40 (line 402)
> 0x00007f123035bdb8: cmp %eax,%edi
> 0x00007f123035bdba: jge 0x00007f123035be09 ;*aload {reexecute=0 rethrow=0 return_oop=0}
> ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 43 (line 402)
> 0x00007f123035bdbc: mov $0x80000000,%esi
> 0x00007f123035bdc1: add 0x10(%rdx,%rdi,4),%esi ;*isub {reexecute=0 rethrow=0 return_oop=0}
> ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
> ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
> ....................................................................................................
> 18.02% <total for region 1>
>
>
>
> So, I think there may be some room for improvement in this submission. Benchmarking is hard, and it's possible I have made a mistake or have misinterpreted the results, but I think such a big difference warrants investigation before pushing this change to so many users. I ran these benchmarks on JDK 11.0.11. I have run a range of randomised tests to verify that the two implementations produce the same results.
So the issue of not skipping passes was my fault in the translation process, so not something to worry about, though after [fixing that](https://github.com/richardstartin/radix-sort-benchmark/commit/ccbee984c6a0e0f50c30de59e1a5e9fbcad89510) the original implementation still has the edge because of the bounds checks on the `xor` not getting eliminated.
Benchmark (bits) (padding) (scenario) (seed) (size) Mode Cnt Score Error Units
RadixSortBenchmark.jdk 17 7 UNIFORM 0 1000000 avgt 5 10432.408 ± 87.024 us/op
RadixSortBenchmark.jdk 23 7 UNIFORM 0 1000000 avgt 5 9465.990 ± 40.598 us/op
RadixSortBenchmark.jdk 30 7 UNIFORM 0 1000000 avgt 5 11189.146 ± 50.972 us/op
RadixSortBenchmark.unrollOnePassSkipLevelsSigned 17 7 UNIFORM 0 1000000 avgt 5 9546.963 ± 41.698 us/op
RadixSortBenchmark.unrollOnePassSkipLevelsSigned 23 7 UNIFORM 0 1000000 avgt 5 9412.114 ± 43.081 us/op
RadixSortBenchmark.unrollOnePassSkipLevelsSigned 30 7 UNIFORM 0 1000000 avgt 5 10823.618 ± 64.311 us/op
-------------
PR: https://git.openjdk.java.net/jdk/pull/3938
More information about the core-libs-dev
mailing list