RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)
Laurent Bourgès
lbourges at openjdk.java.net
Mon Sep 13 17:28:47 UTC 2021
On Thu, 13 May 2021 21:44:38 GMT, Richard Startin <github.com+16439049+richardstartin at openjdk.org> wrote:
>> For what it's worth, I benchmarked this implementation radix sort ([adapted here to fit in to my harness](https://github.com/richardstartin/radix-sort-benchmark/commit/07169e8e8602152cfda859baa159db165bf5fcab#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR681-R710782)) against a [signed variant](https://github.com/richardstartin/radix-sort-benchmark/commit/07169e8e8602152cfda859baa159db165bf5fcab#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR396-R478) of what I have claimed this work was derived from and the proposed implementation does not perform favourably on uniformly random data:
>>
>>
>>
>> Benchmark (bits) (padding) (scenario) (seed) (size) Mode Cnt Score Error Units
>> RadixSortBenchmark.jdk 17 7 UNIFORM 0 1000000 avgt 5 11301.950 ± 113.691 us/op
>> RadixSortBenchmark.jdk 23 7 UNIFORM 0 1000000 avgt 5 11792.351 ± 60.757 us/op
>> RadixSortBenchmark.jdk 30 7 UNIFORM 0 1000000 avgt 5 11184.616 ± 67.094 us/op
>> RadixSortBenchmark.unrollOnePassSkipLevelsSigned 17 7 UNIFORM 0 1000000 avgt 5 9564.626 ± 69.497 us/op
>> RadixSortBenchmark.unrollOnePassSkipLevelsSigned 23 7 UNIFORM 0 1000000 avgt 5 9432.085 ± 58.983 us/op
>> RadixSortBenchmark.unrollOnePassSkipLevelsSigned 30 7 UNIFORM 0 1000000 avgt 5 10772.975 ± 51.848 us/op
>>
>>
>>
>> I believe the root cause is a defect in the mechanism employed to skip passes as can be seen by the increased number of instructions and cycles here. In the proposed implementation, instructions is roughly constant as a function of bits. In the case where all passes must be performed (bits = 30), IPC is superior in `unrollOnePassHistogramsSkipLevelsSigned`.
>>
>>
>> Benchmark (bits) (padding) (scenario) (seed) (size) Mode Cnt Score Error Units
>> RadixSortBenchmark.jdk:cycles 17 7 UNIFORM 0 1000000 avgt 34976971.877 #/op
>> RadixSortBenchmark.jdk:instructions 17 7 UNIFORM 0 1000000 avgt 70121142.003 #/op
>> RadixSortBenchmark.jdk:cycles 23 7 UNIFORM 0 1000000 avgt 32369970.385 #/op
>> RadixSortBenchmark.jdk:instructions 23 7 UNIFORM 0 1000000 avgt 70201664.963 #/op
>> RadixSortBenchmark.jdk:cycles 30 7 UNIFORM 0 1000000 avgt 30789736.602 #/op
>> RadixSortBenchmark.jdk:instructions 30 7 UNIFORM 0 1000000 avgt 70180942.122 #/op
>> RadixSortBenchmark.jdk:IPC 30 7 UNIFORM 0 1000000 avgt 2.279 insns/clk
>> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:cycles 17 7 UNIFORM 0 1000000 avgt 26983994.479 #/op
>> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:instructions 17 7 UNIFORM 0 1000000 avgt 62065304.827 #/op
>> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:cycles 23 7 UNIFORM 0 1000000 avgt 26161703.124 #/op
>> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:instructions 23 7 UNIFORM 0 1000000 avgt 63102683.167 #/op
>> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:cycles 30 7 UNIFORM 0 1000000 avgt 29780103.795 #/op
>> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:instructions 30 7 UNIFORM 0 1000000 avgt 74437097.025 #/op
>> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:IPC 30 7 UNIFORM 0 1000000 avgt 2.500 insns/clk
>>
>>
>>
>> The biggest difference in executed code appears to be that bounds checks are not eliminated in the first counting pass in the proposed implementation:
>>
>>
>> ....[Hottest Region 1]..............................................................................
>> c2, level 4, io.github.richardstartin.radixsort.RadixSort::jdk, version 402 (166 bytes)
>>
>> 0x00007f0d2fb20579: cmp $0x100,%r10d
>> 0x00007f0d2fb20580: jae 0x00007f0d2fb21344
>> 0x00007f0d2fb20586: decl 0x10(%rdx,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> ; - io.github.richardstartin.radixsort.RadixSort::jdk at 107 (line 697)
>> 0x00007f0d2fb2058b: cmp $0x1,%esi
>> 0x00007f0d2fb2058e: jle 0x00007f0d2fb2134c
>> 0x00007f0d2fb20594: mov $0x1,%r11d
>> 0x00007f0d2fb2059a: xor %ecx,%ecx
>> 0x00007f0d2fb2059c: nopl 0x0(%rax) ;*aload_2 {reexecute=0 rethrow=0 return_oop=0}
>> ; - io.github.richardstartin.radixsort.RadixSort::jdk at 41 (line 694)
>> 0.06% ↗ 0x00007f0d2fb205a0: movzbl 0x10(%rbp,%r11,4),%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 49 (line 694)
>> 0.18% │ 0x00007f0d2fb205a6: decl 0x10(%r8,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 54 (line 694)
>> 1.37% │ 0x00007f0d2fb205ab: mov 0x10(%rbp,%r11,4),%r10d
>> 0.42% │ 0x00007f0d2fb205b0: shr $0x8,%r10d
>> 0.34% │ 0x00007f0d2fb205b4: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 66 (line 695)
>> 0.50% │ 0x00007f0d2fb205b8: decl 0x10(%r9,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 71 (line 695)
>> 1.75% │ 0x00007f0d2fb205bd: mov 0x10(%rbp,%r11,4),%r10d
>> 0.42% │ 0x00007f0d2fb205c2: shr $0x10,%r10d
>> 0.12% │ 0x00007f0d2fb205c6: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 84 (line 696)
>> 0.36% │ 0x00007f0d2fb205ca: decl 0x10(%rbx,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 89 (line 696)
>> 1.73% │ 0x00007f0d2fb205cf: mov 0x10(%rbp,%r11,4),%r10d
>> 0.26% │ 0x00007f0d2fb205d4: shr $0x18,%r10d
>> 0.38% │ 0x00007f0d2fb205d8: xor $0x80,%r10d ;*ixor {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 102 (line 697)
>> 0.28% │ 0x00007f0d2fb205df: cmp $0x100,%r10d
>> │ 0x00007f0d2fb205e6: jae 0x00007f0d2fb20bab
>> 0.38% │ 0x00007f0d2fb205ec: decl 0x10(%rdx,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 107 (line 697)
>> 1.25% │ 0x00007f0d2fb205f1: movzbl 0x14(%rbp,%r11,4),%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 49 (line 694)
>> 1.63% │ 0x00007f0d2fb205f7: decl 0x10(%r8,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 54 (line 694)
>> 1.71% │ 0x00007f0d2fb205fc: mov 0x14(%rbp,%r11,4),%r10d
>> 0.16% │ 0x00007f0d2fb20601: shr $0x8,%r10d
>> 0.18% │ 0x00007f0d2fb20605: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 66 (line 695)
>> 0.36% │ 0x00007f0d2fb20609: decl 0x10(%r9,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 71 (line 695)
>> 2.05% │ 0x00007f0d2fb2060e: mov 0x14(%rbp,%r11,4),%r10d
>> 0.14% │ 0x00007f0d2fb20613: shr $0x10,%r10d
>> 0.38% │ 0x00007f0d2fb20617: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 84 (line 696)
>> 0.14% │ 0x00007f0d2fb2061b: decl 0x10(%rbx,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 89 (line 696)
>> 1.51% │ 0x00007f0d2fb20620: mov 0x14(%rbp,%r11,4),%r10d
>> 0.14% │ 0x00007f0d2fb20625: shr $0x18,%r10d
>> 0.32% │ 0x00007f0d2fb20629: xor $0x80,%r10d ;*ixor {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 102 (line 697)
>> 0.12% │ 0x00007f0d2fb20630: cmp $0x100,%r10d
>> │ 0x00007f0d2fb20637: jae 0x00007f0d2fb20ba8
>> 0.78% │ 0x00007f0d2fb2063d: decl 0x10(%rdx,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 107 (line 697)
>> 1.04% │ 0x00007f0d2fb20642: add $0x2,%r11d ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::jdk at 108 (line 693)
>> 0.64% │ 0x00007f0d2fb20646: cmp %esi,%r11d
>> ╰ 0x00007f0d2fb20649: jl 0x00007f0d2fb205a0 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
>> ; - io.github.richardstartin.radixsort.RadixSort::jdk at 38 (line 693)
>> 0x00007f0d2fb2064f: cmp %r13d,%r11d
>> 0x00007f0d2fb20652: jge 0x00007f0d2fb206ad ;*aload_2 {reexecute=0 rethrow=0 return_oop=0}
>> ; - io.github.richardstartin.radixsort.RadixSort::jdk at 41 (line 694)
>> 0x00007f0d2fb20654: movzbl 0x10(%rbp,%r11,4),%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> ; - io.github.richardstartin.radixsort.RadixSort::jdk at 49 (line 694)
>> 0x00007f0d2fb2065a: decl 0x10(%r8,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> ; - io.github.richardstartin.radixsort.RadixSort::jdk at 54 (line 694)
>> ....................................................................................................
>> 21.11% <total for region 1>
>>
>>
>>
>> Whereas they have been eliminated in the first pass of `unrollOnePassHistogramsSkipLevelsSigned`:
>>
>>
>> ....[Hottest Region 1]..............................................................................
>> c2, level 4, io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned, version 402 (544 bytes)
>>
>> 0x00007f123035bb6c: incl 0x14(%rbp,%rcx,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
>> 0x00007f123035bb70: incl 0x14(%r9,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
>> 0x00007f123035bb75: cmp $0x1,%esi
>> 0x00007f123035bb78: jle 0x00007f123035cc28
>> 0x00007f123035bb7e: mov $0x1,%edi
>> 0x00007f123035bb83: mov $0x1,%ebx
>> 0x00007f123035bb88: nopl 0x0(%rax,%rax,1) ;*aload {reexecute=0 rethrow=0 return_oop=0}
>> ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 43 (line 402)
>> 0.12% ↗ 0x00007f123035bb90: mov $0x80000000,%r10d
>> 0.06% │ 0x00007f123035bb96: add 0x10(%rdx,%rdi,4),%r10d ;*isub {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
>> 0.04% │ 0x00007f123035bb9b: mov %r10d,%r8d
>> 0.10% │ 0x00007f123035bb9e: shr $0x18,%r8d ;*iushr {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
>> 0.12% │ 0x00007f123035bba2: mov %r10d,%r11d
>> 0.08% │ 0x00007f123035bba5: shr $0x10,%r11d
>> 0.08% │ 0x00007f123035bba9: movzbl %r11b,%r11d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
>> 0.08% │ 0x00007f123035bbad: mov %r10d,%ecx
>> 0.04% │ 0x00007f123035bbb0: shr $0x8,%ecx
>> 0.08% │ 0x00007f123035bbb3: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
>> 0.14% │ 0x00007f123035bbb7: mov 0x18(%rsp),%r14
>> 0.16% │ 0x00007f123035bbbc: incl 0x14(%r14,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
>> 0.36% │ 0x00007f123035bbc1: movzbl %cl,%ecx ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
>> 0.06% │ 0x00007f123035bbc4: mov 0x10(%rsp),%r10
>> 0.10% │ 0x00007f123035bbc9: incl 0x14(%r10,%rcx,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
>> 0.26% │ 0x00007f123035bbce: incl 0x14(%rbp,%r11,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
>> 0.28% │ 0x00007f123035bbd3: incl 0x14(%r9,%r8,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
>> 0.20% │ 0x00007f123035bbd8: mov $0x80000000,%r10d
>> 0.10% │ 0x00007f123035bbde: add 0x14(%rdx,%rdi,4),%r10d ;*isub {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
>> 0.08% │ 0x00007f123035bbe3: mov %r10d,%r8d
>> 0.12% │ 0x00007f123035bbe6: shr $0x18,%r8d ;*iushr {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
>> 0.04% │ 0x00007f123035bbea: mov %r10d,%r11d
>> 0.06% │ 0x00007f123035bbed: shr $0x10,%r11d
>> 0.06% │ 0x00007f123035bbf1: movzbl %r11b,%r11d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
>> 0.14% │ 0x00007f123035bbf5: mov %r10d,%ecx
>> 0.04% │ 0x00007f123035bbf8: shr $0x8,%ecx
>> 0.04% │ 0x00007f123035bbfb: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
>> 0.08% │ 0x00007f123035bbff: incl 0x14(%r14,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
>> 0.46% │ 0x00007f123035bc04: movzbl %cl,%ecx ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
>> 0.10% │ 0x00007f123035bc07: mov 0x10(%rsp),%r10
>> 0.08% │ 0x00007f123035bc0c: incl 0x14(%r10,%rcx,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
>> 0.22% │ 0x00007f123035bc11: incl 0x14(%rbp,%r11,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
>> 0.30% │ 0x00007f123035bc16: incl 0x14(%r9,%r8,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
>> 0.22% │ 0x00007f123035bc1b: mov $0x80000000,%r10d
>> 0.04% │ 0x00007f123035bc21: add 0x18(%rdx,%rdi,4),%r10d ;*isub {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
>> 0.20% │ 0x00007f123035bc26: mov %r10d,%r8d
>> 0.18% │ 0x00007f123035bc29: shr $0x18,%r8d ;*iushr {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
>> 0.12% │ 0x00007f123035bc2d: mov %r10d,%r11d
>> 0.06% │ 0x00007f123035bc30: shr $0x10,%r11d
>> 0.20% │ 0x00007f123035bc34: movzbl %r11b,%r11d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
>> 0.10% │ 0x00007f123035bc38: mov %r10d,%ecx
>> 0.12% │ 0x00007f123035bc3b: shr $0x8,%ecx
>> 0.20% │ 0x00007f123035bc3e: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
>> 0.08% │ 0x00007f123035bc42: incl 0x14(%r14,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
>> 0.18% │ 0x00007f123035bc47: movzbl %cl,%ecx ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
>> 0.08% │ 0x00007f123035bc4a: mov 0x10(%rsp),%r10
>> 0.10% │ 0x00007f123035bc4f: incl 0x14(%r10,%rcx,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
>> 0.20% │ 0x00007f123035bc54: incl 0x14(%rbp,%r11,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
>> 0.12% │ 0x00007f123035bc59: incl 0x14(%r9,%r8,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
>> 0.34% │ 0x00007f123035bc5e: mov $0x80000000,%r10d
>> 0.04% │ 0x00007f123035bc64: add 0x1c(%rdx,%rdi,4),%r10d ;*isub {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
>> 0.91% │ 0x00007f123035bc69: mov %r10d,%r8d
>> 0.02% │ 0x00007f123035bc6c: shr $0x18,%r8d ;*iushr {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
>> 0.12% │ 0x00007f123035bc70: mov %r10d,%r11d
>> 0.06% │ 0x00007f123035bc73: shr $0x10,%r11d
>> 0.06% │ 0x00007f123035bc77: movzbl %r11b,%r11d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
>> 0.06% │ 0x00007f123035bc7b: mov %r10d,%ecx
>> │ 0x00007f123035bc7e: shr $0x8,%ecx
>> 0.10% │ 0x00007f123035bc81: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
>> 0.08% │ 0x00007f123035bc85: incl 0x14(%r14,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
>> 0.16% │ 0x00007f123035bc8a: movzbl %cl,%ecx ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
>> │ 0x00007f123035bc8d: mov 0x10(%rsp),%r10
>> 0.14% │ 0x00007f123035bc92: incl 0x14(%r10,%rcx,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
>> 0.32% │ 0x00007f123035bc97: incl 0x14(%rbp,%r11,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
>> 0.32% │ 0x00007f123035bc9c: incl 0x14(%r9,%r8,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
>> 0.24% │ 0x00007f123035bca1: mov $0x80000000,%r10d
>> 0.12% │ 0x00007f123035bca7: add 0x20(%rdx,%rdi,4),%r10d ;*isub {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
>> 0.08% │ 0x00007f123035bcac: mov %r10d,%r8d
>> 0.08% │ 0x00007f123035bcaf: shr $0x18,%r8d ;*iushr {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
>> 0.06% │ 0x00007f123035bcb3: mov %r10d,%r11d
>> 0.08% │ 0x00007f123035bcb6: shr $0x10,%r11d
>> 0.10% │ 0x00007f123035bcba: movzbl %r11b,%r11d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
>> 0.14% │ 0x00007f123035bcbe: mov %r10d,%ecx
>> 0.14% │ 0x00007f123035bcc1: shr $0x8,%ecx
>> 0.14% │ 0x00007f123035bcc4: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
>> 0.08% │ 0x00007f123035bcc8: incl 0x14(%r14,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
>> 0.22% │ 0x00007f123035bccd: movzbl %cl,%ecx ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
>> 0.10% │ 0x00007f123035bcd0: mov 0x10(%rsp),%r10
>> 0.14% │ 0x00007f123035bcd5: incl 0x14(%r10,%rcx,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
>> 0.24% │ 0x00007f123035bcda: incl 0x14(%rbp,%r11,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
>> 0.20% │ 0x00007f123035bcdf: incl 0x14(%r9,%r8,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
>> 0.18% │ 0x00007f123035bce4: mov $0x80000000,%r10d
>> 0.04% │ 0x00007f123035bcea: add 0x24(%rdx,%rdi,4),%r10d ;*isub {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
>> 0.12% │ 0x00007f123035bcef: mov %r10d,%r8d
>> 0.04% │ 0x00007f123035bcf2: shr $0x18,%r8d ;*iushr {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
>> 0.08% │ 0x00007f123035bcf6: mov %r10d,%r11d
>> 0.16% │ 0x00007f123035bcf9: shr $0x10,%r11d
>> 0.06% │ 0x00007f123035bcfd: movzbl %r11b,%r11d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
>> 0.06% │ 0x00007f123035bd01: mov %r10d,%ecx
>> 0.04% │ 0x00007f123035bd04: shr $0x8,%ecx
>> 0.02% │ 0x00007f123035bd07: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
>> 0.12% │ 0x00007f123035bd0b: incl 0x14(%r14,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
>> 0.22% │ 0x00007f123035bd10: movzbl %cl,%ecx ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
>> 0.06% │ 0x00007f123035bd13: mov 0x10(%rsp),%r10
>> 0.04% │ 0x00007f123035bd18: incl 0x14(%r10,%rcx,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
>> 0.20% │ 0x00007f123035bd1d: incl 0x14(%rbp,%r11,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
>> 0.28% │ 0x00007f123035bd22: incl 0x14(%r9,%r8,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
>> 0.46% │ 0x00007f123035bd27: mov $0x80000000,%r10d
>> 0.08% │ 0x00007f123035bd2d: add 0x28(%rdx,%rdi,4),%r10d ;*isub {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
>> 0.08% │ 0x00007f123035bd32: mov %r10d,%r8d
>> 0.04% │ 0x00007f123035bd35: shr $0x18,%r8d ;*iushr {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
>> 0.10% │ 0x00007f123035bd39: mov %r10d,%r11d
>> 0.04% │ 0x00007f123035bd3c: shr $0x10,%r11d
>> 0.06% │ 0x00007f123035bd40: movzbl %r11b,%r11d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
>> 0.04% │ 0x00007f123035bd44: mov %r10d,%ecx
>> 0.10% │ 0x00007f123035bd47: shr $0x8,%ecx
>> 0.10% │ 0x00007f123035bd4a: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
>> 0.10% │ 0x00007f123035bd4e: incl 0x14(%r14,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
>> 0.28% │ 0x00007f123035bd53: movzbl %cl,%ecx ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
>> 0.14% │ 0x00007f123035bd56: mov 0x10(%rsp),%r10
>> 0.10% │ 0x00007f123035bd5b: incl 0x14(%r10,%rcx,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
>> 0.36% │ 0x00007f123035bd60: incl 0x14(%rbp,%r11,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
>> 0.20% │ 0x00007f123035bd65: incl 0x14(%r9,%r8,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
>> 0.28% │ 0x00007f123035bd6a: mov $0x80000000,%r10d
>> 0.16% │ 0x00007f123035bd70: add 0x2c(%rdx,%rdi,4),%r10d ;*isub {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
>> 0.08% │ 0x00007f123035bd75: mov %r10d,%r8d
>> 0.10% │ 0x00007f123035bd78: shr $0x18,%r8d ;*iushr {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
>> 0.08% │ 0x00007f123035bd7c: mov %r10d,%r11d
>> 0.08% │ 0x00007f123035bd7f: shr $0x10,%r11d
>> 0.14% │ 0x00007f123035bd83: movzbl %r11b,%r11d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
>> 0.02% │ 0x00007f123035bd87: mov %r10d,%ecx
>> 0.02% │ 0x00007f123035bd8a: shr $0x8,%ecx
>> 0.20% │ 0x00007f123035bd8d: movzbl %r10b,%r10d ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
>> 0.22% │ 0x00007f123035bd91: incl 0x14(%r14,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
>> 0.22% │ 0x00007f123035bd96: movzbl %cl,%ecx ;*iand {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
>> 0.12% │ 0x00007f123035bd99: mov 0x10(%rsp),%r10
>> 0.04% │ 0x00007f123035bd9e: incl 0x14(%r10,%rcx,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
>> 0.22% │ 0x00007f123035bda3: incl 0x14(%rbp,%r11,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
>> 0.20% │ 0x00007f123035bda8: incl 0x14(%r9,%r8,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
>> 0.26% │ 0x00007f123035bdad: add $0x8,%edi ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>> │ ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 124 (line 402)
>> 0.10% │ 0x00007f123035bdb0: cmp %esi,%edi
>> ╰ 0x00007f123035bdb2: jl 0x00007f123035bb90 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
>> ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 40 (line 402)
>> 0x00007f123035bdb8: cmp %eax,%edi
>> 0x00007f123035bdba: jge 0x00007f123035be09 ;*aload {reexecute=0 rethrow=0 return_oop=0}
>> ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 43 (line 402)
>> 0x00007f123035bdbc: mov $0x80000000,%esi
>> 0x00007f123035bdc1: add 0x10(%rdx,%rdi,4),%esi ;*isub {reexecute=0 rethrow=0 return_oop=0}
>> ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
>> ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
>> ....................................................................................................
>> 18.02% <total for region 1>
>>
>>
>>
>> So, I think there may be some room for improvement in this submission. Benchmarking is hard, and it's possible I have made a mistake or have misinterpreted the results, but I think such a big difference warrants investigation before pushing this change to so many users. I ran these benchmarks on JDK 11.0.11. I have run a range of randomised tests to verify that the two implementations produce the same results.
>
> So the issue of not skipping passes was my fault in the translation process, so not something to worry about, though after [fixing that](https://github.com/richardstartin/radix-sort-benchmark/commit/ccbee984c6a0e0f50c30de59e1a5e9fbcad89510) the original implementation still has the edge because of the bounds checks on the `xor` not getting eliminated.
>
>
> Benchmark (bits) (padding) (scenario) (seed) (size) Mode Cnt Score Error Units
> RadixSortBenchmark.jdk 17 7 UNIFORM 0 1000000 avgt 5 10432.408 ± 87.024 us/op
> RadixSortBenchmark.jdk 23 7 UNIFORM 0 1000000 avgt 5 9465.990 ± 40.598 us/op
> RadixSortBenchmark.jdk 30 7 UNIFORM 0 1000000 avgt 5 11189.146 ± 50.972 us/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned 17 7 UNIFORM 0 1000000 avgt 5 9546.963 ± 41.698 us/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned 23 7 UNIFORM 0 1000000 avgt 5 9412.114 ± 43.081 us/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned 30 7 UNIFORM 0 1000000 avgt 5 10823.618 ± 64.311 us/op
Great analysis on C2, richard.
maybe (x ^ 0x80) &0xFF would help C2 to eliminate bound checks...
-------------
PR: https://git.openjdk.java.net/jdk/pull/3938
More information about the core-libs-dev
mailing list