RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)

Richard Startin github.com+16439049+richardstartin at openjdk.java.net
Mon Sep 13 17:28:46 UTC 2021


On Thu, 13 May 2021 20:23:16 GMT, Richard Startin <github.com+16439049+richardstartin at openjdk.org> wrote:

>> In private correspondence with Vladimir, it was explained that where Vladimir's code and Laurent's code are identical, including typos ([Vladimir's code](https://github.com/iaroslavski/sorting/commit/f076073b8b819a9687613903a164e3ed71821769#diff-4b4d68fc834c2ad12a9fb9d316a812221af7c398338ed2ee907d0a795e7aadafR672), [Laurent's code](https://github.com/bourgesl/radix-sort-benchmark/commit/a693b26b2e2c14cfeedf9c753c9d643096b0e38d#diff-397ce8fd791e2ce508cf9127201bc9ab46264cd2a79fd0487a63569f2e4b59b2R719)) it is because Vladimir sent the code to Laurent, not the other way around, therefore Vladimir's code does not derive from Laurent's, and it does not derive from mine. I can only trust that this is the case, so please disregard my claim that this is derivative work when reviewing this PR.
>
> For what it's worth, I benchmarked this implementation radix sort ([adapted here to fit in to my harness](https://github.com/richardstartin/radix-sort-benchmark/commit/07169e8e8602152cfda859baa159db165bf5fcab#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR681-R710782)) against a [signed variant](https://github.com/richardstartin/radix-sort-benchmark/commit/07169e8e8602152cfda859baa159db165bf5fcab#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR396-R478) of what I have claimed this work was derived from and the proposed implementation does not perform favourably on uniformly random data:
> 
> 
> 
> Benchmark                                         (bits)  (padding)  (scenario)  (seed)   (size)  Mode  Cnt      Score     Error  Units
> RadixSortBenchmark.jdk                                17          7     UNIFORM       0  1000000  avgt    5  11301.950 ± 113.691  us/op
> RadixSortBenchmark.jdk                                23          7     UNIFORM       0  1000000  avgt    5  11792.351 ±  60.757  us/op
> RadixSortBenchmark.jdk                                30          7     UNIFORM       0  1000000  avgt    5  11184.616 ±  67.094  us/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned      17          7     UNIFORM       0  1000000  avgt    5   9564.626 ±  69.497  us/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned      23          7     UNIFORM       0  1000000  avgt    5   9432.085 ±  58.983  us/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned      30          7     UNIFORM       0  1000000  avgt    5  10772.975 ±  51.848  us/op
> 
> 
> 
> I believe the root cause is a defect in the mechanism employed to skip passes as can be seen by the increased number of instructions and cycles here. In the proposed implementation, instructions is roughly constant as a function of bits. In the case where all passes must be performed (bits = 30), IPC is superior in `unrollOnePassHistogramsSkipLevelsSigned`.
> 
> 
> Benchmark                                                               (bits)  (padding)  (scenario)  (seed)   (size)  Mode  Cnt         Score     Error      Units
> RadixSortBenchmark.jdk:cycles                                               17          7     UNIFORM       0  1000000  avgt       34976971.877                 #/op
> RadixSortBenchmark.jdk:instructions                                         17          7     UNIFORM       0  1000000  avgt       70121142.003                 #/op
> RadixSortBenchmark.jdk:cycles                                               23          7     UNIFORM       0  1000000  avgt       32369970.385                 #/op
> RadixSortBenchmark.jdk:instructions                                         23          7     UNIFORM       0  1000000  avgt       70201664.963                 #/op
> RadixSortBenchmark.jdk:cycles                                               30          7     UNIFORM       0  1000000  avgt       30789736.602                 #/op
> RadixSortBenchmark.jdk:instructions                                         30          7     UNIFORM       0  1000000  avgt       70180942.122                 #/op
> RadixSortBenchmark.jdk:IPC                                                  30          7     UNIFORM       0  1000000  avgt              2.279            insns/clk
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:cycles                     17          7     UNIFORM       0  1000000  avgt       26983994.479                 #/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:instructions               17          7     UNIFORM       0  1000000  avgt       62065304.827                 #/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:cycles                     23          7     UNIFORM       0  1000000  avgt       26161703.124                 #/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:instructions               23          7     UNIFORM       0  1000000  avgt       63102683.167                 #/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:cycles                     30          7     UNIFORM       0  1000000  avgt       29780103.795                 #/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:instructions               30          7     UNIFORM       0  1000000  avgt       74437097.025                 #/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:IPC                        30          7     UNIFORM       0  1000000  avgt              2.500            insns/clk
> 
> 
> 
> The biggest difference in executed code appears to be that bounds checks are not eliminated in the first counting pass in the proposed implementation:
> 
> 
> ....[Hottest Region 1]..............................................................................
> c2, level 4, io.github.richardstartin.radixsort.RadixSort::jdk, version 402 (166 bytes) 
> 
>             0x00007f0d2fb20579: cmp    $0x100,%r10d
>             0x00007f0d2fb20580: jae    0x00007f0d2fb21344
>             0x00007f0d2fb20586: decl   0x10(%rdx,%r10,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - io.github.richardstartin.radixsort.RadixSort::jdk at 107 (line 697)
>             0x00007f0d2fb2058b: cmp    $0x1,%esi
>             0x00007f0d2fb2058e: jle    0x00007f0d2fb2134c
>             0x00007f0d2fb20594: mov    $0x1,%r11d
>             0x00007f0d2fb2059a: xor    %ecx,%ecx
>             0x00007f0d2fb2059c: nopl   0x0(%rax)          ;*aload_2 {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - io.github.richardstartin.radixsort.RadixSort::jdk at 41 (line 694)
>   0.06%  ↗  0x00007f0d2fb205a0: movzbl 0x10(%rbp,%r11,4),%r10d  ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::jdk at 49 (line 694)
>   0.18%  │  0x00007f0d2fb205a6: decl   0x10(%r8,%r10,4)   ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::jdk at 54 (line 694)
>   1.37%  │  0x00007f0d2fb205ab: mov    0x10(%rbp,%r11,4),%r10d
>   0.42%  │  0x00007f0d2fb205b0: shr    $0x8,%r10d
>   0.34%  │  0x00007f0d2fb205b4: movzbl %r10b,%r10d        ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::jdk at 66 (line 695)
>   0.50%  │  0x00007f0d2fb205b8: decl   0x10(%r9,%r10,4)   ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::jdk at 71 (line 695)
>   1.75%  │  0x00007f0d2fb205bd: mov    0x10(%rbp,%r11,4),%r10d
>   0.42%  │  0x00007f0d2fb205c2: shr    $0x10,%r10d
>   0.12%  │  0x00007f0d2fb205c6: movzbl %r10b,%r10d        ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::jdk at 84 (line 696)
>   0.36%  │  0x00007f0d2fb205ca: decl   0x10(%rbx,%r10,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::jdk at 89 (line 696)
>   1.73%  │  0x00007f0d2fb205cf: mov    0x10(%rbp,%r11,4),%r10d
>   0.26%  │  0x00007f0d2fb205d4: shr    $0x18,%r10d
>   0.38%  │  0x00007f0d2fb205d8: xor    $0x80,%r10d        ;*ixor {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::jdk at 102 (line 697)
>   0.28%  │  0x00007f0d2fb205df: cmp    $0x100,%r10d
>          │  0x00007f0d2fb205e6: jae    0x00007f0d2fb20bab
>   0.38%  │  0x00007f0d2fb205ec: decl   0x10(%rdx,%r10,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::jdk at 107 (line 697)
>   1.25%  │  0x00007f0d2fb205f1: movzbl 0x14(%rbp,%r11,4),%r10d  ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::jdk at 49 (line 694)
>   1.63%  │  0x00007f0d2fb205f7: decl   0x10(%r8,%r10,4)   ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::jdk at 54 (line 694)
>   1.71%  │  0x00007f0d2fb205fc: mov    0x14(%rbp,%r11,4),%r10d
>   0.16%  │  0x00007f0d2fb20601: shr    $0x8,%r10d
>   0.18%  │  0x00007f0d2fb20605: movzbl %r10b,%r10d        ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::jdk at 66 (line 695)
>   0.36%  │  0x00007f0d2fb20609: decl   0x10(%r9,%r10,4)   ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::jdk at 71 (line 695)
>   2.05%  │  0x00007f0d2fb2060e: mov    0x14(%rbp,%r11,4),%r10d
>   0.14%  │  0x00007f0d2fb20613: shr    $0x10,%r10d
>   0.38%  │  0x00007f0d2fb20617: movzbl %r10b,%r10d        ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::jdk at 84 (line 696)
>   0.14%  │  0x00007f0d2fb2061b: decl   0x10(%rbx,%r10,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::jdk at 89 (line 696)
>   1.51%  │  0x00007f0d2fb20620: mov    0x14(%rbp,%r11,4),%r10d
>   0.14%  │  0x00007f0d2fb20625: shr    $0x18,%r10d
>   0.32%  │  0x00007f0d2fb20629: xor    $0x80,%r10d        ;*ixor {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::jdk at 102 (line 697)
>   0.12%  │  0x00007f0d2fb20630: cmp    $0x100,%r10d
>          │  0x00007f0d2fb20637: jae    0x00007f0d2fb20ba8
>   0.78%  │  0x00007f0d2fb2063d: decl   0x10(%rdx,%r10,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::jdk at 107 (line 697)
>   1.04%  │  0x00007f0d2fb20642: add    $0x2,%r11d         ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::jdk at 108 (line 693)
>   0.64%  │  0x00007f0d2fb20646: cmp    %esi,%r11d
>          ╰  0x00007f0d2fb20649: jl     0x00007f0d2fb205a0  ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - io.github.richardstartin.radixsort.RadixSort::jdk at 38 (line 693)
>             0x00007f0d2fb2064f: cmp    %r13d,%r11d
>             0x00007f0d2fb20652: jge    0x00007f0d2fb206ad  ;*aload_2 {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - io.github.richardstartin.radixsort.RadixSort::jdk at 41 (line 694)
>             0x00007f0d2fb20654: movzbl 0x10(%rbp,%r11,4),%r10d  ;*iand {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - io.github.richardstartin.radixsort.RadixSort::jdk at 49 (line 694)
>             0x00007f0d2fb2065a: decl   0x10(%r8,%r10,4)   ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - io.github.richardstartin.radixsort.RadixSort::jdk at 54 (line 694)
> ....................................................................................................
>  21.11%  <total for region 1>
> 
> 
> 
> Whereas they have been eliminated in the first pass of `unrollOnePassHistogramsSkipLevelsSigned`:
> 
> 
> ....[Hottest Region 1]..............................................................................
> c2, level 4, io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned, version 402 (544 bytes) 
> 
>             0x00007f123035bb6c: incl   0x14(%rbp,%rcx,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
>             0x00007f123035bb70: incl   0x14(%r9,%r10,4)   ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
>             0x00007f123035bb75: cmp    $0x1,%esi
>             0x00007f123035bb78: jle    0x00007f123035cc28
>             0x00007f123035bb7e: mov    $0x1,%edi
>             0x00007f123035bb83: mov    $0x1,%ebx
>             0x00007f123035bb88: nopl   0x0(%rax,%rax,1)   ;*aload {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 43 (line 402)
>   0.12%  ↗  0x00007f123035bb90: mov    $0x80000000,%r10d
>   0.06%  │  0x00007f123035bb96: add    0x10(%rdx,%rdi,4),%r10d  ;*isub {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
>   0.04%  │  0x00007f123035bb9b: mov    %r10d,%r8d
>   0.10%  │  0x00007f123035bb9e: shr    $0x18,%r8d         ;*iushr {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
>   0.12%  │  0x00007f123035bba2: mov    %r10d,%r11d
>   0.08%  │  0x00007f123035bba5: shr    $0x10,%r11d
>   0.08%  │  0x00007f123035bba9: movzbl %r11b,%r11d        ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
>   0.08%  │  0x00007f123035bbad: mov    %r10d,%ecx
>   0.04%  │  0x00007f123035bbb0: shr    $0x8,%ecx
>   0.08%  │  0x00007f123035bbb3: movzbl %r10b,%r10d        ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
>   0.14%  │  0x00007f123035bbb7: mov    0x18(%rsp),%r14
>   0.16%  │  0x00007f123035bbbc: incl   0x14(%r14,%r10,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
>   0.36%  │  0x00007f123035bbc1: movzbl %cl,%ecx           ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
>   0.06%  │  0x00007f123035bbc4: mov    0x10(%rsp),%r10
>   0.10%  │  0x00007f123035bbc9: incl   0x14(%r10,%rcx,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
>   0.26%  │  0x00007f123035bbce: incl   0x14(%rbp,%r11,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
>   0.28%  │  0x00007f123035bbd3: incl   0x14(%r9,%r8,4)    ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
>   0.20%  │  0x00007f123035bbd8: mov    $0x80000000,%r10d
>   0.10%  │  0x00007f123035bbde: add    0x14(%rdx,%rdi,4),%r10d  ;*isub {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
>   0.08%  │  0x00007f123035bbe3: mov    %r10d,%r8d
>   0.12%  │  0x00007f123035bbe6: shr    $0x18,%r8d         ;*iushr {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
>   0.04%  │  0x00007f123035bbea: mov    %r10d,%r11d
>   0.06%  │  0x00007f123035bbed: shr    $0x10,%r11d
>   0.06%  │  0x00007f123035bbf1: movzbl %r11b,%r11d        ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
>   0.14%  │  0x00007f123035bbf5: mov    %r10d,%ecx
>   0.04%  │  0x00007f123035bbf8: shr    $0x8,%ecx
>   0.04%  │  0x00007f123035bbfb: movzbl %r10b,%r10d        ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
>   0.08%  │  0x00007f123035bbff: incl   0x14(%r14,%r10,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
>   0.46%  │  0x00007f123035bc04: movzbl %cl,%ecx           ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
>   0.10%  │  0x00007f123035bc07: mov    0x10(%rsp),%r10
>   0.08%  │  0x00007f123035bc0c: incl   0x14(%r10,%rcx,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
>   0.22%  │  0x00007f123035bc11: incl   0x14(%rbp,%r11,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
>   0.30%  │  0x00007f123035bc16: incl   0x14(%r9,%r8,4)    ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
>   0.22%  │  0x00007f123035bc1b: mov    $0x80000000,%r10d
>   0.04%  │  0x00007f123035bc21: add    0x18(%rdx,%rdi,4),%r10d  ;*isub {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
>   0.20%  │  0x00007f123035bc26: mov    %r10d,%r8d
>   0.18%  │  0x00007f123035bc29: shr    $0x18,%r8d         ;*iushr {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
>   0.12%  │  0x00007f123035bc2d: mov    %r10d,%r11d
>   0.06%  │  0x00007f123035bc30: shr    $0x10,%r11d
>   0.20%  │  0x00007f123035bc34: movzbl %r11b,%r11d        ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
>   0.10%  │  0x00007f123035bc38: mov    %r10d,%ecx
>   0.12%  │  0x00007f123035bc3b: shr    $0x8,%ecx
>   0.20%  │  0x00007f123035bc3e: movzbl %r10b,%r10d        ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
>   0.08%  │  0x00007f123035bc42: incl   0x14(%r14,%r10,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
>   0.18%  │  0x00007f123035bc47: movzbl %cl,%ecx           ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
>   0.08%  │  0x00007f123035bc4a: mov    0x10(%rsp),%r10
>   0.10%  │  0x00007f123035bc4f: incl   0x14(%r10,%rcx,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
>   0.20%  │  0x00007f123035bc54: incl   0x14(%rbp,%r11,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
>   0.12%  │  0x00007f123035bc59: incl   0x14(%r9,%r8,4)    ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
>   0.34%  │  0x00007f123035bc5e: mov    $0x80000000,%r10d
>   0.04%  │  0x00007f123035bc64: add    0x1c(%rdx,%rdi,4),%r10d  ;*isub {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
>   0.91%  │  0x00007f123035bc69: mov    %r10d,%r8d
>   0.02%  │  0x00007f123035bc6c: shr    $0x18,%r8d         ;*iushr {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
>   0.12%  │  0x00007f123035bc70: mov    %r10d,%r11d
>   0.06%  │  0x00007f123035bc73: shr    $0x10,%r11d
>   0.06%  │  0x00007f123035bc77: movzbl %r11b,%r11d        ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
>   0.06%  │  0x00007f123035bc7b: mov    %r10d,%ecx
>          │  0x00007f123035bc7e: shr    $0x8,%ecx
>   0.10%  │  0x00007f123035bc81: movzbl %r10b,%r10d        ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
>   0.08%  │  0x00007f123035bc85: incl   0x14(%r14,%r10,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
>   0.16%  │  0x00007f123035bc8a: movzbl %cl,%ecx           ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
>          │  0x00007f123035bc8d: mov    0x10(%rsp),%r10
>   0.14%  │  0x00007f123035bc92: incl   0x14(%r10,%rcx,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
>   0.32%  │  0x00007f123035bc97: incl   0x14(%rbp,%r11,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
>   0.32%  │  0x00007f123035bc9c: incl   0x14(%r9,%r8,4)    ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
>   0.24%  │  0x00007f123035bca1: mov    $0x80000000,%r10d
>   0.12%  │  0x00007f123035bca7: add    0x20(%rdx,%rdi,4),%r10d  ;*isub {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
>   0.08%  │  0x00007f123035bcac: mov    %r10d,%r8d
>   0.08%  │  0x00007f123035bcaf: shr    $0x18,%r8d         ;*iushr {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
>   0.06%  │  0x00007f123035bcb3: mov    %r10d,%r11d
>   0.08%  │  0x00007f123035bcb6: shr    $0x10,%r11d
>   0.10%  │  0x00007f123035bcba: movzbl %r11b,%r11d        ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
>   0.14%  │  0x00007f123035bcbe: mov    %r10d,%ecx
>   0.14%  │  0x00007f123035bcc1: shr    $0x8,%ecx
>   0.14%  │  0x00007f123035bcc4: movzbl %r10b,%r10d        ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
>   0.08%  │  0x00007f123035bcc8: incl   0x14(%r14,%r10,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
>   0.22%  │  0x00007f123035bccd: movzbl %cl,%ecx           ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
>   0.10%  │  0x00007f123035bcd0: mov    0x10(%rsp),%r10
>   0.14%  │  0x00007f123035bcd5: incl   0x14(%r10,%rcx,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
>   0.24%  │  0x00007f123035bcda: incl   0x14(%rbp,%r11,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
>   0.20%  │  0x00007f123035bcdf: incl   0x14(%r9,%r8,4)    ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
>   0.18%  │  0x00007f123035bce4: mov    $0x80000000,%r10d
>   0.04%  │  0x00007f123035bcea: add    0x24(%rdx,%rdi,4),%r10d  ;*isub {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
>   0.12%  │  0x00007f123035bcef: mov    %r10d,%r8d
>   0.04%  │  0x00007f123035bcf2: shr    $0x18,%r8d         ;*iushr {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
>   0.08%  │  0x00007f123035bcf6: mov    %r10d,%r11d
>   0.16%  │  0x00007f123035bcf9: shr    $0x10,%r11d
>   0.06%  │  0x00007f123035bcfd: movzbl %r11b,%r11d        ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
>   0.06%  │  0x00007f123035bd01: mov    %r10d,%ecx
>   0.04%  │  0x00007f123035bd04: shr    $0x8,%ecx
>   0.02%  │  0x00007f123035bd07: movzbl %r10b,%r10d        ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
>   0.12%  │  0x00007f123035bd0b: incl   0x14(%r14,%r10,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
>   0.22%  │  0x00007f123035bd10: movzbl %cl,%ecx           ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
>   0.06%  │  0x00007f123035bd13: mov    0x10(%rsp),%r10
>   0.04%  │  0x00007f123035bd18: incl   0x14(%r10,%rcx,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
>   0.20%  │  0x00007f123035bd1d: incl   0x14(%rbp,%r11,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
>   0.28%  │  0x00007f123035bd22: incl   0x14(%r9,%r8,4)    ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
>   0.46%  │  0x00007f123035bd27: mov    $0x80000000,%r10d
>   0.08%  │  0x00007f123035bd2d: add    0x28(%rdx,%rdi,4),%r10d  ;*isub {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
>   0.08%  │  0x00007f123035bd32: mov    %r10d,%r8d
>   0.04%  │  0x00007f123035bd35: shr    $0x18,%r8d         ;*iushr {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
>   0.10%  │  0x00007f123035bd39: mov    %r10d,%r11d
>   0.04%  │  0x00007f123035bd3c: shr    $0x10,%r11d
>   0.06%  │  0x00007f123035bd40: movzbl %r11b,%r11d        ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
>   0.04%  │  0x00007f123035bd44: mov    %r10d,%ecx
>   0.10%  │  0x00007f123035bd47: shr    $0x8,%ecx
>   0.10%  │  0x00007f123035bd4a: movzbl %r10b,%r10d        ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
>   0.10%  │  0x00007f123035bd4e: incl   0x14(%r14,%r10,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
>   0.28%  │  0x00007f123035bd53: movzbl %cl,%ecx           ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
>   0.14%  │  0x00007f123035bd56: mov    0x10(%rsp),%r10
>   0.10%  │  0x00007f123035bd5b: incl   0x14(%r10,%rcx,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
>   0.36%  │  0x00007f123035bd60: incl   0x14(%rbp,%r11,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
>   0.20%  │  0x00007f123035bd65: incl   0x14(%r9,%r8,4)    ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
>   0.28%  │  0x00007f123035bd6a: mov    $0x80000000,%r10d
>   0.16%  │  0x00007f123035bd70: add    0x2c(%rdx,%rdi,4),%r10d  ;*isub {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
>   0.08%  │  0x00007f123035bd75: mov    %r10d,%r8d
>   0.10%  │  0x00007f123035bd78: shr    $0x18,%r8d         ;*iushr {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 116 (line 406)
>   0.08%  │  0x00007f123035bd7c: mov    %r10d,%r11d
>   0.08%  │  0x00007f123035bd7f: shr    $0x10,%r11d
>   0.14%  │  0x00007f123035bd83: movzbl %r11b,%r11d        ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 99 (line 405)
>   0.02%  │  0x00007f123035bd87: mov    %r10d,%ecx
>   0.02%  │  0x00007f123035bd8a: shr    $0x8,%ecx
>   0.20%  │  0x00007f123035bd8d: movzbl %r10b,%r10d        ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 59 (line 403)
>   0.22%  │  0x00007f123035bd91: incl   0x14(%r14,%r10,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 66 (line 403)
>   0.22%  │  0x00007f123035bd96: movzbl %cl,%ecx           ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 79 (line 404)
>   0.12%  │  0x00007f123035bd99: mov    0x10(%rsp),%r10
>   0.04%  │  0x00007f123035bd9e: incl   0x14(%r10,%rcx,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 86 (line 404)
>   0.22%  │  0x00007f123035bda3: incl   0x14(%rbp,%r11,4)  ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 106 (line 405)
>   0.20%  │  0x00007f123035bda8: incl   0x14(%r9,%r8,4)    ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 123 (line 406)
>   0.26%  │  0x00007f123035bdad: add    $0x8,%edi          ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 124 (line 402)
>   0.10%  │  0x00007f123035bdb0: cmp    %esi,%edi
>          ╰  0x00007f123035bdb2: jl     0x00007f123035bb90  ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 40 (line 402)
>             0x00007f123035bdb8: cmp    %eax,%edi
>             0x00007f123035bdba: jge    0x00007f123035be09  ;*aload {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 43 (line 402)
>             0x00007f123035bdbc: mov    $0x80000000,%esi
>             0x00007f123035bdc1: add    0x10(%rdx,%rdi,4),%esi  ;*isub {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - io.github.richardstartin.radixsort.RadixSort::signed at 3 (line 476)
>                                                           ; - io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned at 53 (line 403)
> ....................................................................................................
>  18.02%  <total for region 1>
> 
> 
> 
> So, I think there may be some room for improvement in this submission. Benchmarking is hard, and it's possible I have made a mistake or have misinterpreted the results, but I think such a big difference warrants investigation before pushing this change to so many users. I ran these benchmarks on JDK 11.0.11. I have run a range of randomised tests to verify that the two implementations produce the same results.

So the issue of not skipping passes was my fault in the translation process, so not something to worry about, though after [fixing that](https://github.com/richardstartin/radix-sort-benchmark/commit/ccbee984c6a0e0f50c30de59e1a5e9fbcad89510) the original implementation still has the edge because of the bounds checks on the `xor` not getting eliminated.


Benchmark                                         (bits)  (padding)  (scenario)  (seed)   (size)  Mode  Cnt      Score    Error  Units
RadixSortBenchmark.jdk                                17          7     UNIFORM       0  1000000  avgt    5  10432.408 ± 87.024  us/op
RadixSortBenchmark.jdk                                23          7     UNIFORM       0  1000000  avgt    5   9465.990 ± 40.598  us/op
RadixSortBenchmark.jdk                                30          7     UNIFORM       0  1000000  avgt    5  11189.146 ± 50.972  us/op
RadixSortBenchmark.unrollOnePassSkipLevelsSigned      17          7     UNIFORM       0  1000000  avgt    5   9546.963 ± 41.698  us/op
RadixSortBenchmark.unrollOnePassSkipLevelsSigned      23          7     UNIFORM       0  1000000  avgt    5   9412.114 ± 43.081  us/op
RadixSortBenchmark.unrollOnePassSkipLevelsSigned      30          7     UNIFORM       0  1000000  avgt    5  10823.618 ± 64.311  us/op

-------------

PR: https://git.openjdk.java.net/jdk/pull/3938


More information about the core-libs-dev mailing list