XCHG is slow ?
Rémi Forax
forax at univ-mlv.fr
Wed Jul 4 02:15:09 PDT 2012
On 07/04/2012 03:30 AM, Vladimir Kozlov wrote:
> I agree with you that it is strange. I looked on code and, as you
> said, one with overflow detection executes a lot more instructions and
> stack operations. What is time difference? I ran latest 7u6 (linux64)
> promoted build and got this numbers:
>
> $ bin/jruby -X+C bench/bench_fib_recursive.rb
> 0.170000 0.070000 0.240000 ( 0.100000)
> 0.030000 0.020000 0.050000 ( 0.036000)
> 0.040000 0.020000 0.060000 ( 0.036000)
> 0.040000 0.030000 0.070000 ( 0.036000)
> 0.040000 0.020000 0.060000 ( 0.036000)
>
> Vladimir
Hi Vladimir,
this is a work is not integrated in JRuby yet, I'm sure Charles will
want to implement it in JRuby
but because it requires a small static analysis to distinguish between
numeric vs object
so I've written my own generator as a proof of concept.
The class corresponding to the overflow version is here:
http://www-igm.univ-mlv.fr/~forax/tmp/FiboSample.class
The version without overflow detection is the Java one.
cheers,
Rémi
>
> Rémi Forax wrote:
>> On 07/04/2012 01:44 AM, Vladimir Kozlov wrote:
>>> Rémi,
>>>
>>> NOP and xchg %ax,%ax are and always were the same instruction and
>>> have the same encoding 0x90. What you see is multibytes nop (2 bytes
>>> in your case) to align following call instruction: 0x66 0x90.
>>>
>>> Vladimir
>>
>> I see and I suppose that a multibytes nop is not slower than a single
>> byte nop.
>>
>> So to explain my problem clearly, I've two codes the first one is
>> fibonacci (ClassicFibo)
>> with an int, the second one is fibonacci with an int plus the
>> overflow detection (FiboSample).
>>
>> My problem is that the code with the overflow detection is faster the
>> code
>> without the overflow detection :)
>>
>> Rémi
>>
>> # {method} 'fibo' '(I)I' in 'ClassicFibo'
>> # parm0: rsi = int
>> # [sp+0x30] (sp of caller)
>> 0x00007fdf91061620: mov %eax,-0x14000(%rsp) ;...89842400 c0feff
>> 0x00007fdf91061627: push %rbp ;...55
>> 0x00007fdf91061628: sub $0x20,%rsp ;...4883ec20
>> ;*synchronization entry
>> ; -
>> ClassicFibo::fibo at -1 (line 4)
>> 0x00007fdf9106162c: mov %esi,%ebp ;...8bee
>> 0x00007fdf9106162e: cmp $0x2,%esi ;...83fe02
>> 0x00007fdf91061631: jl 0x00007fdf91061651 ;...7c1e
>> ;*if_icmpge
>> ; -
>> ClassicFibo::fibo at 2 (line 4)
>> 0x00007fdf91061633: dec %esi ;...ffce
>> ;*isub
>> ; -
>> ClassicFibo::fibo at 9 (line 7)
>> 0x00007fdf91061635: xchg %ax,%ax ;...6690
>> 0x00007fdf91061637: callq 0x00007fdf91038060 ;...e8246afd ff
>> ; OopMap{off=28}
>> ;*invokestatic fibo
>> ; -
>> ClassicFibo::fibo at 10 (line 7)
>> ; {static_call}
>> 0x00007fdf9106163c: mov %eax,(%rsp) ;...890424
>> 0x00007fdf9106163f: mov %ebp,%esi ;...8bf5
>> 0x00007fdf91061641: add $0xfffffffffffffffe,%esi ;...83c6fe
>> ;*isub
>> ; -
>> ClassicFibo::fibo at 15 (line 7)
>> 0x00007fdf91061644: xchg %ax,%ax ;...666690
>> 0x00007fdf91061647: callq 0x00007fdf91038060 ;...e8146afd ff
>> ; OopMap{off=44}
>> ;*invokestatic fibo
>> ; -
>> ClassicFibo::fibo at 16 (line 7)
>> ; {static_call}
>> 0x00007fdf9106164c: add (%rsp),%eax ;...030424
>> ;*iadd
>> ; -
>> ClassicFibo::fibo at 19 (line 7)
>> 0x00007fdf9106164f: jmp 0x00007fdf91061656 ;...eb05
>> 0x00007fdf91061651: mov $0x1,%eax ;...b8010000 00
>> 0x00007fdf91061656: add $0x20,%rsp ;...4883c420
>> 0x00007fdf9106165a: pop %rbp ;...5d
>> 0x00007fdf9106165b: test %eax,0x91c899f(%rip) #
>> 0x00007fdf9a22a000
>> ;...85059f89 1c09
>> ; {poll_return}
>> 0x00007fdf91061661: retq ;...c3
>> ;*invokestatic fibo
>> ; -
>> ClassicFibo::fibo at 10 (line 7)
>> 0x00007fdf91061662: mov %rax,%rsi ;...488bf0
>> 0x00007fdf91061665: jmp 0x00007fdf9106166a ;...eb03
>> 0x00007fdf91061667: mov %rax,%rsi ;...488bf0
>> ;*invokestatic fibo
>> ; -
>> ClassicFibo::fibo at 16 (line 7)
>> 0x00007fdf9106166a: add $0x20,%rsp ;...4883c420
>> 0x00007fdf9106166e: pop %rbp ;...5d
>> 0x00007fdf9106166f: jmpq 0x00007fdf91061220 ;...e9acfbff ff
>> ; {runtime_call}
>> 0x00007fdf91061674: hlt ;...f4
>> 0x00007fdf91061675: hlt ;...f4
>> 0x00007fdf91061676: hlt ;...f4
>> 0x00007fdf91061677: hlt ;...f4
>> 0x00007fdf91061678: hlt ;...f4
>> 0x00007fdf91061679: hlt ;...f4
>> 0x00007fdf9106167a: hlt ;...f4
>> 0x00007fdf9106167b: hlt ;...f4
>> 0x00007fdf9106167c: hlt ;...f4
>> 0x00007fdf9106167d: hlt ;...f4
>> 0x00007fdf9106167e: hlt ;...f4
>> 0x00007fdf9106167f: hlt ;...f4
>> [Stub Code]
>> 0x00007fdf91061680: mov $0x0,%rbx ;...48bb0000 000000
>> ;...000000
>> ; {no_reloc}
>> 0x00007fdf9106168a: jmpq 0x00007fdf9106168a ;...e9fbffff ff
>> ; {runtime_call}
>> 0x00007fdf9106168f: mov $0x0,%rbx ;...48bb0000 000000
>> ;...000000
>> ; {static_stub}
>> 0x00007fdf91061699: jmpq 0x00007fdf91061699 ;...e9fbffff ff
>> ; {runtime_call}
>> [Exception Handler]
>> 0x00007fdf9106169e: jmpq 0x00007fdf9105e4a0 ;...e9fdcdff ff
>> ; {runtime_call}
>> [Deopt Handler Code]
>> 0x00007fdf910616a3: callq 0x00007fdf910616a8 ;...e8000000 00
>> 0x00007fdf910616a8: subq $0x5,(%rsp) ;...48832c24 05
>> 0x00007fdf910616ad: jmpq 0x00007fdf91038c00 ;...e94e75fd ff
>> ; {runtime_call}
>> 0x00007fdf910616b2: hlt ;...f4
>> 0x00007fdf910616b3: hlt ;...f4
>> 0x00007fdf910616b4: hlt ;...f4
>> 0x00007fdf910616b5: hlt ;...f4
>> 0x00007fdf910616b6: hlt ;...f4
>> 0x00007fdf910616b7: hlt ;...f4
>>
>>
>> # {method} 'fibo' '(I)I' in 'FiboSample'
>> # parm0: rsi = int
>> # [sp+0x30] (sp of caller)
>> 0x00007faadd05fd40: mov %eax,-0x14000(%rsp) ;...89842400 c0feff
>> 0x00007faadd05fd47: push %rbp ;...55
>> 0x00007faadd05fd48: sub $0x20,%rsp ;...4883ec20
>> ;*synchronization entry
>> ; - FiboSample::fibo at -1
>> 0x00007faadd05fd4c: mov %esi,(%rsp) ;...893424
>> 0x00007faadd05fd4f: cmp $0x2,%esi ;...83fe02
>> 0x00007faadd05fd52: jl 0x00007faadd05fda1 ;...7c4d
>> ;*if_icmpge
>> ; - FiboSample::fibo at 2
>> 0x00007faadd05fd54: dec %esi ;...ffce
>> ;*isub
>> ; - FiboSample::fibo at 9
>> 0x00007faadd05fd56: nop ;...90
>> 0x00007faadd05fd57: callq 0x00007faadd038060 ;...e80483fd ff
>> ; OopMap{off=28}
>> ;*invokestatic fibo
>> ; - FiboSample::fibo at 10
>> ; {static_call}
>> 0x00007faadd05fd5c: mov %eax,0x4(%rsp) ;...89442404
>> ;*goto
>> ; - FiboSample::fibo at 61
>> 0x00007faadd05fd60: mov (%rsp),%esi ;...8b3424
>> 0x00007faadd05fd63: add $0xfffffffffffffffe,%esi ;...83c6fe
>> ;*isub
>> ; - FiboSample::fibo at 18
>> 0x00007faadd05fd66: nop ;...90
>> 0x00007faadd05fd67: callq 0x00007faadd038060 ;...e8f482fd ff
>> ; OopMap{off=44}
>> ;*invokestatic fibo
>> ; - FiboSample::fibo at 19
>> ; {static_call}
>> 0x00007faadd05fd6c: mov %eax,%r9d ;...448bc8
>> 0x00007faadd05fd6f: mov 0x4(%rsp),%eax ;...8b442404
>> 0x00007faadd05fd73: add %r9d,%eax ;...4103c1
>> ;*goto
>> ; - FiboSample::fibo at 75
>> 0x00007faadd05fd76: mov 0x4(%rsp),%r11d ;...448b5c24 04
>> 0x00007faadd05fd7b: xor %eax,%r11d ;...4433d8
>> 0x00007faadd05fd7e: mov %r9d,%r8d ;...458bc1
>> 0x00007faadd05fd81: xor %eax,%r8d ;...4433c0
>> 0x00007faadd05fd84: and %r8d,%r11d ;...4523d8
>> 0x00007faadd05fd87: test %r11d,%r11d ;...4585db
>> 0x00007faadd05fd8a: jge 0x00007faadd05fda6 ;...7d1a
>> ;*ifge
>> ; -
>> jdart.runtime.RT::addExact at 11 (line 139)
>> ; - FiboSample::fibo at 37
>> 0x00007faadd05fd8c: mov $0xa5,%esi ;...bea50000 00
>> 0x00007faadd05fd91: mov %r9d,(%rsp) ;...44890c24
>> 0x00007faadd05fd95: xchg %ax,%ax ;...6690
>> 0x00007faadd05fd97: callq 0x00007faadd039020 ;...e88492fd ff
>> ; OopMap{off=92}
>> ;*new ; -
>> jdart.runtime.RT::addExact at 14 (line 140)
>> ; - FiboSample::fibo at 37
>> ; {runtime_call}
>> 0x00007faadd05fd9c: callq 0x00007faae7394800 ;...e85f4a33 0a
>> ;*new ; -
>> jdart.runtime.RT::addExact at 14 (line 140)
>> ; - FiboSample::fibo at 37
>> ; {runtime_call}
>> 0x00007faadd05fda1: mov $0x1,%eax ;...b8010000 00
>> 0x00007faadd05fda6: add $0x20,%rsp ;...4883c420
>> 0x00007faadd05fdaa: pop %rbp ;...5d
>> 0x00007faadd05fdab: test %eax,0xab5524f(%rip) #
>> 0x00007faae7bb5000
>> ;...85054f52 b50a
>> ; {poll_return}
>> 0x00007faadd05fdb1: retq ;...c3
>> 0x00007faadd05fdb2: mov 0x8(%rax),%r11d ;...448b5808
>> 0x00007faadd05fdb6: cmp $0xefe4b124,%r11d ;...4181fb24 b1e4ef
>> ;
>> {oop('jdart/runtime/ControlFlowException')}
>> 0x00007faadd05fdbd: jne 0x00007faadd05fdf5 ;...7536
>> ;*invokestatic fibo
>> ; - FiboSample::fibo at 19
>> 0x00007faadd05fdbf: mov 0x20(%rax),%ebp ;...8b6820
>> 0x00007faadd05fdc2: test %ebp,%ebp ;...85ed
>> 0x00007faadd05fdc4: jne 0x00007faadd05fe11 ;...754b
>> ;*getfield value
>> ; - FiboSample::fibo at 70
>> 0x00007faadd05fdc6: mov 0x4(%rsp),%eax ;...8b442404
>> 0x00007faadd05fdca: xor %r9d,%r9d ;...4533c9
>> 0x00007faadd05fdcd: jmp 0x00007faadd05fd76 ;...eba7
>> 0x00007faadd05fdcf: mov 0x8(%rax),%r11d ;...448b5808
>> 0x00007faadd05fdd3: cmp $0xefe4b124,%r11d ;...4181fb24 b1e4ef
>> ;
>> {oop('jdart/runtime/ControlFlowException')}
>> 0x00007faadd05fdda: jne 0x00007faadd05fdf0 ;...7514
>> ;*invokestatic fibo
>> ; - FiboSample::fibo at 10
>> 0x00007faadd05fddc: mov 0x20(%rax),%ebp ;...8b6820
>> 0x00007faadd05fddf: test %ebp,%ebp ;...85ed
>> 0x00007faadd05fde1: jne 0x00007faadd05fe02 ;...751f
>> ;*getfield value
>> ; - FiboSample::fibo at 57
>> 0x00007faadd05fde3: xor %r11d,%r11d ;...4533db
>> 0x00007faadd05fde6: mov %r11d,0x4(%rsp) ;...44895c24 04
>> 0x00007faadd05fdeb: jmpq 0x00007faadd05fd60 ;...e970ffff ff
>> 0x00007faadd05fdf0: mov %rax,%rsi ;...488bf0
>> 0x00007faadd05fdf3: jmp 0x00007faadd05fdf8 ;...eb03
>> 0x00007faadd05fdf5: mov %rax,%rsi ;...488bf0
>> ;*invokestatic fibo
>> ; - FiboSample::fibo at 19
>> 0x00007faadd05fdf8: add $0x20,%rsp ;...4883c420
>> 0x00007faadd05fdfc: pop %rbp ;...5d
>> 0x00007faadd05fdfd: jmpq 0x00007faadd061660 ;...e95e1800 00
>> ; {runtime_call}
>> 0x00007faadd05fe02: mov $0xffffffec,%esi ;...beecffff ff
>> 0x00007faadd05fe07: callq 0x00007faadd039020 ;...e81492fd ff
>> ;
>> OopMap{rbp=NarrowOop off=204}
>> ;*astore_2
>> ; - FiboSample::fibo at 60
>> ; {runtime_call}
>> 0x00007faadd05fe0c: callq 0x00007faae7394800 ;...e8ef4933 0a
>> ;*getfield value
>> ; - FiboSample::fibo at 57
>> ; {runtime_call}
>> 0x00007faadd05fe11: mov $0xffffffec,%esi ;...beecffff ff
>> 0x00007faadd05fe16: nop ;...90
>> 0x00007faadd05fe17: callq 0x00007faadd039020 ;...e80492fd ff
>> ;
>> OopMap{rbp=NarrowOop off=220}
>> ;*astore
>> ; - FiboSample::fibo at 73
>> ; {runtime_call}
>> 0x00007faadd05fe1c: callq 0x00007faae7394800 ;...e8df4933 0a
>> ;*getfield value
>> ; - FiboSample::fibo at 70
>> ; {runtime_call}
>> 0x00007faadd05fe21: hlt ;...f4
>> 0x00007faadd05fe22: hlt ;...f4
>> 0x00007faadd05fe23: hlt ;...f4
>> 0x00007faadd05fe24: hlt ;...f4
>> 0x00007faadd05fe25: hlt ;...f4
>> 0x00007faadd05fe26: hlt ;...f4
>> 0x00007faadd05fe27: hlt ;...f4
>> 0x00007faadd05fe28: hlt ;...f4
>> 0x00007faadd05fe29: hlt ;...f4
>> 0x00007faadd05fe2a: hlt ;...f4
>> 0x00007faadd05fe2b: hlt ;...f4
>> 0x00007faadd05fe2c: hlt ;...f4
>> 0x00007faadd05fe2d: hlt ;...f4
>> 0x00007faadd05fe2e: hlt ;...f4
>> 0x00007faadd05fe2f: hlt ;...f4
>> 0x00007faadd05fe30: hlt ;...f4
>> 0x00007faadd05fe31: hlt ;...f4
>> 0x00007faadd05fe32: hlt ;...f4
>> 0x00007faadd05fe33: hlt ;...f4
>> 0x00007faadd05fe34: hlt ;...f4
>> 0x00007faadd05fe35: hlt ;...f4
>> 0x00007faadd05fe36: hlt ;...f4
>> 0x00007faadd05fe37: hlt ;...f4
>> 0x00007faadd05fe38: hlt ;...f4
>> 0x00007faadd05fe39: hlt ;...f4
>> 0x00007faadd05fe3a: hlt ;...f4
>> 0x00007faadd05fe3b: hlt ;...f4
>> 0x00007faadd05fe3c: hlt ;...f4
>> 0x00007faadd05fe3d: hlt ;...f4
>> 0x00007faadd05fe3e: hlt ;...f4
>> 0x00007faadd05fe3f: hlt ;...f4
>> [Stub Code]
>> 0x00007faadd05fe40: mov $0x0,%rbx ;...48bb0000 000000
>> ;...000000
>> ; {no_reloc}
>> 0x00007faadd05fe4a: jmpq 0x00007faadd05fe4a ;...e9fbffff ff
>> ; {runtime_call}
>> 0x00007faadd05fe4f: mov $0x0,%rbx ;...48bb0000 000000
>> ;...000000
>> ; {static_stub}
>> 0x00007faadd05fe59: jmpq 0x00007faadd05fe59 ;...e9fbffff ff
>> ; {runtime_call}
>> [Exception Handler]
>> 0x00007faadd05fe5e: jmpq 0x00007faadd05e8e0 ;...e97deaff ff
>> ; {runtime_call}
>> [Deopt Handler Code]
>> 0x00007faadd05fe63: callq 0x00007faadd05fe68 ;...e8000000 00
>> 0x00007faadd05fe68: subq $0x5,(%rsp) ;...48832c24 05
>> 0x00007faadd05fe6d: jmpq 0x00007faadd038c00 ;...e98e8dfd ff
>> ; {runtime_call}
>> 0x00007faadd05fe72: hlt ;...f4
>> 0x00007faadd05fe73: hlt ;...f4
>> 0x00007faadd05fe74: hlt ;...f4
>> 0x00007faadd05fe75: hlt ;...f4
>> 0x00007faadd05fe76: hlt ;...f4
>> 0x00007faadd05fe77: hlt ;...f4
>>
>>
>>>
>>> Rémi Forax wrote:
>>>> Hi guys,
>>>> I've found something really weird, c2 sometimes generates
>>>> the assembler instruction xchg %ax, %ax (see the assembly code
>>>> of fibo just before the first recursive call)
>>>> which I believe is equivalent to a nop but slower.
>>>>
>>>> In fact, xchg is really slow in my laptop (Nehalem), slower than
>>>> at least 5/6 instructions like move/xor/and.
>>>> I think it's because xchg is atomic see [1]
>>>>
>>>> I think c2 should never generate xchg or at least replace all xchg
>>>> %r, %r by nop.
>>>>
>>>> Rémi
>>>>
>>>> [1]
>>>> http://www.intel.ru/content/dam/doc/white-paper/intel-microarchitecture-white-paper.pdf
>>>>
>>>>
>>>>
>>>> public class ClassicFibo {
>>>> private static int fibo(int n) {
>>>> if (n < 2) {
>>>> return 1;
>>>> }
>>>> return fibo(n - 1) + fibo(n - 2);
>>>> }
>>>>
>>>> public static void main(String[] args) {
>>>> System.out.println(fibo(40));
>>>> }
>>>> }
>>>>
>>>> # {method} 'fibo' '(I)I' in 'ClassicFibo'
>>>> # parm0: rsi = int
>>>> # [sp+0x30] (sp of caller)
>>>> 0x00007fb409061620: mov %eax,-0x14000(%rsp)
>>>> 0x00007fb409061627: push %rbp
>>>> 0x00007fb409061628: sub $0x20,%rsp ;*synchronization entry
>>>> ; -
>>>> ClassicFibo::fibo at -1 (line 4)
>>>> 0x00007fb40906162c: mov %esi,%ebp
>>>> 0x00007fb40906162e: cmp $0x2,%esi
>>>> 0x00007fb409061631: jl 0x00007fb409061651 ;*if_icmpge
>>>> ; -
>>>> ClassicFibo::fibo at 2 (line 4)
>>>> 0x00007fb409061633: dec %esi ;*isub
>>>> ; -
>>>> ClassicFibo::fibo at 9 (line 7)
>>>> 0x00007fb409061635: xchg %ax,%ax
>>>> 0x00007fb409061637: callq 0x00007fb409038060 ; OopMap{off=28}
>>>> ;*invokestatic fibo
>>>> ; -
>>>> ClassicFibo::fibo at 10 (line 7)
>>>> ; {static_call}
>>>> ...
>>>>
>>>>
>>>>
>>>>
>>
>>
More information about the hotspot-compiler-dev
mailing list