XCHG is slow ?
Rémi Forax
forax at univ-mlv.fr
Tue Jul 3 17:01:26 PDT 2012
On 07/04/2012 01:44 AM, Vladimir Kozlov wrote:
> Rémi,
>
> NOP and xchg %ax,%ax are and always were the same instruction and have
> the same encoding 0x90. What you see is a multi-byte nop (2 bytes in
> your case) to align the following call instruction: 0x66 0x90.
>
> Vladimir
I see, and I suppose that a multi-byte nop is not slower than a
single-byte nop.
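A minimal sketch of how such padding could be computed. It assumes that the purpose of the padding is to make the 4-byte displacement that follows the one-byte call opcode (0xe8) start on a 4-byte boundary. That assumption is not stated in this thread, but it is consistent with the addresses in the listings in this message. The class and method names are hypothetical and this is not HotSpot code.

```java
class CallPadding {
    // Assumption: a direct call is one opcode byte (0xe8) followed by a
    // 4-byte displacement, and the displacement should be 4-byte aligned.
    // Returns the number of nop bytes to emit at 'address' before the call.
    static int paddingBeforeCall(long address) {
        long displacementAddress = address + 1; // skip the 0xe8 opcode byte
        return (int) ((4 - (displacementAddress & 3)) & 3);
    }

    public static void main(String[] args) {
        // Addresses where the padding (or the call itself) starts,
        // taken from the listings in this message.
        System.out.println(paddingBeforeCall(0x00007fdf91061635L)); // 2 -> 6690
        System.out.println(paddingBeforeCall(0x00007fdf91061644L)); // 3 -> 666690
        System.out.println(paddingBeforeCall(0x00007faadd05fd56L)); // 1 -> 90
        System.out.println(paddingBeforeCall(0x00007faadd05fe07L)); // 0 -> no padding
    }
}
```

Under that assumption, the different nop lengths in the two listings only reflect where the preceding instruction happens to end, not a different choice of instruction.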
So, to explain my problem clearly: I have two versions of the code. The
first one is Fibonacci with an int (ClassicFibo); the second one is
Fibonacci with an int plus overflow detection (FiboSample).
My problem is that the code with the overflow detection is faster than
the code without the overflow detection :)
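A minimal sketch of what the two variants might look like side by side. The source of FiboSample is not shown in this thread; its disassembly references jdart.runtime.RT::addExact and jdart/runtime/ControlFlowException, which this sketch does not try to reproduce. The addExact below is a hypothetical stand-in: it uses the xor/xor/and sign test that is visible in the FiboSample listing, and throwing ArithmeticException is an assumption.

```java
class FiboOverflowSketch {
    // Hypothetical stand-in for an overflow-checked int addition.
    static int addExact(int a, int b) {
        int r = a + b;
        // The addition overflowed exactly when both operands have a sign
        // that differs from the sign of the result.
        if (((a ^ r) & (b ^ r)) < 0) {
            throw new ArithmeticException("integer overflow"); // assumption
        }
        return r;
    }

    // Same shape as ClassicFibo quoted later in this message.
    static int classicFibo(int n) {
        if (n < 2) {
            return 1;
        }
        return classicFibo(n - 1) + classicFibo(n - 2);
    }

    // Same recursion, with the addition routed through the overflow check.
    static int checkedFibo(int n) {
        if (n < 2) {
            return 1;
        }
        return addExact(checkedFibo(n - 1), checkedFibo(n - 2));
    }

    public static void main(String[] args) {
        System.out.println(classicFibo(20));
        System.out.println(checkedFibo(20));
    }
}
```

For inputs that do not overflow, both methods return the same value, so any timing difference comes from the generated code rather than from the amount of recursion.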
Rémi
# {method} 'fibo' '(I)I' in 'ClassicFibo'
# parm0: rsi = int
# [sp+0x30] (sp of caller)
0x00007fdf91061620: mov %eax,-0x14000(%rsp) ;...89842400 c0feff
0x00007fdf91061627: push %rbp ;...55
0x00007fdf91061628: sub $0x20,%rsp ;...4883ec20
;*synchronization entry
; - ClassicFibo::fibo at -1 (line 4)
0x00007fdf9106162c: mov %esi,%ebp ;...8bee
0x00007fdf9106162e: cmp $0x2,%esi ;...83fe02
0x00007fdf91061631: jl 0x00007fdf91061651 ;...7c1e
;*if_icmpge
; - ClassicFibo::fibo at 2 (line 4)
0x00007fdf91061633: dec %esi ;...ffce
;*isub
; - ClassicFibo::fibo at 9 (line 7)
0x00007fdf91061635: xchg %ax,%ax ;...6690
0x00007fdf91061637: callq 0x00007fdf91038060 ;...e8246afd ff
; OopMap{off=28}
;*invokestatic fibo
; - ClassicFibo::fibo at 10 (line 7)
; {static_call}
0x00007fdf9106163c: mov %eax,(%rsp) ;...890424
0x00007fdf9106163f: mov %ebp,%esi ;...8bf5
0x00007fdf91061641: add $0xfffffffffffffffe,%esi ;...83c6fe
;*isub
; - ClassicFibo::fibo at 15 (line 7)
0x00007fdf91061644: xchg %ax,%ax ;...666690
0x00007fdf91061647: callq 0x00007fdf91038060 ;...e8146afd ff
; OopMap{off=44}
;*invokestatic fibo
; - ClassicFibo::fibo at 16 (line 7)
; {static_call}
0x00007fdf9106164c: add (%rsp),%eax ;...030424
;*iadd
; - ClassicFibo::fibo at 19 (line 7)
0x00007fdf9106164f: jmp 0x00007fdf91061656 ;...eb05
0x00007fdf91061651: mov $0x1,%eax ;...b8010000 00
0x00007fdf91061656: add $0x20,%rsp ;...4883c420
0x00007fdf9106165a: pop %rbp ;...5d
0x00007fdf9106165b: test %eax,0x91c899f(%rip) # 0x00007fdf9a22a000 ;...85059f89 1c09
; {poll_return}
0x00007fdf91061661: retq ;...c3
;*invokestatic fibo
; - ClassicFibo::fibo at 10 (line 7)
0x00007fdf91061662: mov %rax,%rsi ;...488bf0
0x00007fdf91061665: jmp 0x00007fdf9106166a ;...eb03
0x00007fdf91061667: mov %rax,%rsi ;...488bf0
;*invokestatic fibo
; - ClassicFibo::fibo at 16 (line 7)
0x00007fdf9106166a: add $0x20,%rsp ;...4883c420
0x00007fdf9106166e: pop %rbp ;...5d
0x00007fdf9106166f: jmpq 0x00007fdf91061220 ;...e9acfbff ff
; {runtime_call}
0x00007fdf91061674: hlt ;...f4
0x00007fdf91061675: hlt ;...f4
0x00007fdf91061676: hlt ;...f4
0x00007fdf91061677: hlt ;...f4
0x00007fdf91061678: hlt ;...f4
0x00007fdf91061679: hlt ;...f4
0x00007fdf9106167a: hlt ;...f4
0x00007fdf9106167b: hlt ;...f4
0x00007fdf9106167c: hlt ;...f4
0x00007fdf9106167d: hlt ;...f4
0x00007fdf9106167e: hlt ;...f4
0x00007fdf9106167f: hlt ;...f4
[Stub Code]
0x00007fdf91061680: mov $0x0,%rbx ;...48bb0000 000000
;...000000
; {no_reloc}
0x00007fdf9106168a: jmpq 0x00007fdf9106168a ;...e9fbffff ff
; {runtime_call}
0x00007fdf9106168f: mov $0x0,%rbx ;...48bb0000 000000
;...000000
; {static_stub}
0x00007fdf91061699: jmpq 0x00007fdf91061699 ;...e9fbffff ff
; {runtime_call}
[Exception Handler]
0x00007fdf9106169e: jmpq 0x00007fdf9105e4a0 ;...e9fdcdff ff
; {runtime_call}
[Deopt Handler Code]
0x00007fdf910616a3: callq 0x00007fdf910616a8 ;...e8000000 00
0x00007fdf910616a8: subq $0x5,(%rsp) ;...48832c24 05
0x00007fdf910616ad: jmpq 0x00007fdf91038c00 ;...e94e75fd ff
; {runtime_call}
0x00007fdf910616b2: hlt ;...f4
0x00007fdf910616b3: hlt ;...f4
0x00007fdf910616b4: hlt ;...f4
0x00007fdf910616b5: hlt ;...f4
0x00007fdf910616b6: hlt ;...f4
0x00007fdf910616b7: hlt ;...f4
# {method} 'fibo' '(I)I' in 'FiboSample'
# parm0: rsi = int
# [sp+0x30] (sp of caller)
0x00007faadd05fd40: mov %eax,-0x14000(%rsp) ;...89842400 c0feff
0x00007faadd05fd47: push %rbp ;...55
0x00007faadd05fd48: sub $0x20,%rsp ;...4883ec20
;*synchronization entry
; - FiboSample::fibo at -1
0x00007faadd05fd4c: mov %esi,(%rsp) ;...893424
0x00007faadd05fd4f: cmp $0x2,%esi ;...83fe02
0x00007faadd05fd52: jl 0x00007faadd05fda1 ;...7c4d
;*if_icmpge
; - FiboSample::fibo at 2
0x00007faadd05fd54: dec %esi ;...ffce
;*isub
; - FiboSample::fibo at 9
0x00007faadd05fd56: nop ;...90
0x00007faadd05fd57: callq 0x00007faadd038060 ;...e80483fd ff
; OopMap{off=28}
;*invokestatic fibo
; - FiboSample::fibo at 10
; {static_call}
0x00007faadd05fd5c: mov %eax,0x4(%rsp) ;...89442404
;*goto
; - FiboSample::fibo at 61
0x00007faadd05fd60: mov (%rsp),%esi ;...8b3424
0x00007faadd05fd63: add $0xfffffffffffffffe,%esi ;...83c6fe
;*isub
; - FiboSample::fibo at 18
0x00007faadd05fd66: nop ;...90
0x00007faadd05fd67: callq 0x00007faadd038060 ;...e8f482fd ff
; OopMap{off=44}
;*invokestatic fibo
; - FiboSample::fibo at 19
; {static_call}
0x00007faadd05fd6c: mov %eax,%r9d ;...448bc8
0x00007faadd05fd6f: mov 0x4(%rsp),%eax ;...8b442404
0x00007faadd05fd73: add %r9d,%eax ;...4103c1
;*goto
; - FiboSample::fibo at 75
0x00007faadd05fd76: mov 0x4(%rsp),%r11d ;...448b5c24 04
0x00007faadd05fd7b: xor %eax,%r11d ;...4433d8
0x00007faadd05fd7e: mov %r9d,%r8d ;...458bc1
0x00007faadd05fd81: xor %eax,%r8d ;...4433c0
0x00007faadd05fd84: and %r8d,%r11d ;...4523d8
0x00007faadd05fd87: test %r11d,%r11d ;...4585db
0x00007faadd05fd8a: jge 0x00007faadd05fda6 ;...7d1a
;*ifge
; - jdart.runtime.RT::addExact at 11 (line 139)
; - FiboSample::fibo at 37
0x00007faadd05fd8c: mov $0xa5,%esi ;...bea50000 00
0x00007faadd05fd91: mov %r9d,(%rsp) ;...44890c24
0x00007faadd05fd95: xchg %ax,%ax ;...6690
0x00007faadd05fd97: callq 0x00007faadd039020 ;...e88492fd ff
; OopMap{off=92}
;*new
; - jdart.runtime.RT::addExact at 14 (line 140)
; - FiboSample::fibo at 37
; {runtime_call}
0x00007faadd05fd9c: callq 0x00007faae7394800 ;...e85f4a33 0a
;*new
; - jdart.runtime.RT::addExact at 14 (line 140)
; - FiboSample::fibo at 37
; {runtime_call}
0x00007faadd05fda1: mov $0x1,%eax ;...b8010000 00
0x00007faadd05fda6: add $0x20,%rsp ;...4883c420
0x00007faadd05fdaa: pop %rbp ;...5d
0x00007faadd05fdab: test %eax,0xab5524f(%rip) # 0x00007faae7bb5000 ;...85054f52 b50a
; {poll_return}
0x00007faadd05fdb1: retq ;...c3
0x00007faadd05fdb2: mov 0x8(%rax),%r11d ;...448b5808
0x00007faadd05fdb6: cmp $0xefe4b124,%r11d ;...4181fb24 b1e4ef
; {oop('jdart/runtime/ControlFlowException')}
0x00007faadd05fdbd: jne 0x00007faadd05fdf5 ;...7536
;*invokestatic fibo
; - FiboSample::fibo at 19
0x00007faadd05fdbf: mov 0x20(%rax),%ebp ;...8b6820
0x00007faadd05fdc2: test %ebp,%ebp ;...85ed
0x00007faadd05fdc4: jne 0x00007faadd05fe11 ;...754b
;*getfield value
; - FiboSample::fibo at 70
0x00007faadd05fdc6: mov 0x4(%rsp),%eax ;...8b442404
0x00007faadd05fdca: xor %r9d,%r9d ;...4533c9
0x00007faadd05fdcd: jmp 0x00007faadd05fd76 ;...eba7
0x00007faadd05fdcf: mov 0x8(%rax),%r11d ;...448b5808
0x00007faadd05fdd3: cmp $0xefe4b124,%r11d ;...4181fb24 b1e4ef
; {oop('jdart/runtime/ControlFlowException')}
0x00007faadd05fdda: jne 0x00007faadd05fdf0 ;...7514
;*invokestatic fibo
; - FiboSample::fibo at 10
0x00007faadd05fddc: mov 0x20(%rax),%ebp ;...8b6820
0x00007faadd05fddf: test %ebp,%ebp ;...85ed
0x00007faadd05fde1: jne 0x00007faadd05fe02 ;...751f
;*getfield value
; - FiboSample::fibo at 57
0x00007faadd05fde3: xor %r11d,%r11d ;...4533db
0x00007faadd05fde6: mov %r11d,0x4(%rsp) ;...44895c24 04
0x00007faadd05fdeb: jmpq 0x00007faadd05fd60 ;...e970ffff ff
0x00007faadd05fdf0: mov %rax,%rsi ;...488bf0
0x00007faadd05fdf3: jmp 0x00007faadd05fdf8 ;...eb03
0x00007faadd05fdf5: mov %rax,%rsi ;...488bf0
;*invokestatic fibo
; - FiboSample::fibo at 19
0x00007faadd05fdf8: add $0x20,%rsp ;...4883c420
0x00007faadd05fdfc: pop %rbp ;...5d
0x00007faadd05fdfd: jmpq 0x00007faadd061660 ;...e95e1800 00
; {runtime_call}
0x00007faadd05fe02: mov $0xffffffec,%esi ;...beecffff ff
0x00007faadd05fe07: callq 0x00007faadd039020 ;...e81492fd ff
; OopMap{rbp=NarrowOop off=204}
;*astore_2
; - FiboSample::fibo at 60
; {runtime_call}
0x00007faadd05fe0c: callq 0x00007faae7394800 ;...e8ef4933 0a
;*getfield value
; - FiboSample::fibo at 57
; {runtime_call}
0x00007faadd05fe11: mov $0xffffffec,%esi ;...beecffff ff
0x00007faadd05fe16: nop ;...90
0x00007faadd05fe17: callq 0x00007faadd039020 ;...e80492fd ff
; OopMap{rbp=NarrowOop off=220}
;*astore
; - FiboSample::fibo at 73
; {runtime_call}
0x00007faadd05fe1c: callq 0x00007faae7394800 ;...e8df4933 0a
;*getfield value
; - FiboSample::fibo at 70
; {runtime_call}
0x00007faadd05fe21: hlt ;...f4
0x00007faadd05fe22: hlt ;...f4
0x00007faadd05fe23: hlt ;...f4
0x00007faadd05fe24: hlt ;...f4
0x00007faadd05fe25: hlt ;...f4
0x00007faadd05fe26: hlt ;...f4
0x00007faadd05fe27: hlt ;...f4
0x00007faadd05fe28: hlt ;...f4
0x00007faadd05fe29: hlt ;...f4
0x00007faadd05fe2a: hlt ;...f4
0x00007faadd05fe2b: hlt ;...f4
0x00007faadd05fe2c: hlt ;...f4
0x00007faadd05fe2d: hlt ;...f4
0x00007faadd05fe2e: hlt ;...f4
0x00007faadd05fe2f: hlt ;...f4
0x00007faadd05fe30: hlt ;...f4
0x00007faadd05fe31: hlt ;...f4
0x00007faadd05fe32: hlt ;...f4
0x00007faadd05fe33: hlt ;...f4
0x00007faadd05fe34: hlt ;...f4
0x00007faadd05fe35: hlt ;...f4
0x00007faadd05fe36: hlt ;...f4
0x00007faadd05fe37: hlt ;...f4
0x00007faadd05fe38: hlt ;...f4
0x00007faadd05fe39: hlt ;...f4
0x00007faadd05fe3a: hlt ;...f4
0x00007faadd05fe3b: hlt ;...f4
0x00007faadd05fe3c: hlt ;...f4
0x00007faadd05fe3d: hlt ;...f4
0x00007faadd05fe3e: hlt ;...f4
0x00007faadd05fe3f: hlt ;...f4
[Stub Code]
0x00007faadd05fe40: mov $0x0,%rbx ;...48bb0000 000000
;...000000
; {no_reloc}
0x00007faadd05fe4a: jmpq 0x00007faadd05fe4a ;...e9fbffff ff
; {runtime_call}
0x00007faadd05fe4f: mov $0x0,%rbx ;...48bb0000 000000
;...000000
; {static_stub}
0x00007faadd05fe59: jmpq 0x00007faadd05fe59 ;...e9fbffff ff
; {runtime_call}
[Exception Handler]
0x00007faadd05fe5e: jmpq 0x00007faadd05e8e0 ;...e97deaff ff
; {runtime_call}
[Deopt Handler Code]
0x00007faadd05fe63: callq 0x00007faadd05fe68 ;...e8000000 00
0x00007faadd05fe68: subq $0x5,(%rsp) ;...48832c24 05
0x00007faadd05fe6d: jmpq 0x00007faadd038c00 ;...e98e8dfd ff
; {runtime_call}
0x00007faadd05fe72: hlt ;...f4
0x00007faadd05fe73: hlt ;...f4
0x00007faadd05fe74: hlt ;...f4
0x00007faadd05fe75: hlt ;...f4
0x00007faadd05fe76: hlt ;...f4
0x00007faadd05fe77: hlt ;...f4
>
> Rémi Forax wrote:
>> Hi guys,
>> I've found something really weird: c2 sometimes generates
>> the assembler instruction xchg %ax, %ax (see the assembly code
>> of fibo just before the first recursive call),
>> which I believe is equivalent to a nop but slower.
>>
>> In fact, xchg is really slow on my laptop (Nehalem), slower than
>> at least 5 or 6 instructions like mov/xor/and.
>> I think it's because xchg is atomic; see [1].
>>
>> I think c2 should never generate xchg or at least replace all xchg
>> %r, %r by nop.
>>
>> Rémi
>>
>> [1]
>> http://www.intel.ru/content/dam/doc/white-paper/intel-microarchitecture-white-paper.pdf
>>
>>
>>
>> public class ClassicFibo {
>> private static int fibo(int n) {
>> if (n < 2) {
>> return 1;
>> }
>> return fibo(n - 1) + fibo(n - 2);
>> }
>>
>> public static void main(String[] args) {
>> System.out.println(fibo(40));
>> }
>> }
>>
>> # {method} 'fibo' '(I)I' in 'ClassicFibo'
>> # parm0: rsi = int
>> # [sp+0x30] (sp of caller)
>> 0x00007fb409061620: mov %eax,-0x14000(%rsp)
>> 0x00007fb409061627: push %rbp
>> 0x00007fb409061628: sub $0x20,%rsp ;*synchronization entry
>> ; - ClassicFibo::fibo at -1 (line 4)
>> 0x00007fb40906162c: mov %esi,%ebp
>> 0x00007fb40906162e: cmp $0x2,%esi
>> 0x00007fb409061631: jl 0x00007fb409061651 ;*if_icmpge
>> ; - ClassicFibo::fibo at 2 (line 4)
>> 0x00007fb409061633: dec %esi ;*isub
>> ; - ClassicFibo::fibo at 9 (line 7)
>> 0x00007fb409061635: xchg %ax,%ax
>> 0x00007fb409061637: callq 0x00007fb409038060 ; OopMap{off=28}
>> ;*invokestatic fibo
>> ; - ClassicFibo::fibo at 10 (line 7)
>> ; {static_call}
>> ...
>>
>>
>>
>>
More information about the hotspot-compiler-dev
mailing list