XCHG is slow?

Tue Jul 3 16:11:36 PDT 2012

Hi guys,
I've found something really weird: c2 sometimes generates
the instruction xchg %ax,%ax (see the assembly code of fibo
below, just before the first recursive call),
which I believe is equivalent to a nop but slower.

In fact, xchg is really slow on my laptop (Nehalem): it is slower than
a sequence of at least 5 or 6 instructions such as mov/xor/and.
I think that's because xchg is atomic, see [1].

I think c2 should never generate xchg, or should at least replace every
xchg %r, %r with a nop.

Rémi

[1] http://www.intel.ru/content/dam/doc/white-paper/intel-microarchitecture-white-paper.pdf

public class ClassicFibo {
   private static int fibo(int n) {
     if (n < 2) {
       return 1;
     }
     return fibo(n - 1) + fibo(n - 2);
   }

   public static void main(String[] args) {
     System.out.println(fibo(40));
   }
}
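
To look at the code c2 produces, fibo has to run hot enough to be compiled. Here is a minimal sketch, assuming a hypothetical class FiboTimer (not part of the original post), that repeats the same recursive fibo and prints the time of each round. The post does not say how the listing was obtained. A common way is to run with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly, which needs the hsdis disassembler plugin installed.

```java
public class FiboTimer {
    // Same recursive definition as ClassicFibo.fibo in the post.
    static int fibo(int n) {
        if (n < 2) {
            return 1;
        }
        return fibo(n - 1) + fibo(n - 2);
    }

    // Runs fibo(n) once and returns the elapsed wall-clock time in nanoseconds.
    static long timeOnce(int n) {
        long start = System.nanoTime();
        int result = fibo(n);
        long elapsed = System.nanoTime() - start;
        if (result < 0) {
            // Uses the result so the call cannot be optimized away.
            throw new AssertionError("unexpected overflow");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Later rounds are more likely to run c2-compiled code than the first.
        for (int round = 0; round < 5; round++) {
            System.out.println("round " + round + ": " + timeOnce(30) / 1_000 + " us");
        }
    }
}
```

The timings only show when compiled code kicks in. They do not isolate the cost of the xchg itself.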

   # {method} 'fibo' '(I)I' in 'ClassicFibo'
   # parm0:    rsi       = int
   #           [sp+0x30]  (sp of caller)
   0x00007fb409061620: mov    %eax,-0x14000(%rsp)
   0x00007fb409061627: push   %rbp
   0x00007fb409061628: sub    $0x20,%rsp         ;*synchronization entry
                                                 ; - ClassicFibo::fibo@-1 (line 4)
   0x00007fb40906162c: mov    %esi,%ebp
   0x00007fb40906162e: cmp    $0x2,%esi
   0x00007fb409061631: jl     0x00007fb409061651  ;*if_icmpge
                                                 ; - ClassicFibo::fibo@2 (line 4)
   0x00007fb409061633: dec    %esi               ;*isub
                                                 ; - ClassicFibo::fibo@9 (line 7)
   0x00007fb409061635: xchg   %ax,%ax
   0x00007fb409061637: callq  0x00007fb409038060  ; OopMap{off=28}
                                                 ;*invokestatic fibo
                                                 ; - ClassicFibo::fibo@10 (line 7)
                                                 ;   {static_call}
   ...
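
The addresses in the listing are enough to work out how many bytes the xchg occupies and where the callq ends up. Here is a small sketch of that arithmetic, assuming a hypothetical class PaddingCheck and assuming the callq uses the usual one-byte E8 opcode followed by a 32-bit displacement. Reading the xchg as padding that 4-byte-aligns the displacement is only one possible interpretation. The post does not state it.

```java
public class PaddingCheck {
    // Assumption: callq rel32 is encoded as opcode E8 followed by a 4-byte displacement.
    static final int CALL_OPCODE_BYTES = 1;

    // Length of an instruction, given its address and the address of the next one.
    static long length(long addr, long next) {
        return next - addr;
    }

    // True if the displacement of a call placed at callAddr starts on a 4-byte boundary.
    static boolean displacementAligned(long callAddr) {
        return (callAddr + CALL_OPCODE_BYTES) % 4 == 0;
    }

    public static void main(String[] args) {
        long xchgAddr = 0x00007fb409061635L; // address of xchg %ax,%ax in the listing
        long callAddr = 0x00007fb409061637L; // address of the following callq

        System.out.println("xchg length in bytes: " + length(xchgAddr, callAddr));
        System.out.println("displacement aligned with the xchg: "
                + displacementAligned(callAddr));
        System.out.println("displacement aligned without the xchg: "
                + displacementAligned(xchgAddr));
    }
}
```

With the addresses above, the xchg is 2 bytes long. The callq displacement starts on a 4-byte boundary with the xchg in place and would not without it.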