[patch] Shark reroute LLVM atomic intrinsics to Zero

Xerxes Rånby xerxes at zafena.se
Mon Mar 30 03:25:19 PDT 2009


Andrew Haley wrote:
> Xerxes Rånby wrote:
>   
>> Andrew Haley wrote:
>>     
>>> Robert Schuster wrote:
>>>
>>>       
>>>> Xerxes Rånby wrote:
>>>>    
>>>>         
>>>>> Greetings,
>>>>> This patch will make shark reroute LLVM atomic intrinsics to the
>>>>> existing atomic operations implemented in Zero.
>>>>>
>>>>> This patch is both platform and architecture independent.
>>>>> I have tested this patch on Shark compiled for X86, PPC and ARM.
>>>>>       
>>>>>           
>>>> I would make this rerouting optional depending on the architecture.
>>>> LLVM has atomic intrinsic function support for x86(-64), powerpc (32,64)
>>>> and alpha. On those architectures you really want to use what LLVM
>>>> provides.
>>>>
>>>> E.g. on x86 the function is converted into a series of machine
>>>> instructions and no function call.
>>>>         
>>> Definitely; we really don't want a function call just to do an atomic
>>> cmpxchg.  This is really just a workaround for an LLVM bug, and hopefully
>>> it'll soon go away.
>>>       
>> I have done a small investigation to see how large the cost is to use
>> the reroute patch on PPC.
>> The test machine is a PowerBook G4 1.333 GHz with Fedora 10 installed.
>>
>> I used CaffeineMark 3.0 for this benchmark. Why? It is quick, and it
>> includes some graphics tests, so it is quite fun to benchmark with.
>>     
>
> And, perhaps unsurprisingly, it doesn't use java.util.concurrent.*
> at all.  :-)
>
> Really, the use of lock-free in Java is only just beginning; in the
> future I expect it'll be the obvious way to do things.
>
> Andrew.
>
>   
I agree that it is a rather poor benchmark for this purpose, but I don't
know of any benchmark that specifically tests concurrency.
My thinking was to use a benchmark with some GUI parts, since AWT is
multi-threaded internally AFAIK, just to see whether I could measure any
effect at all from the reroute.

OK, as a sanity check I ran a small test to see whether running the
CM30 benchmark triggered any rerouting of atomic intrinsics at all.
I added a printf to each rerouted function, printing one character per
call, like this:

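extern "C" {
  // Rerouted 32-bit compare-and-swap; prints '1' once per call.
  // oldval and newval are declared as pointers and cast back to jint.
  jint zero_cmpxchg_int_fn(volatile jint *ptr,
                           jint          *oldval,
                           jint          *newval)
  {
    printf("1");
    return Atomic::cmpxchg((jint) newval,
                           (volatile jint *) ptr,
                           (jint) oldval);
  }

  // Rerouted pointer-sized compare-and-swap; prints '0' once per call.
  intptr_t* zero_cmpxchg_ptr_fn(volatile void *ptr,
                                intptr_t      *oldval,
                                intptr_t      *newval)
  {
    printf("0");
    return (intptr_t *) Atomic::cmpxchg_ptr((void *) newval,
                                            (volatile void *) ptr,
                                            (void *) oldval);
  }
}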


I then ran CM30 using appletviewer. The logs can be found here:
http://labb.zafena.se/shark-testing/cm30_useof_atomic.log      3537613 bytes
http://labb.zafena.se/shark-testing/cm30_useof_atomic.log2    3493084 bytes

I am happy to see that the reroute did at least get used during the test:
roughly 3.5 million times in each log, going by the log sizes at one
character per call.
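
Printing one character per call works for a quick check, but the I/O can
distort timings. A minimal sketch of a cheaper way to get the same count
follows. It assumes a compiler with C++11 std::atomic; the names
counted_cmpxchg_int, g_cmpxchg_calls and cmpxchg_call_count are
hypothetical and not part of the patch, and std::atomic stands in for
HotSpot's Atomic::cmpxchg so the example is self-contained.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical call counter; not part of the Shark/Zero patch.
static std::atomic<std::uint64_t> g_cmpxchg_calls{0};

// Stand-in for a rerouted 32-bit compare-and-swap helper.
// Like HotSpot's Atomic::cmpxchg, it returns the value that was
// stored at *ptr before the operation.
std::int32_t counted_cmpxchg_int(std::atomic<std::int32_t>* ptr,
                                 std::int32_t oldval,
                                 std::int32_t newval) {
  // Count the call without any I/O on the hot path.
  g_cmpxchg_calls.fetch_add(1, std::memory_order_relaxed);

  std::int32_t expected = oldval;
  // On failure, compare_exchange_strong writes the observed value
  // into `expected`; on success `expected` still equals oldval.
  ptr->compare_exchange_strong(expected, newval);
  return expected;
}

// Read the counter once, for example at VM exit.
std::uint64_t cmpxchg_call_count() {
  return g_cmpxchg_calls.load(std::memory_order_relaxed);
}
```

The total can then be printed once at the end of the run instead of
producing a multi-megabyte log.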

If someone knows of a better benchmark that tests concurrency
thoroughly, I would be happy to hear about it.
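
As a rough idea of what such a test could look like, here is a minimal
sketch of a contended compare-and-swap microbenchmark. It is written in
C++ with std::atomic and std::thread purely for illustration, so it does
not exercise Shark itself; the names CasBenchResult and run_cas_bench
are hypothetical. A Java port of the same loop, for example using
java.util.concurrent.atomic.AtomicInteger.compareAndSet, would be needed
to hit the rerouted path.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical result type: final counter value and elapsed wall time.
struct CasBenchResult {
  std::int64_t final_value;
  double seconds;
};

// Start num_threads threads that each increment a shared counter
// increments_per_thread times using a compare-and-swap retry loop.
CasBenchResult run_cas_bench(int num_threads, int increments_per_thread) {
  std::atomic<std::int64_t> counter{0};

  auto worker = [&counter, increments_per_thread]() {
    for (int i = 0; i < increments_per_thread; ++i) {
      std::int64_t seen = counter.load(std::memory_order_relaxed);
      // Retry until our CAS wins; a failed CAS refreshes `seen`.
      while (!counter.compare_exchange_weak(seen, seen + 1)) {
      }
    }
  };

  auto start = std::chrono::steady_clock::now();
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back(worker);
  }
  for (auto& th : threads) {
    th.join();
  }
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;

  return {counter.load(), elapsed.count()};
}
```

If every increment is applied exactly once, final_value equals
num_threads * increments_per_thread, which doubles as a correctness
check; the elapsed time gives a crude measure of CAS cost under
contention.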

Cheers
Xerxes




