[patch] Shark reroute LLVM atomic intrinsics to Zero
Xerxes Rånby
xerxes at zafena.se
Mon Mar 30 03:25:19 PDT 2009
Andrew Haley wrote:
> Xerxes Rånby wrote:
>
>> Andrew Haley wrote:
>>
>>> Robert Schuster wrote:
>>>
>>>
>>>> Xerxes Rånby wrote:
>>>>
>>>>
>>>>> Greetings,
>>>>> This patch will make shark reroute LLVM atomic intrinsics to the
>>>>> existing atomic operations implemented in Zero.
>>>>>
>>>>> This patch is both platform and arch independent.
>>>>> I have tested this patch on Shark compiled for X86, PPC and ARM.
>>>>>
>>>>>
>>>> I would make this rerouting optional depending on the architecture.
>>>> LLVM has atomic intrinsic function support for x86(-64), powerpc (32,64)
>>>> and alpha. On those architectures you really want to use what LLVM
>>>> provides.
>>>>
>>>> E.g. on x86 the function is converted into a series of machine
>>>> instructions and no function call.
>>>>
>>> Definitely; we really don't want a function call just to do an atomic
>>> cmpxchg. This is really just a workaround for an llvm bug, and hopefully
>>> it'll soon go away.
>>>
>> I have done a small investigation to see how large the cost is to use
>> the reroute patch on PPC.
>> The test machine is a PowerBook G4 1.333 GHz with F10 installed.
>>
>> I used Caffeine Mark 3.0 for this benchmark, why? It is a quick
>> benchmark and it includes some graphics tests so it is quite fun to
>> benchmark with.
>>
>
> And, perhaps unsurprisingly, it doesn't use java.util.concurrent.*
> at all. :-)
>
> Really, the use of lock-free in Java is only just beginning; in the
> future I expect it'll be the obvious way to do things.
>
> Andrew.
>
>
I agree that it is a rather stupid benchmark to use, yet I don't know of
any benchmark that specifically tests concurrency.
My thinking was to use a benchmark with some GUI parts, since AWT is
multi-threaded internally AFAIK, just to see if I could measure any
effect at all from the use of the reroute.
OK, as a sanity check I did a small test to see whether running the
CM30 benchmarks triggered any rerouting of atomic intrinsics at all.
I added a printf to each rerouted function call, printing one char per
call, like this:
extern "C" {
  jint zero_cmpxchg_int_fn(volatile jint *ptr,
                           jint *oldval,
                           jint *newval)
  {
    printf("1");
    return Atomic::cmpxchg((jint) newval,
                           (volatile jint *) ptr,
                           (jint) oldval);
  }

  intptr_t* zero_cmpxchg_ptr_fn(volatile void *ptr,
                                intptr_t *oldval,
                                intptr_t *newval)
  {
    printf("0");
    return (intptr_t *) Atomic::cmpxchg_ptr((void *) newval,
                                            (volatile void *) ptr,
                                            (void *) oldval);
  }
};
I then ran CM30 using the appletviewer. The logs can be found here:
http://labb.zafena.se/shark-testing/cm30_useof_atomic.log 3537613 bytes
http://labb.zafena.se/shark-testing/cm30_useof_atomic.log2 3493084 bytes
I am happy to see that the reroute did at least get used during the test,
about 3.5 million times per run (one char is printed per call).
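
A lighter way to get the same count, without a multi-megabyte log, would be
to bump a counter in each rerouted function and print the totals once at the
end. The following is only a minimal standalone sketch of that idea, not part
of the patch: the names counted_cmpxchg_int, counted_cmpxchg_ptr and
report_counts are hypothetical, and std::atomic stands in for the Zero
Atomic::cmpxchg calls so that the example compiles on its own.

```cpp
#include <atomic>
#include <cstdint>
#include <cstdio>

// Hypothetical call counters; relaxed ordering is enough for statistics.
static std::atomic<std::uint64_t> cmpxchg_int_calls{0};
static std::atomic<std::uint64_t> cmpxchg_ptr_calls{0};

// Stand-in for the rerouted int cmpxchg. Returns the value that was stored
// in 'target' before the attempt, whether or not the swap happened.
std::int32_t counted_cmpxchg_int(std::atomic<std::int32_t> &target,
                                 std::int32_t oldval,
                                 std::int32_t newval)
{
  cmpxchg_int_calls.fetch_add(1, std::memory_order_relaxed);
  std::int32_t expected = oldval;
  // On failure, 'expected' is overwritten with the current value.
  target.compare_exchange_strong(expected, newval);
  return expected;
}

// Stand-in for the rerouted pointer cmpxchg, same contract as above.
void *counted_cmpxchg_ptr(std::atomic<void *> &target,
                          void *oldval,
                          void *newval)
{
  cmpxchg_ptr_calls.fetch_add(1, std::memory_order_relaxed);
  void *expected = oldval;
  target.compare_exchange_strong(expected, newval);
  return expected;
}

// Print the totals once, for example at VM exit.
void report_counts()
{
  std::printf("cmpxchg_int calls: %llu\ncmpxchg_ptr calls: %llu\n",
              (unsigned long long) cmpxchg_int_calls.load(),
              (unsigned long long) cmpxchg_ptr_calls.load());
}
```

The counters add one extra atomic increment per call, so they still perturb
the timing a little, but far less than a printf on every cmpxchg.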
If someone knows of a better benchmark that tests concurrency
thoroughly, I would be happy to hear about it.
Cheers
Xerxes