[patch] Shark reroute LLVM atomic intrinsics to Zero
Gary Benson
gbenson at redhat.com
Mon Apr 6 05:58:37 PDT 2009
Xerxes Rånby wrote:
> Andrew Haley wrote:
> > Xerxes Rånby wrote:
> > > Andrew Haley wrote:
> > > > Robert Schuster wrote:
> > > > > Xerxes Rånby wrote:
> > > > > > This patch will make shark reroute LLVM atomic intrinsics
> > > > > > to the existing atomic operations implemented in Zero.
> > > > > >
> > > > > > This patch is both platform and arch independent. I have
> > > > > > tested this patch on Shark compiled for X86, PPC and ARM.
> > > > >
> > > > > I would make this rerouting optional depending on the
> > > > > architecture. LLVM has atomic intrinsic function support
> > > > > for x86(-64), powerpc (32,64) and alpha. On those
> > > > > architectures you really want to use what LLVM provides.
> > > > >
> > > > > E.g. on x86 the function is converted into a series of
> > > > > machine instructions and no function call.
> > > >
> > > > Definitely; we really don't want a function call just to do an
> > > > atomic cmpxchg. This is really just a workaround for an llvm
> > > > bug, and hopefully it'll soon go away.
> > >
> > > I have done a small investigation to see how large the cost is
> > > to use the reroute patch on PPC. The test machine is a
> > > PowerBook G4 1.333 GHz with F10 installed.
> > >
> > > I used Caffeine Mark 3.0 for this benchmark, why? It is a quick
> > > benchmark and it includes some graphics tests so it is quite fun
> > > to benchmark with.
> >
> > And, perhaps unsurprisingly, it doesn't use java.util.concurrent.*
> > at all. :-)
> >
> > Really, the use of lock-free in Java is only just beginning; in the
> > future I expect it'll be the obvious way to do things.
>
> I agree that it is a rather stupid benchmark to use, yet I don't have
> any benchmark that I know specifically tests for concurrency. My
> thinking was to use a benchmark with some GUI parts, since AWT
> is internally multi-threaded AFAIK, just to see if I could measure
> any effect at all from the use of the reroute.
[snip]
> If someone knows of a better benchmark that tests concurrency
> thoroughly I would be happy to hear about it.

Here's a microbenchmark. On PowerPC, the "new" instruction is more
than three times slower with the function call (roughly 231ms per
iteration in the attached runs, against 72ms without it) once the
thread's (small) local allocation buffer is exhausted. This
definitely needs to be conditional code.

Having said that, is there any reason to use atomic intrinsics on a
single-processor system? And, do multiprocessor ARM systems exist?
If the answers to those questions are both no then we could make Shark
emit non-atomic code to emulate cmpxchg when it is running on a
single-processor system, which would sidestep the issue for ARM entirely.
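
To make the single-processor idea above concrete, here is a minimal
Java sketch of what cmpxchg does and what a non-atomic emulation of it
might look like. The class CmpxchgSketch, the field cell and both
method names are hypothetical, made up for illustration; this is not
Shark or Zero code. The emulation is only correct if nothing else can
touch the cell between the read and the write, which is the property
the single-processor case would have to provide.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CmpxchgSketch {
    // Hypothetical stand-in for the memory word being updated.
    static int cell;

    // Non-atomic emulation of cmpxchg: store `update` only if the cell
    // currently holds `expected`, and return the value that was read.
    // Assumes nothing else modifies `cell` between the read and the write.
    static int emulatedCmpxchg(int expected, int update) {
        int old = cell;
        if (old == expected)
            cell = update;
        return old;
    }

    // The same contract built on a real atomic primitive, for comparison.
    static int atomicCmpxchg(AtomicInteger word, int expected, int update) {
        while (true) {
            int old = word.get();
            if (old != expected)
                return old;            // mismatch: report what was seen
            if (word.compareAndSet(expected, update))
                return expected;       // swap succeeded
            // Value changed between get() and compareAndSet(); retry.
        }
    }

    public static void main(String[] args) {
        cell = 5;
        System.out.println("emulated, matching:    " + emulatedCmpxchg(5, 7));
        System.out.println("emulated, mismatching: " + emulatedCmpxchg(5, 9));
        System.out.println("cell is now:           " + cell);

        AtomicInteger word = new AtomicInteger(5);
        System.out.println("atomic, matching:      " + atomicCmpxchg(word, 5, 7));
        System.out.println("atomic, mismatching:   " + atomicCmpxchg(word, 5, 9));
        System.out.println("word is now:           " + word.get());
    }
}
```

Both methods return the old value, so the caller detects success by
comparing the result with the expected value, as with cmpxchg itself.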

Cheers,
Gary
--
http://gbenson.net/
-------------- next part --------------
class Test {
    // Allocates `size` objects and returns the elapsed wall-clock time in ms.
    static long test(int size) {
        Object[] array = new Object[size];
        long start = System.currentTimeMillis();
        for (int i = 0; i < size; i++)
            array[i] = new Object();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        int size = Integer.parseInt(args[0]);
        // Repeat indefinitely so compiled-code timings show up; stop with ^C.
        while (true)
            System.out.println("time taken: " + test(size) + "ms");
    }
}
-------------- next part --------------
tooma:[mixtec]$ openjdk-ecj/control/build/linux-ppc/j2sdk-image/bin/java -XX:+PrintCompilation -XX:-UseTLAB -Xm{s,x}1G -XX:+PrintGC Test 1000000
1 java.lang.String::hashCode (60 bytes)
2 java.lang.String::charAt (33 bytes)
3 java.lang.String::indexOf (152 bytes)
4 sun.nio.cs.UTF_8$Encoder::encodeArrayLoop (490 bytes)
5 java.lang.Object::<init> (1 bytes)
time taken: 5590ms
6 Test::test (41 bytes)
time taken: 5591ms
time taken: 72ms
time taken: 72ms
time taken: 72ms
[GC 64512K->5836K(1040512K), 1.0372710 secs]
time taken: 1113ms
time taken: 73ms
time taken: 72ms
time taken: 73ms
time taken: 72ms
[GC 70348K->11753K(1040512K), 3.4359440 secs]
time taken: 3510ms
time taken: 72ms
time taken: 73ms
time taken: 72ms
time taken: 72ms
time taken: 72ms
[GC 76265K->9534K(1040512K), 1.0564340 secs]
time taken: 1129ms
time taken: 72ms
time taken: 73ms
time taken: 81ms
time taken: 72ms
time taken: 73ms
[GC 74003K->3689K(1040512K), 0.1494880 secs]
time taken: 70ms
time taken: 71ms
time taken: 72ms
time taken: 72ms
time taken: 72ms
^C
-------------- next part --------------
tooma:[mixtec]$ openjdk-ecj/control/build/linux-ppc/j2sdk-image/bin/java -XX:+PrintCompilation -XX:-UseTLAB -Xm{s,x}1G -XX:+PrintGC Test 1000000
1 java.lang.String::hashCode (60 bytes)
2 java.lang.String::charAt (33 bytes)
3 java.lang.String::indexOf (152 bytes)
4 sun.nio.cs.UTF_8$Encoder::encodeArrayLoop (490 bytes)
5 java.lang.Object::<init> (1 bytes)
time taken: 5612ms
6 Test::test (41 bytes)
time taken: 5565ms
time taken: 233ms
time taken: 231ms
time taken: 232ms
[GC 64512K->5836K(1040512K), 1.0353880 secs]
time taken: 1268ms
time taken: 231ms
time taken: 231ms
time taken: 234ms
time taken: 231ms
[GC 70348K->11752K(1040512K), 3.4058850 secs]
time taken: 3638ms
time taken: 231ms
time taken: 231ms
time taken: 231ms
time taken: 230ms
time taken: 232ms
[GC 76264K->9532K(1040512K), 1.0380370 secs]
time taken: 1269ms
time taken: 230ms
time taken: 231ms
time taken: 230ms
time taken: 231ms
time taken: 231ms
[GC 74003K->3688K(1040512K), 0.1496730 secs]
time taken: 231ms
time taken: 231ms
time taken: 232ms
time taken: 231ms
time taken: 232ms
[GC 68200K->9605K(1040512K), 1.0443780 secs]
time taken: 1276ms
time taken: 230ms
time taken: 230ms
time taken: 231ms
time taken: 231ms
^C
More information about the distro-pkg-dev mailing list