Hi Tom,<br><br>Thanks for taking a look.<br><br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">I believe this was caused by the switch to using lock addl[esp], 0 instead of mfence for volatile membars, 6822204. My review request for that said that at the time I didn't measure any performance change for Intel, <a href="http://cr.openjdk.java.net/%7Enever/6822204" target="_blank">http://cr.openjdk.java.net/~never/6822204</a>. On your microbenchmark I can measure the difference though so I'm going to remeasure derby which previously showed the big difference. We may want to make the lock addl be AMD specific.<br>
</blockquote><div><br>I remember Dave Dice's blog entry about a (as far as I understand) similar issue: <a href="http://blogs.oracle.com/dave/resource/NHM-Pipeline-Blog-V2.txt">http://blogs.oracle.com/dave/resource/NHM-Pipeline-Blog-V2.txt</a><br>
My hardware is one generation before Nehalem, which could explain the slowdown.<br><br>Thanks, Clemens<br> </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<font color="#888888"><br>
tom<br>
</font><div><div></div><div class="h5"><br>
On Aug 11, 2011, at 11:05 AM, Clemens Eisserer wrote:<br>
<br>
> Hi Vitaly,<br>
><br>
> > I tried this bench on 6u23 and if I first run that code in a 10k iteration loop and then time the 1mm iteration loop I get about 10 ms speedup. The first loop would trigger jit compilation (10k is the default threshold I believe) and second should run without compilation interruption.<br>
> ><br>
> > Can you try the same? Also might be interesting to time it under the interpreter (-Xint).<br>
><br>
> I changed the testcase a bit, to no longer rely on OSR - as lockBench() will for sure soon hit the compilation threshold after a few runs.<br>
><br>
> I get the following timings for 1m runs:<br>
><br>
> jdk7-server: 53ms<br>
> jdk7-client: 62ms<br>
> jdk7-xint : 955ms<br>
><br>
> jdk6-xint : 1000ms<br>
> jdk6-client: 68ms<br>
> jdk6-server: 52ms<br>
><br>
> jdk5-server: 40ms<br>
> jdk5-client: 61ms<br>
> jdk5-xint : 832ms<br>
><br>
> So JDK7 is slower than JDK5 in every case; the regression seems to have landed in jdk6 (I was using openjdk6).<br>
><br>
> Should I file a bug-report about this behaviour?<br>
><br>
> Thanks, Clemens<br>
><br>
><br>
> import java.util.concurrent.locks.ReentrantLock;<br>
><br>
> public class LockPerf {<br>
> static ReentrantLock lock = new ReentrantLock();<br>
><br>
> public static void main(String[] args) {<br>
> while (true) {<br>
> long start2 = System.nanoTime();<br>
> for(int i=0; i < 1000; i++) {<br>
> lockBench();<br>
> }<br>
> System.out.println("Lock bench: " + (System.nanoTime() - start2) / 1000000 + " ms");<br>
> }<br>
> }<br>
><br>
> private static void lockBench() {<br>
> for (int i = 0; i < 1000; i++) {<br>
> lock.lock();<br>
> lock.unlock();<br>
> }<br>
> }<br>
> }<br>
><br>
><br>
> On Aug 11, 2011 11:38 AM, "Clemens Eisserer" <<a href="mailto:linuxhippy@gmail.com">linuxhippy@gmail.com</a>> wrote:<br>
> > Hi Vitaly,<br>
> ><br>
> >> Which OS are you using?<br>
> >><br>
> > Linux-3.0 (Fedora 15)<br>
> ><br>
> ><br>
> >> Also, you should use System.nanoTime() for this type of timing as it gives<br>
> >> you a more precise timer.<br>
> >><br>
> > I tried, but results remained the same. ~53ms for jdk6/7, ~41 for JDK5.<br>
> > I was using the server compiler both times.<br>
> ><br>
> > Thanks, Clemens<br>
><br>
<br>
</div></div></blockquote></div><br>
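For anyone who wants to repeat the measurement, the warm-up-then-time approach suggested in the quoted thread could be sketched roughly as below. This is only an illustration, not the code that produced the timings above: the class name LockWarmupBench, the helper timeMillis() and the warm-up count of 20,000 calls are assumptions chosen for the example, and the right warm-up count depends on the JVM's compile threshold.<br>

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch; names and iteration counts are assumptions, not from the thread.
class LockWarmupBench {
    static final ReentrantLock lock = new ReentrantLock();

    // Same shape as lockBench() in the thread: uncontended lock/unlock pairs.
    static void lockBench(int pairs) {
        for (int i = 0; i < pairs; i++) {
            lock.lock();
            lock.unlock();
        }
    }

    // Calls lockBench(1000) `calls` times and returns the elapsed time in milliseconds.
    static long timeMillis(int calls) {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            lockBench(1000);
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        // Warm-up phase: invoke lockBench() often enough that the JIT has
        // most likely compiled it before any timing starts (assumed count).
        timeMillis(20000);

        // Timed phase: 1000 calls x 1000 pairs = 1m lock/unlock pairs per sample.
        for (int run = 0; run < 5; run++) {
            System.out.println("Lock bench: " + timeMillis(1000) + " ms");
        }
    }
}
```

Running the same class with -Xint skips the JIT entirely, which gives the interpreter-only numbers for comparison.<br>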