Hi Tom,<br><br>Thanks for taking a look.<br><br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">I believe this was caused by the switch to using lock addl [esp], 0 instead of mfence for volatile membars (6822204).  My review request for that change said that, at the time, I didn&#39;t measure any performance change for Intel: <a href="http://cr.openjdk.java.net/%7Enever/6822204" target="_blank">http://cr.openjdk.java.net/~never/6822204</a>.  On your microbenchmark I can measure the difference, though, so I&#39;m going to remeasure derby, which previously showed the big difference.  We may want to make the lock addl AMD-specific.<br>

</blockquote><div><br>I remember Dave Dice&#39;s blog entry about what is, as far as I understand, a similar issue: <a href="http://blogs.oracle.com/dave/resource/NHM-Pipeline-Blog-V2.txt">http://blogs.oracle.com/dave/resource/NHM-Pipeline-Blog-V2.txt</a><br>

My hardware is one generation before Nehalem, which could explain the slowdown.<br><br>Thanks, Clemens<br> </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">


<font color="#888888"><br>

tom<br>

</font><div><div></div><div class="h5"><br>

On Aug 11, 2011, at 11:05 AM, Clemens Eisserer wrote:<br>

<br>

&gt; Hi Vitaly,<br>

&gt;<br>

&gt; I tried this bench on 6u23, and if I first run that code in a 10k-iteration loop and then time the 1M-iteration loop, I get about a 10 ms speedup.  The first loop would trigger JIT compilation (10k is the default threshold, I believe) and the second should run without compilation interruption.<br>


&gt;<br>

&gt; Can you try the same? Also might be interesting to time it under the interpreter (-Xint).<br>

&gt;<br>

&gt; I changed the testcase a bit so that it no longer relies on OSR, since lockBench() will certainly hit the compilation threshold after a few runs.<br>

&gt;<br>

&gt; I get the following timings for 1m runs:<br>

&gt;<br>

&gt; jdk7-server: 53ms<br>

&gt; jdk7-client: 62ms<br>

&gt; jdk7-xint  : 955ms<br>

&gt;<br>

&gt; jdk6-xint  : 1000ms<br>

&gt; jdk6-client: 68ms<br>

&gt; jdk6-server: 52ms<br>

&gt;<br>

&gt; jdk5-server: 40ms<br>

&gt; jdk5-client: 61ms<br>

&gt; jdk5-xint  : 832ms<br>

&gt;<br>

&gt; So JDK7 is slower than JDK5 in every case; the regression seems to have landed in jdk6 (I was using openjdk6).<br>

&gt;<br>

&gt; Should I file a bug-report about this behaviour?<br>

&gt;<br>

&gt; Thanks, Clemens<br>

&gt;<br>

&gt;<br>

&gt; import java.util.concurrent.locks.ReentrantLock;<br>

&gt;<br>

&gt; public class LockPerf {<br>

&gt;     static ReentrantLock lock = new ReentrantLock();<br>

&gt;<br>

&gt;     public static void main(String[] args) {<br>

&gt;         // Runs until interrupted; each pass times 1000 x 1000 lock/unlock pairs.<br>

&gt;         while (true) {<br>

&gt;             long start2 = System.nanoTime();<br>

&gt;             for (int i = 0; i &lt; 1000; i++) {<br>

&gt;                 lockBench();<br>

&gt;             }<br>

&gt;             // Elapsed time in milliseconds.<br>

&gt;             System.out.println(&quot;Lock bench: &quot; + (System.nanoTime() - start2) / 1000000);<br>

&gt;         }<br>

&gt;     }<br>

&gt;<br>

&gt;     // 1000 uncontended lock/unlock pairs; small enough to be JIT-compiled as a whole method.<br>

&gt;     private static void lockBench() {<br>

&gt;         for (int i = 0; i &lt; 1000; i++) {<br>

&gt;             lock.lock();<br>

&gt;             lock.unlock();<br>

&gt;         }<br>

&gt;     }<br>

&gt; }<br>

&gt;<br>

&gt;<br>

&gt; On Aug 11, 2011 11:38 AM, &quot;Clemens Eisserer&quot; &lt;<a href="mailto:linuxhippy@gmail.com">linuxhippy@gmail.com</a>&gt; wrote:<br>

&gt; &gt; Hi Vitaly,<br>

&gt; &gt;<br>

&gt; &gt; Which OS are you using?<br>

&gt; &gt;&gt;<br>

&gt; &gt; Linux-3.0 (Fedora 15)<br>

&gt; &gt;<br>

&gt; &gt;<br>

&gt; &gt;&gt; Also, you should use System.nanoTime() for this type of timing as it gives<br>

&gt; &gt;&gt; you a more precise timer.<br>

&gt; &gt;&gt;<br>

&gt; &gt; I tried, but results remained the same. ~53ms for jdk6/7, ~41 for JDK5.<br>

&gt; &gt; I was using the server compiler both times.<br>

&gt; &gt;<br>

&gt; &gt; Thanks, Clemens<br>

&gt;<br>

<br>

</div></div></blockquote></div><br>
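For reference, the warm-up-then-measure approach suggested in the quoted thread could be sketched roughly as follows. This is only an illustration, not the test case that produced the timings above: the class name LockWarmupBench, the helper timeRoundNanos, the warm-up count of 20,000 calls and the five measured rounds are all assumptions made for this sketch.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical variant of the LockPerf test case with an explicit warm-up phase.
public class LockWarmupBench {
    static final ReentrantLock LOCK = new ReentrantLock();

    // Uncontended lock/unlock pairs, mirroring lockBench() from the thread.
    static void lockBench(int pairs) {
        for (int i = 0; i < pairs; i++) {
            LOCK.lock();
            LOCK.unlock();
        }
    }

    // Times `calls` invocations of lockBench and returns the elapsed nanoseconds.
    static long timeRoundNanos(int calls, int pairsPerCall) {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            lockBench(pairsPerCall);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Warm-up (assumed count): call lockBench often enough that the JIT
        // has likely compiled it before any measured round starts.
        timeRoundNanos(20_000, 10);

        // Measured rounds: 1000 x 1000 lock/unlock pairs each, as in the thread.
        for (int round = 0; round < 5; round++) {
            long nanos = timeRoundNanos(1000, 1000);
            System.out.println("Lock bench: " + nanos / 1_000_000 + " ms");
        }
    }
}
```

Running the same class with -server, -client and -Xint on each JDK should give numbers comparable to the table in the quoted mail, although the absolute values will depend on the hardware.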