ReentrantLock performance regression between JDK5 and 6/7?

Thu Aug 11 15:39:15 PDT 2011

Hi Tom,

Just curious: I recall reading on Dave Dice's blog that he found a locked add
to perform better than mfence. Granted, he tested on a Nehalem box. Do you
think the JIT may need more granular decision making than just AMD vs.
Intel, e.g. checking the Intel generation as well?

Thanks
On Aug 11, 2011 6:03 PM, "Tom Rodriguez" <tom.rodriguez at oracle.com> wrote:
> I believe this was caused by the switch to using lock addl [esp], 0 instead
> of mfence for volatile membars (6822204). My review request for that said
> that at the time I didn't measure any performance change on Intel:
> http://cr.openjdk.java.net/~never/6822204. On your microbenchmark I can
> measure the difference, though, so I'm going to remeasure Derby, which
> previously showed the big difference. We may want to make the lock addl
> AMD-specific.
>
> tom
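For context, here is a minimal sketch of the kind of Java code that makes the JIT emit a volatile membar. On x86 a volatile store needs a trailing StoreLoad barrier, which HotSpot implements either as `mfence` or as `lock addl [esp], 0` (the change discussed under 6822204). `ReentrantLock.unlock()` releases the lock through a volatile write of the AbstractQueuedSynchronizer state, so it plausibly hits the same barrier. That link is an inference, not something measured in this thread. The class and method names below are hypothetical.

```java
// Hypothetical microbenchmark: every iteration performs a volatile store,
// which on x86 HotSpot follows with a StoreLoad fence (mfence or lock addl).
public class VolatileStoreBench {
    // Writes to a volatile field carry release semantics plus the fence.
    static volatile int flag;

    static int storeLoop(int iterations) {
        int sum = 0;
        for (int i = 0; i < iterations; i++) {
            flag = i;     // volatile store -> membar after the store
            sum += flag;  // volatile load of the value just stored
        }
        return sum;
    }

    public static void main(String[] args) {
        // Warm up so storeLoop() is JIT-compiled before it is timed.
        for (int round = 0; round < 20; round++) {
            storeLoop(1000000);
        }
        long start = System.nanoTime();
        int checksum = storeLoop(1000000);
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        System.out.println("1M volatile stores: " + elapsedMs + " ms (checksum " + checksum + ")");
    }
}
```

Comparing the compiled code for this loop (for example with a disassembler plugin) against the unlock path of `ReentrantLock` is one way to check whether both end in the same fence instruction.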
>
> On Aug 11, 2011, at 11:05 AM, Clemens Eisserer wrote:
>
>> Hi Vitaly,
>>
>> I tried this bench on 6u23: if I first run that code in a 10k-iteration
>> loop and then time the 1M-iteration loop, I get about a 10 ms speedup. The
>> first loop should trigger JIT compilation (10k is the default threshold, I
>> believe) and the second should then run without being interrupted by
>> compilation.
>>
>> Can you try the same? It might also be interesting to time it under the
>> interpreter (-Xint).
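A minimal sketch of the warm-up-then-measure approach described above, assuming (as the message itself guesses) that the server compiler's default threshold is around 10,000 invocations. The lock/unlock pair sits in its own small method so it is compiled as an ordinary call rather than only via OSR, and timing uses `System.nanoTime()`. The class and method names are hypothetical; running it once normally and once with `-Xint` gives the interpreter comparison.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical warm-up-then-measure harness; names and counts are
// illustrative, not taken from the thread.
public class WarmedLockBench {
    static final ReentrantLock lock = new ReentrantLock();

    // One uncontended lock/unlock pair. Called many times, so it crosses
    // the JIT's invocation threshold as an ordinary method (no OSR needed).
    static void lockUnlockOnce() {
        lock.lock();
        try {
            // empty critical section: only acquire + release are measured
        } finally {
            lock.unlock();
        }
    }

    // Times the given number of lock/unlock pairs, in milliseconds.
    static long timeMillis(int pairs) {
        long start = System.nanoTime();
        for (int i = 0; i < pairs; i++) {
            lockUnlockOnce();
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        // Warm-up: several 10k rounds. Compilation happens in the background,
        // so a single round right at the threshold may not be enough.
        for (int round = 0; round < 5; round++) {
            timeMillis(10000);
        }
        // Measurement: 1M pairs, matching the numbers quoted in this thread.
        System.out.println("1M lock/unlock pairs: " + timeMillis(1000000) + " ms");
    }
}
```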
>>
>> I changed the test case a bit so that it no longer relies on OSR:
>> lockBench() will hit the compilation threshold after a few runs anyway.
>>
>> I get the following timings for 1M runs:
>>
>> JDK    -server  -client   -Xint
>> jdk7   53 ms    62 ms     955 ms
>> jdk6   52 ms    68 ms    1000 ms
>> jdk5   40 ms    61 ms     832 ms
>>
>> So JDK7 is slower than JDK5 in every configuration; the regression seems
>> to have landed in JDK6 (I was using OpenJDK 6).
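To put the 1M-run timings above in per-operation terms, a small sketch that converts milliseconds per 1M lock/unlock pairs into nanoseconds per pair and a percentage slowdown relative to jdk5-server. It only reuses the server numbers reported above; the class name is hypothetical.

```java
// Converts the reported 1M-pair timings into ns per lock/unlock pair
// and slowdown relative to a baseline (jdk5-server).
public class LockTimings {
    static final int PAIRS = 1000000; // 1000 calls x 1000 pairs per call

    static double nsPerPair(double millis) {
        return millis * 1000000.0 / PAIRS;
    }

    static double slowdownPercent(double millis, double baselineMillis) {
        return (millis - baselineMillis) * 100.0 / baselineMillis;
    }

    public static void main(String[] args) {
        String[] names = {"jdk5-server", "jdk6-server", "jdk7-server"};
        double[] millis = {40, 52, 53}; // server timings from the table above
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%s: %.0f ns/pair, %+.1f%% vs jdk5-server%n",
                    names[i], nsPerPair(millis[i]), slowdownPercent(millis[i], millis[0]));
        }
    }
}
```

For the server numbers this works out to about 40, 52 and 53 ns per pair, i.e. roughly 30% slower on JDK6/7. This is just arithmetic on the figures above, not a new measurement.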
>>
>> Should I file a bug report about this behaviour?
>>
>> Thanks, Clemens
>>
>>
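>> import java.util.concurrent.locks.ReentrantLock;
>>
>> public class LockPerf {
>>     static ReentrantLock lock = new ReentrantLock();
>>
>>     public static void main(String[] args) {
>>         // Runs forever; each round times 1000 calls of lockBench(),
>>         // i.e. 1M uncontended lock/unlock pairs, and prints milliseconds.
>>         while (true) {
>>             long start = System.nanoTime();
>>             for (int i = 0; i < 1000; i++) {
>>                 lockBench();
>>             }
>>             System.out.println("Lock bench: " + (System.nanoTime() - start) / 1000000);
>>         }
>>     }
>>
>>     // Invoked repeatedly, so it reaches the JIT compilation threshold as a
>>     // normal method call rather than relying on OSR of a long-running loop.
>>     private static void lockBench() {
>>         for (int i = 0; i < 1000; i++) {
>>             lock.lock();
>>             lock.unlock();
>>         }
>>     }
>> }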
>>
>>
>> On Aug 11, 2011 11:38 AM, "Clemens Eisserer" <linuxhippy at gmail.com> wrote:
>> > Hi Vitaly,
>> >
>> > Which OS are you using?
>> >>
>> > Linux-3.0 (Fedora 15)
>> >
>> >
>> >> Also, you should use System.nanoTime() for this type of timing, as it
>> >> gives you a more precise timer.
>> >>
>> > I tried, but the results remained the same: ~53 ms for JDK6/7, ~41 ms
>> > for JDK5. I was using the server compiler both times.
>> >
>> > Thanks, Clemens
>>
>